Stories Storage Solution

Overview

The decision to migrate only Stories with more than 200 page views in the last 12 months was taken on the understanding that a storage solution will preserve the HTML, CSS and assets of all other Stories that do not meet this criterion.

AWS Storage Classes

S3 storage classes serve different use cases. The choice comes down mainly to how often you need to access the data and the level of redundancy you need. An object's storage class is not fixed; it can be changed manually or via a lifecycle policy. An S3 lifecycle policy manages the data lifecycle for you (i.e., data moves to a different storage class automatically, without any changes to your application). Data is stored in an S3 bucket, each bucket holds objects, and a single bucket can hold objects with a mixture of storage classes.
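
As an illustration of the lifecycle policy described above, the following is a minimal sketch of a rule that moves objects to S3 Standard-IA after 30 days. The `storage/` prefix, the 30-day threshold and the bucket name in the comment are assumptions made for this example, not decisions taken in this document. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`.

```python
import json


def build_lifecycle_rule(prefix: str, days: int, storage_class: str) -> dict:
    """Build one lifecycle rule that transitions objects under `prefix`."""
    return {
        "ID": f"transition-{storage_class.lower()}-after-{days}-days",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


# Assumed values: objects under "storage/" move to Standard-IA after 30 days.
lifecycle_configuration = {
    "Rules": [build_lifecycle_rule("storage/", 30, "STANDARD_IA")]
}

print(json.dumps(lifecycle_configuration, indent=2))

# Applying the configuration would look roughly like this
# (requires boto3 and AWS credentials; the bucket name is hypothetical):
#
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-stories-archive",
#       LifecycleConfiguration=lifecycle_configuration,
#   )
```

Building the rule as plain data keeps it easy to review before anything is applied to a real bucket.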

Storage classes and expected uses are:

  • S3 Standard: general-purpose storage of frequently accessed data
  • S3 Intelligent-Tiering: unknown or changing access patterns
  • S3 Standard-IA: long-term but less frequently accessed data
  • S3 One Zone-IA: long-term, less frequently accessed data stored in a single Availability Zone
  • S3 Glacier Instant Retrieval: long-term archive with fast retrieval
  • S3 Glacier Flexible Retrieval: long-term archive without a need for quick retrieval

S3 Standard

S3 Standard is the AWS default storage class. We see that over 93% of objects belong to this class. It’s designed for data that is accessed frequently.

S3 Standard is the workhorse of Amazon S3. The low latency and high throughput make it an extremely versatile backbone for many applications. However, it is the most expensive storage class per GB stored.

S3 Intelligent-Tiering

S3 Intelligent-Tiering relies on inbuilt monitoring and automation capabilities to move data between a frequent-access (FA) tier and an infrequent-access (IA) tier for cost optimization. Intelligent-Tiering ensures you’re not paying FA storage prices for data that isn’t being accessed regularly: files stored in FA are charged at the S3 Standard rate, while files stored in IA are discounted by 40-46%. While there is a monthly monitoring and auto-tiering fee, there are no data retrieval fees, so you don’t have to worry about unexpected bill spikes if a data access pattern changes.

S3 Standard-Infrequent Access (IA)

S3 Standard-IA is best for storing data that is accessed less frequently than data stored in S3 Standard, but that still requires rapid access when needed. It’s ideal for long-term storage or backups, and it’s often used as a data store for disaster recovery. Its storage costs are lower (discounted 40-46%) than S3 Standard, but there are data retrieval charges.

S3 One Zone-Infrequent Access (S3 One Zone-IA)

This Amazon S3 class stores data in a single AWS Availability Zone (AZ). Unlike the other S3 classes, it isn’t designed to be resilient to the physical loss of an AZ due to a major event such as an earthquake or a flood. But if you don’t need the extra protection provided by geographic redundancy, then you can take advantage of prices 20% lower than S3 Standard-IA.

S3 Glacier Instant Retrieval

Amazon S3 Glacier Instant Retrieval delivers the lowest-cost storage for long-lived data that is rarely accessed and requires retrieval in milliseconds.

S3 Glacier Flexible Retrieval

S3 Glacier Flexible Retrieval offers capabilities similar to S3 Glacier Instant Retrieval, but it is intended for data that is accessed only one to two times a year and does not need immediate access. S3 Glacier Flexible Retrieval provides a balance between cost and access time.

Comparison

| | S3 Standard | S3 Intelligent-Tiering | S3 Standard-IA | S3 One Zone-IA | S3 Glacier Instant Retrieval | S3 Glacier Flexible Retrieval |
| --- | --- | --- | --- | --- | --- | --- |
| Availability | 99.99% | 99.9% | 99.9% | 99.5% | 99.9% | 99.99% |
| Availability Zones | ≥ 3 | ≥ 3 | ≥ 3 | 1 | ≥ 3 | ≥ 3 |
| Minimum capacity charge per object | N/A | N/A | 128 KB | 128 KB | 128 KB | 128 KB |
| Minimum storage duration | N/A | N/A | 30 days | 30 days | 90 days | 90 days |
| Retrieval charge | N/A | N/A | per GB retrieved | per GB retrieved | per GB retrieved | per GB retrieved |
| First byte latency | milliseconds | milliseconds | milliseconds | milliseconds | milliseconds | minutes or hours |

  • S3 Intelligent-Tiering can store objects smaller than 128 KB, but auto-tiering has a minimum eligible object size of 128 KB.
  • S3 Standard-IA and S3 One Zone-IA storage have a minimum billable object size of 128 KB. Smaller objects may be stored but will be charged for 128 KB of storage at the appropriate storage class rate.
  • The S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes require an additional 32 KB of data per object for S3 Glacier’s index and metadata, charged at the appropriate storage class rate. Amazon S3 requires 8 KB per object to store and maintain the user-defined name and metadata for objects archived to S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive.
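
The minimum billable object size matters for story archives, which tend to contain many small files. The sketch below shows how a small object would be billed under each class, using the minimums from the comparison above. It is a simplification: it ignores the extra per-object index and metadata overhead noted for the Glacier classes, and the function name is introduced here only for illustration.

```python
KB = 1024

# Minimum billable object size per storage class, as listed in the comparison.
# None means no minimum applies.
MIN_BILLABLE_BYTES = {
    "S3 Standard": None,
    "S3 Intelligent-Tiering": None,
    "S3 Standard-IA": 128 * KB,
    "S3 One Zone-IA": 128 * KB,
    "S3 Glacier Instant Retrieval": 128 * KB,
    "S3 Glacier Flexible Retrieval": 128 * KB,
}


def billable_bytes(storage_class: str, object_bytes: int) -> int:
    """Return the size an object is billed at, applying the class minimum."""
    minimum = MIN_BILLABLE_BYTES[storage_class]
    if minimum is None:
        return object_bytes
    return max(object_bytes, minimum)


# A hypothetical 10 KB stylesheet is billed as 128 KB in Standard-IA,
# but at its real size in S3 Standard.
small_file = 10 * KB
print(billable_bytes("S3 Standard-IA", small_file))
print(billable_bytes("S3 Standard", small_file))
```

For a story made up mostly of files under 128 KB, the billed size in the infrequent-access classes can be several times the real size, which partly offsets the lower per-GB rate.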

Cost

Components of S3 pricing

Storage costs

Storage costs are what you would expect: the cost of storing your data in S3, charged per GB-month.

Bucket and object requests

Unless you are using S3 for archive storage or regulatory compliance, S3 data doesn’t idly sit in storage. Access, edits, scanning: all these actions on your S3 data are quantifiable and, inevitably, billed. Each storage class has different pricing per request, but activities are the same across classes. They are:

  • POST: creates a new object (i.e., uploads a new file)
  • PUT: creates a new object or replaces an existing one (i.e., creates or updates a file)
  • LIST: requests the contents of a given S3 bucket
  • GET: downloads a file from S3
  • DELETE: deletes a file
  • S3 Select (Data Returned/Data Scanned): S3 Select pulls only the data you need from a storage object. Smaller data pulls improve performance.

S3 Storage Management

Keeping on top of your data requires monitoring tools, and S3 charges for those too. They are:

  • S3 Inventory lists stored objects. Cost: $0.0025 per million objects listed.
  • S3 Analytics Storage Class Analysis monitors access to objects. Cost: $0.10 per million objects monitored per month.
  • S3 Object Tagging manages and controls access for S3 objects. Cost: $0.01 per 10,000 tags per month.

Data transfer

There are transfer costs associated with how you add or remove data. The main charges are for transferring data out to the internet or between AWS Regions. Adding new data to S3 is free, but there are different pricing tiers for transferring data once it has been added. Transferring from one AWS Region to another ($0.02/GB) is far cheaper than moving your data out of AWS to another provider (e.g., $0.09/GB up to 10 TB per month). Organizations concerned about vendor lock-in must evaluate data transfer costs as part of their cloud migration TCO.

Storage Costs

| Storage class | Intended use | Pricing tier | Price |
| --- | --- | --- | --- |
| S3 Standard | General purpose storage for any type of data, typically used for frequently accessed data | First 50 TB / Month | $0.023 per GB |
| S3 Intelligent-Tiering | Automatic cost savings for data with unknown or changing access patterns | Monitoring and Automation, All Storage / Month (Objects > 128 KB) | $0.0025 per 1,000 objects |
| | | Frequent Access Tier, First 50 TB / Month | $0.023 per GB |
| S3 Intelligent-Tiering | Optional asynchronous Archive Access tiers | Archive Access Tier, All Storage / Month | $0.0036 per GB |
| | | Deep Archive Access Tier, All Storage / Month | $0.00099 per GB |
| S3 Standard-Infrequent Access | For long-lived but infrequently accessed data that needs millisecond access | All Storage / Month | $0.0125 per GB |
| S3 One Zone-Infrequent Access | For re-creatable infrequently accessed data that needs millisecond access | All Storage / Month | $0.01 per GB |
| S3 Glacier Instant Retrieval | For long-lived archive data accessed once a quarter with instant retrieval in milliseconds | All Storage / Month | $0.004 per GB |
| S3 Glacier Flexible Retrieval (formerly S3 Glacier) | For long-term backups and archives with retrieval options from 1 minute to 12 hours | All Storage / Month | $0.0036 per GB |
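
To put the per-GB prices above in context, the following sketch estimates the monthly storage charge for an archive in each class. The 500 GB archive size is a made-up figure for illustration. The estimate covers storage only, so it leaves out request, retrieval, monitoring and data transfer charges, as well as minimum billable size and minimum storage duration.

```python
# Per-GB monthly storage prices copied from the storage costs listed above
# (first 50 TB tier for S3 Standard).
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.023,
    "S3 Standard-IA": 0.0125,
    "S3 One Zone-IA": 0.01,
    "S3 Glacier Instant Retrieval": 0.004,
    "S3 Glacier Flexible Retrieval": 0.0036,
}


def monthly_storage_cost(storage_class: str, gigabytes: float) -> float:
    """Estimate the storage-only monthly charge in USD."""
    return PRICE_PER_GB_MONTH[storage_class] * gigabytes


archive_gb = 500  # hypothetical archive size, not a measured figure
for name in PRICE_PER_GB_MONTH:
    cost = monthly_storage_cost(name, archive_gb)
    print(f"{name:<32} ${cost:>7.2f} per month")
```

At this scale the differences between classes amount to a few dollars a month, so retrieval charges and minimum object sizes may matter as much as the headline storage rate.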

As you can see, Amazon S3 pricing has a lot of moving parts, any of which can unexpectedly drive up our costs.

Proposal

The first thing to note about this proposal is that everything must be self-contained per story for it to function. This means we would require a folder structure similar to this:

.
└── storage/
    └── locale/
        ├── the-story-url/
        │   ├── index.html
        │   ├── style.css
        │   └── assets/
        │       ├── asset-1.jpg
        │       ├── asset-2.mp4
        │       └── font.woff2
        └── the-story-url/
            ├── index.html
            ├── style.css
            └── assets/
                ├── asset-1.jpg
                ├── asset-2.webm
                └── font.woff2

To achieve this we will write a Python script that scrapes the HTML, CSS and assets for each page and rewrites the URLs so they are relative to index.html.
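
The URL-rewriting step could look something like the sketch below. It is not the planned script, only an illustration using the standard library. It assumes one stylesheet per story (saved as `style.css`), handles only double-quoted `src` and `href` attributes, does not touch `url(...)` references inside CSS, and uses an assumed list of asset extensions. The function name and the example URLs are hypothetical.

```python
import posixpath
import re
from urllib.parse import urljoin, urlparse

# File extensions treated as assets; an assumed list for this sketch.
ASSET_EXTENSIONS = {".jpg", ".png", ".mp4", ".webm", ".woff2"}

# Matches double-quoted src="..." and href="..." attributes only.
ATTR_PATTERN = re.compile(r'(?P<attr>\b(?:src|href))="(?P<url>[^"]+)"')


def rewrite_urls(html: str, page_url: str) -> tuple:
    """Return (rewritten_html, downloads).

    `downloads` maps each absolute source URL to the relative path
    it should be saved under, next to index.html.
    """
    downloads = {}

    def replace(match):
        absolute = urljoin(page_url, match.group("url"))
        name = posixpath.basename(urlparse(absolute).path)
        extension = posixpath.splitext(name)[1].lower()
        if extension == ".css":
            local = "style.css"
        elif extension in ASSET_EXTENSIONS:
            local = f"assets/{name}"
        else:
            return match.group(0)  # leave ordinary links untouched
        downloads[absolute] = local
        return f'{match.group("attr")}="{local}"'

    return ATTR_PATTERN.sub(replace, html), downloads


sample = '<link href="/static/main.css"><img src="https://cdn.example.com/img/asset-1.jpg">'
rewritten, downloads = rewrite_urls(sample, "https://example.com/en/the-story-url/")
print(rewritten)
print(downloads)
```

A real scraper would most likely use a proper HTML parser rather than a regular expression, and would need to deal with file name collisions between assets from different source folders.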

In terms of storage solutions, we will aim to store our stories in both S3 Standard-IA and Google Drive. Storing them in two places has multiple benefits.

Google Drive

This will allow people within the organisation to quickly and easily download everything required to view a story page on their local machine. They would not need to do anything special other than opening the index.html in any browser.

S3

S3 is more of a permanent backup. Although Google Drive is easy to use, it does leave us susceptible to:

  • Files being changed
  • Files being lost

Keeping the same files in S3 gives us a read-only disaster recovery copy should we need it.
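
As a rough sketch of the S3 side, the helper below uploads one story folder and sets the storage class to Standard-IA on each object. It assumes boto3 is used as the client library; the bucket name and paths in the comment are hypothetical. The client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
from pathlib import Path


def iter_story_objects(story_dir: Path, key_prefix: str):
    """Yield (local_path, s3_key) for every file under one story folder."""
    for path in sorted(story_dir.rglob("*")):
        if path.is_file():
            relative = path.relative_to(story_dir).as_posix()
            yield path, f"{key_prefix}/{relative}"


def upload_story(client, bucket: str, story_dir: Path, key_prefix: str) -> int:
    """Upload one story folder in the Standard-IA class; return the file count."""
    count = 0
    for path, key in iter_story_objects(story_dir, key_prefix):
        # `client` is expected to behave like boto3.client("s3").
        client.upload_file(
            str(path), bucket, key, ExtraArgs={"StorageClass": "STANDARD_IA"}
        )
        count += 1
    return count


# Example call (requires boto3 and AWS credentials; names are hypothetical):
#
#   import boto3
#   upload_story(
#       boto3.client("s3"),
#       "example-stories-archive",
#       Path("storage/en/the-story-url"),
#       "storage/en/the-story-url",
#   )
```

Using the local folder path as the key prefix keeps the layout in the bucket identical to the folder structure proposed above.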

Resources