Making a Data Storage Choice
Amazon Web Services (AWS) is the most popular cloud service provider today. Since its inception, AWS has pioneered and revolutionized the way modern organizations deploy, operate and manage their IT services in a virtualized environment.
Thousands of companies around the world have either partially or fully migrated their IT workloads to AWS. Companies with an on-premises presence usually take a gradual approach to migrating and adapting to the cloud. Modern startups, on the other hand, are almost certainly cloud native. Regardless of where they come from, sooner or later an organization using the cloud needs to decide what type of data it will store in the cloud and how it will store it. Depending on the application being migrated, this choice can be either straightforward or a bit tricky. This is particularly true for AWS, where a number of options exist for data storage, each catering to different use cases.
In this three-part article series, we will provide a basic introduction to some of the storage, database and analytics solutions offered by Amazon Web Services. First up, we’ll dig into AWS data storage options, looking specifically at Amazon Simple Storage Service (Amazon S3) and Amazon Glacier.
AWS Data Storage Services
Simple Storage Service (S3)
Simple Storage Service (S3 for short) is one of the earliest and most widely adopted cloud services from AWS. Amazon built S3 from the ground up as an object storage service rather than a traditional file system, with its own API and set of commands for manipulating the objects stored in it.
With S3, data is stored in “buckets,” which are logical containers for data, much like the top-level folders in a computer file system. Unlike a regular file server though, S3 is designed for very high durability: 99.999999999 percent (eleven nines) for the objects it stores.
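To put the durability figure in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it treats the eleven-nines figure as an average annual per-object durability, and the function names are ours, not part of any AWS tool.

```python
from fractions import Fraction

# Design durability of eleven nines (99.999999999 percent),
# treated here as the average chance an object survives one year.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Average number of objects expected to be lost in one year."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


# Storing ten million objects: one expected loss every 10,000 years.
print(years_per_lost_object(10_000_000))  # prints 10000
```

Exact fractions are used instead of floating-point numbers so that the tiny loss probability is not distorted by rounding.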
Like all other AWS services, S3 has evolved over time. It can be the tool of choice for a wide variety of use cases. Here are some common ones:
- At a very basic level, S3 can host a company’s documents and files, serving a role similar to a file server. These files can be encrypted with AWS Key Management Service (KMS) keys.
- S3 can be used to host a static website’s contents. Those contents can be cached using the Amazon CloudFront content delivery network.
- S3 is a popular choice for storing log files. This includes:
  - Log files from native AWS services like CloudTrail, Lambda or CloudWatch
  - Log files from applications and third-party tools
- It’s used as a backup target by some AWS services to store snapshots or database backups.
- It can be used as a source or destination for data movement. For example, Amazon Redshift can import data from and export data to S3. Amazon Elastic MapReduce (EMR) can read and write S3 data through the EMR File System (EMRFS). Other use cases include S3 as part of an enterprise data lake.
- Particular types of data stored in S3 can be directly queried from analytics applications like Amazon Athena with Structured Query Language (SQL).
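Several of these use cases, log storage and SQL querying in particular, depend on how objects are named. S3 has no real directories; the slashes in an object key only act as prefixes. A minimal sketch of one possible naming scheme for log files follows. The `logs/` prefix, the function name and the sample file name are hypothetical, and the `year=/month=/day=` layout is a common partitioning convention that tools such as Athena can work with, not something S3 requires.

```python
from datetime import datetime, timezone


def log_object_key(source: str, timestamp: datetime, filename: str) -> str:
    """Build an S3 object key that groups log files by source and UTC date.

    The key layout is an illustrative assumption, not an AWS requirement.
    """
    ts = timestamp.astimezone(timezone.utc)
    return (
        f"logs/{source}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{filename}"
    )


key = log_object_key(
    "app",
    datetime(2024, 1, 5, 12, 0, tzinfo=timezone.utc),
    "server.log.gz",
)
print(key)  # prints logs/app/year=2024/month=01/day=05/server.log.gz
```

Keeping the date in the key makes it easier to list, query or archive one day’s or one month’s logs without touching the rest of the bucket.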
Amazon Glacier
If data in S3 is considered “hot,” data in Glacier can be classed as “cold backup.” Amazon Glacier is a storage service where S3 data can be archived for long-term retention. Storage in Glacier is extremely cheap, but the trade-off is that retrieving data from Glacier back into S3 can take several hours.
With Amazon Glacier, users create “vaults,” or virtual silos for data files. Once a vault is created, it can be configured to publish a notification through Amazon Simple Notification Service (SNS) when certain jobs on it complete, such as an archive retrieval.
Data in an S3 bucket can be configured with a lifecycle policy so that when it reaches a certain age, it’s automatically transferred to Glacier. This can be an ideal solution for companies which need to keep their data archived for long periods of time for regulatory and compliance purposes.
Common use cases for Glacier include:
- Long-term archival of enterprise backups
- Long-term archival of log files
- Long-term archival of source data used for processing
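The lifecycle policy mentioned earlier, which moves S3 data to Glacier once it reaches a certain age, is expressed as a small configuration document. The sketch below builds one in Python. The `logs/` prefix, the 90-day threshold, the rule ID format and the bucket name in the comment are assumptions chosen for illustration; the dictionary follows the structure that the boto3 `put_bucket_lifecycle_configuration` call accepts.

```python
import json


def glacier_transition_rule(prefix: str, days: int) -> dict:
    """Return a lifecycle rule that moves objects under `prefix`
    to the GLACIER storage class once they are `days` days old."""
    return {
        "ID": f"archive-{prefix.strip('/') or 'all'}-after-{days}-days",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


# Hypothetical policy: archive everything under logs/ after 90 days.
lifecycle_configuration = {"Rules": [glacier_transition_rule("logs/", 90)]}
print(json.dumps(lifecycle_configuration, indent=2))

# With boto3 installed and AWS credentials configured, the policy could
# be applied along these lines (bucket name is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration=lifecycle_configuration,
#   )
```

The call to AWS is left as a comment so the example runs without an account; only the shape of the rule is shown here.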
Stay tuned for the second part in this series next month, where we’ll walk you through which AWS database services are available to your organization with the goal of helping you find the best fit for your current and future IT needs.