AWS S3

Amazon Simple Storage Service (S3) provides developers and IT teams with secure, durable, highly scalable cloud storage. S3 is easy-to-use object storage with a simple web service interface for storing and retrieving any amount of data from anywhere on the web.

  • Object Storage:
    • Data (files/videos/pictures) and associated metadata stored as objects
    • Can't install an operating system on it (unlike block storage formatted with a filesystem)
    • Objects can be up to 5TB in size

  • Highly Durable:
    • Objects are 99.999999999% durable (11 9's)
    • This means that, on average, you can expect to lose 1 object in every 100 billion per year
    • Data is replicated across multiple devices in multiple facilities
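
The durability figure above can be turned into rough arithmetic. The following is a minimal sketch, assuming the 11 9's figure is an annual, per-object design target; the function names are hypothetical and only illustrate the calculation.

```python
from fractions import Fraction

# Assumption: 99.999999999% durability is an annual per-object figure,
# so the expected annual loss rate is 1 - 0.99999999999 = 1e-11 (kept exact here).
ANNUAL_LOSS_RATE = Fraction(1, 10**11)


def expected_annual_losses(object_count: int) -> Fraction:
    """Average number of objects expected to be lost per year."""
    return object_count * ANNUAL_LOSS_RATE


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


# With 100 billion objects stored, about one loss per year is expected.
print(float(expected_annual_losses(100_000_000_000)))
# With 10 million objects stored, about one loss every 10,000 years is expected.
print(float(years_per_lost_object(10_000_000)))
```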

  • Highly Available:
    • Offers 99.99% availability

  • Highly Scalable:
    • For users, it offers a virtually unlimited amount of storage

  • Web Based:
    • Upload & download data using web based protocols over the internet

  • Secure:
    • Features can be applied to improve confidentiality, integrity, availability and accountability of data

  • Utility Based Pricing:
    • Pay only for what you use

What is S3 Used for?

  • Backup and Archiving
  • Content Storage and Distribution
  • Big Data and Analytics
  • Static Website Hosting
  • Disaster Recovery

S3 Consists of Buckets

  • A bucket is a basic container within S3 used for storage of objects
  • Both buckets and objects are classed as resources (any entity in AWS that you can work with)
    • Each resource is referred to by an Amazon Resource Name (ARN)
  • You can upload as many objects as you like into a bucket, and create up to 100 buckets by default (a soft limit that can be raised by requesting a service limit increase)
  • Buckets must be created in a region
  • Objects stored in a region stay in that region unless you explicitly transfer them out
  • Buckets have subresources that basically define how the bucket is configured
    • A subresource is a resource that belongs to another resource and cannot exist on its own
  • Amazon S3 has a set of dual-stack endpoints, which support requests to S3 buckets over both Internet Protocol version 6 (IPv6) and IPv4
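
As an illustration of how buckets and objects are referred to by ARN, here is a small sketch. It assumes the standard "aws" partition (other partitions use a different second field), and the helper names are hypothetical. The bucket and key are the example values used elsewhere in these notes.

```python
def bucket_arn(bucket: str) -> str:
    """ARN of a bucket. S3 ARNs leave the region and account fields empty."""
    return f"arn:aws:s3:::{bucket}"


def object_arn(bucket: str, key: str) -> str:
    """ARN of an object: the bucket ARN followed by '/' and the object key."""
    return f"{bucket_arn(bucket)}/{key}"


print(bucket_arn("siaweb"))
print(object_arn("siaweb", "SiaBday20120705/156.jpg"))
```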

S3 Namespace

  • S3 has a universal namespace
    • Bucket names must be unique globally regardless of the region they are created in
    • S3 has a flat structure
    • Unlike a file system, it has no directories...
      • However directories can be imitated by the use of prefixes
    • http://siaweb.s3.amazonaws.com/SiaBday20120705/156.jpg
      Note: SiaBday20120705/ is the prefix of the object key SiaBday20120705/156.jpg
    • Object key names can use UTF-8 encoding but must not be longer than 1024 bytes
    • When naming objects, it's recommended to stick to safe characters:
      [0-9A-Za-z] and ! - _ . * ' ( )
  • Buckets can be accessed via virtual-hosted-style or path-style URLs
    • Virtual:
      http://bucket.s3.amazonaws.com
      http://bucket.s3-aws-region.amazonaws.com
    • Path:
      http://s3-aws-region.amazonaws.com/bucket
  • Example:
    http://siaweb.s3.amazonaws.com/1/index.html
    http://s3.amazonaws.com/siaweb/1/index.html
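
The two URL styles, the prefix convention, and the key length limit described above can be sketched as follows. This is only an illustration: it assumes the global s3.amazonaws.com endpoint and the http scheme used in the examples, and all function names are hypothetical.

```python
ENDPOINT = "s3.amazonaws.com"  # assumption: global endpoint, as in the examples above
MAX_KEY_BYTES = 1024


def virtual_hosted_url(bucket: str, key: str) -> str:
    """Virtual-hosted style: the bucket name is part of the host name."""
    return f"http://{bucket}.{ENDPOINT}/{key}"


def path_style_url(bucket: str, key: str) -> str:
    """Path style: the bucket name is the first path segment."""
    return f"http://{ENDPOINT}/{bucket}/{key}"


def key_prefix(key: str) -> str:
    """Everything up to and including the last '/', or '' if there is none."""
    head, sep, _ = key.rpartition("/")
    return head + sep


def key_length_ok(key: str) -> bool:
    """The limit applies to the UTF-8 encoded key, not the character count."""
    return len(key.encode("utf-8")) <= MAX_KEY_BYTES


print(virtual_hosted_url("siaweb", "1/index.html"))
print(path_style_url("siaweb", "1/index.html"))
print(key_prefix("SiaBday20120705/156.jpg"))
```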

S3 Objects

  • S3 is a key-value store designed to store an unlimited number of objects
  • Objects consist of
    • Key = Name of object
    • Value = The data being stored (0-5TB)
    • Version ID = A string of data assigned to an object when versioning is enabled
      • Bucket + Key + Version ID = Uniquely identify an object in S3
    • Metadata = Name-value pairs which are used to store information about the object
    • Subresources = Additional resources specifically assigned to an object
    • Access control information = Policies for controlling access to the resource
  • Object Tagging
    • Object tagging allows you to categorise objects using a key/value pair
    • PROD=website
    • Classification=confidential
    • Object tags enable
      • Fine grained access control
      • Fine grained lifecycle management
      • Filtering for CloudWatch metrics and CloudTrail Logs
    • Object Tagging Features
      • Keys can be up to 128 Unicode characters in length
      • Values can be up to 256 Unicode characters in length
      • Keys and values are case-sensitive
      • Up to 10 tags per object
      • Each Tag must have a unique key

S3 Consistency Model

  • S3 provides read-after-write consistency for puts of new objects
    • Can only read the data after it's been successfully written to all facilities and returned success
  • S3 originally provided only eventual consistency for overwrite puts (updates) and deletes
    • For updates, old data could be returned
    • For deletes, old data could be returned, or a deleted key could still show in a list
    • Since December 2020, S3 provides strong read-after-write consistency for these operations as well
  • Eventual consistency provides low latency and high throughput
  • Also note S3 does not provide object locking for concurrent writers
    • If two requests write to the same key at roughly the same time, the one with the latest timestamp wins
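
The "latest timestamp wins" rule can be illustrated with a toy in-memory model. This is only a sketch of the idea, not how S3 is implemented; the class name and the explicit timestamp argument are assumptions made for the illustration.

```python
class LastWriterWinsStore:
    """Toy key-value store in which the write with the latest timestamp wins."""

    def __init__(self):
        self._data = {}  # key -> (timestamp, value)

    def put(self, key, value, timestamp):
        current = self._data.get(key)
        # Keep the incoming write only if it is not older than the stored one.
        if current is None or timestamp >= current[0]:
            self._data[key] = (timestamp, value)

    def get(self, key):
        return self._data[key][1]


store = LastWriterWinsStore()
store.put("report.txt", "from writer A", timestamp=2.0)
# This write arrives second but carries an older timestamp, so it is discarded.
store.put("report.txt", "from writer B", timestamp=1.0)
print(store.get("report.txt"))
```

No locking is involved: neither writer is blocked, and one of the two writes is silently superseded.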

Storage Classes/Tiers

  • S3 provides different tiers of storage based on need:
    • S3 Standard: 99.99% availability and 99.999999999% durability. Data is stored redundantly across multiple devices in multiple facilities and is designed to sustain the concurrent loss of two facilities
    • S3 Standard - Infrequent Access: Used for data that is accessed less frequently but requires rapid access when needed. Lower storage fee than S3 Standard, but you are charged a retrieval fee.
    • Reduced Redundancy Storage: Does not replicate data as many times as S3 Standard and therefore provides 99.99% availability and 99.99% durability, at a lower cost
    • Glacier: Extremely cheap, but only suitable for archival or infrequently accessed data. Data is not available in real time and must be restored first. The retrieval tier you select determines the restore time.
  • S3 provides Lifecycle Policies that allow objects to transition between the storage classes
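
A lifecycle policy that moves objects through the storage classes might look like the sketch below. It only builds the configuration as a plain dictionary, in the shape accepted by boto3's put_bucket_lifecycle_configuration; the rule ID, the "logs/" prefix, and the day counts are hypothetical values chosen for illustration.

```python
def lifecycle_rule(rule_id, prefix, ia_after_days, glacier_after_days):
    """Build one rule: Standard -> Standard-IA -> Glacier for keys under a prefix."""
    if not 0 < ia_after_days < glacier_after_days:
        raise ValueError("expected 0 < ia_after_days < glacier_after_days")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


# Hypothetical example: move objects under logs/ to Infrequent Access after
# 30 days and to Glacier after 90 days.
config = {"Rules": [lifecycle_rule("archive-logs", "logs/", 30, 90)]}
print(config)

# With boto3 installed and credentials configured, the dictionary could be
# applied roughly like this (the bucket name is a placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=config)
```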