Simple Storage Service (S3)

Amazon S3 is a simple key-value store designed to store an unlimited number of objects. These objects are stored in buckets, and each object can be up to 5 TB in size.

An object in Amazon S3 consists of the following:

  1. Key: The unique name of an object, used to retrieve it.
  2. Version ID: Within a bucket, a combination of key and version ID uniquely identifies an object.
  3. Value: The content being stored.
  4. Metadata: A set of name-value pairs used to store additional information about the object.
  5. Subresources: Used by Amazon S3 to store object-specific additional information.
  6. Access Control Information: Controls the access permissions for the object.

It’s important to note that metadata, which is included with the object, is not encrypted while stored in Amazon S3. Therefore, AWS recommends not placing sensitive information in S3 metadata.

1. S3 Standard Uploads

  1. Uploading data to S3 from the internet is free.
  2. No data transfer charges apply for uploading into S3; you pay only for storage of the objects.

2. Improve file upload speed into Amazon S3

  1. S3 Transfer Acceleration (S3TA)
    1. S3TA improves transfer performance by routing traffic to the nearest edge location and using optimized Amazon CloudFront distribution paths to accelerate uploads.
    2. S3TA can speed up the upload and download of objects to and from an S3 bucket, particularly when transferring large files over long distances.
    3. When S3TA is enabled, AWS charges additional fees (e.g., $0.04/GB) for accelerated transfers; see the enablement sketch after this list.
    4. However, if S3TA does not result in a faster transfer, AWS does not charge for it.
  2. Multipart uploads
    1. Multipart uploads allow large files to be split into smaller parts and uploaded in parallel, significantly speeding up the upload process.
    2. It is a cost-effective option that doesn't require additional network infrastructure.
    3. Use multipart upload instead of S3TA when the network has intermittent failures, because failed parts can be retried individually.
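
A minimal sketch of turning on S3TA, assuming a hypothetical bucket named my-demo-bucket:

# Enable Transfer Acceleration on the bucket
aws s3api put-bucket-accelerate-configuration \
    --bucket my-demo-bucket \
    --accelerate-configuration Status=Enabled

# Route subsequent high-level CLI transfers through the accelerate endpoint
aws configure set default.s3.use_accelerate_endpoint true
aws s3 cp large-file.zip s3://my-demo-bucket/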

3. Speeding up uploads using Multipart Upload and S3 Transfer Acceleration together

  1. Multipart Upload: Breaks the file into smaller parts and uploads them in parallel, improving upload speed.
  2. S3 Transfer Acceleration: Optimizes the transfer speed by routing the data through AWS edge locations, ensuring faster uploads over long distances.

Using these two features together allows you to take advantage of both parallel uploads (for efficiency) and faster routing (for speed), as the configuration sketch below shows.
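
For a single large upload, the two can be combined purely through CLI configuration; this sketch assumes hypothetical file and bucket names:

# Lower the multipart threshold/chunk size so the CLI splits the file,
# then send the parts through the accelerate endpoint
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 16MB
aws configure set default.s3.use_accelerate_endpoint true
aws s3 cp big-dataset.tar s3://my-demo-bucket/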

4. Versioning

  1. Once versioning is enabled on an S3 bucket, it can be suspended, but it cannot be permanently turned off. Suspending versioning means that no new versions of objects will be created for updates, but existing versions are retained (see the CLI sketch below).
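
A minimal sketch, assuming a hypothetical bucket name:

# Enable versioning
aws s3api put-bucket-versioning \
    --bucket my-demo-bucket \
    --versioning-configuration Status=Enabled

# Suspend it later; previously created versions are retained
aws s3api put-bucket-versioning \
    --bucket my-demo-bucket \
    --versioning-configuration Status=Suspended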

5. Protection against accidental deletion of objects

  1. Versioning is a means of keeping multiple variants of an object in the same bucket. You can use versioning to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket. Versioning-enabled buckets enable you to recover objects from accidental deletion or overwrite.
  2. MFA delete: To provide additional protection, enable multi-factor authentication (MFA) delete. MFA delete requires secondary authentication before objects can be permanently deleted from an Amazon S3 bucket (see the sketch below).
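
A hedged sketch of enabling MFA delete; the MFA device ARN and code are placeholders, and only the bucket owner's root credentials can run this:

# MFA Delete is enabled together with versioning via the CLI/API
aws s3api put-bucket-versioning \
    --bucket my-demo-bucket \
    --versioning-configuration Status=Enabled,MFADelete=Enabled \
    --mfa "arn:aws:iam::111122223333:mfa/root-device 123456"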

Question: Which set of Amazon S3 features helps to prevent and recover from accidental data loss? Ans: Enable the versioning and MFA Delete features on the S3 bucket.

6. Locking objects with Object Lock

Object Lock provides two mechanisms to manage object retention: retention periods and legal holds. Object Lock operates only in buckets with S3 Versioning enabled; a CLI sketch follows the list below.

  1. Retention Periods
    1. A retention period specifies a fixed duration during which an object remains locked and cannot be deleted or overwritten.
    2. Retention periods can be applied to object versions either explicitly or through a default setting at the bucket level.
    3. When applied explicitly, a Retain Until Date is set for the specific object version, stored in its metadata, ensuring protection until the retention period expires.
  2. Legal Holds
    1. Legal holds prevent objects from being overwritten or deleted.
    2. Unlike retention periods, legal holds do not have a fixed expiration date and can be applied independently.
  3. Retention Modes
    1. Compliance Mode:
      1. Provides the highest level of protection.
      2. Prevents any user, including the root account, from modifying or deleting objects during the retention period.
    2. Governance Mode:
      1. Protects objects from most users but allows authorized users with specific permissions (e.g., s3:BypassGovernanceRetention) to modify retention settings or delete objects, if necessary.
  4. Version-Specific Protection
    1. Object Lock settings are applied per object version.
    2. Different versions of the same object can have unique retention modes and durations, enabling granular control.
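
A minimal sketch of both mechanisms, assuming a hypothetical bucket created with Object Lock enabled and a hypothetical object key:

# Apply a governance-mode retention period to one object version
aws s3api put-object-retention \
    --bucket my-demo-bucket \
    --key reports/q1.pdf \
    --retention '{"Mode":"GOVERNANCE","RetainUntilDate":"2026-01-01T00:00:00Z"}'

# Place a legal hold, independent of any retention period
aws s3api put-object-legal-hold \
    --bucket my-demo-bucket \
    --key reports/q1.pdf \
    --legal-hold Status=ON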

7. Transitioning objects using Amazon S3 Lifecycle

Amazon S3 supports a waterfall model for transitioning between storage classes, meaning that objects in S3 transition progressively from higher-cost to lower-cost storage classes as they age or are accessed less frequently. This transition flows in a downward direction, resembling a waterfall.

  1. Sequential Transitions: Objects move from more expensive storage classes to less expensive ones (e.g., S3 Standard → S3 Standard-IA (Infrequent Access) → S3 Glacier → S3 Glacier Deep Archive).
  2. No Reverse Flow: Objects cannot automatically move back to a higher-cost storage class. If needed, they must be restored or manually copied back to the desired storage class.
  3. S3 Intelligent-Tiering: Automatic Movement Across Tiers
    1. The key advantage of S3 Intelligent-Tiering is its automatic tiering capability, which sets it apart from typical S3 Lifecycle transitions.
    2. Unlike other storage classes, S3 Intelligent-Tiering automatically moves objects between its tiers based on access patterns, without requiring a predefined lifecycle rule.
  4. Waterfall order: S3 Standard → S3 Standard-IA → S3 Intelligent-Tiering → S3 One Zone-IA → S3 Glacier Instant Retrieval → S3 Glacier Flexible Retrieval → S3 Glacier Deep Archive
  5. Constraints and considerations for transitions: see the next section. A lifecycle rule sketch follows this list.
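
A sketch of a waterfall-style lifecycle rule, with hypothetical bucket name and cut-off days; the size filter mirrors the 128 KB constraint described in the next section:

# Rule: Standard-IA after 30 days, Glacier after 90, Deep Archive after 365;
# applies only to objects larger than 128 KB
cat > lifecycle.json <<'EOF'
{
  "Rules": [{
    "ID": "waterfall",
    "Status": "Enabled",
    "Filter": {"ObjectSizeGreaterThan": 131072},
    "Transitions": [
      {"Days": 30,  "StorageClass": "STANDARD_IA"},
      {"Days": 90,  "StorageClass": "GLACIER"},
      {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
    ]
  }]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
    --bucket my-demo-bucket \
    --lifecycle-configuration file://lifecycle.json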

8. Constraints and considerations for transitions

  1. Objects Smaller Than 128 KB:
    1. Objects smaller than 128 KB cannot be transitioned to any storage class by default due to high transition request costs.
    2. You can override this behavior by adding an object size filter (ObjectSizeGreaterThan or ObjectSizeLessThan) in the Lifecycle rule.
  2. Minimum Storage Duration Before Transitioning
    1. Objects must first be stored in the S3 Standard storage class for at least 30 days before they can transition to S3 Standard-IA or S3 One Zone-IA.
    2. Non-current objects (in versioned buckets) must likewise be noncurrent for at least 30 days before transitioning to these classes.
  3. Charges for Transitioning Before Minimum Storage Duration:
    1. If an object is transitioned to another storage class before meeting the minimum storage duration of the current storage class, you will be charged for the full minimum duration.
    2. For example, if an object is stored in S3 Glacier Instant Retrieval, which has a 90-day minimum storage duration, and you transition it after 60 days, you will still incur charges for the remaining 30 days to fulfill the 90-day minimum requirement.

9. Choose storage classes

  1. S3 Standard-IA: Data that is accessed less frequently but requires rapid access when needed.
  2. S3 One Zone-IA: Data that can be easily recreated and is stored in a single availability zone.
  3. S3 Glacier Instant Retrieval: Data that is rarely accessed but still needs to be retained, with millisecond retrieval when requested.
  4. S3 Glacier Flexible Retrieval: Data that is rarely accessed and tolerates flexible retrieval times (minutes to hours), offering a balance between retrieval cost and speed.
  5. S3 Glacier Deep Archive: Data that is rarely accessed (less than once per year) and can be retrieved with long retrieval times (hours to days).
  6. S3 Intelligent-Tiering: Data with unpredictable access patterns, automatically moving between frequent and infrequent access tiers.

10. S3 Prefix, Folder and delimiter

S3 automatically scales to handle high request rates. For instance, your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket.

  1. There is no limit on the number of prefixes in a bucket.
  2. You can increase your read or write performance by parallelizing requests across multiple prefixes.
  3. For example, creating 10 prefixes in a bucket to parallelize reads scales read performance to 55,000 read requests per second.
  4. S3 doesn't have actual folders; it simulates them using prefixes in object keys, and each prefix appears as a folder in the console.
  5. Important keywords (see the listing example below)
    1. Bucket Name: learnings3bucket24
    2. Prefix: learning1/img/
    3. Delimiter: /
    4. Folders: learning1, img
    5. Object: s3img.png (full key: learning1/img/s3img.png)
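
Using the example names above, a delimiter-aware listing shows how S3 presents prefixes as folders:

# Returns s3img.png as an object; any deeper prefixes under learning1/img/
# would come back as CommonPrefixes ("folders")
aws s3api list-objects-v2 \
    --bucket learnings3bucket24 \
    --prefix learning1/img/ \
    --delimiter /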

11. Multipart upload

  1. Use multipart upload for larger files, such as large log files.
  2. If transmission of any part fails, you can retransmit that part without affecting the other parts.
  3. It is the best choice when the network has intermittent failures (see the sketch below).
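
A minimal sketch, assuming hypothetical file and bucket names:

# The high-level CLI performs multipart upload automatically above a threshold
aws configure set default.s3.multipart_threshold 100MB
aws s3 cp app-logs.tar s3://my-demo-bucket/logs/

# The low-level s3api commands expose the individual steps
aws s3api create-multipart-upload --bucket my-demo-bucket --key logs/app-logs.tar
# ...then upload-part for each part (retrying only failed parts),
# and complete-multipart-upload with the collected part ETags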


12. What is S3 sync? When to use it?

The AWS S3 console is not suitable for copying very large volumes of data (such as terabytes or petabytes) between S3 buckets because it is slow, manual, and not designed for bulk data transfer. The console is recommended only for small to medium data volumes.

For large-scale or automated transfers, AWS provides the aws s3 sync command in the AWS CLI.

The aws s3 sync command is a powerful tool used to synchronize the contents of one location with another. It can sync:

  1. A local directory → S3 bucket
  2. S3 bucket → local directory
  3. One S3 bucket → another S3 bucket (same region or cross-region)

It is particularly useful when you need to copy a large amount of data or keep two locations in sync; each run copies only new or changed objects.

Basic Commands:

aws s3 sync s3://source-bucket s3://target-bucket

# --dryrun: Simulates the sync without making any changes, useful for testing.
aws s3 sync s3://source-bucket s3://target-bucket --dryrun

#--delete: Deletes objects in the target that are no longer present in the source.
aws s3 sync s3://source-bucket s3://target-bucket --delete

  1. Key Features:

    1. The command compares objects based on object existence, size, and LastModified timestamp.
    2. Only the current version of the object is copied in a versioned bucket.
    3. By default, the sync command preserves object metadata.
  2. Use Cases:

    1. Backup: Sync files from one bucket to another for backup purposes.
    2. Cross-Region Sync: Synchronize data between buckets in different AWS Regions (see the example after this list).
    3. Incremental Sync: Only copy new or updated objects, avoiding unnecessary duplication.
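
A hedged cross-Region example with hypothetical bucket names and Regions; --source-region names the Region of the source bucket:

aws s3 sync s3://source-bucket s3://target-bucket \
    --source-region us-east-1 \
    --region eu-west-1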

13. Amazon S3 Storage Class Latency (Low to High)

S3 Standard (<10 ms) < S3 Intelligent-Tiering (<10 ms) < S3 Standard-IA (<10 ms) < S3 One Zone-IA (<10 ms) < S3 Glacier Instant Retrieval (milliseconds) < S3 Glacier Flexible Retrieval (minutes to hours) < S3 Glacier Deep Archive (up to 12 hours).

14. S3 Object Encryption: Server-Side and Client-Side Options

In Amazon S3, encryption is applied at the object level, not at the bucket level. Each object (file) stored in the bucket is individually encrypted. You can enforce encryption at the bucket level by enabling a bucket policy or default encryption, which ensures all newly uploaded objects are encrypted. Enabling default encryption does not retroactively encrypt existing objects in the bucket. CLI sketches follow the list below.

  1. Server-Side Encryption (SSE):
    1. SSE-S3 (Server-Side Encryption with S3-Managed Keys):
      1. Amazon S3 handles both the encryption and decryption of the data automatically using its own keys.
      2. Encryption is done using AES-256 encryption. Example: aws s3 cp file.txt s3://bucket-name/ --sse AES256
    2. SSE-KMS (Server-Side Encryption with AWS KMS):
      1. Uses keys managed by AWS KMS (either the default KMS key or a customer-managed KMS key).
      2. You have more control over key management, auditing, and key rotation.
    3. SSE-C (Server-Side Encryption with Customer-Provided Keys):
      1. The customer provides their own encryption key during upload and download operations.
      2. Amazon S3 does not store the encryption key; it's the customer's responsibility to manage it securely.
  2. Client-Side Encryption:
    1. Client-Side Encryption with AWS KMS:
      1. The client encrypts data before uploading to S3 using AWS KMS-managed keys.
      2. The encryption keys are managed by AWS KMS.
    2. Client-Side Encryption with Customer-Provided Keys:
      1. The client is responsible for both encrypting data and managing the encryption key.
      2. The client provides the key when uploading and downloading data.
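
Minimal sketches of the server-side options, with hypothetical bucket name and KMS key alias:

# Request SSE-KMS for a single upload
aws s3 cp secrets.csv s3://my-demo-bucket/ \
    --sse aws:kms --sse-kms-key-id alias/my-app-key

# Or set SSE-KMS as the bucket default for all new objects
aws s3api put-bucket-encryption \
    --bucket my-demo-bucket \
    --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"alias/my-app-key"}}]}'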

15. Amazon S3 Website Endpoint Formats:

Amazon S3 website endpoints follow one of two formats, depending on the AWS Region where the bucket is hosted:

  1. http://bucket-name.s3-website.Region.amazonaws.com
  2. http://bucket-name.s3-website-Region.amazonaws.com

When you configure an Amazon S3 bucket for static website hosting, AWS assigns a Region-specific website endpoint to the bucket. Always ensure that the bucket name is at the start of the URL and that the Region is placed according to the format your Region uses (dash (-) or dot (.)). A hosting setup sketch follows.
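
A minimal setup sketch (hypothetical bucket; the bucket must also allow public reads via policy for the site to be reachable):

aws s3 website s3://my-demo-bucket/ \
    --index-document index.html \
    --error-document error.html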

16. Secure Data in S3 Object Storage with Immutability

Amazon S3 provides S3 Object Lock and S3 Glacier Vault Lock features to ensure that data cannot be altered or deleted for a specified duration. These features help organizations meet compliance requirements and protect critical data from accidental or malicious changes.

  1. S3 Object Lock
    1. Designed for Amazon S3 Standard and other storage classes like Standard-IA or Intelligent-Tiering.
    2. Implements Write Once, Read Many (WORM) protection at the object level.
    3. Ensures objects cannot be modified or deleted during the specified retention period.
    4. Not applicable to the standalone Amazon S3 Glacier service, which stores archives in vaults rather than S3 buckets.
  2. Amazon S3 Glacier Vault Lock
    1. Specifically designed for Glacier storage classes (Glacier and Glacier Deep Archive).
    2. Provides WORM protection at the vault level (not individual objects).
    3. Once a vault lock policy is locked, it cannot be changed, ensuring compliance with stringent regulations (e.g., SEC 17a-4).

S3 Object Lock cannot be used on the standalone Amazon S3 Glacier service. Use S3 Glacier Vault Lock for WORM protection in vault-based Glacier storage.

A vault is a logical container or group that holds a collection of S3 Glacier objects. It acts as a secure storage location where organizations can store large amounts of data in compliance with regulatory requirements.

In Glacier, vaults help ensure data is managed securely and complies with regulatory requirements. A Vault Lock sketch follows.
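
A hedged sketch of the two-step Vault Lock process, with a hypothetical vault name; the policy file holds the WORM lock policy document:

aws glacier initiate-vault-lock \
    --account-id - \
    --vault-name my-compliance-vault \
    --policy file://vault-lock-policy.json
# Test the policy within 24 hours, then make it immutable using the
# lock ID returned by the initiate call
aws glacier complete-vault-lock \
    --account-id - \
    --vault-name my-compliance-vault \
    --lock-id <lock-id-from-initiate>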

17. Access Control Lists and Bucket Policies

Access Control Lists (ACLs) and Bucket Policies are both used to control access to Amazon S3 resources, but they serve slightly different purposes:

  1. Access Control Lists (ACLs):
    1. ACLs define access permissions at the individual object or bucket level.
    2. Uses predefined groups and identifiers (e.g., AuthenticatedUsers, AllUsers, Owner)
    3. Best for simple, specific, object-level access control.
  2. Bucket Policies:
    1. Bucket Policies operate at the bucket level and can apply to all objects within that bucket.
    2. Uses JSON-based policies with resource-based access control, applying conditions like aws:SourceIp, aws:username, s3:prefix, etc. (see the sample policy after this list).
    3. Best for enforcing complex access policies across an entire bucket.
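
A sample bucket policy sketch, with a hypothetical bucket name and the documentation IP range 203.0.113.0/24 standing in for a real network:

# Deny all S3 actions unless the request comes from the allowed IP range
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": ["arn:aws:s3:::my-demo-bucket", "arn:aws:s3:::my-demo-bucket/*"],
    "Condition": {"NotIpAddress": {"aws:SourceIp": "203.0.113.0/24"}}
  }]
}
EOF
aws s3api put-bucket-policy --bucket my-demo-bucket --policy file://policy.json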

18. Amazon S3 Event Notifications and Destinations

Amazon S3's notification feature allows you to receive alerts when specific events occur in your bucket. To enable notifications, you need to configure:

  1. The events you want Amazon S3 to track and publish (e.g., object creation or deletion).
  2. The destination where the event notifications should be sent.
  3. Supported Notification Destinations:
    1. Amazon SNS Topics: Deliver notifications to subscribers.
    2. Amazon SQS standard queues: Queue the events for processing.
    3. Lambda Functions: Trigger serverless workflows.
  4. Important to Note on SQS Compatibility
    1. Standard SQS Queues: Supported as valid destinations for Amazon S3 event notifications.
    2. FIFO SQS Queues: Not currently supported as destinations for Amazon S3 event notifications.

Note: For S3 event notifications, each notification configuration can target only one destination (an AWS Lambda function, an SNS topic, or an SQS queue).

If we need to send two or more different notifications (for example, one to trigger AWS Lambda and another to trigger Amazon SageMaker Pipelines), EventBridge is the ideal solution because EventBridge allows multiple targets for a single event, making it highly flexible.
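
A minimal sketch wiring object-created events to a hypothetical Lambda function (the function must separately grant S3 permission to invoke it):

cat > notification.json <<'EOF'
{
  "LambdaFunctionConfigurations": [{
    "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:process-upload",
    "Events": ["s3:ObjectCreated:*"]
  }]
}
EOF
aws s3api put-bucket-notification-configuration \
    --bucket my-demo-bucket \
    --notification-configuration file://notification.json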

19. Amazon S3 Consistency

  1. Amazon S3 always returns the latest version of the object.
  2. S3 provides strong read-after-write consistency for GET, PUT, and LIST operations, ensuring immediate access to the latest data after a write.
  3. Strong read-after-write consistency ensures that once an object is written or modified, the next read immediately returns the most up-to-date version of the object.
  4. Strong consistency applies to list operations as well, providing an accurate reflection of objects in a bucket post-write.
  5. No changes to performance, availability, or regional isolation—delivered at no extra cost.
  6. Useful for applications requiring immediate reads and listings after writing objects.

20. Download and upload objects with presigned URLs

Use presigned URLs to upload and download files from an S3 bucket while keeping the bucket private; the URL embeds time-limited permission derived from the signer's credentials (see the sketch below).
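
A minimal sketch with a hypothetical object; note the CLI's presign command generates download (GET) URLs only, while upload URLs require an SDK call such as boto3's generate_presigned_url:

# URL expires after one hour
aws s3 presign s3://my-demo-bucket/reports/q1.pdf --expires-in 3600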

21. Question: S3- Intelligent-Tiering

A company stores user data in AWS. The data is used continuously with peak usage during business hours. Access patterns vary, with some data not being used for months at a time. A solutions architect must choose a cost-effective solution that maintains the highest level of durability while maintaining high availability. Which storage solution meets these requirements?

  1. Amazon S3 Intelligent-Tiering (Correct Ans)
  2. Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)
  3. Amazon S3 Glacier Deep Archive
  4. Amazon S3 Standard

22. Question: Server-side encryption with AWS KMS keys

A company is preparing to store confidential data in Amazon S3. For compliance reasons, the data must be encrypted at rest. Encryption key usage must be logged for auditing purposes. Keys must be rotated every year. Which solution meets these requirements and is the MOST operationally efficient?

  1. Server-side encryption with customer-provided keys (SSE-C)
  2. Server-side encryption with Amazon S3 managed keys (SSE-S3)
  3. Server-side encryption with AWS KMS keys (SSE-KMS) with manual rotation
  4. Server-side encryption with AWS KMS keys (SSE-KMS) with automatic rotation (Correct Ans)

23. Question: S3 Versioning and Object Lock

A company has a production web application in which users upload documents through a web interface or a mobile app. According to a new regulatory requirement, new documents cannot be modified or deleted after they are stored. What should a solutions architect do to meet this requirement?

  1. Store the uploaded documents in an Amazon S3 bucket with S3 Versioning and S3 Object Lock enabled. (Correct Ans)
  2. Store the uploaded documents in an Amazon S3 bucket. Configure an S3 Lifecycle policy to archive the documents periodically.
  3. Store the uploaded documents in an Amazon S3 bucket with S3 Versioning enabled. Configure an ACL to restrict all access to read-only.
  4. Store the uploaded documents on an Amazon Elastic File System (Amazon EFS) volume. Access the data by mounting the volume in read-only mode.

Explanation: Option 1 is correct because Object Lock ensures that objects stored in the S3 bucket cannot be deleted or overwritten for a specified retention period or indefinitely if "Legal Hold" is applied. Versioning tracks multiple versions of an object. When combined with Object Lock, it ensures that even if an object is uploaded with the same key, previous versions remain immutable.

Option 3 is incorrect because, although versioning tracks changes, configuring an ACL for read-only access does not guarantee immutability: an administrator or authorized user could still delete objects or versions.

24. Question: Requester Pays feature on S3 bucket

A survey company has gathered data for several years from areas in the United States. The company hosts the data in an Amazon S3 bucket that is 3 TB in size and growing. The company has started to share the data with a European marketing firm that has S3 buckets. The company wants to ensure that its data transfer costs remain as low as possible. Which solution will meet these requirements?

  1. Configure the Requester Pays feature on the company's S3 bucket. (Correct Ans)
  2. Configure S3 Cross-Region Replication from the company's S3 bucket to one of the marketing firm's S3 buckets.
  3. Configure cross-account access for the marketing firm so that the marketing firm has access to the company's S3 bucket.
  4. Configure the company's S3 bucket to use S3 Intelligent-Tiering. Sync the S3 bucket to one of the marketing firm's S3 buckets.

Explanation: Option 1 is correct because, by enabling the Requester Pays feature on the company's Amazon S3 bucket, the marketing firm (the requester) pays the data transfer and request costs when accessing the data. This approach minimizes the data transfer costs incurred by the company while still allowing the marketing firm access to the data.

Option 2 is incorrect because Cross-Region Replication is used to replicate data between buckets in different AWS Regions. It does not reduce data transfer costs out of AWS; it only ensures data is replicated automatically between Regions. The company would still bear the costs of transferring the data to the European marketing firm.
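
A minimal Requester Pays sketch with hypothetical names:

# Bucket owner enables Requester Pays
aws s3api put-bucket-request-payment \
    --bucket my-demo-bucket \
    --request-payment-configuration Payer=Requester

# Requesters must acknowledge the charges on every request
aws s3 cp s3://my-demo-bucket/data.csv . --request-payer requester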