Amazon Elastic File System

Amazon EFS is a regional, fully managed NFSv4.1 file system for Linux workloads that supports shared, low-latency access across multiple EC2 instances.

  1. Designed for Linux-based workloads
  2. Uses NFSv4.1 only
  3. Exposes a POSIX-compliant file system interface
  4. Accessible from on-premises servers via AWS VPN or Direct Connect
  5. More expensive than S3 per GB, but provides shared, low-latency, mountable file storage
  6. Storage Types
    • EFS / FSx → File systems
    • S3 → Object storage
    • EBS → Block storage

Note: EFS is primarily a file system (a software layer) exposed as a fully managed service. The term file storage system usually refers to the complete solution: storage hardware, access protocols, and management features. With EFS, AWS manages the storage backend, access, and scaling, so it functions as a full file storage service even though, architecturally, it is a file system.
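
As a concrete illustration of the managed-service side, the sketch below provisions a file system and a single mount target with boto3 (Python). This is a minimal sketch, not a production setup: the subnet and security group IDs are placeholders, and the security group is assumed to allow inbound NFS (TCP 2049).

```python
# Minimal provisioning sketch, assuming boto3 is configured with
# credentials and a default region. All resource IDs are placeholders.
import boto3

efs = boto3.client("efs")

# Create a regional, fully managed file system.
fs = efs.create_file_system(
    CreationToken="demo-efs",           # idempotency token
    PerformanceMode="generalPurpose",
    Encrypted=True,
)
fs_id = fs["FileSystemId"]

# A mount target gives instances in one subnet (one AZ) an NFS endpoint;
# create one per AZ for multi-AZ access.
efs.create_mount_target(
    FileSystemId=fs_id,
    SubnetId="subnet-0123456789abcdef0",       # placeholder
    SecurityGroups=["sg-0123456789abcdef0"],   # placeholder; must allow TCP 2049
)
print(f"Created {fs_id}; mount it over NFSv4.1 once the target is available.")
```

Once the mount target is available, any Linux instance in that AZ can mount the endpoint over NFSv4.1 and use it like a local directory.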


1. What is a File Storage System?

A File Storage System is a type of storage system that stores data as files and folders in a hierarchical structure, similar to how we see it on our laptops or desktops.

  1. Key Characteristics
    1. Data organized into folders and subfolders
    2. Supports standard file CRUD operations
    3. Allows in-place file modification
    4. Supports permissions and file locking (see the locking sketch after this list)
    5. Multiple users or applications can access or modify the same files concurrently
    6. Can be mounted on servers
  2. Examples of File Storage Systems
    1. User-facing: Google Drive, Microsoft OneDrive, Dropbox, Apple iCloud Drive
    2. Enterprise: Amazon EFS, Azure Files, and Google Cloud Filestore
  3. How It Differs from Object Storage (Amazon S3)
    1. S3 does not have real folders (uses key prefixes)
    2. No in-place file edits; objects are immutable
    3. Updating data requires replacing the entire object (contrasted in the second sketch after this list)
    4. Accessed via HTTP/REST APIs, not mountable like a file system
  4. File Storage System – Logical Components
    1. File System (Software Layer): Manages files, directories, metadata, permissions, hierarchy, and locking. Examples: local file systems (ext4, NTFS, XFS, ZFS) and distributed file systems (NFS, CephFS, GlusterFS)
    2. Storage Backend (Data Layer): Stores the actual data blocks. Examples: physical disks (HDD/SSD), cloud block storage (e.g., Amazon EBS), and distributed storage nodes
    3. Access Protocol (Interface Layer): Defines how clients interact with the file system. Common protocols:
      • NFS – Linux/Unix
      • SMB/CIFS – Windows
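
To make characteristics 3, 4, and 5 above concrete, here is a minimal Python sketch, assuming the file already exists on an EFS mount at a hypothetical path /mnt/efs/shared/counter.txt. It takes an advisory byte-range lock (supported over NFSv4.1) and edits the file in place:

```python
# In-place modification with advisory locking on a POSIX file system
# (e.g., an EFS mount). Linux/Unix-only: fcntl is a POSIX interface.
import fcntl

path = "/mnt/efs/shared/counter.txt"   # hypothetical path on an EFS mount

with open(path, "r+") as f:
    fcntl.lockf(f, fcntl.LOCK_EX)      # exclusive lock: cooperating writers
                                       # on any client block until released
    value = int(f.read().strip() or 0)
    f.seek(0)
    f.write(str(value + 1))            # modify the file in place
    f.truncate()
    fcntl.lockf(f, fcntl.LOCK_UN)      # release for the next client
```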
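
By contrast, the same "update" against S3 cannot touch bytes in place: the whole object must be read, changed, and re-uploaded. A minimal boto3 sketch, with placeholder bucket and key names and assuming the object already exists:

```python
# Object-storage equivalent: no in-place edit, so the entire object
# is downloaded, modified, and replaced.
import boto3

s3 = boto3.client("s3")
bucket, key = "example-bucket", "shared/counter.txt"   # placeholders

body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
value = int(body.decode().strip() or 0)
s3.put_object(Bucket=bucket, Key=key, Body=str(value + 1).encode())  # full replace
```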

2. Accessing EFS

  1. Accessing EFS Across Different AWS Regions: There are two ways to achieve this:
    1. Cross-Region VPC Peering: Traditional method for connecting VPCs in different regions.
    2. AWS Transit Gateway (TGW): A modern, robust, and scalable solution, preferred for complex many-to-many connectivity across multiple VPCs and regions.
  2. Accessing EFS from On-Premises Servers: There are two options:
    1. AWS Site-to-Site VPN: Provides an encrypted connection over the public internet; the standard, lower-cost option. Steps (sketched in boto3 after this list):
      • Virtual Private Gateway (VGW): Created in AWS and attached to the VPC. Acts as the AWS-side endpoint for VPN connections. It is a virtual, AWS-managed component, not a physical device.
      • Customer Gateway (CGW): Created in AWS, but it represents the on-premises VPN device. It is a configuration object in AWS, not an actual device.
    2. AWS Direct Connect (DX): Provides dedicated bandwidth and lower latency, ideal for high-performance or large-scale workloads.
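
The VPN building blocks above can be sketched with boto3 as follows. This is a sketch only: the VPC ID, public IP, and ASN are placeholders, and route propagation and the on-premises tunnel configuration are omitted.

```python
# Site-to-Site VPN building blocks, assuming an existing VPC and an
# on-premises VPN device with a static public IP. IDs are placeholders.
import boto3

ec2 = boto3.client("ec2")

# Virtual private gateway (VGW): the AWS-side endpoint, attached to the VPC.
vgw = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]
ec2.attach_vpn_gateway(
    VpcId="vpc-0123456789abcdef0",      # placeholder
    VpnGatewayId=vgw["VpnGatewayId"],
)

# Customer gateway (CGW): a configuration object describing the on-prem device.
cgw = ec2.create_customer_gateway(
    Type="ipsec.1",
    PublicIp="203.0.113.10",            # placeholder on-prem public IP
    BgpAsn=65000,                       # placeholder ASN for BGP routing
)["CustomerGateway"]

# The VPN connection ties the two gateways together with IPsec tunnels.
vpn = ec2.create_vpn_connection(
    Type="ipsec.1",
    CustomerGatewayId=cgw["CustomerGatewayId"],
    VpnGatewayId=vgw["VpnGatewayId"],
    Options={"StaticRoutesOnly": False},
)
print("VPN connection:", vpn["VpnConnection"]["VpnConnectionId"])
```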

3. EBS vs EFS

  1. EFS: Use it when you need
    1. Shared access by multiple EC2 instances
    2. Elasticity and automatic scaling
    3. High availability across multiple AZs
    4. Read-heavy, shared workloads like content management systems, big data, and home directories
  2. EBS: Use it when you need
    1. Persistent storage directly attached to a single EC2 instance
    2. Databases, transactional applications
    3. Workloads that need low-latency, high-throughput block storage (see the sketch after this list)
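
The attachment model is the clearest API-level difference: an EBS volume lives in a single AZ and attaches to one instance as a block device, whereas EFS exposes network mount targets. A minimal boto3 sketch with placeholder IDs:

```python
# EBS attachment model: a volume is created in exactly one AZ and
# attached to a single instance (ignoring io1/io2 Multi-Attach).
import boto3

ec2 = boto3.client("ec2")

vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",   # a volume exists in one AZ only
    Size=100,                        # GiB
    VolumeType="gp3",
)
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

ec2.attach_volume(
    VolumeId=vol["VolumeId"],
    InstanceId="i-0123456789abcdef0",   # placeholder: one instance only
    Device="/dev/sdf",
)
```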

4. EFS vs FSx

Both EFS and FSx are AWS storage services, but they cater to different needs:

  1. EFS (Elastic File System) is like a big shared folder that everyone can use. It’s perfect when multiple computers or servers need to access and work with files at the same time. It’s simple, scalable, and ideal for general-purpose workloads.
  2. FSx (Amazon FSx) is like a specialized folder built for specific tasks. It excels in high-performance applications or unique needs like Windows-based apps or workloads that require extra speed and specialized file system features.

Example:

  1. If your team is working on a shared project and everyone needs to access the same files at once, EFS is a great choice.
  2. If you’re running a Windows application that relies on Windows-specific file-sharing features or processing large datasets with high-speed requirements, FSx is the better option.

5. How can EFS and FSx work together?

Let’s say you're running a video-making business:

  1. You have lots of videos and files that your team needs to share and edit. You’d use EFS for that, so everyone can access the same files easily.
  2. But for editing and rendering big videos super fast, you use FSx because it’s really good at handling heavy tasks like video editing.

6. Question

A company's website uses an Amazon EC2 instance store for its catalog of items. The company wants to make sure that the catalog is highly available and that the catalog is stored in a durable location. What should a solutions architect do to meet these requirements?

  1. Move the catalog to Amazon ElastiCache for Redis.
  2. Deploy a larger EC2 instance with a larger instance store.
  3. Move the catalog from the instance store to Amazon S3 Glacier Deep Archive.
  4. Move the catalog to an Amazon Elastic File System (Amazon EFS) file system. (Correct answer)

Explanation:

  • EC2 instance store is ephemeral storage — data is lost if the instance stops, terminates, or fails. It is not durable.
  • Amazon EFS provides:
    • Highly available and durable storage, automatically replicated across multiple Availability Zones (AZs).
    • Shared access for multiple EC2 instances.
  • Why the other options are incorrect:
    1. ElastiCache for Redis: In-memory store, not durable for long-term storage.
    2. Larger EC2 instance store: Still ephemeral, doesn’t solve durability or availability.
    3. S3 Glacier Deep Archive: Extremely low-cost archival storage, not suitable for frequently accessed or highly available workloads.
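
One way to see this multi-AZ availability in practice is to list a file system's mount targets, which should span several AZs. A minimal boto3 sketch, assuming an existing file system (the ID below is a placeholder):

```python
# List mount targets to confirm the file system is reachable from
# multiple Availability Zones.
import boto3

efs = boto3.client("efs")
fs_id = "fs-0123456789abcdef0"   # placeholder file system ID

for mt in efs.describe_mount_targets(FileSystemId=fs_id)["MountTargets"]:
    print(mt["MountTargetId"], mt["AvailabilityZoneName"], mt["IpAddress"])
```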