Introduction to AWS EFS (Elastic File System)

AWS EFS (Elastic File System) is a fully managed, scalable, shared file storage service. It lets us share file data without worrying much about provisioning or managing storage capacity and performance.

It is designed to work with EC2 instances and other AWS services:

  • AWS EFS has a simple web services interface, so we can create and configure file systems easily.
  • Highly scalable: it grows and shrinks automatically as we add and remove files, without disrupting applications. We pay for what we use; there is no capacity reservation.
  • Highly available: we can store data across multiple AZs to provide resiliency for our data.
  • Relatively expensive: per-GB storage costs are higher than EBS or S3.
  • It can be mounted on multiple EC2 instances at the same time.
  • It supports the NFS (Network File System) protocol, versions 4.0 and 4.1.
  • It provides authentication, authorization, and encryption capabilities to help protect our data. Access by NFS clients can be controlled with IAM policies and network controls such as security groups.
  • Compatible only with Linux-based AMIs.
  • We can control access through Portable Operating System Interface (POSIX) permissions.
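To make the "mounted on multiple EC2 instances" point concrete, the sketch below builds the NFSv4.1 mount command an instance would run. The file-system ID, region, and mount point are hypothetical placeholders, and the mount options follow the defaults AWS recommends for plain NFS clients.

```python
# Hypothetical identifiers; replace with your own file system ID and region.
efs_id = "fs-0123456789abcdef0"
region = "us-east-1"
mount_point = "/mnt/efs"

# EFS exposes an NFSv4.1 endpoint per file system; the same command,
# run on any number of instances, attaches them all to the same data.
mount_cmd = (
    f"sudo mount -t nfs4 "
    f"-o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 "
    f"{efs_id}.efs.{region}.amazonaws.com:/ {mount_point}"
)
print(mount_cmd)
```

The same endpoint can also be mounted through the amazon-efs-utils helper, which adds TLS support on top of plain NFS.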

EFS Types

  • Regional: stores data across multiple AZs in the same Region. It provides high availability and durability for our data.
  • One Zone: stores data within a single AZ. Data in these file systems can be lost if that AZ fails.

Use cases

  • Hosting websites, blogs, or content-heavy applications like WordPress: multiple instances can access the same files, such as images, videos, and media, simultaneously.
  • Data processing with big data tools like Hadoop and Spark: EFS allows multiple compute nodes to read and write large datasets concurrently.
  • ...

EFS - Performance & Storage Classes

AWS EFS provides several performance configurations to meet the needs of various workloads. Performance is typically measured by three criteria: latency, throughput, and IOPS (input/output operations per second).

There are 3 configurations:

  • File system type – Regional or One Zone
  • Performance mode – General Purpose or Max I/O
  • Throughput mode – Elastic, Provisioned, or Bursting

Performance modes

AWS provides two performance modes

  • General Purpose (default): has the lowest per-operation latency. Best for latency-sensitive applications that don't need a massive number of concurrent read or write operations.
  • Max I/O: has higher per-operation latency. Best for highly parallel, throughput-intensive workloads with massive concurrency.
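The trade-off above can be condensed into a rule of thumb. This is a simplification for illustration only; a real choice should be driven by benchmarking the workload.

```python
def pick_performance_mode(highly_parallel: bool, latency_sensitive: bool) -> str:
    """Rule-of-thumb mode selection (a simplification of the trade-off above)."""
    # Max I/O trades per-operation latency for higher aggregate
    # concurrency; General Purpose is the default and the safe choice.
    if highly_parallel and not latency_sensitive:
        return "Max I/O"
    return "General Purpose"

print(pick_performance_mode(highly_parallel=True, latency_sensitive=False))
print(pick_performance_mode(highly_parallel=False, latency_sensitive=True))
```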

Real-world analogy

General Purpose: you go to a small grocery store. You can buy goods quickly, find things fast, and the checkout process is smooth because there aren't many other shoppers at the same time.

Max I/O: a mall or shopping center serves a lot of people buying a large number of items. It is built to handle massive amounts of goods and customers concurrently, but when you need to check out, it is slower than the small grocery store.

Throughput mode

AWS provides three throughput modes:

  • Bursting throughput: scales throughput with the amount of storage in the file system. It is ideal for applications with irregular or unpredictable usage patterns where occasional spikes in data throughput are required.
  • Provisioned throughput: we specify a desired throughput. Use it when we can predict the workload's performance requirements. It is suitable for workloads that require consistently high performance.
  • Elastic throughput: EFS automatically scales throughput up or down to meet the workload's needs. It is the best mode for spiky or unpredictable workloads, with up to 3 GiB/s for reads and 1 GiB/s for writes.
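For Bursting mode, "scales with the amount of storage" follows a published baseline rate of roughly 50 KiB/s of throughput per GiB stored (i.e. about 50 MiB/s per TiB). The arithmetic below is a minimal sketch under that assumption; it ignores burst credits, which temporarily allow much higher rates.

```python
def bursting_baseline_mib_per_s(storage_gib: float) -> float:
    """Baseline throughput under Bursting mode.

    Assumes ~50 KiB/s of baseline throughput per GiB of stored data;
    burst credits (not modeled here) allow spikes above this rate.
    """
    return storage_gib * 50 / 1024  # KiB/s -> MiB/s

# A 1 TiB file system gets roughly a 50 MiB/s baseline.
print(bursting_baseline_mib_per_s(1024))  # 50.0
```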

EFS - storage lifecycle

Managing the storage lifecycle refers to automatically moving data between storage classes, based on a lifecycle configuration, to reduce the cost we pay for storage. The lifecycle configuration consists of three lifecycle policies. When a file is not accessed for a certain period of time (the transition time), it is automatically moved to a lower-cost storage class. If the file is accessed before it is moved, the transition time resets.

Storage tiers

  • Infrequent Access (IA): cost-optimized storage for files that are accessed only a few times each quarter. By default, files not accessed for 30 days in Standard storage are moved to this class.
  • Archive: suitable for files that are accessed only a few times each year or less. By default, files not accessed for 90 days are moved to this class.
  • Standard: for frequently accessed files. We can configure files to move back out of IA or Archive when they are accessed there; by default, however, files are not moved back.
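The tier transitions above reduce to a simple comparison against the transition times. The sketch below models only the default one-way policy (access resets the clock; moving back to Standard on access is a separate, optional setting); the thresholds are parameters so other policies can be tried.

```python
def tier_for(days_since_access: int, ia_after: int = 30, archive_after: int = 90) -> str:
    """Storage class a file sits in under a lifecycle policy.

    Defaults mirror the transition times described above. An access
    resets days_since_access to 0, which lands the file in Standard.
    """
    if days_since_access >= archive_after:
        return "Archive"
    if days_since_access >= ia_after:
        return "Infrequent Access"
    return "Standard"

print(tier_for(10))   # Standard
print(tier_for(45))   # Infrequent Access
print(tier_for(120))  # Archive
```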

Selecting a lifecycle configuration

Lifecycle configuration: 30 days after the last access, files are moved into IA; after 90 days, into Archive. On first access, files are moved back to Standard storage.

Performance settings


With Provisioned mode at a throughput of 100 MB/s, we would be charged up to $720/month.
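The $720/month figure implies a rate of $7.20 per MB/s per month. Actual EFS provisioned-throughput pricing varies by region, so the constant below is illustrative only, back-derived from the figure above.

```python
# Hypothetical per-(MB/s)-month rate implied by the $720-for-100-MB/s
# figure above; real EFS pricing varies by region.
RATE_PER_MBPS_MONTH = 7.20

def provisioned_monthly_cost(mb_per_s: float) -> float:
    """Monthly charge for Provisioned throughput at the assumed rate."""
    return mb_per_s * RATE_PER_MBPS_MONTH

print(provisioned_monthly_cost(100))  # 720.0
```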

Performance mode

Max I/O mode is available only with the Bursting and Provisioned throughput modes. With Elastic throughput, EFS scales I/O automatically based on the performance we need, so no performance mode needs to be chosen.

