Use Amazon Elastic File System to Make AWS Lambda Functions Stateful

2021-11-11

Lambda by Amazon Web Services is one of the earliest serverless platforms in the industry. Since its launch in 2014, Amazon has added a number of features, making it one of the most mature function-as-a-service (FaaS) offerings. The platform supports various language runtimes, including Node.js, Python, Java, Ruby, C#, Go, and PowerShell, and it integrates tightly with mainstream AWS managed services, which can trigger Lambda functions as event sources.

Traditionally, serverless computing platforms and FaaS products such as AWS Lambda have been associated with stateless functions. Since functions are invoked and terminated in response to events, there is no internal persistence layer available. State is instead persisted externally, in an object store, NoSQL database, in-memory database, or relational database instance. It is common to maintain state in Lambda functions by writing it to objects in S3 buckets or to DynamoDB or RDS tables.

But some use cases, such as machine learning inference, require a new approach. Downloading a large model from an Amazon S3 bucket at invocation time increases startup time and causes delays. Some functions also require external libraries that may be too large to package. AWS Lambda Layers partially address this problem, but layers are limited to 50MB (compressed, for direct upload), which is insufficient for many of these workloads. Layers are also static after deployment, which means their content can only be changed by publishing a new layer.

In June 2020, AWS added Amazon Elastic File System (EFS) support to Lambda, enabling many exciting use cases.

This tutorial series covers all aspects of using Amazon EFS and AWS Lambda to host serverless machine learning APIs.

Amazon Elastic File System (EFS) provides a managed, elastic NFS file system for AWS services and on-premises resources. It can scale to the petabyte level without disrupting applications, growing and shrinking automatically as files are added and removed, with no need to provision and manage capacity to accommodate growth.

Since EFS uses NFSv4, the industry standard for shared file systems, the file system can be easily attached to EC2 instances running Linux.

Amazon EFS supports access points, application-specific entry points into a file system. Access points offer a flexible way to manage application access in NFS environments, with improved scalability, security, and ease of use. An EFS file system can have multiple access points, and each access point can be configured with a POSIX-compliant user ID and group ID. Combined with IAM, this gives the EFS file system fine-grained security and access control.
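As a sketch, an access point can be created with the AWS CLI; the file system ID, POSIX IDs, and directory path below are illustrative and should be adapted to your environment:

```shell
# Create an access point that maps all clients to POSIX user/group 1000
# and roots them in the /ml directory of the file system.
aws efs create-access-point \
  --file-system-id fs-12345678 \
  --posix-user Uid=1000,Gid=1000 \
  --root-directory 'Path=/ml,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=755}'
```

The returned access point ARN is what gets referenced later when attaching the file system to a Lambda function.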

It is important to understand that Amazon EFS is only accessible from within a VPC: only resources in the same VPC can access the EFS file system. On-premises servers can mount an EFS share only after connectivity is established through AWS Direct Connect or AWS VPN.

When an EFS file system is attached to an AWS Lambda function, the function can access existing data on it and store new data in it. This makes it possible to populate the file system with dependencies and other files that are available to every instance of the function.

For AWS Lambda to access an EFS file system, the function must be in the same VPC as the file system. It also needs explicit permissions to access the file system and to create elastic network interfaces (ENIs) in the VPC's subnets. Once these conditions are met, the Lambda function can read from and write to the EFS file system.
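A minimal sketch of this configuration with the AWS CLI, assuming a function named ml-inference and an existing access point (the subnet, security group, account, and access point IDs are all placeholders):

```shell
# Place the function in the same VPC as the file system, then attach
# the file system through its access point. The local mount path for
# Lambda must begin with /mnt.
aws lambda update-function-configuration \
  --function-name ml-inference \
  --vpc-config SubnetIds=subnet-0abc123,SecurityGroupIds=sg-0abc123 \
  --file-system-configs \
    'Arn=arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-0123456789abcdef0,LocalMountPath=/mnt/ml'
```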

The easiest way to populate an EFS file system accessed by Lambda functions is to mount it on an EC2 instance. Following standard NFS conventions, the file system is typically mounted under the /mnt directory.

When launching an EC2 instance from the AWS console, you can choose to mount an existing EFS file system. The launch wizard automatically adds the appropriate user data script and mounts the file system permanently by adding an entry to /etc/fstab.
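Mounting can also be done by hand on a running instance. A sketch, assuming the amazon-efs-utils package is installed and using an illustrative file system ID:

```shell
# Mount the EFS file system at /mnt/efs using the EFS mount helper.
sudo mkdir -p /mnt/efs
sudo mount -t efs fs-12345678:/ /mnt/efs

# Alternatively, with the stock NFSv4.1 client:
# sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs

# To remount automatically after a reboot, add an /etc/fstab entry:
echo "fs-12345678:/ /mnt/efs efs _netdev,tls 0 0" | sudo tee -a /etc/fstab
```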

The powerful combination of EFS and Lambda can be used to host deep learning inference APIs in a serverless model. Since a TensorFlow or PyTorch model may exceed the size limits of Lambda layers and the /tmp directory, EFS comes in handy for storing the model.

The EFS file system backing Lambda can also hold all the dependencies, such as OpenCV or Pillow (PIL). These dependencies are not only large but also take time to install. A Python Lambda function can find these packages by pointing the PYTHONPATH environment variable at the appropriate directory. The same file system stores the fully trained model in a separate directory that the function loads at runtime.
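The pattern can be sketched as a Lambda handler. The mount path, directory layout, and environment variable name below are assumptions for illustration, and the actual model-loading call is left as a placeholder:

```python
import os
import sys

# Hypothetical mount path, as configured on the Lambda function.
EFS_MOUNT = os.environ.get("EFS_MOUNT", "/mnt/ml")

# Make packages installed on EFS importable. Setting the PYTHONPATH
# environment variable on the function to this directory has the
# same effect without code changes.
LIB_DIR = os.path.join(EFS_MOUNT, "lib")
if LIB_DIR not in sys.path:
    sys.path.insert(0, LIB_DIR)

MODEL_DIR = os.path.join(EFS_MOUNT, "model")
_model = None  # cached across warm invocations of the same instance


def load_model():
    """Load the model from EFS once; warm invocations reuse the cache."""
    global _model
    if _model is None:
        # Placeholder for a real framework call, e.g.:
        # _model = torch.jit.load(os.path.join(MODEL_DIR, "model.pt"))
        _model = object()
    return _model


def handler(event, context):
    model = load_model()
    # ... run inference with `model` on the event payload ...
    return {"statusCode": 200}
```

Caching the model in a module-level variable means only the first (cold) invocation pays the cost of reading it from EFS.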

To populate the EFS file system with Python modules and pre-trained models, we can use an Amazon EC2 instance or even a SageMaker notebook instance. Both options let us mount the file system and add dependencies through a Python virtual environment or the pip installer.
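On the instance with the share mounted, this might look like the following; the mount point, directory names, package list, and model file are illustrative:

```shell
# Install the function's dependencies directly onto the EFS share,
# into the directory the Lambda function's PYTHONPATH will point at.
pip install --target /mnt/efs/ml/lib torch pillow

# Copy the trained model into a separate directory on the same share.
mkdir -p /mnt/efs/ml/model
cp model.pt /mnt/efs/ml/model/
```

Because the file system is shared, updating a library or swapping in a retrained model here is immediately visible to all Lambda instances, with no redeployment.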

The workflow can be summarized in four steps: mount the EFS file system on an EC2 instance or SageMaker notebook; install the Python dependencies and copy the trained model to the file system; attach the file system to the Lambda function through an access point; and configure the function to load the libraries and model from the mount path.

This week, The New Stack will launch a series of tutorials on the subject, in which I will guide you through all the steps involved in hosting a serverless machine learning inference API on AWS Lambda. Tune in tomorrow for the next installment!

Janakiram MSV's webinar series "Machine Intelligence and Modern Infrastructure (MI2)" provides informative and insightful sessions covering cutting-edge technologies. Register for the upcoming MI2 webinar at http://mi2.live.

Amazon Web Services is a sponsor of The New Stack.

The featured image was taken by moren hsu on Unsplash.