Building a Cost-Efficient Video Processing Pipeline with AWS Lambda
Dushyant Pratap Singh
Senior Software Engineer @ Grappus | Back-End Web Development
If you have worked on video processing tasks, you likely know how challenging they can be. Recently, I worked on an AI-driven interview tool developed by the team at Unberry that conducts candidate interviews. My task was to record each interview session so that clients could review it later, providing a proctoring experience.
1. Setting up the recording environment
The frontend is developed using React, where we created a Recorder component that captures both audio streams (from the mic and the tab audio) and the video stream, and relays them to the socket. Instead of storing the recording as one large blob, which could overwhelm the client's browser, we chose to chunk and send the data. An important consideration: whatever chunk size you use (e.g., 3-second chunks), the chunks are only valid relative to the stream they came from, because the container headers and metadata live at the start of the recording. Any incorrect slicing or splitting corrupts the entire video.
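A minimal sketch of the recorder wiring, assuming the browser's `MediaRecorder` API with a 3-second timeslice. The socket event name, key layout, and sequence-numbering convention are my illustrative assumptions, not Unberry's actual code:

```javascript
// Sketch of a recorder that ships timesliced chunks over a socket.
// MediaRecorder is a browser-only API, so this file only defines the
// function; nothing runs until startRecording is called in a browser.
function startRecording(mediaStream, socket, { timesliceMs = 3000 } = {}) {
  const recorder = new MediaRecorder(mediaStream, {
    mimeType: 'video/webm;codecs=vp8,opus',
  });
  let seq = 0;
  recorder.ondataavailable = (event) => {
    if (event.data && event.data.size > 0) {
      // Each chunk carries a sequence number so the backend can
      // reassemble them in order; only chunk 0 holds the container header.
      socket.emit('chunk', { seq: seq++, blob: event.data });
    }
  };
  recorder.start(timesliceMs); // emit a chunk roughly every timesliceMs
  return recorder;
}

// Pure helper (a hypothetical naming convention): zero-padded chunk keys
// so that lexicographic order equals numeric order.
function chunkKey(sessionId, seq) {
  return `${sessionId}/chunk-${String(seq).padStart(6, '0')}.webm`;
}
```

Zero-padding the sequence number matters later: it lets the merge step sort chunk keys with a plain string sort.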
On the backend, we developed a socket handler that is customisable with different storage and persistence strategies. In our case, it saves the chunks in an S3 bucket and persists their metadata in MongoDB.
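The pluggable design can be sketched as follows; the `storage`/`persistence` interfaces and the metadata fields are my assumptions about what such a handler might look like, not the real implementation:

```javascript
// Pure helper: the metadata document persisted per chunk (field names
// are hypothetical).
function buildChunkMetadata(sessionId, seq, byteLength) {
  return {
    sessionId,
    seq,
    byteLength,
    key: `${sessionId}/chunk-${String(seq).padStart(6, '0')}.webm`,
    receivedAt: new Date().toISOString(),
  };
}

// storage.put(key, bytes) could wrap an S3 PutObject call;
// persistence.save(doc) could wrap a MongoDB insertOne. Both are
// injected, so the socket handler stays storage-agnostic.
function makeChunkHandler(storage, persistence) {
  return async function onChunk({ sessionId, seq, bytes }) {
    const meta = buildChunkMetadata(sessionId, seq, bytes.length);
    await storage.put(meta.key, bytes);
    await persistence.save(meta);
    return meta;
  };
}
```

Injecting the strategies also makes the handler trivial to unit-test with in-memory fakes in place of S3 and MongoDB.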
2. Triggering AWS Lambda
When you have multiple chunks in your S3 bucket, you need to merge them into a single file.
For this task, we used ffmpeg. We decided against performing this merging process on the application server since it's CPU-intensive, and deploying a separate EC2 instance just for this occasional task would be inefficient and costly.
AWS Lambda was the obvious choice at this scale. As soon as an interview ends, we trigger the Lambda function using AWS SQS. The Lambda function then processes the files in the bucket and merges the videos using ffmpeg.
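ffmpeg's concat demuxer expects a manifest file listing the inputs in playback order. A sketch of that piece, assuming zero-padded chunk keys so a plain string sort gives the correct order (the SQS message shape in the comments is also an assumption):

```javascript
// Build the manifest for ffmpeg's concat demuxer: one `file '<path>'`
// line per input, in playback order.
function buildConcatManifest(chunkFiles) {
  return (
    [...chunkFiles]
      .sort() // safe because keys are zero-padded by sequence number
      .map((f) => `file '${f}'`)
      .join('\n') + '\n'
  );
}

// A hypothetical Lambda entry point would then:
// 1. read the sessionId from the SQS record body,
// 2. download that session's chunks from S3 into /tmp,
// 3. write this manifest to /tmp and run ffmpeg against it,
// 4. upload the merged file back to S3.
```

Note that `/tmp` is the only writable path in the Lambda filesystem, and its size cap bounds how large a recording a single invocation can merge.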
3. AWS Lambda
Initially, I attempted to use the npm ffmpeg package, but it proved too bloated and incompatible with Lambda, leading to numerous debugging headaches. After digging through the documentation and Stack Overflow, Lambda Layers emerged as the right solution, allowing us to treat ffmpeg as a separate executable. I downloaded the static ffmpeg build for Linux (since Lambda runs in a Linux environment) and packaged it as a Lambda Layer.
In the Lambda code, the handler spawns a child process that invokes the ffmpeg binary provided by this layer.
This approach gave us our final architecture: the React recorder streams chunks over the socket, the backend writes them to S3 and their metadata to MongoDB, and when an interview ends an SQS message triggers the Lambda that merges the chunks with ffmpeg.
4. Result and Conclusion
The system successfully merges a 30-minute, 108 MB video in just 4.7 seconds. For a workload of 100 events per day, we're achieving approximately 32x cost savings compared to running a dedicated t3.micro instance with the same configuration.
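As a rough sanity check on the ~32x figure, the back-of-the-envelope math works out if the function runs with 1 GB of memory; that memory setting, the us-east-1 on-demand prices, and the 30-day month are my assumptions, not numbers from the measurement:

```javascript
// Back-of-the-envelope monthly cost comparison (assumed prices).
const GB_SECOND_PRICE = 0.0000166667; // Lambda compute, USD per GB-second
const T3_MICRO_HOURLY = 0.0104;       // t3.micro on-demand, USD per hour

const memoryGb = 1;                   // assumed Lambda memory setting
const durationS = 4.7;                // merge time measured above
const invocationsPerMonth = 100 * 30; // 100 events/day, ~30 days

const lambdaMonthly = memoryGb * durationS * invocationsPerMonth * GB_SECOND_PRICE;
const ec2Monthly = T3_MICRO_HOURLY * 24 * 30;
const ratio = ec2Monthly / lambdaMonthly;

console.log(lambdaMonthly.toFixed(2), ec2Monthly.toFixed(2), ratio.toFixed(0));
```

Under these assumptions Lambda costs about $0.24/month against roughly $7.50/month for an always-on t3.micro, which lands close to the 32x claim; per-request charges are negligible at this volume.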
This architecture demonstrates how serverless solutions can provide both performance and cost benefits for resource-intensive tasks that occur intermittently.
Full stack developer | open source contributor
2 weeks ago: I have a doubt: why are you creating separate S3 files for the chunks of the same video? Why not just append to the same S3 object? For performance- and CPU-intensive tasks you could also use Golang. Great read, btw.

Engineering @ WeCP (We Create Problems)
4 months ago: Hey Dushyant Pratap Singh, great article. Can you please help me with these doubts: 1) It is mentioned that each chunk must have sufficient headers and metadata; how are you ensuring this is the case? AFAIK only the first chunk has the necessary metadata. 2) Why store the metadata in a DB? We could store it with the individual chunks on S3 as object metadata. 3) What is the use case: is the video available for download on the product after merging, or is it streamed?