JobTarget Internal Batch Framework that runs 5,400 Jobs/Month (60,000 Jobs/Year) on AWS Batch
Author
Soumil Nitin Shah (Data collection and Processing Team lead)
I earned a Bachelor of Science in Electronic Engineering and a double master's in Electrical and Computer Engineering. I have extensive expertise in developing scalable, high-performance software applications in Python. I run a YouTube channel where I teach people about Data Science, Machine Learning, Elasticsearch, and AWS. I work as the Data Collection and Processing Team Lead at JobTarget, where I spend most of my time developing our ingestion framework and creating microservices and scalable architecture on AWS. I have worked with massive amounts of data, including creating data lakes (1.2T) and optimizing data lake queries through partitioning and the right file formats and compression. I have also developed and worked on a streaming application for ingesting real-time stream data via Kinesis and Firehose into Elasticsearch.
Hari Om Dubey (Consultant Software Engineer, Python Developer)
I completed a Master's in Computer Application and have 5 years of experience developing software applications using Python and the Django framework. I love to code in Python, and solving a problem through code excites me. I have been working at JobTarget for the past 2 months as a Software Engineer on the Data Team.
Himadri Chobisa (Jr Data Engineer)
I recently graduated from UConn with a master's in Business Analytics and Project Management with a Business Data Science concentration. I joined JobTarget as an intern in April last year and have been with the data team since then. I am a data enthusiast with experience working in SQL, Python, Power BI, Tableau, and Machine Learning. In my free time, I enjoy dancing and cooking.
April Love Ituhat (Software Engineer, Python)
I have a bachelor's degree in Computer Engineering and have spent the last three years working on research and development tasks involving diverse domains such as AWS, Machine Learning, robot simulations, and IoT. I've been a part of the JobTarget data team since November 2021, and I usually work with Python and AWS. It's exciting for me to see the applications come to fruition.
Paul Allan deserves a special thank you for helping us through this endeavor.
Project Overview:
We discovered a great need throughout the firm to standardize running, scheduling, and error handling when writing Python tasks. This ingestion framework supports scheduling, firing jobs, async invocation of many jobs, templates, error handling, and much more. We wanted the ability to scale up or down easily based on job requirements. The framework solves these common concerns, and each job runs in its own distinct computing environment: it is simple to adjust the computing environment if a job requires extra CPU cores or RAM. Each job specifies how much compute power it needs and its timeout, and the framework selects which Python modules to install based on the job's metadata.
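To make this concrete, here is a minimal sketch of the metadata a job might carry in such a framework; all field names and values are illustrative assumptions, not the actual JobTarget schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class JobSpec:
    """Illustrative per-job metadata (the real values live in SQL Server)."""
    job_id: int
    template_s3_path: str                # S3 key of the job template to run
    vcpus: int = 1                       # compute power requested for this job
    memory_mib: int = 2048               # RAM requested for this job
    timeout_seconds: int = 3600          # how long the job may run
    python_modules: List[str] = field(default_factory=list)  # installed at run time
    cron_schedule: str = ""              # optional EventBridge cron expression

example_job = JobSpec(
    job_id=42,
    template_s3_path="s3://my-bucket/templates/job_42.py",  # hypothetical bucket
    vcpus=4,
    memory_mib=8192,
    python_modules=["requests", "pandas"],
    cron_schedule="cron(0 6 * * ? *)",  # daily at 06:00 UTC
)
```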
Introduction:
What is AWS Batch?
AWS Batch enables developers, scientists, and engineers to run hundreds of thousands of batch computing jobs easily and efficiently on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g., CPU or memory optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. With AWS Batch, there is no need to install and manage batch computing software or server clusters that are required to run your jobs. AWS Batch plans, schedules, and executes your batch computing workloads across the AWS Cloud.
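For readers who haven't used AWS Batch before, the sketch below shows the standard boto3 calls for standing up its three building blocks: a compute environment, a job queue, and a job definition. All names, ARNs, subnets, and the container image are placeholder assumptions, not our production values.

```python
import boto3

batch = boto3.client("batch", region_name="us-east-1")

# 1. A managed compute environment backed by Spot instances (placeholder ARNs/subnets).
batch.create_compute_environment(
    computeEnvironmentName="ingestion-spot-env",
    type="MANAGED",
    computeResources={
        "type": "SPOT",
        "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
        "minvCpus": 0,
        "maxvCpus": 256,
        "instanceTypes": ["optimal"],
        "subnets": ["subnet-0123456789abcdef0"],
        "securityGroupIds": ["sg-0123456789abcdef0"],
        "instanceRole": "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
    },
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",
)

# 2. A job queue that feeds the compute environment
#    (in practice, wait until the environment is VALID before creating the queue).
batch.create_job_queue(
    jobQueueName="ingestion-queue",
    priority=1,
    computeEnvironmentOrder=[{"order": 1, "computeEnvironment": "ingestion-spot-env"}],
)

# 3. A container-based job definition; resources can be overridden per job at submit time.
batch.register_job_definition(
    jobDefinitionName="ingestion-job",
    type="container",
    containerProperties={
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/ingestion:latest",
        "resourceRequirements": [
            {"type": "VCPU", "value": "1"},
            {"type": "MEMORY", "value": "2048"},
        ],
        "command": ["python", "runner.py"],
    },
)
```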
Current Workload on AWS Batch
Figure 2: Total active connections made to the process metadata
What is the JobTarget Batch Framework?
The Batch Framework is a fully scalable internal framework designed to run 1,000+ jobs and can scale horizontally. Each job has the ability to specify the compute environment it needs: you can specify how many cores and how much RAM you require. When a job starts, it creates a process, and each process has many tasks; if any task fails, the process is marked as failed in the SQL Server tables.
Features and Benefits:
· Fully managed
· Integrated with AWS
· Cost-optimized resource provisioning
Figure: Architecture for running scheduled jobs
Explanation:
Our framework allows you to easily schedule jobs using AWS EventBridge. Once the cron schedule matches the time at which the user wants the job to run, EventBridge fires the Lambda, as shown in the figure.
The figure shows the Swagger UI for the REST API used to schedule your jobs.
Figure: EventBridge cron rules set via the REST API
Figure: Lambda triggers that have been added
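Under the hood, a scheduling call like the one exposed through the Swagger UI would translate into EventBridge API calls roughly like this sketch; the rule name, Lambda ARN, and job-id payload are assumptions for illustration:

```python
import json
import boto3

events = boto3.client("events", region_name="us-east-1")
lambda_client = boto3.client("lambda", region_name="us-east-1")

LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:fire-batch-job"  # placeholder

def schedule_job(job_id: int, cron_expression: str) -> None:
    """Create (or update) an EventBridge cron rule that fires the
    job-launcher Lambda with the job id as its payload."""
    rule_name = f"job-{job_id}-schedule"
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=cron_expression,   # e.g. "cron(0 6 * * ? *)"
        State="ENABLED",
    )["RuleArn"]

    # Grant EventBridge permission to invoke the Lambda
    # (raises ResourceConflictException if this statement id already exists).
    lambda_client.add_permission(
        FunctionName=LAMBDA_ARN,
        StatementId=rule_name,
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # Pass the job id to the Lambda on every firing.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": LAMBDA_ARN, "Input": json.dumps({"job_id": job_id})}],
    )

schedule_job(42, "cron(30 9 * * ? *)")  # run job 42 daily at 09:30 UTC
```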
Explanation
When a user wants to run a job manually, they provide the job ID they want to fire. This starts an AWS Lambda function, which puts the job in the job queue. While the job is in the job queue, the batch job definition also indicates how many vCPUs and how much RAM the job will require. Once the computing environment is ready, the job pulls its templates from AWS S3 and starts. Each time a job runs, it creates a new process ID, which lets us know when the job started, when it finished, and whether its status was success, failed, or in progress. As noted, each process has numerous tasks.
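A minimal sketch of what that Lambda might look like, assuming the job's metadata has already been loaded from SQL Server; the queue and job-definition names, the metadata fields, and the `fetch_job_metadata` helper are all hypothetical:

```python
import boto3

batch = boto3.client("batch")

def fetch_job_metadata(job_id):
    # Stub standing in for the real SQL Server metadata lookup.
    return {"job_id": job_id, "vcpus": 4, "memory_mib": 8192,
            "template_s3_path": "s3://my-bucket/templates/job_42.py",
            "timeout_seconds": 3600}

def handler(event, context):
    """Fired by EventBridge or by a manual REST call; submits the job
    to AWS Batch with per-job vCPU/RAM and timeout overrides."""
    job = fetch_job_metadata(event["job_id"])

    response = batch.submit_job(
        jobName=f"job-{job['job_id']}",
        jobQueue="ingestion-queue",        # placeholder queue name
        jobDefinition="ingestion-job",     # placeholder definition name
        containerOverrides={
            "resourceRequirements": [
                {"type": "VCPU", "value": str(job["vcpus"])},
                {"type": "MEMORY", "value": str(job["memory_mib"])},
            ],
            "environment": [
                {"name": "TEMPLATE_S3_PATH", "value": job["template_s3_path"]},
            ],
        },
        timeout={"attemptDurationSeconds": job["timeout_seconds"]},
    )
    return {"batchJobId": response["jobId"]}
```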
Process Monitoring
When a user runs a script through the REST API, it fires the Lambda, and the Lambda pulls all information about the job from the SQL Server tables, including the template path and the compute environment required for the job. Once the job starts, it creates a process. A process is simply a unique number assigned to a given script run to track whether the job succeeded or failed, as shown in the diagram below.
Figure: When a Batch job starts, it creates a process
Each process can have many tasks associated with it. When making a pizza, we need several things such as dough, flour, and sauce; these are nothing but the steps required to make the pizza. Tasks are nothing but the steps that are being executed.
Figure: Each process has many tasks. We track status at both the task and process level, which helps us identify whether the process failed or passed, and if it failed, at which step.
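A sketch of what this process/task bookkeeping could look like against SQL Server. The connection string, table names, and column names are assumptions for illustration, not the real JobTarget schema:

```python
import pyodbc

# Placeholder connection string; real credentials would come from secrets management.
CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=batchdb;UID=user;PWD=secret")

def mark_task(process_id: int, task_name: str, status: str) -> None:
    """Record a task's outcome; any failed task marks the whole process as failed."""
    with pyodbc.connect(CONN_STR) as conn:
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO dbo.ProcessTasks (ProcessId, TaskName, Status) VALUES (?, ?, ?)",
            process_id, task_name, status,
        )
        if status == "FAILED":
            cur.execute(
                "UPDATE dbo.Processes SET Status = 'FAILED' WHERE ProcessId = ?",
                process_id,
            )
        conn.commit()

# e.g. the dough step of the pizza succeeded, the sauce step failed
mark_task(1001, "prepare_dough", "SUCCESS")
mark_task(1001, "prepare_sauce", "FAILED")
```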
As mentioned before, each job has the ability to choose which Python libraries to use; a job can have many Python modules, and all of this metadata is pulled from the SQL Server tables and passed on to AWS Batch. AWS Batch prepares the compute environment and installs all the specified Python libraries at run time. This gives us the flexibility to specify how many vCPUs and how much RAM we need for a job, how long the job should run, and which Python libraries need to be installed.
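Inside the container, installing a job's declared modules at run time could be as simple as the sketch below; here the module list is hard-coded, whereas in the framework it would come from the job metadata pulled from SQL Server:

```python
import subprocess
import sys
from typing import List

def install_job_modules(modules: List[str]) -> None:
    """Install the Python libraries a job declared, at container start-up."""
    if modules:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *modules])

# Example: modules this job declared in its metadata (illustrative).
install_job_modules(["requests==2.28.1", "pandas"])
```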
Here, we're about to set new benchmarks. By the end of November 2022, we plan to have 7,000 jobs running on AWS Batch. This application is being used at a large scale: we execute between 230 and 250 concurrent jobs daily, running 60,000 jobs each year, and the system can currently handle 100K jobs a month easily, i.e., 1,000 EC2 Spot Instances running concurrently.