The multi-threading hammer - Part 4
And the process was being orchestrated in Airflow . . . (Part 3, link below)
We were expecting good results from multi-threading and, when that did not work, from multi-processing. Neither approach gave us the expected results. Though we were executing on a Spark cluster, we were using the multi-threading (and then multi-processing) features of Python, which made all the threads and processes run on a single machine. When 200 threads or processes are spawned on one machine, things will not work as expected, because the number of CPU cores is still only 4, 8, or 16. We were clearly limited by the available compute capacity.
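To make the thread side of that bottleneck concrete, here is a minimal sketch (my illustration, not the team's code) with a hypothetical `convert_block` standing in for the CPU-bound conversion work. Running the same workload with 1 thread and with 16 threads takes roughly the same time, because Python's GIL lets only one thread execute Python bytecode at a time:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def convert_block(block_id: int) -> int:
    """Stand-in for the CPU-bound conversion of one file block."""
    total = 0
    for i in range(2_000_000):
        total += i % 251
    return total

for workers in (1, 16):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(convert_block, range(16)))
    print(f"{workers:>2} threads: {time.perf_counter() - start:.1f}s")
# Both timings come out roughly equal: the GIL serializes the work.
```

Multi-processing does sidestep the GIL, but every process still lands on the same machine, so the ceiling is that machine's core count.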
Our next approach was to look for a solution using Airflow, which was already being used for orchestration. I suggested that we define an Airflow Directed Acyclic Graph (DAG) to perform the EBCDIC-to-ASCII conversion, and then launch multiple instances of the same DAG on a schedule of, say, one minute. Each DAG run would be given the offset from which to read the file, and would write its result to a directory.
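As a rough sketch of what such a DAG could look like (assuming Airflow 2.x; the DAG id, task name, and conf keys below are my illustrative assumptions, not the team's actual code):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def convert_chunk(**context):
    # Each run is handed its own slice of the file via the run's conf.
    conf = context["dag_run"].conf or {}
    offset = conf.get("offset", 0)                     # byte offset to read from
    output_dir = conf.get("output_dir", "/tmp/ascii")  # where to write the result
    # ... read the EBCDIC file from `offset`, convert the block to ASCII,
    # and write the converted block into `output_dir` ...

with DAG(
    dag_id="ebcdic_to_ascii_conversion",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,   # triggered externally, one run per block
    catchup=False,
) as dag:
    PythonOperator(task_id="convert_chunk", python_callable=convert_chunk)
```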
When a DAG run is triggered, Airflow schedules it on any available node in the cluster. This would let us spawn multiple DAG instances, spread the load across the cluster, and convert multiple blocks of the file at the same time, in parallel. A small driver loop could fan out the runs, as sketched below.
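Here is a hypothetical launcher (not from the source) that shells out to the standard `airflow dags trigger` CLI, one run per block; the block size is an assumption, since the article does not state one:

```python
import json
import subprocess

BLOCK_SIZE = 64 * 1024 * 1024   # assumed 64 MB blocks; the real size isn't given
NUM_BLOCKS = 200

for i in range(NUM_BLOCKS):
    conf = {"offset": i * BLOCK_SIZE, "output_dir": f"/data/ascii/block_{i:04d}"}
    subprocess.run(
        ["airflow", "dags", "trigger", "ebcdic_to_ascii_conversion",
         "--conf", json.dumps(conf)],
        check=True,
    )
```

Each triggered run lands on whichever worker the scheduler picks, which is exactly what spreads the conversion across the cluster.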
The team did not implement the solution in Airflow. They decided to implement the same concept using AWS Glue, owing to their familiarity with it. Initially, the team launched around 200 AWS Glue jobs in parallel; 400 parallel executions were also tried. Using this approach, we achieved a significant speed-up in the file conversion activity. Finally, we settled on a number between 200 and 400.
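The Glue variant follows the same fan-out pattern. Below is a hedged sketch using boto3's `start_job_run` (the job name, argument keys, S3 paths, region, and block size are my assumptions, not the team's actual setup). One practical detail: a Glue job's `MaxConcurrentRuns` execution property defaults to a low value and has to be raised before hundreds of runs of the same job can execute concurrently.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")  # region is illustrative

BLOCK_SIZE = 64 * 1024 * 1024   # assumed block size
NUM_BLOCKS = 200                # the team experimented between 200 and 400

run_ids = []
for i in range(NUM_BLOCKS):
    response = glue.start_job_run(
        JobName="ebcdic-to-ascii-conversion",   # hypothetical job name
        Arguments={
            "--offset": str(i * BLOCK_SIZE),
            "--output_dir": f"s3://converted-bucket/block_{i:04d}/",
        },
    )
    run_ids.append(response["JobRunId"])
```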
End of story? Not really . . .
Link