Friday Fun - Reduce execution time, face an execution failure

In my project, which has been running since December 2023, things have been going well. We do have the occasional hiccup: sometimes a new deployment does not work because someone forgot to copy the required config file, and sometimes customers find bugs.

In our project, we have three jobs: J1, J2 and J3. J1 and J2 execute four times a day (J2 is triggered when J1 finishes), while J3 executes once a week, every Saturday. Each run of J1 and J2 used to take around 1 hr 10 min.

Based on my experience, and after sufficient experimentation in Dev and QA, I rolled out changes to the way various Python elements were executed. This brought the execution time of J1 down from 1 hr 10 min to between 40 and 50 min (depending on load), and things were fine. So we saved between 20 and 30 min per run, which translates to 80 to 120 min each day.
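As a quick check of the savings arithmetic, here is a minimal sketch that derives per-run and per-day savings from the figures in the paragraph above (1 hr 10 min before, 40 to 50 min after, four runs a day). The function name `savings` is purely illustrative.

```python
# Minimal sketch: per-run and per-day savings for J1.
# Assumes J1 originally took 70 min (1 hr 10 min), now takes 40 to 50 min,
# and runs four times a day.
RUNS_PER_DAY = 4
BEFORE_MIN = 70


def savings(after_min: int, before_min: int = BEFORE_MIN,
            runs: int = RUNS_PER_DAY) -> tuple[int, int]:
    """Return (minutes saved per run, minutes saved per day)."""
    per_run = before_min - after_min
    return per_run, per_run * runs


best = savings(40)   # fastest observed run
worst = savings(50)  # slowest observed run
print(f"Per run: {worst[0]} to {best[0]} min; per day: {worst[1]} to {best[1]} min")
# prints: Per run: 20 to 30 min; per day: 80 to 120 min
```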

Recently, we deployed new versions of a few objects in J1. I expected further speed benefits, and that is what happened. After the update, J1 executed in 23 min (yesterday, it executed in 18 min).

But this speed boost came back to bite us on Saturday. Because J1 finished early, it triggered J2 early. As it happens, on Saturdays J3 starts at the same time as one of J1's four daily runs, and J3 runs for about 45 min. With J1 now finishing in around 20 min, J2 started while J3 was still running. At one point, J2 was referring to the same object that J3 was modifying, causing J2 to trip and fail.
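To see why the earlier finish caused the clash, here is a minimal sketch that measures how long J2's run overlaps J3's. It assumes J1 and J3 start together, J2 starts the moment J1 finishes, and J3 takes 45 min. The post gives no clock times, so times are minutes after the shared start. J2's duration of 70 min is a hypothetical value; any J2 duration of at least 22 min gives the same overlap for a 23 min J1 run.

```python
# Minimal sketch: does J2 overlap J3 on a Saturday?
# Times are minutes after the shared start of J1 and J3 (actual clock
# times are not given). J2 is assumed to start as soon as J1 finishes.
J3_START, J3_DURATION = 0, 45
J2_DURATION = 70  # hypothetical: reuses the original 1 hr 10 min figure


def overlap_minutes(start_a: int, dur_a: int, start_b: int, dur_b: int) -> int:
    """Length of the intersection of [start_a, start_a+dur_a) and [start_b, start_b+dur_b)."""
    return max(0, min(start_a + dur_a, start_b + dur_b) - max(start_a, start_b))


def j2_overlap_with_j3(j1_duration: int) -> int:
    """Minutes during which J2 and J3 run at the same time."""
    j2_start = J3_START + j1_duration  # J1 and J3 start together; J2 follows J1
    return overlap_minutes(j2_start, J2_DURATION, J3_START, J3_DURATION)


for j1 in (70, 23):
    print(f"J1 takes {j1} min: J2 overlaps J3 for {j2_overlap_with_j3(j1)} min")
# prints:
# J1 takes 70 min: J2 overlaps J3 for 0 min
# J1 takes 23 min: J2 overlaps J3 for 22 min
```

With the old 70 min J1 run, J2 starts well after J3 has ended; with a 23 min run, the two jobs share roughly 22 minutes, which is plenty of time to touch the same object.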

The fix was to wait for J3 to finish and then re-execute J2. This is a short-term solution. As it happens, we are rewriting J3 and expect it to finish faster, which should avoid such overlaps in the future.
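The post describes waiting for J3 and then re-running J2. If one wanted to automate that kind of wait, a minimal sketch could be a guard that polls until the blocking job is no longer running and only then starts the dependent job. This is an illustration, not what the project does: `run_after` and the status check `is_blocking_job_running` are hypothetical names, and a real status check would have to ask whatever scheduler runs the jobs.

```python
import time
from typing import Callable


def run_after(is_blocking_job_running: Callable[[], bool],
              job: Callable[[], None],
              poll_seconds: float = 60.0,
              timeout_seconds: float = 3600.0) -> None:
    """Poll until the blocking job (e.g. J3) stops running, then run `job` (e.g. J2).

    Raises TimeoutError if the blocking job is still running after timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while is_blocking_job_running():
        if time.monotonic() >= deadline:
            raise TimeoutError("blocking job still running; not starting job")
        time.sleep(poll_seconds)
    job()


# Usage with a simulated status check: "J3" reports running twice, then finishes.
status = iter([True, True, False])
ran = []
run_after(lambda: next(status), lambda: ran.append("J2"),
          poll_seconds=0.01, timeout_seconds=5)
print(ran)  # ['J2']
```

The timeout keeps J2 from waiting forever if J3 hangs. A check-then-run guard is not atomic (J3 could start right after the check), so a scheduler-level dependency between the jobs would be a sturdier option where the platform supports one.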

We were very happy to watch a process finish much faster (saving money for the customer), but then ran into this trouble. Sometimes, we get benefits in one area, only to unearth problems in another area :-)

#performance #execution #overlap

Author: Bipin Patwardhan
