Increase efficiency in data engineering using lean agile

Introduction

The lean agile process is an effective way to handle a project. The methodology helps eliminate unnecessary processes, reducing ambiguity and allowing the team to use their time optimally to develop effective software with better functionality. With the lean agile process, the team is better equipped to focus on objectives, improve productivity and deliver superior value to customers.

A few of the building blocks for implementing lean agile principles are grooming the user stories and the functionalities or epics by understanding acceptance criteria, prioritising, and conducting retrospectives. While grooming is an important part of the process, the retrospective is equally imperative: it enables the team to understand what went well and what did not. The key objective of a retrospective meeting is to identify the drawbacks of the process and make the necessary changes. However, there is no single right or wrong way to run a retrospective.

In this post I am sharing the approach to retrospective meetings that helped my team during the development process. My team is a standard scrum team comprising software developers, test engineers, data engineers and analysts.

The initiation

We followed a process where the first two sprints were development sprints and testing started from the third sprint. While we conducted a retrospective meeting at the end of each sprint, most things looked good simply because the code had not yet been tested.

Retrospective meetings led to deeper discussions from the third sprint onwards. We also had to dig back into the first sprint, as the test engineers had more insights from their findings on sprint 1. This resulted in code rework to fix the bugs, which altered the velocity of the following sprints.

The initial process was:

Sprint-1: Develop

Sprint-2: Develop

Sprint-3: Develop + Test + Bug fixes

Sprint-4: Develop + Test + Bug fixes

Depending on the severity of the bugs, velocity varied from sprint to sprint because the client's priorities differed. In some sprints we were expected to fix key functionality bugs first, whereas in others we had to prioritise completing the coding.

The issue with this process was that the client's engineers repeatedly produced better results than we did with the same set of data. While the customer expected better results at each demo, the test engineers were out of pace and lacked clarity on what test cases to write.

To address this challenge, we decided to make some changes to the process to produce a better outcome. The changes were:

  1. As soon as we receive the requirements, developers make their assumptions explicit.
  2. These assumptions are discussed in the sprint meeting at the beginning of the sprint with the team and key stakeholders. This clarifies as many doubts as the stakeholders may have right at the start of the sprint.
  3. Project managers perform risk analysis based on these assumptions.
  4. Test engineers get involved right from sprint 1. Developers provide the expected data output, according to their assumptions, to the test engineers so they can write relevant test cases (a minimal sketch of such a test follows this list).
  5. Demos are not conducted on fixed days; instead, we showcase the work whenever we reach a significant stage in our model or approach.
  6. During the sprint, whenever we have a doubt about an assumption, we discuss it with the relevant stakeholders, because, as we know, no result is inherently wrong with respect to data or data science.
  7. Developers perform unit testing before moving the code into the test environment, and while testing is under way they optimise the code (an important step added to the process).
  8. At the end of each sprint, we justify the assumptions made to the stakeholders and demonstrate the outcome.
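To make point 4 concrete, here is a minimal sketch of a developer-provided expected-output test, written as a JUnit 4 test. The class and method names (RecordCleanser, cleanse) are illustrative, not from our actual codebase:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical cleansing step standing in for the real sprint work.
class RecordCleanser {
    String cleanse(String raw) {
        return raw.trim().toLowerCase();
    }
}

public class RecordCleanserTest {

    // The developer states an assumption as a concrete input/output pair,
    // so test engineers can write relevant cases from sprint 1.
    @Test
    public void cleanseTrimsWhitespaceAndLowercases() {
        assertEquals("john doe", new RecordCleanser().cleanse("  John DOE "));
    }
}
```

Handing over pairs like this, rather than a prose description, gave the test engineers something executable to build on from day one.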

Here is an example of a scenario where we implemented the preceding changes. The code we were writing for one of the sprints had to:

  1. Fetch 100 million records by reading them line by line and channel the data into a platform. Each record was 50 bytes (roughly 5 GB in total), and it took 2.3 minutes to fetch all the records.
  2. On the platform, cleanse the data and perform mathematical operations through an API in order to produce the predictive algorithms.

We were using the following Java streams to work with the data:

InputStreamReader: reads the raw bytes coming from the platform, decodes them into characters, and passes them on to the BufferedReader

BufferedReader: buffers the character stream from the InputStreamReader so the data sequence can be read efficiently, line by line, and fed into the platform
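As a rough sketch, the original line-by-line read path looked something like this (the file name and the process step are placeholders):

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class LineByLineFetch {
    public static void main(String[] args) throws IOException {
        // InputStreamReader decodes bytes into characters;
        // BufferedReader buffers them so we can read line by line.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(
                        new FileInputStream("records.dat"), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                process(line); // push each ~50-byte record into the platform
            }
        }
    }

    private static void process(String record) {
        // Placeholder for channelling the record into the platform.
    }
}
```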

New process implementation

The change in the process helped us in several ways. As we made assumptions, we also provided the expected outputs to the test engineers right at the beginning. This made it easy for them to understand the expected output and write better test cases, while clarifying doubts as and when required saved time and effort. We conducted a demo every time we reached a significant stage, improving our client's experience. And because we kept our key stakeholders informed throughout the process, it was easy to justify our assumptions in retrospective meetings.

Apart from this, we accelerated the process of fetching data by optimising the code. The changes were as follows:

1. We improved the way we read data from files by using lightweight byte streams that consume less memory and avoid unnecessary character decoding (for example, FileInputStream instead of InputStreamReader)

2. Consolidated the fetch into a single-statement query and set the connection's auto-commit to false (a sketch follows this list)

3. Unit test results showed that there was unnecessary code that increased the elapsed time, and that we could drop boundary-condition checks that were not needed
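As an illustration of the second change, here is a minimal JDBC sketch. The connection URL, credentials, table and column names are all hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class FetchRecords {
    public static void main(String[] args) throws SQLException {
        // URL and credentials are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/records", "user", "secret")) {
            // Commit once per unit of work instead of after every statement.
            conn.setAutoCommit(false);

            // One statement fetches everything; the fetch size hints the
            // driver to stream rows rather than load them all at once.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT payload FROM records ORDER BY id")) {
                ps.setFetchSize(10_000);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Channel rs.getString(1) into the platform.
                    }
                }
            }
            conn.commit();
        }
    }
}
```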

We also deployed the code into the test environment. While the test engineers tested it, the developers optimised it further. We re-designed the architecture: we fetched the first 1 million records using optimised SQL, then fetched the next chunks in parallel with a different set of code. We also used byte channels to read the data, continually clearing the buffer as we went, which helped boost the speed.
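A minimal sketch of such a byte-channel read loop, assuming the records sit in a local file (the path, buffer size and consume step are illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ByteChannelFetch {
    public static void main(String[] args) throws IOException {
        // A direct buffer is reused for every read instead of allocating per record.
        ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);
        try (FileChannel channel = FileChannel.open(
                Paths.get("records.dat"), StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip();   // switch the buffer to read mode
                consume(buffer); // push the bytes into the platform
                buffer.clear();  // reset so the next read reuses the same buffer
            }
        }
    }

    private static void consume(ByteBuffer buffer) {
        // Placeholder: decode the ~50-byte records and hand them downstream.
        buffer.position(buffer.limit());
    }
}
```

Clearing and reusing a single buffer keeps memory flat and avoids the per-line object churn of the character-stream approach.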

Lean agile methodology for continuous improvement

We incorporated these changes at the optimisation stage while testing proceeded in parallel, which enabled us to save time and improve efficiency. It helped reduce coding time by 70% and man-hours by 35%, and fetching the entire dataset now took just 69 seconds, down from 2.3 minutes. In addition, we built a cross-functional team that allowed us to think from various perspectives, facilitating knowledge sharing and enhancing productivity.

Combining lean and agile principles lets a team take advantage of key elements of both methodologies and customise them to meet project-specific requirements. It makes it possible to integrate continuous improvement and follow best development practices, enabling the team to optimise performance and deliver value.
