Let's Talk - How "Agile" methodology is helping "Data Science"

Data Science is beautiful in its application, and its benefits are plenty. However, solving problems with Data Science is not as easy as it may seem, even when experts handle the projects. The reason is that most Data Science projects lack certainty. It is easy to assume that you understand the problem and that others have worked on similar projects with good results, but things rarely go as planned. Scheduling is also difficult because it is hard to commit to a specific timeline up front.

This is why Agile methodologies are now being applied to Data Science projects. Agile has long been used for software development, and teams have found that it can be just as effective for refining Data Science projects. Applying these methodologies to a data problem is different from applying them to a software problem, although both call for creativity. The benefit, however, remains the same: Agile makes your work easier and more organized because you work in cycles. With each cycle, you can learn something new, get more refined results, and share them with other stakeholders.

Involvement of Agile Methodology in Data Science projects

When using Agile methodologies for Data Science, the focus is not on what to do but how to think. Experts believe that a Data Scientist should have an active and dynamic approach when applying Agile methodologies. While the Agile methodologies being used for Data Science are the same as those used for software development, the approach is unique. Here is how Agile Data Science works:

  • Data Science projects rarely yield insight immediately; discovery usually takes multiple iterations. The data has to be structured before it can be analyzed, and reaching the stage where you can build a predictive model takes many more iterations. Continuous iteration is therefore a major part of Agile Data Science.
  • Apart from iteration, sharing outputs throughout the project is important as well. You will produce multiple outputs before you reach any conclusion; these are known as intermediate outputs. It is necessary to share them as you go, because waiting until the end of a sprint will most likely end with you sharing nothing, and that goes against the Agile concept. Part of Agile Data Science is making projects self-documenting, so sharing incomplete outputs while you continue to build results in better productivity.
  • Unlike software development, Data Science is more experiment-based than task-based. Data Science is about exploring data, so the work should be treated as a series of experiments.
  • Software development generally involves three perspectives: what the customers want, what the developers want, and what the business seeks. Data Science adds a fourth: what the data is telling you. You cannot make sense of the data until you develop a basic understanding of it.
  • Don't deviate from the data-value pyramid. The pyramid represents the value gained as raw data is refined into reports and then predictions. In the first layer, you record data. In the next layer, the raw data is converted into charts, tables, graphs, or another structured form. Then comes the reporting layer, which deals with exploration and reasoning. The second-to-last layer is prediction, which is made possible by the layers beneath it. The last layer is actions: the insights you have derived are only valuable if they lead to new actions or improve existing ones.
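As a rough illustration of the pyramid described above, the layers can be modelled as an ordered list so that a team can see which layer comes next and whether any layer was skipped. This is only a sketch: the names `PyramidLayer`, `next_layer`, and `skipped_layers` are made up for this example and are not part of any Agile tool or library.

```python
from enum import IntEnum


class PyramidLayer(IntEnum):
    """Layers of the data-value pyramid, bottom to top (names are illustrative)."""
    RECORDS = 1      # raw data is recorded
    CHARTS = 2       # raw data becomes charts, tables, or graphs
    REPORTS = 3      # exploration and reasoning
    PREDICTIONS = 4  # models built on the layers below
    ACTIONS = 5      # insights drive new or improved actions


def next_layer(completed):
    """Return the lowest layer not yet completed, or None if all are done."""
    for layer in PyramidLayer:
        if layer not in completed:
            return layer
    return None


def skipped_layers(completed):
    """Return layers that were skipped below the highest completed layer."""
    if not completed:
        return []
    top = max(completed)
    return [layer for layer in PyramidLayer
            if layer < top and layer not in completed]


# Hypothetical project that jumped to predictions without a reporting layer.
done = {PyramidLayer.RECORDS, PyramidLayer.CHARTS, PyramidLayer.PREDICTIONS}
print(next_layer(done).name)
print([layer.name for layer in skipped_layers(done)])
```

In this made-up project, both calls point at the reporting layer, which is the kind of gap the pyramid is meant to make visible during a sprint review.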

Aspects of using Agile methodology in Data Science projects

Data science is all about extracting useful information from raw data and implementing machine learning models. This process requires a great deal of creativity and, honestly, a fair number of failures. As a result, the process is non-linear and involves a high degree of uncertainty, which is why Agile methodologies can be successful and popular among data science teams. Let's look at some of these aspects below:

Planning and prioritization

Regular planning and prioritization meetings give internal and external stakeholders a better understanding of the cost of each data science effort and of the overhead that comes with frequently changing priorities and context switching. This keeps the data team and its stakeholders aligned: stakeholders stay conscious of their data effort budget, and the data team stays aware of organisational needs and how it can contribute effectively.

Continuous model deployment

When companies embrace practices such as continuous delivery, they push new application functionality and changes to production quickly. In a traditional data science workflow, deployment is a multi-step process that eventually lands with engineers, who rewrite and test the data scientists' work before rolling it out. The whole process can take months after the original build. Over time, companies have also realised that data scientists are limited by the power of their local machines and cannot train the models that need to go into production. Using Agile methodologies, leading firms are now building machine learning platforms that partition the training data, retrain models, and deploy them through APIs.
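To make the retrain-and-deploy cycle a little more concrete, here is a minimal sketch of one such loop: partition the data, retrain a candidate model, and promote it only if it beats the model that is currently live. Everything here is an assumption made for illustration, including the toy one-parameter model, the function names, and the promotion rule. A real platform would sit behind an API and use a proper ML library.

```python
from statistics import mean


def split(rows, holdout_every=4):
    """Partition rows: every Nth row is held out for evaluation."""
    train = [r for i, r in enumerate(rows) if i % holdout_every != 0]
    holdout = [r for i, r in enumerate(rows) if i % holdout_every == 0]
    return train, holdout


def train_model(train):
    """Fit y = slope * x by least squares through the origin (toy model)."""
    slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return {"slope": slope}


def error(model, rows):
    """Mean absolute error of the model on the given rows."""
    return mean(abs(model["slope"] * x - y) for x, y in rows)


def retrain_and_maybe_promote(current, rows):
    """Retrain on fresh data; promote the candidate only if it beats current."""
    train, holdout = split(rows)
    candidate = train_model(train)
    if error(candidate, holdout) < error(current, holdout):
        return candidate, True
    return current, False


# Hypothetical data where y is exactly 2 * x, and a stale live model.
rows = [(x, 2.0 * x) for x in range(1, 21)]
live = {"slope": 1.5}
live, promoted = retrain_and_maybe_promote(live, rows)
print(promoted, live)
```

Running this loop on every cycle, rather than once after a months-long handover, is the Agile part: each iteration either ships a better model or leaves the live one untouched.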

Value creation

When planning how to build value from raw data up to iterative predictions, data science teams can turn to the data-value pyramid. It provides a conceptual structure for visualizing a project's progress, so teams can represent their sequential progress in a logical form. The data-value pyramid is one of the tools that Agile Data Science offers. With each development cycle, the team gets a clearer picture of where it stands, and with that comes better productivity.

So, can Agile and Data Science be successful together?

The discussion above looked at how Agile fits into Data Science and whether it is worth the effort. Experts say that Agile methodologies are expected to become more common in Data Science projects in the near future, and many data scientists have reported that it makes them more productive. Agile obviously does not change a data scientist's skill. However, it can help them optimize their projects: instead of spending time on models that are unlikely to produce useful results, they can spend that time on more result-driven work.

Hopefully, after reading this post, you have a better idea of how to apply Agile to data science and of the potential pitfalls. Despite some of the challenges, I believe Agile and data science go well together.

Raj Keshri

Sr. Tech Lead | Azure Solutions | DevOps | MS Stack | M.Tech | PhD Scholar

5 years ago

Great post.. Keep up the good work through sharing all these

Amit Chakraborty

Pragmatic Certified AI Product Management || Content Intelligence || ML and Gen AI Architect || PhD Scholar || 5 AI Patents

5 years ago

Nice read - an interesting point would be to assess the suitability of type of agile methodology used in these kind of projects - which i think are more suited to a kanban nature

Sourav Bhattacharyya

Principal Architect - Artificial Intelligence - Azure Open AI, AWS Bedrock, GCP Vertex AI, IBM Watsonx, NVIDIA NEMO; Leading COEs & Technology Competencies.

5 years ago

Well written on a very important topic! In my opinion, importance of "Agility" in making business decisions is undeniable; engaging Business partners during SDLC is crucial for success. But, the "methodology" of Agility, needs to redefined, it's radically different for Data Science & AI.

Akash K.

Sr. Solution Consultant || Rising Star | ITIL4 | 4 x CIS - ITSM, HR, SPM, GRC | Gen AI | CMDB | Platform Analytics | Integration Specialist || PG Data Science (ML), IIIT Bangalore

5 years ago

Good to know Agile usage in Data Science processes

Samyabrata Chakrabarty

Driving Data & AI Innovation at Tata Consultancy Services | Gen AI Advocate | Azure Expert | In an infinite loop of learning & unlearning | Ex-Cognizant | Microsoft Certified (3x) | Photography Aficionado

5 years ago

Good read ! I just hope that business people and project sponsors understand the difference between the iterative approach for AI and data science problems and traditional software development problems !!!

More articles by Ankit Kumar Shaw