Using Agile & Kanban to Manage your Data Science Projects

Recently I have been blogging about education and technology, but this time I want to focus on data science. For a few years now I have had a small data science team as part of the software engineering organization I manage. I have valued the role of data science all along, but I must admit that, except for my time in Bing, I was only managing it part-time. That changed about a year ago: I am now back to managing a pure data science team full-time, and this time I am part of what I would call a nascent area. It is this aspect that recently led me to adopt Kanban as the process we use for our team project management.

KENJ ASIDE: If you want the bottom line up front, here it is. Kanban is working well for us so far. We can adapt to a rapidly shifting world of data, insights, and requirements, which lets us add value in short order. That, and we like it.

The part of Microsoft that I am operating in runs a very fast-paced, agile data science organization. We have a methodology we call the “Insights Factory” (IF). The primary focus of the IF process is to produce valuable insights rapidly.

Image 1: Images from a talk I gave in 2012 on the Five Vs of Big Data

The Insights Factory fully embraces the Five “V”s of Big Data, with an emphasis on my favorite. What’s my favorite, you ask? Value, of course!

You can’t have a data science team that is pure research and what-if. You can’t just focus on getting data or getting more data. A great data science team will produce value all along the way, and when they hit a blockage or a dead end, they will back up and move along a new path.

Image 2: Insights Factory (IF) Model from Microsoft Operating Systems Group

This model is great. My team and I have produced dozens of valuable insights that improved our engagement in the Education sector and increased overall monetization. You can’t argue with results like that: pushing back competition and growing revenue. Those are the kind of results any manager would want, all the time.

My main challenge as a manager in this space has been project management. I started off with the traditional project list, then moved to Insights Factory proposal documents, and then to emails and lists in OneNote. None of those scaled well, and there were a couple of simple reasons why.

  • You don’t usually know what the next best question is until you answer the current question.
  • You may have a question you want to answer but you may need to wait on the data. 

You won’t usually know the next question

In software development you plan usage scenarios, but you don’t usually know all the ways a user will interact with your feature. Not knowing everything when developing a feature has worked just fine for me over the years. In fact, that’s why I’m a big fan of the Minimum Viable Product (MVP) model: I can ship and learn from the users what they value. The problem was that we brought to our data science work the belief that we could figure out what was important up front. That is why we would produce IF project documents. They were like small specs for a full data research project.

Image 3: Reproduction of the Minimum Viable Product model as seen in multiple blogs

KENJ ASIDE: I also wrote a reciprocal post on MVP for software testing called “Minimum Viable Quality.”

I like the IF project documentation process because it forces the Data Scientist to think critically about the project before they engage. They need to think about likely steps, likely data sources, and most importantly think about the value they believe the project will produce.

Where the document falls short is that it can rarely tell you every step you will need to take along the way. It is too difficult to produce the full roadmap up front for a complicated project. What we needed in our Data Science team was something more akin to the MVP model but for insights projects.

Projects will stall waiting on data

None of my data science projects has ever landed without the need to do some new pipeline work. In a traditional data warehousing model, folks might call this writing ETLs. In my world of multi-terabyte streams, we usually call it pipeline work. I like managing pipeline work as much as managing insights work. They usually need each other to achieve business impact.

One lesson I have recently re-observed in my current role is that mature systems will have more complete data streams, while new systems will be constantly evolving. The work I am currently doing is in an evolving area, and we are quite dependent both on integrating older, disparate data streams and on pushing out new instrumentation and tapping into new data streams.

This produces a stop-and-go effect in our work and is one of the reasons my teams often have multiple projects running in parallel.

Kanban seems to be the ticket, at least for us

I have used traditional agile practices with software development teams, with one-month and two-week iterations. We’ve tracked our backlogs, planned our sprints, and had healthy results.

In this current data science space, that wasn’t working well. We would answer a question and produce four new questions. We would also run into situations where we would finish an insight and the new question would require more data. How to make and track these tradeoffs was becoming quite the challenge.

A very good friend of mine recently published a new book titled “Agile Project Management with Kanban.” I picked up a copy and read it. Of course I knew about Kanban. Some of the teams I’ve managed in the past have used it, but I was never hands-on with a Kanban project. It occurred to me, though, that the continuous flow of Kanban was ideal for my current team’s situation.

Kanban gave us the speed and simplicity of agile development but freed us up to shift priorities as needed. We also had an easy way to track work by simply putting a sticky on the wall and tracking it through the stages.
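
As a rough illustration of that flow, here is a minimal sketch in Python. It is not the team's actual tooling (the post describes physical stickies on a wall), and the stage names, the `Card` and `Board` classes, and the example questions are all hypothetical. It shows the two things that made sprints awkward: a finished question spawning new ones, and a card parking while it waits on data.

```python
from dataclasses import dataclass, field

# Hypothetical stage names; the post does not list the team's actual columns.
STAGES = ["Backlog", "Waiting on data", "In progress", "Review", "Done"]


@dataclass
class Card:
    """One sticky note: a question the team wants to answer."""
    question: str
    stage: str = "Backlog"


@dataclass
class Board:
    cards: list = field(default_factory=list)

    def add(self, question, stage="Backlog"):
        """Put a new sticky on the board and return it."""
        card = Card(question, stage)
        self.cards.append(card)
        return card

    def move(self, card, stage):
        """Move a sticky to another column, rejecting unknown columns."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        card.stage = stage

    def complete(self, card, follow_ups=()):
        """Finish a card and push any new questions it raised onto the backlog."""
        self.move(card, "Done")
        return [self.add(q) for q in follow_ups]

    def in_stage(self, stage):
        """List the questions currently sitting in one column."""
        return [c.question for c in self.cards if c.stage == stage]


# Example questions are made up for illustration.
board = Board()
first = board.add("Which features drive engagement?")
board.move(first, "In progress")
new_cards = board.complete(
    first,
    follow_ups=["Does this hold by region?", "What about new users?"],
)
# One follow-up cannot start until new instrumentation lands.
board.move(new_cards[0], "Waiting on data")
```

Because nothing is tied to a sprint boundary, the follow-up questions simply join the backlog, and the blocked one waits in its own column without counting against anyone.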

I implemented Kanban with the team about five weeks ago. The number of status reports that I have to slog through and update has dropped precipitously. Team satisfaction and cohesion seem much improved. One nice thing about this approach is that three mornings a week we spend 10 minutes going over status and what is next. When one member of the team is blocked, they can listen to their peers and jump in to help where needed. There is no penalty for not getting a story completed within a sprint. They can be productive where productivity is most needed.

We are only a month in with our Kanban process, but it seems to be working. I’ll try to post an update on the topic in a few months to document some lessons learned.

Christopher Nannini

Litigation and Business Law Attorney @ Fenton & Keller

6 years ago

Great article! I really appreciate your discussion on topics concerning Minimum Viable Product (MVP), Agile, and Kanban. I am starting to integrate Agile practices and Kanban with my data science teams. I face similar challenges that your teams were facing with traditional project management techniques. Thank you.

Ken Johnston

Executive Leader for Cloud Engineering and Data Science organizations focused on the use of Connected Vehicle Data for Predictive Maintenance, Privacy, Quality of Service, Fleet Management and DevOps.

6 years ago

Published my new article, "5 keys to Successful Data Spelunking." Please check it out, like, share, and leave a comment. https://www.dhirubhai.net/pulse/5-keys-successful-big-data-spelunking-ken-johnston/

Srinivasa Raghavan

Sr. Data Scientist at Accenture AI

7 years ago

Really a good article on a less-talked-about area in data science: managing projects. Looking forward to more Kanban/Agile-type articles.

Ken Johnston

Executive Leader for Cloud Engineering and Data Science organizations focused on the use of Connected Vehicle Data for Predictive Maintenance, Privacy, Quality of Service, Fleet Management and DevOps.

7 years ago

I finally found time to write a follow-up on our use of Kanban. I'll be releasing it as a series instead of one post. I've gone deeper on about 7 follow-up topics. https://www.dhirubhai.net/pulse/kanban-data-science-projects-overview-ken-johnston Future posts will cover lessons learned from using Kanban for two years now, insight reviews vs. retrospectives, constraints, dealing with project management overhead, implementing minimum viable models (MVM), the tyranny of counting, blurred lines between data science and data engineering, dealing with stalled projects, when you don't know the next question, and experimenting with differential privacy.

Ken Johnston

Executive Leader for Cloud Engineering and Data Science organizations focused on the use of Connected Vehicle Data for Predictive Maintenance, Privacy, Quality of Service, Fleet Management and DevOps.

8 years ago

I just found a weird bug in LinkedIn comments. If I am logged in (using Edge or Firefox), LinkedIn does not show me the comments on a post at the bottom of the post. I only recently saw the comments when looking at my blog stats. Weird. I noticed a few asks in the comments. Are we still using Kanban? Yes. Can I explain more about budgets, how we integrate with other DS projects, how we deal with the serial nature of DS work, and whether any of the algos are in production? On the in-production question, the answer is yes. Tons of our work is in production. I'll try to make a new post in the next couple of weeks answering those questions.
