
Using Agile & Kanban to Manage your Data Science Projects

Ken Johnston

Executive Leader for Cloud Engineering and Data Science organizations focused on the use of Connected Vehicle Data for Predictive Maintenance, Privacy, Quality of Service, Fleet Management and DevOps.

Published: May 18, 2015

Recently I have been blogging about education and technology, but this time I want to focus on Data Science. For a few years now I have had a small data science team as part of the software engineering organization I have been managing. I have valued the role of data science all along the way, but I must admit that, except for my time in Bing, I was part-time. That changed about a year ago. I am now back full-time managing a pure data science team, and this time I am part of what I would call a nascent area. It is this aspect that recently led me to adopt Kanban as the process we use for our team project management.

KENJ ASIDE: If you want the bottom line up front, here it is. Kanban is working well for us so far. We can adapt to a rapidly shifting world of data, insights, and requirements that let us add value in rapid order. That and we like it.

The part of Microsoft that I am operating in runs a very fast-paced agile data science organization. We have a methodology we call the “Insights Factory.” The primary focus of the IF process is to produce rapid, valuable insights.

Image 1: Images from a talk I gave in 2012 on the Five Vs of Big Data

The insights factory fully embraces the Five “V”s of Big Data with an emphasis on my favorite. What’s my favorite you ask? Value of course!

You can’t have a Data Science team that is pure research and what-if. You can’t just focus on getting data or getting more data. A great data science team will produce value all along the way, and when they hit a blockage or a dead end, they will back up and move along a new path.

Image 2: Insights Factory (IF) Model from Microsoft Operating Systems Group

This model is great. My team and I have produced dozens of valuable insights to improve our engagement in the Education sector as well as to increase overall Monetization. You can’t argue with results like that: pushing back competition and growing revenue. Those are the kind of results any manager would want, all the time.

My main challenge as a manager in this space has been project management. I started off with the traditional project list, then insights factory proposal documents, and then emails and lists in OneNote. None of those scaled well, and there were a couple of simple reasons why.

You don’t usually know what the next best question is until you answer the current question.

You may have a question you want to answer but you may need to wait on the data.

You won’t usually know the next question

In software development you plan usage scenarios, but you don’t usually know all the ways a user will interact with your feature. Not knowing everything when developing a feature has worked just fine for me over the years. In fact, that’s why I’m a big fan of the Minimum Viable Product (MVP) model: I can ship and learn from the users what they value. The problem was that we brought to our data science work the belief that we could figure out what was important up front. That is why we would produce IF project documents. They were like small specs for a full data research project.

Image 3: Reproduction of the Minimum Viable Product model as seen in multiple blogs

KENJ ASIDE: I also wrote a reciprocal post on MVP for software testing called Minimum Viable Quality

I like the IF project documentation process because it forces the Data Scientist to think critically about the project before they engage. They need to think about likely steps, likely data sources, and most importantly think about the value they believe the project will produce.

Where the document falls short is that it can rarely tell you every step you will need to take along the way. It is too difficult to produce the full roadmap up front for a complicated project. What we needed in our Data Science team was something more akin to the MVP model but for insights projects.

Projects will stall waiting on data

None of my Data Science projects has ever landed without the need to do some new pipeline work. In a traditional data warehousing model folks might call this writing ETLs. In my world with multi-terabyte streams we usually call it pipeline work. I like managing pipeline work as much as managing insights work. They usually need each other to achieve business impact.

One lesson I have recently re-observed in my current role is that mature systems will have more complete data streams, while new systems will be constantly evolving. The work I am currently doing is in an evolving area, and we are quite dependent both on integrating older disparate data streams and on pushing out new instrumentation and tapping into new data streams.

This produces a stop and go effect in our work and is one of the reasons my teams often have multiple projects running in parallel.
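To make the stop-and-go effect concrete, here is a minimal Python sketch of how a team might decide what can move forward today. Everything in it is an assumption for illustration: the `Project` class, the `workable` helper, and the project and data stream names are hypothetical and not taken from the Insights Factory process.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """A hypothetical insights project and the data streams it depends on."""
    name: str
    priority: int  # lower number = more important
    needs: set[str] = field(default_factory=set)


def workable(projects, available_streams):
    """Return projects whose required streams all exist, most important first."""
    ready = [p for p in projects if p.needs <= available_streams]
    return sorted(ready, key=lambda p: p.priority)


# Hypothetical projects running in parallel.
projects = [
    Project("churn insight", 1, {"telemetry_v2"}),
    Project("engagement insight", 2, {"usage_logs"}),
    Project("monetization insight", 3, {"usage_logs", "billing"}),
]

# Only one stream has landed so far, so the top-priority project is stalled.
available = {"usage_logs"}
ready_now = [p.name for p in workable(projects, available)]
print(ready_now)
```

With only `usage_logs` available, the highest-priority project is stalled on pipeline work, and the team picks up the next one instead. That is the reason for keeping several projects in flight at once.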

Kanban seems to be the ticket, at least for us

I have used traditional agile practices with software development teams with one-month and two-week iterations. We’ve tracked our backlogs and planned our sprints and had healthy results.

In this current data science space that wasn’t working well. We would answer a question and produce four new questions. We would also run into the situation where we would finish an insight and the new question would require more data. How to make and track these tradeoffs was becoming quite the challenge.

A very good friend of mine recently published a new book titled “Agile Project Management with Kanban.” I picked up a copy and read it. Of course I knew about Kanban. Some of the teams I’ve managed in the past have used it, but I was never hands-on with a Kanban project. It occurred to me, though, that the continuous flow of Kanban was ideal for my current team’s situation.

Kanban gave us the speed and simplicity of agile development but freed us up to shift priorities as needed. We also had an easy way to track work by simply putting a sticky on the wall and tracking it through the stages.
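The sticky-on-the-wall idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stage names in `STAGES`, the `Board` class, and the example questions are all hypothetical, and the post does not say which stages the team actually used.

```python
# Assumed stage names; the real wall may have used different columns.
STAGES = ["Backlog", "In Progress", "Waiting on Data", "Done"]


class Board:
    """A tiny in-memory Kanban board: one list of cards per stage."""

    def __init__(self, stages):
        self.columns = {stage: [] for stage in stages}

    def add(self, card, stage="Backlog"):
        self.columns[stage].append(card)

    def where(self, card):
        """Return the stage a card is in, or None if it is not on the board."""
        for stage, cards in self.columns.items():
            if card in cards:
                return stage
        return None

    def move(self, card, to_stage):
        """Move a card to another stage, like moving a sticky on the wall."""
        if to_stage not in self.columns:
            raise ValueError(f"unknown stage: {to_stage}")
        current = self.where(card)
        if current is None:
            raise ValueError(f"unknown card: {card}")
        self.columns[current].remove(card)
        self.columns[to_stage].append(card)


board = Board(STAGES)
q1 = "Q1: what drives engagement?"
board.add(q1)
board.move(q1, "In Progress")
board.move(q1, "Done")

# Answering one question tends to produce new ones; they go on the backlog.
q2 = "Q2: does it hold by region?"
q3 = "Q3: what about new users?"
for follow_up in (q2, q3):
    board.add(follow_up)

# A question that needs more data is parked rather than counted as a failure.
board.move(q3, "Waiting on Data")
print(board.columns)
```

Because nothing is tied to a sprint boundary, new questions and parked work are just cards in a column, and priorities can shift by reordering the backlog.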

I implemented Kanban with the team about five weeks ago. The number of status reports that I have to slog through and update has dropped precipitously. Team satisfaction and cohesion seem much improved. One nice thing about this approach is that three mornings a week we spend 10 minutes going over status and what is next. When one member of the team is blocked, they can listen to their peers and jump in to help where needed. There is no penalty for not getting their story completed for this sprint. They can be productive where productivity is most needed.

We are only a month in with our Kanban process but it seems to be working. I’ll try to make an update on the topic in a few months to document some lessons learned.

Christopher Nannini

Litigation and Business Law Attorney @ Fenton & Keller

6 years ago

Great article! I really appreciate your discussion on topics concerning Minimum Viable Product (MVP), Agile, and Kanban. I am starting to integrate Agile practices and Kanban with my data science teams. I face similar challenges that your teams were facing with traditional project management techniques. Thank you.

1 reply

Ken Johnston


6 years ago

Published my new article, "5 keys to Successful Data Spelunking." Please check it out, like, share, and leave a comment. https://www.dhirubhai.net/pulse/5-keys-successful-big-data-spelunking-ken-johnston/

Srinivasa Raghavan

Sr. Data Scientist at Accenture AI

7 years ago

Really a good article on a less talked-about area in data science: managing projects. Looking for more Kanban and agile articles.

1 reply

Ken Johnston

7 years ago

I finally found time to write a follow up on our use of Kanban. I'll be releasing it as a series instead of one post. I've gone deeper on about 7 follow-up topics. https://www.dhirubhai.net/pulse/kanban-data-science-projects-overview-ken-johnston Future posts will cover these lessons learned while using Kanban for two years now, insight reviews vs retrospectives, about constraints, dealing with project management overhead, implementing minimum viable models (MVM), the tyranny of counting, blurred lines between data science and data engineering, dealing with stalled projects, when you don't know the next question, and experimenting with differential privacy.

Ken Johnston

8 years ago

I just found a weird bug in LinkedIn comments. If I am logged in (using Edge or Firefox), LinkedIn does not show me the comments on a post at the bottom of the post. I only recently saw the comments when looking at my blog stats. Weird. I noticed a few asks in the comments. Are we still using Kanban, yes. Can I explain more about budgets, how we integrate with other DS projects, how we deal with the serial nature of DS work, and are any of the algos in production. On the in production question, the answer is yes. Tons of our work is in production. I'll try to make a new post in the next couple of weeks answering those questions.
