Starting Slow And Scaling Sustainably – Boosting Business Through Data Science Series
Florian Roscheck
Sr. Data Scientist at Henkel | Teams. Data. Science. Products. | Boosting business through data science for the sustainable good of people.
Are you tired of seeing corporate data science initiatives fail? In this 3-article series, I summarize my experience with what works and does not work when tackling data science as a non-tech company. This article explores 5 key measures that can help data science initiatives grow and flourish sustainably.
Overview: Boosting Business Through Data Science Series
From the first article in this series, you know which business value you will deliver through data science. Now, you “just need to do it”. If your company is not a tech company, you are in for many challenges here. In this article, I would like to focus on challenges I have seen companies especially struggle with and suggest key measures towards solving those: Find a top management champion, think product, make a platform decision, track value creation, and build an interdisciplinary team.
Find a Top Management Champion
Assuming that modern data science solutions are not the norm at most companies, the writing is on the wall:
Solving a business case around data science means change.
It means doing things differently. It means that somebody needs to be courageous enough to do things differently. This somebody should be a top manager at the company: a leader, ready to champion a vision and strategy for data and support the framework for implementing it. This top manager is sometimes given the title "Chief Data Officer". Finding a sponsor who oversees the entire value creation chain is also a recommendation issued by Fountaine et al. for scaling AI (Fountaine et al., 2021). How so?
Along your journey to value creation from data science, you will face organizational challenges big enough to derail your commitment to business value creation. There is a risk of getting lost in business processes that were never meant to be used for data products that are developed iteratively. The whole discipline is called data “science” – meaning that outcomes are uncertain and delivered gradually. Nothing can be known until data is assembled, analyzed, and modeled. No value is created until a data science solution is embraced by users. Generating business value through data science takes time and a discovery-based approach.
The top management champion needs to create a user- and use-case-focused niche in the organization where the first data science product team can operate with minimal constraints, allowing it to scale as needed to deliver business value as quickly as possible. As the organization’s data maturity develops, data product learnings from this niche must be applied to the organizational setup at large to enable further data science use cases. Here, the top management champion has to take a guiding role.
It is important to note that the management champion is not operating in a vacuum. Their potentially disruptive actions and budget demands can be justified to stakeholders and shareholders if they derive from a larger vision or strategy around the value of data for the company. If no such concept or strategy exists, there is a risk that stakeholders do not cooperate with the champion, putting value creation through data and data science at risk. In this situation, even promising projects may fail to gain traction for reasons unrelated to their value proposition. The relatively short tenure of Chief Data Officers is a testament to the difficulties of embedding data science into legacy organizations (Davenport et al., 2021).
From my experience, there is little chance that a high-value data science initiative will succeed without a top management champion. Users of the data science product usually need to be more connected to technical data experts. Providing the organizational means to collaborate as closely as possible requires the formal authority to change the organization towards this model. Failure to provide top management support risks the project being sidetracked by secondary organizational and technical interests, setting it on track to become another project on the pile of “data things we have tried but were not successful with”. Leading a data science project also means being able and courageous enough to stop the project when it becomes clear that user value creation is not happening.
Think Product
In their article “Approach Your Data with a Product Mindset”, Jedd Davis, Dave Nussbaum, and Kevin Troyanos argue that
Digital transformation is “[…] a matter of identifying each team’s unmet needs, developing analytics point solutions that address these needs, abstracting these solutions into analytics products that can be used to address additional needs down the line, and packaging these products in a way that drives adoption by stakeholders across the organization.” (Davis et al., 2020)
Iteratively building a product based on user feedback is a great way to develop data science products. Agile principles, in contrast to traditional waterfall principles, provide an excellent basis for companies to do so.
As discussed above, data science, being “science”, is uncertain. One cannot know whether the idea for generating user value will work until experiments with the data are done. As the product is being developed, more data may be needed or methods must be adjusted. An agile approach with an interdisciplinary team that collaborates with users, in contrast to a waterfall throw-over-the-fence-please-solve-my-problem approach, will allow dynamically adapting to the organization's reality and identifying and pushing its bounds where required. A fast feedback loop ensues, minimizing the risk that the data science project is derailed from delivering its estimated high ROI or, if unsuccessful, quickly providing transparency that the project is going nowhere.
Taking an agile approach will also prevent you from falling for one of the many fallacies with data: data collection without purpose. There is no point in collecting data that “you might need in the future for some project”. This is a dangerous approach because collecting and maintaining data takes effort, tying up resources that the organization could otherwise use to develop user-focused data projects based on an actual need. An agile approach will dictate data collection only if needed to solve a business case.
An agile product approach can be an enormous challenge in traditional matrix organizations. For starters, you want all team members to be fully committed to the project so that it can achieve its business value without distractions. The most challenging issue here is that you need to form a product team. A product team fully committed to providing business value to the project. A product team that has organizational and technical autonomy to do what it takes to generate business value for the product users.
What does that mean in practice? It means that people should be removed from the usual responsibilities of their line positions so that they can entirely focus on one product. This will enable the product to be developed quickly. It also means that administrative and functional reporting for product team members may need to be split. In an agile team, every developer's work is the result of features requested by the product owner. A good product owner is not per se a people manager with disciplinary responsibility but would take on the employee’s functional reporting line as the product develops. Administrative reporting, people development, and other people management tasks must be continued by the dev team member’s line manager. There is potential for conflict as line managers need to “let go” to some extent and entrust the employee’s work management to the product owner. There are many frameworks and opinions about how to manage agile organizations, and reading up on them is recommended, for example in the excellent overview by Parabol (Parabol, 2023).
Building interdisciplinary teams in a matrix organization is also challenging for budgeting reasons. No matrix column can claim “full ownership” of product outcomes or “full control” of the features being implemented in the product. Revenue, cost, and profit thus need to be split among different units of the organization. This can be a tricky task and involves conflict and convincing. This is where the importance of having a top management champion manifests itself once again.
Lastly, if this is your first time working with Agile, you must provide training and likely create new job profiles. For many companies, the time-tested Scrum framework is a good starting point for building products through agile principles. In Scrum, team members take on different roles. At the bare minimum, you need to have a product owner who is close to users and understands the technological context in which the developer team operates. In addition, you will have to identify a Scrum master to lead the product team through the agile process. Henrik Kniberg offers a great checklist that helps you understand which directions your company can take to improve its Scrum practices (Kniberg, 2022).
While there are many things to consider, I would like to offer my perspective on one particular aspect of Scrum: If you skip having a separate Scrum master, this role will likely fall to the product owner. This can be a dangerous gambit because the additional burden takes focus away from the product owner’s responsibility of discovering user needs and channeling them into the product. It puts the creation of business value at risk, which is not a good idea at the beginning of your journey into boosting business through data science.
Make a Platform Decision
Here comes the one central value unlock that takes work: Deciding on the right data platform. Given my personal experience, making a decision here is essential. You have to make it early. And you have to get it right. How so?
There are many, many ways to build data science products today. When you hire a data team, most team members have likely used data platforms and machine learning solutions. There will be differing opinions about which is the best. Vendors big and increasingly small know this and deliver aggressive sales pitches to both top management and data scientists. The competing views can make heads spin, and not making a decision will result in a fractured data platform landscape in which components do not work together well and, once again, the creation of business value is at stake. This is why it is essential to decide on a platform early and stick with the decision at least until the product has outgrown the platform's capabilities.
How to make this decision? Likely, the decision for a data platform is made at the top management level. From my experience, it is of absolute importance to make an informed decision based on the experience of the organization’s technical data science, MLOps, and data engineering experts. Most likely, the developers in your product team are some of the only experts on data science in your organization, so you have to consult with them. Due to their experiences with multiple platforms, they know what to look for on the technical side, which is the most crucial aspect of a platform to provide value to a business.
It is vital that whatever platform you choose provides ongoing technical support. You are looking for a partner, not a mere vendor. No platform is perfect and as developers try to bend it to your organization’s needs, they are likely to encounter technical challenges that need to be solved with the platform provider. It is also essential to consider the functional constraints when deciding on a platform. Does the platform provide a cybersecurity concept in line with what your organization demands? Can data scientists install and run the latest open-source code specific to your business on the platform? Can the platform interface with the various data sources you have in your business? Does the platform support all components of business value creation, from data storage and modeling all the way to providing a user interface? Does the platform support separating development from production workspaces? Are the platform and its cost structure scalable so that you can serve anywhere from 10 to 10,000 users? Is the code built on proven technology that is easy to hire for? There are many questions here. You must answer them and come to a decision.
Once again: You want to avoid a fractured data infrastructure landscape. It will add an unprofitable overhead and reflect poorly on your cost structure in the long term. A fractured data infrastructure landscape will result in you having to hire for an unreasonably diverse set of skills and lead to the immobility of tech workers in the organization, leading to frustration and ultimately shorter tenures of valuable human capital. So: Consult with your technical experts, decide, and stick with it.
Formally Track Business Value Creation
Although implicitly linked to building a product in an agile way, this point is so important that it deserves an explicit mention: You need to formally track business value creation. When identifying top use cases, you have already built a framework to assess data science value creation. To refine this framework for future projects, see what works and what does not. And, to steer the project, you must have transparent information about how much business value is created. You want to make business value creation measurable.
For data science products, this is harder than it sounds. Assume your product focuses on prediction and, through data science, you are now able to make a better prediction on which you take action. Most likely, you measure the result of the action you took based on the prediction as an indicator of the performance of your project. If you had not used data science for prediction, you might have taken a different action. But then, to see which action was more successful, you would have to take both actions, measure their results, and compare them to each other. Unfortunately, taking both actions is at best uneconomical and at worst infeasible. So, measuring the success of data science products takes either additional budget, extensive negotiation between stakeholders to find an adequate proxy to measure success, or, and here is where the top management champion comes in again, a leap of faith and a strategic vision.
Nevertheless, despite it being difficult, you need a formalized framework to track business value creation. In the long term, it will help you to see through the buzzwords, steer your decisions, and justify your actions to internal and external stakeholders. And, hopefully, it will show the true value data science brings to your company!
For tracking value creation on a portfolio level, the Objectives and Key Results (OKR) framework has gained popularity. It helps to formalize objectives and measures the organization’s progress towards those objectives. As a prerequisite, it requires a strategy – thus, once again, a top-level initiative and champion to align the portfolio direction with the company’s vision. On a product level, the product owner is responsible for measuring business value. The measurement should be aligned with the context of the formalized use case (see the first article). Depending on the desired level of confidence, business value can be determined as Net Present Value, Return on Investment, or even by simply guessing. For products which are already deployed and providing value, Key Performance Indicators (KPIs) are a viable way to ensure continuous delivery of business value.
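To make the two named measures concrete, here is a minimal sketch of how Net Present Value and Return on Investment could be computed for a data science product. The function names, the 8% discount rate, and all cash-flow figures are hypothetical and chosen only for illustration; your finance department will have its own conventions.

```python
from typing import Sequence


def net_present_value(rate: float, cash_flows: Sequence[float]) -> float:
    """Discount yearly cash flows to today.

    cash_flows[0] happens now (year 0), cash_flows[1] after one year, etc.
    """
    return sum(cf / (1.0 + rate) ** year for year, cf in enumerate(cash_flows))


def return_on_investment(total_gain: float, total_cost: float) -> float:
    """ROI as a fraction: net gain divided by total cost."""
    return (total_gain - total_cost) / total_cost


# Hypothetical product: 100k invested upfront, 60k yearly benefit for 3 years.
flows = [-100_000.0, 60_000.0, 60_000.0, 60_000.0]
npv = net_present_value(rate=0.08, cash_flows=flows)
roi = return_on_investment(total_gain=180_000.0, total_cost=100_000.0)

print(f"NPV at 8% discount rate: {npv:,.0f}")
print(f"ROI over three years: {roi:.0%}")
```

Both numbers depend entirely on the estimated yearly benefit, which is exactly the quantity that is hard to measure for prediction-driven products. Treat the inputs as assumptions to be revisited as the product evolves.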
A (repeated) word of caution: The first value proposition for your product is never the final one. As with any agile deliverable, you need to update it as your project progresses and your users discover how your product solves their problems.
Build an Interdisciplinary Team
So you want to build a data science product in an organization without formalized data science products? To me, this is like wanting to build a water park in the desert. It can be done, but it takes a variety of experienced experts to go unconventional ways when needed.
At first thought, hiring a few data scientists is enough to build a data science product. But this is another fallacy! A data science product is merely one with data science at its core. Delivering data science to users takes much more than data scientists. While there are no one-size-fits-all approaches to building the perfect data science product team, I have seen some repeating patterns of what is needed to get it right. Here are the roles you might be looking at in your team: data scientist, DevOps/MLOps engineer, frontend designer, data engineer, full-stack engineer, and customer success engineer.
Let me walk you through why these roles would be important.
The data scientist’s role is at the core of the product – they transform data into models that the customers use to create business value. In a modern data science endeavor, data scientists should know how to do machine learning in the cloud. They should know how to perform data science within the legal constraints that apply to their product (e.g. GDPR) and take ethical considerations into account. If this is too much to ask of your data scientist, consider getting a legal and ethical expert on your team. To build a data science product, data scientists cannot operate in a vacuum.
The DevOps/MLOps (Machine Learning operations) engineer role is as crucial as the data scientist role. They are responsible for providing the computational infrastructure on which the data science product is developed and deployed. They set up continuous integration and deployment pipelines which help turn code into a working application quickly. DevOps/MLOps experts also map the complex requirements of cybersecurity and outsourced IT onto the project, a complicated endeavor. Without such a role, the data science role is overburdened with caring about infrastructure and security, which puts the core value of the product, the data science, and therefore the generation of business value at risk. Suppose you consider not hiring an expert here and are thinking about leaving the DevOps/MLOps responsibilities to the data science role. In that case, you should know that data scientists commonly do not possess the necessary skills to manage secure computational infrastructure in a way that allows scaling. Thus, not hiring a DevOps/MLOps engineer on the team can be a costly mistake.
That I list the frontend designer role may come as a surprise to some. The issue is that the frontend is the user's primary interface to the data science product. As such, it significantly impacts the adoption of the data science solution. If the frontend is cumbersome to use, then users will not enjoy the product and the value proposition of the product will not transform into actual business value. What kind of skills the frontend role should bring depends on the product's use case. While design and user experience are the focus, this role also covers frontend development.
What happens if you leave the frontend design to the data scientists? As a data scientist myself, I can tell you that we are not really good (=bad) at building frontends. The maximum we can do well is providing an interface to our models as an API that IT experts can call from inside other software products. But, since for high-value use cases in most companies users are usually not IT experts, such an interface will be unusable, at least from my experience. Forcing data scientists to build any other interface will set them on a painstaking path to upskill in user interface design – an area far removed from their expertise with a steep learning curve they most likely do not enjoy. An ask that removes them from what they are good at: Providing user value through data science.
When would you want a data engineer? When you are dealing with large amounts of data from different sources. Whether you need a data engineer depends on your use case. Aggregating data from various sources can be a time-consuming and complex challenge. To provide business value, use cases often require that data be updated in near real-time. This requires a specialized skillset that data scientists still need to gain. Depending on your organization's data maturity, you may need an established data governance model. In this case, the data engineer will likely interact with different platforms and data in various states of cleanliness. Once again, this is expert territory that warrants including a data engineer in your team. If you are dealing with small, centralized data sets, however, a data engineer may be optional. One parting note here: If your organization lacks a proper data governance model, having a top management champion on your team becomes even more critical to accelerate the negotiation of who can use what data for which purpose. Your data engineers will thank you.
Your team may include a full-stack engineer. When is this needed? It is needed if you need to build a full-blown data science-based application with a user interface that extends beyond evaluating the results of a machine learning model. It is required when your data science application becomes business software. Let me give you some background: As you serve your customers’ needs, you will begin separating frontend (user interface) from backend (machine learning model and increasingly other features like visualization). For a proper separation, you must define clear, maintainable interfaces between frontend and backend. To make your software reliable, you must develop and run a suite of automated tests. This is far beyond the capability of a data scientist and not within the expertise of the frontend designer as described above either. Writing a functional piece of software is within the expertise of a full-stack engineer though, so having one will add valuable skills to your team. In addition, the full-stack engineer can support the frontend designer with developing the frontend. If your application is a simple interface to a machine learning model, having a full-stack engineer might be overkill.
Let’s talk about the customer success engineer role. As your product hopefully gains traction with the users, expect support requests to pile up. This is entirely normal and desired – most likely your documentation is not fully fleshed out yet and you should be excited that users want to use your product. But who should help the users? Putting any existing team member on the case will drag them away from their primary task and stall the creation of additional business value. If this is fine for you (and the team members), you do not need to hire a customer success engineer.
But given that it is so essential in an agile project to give users what they want and excite them, think about the additional benefit a customer success engineer could bring: They could provide users with the feeling of being heard and take explicit responsibility for channeling feedback and requests to the product owner. In all likelihood, the customer success engineer has to possess good technical skills (thus “engineer” in the title) so that they do not depend much on the dev team to show them how to work with the application. They may even be able to implement small changes themselves. And, they are a valuable resource for building documentation and tutorials which help widen the user base and deliver business value.
In the next article in the "Boosting Business Through Data Science" series, we will discover what it takes to scale up data science efforts in business with a focus on upskilling, growing a data culture, and portfolio management. Follow me here on LinkedIn to stay in the loop.
References
Davenport, T., Bean, R., & King, J. (2021, August 30). Why do chief data officers have such short tenures? Harvard Business Review. https://hbr.org/2021/08/why-do-chief-data-officers-have-such-short-tenures
Davis, J., Nussbaum, D., & Troyanos, K. (2020, May 12). Approach your data with a product mindset. Harvard Business Review. https://hbr.org/2020/05/approach-your-data-with-a-product-mindset
Fountaine, T., McCarthy, B., & Saleh, T. (2021, April 13). Getting AI to scale. Harvard Business Review. https://hbr.org/2021/05/getting-ai-to-scale
Kniberg, H. (2022, November 29). Scrum checklist. https://www.crisp.se/scrum/checklist
Parabol. (2023, June 14). Agile frameworks: A complete overview. https://www.parabol.co/resources/agile-frameworks-guide/