登录查看更多内容

An illusion of success: the problem-solution gap in data science

Alexander Polonsky

VP R&D, co-founder at Bloom

发布日期: 2020年11月13日

Data science solutions, in the form of a commercial product or custom-built, are unlike other types of software. The classic problem-solution gap, illustrated in the well-known cartoon below, can be hugely exacerbated in data science by the fact that, in the end, the customer may not realize they got something other than what they asked for. Instead, they may have the false impression that the solution is working quite well.

Customers have learned to objectively evaluate most aspects of software, such as performance, usability, or functional fit. What they have not learned yet is how to evaluate the quality of mathematical modeling in data science solutions. The reason for this is straightforward: most customers lack the expertise required to be able to see that the solution is not optimally, or not at all, suited to their needs.

In some situations, there are objective metrics that help. If a solution outperforms the competition in relation to a desired metric, such as the precision of a prediction model, then the customer rightly has a reason to be satisfied. Even so, it is not always clear which metric should be used in a given situation nor being better than competition is a guarantee of an optimal solution. Moreover, much of data analytics is not predictive or prescriptive, but rather descriptive, and for this type of analysis objective metrics are rare.

For example, a customer may look for a solution that ranks some indicators by the trend of their temporal evolution. Yet, a trend is not a formally defined mathematical concept. As a result, it is open to multiple interpretations and mathematical models. How is the customer to judge which one is the most relevant? The same applies to many other common data science problems such as integration of analysis across multiple data sources or aggregated ranking across multiple criteria.

The reader may object that it is up to the data scientists to determine the optimal solution for their customers. The problem with this is that data scientists, in their turn, usually lack sufficient domain expertise to deduce what the customer’s needs really are. The conceptual gap between data science and its users is usually simply too large to bridge in the timeframe of today’s fast-paced work environment. Moreover, customers must have an independent way of assessing the quality of the proposed solutions - just think of the difference between trusting a manufacturer’s claim versus a validation by an independent consumer agency.

The frequently proposed way out of this deadlock is to make use of the so-called Data or Analytics Translators. The idea is that one can find or train people who know just enough about both the data science and the business side of things to help bridge the gap between them. While such people can certainly help, they can't ultimately resolve the underlying issue. This is because they are not meant to be experts in either one of the domains, and therefore, cannot effectively challenge the thinking on either side of the divide.

People with strong expertise in both data science and business are quite rare, but they do exist. They may not be up for hire full-time, but may well be available for consulting projects, even despite being employed elsewhere. Hiring such experts during the phase of project conception or solution acquisition is the best strategy to minimize the risk of failure or sub-optimal results. Given the strategic impact and budgets associated with data science projects, this is absolutely worth its while.

要查看或添加评论，请登录

Alexander Polonsky的更多文章

Search Wizard: a “magic” prompt that optimizes an LLM for search

2024年3月12日

Search Wizard: a “magic” prompt that optimizes an LLM for search

Problem Search is a major, but not the only, use case for LLMs. Therefore, an LLM is not specifically optimized for…
GPT3: technological breakthrough

2023年1月30日

GPT3: technological breakthrough

Every once in a while, a major technological advance brings to mind the Arthur Clark’s observation: “Any sufficiently…

2 条评论
Social Media Analysis: talking with humanity

2020年12月15日

Social Media Analysis: talking with humanity

The publicly available Social Media data is a cluster of data sources within the vast “Data Universe”. It is huddled…

1 条评论
Data specialists versus Data generalists

2020年12月7日

Data specialists versus Data generalists

Data specialists focus on a specific data source or category (e.g.

An illusion of success: the problem-solution gap in data science

Alexander Polonsky

VP R&D, co-founder at Bloom

Alexander Polonsky的更多文章

社区洞察

其他会员也浏览了

Why Large Corporations Embrace Data Science for a Competitive Advantage

Data Science Vs Data Analysis - How data-driven decision-making is important for D2C Success?

Data Scientists: Overcoming challenges and making them stars

20 Common Mistakes Made by Data Analysts You Must Be Aware Of!

What Do You Do with Data?

The Data Science Lifecycle

The Meaning of Exploring Data: Unleashing Insights and Uncovering Hidden Stories

Data Science: A Game-Changer for Small Business Owners

6 Key Steps Of The Data Science Life Cycle Explained

Working in Tandem: How Citizen Data Scientists Can Collaborate with Traditional Ones for Analytics at Scale

Alexander Polonsky的更多文章

Search Wizard: a “magic” prompt that optimizes an LLM for search

GPT3: technological breakthrough

Social Media Analysis: talking with humanity

Data specialists versus Data generalists

社区洞察

其他会员也浏览了

Why Large Corporations Embrace Data Science for a Competitive Advantage

Data Science Vs Data Analysis - How data-driven decision-making is important for D2C Success?

Data Scientists: Overcoming challenges and making them stars

20 Common Mistakes Made by Data Analysts You Must Be Aware Of!

What Do You Do with Data?

The Data Science Lifecycle

The Meaning of Exploring Data: Unleashing Insights and Uncovering Hidden Stories

Data Science: A Game-Changer for Small Business Owners

6 Key Steps Of The Data Science Life Cycle Explained

Working in Tandem: How Citizen Data Scientists Can Collaborate with Traditional Ones for Analytics at Scale