Small Data and Artificial Intelligence: The Path to Innovation
Luis Rodrigues, PhD
Senior Data Scientist at Haleon | Generative AI & RAG Specialist | 8+ Years Driving Innovation in NLP & Recommender Systems
"Data is the new oil," a phrase coined in 2006 by the English mathematician Clive Humby, has become a common metaphor for the value of data in today's economy. However, the phrase hides a darker side of data: it cannot really be used unless it is "refined." And, unfortunately, many companies seem to interpret the metaphor one-sidedly. Many are evolving to collect and store massive amounts of data (big data), yet still few manage to generate business impact through its use.
This contrast is evidenced by a study from the Boston Consulting Group showing that, in 2020, about 60% of companies had an Artificial Intelligence (AI) strategy, yet only 10% of those companies achieved significant financial returns. Indeed, refining data has proven harder than refining oil.
According to the same study, companies that reap benefits from big data and AI projects go beyond the basics of having the data, the technology, and the right talent organized around a corporate strategy. To achieve significant financial returns, companies need to make extensive changes across many processes, integrating big data and AI as strategic components of their businesses. In addition, feedback between AI and the organization must be fostered, so that AI learns from human feedback and humans learn from AI. In other words, it is not enough to embrace big data and AI projects; it is necessary to develop a data-oriented culture. Companies that do so are five times more likely to reap benefits than those that make little or no change to their processes.
The Alternative and Progressive Path to Success: Small Data
Although the path to success with big data and artificial intelligence projects involves extensive changes in company processes, such changes do not need to be abrupt. One way to evolve organically and progressively is to embrace the opportunities of "small data." Several projects do not require a lot of data, can be carried out in a few months by people working part-time, and still result in annual financial benefits of up to $250,000, as revealed by an article from Harvard Business Review. These projects can incorporate machine learning/AI methods or focus on using basic Analytics/Statistics methods, accessible to all.
Indeed, the benefits go beyond the financial ones, extending to employees' data literacy, the democratization of data use in decision-making, and the organization's growing trust in its ability to manage data projects. In other words, small data projects help companies create the culture that big data and AI require to deliver significant benefits.
Innovation via Small Data and AI
Opting first for small data projects does not mean abandoning or postponing the search for innovation. On the contrary! In the Corporate Venture model, the process of disruptive innovation involves short learning cycles through rapid experimentation, aiming to reduce uncertainty through small bets to systematically find a way to scale the product.
These experiments can vary widely in nature and execution, ranging from qualitative research to a simulated sale, but in general they involve collecting attitudinal or behavioral data from consumers about the product under development, on the order of hundreds or thousands of observations, which characterizes them as small data projects. Indeed, the analyses performed afterward usually involve only basic statistics. Yet even with small data, AI can enhance discoveries and improve the product/market fit process. Let's delve into this, starting with how to obtain the data.
In the book The Lean Product Playbook, author Dan Olsen states that, in market research, it is common to start with qualitative research to understand the issues relevant to consumers and, subsequently, to conduct quantitative research to find out how many consumers give each answer. This approach provides the "what" to do, but not the "how" to do it. That gap can be filled by the Outcome-Driven Innovation (ODI) process created by Anthony Ulwick, which aims to make innovation predictable.
Specifically, the ODI process includes a quantitative research tool called Opportunity Scoring, used to discover opportunities and prioritize product development efforts. It asks consumers to rate the importance of, and their satisfaction with, each desired outcome of the solutions they currently use (outcomes obtained from the qualitative research). The scores of all respondents on both questions are then combined into an opportunity score for each outcome, making it possible to map the points that consumers consider essential but are dissatisfied with.
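As a sketch, the opportunity score is commonly computed as importance plus the unmet-satisfaction gap, floored at zero. The outcome names and survey scores below are hypothetical, invented purely for illustration:

```python
# Opportunity Scoring sketch. Mean survey scores on a 0-10 scale per desired outcome
# (hypothetical data):
outcomes = {
    "fits comfortably":   {"importance": 9.1, "satisfaction": 8.7},
    "lasts many seasons": {"importance": 8.4, "satisfaction": 5.2},
    "easy to clean":      {"importance": 6.0, "satisfaction": 6.5},
}

def opportunity_score(importance: float, satisfaction: float) -> float:
    """High importance plus a large unmet-satisfaction gap yields a high score."""
    return importance + max(importance - satisfaction, 0.0)

scores = {name: opportunity_score(v["importance"], v["satisfaction"])
          for name, v in outcomes.items()}

# Rank: the biggest opportunities are important outcomes consumers are unhappy with.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here "lasts many seasons" tops the ranking: it is highly important but poorly satisfied, exactly the kind of point the method is designed to surface.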
AI Enhancing Data Analysis
But is aggregating all respondents' answers together really the best approach? That is, should we use an average score from a statistically representative sample of consumers to make inferences about the preferences of the population? Statistically, yes; in product development, no, because a product is usually built to serve a specific target audience rather than the general public. Even more so in times of personalization! Moreover, a product can have different entry angles depending on the market niche observed, and choosing an entry angle may imply different product development strategies.
One way to check for these biases is to perform a stratified data analysis, for example by customer type or by each persona identified for your product. One could conclude that, on average, the population is well served by the current solutions, while a market niche, a group of potential consumers, or, more specifically, a persona, is not.
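A stratified analysis can be sketched in a few lines. The personas and satisfaction ratings below are invented for illustration: the overall average looks acceptable, while one persona is clearly underserved.

```python
from statistics import mean

# Hypothetical satisfaction ratings (0-10), each tagged with a persona label.
responses = [
    ("busy parent", 8), ("busy parent", 9), ("busy parent", 8),
    ("frequent traveler", 3), ("frequent traveler", 4), ("frequent traveler", 2),
    ("casual user", 7), ("casual user", 8),
]

# Aggregate view: one average over everyone hides the dissatisfied niche.
overall = mean(score for _, score in responses)

# Stratified view: an average per persona reveals it.
by_persona = {}
for persona, score in responses:
    by_persona.setdefault(persona, []).append(score)
per_persona = {p: mean(s) for p, s in by_persona.items()}
```

With this data the overall mean sits above 6, yet the "frequent traveler" persona averages only 3: exactly the kind of opportunity the aggregate view erases.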
How is it possible to arrive at such different conclusions just by changing the data analysis method? To understand why, let's first focus on the average. In the book The Black Swan, author Nassim Taleb writes that the average is dumb. Taleb's argument is, of course, more elaborate: there are classes of magnitudes where the use of the average is appropriate, but in most real-life cases it is not. It goes beyond the scope of this article to delve into such details, so let's look at an example to illustrate the concept.
Suppose that the annual sales of a shoe retailer were 40% women's shoes in size 36 and 60% men's shoes in size 41. When summarizing this information for an executive report, one would say that, on average, the shoe size sold was 39 (0.4 × 36 + 0.6 × 41 = 39), a size not even purchased. The average therefore does not characterize the distribution of shoe sizes sold, and someone with only that information could make a completely erroneous estimate or prediction.
On the other hand, note that the average is an appropriate metric to summarize the shoe size sold by gender. In this case, a single attribute, gender, is sufficient to define profiles in which the average represents the group. However, there are cases where it is not, or where one simply wishes to obtain a more granular categorization incorporating other attributes relevant to the business, such as the customer's consumption habits, their evaluation of the shopping experience, the intended use of the product, socio-demographic characteristics, and so on.
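The shoe example can be checked numerically; a minimal sketch using the percentages from the text:

```python
from statistics import mean

# Sales mirroring the text: 40% women's size 36, 60% men's size 41.
sales = [("women", 36)] * 40 + [("men", 41)] * 60

# The overall average lands on a size nobody bought: 0.4*36 + 0.6*41 = 39.
overall_mean = mean(size for _, size in sales)
sizes_sold = {size for _, size in sales}

# The per-gender average, by contrast, exactly matches what each group buys.
by_gender = {g: mean(s for gg, s in sales if gg == g) for g in ("women", "men")}
```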
In these cases, creating personas via basic statistics or combinatorial analysis can be very complex! Indeed, the complexity grows exponentially with the number of attributes. It therefore becomes necessary to use more sophisticated techniques, and this is where AI, or more specifically machine learning, comes in.
Creation of Personas through AI
The process of creating personas with AI essentially consists of two main stages: clustering by similarity and profile identification. Clustering by similarity is a machine learning task, usually known simply as clustering, that automatically finds groups of individuals with similar characteristics, such that individuals within the same group are homogeneous and individuals from different groups are heterogeneous.
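As an illustration only, the classic k-means algorithm can be sketched in pure Python as an alternating assignment/update loop. The customer attributes below, (age, monthly spend), are hypothetical, and the initialization is deliberately naive; a real project would typically use a library implementation such as scikit-learn's KMeans.

```python
# Minimal k-means sketch on hypothetical (age, monthly_spend) customers.
def kmeans(points, k, iters=10):
    # Naive deterministic init: spread initial centroids across the data.
    centroids = [points[i * len(points) // k] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            j = min(range(k),
                    key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            clusters[j].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centroids[j] = (sum(x for x, _ in cl) / len(cl),
                                sum(y for _, y in cl) / len(cl))
    return centroids, clusters

customers = [(25, 40), (27, 35), (24, 45), (58, 210), (61, 190), (63, 205)]
centroids, clusters = kmeans(customers, k=2)
```

On this toy data the loop separates the young low-spend customers from the older high-spend ones, giving the homogeneous-within, heterogeneous-between groups described above.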
Indeed, for some applications, such as selecting products for cross-selling in recommendation systems, this clustering stage would be sufficient. However, in the case of product development, where it is necessary to know the market niche that the product can reach, it is necessary to characterize these groupings.
Then comes the stage of profile identification, also known as customer profiling, which involves deriving the segmentation rules for each group. For example: group 2 is formed by consumers between 30 and 40 years old who have already purchased at least one premium product and reside in areas with low commercial activity.
For this purpose, various methods can be employed, the decision tree being the most commonly applied. It is a machine learning algorithm for classification tasks that, as a result, yields groups with well-defined segmentation rules.
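To illustrate the idea with the simplest possible tree, the sketch below fits a one-split "decision stump" that turns cluster labels into a readable segmentation rule. The features and labels are hypothetical; a real project would fit a full decision tree (for example scikit-learn's DecisionTreeClassifier) over many attributes.

```python
# A decision stump: a one-split decision tree that yields a readable rule.
def best_stump(rows, labels, feature_names):
    """Find the single (feature, threshold) split that best separates the labels."""
    best = None  # (errors, feature_name, threshold, left_label, right_label)
    for f, name in enumerate(feature_names):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for r, lab in zip(rows, labels) if r[f] <= thr]
            right = [lab for r, lab in zip(rows, labels) if r[f] > thr]
            # Predict the majority label on each side; count misclassified rows.
            l_maj = max(set(left), key=left.count)
            r_maj = max(set(right), key=right.count)
            errors = (len(left) - left.count(l_maj)) + (len(right) - right.count(r_maj))
            if best is None or errors < best[0]:
                best = (errors, name, thr, l_maj, r_maj)
    return best

features = ["age", "premium_purchases"]
rows = [(34, 2), (38, 1), (31, 3), (22, 0), (55, 0), (60, 0)]
cluster_labels = ["group 2", "group 2", "group 2", "group 1", "group 1", "group 1"]

errors, feat, thr, left_label, right_label = best_stump(rows, cluster_labels, features)
rule = f"{feat} <= {thr} -> {left_label}, else {right_label}"
```

On this toy data the stump recovers a perfect rule on the premium-purchase count rather than age, which is exactly the kind of human-readable segmentation rule profiling is after.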
Finally…
The creation of personas is an example of the possible application of AI in accelerating innovation projects, but there is a wide range of opportunities, even in the small data scenario, and these opportunities only increase as the organization's maturity with AI projects and the quantity (and quality) of stored data grows.
So, don't wait for the collection of years of data from numerous processes, the structuring of a vast data lake, and the creation of a scalable data processing environment to start benefiting from data and model-based (AI) decision-making. Of course, these steps are essential for the deployment of big data projects and should be included in the company's long-term future. Until then, enjoy the journey to mature progressively and systematically through small data and AI projects. The future has already begun!
Note: I am the original author of this article, which was first published in Portuguese in IT Forum magazine on June 2, 2021. This version for LinkedIn has been personally translated and adapted by me to reach a broader audience.