Human vs. Synthetic Data: Unlocking the Potential of AI in Market Research.
With the potential to revolutionize traditional methodologies and open up a world of possibilities, AI technologies are sending shockwaves through the market research industry. While the obvious application of AI in market research lies in streamlining research processes and parsing vast amounts of human-generated data, another frontier is emerging: the generation of “artificial” or “synthetic” data, that is, AI-generated data that mimics human behavior and preferences.
The Rise of Synthetic Data
Synthetic data can be generated through a variety of techniques. One commonly used method is “agent-based simulation”, where data is produced by simulating the behavior and interactions of individual agents within a given system or environment. This technique is particularly useful where complex dynamics need to be captured, such as in social or economic systems in which the behavior of individual agents influences the system as a whole. By defining rules and parameters, these simulations can generate realistic data that reflects the behaviors observed in real-world scenarios.
For example, consider a city that plans to introduce a new public transportation system and wants to evaluate its potential effects on traffic congestion and travel times. To model this situation using an agent-based approach, researchers would create a simulation where individual agents (which we could call “personas”) represent commuters who make travel choices based on their preferences and priorities: some prefer faster travel times, while others prioritize cost or convenience. These agents would choose a mode of transportation, a departure time, and a route, and would interact with road networks, public transit stations, and traffic signals.
By running these simulations, researchers can observe how the new system changes the agents' travel behavior. Metrics such as traffic congestion levels, average travel times, and the share of sustainable travel choices show the impact of the new system on the overall transportation network.
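The commuter example can be sketched in a few lines of Python. This is only an illustrative toy, not a calibrated transport model: the `Commuter` class, the preference weights, the travel times, the costs, and the 20% "reconsider" rate are all assumptions made up for the sketch. Each agent picks the mode with the lowest weighted time-plus-cost, and car travel time grows with the share of agents driving, which creates the feedback loop between individual choices and the system as a whole.

```python
import random
from dataclasses import dataclass


@dataclass
class Commuter:
    """A synthetic persona; the weights are hypothetical preference parameters."""
    time_weight: float   # how strongly the agent dislikes travel time
    cost_weight: float   # how strongly the agent dislikes monetary cost
    mode: str = "car"    # everyone drives before the new system exists

    def best_mode(self, options):
        # Choose the option with the lowest weighted time-plus-cost.
        def generalized_cost(name):
            minutes, price = options[name]
            return self.time_weight * minutes + self.cost_weight * price
        return min(options, key=generalized_cost)


def simulate(transit_available, n_agents=500, rounds=50, seed=0):
    rng = random.Random(seed)
    agents = [Commuter(rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0))
              for _ in range(n_agents)]
    for _ in range(rounds):
        car_share = sum(a.mode == "car" for a in agents) / n_agents
        # Feedback loop: more drivers means more congestion and slower car trips.
        options = {"car": (20.0 + 40.0 * car_share, 6.0)}
        if transit_available:
            options["transit"] = (35.0, 2.5)  # assumed fixed time and fare
        for agent in agents:
            # Only some agents reconsider each round (assumed inertia),
            # which keeps the population from flipping back and forth.
            if rng.random() < 0.2:
                agent.mode = agent.best_mode(options)
    car_share = sum(a.mode == "car" for a in agents) / n_agents
    car_time = 20.0 + 40.0 * car_share
    avg_time = car_share * car_time + (1 - car_share) * 35.0
    return {"car_share": car_share, "avg_travel_time": avg_time}


before = simulate(transit_available=False)
after = simulate(transit_available=True)
print("Without transit:", before)
print("With transit:   ", after)
```

Comparing the two runs gives the kind of metrics mentioned above (mode share and average travel time) for the scenario with and without the new system.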
Quality assurance, software testing, and machine learning training have become the main users of synthetic data, with companies like Gretel, Mostly AI, Tonic.ai, and Genrocket providing services. But what about the application of synthetic data to market research?
Synthetic Data in Market Research
Just like in the city-planning example, by harnessing AI techniques like agent-based models, researchers can create “synthetic agents” or “personas” that simulate the behaviors, opinions, and preferences of human individuals. This opens up a world of possibilities for market research, offering a powerful tool to explore scenarios, test hypotheses, and derive insights without relying solely on real-world data collection.
As a result, the concept of synthetic data is gaining some traction, and early initiatives have already been launched in this space. Examples include Persona Panels and Synthetic Users, which analyze vast amounts of data to create personas that characterize the behaviors, attitudes, and attributes of a human audience. This opens up new avenues for generating insights quickly.
While the value and reliability of this kind of data are yet to be proven, and researchers may be reluctant to trust predictions based on it, its potential to overcome the speed and cost limitations of human data collection will drive its adoption.
Specifically, agent-based modeling is particularly useful when interrelatedness, reciprocity, and feedback loops are known or suspected to exist. The main advantages are:
However, we could argue that synthetic data cannot fully replace human data. While synthetic data has its advantages, such as privacy protection and cost-effectiveness, it still has limitations in replicating the complexity and authenticity of human data. Some disadvantages:
Running a Conjoint Analysis Using GPT-3
Another example of synthetic data for market research purposes is the use of large language models (LLMs) such as GPT-3, which are artificial intelligence systems designed to comprehend and generate human-like language. These models are trained on vast amounts of text data, enabling them to capture the patterns and structures of natural language. The paper "Using GPT for Market Research" by James Brand, Ayelet Israeli, and Donald Ngwe explores the potential of one such LLM.
One significant finding of this paper is that GPT-3 can generate responses to a conjoint analysis survey that align with economic theory and established consumer behavior patterns. Conjoint analysis is a research technique used to understand how people value the different attributes (features) that make up a product or service. It involves presenting people with a series of hypothetical product or service profiles, each with a different combination of attributes, and asking them to choose which one they prefer. By analyzing the choices people make, researchers can estimate the relative importance of each attribute and how much people are willing to pay for each level of each attribute.
In the paper, the authors used conjoint analysis to evaluate the realism of model-based estimates of willingness-to-pay (WTP) generated by GPT-3. They focused on choices of toothpaste and used the queried responses to estimate a multinomial logit model similar to the kind that market researchers use to estimate preferences in standard conjoint analysis.
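To make the link between a multinomial logit model and WTP concrete, here is a minimal sketch using NumPy. It is not the authors' code or data: the two attributes (a fluoride indicator and a price), the "true" coefficients, and the simulated choices are assumptions chosen for illustration. In a real study the choices would come from survey respondents or from an LLM's answers. The model is fitted by Newton's method on the log-likelihood, and WTP for an attribute is computed as the negative ratio of its coefficient to the price coefficient.

```python
import numpy as np


def choice_probabilities(X, beta):
    """Multinomial logit probabilities for each alternative in each task."""
    utilities = X @ beta                                  # shape (tasks, alternatives)
    utilities -= utilities.max(axis=1, keepdims=True)     # numerical stability
    exp_u = np.exp(utilities)
    return exp_u / exp_u.sum(axis=1, keepdims=True)


def fit_mnl(X, chosen, n_iter=25):
    """Maximum-likelihood coefficients via Newton's method (concave problem)."""
    n_tasks, _, n_feats = X.shape
    beta = np.zeros(n_feats)
    x_chosen = X[np.arange(n_tasks), chosen]              # features of chosen options
    for _ in range(n_iter):
        p = choice_probabilities(X, beta)
        x_mean = np.einsum("nj,njk->nk", p, X)            # expected features per task
        grad = (x_chosen - x_mean).sum(axis=0)
        second_moment = np.einsum("nj,njk,njl->kl", p, X, X)
        hessian = -(second_moment - x_mean.T @ x_mean)
        beta = beta - np.linalg.solve(hessian, grad)
    return beta


# Hypothetical conjoint design: 3 toothpaste profiles per task.
rng = np.random.default_rng(0)
n_tasks, n_alts = 5000, 3
fluoride = rng.integers(0, 2, size=(n_tasks, n_alts))     # 1 if the profile has fluoride
price = rng.uniform(1.0, 5.0, size=(n_tasks, n_alts))     # price in dollars
X = np.stack([fluoride, price], axis=-1).astype(float)

# Assumed "true" preferences, used only to simulate choices for the demo.
true_beta = np.array([1.0, -0.5])
noise = rng.gumbel(size=(n_tasks, n_alts))                # logit error term
chosen = np.argmax(X @ true_beta + noise, axis=1)

beta_hat = fit_mnl(X, chosen)
wtp_fluoride = -beta_hat[0] / beta_hat[1]
print("Estimated coefficients:", beta_hat)
print("Estimated WTP for fluoride ($):", wtp_fluoride)
```

With these assumed coefficients the implied WTP for fluoride is 1.0 / 0.5 = 2 dollars, so the printed estimate should land near that value.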
The results suggested that GPT-3 can provide consistent and reliable insights that match those of human consumers.
The paper also presents various future directions for research. One suggestion is to prompt GPT-3 to generate artificial data that may better simulate realistic scenarios. This approach could capture emergent properties and potentially yield more accurate results than conventional simulated data.
A theoretical exercise
Let's imagine the agent-based approach for pricing optimization in the context of introducing a new fast-moving consumer goods (FMCG) product. Here's how it could be done:
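As one illustrative possibility, a stripped-down version of such an exercise could look like the sketch below. Everything in it is hypothetical: the synthetic shoppers, their reservation prices (drawn from an assumed normal distribution), the unit cost, and the candidate price grid. Each agent buys if the shelf price does not exceed their reservation price, and the simulation sweeps the candidate prices to find the one with the highest simulated profit. A fuller model would add interactions between agents, competing products, and repeat purchases.

```python
import random


def simulate_demand(price, shoppers):
    """Count synthetic shoppers whose reservation price is at least the shelf price."""
    return sum(1 for reservation in shoppers if reservation >= price)


def best_price(candidate_prices, shoppers, unit_cost):
    """Return the candidate price with the highest simulated profit."""
    def profit(price):
        return (price - unit_cost) * simulate_demand(price, shoppers)
    return max(candidate_prices, key=profit)


# Hypothetical population: reservation prices in dollars (assumed distribution).
rng = random.Random(42)
shoppers = [rng.gauss(3.0, 0.8) for _ in range(10_000)]

# Candidate shelf prices from 1.50 to 4.50 in steps of 0.25.
candidate_prices = [round(1.5 + 0.25 * i, 2) for i in range(13)]

chosen_price = best_price(candidate_prices, shoppers, unit_cost=1.0)
print("Simulated profit-maximizing price:", chosen_price)
print("Simulated buyers at that price:", simulate_demand(chosen_price, shoppers))
```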
In conclusion
While synthetic data cannot replace real-world or human data in the short term, it offers new possibilities when combined with traditional research methods. As the era of synthetic data emerges, caution, transparency, and disclosure are crucial.
Accurate reflection of real-world behaviors and human preferences is vital in designing and validating synthetic data models. It is up to us to navigate its challenges and embrace its potential for shaping the future of market research.