Is Synthetic Sample Worth It?

The concept of "synthetic sampling," or using generative AI to mimic human responses in market research, has garnered significant interest. Companies like Kantar and Emporia have tested this with promising yet imperfect results. AI-generated responses—though efficient and scalable—often exhibit a strong positive bias and lack the nuance, variability, and sensitivity to sub-group distinctions that real human responses provide.

Key Points on Synthetic Sampling in Market Research

  1. Positive Bias in AI Responses

Issue: AI-generated responses often lean towards positivity, which can distort research findings.

Example: In Kantar's study, GPT-4 rated luxury product experiences significantly more positively than actual human respondents. While human feedback varied widely, AI’s responses frequently overestimated user satisfaction. For brands, this could lead to overestimating customer satisfaction and missing out on genuine improvement areas.        

  2. Lack of Nuance and Sub-group Sensitivity

Issue: AI responses often lack nuanced differentiation for specific demographic or sub-group characteristics.

Example: When Kantar segmented responses by income levels, GPT-4 struggled to reflect how lower-income respondents viewed product price differently from higher-income ones. This shows that while AI can approximate general responses, it fails to capture the diversity within specific groups. Such a limitation could skew a product’s market fit assessment in targeted demographics.        

  3. Homogeneity in Qualitative Responses

Issue: AI often generates repetitive, stereotyped responses, missing the richness of human expression.

Example: Emporia found that AI-generated responses for job satisfaction among IT decision-makers had a “herd mentality.” The synthetic personas were overwhelmingly “strongly satisfied,” unlike the varied responses from real people. This lack of variation could be misleading in B2B research where individual motivations and career challenges are critical insights.        

  4. Sensitivity to Training Data and Context

Issue: AI's output is limited by its training data, and it often underperforms in unfamiliar contexts or specific product categories.

Example: Off-the-shelf AI models are trained on generic data, making them unreliable for niche product testing or context-specific questions. Kantar’s analysis showed that generic AI responses missed key attitudes unique to specific product categories, leading to less accurate predictions of customer behavior.        

  5. Potential for Supplementary Use in Scalable Insights

Future Opportunity: While AI currently underperforms on detailed insights, it could be a valuable supplement for large-scale questions where response variability is low.

Example: Using synthetic sampling for high-volume, low-complexity questions, such as generic product feedback or brand sentiment, could speed up data collection. AI could provide initial responses on broad questions and leave detailed insights to human analysis, ensuring that core brand messages resonate across diverse groups. A minimal sketch of this hybrid approach follows the list.
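To make the hybrid idea concrete, here is a minimal Python sketch of what a supplementary synthetic-sampling check might look like. Everything in it is an illustrative assumption: `call_llm` is a placeholder for whatever model API you actually use, and the personas, the Brand X question, and the human baseline numbers are invented, not data from the Kantar or Emporia studies.

```python
import statistics
from typing import List

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real model API call.

    Swap in whatever chat-completion client you actually use; here it
    returns a canned answer so the sketch runs end to end.
    """
    return "4"  # mimics the positive skew the studies above observed

def synthetic_ratings(question: str, personas: List[dict], n: int = 1) -> List[int]:
    """Collect n synthetic 1-5 ratings per persona for one survey question."""
    ratings = []
    for persona in personas:
        prompt = (
            f"You are a {persona['age']}-year-old {persona['occupation']} "
            f"with a {persona['income']} household income. "
            f"Answer with a single digit on a 1-5 scale.\n"
            f"Question: {question}"
        )
        for _ in range(n):
            ratings.append(int(call_llm(prompt).strip()))
    return ratings

# Invented personas and a low-complexity brand-sentiment question.
personas = [
    {"age": 34, "occupation": "teacher", "income": "lower"},
    {"age": 52, "occupation": "engineer", "income": "higher"},
]
question = "How satisfied are you with Brand X overall?"

synthetic = synthetic_ratings(question, personas, n=10)
human = [2, 3, 5, 1, 4, 3, 2, 5, 3, 4]  # placeholder human baseline

# The diagnostics the studies above suggest: check the mean for positive
# bias and the spread for herd-mentality homogeneity before trusting
# synthetic answers as a supplement to human fieldwork.
print(f"synthetic: mean={statistics.mean(synthetic):.2f}, stdev={statistics.stdev(synthetic):.2f}")
print(f"human:     mean={statistics.mean(human):.2f}, stdev={statistics.stdev(human):.2f}")
```

If the synthetic mean sits well above the human mean, or the synthetic standard deviation collapses toward zero, the sample is showing exactly the positive-bias and herd-mentality failures described above and should not stand in for human fieldwork on that question.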


These findings underscore a pivotal truth: AI is not yet a reliable substitute for authentic human responses, especially in qualitative insights. However, it could become a valuable supplement if fine-tuned with proprietary data and context. As models evolve, blending human and synthetic sampling could enhance research, particularly for scaling generic data or expanding response types where variability isn't critical.
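One concrete reading of "fine-tuned with proprietary data" is turning past human fieldwork into supervised training examples, so the model learns segment-specific response patterns rather than generic positivity. The sketch below is an assumption about how that preparation might look; the records are invented, and the prompt/completion JSONL layout is a common convention rather than the requirement of any particular fine-tuning service.

```python
import json

# Invented examples of past fieldwork: question/answer pairs with
# respondent context, standing in for a research firm's proprietary data.
survey_records = [
    {"segment": "lower income", "question": "Is Brand X fairly priced?",
     "answer": "It feels expensive for what you get."},
    {"segment": "higher income", "question": "Is Brand X fairly priced?",
     "answer": "The price is fine; I care more about durability."},
]

# Write a prompt/completion JSONL file so fine-tuning can teach the model
# how answers differ by segment, addressing the sub-group weakness above.
with open("finetune.jsonl", "w") as f:
    for r in survey_records:
        example = {
            "prompt": (f"Respondent segment: {r['segment']}\n"
                       f"Question: {r['question']}\nAnswer:"),
            "completion": " " + r["answer"],
        }
        f.write(json.dumps(example) + "\n")
```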

Ultimately, while synthetic sampling is a fascinating prospect, over-relying on AI-driven data today risks undermining the very authenticity and granularity that make market research insights meaningful. As we progress, it’s clear that AI needs further refinement and thoughtful integration to serve as a robust research tool rather than a shortcut.


Anil Pandit

Executive Vice President

Publicis Media



*Disclaimer: This post is for informational purposes only and does not endorse or disapprove of any specific tools, platforms, or technologies. The views and opinions expressed in this article are those of the author and do not reflect the official policy or position of the author’s employer.



References:

https://www.emporiaresearch.com/case-studies/real-insights-or-robotic-responses-a-comparative-analysis-of-real-vs-synthetic-responses-in-b2b-research


https://www.kantar.com/inspiration/analytics/what-is-synthetic-sample-and-is-it-all-its-cracked-up-to-be


Habeeb N

A futurist building a new data philosophy @ theDATAfirm - World's first single source non PII Humanised Dataset, Protecting Privacy, in-depth profiling of 1.4+Bn Profiles - Creating new data standards.

2 weeks

Using any of the LLMs without contextual referencing data causes exactly these issues; LLMs need very large contextual reference data to generate good synthetic data. From a market research perspective, the human profile context is essential, along with referencing of the human ecosystem. It’s not enough to reference humans as segments without layering and fine-tuning indirect data points of relevance. For luxury goods, for example, factors such as family size, type of family, size of house, lifestyle, home amenities, consumption, opinions, outlook, location, distance, cultural and regional nuances, and financial mindset, drawn from lifestyle-related activity points, need to be layered to generate customised profiles, which can then be used to generate synthetic data for a particular business vertical. We’re at 13,000+ attributes per profile today. Just right in terms of data volume to start training LLMs?
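As a rough illustration of the layering idea in this comment, a profile’s attributes could be folded into a persona description before asking a model to respond. The attribute names, values, and the `render_persona` helper below are invented for the sketch; they are not theDATAfirm’s actual schema or any vendor’s API.

```python
# Hypothetical profile: a handful of the kinds of layered attributes the
# comment describes (a real profile might carry thousands).
profile = {
    "family_size": 4,
    "household_type": "joint family",
    "home_amenities": ["dishwasher", "home theatre"],
    "location": "suburban Mumbai",
    "financial_mindset": "value-conscious saver",
}

def render_persona(profile: dict) -> str:
    """Flatten layered attributes into one persona description for a prompt."""
    parts = []
    for key, value in profile.items():
        if isinstance(value, list):
            value = ", ".join(value)
        parts.append(f"{key.replace('_', ' ')}: {value}")
    return "Answer as a consumer with " + "; ".join(parts) + "."

print(render_persona(profile))
```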
