"Synthetic data" needs a new name (...why am I thinking about Beyond Burger?)
Background: A few post-panel thoughts after joining and listening to smart people like @emilygoligoski, @sarahchaten, Kumar Doshi, Josh Goldenberg, @priorium, and Adam Bai at an event hosted by Glimpse for #TechWeek2024.
1) "Synthetic data" needs a new label.
The idea of creating new (synthetic) data beyond existing real data sets - to augment their utility with AI and try to glean more from them - is a cool idea for many reasons. While still early, we shared a few use cases during the panel where it feels like it can be done responsibly. It works if you have real data to start with, and ideally some way to validate the output in parallel studies. But "synthetic" sounds fake when in fact it leverages real data smartly. So perhaps a new name: maybe "data extension" or "data extrapolation." Or maybe we simplify it further and look to the food industry for inspiration, because... who wants a "synthetic burger?"
2) Yes, there is a hype curve, but research people should be open-minded.
I have seen it before with the onset of online, mobile, big data, and geo-location: when new approaches come in, some (not all) research people like to show others in the room that they are smarter and can find the flaw. It's good to know the limitations (every methodology has them). But over time, as hype curves go up, down, and then steadily re-ascend, the smart researcher will be open to these techniques and say: they won't always work, but let's find the use cases where they might just make us faster, cheaper, and more innovative. And they might help you leverage your older data in new ways.
3) The power of asking a great question...even if asking it later.
Was reminded of how important it is for good researchers to frame a good, tight question. This is true for synthetic data uses and AI prompting, but also for survey research and analytics (...what do we want to prove / disprove?). One example shared yesterday by @emilygoligoski was how they asked NYT readers: if NYT were a retail store, which one would it be? The answers were varied, and the reasons why were insightful. She went on to say NYT discussed the need for their audience segmentation and personas to NOT be frozen in time but to evolve as people evolve. I have heard this also from Joetta Gobell. Point-in-time research - unless it is longitudinal - can be frozen in time. But imagine if you could unfreeze it and go beyond the original questions to ask good new questions. This feels like one of the key values in "synthetic data"... I mean "real data extrapolation." Or maybe I mean "beyond data"??