Exploring the Potential of Generative AI in Digital Analytics
A synopsis of my keynote delivered at Webit, 28 June 2023, Sofia.
Hello Sofia,
It is exciting to be here with you today!
I want to discuss with you the potential of Generative AI in Digital Analytics.
I do not want to address just its potential to improve how analytics is done. I want to focus on how it changes the way we think about the problems we encounter in analytics every day, and on the relationship between analytics and its primary subject, which has now become a Generative AI-enhanced superuser.
The Landscape
So, let's begin by sketching the landscape of Digital Analytics. In tech talks on Analytics, Data Science, and Machine Learning, the infrastructure is introduced first ninety percent of the time. We start with data ingestion units that collect, mostly, clicks and text. A broker module ensures that the massive inflow of data is handled and packed into a Data Lake of some form. From there, we proceed to the number-crunching servers where all the magic happens and the results are ultimately delivered. In my career as a Data Scientist, I have had the opportunity to work on some of the largest and most complex analytical platforms of this common form.
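To make that common shape concrete, here is a minimal, purely illustrative sketch in Python. The class names and the in-memory stand-ins are assumptions made for the example, not a description of any particular production stack.

```python
# Purely illustrative sketch of the common platform shape described above:
# ingestion -> broker -> data lake -> number crunching. All names are assumptions.
from collections import defaultdict


class DataLake:
    """Toy in-memory stand-in for whatever storage layer a real platform runs."""

    def __init__(self):
        self.events = []

    def store(self, event: dict) -> None:
        self.events.append(event)


class Broker:
    """Accepts raw events from ingestion and hands them over to the data lake."""

    def __init__(self, lake: DataLake):
        self.lake = lake

    def publish(self, event: dict) -> None:
        self.lake.store(event)


def crunch(lake: DataLake) -> dict:
    """The 'magic': aggregate clicks per user into a simple engagement metric."""
    clicks_per_user = defaultdict(int)
    for event in lake.events:
        if event.get("type") == "click":
            clicks_per_user[event["user_id"]] += 1
    return dict(clicks_per_user)


# Ingestion: in reality a stream of clicks and text; here, two hard-coded events.
lake = DataLake()
broker = Broker(lake)
broker.publish({"type": "click", "user_id": "u1"})
broker.publish({"type": "click", "user_id": "u1"})
print(crunch(lake))  # {'u1': 2}
```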
What such talks more often than not neglect is the vast ecosystem of real users, consumers, customers, and competitors that constitutes the origin of the data - the very root of the Data Generating Process - those who click, like, share, and comment. Through the lens of our analytics we can see some of them clearly, while others remain in lesser detail, and we strive to improve the resolution with which we observe them in order to offer insights and predictions.
And now the sudden introduction of a new, highly disruptive creature is altering this landscape. As Generative AIs dive into the game, we must comprehend the implications.
THE DATA NARRATIVE IS EXHAUSTED
One of my first conclusions is that the Data Narrative has become exhausted. Staying focused on collecting metrics for reach, exposure, and user engagement is no longer sufficient. We can already deliver a wide range of results from user-generated metrics, from digital market research to product analytics, but there is now a demand for a deeper understanding of consumers. The bar has risen.
To take an extreme example, consider the challenge of automatically extracting user needs from news articles. This conceptual model classifies media content into functional categories based on many criteria and serves as the foundation of contemporary audience-driven journalism. It goes way beyond pure content tagging, topic extraction, and the like. To understand which user needs a particular news article satisfies, you need to understand the goal and intention of its author and the editorial strategy, and then figure out how the audience perceives it. That is not just another metric you can calculate from user behaviour. It requires analysing complex criteria beyond what is apparent in the text, and standard Natural Language Processing methods are not effective for this task. My team at SmartOcto and I have been studying this problem for some time, and we are, of course, now using Generative AI to solve it.
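As a rough illustration of how Generative AI can be pointed at this kind of problem, here is a minimal sketch. The `call_llm` callable, the prompt wording, and the particular list of user-need categories are assumptions made for the example; this is not SmartOcto's actual pipeline.

```python
# Illustrative sketch: ask a Generative AI which user need an article serves.
# `call_llm` stands in for whichever model client you use; it is an assumption
# for this example, not a reference to any production system.
from typing import Callable

# One common formulation of user-need categories in audience-driven journalism.
USER_NEEDS = [
    "update me", "educate me", "give me perspective",
    "keep me on trend", "divert me", "inspire me",
]


def classify_user_need(article_text: str, call_llm: Callable[[str], str]) -> str:
    """Judge which user need an article primarily satisfies, via a prompted LLM."""
    prompt = (
        "You are an editorial analyst. Considering the author's likely intention, "
        "the editorial strategy, and how the audience would perceive this article, "
        f"assign it to exactly one of these user needs: {', '.join(USER_NEEDS)}.\n\n"
        f"Article:\n{article_text}\n\n"
        "Answer with the category name only."
    )
    answer = call_llm(prompt).strip().lower()
    # Fall back to a default category if the model answers off-list.
    return answer if answer in USER_NEEDS else "update me"


# Usage with a stub in place of a real model client:
print(classify_user_need(
    "Parliament passed the new budget last night.",
    call_llm=lambda prompt: "update me",
))
```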
THE INTELLIGENCE NARRATIVE
The Data Narrative is on the verge of being surpassed by the Intelligence Narrative. Just as our opposable thumb - our first technology - once sparked human intelligence by allowing precise tool use in our distant past, today's sensorimotor activities are helping to develop new kinds of external intelligence.
However, these new forms of intelligence have yet to match the capacities of our natural minds. They evolve under the selective pressure we impose on them in order to align with our purposes. Sometimes they exhibit brilliance, while at other times they struggle to deliver.
The conclusion of the Data Narrative does not imply the replacement of the entire machinery we have invented. On the contrary, Data and Machine Learning Engineers, Data Scientists, as well as Analysts, remain in their roles; however, they will now need to operate within an environment that is fundamentally and rapidly evolving.
This change stems from a process inherent in our socio-technical ecosystem, which encompasses all the behaviours we need to monitor and analyse, our methodologies and metrics, our machinery, and the integration of Generative AIs within it. We can simply refer to this process as "The Loop."
The Loop
Any design imposes a set of constraints. Design a new app, a social network, or a new programming language, and you are imposing a set of constraints on what its users can and cannot do. That has implications for measurement: the nature of the metrics and insights that you will be able to derive from the system's use is already implied in your design.
But the nature of Generative AI design is different. Trained on inputs that only partly reflect the capacities of real human minds, yet drawn from many millions of individuals, these systems can now mimic human production to a significant degree. The essential fact to recognise is that this loop goes both ways, connecting human production to Generative AI and back to us through artificially generated content, and that when Generative AI enters the loop it focuses its output on one single user. A single individual using a chatbot like OpenAI's ChatGPT, or any Generative AI-powered application, is now exposed to knowledge abstracted from millions.
EVERYONE IS A SUPERUSER NOW!
So, everyone is a superuser now. And imagine what millions of such superusers can now do!
The Intelligence Narrative is all about understanding the data generating process. Users, customers, consumers, the competition, and Generative AIs are all now the origin of the data - the very data that we use to provide insights and inform decision makers. But approaching that data while this new, powerful factor reshapes how users, consumers, customers, and the competition behave online, without studying the ways in which these behaviours change, will be pointless.
So, if your user, customer, or market segmentation looked one way yesterday and looks entirely different tomorrow, don't be surprised. But if you do not understand the causes, if you do not ask "why", you will no longer be able to either predict or control these processes.
Synthetic Data
I would also like to address something that seems to frighten many people nowadays. Generative AI production - the texts, images, and sounds it delivers - will get online and then, presumably, feed back into the AIs' future training epochs. People seem worried about the impossibility of differentiating between AI-generated and genuine human content. Some believe that Generative AIs will somehow poison the whole ecosystem with their outputs, degrading the entire universe of digital discourse.
But what we know for a fact is that our minds are in large part reflections of our natural, technological, social, and linguistic environments. Human minds are not at all bad at tracking information in their environment. When you read a printed word in a sentence, the time it takes your neural system to process and understand its meaning is a function of how frequently that word occurs in your language, in your socio-linguistic environment. Cognitive psychology has accumulated a plenitude of such findings over the decades.
While we know that our minds can be biased, they still excel in tracking the relevant information that surrounds them.
Now, if our minds are the root of the data generating processes that feed the Generative AIs in the Loop, the question naturally arises: why should we distrust synthetic data? For example, if we have a data set that needs to be tagged with additional information that is expensive or otherwise difficult to obtain from human labelers, why should we not entrust Generative AIs with such a task? After all, all they can do is reflect our collective knowledge in some domain. What exactly makes their outputs less valuable than ours? Where exactly should we draw the line between synthetic and genuine human data generating processes?
I additionally see no problem in using synthetic, Generative AI-produced data, or a mix of human and synthetic data, to train Machine Learning models in specific domains. Moreover, I would encourage everyone to do so, because the mix of AI and human production will inevitably circulate through the ecosystem and influence everyone. You can think of it as infestation or as enrichment, but that is only a matter of sentiment. The pragmatic truth that everyone in analytics needs to be aware of is that the blend of natural and artificial discourse is going to happen.
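As a minimal sketch of what such a mixed training set-up could look like, assuming scikit-learn is available, consider the following. The tiny data set, its labels, and the blend ratio are purely illustrative assumptions.

```python
# A minimal sketch, assuming scikit-learn is available: train a simple classifier
# on a blend of human-labelled and synthetic (Generative AI-labelled) examples.
# The tiny data set below is purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Human-labelled examples: (headline, user-need label).
human_data = [
    ("Parliament passed the new budget last night", "update me"),
    ("How the new budget will affect your household", "educate me"),
]

# Synthetic examples: the same kind of pairs, but labelled by a Generative AI
# (for instance with a classifier like the one sketched earlier).
synthetic_data = [
    ("Five charts that explain the budget debate", "give me perspective"),
    ("The budget vote: what happened, minute by minute", "update me"),
]

# Blend the two sources; in practice, study their systematic differences first.
texts, labels = zip(*(human_data + synthetic_data))

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(list(texts), list(labels))

print(model.predict(["A quick recap of yesterday's budget session"]))
```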
Generative AIs will definitely empower Analytics. Thanks to their unprecedented analytical power in comparison with more traditional Machine Learning approaches, we will be able to see everyone in the ecosystem more clearly and gain a deeper understanding of their behaviour. However, we must not focus entirely on how they empower us as analysts; we must also understand how they change the environment that we measure. So, do not ask only what is inside your head, but also what your head is inside of (as famously put by the cognitive theorist Mace in 1977).
N.B. "In fact, Gartner estimates that more than 10% of all data will be AI-generated by as early as 2025, heralding a new age, the Age of With." (source: A new frontier in artificial intelligence: Implications of Generative AI for businesses, Deloitte AI Institute)
Implications
Niels Bohr, one of the greatest scientists of the 20th century, once famously said: "Prediction is very difficult, especially if it's about the future!" But we are here because of the future. So here are my five cents on what needs to happen and what I believe will happen.
The product side and Product Analytics will have to undergo fundamental changes. Understanding the AI-enhanced superuser is now the top priority, and that task is far better suited to an engineering mindset than to any other. Companies, startups, and businesses in general should consider keeping their product and analytics teams very close - if not merging them outright.
Analytics in any area - be it Product, Marketing, or Sales - will have to focus on understanding the data generating processes more than it focused on the brilliance of its compute resources in the age of the Data Narrative. We now need new theoretical models, new assumptions about user behaviour, and new methodological approaches. Diving deep into the field of cognitive ergonomics is a must. Analytics will embrace synthetic data, but it should carefully study the systematic differences between synthetic and genuine human output. A study of the mix of synthetic and real human-generated data is unavoidable in the future.
Finally, strategy-wise and from a business development perspective, some not-so-obvious traps should be avoided. One might think that the availability of advanced Machine Learning - and Generative AIs are nothing other than that - means that things that were impossible to predict in the past will now become predictable. But just the opposite is true. When everyone begins to use the power of Generative AI, the behaviour of markets will become more complex, not less, and more complex systems are less predictable, not more. Simulations will probably become a necessity before any decision analysis, and optimal digital business strategies will have to be developed by discovering niches in which a business can control behaviour to a significant degree, and thus predict and optimise. Doing and analysing will have to go hand in hand.
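To illustrate why decision analysis may need simulation first, here is a deliberately stylized toy, not a market model. It assumes that a growing share of users acts on one shared AI suggestion per round, and it shows the spread of the aggregate outcome growing, i.e. the aggregate becoming harder to predict. All numbers and the set-up are illustrative assumptions.

```python
# A deliberately stylized toy, not a market model: Monte Carlo simulation of an
# aggregate outcome when a growing share of users follows one shared AI suggestion.
# All numbers and the set-up are illustrative assumptions.
import random
import statistics


def outcome_spread(n_users: int, ai_share: float, n_runs: int = 1000) -> float:
    """Standard deviation of aggregate demand across simulated runs."""
    totals = []
    for _ in range(n_runs):
        ai_signal = random.choice([0, 1])  # one shared AI recommendation per run
        total = 0
        for _ in range(n_users):
            if random.random() < ai_share:
                total += ai_signal              # AI-assisted users act in a correlated way
            else:
                total += random.choice([0, 1])  # other users act independently
        totals.append(total)
    return statistics.stdev(totals)


for share in (0.0, 0.3, 0.6, 0.9):
    spread = outcome_spread(n_users=200, ai_share=share)
    print(f"AI-assisted share {share:.0%}: spread of outcomes ~ {spread:.1f}")
```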
Thank you.