What questions should you ask of Chat-GPT based analytics platforms?
Harry Powell
Data science leader with track record of innovation and value creation
You know the scenario. You are flooded with sales guys showing you amazing software applications based on some AI technology that is going to change your world. It looks amazing, it really does, and when you see the demo you genuinely think that your company needs to invest or it will get left behind.
But you know, underneath it all, that AI is not a solved problem. This software will have limitations, and maybe they will mean that the software is not right for your firm yet. It's just that you don’t know enough to ask the questions to expose what those limitations are. So you are kind of stuck.
A case in point is the explosion of GPT/Chat interfaces for data analytics. These promise that everyone in your organisation will be able to ask questions of your data, even if they aren’t analysts, just by asking natural language questions. Instead of typing out a SQL query to calculate a KPI, you can simply ask your computer to access the data and tell you what you need to know.
This is a persuasive value proposition. The target market is large; most corporates aspire to data-driven decision making but have struggled to upskill staff. Despite having terabytes of valuable data in data lakes, it isn’t being used at the coal face. Chat promises to remove the hurdle of having to train staff so that ordinary business people can use the data themselves.
I have no doubt that plenty of CEOs will want this, and I am relatively confident that it is technically plausible given the new large language models available. So investors will be interested too.
Perhaps the most important questions to ask are not directly related to the obvious functionality. All of these applications will be able to turn language into numbers, and the sales guys will have a long list of compelling examples. But it is often the unstated stuff that matters most. Sure, the output looks impressive, but what are the implicit assumptions underlying this process, the unspoken constraints that we all just take for granted when working with real analysts, that the software will need to replicate if it is to be a true replacement?
Here are some of the questions I would ask.
Does it allow me to ask for an answer or do I still have to tell it what to do?
This is the core functionality. You need to be able to ask something like “tell me the most profitable product” without specifying how the data is to be joined, how margin is to be calculated, what outliers are to be removed, and so on. All that detail should be inferred from context; if the user still has to supply it, then it's not clear that there is any gain in using Chat Analytics.
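To make the point concrete, here is a sketch of everything hiding behind that one sentence. The schema, the margin definition, and the aggregation are all hypothetical, invented purely for illustration; they are exactly the kind of implicit choices a Chat Analytics system would have to make on the user's behalf.

```python
import sqlite3

# Hypothetical toy schema -- the detail a business user should NOT need to spell out.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, unit_cost REAL);
    CREATE TABLE sales (product_id INTEGER, quantity INTEGER, unit_price REAL);
    INSERT INTO products VALUES (1, 'Widget', 2.0), (2, 'Gadget', 5.0);
    INSERT INTO sales VALUES (1, 100, 3.0), (2, 10, 12.0);
""")

# "Tell me the most profitable product" hides a join, a definition of margin,
# and an aggregation -- this query makes every one of those choices explicit.
row = con.execute("""
    SELECT p.name, SUM(s.quantity * (s.unit_price - p.unit_cost)) AS profit
    FROM sales s JOIN products p ON p.id = s.product_id
    GROUP BY p.name
    ORDER BY profit DESC
    LIMIT 1
""").fetchone()
print(row)
```

Every line of that SQL is a decision the user would otherwise have had to take; the promise of Chat Analytics is that all of it is inferred from context.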
Does it prompt me to ask new and better questions of the data?
If you are always going to ask the same questions you may as well have a dashboard. The point of analytics is not necessarily going straight to the answer, no matter how efficient that sounds. It is to discover facts you didn’t yet know. If the AI allows the user to sidestep the creative process, then the AI had better be able to do that for you as well. So you need your Chat Analytics to say “here’s the answer you asked for, but given the context you might like to know about this as well”. Without this functionality you may end up only noticing problems once they become sufficiently embedded to affect your core KPIs.
How does it check that the query it executed is what the user meant?
One of the great advantages of conventional database queries is that it is possible to express yourself unambiguously. Of course that isn’t always what happens, particularly with complex queries. But any one statement has one meaning, and that meaning is the same to everyone. Chat Analytics needs to be able to disambiguate. The mechanism could be simple, but it needs to distinguish between potentially subtle interpretations in a way that the business user can understand.
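One simple form this could take is playing the candidate interpretations back to the user in plain business language before anything is executed. The interpretations and wording below are hypothetical, invented for illustration only:

```python
# Hypothetical disambiguation step: before running any query, the system
# paraphrases each candidate interpretation in business terms and asks
# the user to choose, rather than silently picking one.
candidates = {
    "gross": "Profit before overheads, for orders PLACED last quarter",
    "net": "Profit after overheads, for orders SHIPPED last quarter",
}

def clarifying_question(question: str, options: dict) -> str:
    """Render a plain-language menu of interpretations for the user."""
    lines = [f'Your question "{question}" could mean:']
    for i, text in enumerate(options.values(), start=1):
        lines.append(f"  {i}. {text}")
    lines.append("Which did you mean?")
    return "\n".join(lines)

msg = clarifying_question("What was our profit last quarter?", candidates)
print(msg)
```

The hard part, of course, is not printing the menu but generating distinctions like “placed” versus “shipped” in the first place.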
How do you help users understand how the results should be interpreted?
Analytics results can be surprisingly hard to rely on; they rest on assumptions which may or may not be valid. Sometimes those assumptions don’t matter; other times they do. Analysts themselves often ignore the assumptions and get away with it because they know how the result is going to be used. Reporting a long list of conditions with every result will just lead to them being ignored. A good Chat Analytics system will flag the material assumptions and suppress the rest.
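A minimal sketch of that triage might look like the following. The assumptions, impact figures, and materiality threshold are all hypothetical, made up for illustration; in practice, estimating an assumption's impact is the genuinely hard problem.

```python
# Hypothetical assumption triage: attach caveats to a result and surface
# only those likely to change the conclusion, instead of burying the user.
result = {"metric": "Q3 margin", "value": 1.2e6}
assumptions = [
    {"text": "Returns in transit excluded", "impact_pct": 4.0},
    {"text": "FX rates frozen at month-end", "impact_pct": 0.1},
    {"text": "Intercompany sales netted out", "impact_pct": 0.05},
]

# The threshold is a judgment call, not a constant of nature.
MATERIALITY_PCT = 1.0

material = [a["text"] for a in assumptions if a["impact_pct"] >= MATERIALITY_PCT]
print(f"{result['metric']}: {result['value']:,.0f}")
for text in material:
    print(f"  caveat: {text}")
```

Here only the one caveat that could plausibly move the number reaches the user; the other two are recorded but not reported.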
Is it able to check that the answers are right?
Just like humans, Chat-GPT doesn’t always get the answer right but, dangerously, tends to report results with a degree of authority bordering on certainty. Good analysts doubt themselves. They tend to try to calculate the same thing in two different ways to make sure it is correct. How Chat Analytics does this, and how it determines what to do if the answers don’t match will be critical to success. After all it would be unhelpful simply to deliver both results to the user and ask them to decide.
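The “calculate it two ways” habit is easy to illustrate. The data and the two routes below are hypothetical, chosen only to show the pattern of reconciling independent calculations rather than reporting one with false confidence:

```python
import math

# Hypothetical cross-check: compute the same KPI via two independent
# routes, as a careful analyst would, and refuse to report on a mismatch.
orders = [
    {"revenue": 120.0, "cost": 80.0},
    {"revenue": 200.0, "cost": 150.0},
]

# Route 1: sum of per-order margins.
margin_a = sum(o["revenue"] - o["cost"] for o in orders)

# Route 2: total revenue minus total cost.
margin_b = sum(o["revenue"] for o in orders) - sum(o["cost"] for o in orders)

def reconciled(a: float, b: float, tol: float = 1e-6) -> float:
    """Return the answer only if both routes agree; otherwise raise."""
    if not math.isclose(a, b, rel_tol=tol):
        raise ValueError(f"cross-check failed: {a} vs {b}")
    return a

print(reconciled(margin_a, margin_b))
```

The interesting design question is what the system should do when the check fails: silently retrying, escalating to an analyst, or explaining the discrepancy are very different products.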
Knowing the answers to questions like that will tell you how much training your workforce will need despite the new interface. It is possible that you could have a Chat Analytics interface, and yet still have to upskill analytically literate people to ask the big picture questions.
Perhaps the biggest question of them all is not to do with the interface, but with the underlying data.
Does it need to be fed with good data?
How does it handle incomplete, incompatible and incorrect data? What does it do to bridge the gap? What intuitions does it have about the business that enable it to make the best of a bad data environment? Given that data engineering is 80% of data analytics, it is possible that Chat Analytics platforms are answering the wrong question. Instead of asking “how can I get everyone to do analytics?”, maybe you should be asking “how can I get everyone good data?”
But that’s a story for another day.
Comments

Founder, Graphistry.com & Louie.AI:
Good questions, and the tip of the iceberg for enterprise environments with all sorts of visualisation, wrangling, safety, performance, and collaboration needs. We are piloting a GPT environment for data analyst and BI teams, including graph DB connectors for F500 teams, so if this is a problem for TigerGraph users, happy to chat! Some of our users need this to go all the way to legal admissibility of evidence in court, and to preventing novice staff from taking down important accounts, so there is a lot to get right when teaching an LLM to work with a DB.

He/Him. Black lives matter. Making the world a little better, one byte at a time:
I feel like the part people forget most about these generative AIs is that they are completely dependent on the data sources they have access to when training and analysing. And more sources doesn't inherently mean better sources: connecting a hundred low-quality data sources and one high-quality data source is still likely (if not weighted properly) to end up with a model that produces results that are very confident and very wrong.

Data Scientist at QuantumBlack, AI by McKinsey:
How do you see the confidentiality question here? Wouldn't the deployed chat need to cross-check against its source for your data? Processing a company's data off-prem, potentially using it to further train the model, and so on sounds like legally murky ground (or expensive opt-outs).