ChatGPT for data analysis


This is a question that has been bugging me for a long time: how can an LLM that never gives me the same answer to the same question do data analysis, where I expect 100% accuracy?

Here is an example of what I mean when I say ChatGPT will not give you the same answer twice:

Directionally the same, but not exactly the same.

This is a feature of generative AI. It generates text fresh every time, so by default the output will differ each time. GenAI works by predicting the next word from probabilities and building up the sentence one token at a time.
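A minimal sketch of why the output varies: each next token is *sampled* from a probability distribution rather than picked deterministically. The toy vocabulary and probabilities below are invented purely for illustration.

```python
# Toy next-token sampler: the model assigns probabilities to candidate
# words, and the output is a random draw from that distribution.
import random

next_word_probs = {"up": 0.5, "down": 0.3, "flat": 0.2}

def sample_next_word(rng):
    # random.choices draws one item according to the given weights
    return rng.choices(
        list(next_word_probs), weights=list(next_word_probs.values())
    )[0]

rng = random.Random()  # unseeded: different draws on different runs
print([sample_next_word(rng) for _ in range(5)])
# varies run to run, e.g. sometimes 'up', sometimes 'down'
```

Run the script twice and you will almost certainly see two different sequences, which is exactly the behavior described above.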

That being said, when I give it a table of data and ask it to analyze it, I want deterministic results. If I want to know how many tables were sold in the East, there is only one correct answer.

So how does ChatGPT's data analyst CustomGPT claim to do data analysis?

Even MIT has a course on this - https://mitsloanedtech.mit.edu/ai/tools/data-analysis/how-to-use-chatgpts-advanced-data-analysis-feature/

I actually asked ChatGPT to explain this, and I finally understand how it works.

Here is a step by step -

  1. User Interface: The user inputs natural language queries into the CustomGPT.
  2. Natural Language Understanding (NLU): The LLM interprets the queries.
  3. Task Delegation: The LLM generates and sends code to deterministic tools (like pandas, numpy).
  4. Execution Environment: The generated code runs and processes data.
  5. Results Processing: The output is converted back into text by the LLM.
  6. User Interface: The final results are displayed to the user.
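The six steps above can be sketched in a few lines of Python. This is a simplified illustration, not ChatGPT's actual implementation; the data and the "generated" code string are invented for the example.

```python
# Sketch of the LLM + tool pipeline: the LLM only translates the
# question into code (step 3); pandas does the arithmetic (step 4).
import pandas as pd

sales = pd.DataFrame({
    "product": ["Table", "Chair", "Table", "Desk"],
    "region":  ["East", "East", "West", "East"],
    "units":   [10, 25, 7, 12],
})

# Step 3 (Task Delegation): code the LLM might emit for the question
# "How many Tables were sold in the East?"
generated_code = (
    "result = sales[(sales['product'] == 'Table') & "
    "(sales['region'] == 'East')]['units'].sum()"
)

# Step 4 (Execution Environment): run the generated code in a sandbox.
namespace = {"sales": sales}
exec(generated_code, namespace)

# Step 5 (Results Processing): the LLM would phrase this number back
# as natural language; the number itself is deterministic.
print(namespace["result"])  # 10, every single time
```

However the LLM words its final answer, the underlying number comes out of pandas and is the same on every run.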

So the actual analytics is done with code - this is the piece I was missing.

Actually, this is a great example of GenAI as an agent: we combine the LLM with deterministic tools to solve real problems. This pattern is the future of GenAI and how we will unlock practical use cases.

That being said, here is where errors can creep into this process -

  • Natural Language Understanding (NLU): The LLM interprets the query. Since this step is probabilistic, it can misinterpret what you asked.
  • Results Processing: The LLM converts the tool's output back into text. Since this step is also probabilistic, it can misstate the deterministic result.

Examples of mistakes in the NLU -

For instance, if a user inputs the query, "Show me the sales growth," the model might misinterpret this in several ways:

Temporal Ambiguity: The model might not understand the specific time frame the user is interested in. It might show sales growth for the last month instead of the last quarter or year, depending on what it guesses the user means.

Metric Ambiguity: The term "sales growth" might be interpreted in different ways, such as absolute sales numbers, percentage growth, or growth rate. If the model assumes one interpretation without clear context, it could provide incorrect or irrelevant data.

Contextual Ambiguity: If the user's query lacks context and the model doesn't have enough prior conversation history, it might not know which product line or geographical region's sales growth to display if the company operates in multiple domains.
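These ambiguities are easy to demonstrate in code. Below, one vague question maps to three different, equally plausible snippets the LLM could generate - each deterministic on its own, but each answering a different question. The figures are made up for illustration.

```python
# One ambiguous question ("Show me the sales growth"), three plausible
# code interpretations - the LLM has to guess which one you meant.
import pandas as pd

monthly = pd.Series(
    [100.0, 110.0, 121.0],
    index=pd.period_range("2024-01", periods=3, freq="M"),
    name="sales",
)

# Interpretation 1 - metric ambiguity: absolute change, last month
absolute_growth = monthly.iloc[-1] - monthly.iloc[-2]          # 11.0

# Interpretation 2 - metric ambiguity: percentage growth, last month
pct_growth = (monthly.iloc[-1] / monthly.iloc[-2] - 1) * 100   # ~10.0

# Interpretation 3 - temporal ambiguity: growth over the whole quarter
quarter_growth = (monthly.iloc[-1] / monthly.iloc[0] - 1) * 100  # ~21.0

print(absolute_growth, pct_growth, quarter_growth)
```

Every snippet runs correctly and reproducibly; the nondeterminism lives entirely in which snippet the model chooses to write.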

Examples of mistakes in Results Processing

An example of an error in results processing can occur when the LLM incorrectly interprets or formats the output from the deterministic tool, leading to misleading or incorrect information being presented to the user.

1. Misinterpretation of Data:

- The LLM might incorrectly summarize the data if it doesn't properly calculate the overall average from the monthly averages.

2. Formatting Issues:

- Errors in formatting the output can lead to misunderstandings. For example, misplacing decimal points or mislabeling units (e.g., thousands vs. millions).

3. Inaccurate Aggregation:

- The LLM might incorrectly aggregate the results, such as summing values instead of averaging them.

4. Context Loss:

- If the context is not maintained accurately, the LLM might mix up different time periods or datasets, leading to incorrect results.
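The "inaccurate aggregation" failure mode above can be shown concretely: the tool returns correct monthly averages, but a faulty summary step sums them instead of averaging them. The numbers are invented for illustration.

```python
# Correct tool output, wrong narration: summing monthly averages
# when the user asked for the overall average.
import pandas as pd

monthly_avg = pd.Series(
    {"Jan": 50.0, "Feb": 60.0, "Mar": 70.0}, name="avg_order_value"
)

correct_summary = monthly_avg.mean()  # 60.0 - what should be reported
wrong_summary = monthly_avg.sum()     # 180.0 - sum mistaken for average

# If the LLM narrates wrong_summary as "the average order value",
# the user sees a confident but incorrect answer.
print(correct_summary, wrong_summary)
```

The deterministic layer did nothing wrong here; the error is introduced purely in how the result is described back to the user.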

Bottom line - understand all this before you pop that chart into ChatGPT and ask it to do data analysis.

BTW, even with GPT-4o, image generation from text sucks. I asked it to generate an architecture diagram to show the flow, and this is what it came up with.


Himanshu Gupta - Our chat about using GenAI for data analytics.

Emeric Marc

I help companies resuscitate dead leads and sell using AI #copywriting #emailmarketing #coldemail #content #databasereactivation

4 months

Exciting blend of AI and deterministic tools for data analysis.

Vincent Valentine

CEO at Cognitive.Ai | Building Next-Generation AI Services | Available for Podcast Interviews | Partnering with Top-Tier Brands to Shape the Future

4 months

Insightful analysis on leveraging ChatGPT for data tasks. Combining AI with robust tools opens exciting opportunities. How do you envision integrating such architectures in your workflow? Vikram Ekambaram

Woodley B. Preucil, CFA

Senior Managing Director

4 months

Vikram Ekambaram Very well-written & thought-provoking.

David Russell

Software Product Strategist

4 months

Indeed... the deterministic tools are so very important. It's all under the "theme" of AI, but it's not really the LLM doing the work. ChatGPT exposes the execution of Python. For the time being, it's not quite ready for prime-time execution of these scripts - it can't even generate the Python scripts flawlessly with any consistency.

We're still stuck in the step-by-step execution of a workflow we decompose and then curate by hand for the next step. Each variation in the quality of each step's output leads to increased deviation from the "expected outcome". The longer we leave the machines alone to think, the more likely any step in the process will "fail". Step - human - Step - human - Step - human.

Only when we finally get a quality deterministic application against quality data can we get "repeatable outcomes". And when someone wants to know which product is moving faster, which has the most customer complaints, or which salesperson is leading the pack... coming up with different answers upon each execution "because I thought about the problem a different way" destroys confidence in that output.
