Applying LLMs to Data Analytics
One of the areas in Data I am most passionate about is ANALYTICS AS CODE. I am bullish on it because I believe this is where AI can deliver the most impact to the end users of data applications such as Business Intelligence. For me, the biggest revelation from Google's recent Gemma release announcement was not what was in the post, but what was not: the role of applications.
When it comes to code, underlying models get you only 20-40 points of the way. The majority of the value (as much as 40-60 additional points, according to my own company's benchmarks) comes from application-level fine-tuning.
Ok, let’s dive in…
1) Mode Analytics
Mode is an interesting vendor because it started as an internal project at Yammer, and did not have a very impressive—or original—architecture to begin with. In some ways, it committed the very mistake Mode's Co-Founder claims is committed by the majority of data start-ups: "internal (data) tools make lousy startups" (Benn Stancil).
And yet over time it has developed into a comprehensive analytics tool, one capable of serving everyone from a data scientist (with its implementation of Python notebooks) to an executive (with polished dashboard functionality).
Looking at Mode's AI Assistant, however, I have to conclude that it is unimpressive. Mode's AI assistant works by taking the context of the current query and sending it to a raw, non-fine-tuned OpenAI model. As a result, Mode's assistant has intelligence neither about the data model of the business, such as column names or business-specific metric definitions, nor about the domain of the business, such as domain-specific approaches to marketing attribution. It is helpful only in the context of a very long query with multiple other Common Table Expressions (CTEs), from which the assistant can extract some useful information. Basically, it is nothing more than a simple OpenAI extension.
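To make the distinction concrete, here is a minimal sketch of the difference between forwarding raw query text and enriching the prompt with the business's data model. All function, table, and metric names here are hypothetical illustrations, not Mode's actual implementation:

```python
# Hypothetical sketch: a "raw passthrough" assistant (roughly what Mode
# appears to do) versus a schema-aware prompt builder. Names are
# illustrative, not Mode's actual code.

def build_raw_prompt(user_query: str) -> str:
    # Passthrough: the model sees only the SQL the analyst typed.
    return f"Help me with this SQL query:\n{user_query}"

def build_schema_aware_prompt(user_query: str, schema: dict, metrics: dict) -> str:
    # Enriched: column names and business metric definitions ride along,
    # so the model can resolve names like `mrr` without guessing.
    schema_lines = "\n".join(
        f"table {table}: {', '.join(cols)}" for table, cols in schema.items()
    )
    metric_lines = "\n".join(f"{name} = {defn}" for name, defn in metrics.items())
    return (
        "You are a SQL assistant for this business.\n"
        f"Schema:\n{schema_lines}\n"
        f"Metric definitions:\n{metric_lines}\n"
        f"Query:\n{user_query}"
    )

prompt = build_schema_aware_prompt(
    "SELECT mrr FROM subscriptions",
    schema={"subscriptions": ["id", "plan", "amount", "started_at"]},
    metrics={"mrr": "SUM(amount) over active monthly subscriptions"},
)
```

Only the second prompt gives the model a chance to answer questions about business-specific metrics; the first leaves it guessing from the query text alone.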
2) Hex
First, it is worth crediting Hex with succeeding where many have failed: notebook collaboration. Between 2015 and 2020 this was a very busy category. VCs invested in numerous start-ups (e.g. Deepnote) that largely failed to grow into the unicorn-level companies they were pitched as.
Hex works by merging a few concepts together: SQL, Python Notebooks, and “SQL Chaining”. Each new section of the notebook lets analysts apply a transformation to the previous result.
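The "SQL chaining" idea can be sketched in plain SQLite: each cell's result becomes a named relation that the next cell queries. The table and cell names below are illustrative, not Hex's internals:

```python
import sqlite3

# Sketch of "SQL chaining": each notebook cell's result becomes a named
# relation the next cell can query. Names are illustrative, not Hex's API.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "EU", 10.0), (2, "EU", 30.0), (3, "US", 5.0)])

# Cell 1: filter the raw table, materialized as a view.
con.execute("CREATE VIEW cell_1 AS SELECT * FROM orders WHERE region = 'EU'")
# Cell 2: aggregate the previous cell's result, never touching raw orders.
con.execute("""CREATE VIEW cell_2 AS
               SELECT region, AVG(total) AS avg_total FROM cell_1 GROUP BY region""")

print(con.execute("SELECT * FROM cell_2").fetchall())  # → [('EU', 20.0)]
```

Each transformation stays small and inspectable, which is what makes the notebook readable for an analyst stepping through it.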
On the surface, Hex's AI product, Magic, succeeds where others fail. Unlike Mode's AI, Magic relies on all available context. Whether that's the vendor's knowledge of the underlying database schema or information available from partner integrations, Magic takes that meta information and sends it along with the user prompt to OpenAI. It also has a backend Data Manager interface, where the customer can either provide additional descriptions for the data model or exclude certain data sources from the AI entirely. See for yourself.
However, testing the AI more deeply reveals major flaws. A simple SQL test (not even SQL 101) to calculate the average price of items in an order fails. The AI does not recognize the difference between the price of an item from a menu table and the total of an order from the orders table (a combined total of menu items). For all the increased context, Hex's Magic is not actually that magical, as it relies on the basic raw OpenAI model without adjusting it for any of the knowledge the Hex team has about typical data use cases.
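The failure mode is easy to reproduce in plain SQL. Under an assumed two-table schema (hypothetical names, not Hex's demo dataset), the correct "average item price" must come from joining order lines back to menu prices, not from averaging order totals:

```python
import sqlite3

# Reproducing the test case: "average price of items in the order".
# The schema is an assumed illustration, not Hex's actual demo data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE menu (item_id INTEGER, price REAL)")
con.execute("CREATE TABLE order_items (order_id INTEGER, item_id INTEGER)")
con.execute("CREATE TABLE orders (order_id INTEGER, total REAL)")
con.executemany("INSERT INTO menu VALUES (?, ?)", [(1, 4.0), (2, 10.0)])
con.executemany("INSERT INTO order_items VALUES (?, ?)",
                [(100, 1), (100, 1), (100, 2)])
con.execute("INSERT INTO orders VALUES (100, 18.0)")

# Correct: join order lines back to menu prices, then average.
correct = con.execute("""
    SELECT AVG(m.price)
    FROM order_items oi JOIN menu m ON m.item_id = oi.item_id
    WHERE oi.order_id = 100
""").fetchone()[0]  # (4 + 4 + 10) / 3 = 6.0

# The confusion described above: averaging order *totals* instead,
# which treats an order's combined total as if it were an item price.
wrong = con.execute("SELECT AVG(total) FROM orders").fetchone()[0]  # 18.0
```

A model that had been tuned on typical analytics schemas would know these two numbers answer different questions; a raw model with a prompt full of metadata evidently does not.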
The other area where Magic fails in my own testing is speed. Where the video demo succeeds, real use of Magic—even in a very small test scenario run from a trial account—takes forever to process. If they are sending all meta information in the context window without pre-fine-tuning, that could explain the major lag: transformer self-attention cost grows as O(n^2) with the amount of information in the context. Naturally, this is not a good way to implement an LLM integration.
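The quadratic claim refers to self-attention: every token in the context attends to every other token, so pairwise work grows with the square of the context length. A back-of-the-envelope sketch:

```python
# Back-of-the-envelope: self-attention does O(n^2) pairwise work over a
# context of n tokens, so stuffing all metadata into every prompt is costly.
def attention_pairs(n_tokens: int) -> int:
    # One attention score per (query, key) token pair in a single head.
    return n_tokens * n_tokens

base = attention_pairs(1_000)      # a short, focused prompt
bloated = attention_pairs(10_000)  # the same prompt padded with all metadata
print(bloated / base)              # → 100.0
```

Ten times the tokens means a hundred times the pairwise work, which is why fine-tuning knowledge into the model (rather than shipping it in every prompt) matters for latency.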
Basically, what Hex has done is put a Tesla battery into a Ford, but made no other significant changes, so the car continues to mainly rely on gasoline.
3) Looker
As a former Looker employee, I initially wanted to spend a lot of time delving into Looker (not to be confused with Looker Studio, a sliver of the full functionality). In principle, Looker, being inside Google and all, has the most to gain from AI. It has an incredibly powerful semantic layer that cohesively captures analytics code across the business in one standard model.
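To illustrate what a semantic layer captures, here is a sketch in Python rather than actual LookML syntax; the metric names and compiler function are hypothetical:

```python
# Hedged sketch of what a semantic layer encodes: one shared definition of
# each metric, compiled to SQL everywhere it is used. Not actual LookML.
METRICS = {
    "revenue": {"sql": "SUM(orders.amount)", "table": "orders"},
    "order_count": {"sql": "COUNT(DISTINCT orders.id)", "table": "orders"},
}

def compile_metric(name: str, group_by: str) -> str:
    # Every dashboard and query reuses the same definition, so "revenue"
    # means the same thing everywhere in the business.
    m = METRICS[name]
    return (f"SELECT {group_by}, {m['sql']} AS {name} "
            f"FROM {m['table']} GROUP BY {group_by}")

print(compile_metric("revenue", "region"))
# → SELECT region, SUM(orders.amount) AS revenue FROM orders GROUP BY region
```

This is exactly the kind of structured, business-specific code that public scrapes of StackOverflow never contain, which is why the corpus matters so much.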
With the flip of a button, all those models shared with Google could be trained on to build a new LLM. With nearly 10,000,000 business users and hundreds of thousands of data developers, this would essentially give Google unprecedented access to a proprietary codebase to build the best AI for analytics, period.
Because code from data teams is not readily available in public, LLMs today are trained only on the code snippets found in documentation and places like StackOverflow, which are not representative of the true complexity of code inside actual data teams.
But as I dug into the overall Gemini phenomenon and its Looker implementation, my enthusiasm vanished.
Looker has essentially outsourced all AI to Gemini. Without any special access to LookML for training Gemini's underlying models, the AI assistant adds little leverage over someone simply connecting a GitHub Copilot or OpenAI extension to their VSCode editor and making changes to LookML there. The implementation is basically analogous to Hex's, but relies on Google rather than on Microsoft/OpenAI. It has no special understanding of analytics (or LookML, for that matter) beyond what Google's own servers were capable of scraping from the public web. And with Google's recent Gemini launch flop, I would personally be tempted to keep my Looker code and my AI copilot engine hosted with separate vendors, not all in Google.
That is it for now. Did you enjoy reading this? Then please comment below and let me know which other vendors I should cover.
For an update to the above: https://www.dhirubhai.net/feed/update/urn:li:activity:7174396436782997506/
Full piece: https://mirdata.substack.com/p/applying-llms-to-data-analytics-part