Garbage In, Gold Out!
We’ve all heard the expression “garbage in, garbage out” when it comes to data systems. But Generative AI brings a big caveat, and a big new opportunity.
Data remains the biggest and most important factor in the usefulness of AI systems. Algorithms are becoming a commodity, so the biggest differentiator is the quantity, quality, and relevance of the underlying data set.
But there’s an important distinction between the underlying data and the way it’s actually recorded and stored. Real-world systems see the world through a cracked and smudged lens. But even if each point of light is dubious, we can still get an overall impression of what’s going on.
For example, if your IoT sensors are recording random numbers, you obviously can’t get anything useful out of them. But if they’re “just” inaccurate, with the real data hidden behind a veil of noise, the result is still potentially usable with the right statistical techniques.
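The classic statistical defence against this kind of noise is simple averaging. Here’s a minimal, self-contained Python sketch, with made-up temperature readings and an assumed noise level, showing how the underlying value re-emerges from hundreds of individually unreliable samples (this works when the noise is random rather than systematic):

```python
import random
import statistics

random.seed(42)  # reproducible demo

TRUE_TEMP = 20.0   # the real (hidden) temperature
NOISE_STD = 2.0    # hypothetical sensor noise level, in degrees

# Simulate 1,000 noisy sensor readings: real value + random error.
readings = [TRUE_TEMP + random.gauss(0, NOISE_STD) for _ in range(1000)]

# Any single reading may be off by several degrees...
single_error = abs(readings[0] - TRUE_TEMP)

# ...but averaging cancels the random noise and recovers the real value.
estimate = statistics.mean(readings)
estimate_error = abs(estimate - TRUE_TEMP)

print(f"single reading error: {single_error:.2f}")
print(f"mean-of-1000 error:   {estimate_error:.2f}")
```

The point is not the specific numbers: it’s that a “veil of noise” is recoverable, whereas truly random data is not.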
Now, new generative AI technologies take this principle a big step further.
Large language models are very good at dealing with some types of messy data. For example, researchers have shown that large language models like GPT-4 can decipher even heavily scrambled sentences.
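To illustrate the kind of scrambling involved, here’s a short Python sketch that shuffles the interior letters of each word while keeping the first and last letters in place; this is a hypothetical reconstruction of the “typoglycemia”-style text used in such experiments, not the researchers’ actual code:

```python
import random

random.seed(7)  # make the demo repeatable

def scramble(word: str) -> str:
    """Shuffle a word's interior letters, keeping first and last fixed."""
    if len(word) <= 3:
        return word  # nothing meaningful to shuffle
    middle = list(word[1:-1])
    random.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

sentence = "large language models can decipher scrambled sentences"
print(" ".join(scramble(w) for w in sentence.split()))
```

Humans can usually still read text mangled this way, and large language models turn out to handle it remarkably well too.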
A personal example: my daughter recorded a short section of her economics class (with permission). The quality was awful—the teacher’s voice was almost completely drowned out by the sound of my daughter typing and other background sounds. I personally couldn’t really hear what he was saying.
I ran the recording through OpenAI's open-source transcription algorithm Whisper, using the slowest and most sophisticated model available. It did a good job of deciphering many of the spoken words, but there were gaps, a few words that were clearly incorrect, and the result was hard to follow (the teacher had a tendency to digress and circle back).
I took the transcript and put it into ChatGPT 4, asking it to “take the text and put it into sentences”. As if by magic, out popped a restructured, clear, three-paragraph summary of the economic points the teacher had discussed. It wasn’t what he said, but it was a lot closer to what he meant.
Large language models are good at figuring out what we meant, and the principle applies to many real-world data problems.
For example, machine learning is already used to extract information from documents such as invoices: the date, amount, supplier ID, and so on. But these models require lots of training data and don't generalize very well: try to use them on an invoice layout the model hasn't seen before, and it may get stumped. By adding generative AI, the system gets much more effective at dealing with edge cases and novel layouts.
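As a rough sketch of what that looks like in practice, the snippet below builds an extraction prompt and parses the model's JSON reply. The `call_llm` function here is a hypothetical stand-in that returns a canned answer; in a real system it would call an actual model API:

```python
import json

INVOICE_TEXT = """ACME Supplies Ltd
Invoice no. 2024-0117   Date: 12 March 2024
Supplier ID: ACM-442
Total due: EUR 1,250.00"""

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call; a production system
    # would send the prompt to a hosted or local model instead.
    return '{"date": "2024-03-12", "amount": "1250.00", "supplier_id": "ACM-442"}'

def extract_invoice_fields(text: str) -> dict:
    prompt = (
        "Extract the invoice date (ISO format), total amount, and supplier ID "
        "from the document below. Reply with JSON only, using null for any "
        f"field that is not present.\n\n{text}"
    )
    # Asking for 'null if not present' is one simple guard against the
    # model inventing a value that isn't actually in the document.
    return json.loads(call_llm(prompt))

fields = extract_invoice_fields(INVOICE_TEXT)
print(fields)
```

Because the instructions are in natural language rather than baked into training data, a new invoice layout needs no retraining, only the same prompt.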
There are dangers, because these models are designed to synthesize what "should" or "could" be there, not just analyze what is actually there. In the previous examples, the result may include thoughts the economics teacher never mentioned, or a supplier ID invented even when none appears in the document.
Figuring out how to avoid such "hallucinations" is currently the leading edge of AI research—with approaches that include asking the model to double-check itself, averaging the results of several instances of the model, or adding an extra check from a dedicated verification model.
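The "averaging several instances" idea can be sketched in a few lines: sample the model multiple times and only keep an answer that enough of the runs agree on. The answers below are made up for illustration; in practice each would come from a separate model call with some randomness (temperature) enabled:

```python
from collections import Counter

# Hypothetical answers from five independent runs of the same model
# on the same question.
answers = ["ACM-442", "ACM-442", "ACM-412", "ACM-442", "ACM-442"]

def majority_vote(samples: list[str], min_agreement: float = 0.6):
    """Keep an answer only if enough independent samples agree on it."""
    best, count = Counter(samples).most_common(1)[0]
    return best if count / len(samples) >= min_agreement else None

print(majority_vote(answers))          # 4 of 5 runs agree, so keep the answer
print(majority_vote(["a", "b", "c"]))  # no consensus: better to return nothing
```

Returning nothing when the runs disagree is often the right trade-off: a flagged gap is easier to handle downstream than a confident fabrication.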
But overall, generative AI is a great new opportunity to open up more data in new ways, to rethink what data sources are available, how they can be used to improve processes—and to turn what looks like data garbage into business gold.