The A.I. Environment
Since the term “A.I.” is all around us, I used my last post to define just what A.I. and machine learning are - and aren’t. Building on that, in this post I want to look at some of the terms we hear in the A.I. discussion and get a general idea of how it works. Peeling back the curtain a bit helps us both optimize how we apply A.I. and avoid situations where it could generate flawed results (note that “flawed results” do not include attempts at world domination).
Whatever terms we want to use, most data applications today fall into one of two camps: predictive or generative models. Predictive models look at the available data and make an educated guess about the future. Generative models look at all the available data and create something new, using the data for reference.
For either model to function, we need a significant amount of data, and the results of the models are entirely dependent on the quality of the data they use. In the simplest terms, GIGO: garbage in, garbage out.
With machine learning, though, it's often not that simple. A machine doesn't automatically know the difference between outdated, inaccurate, or incorrectly entered data and data that is current and focused. Without some intervention, all data gets equal weighting. So downstream, when we apply the model's results, we may get hallucinations or incorrect conclusions.
Even worse, the flawed downstream data may feed into the system and be given equal weight, making our model increasingly corrupted with no indication or practical way to correct it. This is called generative inbreeding. It's a data feedback loop that amplifies problems exponentially. One writer recently compared it to "The Human Centipede."
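This feedback loop can be illustrated with a toy simulation (a sketch only - real generative models are vastly more complex than this, and the "model" here is just a mean and a standard deviation). Each generation trains on the previous generation's output instead of fresh real-world data, and the diversity of the original data steadily collapses:

```python
import random
import statistics

def train(samples):
    # "Training" here just means estimating a mean and standard deviation.
    return statistics.mean(samples), statistics.pstdev(samples)

def generate(mean, stdev, n):
    # The "model" creates new data by sampling from what it learned.
    return [random.gauss(mean, stdev) for _ in range(n)]

random.seed(42)
data = [random.gauss(0.0, 1.0) for _ in range(20)]  # original real-world data

stdev = statistics.pstdev(data)
for generation in range(200):
    mean, stdev = train(data)
    data = generate(mean, stdev, 20)  # retrain only on the model's own output

# Over the generations the spread of the data shrinks toward nothing:
# the model has fed on its own output and lost the variety of the
# original data, with no signal left to correct it.
```

Nothing in the loop flags that anything went wrong - each generation's data looks perfectly plausible to the next - which is exactly why generative inbreeding is so hard to detect from inside the system.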
For this reason, the data going in must be high-quality: accurate, current, and relevant. That is not an impossible goal, but it is a challenging target to hit. One of the biggest challenges is data's perishability: "data decay."
Data loses relevancy almost from the moment it is collected. As I've mentioned, the stats give you a sense of the scope of the problem: in just one hour, 521 businesses will change their corporate addresses, 872 telephone numbers will change or disconnect, and 1,504 URLs will be created or modified.
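One common way to account for perishability is to down-weight records by age rather than treating them all equally. Here is a minimal sketch of that idea; the function name and the 180-day half-life are my own assumptions for illustration - the right half-life depends entirely on the kind of data (addresses decay far more slowly than URLs):

```python
# A sketch of age-based down-weighting to counter "data decay".
# freshness_weight and half_life_days are hypothetical names/values,
# not from any particular library.
def freshness_weight(age_days: float, half_life_days: float = 180.0) -> float:
    # Each half-life that passes cuts the record's weight in half.
    return 0.5 ** (age_days / half_life_days)

print(freshness_weight(0))    # a record collected today counts fully: 1.0
print(freshness_weight(540))  # an 18-month-old record counts for 0.125
```

A pipeline could multiply each training example's contribution by a weight like this, so stale records fade from influence instead of sitting in the model with equal standing forever.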
Many of the big LLM developers thought they'd found an endless pipeline of high-quality data to train their models. Recent news suggests that may be changing. But while this could radically change the landscape for some of the biggest names in the industry, it also presents an opportunity.
This potential shift could pave the way for innovative approaches to data collection and application. Maintaining high-quality, relevant data is a substantial challenge, but it also opens up possibilities: newer, more efficient algorithms that account for data decay, or inventive strategies for data acquisition. What initially appears to be a hurdle could instead catalyze creativity and innovation in machine learning.
Sources: cited within article