The A.I. Environment
Photo By: Steve Johnson via Unsplash

Since the term “A.I.” is all around us, I used my last post to define just what A.I. and machine learning are - and aren’t. Building on that in this post, I want to talk about some of the terms we hear in the A.I. discussion, to get a general idea of how it works. Peeling back the curtain a bit helps us both optimize the application of A.I., and avoid situations where it could generate flawed results (note that “flawed results” do not include attempts at world domination).

Whatever terms we want to use, most data applications today fall into one of two camps: generative or predictive models. Predictive models look at the available data and make an educated guess about the future. A generative model looks at all the available data and creates something new, using the data for reference.

For either model to function, we require a significant amount of data, and the results of the models are entirely dependent on the quality of the data they use. In the simplest terms, GIGO: garbage in, garbage out.

With machine learning, though, it's often not that simple. A machine doesn't automatically know the difference between outdated or inaccurate data (or even data that was input incorrectly) and current, focused data. Without some intervention, all data gets equal weighting. So downstream, when we apply our model's results, we may get hallucinations or incorrect conclusions.
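One common form of "intervention" is to discount older records so they count for less than fresh ones. Here is a minimal sketch of that idea in Python. The exponential half-life scheme, the 180-day default, the function names, and the sample records are all my own assumptions for illustration, not a prescribed method.

```python
from datetime import date


def recency_weight(collected_on, as_of, half_life_days=180.0):
    """Weight that halves every `half_life_days` since collection (assumed scheme)."""
    age_days = (as_of - collected_on).days
    if age_days < 0:
        raise ValueError("record collected after the as-of date")
    return 0.5 ** (age_days / half_life_days)


def weighted_mean(records, as_of, half_life_days=180.0):
    """Average `value` across records, discounting older records."""
    weights = [
        recency_weight(r["collected_on"], as_of, half_life_days) for r in records
    ]
    total = sum(weights)
    return sum(w * r["value"] for w, r in zip(weights, records)) / total


# Hypothetical records: one fresh, one a year old.
records = [
    {"value": 100.0, "collected_on": date(2024, 1, 1)},
    {"value": 40.0, "collected_on": date(2023, 1, 1)},
]
as_of = date(2024, 1, 1)

plain_mean = sum(r["value"] for r in records) / len(records)
print("equal weighting:", plain_mean)
print("recency weighting:", weighted_mean(records, as_of))
```

With equal weighting, the stale record pulls the average down as hard as the fresh one does. With recency weighting, the result sits closer to the current value. The right half-life depends entirely on how quickly a given kind of data goes stale.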

Even worse, the flawed downstream data may feed into the system and be given equal weight, making our model increasingly corrupted with no indication or practical way to correct it. This is called generative inbreeding. It's a data feedback loop that amplifies problems exponentially. One writer recently compared it to "The Human Centipede."
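To make the feedback loop concrete, here is a toy simulation in Python. It is a sketch under heavy assumptions: the corpus sizes, the model's own error rate, and the rule that generated output inherits the corpus's flaw rate are all hypothetical numbers and simplifications I chose for illustration. It shows only that the flawed share ratchets upward when output is fed back in unweighted; it says nothing about how fast that happens in a real system.

```python
def simulate_feedback(clean, flawed, generated_per_round, model_error_rate, rounds):
    """Return the flawed share of the corpus after each round of self-training."""
    shares = []
    for _ in range(rounds):
        total = clean + flawed
        # Assumption: output inherits the corpus's flaw rate plus the model's own errors.
        output_flaw_rate = min(1.0, flawed / total + model_error_rate)
        new_flawed = generated_per_round * output_flaw_rate
        # Generated records are added back to the corpus with equal weight.
        flawed += new_flawed
        clean += generated_per_round - new_flawed
        shares.append(flawed / (clean + flawed))
    return shares


# Hypothetical starting point: 5% of 1,000 records are flawed.
shares = simulate_feedback(
    clean=950, flawed=50, generated_per_round=500, model_error_rate=0.05, rounds=5
)
for round_number, share in enumerate(shares, start=1):
    print(f"round {round_number}: {share:.1%} of the corpus is flawed")
```

Nothing in the loop ever removes a flawed record, which is the point: without a correction step, the share can only go up.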

For this reason, the data going in must be high-quality: accurate, current, and relevant. This is not an impossible goal, but it's a challenging target to hit. One of the biggest challenges is data's perishability: "data decay."

Data loses relevancy almost from the moment it is collected. As I've mentioned, the stats give you a sense of the scope of the problem: in just one hour, 521 businesses will change their corporate addresses, 872 telephone numbers will change or disconnect, and 1,504 URLs will be created, modified, or changed.
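To get a feel for what those hourly figures add up to, here is a quick back-of-the-envelope calculation in Python. The only inputs are the three hourly numbers above. The assumption that the rates stay constant around the clock and across a 365-day year is mine, so treat the totals as rough orders of magnitude.

```python
# Hourly change counts quoted in the article.
HOURLY_CHANGES = {
    "business address changes": 521,
    "telephone number changes": 872,
    "URL changes": 1504,
}

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: non-leap year, constant rate


def scale_changes(hourly, hours):
    """Scale each hourly count to a longer window, assuming a constant rate."""
    return {name: count * hours for name, count in hourly.items()}


per_day = scale_changes(HOURLY_CHANGES, HOURS_PER_DAY)
per_year = scale_changes(HOURLY_CHANGES, HOURS_PER_DAY * DAYS_PER_YEAR)

for name in HOURLY_CHANGES:
    print(f"{name}: {per_day[name]:,} per day, {per_year[name]:,} per year")
```

Under those assumptions, the address figure alone works out to roughly 12,500 changes a day and about 4.6 million a year.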

Many developers of big LLMs thought they'd found an endless pipeline of high-quality data to train their models. Recent news suggests that may be changing. But while this could radically change the landscape for some of the biggest names in the industry, it also presents an opportunity.

The potential shift in the landscape of data usage could pave the way for innovative approaches to data collection and application. Although the challenge of maintaining high-quality, relevant data is substantial, it opens up many possibilities. It could lead to newer, more efficient algorithms that account for data decay, or to inventive strategies for data acquisition. So while it may initially look like a hurdle, this changing scenario could catalyze creativity and innovation in machine learning.

Sources: cited within article

Royce M. Clemens

Faith-driven Investment Banker and Business Advisor

1 year ago

Good read Bobby.

