The A.I. Environment
Since the term “A.I.” is all around us, I used my last post to define just what A.I. and machine learning are - and aren’t. Building on that, in this post I want to look at some of the terms we hear in the A.I. discussion and get a general idea of how it works. Peeling back the curtain a bit helps us both optimize how we apply A.I. and avoid situations where it could generate flawed results (note that “flawed results” do not include attempts at world domination).
Whatever terms we want to use, most data applications today fall into one of two camps: predictive or generative models. Predictive models look at the available data and make an educated guess about the future. Generative models look at all the available data and create something new, using the data for reference.
For either model to function, we need a significant amount of data, and the results of the models are entirely dependent on the quality of the data they use. In the simplest terms, GIGO: garbage in, garbage out.
With machine learning, though, it's often not that simple. A machine doesn't automatically know the difference between outdated, inaccurate, or incorrectly entered data and data that is current and focused. Without some intervention, all data gets equal weighting. So downstream, when we apply the model's results, we may get hallucinations or incorrect conclusions.
Even worse, the flawed downstream data may feed into the system and be given equal weight, making our model increasingly corrupted with no indication or practical way to correct it. This is called generative inbreeding. It's a data feedback loop that amplifies problems exponentially. One writer recently compared it to "The Human Centipede."
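This feedback loop can be illustrated with a toy simulation (a sketch only - real generative models are vastly more complex than this, and the "model" here is just a mean and a standard deviation). Each generation trains on the previous generation's output instead of fresh real-world data, and the diversity of the original data steadily collapses:

```python
import random
import statistics

def train(samples):
    # "Training" here just means estimating a mean and standard deviation.
    return statistics.mean(samples), statistics.pstdev(samples)

def generate(mean, stdev, n):
    # The "model" creates new data by sampling from what it learned.
    return [random.gauss(mean, stdev) for _ in range(n)]

random.seed(42)
data = [random.gauss(0.0, 1.0) for _ in range(20)]  # original real-world data

stdev = statistics.pstdev(data)
for generation in range(200):
    mean, stdev = train(data)
    data = generate(mean, stdev, 20)  # retrain only on the model's own output

# Over the generations the spread of the data shrinks toward nothing:
# the model has fed on its own output and lost the variety of the
# original data, with no signal left to correct it.
```

Nothing in the loop flags that anything went wrong - each generation's data looks perfectly plausible to the next - which is exactly why generative inbreeding is so hard to detect from inside the system.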
For this reason, the data going in must be high-quality: accurate, current, and relevant. That is not an impossible goal, but it is a challenging target to hit. One of the biggest challenges is data's perishability: "data decay."
Data loses relevancy almost from the moment it is collected. As I've mentioned, the stats give you a sense of the scope of the problem: in just one hour, 521 businesses will change their corporate addresses, 872 telephone numbers will change or disconnect, and 1,504 URLs will be created or modified.
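One common way to account for perishability is to down-weight records by age rather than treating them all equally. Here is a minimal sketch of that idea; the function name and the 180-day half-life are my own assumptions for illustration - the right half-life depends entirely on the kind of data (addresses decay far more slowly than URLs):

```python
# A sketch of age-based down-weighting to counter "data decay".
# freshness_weight and half_life_days are hypothetical names/values,
# not from any particular library.
def freshness_weight(age_days: float, half_life_days: float = 180.0) -> float:
    # Each half-life that passes cuts the record's weight in half.
    return 0.5 ** (age_days / half_life_days)

print(freshness_weight(0))    # a record collected today counts fully: 1.0
print(freshness_weight(540))  # an 18-month-old record counts for 0.125
```

A pipeline could multiply each training example's contribution by a weight like this, so stale records fade from influence instead of sitting in the model with equal standing forever.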
Many of the big LLM developers thought they'd found an endless pipeline of high-quality data to train their models. Recent news suggests that may be changing. But while this could radically change the landscape for some of the biggest names in the industry, it also presents an opportunity.
This potential shift could pave the way for innovative approaches to data collection and application. Maintaining high-quality, relevant data is a substantial challenge, but it also opens up possibilities: newer, more efficient algorithms that account for data decay, or inventive strategies for data acquisition. What initially appears to be a hurdle could instead catalyze creativity and innovation in machine learning.
Sources: cited within article