Training Sets: The real intelligence behind AI/ML

Over the course of 2023, Artificial Intelligence (AI) and Machine Learning (ML) have seemingly exploded into public awareness - the year opened riding the momentum of ChatGPT's late-2022 debut, and wrapped up with X (formerly Twitter) introducing Grok.

But what is AI/ML? Many people hear the term AI and jump to thoughts of muscled killer robots, or of eerie social media ads so well targeted they make you wonder who's possibly listening in. In reality, AI/ML is far more than those extremes - which returns us to the question of "what is AI?" We'll get to that, but first let's establish some background.

TechTarget defines AI as "the simulation of human intelligence processes by machines, especially computer systems. Specific applications of AI include expert systems, natural language processing, speech recognition, and machine vision."

More specifically, TechTarget notes that, "AI requires a foundation of specialized hardware and software for writing and training machine learning algorithms…. In general, AI systems work by ingesting large amounts of labeled training data, analyzing the data for correlations and patterns, and using these patterns to make predictions about future states."

IBM defines Machine Learning as "a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy."

Spiceworks adds further color to that definition saying ML is "a discipline of artificial intelligence (AI) that provides machines the ability to automatically learn from data and past experiences to identify patterns and make predictions with minimal human intervention."

The common thread in these definitions is that a large volume of data must first exist to feed and inform the algorithms.

Recent years have introduced AI personal assistants like Alexa and Siri, bringing AI from the corporate world into the individual household. The areas of life AI actually touches range from planning simple tasks and complex projects, to crunching spreadsheets, to suggesting recipes, music, or shopping based on your likes, to predicting patterns in weather or the stock market. Some analytical software on the market even takes recorded phone calls and uses machine learning to transform that qualitative conversation into quantitative, actionable insight for a company.

So how did AI get smart to begin with? Algorithms may be designed or trained to recognize patterns and offer predictions, but they didn't spontaneously "big bang" into existence.

At its most fundamental level, a training pool - a data set of pre-selected information - must first exist to teach that "intelligence" exactly what it must learn. How else would it distinguish between a calendar date, a grocery list item, a song, or a search query?

The short answer is, it can't. Even we humans needed an external source to teach us what numbers and letters "meant" - a collective, agreed-upon meaning - before we could learn to do math or read. (Why do you think Egyptian hieroglyphs frustrated modern researchers for so long?)

Another significant point to consider is that a data set is meaningful and effective when it includes both relevant and non-relevant items (some industry jargon calls these "positive" and "negative" hits). For example, the statements "I want to sign up" and "I do not want to sign up" have completely opposite meanings, yet hinge on the difference of a single word. (Sentiment/tone analysis is a separate level of complexity, better left to a separate discussion.)
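That single-word difference can be made concrete. The sketch below is purely illustrative (the function names and the keyword rule are my own, not from any particular product): a naive keyword rule treats both sentences as identical, while a representation that keeps every word - including "not" - gives a learner something to distinguish them by.

```python
# Illustrative sketch: why a training set needs negative examples.
# A naive keyword rule vs. a bag-of-words view that preserves "not".

def keyword_match(sentence: str) -> str:
    # Naive rule: flag any sentence containing the phrase "sign up".
    return "positive" if "sign up" in sentence.lower() else "negative"

def bag_of_words(sentence: str) -> set[str]:
    # A bag-of-words representation keeps every token, including "not".
    return set(sentence.lower().split())

positive = "I want to sign up"
negative = "I do not want to sign up"

# The keyword rule collapses both sentences into the same answer...
print(keyword_match(positive), keyword_match(negative))  # positive positive

# ...but their bag-of-words representations differ by exactly the words
# "do" and "not" - which is what a trained model can learn to use.
print(bag_of_words(negative) - bag_of_words(positive))
```

A rule that ignores negation would happily "sign up" the person who explicitly declined - which is exactly why the negative example belongs in the training pool.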

Volume and variety of examples further enhance the quality of that learning data pool. Consider, for example, the following two phrases, which share none of the same key words and yet express the same core intent: "I would like to make a reservation" vs. "I want to book a room."

So why is the data set/training pool so important? Because if a piece of information has never been added, the AI won't know how to interpret it - a sophisticated algorithm might guess, but a guess is all it remains.
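Both points - that paraphrases with no shared key words can carry the same intent, and that an input absent from the pool leaves the algorithm guessing - can be illustrated with a toy word-overlap matcher. The training pool, labels, and scoring below are hypothetical simplifications of my own; real systems use far richer representations.

```python
# Toy sketch of intent matching by word overlap (Jaccard similarity).
# Hypothetical training pool; purely illustrative.

TRAINING_POOL = {
    "i would like to make a reservation": "booking",
    "i want to book a room": "booking",
    "what time do you open": "hours",
}

def jaccard(a: set[str], b: set[str]) -> float:
    # Word-overlap score: shared words divided by total distinct words.
    return len(a & b) / len(a | b)

def classify(sentence: str) -> tuple[str, float]:
    # Return the best-matching label from the pool, plus its score.
    words = set(sentence.lower().split())
    best_label, best_score = "unknown", 0.0
    for example, label in TRAINING_POOL.items():
        score = jaccard(words, set(example.split()))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# A phrasing close to something in the pool is matched with some confidence...
print(classify("i want to book a table"))
# ...but a sentence sharing no words with the pool scores zero:
# any label the system picked here would be an unsupported guess.
print(classify("play some jazz"))
```

Notice that including both phrasings of the booking intent in the pool is what lets either wording match - variety in the training set is doing the real work.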

Some users leverage AI tools to accelerate their research and find new perspectives and sources; others use them to expedite writing projects or spreadsheet calculations. Even simply using APIs to push or pull data from one source to another can accelerate previously tedious database work. (For those less familiar with APIs, think of one as the plug at the end of a device's cord: just as a plug must fit a specific outlet before the electrical circuit is established, a matching connection must exist before data can flow between systems.) APIs are not "intelligent" the way an AI is, but once they're "plugged in," a regular data transfer can be automated.
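The plug analogy can be sketched without any real network calls. Everything below is hypothetical: two plain dictionaries stand in for a source service and a destination service, and sync plays the role of the scheduled API transfer (in practice these would be HTTP requests run on a timer).

```python
# Hedged sketch of an automated push/pull data transfer between two systems.
# The "services" here are plain dictionaries; a real integration would call
# HTTP endpoints, but the pull -> push shape is the same.

source_service = {"record_1": {"name": "Ada", "signed_up": True}}
destination_service: dict[str, dict] = {}

def pull(service: dict, record_id: str) -> dict:
    # "Pull": fetch a record from the source system.
    return service[record_id]

def push(service: dict, record_id: str, record: dict) -> None:
    # "Push": write the record into the destination system.
    service[record_id] = record

def sync(record_id: str) -> None:
    # The automated transfer: once the "plug" (API connection) exists,
    # this step can run on a schedule with no human in the loop.
    push(destination_service, record_id, pull(source_service, record_id))

sync("record_1")
print(destination_service)  # the record now exists in both systems
```

No intelligence is involved here - the transfer is entirely mechanical - which is the distinction the article draws between an API and an AI.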

Many have a healthy level of skepticism about AI/ML, not wanting to be told what to think or even how to think, while others embrace it for seeming to possess knowledge beyond their own.

What is essential to remember is that with tools like ChatGPT or Grok, the answers provided carry no guarantee of truth. Something may sound accurate and well-informed, but those "answers" cannot be generated without drawing on the previously built, greater data pool of information.

The important question when using such tools is how those data pools are built. Who decides what information goes into these training sets? What information is selected and fed into different data pools? Even more importantly, we should ask whether that information is added with any bias - political, religious, or otherwise.

The mere fact that chatbots and programs can be prompted to generate responses in various tones - using language that engages certain audiences over others - tells us that earlier programming taught the algorithm to recognize such patterns in the data pool as it was being built.

Recent conversations are now turning to how data pools are being "poisoned" - by people testing the machines, and by content creators fighting back against indiscriminate data scraping. (Remember the outcry when Elon Musk shut down free API access at then-Twitter?) So we must be all the more vigilant against blithely trusting the data pool fed to an AI, because what common sense or the human eye would reject as inaccurate may be compromised data that an algorithm accepts as truth.

Thus, when weighing the responses programs like Grok generate, it is essential to remember that such AI is only as intelligent as its data pool is programmed and built to be. The question we really need to be asking is: who are the data pool builders, and does their bias compromise the Intelligence - which, at the end of the day, is still Artificial?

