Learning how AI “Learns”
Image Generated using Microsoft Designer


Original Post on Medium: Learning how AI “Learns”. Delving into the various modalities of… | by Sam Bobo | Sep, 2024 | Medium

The podcast above was generated by AI using Google NotebookLM

One of the most influential ancillary experiences in college was obtaining a minor in Italian Studies, not specifically for the language acquisition but for the opportunity to connect with professors deeply in tune with the study of learning, specifically, how humans learn. My interest in human learning grew stronger as I served as a teaching assistant for a course entitled "How We Learn," delving into the Reggio Emilia approach to childhood education, which highlights the criticality of one's environment as an ambient teacher. As an aside, I co-authored an article with Nick Potkalitsky imagining the evolution of a Montessori middle school education augmented with AI (it's worth the read!). Fast forward to my career centering around Artificial Intelligence: that theme of learning and pedagogy still remains, but it has shifted away from humans and toward Artificial Intelligence systems.

Early in my career, and even to the present day, entrepreneurs, executives, product leaders, and everyday people still claim that AI is "magic" and a black box. Yes, the mystique behind AI fascinates many and invokes fear in others, but simply put... AI is probability and statistics! AI systems are merely complex models that "learn" from training data or the environment around them. As a practitioner, the fascinating part for me is unveiling how closely humans have programmed AI to mimic the human brain, simultaneously seeking to understand our own bodies while imbuing computers with the same capabilities. A recent "Practical AI" podcast sparked the idea for this post: to distill in simple terms how AI models are trained. In fact, I had written about the topic in December 2023 in a piece entitled "The Weight of Complexity in Decision Making," which this article seeks to build upon. In that blog, I commented:

There is immense complexity to AI that is still unknown, such as how the models create internal weights within a neural network and what those weights mean (as opposed to the confidence score output of a Conversational Intelligence engine), but at least we can appreciate the intricate balance involved in creating AI engines, AI models, and AI-powered applications using those models.

Yes, the weight creation across billions of parameters might be immensely difficult to decode; however, Anthropic recently invested in research and development to unpack Claude and glean insight into how model weights are formed. It's worthwhile to work our way across the eras of AI to build a foundation on how models are constructed, so we gain a better appreciation for AI models today, how they function, their limitations, and, most importantly, better comprehend the methods for training and tuning these models.

Supervised Machine Learning

During the Intent and Conversational Intelligence eras, supervised machine learning was the primary modality for training AI systems. Supervised machine learning entails building an annotated corpus of knowledge and feeding it into a neural network to build the model from which to infer. With supervised machine learning, humans effectively teach the AI engine with hundreds or thousands of examples of correct and incorrect answers. As a very simplistic example, to train an image recognition system to recognize breast cancer, one would provide a plethora of examples specifically pointing out areas of cancerous growth and a similar number without. Partitioned across a training set and a test set, the model learns from the former and applies its knowledge against the latter to determine its accuracy. This methodology can be applied across voice-related models, whereby audio samples and orthographic (transcribed) scripts build a corpus, or utterances form a natural language classification model.
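To make this concrete, below is a minimal sketch of the supervised workflow described above: labeled examples are split into a training set and a test set, a model is fit, and accuracy is measured on held-out data. It uses scikit-learn's built-in breast cancer dataset purely for illustration; the dataset choice, classifier, and split ratio are my assumptions, not details from the original post.

```python
# Minimal supervised learning sketch (assumption: scikit-learn is available).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Annotated corpus: feature vectors X with human-provided labels y (malignant/benign).
X, y = load_breast_cancer(return_X_y=True)

# Partition into a training set (to learn from) and a test set (to evaluate on).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the model on labeled examples, then check accuracy on unseen data.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```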

Fast-forwarding to Large Language Models, the concept of Reinforcement Learning from Human Feedback (RLHF) builds on this supervised foundation, whereby humans help annotate and/or correct the output of LLMs for continuous learning.

Reinforcement Learning

Leveling up from annotated data, another modality of learning for AI systems comes in the form of an optimization function, called reinforcement learning. Reinforcement learning defines a series of reward and punishment functions that compute a final score. The AI model is programmed to observe its environment and make decisions autonomously, simply to optimize that score. One classic example is an "AI" that learned to play Super Mario Brothers (image below from this link). In that experiment, the AI system sought to maximize distance across the map, where the furthest point was the flag that completes the level.

Played hundreds of thousands of times, with each iteration a modified version of the neural net from the last, the model eventually "learned" how to play and optimized for the final score.
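As an illustration of that reward-driven loop, here is a minimal tabular Q-learning sketch on a toy corridor where the agent earns a reward only by reaching the goal at the far end. The environment, reward values, and hyperparameters are my own assumptions for demonstration, not the setup from the Mario experiment.

```python
# Minimal tabular Q-learning sketch on a toy corridor (all values are illustrative assumptions).
import random

N_STATES = 10        # positions 0..9; the "flag" is at position 9
ACTIONS = [-1, +1]   # step left or step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q-table: estimated future reward for each (state, action) pair.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(2000):
    state = 0
    while state != N_STATES - 1:
        # Epsilon-greedy: mostly exploit the best known action, occasionally explore.
        a = random.randrange(2) if random.random() < EPSILON else max((0, 1), key=lambda i: Q[state][i])
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        # Reward function: +1 for reaching the goal, a small penalty for every other step.
        reward = 1.0 if next_state == N_STATES - 1 else -0.01
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        Q[state][a] += ALPHA * (reward + GAMMA * max(Q[next_state]) - Q[state][a])
        state = next_state

print("Learned policy:", ["right" if Q[s][1] >= Q[s][0] else "left" for s in range(N_STATES - 1)])
```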

Much like humans learning a complex task, we too optimize for a reward and adjust our approach to achieve it. Take an athlete, for example, swimming the 200-meter butterfly. The optimization function here is minimizing time, while penalty functions include false starts that nullify the time. The swimmer builds muscle, perfects the technique of the stroke, optimizes breaths taken, and even the time spent turning on the wall. Over time, that athlete optimizes for time and sets world records at the Olympics.

Masked Language Modeling

Large Language Models have proven to be complex entities to disambiguate and reverse engineer, given the sheer volume of parameters (in the billions). Yet one of the methods used to train and adapt these models is Masked Language Modeling ("MLM"). In MLM, the model is given input text with portions redacted; its goal is to "guess" the masked tokens and recreate the original text. Performed across larger and larger redacted portions of the text, the model eventually learns to reconstruct it.
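To see masked prediction in action, the snippet below uses Hugging Face's fill-mask pipeline with a BERT-style model to guess a redacted word from its context. The specific model name and sentence are my assumptions for illustration, not details from the post.

```python
# Masked language modeling demo (assumption: the transformers library and this model are available).
from transformers import pipeline

# A BERT-style model trained with the masked-token objective.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The model "guesses" the redacted token from the surrounding context.
for prediction in unmasker("Artificial intelligence is largely probability and [MASK]."):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")
```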

Drawing parallels to human learning, this resembles memorization. Imagine learning from flashcards or fill-in-the-blank quizzes: over time, one learns to predict the correct answers and/or memorize large portions of information. While humans cannot scale memorization the way computers can, it does mimic a method of learning we employ for specific tasks.

Generative Adversarial Networks (GANs)

GANs are a unique method of learning that powers highly complex workloads including image generation, video generation, music composition, and even drug discovery. GANs comprise two neural networks, a generator and a discriminator. The generator creates fake data that resembles the training data, and the discriminator tries to distinguish between real and fake. The two play a zero-sum game across millions of iterations until the generator produces increasingly realistic data, and thus higher-quality generations.
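Below is a minimal generator-versus-discriminator training loop on a toy one-dimensional Gaussian, just to show the adversarial structure described above. The network sizes, data distribution, and hyperparameters are illustrative assumptions and far simpler than anything used for image or video generation.

```python
# Minimal GAN sketch on toy 1-D data (assumption: PyTorch is available; all sizes are illustrative).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Generator: noise -> fake sample. Discriminator: sample -> probability it is real.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(3000):
    real = torch.randn(64, 1) * 0.5 + 3.0   # "training data": a Gaussian centered at 3
    noise = torch.randn(64, 8)
    fake = G(noise)

    # Discriminator tries to label real data 1 and generated data 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator tries to fool the discriminator into outputting 1 for its fakes.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()

print("Generated mean (target 3.0):", G(torch.randn(1000, 8)).mean().item())
```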

Teachers regularly employ a similar approach when creating multiple choice questions. The false answers on a multiple choice exam carry hidden pedagogical clues: the nuances of the distractor a student selects inform the teacher of the mistake the student made. Furthermore, study buddies bantering back and forth can create a pseudo-GAN, solidifying knowledge and/or explaining complex topics, identifying what might be wrong with a particular statement, and building critical thinking skills.

Transfer Learning

One final method focuses on the concept of transfer learning. In this paradigm, one machine learning model inherits the weights and vectorized representations of topics created in a larger foundation model. In effect, the larger model "teaches" the smaller model as a generalist on a particular subject, and the smaller model then develops further expertise as a subject matter expert. I relate this process to the modern school system:
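As a sketch of what transfer learning looks like in code, the snippet below loads a pretrained image model, freezes its inherited weights, and replaces only the final classification head so it can be trained on a new, narrower task. The choice of torchvision's ResNet-18, the weights argument, and the two-class head are my assumptions for illustration.

```python
# Transfer learning sketch (assumptions: PyTorch/torchvision installed; weights string per recent torchvision versions).
import torch.nn as nn
from torchvision import models

# Start from a model pretrained on a large, general dataset (ImageNet).
backbone = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the inherited weights so the general knowledge is preserved.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final layer with a new head for the narrower task (e.g., 2 classes).
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

# Only the new head's parameters will be updated during fine-tuning.
trainable = [name for name, p in backbone.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']
```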

In Machine Learning, this concept of transfer learning is taken a step further in a way that parallels our school system as well as children's books: distillation. Take the classic example of professors at universities who are experts in their fields, focusing on a specific niche for ongoing research. Professors, in addition to research, teach university courses to college students to build their knowledge of a particular subject akin to their work (or broadly, for a 100-level introductory class). An expert, the professor, transfers learning to the students via distillation within the course curriculum, as they know the information best and are skilled enough in their craft to design a curriculum that transfers that learning in the most efficient manner possible.
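In model terms, distillation commonly trains a small student to match a large teacher's softened output distribution while still learning the true labels. The sketch below shows that combined loss; the toy dimensions, temperature, and loss weighting are assumptions of mine, not values from the post.

```python
# Knowledge distillation loss sketch (assumption: PyTorch available; dimensions and temperature are illustrative).
import torch
import torch.nn.functional as F

temperature = 2.0
batch, num_classes = 4, 10
teacher_logits = torch.randn(batch, num_classes)   # frozen large "professor" model outputs
student_logits = torch.randn(batch, num_classes, requires_grad=True)
labels = torch.randint(0, num_classes, (batch,))

# The student mimics the teacher's softened probabilities (the "curriculum")...
soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
distill_loss = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                        soft_targets, reduction="batchmean") * temperature ** 2
# ...while still learning the ground-truth labels directly.
hard_loss = F.cross_entropy(student_logits, labels)
loss = 0.5 * distill_loss + 0.5 * hard_loss
loss.backward()
print("Combined distillation loss:", loss.item())
```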

So why examine how AI models "learn"? First and foremost, building a solid foundation for how models learn can help demystify the negative aura around AI and reinforce that it's simply probability and statistics. Second, the theme of human mimicry continues to persist. For example, I wrote about various prompting techniques back in November 2023 and claimed:

What has become apparent in reading through the scientific papers and abstracts is that many of the prompting techniques mimic logical reasoning humans undertake, just instead of learning them in school (say… to solve word problems in math), scientists are applying them against large language models.

Now, nearly a year later, OpenAI has debuted a new model called o1, dubbed "Strawberry," which leverages chain-of-thought reasoning as part of its reinforcement learning. This helps the model break a problem down into logical parts, solve each partition as input into the next, and then formulate a response, mimicking a form of human logic. These types of models will be of massive benefit to society, helping streamline complex mathematical tasks and enabling further advancements in the sciences. This might even inch us closer to Explainable AI and peering further into the "black box" that is the algorithmic progression across a model to form a final output.
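For a sense of what chain-of-thought prompting looks like in practice, here is a prompt sketch that asks a model to reason in explicit steps before answering; the wording and the word problem are my own illustrative example, not OpenAI's.

```python
# Chain-of-thought prompt sketch (the problem and wording are illustrative assumptions).
cot_prompt = """
Solve the problem step by step before giving the final answer.

Problem: A train travels 60 miles in the first hour and 45 miles in the second hour.
How many miles did it travel in total?

Step 1: Identify the distance covered each hour.
Step 2: Add the distances together.
Final answer:
"""
# This string would be sent to an LLM; the explicit steps nudge the model
# to decompose the problem rather than answer in one leap.
print(cot_prompt)
```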


Created using Napkin.AI

We as humans simply operate in the modalities we are accustomed to, which natively introduces biases, most of which we seek to mitigate. Our foray into Artificial Intelligence is teaching us that we remain committed to the learning and teaching modalities we are accustomed to, but it is also unlocking other ways in which we as humans might learn. Artificial Intelligence systems are not humans, nor can they replace them; rather, they can operate at scale with immense "knowledge." It is fascinating that we take inspiration from our intrinsic self-curiosity, but doing so also uncovers pitfalls we might run into. Nonetheless, having a background in how humans and machines learn can help explain the modern-day landscape of AI and how to operate with this technology.
