Shining a Light on AI: Demystifying Neural Networks and Language Models for Educators

Ok, you're using ChatGPT, Gemini, Claude, etc., but do you really know how they work or what they can do?

In recent times, a growing number of individuals are harnessing the power of advanced language models and AI assistants such as ChatGPT, Gemini, and Claude in their daily activities, including teaching. However, a significant challenge arises from the lack of understanding regarding the inner workings, training processes, and capabilities of these systems. Without a solid grasp of the fundamentals, users often struggle to effectively utilize these tools, leading to wasted time and missed opportunities.

Using ChatGPT without knowing what's behind it can be really frustrating

To bridge this knowledge gap, it is crucial to gain a basic understanding of the underlying principles that drive these AI systems, even if not at a deep technical level. By doing so, users can better comprehend how these tools function and unlock their true potential.

It is essential to recognize that while these AI assistants possess intelligence, it is distinctly different from human intelligence. Their capabilities surpass ours in certain areas while falling short in others. Personifying these tools excessively can lead to the misconception that they possess the same characteristics and possibilities as humans, which is not the case.

Many say these systems are not really intelligent. At the very least, consider them a new kind of intelligence, different from ours: better in many cases, worse in many others.

At the heart of the current AI boom lies the cornerstone technology of artificial neural networks. Drawing inspiration from the biological neural networks in our brains, where billions of interconnected neurons work in parallel to enable incredible cognitive abilities, artificial neural networks aim to replicate this distributed computing power.

Current artificial neural networks are bio-inspired, but only in the sense that they build a powerful processing system as a huge connected network of simple computational units, the neurons, which in our case are implemented as mathematical functions.

The perceptron, introduced in the mid-20th century, serves as the fundamental building block of artificial neural networks. It represents an attempt to approximate the function of a biological neuron, receiving numerical inputs, performing a simple mathematical calculation, and producing a numerical output. These perceptrons are then organized into dense networks, allowing information to flow from one layer to another, ultimately yielding a result.

However, creating an architecture similar to biological neural networks is not sufficient for artificial neural networks to function effectively. Just as our brains undergo a lifelong training process, artificial neural networks must also be trained using mathematical techniques. This training enables them to learn and adapt based on the numerical information that flows through the network.

Machine learning is the approach that makes these systems learn what they will be able to solve. They learn from data or experience, in a way quite similar to our own minds.

By understanding these fundamental concepts, users can better harness the power of AI language models and assistants. Recognizing their unique capabilities and limitations allows for more efficient and effective utilization, unlocking the true potential of these groundbreaking tools in various domains, including education.

To recap: understanding artificial neural networks is not overly complicated. Each neuron can be thought of as a simple mathematical function that takes numerical inputs and produces a numerical output. The key lies in creating sophisticated architectures using these small units of computation, simulating them on available hardware such as CPUs or, more optimally, GPUs. It's crucial to remember that all the information circulating within artificial neural networks is numerical.

An artificial neuron is just a simple mathematical function. It receives a collection of numerical inputs; each value is multiplied by a number called a 'parameter' or 'weight' that determines how important that specific input is for the neuron. All the input-times-weight products are added together, a function is applied to the sum, and the resulting single value becomes the input of many other neurons in the following layers of the network.
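For readers who like to see this concretely, here is a minimal sketch of one such neuron in Python. The sigmoid activation and the specific numbers are illustrative assumptions, not the configuration of any real model.

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum: each input is multiplied by its weight, i.e. its "importance"
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # An activation function (here a sigmoid, as an example) turns the sum into one output value
    return 1 / (1 + math.exp(-total))

# Three numerical inputs flowing into one neuron
output = neuron(inputs=[0.5, -1.2, 3.0], weights=[0.8, 0.1, -0.4], bias=0.2)
print(output)  # a single number, which would feed the neurons of the next layer
```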

However, defining an appropriate architecture is not enough; training the neural network is equally important. Without training, it would be like having a blank mind or brain. To train the network, we need data to feed it, following different approaches depending on the desired outcome, such as discrimination, prediction, or generating new information.

When an artificial neural network is trained, it's essential to note that the training data itself is not stored within the network. Instead, the information is transformed numerically and injected into the input layer neurons, cascading through the network and being transformed by each artificial neuron according to the architecture used.

Training relies on crucial elements called parameters or weights attached to the artificial neurons. These weights are numerical values, initialized randomly when the neural network is created. The input values to an artificial neuron are multiplied by these initially random weights, and the sum of these products is processed by the neuron's mathematical function. The random initialization of the weights is the parallel to starting with a blank mind.

The training of the neural network involves using mathematical techniques, such as gradient descent and the backpropagation algorithm, to adjust these weights. As the network trains, the weights change from random values to values that optimize the network's performance. These adjusted weights essentially modulate the importance of different input values to each artificial neuron.

The neural network learns by being fed thousands, millions, even more examples. Each example produces an output, an error is calculated, and mathematics uses this error to modify the parameters or weights of the network; that modification is the learning process.
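The whole loop fits in a few lines of toy code. The sketch below assumes a single linear neuron, three invented examples and plain squared error; real networks apply the same idea at enormous scale through backpropagation across many layers.

```python
import random

# Hypothetical training examples: inputs and the output we want the neuron to produce
examples = [([1.0, 2.0], 5.0), ([2.0, 1.0], 4.0), ([3.0, 3.0], 9.0)]

weights = [random.random(), random.random()]  # random start: the "blank mind"
learning_rate = 0.01

for epoch in range(1000):
    for inputs, target in examples:
        prediction = sum(x * w for x, w in zip(inputs, weights))
        error = prediction - target  # how wrong the network currently is
        # Gradient descent: nudge each weight a little to reduce the error
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]

print(weights)  # after training, the weights are what the network has "learned"
```

Run it a few times: the weights start out different every time because of the random initialization, but they always settle on values that reproduce the examples well.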

The consequence of this process is that although the final implementation of artificial neural networks is complex, even from a mathematical standpoint, there is a clear parallel with our own biological neural networks. The knowledge acquired by the neural network is summarized by the set of its parameters: the values that multiply the input values of each artificial neuron. These values start randomly but eventually adopt specific values that yield the best results for each neuron, enabling the entire artificial neural network to perform its task effectively.

It's crucial to understand that artificial neural networks only store these parameters, these numerical values, and not the data used for training. Language models like ChatGPT do not store text, image-generating models do not store images, and music-generating models do not store music. Through this parallel implementation of artificial neural networks with biological ones, we create a sophisticated system capable of learning hidden and complex structures from the training information without storing the training information itself.

Open your head and inspect your own brain: you won't find text, images, videos, etc. Exactly the same happens with an artificial neural network: you won't find text, images, videos or any other training element, just parameters, numerical values, that synthesize the knowledge gained during training. So don't worry: there is no copy and paste, no plagiarism, even when copyrighted information is used. Artificial neural networks do something really similar to what we do when we listen, see, and so on. They acquire knowledge and patterns from which new elements can be created, in quite a similar way to ourselves.

With these complex neural structures, we can make predictions, solve problems in vision and recognition, and even generate new information that follows the patterns and structures of the training data. Language models, such as ChatGPT, can generate remarkably coherent and human-like text because they have been trained on vast amounts of text using sophisticated artificial neural networks, such as transformers. Similarly, neural networks trained on millions of images with text descriptions can relate text to shapes, colors, actions, or any other element present in the image, much like our own brain does, albeit with significant differences.

Understanding the fundamentals of artificial neural networks is essential for effectively harnessing the power of AI language models and assistants. By recognizing the numerical nature of the information processed, the role of weights and parameters, and the training process, users can better comprehend how these tools function and unlock their true potential.

ChatGPT, Claude, Gemini, and others are LLMs, Large Language Models. It is important to understand that they are something new. They are not databases, they don't store text, and they have very limited reasoning and planning skills, yet they have some incredible features that surpass our own human abilities.

To further understand the workings of large language models like ChatGPT, it's essential to recognize that they are built upon artificial neural networks. As previously discussed, these neural networks must be trained, starting from a blank slate with no inherent knowledge. The core objective of a language model is to predict the most probable words that follow a given set of words, a task that may seem simple but is, in fact, incredibly complex.

Yes, basically, the main purpose of ChatGPT as an LLM is to predict the most likely words that follow a given text (the prompt)
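A toy way to picture this objective, with an invented probability table standing in for a real trained network:

```python
# Hand-made probabilities for the word following a prompt (the numbers are invented)
next_word_probs = {
    "the sky is": {"blue": 0.62, "clear": 0.21, "falling": 0.04, "green": 0.01},
}

def predict_next(prompt):
    probs = next_word_probs[prompt]
    # An LLM does essentially this, but over tens of thousands of possible tokens,
    # with the probabilities computed by a neural network instead of a lookup table
    return max(probs, key=probs.get)

print(predict_next("the sky is"))  # -> "blue"
```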

Even for humans, accurately predicting the next words in a sequence is challenging, as there are numerous variables and contextual factors that influence the outcome. There is no single, definitive sequence of words that can follow an initial phrase, as it depends on the individual and the context. However, this is precisely the goal we aim to achieve with a language model.

To train a neural network to accomplish this task in a manner that aligns with human expectations, we employ a technique called self-supervised learning. In this approach, we feed the neural network a significant portion of text from the internet, which represents a substantial sample of human knowledge across various languages, topics, and domains. While this information may vary in accuracy, bias, and veracity, it is currently the most comprehensive representation of human knowledge available.

LLMs such as ChatGPT have been trained, in a first stage, with a method called self-supervised learning, processing a significant textual portion of the internet.

The training process involves taking the first set of words or tokens (smaller units than words) from this massive corpus of text and asking the neural network to predict the most likely words that follow. Initially, the neural network's predictions will be random, as its parameters are initialized randomly. However, we compare the network's output with the actual words that follow in the original text and calculate the difference or error. Using mathematical techniques, we adjust the network's parameters to maximize the probability of generating responses that align more closely with the training text.
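Here is a conceptual sketch of one such training step, with invented probabilities standing in for the network's output; the "error" is simply how surprised the model is by the word that actually follows in the corpus.

```python
import math

text = ["who", "discovered", "america"]

# Hypothetical probabilities an untrained network assigns to candidate
# words following "who discovered"
predicted = {"america": 0.05, "gravity": 0.40, "penicillin": 0.55}

actual_next_word = text[2]
loss = -math.log(predicted[actual_next_word])  # cross-entropy: large when the model is wrong
print(loss)
# Backpropagation would now adjust the weights so that, next time,
# "america" receives a higher probability after "who discovered"
```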

This process is repeated iteratively, moving through the entire corpus of text. As the neural network encounters similar phrases and patterns related to grammar, knowledge, and information, it gradually learns to make predictions that are more aligned with what a human would expect. The network begins to acquire a parametric knowledge that exhibits a statistical or stochastic behavior.

It's crucial to understand that the neural network does not store the entire training corpus within its parameters. Instead, it synthesizes and compresses the essential patterns and information into a much smaller representation. The knowledge stored in the neural network is parametric, meaning that its ability to provide correct answers depends on how frequently the information appears in the training corpus.

LLMs are not databases. They synthesize and compress the input information, keeping many of the most important patterns of the text on the internet, such as grammar, syntax, text coherence, and also knowledge. But this knowledge has a stochastic nature: the more often something appears in the corpus (the internet text used to train the LLM), the more likely the answer will be correct, and vice versa.

For example, if asked who discovered America, the neural network is likely to provide the correct answer, as this information appears numerous times in the training data. However, for very specific or obscure questions that are rarely mentioned, the probability of receiving a correct response is much lower. This parametric knowledge of language models closely resembles our own long-term memory, where our ability to recall information depends on how frequently we have studied or encountered the concept.
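A toy illustration of that frequency effect, using a made-up count of how often each continuation appears in a training corpus:

```python
from collections import Counter

# Invented corpus statistics for the continuation of "America was discovered by ..."
continuations = ["Columbus"] * 950 + ["the Vikings"] * 40 + ["someone else"] * 10
counts = Counter(continuations)
total = sum(counts.values())

print({word: round(count / total, 3) for word, count in counts.items()})
# "Columbus" dominates, so the model will almost always answer correctly;
# a fact seen only a handful of times would get a much flatter, riskier distribution
```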

Think of the knowledge stored in an LLM such as ChatGPT as something like your own long-term memory.

Language models like ChatGPT are not databases in the traditional sense. They are more akin to a statistical database, similar to our long-term memory. Understanding this distinction is crucial for effectively utilizing these tools. While they may not serve as a reliable source for highly specific or obscure information, their true power lies in their ability to learn and apply patterns of knowledge, language structure, grammar, and other linguistic characteristics.

Limited in some areas, but with incredible emergent abilities. LLMs are really good at translating, summarizing, transforming information, adapting content, extracting insights from unstructured input text, sentiment analysis, few-shot learning, and much more.

These models excel at transforming and adapting information in natural language, enabling them to summarize, synthesize, and extract relevant details. They can translate between languages, adjust their language to suit different age groups or proficiency levels, and perform a wide range of impressive linguistic tasks. The real value of language models lies in their ability to assist us in transforming and processing information rather than serving as a standalone database.

Can we minimize hallucinations? Can we combine LLMs with reliable sources of information and get the best of both worlds? Yes, we can.

However, this does not mean that we cannot use these systems to obtain reliable answers to questions. By combining the parametric knowledge of the neural network with access to trustworthy information sources, we can enhance the accuracy and reliability of the responses. Many current language models (or rather, LLMs with the right complements) perform preliminary searches on the internet to gather reliable sources when answering a question. This allows them to provide answers that are not based solely on the network's parametric knowledge but are also supported by credible references.

Users can also supply these systems with solid information or documents, ensuring that the responses are grounded in reliable data. The language model's ability to transform and adapt this information to the user's specific needs and level of understanding makes these tools exceptionally powerful and versatile.
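A minimal sketch of this grounding idea, often called retrieval-augmented generation. The documents, the prompt wording and the helper function below are hypothetical, just to show the pattern; the resulting prompt would then be sent to whichever LLM you use.

```python
# Hypothetical local documents the user trusts
documents = {
    "school_policy.txt": "Homework may not exceed 30 minutes per subject per day.",
    "syllabus.txt": "Unit 3 covers photosynthesis and cellular respiration.",
}

def build_grounded_prompt(question, docs):
    # Pack the trusted sources into the prompt and instruct the model to stick to them
    context = "\n".join(f"- {name}: {text}" for name, text in docs.items())
    return (
        "Answer the question using ONLY the sources below. "
        "If the answer is not in the sources, say so.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

print(build_grounded_prompt("How long can homework be?", documents))
```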

A classic error: not knowing what an LLM such as ChatGPT can or cannot solve from a prompt. Well, it's not so difficult. Current LLMs implemented with transformers (this will surely change) generate words at the same pace. It doesn't matter whether the prompt is simple to solve or not: the words (tokens) are generated with the same computational effort. So forget about getting good answers, or even answers at all, to complex questions where you, as a human, would need to carry out an intensive reasoning process.

Despite their impressive capabilities, it's important to recognize the limitations of current language models. The transformer architectures that underpin these models invest the same amount of computation time in generating each word of the response, regardless of the complexity of the question or prompt. In contrast, humans often require varying amounts of time to respond to questions, depending on the level of reasoning and planning involved.

Many questions demand an extensive process of reasoning, planning, and even collaboration among large teams over extended periods to arrive at an answer. Current language models are limited in their ability to tackle problems that require such intensive reasoning and planning. While they have learned numerous problem-solving patterns and can handle small-scale reasoning tasks, they are still far from being able to address the full spectrum of questions that necessitate advanced cognitive capabilities.

This limitation is currently being addressed through the development of intelligent agents, which aim to scale up the ability of these systems to provide responses to questions that demand extensive reasoning, planning, and information integration. Various approaches and techniques are being explored to tackle this challenge, and it is undoubtedly an area where language models will continue to evolve and improve in the near future.

We already have interesting LLM-based intelligent agents to work with, able to push past current limits in solving complex goals and tasks. The near future brings a new generation of tools that combine the strengths of transformers and similar architectures with other tools and the latest advances in reinforcement learning for complex problem solving. This has just started; the road towards Artificial General Intelligence is long, but it is at least promising.

In conclusion, understanding the inner workings, capabilities, and limitations of large language models like ChatGPT is essential for effectively harnessing their potential. By recognizing their parametric knowledge, statistical nature, and ability to transform and adapt information, users can leverage these tools to enhance their own knowledge and problem-solving abilities.

As research in this field progresses, we can expect language models to become increasingly sophisticated, capable of tackling ever more complex tasks and questions. However, it is crucial to approach these systems with a clear understanding of their strengths and weaknesses, using them as powerful assistants rather than infallible sources of knowledge. By doing so, we can unlock the true potential of these remarkable AI tools and drive innovation and progress in various domains.
