Overview of Artificial Intelligence and Neural Networks
JULIA CHAMBERS
Marketing & Operations Specialist at Willis Allen Real Estate | Forbes Global Properties Associate
Artificial Intelligence starts with a concept called reinforcement learning, a type of machine learning where an agent learns to make decisions by interacting with its environment. It refines its actions based on feedback in the form of rewards or penalties, aiming to maximize its total reward over time. This cycle of taking actions continues, and the agent gradually learns which actions lead to a reward and which don't. The key idea of reinforcement learning is that the agent is not pre-programmed; its behavior emerges from trial and error.
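To make that loop concrete, here is a minimal sketch in Python. The ToyEnv corridor, its rewards, and the random policy are illustrative assumptions for this example, not part of any real library:

```python
import random

class ToyEnv:
    """Hypothetical 1-D corridor: the agent starts at position 0, the goal is at 5."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                 # action is +1 (right) or -1 (left)
        self.pos = max(0, self.pos + action)
        done = self.pos == 5
        reward = 1.0 if done else -0.1      # reward at the goal, small penalty otherwise
        return self.pos, reward, done

def random_policy(state):
    return random.choice([-1, +1])          # nothing pre-programmed: pure trial and error

env, total, done = ToyEnv(), 0.0, False
state = env.reset()
while not done:
    action = random_policy(state)           # the agent acts...
    state, reward, done = env.step(action)  # ...and the environment responds with feedback
    total += reward                         # rewards and penalties accumulate over time
print("total reward:", total)
```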
There are two broad categories of search: deterministic and non-deterministic. In a deterministic search, if an agent takes an action such as deciding to go up, there is a 100% chance that it will go up. In a non-deterministic search, the same decision can lead to other paths, which adds randomness to the outcome of the action taken.
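A quick illustrative sketch of the difference; the 20% "slip" probability is an assumption made for this example:

```python
import random

def deterministic_step(action):
    return action                            # the requested move always happens

def stochastic_step(action, slip=0.2):
    # With probability `slip`, the environment sends the agent sideways instead.
    if random.random() < slip:
        return random.choice(["left", "right"])
    return action

print(deterministic_step("up"))   # always "up"
print(stochastic_step("up"))      # usually "up", occasionally "left" or "right"
```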
The Markov Process is a model that describes a system evolving over time according to the Markov property. In this model, the conditional probability distribution of future states depends only on the current state, not on the sequence of events that preceded it. The system evolves without any decisions from the agent, as the transitions are entirely probabilistic and follow the Markov property. On the other hand, the Markov Decision Process (MDP) provides a mathematical framework for modeling decision-making in situations where outcomes are partially random and partially controlled by the decision maker. It also relies solely on the current state, meaning it has no memory. The key difference between the two is that the MDP involves actions taken by the agent, which influence state transitions and rewards, with the goal of maximizing cumulative rewards over time.
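One way to picture an MDP is to write its transition model out explicitly. The two-state example below is entirely made up, but it shows the key ingredients: for each (state, action) pair, a list of possible next states with their probabilities and rewards, depending only on the current state.

```python
# Hypothetical two-state MDP: (state, action) -> [(next_state, probability, reward), ...]
mdp = {
    ("A", "stay"): [("A", 1.0, 0.0)],
    ("A", "go"):   [("B", 0.8, 1.0), ("A", 0.2, 0.0)],   # partially random outcome
    ("B", "stay"): [("B", 1.0, 0.5)],
    ("B", "go"):   [("A", 1.0, 0.0)],
}
```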
The Bellman equation helps determine the optimal value function that an agent should use to make the best decisions over time, given the potential rewards and penalties. The equation is crucial for finding the optimal strategy in reinforcement learning by recursively calculating the expected reward of different actions. It helps the agent determine which actions will lead to the most favorable outcomes over time.
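In symbols, the Bellman equation for the optimal value function is V(s) = max over a of [ sum over s' of P(s'|s,a) · (R + γ·V(s')) ]. The sketch below applies that backup repeatedly (value iteration) to the toy MDP format shown above; the transition numbers and the 0.9 discount factor are illustrative assumptions:

```python
GAMMA = 0.9   # discount factor: future rewards count a little less than immediate ones

# Same hypothetical format as above: (state, action) -> [(next_state, probability, reward), ...]
mdp = {
    ("A", "stay"): [("A", 1.0, 0.0)],
    ("A", "go"):   [("B", 0.8, 1.0), ("A", 0.2, 0.0)],
    ("B", "stay"): [("B", 1.0, 0.5)],
    ("B", "go"):   [("A", 1.0, 0.0)],
}
states, actions = {"A", "B"}, {"stay", "go"}

V = {s: 0.0 for s in states}
for _ in range(100):   # repeat the Bellman backup until the values settle
    V = {
        s: max(sum(p * (r + GAMMA * V[s2]) for s2, p, r in mdp[(s, a)]) for a in actions)
        for s in states
    }
print(V)   # V[s]: expected cumulative reward when acting optimally from state s
```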
Sometimes a "living penalty" is introduced, where the agent receives a small negative reward for every action it takes. This encourages the agent to take the quickest route to its goal, since the more actions it takes, the more negative reward it incurs.
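A living penalty is just a detail of the reward function; here is a tiny sketch with an assumed penalty of -0.04 per step:

```python
def reward(state, goal):
    # Hypothetical reward scheme with a living penalty: every step costs a little,
    # so the shortest route to the goal earns the highest total reward.
    return 1.0 if state == goal else -0.04
```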
The Q-learning algorithm is a type of reinforcement learning used to find the optimal action-selection policy for an agent interacting with its environment. The "Q" in the name stands for quality, referring to the quality of a specific action taken in a given state. The goal is for the agent to learn an optimal policy, which is a strategy for selecting actions that maximize cumulative reward over time. Temporal difference is a reinforcement learning method where an agent updates the value of a state or action using an estimate of future rewards. As learning progresses, the temporal difference should get closer and closer to 0; if the environment keeps changing, the temporal difference keeps being recalculated and the values keep adjusting.
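A minimal sketch of the Q-learning update with a tabular Q-function; the learning rate and discount factor are assumed values:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9            # assumed learning rate and discount factor
Q = defaultdict(float)             # Q[(state, action)] -> estimated "quality" of the action

def q_update(state, action, reward, next_state, actions):
    # Temporal-difference target: the reward received now plus the discounted
    # best estimate of what comes next.
    target = reward + GAMMA * max(Q[(next_state, a)] for a in actions)
    td_error = target - Q[(state, action)]     # the "temporal difference"
    Q[(state, action)] += ALPHA * td_error     # nudge the estimate toward the target
    return td_error                            # shrinks toward 0 as learning settles
```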
Deep Q-Learning is another type of algorithm, where a neural network learns to estimate the Q-values by updating its weights. When the temporal difference reaches 0, the agent has learned the environment so well that it can predict which action to take. After calculating the loss at the end of the neural network, it is fed back through the network, and this process repeats so that the agent keeps learning its environment better. At the end of the network, the output layer produces several different Q-values, one for each possible action, which are compared against the Q-targets to compute the loss. Acting is the step where the Q-values pass through a softmax function, which selects an action based on them: the higher the Q-value, the better the action is to take.
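A sketch of that acting step, assuming the network has already produced some Q-values (the numbers here are made up):

```python
import numpy as np

def softmax_action(q_values, temperature=1.0):
    # Turn the network's Q-values into action probabilities:
    # the higher the Q-value, the more likely the action is to be chosen.
    z = np.array(q_values) / temperature
    z -= z.max()                               # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return np.random.choice(len(q_values), p=probs)

print(softmax_action([1.2, 3.5, 0.7]))         # usually picks action 1, the largest Q-value
```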
Experience replay can be implemented, where each experience is stored in the agent's memory. At a certain point, the agent decides it's time to learn and randomly samples experiences from its memory. This helps break the bias that comes from learning on consecutive, correlated experiences. The main advantage of experience replay is that it allows rare, potentially valuable experiences to be stored and revisited. By grouping experiences into batches and sampling them uniformly, the agent can learn more effectively. Ultimately, this method speeds up learning by ensuring the agent can access and learn from rare experiences, reducing bias in its learning process.
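A minimal sketch of an experience-replay memory; the capacity and batch size here are assumptions:

```python
import random
from collections import deque

class ReplayMemory:
    """Store transitions and sample them uniformly at random for learning."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences fall out when full

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform sampling breaks the correlation between consecutive steps and
        # lets rare but valuable experiences be revisited many times.
        return random.sample(list(self.buffer), batch_size)
```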
Deep Convolutional Q-Learning is another type of algorithm that combines Deep Q-Learning with Convolutional Neural Networks (CNNs). CNNs are used to process complex, high-dimensional input data such as images in Q-learning. The input passes through a convolutional layer, then a pooling layer, and finally a flattening layer before being fed into the neural network.
The first step is the convolution step, where we apply multiple filters to the input in order to obtain our first convolutional layer. Each filter produces a feature map that highlights important patterns, such as edges, textures, or shapes. The filter's purpose is to highlight certain features, making them easier for the next layers of the network to learn.
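A naive sketch of a single convolution, using a made-up "vertical edge" filter on a random image:

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the filter over the image and take a weighted sum at every
    # position, producing one feature map per filter.
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(6, 6)
edge_filter = np.array([[1, 0, -1],    # a hypothetical filter that responds
                        [1, 0, -1],    # strongly to vertical edges
                        [1, 0, -1]])
feature_map = convolve2d(image, edge_filter)
print(feature_map.shape)               # (4, 4): the resulting feature map
```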
The next step is the ReLU layer, where we apply the rectifier function because it breaks up linearity. This operation introduces non-linearity into the network, accelerates learning, and makes the model more efficient.
After that we have max pooling, where we reduce the spatial dimensions (height and width) of the input feature map. This results in fewer parameters and computations, making the network more efficient. During this step a filter slides over the input (typically a feature map from a convolutional layer) and, at each location, selects the maximum value from a small rectangular region called the pooling window.
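A sketch of these two steps together, applied to a small made-up feature map:

```python
import numpy as np

def relu(feature_map):
    # Rectifier: negative values become 0, which introduces non-linearity.
    return np.maximum(0, feature_map)

def max_pool(feature_map, size=2):
    # Slide a (size x size) pooling window and keep only the largest value in
    # each region, shrinking the height and width of the feature map.
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = feature_map[i * size:(i + 1) * size,
                                    j * size:(j + 1) * size].max()
    return out

fm = np.random.randn(4, 4)
print(max_pool(relu(fm)))   # a 2x2 map: smaller, with all negatives zeroed out
```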
The last step is the fully connected layer, where every neuron in the previous layer is connected to every neuron in the current layer. This allows the model to combine the features extracted by earlier layers and learn more complex patterns. After the fully connected layer, the model undergoes backpropagation, a process where the model adjusts the weights and biases to minimize the error in its output. During backpropagation, the model computes the gradients of the loss function with respect to the weights and updates the weights accordingly, iterating this process until the neural network learns the optimal weights for making accurate predictions.
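A sketch of this last stage: flattening a pooled feature map, passing it through one fully connected layer, and taking a single gradient step against an assumed squared-error loss (the sizes and learning rate are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

pooled = rng.random((2, 2))        # stand-in for the output of the pooling step
x = pooled.flatten()               # flattening layer: 2x2 map -> vector of 4 values

W = rng.random((3, 4)) * 0.1       # 3 output neurons, each connected to all 4 inputs
b = np.zeros(3)
output = W @ x + b                 # fully connected layer output

# One backpropagation step, assuming a squared-error loss against some target:
target = np.array([1.0, 0.0, 0.0])
error = output - target
grad_W = np.outer(error, x)        # gradient of the loss with respect to the weights
W -= 0.01 * grad_W                 # update weights and biases to reduce the error
b -= 0.01 * error
```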
One of the most advanced and up-to-date algorithms is the Asynchronous Advantage Actor-Critic algorithm (A3C). What makes this algorithm different is that at the end of the neural network there are two kinds of outputs: a set of outputs grouped together for the policy (the actor) and another single output for the state value (the critic). Asynchronous means that instead of having one agent in the environment, we have several. All the agents work through the same environment, each gathering its own experience, which lets the algorithm train faster because they all learn from the experiences they share with each other. At the end of the neural network, all the agents contribute to the same critic. Once a pass through the network is completed, the policy loss is calculated and backpropagated through the network so that the expected reward is maximized.
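The sketch below leaves out the asynchronous machinery and only shows the two quantities that the actor (policy head) and critic (value head) contribute; the numbers at the bottom are made up:

```python
import numpy as np

def actor_critic_losses(policy_probs, value_estimate, action, observed_return):
    # Advantage: how much better the outcome was than the critic expected.
    advantage = observed_return - value_estimate

    # Actor (policy) loss: make actions with positive advantage more likely.
    policy_loss = -np.log(policy_probs[action]) * advantage

    # Critic (value) loss: pull the value estimate toward the observed return.
    value_loss = advantage ** 2
    return policy_loss, value_loss

# Hypothetical numbers: three possible actions, the critic predicted 0.5,
# the agent took action 1 and actually collected a return of 2.0.
print(actor_critic_losses(np.array([0.2, 0.5, 0.3]), 0.5, 1, 2.0))
```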
Lastly, we have what's called a Large Language Model (LLM), a type of deep learning algorithm designed to understand, generate, and predict human language. This is the type of model that ChatGPT uses. LLMs are called "large" because they are trained on vast amounts of data and have a large number of parameters. The more parameters a model has, the better it is at capturing complex language patterns, but it also requires more data and computational power.
These are the ingredients of LLMs: vast amounts of training data, a large number of parameters, and a probabilistic way of generating text.
For tasks like text generation, the model uses a probabilistic approach: it predicts the next word or token based on the context of the previous tokens. This can be done using techniques like beam search or sampling, which help generate coherent and contextually appropriate text. LLMs have revolutionized the field of natural language processing, enabling machines to understand and generate human-like language in a variety of applications.
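As a closing illustration, here is a toy sketch of that sampling step with a made-up six-word vocabulary and made-up scores; real LLMs do the same thing over vocabularies of tens of thousands of tokens:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and the model's scores for the next token.
vocab = ["cat", "dog", "sat", "on", "the", "mat"]
logits = np.array([0.2, 0.1, 2.5, 0.3, 0.4, 1.8])

def sample_next(logits, temperature=1.0):
    # Softmax turns the scores into a probability distribution; sampling from it
    # (rather than always taking the top score) keeps the generated text varied.
    z = logits / temperature
    z -= z.max()
    probs = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(logits), p=probs)

print(vocab[sample_next(logits)])   # most often "sat" or "mat", the highest-scoring tokens
```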