Unveiling the Structure of Neural Networks: A Primer on the Basics
Artificial neural networks, often referred to simply as neural networks, are computational models inspired by the intricate structure and functioning of the human brain. They consist of interconnected nodes, or artificial neurons, organized into layers. The fundamental components of a neural network closely resemble the neurons found in the human brain, communicating with each other through connections known as synapses.
The neural network architecture typically comprises three main types of layers:
- The Input layer, which receives the raw data,
- One or more Hidden layers, which transform the data through weighted connections, and
- The Output layer, which produces the final prediction or classification.
Each connection between neurons in adjacent layers is assigned a weight, representing the strength of the connection. Biases are individual terms associated with each neuron, accounting for offsets and giving the network the flexibility to capture shifts in the data.
Each neuron applies an activation function (more on this later in the article) to the weighted sum of its inputs, introducing non-linearity into the model and enabling it to learn intricate patterns.
In essence, the neural network architecture mirrors the interconnectedness of neurons in the human brain, allowing it to process information, learn from data, and make predictions or classifications. As data passes through the network during training, the weights and biases are adjusted based on the error at the output.
In our simplified exploration, we'll focus on the fundamental training techniques that underpin the optimization of neural networks. Central to this process are methods such as backpropagation, which leverages the chain rule, and the ubiquitous gradient descent, which seeks to minimize the value of the loss function. While additional strategies exist for enhancing model performance, our attention in this article is dedicated to a closer examination of the intricacies of gradient descent.
Consider a basic illustration of a neural network featuring a single input neuron and one output neuron, complete with weights and biases to facilitate comprehension.
In this scenario, we designate the:
- Input value as 'x',
- Weight as 'w', and
- Bias as 'b'.
The output, denoted as 'y', is calculated through a simple formula:
y = (x * w) + b, highlighting that the output is the input scaled by the weight and shifted by the bias.
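As a quick sketch in Python (the values here are illustrative, chosen for demonstration), this single-neuron computation looks like:

x = 0.4   # input (illustrative value)
w = 2.0   # weight
b = 1.5   # bias

y = (x * w) + b   # weighted input plus bias
print(y)          # 2.3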
Expanding our example to incorporate a hidden layer with a solitary neuron introduces an additional layer of complexity, involving a fresh set of weights and biases. Now, envision:
- 'x' as the input value,
- 'W_in' as the weight for the connection between the input and hidden layer,
- 'B_in' as the bias for the hidden layer neuron,
- 'W_out' as the weight for the connection between the hidden and output layer, and
- 'B_out' as the bias for the output layer.
The computations unfold as follows:
First, the hidden layer computation is expressed as:

Y_hidden = (x * W_in) + B_in

Subsequently, the output layer computation becomes:

Y_output = (Y_hidden * W_out) + B_out
This depiction elucidates how, in a network with a single hidden layer and one neuron per layer, the neural network generates an output from the given input, integrating weights and biases from both the input-hidden and hidden-output connections.
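A minimal Python sketch of this two-layer forward pass (the values are illustrative, and happen to match the worked example later in the article):

x = 0.4                   # input
W_in, B_in = 2.0, 1.5     # input-to-hidden weight and bias
W_out, B_out = 0.5, 2.0   # hidden-to-output weight and bias

Y_hidden = (x * W_in) + B_in            # hidden layer computation: 2.3
Y_output = (Y_hidden * W_out) + B_out   # output layer computation: 3.15
print(Y_output)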
Examining the outcomes of these functions, it becomes apparent that they yield linear outputs, in stark contrast to the intricate patterns present in real-world data. This simplified neural network, while effective in linear scenarios, lacks the complexity required to handle more nuanced, non-linear data representations.
To classify or make predictions on non-linear data, a non-linear function is needed, and this is where activation functions come in.
The activation functions serve a crucial role in introducing non-linearity to the model. Without activation functions, the network would be limited to linear transformations, making it incapable of capturing complex patterns and relationships in the data. Activation functions enable neural networks to model and learn non-linear mappings, allowing them to approximate more intricate functions and patterns.
So the basic example, with a single neuron in the hidden layer and an activation function applied, becomes:

Y_output = (A_f(Y_hidden) * W_out) + B_out

Here, A_f(Y_hidden) represents an activation function applied to Y_hidden. If an activation function such as the Sigmoid is used, it is given by:

A_f(z) = 1 / (1 + e^(-z))
There are many different activation functions used in neural networks; the choice depends on the data and the problem to be solved. Some common ones are sketched below.
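As a sketch, here are a few of the most frequently used activation functions expressed in Python with NumPy:

import numpy as np

def sigmoid(z):
    # Squashes any input into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def tanh(z):
    # Squashes any input into the range (-1, 1)
    return np.tanh(z)

def relu(z):
    # Zero for negative inputs, identity for positive inputs
    return np.maximum(0, z)

print(sigmoid(0.0), tanh(0.0), relu(-2.0))  # 0.5 0.0 0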
Training
To be able to unleash the full potential of the model, the neural network needs to undergo a crucial phase known as training. This intricate process involves grappling with concepts like loss function, gradient descent, and backpropagation. Let's embark on a concise exploration, delving into the mathematical underpinnings of these fundamental concepts to unravel their significance in the training journey.
Loss Function
A loss function, also known as a cost function or objective function, is a mathematical measure that quantifies the difference between the predicted output of a model and the actual target values in the training dataset. The primary purpose of a loss function is to represent how well or poorly a model is performing on a given task. The goal during the training process is to minimize this loss, as a lower loss indicates better alignment between the model's predictions and the true values.
The choice of a specific loss function depends on the nature of the machine learning task. For regression tasks, where the goal is to predict a continuous value, Mean Squared Error (MSE) is commonly used as the loss function. For classification tasks, where the objective is to assign instances to predefined classes, Cross-Entropy Loss is widely employed. There are various other loss functions tailored to specific tasks, and the selection of the appropriate loss function is a critical aspect of designing an effective machine learning model.
One of the simplest loss functions, the Mean Squared Error (MSE), measures the squared difference between the predicted output of the neural network and the actual target value. For a single sample it is given by:

L = (Y_output - Y_target)^2

where Y_output is the predicted output and Y_target is the actual target value; over a dataset, the loss is averaged across all samples.
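A minimal sketch of the MSE computation in Python (the arrays here are hypothetical values for illustration):

import numpy as np

Y_output = np.array([3.15, 1.8, 2.4])   # predicted outputs (hypothetical)
Y_target = np.array([2.15, 2.0, 2.5])   # actual target values (hypothetical)

mse = np.mean((Y_output - Y_target) ** 2)
print(mse)  # 0.35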
Gradient Descent
Gradient descent is an iterative optimization algorithm used to minimize a cost function or loss function in the context of training machine learning models, including neural networks. The primary objective of gradient descent is to find the minimum of a function by iteratively adjusting the model's parameters (weights and biases) in the direction that reduces the function's value most rapidly.
The core idea behind gradient descent is derived from calculus and involves computing the slope of the loss/cost function with respect to each parameter. The positive gradient points in the direction of the steepest ascent, and the negative gradient points in the direction of the steepest descent.
By moving opposite to the gradient, the algorithm aims to reach the minimum of the cost function.
The update rule for each model parameter (Mp) in gradient descent is given by:

Mp = Mp - learning_rate * (dL/dMp)

where:
- Mp is a model parameter (a weight or a bias),
- learning_rate is the step size of each update, and
- dL/dMp is the gradient of the loss function L with respect to Mp.
The learning rate is a hyperparameter that influences the convergence and stability of the algorithm. If the learning rate is too large, the algorithm may overshoot the minimum, and if it is too small, convergence may be slow.
The algorithm iteratively updates the parameters until convergence, where the model reaches a state where further adjustments do not significantly reduce the cost function.
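To make the update rule concrete, here is a minimal Python sketch of gradient descent minimizing a simple hypothetical loss, L(Mp) = (Mp - 3)^2, whose minimum is at Mp = 3:

Mp = 0.0             # initial parameter value (arbitrary)
learning_rate = 0.1  # step size

for step in range(100):
    grad = 2 * (Mp - 3)             # dL/dMp for L = (Mp - 3)^2
    Mp = Mp - learning_rate * grad  # move opposite to the gradient

print(Mp)  # converges toward 3.0, the minimum of the loss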
To derive the dL/dMp needed to compute the new parameter (Mp) values, backpropagation via the chain rule is used.
Backpropagation involves computing the gradients of the loss function with respect to the weights and biases in the network. The chain rule is fundamental to backpropagation, and the steps include:
- performing a forward pass to compute the network's output and the loss,
- computing the gradient of the loss with respect to the output,
- propagating this gradient backward through the network, layer by layer, to obtain the gradient for each weight and bias, and
- using these gradients to update the parameters.
The chain rule is applied at each step to calculate the gradients efficiently.
Let's see how backpropagation and the chain rule are applied to a simple network.
Let L be the Mean Squared Error (MSE) loss function, L = (Y_output - Y_target)^2. Using the chain rule, the derivative of the loss function with respect to each model parameter is calculated (here for the linear two-layer network above, without an activation function).

Loss with respect to W_out and B_out:

dL/dW_out = dL/dY_output * dY_output/dW_out = 2 * (Y_output - Y_target) * Y_hidden
dL/dB_out = dL/dY_output * dY_output/dB_out = 2 * (Y_output - Y_target)

Loss with respect to W_in and B_in:

dL/dW_in = dL/dY_output * dY_output/dY_hidden * dY_hidden/dW_in = 2 * (Y_output - Y_target) * W_out * x
dL/dB_in = dL/dY_output * dY_output/dY_hidden * dY_hidden/dB_in = 2 * (Y_output - Y_target) * W_out
Using the gradient descent algorithm, the weights and biases are adjusted as:

W_out = W_out - learning_rate * dL/dW_out
B_out = B_out - learning_rate * dL/dB_out
W_in = W_in - learning_rate * dL/dW_in
B_in = B_in - learning_rate * dL/dB_in
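These expressions translate directly into Python; a sketch for the linear two-layer network above (no activation function), with the forward pass and gradients written as functions:

def forward(x, W_in, B_in, W_out, B_out):
    # Forward pass through the two-layer linear network
    Y_hidden = (x * W_in) + B_in
    Y_output = (Y_hidden * W_out) + B_out
    return Y_hidden, Y_output

def gradients(x, Y_hidden, Y_output, Y_target, W_out):
    # Chain rule: start from dL/dY_output and propagate backward
    dL_dY_output = 2 * (Y_output - Y_target)
    return {
        'W_out': dL_dY_output * Y_hidden,   # dL/dW_out
        'B_out': dL_dY_output,              # dL/dB_out
        'W_in':  dL_dY_output * W_out * x,  # dL/dW_in
        'B_in':  dL_dY_output * W_out,      # dL/dB_in
    }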
The calculations can be illustrated with the following values:
- x = 0.4
- W_in = 2, B_in = 1.5
- W_out = 0.5, B_out = 2
- Y_target = 2.15
- learning rate = 0.5
First, we calculate Y_output,
Y_output = (((x * W_in) + B_in) * W_out) + B_out
         = (((0.4 * 2) + 1.5) * 0.5) + 2
         = 3.15
Y_target = 2.15 is the expected output.
The loss, L, is:

L = (Y_output - Y_target)^2 = (3.15 - 2.15)^2 = 1
To adjust B_out, we use the backpropagation and gradient descent techniques:
dL/dB_out = dL/dY_output * dY_output/dB_out
dL/dY_output = 2 * (3.15 - 2.15) = 2
dY_output/dB_out = 1
so, dL/dB_out = 2
And the new B_out will be: B_out = 2 - (0.5 * 2) = 1
In the subsequent training iteration, the updated value of B_out shall be used. Likewise, adjustments to weights and biases will be computed, and these refined values will be employed in subsequent training iterations.
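Continuing the worked example in code, here is a sketch that reproduces the B_out update above (the remaining parameters would be updated with the same pattern):

# Values from the worked example
x, Y_target, learning_rate = 0.4, 2.15, 0.5
W_in, B_in, W_out, B_out = 2.0, 1.5, 0.5, 2.0

# Forward pass
Y_hidden = (x * W_in) + B_in            # 2.3
Y_output = (Y_hidden * W_out) + B_out   # 3.15

# Loss and backward pass for B_out
L = (Y_output - Y_target) ** 2            # 1.0
dL_dY_output = 2 * (Y_output - Y_target)  # 2.0
dL_dB_out = dL_dY_output * 1.0            # dY_output/dB_out = 1

# Gradient descent update
B_out = B_out - learning_rate * dL_dB_out
print(B_out)  # 1.0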
Training a neural network involves the iterative process of adjusting its parameters, such as weights and biases, to minimize a defined loss function.
The initial weights and biases are set randomly, and during each training iteration, input data is fed forward through the network to make predictions. The predictions are then compared to the actual target values, and the difference is quantified by the loss function. The goal of training is to minimize this loss by updating the parameters through the above-described optimization algorithm such as gradient descent. The gradient of the loss function with respect to each parameter is computed during backpropagation, and the parameters are adjusted in the opposite direction of the gradient. This process continues until the model reaches a state where the loss is minimized, indicating that the network has learned the underlying patterns in the training data. This is how it happens,
Throughout the training phase, as the data is input into the system, the loss (MSE, for example) is computed by assessing the resultant output. The computation follows the formula:

L = (Y_output - Y_target)^2
If the loss is deemed unacceptable, the chain rule is employed to compute dL/dMp. Subsequently, through the backpropagation and gradient descent algorithm, the parameters undergo adjustment:

Mp = Mp - learning_rate * (dL/dMp)
The newly computed Mp is then utilized to analyze the output by re-issuing the inputs, initiating a repetitive cycle of adjustments.
In our previous discussion in the article https://www.dhirubhai.net/posts/prathap-thammanna-847043a_machinelearning-linearregression-gpus-activity-7124421078516420608-UUxf?utm_source=share&utm_medium=member_desktop, we methodically derived the weights and biases of a linear equation using algorithms. Building upon that knowledge, let's now leverage the insights gained about neural networks. In this context, we will apply these concepts and the architectural principles to construct a linear regression model. Our focus will be on delving into the code, showcasing how the fundamental concepts of Loss, Gradient Descent, and Backpropagation can be employed to determine the same weights and biases. This demonstration will highlight the model's ability to make predictions for linear data along with the code to train the model.
We model this simple network in code:
import numpy as np
import matplotlib.pyplot as plt

class LinearRegressionModel:
    def __init__(self):
        self.weights = None
        self.bias = None

    def train(self, X_train, y_train, learning_rate=0.01, epochs=100):
        # Initialize weights and bias randomly
        np.random.seed(1)
        self.weights = np.random.randn(1)
        self.bias = np.random.randn(1)
        for epoch in range(epochs):
            total_loss = self._train_one_epoch(X_train, y_train, learning_rate)
            # Print the total loss every 10 epochs
            if epoch % 10 == 0:
                print(f'Epoch {epoch}, Total Loss: {total_loss}')

    def _train_one_epoch(self, X_train, y_train, learning_rate):
        total_loss = 0
        for i in range(len(X_train)):
            # Forward pass for a single data point
            prediction = self.predict(X_train[i])
            # Compute the squared error for this data point
            loss = (prediction - y_train[i]) ** 2
            total_loss += loss
            # Backward pass (chain rule) for a single data point
            grad_weights, grad_bias = self._backpropagate(X_train[i], y_train[i], prediction)
            # Gradient descent update of weights and bias
            self.weights -= learning_rate * grad_weights
            self.bias -= learning_rate * grad_bias
        return total_loss

    def _backpropagate(self, x, y_true, y_pred):
        # Chain rule for gradients: dL/dw = 2 * (y_pred - y_true) * x, dL/db = 2 * (y_pred - y_true)
        grad_loss = 2 * (y_pred - y_true)
        grad_weights = grad_loss * x
        grad_bias = grad_loss
        return grad_weights, grad_bias

    def predict(self, x):
        return x * self.weights + self.bias

# Generate some random data for training (y = 2x + 1 plus noise)
np.random.seed(0)
X_train = np.random.rand(100, 1)
y_train = 2 * X_train + 1 + 0.1 * np.random.randn(100, 1)

# Create and train the linear regression model
linear_model = LinearRegressionModel()
linear_model.train(X_train, y_train)

# Test the model on new data
X_test = np.array([[0.2], [0.5], [0.8]])
predictions = linear_model.predict(X_test)
print(predictions)
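Continuing from the code above (this snippet reuses X_train, y_train, X_test, predictions, and the already-imported matplotlib), a short plotting sketch to visualize the fit:

# Plot the training data and the model's fitted line
plt.scatter(X_train, y_train, label='training data')
plt.plot(X_test, predictions, color='red', label='model fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()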
The resulting graph shows the training dataset together with the linear regression line predicted by the neural network.
In conclusion, our journey through the intricacies of neural networks has unveiled the fundamental mechanisms that drive their learning and predictive capabilities. From the nuanced architecture inspired by the human brain to the indispensable concepts of loss, gradient descent, and backpropagation, we have navigated the landscape of artificial intelligence with a focus on neural networks. As we stand at the nexus of data science and computational innovation, understanding these principles becomes not just beneficial but essential for anyone venturing into the realms of machine learning and neural network applications. Armed with this knowledge, we can harness the power of neural networks to tackle complex problems, make accurate predictions, and drive advancements that redefine the boundaries of technological innovation. As we move forward, the synergy between human intelligence and artificial neural networks promises a future where the uncharted territories of knowledge and discovery are within our computational grasp.
In the next article, my focus will delve into the realm of TensorFlow, exploring its applications and the utilization of Graphics Processing Units (GPUs) to enhance the computational efficiency of neural networks. We will unravel the synergy between TensorFlow's powerful capabilities, and the accelerated processing potential offered by GPUs, shedding light on how this combination contributes to the optimization of neural network computations.