Deep Learning 1: ANN (Artificial Neural Network) Architecture

Neuron and perceptron

Deep learning is heavily inspired by our own nervous system, and its basic building blocks operate in a broadly similar way. To see this, let's compare a perceptron (the basic building block of an ANN) with a single biological neuron.

Take a look at this image. On the right-hand side we have a perceptron; on the left-hand side, a neuron. Comparing them side by side, we can see clear similarities, which is why we say the perceptron is inspired by the neuron. The inputs can be compared to the dendrites, the weights to the synaptic connections (whose strengths decide how much each input matters), the summation to the cell body, and the outgoing output to the axon. Neurons connect with each other to form the nervous system, and similarly, multiple perceptrons connect to form a neural network.

Before diving into the ANN architecture, it's important to understand the perceptron.

To start, let's look at the structure of a perceptron. First we'll understand its components, then how it operates. Inputs are provided on one side, represented as x1, x2, ..., xn. There is also a bias input, fixed at 1. The connections between the inputs and the summation block carry the weights, represented by W1, W2, ..., Wn; the weight on the bias input is the bias term, represented here by X0 (often written as b or w0). The summation block adds up the inputs after they are multiplied by their respective weights.



The operation within this block is: multiply each input by its corresponding weight, sum the results, and add the bias term. For two inputs x1 and x2, this gives z = W1·x1 + W2·x2 + X0. The resulting value z is then passed to the activation function, which squashes the output into a specific range, for example between 0 and 1, or between -1 and 1, depending on the function used.

A simple activation function is the step function: the output is 1 if z is greater than or equal to 0, and 0 otherwise. There are other activation functions as well, such as the sigmoid and ReLU.
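
To make the forward pass concrete, here is a minimal sketch in Python; the inputs, weights, and bias are illustrative values, not taken from the figure:

```python
import numpy as np

def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    """Weighted sum of inputs plus bias, passed through the activation."""
    z = np.dot(w, x) + b          # z = W1*x1 + W2*x2 + ... + bias
    return step(z)

# Illustrative example with two inputs
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.4])   # W1, W2 (assumed values)
b = 0.1                     # bias term (written as X0 above)
print(perceptron(x, w, b))  # z = 0.5*1.0 + (-0.4)*2.0 + 0.1 = -0.2 -> output 0
```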

Identification of Weights and Biases

Training Process

The goal of the training process is to find the correct values for the weights (W1, W2, ...) and the bias (X0, often written b). Training adjusts these values until the perceptron makes accurate predictions.

How a perceptron is trained, using forward propagation and backpropagation, is covered in detail in the article linked below (and will also be covered in detail later in this series).

https://www.dhirubhai.net/pulse/logistic-regression-deep-learning-approach-jitender-malik-7vekc/?trackingId=L0b%2F77PUQtmR%2F19HU9yVaQ%3D%3D
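
For intuition, here is a minimal sketch of the classic perceptron learning rule (an illustrative variant, not the exact procedure from the linked article): on each mistake, the weights are nudged toward the misclassified point.

```python
import numpy as np

def train_perceptron(X, y, lr=0.03, epochs=100):
    """Classic perceptron learning rule: update weights only on mistakes."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(w, xi) + b >= 0 else 0
            error = yi - pred            # 0 if correct, +/-1 if wrong
            w += lr * error * xi         # nudge weights toward the point
            b += lr * error
    return w, b

# Linearly separable toy data (logical AND, assumed for illustration)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(w, b)   # a separating line is found quickly
```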

Problems with Perceptron

The perceptron's key limitation is that it only works on linearly separable data; it cannot handle non-linear data. No matter how much time we give the perceptron to train, it will never be able to classify non-linear data correctly.

If we look closely, the dataset below is linearly separable because the two classes can be divided by a straight line. The inputs are X1 and X2, and the learning rate is 0.03. When we run the training, the perceptron quickly finds a separating line and gives the correct result.


Now look at this second dataset. No matter how much time we give it or how many epochs we run, we will not get a correct result; the perceptron cannot separate the two classes.
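
One way to see this failure is to run the same learning rule on XOR, the textbook non-linearly-separable dataset (assumed here for illustration; the figures above use a different dataset). Since no straight line separates XOR, at least one point is always misclassified, no matter how many epochs we run:

```python
import numpy as np

# XOR: the classic non-linearly-separable dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

w, b = np.zeros(2), 0.0
for epoch in range(1000):                  # far more epochs than should be needed
    for xi, yi in zip(X, y):
        pred = 1 if np.dot(w, xi) + b >= 0 else 0
        w += 0.03 * (yi - pred) * xi
        b += 0.03 * (yi - pred)

preds = [1 if np.dot(w, xi) + b >= 0 else 0 for xi in X]
print(preds, "vs", list(y))   # at least one point stays misclassified
```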

Multi-Layer Perceptron (MLP)

The perceptron works, but its one major problem is that if there is non-linearity in our data, it cannot capture it.

Take the example below: a dataset with two classes, green and red. To separate these two classes, we need a decision boundary like the one shown, which is not a straight line.


The perceptron, by contrast, creates a straight-line boundary like this. So the problem is that we need an algorithm that can capture the kinds of non-linearity our perceptron cannot.


The most challenging part is that we need to build that algorithm out of perceptrons. So, we will try to create a network of multiple perceptrons that can capture any kind of non-linearity. To get an idea of the multi-layer perceptron, I have trained two separate perceptrons on the data.


As shown in the image, in model 1, W1 is 2, W2 is 3, and the bias is 6, so the line of this perceptron looks like this.

The equation of the line is 2x + 3y + 6 = 0.

In model 2, W1 is 5, W2 is 4, and the bias is 3.

The equation of the line is 5x + 4y + 3 = 0.
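
Each perceptron classifies a point by checking which side of its line the point falls on, i.e. the sign of W1·x + W2·y + bias. A small sketch (the test point is an illustrative assumption):

```python
# Which side of each perceptron's line does a point fall on?
def model_1(x, y):
    return 2 * x + 3 * y + 6      # line of model 1: 2x + 3y + 6 = 0

def model_2(x, y):
    return 5 * x + 4 * y + 3      # line of model 2: 5x + 4y + 3 = 0

point = (1.0, -1.0)               # illustrative point
print(model_1(*point) >= 0, model_2(*point) >= 0)  # 5 >= 0, 4 >= 0 -> True, True
```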

Now, if we somehow combine the outputs of these two perceptrons, we can create a new decision boundary like this. Imagine superimposing this image over the previous one: we get a new, piecewise decision boundary of this kind.



Now we smooth it, and once smoothed, it looks like this.


This is exactly the decision boundary we needed. That is the basic idea of the multi-layer perceptron.

Each node in this whole construction is still doing the work of a perceptron; what is happening is that we are creating a combination of three perceptrons, which we call a multi-layer perceptron.

We call this a multi-layer perceptron because there are multiple layers: the first is the input layer, then the hidden layer, then the output layer. So now we understand how to organize more than one perceptron in a way that captures non-linear interactions. The concept is simple: we create a linear combination of multiple perceptrons, passed through a non-linear activation.
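
Here is a hedged sketch of that three-perceptron combination, reusing the two models above with sigmoid smoothing; the combining weights in the output perceptron are illustrative assumptions, not trained values:

```python
import numpy as np

def sigmoid(z):
    """Smooth activation in (0, 1); this is the 'smoothing' step above."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x):
    """Three perceptrons: two in the hidden layer, one combining them."""
    h1 = sigmoid(2 * x[0] + 3 * x[1] + 6)     # model 1 from above
    h2 = sigmoid(5 * x[0] + 4 * x[1] + 3)     # model 2 from above
    # Output layer: linear combination of the hidden outputs (assumed weights)
    return sigmoid(1.0 * h1 + 1.0 * h2 - 1.5)

print(mlp_forward(np.array([0.5, -0.5])))     # a value in (0, 1)
```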

ANN Architecture

First of all, it is important to understand what is meant by the architecture of a neural network. Architecture simply means how the nodes (also called perceptrons) are connected to each other: the arrangement of nodes and the weighted connections between them.


Now we will look at how we can change our neural network architecture to gain additional flexibility.

So, what can we change in the architecture as requirements change? There are four ways, listed below (see the code sketch after this list).

1. Increase the number of nodes in the hidden layer: we can add more perceptrons. For example, in the image below, the middle layer is the hidden layer, where we are adding one more perceptron.

Until now we had two perceptrons in the hidden layer; here there are three. Everything else remains the same; we have only added an extra perceptron to the hidden layer. The effect is that if the data is very non-linear, the additional node helps capture that complexity. When we take a linear combination of these three, we get the new output. Nothing else changes; one extra node was added, and each extra node brings its own weights as well. The things to understand are that we can add as many nodes as we want to the hidden layer, and that the more nodes we add, the more complex the non-linear decision boundaries we can create.

2. Increase the number of nodes in the input layer. This happens when the number of input columns in the data increases: the more columns (features) the input has, the more nodes the input layer needs, one node per feature.

3. Increase the number of units in the output layer. In all the examples so far, there was a single perceptron at the end, but that is not required; we can have more than one. We generally do this for multi-class classification. For example, to identify whether a photo contains a dog, a cat, or a human, we create three output perceptrons: one for the dog, one for the cat, and one for the human. Whichever produces the highest probability decides the label. So the learning point is that in multi-class classification we can have multiple nodes in the output layer.


4. Increase the number of hidden layers (deep neural network). Until now we increased the nodes within a layer, not the number of layers, but we can do that too. In the figure, there is an input layer and an output layer, but two hidden layers in between. What benefit do we get from this? When the data is very complex and non-linear, requiring very complex decision boundaries, we can only create those boundaries by adding layers. The first layer creates only a limited amount of complexity, but as we go deeper into the network, each layer takes linear combinations of the previous layer's outputs and captures progressively more complex relationships. No matter how complex or non-linear the data is, if we keep adding layers to the network and give it enough training time, the neural network will be able to capture the relationships.
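
To make these four knobs concrete, here is a minimal sketch using Keras (assumed available); the layer sizes, the four input features, and the three output classes are illustrative assumptions, not taken from the figures:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(4,)),                  # 2. input layer: one node per feature
    layers.Dense(8, activation="relu"),       # 1. hidden layer: 8 nodes (add more for complexity)
    layers.Dense(8, activation="relu"),       # 4. a second hidden layer (deeper network)
    layers.Dense(3, activation="softmax"),    # 3. output layer: one node per class
])
model.summary()
```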

For example, the non-linear data which the single perceptron could not classify, a multi-layer perceptron captured easily, producing the non-linear boundaries shown below.
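
As a hedged end-to-end check, the sketch below trains both a single perceptron and a small MLP on scikit-learn's make_moons dataset, an assumed stand-in for the dataset in the figures:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

# A non-linearly-separable dataset (stand-in for the one in the figure)
X, y = make_moons(n_samples=500, noise=0.15, random_state=0)

linear = Perceptron().fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=2000,
                    random_state=0).fit(X, y)

print("perceptron accuracy:", linear.score(X, y))   # stuck well below 1.0
print("MLP accuracy:", mlp.score(X, y))             # close to 1.0
```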




