Deep Learning : Neural Networks
Chandra Prakash Bathula
Adjunct Faculty at Saint Louis University | Machine Learning Practitioner | Web Developer | GenAI Developer
History Of Neural Networks and Deep Learning
In the history of tech evolution, there have been numerous inventions and services inspired by the biological structure and functioning of living beings. From bullet trains to prosthetic limbs, artificial photosynthesis to structural engineering, and many more innovations, nature has been a rich source of inspiration.
But what if we could mimic the functioning of the brain?
Despite millennia of human evolution, we still understand only a fraction of the brain's functioning and capabilities. The brain remains a mysterious marvel!
Now, let's delve into the brain's functioning and its connection to the field of Machine Learning.
The Birth of Neural Networks:
In 1957, Frank Rosenblatt made a groundbreaking contribution to this field by creating the "Perceptron," the very first and simplest neural network model. The perceptron, intuitively similar to logistic regression with minimal changes, marked the beginning of a new era.
Understanding Neurons:
A neuron, a fundamental brain cell, is responsible for processing and transmitting information. It comprises dendrites that connect with other neurons, axons for signal transmission, and a soma, the cell body responsible for triggering the electrochemical signals that produce outputs. Neurons do not act in isolation; they are interconnected and transmit information from one to another.
Have you ever wondered how we react to situations we've never experienced personally?
Some reactions, like avoiding fire or sharp objects, are learned from others' experiences, and our brains store this knowledge for us.
Complex Neural Connections:
Is there a way to mimic the structural functioning of the brain's network?
The Rise of Deep Learning:
For image detection, most algorithms struggled to achieve high accuracy until 2010. In 2012, the ImageNet competition for image classification became a turning point. A team utilized Deep Neural Networks with extensive datasets and advanced computational resources, achieving remarkable results. This success propelled deep learning into the spotlight. Large internet companies like Amazon, Microsoft, Baidu, Google, and Facebook (now Meta) invested in data, computational resources, and talent, leading to the development of new and advanced algorithms.
Applications of Deep Learning:
Since then, Deep Learning has been implemented in various domains:
Artificial Neural Network (ANN):
The initial Artificial Neural Network looks like the structure shown below.
The initial representation of an Artificial Neural Network includes multiple layers where inputs are assigned relative weights. These inputs are combined, summed up, and subjected to an activation function. This output can further serve as input for other neurons.
This architecture is loosely inspired by biology, but it is not an exact replica. In fact, Convolutional Neural Networks (CNNs) themselves date back to the early 1990s.
Biological Neurons:
From a graph-theory perspective, a neuron is a vertex and its connections are edges; in biology, the outgoing connections are called "axons" and the incoming ones "dendrites." Not all dendrites are alike: some are thick, some are thin, and each carries its own importance and its own information.
The weight assigned to each dendrite varies depending on the importance of the signals. The heavier the weight and the thicker the dendrite, the more critical the information.
Neurons are activated or triggered when they receive sufficient signal input, resulting in the release of output.
Output = f( Σ wi·xi ), for i = 1 to n.
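As a rough sketch of this idea in code (the inputs, weights, and choice of sigmoid below are illustrative assumptions, not values from this article), a single artificial neuron can be written in a few lines of Python:

```python
import numpy as np

def neuron_output(x, w, f):
    """A single artificial neuron: apply activation f to the weighted sum of inputs."""
    return f(np.dot(w, x))

# Illustrative values (assumptions, not from the article):
x = np.array([0.5, -1.0, 2.0])           # inputs arriving on the "dendrites"
w = np.array([0.8, 0.2, 0.5])            # weights: a thicker dendrite gets a larger weight
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

print(neuron_output(x, w, sigmoid))      # the neuron "fires" a value between 0 and 1
```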
Formation of Neural Networks:
Neural networks are interactions of neurons. At birth, neural networks are sparse, but during early childhood, as we learn language, speech, and object identification, connections multiply. The brain's energy consumption is substantial during this phase, sustaining electrochemical reactions required for connections.
As we age, neural connections start to degenerate, leading to issues like difficulty recognizing objects and people, communication problems, and impaired hearing and vision.
Just like other cells in the body, brain cells die, and their loss leads to diminished cognition and thinking ability.
Biological Building of Neural Networks:
Biologically inspired algorithms are designed to form new connections as soon as they consume data. Interestingly, these connections are not coincidental but are influenced by the consumed data.
During early childhood, up to about age six, as we learn language, speech, and the identification of objects, the rate at which new connections form is very high. Forming and sustaining these connections requires a great deal of energy (calories) to keep the electrochemical reactions going.
Up to about six years of age, children consume a huge amount of data: visual, audio, and other sensory inputs.
Connections that are formed are unique and different based on the consumed data.
Example:
Think of a kid who gets hurt by touching a hot object: a new neural connection is formed, and depending on the situation, different connections are formed.
Based on that data, the neurons learn the behavior. The connections may be formed directly through physical experience or through someone telling us about theirs; either way, the neurons learn the behavior.
For example, a kid does not always need to touch the hot object to know that it will hurt; he can learn this from his parents or from someone else's experience.
All of this learning (in this case, biological learning by neurons) is nothing but the weights on the neural connections, shaped by learning, behavior, or experience. This concept carries over naturally to "Artificial Neural Networks."
If the weight is "0", there is no connection; as the weight increases, a connection is formed and strengthened.
In an ANN, learning is nothing but connecting individual neurons with edges and placing weights on those edges; that is what learning is all about, and it happens through data. In other words, learning in an Artificial (or computational) Neural Network is nothing but weight calculation.
Logistic Regression & Perceptron:
Logistic regression is a well-known technique for separating positive and negative points geometrically. It can be understood from a neuron's perspective as mapping inputs to predicted values.
Perceptron serves as a linear classifier, attempting to find a line for classifying points, albeit without the squashing function used in logistic regression.
Now, let's go into the explanation of Logistic Regression from a neuron's perspective.
Mathematical Explanation:
We train logistic regression (LR) using SGD, as in the LR and optimization case.
The most important part of training a neural network is simply finding the weights on its edges.
LR => a neuron whose activation function is the sigmoid (logistic) function.
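To make the "LR trained with SGD" statement concrete, here is a minimal sketch of one SGD step for a logistic-regression neuron; the learning rate and the (x, y) pair below are assumed placeholder values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(w, b, x, y, lr=0.1):
    """One SGD update of the edge weights for a logistic-regression 'neuron'."""
    y_hat = sigmoid(np.dot(w, x) + b)   # neuron output in (0, 1)
    error = y_hat - y                   # gradient of the log-loss w.r.t. the weighted sum
    w = w - lr * error * x              # update the weights on the incoming edges
    b = b - lr * error                  # update the bias term
    return w, b

# Assumed toy example:
w, b = np.zeros(2), 0.0
x, y = np.array([1.0, 2.0]), 1.0
w, b = sgd_step(w, b, x, y)
print(w, b)
```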
Perceptron:
The perceptron was created as an idea in 1957 by Frank Rosenblatt, and the idea is very similar to logistic regression, with some differences.
The only difference in LR and perceptron is the activation function.
The idea of a biological neuron was this: it receives some inputs on dendrites of differing sizes, thick and thin according to their weights, and the neuron then either fires (triggers) or does not. Triggering here refers to generating a value; if the neuron does not trigger, the result is "0".
The perceptron is a linear classifier. From a geometric standpoint, it too tries to find a line that classifies the points. The only difference is that LR has the "squashing" (sigmoid) function, which the perceptron does not.
The perceptron is one of the oldest models ever used in Machine Learning.
But wait, how to train the Perceptron?
Of course, we have SGD.
We also know how to use SGD in linear and logistic regression. Training a perceptron likewise means computing the edge weights, and any of the Gradient Descent approaches can be used.
We can often think of LR and the perceptron as simple single-neuron models. The only difference is that in LR we use the sigmoid, while in the perceptron we have a step-like function that only reports whether the value is "0" or not.
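A minimal sketch of that single-neuron view of the perceptron, with a step activation in place of the sigmoid; the toy OR-style data and the learning rate are assumptions made purely for illustration:

```python
import numpy as np

def step(z):
    """Perceptron activation: fire (output 1) if the weighted sum is positive, else 0."""
    return 1.0 if z > 0 else 0.0

def perceptron_update(w, b, x, y, lr=1.0):
    """Classic perceptron learning rule on one example: adjust weights only on mistakes."""
    y_hat = step(np.dot(w, x) + b)
    w = w + lr * (y - y_hat) * x
    b = b + lr * (y - y_hat)
    return w, b

# Assumed toy data (OR-like, linearly separable):
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([0., 1., 1., 1.])
w, b = np.zeros(2), 0.0
for _ in range(10):
    for x, y in zip(X, Y):
        w, b = perceptron_update(w, b, x, y)
print(w, b)   # weights and bias of a separating line
```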
Multi Layered Perceptron (MLP):
The perceptron represents a simplified version of a neuron and shares some similarities with logistic regression. In a single-neuron model or perceptron, there are input and output layers, akin to logistic regression.
However, a Multi-Layered Perceptron (MLP) introduces multiple layers, each akin to a perceptron or neuron. Each layer comprises a function, which may be the same or different.
The neural network shown above is also called a "Multi-Layered Perceptron": it contains multiple layers, each of which can be thought of as a set of perceptrons, and every neuron applies a function that may be the same as or different from the others.
Why do we need to care about MLPs?
There are actually several reasons, but let's discuss two arguments.
This is how it was discovered that a multi-layered neural structure is more powerful and useful than a single perceptron, since the layered structure brings additional benefits.
Mathematical Argument:
MLPs allow solving complex problems by adding layers and applying simple mathematical functions. Even highly complex problems can be addressed through this layering technique.
Let's take an example of a regression problem, diverting slightly from the core discussion so far.
D = {(xi, yi)}. Find a function F(xi) that connects "xi" with "yi", where yi ∈ Real Numbers.
We pass the inputs through a sequence of functions. For instance, we can represent a complex function f(x) = 2 sin(x^2) + √5x using a Multi-Layered-Perceptron-like structure.
By using a Multi-layered structure, we can come up with complex mathematical functions to solve our problem.
The multi-layered perceptron structure gives enormous power to the models.
(The simplest model is a linear model. If the data is not linear, we use non-linear models such as the RBF-SVM (Support Vector Machine), Random Forests (RF), or Gradient Boosted Decision Trees (GBDT).)
Here we create the non-linearity by applying a sequence of functions, which is very powerful. This is essentially the high-school concept of function composition: F(g(x)), G(f(x)).
yi = F(x) = 2 sin(x^2) + √5x, and this F(x) can be written in terms of simpler functions f1, f2, f3, f4, f5, ... composed and combined layer by layer:
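As a sketch of this composition idea in code (reading the square-root term as √(5x) is an assumption about the intended grouping), the same function can be built by stacking simple functions:

```python
import numpy as np

# Each "layer" applies one simple function; stacking them yields F(x) = 2*sin(x^2) + sqrt(5*x).
f1 = lambda x: x ** 2        # first branch, layer 1: square the input
f2 = np.sin                  # first branch, layer 2: sine
f3 = lambda x: 2 * x         # first branch, layer 3: scale by 2
g1 = lambda x: 5 * x         # second branch, layer 1: scale by 5
g2 = np.sqrt                 # second branch, layer 2: square root

def F(x):
    """Combine two branches of composed simple functions, the way an MLP combines layers."""
    return f3(f2(f1(x))) + g2(g1(x))

print(F(2.0))   # 2*sin(4) + sqrt(10)
```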
A Multi-Layered Perceptron is essentially the same idea as function composition.
Being powerful models, multi-layered structures can overfit easily.
The Multi-Layered Perceptron is one of the most important and interesting ideas in deep learning, and probably in Machine Learning as a whole.
Notation:
D = {(xi, yi)}; yi ∈ Real Numbers -> regression problem.
Xij => "i" is the data-point index, "j" is the feature index.
Fij => "i" is the layer number, "j" is the function index within that layer.
k -> index of the next layer.
Wij => weight from input "i" to neuron "j".
Here, we are connecting all the possible connections in the layers for every possible pair.
A network in which all possible connections between consecutive layers are present is called a "Fully Connected Neural Network" (or a "Fully Connected Multi-Layered Perceptron"). Connections within a layer are excluded; we only connect from one stage (layer) to the next.
Now let us look at from the perspective of the first Neuron:
W11, W12 & W13 are the weights from the first input to the neurons of the first layer.
These weights can be represented as a matrix of 4 X 3 ; 4 rows and 3 columns.
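A minimal sketch of that layer as a matrix computation, assuming 4 inputs and 3 neurons so the weight matrix is 4 x 3 as described; the random weights and the input vector are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))            # W[i, j]: weight from input i to neuron j (assumed values)
x = np.array([1.0, 0.5, -1.0, 2.0])    # one input vector with 4 features

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
layer_output = sigmoid(x @ W)          # weighted sums for all 3 neurons, then the activation
print(layer_output)                    # one output value per neuron in the layer
```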
Further Deep Learning Concepts will be discussed in the later articles....
References:
Main Image Src:
Shrestha, A. (2020, January 22). Machine Learning: A Primer to Laboratory Applications. The Connected Lab. https://www.thermofisher.com/blog/connectedlab/machine-learning-a-primer-to-laboratory-applications/
Biological Neural Connection Image:
Szczepaniak, M. O. & V. (2019, January 14). Neuroscience and Suzuki: Brain Development from Age Zero, and the Impact of Early Childhood Music Ed. WSSTE Suzuki. https://www.wsste.com/post/neuroscience-and-suzuki-brain-development-from-age-zero-and-the-impact-of-early-childhood-music-ed