Deep Learning: Neural Networks

History Of Neural Networks and Deep Learning

In the history of tech evolution, there have been numerous inventions and services inspired by the biological structure and functioning of living beings. From bullet trains to prosthetic limbs, artificial photosynthesis to structural engineering, and many more innovations, nature has been a rich source of inspiration.

But what if we could mimic the functioning of the brain?

Despite millennia of human evolution, we still understand only a small fraction of the brain's functioning and capabilities. The brain remains a mysterious marvel!

Now, let's delve into the brain's functioning and its connection to the field of Machine Learning.

The Birth of Neural Networks:

In 1957, Frank Rosenblatt made a groundbreaking contribution to this field by creating the "Perceptron," the very first and simplest neural network model. The perceptron, intuitively similar to logistic regression with minimal changes, marked the beginning of a new era.

Understanding Neurons:

Image Of Biological Neuron


A neuron, a fundamental brain cell, is responsible for processing and transmitting information. It comprises dendrites that receive signals from other neurons, an axon that transmits signals onward, and the soma, the cell body that triggers the electrochemical signals producing outputs. Neurons do not act in isolation; they are interconnected and transmit information from one to another.

Have you ever wondered how we react to situations we've never experienced personally?

Some reactions, like avoiding fire or sharp objects, are learned from others' experiences, and our brains store this knowledge for us.

Representation Of how complex neural connections can be.

Complex Neural Connections:

Is there a way to mimic the structural functioning of the brain's network?

  • In the 1980s, people began to find answers, though the field had faced challenges since the 1960s. One of the most significant breakthroughs in machine learning, Backpropagation, was popularized by David Rumelhart, Geoffrey Hinton, and Ronald Williams in 1986. Backpropagation can be thought of as an application of the chain rule of differentiation, combined with further mathematical insight.
  • However, computational power in those days was limited, and alternative algorithms such as Support Vector Machines (SVMs) gained attention. Two decades later, Geoffrey Hinton returned with a seminal paper on deep neural networks, now known as Deep Learning, ushering in various related concepts.

The Rise of Deep Learning:

For image detection, most algorithms struggled to achieve high accuracy until 2010. In 2012, the ImageNet competition for image classification became a turning point. A team utilized Deep Neural Networks with extensive datasets and advanced computational resources, achieving remarkable results. This success propelled deep learning into the spotlight. Large internet companies like Amazon, Microsoft, Baidu, Google, and Facebook (now Meta) invested in data, computational resources, and talent, leading to the development of new and advanced algorithms.

Applications of Deep Learning:

Since then, Deep Learning has been implemented in various domains:

  • Voice Assistants like Siri, Alexa, and Google Assistant.
  • Self-Driving Cars from companies like Tesla and Google's Waymo.
  • Mobile applications for skin-cancer diagnosis using mobile cameras, all powered by Deep Learning.

Artificial Neural Network (ANN):

The initial Artificial Neural Network looks like the structure below:

ANN initial representation.


The initial representation of an Artificial Neural Network includes multiple layers in which each input is assigned a relative weight. The weighted inputs are summed and passed through an activation function, and the resulting output can in turn serve as input to other neurons (a small code sketch follows the formulas below).

  • Sum = W1X1 + W2X2 + … + WnXn
  • Output = F(W1X1 + W2X2 + … + WnXn)
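As a minimal sketch of the formulas above (not from the original article), here is a single artificial neuron in Python. The input values, weights, bias term, and the choice of a simple step activation are all illustrative assumptions.

```python
import numpy as np

def neuron_output(x, w, b, activation):
    """Return F(W1*X1 + W2*X2 + ... + Wn*Xn + b) for a single artificial neuron."""
    weighted_sum = np.dot(w, x) + b    # Sum = W1X1 + W2X2 + ... + WnXn (plus a bias)
    return activation(weighted_sum)    # apply the activation function F

# Illustrative values (assumptions, not taken from the article)
x = np.array([0.5, -1.2, 3.0])         # inputs
w = np.array([0.8, 0.1, -0.4])         # weights on the corresponding inputs
b = 0.2                                # bias term

step = lambda z: 1.0 if z > 0 else 0.0    # simple threshold activation
print(neuron_output(x, w, b, step))       # fires (1.0) only if the weighted sum is positive
```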

This architecture is loosely inspired by biology but is not an exact replica. (Convolutional Neural Networks, incidentally, date back to the early 1990s.)

2 Layered Neural Connection.


Biological Neurons:

Neuron and its parts representation.



Single Layer inputs and the corresponding output.


From a graph-theory perspective, a neuron's outgoing connections are edges leaving a vertex; in biology, the outgoing connection is the axon. Not all dendrites are alike: some are thick, some are thin, and each carries its own weight of information.

The weight assigned to each dendrite varies depending on the importance of the signals. The heavier the weight and the thicker the dendrite, the more critical the information.

Neurons are activated or triggered when they receive sufficient signal input, resulting in the release of output.

Output = f(Σ WiXi), for i = 1 to n.


Formation of Neural Networks:

Neural networks are interactions of neurons. At birth, neural networks are sparse, but during early childhood, as we learn language, speech, and object identification, connections multiply. The brain's energy consumption is substantial during this phase, sustaining electrochemical reactions required for connections.

As we age, neural connections start to degenerate, leading to issues like difficulty recognizing objects and people, communication problems, and impaired hearing and vision.

Just like any other cells in the body, dead brain cells lead to a loss of cognition and thinking ability.

Biological Building of Neural Networks:

Biologically inspired algorithms are designed to form new connections as soon as they consume data. Interestingly, these connections are not coincidental but are influenced by the consumed data.

During early childhood, up to about six years of age, as we learn language, speech, and the identification of objects, the rate of connection formation is very high. Forming and maintaining these connections requires a great deal of energy (calories) to sustain the underlying electrochemical reactions.

Example image of Typical connection between Neurons.


Up to about six years of age, children consume a great deal of data: visual, auditory, and other sensory inputs.

Neural Connections at different ages of Life



The connections that form are unique and differ based on the data consumed.

Example:

Think of a child who gets hurt by touching a hot object: a new neural connection is formed, and the connections formed differ from case to case.

Imaginary representation of a neural connection for a person who suffered pain.
Imaginary representation of a neural connection for a person who tasted something sweet.

Based on the data, the neurons learn the behavior. The connections form either directly through physical experience or through what someone else tells us; in both ways, neurons learn the behavior.

For example, a child does not always need to touch a hot object to know that it hurts; they can learn this by listening to their parents or to someone else's experience.

All of this learning, in this case biological learning by neurons, is nothing but the weights on neural connections, set according to learning, behavior, or experience. This concept maps closely onto "Artificial Neural Networks."

If a weight equals 0, there is no connection; as weights increase, connections are formed and strengthened.

In an ANN, learning is nothing but connecting individual neurons with weighted edges; that is what learning is all about, and it happens through data. In other words, learning in an Artificial (computational) Neural Network boils down to weight calculation.

Logistic Regression & Perceptron:

Logistic regression is a well-known technique for separating positive and negative points geometrically. It can be understood from a neuron's perspective as mapping inputs to predicted values.

Perceptron serves as a linear classifier, attempting to find a line for classifying points, albeit without the squashing function used in logistic regression.

Classifying two different sets of points.


Now, let's go into the explanation of Logistic Regression from a neuron's perspective.

Mathematical Explanation:

  • ŷi = sigmoid(wᵀxi + b), where xi is a d-dimensional real vector (xi ∈ ℝᵈ).
  • Training logistic regression involves finding the weight vector (w) and bias (b). Training utilizes Stochastic Gradient Descent (SGD).
  • In contrast, the perceptron, developed by Rosenblatt in 1957, is similar to logistic regression but differs in its activation function. The perceptron's activation is binary: 1 if wᵀxi + b > 0, else 0 (see the sketch below).
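To make the contrast concrete, here is a small sketch (with made-up weights, bias, and data point, all assumptions) of the same linear score wᵀxi + b passed through both activations: the sigmoid used by logistic regression versus the perceptron's hard 0/1 step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(z):
    return 1.0 if z > 0 else 0.0

# Illustrative weights, bias, and one example point (assumptions)
w = np.array([1.5, -2.0])
b = 0.3
x_i = np.array([0.4, 0.1])

z = np.dot(w, x_i) + b                            # the shared linear score w^T x_i + b
print("LR output (probability): ", sigmoid(z))    # smooth value in (0, 1)
print("Perceptron output (class):", step(z))      # hard 0/1 decision
```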

Sigmoid function in the case of LR.


We train LR using SGD (as in the LR and optimization case).

The most important part of training a Neural Network is nothing but finding the weights on edges/vertices.

LR => a neuron (one whose activation function is the sigmoid, or logistic, function).

Perceptron:

The perceptron was created as an idea in 1957 by Frank Rosenblatt, and the idea is very similar to that of logistic regression, with some differences.

The only difference between LR and the perceptron is the activation function.

  • Σ WjXij + b, for j = 1 to d
  • f(xi) = 1 if wᵀxi + b > 0; 0 otherwise.

The idea, borrowed from the biological neuron, is that inputs arrive with different strengths (thick or thin connections, i.e., weights), and the neuron then either fires or does not. Firing here means producing a value; if the neuron does not fire, the output is "0".

The perceptron is a linear classifier. From a geometric standpoint, it also tries to find a line to classify the points. The only difference is that LR has the squashing (sigmoid) function, whereas the perceptron does not.

The perceptron is one of the oldest models ever used in machine learning.

But wait, how to train the Perceptron?

Of course, we have SGD.

We also know how to use SGD in linear and logistic regression. Training a perceptron likewise means computing the edge weights, and any of the gradient-descent approaches can be used.

We can often think of LR and the perceptron as simple single-neuron models. The only difference is that in LR we use the sigmoid, while the perceptron uses a function that simply decides whether the output is 0 or not (the sketch below trains such a unit).
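As a sketch of what such training can look like, here is the classic perceptron update rule on a toy, linearly separable dataset; the data, learning rate, and number of epochs are illustrative assumptions. SGD for logistic regression differs mainly in using the sigmoid and its gradient instead of the hard step.

```python
import numpy as np

# Toy, linearly separable data with 0/1 labels (illustrative assumptions)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, 0, 0])

w = np.zeros(2)   # edge weights to be learned
b = 0.0
lr = 0.1          # learning rate

for epoch in range(20):
    for x_i, y_i in zip(X, y):
        pred = 1.0 if np.dot(w, x_i) + b > 0 else 0.0   # step activation
        error = y_i - pred                               # 0 when correct, +1/-1 on a mistake
        w += lr * error * x_i                            # update weights only on mistakes
        b += lr * error

print("learned weights:", w, "bias:", b)
```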

Multi Layered Perceptron (MLP):

The perceptron represents a simplified version of a neuron and shares some similarities with logistic regression. In a single-neuron model or perceptron, there are input and output layers, akin to logistic regression.

However, a Multi-Layered Perceptron (MLP) introduces multiple layers, each akin to a perceptron or neuron. Each layer comprises a function, which may be the same or different.

Single Neuron or Perceptron.

In a single-neuron model (a perceptron), we have an input layer and an output layer, the same as in logistic regression.

Neural Network.


The neural network above is also called a "Multi-Layered Perceptron." There are multiple layers here, and each unit in them can be thought of as a perceptron. Every neuron (perceptron) applies a function, which may be the same or different across neurons.
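A minimal sketch of this idea (the layer sizes, random weights, and choice of ReLU and sigmoid activations are illustrative assumptions): each layer is just "weighted sums, then a function," and layers are stacked so that one layer's output feeds the next.

```python
import numpy as np

def layer(x, W, b, activation):
    """One MLP layer: a weighted sum for every neuron, followed by an activation."""
    return activation(W @ x + b)

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([0.2, -0.5, 1.0])        # 3 input features (illustrative)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(4, 3)), np.zeros(4)   # hidden layer: 4 neurons
W2, b2 = rng.normal(scale=0.5, size=(1, 4)), np.zeros(1)   # output layer: 1 neuron

h = layer(x, W1, b1, relu)            # the first layer of "perceptrons" uses ReLU...
y_hat = layer(h, W2, b2, sigmoid)     # ...while the output neuron uses the sigmoid
print(y_hat)
```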

Why do we need to care about MLPs?

Actually, there are multiple reasons, but let's discuss two arguments.

  1. Biological Inspiration: In neuroscience, the complex structures of vast numbers of interconnected neurons inspired the adoption of multi-layered structures, which proved more powerful than single perceptrons.

Imaginary diagram of Multiple connections in a network for neurons

That is how it was discovered that a network-like structure is more powerful and useful than a single perceptron, as the multi-layered structure brings more benefits.

2. Mathematical Argument:

MLPs allow solving complex problems by adding layers and applying simple mathematical functions. Even highly complex problems can be addressed through this layering technique.

Let's take the example of a regression problem, diverting slightly from the core discussion we have had so far.

Plot for the above complex problem.


D = {(xi, yi)}. Find a function F(xi) that connects xi with yi, where yi ∈ ℝ (the real numbers).

Simplifying the Complex problem through adding simple functions and layers.


We pass the inputs through a sequence of functions. Now we can represent a complex function such as f(x) = 2 sin(x²) + √(5x) by using Multi-Layered-Perceptron-like structures.

By using a Multi-layered structure, we can come up with complex mathematical functions to solve our problem.

The multi-layered perceptron structure gives enormous power to the models.

(The simplest model is a linear model. If the data is not linear, we use non-linear models such as RBF-SVM (Support Vector Machine), Random Forest (RF), or GBDT.)

Here we create non-linearity by applying a sequence of functions, which is very powerful. This is essentially the high-school concept of function composition: F(g(x)), G(f(x)).

yi = F(x) = 2 sin(x²) + √(5x)

We can write this F(x) in terms of simpler functions f1, f2, f3, f4, f5, and so on:

  • f2(x) = x² and f4(u) = sin(u), so f4(f2(x)) = sin(x²)
  • f5(x, 5) = 5x and f3(v) = √v, so f3(f5(x, 5)) = √(5x)
  • f1(a, u, v) = a·u + v combines the pieces
  • F(x) = f1(2, f4(f2(x)), f3(f5(x, 5))) (a small Python sketch of this decomposition follows)
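Here is that decomposition written out in Python. The roles assigned to f1–f5 follow my reading of the breakdown above rather than a definition given in the article, so treat the exact assignment as an assumption; the point is only that composing simple functions reproduces the complex one.

```python
import math

f2 = lambda x: x ** 2              # x^2
f4 = lambda u: math.sin(u)         # sin(.)
f5 = lambda x, c: c * x            # 5x (with c = 5)
f3 = lambda v: math.sqrt(v)        # square root
f1 = lambda a, u, v: a * u + v     # combine the pieces: a*u + v

def F(x):
    # F(x) = f1(2, f4(f2(x)), f3(f5(x, 5))) = 2*sin(x^2) + sqrt(5*x)
    return f1(2, f4(f2(x)), f3(f5(x, 5)))

x = 1.3   # illustrative input (keep 5*x >= 0 so the square root is defined)
assert abs(F(x) - (2 * math.sin(x ** 2) + math.sqrt(5 * x))) < 1e-12
print(F(x))
```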

The Multi-Layered Perceptron is essentially the same as the function-composition concept.

  • The concept of function composition in MLP is analogous to F(g(x)) and G(f(x)). By adding layers, complex mathematical functions can be represented as compositions of simpler ones. This approach is a fundamental concept in neural networks and deep learning.
  • However, a caveat is that while multi-layered structures result in powerful models, they are prone to overfitting. Balancing between underfitting and overfitting is a crucial task in deep learning.
  • In summary, Multi-Layered Perceptron (MLP) is a pivotal idea in deep learning and machine learning. It introduces the concept of function composition, allowing complex problems to be tackled by combining simpler functions.

A multi-layered structure with powerful models can overfit easily.

The Multi-Layered Perceptron is one of the most important and interesting ideas in deep learning, and probably in machine learning as a whole.

Notation:

D = {(xi, yi)}; yi ∈ ℝ -> a regression problem.

In the figures:

Xij => data point "i", feature "j".

Fij => layer "i", function (neuron) index "j".

k -> next layer.

Wij => weight from input "i" to neuron "j".

Multi Layered Neural Connection.


Here, we make every possible connection between the layers, for every possible pair of units in adjacent layers.

A network in which all possible connections between adjacent layers are present is called a "Fully Connected Neural Network" or a "Fully Connected Multi-Layered Perceptron," excluding connections within a layer. We can only connect from one stage (layer) to the next.

Now let us look at it from the perspective of the first neuron:

From the first layer; weights are given correspondingly.


W11, W12 & W13 are the weights from the first input to the first layer.

  • From Xi1 (input) -> f11, f12, f13 (edges) => W11, W12, W13 (weights).
  • From Xi2 -> f11, f12, f13 => W21, W22, W23
  • From Xi3 -> f11, f12, f13 => W31, W32, W33
  • From Xi4 -> f11, f12, f13 => W41, W42, W43


These weights can be represented as a 4 × 3 matrix: 4 rows (one per input) and 3 columns (one per neuron).
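As a sketch (the specific weight values and input values below are illustrative assumptions), this 4 × 3 weight matrix lets us compute the outputs of all three first-layer neurons in one matrix product:

```python
import numpy as np

# W[i, j] = weight from input i to neuron j of the first layer (4 inputs -> 3 neurons)
W = np.array([
    [0.20, -0.10, 0.50],   # W11, W12, W13
    [0.70,  0.30, -0.20],  # W21, W22, W23
    [-0.40, 0.60, 0.10],   # W31, W32, W33
    [0.05, -0.30, 0.80],   # W41, W42, W43
])

x_i = np.array([1.0, 0.5, -1.5, 2.0])     # one input point (Xi1 ... Xi4), illustrative values

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# x_i has shape (4,) and W has shape (4, 3), so x_i @ W gives one value per neuron
layer_output = sigmoid(x_i @ W)
print(layer_output)                        # outputs of f11, f12, f13
```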




Further deep learning concepts will be discussed in later articles.



References:

Main Image Src:

Shrestha, A. (2020, January 22). Machine Learning: A Primer to Laboratory Applications. The Connected Lab. https://www.thermofisher.com/blog/connectedlab/machine-learning-a-primer-to-laboratory-applications/

Biological Neural Connection Image:

Szczepaniak, M. O. & V. (2019, January 14). Neuroscience and Suzuki: Brain Development from Age Zero, and the Impact of Early Childhood Music Ed. WSSTE Suzuki. https://www.wsste.com/post/neuroscience-and-suzuki-brain-development-from-age-zero-and-the-impact-of-early-childhood-music-ed









