Foundations of Neural Nets

It has been a while since I did anything related to Machine Learning or Deep Learning, so I decided to revisit it. With that in mind, I enrolled in Coursera's Deep Learning Specialisation taught by none other than Andrew Ng, and decided to write about my learnings along the way. So here goes my first article in this space. It is primarily about the foundational concepts behind Deep Learning and Deep Neural Nets.

Even though I try to go into detail, writing up everything covered in the course is not possible, so this is an abridged version of my learnings. If you are interested in knowing more, please check out Andrew's course on Coursera.

In this article I'll be covering the following:

  1. Logistic Regression for Binary Classification
  2. Logistic Regression's Cost function
  3. Gradient Descent & its algorithm
  4. Gradient Descent for Logistic Regression
  5. Vectorising Logistic Regression for m examples and n features with O(k) time complexity
  6. Shallow Neural Network

Logistic Regression for Binary Classification

First of all, you might be wondering what logistic regression is doing in an article about neural nets. The thing is, the math behind neural nets and logistic regression is very similar, and it is much easier to understand logistic regression than neural nets. When we get to the sixth section of this article I'll explain how logistic regression connects with neural nets.

So, What is Logistic Regression and where do we use it?

Logistic Regression is a predictive model which determines the category of a given data sample. Let me give you an example: when trained on historical transaction data, logistic regression can determine whether a transaction is fraudulent or not.

What is Binary Classification?

Binary classification is when the predictive model has at most 2 possible outcomes (ex: fraudulent transaction or normal transaction). If there are more than 2 possible outcomes, it is called multi-class classification.

Logistic Regression's Formula

When given some input data, logistic regression can make a prediction. Don't you think that is awesome? So how does logistic regression do what it does? Let's dive in....

It uses a formula

y = sigmoid( w * x + b )        

Let's go through each element of the formula:

  • y is the output variable (ex: fraud or not, cat or dog etc)
  • x is the input variable (ex: time, amount, location etc of a transaction)
  • w & b are the variables that the model learns while training (technically these are called weights and biases)
  • sigmoid is a function that squashes any real number into the range (0, 1) (more on this below)

A little note on sigmoid

import numpy as np

def sigmoid(x):
    # squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))
[sigmoid graph: the S-shaped curve mapping any real number to a value between 0 and 1]

If you remember, `y = wx + b` is the formula of a straight line, so for any given value of x, y can take any continuous value (this type of prediction is called regression). To convert this regression output into a classification output (i.e. a continuous value into a category), we pass it through the sigmoid function, which squashes it into a probability between 0 and 1.
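For instance, here is a minimal sketch of how the raw linear output becomes a class prediction (the weight, bias and input values below are made up purely for illustration):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# hypothetical learned parameters and a single input feature
w, b = 2.0, -1.0
x = 1.5

z = w * x + b                        # raw linear output (regression value), here 2.0
p = sigmoid(z)                       # probability of the positive class, ~0.88
prediction = 1 if p >= 0.5 else 0    # threshold at 0.5 to get a class label
print(p, prediction)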

Logistic Regression's Cost Function

Given training data such as { (x1, y1), (x2, y2), ...., (xm, ym) }, the above formula is applied to each training example (xi, yi) and the resulting prediction y^ (y hat) is calculated.

We want y^ to be as close to yi as possible. In order to do that, we first calculate how far off the prediction is (the loss or cost) and then employ ways to reduce that loss (we use gradient descent & backpropagation to achieve this).

Loss Function: the function used to calculate the loss for a single example (xi, yi). We can employ many loss functions such as mean squared error, log loss, Huber loss etc., but whichever loss function we choose, it has to keep the training problem convex. For binary classification, we use the following (log) loss:

L(y^, yi) = - ( yi * log(y^) + (1 - yi) * log(1 - y^) )
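To get a feel for this loss, here is a small worked example (the predicted probabilities are made up for illustration). When the true label yi is 1, the loss reduces to -log(y^), so a confident correct prediction is barely penalised while a confident wrong one is penalised heavily:

import numpy as np

def log_loss(y, y_hat):
    # loss for a single example
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(log_loss(1, 0.9))   # ~0.105, good prediction -> small loss
print(log_loss(1, 0.1))   # ~2.303, bad prediction  -> large loss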

Cost Function: the average of the losses over all m training examples.

# J(w, b) = 1/m * (sum of all losses)

def cost_func(y, y_hat, m):
    cost = 0
    for i in range(m):
        cost += -(y[i] * np.log(y_hat[i]) + (1 - y[i]) * np.log(1 - y_hat[i]))
    return cost / m
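As a quick sanity check, here is how the same cost can be computed without the explicit loop using NumPy (the labels and predictions below are made up for illustration):

import numpy as np

y = np.array([1, 0, 1, 1])
y_hat = np.array([0.9, 0.2, 0.8, 0.6])
m = len(y)

# vectorised version of the loop above
cost = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m
print(cost)   # should match cost_func(y, y_hat, m)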

Gradient Descent

Now that we have the total cost over the training set (i.e. we know how far our current predictions are from the true outputs), we can employ various methods to reduce that error and make the model's predictions as accurate as possible.

[gradient descent graph: the cost function plotted against w]

This is a 2D representation of the gradient descent approach, but there is a 3D version of it as well; we'll get to it shortly.

The idea behind this visualisation is that we plot the cost function against the variable w and we find the value of w for which the cost function is minimal.

We can extend the same idea to 3D by plotting the cost function against both changing variables, w & b. In order for this to work, the plot should be convex and have only one global minimum. That is the reason we need to use a loss function that keeps the cost surface convex during training.

The algorithm: We take the partial derivatives of the cost with respect to w and b to get the direction in which the cost increases, then move w and b a small step in the opposite direction, repeating until we reach the global minimum. Calculating these partial derivatives and updating the weights and biases is called BACKPROPAGATION.

# alpha is the learning rate
# dw is the partial derivative of the cost function with respect to w
w = w - alpha * dw

# db is the partial derivative of the cost function with respect to b
b = b - alpha * db
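Where do dw and db come from? For the log loss combined with a sigmoid activation, the chain rule collapses to a very simple form. Here is a minimal per-example sketch (the feature values, label and parameters are made up for illustration); this is exactly what the dz, dw and db terms in the next section are computing:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# hypothetical single example with 2 features
x = np.array([0.5, -1.2])
y = 1
w = np.array([0.1, 0.3])
b = 0.0

z = np.dot(w, x) + b
a = sigmoid(z)      # prediction y^

# chain rule: dL/dz = a - y (this is the "dz" used below)
dz = a - y
dw = x * dz         # dL/dw for each feature
db = dz             # dL/db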

Gradient Descent for Logistic Regression

We've discussed the gradient descent algorithm for only one input feature (x) and one weight (w), but in real-world scenarios there will be many input features (i.e. each x will be an array of length n), so we will need a separate weight for each feature (w also becomes an array of length n). For such a case, one pass of the algorithm over m examples looks like the following.

J = 0
dw = [0] * n                 # n is the number of features
db = 0
z, a, dz = [0] * m, [0] * m, [0] * m

for i in range(m):

    z[i] = np.dot(w, x[i]) + b   # w and x[i] are arrays of length n, so z[i] is a scalar

    a[i] = sigmoid(z[i])         # prediction y^ for example i

    J += -(y[i] * np.log(a[i]) + (1 - y[i]) * np.log(1 - a[i]))   # loss for example i

    # derivatives
    dz[i] = a[i] - y[i]

    for j in range(n):
        dw[j] += x[i][j] * dz[i]

    db += dz[i]

# average over the m examples
J /= m
dw = [d / m for d in dw]
db /= m

Optimisation: if you look at the above example, it is not optimal. We have two explicit loops, one going over the m examples and one going over the n features, making the time complexity O(m * n). We can optimise this using vectorisation. A vectorised version of the above algorithm looks like this:

# random initialisation (only the parameters are random; A and dz are computed)
w = np.random.randn(n) * 0.01   # one weight per feature
b = 0.0
alpha = 0.01                    # learning rate

Y = np.array(y)                 # labels y[1], ...., y[m], shape (m,)
X = np.array(x).T               # features stacked column-wise, shape (n, m)

for i in range(1000):           # this is for 1000 epochs

    Z = np.dot(w, X) + b        # forward pass for all m examples at once
    A = sigmoid(Z)              # predictions, shape (m,)

    dz = A - Y                  # shape (m,)

    dw = 1 / m * np.dot(X, dz)  # shape (n,), replaces the loop over features
    db = 1 / m * np.sum(dz)

    w = w - alpha * dw
    b = b - alpha * db

We still have an explicit loop for training the model on the same data for a fixed number of epochs, but that loop depends on neither m nor n. So the explicit Python-level looping goes from O(m * n) to O(k), where k is a constant number of epochs, with the per-example and per-feature work handed off to NumPy's optimised vector operations, which is much better.
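Once training finishes, making a prediction for a new example reuses the same forward pass. A minimal sketch (x_new is a hypothetical unseen example with n features):

import numpy as np

def predict(w, b, x_new):
    # same formula as training: sigmoid(w . x + b), thresholded at 0.5
    p = 1 / (1 + np.exp(-(np.dot(w, x_new) + b)))
    return 1 if p >= 0.5 else 0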

Shallow Neural Networks

So why did we discuss logistic regression in an article about neural networks?

That is because the two are very closely related. In logistic regression, there is just one unit of computation (i.e. using the formula to calculate a regression output and applying sigmoid on it to get a classification output).


A neural network, on the other hand, has many such computation units (also called neurons). These neurons are arranged in layers, with each layer containing many neurons, and a neural network can have many such layers.

Each layer can have its own activation function (like sigmoid in our logistic regression example). The idea is that once the cost function is computed, the loss is backpropagated through each neuron, updating its weights and bias.
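To make the connection concrete, here is a minimal sketch of the forward pass of a shallow (one-hidden-layer) network in NumPy. The layer sizes and the use of tanh in the hidden layer are illustrative assumptions, not something fixed by this article:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

n_x, n_h, m = 3, 4, 5   # hypothetical: 3 input features, 4 hidden neurons, 5 examples

# each layer has its own weights and biases
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(1, n_h) * 0.01
b2 = np.zeros((1, 1))

X = np.random.randn(n_x, m)   # one column per example

# forward pass: each layer is a logistic-regression-like computation stacked together
Z1 = np.dot(W1, X) + b1
A1 = np.tanh(Z1)              # hidden layer activation (tanh is an assumed choice)
Z2 = np.dot(W2, A1) + b2
A2 = sigmoid(Z2)              # output layer gives probabilities, just like logistic regression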

Why do neural networks work?

Neural networks, particularly deep neural networks, work very well for problems such as image recognition, speech recognition and text analysis. Stay tuned, because I'll be writing another article detailing why they work and how to tune the hyper-parameters to develop highly accurate deep neural nets.
