Demystifying Neural Networks: Implementing a Simple NN Using Plain Python Functions. TensorFlow/Keras vs. Python Code?
NN Demystified



Steps:

  • We will implement the NN using TensorFlow/Keras first.
  • After that, we will implement the same NN using plain Python code and math.


  1. A neural network consists of layers, each containing neurons that collectively work to solve a specific task. Training iteratively adjusts the weights until the loss converges to a minimum (ideally the global minimum).
  2. To find good weights and biases with gradient descent, the network computes gradients of the loss function with respect to each weight and bias in reverse order, from the output layer back to the input layer. Each gradient points in the direction of steepest ascent in the multidimensional weight space, so moving in the opposite direction decreases the loss most quickly. By iteratively adjusting weights and biases against these gradients, the network gradually minimizes the loss function. This process continues until convergence, where the weights and biases ideally reach values that minimize the overall loss across the training dataset. Through this systematic backtracking via gradient descent, the neural network refines its parameters to solve complex problems efficiently. The generic update rule is shown below.
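In symbols, every parameter is nudged against its own gradient. A generic form of the update (with learning rate $\eta$ and loss $L$) is:

$$w \leftarrow w - \eta \frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}$$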

The problem we will be working on is binary classification: predicting whether a person will opt for insurance or not, based on their age and affordability.

Let's start...

  • Importing required packages

import numpy as np
import tensorflow as tf
from tensorflow import keras
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline        

  • Loading dataset...

df = pd.read_csv("insurance_data.csv")
df.head()        

The dataset has three columns: age (10-90), affordibility (0/1), and bought_insurance (0/1). (Note that the affordability column is spelled 'affordibility' in the CSV, so the code below uses that spelling.)

  • Split the dataset:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df[['age','affordibility']],df.bought_insurance,test_size=0.2, random_state=25)        

  • The age feature ranges from 10 to 90 while affordability is only 0/1, so let's scale the age feature to improve model performance.

X_train_scaled = X_train.copy()
X_train_scaled['age'] = X_train_scaled['age'] / 100

X_test_scaled = X_test.copy()
X_test_scaled['age'] = X_test_scaled['age'] / 100        

  • Network architecture: we will have a single neuron consisting of a weighted-sum function followed by a sigmoid activation, as written out in the formula below.
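In equation form, this single neuron computes:

$$\hat{y} = \sigma(w_1 \cdot \text{age} + w_2 \cdot \text{affordability} + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$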

  • Defining the NN and compiling...


model = keras.Sequential([
    keras.layers.Dense(1, input_shape=(2,), activation='sigmoid', kernel_initializer='ones', bias_initializer='zeros')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(X_train_scaled, y_train, epochs=5000)        

  • After training, let's evaluate the model, check its predictions, and then look at the final weights (kernel) and bias.

model.evaluate(X_test_scaled,y_test)
1/1 [==============================] - 0s 1ms/step - loss: 0.3550 - accuracy: 1.0000
[0.35497748851776123, 1.0]
# let's predict
model.predict(X_test_scaled)
array([[0.7054848 ],
       [0.35569546],
       [0.16827849],
       [0.47801173],
       [0.7260697 ],
       [0.8294984 ]], dtype=float32)
# let's compare predictions with actual values
y_test
2     1
10    0
21    0
11    0
14    1
9     1
Name: bought_insurance, dtype: int64
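# To turn the predicted probabilities into 0/1 class labels we can threshold at 0.5
# (a common convention; this line is an added illustration, not part of the original run)
(model.predict(X_test_scaled) > 0.5).astype(int).flatten()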

# Now get the values of the weights and bias from the model

coef, intercept = model.get_weights()
coef, intercept
(array([[5.060867 ],
        [1.4086502]], dtype=float32),
 array([-2.9137027], dtype=float32))        

  • Here the kernel values 5.060867 and 1.4086502 are the weights for age and affordability respectively, and -2.9137027 is the bias. In other words, w1 = 5.060867, w2 = 1.4086502, bias = -2.9137027. Plugging these into the neuron formula gives the decision function shown below.
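With the learned parameters substituted into the single-neuron formula, the model's prediction (on scaled age) is approximately:

$$\hat{y} = \sigma(5.06 \cdot \text{age}_{scaled} + 1.41 \cdot \text{affordability} - 2.91)$$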

Now let's mimic this using plain Python code and math.

  • Let's build the activation and loss functions. Here we will use the sigmoid activation and the log loss (binary cross-entropy).

  • The aggregated error for one epoch is computed with the log loss function, written out below.
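For reference, the two functions implemented in the next snippet are:

$$\sigma(z) = \frac{1}{1+e^{-z}}, \qquad \text{log loss} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\right]$$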

def sigmoid_numpy(X):
    return 1/(1+np.exp(-X))

def log_loss(y_true, y_predicted):
    # clip predictions away from exactly 0 and 1 so np.log never blows up
    epsilon = 1e-15
    y_predicted_new = [max(i, epsilon) for i in y_predicted]
    y_predicted_new = [min(i, 1-epsilon) for i in y_predicted_new]
    y_predicted_new = np.array(y_predicted_new)
    return -np.mean(y_true*np.log(y_predicted_new)+(1-y_true)*np.log(1-y_predicted_new))
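A quick sanity check of these helpers (the expected values below are worked out by hand, so treat them as approximate):

sigmoid_numpy(np.array([0]))                          # -> array([0.5])
log_loss(np.array([1, 0]), np.array([0.9, 0.1]))      # -> about 0.105, i.e. -log(0.9)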

  • Here we will use batch gradient descent, in which a single epoch processes the entire dataset. Now let's walk through the flow and then build the gradient descent function.

  • For each record we compute a predicted value and its error; doing this for all records and averaging the individual errors gives the aggregated loss for one epoch.
  • First we initialize the weights to 1 and the bias to 0. Then, for all age and affordability values, we compute the weighted sum followed by the sigmoid function to get the predicted values, and apply the log loss function to find the error. Next we derive new weights and bias from the existing ones using derivatives: we find the derivative of the error with respect to weight w1 (i.e., how much the error changes for a given change in w1), and do the same for w2 and the bias.

  • After the first epoch we have an updated set of weights and bias; from there we keep backtracking through gradient descent, reducing the error step by step and moving toward the global minimum.

  • The weights and bias are optimized using the derivative of the error with respect to each weight and the bias, combined with the learning rate. To learn more about derivatives you can visit mathisfun.

  • The derivative formulas used for calculating the new weights and bias are written out just before the code below.

  • Applying these formulas gives the next set of weights and bias, with which we proceed to the second epoch. This continues until the loss is reduced to a very small value.

  • After the new weights and bias are calculated, the second epoch applies them and calculates the error again.
  • The loop continues: in each epoch we apply the current weights and bias, calculate the weighted sum, apply the sigmoid function to get predictions, and compute the error with the log loss function; the per-record errors are aggregated by their mean, and a new set of weights and bias is calculated with the derivative formulas before moving on to the next epoch. This process continues until no epochs are left. The goal is to reach the global minimum, where the error (loss) is minimal, through this gradient descent approach. Finding the optimal number of epochs, however, is largely trial and error, which is where MLOps practices can help.

  • Eventually the loss stops decreasing, indicating that we have reached the global minimum.

  • Plotting loss against a parameter (say the bias) shows how we move from a high loss to a much lower one by repeatedly taking the derivative of the error with respect to that parameter and combining it with the learning rate and the current value. Because the loss curve is non-linear and its slope changes after every iteration, a single linear step cannot locate the minimum; recomputing the derivative at every step is what makes gradient descent work.
  • Now let's define our gradient descent method. The derivative formulas it uses are:
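For this single-neuron model with sigmoid activation and log loss, the gradients reduce to the standard logistic-regression form (which is what the code below computes):

$$\frac{\partial L}{\partial w_1} = \frac{1}{n}\sum_i \text{age}_i\,(\hat{y}_i - y_i), \qquad \frac{\partial L}{\partial w_2} = \frac{1}{n}\sum_i \text{affordability}_i\,(\hat{y}_i - y_i), \qquad \frac{\partial L}{\partial b} = \frac{1}{n}\sum_i (\hat{y}_i - y_i)$$

Each parameter is then updated as $w \leftarrow w - \eta\,\partial L/\partial w$ with learning rate $\eta = 0.5$.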

def gradient_descent(age, affordability, y_true, epochs, loss_threshold):
    # start from the same initial values the Keras model used: weights = 1, bias = 0
    w1 = w2 = 1
    bias = 0
    rate = 0.5  # learning rate
    n = len(age)
    for i in range(epochs):
        # forward pass: weighted sum followed by the sigmoid activation
        weighted_sum = w1 * age + w2 * affordability + bias
        y_predicted = sigmoid_numpy(weighted_sum)
        loss = log_loss(y_true, y_predicted)

        # gradients of the loss with respect to w1, w2 and the bias
        w1d = (1/n)*np.dot(np.transpose(age),(y_predicted-y_true))
        w2d = (1/n)*np.dot(np.transpose(affordability),(y_predicted-y_true))
        bias_d = np.mean(y_predicted-y_true)

        # update the parameters in the opposite direction of the gradients
        w1 = w1 - rate * w1d
        w2 = w2 - rate * w2d
        bias = bias - rate * bias_d

        print(f'Epoch:{i}, w1:{w1}, w2:{w2}, bias:{bias}, loss:{loss}')

        # stop early once the loss drops below the threshold
        if loss <= loss_threshold:
            break

    return w1, w2, bias

gradient_descent(X_train_scaled['age'], X_train_scaled['affordibility'], y_train, 1000, 0.4631)

  • After the loop finishes (it stops once the loss drops below the threshold), the printout shows the weights and bias we ended up with.

  • This shows that, in the end, we were able to arrive at roughly the same values of w1, w2, and bias as Keras did, using a plain Python implementation of gradient descent.

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def prediction_function(age, affordibility):
    # coef and intercept are the weights learned by the Keras model above;
    # take the scalar values out of the 1-element arrays
    weighted_sum = coef[0][0]*age + coef[1][0]*affordibility + intercept[0]
    return sigmoid(weighted_sum)

prediction_function(.47, 1)

We can use the above two functions for prediction. For example, prediction_function(.47, 1) returns roughly 0.705, which matches the first value of model.predict(X_test_scaled) shown earlier.


Thank You

