Neural Networks. What are they? How do you build them?
Shashank Ravishankar
Data Scientist | Capgemini North America | AI & Analytics Practice
One of the areas of machine learning that has come to prominence in solving real-world problems is Deep Learning, and neural networks are the algorithms that form its framework. Put as simply as possible, a neural network is an algorithm that loosely resembles the human brain. Just as the brain anticipates events based on past memories, a neural network takes input data, recognises patterns in that data, and predicts an output for new, similar data.
What is an Artificial Neural Network (ANN)?
An Artificial Neural Network is nothing more than a set of inputs connected to a set of outputs, with each individual connection carrying certain parameters, or weights. These parameters determine how strong the connection between an input and an output is.
Let us say a student is preparing for an exam. On the day before the exam, the student will generally prioritise the topics he or she feels are more important or relevant for the exam rather than studying the entire syllabus. The student is essentially assigning a certain weightage, or importance, to some topics over others. A neural network does precisely this: it assigns a weight to each input variable according to how much it influences the predicted output. The difference is that the student already knows which topics matter more than others, whereas a neural network identifies the influential input variables and assigns weights to them automatically.
Every Artificial Neural Network is made up of individual components called nodes, which are similar to the neurons in the human brain. A neural network is, quite literally, a network of these nodes, organised into three kinds of layers: an input layer, one or more hidden layers and an output layer. Each node performs a calculation and passes the result on to the nodes in the next layer.
What is an Activation Function?
We know that a neural network has to work out whether an input affects the predicted output. It does this with the help of certain mathematical functions known as Activation Functions. An activation function is attached to a node (or neuron) and decides whether that node's input is relevant to the model's prediction. If it is, the node is activated (similar to how neurons in our brains are "fired").
In a neural network, every neuron (or node) receives certain inputs, and each input has a weight attached to it. The neuron multiplies each input by its weight, sums the results, and passes the sum through an activation function to produce its output, which is then transferred to the next layer. The activation function is, in effect, the mathematical bridge between the input and the output of the neuron. There are different types of activation functions; we shall look into those at a later time. Now that we know how a neural network works, let us build one for ourselves in just a few minutes.
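Before we do, here is a tiny sketch of what a single neuron computes, written in plain NumPy purely for intuition (the relu and neuron functions below are hand-rolled helpers for illustration, not part of any library):
import numpy as np

# illustrative only: one neuron = weighted sum of inputs + bias, passed through an activation
def relu(z):
    return np.maximum(0, z)

def neuron(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias   # weighted sum of the inputs
    return relu(z)                       # pass the sum through the activation

# three hypothetical inputs and weights
print(neuron(np.array([0.5, 0.2, 0.1]), np.array([0.4, 0.7, 0.2]), 0.1))   # prints 0.46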
BUILD YOUR OWN NEURAL NET WITH PYTHON
Keras is an open-source, easy-to-use Python library for building deep learning models. It provides the building blocks needed to construct neural networks without much trouble, and its real beauty is that it lets us define and train a network in only a few lines of code.
Problem Description: We are given a dataset known as the Pima Indians Diabetes dataset, which consists of health records of members of the Pima Indian tribe in North America. We are required to predict the onset of diabetes for individuals based on the health data provided. The dataset consists of several medical predictor (independent) variables and one target (dependent) variable, Outcome. The independent variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
This is a binary classification problem: if the patient has diabetes, they are classified as 1; if not, as 0.
Download the dataset (saved as diabetes.csv) before continuing; it is freely available online.
Note: I have carried out this implementation in a Google Colab notebook. I suggest you do the same, as it runs in the browser and needs no local setup.
STEP 1: LOAD YOUR DATA
We will first import the libraries required to solve our problem.
from keras.models import Sequential
from keras.layers import Dense
Now we will import the Pandas library and load our dataset into a pandas DataFrame.
import pandas as pd
data = pd.read_csv('diabetes.csv')
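As a quick sanity check on what we just loaded (assuming the standard version of this dataset), you can print its shape and the first few rows:
# quick sanity check on the data we just loaded
print(data.shape)   # should be (768, 9): 768 patient records, 9 columns
print(data.head())  # first five rows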
Take a look inside the file. You will see that there are 8 input variables (X) and 1 output variable (Y).
Input Variables (X):
- Number of times pregnant
- Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- Diastolic blood pressure (mm Hg)
- Triceps skin fold thickness (mm)
- 2-Hour serum insulin (mu U/ml)
- Body mass index (weight in kg/(height in m)^2)
- Diabetes pedigree function
- Age (years)
Output Variable (Y):
- Class Variable (0 or 1)
Once the CSV file is loaded into memory, we can split the columns of data into input and output variables. The data is stored as a table with 9 columns: the first 8 columns are the inputs and the 9th column is the output. We split the dataset as follows:
X = data.iloc[:,:-1]
Y = data.iloc[:,-1]
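If you want to confirm the split worked (again assuming the standard 768-row version of the dataset), check the shapes:
# X holds the 8 input columns, Y the single Outcome column
print(X.shape)   # (768, 8)
print(Y.shape)   # (768,)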
STEP 2: DEFINE THE MODEL
Models in Keras are defined as a sequence of layers.
We create a Sequential model and add layers one at a time until we are happy with our network architecture.
The first thing to get right is to ensure the input layer has the right number of input features. This can be specified when creating the first layer with the input_dim argument and setting it to 8 for the 8 input variables.
In this example, we will use a fully-connected network structure with three layers.
Fully connected layers are defined using the Dense class. We can specify the number of neurons or nodes in the layer as the first argument, and specify the activation function using the activation argument.
We will use the rectified linear unit (ReLU) activation function on the first two layers and the sigmoid function in the output layer. Sigmoid squashes the output to a value between 0 and 1, which is easy to read as the probability of class 1.
- The model expects rows of data with 8 variables (the input_dim=8 argument)
- The first hidden layer has 12 nodes and uses the relu activation function.
- The second hidden layer has 8 nodes and uses the relu activation function.
- The output layer has one node and uses the sigmoid activation function.
# define the model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
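If you would like to confirm the architecture before moving on, Keras can print a layer-by-layer summary:
# optional: print a summary of the network we just defined
model.summary()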
STEP 3: COMPILE THE MODEL
Compiling the model uses the efficient numerical libraries under the covers (the so-called backend), such as TensorFlow. The backend automatically chooses the best way to represent the network for training and making predictions on your hardware, whether that is a CPU, a GPU or even a distributed setup.
When compiling, we must specify some additional properties required when training the network. Remember training a network means finding the best set of weights to map inputs to outputs in our dataset.
We must specify the loss function used to evaluate a set of weights, the optimizer used to search through different weights for the network, and any optional metrics we would like to collect and report during training.
In this case, we will use cross entropy as the loss argument. This loss is for binary classification problems and is defined in Keras as "binary_crossentropy".
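For intuition, this is roughly what binary cross-entropy computes for a single prediction, written out in plain NumPy (Keras computes the loss internally; this snippet is only illustrative and binary_crossentropy here is a hand-rolled helper, not the Keras function):
import numpy as np

# binary cross-entropy for a single prediction, written out for intuition only
def binary_crossentropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_crossentropy(1, 0.9))   # confident and correct -> small loss (~0.105)
print(binary_crossentropy(1, 0.1))   # confident and wrong   -> large loss (~2.303)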
We will define the optimizer as "adam", an efficient variant of stochastic gradient descent. It is a popular choice because it adapts its own learning rate and gives good results on a wide range of problems.
We will also use the metrics argument to obtain the accuracy of our model.
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
STEP 4: FIT THE MODEL
After the model has been compiled, it must be fit, or trained, on the data. Training happens over a number of epochs, and each epoch is split into batches.
An epoch is one complete pass over the rows of the training dataset. A batch is the set of samples the model works through before updating its weights. (Don't worry too much about these terms for now.)
The training process will run for a fixed number of iterations through the dataset, called epochs, which we must specify using the epochs argument. We must also set the number of dataset rows that are considered before the model weights are updated within each epoch, called the batch size, set using the batch_size argument.
For this problem, we will run for a small number of epochs (150) and use a relatively small batch size of 10. These values are generally chosen by trial and error.
# fit the model
model.fit(X, Y, epochs=150, batch_size=10)
STEP 5: EVALUATE THE MODEL
We have trained our neural network on the entire dataset and we can evaluate the performance of the network on the same dataset.
This will only give us an idea of how well we have modeled the dataset (i.e. train accuracy), but no idea of how well the model might perform on new data. We have done this for simplicity, but ideally you would separate your data into train and test datasets for training and evaluating your model; a quick sketch of this follows below.
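Here is a minimal sketch of such a split, assuming scikit-learn is available (it is installed by default in Google Colab); the 80/20 split and random_state are arbitrary choices:
from sklearn.model_selection import train_test_split

# hold out 20% of the rows as a test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# train on the training set only, then evaluate on the unseen test set
model.fit(X_train, Y_train, epochs=150, batch_size=10, verbose=0)
loss, test_accuracy = model.evaluate(X_test, Y_test)
print('Test accuracy: %.2f' % (test_accuracy * 100))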
You can evaluate your model on your training dataset using the evaluate() function on your model and pass it the same input and output used to train the model.
This will generate a prediction for each input and output pair and collect scores, including the average loss and any metrics you have configured, such as accuracy.
The evaluate() function will return a list with two values. The first will be the loss of the model on the dataset and the second will be the accuracy of the model on the dataset. We are only interested in reporting the accuracy, so we will ignore the loss value.
# evaluate the keras model
loss, accuracy = model.evaluate(X, Y)
print('Accuracy: %.2f' % (accuracy*100))
# make probability predictions with the model
predictions = model.predict(X)
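Since the sigmoid output is a probability between 0 and 1, one common convention (not shown in the snippet above) is to threshold it at 0.5 to obtain class labels:
# turn the predicted probabilities into class labels by thresholding at 0.5
classes = (predictions > 0.5).astype(int)
print(classes[:5])   # first five predicted labels (0 or 1)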
There you have it. From knowing nothing about Deep Learning and Neural Networks, you just built an Artificial Neural Network all by yourself.
SOURCE CODE:
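Below is the complete script, assembled from the snippets above; it assumes diabetes.csv is in your working directory.
# the full script, assembled from the snippets above
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense

# load the dataset and split it into inputs (X) and output (Y)
data = pd.read_csv('diabetes.csv')
X = data.iloc[:, :-1]
Y = data.iloc[:, -1]

# define the model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit the model
model.fit(X, Y, epochs=150, batch_size=10)

# evaluate the model
loss, accuracy = model.evaluate(X, Y)
print('Accuracy: %.2f' % (accuracy * 100))

# make probability predictions with the model
predictions = model.predict(X)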