Introduction to neural network architectures and implementation with Keras

This short article aims to briefly explain the use of the Keras Sequential and Functional APIs to build different neural network architectures, from the multilayer perceptron (MLP) to wide and deep models.

We will start with a quick reminder of basic notions about those models before we tackle their implementation with Keras.

Other aspects, like hyperparameter fine-tuning or the technical details of model training, are not covered here.


What’s a neural network? What’s an MLP?

A neural network is a machine learning model inspired by the way we think the human brain works. Neural networks are the core of deep learning models, allowing us to deal with very complex machine learning tasks, such as image classification and speech recognition.

Image 1: a representation of a multilayer perceptron, with z input features and 3 hidden layers containing respectively 5, 4 and 3 neurons.

A neural network is composed of an input layer, an output layer, and, in between, intermediary layers called hidden layers, composed of nodes called neurons. Some calculations are performed at each neuron: more precisely, an activation function is applied to a weighted sum of the previous layer’s values plus a bias term.

Neurons of a given layer communicate values to the ones in the next layer, up to the output value. This sequential architecture is called MLP (Multilayer Perceptron) and is heavily used today.

For those who are interested, the value of neuron i in hidden layer h is given by the following formula:

a_i^{(h)} = g\left( \sum_{k=1}^{p} w_{i,k}^{(h)} \, a_k^{(h-1)} + b_i^{(h)} \right)

Where:

  • g is called an activation function,
  • l is the number of neurons in layer h (so i ranges from 1 to l),
  • p is the number of neurons in the previous layer h-1,
  • w_{i,k}^{(h)} is the weight attributed to neuron k in layer h-1 for the calculation of neuron i in hidden layer h,
  • b_i^{(h)} is the bias term for neuron i in hidden layer h.

The calculations in layer h can be expressed using matrix multiplication as follows:

a^{(h)} = g\left( W^{(h)} a^{(h-1)} + b^{(h)} \right)

Where:

  • a^{(h)} is the vector of the l neuron values in layer h,
  • W^{(h)} is the l × p matrix of the weights w_{i,k}^{(h)},
  • b^{(h)} is the vector of the l bias terms.
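To make this concrete, here is a minimal NumPy sketch of one layer’s computation. It is an illustration only, assuming ReLU as the activation g and small hypothetical layer sizes:

import numpy as np

def dense_layer(a_prev, W, b, g=lambda z: np.maximum(z, 0)):
    # Computes a^(h) = g(W^(h) a^(h-1) + b^(h)) for one layer.
    # a_prev: activations of layer h-1, shape (p,)
    # W:      weight matrix of layer h, shape (l, p)
    # b:      bias vector of layer h, shape (l,)
    # g:      activation function (ReLU by default)
    return g(W @ a_prev + b)

# Example: the previous layer has p=4 neurons, layer h has l=5 neurons.
rng = np.random.default_rng(0)
a_prev = rng.normal(size=4)
W = rng.normal(size=(5, 4))
b = np.zeros(5)
print(dense_layer(a_prev, W, b))  # vector of the 5 neuron values of layer h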

More complex neural network architectures:

Although the structure presented in the previous section is very common, there exist more complex, non-sequential architectures that can be very useful in many cases.

Wide and deep neural networks:

This architecture connects the inputs (totally or partially) directly to the output layer. This allows the model to preserve and learn simple patterns directly from the inputs through a short path; without this shortcut, simple patterns can end up being distorted by the successive layers.

In other words, wide and deep models combine memorization and generalization: it is about finding a balance between shallow feature interactions and a deep understanding of patterns in the data.

Image 2: a representation of a wide & deep model (model A).

Wide and deep models can be applied to many tasks. One example is recommendation systems, where the wide component handles memorization of user-item interactions, while the deep component learns more complex patterns.

Multi-input neural networks:

For this model, the idea is to send a subset of the features through the wide path straight to the output layer, and another subset through the deep path (through the successive hidden layers). The two subsets can be either exclusive (no common features) or overlapping.

Image 3: a representation of a wide & deep model with multiple inputs (model B).

Multi-output neural networks:

Multi-output models have multiple outputs from different layers in the network. Their main characteristic is that they allow for shared learning across different tasks, which can lead to better efficiency and generalization power compared to building separate models for each task.

Image 4: a representation of a wide & deep model with multiple inputs and outputs (model C).

There are many cases where multi-output models are particularly useful. Here are some of them:

  • Locating and classifying the main object in a picture (regression and classification tasks in the same model),
  • Multitask classification: classifying the facial expression of a person in a picture and whether the person has a beard,
  • Adding auxiliary outputs as a regularization technique.


Implementing neural networks using Keras:

In this section, we will see how to implement the different models presented above with Keras. We will also see the difference between the Sequential and Functional Keras APIs.

A word about Keras

Keras is a high-level open-source deep learning API that allows the user to build, train, evaluate and execute all sorts of neural networks. Thanks to its flexibility and ease of use, it is very popular among data scientists. In terms of backend, Keras supports different deep learning libraries, such as TensorFlow.

In the next sections, we suppose that TensorFlow is installed in your active environment. Otherwise, you can install it using the following pip command:

$ python3 -m pip install -U tensorflow
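You can then check that the installation succeeded by printing the installed version:

$ python3 -c "import tensorflow as tf; print(tf.__version__)"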

Also, we suppose that we are dealing with a regression task for which training, validation and test sets are already available: (X_train, y_train), (X_valid, y_valid) and (X_test, y_test).

Keras sequential API

Let’s import Keras and build the multilayer perceptron model represented in Image 1:

import tensorflow as tf
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Input(shape=X_train.shape[1:]),
    keras.layers.Dense(5, activation="relu"),
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(3, activation="relu"),
    keras.layers.Dense(1),
])

In addition to the input layer and the single-neuron output layer, this model contains 3 hidden layers:

  • the first hidden layer contains 5 neurons with the ReLU activation function,
  • the second hidden layer contains 4 neurons with the ReLU activation function,
  • the third hidden layer contains 3 neurons with the ReLU activation function.

Here, no activation function has been applied to the output layer, which is a common choice for a regression task.
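Before compiling, we can check the resulting architecture by printing a summary of the model’s layers and parameter counts:

model.summary()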

Now that the model is created, we need to compile it. This step consists of defining the loss function and the optimizer we would like to use to train the model:

model.compile(loss="mean_squared_error", optimizer="sgd")
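Note that string identifiers like "sgd" use the optimizer’s default settings. If we want to control the learning rate, for instance, we can pass an optimizer object instead (the 0.01 value below is just an illustrative choice):

model.compile(loss="mean_squared_error",
              optimizer=keras.optimizers.SGD(learning_rate=0.01))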

Now, it’s time to train the model over 100 epochs:

model_training = model.fit(X_train, y_train, epochs=100, 
                           validation_data=(X_valid, y_valid))        
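The object returned by fit() stores the training history. As a quick sketch (assuming matplotlib is installed), we can plot the training and validation losses to check how the model converges:

import matplotlib.pyplot as plt

history = model_training.history  # dict containing "loss" and "val_loss" lists
plt.plot(history["loss"], label="training loss")
plt.plot(history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("mean squared error")
plt.legend()
plt.show()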

To evaluate the model on the test set, i.e. to calculate the mean squared error in our example, we call the evaluate() method:

mse_test_set = model.evaluate(X_test, y_test)        

To make predictions on new instances X_to_predict:

y_to_predict = model.predict(X_to_predict)        

The Sequential API is popular and quite easy to manipulate. However, in many cases, we need more complex structures, with multiple inputs and outputs, which the Sequential API does not support.

Keras functional API

The Keras Functional API is very useful to build complex neural network architectures. The specificity here is that each layer is called like a function on the output of the previous layer. Let’s see how we can implement the previous examples.

  • Wide & deep model (Model A):

Let’s build this model:

input_layer = keras.layers.Input(shape=X_train.shape[1:])
hidden_layer_1 = keras.layers.Dense(5, activation="relu")(input_layer)
hidden_layer_2 = keras.layers.Dense(4, activation="relu")(hidden_layer_1)
hidden_layer_3 = keras.layers.Dense(3, activation="relu")(hidden_layer_2)
concat = keras.layers.Concatenate()([input_layer, hidden_layer_3])
output_layer = keras.layers.Dense(1)(concat)
model = keras.Model(inputs=[input_layer], outputs=[output_layer])


Here are a few remarks at this stage:

  1. Each layer is passed as an argument to the next one,
  2. We use the Concatenate() layer to concatenate the inputs with the last hidden layer,
  3. The Model() constructor takes lists of inputs and outputs, not just single ones,
  4. The compilation, training, evaluation and prediction steps are similar to what we saw in the previous example, so there is no need to repeat them.
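Since functional models are harder to follow from code alone, Keras can also draw the computation graph for us (this requires the pydot and graphviz packages to be installed; the file name below is arbitrary):

keras.utils.plot_model(model, "model_a.png", show_shapes=True)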

  • Wide & deep model with multiple inputs (Model B):

For this model, there are two inputs: input_wide for the wide path and input_deep for the deep path. input_wide will be concatenated with hidden layer 3 before the output layer.
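Before defining the model, we need to build the two input matrices. Here is an illustrative sketch only: the column indices are hypothetical, and in this example the two subsets overlap on columns 2 to 4:

X_train_wide, X_train_deep = X_train[:, :5], X_train[:, 2:]
X_valid_wide, X_valid_deep = X_valid[:, :5], X_valid[:, 2:]
X_test_wide,  X_test_deep  = X_test[:, :5],  X_test[:, 2:]
input_wide_size, input_deep_size = X_train_wide.shape[1], X_train_deep.shape[1]

We can now define the two-input model: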

input_wide = keras.layers.Input(shape=[input_wide_size], name="wide")
input_deep = keras.layers.Input(shape=[input_deep_size], name="deep")
hidden_layer_1 = keras.layers.Dense(5, activation="relu")(input_deep)
hidden_layer_2 = keras.layers.Dense(4, activation="relu")(hidden_layer_1)
hidden_layer_3 = keras.layers.Dense(3, activation="relu")(hidden_layer_2)
concat = keras.layers.Concatenate()([input_wide, hidden_layer_3])
output_layer = keras.layers.Dense(1)(concat)
model = keras.Model(inputs=[input_wide, input_deep], outputs=[output_layer])

We need to compile the model:

model.compile(loss="mean_squared_error", optimizer="sgd")

Now, we can train the model, evaluate it and make a new prediction as follows:

model_training = model.fit((X_train_wide, X_train_deep), y_train,           
                 epochs=100, 
                 validation_data=((X_valid_wide, X_valid_deep), y_valid))        

Please note that variables ending with "wide" and "deep" correspond to the wide and deep inputs, respectively.

mse_test_set = model.evaluate((X_test_wide, X_test_deep), y_test)
y_to_predict = model.predict((X_to_predict_wide, X_to_predict_deep))        

  • Wide & deep model with multiple inputs and outputs (Model C):

input_wide = keras.layers.Input(shape=[input_wide_size], name="wide")
input_deep = keras.layers.Input(shape=[input_deep_size], name="deep")
hidden_layer_1 = keras.layers.Dense(5, activation="relu")(input_deep)
hidden_layer_2 = keras.layers.Dense(4, activation="relu")(hidden_layer_1)
hidden_layer_3 = keras.layers.Dense(3, activation="relu")(hidden_layer_2)
concat = keras.layers.Concatenate()([input_wide, hidden_layer_3])
output_1 = keras.layers.Dense(1, name="output_1")(concat)
output_2 = keras.layers.Dense(1, name="output_2")(hidden_layer_3)
model = keras.Model(inputs=[input_wide, input_deep], 
                    outputs=[output_1, output_2])

When compiling the model, we can specify a loss function for each output, as well as loss weights to give more importance to one output over the other. Here, we give 90% of the importance to output_1 and 10% to output_2:

model.compile(loss=["mean_squared_error", "mean_squared_error"], 
              optimizer="sgd", loss_weights=[0.9, 0.1])
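Since we gave the output layers explicit names, the losses and their weights can equivalently be passed as dictionaries keyed by output name, which is easier to read when the outputs use different losses:

model.compile(loss={"output_1": "mean_squared_error",
                    "output_2": "mean_squared_error"},
              optimizer="sgd",
              loss_weights={"output_1": 0.9, "output_2": 0.1})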

Then, we train the model (y_1 and y_2 variables correspond to output_1 and output_2), evaluate it and make new predictions as follows:

model_training = model.fit((X_train_wide, X_train_deep), 
                 (y_1_train, y_2_train), epochs=100, 
                 validation_data=((X_valid_wide, X_valid_deep), 
                                  (y_1_valid, y_2_valid)))
mse_test_set = model.evaluate((X_test_wide, X_test_deep), 
                              (y_1_test, y_2_test))
y_1_to_predict, y_2_to_predict = model.predict((X_to_predict_wide, 
                                                X_to_predict_deep))        
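Note that with two outputs, evaluate() typically returns a list containing the weighted total loss followed by the individual loss of each output, which we can unpack:

total_loss, mse_output_1, mse_output_2 = mse_test_set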

Conclusion

In this article, we saw different neural network structures, from the simplest one, which can be implemented using the Keras Sequential API, to more complex architectures that require the Keras Functional API. The latter offers more flexibility and control over the model structure.

There exist more sophisticated deep learning models based on neural networks, often incorporated into larger systems: for example, convolutional neural networks (for image recognition) and recurrent neural networks (for speech recognition, for instance).
