Introduction to neural networks architectures and implementation with Keras
Mohamed Amine MOUTACHAKKIR
Founder of Nextuarial | Actuarial Software Innovator | AI and Actuarial Consulting Expert
This short article aims to explain briefly the use of the Keras Sequential and Functional APIs to build different neural network architectures, from the multilayer perceptron (MLP) to wide and deep models.
We will start with a quick reminder of basic notions about those models before we tackle their implementation with Keras.
Other aspects, such as hyperparameter fine-tuning or the technical details of model training, are not covered here.
What’s a neural network? MLP?
A neural network is a machine learning model inspired by the way we think the human brain works. Neural networks are the core of deep learning models, allowing them to tackle very complex machine learning tasks, such as image classification and speech recognition.
A neural network is composed of an input layer, an output layer, and, in between, intermediate layers called hidden layers, each composed of nodes called neurons. A calculation is performed at each neuron: more precisely, an activation function is applied to a weighted sum of the previous layer’s values plus a bias term.
Neurons of a given layer pass their values to the neurons of the next layer, and so on up to the output value. This sequential architecture is called a Multilayer Perceptron (MLP) and is heavily used today.
For those who are interested, the value of neuron $i$ in hidden layer $h$ is given by the following formula:

$$a_i^{(h)} = \phi\Big(\sum_{j=1}^{n_{h-1}} w_{i,j}^{(h)}\, a_j^{(h-1)} + b_i^{(h)}\Big)$$

where:

- $\phi$ is the activation function;
- $n_{h-1}$ is the number of neurons in layer $h-1$;
- $w_{i,j}^{(h)}$ is the weight connecting neuron $j$ of layer $h-1$ to neuron $i$ of layer $h$;
- $a_j^{(h-1)}$ is the value of neuron $j$ in layer $h-1$;
- $b_i^{(h)}$ is the bias term of neuron $i$ in layer $h$.
The calculations in layer $h$ can be expressed using matrix multiplication as follows:

$$\mathbf{a}^{(h)} = \phi\big(\mathbf{W}^{(h)}\, \mathbf{a}^{(h-1)} + \mathbf{b}^{(h)}\big)$$

where:

- $\mathbf{W}^{(h)}$ is the weight matrix of layer $h$, with one row per neuron;
- $\mathbf{a}^{(h-1)}$ is the vector of values of layer $h-1$;
- $\mathbf{b}^{(h)}$ is the bias vector of layer $h$;
- $\phi$ is applied element-wise.
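To make this concrete, here is a minimal NumPy sketch of one layer’s forward pass, assuming a ReLU activation and illustrative layer sizes:

import numpy as np

def relu(z):
    # ReLU activation: element-wise max(0, z)
    return np.maximum(0, z)

def layer_forward(a_prev, W, b):
    # a_prev: values of the previous layer, shape (n_prev,)
    # W: weight matrix of the current layer, shape (n_curr, n_prev)
    # b: bias vector of the current layer, shape (n_curr,)
    return relu(W @ a_prev + b)

# Example: a layer of 3 neurons fed by a layer of 5 neurons
rng = np.random.default_rng(seed=42)
a_prev = rng.normal(size=5)
W = rng.normal(size=(3, 5))
b = np.zeros(3)
print(layer_forward(a_prev, W, b))  # 3 output values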
More complex neural network architectures:
Although the structure presented in the previous section is very common, there exist more complex, non-sequential architectures that can be very useful in many cases.
Wide and deep neural networks:
This architecture connects the inputs (totally or partially) directly to the output layer. This short path allows the model to preserve and learn simple patterns directly from the inputs; without it, simple patterns end up being distorted by the successive layers.
In other words, wide and deep models combine memorization and generalization: it is about finding a balance between shallow feature interactions and a deep understanding of the patterns in the data.
Wide and deep models can be applied to many tasks. One example is recommender systems, where the wide component memorizes user-item interactions, while the deep one learns more complex patterns.
Multi-input neural networks:
In this model, the idea is to send one subset of the features through the wide path, straight to the output layer, and another subset through the deep path (through the successive hidden layers). The two subsets can be either exclusive (no common features) or overlapping.
Multi-output neural networks:
Multi-output models produce multiple outputs from different layers of the network. Their main characteristic is that they allow shared learning across different tasks, which can lead to better efficiency and generalization than building a separate model for each task.
There are many cases where multi-output models are particularly useful, for example: multi-task learning, where several related targets are predicted from the same inputs; locating and classifying an object in a picture at the same time; or adding an auxiliary output as a regularization technique, to ensure the underlying part of the network learns something useful on its own.
Implementing neural networks using Keras:
In this section, we will see how to implement the different models presented above with Keras. We will also see the difference between the Keras Sequential and Functional APIs.
A word about Keras
Keras is a high-level, open-source deep learning API that allows the user to build, train, evaluate and execute all sorts of neural networks. Thanks to its flexibility and ease of use, it is very popular among data scientists. In terms of backend, Keras supports different deep learning libraries, such as TensorFlow.
In the next sections, we assume that TensorFlow is installed in your active environment. Otherwise, you can install it with the following pip command:
$ python3 -m pip install -U tensorflow
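To check that the installation succeeded, you can print the installed version from Python:

import tensorflow as tf
print(tf.__version__)  # should display the installed TensorFlow version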
Also, we assume that we are dealing with a regression task for which the training, validation and test sets are already available: (X_train, y_train), (X_valid, y_valid) and (X_test, y_test).
Keras sequential API
Let’s import Keras and build the multilayer perceptron model represented in image 1:
import tensorflow as tf
from tensorflow import keras
model = keras.models.Sequential([
    keras.layers.InputLayer(input_shape=X_train.shape[1:]),  # input layer
    keras.layers.Dense(5, activation="relu"),  # hidden layer 1
    keras.layers.Dense(4, activation="relu"),  # hidden layer 2
    keras.layers.Dense(3, activation="relu"),  # hidden layer 3
    keras.layers.Dense(1),  # single-neuron output layer
])
In addition to the input layer and the single-neuron output layer, this model contains 3 hidden layers, with 5, 4 and 3 neurons respectively, each using the ReLU activation function.
Note that no activation function has been applied to the output layer, which is typical for a regression task.
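Before compiling, it can be helpful to inspect the resulting architecture with summary(), which lists each layer along with its output shape and number of parameters:

# Display each layer, its output shape and its parameter count
model.summary()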
Once the model is created, we need to compile it. This step consists of defining the loss function and the optimizer we would like to use to train the model:
model.compile(loss="mean_squared_error", optimizer="sgd")
Now, it’s time to train the model over 100 epochs:
model_training = model.fit(X_train, y_train, epochs=100,
validation_data=(X_valid, y_valid))
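The fit() method returns a History object whose history attribute stores the loss per epoch. As a quick sketch, assuming matplotlib is installed, we can plot the learning curves:

import matplotlib.pyplot as plt

# Plot the training and validation loss recorded at each epoch
plt.plot(model_training.history["loss"], label="training loss")
plt.plot(model_training.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("mean squared error")
plt.legend()
plt.show()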
To evaluate the model on the test set, i.e., to compute the mean squared error in our example, we call the evaluate() method:
mse_test_set = model.evaluate(X_test, y_test)
To make predictions on new instances X_to_predict:
y_to_predict = model.predict(X_to_predict)
The Sequential API is popular and quite easy to use. However, in many cases, we need more complex structures, with multiple inputs and outputs, which the Sequential API does not support.
Keras functional API
The Keras Functional API is very useful for building complex neural network architectures. Its specificity is that each layer is called like a function on the output of the previous layer. Let’s see how we can implement the previous examples.
First, let’s build the wide and deep model presented earlier, where all the inputs follow both the wide and the deep paths:
input_layer = keras.layers.Input(shape=X_train.shape[1:])
hidden_layer_1 = keras.layers.Dense(5, activation="relu")(input_layer)
hidden_layer_2 = keras.layers.Dense(4, activation="relu")(hidden_layer_1)
hidden_layer_3 = keras.layers.Dense(3, activation="relu")(hidden_layer_2)
concat = keras.layers.Concatenate()([input_layer, hidden_layer_3])
output_layer = keras.layers.Dense(1)(concat)
model = keras.Model(inputs=[input_layer], outputs=[output_layer])
Here are a few remarks at this stage: each Dense layer is called like a function on the output of the previous layer, the Concatenate layer joins the raw inputs with the output of the last hidden layer, and keras.Model ties everything together by specifying the model’s inputs and outputs. Now, to build a multi-input model that sends only a subset of the features through the wide path, we define two separate Input layers:
input_wide = keras.layers.Input(shape=[input_wide_size], name="wide")
input_deep = keras.layers.Input(shape=[input_deep_size], name="deep")
hidden_layer_1 = keras.layers.Dense(5, activation="relu")(input_deep)
hidden_layer_2 = keras.layers.Dense(4, activation="relu")(hidden_layer_1)
hidden_layer_3 = keras.layers.Dense(3, activation="relu")(hidden_layer_2)
concat = keras.layers.Concatenate()([input_wide, hidden_layer_3])
output_layer = keras.layers.Dense(1)(concat)
model = keras.Model(inputs=[input_wide, input_deep], outputs=[output_layer])
We need to compile the model:
model.compile(loss="mean_squared_error", optimizer="sgd")
Now, we can train the model, evaluate it and make a new prediction as follows:
model_training = model.fit((X_train_wide, X_train_deep), y_train,
epochs=100,
validation_data=((X_valid_wide, X_valid_deep), y_valid))
Please note that variables ending with "wide" and "deep" correspond to the wide and deep inputs, respectively.
mse_test_set = model.evaluate((X_test_wide, X_test_deep), y_test)
y_to_predict = model.predict((X_to_predict_wide, X_to_predict_deep))
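For illustration, the wide and deep inputs could simply be slices of the original feature matrix. Here is a hypothetical split, assuming X_train is a NumPy array, where the first 5 features follow the wide path and the features from index 2 onward follow the deep path (the two subsets overlap):

# Hypothetical overlapping split of the features
X_train_wide, X_train_deep = X_train[:, :5], X_train[:, 2:]
X_valid_wide, X_valid_deep = X_valid[:, :5], X_valid[:, 2:]
X_test_wide, X_test_deep = X_test[:, :5], X_test[:, 2:]

input_wide_size = X_train_wide.shape[1]
input_deep_size = X_train_deep.shape[1]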
Finally, let’s build a multi-output model, with a main output fed by the concatenation layer and an auxiliary output fed directly by the last hidden layer:

input_wide = keras.layers.Input(shape=[input_wide_size], name="wide")
input_deep = keras.layers.Input(shape=[input_deep_size], name="deep")
hidden_layer_1 = keras.layers.Dense(5, activation="relu")(input_deep)
hidden_layer_2 = keras.layers.Dense(4, activation="relu")(hidden_layer_1)
hidden_layer_3 = keras.layers.Dense(3, activation="relu")(hidden_layer_2)
concat = keras.layers.Concatenate()([input_wide, hidden_layer_3])
output_1 = keras.layers.Dense(1, name="output_1")(concat)
output_2 = keras.layers.Dense(1, name="output_2")(hidden_layer_3)
model = keras.Model(inputs=[input_wide, input_deep],
                    outputs=[output_1, output_2])
When compiling the model, we can specify a loss function for each output, as well as loss weights to give more importance to one output over the other. Here, we give 90% of the importance to output_1 and 10% to output_2:
model.compile(loss=["mean_squared_error", "mean_squared_error"],
              optimizer="sgd", loss_weights=[0.9, 0.1])
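Since the outputs are named, we can equivalently pass the losses and the weights as dictionaries keyed by output name, which is more readable when a model has many outputs:

model.compile(loss={"output_1": "mean_squared_error",
                    "output_2": "mean_squared_error"},
              optimizer="sgd",
              loss_weights={"output_1": 0.9, "output_2": 0.1})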
Then, we train the model (the y_1 and y_2 variables correspond to output_1 and output_2), evaluate it, and make new predictions as follows:
model_training = model.fit((X_train_wide, X_train_deep),
(y_1_train, y_2_train), epochs=100,
validation_data=((X_valid_wide, X_valid_deep),
(y_1_valid, y_2_valid)))
# With multiple outputs, evaluate() returns the total loss
# followed by the individual loss of each output
total_loss, mse_output_1, mse_output_2 = model.evaluate(
    (X_test_wide, X_test_deep), (y_1_test, y_2_test))
y_1_to_predict, y_2_to_predict = model.predict((X_to_predict_wide,
X_to_predict_deep))
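Finally, a trained model can be saved to disk and reloaded later. Here is a minimal sketch, with an arbitrary file name (recent Keras versions also support the native .keras format):

# Save the model architecture, weights and optimizer state
model.save("wide_and_deep_model.h5")

# Reload it later for further training or inference
loaded_model = keras.models.load_model("wide_and_deep_model.h5")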
Conclusion
In this article, we saw different neural network structures, from the simplest one, which can be implemented using the Keras Sequential API, to more complex architectures that require the Keras Functional API. The latter offers more flexibility and control over the model structure.
There also exist more sophisticated deep learning models based on neural networks, such as convolutional neural networks (for image recognition) and recurrent neural networks (for speech recognition, for instance).