Deep Learning
Neural Network Building Blocks: Programming View
#machinelearning #machinelearningengineer #machinelearningalgorithms #deeplearningai #deeplearning #artificialintelligence
When implementing deep learning, we constantly rely on a handful of simple mathematical concepts: tensors, tensor operations, differentiation, gradient descent, and so on. In this post I am trying to share my understanding of these notions without getting overly technical. To be precise, rather than presenting my view in mathematical terms, I am going to focus on executable code, which I think provides an unambiguous description of a mathematical operation.
Let me start with a concrete example of a neural network that uses the Python library Keras to learn to classify handwritten digits. The problem we’re trying to solve here is to classify grayscale images of handwritten digits (28 × 28 pixels) into their 10 categories (0 through 9). I am going to use the MNIST dataset, a classic in the machine learning community: a set of 60,000 training images plus 10,000 test images, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s.
In machine learning, a category in a classification problem is called a class. Data points are called samples. The class associated with a specific sample is called a label.
Let us walk through it step by step.
Step 1: Code sample to load the MNIST dataset
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images and train_labels form the training set, the data that the model will learn from. The model will then be tested on the test set, test_images and test_labels. The images are encoded as NumPy arrays, and the labels are an array of digits, ranging from 0 to 9. The images and labels have a one-to-one correspondence.
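As a quick sanity check, we can inspect what load_data() returns; the shapes and data types below are exactly what this dataset contains:
>>> train_images.shape
(60000, 28, 28)
>>> train_labels.shape
(60000,)
>>> train_images.dtype
dtype('uint8')
>>> test_images.shape
(10000, 28, 28)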
Step 2: The Network Architecture
Code sample:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])
The core building block of neural networks is the layer. We can think of a layer as a filter for data: some data goes in, and it comes out in a more useful form. Specifically, layers extract representations out of the data fed into them, hopefully representations that are more meaningful for the problem at hand. Most of deep learning consists of chaining together simple layers that will implement a form of progressive data distillation. A deep learning model is like a sieve for data processing, made of a succession of increasingly refined data filters: the layers.
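To make this concrete, a Dense layer with a relu activation computes output = relu(dot(input, W) + b), where W and b are weights learned during training. Here is a minimal NumPy sketch (naive_dense is a hypothetical name used only for illustration, not a Keras API):
import numpy as np

def naive_dense(inputs, W, b):
    # A learned affine transform followed by an element-wise relu,
    # i.e. what layers.Dense(512, activation="relu") computes.
    # For the first layer above: inputs (batch, 784), W (784, 512), b (512,)
    return np.maximum(0.0, inputs.dot(W) + b)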
To make the model ready for training, we need to pick three more things as part of the compilation step:
An optimizer—The mechanism through which the model will update itself based on the training data it sees, so as to improve its performance.
A loss function—How the model will measure its performance on the training data, and thus how it will steer itself in the right direction, minimizing the difference between the expected and the actual results.
Metrics to monitor during training and testing—Here, we will only care about accuracy (the fraction of the images that were correctly classified).
Step 3: Compilation Step
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
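A note on the loss choice: the "sparse" variant of categorical crossentropy expects integer labels, which is exactly what train_labels contains. If we one-hot encoded the labels ourselves, the equivalent setup would use plain categorical_crossentropy; a sketch:
from tensorflow.keras.utils import to_categorical

one_hot_labels = to_categorical(train_labels)   # shape (60000, 10)
# model.compile(optimizer="rmsprop",
#               loss="categorical_crossentropy",
#               metrics=["accuracy"])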
Before training, we’ll preprocess the data by reshaping it into the shape the model expects and scaling it so that all values are in the [0, 1] interval. Previously, our training images were stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We’ll transform it into a float32 array of shape (60000, 28 * 28) with values between 0 and 1.
Step 4: Preprocessing the image data
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255
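We can confirm the transformation did what we expect:
>>> train_images.shape
(60000, 784)
>>> train_images.dtype
dtype('float32')
>>> train_images.min(), train_images.max()
(0.0, 1.0)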
Now we’re ready to train the model, which in Keras is done via a call to the model’s fit() method: we fit the model to its training data.
Step 5: Fitting the Model
>>> model.fit(train_images, train_labels, epochs=5, batch_size=128)
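With 60,000 training images and a batch size of 128, each epoch performs 469 gradient updates (60000 / 128, rounded up). Note that fit() also returns a History object whose history dict records the loss and metrics after each epoch; a sketch of how to inspect it:
>>> history = model.fit(train_images, train_labels, epochs=5, batch_size=128)
>>> history.history["accuracy"]   # one training-accuracy value per epoch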
Step 6: Making predictions on test data
>>> test_digits = test_images[0:10]
>>> predictions = model.predict(test_digits)
>>> predictions[0]
array([1.0726176e-10, 1.6918376e-10, 6.1314843e-08, 8.4106023e-06,
       2.9967067e-11, 3.0331331e-09, 8.3651971e-14, 9.9999106e-01,
       2.6657624e-08, 3.8127661e-07], dtype=float32)
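Each prediction is an array of 10 probability scores (they sum to 1), one per digit class. The highest score above, about 0.99999, sits at index 7, so the model is confident this first test digit is a 7, which matches its true label:
>>> predictions[0].argmax()
7
>>> predictions[0][7]
0.99999106
>>> test_labels[0]
7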
Step 7: Evaluating the model
>>> test_loss, test_acc = model.evaluate(test_images, test_labels)
>>> print(f"test_acc: {test_acc}")
test_acc: 0.9785
The test-set accuracy turns out to be 97.8%, which is quite a bit lower than the training-set accuracy (98.9%). This gap between training accuracy and test accuracy is an example of overfitting: the fact that machine learning models tend to perform worse on new data than on their training data.
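One common way to keep an eye on overfitting while training is to hold out part of the training data as a validation set and watch the gap between training and validation metrics. A minimal sketch using Keras’s built-in validation_split argument (not part of the original example above):
model.fit(train_images, train_labels,
          epochs=5, batch_size=128,
          validation_split=0.2)   # reports val_loss / val_accuracy each epoch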
....... to be continued .........................