MNIST Handwritten Digits Classification Using a Convolutional Neural Network
Asad Kazmi
AI Educator | Simplifying AI | I Help You Win with AI | AI won’t steal your job, but someone who masters it might. Master AI. Stay Unstoppable.
The MNIST handwritten digits classification problem involves recognizing digits (0–9) from grayscale images. The MNIST dataset is a benchmark dataset for image classification tasks, particularly useful for testing deep learning algorithms. It contains 60,000 training images and 10,000 test images, each a 28x28 pixel grayscale image.
The MNIST dataset serves as a "Hello World" example for machine learning and deep learning practitioners. Solving it helps in understanding fundamental concepts of computer vision, CNN architectures, and model evaluation.
One of the most foundational steps in building effective machine learning/deep learning models for image classification is data preprocessing: transforming raw data into a format a machine can learn from.
When I first encountered a computer vision (image classification) task, it felt overwhelming. How do you ensure your model is ready to learn from thousands of images? How do you avoid overfitting, and how do you make sure your model generalizes well?
But I realized the process breaks down into a few key stages: data loading, preprocessing, and augmentation. Once the data is ready, we design and train a deep learning model and evaluate its performance on unseen test data.
In this article, we'll walk through a detailed example using the MNIST dataset to showcase how each of these steps contributes to creating a robust deep learning model.
Through trial, error, and learning, I settled on the following steps, from preparing the MNIST dataset to training and evaluating the model:
1. Data Loading and Preprocessing
The first step is to load the dataset and prepare it for use in a machine learning model. We start by loading the dataset using the mnist.load_data() function from Keras. This function automatically splits the dataset into training and testing sets.
from tensorflow.keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()
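To sanity-check what was loaded, a quick inspection of the shapes and pixel value range (a minimal sketch using the variables above) looks like this:
print(X_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(X_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)
print(X_train.dtype, X_train.min(), X_train.max())  # uint8 0 255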
Normalization
Raw pixel values in the MNIST dataset range from 0 to 255. To optimize training and make the model converge faster, we normalize these pixel values by dividing them by 255, converting the values to a range of [0, 1].
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
Reshaping and Adding Channel Dimension
Since the images are grayscale, we need to reshape the data to include a channel dimension. The model expects the input to be in the shape (28, 28, 1), where 28x28 represents the pixel dimensions, and 1 denotes the grayscale channel.
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)
One-Hot Encoding
Next, the target labels (0-9) are converted into one-hot encoded vectors. For instance, the label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]. This encoding is necessary for multi-class classification with the softmax activation function in the final layer.
from tensorflow.keras.utils import to_categorical
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
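As a quick sanity check, encoding a single label shows the expected one-hot vector (using the to_categorical import above):
print(to_categorical(3, 10))  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]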
2. Splitting Data
Although MNIST already comes with separate training and test sets, we also carve a validation set out of the training data. This lets us monitor performance during training and helps prevent overfitting.
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
3. Data Augmentation
To further improve the model's ability to generalize, we apply data augmentation. This technique artificially increases the size of the training dataset by applying random transformations such as rotations, zooms, and shifts. (Flips are deliberately avoided: a mirrored digit is usually no longer a valid example of the same class.) Keras's ImageDataGenerator class makes this process straightforward.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation
datagen = ImageDataGenerator(
    rotation_range=10,       # randomly rotate images by up to 10 degrees
    zoom_range=0.1,          # randomly zoom in/out by up to 10%
    width_shift_range=0.1,   # randomly shift images horizontally by up to 10%
    height_shift_range=0.1   # randomly shift images vertically by up to 10%
)
datagen.fit(X_train)  # Fit the generator on the training data only
Calling datagen.fit(X_train) fits the generator on the training data only; the random transformations themselves are applied on the fly when datagen.flow(X_train, y_train) is called during training. The validation and test sets are never passed through the generator, so they remain an unbiased measure of generalization.
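If you want to see what the augmented digits look like, here is a minimal sketch using matplotlib (assuming the datagen, X_train, and y_train defined above):
import matplotlib.pyplot as plt

# Take one augmented batch and plot the first nine images
images, labels = next(datagen.flow(X_train, y_train, batch_size=9))
fig, axes = plt.subplots(3, 3, figsize=(4, 4))
for img, ax in zip(images, axes.ravel()):
    ax.imshow(img.squeeze(), cmap='gray')
    ax.axis('off')
plt.show()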
4. Model Architecture
With our data preprocessed, it’s time to design the neural network. We’ll use a Convolutional Neural Network (CNN), which is particularly effective for image data.
Convolutional Layers
Convolutional layers apply filters to extract local features from the image. In our model, we use two convolutional layers with ReLU activation to capture spatial patterns.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization, Input
# Define the model
model = Sequential([
    Input(shape=(28, 28, 1)),               # 28x28 grayscale images, 1 channel
    Conv2D(32, (3, 3), activation='relu'),  # first convolutional block
    BatchNormalization(),
    MaxPooling2D(),
    Conv2D(64, (3, 3), activation='relu'),  # second convolutional block
    BatchNormalization(),
    MaxPooling2D(),
    Flatten(),                              # flatten feature maps for the dense layers
    Dense(128, activation='relu'),
    Dropout(0.2),                           # dropout for regularization
    Dense(10, activation='softmax')         # output probabilities for the 10 digits
])
Key Layers Explained:
- Conv2D: applies learnable filters that extract local spatial features such as edges and strokes.
- BatchNormalization: normalizes activations to stabilize and speed up training.
- MaxPooling2D: downsamples feature maps, reducing computation and adding tolerance to small shifts.
- Flatten: converts the 2D feature maps into a 1D vector for the dense layers.
- Dense: fully connected layers that combine the extracted features; the final layer uses softmax to output a probability for each of the 10 digits.
- Dropout: randomly drops 20% of the units during training to reduce overfitting.
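To confirm the architecture is wired up as expected, you can print a summary showing each layer's output shape and parameter count:
model.summary()  # prints layer-by-layer output shapes and trainable parameter counts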
5. Compilation and Callbacks
After defining the model architecture, we need to compile it by specifying the loss function, optimizer, and evaluation metrics.
Loss Function
We use categorical_crossentropy for multi-class classification, as it measures the difference between the predicted probabilities and the true class labels.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
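To build intuition for what this loss measures, here is a small NumPy sketch (with illustrative values, not outputs from the actual model) computing categorical cross-entropy for a single example:
import numpy as np

y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])  # one-hot label for the digit 3
y_pred = np.array([0.01, 0.01, 0.02, 0.90, 0.01,
                   0.01, 0.01, 0.01, 0.01, 0.01])   # predicted softmax probabilities
loss = -np.sum(y_true * np.log(y_pred))             # cross-entropy = -log(0.90) ≈ 0.105
print(loss)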
Callbacks
Callbacks help improve training efficiency and prevent overfitting:
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, TensorBoard
callbacks = [
    EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True),    # stop when val_loss stops improving
    ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=2, min_lr=1e-6),  # lower the learning rate on plateaus
    TensorBoard(log_dir='./logs')                                                # log metrics for visualization
]
6. Model Training
With our model compiled and callbacks set, we train the model using the augmented data generator. We specify a batch size of 32 and train the model for up to 10 epochs, although early stopping will stop training if performance plateaus.
history = model.fit(datagen.flow(X_train, y_train, batch_size=32),
                    epochs=10,
                    validation_data=(X_val, y_val),
                    callbacks=callbacks)
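The returned history object records the metrics for each epoch; a minimal sketch for plotting the learning curves (assuming matplotlib is available) is:
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()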
7. Evaluation
Once training is complete, we evaluate the model on the test data to assess its final performance.
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}, Test Loss: {test_loss:.4f}")
Putting it all together, this pipeline builds a CNN that classifies handwritten digits with high accuracy, combining careful preprocessing, data augmentation, and training callbacks for robustness.
Stay tuned for an in-depth look at the math behind neural networks. Don’t miss the next newsletter for more insights!