Autoencoder for Data Compression, Denoising, and Anomaly Detection
Asad Kazmi
Autoencoders are specialized neural networks that learn efficient representations of data, allowing them to compress, reconstruct, and interpret it. This article explores how autoencoders can be applied to data compression, denoising, and anomaly detection using the MNIST dataset.
Understanding Autoencoders
An autoencoder is a type of neural network with two main parts: an encoder and a decoder. Together they learn to encode each input into a compact representation and then decode that representation back into the original input.
The encoder takes the input (e.g., an image, sound, or data point) and maps it to a lower-dimensional space called the latent space or embedding.
The decoder takes the encoded data and attempts to reconstruct the original item as accurately as possible.
Training the Autoencoder: Minimizing the Reconstruction Error
The goal of training an autoencoder is to minimize the difference between the original input and its reconstructed output. The loss function used during training typically focuses on reducing this reconstruction error, ensuring that the decoded output is as close as possible to the original.
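In practice, the reconstruction error is typically the mean squared error (MSE) between the input and its reconstruction. As a minimal illustration (the helper function below is purely illustrative and is not part of the model code that follows):
import numpy as np
# Mean squared reconstruction error: the average squared pixel-wise
# difference between the original input x and its reconstruction x_hat.
def reconstruction_error(x, x_hat):
    return np.mean((x - x_hat) ** 2)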
Applications Explored
Once the autoencoder is trained, it can be used for a variety of applications, such as data compression and reconstruction, denoising corrupted data, and anomaly detection.
Let’s dive into these tasks with practical code snippets and results.
Task 1: Data Compression and Reconstruction
Step 1: Preprocessing the MNIST Data
We begin by loading the MNIST dataset, which consists of 28×28 grayscale images, and preparing it for training.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
# Load MNIST data from CSV
data = pd.read_csv('mnist_data.csv')
y = data.iloc[:, 0].astype(int).values
X = data.iloc[:, 1:].astype(np.float32).values / 255.0 # Normalize pixel values to [0, 1]
# Reshape for visualization
X_images = X.reshape(-1, 28, 28)
# Split data
X_train, X_test, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training Data Shape: {X_train.shape}, Test Data Shape: {X_test.shape}")
Step 2: Designing the Autoencoder
The architecture reduces the dimensionality to a 32-dimensional latent space and reconstructs the original 784-dimensional data.
from tensorflow.keras import layers, models
# Define encoder and decoder
input_dim = 784 # Flattened size of the image
encoder_input = layers.Input(shape=(input_dim,))
x = layers.Dense(512, activation='relu')(encoder_input)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dense(128, activation='relu')(x)
latent_space = layers.Dense(32, activation='relu')(x)
# Decoder
x = layers.Dense(128, activation='relu')(latent_space)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dense(512, activation='relu')(x)
decoder_output = layers.Dense(input_dim, activation='sigmoid')(x)
# Autoencoder Model
autoencoder = models.Model(encoder_input, decoder_output)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.summary()
Step 3: Training and Evaluation
# Train the autoencoder
history = autoencoder.fit(X_train, X_train, validation_data=(X_test, X_test), epochs=50, batch_size=128, verbose=2)
# Visualize loss
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.legend()
plt.title("Loss Over Epochs")
plt.show()
Reconstruction results validate the model’s capability to compress and reconstruct images.
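One way to inspect both the compressed codes and the reconstructions is to build a standalone encoder from the layers defined in Step 2. The sketch below assumes encoder_input, latent_space, and the trained autoencoder are still in scope:
# Standalone encoder: maps each 784-dimensional image to its 32-dimensional code
encoder = models.Model(encoder_input, latent_space)
codes = encoder.predict(X_test)
print(f"Compressed representation shape: {codes.shape}")
# Compare a few originals with their reconstructions
reconstructed = autoencoder.predict(X_test)
n = 5
plt.figure(figsize=(10, 4))
for i in range(n):
    plt.subplot(2, n, i + 1)
    plt.imshow(X_test[i].reshape(28, 28), cmap='gray')
    plt.title("Original")
    plt.axis('off')
    plt.subplot(2, n, n + i + 1)
    plt.imshow(reconstructed[i].reshape(28, 28), cmap='gray')
    plt.title("Reconstructed")
    plt.axis('off')
plt.show()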
Task 2: Denoising Corrupted Data
Adding Noise
# Add Gaussian noise
noise_factor = 0.5
X_train_noisy = np.clip(X_train + noise_factor * np.random.normal(size=X_train.shape), 0., 1.)
X_test_noisy = np.clip(X_test + noise_factor * np.random.normal(size=X_test.shape), 0., 1.)
# Visualize noisy vs. clean images
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(X_test[0].reshape(28, 28), cmap='gray')
plt.title("Original Image")
plt.subplot(1, 2, 2)
plt.imshow(X_test_noisy[0].reshape(28, 28), cmap='gray')
plt.title("Noisy Image")
plt.show()
Training the Denoising Autoencoder
# Train the denoising autoencoder on (noisy input, clean target) pairs
# clone_model copies the architecture only; the clone starts with freshly initialized weights
denoiser = models.clone_model(autoencoder)
denoiser.compile(optimizer='adam', loss='mse')
denoiser.fit(X_train_noisy, X_train, validation_data=(X_test_noisy, X_test), epochs=50, batch_size=128)
Clean reconstruction demonstrates the denoiser's ability to restore corrupted images.
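As a quick sanity check, the noisy test images can be passed through the trained denoiser and compared with the clean originals, reusing the variables defined above:
# Denoise the noisy test images and compare against the clean originals
denoised = denoiser.predict(X_test_noisy)
plt.figure(figsize=(12, 4))
for i, (img, title) in enumerate([(X_test_noisy[0], "Noisy Input"),
                                  (denoised[0], "Denoised Output"),
                                  (X_test[0], "Original")]):
    plt.subplot(1, 3, i + 1)
    plt.imshow(img.reshape(28, 28), cmap='gray')
    plt.title(title)
    plt.axis('off')
plt.show()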
Task 3: Anomaly Detection
Detecting Outliers
Reconstruction error serves as a basis for anomaly detection. Here, random noise is used as anomalous data.
# Simulate anomalies
anomalies = np.random.uniform(0, 1, size=(100, X_train.shape[1]))
X_test_combined = np.vstack((X_test, anomalies))
reconstructions = autoencoder.predict(X_test_combined)
# Compute reconstruction error
errors = np.mean((X_test_combined - reconstructions) ** 2, axis=1)
threshold = np.percentile(errors[:len(X_test)], 95)
predicted_anomalies = (errors > threshold).astype(int)
print(f"Anomaly Threshold: {threshold:.4f}")
Performance Metrics
from sklearn.metrics import precision_score, recall_score, f1_score
true_labels = np.hstack((np.zeros(len(X_test)), np.ones(len(anomalies))))
precision = precision_score(true_labels, predicted_anomalies)
recall = recall_score(true_labels, predicted_anomalies)
f1 = f1_score(true_labels, predicted_anomalies)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1-score: {f1:.2f}")
Histograms of reconstruction errors highlight the separation between normal and anomalous data.
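A sketch of such a histogram, using the errors array and threshold computed above:
# Plot reconstruction-error distributions for normal digits vs. simulated anomalies
plt.hist(errors[:len(X_test)], bins=50, alpha=0.6, label='Normal (MNIST test)')
plt.hist(errors[len(X_test):], bins=50, alpha=0.6, label='Anomalies (random noise)')
plt.axvline(threshold, color='red', linestyle='--', label='Threshold (95th percentile)')
plt.xlabel("Reconstruction Error (MSE)")
plt.ylabel("Count")
plt.legend()
plt.title("Reconstruction Error: Normal vs. Anomalous Data")
plt.show()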
Conclusion
Autoencoders provide elegant solutions for tasks involving data compression, noise removal, and anomaly detection. With compact latent spaces and robust reconstruction capabilities, these models make it straightforward to extract useful structure from raw data.
Use the techniques shared here to integrate autoencoders into your next data science or machine learning project!