How do I implement a simple neural network from scratch in Python?
Abstract
Ah, the tantalizing dance of artificial neurons, firing in harmony, decoding the universe's mysteries one byte at a time. Neural networks - they're not just a chunk of code or a mathematical puzzle. They're a canvas, a piece of art where every neuron and every connection tells a story. In this symphony of bytes and tensors, we'll embark on an odyssey. An odyssey to craft a neural network from scratch, with Python as our trusty steed.
Introduction
If you've ever gazed up at the night sky, you've probably felt overwhelmed by the vastness of the universe. But just as the stars in the cosmos come together to form constellations, the elements of a neural network converge to create intelligence. Each neuron, like a star, has its role, and understanding that role is the key to harnessing its power.
The act of creating a neural network from scratch is not just coding; it's an art. It's about understanding the intricate dance of backpropagation as it adjusts weights, feeling the rhythm of activation functions like ReLU and Sigmoid, and orchestrating the ensemble with gradient descent. But as we delve deeper, we encounter the complex harmonies of concepts like convolutional layers and the jazz improvisations of dropout regularization.
Now, you might be wondering, "Why start from scratch?" After all, there are libraries out there that do the heavy lifting. But there's magic in understanding the nuts and bolts, the very fabric of this intelligence. And by the end of our journey, not only will we have a neural network dancing to our tune, but we'll also appreciate every step, every misstep, and every triumph.
Let's embark on this journey, shall we? And as we traverse this digital landscape, we'll sprinkle in a dash of Python to bring our neural dreams to life. Let's dive into the matrix!
Crafting the Canvas: Our Neural Blueprint
Before we delve into the code, let's take a moment to appreciate the architecture of our neural maestro. At its core, a neural network is a series of layers, each containing a set of neurons. These neurons are connected, much like the stars in a constellation, and it's these connections that hold the key to learning.
Setting the Stage with Python
To start, we need to lay down the foundation. Python, with its simplicity and power, is our tool of choice.
import numpy as np

# Define our activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Note: this derivative expects x to already be a sigmoid output,
# i.e. it computes s * (1 - s) where s = sigmoid(z)
def sigmoid_derivative(x):
    return x * (1 - x)
Now, the sigmoid function is just one of the many activation functions we can use. It squashes values between 0 and 1, making it ideal for output neurons in binary classification tasks.
Next, let's define our neural network structure. For simplicity, let's consider a network with an input layer, one hidden layer, and an output layer.
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Weight matrices connecting the layers
        self.weights_input_hidden = np.random.randn(input_size, hidden_size)
        self.weights_hidden_output = np.random.randn(hidden_size, output_size)
        # Bias vectors, shaped (1, n) so they broadcast across a batch of inputs
        self.bias_hidden = np.random.randn(1, hidden_size)
        self.bias_output = np.random.randn(1, output_size)
This class sets the stage. But the real magic, the learning, happens during backpropagation. This is where the network adjusts its weights based on the error of its predictions. It's the heart and soul of our network, the maestro that orchestrates the entire performance.
The Maestro's Symphony: Training the Network
Ah, training - the grand performance where our neural network learns to dance. It's an iterative process, a dance of numbers, where each step, each epoch, brings us closer to perfection. Let's delve deeper and understand how our maestro, the neural network, learns from the data.
Backpropagation: The Dance of Learning
Backpropagation is the essence of neural learning. It's a process where errors are propagated backward through the network, adjusting weights to minimize the difference between the predicted output and the actual target. Think of it as a choreographer correcting a dancer's steps in real-time to perfect the performance.
Here's a simple code snippet illustrating the process:
    def train(self, inputs, targets, epochs, learning_rate):
        for epoch in range(epochs):
            # Forward pass
            hidden_layer_input = np.dot(inputs, self.weights_input_hidden) + self.bias_hidden
            hidden_layer_output = sigmoid(hidden_layer_input)
            output_layer_input = np.dot(hidden_layer_output, self.weights_hidden_output) + self.bias_output
            predicted_output = sigmoid(output_layer_input)

            # Calculate the error
            error = targets - predicted_output

            # Backward pass
            d_predicted_output = error * sigmoid_derivative(predicted_output)
            error_hidden_layer = d_predicted_output.dot(self.weights_hidden_output.T)
            d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)

            # Update the weights and biases
            self.weights_hidden_output += hidden_layer_output.T.dot(d_predicted_output) * learning_rate
            self.bias_output += np.sum(d_predicted_output, axis=0, keepdims=True) * learning_rate
            self.weights_input_hidden += inputs.T.dot(d_hidden_layer) * learning_rate
            self.bias_hidden += np.sum(d_hidden_layer, axis=0, keepdims=True) * learning_rate
This function embodies the essence of gradient descent optimization, guiding the neural network to find the optimal weights that minimize the prediction error.
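To see how the pieces fit together, here is a minimal, hypothetical usage sketch. It assumes the sigmoid helpers and the NeuralNetwork class defined above, and reuses the small dataset that appears later in this article; the predict helper is an illustrative addition, not part of the original class.

# Hypothetical usage of the NeuralNetwork class defined above
def predict(net, inputs):
    # Re-run the forward pass with the trained weights (illustrative helper)
    hidden = sigmoid(np.dot(inputs, net.weights_input_hidden) + net.bias_hidden)
    return sigmoid(np.dot(hidden, net.weights_hidden_output) + net.bias_output)

X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 0]])  # toy inputs
y = np.array([[0], [1], [1], [0]])                          # toy targets

net = NeuralNetwork(input_size=3, hidden_size=4, output_size=1)
net.train(X, y, epochs=10000, learning_rate=0.1)
print(predict(net, X))  # predictions drift toward the targets as training proceeds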
Advanced Techniques: The Nuances of the Dance
Neural networks aren't just about raw calculations. They're about finesse, about leveraging advanced techniques such as momentum, adaptive learning rates, dropout, and transfer learning to refine the learning process. We'll explore each of these later in the article.
The Encore: Evaluating and Fine-tuning the Network
After the training dance, it's essential to evaluate our neural maestro's performance. It’s not just about how well it dances, but also how it adapts to new rhythms and beats.
Cross-Validation: The Rehearsal before the Final Act
Cross-validation is a technique where the training data is split into multiple subsets. The model is trained on some of these subsets and validated on the others, and the process is repeated multiple times while rotating the validation set. The most common method is k-fold cross-validation, where the data is divided into k subsets.
This approach ensures that our network is not just memorizing the dance moves but genuinely learning them. By evaluating its performance on unseen data, we can ascertain its generalization capabilities.
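As a rough illustration, here is a minimal k-fold split sketch in plain NumPy. The train_and_evaluate call is a hypothetical placeholder standing in for training the network on the training folds and scoring it on the held-out fold.

import numpy as np

def k_fold_indices(n_samples, k):
    # Shuffle the sample indices and split them into k roughly equal folds
    indices = np.random.permutation(n_samples)
    return np.array_split(indices, k)

def cross_validate(X, y, k=5):
    folds = k_fold_indices(len(X), k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        X_train, y_train = X[train_idx], y[train_idx]
        X_val, y_val = X[val_idx], y[val_idx]
        # Hypothetical stand-in for training on the k-1 folds and scoring on the held-out fold
        scores.append(train_and_evaluate(X_train, y_train, X_val, y_val))
    return np.mean(scores)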
Metrics Galore: Applause, Critiques, and Everything in Between
Understanding how well our network performs is crucial. Metrics such as accuracy, precision, recall, and the F1 score each shed light on a different aspect of performance, as sketched below.
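A minimal sketch of these common classification metrics, assuming binary predictions obtained by thresholding the network's sigmoid outputs at 0.5 (the function name and threshold are illustrative choices, not from the original article):

import numpy as np

def classification_metrics(y_true, y_prob, threshold=0.5):
    # Threshold the sigmoid outputs into hard 0/1 predictions
    y_pred = (y_prob >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1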
Advanced Architectures: Beyond the Basic Steps
While our simple feedforward network is a marvel, the world of deep learning offers more intricate dances, such as convolutional and recurrent architectures. Before venturing there, here is the complete, standalone version of our simple network, ready to run:
import numpy as np

# Activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Expects x to already be a sigmoid output
def sigmoid_derivative(x):
    return x * (1 - x)

# Define the neural network architecture
input_neurons = 3
hidden_neurons = 4
output_neurons = 1

# Randomly initialize weights and biases
input_hidden_weights = np.random.rand(input_neurons, hidden_neurons)
hidden_output_weights = np.random.rand(hidden_neurons, output_neurons)
hidden_bias = np.random.rand(1, hidden_neurons)
output_bias = np.random.rand(1, output_neurons)

# Training the neural network
def train_neural_network(X, y, epochs, learning_rate):
    global input_hidden_weights, hidden_output_weights, hidden_bias, output_bias
    for epoch in range(epochs):
        # Forward pass
        hidden_layer_input = np.dot(X, input_hidden_weights) + hidden_bias
        hidden_layer_output = sigmoid(hidden_layer_input)
        output_layer_input = np.dot(hidden_layer_output, hidden_output_weights) + output_bias
        predicted_output = sigmoid(output_layer_input)

        # Calculate the error
        error = y - predicted_output

        # Backpropagation
        d_predicted_output = error * sigmoid_derivative(predicted_output)
        error_hidden_layer = d_predicted_output.dot(hidden_output_weights.T)
        d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)

        # Update weights and biases
        hidden_output_weights += hidden_layer_output.T.dot(d_predicted_output) * learning_rate
        output_bias += np.sum(d_predicted_output, axis=0, keepdims=True) * learning_rate
        input_hidden_weights += X.T.dot(d_hidden_layer) * learning_rate
        hidden_bias += np.sum(d_hidden_layer, axis=0, keepdims=True) * learning_rate

        # Print error every 1000 epochs
        if epoch % 1000 == 0:
            print(f"Epoch {epoch}, Error: {np.mean(np.abs(error))}")

# Sample input and output
X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 0]])
y = np.array([[0], [1], [1], [0]])

# Train the network
train_neural_network(X, y, epochs=10000, learning_rate=0.01)
Optimization Techniques: Making Our Neural Dance Fluid
For our neural network to dance gracefully, it's not enough for it to just follow the beats. It needs the right choreography, the perfect timing, and sometimes, a touch of finesse. Optimization algorithms are our choreographers. They guide the learning process, helping the network avoid getting stuck in local minima or overshooting the global minimum.
Momentum: Building Up The Rhythm
Just like a dancer uses momentum to carry through a series of moves, in the optimization world, momentum helps the gradient descent algorithm navigate ravines, i.e., areas where the surface curves much more steeply in one dimension than in another, which are common around local optima. With momentum, we add a fraction of the previous step's direction to the current step. This serves two primary purposes: it speeds up progress along directions where the gradient consistently points the same way, and it dampens the oscillations that plain gradient descent exhibits across the steep walls of a ravine. A minimal update sketch follows.
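This is a minimal sketch of the momentum update rule for a generic weights array and its gradient; the variable names and the 0.9 coefficient are illustrative defaults, not values from this article.

import numpy as np

def momentum_step(weights, gradient, velocity, learning_rate=0.01, beta=0.9):
    # Blend the previous step's direction with the current gradient
    velocity = beta * velocity - learning_rate * gradient
    weights = weights + velocity
    return weights, velocity

# Usage: keep a velocity array the same shape as the weights, initialized to zeros
weights = np.random.randn(3, 4)
velocity = np.zeros_like(weights)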
RMSprop: Smoothing Out The Moves
Root Mean Square Propagation or RMSprop is like the ballet technique in neural network optimization. It's all about elegance and grace. RMSprop adjusts the learning rate by dividing it by an exponentially decaying average of squared gradients. This ensures that the learning rate gets adjusted in a way that it speeds up the convergence without aggressive oscillations.
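In code, a minimal RMSprop-style update might look like this; the 0.9 decay rate and epsilon are conventional defaults rather than values from the article.

import numpy as np

def rmsprop_step(weights, gradient, sq_avg, learning_rate=0.001, decay=0.9, eps=1e-8):
    # Exponentially decaying average of squared gradients
    sq_avg = decay * sq_avg + (1 - decay) * gradient ** 2
    # Scale each parameter's step by the root of that average
    weights = weights - learning_rate * gradient / (np.sqrt(sq_avg) + eps)
    return weights, sq_avg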
Adam Optimizer: The Contemporary Fusion
Adam, short for Adaptive Moment Estimation, is like a contemporary dance form, blending the best of many worlds. It's a combination of RMSprop and momentum. Adam computes adaptive learning rates for different parameters. In other words, it merges the perks of both the aforementioned techniques, ensuring swift, efficient, and robust learning.
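And here is a sketch of an Adam step, combining the momentum and RMSprop ideas above; the beta values and epsilon are the conventional defaults, and the rest of the naming is illustrative.

import numpy as np

def adam_step(weights, gradient, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment (momentum-like) and second moment (RMSprop-like) estimates
    m = beta1 * m + (1 - beta1) * gradient
    v = beta2 * v + (1 - beta2) * gradient ** 2
    # Bias correction for the early steps; t is the 1-based step counter
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    weights = weights - learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return weights, m, v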
Transfer Learning: Borrowing Moves from Other Dances
Imagine a dancer who's mastered salsa trying their hand at cha-cha. They don't start from scratch. They transfer some of their salsa moves, refine them, and adapt to the cha-cha rhythm. Similarly, transfer learning is about using a pre-trained network, trained on a massive dataset, and fine-tuning it for a new, similar task. This technique is particularly beneficial when you have limited data for the new task. It leverages the patterns (or features) learned from the extensive dataset, ensuring a head start in the learning process.
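In the spirit of our tiny NumPy network, a hypothetical transfer-learning sketch might keep already-trained input-to-hidden weights frozen and retrain only a fresh output layer on a new task. This is purely a conceptual illustration (it assumes the sigmoid helpers defined earlier); real transfer learning typically starts from large pretrained models.

import numpy as np

def finetune_output_layer(X_new, y_new, input_hidden_weights, hidden_bias,
                          epochs=1000, learning_rate=0.1):
    # Reuse the frozen hidden layer; reinitialize and retrain only the output layer
    hidden_size = input_hidden_weights.shape[1]
    hidden_output_weights = np.random.randn(hidden_size, 1)
    output_bias = np.zeros((1, 1))
    for _ in range(epochs):
        hidden = sigmoid(np.dot(X_new, input_hidden_weights) + hidden_bias)
        predicted = sigmoid(np.dot(hidden, hidden_output_weights) + output_bias)
        error = y_new - predicted
        d_out = error * sigmoid_derivative(predicted)
        # Only the output layer's parameters are updated
        hidden_output_weights += hidden.T.dot(d_out) * learning_rate
        output_bias += np.sum(d_out, axis=0, keepdims=True) * learning_rate
    return hidden_output_weights, output_bias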
Dropout: The Improvisation Element
In dance, sometimes, the unplanned, spontaneous moves are the ones that stand out. Dropout, in the realm of neural networks, is a bit like that improvisation. It's a regularization method where, during training, random subsets of neurons are "dropped out" or temporarily set to zero, ensuring that the network doesn't rely too much on any single neuron. This promotes a more robust learning process.
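A minimal sketch of (inverted) dropout applied to a hidden layer's activations during training; the 0.5 keep probability is chosen purely for illustration.

import numpy as np

def apply_dropout(layer_output, keep_prob=0.5, training=True):
    if not training:
        # At inference time the full layer is used; no neurons are dropped
        return layer_output
    # Randomly zero out neurons, then rescale so the expected activation is unchanged
    mask = (np.random.rand(*layer_output.shape) < keep_prob) / keep_prob
    return layer_output * mask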
The Symphony of Backpropagation and Gradient Descent
Have you ever wondered how a dancer knows when to spin, twirl, or jump? It's all about feedback. A dancer practices, receives feedback, and iteratively refines their movements. This dynamic mirrors one of the most vital processes in training neural networks: backpropagation combined with gradient descent. But let's dive deeper into this dance of numbers and weights.
Backpropagation: Echoes in the Neural Halls
Backpropagation, a term that might sound like some sci-fi concept, is the heart of training most neural networks. Imagine shouting in a vast hall and hearing the echo bounce back. This echo can tell you a lot about the hall's shape and size. Similarly, backpropagation is about sending errors (or the difference between predicted and actual values) backward through the network. It adjusts the weights of the neurons based on the magnitude of the error.
But how does this echo, or error, navigate its way through the myriad of neurons? This is where the loss function comes into play. It quantifies how far off our predictions are from the actual values. It's like a dance instructor telling you, "Your spin was off by a few degrees."
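For instance, the weight updates in our training code correspond (up to a constant factor) to minimizing a mean squared error loss, which can be written as this minimal sketch:

import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between targets and predictions
    return np.mean((y_true - y_pred) ** 2)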
Gradient Descent: The Graceful Descent to Perfection
Now, knowing the error is one thing. Adjusting the dance moves (or in our case, the neuron weights) is another challenge. Gradient Descent is our guide here. Imagine standing on a hilltop and wanting to reach the valley by taking the steepest path downwards. In the realm of neural networks, this hill represents the error landscape, and the valley is the point of minimum error.
Gradient Descent assesses the landscape using the gradients, which are derived from our trusty loss function. The gradients indicate the steepest descent direction. It's akin to water finding its path down a hill—it always chooses the route of maximum descent.
However, this descent isn't always straightforward. Picture a dancer attempting to perfect a move. They don't just keep practicing harder and harder. Sometimes they slow down, refining the nuances. This slowing down is controlled by the learning rate in gradient descent. A high learning rate might make the algorithm converge faster, but there's a risk of overshooting the minimum point. A low learning rate is more meticulous but might take eons to converge.
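In code, a single vanilla gradient descent step is nothing more than the following minimal sketch, with the learning rate controlling the step size:

def gradient_descent_step(weights, gradient, learning_rate=0.01):
    # Move the weights a small step in the direction that reduces the loss
    return weights - learning_rate * gradient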
Stochastic and Mini-Batch Gradient Descent: The Dance Variations
In the traditional gradient descent, we compute the gradient using the entire dataset, which might be time-consuming. Imagine a dancer practicing the entire routine to refine just one move. Sounds inefficient, right?
Enter Stochastic Gradient Descent (SGD). Instead of practicing the entire routine, the dancer focuses on one move at a time. Similarly, SGD updates the weights using only one training example at a time. It's faster but can be a bit erratic.
A middle ground is the Mini-Batch Gradient Descent. It's like practicing a section of the dance rather than the whole routine or a single move. In neural networks, this means updating weights using a batch of training examples, striking a balance between speed and stability.
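Here is a minimal mini-batch loop sketch; the forward pass, backpropagation, and weight updates from our train_neural_network function would run inside the inner loop, and the batch size of 2 is an arbitrary choice for our tiny dataset.

import numpy as np

def iterate_minibatches(X, y, batch_size=2):
    # Shuffle once per epoch, then yield successive slices of the data
    indices = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]

# Inside an epoch:
# for X_batch, y_batch in iterate_minibatches(X, y):
#     ...run the forward pass, backpropagation, and weight updates on the batch...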
Vanishing and Exploding Gradients: The Missteps
Training a neural network isn't always a smooth waltz. Sometimes we encounter the missteps of vanishing or exploding gradients, especially in deep networks. Imagine a dancer's spin getting slower and slower until it stops (vanishing) or getting so fast that they lose control (exploding). These issues can severely hamper the training process. Techniques like Batch Normalization and proper weight initialization strategies can help keep our dancer's spin just right.
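As one concrete countermeasure, here is a sketch of the common Xavier/Glorot and He initialization schemes, which scale the initial weights by the layer sizes to help keep gradients in a healthy range; the function names are illustrative.

import numpy as np

def xavier_init(fan_in, fan_out):
    # Suited to sigmoid/tanh layers: variance scaled by both layer sizes
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # Suited to ReLU layers: variance scaled by the number of inputs
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)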
The Dance of Numbers and Adjustments
Training a neural network, with its intricate choreography of backpropagation and gradient descent, is much like refining a dance. It's a delicate balance of feedback, adjustments, and iterations. As we continue to refine our models, develop novel architectures, and understand deeper nuances, we realize that the dance of numbers, much like any art form, is a blend of precision, creativity, and constant learning.