Perceptron-Based Linear Regression Model

(Cover image: a million individual neurons firing in a mouse brain)


What is a perceptron?

Perceptrons are one of the simplest types of feedforward neural networks and the foundational building block of deep learning models. Invented in 1958 by Frank Rosenblatt, perceptrons were designed to model the way the human brain processes information. A perceptron is essentially a linear classifier used for binary classification tasks, which means it predicts whether an input belongs to one class or another based on a linear predictor function combining a set of weights with the feature vector.

Typically, when we think of perceptrons, the immediate association is with classification tasks. These are scenarios where the goal is to categorise inputs into two or more classes, making perceptrons the go-to choice for problems with a clear, discrete boundary.

(Figure: Classification)

However, the innovative twist of applying a perceptron to linear regression tasks opens up an interesting line of exploration in the field of machine learning algorithms and their potential for cross-domain application.

Linear regression, at its core, is a method used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. The primary goal here is prediction or forecasting, where continuous values are the output, contrasting sharply with the categorical output of classification tasks. This makes the idea of using a perceptron for linear regression initially seem like forcing a square peg into a round hole: a challenging fit, but not an impossible one.
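
In its simplest, single-feature form, that linear equation is written y = β0 + β1·x + ε, where β0 is the intercept, β1 the slope, and ε the noise the model cannot explain. The perceptron adaptation described below learns exactly these kinds of coefficients under different names: the bias and the weights.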

(Figure: input, weights, bias, linear activation function and output)

Why is this adaptation worth exploring? Firstly, it showcases the flexibility of neural networks and their components: a simple tweak in the activation function can repurpose a basic neural network unit for a completely different kind of task. Secondly, it highlights the importance of foundational machine learning concepts and their interconnectivity. Understanding the principles behind algorithms allows for creative applications beyond their standard uses.

The basic structure of a perceptron includes input values, weights, a bias (or threshold), and an activation function.

  1. Input Values (Features): These are the quantitative features or attributes of the observation you're trying to classify. For example, if you're trying to determine whether an image contains a cat, the inputs might be the pixels of the image.
  2. Weights: Each input feature is assigned a weight that signifies its importance. These weights are adjusted during the training process.
  3. Bias: The bias, or threshold, allows the perceptron to shift the decision boundary away from the origin without depending on the input values alone. It can be thought of as an extra input to the perceptron that always has the value of 1 but has its own weight.
  4. Summation: The perceptron computes a weighted sum of its input features, adding the bias to this sum.
  5. Activation Function: The result of the weighted sum is passed through an activation function, which in the case of a basic perceptron is typically a step function. This function decides whether the neuron fires or not, based on the linear combination of inputs, weights, and bias. If the sum is above a certain threshold, the perceptron outputs one class (for instance, "cat"), and if it is below, it outputs the other class (e.g., "not cat"). A minimal code sketch of this forward pass follows the list.
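
For intuition, here is a minimal sketch of that classic forward pass with a step activation. It is my own illustration with hand-picked example values, not code from the original article:

import numpy as np

def step(z):
    # classic perceptron activation: output 1 if the weighted sum is above 0, else 0
    return 1 if z > 0 else 0

def perceptron_forward(x, weights, bias):
    # weighted sum of inputs plus bias, passed through the step function
    return step(np.dot(x, weights) + bias)

# hypothetical example values: two features with hand-picked weights and bias
x = np.array([0.7, 0.2])
w = np.array([1.5, -0.5])
b = -0.8
print(perceptron_forward(x, w, b))  # prints 1, i.e. "cat"; a sum below zero would give 0, i.e. "not cat"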

Background: Biological Process into a Mathematical Model

The human brain is a network of roughly 86 billion neurons that forms the foundation of our thoughts, emotions, and behaviours. These neurons, akin to biological computational units, communicate through a complex web of electrical and chemical signals.

At the heart of this communication network are dendrites, delicate fibers that protrude from neurons, serving as the receivers of electrical impulses from their neighbours. These signals, continuously received by the dendrites, are processed within the neuron. When the cumulative strength of these incoming signals surpasses a specific threshold, an important event occurs: the neuron activates, or fires, sending its own electrical charge racing down its axon, a long, slender projection that extends outward to make contact with the dendrites of adjacent neurons.

The points of connection where axons and dendrites meet are known as synapses. These are not mere points of physical contact but complex biochemical gateways for the transfer of information. Each neuron, a hub of activity, forms approximately 7,000 synaptic connections, illustrating the dense and dynamic network that underpins our neural activities.

Learning and memory in this neural tapestry are governed by the principles of synaptic plasticity, a concept embodied in Hebb's rule, famously encapsulated in the phrase, "Cells that fire together wire together." This rule posits that the synaptic connections between neurons strengthen when the neurons activate simultaneously. This synaptic strengthening is the neural basis for learning, allowing us to form and retain associations between different concepts, experiences, and skills.
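
In its simplest mathematical form, Hebbian learning is often written as Δw = η · x · y: the change in a connection's strength is proportional to the product of the pre-synaptic and post-synaptic activity, scaled by a learning rate η. The perceptron's weight-update rule described later is a close cousin of this idea, driven by prediction error rather than raw co-activation.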

Through the lens of Hebb’s rule, we gain insight into the adaptive nature of our brains, constantly rewiring and evolving in response to our experiences. This dynamic interplay of neurons, firing and forming new connections, lies at the core of our ability to learn, adapt, and interact with the world around us. The brain's capacity to reconfigure its connections underscores the remarkable flexibility and resilience of the human mind, enabling a lifetime of learning and discovery.

Biological Neuron

(Figure: a biological neuron and biological neural nets)

  • Dendrites: These are the input fibers that receive signals from other neurons, analogous to a unit receiving multiple inputs x1, x2, ..., xn.
  • Cell Body: This integrates the incoming signals. If the sum of the signals exceeds a certain threshold, the cell body generates an output signal.
  • Axon Terminal: The output signal travels along the axon to the axon terminal, where it can be transmitted to other neurons.
  • Myelin Sheath: This is a fatty layer that encases the axon, facilitating faster transmission of the electrical signal.
  • Output: The neuron either fires an action potential or doesn't, based on the integrated signals. This is represented by the multiple outputs y1,y2,...,ym.

Perceptron

  • Inputs (x0,x1,...,xn): These are analogous to the dendrites, where x0 is often used to represent the bias input.
  • Weights (w0,w1,...,wn): Each input is multiplied by its respective weight, analogous to the synaptic strength in biological neurons. w0 is the weight for the bias.
  • Summation (∑): This is the process of summing all the weighted inputs, along with the bias weight.
  • Activation Function (f): This function determines the output of the perceptron, based on the weighted sum. If the sum is above a threshold, the perceptron 'fires' (usually outputs 1); otherwise, it 'does not fire' (outputs 0).
  • Output: The final binary result after the activation function has been applied.

Perceptron using Python Programming
(Figure: from the biological model to the mathematical model)

  1. Importing Libraries

import random
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats        

2. Using random seed for reproducibility

# seed both Python's and NumPy's random generators for reproducibility
# (the dataset below is generated with np.random, so NumPy must be seeded too)
random.seed(42)
np.random.seed(42)

3. Perceptron for Linear Regression using Object-Oriented Programming

The traditional perceptron model is a type of artificial neuron that uses a binary step function as its activation function. It's designed for binary classification tasks: the perceptron makes predictions based on a linear predictor function combining a set of weights with the feature vector. To adapt it for linear regression, the weighted sum calculation remains exactly the same as in the traditional model.

The step function is replaced with the identity function, which means the output is the weighted sum itself without any further transformation.

Thus, the output y for linear regression becomes:

y = w1·x1 + w2·x2 + ... + wn·xn + b

This equation directly maps the input features to a continuous output, making it suitable for regression tasks.

The training process for this modified perceptron involves adjusting the weights and bias to minimise a cost function appropriate for regression, such as the Mean Squared Error (MSE). The MSE measures the average squared difference between the estimated values and the actual values, providing a quantitative measure to guide the optimisation of weights and bias during training.

By iteratively updating the weights and bias in the direction that minimises the MSE, the perceptron model is adapted to perform linear regression. Concretely, for each training sample the update used in the code below is w ← w + η · (y − ŷ) · x and b ← b + η · (y − ŷ), where η is the learning rate and ŷ is the current prediction. This adaptation underscores the flexibility of neural network architectures, demonstrating their capability to extend beyond their traditional applications in classification to address regression tasks through straightforward modifications.

class PerceptronLinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=100):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None
        self.losses = []  # mean squared error recorded after each pass over the data

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.n_iterations):
            loss = 0
            for i in range(n_samples):
                # identity activation: the prediction is the weighted sum itself
                prediction = np.dot(X[i], self.weights) + self.bias
                error = y[i] - prediction
                # nudge weights and bias in the direction that reduces the squared error
                self.weights += self.learning_rate * error * X[i]
                self.bias += self.learning_rate * error
                loss += error ** 2
            self.losses.append(loss / n_samples)

    def predict(self, X):
        # continuous output: weighted sum plus bias, with no thresholding
        return np.dot(X, self.weights) + self.bias

    def mse(self, X, y):
        predictions = self.predict(X)
        return np.mean((predictions - y) ** 2)

    def r_squared(self, X, y):
        # coefficient of determination: 1 - (residual sum of squares / total sum of squares)
        predictions = self.predict(X)
        mean_y = np.mean(y)
        ss_total = np.sum((y - mean_y) ** 2)
        ss_res = np.sum((y - predictions) ** 2)
        return 1 - (ss_res / ss_total)

4. Generating Linear Regression dataset

#regression dataset
def generate_regression_data(n_samples=100, n_features=1, noise=0.1):
    X = np.random.rand(n_samples, n_features)
    true_weights = np.random.rand(n_features)
    y = np.dot(X, true_weights) + np.random.normal(scale=noise, size=n_samples)
    return X, y, true_weights        

5. Feature and Target

# generate data
X, y, _ = generate_regression_data(n_samples=200, n_features=1, noise=0.05)        

6. Training Model

# train the model
model = PerceptronLinearRegression()
model.fit(X, y)        

7. Plotting Loss

# loss during training
plt.plot(range(model.n_iterations), model.losses)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.title('Loss During Training')
plt.grid(True, ls='--', alpha=0.2, color='black')
plt.show()        
Loss Plot

8. Making Predictions (Best Fit Line)

y_pred = model.predict(X)
y_pred[:5]        
# best fit line
plt.scatter(X, y, color='black', label='Data points')
plt.plot(X, y_pred, color='red', label='Fitted line')
plt.xlabel('X')
plt.ylabel('y')
plt.grid(True, ls='--', alpha=0.2, color='black')
plt.legend()
plt.show()        
Best Fit Line

9. Model Evaluation using R-Squared and Mean Squared Error

R-squared: R² = 1 − SS_res / SS_total
Mean Squared Error: MSE = (1/n) Σ (y_i − ŷ_i)²
# mean squared error (MSE) and r-squared value
mse = model.mse(X, y)
r_squared = model.r_squared(X, y)
print("Mean Squared Error (MSE):", mse)
print("R-squared:", r_squared)        
Mean Squared Error (MSE): 0.002556175423738155
R-squared: 0.8339006158078921
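
As an optional sanity check, not part of the original walkthrough, the same fit could be compared against scikit-learn's closed-form linear regression (assuming scikit-learn is installed); if the perceptron has converged, the coefficients and R-squared should come out very close:

from sklearn.linear_model import LinearRegression

# closed-form least-squares fit on the same data, for comparison only
reference = LinearRegression().fit(X, y)
print("sklearn coefficients:", reference.coef_, "intercept:", reference.intercept_)
print("perceptron weights:  ", model.weights, "bias:", model.bias)
print("sklearn R-squared:   ", reference.score(X, y))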

10. Residual Analysis

# residuals = actual values minus predicted values
residuals = y - y_pred

sns.residplot(x=y_pred, y=residuals, lowess=True, color="black")
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residual Analysis')
plt.grid(True, ls='--', alpha=0.3, color='black')
plt.show()
Residual plot

Spread of Residuals: The spread of the residuals seems to be fairly constant across the range of predicted values. This is a good sign, indicating homoscedasticity. If the residuals fan out or form a funnel shape as the predicted values increase, it would be a sign of heteroscedasticity, which could suggest that the variance of the errors is not constant.

Mean of Residuals: The mean of the residuals appears to be around zero across the entire range of predicted values. This is another good indicator, suggesting that the model does not have bias.

Presence of Patterns: Ideally, the residuals should not form any discernible patterns. In this plot, there is a smooth curve fitted to the data (presumably a Lowess curve), which doesn't show strong patterns or systematic deviations from the central line (zero). If there were clear patterns or trends, it would indicate that the model is not capturing all the relevant information, and there may be non-linearity in the relationship that the model is not accounting for.

Outliers: There do not appear to be any extreme outliers that are far removed from the rest of the data points. However, without knowing the scale and context, it's hard to make a definitive statement about outliers. Typically, points that lie a significant distance from the rest of the data might be considered outliers and could warrant further investigation.

Normality of Residuals: While this particular plot doesn't directly inform us about the normality of residuals, a QQ plot would be more appropriate for that purpose. For a regression model, ideally, the residuals should be normally distributed.

Zero Line: The dashed reference line at y = 0 makes it easy to visualise how far the residuals deviate from zero. This line is an important part of a residual plot, as it clearly shows where the residuals are positive or negative.
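
Beyond the visual check, a rough numeric check for heteroscedasticity (my own addition, not part of the original analysis) is to correlate the absolute residuals with the predicted values; a correlation close to zero is consistent with constant variance:

# crude heteroscedasticity check: correlation between |residuals| and predictions
rho, p_value = stats.spearmanr(np.abs(residuals), y_pred)
print(f"Spearman rho: {rho:.3f}, p-value: {p_value:.3f}")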

11. Residual QQ plot & Histogram

plt.figure(figsize=(10,4))
plt.subplot(1, 2, 1)
qq = stats.probplot(residuals, dist="norm")
plt.scatter(qq[0][0], qq[0][1], color='black', alpha=0.5)
plt.plot(qq[0][0], qq[1][1] + qq[1][0]*qq[0][0], color='red', alpha=0.7)
plt.grid(True, ls='--', alpha=0.3, color='black')
plt.title('QQ Plot')

plt.subplot(1, 2, 2)  
plt.hist(residuals, bins=30, color='black', edgecolor='black', alpha=0.7)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Histogram of Residuals')
plt.grid(True, ls='--', alpha=0.3, color='black')
plt.tight_layout()  
plt.show()        
Residual QQ plot & Histogram

In the QQ plot, the points should ideally lie on the red line if the residuals were perfectly normally distributed. In our plot, the points mostly follow the red line, but there are some deviations, especially at the lower end (the left side), where some points fall below the line. This indicates that there are more extreme values in the lower tail than would be expected under a normal distribution, which could be a sign of slight left-skewness or a heavier lower tail in the distribution of residuals.

The distribution of the residuals in the histogram appears to be roughly bell-shaped, which is consistent with a normal distribution, but it is not perfectly symmetrical. There is a noticeable skew to the right, suggesting a longer tail of residuals above the mean.

The multiple peaks suggest that the residuals might have a multi-modal distribution, which could indicate that different subsets of the data have different characteristics, or that the model might be missing some explanatory variables that affect the outcome.

The combination of these plots provides a richer understanding of the residuals than either would alone. They do not seem to indicate any major violations of the standard linear regression assumptions, the same assumptions that underpin ordinary least squares.
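
For a more formal check of normality (an addition of mine, meant to complement rather than replace the plots), a Shapiro-Wilk test could be run on the residuals; a p-value above the chosen significance level means the normality assumption is not rejected:

# Shapiro-Wilk test: the null hypothesis is that the residuals are normally distributed
stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk statistic: {stat:.3f}, p-value: {p_value:.3f}")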

12. Residual Box plot

plt.figure(figsize=(3, 5))
sns.boxplot(y=residuals, color='black')
plt.title('Box Plot of Residuals')
plt.ylabel('Residuals')
plt.xlabel('Residuals')  
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.show()        
Residual Box plot

The box plot provides a summary of the distribution of the residuals. The median is close to zero, and the interquartile range is symmetric about the median, which is good. However, there are a couple of points that are identified as potential outliers, indicated by the dots above the upper whisker. These outliers could influence the regression model.
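
To put numbers on those potential outliers (my own addition, assuming the usual 1.5 × IQR convention that box plots use for their whiskers), the points beyond the whiskers can be counted directly:

# count residuals beyond the 1.5 * IQR whiskers, the same rule the box plot uses
q1, q3 = np.percentile(residuals, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = residuals[(residuals < lower) | (residuals > upper)]
print(f"{len(outliers)} potential outliers out of {len(residuals)} residuals")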

Conclusion

  • Homoscedasticity: The residual plot does not exhibit any clear signs of increasing or decreasing variance in residuals as the predicted values change. This suggests that the model's errors have constant variance (homoscedasticity), which is an important assumption for linear regression models.
  • Normality of Residuals: The QQ plot indicates that residuals largely follow a normal distribution, as most data points adhere closely to the theoretical line, especially in the central region. There are minor deviations at the tails, but they are not extreme. This slight deviation is unlikely to undermine the model's validity, especially in larger samples, where the central limit theorem ensures that the sampling distribution of the estimated coefficients is approximately normal even if the residuals themselves deviate slightly from normality.
  • Outliers: The box plot reveals a few potential outliers. These are points that lie outside the expected range of variability and could indicate anomalies in the data or instances where the model does not perform well. However, the number of outliers is small and may not significantly affect the model's performance.
  • Overall Model Performance: The histogram of residuals shows a distribution that is approximately bell-shaped and centered around zero, which suggests that the model does not systematically overpredict or underpredict throughout its range. The slight skewness observed is often acceptable in practical applications unless it's severe.

Advantages

  1. Simplicity: Perceptrons are relatively simple neural network models, making them easy to understand and implement, especially for those new to machine learning.
  2. Interpretability: Due to their simplicity, perceptrons offer a level of interpretability that more complex models may lack. It's easier to understand how inputs are weighted and combined to produce outputs.
  3. Computational Efficiency: Perceptrons can be computationally efficient, especially for smaller datasets, as they involve simple matrix multiplications and activation function calculations.
  4. Online Learning: Perceptrons support online learning, meaning they can continuously update their weights based on incoming data, making them suitable for applications where data streams in real time (see the sketch after this list).
  5. Robustness to Noise: Perceptrons can be robust to noise in data, especially if appropriate pre-processing techniques are applied. Their simplicity makes them less prone to overfitting noisy data.
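
As an illustration of the online-learning point above, a hypothetical partial_fit-style helper (not part of the class defined earlier) could update the trained model one sample at a time as new data arrives:

def partial_fit(model, x_new, y_new):
    # hypothetical online update: one gradient step on a single new sample
    prediction = np.dot(x_new, model.weights) + model.bias
    error = y_new - prediction
    model.weights += model.learning_rate * error * x_new
    model.bias += model.learning_rate * error
    return model

# usage sketch: stream in one new (made-up) observation and update the trained model in place
partial_fit(model, np.array([0.42]), 0.30)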

Disadvantages

  1. Linear Limitation: Perceptrons are limited to linear decision boundaries. This restricts their ability to model complex relationships present in many real-world datasets, leading to potentially poor performance on nonlinear problems.
  2. Convergence Issues: Perceptrons may struggle to converge on a solution for datasets that are not linearly separable or have poorly conditioned features. This can result in longer training times or failure to find an optimal solution.
  3. Limited Expressiveness: Due to their shallow architecture and linear nature, perceptrons have limited expressiveness compared to more complex neural network architectures. They may not capture intricate patterns present in the data.
  4. Lack of Hidden Layers: Perceptrons consist of a single layer of neurons without hidden layers. This limits their ability to learn hierarchical representations of data, which are crucial for capturing complex relationships in high-dimensional data.


Thank you for reading...

