Perceptron-Based Linear Regression Model
Harry Thapa
What is a perceptron?
Perceptrons are one of the simplest types of feedforward neural networks and the foundational building block of deep learning models. Invented in 1958 by Frank Rosenblatt, the perceptron was designed to model the way the human brain processes information. A perceptron is essentially a linear classifier used for binary classification tasks, which means it predicts whether an input belongs to one class or another based on a linear predictor function combining a set of weights with the feature vector.
Typically, when we think of perceptrons, the immediate association is with classification tasks. These are scenarios where the goal is to categorise inputs into two or more classes, making perceptrons the go-to choice for problems with a clear, discrete boundary.
However, the innovative twist of applying a perceptron to linear regression tasks opens up an interesting line of exploration in the field of machine learning algorithms and their potential for cross-domain application.
Linear regression, at its core, is a method used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. The primary goal here is prediction or forecasting, where continuous values are the output, contrasting sharply with the categorical output of classification tasks. This makes the idea of using a perceptron for linear regression initially seem like forcing a square peg into a round hole: a challenging fit, but not an impossible one.
The exercise is worthwhile for two reasons. Firstly, it showcases the flexibility of neural networks and their components: a simple tweak in the activation function can repurpose a basic neural network unit for a completely different kind of task. Secondly, it highlights the importance of foundational machine learning concepts and their interconnectivity. Understanding the principles behind algorithms allows for creative applications beyond their standard uses.
The basic structure of a perceptron includes input values, weights, a bias (or threshold), and an activation function.
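To make this structure concrete, here is a minimal standalone sketch (an addition, not part of the original article's code) of a classic perceptron forward pass, assuming a binary step activation and hand-picked example weights:

import numpy as np

def perceptron_output(x, weights, bias):
    # weighted sum of the inputs plus the bias term
    z = np.dot(x, weights) + bias
    # binary step activation: fire (1) if the weighted sum is positive, else 0
    return 1 if z > 0 else 0

# example: a two-input perceptron with illustrative weights
x = np.array([0.5, 0.8])
w = np.array([0.4, 0.6])
b = -0.5
print(perceptron_output(x, w, b))  # 0.5*0.4 + 0.8*0.6 - 0.5 = 0.18 > 0, so prints 1

In the regression variant explored below, only the last step changes: the weighted sum is returned directly instead of being thresholded.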
Background: From a Biological Process to a Mathematical Model
The human brain, a network of over 100 billion neurons, forms the foundation of our thoughts, emotions, and behaviours. These neurons, akin to biological computational units, communicate through a complex web of electrical and chemical signals.
At the heart of this communication network are dendrites, delicate fibres that protrude from neurons and serve as receivers of electrical impulses from their neighbours. These signals, continuously received by the dendrites, are processed within the neuron. When the cumulative force of the incoming signals surpasses a specific threshold, an important event occurs: the neuron activates, or fires, sending its own electrical charge racing down its axon, a long, slender projection that extends outward to make contact with the dendrites of adjacent neurons.
The points of connection where axons and dendrites meet are known as synapses. These are not mere points of physical contact but complex biochemical gateways for the transfer of information. Each neuron, a hub of activity, forms approximately 7,000 synaptic connections, illustrating the dense and dynamic network that underpins our neural activities.
Learning and memory in this neural tapestry are governed by the principles of synaptic plasticity, a concept embodied in Hebb's rule, famously encapsulated in the phrase, "Cells that fire together wire together." This rule posits that the synaptic connections between neurons strengthen as they activate simultaneously. This synaptic strengthening is the neural basis for learning, allowing us to form and retain associations between different concepts, experiences, and skills.
Through the lens of Hebb’s rule, we gain insight into the adaptive nature of our brains, constantly rewiring and evolving in response to our experiences. This dynamic interplay of neurons, firing and forming new connections, lies at the core of our ability to learn, adapt, and interact with the world around us. The brain's capacity to reconfigure its connections underscores the remarkable flexibility and resilience of the human mind, enabling a lifetime of learning and discovery.
Biological Neuron
Perceptron
Perceptron using Python Programming
1. Importing Libraries
import random
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats
2. Using random seed for reproducibility
# seed for reproducibility (the data generator below uses np.random, so seed NumPy as well)
random.seed(42)
np.random.seed(42)
3. Perceptron for Linear Regression using Object-Oriented Programming
The traditional perceptron model is a type of artificial neuron that uses a binary step function as its activation function and is designed for binary classification tasks: it makes predictions based on a linear predictor function combining a set of weights with the feature vector. To adapt it for regression, the weighted sum calculation remains exactly the same as in the traditional model.
The step function is replaced with the identity function, which means the output is the weighted sum itself, without any further transformation.
Thus, the output y for linear regression becomes:

y = \mathbf{w} \cdot \mathbf{x} + b = \sum_{i=1}^{n} w_i x_i + b

where \mathbf{w} is the weight vector, \mathbf{x} the feature vector, and b the bias. This equation directly maps the input features to a continuous output, making it suitable for regression tasks.
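To illustrate how small that change is, the short sketch below (with made-up weights, not from the original article) computes the same weighted sum twice: once through a step activation for classification, and once through the identity for regression:

import numpy as np

x = np.array([1.2, -0.7])
w = np.array([0.5, 0.3])
b = 0.1

z = np.dot(x, w) + b                 # weighted sum shared by both variants

step_output = 1 if z > 0 else 0      # classification: a binary label
identity_output = z                  # regression: the weighted sum itself
print(step_output, identity_output)  # 1 and 0.49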
The training process for this modified perceptron involves adjusting the weights and bias to minimise a cost function appropriate for regression, such as the Mean Squared Error (MSE). The MSE measures the average squared difference between the estimated values and the actual values, providing a quantitative measure to guide the optimisation of weights and bias during training.
By iteratively updating the weights and bias in the direction that minimises the MSE, the perceptron model is adapted to perform linear regression. This adaptation underscores the flexibility of neural network architectures, demonstrating their capability to extend beyond their traditional applications in classification to address regression tasks through straightforward modifications.
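For reference, here is a sketch of the quantities involved, written with \eta for the learning rate; it matches the per-sample delta-rule update implemented in the fit method of the class below:

\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \hat{y}_i = \mathbf{w} \cdot \mathbf{x}_i + b

\mathbf{w} \leftarrow \mathbf{w} + \eta\,(y_i - \hat{y}_i)\,\mathbf{x}_i, \qquad b \leftarrow b + \eta\,(y_i - \hat{y}_i)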
class PerceptronLinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=100):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None
        self.losses = []

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        for _ in range(self.n_iterations):
            loss = 0
            for i in range(n_samples):
                # identity activation: the prediction is the weighted sum itself
                prediction = np.dot(X[i], self.weights) + self.bias
                error = y[i] - prediction
                # delta-rule update, scaled by the learning rate
                self.weights += self.learning_rate * error * X[i]
                self.bias += self.learning_rate * error
                loss += error ** 2
            # record the average squared error accumulated during this pass
            self.losses.append(loss / n_samples)

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

    def mse(self, X, y):
        predictions = self.predict(X)
        return np.mean((predictions - y) ** 2)

    def r_squared(self, X, y):
        predictions = self.predict(X)
        mean_y = np.mean(y)
        ss_total = np.sum((y - mean_y) ** 2)
        ss_res = np.sum((y - predictions) ** 2)
        return 1 - (ss_res / ss_total)
4. Generating Linear Regression dataset
#regression dataset
def generate_regression_data(n_samples=100, n_features=1, noise=0.1):
    X = np.random.rand(n_samples, n_features)
    true_weights = np.random.rand(n_features)
    # linear signal plus Gaussian noise
    y = np.dot(X, true_weights) + np.random.normal(scale=noise, size=n_samples)
    return X, y, true_weights
5. Feature and Target
# generate data
X, y, _ = generate_regression_data(n_samples=200, n_features=1, noise=0.05)
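As a quick optional sanity check (an addition, not in the original article), you can confirm the shapes of the generated arrays before training:

# X should have shape (200, 1) and y shape (200,)
print(X.shape, y.shape)
print(X[:3].ravel(), y[:3])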
6. Training Model
# train the model
model = PerceptronLinearRegression()
model.fit(X, y)
7. Plotting Loss
# loss during training
plt.plot(range(model.n_iterations), model.losses)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.title('Loss During Training')
plt.grid(True, ls='--', alpha=0.2, color='black')
plt.show()
8. Making Predictions (Best Fit Line)
y_pred = model.predict(X)
y_pred[:5]
# best fit line
plt.scatter(X, y, color='black', label='Data points')
plt.plot(X, y_pred, color='red', label='Fitted line')
plt.xlabel('X')
plt.ylabel('y')
plt.grid(True, ls='--', alpha=0.2, color='black')
plt.legend()
plt.show()
9. Model Evaluation using R-Squared and Mean Squared Error
# mean squared error (MSE) and r-squared value
mse = model.mse(X, y)
r_squared = model.r_squared(X, y)
print("Mean Squared Error (MSE):", mse)
print("R-squared:", r_squared)
Mean Squared Error (MSE): 0.002556175423738155
R-squared: 0.8339006158078921
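As a cross-check (an addition to the article), the learned weight and bias should land close to the closed-form least-squares solution, since both approaches minimise the same squared-error objective, assuming the training loop has converged:

# closed-form least squares on the same data, with a column of ones for the intercept
X_design = np.hstack([X, np.ones((X.shape[0], 1))])
coef, _, _, _ = np.linalg.lstsq(X_design, y, rcond=None)
print("perceptron:   ", model.weights, model.bias)
print("least squares:", coef[:-1], coef[-1])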
10. Residual Analysis
# residuals = actual values minus predicted values
residuals = y - y_pred
sns.residplot(x=y_pred, y=residuals, lowess=True, color="black")
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residual Analysis')
plt.grid(True, ls='--', alpha=0.3, color='black')
plt.show()
Spread of Residuals: The spread of the residuals seems to be fairly constant across the range of predicted values. This is a good sign, indicating homoscedasticity. If the residuals fan out or form a funnel shape as the predicted values increase, it would be a sign of heteroscedasticity, which could suggest that the variance of the errors is not constant.
Mean of Residuals: The mean of the residuals appears to be around zero across the entire range of predicted values. This is another good indicator, suggesting that the model is not systematically biased (a quick numeric check of these first two points follows after this list).
Presence of Patterns: Ideally, the residuals should not form any discernible patterns. In this plot, there is a smooth lowess curve fitted to the data, which doesn't show strong patterns or systematic deviations from the central line (zero). If there were clear patterns or trends, it would indicate that the model is not capturing all the relevant information, and there may be non-linearity in the relationship that the model is not accounting for.
Outliers: There do not appear to be any extreme outliers that are far removed from the rest of the data points. However, without knowing the scale and context, it's hard to make a definitive statement about outliers. Typically, points that lie a significant distance from the rest of the data might be considered outliers and could warrant further investigation.
Normality of Residuals: While this particular plot doesn't directly inform us about the normality of residuals, a QQ plot would be more appropriate for that purpose. For a regression model, ideally, the residuals should be normally distributed.
Zero Line: The dashed line at y = 0 acts as a reference to easily visualise the deviation of the residuals from zero. This line is an important part of a residual plot, as it clearly shows where the residuals are positive or negative.
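To back these visual impressions with numbers (an addition to the article), a quick check of the residuals' mean and of how their spread varies across the prediction range might look like this:

# the mean residual should be close to zero for an unbiased fit
print("mean residual:", residuals.mean())

# rough homoscedasticity check: compare residual spread in the lower and
# upper halves of the predicted values
median_pred = np.median(y_pred)
print("spread (lower half):", residuals[y_pred <= median_pred].std())
print("spread (upper half):", residuals[y_pred > median_pred].std())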
11. Residual QQ plot & Histogram
plt.figure(figsize=(10,4))
plt.subplot(1, 2, 1)
qq = stats.probplot(residuals, dist="norm")
plt.scatter(qq[0][0], qq[0][1], color='black', alpha=0.5)
plt.plot(qq[0][0], qq[1][1] + qq[1][0]*qq[0][0], color='red', alpha=0.7)
plt.grid(True, ls='--', alpha=0.3, color='black')
plt.title('QQ Plot')
plt.subplot(1, 2, 2)
plt.hist(residuals, bins=30, color='black', edgecolor='black', alpha=0.7)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Histogram of Residuals')
plt.grid(True, ls='--', alpha=0.3, color='black')
plt.tight_layout()
plt.show()
In the QQ plot, the points should ideally lie on the red line if the residuals were perfectly normally distributed. In our plot, the points mostly follow the red line, but there are some deviations, especially at the lower end (the left side), where some points fall below the line, indicating that there are more extreme values in the lower tail than would be expected for a normal distribution. This could be a sign of slight left-skewness or a heavier lower tail in the distribution of residuals.
The distribution of the residuals in the histogram appears to be roughly bell-shaped, which is consistent with a normal distribution, but it is not perfectly symmetrical. There is a noticeable skew to the right, suggesting that there are more residuals that are greater than the mean of the residuals.
The multiple peaks suggest that the residuals might have a multi-modal distribution, which could indicate that different subsets of the data have different characteristics, or that the model might be missing some explanatory variables that affect the outcome.
The combination of these plots provides a richer understanding of the residuals than either would alone. That said, they do not seem to indicate any major violations of the standard assumptions of least-squares linear regression, which is the objective our perceptron is minimising.
12. Residual Box plot
plt.figure(figsize=(3, 5))
sns.boxplot(y=residuals, color='black')
plt.title('Box Plot of Residuals')
plt.ylabel('Residuals')
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.show()
The box plot provides a summary of the distribution of the residuals. The median is close to zero, and the interquartile range is symmetric about the median, which is good. However, there are a couple of points that are identified as potential outliers, indicated by the dots above the upper whisker. These outliers could influence the regression model.
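To see exactly which observations the box plot flags (an addition, using the standard 1.5 × IQR whisker rule that matplotlib and seaborn apply by default), a short check could look like this:

# flag residuals outside the 1.5*IQR whiskers, matching the box plot's outlier rule
q1, q3 = np.percentile(residuals, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = residuals[(residuals < lower) | (residuals > upper)]
print("number of potential outliers:", outliers.size)
print(outliers)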
Conclusion
Advantages
Disadvantages
References
Thank you for reading...