Day 01 - Linear Regression

  • Concept: Predict continuous values
  • Implementation: Ordinary Least Squares
  • Evaluation: R-squared, RMSE

CONCEPT

Linear regression is a statistical method employed to model the relationship between a dependent variable (target) and one or more independent variables (features). The aim is to identify the linear equation that most accurately predicts the target variable based on the feature variables.

The equation of a simple linear regression model is:

y = mx + c

where:

  • y is the predicted value
  • x is the independent variable (feature)
  • m is the slope of the line (coefficient)
  • c is the y-intercept
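
To make the Ordinary Least Squares idea concrete, here is a minimal sketch (not part of the original notebook) that estimates m and c directly with NumPy's least-squares solver on a few made-up points:

import numpy as np

# Made-up data points (x, y) for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.9, 5.1, 7.0, 9.2, 10.8])

# Design matrix [x, 1] so that the solution vector is [m, c]
A = np.column_stack([x, np.ones_like(x)])

# Ordinary Least Squares: minimise the sum of squared residuals ||A @ [m, c] - y||^2
(m, c), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f'm = {m:.3f}, c = {c:.3f}')               # Fitted slope and intercept
print(f'Prediction at x = 6: {m * 6 + c:.2f}')

scikit-learn's LinearRegression, used in the implementation below, performs this same least-squares fit behind the scenes.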

IMPLEMENTATION

Let's consider an example using Python and its libraries.

Example

Suppose we have a dataset with house prices and their corresponding size (in square feet):

# Import necessary libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

import warnings                                            # To remove warnings from my output
warnings.simplefilter(action='ignore')

# Example Data

data = {
    'Size': [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400],
    'Price': [300000, 320000, 340000, 360000, 380000, 400000, 420000, 440000, 460000, 480000]
}
df = pd.DataFrame(data)
df
        
# Defining Independent variable (feature) and Dependent variable (target)

X = df[['Size']]
y = df['Price']

# Splitting the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and training the linear regression model

model = LinearRegression()
model.fit(X_train, y_train)

# Making predictions

y_pred = model.predict(X_test)

# Evaluating the model

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')

# Plotting the results

plt.scatter(X, y, color='blue')                            # Original data points
plt.plot(X_test, y_pred, color='red', linewidth=2)         # Regression line
plt.xlabel('Size (sq ft)')
plt.ylabel('Price ($)')
plt.title('Linear Regression: House Prices vs Size')
plt.show()

# Predicting with new values
# Here, we want to predict the price of a house given its size

X_new = np.array([[3600]])
y_pred = model.predict(X_new)
print(f'Predicted value for X = 3600: {y_pred[0]:.0f}')        
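
As a quick sanity check (not in the original notebook), the fitted slope and intercept can be read off the trained model and tied back to the y = mx + c form from the concept section. This assumes the `model` fitted above is still in memory:

# Inspecting the learned line y = mx + c (assumes the fitted `model` from above)
m = model.coef_[0]                 # slope: estimated price increase per extra square foot
c = model.intercept_               # y-intercept
print(f'Slope (m): {m:.2f}')
print(f'Intercept (c): {c:.2f}')
print(f'Manual prediction for 3600 sq ft: {m * 3600 + c:.0f}')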

EXPLANATION OF THE CODE

  1. Libraries: We import necessary libraries like numpy, pandas, sklearn, and matplotlib.
  2. Data Preparation: We create a DataFrame containing the size and price of houses.
  3. Feature and Target: We separate the feature (Size) and the target (Price).
  4. Train-Test-Split: We split the data into training and testing sets.
  5. Model Training: We create and train a LinearRegression model using the training data.
  6. Predictions: We use the trained model to predict house prices for the test set.
  7. Evaluation: We evaluate the model using Mean Squared Error (MSE) and R-squared (R2) metrics.
  8. Visualization: We plot the original data points and the regression line to visualize the model’s performance.

EVALUATION METRICS

  • Mean Squared Error (MSE): Measures the average squared difference between the actual and predicted values. Lower values indicate better performance.
  • R-squared (R2): Represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s). Values closer to 1 indicate a better fit.
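
For intuition, both metrics can also be computed by hand. The snippet below is a small, self-contained sketch with illustrative numbers (not the article's data); with the example above you would pass y_test and y_pred instead:

import numpy as np

y_true = np.array([300000.0, 360000.0, 420000.0])   # actual values (illustrative)
y_hat  = np.array([305000.0, 355000.0, 425000.0])   # predicted values (illustrative)

# Mean Squared Error: average squared difference between actual and predicted values
mse = np.mean((y_true - y_hat) ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r2 = 1 - ss_res / ss_tot

print(f'MSE: {mse:.0f}')
print(f'R-squared: {r2:.4f}')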


Download the Jupyter Notebook file for Day 01 here.


COMMENTS

John AFE

Data Scientist | Data Science, Machine Learning

2 months ago

I'll soon start my 30 days data science series too when I'm done with 30 days of LeetCode. This is great; I'm hoping to see how you progress, Ime Eti-mfon. Keep up the good work.

Selong Edem

Ph.D. PTDF OSS Germany 2023 Scholar | Petroleum Geoscientist | Lecturer, University of Calabar | Exxon Mobil 2006-2010 B.Sc Scholar

2 months ago

Love this

Yusufu Gambo

HCI | Applied AI | Data Analytics | AI Ethics

2 months ago

Nice. Please, how do I get the remaining days?

Nwachukwu Jesse

Machine Learning Engineer | Mechatronics Engineer | ML Specialist | Building Innovative Solutions with Machine Learning and AI | Future Tech Entrepreneur

2 months ago

This is nice. I've finally learnt how to disable warnings from my output. Also, why did you use two square brackets when declaring X and X_new? Isn't that meant for multiple linear regression?
