Day 01 - Linear Regression

  • Concept: Predict continuous values
  • Implementation: Ordinary Least Squares
  • Evaluation: R-squared, RMSE

CONCEPT

Linear regression is a statistical method employed to model the relationship between a dependent variable (target) and one or more independent variables (features). The aim is to identify the linear equation that most accurately predicts the target variable based on the feature variables.

The equation of a simple linear regression model is:

y = mx + c

where:

  • y is the predicted value
  • x is the independent variable (feature)
  • m is the slope of the line (coefficient)
  • c is the y-intercept
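
To make the Ordinary Least Squares idea concrete, here is a minimal sketch (not part of the original notebook) that estimates m and c directly with NumPy's least-squares solver on a few made-up points:

import numpy as np

# Made-up data points (x, y) for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.9, 5.1, 7.0, 9.2, 10.8])

# Design matrix [x, 1] so that the solution vector is [m, c]
A = np.column_stack([x, np.ones_like(x)])

# Ordinary Least Squares: minimise the sum of squared residuals ||A @ [m, c] - y||^2
(m, c), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f'm = {m:.3f}, c = {c:.3f}')               # Fitted slope and intercept
print(f'Prediction at x = 6: {m * 6 + c:.2f}')

scikit-learn's LinearRegression, used in the implementation below, performs this same least-squares fit behind the scenes.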

IMPLEMENTATION

Let's consider an example using Python and its libraries.

Example

Suppose we have a dataset with house prices and their corresponding size (in square feet):

# Import necessary libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

import warnings                                            # To remove warnings from my output
warnings.simplefilter(action='ignore')

# Example Data

data = {
    'Size': [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400],
    'Price': [300000, 320000, 340000, 360000, 380000, 400000, 420000, 440000, 460000, 480000]
}
df = pd.DataFrame(data)
df
        
# Defining Independent variable (feature) and Dependent variable (target)

X = df[['Size']]
y = df['Price']

# Splitting the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and training the linear regression model

model = LinearRegression()
model.fit(X_train, y_train)

# Making predictions

y_pred = model.predict(X_test)

# Evaluating the model

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')

# Plotting the results

plt.scatter(X, y, color='blue')                            # Original data points
plt.plot(X_test, y_pred, color='red', linewidth=2)         # Regression line
plt.xlabel('Size (sq ft)')
plt.ylabel('Price ($)')
plt.title('Linear Regression: House Prices vs Size')
plt.show()

# Predicting with new values
# Here, we want to predict the price of a house given its size

X_new = np.array([[3600]])
y_pred = model.predict(X_new)
print(f'Predicted value for X = 3600: {y_pred[0]:.0f}')        
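
As a quick sanity check (not in the original notebook), the fitted slope and intercept can be read off the trained model and tied back to the y = mx + c form from the concept section. This assumes the `model` fitted above is still in memory:

# Inspecting the learned line y = mx + c (assumes the fitted `model` from above)
m = model.coef_[0]                 # slope: estimated price increase per extra square foot
c = model.intercept_               # y-intercept
print(f'Slope (m): {m:.2f}')
print(f'Intercept (c): {c:.2f}')
print(f'Manual prediction for 3600 sq ft: {m * 3600 + c:.0f}')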

EXPLANATION OF THE CODE

  1. Libraries: We import necessary libraries like numpy, pandas, sklearn, and matplotlib.
  2. Data Preparation: We create a DataFrame containing the size and price of houses.
  3. Feature and Target: We separate the feature (Size) and the target (Price).
  4. Train-Test-Split: We split the data into training and testing sets.
  5. Model Training: We create and train a LinearRegression model using the training data.
  6. Predictions: We use the trained model to predict house prices for the test set.
  7. Evaluation: We evaluate the model using Mean Squared Error (MSE) and R-squared (R2) metrics.
  8. Visualization: We plot the original data points and the regression line to visualize the model’s performance.

EVALUATION METRICS

  • Mean Squared Error (MSE): Measures the average squared difference between the actual and predicted values. Lower values indicate better performance.
  • R-squared (R2): Represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s). Values closer to 1 indicate a better fit.
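
For intuition, both metrics can also be computed by hand. The snippet below is a small, self-contained sketch with illustrative numbers (not the article's data); with the example above you would pass y_test and y_pred instead:

import numpy as np

y_true = np.array([300000.0, 360000.0, 420000.0])   # actual values (illustrative)
y_hat  = np.array([305000.0, 355000.0, 425000.0])   # predicted values (illustrative)

# Mean Squared Error: average squared difference between actual and predicted values
mse = np.mean((y_true - y_hat) ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r2 = 1 - ss_res / ss_tot

print(f'MSE: {mse:.0f}')
print(f'R-squared: {r2:.4f}')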


Download the Jupyter Notebook file for Day 01 here.


COMMENTS

John AFE

Data Scientist | Data Science, Machine Learning

2 months ago

I'll soon start my 30 days data science series too when I'm done with 30 days of LeetCode. This is great; I'm hoping to see how you progress, Ime Eti-mfon. Keep up the good work.

Selong Edem

Ph.D. PTDF OSS Germany 2023 Scholar | Petroleum Geoscientist | Lecturer, University of Calabar | Exxon Mobil 2006-2010 B.Sc Scholar

2 months ago

Love this

Yusufu Gambo

HCI | Applied AI | Data Analytics | AI Ethics

2 months ago

Nice. Please, how do I get the remaining days?

Nwachukwu Jesse

Machine Learning Engineer | Mechatronics Engineer | ML Specialist | Building Innovative Solutions with Machine Learning and AI | Future Tech Entrepreneur

2 months ago

This is nice. I've finally learnt how to disable warnings from my output. Also, why did you use two square brackets when declaring X and X_new? Isn't that meant for multiple linear regression?
