Day 01 - Linear Regression
Ime Eti-mfon
Data Scientist | Machine Learning Engineer | Data Program Community Ambassador @ ALX
CONCEPT
Linear regression is a statistical method employed to model the relationship between a dependent variable (target) and one or more independent variables (features). The aim is to identify the linear equation that most accurately predicts the target variable based on the feature variables.
The equation of a simple linear regression model is:
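\[ y = mx + c \]
where y is the dependent variable (target), x is the independent variable (feature), m is the slope of the line, and c is the intercept (the value of y when x = 0).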
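To make the equation concrete, here is a minimal sketch of how the slope m and the intercept c can be estimated by ordinary least squares when there is a single feature. The helper name `fit_line` and the three sample points are illustrative assumptions, not part of the original example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (m, c) of the least-squares line y = m*x + c for one feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    c = y_bar - m * x_bar
    return m, c

# Hypothetical sample: three houses (size in sq ft, price in $)
sizes = [1500, 1600, 1700]
prices = [300000, 320000, 340000]

m, c = fit_line(sizes, prices)
print(f"slope m = {m}, intercept c = {c}")
```

Libraries such as scikit-learn solve the same least-squares problem, generalized to several features.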
IMPLEMENTATION
Let's consider an example using Python and its libraries.
Example
Suppose we have a dataset of house sizes (in square feet) and their corresponding prices:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import warnings # To remove warnings from my output
warnings.simplefilter(action = 'ignore')
# Example Data
data = {
'Size': [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400],
'Price': [300000, 320000, 340000, 360000, 380000, 400000, 420000, 440000, 460000, 480000]
}
df = pd.DataFrame(data)
df
# Defining Independent variable (feature) and Dependent variable (target)
X = df[['Size']]
y = df['Price']
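The double brackets in `df[['Size']]` matter because scikit-learn estimators expect the features as a two-dimensional structure of shape (n_samples, n_features), even when there is only one feature. The sketch below uses a shortened, hypothetical copy of the data (`df_demo`) to show the difference between the two selections.

```python
import pandas as pd

# Shortened, hypothetical copy of the example data
df_demo = pd.DataFrame({
    'Size': [1500, 1600, 1700],
    'Price': [300000, 320000, 340000],
})

features_2d = df_demo[['Size']]  # list of column names -> DataFrame (two-dimensional)
target_1d = df_demo['Price']     # single column name   -> Series (one-dimensional)

print(type(features_2d).__name__, features_2d.shape)
print(type(target_1d).__name__, target_1d.shape)
```

So the double brackets are not specific to multiple linear regression. They simply keep the single feature in a table with one column.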
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
# Creating and training the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Making predictions
y_pred = model.predict(X_test)
# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')
# Plotting the results
plt.scatter(X, y, color = 'blue') # Original data points
plt.plot(X_test, y_pred, color = 'red', linewidth = 2) # Regression line
plt.xlabel('Size (sq ft)')
plt.ylabel('Price ($)')
plt.title('Linear Regression: House Prices vs Size')
plt.show()
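# Predicting with new values
# Here, we predict the price of a house from its size.
# Note: 3600 sq ft is outside the range of sizes in the data, so this is an extrapolation.
X_new = np.array([[3600]])
y_new = model.predict(X_new) # Separate name, so the test-set predictions in y_pred are kept
print(f'Predicted price for Size = 3600 sq ft: {y_new[0]:.0f}')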
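One way to sanity-check a prediction is to read the fitted slope and intercept from the model and apply y = mx + c by hand. The sketch below refits on the full example data for brevity (an assumption; the walkthrough above fits on the training split only) and uses scikit-learn's `coef_` and `intercept_` attributes. The names `df_check` and `check_model` are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same example data as above, rebuilt so this block runs on its own
df_check = pd.DataFrame({
    'Size': [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400],
    'Price': [300000, 320000, 340000, 360000, 380000,
              400000, 420000, 440000, 460000, 480000],
})

check_model = LinearRegression().fit(df_check[['Size']], df_check['Price'])

m = check_model.coef_[0]     # slope: price change per additional square foot
c = check_model.intercept_   # intercept: predicted price at size 0

# Apply y = m*x + c by hand and compare with model.predict
manual = m * 3600 + c
predicted = check_model.predict(pd.DataFrame({'Size': [3600]}))[0]

print(f'slope = {m:.2f}, intercept = {c:.2f}')
print(f'manual = {manual:.0f}, predict = {predicted:.0f}')
```

Passing the new value as a DataFrame with the same column name as the training data keeps the feature names consistent between fitting and prediction.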
EXPLANATION OF THE CODE
EVALUATION METRICS
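The two metrics printed by the code can be written out by hand. Mean squared error (MSE) is the average of the squared differences between actual and predicted values, and R-squared is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses hypothetical helper names (`mse_manual`, `r2_manual`) and made-up values purely for illustration.

```python
def mse_manual(y_true, y_hat):
    """Mean of squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_hat)) / len(y_true)

def r2_manual(y_true, y_hat):
    """1 - SS_res / SS_tot: share of the target's variance explained by the model."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up actual and predicted prices, for illustration only
actual = [300000, 340000, 380000]
predicted_prices = [310000, 340000, 370000]

print(f'MSE: {mse_manual(actual, predicted_prices):.2f}')
print(f'R-squared: {r2_manual(actual, predicted_prices):.4f}')
```

A lower MSE means predictions are closer to the actual values, while an R-squared closer to 1 means the model explains more of the variation in the target.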