My First Machine Learning Model. Simple Linear Regression Model to predict the profit of companies.
Ahmad Nawaz
Data | Python | SQL | R | Power BI | Data Visualisation | Data Storytelling | Data Analysis
Importing necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
data = pd.read_csv('1000_Companies.csv')
data
Splitting Data into Dependent and Independent Variables
X = data[['R&D Spend', 'Administration', 'Marketing Spend', 'State']]
X
y = data['Profit']
y
Our dataset contains text data in the "State" column. Linear regression, like most machine learning models, needs numeric inputs, so we convert this column into numeric columns using one-hot encoding.
X['State'].unique()
X = pd.get_dummies(X, columns = ['State'])
X
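To make the encoding step concrete, here is a minimal sketch on a toy frame. The values and state names are made up for illustration and are not taken from 1000_Companies.csv. It also shows the optional `drop_first=True` argument, which drops one of the state columns; this is an alternative to the call above, not something the model here requires.

```python
import pandas as pd

# Toy frame standing in for the real dataset (made-up values, for illustration only).
toy = pd.DataFrame({
    "R&D Spend": [100.0, 200.0, 300.0],
    "State": ["New York", "California", "Florida"],
})

# One new 0/1 column per distinct state; each row has exactly one 1 among them.
encoded = pd.get_dummies(toy, columns=["State"], dtype=int)
encoded_columns = list(encoded.columns)

# Optional variant: drop the first state column so the remaining
# dummy columns are not perfectly collinear with the intercept.
encoded_reduced = pd.get_dummies(toy, columns=["State"], drop_first=True, dtype=int)
reduced_columns = list(encoded_reduced.columns)

print(encoded)
print(encoded_reduced)
```

With the full encoding every state gets its own column; with `drop_first=True` the dropped state becomes the baseline that the other state coefficients are measured against.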
Model Building Using train_test_split from scikit-learn
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
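As a quick illustration of what `test_size = 0.2` does, the sketch below splits a small synthetic array (10 rows, not the article's data) and checks the resulting shapes. The names `X_demo` and `y_demo` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10 rows, 2 features (not the article's dataset).
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# 20% of the rows are held out for testing; random_state makes the split repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

shapes = (X_tr.shape, X_te.shape)
print(shapes)
```

Every row ends up in exactly one of the two sets, so the model is later evaluated on rows it never saw during fitting.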
We want to predict profit, which is a continuous variable, so this is a regression problem. We will use linear regression.
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(X_train, y_train)
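After fitting, a linear regression model stores one coefficient per feature plus an intercept. The sketch below uses synthetic, noise-free data with a known relationship (an assumption for illustration, not the company data) to show that `coef_` and `intercept_` recover it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known linear relationship: y = 3*x0 - 2*x1 + 5.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 100, size=(50, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 5.0

model = LinearRegression().fit(X_demo, y_demo)

coefs = model.coef_            # one weight per feature column
intercept = model.intercept_   # constant term

print(coefs, intercept)
```

On the real model, `reg.coef_` can be paired with `X_train.columns` in the same way to see how much each feature contributes to the predicted profit.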
Prediction
y_predict = reg.predict(X_test)
y_predict
Model Evaluation using R-Squared Value
from sklearn.metrics import r2_score
r2_score(y_test, y_predict)
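The R-squared score is 0.91, which means the model explains about 91% of the variance in profit on the test set. This is not the same as 91% accuracy: R-squared measures how much better the predictions are than always predicting the mean profit.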
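R-squared compares the model's squared error with the squared error of always predicting the mean. A minimal sketch with made-up numbers (not the article's predictions) computes it by hand and checks the result against `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up true values and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.5])

# Residual sum of squares: error of the model's predictions.
ss_res = np.sum((y_true - y_pred) ** 2)
# Total sum of squares: error of always predicting the mean.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
r2_sklearn = r2_score(y_true, y_pred)

print(r2_manual, r2_sklearn)
```

A value of 1 means perfect predictions, 0 means no better than predicting the mean, and negative values are possible when the model does worse than the mean.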