Machine Learning 101 - Part 2- A Tutorial on Simple Linear Regression
In my previous article Machine Learning 101 - Part 1- A Tutorial for Data Preprocessing, we learned how to obtain a dataset, fill in missing data, transform categorical data, and perform feature scaling. From this article on, we will dive right into the pool of machine learning algorithms. The first algorithm I am going to introduce is simple linear regression. In this model there is only one independent variable, and it assumes that the dependent variable has a linear relationship with it. If x is the independent variable and y is the dependent variable, the simple linear regression model can be expressed mathematically as y = b + a * x, where b is the intercept and a is the coefficient. Let's take a look at part of the data we are working with:
Assume we have surveyed 100 publicly traded companies and collected their annual revenue and R&D expense data. We would like to find out whether there is a linear relationship between R&D expense and annual revenue. The data provided here is very clean: there is no missing data, and there are neither currency signs in front of the numbers nor commas separating the digits. If your real-world data contains dollar signs or commas, you can refer to the Python script in my Github repo, Simple Linear Regression, for sanitizing the data.
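To give a feel for that kind of cleanup, here is a minimal sketch of stripping dollar signs and thousands separators with pandas. The column names and values are made up for illustration; the script in the repo is the authoritative version:

```python
import pandas as pd

# Hypothetical messy survey data (column names are assumptions,
# not the actual columns in revenue_rdexpense.csv)
raw = pd.DataFrame({
    'Revenue': ['$1,200', '$850', '$2,430'],
    'RD_Expense': ['$120', '$85', '$240'],
})

# Remove '$' and ',' from every column, then convert to numbers
clean = raw.apply(
    lambda col: pd.to_numeric(col.str.replace('[$,]', '', regex=True)))
```

After this step the columns are plain numeric types, ready for the workflow below.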
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('revenue_rdexpense.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
First, we import the dataset from the revenue_rdexpense.csv file and assign revenue as the independent variable and R&D expense as the dependent variable. Since we have a limited amount of data and want to use all of it for our simple linear regression model, we don't split the dataset this time. Also, linear regression does not require feature scaling: the fitted coefficient and intercept simply absorb the scale of the data, so we don't need to scale it explicitly.
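One detail worth checking at this point is the shape of the arrays: scikit-learn expects X to be 2-dimensional (samples by features) and y to be 1-dimensional, which is exactly what the slicing above produces. A small sketch with a stand-in DataFrame (the column names are assumptions):

```python
import pandas as pd

# Stand-in for revenue_rdexpense.csv (column names are assumptions)
dataset = pd.DataFrame({'Revenue': [100, 200, 300],
                        'RD_Expense': [12, 22, 31]})

X = dataset.iloc[:, :-1].values  # 2-D array, shape (n_samples, 1)
y = dataset.iloc[:, 1].values    # 1-D array, shape (n_samples,)
```

Using `iloc[:, :-1]` rather than `iloc[:, 0]` is what keeps X 2-dimensional; passing a 1-D X to `fit` would raise an error.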
# Fitting Simple Linear Regression to the data set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X, y)
The actual linear regression fitting takes only three lines of code. To perform simple linear regression, we import the LinearRegression class from the sklearn.linear_model module. LinearRegression can handle not only simple linear regression with one independent variable but also multiple linear regression with several independent variables (the topic of the next article). We first create an instance of LinearRegression called regressor, then call its fit method on X and y. That is it! The fitting is done.
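Once fitted, the regressor can predict the dependent variable for new inputs via its predict method. A self-contained sketch with small synthetic numbers standing in for the revenue and R&D figures:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data: revenue (X) vs. R&D expense (y)
X = np.array([[100.0], [200.0], [300.0], [400.0]])
y = np.array([12.0, 22.0, 31.0, 42.0])

regressor = LinearRegression()
regressor.fit(X, y)

# Predict R&D expense for a hypothetical company with revenue 250
pred = regressor.predict(np.array([[250.0]]))
```

Note that predict, like fit, expects a 2-D array, even for a single sample.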
The next step is to see how well the fitted line matches the original values. Note that in the beginning we imported matplotlib.pyplot as plt; it is a very useful tool for visualizing regression results. Here is how it is used:
# Visualising the Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, regressor.predict(X), color = 'blue')
plt.title('Simple Linear Regression Model')
plt.xlabel('Revenue(Million Dollars)')
plt.ylabel('R&D Expense(Million Dollars)')
plt.show()
We use the scatter method of plt to plot the original y values against X, then use plt.plot to draw the predicted y values against X. We also specify the title and the x- and y-axis labels on the graph. Here is what the graph looks like:
Visually, the scattered points lie near the fitted line, which indicates that R&D expense does have a linear relationship with revenue to some extent. The regressor also provides other attributes that help us gain more insight into the fit, such as the coefficients and intercept of the linear regression:
# Print the coefficients and intercept
print('Coefficients: \n', regressor.coef_)
print('Intercept: \n', regressor.intercept_)
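Beyond eyeballing the plot, the fit can be quantified: LinearRegression's score method returns the coefficient of determination R², which is 1.0 for a perfect fit. A minimal sketch with synthetic, perfectly linear data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Perfectly linear synthetic data: y = 2 * x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

regressor = LinearRegression()
regressor.fit(X, y)

# R^2 coefficient of determination; 1.0 means a perfect fit
r2 = regressor.score(X, y)
```

On real, noisy data the R² value will fall below 1.0, and gives a single-number summary of how much variance the line explains.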
Conclusion
In this article, we have shown how to use the LinearRegression class to perform simple linear regression with just one independent variable. It is easy to implement and quite powerful. But in reality, things are more complicated: often an outcome depends on multiple factors. How can we model that? That is the problem multiple linear regression tries to solve. In my next article, Machine Learning 101 - Part 3- A Tutorial for Multiple Linear Regression, I am going to introduce the more powerful multiple linear regression algorithm, as well as the backward elimination procedure for obtaining the best-fitting model. Stay tuned.
References
You can download the Python script and dataset for this tutorial from my Github repo: Simple Linear Regression.