Salary Prediction with Python
Dr. Fatma Ben Mesmia Chaabouni
Assistant Professor in Computer Science @ CU Ulster University, Qatar |Ph.D. in CS | MSc_B.Sc. in CS| NLP-AI and Data Analytics- Blockchain researcher | MBA mentor| Tunisian AI Society Member
An Engaging Introduction to Simple Linear Regression
Have you ever considered predicting your salary? While many of us passively await our paycheck, let us explore how to use Python for salary forecasting. This guide will teach you to employ Simple Linear Regression to analyze and predict salaries based on years of experience.
Why Predict Salaries?
Imagine understanding your earning potential based on your experience—it's like having a data-driven crystal ball! This insight can enhance negotiations, career planning, and more.
Step 1: Setting the Stage
Begin by importing essential Python libraries for data handling, graph plotting, and model building.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Load the dataset with two columns: YearsExperience and Salary.
dataset = pd.read_csv("Salary_Data.csv")
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
Step 2: Splitting the Dataset
Before jumping into modeling, we need to split our data into training and testing sets. This way, we can train our model and then test its accuracy on unseen data.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)
Step 3: Training the Model
Now, let us train our Simple Linear Regression model to establish the relationship between experience and salary.
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
Step 4: Making Predictions
Now that our model is trained, we can predict salaries for the test set—this is where the magic happens!
pred = regressor.predict(x_test)
Step 5: Evaluating the Model
We should evaluate the accuracy of our predictions using the Root Mean Squared Error (RMSE), which indicates the deviation of our predictions from the actual values.
领英推荐
from sklearn.metrics import mean_squared_error
from math import sqrt
rmse = sqrt(mean_squared_error(y_test, pred))
print("RMSE:", rmse)
Step 6: Visualizing the Results
Visualizations are essential for grasping our model. Let's create plots to evaluate its fit to the data.
Training Set Results:
plt.scatter(x_train, y_train, color="blue")
plt.plot(x_train, regressor.predict(x_train), color="green")
plt.title("Training set (Salary vs Experience)")
plt.xlabel("Experience in years")
plt.ylabel("Salary")
plt.show()
Test Set Results:
plt.scatter(x_test, y_test, color="blue")
plt.plot(x_train, regressor.predict(x_train), color="green")
plt.title("Test set (Salary vs Experience)")
plt.xlabel("Experience in years")
plt.ylabel("Salary")
plt.show()
Combined Visualization:
plt.scatter(x_train, y_train, color="blue")
plt.scatter(x_test, y_test, color="red")
plt.plot(x_train, regressor.predict(x_train), color="green")
plt.title("Salary vs Experience")
plt.xlabel("Experience in years")
plt.ylabel("Salary")
plt.show()
Data and Model Complexity
Let's examine the dataset we used.
This dataset clearly shows the impact of experience on salary, demonstrated by a fitted linear model.
Task Complexity
Despite its performance, Simple Linear Regression has complexities in many key areas:
Predicting salaries may seem futuristic, but with Python and Simple Linear Regression, it is attainable.
Next time you wonder about your salary growth, remember: data and Python have the answers. Happy coding!
#DataScience #MachineLearning #Python #LinearRegression #PredictiveAnalytics #SalaryPrediction #CareerPlanning #AI #Tech #BigData #Programming #DataVisualization #Analytics #PythonProgramming