Training and running a linear model using Scikit-Learn

Linear regression is a popular method for modeling the relationship between a continuous target variable and one or more predictor variables. In this task, we'll use Scikit-Learn, a popular machine-learning library in Python, to train and run a simple linear regression model.

The first step is to import the necessary libraries: NumPy for creating the dataset and LinearRegression from Scikit-Learn for training the model. With those in place, we can generate a sample dataset with two features and one target variable; in this introductory example, the dataset has just five records.

Once we have the dataset, we'll create an instance of the LinearRegression model and train it on the dataset using the fit() method. The fit() method adjusts the model parameters to minimize the difference between the predicted and actual target values on the training dataset.

After training the model, we can use it to make predictions on new input data points using the predict() method. In this example, we'll create a new input data point with two features and use the predict() method to compute the corresponding target value.

Finally, we'll print the predicted target value to the console. This value is an estimate of the target for the given input data point, based on the linear relationship learned from the training dataset.
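
Here is a minimal sketch of that basic workflow. The feature values, target values, and the new input point below are made up purely for illustration; they are not part of the larger example that follows.

# Minimal example: train a linear model on a tiny 5-record dataset
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample dataset: 5 records, 2 features each (illustrative values)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([5.0, 4.0, 11.0, 10.0, 15.0])

# Create the model and fit it to the training data
model = LinearRegression()
model.fit(X, y)

# Predict the target for a new input data point with two features
X_new = np.array([[2.5, 3.0]])
print('Predicted target value:', model.predict(X_new)[0])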


The code below demonstrates the process of creating, training, and evaluating a linear regression model on a synthetic dataset with two features. It is divided into several sections, which I will explain one by one:

1. Import necessary libraries: The code imports required libraries such as NumPy, Matplotlib, and scikit-learn to generate data, visualize it, and create a linear regression model.

# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


2. Generate synthetic dataset: A synthetic dataset with 1000 records and 2 features is created. The target variable `y` is calculated using a linear combination of the features, with some added Gaussian noise.

# Generate synthetic dataset with 1000 records
X = np.random.normal(size=(1000, 2))
y = 2*X[:,0] - 3*X[:,1] + np.random.normal(size=1000)

3. Split the dataset: The dataset is split into training (75%) and test (25%) sets using the `train_test_split` function from scikit-learn.

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

4. Visualize the training dataset: A scatter plot is created to visualize the training dataset. The color of the points represents the target variable, and the two axes correspond to the two features.

# Visualize the training dataset
fig, ax = plt.subplots()
ax.scatter(X_train[:,0], X_train[:,1], c=y_train, cmap='viridis')
ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')
ax.set_title('Training Dataset with Continuous Target')
plt.show()


5. Visualize the test dataset: Similarly, a scatter plot is created to visualize the test dataset.

# Visualize the test dataset
fig, ax = plt.subplots()
ax.scatter(X_test[:,0], X_test[:,1], c=y_test, cmap='viridis')
ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')
ax.set_title('Test Dataset with Continuous Target')
plt.show()

6. Create a Linear Regression model: An instance of the `LinearRegression` class from scikit-learn is created.

# Create a Linear Regression model
model = LinearRegression()

7. Train the model: The model is trained on the training set using the `fit` method.

# Train the model on the training set
model.fit(X_train, y_train)
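
As a quick sanity check (not part of the original listing), you can inspect the fitted parameters; they should land close to the coefficients 2 and -3 used to generate the data, with an intercept near zero:

# Optional: inspect the learned parameters
print('Coefficients:', model.coef_)    # expected to be close to [2, -3]
print('Intercept:', model.intercept_)  # expected to be close to 0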

8. Make predictions: The trained model is used to make predictions on the test set.

# Make predictions on the test set
y_pred = model.predict(X_test)        


9. Plot the actual and predicted target values: A scatter plot is created to compare the actual target values with the predicted values on the test set. The actual values are in red, while the predicted values are in blue.

# Plot the actual and predicted target values on the test set
fig, ax = plt.subplots()
ax.scatter(range(len(y_test)), y_test, color='red', label='Actual')
ax.scatter(range(len(y_pred)), y_pred, color='blue', label='Predicted')
ax.set_xlabel('Index')
ax.set_ylabel('Target Value')
ax.set_title('Actual vs. Predicted Target Values on Test Set')
ax.legend()
plt.show()

10. Compute the mean squared error (MSE): The MSE between the predicted and actual target values is calculated using the `mean_squared_error` function from scikit-learn.

# Compute the mean squared error between the predicted and actual target values
mse = mean_squared_error(y_test, y_pred)

11. Print the mean squared error: The computed MSE is printed to the console.

# Print the mean squared error
print('Mean Squared Error:', mse)

The result is: Mean Squared Error: 0.9122092569715904

Full code is here:


# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


# Generate synthetic dataset with 1000 records
X = np.random.normal(size=(1000, 2))
y = 2*X[:,0] - 3*X[:,1] + np.random.normal(size=1000)


# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)


# Visualize the training dataset
fig, ax = plt.subplots()
ax.scatter(X_train[:,0], X_train[:,1], c=y_train, cmap='viridis')
ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')
ax.set_title('Training Dataset with Continuous Target')
plt.show()


# Visualize the test dataset
fig, ax = plt.subplots()
ax.scatter(X_test[:,0], X_test[:,1], c=y_test, cmap='viridis')
ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')
ax.set_title('Test Dataset with Continuous Target')
plt.show()


# Create a Linear Regression model
model = LinearRegression()


# Train the model on the training set
model.fit(X_train, y_train)


# Make predictions on the test set
y_pred = model.predict(X_test)


# Plot the actual and predicted target values on the test set
fig, ax = plt.subplots()
ax.scatter(range(len(y_test)), y_test, color='red', label='Actual')
ax.scatter(range(len(y_pred)), y_pred, color='blue', label='Predicted')
ax.set_xlabel('Index')
ax.set_ylabel('Target Value')
ax.set_title('Actual vs. Predicted Target Values on Test Set')
ax.legend()
plt.show()


# Compute the mean squared error between the predicted and actual target values
mse = mean_squared_error(y_test, y_pred)


# Print the mean squared error
print('Mean Squared Error:', mse)

Question: What is the purpose of the mean squared error in the Linear Regression model?

Answer: The mean squared error (MSE) is a measure of the average squared difference between the predicted and actual target values in the Linear Regression model. It is commonly used as a performance metric to evaluate the accuracy of the model's predictions on the test set. A lower value of MSE indicates that the model has better predictive power, as it means that the predicted values are closer to the actual values. Therefore, the purpose of the mean squared error in the Linear Regression model is to assess the quality of the model's predictions and to determine if it needs to be further optimized.
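
As a small illustration, the MSE can also be computed directly from its definition, the mean of the squared differences between the actual and predicted values, which matches the value returned by scikit-learn's mean_squared_error:

# MSE computed by hand; equivalent to mean_squared_error(y_test, y_pred)
mse_manual = np.mean((y_test - y_pred) ** 2)
print('Manual MSE:', mse_manual)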


#Python #DataScience #MachineLearning #LinearRegression #Numpy #Matplotlib #ScikitLearn #SyntheticData #TrainingSet #TestSet #DataVisualization #ContinuousTarget #PredictiveModeling #Accuracy #MeanSquaredError #RegressionAnalysis #FeatureEngineering #ModelEvaluation #DataSplitting #ModelTraining #ModelTesting #DataAnalysis #StatisticalModeling #SupervisedLearning #DataModelling
