Training and running a linear model using Scikit-Learn

Linear regression is a popular method for modeling the relationship between a continuous target variable and one or more predictor variables. In this task, we'll use Scikit-Learn, a popular machine-learning library in Python, to train and run a simple linear regression model.

The first step is to import the necessary libraries: NumPy for creating the dataset and LinearRegression from Scikit-Learn for training the model. With those in place, we can generate a sample dataset with two features and one target variable; in this introductory example, the dataset has just five records.

Once we have the dataset, we'll create an instance of the LinearRegression model and train it on the dataset using the fit() method. The fit() method adjusts the model parameters to minimize the difference between the predicted and actual target values on the training dataset.

After training the model, we can use it to make predictions on new input data points using the predict() method. In this example, we'll create a new input data point with two features and use the predict() method to compute the corresponding target value.

Finally, we'll print the predicted target value to the console. This value is an estimate of the target for the given input data point, based on the linear relationship learned from the training dataset.
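
Here is a minimal sketch of that basic workflow. The feature values, target values, and the new input point below are made up purely for illustration; they are not part of the larger example that follows.

# Minimal example: train a linear model on a tiny 5-record dataset
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample dataset: 5 records, 2 features each (illustrative values)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([5.0, 4.0, 11.0, 10.0, 15.0])

# Create the model and fit it to the training data
model = LinearRegression()
model.fit(X, y)

# Predict the target for a new input data point with two features
X_new = np.array([[2.5, 3.0]])
print('Predicted target value:', model.predict(X_new)[0])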


The code below demonstrates the process of creating, training, and evaluating a linear regression model on a synthetic dataset with two features. It is divided into several sections, which I will explain one by one:

1. Import necessary libraries: The code imports required libraries such as NumPy, Matplotlib, and scikit-learn to generate data, visualize it, and create a linear regression model.

# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


2. Generate synthetic dataset: A synthetic dataset with 1000 records and 2 features is created. The target variable `y` is calculated using a linear combination of the features, with some added Gaussian noise.

# Generate synthetic dataset with 1000 records
X = np.random.normal(size=(1000, 2))
y = 2*X[:,0] - 3*X[:,1] + np.random.normal(size=1000)

3. Split the dataset: The dataset is split into training (75%) and test (25%) sets using the `train_test_split` function from scikit-learn.

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

4. Visualize the training dataset: A scatter plot is created to visualize the training dataset. The color of the points represents the target variable, and the two axes correspond to the two features.

# Visualize the training dataset
fig, ax = plt.subplots()
ax.scatter(X_train[:,0], X_train[:,1], c=y_train, cmap='viridis')
ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')
ax.set_title('Training Dataset with Continuous Target')
plt.show()


5. Visualize the test dataset: Similarly, a scatter plot is created to visualize the test dataset.

# Visualize the test dataset
fig, ax = plt.subplots()
ax.scatter(X_test[:,0], X_test[:,1], c=y_test, cmap='viridis')
ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')
ax.set_title('Test Dataset with Continuous Target')
plt.show()

6. Create a Linear Regression model: An instance of the `LinearRegression` class from scikit-learn is created.

# Create a Linear Regression model
model = LinearRegression()

7. Train the model: The model is trained on the training set using the `fit` method.

# Train the model on the training set
model.fit(X_train, y_train)
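
As a quick sanity check (not part of the original listing), you can inspect the fitted parameters; they should land close to the coefficients 2 and -3 used to generate the data, with an intercept near zero:

# Optional: inspect the learned parameters
print('Coefficients:', model.coef_)    # expected to be close to [2, -3]
print('Intercept:', model.intercept_)  # expected to be close to 0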

8. Make predictions: The trained model is used to make predictions on the test set.

# Make predictions on the test set
y_pred = model.predict(X_test)        


9. Plot the actual and predicted target values: A scatter plot is created to compare the actual target values with the predicted values on the test set. The actual values are in red, while the predicted values are in blue.

# Plot the actual and predicted target values on the test set
fig, ax = plt.subplots()
ax.scatter(range(len(y_test)), y_test, color='red', label='Actual')
ax.scatter(range(len(y_pred)), y_pred, color='blue', label='Predicted')
ax.set_xlabel('Index')
ax.set_ylabel('Target Value')
ax.set_title('Actual vs. Predicted Target Values on Test Set')
ax.legend()
plt.show()

10. Compute the mean squared error (MSE): The MSE between the predicted and actual target values is calculated using the `mean_squared_error` function from scikit-learn.

# Compute the mean squared error between the predicted and actual target values
mse = mean_squared_error(y_test, y_pred)

11. Print the mean squared error: The computed MSE is printed to the console.

# Print the mean squared error
print('Mean Squared Error:', mse)

The result is: Mean Squared Error: 0.9122092569715904

Full code is here:


# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


# Generate synthetic dataset with 1000 records
X = np.random.normal(size=(1000, 2))
y = 2*X[:,0] - 3*X[:,1] + np.random.normal(size=1000)


# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)


# Visualize the training dataset
fig, ax = plt.subplots()
ax.scatter(X_train[:,0], X_train[:,1], c=y_train, cmap='viridis')
ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')
ax.set_title('Training Dataset with Continuous Target')
plt.show()


# Visualize the test dataset
fig, ax = plt.subplots()
ax.scatter(X_test[:,0], X_test[:,1], c=y_test, cmap='viridis')
ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')
ax.set_title('Test Dataset with Continuous Target')
plt.show()


# Create a Linear Regression model
model = LinearRegression()


# Train the model on the training set
model.fit(X_train, y_train)


# Make predictions on the test set
y_pred = model.predict(X_test)


# Plot the actual and predicted target values on the test set
fig, ax = plt.subplots()
ax.scatter(range(len(y_test)), y_test, color='red', label='Actual')
ax.scatter(range(len(y_pred)), y_pred, color='blue', label='Predicted')
ax.set_xlabel('Index')
ax.set_ylabel('Target Value')
ax.set_title('Actual vs. Predicted Target Values on Test Set')
ax.legend()
plt.show()


# Compute the mean squared error between the predicted and actual target values
mse = mean_squared_error(y_test, y_pred)


# Print the mean squared error
print('Mean Squared Error:', mse)

Question: What is the purpose of the mean squared error in the Linear Regression model?

Answer: The mean squared error (MSE) is a measure of the average squared difference between the predicted and actual target values in the Linear Regression model. It is commonly used as a performance metric to evaluate the accuracy of the model's predictions on the test set. A lower value of MSE indicates that the model has better predictive power, as it means that the predicted values are closer to the actual values. Therefore, the purpose of the mean squared error in the Linear Regression model is to assess the quality of the model's predictions and to determine if it needs to be further optimized.
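
As a small illustration, the MSE can also be computed directly from its definition, the mean of the squared differences between the actual and predicted values, which matches the value returned by scikit-learn's mean_squared_error:

# MSE computed by hand; equivalent to mean_squared_error(y_test, y_pred)
mse_manual = np.mean((y_test - y_pred) ** 2)
print('Manual MSE:', mse_manual)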


#Python #DataScience #MachineLearning #LinearRegression #Numpy #Matplotlib #ScikitLearn #SyntheticData #TrainingSet #TestSet #DataVisualization #ContinuousTarget #PredictiveModeling #Accuracy #MeanSquaredError #RegressionAnalysis #FeatureEngineering #ModelEvaluation #DataSplitting #ModelTraining #ModelTesting #DataAnalysis #StatisticalModeling #SupervisedLearning #DataModelling
