Class 18 - EVALUATION METRICS FOR DIFFERENT MODELS
Notes from the AI Basic Course by Irfan Malik & Dr Sheraz Naseer (Xeven Solutions)
A model's performance depends on its data.
If the data is not good, the output will be garbage.
That's why Dr. Sheraz Sb says:
Garbage In, Garbage Out.
From the previous lesson, we saw that the MSE value was too high.
So, what do we do now?
We have to analyze the data further, i.e., data pre-processing.
Data cleaning is part of data pre-processing.
We can apply different models in this scenario.
MSE can be reduced but can't be brought all the way to zero.
In this case, you can apply other models as well, like Decision Tree, etc.
BIG VALUES DOMINATE IN A MODEL: the model gives more weightage to big values, which is why features should be scaled before training.
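As a quick illustration of this point, here is a minimal sketch (using made-up toy numbers, not the housing data) of how StandardScaler puts features with very different magnitudes on a comparable scale:
{# Sketch: why scaling matters (toy numbers, not the course data)
import numpy as np
from sklearn.preprocessing import StandardScaler

# One feature in the millions, one in single digits
X = np.array([[1_000_000.0, 2.0],
              [2_000_000.0, 3.0],
              [3_000_000.0, 4.0]])

# Without scaling, the first column dominates any distance/weight computation;
# after scaling, both columns have mean 0 and unit variance
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled)}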
{import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import sklearn
import seaborn as sns
plt.rcParams["figure.figsize"] = [10, 5]
# Ignore warnings (FutureWarning in particular)
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter(action = "ignore", category = FutureWarning)}
{full_data = pd.read_csv('/content/USA_Housing.csv')}
{# Data shape
print('train data:',full_data.shape)}
{# View first few rows
full_data.head(5)}
{# Data Info
full_data.info()}
{# Heatmap
sns.heatmap(full_data.isnull(),yticklabels = False, cbar = False,cmap = 'tab20c_r')
plt.title('Missing Data: Training Set')
plt.show()}
{# Remove Address feature
full_data.drop('Address', axis = 1, inplace = True)}
{# Remove rows with missing data
full_data.dropna(inplace = True)}
{full_data}
{# Numeric summary
full_data.describe()}
{# Shape of train data
full_data.shape}
{# Split data to be used in the models
# Create matrix of features
x = full_data.drop('Price', axis = 1) # grabs everything else but 'Price'
# Create target variable
y = full_data['Price'] # y is the column we're trying to predict
}
{from sklearn import preprocessing
# Fit the scaler on the features, then transform them (fitting once is enough)
pre_process = preprocessing.StandardScaler().fit(x)
x_transform = pre_process.transform(x)}
{# x Represents the Features
x_transform.shape
x_transform}
{y.shape}
{# Use x and y variables to split the training data into train and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x_transform, y, test_size = .10, random_state = 101)}
{# Fit
# Import model
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# Create instance of model
lin_reg = LinearRegression()
# Pass training data into model
lin_reg.fit(x_train, y_train)
# pipe = make_pipeline(StandardScaler(), LinearRegression())
# pipe.fit(x_train, y_train)}
{# Predict
y_pred = lin_reg.predict(x_test)
print(y_pred.shape)
print(y_pred)}
{sns.scatterplot(x=y_test, y=y_pred, color='blue', label='Actual Data points')
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', label='Ideal Line')
plt.legend()
plt.show()}
{# Combine actual and predicted values side by side
results = np.column_stack((y_test, y_pred))
# Printing the results
print("Actual Values | Predicted Values")
print("-----------------------------")
for actual, predicted in results:
    print(f"{actual:14.2f} | {predicted:12.2f}")}
{# Residuals: actual minus predicted (use y_test, not the loop variable)
residual = y_test.values - y_pred
print(residual)}
{# Distribution plot for residuals (difference between actual and predicted values)
# sns.distplot is deprecated; histplot with kde=True is the modern equivalent
sns.histplot(residual, kde=True)}
{# Score It
from sklearn.metrics import mean_squared_error
print('Linear Regression Model')
# Results
print('--'*30)
# mean_squared_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
# Print evaluation metrics
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)}
{# Difference between the two MSE values compared in class
s = 10100187858 - 9839952411
print(s)}
{y_train.shape}
{from sklearn.tree import DecisionTreeRegressor
dt_regressor = DecisionTreeRegressor()
dt_regressor.fit(x_train, y_train)
# Predicting the prices using the test set
y_pred_dt = dt_regressor.predict(x_test)
# Decision Tree Regression MSE on the test set
DTr = mean_squared_error(y_test, y_pred_dt)
print('Decision Tree Regression : ', DTr)}
{from sklearn.ensemble import RandomForestRegressor
rf_regressor = RandomForestRegressor()
rf_regressor.fit(x_train, y_train)
# Predicting the prices using the test set
y_pred_rf = rf_regressor.predict(x_test)
# Random Forest Regression MSE on the test set
RFr = mean_squared_error(y_test, y_pred_rf)
print('Random Forest Regression : ', RFr)}
{from sklearn.ensemble import GradientBoostingRegressor
gb_regressor = GradientBoostingRegressor()
gb_regressor.fit(x_train, y_train)
# Predicting the prices using the test set
y_pred_gb = gb_regressor.predict(x_test)
# Gradient Boosting Regression MSE on the test set
GBr = mean_squared_error(y_test, y_pred_gb)
print('Gradient Boosting Regression : ', GBr)}
{# Sample model scores (replace these with your actual model scores)
model_scores = {
    "Linear Regression": 9839952411.801708,
    "Decision Tree": 29698988724.82603,
    "Random Forest": 14315329749.65445,
    "Gradient Boosting": 12029643835.717766
}
# Sort the model scores in ascending order (lower MSE is better)
sorted_scores = sorted(model_scores.items(), key=lambda x: x[1])
# Display the ranking of the models
print("Model Rankings (lower values are better):")
for rank, (model_name, score) in enumerate(sorted_scores, start=1):
    print(f"{rank}. {model_name}: {score}")}
Data Transformation for Structured Data:
1- z-score Scaling (Standard Scaling)
2- Min-Max Scaling
Min-Max scaling maps each feature into a fixed range, usually 0 to 1 (see the sketch below).
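A minimal sketch of Min-Max scaling on toy numbers (MinMaxScaler is the standard sklearn class for this; the values here are made up for illustration):
{# Sketch: Min-Max scaling maps values into [0, 1] (toy values)
import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = np.array([[10.0], [20.0], [30.0], [40.0]])
scaled = MinMaxScaler().fit_transform(values)
print(scaled)  # [[0.], [0.333...], [0.666...], [1.]]}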
Different Regression Techniques:
Decision Tree Regressor:
It partitions the data into subsets based on the values of the independent variables and predicts the target variable within each subset.
Random Forest Regressor:
Random Forest is an ensemble learning method that builds multiple decision trees during training and outputs the average prediction (for regression) across all individual trees.
Gradient Boosting Regressor:
Gradient Boosting is an ensemble learning technique that builds multiple decision trees sequentially, each one correcting the errors of its predecessor (see the sketch below).
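To see the "each tree corrects its predecessor" behaviour, here is a hedged sketch using GradientBoostingRegressor.staged_predict (a real sklearn method that yields the ensemble's prediction after each boosting stage), reusing the x_train/x_test split from above:
{# Sketch: watch the test MSE fall as boosting stages are added
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

gb = GradientBoostingRegressor(n_estimators=100)
gb.fit(x_train, y_train)
for stage, y_stage in enumerate(gb.staged_predict(x_test), start=1):
    if stage % 25 == 0:  # print every 25th stage
        print(stage, mean_squared_error(y_test, y_stage))}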
Be a good practitioner.
It's all about curiosity.
Learn and explore things on your own.
The CS field is about hands-on knowledge.
You have to train your mind.
Standard Scaling involves the mean and standard deviation: z = (x - mean) / std (a quick check below).
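A minimal check (toy numbers) that the manual z-score formula matches sklearn's StandardScaler:
{# Sketch: manual z-score vs. StandardScaler (toy numbers)
import numpy as np
from sklearn.preprocessing import StandardScaler

x_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
manual = (x_toy - x_toy.mean()) / x_toy.std()   # z = (x - mean) / std
sklearn_scaled = StandardScaler().fit_transform(x_toy)
print(np.allclose(manual, sklearn_scaled))      # True}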
Google Colab Link:
#AI #artificialintelligence #datascience #irfanmalik #drsheraz #xevensolutions #hamzanadeem