Class 18 - EVALUATION METRICS FOR DIFFERENT MODELS 

Notes from the AI Basic Course by Irfan Malik & Dr Sheraz Naseer (Xeven Solutions)


Model performance depends on its data.

If the data is not good, the output will be garbage.

That's why

Dr. Sheraz Sb Says:

Garbage In, Garbage Out.

In the previous lesson, we saw that the MSE value was too high.

So, what do we do now?

We have to analyze the data further, which means data pre-processing.

Data cleaning is part of data pre-processing.

We can apply different models in this scenario.

MSE can be reduced, but it can't be brought all the way to zero.

In this case, you can apply other models as well, like a Decision Tree.

BIG VALUES DOMINATE THE MODEL: the model gives more weightage to features with big values, which is why the features are scaled before training.
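As a minimal sketch of why that matters (made-up numbers, not the housing data): a feature measured in the tens of thousands swamps one measured in single digits, and standard scaling puts them on an equal footing.

{import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features: income in dollars vs. number of rooms
income = np.array([[65000.0], [72000.0], [58000.0], [81000.0]])
rooms = np.array([[5.0], [7.0], [4.0], [6.0]])
x_demo = np.hstack([income, rooms])

print(x_demo.std(axis=0))    # income's spread dwarfs the rooms' spread
x_scaled = StandardScaler().fit_transform(x_demo)
print(x_scaled.std(axis=0))  # after scaling, both features have std 1.0}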

{import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import sklearn
import seaborn as sns

# Ignore warnings, including FutureWarning
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter(action="ignore", category=FutureWarning)

# Default figure size
plt.rcParams["figure.figsize"] = [10, 5]}

{full_data = pd.read_csv('/content/USA_Housing.csv')}

{# Data shape

print('train data:',full_data.shape)}

{# View first few rows

full_data.head(5)}

{# Data Info

full_data.info()}

{# Heatmap

sns.heatmap(full_data.isnull(),yticklabels = False, cbar = False,cmap = 'tab20c_r')

plt.title('Missing Data: Training Set')

plt.show()}

{# Remove Address feature

full_data.drop('Address', axis = 1, inplace = True)}

{# Remove rows with missing data

full_data.dropna(inplace = True)}

{full_data}

{# Numeric summary

full_data.describe()}

{# Shape of train data

full_data.shape}

{# Split data to be used in the models

# Create matrix of features

x = full_data.drop('Price', axis = 1) # grabs everything else but 'Price'

# Create target variable

y = full_data['Price'] # y is the column we're trying to predict

}

{from sklearn import preprocessing

# Fit the scaler once on the features, then transform them
pre_process = preprocessing.StandardScaler().fit(x)
x_transform = pre_process.transform(x)}

{# x Represents the Features

x_transform.shape

x_transform}

{y.shape}

{# Use x and y variables to split the training data into train and test set

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x_transform, y, test_size = .10, random_state = 101)}

{# Fit

# Import model

from sklearn.linear_model import LinearRegression

from sklearn.pipeline import make_pipeline

from sklearn.preprocessing import StandardScaler

# Create instance of model

lin_reg = LinearRegression()

# Pass training data into model

lin_reg.fit(x_train, y_train)

# pipe = make_pipeline(StandardScaler(), LinearRegression())

# pipe.fit(x_train, y_train)}

{# Predict

y_pred = lin_reg.predict(x_test)

print(y_pred.shape)

print(y_pred)}

{sns.scatterplot(x=y_test, y=y_pred, color='blue', label='Actual Data points')

plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', label='Ideal Line')

plt.legend()

plt.show()}

{# Combine actual and predicted values side by side
results = np.column_stack((y_test, y_pred))

# Print the results
print("Actual Values | Predicted Values")
print("-----------------------------")
for actual, predicted in results:
    print(f"{actual:14.2f} | {predicted:12.2f}")}

{# Residuals: difference between actual and predicted values
residual = y_test.values - y_pred
print(residual)}

{# Distribution plot for the residuals (difference between actual and predicted values)
# sns.distplot is deprecated in recent seaborn; histplot is the current API
sns.histplot(residual, kde=True)}

{# Score It

from sklearn.metrics import mean_squared_error

print('Linear Regression Model')

# Results

print('--'*30)

# mean_squared_error(y_test, y_pred)

mse = mean_squared_error(y_test, y_pred)

rmse = np.sqrt(mse)

# Print evaluation metrics

print("Mean Squared Error:", mse)

print("Root Mean Squared Error:", rmse)}

{# Difference between two MSE values (apparently the earlier lesson's MSE vs. the linear regression MSE above)
s = 10100187858 - 9839952411
print(s)}

{y_train.shape}

{from sklearn.tree import DecisionTreeRegressor

# Create and fit a Decision Tree regressor
dt_regressor = DecisionTreeRegressor()
dt_regressor.fit(x_train, y_train)

# Predict prices on the test set
y_pred_dt = dt_regressor.predict(x_test)

# Decision Tree Regression MSE on the test set
DTr = mean_squared_error(y_test, y_pred_dt)
print('Decision Tree Regression : ', DTr)}

{from sklearn.ensemble import RandomForestRegressor

# Create and fit a Random Forest regressor
rf_regressor = RandomForestRegressor()
rf_regressor.fit(x_train, y_train)

# Predict prices on the test set
y_pred_rf = rf_regressor.predict(x_test)

# Random Forest Regression MSE on the test set
RFr = mean_squared_error(y_test, y_pred_rf)
print('Random Forest Regression : ', RFr)}

{from sklearn.ensemble import GradientBoostingRegressor

# Create and fit a Gradient Boosting regressor
gb_regressor = GradientBoostingRegressor()
gb_regressor.fit(x_train, y_train)

# Predict prices on the test set
y_pred_gb = gb_regressor.predict(x_test)

# Gradient Boosting Regression MSE on the test set
GBr = mean_squared_error(y_test, y_pred_gb)
print('Gradient Boosting Regression : ', GBr)}

{# Sample model scores (replace these with your actual model scores)
model_scores = {
    "Linear Regression": 9839952411.801708,
    "Decision Tree": 29698988724.82603,
    "Random Forest": 14315329749.65445,
    "Gradient Boosting": 12029643835.717766
}

# Sort the model scores in ascending order (lower MSE is better)
sorted_scores = sorted(model_scores.items(), key=lambda x: x[1])

# Display the ranking of the models
print("Model Rankings (lower values are better):")
for rank, (model_name, score) in enumerate(sorted_scores, start=1):
    print(f"{rank}. {model_name}: {score}")}

Data Transformation for Structured Data:

1- Z-score Scaling (Standard Scaling)

2- Min-Max Scaling

Min-Max scaling maps each feature into a fixed range, usually 0 to 1 (see the sketch below).
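A minimal sketch of Min-Max scaling on a toy column (made-up numbers, standard scikit-learn API):

{import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One toy feature column (made-up values)
col = np.array([[10.0], [20.0], [30.0], [40.0]])

# Manual min-max: (x - min) / (max - min) maps values into 0 to 1
print(((col - col.min()) / (col.max() - col.min())).ravel())

# Same result via MinMaxScaler
print(MinMaxScaler().fit_transform(col).ravel())}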

Different Regression Techniques:

Decision Tree Regressor:

It partitions the data into subsets based on the values of the independent variables and predicts the target variable within each subset.

Random Forest Regressor:

Random Forest is an ensemble learning method that builds multiple decision trees during training and outputs the average of their individual predictions (for regression).

Gradient Boosting Regressor:

Gradient Boosting is an ensemble learning technique that builds multiple decision trees sequentially, each one correcting the errors of its predecessor (see the sketch below).
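To make "correcting the errors of its predecessor" concrete, here is a minimal hand-rolled sketch of the boosting idea (simplified: made-up data, two rounds, a fixed learning rate, plain squared-error residuals; the real GradientBoostingRegressor automates all of this):

{import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: a noisy sine curve
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

lr = 0.5                          # learning rate shrinks each tree's contribution
pred = np.full_like(y, y.mean())  # round 0: predict the mean everywhere

for _ in range(2):                # two boosting rounds, for illustration
    residual = y - pred           # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += lr * tree.predict(X)  # the next tree corrects those errors

print("MSE after 2 boosting rounds:", np.mean((y - pred) ** 2))}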

Be a good practitioner.

It's all about curiosity.

Learn & explore things on your own.

The CS field is about hands-on knowledge.

You have to train your mind.

Standard Scaling involves the mean and standard deviation (see the sketch below).
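As a quick sketch of that formula, z = (x - mean) / std, verified against scikit-learn (toy numbers):

{import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[2.0], [4.0], [6.0], [8.0]])

# Manual z-score: subtract the mean, divide by the standard deviation
z_manual = (x - x.mean()) / x.std()

# Same result via StandardScaler
z_sklearn = StandardScaler().fit_transform(x)

print(np.allclose(z_manual, z_sklearn))  # True}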

Google Colab Link:

https://colab.research.google.com/drive/1KrPiJ9yiRzVAEAeMemgCqZgjpr6S1yp3#scrollTo=fL8XebaeYNvi

#AI #artificialintelligence #datascience #irfanmalik #drsheraz #xevensolutions #hamzanadeem
