Class 15 - INTRO TO SCIKIT LEARN AND CLASSIFICATION

Notes from the AI Basic Course by Irfan Malik & Dr Sheraz Naseer (Xeven Solutions)

Business people build long-term business plans.

In today's world, the dynamics of learning have changed.

To build your basics, strengthen your critical thinking and algorithm-building skills, and learn how to use these tools.

Encoding and decoding are mostly used in cryptography and data storage.

But here, encoding and decoding mean something different.

Supervised ML covers two kinds of problems: classification and regression.

Classification:

Used when output is nominal or categorical.

Types: binary classification (two classes, e.g. 0 and 1) and multi-class classification (more than two classes).

Regression:

Used when the output is numerical.
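To make the difference concrete, here is a minimal sketch using scikit-learn. The data (hours studied, pass/fail, exam score) is invented for illustration and is not from the course; the point is only that a classifier returns class labels while a regressor returns numbers.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Invented toy data: hours studied as the single input feature
hours = [[1], [2], [3], [4], [5], [6], [7], [8]]
passed = [0, 0, 0, 0, 1, 1, 1, 1]                          # categorical target (binary)
score = [35.0, 42.0, 48.0, 55.0, 61.0, 70.0, 76.0, 83.0]   # numerical target

# Classification: the model learns to output a class label
clf = LogisticRegression().fit(hours, passed)

# Regression: the model learns to output a real-valued number
reg = LinearRegression().fit(hours, score)

class_pred = clf.predict([[2], [7]])
score_pred = reg.predict([[2], [7]])

print('classification output:', class_pred)
print('regression output:', score_pred)
```

Despite its name, `LogisticRegression` is a classification model, which is why it appears on the classification side here.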

Selecting the Target Variable is very Important.

Target variable depends on data.

"Target variable" and "target column" mean the same thing.

In Supervised ML, we have input and output data.

Scikit Learn:

You can build classification, regression, clustering and other models with it.

You can also train your model.

If a model gives 100% accuracy on the training data, it is very likely overfit: it has memorized the training rows and may not perform as well on new data.
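A rough way to see overfitting is to compare accuracy on the training rows with accuracy on held-out rows. The sketch below is an illustration only: it uses synthetic data and a decision tree (not the logistic regression used later in these notes), because an unrestricted tree memorizes its training data easily.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns 20% of the labels at random,
# so no model can be perfect on unseen rows
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A tree with no depth limit can memorize every training row
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)

print('train accuracy:', train_acc)
print('test accuracy:', test_acc)
```

A large gap between the two numbers is the warning sign; a perfect training score on its own says little about how the model behaves on new data.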

Data Splitting: Train & Test Sets

Rows in structured data are called observations, data points or instances.

Training data: has both input and output.

Testing data: we give the model the input, and it predicts the output.
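As a small sketch of the idea, the snippet below splits a made-up table of 10 observations into training and testing parts with scikit-learn's `train_test_split`. The column names and values are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy table: 10 observations, two input columns and one output column
toy = pd.DataFrame({
    'feature_a': range(10),
    'feature_b': [v * 2 for v in range(10)],
    'label': [0, 1] * 5,
})

inputs = toy[['feature_a', 'feature_b']]
output = toy['label']

# Hold back 20% of the rows for testing; random_state makes the split repeatable
in_train, in_test, out_train, out_test = train_test_split(
    inputs, output, test_size=0.2, random_state=0
)

print('training rows:', in_train.shape)
print('testing rows:', in_test.shape)
```

The model only sees `in_train` and `out_train` during training. The held-back `out_test` values are used afterwards to check its predictions.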

Training the model is like training the child.

Google Colab link:

https://colab.research.google.com/drive/1yZOgTwU2iplGPm1OuxF34mhmCekoR0p3#scrollTo=TCKL0fB7oCHM

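# Imports needed by the code below
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

full_data = pd.read_csv('/content/titanic_dataset.csv')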

# Data shape

print('train data:',full_data.shape)

# View first few rows

full_data.head(5)

# Data Info

full_data.info()

# Heatmap

sns.heatmap(full_data.isnull(),yticklabels = False, cbar = False,cmap = 'tab20c_r')

plt.title('Missing Data: Training Set')

plt.show()
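The heatmap shows where values are missing; a numeric count per column says the same thing more exactly. A minimal sketch on a made-up mini table (the rows are hypothetical, only the column names mimic the Titanic data):

```python
import numpy as np
import pandas as pd

# Hypothetical mini table with a few Titanic-style columns
sample = pd.DataFrame({
    'Age': [22.0, np.nan, 35.0, np.nan],
    'Cabin': [np.nan, 'C85', np.nan, np.nan],
    'Pclass': [3, 1, 3, 2],
})

# Number of missing values in each column
missing_count = sample.isnull().sum()

# Share of missing values in each column (0.0 to 1.0)
missing_share = sample.isnull().mean().round(2)

print(missing_count)
print(missing_share)
```

A column that is mostly empty is a candidate for dropping, while a column with only a few gaps is a candidate for imputation.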

plt.figure(figsize = (10,7))

sns.boxplot(x = 'Pclass', y = 'Age', data = full_data, palette= 'GnBu_d').set_title('Age by Passenger Class')

plt.show()

# Imputation function

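def impute_age(cols):
    # cols is one row holding the 'Age' and 'Pclass' values
    Age = cols['Age']
    Pclass = cols['Pclass']
    # Fill a missing age with a fixed value chosen by passenger class
    if pd.isnull(Age):
        if Pclass == 1:
            return 37
        elif Pclass == 2:
            return 29
        else:
            return 24
    else:
        return Age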

# Apply the function to the Age column

full_data['Age']=full_data[['Age','Pclass']].apply(impute_age, axis =1 )
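The constants 37, 29 and 24 look like typical ages per passenger class read off the boxplot; that is an inference, the notes do not say so. Under that assumption, the same kind of fill can be sketched without hard-coded numbers by computing the median age of each class. The toy data below is invented so that the medians come out to round values.

```python
import numpy as np
import pandas as pd

# Invented toy data with one missing age in each passenger class
df = pd.DataFrame({
    'Pclass': [1, 1, 1, 2, 2, 3, 3, 3],
    'Age': [40.0, 34.0, np.nan, 30.0, np.nan, 20.0, 28.0, np.nan],
})

# Median age per class (missing values are ignored by median)
class_median = df.groupby('Pclass')['Age'].median()
print(class_median)

# transform('median') returns the class median aligned to every row,
# so fillna only replaces the rows where Age is missing
df['Age'] = df['Age'].fillna(df.groupby('Pclass')['Age'].transform('median'))
print(df)
```

This variant adapts automatically if the data changes, at the cost of being a little less explicit than the hand-written function.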

# Remove Cabin feature

full_data.drop('Cabin', axis = 1, inplace = True)

# Remove rows with missing data

full_data.dropna(inplace = True)

# Remove unnecessary columns

full_data.drop(['Name','Ticket'], axis = 1, inplace = True)

# Convert objects to category data type

objcat = ['Sex','Embarked']

for colname in objcat:
    full_data[colname] = full_data[colname].astype('category')

# Numeric summary

full_data.describe()

# Remove PassengerId

full_data.drop('PassengerId', inplace = True, axis = 1)

GETTING MODEL READY:

# Shape of train data

full_data.shape

# Identify categorical features

full_data.select_dtypes(['category']).columns

# Convert categorical variables into 'dummy' or indicator variables

sex = pd.get_dummies(full_data['Sex'], drop_first = True) # drop_first prevents multi-collinearity

embarked = pd.get_dummies(full_data['Embarked'], drop_first = True)

full_data.head()

# Add new dummy columns to data frame

full_data = pd.concat([full_data, sex, embarked], axis = 1)

full_data.head(5)

# Drop unnecessary columns

full_data.drop(['Sex', 'Embarked'], axis = 1, inplace = True)

# Shape of train data

print('train_data shape',full_data.shape)

# Confirm changes

full_data.head()
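To see what `drop_first` does, here is a small sketch on an invented three-row table. With three categories, two indicator columns are enough: a row that is neither Q nor S must be C, so the first column carries no extra information.

```python
import pandas as pd

# Invented toy rows using the same column names as the Titanic data
toy = pd.DataFrame({
    'Sex': ['male', 'female', 'female'],
    'Embarked': ['S', 'C', 'Q'],
})

# One indicator column per category
all_cols = pd.get_dummies(toy['Embarked'])

# drop_first=True removes the first category (here 'C')
reduced = pd.get_dummies(toy['Embarked'], drop_first=True)

# For a two-category column, a single indicator column remains
sex_cols = pd.get_dummies(toy['Sex'], drop_first=True)

print(list(all_cols.columns))
print(list(reduced.columns))
print(list(sex_cols.columns))
```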

OBJECTIVE 2: MACHINE LEARNING

# Split data to be used in the models

# Create matrix of features

x = full_data.drop('Survived', axis = 1) # grabs everything else but 'Survived'

# Create target variable

y = full_data['Survived'] # y is the column we're trying to predict

# Use x and y variables to split the training data into train and test set

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = .20, random_state = 101)

LOGISTIC REGRESSION:

# Fit

# Import model

from sklearn.linear_model import LogisticRegression

# Create instance of model

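# max_iter raised from the default of 100 so the solver has room to converge on unscaled features
lreg = LogisticRegression(max_iter=1000)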

# Pass training data into model

lreg.fit(x_train, y_train)

# Predict

y_pred_lreg = lreg.predict(x_test)

print(y_pred_lreg)

# Score It

from sklearn.metrics import classification_report, accuracy_score

print('Classification Model')

# Accuracy

print('--'*40)

logreg_accuracy = round(accuracy_score(y_test, y_pred_lreg) * 100,2)

print('Accuracy', logreg_accuracy,'%')
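The code above imports `classification_report` but only prints accuracy. As a sketch of what the other metrics add, the example below uses made-up true and predicted labels (not the Titanic results) to show accuracy next to a confusion matrix and the full report.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up labels for illustration: 8 observations, 2 of them predicted wrongly
true_labels = [0, 0, 0, 1, 1, 1, 1, 0]
pred_labels = [0, 0, 1, 1, 1, 0, 1, 0]

# Accuracy = correct predictions / all predictions
acc = accuracy_score(true_labels, pred_labels)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(true_labels, pred_labels)

print('accuracy:', acc)
print(cm)

# Precision, recall and F1 score per class
print(classification_report(true_labels, pred_labels))
```

On the Titanic split, the same calls would take `y_test` and `y_pred_lreg` as arguments.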

In ML, x is the input and y is the output. The whole ML community follows this convention.

Here, "performance" and "accuracy" are used to mean the same thing.

Today is the start of a new journey for you.

#AI #artificialintelligence #datascience #irfanmalik #drsheraz #xevensolutions #hamzanadeem
