Class 15 - INTRO TO SCIKIT LEARN AND CLASSIFICATION
Notes from the AI Basic Course by Irfan Malik & Dr Sheraz Naseer (Xeven Solutions)
Business people used to build long-term business plans.
In today's world, the dynamics of learning have changed.
If you want to build your basics, strengthen your critical thinking and algorithm-building skills, and learn how to use these tools.
Encoding and decoding are mostly used in cryptography and data storage.
But in ML, encoding and decoding mean something different.
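A minimal sketch of what encoding and decoding mean in ML (illustrative only, not from the lecture code; these notes use pd.get_dummies later): category labels are converted to numbers before training, and can be converted back afterwards.
from sklearn.preprocessing import LabelEncoder
# Example categorical column
labels = ['male', 'female', 'female', 'male']
le = LabelEncoder()
encoded = le.fit_transform(labels)       # encode labels as integers, e.g. [1, 0, 0, 1]
decoded = le.inverse_transform(encoded)  # decode back to the original strings
print(encoded, decoded)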
Supervised ML has two types of models: classification and regression.
Classification:
Used when the output is nominal or categorical.
Types: binary classification (two classes, e.g. 0/1) and multi-class classification (more than two classes).
Regression:
Used when the output is numerical.
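For example (an illustration, assuming the Titanic data loaded below), the same dataset can frame either problem depending on the target column:
y_class = full_data['Survived']  # categorical target (0 or 1) -> binary classification
y_reg = full_data['Fare']        # numeric target (ticket fare) -> regression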
Selecting the target variable is very important.
The target variable depends on the data.
Target variable and target column mean the same thing.
In supervised ML, we have both input and output data.
Scikit Learn:
You can do classification, regression, clustering, etc.
You can also train your model.
If a model gives 100% accuracy on the training data, we can say the model is overfit.
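One simple way to spot overfitting (a sketch, not part of the lecture code) is to compare training accuracy with test accuracy, using the lreg model and the train/test split created later in these notes:
# Rough overfitting check: a large gap between train and test accuracy is a warning sign
train_acc = lreg.score(x_train, y_train)
test_acc = lreg.score(x_test, y_test)
print('train accuracy:', train_acc, 'test accuracy:', test_acc)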
Data Splitting: Train & Test Sets
Rows in structured data are called observations, data points, or instances.
Training data: has both input and output.
Testing data: has only the input; the model predicts the output.
Training the model is like training a child.
Google Colab link:
# Imports needed for the code below
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

full_data = pd.read_csv('/content/titanic_dataset.csv')
# Data shape
print('train data:',full_data.shape)
# View first few rows
full_data.head(5)
# Data Info
full_data.info()
# Heatmap
sns.heatmap(full_data.isnull(),yticklabels = False, cbar = False,cmap = 'tab20c_r')
plt.title('Missing Data: Training Set')
plt.show()
plt.figure(figsize = (10,7))
sns.boxplot(x = 'Pclass', y = 'Age', data = full_data, palette= 'GnBu_d').set_title('Age by Passenger Class')
plt.show()
# Imputation function
def impute_age(cols):
    Age = cols[0]
    Pclass = cols[1]
    if pd.isnull(Age):
        if Pclass == 1:
            return 37
        elif Pclass == 2:
            return 29
        else:
            return 24
    else:
        return Age
# Apply the function to the Age column
full_data['Age'] = full_data[['Age', 'Pclass']].apply(impute_age, axis=1)
# Remove Cabin feature
full_data.drop('Cabin', axis = 1, inplace = True)
# Remove rows with missing data
full_data.dropna(inplace = True)
# Remove unnecessary columns
full_data.drop(['Name','Ticket'], axis = 1, inplace = True)
# Convert objects to category data type
objcat = ['Sex','Embarked']
for colname in objcat:
full_data[colname] = full_data[colname].astype('category')
# Numeric summary
full_data.describe()
# Remove PassengerId
full_data.drop('PassengerId', inplace = True, axis = 1)
GETTING MODEL READY:
# Shape of train data
full_data.shape
# Identify categorical features
full_data.select_dtypes(['category']).columns
# Convert categorical variables into 'dummy' or indicator variables
sex = pd.get_dummies(full_data['Sex'], drop_first = True) # drop_first prevents multi-collinearity
embarked = pd.get_dummies(full_data['Embarked'], drop_first = True)
full_data.head()
# Add new dummy columns to data frame
full_data = pd.concat([full_data, sex, embarked], axis = 1)
full_data.head(5)
# Drop unnecessary columns
full_data.drop(['Sex', 'Embarked'], axis = 1, inplace = True)
# Shape of train data
print('train_data shape',full_data.shape)
# Confirm changes
full_data.head()
OBJECTIVE 2: MACHINE LEARNING
# Split data to be used in the models
# Create matrix of features
x = full_data.drop('Survived', axis = 1) # grabs everything else but 'Survived'
# Create target variable
y = full_data['Survived'] # y is the column we're trying to predict
# Use x and y variables to split the training data into train and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = .20, random_state = 101)
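A quick sanity check (optional, not in the original notes) is to print the shapes of the resulting splits to confirm the 80/20 ratio:
print('x_train:', x_train.shape, 'x_test:', x_test.shape)
print('y_train:', y_train.shape, 'y_test:', y_test.shape)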
LOGISTIC REGRESSION:
# Fit
# Import model
from sklearn.linear_model import LogisticRegression
# Create instance of model
lreg = LogisticRegression()
# Pass training data into model
lreg.fit(x_train, y_train)
# Predict
y_pred_lreg = lreg.predict(x_test)
print(y_pred_lreg)
# Score It
from sklearn.metrics import classification_report, accuracy_score
print('Classification Model')
# Accuracy
print('--'*40)
logreg_accuracy = round(accuracy_score(y_test, y_pred_lreg) * 100,2)
print('Accuracy', logreg_accuracy,'%')
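Since classification_report was already imported above, one could also print precision, recall, and F1-score per class (a small addition, not shown in the lecture code):
# Per-class precision, recall and F1-score (0 = did not survive, 1 = survived)
print(classification_report(y_test, y_pred_lreg))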
In ML, x is the input and y is the output. The whole ML community follows this convention.
Here, performance and accuracy are used interchangeably.
Today is the start of a new journey for you.
#AI #artificialintelligence #datascience #irfanmalik #drsheraz #xevensolutions #hamzanadeem