Class 13 - DATA TRANSFORMATION, SORTING & VISUALIZATION Notes from the AI Basic Course by Irfan Malik & Dr Sheraz Naseer (Xeven Solutions)

Hamza Nadeem

Founder & CEO H-Tech | AI Enthusiastic

Published: March 17, 2024

If you want to excel in AI, keep yourself up to date.

Compete with yourself; don't compete with others.

Be Consistent & Persistent.

Structured data is data organized in tabular form (rows and columns).

Descriptive statistics give a summary of the data.

Features, attributes, and variables are different names for columns.

Mean:

Average of the data (sum of the values divided by their count)

Median:

Middle value of the data once it is sorted

Mode:

Most frequent value in the data

Mean is sensitive to extreme values.
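
The three measures above, and the sensitivity of the mean, can be sketched with Python's built-in statistics module. The price values are made up for illustration and are not taken from the course dataset.

```python
from statistics import mean, median, mode

# Hypothetical sample: five typical prices, then the same list plus one extreme value.
prices = [20, 22, 22, 25, 26]
prices_with_outlier = prices + [500]

mean_clean = mean(prices)
median_clean = median(prices)
mode_clean = mode(prices)

mean_outlier = mean(prices_with_outlier)
median_outlier = median(prices_with_outlier)

print("clean:       ", mean_clean, median_clean, mode_clean)
print("with outlier:", mean_outlier, median_outlier)
```

A single extreme value pulls the mean far away from the typical prices, while the median barely moves.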

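IQR (Interquartile Range):

IQR = Q3 - Q1, where Q1 is the 25th percentile and Q3 is the 75th percentile.

Standard deviation is the measure of spread to use alongside the mean.

IQR is the measure of spread to use alongside the median.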
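
One reason for this pairing is that the IQR, like the median, is not affected much by extreme values, while the standard deviation, like the mean, is. The sketch below is only an illustration on made-up numbers; the helper name iqr is hypothetical, and the "inclusive" method is chosen because it interpolates linearly between data points.

```python
from statistics import quantiles, stdev

def iqr(values):
    """Interquartile range, Q3 - Q1, with linear interpolation between data points."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # last value replaced by an extreme one

print("stdev:", round(stdev(clean), 2), "->", round(stdev(with_outlier), 2))
print("IQR:  ", iqr(clean), "->", iqr(with_outlier))
```

The standard deviation grows sharply once the extreme value is present, while the IQR stays the same.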

If the data has low dispersion, it is easier for a machine to process, but an ML model has less variation to learn from.

If the data has high dispersion, it is harder for a machine to process, but an ML model has more variation to learn from.

Data dispersion and data spread mean the same thing; the IQR is one way of measuring it.

Whether the IQR is the right measure of spread depends on the purpose for which you are going to use the data.

The five-number summary (minimum, Q1, median, Q3, maximum) helps to check the distribution of the data.
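
A minimal sketch of a five-number summary with pandas, assuming a small made-up series of horsepower values (the variable names hp and five_number are illustrative, not from the course notebook):

```python
import pandas as pd

# Hypothetical horsepower values, not taken from the course dataset.
hp = pd.Series([110, 150, 160, 200, 210, 250, 300, 320, 650])

# Quantiles 0, 0.25, 0.5, 0.75 and 1 are the minimum, Q1, median, Q3 and maximum.
five_number = hp.quantile([0, 0.25, 0.5, 0.75, 1.0])
five_number.index = ["min", "Q1", "median", "Q3", "max"]

print(five_number)
```

The gap between Q3 and the maximum is much larger than the gap between the minimum and Q1, which hints at a long right tail in this sample.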

Correlation Analysis:

Relationship between two variables.

Correlation analysis is used to determine the relationship between two or more variables in a dataset. It helps us understand how changes in one variable are associated with changes in another; correlation alone does not show that one variable causes the other.

Correlation values close to +1 or -1 indicate a strong relationship.

Correlation values close to 0 indicate a weak or no relationship.

The magnitude of the correlation value represents the strength of the relationship, and its sign represents the direction.
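
The points above can be illustrated on a tiny made-up table. This is only a sketch: the column names are hypothetical, and DataFrame.corr() is used with its default Pearson method.

```python
import pandas as pd

# Hypothetical columns built so that the expected correlation with x is obvious.
df_demo = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "rises_with_x": [3, 5, 7, 9, 11],   # exactly 2*x + 1
    "falls_with_x": [10, 8, 6, 4, 2],   # exactly 12 - 2*x
    "unrelated": [2, 1, 3, 1, 2],       # no linear trend against x
})

corr = df_demo.corr()  # pairwise Pearson correlation between all columns
print(corr["x"].round(2))
```

Here rises_with_x correlates with x at +1, falls_with_x at -1, and unrelated at about 0.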

In a boxplot, the box represents the IQR.

Google Colab Link:

https://colab.research.google.com/drive/11cQwJO_oT1GO-mwbamQLjIhuM3B1eDSI#scrollTo=GGyDovL2QDLa

import pandas as pd

import numpy as np

import seaborn as sns #visualisation

import matplotlib.pyplot as plt #visualisation

%matplotlib inline

sns.set(color_codes=True)

df = pd.read_csv("data.csv")

# To display the top 5 rows

df.head(5)

df.tail(5) # To display the bottom 5 rows

df.dtypes

df = df.drop(columns=['Vehicle Size']) # Drop the 'Vehicle Size' column

df.head(5)

# Rename columns to shorter names
df = df.rename(columns={
    "Engine HP": "HP",
    "Engine Cylinders": "Cylinders",
    "Transmission Type": "Transmission",
    "Driven_Wheels": "Drive Mode",
    "highway MPG": "MPG-H",
    "city mpg": "MPG-C",
    "MSRP": "Price",
})

df.head(5)

df.shape

duplicate_rows_df = df[df.duplicated()]

print("Number of duplicate rows:", duplicate_rows_df.shape[0]) # shape[0] is the row count

df.count() # Counts the non-null values in each column

df = df.drop_duplicates()

df.head(5)

df.count()

print(df.isnull().sum())

df = df.dropna() # Dropping the missing values.

df.count()

print(df.isnull().sum()) # After dropping the values
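
The duplicate and missing-value steps above can be checked on a small frame where the answer is known in advance. This is a sketch on made-up rows; the frame cars and its values are hypothetical and only mimic the renamed columns.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the second row duplicates the first, and one HP value is missing.
cars = pd.DataFrame({
    "Make": ["BMW", "BMW", "Audi", "Ford"],
    "HP": [300.0, 300.0, np.nan, 150.0],
    "Price": [46000, 46000, 38000, 21000],
})

n_duplicates = int(cars.duplicated().sum())   # rows that repeat an earlier row
deduped = cars.drop_duplicates()              # keep the first copy of each row
missing_per_column = deduped.isnull().sum()   # missing values in each column
cleaned = deduped.dropna()                    # drop rows with any missing value

print("duplicates:", n_duplicates)
print(missing_per_column)
print(cleaned)
```

Dropping rows is the simplest way to handle missing values, but it discards every other column in those rows too, so it is worth checking how many rows are lost.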

sns.boxplot(x=df['Price'])

sns.boxplot(x=df['HP'])

sns.boxplot(x=df['Cylinders'])

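# numeric_only=True skips text columns such as Make, which would raise an error in pandas 2.x
Q1 = df.quantile(0.25, numeric_only=True)

Q3 = df.quantile(0.75, numeric_only=True)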

IQR = Q3 - Q1

print(IQR)
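
The notes stop at printing the IQR. One common use of it, shown here only as a hedged sketch on made-up prices, is the 1.5 × IQR rule for flagging outliers, which is also the usual convention for where boxplot whiskers end. The variable names and values are hypothetical.

```python
import pandas as pd

# Hypothetical prices with one extreme value at the end.
prices = pd.Series([20000, 22000, 25000, 27000, 30000, 32000, 35000, 40000, 500000])

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are treated as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = prices[(prices < lower) | (prices > upper)]
kept = prices[(prices >= lower) & (prices <= upper)]

print("IQR:", iqr, "bounds:", lower, upper)
print("outliers:", outliers.tolist())
```

Whether flagged values should actually be removed depends on the data and the purpose; the rule only marks candidates.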

df.corr(numeric_only=True) # Correlation between the numeric columns only

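# value_counts() counts rows per make, so this chart shows the number of cars, not their price
df.Make.value_counts().nlargest(15).plot(kind='bar', figsize=(10,5))

plt.title("Number of cars by make (top 15)")

plt.ylabel('Number of cars')

plt.xlabel('Make');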

df.Make.value_counts().nlargest(40).plot(kind='bar', figsize=(10,5))

plt.title("Number of cars by make (top 40)")

plt.ylabel('Number of cars')

plt.xlabel('Make');

df.Year.value_counts().plot(kind='pie')

plt.show()

# Same count per make as above, drawn on a larger figure; it does not plot HP
df.Make.value_counts().nlargest(20).plot(kind='bar', figsize=(15, 10))

plt.title("Number of cars by make (top 20)")

plt.ylabel('Number of cars')

plt.xlabel('Make');

# Adjusting the size of the figure

plt.figure(figsize=(10,5))

# Calculating the correlation between the numeric columns

correlation = df.corr(numeric_only=True)

# Displaying the correlation using a heatmap

sns.heatmap(correlation,cmap="BrBG",annot=True) # BrBG: diverging colormap from brown to blue-green

#correlation

#AI #artificialintelligence #datascience #irfanmalik #drsheraz #xevensolutions #hamzanadeem
