
Data processing with Python

Angad Gupta, MIEEE, BITS-Pilani

Renewable Energy | Clean Tech | DR | VPP| DERMS|EV

Published: May 2, 2020


This article covers understanding the structure of a dataset, finding missing values, and handling missing values in continuous and categorical variables (mean, median, last occurring value, or any fixed value).

Why is data preprocessing required?

Data preprocessing is crucial in any data mining process because it directly affects the success of the project. Real-world data is unclean, and preprocessing reduces the complexity of the data under analysis. Data is considered unclean if it has missing attributes or attribute values, contains noise or outliers, or includes duplicate or wrong records. The presence of any of these degrades the quality of the results.

Data can end up incomplete, noisy, or duplicated for several reasons.

Example: identifying and handling missing values

The example uses a small dataset whose columns include Region, Age, Income, Birthdate, and Online Shopper. From this dataset, we can note the following:

Incompleteness: the Region and Online Shopper features contain many "NaN" values.
Noisy: the Income feature contains the values -10 and -500, which are not possible for a salary.
Inconsistent: the Birthdate column is the same for every row, and a row with an age of 49 cannot have a birth date of 01-Jan-202.

Let's examine the data and handle these issues step by step.

1. Importing the required libraries

2. Reading the dataset
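The first two steps might look like the sketch below. The article's actual file is not shown, so the CSV text, its rows, and the variable name `csv_text` are hypothetical stand-ins; only the column names come from the article.

```python
import io

import pandas as pd

# Hypothetical stand-in for the article's CSV file (the real file is not shown).
csv_text = """Region,Age,Income,Online Shopper
India,49,86400,No
Brazil,32,57600,Yes
USA,35,64800,No
China,,79200,
Brazil,43,-10,No
"""

# pd.read_csv accepts a file path or any file-like object.
# Empty fields are parsed as NaN.
data = pd.read_csv(io.StringIO(csv_text))
print(data.head())
```

With a real file on disk, the same call would take the file path instead of the `io.StringIO` object.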

3. Viewing the structure of the dataset

data.shape is an attribute (not a method, so it takes no parentheses) that gives the number of rows and columns in the dataset.
data.info() shows the number of columns, the column names, the number of non-null entries in each column, and each column's data type. Here Age has 10 entries whereas the dataset has 12 rows in total, which means the Age value is missing in 2 rows.
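A minimal sketch of this inspection step, using a small hypothetical DataFrame rather than the article's 12-row dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; values are illustrative only.
data = pd.DataFrame({
    "Region": ["India", "Brazil", "USA", np.nan],
    "Age": [49, 32, np.nan, 43],
    "Income": [86400, 57600, 64800, -10],
})

# shape is an attribute that holds (number of rows, number of columns).
rows, cols = data.shape
print(rows, cols)

# info() prints column names, non-null counts, and dtypes.
data.info()

# count() returns the same non-null counts as a Series, handy for checks.
non_null = data.count()
print(non_null)
```

A column whose non-null count is lower than the row count has missing values.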

More detailed view of the missing values

data.isnull().values.any(): shows whether any value in the dataset is missing. In our case it is True.
data.isnull().sum(): shows the number of null values in each column.
data.isnull(): shows, cell by cell, which values are null.
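The three calls above can be tried on a small hypothetical DataFrame as follows. The last line, which filters the rows that contain at least one null, is an addition for illustration and is not part of the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; values are illustrative only.
data = pd.DataFrame({
    "Region": ["India", np.nan, "USA", "Brazil"],
    "Age": [49, 32, np.nan, 43],
    "Income": [86400, 57600, 64800, 79200],
})

# True if at least one cell in the whole DataFrame is null.
has_missing = data.isnull().values.any()

# Number of null cells in each column.
missing_per_column = data.isnull().sum()

# Boolean DataFrame of the same shape: True where a cell is null.
mask = data.isnull()

# Illustration only: rows that contain at least one null value.
rows_with_missing = data[mask.any(axis=1)]

print(has_missing)
print(missing_per_column)
print(rows_with_missing)
```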

A summary of the dataset shows key statistics for the continuous columns.
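The article does not name the call it uses for this summary; a common choice is `DataFrame.describe()`, assumed in the sketch below with hypothetical data. By default it summarizes only the numeric columns of a mixed DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; values are illustrative only.
data = pd.DataFrame({
    "Region": ["India", "Brazil", "USA", "Brazil"],
    "Age": [49, 32, np.nan, 43],
    "Income": [86400, 57600, 64800, -10],
})

# count, mean, std, min, quartiles, and max for each numeric column.
# The count row excludes nulls, and the min row exposes the negative income.
summary = data.describe()
print(summary)
```

Reading the min and max rows is a quick way to spot impossible values such as a negative salary.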

Handling of NULL values:

1. Dropping rows with null values: here the rows for China have been deleted because they contain null values, which may lose a pattern present in the data.

This method is commonly used to handle null values. We either delete a row if it has a null value for a particular feature, or delete a column if more than 70-75% of its values are missing. This method is advisable only when the dataset has enough samples. Make sure that deleting the data does not introduce bias. Removing data loses information, which can hurt the model's predictions.
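Both kinds of deletion can be sketched as below. The data, the `Notes` column, and the exact 0.70 cutoff are assumptions chosen to illustrate the 70-75% rule of thumb.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "Notes" is an invented, mostly empty column.
data = pd.DataFrame({
    "Region": ["India", "China", "USA", "Brazil"],
    "Age": [49, np.nan, 35, 43],
    "Income": [86400, 79200, 64800, 57600],
    "Notes": [np.nan, np.nan, np.nan, "vip"],
})

# Share of missing values per column (Notes is 0.75 here).
missing_share = data.isnull().mean()

# Keep only columns at or below the assumed 70% threshold.
kept = data.loc[:, missing_share <= 0.70]

# Then drop every row that still contains a null value.
cleaned = kept.dropna(axis=0, how="any")
print(cleaned)
```

Dropping the sparse column first avoids discarding rows only because of a column that would be removed anyway.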

Pros:

Removing all data with missing values can give a robust model, because it is trained only on complete records
Deleting a row or column that carries little information costs little, since it has low weight in the analysis

Cons:

Loss of information and data
Works poorly when the share of missing values is high (say 30%) relative to the whole dataset

2. Replacing With Mean/Median/Mode

Here the missing values in the Age column are replaced by the column mean (134.5), and those in the Income column by the column median (76800).
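A minimal sketch of mean and median imputation with `fillna`. The numbers are hypothetical and do not reproduce the 134.5 and 76800 reported for the article's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; values are illustrative only.
data = pd.DataFrame({
    "Age": [40, np.nan, 30, 50],
    "Income": [50000, 60000, np.nan, 90000],
})

# mean() and median() skip null values by default.
age_mean = data["Age"].mean()
income_median = data["Income"].median()

# Replace the nulls in each column with that column's statistic.
data["Age"] = data["Age"].fillna(age_mean)
data["Income"] = data["Income"].fillna(income_median)
print(data)
```

The median is often preferred for skewed columns such as income, because a few extreme values pull the mean much more than the median.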

Pros:

This is a better approach when the data size is small
It prevents the data loss that results from removing rows and columns

Cons:

Imputing approximations introduces bias and distorts the variance of the data (mean imputation, for example, shrinks it)
Works poorly compared to multiple-imputation methods

Handling missing values in categorical data

Replacing missing values with the most frequent value
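One way to do this, sketched on hypothetical data, is to take the column's mode and pass it to `fillna`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; values are illustrative only.
data = pd.DataFrame({"Region": ["India", "Brazil", np.nan, "Brazil", np.nan]})

# mode() ignores nulls and returns a Series (there can be ties),
# so take the first entry as the replacement value.
most_common = data["Region"].mode()[0]

data["Region"] = data["Region"].fillna(most_common)
print(data)
```

If several values tie for most frequent, `mode()` returns them sorted and this sketch simply picks the first one.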

Replacing missing values with a common placeholder such as "Unknown"
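A minimal sketch of the placeholder approach on hypothetical data. It keeps the fact that a value was missing visible as its own category instead of hiding it inside an existing one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; values are illustrative only.
data = pd.DataFrame({"Online Shopper": ["Yes", np.nan, "No", np.nan]})

# Every null becomes the literal category "Unknown".
data["Online Shopper"] = data["Online Shopper"].fillna("Unknown")

# The placeholder now shows up as a regular category in the counts.
counts = data["Online Shopper"].value_counts()
print(counts)
```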
