Principal Component Analysis

Utkarsh Sharma

SME & Manager | SAP Certified Application Associate | Certified Data Scientist | Intel certified Machine Learning Instructor| Mentor

Published: April 1, 2022

What is PCA?

Principal Component Analysis, or PCA, is a dimensionality-reduction method often used on large data sets: it transforms a large set of variables into a smaller one that still contains most of the information in the large set.

Reducing the number of variables in a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity. Smaller data sets are easier to explore and visualize, and with no extraneous variables to process, machine learning algorithms can analyze the data more easily and quickly.
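As a minimal illustration of that trade-off, the sketch below reduces two correlated variables to one using only the Python standard library. The data points and the helper name `pca_2d_to_1d` are made up for illustration and are not from the original article. The code relies on the closed-form largest eigenvalue of a 2x2 covariance matrix, so it only applies to the two-variable case.

```python
import math

def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Returns (scores, retained), where retained is the fraction of the
    total variance kept by that single component.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(p[0] - mean_x, p[1] - mean_y) for p in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(cx * cx for cx, _ in centered) / (n - 1)
    syy = sum(cy * cy for _, cy in centered) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix (closed form)
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    largest = half_trace + radius

    # Matching eigenvector, normalised to unit length
    if sxy != 0:
        vx, vy = largest - syy, sxy
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    # One score per point replaces the original (x, y) pair
    scores = [cx * vx + cy * vy for cx, cy in centered]
    retained = largest / (sxx + syy)
    return scores, retained

# Hypothetical, strongly correlated measurements
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
scores, retained = pca_2d_to_1d(points)
print("fraction of variance retained:", retained)
```

Because the two variables move almost in lockstep, a single component keeps nearly all of the variance. With weakly correlated variables the retained fraction would be noticeably lower, which is the accuracy being traded away.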

Why PCA?

Large datasets are increasingly widespread in many disciplines. In order to interpret such datasets, methods are required to drastically reduce their dimensionality in an interpretable way, such that most of the information in the data is preserved. Many techniques have been developed for this purpose, but principal component analysis (PCA) is one of the oldest and most widely used. Its idea is simple—reduce the dimensionality of a dataset, while preserving as much ‘variability’ (i.e. statistical information) as possible.

How do we do PCA?

The basic idea behind PCA is simple: we transform all the attributes of the data into a new coordinate space, where every transformed attribute is a linear combination of the original attributes. For example, if we have 10 attributes and apply PCA to them, we get 10 transformed variables (the principal components). If we want to keep only 5 of them, we can, because each one carries some of the information in the original 10 attributes. We can reduce the number of components we keep, but every component we drop costs some of the variance of the original data.

That is why we keep only the most important principal components: the ones that, taken together, adequately represent the variance of the original dataset.
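The 10-attribute example can be sketched with NumPy as follows. This is an illustrative sketch rather than the author's code: the data are synthetic (10 attributes generated from 3 hidden factors plus noise, an assumption made so that a few components dominate), and PCA is computed through an eigendecomposition of the covariance matrix, which is one common way to do it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 10 attributes driven by 3 hidden factors plus noise
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 10))

# 1. Center each attribute
X_centered = X - X.mean(axis=0)

# 2. Eigendecomposition of the covariance matrix
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 3. Sort components by decreasing variance (eigh returns ascending order)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 4. All 10 transformed variables, then only the first 5
all_components = X_centered @ eigenvectors
top5 = all_components[:, :5]

# Share of the total variance carried by each component
explained = eigenvalues / eigenvalues.sum()
print("variance share per component:", np.round(explained, 3))
print("share kept by 5 components:", round(float(explained[:5].sum()), 3))
```

Each column of `all_components` is a linear combination of the original 10 attributes, and the shares in `explained` sum to 1. Keeping only `top5` discards whatever share the last five components carry.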

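Below is Python code showing how PCA works on the Iris dataset:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load the four Iris features (assumed; the original does not show how x was built)
x = load_iris().data

# Project the data onto the first two principal components
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['principal component 1', 'principal component 2'])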

Here we set the number of principal components to 2, so the transformed data has 2 attributes that together represent most of the variance of the 4 variables in the original dataset.
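To check how much of the original variance those 2 components actually carry, scikit-learn exposes the fitted attribute `explained_variance_ratio_`. The sketch below assumes `x` holds the raw, unscaled Iris features loaded with `load_iris`; the article does not show how `x` was prepared, so a standardized `x` would give somewhat different numbers.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumption: x is the raw Iris feature matrix (150 samples, 4 features)
x = load_iris().data

pca = PCA(n_components=2)
reduced = pca.fit_transform(x)

# Fraction of the total variance captured by each kept component
ratios = pca.explained_variance_ratio_
print("shape after PCA:", reduced.shape)
print("variance share per component:", ratios)
print("total share kept:", ratios.sum())
```

If the total share were too low for the task at hand, raising `n_components` would be the usual adjustment.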


