Data analysis using pandas in python

In this article we are going to discuss data analysis using pandas in Python. Let's first see what pandas is and why it is used in Python.

What is pandas and why is it used?

Pandas is an open source Python library that can be used for data analysis and data manipulation, along with many other related operations.

The pandas library is built on top of NumPy and has functions for cleaning, analysing and manipulating data, which help in extracting valuable insights from a dataset.

The main data structure in pandas is two-dimensional and is called a dataframe. We can create a dataframe by importing data in formats such as csv, xlsx, json, sql and parquet. With a few lines of code we can perform many operations, such as deleting or updating rows and columns, computing statistics of the data, and identifying and handling outliers and missing values.

Why Pandas?

  • Pandas is an open source library built on Python that is easy to use for data analysis and manipulation.
  • It provides tools for reading data from files into data structures and writing it back to files, and it provides powerful aggregation functions to manipulate data.
  • Pandas helps in extracting valuable insights from a dataset.

Pandas Download

We can easily install pandas on our system, and it does not take much time. If you install Anaconda, you do not need to install Python and pandas separately. If you want to install the latest version of pandas yourself, run a simple command in the command prompt or in a Jupyter notebook.

pip install pandas

After installing pandas, you first need to import it before you can use it in a Jupyter notebook.
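
The import step can be sketched as below. The alias pd is a widely used convention rather than a requirement, and printing the version is just one simple way to confirm that the installation worked.

```python
import pandas as pd  # "pd" is the conventional alias for pandas

# Print the installed version to confirm that the import worked
print(pd.__version__)
```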

Data Analysis of survey_result_public data with Pandas

Let's explore the data and perform a practical analysis of the survey_result_public dataset with pandas. This is an open dataset that you can download from this link.

We will load the data from the .csv file into a pandas dataframe and perform the following basic operations on it:

  1. Read data
  2. View the data
  3. Understand some basic information about the data
  4. Data Selection – Indexing and Slicing data
  5. Data Selection – Based on Conditional filtering
  6. Groupby operations
  7. Sorting operation

Read Data

Load the data from the CSV file into a dataframe with pd.read_csv().
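
A minimal sketch of loading CSV data is shown below. The file name in the commented-out call is an assumption based on the dataset name, and the inline sample with its column names is hypothetical, used only so that the example runs without the downloaded file.

```python
import io
import pandas as pd

# With the downloaded file, the call would look roughly like this
# (the file name is an assumption; adjust it to your download):
# data = pd.read_csv("survey_result_public.csv")

# Hypothetical inline sample so the example runs without the file
csv_text = """Respondent,Country,Age
1,India,25
2,Germany,31
3,India,40
"""
data = pd.read_csv(io.StringIO(csv_text))
print(data)
```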

View Data

Let's see the first five and last five records of the data by using the head() and tail() methods.

head() : This method returns the first five records of the dataset by default. We can pass the number of rows we want to see, for example head(20) for the first 20 records.
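
As a small illustration of head(), the sketch below uses a hypothetical 30-row dataframe in place of the survey data.

```python
import pandas as pd

# Hypothetical sample: 30 rows with a single numeric column
data = pd.DataFrame({"value": range(30)})

first_five = data.head()      # default: the first 5 rows
first_twenty = data.head(20)  # the first 20 rows
print(first_five)
```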

tail() : This method returns the last five records from the dataset.
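
tail() works the same way from the other end. The sketch below again assumes a hypothetical 30-row dataframe; like head(), tail() also accepts the number of rows to return.

```python
import pandas as pd

# Hypothetical sample: 30 rows with a single numeric column
data = pd.DataFrame({"value": range(30)})

last_five = data.tail()     # default: the last 5 rows
last_three = data.tail(3)   # the last 3 rows
print(last_five)
```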

data.shape : This attribute returns a tuple representing the dimensions of the dataframe. Index 0 of the tuple is the number of records (rows) in the dataset and index 1 is the number of columns.
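
For example, with a small hypothetical dataframe of three rows and two columns, shape can be read and unpacked like this:

```python
import pandas as pd

# Hypothetical sample with 3 rows and 2 columns
data = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = data.shape  # unpack the (rows, columns) tuple
print(data.shape)            # (3, 2)
```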

data.info() : This method prints a summary of the dataframe, including the index dtype, the column dtypes, the non-null counts and the memory usage.
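
A short sketch of info() on a hypothetical two-column dataframe follows. Because info() prints its summary instead of returning it, the second call passes a buffer in case the text is needed as a string.

```python
import io
import pandas as pd

# Hypothetical sample with one missing value in the "name" column
data = pd.DataFrame({"name": ["Ann", "Bob", None], "age": [25, 31, 40]})

data.info()  # prints the summary and returns None

# To capture the summary as text, pass a buffer
buffer = io.StringIO()
data.info(buf=buffer)
summary = buffer.getvalue()
```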

pd.set_option : Pandas has options that let us customize aspects of its behaviour, including display settings. Setting display.max_columns and display.max_rows controls how many columns and rows are shown before the output is truncated.
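
The display options can be set as sketched below. The limit of 85 is an arbitrary value chosen for illustration, not a value taken from the original article.

```python
import pandas as pd

# Show up to 85 columns and 85 rows before pandas truncates the display
# (85 is an arbitrary illustrative limit)
pd.set_option("display.max_columns", 85)
pd.set_option("display.max_rows", 85)

# Read an option back to confirm the setting
print(pd.get_option("display.max_columns"))
```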


Create Dataframe Using Dictionary

Dictionaries store data values in key:value pairs. A dictionary is a collection which is ordered (as of Python 3.7), changeable, and does not allow duplicate keys. We can create a dataframe from a dictionary whose keys are the column names and whose values are lists of column values.
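
A minimal sketch of building a dataframe from a dictionary is given below. The names and email addresses are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data: each key becomes a column name
people = {
    "first": ["Ann", "Bob", "Cara"],
    "last": ["Lee", "Ray", "Lee"],
    "email": ["ann@example.com", "bob@example.com", "cara@example.com"],
}

df = pd.DataFrame(people)
print(df)
```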

Index

In Python, the index() method finds the position of an element in a string of characters or a list of items. In pandas, we use indexing to select columns and rows of a dataframe.

We can access a single column just like we access a key of a dictionary.
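
Bracket access can be sketched as follows, using a hypothetical dataframe with an email column.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "first": ["Ann", "Bob", "Cara"],
    "last": ["Lee", "Ray", "Lee"],
    "email": ["ann@example.com", "bob@example.com", "cara@example.com"],
})

emails = df["email"]  # same syntax as looking up a dictionary key
print(emails)
```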

We can also access a single column with attribute notation, for example df.email.

When we check the type of df['email'], it is pandas.core.series.Series. A Series is a one-dimensional labelled array of data.
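
The sketch below checks the type of a single column and shows that attribute access returns the same data as bracket access. The dataframe is hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "first": ["Ann", "Bob"],
    "email": ["ann@example.com", "bob@example.com"],
})

emails = df.email    # attribute access, equivalent to df["email"]
print(type(emails))  # <class 'pandas.core.series.Series'>
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing dataframe attribute or method, so bracket notation is the safer general choice.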

When we want to access multiple columns, we can use bracket notation and pass a list of the columns that we want to access.
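
Selecting several columns might look like this, again with hypothetical sample data. Note the double brackets: the inner pair is the Python list of column names.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "first": ["Ann", "Bob"],
    "last": ["Lee", "Ray"],
    "email": ["ann@example.com", "bob@example.com"],
})

subset = df[["last", "email"]]  # a list of names returns a dataframe
print(subset)
```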

We can see all the column names of the dataset using the columns attribute.
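
A short sketch of listing the column names, assuming a hypothetical three-column dataframe:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"first": ["Ann"], "last": ["Lee"], "email": ["ann@example.com"]})

print(df.columns)                # an Index object holding the column labels
column_names = list(df.columns)  # convert to a plain Python list if needed
```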

We can access columns by position using iloc and passing the integer index of the column. Here df.iloc[:, 1] extracts the second column.
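
Position-based selection with iloc can be sketched as below. The dataframe is hypothetical, and positions start at 0, so position 1 is the second column.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "first": ["Ann", "Bob"],
    "last": ["Lee", "Ray"],
    "email": ["ann@example.com", "bob@example.com"],
})

second_column = df.iloc[:, 1]  # all rows, column at position 1
first_row = df.iloc[0]         # row at position 0, all columns
print(second_column)
```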

Filtering

Using the filter method we can select the columns or rows of a dataframe whose labels match the ones we specify. Note that filter() selects by label, not by the values stored in the data.
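
The sketch below shows three ways to call filter() on a hypothetical dataframe: by exact column labels, by a substring of the column label, and by row index labels.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "first": ["Ann", "Bob", "Cara"],
    "last": ["Lee", "Ray", "Lee"],
    "email": ["ann@example.com", "bob@example.com", "cara@example.com"],
})

by_name = df.filter(items=["first", "email"])  # columns with these exact labels
by_pattern = df.filter(like="ast", axis=1)     # columns whose label contains "ast"
some_rows = df.filter(items=[0, 2], axis=0)    # rows with index labels 0 and 2
print(some_rows)
```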

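To filter rows by their values, we can apply two conditions to the dataset, combining them with & (and) or | (or) and wrapping each condition in parentheses.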
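
A minimal sketch of filtering on two conditions follows. The column names and values are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "first": ["Ann", "Bob", "Cara"],
    "last": ["Lee", "Ray", "Lee"],
    "age": [25, 31, 40],
})

# Each condition needs its own parentheses when combined with & or |
mask = (df["last"] == "Lee") & (df["age"] > 30)
result = df[mask]
print(result)
```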

Update

We can update the values in a pandas dataframe, for example by assigning through loc.
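
One common way to update values is assignment through loc, sketched below on hypothetical sample data. The first assignment changes a single cell and the second changes every row that matches a condition.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "first": ["Ann", "Bob", "Cara"],
    "last": ["Lee", "Ray", "Lee"],
    "email": ["ann@example.com", "bob@example.com", "cara@example.com"],
})

# Update one cell: row label 1, column "email"
df.loc[1, "email"] = "bob.ray@example.com"

# Update a column for all rows that match a condition
df.loc[df["last"] == "Lee", "last"] = "Li"
print(df)
```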

Remove

We can remove rows from a pandas dataframe with the drop() method.
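
Removing rows with drop() might look like the sketch below, which uses hypothetical sample data. drop() returns a new dataframe and leaves the original unchanged unless the result is assigned back.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "first": ["Ann", "Bob", "Cara"],
    "last": ["Lee", "Ray", "Lee"],
})

# Remove the row with index label 0
without_first = df.drop(index=0)

# Remove all rows that match a condition by passing their index labels
without_lee = df.drop(index=df[df["last"] == "Lee"].index)
print(without_lee)
```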

Sorting

We can sort the data in a pandas dataframe with the sort_values() method.
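
A short sketch of sorting by one column and by two columns is shown below. The data is hypothetical, and the ascending list gives one sort direction per column.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "first": ["Ann", "Bob", "Cara"],
    "last": ["Lee", "Ray", "Lee"],
})

# Sort by one column in ascending order (the default)
by_last = df.sort_values(by="last")

# Sort by "last" ascending, then by "first" descending within each last name
by_last_then_first = df.sort_values(by=["last", "first"], ascending=[True, False])
print(by_last_then_first)
```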

Grouping

We can group the data by category with groupby() and apply a function to each group.
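
As an illustration, the sketch below groups a hypothetical dataframe by country and computes the mean age of each group. The column names and values are sample data, not taken from the survey.

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({
    "country": ["India", "Germany", "India", "Germany"],
    "age": [25, 31, 35, 41],
})

# Group rows by country, then average the "age" column within each group
mean_age = data.groupby("country")["age"].mean()
print(mean_age)
```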

In this article we have discussed data analysis using pandas dataframes.

If you are looking for help with data analysis, please contact us at [email protected]

Thank you
