Learn Aggregation and Data Wrangling with Python

A Brief on DataFrames

A DataFrame in pandas is a two-dimensional data structure that holds data in tabular form, so we can work with it in terms of rows and columns.

A DataFrame is:

  • Mutable.
  • Capable of holding columns of different types.
  • Capable of performing arithmetic operations on rows and columns.
  • A holder of labeled axes for the rows and columns.

This is the DataFrame constructor:

pandas.DataFrame(data, index, columns, dtype, copy)
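To make the constructor arguments concrete, here is a minimal sketch that passes data, index, and columns explicitly. The sample rows and the row labels 'a' and 'b' are made up for illustration.

```python
import pandas

# Hypothetical sample data: two rows, three columns.
rows = [[1, 'duvet', 1], [2, 'bidet', 2]]

df = pandas.DataFrame(
    data=rows,                                  # the values, row by row
    index=['a', 'b'],                           # row labels
    columns=['emp_id', 'dept_name', 'aisle'],   # column labels
)

# A DataFrame is mutable: add a column computed from an existing one.
df['next_aisle'] = df['aisle'] + 1

print(df)
print(df.dtypes)   # each column keeps its own type
```

The last two lines show two of the properties listed above: columns of different types sit side by side, and arithmetic applies to a whole column at once.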

Python Data Wrangling – Prerequisites

a. Python pandas

For aggregation and data wrangling with Python, you will need the pandas library. It helps us with data manipulation and analysis, offering data structures and operations for working with numerical tables and time series.

You can install it using the following command:

  C:\Users\lifei>pip install pandas

At the time of writing, we use version 0.23.1 of pandas.

b. Python NumPy

NumPy is another Python library that lets us handle large, multi-dimensional arrays and matrices. It also offers various high-level mathematical functions to help us deal with these.

To install this, you can try the following command in the command prompt:

  C:\Users\lifei>pip install numpy

c. Python DataFrames

For our purpose, we will need two DataFrames. We can create a DataFrame from a list, a dictionary, or a Series. We can also use a NumPy array or another DataFrame to create it. Let’s use a dictionary for now.

  >>> import pandas
  >>> one=pandas.DataFrame({
  ... 'emp_id':[1,2,3,4,5],
  ... 'dept_name':['duvet','bidet','footwear','clothing','electronics'],
  ... 'aisle':[1,2,3,4,5]})
  >>> first=pandas.DataFrame(one)
  >>> two=pandas.DataFrame({
  ... 'emp_id':[6,7,8,9,10],
  ... 'dept_name':['grocery','toys','laundry','frozen','stationery'],
  ... 'aisle':[6,2,2,9,10]})
  >>> second=pandas.DataFrame(two)
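The other input types mentioned above work in much the same way. The following is a small sketch with made-up values, building a DataFrame from a list of lists, from a Series, and from a NumPy array.

```python
import numpy
import pandas

# From a list of lists, with explicit column labels.
from_list = pandas.DataFrame([[1, 'duvet'], [2, 'bidet']],
                             columns=['emp_id', 'dept_name'])

# From a Series: the Series name becomes the column label.
from_series = pandas.DataFrame(pandas.Series([1, 2, 3], name='aisle'))

# From a two-dimensional NumPy array.
from_array = pandas.DataFrame(numpy.array([[1, 1], [2, 2]]),
                              columns=['emp_id', 'aisle'])

print(from_list)
print(from_series)
print(from_array)
```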

We also create a CSV file in Excel, saved in UTF-8 format. This is the supermarket.csv file we read later.

Why we need Data Wrangling with Python

Much of the data obtained from various sources is raw and unusable. It can be messy or incomplete. With data wrangling in Python, we can perform operations on raw data to clean it up to an extent. Wrangling is essential to data science. Let’s take a quick look at it.

Dropping Missing Values

In our CSV file, the frozen department has no aisle number.

  >>> import os
  >>> os.chdir('C:\\Users\\lifei\\Desktop')
  >>> three=pandas.read_csv('supermarket.csv')
  >>> pandas.isnull(three).any()
  emp_id       False
  dept_name    False
  aisle         True
  dtype: bool

This shows us that the ‘aisle’ column does have a missing value. The following command shows the first five rows of the file, where the missing value appears as NaN:

  >>> three.head()
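If you do not have the CSV file at hand, the check can be reproduced with an in-memory stand-in. The contents below are an assumption (the real supermarket.csv is not shown here); the sketch counts missing values per column and then selects only the affected rows.

```python
import io
import pandas

# Hypothetical stand-in for supermarket.csv; the real file's contents are assumed.
csv_text = """emp_id,dept_name,aisle
6,grocery,6
7,toys,2
8,laundry,2
9,frozen,
10,stationery,10
"""
three = pandas.read_csv(io.StringIO(csv_text))

missing_per_column = three.isnull().sum()        # count of missing values per column
rows_with_gap = three[three['aisle'].isnull()]   # only the rows lacking an aisle

print(missing_per_column)
print(rows_with_gap)
```

Counting with isnull().sum() says how many values are missing, where any() only says whether there are some.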

To drop this record, we use dropna(). It returns a new DataFrame and leaves three itself unchanged:

  >>> three.dropna()
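A short sketch of how dropna() behaves, using a small made-up DataFrame in place of the CSV data: the result has to be assigned to a name to be kept, and the subset parameter limits the check to chosen columns.

```python
import numpy
import pandas

# Hypothetical data with one missing aisle value.
three = pandas.DataFrame({
    'emp_id': [8, 9, 10],
    'dept_name': ['laundry', 'frozen', 'stationery'],
    'aisle': [2, numpy.nan, 10],
})

cleaned = three.dropna()                    # new DataFrame without the incomplete row
by_aisle = three.dropna(subset=['aisle'])   # only look for gaps in the 'aisle' column

print(len(three), len(cleaned))             # the original keeps all of its rows
print(cleaned)
```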

Grouping Data

The pandas groupby() method returns a DataFrameGroupBy object. Selecting a column from it and calling value_counts() returns the number of occurrences of each unique value in that column, within each group.

  >>> three.groupby('aisle').dept_name.value_counts()

This tells us which departments appear in each aisle and how often. Grouping like this lets us discover trends in data. For example, to see trends by city, we can group records by a geographical column.
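As a sketch of grouping on data whose contents are known, the example below groups the DataFrame two from earlier by aisle. The size() and agg(list) calls are additions for illustration, not part of the original walkthrough.

```python
import pandas

two = pandas.DataFrame({
    'emp_id': [6, 7, 8, 9, 10],
    'dept_name': ['grocery', 'toys', 'laundry', 'frozen', 'stationery'],
    'aisle': [6, 2, 2, 9, 10],
})

grouped = two.groupby('aisle')

rows_per_aisle = grouped.size()                     # number of rows in each aisle
depts_per_aisle = grouped['dept_name'].agg(list)    # department names collected per aisle

print(rows_per_aisle)
print(depts_per_aisle)
```

Here aisle 2 is the only group with more than one row, since both toys and laundry share it.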

a. Finding unique values

For this, we can use the unique() method.

  >>> two.aisle.unique()
  array([ 6,  2,  9, 10], dtype=int64)
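Two related Series methods are often useful alongside unique(). This sketch, using the same aisle values as two, counts the distinct values with nunique() and tallies each one with value_counts().

```python
import pandas

# Same aisle values as the DataFrame 'two' above.
two = pandas.DataFrame({'aisle': [6, 2, 2, 9, 10]})

distinct = two['aisle'].nunique()       # how many different aisles there are
tally = two['aisle'].value_counts()     # how many times each aisle appears

print(distinct)
print(tally)
```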

b. Length of dataframe

The len() function gives us the number of rows in the DataFrame.

  >>> len(two)
  5
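Since len() counts rows only, the shape attribute is a handy companion when the number of columns matters too. A minimal sketch with the same data as two:

```python
import pandas

two = pandas.DataFrame({
    'emp_id': [6, 7, 8, 9, 10],
    'dept_name': ['grocery', 'toys', 'laundry', 'frozen', 'stationery'],
    'aisle': [6, 2, 2, 9, 10],
})

n_rows, n_cols = two.shape   # shape is a (rows, columns) tuple

print(n_rows, n_cols)
print(len(two.columns))      # number of columns on its own
```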
