Feature Engineering - Min/Max Aggregate

TLDR

In this lesson, we’ll learn about the aggregate functions min() and max(), and see how they’re helpful in analyzing and understanding the data.

Glossary

  • Data Aggregation
  • Why is it necessary
  • Definition
  • Example
  • How to code

Data Aggregation

Data aggregation is the process of summarizing data. Some of the most common aggregate functions are min(), max(), mean(), count(), and sum().
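As a quick illustration, the sketch below applies these aggregate functions to a plain Python list using only built-ins and the standard `statistics` module. The prices are made-up values, and the list's length stands in for count().

```python
from statistics import mean

# Hypothetical prices, made up purely to illustrate the aggregate functions
prices = [12.5, 7.0, 30.0, 7.0, 18.5]

# Each aggregate function collapses the whole list into a single summary value
summary = {
    "min": min(prices),    # smallest value
    "max": max(prices),    # largest value
    "mean": mean(prices),  # arithmetic average
    "count": len(prices),  # number of values
    "sum": sum(prices),    # total of all values
}
print(summary)
```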

Why is it necessary

Data aggregation is part of the data analysis process, which is the first and most critical step of model building. Summarizing the data lets us delve deeper into it and understand it better.

Definition

In this lesson, we’ll explore min() and max() functions in detail.

  1. min(): This function helps us find the minimum or least value in a feature or column.
  2. max(): This function helps us find the maximum or highest value in a feature or column.
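Conceptually, both functions scan the column once while remembering the smallest and largest values seen so far. The sketch below shows that idea with a hypothetical helper, `min_max`; it illustrates the logic only and is not necessarily how Python or pandas implement these functions internally.

```python
def min_max(values):
    """Return (minimum, maximum) of a non-empty sequence in a single pass."""
    if not values:
        raise ValueError("min_max() needs at least one value")
    # Start with the first value as both the lowest and highest seen so far
    lowest = highest = values[0]
    for v in values[1:]:
        if v < lowest:
            lowest = v       # found a new minimum
        elif v > highest:
            highest = v      # found a new maximum
    return lowest, highest


print(min_max([4, 9, 1, 7]))
```

The `elif` is safe because a value smaller than the current minimum can never also be larger than the current maximum.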

We can apply aggregate functions in 2 different ways:

Case-1: Apply aggregate functions on a single feature or column i.e., analyzing each column individually.

Case-2: Apply aggregate functions on groups i.e., we’ll group rows and analyze each group individually.

Example

Consider a dataset with two columns, “Product” and “Price”. Let’s apply the aggregate functions min() and max() to find the minimum and maximum values in the “Price” column.

Find minimum price

Find maximum price
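The original table is not reproduced here, so the sketch below uses a small hypothetical data frame: the product names come from this example, but the prices are invented. It shows Case-1, where min() and max() each scan the whole “Price” column and return a single value.

```python
import pandas as pd

# Hypothetical rows: product names come from the example, prices are made up
df = pd.DataFrame({
    "Product": ["Laptop", "Desk", "Chair", "Laptop", "Desk", "Chair"],
    "Price": [1200, 300, 80, 950, 450, 60],
})

# Case-1: each aggregate looks at the entire "Price" column and returns one value
min_price = df["Price"].min()
max_price = df["Price"].max()
print("Minimum price:", min_price)
print("Maximum price:", max_price)
```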

For Case-2, grouping is a three-step process:

Step-1: Split the rows into groups based on the “Product” column.

There are 3 unique products (Laptop, Desk, Chair) in the “Product” column, so the rows are split into 3 groups.

Step-2: Apply the aggregate function to each group, for example, find the minimum price of each unique product.

Step-3: Display the output. To do this, we’ll combine each group’s output into a data frame and display it.
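The three steps can be sketched in plain Python, with pandas used only to build the final data frame. The rows are hypothetical (invented prices), and the column names “Min Price” and “Max Price” are chosen here for illustration.

```python
import pandas as pd

# Hypothetical (product, price) rows; the prices are made up for illustration
rows = [("Laptop", 1200), ("Desk", 300), ("Chair", 80),
        ("Laptop", 950), ("Desk", 450), ("Chair", 60)]

# Step-1 (split): collect the prices of each unique product into its own group
groups = {}
for product, price in rows:
    groups.setdefault(product, []).append(price)

# Step-2 (apply): aggregate each group independently
group_min = {product: min(prices) for product, prices in groups.items()}
group_max = {product: max(prices) for product, prices in groups.items()}

# Step-3 (combine): gather the per-group results into one data frame
result = pd.DataFrame({"Min Price": group_min, "Max Price": group_max})
result.index.name = "Product"
print(result)
```

This split-apply-combine pattern is the same one that pandas’ groupby() automates.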

Steps to find the minimum price of each unique product

Steps to find the maximum price of each unique product

How to code

In recent years, the popularity of ridesharing has skyrocketed. The key benefits of ridesharing are that it’s inexpensive and convenient, and it allows anyone to travel easily from one location to another.

Image by mohamed Hassan from Pixabay

Service providers frequently change prices based on time, traffic, the number of cabs available, and other factors. Because costs fluctuate, it’s helpful to show users a range of prices for a specific route. So, using the rides data, let’s find the minimum and maximum prices for each unique route.

Find the minimum and maximum price of each unique route.

Step-1:

First, let’s group the rides by source and then by destination. To do this, we’ll iterate through the rows of the rides data and use each “source” as a key of a dictionary whose value is a list of (destination, price) pairs. The final result should look like this:

Output format: {'sourceA': [(destination1, price1), (destination1, price2), ...], 'sourceB': [(destination1, price1), (destination1, price2), ...], ...}
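A minimal sketch of this step follows, assuming each ride is a dictionary with “source”, “destination”, and “price” keys. The row format and the helper name `group_by_source` are hypothetical; the place names appear in this article, but the prices are invented.

```python
# Hypothetical rides: place names come from the article, prices are invented
rides = [
    {"source": "Haymarket Square", "destination": "North Station", "price": 7.0},
    {"source": "Haymarket Square", "destination": "West End", "price": 10.5},
    {"source": "Haymarket Square", "destination": "North Station", "price": 16.5},
    {"source": "North Station", "destination": "West End", "price": 9.0},
    {"source": "Haymarket Square", "destination": "West End", "price": 5.0},
]


def group_by_source(rows):
    """Map each source to a list of (destination, price) tuples."""
    grouped = {}
    for row in rows:
        # Create the source's list on first sight, then append this ride
        grouped.setdefault(row["source"], []).append((row["destination"], row["price"]))
    return grouped


by_source = group_by_source(rides)
print(by_source)
```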

Step-2:

Find minimum price

By comparing the prices of routes with the same starting location and destination, we'll find the minimum price of each route.
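One way to sketch this comparison: treat each (source, destination) pair as a route and keep the lowest price seen for it. The `by_source` dictionary below is a hypothetical Step-1 result with invented prices, and `min_price_per_route` is an illustrative helper name.

```python
# Hypothetical Step-1 output in the article's format; prices are invented
by_source = {
    "Haymarket Square": [("North Station", 7.0), ("West End", 10.5),
                         ("North Station", 16.5), ("West End", 5.0)],
    "North Station": [("West End", 9.0)],
}


def min_price_per_route(grouped):
    """Return {(source, destination): lowest price} by comparing prices per route."""
    lowest = {}
    for source, trips in grouped.items():
        for destination, price in trips:
            route = (source, destination)
            # Keep this price if the route is new or the price beats the current minimum
            if route not in lowest or price < lowest[route]:
                lowest[route] = price
    return lowest


lowest_prices = min_price_per_route(by_source)
print(lowest_prices)
```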

Lowest price of each unique route

Find maximum price

By comparing the prices of routes with the same starting point and destination, we'll find the highest price for each route.
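The sketch below takes a slightly different route to the same kind of result: it first collects every price of a route into a list and then lets the built-in max() compare them. The input is again a hypothetical Step-1 result with invented prices, and `max_price_per_route` is an illustrative name.

```python
# Hypothetical Step-1 output in the article's format; prices are invented
by_source = {
    "Haymarket Square": [("North Station", 7.0), ("West End", 10.5),
                         ("North Station", 16.5), ("West End", 5.0)],
    "North Station": [("West End", 9.0)],
}


def max_price_per_route(grouped):
    """Return {(source, destination): highest price} for every route."""
    prices_by_route = {}
    for source, trips in grouped.items():
        for destination, price in trips:
            # Gather all prices observed for the same (source, destination) pair
            prices_by_route.setdefault((source, destination), []).append(price)
    # max() then compares the prices within each route
    return {route: max(prices) for route, prices in prices_by_route.items()}


highest_prices = max_price_per_route(by_source)
print(highest_prices)
```

Keeping the full price list per route uses more memory than a running maximum, but the same lists could also feed min() to report each route’s price range.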

Highest price of each unique route

From the output, we see that the price from “Haymarket Square” to “North Station” ranges from 3.0 to 32.5, the price from “Haymarket Square” to “West End” ranges from 3.0 to 27.5, and so on.

Group rows of the same route and find the minimum and maximum price of each individual route.

Pandas has a built-in method, groupby(), that’s used to group the rows of a data frame. It’s used along with the min() and max() functions to find the minimum and maximum values of each unique group.
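A minimal pandas sketch, assuming the rides data is a data frame with “source”, “destination”, and “price” columns (the rows here are invented). Grouping by both columns makes each (source, destination) pair one route; agg() is shown as an optional way to get both aggregates at once.

```python
import pandas as pd

# Hypothetical rides; column names follow the article, prices are invented
rides = pd.DataFrame({
    "source": ["Haymarket Square", "Haymarket Square", "Haymarket Square",
               "North Station", "Haymarket Square"],
    "destination": ["North Station", "West End", "North Station",
                    "West End", "West End"],
    "price": [7.0, 10.5, 16.5, 9.0, 5.0],
})

# Grouping by both columns treats each (source, destination) pair as one route
route_groups = rides.groupby(["source", "destination"])["price"]

min_prices = route_groups.min()  # lowest price of each route
max_prices = route_groups.max()  # highest price of each route

# agg() computes both aggregates in one call, one output column per function
price_range = route_groups.agg(["min", "max"])
print(price_range)
```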

Find minimum price

Find maximum price

Magical no-code solution

For quick analysis and results, try our product, Mage. Our service features an “Edit data” area with multiple aggregation options. Apart from analyzing the data, you can create a new column to store the aggregation results for further analysis.
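For readers working in code instead, storing an aggregation result as a new column can be sketched in pandas with groupby().transform(), which returns one value per original row. This is a rough pandas analogue, not Mage’s implementation; the rows are invented and the column names `route_min_price` and `route_max_price` are hypothetical.

```python
import pandas as pd

# Hypothetical rides; column names follow the article, prices are invented
rides = pd.DataFrame({
    "source": ["Haymarket Square", "Haymarket Square", "Haymarket Square",
               "North Station", "Haymarket Square"],
    "destination": ["North Station", "West End", "North Station",
                    "West End", "West End"],
    "price": [7.0, 10.5, 16.5, 9.0, 5.0],
})

grouped = rides.groupby(["source", "destination"])["price"]

# transform() broadcasts each route's aggregate back onto every row of that route
rides["route_min_price"] = grouped.transform("min")
rides["route_max_price"] = grouped.transform("max")
print(rides)
```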

Want to learn more about machine learning (ML)? Visit Mage Academy!
