
Quartiles

Claudiu Clement

CTO @ e-Comas and PhD in Stats, sharing simplified insights on e-commerce analytics and eRetailer trends.

Published: January 6, 2024


This is one of the last articles published here. The newsletter has moved to Substack.

Subscribe here: Newsletter new location

Table of contents:

What are quartiles
When can we use them
How to calculate them manually
How to calculate them using Python

1. What are quartiles

Quartiles are position indicators that divide a sequence of numbers into 4 equal parts.

Let’s look at the diagram below (Figure 1).

We have a sequence of n = 12 numbers (ranging from 14 to 57). Let’s imagine they represent the number of tractors on each of 12 farms in the northern region of Statistics Land.

Quartile analysis is part of descriptive statistics and consequently helps us better understand the data at hand. Specifically, it helps us understand what is called central tendency:

“A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data”

2. When can we use them

OK, let’s get back to our example. A businessman asks us to help him better understand this northern region, where he wants to open a farm. His main question is: “Will I be in the top 25% of farms by tractor count if I open a farm with 43 tractors?”

Hmm… Great question.

As we can see in Figure 1 above, I have already split the data into four quarters.

The interpretation is the following:

50% of farms in the northern area of Statistics Land have fewer than 36 tractors and 50% have more.
25% of these farms have more than 51 tractors.
75% of these farms have fewer than 51 tractors.

I know. I took it too fast. Let’s look below to see how we actually calculate these values.

3. How to calculate them manually

Before calculating any quartiles, we need to order the sequence from lowest to highest (already done in Figure 1).

Next, let’s look at Figures 2.1 and 2.2 below for some clarification.

Next, let’s calculate the position of each quartile before calculating its value.

So, what do these 3.25 / 6.5 / 9.75 mean?

These represent the positions of the quartile values in this sequence of numbers.
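As a quick sketch, the three positions are consistent with the rule position = (n + 1) * k / 4 for the k-th quartile. The rule itself is an assumption inferred from the numbers quoted above; the names `n` and `positions` are only for illustration.

```python
# Number of farms in the example sequence.
n = 12

# 1-based position of the k-th quartile, assuming the (n + 1) * k / 4 rule.
positions = {f"Q{k}": (n + 1) * k / 4 for k in (1, 2, 3)}

for name, pos in positions.items():
    print(name, pos)  # Q1 3.25, Q2 6.5, Q3 9.75
```

A fractional position such as 3.25 simply means the quartile falls between two neighbouring values of the ordered sequence.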

In Figure 2.1, I have written the index of each number in blue. For Q1, for example, the position is 3.25, so somewhere between the 3rd and 4th numbers, closer to the 3rd than to the 4th. We can now intuitively state that Q1 lies somewhere between 16 and 17 (index 3 = 16, index 4 = 17).

But what is the number?

What we can do is take the average of these two numbers, and we end up with the value of our Q1.

So Q1 = (16+17)/2 = 16.5

For Q2, the position is between the 6th and 7th indexes, so we average the 6th and 7th values of our sequence: Q2 = (32+40)/2 = 36.

The same goes for Q3: its position is 9.75, so somewhere between the 9th and 10th positions. Q3 = (50+52)/2 = 51.
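The averaging steps above can be sketched in a few lines of Python. Note the assumptions: the article only gives the values at positions 3, 4, 6, 7, 9 and 10 plus the range 14 to 57, so the other entries in `tractors` are made-up placeholders, and `midpoint_quartile` is a hypothetical helper name.

```python
import math

# Ordered tractor counts. Only 16, 17, 32, 40, 50, 52 and the range 14..57
# come from the article; 15, 25, 45 and 55 are placeholders.
tractors = [14, 15, 16, 17, 25, 32, 40, 45, 50, 52, 55, 57]


def midpoint_quartile(sorted_values, k):
    """Average the two values on either side of position (n + 1) * k / 4."""
    pos = (len(sorted_values) + 1) * k / 4      # 1-based position, e.g. 3.25
    lower = sorted_values[math.floor(pos) - 1]  # value just below the position
    upper = sorted_values[math.ceil(pos) - 1]   # value just above the position
    return (lower + upper) / 2


q1, q2, q3 = (midpoint_quartile(tractors, k) for k in (1, 2, 3))
print(q1, q2, q3)  # 16.5 36.0 51.0
```

The placeholders do not affect the result, because each quartile only depends on the two values next to its position.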

Now, if you enter my numbers into an online calculator, you will get slightly different results. Why? Because the position of Q1 in our case is exactly 3.25, which means the precise value lies a quarter of the way from 16 to 17. The website above shows the value 16.25, calculated as (16*3+17)/4. The value is 3 times closer to 16 than it is to 17.
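Here is a small sketch of that more precise calculation. It interpolates between the two neighbouring values instead of averaging them, and cross-checks the result against `statistics.quantiles` from the Python standard library, whose default "exclusive" method uses the same (n + 1) positions. As an assumption, the values 15, 25, 45 and 55 in `tractors` are placeholders not given in the article, and `interpolated_quartile` is a hypothetical helper.

```python
import statistics

# Ordered tractor counts. Only 16, 17, 32, 40, 50, 52 and the range 14..57
# come from the article; 15, 25, 45 and 55 are placeholders.
tractors = [14, 15, 16, 17, 25, 32, 40, 45, 50, 52, 55, 57]


def interpolated_quartile(sorted_values, k):
    """Interpolate linearly at the 1-based position (n + 1) * k / 4."""
    pos = (len(sorted_values) + 1) * k / 4
    lower_idx = int(pos)       # 1-based index of the value below the position
    frac = pos - lower_idx     # how far we are towards the next value
    lower = sorted_values[lower_idx - 1]
    upper = sorted_values[min(lower_idx, len(sorted_values) - 1)]
    return lower + frac * (upper - lower)


manual = [interpolated_quartile(tractors, k) for k in (1, 2, 3)]
stdlib = statistics.quantiles(tractors, n=4)  # default method="exclusive"

print(manual)  # [16.25, 36.0, 51.5]
print(stdlib)  # [16.25, 36.0, 51.5]
```

With this approach Q1 comes out as 16.25, and Q3 becomes 51.5 rather than the averaged 51.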

4. How to calculate them using Python

OK, now let’s move to Python. For this calculation, we will use NumPy.
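A minimal sketch of that calculation with `np.quantile` and its default settings could look like this. As an assumption, the values 15, 25, 45 and 55 in `tractors` are placeholders, since the article only states the values at positions 3, 4, 6, 7, 9 and 10 and the range 14 to 57.

```python
import numpy as np

# Ordered tractor counts. Only 16, 17, 32, 40, 50, 52 and the range 14..57
# come from the article; 15, 25, 45 and 55 are placeholders.
tractors = [14, 15, 16, 17, 25, 32, 40, 45, 50, 52, 55, 57]

# Default settings: linear interpolation between neighbouring values.
q1, q2, q3 = np.quantile(tractors, [0.25, 0.5, 0.75])

print(q1, q2, q3)  # 16.75 36.0 50.5
```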

Alright, OK. But something is off, isn’t it? We have Q1 = 16.75, compared to the 16.5 we calculated by hand or the 16.25 that the website above gives.

OK. I won’t bore you with too many details. The difference comes from the default setting of np.quantile, which interpolates linearly between neighbouring values; more on it here.

A simple solution that sticks with our reasoning from this article: add method='midpoint' (named interpolation='midpoint' in NumPy versions before 1.22) and we’re all set.
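As a sketch, assuming NumPy 1.22 or later (where the argument is called `method`), the midpoint setting reproduces the hand-calculated values. The values 15, 25, 45 and 55 in `tractors` are again placeholders not given in the article.

```python
import numpy as np

# Ordered tractor counts. Only 16, 17, 32, 40, 50, 52 and the range 14..57
# come from the article; 15, 25, 45 and 55 are placeholders.
tractors = [14, 15, 16, 17, 25, 32, 40, 45, 50, 52, 55, 57]

# "midpoint" averages the two neighbouring values instead of interpolating.
q1, q2, q3 = np.quantile(tractors, [0.25, 0.5, 0.75], method="midpoint")

print(q1, q2, q3)  # 16.5 36.0 51.0
```

On older NumPy versions the same call would use `interpolation="midpoint"` instead of `method="midpoint"`.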

In Figure 1, I wrote “Interquartile Range”, which represents the midspread, or the middle 50% of our data. It also helps us understand which values can be considered outliers; I wrote a few things about this here.
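To make this concrete, here is a small sketch that computes the interquartile range from the Q1 and Q3 values in this article. The outlier limits use the common 1.5 × IQR convention, which is an assumption on my side and not something stated above; the names `lower_fence` and `upper_fence` are only for illustration.

```python
# Quartiles calculated by hand earlier in the article.
q1 = 16.5
q3 = 51.0

# Interquartile range: the spread of the middle 50% of the data.
iqr = q3 - q1

# Assumed convention: values beyond 1.5 * IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(iqr)                       # 34.5
print(lower_fence, upper_fence)  # -35.25 102.75
```

Under that convention, none of the farms in our sequence (14 to 57 tractors) would count as an outlier.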

Finally, let’s not forget our businessman. Let’s answer his question: will he be in the top 25% of farms if his farm owns 43 tractors?

The answer is no. To be in the top 25%, he will need at least 51 tractors.

Conclusion

We’ve looked at what quartiles are, when we can use them and how they’re calculated. I hope it is a bit clearer now.

Until next time, keep learning.

Is Not Rocket Science
