Quartiles

This is one of the last articles here: the newsletter has moved to Substack.

Subscribe here: Newsletter new location

Table of contents:

  1. What are quartiles
  2. When can we use them
  3. How to calculate them manually
  4. How to calculate them using python

1. What are quartiles

Quartiles are position measures that divide an ordered sequence of numbers into four equal parts.

Let’s look at the schema below.

We have a sequence of n = 12 numbers (from 14 to 57). Let’s imagine they represent the number of tractors owned by each of 12 farms in the northern region of Statistics Land.

Quartile analysis is part of descriptive statistics, so it helps us better understand the data at hand. Specifically, the middle quartile (Q2, the median) is a measure of what is called central tendency:

“A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data”

2. When can we use them

OK, let’s come back now. We have a businessman who asks us to help him better understand this northern region, where he wants to open a farm. His main question is: “Will I be in the top 25% of farms by tractor count if I open a farm with 43 tractors?”

Hmm… Great question.

As we can see in Figure 1 above, I have already split the data into 4 quarters.

The interpretation is the following:

  1. 50% of farms in the northern area of Statistics Land have fewer than 36 tractors and 50% have more.
  2. 25% of these farms have more than 51 tractors.
  3. 75% of these farms have fewer than 51 tractors.
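
As a quick sanity check of this interpretation, the Python sketch below counts the share of farms on each side of 36 and 51. Figure 1 isn't reproduced here, so the list is partly hypothetical: the values at positions 1, 3, 4, 6, 7, 9, 10 and 12 (14, 16, 17, 32, 40, 50, 52, 57) come from this article, while 15, 25, 45 and 55 are made-up placeholders. Any placeholders that keep the list sorted give the same shares.

```python
# Tractor counts for the 12 farms, sorted ascending.
# Positions 1, 3, 4, 6, 7, 9, 10, 12 come from the article; the values
# 15, 25, 45 and 55 are hypothetical placeholders that keep the list sorted.
tractors = [14, 15, 16, 17, 25, 32, 40, 45, 50, 52, 55, 57]


def share(condition, values):
    """Return the fraction of values for which condition(value) is True."""
    return sum(1 for v in values if condition(v)) / len(values)


print(share(lambda v: v < 36, tractors))  # fewer than 36 tractors -> 0.5
print(share(lambda v: v > 36, tractors))  # more than 36 tractors  -> 0.5
print(share(lambda v: v > 51, tractors))  # more than 51 tractors  -> 0.25
print(share(lambda v: v < 51, tractors))  # fewer than 51 tractors -> 0.75
```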

I know, I went through that too fast. Let’s look below at how we actually calculate these values.

3. How to calculate them manually

Before calculating any quartiles, we need to sort the sequence from lowest to highest (I already did that in Figure 1).

Next, let’s look at Figures 2.1 and 2.2 below for some clarification.

Then, before calculating the quartile values, let’s calculate their positions. For the k-th quartile the position is k(n+1)/4, which for n = 12 gives 3.25, 6.5 and 9.75.
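
The position step can be written as a tiny Python helper. Figures 2.1 and 2.2 aren't reproduced here, so this sketch assumes the convention k(n+1)/4 for the k-th quartile on 1-based indices, which reproduces the 3.25 / 6.5 / 9.75 values used in this article; `quartile_positions` is a hypothetical name.

```python
def quartile_positions(n):
    """1-based positions of Q1, Q2 and Q3, assuming the k*(n+1)/4 convention."""
    return [k * (n + 1) / 4 for k in (1, 2, 3)]


print(quartile_positions(12))  # -> [3.25, 6.5, 9.75]
```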

So, what do these 3.25 / 6.5 / 9.75 mean?

They are the positions of the quartile values in our ordered sequence.

In Figure 2.1, I wrote the index of each number in blue. For example, Q1’s position is 3.25, so it lies between indices 3 and 4, closer to the 3rd than to the 4th. We can now intuitively state that Q1 is somewhere between 16 and 17 (index 3 = 16, index 4 = 17).

But what is the number?

One simple approach is to take the average of these two numbers, which gives us the value of Q1.

So Q1 = (16+17)/2 = 16.5

For Q2, the position is 6.5, so the value lies between the 6th and 7th values of our sequence; averaging them gives Q2 = (32+40)/2 = 36.

The same goes for Q3: the position is 9.75, so the value lies between the 9th and 10th values, and Q3 = (50+52)/2 = 51.
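
The whole manual method (sort, find the position k(n+1)/4, average the two neighboring values) fits in a short Python function. `midpoint_quartiles` is a hypothetical name, and the list reuses the article's known values at positions 1, 3, 4, 6, 7, 9, 10 and 12 with made-up placeholders (15, 25, 45, 55) elsewhere; the quartiles only depend on positions 3, 4, 6, 7, 9 and 10, so the placeholders don't affect the result. Clamping the neighbors to the data range is my own addition so that very short sequences don't break.

```python
import math


def midpoint_quartiles(values):
    """Q1, Q2, Q3 by averaging the two values around position k*(n+1)/4."""
    data = sorted(values)                     # quartiles need an ordered sequence
    n = len(data)
    result = []
    for k in (1, 2, 3):
        pos = k * (n + 1) / 4                 # 1-based, possibly fractional
        lo = min(max(math.floor(pos), 1), n)  # neighbor below, clamped to the data
        hi = min(max(math.ceil(pos), 1), n)   # neighbor above, clamped to the data
        result.append((data[lo - 1] + data[hi - 1]) / 2)
    return result


# Known values from the article; 15, 25, 45, 55 are hypothetical placeholders.
tractors = [14, 15, 16, 17, 25, 32, 40, 45, 50, 52, 55, 57]
print(midpoint_quartiles(tractors))  # -> [16.5, 36.0, 51.0]
```

When a position is a whole number, both neighbors are the same index, so the function simply returns the value at that index.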

Now, if you entered my numbers into the website above, you would get slightly different results. Why? Because Q1’s position is precisely 3.25, which means its exact value lies one quarter of the way from 16 to 17. The website shows 16.25, calculated as (16*3+17)/4: the value is three times closer to 16 than it is to 17.
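
The website's value matches linear interpolation at the same position: start from the lower neighbor and move the fractional part of the way toward the upper one. Below is a minimal sketch with the same partly hypothetical list (placeholders at positions 2, 5, 8 and 11 that don't affect the result); `interpolated_quartiles` is a hypothetical name. As a cross-check, Python's standard library function `statistics.quantiles` (Python 3.8+) with its default `method="exclusive"` uses the same (n+1) convention.

```python
import math
import statistics


def interpolated_quartiles(values):
    """Q1, Q2, Q3 by linear interpolation at 1-based position k*(n+1)/4."""
    data = sorted(values)                      # quartiles need an ordered sequence
    n = len(data)
    result = []
    for k in (1, 2, 3):
        pos = min(max(k * (n + 1) / 4, 1), n)  # keep the position inside the data
        lo = math.floor(pos)                   # lower neighbor (1-based)
        hi = min(lo + 1, n)                    # upper neighbor (1-based)
        frac = pos - lo                        # share of the way toward the upper neighbor
        result.append(data[lo - 1] + frac * (data[hi - 1] - data[lo - 1]))
    return result


# Known values from the article; 15, 25, 45, 55 are hypothetical placeholders.
tractors = [14, 15, 16, 17, 25, 32, 40, 45, 50, 52, 55, 57]
print(interpolated_quartiles(tractors))     # -> [16.25, 36.0, 51.5]
print(statistics.quantiles(tractors, n=4))  # -> [16.25, 36.0, 51.5]
```

For Q1 this is 16 + 0.25 × (17 - 16) = 16.25, the same number as (16*3+17)/4.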

4. How to calculate them using python

OK, now let’s move to Python. For this calculation, we will use NumPy.

But something is off, isn’t it? We get Q1 = 16.75, compared to our 16.5 or the 16.25 that the website above calculates.

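I won’t keep you in suspense. The difference comes from the default setting of np.quantile, which uses linear interpolation with the position computed as p(n-1) on 0-based indices. For Q1 that is 0.25 × 11 = 2.75, i.e. the 1-based position 3.75 rather than our 3.25, so the result lands three quarters of the way from 16 to 17: 16.75. More on it here.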
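
To see where 16.75 comes from, here is a pure-Python sketch of how NumPy's default `method="linear"` places the quartiles: position p(n-1) on 0-based indices, then linear interpolation. `linear_quartiles` is a hypothetical helper, not part of NumPy, and the list uses the article's known values plus made-up placeholders at positions 2, 5, 8 and 11, which don't change the result. The standard library's `statistics.quantiles(..., method="inclusive")` follows the same convention and serves as a cross-check.

```python
import math
import statistics


def linear_quartiles(values):
    """Q1, Q2, Q3 placed the way NumPy's default method="linear" places them."""
    data = sorted(values)
    n = len(data)
    result = []
    for k in (1, 2, 3):
        pos = (n - 1) * k / 4      # 0-based position, e.g. 2.75 for Q1 when n = 12
        lo = math.floor(pos)       # lower neighbor (0-based)
        hi = min(lo + 1, n - 1)    # upper neighbor (0-based)
        frac = pos - lo            # share of the way toward the upper neighbor
        result.append(data[lo] + frac * (data[hi] - data[lo]))
    return result


# Known values from the article; 15, 25, 45, 55 are hypothetical placeholders.
tractors = [14, 15, 16, 17, 25, 32, 40, 45, 50, 52, 55, 57]
print(linear_quartiles(tractors))                               # -> [16.75, 36.0, 50.5]
print(statistics.quantiles(tractors, n=4, method="inclusive"))  # -> [16.75, 36.0, 50.5]
```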

A simple fix that sticks with the reasoning from this article is to add interpolation='midpoint' (since NumPy 1.22 this parameter is called method, so the current spelling is method='midpoint'), and we’re all set.
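
In code, the comparison looks roughly like this. It assumes NumPy 1.22 or later, where the keyword is `method`; the tractor list is partly hypothetical (made-up placeholders at positions 2, 5, 8 and 11, which don't change the quartiles).

```python
import numpy as np

# Sorted tractor counts; 15, 25, 45 and 55 are hypothetical placeholders.
tractors = np.array([14, 15, 16, 17, 25, 32, 40, 45, 50, 52, 55, 57])
probs = [0.25, 0.5, 0.75]  # Q1, Q2, Q3

default_q = np.quantile(tractors, probs)                      # default: method="linear"
midpoint_q = np.quantile(tractors, probs, method="midpoint")  # average of the two neighbors

print(default_q)   # Q1, Q2, Q3 = 16.75, 36.0, 50.5
print(midpoint_q)  # Q1, Q2, Q3 = 16.5, 36.0, 51.0
```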

In Figure 1 I also marked the “Interquartile Range” (IQR = Q3 - Q1), which represents the midspread, or the middle 50% of our data. It also helps us decide which values can be considered outliers; I wrote a few things about this here.
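
The interquartile range is simply Q3 - Q1. Below is a sketch using this article's quartiles (Q1 = 16.5, Q3 = 51). For outliers it assumes the widely used 1.5 × IQR fences (Tukey's rule), which may not be exactly what the linked article describes; `iqr_fences` is a hypothetical name.

```python
def iqr_fences(q1, q3, factor=1.5):
    """Return the IQR and the lower/upper outlier fences (factor x IQR rule)."""
    iqr = q3 - q1
    return iqr, q1 - factor * iqr, q3 + factor * iqr


iqr, low, high = iqr_fences(16.5, 51)
print(iqr, low, high)  # -> 34.5 -35.25 102.75

# The article's sequence ranges from 14 to 57, so nothing falls outside the fences.
smallest, largest = 14, 57
print(smallest < low or largest > high)  # -> False
```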

Finally, let’s not forget our businessman and answer his question: will he be in the top 25% of farms if his farm owns 43 tractors?

The answer is no. To be in the top 25%, he would need at least 51 tractors, the value of Q3.

Conclusion

We’ve looked at what quartiles are, when we can use them, and how to calculate them, both manually and with Python. I hope it is a bit clearer now.

Until next time, keep learning.


More articles by Claudiu Clement
