Demystifying the Interquartile Range (IQR) in Python with NumPy and Pandas
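The interquartile range (IQR) measures the spread of the middle 50% of a dataset. It is the difference between the third quartile (Q3, the 75th percentile) and the first quartile (Q1, the 25th percentile). Because it ignores the lowest 25% and highest 25% of values, it is much less sensitive to outliers than the full range.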
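To make the point about outliers concrete, here is a minimal sketch that compares the full range with the IQR before and after one extreme value is added. The helper name `iqr` and the sample values are made up for illustration, and the quartiles use NumPy's default linear interpolation.

```python
import numpy as np

def iqr(values):
    """Return Q3 - Q1 using np.percentile's default (linear) interpolation."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1

# Hypothetical sample, plus the same sample with one extreme value appended
clean = np.array([10, 12, 13, 15, 16, 18, 20])
with_outlier = np.append(clean, 500)

# Full range (max - min) reacts strongly to the extreme value
clean_range = clean.max() - clean.min()
outlier_range = with_outlier.max() - with_outlier.min()

# IQR only looks at the middle 50%, so it changes far less
clean_iqr = iqr(clean)
outlier_iqr = iqr(with_outlier)

print("range:", clean_range, "->", outlier_range)
print("IQR:  ", clean_iqr, "->", outlier_iqr)
```

In this sketch the single extreme value stretches the range by a large factor, while the IQR moves only slightly.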
This article dives into calculating IQR in Python using the powerful libraries NumPy and Pandas. We'll explore various scenarios, from analyzing a single array to calculating IQR for multiple data frame columns.
Example 1: Unpacking the IQR for a Single Array
Let's calculate the IQR for this sample dataset:
import numpy as np
data = np.array([14, 19, 20, 22, 24, 26, 27, 30, 30, 31, 36, 38, 44, 47])
Here's the code that calculates and displays the IQR:
# Find the 1st quartile (Q1, 25th percentile) and 3rd quartile (Q3, 75th percentile)
q1, q3 = np.percentile(data, [25, 75])
# Calculate IQR (Q3 - Q1)
iqr = q3 - q1
# Print the IQR
print(iqr)
This code outputs:
12.25
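The same calculation carries over to Pandas when you need the IQR of several data frame columns at once. The sketch below is one possible approach, not the only one: it uses `DataFrame.quantile`, which by default applies the same linear interpolation as `np.percentile`. The column names `points` and `rank` are hypothetical. The first column reuses the array from this example and the second is made-up data.

```python
import pandas as pd

# Hypothetical data frame: "points" reuses the sample array above,
# "rank" is invented purely for illustration
df = pd.DataFrame({
    "points": [14, 19, 20, 22, 24, 26, 27, 30, 30, 31, 36, 38, 44, 47],
    "rank": list(range(1, 15)),
})

# quantile() works column by column and returns one value per column
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)

# Subtracting the two Series gives the IQR of every column
iqr_by_column = q3 - q1

print(iqr_by_column)
```

For the `points` column this reproduces the 12.25 computed with NumPy, since both libraries use linear interpolation between data points by default.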
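Key Takeaways:
The IQR is Q3 minus Q1, so it describes the spread of the middle 50% of the data.
np.percentile(data, [25, 75]) returns both quartiles in a single call, and subtracting them gives the IQR (12.25 for the sample array above).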
By mastering IQR calculations in Python, you can gain deeper insights into the distribution of your data!