
Normal Distribution

Chandra Girish S

Tech Leader & Blogger | AI Evangelist | Author of 'ebasiq by Girish' Substack | Author of 'Ganitham Guru' - Math Specialist | Enterprise Architect

Published: October 11, 2024

What is Normal Distribution?

A normal distribution (also called a Gaussian distribution) is a continuous probability distribution for a real-valued random variable. Its graph is bell-shaped and symmetric about the mean. A normal distribution is completely described by two parameters:

Mean (μ): Determines the center of the distribution.
Standard Deviation (σ): Determines the spread or width of the distribution.
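The bell shape comes from the normal probability density function, $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$, where μ is the mean and σ is the standard deviation. The sketch below evaluates that formula using only Python's standard math module. The helper name normal_pdf is hypothetical, and the values μ = 50 and σ = 10 are borrowed from Example 1 purely for illustration.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density at x for mean mu and standard deviation sigma (hypothetical helper)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# The density is highest at the mean and symmetric around it.
peak = normal_pdf(50, mu=50, sigma=10)
left = normal_pdf(40, mu=50, sigma=10)   # one standard deviation below the mean
right = normal_pdf(60, mu=50, sigma=10)  # one standard deviation above the mean
print(f"peak={peak:.5f}, left={left:.5f}, right={right:.5f}")
```

Points one standard deviation either side of the mean get the same density, which is the symmetry described above. On Python 3.8+, statistics.NormalDist(mu, sigma).pdf(x) from the standard library should return the same values.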

Why is it Used?

The normal distribution is widely used in statistics because many natural phenomena and measurement outcomes are approximately normally distributed. It underpins methods for drawing inferences about a population from sample data, which makes it central to hypothesis testing, confidence intervals, and many other statistical analyses.

Real-Life Examples of Normal Distribution

Height of People: Heights in a population tend to follow a normal distribution where most people fall near the average height, and fewer people are either much shorter or much taller.
IQ Scores: IQ scores are designed to be normally distributed with a mean of 100 and a standard deviation of 15.
Measurement Errors: Measurement errors in experiments often follow a normal distribution.
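Once a quantity is modelled as normal, probabilities follow directly from its two parameters. Here is a minimal sketch using scipy.stats.norm, assuming IQ scores follow a normal distribution with mean 100 and standard deviation 15 exactly as stated above. Real test norms are only approximately normal, so the numbers are illustrative.

```python
from scipy import stats

# Model IQ scores as a normal distribution with mean 100 and standard deviation 15.
iq = stats.norm(loc=100, scale=15)

# Survival function sf(x) = 1 - cdf(x): P(IQ > 130), i.e. two standard deviations above the mean.
p_above_130 = iq.sf(130)

# Percent point function (inverse CDF): the score below which 99% of people fall.
score_99th = iq.ppf(0.99)

print(f"P(IQ > 130) = {p_above_130:.4f}")
print(f"99th percentile IQ = {score_99th:.1f}")
```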

How Mean and Standard Deviation Are Used in a Normal Distribution

Mean: The average value around which the data is centered.
Standard Deviation: A measure of how spread out the data is. It quantifies the dispersion of the dataset.
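In practice the mean slides the whole curve along the x-axis, while the standard deviation stretches or compresses it. A small sketch with scipy.stats.norm makes this concrete by computing the interval that holds the middle 50% of values (the quartiles) for three hypothetical parameter settings:

```python
from scipy import stats

# Hypothetical (mean, standard deviation) pairs chosen for illustration.
settings = [(50, 10), (70, 10), (50, 20)]

intervals = {}
for mu, sigma in settings:
    # First and third quartiles of Normal(mu, sigma)
    q1, q3 = stats.norm.ppf([0.25, 0.75], loc=mu, scale=sigma)
    intervals[(mu, sigma)] = (q1, q3)
    # The interval is centred on mu, and its width grows in proportion to sigma.
    print(f"mean={mu}, sd={sigma}: middle 50% in [{q1:.2f}, {q3:.2f}], width {q3 - q1:.2f}")
```

Moving the mean from 50 to 70 shifts the interval without changing its width, while doubling the standard deviation from 10 to 20 doubles the width.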

What is Meant by 1, 2, 3 Standard Deviations from the Mean?

1 Standard Deviation: Roughly 68% of the data falls within one standard deviation of the mean.
2 Standard Deviations: About 95% of the data falls within two standard deviations.
3 Standard Deviations: Approximately 99.7% of the data falls within three standard deviations.

This is often referred to as the 68-95-99.7 Rule.
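The three percentages can be checked against the standard normal cumulative distribution function (CDF) Φ: the probability of falling within k standard deviations of the mean is Φ(k) - Φ(-k). Because any normal variable can be standardized, this result does not depend on the particular mean or standard deviation. Here is a short sketch using scipy.stats.norm:

```python
from scipy import stats

coverage = {}
for k in (1, 2, 3):
    # P(-k <= Z <= k) for a standard normal Z (mean 0, standard deviation 1)
    coverage[k] = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"Within {k} SD: {coverage[k]:.4f}")
# Within 1 SD: 0.6827
# Within 2 SD: 0.9545
# Within 3 SD: 0.9973
```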

Python Example 1: Calculating and Visualizing Normal Distribution

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Generate some data that follows a normal distribution
mean = 50
std_dev = 10
data = np.random.normal(mean, std_dev, 1000)

# Plot the histogram of the data
plt.hist(data, bins=30, density=True, alpha=0.6, color='g')

# Plot the normal distribution curve
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = stats.norm.pdf(x, mean, std_dev)
plt.plot(x, p, 'k', linewidth=2)
plt.title('Normal Distribution (Mean: 50, Std Dev: 10)')
plt.show()

In this example:

The data is generated from a normal distribution with a mean of 50 and a standard deviation of 10.
A histogram is plotted to show how the generated data is distributed.
The theoretical normal distribution curve (the probability density function) is overlaid on the histogram for comparison.

Code Explanation:

numpy (np): A fundamental package for scientific computing in Python. It provides functions to work with arrays and generate random data.
matplotlib.pyplot (plt): A plotting library in Python used for creating static, interactive, and animated visualizations. Here, it’s used to create a histogram and plot a curve.
scipy.stats (stats): A module within scipy that contains functions for statistical operations, including working with probability distributions.
mean = 50: The mean (or expected value) of the normal distribution is set to 50.
std_dev = 10: The standard deviation of the normal distribution is set to 10.
np.random.normal(mean, std_dev, 1000): This function generates 1000 random data points that follow a normal distribution with the specified mean and std_dev. The result is stored in data.
data: The dataset that we generated from the normal distribution.
bins=30: The number of bins (or intervals) used to group the data in the histogram.
density=True: Normalizes the histogram so that the area under the histogram is equal to 1, making it comparable to the probability density function (PDF).
alpha=0.6: Adjusts the transparency of the bars, making them semi-transparent.
color='g': Sets the color of the histogram bars to green (g).
xmin, xmax = plt.xlim(): Retrieves the current x-axis limits of the plot, so the curve will fit within the histogram’s range.
x = np.linspace(xmin, xmax, 100): Generates 100 evenly spaced points between xmin and xmax to plot the curve smoothly.
stats.norm.pdf(x, mean, std_dev): Computes the probability density function (PDF) of the normal distribution for the generated x values, given the mean and std_dev. The result, p, is the height of the curve at each point in x.
plt.plot(x, p, 'k', linewidth=2): Plots the normal distribution curve on the same axes as the histogram. 'k' sets the line color to black, and linewidth=2 sets the line thickness to 2.


plt.title('Normal Distribution (Mean: 50, Std Dev: 10)'): Adds a title to the plot, indicating the parameters of the normal distribution (mean and standard deviation).
plt.show(): Displays the plot with both the histogram and the normal distribution curve.
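The 68-95-99.7 rule stated earlier can also be checked empirically on generated data like Example 1's. The sketch below swaps in NumPy's newer Generator API (np.random.default_rng) and uses a hypothetical seed of 42 and a larger sample of 100,000 points so the fractions are stable; both choices are assumptions, not part of the original example.

```python
import numpy as np

# Hypothetical seed and sample size: a fixed seed makes the run reproducible,
# and a large sample keeps the observed fractions close to the theory.
rng = np.random.default_rng(seed=42)
mean, std_dev = 50, 10
data = rng.normal(mean, std_dev, 100_000)

sample_mean = data.mean()
sample_std = data.std()

fractions = {}
for k in (1, 2, 3):
    # Boolean mask of points within k sample standard deviations of the sample mean
    within = np.abs(data - sample_mean) <= k * sample_std
    fractions[k] = within.mean()  # mean of a boolean array = fraction of True values
    print(f"Within {k} SD: {fractions[k]:.3f}")
```

With 100,000 points the observed fractions should land close to 68%, 95% and 99.7%. With Example 1's 1,000 points they will vary more from run to run.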

Python Example 2: Z-Score Calculation for a Normal Distribution

A Z-score tells you how many standard deviations a data point lies from the mean.

import numpy as np

# Data: Heights of people in cm
heights = [160, 165, 170, 155, 175, 180, 185, 160, 162, 167]

# Calculate mean and standard deviation
mean_height = np.mean(heights)
std_dev_height = np.std(heights)

# Calculate Z-scores (how many standard deviations each height is from the mean)
z_scores = [(x - mean_height) / std_dev_height for x in heights]

# Print the mean, standard deviation, and Z-scores
print(f"Mean Height: {mean_height}")
print(f"Standard Deviation: {std_dev_height}")
print(f"Z-scores: {z_scores}")

Output:
Mean Height: 167.9
Standard Deviation: 9.104394543296111
Z-scores: [-0.8677128349866006, -0.3185274963874867, 0.23065784221162722, -1.4168981735857145, 0.7798431808107411, 1.329028519409855, 1.878213858008969, -0.8677128349866006, -0.648038699546955, -0.09885336094784113]

In this example:

The mean and standard deviation of the heights are calculated.
Z-scores are computed to determine how far each height is from the mean in terms of standard deviations.
Each value in the Z-scores list corresponds to the number of standard deviations that the corresponding height is away from the mean height.

Code Explanation:

The variable heights is a list containing heights of 10 people (in centimeters). This is the raw data for which we will calculate the Z-scores.

np.mean(heights): This function from the numpy library calculates the mean (average) of the list heights. The mean is a measure of central tendency, calculated by summing all the data points and dividing by the number of points.
np.std(heights): This function calculates the standard deviation of the list heights. By default it computes the population standard deviation, dividing by N (ddof=0); pass ddof=1 to get the sample standard deviation, which divides by N - 1. The standard deviation measures the amount of variation or dispersion in a set of data points. A small standard deviation means that the data points are close to the mean, while a large standard deviation indicates that the data points are spread out over a wider range of values.
z_scores = [(x - mean_height) / std_dev_height for x in heights]: This list comprehension calculates the Z-score for each height in the list. A Z-score is a statistical measure that tells you how far (in standard deviations) a data point is from the mean. It is calculated using the following formula:
$Z = \frac{x - \mu}{\sigma}$, where:
x is an individual data point (in this case, the height),
μ is the mean of the data points (here, mean_height),
σ is the standard deviation of the data points (here, std_dev_height).
(x - mean_height): Subtracts the mean from each height to find the distance of that height from the mean.
/ std_dev_height: Divides this distance by the standard deviation to express it in units of standard deviations. The resulting z_scores list contains the Z-scores of all the heights in the original heights list.
The print statements display the calculated mean, standard deviation, and Z-scores in a readable format.
f"Mean Height: {mean_height}": The f-string formatting is used to insert the calculated values into the output text.
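The list comprehension works, but NumPy can compute all the Z-scores in a single vectorized expression. The sketch below reuses the same heights and cross-checks the result against scipy.stats.zscore, which, like np.std, divides by N (ddof=0) by default.

```python
import numpy as np
from scipy import stats

heights = np.array([160, 165, 170, 155, 175, 180, 185, 160, 162, 167])

# Vectorized version of the list comprehension: NumPy applies the
# subtraction and division to every element of the array at once.
z_manual = (heights - heights.mean()) / heights.std()

# scipy.stats.zscore uses ddof=0 by default, matching heights.std() above.
z_scipy = stats.zscore(heights)

print(np.round(z_manual, 3))
print(np.allclose(z_manual, z_scipy))
```

A handy sanity check is that a full set of Z-scores always has mean 0 and (population) standard deviation 1.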

What the Z-Scores Mean

A Z-score of 0 means the height is exactly the same as the mean height.
A positive Z-score means the height is above the mean. For example, a Z-score of 1 means the height is 1 standard deviation above the mean.
A negative Z-score means the height is below the mean. For example, a Z-score of -1 means the height is 1 standard deviation below the mean.
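If the data are roughly normal, a Z-score can also be converted into a percentile, that is, the share of values expected to lie below it, using the standard normal CDF. The sketch below uses scipy.stats.norm.cdf with a few example Z-scores chosen for illustration; with only ten heights, the normality assumption is rough, so treat the percentages as estimates.

```python
from scipy import stats

# Assumption: the underlying data are approximately normal, so the standard
# normal CDF maps each Z-score to the fraction of values expected below it.
percentiles = {}
for z in (-1.0, 0.0, 1.0, 2.0):
    percentiles[z] = stats.norm.cdf(z) * 100
    print(f"Z = {z:+.1f}: about {percentiles[z]:.1f}% of values fall below")
```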

For a detailed explanation of the article, you can watch the accompanying YouTube video here: YouTube Video.

For more in-depth technical insights and articles, feel free to explore:

Technical Blog: Ebasiq Blog
GitHub Code Repository: Python Tutorials
YouTube Channel: Ebasiq YouTube Channel
Instagram: Ebasiq Instagram

