Normal Distribution in Statistics
RAHUL KUMAR
Data engineer with skills in Python, PySpark, SQL, Azure Data Factory, Azure Data Bricks, Azure Data Lake and Azure Synapse Analytics. Created pipelines to ingest data from heterogeneous sources. Also builds Python tools.
The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean: data near the mean occur more frequently than data far from the mean. When graphed, the normal distribution appears as a bell curve.
This distribution arises naturally in many situations. For example, scores on tests like the SAT and GRE roughly follow a bell curve. If such a test were graded by letter, the bulk of students would score the average (C), while smaller numbers of students would score a B or D, and an even smaller percentage would score an F or an A. This creates a distribution that resembles a bell (hence the nickname). The bell curve is symmetrical: half of the data falls to the left of the mean and half falls to the right.
Many kinds of data follow this type of pattern, at least approximately. That is why the normal distribution is widely used in business, statistics and government bodies like the FDA. Common examples include:
- Heights of people.
- Measurement errors.
- Blood pressure.
- Points on a test.
- IQ scores.
- Salaries (only roughly; real salary data is often skewed to the right).
The Empirical Rule for the Normal Distribution
The empirical rule tells you approximately what percentage of normally distributed data falls within a certain number of standard deviations from the mean:
- About 68% of the data falls within one standard deviation of the mean.
- About 95% of the data falls within two standard deviations of the mean.
- About 99.7% of the data falls within three standard deviations of the mean.
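The three percentages above are rounded values. As a minimal sketch, they can be reproduced with the error function from Python's standard library, using the identity that the fraction of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)). The function name `fraction_within` is a hypothetical helper chosen for this illustration, not something from the original article.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean.

    Hypothetical helper for illustration. It relies on the identity
    P(|X - mean| < k * sigma) = erf(k / sqrt(2)), which holds for any
    normal distribution regardless of its mean and standard deviation.
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

The result does not depend on the mean or the standard deviation, which is why the rule applies to every normal distribution.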
The standard deviation (σ) controls the spread of the distribution. A smaller standard deviation indicates that the data is tightly clustered around the mean, so the curve is taller and narrower. A larger standard deviation indicates that the data is spread out around the mean, so the curve is flatter and wider.
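To illustrate how the standard deviation changes the shape, the sketch below compares the height of the curve at its center for a few values of σ, using `statistics.NormalDist` from Python's standard library (available in Python 3.8 and later). The helper `peak_height` and the chosen σ values are assumptions made for this example only.

```python
from statistics import NormalDist


def peak_height(sigma: float, mean: float = 0.0) -> float:
    """Height of the normal curve at its center (hypothetical helper).

    For a normal distribution this equals 1 / (sigma * sqrt(2 * pi)),
    so it shrinks as sigma grows.
    """
    return NormalDist(mu=mean, sigma=sigma).pdf(mean)


# Illustrative sigma values, not taken from the article.
for sigma in (0.5, 1.0, 2.0):
    print(f"sigma={sigma}: peak height {peak_height(sigma):.4f}")
# sigma=0.5: peak height 0.7979
# sigma=1.0: peak height 0.3989
# sigma=2.0: peak height 0.1995
```

Because the total area under the curve stays fixed, doubling σ halves the peak while stretching the curve to twice the width.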
Let’s look at a pizza delivery example. Assume that a pizza restaurant's delivery times are normally distributed with a mean of 30 minutes and a standard deviation of 5 minutes. Using the Empirical Rule, we can determine that about 68% of the delivery times are between 25 and 35 minutes (30 +/- 5), about 95% are between 20 and 40 minutes (30 +/- 2*5), and about 99.7% are between 15 and 45 minutes (30 +/- 3*5).
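The pizza delivery figures can be checked with a short sketch, assuming the delivery times really are normally distributed with the stated mean and standard deviation. It uses `statistics.NormalDist` from Python's standard library; `share_between` is a hypothetical helper name introduced for this example.

```python
from statistics import NormalDist

# Mean and standard deviation (in minutes) from the pizza delivery example.
delivery = NormalDist(mu=30, sigma=5)


def share_between(low: float, high: float) -> float:
    """Share of deliveries expected to take between low and high minutes."""
    return delivery.cdf(high) - delivery.cdf(low)


for k in (1, 2, 3):
    low, high = 30 - k * 5, 30 + k * 5
    print(f"{low}-{high} minutes: {share_between(low, high):.1%}")
# 25-35 minutes: 68.3%
# 20-40 minutes: 95.4%
# 15-45 minutes: 99.7%

# The same model also answers one-sided questions, for example the
# share of deliveries that take longer than 40 minutes.
late_share = 1 - delivery.cdf(40)
print(f"longer than 40 minutes: {late_share:.1%}")
# longer than 40 minutes: 2.3%
```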
Properties of a Normal Distribution
- The mean, mode and median are all equal.
- The curve is symmetric about the center (i.e. around the mean).
- Exactly half of the values are to the left of center and exactly half the values are to the right.
- The total area under the curve is 1.
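The properties listed above can be checked numerically. The sketch below does so for one example distribution using `statistics.NormalDist` from Python's standard library. The parameters (mean 30, standard deviation 5), the offset of 7 and the integration range are arbitrary choices for illustration, and the area is only approximated with a simple midpoint sum.

```python
import math
from statistics import NormalDist

# Example parameters, chosen arbitrarily for this check.
dist = NormalDist(mu=30, sigma=5)

# Mean, median and mode coincide.
print(dist.mean, dist.median, dist.mode)

# Symmetry: the curve has the same height at equal distances
# on either side of the mean.
offset = 7
is_symmetric = math.isclose(
    dist.pdf(dist.mean - offset), dist.pdf(dist.mean + offset)
)

# Half of the probability lies to the left of the center.
left_half = dist.cdf(dist.mean)

# Total area under the curve, approximated by a midpoint Riemann sum
# over mean ± 8 standard deviations (the tails beyond are negligible).
steps = 10_000
low = dist.mean - 8 * dist.stdev
width = 16 * dist.stdev / steps
area = sum(dist.pdf(low + (i + 0.5) * width) for i in range(steps)) * width

print(is_symmetric, left_half, round(area, 6))
```

With these inputs the three location measures should print as the same value, the symmetry check should be true, and the left half and total area should come out very close to 0.5 and 1.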
The Standard Normal Model
A standard normal model (also called the standard normal distribution) is a normal distribution with a mean of 0 and a standard deviation (σ) of 1.
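Any normal distribution can be mapped onto the standard normal model by converting values to z-scores, z = (x - mean) / standard deviation. This conversion is standard practice but is not spelled out in the text above, so treat the following as a minimal sketch. It reuses the pizza delivery numbers as example inputs, and `z_score` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

standard = NormalDist()  # defaults to mean 0 and standard deviation 1


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd


# Example inputs: a 40-minute delivery when the mean is 30 and the
# standard deviation is 5 minutes.
z = z_score(40, mean=30, sd=5)
print(z)  # 2.0

# The probability of a value at or below x is the same whether it is
# computed on the original scale or on the standard normal scale.
p_original = NormalDist(mu=30, sigma=5).cdf(40)
p_standard = standard.cdf(z)
print(math.isclose(p_original, p_standard))  # True
print(f"{p_standard:.4f}")  # 0.9772
```

Working on the standard scale means a single table or function for the standard normal model can serve every normal distribution.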