Population vs sample
Jesper Martinsson
From Oceans to Dashboards: Marine Ecologist | Data Wrangler | BI Leader
Population and sample are two fundamental concepts of statistical theory. In every statistical test, you deal with at least one population and an associated sample.
Before you even think about collecting data, you need to define the population(s) involved in the test. A population is the group you want to make generalizations about. You want to make some sort of statement about this group, such as: “Oak trees are on average x meters tall”. When performing a statistical test you often want to see whether there is a difference between two potential populations, such as: “Oak trees in area x are on average taller than oak trees in area y”. If the test detects no difference, you have no evidence that the two areas are separate populations, and the heights may well belong to one and the same population. Either way, you need to be sure about which group you are actually making generalizations about.
In practice it is impossible to gather information about the heights of all oak trees in the world, or even in Sweden. Therefore you need to collect a subset of all the heights in the population. This is called a sample. The sample must be drawn at random from the population; otherwise it does not really represent it. Let’s say you are interested in describing the height of all oak trees in the world. But since you don't like to fly, you only collect random samples in nearby countries that you can reach by train. The problem with this study is that the samples are not randomly drawn from the population of all oak trees in the world. They in fact represent the heights of oak trees in a small part of Scandinavia.
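To make the train example concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the heights are simulated rather than measured, and the two regions ("nearby" and "far away"), their mean heights, and the sample size are hypothetical. It only shows how a sample that is random within a subset can still miss the population it is supposed to describe.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: simulated oak heights (in meters) in two regions.
# These numbers are made up for illustration, not real measurements.
nearby = [rng.gauss(20, 3) for _ in range(5000)]
far_away = [rng.gauss(30, 3) for _ in range(5000)]
population = nearby + far_away

# Simple random sample: every tree in the population can be drawn.
random_sample = rng.sample(population, 200)

# "Train" sample: drawn at random, but only from the nearby region.
train_sample = rng.sample(nearby, 200)

population_mean = mean(population)
random_mean = mean(random_sample)
train_mean = mean(train_sample)

print(f"population mean:    {population_mean:.1f} m")
print(f"random sample mean: {random_mean:.1f} m")
print(f"train sample mean:  {train_mean:.1f} m")
```

In this simulation the mean of the random sample should land close to the population mean, while the mean of the train sample describes only the nearby trees.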
When you are working with a dataset, it is important to know whether you are dealing with a population or a sample. In most cases the data comes from a sample, but sometimes it is actually possible to collect data from the entire population. The equations used to describe a population differ depending on whether you have observations from every unit in the population or only from a random sample.
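A common example of such a difference is the variance, although the article does not name a specific equation. As a sketch, Python's standard `statistics` module has one function for each case: `pvariance` divides the sum of squared deviations by n (all units observed), while `variance` divides by n - 1 (a sample used to estimate the population). The five heights below are hypothetical.

```python
from statistics import pvariance, variance

# Hypothetical oak heights in meters (made up for illustration).
heights = [18.0, 21.5, 19.0, 24.5, 22.0]

# If these five trees ARE the whole population: divide by n.
population_variance = pvariance(heights)

# If they are a random sample from a larger population: divide by n - 1.
sample_variance = variance(heights)

print(f"population variance (n):     {population_variance:.3f}")
print(f"sample variance     (n - 1): {sample_variance:.3f}")
```

With these numbers the sum of squared deviations from the mean (21 m) is 26.5, so the two results are 26.5 / 5 = 5.3 and 26.5 / 4 = 6.625. The same data give a different answer depending on whether you treat them as a population or as a sample.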
To wrap up: