Population vs sample

Population and sample are two fundamental concepts of statistical theory. In every statistical test, you deal with at least one population and an associated sample.

Before even thinking about collecting data, you need to define the population(s) involved in the test. A population is the group you want to make generalizations about. You want to make some sort of statement about this group, such as: “Oak trees are on average x meters tall”. When performing a statistical test, you often want to see whether there is a difference between two potential populations, such as: “Oak trees in area x are on average taller than oak trees in area y”. If the test detects no difference, you have no evidence that the two areas are separate populations, and the heights are consistent with all the oak trees belonging to one population. Note that this is not proof that they are the same. Either way, you need to be sure which group you are actually making generalizations about.
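
To make the comparison of two potential populations concrete, here is a minimal sketch of a permutation test on the difference in mean height. The heights, the area names and the helper `permutation_p_value` are all made up for illustration, and a permutation test is only one of several tests you could use here.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Made-up oak heights in meters (illustrative data, not real measurements).
area_x = [21.3, 19.8, 22.5, 20.1, 23.0, 21.7, 20.9, 22.2]
area_y = [19.5, 20.4, 18.9, 21.0, 19.2, 20.7, 18.6, 19.9]

observed = statistics.fmean(area_x) - statistics.fmean(area_y)


def permutation_p_value(a, b, n_permutations=10_000):
    """One-sided p-value: the share of random relabelings whose
    mean difference is at least as large as the observed one."""
    pooled = a + b  # new list, so the inputs are left untouched
    obs = statistics.fmean(a) - statistics.fmean(b)
    count = 0
    for _ in range(n_permutations):
        random.shuffle(pooled)
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        if diff >= obs:
            count += 1
    return count / n_permutations


p_value = permutation_p_value(area_x, area_y)
print(f"observed difference: {observed:.2f} m, p-value: {p_value:.4f}")
```

A small p-value suggests the two areas are better treated as separate populations. A large p-value only means the data are compatible with one common population; it is not the probability that the populations are the same.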

In practice it is impossible to gather information about the heights of all oak trees in the world, or even in Sweden. Therefore you need to collect a subset of all the heights in the population. This is called a sample. The sample should be a random subset of the population: when every tree has the same chance of being selected, the sample is not systematically skewed toward any part of the population and can be taken to represent it. Let’s say you are interested in describing the height of all oak trees in the world. But, since you don't like to fly, you only collect random samples in nearby countries that you can reach by train. The problem with this study is that the samples are not randomly drawn from the population of all oak trees in the world. The samples in fact represent the heights of oak trees in a small part of Scandinavia.
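
The effect of sampling only where the train goes can be sketched with a small simulation. Everything here is an assumption made for illustration: the population size, the two regions, and the idea that nearby oaks are shorter on average than distant ones.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 oak heights in meters.
# Assumption for illustration: nearby trees are shorter on average.
nearby = [random.gauss(18, 3) for _ in range(2000)]
distant = [random.gauss(24, 3) for _ in range(8000)]
population = nearby + distant

# Random sample: every tree in the population is equally likely to be picked.
random_sample = random.sample(population, 200)

# Convenience sample: random, but only among trees reachable by train.
convenience_sample = random.sample(nearby, 200)

population_mean = statistics.fmean(population)
random_mean = statistics.fmean(random_sample)
convenience_mean = statistics.fmean(convenience_sample)

print(f"population mean:         {population_mean:.2f} m")
print(f"random sample mean:      {random_mean:.2f} m")
print(f"convenience sample mean: {convenience_mean:.2f} m")
```

Under these assumptions the random sample lands close to the population mean, while the convenience sample describes the nearby trees only, however carefully it was drawn within that region.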

When you are working with a dataset, it is important to know whether you are dealing with a population or a sample. In most cases the data is from a sample, but sometimes it is actually possible to collect data from the entire population. The equations used to describe the data differ depending on whether you have observations from all the units in the population or from a random sample. The variance, for example, is divided by N when you have the whole population but by n − 1 when you have a sample.
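
As a small sketch of how the equations differ, the variance of the same numbers can be computed both ways. The five heights are made up for illustration; `statistics.pvariance` and `statistics.variance` are the Python standard library functions for the population and sample versions.

```python
import math
import statistics

# Made-up oak heights in meters (illustrative data only).
heights = [17.2, 19.5, 21.0, 18.4, 22.9]

n = len(heights)
mean = sum(heights) / n
sum_of_squares = sum((h - mean) ** 2 for h in heights)

# If the five trees ARE the whole population: divide by N.
population_variance = sum_of_squares / n

# If the five trees are a sample from a larger population: divide by n - 1.
sample_variance = sum_of_squares / (n - 1)

# The standard library implements both versions.
assert math.isclose(population_variance, statistics.pvariance(heights))
assert math.isclose(sample_variance, statistics.variance(heights))

print(f"population variance: {population_variance:.3f}")
print(f"sample variance:     {sample_variance:.3f}")
```

The sample version is always slightly larger, and the gap shrinks as the number of observations grows.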

To wrap up:

  • Be sure to define the population that you want to make generalizations about.
  • The sample needs to be representative of the population, which in practice means drawing it at random from that population.
  • Be sure that you know whether your data is from the entire population or a sample.


More articles by Jesper Martinsson

  • Standard error

    I believe the standard error is one of the most confusing concepts for those that are new in statistics. That is my…

  • The normal distribution

    The normal distribution has distinct characteristics that form the foundation for parametric statistical tests…

  • How to describe a statistical population using R - Part 2: Distribution

    Besides Location and variability you can also use the distribution as a way to describe your data. Frequencies and…

  • How to describe a statistical population using R - Part 1: Location and variability

    Measures of location and variability play a fundamental role in describing a statistical population. They are equally…

  • Hypothesis testing

    Hypothesis testing is, according to my opinion, analogous to the scientific method. It follows a logical structure that…

  • Variables and scale

    Data used in research and statistical tests can be obtained by measuring stuff directly (such as height), collecting…
