Sampling Distributions

#sampling #distributions #parameters

p = population proportion

m = mean of population distribution

s = standard deviation of population distribution

The population distribution is the distribution of a variable of interest in the population. When the variable of interest is binary, measuring whether one out of two possible outcomes has occurred, the population distribution is described by the proportions p and 1-p of outcomes in these two categories. p is called the population proportion. When the variable of interest is quantitative, we summarize its population distribution by the population mean m and the population standard deviation s.

The population proportion p, or the population mean m and the population standard deviation s, are called population parameters and are typically unknown. By taking a sample from the population, we try to estimate these unknown population parameters.

The data distribution is the distribution of the variable of interest in a sample of size n taken from the population. From it, we compute sample statistics, such as the sample proportion or the sample mean, that estimate the corresponding population parameters.

The sampling distribution is the distribution that describes all potential values of a sample statistic that we might observe when taking a random sample of size n from the population. It is the key for telling us how close a sample statistic falls to the unknown population parameter we would like to estimate. We will use the sampling distribution to conduct inference, such as computing a margin of error or a range of plausible values for the unknown population parameter.

The sampling distribution of the sample proportion has mean equal to the population proportion p and standard deviation equal to sqrt(p(1-p)/n).

The sampling distribution of the sample mean x̄ has mean equal to the population mean m and standard deviation equal to s/sqrt(n).
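The two standard deviation formulas can be checked with a short simulation. The sketch below is only an illustration: the function names, the values p = 0.3 and n = 100, the number of repetitions, and the random seed are all assumptions chosen for the example, not part of the article.

```python
import math
import random
import statistics


def sd_sample_proportion(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


def sd_sample_mean(s, n):
    """Standard deviation of the sample mean: s/sqrt(n)."""
    return s / math.sqrt(n)


def simulate_sample_proportions(p, n, reps, rng):
    """Draw `reps` random samples of size n from a binary population
    with proportion p and return the sample proportion of each."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


rng = random.Random(0)   # fixed seed so the run is repeatable
p, n = 0.3, 100          # hypothetical population proportion and sample size

props = simulate_sample_proportions(p, n, 2000, rng)
theory_sd = sd_sample_proportion(p, n)
sim_mean = statistics.mean(props)
sim_sd = statistics.pstdev(props)

print(f"formula sd:   {theory_sd:.4f}")
print(f"simulated sd: {sim_sd:.4f}")
print(f"simulated mean of sample proportions: {sim_mean:.4f} (p = {p})")
```

The simulated mean and standard deviation of the 2000 sample proportions should land close to p and to the formula value, though they will not match exactly because the simulation is itself random.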

The Central Limit Theorem (CLT) states that for random samples of sufficiently large size (at least about 30 is usually enough), the sampling distribution of the sample mean is approximately normal. This theorem holds no matter what the shape of the population distribution is.
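To see the theorem at work, one can sample repeatedly from a strongly skewed population and look at how the sample means behave. The sketch below assumes an exponential population with mean 1 and standard deviation 1, a sample size of 30, and 5000 repetitions; these choices, the seed, and the variable names are illustrative assumptions. If the sampling distribution is roughly normal, about 95% of the sample means should fall within 1.96 standard deviations of the population mean.

```python
import math
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable
m, s = 1.0, 1.0          # mean and sd of an exponential population with rate 1
n, reps = 30, 5000       # hypothetical sample size and number of samples

# Each entry is the mean of one random sample of size n
sample_means = [
    statistics.mean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

sd_formula = s / math.sqrt(n)                  # s/sqrt(n)
mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

# Share of sample means within 1.96 standard deviations of m
within = sum(abs(x - m) <= 1.96 * sd_formula for x in sample_means) / reps

print(f"mean of sample means: {mean_of_means:.3f} (population mean {m})")
print(f"sd of sample means:   {sd_of_means:.3f} (formula {sd_formula:.3f})")
print(f"share within 1.96 sd: {within:.3f}")
```

The population here is far from bell-shaped, yet the share of sample means inside the 1.96 band should come out near 0.95, which is what a normal sampling distribution would predict.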

The sampling distribution of the sample proportion is also approximately normal whenever n is large enough so that both np and n(1-p) are at least 15.
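The rule of thumb for proportions is easy to turn into a quick check. The helper below is a hypothetical name introduced for illustration; the threshold of 15 is the one stated above.

```python
def normal_approx_ok(p, n, threshold=15):
    """Return True when both np and n(1-p) are at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold


# p = 0.5, n = 30: np = 15 and n(1-p) = 15, so the condition is met
print(normal_approx_ok(0.5, 30))
# p = 0.1, n = 100: np is only about 10, so the condition fails
print(normal_approx_ok(0.1, 100))
```

The check shows why proportions close to 0 or 1 need much larger samples than proportions near 0.5 before the normal approximation is reasonable.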

The approximate normality of statistics involving the sample mean or the sample proportion is the reason the normal distribution plays such an important role in statistics.

Other statistics, such as the sample median or the sample correlation coefficient, have sampling distributions too. But unlike for the sample mean or the sample proportion, there is no mathematical theorem like the CLT that can be used to predict the sampling distribution, or its standard deviation, for these statistics without making further assumptions about the population distribution.

The bootstrap is a way to approximate the sampling distribution of many statistics through resampling. A bootstrap resample is a sample taken from the original observations, with replacement and of the same size as the original sample.

The bootstrap provides estimates of key quantities such as the standard deviation of the sampling distribution directly, without the use of formulas.
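As a rough sketch of the idea, the code below bootstraps the sample median. The function name `bootstrap_statistics`, the made-up sample of 50 observations, the 2000 resamples, and the seed are all assumptions for illustration; in practice the sample would be the observed data.

```python
import random
import statistics


def bootstrap_statistics(data, stat, reps, rng):
    """Compute `stat` on `reps` bootstrap resamples of `data`.

    Each resample is drawn from the original observations with
    replacement and has the same size as the original sample.
    """
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(reps)]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical sample of 50 observations standing in for real data
sample = [rng.expovariate(1.0) for _ in range(50)]

boot_medians = bootstrap_statistics(sample, statistics.median, 2000, rng)

# The sd of the bootstrap medians estimates the standard deviation
# of the sampling distribution of the sample median
boot_se = statistics.stdev(boot_medians)

print(f"sample median: {statistics.median(sample):.3f}")
print(f"bootstrap estimate of its standard deviation: {boot_se:.3f}")
```

Passing a different function as `stat` (for example `statistics.mean`) gives the same kind of estimate for another statistic, with no formula required.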
