Sample size evaluation
Dr. Jacob Mack PhD
Experienced academic researcher, educator, cybersecurity specialist, and quantitative scientist.
The ability to accurately and effectively determine the sample size needed, the most appropriate sampling method, and the effect size plays an integral role in generating accurate and valuable statistical information from a given study (Zikmund et al., 2012). Sample sizes that are too small risk missing statistically significant relationships, while sample sizes that are too large can flag trivial effects as significant, producing artifacts in the data analysis (Lakens, 2017). Power analyses need to be well-founded, accurate, and valuable (Mysiak, 2020). Sampling methods that are not random or pseudorandom are prone to introducing bias into the generated samples and are less likely to represent the population parameters (Mysiak, 2020). Poorly constructed groups can also interfere with effect size calculations when comparing two groups (Mysiak, 2020).
In this paper, the concepts of and calculations for power, effect size, statistical significance, and probability are discussed in terms of the null and alternative hypotheses. Several sampling methods are examined within the context of stratified, simple random, and systematic sample designs. The statistical software G*Power is applied to a sample size calculation for a one-tailed t-test with two independent groups of equal size, a small effect size, an alpha of .05, a beta of .2, and a required sample size larger than is obtainable. Next, I apply a sample size calculation using G*Power within the context of a one-way fixed-effects ANOVA with a small effect size, an alpha of .05, a beta of .2, three groups, and, again, a required sample size that is unobtainable. The goal is to discuss the smallest meaningful effect size given a sample smaller than the optimal size. The appropriate sampling technique for the research is explored within the paper, and its limitations are discussed. While statistical significance is valid, it is insufficient on its own to evaluate the results of research and statistical inference. Power calculations and effect size estimation help to better evaluate the probability of finding an accurate result.
Power Calculations, Sample Size Calculations, and Effect Size Overview
Power calculations are necessary to determine the likelihood that an effect within a study is due to an intervention or group difference as opposed to just chance (Faul et al., 2007). Statistical power, also known as sensitivity, indicates the probability that the rejection of the null hypothesis is correct (Mysiak, 2020). For example, a power of 0.90, or 90%, indicates a 90% chance of detecting a true effect as significant (Serdar et al., 2021). Higher statistical power indicates a more trustworthy study or series of tests, while lower sensitivity indicates a higher probability of making a type II error (Zikmund et al., 2012). As a point of context, a type I error occurs when the null hypothesis is wrongly rejected, and a type II error is a failure to reject the null hypothesis when the null hypothesis is false (Mysiak, 2020).
Even when the null hypothesis is rejected, however, this does not guarantee the effect is due to what the researchers think drives the alternative hypothesis (Lakens, 2020). In some cases, the null hypothesis is found to hold when other researchers attempt to replicate results (Mysiak, 2020; Zikmund et al., 2012). Hypothesis testing is carried out in terms of probabilities and statistical inference, and no single study proves either H0 or H1 conclusively (Lakens, 2017; Lakens, 2020; Mysiak, 2020). Power calculations are used to derive the minimum sample sizes or trials needed to obtain a significant result, and the steps typically proceed in the following order:
1.) Specify a research hypothesis and concepts.
2.) Determine the significance level necessary.
3.) Find the smallest acceptable effect size.
4.) Estimate all salient parameter values.
5.) Determine the intended power for the test, along with the software to be used.
6.) Evaluate the confidence interval and margin of error in the results (Acheson, 2010; Lakens, 2017; Mysiak, 2020).
A power test is useful because it helps avoid samples so small that they miss significant results that exist, and it helps avoid sample sizes so large that they begin to depict small effects as significant when they are not (Zikmund et al., 2012). Succinctly, the statistical power of a hypothesis test is the probability of detecting an effect, if a true effect exists, within the minimally necessary sample size of elements or participants (Mysiak, 2020).
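As a rough illustration of steps 2 through 5 above, the sketch below uses Python's statsmodels package (an assumed tool choice, not one used in this paper) to solve for the per-group sample size of a two-sided independent-samples t-test; the effect size, alpha, target power, and the n of 50 in the follow-up check are hypothetical values chosen purely for illustration.

```python
# Minimal power-analysis sketch with statsmodels; every numeric input below is
# a hypothetical value chosen only to illustrate the steps listed above.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

effect_size = 0.5    # smallest effect size of interest (Cohen's d, assumed)
alpha = 0.05         # significance level
target_power = 0.80  # intended power (1 - beta)

# Solve for the per-group sample size needed to reach the target power
# in a two-sided independent-samples t-test.
n_per_group = analysis.solve_power(effect_size=effect_size,
                                   alpha=alpha,
                                   power=target_power,
                                   ratio=1.0,
                                   alternative='two-sided')
print(f"Required n per group: {n_per_group:.1f}")   # about 64 after rounding up

# Conversely, check the power actually achieved with a fixed (smaller) sample.
achieved = analysis.power(effect_size=effect_size,
                          nobs1=50,
                          alpha=alpha,
                          ratio=1.0,
                          alternative='two-sided')
print(f"Power with n = 50 per group: {achieved:.2f}")  # about 0.70
```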
Hypothesis Testing, P Values, and Significance
Hypothesis testing within statistics starts with the assumption of a null hypothesis; in a t-test, for example, the null hypothesis is that there is no difference between the means of two populations (Delacre et al., 2019). The probability of the observed result, given that H0 is true, is expressed via the p-value (Mysiak, 2020). Put another way, consider a bell-shaped Gaussian curve with two tails, where the p-value represents the probability of obtaining an outcome equal to or more extreme than what was observed in the data (Lakens, 2020). The smaller a p-value, the stronger the initial evidence for rejecting the null hypothesis, based upon how alpha is set (Zikmund et al., 2012). Alpha is the significance level, and it represents the probability of making an error when the null hypothesis is true (Mysiak, 2020). Alpha levels usually take the form of 0.05, 0.01, or 0.10. If the p-value is less than or equal to alpha, the null hypothesis is rejected; if the p-value is greater than alpha, the null hypothesis is not rejected (Mysiak, 2020). The p-value is not a guarantee that the outcome is significant beyond chance or some hidden variable, and the full context of the research design, purpose, and statistical tests must be taken into consideration (Lakens, 2017; Lakens, 2020; Mysiak, 2020; Zikmund et al., 2012).
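To make the decision rule concrete, the brief Python sketch below simulates two hypothetical score distributions, runs an independent-samples t-test with scipy, and compares the resulting p-value to alpha; the group means, standard deviation, and sample sizes are invented for illustration only.

```python
# Toy illustration of the p-value decision rule with simulated (not real) data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=100.0, scale=15.0, size=40)   # hypothetical control scores
group_b = rng.normal(loc=108.0, scale=15.0, size=40)   # hypothetical treatment scores

alpha = 0.05
result = stats.ttest_ind(group_a, group_b)             # two-sided independent t-test

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
if result.pvalue <= alpha:
    print("p <= alpha: reject H0 (evidence of a difference in means)")
else:
    print("p > alpha: fail to reject H0")
```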
Stratified Sampling Overview
The general purpose of a stratified sample is to generate homogeneous subsamples separated by characteristics such as gender, race, height, weight, and age (Zikmund et al., 2012). Strata is the term for such homogeneous groups, and each stratum is then sampled via a probability sampling method such as clustering or simple random sampling to better ensure representation of diverse characteristics drawn from a population (Zikmund et al., 2012). For 75 doctors, 75 lawyers, and 75 engineers, this could work if there is diversity in gender, race, age, ethnicity, and the other characteristics the researchers are looking for within this subsampling of the population (Zikmund et al., 2012). However, if there is little diversity, for example if all are men, all are white, or all belong to the same club, then this technique could be flawed (Mysiak, 2020). In general, though, N1 doctors, N2 lawyers, and N3 engineers can be selected as individual categories, each with its own sample size, and then subdivided by characteristics to estimate EQ or IQ from traits found among individuals within each profession (Zikmund et al., 2012). The next section discusses how to assign samples and calculate the necessary sample size.
Sample Calculations
Stratified Sample Assignment and Sample Size Calculations
In general, to apply a stratified sampling method one must define the population, choose the relevant stratification variables, list and label the population elements in accordance with the chosen strata, choose the sample size, calculate the allocation to each stratum, and apply either simple random or systematic sampling within each stratum (Zikmund et al., 2012).
Specific Methods and Details of Sample Size Calculations for Stratification
There are several approaches to allocating samples to strata within stratified samples, including proportionate stratification and disproportionate stratification as two examples (Zikmund et al., 2012). The proportionate stratification method takes the form n_h = (N_h / N) * n, which means the sample size allocated to each stratum is proportional to that stratum's share of the population (Zikmund et al., 2012). Within the equation, n_h represents the sample size for the given stratum h, N_h is the population size of that stratum, N is the total population size, and n is the total sample size. Disproportionate stratification can offer lower cost and greater precision than the proportionate method, and it is applied via a series of steps (Zikmund et al., 2012). The steps involved for the disproportionate stratification method include asking and answering a series of questions such as those below:
1.) With a fixed budget, how should the sample be allocated to optimize precision from a stratified sample?
2.) Given a set sample size, how should the sample be allocated to get the most precision?
3.) Given a small number of data points within a sample, how can the approach be made more robust and precise (Zikmund et al., 2012)?
There are other related questions to ask (Zikmund et al., 2012), but the above three illustrate how to consider the use and salience of the disproportionate stratification sampling method. There are also a plethora of sample size calculators and software packages available online with free trials and demos (Zikmund et al., 2012).
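Returning to the proportionate formula n_h = (N_h / N) * n, the short Python sketch below allocates a hypothetical total sample of 30 across the doctor, lawyer, and engineer strata discussed earlier and then draws a simple random sample within each stratum; all counts and element IDs are assumptions made for illustration.

```python
# Proportionate allocation sketch: n_h = (N_h / N) * n, using the hypothetical
# doctor/lawyer/engineer strata discussed above; all counts and IDs are assumed.
import random

strata_sizes = {"doctors": 75, "lawyers": 75, "engineers": 75}   # N_h per stratum
total_population = sum(strata_sizes.values())                    # N = 225
total_sample = 30                                                # n (assumed)

allocation = {
    stratum: round(total_sample * size / total_population)
    for stratum, size in strata_sizes.items()
}
print(allocation)   # equal strata here, so 10 elements are drawn from each stratum

# Then apply simple random sampling within each stratum.
population = {s: [f"{s}_{i}" for i in range(size)] for s, size in strata_sizes.items()}
sample = {s: random.sample(population[s], allocation[s]) for s in strata_sizes}
```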
Simple Random Sample Overview
The purpose of simple random sampling is to ensure an equal probability of each individual, characteristic, or trait being chosen, but in practice this is not fully realistic (Zikmund et al., 2012). The more common approach is random sampling with unequal probability sampling, using random assignment to reduce the chance that one individual or characteristic is significantly more likely to appear in a sample, although this can still occur (Lakens, 2020). For example, a diabetes risk assessment of an N of 150 subscribers to the local newspaper would be useful, even though imperfect in execution at giving every individual an exactly equal chance of being chosen; pseudorandom generator software, however, can produce acceptable confidence intervals and margins of error for the characteristics of interest (Zikmund et al., 2012).
Simple Random Sample Size Calculations
Factors that influence the necessary sample size are cost issues (computational, time, and budget), administrative constraints, the minimum standard for precision, the confidence level needed, variability within populations or subpopulations, and the sampling method employed (Zikmund et al., 2012). For simple random sampling specifically, it is critical for researchers to determine the margin of error, choose an alpha value, and find the critical z score; unless the population is at least 20 times larger than the sample size, the population size N also needs to be specified (Zikmund et al., 2012). The population variance, σ², must also be known (Zikmund et al., 2012). With a known population size, when the sample statistic is the mean, the required sample size is calculated with this formula: n = { z² * σ² * [ N / (N − 1) ] } / { ME² + [ z² * σ² / (N − 1) ] } (Zikmund et al., 2012). For the mean with an unknown population size, the formula is: n = ( z² * σ² ) / ME² (Zikmund et al., 2012). When analyzing a proportion with a known population size, the formula is: n = [ ( z² * p * q ) + ME² ] / [ ME² + z² * p * q / N ] (Zikmund et al., 2012). For a proportion with an unknown population size, the formula is: n = [ ( z² * p * q ) + ME² ] / ME² (Zikmund et al., 2012).
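The four formulas above can be evaluated directly in code. The Python sketch below plugs in assumed values for the margin of error, variance, proportion, and population size (none of which come from this paper) simply to show how the calculations are wired together.

```python
# The four sample-size formulas quoted above, wired up in code; the margin of
# error, variance, proportion, and population size are assumed for illustration.
from scipy.stats import norm

alpha = 0.05
z = norm.ppf(1 - alpha / 2)    # critical z for a 95% confidence level (~1.96)
ME = 2.0                       # margin of error for the mean, in raw units
sigma2 = 15.0 ** 2             # known population variance sigma^2
N = 5000                       # population size, when it is known
p = 0.5                        # assumed proportion; q = 1 - p
q = 1 - p
ME_p = 0.03                    # margin of error for a proportion

# Mean, known population size:
n_mean_known = (z**2 * sigma2 * (N / (N - 1))) / (ME**2 + z**2 * sigma2 / (N - 1))
# Mean, unknown population size:
n_mean_unknown = (z**2 * sigma2) / ME**2
# Proportion, known population size:
n_prop_known = (z**2 * p * q + ME_p**2) / (ME_p**2 + z**2 * p * q / N)
# Proportion, unknown population size:
n_prop_unknown = (z**2 * p * q + ME_p**2) / ME_p**2

for label, n in [("mean, N known", n_mean_known),
                 ("mean, N unknown", n_mean_unknown),
                 ("proportion, N known", n_prop_known),
                 ("proportion, N unknown", n_prop_unknown)]:
    print(f"{label}: n = {n:.1f}")
```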
Systematic Sample Overview
Systematic sampling is a probabilistic design that applies a regular interval, known as k, within the context of a random or random-like ordering of the population (Zikmund et al., 2012). Systematic sampling is useful when one does not know the composition of the population list in advance, but ordering is important to obtain an accurately representative sample, similar to random assignment (Mysiak, 2020). If one were to assign a unique identifier number to each subscriber and then use random simulations to choose the ordering, not based upon demographics or other details, this technique could be effective in picking every kth subscriber at random or near-random intervals (Mysiak, 2020). However, in my experience this is not an efficient way to generate randomness, due to the complexity of creating an ordered list that mimics randomness.
Systematic Sampling Sample Size Calculations and Sampling Interval
Systematic sampling is appropriate when one has a complete list of the elements within a population, such as the members of Congress (Zikmund et al., 2012). The method used to calculate the sample size for systematic sampling is limited by time, budget, and the level of precision needed (Zikmund et al., 2012). To perform systematic sampling, one must assign a number to each element in the sampling frame, construct a table of all the numbers being used, and divide the population size by the desired sample size; for example, 9,000 COVID-infected individuals / 4,000 desired for the sample = 2.25, so roughly every 2nd individual is sampled to obtain 4,000 people (Zikmund et al., 2012). The formula for the sampling interval is k = N/n, where k is the systematic sampling interval, N is the population size, and n is the sample size (Zikmund et al., 2012). Systematic sampling can only be used if the complete list of the population is known, but it does reduce human biases in sampling elements from a given population (Zikmund et al., 2012).
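The interval calculation and the selection of every kth element can be sketched briefly in Python; the population of 9,000 and sample of 4,000 follow the example above, while the fractional-interval selection scheme is one reasonable implementation choice rather than a prescribed method.

```python
# Systematic sampling sketch: k = N / n, then every k-th identifier is selected.
# The fractional-interval selection below is one reasonable implementation choice.
import random

N = 9000                          # population size from the example above
n = 4000                          # desired sample size
k = N / n                         # sampling interval = 2.25

ids = list(range(1, N + 1))       # unique identifier numbers 1..N
start = random.random() * k       # random start within the first interval
picks = [ids[int(start + i * k)] for i in range(n)]
print(f"k = {k}, sample size = {len(picks)}")
```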
G*Power in Brief
The software used in this paper is G*Power, which is used to determine effect sizes, perform alpha- and beta-based calculations, and generate outputs to visualize sample data. This is my first time using G*Power; I usually work with SPSS, STATA, MATLAB, and R.
One-Tailed Test G*Power
In the G*Power output above, the test is a one-tailed t-test with a small effect size (d) and a chosen alpha of 0.05. The actual power of 0.80, or 80%, is the lowest power conventionally considered acceptable for a probabilistically reasonable result (Zikmund et al., 2012).
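As a cross-check of this scenario, the snippet below asks statsmodels (an assumed substitute for G*Power) for the per-group n of a one-tailed independent-samples t-test with d = 0.2, alpha = 0.05, and power = 0.80; it should land in the neighborhood of roughly 310 per group, though the authoritative figure is the G*Power output itself.

```python
# Cross-check of the scenario above using statsmodels (assumed substitute for G*Power):
# one-tailed independent-samples t-test, small effect (d = 0.2),
# alpha = 0.05, target power = 0.80, two equal groups.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.2,
                                          alpha=0.05,
                                          power=0.80,
                                          ratio=1.0,
                                          alternative='larger')
print(f"Required n per group: {n_per_group:.1f}")  # roughly 310 per group, ~620 total
```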
Compromise Function
Smaller Sample Size Or Not
The smaller sample size should not be used, because a minimum power of 0.80 is required for the results to be considered reliable.
Consider Trade-Offs ANOVA
One-way fixed-effects ANOVA assumptions with 3 groups and a small effect size.
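A similar cross-check can be sketched for the ANOVA scenario; the Cohen's f of 0.10 used below is the conventional "small" value and is an assumption here, since the exact effect size entered into G*Power is reported only in the output itself.

```python
# Cross-check of the one-way fixed-effects ANOVA scenario: 3 groups,
# assumed small effect size (Cohen's f = 0.10), alpha = 0.05, power = 0.80.
from statsmodels.stats.power import FTestAnovaPower

n_total = FTestAnovaPower().solve_power(effect_size=0.10,
                                        alpha=0.05,
                                        power=0.80,
                                        k_groups=3)
print(f"Required total N: {n_total:.1f}")  # on the order of 960-970 participants in total
```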
Compromise Function
Compromise function with alpha and beta, using a sample about half the required size.
Rationale
The chosen beta/alpha ratio produces a power of 0.808.
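The compromise logic can be sketched for the simpler one-tailed t-test case: with the sample size fixed, the critical value is chosen so that beta divided by alpha equals a chosen ratio q. The per-group n of 155, the effect size of 0.2, and the ratio of 1 below are assumptions for illustration, not the exact inputs of the G*Power run reported here.

```python
# Rough sketch of a compromise analysis for a one-tailed two-group t-test:
# with n fixed, find the critical t so that beta / alpha equals a chosen ratio q.
# All inputs are assumptions for illustration, not the exact G*Power settings.
import numpy as np
from scipy import stats
from scipy.optimize import brentq

d = 0.2              # assumed effect size (Cohen's d)
n1 = n2 = 155        # assumed per-group sample size (about half of 310)
q = 1.0              # assumed beta/alpha ratio (type I and II errors weighted equally)

df = n1 + n2 - 2
ncp = d * np.sqrt(n1 * n2 / (n1 + n2))      # noncentrality parameter under H1

def ratio_gap(crit):
    alpha = stats.t.sf(crit, df)            # P(T > crit | H0 true)
    beta = stats.nct.cdf(crit, df, ncp)     # P(T <= crit | H1 true)
    return beta - q * alpha                 # zero when beta / alpha == q

crit = brentq(ratio_gap, 0.0, 5.0)          # search for the balancing critical value
alpha = stats.t.sf(crit, df)
beta = stats.nct.cdf(crit, df, ncp)
print(f"critical t = {crit:.3f}, alpha = {alpha:.3f}, beta = {beta:.3f}, "
      f"power = {1 - beta:.3f}")            # with these assumptions, power comes out near 0.81
```

This mirrors the idea behind a compromise analysis: rather than fixing alpha at .05 and accepting whatever power the sample allows, both error rates are rebalanced according to their assumed relative costs.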
References
Acheson, A. (2010). Sample size. Encyclopedia of Research Design. Retrieved from:
https://sk-sagepub-com.proxy1.ncu.edu/reference/researchdesign/n396.xml
Delacre, M., Lakens, D., Mora, Y., & Leys, C. (2019). Taking Parametric Assumptions
Seriously: Arguments for the Use of Welch’s F-test instead of the Classical F-test in One-way ANOVA. International Review of Social Psychology. https://doi.org/10.17605/OSF.IO/WNEZG
Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical
power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191.
Lakens, D. (2017). How a power analysis implicitly reveals the smallest effect size you care
about. The 20% statistician. Retrieved from: https://daniellakens.blogspot.com/2017/05/how-power-analysis-implicitly-reveals.html
Lakens, D. (2020). The practical alternative to the p-value is the correctly used p-value.
Perspectives on Psychological Science. https://doi.org/10.31234/osf.io/shm8v
Mysiak, K. (2020). The relationship between significance, power, sample size & effect size.
Towards Data Science, retrieved from: https://towardsdatascience.com/the-relationship-between-significance-power-sample-size-effect-size-899fcf95a76d
Serdar, C., Cihan, M., Yucel, D., & Serdar, M. (2021). Sample size, power, and effect size
revisited: simplified and practical approaches in pre-clinical, clinical and laboratory studies. Biochem Med, 31(1). Retrieved from:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7745163/
Zikmund, W., Babin, B., Carr, J., & Griffin, M. (2012). Business research methods (9th ed.).
United States: Cengage Learning.