6 - ANOVA Post Hoc Tests
GÖKHAN YAZGAN
PL-300 Microsoft Certified Power BI Data Analyst Associate | Global SAS Certified Specialist: Base Programming Using SAS 9.4
Post hoc tests, also known as multiple-comparison procedures, are used to identify which specific pairs of groups differ significantly from each other. Additionally, these tests help control the experiment-wise Type I error rate, ensuring that the probability of making at least one false-positive conclusion across all tests remains below the chosen alpha level, typically set at 0.05.
Multiple Comparison Methods
We will compare group means pairwise to determine which groups differ from one another.
However, when you conduct a single statistical test at an α level of 0.05, there is a 5% chance of incorrectly rejecting the null hypothesis when it is actually true. Without adjustment, the chance of making at least one Type 1 error increases as the number of compared groups increases.
Multiple comparisons can increase the Type I error rate for the experiment if not properly controlled with post hoc techniques. This means that without adjustments, the likelihood of incorrectly rejecting the null hypothesis when assessing differences in means will rise.
The comparisonwise error rate, or the CER, is the probability of a Type 1 error on a single pairwise test. The experimentwise error rate, or EER, is the probability of making at least one Type 1 error when you perform the entire set of comparisons.
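The EER is 1 minus the probability of making no Type 1 error in any of the comparisons: $\mathrm{EER} = 1 - (1 - \alpha)^{n_c}$, where $\alpha$ is the significance level of each test and $n_c$ is the number of comparisons. The formula treats the comparisons as independent.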
We need to use a method that controls the EER at a level like 0.05.
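As a rough worked example, assume the comparisons are independent and apply EER = 1 − (1 − α)^nc with α = 0.05. The first line uses the 10 pairwise comparisons that arise among five groups. The second uses the 4 comparisons needed when four groups are each compared with one control. The group counts are chosen here only for illustration.

```latex
\begin{align*}
\mathrm{EER}_{n_c = 10} &= 1 - (1 - 0.05)^{10} = 1 - 0.95^{10} \approx 1 - 0.5987 = 0.4013 \\
\mathrm{EER}_{n_c = 4}  &= 1 - (1 - 0.05)^{4}  = 1 - 0.95^{4}  \approx 1 - 0.8145 = 0.1855
\end{align*}
```

Without any adjustment, ten tests at α = 0.05 would carry roughly a 40% chance of at least one false positive, which is why the EER has to be controlled explicitly.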
Tukey's and Dunnett's Multiple Comparison Methods
Tukey's and Dunnett's Multiple Comparison Methods are statistical techniques used in the analysis of experimental data, particularly when comparing multiple groups. These methods help identify significant differences between group means but are applied in different contexts.
Tukey's Multiple Comparison Method (Tukey's HSD)
Tukey's HSD (Honestly Significant Difference) method is used to compare all possible pairs of group means after conducting an analysis of variance (ANOVA). It’s designed to identify significant differences between any two groups among several.
Dunnett's Multiple Comparison Method
Dunnett's Test is specifically used to compare each treatment group to a single control group. Unlike Tukey's method, it does not compare every possible pair of groups but focuses on differences between the control group and each treatment group.
Differences
In summary, the choice between Tukey's and Dunnett's methods depends on the study design and the specific comparisons of interest. Tukey's HSD is used when comparing all groups to each other, while Dunnett's Test is used when the primary interest is in comparing several groups to a single control group.
Both methods control the EER at or below the α level.
Numerous other multiple comparison methods are available. The various techniques differ in the extent to which they manage the experimentwise error rate. Decreasing the Type 1 error rate raises the Type 2 error rate, meaning it lowers the statistical power. In certain scenarios, a Type 1 error is more detrimental than a Type 2 error, or the opposite.
Because the Tukey adjustment covers more pairwise comparisons than the Dunnett adjustment, the Dunnett comparisons generally show smaller p-values for the pairs that both methods test.
Situational Considerations:
Examples of Multiple Comparison Procedures:
The choice of method depends on the study's context, the consequences of making errors, and the need to balance sensitivity (power) with the control of false positives.
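As a minimal sketch of how other adjustments can be requested, the ADJUST= option of the LSMEANS statement in PROC GLM also accepts values such as BON (Bonferroni), SIDAK, and SCHEFFE. These three methods are picked here only for illustration and are not necessarily the ones the article has in mind. The SASHELP.HEART data set and the Smoking_Status variable are borrowed from the example later in this article.

```sas
/* Sketch: one model, three alternative adjustments for all pairwise comparisons. */
/* The choice of BON, SIDAK, and SCHEFFE is an assumption made for illustration.  */
proc glm data=SASHELP.HEART plots=none;
    class Smoking_Status;
    model AgeAtDeath = Smoking_Status;
    lsmeans Smoking_Status / pdiff=all adjust=bon;      /* Bonferroni */
    lsmeans Smoking_Status / pdiff=all adjust=sidak;    /* Sidak      */
    lsmeans Smoking_Status / pdiff=all adjust=scheffe;  /* Scheffe    */
run;
quit;
```

Comparing the adjusted p-values across the three tables gives a feel for how conservative each method is on the same data.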
Balance between Type 1 and Type 2 errors and how the importance of these errors can vary depending on the situation
Key points:
In summary, it is important to understand the balance between error types in statistical analysis and how the significance of each type of error can vary depending on the stage of research or the specific context.
Diffograms and Control Plots
Diffograms can be utilized to visually determine whether the means of different group pairs differ statistically.
A control plot illustrates the least squares mean along with decision limits. It compares each treatment group to the control group using Dunnett's method.
Performing a Post Hoc Pairwise Comparison Using PROC GLM
Let's start by writing the PROC GLM code for our one-way ANOVA post hoc analysis of AgeAtDeath by Smoking_Status. Our data set is SASHELP.HEART, we request the diffogram and control plots, and our categorical predictor variable is Smoking_Status. Normally we would request only the Tukey adjustment, but this time we also request Dunnett's to see how the other groups compare with non-smokers as the control group.
We already determined from a significant overall ANOVA result in the previous article (number 5) that at least one smoking status group differs from the others. Let's use PROC GLM to determine which pairs differ significantly in mean AgeAtDeath.
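/* Enable ODS Graphics and keep only the tables and plots we need */
ods graphics;
ods select lsmeans diff diffplot controlplot;
proc glm data=SASHELP.HEART
         plots(only)=(diffplot(center) controlplot);
    class Smoking_Status;
    model AgeAtDeath=Smoking_Status;
    /* All pairwise comparisons with the Tukey adjustment */
    lsmeans Smoking_Status / pdiff=all
                             adjust=tukey;
    /* Each group versus the Non-smoker control with the Dunnett adjustment */
    lsmeans Smoking_Status / pdiff=control('Non-smoker')
                             adjust=dunnett;
    title "Post-Hoc Analysis of ANOVA - Smoking Status as Predictor";
run;
quit;
title;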
The first table shows the least squares mean for each group, and each mean is assigned a number that identifies it in the next table. We can see that the average AgeAtDeath of patients with Non-smoker status is the highest, at approximately 73.76. Patients with Very Heavy smoking status have the lowest average AgeAtDeath, at approximately 65.41.
The second table shows the p-values from pairwise comparisons of all possible combinations of means. The nonsignificant pairwise differences are between the Heavy and Moderate groups and between the Light and Moderate groups. These p-values are adjusted using the Tukey method and are therefore larger than the unadjusted p-values for the same comparisons. In return, the experimentwise Type 1 error rate is held at alpha (0.05).
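To see the effect of the adjustment directly, one option is to request the same pairwise comparisons twice, once unadjusted and once with Tukey. The sketch below assumes the same data set and model as above. ADJUST=T requests ordinary t tests with no multiplicity adjustment.

```sas
/* Sketch: unadjusted versus Tukey-adjusted p-values for the same pairs */
proc glm data=SASHELP.HEART plots=none;
    class Smoking_Status;
    model AgeAtDeath = Smoking_Status;
    lsmeans Smoking_Status / pdiff=all adjust=t;      /* no adjustment  */
    lsmeans Smoking_Status / pdiff=all adjust=tukey;  /* Tukey adjusted */
run;
quit;
```

For any given pair, the Tukey-adjusted p-value should be at least as large as the unadjusted one.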
The comparisons of least squares means are also shown graphically in the diffogram. With n groups there are n(n − 1)/2 pairwise comparisons, so for our 5 groups the diffogram shows 5 × 4 / 2 = 10 comparisons.
The blue solid lines denote significant differences between smoking status levels, because the confidence intervals for those differences do not cross the diagonal equivalence line. Red dashed lines indicate nonsignificant differences between levels.
Starting at the top and reading left to right, we can see that Very Heavy is significantly different from every other group, most clearly from Non-smoker. Heavy is significantly different from Non-smoker, Light, and Very Heavy. Moderate is significantly different from Non-smoker, whereas the Light and Moderate means and the Moderate and Heavy means are not significantly different.
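The line segments in the diffogram are confidence intervals for the pairwise differences, and these can also be printed as numbers. A minimal sketch, assuming the same data and model: adding the CL option alongside PDIFF asks PROC GLM for confidence limits on the least squares means and on their differences.

```sas
/* Sketch: Tukey-adjusted confidence limits for each pairwise difference */
proc glm data=SASHELP.HEART plots=none;
    class Smoking_Status;
    model AgeAtDeath = Smoking_Status;
    lsmeans Smoking_Status / pdiff=all adjust=tukey cl;
run;
quit;
```

An interval that excludes zero corresponds to a segment that does not cross the diagonal line in the diffogram.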
Let's look at the Dunnett LSMEANS comparisons as well. In this case, every other smoking status level is compared to the Non-smoker group. We can see that all the groups are significantly different from the Non-smoker control level.
The control plot corresponds to the tables that were summarized. The horizontal line is drawn at the least squares mean for Non-Smoker, which is 73.76. The other four means are represented by the ends of the vertical lines extending from the horizontal control line.
The blue shaded areas are the non-significance zones, and they vary in size because different comparisons involve different sample sizes. Smaller sample sizes require larger mean differences to reach statistical significance. This control plot shows that all the other groups are significantly different from the Non-smoker control group.