6 - ANOVA Post Hoc Tests
It all comes down to a trade-off between Type I and Type II error rates, depending on the context.

Post hoc tests, also known as multiple-comparison procedures, are used to identify which specific pairs of groups differ significantly from each other. Additionally, these tests help control the experiment-wise Type I error rate, ensuring that the probability of making at least one false-positive conclusion across all tests remains below the chosen alpha level, typically set at 0.05.

Multiple Comparison Methods

We compare group means pairwise to determine which groups differ from one another.

However, when you conduct a single statistical test at an α level of 0.05, there is a 5% chance of incorrectly rejecting the null hypothesis when it is actually true. So, without adjustment, the chance of making at least one Type I error grows as the number of compared groups increases.

Multiple comparisons can increase the Type I error rate for the experiment if not properly controlled with post hoc techniques. This means that without adjustments, the likelihood of incorrectly rejecting the null hypothesis when assessing differences in means will rise.

The comparisonwise error rate, or CER, is the probability of a Type I error on a single pairwise test. The experimentwise error rate, or EER, is the probability of making at least one Type I error when you perform the entire set of comparisons.

Assuming independent comparisons, the experimentwise error rate is EER = 1 − (1 − α)^nc, where α is the comparisonwise significance level and nc is the number of comparisons.

We need to use a method that controls the EER at a level like 0.05.
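To see why control is needed, the formula above is easy to evaluate numerically. A minimal Python sketch (illustration only, not part of the SAS workflow):

```python
# Experimentwise error rate (EER) for nc independent comparisons at level alpha:
#   EER = 1 - (1 - alpha) ** nc
def experimentwise_error_rate(alpha: float, nc: int) -> float:
    """Probability of at least one Type I error across nc independent tests."""
    return 1 - (1 - alpha) ** nc

# With alpha = 0.05 the EER climbs quickly as comparisons accumulate.
for nc in (1, 3, 10):
    print(nc, round(experimentwise_error_rate(0.05, nc), 4))
```

With 10 comparisons, the chance of at least one false positive is already about 40%, far above the nominal 0.05.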

Tukey's and Dunnett's Multiple Comparison Methods

Tukey's and Dunnett's Multiple Comparison Methods are statistical techniques used in the analysis of experimental data, particularly when comparing multiple groups. These methods help identify significant differences between group means but are applied in different contexts.

Tukey's Multiple Comparison Method (Tukey's HSD)

Tukey's HSD (Honestly Significant Difference) method is used to compare all possible pairs of group means after conducting an analysis of variance (ANOVA). It’s designed to identify significant differences between any two groups among several.

  • Application: Used when you want to compare all possible pairs of group means.
  • Purpose: To determine if there are statistically significant differences between the means of every possible pair of groups.
  • Features: Calculates p-values for all possible pairwise comparisons. Controls the Type I error rate, meaning it reduces the likelihood of falsely identifying a difference as significant. Works well under the assumption of equal variances across groups.

Dunnett's Multiple Comparison Method

Dunnett's Test is specifically used to compare each treatment group to a single control group. Unlike Tukey's method, it does not compare every possible pair of groups but focuses on differences between the control group and each treatment group.

  • Application: Used when comparisons are made between a control group and several treatment groups.
  • Purpose: To determine if there are significant differences between the control group and each of the other groups.
  • Features: Only involves comparisons between the control group and each of the other groups, which reduces the number of tests. Also controls the Type I error rate. Particularly suitable for studies where the primary interest is in comparing treatment groups against a control.

Differences

  • Number of Comparisons: Tukey's HSD tests all k(k − 1)/2 pairs of group means, while Dunnett's tests only the k − 1 comparisons against the control.
  • Use Cases: Tukey's HSD suits exploratory, all-pairs comparisons; Dunnett's suits designs where a single control group is the reference.
  • Error Rate Control: Both control the experimentwise error rate at α, but because Dunnett's makes fewer comparisons its adjustment is less severe.

In summary, the choice between Tukey's and Dunnett's methods depends on the study design and the specific comparisons of interest. Tukey's HSD is used when comparing all groups to each other, while Dunnett's Test is used when the primary interest is in comparing several groups to a single control group.

Both methods control the EER at no more than the α level.
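The key structural difference is how many comparisons each method must account for. A quick Python sketch of the counts (illustration only):

```python
def tukey_comparisons(k: int) -> int:
    """Number of comparisons Tukey's HSD makes: all pairs among k groups."""
    return k * (k - 1) // 2

def dunnett_comparisons(k: int) -> int:
    """Number of comparisons Dunnett's test makes: each treatment vs. one control."""
    return k - 1

# The gap widens as the number of groups grows, which is why Dunnett's
# adjustment is milder for the same pairs.
for k in (3, 5, 10):
    print(k, tukey_comparisons(k), dunnett_comparisons(k))
```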

Numerous other multiple comparison methods are available; they differ in how strictly they control the experimentwise error rate. Decreasing the Type I error rate raises the Type II error rate, meaning it lowers statistical power. In certain scenarios a Type I error is more detrimental than a Type II error, or vice versa.

Because the Tukey adjustment accounts for more pairwise comparisons than the Dunnett adjustment, Dunnett's method reports the same pairs with smaller adjusted p-values.



Situational Considerations:

  • In some studies, particularly in clinical trials, a Type 1 error can have severe consequences (e.g., incorrectly concluding that a treatment is effective). In these cases, methods that strongly control for Type 1 error, like Bonferroni or Holm’s method, are preferable.
  • In exploratory research, where the goal is to identify potential differences for further study, a higher Type 2 error might be acceptable, and methods like Tukey’s HSD or Dunnett’s test could be more appropriate.

Examples of Multiple Comparison Procedures:

  • Bonferroni Correction: Very conservative, controls Type I error strictly by dividing the alpha level by the number of comparisons.
  • Holm’s Method: A step-down procedure that is less conservative than Bonferroni but still controls the experimentwise error rate.
  • Tukey's HSD: Controls the experimentwise error rate for all pairwise comparisons.
  • Dunnett’s Test: Specifically controls Type I error for comparisons against a control group.
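To make the first two procedures concrete, here is a minimal pure-Python sketch; the raw p-values are hypothetical, chosen only to show the adjustments:

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of comparisons (cap at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down: the i-th smallest p-value is multiplied by (m - i),
    then monotonicity is enforced so adjusted p-values never decrease."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03]  # hypothetical unadjusted p-values
print([round(p, 4) for p in bonferroni(raw)])  # -> [0.03, 0.12, 0.09]
print([round(p, 4) for p in holm(raw)])        # -> [0.03, 0.06, 0.06]
```

Note that Holm's adjusted p-values are never larger than Bonferroni's, which is why it is the less conservative of the two while still controlling the EER.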

The choice of method depends on the study's context, the consequences of making errors, and the need to balance sensitivity (power) with the control of false positives.

There is a balance between Type I and Type II errors, and the importance of each error type can vary depending on the situation.

  • Type I Error: This occurs when we incorrectly conclude that there is a difference or effect when there isn’t one (i.e., a false positive). For example, believing that a plant compound cures cancer when it actually doesn’t.
  • Type II Error: This occurs when we fail to detect a real difference or effect (i.e., a false negative). For instance, not identifying an effective cancer treatment.

Key points:

  1. Balancing the Error Types: When you try to reduce Type I errors, you often increase the risk of Type II errors. This means that by trying to avoid false positives (Type I errors), you may lower your ability to detect true effects (statistical power). Statistical power is the test’s ability to correctly identify a true effect.
  2. Priorities Depending on the Situation:

  • Early Stages of Research: In fields like cancer research, it’s more important to have high power to detect an effective compound (avoiding Type II errors) in the early stages. At this point, even if some false positives occur, they can be weeded out in later, more rigorous testing.
  • Follow-Up Studies: Later, when these compounds are tested on patients, it becomes more crucial to minimize Type I errors to avoid recommending a treatment that doesn’t actually work. At this stage, the consequences of a false positive are much more serious, so reducing Type I errors is more critical.

In summary, it is important to understand the balance between error types in statistical analysis, and the significance of each type of error can vary depending on the stage of research or the specific context.

Diffograms and Control Plots

Diffograms can be utilized to visually determine whether the means of different group pairs differ statistically.

A control plot illustrates the least squares mean along with decision limits. It compares each treatment group to the control group using Dunnett's method.

Performing a Post Hoc Pairwise Comparison Using PROC GLM

Let's start by writing our PROC GLM code for a one-way ANOVA post hoc analysis of AgeAtDeath = Smoking_Status. The data set is SASHELP.HEART, we want control and diffogram plots, and our categorical predictor variable is Smoking_Status. We would normally request only the Tukey adjustment, but this time we also request Dunnett's to see the Non-smoker group's position as the control group.

In a previous article (number 5), a significant overall ANOVA result showed that at least one smoking status differs in mean AgeAtDeath. Let's use PROC GLM to determine which pairs are significantly different from each other.

ods graphics;

ods select lsmeans diff diffplot controlplot;
proc glm data=SASHELP.HEART 
         plots(only)=(diffplot(center) controlplot);
    class Smoking_Status;
    model AgeAtDeath=Smoking_Status;
    lsmeans Smoking_Status / pdiff=all 
                         adjust=tukey;
    lsmeans Smoking_Status / pdiff=control('Non-smoker') 
                         adjust=dunnett;
    title "Post-Hoc Analysis of ANOVA - Smoking Status as Predictor";
run;
quit;

title;        

The first table shows the means for each group, and each mean is assigned a number to refer to it in the next table. We can see that the average AgeAtDeath of patients with Non-Smoker Smoking Status is the highest, at approximately 73.76. Patients with Very Heavy Smoking Status have the lowest average AgeAtDeath, at approximately 65.41.


Means for each group

The second table shows the p-values from pairwise comparisons of all possible combinations of means. The nonsignificant pairwise differences are between the Heavy and Moderate groups and between the Light and Moderate groups. These p-values are adjusted using the Tukey method and are therefore larger than the unadjusted p-values for the same comparisons; in return, the experimentwise Type I error rate is held fixed at alpha (0.05).


P-values from pairwise comparisons of all possible combinations of means.

The comparisons of least squares means are also shown graphically in the diffogram. With n = 5 groups, n(n − 1)/2 = 5 × 4 / 2 = 10 comparisons are shown.
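The pair count is easy to verify by enumerating the pairs; a small Python illustration (group labels abbreviated from the full SASHELP.HEART Smoking_Status values):

```python
from itertools import combinations

# Smoking_Status levels, abbreviated
groups = ["Non-smoker", "Light", "Moderate", "Heavy", "Very Heavy"]

# Every unordered pair of distinct groups = one diffogram comparison
pairs = list(combinations(groups, 2))
print(len(pairs))  # n(n-1)/2 = 5*4/2 = 10 pairwise comparisons
```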


The blue solid lines denote significant differences between smoking status levels, because these confidence intervals for the difference do not cross the diagonal equivalence line. Red dashed lines indicate a non-significant difference between treatments.

Starting at the top, left to right: Very Heavy differs significantly from all other groups, most notably Non-smokers. Heavy differs significantly from Non-smokers, Light, and Very Heavy. Moderate differs significantly from the Non-smoker group, whereas the Light–Moderate and Moderate–Heavy mean differences are not significant.

Let's look at Dunnett's LSMEANS comparisons as well. In this case, every other smoking status level is compared to the Non-smoker group, and we can see that all of them are significantly different from the Non-smoker control level.


Non-Smoker group is the control group here.

The control plot corresponds to the tables that were summarized. The horizontal line is drawn at the least squares mean for Non-Smoker, which is 73.76. The other four means are represented by the ends of the vertical lines extending from the horizontal control line.

The blue areas are the non-significance zones; they vary in size because different comparisons involve different sample sizes, and smaller sample sizes require larger mean differences to reach statistical significance. This control plot shows that all the other groups are significantly different from the Non-smoker control group.

