8 - Performing Simple Linear Regression Using PROC REG in SAS

Simple Linear Regression

Simple linear regression is a statistical method used to model the relationship between two variables by fitting a straight line to the observed data. It estimates how one variable (the dependent variable) changes as the other variable (the independent variable) varies. This technique is commonly used for predictive analysis and determining the strength and direction of relationships between variables.

Figure: Simple linear regression. The blue dots are the observed data points, and the red line is the best-fit line modeling the relationship between the independent variable (X) and the dependent variable (Y); the line gives the predicted value of Y for each value of X.

To create your simple linear regression model, you estimate the unknown population parameters β0 and β1, which define the assumed relationship between your response variable and your predictor variable.

You estimate β0 and β1 using the method of least squares, which determines the line that minimizes the sum of the squared vertical distances between the data points and the fitted line. Estimated parameters are denoted with a hat above the parameter: in this case, β0-hat and β1-hat.
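For reference, the least squares estimates have a closed form; these are the standard textbook formulas, not anything specific to SAS:

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \, \bar{x}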

Comparing the Regression Model to a Baseline Model

To determine whether the predictor variable explains a significant amount of variability in the response variable, the simple linear regression model is compared to the baseline model.

The fitted regression line in a baseline model is just a horizontal line across all values of the predictor variable. The slope of this line is 0, and the y-intercept is the sample mean of Y, which is Y-bar.


Figure: The baseline model.
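In equation form, the baseline model ignores the predictor entirely and fits the same value, the sample mean, at every value of X:

\hat{Y} = \bar{Y}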

To determine whether a simple linear regression model is better than the baseline model, you compare the explained variability to the unexplained variability, just as in ANOVA.

In simple linear regression, the variability of the dependent variable Y is described by three key quantities:


1 - Explained Variability (Model Sum of Squares - SSM)

This measures the variability in Y that is explained by the linear relationship with X. SSM is the portion of the total variability that your model accounts for.

The explained variability is related to the difference between the regression line and the mean of the response variable.

Figure: Explained and unexplained variability.


2 - Unexplained Variability (Error Sum of Squares - SSE)

The unexplained variability is the difference between the observed values and the regression line. SSE is the amount of variability that the model fails to explain.

3 - Total Variability (SST)

Total variability is the difference between the observed values and the mean of the response variable. It is the sum of the model and error sums of squares.

Figure: Total variability.

SST = SSM + SSE
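In symbols, with y_i the observed value, \hat{y}_i the fitted value, and \bar{y} the sample mean, the three sums of squares are defined as follows (standard definitions):

\mathrm{SSM} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad \mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2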


  • The Sum of Squares for the Model (SSM) and the Sum of Squares for Error (SSE) are divided by their corresponding degrees of freedom to calculate the Mean Square for the Model (MSM) and the Mean Square Error (MSE).
  • The significance of the regression analysis is evaluated just as in an analysis of variance (ANOVA), by computing the F statistic (the ratio of MSM to MSE) and the corresponding p-value; the formulas for the simple linear regression case are sketched after this list. In fact, you'll see an ANOVA table in your regression output as well.
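For simple linear regression with a single predictor, the model has 1 degree of freedom and the error has n − 2, so:

\mathrm{MSM} = \frac{\mathrm{SSM}}{1}, \qquad \mathrm{MSE} = \frac{\mathrm{SSE}}{n-2}, \qquad F = \frac{\mathrm{MSM}}{\mathrm{MSE}}

Under the null hypothesis, this F statistic follows an F distribution with 1 and n − 2 degrees of freedom.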


Hypothesis Testing and Assumptions for Linear Regression

Our equation for simple linear regression is this:

Y = β0 + β1X + ε

So in our hypothesis test, we need to check whether the slope β1 is equal to 0:

H0: β1 = 0
Ha: β1 ≠ 0

If the estimated simple linear regression model does not fit the data better than the baseline model, you fail to reject the null hypothesis. Thus, you do not have enough evidence to say that the slope of the regression line in the population differs from zero. If the estimated simple linear regression model does fit the data better than the baseline model, you reject the null hypothesis. Thus, you do have enough evidence to say that the slope of the regression line in the population differs from zero and that the predictor variable explains a significant amount of variability in the response variable.

Four assumptions must be met for the test to be valid (a SAS sketch for checking them visually follows this list):

  1. The mean of the response variable is linearly related to the value of the predictor variable.
  2. The error terms are normally distributed.
  3. The error terms have equal variances.
  4. The error terms are independent at each value of the predictor variable.
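With ODS Graphics enabled, PROC REG produces diagnostic plots that help you assess these assumptions visually. A minimal sketch, assuming the STAT1.bodyfat2 data set used later in this post; the PLOTS(ONLY)= option restricts the output to the diagnostics panel and the residual plots:

ods graphics on;

proc reg data=STAT1.bodyfat2 plots(only)=(diagnostics residuals);
    /* The diagnostics panel includes a residual Q-Q plot (normality)
       and residual-by-predicted plots (linearity and equal variance) */
    model PctBodyFat2 = Weight;
run;
quit;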

The Simple Linear Regression Model

The Simple Linear Regression Model describes the relationship between two variables using a straight line. The model assumes that the dependent variable Y can be expressed as a linear function of the independent variable X. The equation for the model is:

Y = β0 + β1X + ε

Where:

  • Y is the dependent / response variable.
  • X is the independent / predictor variable.
  • β0 is the intercept (the value of Y when X=0).
  • β1 is the slope of the line (the change in Y for a one-unit change in X).
  • ε is the error term, representing the difference between the observed and predicted values of Y.


/* Enable ODS Graphics so PROC REG produces fit and diagnostic plots */
ods graphics on;

proc reg data=STAT1.bodyfat2;
    /* Regress percent body fat on weight */
    model PctBodyFat2 = Weight;
    title "Simple Regression with Weight as Regressor";
run;
quit;

/* Clear the title so it does not carry over to later steps */
title;

Here is the result. We can build the model equation from the Parameter Estimates table:

PctBodyFat2 = -12.05 + 0.17 * Weight
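To put the fitted line to work, one option is the OUTPUT statement in PROC REG, which writes predicted values and residuals to a new data set. A minimal sketch; the output data set name preds and the variable names PredFat and Resid are illustrative choices, not from the original post:

proc reg data=STAT1.bodyfat2;
    model PctBodyFat2 = Weight;
    /* p= stores the predicted values, r= stores the residuals */
    output out=preds p=PredFat r=Resid;
run;
quit;

/* Inspect the first few observations alongside their predictions */
proc print data=preds(obs=5);
    var Weight PctBodyFat2 PredFat Resid;
run;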


