1 - Overview of Statistical Modelling
GÖKHAN YAZGAN
PL-300 Microsoft Certified Power BI Data Analyst Associate | Global SAS Certified Specialist: Base Programming Using SAS 9.4
Functions of Variables
In our model, the response variable sits on the left side. It is the focus of our research: the variable we try to predict. The predictor variable(s), which are used to predict the response variable, sit on the right side.
Types of Variables
Variables can be continuous, categorical, or ordinal. Continuous variables are numeric measurements, such as the sale price of a home. Categorical variables take specific non-numeric levels, such as the heating quality of a home (average, fair, good, or excellent). Ordinal variables are similar to categorical variables but have a natural ordering, such as small, medium, and large coffee sizes.
Which Model to Use
If we have a continuous response and categorical predictor variable(s), we use the ANOVA model. If both the response and the predictor variable(s) are continuous, we use the Ordinary Least Squares (OLS) Regression model.
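As a rough illustration of the ANOVA case, the sketch below computes a one-way ANOVA F statistic by hand for a continuous response grouped by one categorical predictor. The sale prices and the `one_way_anova_f` helper are hypothetical, made up only for this example. In practice a statistics package would also report the p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of levels of the categorical predictor
    n = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical sale prices (in $1000s) grouped by heating quality
prices_by_heating = {
    "fair": [100, 110, 120],
    "good": [130, 140, 150],
    "excellent": [160, 170, 180],
}

f_stat = one_way_anova_f(list(prices_by_heating.values()))
print(f"F = {f_stat:.1f}")
```

A large F value suggests that the mean response differs between at least two levels of the predictor.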
The Ordinary Least Squares Regression model has the form:
Y = β0 + β1X1 + … + βkXk + ε
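To make the regression equation concrete, here is a minimal sketch that estimates β0 and β1 for a single continuous predictor using the closed-form least squares formulas. The data points and the `fit_simple_ols` helper are assumptions for illustration. With several predictors one would normally rely on a statistics library instead.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: one continuous predictor and one continuous response
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)

# The residuals play the role of the error term in the fitted model
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```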
Finally, if we have a binary categorical response variable and predictor variable(s) of any type, we use Logistic Regression, which estimates the probability of the desired outcome.
logit(p) = β0 + β1X1 + … + βkXk
Here p is the probability of the desired outcome and logit(p) = ln(p / (1 − p)).
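The short sketch below shows how a logistic regression equation turns a linear combination of predictors into a probability. The coefficient values and the predictor value are hypothetical and not estimated from any data. The only purpose is to show the logit and its inverse.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(z):
    """Map a value on the log-odds scale back to a probability."""
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients for a model with a single predictor
b0, b1 = -4.0, 0.5
x = 10.0

log_odds = b0 + b1 * x          # right-hand side of the model equation
p = inverse_logit(log_odds)     # estimated probability of the outcome
print(f"log-odds = {log_odds:.1f}, probability = {p:.3f}")
```

Applying `logit` to the resulting probability returns the original linear predictor, which is why the model is written on the logit scale.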
Explanatory and Predictive Modelling
No matter which statistical model we use, we need to differentiate between explanatory and predictive modelling.
In explanatory modelling, we try to understand how X is related to Y. Our main concern is to estimate the model parameters accurately, and we rely on p-values and confidence intervals to do so. We typically work with small sample sizes and few variables.
In predictive modelling, we predict future values of a response variable. Our main concern is to make accurate predictions, and we assess them on a holdout or validation data set. We typically work with larger sample sizes and many variables.
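The holdout idea in predictive modelling can be sketched as follows: fit the model on one part of the data and measure prediction error on the part it has not seen. The simulated data, the 70/30 split, and the helper names are all assumptions chosen for this example.

```python
import math
import random
from statistics import mean


def fit_line(xs, ys):
    """Least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


def rmse(xs, ys, intercept, slope):
    """Root mean squared prediction error on the given data."""
    return math.sqrt(mean((y - (intercept + slope * x)) ** 2
                          for x, y in zip(xs, ys)))


# Simulated data (hypothetical): y = 3 + 2x plus random noise
random.seed(0)
xs = [float(i) for i in range(50)]
ys = [3.0 + 2.0 * x + random.gauss(0, 1) for x in xs]

# Shuffle the row indices and keep 30% of the rows as a holdout set
indices = list(range(len(xs)))
random.shuffle(indices)
cut = len(indices) * 7 // 10
train_idx, holdout_idx = indices[:cut], indices[cut:]

train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
holdout_x = [xs[i] for i in holdout_idx]
holdout_y = [ys[i] for i in holdout_idx]

# Fit on the training rows only, then score on the unseen holdout rows
intercept, slope = fit_line(train_x, train_y)
holdout_rmse = rmse(holdout_x, holdout_y, intercept, slope)
print(f"holdout RMSE = {holdout_rmse:.2f}")
```

The holdout error, rather than the fit on the training rows, is what indicates how well the model is likely to predict new observations.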