How important is that variable?

When building a model that includes explanatory variables related to the phenomenon of interest, one question arises: which auxiliary variables have the most impact on the response? This discussion is not concerned with significance testing; rather, it is about ranking the variables that influence the response by their importance. Many methods are available for answering this question. Here we will focus on isolating units from variables, a straightforward approach. For illustration, let's assume a linear model with two explanatory variables.

y = β1·x1 + β2·x2 + e

If you assume that the model is true, you can determine the impact of each variable x on the response y by isolating units from variables. One method fits the model on standardized variables (both explanatory and response) and compares the regression coefficients directly. Another approach uses this expression:

imp_j = |β_j| · sd(x_j) / sd(y)

As an illustration, suppose we have the model y = -500 x1 + 50 x2 + e, where both explanatory variables are on the same scale. The relative importance of each variable can then be computed by dividing the absolute value of its coefficient by the sum of the absolute values of all coefficients: |-500| / 550 ≈ 0.9 for x1, and |50| / 550 ≈ 0.1 for x2.
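That back-of-the-envelope calculation can be checked in a couple of lines of R (a quick sketch using only the coefficients from the example model):

```r
betas <- c(x1 = -500, x2 = 50)          # coefficients from the example model
rel_imp <- abs(betas) / sum(abs(betas)) # normalize the absolute coefficients
round(rel_imp, 3)                       # x1: 0.909, x2: 0.091
```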

To perform this analysis in R, you can use the following code. It generates a dataset of size n with two independent variables (x1 and x2) and one dependent variable (y), then fits a linear model without an intercept. Finally, the two methods described above are used to compute relative importance scores for each variable:

  • Method 1 works from the coefficient estimates in the fitted model object. The importance measure for each variable is the absolute value of its estimate divided by its standard error (its |t| statistic), normalized so that the scores sum to one. Alternatively, we can calculate relative importance directly from the unstandardized coefficients and the standard deviations of the variables, using the expression above.
  • Method 2 fits a new linear model after scaling (centering and normalizing to unit variance) both the explanatory variables and the response. This yields standardized regression coefficients that can be compared across predictors; here again, we normalize them so that they sum to one.

# Set sample size and generate data
n <- 10000
x1 <- runif(n)
x2 <- runif(n)
y <- -500 * x1 + 50 * x2 + rnorm(n)

# Fit linear model without intercept term
model <- lm(y ~ 0 + x1 + x2)

### Method 1: Standardized betas ###

# Coefficient estimates and their standard errors from the fitted model object
betas    <- coef(model)
se.betas <- summary(model)$coefficients[, 2]

# Absolute value of each estimate divided by its standard error (the |t| statistic)
# gives an importance measure for each variable
imp <- abs(betas) / se.betas
imp <- imp / sum(imp) # divide by the total to get relative importance scores

imp # display results


# Alternatively, calculate relative importance directly from the
# unstandardized coefficients and the variable standard deviations
imp1 <- abs(coef(model)[1] * sd(x1) / sd(y))
imp2 <- abs(coef(model)[2] * sd(x2) / sd(y))

rel_imp_1_to_2 <- imp1 / (imp1 + imp2)
rel_imp_2_to_1 <- imp2 / (imp1 + imp2)

rel_imp_1_to_2
rel_imp_2_to_1

### Method 2: Standardized variables ###

# Fit a new linear model on standardized variables (scaled to mean = 0, SD = 1);
# the intercept is dropped since all variables are centered
model_std_vars <- lm(scale(y) ~ 0 + scale(x1) + scale(x2))

summary(model_std_vars) # print summary statistics

# Compute relative importance scores as the ratio of each absolute coefficient to their sum
abs(coef(model_std_vars)) / sum(abs(coef(model_std_vars)))
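As a sanity check, the two methods can be run side by side in one self-contained script (a sketch; the seed is an assumption, since the article does not fix one):

```r
set.seed(123) # seed chosen for reproducibility; the article sets none
n  <- 10000
x1 <- runif(n)
x2 <- runif(n)
y  <- -500 * x1 + 50 * x2 + rnorm(n)

# Method 1: rescale the unstandardized coefficients by sd(x)/sd(y)
fit  <- lm(y ~ 0 + x1 + x2)
imp1 <- abs(coef(fit)) * c(sd(x1), sd(x2)) / sd(y)
imp1 <- imp1 / sum(imp1)

# Method 2: refit on standardized variables and normalize the coefficients
fit_std <- lm(scale(y) ~ 0 + scale(x1) + scale(x2))
imp2    <- abs(coef(fit_std)) / sum(abs(coef(fit_std)))

round(rbind(method1 = imp1, method2 = unname(imp2)), 3)
# both rows land very close to the theoretical 0.909 / 0.091 split
```

Because x1 and x2 share the same uniform scale, both methods recover the same ranking that the simple coefficient ratio 500/550 predicts.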
