Can Likert Scale Data ever be Continuous?

A very common question is whether it is legitimate to use Likert scale data in parametric statistical procedures that require interval data, such as Linear Regression, ANOVA, and Factor Analysis.

A typical Likert scale item has 5 to 11 points that indicate the degree of something. For example, it could measure agreement with a statement, such as 1=Strongly Disagree to 5=Strongly Agree. It can be a 1 to 5 scale, 0 to 10, etc.

The Debate

The issue is that despite having numbers, a Likert scale item is in fact a set of ordered categories. The numerals that are attached to the different categories aren’t really quantitative. They describe order of responses, but not really quantity.

And yet, ultimately what the item is attempting to measure is amount of agreement. Shouldn’t that be treated as quantitative, if it’s really an amount?

One camp maintains that as ordered categories, the intervals between the scale values are not equal. So even if there is a true quantitative amount to the variable we’re attempting to measure, we’re actually measuring it only at discrete points, creating ordinal categories.

This camp claims that any mean, correlation, or other numerical operation applied to the categorical numerals is invalid. Only nonparametric statistics or other analyses for ordered data are appropriate for Likert item data (e.g., Jamieson, 2004).

The other camp maintains that yes, technically the Likert scale item is ordered. Even so, parametric tests can be practically valid in some situations.

Additionally, tests that assume real numerical data still tell you a lot about what’s going on with this variable. They’re easier to run and easier to communicate.

For example, Lubke & Muthén (2004) found that it is possible to recover true parameter values in factor analysis with Likert item data, if assumptions about skewness, minimum number of categories, etc., are met. Likewise, Glass et al. (1972) found that F tests in ANOVA could return accurate p-values on Likert items under certain conditions.

Meanwhile, the debate rages on.

Recommendations

So, what is a researcher with integrity supposed to do? In the absence of a definitive answer, these are my recommendations:

  1. Understand the difference between a Likert item and a Likert Scale. A true Likert scale, as Likert defined it, is made up of many items, which all measure the same attitude. But many people use the term “Likert Scale” to refer to a single item from that scale. Confusion about what a Likert Scale is has, no doubt, contributed to the debate.
  2. Proceed with caution. Research the consequences of using your procedure on Likert scale data from your study design and the variables you are measuring. The fact that everyone uses it is not sufficient justification. There are some circumstances and procedures for which it is more egregious than others. You bear the burden of justifying why it’s okay to use numerical procedures for ordinal data.
  3. At the very least, insist that you’ll only treat it as numerical under certain conditions. All of these must be true: that the item have at least 7 values; that the underlying construct you’re measuring be continuous; and that there be some indication that the intervals between points are approximately equal. Likewise, make sure other assumptions of your test are reasonable to make (e.g., normality and equal variance of residuals).
  4. When you can, run the non-parametric equivalent to your test, or whatever alternate test exists that doesn’t make assumptions of numerical data. If you get the same results, you can be confident about your conclusions. So even if you choose to report the numerical results, you can explain, maybe in a footnote, all the tests you ran and the similar results you found. Transparency is always good science.
  5. If you do choose to use Likert data in a parametric procedure, make sure you have strong results before making claims. Set criteria for yourself of larger effect sizes, to ensure that non-zero effects really exist, even if you’ve measured your effect with some error. Use a more stringent alpha level, like .01 or even .005, instead of .05. If you have p-values of .001 or .45, it’s pretty clear what the result is, even if parameter estimates are slightly biased. It’s when p-values are close to .05 that the effect of bending assumptions is unclear.
  6. Consider the consequences of reporting inaccurate results. Will anyone ever read your paper? Will your research be published? Will others use it to shape public policy or affect practices? The answers to these questions can inform the seriousness of potential problems.
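
As a rough illustration of recommendations 4 and 5, the sketch below runs a parametric-style comparison and a rank-based one on the same two groups of 5-point responses and puts the p-values side by side. Everything in it is an assumption made for illustration: the response counts are invented, the function names are hypothetical, the "parametric" test is a large-sample z approximation to a two-sample t-test, and the Mann-Whitney p-value uses a tie-corrected normal approximation without a continuity correction. In real work a statistics package would normally do this.

```python
import math
from statistics import NormalDist, mean, variance


def z_test_means(a, b):
    """Large-sample z approximation to a two-sample t-test (two-sided p)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def mann_whitney_u(a, b):
    """Mann-Whitney U with midranks for ties and a normal approximation."""
    pooled = sorted(a + b)
    n1, n2, n = len(a), len(b), len(a) + len(b)
    ranks, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                            # size of this tie group
        ranks[pooled[i]] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        tie_term += t**3 - t
        i = j
    u1 = sum(ranks[x] for x in a) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    # Variance of U, reduced to account for the heavy ties in Likert data.
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - mu) / sigma
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


def expand(counts):
    """Turn {response: count} into a flat list of responses."""
    return [resp for resp, k in counts.items() for _ in range(k)]


# Hypothetical 5-point item, 60 respondents per group.
group_a = expand({1: 5, 2: 10, 3: 20, 4: 15, 5: 10})
group_b = expand({1: 2, 2: 6, 3: 14, 4: 20, 5: 18})

z_stat, p_param = z_test_means(group_a, group_b)
u_stat, p_rank = mann_whitney_u(group_a, group_b)

print(f"difference in means: z = {z_stat:.2f}, p = {p_param:.4f}")
print(f"Mann-Whitney:        U = {u_stat:.0f}, p = {p_rank:.4f}")
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: parametric rejects = {p_param < alpha}, "
          f"rank-based rejects = {p_rank < alpha}")
```

With these made-up counts the two approaches agree in direction, and both p-values fall between .01 and .05, which is exactly the zone where recommendation 5 suggests being careful about strong claims.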



References:

Carifio, J. & Perla, R. (2007). Ten Common Misunderstandings, Misconceptions, Persistent Myths and Urban Legends about Likert Scales and Likert Response Formats and their Antidotes. Journal of Social Sciences, 2, 106-116. https://thescipub.com/PDF/jssp.2007.106.116.pdf

Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet assumptions underlying the analyses of variance and covariance. Review of Educational Research, 42, 237-288.

Jamieson, S. (2004). Likert scales: how to (ab)use them. Medical Education, 38, 1212-1218.

Lubke, G. H., & Muthén, B. O. (2004). Applying Multigroup Confirmatory Factor Models for Continuous Outcomes to Likert Scale Data Complicates Meaningful Group Comparisons. Structural Equation Modeling, 11, 514-534.

Adrian Olszewski

Clinical Trials Biostatistician at 2KMM (100% R-based CRO) • Frequentist (non-Bayesian) paradigm • NOT a Data Scientist (no ML/AI/Big data) • Against anti-car/-meat/-cash and C40 restrictions

2 months ago

It's all about a single question: do I believe (all of statistics is based on assumptions, even when we are sure we don't make or need any) that the items are equidistant? In other words - does 5 [very good] - 4 [good] = 3 [neutral] - 2 [bad] = 2 [bad] - 1 [horrible] = 1 unit? If this can be assured, then a meaningful unit can be defined, so arithmetic operations like addition and subtraction are defined. So the arithmetic mean (sum) is defined as well. Is the fractional mean meaningful? If so (e.g. it shows a trace of lower or higher responses), then there's absolutely no problem at all and the "religious, ideological" "no - because - no" is nothing but splitting hairs. Such an assumption can be easily verified by fitting an ordinal logistic regression, drawing the density (normal or logistic for the latent response) and putting the intercepts on it. If the sliced probabilities are equal - we can be almost sure that parametric analysis will give practically the same results. No magic here - just statistics. BUT if the differences between items are highly non-linear (as in many oncological and quality of life questionnaires), then the unit is not defined and arithmetic operations are totally meaningless. Here we need quantiles or rank-based methods.
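
One reading of the check described in this comment can be sketched without fitting software. In a cumulative logit model with no predictors, the maximum-likelihood cutpoints are simply the logits of the observed cumulative proportions, so the spacing between adjacent cutpoints on the latent logistic scale can be inspected directly. The response counts and function names below are hypothetical, the "gap ratio" is an ad hoc summary rather than an established statistic, and a real analysis with predictors would need a proper ordinal regression fit.

```python
import math


def cumulative_logit_thresholds(counts):
    """Cutpoints of an intercept-only cumulative logit model.

    counts[k] is the number of responses in category k, lowest to highest.
    Assumes the first and last categories are non-empty, so every cumulative
    proportion is strictly between 0 and 1.
    """
    total = sum(counts)
    cumulative, thresholds = 0, []
    for c in counts[:-1]:          # K categories give K - 1 cutpoints
        cumulative += c
        p = cumulative / total
        thresholds.append(math.log(p / (1 - p)))
    return thresholds


def threshold_gaps(thresholds):
    """Widths of the interior categories on the latent scale."""
    return [hi - lo for lo, hi in zip(thresholds, thresholds[1:])]


def gap_ratio(gaps):
    """Ad hoc summary: widest interior category divided by the narrowest."""
    return max(gaps) / min(gaps)


# Hypothetical 5-point items (counts for responses 1..5).
even_item = [10, 25, 30, 25, 10]
uneven_item = [50, 5, 5, 10, 30]

even_gaps = threshold_gaps(cumulative_logit_thresholds(even_item))
uneven_gaps = threshold_gaps(cumulative_logit_thresholds(uneven_item))

print("even item gaps:  ", [round(g, 2) for g in even_gaps],
      "ratio", round(gap_ratio(even_gaps), 2))
print("uneven item gaps:", [round(g, 2) for g in uneven_gaps],
      "ratio", round(gap_ratio(uneven_gaps), 2))
```

Only the interior categories get a width, because the lowest and highest categories are open-ended on the latent scale. A ratio near 1 is at least consistent with roughly equal spacing; a large ratio suggests the "unit" differs across the scale.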

Andrew Ekstrom

Adjunct torturer (I teach math and stats) and push boundaries that should never be.

2 months ago

For a lot of the survey analyses I've done, 4 or 5 means "Great job" which means "1". 1, 2 and 3 means "Need to improve" and thus "0". Makes logistic regression a viable method of analysis. Also looked at using multinomial logistic regression... or whatever it's called when you have more than 2 classes. But, doing that means the "Data Science" book I used to teach from is wrong. It claimed you can only analyse data with dichotomous outcomes with logistic regression... and logistic regression is a classification algorithm.... and that you need to use one library in Python for simple linear regression, another if you want to model quadratic terms (polynomial regression) and a third for multiple linear regression... etc...

More articles by Karen Grace-Martin