Can Likert Scale Data ever be Continuous?
Karen Grace-Martin
Statistical Consultant, Trainer, and Mentor for Researchers at The Analysis Factor
A very common question is whether it is legitimate to use Likert scale data in parametric statistical procedures that require interval data, such as Linear Regression, ANOVA, and Factor Analysis.
A typical Likert scale item has 5 to 11 points that indicate the degree of something. For example, it could measure agreement with a statement, such as 1=Strongly Disagree to 5=Strongly Agree. It can be a 1 to 5 scale, 0 to 10, etc.
The Debate
The issue is that, despite being labeled with numbers, a Likert scale item is in fact a set of ordered categories. The numerals attached to the categories aren't really quantitative: they describe the order of responses, but not quantity.
And yet, ultimately what the item is attempting to measure is amount of agreement. Shouldn’t that be treated as quantitative, if it’s really an amount?
One camp maintains that, because the responses are ordered categories, the intervals between the scale values cannot be assumed equal. So even if there is a true quantitative amount underlying the variable we're attempting to measure, we're actually measuring it only at discrete points, creating ordinal categories.
This camp claims that any mean, correlation, or other numerical operation applied to the categorical numerals is invalid. Only nonparametric statistics or other analyses designed for ordered data are appropriate for Likert item data (e.g., Jamieson, 2004).
The other camp maintains that yes, technically a Likert scale item is ordered, but even so, parametric tests can be practically valid in some situations.
Additionally, tests that treat the responses as truly numerical still tell you a lot about what's going on with the variable, and they're easier to run and easier to communicate.
For example, Lubke & Muthen (2004) found that it is possible to recover true parameter values in factor analysis with Likert item data, provided assumptions about skewness, minimum number of categories, etc., are met. Likewise, Glass et al. (1972) found that F tests in ANOVA can return accurate p-values on Likert items under certain conditions.
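To make that second point concrete, here is a rough simulation sketch, not taken from either study: under the null hypothesis of no group differences, it checks how often a one-way ANOVA F test and a Kruskal-Wallis test reject at the 5% level when the outcome is a symmetric 5-point Likert item. The group sizes, response probabilities, and number of simulations are arbitrary illustration choices.

```python
# Type I error check for F test vs. Kruskal-Wallis on a 5-point Likert item
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_per_group, n_groups, n_sims, alpha = 50, 3, 2000, 0.05
probs = [0.1, 0.2, 0.4, 0.2, 0.1]  # same response distribution in every group

rejections_f = rejections_kw = 0
for _ in range(n_sims):
    # All groups drawn from the same distribution, so the null is true
    groups = [rng.choice([1, 2, 3, 4, 5], size=n_per_group, p=probs)
              for _ in range(n_groups)]
    if stats.f_oneway(*groups).pvalue < alpha:
        rejections_f += 1
    if stats.kruskal(*groups).pvalue < alpha:
        rejections_kw += 1

print(f"F test type I error rate:         {rejections_f / n_sims:.3f}")
print(f"Kruskal-Wallis type I error rate: {rejections_kw / n_sims:.3f}")
# Both should land near 0.05 for a symmetric item like this one; heavy
# skew or very few categories can change the picture.
```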
Meanwhile, the debate rages on.
Recommendations
So, what is a researcher with integrity supposed to do? In the absence of a definitive answer, these are my recommendations:
References:
Carifio, J., & Perla, R. (2007). Ten Common Misunderstandings, Misconceptions, Persistent Myths and Urban Legends about Likert Scales and Likert Response Formats and their Antidotes. Journal of Social Sciences, 2, 106-116. https://thescipub.com/PDF/jssp.2007.106.116.pdf
Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet assumptions underlying the analyses of variance and covariance. Review of Educational Research, 42, 237-288.
Jamieson, S. (2004). Likert scales: how to (ab)use them. Medical Education, 38, 1212-1218.
Lubke, G. H., & Muthen, B. O. (2004). Applying Multigroup Confirmatory Factor Models for Continuous Outcomes to Likert Scale Data Complicates Meaningful Group Comparisons. Structural Equation Modeling, 11, 514-534.
Clinical Trials Biostatistician at 2KMM (100% R-based CRO) • Frequentist (non-Bayesian) paradigm • NOT a Data Scientist (no ML/AI/Big data) • Against anti-car/-meat/-cash and C40 restrictions
2 months ago
It's all about just a single question: do I believe (all of statistics is based on assumptions, even when we are sure we don't make/need any) that the items are equidistant? In other words - does 5 [very good] minus 4 [good] = 3 [neutral] - 2 [bad] = 2 [bad] - 1 [horrible] = 1 unit? If this can be assured, then a meaningful unit can be defined, so arithmetic operations like addition and subtraction are defined, and so the arithmetic mean (a sum) is defined as well. Is the fractional mean meaningful? If so (e.g. it shows a trace of lower or higher responses), then there's absolutely no problem at all and the "religious, ideological" "no-because-no" is nothing but splitting hairs. Such an assumption can easily be verified by fitting an ordinal logistic regression, drawing the density (normal or logistic, for the latent response), and putting the intercepts on it. If the sliced probabilities are equal, we can be almost sure that a parametric analysis will give practically the same results. No magic here - just statistics. BUT if the differences between items are highly non-linear (many oncological and quality-of-life questionnaires), then the unit is not defined and arithmetic operations are totally meaningless. Here we need quantiles or rank-based methods.
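Here is a minimal sketch of the check described in that comment, written in Python with statsmodels rather than R; the simulated data, predictor, and cutpoints are purely illustrative. It fits an ordinal (proportional odds) logistic regression, recovers the estimated cutpoints on the latent scale, and looks at whether the gaps between them are roughly equal.

```python
# Are the latent-scale cutpoints of a Likert item roughly equidistant?
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)                       # a predictor
latent = 1.0 * x + rng.logistic(size=n)      # latent response
cuts = [-2.0, -0.7, 0.7, 2.0]                # true, roughly equidistant cutpoints
item = np.digitize(latent, cuts) + 1         # observed 1..5 Likert responses
df = pd.DataFrame({"item": pd.Categorical(item, ordered=True), "x": x})

model = OrderedModel(df["item"], df[["x"]], distr="logit")
res = model.fit(method="bfgs", disp=False)

# Estimated thresholds (cutpoints) on the latent scale, dropping -inf/+inf
thresholds = model.transform_threshold_params(res.params)[1:-1]
print("Estimated cutpoints:", np.round(thresholds, 2))
print("Gaps between cutpoints:", np.round(np.diff(thresholds), 2))
# Roughly equal gaps: treating the item as interval is relatively low-risk.
# Clearly unequal gaps: prefer quantiles or rank-based/ordinal methods.
```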
Adjunct torturer (I teach math and stats) and push boundaries that should never be.
2 months ago
For a lot of the survey analyses I've done, 4 or 5 means "Great job", which becomes "1", while 1, 2 and 3 mean "Needs to improve" and thus "0". That makes logistic regression a viable method of analysis. I've also looked at using multinomial logistic regression... or whatever it's called when you have more than 2 classes. But doing that means the "Data Science" book I used to teach from is wrong. It claimed that you can only analyse data with dichotomous outcomes with logistic regression... that logistic regression is a classification algorithm... and that you need one library in Python for simple linear regression, another if you want to model quadratic terms (polynomial regression), and a third for multiple linear regression... etc...
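A short sketch of the dichotomize-then-model approach described in that comment; the data, cutpoints, and effect size below are invented for illustration, and statsmodels is assumed, though any logistic regression routine would do.

```python
# Collapse a 5-point Likert item to a binary outcome and fit a logit model
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)                                   # a predictor
latent = 0.8 * x + rng.logistic(size=n)                  # latent satisfaction
item = np.digitize(latent, [-2.0, -0.7, 0.7, 2.0]) + 1   # 1..5 ratings
great_job = (item >= 4).astype(int)                      # 4 or 5 -> 1, else 0

res = sm.Logit(great_job, sm.add_constant(x)).fit(disp=False)
print(res.summary())

# With more than two collapsed classes, sm.MNLogit (multinomial logistic
# regression) plays the role the comment describes.
```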