Number of Levels Effect in CBC: Is It Strong and Does It Persist for More than Four Levels?
Bryan Orme and Zachariah Hewitt, Sawtooth Software
If you are a conjoint analysis researcher, you've probably heard about the "Number of Levels Effect" (NOL effect). The NOL effect concerns quantitative attributes like price and speed: an attribute specified on more levels can capture more importance in conjoint analysis than the same attribute specified on fewer levels. Perhaps you've even worried that NOL could contaminate or even invalidate your conjoint analysis study. Is NOL a lion or a pussycat? Should you fear NOL, or sleep easy knowing that NOL is calm and manageable?
Executive Summary:
Researchers in the 1980s and 1990s found an NOL effect in conjoint analysis when specifying a quantitative attribute on four versus two levels (holding the range of variation constant). An attribute such as price specified on four levels rather than two levels would capture significantly more importance, as indicated by the difference in utility between best and worst levels. We report on two CBC studies on the NOL effect. Our research suggests that the NOL effect in conjoint analysis (Choice-Based Conjoint) is not as strong as previously thought. Additionally, we didn’t observe any increase in attribute importance moving from 11 to 21 levels of a quantitative attribute. When using quantitative attributes in conjoint analysis, we recommend using at least four levels to both reduce the NOL effect and represent these attributes in a way that is likely more realistic regarding the variation in prices seen in the real world.
Background on the Number of Levels Effect:
In the 1980s, academics (Currim et al. 1981) showed that by doubling the number of levels from two to four for a quantitative attribute, the importance of that attribute would increase significantly (holding the range of variation constant). For example, some respondents would see a two-level conjoint attribute with prices of $100 and $200. Other respondents would see price defined on four levels ($100, $130, $160, $200) and the difference in utility between $100 and $200 would be significantly larger for the second group compared to the first group. How much larger? In one case, Wittink reported for full-profile ratings-based conjoint that increasing the number of levels for an Energy Cost attribute for refrigerators from two levels to four levels (holding the range of variation in Energy Cost constant) more than doubled the importance of the attribute (Wittink et al. 1991).
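To make "importance" concrete: a common convention, and the one implied by the description above, takes each attribute's utility range (best level minus worst level) and expresses it as a percentage of the summed ranges across all attributes. The sketch below assumes that convention. The function name and the utilities are hypothetical and are not taken from any of the studies cited.

```python
def attribute_importances(utilities):
    """Return importances (summing to 100) from per-level utilities.

    utilities: dict mapping attribute name -> list of level utilities.
    """
    # Range of each attribute = utility of best level minus utility of worst level.
    ranges = {attr: max(levels) - min(levels) for attr, levels in utilities.items()}
    total = sum(ranges.values())
    if total == 0:
        raise ValueError("all attributes have a zero utility range")
    return {attr: 100.0 * r / total for attr, r in ranges.items()}


# Hypothetical utilities, for illustration only.
example = {
    "price": [1.0, -1.0],        # e.g. $100 and $200
    "brand": [0.5, 0.0, -0.5],
    "warranty": [0.5, -0.5],
}
print(attribute_importances(example))
```

Under this convention, an NOL effect would show up as a wider price range, and therefore a larger price share of the total, when the same $100 to $200 span is described with more levels.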
Choice-Based Conjoint (CBC, also known as DCM) is by far the most popular conjoint analysis method used today. Does the NOL effect also plague CBC? We're aware of only one other study that has investigated whether the NOL effect continues in CBC for quantitative attributes specified on more than four levels.
In 2000, Marco Hoogerbrugge conducted an experiment in which respondents got one of multiple CBC questionnaires that varied the number of levels of quantitative attributes (Hoogerbrugge 2000). He tested attributes with three, five, eight, and nine levels (holding their ranges constant). Hoogerbrugge found that the NOL effect seemed muted and potentially non-existent when comparing three or more levels to a larger number of levels such as five, eight or nine.
While Hoogerbrugge's findings were comforting, some designs for CBC studies today involve many more than nine levels of a quantitative attribute like price. Some researchers use alternative-specific designs (SKU-specific prices), "Summed Pricing" designs or perhaps even pivot designs where price is varied by -25%, -15%, +0%, +15%, +25% around the respondent's previously stated utility bill or stated budget. Such designs can lead to respondents seeing potentially dozens or hundreds of unique price levels. Linear, log-linear, or piecewise (spline) functions are then fit to the data. Does the NOL effect persist and potentially bias the importance of attributes specified on many more than nine levels?
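A minimal sketch of why pivot designs multiply the number of distinct prices, assuming each respondent's shown prices are simply the stated amount scaled by the five percentages above. The function name and the stated bills are hypothetical.

```python
def pivot_prices(stated_amount, offsets=(-0.25, -0.15, 0.0, 0.15, 0.25)):
    """Prices shown to one respondent, pivoted around a stated bill or budget."""
    return [round(stated_amount * (1 + offset), 2) for offset in offsets]


# Hypothetical stated monthly bills for three respondents.
stated_bills = [80.0, 120.0, 200.0]

# Pool the distinct prices that appear anywhere in the study.
all_prices = set()
for bill in stated_bills:
    all_prices.update(pivot_prices(bill))

print(pivot_prices(120.0))
print(len(all_prices), "distinct prices from", len(stated_bills), "respondents")
```

Each respondent sees only five prices, but the pooled data set contains close to five new prices per distinct stated amount, which is why a continuous function of price is usually fit rather than one parameter per level.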
Our Investigation:
We investigated whether the NOL effect for conjoint analysis (CBC) is strong and continues for numbers of levels for quantitative attributes beyond four. Between January and March 2023, we fielded two 5-7 minute surveys. We conducted one survey using the PureSpectrum[1] panel (n=1013, before cleaning) and the other using Amazon’s Mechanical Turk panel (n=1400, before cleaning). The subject matter was HDTV and electric vehicle preferences for MTurk and PureSpectrum respondents, respectively.
We fielded both surveys using Sawtooth Software’s Discover platform. The surveys included a MaxDiff section (six items, showing six questions each with three items), five different CBC questionnaires (each respondent was randomly assigned to one of the five CBC questionnaires), and a few other opinion and demographic questions. The CBC design featured eight CBC tasks on four attributes showing three concepts per task plus a None alternative. Below is an example CBC question for respondents who received the HDTV questionnaire.
For one quantitative attribute in each CBC survey (Price for HDTVs and Range for electric vehicles), we varied the number of levels across the experimental cells of our design. We used the quota control feature in Sawtooth Software's Discover platform to assign respondents randomly, and in a balanced way, to one of the five design cells of our experiment. For example (as shown below), respondents in Cell 1 saw price on just two levels ($500 and $1000), whereas respondents in Cell 5 saw price on 21 levels (again ranging from $500 to $1000).
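As a rough illustration of holding the range constant while changing the number of levels, the sketch below generates price levels between $500 and $1000. Equal spacing is an assumption made here for simplicity; the source states only the endpoints and the level counts, and the function name is hypothetical.

```python
def evenly_spaced_levels(low, high, n_levels):
    """Return n_levels equally spaced values from low to high, inclusive."""
    if n_levels < 2:
        raise ValueError("need at least two levels to span a range")
    step = (high - low) / (n_levels - 1)
    return [round(low + i * step, 2) for i in range(n_levels)]


# Level counts mentioned in the article; the spacing is assumed, not reported.
for n in (2, 3, 11, 21):
    levels = evenly_spaced_levels(500, 1000, n)
    print(n, "levels:", levels[0], "to", levels[-1], "step", levels[1] - levels[0])
```

Every cell spans the same $500 to $1000 range, so any difference in estimated importance between cells can be attributed to the number of levels rather than to the range.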
For both the HDTV and EV projects, we included a short MaxDiff exercise to catch random responders and to prime/warm up respondents to provide better CBC data on HDTV or EV purchase choices. The MaxDiff included six items regarding buying habits of technology products for the home (for the HDTV study) and for the EV study the MaxDiff items involved six features of electric vehicles. For both MaxDiff exercises, each item appeared three times per respondent. When respondents see each item three times across the MaxDiff tasks, we can use the fit statistic (RLH) from HB estimation to identify random responders with a high degree of accuracy (Orme and Chrzan 2022). 59% of MTurkers failed the RLH consistency check from MaxDiff and were hardly distinguishable from random responders. This seems higher than usual compared to other panels in the market research industry. Only 21% of PureSpectrum respondents failed the consistency test.
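For readers unfamiliar with RLH (root likelihood): it is the geometric mean of the probabilities that the fitted model assigns to the choices a respondent actually made, so a respondent answering at random tends to score near the chance level. The sketch below assumes the per-choice probabilities have already been computed from the HB utilities. The function names and the cutoff value are hypothetical; the actual cutoff procedure is described in Orme and Chrzan (2022) and is not reproduced here.

```python
import math


def rlh(chosen_probs):
    """Root likelihood: geometric mean of the predicted probabilities of the chosen options."""
    if not chosen_probs:
        raise ValueError("need at least one choice probability")
    mean_log = sum(math.log(p) for p in chosen_probs) / len(chosen_probs)
    return math.exp(mean_log)


def flag_random_responders(rlh_by_respondent, cutoff):
    """Return IDs of respondents whose RLH falls below the cutoff."""
    return [rid for rid, value in rlh_by_respondent.items() if value < cutoff]


# Hypothetical respondents: one fits well, one looks close to chance.
fits = {
    "r1": rlh([0.8, 0.7, 0.9, 0.75]),
    "r2": rlh([0.35, 0.30, 0.33, 0.36]),
}
print(flag_random_responders(fits, cutoff=0.4))  # cutoff is a placeholder value
```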
Sample sizes per cell (after cleaning the random-looking respondents) were:
Number of Levels Effect Findings:
The aim of this research was to measure what happens with the NOL effect as the number of levels increases beyond four. Academics in the 1980s showed multiple times that for traditional ratings-based conjoint the importance of a quantitative attribute increases significantly if it is defined on four levels versus two. But ratings-based conjoint is rarely used today. Does the NOL effect affect CBC (DCM) as strongly? And, does the NOL effect continue to increase if we define that same attribute (holding range constant) on 11 or 21 levels?
Recall that we randomly assigned respondents to five different design cells (five different CBC questionnaires). The only thing different about the CBC questionnaires was how many levels we specified for the Price or Range attribute.
For each of our five CBC design cells, we estimated the Price or Range attribute as a single term using a linear function (constrained negative). We employed three different approaches for quantifying the importance of attributes: 1) importances summing to 100% from aggregate MNL, 2) importances summing to 100% from HB MNL, 3) importances summing to 100% derived from sensitivity analysis using a “what-if” market simulator built from the HB MNL utilities. Averaging across the three methods for estimating the effect of Price (for HDTVs) or Range (for EVs) on choices, we obtained the following Importance scores:
When charted, these importance scores by number of levels (of the experimentally manipulated quantitative attribute) look like the following.
In an omnibus ANOVA followed by t-tests, the only significant difference we found in the Mechanical Turk results above is between the 3-level treatment of price and the 11-level treatment (p<0.03). Half of the differences in the PureSpectrum results are significant at the 95% confidence level.
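The third importance measure described earlier, sensitivity analysis in a "what-if" simulator, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' computation: it uses a single set of utilities and the logit share rule, whereas an HB-based simulator would typically compute shares per respondent and average them. All names and numbers are hypothetical.

```python
import math


def logit_shares(total_utilities):
    """Share of preference for each alternative under the logit rule."""
    exp_u = [math.exp(u) for u in total_utilities]
    denom = sum(exp_u)
    return [e / denom for e in exp_u]


def sensitivity_importances(partworths, base_levels, competitor_utilities):
    """Importances (summing to 100) from share swings of one test product.

    partworths: attribute -> list of level utilities.
    base_levels: attribute -> level index of the test product in the base case.
    competitor_utilities: total utilities of fixed alternatives (None included).
    """
    swings = {}
    for attr, utils in partworths.items():
        test_shares = []
        for level in range(len(utils)):
            # Change one attribute of the test product, hold the rest at base case.
            profile = {**base_levels, attr: level}
            u_test = sum(partworths[a][idx] for a, idx in profile.items())
            test_shares.append(logit_shares([u_test] + competitor_utilities)[0])
        swings[attr] = max(test_shares) - min(test_shares)
    total = sum(swings.values())
    return {attr: 100.0 * s / total for attr, s in swings.items()}


# Hypothetical utilities; competitors are one rival product and a None option.
pw = {"price": [1.0, 0.0, -1.0], "brand": [0.3, -0.3]}
print(sensitivity_importances(pw, {"price": 1, "brand": 0}, [0.0, 0.0]))
```

Because this measure is expressed in share-of-choice terms, it reflects the effect of an attribute on predicted choices rather than on raw utility differences.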
Our MTurk findings (for HDTVs) seem to echo the earlier findings from academics and from Hoogerbrugge. Increasing the number of levels from two to four for a quantitative attribute seems to lead to increased importance in terms of effect on choice in conjoint questionnaires but does not continue to increase much beyond the use of four levels. However, our results for the electric vehicle study fielded using PureSpectrum sample suggest that the NOL effect is modestly seen only when more than four levels are included in an attribute, but not for the range of variation between two and four levels.
The NOL effect seems, frustratingly, to vary from project to project. Perhaps the NOL effect also depends on the quality of the panel source (PureSpectrum being higher quality than Mechanical Turk). Importantly, in neither of our projects did the number of levels effect come close to the magnitude that Wittink and others reported decades ago for ratings-based full-profile conjoint. And, for neither project did the importance of the test attribute increase when moving from 11 to 21 levels.
Wrapping It Up
As much as we were hoping to shine a clear light on what to expect for CBC projects with the NOL effect, our findings were not as clean as we had hoped. That said, the magnitude of the NOL effect for CBC was not nearly as worrisome as was reported by academics in the 1980s and 1990s for traditional ratings-based conjoint.
After conducting the HDTV study with MTurk respondents, we were encouraged and thought the story was going to fall in line with Hoogerbrugge’s conclusions from 2000. But the electric vehicle study fielded with the higher-quality PureSpectrum panel surprised us, with a different and more favorably muted pattern manifested by the NOL effect. What is comforting regarding both of our studies is that manipulating the number of levels from two all the way up to 21 levels for a quantitative attribute led to relatively modest changes in the importance or impact of that attribute on product choice for CBC experiments. So, we conclude with the recommendation to specify quantitative attributes on three or more levels, which is likely to describe real-world market conditions more realistically than just two levels of a quantitative attribute, and to not fret so much about the number of levels effect for CBC questionnaires.
Limitations of our research are that we examined just two CBC studies, and both studies employed three concepts per task designed with modest level overlap, plus a standard None concept.
For a more complete version of this article, including more details about computations, please visit: https://sawtoothsoftware.com/resources/technical-papers/number-of-levels-effect-in-cbc-is-it-strong-and-does-it-persist-for-more-than-four-levels
References:
Currim, I. S., C. B. Weinberg, and D. R. Wittink. 1981. The design of subscription programs for a performing arts series. Journal of Consumer Research 8:67–75.
Hoogerbrugge, Marco. 2000. Practical Issues Concerning the Number-of-Levels Effect. In Sawtooth Software Conference Proceedings, pp. 113-123. Sequim, WA: Sawtooth Software.
Orme, Bryan, and Keith Chrzan. 2021. Becoming an Expert in Conjoint Analysis. Provo, UT: Sawtooth Software.
Orme, Bryan, and Keith Chrzan. 2022. Real-Time Detection of Random Respondents in MaxDiff. Accessed at: https://sawtoothsoftware.com/resources/technical-papers/categories/maxdiff-scaling
Wittink, Dick, Joel Huber, John Fiedler, and Richard Miller. 1991. The Magnitude of and an Explanation/Solution for the Number of Levels Effect in Conjoint Analysis. Working paper, as cited by Wittink in the 1992 Sawtooth Software Conference Proceedings.
[1] Many thanks to PureSpectrum for donating the sample for this research-on-research study.