How Well Does the Net Promoter Score Measure Likelihood-to-Discourage?

Is not recommending the same as discouraging, that is, recommending that others not use or purchase from a brand?

While a low likelihood of recommending seems like it would correlate with actively recommending against, we suspect there may be a difference but are unsure how large it is.

In our previous article, we looked to the published literature and found little research assessing how well a low likelihood-to-recommend (the Net Promoter Score item) measures active discouragement.

The studies we found looked at Positive Word of Mouth (PWOM) and negative word of mouth (NWOM). Their key findings were:

  • Even though PWOM appears to be more prevalent and influential than NWOM, NWOM is itself influential and should be measured. Surveying only existing customers will almost surely understate the percentage of people likely to discourage or recommend against, regardless of what you ask (e.g., the NPS or a satisfaction question).
  • The NPS item is a strong predictor of PWOM and significantly predicts the incidence of NWOM, but it might not be the best predictor of NWOM.
  • A nonstandard bipolar scale with endpoints of “Extremely likely to recommend against” and “Extremely likely to recommend” had about the same accuracy for predicting PWOM as the unipolar NPS item (“Not at all likely to recommend” to “Extremely likely to recommend”), but it more accurately predicted NWOM.

Despite the bipolar scale’s superior prediction of NWOM, one of the desirable features of the NPS is its public benchmarks. Good benchmarks are not developed overnight; in most cases, the process takes years (and even then, there is no guarantee of broad adoption).

Rather than replacing the NPS item in UX and CX research, we were curious about the statistical relationship between the standard likelihood-to-recommend item and a separate item designed to measure the likelihood of recommending against—in other words, the likelihood of discouraging friends and colleagues from engaging with a brand or product.


Measuring Likelihood-to-Discourage

We conduct periodic SUPR-Q® surveys to take the temperature of the user experience of websites and mobile apps for key companies in various sectors. In August 2024, we collected data from 324 participants on their experience with one of the social media platforms they had used in the past year (Facebook, Instagram, LinkedIn, Snapchat, TikTok, or X).

As part of that survey, respondents indicated their likelihood to recommend that platform on the web and/or its mobile app (depending on their past experience) with a standard item format (Figure 1) and their likelihood to discourage others from using the platform in general with a custom item (Figure 2).


Figure 1: The standard likelihood-to-recommend item.


Figure 2: The custom likelihood-to-discourage item.

Comparison of Likelihood-to-Recommend and Likelihood-to-Discourage

We used several methods to gain insight into the relationship between ratings of likelihood-to-recommend and likelihood-to-discourage.

Correlation

Table 1 shows the correlations among the measurements of likelihood-to-recommend the websites in the social media survey (LTRWeb), likelihood-to-recommend the social media mobile apps (LTRApp), and likelihood-to-discourage others from using the social media platforms (LTDiscourage).


Table 1: Correlations among LTRWeb, LTRApp, and LTDiscourage.

The coefficient of determination (R²) is the square of the correlation. It can be interpreted in different ways: for example, as the percentage of shared variance between two variables, or as the extent to which variation in one variable accounts for variation in the other.
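As a minimal illustration of this computation (using made-up ratings, not our survey data), the following Python sketch computes a Pearson correlation and squares it to get R²:

```python
import numpy as np

# Hypothetical 0-10 ratings for illustration only (not the survey data).
ltr = np.array([9, 2, 7, 10, 4, 8, 6, 10, 3, 9])  # likelihood-to-recommend
ltd = np.array([1, 8, 3, 0, 7, 2, 5, 1, 9, 2])    # likelihood-to-discourage

r = np.corrcoef(ltr, ltd)[0, 1]  # Pearson correlation between the two ratings
r2 = r**2                        # coefficient of determination (shared variance)

print(f"r = {r:.2f}, R^2 = {r2:.2f}")
# For example, a correlation of -.52 squares to an R^2 of about .27,
# i.e., roughly a quarter of the variance accounted for.
```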

All three correlations were statistically significant (p < .0001). LTRWeb accounted for just over a quarter of the variation in LTDiscourage; LTRApp accounted for just under a third. Although these are very different metrics from the percentage of NWOM, the magnitudes are similar to the 25% of NWOM against currently used products reported in previous research.

Guidelines differ on how high a correlation must be to indicate that two variables are measuring the same thing. Some suggest the appropriate value is ±0.90 (81% shared variance), and others recommend ±0.80 (64% shared variance). To support the claim that two variables aren’t just strongly related but essentially measuring the same thing, the stringent benchmark of ±0.90 or even ±0.95 (90% shared variance) seems reasonable. For example, the ten-item System Usability Scale (SUS) and its single ease of use item correlate at .95, with other research demonstrating that they measure the same underlying construct of perceived ease of use. Although significant, the correlations between LTR and LTDiscourage in Table 1 are far from indicating measurement of the same construct.

Scatterplots

Scatterplots are visual representations of correlations. Figures 3 and 4 show the scatterplots between LTR and LTDiscourage for websites and mobile apps. The bounding boxes show the NPS designations of Detractors (LTR ratings from 0 to 6), Passives (LTR ratings from 7 to 8), and Promoters (LTR ratings from 9 to 10).


Figure 3: Scatterplot of likelihood-to-recommend and likelihood-to-discourage for websites.


Figure 4: Scatterplot of likelihood-to-recommend and likelihood-to-discourage for mobile apps.

The magnitudes of the correlations depicted in Figures 3 and 4 are similar, as are the distributions of points in the scatterplots. The general trend, consistent with negative correlations, is that as likelihood-to-recommend increases, likelihood-to-discourage decreases. But as previously discussed, the correlations are not perfect (or even very high), so the two items are not measuring exactly the same thing.
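For readers who want to produce a similar view of their own data, here is a sketch using matplotlib (with simulated ratings standing in for survey responses) that shades the NPS designations behind the points:

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated, negatively related 0-10 ratings (illustration only).
rng = np.random.default_rng(1)
ltr = rng.integers(0, 11, 300)                             # likelihood-to-recommend
ltd = np.clip(10 - ltr + rng.integers(-3, 4, 300), 0, 10)  # likelihood-to-discourage

fig, ax = plt.subplots()
# Shade the NPS designations: Detractors (0-6), Passives (7-8), Promoters (9-10).
ax.axvspan(-0.5, 6.5, alpha=0.1, color="red", label="Detractors")
ax.axvspan(6.5, 8.5, alpha=0.1, color="gray", label="Passives")
ax.axvspan(8.5, 10.5, alpha=0.1, color="green", label="Promoters")
ax.scatter(ltr, ltd, alpha=0.3)
ax.set_xlabel("Likelihood-to-Recommend (0-10)")
ax.set_ylabel("Likelihood-to-Discourage (0-10)")
ax.legend(loc="upper right")
plt.show()
```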

Levels of Discouragement and Recommendation

For Figures 5 and 6, we assigned the two lowest LTDiscourage ratings to one category of extreme intensity (Extremely Low Likelihood-to-Discourage), the two highest ratings to another category of extreme intensity (Extremely High Likelihood-to-Discourage), and the intermediate ratings to a category of moderate intensity (Moderate Likelihood-to-Discourage). We then crossed those categories with the NPS categories of Detractors, Passives, and Promoters.


Figure 5: Levels of likelihood-to-discourage crossed with NPS categories for websites.


Figure 6: Levels of likelihood-to-discourage crossed with NPS categories for mobile apps.

Again, the results were very similar for web and mobile app ratings. Promoters were much more likely than Detractors (or even Passives) to have an extremely low likelihood-to-discourage. On the other side of the scale, Detractors were responsible for the vast majority (82%) of the extremely high likelihood-to-discourage ratings (and accounted for 64% of the more moderate ratings). This result is consistent with our previous finding that, when given the opportunity, some respondents classified as Detractors did not make negative comments about the brand they rated, but 90% of the negative comments captured in the study were made by Detractors.
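This kind of breakdown is straightforward to reproduce. Here is a minimal pandas sketch, again with hypothetical ratings rather than our survey data:

```python
import pandas as pd

# Hypothetical 0-10 ratings for illustration only (not the survey data).
df = pd.DataFrame({
    "ltr": [9, 2, 7, 10, 4, 8, 6, 10, 3, 9],  # likelihood-to-recommend
    "ltd": [1, 8, 3, 0, 7, 2, 5, 1, 9, 2],    # likelihood-to-discourage
})

# NPS designations: Detractors (0-6), Passives (7-8), Promoters (9-10).
df["nps_group"] = pd.cut(df["ltr"], bins=[-1, 6, 8, 10],
                         labels=["Detractor", "Passive", "Promoter"])

# Discouragement intensity: the two lowest ratings (0-1), the intermediate
# ratings (2-8), and the two highest ratings (9-10), as described above.
df["ltd_group"] = pd.cut(df["ltd"], bins=[-1, 1, 8, 10],
                         labels=["Extremely Low", "Moderate", "Extremely High"])

# Cross the two categorizations to see which NPS group the discouragers come from.
print(pd.crosstab(df["ltd_group"], df["nps_group"]))
```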


Summary and Discussion

An analysis of 324 social media users’ likelihood to recommend and likelihood to discourage the use of social media platforms found:

Likelihood-to-recommend measures likelihood-to-discourage, but not perfectly. Likelihood-to-recommend accounts for about a quarter to a third of the variation in likelihood-to-discourage. That is significant, but it leaves about two-thirds to three-quarters of the variation in likelihood-to-discourage unaccounted for. This suggests that NOT recommending is not a perfect, or even a strong, substitute for measuring the intent to recommend against a brand or discourage others from using it.

Detractors account for 80%+ of discouragers. Not all Detractors discourage. But almost all those who are extremely likely to discourage are Detractors (very unlikely to recommend). This is similar to our analysis of negative comments where not all Detractors make negative comments, but 90% of negative comments come from Detractors.

NPS Promoters are more likely than Detractors to have low ratings of likelihood-to-discourage. Across ratings for web and mobile app usage, the largest share of very low discouragement ratings comes from Promoters (42–47%), with the remainder split between Passives (27–29%) and Detractors (27–29%).

NPS Detractors are much more likely than Promoters to have high ratings of likelihood-to-discourage. Across ratings for web and mobile app usage, the vast majority of high discouragement ratings are from Detractors (82%).

Discouragement might not be exactly the same as recommending against. We used a discouragement scale to measure this behavioral intention because it includes more active wording than “recommending against,” and we found it easier to ask. But there could be a difference (likely small) between responses to measures using “discouragement” wording versus the “recommending against” wording used in some previous research.

Bottom line: If researchers can get ratings of only one behavioral intention in contexts where recommendation is a plausible user behavior, it should be likelihood-to-recommend. For a clearer picture of the full range of behavioral intention, there appears to be value in also collecting ratings of likelihood-to-discourage.

