String Theory: Variable Measurement
In attempting to measure the impossible, we discover some useful lessons about the way we choose to measure everything, writes Newgate pollster Jim Reed.
N.B. This article first appeared in the April-May 2018 edition of the Australian Market and Social Research Society’s publication Research News.
How long is a piece of string? This is the question I recently asked respondents in multiple online samples (n=500), representative of the Australian population.
I did so because I wanted to answer the following question: If we remove the likelihood of an answer being rooted in real-world views and behaviours, can we get a better idea of how the different response options we provide affect people’s answers?
Either there would be no effect – in which case we would have solved an age-old riddle – or the consistently asked hypothetical question would act as a control against which to judge different response formats.
We learnt the following lessons...
Ask a silly question and you will get a silly answer
Most of our questions required an answer in order to complete the survey. In such circumstances, one might expect that dropout rates would rise, but they did not. Our respondents wanted to help us, finish the survey and get their reward, so they all dutifully ticked a box and moved on.
This propensity to answer nonsensical questions is a great danger to survey results and their reporting. If you ask a silly question – however poorly conceived or designed – you will get an answer.
When we included a ‘don’t know’ category, 82 per cent sensibly opted for it. This reflects how many people would not answer if they were not compelled to, but having such a high non-response just creates different headaches for the researcher.
The more open and numerous the options, the more varied the responses
Most of our questions included a list of pre-coded responses, but in some instances we provided open-ended comment boxes. The results from the two formats varied immensely.
Our shortest scale gave respondents the option of choosing a length of string ranging from 5cm to 25cm in 5cm increments, and all were able to provide an answer (averaging 17cm). But when respondents were given free rein with an open-ended response, only 55 per cent of answers fell within this range. Instead they nominated many different and much longer lengths. Even after the most extreme outliers were excluded, the average was pulled upwards to 114cm.
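As an illustration only, here is a minimal sketch, in Python, of one common way to average open-ended length answers once extreme outliers are excluded. The interquartile-range fence and the sample answers are assumptions made for the example, not figures from this study.

```python
# Illustrative only: average open-ended length answers after dropping extreme
# outliers. The 1.5 x IQR fence and the answers below are assumptions for the
# sake of the example, not details of how this study excluded outliers.
import statistics

def trimmed_mean_cm(responses_cm, k=1.5):
    """Average responses after excluding values outside an interquartile-range fence."""
    q1, _, q3 = statistics.quantiles(responses_cm, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept = [x for x in responses_cm if low <= x <= high]
    return statistics.mean(kept)

# Hypothetical answers in centimetres, including the joke-length outliers
# a question like this tends to attract
answers = [10, 15, 20, 30, 50, 100, 120, 250, 10_000]
print(round(trimmed_mean_cm(answers), 1))
```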
One might conclude that open-ended responses are better because they accommodate every possible answer, but they also attracted the humorous outliers one might expect of a question of this sort, and making every survey question open-ended is simply impractical.
One compromise is the addition of an ‘other’ or ‘longer than this’ response, both of which were trialled in our study. When we included these options (separately and combined), around 80 per cent chose them, but the accompanying pre-coded scales still influenced respondents: in one instance, the average fell from the 114cm of the purely open-ended format to 43cm.
The length of the scale and the scale of the lengths mattered
Within our many pre-coded scales, we provided different length ranges, different numbers of response categories and increments. It became clear that increasing the possible range increased the range of responses.
We also compared responses to the same range of lengths split over a different number of increments. Here, both the average length and the frequencies of specific lengths differed: more options bred greater variety in responses.
But these were not proportional shifts. Fewer people chose the half-unit increments provided between the whole numbers in the expanded scale, and fewer went to the longest string lengths in the scale with a broader range. Respondents tended to round up or down (something found in the open-ended responses too), and began to shy away from the extremes when more options were given – something I term the ‘reluctant 10-out-of-10 effect’.
Ordering can have an effect
When measuring quantity or degrees of opinion, it’s common and logical to adopt ordinal scales. That was generally the case here, but we included one randomly-ordered version of our five-point scale.
Intriguingly, the proportions choosing the extremes – lengths of 5cm and 25cm – were almost identical regardless of ordering, suggesting they were sought out by those with a preconceived answer to this question. But significantly fewer people opted for the central 15cm category in the randomly-ordered version (10% vs 23%).
This shows that some respondents gravitate towards a central (or neutral) category in ordered scales – I call this the ‘happy medium’ effect.
Indeed, when we reversed the ordering of the scale, so that the lengths were presented in descending order, the tendency to opt for the middling lengths returned, but in this case more people chose 20cm than 10cm (the ‘shoulder’ lengths), most likely simply because of the order in which they were presented.
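For readers who want to check whether a gap like the 23 per cent versus 10 per cent split above is meaningful, here is a minimal sketch of a standard two-proportion z-test, assuming two independent samples of n=500 (the sample size used here); the counts are simply those percentages applied to 500.

```python
# A minimal sketch of a two-proportion z-test, assuming two independent
# samples of n=500 each. The counts below are the reported percentages
# (23% vs 10% choosing the middle category) applied to 500 respondents.
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Return the z statistic for the difference between two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 23% of 500 in the ordered version vs 10% of 500 in the randomly-ordered version
z = two_proportion_z(115, 500, 50, 500)
print(round(z, 2))  # well beyond the ~1.96 threshold for p < 0.05
```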
Units are used as a guide to the answer
In a second iteration of the open-ended response option, we asked respondents to give their answer in metres, rather than centimetres. Average string length jumped from 114cm to 411cm.
Again, this was not a simple proportional shift – had respondents merely converted units, the numbers given in metres would have been one-hundredth of those given in centimetres and the average length would have been unchanged – but it clearly shows that in framing the possible responses, we guide respondents towards an answer.
The responses you provide do influence the answer
Although we asked our question consistently across all samples, responses varied considerably. Despite the tongue-in-cheek nature of this exercise, the findings have real-world implications, particularly for opinion and recall questions, which are most akin to the hypothetical question asked here.
When we ask questions they are never perfectly objective; there are strings attached. Researchers tend to dwell on question wording when attempting to limit bias, but the response options are equally critical. This is not a new consideration, of course, but reinforces that it is an important one.
What can we do about it?
Researchers can never circumvent these effects entirely, because we must always choose one method and one frame of reference. Just as the observer in quantum theory changes the observed, the framing of our questions and response categories alters the result.
We must instead choose a response format that best suits the situation, makes sense to respondents and is inclusive of likely answers – balanced against the need to ensure it’s easy to answer the question, analyse the data and present the findings.
Testing responses in real pilot interviews (not just ‘soft launching’) or in a qualitative study is the best way to determine the right response categories and options. Once you have the right responses, stick with them if you are tracking the same questions over time or comparing them against other studies.
Perhaps the most important observation is that, no matter how inane and ill-conceived your question, and regardless of how inappropriate your response categories are, a large proportion – perhaps all – of survey respondents will try to give you an answer if compelled to do so.
Therefore, we must consider carefully not only how to ask a question, but also if we should ask it in the first place.
So, how long is a piece of string?
I can neither average out all the responses we got and give you an answer, nor pick what I think is the best rating scale, because the answer is embedded in the design. Instead, I think I’ll agree with the 82 per cent who answered ‘don’t know’ when we gave them that freedom.
It’s heartening that the vast majority of our industry’s respondents are able and willing to tell the truth accurately – we just need to give them that option.
Jim Reed, Senior Director, Newgate Research