Building a Website Clutter Questionnaire

Clutter, clutter everywhere, nor any questionnaire to measure.

In a previous article, we described our search for a measure of perceived clutter in academic literature and web posts, but we were left unquenched.

We found that the everyday conception of clutter includes two components that suggest different decluttering strategies: the extent to which needed objects (e.g., tools in a toolbox) are disorganized and/or the presence of unnecessary objects (e.g., a candy wrapper in a toolbox). The first situation requires reorganizing the needed objects, while the second requires discarding unnecessary objects.

The literature in UI design has mostly focused on objectively measuring information displayed on screens (e.g., local density, grouping, feature congestion). We found a published questionnaire for subjective clutter in advanced cockpit displays, but we did not find any standardized questionnaires developed for the measurement of perceived clutter on websites.

So, we decided to develop our own.

The development process for a standardized questionnaire has two major research activities: exploratory and confirmatory. In this article, we focus on the exploratory research.

The Initial Set of Clutter Items

Consistent with the literature we reviewed, we hypothesized that at least two factors might contribute to the perceived clutter of websites: content clutter and design clutter.

We expected content clutter to be driven by the presence of irrelevant ads and videos that occupy a considerable percentage of display space and have negative emotional consequences (e.g., they’re annoying). Considering the components of the everyday conception of clutter, these would be the candy wrappers in the toolbox—items that website users would prefer to discard, perhaps by using ad blockers.

Our conception of design clutter is that it is driven by issues with the presentation of potentially relevant content that make it difficult to consume (e.g., insufficient white space, too much text, illogical layout). Analogous to the everyday definition of clutter, this content is similar to a hammer in the toolbox—it should be retained but needs reorganization.

The first iteration of the perceived website clutter (PWC) questionnaire included one item for overall clutter, six for content clutter, and ten for design clutter (see Figure 1 for the entire questionnaire used in our surveys). The overall clutter item used an 11-point agreement scale (“Overall, I thought the website was too cluttered,” 0: Strongly disagree, 10: Strongly agree). The content and design clutter items used five-point agreement scales (1: Strongly disagree, 5: Strongly agree). The short labels and item wording for the content and design clutter items were:

  • Content_ALot: These types of content made up a lot of the clutter.
  • Content_TooMany: There were too many ads or videos.
  • Content_Space: These types of content took up too much space.
  • Content_Distracting: These types of content were distracting.
  • Content_Irrelevant: These types of content were irrelevant.
  • Content_Annoying: These types of content were annoying.
  • Design_HardToRead: The text was hard to read.
  • Design_SmallFont: The font size was too small.
  • Design_DistractingColors: The colors were distracting.
  • Design_UnpleasantLayout: The layout was unpleasant.
  • Design_WhiteSpace: There wasn’t enough white space.
  • Design_TooMuchText: There was too much text.
  • Design_NotLogical: The content was not logically organized.
  • Design_Disorganized: The layout was disorganized.
  • Design_VisualNoise: There was too much visual noise.
  • Design_HardToStart: It was hard for me to find what I needed to get started.


Figure 1: The first iteration of the perceived website clutter (PWC) questionnaire.
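
To make scoring concrete, here is a minimal Python sketch of how responses to the first-iteration items might be scored. It assumes responses sit in a pandas DataFrame whose columns use the short item labels listed above (the column names and layout are our assumptions, not part of the survey instrument): subscale scores are the means of the 1–5 agreement items, and coefficient alpha is computed from the item and total-score variances.

import pandas as pd

# Hypothetical column names matching the short item labels above.
CONTENT_ITEMS = ["Content_ALot", "Content_TooMany", "Content_Space",
                 "Content_Distracting", "Content_Irrelevant", "Content_Annoying"]
DESIGN_ITEMS = ["Design_HardToRead", "Design_SmallFont", "Design_DistractingColors",
                "Design_UnpleasantLayout", "Design_WhiteSpace", "Design_TooMuchText",
                "Design_NotLogical", "Design_Disorganized", "Design_VisualNoise",
                "Design_HardToStart"]

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def score_pwc(responses: pd.DataFrame) -> pd.DataFrame:
    """Add content and design clutter subscale scores (means of the 1-5 items) to each response."""
    scored = responses.copy()
    scored["ContentClutter"] = responses[CONTENT_ITEMS].mean(axis=1)
    scored["DesignClutter"] = responses[DESIGN_ITEMS].mean(axis=1)
    return scored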

The SUPR-Q Surveys

The data for these analyses came from SUPR-Q data collected in eight retrospective consumer surveys conducted between April 2022 and January 2023. Each survey targeted a specific sector, and in total, we collected 2,761 responses to questions about the UX of 57 websites. The sample had roughly equal representation of gender and age (split at 35 years old). Table 1 shows the participant gender and age for each survey, with sector names linking to articles with more information about each survey (including the websites selected for the sectors). Participants were members of an online consumer panel, all from the United States.

Table 1: Participant gender and age for each survey.

The eight surveys shown in Table 1 were retrospective studies of the UX of websites in their respective sectors. Some survey content differed according to the nature of the sector being investigated, but all surveys included the SUPR-Q, basic demographic items, and the first iteration of the perceived clutter questionnaire. For each survey, we used screening questions to identify respondents who had used one or more of the target websites within the past year, then invited those respondents to rate one website with which they had prior experience. On average, respondents completed the surveys in 10–15 minutes (there was no time limit).

Exploratory Analyses

To support independent exploratory and confirmatory analyses, we split the sample into two datasets, alternately assigning respondents (within sector and website, in the order they completed the surveys) to an exploratory (n = 1,381) or a confirmatory (n = 1,380) sample. Even after the split, these sample sizes far exceeded the recommended minimums for exploratory factor analysis and multiple regression (and for future confirmatory factor analysis and structural equation modeling).
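
The alternating split can be sketched in Python as follows, assuming one row per respondent and hypothetical sector, website, and completion-order columns (the column names are illustrative):

import pandas as pd

def split_samples(responses: pd.DataFrame):
    """Alternately assign respondents to exploratory and confirmatory samples
    within each sector/website, in the order they completed the survey."""
    ordered = responses.sort_values(["sector", "website", "completion_order"]).reset_index(drop=True)
    position = ordered.groupby(["sector", "website"]).cumcount()
    exploratory = ordered[position % 2 == 0]
    confirmatory = ordered[position % 2 == 1]
    return exploratory, confirmatory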

Factor Analysis

A parallel analysis of the clutter items indicated the retention of two factors. Table 2 shows the alignment of items (identified by item code) with factors from a maximum likelihood factor analysis with Promax rotation (KMO = 0.95). Content and design items aligned as expected with the Content and Design factors. The reliabilities (coefficient alpha) were acceptably high (0.95 for both the Content and Design factors; 0.96 for their combination).

Table 2: Alignment of content and design items with the two factors (loadings from the maximum likelihood factor analysis with Promax rotation).
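
A rough Python sketch of this sequence follows, using the factor_analyzer package for the maximum likelihood extraction, Promax rotation, and KMO statistic (the article does not specify which software was used, so treat this as one possible implementation):

import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer, calculate_kmo

def parallel_analysis(items: pd.DataFrame, n_iterations: int = 100, seed: int = 1) -> int:
    """Horn's parallel analysis: retain leading factors whose observed eigenvalues
    exceed the mean eigenvalues of random data with the same dimensions."""
    rng = np.random.default_rng(seed)
    n, k = items.shape
    observed = np.sort(np.linalg.eigvalsh(items.corr().values))[::-1]
    random_eigs = np.zeros((n_iterations, k))
    for i in range(n_iterations):
        random_corr = np.corrcoef(rng.standard_normal((n, k)), rowvar=False)
        random_eigs[i] = np.sort(np.linalg.eigvalsh(random_corr))[::-1]
    thresholds = random_eigs.mean(axis=0)
    n_factors = 0
    for obs, threshold in zip(observed, thresholds):
        if obs <= threshold:
            break
        n_factors += 1
    return n_factors

def two_factor_efa(items: pd.DataFrame) -> pd.DataFrame:
    """Maximum likelihood factor analysis with Promax rotation; returns the loading matrix."""
    _, kmo_model = calculate_kmo(items)
    print(f"KMO = {kmo_model:.2f}")
    fa = FactorAnalyzer(n_factors=2, rotation="promax", method="ml")
    fa.fit(items)
    return pd.DataFrame(fa.loadings_, index=items.columns, columns=["Factor1", "Factor2"])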

Item Analysis

Item loadings were especially high for content items due to high item correlations, which is good for scale reliability but indicates an opportunity to improve scale efficiency by removing some items. The situation was similar but not quite as extreme for the design items.

A common strategy for deleting items is to identify those with lower factor loadings. For example, for the Content factor, the lowest item loading was for Content_Irrelevant (.774), and for the Design factor, the lowest item loading was for Design_VisualNoise (.664). However, because we collected a measure of overall perceived clutter (Overall Clutter), we were able to use an alternative strategy of backward elimination regression analysis to select the subset of content and design items that best accounted for variation in Overall Clutter.
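
Backward elimination can be sketched with statsmodels by repeatedly dropping the least significant predictor until all remaining predictors are significant. This is a generic version of the approach; the key driver analysis described below also removed items with negative beta weights, which this sketch omits, and the column names (e.g., Overall_Clutter) are assumed rather than taken from the survey export.

import pandas as pd
import statsmodels.api as sm

def backward_eliminate(data: pd.DataFrame, outcome: str, predictors: list, alpha: float = 0.05) -> list:
    """Fit OLS with all predictors, drop the one with the highest p-value,
    and refit until every remaining predictor has p < alpha."""
    remaining = list(predictors)
    while remaining:
        X = sm.add_constant(data[remaining])
        model = sm.OLS(data[outcome], X).fit()
        p_values = model.pvalues.drop("const")
        worst = p_values.idxmax()
        if p_values[worst] < alpha:
            break
        remaining.remove(worst)
    return remaining

# Example usage with the hypothetical column names from the earlier sketches:
# retained_content = backward_eliminate(exploratory, "Overall_Clutter", CONTENT_ITEMS)
# retained_design = backward_eliminate(exploratory, "Overall_Clutter", DESIGN_ITEMS)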

Item Retention

Backward regression (key driver analysis) of the six content items retained three: Content_ALot, Content_Space, and Content_Distracting, accounting for 35.5% of the variation (adjusted R²) in Overall Clutter. Backward regression of the ten design items, plus deletion of items with negative beta weights, retained three: Design_UnpleasantLayout, Design_TooMuchText, and Design_VisualNoise, accounting for 39% of the variation (adjusted R²) in Overall Clutter.

Backward regression of these six items revealed some evidence of variance inflation, and in this combination, Content_Distracting no longer made a significant contribution to the model. After removing Content_Distracting, the remaining five items accounted for almost half of the variation in Overall Clutter (adjusted R² = 45%), and all variance inflation factors (VIF) were less than 4. The reliabilities (coefficient alpha) for the revised Content and Design factors were, respectively, 0.91 and 0.88; their combined reliability was 0.90.
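
The multicollinearity check can be sketched with statsmodels variance inflation factors. The five retained items below come from the analysis just described; the DataFrame columns are again our assumed names.

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# The five items retained after removing Content_Distracting.
FINAL_ITEMS = ["Content_ALot", "Content_Space",
               "Design_UnpleasantLayout", "Design_TooMuchText", "Design_VisualNoise"]

def vif_table(data: pd.DataFrame, predictors: list = FINAL_ITEMS) -> pd.Series:
    """Variance inflation factor for each predictor; values under 4 match the criterion above."""
    X = sm.add_constant(data[predictors])
    return pd.Series({column: variance_inflation_factor(X.values, i)
                      for i, column in enumerate(X.columns) if column != "const"})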

Exploratory Validity

For the exploratory research, the method of consulting the literature and expert brainstorming to arrive at the initial item set established content validity for the clutter questionnaire (Nunnally, 1978). The expected alignment of items with factors in the factor analysis is evidence of construct validity. Evidence of concurrent validity of the clutter factors comes from their significant correlations with the single-item measure of overall clutter (content clutter: r(1,379) = 0.60, p < 0.0001; design clutter: r(1,379) = 0.61, p < 0.0001).
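
The concurrent validity check is a pair of Pearson correlations between the factor scores and the overall clutter item, which can be sketched with scipy (again assuming the scored DataFrame and the hypothetical Overall_Clutter column from the earlier sketches):

from scipy.stats import pearsonr

def concurrent_validity(scored):
    """Correlate each clutter factor score with the single overall clutter item."""
    for factor in ("ContentClutter", "DesignClutter"):
        r, p = pearsonr(scored[factor], scored["Overall_Clutter"])
        print(f"{factor}: r({len(scored) - 2}) = {r:.2f}, p = {p:.4g}")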

Revised Version of the Perceived Website Clutter Questionnaire

Figure 2 shows the original version of the PWC questionnaire followed by the version revised based on our exploratory analyses, with the overall clutter item, two items for the assessment of content clutter, and three items for the assessment of design clutter.

Figure 2: The original and revised versions of the PWC questionnaire.

Summary and Discussion

Exploratory analysis of 1,381 ratings of the perceived clutter of 57 websites found:

The proposed questionnaire items aligned with the expected factors of content and design clutter. A parallel analysis indicated the retention of two factors. Exploratory factor analysis showed that the content clutter items formed one factor, and the design clutter items formed the other.

Scale reliability was very high for the overall and factor scores. The reliability for each factor was .95 with a combined reliability of .96. Reliabilities this high indicate an opportunity to increase scale efficiency by reducing the number of items.

We used multiple regression to increase the efficiency of the questionnaire while keeping its reliability high. The revised questionnaire retained the overall item, two items for content clutter, and three items for design clutter. Reliability coefficients dropped a bit from the original questionnaire but remained high (.91 for content clutter, .89 for design clutter, and .90 combined).

The revised questionnaire had high concurrent validity. This was evident from the highly significant correlations between the factor scores and scores on the single overall clutter item.

Bottom line: This exploratory development of a standardized clutter questionnaire for websites produced an efficient two-factor instrument with excellent psychometric properties (high reliability and validity).


Benchmarking in MUiQ

The MeasuringU Intelligent Questioning Platform (MUiQ) is an unmoderated UX testing platform geared toward benchmarking the user experience. It has built-in templates that make it easy to set up complex, competitive benchmarks and an analysis dashboard that compares standardized metrics across conditions so your team can quickly gather insights from the research. MUiQ supports the following study types:

  • Competitive Benchmarking
  • Think-Aloud Usability Testing
  • IA Navigation (Card Sort, Click Test, Tree Test)
  • Large Scale Surveys
  • Advanced Surveys (Kano, Top Tasks, MaxDiff)
  • Prototype Testing
  • Moderated Interviews

With results presented in an easy-to-understand analysis dashboard, MUiQ provides all the tools you need to collect meaningful UX insights. Reach out today to learn more about MUiQ.
