Four Pillars of Assessment: Validity

By Stuart Kime FCCT


This article was first published as a guest post on The Association of School and College Leaders’ (ASCL) website. The blog series explores the four pillars of great assessment: purpose, validity, reliability and value.


There is no such thing as a valid assessment!

Validity is perhaps the most commonly used word in discussions about the quality of any assessment. Although it is used a lot, it is often misunderstood and can be very misleading.

Validity is a word which, in assessment, refers to two things:

  • The ability of the assessment to measure what it intends to measure;
  • The ability of the assessment to provide information which is both valuable and appropriate for the intended purpose.

A common misconception about validity is that it is a property of an assessment, but in reality, there is no such thing as ‘a valid assessment’. However, there is such a thing as ‘an assessment which is valid for a specific purpose’: validity is all about the inferences you make based on the information generated.

Two key questions

Researchers such as Samuel Messick (1989) have suggested there are two key questions to be asked of any assessment:

  1. The scientific question (technical accuracy): Is the test any good as a measure of the big idea, characteristic, or attribute it purports to assess?
  2. The ethical question (social value): Should the test be used for its present purpose?

There are two common reasons why assessments end up not quite hitting their target: construct under-representation and construct-irrelevant variance.

Construct under-representation: where the assessment fails to capture important aspects of the construct (the target of the assessment). Examples include:

  • a German assessment of applying verb endings correctly which only tests the present tense;
  • a maths assessment of simplifying and manipulating algebraic expressions that does not test expanding products of two or more binomials.

Construct-irrelevant variance: where the assessment outcomes are influenced by things other than the construct. Examples include:

  • in the German assessment mentioned above, inaccessible vocabulary used in the questions affects the measurement of the intended construct;
  • in the maths assessment mentioned above, the pupil is asked to work out a percentage before answering a question. Although calculating a percentage is a mathematical skill, we are no longer assessing only our intended topic (manipulating algebraic expressions).

When we talk of validity and great assessments, we are referring to the assessment’s ability to support the claims we want to make based on the information generated.

Improving validity

One of the key validity checks we can do when assessing the quality of an assessment is to consider: is there either construct under-representation or construct-irrelevant variance in this assessment? Defining the construct – saying what is and isn’t included in it – is a vital part of a robust assessment process. It is one way in which we can avoid construct under-representation and construct-irrelevant variance.

Ensuring that an appropriate and meaningful range of marks is used to represent performance at particular levels of achievement is another aspect of improving the validity of an assessment. If there are 50 marks available on an assessment task, but no student is awarded more than 35 marks or less than 20, is the assessment really out of 50?

Assessment validity is all about the inferences you make based on the information generated. Therefore, it is important to ask: does the assessment allow you to make inferences which are valid?

What’s next?

Validity and reliability form the foundation of great assessment and should be considered side-by-side. In the next article in this series, we will explore reliability and its relationship to validity.


“Understanding Purpose” is one unit of learning from the Assessment Lead Programme. Click here to find out more and register today.
