Health of your Item Bank
Tetsuo Kimura: Evaluation of in house item banks

Maintaining your Item Bank Health

An Item Bank allows you to write a question incrementally, over a period of time. One benefit of writing questions in this staggered manner is that you can plan the quality control (QC) effort. In the initial stages, QC consists primarily of self-review and peer review. Beyond that, certain quantitative measures can give you a very good idea of the quality of each question, and thus of the health of your Item Bank.

Pre-testing the questions

All you need to do is pre-test the questions that have passed the initial QC on a small but representative sample of students and collect the standard response data: which option each student chose for each question, and each student's total score. This data can be analysed quantitatively to arrive at metrics that are easy to interpret. Some commonly used quantitative methods, the data they require, how to calculate them, and how to interpret and benchmark the results are described below.

Core quantitative methods

1. Mean
2. Facility Index
3. Discrimination Index

The pre-testing exercise not only indicates quality issues but also pinpoints outright mistakes (such as a wrong option marked as the correct answer), so you can both improve the quality of the questions and correct these errors, making the exam a quality affair.

Mean

Here we mean the arithmetic mean (AM), generally referred to as the average; in this case, the average mark. It is obtained by dividing the total marks scored by the number of candidates who took the test.

Calculating Mean

  • If 100 students take a test and their combined score is 2300, then the mean mark = 2300 / 100 = 23.

Values of mean

  • For multiple-choice papers, the mean mark should be around 60% of the maximum marks. For example, if a test has 40 multiple-choice questions (1 mark each), a mean of about 24 marks would be expected.
  • Note that if each multiple-choice question has 4 options, the expected score from pure guessing is 1 in 4, i.e. 10 marks out of 40.
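Both benchmarks above are simple arithmetic. As a minimal Python sketch (the function names are illustrative, and the 60% default is simply the rule of thumb quoted above), the mean, the expected mean and the pure-guessing score can be computed as:

```python
def mean_mark(scores):
    """Arithmetic mean: total marks scored divided by the number of candidates."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)


def mcq_benchmarks(num_questions, num_options, marks_per_question=1, target_fraction=0.60):
    """Return (expected mean, pure-guessing score) for an MCQ paper.

    target_fraction=0.60 is the rule-of-thumb mean quoted in the text.
    """
    max_marks = num_questions * marks_per_question
    expected_mean = target_fraction * max_marks
    # A blind guess is right 1 time in num_options on every question.
    guessing_score = max_marks / num_options
    return expected_mean, guessing_score


# 40 one-mark questions with 4 options each: expect a mean near 24; guessing gives 10.
expected, guessing = mcq_benchmarks(40, 4)
print(f"expected mean ~ {expected:.0f}, guessing score = {guessing:.0f}")
```

A paper whose observed mean sits close to the guessing score is a warning sign that many candidates were effectively guessing.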

Facility Index (p)

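Facility index measures how easy a question is and is given the symbol ‘p’. The p-value is the proportion of candidates who answer the question correctly. For a multiple-choice question:

$$p = \frac{\text{number of candidates choosing the correct option}}{\text{total number of candidates}}$$
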
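In code, the facility index is just a proportion. A minimal sketch, assuming each candidate's answer is stored as an option letter and that a skipped question (None) counts as incorrect; that treatment of omissions is an assumption, not something the text specifies:

```python
def facility_index(responses, key):
    """p = candidates who chose the correct option / all candidates.

    `responses` holds one chosen option per candidate; None (omitted) counts as wrong.
    """
    if not responses:
        raise ValueError("no responses")
    correct = sum(1 for r in responses if r == key)
    return correct / len(responses)


# Hypothetical responses from 8 candidates to an item whose key is "B".
answers = ["A", "B", "B", "C", "B", "D", "B", "B"]
print(facility_index(answers, "B"))  # 5 of 8 correct -> 0.625
```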
Calculating Facility index (Question No. 1)

Values of Facility index

  • The facility value ranges from 1.0, where all candidates get the answer right, to 0.0, where no one gets it right (or the key is wrong!).
  • The mean facility of the paper should be between 0.5 and 0.6.
  • Each distractor (incorrect option) should have a facility of 0.05 or more, i.e. be chosen by at least 5% of candidates.
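The same counting gives the facility of every option, which makes the checks in the list above easy to automate. This is a sketch with illustrative function names and hypothetical data; the 0.05 threshold and the 0.5 to 0.6 band are taken from the list above:

```python
from collections import Counter


def option_facilities(responses, options):
    """Proportion of candidates choosing each option (facility of each option)."""
    counts = Counter(responses)
    n = len(responses)
    return {opt: counts[opt] / n for opt in options}


def weak_distractors(responses, options, key, threshold=0.05):
    """Incorrect options chosen by fewer than `threshold` of the candidates."""
    fac = option_facilities(responses, options)
    return [opt for opt in options if opt != key and fac[opt] < threshold]


def mean_facility_ok(p_values, low=0.5, high=0.6):
    """True if the paper's mean facility lies in the recommended band."""
    mean_p = sum(p_values) / len(p_values)
    return low <= mean_p <= high


# Hypothetical item with key "A"; nobody chose "D", so "D" is not doing its job.
responses = ["A"] * 12 + ["B"] * 5 + ["C"] * 3
print(option_facilities(responses, ["A", "B", "C", "D"]))
print(weak_distractors(responses, ["A", "B", "C", "D"], key="A"))  # ['D']
print(mean_facility_ok([0.45, 0.60, 0.70]))  # mean ~0.58 -> True
```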

Optimal Level of Facility Index

The optimal p-value depends on the number of options per item. A formula that can be used to compute the optimal level is:

$$p_{\text{opt}} = \frac{1 + g}{2}$$

where g = the chance level (1 divided by the number of options)

For an MCQ with 4 options, g = 0.25, so the optimal p for the test is 62.5% (p = 0.625). Questions with more options are harder to answer by chance, so as the number of options increases you want a lower optimal level, which is exactly what this formula gives.
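The optimal-level rule is p = (1 + g) / 2, which reproduces the 0.625 quoted for four options. A short sketch, assuming the chance level g is 1 divided by the number of options (blind guessing among equally likely options):

```python
def optimal_facility(num_options):
    """Optimal p = (1 + g) / 2, where g = 1 / num_options is the chance level."""
    if num_options < 2:
        raise ValueError("an MCQ needs at least two options")
    g = 1 / num_options
    return (1 + g) / 2


for k in (2, 3, 4, 5):
    print(k, "options ->", round(optimal_facility(k), 3))
```

The target falls as options are added (0.75 for two options, 0.625 for four, 0.6 for five), matching the behaviour described above.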

Discrimination Index (d)

This is the correlation between responses to an individual item and the overall test score. The higher the correlation, the more consistent the item's results are with the test as a whole. In other words, it measures whether the candidates who chose a particular option were generally the more able ones.

The logic is that students who have done well on the test as a whole (scored high marks) should also do better on any individual item than students who have not done so well, and vice versa. For example, if on a given question the top 20% of students answer correctly 80% of the time while the bottom 20% answer correctly only 35% of the time, that is exactly what you would expect: a strong positive correlation.
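The correlation view can be computed directly: score each candidate 1 or 0 on the item and correlate that with their total score. This is the point-biserial correlation (Pearson's r where one variable is 0/1). A plain-Python sketch with illustrative names; it is a different statistic from the upper-minus-lower d described next, although both reward the same pattern:

```python
import math


def point_biserial(item_correct, total_scores):
    """Pearson correlation between 0/1 item scores and total test scores.

    Returns None when the item (or the total) has no variation, e.g. everyone
    got the item right: the correlation is undefined in that case.
    """
    n = len(item_correct)
    if n != len(total_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 candidates")
    mean_x = sum(item_correct) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_correct, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_correct)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


# Hypothetical: the two strongest candidates got the item right, the two weakest did not.
print(point_biserial([1, 1, 0, 0], [30, 25, 15, 10]))  # strongly positive (~0.95)
```

Here the item's own mark is included in the total. Some analyses subtract it first (a "corrected" item-total correlation) so that the item is not partly correlated with itself.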

The higher the value of d, the more effective the item is. When d is 1.00, all test takers in the upper group and no test takers in the lower group answered the item correctly. Conversely, if none of the upper group but all of the lower group answered an item correctly, the d value would be -1.00. Both of these circumstances are rare, and you will probably never see a value of 1.00. The range of values for the item discrimination index is -1.00 to 1.00.

A discrimination value above 0.3 is good, 0.1 to 0.3 is fair, and anything below 0.1 is poor and definitely worth improving. A negative value (even a small one) contradicts this logic, and items with such values must be revisited.

The 20% bracket is common, but brackets of 25% or 27% are also used (i.e. the top 25% compared with the bottom 25%, and so on).
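The upper-lower version of d follows directly from its definition: rank candidates by total score, take the same fraction from the top and the bottom, and compare how many in each group answered correctly. A minimal sketch; the names and the 0.27 default are illustrative, ties at the group boundary are broken by input order, and the cut-offs in `classify_d` are the ones given above:

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower d = (correct in top group - correct in bottom group) / group size.

    item_correct[i] is 1 if candidate i answered the item correctly, else 0.
    """
    n = len(total_scores)
    if n != len(item_correct) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 candidates")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    # Group size, kept between 1 and n // 2 so the two groups never overlap.
    group = min(max(1, round(n * fraction)), n // 2)
    # Candidate indices from highest to lowest total; sorted() keeps ties in input order.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:group], order[-group:]
    upper_correct = sum(item_correct[i] for i in upper)
    lower_correct = sum(item_correct[i] for i in lower)
    return (upper_correct - lower_correct) / group


def classify_d(d):
    """Bands from the text: > 0.3 good, 0.1 to 0.3 fair, < 0.1 poor, negative: revisit."""
    if d < 0:
        return "negative"
    if d < 0.1:
        return "poor"
    if d <= 0.3:
        return "fair"
    return "good"


# Hypothetical 10 candidates, already in rank order; only the top half got the item right.
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
item = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
d = discrimination_index(item, totals, fraction=0.20)
print(d, classify_d(d))             # 1.0 good
print(classify_d(0.80 - 0.35))      # the 80% vs 35% example -> good
```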

Calculating Discrimination index (Question no. 1)

Discrimination for option ‘A’

13 of the top 133 candidates and 24 of the bottom 133 chose option ‘A’. So the discrimination for option ‘A’ is: d = (13 - 24) / 133 = -0.083
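The hand calculation for option ‘A’ can be repeated for every option from two tallies: how many candidates in the top group and in the bottom group chose each option. A sketch with illustrative names; only the option ‘A’ counts (13 and 24, with groups of 133) are taken from the text:

```python
def option_discrimination(upper_counts, lower_counts, group_size):
    """d per option = (top-group choosers - bottom-group choosers) / group size."""
    options = sorted(set(upper_counts) | set(lower_counts))
    return {
        opt: (upper_counts.get(opt, 0) - lower_counts.get(opt, 0)) / group_size
        for opt in options
    }


d = option_discrimination({"A": 13}, {"A": 24}, group_size=133)
print(round(d["A"], 3))  # -0.083, matching the hand calculation
```

A negative value like this one is what you want for a distractor (weaker candidates choose it more often) and what you do not want for the key.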

Discrimination for each of the options

Values of Discrimination index

  • d varies from +1 to -1
  • The discrimination value of the correct answer should be greater than +0.25

Relationship between facility index and discrimination index

When the facility index is at an extreme (p = 1 or p = 0), the question cannot discriminate between students of different ability. If everyone answers a question correctly, that question alone tells you nothing about who is a stronger or weaker student. Similarly, if no one answers it correctly, there is again no way to tell the students apart based on that question alone.

In general, as the facility index increases from 0, the question's ability to discriminate increases. It is at its maximum when the p-value is between 0.5 and 0.7. Beyond a p-value of 0.7, the ability to discriminate declines again until it becomes nil when p reaches 1.
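One way to see why the extremes cannot discriminate is to compute the largest upper-lower d an item could possibly reach for a given p. This sketch assumes the best case, where all correct answers come from the highest-scoring candidates, and uses an illustrative 27% bracket. It is only a ceiling: observed values can only be equal to it or lower.

```python
def max_discrimination(p, fraction=0.27):
    """Largest upper-lower d an item with facility p can reach.

    Best case: the p * N correct answers all come from the highest-scoring
    candidates. The top group (fraction * N people) is filled first, and the
    bottom group only gets correct answers once p exceeds 1 - fraction.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    return min(1.0, p / fraction, (1 - p) / fraction)


for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, round(max_discrimination(p), 2))
```

The ceiling is 0 at p = 0 and p = 1 and rises toward the middle of the range, which is the pattern described above.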

Lower bound for item difficulty

From the above, it is not a good idea to have too many very difficult questions in the paper (with p-values below a certain bound), since such items do not help us discriminate among test takers. You can find the lower bound for the p-value using the formula below.

where k = number of MCQs; n = number of students

For example, with k = 10 and n = 100, the lower bound is approximately 0.15.

*For an exam like CAT for the IIMs (200 MCQs, about 1 million students), the lower bound for the p-value is 0.005. Now you know why at least some of the questions in that test are so difficult: only around 5,000 of the candidates are expected to solve them!

A quick glance at Item Bank coverage and health

A cross-tabulation of facility and discrimination values (see table below) is a very good way to see both the coverage and the health of the Item Bank. The number in each cell (n1, n2, etc.) is the number of questions in the Item Bank meeting both criteria. The row highlighted in yellow should contain no questions, or very few, at any time. The row highlighted in blue should contain a large number of questions. There should also be enough questions at each difficulty level.

Please note that the divisions into Low, Medium and High difficulty and into Poor, Fair and Good discrimination are somewhat subjective; the underlying variables are continuous.
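Such a cross-tabulation is easy to build once every item has a p and a d. A sketch with hypothetical data and illustrative names: the discrimination bands follow the Good/Fair/Poor cut-offs given earlier (negative values fall into Poor), but the difficulty cut-offs (p above 0.7 = low difficulty, p below 0.3 = high difficulty) are assumptions, since the article does not state them:

```python
from collections import Counter

DIFFICULTY_LEVELS = ["Low", "Medium", "High"]
DISCRIMINATION_LEVELS = ["Good", "Fair", "Poor"]


def difficulty_level(p, easy_above=0.7, hard_below=0.3):
    """Hypothetical cut-offs: a high p means an easy (low-difficulty) item."""
    if p > easy_above:
        return "Low"
    if p < hard_below:
        return "High"
    return "Medium"


def discrimination_level(d):
    """Bands from the text: > 0.3 Good, 0.1 to 0.3 Fair, below 0.1 (incl. negative) Poor."""
    if d > 0.3:
        return "Good"
    if d >= 0.1:
        return "Fair"
    return "Poor"


def crosstab(items):
    """Count items per (discrimination level, difficulty level) cell.

    `items` is an iterable of (p, d) pairs; empty cells read as 0 from the Counter.
    """
    return Counter((discrimination_level(d), difficulty_level(p)) for p, d in items)


def print_crosstab(table):
    print(f"{'':6}" + "".join(f"{col:>8}" for col in DIFFICULTY_LEVELS))
    for row in DISCRIMINATION_LEVELS:
        print(f"{row:6}" + "".join(f"{table[(row, col)]:>8}" for col in DIFFICULTY_LEVELS))


# Hypothetical (p, d) values for five items.
bank = [(0.80, 0.40), (0.50, 0.35), (0.20, 0.05), (0.55, 0.20), (0.65, -0.10)]
print_crosstab(crosstab(bank))
```

Because the thresholds are parameters, you can move the band boundaries to match your own conventions without changing the rest of the report.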

Over the lifecycle of a question, quality control can be hugely assisted if you understand how to use these quantitative methods, and your Item Bank can be kept in top health, always ready for the next exam.

Annexure: Live data from Item Analysis and interpretation:

  • Based on the discrimination values, the correct key for Item No. 5 should be D (which has a positive value), whereas C (which has a negative value) is marked as the key. This must be revisited: either the key is mistyped or the question is of poor quality.
  • Similarly, for Item No. 4, since the discrimination value of B is greater than that of D, B appears to be the correct key. Again, check first for a mistyped key; if that is not the cause, the question must be reviewed and improved.
  • Based on p-values, only 1 of these 10 questions (Item No. 3) is of medium difficulty; the rest are all of high difficulty. There is not a single low-difficulty question.
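The reasoning in the first two bullets can be turned into an automatic check on each item's per-option discrimination values. A minimal sketch with illustrative names; the values in the usage line are made up, and the +0.25 default is the guideline for the correct answer given under "Values of Discrimination index":

```python
def check_key(option_d, key, min_key_d=0.25):
    """Return warnings about a possibly wrong or weak key (empty list if none)."""
    warnings = []
    key_d = option_d[key]
    if key_d < 0:
        warnings.append(f"key {key} has negative discrimination ({key_d:+.2f})")
    elif key_d <= min_key_d:
        warnings.append(f"key {key} discriminates weakly ({key_d:+.2f})")
    # If another option discriminates better than the key, the key may be mistyped.
    best = max(option_d, key=option_d.get)
    if best != key:
        warnings.append(f"option {best} discriminates better than key {key}: check for a mistyped key")
    return warnings


# Hypothetical item keyed as "C" where "D" behaves like the real answer.
for w in check_key({"A": -0.10, "B": -0.05, "C": -0.12, "D": 0.30}, key="C"):
    print(w)
```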


I have been accumulating quite a bit of domain knowledge, and now I think it is time to share. This is one in a series; more to come...
Neeraj
Neeraj Kumar

Product Management & Marketing

8 years ago

You will find the article interesting if you are into learning & teaching, edutech, assessments, exams, certifications.... #item Bank #question bank #assessment #exam #on demand exam #adaptive learning #question quality #exam software #assessment software
