This article is the third in this series on assessing the validity and reliability of examination questions. The first two were published last week on this blog. Kindly read them before going through this article:
- Difficulty Index, Discrimination Index and Reliability in UKM OMR Report
- Calculating Difficulty Index, Discrimination Index, Reliability & S.E.M.
The reliability of our examination paper as a measuring instrument is crucial if we want a good and valid instrument to measure our students’ knowledge and comprehension. But how sure are we that we are measuring what we want to measure? Bear in mind that with this “measurement” we are drawing conclusions which may affect someone’s future, so we should be responsible enough to exercise due care and diligence. The instrument must have a certain level of difficulty and be able to discriminate between the good and the poor students. The percentage of correct responses varies according to each item’s difficulty. The proportions of good and poor students who respond correctly determine the item’s level of discrimination.
In the earlier articles, we learnt how to calculate an item’s Difficulty Index and Discrimination Index. We also showed how to determine the reliability of the questions. Based on the results of these calculations, we can determine whether we have been fair to the students or not.
There should be a fair mix of easy, moderate and hard questions. At the same time, there should be no questions with a negative or zero discrimination index. If there are any such questions, we need to check their answer keys. A zero or negative index indicates that either the topic was not taught at all or the question has the wrong key.
In terms of KR20 reliability, the value should be 0.7 or more. Whenever our postgraduate students conduct studies, we expect their questionnaires to have a reliability of 0.7 or higher; therefore we as lecturers should adhere to the same standard. But for the last 20 years, the reliability of our examination questions was reported as 0 in the UKM OMR analyses, yet nobody raised any red flags, not even the MQA and ISO auditors. I guess in this scenario, “Ignorance (among the auditors) is bliss.”
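The two item checks above can be sketched in a few lines of Python. This is only an illustration and not the UKM OMR software: the function names, the made-up response matrix and the use of upper and lower 27% groups are assumptions on my part (the earlier articles may use a different split), while the 10% difficulty cut-off follows the checklist in this article.

```python
def item_indices(responses, group_fraction=0.27):
    """Return (difficulty, discrimination) for each item.

    responses: one list per student of 0/1 item scores (1 = correct).
    group_fraction: share of students in the upper and lower groups
    (0.27 is a common convention, assumed here).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, best first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]
    indices = []
    for i in range(n_items):
        # Difficulty index: proportion of all students answering correctly.
        difficulty = sum(r[i] for r in responses) / n_students
        # Discrimination index: upper-group minus lower-group proportion correct.
        discrimination = (sum(r[i] for r in upper) - sum(r[i] for r in lower)) / k
        indices.append((difficulty, discrimination))
    return indices


def flag_items(indices, min_difficulty=0.10):
    """Return 1-based numbers of items whose answer key should be checked."""
    return [
        number
        for number, (difficulty, discrimination) in enumerate(indices, start=1)
        if difficulty < min_difficulty or discrimination <= 0
    ]


# Made-up data: 6 students, 4 items. Item 4 is answered correctly only by
# weaker students, as would happen with a wrong answer key.
sample_responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
sample_indices = item_indices(sample_responses)
print(flag_items(sample_indices))  # [4]
```

Any item flagged this way is a candidate for a wrong key, not proof of one; the key still has to be checked by a person.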
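For readers who want to verify a reported reliability themselves, here is a minimal sketch of the standard KR-20 formula in Python. It assumes every item is scored 0 or 1 and uses the population variance of the total scores; some software uses the sample variance instead, so small differences from an OMR report are possible. The function name and the tiny data set are made up for illustration.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 reliability for a matrix of 0/1 item scores (one row per student)."""
    n_students = len(responses)
    k = len(responses[0])  # number of items
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)  # population variance of total scores
    if k < 2 or var_total == 0:
        raise ValueError("KR-20 needs at least 2 items and non-identical total scores")
    # Sum of p * q over items, where p is the proportion correct and q = 1 - p.
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n_students
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Made-up data: 4 students, 3 items.
sample_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(sample_scores))  # 0.75
```

A result below 0.7 on real data would fall short of the standard described above.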
Therefore any examination, especially one where more than half of the students failed because of the theory paper, should be rigorously examined using the above principles. In summary, this is what every module/posting coordinator should check in the OMR report at every examination:
- The number of correct answers, wrong answers and blank answers should tally with the number of questions.
- There should be a fair amount of easy, moderate and hard questions. If the difficulty index is less than 10%, please check the answer key for that question.
- There should be no questions with negative or zero discrimination index. If present, please check the answer key for that question.
- The reliability index should be 0.7 or higher.
- The standard error of measurement should be small relative to the standard deviation of the marks.
SEM is closely tied to the reliability of a test: the larger the SEM, the lower the reliability of the test and the less precise the measures taken and scores obtained. In the earlier example, the SEM (1.95) was much smaller than the standard deviation (5.64) because the test was highly reliable at 0.88. SEM should not be as big as 18.37 for a 30-question MCQ, as in the above OMR report.
Of course the module/posting coordinator should also tally up all the marks from the various components correctly. Always have someone with fresh eyes check the calculations and formulas within the spreadsheet for the final marks.
Just to illustrate this point, allow me to share this tale, which happened way back in 2007. Details such as names and places have been removed to protect the innocent (and the guilty).
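The figures quoted above can be reproduced with the usual formula SEM = SD × √(1 − reliability). The sketch below is only a sanity check under stated assumptions: marks are raw scores out of 30 and the reliability lies between 0 and 1. Under those assumptions the SEM cannot exceed the standard deviation, and the (population) standard deviation of marks between 0 and 30 cannot exceed 15, so an SEM of 18.37 should immediately look suspicious. The function names are made up.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement from the SD of the marks and the reliability."""
    if reliability > 1:
        raise ValueError("reliability cannot exceed 1")
    return sd * math.sqrt(1 - reliability)


def max_possible_sd(n_questions):
    """Largest possible population SD for raw marks between 0 and n_questions."""
    return n_questions / 2


print(round(sem(5.64, 0.88), 2))     # 1.95
print(18.37 > max_possible_sd(30))   # True
```

If the OMR report expresses marks as percentages rather than raw scores, the bound would be 50 instead of 15, so check the units before drawing conclusions.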
A Tale of Two OMR Marks
Once upon a time in September 2007, there was an examination for a clinical posting. It was a terrible time since 82 (70.1%) of 117 students failed their posting. They failed because of very poor marks for their MCQs. A total of 115 (98.3%) failed their MCQs. Only 2 students passed the MCQs; both had 55% marks (23 correct out of 40 questions).
Various excuses were given for the high failure rate; the most often repeated was the allegedly poor attitude of the students towards the posting. Fortunately the leadership of the faculty at that time had the foresight to order a post-mortem.
Since the high failure rate was for the MCQs, attention was given to the OMR report. Not even 5 minutes into the post-mortem meeting, the fault was discovered. Although there were 40 MCQs, students were only given marks for the first 24. The marks for the other 16 questions were ignored by the OMR machine. The MCQs were quickly scanned again at a neutral site and the new OMR marks were carefully inserted in place of the old marks.
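A fault like this is exactly what the first item of the checklist (correct + wrong + blank answers should tally with the number of questions) is meant to catch. As a rough sketch, assuming the OMR report can be exported as one row per student with counts of correct, wrong and blank answers (the row layout, student IDs and function name here are hypothetical), the check could look like this:

```python
def tally_mismatches(report_rows, n_questions):
    """Return IDs of students whose answer counts do not add up to n_questions.

    report_rows: iterable of (student_id, correct, wrong, blank) tuples,
    a hypothetical export format.
    """
    return [
        student_id
        for student_id, correct, wrong, blank in report_rows
        if correct + wrong + blank != n_questions
    ]


# Made-up rows for a 40-question paper. The first student was only
# marked on 24 questions, so the counts fall short of 40.
sample_rows = [
    ("S001", 20, 4, 0),
    ("S002", 23, 15, 2),
]
print(tally_mismatches(sample_rows, 40))  # ['S001']
```

If every student appears in the result, the problem is almost certainly in how the machine was set up and not in the students.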
Alhamdulillah! Praise Allah! Suddenly almost everyone passed that clinical posting. The fault was human error while programming and scanning the OMR sheets. It was a happy ending, and the students lived happily ever after. Except for the Dean of course, who had to explain to the Senate about the whole thing 😉 but he was okay after that.
It disturbs me that whenever we have a failure rate of more than 50% in the theory examination of a clinical posting, we blame it on the allegedly poor attitude of the students towards the posting. Yet no evidence is offered to back it up. At the very least, the affected department should conduct due diligence on the examination and marking process. The current leadership should order a post-mortem, since our main business is teaching students, not failing them. When many students fail badly, the teachers should reflect and examine the examination process. No department should be above scrutiny; all departments should be treated equally, regardless of who their members are.
We are not teaching incapable students. We are teaching the cream of the cream (crème de la crème) of the Malaysian education system, filtered through the sieve of excellence during the preclinical years. Therefore if we want to blame the students, we must first make sure that the blame does not lie with us.
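“Jangan sampai pisang berbuah dua kali” (literally, “Do not let the banana tree bear fruit twice”; that is, do not let the same mistake happen again).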
Tale = “a fictitious narrative or story, especially one that is imaginatively recounted.”
Update 9th April 2015:
The OMR software has been rectified for KCKL & PPUKM. It is now reporting the reliability and standard error of measurement correctly.