**Part 2**

In the previous article, I explained the usefulness of the OMR report generated by the UKM OMR machine software. But not all places have such facilities and support from the IT Department. So what if we want to do the same kind of analysis on our own? In this blog post, I will explain how to do such an analysis using just spreadsheet software such as Microsoft Excel.

Whenever we are conducting an examination for our students, we are basically conducting a study to measure their knowledge and understanding of the subject matter at hand. Using the prepared questions, we can measure their knowledge, but are the resulting scores a true reflection of their comprehension?

Therefore, we need to measure the reliability and the validity of the questions being asked. To do that, we need to calculate the following:

- Difficulty Index
- Discrimination Index
- Reliability Index (Kuder and Richardson Formula 20 Reliability Index)
- Standard Error of Measurement

**Getting the Raw Data for Analysis**

Basically, we have to enter every student's answers into the spreadsheet. If there are a lot of questions and a large number of students, it would be tedious to type each answer in by hand. So if possible, beg, borrow or steal an OMR machine and scan the students' answers into a text file.

Then convert the text file into Excel.

If you can’t get hold of an OMR machine, then consider having an online examination instead using Google Form, which will generate an Excel file for you at the end of the examination. Otherwise start typing in the data into Excel yourself, as illustrated above.

Then convert their answers into correct and incorrect answers: code 1 for a correct answer, code 0 for a wrong answer, as illustrated below.

Use the “Autosum” command to generate the total marks for each student. Do the top one first, then “drag and Autofill” for the rest of the column. If you don’t understand that, get your son or daughter to demonstrate it for you 🙂 .

Then sort the data using the “Sort Descending” command, so that the highest score is at the top and the lowest score is at the bottom.
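If you prefer to double-check the coding, totalling and sorting steps against a script, here is a minimal Python sketch. The answer key and student answers below are hypothetical, made up purely for illustration:

```python
# Hypothetical data: answer letters for 4 students on 5 questions,
# scored against an answer key (1 = correct, 0 = wrong), then
# totalled and sorted descending, mirroring the Excel steps above.
answer_key = ["A", "C", "B", "D", "A"]
answers = {
    "Student1": ["A", "C", "B", "D", "B"],
    "Student2": ["A", "B", "B", "D", "A"],
    "Student3": ["C", "C", "A", "A", "B"],
    "Student4": ["A", "C", "B", "D", "A"],
}

# Code 1 for a correct answer, 0 for a wrong one
scores = {
    name: [1 if a == k else 0 for a, k in zip(row, answer_key)]
    for name, row in answers.items()
}

# Total marks per student (the "Autosum" column)
totals = {name: sum(row) for name, row in scores.items()}

# Sort descending by total, like "Sort Descending" in Excel
ranking = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranking)
```

With this toy data, Student4 (all correct) ends up at the top of the list and Student3 at the bottom, just as the sorted spreadsheet would show.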

**Difficulty Index (D)**

**D = (number of students with the correct answer ÷ total number of students) × 100**

Using the Excel table prepared earlier, go to the bottom of the first question column (cell B24) and calculate the Difficulty Index using a formula such as this: “**=SUM(B2:B23)/COUNT(B2:B23)*100**”. Then “drag & Autofill” for the rest of the row till the very last question.

Then calculate the **Average Difficulty Index** by typing a command such as “**=AVERAGE(B24:AE24)**” in the very last column (cell AF24), as illustrated below:
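The same Difficulty Index calculation can be sketched in Python. The 0/1 score matrix below is hypothetical (rows are students, columns are questions):

```python
# Difficulty Index per question: percentage of students who answered
# correctly, i.e. SUM/COUNT*100 as in the Excel formula above.
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]

n_students = len(scores)
difficulty = [
    sum(row[q] for row in scores) / n_students * 100
    for q in range(len(scores[0]))
]

# The "=AVERAGE(B24:AE24)" step: mean of the per-question indexes
average_difficulty = sum(difficulty) / len(difficulty)

print(difficulty)          # one value per question
print(average_difficulty)
```

Question 3 in this toy matrix (only 1 of 4 students correct) gets a Difficulty Index of 25, flagging it as a hard question.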

**Discrimination Index (R)**

**R = (H – L) ÷ (27% of total students)**

H = number of correct answers from the top 27% of students

L = number of correct answers from the bottom 27% of students

Since we have 22 students in this example, 27% of 22 ≈ 6 students. So we take the number of correct answers from the top 6 students (H), deduct the number of correct answers from the bottom 6 students (L), then divide the result by 6.

Using the Excel table prepared earlier, go to the bottom of the first question column (cell B25) and calculate the Discrimination Index using a formula such as this: “**=(SUM(B2:B7)-SUM(B18:B23))/6**”. Then “drag & Autofill” for the rest of the row till the very last question.

Then calculate the **Average Discrimination Index** by typing a command such as “**=AVERAGE(B25:AE25)**” in the very last column (cell AF25), as illustrated below:
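The Discrimination Index can also be sketched in Python. Here the hypothetical score matrix is already sorted by total score, best student first, matching the sorted spreadsheet:

```python
# Discrimination Index per question: (H - L) / group size, where H and
# L are the correct-answer counts in the top and bottom 27% of students.
scores = [
    [1, 1, 1, 1],   # highest scorer
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],   # lowest scorer
]

group = round(0.27 * len(scores))   # 27% of 6 students, i.e. 2
top, bottom = scores[:group], scores[-group:]

discrimination = [
    (sum(r[q] for r in top) - sum(r[q] for r in bottom)) / group
    for q in range(len(scores[0]))
]
print(discrimination)
```

A value near 1 (question 1 here) means the question separates strong from weak students well; a value near 0 or below means it does not.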

**Transposing the Difficulty and Discrimination Index for Analysis**

We have calculated the Difficulty and Discrimination Index for all 30 questions. Now we need to categorise them and prepare a frequency table. In its current incarnation, that would be difficult. So we need to transpose the current table to facilitate analysis. This is how we do it;

The resulting transposed sheet would look like this, which could be imported into SPSS and analysed.

Questions with a low Difficulty Index (below 30, i.e. fewer than 30% of students answered correctly) and a zero or negative Discrimination Index should be checked again: either the answer key was wrong or the question was poorly prepared.

**Kuder and Richardson Formula 20 Reliability Index**

This index checks the internal consistency of measurements with dichotomous choices (i.e. correct versus incorrect). A correct answer scores 1 and an incorrect answer scores 0. The value of this index ranges from 0 to 1. A high value indicates reliability, while too high a value (in excess of 0.90) indicates a homogeneous test.

**rho KR20 = (*k* ÷ (*k* – 1)) × (1 – (Sum of all pq’s ÷ σ²))**

- *k* = number of questions
- Sum of all pq’s (p = proportion of correct answers, q = proportion of wrong answers)
- σ² = variance of the total scores of all the people taking the test

Therefore, to calculate the reliability index, you need to calculate the following items first:

- p of each question (p = proportion of correct answers)
- q of each question (q = proportion of wrong answers)
- Sum of all pq’s
- σ² = variance of the total scores of all the people taking the test

**Sum of all pq’s**

Using the Excel table prepared earlier, go to the bottom of the first question column (cell B26) and calculate **p** using a formula such as this: “**=SUM(B2:B23)/COUNT(B2:B23)**”. Then “drag & Autofill” for the rest of the row till the very last question.

Next, go to the bottom of the first question column (cell B27) and calculate **q** using a formula such as this: “**=1-B26**”. Then “drag & Autofill” for the rest of the row till the very last question.

Then go to the bottom of the first question column (cell B28) and calculate **pq** using a formula such as this: “**=B26*B27**”. Then “drag & Autofill” for the rest of the row till the very last question.

Finally, calculate the **Sum of all pq’s** by typing a command such as “**=SUM(B28:AE28)**” in the very last column (cell AF28), as illustrated below:
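The p, q and pq rows can be sketched in Python as well, again with a small hypothetical 0/1 score matrix:

```python
# p = proportion correct per question, q = 1 - p, and the Sum of all
# pq's, mirroring worksheet rows 26-28 described above.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]

n = len(scores)
p = [sum(row[q] for row in scores) / n for q in range(len(scores[0]))]
q = [1 - pi for pi in p]
pq = [pi * qi for pi, qi in zip(p, q)]

sum_pq = sum(pq)   # the "=SUM(B28:AE28)" cell
print(p, q, sum_pq)
```

Note that pq is largest (0.25) when p = 0.5, so questions of middling difficulty contribute the most to the sum.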

**Variance**

Please take note that a correct answer scores 1 and an incorrect answer scores 0. Therefore, if you have 30 questions, the minimum possible score is 0 and the maximum possible score is 30. On the OMR machine report, these are reported as the raw scores, not the final marks, which are based on 100%.

Variance is calculated by deducting the mean from each observed value and then squaring the result. The sum of these squares is then divided by the number of observed values minus one.
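The variance calculation above, sketched in Python with hypothetical raw scores:

```python
# Sample variance of the raw total scores: sum of squared deviations
# from the mean, divided by n - 1 (this matches Excel's VAR.S).
totals = [28, 25, 22, 20, 17, 14]   # hypothetical raw scores out of 30

mean = sum(totals) / len(totals)
variance = sum((x - mean) ** 2 for x in totals) / (len(totals) - 1)
print(round(variance, 2))
```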

**Putting it all together to calculate reliability**

- *k* = 30
- Sum of all pq’s = 4.75
- σ² = 31.83

Therefore **rho KR20** = (30/29)*(1 – (4.75/31.83)) **= 0.88.**

Since rho KR20 = 0.88, the questions are highly reliable.
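Plugging this example's values into the KR-20 formula in Python reproduces the same result:

```python
# KR-20 with the values from this example: k = 30 questions,
# Sum of all pq's = 4.75, variance of total scores = 31.83.
k, sum_pq, variance = 30, 4.75, 31.83

kr20 = (k / (k - 1)) * (1 - sum_pq / variance)
print(round(kr20, 2))   # 0.88, matching the worked example above
```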

**Standard Error of Measurement**

**Standard error of measurement is related to test reliability** in that it provides an indication of the dispersion of the measurement errors when you are trying to estimate students’ true scores from their observed test scores.

The S.E.M. formula relies on the correct value for reliability. Therefore if the reliability value is wrong, your S.E.M. would be wrong too.

**Standard Error of Measurement** = Standard deviation * SQRT (1 – Reliability)

**Standard deviation** “provides a sort of average of the differences of all scores from the mean.” In other words, it is a measure of the dispersion of scores around the mean. Since the variance is 31.83 (refer above) and the standard deviation is the square root of the variance, the standard deviation is √31.83 = **5.64**.

Reliability here refers to the Kuder and Richardson Formula 20 Reliability Index as calculated above. The value calculated above was **0.88**.

**Standard Error of Measurement** = 5.64 * SQRT (1 – 0.88) = 5.64 * 0.35 **= 1.95**

So it is **1.95.**
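The S.E.M. calculation above, sketched in Python using this example's variance and reliability values:

```python
# Standard Error of Measurement from the values above:
# SD = sqrt(variance), SEM = SD * sqrt(1 - reliability).
import math

variance = 31.83
reliability = 0.88          # the KR-20 value calculated earlier

sd = math.sqrt(variance)            # ~ 5.64
sem = sd * math.sqrt(1 - reliability)
print(round(sem, 2))                # 1.95
```

Roughly speaking, a student's true score lies within about ±1.95 raw marks of their observed score about 68% of the time.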

**Conclusion**

If you don’t have the OMR machine, you can still calculate all the required indexes using just Microsoft Excel.

<- Part 1 Understanding the UKM OMR Report

Part 3 -> Implication of a High Failure Rate.

**Chai-Eng Tan** said:

Is there a reason why 27% is chosen as the cut-off point for calculating the discrimination index?

**drtamilppukm** said:

The aim is to divide the group into 3 groups: upper 27%, middle 46% and lower 27%. Some textbooks use 25% instead of 27%. Based on a paper entitled “The selection of upper and lower groups for the validation of test items”, written by Kelley, T. L. in the Journal of Educational Psychology, Vol 30(1), Jan 1939, pages 17-24, upper and lower groups consisting of 27% from the extremes of the criterion score distribution are optimal for the study of test items. That is why we currently use 27%: because of this article written in 1939.
