Tags

, , , ,

Part 2

In the previous article, I explained the usefulness of the OMR report as generated by the UKM OMR machine software. But not all places have such facilities and support from the IT Department. Therefore what if we want to do the same kind of analysis on our own? In this blog post, I will explain how to do such analysis just by using a spreadsheet software such as Microsoft Excel.

Whenever we are conducting an examination for our students, we are basically conducting a study to measure their knowledge and understanding of the subject matter at hand. Using the prepared questions, we can measure their knowledge, but is the resulting scores a true reflection of their comprehension?

Therefore we need to measure the reliability and the validity of the questions being asked. To do that, we need to measure the following;

  • Difficulty Index
  • Discrimination Index
  • Reliability Index (Kuder and Richardson Formula 20 Reliability Index)
  • Standard Error of Measurement

Getting the Raw Data for Analysis

Basically we have to enter all answers of each students into the spreadsheet. If there were a lot of questions and a large number of students, it would be difficult to enter the answers for each questions and each students into the spreadsheet. So if possible, beg, borrow or steal an OMR machine and scan the students’ answers into a text file.

Raw text file from an OMR scanner.

Raw text file from an OMR scanner.

Then convert the text file into Excel.

Sample of a text file converted into Excel.

Sample of a text file converted into Excel.

If you can’t get hold of an OMR machine, then consider having an online examination instead using Google Form, which will generate an Excel file for you at the end of the examination. Otherwise start typing in the data into Excel yourself, as illustrated above.

Then convert their answers into Correct and Incorrect Answers. Code 1 for correct answers, code 0 for wrong answers, as illustrated below.

correctcorrectcorrect

Code 1 for correct answer, code 0 for wrong answer.

Use the “Autosum” command to generate the total marks for each students. Do the top one first, then “drag and Autofill” for the rest of the column. If you don’t understand that, get your son or daughter to demonstrate that for you 🙂 .

Then sort the data using the “Sort Descending” command, so that the highest score is at the top and the lowest score is at the bottom.

Difficulty Index (D)

D = students with correct answer x 100
total students

Interpretation of the Difficulty Index

Interpretation of the Difficulty Index

Using the Excel table prepared earlier, go to the bottom of the first question column (cell B24) and calculate the Difficulty Index using the formula such as this;  “=SUM(B2:B23)/COUNT(B2:B23)*100“. Then “drag & Autofill” for the rest of the row till the very last question.

Calculate the Difficulty Index using Microsoft Excel

Calculating the Difficulty Index using Microsoft Excel

Then calculate the Average Difficulty Index by typing command such as “=AVERAGE(B24:AE24)” at the very last column (cell AF24), as what is illustrated below;

averagediff

Calculating the Average Difficulty Index.

Discrimination Index (R)

R = (H – L)
      27% of Total

H = number of correct answers from top 27% of students
L = number of corrects answers from bottom 27% of students

Since we have 22 students in this example, 27% out of 22 = 6 students. So we will have to take the number of correct answers from 6 of the top students (H) and deduct the number of correct answers from 6 of the bottom students (L), then divide it by 6.

interpretdisc

Interpretation of Discrimination Index.

Using the Excel Table prepared earlier, go to the bottom of the first question column (cell B25) and calculate the Discrimination Index using the formula such as this; “=(SUM(B2:B7))-(SUM(B18:B23)))/6“. Then “drag & Autofill” for the rest of the row till the very last question.

Calculating Discrimination Index using Microsoft Excel.

Calculating Discrimination Index using Microsoft Excel.

Then calculate the Average Discrimination Index by typing command such as “=AVERAGE(B25:AE25)” at the very last column (cell AF25), as what is illustrated below;

Calculating the Average Discrimination Index.

Calculating the Average Discrimination Index.

Transposing the Difficulty and Discrimination Index for Analysis

We have calculated the Difficulty and Discrimination Index for all 30 questions. Now we need to categorise them and prepare a frequency table. In its current incarnation, that would be difficult. So we need to transpose the current table to facilitate analysis. This is how we do it;

Select the cells, click copy, insert new sheet, click paste special and select transpose.

Select all the cells, click copy, insert new sheet, click paste special and select transpose.

The resulting transposed sheet would look like this, which could be imported into SPSS and analysed.

Import transposed table into SPSS for analysis.

Import the transposed table into SPSS for analysis.

Questions with low difficulty indexes (below 30) and zero/negative discrimination indexes should be checked again. Either the answer keys were wrong or the questions were poorly prepared.

Kuder and Richardson Formula 20 Reliability Index

This index checks the internal consistency of measurements with dichotomous choices (i.e. Correct versus Incorrect). A correct question scores 1 and an incorrect question scores 0. Value of this index ranges from 0 to 1. A high value indicates reliability, while too high a value (in excess of .90) indicates a homogeneous test.

Kuder and Richardson Formula 20 Reliability Index

Kuder and Richardson Formula 20 Reliability Index.

  • k= number of questions
  • Sum of all PQ’s (p = probability of correct answer, q = probability of wrong answer)
  • σ2 = variance of the total scores of all the people taking the test.

Therefore to calculate the reliability index, you need to calculate the following items first;

  • p of each questions (p = probability of correct answer)
  • q of each questions (q = probability of wrong answer)
  • Sum of all PQ’s
  • σ2 = variance of the total scores of all the people taking the test.

Sum of all PQ’s

Using the Excel Table prepared earlier, go to the bottom of the first question column (cell B26) and calculate the P using the formula such as this; “=SUM(B2:B23)/COUNT(B2:B23)“. Then “drag & Autofill” for the rest of the row till the very last question.

Calculating the P value

Calculating the P value

Using the Excel Table prepared earlier, go to the bottom of the first question column (cell B27) and calculate the Q using the formula such as this; “=1-B26“. Then “drag & Autofill” for the rest of the row till the very last question.

Q

Calculating the Q values.

Using the Excel Table prepared earlier, go to the bottom of the first question column (cell B28) and calculate the PQ using the formula such as this; “=B26*B27“. Then “drag & Autofill” for the rest of the row till the very last question.

PQ

Calculating the PQ values.

Then calculate the Sum of All PQ’s by typing command such as “=SUM(B28:AE28)” at the very last column (cell AF28), as what is illustrated below;

SUMPQ

The sum of all the PQ’s is 4.75.

Variance

Please take note that a correct question scores 1 and an incorrect question scores 0. Therefore if you have 30 questions, the minimum possible score is 0 and the maximum possible score is 30. On the OMR machine report, these are reported as the raw scores, not the final marks that is based on 100%.

The number of correct and wrong answers. The raw scores is based on the number of correct answers.

What we want are variances of the raw scores to calculate the reliability index. Not those based on the 100%

Variance is calculated by deducting the mean from each observed value and then squaring it. The sum of the squares were then divided with the number of observed value minus one.

Formula for variance.

Formula for variance.

Variance is 31.83, not 337.54

Variance is 31.83.

Putting it all together to calculate reliability

  • = 30
  • Sum of all PQ’s = 4.75
  • σ2 = 31.83

Therefore rho KR20 = (30/29)*(1 – (4.75/31.83)) = 0.88.
Since rho = 0.88, therefore the questions are highly reliable.

Standard Error of Measurement

Standard error of measurement is related to test reliability in that it provides an indication of the dispersion of the measurement errors when you are trying to estimate students’ true scores from their observed test scores.

The S.E.M. formula relies on the correct value for reliability. Therefore if the reliability value is wrong, your S.E.M. would be wrong too.

Standard Error of Measurement = Standard deviation * SQRT (1 – Reliability)

Standard deviation “provides a sort of average of the differences of all scores from the mean.” This means that it is a measure of the dispersion of scores around the mean. Variance is 31.83 (refer above), therefore since standard deviation is the square root of variance, square root of 31.83 is 5.64.

How the data is distributed around the mean.

How the data is distributed around the mean.

Reliability here refers to the Kuder and Richardson Formula 20 Reliability Index as calculated above. The value calculated above was 0.88.

Standard Error of Measurement = 5.64 * SQRT (1 – 0.88) = 5.64 * 0.35 = 1.95

So it is 1.95.

Conclusion

If you don’t have the OMR machine, you can still calculate all the required indexes using just Microsoft Excel.

<- Part 1 Understanding the UKM OMR Report
Part 3 -> Implication of a High Failure Rate.

Advertisements