Assessment of Inter-examiner Agreement on Double Marking of Essay Papers

Authors

  • Than Myint Faculty of Medicine and Health Sciences, Jalan UMS, Universiti Malaysia Sabah, 88400 Kota Kinabalu, Sabah
  • Thant Zin Faculty of Medicine and Health Sciences, Jalan UMS, Universiti Malaysia Sabah, 88400 Kota Kinabalu, Sabah
  • Kyaw Htay Faculty of Medicine and Health Sciences, Jalan UMS, Universiti Malaysia Sabah, 88400 Kota Kinabalu, Sabah
  • Kyaw Min Faculty of Medicine and Health Sciences, Jalan UMS, Universiti Malaysia Sabah, 88400 Kota Kinabalu, Sabah
  • Zainal Arifin Mustapha Faculty of Medicine and Health Sciences, Jalan UMS, Universiti Malaysia Sabah, 88400 Kota Kinabalu, Sabah
  • Ahmad Faris Abdullah Faculty of Medicine and Health Sciences, Jalan UMS, Universiti Malaysia Sabah, 88400 Kota Kinabalu, Sabah

DOI:

https://doi.org/10.51200/bjms.v%vi%i.1004

Keywords:

assessment, double marking, essay, inter-rater reliability

Abstract

Examinations are used to assess the quality and quantity of medical students' academic performance. The essay paper is one of the most common assessment tools in the Faculty of Medicine and Health Sciences, Universiti Malaysia Sabah. Double marking is a means by which academic staff attempt to produce fair results for students. Eighty-eight medical students sat for three essay papers of the Professional I examination in March 2012. Each essay paper was double marked by two lecturers from the clinical department concerned with each discipline. Inter-examiner agreement and its effect on the reliability of the students' final scores were calculated using Kappa statistics and the intra-class correlation coefficient (ICC). Reliability coefficients of the scores were also calculated for the different disciplines. For the Part A essay paper, Cohen's Kappa was 0.48 (p < 0.001) and the ICC was 0.943 (p < 0.001), with Cronbach's alpha = 0.95 for both markings; the Pearson correlation was 0.91 (p < 0.001). For the Part B essay paper, Cohen's Kappa was 0.28 (p < 0.05) and the ICC was 0.753 (p < 0.001), with Cronbach's alpha = 0.81 for both markings; the Pearson correlation was 0.69 (p < 0.001). For the Part C essay paper, Cohen's Kappa was 0.02 (p > 0.05) and the ICC was 0.256 (p < 0.001), with Cronbach's alpha = 0.64 for both markings; the Pearson correlation was 0.57 (p < 0.001). The mean differences between the two markings for the Part A, Part B and Part C essay papers were −0.51 (SD = 1.3), −1.11 (SD = 1.4) and −5.25 (SD = 2.6) respectively. Inter-rater reliability was higher for the Part A and Part B essay papers than for the Part C essay paper. The Part A essay paper showed the greatest consistency between raters, as its Pearson correlation was the highest. The Part C essay paper had the largest mean difference between markings and the lowest consistency between raters (Pearson correlation coefficient = 0.57). We conclude that carefully designed marking criteria, applied with clear procedures, can reduce inconsistencies in assessment.
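As context for the statistics reported above, the sketch below shows how the same agreement measures can be computed for two examiners' paired marks in Python. It is illustrative only: the synthetic arrays, the grade bins, and the variable names (marks_a, marks_b) are assumptions, not data or code from the study, and Cohen's Kappa is computed on binned grades because it requires categorical ratings.

```python
# Minimal sketch of double-marking agreement statistics, assuming the two
# examiners' marks for the same scripts are available as NumPy arrays.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
marks_a = rng.normal(60, 10, size=88)            # first examiner's marks (synthetic)
marks_b = marks_a + rng.normal(0, 3, size=88)    # second examiner's marks (synthetic)

# Cohen's Kappa needs categorical ratings, so bin raw marks into grades
# (the bin edges here are illustrative, not the faculty's grading scheme).
bins = [0, 50, 65, 80, 101]
grades_a = np.digitize(marks_a, bins)
grades_b = np.digitize(marks_b, bins)
kappa = cohen_kappa_score(grades_a, grades_b)

# Pearson correlation: consistency between the two sets of raw marks.
r, p = pearsonr(marks_a, marks_b)

# Cronbach's alpha for k = 2 raters:
# alpha = k/(k-1) * (1 - sum of per-rater variances / variance of summed score)
k = 2
item_vars = marks_a.var(ddof=1) + marks_b.var(ddof=1)
total_var = (marks_a + marks_b).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars / total_var)

# Mean difference between the two markings, as reported per paper part.
diffs = marks_a - marks_b
print(f"kappa={kappa:.2f}, r={r:.2f}, alpha={alpha:.2f}, "
      f"mean diff={diffs.mean():.2f} (SD={diffs.std(ddof=1):.2f})")
```

An ICC such as the 0.943 reported for Part A would typically be obtained from the same paired marks with a dedicated routine (for example, pingouin's intraclass_corr), which is omitted here to keep the sketch self-contained.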

Published

2017-12-13

How to Cite

Myint, T., Zin, T., Htay, K., Min, K., Mustapha, Z. A., & Abdullah, A. F. (2017). Assessment of Inter-examiner Agreement on Double Marking of Essay Papers. Borneo Journal of Medical Sciences (BJMS), 23–28. https://doi.org/10.51200/bjms.v%vi%i.1004

Section

Articles