Inter-Rater Reliability: Comparison of Checklist and Global Scoring for OSCEs

PP. 937-942  DOI: 10.4236/ce.2012.326142

ABSTRACT

Objective Structured Clinical Examinations (OSCEs) are used globally to evaluate clinical competence in the education of health professionals. Despite the objective intent of OSCEs, the scoring methods used by examiners remain a potential source of measurement error, limiting the precision with which test scores are determined. In this study, we investigated differences in the inter-rater reliabilities of objective checklist scores and subjective global ratings awarded by examiners (who completed an online training program to standardise scoring techniques) across two medical schools. Examiners’ perceptions of the e-scoring program were also investigated. Two Australian universities shared three OSCE stations in their end-of-year undergraduate medical OSCEs. The scenarios were video-taped and used for online examiner training prior to the actual exams. Examiner ratings of performance at both sites were analysed using generalisability theory. A single-facet, all-random, persons-by-raters (P × R) design was used to measure inter-rater reliability for each station, separately for checklist scores and global ratings. The resulting variance components were pooled across stations and examination sites, and decision studies were used to obtain reliability estimates. There was no significant mean score difference between the examination sites. Variation in examinee ability accounted for 68.3% of the total variance in checklist scores and 90.2% in global ratings. The rater contribution was 1.4% of the total variance in checklist scores and 0% in global ratings, reflecting high inter-rater reliability of the scores provided by co-examiners across the two schools. Score variance due to the person-by-rater interaction and residual error was larger for checklist scores than for global ratings (30.3% vs. 9.7%). Reproducibility coefficients were higher for global ratings than for checklist scores. Survey results showed that the e-scoring package facilitated consensus on scoring techniques; this approach to examiner training also allowed examiners to calibrate the OSCEs in their own time. This study revealed that inter-rater reliability was higher for global ratings than for checklist scores, thus providing further evidence for the reliability of subjective examiner ratings.
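As a worked illustration of the analysis described in the abstract, the following is a minimal sketch of a single-facet crossed persons-by-raters (P × R) G-study and D-study. The data here are hypothetical, not the study's; variance components are estimated from the ANOVA expected mean squares, and the D-study projects the reproducibility (G) coefficient for a chosen number of raters.

```python
import numpy as np

def g_study_pxr(scores):
    """Single-facet crossed P x R G-study.

    scores: (n_persons, n_raters) array, one observation per cell.
    Returns estimated variance components (sigma2_p, sigma2_r, sigma2_pr_e).
    """
    n_p, n_r = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    rater_means = scores.mean(axis=0)

    # ANOVA sums of squares for the two main effects and the residual
    ss_p = n_r * np.sum((person_means - grand) ** 2)
    ss_r = n_p * np.sum((rater_means - grand) ** 2)
    ss_total = np.sum((scores - grand) ** 2)
    ss_pr = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square equations for the all-random model
    sigma2_pr_e = ms_pr                        # interaction + residual error
    sigma2_p = max((ms_p - ms_pr) / n_r, 0.0)  # examinee (person) variance
    sigma2_r = max((ms_r - ms_pr) / n_p, 0.0)  # rater (leniency) variance
    return sigma2_p, sigma2_r, sigma2_pr_e

def d_study_g_coefficient(sigma2_p, sigma2_pr_e, n_raters):
    """Relative G (reproducibility) coefficient for a D-study with n_raters."""
    return sigma2_p / (sigma2_p + sigma2_pr_e / n_raters)

# Hypothetical example: 10 examinees each scored by 2 raters
rng = np.random.default_rng(0)
ability = rng.normal(70, 8, size=(10, 1))          # simulated true ability
scores = ability + rng.normal(0, 4, size=(10, 2))  # plus per-rater noise

s2_p, s2_r, s2_pr_e = g_study_pxr(scores)
total = s2_p + s2_r + s2_pr_e
print(f"person: {100 * s2_p / total:.1f}%  rater: {100 * s2_r / total:.1f}%  "
      f"interaction+error: {100 * s2_pr_e / total:.1f}%")
print(f"G coefficient (2 raters): {d_study_g_coefficient(s2_p, s2_pr_e, 2):.2f}")
```

In this framework, a larger share of total variance attributable to persons (as the study found for global ratings, 90.2% vs. 68.3% for checklists) directly yields a higher G coefficient; pooling such components across stations and sites, and varying the number of raters in the D-study, produces the reproducibility comparisons the abstract reports.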

Cite as:

Malau-Aduli, B., Mulcahy, S., Warnecke, E., Otahal, P., Teague, P., Turner, R., & Vleuten, C. (2012). Inter-Rater Reliability: Comparison of Checklist and Global Scoring for OSCEs. Creative Education, 3, 937-942. doi: 10.4236/ce.2012.326142.

Copyright © 2012 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.