The Classical Test or Item Response Measurement Theory: The Status of the Framework at the Examination Council of Lesotho

Musa Adekunle Ayanwale, Julia Chere-Masopha, Malebohang Catherine Morena

Abstract


While the Examination Council of Lesotho (ECOL) carries a heavy workload of assessment tasks, its procedures for developing tests, analysing items, and compiling scores rely heavily on the classical test theory (CTT) measurement framework. CTT has been criticised for several shortcomings: it is test-oriented rather than item-oriented, its statistics are sample-dependent, and it assumes a linear relationship between the latent trait and observed scores. This article presents an overview of CTT and item response theory (IRT) and of how each has been applied to standard assessment questions at the ECOL. Both theories address measurement issues associated with commonly used assessment formats, such as multiple-choice, short-response, and constructed-response tests. Using three search facets (item response theory, classical test theory, and Examination Council of Lesotho), a comprehensive search was conducted across multiple databases, including Google Scholar, Scopus, Web of Science, and PubMed. The paper was developed theoretically from these electronic databases, the keywords, and the references identified in the retrieved articles, and the keywords were used to locate relevant documents across a wide variety of sources. General remarks are offered on the effective application of each model in practice with respect to test development and psychometric activities. In conclusion, the study recommends that ECOL move from CTT to modern test theory for test development and item analysis, a switch that offers multiple benefits.
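To make the contrast concrete, the core quantities of the two frameworks can be sketched in a few lines. The following is a minimal illustration (not taken from the article itself): the classical difficulty index p and point-biserial discrimination used in CTT item analysis, alongside the two-parameter logistic (2PL) item response function used in IRT. All function names and the sample data are the author's hypothetical choices for illustration.

```python
import math

def item_difficulty(responses):
    """CTT difficulty index p: the proportion of examinees answering
    a dichotomously scored (0/1) item correctly."""
    return sum(responses) / len(responses)

def point_biserial(item, totals):
    """CTT discrimination: Pearson correlation between an item's 0/1
    scores and examinees' total test scores."""
    n = len(item)
    mean_i = sum(item) / n
    mean_t = sum(totals) / n
    cov = sum((i - mean_i) * (t - mean_t) for i, t in zip(item, totals)) / n
    sd_i = math.sqrt(sum((i - mean_i) ** 2 for i in item) / n)
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in totals) / n)
    return cov / (sd_i * sd_t)

def irt_2pl(theta, a, b):
    """2PL IRT model: probability of a correct response given ability
    theta, item discrimination a, and item difficulty b. Unlike the CTT
    indices above, a and b are (in principle) sample-invariant."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

Note that the CTT indices are computed directly from a particular sample's responses, which is the source of the sample-dependence criticism, whereas the 2PL places items and persons on a common latent scale: at theta equal to b, the probability of success is exactly 0.5.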

https://doi.org/10.26803/ijlter.21.8.22


Keywords


classical test theory; item response theory; Examination Council of Lesotho; item development; item analysis





e-ISSN: 1694-2116

p-ISSN: 1694-2493