Item Analysis of a Reading Test in Sri Lankan Context Using Classical Test Theory

Fouzul Kareema Mohamed Ismail, Ainol Madziah Bt Zubairi

Abstract


This paper reports on a study of a reading test that evaluates the different cognitive processes prescribed by Khalifa and Weir (2009). The 25-item test was designed from a test specification targeted at the B2 level of the Common European Framework of Reference for Languages (CEFR). The responses of 50 students were used to examine the validity and reliability of the test. Validity was ascertained through item analysis involving item difficulty indices, item discrimination indices, and distractor analysis, and each item was examined in detail to inform improvements in test construction. Reliability was estimated with the Kuder-Richardson Formula 20 (KR-20). All computations were carried out in Microsoft Excel. Findings revealed that the test met the standards for content validity and showed acceptable item difficulty indices, with 17 items at the moderate level, between 0.30 and 0.79. Except for three items, all items discriminated well between high- and low-ability students, and only five items had malfunctioning distractors. The reliability of the test scores was 0.82, which is deemed a good value and indicates consistent results. Overall, 88% of the test items functioned well, and the test proved to be valid and reliable. The present research can help students, teachers, and test-makers gain an insightful understanding of item analysis and test development.

https://doi.org/10.26803/ijlter.21.3.3
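
Note on the indices: in Classical Test Theory, item difficulty is the proportion p of examinees answering an item correctly; item discrimination is the difference in difficulty between the upper and lower 27% of examinees ranked by total score (Kelley, 1939); and KR-20 reliability is r = (k/(k-1))(1 - sum(p_i q_i)/s^2), where k is the number of items, q_i = 1 - p_i, and s^2 is the variance of total scores (Kuder & Richardson, 1937). Distractor analysis additionally tallies how often each wrong option is chosen by the upper and lower groups, so it needs the raw option choices rather than 0/1 scores. The following Python sketch is a minimal illustration of these computations, not the authors' Excel workbook; the simulated response matrix and the 27% cut-off are assumptions for demonstration.

    # Illustrative sketch (not the study's actual analysis): classical item
    # analysis for a dichotomously scored test.
    import numpy as np

    def item_analysis(responses):
        # responses: (n_students, n_items) array of 0/1 item scores
        n, k = responses.shape
        totals = responses.sum(axis=1)
        # Item difficulty: proportion of examinees answering each item correctly.
        p = responses.mean(axis=0)
        # Item discrimination: difficulty in the top 27% minus the bottom 27%
        # of examinees ranked by total score (Kelley, 1939).
        g = max(1, round(0.27 * n))
        order = np.argsort(totals)
        d = responses[order[-g:]].mean(axis=0) - responses[order[:g]].mean(axis=0)
        # KR-20: (k/(k-1)) * (1 - sum(p*q) / sample variance of total scores).
        kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / totals.var(ddof=1))
        return p, d, kr20

    # Demonstration with simulated data: 50 students, 25 items, matching the
    # study design; the item success rates are arbitrary assumptions.
    rng = np.random.default_rng(1)
    scores = (rng.random((50, 25)) < rng.uniform(0.30, 0.80, size=25)).astype(int)
    p, d, kr20 = item_analysis(scores)
    print(p.round(2), d.round(2), round(kr20, 2))

Under the criterion reported above, items with difficulty p between 0.30 and 0.79 would be classed as moderate.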


Keywords


cognitive processing in reading; distractor; item difficulty; item discrimination


References


Alderson, J. C. (2000). Assessing reading. Cambridge University Press.

Bax, S., & Chan, S. H. C. (2016). Researching the cognitive validity of GEPT high intermediate and advanced reading: An eye-tracking and stimulated recall study. LTTC-GEPT Research Reports, 7, 1-47. https://www.lttc.ntu.edu.tw/lttc-gept-grants/RReport/RG07.pdf

Bichi, A. A., & Embong, R. (2018). Evaluating the quality of Islamic civilization and Asian civilizations examination questions. Asian People Journal (APJ), 1(1), 93-109. https://www.uniszajournals.com/apj

Brown, H. D., & Abeywickrama, P. (2010). Language assessment: Principles and classroom practices (2nd ed.). Pearson Education.

Carlsen, C. H. (2018). The adequacy of the B2 level as university entrance requirement. Language Assessment Quarterly, 15(1), 75-89. https://doi.org/10.1080/15434303.2017.1405962

Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching, assessment. https://rm.coe.int/1680459f97

Creswell, J. W. (2012). Educational research: Planning, conducting and evaluating quantitative and qualitative research (4th ed.). Pearson.

Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Holt, Rinehart and Winston.

Deygers, B., Zeidler, B., Vilcu, D., & Carlsen, C. H. (2018). One framework to unite them all? Use of the CEFR in European university entrance policies. Language Assessment Quarterly, 15(1), 3-15. https://eric.ed.gov/?id=EJ1171980

Dundar, H., Millot, B., Riboud, M., Shojo, M., Goyal, S., & Raju, D. (2017). Sri Lanka education sector assessment: Achievements, challenges, and policy options. World Bank Group. https://doi.org/10.1596/978-1-4648-1052-7

Ebel, R. L., & Frisbie, D. A. (1991). Essentials of educational measurement (5th ed.). Prentice Hall.

Eleje, L. I., Onah, F. E., & Abanobi, C. C. (2018). Comparative study of classical test theory and item response theory using diagnostic quantitative economics skill test item analysis results. European Journal of Educational and Social Sciences, 3(1), 57-75. https://www.researchgate.net/publication/343557487

Fleckenstein, J., Leucht, M., & Köller, O. (2018). Teachers’ judgement accuracy concerning CEFR levels of prospective university students. Language Assessment Quarterly, 15(1), 90-101. https://doi.org/10.1080/15434303.2017.1421956

Fulcher, G., & Davidson, F. (2007). Language testing and assessment. Routledge.

Halek, M., Holle, D., & Bartholomeyczik, S. (2017). Development and evaluation of the content validity, practicability and feasibility of the Innovative Dementia-Oriented Assessment System for Challenging Behaviour in Residents with Dementia. BMC Health Services Research, 17(1), 554. https://doi.org/10.1186/s12913-017-2469-8

Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38-47. https://doi.org/10.1111/j.1745-3992.1993.tb00543.x

Kastner, M., & Stangl, B. (2011). Multiple choice and constructed response tests: Do test format and scoring matter? Procedia – Social and Behavioral Sciences, 12, 263-273. https://doi.org/10.1016/j.sbspro.2011.02.035

Kelley, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30(1), 17-24. https://doi.org/10.1037/h0057123

Khalifa, H., & Weir, C. J. (2009). Examining reading: Research and practice in assessing second language reading. Cambridge University Press.

Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2(3), 151-160. https://doi.org/10.1007/BF02288391

Linguapress. (2020). A comparison of different readability scales. https://linguapress.com/teachers/flesch-kincaid.htm

Magno, C. (2009). Demonstrating the difference between classical test theory and item response theory using derived test data. The International Journal of Educational and Psychological Assessment, 1(1), 1-11. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1426043

Manalu, D. (2019). An analysis of students' reading final examination by using item analysis program on eleventh grade of SMA Negeri 8 Medan. Journal of English Teaching & Applied Linguistics, 1(1), 13-19. http://repository.uhn.ac.id/handle/123456789/2796

McNamara, T. F. (1996). Measuring second language performance. Longman Publishing Group.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-104). Macmillan.

Natova, I. (2019). Estimating CEFR reading comprehension text complexity. The Language Learning Journal, 49(6), 699-710. https://doi.org/10.1080/09571736.2019.1665088

Powell, J. L., & Gillespie, C. (1990). Assessment: All tests are not created equally. https://files.eric.ed.gov/fulltext/ED328908.pdf

Pratiwi, R., Antini, S., & Walid, A. (2021). Analysis of item difficulty index for midterm examinations in junior high schools 5 Bengkulu City. Asian Journal of Science Education, 3(1), 12-18. http://www.jurnal.unsyiah.ac.id/AJSE/article/view/18895

Samad, A. (2004). Essentials of language testing for Malaysian teachers. UPM Press.

Shanmugam, S. K. S., Wong, V., & Rajoo, M. (2020). Examining the quality of English test items using psychometric and linguistic characteristics among grade six pupils. Malaysian Journal of Learning and Instruction, 17(2), 63-101. https://files.eric.ed.gov/fulltext/EJ1272266.pdf

Tamil, A. M. (2015). Calculating difficulty, discrimination and reliability index/standard error of measurement. PPUKM. https://ppukmdotorg.wordpress.com/2015/04/02/calculating-omr-indexes/

Turner, R. C., & Carlson, L. (2003). Indexes of item-objective congruence for multidimensional items. International Journal of Testing, 3(2), 163-171. https://doi.org/10.1207/s15327574ijt0302_5

Urquhart, A. H., & Weir, C. J. (1998). Reading in a second language: Process, product and practice. Longman.

Waluyo, B. (2019). Thai first-year university students’ English proficiency on CEFR levels: A case study of Walailak University, Thailand. The New English Teacher, 13(2), 51-71. http://www.assumptionjournal.au.edu/index.php/newEnglishTeacher/article/view/3651

Wright, B. D., & Stone, M. H. (1979). Best test design. MESA Press.

Yusup, R. B. (2012). Item evaluation of the reading test of the Malaysian University English Test (MUET) [Master’s thesis, The University of Melbourne]. http://hdl.handle.net/11343/37608

Zimmerman, D. W. (1972). Test reliability and the Kuder-Richardson formulas: Derivation from probability theory. Educational and Psychological Measurement, 32(4), 939-954. https://doi.org/10.1177/001316447203200408

Zubairi, A. M., & Kassim, N. L. A. (2006). Classical and Rasch analyses of dichotomously scored reading comprehension test items. Malaysian Journal of ELT Research, 2(1), 1-20. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.535.2955&rep=rep1&type=pdf



