EQUIVALENCY EVIDENCE OF THE ENGLISH COMPETENCY TEST ACROSS DIFFERENT MODES: A RASCH ANALYSIS
Ahmad, J., & Siew, N. M. (2021). Curiosity towards STEM education: A questionnaire for primary school students. Journal of Baltic Science Education, 20(2), 289–304. https://doi.org/10.33225/jbse/21.20.289
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (Eds.). (2014). Standards for educational and psychological testing. American Educational Research Association.
Aryadoust, V., Ng, L. Y., & Sayama, H. (2021). A comprehensive review of Rasch measurement in language assessment: Recommendations and guidelines for research. Language Testing, 38(1), 6–40. https://doi.org/10.1177/0265532220927487
Bailes, L. P., & Nandakumar, R. (2020). Get the most from your survey: An application of Rasch analysis for education leaders. International Journal of Education Policy and Leadership, 16(2), 1–19. https://doi.org/10.22230/ijepl.2020v16n2a857
Berman, A. I., Haertel, E. H., & Pellegrino, J. W. (2020). Comparability of large-scale educational assessments: Issues and recommendations. National Academy of Education.
Berry, V., Kremmel, B., & Plough, I. (2020). International Language Testing Association guidelines for practice. International Language Testing Association.
Bond, T., & Fox, C. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). Routledge. https://doi.org/10.4324/9781315814698
Boone, W. J., Staver, J. R., & Yale, M. S. (2014). Rasch analysis in the human sciences. Springer. https://doi.org/10.1007/978-94-007-6857-4
Boone, W. J. (2016). Rasch analysis for instrument development: Why, when, and how? CBE—Life Sciences Education, 15(4). https://doi.org/10.1187/cbe.16-04-0148
Burke, M. J., Normand, J., & Raju, N. S. (1987). Examinee attitudes toward computer-administered ability tests. Computers in Human Behavior, 3, 95–107.
Cappelleri, J. C., Lundy, J., & Hays, R. D. (2014). Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures. Clinical Therapeutics, 36(5), 648–662. https://doi.org/10.1016/j.clinthera.2014.04.006
Carr, N. T. (2011). Designing and analyzing language tests. Oxford University Press.
Chapelle, C. A., & Voss, E. (2016). 20 years of technology and language assessment in language learning & technology. Language Learning and Technology, 20(2), 116–128. http://llt.msu.edu/issues/june2016/chapellevoss.pdf
Chen, W. H., Lenderking, W., Jin, Y., Wyrwich, K. W., Gelhorn, H., & Revicki, D. A. (2014). Is Rasch model analysis applicable in small sample size pilot studies for assessing item characteristics? An example using PROMIS pain behavior item bank data. Quality of Life Research, 23(2), 485–493. https://doi.org/10.1007/s11136-013-0487-5
Choi, I. C., Kim, K. S., & Boo, J. (2003). Comparability of a paper-based language test and a computer-based language test. Language Testing, 20(3), 295–320. https://doi.org/10.1191/0265532203lt258oa
Cizek, G. J., & Earnest, D. S. (2015). Setting performance standards on tests. In S. Lane, M. R. Raymond, & T. M. Haladyna (Eds.), Handbook of test development (pp. 212–237). Taylor & Francis Group. https://doi.org/10.4324/9780203102961
Davey, T. (2011). Practical considerations in computer-based testing. Educational Testing Service.
Ebrahimi, M. R., Toroujeni, S. M. H., & Shahbazi, V. (2019). Score equivalence, gender difference, and testing mode preference in a comparative study between computer-based testing and paper-based testing. International Journal of Emerging Technologies in Learning (IJET), 14(7), 128–143. https://doi.org/10.3991/ijet.v14i07.10175
Erguven, M. (2013). Two approaches in psychometric process: Classical test theory & item response theory. Journal of Education, 2(2), 23–30. https://jebs.ibsu.edu.ge/jms/index.php/je/article/view/84
Fan, J., & Bond, T. (2019). Applying Rasch measurement in language assessment: Unidimensionality and local independence. In V. Aryadoust & M. Raquel (Eds.), Quantitative data analysis for language assessment, Vol. I: Fundamental techniques (pp. 83–102). Routledge. https://doi.org/10.4324/9781315187815
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Springer. https://doi.org/10.1007/978-94-017-1988-9_2
He, D., & Lao, H. (2018). Paper-and-pencil assessment. In B. Frey (Ed.), The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation (pp. 1199–1200). SAGE Publications, Inc. https://www.doi.org/10.4135/9781506326139.n496
Hosseini, M., Abidin, M. J. Z., & Baghdarnia, M. (2014). Comparability of test results of computer-based tests (CBT) and paper and pencil tests (PPT) among English language learners in Iran. Procedia - Social and Behavioral Sciences, 98, 659–667. https://doi.org/10.1016/j.sbspro.2014.03.465
Indonesian Endowment Fund for Education Agency. (2022). Annual report 2021. https://lpdp.kemenkeu.go.id/storage/information/report/file/yearly/yearly_report_1662003384.pdf
International Test Commission. (2006). International guidelines on computer-based and internet-delivered testing. International Journal of Testing, 6(2), 143–171. https://doi.org/10.1207/s15327574ijt0602_4
Isbell, D. R., & Kremmel, B. (2020). Test review: Current options in at-home language proficiency tests for making high-stakes decisions. Language Testing, 37(4), 600–619. https://doi.org/10.1177/0265532220943483
Ishak, A. H., Osman, M. R., Mahaiyadin, M. H., Tumiran, M. A., & Anas, N. (2018). Examining unidimensionality of psychometric properties via Rasch model. International Journal of Civil Engineering and Technology, 9(9), 1462–1467. http://iaeme.com/Home/issue/IJCIET?Volume=9&Issue=9
Kernan, M. C., & Howard, G. S. (1990). Computer anxiety and computer attitudes: An investigation of construct and predictive validity issues. Educational and Psychological Measurement, 50(3), 681–690. https://doi.org/10.1177/0013164490503026
Khoshsima, H., & Toroujeni, S. M. H. (2017). Transitioning to an alternative assessment: Computer-based testing and key factors related to testing mode. European Journal of English Language Teaching, 2(1), 54–74. https://doi.org/10.5281/ZENODO.268576
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). Springer. https://doi.org/10.1007/978-1-4939-0317-7
Linacre, J. M. (1994). Sample size and item calibration stability. Rasch Measurement Transactions, 7(4), 328.
Linacre, J. M. (2022). A user’s guide to Winsteps Ministeps Rasch-model computer programs: Program manual 5.2.2. https://www.winsteps.com/a/Winsteps-Manual.pdf
Magno, C. (2009). Demonstrating the difference between classical test theory and item response theory using derived test data. The International Journal of Educational and Psychological Assessment, 1(1), 1–11. https://files.eric.ed.gov/fulltext/ED506058.pdf
Merkin, A. G., Medvedev, O. N., Sachdev, P. S., Tippett, L., Krishnamurthi, R., Mahon, S., Kasabov, N., Parmar, P., Crawford, J., Doborjeh, Z. G., Doborjeh, M. G., Kang, K., Kochan, N. A., Bahrami, H., Brodaty, H., & Feigin, V. L. (2020). New avenue for the geriatric depression scale: Rasch transformation enhances reliability of assessment. Journal of Affective Disorders, 264, 7–14. https://doi.org/10.1016/j.jad.2019.11.100
Meyer, P. (2010). Reliability. Oxford University Press.
Ockey, G. J. (2021). An overview of COVID-19’s impact on English language university admissions and placement tests. Language Assessment Quarterly, 18(1), 1–5. https://doi.org/10.1080/15434303.2020.1866576
Papageorgiou, S., & Manna, V. F. (2021). Maintaining access to a large-scale test of academic language proficiency during the pandemic: The launch of TOEFL iBT Home Edition. Language Assessment Quarterly, 18(1), 36–41. https://doi.org/10.1080/15434303.2020.1864376
Pommerich, M. (2004). Developing computerized versions of paper-and-pencil tests: Mode effects for passage-based tests. Journal of Technology, Learning, and Assessment, 2(6), 1–45. https://files.eric.ed.gov/fulltext/EJ905028.pdf
Powers, D. E., & O'Neill, K. (1993). Inexperienced and anxious computer users: Coping with a computer-administered test of academic skills. Educational Assessment, 1, 153–173. https://doi.org/10.1002/j.2333-8504.1992.tb01506.x
Prabowo, M. Y., & Rahmadian, S. (2022). Computer-based English competency assessment for scholarship selection: Challenges, strategies, and implementation in the Ministry of Finance. Jurnal Sosioteknologi, 21(1), 84–96. https://doi.org/10.5614/sostek.itbj.2022.21.1.9
Read, J. (2022). Test review: The International English Language Testing System (IELTS). Language Testing, 39(4), 679–694. https://doi.org/10.1177/02655322221086211
Retnawati, H. (2015). The comparison of accuracy scores on the paper and pencil testing vs. computer-based testing. TOJET: The Turkish Online Journal of Educational Technology, 14(4), 135–142. http://www.tojet.net/articles/v14i4/14413.pdf
Stoynoff, S. (2012). Research agenda: Priorities for future research in second language assessment. Language Teaching, 45(2), 234–249. https://doi.org/10.1017/S026144481100053X
Stricker, L. J., Wilder, G. Z., & Rock, D. A. (2004). Attitudes about the computer-based Test of English as a Foreign Language. Computers in Human Behavior, 20(1), 37–54. https://doi.org/10.1016/S0747-5632(03)00046-3
Sumintono, B., & Widhiarso, W. (2015). Aplikasi pemodelan Rasch pada assessment pendidikan [Application of Rasch modelling in educational measurement]. Trim Komunikata Publishing House.
Taber, K. S. (2018). The use of Cronbach's alpha when developing and reporting research instruments in science education. Research in Science Education, 48(6), 1273–1296. https://doi.org/10.1007/s11165-016-9602-2
Trisnawati, I. K. (2015). Validity in computer-based testing: A literature review of comparability issues and examinee perspectives. Englisia Journal, 2(2), 86–94. https://doi.org/10.22373/ej.v2i2.345
Wang, T. H., Kao, C. H., & Chen, H. C. (2021). Factors associated with the equivalence of the scores of computer-based test and paper-and-pencil test: Presentation type, item difficulty and administration order. Sustainability, 13(17), 1–14. https://doi.org/10.3390/su13179548
Yuzar, E., & Rejeki, S. (2020). The correlation between productive and receptive language skills: An examination on ADFELPS test scores. SALEE: Study of Applied Linguistics and English Education, 1(2), 99–113. https://doi.org/10.35961/salee.v1i02.111
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.