Human-AI Partnership Models for Preserving Validity in High-Stakes Reading Tests

Safriadi Safriadi, Mahlil Mahlil

Abstract


The use of AI has changed high-stakes reading assessments, allowing faster and more flexible measurement. These new tools, however, make it necessary to safeguard the validity of reading assessments so that they measure reading skills fairly. This paper examines the shift from fully automated AI scoring toward collaboration between AI and humans (automated, augmented and hybrid frameworks) that draws on the strengths of both. By discussing AI-related validity issues such as construct validity, content validity, bias, transparency and data quality, the study highlights the difficulties of using AI in high-stakes settings. The discussion also covers new approaches that encourage effective human-AI teamwork, focusing on multiple methods of assessment, ethics, and steps that assure the integrity of assessments. Real-world examples, including the PTE Academic English proficiency test and adaptive classroom platforms, illustrate how these partnership models work in practice. The analysis indicates that having human experts check AI results is important for keeping reading scores reliable and trustworthy. The paper contributes to the ongoing discussion about AI in education by analyzing partnership models that preserve validity, offering useful information to those who work with AI in assessment, decision-making and research.




Journal of Language Testing and Studies (J-Latest) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.