Towards Scalable Vocabulary Acquisition Assessment With BERT

Description

The authors propose machine learning methods for automated scoring models that predict second-grade English language learners' vocabulary acquisition in science and social studies from free-form spoken responses. They evaluate performance on an existing dataset, use transfer learning from a large pretrained language model, and report the influence of different objective-function designs and an input-convex network design. In particular, they find that combining objective functions with complementary properties, such as sensitivity to the distance between scores, substantially improves model reliability relative to human raters. The models advance the state of the art for assessing word-definition and sentence-usage tasks in science and social studies, achieving strong quadratic weighted kappa scores against human raters. However, human-human agreement still exceeds model-human agreement, leaving room for future improvement. Even so, this work demonstrates the scalability of automated vocabulary assessment for free-form spoken language tasks in the early grades.
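Quadratic weighted kappa (QWK), the agreement metric reported above, penalizes disagreements by the squared distance between rating categories. A minimal sketch of how it is typically computed, using scikit-learn's `cohen_kappa_score`; the rating data below is illustrative, not from the paper:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical scores on a 0-2 rubric for ten spoken responses.
human_scores = [0, 1, 2, 2, 1, 0, 1, 2, 0, 1]
model_scores = [0, 1, 2, 1, 1, 0, 2, 2, 0, 1]

# weights="quadratic" makes a 0-vs-2 disagreement cost four times
# as much as a 0-vs-1 disagreement.
qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(round(qwk, 3))  # → 0.833
```

A QWK of 1.0 indicates perfect agreement and 0.0 indicates chance-level agreement, so comparing model-human QWK against human-human QWK (as the authors do) shows how close the model comes to replacing a second rater.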

Citation

Wu, Z., Larson, E., Sano, M., Baker, D. L., Gage, N. S., & Kamata, A. (2023). Towards scalable vocabulary acquisition assessment with BERT. In Proceedings of the Tenth ACM Conference on Learning @ Scale. Association for Computing Machinery, New York, NY, USA, 272–276. https://doi.org/10.1145/3573051.3596170