Original Research Article
Predictive Feature Generation and Selection Using Process Data from PISA Simulation-Based Environment: An Application of Tree-Based Ensemble Methods
- 1Teachers College, Columbia University, United States
- 2Educational Testing Service, United States
- 3National Board of Medical Examiners, United States
As one of the most innovative international large-scale assessments, the Programme for International Student Assessment (PISA) introduced the measurement of problem-solving skills in its 2012 cycle. The items in this new domain were typically designed as scenario-based environments built around interactions between students and computers. Process data collected in log files are especially valuable because they provide deeper insight into students’ behaviors and make it possible to track their problem-solving strategies. This study illustrates a two-stage approach to generate features from process data and select those that predict student performance, using the released problem-solving item “Climate Control” from PISA 2012. The specific research questions are: (1) how well features generated from process data can predict test takers’ responses on a given item, and (2) which features are the most predictive. We used a tree-based ensemble method, Random Forest, to explore the association between response data and the features extracted from process data. The eventual goal is to address issues arising from the complex structure of the extracted features and the massive number of variables representing different interactions in log-file entries.
Keywords: process data, scenario-based environment, feature generation, feature selection, random forest, PISA
Received: 11 Jan 2019;
Accepted: 03 Jun 2019.
Edited by: Samuel Greiff, University of Luxembourg, Luxembourg
Reviewed by: Timothy R. Brick, Pennsylvania State University, United States
Daniel W. Heck, Universität Mannheim, Germany
Copyright: © 2019 Han, He and von Davier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Mr. Zhuangzhuang Han, Teachers College, Columbia University, New York City, 10027, New York, United States, email@example.com
Dr. Qiwei He, Educational Testing Service, Princeton, United States, firstname.lastname@example.org
Dr. Matthias von Davier, National Board of Medical Examiners, Philadelphia, Pennsylvania, United States, MvonDavier@nbme.org