AUTHOR=Casella Monica, Milano Nicola, Dolce Pasquale, Marocco Davide

TITLE=Transformers deep learning models for missing data imputation: an application of the ReMasker model on a psychometric scale

JOURNAL=Frontiers in Psychology

VOLUME=Volume 15 - 2024

YEAR=2024

URL=https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2024.1449272

DOI=10.3389/fpsyg.2024.1449272

ISSN=1664-1078

ABSTRACT=Missing data in psychometric research presents a substantial challenge, impacting the reliability and validity of study outcomes. Various factors contribute to this issue, including participant nonresponse, dropout, or technical errors during data collection. Traditional methods like mean imputation or regression, commonly used to handle missing data, rely upon assumptions that may not hold for psychological data and can lead to distorted results. This study aims to evaluate the effectiveness of transformer-based deep learning for missing data imputation, comparing ReMasker, a masking autoencoding transformer model, with conventional imputation techniques (mean and median imputation, Expectation-Maximization algorithm) and machine learning approaches (K-nearest neighbors, MissForest, and an Artificial Neural Network). Using a psychometric dataset from the COVID distress repository, we assessed imputation performance through the Root Mean Squared Error (RMSE) between the original and imputed data matrices. Results indicate that machine learning techniques, particularly ReMasker, achieve superior performance in terms of reconstruction error compared to conventional imputation techniques across all tested scenarios. This finding underscores the potential of transformer-based models to provide robust imputation in psychometric research, enhancing data integrity and generalizability.
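
The evaluation described in the abstract (hide some values, impute them, then compare the imputed matrix with the original through RMSE) can be illustrated with a minimal sketch. This is not the authors' code: the toy data, the function names `rmse` and `mean_impute`, and the choice of which cells to hide are all assumptions made for illustration. Mean imputation stands in for any of the compared methods. The abstract does not state whether the RMSE is taken over all cells or only over the artificially removed ones, so the sketch supports both.

```python
import numpy as np


def rmse(original, imputed, mask=None):
    """RMSE between two matrices; if mask is given, only over the masked cells."""
    original = np.asarray(original, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    diff = original - imputed
    if mask is not None:
        diff = diff[np.asarray(mask, dtype=bool)]
    return float(np.sqrt(np.mean(diff ** 2)))


def mean_impute(data):
    """Replace each NaN with the mean of the observed values in its column."""
    data = np.asarray(data, dtype=float)
    col_means = np.nanmean(data, axis=0)
    return np.where(np.isnan(data), col_means, data)


# Toy Likert-style responses (hypothetical values, not taken from the study).
original = np.array([[1.0, 2.0, 3.0],
                     [2.0, 3.0, 4.0],
                     [3.0, 4.0, 5.0],
                     [4.0, 5.0, 1.0]])

# Hide two cells to simulate missing responses (arbitrary choice).
mask = np.zeros_like(original, dtype=bool)
mask[0, 0] = True
mask[2, 1] = True

observed = original.copy()
observed[mask] = np.nan

imputed = mean_impute(observed)

score_all = rmse(original, imputed)           # over the whole matrix
score_masked = rmse(original, imputed, mask)  # over the hidden cells only
print(f"RMSE (all cells): {score_all:.4f}")
print(f"RMSE (masked cells): {score_masked:.4f}")
```

Any other imputer could be swapped in for `mean_impute` as long as it returns a complete matrix of the same shape, which keeps the error metric comparable across methods.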