AUTHOR=Tiwari Tamanna , Kondratenko Maxim , Nasiha Nihmath , Ong Toan , Chandrasekaran Sangeetha , Kostbade Gary , Giano Zachary TITLE=Evaluating the completeness of electronic health records in dental education: a big data study JOURNAL=Frontiers in Oral Health VOLUME=Volume 6 - 2025 YEAR=2025 URL=https://www.frontiersin.org/journals/oral-health/articles/10.3389/froh.2025.1535164 DOI=10.3389/froh.2025.1535164 ISSN=2673-4842 ABSTRACT=Objectives: The BigMouth Dental Data Repository is an oral health database developed from de-identified electronic health record (EHR) data from eleven dental schools within the United States. To better understand how this database can be used for further research, the repository must be analyzed for data quality, such as accuracy, consistency, and completeness. This study determined the completeness of all patient health records between 2017 and 2019, including demographic, dental, behavioral, and health history variables, at the student, faculty, and resident levels. Methods: This study analyzed demographic (age, gender, race/ethnicity, zip code, insurance), dental (pain ratings), behavioral (tobacco, alcohol, and drug use), and health history variables for completeness. ANOVA was conducted to detect differences among provider types in data collection by year, with Tukey post hoc tests at p < .05. Effect sizes are presented by comparing students to all other provider types. Results: Overall, the data showed high completeness in demographic variables (97.6%–99.9% for age, gender, and zip code) among the total sample of 543,363 patient visits. However, lower completeness rates were found in dental and behavioral variables (ranging from 1.5% to 66.1%), suggesting potential limitations for certain research applications. The study found significant differences in the completeness of records between students, faculty, and residents. 
For demographic variables, students demonstrated significantly higher completeness rates than faculty in each year from 2017 to 2019, with 79.8%, 79%, and 78.8% completeness for race/ethnicity records, respectively. In contrast, residents and faculty exhibited significantly higher completeness rates (76.8% and 86.7%, respectively) in insurance information compared to students (56.7%). Notably, students had higher completeness rates than both faculty and residents in variables related to tobacco use, alcohol use, drug use, and health history. Conclusion: This study underscores significant variations in the completeness of EHR data among students, faculty, and residents across different schools. Despite these variations, the overall findings suggest a robust level of completeness in the demographic and health variables within the dataset.