DATA REPORT article
Front. Public Health
Sec. Public Health Education and Promotion
This article is part of the Research Topic: Integrating Oral Health into Public Health: Bridging Gaps to Reduce Health Disparities in the US
Employing AI tools to predict features for dental care use in the United States during the global respiratory illness outbreak
Provisionally accepted
- 1Texas A&M University, College Station, United States
- 2SysBioSolutions LLC, Portage, MI, United States
- 3Health Partners Institute, Minneapolis, United States
- 4The University of Texas Rio Grande Valley - Edinburg Campus, Edinburg, United States
- 5University of Maryland, Institute for Health Computing, North Bethesda, United States
- 6Inova Health System, Fairfax, VA, United States
- 7NEIO Systems LLC, Los Angeles, United States
- 8Western University, Toronto, ON, Canada
- 9University of Texas Austin, Sugar Land, TX, United States
- 10Temple University Kornberg School of Dentistry, Philadelphia, United States
- 11University of Pittsburgh, Pittsburgh, United States
- 12National Institute on Minority Health and Health Disparities, Bethesda, United States
Oral health is strongly associated with overall health. Prior research has demonstrated the benefits of regular dental visits in preventing the initiation and progression of systemic diseases such as diabetes and cardiovascular disease (Macek MD & Tomar SL, 2009; Hardon L, et al., 2023). Prior research has also demonstrated the effect of dental treatments such as scaling and root planing in reducing chronic inflammation, which can further help in managing chronic conditions like diabetes (Santas VR et al., 2009). The American Dental Association recommends routine visits to a dentist once or twice a year for children, adults, and older adults to manage overall health (1,2). Some individuals may need to visit a dentist more than twice a year to maintain good oral and overall health (1). Despite these recommendations, access to dental care in the United States (U.S.) has long been a critical public health issue, with only half of the U.S. population over the age of two having ever accessed dental care (3) and 72 million, or approximately 27% of adults, not having dental insurance (CareQuest Institute of Oral Health, 2025). Moreover, there is a socioeconomic gradient in access to oral health care in the U.S. (Sanders AE et al., 2006). At the population level, higher income and education are associated with a higher frequency of annual dental care visits.
That is, those with higher income and higher education are more likely to have dental insurance, which is a key factor in access to routine and timely dentist visits. Additionally, even among those with higher socioeconomic status, gaps in oral health and dental care access persist, as these challenges are influenced by broader structural issues within the health care system and by factors such as transportation and housing conditions (Broomhead T, Baker SR, 2023; Northridge ME et al., 2020). Factors related to low-income status, race, ethnicity, and rural status have further contributed to the lack of, and variability in, dental care access, thereby contributing to poor oral health outcomes (4,5). Challenges in accessing dental public health services have been associated with the exacerbation of systemic health ailments, including multimorbidity, diabetes, heart and lung disease, dementia, arthritis, and complications during pregnancy, among ethnically and culturally diverse populations (5)(6)(7). These pronounced disparities in access to, quality of, and affordability of routine primary oral healthcare have affected a wide range of communities, particularly people with disabilities, dual-eligible individuals, and racial and/or ethnic minority populations, who face a greater burden of health disparities (8)(9)(10)(11). In a recent study, those with low-income status and less than a high school education reported higher unmet dental needs than their more affluent and educated counterparts (11). Across population groups, Black/African and Hispanic/Latino Americans have reported higher levels of untreated dental disease than White Americans (11). In the context of the pandemic of 2020-2022, these disparities were further compounded because urgent oral treatments took precedence over routine visits (12). This shift resulted in a notable decrease in oral healthcare utilization in the U.S.
by 18 million users in the first year of the pandemic, with lasting repercussions and a slow, gradual recovery to pre-pandemic levels (12). While research has examined which factors and obstacles in healthcare were associated with barriers to dental visits before 2020 (8)(9)(10)(11), much less is known about the factors associated with the decline in dental care utilization, beyond age-related survey stratifications of dental visits during the peak of the global public health crisis in 2021 (12). One study examined dental utilization and oral conditions from 2019-2020 using Electronic Health Records (EHRs) from federally qualified health centers in the U.S. and found that visits to dental care providers decreased more than visits to other health providers during the shelter-in-place orders of 2020-2021 (13). Another study found that 47% of respondents delayed visiting a dentist for dental check-ups, for pain, or for planned treatment during the pandemic of 2020-2022 (14).
Although these studies provided meaningful insights into oral health disparities and access-to-care issues, the barriers can be studied further by using comprehensive survey datasets and AI models that help uncover complex, non-linear relationships among confounding factors that traditional statistical methods may overlook. Traditional statistical methods have been widely used in prior research, whereas few studies have applied AI models to nationally representative U.S. federal survey datasets, a gap we identified.
Therefore, to address this gap, our study examines dental care utilization during the second year of the pandemic by applying multiple AI/ML models to uncover complex, interacting social and health determinants that are not easily captured through traditional statistical approaches.
Our aims for this data report were to examine the major contributors that drive oral healthcare visits and to identify barriers that limited access during the pandemic, using the nationally representative Medical Expenditure Panel Survey (MEPS). Specifically, the objectives were to (1) demonstrate how AI approaches can be used with nationally representative survey data; (2) provide preliminary findings from multiple AI models regarding the top factors associated with dental services utilization in the U.S. using feature importance; and (3) identify the extent to which these factors vary across different AI models. To achieve these objectives, multiple AI models were used to process and analyze the MEPS dataset. These models yielded preliminary findings on the top factors for dental visits in the U.S. and on the extent to which these factors vary across models. In this data report, we describe the dataset, outline the analytic workflow, and present key model-derived findings rather than developing or testing any causal hypotheses.
We used cross-sectional MEPS data for the year 2021. MEPS is nationally representative and is the most comprehensive data source collected from American households on healthcare use, costs (including out-of-pocket payments by self/family, healthcare expenditures, and payments by public and private payers), and health insurance coverage in the U.S. MEPS includes information on dental insurance as well as on health and dental service use for American households.
Importantly, in recent years MEPS has been oversampling certain population groups and collecting data on demographic and social factors and satisfaction with care. MEPS has been oversampling Hispanic, Black, and Asian individuals, as well as low-income households, to improve the precision and reliability of estimates for these groups. These groups are underrepresented in a standard national sample, and oversampling gives researchers a sample size large enough for detailed analysis of their healthcare utilization and expenditures. The primary sponsor for MEPS is the
For this data report, the MEPS full-year consolidated household file, HC-233, from the 2021 calendar year (the second year of the pandemic) was analyzed. Person-level, de-identified data on healthcare utilization from this file were used, with a focus on predictive factors. These factors include sociodemographic factors such as age, gender, and race/ethnicity; socioeconomic factors such as education and household income; enabling factors such as dental insurance; and health behaviors such as smoking status, all as predictors of dental visits during 2021. The National Institute on Minority Health and Health Disparities (NIMHD) and National Institute on Aging Health Disparities Research Frameworks, which span domains and levels of influence, were used to conceptualize, operationalize, and structure our methodological approach (17)(18). While MEPS does not collect data on all elements represented within the NIMHD framework (e.g., the biological and physical environment domains), a wide range of domains (e.g., health behaviors, sociocultural environment, healthcare system) and levels (e.g., individual, interpersonal, family, community, and societal/population) that impact health outcomes are part of routine MEPS data collection.
For our data report, we examined individual-level factors and healthcare system domains that can influence oral health outcomes for a person and for the community-living, noninstitutionalized civilian U.S. population. Dental utilization was conceptualized as a proxy for an individual's health-seeking behavior based on the Andersen Health Behavior Model for health service utilization (19-20). Our multidisciplinary and cross-sectorial team from academia, the private sector, and government, (18). XGBoost and LightGBM are known for their scalability and predictive accuracy, particularly in identifying key features from high-dimensional data (22)(23)(24). These models collectively allow for a thorough examination of the factors affecting dental visit utilization in the U.S.
The target variable for all models was dental visits, with a total of 1,488 variables included in the study. To focus on the factors that most influence how often people visit the dentist, variables that could introduce unnecessary complexity or bias into the models were excluded. Specifically, while the variable for the total amount paid by families for dental care was retained in the models, variables that break down payments by specific source were excluded, including total charges, out-of-pocket payments, and payments from Medicare, Medicaid, and private insurance. Survey-specific identifiers, variance estimation parameters, demographic identifiers, and statistical weights used for producing nationally representative estimates were also excluded (see Supplementary Table 1 for details). This decision was made to streamline the analysis and maintain a clear focus on the most critical variables affecting dental visit frequency, without confounding by peripheral or redundant data.
Two primary objectives were followed to address the research goals.
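The variable-exclusion step described above can be sketched as a simple column filter. This is a minimal illustration under stated assumptions, not the authors' pipeline: every column name below (`dental_visits`, `dental_paid_family`, `person_weight`, and so on) is a hypothetical placeholder rather than an actual MEPS variable name, and the real exclusion list is the one given in Supplementary Table 1.

```python
# Hypothetical column names; actual MEPS variable names differ.
TARGET = "dental_visits"

EXCLUDED = {
    # Payment breakdowns by source (the family-paid total is retained).
    "dental_total_charges",
    "dental_paid_out_of_pocket",
    "dental_paid_medicare",
    "dental_paid_medicaid",
    "dental_paid_private",
    # Survey identifiers, variance estimation parameters, and weights.
    "person_id",
    "variance_stratum",
    "variance_psu",
    "person_weight",
}


def select_features(columns):
    """Return candidate predictors: every column except the target and the excluded set."""
    return [c for c in columns if c != TARGET and c not in EXCLUDED]


columns = [
    "person_id", "age", "education_years", "dental_insurance",
    "dental_paid_family", "dental_paid_medicare", "person_weight",
    "dental_visits",
]
features = select_features(columns)
print(features)
```

Keeping the exclusion list in one named set makes it easy to audit which variables were withheld from the models.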
For the first objective, we aimed to identify significant features that predict whether an individual had no dental visits versus at least one dental visit, given the imbalance in the data, with a higher prevalence of zero dental visits. To achieve this, binary classification models were fitted to distinguish between those with zero dental visits and those with at least one, allowing for the identification of barriers and facilitators associated with dental care utilization and of the characteristics of non-utilizers. An 80/20 stratified train-test split was used to maintain the class distribution between individuals with and without dental visits across the training and testing subsets. For the second objective, the analysis focused on individuals with 1-2 dental visits, which represent routine preventive care, versus those with more than two visits, potentially indicative of therapeutic visits. A binary classification model was used to differentiate between these two groups, with the aim of identifying the factors contributing to more frequent dental visits. In this study, the natural class distribution was retained to reflect real-world utilization patterns, and model performance was evaluated using metrics (e.g., F1 and ROC-AUC) that are appropriate for imbalanced data. Model performance metrics (accuracy, precision, recall, F1 score, and ROC-AUC) and feature importance rankings served as the statistical basis for interpreting key predictors of dental utilization (22)(23)(24). These metrics provide a comprehensive view of the models' performance, capturing both their predictive accuracy and their ability to
Objective 1 aimed to identify the significant factors associated with no dental visits versus at least one dental visit. Four binary classification models (i.e., Decision Tree, Random Forest, XGBoost, and LightGBM) were developed to distinguish between individuals who did not utilize dental care and those who had at least one visit.
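The two binary targets and the 80/20 stratified split described above can be sketched in plain Python. This is an illustrative sketch only: the function names, the toy visit counts, and the random seed are assumptions, and the report does not state which library routine was used for the split.

```python
import random
from collections import defaultdict


def objective1_label(visits):
    """Objective 1: 1 if the person had at least one dental visit, else 0."""
    return 1 if visits >= 1 else 0


def objective2_label(visits):
    """Objective 2: 0 for 1-2 visits (routine), 1 for more than two visits.
    Returns None for non-users, who fall outside objective 2."""
    if visits < 1:
        return None
    return 1 if visits > 2 else 0


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy visit counts (made up): 60 non-users, 30 routine users, 10 frequent users.
visits = [0] * 60 + [1] * 20 + [2] * 10 + [4] * 10
y1 = [objective1_label(v) for v in visits]
train_idx, test_idx = stratified_split(y1)

share_train = sum(y1[i] for i in train_idx) / len(train_idx)
share_test = sum(y1[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), share_train, share_test)
```

Because the split is done within each class, the share of dental care users is the same in the training and testing subsets, which is the property the stratified split is meant to preserve.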
Table 1 presents performance metrics, including accuracy, precision, recall, F1 score, and ROC-AUC, alongside the top ten features ranked by feature importance for each model, providing insights into the key determinants of dental visit behavior. Across all performance indicators, XGBoost performed best, followed by LightGBM. XGBoost provided the highest values for all model metrics, i.e., accuracy (0.864), precision (0.828), recall (0.843), F1 score (0.835), and ROC-AUC (0.939) (Table 1).
Objective 2 aimed to differentiate between individuals with one or two dental visits (representing routine preventive dental care) and those with more than two visits (potentially indicative of therapeutic care). Binary classification models were applied to identify the factors associated with more frequent dental visits. Table 2 presents the performance metrics (accuracy, precision, recall, F1 score, and ROC-AUC) for the Decision Tree, Random Forest, XGBoost, and LightGBM models and the top ten features ranked by their importance for predicting dental visit frequency. While XGBoost had slightly better recall and F1 score, LightGBM provided slightly higher precision and ROC-AUC. Given that ROC-AUC is a key metric for classification tasks with imbalanced data, LightGBM was considered the better model in this case, owing to its higher ROC-AUC of 0.782 compared with 0.769 for XGBoost.
Using four AI/ML models, we identified the top determinants of total dental visits in 2021 as out-of-pocket costs, doctor office visits, the use of preventive behaviors in non-dental care domains (e.g., having glasses/contact lenses), a diagnosis of attention-deficit/hyperactivity disorder (ADHD), delays in receipt of dental care due to the pandemic, emergency visits, and total expenses paid for private insurance.
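The performance metrics reported above follow standard definitions, which the sketch below spells out in plain Python. The toy labels and scores are made up for illustration; only the precision (0.828) and recall (0.843) plugged into the F1 formula come from the text. Since F1 is the harmonic mean of precision and recall, those two values imply an F1 of about 0.835, which matches the reported figure.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from(precision, recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and predicted probabilities).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
reported_f1 = round(f1_from(0.828, 0.843), 3)
print(metrics, auc, reported_f1)
```

The pairwise ROC-AUC above is quadratic in sample size, so it suits small illustrations; a rank-based implementation would be preferable for a full survey file.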
Additionally, our AI models identified years of education and educational degree and level as other top features that predict at least one visit to a dentist (Table 1). When we classified dental visits as either one or two versus more than two, we found similarities in the top features across the two classifications (Table 1 and Table 2).
For objective 1, XGBoost's performance metrics were superior to those of all other models. For objective 2, XGBoost and LightGBM performed better than the other AI models, but given the binary classification task, LightGBM is considered superior due to its advantage in ROC-AUC. We observed that some, but not all, top features were shared across AI models, with their rankings and importance differing across models (see Tables 1 and 2). Compared with findings from prior research, our AI models for predicting top features of preventive dental visits and of dental visits for more complex care were robust, as they identified features similar to those previously shown to influence dental visits (3-10, 12, 21-28). For preventive dental care utilization, higher educational level or degree (24), age (24), losing teeth, perceived health status (24), and Social Determinants of Health (SDOH) factors (24) were identified as relevant factors by the XGBoost model (32)(33)(34). For preventive versus treatment dental visits (more than two visits in 2021), the XGBoost and LightGBM models identified factors associated with more severe dental care needs, such as number of emergency visits; limitations in school, work, or household activities (8,35); family income (24)(33)(34) or level of household poverty (33)(34); private insurance; total costs paid by self and/or family; total health expenditures; and a person's age (24).
This data report provides several avenues for improving the AI models.
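One simple way to quantify how far the top features agree across models, as discussed above, is to compare ranked lists by overlap and by rank shift. The sketch below is a hedged illustration: the two rankings are invented for the example and are not the contents of Tables 1 or 2, and the Jaccard index is one possible choice of agreement measure, not one the report states it used.

```python
def top_k_overlap(rank_a, rank_b, k=10):
    """Shared features and Jaccard index between the top-k of two ranked lists."""
    a, b = set(rank_a[:k]), set(rank_b[:k])
    shared = a & b
    return {"shared": sorted(shared), "jaccard": len(shared) / len(a | b)}


def rank_shifts(rank_a, rank_b):
    """For features in both lists, position in rank_b minus position in rank_a."""
    pos_b = {feature: i for i, feature in enumerate(rank_b)}
    return {f: pos_b[f] - i for i, f in enumerate(rank_a) if f in pos_b}


# Invented rankings for illustration only (most important first).
xgb_rank = ["out_of_pocket_cost", "office_visits", "education_years", "has_glasses", "age"]
lgbm_rank = ["office_visits", "out_of_pocket_cost", "family_income", "age", "education_years"]

overlap = top_k_overlap(xgb_rank, lgbm_rank, k=5)
shifts = rank_shifts(xgb_rank, lgbm_rank)
print(overlap)
print(shifts)
```

A Jaccard index of 1.0 would mean both models select the same top features; the rank shifts then show whether shared features are ordered differently.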
First, MEPS data can be pooled across years to increase sample size before applying any transformations or classifications and before running any AI models. Second, out-of-pocket costs and number of dental visits in a given year can be used to further assess dental needs/severity. Third, the top medical conditions associated with those who visit dentists more often can be identified. Fourth, AI models can be developed and trained to determine the extent to which delayed dental visits are predictive of having multiple chronic conditions or multimorbidity.
Costs, dental insurance, or socioeconomic barriers make up three of the top ten predictors of any dental visits in 2021, and all top ten predictors of routine vs. therapeutic dental visit patterns. Our findings are thus in line with previous literature that has consistently reported financial and socioeconomic barriers as significant factors for oral health disparities in the U.S. (3-10, 12, 21-26). These results point to household economic precarity and large disparities in wealth and income, in addition to longstanding issues with dental insurance design and integration (23,26), as potential areas for both long-term and short-term policy interventions to reduce disparities in dental healthcare utilization, with the goal of improving oral health outcomes. At least four of the top ten predictors of any dental visits during 2020-2022 were related to disability, mental health, and overall health status (8-9). These have also been previously linked to barriers to preventive oral healthcare (7-9, 22-23, 29). For individuals with disabilities, barriers may be due to competing health priorities and resource constraints that affect healthcare utilization (30). Potential policies meant to reduce barriers to care can be designed to reduce process complexity
We acknowledge several limitations.
Given the secondary nature of the MEPS data, our AI models do not account for imbalances originating from MEPS data collection or measurement, specifically those that were exacerbated when data collection was modified during the pandemic of 2020-2022. MEPS response rates declined during 2020-2021, creating data characteristics that AI models can carry forward or amplify. Nationally representative survey designs pose unique challenges when applying AI and ML approaches, and very few software packages provide ways to handle such complex data (31). The AI models were not survey-weighted and did not account for the complex survey design of MEPS, which incorporates clustering, stratification, and weighting through the primary sampling unit, variance stratum, and person weights. As a result, the AI models are skewed, disproportionately reflect those who are more represented in MEPS, and may not be nationally representative. Therefore, our AI models are not generalizable to the U.S. population.
A strength of our methodological approach, grounded in AI/ML techniques and tools, is that it provides further evidence for the social determinants of health and the socioeconomic barriers that have led to variability in dental care across patient populations during the 2020-2021 public health emergency. This approach also lays the groundwork for future strategies to promote oral health access and outcomes for all populations in the U.S. The outlook for using AI/ML approaches to improve the health and well-being of populations consistently experiencing greater health challenges remains bright.
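To illustrate the survey-weighting limitation noted above, the sketch below contrasts unweighted accuracy with accuracy weighted by person weights. It is a toy example under stated assumptions: the labels, predictions, and weights are invented, and weighting a point estimate in this way does not by itself address variance estimation, which would also require the variance stratum and primary sampling unit.

```python
def accuracy(y_true, y_pred):
    """Unweighted share of correct predictions in the sample."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def weighted_accuracy(y_true, y_pred, weights):
    """Share of the weighted population that is predicted correctly."""
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / sum(weights)


# Invented example: four respondents from an oversampled group (weight 1 each)
# and two respondents who each stand in for four people in the population.
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1]  # correct for the oversampled group only
weights = [1.0, 1.0, 1.0, 1.0, 4.0, 4.0]

unweighted = accuracy(y_true, y_pred)
weighted = weighted_accuracy(y_true, y_pred, weights)
print(round(unweighted, 3), round(weighted, 3))
```

In this toy case the model looks better on the raw sample than it would on the weighted population, which is the kind of skew that can arise when oversampled groups dominate an unweighted analysis.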
Our data report describes how the SCHARE platform can be used for AI model development to shed light on obstacles and factors that can hinder or facilitate healthcare access, address dental disparities, advance dental and health outcomes research, and inform policy decisions (e.g., integrating oral and systemic health so that access to services provided by dentists and/or primary care providers is consolidated and paid for in one office setting). We provide several avenues for future research.
Keywords: artificial intelligence, cloud computing, dental population health, dental public health, dental health services research, dental health policy, cross-sector collaboration (partnerships)
Received: 25 Aug 2025; Accepted: 03 Dec 2025.
Copyright: © 2025 Zanwar, Kodan-Ghadr, Thirumalai, Ghaddar, Huang, Harkness, Rey, Shah, Kurelli, Patel, Calzoni, Dede Yildirim and Duran. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Preeti Pushpalata Zanwar
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
