AUTHOR=Silva Sepulveda Rosario, Boman Magnus
TITLE=Multimodal machine learning for analysing multifactorial causes of disease—The case of childhood overweight and obesity in Mexico
JOURNAL=Frontiers in Public Health
VOLUME=Volume 12 - 2024
YEAR=2025
URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2024.1369041
DOI=10.3389/fpubh.2024.1369041
ISSN=2296-2565
ABSTRACT=Background: Mexico has one of the highest global incidences of paediatric overweight and obesity. Public health interventions have shown only moderate success, possibly from relying on knowledge extracted using limited types of statistical data analysis methods. Purpose: To explore if multimodal machine learning can enhance identifying predictive features from obesogenic environments and investigating complex disease or social patterns, using the Mexican National Health and Nutrition Survey. Methods: We grouped features into five data modalities corresponding to paediatric population exogenous factors, in two multimodal machine learning pipelines, against a unimodal early fusion baseline. The supervised pipeline employed four methods: Linear classifier with Elastic Net regularisation, k-Nearest Neighbour, Decision Tree, and Random Forest. The unsupervised pipeline used traditional methods with k-Means and hierarchical clustering, with the optimal number of clusters calculated to be k = 2. Results: The decision tree classifier in the supervised early fusion approach produced the best quantitative results. The top five most important features for classifying child or adolescent health were measures of an adult in the household, selected at random: BMI, obesity diagnosis, being single, seeking care at private healthcare, and having paid TV in the home. Unsupervised learning approaches varied in the optimal number of clusters but agreed on the importance of home environment features when analysing inter-cluster patterns.
Main findings from this study differed from previous studies using only traditional statistical methods on the same database. Notably, the BMI of a randomised adult within the household emerged as the most important feature, rather than maternal BMI, as reported in previous literature where unwanted cultural bias went undetected. Conclusion: Our general conclusion is that multimodal machine learning is a promising approach for comprehensively analysing obesogenic environments. The modalities allowed for a multimodal approach designed to critically analyse data signal strength and reveal sources of unwanted bias. In particular, it may aid in developing more effective public health policies to address the ongoing paediatric obesity epidemic in Mexico.
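
The abstract mentions an early fusion baseline and a k-Nearest Neighbour classifier without showing either. The sketch below illustrates the general idea only and is not the authors' pipeline: early fusion concatenates the features of all modalities into one vector before classification, whereas a per-modality view classifies each modality separately. The modality names (`household`, `media`), the toy values, the labels, and all function names are hypothetical, and only the Python standard library is used.

```python
from collections import Counter
from math import dist

# Hypothetical toy records: each modality maps to a list of numeric features.
# Modality names, values, and labels are invented for illustration only.
TRAIN = [
    ({"household": [31.0, 1.0], "media": [1.0]}, "overweight"),
    ({"household": [29.5, 1.0], "media": [1.0]}, "overweight"),
    ({"household": [22.0, 0.0], "media": [0.0]}, "healthy"),
    ({"household": [23.5, 0.0], "media": [1.0]}, "healthy"),
    ({"household": [21.0, 0.0], "media": [0.0]}, "healthy"),
]


def early_fuse(record, modalities):
    """Concatenate the feature lists of the chosen modalities into one vector."""
    fused = []
    for name in modalities:
        fused.extend(record[name])
    return fused


def knn_predict(train_vectors, train_labels, query, k=3):
    """Plain k-nearest-neighbour majority vote using Euclidean distance."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def predict_early_fusion(train, query, modalities, k=3):
    """Fuse all modalities first, then run a single k-NN on the fused vectors."""
    vectors = [early_fuse(record, modalities) for record, _ in train]
    labels = [label for _, label in train]
    return knn_predict(vectors, labels, early_fuse(query, modalities), k)


def predict_per_modality(train, query, modalities, k=3):
    """Run one k-NN per modality and return {modality: predicted label}."""
    labels = [label for _, label in train]
    return {
        name: knn_predict([record[name] for record, _ in train], labels, query[name], k)
        for name in modalities
    }
```

On this toy data, `predict_early_fusion` returns a single label for a query record, while `predict_per_modality` shows which modalities agree or disagree. Comparing the two is one simple way to see how much signal each modality carries. In practice, features on different scales would normally be standardised before computing distances; that step is omitted here for brevity.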