AUTHOR=Paasche Hendrik, Dega Ségolène, Schrön Martin, Dietrich Peter
TITLE=Comprehensive data aleatory uncertainty propagation in regression random forest using a Monte Carlo approach: a struggle with incomplete data provision using a case study on probabilistic soil moisture regionalization
JOURNAL=Frontiers in Environmental Science
VOLUME=Volume 13 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/environmental-science/articles/10.3389/fenvs.2025.1599320
DOI=10.3389/fenvs.2025.1599320
ISSN=2296-665X
ABSTRACT=Data uncertainty never decreases along processing chains and should always be reported alongside processing results. In this study, we attempt to propagate aleatory data uncertainty through a multiple regression analysis to generate regionalized probabilistic soil moisture maps. We employ a non-parametric solution for multiple regression by means of random forests within a Monte Carlo framework. Our input data comprise sparse soil moisture data and spatially dense auxiliary soil and topographic maps, which serve as response and predictor variables in our regression model, respectively. While the methodology is technically straightforward, challenges arise due to incomplete communication of data uncertainty by data providers. This results in knowledge gaps that must be filled by subjective assumptions rather than data-driven insights. We highlight the issues that hinder straightforward uncertainty propagation, ultimately making our final uncertainty quantification of regionalized soil maps an optimistic estimate. Additionally, we sketch how existing uncertainty classification schemes could help data providers deliver quantified uncertainties with their data, enabling users to more accurately assess and report uncertainties in their derived data products.
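The Monte Carlo idea described in the abstract can be illustrated with a minimal, hedged sketch. It is not the authors' implementation: the forest is a deliberately tiny stand-in (bagged depth-one regression trees, each on one randomly chosen predictor) written with the Python standard library only, the uncertainties are assumed to be independent Gaussians with made-up standard deviations, and all names (`fit_stump`, `fit_forest`, `monte_carlo_regionalize`, `grid`, `sample_idx`) and data values are hypothetical. Each realization perturbs both the sparse response data and the dense predictor maps, refits the ensemble, and predicts every grid cell; the spread across realizations is the propagated uncertainty.

```python
import random
import statistics


def fit_stump(X, y, rng):
    """Fit a depth-one regression tree on one randomly chosen predictor."""
    j = rng.randrange(len(X[0]))
    best = None
    # Candidate thresholds: every unique value except the largest,
    # so both sides of the split are always non-empty.
    for t in sorted({row[j] for row in X})[:-1]:
        left = [yi for row, yi in zip(X, y) if row[j] <= t]
        right = [yi for row, yi in zip(X, y) if row[j] > t]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # predictor is constant in this resample: predict the mean
        m = statistics.fmean(y)
        return (j, 0.0, m, m)
    return (j, best[1], best[2], best[3])


def fit_forest(X, y, n_trees, rng):
    """Bagging: each stump is trained on a bootstrap resample of the data."""
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict(forest, row):
    """Average the leaf values of all stumps for one grid cell."""
    return statistics.fmean([ml if row[j] <= t else mr for j, t, ml, mr in forest])


def monte_carlo_regionalize(grid, sample_idx, y, sigma_x, sigma_y,
                            n_draws=100, n_trees=20, seed=0):
    """Propagate assumed Gaussian data uncertainty to per-cell predictions."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # One realization of the predictor maps and of the response data.
        grid_mc = [[rng.gauss(v, s) for v, s in zip(row, sigma_x)] for row in grid]
        y_mc = [rng.gauss(v, sigma_y) for v in y]
        forest = fit_forest([grid_mc[i] for i in sample_idx], y_mc, n_trees, rng)
        draws.append([predict(forest, row) for row in grid_mc])
    # Summarize each cell by mean and standard deviation over all realizations.
    return [(statistics.fmean(cell), statistics.stdev(cell)) for cell in zip(*draws)]


# Synthetic, made-up example: 30 grid cells with two normalized predictors,
# soil moisture "measured" at every second cell.
grid = [[i / 29, (i * 7 % 30) / 29] for i in range(30)]
sample_idx = list(range(0, 30, 2))
y = [0.15 + 0.2 * grid[i][0] for i in sample_idx]
result = monte_carlo_regionalize(grid, sample_idx, y,
                                 sigma_x=[0.05, 0.05], sigma_y=0.02)
```

Here `result` holds one `(mean, standard deviation)` pair per grid cell. In line with the abstract's caveat, the values chosen for `sigma_x` and `sigma_y` are exactly the kind of subjective assumption that has to stand in when data providers do not report quantified uncertainties, so the resulting spread should be read as an estimate conditional on those assumptions.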