AUTHOR=Jundt EvaLynn, Hu Xinping
TITLE=Statistical models for the estimation of pH and aragonite saturation state in the Northwestern Gulf of Mexico
JOURNAL=Frontiers in Marine Science
VOLUME=Volume 12 - 2025
YEAR=2025
URL=https://www.frontiersin.org/journals/marine-science/articles/10.3389/fmars.2025.1621280
DOI=10.3389/fmars.2025.1621280
ISSN=2296-7745
ABSTRACT=Historical water column carbonate measurements have been scarce in the Gulf of Mexico (GOM); thus, the progression of ocean acidification (OA) is still poorly understood, especially in subsurface waters. In the literature, statistical models, such as multiple linear regression (MLR), have been created to fill OA data gaps in different ocean regions. Additionally, machine learning techniques such as random forest (RF) have been used to build models for both the open ocean and marginal seas. However, there is no statistical model for subsurface carbonate chemistry parameters (i.e., pH and aragonite saturation state, or ΩArag) in the GOM. By creating models with various architectures built upon the relationships between commonly measured hydrographic properties (e.g., salinity, temperature, pressure, and dissolved oxygen, or DO) and carbonate chemistry parameters (e.g., pH and ΩArag), data gaps can potentially be filled in areas with insufficient sampling coverage. In this study, two statistical models were created for pH and ΩArag in the northwestern GOM (nwGOM) within the range of 27.1–29.0°N and 89–95.1°W using both MLR and RF methods. The calibration data used in the models include salinity, temperature, pressure, and DO collected from seven cruises that took place between July 2007 and February 2023. The models predict ΩArag with R² ≥ 0.94 and mean square error (MSE) ≤ 0.04, and pH with R² ≥ 0.93 and MSE ≤ 0.0005. The MLR and RF models perform similarly.
These models are valuable tools for reconstructing pH and ΩArag data where direct chemical observations are absent but hydrographic information is available in the nwGOM. Nevertheless, potential shifts in circulation, water mass changes, and accumulation of anthropogenic CO2 need to be accounted for to improve and revise these models in the future.
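The MLR approach summarized in the abstract can be illustrated with a minimal sketch: regress pH on salinity, temperature, pressure, and DO, then score held-out samples with R² and MSE. Everything below is an assumption made for illustration and not the authors' implementation: the data are synthetic, the "true" coefficients and noise level are invented, the predictors are drawn independently (real hydrographic variables are correlated), and the helper names (`standardize`, `fit_mlr`, `predict`, `synthetic_sample`) are hypothetical. The RF variant is not shown.

```python
import random

def standardize(X, means=None, stds=None):
    """Center and scale each column; reuse calibration statistics when given."""
    cols = list(zip(*X))
    if means is None:
        means = [sum(c) / len(c) for c in cols]
        stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
                for c, m in zip(cols, means)]
    Z = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return Z, means, stds

def fit_mlr(X, y):
    """Ordinary least squares with intercept, solving (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[c])]
    return [A[i][k] for i in range(k)]  # [intercept, b_S, b_T, b_P, b_DO]

def predict(coef, X):
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], r)) for r in X]

def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def synthetic_sample(rng):
    """One made-up bottle sample: (salinity, temperature, pressure, DO) -> pH."""
    s = rng.uniform(35.0, 36.5)     # salinity
    t = rng.uniform(5.0, 30.0)      # temperature, deg C
    p = rng.uniform(0.0, 1500.0)    # pressure, dbar
    do = rng.uniform(100.0, 220.0)  # dissolved oxygen, umol/kg
    # Invented linear relationship plus noise; NOT fitted to any real data.
    ph = (8.05 + 0.01 * (s - 36.0) + 0.002 * (t - 20.0)
          - 0.00002 * p + 0.0005 * (do - 200.0) + rng.gauss(0.0, 0.005))
    return [s, t, p, do], ph

rng = random.Random(0)
samples = [synthetic_sample(rng) for _ in range(400)]
X = [x for x, _ in samples]
y = [ph for _, ph in samples]

# Calibrate on the first 300 samples, evaluate on the remaining 100.
X_cal, y_cal = X[:300], y[:300]
X_val, y_val = X[300:], y[300:]
Z_cal, means, stds = standardize(X_cal)
Z_val, _, _ = standardize(X_val, means, stds)

coef = fit_mlr(Z_cal, y_cal)
y_hat = predict(coef, Z_val)
val_r2 = r2(y_val, y_hat)
val_mse = mse(y_val, y_hat)
print("coefficients:", coef)
print("validation R2:", val_r2, "MSE:", val_mse)
```

Standardizing the predictors with calibration-set statistics keeps the normal equations well conditioned (salinity varies little relative to its mean) and mirrors the caveat in the abstract: a model calibrated on one set of conditions is only as good as the relationships that held when the calibration data were collected.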