ORIGINAL RESEARCH article
Front. Environ. Sci.
Sec. Freshwater Science
Estimating phytoplankton group abundances in an agricultural pond from in situ sensed data with machine learning: Use of the SHAP analysis for ecological assessments
Provisionally accepted
- 1USDA-ARS Environmental Microbial & Food Safety Laboratory, Beltsville, United States
- 2University of Maryland, College Park, United States
- 3Oak Ridge Institute for Science and Education, Oak Ridge, United States
- 4US Food and Drug Administration Human Foods Program, College Park, United States
Phytoplankton are a crucial component of aquatic ecosystems and are closely tied to water quality. Direct counts of phytoplankton abundances are resource-demanding, but indirect estimation of those abundances has proven beneficial in ecological assessments of waterbodies. Agricultural ponds serve as important water sources for irrigation, recreation, processing of harvested agricultural products, and animal watering, among other purposes. This work examined the use of random forest (RF), coupled with a Shapley Additive exPlanations (SHAP) analysis, to estimate the abundances of phytoplankton groups in an agricultural pond in Maryland. In situ sensing (ISS) of water quality parameters on a permanent sampling grid during the produce growing season provided dissolved oxygen, pH, specific conductance, chlorophyll a, phycocyanin, fluorescent dissolved organic matter, and turbidity measurements. Phytoplankton abundances were determined using a modified Utermöhl microscopy method. With ISS predictors, the coefficient of determination averaged 0.81 for the training datasets and 0.74 for the testing datasets, and varied from 0.50 to 0.88. The explanatory analysis using the SHAP method revealed that the most influential predictors, identified as the top three for each phytoplankton taxonomic group, were specific conductance, fluorescent dissolved organic matter, and chlorophyll a. The RF analysis provided good estimates of the abundance of the phytoplankton community in agricultural pond waters, and the addition of the SHAP analysis allowed for an exploration of which factors were most critical in supporting the phytoplankton groups observed.
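To illustrate the idea behind the SHAP analysis mentioned above, the following is a minimal sketch of exact Shapley value attribution for a single prediction. It is not the authors' pipeline: `toy_model` is a hypothetical stand-in for a fitted regressor (not a random forest), the three feature names are borrowed from the abstract only for readability, and all numbers are made up for illustration. The sketch uses a small background dataset to define the "feature absent" case, which is one common convention and is assumed here.

```python
from itertools import combinations
from math import factorial

# Feature names borrowed from the abstract for readability only.
FEATURES = ["specific_conductance", "fdom", "chlorophyll_a"]


def toy_model(x):
    """Hypothetical stand-in for a fitted regressor (NOT the article's RF)."""
    sc, fdom, chl = x
    return 2.0 * sc + 0.5 * fdom + 3.0 * chl + 0.1 * sc * chl


def coalition_value(model, x, background, subset):
    """Mean prediction when features in `subset` take the values in x
    and all other features take the values of each background row."""
    total = 0.0
    for row in background:
        mixed = [x[i] if i in subset else row[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition of features.
    Cost grows as 2**n, so this is only practical for a few features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


# Made-up illustrative values, not measurements from the study.
background = [[1.0, 2.0, 0.5], [1.5, 1.0, 1.0], [0.5, 3.0, 1.5]]
x = [2.0, 1.5, 2.0]

phi = shapley_values(toy_model, x, background)
baseline = coalition_value(toy_model, x, background, set())

for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.3f}")
print(f"baseline + sum(phi) = {baseline + sum(phi):.3f}")
print(f"model prediction    = {toy_model(x):.3f}")
```

The per-feature contributions sum to the difference between the prediction and the average background prediction, which is the additivity property that makes ranking predictors by mean absolute contribution meaningful. For tree ensembles, dedicated SHAP implementations avoid the exponential enumeration used in this sketch.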
Keywords: agricultural waters, random forest, Shapley additive explanations, phytoplankton community composition, water quality
Received: 30 Jul 2025; Accepted: 30 Jan 2026.
Copyright: © 2026 Smith, Hong, Wolny, Stocker and Pachepsky. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Jaclyn E Smith
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
