
ORIGINAL RESEARCH article

Front. Astron. Space Sci.

Sec. Astrobiology

Volume 12 - 2025 | doi: 10.3389/fspas.2025.1651953

This article is part of the Research Topic "Machine Learning Applications in the Search for Life Beyond Earth".

Local-NPDR: A Novel Variable Importance Method for Explainable Machine Learning and False Discovery Diagnosis for Ocean Worlds Biosignatures

Provisionally accepted
  • 1 Computer Science, University of Tulsa, Tulsa, OK, United States
  • 2 University of South Florida, Tampa, United States
  • 3 Stockton University, Galloway, United States
  • 4 NASA Goddard Space Flight Center, Greenbelt, United States

The final, formatted version of the article will be published soon.

Explainable machine learning (ML) is important for biosignature prediction on future astrobiology missions to minimize the risk of false positives due to geochemical biotic mimicry and false negatives due to environmental factors that mask biosignatures. ML models often use feature importance scores to provide insight into model prediction mechanisms by quantifying each variable's contribution to the prediction. Global variable importance methods aggregate information across all training samples and therefore do not explain the classification of a single sample. In contrast, local variable importance scores quantify the contribution of variables to the classification of a single sample and can therefore help explain why the sample was assigned to a certain class and diagnose whether the prediction is false. We present a new local variable importance method that handles nonlinearity and statistical interactions and includes penalized feature selection. Our approach is a local version of Nearest-neighbor Projected Distance Regression (NPDR) feature selection. We evaluate local-NPDR on complex simulated data and on real data from a study of carbon and oxygen isotopic biosignatures using laboratory-generated ocean world analogue brines. We compare the ability of local-NPDR to differentiate between true and false predictions with that of other common local importance methods. To diagnose individual false predictions, we use the concordance between global- and local-NPDR scores, which allows the mechanisms of a true or false prediction to be explained. We illustrate the capacity of local-NPDR to integrate scientific explanations of single-sample ML predictions to support a more comprehensive framework for biosignature detection.

Keywords: Explainable Machine Learning, Biosignature detection, Local Importance Scores, Ocean worlds, Biotic Mimicry

Received: 22 Jun 2025; Accepted: 13 Aug 2025.

Copyright: © 2025 Clough, Major, Seyler, Da Poian, Theiling and McKinney. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Brett A McKinney, Computer Science, University of Tulsa, Tulsa, 74014, OK, United States

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.