ORIGINAL RESEARCH article

Front. Mol. Biosci.
Sec. Structural Biology
Volume 11 - 2024 | doi: 10.3389/fmolb.2024.1393564
This article is part of the Research Topic Breakthroughs in Cryo-EM with Machine Learning and Artificial Intelligence

Towards Interpretable Cryo-EM: Disentangling Latent Spaces of Molecular Conformations

Provisionally accepted
  • 1 SLAC National Accelerator Laboratory, Stanford University, Menlo Park, California, United States
  • 2 University of Helsinki, Helsinki, Uusimaa, Finland
  • 3 University of California, Santa Barbara, Santa Barbara, California, United States

The final, formatted version of the article will be published soon.

    Molecules are essential building blocks of life, and their different conformations (i.e., shapes) crucially determine the functional role that they play in living organisms. Cryogenic Electron Microscopy (cryo-EM) allows for acquisition of large image datasets of individual molecules. Recent advances in computational cryo-EM have made it possible to learn latent variable models of conformation landscapes. However, interpreting these latent spaces remains a challenge, as their individual dimensions are often arbitrary. The key message of our work is that this interpretation challenge can be viewed as an Independent Component Analysis (ICA) problem where we seek models that have the property of identifiability. That is, they have an essentially unique solution, representing a conformational latent space that separates the different degrees of freedom a molecule is equipped with in nature. Thus, we aim to advance the computational field of cryo-EM beyond visualizations as we connect it with the theoretical framework of (nonlinear) ICA and discuss the need for identifiable models, improved metrics, and benchmarks. Moving forward, we propose future directions for enhancing the disentanglement of latent spaces in cryo-EM, refining evaluation metrics, and exploring techniques that leverage physics-based decoders of biomolecular systems. Moreover, we discuss how future technological developments in time-resolved single particle imaging may enable the application of nonlinear ICA models that can discover the true conformation changes of molecules in nature. The pursuit of interpretable conformational latent spaces will empower researchers to unravel complex biological processes and facilitate targeted interventions. This has significant implications for drug discovery and structural biology more broadly. More generally, latent variable models are deployed widely across many scientific disciplines. Thus, the argument we present in this work has much broader applications in AI for science if we want to move from impressive nonlinear neural network models to mathematically grounded methods that can help us learn something new about nature.
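
    As a rough illustration of what "identifiability" means in the simplest setting, the sketch below runs a toy linear ICA on two synthetic sources. It is not the authors' method and it is not nonlinear ICA; the function names, the mixing coefficients, the uniform source distribution and the kurtosis-based contrast are all assumptions chosen for illustration. Two independent non-Gaussian sources are mixed linearly, the mixtures are whitened, and a rotation is searched that makes the outputs maximally non-Gaussian. The recovered components should match the true sources only up to permutation and sign, which is the sense in which the solution is "essentially unique".

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def abs_corr(a, b):
    """Absolute Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / math.sqrt(va * vb)


def excess_kurtosis(v):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    m = mean(v)
    var = mean([(x - m) ** 2 for x in v])
    return mean([(x - m) ** 4 for x in v]) / var ** 2 - 3.0


def rotate(u, v, angle):
    """Apply a 2D rotation to the pair of signals (u, v)."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * a + s * b for a, b in zip(u, v)],
            [-s * a + c * b for a, b in zip(u, v)])


def whiten(x1, x2):
    """Center, decorrelate, and rescale two signals to unit variance."""
    m1, m2 = mean(x1), mean(x2)
    x1 = [a - m1 for a in x1]
    x2 = [b - m2 for b in x2]
    a = mean([p * p for p in x1])
    b = mean([p * q for p, q in zip(x1, x2)])
    c = mean([q * q for q in x2])
    # Rotation angle that diagonalizes the 2x2 covariance [[a, b], [b, c]].
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    p1, p2 = rotate(x1, x2, theta)
    s1 = math.sqrt(mean([p * p for p in p1]))
    s2 = math.sqrt(mean([p * p for p in p2]))
    return [p / s1 for p in p1], [p / s2 for p in p2]


def toy_linear_ica(n=2000, seed=0):
    """Return |correlation| tables (rows: signals, columns: true sources)
    for the raw mixtures and for the ICA-recovered components."""
    rng = random.Random(seed)
    # Assumption: two independent, uniformly distributed (non-Gaussian) sources.
    s1 = [rng.uniform(-1, 1) for _ in range(n)]
    s2 = [rng.uniform(-1, 1) for _ in range(n)]
    # Assumption: an arbitrary, hypothetical linear mixing of the sources.
    x1 = [1.0 * a + 0.6 * b for a, b in zip(s1, s2)]
    x2 = [0.4 * a + 1.0 * b for a, b in zip(s1, s2)]
    w1, w2 = whiten(x1, x2)
    # After whitening only a rotation is unknown; search it on a 1-degree grid
    # and keep the rotation whose outputs are most non-Gaussian.
    best = None
    for k in range(90):
        y1, y2 = rotate(w1, w2, math.radians(k))
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    _, y1, y2 = best
    mixed = [[abs_corr(x, s) for s in (s1, s2)] for x in (x1, x2)]
    recovered = [[abs_corr(y, s) for s in (s1, s2)] for y in (y1, y2)]
    return mixed, recovered


if __name__ == "__main__":
    mixed, recovered = toy_linear_ica()
    print("mixtures vs. sources :", mixed)
    print("recovered vs. sources:", recovered)
```

    In this toy setting each recovered component should correlate strongly with exactly one source, while each raw mixture correlates with both. Which component matches which source, and with what sign, is not determined. Had the sources been Gaussian, every rotation would score the same and the model would not be identifiable.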

    Keywords: cryo-EM, machine learning, ICA, AI for Science, Disentanglement, Physics-based models

    Received: 29 Feb 2024; Accepted: 22 May 2024.

    Copyright: © 2024 Klindt, Hyvärinen, Levy, Miolane and Poitevin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    David Klindt, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA 94025, United States
    Frédéric Poitevin, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA 94025, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.