Editorial: Machine Learning in Biomolecular Simulations
- 1 Schmid College of Science and Technology, Chapman University, United States
- 2 Department of Biomedical and Pharmaceutical Sciences, School of Pharmacy, Chapman University, United States
- 3 University of Chemistry and Technology in Prague, Czechia
- 4 University College London, United Kingdom
Interest in machine learning is growing in all fields of science, industry and business. This interest was not primarily triggered by new theoretical findings. In fact, the theoretical basis of most machine learning techniques, such as artificial neural networks, decision trees or kernel methods, has been known for a relatively long time. Instead, other factors have driven the recent boom of machine learning.
First, machine learning needs data to learn from. Huge data sets from the Internet, the Internet of Things, social networks, phones, wearable devices and other sources are now available; such data sets were not available a decade ago. Second, the recent wave of machine learning benefits from hardware advances, in particular from computing on graphics processing units (GPUs) and specialized hardware.
Biomolecular modeling and simulation is an ideal field for applying machine learning in the spirit of this recent boom. Biomolecular simulations produce large amounts of data in the form of trajectories that can be used to train machine learning algorithms. In addition, vast amounts of genomic data were critical in allowing AlphaFold to lead the field of de novo protein structure prediction in the most recent CASP round. Moreover, GPUs have been used routinely in biomolecular simulations for more than a decade to offload the most computationally demanding parts of the calculation.
This research topic collects eight innovative works showcasing the application of machine learning in biomolecular simulations and related fields. They demonstrate major machine learning approaches such as artificial neural networks, random forests and non-linear dimensionality reduction methods, applied to the analysis of trajectories, the acceleration of biomolecular simulations, the parametrization of force fields and other tasks.
Helfrecht and co-workers (Helfrecht et al., 2019) present an alternative to the classical definitions of structural motifs in proteins. Classical definitions of secondary and super-secondary structure are based on intuitive criteria, such as hydrogen bonds and dihedral angles, and have been widely used. However, they run into problems with borderline and partially disordered structures. This article presents an alternative based on machine learning, namely on the Probabilistic Analysis of Molecular Motifs (PAMM) algorithm previously developed in the group.
The article from Trapl and co-workers (Trapl et al., 2019) presents the program Anncolvar, which approximates a collective variable with a simple artificial neural network. The choice of collective variables is crucial for the convergence of the enhanced sampling algorithms that rely on them. Anncolvar is shown to be particularly useful for collective variables that cannot be calculated explicitly on the fly or that are computationally expensive.
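To make the idea of approximating a collective variable with a neural network more concrete, here is a minimal sketch under deliberately simple assumptions. It is not Anncolvar's code. The actual tool relies on an established deep learning library and works with real trajectories, and this sketch does neither. Here a one-hidden-layer network, written in plain NumPy, learns to reproduce a reference collective variable from flattened Cartesian coordinates. The random "frames", the choice of an interatomic distance as the target, the network size, the learning rate and the helper `predict_cv` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy data: 2000 "frames" of 3 atoms in 3D (not a real trajectory).
rng = np.random.default_rng(0)
frames = rng.normal(size=(2000, 3, 3))
X = frames.reshape(len(frames), -1)  # flatten to 9 Cartesian inputs per frame

# Reference collective variable: distance between atoms 0 and 1.
# It stands in for a CV that is expensive or impossible to compute on the fly.
y = np.linalg.norm(frames[:, 0] - frames[:, 1], axis=1, keepdims=True)

# Standardize inputs and target so plain gradient descent behaves well.
x_mean, x_std = X.mean(axis=0), X.std(axis=0)
y_mean, y_std = y.mean(), y.std()
Xn = (X - x_mean) / x_std
yn = (y - y_mean) / y_std

# One hidden tanh layer followed by a single linear output neuron.
n_hidden = 32
W1 = rng.normal(scale=0.3, size=(X.shape[1], n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.3, size=(n_hidden, 1))
b2 = np.zeros(1)


def forward(x):
    """Return hidden activations and network output for standardized inputs."""
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


initial_loss = mse(forward(Xn)[1], yn)

# Full-batch gradient descent on the mean squared error (manual backpropagation).
learning_rate = 0.05
for epoch in range(3000):
    h, pred = forward(Xn)
    grad_out = 2.0 * (pred - yn) / len(Xn)       # dLoss/dOutput
    grad_W2 = h.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_h = (grad_out @ W2.T) * (1.0 - h ** 2)  # back through tanh
    grad_W1 = Xn.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

final_loss = mse(forward(Xn)[1], yn)


def predict_cv(coords):
    """Hypothetical helper: approximate the CV for frames of shape (3, 3)."""
    x = (np.asarray(coords).reshape(-1, 9) - x_mean) / x_std
    return forward(x)[1] * y_std + y_mean


print(f"training MSE (standardized units): {initial_loss:.3f} -> {final_loss:.3f}")
```

The inputs and target are standardized so that simple full-batch gradient descent stays stable. Once trained, `predict_cv` is cheap to evaluate, which is the property that makes such an approximation attractive for biasing a simulation. In a real application, a portion of the frames would be held out to check that the network generalizes beyond the training data.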
Wang and co-workers (Wang et al., 2019) combined classical analysis with unsupervised and supervised machine learning methods (principal component analysis and random forests, respectively) to analyse protein dynamics. They analysed trajectories of β-lactamase, an enzyme linked to antibiotic resistance, simulated in multiple conformational states.
Intrinsically disordered proteins (IDPs) are a hot topic, given that about 10% of all proteins are disordered and about 40% of eukaryotic proteins have at least one long disordered loop. It has been shown that proteins can be functional despite lacking a stable conformation, which poses a new challenge for the analysis of their dynamics. Grazioli and co-workers (Grazioli et al., 2019) apply machine learning and network models to simulation trajectories of amyloid beta in its wild type and a medicinally relevant mutant. They show that machine learning analysis can explain the difference between the two variants, which was not possible with conventional trajectory analysis methods.
The article of Agajanian and co-workers (Agajanian et al., 2019) takes us further into bioinformatics. Recent applications of next-generation sequencing make it possible to identify the role of mutations associated with cancer. The authors integrated multiple machine learning approaches to classify mutations on the basis of nucleotide sequence. The approach is further illustrated by biomolecular simulations of cancer-associated protein kinases.
Tribello and Gasparotto (Tribello and Gasparotto, 2019) use unsupervised machine learning methods to analyse simulation trajectories. A trajectory of the C-terminal fragment of the immunoglobulin-binding domain B1 of streptococcal protein G was used as a model and analysed by a range of dimensionality reduction methods, most of them non-linear: principal component analysis, distance matching, Laplacian eigenmaps, Isomap, t-SNE and sketch-map. These methods are illustrated together with clustering methods, and the article gives an overview of all of them, discussing their advantages and disadvantages.
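Principal component analysis, the linear baseline among the methods listed above, is simple enough to sketch in a few lines. The example below is a minimal sketch that uses a synthetic trajectory rather than the protein G fragment analysed in the article. It centres the flattened coordinates, obtains the principal axes from a singular value decomposition and projects every frame onto the first two components. The function name `trajectory_pca` and the synthetic data are illustrative assumptions. For a real trajectory, the frames would first be superimposed on a reference structure to remove overall translation and rotation, a step this sketch skips.

```python
import numpy as np


def trajectory_pca(coords, n_components=2):
    """PCA of a trajectory given as an array of shape (n_frames, n_atoms, 3).

    Assumes frames are already superimposed on a reference structure,
    so overall translation and rotation have been removed.
    """
    X = np.asarray(coords, dtype=float).reshape(len(coords), -1)
    Xc = X - X.mean(axis=0)                   # centre each Cartesian coordinate
    # Rows of Vt are the principal axes; singular values give the variances.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = s ** 2 / (len(X) - 1)
    explained_ratio = variance / variance.sum()
    projection = Xc @ Vt[:n_components].T     # low-dimensional coordinates
    return projection, explained_ratio


# Synthetic trajectory: 10 atoms whose motion is dominated by one collective mode.
rng = np.random.default_rng(1)
n_frames, n_atoms = 500, 10
mode = rng.normal(size=(n_atoms, 3))
mode /= np.linalg.norm(mode)
amplitude = np.sin(np.linspace(0, 4 * np.pi, n_frames))  # slow oscillation
reference = rng.normal(size=(n_atoms, 3))
coords = (reference
          + 3.0 * amplitude[:, None, None] * mode
          + 0.05 * rng.normal(size=(n_frames, n_atoms, 3)))

proj, ratio = trajectory_pca(coords)
print("projection shape:", proj.shape)
print("variance explained by PC1:", round(float(ratio[0]), 3))
```

Because the synthetic motion is built around a single large-amplitude mode plus small noise, the first component captures nearly all of the variance. The non-linear methods in the list are needed precisely when the relevant motions do not lie along such straight directions in coordinate space.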
The kinetics of drug unbinding has recently become as important as, or even more important than, binding thermodynamics as a parameter distinguishing good from bad compounds in drug design. The article of Kokh and co-workers (Kokh et al., 2019) addresses this problem with machine learning. Several trajectories of spontaneous drug binding are available in the literature, but drug unbinding is several orders of magnitude slower and currently cannot be simulated without enhanced sampling. The authors analysed a series of trajectories from the enhanced sampling method Random Acceleration Molecular Dynamics (RAMD), in particular its variant designed for simulating drug unbinding kinetics. The approach was tested on a series of heat shock protein 90 ligands whose unbinding rates differ by four orders of magnitude, and excellent agreement with experiment was obtained for most classes of compounds.
A growing number of studies indicate that molecular mechanics potentials (force fields) developed for compactly folded proteins may fail to model unfolded proteins and, especially, IDPs. This motivated Demerdash and co-workers (Demerdash et al., 2019) to optimize a force field for IDPs on the basis of small-angle X-ray and neutron scattering data, using iterative rounds of molecular dynamics simulation and comparison with the experimental data. The approach was demonstrated on three IDPs.
We believe that the papers included in this research topic demonstrate the great potential of machine learning in all fields pertaining to biomolecular modeling and simulation, including improving the accuracy of models, analysing molecular simulations and providing effective variables to enhance sampling. With this research topic, Frontiers in Molecular Biosciences aspires to become a key forum for publishing approaches that combine machine learning with biomolecular simulations and to further promote this multidisciplinary field.
Keywords: machine learning, Molecular modeling, Molecular dynamics computer simulation, intrinsically disordered protein, Ligand design, Collective variable, sampling enhancement, Nonlinear dimension reduction, secondary structure, beta-lactamase, Kinetics
Received: 12 Aug 2019;
Accepted: 16 Aug 2019.
Copyright: © 2019 Verkhivker, Spiwok and Gervasio. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Dr. Vojtech Spiwok, University of Chemistry and Technology in Prague, Prague, Czechia, email@example.com