HEMNMA-3D: Cryo Electron Tomography Method Based on Normal Mode Analysis to Study Continuous Conformational Variability of Macromolecular Complexes

Cryogenic electron tomography (cryo-ET) allows structural determination of biomolecules in their native environment (in situ). Its potential of providing information on the dynamics of macromolecular complexes in cells is still largely unexploited, due to the challenges of the data analysis. The crowded cell environment and continuous conformational changes of complexes make difficult disentangling the data heterogeneity. We present HEMNMA-3D, which is, to the best of our knowledge, the first method for analyzing cryo electron subtomograms in terms of continuous conformational changes of complexes. HEMNMA-3D uses a combination of elastic and rigid-body 3D-to-3D iterative alignments of a flexible 3D reference (atomic structure or electron microscopy density map) to match the conformation, orientation, and position of the complex in each subtomogram. The elastic matching combines molecular mechanics simulation (Normal Mode Analysis of the 3D reference) and experimental, subtomogram data analysis. The rigid-body alignment includes compensation for the missing wedge, due to the limited tilt angle of cryo-ET. The conformational parameters (amplitudes of normal modes) of the complexes in subtomograms obtained through the alignment are processed to visualize the distribution of conformations in a space of lower dimension (typically, 2D or 3D) referred to as space of conformations. This allows a visually interpretable insight into the dynamics of the complexes, by calculating 3D averages of subtomograms with similar conformations from selected (densest) regions and by recording movies of the 3D reference's displacement along selected trajectories through the densest regions. We describe HEMNMA-3D and show its validation using synthetic datasets. We apply HEMNMA-3D to an experimental dataset describing in situ nucleosome conformational variability. HEMNMA-3D software is available freely (open-source) as part of ContinuousFlex plugin of Scipion V3.0 (http://scipion.i2pc.es).

Cryogenic electron tomography (cryo-ET) allows structural determination of biomolecules in their native environment (in situ). Its potential of providing information on the dynamics of macromolecular complexes in cells is still largely unexploited, due to the challenges of the data analysis. The crowded cell environment and continuous conformational changes of complexes make difficult disentangling the data heterogeneity. We present HEMNMA-3D, which is, to the best of our knowledge, the first method for analyzing cryo electron subtomograms in terms of continuous conformational changes of complexes. HEMNMA-3D uses a combination of elastic and rigid-body 3D-to-3D iterative alignments of a flexible 3D reference (atomic structure or electron microscopy density map) to match the conformation, orientation, and position of the complex in each subtomogram. The elastic matching combines molecular mechanics simulation (Normal Mode Analysis of the 3D reference) and experimental, subtomogram data analysis. The rigid-body alignment includes compensation for the missing wedge, due to the limited tilt angle of cryo-ET. The conformational parameters (amplitudes of normal modes) of the complexes in subtomograms obtained through the alignment are processed to visualize the distribution of conformations in a space of lower dimension (typically, 2D or 3D) referred to as space of conformations. This allows a visually interpretable insight into the dynamics of the complexes, by calculating 3D averages of subtomograms with similar conformations from selected (densest) regions and by recording movies of the 3D reference's displacement along selected trajectories through the densest regions. We describe HEMNMA-3D and show its validation using synthetic datasets. We apply HEMNMA-3D to an experimental dataset describing in situ nucleosome conformational variability. HEMNMA-3D software is available freely (open-source) as part of ContinuousFlex plugin of Scipion V3.0 (http://scipion.i2pc.es).

INTRODUCTION
Cryogenic electron microscopy (cryo-EM) image collection and analysis technique referred to as single-particle analysis (SPA) allows near-atomic structural resolution of purified biomolecular complexes (in vitro). It is based on the principle of reconstructing a three dimensional (3D) structure from two dimensional (2D) parallel-beam projection images of vitrified specimens containing many copies of the same macromolecular complex at unknown orientations and positions. The 3D reconstruction requires extracting macromolecular complexes (particles) from the collected images into individual (single-particle) images and determining the particle orientation and position in every single-particle image. On the other hand, cryogenic electron tomography (cryo-ET) is gaining popularity for studying biomolecular complexes in their native environments (in situ). Cryo-ET requires the acquisition of multiple 2D projection images of the specimen in a range of orientations. In most practices, the specimen is physically rotated around a single axis (perpendicular to the electron beam) inside the cryo electron microscope. An image is collected at each tilting angle in a specific range (e.g., −60 to 60 • with a step of 1 • ), yielding a tilt series representing 2D projections of the specimen. The tilt series is then used to computationally reconstruct a 3D volume called tomogram. A tomographic reconstruction typically contains hundreds of copies of a target biomolecular complex at unknown orientations and positions. These copies are then identified and extracted into individual (single-particle) volumes called subtomograms, either manually or semi-automatically (via template matching methods). Subtomograms suffer from a low signal-to-noise ratio (SNR), which is due to exposing the sample to a low electron dose during data acquisition in order to preserve the fragile biological structure. Additionally, subtomograms suffer from the so-called missing wedge artifacts, which are due to inability to include in the 3D reconstruction the images from all orientations (the maximum tilt angle in the microscope is usually limited to ± 60 • ). The missing wedge artifacts are often observed as elongation along the beam axis, blurring, and distracting caustics in the subtomograms. Due to the low SNR and the missing wedge artifacts, cryo-ET data processing is mainly based on rigid-body aligning and averaging many subtomograms to enhance the data quality and reveal the targeted biomolecular structure (Leigh et al., 2019).
The primary technique for macromolecular structural determination is so-called subtomogram averaging (StA), in which subtomograms are classified, rigid-body aligned and averaged into 3D density maps iteratively (Mahamid et al., 2016;Schur et al., 2016;Wan and Briggs, 2016;Albert et al., 2017;Böck et al., 2017;Bykov et al., 2017;Pfeffer et al., 2017;Riedel et al., 2017;Wan et al., 2017;Davies et al., 2018;Guo et al., 2018;Hutchings et al., 2018;Kovtun et al., 2018;Mosalaganti et al., 2018;Park et al., 2018;Kaplan et al., 2019;Rapisarda et al., 2019). However, with recent instrumentation and software development, more research moves in the direction of studying single-particle subtomograms individually (with no or a minimum of averaging) by developing new methods for denoising, missing wedge correction, and 3D reconstruction (Zhang and Ren, 2012;Moebel and Kervrann, 2020;Zhai et al., 2020). Biomolecular complexes are not rigid but flexible entities with gradual (continuous) conformational transitions, and this flexibility is usually referred to as continuous conformational variability. If not properly taken into account, conformational heterogeneity limits the resolution of the resulting 3D structure. However, SPA research in the last decade has shown that disentangling the different conformations and identifying the conformational transitions from heterogeneous samples is valuable to study molecular mechanisms of action of complexes (Dashti et al., 2014;Jin et al., 2014;Zhou et al., 2015;Abeyrathne et al., 2016;Banerjee et al., 2016;Haselbach et al., 2018). The majority of available SPA computational methods rely on optimized biochemical specimen preparation protocols and data classification, and simplify the problem of conformational heterogeneity by assuming that the data can be classified into a small number of different conformations (Penczek et al., 2006(Penczek et al., , 2011Fu et al., 2007;Elad et al., 2008;Scheres, 2012;Lyumkis et al., 2013). However, some SPA methods explicitly take into account continuous conformational variability and aim at determining the full conformational distribution (Dashti et al., 2014;Jin et al., 2014;Sorzano et al., 2014;Katsevich et al., 2015;Tagare et al., 2015;Frank and Ourmazd, 2016;Andén and Singer, 2018;Harastani et al., 2020). They represent images in a lowdimensional space, referred to as space of conformations or energy landscape, and allow a 3D visualization of conformational changes along trajectories in this space. For more information on SPA methods for continuous conformational variability analysis, the reader is referred to the recent reviews by Jonić (2017) and Sorzano et al. (2019).
Methods reported to deal with cryo-ET data heterogeneity are based on rigid-body alignment and can be classified into (i) post-alignment classification approaches, and (ii) simultaneous alignment and classification approaches (Förster et al., 2008;Scheres et al., 2009;Stölken et al., 2011;Xu et al., 2012;Chen et al., 2014;Bharat and Scheres, 2016;Himes and Zhang, 2018). In the first family, the starting point is usually the covariance matrix representing the similarities of each pair of aligned subtomograms. The covariance matrix serves as a basis for a classification technique with some variants including dimensionality reduction. The second family of methods is based on competitive alignment. An example of the competitive alignment is a multireference alignment in which a subtomogram is compared with a set of different references provided by an expert user based on a prior knowledge and, then, attributed to the reference that yields the highest similarity score. Another example is maximum-likelihoodbased alignment, where each subtomogram contributes to all references with a probability. The main drawback of the postalignment classification approaches is that the classification is heavily dependent on the alignment quality that degenerates with broadly heterogeneous specimens. The main drawback of the simultaneous alignment and classification methods is that the number of classes must be decided and set prior to the use of the methods. Besides, the methods that require prior knowledge of the specimen's anticipated conformations are prone to overfitting and data misinterpretation. Finally, as macromolecular complexes are not rigid but flexible entities with continuous conformational transitions, particles assigned to the same class will rarely, if ever, have perfectly identical conformations. For more information regarding the classification based techniques for cryo-ET conformational heterogeneity, the reader is referred to a recent review on the available techniques by Castaño-Díez and Zanetti (2019).
Existing multivariate statistical analysis techniques adapted to cryo-ET analysis of continuous flexibility of particular systems have been used previously (e.g., Mattei et al., 2016). However, to the best of our knowledge, no method is currently available that has been specifically designed for cryo-ET analysis of continuous conformational variability of a general-case macromolecular complex. In this article, we present one such method, named HEMNMA-3D, which allows analyzing continuous conformational variability of macromolecular complexes by cryo-ET. It is inspired by HEMNMA, a method for continuous conformational variability analysis in SPA (Jin et al., 2014;Sorzano et al., 2014;Harastani et al., 2020). HEMNMA interprets the conformation in each cryo-EM single-particle image by comparing this image with 2D projections of a 3D reference (an atomic structure or a density map) deformed elastically using normal modes. Normal Mode Analysis (NMA) is a method for molecular mechanics simulation. One of its main applications is elastic deformation of an existing atomic structure of one conformation to fit an electron microscopy density map (EM map) of a different conformation of the same macromolecule, which is usually known as normal mode flexible fitting and allows obtaining atomic resolution models for the EM map (Tama et al., 2004a,b). However, as HEMNMA, HEMNMA-3D can use normal modes of an atomic structure or a density map. As in the case of the reference density-map structure in HEMNMA, HEMNMA-3D converts the density map into a collection of 3D Gaussian functions, referred to as pseudoatoms, and computes normal modes of this pseudoatomic structure, following the procedures described in Nogales-Cadenas et al. (2013), Jin et al. (2014), and Jonić and Sorzano (2016). HEMNMA-3D uses the atomic or pseudoatomic normal modes to elastically deform the 3D reference to match the conformation of the complex in the given series of subtomograms (3D data) (see Figure 1). More precisely, HEMNMA-3D uses a combination of elastic (based on normal modes) and rigid-body 3D-to-3D iterative alignments of the 3D reference to match the conformation, orientation, and position of the complex in each subtomogram, and includes compensation for the missing wedge. The conformational parameters (amplitudes of normal modes) of the complexes in subtomograms obtained through the alignment are then processed to visualize the distribution of conformations in a space of lower dimension (typically, 2D or 3D) referred to as space of conformations. This space allows a visually interpretable insight into the dynamics of complexes, by calculating 3D averages of subtomograms with similar conformations from selected (densest) regions and by recording movies of the 3D reference's displacement along selected trajectories in the densest regions.
In this article, we describe HEMNMA-3D and show its validation using synthetic datasets. Additionally, we show an application of HEMNMA-3D with an experimentally obtained dataset for in situ nucleosome conformational variability. HEMNMA-3D software is available freely (opensource) as part of ContinuousFlex plugin of Scipion V3.0 (http://scipion.i2pc.es). The article is organized as follows: section 2 describes building blocks of HEMNMA-3D workflow, notably, in 2.5 we describe the software developed for this method. In section 3, we present (i) the process of synthesis of test datasets and HEMNMA-3D validation using these synthetic test data, and (ii) use of HEMNMA-3D with experimental, in situ nucleosome data, and we discuss these results. The conclusions are provided in section 4.

MATERIALS AND METHODS
The flowchart in Figure 2 describes the workflow of the proposed method, which was inspired by the workflow of HEMNMA (Jin et al., 2014;Harastani et al., 2020). A graphical summary of the method is presented in Figure 3. The workflow comprises the following steps: (1) Input: the input to the method are a reference structure and a set of subtomograms. In the case where the reference structure is a density map (a 3D volume, such as an EM map or a subtomogram average), a conversion to 3D Gaussian functions (pseudoatoms) takes place. (2) Normal mode analysis of the reference atomic structure or the reference pseudoatomic structure (obtained by converting the reference density map into 3D Gaussian functions in the previous step). (3) Combined iterative elastic and rigid-body 3D-to-3D alignment of the reference structure with each input subtomogram independently from other subtomograms, with missing wedge compensation.
(4) Visualization of the computed conformations, after projecting the conformational parameters obtained for all subtomograms onto a low-dimensional space. In the remaining part of this section, we describe these steps in more detail. Please note that the first two steps of the workflow are exactly as those of HEMNMA and were thoroughly presented, tested and discussed in our previously published works on HEMNMA, its tools and applications (Nogales-Cadenas et al., 2013;Jin et al., 2014;Sorzano et al., 2014;Jonić and Sorzano, 2016;Harastani et al., 2020). However, for completeness of the present article, we here recall their basic principles. We close this section by a brief description of the software implemented for HEMNMA-3D.

Input Reference and Conversion of Reference Density Maps Into Pseudoatoms
A reference structure of the molecule targeted in the subtomograms can be used in the form of an atomic model (PDB formatted files) or a density map, such as an EM map (SPA reconstruction) or a subtomogram average (obtained using classical StA without taking into account conformational heterogeneity). Although our method can be used with both atomic and density-map reference structures, one should prefer the use of a reference density map from the data at hand, if it can be obtained. If a reference density map is used, it must be FIGURE 1 | The general scheme of elastic deforming of a reference structure (atomic or pseudoatomic) using normal modes to fit a density map (e.g., an EM map or a subtomogram average).
converted into a collection of Gaussian functions (pseudoatoms) with a carefully selected standard deviation (pseudoatom size, whose default value is 1 voxel Nogales-Cadenas et al., 2013;Jonić and Sorzano, 2016). The pseudoatom size should lead to a structure (called pseudoatomic structure) that, converted back to a density map, approximates the input density map with a small error (given a target approximation error, whose default value is 5% Nogales-Cadenas et al., 2013;Jonić and Sorzano, 2016). Optionally, a mask on the density map can be used prior to the conversion into pseudoatoms (e.g., a spherical binary mask of a given radius) to reduce background noise. Such masks may also be useful if applied on input cryo-ET subtomograms to maximize the chance of having a single molecular complex in each subtomogram (Preprocessing block in the workflow in Figure 2A).

Normal Mode Analysis
This step involves computing normal modes of a reference atomic or pseudoatomic structure, for the 3D-to-3D elastic alignment in the next step. The computation of normal modes is based on the elastic network model (Tirion, 1996;Tama et al., 2002) by representing the interaction between the (pseudo-)atoms as if they are locally connected by elastic springs (within a cutoff distance). Normal Mode Analysis requires the diagonalization of a 3N × 3N matrix of second derivatives of the potential energy (Hessian matrix), where N is the number of nodes in the elastic network model determined by the total number of atoms (or pseudoatoms) in the input reference. In the case of atomic structures, we use the rotation-translation block (RTB) method, which divides the structure into blocks (one or a few consecutive residues per block) whose rotations and translations are considered rather than all degrees of freedom for all atoms (Durand et al., 1994;Tama et al., 2000). Since the RTB method reduces the basis for Hessian diagonalization, it allows fast computing of normal modes. Since pseudoatomic structures usually contain fewer nodes (pseudoatoms) than atomic structures, normal modes can be obtained by a direct diagonalization of the 3N × 3N Hessian, which is referred to as the Cartesian method. As in the case of HEMNMA, we here use the RTB and Cartesian method implementations of Tama et al. (2002) and Suhre and Sanejouand (2004), respectively. Larger values of the interaction cutoff distance (the distance below which atoms or pseudoatoms do not interact) lead to more rigid motions. The atomic interaction cutoff distance may be set manually (by default 8 Å) and the pseudoatomic cutoff distance is recommended to be computed automatically based on the distribution of the pseudoatomic pairwise distances (e.g., as the value below which is a given percentage of all distances as in Nogales-Cadenas et al., 2013;Jin et al., 2014;Jonić and Sorzano, 2016;Harastani et al., 2020). The modes are computed along with their respective collectivity degrees, which count the number of atoms or pseudoatoms affected by the mode as in Brüschweiler (1995). To allow faster data analysis and avoid noise overfitting in the 3D-to-3D elastic alignment in the next step, we select a subset of normal modes (usually, less than 10) with lowest frequencies and highest collectivities, as previously described (Jin  Sorzano et al., 2014;Harastani et al., 2020). Lowfrequency high-collectivity normal modes have been shown to be relevant to functional conformational changes (Tama and Sanejouand, 2001;Delarue and Dumas, 2004;Wang et al., 2004;Ma, 2005;Suhre et al., 2006;Tama and Brooks, 2006). The first six (lowest-frequency) normal modes are related to rigid-body transformations and are thus not used for the 3D-to-3D elastic alignment in the next step. The rigid-body 3D-to-3D alignment is done without using these rigid-body normal modes, as explained in the next paragraph. Additionally, a prior knowledge about the conformational transitions of the complex under study can be used to select the normal modes for the use in the next step. The HEMNMA-3D graphical interface helps the user decide which normal modes to select. The reader is referred to Ma (2005) and Tama and Brooks (2006) for reviews on the usefulness and limitations of NMA.

Combined Iterative Elastic and
Rigid-Body 3D-to-3D Alignment This step, represented in Figure 2B, is the backbone of the proposed method. It has been inspired by the combined iterative elastic and rigid-body 3D-to-3D alignment step of StructMap method , which was proposed for pairwise similarity analysis of SPA high-resolution EM maps (no missing wedge). In HEMNMA-3D proposed here, this step comprises simultaneous NMA-based elastic alignment (search for amplitudes of a linear combination of normal modes) and rigid-body alignment (search for orientation and position, meaning three Euler angles and x, y, and z shifts) of the reference structure with each given subtomogram. It refines the amplitudes of displacement along each used normal mode (elastic parameters) as well as the angles and shifts (rigid-body parameters) of the reference structure until the best match is obtained between this reference structure and the given subtomogram. The latter is achieved by maximizing the similarity between the subtomogram and the density volume from the elastically deformed, oriented and shifted reference, and includes missing wedge compensation. The missing wedge compensation is done by calculating the cross-correlation between the reference and subtomogram density maps only in the region of the Fourier space where the data can be trusted, i.e., by constraining the cross-correlation evaluation to the Fourier space region that excludes the missing wedge region (the region outside of the one specified by the tilt angle range, e.g., −60 to +60 • ). To maximize this constrained cross-correlation (CCC), we use a variant of Powell's UOBYQA method, which subjects the objective function to a trust-region radius (Berghen and Bersini, 2005). To control the elastic deformation with highly noisy data, the radius of the trust region is adjusted iteratively. The scaling factor of the initial trust-region radius is a parameter that controls the normal-mode amplitude search range and can be modified by the user. It should have a positive value and its default value of 1 produces good results in general. It may be increased (typically to a value between 1 and 2) or decreased (e.g., between 0.5 and 0.9), if expecting larger or smaller conformational changes, respectively. For each subtomogram, the normal mode amplitudes are initiated with zeros, meaning that the nondeformed reference is used in the first iteration. As the iterations evolve, the reference model is displaced with the new guesses of the normal mode displacement amplitudes, converted into a volume and rigid-body aligned with the subtomogram using the method of fast rotational matching. Fast rotational matching has been largely used for rigid-body fitting of atomic models to high-SNR consensus EM maps (Kovacs and Wriggers, 2002;Kovacs et al., 2003). It has been extended to alignment of noisy subtomograms in Chen et al. (2013) and this implementation is used in our work. At the end of each iteration, the CCC is found and fed to the numerical optimizer (Berghen and Bersini, 2005). The iterations repeat until the final value of the trust-region radius or the maximum number of iterations is reached.

Visualizing and Utilizing the Space of Conformations
The number of elastic alignment parameters (normal mode amplitudes) is determined by the number of selected normal modes for the 3D-to-3D elastic alignment. The ensemble of normal mode amplitudes (for all subtomograms) can be projected onto a lower-dimensional space, so-called conformational space, using a dimensionality reduction technique. Here, we use linear Principal Component Analysis (PCA) as it is the most widely known and intuitively clear dimensionality reduction method, but other dimension reduction methods could also be used (linear or nonlinear). The dimensionality reduction is usually performed to two or three dimensions, which allows a global data display and easier modeling of conformational changes. Each point in the conformational space represents a subtomogram and close points correspond to similar conformations in the subtomograms. The points that differ significantly from the remaining observations (too isolated, outlier points) may be excluded from the further analysis, by excluding the points below a certain p-value based on the Mahalanobis distance (the distance between each point and the whole distribution) (Mahalanobis, 1936). The excluded points can be explained by the fact that some orientations of the molecule combined with the missing wedge artifacts and the high noise make some volumes more difficult to align with the elastically deformed reference. After excluding such outlier points, the space of conformations can be analyzed to reveal molecular dynamics. This can be done by averaging subtomograms of similar conformations in the densest regions of the conformational space or by exploring the densest regions by fitting curves (approximation by line segments) through the data and displacing the reference structure along these curves (referred to as trajectories) to animate the motion along them. The 1-CCC color bar of the conformational space shows coloring the points according to the value of the CCC between the subtomograms and the density maps from the elastically deformed reference model. Subsequently, the colors provide the level of confidence in the obtained conformations. Those subtomograms in which we have less confidence (subtomograms with lower CCC values) than in the "consensus" observations (subtomograms with higher CCC values) can be eliminated from the group averages.

Averaging Subtomograms of Similar Conformations
Close points in the conformational space can be grouped, which results in grouping subtomograms of similar conformations and averaging them. Before computing group averages, the rigid-body alignment parameters found during the combined iterative elastic and rigid-body alignment are applied on the subtomograms. Optionally, before computing group averages, the missing-wedge Fourier space region of individual subtomograms may be filled in with the corresponding region of the global average computed from all subtomograms. A similar procedure of missing wedge filling of individual subtomograms is used in EMAN2 software package (Galaz-Montoya et al., 2015). The subtomogram averages obtained from the selected groups of subtomograms can be overlapped and compared to understand the conformational changes of the complex in the given set of subtomograms.

Animating Motions (Trajectories)
Distinct trajectories can be determined through the data in the conformational space, and animated to see the motion of the biomolecule while it is displaced along the trajectory. To animate a trajectory, several points (e.g., 10) along the trajectory should be mapped back to the original displacement space (e.g., using inverse PCA), resulting in elastic alignment parameters that can be used to deform the reference atomic or pseudoatomic structure. Concatenating and displaying the resulting structures can show a movie-like animation of the reference biomolecule traveling across the specified trajectory.

Software Implementation and Technical Details
The software of HEMNMA-3D method proposed here is freely available (open-source). It is a part of ContinuousFlex plugin for the open-source software Scipion3 (De la Rosa-Trevín et al., 2016). ContinuousFlex was introduced in Harastani et al. (2020) and also contains HEMNMA software. The software provides a graphical user interface (GUI) and is empowered with a C++ backend with a message passing interface (MPI) parallelization scheme to efficiently analyze large datasets (simultaneous analysis of N subtomograms using N computing threads). We tested the software on our local workstations and on supercomputer centers. On our local workstations (2.2 GHz Intel Xeon Silver 4214 CPU processors), the current implementation takes around 10 and 30 min to analyze a subtomogram of size 64 3 voxels with three and six normal modes, respectively. These times are reported for the two types of complexes used in this article, together with the number of normal modes used in these two cases. They are the average times required for all iterations of analyzing one subtomogram using a single computing thread while the different subtomograms are analyzed in parallel using different computing threads (if the time is measured for the entire dataset, it will vary with the size of the dataset). It should be noted that the software allows the use of any number of MPI threads and any number of normal modes. However, the more modes are used, the slower the processing. Finally, it should be noted that there is no constraint regarding the size of the dataset (the number of subtomograms) or the size of the individual subtomograms (the number of voxels) that can be analyzed with our software.

RESULTS AND DISCUSSION
In this section, we present and discuss the results of HEMNMA-3D with synthetic and experimental subtomograms.

Synthesizing Datasets for Testing the Method Performance
For testing HEMNMA-3D in general, and the combined elastic and rigid-body 3D-to-3D alignment module in particular (which is the core module of the proposed method), we synthesized two datasets of conformationally heterogeneous subtomograms that mimic discrete and continuous conformational variability, called "Discrete" and "Continuous" datasets, respectively. The flowchart for the data generation procedure is shown in Figure 4 and is detailed in the following.
The "Continuous" dataset comprises 1,000 synthetic subtomograms representing a continuum of conformations of the same PDB:4AKE structure. We generated this dataset using this atomic structure and its modes 7 and 8 using a linear relationship between the amplitudes of the two modes. More precisely, the synthesized amplitudes of modes 7 and 8 were identical and randomly distributed in the range [−200, +200] (uniform distribution).
Normal-mode amplitudes do not have a physical unit. Nonetheless, the Root Mean Square Deviation (RMSD) (Kufareva and Abagyan, 2011) between the reference atomic coordinates and these coordinates displaced using normal-mode amplitudes transforms the normal mode amplitudes in physical units. To provide a basis for further evaluation of the method performance, we found a RMSD of 6.95 Å corresponding to the displacement using the amplitude of 200 for each of the two combined modes 7 and 8 (this represents one half of the full range of the synthesized motion).
To generate a subtomogram, first, we deform the atomic structure using appropriate amplitudes for the selected normal modes depending on the dataset in hand, i.e., we use (mode 7, mode 8) = (+150, 0) or (−150, 0) or (0, +150) to create a subtomogram in the "Discrete" dataset, while we assign a random value in the range [−200, 200] for both mode 7 and mode 8 to generate a subtomogram in the "Continuous" dataset. Then, we convert the deformed structure to a volume of size 64 3 voxels and the voxel size of 2.2 Å 3 (Peng et al., 1996). Afterwards, we rotate and shift this volume in 3D space using random Euler angles (each of the three Euler angles was randomized in the range [0, 360 • ]) and random shifts (the shift along each of the x, y, and z axes was randomized in the range [−5, +5] voxels), and we project the rotated and shifted volume using tilt values −60 to +60 • to obtain a tilt series. We simulate microscope conditions by adding heavy noise (signal to noise ratio SNR = 0.01) and modulating the images with a contrast transfer function (CTF) of defocus −1 µm, so that one part of the noise is affected by the CTF and the other is not (Sorzano et al., 2007; HEMNMA-3D FIGURE 4 | Flowcharts of synthesis of the datasets used for testing and validating HEMNMA-3D, namely "Discrete" dataset (left) and "Continuous" dataset (right). et al., 2013). Finally, we reconstruct a volume (our synthetic subtomogram) from the tilt series using a Fourier reconstruction method (Sorzano et al., 2013). A few examples of the synthesized subtomograms (SNR = 0.01) and their less noisy version (SNR = 0.5, for illustration) is presented in Figure 5.

Synthetic Discrete-Type Conformational Variability
In this experiment, our goal is to retrieve the ground-truth amplitudes of normal modes 7 and 8 by the combined elastic and rigid-body alignment (the core module of HEMNMA-3D) of a FIGURE 6 | Plots showing the output of the 3D-to-3D elastic and rigid-body alignment module of HEMNMA-3D with "Discrete" dataset (synthetic subtomograms are simulating discrete conformational heterogeneity). (A) Use of the atomic structure (chain A of PDB:4AKE) and its normal modes to estimate the conformational parameters (normal-mode amplitudes) and rigid-body parameters (orientation and shift) of the molecules in the input synthetic subtomograms. (B) Use of a pseudoatomic structure (from a simulated density map) and its normal modes to estimate the conformational and rigid-body parameters of the molecules in the input synthetic subtomograms. The goal was the retrieval of the ground-truth relationship between the amplitudes along normal modes 7 and 8; ideally, all data should lay in one of the following three clusters of normal-mode amplitudes: (mode 7, mode 8) ∈ {(-150, 0), (150, 0), (0, 150)}; each point in the plot represents a subtomogram and close points represent similar conformations. Note that the dashed curves enclose the data points where p-value > 0.01 in Table 1. See the text for more details on this experiment. TABLE 1 | Mean absolute error and standard deviation between the estimated and ground-truth normal-mode amplitudes along with the angular and shift distances obtained with HEMNMA-3D and "Discrete" synthetic dataset, using an atomic structure (Atomic) and simulated EM map (Volume) as input references.

Experiment
Mode 7  The data points below the p-value of 0.01 were excluded from the error evaluation based on the Mahalanobis distance measure (few data points differing significantly from the remaining observations, which would not be selected in real-case experiments as being too isolated and far from other points). The number of points used for the error computation is shown in the last column of the table (column Samples) and the region with the kept points (p-value > 0.01) is shown in Figure 6.
reference model with the subtomograms in the "Discrete" dataset. In other words, the goal is to find a solution for the challenging inverse problem of finding the conformation of the structure in each subtomogram. Since the proposed method can use two choices for the reference model, namely, an atomic structure and a density map (e.g., an EM map or a subtomogram average), we performed two types of tests. In the first test type, the atomic structure used to generate the synthetic subtomograms (chain A of the PDB:4AKE) was used as a reference for retrieving normal mode amplitudes of the synthetic subtomograms. In the second test type, we converted (Peng et al., 1996) the atomic structure into a density map (volume) of size 128 3 voxels and voxel size of 1 Å 3 , and we used this density map as a reference for retrieving normal mode amplitudes of the synthetic subtomograms. In the case of the reference density map, normal modes were computed from the corresponding structure obtained by converting the density map into pseudoatoms (1675 pseudoatoms for the given pseudoatom radius of 1.25 voxels and the target approximation error of 5%). In both cases (reference atomic structure and reference pseudoatomic structure, with their corresponding normal modes), we used three modes (modes 7, 8, and 9) instead of only two modes (modes 7 and 8 that were used to generate synthetic subtomograms), to make the 3D-to-3D elastic and rigid-body alignment task even more challenging. Figure 6 presents the estimated amplitudes of normal modes 7 and 8 (the estimated amplitude of normal mode 9 is close to 0 and is therefore not shown graphically). Table 1 presents the mean absolute error and the standard deviation between the estimated and ground-truth normal-mode amplitudes along with the angular and shift distances. In both test cases, the three distinct synthetic groups of subtomograms are correctly separated, taking into account the extreme noise level. The results show a less accurate alignment in the second case, which is expected since, in that case, the atomic structure was used to FIGURE 7 | Averages of the three groups (enclosed by ellipses) of subtomograms identified from the output of the 3D-to-3D elastic and rigid-body alignment module of HEMNMA-3D with "Discrete" dataset (shown in Figure 6A), using the atomic structure (chain A of PDB:4AKE) and its normal modes to estimate the conformational parameters (normal-mode amplitudes) and rigid-body parameters (orientation and shift) of the molecules in the input synthetic subtomograms. Subtomograms are represented by points and close points represent similar conformations. The numbers of volumes written above the shown subtomogram averages are the numbers of synthetic subtomograms used for computing these subtomogram averages (the numbers of points enclosed by the corresponding ellipses). On the bottom, the subtomogram averages are shown at 50% transparency along with the corresponding ground-truth deformed atomic structure (in red).
generate the dataset and the pseudoatomic structure was used as the reference model for the method to estimate the normalmode amplitudes from this generated dataset. This is in contrast to the first test case where the same atomic structure was used to create the dataset and as the reference for the method to estimate the normal-mode amplitudes from this dataset. We found a RMSD of 0.55 and 0.60 Å corresponding to a combined displacement along modes 7, 8, and 9 with the mean absolute errors in Table 1 for the tests with atomic and pseudoatomic structures, respectively. Similarly, we found a RMSD of 0.94 and 1.06 Å corresponding a combined displacement along modes 7, 8, and 9 with the sum of the mean and standard deviation of the absolute errors in Table 1 for the tests with atomic and pseudoatomic structures, respectively. Hence, the error range is significantly inferior to the half range of the synthesized motion (6.95 Å) and the pixel size used to create the data (2.2 Å). Figure 7 shows grouping and averaging the subtomograms in the first test type (atomic reference). We compared the obtained subtomogram averages with the corresponding groundtruth volumes (density maps from ground-truth deformed models, without noise and missing wedge, used for synthesizing noisy and CTF-affected tilt-series from which subtomograms were obtained by 3D reconstruction). The visual comparison shows no significant difference between them. The resolutions calculated using Fourier shell correlation-FSC (threshold value of 0.143), after applying onto the subtomogram average a large spherical mask (radius of 28 voxels) with smooth edges (Gaussian smoothing with Gaussian standard deviation of five voxels), are 5.30, 5.35, and 5.74 Å for the three subtomogram averages shown from left to right in Figure 7, respectively. Without masking subtomogram averages, these resolutions are 6.31, 6.10, and 6.43 Å, respectively. As a basis for comparison, we provide the resolutions of three individual subtomograms arbitrarily chosen from the three corresponding subtomogram averaging groups. The resolutions of individual subtomograms with masking (the mask already described) are 10.58, 10.57, and 12.75 Å. The resolutions of the same subtomograms without masking are 12.67, 13.55, and 14.72 Å. These results show that a twice better resolution is obtained after averaging only about 300 individual subtomograms per group (Figure 7).

Synthetic Continuous-Type Conformational Variability
Similarly to the previous experiment, our goal in this experiment is to find a solution for the inverse problem of finding the conformation of the structure in each subtomogram using the combined elastic and rigid-body alignment of a reference model with the subtomograms in the "Continuous" dataset. We used the same two reference models as in the previous experiment to estimate the normal-mode amplitudes: an atomic structure (chain A of PDB:4AKE) and a density map from this atomic structure. Also, as in the previous experiment, we used three modes for both tests (atomic or pseudoatomic modes 7, 8, and 9). Figure 8 presents the estimated amplitudes of modes 7 and 8 (the estimated amplitude of mode 9 is close to 0 and is not shown in the plots). Table 2 shows the mean absolute error and the standard deviation between the estimated and ground-truth normal-mode amplitudes along with the angular and shift distances. In both test cases, a linear relationship between the estimated amplitudes of normal modes 7 and 8 is clearly distinguishable, which is close to the identity relationship between the ground-truth amplitudes taking into account strong noise present in the data. As in the previous experiment, the results show a slightly less accurate alignment in the second test type (pseudoatomic reference) for the same aforementioned reason. We found a RMSD of 0.66 and 0.75 Å corresponding to a combined displacement along modes 7, 8, and 9 with the mean absolute errors in Table 2 for the tests with atomic and pseudoatomic structures, respectively. Similarly, we found a RMSD of 1.09 and 1.21 Å corresponding a combined displacement along modes 7, 8, and 9 with the sum of the mean and standard deviation of the absolute errors in Table 2 for the tests with atomic and pseudoatomic structures, respectively. ) and its normal modes to estimate the conformational parameters (normal-mode amplitudes) and rigid-body parameters (orientation and shift) of the molecules in the input synthetic subtomograms. (B) Use of a pseudoatomic structure (from a simulated density map) and its normal modes to estimate the conformational and rigid-body parameters of the molecules in the input synthetic subtomograms. The goal was the retrieval of the ground-truth relationship between the amplitudes along normal modes 7 and 8 (ideally linear relationship, with equal amplitudes of normal modes 7 and 8); each point in the plot represents a subtomogram and close points represent similar conformations. Note that the dashed ellipses enclose the data points where p-value > 0.001 in Table 2. See the text for more details on this experiment.
TABLE 2 | Mean absolute error and standard deviation between the estimated and ground-truth normal-mode amplitudes along with the angular and shift distances obtained with HEMNMA-3D and "Continuous" synthetic dataset, using an atomic structure (Atomic) and simulated EM map (Volume) as input references. The data points below the p-value of 0.001 were excluded from the error evaluation based on the Mahalanobis distance measure (few data points differing significantly from the remaining observations, which would not be selected in real-case experiments as being too isolated and far from other points). The number of points used for the error computation is shown in the last column of the table (column Samples) and the region where p-value > 0.001 is shown in Figure 8.

Experiment
Hence, the error range is significantly inferior to the half range of the synthesized motion (6.95 Å) and the pixel size used to create the data (2.2 Å). Figure 9 shows grouping and averaging of subtomograms in this experiment, with eight subtomogram averages calculated along the distribution of the points for the first test type (atomic reference). The subtomogram averages show different conformations of adenylate kinase chain A. Note that the noise contained in the individual subtomograms (SNR = 0.01, Figure 5B) was reduced through subtomogram averaging (Figure 9). Additional experiments, for other noise levels in input subtomograms, can be found in Supplementary Figure 1 and Supplementary Table 1.

Experimental Cryo-ET Data: Nucleosomes in situ
We applied our method on a dataset comprising 650 in situ subtomograms of nucleosomes collected from a cell of a Drosophila embryonic brain, whose conformational variability was detected but not fully explored in a previous work (Eltsov et al., 2018). The subtomograms had the size of 64 3 voxels and the voxel size of 4.4 Å 3 . A density map obtained with classical subtomogram averaging (without taking into account conformational heterogeneity) was used as the reference density map for HEMNMA-3D ( Figure 10C). The resolution of this reference density map is around 2 nm (as determined by Fourier Shell Correlation between the reference density map and the density map from the atomic nucleosome structure PDB:3w98 Iwasaki et al., 2013 shown in Figure 10B). For more information on how this reference density map (global initial subtomogram average) was obtained, please see section 1 of the Supplementary Material (Nucleosome data preparation and acquisition). This reference density map was converted into pseudoatoms (1368 pseudoatoms for the pseudoatom radius of 0.5 voxels and the target approximation error of 5%) and normal mode analysis of the obtained reference pseudoatomic structure was performed. The combined elastic and rigid-body alignment was performed using the pseudoatomic structure and a set of its six low-frequency high-collectivity normal modes.
FIGURE 9 | Averages of eight groups (enclosed by ellipses) of subtomograms identified from the output of the 3D-to-3D elastic and rigid-body alignment module of HEMNMA-3D with "Continuous" dataset (shown in Figure 8A), using the atomic structure (chain A of PDB:4AKE) and its normal modes to estimate the conformational parameters (normal-mode amplitudes) and rigid-body parameters (orientation and shift) of the molecules in the input synthetic subtomograms. Subtomograms are represented by points and close points represent similar conformations. The numbers of volumes written above the shown subtomogram averages are the numbers of synthetic subtomograms used for computing these subtomogram averages (the numbers of points enclosed by the corresponding ellipses). On the bottom, the subtomogram averages are shown at 50% transparency along with the corresponding theoretical centroid deformed atomic structure (in red).
We selected modes 7-11 and mode 16, as described above and in our previous works (Jin et al., 2014;Sorzano et al., 2014;Harastani et al., 2020). Modes 7-11 were selected as being the five lowest-frequency non-rigid-body modes with collectivities above 0.5. They include the mode related to gaping motion (mode 7) and the mode related to breathing motion (mode 9), which have been described in previous nucleosome studies (Zlatanova et al., 2009;Eltsov et al., 2018). Mode 16 was selected as being related to a motion that could be potentially interesting but it is more complex (a slightly higher frequency motion), potentially including gaping-and breathinglike motions. The normal-mode amplitudes estimated through the alignment (six normal mode amplitudes per subtomogram) were then projected onto a 2D space of conformations using PCA. The space of conformations is presented in Figure 10.
Recall that each of the points represents a subtomogram, and close points represent similar conformations. By inspecting this conformational space, we identified four densest regions with 70, 183, 74, and 64 points from left to right in Figure 10D.
Following this analysis, we grouped the subtomograms in each of these four regions and averaged them. Before averaging, we filled in the missing-wedge Fourier space region of the individual subtomograms with the corresponding region of the global average computed from all subtomograms (please note that this global average was computed after aligning subtomograms using the rigid-body alignment parameters found along with the 3D-to-3D elastic alignment by HEMNMA-3D, which is a similar density map to the initial global average map shown in Figure 10C as both density maps result from averaging conformational heterogeneous subtomograms). The displacement of the reference pseudoatomic structure (converted into a density map) along two directions D1 and D2 in the space of conformations is shown in Figure 11 and in Supplementary Videos 1, 2. The significant difference between the four group averages ( Figure 10D) and the reference density map ( Figure 10C) as well as the motion observed along the two directions D1 and D2 (Figure 11, Supplementary Videos 1, 2) can be described, mainly in terms of opening the nucleosome by increasing the distance between the two gyres of the DNA superhelix. This result consents the previous findings, observed but not fully explored in a previous study (manual analysis) of the nucleosome conformational variability (Eltsov et al., 2018). The group averages are also compared with the atomic nucleosome structure PDB:3w98 in Supplementary Figure 2.

DISCUSSION AND CONCLUSIONS
This article presents HEMNMA-3D, the first cryo-ET subtomogram data analysis approach to study continuous conformational variability of biomolecular complexes, which maps a set of subtomograms into a space of conformations using a reference model and its normal modes. The conformational space permits (i) grouping (and averaging) subtomograms with similar conformations and revealing hidden conformations and (ii) recording animated displacements of the reference model along the densest regions of the space, along trajectories identified by curve fitting of the data in these regions. These HEMNMA-3D outputs could be valuable to cryo-ET studies of molecular mechanisms involved in conformational changes of complexes in vitro and in situ. HEMNMA-3D is thoroughly tested using synthetic subtomograms and applied to a cryo-ET experimental dataset (nucleosome subtomograms recorded in situ in Drosophila interphase nucleus). It provides promising results coherent with previous findings. An open-source software with a graphical user interface is provided for this method with a C++ backend and a Message Passing Interface parallelization scheme. Both HEMNMA-3D and NMA-based flexible fitting are NMA applications concerned with estimating the molecular conformation in density maps based on an atomic or pseudoatomic reference. However, the purpose of HEMNMA-3D is different from that of the classical NMA-based fitting methods. Classical NMA-based fitting methods aim at determining an atomic representation of an EM density map, which is done by flexible fitting of a given atomic structure into that EM map. The purpose of HEMNMA-3D is to get a low-dimensional representation of the heterogeneity of a given set of EM maps, such as subtomograms. Such low-dimensional representations do not require pushing the limits of the fitting accuracy as in the case of classical flexible fitting of atomic structures into EM maps, which also prevents overfitting. Besides, HEMNMA-3D performs a rigid-body alignment simultaneously with the flexible alignment, which accounts for the missing wedge of the low-SNR subtomograms, whereas classical NMA-based fitting methods typically use high-SNR average consensus EM maps reconstructed from single particle cryo-EM images without missing wedge.
The uniform random distribution of conformations was used in our experiments with synthetic data to show that HEMNMA-3D finds the correct values (within an acceptable error) for any conformation, rotation and translation, and that it does not yield wrong biased solutions (e.g., systematic alignment errors, such as a wrong biased alignment to one or the other conformation and systematic rotational or translational errors). Taking into account the independent analysis of each individual subtomogram, HEMNMA-3D should be able to recover any other conformational distribution, with similar errors to those obtained with the conformational distributions used in this article.
The tests with synthetic data using a pseudoatomic structure from a simulated density map as a reference were used to demonstrate the ability of the method to retrieve the ground truth conformations with a comparable accuracy to the accuracy achieved using an atomic reference despite that (i) the pseudoatomic reference was not used to synthesize the data (the datasets were synthesized using the atomic reference), (ii) pseudoatomic coordinates unlikely coincide with atomic coordinates (pseudoatomic coordinates are obtained through volume-to-pseudoatoms conversion, which does not use any prior information about atoms), and (iii) the method for calculating normal modes is different for a pseudoatomic reference (Cartesian method) and an atomic reference (RTB method). In experimental cases, one could obtain pseudoatoms from a density map of higher resolution than the data itself, if such density map is already available (e.g., an EMDB map obtained by high-resolution single particle cryo-EM reconstruction) or if it can be simulated from an available atomic structure (as in our synthetic data experiments). However, a preferred choice for the reference density map should be a density map from the data itself, which can be obtained by classical subtomogram averaging (without taking into account conformational heterogeneity), as was the case in the nucleosome experiments shown in this article.
The synthetic subtomogram datasets used in this work do not account for crowded molecular environments, radiation dose accumulation during tilting, and differences in CTF defocus over the tilted planes. Nevertheless, the synthesized subtomograms used here were challenging, as containing a small number of voxels and as being obtained by 3D reconstruction from synthetic tilt images affected by strong noise and CTF, which altogether lowered the resolution of the reconstructed subtomograms. The subtomograms were additionally affected by the missing wedge artifacts. Despite such difficult conditions, HEMNMA-3D finds the correct conformational, orientational, and translational parameters, which suggests that it can be useful in practice, and this usefulness was here demonstrated with experimental nucleosome subtomograms.
In experimental cases, such as the nucleosome study shown in this article, the number of used normal modes will always be smaller than the actual number of normal modes (the entire set of modes is too large to be included in our calculations as this would require too long computing times). Therefore, the conformational landscape will always be an approximation of the actual conformational landscape. Small sets of selected potentially relevant modes have been shown to produce good approximation of the actual conformational landscape of the nucleosome studied here by HEMNMA-3D as well as of other complexes studied by HEMNMA. In some cases, a single normal mode could be enough, such as in the case of 70S ribosomes, where the normal mode describing the rotation between the two subunits 30S and 50S was used to analyze conformational and compositional variability of EF-G bound and unbound 70S ribosome cryo-EM dataset (Jin et al., 2014). HEMNMA-3D uses the same software for NMA and the same numerical optimizer (to estimate the amplitudes of normal modes iteratively) as HEMNMA, and it should thus have similar performance as HEMNMA regarding the determination of the conformational landscape using a smaller number of normal modes. HEMNMA-3D software (ConinuousFlex plugin of Scipion V3.0) helps the user decide which normal modes to select, based on the lowest-frequency highest-collectivity criterion and including or not a prior knowledge about the conformational transitions, as is the case with HEMNMA software (Harastani et al., 2020).
HEMNMA-3D can deal with larger sets of subtomograms than those shown in this article. Each subtomogram is analyzed independently of other subtomograms, meaning on a separate computing thread. The same computing time per subtomogram is required for larger and smaller datasets of the same molecular complex and the time required to process the entire dataset varies with the size of the dataset.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Files. HEMNMA-3D software code is publicly available on Github (https://github.com/scipion-em/ scipion-em-continuousflex.git) and is also part of the opensource ContinuousFlex plugin of Scipion V3.0. The nucleosome data used in this article have been deposited in EMPIAR and EMDB databases under the accession codes EMPIAR-10679 and EMD-12699, respectively. Further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
MH and SJ designed the method and the experiments with synthetic data. ME obtained the nucleosome subtomograms and their initial average used as the reference for the method in nucleosome experiments. MH implemented the method and performed all the experiments, under the supervision of SJ, and wrote the first draft of the article manuscript, which was finalized by SJ with input from all authors. ME and AL participated in the results interpretation. All authors designed the experiments with nucleosome data, contributed to the manuscript preparation, read, and approved the final version of the manuscript.