Reconstruction of Three-Dimensional Conformations of Bacterial ClpB from High-Speed Atomic-Force-Microscopy Images

ClpB belongs to the cellular disaggretase machinery involved in rescuing misfolded or aggregated proteins during heat or other cellular shocks. The function of this protein relies on the interconversion between different conformations in its native condition. A recent high-speed-atomic-force-microscopy (HS-AFM) experiment on ClpB from Thermus thermophilus shows four predominant conformational classes, namely, open, closed, spiral, and half-spiral. Analyses of AFM images provide only partial structural information regarding the molecular surface, and thus computational modeling of three-dimensional (3D) structures of these conformations should help interpret dynamical events related to ClpB functions. In this study, we reconstruct 3D models of ClpB from HS-AFM images in different conformational classes. We have applied our recently developed computational method based on a low-resolution representation of 3D structure using a Gaussian mixture model, combined with a Monte-Carlo sampling algorithm to optimize the agreement with target AFM images. After conformational sampling, we obtained models that reflect conformational variety embedded within the AFM images. From these reconstructed 3D models, we described, in terms of relative domain arrangement, the different types of ClpB oligomeric conformations observed by HS-AFM experiments. In particular, we highlighted the slippage of the monomeric components around the seam. This study demonstrates that such details of information, necessary for annotating the different conformational states involved in the ClpB function, can be obtained by combining HS-AFM images, even with limited resolution, and computational modeling.


Preprocessing of AFM images:
Each of the AFM images (Uchihashi et al., 2018) was first processed to define a region of interest by using a manually defined binary mask. The manual definition of binary mask is done by carefully observing an AFM image and selecting a region of ClpB structure near the center excluding adjacent molecules in an image editing software (Schneider et al., 2012). This region is termed as region of interest (ROI). The ROI includes the height information of a molecule which is subjected to modelling in our study. We calculate minimum height data inside ROI and replace height data outside ROI by the minimum value. This background corrected AFM image data is saved as 32-bit float in tiff format. The mask for 9 AFM images and corresponding background corrected AFM images are shown in Supplementary  2. Preprocessing initial models before sampling: The preparation of initial models is discussed in main text. Here, we add further information about A) superposition of monomer of ClpB to the chains in hexameric ClpB, B) coarse graining monomer chain by Gaussian mixture model, C) pre-orientation of hexameric models according to AFM images and D) setting of a clipping-plane for image comparison. A. Superposition of monomer of ClpB to the chains in hexameric ClpB:

Supplementary
The monomer of ClpB from T. thermophilus (1QVR, chain A) (Lee et al., 2003) is sequentially the same construct as in the AFM experiment, only the difference is that in the AFM experiment the N-terminal region is deleted (Uchihashi et al., 2018). We prepared hexameric models of ClpB using T. thermophilus primary structure, based on Yeast Hsp104 (5KNE, sequence identity to 1QVR, chain A is 44.8% (Madeira et al., 2019)) (Yokom et al., 2016) or E. coli ClpB (5OG1, sequence identity to 1QVR, chain A is 56.1%) (Deville et al., 2017). Chain A of 1QVR is superimposed on the individual chains of 5KNE and 5OG1 (chains A, B, C, D, E, and F) as rigid-body superposition by Chimera 'matchmaker' (Pettersen et al., 2004;Meng et al., 2006). In some of the chains in hexamer (5KNE and 5OG1) the long coiled-coil middle domain is only partially resolved or not resolved, and in other chains, the coiled-coil domains are oriented very differently than that in 1QVR, chain A. Moreover, in some cases, the N-terminal region (which is deleted in further analysis) is oriented differently. Therefore, in Chimera, we have used atom pairs pruning to locate more conserved regions. The number of pruned atom pairs and root mean square deviation between 1QVR chain A and chains in hexamer are shown in Supplementary Table 2. Moreover, because of the coarse-graining of our initial model, finer changes in atomic coordinate between ClpB monomer and hexamers are insignificant given the nanometer resolution of AFM image data. B. Coarse graining monomer chain by Gaussian mixture model: In order to obtain a coarse-grain representation of the hexameric ClpB, we fitted a 3-kernel Gaussian mixture model to the atomic coordinates of each monomer (after deleting N-terminal domain) superimposed to the chains A, B, C, D, E and F. The 3-kernel Gaussian mixture model is obtained from the software gmconvert with argument '-ng 3' (Kawabata, 2008) which fits 3 Gaussian kernels to the 3D coordinates by expectation-maximization algorithm. The resulting model can be represented by, Where, r represents position vector of an atom, µ and S are center of mixture model and overall covariance matrix. The ! is the weight of the i -th kernel, and , Σ ! are center and covariance matrix of the i -th kernel. After all the chains are converted to 3-kernel representation, they are accumulated in a single model and weights are normalized over all the kernels giving rise to the 18-kernel representation of a hexamer. The monomer chain resulted from 1QVR chain A (where loops are already modelled, see main text) is shown in Supplementary Figure 1 (Webb and Sali, 2014), and its 3-kernel representation is shown in main text Figure 2I. In the main text Figure 2I, we represent those 3-kernels by head, body and tail as annotated. The head kernel includes mostly the long coiled-coil middle domain, and tail kernel includes mostly C-terminal domain. The body kernel includes top AAA+ domain (AAA1+) together with the linker region connecting to the bottom AAA+ domains (AAA2+). The AAA2+ domain is at the junction of body and tail kernels. The two Walker motifs are also included in the body kernel. The orientation of the part included in the body kernel guides the hexameric architecture.
C. Pre-orientation of hexameric models according to AFM images: In Figure 2 of the main text, a seam can be observed from the top view of hexameric models, and in Figure 1 we see such a seam can also be observed in the AFM images (Uchihashi et al., 2018). However, we observed that our sampling algorithm cannot generate such a seam by randomly moving the kernels in all possible ways. This is due to the sampling algorithm has no special moves to match the seam globally. Therefore, before sampling, we rotate our initial models around the zaxis so that seam in the hexameric model (observed from top view) matches the seam in the respective target AFM image. The seams can be clearly seen for AFM image 7 (closed class) and the top view of the S model, so initially we rotate S model (also an asymmetric structure in 5KNE) to fit its seam to the AFM image by manual observation. Then we fitted R model to S by applying rigid-body transformation based on kernel centers. We save these rotated initial models as models ready for sampling (Supplementary Table 3). Therefore, for most of the AFM images (class open, closed and spiral) z-rotation of Rand S are identical. This scheme however did not work well for half-spiral cases, and for them z-rotation for R and S models are different (instead of fitting R model to S, R also rotated on the basis of visual inspection).
In addition, the initial model is shifted along z-axis to match the average heights in the masked AFM image. The average height along z-axis for initial models after translation are given in Supplementary Table 3. Note that, this is done before sampling for each AFM image.
D. Setting of clipping-plane for image comparison: In the last pre-processing step of initial model prior to sampling, we set a clipping plane for pseudo-AFM image obtained from the models serving as a background height. This step is required because we see that range of heights in masked AFM image (29.5Å to 39.5Å (or 38.7Å excluding half-spiral)) is much narrower than height range of 3D models (61.8Å). After translating initial model along the z-axis by average height (see above), some portion of 3D models were below 0Å. The clipping plane cannot be deduced based on masked AFM image, as the range of height in the unmasked AFM image reflects the condition of AFM experiments and the depth at which the AFM probe can sample. Thus, we first calculate change in maximum height in the target AFM image (dh) due to masking as the maximum height in an unmasked AFM image minus maximum height in the masked AFM image. Then we define clipping plane (hplane) from minimum height in the masked AFM image (hmin) and dh by hplane = hmin -dh, and we assume AFM probe could gauge this height in similar experimental condition. The clipping plane height is also given in Supplementary Table 3. Supplementary The flowchart for MC algorithm is given in Supplementary Figure 2. We use a two-step MC algorithm, first on the basis of restraints of candidate model, and second on the basis of 2D image comparison technique. There are three types of parameters used to control MC process -A) parameters for restraint calculation, B) parameters related to MC move C) parameters for pseudo-AFM image generation. The choice of parameters for restraint calculation is discussed in detail in the next section.
In our MC algorithm, we use a few parameters to control width of the random move. In all the sampling, we randomly update one out of 18 kernels. To update a given kernel, we apply random translation to the kernel center and random rotation along one of the randomly chosen axes of the kernel. The width of translation is 0.5Å maximum and the width of the rotation is 0.5 degrees maximum. We proceed 1.5x10 6 steps of proposal in one run of MC process. Among these proposal steps, we apply random rigid-body translation to the 18-kernel representation in first 1000 steps. In the following proposals, kernels move independently of each other. In every 10 steps of acceptance, we saved a candidate model for subsequent analysis. The computational time required for one MC step is around 200 milliseconds and therefore for 1.5x10 6 steps for a run takes around 83 hours. However, this depends on I/O during the process, including output of model in hdf5-file, image (png-formatted) files for reference purpose, and standard output. By minimizing I/O, the time per step can be decreased to 120 milliseconds.
To generate pseudo-AFM image from coarse-grain representation of an 18-kernel model, we regard that the kernels are cut at a probability threshold giving rise to a kernel isosurface, and a sharp AFM probe interacts with these isosurface. The integration of the isosurface represents the total enclosed volume of the molecule in this representation. The shape of the kernel is included in the eigenvalues (l1,l2, and l3) of the covariance matrix (S) of the kernel and orientation of the kernel around its center (rc) in represented by the eigenvectors (e1, e2 and e3) of the S. The radius of the kernel (R1, R2 or R3) can be set equal to its radius of gyration if & = /5 & . To generate AFM image, the integrated isosurfaces of all the kernels is rasterized to a XY-grid emulating an AFM probe coming from positive Z-axis. The method is described previously in more detail (Dasgupta et al., 2020). In the current experimental data, the lateral resolution AFM images along X and Y directions are 4.545Å and 5.477Å (Uchihashi et al., 2018). To estimate the kernel surface within this area we split a pixel into 9 grids and evaluate height along z-axis based on the kernel parameters in each grid. The representative height for the pixel is taken to be maximum height over 9 grid points.
After generation of the pseudo-AFM image from a candidate model, we need to compare it against the target AFM image, which in the current study is done by structural similarity metric (SSIM) (Wang et al., 2004;Wang and Bovik, 2009), a popular 2D image comparison metric. The SSIM between two images A and B is given by, During the MC sampling, the temperature parameter is updated by following an annealing schedule given in Supplementary Figure 3A, where the temperature changes from 0.2 to 1.0. Here, we regard that 0.2 is the equilibrium sampling temperature, and 1.0 is a temperature where our models may undergo large changes, which are obtained by trial and error. In the MC algorithm we have two temperature parameters, but we have observed that if we scale distance scores obtained from the above SSIM (ranged between 0 and 1) by a multiplicative factor of 10000, the change in SSIM in our study is comparable to the change in restraint score in a typical move.
We have analyzed how two steps in the MC optimization process (Supplementary Figure 2) are accepting or rejecting candidate models taking one trajectory as an example. In Supplementary Figure 3B, we plot histograms of change in restraint scores (from the first MC step in our two-step optimization) and change in SSIM (from the second MC step) only when the change is positive (an upward move). Such upward moves are accepted or rejected by restraint calculation or image similarity. The majority of the proposals are rejected in the first step (around 57% of all proposals) based on restraint score and overall, around 37% of the proposals are accepted. We also observed that the proposals that are rejected in the first step are mostly due to harmonic restraint (around 78%), and the rest are due to connectivity restraint. For the accepted steps, we further analyzed that restraint scores are increasing and decreasing with nearly equal probability. Interestingly in nearly 64% of the accepted steps, the similarity to target AFM image increased (downward move in the second step).

Sampling restraints:
One of the key ingredients in our MC algorithm is the restraint based on 3D structure of a proposed model. To define a restraint between the kernels of Gaussian mixture model, we used correlation coefficient between a pair of kernels. The correlation coefficient between two 3D-kernels, i and j, is given by, where, !" indicates overlap between kernels i and j, given by, where, Σ !" = Σ ! + Σ " . When two kernels of identical axes lengths are completely eclipsed then the correlation coefficient is 1. The restraint based on the correlation coefficient is a harmonic function, where repulsion and attraction are treated independently. For this restraint, we also need to define an equilibrium value between two kernels, !&,2 which we call native value. If at an instance, the correlation coefficient !& is greater than !&,2 , then kernels are closing in, which we try to avoid due to repulsion, and if at an instance the correlation coefficient !& is smaller than !&,2 , then kernels are going farther apart, which we try to avoid using attraction. A simplified harmonic score to those instances for M number of kernel pairs that treats attraction and repulsion in similar way can be written as, where !& is corresponding force-constant. However, in our case we model attraction and repulsion differently. So, in the current study we define our harmonic score as follows, where, !& 3 is force-constant related to attraction and !& 4 is force-constant related to repulsion. Next, we parameterize the force-constants in the following way. We define a relative absolute error in !" from !",2 as, which can also be written in the form, and then by combining equation (6), takes the form where, !&,3 is relative error related to attraction and !&,4 is relative error related to repulsion. Next, we assume the presence of a threshold value of !& beyond which error changes drastically so that those instances are less probable. The threshold !& is given by !&,; and we arbitrarily assign it a Boltzmann probability of 0.2, so for the case of repulsion we get, where, !&,4,; is the relative error at the threshold repulsion score (relative threshold error), which is parameterizing !& 4 . Similarly, we define !&,3,; is the relative error at the threshold attraction score, which is parameterizing !& 3 . Before starting our sampling, we compute the force-constant from user-defined values of !&,4,; and !&,3,; , and then score at any instance is calculated by equation (6) during sampling.
!&,4,; and !&,3,; for different types of kernel pairs are defined as follows. Here, we are mostly concerned with how kernels from neighboring chains interact with each other and how kernels within a monomer interact. Therefore, we broadly classify kernel pairs in the a) intrachain kernel pairs, b) interchain kernel pairs. Here, intrachain kernel pairs are among head, body, and tail kernels within a monomer. For the intrachain kernel pairs, the relative threshold error was set to be small so that three domains within a chain remain connected. However, with threshold errors smaller than used in the current study, the monomers become more rigid such that relative orientations between head and body kernels are fixed. We used a relative threshold error for neighboring body kernels three times higher, i.e., weaker, than for intrachain pairs. When larger values were tested, random overlaps between distant kernels occurred and the overall hexametric structure did not remain intact, such that the pore in the middle vanishes. The head kernels that are placed on the coiled-coil domain of the chains should be least restricted as they are highly flexible, and thus the head kernels between neighboring chains interact weakly. For interchain kernel pairs at the seam, we assigned a large threshold error for the attractive interactions to allow the opening/closing conformational transitions that are evident from the experimental data. Such parameter tuning was performed mostly by using AFM image 7 as it clearly shows a seam (Supplementary Table 4). For halfspiral cases, we assumed another seam (between chains C and D) opposite to the original seam between chains A and F. For these cases, we use identical parameters between C and D to that between A and F.
The native correlation coefficient, !&,2 , is defined as follows. The bacterial ClpB structure in the AFM experimental condition is undergoing a large conformational transition (Uchihashi et al., 2018) and many AFM images were annotated as spiral and closed. For other organisms and different experimental conditions, atomically detailed symmetric closed-like structure and asymmetric spiral structure (Yokom et al., 2016;Deville et al., 2017) have been determined. Those two atomic structures are the two extreme conformations known so far with and without symmetry and we assumed that the conformations observed by AFM experiments are within the conformational space between these two extreme conformations. Therefore, we prepare two low-resolution initial models based on Gaussian mixture model based on those two extreme cases and the average correlation coefficient between equivalent kernel pairs is used as native correlation coefficients.
We also used a restraint score based on overlapping region between a pair of kernels to define a connected region. This restraint is used for the kernel pairs within a monomer and is used to keep the overall shape of the monomer and to ensure that two kernels do not slip along each other. Such connections between the kernels are shown in Fig. 2I where the head and body kernels are connected in a monomer near the AAA1+ domain part of the body kernel, and the body and tail kernels are connected near the AAA2+ part of the body kernel. For the connectivity restraint we initially define a pair of phantom particles (pi and pj) each on top of the other in the overlapping region of a kernel pair i,j (Supplementary Figure 5) affixed to the respective (initially the distance between pi and pj, dij = 0). During the random move of one of the kernels the affixed phantom particle moves with the kernel, therefore the phantom particles are separated. We added a condition that the distance between these two phantom particles dij < 5Å and always pi and pj remain in the overlapping region. Otherwise, such a move is rejected in the first step of MC.
Supplementary Figure 4: Initial positions of phantom particles for connectivity restraint within a monomer between body and head kernels and between body and tail kernels. .99 $ # Except for the neighboring kernels at the seam $ 9.99 signifies a large value for the relative threshold error, effectively any change in attraction do not add significant contribution to the score. Supplementary Table 5: Clustering of selected candidate models at the end of sampling that started from R and S initial models. The method for clustering is discussed in the main text. For each type, in the left the clustering is shown in 2-dimentional projected space along principal components axes, and the converged representative model is shown as a cyan triangle point and the cluster medians are shown as black circle. In the right the cluster sizes are shown in a pie-chart. The cluster that includes converged representative model is exploded. The cluster names are given in the legend in the left and annotated in the pie-chart. Supplementary Figure 5. Comparison of converged representative model for each AFM image derived from R or S models to the respective cluster medians. In A) comparison to the largest cluster median, and in B) comparison to the cluster median to which representative model belongs.

Supplementary
In A) for most of the cases the similarity by CC3Ddensity > 0.9, except of AFM image 10 (half-spiral) and AFM image 2 (open) from S. Visual inspection confirms such differences are due to orientation of flexible coiled-coil domain or C-terminal domain, which is only partially observed or not observed in AFM images.
Supplementary Figure 6: Orientations of head kernels in converged models from R (Fig. 4, main text, leftmost column, rotated here around x-axis by 45 o ). The annotations '1', '2' and '3' are corresponding AFM images. 'A' and 'F' are chains from which head kernels originated. In AFM image 1, both the head kernels from A and F are facing downward, while in AFM image 3, a clear hexameric arrangement is seen.
Supplementary Figure 7: Correlation coefficient (CC3Ddensity) between the converged models (final candidate solutions) for each image and corresponding initial model R and S. The horizontal axis includes AFM image indices 1 to 12 and in vertical axis the CC3Ddensity are shown for R and S separately. The vertical dotted-line separates different classes of AFM images.