Structural insights into the C-terminus of the histone-lysine N-methyltransferase NSD3 by small-angle X-ray scattering

NSD3 is one of six H3K36-specific histone lysine methyltransferases in metazoans. Its overexpression or mutation is implicated in developmental defects and oncogenesis. Aside from the well-characterized catalytic SET domain, NSD3 has multiple clinically relevant potential chromatin-binding motifs, such as the proline–tryptophan–tryptophan–proline (PWWP) domain, the plant homeodomain (PHD), and the adjacent Cys-His-rich domain located at the C-terminus. Crystal structures of the individual domains are available, and this structural knowledge has enabled the design of potential inhibitors, but the intrinsic flexibility of larger constructs has hindered the characterization of mutual domain conformations. Here, we report the first structural characterization of the NSD3 C-terminal region comprising the PWWP2, SET, and PHD4 domains, achieved at low resolution in solution using small-angle X-ray scattering (SAXS) data collected on two multiple-domain NSD3 constructs, complemented with size-exclusion chromatography and advanced computational modeling. Structural models predicted by machine learning have been validated in direct space, by comparison with the SAXS-derived molecular envelope, and in reciprocal space, by reproducing the experimental SAXS profile. Selected models have been refined by SAXS-restrained molecular dynamics. This study shows how SAXS data can be combined with advanced computational modeling techniques to achieve a detailed structural characterization, and sheds light on how the NSD3 domains are interconnected in the C-terminus.


Comparative analysis of the structural models
A structural comparison of the homology models generated and optimized against SAXS data has been carried out using one-dimensional profiles of the Protein Angular Value (PAV), a geometrical descriptor representing the orientation of the individual profiles along the backbone.
The PAV profiles are shown in Figure S9 A,C, shifted with respect to one another and colored according to the method used to generate them. The α-helix of the SET domain is easily recognized in the PAV profiles of all structural models. Indeed, this region, which spans residues 1181-1195, shows very small fluctuations of the PAV around an average value of 100°, which is typical of the backbone dihedral angles of residues forming an α-helix. Just as easily, the linker connecting the PWWP and SET domains (1026-1056, blue) can be recognized as an α-helix in most of the models, whereas the linker between the SET and PHD4 domains (1290-1310) forms an α-helix only in the AlphaFold, ColabFold and Phyre2 models and forms a loop in the I-Tasser and RaptorX models.

Principal Component Analysis (PCA) makes it possible to trace the structural differences among models as distances among the points representing the PAV profiles in the score plot of the first two principal components (Figure S9 B,D). In Figure S9 A,B all the models are considered for comparison, showing that those generated by I-Tasser and RaptorX differ substantially from the others, especially in the linker between SET and PHD4. Their representative points are clearly separated in Figure S9 B, along the first principal component (PC1) for I-Tasser and along the second principal component (PC2) for RaptorX. In Figure S9 C,D the I-Tasser and RaptorX models are excluded from the comparison in order to focus on the fine differences among the remaining models. The ColabFold models form a well-defined cluster in Figure S9 D, together with the AlphaFold model and the related AlphaFold mix model. The Phyre2 model is separated along PC1, while the AlphaFold and AlphaFold mix models optimized by MDFF are separated along PC2. This suggests that the structural differences introduced by the MDFF refinement are not covered by homology modeling procedures, so that the resulting model is intrinsically different from all the others generated in this study.

The effect of applying MDFF to the full-length models in best agreement with SAXS data, i.e. AlphaFold, AlphaFold mix and RaptorX model 1 (RaptorX 1), has been assessed both in direct space, measured by the normalized structural discrepancy between the model and the ab initio molecular envelope, and in reciprocal space, measured by the χ² of the fit between the experimental scattering data and those calculated from the model. The results, reported in Figure S10, show as a general trend that MDFF optimization improves the agreement of the models with SAXS data in both direct and reciprocal space. However, while for AlphaFold and AlphaFold mix the improvement is similar for the NSD3-PWWP-SET and NSD3-SET-PHD4 regions, for RaptorX 1 it occurs only for NSD3-PWWP-SET, while the agreement for the NSD3-SET-PHD4 region worsens greatly. As a result, the model in best agreement with SAXS data is the AlphaFold mix refined by MDFF.

Figure S1. Similarity test among SEC-SAXS datasets using the reduced χ² statistic for NSD3-PWWP2-SET (A) and NSD3-SET-PHD4 (B). The right-side bar shows the colors related to the p-value of the test.

Figure S2. Results of the re-binning procedure applied to dataset 2_3. Values of qmax as a function of the number of points of the SAXS profile. The number of points has been reduced in a linear (circles) or logarithmic (squares) way along the q axis.

Figure S3.

Figure S4.

Figure S6. Results of the MDFF optimization applied to the AlphaFold mix model. The cross-correlation coefficient (CORR) between the model and the experimental molecular envelope (A), the root mean square deviation (RMSD) of Cα atoms with respect to the initial structure (B), the radius of gyration of the model (Rg) and the χ² between experimental and calculated SAXS profiles for the NSD3-PWWP2-SET (C) and NSD3-SET-PHD4 (D) constructs are reported as a function of the simulation time. The structural model obtained after 453 ps was chosen as that in best agreement with SAXS data.

Figure S7.

Figure S8.

Figure S9. Comparative analysis of the Protein Angular Value (PAV) profiles, representing the conformations of the structural models generated in this study. (A,C) PAV profiles calculated from models generated in this study. A sketch of the domains of the constructs is given on the top, where predicted α-helices are shown by full boxes. (B,D) Score plot obtained by principal component analysis applied to PAV profiles, where each point represents a PAV profile. The data variance explained by the first (PC1) and second (PC2) principal components is shown on the axes. 85% confidence level ellipses indicate the results of a hierarchical clustering procedure applied to representative points in the score plot. In C and D the RaptorX and I-Tasser models have been excluded from the comparative analysis.
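
The score plots in panels B and D can be illustrated with a minimal sketch of PCA applied to a set of per-residue profiles. This is not the software used in the study: the function name `pca_scores`, the synthetic profiles and the choice of treating angular values as ordinary numbers (ignoring periodicity) are all assumptions made for illustration only.

```python
import numpy as np

def pca_scores(profiles, n_components=2):
    """Project row-wise profiles (one row per model) onto their first PCs."""
    X = np.asarray(profiles, dtype=float)
    Xc = X - X.mean(axis=0)                    # centre each residue position
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # coordinates in the score plot
    explained = s**2 / np.sum(s**2)            # fraction of variance per component
    return scores, explained[:n_components]

# Synthetic data (hypothetical): three helix-like profiles around 100 degrees
# and one profile whose linker segment deviates, mimicking a loop.
rng = np.random.default_rng(0)
base = np.full(50, 100.0)
helix_like = [base + rng.normal(0.0, 1.0, 50) for _ in range(3)]
loop_like = base.copy()
loop_like[30:45] = 20.0
profiles = np.vstack(helix_like + [loop_like])

scores, explained = pca_scores(profiles)
outlier = int(np.argmax(np.abs(scores[:, 0])))   # model farthest along PC1
```

In this toy case the profile with the altered segment is the one farthest from the origin along PC1, which is the kind of separation the score plots are meant to reveal.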

Figure S10. Assessment of the agreement of the best structural models generated by AlphaFold (AlphaFold and AlphaFold mix), RaptorX (model 1) and CORAL against SAXS data obtained by using NSD3-PWWP2-SET (blue points) and NSD3-SET-PHD4 (orange points) samples. Quality parameters are the normalized structural discrepancy (NSD) with respect to the ab initio SAXS molecular envelope (horizontal axis) and the χ² of the fit with SAXS profiles (vertical axis). Point labels refer to the original homology models and those optimized by MDFF.
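
As an illustration of the reciprocal-space quality parameter, the sketch below computes a reduced χ² between an experimental and a model SAXS profile after fitting a single least-squares scale factor. The exact χ² convention and fitting parameters used in the study are not stated in this excerpt, so the function `reduced_chi2`, the single fitted parameter and the synthetic Guinier-like curves are assumptions.

```python
import numpy as np

def reduced_chi2(i_exp, sigma, i_calc, n_params=1):
    """Reduced chi-square of a model curve against data, with a fitted scale."""
    i_exp, sigma, i_calc = (np.asarray(a, dtype=float)
                            for a in (i_exp, sigma, i_calc))
    w = 1.0 / sigma**2
    # Scale factor c minimising sum(w * (i_exp - c * i_calc)**2)
    c = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc**2)
    resid = (i_exp - c * i_calc) / sigma
    dof = i_exp.size - n_params                # one fitted parameter assumed
    return np.sum(resid**2) / dof, c

# Synthetic example (hypothetical values): Guinier-like curves on a small q grid.
q = np.linspace(0.01, 0.05, 40)                # 1/Angstrom
i_model = np.exp(-(q * 30.0) ** 2 / 3.0)       # model with Rg = 30 Angstrom
i_exp = 2.5 * i_model                          # "experiment" is a scaled copy
sigma = 0.02 * i_exp                           # assumed 2% uncertainties

chi2_same, scale = reduced_chi2(i_exp, sigma, i_model)
i_other = np.exp(-(q * 35.0) ** 2 / 3.0)       # a model with a different Rg
chi2_other, _ = reduced_chi2(i_exp, sigma, i_other)
```

A model that differs from the data only by a scale factor gives a χ² close to zero, while the model with a different radius of gyration gives a much larger value, so lower points on the vertical axis indicate better agreement.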

Table S3.
Standard deviation of the Rg values for frames selected under the p2 peaks (<Rg>).