Deep computational phenotyping of genomic variants impacting the SET domain of KMT2C reveals molecular mechanisms for their dysfunction

Introduction: Kleefstra syndrome type 2 (KLEFS-2) is a genetic neurodevelopmental disorder characterized by intellectual disability, infantile hypotonia, severe expressive language delay, and characteristic facial appearance, with a spectrum of other distinct clinical manifestations. Pathogenic mutations in the epigenetic modifier type 2 lysine methyltransferase KMT2C have been identified as causative in individuals with KLEFS-2. Methods: This work reports a translational genomic study that applies a multidimensional computational approach for deep variant phenotyping, combining conventional genomic analyses, advanced protein bioinformatics, computational biophysics, biochemistry, and biostatistics-based modeling. We use standard variant annotation, paralog annotation analyses, molecular mechanics, and molecular dynamics simulations to evaluate damaging scores and provide potential mechanisms underlying KMT2C variant dysfunction. Results: We integrated data derived from the structure and dynamics of KMT2C to classify variants into SV (Structural Variant), DV (Dynamic Variant), SDV (Structural and Dynamic Variant), and VUS (Variant of Uncertain Significance). When compared with controls, these variants show values reflecting alterations in molecular fitness in both structure and dynamics. Discussion: Our 3D models of KMT2C variants suggest distinct mechanisms that lead to their imbalance and that are not predictable from sequence alone. Thus, the missense variants studied here destabilize KMT2C function through different biophysical and biochemical mechanisms, which we describe. This new knowledge extends our understanding of how variations in the KMT2C gene cause the dysfunction of its methyltransferase enzyme product, and it bears significant biomedical relevance for carriers of KLEFS-2-associated genomic mutations.


Supplementary Material
Deep Computational Phenotyping of Genomic Variants Impacting the SET Domain of KMT2C

Figure S1. Lollipop plot of KMT2C and KMT2D variants reported in the literature and two-dimensional representation of their locations in the domain organization of the proteins (x-axis). FYRN/FYRC, phenylalanine and tyrosine-rich region (N- and C-terminal); HMG, high mobility group; N-SET, N-terminal of SET; PHD, plant homeodomain; Post-SET, C-terminal of SET; SET, Su(var)3-9.

Figure S2. In silico alanine mutagenesis scanning. (A) Heatmap of the mutation energy values projected onto the WT:KMT2C complex model. The cartoon representation of KMT2C is colored by mutation energy value. The cofactor product SAH and the substrate H3K4 are shown as stick models with carbon atoms colored in blue and yellow, respectively, oxygen in red, nitrogen in blue, and sulfur in yellow. The Zn²⁺ ion is shown as an orange sphere. (B) Contribution of the side chains of the residues to KMT2C function, assessed by mutating each residue to the smaller alanine residue. For each KMT2C variant, the simulation calculates the difference between the folding free energy (ΔΔGfold-

Figure S4. MD simulations of 10 replicates of the WT:KMT2C complex. (A) Potential energy plot of the WT:KMT2C model during the MD simulation. The protein stabilizes at approximately 4 ns, with repetitive movements. To further analyze the time-dependent interactions, 250 conformations were extracted from the last 2.5 ns of each simulation. (B) The RMSD plot calculated for the WT:KMT2C complex shows the equilibration of the protein over the 10 ns simulation.
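
As an illustration of the sampling described in Figure S4, extracting 250 conformations from the final 2.5 ns corresponds to one conformation every 10 ps. The following minimal sketch shows how such a frame selection and a per-frame RMSD could be computed with NumPy. It is not the pipeline used in this study: the function names, the array layout, and the assumption that frames are already superposed on the reference are all hypothetical.

```python
import numpy as np

def last_window_indices(n_frames, total_ns, window_ns, n_samples):
    """Return n_samples evenly spaced frame indices from the final window_ns.

    Assumes frames were saved at a uniform interval over total_ns.
    """
    start = int(round(n_frames * (1.0 - window_ns / total_ns)))
    return np.linspace(start, n_frames - 1, n_samples).round().astype(int)

def rmsd_to_reference(traj, ref):
    """RMSD of every frame against a reference structure.

    traj: array of shape (n_frames, n_atoms, 3)
    ref:  array of shape (n_atoms, 3)
    Assumes the frames were already superposed on the reference;
    no fitting is performed here.
    """
    sq_dist = ((traj - ref) ** 2).sum(axis=2)   # squared distance per atom
    return np.sqrt(sq_dist.mean(axis=1))        # average over atoms, per frame
```

For a hypothetical 10 ns trajectory saved as 1000 frames, `last_window_indices(1000, 10.0, 2.5, 250)` selects frames 750 through 999.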

Figure S5. Superposition of time-dependent RMSF plots of the averaged values of 10 replicates of WT:KMT2C (red) and each variant (gray) per individual residue during the MD production stage.
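
For readers unfamiliar with the quantity plotted in Figure S5, the sketch below shows one common way to compute a per-residue RMSF and average it over replicate simulations. It is an assumption-laden illustration, not the authors' procedure: the function names are hypothetical, one representative atom per residue (for example, Cα) is assumed, and each trajectory is assumed to be aligned beforehand.

```python
import numpy as np

def rmsf_per_residue(traj):
    """Root-mean-square fluctuation of each residue around its mean position.

    traj: array of shape (n_frames, n_residues, 3), one representative
    atom per residue, already aligned to remove global motion.
    """
    mean_pos = traj.mean(axis=0)                      # (n_residues, 3)
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)     # (n_frames, n_residues)
    return np.sqrt(sq_dev.mean(axis=0))               # (n_residues,)

def mean_rmsf(replicates):
    """Average the per-residue RMSF over a list of replicate trajectories."""
    return np.mean([rmsf_per_residue(t) for t in replicates], axis=0)
```

Averaging the RMSF profiles, rather than concatenating trajectories first, keeps each replicate's fluctuations relative to its own mean structure.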

Figure S6. Multiple sequence alignment (MSA) of the SET domain of KMT2 family proteins. Positions identical between the orthologs are highlighted with a red background, and similar residues are written in bold black characters and boxed in yellow. The alignment was performed using the MultAlin server (Sievers, Wilm et al. 2011) and displayed using the ESPript 3.0 server (Robert and Gouet 2014). Above the MSA is a color-coded schematic representation of the subdomains of the SET domain. The residues binding to the cofactor, substrate, and salt bridge are indicated with black arrows and brackets, and the structural pseudoknot is indicated with pink brackets. Variant E4792D is indicated in red.

Table S1. List of pathogenicity prediction algorithms based on chromosome and position on genome version GRCh38.

Table S2. WT:KMT2C and variant scores based on molecular dynamics simulations.

Table S3. Scores and classification of KMT2C variants in the SET domain based on dynamics data.