Comprehensive Proteomics Profiling Identifies Patients With Late Gadolinium Enhancement on Cardiac Magnetic Resonance Imaging in the Hypertrophic Cardiomyopathy Population

Introduction In hypertrophic cardiomyopathy (HCM), late gadolinium enhancement (LGE) on cardiac magnetic resonance imaging (CMR) represents myocardial fibrosis and is associated with sudden cardiac death. However, CMR requires particular expertise and is expensive and time-consuming. Therefore, it is important to identify patients with a high pre-test probability of having LGE, as the utility of CMR is higher in such cases. The objective was to determine whether plasma proteomics profiling can distinguish patients with and without LGE on CMR in the HCM population.
Materials and Methods We performed a multicenter case-control (LGE vs. no LGE) study of 147 patients with HCM. We performed plasma proteomics profiling of 4,979 proteins. Using the 17 most discriminant proteins, we performed logistic regression analysis with elastic net regularization to develop a discrimination model with data from one institution (the training set; n = 111) and tested the discriminative ability in independent samples from the other institution (the test set; n = 36). We calculated the area under the receiver-operating-characteristic curve (AUC), sensitivity, and specificity.
Results Overall, 82 of the 147 patients (56%) had LGE on CMR. The AUC of the 17-protein model was 0.83 (95% confidence interval [CI], 0.75–0.90) in the training set and 0.71 (95% CI, 0.54–0.88) in the independent test set for validation. In the training set, the sensitivity was 0.72 (95% CI, 0.61–0.83) and the specificity was 0.78 (95% CI, 0.66–0.90). In the test set, the sensitivity was 0.71 (95% CI, 0.49–0.92) and the specificity was 0.74 (95% CI, 0.54–0.93). Based on the discrimination model derived from the training set, patients in the test set who had high probability of having LGE had significantly higher odds of having LGE compared to those who had low probability (odds ratio 29.6; 95% CI, 1.6–948.5; p = 0.03).
Conclusions In this multi-center case-control study of patients with HCM, comprehensive proteomics profiling of 4,979 proteins demonstrated a high discriminative ability to distinguish patients with and without LGE. By identifying patients with a high pretest probability of having LGE, the present study serves as the first step to establishing a panel of circulating protein biomarkers to better inform clinical decisions regarding CMR utilization.


INTRODUCTION
Hypertrophic cardiomyopathy (HCM) is among the most common inherited cardiac diseases (1). The combined prevalence of clinically expressed HCM and genetic carrier status is approximately 1 in 200 individuals in the United States (1). Patients with HCM are at risk of sudden cardiac death (SCD), yet identifying the patients at highest risk of this feared outcome remains a challenge (2)(3)(4).
Late gadolinium enhancement (LGE) on cardiac magnetic resonance imaging (CMR) represents myocardial fibrosis (5)(6)(7). In patients with HCM, LGE has been associated with an increased risk of SCD from ventricular arrhythmias (8)(9)(10)(11). The appropriate use of implantable cardioverter-defibrillators (ICD) has reduced disease-specific mortality (12)(13)(14)(15). Therefore, identifying LGE on CMR is critical to reducing HCM-specific mortality by facilitating ICD implantation in high-risk patients. Although CMR allows clinicians to identify LGE and patients who are at higher risk of SCD, it is not widely accessible, requires particular expertise for interpretation, and is relatively expensive and time-consuming compared to other imaging modalities. Moreover, careful assessment of the risk-benefit balance of CMR is required in patients with claustrophobia or chronic kidney disease and in pediatric patients who may require sedation or intubation during CMR (16). As a result, it is clinically important to accurately determine which patients with HCM have a high pre-test probability of having LGE, as the utility of CMR is higher in such cases.
Proteomics profiling is a recently developed technology that simultaneously measures the concentrations of thousands of proteins with as little as 65 microliters of blood and has been successfully used to discover biomarkers with high discriminative value in identifying HCM (17). Small studies have suggested that certain biomarkers may be associated with LGE on CMR (18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29). However, a comprehensive analysis with a high-throughput proteomics platform has not yet been performed. Thus, the purpose of this study was to test the hypothesis that plasma proteomics profiling can distinguish patients with and without LGE on CMR and identify signaling pathways associated with LGE in the HCM population. Development of a small panel of circulating protein biomarkers associated with LGE in HCM would help clinicians more precisely identify which patients with HCM should undergo CMR.

MATERIALS AND METHODS
Study Design and Sample
We designed a case-control study in the HCM population comparing cases with LGE and controls without LGE. Patients were enrolled from the HCM programs at Columbia University Irving Medical Center (CUIMC) (New York, NY) and Massachusetts General Hospital (MGH) (Boston, MA) between October 13, 2015 and December 11, 2018 and were consecutively included in this study if CMR and plasma proteomics profiling were performed. The diagnosis of HCM was established by echocardiographic evidence of left ventricular (LV) hypertrophy (maximum LV wall thickness ≥ 15 mm) that was out of proportion to the degree of systemic loading conditions and not explained by other diseases capable of producing similar findings (i.e., HCM phenocopies such as Fabry disease or cardiac amyloidosis). For patients with a family history of HCM, LV wall thickness ≥ 13 mm was considered diagnostic of HCM (30). Genetic variants classified as "definitely pathogenic" or "likely pathogenic" were considered a positive genotype, whereas "variant of uncertain significance," "likely benign" and "benign" were considered a negative genotype. We excluded patients with conditions that could lead to LV remodeling that may mimic HCM, such as aortic stenosis, subaortic membrane and exercise-induced cardiac remodeling. The training set used to derive the discrimination model consisted of patients from MGH. The independent test set for validation consisted of patients from CUIMC. The Mass General Brigham Institutional Review Board and that of CUIMC approved the study protocol and all participants provided written informed consent.

Blood Sample Processing and Proteomics Profiling
Venous blood specimens were drawn at the time of an outpatient clinic visit. Samples were collected in K2EDTA-treated tubes and centrifuged for 10 min at 3,100 rpm. The supernatant plasma was aliquoted and immediately frozen at −80 degrees Celsius (17).
Proteomics profiling was performed using the SomaScan assay (SomaLogic, Inc., Boulder, CO) (17,31,32). This is a tool for proteomics profiling that is highly multiplexed, sensitive, quantitative and reproducible. The assay measures plasma protein concentrations, from femtomolar to micromolar, with an excellent level of reproducibility: the median coefficient of variation is 4.6% (17,31,32). The assay's performance is similar to that of sandwich enzyme-linked immunosorbent assay and is especially useful to accurately measure concentrations of low-abundance proteins that conventional liquid chromatography/mass spectrometry cannot detect (17,31-33). Other details of the SomaScan assay have been previously published (17,31,32).

Cardiac Imaging
Two-dimensional and Doppler echocardiographic studies were performed with iE33 (Philips Medical Systems, Andover, Massachusetts) to obtain the clinical parameters presented in Table 1 using standard definitions (34,35). Peak LV outflow tract gradient was measured with continuous-wave Doppler.
CMR was ordered at the discretion of the treating physicians. CMR studies were performed on a 1.5-T field strength scanner (HDXt platform, General Electric Healthcare, Milwaukee, Wisconsin) with a dedicated 8-channel cardiac coil. The imaging protocol included localizer images with cine-balanced steady-state free precession imaging in the short axis, paraseptal long axis, horizontal long axis and 3-chamber views. The myocardial late enhancement sequences were performed in LV short axis and radial long axis 8 to 15 min after the 0.2 mmol/kg injection of intravenous gadopentetate dimeglumine (Magnevist, Bayer HealthCare Pharmaceuticals Inc., Wayne, New Jersey). Short axis late enhancement views were obtained with both 2-dimensional single slice per breath-hold imaging and 3-dimensional volumetric ventricular imaging. Inversion times were determined on an individual basis to null the normal myocardial signal.
The images were reviewed by expert readers using dedicated CMR analysis software (cmr42, Circle Cardiovascular Imaging Inc., Calgary, Alberta, Canada). Late myocardial enhancement images were analyzed using 2-dimensional views and coregistered 3-dimensional and long axis views for correlation when indicated (36). The presence or absence of LGE was determined by the reading cardiac radiologist. CMR readers were blinded to the results of plasma proteomics.

Univariable Analysis
We presented continuous variables as mean ± standard deviation if normally distributed and as median [interquartile range] if not normally distributed. To compare clinical characteristics between patients with LGE and without LGE, we used the unpaired Student's t-test for normally distributed continuous variables, the Mann-Whitney-Wilcoxon test for other continuous and ordinal variables (e.g., degree of mitral regurgitation) and the χ² test for categorical variables.
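As an illustration of two of the comparisons named above, the following Python sketch computes a Pearson χ² test for a 2 × 2 table and the Mann-Whitney U statistic. It is not the study's analysis code; the function names and the example counts are hypothetical, and the χ² test is shown without a continuity correction, which is an assumption.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Assumes every row and column total is non-zero."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def mann_whitney_u(x, y):
    """U statistic for group x versus group y; tied pairs count as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical counts: a binary characteristic (present, absent) in
# 82 patients with LGE and 65 patients without LGE.
stat, p_value = chi_square_2x2(30, 52, 10, 55)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

The U statistic is returned without a p-value to keep the sketch short; in practice a statistics package would supply the exact or normal-approximation p-value.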

Development of a 17-Protein Model to Distinguish Patients With and Without LGE
Logistic regression with elastic net regularization was used with a plan to identify a set of 15 candidate proteins with the greatest potential to discriminate patients with and without LGE (i.e., the 15 most discriminant proteins). This method was chosen because it handles settings in which the number of predictors (4,979) is much larger than the number of observations (147) (37,38). The model was trained using 5-fold cross-validation in the training set (i.e., patients followed at MGH, 111 of 147 patients). To preprocess the variables for the elastic net model, we performed sample-wise normalization using the median of all protein concentrations in each sample, followed by protein-wise log transformation and Pareto scaling of each protein concentration. We then created a hyperparameter tuning grid to identify the best hidden component and threshold parameter using the R caret and glmnet packages. Ultimately, 3 proteins were tied for the 15th most discriminative protein, leading to the inclusion of a total of 17 proteins in the model. We then evaluated the performance of the model developed from the training set in the independent test set (i.e., patients followed at CUIMC, 36 of 147 patients). We calculated the area under the receiver-operating-characteristic curve (AUC), sensitivity, specificity, positive predictive value and negative predictive value as indicators of the model's discriminative ability.
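The three preprocessing steps described above can be sketched in Python as follows. This is a minimal illustration rather than the study's R code: the function name `preprocess` and the toy data are hypothetical, the base-2 logarithm is an assumption (the base is not stated), and Pareto scaling is taken in its usual sense of mean-centering each variable and dividing by the square root of its standard deviation.

```python
import math
import statistics


def preprocess(samples):
    """samples: list of rows (one per patient), each a list of positive
    protein concentrations in the same protein order."""
    # 1) Sample-wise normalization: divide each row by its own median.
    normalized = [[v / statistics.median(row) for v in row] for row in samples]
    # 2) Protein-wise log transformation (base 2 is an assumption).
    logged = [[math.log2(v) for v in row] for row in normalized]
    # 3) Pareto scaling per protein: center on the mean, then divide by
    #    the square root of the standard deviation.
    columns = []
    for j in range(len(logged[0])):
        col = [row[j] for row in logged]
        mean = statistics.mean(col)
        sd = statistics.stdev(col)
        scale = math.sqrt(sd) if sd > 0 else 1.0  # guard constant columns
        columns.append([(v - mean) / scale for v in col])
    # Transpose back to one row per patient.
    return [list(row) for row in zip(*columns)]


# Hypothetical toy data: 3 patients x 3 proteins.
toy = [[1.0, 2.0, 4.0], [2.0, 8.0, 4.0], [3.0, 3.0, 12.0]]
scaled = preprocess(toy)
```

Compared with unit-variance scaling, Pareto scaling shrinks large-variance proteins less aggressively, so relative differences in variability are partly retained.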

Pathway Analysis
We performed pathway analysis to identify canonical pathways that are dysregulated (i.e., either upregulated or downregulated) between patients with and without LGE. We used the 144 most discriminant proteins, defined by a p-value of < 0.05 on univariable analysis. We subsequently determined the associations among the 144 most discriminant proteins and canonical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) (39). Significance was based on the ratio of the number of proteins within the 144 most discriminant proteins that map to a canonical pathway divided by the number of proteins that belong to the pathway (39). We considered a pathway as positive (i.e., dysregulated) if the false discovery rate (FDR) was <0.05 and there were at least 3 associated proteins (40).

RESULTS
Patient characteristics are shown in Table 1. Patients with LGE had greater maximal LV wall thickness and interventricular septal wall thickness and were more likely to have had prior non-sustained ventricular tachycardia (NSVT) and a pathogenic or likely pathogenic genetic variant when compared to patients without LGE. Other demographic and clinical characteristics were similar between the 2 groups.
As shown in Figure 1, the proteomic profiles differed between HCM patients with and without LGE. The discrimination model to distinguish LGE positivity showed high discriminative ability in the training set (AUC 0.83, 95% confidence interval [CI] 0.75-0.90; p < 0.001 compared to the null hypothesis of AUC = 0.5; Figure 2). The sensitivity in the training set was 0.72 (95% CI, 0.61-0.83) and the specificity was 0.78 (95% CI, 0.66-0.90). Furthermore, the discrimination model derived from the training set retained discriminative ability when applied to the independent test set for validation (AUC 0.71, 95% CI 0.54-0.88, p = 0.03; Figure 2). In the test set, the sensitivity was 0.71 (95% CI, 0.49-0.92) and the specificity was 0.74 (95% CI, 0.54-0.93). Based on the discrimination model derived from the training set, patients in the test set who had high probability of having LGE had significantly higher odds of having LGE compared to those who had low probability (odds ratio 29.6, 95% CI 1.6-948.5; p = 0.03). Figure 3 displays the 17 most discriminant proteins that were included in the discrimination model.
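For readers less familiar with these metrics, the Python sketch below shows one way the AUC, sensitivity and specificity can be computed from predicted probabilities. It is illustrative only: the function names, the toy scores and the 0.5 probability cutoff are assumptions, since the cutoff used to separate high from low probability is not stated here.

```python
def auc(scores, labels):
    """Probability that a randomly chosen case (label 1) scores higher than
    a randomly chosen control (label 0); tied pairs count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Classify score >= threshold as predicted LGE (threshold is assumed)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted probabilities for 3 cases and 3 controls.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels), sensitivity_specificity(scores, labels))
```

The pairwise definition of the AUC used here is equivalent to the area under the empirical receiver-operating-characteristic curve and makes the link to the Mann-Whitney statistic explicit.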
Pathway analysis using the 144 most discriminant proteins demonstrated dysregulation in 15 pathways with FDR < 0.05 (Table 3). These included pathways that have been recognized to be dysregulated in HCM, such as those involved in inflammation (interleukin-17, cytokine-cytokine receptor interaction) as well as sugar and amino acid metabolism. Moreover, the list of dysregulated pathways contained pathways that were previously unrecognized to be dysregulated in HCM with LGE, e.g., the RIG-I-like receptor signaling pathway and the PI3K-Akt signaling pathway.
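The decision rule applied to each pathway (FDR < 0.05 and at least 3 associated proteins) can be sketched in Python as below. The Benjamini-Hochberg procedure is used here as an assumption, because the specific FDR method is not named in the text; the function names, pathway labels and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def positive_pathways(pathways, fdr=0.05, min_proteins=3):
    """pathways: list of (name, p_value, n_mapped_proteins) tuples.
    Keep pathways with adjusted p < fdr and at least min_proteins proteins."""
    q_values = benjamini_hochberg([p for _, p, _ in pathways])
    return [
        name
        for (name, _, k), q in zip(pathways, q_values)
        if q < fdr and k >= min_proteins
    ]


# Hypothetical enrichment results for three pathways.
toy = [("pathway A", 0.001, 5), ("pathway B", 0.002, 2), ("pathway C", 0.8, 10)]
print(positive_pathways(toy))
```

In this toy input, the second pathway passes the FDR threshold but is excluded because only 2 proteins map to it, which is the role of the minimum-protein requirement.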

DISCUSSION
Summary of Findings
In the present multi-center case-control study of 82 cases with LGE and 65 controls without LGE on CMR in the HCM population, comprehensive proteomics profiling of 4,979 proteins demonstrated a good discriminative ability to distinguish patients with and without LGE. Furthermore, pathway analysis displayed previously recognized (e.g., inflammation, sugar and amino acid metabolism) and newly recognized (e.g., RIG-I-like receptor signaling, PI3K-Akt signaling) pathways that were dysregulated in patients with LGE.

Results in Context
LGE in HCM represents myocardial fibrosis (5-7) and has been associated with an increased risk of SCD from ventricular arrhythmias, which can be effectively aborted by ICD (8)(9)(10)(11). By subsequently facilitating ICD implantation, identifying high-risk features such as LGE on CMR contributes to reduced disease-specific mortality (12)(13)(14)(15). However, in certain circumstances, patients are unable to easily undergo CMR with gadolinium enhancement for risk stratification because of limited accessibility, the expertise required to conduct and interpret the test, and patient-specific factors such as claustrophobia and chronic kidney disease (41-43). Therefore, it is important to identify patients with HCM who have a high pretest probability of LGE, because pursuing CMR, despite potential barriers, would be more likely to change clinical management in this HCM subpopulation. The prevalence of LGE in the present study is consistent with prior studies that suggest a pooled prevalence of LGE of approximately 60% (44). Prior studies have used various methods to predict LGE on CMR. A prior study reported that a clinical model including a history of NSVT, reduced LV systolic function and maximal echocardiographic LV wall thickness had a high discriminative ability to predict extensive LGE. Nevertheless, the study's exclusion of patients at high risk for SCD limits its generalizability (29). Thus far, 2 studies have attempted to estimate the likelihood and extent of LGE based on electrocardiographic findings (45,46). However, 1 study was limited by a small sample size (42 patients including controls) and young age (7-31 years), making the inferences less applicable to older patients seen in adult cardiology practices (45). The other study used the Selvester QRS score and showed a high degree of accuracy in determining the presence and extent of LGE but was limited by an extensive scoring system and the need for automated software (46).
In addition to these clinical and electrocardiogram-based prediction models, associations between LGE in HCM and plasma circulating biomarkers such as cardiac troponin, natriuretic peptides and markers of collagen turnover have been studied (21,23,24,47). Higher concentrations of cardiac troponin have been associated with LGE (21,22,24,25,29,47). Other studies have shown elevated concentrations of midregional pro-adrenomedullin (27) and matrix metalloproteinase 9 (18) and lower levels of apelin to be associated with LGE (23). Concentrations of serum N-terminal pro-B-type natriuretic peptide and B-type natriuretic peptide have been associated with LGE in some studies but not in others on multivariable analysis (24,47). Taken together, these prior studies support the importance of identifying protein biomarkers of LGE in HCM that are easily obtained in a non-invasive manner (e.g., blood). In this context, the present study serves as the first to apply a comprehensive proteomics approach to identify novel circulating biomarkers of LGE in HCM.

Application of Proteomics Profiling to Biomarker Discovery in Cardiovascular Diseases
Proteomics profiling using the SomaScan assay has previously been utilized to identify novel plasma circulating biomarkers associated with cardiovascular diseases and cardiometabolic risk (e.g., the Framingham Heart Study) (48). Furthermore, proteomics profiling has been applied to several cardiovascular conditions including coronary artery disease (48)(49)(50), hypertension (51), heart failure and cardiomyopathies (52)(53)(54)(55)(56). Our group and others have previously demonstrated the role of plasma proteomics profiling in distinguishing HCM from healthy controls (57) and other cardiovascular conditions (17,58). A recent small study has also shown differences in proteomics profiling among patients with HCM before and after surgical myectomy (59). On the whole, these studies support the role of proteomics profiling in identifying novel biomarkers in a variety of cardiovascular conditions. Our current study of comprehensive proteomics profiling adds to the literature by demonstrating that a discrimination model using a small number (17 proteins) of plasma biomarkers has good accuracy to detect LGE in the HCM population. The potential clinical utility of such a small panel of plasma circulating biomarkers is further underscored by the availability of rapid and low-cost methods to determine plasma protein concentrations (e.g., sandwich enzyme-linked immunosorbent assay).

Signaling Pathways Associated With LGE on CMR in HCM
Prior studies have demonstrated several dysregulated signaling pathways associated with cardiac hypertrophy, including those related to glycolysis (60)(61)(62), the pentose phosphate pathway (60,63), fructose and mannose metabolism (61,62,64,65), tyrosine metabolism (57) and glycine, serine and threonine metabolism (66). These metabolic pathways were also found to be dysregulated in the present study, suggesting that these pathways contribute not only to the development of LV hypertrophy but also to the progression to LV fibrosis in HCM.
The present study also revealed dysregulation of previously unrecognized pathways, e.g., the RIG-I-like receptor signaling pathway and the PI3K-Akt signaling pathway (67-70), in patients with LGE in the HCM population. The association between the PI3K-Akt pathway and LGE in HCM is an interesting finding because this pathway is upstream of the Ras-MAPK pathway, upregulation of which has been shown to cause HCM-like cardiac changes in RASopathies such as Noonan syndrome (71)(72)(73)(74)(75). Recent proteomics studies have suggested that the Ras-MAPK pathway is upregulated in patients with HCM and is associated with larger left atrial diameters and more severe New York Heart Association functional classes (17,58). While the PI3K-Akt pathway has been previously associated with physiologic cardiac hypertrophy (67)(68)(69), the newly observed association with LGE in HCM has particularly relevant clinical implications because the downstream Ras-MAPK pathway is modifiable. Specifically, the HCM-like cardiac phenotype in RASopathies can be mitigated by Ras-MAPK inhibition (75)(76)(77). Moreover, this pathway has been a drug target in cancer treatment development and such data may inform future applicability to cardiovascular disease (78). Taken together with prior reports, the observation in the present study indicates that the PI3K-Akt pathway and its downstream Ras-MAPK pathway may play a role not only in HCM pathogenesis but also in progression to LV fibrosis. Our findings also suggest that targeting the upstream PI3K-Akt pathway may be another worthwhile focus for future drug development as it relates to LV fibrosis in HCM, and the availability of inhibitors specific to the pathway further underscores the potential utility of such efforts (78,79).

Strengths of the Present Study
We took multiple measures to minimize false positive and negative findings and to enhance the internal and external validity of the study. First, we derived a proteomics-based discrimination model from the training set of patients followed at MGH and validated its discriminative ability in an independent test set of patients followed at CUIMC. The observation that the proteomics-based model to predict LGE maintained good accuracy in the independent test set underscores the robustness of the model and the external validity of the inferences from the present study. Second, to reduce false positive declarations, we used an FDR threshold of 0.05 to determine the significance of pathway dysregulation. Controlling the FDR limits the expected proportion of false positives among the pathways declared positive: at a threshold of 0.05, fewer than 1 in 20 such pathways is expected to be a false positive. Moreover, by using pathway analysis, we strengthen the biological plausibility and reduce the risk of false positive discovery, given that the proteins are interconnected rather than isolated findings from a univariable analysis (40). Third, with respect to false negative findings, our list of differentially regulated proteins and pathways included those known to be dysregulated in cardiac hypertrophy (e.g., the KEGG pathway named "hypertrophic cardiomyopathy") and other pathways known to be involved in HCM pathogenesis (e.g., inflammation, sugar and amino acid metabolism). These pathways serve as "positive controls" in our study and further support the robustness of plasma proteomics to identify signaling pathways that are differentially regulated between patients with and without LGE on CMR in the HCM population. Finally, our study utilized the most comprehensive (∼5,000 proteins) proteomics profiling to date (17,80), thus reducing the risk of false negatives (i.e., failure to identify important protein biomarkers and pathways).

Potential Limitations
There are several potential limitations to the current study. First, LGE was a binary variable and quantification was not performed to identify the extent of LGE. Second, no association with subsequent clinical outcomes (e.g., SCD) was evaluated. Third, the study sample consisted of patients who were enrolled at tertiary care centers and underwent CMR. Therefore, the inferences may not be generalizable to populations with less severe clinical manifestations or those who did not undergo CMR. However, limiting enrollment to 2 centers enabled strict control and standardization of the protocol, which are indispensable components of accurate proteomics profiling and CMR. Fourth, temporality or causality between differentially regulated pathways and LGE in HCM was not assessed. Fifth, not all patients with HCM underwent genetic testing. Sixth, myocardial samples were not available and, as such, direct analysis with tissue specimens could not be performed. Seventh, the number of patients included in the test set was relatively small and the negative predictive value was modest; therefore, a negative prediction by proteomics profiling did not completely rule out LGE on CMR. Nevertheless, the current analysis serves as a proof-of-concept study for future investigations to further improve the predictive ability. Finally, although the sample size was larger than most prior studies and we used the aforementioned methods to reduce the chance of false positive discovery, the possibility of false positive discovery remains.

CONCLUSIONS
The present study demonstrated, for the first time, the role of comprehensive proteomics profiling to distinguish patients with and without LGE on CMR in the HCM population and revealed signaling pathways associated with LGE. By identifying patients with a high pretest probability of having LGE, the present study serves as the first step to establishing a panel of circulating protein biomarkers to better inform clinical decisions between patients and physicians regarding CMR utilization when the risk-benefit calculation of CMR is balanced. Our work also showed that multiple pathways, both known and novel, were dysregulated in patients with LGE. These findings should facilitate further investigations into the underlying molecular mechanisms through which genetic mutations lead to the development of LGE in patients with HCM and pathways that may be targeted by future pharmacotherapies.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Columbia University Irving Medical Center Institutional Review Board and the Mass General Brigham Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
BL, YZ, KH, and YS contributed to conception and design, analysis, and interpretation of the data, as well as manuscript preparation and revision. MM, AT-R, MF, and MR contributed