Single-Cell RNA Sequencing and Quantitative Proteomics Analysis Elucidate Marker Genes and Molecular Mechanisms in Hypoplastic Left Heart Patients With Heart Failure

Objective To probe markers and molecular mechanisms of the hypoplastic left heart (HLH) by single-cell RNA sequencing (scRNA-seq) and quantitative proteomics analysis. Methods Following data preprocessing, scRNA-seq data of pluripotent stem cell (iPSC)-derived cardiomyocytes from one HLH patient and one control were analyzed by the Seurat package in R. Cell clusters were characterized, which was followed by a pseudotime analysis. Markers in the pseudotime analysis were utilized for functional enrichment analysis. Quantitative proteomics analysis was based on peripheral blood samples from HLH patients without heart failure (HLH-NHF), HLH patients with heart failure (HLH-HF), and healthy controls. Hub genes were identified by the intersection of pseudotime markers and differentially expressed proteins (DE-proteins), which were validated in the GSE77798 dataset, RT-qPCR, and western blot. Results Cardiomyocytes derived from iPSCs were clustered into mesenchymal stem cells, myocardium, and fibroblast cells. Pseudotime analysis revealed their differentiation trajectory. Markers in the three pseudotime clusters were significantly associated with distinct biological processes and pathways. Finally, three hub genes (MMP2, B2M, and COL5A1) were identified, which were highly expressed in the left (LV) and right (RV) ventricles of HLH patients compared with controls. Furthermore, higher expression levels were detected in HLH patients with or without HF than in controls. Conclusion Our findings elucidate marker genes and molecular mechanisms of HLH, deepening the understanding of the pathogenesis of HLH.


INTRODUCTION
Hypoplastic left heart (HLH) is a rare and complex form of congenital heart disease (CHD) including a spectrum of malformations that are characterized by the dysplasia of the leftsided cardiac structures, and thus resulting in the obstruction of the flow of blood into the systemic circulation (Tchervenkov et al., 2006). Epidemiological Investigation of HLH defect that accounts for 2-9% of all infants born with CHD (Hoffman and Kaplan, 2006;Bittle et al., 2018). The prevalence of HLH is approximately 2-3 cases per 10,000 live births (Gordon et al., 2008;Reller et al., 2008;Fixler et al., 2010). HLH accounts for 2-4% of congenital heart malformation-related deaths among newborns (Alphonso et al., 2020). Without intervention, this disease is highly fatal and remains clinically challenging, with approximately 25% of deaths in the first few weeks of life (Egbe et al., 2015). Despite recent advances in multistage reconstructive Norwood or Norwood-like operation, transferring the systemic circulation based on right ventricle (RV) physiology, approximately one-third of HLH patients die from terminal RV failure by the age of 25 (Bittle et al., 2018). Currently, there are two main treatments: heart transplantation and staged functional univentricular palliation (Barron et al., 2009). Heart transplantation is the sole treatment strategy for end-stage HF patients, accompanied by long-term immunosuppression and graft dysfunction (Everitt et al., 2012). The morbidity and mortality of HLH patients are determined by the function of the tricuspid valve and RV (Carrillo et al., 2020). Despite advances in the 3-stage surgery and medication approach, HLH still has a high mortality.
Human genetic studies suggest that HLH has genetic heterogeneity and polygenicity in etiology (Yagi et al., 2018). Genetic linkage research has demonstrated many important loci in the relatives of HLH, confirming genetic heterogeneity as well as common etiology (Martin et al., 2007;Hinton et al., 2009;McBride et al., 2009). Despite large-scale chromosomal abnormalities that have been detected among a high proportion of patients with extracardiac defects, most cases of patients scarcely exhibit large-scale alterations with other phenotypes (Glidewell et al., 2015). Furthermore, the genetic changes related to the etiology of non-syndromic cases remain unclear. Increasing evidence suggests that a single gene mutation could contribute to the progression of HLH. For example, mutations in NKX2-5 (Elliott et al., 2003), NOTCH1 receptor (Hrstka et al., 2017), and the MYH6 head domain variant  have been detected in HLH patients. Furthermore, copy number variants (CNVs) can affect the transcription of cardiovascular genes, thereby affecting the etiology of HLH (Glidewell et al., 2015). Researchers try to interpret genetic effect or molecular regulation mechanism in the fields of HLH occurrence, development and complications, and so on. But most studies, undeniably, ignore the clarification of the morphogenetic background and correlation analysis among between genefeatures (Anderson).
scRNA-seq can simultaneously study the expression levels of multiple genes in thousands of single cells. It has been widely used to identify various cell types in the pathological process of cardiovascular diseases (Chaudhry et al., 2019). Furthermore, scRNA-seq has been able to explore distinct heterogeneity within a population of single-cell subtypes. Characterization of phenotypes for individual cells based on transcriptome data can depict cell development and expound changes in specific subgroups due to diseases. Quantitative proteomics is one of the most important applications in the field of proteomics research. Through quantitative proteomics technology, the changes in protein levels between samples can be quantitatively compared, which has been applied to cardiovascular disease etiology (Motiwala, 2019). In this study, we identified proteins related to HLH progression based on scRNA-seq of cardiomyocytes derived from iPSCs and quantitative proteomics analysis of HLH blood samples , which may provide evidence of HLH molecular mechanisms.

scRNA-Seq Data
This study obtained single-cell transcriptome profiles of cardiomyocytes that were derived from induced pluripotent stem cells (iPSCs) isolated from one HLH patient and one control who experienced cardiac-directed differentiation from the GSE146763 dataset in the Gene Expression Omnibus (GEO 1 ) repository (Barrett et al., 2013;Miao et al., 2020).

Data Preprocessing
10x Genomics data were read by the Seurat package in R, version 3.1 (Hafemeister and Satija, 2019). Cells with >2500 or <200 genes or a single cell with more than 5% mitochondrial genes were removed. Then, data were normalized by the LogNormalize function. Using the FindVariableFeatures function, the mean and variance of each gene were calculated. Highly variable genes between different cells were obtained for downstream analysis. The data were scaled through the ScaleData function. Dimensionality reduction analysis was then presented based on principal component analysis (PCA). The cells with similar expression were clustered through the FindClusters function. The shared nearest neighbor (SNN) was utilized to identify cell types. First, the SNN graph was constructed through k nearest neighbors, and then the module was optimized through the Louvain algorithm to determine the clustering results (Hafemeister and Satija, 2019). The resolution was set as 1.0. The cell clusters were visualized by t-distributed stochastic neighbor embedding (t-SNE).

Differential Expression Analysis
Differential expression analysis was performed to find marker genes for different cell clusters through the FindMarkers function, with the threshold of the percentage of any gene detected in the two cell populations ≥ 0.25 and log fold change (FC) ≥ 0.25 (Hafemeister and Satija, 2019). Using the Doheatmap function, a heat map was visualized for the indicated cells and marker genes. Each cell cluster displayed the top 20 marker genes.

Pseudotime Analysis
The objects in the Seurat were converted into those recognized by the Monocle2 package in R. First, following data normalization, feature genes that defined a cell's progress were chosen. Then, data dimensionality was reduced with the DDRTree method, followed by cell order along the trajectory. Pseudotime marker gene sets were obtained by the differential GeneTest function (Hafemeister and Satija, 2019). The root state parameter was utilized to specify the starting end by the orderCells function. The branch that contained the most cells at state 0 was then identified.

Functional Enrichment Analysis
To explore the potential biological functions of pseudotime marker genes, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were carried out. GO annotation analysis was presented through the "clusterprofiler" package, including biological process, cellular component, and molecular function. An adjusted p-value < 0.05 was significantly enriched (Yu et al., 2012). KEGG pathway enrichment analysis was carried out based on the DAVID online

Clinical Sample Collection
Peripheral blood samples from one HLH child without heart failure (HLH-NHF), one HLH child with heart failure (HLH-HF), and one healthy child (control) were collected from Guangzhou Women and Children's Medical Center between June 1, 2018, and January 1, 2020. This research strictly followed the guidelines of the Declaration of Helsinki. All children were diagnosed by physical signs, two-dimensional echocardiogram (2-DE), electrocardiogram (ECG), cardiac ultrasonography, or cardiac CT that abided by the guidelines of The European Association for Cardio-Thoracic Surgery (EACTS). And the cases combined with ventricular septal defects (VSDs), atrial septal defects (ASDs), and double outlet right ventricle (DORV) were excluded. The diagnosis of heart failure was referred to the American Heart Association Foundation (ACCF)/American Heart Association (AHA) guideline (Yancy Cw Fau -Jessup et al.

Mass Spectrometry Analysis
After centrifugation at 12,000 g, the supernatant of blood samples was harvested and stored at -80 • C until further analysis. The samples were incubated with a reaction solution composed of 1% SDC, 10 mM TCEP, and 40 mM CAA at 60 • C for 30 min for protein denaturation, disulfide bond reduction, and sulfhydryl alkylation. The protein concentration was determined by the Bradford method. The sodium deoxycholate (SDC) concentration was diluted to <0.5%, and trypsin was added according to a 1:50 mass ratio of enzyme to protein, followed by incubation at 37 • C with shaking overnight for digestion. The next day, trifluoroacetate (TFA) was added to terminate the digestion. The pH of the solution was adjusted to approximately 6.0. Following centrifugation at 12,000 × g for 15 min, the supernatant was collected for desalination. After the desalted peptide solution was drained by a centrifugal concentrator, the samples were stored at -20 • C for testing.

DIA Quantitative Proteomics Analysis
The mass spectrum files scanned by DIA were processed by DIA-Umpire to obtain secondary mass spectrum files that can be used for database search. MSFragger software was utilized to search the database of the secondary mass spectrum, and the result obtained was used as a spectral library for subsequent targeted extraction. The DIA-NN algorithm was used for DIA targeted extraction and quantification (Demichev et al., 2020). The screening criteria were as follows: false discovery rate (FDR) < 0.01. The protein quantitative intensity data obtained by DIA analysis were converted by log2. Moreover, data were filled using the imputation algorithm in Perseus software. Following normalization, differential expression analysis was performed by Student's t-test. DEproteins were screened with the threshold of FC ≥ 2 or ≤0.5 and p-value ≤ 0.05. Then, GO, KEGG, and Clusters of Orthologous Groups (COG 2 ) annotation and enrichment analyses were performed. An adjusted p-value < 0.05 was significantly enriched. The RAW data and analysis scripts can be accessed from Dr. RZ upon request.

Validation by RNA-Seq
RNA-seq profiles of HLH-LV and HLH-RV from HLH mice and their littermate controls (control-LV and control-RV) were retrieved from the GSE77798 dataset in the GEO database 4 (Liu et al., 2017). The expression of MMP2, COL5A1, and B2M was verified in the control-LV vs. HLH-LV and control-RV vs. HLH-RV groups.

Western Blot
Proteins were extracted from blood samples, and their concentration was determined using a BCA kit (Beyotime, Shanghai, China). Extracted proteins were separated using an SDS-PAGE separation gel, followed by transference onto membranes. After being blocked with skimmed milk for 1 h, the membrane was incubated with primary antibodies against MMP2, COL5A1, B2M, and GAPDH at 4 • C overnight, followed by incubation with secondary antibodies for 1 h. The antibodies used were from Cell Signaling Technology (Danvers, MA, United States). Immunoluminescence agents (Pierce, Rockford, IL, United States) were added to the membrane. The protein blots were visualized on Kodak Scientific Imaging Systems (New Haven, CT, United States).

Statistical Analysis
Statistical analysis was p using R language (version 3.6.3 5 ) and GraphPad Prism software (version 8.0). For RT-qPCR and western blot assays, each experiment was repeated ≥ 3 times. Data are expressed as the mean ± standard deviation. Multiple comparisons were analyzed by one-way analysis of variance (ANOVA). P < 0.05 was statistically significant.

Characterization of Cell Cluster Compositions in HLH and Normal Cardiomyocytes Derived From iPSCs
The workflow of this research is shown in Figure 1. A total of 6737 highly variable genes between different HLH and normal cardiomyocytes derived from iPSCs were screened for downstream analysis, which could highlight the biosignals in the scRNA-seq dataset (Figure 2A). Based on HLH and normal cardiomyocytes derived from iPSCs, a total of five cell clusters (0-4) were clustered ( Figure 2B). Clusters 0, 2, and 4 were categorized as the myocardium population. Cluster 1 was classified as the mesenchymal stem cell population. Cluster 3 was composed of fibroblast cells. Following identification of the cell clusters, marker genes for these cell clusters were screened with the criteria of min.pct = 0.25 and log FC ≥ 0.25 ( Figure 2C). The top 20 marker genes are displayed for each cell cluster. Then, we analyzed the state of the four cell clusters. Pseudotime analysis revealed their differentiation trajectory (Figures 2D-F).

Functional Enrichment Analysis of Marker Genes in Pseudotime Clusters
GO and KEGG annotation enrichment analyses were presented to explore the biological functions of cell development-related marker genes from pseudotime analysis. For pseudotime cluster 1, marker genes distinctly exhibited RNA binding functions and were significantly associated with protein localization to the endoplasmic reticulum ( Figure 3A) and the eukaryotic translation elongation pathway ( Figure 3B). Marker genes in pseudotime cluster 2 had collagen binding, extracellular structural constituent, and cell adhesion molecule binding functions ( Figure 3C) and were distinctly enriched in the extracellular matrix organization pathway ( Figure 3D). Marker genes in pseudotime cluster 3 were significantly related to circulatory system development ( Figure 3E) and dilated cardiomyopathy ( Figure 3F).

DIA Quantitative Proteomics Analysis of HLH-NHF, HLH-HF, and Control Blood Samples
Data-independent acquisition quantitative proteomics was analyzed based on HLH-NHF, HLH-HF, and control blood samples. The sample information and clinical features are presented in Supplementary Table 1. In total, 276 plasma proteins were obtained. There were high correlations of plasma proteins between HLH-NHF, HLH-HF, and control samples ( Figure 4A). With the threshold of FC ≥ 2 or ≤0.5 and p-value ≤ 0.05, we identified 65 DEproteins (36 upregulated and 29 downregulated) between HLH-NHF and the control, 84 DEproteins (46 upregulated and 38 downregulated) between HLH-HF and the control, and 85 DEproteins (48 upregulated and 37 downregulated) between HLH-NHF and HLH-HF. The GO enrichment analysis results showed that DEproteins in different comparison groups were enriched in distinct biological processes. In Figure 4B, DEproteins in HLH-NHF vs. control were significantly enriched in blood coagulation, hemostasis, and coagulation. DE proteins in HLH-HF vs. control mice were distinctly associated with extracellular matrix organization and the protein activation cascade. Moreover, DE proteins in HLH-NHF vs. HLH-HF had a significant relationship with extracellular structure organization and regulation of the inflammatory response. The heat maps visualized the top 20 plasma DEproteins in the three groups. As depicted in Figure 4C, there were distinct differences in their expression. The KEGG pathway enrichment analysis results revealed that the DEproteins were related to distinct signaling pathways. For instance, in Figure 4D, the HLH-NHF vs. control and HLH-HF vs. control groups had the PI3K-Akt signaling pathway distinctly enriched by DEproteins. In the HLH-NHF vs. HLH-HF group, DEproteins were significantly enriched in the phospholipase D and calcium signaling pathways. Furthermore, the COG term enrichment analysis results demonstrated that DEproteins in the three groups had a significant relationship with carbohydrate transport and metabolism, posttranslational modification, protein turnover, chaperones, energy production and conversion, and signal transduction mechanisms (Figure 4E).

Validation of Hub Genes From scRNA-Seq and Proteomics for HLH
Protein-protein interaction networks were constructed to explore the interactions of DE proteins in the three groups of HLH-NHF vs. control (Figure 5A), HLH-HF vs. control ( Figure 5B), and HLH-NHF vs. HLH-HF ( Figure 5C). Among the three networks, HPT had the highest degree. Following combining pseudotime markers and DEproteins in HLH-NHF vs. control, HLH-HF vs. control, and HLH-NHF vs. HLH-HF, three hub genes were identified, including MMP2, COL5A1, and B2M ( Figure 5D). Their expression was validated in the GSE77798 dataset. The results showed that MMP2, COL5A1, and B2M expression was significantly increased in the HLH-LV group compared to the control LV group (all p-values < 0.05; Figure 5E). Moreover, their expression was distinctly higher in the HLH-RV group than in the control RV group (all p-values < 0.05; Figure 5E).
Simultaneously, we obtained 6 cases of HLH and 3 cases of normal children's peripheral blood samples from the biological sample bank of Guangzhou Women and Children's Medical Center for verification. The corresponding clinical information was also collected. Among the six cases, includes three cases HLH-NHF and three cases HLH-HF, we found all the had a  In this study, all of the HLH cases had varying degrees of mitral valve dysplasia, includes atresia and stenosis, and showing a limitation in the dynamic activity of the stenotic valve. Additionally, the diameter size of RV (RVD) was apparently higher in HLH than in normal patients (Control vs. HLH-NHF: Mean Diff = -3.67, 95%CI of diff = -6.13 to -1.21; Control vs. HLH-HF: Mean Diff = -4.67, 95%CI of diff = -7.13 to -2.21; all p < 0.05). On the other hand, RV walls were hypertrophied in four HLH cases (66.7%), and the tricuspid annulus had a mild to moderate degree of dilation and regurgitation. In this setting, all of the HLH cases present aortic dysplasia, which is characterized by the decreased ascending aortic diameter (five cases, 83.3%) and the abnormalities of the descending aortic arch (two cases, 33.3%). In contrast to the normal group, the diameter size of aortic diameter (AOD) was significantly lower in HLH children (Control vs. HLH-NHF: Mean Diff = 7.00, 95%CI of diff = 4.54-9.46; Control vs. HLH-HF: Mean Diff = 7.17, 95%CI of diff = 4.71-9.63; all p < 0.05). No significant differences were detected among HLH-NHF and HLH-HF groups. Ultimately, the 2-DE results also shown a mild to moderate degree of MPAD dilatation in HLH cases than the normal group (Control vs. HLH-NHF: Mean Diff = -3.00, 95%CI of diff = -5.46 to -0.54; Control vs. HLH-HF: Mean Diff = -4.00, 95%CI of diff = -6.46 to -1.54; all p < 0.05).
We validated the expression of hub genes in blood samples from control, HLH-NHF, and HLH-HF patients by RT-qPCR and western blot. As shown in Figure 5F, compared to the control group, hub gene mRNA expression was markedly increased in both the HLH-NHF and HLH-HF groups (all p-values < 0.05; Figure 5H). There were distinctly higher mRNA expression levels in the HLH-HF group than in the HLH-NHF group (all p-values < 0.05). Consistently, western blot results showed that MMP2, COL5A1, and B2M exhibited higher protein expression levels in the HLH-NHF and HLH-HF groups than in the control group (all p-values < 0.05; Figure 5I). However, there was no significant difference in their expression between the HLH-NHF and HLH-HF groups (p > 0.05).

DISCUSSION
Hypoplastic left heart is one of the most complex congenital heart diseases and is usually established quickly based on symptoms, prenatal examination, and postnatal echocardiography. HLH is almost universally fatal among infants and children (Barron et al., 2009;Saraf et al., 2019). Remmell-Dow et al. (1995) compared 42 HLH hearts with 16 normal hearts and found eustachian valve dysplasia in up to 92.9% of HLH hearts, as well as 33.3% limbus dysplasia. Similarly, Bharati and Lev reported a surgical anatomy analysis of 230 HLH hearts that illustrated that 45.7% of hearts had aortic atresia (AS) and mitral stenos (MS), as well as 41.3% of hearts having aortic and mitral atresia (AA/MA) (Bharati et al., 1984). The same exciting research by Crucean et al. (2017) recategorized the HLH heart into the slit-like left ventricle (SLLV), miniaturized left ventricle (MLV), and thickened left ventricle with endocardial fibroelastosis (TLV-EFE) according to valve patency phenotype. In this study, they shared the proportion of each subtype of HLH that was up to 70% TLV-EFE, and there was no evidence of a connection between the degree of myocardial thickening and valvar stenosis (Crucean et al., 2017). Regarding anatomical heterogeneity, Stephens et al. (2020a,b) found that 30% of HLH combined with AA/MA, 22% had AS/MS, and 48% had MS/AA, and interestingly, these results showed that the syndrome was an acquired disease of fetal life, not abnormal embryogenesis.
Most patients still suffer from HF and recurrent pneumonia, although the treatment of the disease has been greatly improved until the Norwood operation or various surgical stages. Herein, the management and control of heart failure in HLH children are particularly important, while the causes of HLH are still unclear. The advancement of scRNA-seq and proteome technology opens a new chapter in the study of the pathological mechanism of HLH. For example, scRNA-seq studies in the adult heart have elucidated distinct differences in the gene expression of cardiomyocytes (Skelly et al., 2018). scRNA-seq can find more heterologous gene expression, which has not been detected by previous tissue-based RNA-seq (Chaudhry et al., 2019). In this study, we analyzed the transcriptional expression changes during the HLH development by the single-cell transcription profile. Furthermore, combined with DIA protein sequencing technology, the occurrence and progression of HLH were described from the two levels of transcription and protein expression. Three core regulatory factors of HLH were finally identified and validated by integrating scRNA-seq and DIA quantitative proteomics analysis, which could offer novel insights into the etiology of HLH.
In this study, we analyzed scRNA-seq data of cardiomyocytes derived from HLH and normal iPSCs. Three cell clusters, including mesenchymal stem cells, myocardium, and fibroblast cells, were characterized by identification of the specific expression of marker genes in each cell type, indicating distinct expression and functional differences in HLH iPSCderived cardiomyocytes (Jiang et al., 2014). In many biological processes, cells are not completely synchronized. Pseudotime analysis was achieved by ranking cells based on the similarity of their gene expression profiles. During development, cells will respond to stimuli and change from one functional "state" to another throughout their life. Cells in different states express different genes and produce dynamic repetitive sequences of proteins and metabolites to complete their work. When a cell moves between different states, it undergoes a process of transcriptional recombination, in which some genes are silenced, and others are activated. Differential expression analysis was used to characterize cell clusters in this study. Compared with all other cells, differential expression analysis can identify gene sets that are more highly expressed in the cell cluster, which provides clues on the cell type/state or cell function in the cell cluster. Herein, we identified the genes that were regulated during the cell trajectory. GO and KEGG annotation and enrichment analysis revealed that these pseudotime marker genes in different clusters were involved in distinct key biological processes and pathways. For example, in pseudotime cluster 1, eukaryotic translation elongation was significantly enriched, indicating that these marker genes could be involved in mediating the cardiac translation process. Protein synthesis is a strictly regulated process. The increase in heart protein levels can induce cardiac hypertrophy and HF (Grund et al., 2019). Thus, the identification of genetic factors that regulate the level of protein synthesis in cardiomyocytes may offer promising therapeutic targets for HLH. Extracellular matrix organization was distinctly enriched by marker genes in pseudotime cluster 2, which was a critical factor for valve formation. It is also related to intrinsic endocardial defects in HLH (Miao et al., 2020). Furthermore, marker genes in pseudotime cluster 3 had a significant relationship with dilated cardiomyopathy, indicating that these genes might be important for heart development and cardiovascular disease (Sucharov et al., 2015).
Our DIA quantitative proteomics analysis of HLH-NHF, HLH-HF, and control blood samples identified 276 plasma proteins related to HLH. After the intersection of pseudotime markers and DEproteins in HLH-NHF vs. control, HLH-HF vs. control, and HLH-NHF vs. HLH-HF, three hub genes were identified, including MMP2, COL5A1, and B2M. We further validated their expression in an independent dataset. Our results demonstrated that they were expressed at low levels in both LV and RV HLH compared to controls. Moreover, our RT-qPCR and western blot results confirmed that they exhibited lower expression levels in HLH with or without HF than in controls. Cardiac fibrosis has been detected in HLH by histopathology (Kido et al., 2018). Increased angiotensin II and aldosterone levels promote collagen synthesis and reduce collagen breakdown by regulating matrix metalloproteinases, thereby inducing extracellular matrix (ECM) homeostasis (Nakano et al., 2017). Radosinska et al. (2017) found that ECM homeostasis was considered a structural and physiological foundation of the heart and that matrix metalloproteinases (MMPs) are an important environmental mediator of cardiac diastolic or systolic dysfunction (Radosinska et al., 2017). Moreover, both the MMP2 expression level in myocardial tissue and blood circulation was most frequently reported to represent myocardial remodeling accompanied by myofibrosis, inflammation, and consequent development of heart failure (George et al., 2005;Andreasová et al., 2020). It has been reported that COL5A1 mutations are closely related to the abnormal heart or arterial development (Weerakkody et al., 2016;Richer et al., 2020). Ramirez et al. (2012) identified that the expression level of COL5A1, reflecting the status of inflammatory and extracellular adhesion, was increased in left ventricle tissues in relation to the hypoxia response. Based on in situ hybridization analysis, Roulet's group reported that the expression of COL5A1 plays an important role in the development of functional connective tissues, especially in the heart, dorsal aorta wall, skin, and vertebral column, suggesting that COL5A1 plays an important role in the development of these organs in developing embryos (Roulet et al., 2007). Interestingly, Yokota et al. (2020) found that the COL5A1 expression level significantly correlated with scar size after myocardial infarction mechanically, not only in the regulation of the size of cardiac scars but also in tissue mechanical properties in an integrin-dependent manner. The expression level of B2M in the plasma is not simply a significant marker for kidney injury but a novel regulator involved in heart failure and heart remodeling (Vianello et al., 2015). Vianello's group demonstrated that the B2M level was increased in chronic heart failure patients and that it was positively correlated with the levels of proBNP and GFR but negative in percent ejection fraction (Vianello et al., 2015). Using 9,988 participants' data from the Atherosclerosis Risk in Communities (ARIC) study, we showed a more sensitive advantage in predicting outcomes, including kidney function, cardiovascular disease, and mortality, in the general population for serum B2M levels (Astor et al., 2012). Through retrospective analysis of 460 elderly isolated systolic hypertension (ISH) patients, Wang's group suggested that serum B2M level is an independent risk factor associated with major cardiovascular events and all-cause death (Wang et al., 2018). Thus, the three hub genes could participate in the pathogenesis of HLH and become promising therapeutic targets for HLH, which deserves more in-depth exploration in future research.

CONCLUSION
Taken together, this study clarified the heterogeneity of HLH cardiomyocytes based on scRNA-seq expression profiles. Furthermore, we characterized the type and function of different cell clusters through marker gene or functional annotation analysis. The cell types or states were organized into a trajectory. After validation, three hub genes (MMP2, COL5A1, and B2M) were identified by combining quantitative proteomics analysis, which could become promising therapeutic targets for HLH. Our research offers new insights into the molecular mechanism of HLH pathogenesis, which provide a theoretical basis for new therapeutic interventions. Here, several limitations were also detected in our study. First, there is no systematic correlation analysis constructed among the clinical features and hub marker's expression. Second, HLH is still controversial and debatable, and its definition happens to be constantly improved and supplemented, among which the most authoritative one is defined as the left side of the heart is underdeveloped, while excludes the aortic atresia with VSD and a welldeveloped mitral valve and LV. There are still shortcomings in this study that applied the term "HLH" not the common term "hypoplastic left heart syndrome (HLHS)" since the complexity of morphologic and physiologic characteristics in HLHS is hard to define based on clinical examination in our center. Third, our verification and quantitative proteomics analysis of hub gene expression are based on the HLH patient's blood serum, not the myocardial samples, which has certain limitations.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the patients of all children signed written informed consent. This study was approved by the Ethics Committee of Chinese Clinical Trial Registry Center and Guangzhou Women and Children's Medical Center (Registration number: ChiCTR-EOC-17013273; Approved No. of ethic committee: 2017103101). Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin. Written informed consent was obtained from the individual(s), and minor(s)' legal guardian/next of kin, for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
LM, NZ, and RZ takes responsibility for all aspects of the reliability and freedom from bias of the data presented and their discussed interpretation, drafting the article. WS, YL, ND, JZ, and XZ takes responsibility for statistical analyses, and interpretation of data. XC, HX, and YW takes responsibility for full-text evaluation and guidance, final approval of the version to be submitted. All the authors read and approved the final manuscript.

FUNDING
This study was funded by the Guangdong peak project (DFJH201802), the Key Project of Natural Science Foundation of Guangdong Province (2017B030311010).