Functional enrichment analysis of mutated genes in children with hyperthyroidism

Objective The incidence of hyperthyroidism among Chinese children is relatively high and has been increasing in recent years, with a significant impact on their healthy development. Hyperthyroidism is a polygenic disorder and is therefore harder to predict and treat than monogenic diseases. This study aimed to characterize the functions and gene sets associated with mutated genes in children with hyperthyroidism, in terms of gene ontology (GO enrichment analysis) and biological signaling pathways (KEGG enrichment analysis), thereby improving our understanding of the combined effects of multiple mutated genes on pediatric hyperthyroidism. Methods Whole-exome sequencing was performed on DNA samples from children with hyperthyroidism. Pathogenic genes related to hyperthyroidism were screened using the public disease databases MalaCards, MutationView, and ClinVar, and the functions and influences of the identified genes were analyzed using statistical analysis and gene enrichment analysis. Results GO enrichment analysis showed that the most significantly enriched term was the molecular function “hormone activity”. The corresponding set of mutated genes with shared effects on hyperthyroidism in children included TG, CALCA, POMC, CGA, PTH, GHRL, FBN1, TRH, PRL, LEP, ADIPOQ, INS, and GH1. The second most significantly enriched term was the biological process “response to peptide hormone”, with the corresponding gene set including LRP6, TSC2, KANK1, COL1A1, CDKN1B, POMC, STAT1, MEN1, APC, GHRL, TSHR, GJB2, FBN1, GPT, LEP, ADIPOQ, INS, and GH1. KEGG enrichment analysis showed that the most significantly enriched pathway was “Thyroid hormone signaling pathway”.
The corresponding set of mutated genes with shared effects on hyperthyroidism in children included NOTCH3, MYH7, TSC2, STAT1, MED13L, MAP2K2, SLCO1C1, SLC16A2, and THRB. The second most significantly enriched pathway was “Hypertrophic cardiomyopathy”, with the corresponding gene set including IGF1, CACNA1S, MYH7, IL6, TTN, CACNB2, LAMA2, and DMD. Conclusion The mutated genes in children with hyperthyroidism were closely linked to the GO terms “hormone activity” (molecular function) and “response to peptide hormone” (biological process), and to the KEGG pathways “Thyroid hormone signaling pathway” and “Hypertrophic cardiomyopathy”.


Introduction
Hyperthyroidism is characterized by the excessive synthesis and release of thyroid hormones, leading to increased metabolism and sympathetic nervous system activity and resulting in symptoms such as palpitations, sweating, increased appetite, increased bowel movements, and weight loss. Its hallmark is increased synthesis and secretion of the thyroid hormones thyroxine (T4) and triiodothyronine (T3) (1). According to an epidemiological survey report released in 2021, the overall incidence rate of hyperthyroidism in mainland China is approximately 1.78% (2). The overall incidence of hyperthyroidism in European children and adolescents is approximately 4.58 per 10,000 per year, with a higher prevalence in girls than in boys (3). Children with hyperthyroidism may experience thyroid enlargement, increased appetite with weight loss, bulging eyes, irritability, excessive sweating, tachycardia, hyperactivity, and tremors (4). They may also have comorbid conditions such as attention deficit hyperactivity disorder (ADHD), regulatory disorders, anxiety, bipolar disorder, and depression, and have an increased risk of suicide. The physical and mental health of children with hyperthyroidism can therefore be severely affected (5). Early and accurate diagnosis, precise treatment, and improved prognosis of pediatric hyperthyroidism are thus of great clinical significance.
Currently, more than 100 genes have been reported to be related to the development of hyperthyroidism, according to the MalaCards, MutationView, and ClinVar databases. Among these, the genes that have received the most attention include: (1) the thyroid stimulating hormone receptor (TSHR) gene. TSHR mutations are one of the causes of non-autoimmune hyperthyroidism (6,7), and neonatal non-autoimmune hyperthyroidism is associated with the heterozygous TSHR mutation c.1856A>G (p.Asp619Gly) (8); (2) the cytotoxic T-lymphocyte associated protein 4 (CTLA4) gene. CTLA4 mutations in the Chinese Han population are related to the development of hyperthyroidism (9)(10)(11); (3) the thyroglobulin (TG) gene. Specific TG SNP haplotypes are associated with hyperthyroidism (12), and patients with hyperthyroidism carrying the TG E33 SNP C/C genotype are more likely to relapse after discontinuing medication (13); (4) the GNAS complex locus (GNAS) gene. GNAS mutations are found in approximately 70% of patients with autonomous thyroid adenomas (14), and non-autoimmune hyperthyroidism is also associated with GNAS mutations (15); (5) the thyroid hormone receptor beta (THRB) gene. THRB mutations are associated with a tendency towards thrombosis in patients with hyperthyroidism (16).
Research on hyperthyroidism-related gene mutations has mainly focused on individual genes, such as studies of TSHR mutations in congenital non-autoimmune hyperthyroidism (17)(18)(19), the impact of novel heterozygous TSHR mutations on hyperthyroidism (20), and work paying particular attention to the effects of TSHR mutations on hyperthyroidism (21). The role of BAFF gene mutations in the pathogenesis of hyperthyroidism has also been studied (22). Other studies have explored the relationship between hyperthyroidism and related diseases, such as KCNJ18 mutations in hyperthyroidism with hypokalemic periodic paralysis (23), BRAF mutations in hyperthyroidism with papillary thyroid carcinoma (24), and UGT1A1 mutations in hyperthyroidism with liver failure (25).
Studies of gene mutations in children with hyperthyroidism have also mainly focused on single genes, including analyses of TSHR mutations in children with hyperthyroidism (26,27), their relationship with congenital non-autoimmune hyperthyroidism in newborns (28), and gain-of-function TSHR mutations leading to increased thyroid function and growth (29). There have also been studies of gene sets, such as the age-related HLA-DRB1*03 allele in the Polish population (30) and the genes BTNL2, NOTCH4, TNFAIP3, and CXCR4, which are significantly associated with early-onset hyperthyroidism (31). Other recent studies on pediatric hyperthyroidism have focused on prevention (32,33), treatment options (34)(35)(36)(37)(38)(39), and related clinical symptoms (40,41).
In summary, genetic research on hyperthyroidism, especially pediatric hyperthyroidism, has mainly focused on single genes that affect disease development, and relatively few studies have analyzed the combined effects of multiple mutated genes. Therefore, to gain a more comprehensive understanding of the disease and to help promote the healthy development of children and adolescents, more in-depth analysis of the functional implications of mutated genes in pediatric hyperthyroidism is needed.

WES data generation and gene processing
Cases were obtained from the Guangzhou Women and Children's Medical Center. After approval by the Center's Ethics Committee and with informed consent from the guardians of the children, peripheral blood samples were collected from 39 children with hyperthyroidism and DNA was extracted.
Whole-exome sequencing was performed on the DNA samples with GRCh38 as the reference genome. The main steps of the whole-exome sequencing pipeline were data filtering, data quality control, mapping to the reference, duplicate marking, InDel realignment, base quality recalibration, a second round of quality control, variant calling, variant filtering, and variant annotation. GATK was used to call SNPs and InDels. The read mapping rate was 99%, indicating that the reference sequence was appropriate; the average sequencing depth was 150X, which is sufficient for analysis; and 99% of the target region reached at least 4X coverage.
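The mapping rate, mean depth and ≥4X coverage figures quoted above are simple ratios. The sketch below shows how they could be computed, assuming per-base read depths over the exome target are already available as a list of integers; the function names and toy numbers are hypothetical and are not part of the study's pipeline.

```python
from typing import Sequence


def mapping_rate(mapped_reads: int, total_reads: int) -> float:
    """Fraction of sequenced reads that aligned to the reference genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def coverage_summary(depths: Sequence[int], min_depth: int = 4) -> tuple[float, float]:
    """Return (mean depth, fraction of target bases with depth >= min_depth).

    `depths` is assumed to hold one read-depth value per targeted exome base.
    """
    if len(depths) == 0:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, covered / len(depths)


# Toy depths for 10 target bases (illustrative numbers, not study data).
toy_depths = [150, 160, 140, 155, 3, 148, 152, 149, 151, 150]
mean_depth, frac_4x = coverage_summary(toy_depths)
print(f"mean depth = {mean_depth:.1f}X, bases >= 4X = {frac_4x:.0%}")
print(f"mapping rate = {mapping_rate(990, 1000):.0%}")
```

A single low-depth base (3X) is enough to pull the ≥4X fraction below 100% while barely moving the mean, which is why both numbers are usually reported.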
The whole-exome sequencing data were then screened. Genes related to hyperthyroidism were retrieved from the public disease databases MalaCards, MutationView, and ClinVar and compared with the mutated genes of the 39 patients, yielding a total of 144 hyperthyroidism-related genes. These genes are listed in Table 1.
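The screening step amounts to intersecting the union of database-derived disease genes with each patient's mutated genes. The sketch below assumes both inputs are already available as gene-symbol sets; the database contents and patient gene sets in the toy example are hypothetical placeholders, not the study's data.

```python
def screen_candidate_genes(
    database_genes: dict[str, set[str]],
    patient_mutated: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each disease-associated gene to the patients in whom it is mutated."""
    # Union of disease-associated genes across all source databases.
    disease_genes: set[str] = set().union(*database_genes.values())
    hits: dict[str, set[str]] = {}
    for sample, genes in patient_mutated.items():
        # Keep only mutated genes that appear in at least one database.
        for gene in genes & disease_genes:
            hits.setdefault(gene, set()).add(sample)
    return hits


# Hypothetical toy inputs: database contents and patient gene sets are made up.
databases = {
    "MalaCards": {"TSHR", "TG", "CTLA4"},
    "MutationView": {"TG", "THRB"},
    "ClinVar": {"TSHR", "GNAS", "THRB"},
}
patients = {"GD1": {"TTN", "TG"}, "GD2": {"TSHR", "PRUNE2", "TG"}}
candidates = screen_candidate_genes(databases, patients)
print(sorted(candidates))  # ['TG', 'TSHR']
```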

Clinical data of the children studied
All 39 children with hyperthyroidism in this study had corresponding clinical data, which are summarized in Table 2. Female patients outnumbered male patients, and school-age children accounted for the largest proportion. Fourteen patients had a family history of hyperthyroidism and 9 had abnormal liver function; second-degree goiter was the most common grade, and most patients had abnormal TPOAb, TGAb, TG, and TRAb levels.

Functional enrichment analysis
The "clusterProfiler" R package was used for the enrichment analyses in this study. Enrichment analysis compares the gene set obtained from the patients (section 1.1) with a reference gene set in a database to determine which biological functions or pathways occur significantly more often in the analyzed gene set than expected from the reference, thereby identifying the biological processes, functions, or pathways that significantly affect the patients, together with the corresponding gene sets.
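"Significantly more often than in the reference" is usually quantified with a one-sided hypergeometric test, the model clusterProfiler documents for over-representation analysis. The sketch below is a plain-Python illustration of that test, not clusterProfiler's code; the function name and toy counts are hypothetical.

```python
from math import comb


def ora_pvalue(universe_size: int, term_size: int, query_size: int, overlap: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= overlap).

    universe_size: N, genes in the annotated background
    term_size:     K, background genes annotated to the term or pathway
    query_size:    n, genes in the list being tested (e.g. the screened genes)
    overlap:       k, query genes annotated to the term
    """
    upper = min(term_size, query_size)
    # Sum exact integer counts first, then divide once to avoid overflow.
    numerator = sum(
        comb(term_size, i) * comb(universe_size - term_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / comb(universe_size, query_size)


# Toy numbers: 20 background genes, 5 in the term, 4 tested genes, 3 of them in the term.
p = ora_pvalue(universe_size=20, term_size=5, query_size=4, overlap=3)
print(f"GeneRatio 3/4, BgRatio 5/20, p = {p:.4f}")
```

Using exact integer binomial coefficients keeps the calculation stable even for realistic sizes (tens of thousands of background genes and a query of 144 genes), because only the final ratio is converted to a float.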

Mutation data analysis and visualization
The "maftools" R package (42) was used to analyze and visualize the whole-exome sequencing data. We first computed statistics on the whole-exome sequencing data for the screened genes, namely the distribution of high-frequency mutated genes in each patient and the number of rare mutations in each patient relative to the 1000G_ALL database, and then visualized the resulting data distributions.
Table 3 shows the distribution of the top 10 high-frequency mutated genes across samples, and the top 3 genes are displayed graphically. As shown in Figure 1, TTN has the highest mutation frequency in patient GD1, PRUNE2 in patients GD30 and GD31, and CDH23 in patient GD17.
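A table like Table 3 can be derived from a per-variant record list by ranking genes and then counting variants per gene and sample. The sketch below assumes one (sample, gene) pair per called variant, as found in the Tumor_Sample_Barcode and Hugo_Symbol columns of a MAF file; ranking by number of mutated samples and then by total variant count is an assumption, not necessarily the study's exact criterion.

```python
from collections import Counter, defaultdict


def top_gene_distribution(records, top_n=10):
    """Per-sample mutation counts for the top_n most frequently mutated genes.

    records: iterable of (sample, gene) pairs, one per called variant.
    Genes are ranked by number of mutated samples, then by total variant count.
    """
    records = list(records)
    samples_per_gene = defaultdict(set)
    per_cell = Counter()  # (gene, sample) -> number of variants
    per_gene = Counter()  # gene -> total number of variants
    for sample, gene in records:
        samples_per_gene[gene].add(sample)
        per_cell[(gene, sample)] += 1
        per_gene[gene] += 1
    ranked = sorted(
        samples_per_gene,
        key=lambda g: (-len(samples_per_gene[g]), -per_gene[g], g),
    )
    all_samples = sorted({sample for sample, _ in records})
    # Counter returns 0 for (gene, sample) pairs with no variants.
    return {g: {s: per_cell[(g, s)] for s in all_samples} for g in ranked[:top_n]}


toy = [("GD1", "TTN"), ("GD1", "TTN"), ("GD1", "TTN"), ("GD2", "TTN"),
       ("GD2", "PRUNE2"), ("GD2", "PRUNE2"), ("GD1", "CDH23")]
for gene, row in top_gene_distribution(toy, top_n=2).items():
    print(gene, row)
```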
Table 4 shows the number of rare variants, relative to the 1000G_ALL database, among the top 10 high-frequency variant genes in each sample, and Figure 2 shows the corresponding proportion of rare variants for each patient. Samples GD9 and GD39 have the largest proportions of rare variants.
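A per-sample rare-variant proportion can be sketched as the share of variants whose population allele frequency falls below a cut-off. The study does not state its cut-off, so the 0.01 threshold below is a hypothetical choice, as is treating variants absent from 1000G_ALL (allele frequency None) as rare.

```python
from collections import Counter
from typing import Iterable, Optional


def rare_variant_proportion(
    variants: Iterable[tuple[str, Optional[float]]],
    af_threshold: float = 0.01,
) -> dict[str, float]:
    """Per-sample fraction of variants that are rare in a population database.

    variants: (sample, allele frequency) pairs; None means the variant is absent
              from the database (e.g. 1000G_ALL) and is treated as rare.
    af_threshold: hypothetical cut-off; the study does not state its value.
    """
    totals: Counter = Counter()
    rare: Counter = Counter()
    for sample, af in variants:
        totals[sample] += 1
        if af is None or af < af_threshold:
            rare[sample] += 1
    return {sample: rare[sample] / totals[sample] for sample in totals}


toy = [("GD9", None), ("GD9", 0.002), ("GD9", 0.30), ("GD1", 0.25), ("GD1", 0.004)]
print(rare_variant_proportion(toy))
```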
Figure 3 summarizes the mutation information for children with hyperthyroidism. The 15 most frequently mutated genes were TTN, PRUNE2, CDH23, RYR1, SPINK5, SCN10A, VWF, NOTCH3, MYO3B, TG, LAMA2, KANK1, APC, TECTA, and PCDH15, each of which was mutated in all 39 children (100%; Figures 3A, B). Synonymous mutations were the most common mutation type in the 39 children, followed by missense mutations (Figure 3A). Among SNV classes, T>C was predominant (Figure 3B). Different genes showed different tendencies toward particular mutation types: WDR37 tended to carry splice-site mutations; COL3A1, GRHPR, OPA3, and SOX10 tended to carry silent mutations; and INS and SLC24A5 tended to carry missense mutations (Figure 3C).
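SNV classes such as "T>C" are conventionally reported with a pyrimidine reference base, so an A>G change on one strand is counted as T>C, and C>T and T>C are the two transition classes. The sketch below illustrates this six-class convention and the transition/transversion split; the function names and toy SNVs are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITIONS = {"C>T", "T>C"}  # pyrimidine<->pyrimidine after strand collapsing


def snv_class(ref: str, alt: str) -> str:
    """Collapse an SNV into one of six classes with a pyrimidine (C/T) reference."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # A purine reference is reported on the opposite strand, so A>G counts as T>C.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def summarize_snvs(snvs):
    """Return (class counts, number of transitions, number of transversions)."""
    classes = Counter(snv_class(ref, alt) for ref, alt in snvs)
    ti = sum(n for cls, n in classes.items() if cls in TRANSITIONS)
    tv = sum(classes.values()) - ti
    return classes, ti, tv


toy = [("T", "C"), ("A", "G"), ("C", "A"), ("G", "T"), ("T", "C")]
classes, ti, tv = summarize_snvs(toy)
print(classes.most_common(), "Ti:", ti, "Tv:", tv)
```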

GO enrichment analysis
Using the over-representation analysis (ORA) method, we performed a statistical analysis of the 144 screened genes with the GO database as the reference. The resulting p-values were corrected for multiple testing using the false discovery rate (FDR), yielding an adjusted significance index denoted p.adjust; a lower p.adjust value indicates more significant enrichment.
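The text does not name the FDR procedure; clusterProfiler's enrichGO and enrichKEGG default to pAdjustMethod = "BH" (Benjamini-Hochberg), so the sketch below assumes that procedure. It mirrors the behavior of R's p.adjust(method = "BH"): each p-value is scaled by m/rank and a running minimum keeps the adjusted values monotone.

```python
def bh_adjust(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg FDR adjustment, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 4) for q in bh_adjust(raw)])
```

Starting the running minimum at 1.0 also caps every adjusted value at 1, matching the usual definition.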
Figure 4 displays the functional enrichment of the 144 mutated genes in the three GO categories. BP (Biological Process) describes the biological processes in which genes are involved, such as cell differentiation and DNA replication; CC (Cellular Component) describes the cellular location where a gene product acts; and MF (Molecular Function) describes the activities of individual gene products at the molecular level, such as protein kinase activity and insulin receptor activity. In Figure 4, the horizontal axis, labeled "Count", denotes the number of genes in each enriched gene set; longer bars indicate more genes.
Bar color encodes the significance of the enrichment: redder bars correspond to smaller p.adjust values and therefore higher significance.
Based on the results in Figure 4, in the Biological Process (BP) category the most significant enrichment of mutated genes is in "response to peptide hormone", followed by "hormone transport", "cellular response to peptide", "anatomical structure homeostasis", "cellular response to peptide hormone stimulus", and "glucose homeostasis". In the Cellular Component (CC) category, the most significant enrichment is in "endoplasmic reticulum lumen", followed by "collagen-containing extracellular matrix", "cation channel complex", "ion channel complex", "chromosomal region", and "myofibril". In the Molecular Function (MF) category, "hormone activity" shows the most significant enrichment, followed by "receptor ligand activity", "signaling receptor activator activity", "extracellular matrix structural constituent", "insulin-like growth factor receptor binding", and "four-way junction DNA binding". Complete information for the top 8 significantly enriched BP terms is shown in Table 5, ranked by p.adjust. The "Description" column describes the biological process associated with each enriched gene set, the "geneID" column lists all genes in the gene set with shared effects, and the "Count" column gives the number of genes in the gene set.
Complete information for the top 8 significantly enriched cellular component (CC) terms is shown in Table 6, ranked by p.adjust. The "Description" column describes the cellular component associated with each enriched gene set, the "geneID" column lists all genes in the gene set with shared effects, and the "Count" column gives the number of genes in the gene set.
Complete information for the top 8 significantly enriched molecular function (MF) terms is shown in Table 7, ranked by p.adjust. The "Description" column describes the molecular function associated with each enriched gene set, the "geneID" column lists all genes in the gene set with shared effects, and the "Count" column gives the number of genes in the gene set.
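The result tables described in this section follow the layout of clusterProfiler output, where geneID is a "/"-separated gene list, Count is its length and GeneRatio is "k/n". The sketch below ranks precomputed term results by p.adjust and lays them out the same way; the TermResult class is hypothetical and the toy terms, genes and p-values are placeholders, not values from Tables 5 to 7.

```python
from dataclasses import dataclass


@dataclass
class TermResult:
    description: str
    genes: list[str]   # query genes annotated to the term
    query_size: int    # number of genes tested (144 in this study)
    p_adjust: float


def top_terms_table(results: list[TermResult], top_n: int = 8) -> list[dict]:
    """Rank terms by p.adjust and lay them out like an enrichment result table."""
    ranked = sorted(results, key=lambda r: r.p_adjust)[:top_n]
    return [
        {
            "Description": r.description,
            "GeneRatio": f"{len(r.genes)}/{r.query_size}",
            "p.adjust": r.p_adjust,
            "geneID": "/".join(r.genes),
            "Count": len(r.genes),
        }
        for r in ranked
    ]


# Hypothetical toy terms; names, genes and p-values are placeholders, not study results.
toy = [
    TermResult("TERM_A", ["GENE1", "GENE2"], 144, 3e-2),
    TermResult("TERM_B", ["GENE1", "GENE3", "GENE4"], 144, 2e-4),
]
for row in top_terms_table(toy):
    print(row)
```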
Overall, the enrichment significance of the mutated genes was relatively low for CC terms and relatively high for BP terms. Across all GO categories, the most significant functional enrichment of mutated genes in children with hyperthyroidism was observed for "hormone activity" and "response to peptide hormone".

TABLE 4
The rare variants of the top 10 high-frequency variant genes in each sample.
Sample | Number of rare mutations from top 10 high-frequency genes
GD9 | 10

Figure caption: Rare variation proportion of each sample.

KEGG enrichment analysis
Using the ORA method, we performed a statistical analysis of the 144 screened genes with the KEGG pathway database as the reference. The resulting p-values were corrected for multiple testing using the FDR, yielding an adjusted significance index denoted p.adjust; a lower p.adjust value indicates more significant enrichment.
Figure 5 displays the 10 biological signaling pathways most significantly enriched among the 144 mutated genes. The horizontal axis, labeled "Count", denotes the number of genes in each enriched gene set; longer bars indicate more genes. Bar color encodes significance: redder bars correspond to smaller p.adjust values and therefore higher significance.
Based on the results in Figure 5, the mutated genes are most significantly enriched in "Thyroid hormone signaling pathway", followed by "Hypertrophic cardiomyopathy", "Dilated cardiomyopathy", "Neuroactive ligand-receptor interaction", "Thyroid hormone synthesis", and "cAMP signaling pathway".
Complete information for the top 10 significantly enriched pathways is shown in Table 8, ranked by p.adjust. The "Description" column describes the signaling pathway associated with each enriched gene set, the "geneID" column lists all genes in the gene set with shared effects, and the "Count" column gives the number of genes in the gene set.

Discussion
Hyperthyroidism develops through the combined action of multiple genes and various genetic factors. Advances in bioinformatics have provided new approaches and tools for studying diseases such as hyperthyroidism. In this study, we processed and statistically analyzed whole-exome sequencing data from 39 children with hyperthyroidism and used GO and KEGG enrichment analyses to identify significant functional gene sets associated with pediatric hyperthyroidism, with the aim of clarifying the pathogenesis of hyperthyroidism as a polygenic disease and the shared effects of its mutated genes on different functions.
Mutation data visualization showed that, among the 144 mutated genes in the 39 children with hyperthyroidism, the 15 most frequently mutated genes were TTN, PRUNE2, CDH23, RYR1, SPINK5, SCN10A, VWF, NOTCH3, MYO3B, TG, LAMA2, KANK1, APC, TECTA, and PCDH15, and the top 45 high-frequency genes were mutated in every patient. In addition, WDR37 tended to carry splice-site mutations; COL3A1, GRHPR, OPA3, and SOX10 tended to carry silent mutations; and INS and SLC24A5 tended to carry missense mutations. Synonymous mutations were the most common mutation class in children with hyperthyroidism, followed by missense mutations. Among single nucleotide variants, T>C was the most frequent.
GO enrichment analysis showed that, in the biological process (BP) category, the most significantly enriched gene set associated with hyperthyroidism in children was "response to peptide hormone" (p.adjust = 9.01 × 10⁻⁷), comprising 19 mutated genes: CPS1, LRP6, TSC2, KANK1, COL1A1, CDKN1B, POMC, STAT1, MEN1, APC, GHRL, TSHR, GJB2, FBN1, GPT, LEP, ADIPOQ, INS, GH1. In the cellular component (CC) category, the most significantly enriched term was "endoplasmic reticulum lumen" (p.adjust = 1.03 × 10⁻³), comprising 12 mutated genes: PLOD3, COL1A1, COL12A1, COL5A1, MEN1, IL6, GHRL, ALB, FBN1, COL3A1, F2, INS. In terms of the molecular function (MF), the enriched functional gene

Figure caption: The top 15 pathways involved in hyperthyroidism in children by KEGG.

FIGURE 2 (A) Cohort summary plot of the mutation information for 39 children with hyperthyroidism. The first row shows the distributions of variant classification, variant type, and SNV class; the second row shows the mutation count and classification for each sample; and the final stacked bar plot shows the 15 most frequently mutated genes. (B) Transition (Ti) and transversion (Tv) plot showing the distribution of SNVs in children with hyperthyroidism; the stacked bars show the SNV distribution for each child. (C) Landscape of mutation profiles for the 39 children with hyperthyroidism, with mutated genes ordered by mutation frequency.

TABLE 1
List of genes associated with hyperthyroidism in the children studied.

TABLE 2
Clinical data of the children studied.

TABLE 3
The distribution of the top 10 high-frequency mutated genes on each sample.

TABLE 5
GO enrichment results in terms of the BP category.

TABLE 6
GO enrichment results in terms of the CC category.

TABLE 7
GO enrichment results in terms of the MF category.

TABLE 8
The results of KEGG enrichment.