Whole Exome Sequencing Identifies a Novel Pathogenic RET Variant in Hirschsprung Disease

Hirschsprung disease is a birth defect characterized by complete absence of neuronal ganglion cells from a portion of the intestinal tract. To uncover genetic variants contributing to HSCR, we performed whole exome sequencing on seven members of an HSCR family. With the minor allele frequency (MAF) calculated by gnomAD, we finally filtered a total of 1,059 rare variants in this family (MAF < 0.1%). With the mode of inheritance and pathogenicity scores by bioinformatics tools, we identified an in-frameshift variant p.Phe147del in RET as the disease-causing variant. Further analysis revealed that the in-frameshift variant may function by disrupting the glycosylation of RET protein. To our knowledge, this is the first study to report the in-frameshift variant p.Phe147del in RET responsible for heritable HSCR.


INTRODUCTION
Hirschsprung disease (HSCR) is a congenitally genetic disorder of the enteric nervous system (ENS) characterized by complete absence of neuronal ganglion cells from a portion of the intestinal tract. The incidence of HSCR is approximately 1 in 5,000 live births, which varies among different ethnic groups (Parisi and Kapur, 2000). HSCR can be classified into short-segment HSCR (S-HSCR), long-segment HSCR (L-HSCR), and total colonic aganglionosis (TCA) based on the length of the aganglionic segment (Lantieri et al., 2006). Treatment options for HSCR include surgical treatment with resection of the aganglionic segment and reconstitution of the intestinal passage after the first year of life, following bridging therapy with colostomy (Bachmann et al., 2015).
The mode of inheritance of HSCR vary from dominant with reduced penetrance or recessive in familial cases to a more complex, non-Mendelian mode of inheritance in the sporadic cases (Amiel et al., 2008). So far, several genes have been found to contribute to HSCR, such as RET (Tomuschat and Puri, 2015), ECE1 (Vohra et al., 2007), EDN3 (Sanchez-Mejias et al., 2010), EDNRB (Sanchez-Mejias et al., 2010), GDNF (Eketjall and Ibanez, 2002), NRTN (Doray et al., 1998), SOX10 (Lecerf et al., 2014), PHOX2B (Fernandez et al., 2013), and KIAA1279 (Amiel et al., 2008). With the exception of the RET proto-oncogene that is responsible for approximately 50% of familial and up to 15% of sporadic cases, other HSCR genes only account for a small proportion of the cases (Amiel et al., 2008). RET encodes a transmembrane tyrosine kinase receptor that, during development of specific neuronal cell lineages, transduces extracellular signals for cell growth and differentiation. The mechanism of HSCR caused by loss-of-function mutations in RET is highly dependent on their location in the protein (So et al., 2011). For example, mutations affecting the intra-cytoplasmatic domain could impair the kinase activity required for proper signal transduction, altering either the catalytic function, the stability of the enzyme structure or the binding of transduction effectors (Hyndman et al., 2013). In contrast, mutations of the extracellular domain (ECD) can affect RET function through a number of different mechanisms, such as lack of ligand binding, and impairment of protein folding (Kjaer and Ibanez, 2003). However, these mechanisms are mostly revealed for missense, nonsense, frameshift, and splicing mutations, and the pathogenicity of in-frameshift variants is underestimated by previous studies.
In the present study, we collected an HSCR family with four affected and three unaffected members. To uncover the novel pathogenic genes or variants, we performed whole exome sequencing on seven family members. With the filtering steps by minor allele frequency (MAF), the mode of inheritance, cosegregation, and pathogenicity scores by the bioinformatics tools, we identified a novel in-frameshift variant p.Phe147del in RET as the disease-causing variant, which may function by disrupting RET N-glycosylation. To our knowledge, this is the first study to report the p.Phe147del in RET as a disease-causing variant for heritable HSCR.

Clinical Features of the HSCR Cases
The proband (III-2) was a 2-month-old Chinese boy who had the symptoms of abdominal distension and vomiting after birth ( Figure 1A). The proband was diagnosed as Hirschsprung disease by barium enema examination (Figures 1B-D), which showed typical symptoms of congenital megacolon. In detail, we observed that the ganglion cells were present in dilated segment, but not detected in narrow segment of mucosa of intestinal wall and myenteric nerve plexus by microscopic image-based histologic examination. Unmyelinated nerve fiber and Schwann cells were increased in the narrow segment. Further family history survey revealed that the proband's father (II-1), older brother (III-1), and cousin (III-3) had the similar symptoms ( Figure 1A). The proband as well as his older brother and cousin was diagnosed as short-form and long-form aganglionosis based on the length of aganglionosis, respectively. Unfortunately, the length of aganglionosis was unclear due to loss of medical record. All the patients were cured by radical operation.

Identification of Rare Variants in Coding or Splicing Regions
To uncover the genetic variants contributing to HSCR, we performed whole exome sequencing on the seven family members, including II-1, II-2, II-3, II-4, III-1, III-2, and III-3. Variants were called by VarScan (Koboldt et al., 2012) with the trio-based mode. The steps of genetic variant analysis were illustrated in Figure 2. In total, we detected 239,225 variants at which at least one family member had an allele that varied from the reference genome, including 217,172 substitutions and 22,053 indels (insertion and deletion) (Figure 2A). For the three affected boys (III-1, III-2, and III-3), the Mendelian errors were estimated about 1.10, 1.43, and 1.33%, suggesting that the variants were high reliable by the trio-based variant calling strategy. As the heritable Hirschsprung disease was rare in the population, the pathogenic variants were more likely to be rare in healthy population. To identify the rare variants, we firstly obtained their MAFs from gnomAD database (Lek et al., 2016). By excluding the non-coding and synonymous variants, we finally filtered a total of 1,059 rare variants in this family (MAF < 0.1%). As the Hirschsprung disease could be inherited by autosomal dominant and recessive patterns (Amiel et al., 2008), autosomal dominant and recessive variants were considered. Among the homozygous variants and biallelic variants, no recessive variants were shared by the four patients. Notably, under the hypothesis of autosomal dominant inheritance, the unaffected female, II-4, may be a carrier of the pathogenic variant due to incomplete penetrance. Finally, we identified 18 dominant variants ( Figure 2B), including 14 missenses, 1 nonsense, 2 frameshifts, and 1 in-frameshift variants, co-segregated in the three boys.
To further evaluate the relationship between the variants and HSCR, we performed literature review about the seven pathogenic candidates. We found one in-frameshift variant p.Phe147del in RET, the most commonly observed pathogenic gene for HSCR. The remaining genes, such as GLYCTK (Sass et al., 2010), DRD5 (Daly et al., 1999), NPHP3 (Bergmann et al., 2008), and FANCI (Mehta and Tolar, 1993), were wellcharacterized pathogenic genes for some other rare diseases with recessive mode of inheritance or multi-gene diseases, such as D-glyceric aciduria, attention deficit-hyperactivity disorder, Meckel syndrome, and Fanconi Anemia. However, these genes were excluded due to recessive inheritance of their associated diseases. To further clarify the implications of CALN1 and CAPN9 in HSCR, we mapped the RET, CALN1 and CAPN9, combined with some known HSCR pathogenic genes, including, ECE1, EDN3, EDNRB, GDNF, NRTN, SOX10, PHOX2B, and KIAA1279, to protein-protein interaction (PPI) network curated in STRING database (Szklarczyk et al., 2015). CALN1 and CAPN9 were observed to connect with none of these known genes directly or indirectly within five nodes, suggesting that the two genes may not be pathogenic for HSCR (Figure 3). Only RET connected with the known pathogenic genes in the PPI network, in particular, which was also a known HSCR gene. The result indicated that the in-frameshift variant p.Phe147del in RET was pathogenic for the HSCR family.

Potential Impact of the In-Frameshift Variant on RET Protein Function
As illustrated in Figure 4A, the ECD of RET was composed of four cadherin-like domains (CLD1-4), and cysteine-rich domain (CRD). We found the in-frameshift variant p.Phe147del was located within CLD1. Previous study (Leon et al., 2012) reported that a disease-causing mutation at pVal145Gly in RET, which was close to the in-frameshift variant p.Phe147del, could disrupt RET N-glycosylation, giving us a hint that the in-frameshift variant may also function by disrupting RET N-glycosylation.
To determine the consequence of the in-frameshift variant p.Phe147del on RET glycosylation, we predicted the N-glycosites of wild-type and p.Phe147del RET proteins using GlycoEP (Chauhan et al., 2013), a webserver for predicting potential Nand O-glycosites in protein sequence. Moreover, the N-glycosites of p.Val145Gly was also predicted as a positive control. Finally, the sites of Asn151, Asn834 and Asn1084 were predicted to be glycosylated in wild-type RET protein (GlycoEP score > 0.85). However, Asn151, which is closest glycosylated site to the two mutants, p.Phe147del and p.Val145Gly, was predicted to be not glycosylated in the two mutant RET proteins ( Table 1). The result indicated that the in-frameshift variant p.Phe147del could function by disrupting RET N-glycosylation.

Validation of Pathogenic Variant by Sanger Sequencing
We validated the two variants by Sanger sequencing (Figure 4B). Our results demonstrated that the four patients carried the in-frameshift variant p.Phe147del in RET. Moreover, we also confirmed our hypothesis that the pathogenic variant carrier II-4 was unaffected due to incomplete inheritance. In accordance with the genotyping by whole exome sequencing, the other family members, II-2 and II-3, were wild genotypes. The result indicated that whole exome sequencing was efficient to identify pathogenic variants for monogenic inherited diseases.

Ethics Statement
The present study was approved by the Ethics Committee of the Children's Hospital of Shanghai Jiao Tong University, Shanghai, China, and was conducted according to the principles expressed in the Declaration of Helsinki. Participants and/or their legal guardians involved in this study gave a written informed consent prior to inclusion in the study.

Sample Collection
The present study included DNA samples from four patients (II-1, III-1, III-2, and III-3) and three unaffected family members (II-2, II-3, and II-4) as shown in Figure 1A. Genomic DNA samples were obtained with written informed consent. TIAN amp Blood DNA Kit (Tiangen Biotech, Co., Ltd., Beijing) was used for extracting genomic DNA from blood samples.

Reads Mapping and Variants Calling
Paired-end reads of 300 bp (150 bp at each end) from whole exome sequencing were mapped to UCSC human reference genome (GRCh37/hg19 assembly) using BWA (Li and Durbin, 2010) version 0.7.7-r441 'mem' mode with default options followed by removal of PCR duplicates and low-quality reads (BaseQ < 20). The bam files were then sorted and indexed by samtools (Li et al., 2009), and were converted as three-sample mpileup format for each parents-offspring trio. Variant calling was performed by VarScan (Koboldt et al., 2012) software 1 using version 2.3.7 of Trios mode.

Prediction of Glycosylated Sites
The glycosylated sites for RET proteins with wild-type, and p.Phe147del and p.Val145Gly mutants were predicted by GlycoEP (Chauhan et al., 2013) 2 with standard predictor, a webserver for predicting potential N-and O-glycosites in protein sequence. The prediction was performed based on Average Surface Accessibility (ASA+BPP) with threshold score 0.85.

DISCUSSION
Hirschsprung disease is a birth defect characterized by complete absence of neuronal ganglion cells from a portion of the intestinal tract. In the present study, we performed whole exome sequencing on seven members of the HCSR family to identify the disease-causing gene. Microscopic image-based histologic examination of the proband's diseased tissue also observed the absence of neuronal ganglion cells in narrow segment of mucosa of intestinal wall and myenteric nerve plexus.
To uncover genetic variants contributing to HSCR, we identified 100s of 1000s variants in seven members of the HSCR family using whole exome sequencing data. The lower Mendelian error demonstrated that trio-based variant calling was an effective strategy for family-based sequencing. As the heritable Hirschsprung disease was rare in the population, the disease-causing variants were more likely to be rare in healthy population. With the MAF calculated by gnomAD, we finally filtered a total of 1,059 rare variants in this family (MAF < 0.1%). In addition to MAF in healthy population, the dominant and recessive inheritance modes of HSCR were also considered in this family. In general, the autosomal recessive genes may be altered by two compound heterozygous variants, or one homozygous variant. However, we did not detect any recessive variants based on this assumption. For the assumption of autosomal dominant inheritance, the unaffected female, II-4, must be a carrier of the pathogenic variant due to incomplete penetrance, in accordance with the previous studies (Parisi and Kapur, 2000;Belknap, 2002). Moreover, the pathogenic variant must be co-segregated in the four patients and the carrier. The filtering steps by MAF, mode of inheritance and co-segregation could greatly exclude non-pathogenic variants.
To accurately identify the pathogenic variants, we further performed pathogenicity analysis on the 18 dominant variants. Generally, the algorithms evaluating the pathogenicity of single nucleotide substitutions and indels were different. Specifically, among the single nucleotide substitutions, the missense variants were mostly evaluated by SIFT, PolyPhen, MutationTaster and M-CAP, while the nonsense variants were evaluated by MutationTaster and DDIG-in. On the other side, we evaluated the pathogenicity of indels, frameshift and in-frameshift variants, using DDIG-in (Zhao et al., 2013) and SIFT-indel (Hu and Ng, 2013). Among the 18 dominant variants, seven genes or variants, including CAPN9, GLYCTK, DRD5, NPHP3, FANCI, CALN1, and RET, were predicted as pathogenic candidates by these algorithms. To our knowledge, GLYCTK (Sass et al., 2010), DRD5 (Daly et al., 1999), NPHP3 (Bergmann et al., 2008), and FANCI (Mehta and Tolar, 1993), were wellcharacterized pathogenic genes for some other rare diseases with recessive mode of inheritance or multi-gene diseases, such as Dglyceric aciduria, attention deficit-hyperactivity disorder, Meckel syndrome, and Fanconi Anemia. However, these genes were excluded due to recessive inheritance of their associated diseases. In addition, CAPN9 and CALN1 are thought to be associated with gastric cancer (Yoshikawa et al., 2000) and schizophrenia (Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium, 2011), respectively. To further narrow down the gene list that may contribute to HSCR, we performed literature review and mapped the pathogenic candidates to protein-protein interaction network. Apart from RET, the literature review and PPI network analysis successfully excluded the other six genes. Notably, p.Phe147del in RET was a novel pathogenic variant in HSCR based on the curation by ClinVar database (Landrum et al., 2018). Finally, the in-frameshift variant p.Phe147del in RET, the most commonly observed pathogenic gene for HSCR, was identified as the pathogenic variant.
To further examine the functional impact of the inframeshift variant p.Phe147del in RET on the occurrence of the disease, we mapped the variant to RET protein structure. It is well-recognized that variants locating within specific functional domains or protein translation modification sites could alter the protein conformation, protein-ligand binding, or protein-protein interaction. In this study, the inframeshift variant p.Phe147del in RET was located within the CLD1 domain. We accessed the Uniprot database, and found that the five amino acids adjacent to the in-frameshift variant were only characterized to be glycosylated, not be phosphorylated or methylated. Further literature investigation also accorded with the annotations by Uniprot database. RET encodes a transmembrane receptor, which is composed of ECD, transmembrane domain, and intracellular tyrosine kinase domain. Particularly, the CLD1 domain belongs to the ECD. Further analysis of the consequence of p.Phe147del variant on the RET protein revealed that this in-frameshift variant may disrupt glycosylation of RET protein, which may be the cause of HSCR in this family.
Compared with previous studies, our study focused on the cases from familial HSCR. The pathogenic genes for familial HSCR including RET, EDNRB and EDN3, exhibited high penetrance. However, for the sporadic cases with Hirschsprung disease, like the report by Tang et al. (2018), some novel pathogenic or susceptibility genes, such as PLD1, had reduced penetrance, indicating that the penetrance of pathogenic genes was higher in familial HSCR than the sporadic Hirschsprung disease. The sporadic Hirschspring disease may be caused by additive effect of the susceptibility genes and environmental factors.
In reality, the lack of experimental validation is a major concern about this research. However, we conducted systematic bioinformatics analysis to demonstrate the pathogenicity and functionality of the variant in this family. To our knowledge, this is the first study to report the in-frameshift variant p.Phe147del in RET responsible for heritable HSCR. In conclusion, the systematic analysis of this study not only improved understanding of the causes of this disease, but also was useful for clinical and prenatal diagnosis.

AUTHOR CONTRIBUTIONS
ZL and WW collaborated in the conception and design of the present study. LZ and LL collected and assembled the data. WW and QS were involved in data analysis and interpretation. All authors contributed to writing the manuscript and approved the final version.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene. 2018.00752/full#supplementary-material TABLE S1 | The evaluation of the pathogenicity of the 18 dominant variants using bioinformatics tools.