Parallel Tests of Whole Exome Sequencing and Copy Number Variant Sequencing Increase the Diagnosis Yields of Rare Pediatric Disorders

Background: Both whole exome sequencing and copy number variants sequencing were applied to identify the genetic cause of rare pediatric disorders. In our study, we aimed to investigate the diagnostic yield of parallel tests of trio whole exome sequencing and copy number variants sequencing and its clinical utility. Methods: After collecting detailed clinical information, a total of 60 patients were referred to parallel tests of whole exome sequencing and copy number variants sequencing, which used shared initial libraries. Results: 26 pathogenic or likely pathogenic single nucleotide variants and 11 copy number variants were identified in 32 patients. 65.4% (17/26) of the SNVs were novel. The overall diagnosis rate was 53.3%. For the patients with positive results, 22 (36.7%) patients were diagnosed by whole exome sequencing and 10 (16.7%) patients were diagnosed by copy number variants sequencing. We also reviewed clinical impact on selected cases. Conclusion: We adopted an approach by performing parallel tests of trio whole exome sequencing and copy number variants sequencing with shared initial libraries. This strategy is relatively efficient and cost-effective for the diagnosis of rare pediatric disorders with high heterogeneity.


INTRODUCTION
High genetic heterogeneity in pediatric disorders has been an obstacle to phenotype-based diagnostic testing (Hu et al., 2018). Compared with conventional single-gene tests, genome-scale tests are more likely to produce higher diagnostic yields and shorter turn-around times (Vissers et al., 2017). Generally, these genomic tests include whole exome sequencing (WES) and chromosomal microarray analysis (CMA). WES has been ordered increasingly in clinical molecular diagnostic laboratories as a powerful tool for rare Mendelian disorders, especially for genetically heterogeneous disorders such as intellectual developmental disorders and multiple congenital anomalies (Bowling et al., 2017; Han and Lee, 2020). According to previous studies, the overall diagnostic yields among different cohorts were ∼25% (Yang et al., 2013, 2014; Lee et al., 2014). Recently, multiple studies on the clinical utility and cost of WES have provided evidence endorsing it as a first-tier test for children with suspected monogenic disorders (Nguyen and Charlebois, 2015; Monroe et al., 2016; Stark et al., 2016; Hu et al., 2018). CMA has been used as a first-tier clinical diagnostic test in patients with developmental delay (DD)/intellectual disability (ID), autism spectrum disorders (ASD), and multiple congenital anomalies (MCA) since 2010 (Manning et al., 2010; Miller et al., 2010), and copy number variants (CNVs) detected by CMA explained the pathogenesis in over 10% of these cases (Sanmann et al., 2015; Homma et al., 2018; Jang et al., 2019).
Recently, WES or whole genome sequencing (WGS) data have also been used to call CNVs, through the development of various algorithms and software programs that use read-depth information as the main strategy (Wang et al., 2016; Yao et al., 2017). WES-based CNV calling is usually performed as a supplement to the routine WES test, and its accuracy is affected by the capture area and amplification efficiency (Rajagopalan et al., 2020; Sun et al., 2020). On the other hand, low-coverage WGS, or CNV-seq, has been widely used in spontaneous miscarriage and prenatal cases, but only a few pediatric CNV-seq studies have been reported, and most of them focused only on neurological disorders (Gao et al., 2019; Jiao et al., 2019). In our study, we performed parallel tests of trio-WES and CNV-seq for 60 children from different clinical departments. Our results showed that this parallel testing strategy significantly increased diagnostic yields and lightened the burden on physicians of selecting the optimal test.

Patients
Patients in this research were initially referred to the Beijing Children's Hospital from September 2018 to September 2019 with suspected Mendelian disorders. These patients had DD/ID, ASD, or MCA, and both SNVs and CNVs were highly suspected as causes of their disorders. Next-generation sequencing (NGS) was primarily ordered by the patient's physician and performed by the Laboratory for Genetics of Birth Defects at Beijing Children's Hospital. Written informed consent was obtained from the individuals and their legal guardians for the publication. This study was approved by the Beijing Children's Hospital institutional review board. After recruitment, a total of 60 patients were referred to parallel tests of WES and CNV-seq. The ages of the patients ranged from 1 month to 12 years, and the median age was 1 year. The male-to-female ratio in this study was 1.31:1 (34/26). As for the reasons for referral, 25 of the patients had multiple congenital anomalies, ten had DD/ID, six had a combination of DD/ID and multiple congenital anomalies, and four had autistic traits. The remaining 15 patients had other phenotypes, such as congenital heart disease, short stature, and recurrent infections (Supplemental Table 1).
Abbreviations: WES, whole exome sequencing; CNV-seq, copy number variants sequencing; SNVs, single nucleotide variants; CNVs, copy number variants; CMA, chromosomal microarray analysis; DD, developmental delay; ID, intellectual disability; ASD, autism spectrum disorders; MCA, multiple congenital anomalies; WGS, whole genome sequencing; NGS, next-generation sequencing; TAT, turnaround time; QC, quality control.

Whole Exome Sequencing
DNA was isolated from peripheral blood samples obtained from the proband and parents using the Gentra Puregene Blood Kit (QIAGEN, Hilden, Germany). For each individual, 200 ng of genomic DNA was sheared with a Bioruptor (Diagenode, Liège, Belgium) to obtain 150∼200-bp fragments. The ends of the DNA fragments were repaired, and Illumina adaptors were added (Fast Library Prep Kit, iGeneTech, Beijing, China). After the sequencing library was constructed, the whole exome was captured with the AIExome Enrichment Kit V1 (iGeneTech, Beijing, China) and sequenced on an Illumina NovaSeq 6000 (Illumina, San Diego, USA) with 150-base paired-end reads. Raw reads were filtered to remove low-quality reads using FastQC. Clean reads were mapped to the reference genome GRCh37. Quality control (QC) information included: average sequencing depth of >100×, accurate mapping rate of >98%, bases capture rate of >55%, 20× mean depth coverage rate of >96%, duplication rate of <25%, and accurate mapping rate of <96%. Single nucleotide variants (SNVs) were annotated and filtered by TGex (tgex.genecards.org). Variants with a frequency over 1% in the gnomAD, ESP, or 1000G databases were excluded. Variants that lacked segregation in family members were also filtered out. Variants were classified following the American College of Medical Genetics and Genomics and the Association for Molecular Pathology interpretation standards and guidelines (Richards et al., 2015).
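The population-frequency filter described above can be illustrated with a minimal sketch. It assumes each variant is a dictionary with a hypothetical `af` field holding allele frequencies keyed by database name, and it treats a missing frequency as "not observed in that database"; both the data layout and the function name are illustrative assumptions, not the TGex data model or the pipeline actually used in this study.

```python
from typing import Dict, Iterable, List, Optional

# 1% cutoff stated in the text; the database labels are illustrative keys.
MAX_POPULATION_AF = 0.01
DATABASES = ("gnomAD", "ESP", "1000G")


def is_rare(variant: Dict, databases: Iterable[str] = DATABASES) -> bool:
    """Return True if no listed database reports an allele frequency above 1%."""
    frequencies: Dict[str, Optional[float]] = variant.get("af", {})
    for database in databases:
        af = frequencies.get(database)
        # Assumption: a missing value means the variant was not observed there.
        if af is not None and af > MAX_POPULATION_AF:
            return False
    return True


def filter_rare(variants: List[Dict]) -> List[Dict]:
    """Keep only the variants that pass the population-frequency filter."""
    return [v for v in variants if is_rare(v)]


# Hypothetical example records (identifiers are placeholders, not study data).
example_variants = [
    {"id": "var_common", "af": {"gnomAD": 0.12, "ESP": 0.10, "1000G": 0.11}},
    {"id": "var_rare", "af": {"gnomAD": 0.0002, "ESP": None}},
    {"id": "var_absent", "af": {}},
]
kept_ids = [v["id"] for v in filter_rare(example_variants)]
```

In practice such a filter would be followed by the segregation check and ACMG/AMP classification described in the text, which are not modeled here.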

Copy Number Variant Calling
To identify large copy number variants (CNVs), part of the library without capture was sequenced directly on the Illumina NovaSeq 6000, and each sample yielded one gigabase (Gb) of raw data (QC: average sequencing depth of >0.3× WGS). An in-house pipeline was applied to map reads and call CNVs based on the software CNV-seq (Xie and Tammi, 2009; Hu et al., 2019). Clean reads were mapped to the reference genome GRCh37. CNVs called from parental WGS data were used as controls. CNVs reported in multiple peer-reviewed publications or annotated in curated databases as benign or likely benign, CNVs observed frequently in the general population, and CNVs containing no genes were filtered out. The Database of Genomic Variants, DECIPHER, ClinVar, OMIM, and ClinGen were used for interpretation and classification of the clinical significance of candidate CNVs according to previously reported guidelines (Kearney et al., 2011; Riggs et al., 2019). WES data were also used for CNV calling, with samples from the same batch used as controls. Putative pathogenic or likely pathogenic CNVs identified from low-pass WGS were validated using WES read-depth data.
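The read-depth principle behind this kind of CNV calling can be sketched as follows. This is a simplified illustration only: it is not the CNV-seq software or the in-house pipeline, and the window counts, the cutoffs, and the approximate genome size of 3.1 Gb are all assumptions introduced for the example.

```python
import math
from typing import List, Optional

GENOME_SIZE_BP = 3.1e9  # approximate human genome length (assumption)


def expected_depth(raw_bases: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Average fold coverage expected from a given amount of raw sequence."""
    return raw_bases / genome_size


def log2_ratios(case_counts: List[int],
                control_counts: List[int]) -> List[Optional[float]]:
    """Per-window log2 ratio of library-size-normalized read counts."""
    if len(case_counts) != len(control_counts):
        raise ValueError("case and control must cover the same windows")
    case_total = sum(case_counts)
    control_total = sum(control_counts)
    ratios: List[Optional[float]] = []
    for case, control in zip(case_counts, control_counts):
        if case == 0 or control == 0:
            # Simplification: windows without reads are left uncalled.
            ratios.append(None)
        else:
            ratios.append(math.log2((case / case_total) / (control / control_total)))
    return ratios


def classify(ratios: List[Optional[float]],
             loss_cutoff: float = -0.4,
             gain_cutoff: float = 0.3) -> List[str]:
    """Label each window; the cutoffs are hypothetical, not validated values."""
    calls = []
    for ratio in ratios:
        if ratio is None:
            calls.append("no_call")
        elif ratio <= loss_cutoff:
            calls.append("loss")
        elif ratio >= gain_cutoff:
            calls.append("gain")
        else:
            calls.append("neutral")
    return calls


# Toy counts: window 3 mimics a heterozygous deletion, window 5 a duplication.
case = [100, 100, 50, 100, 150]
control = [100, 100, 100, 100, 100]
ratios = log2_ratios(case, control)
calls = classify(ratios)
depth = expected_depth(1e9)  # one Gb of raw data, as in the text
```

Under these assumptions, one Gb of raw data corresponds to roughly 0.3× coverage, which agrees with the QC threshold given above. A real caller would additionally merge adjacent windows into segments and assess statistical significance.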

Molecular Diagnosis Rates
We identified 37 pathogenic or likely pathogenic variants in 32 patients. The overall diagnosis rate was 53.3%. For the patients with positive results, 22 (36.7%) individuals were diagnosed by WES (Table 1) and 10 (16.7%) individuals were diagnosed by CNV-seq (Table 2). For patients with MCA and/or DD/ID, the diagnosis rates were all higher than 50%. For patients with autistic traits, only one individual was diagnosed (Table 1).
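As a quick consistency check, the reported rates can be reproduced from the patient counts given in the text. The variable names below are illustrative.

```python
# Counts taken from the text.
TOTAL_PATIENTS = 60
DIAGNOSED_BY_WES = 22
DIAGNOSED_BY_CNV_SEQ = 10


def rate_percent(count: int, total: int = TOTAL_PATIENTS) -> float:
    """Diagnosis rate as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


wes_rate = rate_percent(DIAGNOSED_BY_WES)
cnv_rate = rate_percent(DIAGNOSED_BY_CNV_SEQ)
overall_rate = rate_percent(DIAGNOSED_BY_WES + DIAGNOSED_BY_CNV_SEQ)
```

The WES and CNV-seq groups do not overlap in these counts, so the overall rate is based on their sum of 32 patients.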

Characterization of Variants
Of the 37 pathogenic or likely pathogenic variants, 26 were SNVs and 11 were CNVs (Tables 2, 3). The 26 SNVs were located in 21 different genes, and most of them (17/26, 65.4%) were in autosomal dominant disease genes. Six variants were located in three autosomal recessive disease genes, and three variants were located in X-linked recessive disease genes. Of the SNVs, 65.4% (17/26) had not been reported in previous literature or public population databases. Thirteen variants were identified as de novo, six were compound heterozygous variants inherited from the parents, five were inherited from parents with similar phenotypes, and two X-linked recessive variants were inherited from maternal carriers. Mutations in only two genes, ELN and FBN1, occurred in more than one patient, indicating the high heterogeneity of pediatric disorders (Table 2). For the 11 CNVs identified by CNV-seq, the sizes ranged from 1.41 to 43.68 Mb. All of these variants were also called from WES data to eliminate false positives, and all of them were de novo (Table 3). Patient 23 had a distal deletion and a distal duplication, which were potentially inherited from a parent with a balanced translocation. Further karyotype testing may help to verify this hypothesis.

Turn-Around Time (TAT) and Cost Analysis
We aimed to make it easier for physicians to select the optimal test between WES and CNV-seq, and also to shorten the TAT, by parallel testing. We tracked the TAT of our parallel test for these patients; it ranged from 38 to 125 days. The median TAT was 72 days, and 69.2% of patients received reports within 80 days. The raw data were usually obtained within 20 days. Initial test results were generated 10-20 days afterwards and were delivered to the ordering physicians to further check for potential phenotypes. Typically, negative cases took longer because multiple rounds of communication with the ordering physicians were required, and a second analysis by another geneticist was performed before the formal reports were issued. The direct cost of running a parallel trio-WES and CNV-seq test was about $600 USD ($200 per person), including library construction (∼$100 USD), WES (∼$80 USD, 100×), low-coverage WGS (∼$10 USD, 0.3×), and Sanger validation (∼$10 USD). These results showed that our parallel test strategy was affordable and that the additional CNV-seq did not substantially increase the direct cost, since generating one Gb of raw data usually cost <$20 USD.
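The cost breakdown above can be verified with a short calculation. The figures are the approximate per-person values quoted in the text; the dictionary keys are illustrative names.

```python
# Approximate direct costs per person in USD, as quoted in the text.
PER_PERSON_COSTS_USD = {
    "library_construction": 100,
    "wes_100x": 80,
    "low_coverage_wgs_0.3x": 10,
    "sanger_validation": 10,
}
TRIO_SIZE = 3  # proband plus both parents

per_person_total = sum(PER_PERSON_COSTS_USD.values())
trio_total = TRIO_SIZE * per_person_total

# Share of the per-person cost attributable to the added low-coverage WGS.
cnv_seq_share = PER_PERSON_COSTS_USD["low_coverage_wgs_0.3x"] / per_person_total
```

With these figures, the low-coverage WGS step accounts for about 5% of the per-person direct cost, which is the basis for describing the added CNV-seq as inexpensive.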

Clinical Impact of Genetic Diagnoses
Over half of the patients ended their diagnostic odysseys after the parallel tests. The positive diagnostic results affected clinical management, including appropriate genetic counseling, other systemic evaluation, and change of treatment. Examples of the impact on patients' clinical management are given below.
Case 4 was a 12-month-old boy. He was referred to the Clinic of Development & Behavioral Pediatrics for short stature and DD. A de novo SMARCA2 mutation was identified, and he was therefore diagnosed with Nicolaides-Baraitser syndrome. This syndrome is less recognizable and is often misdiagnosed as Coffin-Siris syndrome, Williams syndrome, etc. Once the molecular diagnosis was confirmed, this patient was referred to a neurologist for seizure evaluation. Ophthalmological and audiological examinations were also ordered.
Case 40 was a 5-month-old boy who had developmental and growth delays. Mutations in two genes were identified. DD was caused by a SETD5 mutation inherited from his mother, who had intellectual disability, and growth delay was caused by an ACAN mutation inherited from his father, who had short stature. Referral to an early intervention program is recommended for access to occupational, physical, speech, and feeding therapy. The growth pattern should be monitored, and growth hormone therapy can improve height gain.
Case 29 was a 15-year-old girl. She was first referred to the Department of Respiration for pulmonary lesions. Further evaluation revealed short stature, ataxia, tooth agenesis, depigmentation/hyperpigmentation of the skin, and absence of secondary sex characteristics. Parallel tests identified a 3.1-Mb de novo interstitial deletion of the 14q13.2q21.1 region encompassing 17 OMIM genes. Among these genes, NKX2-1 deletion is responsible for choreoathetosis, hypothyroidism, and neonatal respiratory distress, and haploinsufficiency of PAX9 causes the oligodontia phenotype (Das et al., 2002; Santen et al., 2012; Hayashi et al., 2015). For this patient, thyroid function testing should be performed annually, and when hypothyroidism is discovered, thyroid hormone replacement therapy should be initiated. For choreoathetosis, tetrabenazine and levodopa therapy have been reported to effectively reduce chorea (Setter et al., 2009; Rosati et al., 2015).
Cases 1, 13, and 59 all had 22q11.2 deletion syndrome. This was the only recurrent CNV disorder in our cohort. Case 1 was initially suspected of having DiGeorge syndrome because of the characteristic facial features. Case 13 was referred only for autistic behaviors, and Case 59 was referred for congenital heart disease and laryngomalacia. After diagnosis, these patients underwent further evaluation and treatment following the clinical practice guidelines for individuals with 22q11.2 deletion syndrome (Bassett et al., 2011).

DISCUSSION
In this study, we performed parallel tests of WES and CNV-seq for 60 patients. Thirty-seven pathogenic or likely pathogenic variants in 32 patients were identified, and the overall diagnosis rate was 53.3%. Our study provided preliminary results of clinical utility of parallel tests of CNVs and SNVs at the same time in pediatric patients with developmental delay/intellectual disability, autism spectrum disorders, and multiple congenital anomalies.
Over 6500 phenotypes were included in OMIM by the end of 2019, and most of them have their onset during childhood. Genetic pediatric disorders are highly heterogeneous and relatively rare, so investigating them necessitates a process of serial testing for specific conditions (Han and Lee, 2020). Consequently, this strategy may be expensive and time-consuming. The molecular diagnosis of these rare diseases is based on an appropriate choice among karyotyping, Sanger sequencing, multiplex ligation-dependent probe amplification, CMA, NGS, etc., which requires a physician with medical genetics training and experience. However, most children's hospitals do not have an independent genetics clinic for patients with genetic disorders (Hu et al., 2018). Meanwhile, a more comprehensive genomic-first approach should be applied to achieve a higher positive rate and shorten the diagnostic odyssey (Johnson, 2015).
Previous WES studies revealed a diagnosis rate of ∼25-30% in nonselective patients (Yang et al., 2013, 2014; Lee et al., 2014; Daoud et al., 2016; Retterer et al., 2016). Moreover, additional pathogenic and likely pathogenic CNVs were identified in over 10% of patients (Sanmann et al., 2015; Homma et al., 2018; Jang et al., 2019). The development of NGS-based CNV calling algorithms makes it possible to identify SNVs and CNVs at the same time. In this study, we performed trio-WES and low-coverage WGS (0.3×) simultaneously and uncovered pathological causes in over half of the patients. The diagnosis rate among patients with neurological features was 67% (14/21), which was higher than in previous reports (Gao et al., 2019; Jiao et al., 2019). However, it should be noted that our sample size was relatively small, and more studies are needed to better assess the clinical efficacy.
The application of NGS to detect genome-wide CNVs is a recently developed method (Xie and Tammi, 2009). CNV-seq has been used more widely in prenatal screening and diagnosis, which require lower resolution, than in pediatric disorders (Liang et al., 2014; Zhao and Fu, 2019). The resolution of CNV-seq can be adjusted by increasing or decreasing the volume of raw sequencing data. For pediatric diagnosis, the resolution was usually >100 Kb (Gao et al., 2019; Jiao et al., 2019). By comparing the numbers of aligned reads between case and control samples, losses or gains of chromosomal regions can be identified and quantitated. It has therefore been reported that low-level mosaicism is easier to identify by CNV-seq than by CMA (Grotta et al., 2015). Furthermore, identifying CNVs by sequencing has other advantages, including lower DNA input, lower cost, shorter TAT, and higher throughput.
In the present study, our parallel tests of WES and low-pass WGS (0.3×)-based CNV-seq decreased TAT and human labor without significantly increasing cost. Compared to WES alone, the extra cost was for WGS of each sample to yield one Gb of raw data directly, using the same sample library constructed during WES before capture; in this research, the sequencing cost for 1 Gb of raw data was about $10 USD with the Illumina NovaSeq. We identified pathogenic or likely pathogenic variants with WES and CNV-seq in 36.7% and 16.7% of patients, respectively, which was comparable with previous studies, indicating that our strategy is efficient at simultaneously identifying SNVs and CNVs. Therefore, the ordering physicians do not need to choose between CMA and NGS, a choice that depends on clinical expertise, and they are more likely to identify clinically relevant variants in patients with atypical presentations or novel phenotypes. The mean age of positive cases at diagnosis was 3 years (95% CI: 1.73, 4.26). Early diagnosis not only avoided a long diagnostic odyssey once symptoms appeared, but also provided opportunities for early intervention, helping to improve outcomes and prolong life.
In conclusion, our study provided preliminary experience of parallel tests of WES and CNV-seq with the same initial library and evaluated their clinical utility. Compared to the traditional trio-WES test, our strategy increased the diagnosis rate from 36.7% to 53.3% in our relatively small cohort. Because it depends less on physicians' selection between SNV and CNV tests, we propose that this strategy is efficient and cost-effective for the diagnosis of genetic pediatric disorders with high heterogeneity.

DATA AVAILABILITY STATEMENT
The datasets for this article are not publicly available due to concerns regarding participant/patient anonymity. Requests to access the datasets should be directed to the corresponding author.

ETHICS STATEMENT
Written informed consent was obtained from the individuals and their legal guardians for the publication.

AUTHOR CONTRIBUTIONS
XH, RG, JG, and ZQ collected the clinical data. CH and WL designed the study. XH, RG, and JG performed data analysis. XH and CH wrote the manuscript. All authors contributed to data acquisition and data interpretation, critically revised the manuscript for important intellectual content, approved the final submitted version of the manuscript, and agreed to be accountable for all aspects of the work.