Targeted NGS Platforms for Genetic Screening and Gene Discovery in Primary Immunodeficiencies

Background: Primary immunodeficiencies (PIDs) are a heterogeneous group of genetic immune disorders. While some PIDs can manifest with more than one phenotype, the signs and symptoms of various PIDs overlap considerably. Recently, novel defects in immune-related genes and additional variants in previously reported genes responsible for PIDs have been successfully identified by Next Generation Sequencing (NGS), allowing the recognition of a broad spectrum of disorders. Objective: To evaluate the strengths and weaknesses of targeted NGS using custom-made Ion Torrent and Haloplex (Agilent) panels for diagnostic and research purposes. Methods: Five different panels including known and candidate genes were used to screen 105 patients with distinct PID features divided into three main PID categories: T cell defects, Humoral defects and Other PIDs. The Ion Torrent sequencing platform was used in 73 patients. Among these, 18 selected patients without a molecular diagnosis and 32 additional patients were analyzed by Haloplex enrichment technology. Results: The complementary use of the two custom-made targeted sequencing approaches allowed the identification of causative variants in 28.6% (n = 30) of patients. Twenty-two out of 73 (30.1%) patients were diagnosed by Ion Torrent. In this group 20 were included in the SCID/CID category. Eight out of 50 (16%) patients were diagnosed by the Haloplex workflow. The Ion Torrent method was highly successful for cases with a well-defined immunological and clinical presentation. The Haloplex approach was able to diagnose 4 SCID/CID patients and 4 additional patients with complex and extended phenotypes, embracing all three PID categories in which this approach was more efficient. Both technologies showed good gene coverage. Conclusions: NGS technology represents a powerful approach in the complex field of rare disorders but its different applications should be weighed. A relatively small NGS target panel can be successfully applied when there is a robust diagnostic suspicion, whereas when the spectrum of clinical phenotypes overlaps more than one PID an in-depth NGS analysis is required, including whole exome/genome sequencing, to identify the causative gene.


INTRODUCTION
Primary immunodeficiencies (PIDs) are a phenotypically and genetically heterogeneous group of more than 300 monogenic inherited disorders resulting in immune defects that predispose patients to infections, autoimmune disorders, lymphoproliferative disease, and malignancies (1)(2)(3). PIDs with a more severe phenotype lead to life-threatening infections and life-limiting complications that require a prompt and accurate diagnosis in order to initiate lifesaving therapy (4,5). The phenotypic and genotypic heterogeneity of PIDs often makes genetic diagnosis complex and delayed. Indeed, more than one genotype may cause similar clinical phenotypes, identical genotypes often do not produce the same phenotype, and clinical penetrance may differ (6)(7)(8)(9). The characterization of PID-associated genes is expected to contribute significantly to defining the molecular events governing immune system development and to provide new insights into the pathogenesis of PIDs. Molecular genetic testing is also a useful tool for the diagnosis of PIDs in atypical cases (6,10). Despite the progress in the genetic characterization of PIDs, many patients still lack a molecular diagnosis. A better understanding of the genetic and immune defects of patients is critical to develop therapeutic strategies aimed at changing the clinical course of the disease and to guarantee appropriate genetic counseling, allowing the identification of PID patients before the onset of the disease (11)(12)(13). The application of Next Generation Sequencing (NGS) to PIDs has been revolutionary: it has accelerated the discovery of novel disease-causing genes and the genetic diagnosis of patients with monogenic inborn errors of immunity (7,8,14-16).
Targeted gene-panel sequencing (17)(18)(19)(20)(21), whole exome sequencing (WES) (22,23) or whole genome sequencing (WGS) (24) approaches can rapidly identify candidate gene variants in an increasing number of genetically undefined diseases (17,24) and are widely used in several laboratories for the diagnosis of PIDs (10). WGS also offers the opportunity to find causative variants in the structural regions of a given gene. These tools increase the amount of data available for analysis and can identify causative genes in both clinically defined and atypical diseases. Nonetheless, diagnosis can be delayed by the huge amount of data retrieved from whole exome or genome sequencing, the increased costs sustained by clinical laboratories and the need for trained personnel to validate variants (7,8,22). Targeted gene panels generally provide greater sequencing depth, which favors high accuracy and improved sensitivity, makes datasets easier to manage, and reduces the time and cost of analysis and the burden of interpreting results, thus accelerating the diagnosis for the majority of PIDs (14,16-18). Moreover, the usefulness of the targeted exome sequencing approach for the identification of PID patients has been demonstrated, with accurate detection of point mutations and exonic deletions in patients with either known or unknown genetic diagnosis (7,8).
In this study, we report the clinical and molecular characterization of 105 PID patients presenting with either typical SCID/CID or with overlapping PID phenotypes. Unlike in other studies (20,21,25), most patients enrolled in this work had non-consanguineous parents. Two targeted sequencing approaches were compared to test the reliability of Ion Torrent in diagnostics and of the Haloplex Target Enrichment System in diagnostics and for research purposes. Three diagnostic panels including known disease genes were developed for the Ion Torrent platform (ThermoFisher). The Haloplex panels comprised well-defined PID genes (>300) and candidate genes associated with PIDs due to their expression and function in critical immune pathways (1,3). This work underlines how targeted NGS panels provide a high-throughput, low-cost pipeline to identify the molecular bases of PIDs and are sensitive and accurate diagnostic tools for simultaneous mutation screening of known or putative PID-related genes.

Patients
We report the clinical and molecular characterization of 105 PID patients mainly referred to three centers (2 in Rome and 1 in Milan) participating in the Italian network of PIDs (IPINET) and part of the European Reference Network on immunodeficiency, autoinflammatory, and autoimmune diseases (ERN RITA). Nine of these patients have been enrolled in the pCID study (DRKS00000497). Data were collected from 2014 to 2017. Samples were analyzed with Ion Torrent and/or Haloplex panels and the results were compared. Six patients previously diagnosed by Sanger sequencing were included in the study (Table 2A) as internal positive controls. The Ion Torrent panels were used for the analysis of 73 patients with suspicion of PID. Eighteen patients from this group who remained without a molecular diagnosis, together with 32 additional patients, were tested by Haloplex panels (Target Enrichment System for Illumina platform). The work was conducted in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent, approved by the Ethical Committee of the Children's Hospital Bambino Gesù, San Raffaele Hospital (TIGET06, TIGET09) and Policlinico Tor Vergata, was obtained from either patients or, for minors, their parents/legal guardians. Patients and their clinical and immunological features are reported in Table 1.

Panel Design
The design of the targeted panels required the study of several reported clinical phenotypes of known PID genes listed by the IUIS (International Union of Immunological Societies) in the years 2014-2015. Our three custom Ion Torrent panels were designed with Ampliseq Designer software using GRCh37 (panels 1 and 2) and GRCh38 (panel 3) as references. Primers were divided into two pools. The first custom panel (panel 1) contains 17 known genes related to SCID-CID phenotypes (85.85 kb). The second custom panel (panel 2) includes 24 genes for less frequent CID phenotypes (101.9 kb) and the third panel (panel 3) includes 62 genes for CVID (240.01 kb) (Supplementary Tables S1-S3). The final design was expected to cover 95.43% of the first panel, 94.13% of the second panel and 97.2% of the third panel. For each gene included in the panels, 10 bp of exon padding was added to cover the regions flanking the exon coding sequences (CDS), including (panels 1 and 2) or excluding (panel 3) the untranslated regions (UTRs).
and canonical splice site variants were considered potentially pathogenic (6). In silico prediction of the functional consequences of novel SNVs was performed using MutationTaster, LRT, PolyPhen-2, SIFT, and CADD (score >15) (26-30), together with data available in the literature. Supplementary Figure 1A summarizes all steps of the process.
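The prioritization criteria named in this paragraph can be illustrated with a small filter. This is only a minimal sketch: the `Variant` record, its field names, the consequence labels and the rule that combines the two criteria with a logical OR are assumptions made for illustration, not the authors' pipeline. Only the CADD threshold of 15 and the special status of canonical splice-site variants come from the text.

```python
from dataclasses import dataclass

CADD_THRESHOLD = 15.0  # threshold quoted in the text (CADD score >15)

# Hypothetical consequence labels for canonical splice-site variants.
CANONICAL_SPLICE = {"splice_donor", "splice_acceptor"}


@dataclass
class Variant:
    gene: str            # gene symbol (placeholder names are used below)
    consequence: str     # hypothetical annotation label
    cadd_phred: float    # scaled CADD score


def is_candidate(variant: Variant) -> bool:
    """Keep canonical splice-site variants or variants with CADD above the threshold.

    Combining the two criteria with OR is an assumption of this sketch.
    """
    return (
        variant.consequence in CANONICAL_SPLICE
        or variant.cadd_phred > CADD_THRESHOLD
    )


# Made-up example records, not data from the study.
variants = [
    Variant("GENE_A", "missense", 24.3),
    Variant("GENE_B", "synonymous", 3.1),
    Variant("GENE_C", "splice_donor", 12.0),
]

candidates = [v.gene for v in variants if is_candidate(v)]
print(candidates)
```

In practice the remaining predictors (MutationTaster, LRT, PolyPhen-2, SIFT) and the literature review would add further evidence on top of such a first-pass filter.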

Panel Design
We designed two panels including up to 300 known PID genes (3) (Supplementary Tables 4A,B). The final probe design was expected to cover >97% of target regions. Practical coverage is indicated.

Statistical Analysis
Data were analyzed with GraphPad Prism, version 6.2 (GraphPad Software, La Jolla, CA).

Characterization of PID Patients
In this study, we report the clinical and molecular characterization of 105 PID patients presenting with either typical or overlapping PID phenotypes. Patients were clustered according to initial clinical presentation into 3 main categories (Figure 1A): T-cell defects (including Omenn syndrome, SCID, CID, syndromic T-cell defect, unclassified T-cell deficiency, hyper IgM syndrome); Humoral defects (agammaglobulinemia, CVID, unclassified antibody deficiency, dysgammaglobulinemia); Other PIDs (immune dysregulation, innate immunity defects including congenital defects of phagocytes, syndromic defects with immune-deficiency signs/symptoms, ALPS and ALPS-like, autoinflammatory syndrome, and a miscellaneous group that includes non-typical PID patients with a broad range of clinical phenotypes). The clinical, immunological, and molecular features are reported in Table 1. The percentage of patients in each subgroup is shown in Figures 1B-D. Among the T-cell defects (n = 50; 47.7%), the majority of patients presented with SCID (48%), followed by CID (32%) (Figure 1B). The Humoral Defects group (n = 28; 26.6%) was mainly represented by CVID (75%), while the Other PIDs group (n = 27; 25.7%) included a wide spectrum of rare defects and uncommon phenotypes. Seventy-three PID patients were analyzed by the Ion Torrent sequencing system using three different panels including known SCID/CID and CVID genes. Two Haloplex panels including more than 600 known and candidate PID genes were applied to 32 additional patients. Additionally, 18 patients previously analyzed by Ion Torrent but still without a clear molecular diagnosis were analyzed by the Haloplex system. A flow chart showing the route map for sequencing of index patients is shown in Figure 2.

Target Enrichment Performance and Gene Coverage
The mean target coverage was 529 ± 169X (panel 1), 361 ± 97X (panel 2) and 417 ± 117X (panel 3) for Ion Torrent and 229 ± 25X for Haloplex panels (Supplementary Figure 2A). The mean target coverage for the Ion Torrent panels compared favorably with recently published work in which a coverage of 335X was obtained (34). The expected Ion Torrent coverage of the coding regions was 95.43% for panel 1 (SCID-CID), 94.13% for panel 2 (rare CID) and 97.2% for panel 3 (Supplementary Tables 1-3). The actual coverage obtained with the Ion Torrent panels is shown in Supplementary Figures 2B-D. The Haloplex design aimed at covering more than 97% of the coding regions for all genes. The observed coverage of the targeted regions after running the two panels is reported in Supplementary Tables 4A,B. The majority of shared genes included in all panels and analyzed by both technologies were well covered (Supplementary Figures 3A-C).
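Coverage figures of the form "mean ± SD" can be produced from per-sample mean depths as in the following minimal sketch. The input values are made up for illustration, and the use of the sample standard deviation is an assumption; the text does not state which estimator was applied.

```python
from statistics import mean, stdev


def summarize_coverage(depths):
    """Return (mean, sample standard deviation) of per-sample mean depths."""
    if len(depths) < 2:
        raise ValueError("at least two samples are needed for a standard deviation")
    return mean(depths), stdev(depths)


def format_coverage(depths):
    """Format coverage in the 'mean ± SD X' style used in the text."""
    avg, sd = summarize_coverage(depths)
    return f"{avg:.0f} ± {sd:.0f}X"


# Hypothetical per-sample mean depths for one panel (not data from the study).
example_depths = [400.0, 500.0, 600.0]
summary = format_coverage(example_depths)
print(summary)
```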

Performance Evaluation
The use of large NGS panels yielded a larger amount of data than small panels. Putative variants detected by Ion Torrent were examined and validated, giving an average false-positive rate of <0.6%. This value decreases as the number of genes included in the panel is reduced. Haloplex produces a larger number of variants, but only those most relevant to the patient's phenotype were investigated; hence, we could not properly evaluate data accuracy. In the 18 patients resequenced by Haloplex, no variants in genes included in the Ion Torrent panels were found, supporting the accuracy of these methods. Furthermore, 6 available samples previously diagnosed by Sanger sequencing, carrying 8 different known mutations in the RAG1, IL2RG, JAK3, and LIG4 genes, were included in the study and were detected by Ion Torrent panel 1 (Table 2A).
One false-negative result was recently recognized: the Torrent Suite Variant Caller (TVC) program failed to call the c.C664T:p.R222C mutation in exon 5 of the IL2RG gene in patient PID16, although the mutation was visible in IGV.

Molecular Diagnoses
In our cohort, a molecular diagnosis was obtained in 28.6% (30/105) of patients (Figure 3A). Sanger sequencing was performed for all mutations and to determine the parents' carrier status. Functional studies were conducted for most novel variants and results are reported in Table 2B.
A rapid molecular diagnosis was established in 30.1% (22/73) of PID patients who were investigated by Ion Torrent. Diagnoses were achieved in the RAG1, RAG2, IL2RG, JAK3, ADA, CD3D, IL7R, CD40L, and XIAP genes (see Table 2B). As expected, a molecular defect was identified more frequently in patients with a clear clinical and immunological phenotype, as shown in the group of T cell defects (20/42; 47.6%) (Figure 3B). Interestingly, the percentage of diagnoses in the group of SCID/CID patients was 60.6% (20/33).
Two additional patients were diagnosed by Haloplex among the 18 patients previously negative by Ion Torrent, who presented with a less defined immunological phenotype (Figure 3D). For one patient (PID12), Ion Torrent panel 1 detected only a missense mutation in the ADA gene. Haloplex identified the second mutation, an intronic variant located at the fifth nucleotide upstream of exon 5 of the gene, a position not included in the Ion Torrent design. In the second Ion Torrent-negative patient (PID82), who presented with an atypical hyper-IgE syndrome, Haloplex detected two rare homozygous mutations in the MYD88 and CARD9 genes, which were not included in Ion Torrent panels 1 and 2. The pathogenic role of each single gene mutation is still under investigation, but this molecular information is important to optimize the clinical management of the patient, including the evaluation of HSCT as definitive treatment (44). In summary, 4 SCID/CID patients out of a total of 16 with T cell defects were diagnosed by Haloplex, demonstrating once more a higher percentage of diagnosis in this PID group (Table 2B). However, although the likelihood of identifying a causative gene mutation correlates with precise clinical clustering, the diagnosis of patients with complex and extended phenotypes requires larger NGS panels.
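The diagnostic yields quoted in this section are simple proportions, and the arithmetic can be reproduced with a few lines of code. The sketch below uses only the counts given in the text; the function name and the group labels are illustrative choices.

```python
def diagnostic_yield(diagnosed: int, tested: int) -> float:
    """Return the percentage of tested patients with a molecular diagnosis."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    if not 0 <= diagnosed <= tested:
        raise ValueError("diagnosed must lie between 0 and tested")
    return round(100.0 * diagnosed / tested, 1)


# (diagnosed, tested) pairs as reported in the text.
groups = {
    "whole cohort": (30, 105),
    "Ion Torrent, all patients": (22, 73),
    "Ion Torrent, T cell defects": (20, 42),
    "Ion Torrent, SCID/CID": (20, 33),
}

yields = {name: diagnostic_yield(d, n) for name, (d, n) in groups.items()}
for name, value in yields.items():
    print(f"{name}: {value}%")
```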

Disease-Associated Variants
Comparing the results obtained by the two methods, 44 disease-associated variants (32 by Ion Torrent and 12 by Haloplex) were identified in 30 patients; 18 of these variants were novel (Table 2B). The majority of variants detected by Ion Torrent were missense (n = 23; 74.2%), as summarized in Figure 4A. We were also able to detect 4 small deletions and 5 splice site variants. The Haloplex panels detected 5 missense, 2 deletion, 4 splice site and 1 stop codon variants (Figure 4B). Among the 30 diagnosed patients, we found 13 compound heterozygous patients with mutations in the RAG1, JAK3, ADA, IL7R, and CECR1 genes, 9 patients with homozygous variants in ADA, RAG1, RAG2, CD3D, JAK3, ARPC1B, MYD88/CARD9, and JAGN1, 7 with hemizygous variants in IL2RG, CD40LG, and XIAP, and only 1 with a heterozygous somatic variant in NRAS (Figure 4C). The predominance of compound heterozygous over homozygous genotypes is consistent with most patients enrolled in this study being offspring of non-consanguineous marriages. The most frequently mutated gene in our cohort is RAG1, followed by IL2RG (Figure 4D).
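The counts reported in this paragraph can be checked for internal consistency with a short tally. This is a minimal sketch that uses only the numbers given in the text; the dictionary keys and the function name are illustrative.

```python
# Variant counts by type, as reported for each platform.
ion_torrent_variants = {"missense": 23, "small deletion": 4, "splice site": 5}
haloplex_variants = {"missense": 5, "deletion": 2, "splice site": 4, "stop codon": 1}

# Diagnosed patients by genotype.
patients_by_genotype = {
    "compound heterozygous": 13,
    "homozygous": 9,
    "hemizygous": 7,
    "heterozygous somatic": 1,
}


def total(counts):
    """Sum the values of a category -> count mapping."""
    return sum(counts.values())


totals = {
    "ion_torrent": total(ion_torrent_variants),
    "haloplex": total(haloplex_variants),
    "all_variants": total(ion_torrent_variants) + total(haloplex_variants),
    "patients": total(patients_by_genotype),
}
print(totals)
```

The per-platform totals add up to the 44 variants and the genotype classes to the 30 diagnosed patients stated above.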

Putative Neutral Variants vs. Variants of Uncertain Significance (VUS)
Fifteen CVID patients were initially analyzed by Ion Torrent panels 1-2, but no causative variants were found. We therefore designed a specific CVID panel and found 4 putative causative variants suggestive of autosomal dominant (AD) disease, which were confirmed by Sanger sequencing. Specifically, we found a heterozygous damaging variant in the CTLA4 gene and a predicted damaging variant in the PTEN gene in an adult patient followed since childhood (PID56). The patient inherited one mutation from the father and one from the mother, but the actual role of these variants and their possible combined effect are still under investigation. In addition, two other VUS in the TCF3 and PLCG2 genes were found in two patients (PID75 and PID54), for whom no other evidence is available (see Table 1).
A rare variant in the CD40L gene (p.R200S) found in patient PID50 was excluded from the analysis, although an altered CD40L expression was detected. This variant was predicted to be benign in multiple databases. Furthermore, a homozygous rare variant in the CECR1 gene (p.Q233R) was found in patient PID13. However, the proband's two healthy brothers were also homozygous for this variant, so it was not considered pathogenic. Nevertheless, additional functional studies (e.g., ADA2 activity, protein expression) will be performed to exclude a genetic predisposition.
Three novel variants of uncertain significance (VUS) identified by Haloplex in patients with classical and complex phenotypes are still under investigation. We are currently validating a novel damaging variant in the TCF3 gene in two twin patients (PID57-58) and their mother, who is affected by CVID (49). An EMSA assay is ongoing to assess the capacity of the TCF3 protein to bind DNA target sequences. In these twin patients we had also previously found, by Sanger sequencing, a mutation in the TNFRSF13B gene already described as associated with CVID (50).
A causative variant in the BMP4 gene (51) was found in a patient (PID95) with severe myopia, ectodermal dysplasia, and cytopenia, in whom the altered immunological phenotype remains poorly explained by this mutation. Moreover, an NFκB1 variant was found in a CID patient (PID38), but its significance is still under investigation.
Finally, heterozygous variants in the TNFRSF13B (PID70, PID87) and NOD2 (PID77) genes were found by Haloplex in three patients. Generally, variants in susceptibility genes involved in disease pathogenesis should be considered for potential future phenotypic implications, particularly in adult patients, in whom multiple factors may contribute to the onset of the disease.

DISCUSSION
The application of multigene NGS panels has extended our knowledge of PIDs and is currently recognized as a comprehensive diagnostic method in the field of rare disorders, allowing a diagnosis in 15-70% of cases depending on the clinical and phenotypic clustering of the PID (25,52). In the present work we show that the complementary, integrated use of two custom-made targeted sequencing approaches, Ion Torrent and Haloplex, allowed the clear identification of causative variants in 28.6% (n = 30) of the patients across all groups of PIDs, confirming the value of NGS assays to obtain a genetic diagnosis for PIDs (17)(18)(19)(20)(21)(22)(23).
The Ion Torrent approach was highly successful for SCID patients, a group generally better defined by its immunological and clinical presentation (53). Indeed, with this approach we diagnosed 20/33 SCID/CID patients (60.6%). The Haloplex workflow identified causative variants in 8/50 patients (16%), of whom 4 belonged to the SCID/CID group and 4 to the group with complex and extended phenotypes. Interestingly, a molecular diagnosis was achieved in 2/18 (11%) patients with typical and atypical clinical phenotypes who had tested negative by Ion Torrent analysis and were subsequently included in the Haloplex approach.
NGS makes it possible to identify unexpected mutations in cases whose presentation does not apparently match the underlying PID, as recently reported by our group for a patient with agammaglobulinemia due to RAG1 deficiency (41). This result strengthens the notion of a large phenotypic variety associated with RAG deficiency, suggesting that it should also be considered in patients presenting with an isolated marked B-cell defect (54)(55)(56)(57), and is consistent with previous reports that RAG mutations are more frequent than expected. Notably, RAG1 is the most frequent PID cause in our cohort. This case is a paradigmatic example of how new questions arise on the management and follow-up of patients in whom a milder phenotype could be managed with alternative treatments to transplantation (41,57,58). CVID is a typical example of a disease with a broad phenotype due to different gene alterations (59)(60)(61). Notably, in 4 CVID patients with mutations in TNFRSF13B and AIRE previously detected by Sanger sequencing (see Table 1) we extended NGS analysis to look for novel disease-causing genes. Therefore, frequent variants comparable to polymorphisms should be considered with caution since their pathogenic significance is still unclear. Additional functional studies are required in these cases. Four additional diagnoses are summarized in Supplementary Table 5 (62). These were obtained after the completion of the present study by other targeted NGS panels and Sanger sequencing, indicating that the combination of in-depth clinical knowledge and appropriate sequencing techniques can lead to new diagnoses.
Although the prioritization methods applied in this study follow all common assumptions for a correct data analysis, the identification of novel variants currently under investigation represents a challenge, and their validation needs the essential support of further in-depth experimental studies (6)(7)(8)(63). The integration of clinical, immunological, biochemical and molecular data might favor a revised PID classification of patients with similar phenotypes due to different genetic causes, or of patients with different phenotypes but the same genetic cause. In our experience, the use of selected NGS panels is useful and easy to handle for rapid diagnosis in clinically and immunologically well-characterized phenotypes. Compared to WES, small targeted NGS panels provide clinicians with an important alternative for direct sequencing of relevant genes, guaranteeing high coverage and sequencing depth (64). By contrast, their application in patients with atypical phenotypes could result in an incomplete and delayed diagnosis. In these cases, extended gene panels or WES should be used directly for research purposes, to allow the diagnosis of unexpected genotype-phenotype associations.
As reported by several groups (7,8,22-24), applying targeted WES to every suspected PID, exploring the data gene by gene even for a limited number of relevant genes, remains time- and resource-consuming in the absence of synergy between clinical and bioinformatics support. It is not yet feasible for extended diagnostic purposes. Indeed, the huge amount of retrieved data and the risk of incidental findings in non-PID genes involved in different monogenic or multifactorial pathologies may be confounding and may not correspond to the initial suspicion. Additionally, the confidence of the results decreases with the number of targeted genes, which may preclude variant detection even in self-evident known genes (65). Many previously undetected variants do not have a well-defined role in our genome (1.5 × 10^6 million variants in each genome and fewer in the exome). In this scenario, ethical and legal issues related to the disclosure of genetic information generated by NGS need to be considered, and guidelines should be developed to help the different specialists translate genetic results into clinical practice (64).
The successful application of NGS will require further integration of knowledge based on clinical, immunological and molecular data and collaboration among experts in these fields. A better clinical, immunological and genetic characterization of new PIDs will significantly contribute to the identification of diagnostic and prognostic markers and of early individualized therapeutic strategies, with significant benefit to patients.

DATA AVAILABILITY
Data have been uploaded to ClinVar, accession number: SUB5252744.