Evaluating the Genetics of Common Variable Immunodeficiency: Monogenetic Model and Beyond

Common variable immunodeficiency (CVID) is the most frequent symptomatic primary immunodeficiency characterized by recurrent infections, hypogammaglobulinemia and poor response to vaccines. Its diagnosis is made based on clinical and immunological criteria, after exclusion of other diseases that can cause similar phenotypes. Currently, less than 20% of cases of CVID have a known underlying genetic cause. We have analyzed whole-exome sequencing and copy number variants data of 36 children and adolescents diagnosed with CVID and healthy relatives to estimate the proportion of monogenic cases. We have replicated an association of CVID to p.C104R in TNFRSF13B and reported the second case of homozygous patient to date. Our results also identify five causative genetic variants in LRBA, CTLA4, NFKB1, and PIK3R1, as well as other very likely causative variants in PRKCD, MAPK8, or DOCK8 among others. We experimentally validate the effect of the LRBA stop-gain mutation which abolishes protein production and downregulates the expression of CTLA4, and of the frameshift indel in CTLA4 producing expression downregulation of the protein. Our results indicate a monogenic origin of at least 15–24% of the CVID cases included in the study. The proportion of monogenic patients seems to be lower in CVID than in other PID that have also been analyzed by whole exome or targeted gene panels sequencing. Regardless of the exact proportion of CVID monogenic cases, other genetic models have to be considered for CVID. We propose that because of its prevalence and other features as intermediate penetrancies and phenotypic variation within families, CVID could fit with other more complex genetic scenarios. In particular, in this work, we explore the possibility of CVID being originated by an oligogenic model with the presence of heterozygous mutations in interacting proteins or by the accumulation of detrimental variants in particular immunological pathways, as well as perform association tests to detect association with rare genetic functional variation in the CVID cohort compared to healthy controls.

Common variable immunodeficiency (CVID) is the most frequent symptomatic primary immunodeficiency characterized by recurrent infections, hypogammaglobulinemia and poor response to vaccines. Its diagnosis is made based on clinical and immunological criteria, after exclusion of other diseases that can cause similar phenotypes. Currently, less than 20% of cases of CVID have a known underlying genetic cause. We have analyzed whole-exome sequencing and copy number variants data of 36 children and adolescents diagnosed with CVID and healthy relatives to estimate the proportion of monogenic cases. We have replicated an association of CVID to p.C104R in TNFRSF13B and reported the second case of homozygous patient to date. Our results also identify five causative genetic variants in LRBA, CTLA4, NFKB1, and PIK3R1, as well as other very likely causative variants in PRKCD, MAPK8, or DOCK8 among others. We experimentally validate the effect of the LRBA stop-gain mutation which abolishes protein production and downregulates the expression of CTLA4, and of the frameshift indel in CTLA4 producing expression downregulation of the protein. Our results indicate a monogenic origin of at least 15-24% of the CVID cases included in the study. The proportion of monogenic patients seems to be lower in CVID than in other PID that have also been analyzed by whole exome or targeted gene panels sequencing. Regardless of the exact proportion of CVID monogenic cases, other genetic models have to be considered for CVID. We propose that because of its prevalence and other features as intermediate penetrancies and phenotypic variation within families, CVID could fit with other more complex genetic scenarios. In particular, in this work, we explore the possibility of CVID being originated by an oligogenic model with the presence of heterozygous mutations in interacting proteins or by the accumulation of detrimental variants in particular immunological pathways, as well as perform association tests to detect association with rare genetic functional variation in the CVID cohort compared to healthy controls.
Keywords: common variable immunodeficiency, primary immunodeficiency, exome sequencing, loss-of-function, rare disease genetics inTrODUcTiOn Common variable immunodeficiency (CVID) is the most prevalent symptomatic primary humoral immunodeficiency with a prevalence from 1:10,000 to 1:50,000 in North America and Europe (1). The diagnosis criteria consist in low serum concentrations of IgG, IgA and/or IgM, recurrent bacterial infections and poor antibody response to vaccines, in addition to the exclusion of other known causes of hypogammaglobulinemia (1)(2)(3)(4). Patients' phenotypes are highly heterogeneous due to different time onsets and to a high variety of related complications, such as autoimmune manifestations, lymphoproliferation, enteropathy, and lymphoid malignancies, suggesting that CVID could be a common outcome of diverse immune system failures.
The clinical heterogeneity of CVID has hindered both the diagnostic and the identification of the underlying genetic defect of the disease, allowing a molecular characterization of the origin in less than 20% of the patients, and usually in familiar forms of the disease which constitute only a small fraction of the CVID cases (1,(5)(6)(7). Despite that, mutations in the genes CR2, LRBA, NFKB1, NFKB2, IL21, TNFRSF13B, TNFRSF13C, CD81, IKZF1, PRKCD, MS4A1, and CD19 are listed in the OMIM database 1 as causative of disease, inducing reclassification of CVID in these new diagnostics, and establishing new therapeutic approaches based on the affected pathways that have markedly improved affected patients' prognoses (8). Specific variants in these genes as well as in others not listed in the OMIM database (NOD2, MSH5, TNFRSF13B, HLA) have been reported to confer susceptibility to the disease or to originate similar phenotypes to CVID (CTLA4, PLCG2, PIK3CD, PIK3R1), blurring even more the boundaries that define this disorder. Furthermore, some of the mutations have incomplete penetrance (9, 10) and many sporadic cases remain unexplained after deep genetic analyzes, suggesting that an important fraction of CVID cases might not follow a monogenic Mendelian pattern of inheritance (11).
In recent studies using whole-genome and exome sequencing to study CVID, 15-30% of CVID patients have been proposed to have a monogenic origin (12)(13)(14), with genetic variants both at candidate or new genes for CVID, although not all of these mutations have been functionally validated. In this work, we aim to estimate the proportion of monogenic cases in CVID and to explore other possible genetic models for CVID. For that, 1 https://www.omim.org/ (Accessed: September, 2015).
we have analyzed high coverage whole-exome sequencing and copy number variants data for 36 CVID pediatric patients. We hypothesize that focusing on pediatric cases will allow us to estimate the maximum proportion of monogenic CVID cases, based on the higher incidence of infectious disease in childhood and theoretical and molecular evidence of higher impact of inborn single gene defects in childhood than in adults, which tend to present more complex genetics of predisposition to infection (15,16). Because of the heterogeneity of CVID etiology and manifestations, we first examined the role of known genetic variants and candidate genes for CVID, and then expanded the analysis to interacting proteins and genes in the same pathway, and finally to the rest of the genome. We propose single candidate genes for the CVID patients according to different models of inheritance and by considering both genetic variants properties such as the allele frequency, bioinformatic predictions of the phenotypic effect or evolutionary conservation rates, as well as gene features such as haploinsufficiency and essentiality predictors. In addition, beyond the estimation of the proportion of patients under a monogenic model, we also propose exploring other possible disease models such us the oligogenic or polygenic by considering the presence of mutations in interacting proteins or the accumulation of functional variants in immunological pathways, as well as the disease association with rare functional genetic variants by comparison to healthy controls (17).

MaTerials anD MeThODs individuals included in the study
This study includes 36 patients diagnosed with CVID, including both sporadic and familiar cases, without any genetically confirmed primary immunodeficiency (PID), and completing the conventional criteria for CVID classification: (1) from 2 to 18 years old at the age of diagnosis; (2) lack of antibody production after immunization of antigen exposure in at least two assays; (3) 2 years post-diagnosis to exclude lymphoid malignancy; (4) IgG levels 2.5th centile for age and low IgA or/and IgM levels. CVID patients presenting one of the following features were excluded from the study: (a) well-known gene-identified PID such as hyper IgM; CD19+ or CD20+ B cell deficiency; ICOS or transmembrane activator and calcium-modulating cyclophilin ligand interactor (TACI) gene mutation already diagnosed; (b) secondary immunodeficiencies such as those due to complications such as associated tumors and lymphomas or from other therapies (side-effects following splenectomy, corticosteroid, or immune suppressive therapies). Patients L283, L286, and N216 were reported to be consanguineous. In addition, parents and siblings have also been included in the study, when available. Written informed consent for genetic analysis and research was obtained from all participants and ethical approval for the project was obtained from the institutional ethical committees.
We used two different sets of controls: whole-exome sequences from 36 individuals from a Spanish cohort diagnosed with autism spectrum disorders (ASD) (18) and 267 whole-exome sequences from healthy controls from a Spanish cohort (19). In the case where no data were available for the 267 whole-exome sequences, we retrieved data from the CIBERER Spanish Variant Server (csvs.babelomics.org) and used data for individuals with different syndromes not related to primary immunodeficiencies. genetic analyses DNA was extracted from blood samples. CNV analysis was performed with the CytoScanHD array (Affymetrix) according to the manufacturer's protocol. The CytoScanHD array contains 743,304 SNPs and 2,696,550 CNV markers. The obtained cychp files were analyzed with Chromosome Analysis Suite v.2.1.0.16 software and NetAffx na33 annotation version. For CNV detection and to prevent false positives, we considered alterations involving at least 25 markers and more than 150 Kb in length for gains, and 35 markers and more than 75 Kb for losses. For detection of loss of heterozygosity (LOH) regions, we considered alterations of at least 50 markers in more than 5 Mb. Exome capture was performed with the Agilent SureSelect XT enrichment system. DNA was sequenced in an Illumina HiSeq 2000 platform in a 2 × 75 paired-end cycles run. PCR duplicates were removed with Picard. 2 Sequence reads were mapped to the human reference genome (hg19) using GEM (20). Variant calling was performed using GATK (21) and SNP annotation with SnpEff (22) and SnpSift (23). Candidate mutations were visually inspected with the Integrative Genomics Viewer (24) and, when required, validated by Sanger sequencing. Somatic variants analysis was performed with VarScan2 (25), considering the high impact variants predicted with SnpEff (22), P-value <0.05, present in less than 40% of the reads and in a maximum of two patients.

genetic Data and statistical analyses
Only functional variants were considered, including missense, stop-gain and stop-loss, splice donor or acceptor sites mutations, and frameshift insertions and deletions. In addition to standard filters for mapping and variant calling and annotation we also discarded indels clustering within 10 base pairs of another indel and for most of the analyses we excluded those variants present in 10 or more individuals in our dataset. We used allele frequencies from The 1000 Genomes Project (26) and the NHLBI and Exome Sequencing Project. 3 We used GERP (27,28) to asses for evolutionary conservation and Polyphen (29) and SIFT (30) to predict the phenotypic impact of missense variants. We have also used predicted haploinsufficiency (31), intolerance to functional variation (32), and essentiality (33) scores to infer the possible model of the disease and prioritize candidate genes in the different patients.
We used the Fisher's exact test to assess the statistical significance of an excess of rare functional variants in cases compared to controls, from two by two tables with the total number of rare functional variants, and the total number of synonymous variants in patients and controls. In both cases, variants present in more than 10 individuals were excluded from the analysis to exclude false positives produced by sequencing artifacts. We applied the Li and Leal's collapsing method (34) to detect an excess of CVID patients with rare functional variation when compared to controls. Statistical significance was also assessed using the Fisher's exact test. For these two analyses, only nucleotide substitutions were considered.
The protein-protein interaction (PPI) data was obtained from the Human Protein Reference Database (35) considering the whole set of non-redundant interactions between two proteins. Gene lists for each pathway were extracted from the KEGG database (36)(37)(38). We considered the 25 pathways shown in Table S1 in Supplementary Material.

Functional Validations
To assess the effect of specific gene alterations, additional functional tests were performed. Mainly with peripheral blood mononuclear cells (PBMCs) or Epstein-Barr transformed B cells (EBV-B), including lymphocyte phenotyping and western-blot Ficoll-Hypaque (Sigma-Aldrich, St. Louis, MO, USA) density gradient centrifugation of heparinized blood was used for PBMC isolation. Cells were cultured with complete medium [RPMI (Gibco, Grand Island, NY, USA) supplemented with 10% heatinactivated fetal calf serum (Sigma-Aldrich, St. Louis, MO, USA), 1 µg/ml penicillin and 1 µg/ml streptomycin (Invitrogen, Grand Island, NY, USA)]. Viable cells were counted using a hemocytometer in an inverted microscope.
Lymphocyte stimulation capacity was assessed by flow cytometric detection of activation markers. PBMCs were stimulated for 7 days and then surface-stained with the following antibodies against activation markers: CD62L, CD25, HLA-DR, CD69, and CD40-L (BD Biosciences) and then acquired with the cytometer (FACS Canto II, BD biosciences).
Protein extraction and Western Blot: LRBA determination was performed in EBV-B cells (43). EBV-B cells were lysed with 1% NP-40 buffer. Protein concentration was normalized between

resUlTs
We generated whole-exome sequencing data for the 36 CVID patients included in the study, as well as for eight relatives, with an average coverage of 120×. In addition, we also generated CNV data for all the samples except in one case where DNA was not available. Table S2 in Supplementary Material shows the number of functional genetic variants described in each sample, classified in different annotation categories: missense, stop-gain (or nonsense), start-gain, splice site, and inframe and frameshift indels, with total numbers similar to what has been previously reported (44). Table S2 in Supplementary Material also contains the number of structural variants and LOH regions detected in the genotyping analysis with the CytoScanHD array.

OMiM cViD-causing Mutations
The OMIM database 4 includes known variants originating CVID in 13 genes: ICOS, TNFRSF13B, TNFRSF13C, CD19, CR2, MS4A1, CD81, IL21, LRBA, NFKB1, NFKB2, PRKCD, and IKZF1. There is also evidence that defects in other genes (CTLA4, PLCG2) can cause a similar phenotype or modify the severity of the disease with comorbidities (MSH5). These genes are mainly related to T-cell and B-cell defects leading to a deficiency in antibody production. In these 16 genes, we found a total of 96 nucleotide variants and 6 CNVs previously described to be putatively related to CVID in the literature ( Four of them were found in the CVID patients of this study ( Table 1). Two of the reported variants are included in the TNFRSF13B gene (TACI), which is known to harbor functional mutations in 5-10% of patients diagnosed with CVID (48,49). However, the existence of healthy controls with heterozygous mutations in this gene and the lack of a clear Mendelian pattern of inheritance in families have led to consider some of the mutations at TNFRSF13B as risk factors (9, 10) which could be determinant only in the case of homozygous individuals (50). Thus, TNFRSF13B would be considered a modifier gene rather than a causal gene in monogenic cases (51). The p.C104R variant is the most common TNFRSF13B functional mutation found in CVID patients (51). Three of the patients in this study present this mutation, in one case in homozygous state, being the second case found to date (52). This mutation is significantly more frequent in our CVID patients compared to the Spanish cohort controls (19) (P = 0.003, Fisher's exact test) and absent in the ASD controls (18) ( Table 1).
In the same gene, we report nine samples with the protein change P251L, although in this case the proportion is not significantly higher than in controls. In addition, a direct causal role for this variant can probably be discarded because of its high frequency in the reference populations (14% in the ExAc database, 11% for the European population). On the other hand the P21R variant of the TNFRSF13C gene found in four patients, and also one healthy parent, shows a higher frequency when compared to controls (P = 0.003, Fisher's exact test). However, this variant (rs77874543) has also been found in non-CVID exomes in homozygosity, and has a population frequency higher than 5%. Finally, we also detected two patients with the L85F substitution in the MSH5 gene (47). The same aminoacid substitution was also present in the mother of one these patients, not diagnosed with CVID but with some of the clinical features described in the patient. Nonetheless, this genetic variant has been found at lower frequencies in CVID patients compared to controls, and has a population frequency of 2% or higher in some populations (7% in Africans), which suggests that it does not have a determinant role in CVID.

loss-of-Function (loF) Variants
Loss-of-Function variants include stop-gain and loss mutations, splice-site mutations, and frameshift indels, which are predicted to disrupt proteins and, therefore, could likely relate to disease phenotypes, and in fact account for approximately 20% of the coding variants associated with disease (53). Table S2 TaBle 2 | Genes with Loss-of-Function (LoF) homozygous or heterozygous variants in common variable immunodeficiency (CVID) candidate genes and interacting proteins. in Supplementary Material shows the number of LoF variants identified in each individual of the study. The number of LoF variants ranges from 78 to 153, similar to what has been previously described (44,53,54). Applying different frequency thresholds substantially reduces the number of LoF variants per individual (54,55). We established a permissive allele frequency threshold of 1%, and first focused the analysis on the LoF variants described in candidate genes for CVID ( Table 2). With this aim, we generated a list of 97 candidate genes for CVID (Table S4 in Supplementary Material), including genes in the OMIM database, 5 genes defined in a review by Bogaert and colleagues (51), and others from the literature. Second, we also analyzed the presence of LoF variants in proteins interacting with the proteins encoded by candidate genes (see Materials and Methods) ( Table 2). Finally, we also report all the genes with LoF variants using a very low frequency threshold (0.001) ( mutation was validated by Sanger sequencing in the patient, and also detected in heterozygosis in both parents and three healthy siblings ( Figure 1A). Copy number and SNP analyses confirmed the existence of consanguinity in this patient. We estimated a consanguinity index of 0.058 compatible with descendants from third degree kinship marriages, based in the total of 174 Mb included in LOH regions (57), with 10 LOH regions of more than 5 Mb. We then performed assays with the patient cells to test the effect of the variant on the protein. The western blot gel electrophoresis separation (Figure 1B) shows that the cells of the patient do not produce any detectable amount of LRBA protein, thus validating the deleterious effect of the mutation abolishing protein production probably through nonsense-mediated decay. Furthermore, the expression of CTLA4 is downregulated in Treg cells of the LRBA-deficient patient (Figure 1C), in agreement with the previous description of CTLA4 detection in Treg cells from LRBA-deficient patients (39).
The N211 patient presents a new LoF genetic variant located at the CTLA4 gene, which has already been reported to harbor causal heterozygous CVID variants (41,42). The mutation causes a frameshift deletion not previously described and absent in the reference databases. We performed Sanger sequencing of this mutation and confirmed that it is a de novo mutation absent in the parents ( Figure S1 in Supplementary Material) and, therefore, a strong candidate to originate CVID. We performed functional analyses to study the expression of CTLA4 in Treg cells and we found that it is downregulated before and after stimulation with PHA. CTLA4 detection was lower than in the case of the aforementioned LRBA-deficient patient after PHA stimulation ( Figure 1C). Finally, we also analyzed the lymphocyte stimulation in the patient. After 7 days stimulation with PHA, the stimulation ratio of different lymphocyte stimulation markers was increased in the patient compared to a healthy control (Figure 2).
N202 presents a heterozygous splicing variant in PIK3R1. This variant has been previously reported to originate an immunodeficiency because of its dominant gain of function effect on PI3K signaling (58) in agreement with its high haploinsufficiency prediction value of 0.89 (31). For the remaining five patients presenting a low frequency heterozygous LoF variant in a CVID candidate gene ( Table 2), three of them have a variant in NFKB1, which has also been reported to harbor heterozygous mutations originating CVID (7). Two of them share a start loss variant affecting one of the transcripts, although its frequency of 0.002 makes it unlikely to have a causal (monogenic) role in the disease. By contrast, a new splice-site mutation in NFKB1 is described in N234, being a good candidate to originate the disease. In addition, N227 presents a 13 MB heterozygous deletion (chr4: 94,135,868-107,295,574) not present in parents which includes the NFKB1 gene among others (Table S2 in Supplementary Material). Finally, although the variants described at NOD2 and IL10RA are not present in any database, no CVID cases with heterozygous variants at these genes have been described, in agreement with their low haploinsufficiency values (0.119 and 0.173, respectively). In addition, Table 2 also includes low frequency LoF variants of genes interacting with candidate genes related with CVID.

Functional genetic Variation at candidate genes for cViD
We then explored the presence of functional variants, other than LoF described above, in candidate genes for CVID. The final number of variants with frequency less than 1% in each individual is shown in Table S6 in Supplementary Material, differentiating variants in candidate genes, variants in interacting proteins and in other genes. We excluded from this, and subsequent analyses, the two individuals with a functionally validated LoF candidate (L283 and N211, see above), and the variants also present in healthy relatives (when this information is available from exome sequencing). We first analyzed the presence of single variants in CVID genes that could originate the disease following a dominant  model. Because the number of genes with one or more functional variants is too high we applied stringent filters to produce a short list of candidate genes. We selected the variants with a GERP conservation score higher than 2 (28), a Polyphen score higher than 0.5 (for nucleotide variants) and a frequency in the ExAC and GMAF databases below 0.001. The nine variants at CVID genes fulfilling these conditions are shown in Table 3. Two of them are in frame indels and, therefore, less prone to have an effect on the protein. Among the heterozygous missense variants, PRKCD, CLEC16A, and DOCK8 (this latter absent in the healthy sister N209) are the more interesting candidates, considering their haploinsufficiency predictions (31) and essentiality values estimated from network and evolutionary properties (59). We then considered the recessive genetic model with the disease being originated by two rare functional variants in the same gene.
We analyzed the presence of homozygous variants or compound heterozygotes in CVID candidate genes, at frequencies below 0.01 (Table 4). Interestingly, two candidate genes (CR2 and PLCG2) are found as compound heterozygotes in patients N233 and N212, respectively.

compound heterozygotes at non-cViD genes
We expanded the analysis beyond the list of CVID candidate genes to the rest of the genome. We based our approach on the use of stringent filters (frequency, conservation, predicted effect) and the consideration of predictors of the degree of essentiality of the gene. This approach produces a list of new candidate genes in each CVID patient which can be ranked using the different variant and gene properties. We produced a list of genes harboring compound heterozygotes in each patient and applied two different allele frequency thresholds of 0.01 and 0.001. Table 5 shows the number of compound heterozygotes per patient, and gene names are shown in Table S7 in Supplementary Material. The number of genes per patient can be reduced using additional filters based in evolutionary conservation or predicted phenotypic effect. We established a threshold of a GERP > 2 for the functional variants, since positions with values greater than 2 are considered to be conserved among mammals and, therefore, more to prone to be of functional importance (28).
On the functional effect, we used the Polyphen prediction and established a threshold value of 0.5 (60) ( Table 5). Table S8 in Supplementary Material also shows additional information on gene properties which might aid the prioritization of candidate genes. Four genes are detected as compound heterozygotes in more than one patient (with GERP > 2 and Polyphen > 0.5): SLC25A5 (eight), ACOT4 (7), KMT2C (two), and OR10X1 (two). However, SLC25A5 and OR10X1 are two genes which have been recurrently reported in next-generation sequencing studies (61), probably because of being prone to mapping artifacts and, thus, to accumulating false variants. On the other hand, ACOT4 (with a function apparently not related to the immune function), is also a paralog of ACOT1. Finally, KMT2C is also present in two patients, although one of them is N227 which harbors a large deletion encompassing NFKB1 among other genes.

Oligogenic Disease
For the patients without a clear candidate gene for a monogenic origin of the disease, we then considered an oligogenic model of inheritance. In particular, we considered the digenic model. DIDA, a database of digenic diseases, included 44 diseases with 213 digenic combinations collected from the literature until June 2015 (62). This form of disease refers to both situations with a primary and a secondary locus or cases where two loci contribute to the disease with roughly the same importance (63). Modifier TaBle 6 | Patients with rare functional variants (MAF < 0.01) and GERP > 2 in a common variable immunodeficiency (CVID) candidate gene and interacting proteins. genes, affecting the severity of the disease, can also be considered a type of digenic inheritance (64). The case of TNFRSF13B, with several common variants related to CVID but with reported healthy carriers, could fit with this digenic model where additional variants would be needed to develop the disease. We analyzed the two patients with variants in this gene ( Table 1), and searched for variants in genes interacting with TNFRSF13B. Patient L297 harboring the C104R change in homozygosis, also has a heterozygous missense variant with a 2% frequency in TNFRSF13C, which directly interacts with TNFRSF13B. No other variants in interacting proteins were described in the patients with known CVID variants in TNFRSF13B, TNFRSF13C, or MSH5 ( Table 1). We expanded this analysis by assessing the presence of heterozygous rare functional variants in a CVID gene and in an interacting protein in the same patient. Table 6 shows the 10 patients in which this situation has been found, considering variants with GERP > 2 and below 0.01 frequency (see Table S9 in Supplementary Material) when considering a maximum frequency of 0.05. Interestingly, two pairs of related patients (sisters N205 and N206, and brothers N207 and N208) share the presence of variants at the interacting proteins PIK3R1-AXL and PIK3CD-RALY, respectively. In four more patients (Table 6), the CVID genes had already been suggested as probably causal (Tables 2-4) following recessive (N233) or dominant models (L288, N210, N234).

Patient cViD gene Variants interacting protein Variants
We then expanded the analysis to a scenario where variants in several genes of an individual might contribute to the disease. For this purpose, we assessed the presence of particular CVID patients which compared to the rest of the patients in the study harbors an excess of rare functional variants at any of 25 KEGG pathways related to the immune function (36) (Figure 3). We used a frequency threshold of 1% and estimated the ratios of functional to synonymous variants in each sample, to correct for possible differences in coverage across samples. We considered as outliers those individuals departing from twice the SD of the average number of rare functional variants (Figure 3). Table 7 shows the CVID patients with an excess of rare functional genetic variants in a particular pathway. The presence of more than one pathway in three of the patients is mostly due to the fact that these patients have genetic variants in genes with a role in several pathways. In the case of patient N208, it shows an excess of variants in five pathways that share the presence of three MAP Kinases (MAPK14, MAP2K2, and MAP2K3). Of interest, we found four patients with an excess of rare functional genetic variants in the B cell signaling pathway and three in the T cell signaling pathway, in addition to another two in the tumor necrosis factor and Fc epsilon RI signaling pathways ( Table 7).

association to rare Variants
Next, we assessed the association of rare functional genetic variation to CVID. In this case, analyses are performed to detect an excess of rare functional variation in a particular gene or pathway in CVID patients compared to controls, rather than the detection of the causal genetic variant(s) in particular individuals. To analyze the presence of genes harboring an excess of rare functional variants in the CVID patients compared to healthy controls, we first compared the ratio of rare functional to synonymous variants for each gene in cases compared to    after applying Bonferroni's multiple test correction. Table S10 in Supplementary Material reports these functional variants and their properties. Second, we used the Li and Leal's collapsing method (34) to detect an excess of CVID patients harboring rare functional genetic variants. In this method, individuals with and without at least one functional rare variant are compared between CVID patients and controls. This test has been performed only for those genes with similar lengths for the targeted regions to avoid false positives with more functional variants because of a larger scanned region in CVID patients. Table 9 presents six genes (PIK3CD, ICOSLG, TNFRSF13B, PIK3R1, CD84, and PRKCD) showing a statistically significant excess of individuals with rare functional variants in CVID cases when compared to controls. Interestingly, PRKCD showed also a significant excess of functional variation in cases in the previous analysis (Table 8), although only PIK3CD remains significant after Bonferroni's correction. Genetic variants in each gene are shown in Table S10 in Supplementary Material. Finally, we assessed a possible excess of functional variants in the 25 KEGG pathways by comparing our CVID patients to a set of controls (see Materials and Methods), by comparing the ratios of rare (<1%) functional to synonymous variants in each sample. We detected a significant excess of variants in two of the pathways in CVID patients when compared to controls: Fc epsilon RI signaling and cytokine-cytokine receptor interaction pathways (P < 0.001 and 0.002, respectively), plus two other marginally significant pathways after applying multiple test correction: cytosolic-DNA sensing, and NFKB signaling (P = 0.002 and 0.001, respectively). The four pathways also show a significant excess of functional variation in CVID patients when compared to the ASD controls set.
healthy controls. Table 8 shows the results of the analysis for the 60 genes analyzed (with at least one synonymous variant in each cohort), for the 34 patients without a validated candidate gene for a monogenic origin of the disease. Four genes (PRKCD, CLEC16A, DOCK8, and PLCG2) show a statistically significant excess of rare functional variants in CVID patients,

DiscUssiOn
In this work, we first approach the proportion of monogenic cases in CVID by using deep whole-exome sequencing combined with CNV analysis, in a cohort with mostly early diagnosis patients (and all of them less than 18 years old), which is expected to optimize the probability of including monogenic cases (15). We propose candidate genetic variants and genes with different levels of confidence (Figure 4). The higher confidence cases are the five LoF variants very likely to originate CVID: one in LRBA and CTLA4 (both functionally validated), two in NFKB1 (a large deletion and a new splice-site variant), and one in PIK3R1 (a known splice-site variant causing disease). Thus, a minimum of 15% of the 33 cases included in this study (the 36 patients include three pairs of relatives) would have a monogenic origin of CVID. Among the LoF variants described in proteins interacting with CVID candidate genes ( Table 2), a new LoF variant in MAPK8 is also a good candidate variant. MAPK8 shows high essentiality and haploinsufficiency prediction scores and is thought to play a key role in T cell proliferation, apoptosis, and differentiation (65)(66)(67). This stop gain variant is not found in genetic databases although it affects a base with a very low GERP value. We have also described the presence of LoF variants in the genes CR1, IBTK, and NCOR2 ( Table 2) that have been related to B cell development and activation (68), agammaglobulinemia (69), and lymphoma (70,71), respectively. However, CR1 and IBTK show low predicted haploinsufficiency values and the cases described at NCOR2 follow a recessive model for the disease. In addition to these LoF mutations at candidate CVID genes and interacting proteins, we propose other possible monogenic cases produced by missense variants at CVID candidate genes following a dominant (PRKCD, CLEC16A, DOCK8) or recessive models (CR2, PLCG2), as well as in other genes not previously associated with CVID (KMT2C). Of interest, missense variants and deletions in PLCG2 with dominant inheritance have been related to PID in previous studies (72,73). RVIS scores are also negative in these three genes which suggests a certain level of intolerance to mutations, although in the case of immunological diseases this value seems to be less indicative than for other diseases (32). Finally, two affected sisters (N205 and N206) harbor a new missense variant in PIK3R1. This variant has not been previously reported and is located in a conserved nucleotide according to its GERP value (3.24), although it is not predicted to be damaging using SIFT and Polyphen. On the situations fitting a recessive model, for the PLCG2 gene, one of the variants is predicted to be damaging with Polyphen and also shows a very high level of evolutionary conservation, although for the second variant both the evolutionary conservation and predicted phenotypic effect are low. Similarly, only one of the variants at CR2 in patient N212 shows a high level of evolutionary conservation, and none of the two variants is predicted to be damaging with Polyphen, being therefore a less promising candidate to originate CVID. Finally, KMT2C encodes for a nuclear methyltransferase (MLL3) of the mixed-lineage leukemia family the genes of which are among the more frequently mutated in cancer (74); somatic mutations at MLL3 have been related to different types of cancer (75), while in activated B-cells, deficiencies in the MLL3-MLL4 complex have been shown to manifest defective immunoglobulin class switching (76).
Thus, the proportion of CVID monogenic cases described in this work would rank from 15 to 24% or higher (Figure 4), similar to what has been described in previous studies (12)(13)(14) although lower than the 40% proposed in a recent analysis of 278 PID families including 20 CVID cases (77) ( Table 10). However, these studies follow differing filtering strategies and stringency criteria making the results to be only roughly comparable between them. Overall, the fraction of monogenic CVID cases seems to be slightly lower to that described in other PID (78,79), with some recent analyses showing considerably higher detection rates of PID monogenic cases (77,80) which is especially high in a study of severe combined immunodeficiency (SCID) (81) ( Table 10). The higher percentage of Mendelian patients described in some other PID (77,80) and especially SCID (81) is probably because of a higher severity which is also expected to correlate with the number of Mendelian cases (15). However, it is important to highlight that different factors can contribute to an underestimation of the Mendelian cases in CVID in comparison to other PID. First, because of the clinical heterogeneity of CVID it is not recommended to apply the standard exome sequencing strategy where candidate genes are compared across patients to identify as causal the gene present in several patients (82). Because of that, we have used a conservative approach by mainly considering a list of candidate genes, and used genetic variants characteristics (evolutionary conservation, Polyphen values) and gene features (haploinsufficiency, essentiality or tolerance to functional variation) mainly to indicate but not conclusively exclude a given candidate gene. For example, filtering by genic intolerance to functional variation is more effective in detecting false-positive rather than identifying the causal gene since it is known that genes producing Mendelian diseases show from medium to high intolerance values (83). Second, because of the higher prevalence of CVID compared to other PID, the use of too stringent frequency filters is not recommended, which hinders the identification of causal genes by increasing the number of candidates. And third, exome and even genome sequencing have some limitations that may produce false negatives because of the difficulties to detect structural variation. However, based on our results, the contribution of CNVs to monogenic CVID cases would be quite limited, in contrast to a more important role for common CNVs proposed in previous studies (84,85). We have used one of the highest density array optimized for CNV detection (86), and detected only a candidate CNV consisting of one big deletion including, among others, the NFKB1 gene. Similarly, in the recent whole-exome sequencing analysis of 278 PID families CNV represented 8% of the likely causing mutations, but no causal CNV was found among the 20 CVID patients (77). Independently of the exact proportion of monogenic cases in CVID, in an important percentage of patients the disorder remains genetically uncharacterized, and it seems clear than other possible models beyond the monogenic scenario should be considered. A genome-wide association study performed on 363 CVID patients has revealed susceptibility factors in MHC and ADAM, among others (84), but association with common variation seems to be far from explaining all non-monogenic situations. As has been proposed for complex disease, this CVID missing heritability (87) must be hidden under other models that have not been deeply explored, as oligogenic, accumulation of rare functional variation, epigenetic (11,88) or even somatic (89). In fact, the prevalence of CVID would fit with a model where the disease is produced by mutations in two or in a few genes, an intermediate scenario between the very rare disorders originated by a single locus and common disease produced by the interaction of many genes and environmental factors (90). Other features, such as different penetrancies and severities or the phenotypic variation in affected families, could also suggest an oligogenic origin for CVID, where the disease is caused or modulated by a few genes (91). Thus, we have performed different approaches to explore the possibility of CVID cases being originated by genetic variants in two or several genes.
Considering the digenic model, we have combined exome sequencing with PPI data, and described cases of patients with rare functional variants in CVID candidate genes and an interacting protein. Although promising, to date the number of reported examples in the literature with pieces of evidence of digenic inheritance remains quite low (62), probably because of difficulties in statistical and mainly functional analyses to demonstrate a real role in the disease (63). We have used a prudent approach based on the existence of physical interactions between proteins, to produce a reduced number of candidate interactions. Other tools to identify related genes, as the human genome connectome (92,93) or GIANT (94) could also be used. However, since interactions predicted by these tools are based both in physical and functional associations, the number of candidate protein pairs would be higher. Still at the individual level, we have considered a polygenic model and hypothesized that CVID in a particular patient might be produced by an accumulation of rare functional genetic variants in genes related to the same function, producing a list of patients with an excess of genes with functional variants in the same immunological pathway. Finally, we have performed tests of association of rare genetic variants to disease. In this case, the goal is not proposing candidate gene(s) in a particular patient but to detect genes enriched for rare functional variation in the cohort of CVID cases compared to healthy controls. Interestingly, most of the genes with significant results in these analyses (Figure 4) are among the ones with more pieces of evidence of being related to primary immunodeficiencies (51), thus supporting their role in the etiology of CVID. However, the application of these cohort approaches can be limited to syndromes as CVID because of its genetic heterogeneity. Instead, the use of higher levels of association such as pathways or functionally related genes can reduce the genetic heterogeneity and increase the detection power.
The detection of somatic genetic variants from exome sequ encing data is not straightforward. The detection power ultimately depends on the mutation frequency in the tissue, which is conditioned by the cell populations affected by the mutation and their relative abundance in blood, and will be, therefore, practically undetectable if present in low-frequency cell populations. On the other hand, high-frequency mutations present in more than 40% of the reads cannot be differentiated from germline mutations unless very high coverages are achieved. In addition to high coverages, the modification of standard NGS data analysis pipelines, which by default discard genetic variants in allelic imbalance, is required. We have tentatively analyzed exome sequencing data generated in this study (with 120X is the higher for CVID produced to date) scanning for low frequency variants with predicted high impact in our set of candidate genes. Not one of the patients presented a candidate somatic variant in any of the 97 CVID genes. A previous study proposed no role for somatic CNV in CVID, based on the stability of the overall CNV burden over time (85). However, for a proper analysis of the role of somatic variation much higher sequencing coverage would be needed, and the possibility of sequencing different cell populations or tissues with different origin could be also considered since variant callers for somatic variant calling are optimized for the comparison between healthy and affected tissue (tumor). We also propose that, as a change to the experimental design of our study, late onset CVID cases should be included in a study targeting somatic variation. Finally, epigenetics is also suspected to contribute to CVID. Altered epigenetic profiles are known to be related both to common and rare genetic disease (95). However, although epigenetics is known to play an important role in B lymphocyte differentiation and activation, there is less evidence of their involvement in PID (96). Interestingly, it has been proposed that the hypermethylation of important B lymphocyte genes has a role in CVID, through the analysis of monozygotic discordant twins (88). Thus, methylation could explain some of the many cases of CVID with intermediate penetrances, and also suggests an important role of mutations affecting gene expression (mostly not detected in exome sequencing approaches) in CVID.
We think that CVID is a main example of rare disease where it is possible to arrive at similar phenotypes by several different genetic defects, either by mutations in different genes or by different genetic mechanisms including from monogenic to epigenetic scenarios. After the success of new sequencing technologies, and in particular of whole-exome sequencing in unraveling the molecular mechanisms of many rare syndromes, rare diseases such as CVID that do not completely fit with a Mendelian model represent a new challenge for medical genomics. In this manuscript, we have proposed different approaches to the analysis of CVID from whole-exome sequencing data, and have shown its power and limitations as a diagnostic tool for the study of these diseases. Beyond the identification of the causal gene in some patients, we hope that these kinds of studies can also be used to help detect key pathways related to the development of the disease, thus contributing to a better understanding of its etiology. From our and previous results, we conclude that in an important proportion of patients it will be essential to integrate data from different omic approaches to solve the genetic origin of the disease.

eThics sTaTeMenT
This study was carried out in accordance with the recommendations of the "Guidelines on the Informed Consent" of the Bioethics Committee of Catalonia (Departament de Salut, Generalitat de Catalunya) with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Comitè ètic d'Investigació Clínica-Parc de salut Mar (Barcelona).

acKnOWleDgMenTs
We thank all participants in the study. Exome sequencing was performed at the CNAG (Barcelona).

FUnDing
This study was funded by grants SAF2012-35025 and SAF2015-68472-C2-2-R from the Ministerio de Economía y Compe titividad (Spain) and FEDER (EU) to FC; by Direcció General de Recerca, Generalitat de Catalunya (2014SGR-866 and 2017SGR-702) to FC and EB; to EB by grant BFU2016-77961-P from Ministerio de Economía, Industria y Competitividad (Spain) AEI (Spain) and FEDER (EU); by Instituto de Salud Carlos III, grant PI14/00405, cofinanced by the European Regional Development Fund (ERDF) to RC; partially funded by CERCA Programme/Generalitat de Catalunya (JIA), and SAF2015-68472-C2-1-R grant from the Spanish Ministry of Economy and Competitiveness co-financed by European Regional Development Fund (ERDF) to JIA; GV-I was supported by grant BES-2012-051794; JH-R was supported by grant BES-2013-064333. TMB is supported by U01 MH106874 grant, Howard Hughes International Early Career, Obra Social "La Caixa" and Secretaria d'Universitats i Recerca del Departament d'Economia i Coneixement de la Generalitat de Catalunya. All phases of this study were supported by the projects PI12/01990 and PI15/01094 to LA and PI13/00676 to MJ. This work was also supported by the Jeffrey Modell Foundation. This study makes use of data generated by the Medical Genome Project. A full list of the investigators who contributed to the generation of the data is available from http://www.medicalgenomeproject.com/en. Funding for the project was provided by the Spanish Ministry of Economy and Competitiveness, projects I + D + i 2008, Subprograma de actuaciones Científicas y Tecnológi cas en Parques Científicos y Tecnológicos (ACTEPARQ 2009) and ERFD.