ORIGINAL RESEARCH article

Front. Genet., 20 September 2019

Sec. Genomic Medicine

Volume 10 - 2019 | https://doi.org/10.3389/fgene.2019.00854

New Recurrent Structural Aberrations in the Genome of Chronic Lymphocytic Leukemia Based on Exome-Sequencing Data

  • 1. Research Group on Lymphoproliferative Diseases, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain

  • 2. Complexo Hospitalario Universitario de Santiago de Compostela (CHUS), Division of Hematology, SERGAS, Santiago de Compostela, Spain

  • 3. Department of Medicine, University of Santiago de Compostela, Santiago de Compostela, Spain

Abstract

Chronic lymphocytic leukemia (CLL) is the most frequent lymphoproliferative syndrome in Western countries, and it is characterized by recurrent large genomic rearrangements. During the last decades, array techniques have expanded our knowledge about CLL’s karyotypic aberrations. The advent of large sequencing databases expanded our knowledge of cancer genomics to an unprecedented resolution and enabled the detection of small-scale structural aberrations in the cancer genome. In this study, we have performed exome-sequencing-based copy number aberration (CNA) and loss of heterozygosity (LOH) analysis in order to detect new recurrent structural aberrations. We describe 54 recurrent focal CNAs enriched in cancer-related pathways, and their association with gene expression and clinical evolution. Furthermore, we discovered recurrent large copy number neutral LOH events affecting key driver genes, and we recapitulate most of the large CNAs that characterize the CLL genome. These results provide “proof-of-concept” evidence supporting the existence of new genes involved in the pathogenesis of CLL.

Introduction

Chronic lymphocytic leukemia (CLL) is the most frequent lymphoproliferative disease in Western populations, and it is characterized by its clinical and genetic heterogeneity. Döhner et al. (2000) described the widely used cytogenetic classification of CLL based on the most prevalent chromosomal aberrations in the CLL genome, that is, trisomy 12 and deletions in 13q14.2–14.3, 11q22.3, and 17p13.1. Since then, genome-wide studies have revealed new recurrent genomic aberrations, such as trisomy 19, amplifications at 2p and 8q, and deletions at 8p, 6q21, 18p, and 20p (Pfeifer et al., 2007; Landau et al., 2015). Similarly, a wealth of genomic and epigenomic modulators of CLL’s clinical aggressiveness have been discovered (Nadeu et al., 2018), such as point mutations in NOTCH1, SF3B1, ATM, TP53, and POT1 and the absence of somatic hypermutation at the IGHV locus. It has been observed that copy number aberrations (CNAs) in CLL genomes tend to be acquired early in disease evolution and usually remain stable, whereas the mutational heterogeneity can increase (Nadeu et al., 2018). Indeed, mounting evidence indicates that the accumulation of these cytogenomic events modulates CLL proliferation and clinical aggressiveness to a great extent (Raponi et al., 2018; Gruber et al., 2019), acting as drivers of genomic complexity and clonal evolution (Edelmann et al., 2017; Yu et al., 2017; Hernández-Sánchez et al., 2019) and accumulating in relapsed cases (Ljungström et al., 2016; Leeksma et al., 2019).

CNAs and copy number neutral loss of heterozygosity (CNN-LOH) are oncogenic mechanisms that induce gene-dosage effects, disrupt coding sequences, cause structural rearrangements, or potentiate epigenetic effects. Oncogenes are frequently affected by copy number gains, while tumor suppressor genes tend to be deleted. Massive array-based techniques such as array comparative genomic hybridization and single-nucleotide polymorphism (SNP) arrays have enabled the analysis of structural aberrations in cancer genomes at an unprecedented resolution of 10–100 kb. With the development of massive sequencing technologies, large databases of cancer sequence data have been published. This motivated the development of a variety of algorithms for CNA detection from exome-sequencing data, which have the additional benefit of detecting smaller CNA events at the expense of increased false discoveries and reduced sensitivity and specificity (Nam et al., 2016). These methods are designed to address specific issues: those inherent to the sequencing protocol (such as biases induced by hybridization, GC content, and read mappability), those arising from cancer biology (ploidy estimation and subclonality) (Zare et al., 2017), and those related to the presence or absence of matched controls (Kim et al., 2017).

In this analysis, we used previously published exome-seq data in order to detect small recurrent structural events involved in the pathogenesis of CLL. Our results not only reproduce the known cytogenetic aberrations in CLL but also support the existence of multiple recurrent focal CNAs and CNN-LOH affecting key oncogenic pathways, some of which are clearly associated with higher proliferative capacity, shorter survival, and altered gene expression. We conclude that focal CNAs may be more relevant than previously expected in the pathogenesis of CLL, and they merit further consideration for prognostic stratification.

Methodology

Data Source

The International Cancer Genome Consortium (ICGC) Data Access Committee granted us access to the CLL sequencing data (Ramsay et al., 2013) deposited in the European Genome-phenome Archive (EGA) under DACO-1040945. For this analysis, we used exome-seq data from matched control and tumor samples of CLL cases under the accession code EGAD00001001464. Patient characteristics are summarized in Table 1.

Table 1

Characteristic | Value
Number of cases | 441
Males/females | 59.7%/40.3%
Median age at diagnosis | 62.5 years
% of MBL | 11.20%
% of Binet A | 77.12%
% of Binet B | 9.61%
% of Binet C | 2.06%
% unmutated IGHV | 35.03%

Patient characteristics.

MBL, monoclonal B-cell lymphocytosis.

Data Preprocessing

Samples were processed as described in the original paper by Puente et al. (2015). Briefly, 3 μg of genomic DNA was used for paired-end sequencing library construction, followed by enrichment in exomic sequences using the SureSelect Human All Exon 50Mb kit (Agilent Technologies). Next, DNA was pulled down using magnetic beads with streptavidin, followed by 18 cycles of amplification. Sequencing was performed on an Illumina GAIIx or on a HiSeq2000 sequencer (2 × 76 bp). Exome-seq data were aligned to the reference genome (GRCh37.75) using bwa (Li and Durbin, 2009). Duplicate read removal, sorting, and indexing were done using samtools (Li et al., 2009). Base quality score recalibration was performed with BamUtil (Breese and Liu, 2013) using a logistic regression model.

CNA and CNN-LOH Detection

We analyzed paired tumor-normal exome-sequencing data with Control-FREEC version 11.3 in order to identify somatic CNA and CNN-LOH regions (Boeva et al., 2012). Control-FREEC uses aligned reads to construct and normalize a copy number profile and a B-allele frequency (BAF) profile. Then, it performs profile segmentation and infers genotype status for each segment using both copy number and allelic frequency information. Finally, genomic aberrations are identified and annotated.

The following specifications were used: “window = 0,” “ploidy = 2,” “breakPointThreshold = 1.2,” “noisyData = TRUE,” “readCountThreshold = 50,” “forceGCcontentNormalization = 1,” “contaminationAdjustment = TRUE,” “telocentromeric = 50000,” and “mateOrientation = 0.” BAFs were estimated using all variants reported in dbSNP (version 150), with a minimal coverage of 5 reads per variant position and a minimal Phred base quality of 20 per position. Variant calling was limited to regions covered in the SureSelect Exome Capture 50Mb version 4 kit.
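As an illustration only, the settings listed above could be collected into a Control-FREEC-style configuration file along the following lines. This is a sketch under stated assumptions, not the configuration actually used in the study: the function name `build_freec_config`, all file paths, the `outputDir` value, and the placement of each key in the `[general]`, `[sample]`, `[control]`, `[BAF]`, and `[target]` sections are assumptions based on our reading of the Control-FREEC manual. The GC-normalization key is spelled here as `forceGCcontentNormalization`, which should be checked against the manual of the version in use.

```python
from configparser import ConfigParser
from io import StringIO


def build_freec_config(tumor_bam, normal_bam, snp_file, capture_bed, out_dir):
    """Render a Control-FREEC-style config as text. All paths are placeholders."""
    cfg = ConfigParser()
    cfg.optionxform = str  # keep the mixed-case parameter names unchanged

    # Parameter values quoted in the Methods text.
    cfg["general"] = {
        "window": "0",
        "ploidy": "2",
        "breakPointThreshold": "1.2",
        "noisyData": "TRUE",
        "readCountThreshold": "50",
        "forceGCcontentNormalization": "1",
        "contaminationAdjustment": "TRUE",
        "telocentromeric": "50000",
        "outputDir": out_dir,  # assumption: not stated in the text
    }
    # Tumor and matched normal inputs (file names are hypothetical).
    cfg["sample"] = {
        "mateFile": tumor_bam,
        "inputFormat": "BAM",
        "mateOrientation": "0",
    }
    cfg["control"] = {
        "mateFile": normal_bam,
        "inputFormat": "BAM",
        "mateOrientation": "0",
    }
    # BAF estimation thresholds quoted in the text.
    cfg["BAF"] = {
        "SNPfile": snp_file,
        "minimalCoveragePerPosition": "5",
        "minimalQualityPerPosition": "20",
    }
    # Restrict the analysis to the exome capture regions.
    cfg["target"] = {"captureRegions": capture_bed}

    buffer = StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


config_text = build_freec_config(
    "tumor.bam", "normal.bam", "dbsnp150.vcf", "capture_regions.bed", "freec_out"
)
print(config_text)
```

Generating the file programmatically is one convenient way to produce one configuration per tumor-normal pair while keeping the analysis parameters identical across samples.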

LOH and CNA Selection and Filtering

CNN-LOH events were called with p-values < 0.05 (Kolmogorov–Smirnov test) and an uncertainty upper threshold of 5%. Regions with low mappability according to UCSC 75-bp mappability tracks (score below 0.5) were filtered out. As expected, we observed regions that seemed prone to erroneous CNA detection. Thus, we decided to apply a hard filter and discard those CNAs significantly enriched in both amplifications and deletions, as well as those located near a telomeric or centromeric region. Recurrent CNA events were detected using GISTIC2.0 (Beroukhim et al., 2007). GISTIC2.0 was run with default parameters plus the arm peel-off filter. Focal recurrent CNAs were defined as those spanning less than 50% of a chromosome arm with a residual q-value < 0.05 and a wide peak size below 10 megabases. A log2 tumor/normal ratio above 0.3 was used to define amplifications, and a ratio below −0.3 was used to define deletions.
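The selection rules above can be written down compactly. The following is a minimal sketch of the thresholds only; it does not reproduce GISTIC2.0 itself, and the function names and the example arm fraction are hypothetical.

```python
# Thresholds quoted in the text.
AMP_LOG2_THRESHOLD = 0.3
DEL_LOG2_THRESHOLD = -0.3
MAX_ARM_FRACTION = 0.5          # focal events span < 50% of a chromosome arm
MAX_RESIDUAL_Q = 0.05
MAX_WIDE_PEAK_BP = 10_000_000   # 10 megabases


def call_cna(log2_ratio):
    """Label a segment from its log2 tumor/normal ratio."""
    if log2_ratio > AMP_LOG2_THRESHOLD:
        return "amplification"
    if log2_ratio < DEL_LOG2_THRESHOLD:
        return "deletion"
    return "neutral"


def is_focal_recurrent(arm_fraction, residual_q, wide_peak_bp):
    """Apply the three criteria used to define a focal recurrent CNA."""
    return (
        arm_fraction < MAX_ARM_FRACTION
        and residual_q < MAX_RESIDUAL_Q
        and wide_peak_bp < MAX_WIDE_PEAK_BP
    )


# Example: the 11q22.3 deletion peak (residual q-value and length from Table 2;
# the arm fraction of 0.01 is an illustrative assumption).
print(call_cna(-0.45))
print(is_focal_recurrent(0.01, 1.13e-18, 431_600))
```

Keeping the thresholds as named constants makes it easy to test how sensitive the number of called events is to each cutoff.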

Survival Analysis

Variables associated with time to treatment (TTT) and overall survival (OS) were analyzed using Cox regression as implemented in the survival R package (Therneau and Grambsch, 2000; R Development Core Team, 2011; Therneau, 2015). The proportional hazards assumption was assessed using the cox.zph function. We created two models: a univariate model that included only the CNA status of each gene, and an adjusted model that also included variables associated with clinical outcome at a p-value < 0.2 in univariate analysis. The adjusted, combined model (CM) for TTT analysis included as covariates donor sex, stage at diagnosis (monoclonal B-cell lymphocytosis (MBL), Binet A, Binet B, and Binet C), and IGHV mutational status, while the CM for OS analysis included stage at diagnosis, IGHV mutational status, and age at diagnosis. The Benjamini–Hochberg (BH) method was used to adjust for multiple testing.
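For readers unfamiliar with the BH adjustment, the following self-contained sketch shows how a set of raw p-values is converted into BH-adjusted q-values. The analysis itself was run in R; this pure-Python version and its input p-values are illustrative assumptions, not study data.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four CNA events.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum guarantees that a smaller raw p-value never receives a larger adjusted value than a bigger one.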

RNAseq Analysis and Correlation With CNA Status

Two hundred twenty patients had matched RNAseq data of CLL-purified cells (accession IDs EGAD00001001443 and EGAD00001000258). Illumina adapters were removed using cutadapt (Martin, 2011), and alignment to the human reference genome (GRCh37) was performed using Hisat2 (Kim et al., 2015) with default specifications. We used the Hisat2-provided Hierarchical Graph FM index for GRCh37 with SNP and Ensembl transcript information. BAM files were sorted and indexed using samtools (Li et al., 2009) and processed in R (R Development Core Team, 2011) according to the RNAseq gene expression protocol developed by Love et al. (2015). Briefly, BAM files were read using Rsamtools (Morgan et al., 2017), followed by gene-level expression estimation using the summarizeOverlaps function from the GenomicAlignments package (Lawrence et al., 2013). Gene models in GTF format were downloaded from Ensembl (GRCh37.75 version) (Yates et al., 2016). A log2 transformation of normalized fragments per kilobase counts was performed. Focal CNAs were classified according to GISTIC into low-range events (tumor/normal log2 ratio > 0.3 and < 0.9 for amplifications, and < −0.3 and > −0.9 for deletions) and higher-range events (tumor/normal log2 ratio > 0.9 for amplifications and < −0.9 for deletions). Correlation between CNA status and gene expression was assessed using Spearman’s correlation. A minimum of 5 CNA events with matched transcriptomic data was required for analysis. Immunoglobulin and T-cell receptor gene rearrangements were not included. p-values were adjusted for multiple testing using the BH method.
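The event grading and the rank correlation described above can be sketched as follows. This is an illustration rather than the R code used in the study: the function names and the example data are hypothetical, and the handling of ratios exactly equal to ±0.9 (assigned here to the low-range class) is an assumption, since the text leaves that boundary undefined.

```python
def cna_level(log2_ratio):
    """Grade a focal CNA: +2/-2 higher-range, +1/-1 low-range, 0 no event."""
    if log2_ratio > 0.9:
        return 2
    if log2_ratio > 0.3:
        return 1
    if log2_ratio < -0.9:
        return -2
    if log2_ratio < -0.3:
        return -1
    return 0


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical log2 ratios and log2 expression values for five samples.
levels = [cna_level(r) for r in [0.05, -0.1, -0.5, -0.6, -1.2]]
expression = [8.1, 7.9, 6.5, 6.8, 5.0]
print(levels, spearman(levels, expression))
```

A positive coefficient in this toy example means that deeper deletions go together with lower expression, which is the direction expected for a gene-dosage effect.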

Results

Focal CNA Regions and Their Association With TTT and OS

We identified 54 recurrent focal CNAs in the CLL genome (residual q-value < 0.05, Supplementary Figure 1, Table 2). Among them, there were 31 recurrent amplifications with a median wide peak size of 75.1 kb (Figure 1) and 23 recurrent deletions with a median wide peak size of 405 kb (Figure 2).

Table 2

Unique Name | Cases | Cytoband | Wide Peak Boundaries | Length (bp) | Q value | Residual Q value | Genes in Wide Peak
Amplification Peak 3 | 9 | 1p31.1 | chr1:74500015–74621552 | 1.22E+05 | 4.80E−16 | 1.86E−11 | LRRIQ3, FPGT, TNNI3K
Amplification Peak 4 | 9 | 1p22.2 | chr1:91784576–91813183 | 2.86E+04 | 3.13E−11 | 5.32E−09 | HFM1
Amplification Peak 8 | 11 | 1q25.2 | chr1:176093431–176153873 | 6.04E+04 | 2.73E−13 | 8.89E−06 | RFWD2
Amplification Peak 9 | 3 | 1q42.12 | chr1:225152123–225211696 | 59,573 | 1.71E−03 | 1.71E−03 | DNAH14
Amplification Peak 14 | 15 | 3q25.1 | chr3:149563840–149684409 | 120,569 | 4.78E−04 | 1.92E−03 | PFN2, RNF13
Amplification Peak 15 | 14 | 3q29 | chr3:195053676–195063318 | 9.64E+03 | 1.01E−05 | 1.72E−05 | ACAP2
Amplification Peak 17 | 6 | 4p16.3 | chr4:3449609–3495293 | 45,684 | 9.11E−04 | 9.11E−04 | HGFAC, DOK7
Amplification Peak 18 | 5 | 4p15.2 | chr4:26622159–26641959 | 19,800 | 0.01 | 0.01 | TBC1D19
Amplification Peak 19 | 5 | 4q31.21 | chr4:145658875–146010087 | 351,212 | 5.17E−03 | 5.17E−03 | ANAPC10, HHIP
Amplification Peak 20 | 10 | 5q35.3 | chr5:180052785–180167076 | 1.14E+05 | 1.68E−05 | 1.68E−05 | FLT4, OR2Y1
Amplification Peak 22 | 4 | 6q15 | chr6:89479426–89563571 | 84,145 | 1.47E−04 | 1.47E−04 | RNGTT
Amplification Peak 24 | 70 | 7p14.1 | chr7:38284691–38357086 | 7.24E+04 | 1.62E−82 | 1.62E−82 | TCRG locus
Amplification Peak 28 | 19 | 8q24.13 | chr8:124810460–124812196 | 1,736 | 2.36E−03 | 0.02 | FAM91A1
Amplification Peak 30 | 6 | 9q34.3 | chr9:138905634–139092618 | 186,984 | 0.03 | 0.03 | LHX3, C9orf69, NACC2
Amplification Peak 31 | 8 | 9q34.3 | chr9:140161390–140246050 | 84,660 | 1.71E−03 | 1.71E−03 | COBRA1, C9orf167, EXD3, NRARP
Amplification Peak 32 | 6 | 10q24.33 | chr10:105073962–105140579 | 66,617 | 1.28E−04 | 1.28E−04 | TAF5, PCGF6
Amplification Peak 33 | 26 | 11p13 | chr11:32676388–32705141 | 2.88E+04 | 7.50E−13 | 5.84E−12 | CCDC73, WT1
Amplification Peak 34 | 2 | 11p11.2 | chr11:45935717–45955761 | 20,044 | 0.04 | 0.04 | PEX16, PHF21A, GYLTL1B
Amplification Peak 40 | 11 | 13q13.3 | chr13:35622613–35697715 | 75,102 | 5.17E−03 | 5.17E−03 | NBEA
Amplification Peak 41 | 4 | 13q22.1 | chr13:73530979–73643032 | 112,053 | 0.01 | 0.01 | KLF5, PIBF1
Amplification Peak 42 | 188 | 14q11.2 | chr14:22749319–22925867 | 1.77E+05 | 4.44E−186 | 2.84E−185 | TCRA locus
Amplification Peak 45 | 12 | 14q32.33 | chr14:105059814–105172511 | 1.13E+05 | 2.70E−05 | 4.01E−04 | INF2, TMEM179, MIR4710, AKT1
Amplification Peak 50 | 8 | 16q22.1 | chr16:66819728–66824959 | 5.23E+03 | 5.54E−05 | 5.54E−05 | CCDC79, NAE1
Amplification Peak 51 | 12 | 17q11.2 | chr17:30228618–30321772 | 9.32E+04 | 2.72E−20 | 2.23E−17 | SUZ12, UTP6
Amplification Peak 53 | 6 | 17q25.3 | chr17:79389937–79430022 | 40,085 | 0.02 | 0.02 | hsa-mir-3186, BAHCC1, MIR3186
Amplification Peak 54 | 7 | 18q11.2 | chr18:21229308–21329567 | 100,259 | 0.02 | 0.02 | LAMA3, ANKRD29
Amplification Peak 55 | 15 | 18q22.1 | chr18:66365105–66377388 | 12,283 | 0.03 | 0.03 | TMX3
Amplification Peak 57 | 8 | 19p12 | chr19:20221949–20317969 | 96,020 | 0.02 | 0.03 | ZNF90, ZNF486
Amplification Peak 59 | 3 | 20q13.12 | chr20:42694274–42813092 | 118,818 | 5.94E−03 | 5.94E−03 | JPH2, TOX2
Amplification Peak 62 | 5 | 20q13.33 | chr20:62705116–62831313 | 126,197 | 0.02 | 0.02 | NPBWR2, MYT1, OPRL1, RGS19, C20orf201
Amplification Peak 66 | 4 | 22q13.33 | chr22:50967459–51008050 | 40,591 | 1.39E−03 | 1.39E−03 | CPT1B, TYMP, KLHDC7B, CHKB-CPT1B, ODF3B, SYCE3
Deletion Peak 7 | 193 | 2p11.2 | chr2:88894337–95722218 | 6,827,881 | 0 | 0 | IGK locus
Deletion Peak 8 | 4 | 2q23.3 | chr2:152234651–152663572 | 428,921 | 0.04 | 0.04 | NEB, RIF1
Deletion Peak 13 | 12 | 3q25.33 | chr3:159713418–160118695 | 405,277 | 0.02 | 0.04 | IFT80, C3orf80, IL12A
Deletion Peak 16 | 19 | 4q13.2 | chr4:69344747–69691521 | 3.47E+05 | 7.11E−06 | 3.92E−05 | UGT2B15, UGT2B17
Deletion Peak 17 | 8 | 4q13.3 | chr4:70897589–71020148 | 122,559 | 0.02 | 0.02 | HTN1, CSN1S2AP, CSN1S2BP
Deletion Peak 18 | 9 | 4q21.23 | chr4:84367119–84457735 | 90,616 | 1.71E−03 | 6.90E−03 | MRPS18C, FAM175A
Deletion Peak 21 | 15 | 6q21 | chr6:110797012–111583801 | 786,789 | 1.74E−04 | 1.74E−04 | AMD1, CDK19, RPF2, GTF3C6, SLC16A10, GSTM2P1
Deletion Peak 23 | 12 | 7p22.1 | chr7:5269419–5920366 | 650,947 | 0.02 | 0.02 | hsa-mir-589, ACTB, FSCN1, RNF216, FBXL18, TNRC18, SLC29A4, ZNF815
Deletion Peak 24 | 11 | 7q21.2 | chr7:92147625–92247368 | 9.97E+04 | 3.11E−05 | 9.98E−05 | RBM48, MGC16142, FAM133B, LOC728066, CDK6
Deletion Peak 27 | 11 | 8q22.1 | chr8:95690341–95840099 | 149,758 | 0.02 | 0.02 | DPY19L4, ESRP1, TP53INP1
Deletion Peak 29 | 3 | 9p21.2 | chr9:26116211–27109465 | 993,254 | 1.62E−03 | 1.62E−03 | PLAA, LRRC19, C9orf82, IFT74, TEK
Deletion Peak 32 | 33 | 11q14.3 | chr11:89185111–89443615 | 2.59E+05 | 1.36E−34 | 1.06E−22 | FOLH1B, NOX4
Deletion Peak 34 | 44 | 11q22.3 | chr11:110018375–110449975 | 4.32E+05 | 1.18E−31 | 1.13E−18 | FDX1, RDX, ATM
Deletion Peak 37 | 12 | 12q24.33 | chr12:132335406–132436802 | 101,396 | 1.09E−03 | 1.05E−03 | ULK1, PUS1
Deletion Peak 38 | 200 | 13q14.2 | chr13:50306479–51501752 | 1,195,273 | 0 | 0 | hsa-mir-15a, DLEU2, TRIM13, DLEU1, SPRYD7, ST13P4, DLEU7, CTAGE10P, KCNRG, MIR15A, MIR16-1, MIR3613
Deletion Peak 40 | 41 | 14q21.1 | chr14:39721874–39871082 | 1.49E+05 | 1.56E−18 | 3.83E−10 | CTAGE5, LOC100288846, MIA2, FBXO33
Deletion Peak 42 | 27 | 14q32.33 | chr14:104506663–105201322 | 6.95E+05 | 1.61E−11 | 1.05E−03 | hsa-mir-203, KIF26A, INF2, ASPG, TMEM179, C14orf180, MIR203, MIR3545, MIR4710
Deletion Peak 43 | 33 | 14q32.33 | chr14:105861125–107349540 | 1.49E+06 | 1.89E−13 | 6.44E−08 | CRIP1, CRIP2, ELK2AP, ADAM6, MTA1, KIAA0125, TMEM121, C14orf80, LINC00226, LINC00221, TEX22
Deletion Peak 46 | 10 | 15q15.1 | chr15:42138836–42192730 | 53,894 | 3.47E−03 | 0.01 | hsa-mir-4310, SPTBN5, MIR4310
Deletion Peak 49 | 2 | 16q12.2 | chr16:53968071–55360473 | 1,392,402 | 0.01 | 0.01 | IRX5, IRX3, CRNDE
Deletion Peak 54 | 8 | 18q21.1 | chr18:43604458–43796648 | 192,190 | 0.04 | 0.04 | ATP5A1, HAUS1, PSTPIP2
Deletion Peak 55 | 10 | 18q23 | chr18:77246915–77733812 | 486,897 | 2.64E−04 | 2.84E−04 | CTDP1, KCNG2, PQLC1, HSBP1L1, NFATC1
Deletion Peak 66 | 125 | 22q11.22 | chr22:23162356–23404171 | 2.42E+05 | 2.71E−277 | 5.01E−277 | IGL locus

Recurrent focal amplifications and deletions in the chronic lymphocytic leukemia (CLL) genome, including their frequency, wide peak region, length, q-value, residual q-value, and involved genes.
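As a worked check of the table, the length column can be recomputed from the wide peak boundaries (length = end − start), and the median wide peak size of the deletions follows directly. The sketch below uses the 23 deletion boundaries listed in Table 2; the helper name `peak_length` is hypothetical.

```python
import re
from statistics import median

# Wide peak boundaries of the 23 recurrent deletions, copied from Table 2.
DELETION_PEAKS = [
    "chr2:88894337-95722218", "chr2:152234651-152663572",
    "chr3:159713418-160118695", "chr4:69344747-69691521",
    "chr4:70897589-71020148", "chr4:84367119-84457735",
    "chr6:110797012-111583801", "chr7:5269419-5920366",
    "chr7:92147625-92247368", "chr8:95690341-95840099",
    "chr9:26116211-27109465", "chr11:89185111-89443615",
    "chr11:110018375-110449975", "chr12:132335406-132436802",
    "chr13:50306479-51501752", "chr14:39721874-39871082",
    "chr14:104506663-105201322", "chr14:105861125-107349540",
    "chr15:42138836-42192730", "chr16:53968071-55360473",
    "chr18:43604458-43796648", "chr18:77246915-77733812",
    "chr22:23162356-23404171",
]


def peak_length(boundaries):
    """Length in bp of a 'chrN:start-end' interval (hyphen or en dash)."""
    match = re.fullmatch(r"chr\w+:(\d+)[-\u2013](\d+)", boundaries)
    if match is None:
        raise ValueError(f"unrecognized interval: {boundaries!r}")
    start, end = int(match.group(1)), int(match.group(2))
    return end - start  # same convention as the Length column


lengths = [peak_length(peak) for peak in DELETION_PEAKS]
median_deletion_length = median(lengths)
print(median_deletion_length)
```

The median obtained this way corresponds to the 3q25.33 peak, in line with the median deletion size of about 405 kb reported in the text.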

Figure 1

Figure 2

The most frequently amplified regions were found in 11p13 (adjacent to the WT1 locus), 8q24.13 (FAM91A1), 3q25.1 (PFN2 and RNF13), 18q22.1 (TMX3), 3q29 (ACAP2), 17q11.2 (SUZ12), 14q32.33 (adjacent to the AKT1 locus), 13q13.3 (NBEA/BCL8B), 1q25.2 (RFWD2), 5q35.3 (FLT4), 19p13.3 (adjacent to the APC2 locus), 1p22.2 (HFM1), and 1p31.1. Similarly, the most recurrent focal deletions were detected in 13q14.2 (DLEU1 locus), 11q22.3 (ATM locus), 14q21.1 (MIA2), 11q14.3 (NOX4), 14q32.33 (IGH locus), and 4q13.2 (UGT2B15 and UGT2B17 loci). Moreover, we observed frequent deletions of immunoglobulin loci and recurrent amplified regions at TCR genes, likely reflecting deletions present in the T lymphocytes within control samples.

Three focal deletions were associated with TTT (BH q-value < 0.05): 11q22.3 (ATM locus), 14q32.33 (IGH locus), and 7q21.2 (CDK6 locus). Loss of 11q22.3 was also associated with shorter OS. No event remained associated with TTT or OS after adjusting for IGHV status, sex, and disease stage (BH q-value < 0.05). No recurrent gain was associated with TTT or OS. The association results are provided in Supplementary Table 1.

Furthermore, SETD2 deletions (nine cases) and IRF4 gains (five cases) were nearly significant (GISTIC residual q-values of 0.08 and 0.06, respectively) and associated with clinical evolution. SETD2 deletion was associated with shorter treatment-free survival (p-value 1.3 × 10⁻⁸) independent of IGHV status, sex, and disease stage (p-value 2.3 × 10⁻³); and it was also associated with shorter survival (p-value 9.9 × 10⁻³) but not independently of IGHV status (p-value 0.24). Amplifications in IRF4 were associated with shorter survival (p-value 6.4 × 10⁻⁵) in a partially IGHV-independent manner (p-value 0.025). Nevertheless, this finding must be interpreted with caution due to the position of IRF4 near the telomeric end of the short arm of chromosome 6.

Correlation of Focal CNAs With Gene Expression

We detected significant correlations between some recurrent CNAs and the expression of the genes encoded in their loci (Supplementary Table 2). As expected, deletions in 11q22.3 and 13q14.2 were correlated with the expression of their respective genes (ATM, FDX1, and RDX in the first case, and DLEU1, DLEU2, and KCNRG in the second case). Deletions in 6q21 were correlated with lower expression of CDK19, as were those in 14q21.1 with the expression of MIA2. Furthermore, we detected significant correlations between amplifications in 3q25.1 and 3q29 and expression of their target genes PFN2 and ACAP2, respectively. Surprisingly, an inverse correlation was observed between 11q14.3 deletions and FOLH1B expression.

Similarly, we detected significant correlations between 19 of these CNAs and the expression of 926 protein-coding genes (q-value < 0.01; Supplementary Table 3). The CNAs with the most correlated genes were deletions in 13q14.2 (389 genes), 12q24.33 (ULK1 locus, 170 genes), 6q21 (CDK19 locus, 79 genes), 11q22.3 (ATM locus, 39 genes), and 3q25.33 (IL12A locus, 26 genes), as well as amplifications in 18q22.1 involving TMX3 (135 genes).

Nonetheless, this study is probably underpowered to detect CNA–transcript correlations because only about 50% of the exome-seq samples (220 of 441) had matched RNAseq data from purified CLL cells.

Broad CNA Regions and Their Association With Clinical Evolution

Recurrent broad amplifications and deletions were detected in the CLL genome (Table 3, Supplementary Figure 1). Among the amplifications, the most frequent were found in chromosome 12 (60 cases, 13.6% of patients), 2p (10 patients), 8q (6 patients), 18q (6 patients), 18p (5 patients), and 3q (5 patients). Similarly, the most recurrent large deletions were detected in 17p (8 patients), 18p (7 patients), and 8p (6 patients).

Table 3

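Arm | Genes | Amp frequency | Amp frequency score | Amp z-score | Amp q-value | Del frequency | Del frequency score | Del z-score | Del q-value
1p | 2,121 | 0 | 0 | −0.613 | 0.92 | 0 | 0 | −0.613 | 0.93
1q | 1,955 | 0 | 0 | −0.729 | 0.92 | 0 | 0 | −0.729 | 0.93
2p | 924 | 0.02 | 0.02 | 6.99 | 1.81E−11 | 0 | 0 | 0.446 | 0.93
2q | 1,556 | 0 | 0 | 1.16 | 0.48 | 0 | 0 | 0.106 | 0.93
3p | 1,062 | 0 | 0 | 0.543 | 0.83 | 0 | 0 | −1.17 | 0.93
3q | 1,139 | 0.01 | 0.01 | 3.26 | 3.59E−03 | 0 | 0 | −1.13 | 0.93
4p | 489 | 0 | 0 | −1.38 | 0.92 | 0 | 0 | 0.0708 | 0.93
4q | 1,049 | 0 | 0 | 0.53 | 0.83 | 0 | 0 | −1.17 | 0.93
5p | 270 | 0 | 0 | −0.0742 | 0.92 | 0 | 0 | −1.45 | 0.93
5q | 1,427 | 0 | 0 | 0.965 | 0.59 | 0 | 0 | −1.01 | 0.93
6p | 1,173 | 0 | 0 | −1.13 | 0.92 | 0 | 0 | −1.13 | 0.93
6q | 839 | 0 | 0 | −1.25 | 0.92 | 0.01 | 0.01 | 1.14 | 0.67
7p | 641 | 0 | 0 | −1.33 | 0.92 | 0 | 0 | −0.574 | 0.93
7q | 1,277 | 0 | 0 | −0.151 | 0.92 | 0 | 0 | −0.151 | 0.93
8p | 580 | 0 | 0 | 0.155 | 0.92 | 0.01 | 0.01 | 3.12 | 0.01
8q | 859 | 0.01 | 0.01 | 3.58 | 1.36E−03 | 0 | 0 | −0.433 | 0.93
9p | 422 | 0 | 0 | −1.4 | 0.92 | 0.01 | 0.01 | 0.74 | 0.93
9q | 1,113 | 0 | 0 | −1.15 | 0.92 | 0 | 0 | 0.594 | 0.93
10p | 409 | 0 | 0 | −1.41 | 0.92 | 0 | 0 | −1.41 | 0.93
10q | 1,268 | 0 | 0 | −1.08 | 0.92 | 0 | 0 | −1.08 | 0.93
11p | 862 | 0 | 0 | −0.442 | 0.92 | 0 | 0 | −0.442 | 0.93
11q | 1,515 | 0 | 0 | −0.969 | 0.92 | 0 | 0 | 1.09 | 0.67
12p | 575 | 0.14 | 0.14 | 43.2 | 0 | 0 | 0 | −1.26 | 0.93
12q | 1,447 | 0.14 | 0.14 | 58.9 | 0 | 0 | 0 | −0.933 | 0.93
13q | 654 | 0 | 0 | −1.32 | 0.92 | 0 | 0 | −1.32 | 0.93
14q | 1,341 | 0 | 0 | −1.05 | 0.92 | 0 | 0 | −0.1 | 0.93
15q | 1,355 | 0 | 0 | −0.0881 | 0.92 | 0 | 0 | −1.05 | 0.93
16p | 872 | 0 | 0 | −1.24 | 0.92 | 0 | 0 | −0.439 | 0.93
16q | 702 | 0 | 0 | −1.31 | 0.92 | 0 | 0 | −0.54 | 0.93
17p | 683 | 0 | 0 | −1.3 | 0.92 | 0.02 | 0.02 | 4.8 | 3.16E−05
17q | 1,592 | 0 | 0 | 0.144 | 0.92 | 0 | 0 | 1.22 | 0.67
18p | 143 | 0.01 | 0.01 | 1.9 | 0.12 | 0.02 | 0.02 | 3.24 | 0.01
18q | 446 | 0.01 | 0.01 | 2.92 | 9.84E−03 | 0 | 0 | −1.39 | 0.93
19p | 995 | 0.01 | 0.01 | 2.16 | 0.07 | 0 | 0 | 0.492 | 0.93
19q | 1,709 | 0.01 | 0.01 | 3.73 | 9.33E−04 | 0 | 0 | 1.44 | 0.64
20p | 355 | 0 | 0 | −0.714 | 0.92 | 0.01 | 0.01 | 1.39 | 0.64
20q | 753 | 0 | 0 | −0.51 | 0.92 | 0 | 0 | −1.29 | 0.93
21q | 509 | 0.01 | 0.01 | 0.823 | 0.67 | 0 | 0 | 0.0945 | 0.93
22q | 921 | 0 | 0 | −1.22 | 0.92 | 0 | 0 | 0.411 | 0.93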

Significantly enriched broad cytogenetic aberrations in the chronic lymphocytic leukemia (CLL) genome.

Amplifications of chromosome 12, 2p, and 8q, as well as deletion of 17p, were significantly associated with shorter TTT (Supplementary Tables 4 and 5). Furthermore, deletion of 17p was significantly associated with TTT independently of IGHV status, sex, and Binet staging. Amplifications of chromosome 12 and 8q were associated with shorter OS, but no event was significant after adjusting for IGHV status, age, and Binet staging. Nonetheless, we detected a significant difference in OS and TTT between IGHV-mutated cases with and without trisomy 12 (Figures 3A, B, respectively).

Figure 3

Regions of CNN-LOH

Control-FREEC identified 63 regions of CNN-LOH with a genotype uncertainty below 5% (Supplementary Table 6). Ten events were located at 1p, six of which affected ARID1A. According to the mutation data published by Puente et al. (2015), none of these samples bore concurrent non-synonymous mutations in ARID1A. Three other events were detected at 1q, with a minimally affected region on 1q21.3, which holds likely driver genes such as PI4KB and IL6R/CD126. Four CNN-LOH events with a minimally involved region in 9q34.13–q34.3 involved the NOTCH1 gene. One of these cases also had a frameshift deletion in NOTCH1. Three events at 11p15.5–15.4 affected the imprinted locus of IGF2 and CDKN1C. Eight CNN-LOH events affected the 11p11.2–q13.2 region. Two different samples had events at 11q involving the ATM locus, both of which also had non-synonymous ATM mutations. Three events were located in 16p13.3–p13.11, which involves the CREBBP gene. None of these had mutations in CREBBP. Ten CNN-LOH events affected chromosome 17, three of which included the TP53 locus. Among the latter, two had non-synonymous mutations in TP53. Four samples had CNN-LOH events at chromosome 20, all of which affected the ASXL1 gene but without any concurrent mutation in it. Notably, only two of the CNN-LOH events overlapped with those reported by Puente et al. (2015) using SNP array technology.

Discussion

The detection of new cytogenetic aberrations using targeted sequencing takes advantage of the increased sensitivity of this technique to detect small events that would otherwise be difficult to identify using array-based techniques. CLL is characterized by large-scale cytogenetic alterations (Döhner et al., 2000), but focal rearrangements have been studied to a lesser extent. Using sequencing data originally produced by Puente et al. (2015), we report here the existence of 54 putatively recurrent focal CNAs in the CLL genome.

Recurrent focal amplifications were shorter than deletions, mostly involving one or two genes. The most significantly enriched focal gains affected the loci of SUZ12, WT1, HFM1, RFWD2, FLT4, and TNNI3K. In contrast, focal deletions were wider and tended to span more than two genes. As expected, among the most significant deletions were those in 13q14.2 and in 11q22.3. Nevertheless, other highly significant regions involved the loci of genes such as NOX4, a component of the NADPH oxidase complex (Guo and Chen, 2015), and MIA2, a tumor suppressor gene (Hellerbrand et al., 2008). Deletions in SETD2, 11q22.3, and 14q32.33 were associated with shorter time to first treatment, and gains of IRF4 were independently associated with shorter survival. Furthermore, we could detect significant positive correlations between six recurrent CNAs and the expression of genes encoded in their respective loci, as well as correlations between 19 CNAs and the expression of 926 protein-coding genes genome-wide.

Focal recurrent gains and losses tended to target genes that participate in oncogenic pathways. For example, five amplified genes (HFM1, ANAPC10, TAF5, COBRA1, and SYCE3) and one deleted gene (FAM175A/ABRAXAS) are involved in DNA transcription, replication, and repair mechanisms. Both COBRA1 and FAM175A physically interact with the tumor suppressor BRCA1 (Castillo et al., 2014; Yun et al., 2018), whereas ANAPC10 belongs to the anaphase-promoting complex/cyclosome family of proteins that control sister chromatid segregation and cytokinesis (Chang et al., 2014). The amplified genes PHF21A, PCGF6, and SUZ12 encode epigenetic regulators with repressor activity (Iwase et al., 2006; Vizán et al., 2015; Zhao et al., 2017). Other genes targeted by recurrent events participate in important pathways. This is the case of the amplified oncogenes AKT1 (Hyman et al., 2017) and WT1 (Bergmann et al., 1997), and of the receptor tyrosine kinase gene FLT4, which regulates lymphangiogenesis and tumor metastasis to lymphatic vessels (Lee et al., 2016). Likewise, the deleted genes AMD1, TP53INP1, ULK1, and NFATC1 are also important in tumorigenesis. AMD1 and TP53INP1 participate in metabolic pathways, and both have tumor suppressor activity (Scuoppo et al., 2012; Saadi et al., 2015), whereas ULK1 plays a decisive role in autophagy initiation (Zachari and Ganley, 2017) and NFATC1 maintains an anergic phenotype in CLL cells (Märklin et al., 2017). Finally, two cyclin-dependent kinase genes were recurrently deleted: CDK6 and CDK19. Deletions in CDK6 were associated with shorter time to first treatment, whereas those in CDK19 were correlated with a reduced expression of the gene. CDK19 is a component of the mediator kinase module, which associates with the mediator complex in order to regulate diverse cellular functions (Dannappel et al., 2019), and CDK6 is a promoter of cell-cycle progression (Kollmann and Sexl, 2013). The role of both genes in the pathogenesis of CLL needs further clarification.

Finally, recurrent broad cytogenetic aberrations characteristic of CLL were identified at the expected frequency, as in the case of trisomy 12, 17p deletion, amplification of 8q, and loss of 8p (Blanco et al., 2016). Interestingly, we detected a significant adverse effect of trisomy 12 on TTT and OS among IGHV-mutated cases. The importance of IGHV mutation as a predictor of disease evolution within the group of patients with trisomy 12 was previously reported by others (Bulian et al., 2017; Roos-Weil et al., 2018), but to our knowledge, this is the first report of the prognostic importance of trisomy 12 among IGHV-mutated cases. Furthermore, new CNN-LOH events affecting CLL drivers were also detected, most of which had not been described in previous array-based analyses of the CLL genome (Novak et al., 2002; Pfeifer et al., 2007). These CNN-LOH events affected the ATM, NOTCH1, TP53, ARID1A, ASXL1, CREBBP, and PI4KB/IL6R loci, as well as the telomeric region of 11p. Only some of these events had concurrent mutations in their corresponding driver genes, suggesting the existence of other mechanisms of pathogenicity.

Detection of copy number changes based on exome sequencing has been shown by some studies to be prone to false positives (Rieber et al., 2017). In this analysis, we included patient-matched control samples, applied stringent thresholds in order to minimize false detections, and recapitulated most of the cytogenetic findings of CLL at the expected frequency. Furthermore, we could correlate the presence of these structural aberrations with changes in gene expression in a subgroup of patients, although we believe that this study may be underpowered to detect such associations.

In conclusion, our study presents proof-of-concept evidence for the existence of new focal recurrent CNAs and CNN-LOH in the genome of CLL, some of which influence clinical outcome. Furthermore, we observed that some of these novel events have significant correlations with gene expression changes. The results are concordant with the possible involvement of a set of oncogenes and tumor suppressors in the development of CLL. The existence and functionality of these events should be validated in future studies.

Funding

This research has been performed without funding. The publication costs associated with this manuscript have been partially paid by Roche Pharmaceuticals. The funder played no role in the study design, data collection, analysis, results interpretation, writing or in the decision to submit this paper for publication.

Statements

Author contributions

AM, BA, and JB designed the study. AM performed the research and analyzed the data. AM, BA, JD, MG and JB analyzed the results and wrote the paper.

Acknowledgments

The authors gratefully thank CESGA (Supercomputing Center of Galicia) for providing the necessary resources for the development of this project, as well as the International Cancer Genome Consortium and the European Bioinformatics Institute for supplying data access.

The content of this paper is part of the doctoral thesis of Adrián Mosquera Orgueira to obtain a PhD at the Department of Medicine, University of Santiago de Compostela.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00854/full#supplementary-material

Supplementary Figure 1

Heatmap representation of genomic profiles made using segmented copy number data of the CLL genomes.

Supplementary Table 1

Association of focal amplifications and deletions with TTT and OS. Only those events affecting at least 5 patients of this cohort were selected for the analysis.

Supplementary Table 2

Significant correlations detected between CNA events and expression of genes encoded in their respective loci.

Supplementary Table 3

Significant correlations detected between CNA events and any protein-coding genes genome-wide (q-value <0.01).

Supplementary Table 4

Association of large-scale deletions with TTT and OS. Only those events affecting at least 5 patients of this cohort were selected for the analysis.

Supplementary Table 5

Association of large-scale amplifications with TTT and OS. Only those events affecting at least 5 patients of this cohort were selected for the analysis.

Supplementary Table 6

Summary of the detected LOH events, along with candidate gene annotation and mutation status of each gene in the data published by Puente et al.

References

  • 1

    Bergmann, L., Miething, C., Maurer, U., Brieger, J., Karakas, T., Weidmann, E., et al. (1997). High levels of Wilms’ tumor gene (wt1) mRNA in acute myeloid leukemias are associated with a worse long-term outcome. Blood 90, 1217–1225.

  • 2

    Beroukhim, R., Getz, G., Nghiemphu, L., Barretina, J., Hsueh, T., Linhart, D., et al. (2007). Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc. Natl. Acad. Sci. U. S. A. 104 (50), 20007–20012. doi: 10.1073/pnas.0710052104

  • 3

    Blanco, G., Puiggros, A., Baliakas, P., Athanasiadou, A., García-Malo, M., Collado, R., et al. (2016). Karyotypic complexity rather than chromosome 8 abnormalities aggravates the outcome of chronic lymphocytic leukemia patients with TP53 aberrations. Oncotarget 7 (49), 80916–80924. doi: 10.18632/oncotarget.13106

  • 4

    Boeva, V., Popova, T., Bleakley, K., Chiche, P., Cappo, J., Schleiermacher, G., et al. (2012). Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 28, 423–425. doi: 10.1093/bioinformatics/btr670

  • 5

    Breese, M. R., Liu, Y. (2013). NGSUtils: a software suite for analyzing and manipulating next-generation sequencing datasets. Bioinformatics 29, 494–496. doi: 10.1093/bioinformatics/bts731

  • 6

    Bulian, P., Bomben, R., Bo, M. D., Zucchetto, A., Rossi, F. M., Degan, M., et al. (2017). Mutational status of IGHV is the most reliable prognostic marker in trisomy 12 chronic lymphocytic leukemia. Haematologica 102 (11), e443–e446. doi: 10.3324/haematol.2017.170340

  • 7

    Castillo, A., Paul, A., Sun, B., Huang, T. H., Wang, Y., Yazinski, S. A., et al. (2014). The BRCA1-interacting protein Abraxas is required for genomic stability and tumor suppression. Cell Rep. 8 (3), 807–817. doi: 10.1016/j.celrep.2014.06.050

  • 8

    Chang, L. F., Zhang, Z., Yang, J., McLaughlin, S. H., Barford, D. (2014). Molecular architecture and mechanism of the anaphase-promoting complex. Nature 513 (7518), 388–393. doi: 10.1038/nature13543

  • 9

    Dannappel, M. V., Sooraj, D., Loh, J. J., Firestein, R. (2019). Molecular and in vivo functions of the CDK8 and CDK19 kinase modules. Front. Cell Dev. Biol. 6, 171. doi: 10.3389/fcell.2018.00171

  • 10

    Döhner, H., Stilgenbauer, S., Benner, A., Leupolt, E., Kröber, A., Bullinger, L., et al. (2000). Genomic aberrations and survival in chronic lymphocytic leukemia. N. Engl. J. Med. 343, 1910–1916. doi: 10.1056/NEJM200012283432602

  • 11

    Edelmann, J., Tausch, E., Landau, D. A., Robrecht, S., Bahlo, J., Fischer, K., et al. (2017). Frequent evolution of copy number alterations in CLL following first-line treatment with FC(R) is enriched with TP53 alterations: results from the CLL8 trial. Leukemia 31, 734–738. doi: 10.1038/leu.2016.317

  • 12

    Gruber, M., Bozic, I., Leshchiner, I., Livitz, D., Stevenson, K., Rassenti, L., et al. (2019). Growth dynamics in naturally progressing chronic lymphocytic leukaemia. Nature 570 (7762), 474–479. doi: 10.1038/s41586-019-1252-x

  • 13

    Guo, S., Chen, X. (2015). The human Nox4: gene, structure, physiological function and pathological significance. J. Drug Target. 23 (10), 888–896. doi: 10.3109/1061186X.2015.1036276

  • 14

    Hellerbrand, C., Amann, T., Schlegel, J., Wild, P., Bataille, F., Spruss, T., et al. (2008). The novel gene MIA2 acts as a tumour suppressor in hepatocellular carcinoma. Gut 57 (2), 243–251. doi: 10.1136/gut.2007.129544

  • 15

    Hernández-Sánchez, M., Rodríguez-Vicente, A. E., González-Gascón y Marín, I., Quijada-Álamo, M., Hernández-Sánchez, J. M., Martín-Izquierdo, M., et al. (2019). DNA damage response-related alterations define the genetic background of patients with chronic lymphocytic leukemia and chromosomal gains. Exp. Hematol. 72, 9–13. doi: 10.1016/j.exphem.2019.02.003

  • 16

    Hyman, D. M., Smyth, L. M., Donoghue, M. T. A., Westin, S. N., Bedard, P. L., Dean, E. J., et al. (2017). AKT inhibition in solid tumors with AKT1 mutations. J. Clin. Oncol. 35 (20), 2251–2259. doi: 10.1200/JCO.2017.73.0143

  • 17

    Iwase, S., Shono, N., Honda, A., Nakanishi, T., Kashiwabara, S., Takahashi, S., et al. (2006). A component of BRAF–HDAC complex, BHC80, is required for neonatal survival in mice. FEBS Lett. 580 (13), 3129–3135. doi: 10.1016/j.febslet.2006.04.065

  • 18

    Kim, D., Langmead, B., Salzberg, S. L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12 (4), 357–360. doi: 10.1038/nmeth.3317

  • 19

    Kim, H. Y., Choi, J. W., Lee, J. Y., Kong, G. (2017). Gene-based comparative analysis of tools for estimating copy number alterations using whole-exome sequencing data. Oncotarget 8 (16), 27277–27285. doi: 10.18632/oncotarget.15932

  • 20

    Kollmann, K., Sexl, V. (2013). CDK6 and p16INK4A in lymphoid malignancies. Oncotarget 4 (11), 1858–1859. doi: 10.18632/oncotarget.1541

  • 21

    Landau, D. A., Tausch, E., Taylor-Weiner, A. N., Stewart, C., Reiter, J. G., Bahlo, J., et al. (2015). Mutations driving CLL and their evolution in progression and relapse. Nature 526, 525–530. doi: 10.1038/nature15395

  • 22

    Lawrence, M., Huber, W., Pagès, H., Aboyoun, P., Carlson, M., Gentleman, R., et al. (2013). Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9 (8), e1003118. doi: 10.1371/journal.pcbi.1003118

  • 23

    Lee, J. Y., Hong, S. H., Shin, M., Heo, H. R., Jang, I. H. (2016). Blockade of FLT4 suppresses metastasis of melanoma cells by impaired lymphatic vessels. Biochem. Biophys. Res. Commun. 478 (2), 733–738. doi: 10.1016/j.bbrc.2016.08.017

  • 24

    Leeksma, A. C., Taylor, J., Wu, B., Gardner, J. R., He, J., Nahas, M., et al. (2019). Clonal diversity predicts adverse outcome in chronic lymphocytic leukemia. Leukemia 33 (2), 390–402. doi: 10.1038/s41375-018-0215-9

  • 25

    Li, H., Durbin, R. (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760. doi: 10.1093/bioinformatics/btp324

  • 26

    Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., et al., 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25 (16), 2078–2079.

  • 27

    Ljungström, V., Cortese, D., Young, E., Pandzic, T., Mansouri, L., Plevova, K., et al. (2016). Whole-exome sequencing in relapsing chronic lymphocytic leukemia: clinical impact of recurrent RPS15 mutations. Blood 127 (8), 1007–1016. doi: 10.1182/blood-2015-10-674572

  • 28

    Love, M. I., Anders, S., Kim, V., Huber, W. (2015). RNA-seq workflow: gene-level exploratory analysis and differential expression. F1000Res. 4, 1070. doi: 10.12688/f1000research.7035.2

  • 29

    Märklin, M., Heitmann, J. S., Fuchs, A. R., Truckenmüller, F. M., Gutknecht, M., Bugl, S., et al. (2017). NFAT2 is a critical regulator of the anergic phenotype in chronic lymphocytic leukaemia. Nat. Commun. 8 (1), 755. doi: 10.1038/s41467-017-00830-y

  • 30

    Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17 (1), 10–12. doi: 10.14806/ej.17.1.200

  • 31

    Morgan, M., Pagès, H., Obenchain, V., Hayden, N. (2017). Rsamtools: Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import. R package version 1.30.0, http://bioconductor.org/packages/release/bioc/html/Rsamtools.html.

  • 32

    Nadeu, F., Clot, G., Delgado, J., Martín-García, D., Baumann, T., Salaverria, I., et al. (2018). Clinical impact of the subclonal architecture and mutational complexity in chronic lymphocytic leukemia. Leukemia 32 (3), 645–653. doi: 10.1038/leu.2017.291

  • 33

    Nam, J. Y., Kim, N. K., Kim, S. C., Joung, J. G., Xi, R., Lee, S., et al. (2016). Evaluation of somatic copy number estimation tools for whole-exome sequencing data. Brief. Bioinformatics 17, 185–192. doi: 10.1093/bib/bbv055

  • 34

    Novak, U., Oppliger Leibundgut, E., Hager, J., Mühlematter, D., Jotterand, M., Besse, C., et al. (2002). A high-resolution allelotype of B-cell chronic lymphocytic leukemia (B-CLL). Blood 100, 1787–1794.

  • 35

    Pfeifer, D., Pantic, M., Skatulla, I., Rawluk, J., Kreutz, C., Martens, U. M., et al. (2007). Genome-wide analysis of DNA copy number changes and LOH in CLL using high-density SNP arrays. Blood 109, 1202–1210. doi: 10.1182/blood-2006-07-034256

  • 36

    Puente, X. S., Beà, S., Valdés-Mas, R., Villamor, N., Gutiérrez-Abril, J., Martín-Subero, J. I., et al. (2015). Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 526, 519–524. doi: 10.1038/nature14666

  • 37

    R Development Core Team (2011). R: A Language and Environment for Statistical Computing. Vienna, Austria: The R Foundation for Statistical Computing. ISBN: 3-900051-07-0. Available online at http://www.R-project.org/.

  • 38

    Ramsay, A. J., Martínez-Trillos, A., Jares, P., Rodríguez, D., Kwarciak, A., Quesada, V. (2013). Next-generation sequencing reveals the secrets of the chronic lymphocytic leukemia genome. Clin. Transl. Oncol. 15, 3–8. doi: 10.1007/s12094-012-0922-z

  • 39

    Raponi, S., Del Giudice, I., Marinelli, M., Wang, J., Cafforio, L., Ilari, C., et al. (2018). Genetic landscape of ultra-stable chronic lymphocytic leukemia patients. Ann. Oncol. 29 (4), 966–972. doi: 10.1093/annonc/mdy021

  • 40

    Rieber, N., Bohnert, R., Ziehm, U., Jansen, G. (2017). Reliability of algorithmic somatic copy number alteration detection from targeted capture data. Bioinformatics 33 (18), 2791–2798. doi: 10.1093/bioinformatics/btx284

  • 41

    Roos-Weil, D., Nguyen-Khac, F., Chevret, S., Touzeau, C., Roux, C., Lejeune, J., et al. (2018). Mutational and cytogenetic analyses of 188 CLL patients with trisomy 12: a retrospective study from the French Innovative Leukemia Organization (FILO) working group. Genes Chromosomes Cancer 57 (11), 533–540. doi: 10.1002/gcc.22650

  • 42

    Saadi, H., Seillier, M., Carrier, A. (2015). The stress protein TP53INP1 plays a tumor suppressive role by regulating metabolic homeostasis. Biochimie 118, 44–50. doi: 10.1016/j.biochi.2015.07.024

  • 43

    Scuoppo, C., Miething, C., Lindqvist, L., Reyes, J., Ruse, C., Appelmann, I., et al. (2012). A tumour suppressor network relying on the polyamine-hypusine axis. Nature 487 (7406), 244–248. doi: 10.1038/nature11126

  • 44

    Therneau, T. M. (2015). A Package for Survival Analysis in S. version 2.38, https://CRAN.R-project.org/package=survival.

  • 45

    Therneau, T. M., Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model. New York: Springer. ISBN: 0-387-98784-3. doi: 10.1007/978-1-4757-3294-8

  • 46

    Vizán, P., Beringer, M., Ballaré, C., Di Croce, L. (2015). Role of PRC2-associated factors in stem cells and disease. FEBS J. 282 (9), 1723–1735. doi: 10.1111/febs.13083

  • 47

    Yates, A., Akanni, W., Amode, M. R., Barrell, D., Billis, K., Carvalho-Silva, D., et al. (2016). Ensembl 2016. Nucleic Acids Res. 44 (D1), D710–6. doi: 10.1093/nar/gkv1157

  • 48

    Yu, L., Kim, H. T., Kasar, S., Benien, P., Du, W., Hoang, K., et al. (2017). Survival of Del17p CLL depends on genomic complexity and somatic mutation. Clin. Cancer Res. 23, 735–745. doi: 10.1158/1078-0432.CCR-16-0594

  • 49

    Yun, H., Bedolla, R., Horning, A., Li, R., Chiang, H. C., Huang, T. H., et al. (2018). BRCA1 interacting protein COBRA1 facilitates adaptation to castrate-resistant growth conditions. Int. J. Mol. Sci. 19 (7), E2104. doi: 10.3390/ijms19072104

  • 50

    Zachari, M., Ganley, I. G. (2017). The mammalian ULK1 complex and autophagy initiation. Essays Biochem. 61 (6), 585–596. doi: 10.1042/EBC20170021

  • 51

    Zare, F., Dow, M., Monteleone, N., Hosny, A., Nabavi, S. (2017). An evaluation of copy number variation detection tools for cancer using whole exome sequencing data. BMC Bioinformatics 18 (1), 286. doi: 10.1186/s12859-017-1705-x

  • 52

    Zhao, W., Tong, H., Huang, Y., Yan, Y., Teng, H., Xia, Y., et al. (2017). Essential role for Polycomb group protein Pcgf6 in embryonic stem cell maintenance and a noncanonical Polycomb repressive complex 1 (PRC1) integrity. J. Biol. Chem. 292 (7), 2773–2784. doi: 10.1074/jbc.M116.763961

Summary

Keywords

copy number aberration, chronic lymphocytic leukemia, driver, time to treatment, overall survival

Citation

Mosquera Orgueira A, Antelo Rodríguez B, Díaz Arias JÁ, González Pérez MS and Bello López JL (2019) New Recurrent Structural Aberrations in the Genome of Chronic Lymphocytic Leukemia Based on Exome-Sequencing Data. Front. Genet. 10:854. doi: 10.3389/fgene.2019.00854

Received

15 January 2019

Accepted

16 August 2019

Published

20 September 2019

Volume

10 - 2019

Edited by

Shaochun Bai, GeneDx, United States

Reviewed by

Armand Valsesia, Nestle Institute of Health Sciences (NIHS), Switzerland; Natasa Djordjevic, University of Kragujevac, Serbia

Copyright

*Correspondence: Adrián Mosquera Orgueira,

This article was submitted to Genomic Medicine, a section of the journal Frontiers in Genetics

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
