Evolution of Advanced Chronic Lymphoid Leukemia Unveiled by Single-Cell Transcriptomics: A Case Report

Genetic and transcriptional heterogeneity of Chronic lymphocytic leukaemia (CLL) limits prevention of disease progression. Longitudinal single-cell transcriptomics represents the state-of-the-art method to profile the disease heterogeneity at diagnosis and to inform about disease evolution. Here, we apply single-cell RNA-seq to a CLL case, sampled at diagnosis and relapse, that was treated with FCR (Fludarabine, Cyclophosphamide, Rituximab) and underwent a dramatic decrease in CD19 expression during disease progression. Computational analyses revealed a major switch in clones’ dominance during treatment. The clone that expanded at relapse showed 17p and 3p chromosomal deletions, and up-regulation of pathways related to motility, cytokine signaling and antigen presentation. Single-cell RNA-seq uniquely revealed that this clone was already present at low frequency at diagnosis, and it displays feature of plasma cell differentiation, consistent with a more aggressive phenotype. This study shows the benefit of single-cell profiling of CLL heterogeneity at diagnosis, to identify clones that might otherwise not be recognized and to determine the best treatment options.

followed by washing with 2 ml PBS (local source) for 5 min at 350g. In case of peripheral blood, red blood lysis with 1x BD FACS™ lysing solution (2ml, 10min; BD) was performed before washing step. The cell pellets were resuspended in 0.5 ml of PBS and immediately measured on a BD FACSCanto™ II (BD Bioscience) or NAVIOS (Beckman Coulter) cytometer. The analysis was performed using the FlowJo Software (v10.6.2; BD).

Cell hashing
Cell hashing was carried out prior to pooling and running on the 10x Chromium. Viability was found to be > 90% for both samples. One million cells from each sample were separately suspended in 100 µl cell staining buffer (BioLegend). The cells were blocked by incubating the samples at 4 °C for 10  TGATGGCCTATTGGG) for diagnosis and relapse, respectively. Cells were then washed with 1 mL of cell staining buffer (Biolegend) three times, resuspended in PBS at a concentration of 1000 cells/ µl and filtered through 40 µm strainers to ensure single cell suspension. Cell concentration and viability were then verified using the LUNA™ cell counter (Logos Biosytems). The two samples were pooled at a 50:50 ratio and loaded onto the Chromium instrument as detailed by the manufacturer.

Single-cell RNA-sequencing library construction
Single cells were isolated and barcoded using the Chromium Single-Cell 3' Gene Expression Kit, version 3 chemistry (10x Genomics). The first step was to prepare the reverse transcription (RT) master mix as per manufacturer's instructions. Next, the Chromium Chip B was prepared and assembled in a chip holder, with 50% glycerol dispensed in any unused chip wells. The preprepared RT master mix was then added to the cell suspension before being loaded into row 1 without introducing any bubbles. Row 2 was loaded with gel beads, while row 3 was loaded with partitioning oil before a gasket was attached to the chip. The chip was then loaded into the Chromium controller (10x Genomics) to generate single-cell gel beads in emulsion (GEMs). The GEMs were aspirated from the recovery well and dispensed into a tube strip on ice. GEM-RT was carried out in a C1000 Touch Thermal cycler (Bio-Rad) as follows: 53 °C for 45 minutes, 85 °C for 5 minutes; held at 5 °C. Following RT, the GEMs were broken using a recovery agent and the single-strand cDNA was cleaned up with DynaBeads MyOne Silane Beads (Thermofisher) and subsequently washed twice with 80% ethanol before elution. cDNA was amplified using the 10x Amp mix and cDNA primers. Amplification was performed in a C1000 Touch Thermal cycler (Bio-Rad) as follows: 98 °C for 3 min; cycled 13x as follows: 98 °C for 15s, 63 °C for 20s, and 72 °C for 60s; 72 C for 60s; held at 4 °C. Amplified cDNA product was cleaned up using SPRIselect Reagent Kit (0.6X SPRI; Beckman Coulter) before quantification on TapeStation (Agilent). The cDNA was subsequently fragmented to ~ 200 bp using the 10x Genomics fragmentation enzyme while run in a thermocycler at 32 °C for 5 min. End repair and A-tailing were then carried out in the C1000 Touch Thermal cycler (Bio-Rad) at 65 °C for 30 min. Double-sided size selection was then carried out using SPRIselect reagent. Adaptor ligation was carried out using adaptor oligos and ligation buffer provided by the Chromium Single-Cell 3' Library Kit (10x Genomics) and run in the thermocycler at 20 °C for 15 min before a 4 °C hold. This was followed by post-ligation clean-up with SPRIselect reagent. Finally, sample index PCR was carried out using a Chromium i7 sample index, amplification mix and primers provided by the Chromium library kit. The PCR was carried out in the C1000 Touch Thermal cycler (Bio-Rad) as follows: 98 °C 45s; cycled 16x: 98 °C for 20s, 54 °C for 30s, and 72 °C for 20s; 72 °C 60s; held at 4 °C. This was followed by one final clean-up using SPRIselect reagent. The barcode sequencing library was then quantified using TapeStation (Agilent) before sequencing.

Hashtag amplification and sequencing
At the cDNA amplification step, additional primers were added to increase yield of Hashtag oligonucleotide (HTO, cDNA derived from the TotalSeq™ Hashtag antibodies) products. 1 µl of HTO primer (5'GTGACTGGAGTTCAGACGTGTGC*T*C) was added and the cDNA amplification was carried out. Following cDNA amplification, 0.6X SPRI selection was performed to separate mRNA-derived and antibody-oligo-derived cDNAs, with the supernatant containing HTO-derived cDNAs (180 bp). The bead fraction containing the mRNA-derived cDNA (>300 bp) was washed with 80% ethanol and cDNA library preparation was carried out as described by manufacturer. The HTOs in the supernatant fraction were then purified using 2X SPRIselect purification before washing with 80% ethanol and subsequent elution in nuclease free water. HTO sequencing libraries were amplified using a separate PCR reaction. HTO  . Following amplification, the HTO products was purified using 1.6X SPRI purification. Both libraries (cDNA and HTO) were quantified using BioAnalyzer (Agilent), before being pooled in the following molar proportions: HTO libraries 10% and cDNA libraries 90%. The pooled libraries were then sequenced using Illumina NextSeq500.

sc-RNA-seq data analysis
Illumina generic adapters were removed from the fastq paired end reads during the base calling step using the bcl2fastq tool. CellRanger count was used to identify cell barcodes and quantify unique transcript counts according to GRCh38 annotation. After removing R2 reads shorter than 25 bps, CITE-seq-Count was used to quantify cell hashing data per lane using the cell barcodes reported by CellRanger. CellRanger was then re-run separately on the extracted reads for the two individual samples (diagnosis and relapse).
Hdf5 data for diagnosis and relapse were then load into the the statistical computing environment R using Read10X_h5 function from Seurat package (1), merged by genes and turned into Seurat object by CreateSeuratObject. Percent of mitochondrial reads for each cell was determined by PercentageFeatureSet using ^MT-pattern. These genes were subsequently removed before further processing. Cells showing less than 1500 expressed transcripts, less than 500 expressed genes or more than 20% of reads mapping to mitochondrial genes were removed. Data were normalized by SCTransform (2) and principal components calculated using RunPCA. Clustering was peformed using the first 15 identified principal components, with default parameters. The number of components to include was evaluated using the elbow method. For further dimensionality reduction UMAP (3) was applied, using the RunUMAP function.
To annotate the identified clusters to known cell types signatures derived from (4) were used.
The top 100 marker genes for each cell type were selected and used to calculate cell-type specific expression scores for each cell. Scores were computed using AddModuleScore from Seurat, and each cell was classified according to the signature with the highest score. Finally, the most common cell type assigned to the cells in each cluster was determined and used to classify the cluster.
After removing cells from clusters that were not containing B cells, the raw counts for the reaming cells were normalized, dimensionality reduced and clustered in the same way as described above for the full dataset, with exception that only first 12 principal components were considered.
B cell stage was assigned to each cell again using AddModuleScore and then selecting signature with highest score. Signatures were in this case derived from GenomicScape (5) and significance of differential proportion of stages calculated using fisher.test.
To identify CNAs, the function infercnv from the inferCNV package (6) was used with default settings except for cutoff set to 0.1 (as recommended for 10x data). As reference we used B cells from the 3kpbmc dataset provided by 10x Genomics (http://support.10xgenomics.com/single-cell-gene-expression/datasets). B cells in this dataset were identified as described above. To obtain average CNAs for each cluster, the result table from infercnv (infercnv.preliminary.observations.txt) was loaded into R, and the predicted CNA for each gene averaged across all the cells in the cluster and plotted using ComplexHeatmap (7).
To identify differentially expressed genes between clusters we used FindMarkers with default settings and wilcox.test. Resulting genes were then exported for EnrichmentMap v3.2.1 (8) analysis or used for enrichment analysis. The enrichment analysis was performed using the clusterProfiler R package (9) and Reactome pathways as gene sets. For the EnrichmentMap analysis the enriched gene sets were determined using g:Profiler (10) with its version of ReactomePathways according to (11) and then visualized Cytoscape (12). To visualize pathways scores across cells given a gene set, the AddModuleScore was used. Similarly, single cells were scored for expression of a previously published signature for resistance to combined Rituximab, Fludarabine and Cyclophosphamide treatment (13). In line with the scoring used in the original publication, the sum of expression values was used as proxy for resistance.
Unless specified otherwise, all the analyses were performed in the statistical computing environment R v3 (14). Plots were generated using the package ggplot2 (15).