FB5P-seq: FACS-Based 5-Prime End Single-Cell RNA-seq for Integrative Analysis of Transcriptome and Antigen Receptor Repertoire in B and T Cells

Single-cell RNA sequencing (scRNA-seq) allows the identification, characterization, and quantification of cell types in a tissue. When focused on B and T cells of the adaptive immune system, scRNA-seq carries the potential to track the clonal lineage of each analyzed cell through the unique rearranged sequence of its antigen receptor (BCR or TCR, respectively) and link it to the functional state inferred from transcriptome analysis. Here we introduce FB5P-seq, a FACS-based 5′-end scRNA-seq method for cost-effective, integrative analysis of transcriptome and paired BCR or TCR repertoire in phenotypically defined B and T cell subsets. We describe in detail the experimental workflow and provide a robust bioinformatics pipeline for computing gene count matrices and reconstructing repertoire sequences from FB5P-seq data. We further present two applications of FB5P-seq for the analysis of human tonsil B cell subsets and peripheral blood antigen-specific CD4 T cells. We believe that our novel integrative scRNA-seq method will be a valuable option to study rare adaptive immune cell subsets in immunology research.


INTRODUCTION
Technologies to reliably amplify and sequence the mRNA content of single cells have progressed dramatically. By quantitatively measuring the expression levels of thousands of genes per cell, single-cell RNA sequencing (scRNA-seq) enables an unbiased classification of cell types and fine characterization of functional cell states (1,2). The bioinformatics analyses of scRNA-seq datasets allow the identification of new cell types and functional subsets (3), and may reconstruct cellular differentiation or activation dynamics a posteriori from snapshot data (4)(5)(6)(7).
All scRNA-seq protocols are based on four common steps: (i) single cell isolation, (ii) reverse transcription (RT) of mRNA, (iii) amplification of cDNA, and (iv) preparation of next-generation sequencing libraries. Single cell isolation can be performed through FACS or nanodroplet encapsulation. FACS has the advantage of allowing the user to record the precise cell surface phenotype of each sorted cell (index sorting) and link it to its deeply sequenced transcriptome (>2,000 genes/cell), but with a limited throughput of a few hundred cells per sample (8,9). Nanodroplet encapsulation, as performed in the commercial system proposed by 10× Genomics (10), enables an easy parallel processing of thousands of single cells, albeit at the cost of reduced sensitivity (around 1,000 genes/cell in peripheral blood lymphocytes). Depending on the protocols, the scRNA-seq reads which are used for gene expression analysis may cover the whole mRNA sequence (full-length sequencing) or only the beginning or the end of mRNA molecules (5 ′ -end or 3 ′ -end sequencing, respectively). Because every scRNA-seq protocol has its own strengths and limitations (11,12), method choice should be driven by the biological issue at hand but will also be constrained by the desired depth (number of genes detected per cell), throughput (number of cells analyzed per sample), and budget.
In the adaptive immune system, complex gene rearrangements generate diverse B cell receptor (BCR) and T cell receptor (TCR) repertoires, enabling the recognition of an infinite range of antigens by B and T cells, respectively. When stimulated by their cognate antigen, each B or T cell clone can differentiate into multiple effector cell types that differ transcriptionally and functionally (13,14). In the process of differentiation, the TCR sequence of mature T cells remains unchanged, while the BCR sequence of B cells may be altered in affinity maturation events such as class switch recombination and somatic hypermutation (15). The progeny of a single T or B cell can thus be accurately identified through identical (for TCR) or very similar (for BCR) V-J junctional sequences in their TCR or BCR chain genes, respectively. Integrating single-cell immunoglobulin heavy chain (IGH) sequencing with low-throughput gene expression analysis by single-cell qPCR already revealed important features of memory B cell diversification (16) and B cell lymphoma evolution (17). Methods which enable the parallel analysis of repertoire sequence and whole transcriptome gene expression in single B or T cells are required to deeply investigate the multiple aspects of lymphocyte biology and malignancy.
Experimental and computational approaches to infer TCR and BCR sequence from scRNA-seq datasets of T and B cells have been developed, relying either on data produced by plate-based full-length sequencing (Smart-seq2) (18)(19)(20)(21)(22) or droplet-based 5 ′ -end sequencing (10× Genomics) (23). The former allows for a deep analysis of phenotypically defined FACS-sorted cells but is costly, labor intensive, and does not support unique molecular identifiers (UMIs, random DNA barcodes incorporated into cDNA molecules during RT and enabling the quantitative deconvolution of scRNAseq reads despite potential cDNA amplification artifacts). The latter incorporates UMIs, is relatively cheap, and is easy to perform on thousands of cells but does not allow the precise selection of phenotypically defined cells and requires the production and sequencing of additional libraries for BCR or TCR sequencing.
Here we present FB5P-seq, a novel protocol for 5 ′ -end scRNAseq analysis of FACS-sorted cells, which incorporates UMI for accurate molecular counting and allows direct efficient recovery of paired BCR and TCR repertoire sequences when applied to B and T cells. We report the good sensitivity and accuracy of FB5Pseq when applied to human tonsil B cell subsets and antigenspecific peripheral blood T cells, highlighting the relevance and performance of our cost-effective and scalable technology.

FB5P-seq Experimental Workflow
We based the design of the FB5P-seq experimental workflow on existing full-length (8) and 5 ′ -end (9,24) scRNA-seq protocols. The main originalities in FB5P-seq were to perform cell-specific barcoding, incorporate 5-bp UMI during reverse transcription, and sequence the 5 ′ -ends of amplified cDNAs from their 3 ′ -end, not from the transcription start site (Figures 1A,B). In FB5Pseq, single cells of interest are sorted in 96-well plates by FACS, routinely using a 10-color staining strategy to identify and enrich specific subsets of B or T cells while recording all parameters through index sorting. Single cells are collected in lysis buffer containing External RNA Controls Consortium (ERCC) spikein mRNA (0.025 pg per well), and the sorted plates are immediately frozen on dry ice and stored at −80 • C until further processing. ERCC controls are a mix of 92 distinct synthetic polyadenylated RNAs of distinct length with known sequences and concentrations which cover a broad range of abundance levels. ERCC controls are reverse-transcribed and amplified along endogenous mRNA and may therefore be used to control the accuracy and molecular sensitivity of scRNA-seq methods (see section Materials and Methods) (11,12,24). The amount of ERCC spike-in mRNA added to each well was optimized to yield around 5% of sequencing reads covering ERCC genes when studying lymphocytes which generally contain little mRNA. mRNA reverse transcription, cDNA 5 ′ -end barcoding, and PCR amplification are performed with a template switching (TS) approach. Notably, our TSO design included a PCR handle (different from the one introduced at the 3 ′ -end upon RT priming), an 8-bp well-specific barcode followed by a 5-bp UMI, a TATA spacer (25), and three riboguanines. We empirically selected the 96 well-specific barcode sequences to avoid TSO concatemers in FB5P-seq libraries. After amplification, the barcoded full-length cDNAs from each well are pooled for purification and one-tube library preparation. For each plate, an Illumina sequencing library targeting the 5 ′ -end of barcoded cDNA is prepared by a modified transposase-based method (tagmentation, a process where full-length cDNA is fragmented into small fragments by a Tn5 transposase enzyme which incorporates specific adapter sequences at the ends of the processed DNA strands), incorporating a plate-associated i7 barcode. The FB5P-seq library preparation protocol is costeffective (260 e for the library preparation of a 96-well plate), is easily scalable, and may be implemented on a pipetting robot.
FB5P-seq libraries are sequenced in paired-end single-index mode (i.e., sequencing both ends of the sequencing libraries and one library-specific barcode index, a standard Illumina sequencing protocol) with Read1 covering the gene insert from its 3 ′ -end, Read i7 assigning the plate barcode, and Read2 covering the well-barcode and UMI. Because FB5P-seq libraries have a broad size distribution, with a gene insert of 100-850 bp, Read 1 sequences cover the 5 ′ -end of transcripts approximately from 30 to 850 bases downstream of the transcription start site. Consequently, sequencing reads cover the whole variable and a significant portion of the constant region of the IGH-and IGK/Lexpressed mRNAs (Figure 1C), enabling in silico assembly and reconstitution of BCR repertoire from scRNA-seq data. Because TCRα and TCRβ genes share a similar structure, FB5P-seq is equally suitable for reconstructing TCR repertoire from scRNAseq data when T cells are analyzed.

FB5P-seq Bioinformatics Workflow
The FB5P-seq data are processed to generate both a singlecell gene count matrix and single-cell BCR or TCR repertoire sequences when analyzing B cells or T cells, respectively. After extracting the well-specific barcode and UMI from Read2 sequences and filtering out low-quality or unassigned reads, we use two separate pipelines for gene expression and repertoire analysis (Figure 2). The transcriptome analysis pipeline was derived from the Drop-seq pipeline (26). Briefly, it consists of mapping all Read1 sequences to the reference genome, then quantifying, for each gene in each cell, the number of unique molecules through UMI sequences. After merging the data from all 96-well plates in the experiment, we filter the resulting gene-by-cell count matrices to exclude low-quality cells and normalize by total UMI content per cell (see section Materials and Methods).
For the extraction of BCR or TCR repertoire sequences from FB5P-seq data, we have developed our own pipeline based on de novo single-cell transcriptome assembly and mapping of reconstituted long transcripts (contigs or isoforms) on public databases of variable immunoglobulin or TCR genes. We identify and select contigs corresponding to productive V(D)J rearrangements in-frame with an identified constant region gene. In cases where multiple isoforms are identified for a given chain (e.g., IGH) in a single cell, we assign the most highly expressed isoform and discard the other one(s). In early validation experiments, our pipeline was equally efficient and accurate as RT-PCR followed by Sanger sequencing for IGH variable region analysis (data not shown), with the major advantage of retrieving complete variable regions and large portions of constant regions of both IGH and IGK/L, or TCRA and TCRB, from the same scRNA-seq run.

FB5P-seq Quality Metrics on Human Tonsil B Cell Subsets
To illustrate the performance of our scRNA-seq protocol, we obtained non-malignant tonsil cell suspensions from two adult human donors, referred to as Tonsil 1 and Tonsil 2. Based on surface marker staining, we excluded monocytes, T cells, and naïve B cells and sorted memory (Mem) B cells, germinal center (GC) B cells, and plasmablasts or plasma cells (PB/PCs) for FB5Pseq analysis ( Figure 3A). We processed the Tonsil 1 and Tonsil 2 samples in two separate experiments, generating libraries from five and six plates, respectively. The libraries were sequenced at an average depth of ∼500,000 reads per cell (Table S1). After bioinformatics quality controls (see section Materials and Methods), we retained more than 90% of cells in the gene expression dataset (Table S1). We computed per-cell accuracy ( Figure 3B) and per-experiment sensitivity ( Figure 3C) based on ERCC spike-in detection levels and rates, respectively (11,12). All cells showed high quantitative accuracy independently of their phenotype, with an overall mean correlation coefficient of 0.83 ( Figure 3B). The molecular sensitivity ranged from 9.5 to 21.2 ( Figure 3C), which compares favorably with other current scRNA-seq protocols (11,12). We detected a mean of 987, 1,712, and 1,307 genes per cell in Mem B cells, GC B cells, and PB/PCs, respectively ( Figure 3D). The GC and Mem B cells displayed higher total molecule counts (mean UMI counts of 192,765 and 145,356, respectively) than the PB/PCs (mean UMI count of 67,861) ( Figure 3E).
As expected from the method design, the FB5P-seq Read1 sequence coverage was biased toward the 5 ′ -end of gene bodies, with a broad distribution robustly covering from the 3rd to the 60th percentile of the gene body length on average ( Figure 3F). In Tonsil 1 and Tonsil 2 B cell subsets, the BCR reconstruction pipeline retrieved at least one productive BCR chain for the majority of the cells ( Figure 3G). Consistent with the high expression of BCR gene transcripts for sustained antibody production, we obtained the paired IGH and IGK/L repertoire for the vast majority of PB/PCs. In Mem and GC B cells, we obtained paired IGH and IGK/L sequences on ∼50% of the cells and only the IGK/L sequence in most of the remaining cells. The superior recovery of IGK/L sequences was likely because the expression level of IGK/L was about two-fold higher than that of the IGH in our FB5P-seq data (data not shown).
Altogether accuracy, sensitivity, gene coverage, and BCR sequence recovery highlighted the high performance of the FB5Pseq method for the integrative analysis of transcriptome and BCR repertoire in single B cells.

FB5P-seq Analysis of Human Tonsil B Cell Subsets
As a biological proof-of-concept, we further analyzed the Tonsil 1 and Tonsil 2 datasets. T-distributed stochastic neighbor embedding (t-SNE) analysis on the gene expression data discriminated three major cell clusters. Tonsil B cells clustered based on their sorting phenotype (Mem B cells, GC B cells, or PB/PCs) and did not cluster by sample origin ( Figure S1A). Cell cycle status further separated the cycling (S and G2/M phase) from the non-cycling (G1) GC B cells ( Figure S1B). The expression levels of surface protein markers recorded through index sorting were consistent with the gating strategy of Mem B cells (CD20 + CD38 lo CD10 − CD27 + ), GC B cells (CD20 + CD38 + CD10 + ), and PB/PCs (CD38 hi CD27 hi ) ( Figure Figure 4B) and identified the top marker genes for each cell subset ( Figure 4C). Those analyses were consistent with previous single-cell qPCR analyses (17) and bulk microarray analyses of human B cell subsets (27,28).
Integrating the single-cell BCR repertoire data to the t-SNE embedding, we revealed that the IGH and IGK/L repertoire of tonsil B cell subsets was polyclonal (Figures 4D,E and  Figures S1C,D). Interestingly, while the somatic mutation load was equivalent in Igκ and Igλ light chains from Mem B cells, GC B cells, and PB/PCs (Figure S1E), the IGH mutation rate depended on isotype, with the IgA + cells expressing the most mutated BCR (Figure 4F) regardless of phenotype or sample origin. By contrast, the IgM/IgD + cells exhibited the lowest somatic mutation loads ( Figure 4F).
Overall, those analyses confirmed that the FB5P-seq method is relevant for simultaneous protein, whole-transcriptome, and BCR sequence analysis in human B cells.

FB5P-seq Analysis of Human Peripheral Blood Antigen-Specific CD4 T Cells
To test whether our protocol is also effective in T cells, we applied FB5P-seq to Candida albicans-specific human CD4 T cells sorted after a brief restimulation of fresh peripheral blood mononuclear cells with a pool of MP65 antigen-derived peptides (29) (Figure 5A and section Materials and Methods). C. albicans is a common commensal in humans known to generate antigenspecific circulating memory CD4 T cells with a T H 17 profile (30). Similar to the B cell dataset, the T cell dataset displayed high per-cell accuracy ( Figure 5B) and an average of 1,890 detected genes per cell ( Figure 5C). Gene expression analysis showed an efficient detection of T cell marker genes (CD3E), activation genes (CD40LG, EGR2, NR4A1, and IL2), and TH17specific genes (CCL20, CSF2, IL22, IL23A, and IL17A) in those reactivated antigen-specific T cells ( Figure 5D). We recovered at least one productive TCRα or TCRβ chain in 88% of cells and paired TCRαβ repertoire in 61% of cells ( Figure 5E). Moreover, the CDR3β sequence analysis revealed some expanded TCRβ clonotypes likely related to MP65 antigen specificity ( Figure 5F). A principal component analysis of the gene expression data and the visualization of V β -J β TCR rearrangements revealed no apparent segregation of antigen-specific T cells expressing different clonotypes ( Figure 5G).
Taken together, these data indicate that our method is also relevant for integrative scRNA-seq analysis of human T cells, especially for tracking rare antigen-specific cells ex vivo.

DISCUSSION
FB5P-seq is a novel 5 ′ -end scRNA-seq workflow that allows an accurate gene expression analysis of hundreds of FACS-sorted cells in parallel. When applied to B or T cells, the FB5P-seq data support the in silico reconstruction of paired full-length BCR or TCR variable repertoire on a per-cell basis. As demonstrated by our studies of human tonsil B cells and peripheral blood antigen-specific T cells, FB5P-seq is particularly adapted for the integrative analysis of surface phenotype (through index sorting), gene expression, and antigen receptor repertoire of rare phenotypically defined B or T cells. Because FB5P-seq integrates three layers of barcodes (UMIs, cellular barcodes, and plate barcodes), the workflow is more cost-and time-effective than Smart-Seq2 (31), which has been so far the method of choice for plate-based scRNA-seq analysis of gene expression and antigen receptor repertoire in B or T cells (18,19,21,22). The molecular detection limit of FB5P-seq (10-20 molecules) was higher than what has been reported for Smart-seq2 (11) (seven molecules), suggesting that FB5P-seq may recover the expression of less genes per cell. In a recent benchmarking study of scRNA-seq methods, Smart-Seq2 detected the expression of ∼2,500 genes per cell in human peripheral blood mononuclear cells (PBMCs) at a sequencing depth of 1 × 10 6 reads per cell (32). The sensitivity of FB5P-seq was sufficient to recover in the order of 1,000-2,000 genes per cell on average in human lymphocytes (depending on cell type and cell cycle status) at a sequencing depth of ∼5 × 10 5 reads per cell.
One of our main constraints when developing FB5P-seq was to enable the in silico reconstruction of BCR or TCR repertoire sequences directly from the scRNA-seq reads. Because the variable regions of antigen receptor genes are encoded from the 5 ′ -end of mRNAs, 3 ′ -end scRNA-seq protocols are not suitable for the parallel analysis of BCR or TCR repertoire, unless a separate amplification with cell-specific and genespecific primers is performed on full-length cDNA before pooling single-cell contents (33). Full-length scRNA-seq protocols with paired-end sequencing have been used successfully for parallel transcriptome and repertoire sequencing of single B and T cells, with dedicated bioinformatics pipelines named TraCeR (18), BASIC (22), BraCeR (19), and VDJPuzzle (21). In FB5P-seq, the sequencing reads mainly cover the 5 ′ -ends of mRNAs in a 3 ′ to 5 ′ orientation, which generates a broad coverage of the most 5 ′ half of mRNAs when using 67-nucleotide-long reads. As a consequence, FB5P-seq can successfully reconstruct BCR or TCR repertoire sequences from the scRNA-seq reads when sequencing libraries on a 75-cycle Illumina flow cell at a targeted read depth of 5 × 10 5 reads per cell, as had been done for the datasets presented here. In that configuration, the cost of performing FB5P-seq is roughly 5 e per cell (2.5 e library preparation + 2.5 e sequencing), which is significantly lower than Smart-Seq2 (30 e library preparation + 2.5-5 e sequencing) (11). In FB5Pseq, only one library preparation is required per 96-well plate, which drastically reduces the handling time when working with manual pipetting, as compared to generating one library per cell in Smart-Seq2. Yet because library preparation only requires a small amount of pooled cDNA, the full-length amplified cDNA from each single-cell can be archived and used for downstream applications such as single-cell antibody cloning (34).
Our bioinformatics pipeline for recovering repertoire sequences differs from previously published solutions (18,19,21,22) in that it performs de novo transcriptome   reconstruction on all sequences prior to filtering contigs corresponding to the rearranged BCR or TCR chains. Because competing algorithms rely on full-length scRNA-seq pairedend reads, we did not test their ability to recover BCR or TCR sequences on FB5P-seq reads. Yet the demonstrated performance of our method on human B cell subsets, with close to 100% paired heavy and light chain reconstruction efficiency on PB/PCs, suggests that it performs at least equally well as those methods.
We have designed the FB5P-seq TSOs with short fivenucleotide UMIs, enabling the detection of a maximal molecular diversity of 4 5 = 1,024 molecules for each gene per cell. Although this may be low for some genes in some cell types, in our analyses on lymphocytes, only the CD74 gene showed recurrent saturating levels in Mem B cells and GC B cells (not shown). Longer UMI sequences could be used in the TSO design to prevent saturation issues (10,24), but template switching efficiency may be affected by increased TSO length (35). In our analyses of human lymphocytes, we have used very low levels of ERCC spike-ins (0.025 pg per well, corresponding to a 1:2,000,000 final dilution) compared to those of other scRNA-seq protocols (11,12,36). We have optimized the concentration of ERCC spike-ins for having <5% of sequencing reads mapping on ERCC genes to maximize the amount of biological information in our experiments while retaining the usefulness of ERCC spike-ins for quality control purposes. This would need to be adjusted when studying cell types with more mRNA content than the primary lymphocytes.
The FB5P-seq protocol was designed to use FACS single cell sorting for cell deposition into 96-well plates. Most cell sorting machines and software enable the use of index sorting to record the flow cytometry parameters (cell size and fluorescence levels for each channel) of each sorted cell, including for markers that were not used in the gating strategy but were included during cell staining. This may be used to analyze mRNA/protein levels in parallel or to investigate the relevance of cell surface markers to delineate transcriptionally robust cell subsets (7). In our implementation of FB5P-seq, we use the 96-well plate format, sort single cells into 2-µl lysis buffer, and use manual pipetting throughout the protocol, but our method would only require minor adjustments to work on 384-well plates with smaller volumes and automatic liquid handlers like the MARS-seq protocol (37,38). Furthermore, plates containing the lysis buffer can be stored at −80 • C and shipped on dry ice, before and after cell sorting, facilitating collaborative work and multisite projects.
High throughput droplet-based 5 ′ -end scRNA-seq (10× Genomics 5 ′ -end) is an excellent option to analyze a transcriptome and repertoire sequence for thousands of cells in parallel in complex tissues (23,39). Combining 5 ′ -end scRNA-seq with DNA-tagged antibody labeling and barcoding may further enable the multiplexing of several samples and the analysis of the surface phenotype of cells (40). Yet, because those approaches best apply when many thousands of cells are available for input, we believe that FB5P-seq is a valuable option to obtain the same multimodal information when focusing on rare cells defined by a complex surface phenotype. Furthermore, high-throughput methods may be less appropriate when one of the protein markers to be tracked in the analyzed cells is an intracellular fluorescent reporter (e.g., lineage tracer, biochemical indicator) which will not be accessible with DNA-tagged antibody labeling in intact live cells. Another limitation of high throughput droplet-based 5 ′ -end scRNA-seq is the difficulty to recover BCR sequences from non-antibody producing cells (e.g., naïve, Mem, or GC B cells, expressing low to moderate amounts of IGH and IGK/L transcripts) when antibody-producing cells (PB/PCs, expressing on the order of 100-fold more IGH and IGK/L transcripts than other B cells) are present in the analyzed sample. In that situation, most of the molecules in the BCR sequencing library will be coming from PB/PCs, leaving too few reads to successfully reconstruct a BCR repertoire in the other B cells (our unpublished observations and https://support.10xgenomics.com/single-cell-vdj/software/ overview/design). In FB5P-seq, the targeted sequencing depth (5 × 10 5 reads per cell) is sufficient to reconstruct the BCR repertoire for all B cell types, and the efficiency of our method is not dependent on the proportion of PB/PCs in the sample.
Thus, FB5P-seq is a novel alternative to Smart-seq2 and 10× Genomics 5 ′ -end scRNA-seq for the integrative analysis of transcriptome and antigen receptor repertoire in B and T cells ( Table 1). We expect that our method will provide valuable insights for decoding the mechanisms regulating the molecular and functional diversity of lymphocytes during protective immunity, auto-immunity, cancer immunity, or lymphoma development and progression.
Peripheral blood mononuclear cells were collected in Nantes University Hospital and used fresh in the peptide restimulation assays for isolating C. albicans-specific T cells.
Written informed consent was obtained from the donors.

Flow Cytometry and Cell Sorting of B Cell Subsets
Frozen live cell suspensions were thawed at 37 • C in RPMI + 10% FCS and then washed and resuspended in FACS buffer [phosphate-buffered saline (PBS) + 5% FCS + 2 mM EDTA] at a concentration of 10 8 cells/ml for staining. The cells were first incubated with 2% normal mouse serum and Fc-Block (BD Biosciences) for 10 min on ice. Then, the cells were incubated with a mix of fluorophore-conjugated antibodies (Table S2) for 30 min on ice. The cells were washed in PBS and then incubated with the Live/Dead Fixable Aqua Dead Cell Stain (Thermofisher) for 10 min on ice. After a final wash in FACS buffer, the cells were resuspended in FACS buffer at a concentration of 10 7 cells/ml for cell sorting on a four-laser BD FACS Influx (BD Biosciences). Mem B cells were gated as CD3 − CD14 − IgD − CD20 + CD10 − CD38 lo CD27 + SSC lo single live cells. GC B cells were gated as CD3 − CD14 − IgD − CD20 + CD10 + CD38 + single live cells. PB/PCs were gated as CD3 − CD14 − IgD − CD38 hi CD27 + SSC hi single live cells.
Immediately after cell sorting, each plate was covered with an adhesive film (Thermofisher), briefly spun down in a benchtop plate centrifuge, and frozen on dry ice. The plates containing single cells in lysis mix were stored at −80 • C and shipped on dry ice (only T cells) until further processing.
The plate containing the single cells in lysis mix was thawed on ice, briefly spun down in a benchtop plate centrifuge, and incubated in a thermal cycler for 3 min at 72 • C (lid temperature 72 • C). Immediately after, the plate was placed back on ice and 3 µl RT mastermix was added to each well. For library preparation, 5 µl of amplified cDNA from each well of a 96-well plate were pooled and completed to 500 µl with PCR-grade H 2 O (Qiagen). Two rounds of 0.6X solid-phase reversible immobilization beads (AmpureXP, Beckman, or CleanNGS, Proteigene) cleaning were used to purify 100 µl of pooled cDNA with final elution in 15 µl PCR-grade H 2 O (Qiagen). After quantification with Qubit dsDNA HS assay (Thermofisher), 800 pg purified cDNA pool was processed with the Nextera XT DNA sample Preparation kit (Illumina), according to the manufacturer's instructions with modifications to enrich the 5 ′ -ends of tagmented cDNA during library PCR. After tagmentation and neutralization, 25 µl tagmented cDNA was amplified with 15 µl Nextera PCR Mastermix, 5 µl Nextera i5 primer (S5xx, Illumina), and 5 µl of a custom i7 primer mix (9) (0.5 µM i7_BCx + 10 µM i7_primer, Table S3). The amplification was performed in a thermal cycler (lid temperature 72 • C) for 3 min at 72 • C, 30 s at 95 • C, followed by 12 cycles of 10 s at 95 • C, 30 s at 55 • C, 30 s at 72 • C, and then a final elongation for 5 min at 72 • C. The resulting library was purified with 0.8X solidphase reversible immobilization beads (AmpureXP, Beckman, or CleanNGS, Proteigene).
Libraries generated from multiple 96-well plates of single cells and carrying distinct i7 barcodes were pooled for sequencing on an Illumina NextSeq550 platform, with High Output 75 cycles flow cells, targeting 5 × 10 5 reads per cell in paired-end singleindex mode with the following primers (

Single-Cell RNA-seq Data Processing
We used a custom bioinformatics pipeline to process fastq files and generate single-cell gene expression matrices and BCR or TCR sequence files. Detailed instructions for running the FB5P-seq bioinformatics pipeline can be found at https:// github.com/MilpiedLab/FB5P-seq. Briefly, the pipeline to obtain gene expression matrices was adapted from the Drop-seq pipeline (26), relied on extracting the cell barcode and UMI from Read2 and aligning Read1 on the reference genome using STAR and HTSeqCount. For BCR or TCR sequence reconstruction, we used Trinity for de novo transcriptome assembly for each cell based on Read1 sequences and then filtered the resulting isoforms for productive BCR or TCR sequences using MigMap, Blastn, and Kallisto. Briefly, MigMap was used to assess whether the reconstructed contigs corresponded to a productive V(D)J rearrangement and to identify germline V, D, and J genes and the CDR3 sequence for each contig. For each cell, the reconstructed contigs corresponding to the same V(D)J rearrangement were merged, keeping the largest sequence for further analysis. We used Blastn to align the reconstructed BCR or TCR contigs against the reference sequences of constant region genes and discarded the contigs with no constant region identified in-frame with the V(D)J rearrangement. Finally, we used the pseudoaligner Kallisto to map each cell's FB5P-seq Read1 sequences on its reconstructed contigs and quantify the contig expression. In cases where several contigs corresponding to the same BCR or TCR chain had passed the above filters, we retained the contig with the highest expression level.
The per well accuracy ( Figure 3B) was computed as the Pearson correlation coefficient between log 10 (UMI ERCC−xxxxx + 1) and log 10 (#mol ERCC−xxxxx + 1), where UMI ERCC−xxxxx is the UMI count for gene ERCC-xxxxx in the well and #mol ERCC−xxxxx is the actual number of molecules for ERCCxxxxx in the well (based on a 1:2,000,000 dilution in 2 µl lysis mix per well). For each well, only ERCC-xxxxx which were detected (UMI ERCC−xxxxx > 0) were considered for calculating the accuracy.
To estimate sensitivity (Figure 3C), the percentage of wells with at least one molecule detected (UMI ERCC−xxxxx > 0) was calculated over all the wells from five or six 96-well plates corresponding to human B cells sorted from Tonsil 1 or Tonsil 2, respectively. The value for each ERCC-xxxxx gene was plotted against log 10 (#mol ERCC−xxxxx ). and a standard curve was interpolated with asymmetric sigmoidal 5PL model in GraphPad Prism 8.1.2 to compute the EC50 for each dataset.
The normalized coverage over genes ( Figure 3F) was computed with RSeQC geneBody_coverage on bam files from 11 scRNA-seq 96-well plates corresponding to human B cells sorted from Tonsil 1 and Tonsil 2.

Single-Cell Gene Expression Analysis
Quality control was performed on each dataset (Tonsil 1, Tonsil 2, and T cells) independently to remove poor-quality cells. Cells with <250 genes detected were removed. We further excluded cells with values below three median absolute deviations (MADs) from the median for UMI counts, for the number of genes detected, or for ERCC accuracy, and cells with values above three MADs from the median for ERCC transcript percentage.
For each cell, the gene expression UMI count values were log-normalized with Seurat v3 (42) NormalizeData with a scale factor of 10,000. Data from B cells of Tonsil 1 and Tonsil 2 were analyzed together. Data from C. albicans-specific T cells were analyzed separately. Four thousand variable genes, excluding BCR or TCR genes, were identified with Seurat FindVariableFeatures. After centering and scaling with Seurat ScaleData, principal component analysis was performed on variable genes with Seurat RunPCA and embedded in twodimensional t-SNE plots with Seurat RunT-SNE on 40 principal components. Cell cycle phases were attributed with Seurat CellCycleScoring using lists of S phase-or G2/M phase-specific genes as described (43). Plots showing t-SNE embeddings colored by index sorting protein expression or other metadata (including BCR or TCR sequence-related information) were generated with ggplot2 ggplot. Plots showing t-SNE embeddings colored by gene expression were generated by Seurat FeaturePlot. Gene expression heatmaps were plotted with a custom function (available upon request).
For displaying the V-J repertoire polyclonality on t-SNE embeddings (Figures 4E, 5G), we used the VH and JH (or Vβ and Jβ for T cells) reference germline gene identities (identified by the MigMap algorithm, running IgBlast) to assign VH-JH clonotype or Vβ-Jβ clonotype (cells sharing the same VH and JH reference germline genes or sharing the same Vβ and Jβ reference germline genes, respectively) for each cell with a reconstructed IGH sequence (or TCRβ sequence, respectively). For display on the gene expression-based tSNE embedding, we used ggplot function, using the clonotype assignments as color-coded metadata.