Deep Parallel Characterization of AAV Tropism and AAV-Mediated Transcriptional Changes via Single-Cell RNA Sequencing

Engineered variants of recombinant adeno-associated viruses (rAAVs) are being developed rapidly to meet the need for gene-therapy delivery vehicles with particular cell-type and tissue tropisms. While high-throughput AAV engineering and selection methods have generated numerous variants, subsequent tropism and response characterization have remained low throughput and lack resolution across the many relevant cell and tissue types. To fully leverage the output of these large screening paradigms across multiple targets, we have developed an experimental and computational single-cell RNA sequencing (scRNA-seq) pipeline for in vivo characterization of barcoded rAAV pools at high resolution. Using this platform, we have both corroborated previously reported viral tropisms and discovered unidentified AAV capsid targeting biases. As expected, we observed that the tropism profile of AAV.CAP-B10 in mice was shifted toward neurons and away from astrocytes when compared with AAV-PHP.eB. Transcriptomic analysis revealed that this neuronal bias is due mainly to increased targeting efficiency for glutamatergic neurons, which we confirmed by RNA fluorescence in situ hybridization. We further uncovered cell subtype tropisms of AAV variants in vascular and glial cells, such as low transduction of pericytes and Myoc+ astrocytes. Additionally, we have observed cell-type-specific transitory responses to systemic AAV-PHP.eB administration, such as upregulation of genes involved in p53 signaling in endothelial cells three days post-injection, which return to control levels by day twenty-five. The presented experimental and computational approaches for parallel characterization of AAV tropism will facilitate the advancement of safe and precise gene delivery vehicles, and showcase the power of understanding responses to gene therapies at the single-cell level.


INTRODUCTION
Recombinant AAVs (rAAVs) have become the preferred gene delivery vehicles for many clinical and research applications (1,2) owing to their broad viral tropism, ability to transduce dividing and non-dividing cells, low immunogenicity, and stable persistence as episomal DNA ensuring long-term transgene expression (3)(4)(5)(6)(7)(8). However, current systemic gene therapies using AAVs have a relatively low therapeutic index (9). High doses are necessary to achieve sufficient transgene expression in target cell populations, which can lead to severe adverse effects from off-target expression (10)(11)(12). Increased target specificity of rAAVs would reduce both the necessary viral dose and off-target effects; thus, there is an urgent need for AAV gene delivery vectors that are optimized for cell-type-specific delivery (13). Lower viral doses would also alleviate demands on vector manufacturing and minimize the chances of undesirable immunological responses (14)(15)(16). Capsid-specific T-cell activation was reported to be dosedependent in vitro (17,18) and in humans (19,20). Shaping the tropism of existing AAVs to the needs of a specific disease has the potential to reduce activation of the immune system by detargeting cell types, such as dendritic cells, that have an increased ability to activate T-cells (21)(22)(23)(24)(25)(26).
Although these histological approaches preserve spatial information, current technical challenges limit their application to profiling the viral tropism of just one or two AAV variants across a few gene markers, thus falling short of efficiently characterizing multiple AAVs across many complex cell types characteristic of tissues in the central nervous system (CNS). The reliance on known marker genes also prevents the unbiased discovery of tropisms since such marker genes need to be chosen a priori. Choosing marker genes is particularly challenging for supporting cell types, such as pericytes in the CNS microvasculature and oligodendrocytes, which often have less established cell type identification strategies (60,61). The advent of single-cell RNA sequencing (scRNA-seq) has enabled comprehensive transcriptomic analysis of entire cell-type hierarchies, and brought new appreciation to the role of cell subtypes in disease (62)(63)(64)(65)(66). However, experimental and computational challenges, such as the sparsity of RNA capture and detection, strong batch effects between samples, and the presence of ambient RNA in droplets, reduce the statistical confidence of claims about individual gene expression (67)(68)(69). Computational methods have been developed to address some of these challenges, such as identifying contaminating RNA (68), accounting for or removing batch effects (70)(71)(72), and distinguishing intact cells from empty droplets (69,73,74). However, strategies for simultaneously processing transcripts from multiple delivery vehicles and overcoming the computational challenges of confidently detecting individual transcripts have not yet been developed for probing the tropism of AAVs in complex, heterogeneous cell populations.
Collecting the entire transcriptome of injected and noninjected animals also offers an opportunity to study the effects of AAV transduction on the host cell transcriptome. A similar investigation has been conducted with G-deleted rabies virus (75). This study demonstrated that virus infection led to the downregulation of genes involved in metabolic processes and neurotransmission in host cells, whereas genes related to cytokine signaling and the adaptive immune system were upregulated. At present, no such detailed examination of transcriptome changes upon systemic AAV injection has been conducted. High-throughput single-cell transcriptomic analysis could provide further insight into the ramifications of AAV capsid and transgene modifications with regard to innate (76)(77)(78)(79)(80) and adaptive immune recognition (20,(81)(82)(83)(84). Innate and adaptive immune responses to AAV gene delivery vectors and transgene products constitute substantial hurdles to their clinical development (85,86). The study of brain immune response to viral gene therapy has been limited to antibody staining and observation of brain tissue slices post direct injection. In particular, prior studies have shown that intracerebral injection of rAAV vectors in rat brains does not induce leukocytic infiltration or gliosis (87,88); however, innate inflammatory responses were observed (89). Results reported by these methods are rooted in single-marker staining and thus prevent the discovery of unexpected cell-type-specific responses. A comprehensive understanding of the processes underlying viral vector or transgene-mediated responses is critical for further optimizing AAV gene delivery vectors and treatment modalities that mitigate such immune responses.
Here, we introduce an experimental and bioinformatics workflow capable of profiling the viral tropism and response of multiple barcoded AAV variants in a single animal across numerous complex cell types by taking advantage of the transcriptomic resolution of scRNA-seq techniques ( Figure 1A). For this proof-of-concept study, we profile the tropism of previously-characterized AAV variants that emerged from directed evolution with the CREATE (AAV-PHP.B, AAV-PHP.eB) (32,34) or M-CREATE (AAV-PHP.C1, AAV-PHP.C2, AAV-PHP.V1, AAV.CAP-B10) (45,90) platforms. We selected the AAV variants based on their unique CNS tropism following intravenous injection. AAV-PHP.B and AAV-PHP.eB are known to exhibit overall increased targeting of the CNS compared with AAV9 and preferential targeting of neurons and astrocytes. Despite its sequence similarity to AAV-PHP.B, the tropism of AAV-PHP.V1 is known to be biased toward transducing brain vascular cells. AAV-PHP.C1 and AAV-PHP.C2 have both demonstrated enhanced blood-brain barrier (BBB) crossing relative to AAV9 across two mouse strains (C57BL/6J and BALB/cJ). Finally, AAV.CAP-B10 is a recently-developed variant with a bias toward neurons compared to AAV-PHP.eB (90).
In the initial validation experiment, we quantify the transduction biases of AAV-PHP.eB and AAV.CAP-B10 across major cell types using scRNA-seq, and demonstrate results which correlate well with both published results and conventional IHC-based quantification. We then demonstrate the power of the transcriptomic approach by going beyond the major cell types to reveal significant differences in cell-subtype transduction specificity. Compared with AAV.CAP-B10, AAV-PHP.eB displays biased targeting of inhibitory neurons, and both variants transduce Sst+ or Pvalb+ inhibitory neurons more efficiently than Vip+ inhibitory neurons. We validate these results with fluorescent in situ hybridizationhybridization chain reaction (FISH-HCR). We then develop and validate a barcoding strategy to investigate the tropism of AAV-PHP.V1 relative to AAV-PHP.eB in non-neuronal cells and reveal that pericytes, a subclass of vascular cells, evade transduction by this and other variants. We further use scRNA-seq to profile celltype-specific responses to AAV-PHP.eB at 3 and 25 days postinjection (DPI), finding numerous genes implicated in the p53 pathway in endothelial cells to be upregulated at 3 DPI and returning back to control levels at 25 DPI. Finally, we showcase the capabilities of parallel characterization by verifying the preceding findings in a single animal with seven co-injected AAV variants and revealing their respective cell-type biases.

Multiplexed Single-Cell RNA Sequencing-Based AAV Profiling Pipeline
To address the current bottleneck in AAV tropism profiling, we devised an experimental and computational workflow ( Figure 1A) that exploits the transcriptomic resolution of scRNA-seq to profile the tropism of multiple AAV variants The full length cDNA library is fragmented for sequencing as part of the single-cell sequencing protocol (top). To enable viral tropism characterization of multiple rAAVs in parallel, an aliquot of the intact cDNA library undergoes further PCR amplification of viral transcripts (bottom). During cDNA amplification, Illumina sequencing primer targets are added to the viral transcripts such that the sequence in between the Illumina primer targets contains the AAV capsid barcode sequence. Viral cargo in the cell transcriptome is converted to variant barcodes by matching the corresponding cell barcode + UMI in the amplified viral transcript library (right). across complex cell-type hierarchies. In this workflow, single or multiple barcoded rAAVs are injected into the retro-orbital sinus of mice followed by tissue dissociation, single-cell library construction using the 10X Genomics Chromium system, and sequencing with multiplexed Illumina next-generation sequencing (NGS) (69). The standard mRNA library construction procedure includes an enzymatic fragmentation step that truncates the cDNA amplicon such that its final size falls within the bounds of NGS platforms ( Figure 1B). These cDNA fragments are only approximately 450 bp in length and, due to the stochastic nature of the fragmentation, sequencing from their 5' end does not consistently capture any particular region. The fragment length limit and heterogeneity pose a problem for parallelizing AAV tropism profiling, which requires reliable recovery of regions of the transgene that identify the originating AAV capsid. For example, posttranscriptional regulatory elements, such as the 600 bp Woodchuck hepatitis virus posttranscriptional regulatory element (WPRE), are commonly placed at the 3' end of viral transgenes to modulate transgene expression. The insertion of such elements pushes any uniquely identifying cargo outside the 450 bp capture range, making them indistinguishable based on the cDNA library alone (Supplemental Figure 1A). An alternative strategy of adding barcodes in the 3' polyadenylation site also places the barcode too distant for a 5' sequencing read, and reading from the 3' end would require sequencing through the homopolymeric polyA tail, which is believed to be unreliable in NGS platforms (91,92). We circumvented these limitations in viral cargo identification by taking an aliquot of the intact cDNA library and adding standard Illumina sequencing primer recognition sites to the viral transcripts using PCR amplification such that the identifying region is within the two Illumina primer target sequences (e.g. Figure 2B). The cell transcriptome aliquots undergoing the standard library construction protocol and the amplified viral transcripts are then sequenced as separate NGS libraries. We sequence shorter viral transcripts in the same flow cell as the cell transcriptomes and longer viral transcripts on the Illumina MiSeq, which we found to be successful at sequencing cDNAs up to 890 bp long. The sequencing data undergoes a comprehensive data processing pipeline (see Methods). Using a custom genome reference, reads from the cell transcriptome that The fraction of the total number of transduced cells labeled as expressing the corresponding marker gene. For each AAV variant, the results of a two-way ANOVA with correction for multiple comparisons using Sidak's test are reported with adjusted P-values (****P ≤ 0.0001, ***P ≤ 0.001, and **P ≤ 0.01, and *P ≤ 0.05 are shown; P > 0.05 is not shown). (D) Comparison of transduction rates based on quantification via scRNA-seq or IHC. Transduction rate in IHC was calculated as (number of transduced cells in the group)/(total number of cells in the group). Transduction rate in scRNA-seq is based on an estimate of the fraaction of cells containing transcripts above background (see Methods). Each dot represents the transduction rate of neurons/Rbfox3+, astrocytes/S100b+, or oligodendrocytes/Olig2+ by AAV-PHP.eB or AAV.CAP-B10 in one animal. Histology data are averages across three brain slices per gene marker and animal. r indicates the Pearson correlation coefficient.  Figure 1C). In parallel, reads from the amplified viral transcripts are used to count the abundance of each viral barcode associated with each cell barcode and unique molecular identifier (UMI). The most abundant viral barcode for each cell barcode and UMI is assumed to be the correct viral barcode, and is used to construct a variant lookup table. This lookup table approach identifies an originating capsid in 65.7 ± 2.3% of viral transcripts detected in the cell transcriptome aliquots (Supplemental Table 4).
For determining viral cell-type tropism, we developed a method to estimate the fraction of cells within a cell type that express viral transcripts. Viral RNA expression levels depend on both the multiplicity of infection and the transcription rate of the delivered cargo. Thus, directly using viral RNA counts to determine tropism is confounded by differences in transcription rate between cell types, limiting comparison with imaging-based tropism quantification methods. As evidence of this, we detected that viral RNA expression levels can vary by cell type but are not perfectly rank correlated with the percent of cells detected as expressing that transcript (Supplemental Figure 2B). An additional confound arises from the ambient RNA from cellular debris co-encapsulated with cell-containing droplets, which can lead to false positives, i.e., detecting viral RNA in droplets containing a cell that was not expressing viral RNA. For example, we detected low levels of viral transcripts in large percentages of cells, even in cell types suspected to evade transduction, such as immune cells (Supplemental Figure 2A). To reduce the effect of both variability in expression and ambient RNA, we developed an empirical method to estimate the percentage of cells expressing transcripts above the noise, wherein the distribution of viral transcript counts in a set of cells of interest is compared to a background distribution of cellfree (empty) droplets (see Methods, Supplemental Figure 2C). In simulation, this method accurately recovers the estimated number of cells expressing transcripts above background across a wide range of parameterizations of negative binomial distributions (see Methods, Supplemental Figure 2D). Importantly, this method can yield a different ranking of viral tropism as compared to mean transcript expression rate (Supplemental Figure 2E).
To address several additional technical problems in default single-cell pipelines, we developed a simultaneous quality control (QC) and droplet identification pipeline. Our viral transduction rate estimation method described above relies on having an empirical background distribution of viral transcript counts in empty droplets to compare against the cell type of interest. However, the default cell vs. empty droplet identification method provided by the 10X Cell Ranger software, which is based on the EmptyDrops method (73), yielded unexpectedly high numbers of cells and clusters with no recognizable marker genes, suggesting they may consist of empty droplets of ambient RNA or cellular debris (Supplemental Figures 3A, B). Additionally, we sought to remove droplets containing multiple cells (multiplets) from our data due to the risk of falsely attributing viral tropism of one cell type to another. However, using Scrublet (93), an established method for identifying droplets containing multiplets, failed to identify multiplets in some of our samples and only identified small proportions of clusters positive for known non-overlapping marker genes, such as Cldn5 and Cx3cr1 (Supplemental Figure 3C). To address both the empty droplet and multiplet detection issues, we built a droplet classification pipeline based on scANVI, a framework for classifying single-cell data via neural-network-based generative models (94). Using clusters with a high percentage of predicted multiplets from Scrublet as training examples of multiplets, and clusters positive for known neuronal and non-neuronal marker genes as training examples of neurons and non-neuronal cells, we trained a predictive model to classify each droplet as a neuron, non-neuron, multiplet, or empty droplet (see Methods, Supplemental Figure 4A). This model performed with 97.4% accuracy on 10% of cells held out for testing, and yielded a database of 334,151 cortical cells (Supplemental Figure 4B). Inspection of the cells classified as empty droplets reveals that these droplets have lower transcript counts and higher mitochondrial gene ratios, consistent with other single-cell quality control pipelines (Supplemental Figure 4D). Critically, we discovered that non-neuronal clusters contained significantly more cells that had been previously removed by the Cell Ranger filtering method as compared to neuronal clusters (P = 0.025, 2-sided student ttest). In some clusters within cell subtypes, such as mature oligodendrocytes and endothelial cells, we identified up to 67% more cells than what were recovered via Cell Ranger.
Using our combined experimental and computational pipeline for viral transcript recovery and droplet identification, we can recover a lower bound on the expected number of cells expressing each unique viral cargo within groups of cells in heterogeneous samples.

Single-Cell RNA Sequencing
Recapitulates AAV Capsid Cell-Type-Specific Tropisms As a first step, we validated our method by comparing the quantification of AAV transduction of major cell types via scRNA-seq to conventional IHC. For this purpose, we characterized the tropism of two previously reported AAV variants, AAV-PHP.eB (32) and AAV.CAP-B10 (90) (Figure 2A). In total, four animals received single or dual retro-orbital injections of AAV-PHP.eB and/or AAV.CAP-B10 with 1.5 × 10 11 viral genomes (vg) per variant. Co-injection of both variants served to test the ability of our approach to parallelize tropism profiling. By having each variant package a distinct fluorophore, tropism could be simultaneously assessed via multi-channel fluorescence and mRNA expression of the distinct transgene. After 3-4 weeks of expression, we harvested the brains and used one hemisphere for IHC and one hemisphere for scRNA-seq. To recover viral transcripts, we chose primers such that enough of the XFP sequence was contained within the Illumina primer target sequences to differentiate the two variants ( Figure 2B, Supplemental Table 1). For this comparison, we focused on the transduction rate for neurons (Rbfox3), astrocytes (S100b), and oligodendrocytes (Olig2). For IHC, a cell was classified as positive for the marker gene on the basis of antibody staining, and was classified as transduced on the basis of expression of the delivered fluorophore. For scRNA-seq, all cells that passed our QC pipeline were projected into a joint scVI latent space and clustered. To most closely match our imaging quantification, we considered all clusters that were determined to be positive for the respective marker gene as belonging to the corresponding cell type (see Methods). All clusters of the same marker gene were grouped together, and the transduction rate of the combined group of cells was determined using our viral transduction rate estimation method. Our analysis of the scRNA-seq data demonstrates that the viral tropism biases across the three canonical marker genes are consistent with previous reports ( Figure 2C) (32,90). In contrast to AAV-PHP.eB, AAV.CAP-B10 preferentially targets neurons over astrocytes and oligodendrocytes. No marked discrepancies in viral tropism characterization were observed with single versus dual injections.
To quantify the similarity of the AAV tropism characterizations obtained with IHC and scRNA-seq, we directly compared the transduction rate of each AAV variant for every cell type and its corresponding marker gene (i.e., Rbfox3, S100b, or Olig2) as determined by each technique and noticed a good correlation ( Figure 2D). Despite the different underlying biological readouts-protein expression in IHC and RNA molecules in labeled cell types for scRNA-seq-the two techniques reveal similar viral tropisms.

Tropism Profiling at Transcriptomic Resolution Reveals AAV Variant Biases for Neuronal Subtypes
After validating our approach against the current standard of AAV tropism characterization (IHC imaging), we scrutinized the tropism of AAV-PHP.eB and AAV.CAP-B10 beyond the major cell types ( Figure 3). Since AAV.CAP-B10 has increased neuronal bias relative to AAV-PHP.eB, we first sought to understand if there were neuronal subtypes that were differentially responsible for this bias. However, in-depth cell typing of transcriptomes collected from tissues with numerous and complex cell types, such as neurons in the brain, requires expert knowledge of the tissue composition, time to manually curate the data, and the availability of large datasets (66). To minimize the burden of manual annotation, computational tools have been developed that use previously-annotated single-cell databases to predict the cell type of cells in new, unannotated single-cell experiments, even across single-cell platforms (94,96,97). We decided to leverage these tools and expanded our marker gene-based cell typing approach by having more complicated or well-established cell types be assigned based on annotations in a reference dataset (Supplemental Figure 4A). To this end, we again employed scANVI to construct a joint model of cells from our samples and cells from an annotated reference database. For this model, we used the Mouse Whole Cortex and Hippocampus 10x v2 dataset available from the Allen Brain Institute (95).
Since this is a neuron-enriched dataset, we constructed the model using only the 125,341 cells in our dataset classified as neurons from our marker-based QC pipeline combined with the 561,543 neuronal cells from cortical regions from the reference database. We trained this model to predict to which of 14 neuron subtype groupings each cell belonged. We held out 10% of the data for testing: the model performed with 97.9% classification accuracy on the held-out data. We then applied the model to predict the neuron subtypes of our cells.
During our in-depth characterization, we discovered several previously unnoticed cell-subtype biases for AAV-PHP.eB and AAV.CAP-B10 ( Figure 3A). Starting at the top of our neuronal hierarchy, the fraction of transduced cells that were glutamatergic neurons was markedly reduced for AAV-PHP.eB compared with AAV.CAP-B10 (P = 0.03, 2-sided student t-test, corrected for 2 neuron subtype comparisons). Furthermore, Pvalb+ and Sst+ inhibitory neurons both represented a larger fraction of transduced cells than Vip+ inhibitory neurons (adjusted P = 0.0009, P = 0.045, respectively, two-way ANOVA with multiple comparison correction for inhibitory neuron subtypes using Tukey's method).
To confirm these tropism biases in neuronal subtypes with a traditional technique, we performed FISH-HCR for glutamatergic and GABAergic gene markers ( Figure 3B) (98,99). As indicated by our scRNA-seq data, AAV.CAP-B10, when compared with AAV-PHP.eB, has increased transduction efficiency of glutamatergic neurons (SLC17A7). Furthermore, FISH-HCR verified the downward trend in transduction efficiency from Pvalb+, to Sst+, to Vip+ neurons in both AAV variants ( Figure 3C).

Pooled AAVs Packaging Barcoded Cargo Recapitulate the Non-Neuronal Tropism Bias of PHP.V1
To enable profiling viral variants in parallel without needing distinct transgenes per variant, we established a barcoding strategy whereby we package AAV variants with the same transgene and regulatory elements but with short, distinguishing nucleotide sequences within the 3' UTR ( Figure 4A). To verify that this barcoding strategy can recover tropisms consistent with our previous transgene-based capsid-identification strategy, we performed a set of experiments to re-characterize the tropism of AAV-PHP.eB in parallel with that of the recently developed AAV-PHP.V1, which has increased specificity for vascular cells over AAV-PHP.eB (45).
We produced AAV-PHP.eB carrying CAG-mNeonGreen and AAV-PHP.V1 carrying either CAG-mRuby2 or CAG-tdTomato. Additionally, we produced AAV-PHP.eB and AAV-PHP.V1 both carrying CAG-mNeonGreen with 7-nucleotide barcodes 89 bp upstream of the polyadenylation start site such that they did not interfere with the WPRE. We ensured each barcode had equal G/C content, and that all barcodes were Hamming distance 3 from each other (Supplemental Table 5). Each of the barcoded variants was packaged with multiple barcodes that were pooled together during virus production. Four animals received a retroorbital co-injection of 1.5 x 10 11 vg/each of AAV-PHP.V1 and AAV-PHP.eB. Two animals received viruses carrying separate fluorophores (cargo-based), and two animals received viruses carrying the barcoded cargo (barcode-based). For amplification of the viral cDNA in the animals receiving the barcoded cargo, we used primers closer to the polyA region such that the sequencing read covered the barcoded region (Supplemental Table 1). During the single-cell sequencing dissociation and recovery, one of our dissociations resulted in low recovery of neurons (Supplemental Figure 4C); thus, we investigated only non-neuronal cells for this experiment. (C) Confirmation of viral tropism biases across neuronal subtypes using FISH-HCR (3 mice per AAV variant, 1.5 × 10 11 vg dose). Dots represent the average values across three brain slices from one animal. Results from a two-way ANOVA with correction for multiple comparisons using Tukey's test is reported with adjusted Pvalues (****P ≤ 0.0001; and P > 0.05 is not shown on the plot). Despite variability in the total transgene RNA content between barcodes of the same variant (Supplemental Figure 5A), the estimated percent of cells expressing the transgene within each cell type was consistent between barcodes within a single animal, with standard deviations ranging from 0.002 to 0.056 (Supplemental Figure 6A). Our analysis of both the barcode-based animals and cargo-based animals shows the same bias in non-neuronal tropism, with AAV-PHP.eB significantly preferring astrocytes over oligodendrocytes, vascular cells, and immune cells ( Figure 4D). Interestingly, our analysis also revealed that the variance between barcodes within an animal was less than the variance between animals, even when controlling for cargo and dosage (P = 0.030, Bartlett's test, P-values combined across all variants and cell types using Stouffer's method, weighted by transduced cell type distribution). This is not surprising, since we found differences in cell type distribution alone can account for up to 58% of the perceived variability in tropism bias (Supplemental Figure 6B).
Next, we investigated the distribution of cells transduced by AAV-PHP.eB vs AAV-PHP.V1 in the major non-neuronal cell types across both barcode-based and cargo-based paradigms ( Figure 4E). The single-cell tropism data confirms the previously-established finding that AAV-PHP.V1 has a bias toward vascular cells relative to AAV-PHP.eB. Additionally, we uncovered that this is coupled with a bias away from astrocytes relative to AAV-PHP.eB, but that transduction bias of oligodendrocytes and immune cells did not differ between the variants. To investigate for a specific effect of the barcoding strategy, we performed a three-way ANOVA across the variant, . Results from the different experimental paradigms are combined. Results shown are from a two-way ANOVA with correction for multiple comparisons using Sidak's test comparing transduction by AAV-PHP.eB to AAV-PHP.V1 for each cell type, with adjusted P-values (****P ≤ 0.0001, ***P ≤ 0.001 is shown; P > 0.05 is not shown). (F) Within-animal difference in the fraction of cells transduced with AAV-PHP.V1 relative to AAV-PHP.eB across four animals, two from each experimental paradigm. For each cell type in each sample, the combined 2-proportion z score for the proportion of that cell type transduced by AAV-PHP.V1 vs AAV-PHP.eB is reported. Cell types with fewer than 2 cells transduced by both variants were discarded. Z scores were combined across multiple animals using Stouffer's method and corrected for multiple comparisons. Cell-type differences with an adjusted P-value below 0.05 are indicated with *.
cell type, and experimental paradigm factors. We found that the cell type factor accounted for 87.80% of the total variation, the combined cell type + variant factor accounted for 8.39% of the total variation, and the combined cell type + experimental paradigm factor accounted for only 2.36% of the total variation, confirming our hypothesis that barcoded pools can recover tropism with minimal effect.

Relative Tropism Biases Reveal Non-Neuronal Subtypes With Reduced AAV Transduction
To further characterize the tropism biases of AAV-PHP.V1 and expand our method to less well-established cell hierarchies, we explored the non-neuronal cell types in our dataset. Since the Allen Brain Institute reference database that we used to investigate neuronal tropism was enriched for neurons, it does not contain enough non-neuronal cells to form a robust nonneuronal cell atlas. Our combined dataset consists of 203,661 non-neuronal cells, making it large enough to establish our own non-neuronal cell clustering. Thus, we performed an additional round of automatic clustering on the cells classified as nonneuronal in our combined dataset, and identified 13 nonneuronal cell subtypes based on previously established marker genes (Figures 4B, C and Supplemental Table 2). Most cell subtypes had multiple clusters assigned to them, which suggested there may be additional subtypes of cells for which we did not find established marker genes. To determine whether any of these clusters delineated cell types with distinct transcriptional profiles, we investigated the probability of gene expression in each cluster compared to the other clusters of the same cell subtype (see Methods). Our approach determined two subclusters of pericytes and astrocytes. Both clusters of pericytes had strong expression of canonical pericytes marker genes Rgs5, Abcc9, and Higd1b. However, one of the clusters had no marker genes that made it distinct from the other pericyte cluster, nor from endothelial cells. Consistent with previous reports, this suggests that this cluster could be pericytes contaminated with endothelial cell fragments, and thus was not considered for further analysis (100)(101)(102). Two distinct groups of astrocytes were detected, one of which had unique expression of Myoc and Fxyd6. Using these new marker genes, we expanded our nonneuronal cell taxonomy to 13 cell types, now including Myoc+ and Myoc-astrocytes.
Given our finding that inter-sample variability exceeds intrasample variability, we established a normalization method for comparing transduction biases between variants co-injected into the same animal. This normalization-calculating the difference in the fraction of transduced cells between variants-captures the relative bias between variants, instead of the absolute tropism of a single variant (see Methods). By considering the relative bias between variants, we are able to interrogate tropism in a way that is more robust to inter-sample variability that arises from different distributions of recovered cells, expression rate of delivered cargo, and success of the injection. Using this normalization method, we evaluated the non-neuronal cell type bias of AAV-PHP.V1 relative to AAV-PHP.eB in both the cargo-based animals and the barcode-based animals across our non-neuronal cell-type taxonomy ( Figure 4F). We discovered that the bias of AAV-PHP.V1 for vascular cells is driven by an increase in transduction of endothelial cells, but not pericytes. Similarly, AAV-PHP.V1's bias away from astrocytes is driven by a decrease in transduction of Myoc-astrocytes, but not Myoc+ astrocytes. Further inspection of the transduction of pericytes and Myoc+ astrocytes revealed that pericytes are not highly transduced by any of the AAVs tested in this work, and that Myoc+ astrocytes have both lower viral transcript expression and lower abundance than Myoc-astrocytes, and thus do not contribute significantly to tropism (Supplemental Figures 4,  7A, B).

Single-Cell RNA Sequencing Reveals
Early Cell-Type-Specific Responses to IV Administration of AAV-PHP.eB That Return to Baseline by 3.5 Weeks To investigate the temporal cell-type-specific transcriptional effects of systemic AAV delivery and cargo expression, we performed a single-cell profiling experiment comparing animals injected with AAV to saline controls. We injected six male mice with AAV-PHP.eB (1.5 x 10 11 vg) carrying mNeonGreen, and performed single-cell sequencing on three mice three days post-injection (3 DPI) and three mice twentyfive days post-injection (25 DPI). These time points were chosen based on previous work showing MHC presentation response peaking around day seven and transgene response peaking around day 30 (89). Three saline control mice were processed 3 DPI. We then analyzed differential gene expression for each cell type between injected animals and controls using DESeq2 (Supplemental Table 8). Of note, we excluded cell types with less than 50 cells in each sample, and excluded leukocytes and red blood cells given the risk of their presence due to dissociation rather than chemokine mediated infiltration. Additionally, we collapsed subtypes of excitatory neurons, inhibitory neurons, and OPCs to have greater than 50 cells for differential expression analysis. We estimated viral transduction rate of AAV-PHP.eB using its delivered cargo, mNeonGreen, across cell types and time points. We identified that Myoc-astrocytes have significantly higher estimated transduction rate at 25 DPI compared to 3 DPI (adjusted P-value = 0.0438, two-way ANOVA with multiple comparison correction using Sidak's method). It is also worth noting that endothelial cells have a similar transduction rate between the time points in all animals, while one of the animals at 25 DPI exhibited higher transduction in neurons ( Figure 5A). The number of statistically relevant genes between the injected and control group (adjusted P-value < 0.05, DESeq2) were highest in endothelial cells (41 genes) at 3 DPI, followed by inhibitory neurons (9 genes) at 25 DPI ( Figure 5B) (adjusted P-value < 0.05, DESeq2).
We found that endothelial cells had the most acute response at 3 DPI with p53 signaling pathway notably impacted. A significant upregulation of Phlda3 and its effectors Bax, Aen, Mdm2, and Cdkn1a, all involved in the p53/Akt signaling pathway, was present ( Figures 5C, E) (103, 104). Of relevance, we also detected Trp53cor1/LincRNA-p21, responsible for negative regulation of gene expression (105), upregulated in  endothelial cells at 3 DPI. Another example of an upregulated gene relevant to inflammation and stress response in endothelial cells is Mmrn2, responsible for regulating angiogenesis in endothelial cells (106). In brain immune cells, we observe a few substantial changes in genes pertaining to immune regulation at 3 DPI for microglia and at 25 DPI for perivascular macrophages. For example, we observe an upregulation of MHC-I gene H2-K1 at 3 DPI in microglia, which then stabilizes back to control levels at 25 DPI ( Figure 5C). Ifitm3 and Slfn2, genes implicated in type I interferon response (107,108), also show upregulation at 3 DPI in microglia. Cd74, a chaperon responsible for regulating antigen presentation during immune response, was upregulated in perivascular macrophages at 3 DPI (109). We did not observe significant differences in pro-inflammatory chemokines, Ccl2 and Ccl5, which are related to breakdown of the blood-brain barrier via regulation of tight-junction proteins and recruitment of peripheral leukocytes (110). Ccl3, responsible for infiltration of leukocytes and CNS inflammation (111), was upregulated in perivascular macrophages in 25 DPI ( Figure 5C).
We found that neurons had only a few differentially expressed genes at 25 DPI. Immediate early genes, such as Fos and Junb were upregulated in inhibitory neurons, while genes involved in modulating cell proliferation, such as Tafa1 and S1pr1, were downregulated at 25 DPI (79,112).
By investigating the gene expression differences in subpopulations of cells post-injection, we found that endothelial cells upregulate genes linked to p53 signaling at 3 DPI ( Figure 5D) which all return to control levels at 25 DPI. Immune cells such as microglia and perivascular macrophages upregulate genes involved in type I interferon response, MHC-I antigen processing, and chemokine signaling ( Figure 5D). Inhibitory neurons display a subtle effect, consisting of differential expression of genes involved in stress response and cell proliferation at 25 DPI.

Larger Pools of Barcoded AAVs Recapitulate Complex Tropism Within a Single Animal
To showcase the capabilities of parallel characterization, we next designed a 7-variant barcoded pool that included the three previously characterized variants (AAV-PHP.eB, AAV-CAP-B10, and AAV-PHP.V1), AAV9 and AAV-PHP.B controls, and two additional variants, AAV-PHP.C1 and AAV-PHP.C2. For simplification of cloning and virus production, we designed a plasmid, UBC-mCherry-AAV-cap-in-cis, that contained both the barcoded cargo, UBC-mCherry, and the AAV9 capsid DNA (Supplemental Figure 1B). We assigned three distinct 24 bp barcodes to each variant (Supplemental Table 5). Each virus was produced separately to control the dosage, and 1.5 x 10 11 vg of each variant was pooled and injected into a single animal.
After 3 weeks of expression, we performed single-cell sequencing on extracted cortical tissue. To increase the number of cells available for profiling, we processed two aliquots of cells, for a total of 36,413 recovered cells. To amplify the viral transcripts, we used primers that bind near the 3' end of mCherry such that the barcode was captured in sequencing (Supplemental Table 1).
Using our cell typing and viral transcript counting methods, we investigated the transcript counts and transduction bias of the variants in the pool. Compared with our previous profiling experiments, the log-transformed transcript abundance of UBC-mCherry detected per cell was lower than CAG-mNeonGreen-WPRE and CAG-tdTomato (adjusted P < 0.0001, P=0.0767, respectively, two-way ANOVA with multiple comparison correction using Tukey's method) and shifted towards vascular cells (adjusted P < 0.0001, P=0.0004, respectively, two-way ANOVA with multiple comparison correction using Tukey's method) (Supplemental Figures 5B, C). Next, we looked at the transduction rate difference for each variant compared with the rest of the variants in the pool for each cell type in our taxonomy ( Figures 6A, B). Despite the lower expression rate and bias shift, the transduction rate difference metric captured the same tropism biases for AAV.CAP-B10 and AAV-PHP.V1 as determined from our previous experiments. AAV.CAP-B10 showed enhanced neuronal targeting relative to other variants in the pool, with this bias coming specifically from an increase in the transduction of glutamatergic neurons. All five variants with transcripts detected in neurons showed a decreased transduction rate in Vip+ neurons relative to other GABAergic neuronal subtypes (Supplemental Figure 7C). AAV-PHP.eB showed enhanced targeting of astrocytes (+5.9%, P = 3.0 x 10 -10 , 2-proportion z-test, multiple comparison corrected with Benjamini/Hochberg correction), and AAV-PHP.V1 showed strong bias for vascular cells (+49.7%, p = 1.7 x 10 -45 ). In addition to confirming all our existing hypotheses, we were able to identify biases for the previously reported AAV-PHP.C2, which has not been characterized in depth. This variant, which was reported as having a non-neuronal bias similar to AAV-PHP.V1, showed significant transduction bias not only toward vascular cells (+15.7%, P = 1.5 x 10 -7 ), but also toward astrocytes (+21.5%, P = 3.0 -28 ), and a bias away from neurons (−38%, p = 4.5 x 10 -32 ).

DISCUSSION
The advent of NGS has enabled screening of large libraries of AAV capsids in vivo by extracting viral DNA from relevant tissue followed by sequencing of capsid gene inserts or DNA barcodes corresponding to defined capsids. To date, NGS-based screening has been successfully applied to libraries created by peptide insertions (28,113), DNA shuffling of capsids (114)(115)(116), and site-directed mutagenesis (117). Although these NGS-based strategies allow the evolution of new AAV variants with diverse tissue tropisms, it has been difficult to obtain a comprehensive profiling for multiple variants across cell types, which is of utmost importance in organs with complex cell-type compositions, such as the brain (34,45,(64)(65)(66). Towards this end, techniques such as IHC, fluorescent in situ RNA hybridization (98,(118)(119)(120)(121)(122) or in situ RNA sequencing (123)(124)(125) can be employed. Several limitations make it challenging to apply these techniques as high-throughput, postselection AAV tropism profiling methods. First, the limits of optical resolution and the density of transcripts in single cells pose challenges for full in situ transcriptome analysis and, until recently, have restricted the total number of simultaneously measured genes in single cells within tissue to several hundred (121,(123)(124)(125)(126). By contrast, scRNA-seq with the 10x Genomics Chromium system enables detection of over 4000 genes per cell (95), fast transcriptomic analysis, and multiplexing across different tissue types (127,128). Furthermore, the method is already widely used by the research community which can help with adoption of our proposed pipelines. Although droplet-based scRNA-seq methods lose spatial information during the dissociation procedure, analysis packages have been developed that can infer single-cell localization by combining scRNA-seq data with pre-existing information from ISH-based labeling for specific marker genes (129)(130)(131)(132)(133)(134). Therefore, scRNA-seq techniques have great potential to rapidly profile the tropism of multiple AAV variants in parallel across several thousand cells defined by their entire transcriptome.
Here, we established an experimental and data-analysis pipeline that leverages the capabilities of scRNA-seq to achieve simultaneous characterization of several AAV variants across multiplexed tissue cell types within a single animal. To differentiate multiple AAV capsid variants in the sequencing data, we packaged variants with unique transgenes or the same transgene with unique barcodes incorporated at the 3' end. We added standard Illumina sequencing primer recognition sites (Read 2) to the viral transcripts using PCR amplification such that the barcoded region could be consistently read out from the Illumina sequencing data. Our computational pipeline demultiplexes viral reads found in the transcriptome according to which matching sequence is most abundant in a separate amplified viral transgene library. Comparing the distribution of viral transcripts by cell type to a null model of empty droplets, we could then determine the cell-type biases.
Our platform has corroborated the tropism of several previously characterized AAV variants and has provided more detailed tropism information beyond the major cell types. The fraction of transduced cells that are glutamatergic neurons was found to be markedly reduced for AAV-PHP.eB when compared with AAV.CAP-B10. Furthermore, within all the variants we tested, both Pvalb+ and Sst+ inhibitory neurons have greater transduction rates than Vip+ neurons. This bodes well for delivery to Pvalb+ neurons, which have been implicated in a wide range of neuro-psychiatric disorders (135), and suggests Vip+ interneurons, which have recently been identified as being a sufficient delivery target for induction of Rett syndrome-like symptoms, as a target for optimization (136). Awareness of neuronal subtype biases in delivery vectors is critical both for neuroscience researchers and for clinical applications. Dissection of neural circuit function requires understanding the roles of neuronal subtypes in behavior and disease and relies on successful and sometimes specific delivery of transgenes to the neuronal types under study (1). We further discovered that the vascular bias of AAV-PHP.V1 originates from its transduction bias towards endothelial cells. Interestingly, this is the only cell type we detected expressing Ly6a (Supplemental Figure 8), a known surface receptor for AAV variants in the PHP.B family (137)(138)(139). Given AAV-PHP.V1's sequence similarity to AAV-PHP.B and its tropism across mouse strains, this pattern suggests that AAV-PHP.V1 transduction may also be Ly6a-mediated. Finding such associations between viral tropism and cell-surface membrane proteins also suggests that full transcriptome sequencing data may hold a treasure trove of information on possible mechanisms of transduction of viral vectors.
We also revealed that AAV-PHP.C2 has a strong, broad nonneuronal bias toward both vascular cells and astrocytes. AAV-PHP.C2 also transduces BALB/cJ mice, which do not contain the Ly6a variant that mediates transduction by PHP.B family variants (137)(138)(139). This suggests that PHP.C2 may be the most promising candidate from this pool for researchers interested in delivery to non-neuronal cells with minimal neuronal transduction both in C57BL/6J mice and in strains and organisms that do not have the Ly6a variant.
All our tested variants with non-neuronal transduction have lower expression in Myoc+ astrocytes and pericytes. Astrocytes expressing Myoc and Gfap, which intersect in our data (Supplemental Figure 8), have been previously identified as having reactive behavior in disease contexts, making them a target of interest for research on neurological diseases (140,141). Similarly, pericytes, whose dysfunction has been shown to contribute to multiple neurological diseases, may be an important therapeutic target (60,142,143). Both of these cell types may be good candidates for further AAV optimization but may have been missed with marker gene-based approaches. In both AAV characterization and neuroscience research efforts, different marker genes are often used for astrocyte classificationsometimes more restrictive genes such as Gfap, and other times more broadly expressing genes such as S100b or Aldh1l1 (144,145). Similarly, defining marker genes for pericytes is still an active field (100, 102). Given the constraints of having to choose specific marker genes, it is difficult for staining-based characterizations to provide tropism profiles that are relevant for diverse and changing research needs. This highlights the importance of using unbiased, full transcriptome profiling for vector characterization.
We have shown that our combined experimental and computational platform is able to recover transduction biases and profile multiple variants in a single animal, even amidst the noise of ambient RNA. We have further shown that our method is robust to the variability inherent in delivery and extraction from different animals, with different transgenes, and with different regulatory elements. For example, we discovered lower overall expression from vectors carrying UBC-mCherry compared with CAG-mNeonGreen-WPRE. Such differences are not surprising since the WPRE is known to increase RNA stability and therefore transcript abundance (146). Furthermore, the shift in cell-type bias may come from the UBC promoter, as even ubiquitous promoters such as CAG and UBC have been shown to have variable levels of expression in different cell types (147). Despite these biases, looking at the differences in transduction between variants delivering the same construct within an individual animal reveals the strongest candidate vectors for on-target and off-target cell types of interest. While we show that our method can profile AAVs carrying standard fluorescent cargo, caution is needed when linking differences in absolute viral tropism to changes in capsid composition alone without considering the contribution of the transgene, regulatory elements, and distribution of cell types in recovered tissue. Therefore, for more robust and relative tropism between variants, we found it beneficial to use small barcodes and co-injections of pools of vectors. Our scRNA-seq-based approach is not restricted to profiling capsid variants but can be expanded in the future to screen promoters (148)(149)(150), enhancers (151,152), or transgenes (86,153), all of which are essential elements requiring optimization to improve gene therapy.
Finally, we have used scRNA-seq to understand how intraorbital administration of AAV-PHP.eB affects the host cell transcriptome across distinct time points. Results from our study show genes pertaining to the p53 pathway in endothelial cells are differentially expressed 3 days after injection, an effect which vanishes at a later time point of 25 DPI. Though other cell types such as immune cells and neurons had a few differentially expressed genes pertaining to antigen presentation and cell differentiation, respectively, endothelial cells at 3 DPI are the only cells with a profound response signature. The highest number of differentially expressed genes being in endothelial cells suggests that vascular cells could be the initial responders to viral transduction and expression of the transgene. This is supported by Kodali et al., who have shown that endothelial cells are the first to elicit a response to peripheral inflammatory stimulation by transcribing genes for proinflammatory mediators and cytokines (154). With regards to p53 differentially expressed genes, Ghouzzi, et al. have also shown that the genes Phdla3, Aen, and Cdkn1a were upregulated in cells infected with ZIKA virus, signifying genotoxic stress and apoptosis induction (104). Upregulation of genes such as Bax and Cdkn1a could be a response to cellular stress induced by viral transduction (103,155). However, the initial inflammatory responses did not escalate as shown by the low number of differential expressed genes across all cells ( Figure 5B) at day twenty-five. Additionally, antigen presenting genes, such as Cd74 and H2-K1, returning back to control expression levels in microglia and a lack of proinflammatory cytokines being upregulated support that the event of infiltration of peripheral leukocytes is unlikely, in agreement with prior studies (87,88). Based on prior studies, the few genes that are differentially expressed at day 25 in excitatory and inhibitory neurons could also be due to transgene expression rather than the virion (89). Upregulation of immediate early genes such as Fos, Junb, and Ier2 in inhibitory neurons could indicate that the cells which are transduced and expressing viral transcripts could be under increased stress and metabolic demands, either directly in response to transgene expression, or in combination with the stresses of dissociation. For example, c-jun and c-fos were found to be upregulated by lung epithelial cells as part of the response to measles virus (156). Given that scRNA-seq is a sensitive technique, it can be prone to high false discovery rates if not properly controlled. To account for this, we used a highly conservative pseudo-bulk differential expression procedure, which has been shown to minimize false discovery rates compared to other batch correction methods (157). This conservative procedure, however, has lower relative power, and thus there may be additional effects in cell subtypes to AAV transduction. The confidence and statistical power of future scRNA-seq studies looking at AAV-related immune responses could be improved with increased sample size, such as via sample multiplexing strategies to pool multiple animals (127), or by increasing the sensitivity and specificity of viral transcript detection and performing differential expression within animals. It is also important to note that the findings discussed here are specific to the rAAV, transgene, and dosage. Nonetheless, our results highlight the power of single-cell profiling in being able to ascertain cell-type-specific responses at an early time point post-injection.
As shown throughout this work, there are several challenges we had to overcome to gain valuable insights from our dropletbased single-cell RNA sequencing approach. While we were able to overcome these in the context of our study, they hint at some important limitations of this method. First, droplet-based singlecell sequencing of tissues that are difficult to dissociate, such as brain, can lead to substantial background noise from debris. Alternative methods, such as single-nucleus RNA sequencing (158,159), could potentially overcome this debris problem. Exploratory work would need to be performed to determine whether single-nucleus RNA sequencing captures a sufficient amount of immature viral transcripts, but, if effective, may obviate the need for computational detection of transduction above a background level. Another potential challenge of our method is scaling up to much larger numbers of variants. In order to establish high statistical confidence in tropism, many cells need to be transduced. However, given restrictions on the total dosage an animal can receive, adding more variants would require a lower dosage per variant. In simulation, we found that subsampling our 2-variant pool by 10-fold did not change the major tropism findings (Supplemental Figure 9). Given our current injections are 8-fold lower than the maximum allowed dosage, this suggests this method could scale up to 80 variants; however, further work would need to be done to validate whether this holds for a diversity of variants that may be competing for binding. Scaling higher would be challenging with current droplet-based single-cell RNA sequencing pipelines that process on the order of 10 4 cells per reaction. Alternative approaches, such as split-pool strategies, which can profile many more cells (160), may thus be appealing for larger variant pools.
In summary, our platform enables thorough tropism characterization of existing and emerging recombinant AAVs and helps uncover cellular responses to rAAV-mediated gene therapy, thus further guiding the engineering and use of gene delivery vehicles.

Animals
Male C57BL/6J mice (Stock No: 000664) used in this study were purchased from the Jackson Laboratory (JAX). AAV variants were injected i.v. into the retro-orbital sinus of 6-7 week old mice.

Plasmids
In vivo vector characterization of AAV variant capsids was conducted using single-stranded (ss) rAAV genomes. pAAV : CAG-NLS-mNeonGreen, pAAV : CAG-NLS-mRuby2, pAAV : CAG-tdTomato, and pAAV : CAG-NLS-tdTomato constructs were adapted from previous publications (32,45). To introduce barcodes into the polyA region of CAG-NLS-mNeonGreen, we digested the plasmid with BglII and EcoRI, and performed Gibson assembly (E2611, NEB) to insert synthesized fragments with 7bp degenerate nucleotide sequences 89 bp upstream of the polyadenylation site. We then seeded bacterial colonies and selected and performed Sanger sequencing on the resulting plasmids to determine the corresponding barcode.
The UBC-mCherry-AAV-cap-in-cis plasmid was adapted from the rAAV-Cap-in-cis-lox plasmid from a previous publication (34). We performed a restriction digest on the plasmid with BsmbI and SpeI to remove UBC-mCherry and retain the AAV9 cap gene and remaining backbone. We then circularized the digested plasmid using a gblock joint fragment to get a plasmid containing AAV2-Rep, AAV9-Cap, and the remaining backbone via T4 ligation. In order to insert UBC-mCherry with the desired orientation and location, we amplified its linear segment from the original rAAV-Cap-in-cis-lox plasmid. The linear UBC-mCherry-polyA segment and circularized AAV2-Rep,AAV9-cap plasmid were then both digested with HindIII and ligated using T4 ligation. In order to get the SV40 PolyA element in the proper orientation with respect to the inserted UBC-mCherry, we removed the original segment from the plasmid using AvrII and AccI enzymes and inserted AvrII, AccI treated SV40 gblock using T4 ligation to get the final plasmid.
To insert barcodes into UBC-mCherry-AAV-cap-in-cis, we obtained 300 bp DNA fragments containing the two desired capsid mutation regions for each variant and the variant barcode, flanked by BsrGI and XbaI cut sites. The three segments of the fragment were separated by BsaI Type I restriction sites.
We digested the UBC-mCherry-AAV-cap-in-cis plasmid with BsrGI and XbaI, and ligated each variant insert to this backbone. Then, to reinsert the missing regions, we performed Golden Gate assembly with two inserts and BsaI-HF.

Tissue Processing for Single-Cell Suspension
Three to four weeks after the injection, mice (9-10 weeks old) were briefly anesthetized with isoflurane (5%) in an isolated plexiglass chamber followed by i.p. injection of euthasol (100 mg/ kg). The following dissociation procedure of cortical tissue into a single-cell suspension was adapted with modifications from a previous report (162). Animals were transcardially perfused with ice-cold carbogenated (95% O 2 and 5% CO 2 ) NMDG-HEPES-ACSF (93 mM NMDG, 2.5 mM KCl, 1.2 mM NaH 2 PO 4 , 30 mM NaHCO 3 , 20 mM HEPES, 25 mM glucose, 5 mM Na Lascorbate, 2 mM thiourea, 3 mM Na-pyruvate, 10 mM MgSO 4 , 1 mM CaCl 2 , 1 mM kynurenic acid Na salt, pH adjusted to 7.35 with 10N HCl, osmolarity range 300-310 mOsm). Brains were rapidly extracted and cut in half along the anterior-posterior axis with a razor blade. Half of the brain was used for IHC histology while the second half of the brain was used for scRNA-seq. Tissue used for scRNA-seq was immersed in ice-cold NMDG-HEPES-ACSF saturated with carbogen. The brain was sectioned into 300-mm slices using a vibratome (VT-1200, Leica Biosystems, IL, USA). Coronal sections from Bregma −0.94 mm to −2.80 mm were collected in a dissection dish on ice containing NMDG-HEPES-ACSF. Cortical tissue from the dorsal surface of the brain to~3.5 mm ventral was cut out and further sliced into small tissue pieces. NMDG-HEPES-ACSF was replaced by trehalose-HEPES-ACSF (92 mM NaCl, 2.5 mM KCl, 1.2 mM NaH 2 PO 4 , 30 mM NaHCO 3 , 20 mM HEPES, 25 mM glucose, 2 mM MgSO 4 , 2 mM CaCl 2 , 1 mM kynurenic acid Na salt, 0.025 mM D-(+)-trehalose dihydrate*2H 2 O, pH adjusted to 7.35, osmolarity ranging 320-330 mOsm) containing papain (60 U/ml; P3125, Sigma Aldrich, pre-activated with 2.5 mM cysteine and a 0.5-1 h incubation at 34°C, supplemented with 0.5 mM EDTA) for the enzymatic digestion. Under gentle carbogenation, cortical tissue was incubated at 34°C for 50 min with soft agitation by pipetting every 10 min. 5 ml 2500 U/ml DNase I (04716728001 Roche, Sigma Aldrich) was added to the single-cell suspension 10 min before the end of the digestion. The solution was replaced with 200 ml trehalose-HEPES-ACSF containing 3 mg/ml ovomucoid inhibitor (OI-BSA, Worthington) and 1 ml DNase I. At room temperature, the digested cortical tissue was gently triturated with fire-polished glass Pasteur pipettes for three consecutive rounds with decreasing pipette diameters of 600, 300, and 150 mm. 800 ml of trehalose-HEPES-ACSF with 3 mg/ml ovomucoid inhibitor was added. The uniform single-cell suspension was pipetted through a 40 mm cell strainer (352340, Falcon) into a new microcentrifuge tube followed by centrifugation at 300 g for 5 min at 4°C. The supernatant was discarded and cell pellet was resuspended in 1 ml of trehalose-HEPES-ACSF. After mixing using a Pasteur pipette with a 150 mm tip diameter, the single-cell suspension was centrifuged again. Supernatant was replaced with fresh trehalose-HEPES-ACSF and the resuspended cell pellet was strained with a 20 mm nylon net filter (NY2004700, Millipore). After resuspension in trehalose-HEPES-ACSF, cells were pelleted again and resuspended in 100 ml of ice-cold resuspension-ACSF (117 mM NaCl, 2.5 mM KCl, 1.2 mM NaH 2 PO 4 , 30 mM NaHCO 3 , 20 mM HEPES, 25 mM glucose, 1 mM MgSO 4 , 2 mM CaCl 2 , 1 mM kynurenic acid Na salt and 0.05% BSA, pH adjusted to 7.35 with Tris base, osmolarity range 320-330 mOsm). Cells were counted with a hemocytometer and the final cell densities were verified to be in the range of 400-2,500 cells/ml. The density of single-cell suspension was adjusted with resuspension-ACSF if necessary.

Transcriptomic Library Construction
Cell suspension volumes containing 16,000 cells-expected to retrieve an estimated 10,000 single-cell transcriptomes-were added to the 10x Genomics RT reaction mix and loaded to the 10x Single Cell Chip A (230027, 10x Genomics) for 10x v2 chemistry or B (2000168, 10x Genomics) for 10x v3 chemistry per the manufacturer's protocol (Document CG00052, Revision F, Document CG000183, Revision C, respectively). We used the Chromium Single Cell 3' GEM and Library Kit v2 (120237, 10x genomics) or v3 (1000075, 10x Genomics) to recover and amplify cDNA, applying 11 rounds of amplification. We took 70 ng to prepare Illumina sequencing libraries downstream of reverse transcription following the manufacturer's protocol, applying 13 rounds of sequencing library amplification.

Viral Library Construction
We selectively amplified viral transcripts from 15 ng of cDNA using a cargo-specific primer binding to the target of interest and a primer binding the partial Illumina Read 1 sequence present on the 10x capture oligos (Supplemental Table 1). For animals injected with a single cargo, amplification was performed only once using the primer for the delivered cargo; for animals with distinct cargo sequences per variant, amplification was performed in parallel reactions from the same cDNA library using different cargo-specific primers for each reaction. We performed the amplification using 2x KAPA HiFi HotStart ReadyMix (KK2600) for 28 cycles at an annealing temperature of 53°C. Afterwards, we performed a left-sided SPRI cleanup with a concentration dependent on the target amplicon length, in accordance with the manufacturer's protocol (SPRISelect, Beckman Coulter B23318). We then performed an overhang PCR on 100 ng of product with 15 cycles using primers that bind the cargo and the partial Illumina Read 1 sequence and appending the P5/P7 sequences and Illumina sample indices. We performed another SPRI cleanup, and analyzed the results via an Agilent High Sensitivity DNA Chip (Agilent 5067-4626).

Sequencing
Transcriptome libraries were pooled together in equal molar ratios according to their DNA mass concentration and their mean transcript size as determined via bioanalyzer. Sequencing libraries were processed on Novaseq 6000 S4 300-cycle lanes. The run was configured to read 150 bp from each end. Sequencing was outsourced to Fulgent Genetics and the UCSF Center for Advanced Technology.
All viral transcript libraries except barcoded UBC-mCherry were pooled together in equal molar ratios into a 4 nM sequencing library, then diluted and denatured into a 12 pM library as per the manufacturer's protocol (Illumina Document #15039740v10). The resulting library was sequenced using a MiSeq v3 150-cycle reagent kit (MS-102-3001), configured to read 91 base pairs for Read 2 and 28 base pairs for Read 1. To characterize the effect of sequencing depth, one viral transcript library was additionally processed independently on a separate MiSeq run.
The UBC-mCherry viral transcript library, which was recovered with primers near the polyadenylation site, consisted of fragments~307 bp long. Since this length is within the common range for an Illumina NovaSeq run, this viral transcript library was pooled and included with the corresponding transcriptome library.

Transcriptome Read Alignment
For transcriptome read alignment and gene expression quantification, we used 10x Cell Ranger v5.0.1 with default options to process the FASTQ files from the transcriptome sequencing library. The reads were aligned against the mus musculus reference provided by Cell Ranger (mm10 v2020-A, based on Ensembl release 98).
To detect viral transcripts in the transcriptome, we ran an additional alignment using 10x Cell Ranger v5.0. 1 with a custom reference genome based on mm10 v2020-A. We followed the protocol for constructing a custom Cell Ranger reference as provided by 10x Genomics. This custom reference adds a single gene containing all the unique sequences from our delivered plasmids in the study, delineated as separate exons. Sequences that are common between different cargo are provided only once, and annotated as alternative splicings.

Viral Transcript Read Alignment
For viral read alignment, we aligned each Read 2 to a template derived from the plasmid, excluding barcodes. The template sequence was determined by starting at the ATG start site of the XFP cargo and ending at the AATAAA polyadenylation stop site. We used a Python implementation of the Striped Smith-Waterman algorithm from scikit-bio to calculate an alignment score for each read, and normalized the score by dividing by the maximum possible alignment score for a sequence of that length, minus the length of the barcode region. For each Read 2 that had a normalized alignment score of greater than 0.7, we extracted the corresponding cell barcode and UMI from Read 1, and any insertions into the template from Read 2.

Constructing the Variant Lookup Table
For co-injections with multiple templates and injections of barcoded templates, we constructed a lookup table to identify which variant belongs to each cell barcode/UMI. For each template, we counted the number of reads for each cell barcode/UMI. For reads of barcoded cargo, we only counted reads where the detected insertion in the barcode region unambiguously aligned to one of the pre-defined variant barcodes. Due to sequencing and PCR amplification errors, most cell barcode/UMI combinations had reads associated with multiple variants. Thus, we identified the variant with the largest count for each cell barcode/UMI. We discarded any cell barcode/UMIs that had more than one variant tied for the largest count. Finally, each cell barcode/UMI that was classified as a viral transcript in the transcriptome (see Transcriptome Read Alignment) was converted into the virus detected in the variant lookup table, or was discarded if it did not exist in the variant lookup table.

Estimating Transduction Rate
To determine an estimate of the percent of cells within a group expressing viral cargo above background, we compared the viral transcript counts in that group of cells to a background distribution of viral transcript counts in debris (see Droplet Type Classification). First, we obtained the empirical distribution of viral transcript counts by extracting the viral counts for that variant in cell barcodes classified as the target cell type as well as cell barcodes classified as debris. Next, we assumed a percentage of cells containing debris. For each viral transcript count, starting at 0, we calculated the number of cells that would contain this transcript count, if the assumed debris percentage was correct. We then calculated an error between this estimate and the number of cells with this transcript count in the cell type of interest. We tallied this error over all the integer bins in the histogram, allowing the error in a previous bin to roll over to the next bin. We repeated this for all possible values of percentage of debris from 0 to 100 in increments of 0.25, and the value that minimized the error was the estimated percentage of cells whose viral transcript count could be accounted for by debris. The inverse of this was our estimate of the number of cells expressing viral transcripts above background.
To validate that this method reliably recovers an estimate of transduction rate, we performed a series of simulations using models of debris viral transcript counts added to proposed cell type transcript count distributions across a range of parameterizations. To get estimates of the background distribution of debris, we used diffxpy (https://github.com/ theislab/diffxpy) to fit the parameters of a negative binomial distribution to the viral transcript counts in debris droplets within a sample. We then postulated 1,000 different parameterizations of the negative binomial representing transcript counts in groups of cells, with 40 values of r ranging from 0.1 to 10, spaced evenly apart, and 25 values of p ranging from 0.001 to 0.99, spaced evenly apart. For each proposed negative binomial model, we drew 1,000 random samples of viral counts from the learned background distribution, and 1,000 random samples from the proposed cell distribution, and summed the two vectors. This summed vector was then used in our transduction rate estimation function, along with a separate 1,000 random samples of background viral transcripts for the function to use as an estimate of the background signal. We calculated the true probability of non-zero expression in our proposed cell negative binomial model (1 -P(X = 0)), and compared this value with the estimated value from the transduction rate estimation method.

Calculating Viral Tropism
For each variant v n and cell type of interest c i , we estimated the percentage of cells expressing viral cargo. To calculate tropism bias, we used this estimated expression rate, t c i ,v n , to estimate the number of cells expressing viral transcripts in that cell type, T c i ,v n out of the total number of cells of that type, N c i : T c i ,v n = t c i ,v n N c i . Cell type bias, b c i ,v n , within a sample was then calculated as the ratio of the number of cells of interest divided by the total number of transduced cells, b c i ,v n = T c i ,vn S j T c j ,vn . Finally, to calculate the difference in transduction bias for a particular variant relative to other variants in the sample, d c i ,v n , we subtracted the bias of the variant from the mean bias across all other variants,

Fluorescent In Situ Hybridization Chain Reaction
FISH-HCR was conducted as previously reported (99). Probes targeting neuronal markers were designed using custom-written software (https://github.com/GradinaruLab/HCRprobe). Probes contained a target sequence of 20 nucleotides, a spacer of 2 nucleotides, and an initiator sequence of 18 nucleotides. Criteria for the target sequences were: (1) a GC content between 45%-60%, (2) no nucleotide repeats more than three times, (3) no more than 20 hits when blasted, and (4) the DG had to be above -9 kcal/mol to avoid self-dimers. Last, the full probe sequence was blasted and the Smith-Waterman alignment score was calculated between all possible pairs to prevent the formation of cross-dimers. In total, we designed 26 probes for Gad1, 20 probes for Vip, 22 probes for Pvalb, 18 probes for Sst, and 28 probes for Slc17a7. Probes were synthesized by Integrated DNA Technologies.

Droplet Type Identification
scRNA-seq datasets were analyzed with custom-written scripts in Python 3.7.4 using a custom fork off of scVI v0.8.1, and scanpy v1.6.0. To generate a training dataset for classifying a droplet as debris, multiplets, neuronal, or non-neuronal cells, we randomly sampled cells from all 27 cortical tissue samples. We sampled a total of 200,000 cells, taking cells from each tissue sample proportional to the expected number of cells loaded into the single-cell sequencing reaction. Within each sample, cells were drawn randomly, without replacement, weighted proportionally by their total number of detected UMIs. For each sample, we determined a lower bound on the cutoff between cells and empty droplets by constructing a histogram of UMI counts per cell from the raw, unfiltered gene count matrix. We then found the most prominent trough preceding the first prominent peak, as implemented by the scipy peak_prominences function. We only sampled from cells above this lower bound. Using these sampled cells, we trained a generative neural network model via scVI with the following parameters: 20 latent features, 2 layers, and 256 hidden units. These parameters were chosen from a coarse hyperparameter optimization centered around the scVI default values (Supplemental Table 3). We included the sample identifier as the batch key so that the model learned a latent representation with batch correction.
After training, Leiden clustering was performed on the learned latent space as implemented by scanpy. We used default parameters except for the resolution, which we increased to 2 to ensure isolation of small clusters of cell multiplets. Using the learned generative model, we draw 5000 cells from the posterior distribution based on random seed cells in each cluster. We draw an equal number conditioned on each batch. From these samples, we then calculated a batch-corrected probability of each cluster expressing a given marker gene (see Cluster Marker Gene Determination). For this coarse cell typing, we chose a single marker gene for major cell types expected in the cortex (Supplemental Table 2). If a cluster was expressing the neuron marker gene Rbfox3, it was labeled as "Neurons". If a cluster was expressing any of the other non-neuronal marker genes, it was labelled as "Non-neurons". Next, we ran Scrublet on the training cells to identify potential multiplets. Scrublet was run on each sample independently, since it is not designed to operate on combined datasets with potential batch-specific confounds. We then calculated the percentage of droplets in each cluster of the combined data that were identified as multiplets by Scrublet. We found a percentage threshold for identifying a cluster as containing predominantly multiplets by using Otsu's threshold, as implemented by scikit-image. All droplets in any cluster above the multiplet percentage threshold were labelled as "Multiplets". All other clusters were labelled as "Debris".
Next, we trained a cell-type classifier using scANVI on the droplets labeled as training data. We used the weights from the previously trained scVI model as the starting weights for scANVI. Rather than using all cells for every epoch of the trainer, we implemented an alternative sampling scheme that presented each cell type to the classifier in equal proportions. Once the model was trained, all cells above the UMI lower noise bound were run through the classifier to obtain their cell-type classification. Droplets classified as "Neurons" or "Non-neurons" were additionally filtered by their scANVI-assigned probability. We retained only cells above an FDR threshold of 0.05, corrected for multiple comparisons using the Benjamini-Hochberg procedure. Finally, since the original run of Scrublet for multiplet detection was performed on only the training data, and thus did not take advantage of all the cells available, we ran Scrublet on all droplets classified as cells, and removed any identified multiplets.

Cluster Marker Gene Determination
To identify which clusters are expressing marker genes, we determined an estimated probability of a marker gene being expressed by a random cell in that cluster. For each cluster, we randomly sampled 5,000 cells, with replacement. We used scVI to project each cell into its learned latent space, and then used scVI's posterior predictive sampling function to generate an example cell from this latent representation, and tallied how many times the gene is expressed. We repeated this for each batch, conditioning the posterior sample on that batch, to account for technical artifacts such as sequencing depth. Once we obtained a probability of expression of a marker gene for each cluster, we find a threshold for expression using Otsu's method, as implemented by scikit-image. Clusters that have a probability of expression above the threshold are considered positive for that marker gene.

Neuronal Subtype Classification
Cells classified as neurons were further subtyped using annotations from a well-curated reference dataset. We used the Mouse Whole Cortex and Hippocampus 10x dataset from the Allen Institute for Brain Science as our reference dataset (95). First, we filtered the reference dataset to contain only cell types that are found within the brain regions collected for our experiments. To ensure that, overall, enough cells per cell type were present in our datasets, we merged cell types with common characteristics, such as expression of key marker genes. We realigned our cell transcriptome reads to the same pre-mRNA reference used to construct the reference dataset, so that the gene count matrices had a 1:1 mapping. We then trained a joint scANVI model with all cells identified as neurons from our samples and the reference database to learn a common latent space between them. The model was trained to classify cells based on the labels provided in the reference dataset. Cells were sampled from each class in equal proportions during training. After the model was trained, all neurons from our sample were run through the model to obtain their cell type classification.

Non-Neuronal Subtype Classification
Cells classified as non-neuronal were further subtyped using automatic clustering and marker gene identification. We trained an scVI model using only the non-neuronal cells and performed Leiden clustering as implemented by scanpy on the latent space. We determined which clusters were expressing each of 31 marker genes across 13 cell subtypes. Marker genes were identified from a review of existing scRNA-seq, bulk RNA-seq, or IHC studies of mouse brain non-neuronal subtypes (Supplemental Table 2). Each cluster was assigned to a cell subtype if it was determined positive for all the marker genes for that cell subtype (see Cluster Marker Gene Determination). If a cluster contained all the marker genes for multiple cell subtypes, the cluster was assigned to the cell subtype with the greatest number of marker genes. Clusters that did not express all the marker genes for any cell subtype were labeled as "Unknown". Clusters that expressed all the marker genes for multiple cell subtypes with the same total number of marker genes were labeled as "Multiplets". For cell types that contained multiple clusters, we then calculated the probability of every gene being zero in each cluster (see Cluster Marker Gene Determination). We then compared gene presence between clusters of the same cell type to see if there were any subclusters that had a dominant marker gene (present in > 50% of samples), that was not present in any of the other clusters (< 10% of samples). For the three cell types that had unique marker genes, we named the cluster after the gene with the highest 2-proportion z-score between the sampled gene counts in that cluster vs the rest.

Quantification of Images
Quantitative data analysis of confocal images was performed blind with regard to AAV capsid variant. Manual quantification was performed using the Cell Counter plugin, present in the Fiji distribution of ImageJ (National Institutes of Health, Bethesda, MD) (164). Transduction rate was calculated as the total number of double positive cells (i.e. viral transgene and cell type marker) divided by the total number of cell type marker labeled cells. For each brain slice, at least 100 cells positive for the gene markers of interest were counted in the cortex.

Differential Expression
To calculate differential expression within cell types between groups of animals, we used the DESeq2 R package (165). For each cell type, the gene counts are summed across all cells of that type and treated as a pseudo-bulk sample. The summed gene counts from each animal are then included as individual columns for a DESeq2 differential expression analysis. We performed DE for 3 DPI and 25 DPI separately, testing each sample against saline-injected controls. For each cell type, only genes that were present in all samples of at least one condition are included.

Marker Gene Dot Plots
To generate dot plots for marker genes, we used scanpy's dotplot function (166). Gene counts were normalized to the sum of the total transcript counts per cell using scanpy's normalize_total function. Normalized gene expression values are logtransformed as part of the plotting function.

Statistics
Statistical analyses comparing the fraction of transduced cells and transduction rate in different cell types for Figures 2, 3, 4D, E and 5A were conducted using GraphPad Prism 9. Statistical analyses comparing proportions of transduced cells within an animal in Figures 4F and Figure 6 were performed using the Python statsmodels library v0.12.1. No statistical methods were used to predetermine sample sizes. The statistical test applied, sample sizes, and statistical significant effects are reported in each figure legend. The significance threshold was defined as a = 0.05.

ETHICS STATEMENT
Animal husbandry and all experimental procedures involving animals were performed in accordance with the California Institute of Technology Institutional Animal Care and Use Committee (IACUC) guidelines and reviewed and approved by the Office of Laboratory Animal Resources at the California Institute of Technology (animal protocol no. 1650).

AUTHOR CONTRIBUTIONS
DB, MA, TD, and VG conceived the project and designed the experiments. SC and MT provided critical single-cell RNA sequencing expertise. TD, MA, and DB prepared the DNA constructs and produced virus. MA performed the injections, tissue dissociation, histology, imaging and image quantification. DB and TD performed the single-cell library preparation and prepared samples for sequencing. DB and MA built the data processing pipeline. DB, MA, TD, and AW performed the analysis. All authors contributed to the MS as drafted by DB, MA, and VG. MT supervised single-cell RNA sequencing computational pipelines while VG supervised the overall project. All authors contributed to the article and approved the submitted version.