Analysis of Single Nucleotide Variants in CRISPR-Cas9 Edited Zebrafish Exomes Shows No Evidence of Off-Target Inflation

Therapeutic applications of CRISPR-Cas9 gene editing have spurred innovation in Cas9 enzyme engineering and single guide RNA (sgRNA) design algorithms to minimize potential off-target events. While recent work in rodents outlines favorable conditions for specific editing and uses a trio design (mother, father, offspring) to control for the contribution of natural genome variation, the potential for CRISPR-Cas9 to induce de novo mutations in vivo remains a topic of interest. In zebrafish, we performed whole exome sequencing (WES) on two generations of offspring derived from the same founding pair: 54 exomes from control and CRISPR-Cas9 edited embryos in the first generation (F0), and 16 exomes from the progeny of inbred F0 pairs in the second generation (F1). We did not observe an increase in the number of transmissible variants in edited individuals in F1, nor in F0 edited mosaic individuals, arguing that in vivo editing does not precipitate an inflation of deleterious point mutations.


Introduction
CRISPR-Cas9 gene editing technology has offered powerful investigative tools and opened new potential avenues for the treatment of genetic disorders. Nonetheless, like preceding technologies, the clinical implementation of CRISPR-Cas9 editing faces potential barriers. These include restricted control over the delivery and activity of the system, immune responses to the system components, and permanent alteration of unintended genomic targets (Ho et al., 2018). In cell culture systems, the alteration of off-target regions decreases precipitously with the use of stringently designed sgRNA sequences and Cas9 enzymes engineered for high specificity (Doench et al., 2016; Fu et al., 2013; Hu et al., 2018), though recent work demonstrates that precise control over the nature of editing, even at on-target sites, remains challenging (Kosicki et al., 2018). In rodents, these same factors influence the efficiency and specificity of CRISPR-Cas9 editing (Anderson et al., 2018). However, examination of atypical CRISPR-Cas9 influence on organisms remains limited; it often focuses primarily on predicted off-target sites and is not always agnostic (Varshney et al., 2015).

Here, we evaluated the incidence and transmission of off-target effects in a cohort of CRISPR-Cas9 edited zebrafish embryos derived from the same founding pair. Using 52 zebrafish embryos from a single clutch, comprising controls and embryos targeted with sgRNAs of variable on-target efficiency, we performed whole exome sequencing on the entire cohort and their genetic parents, and we measured the transmission of variants to the next generation.

Next, we co-injected each sgRNA and Cas9 protein into wild-type zebrafish embryos from the same clutch at the 1-cell stage. For each sgRNA, we harvested DNA from six edited individuals to serve as technical replicates.
In addition, we collected DNA from two individuals for each of the following conditions: uninjected, sgRNA alone, or Cas9 alone (Figure 1A). Finally, to assess the potential transmission of de novo variants to the next generation, we raised the F0 cohort for the smchd1 high-efficiency sgRNA and intercrossed adults to obtain the F1 generation. In total, we performed whole exome sequencing (WES) on two parents, 52 F0 individuals and 16 F1 individuals (Figure 1A). WES yielded 76x average target coverage in F0 samples and 115x average target coverage in F1 individuals (Figure 1B, C). The F0 sequencing data covered 83% of the exome at ≥30x and 65% at ≥50x. The F1 sequencing data covered 88% of the exome at ≥30x and 78% at ≥50x.
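The coverage figures above report mean target depth and the fraction of the exome covered at ≥30x and ≥50x. Both are simple summaries of per-base depth over the capture targets. The following is a minimal Python sketch. It assumes per-base depths have already been extracted for every targeted base. The input format and the function name `coverage_summary` are hypothetical and are not part of the original pipeline.

```python
from statistics import mean


def coverage_summary(depths, thresholds=(30, 50)):
    """Summarize per-base read depth over the capture targets.

    depths: per-base depths, one integer per targeted base (hypothetical
    input, e.g. parsed from a depth file restricted to the bait intervals).
    Returns the mean depth and, for each threshold t, the fraction of
    targeted bases covered by at least t reads.
    """
    depths = list(depths)
    if not depths:
        raise ValueError("no targeted bases supplied")
    summary = {"mean_depth": mean(depths)}
    for t in thresholds:
        covered = sum(1 for d in depths if d >= t)
        summary[f"frac_ge_{t}x"] = covered / len(depths)
    return summary


# Toy depths for eight targeted bases (illustrative values only).
print(coverage_summary([10, 25, 30, 45, 50, 80, 120, 200]))
```

In practice, the depth list would come from a per-base depth tool restricted to the bait intervals. The summary itself does not depend on how those depths were produced.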

De novo mutation counts are not inflated across the multigenerational cohort
We then returned to the F0 cohort to investigate whether variant burden outside of the targeted locus differed among individuals injected with sgRNA in the presence or absence of Cas9. Importantly, the expected allelic series of variants is reported robustly at the on-target locations of the sgRNAs against two of the target genes, anln on chromosome 19 and kmt2d on chromosome 23 (Supplementary Figure 3A) (Hall et al., 2018; Tsai et al., 2018). No on-target variants are observed for the smchd1 locus because our exome capture did not include baits for this locus in the Zv9 assembly of the zebrafish genome. However, we experimentally demonstrated the on-target editing capability of the two smchd1 sgRNAs, and the transmission of on-target variants produced by the high-efficiency sgRNA to the F1 generation, via Sanger sequencing (Supplementary Figure 3B), as described (Shaw et al., 2017).

We first considered the agnostic off-target VarScan2 variants called in the mosaic F0 generation (Suppl. Table S6). Initially, we applied the same arbitrary 0.3 AF threshold that we used with the F1 calls. We reasoned that editing occurs at the one-to-two cell stage, so off-target events would likely manifest as an inflation at high allele frequencies. We set the Bonferroni-corrected significance threshold for four groups (p<0.012). Again, we did not observe a significant inflation in de novo variant counts between control and F0 edited groups, in either the algorithmically predicted counts or the manually reviewed counts (p>0.15; Wilcoxon rank test; Suppl. Table S7). We then repeated the analysis on the agnostic MuTect2 call set. Consistent with the filtered VarScan2 data, we did not observe an inflation in de novo mutation counts between control and edited groups (p>0.04; Suppl. Table S7).
Finally, because a 0.3 AF threshold may fail to detect inefficient targeting events or lower mosaicism levels, we tested lower cutoff frequencies. At either an arbitrary 0.1 AF threshold or without applying a threshold, we still observed no significant differences (p>0.08; Suppl. Table S7).

For the VarScan2 dataset generated from F0 exomes, the variant count exceeded the to what we observed in the F1 cohort, we inspected all variants exceeding the 0.3 AF cutoff using IGV. We found that this dataset was also subject to technical artifacts similar to those observed for F1s; excluding these variants brought the de novo mutation call number within the expected range (Figure 2B). Using the same Bonferroni correction for four groups (p<0.012), we were unable to detect a difference between control and edited groups (p>0.38; Suppl. Table S8). We had observed that variants detected by both callers represented an unbiased way to assess high-confidence calls in F1. We therefore also asked whether we could detect a difference in variant counts in this subset of calls in F0 (7 of 8 unambiguous calls; Figure 2C). Again, we observed no significant differences between controls and edited groups (p>0.78; Suppl.
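The F0 comparisons above follow a common pattern:

1. Count each sample's calls above an allele-frequency (AF) cutoff (0.3, 0.1, or none).
2. Compare control and edited counts with a rank-sum test.
3. Judge the p-value against a Bonferroni-corrected threshold (0.05/4 = 0.0125 for four groups).
4. Optionally, restrict the analysis to calls reported by both VarScan2 and MuTect2.

The dependency-free Python sketch below illustrates these steps; it is not the authors' code. The `(sample_id, af)` and `(chrom, pos, ref, alt)` input formats and all function names are hypothetical. The exact enumeration is only practical for small groups. The caller intersection assumes both call sets were normalized identically, because indels in particular may otherwise be represented differently.

```python
from collections import Counter
from itertools import combinations


def count_calls(calls, af_min):
    """Count de novo calls per sample whose AF exceeds af_min.

    calls: iterable of (sample_id, allele_frequency) tuples (hypothetical format).
    """
    return Counter(sample for sample, af in calls if af > af_min)


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum_exact(x, y):
    """Exact two-sided Wilcoxon rank-sum p-value.

    Enumerates every assignment of the pooled midranks to group x and counts
    those whose rank sum deviates from its null mean at least as much as the
    observed rank sum does.
    """
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    n_x = len(x)
    null_mean = n_x * (len(pooled) + 1) / 2
    observed_dev = abs(sum(ranks[:n_x]) - null_mean)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_x):
        total += 1
        if abs(sum(ranks[i] for i in idx) - null_mean) >= observed_dev - 1e-9:
            extreme += 1
    return extreme / total


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def variant_key(chrom, pos, ref, alt):
    """Normalize a call into a hashable key (chromosome, position, alleles)."""
    return (str(chrom), int(pos), ref.upper(), alt.upper())


def shared_calls(calls_a, calls_b):
    """Sorted calls present in both call sets (one sample, two callers)."""
    keys_a = {variant_key(*call) for call in calls_a}
    keys_b = {variant_key(*call) for call in calls_b}
    return sorted(keys_a & keys_b)


# Toy per-sample counts (illustrative only, not data from this study).
control, edited = [4, 6], [5, 3, 6, 4, 5, 7]
p = rank_sum_exact(control, edited)
print(f"p = {p:.3f}; significant: {p < bonferroni_alpha(0.05, 4)}")

# Toy call sets for one sample from the two callers.
varscan = [("19", 1000, "A", "G"), ("23", 500, "C", "T")]
mutect = [(19, 1000, "a", "g"), ("1", 700, "G", "A")]
print(shared_calls(varscan, mutect))
```

The exact test is used here instead of a normal approximation because the groups in this design are small (two to six samples). With groups this small, an approximate p-value can be misleading.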

De novo mutations are not observed at predicted off-target sites
To examine the potential incidence of off-target mutations more sensitively, we removed the filters on the variant calls. We then searched our multigenerational cohort for any variants occurring within 100 bp flanking a predicted off-target site, using sites predicted by three algorithms: the MIT CRISPR design site, the CRISPR-direct engine, and CAS-OFFinder. Consistent with previous reports (Hruscha et al., 2013; Varshney et al., 2015), we found no support for single nucleotide variants or small indels at predicted off-target locations in the F1 generation. In F0s, we found only sporadic low allele frequency calls near predicted off-target regions. The number of reported variants in the F0 samples is not significantly different from that expected by chance (p>0.08; Supplementary Table S10).

We reviewed the 15 reported variant calls near predicted off-target sites in F0s and found that none are supported by both variant callers (Supplementary Table S11). Seven are also reported in siblings subjected to editing with alternative guides or to control conditions, making them unlikely to be induced by Cas9-mediated genome editing. Another four were not supported by reads on both strands. Of the four remaining variants, one was reported only in a control condition, making it unlikely to be a result of editing. The other three occur at a 5% alternate allele frequency, near the limit of detection for the variant callers, which increases the likelihood that they are artifacts. We do note that one variant has features consistent with an expected off-target cut: a small deletion reported directly at a predicted off-target cut site detected by two prediction engines (Supplementary Table S10).
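Searching for any variant within 100 bp flanking a predicted off-target site, as described above, is an interval-overlap query. The following is a minimal Python sketch. It assumes variants are given as `(chrom, pos)` and predicted sites as `(chrom, start, end)`, both in the same 1-based, inclusive coordinate system. It also assumes that the MIT, CRISPR-direct and CAS-OFFinder predictions were merged into one site list upstream. The function name is hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def near_offtarget(variants, sites, flank=100):
    """Return variants lying within `flank` bp of any predicted off-target site.

    variants: iterable of (chrom, pos); sites: iterable of (chrom, start, end).
    """
    positions_by_chrom = defaultdict(list)
    for chrom, pos in variants:
        positions_by_chrom[chrom].append(pos)
    for positions in positions_by_chrom.values():
        positions.sort()

    hits = set()
    for chrom, start, end in sites:
        positions = positions_by_chrom.get(chrom, [])
        # Binary search for variants inside [start - flank, end + flank].
        lo = bisect_left(positions, start - flank)
        hi = bisect_right(positions, end + flank)
        hits.update((chrom, pos) for pos in positions[lo:hi])
    return sorted(hits)


variants = [("5", 1000), ("5", 1250), ("7", 40)]
sites = [("5", 1100, 1122), ("7", 200, 222)]
print(near_offtarget(variants, sites))
```

Sorting positions once per chromosome keeps each site lookup logarithmic. This matters little at exome scale but avoids comparing every variant against every predicted site.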
Notably, this small deletion occurs in an exonic region, has a high CFD risk score (CFD score = 0.52), and is observed at the predicted locus in a few reads from the VarScan2 call set as well, even though it is not

Trio sequencing designs enable off-target analyses to distinguish gene editing effects from natural and inherited genetic variation. In our study, the bulk of variant calls in zebrafish exomes are filtered out because they are present in the parental strain. We recovered transmissible on-target deletions. We also recovered Sanger-validated de novo mutations that lie outside predicted off-target regions and occur in quantities indistinguishable from natural variation. Together, these results suggest that off-target CRISPR events occur infrequently.

Our results are consistent with previous results in zebrafish demonstrating limited off-target activity at select predicted regions (Hruscha et al., 2013; Varshney et al., 2015). They are also consistent with recent work in mice that found limited support for off-target effects genome-wide (Iyer et al., 2018). However, we are limited to detecting potential off-target variation within the exon-capture space of the genome, and we did not assess large structural variants or long deletions at the on-target site. In addition, we occasionally observed trends toward variant inflation in the predicted variant call sets; these trends were related to sequencing depth and did not survive visual inspection. This observation suggests that even with trio designs and other precautionary measures, care should be exercised when interpreting variant predictions agnostically. Sequencing even more individuals per condition may be required to expose subtle differences in off-target effects.

In response to initial reports that CRISPR-Cas9 edited mammalian cells harbored off-

The ZDR strain in our laboratory gives consistently robust clutch sizes of ~100 embryos.
To preserve enough individuals to generate an F1 generation, we anticipated having approximately 50 individuals available for exome sequencing. We used the CFD score cut-off of 0.2 as a threshold for the likelihood of inducing transmissible off-target mutations. On that basis, we expected that we would need at least 5-6 embryos per condition to observe one of these events. Thus, we selected six independent embryos per sgRNA plus Cas9 condition for comparison with controls, while keeping the experiment within a single clutch to control for inherited variation.
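The paragraph above does not spell out how 5-6 embryos per condition follows from the 0.2 CFD cut-off. One simple reading treats 0.2 as a per-embryo probability p that a transmissible off-target event occurs. Under that reading, the expected number of events in n embryos is n·p, and the chance of seeing at least one event is 1 − (1 − p)^n. This interpretation is an assumption, not the authors' stated method. The CFD score is an empirical activity score rather than a calibrated probability. The sketch below is therefore only an illustrative back-of-the-envelope calculation, and its function names are hypothetical.

```python
import math


def expected_events(p, n):
    """Expected number of events in n independent embryos, each with probability p."""
    return n * p


def prob_at_least_one(p, n):
    """Probability of observing at least one event in n independent embryos."""
    return 1 - (1 - p) ** n


def embryos_for_expected(p, target=1.0):
    """Smallest n whose expected event count n * p reaches `target`."""
    # Small tolerance guards against floating-point error in target / p.
    return math.ceil(target / p - 1e-9)


p = 0.2  # CFD cut-off, treated here as a per-embryo probability (assumption)
print(embryos_for_expected(p))  # 5
print(round(prob_at_least_one(p, 6), 3))  # 0.738
```

Under this reading, five embryos give an expected count of one event. With six embryos, the chance of observing at least one event is roughly 74%.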

Heteroduplex Editing Efficiency by PAGE
For each sgRNA plus Cas9 condition, we PCR-amplified gDNA from 12 embryos per batch using site-specific primers and screened for heteroduplex formation as described (Zhu et al., 2014). Five samples with evidence of heteroduplex formation were gel purified alongside a control sample, 'A' overhangs were added to the PCR products, and the products were cloned into a TOPO4 vector (Thermo Fisher). We picked 12 colonies per embryo to estimate targeting efficiency by Sanger sequencing.

locations for CRISPR-editing. We assessed whether variants arose from off-target CRISPR-mediated editing by comparing variant counts between groups. For two groups we used a Wilcoxon rank test, and for more than two groups a Kruskal-Wallis rank test; each p-value was then assessed against a Bonferroni critical value to correct for multiple testing. In

parents from the ZDR laboratory strain of wild-type zebrafish. A total of 52 embryos were selected for DNA extraction and sequencing at 4 dpf in the F0 generation: 2 uninjected and 2 Cas9-injected embryos; 2 sgRNA-injected embryos for each of 6 different sgRNAs targeting 3 genes (12 embryos in total); and 6 CRISPR-Cas9 embryos per sgRNA (36 edited individuals in total). Additional embryos for each condition were injected concurrently but raised to adulthood. The

Confirmation of CRISPR editing efficiencies. Efficiency data for the high-efficiency guides have been published previously. (A, C, E) Schematic of the D. rerio locus, sgRNA targeted regions (red squares) and primers used to determine sgRNA efficiency (red triangles) for each gene of interest. (B, D, F) For each target gene, a representative embryo injected with the low-efficiency sgRNA plus Cas9 is shown: heteroduplex analysis (left) and Sanger sequencing of 12 amplified clones (right). Efficiency was estimated by taking the average number of targeted clones across five embryos per sgRNA.
* denotes samples from the heteroduplex analysis chosen for sequencing; PAM, protospacer adjacent motif.
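The efficiency estimate in this caption reduces to averaging the fraction of edited clones (out of 12 picked colonies) over the five embryos sequenced per sgRNA. The following is a minimal Python sketch with hypothetical function names. The clone counts in the example are made up for illustration and are not values from this study.

```python
from statistics import mean


def sgrna_efficiency(edited_clone_counts, clones_per_embryo=12):
    """Mean fraction of edited clones across embryos for one sgRNA.

    edited_clone_counts: number of clones carrying an edit, one entry per
    embryo (e.g. five embryos per sgRNA, 12 clones each).
    """
    counts = list(edited_clone_counts)
    for k in counts:
        if not 0 <= k <= clones_per_embryo:
            raise ValueError(f"edited count {k} outside 0..{clones_per_embryo}")
    return mean(k / clones_per_embryo for k in counts)


# Hypothetical edited-clone counts for five embryos.
print(sgrna_efficiency([6, 9, 12, 3, 6]))
```

Averaging per-embryo fractions weights each embryo equally. This matches the caption's description when every embryo contributes the same number of clones.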

Figure S2
CFD score distribution of MIT-predicted off-target sequences by sgRNA.

Figure S4
Transition-transversion ratio in F1 exomes compared to grandparental exomes. Circle size represents the number of observations for each variant class: transitions (blue), transversions (orange), indels (grey). After filtering, the transition-transversion ratio is 1.09.

Hypergeometric p-values were calculated with the Rothstein lab hypergeometric calculator, using the capture space (74,691,693 bp) as the population size and a high or low estimate for the expected population variant counts (10,975 vs 608, respectively).
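The two summary statistics in these captions can be reproduced in a few lines. The sketch below computes a transition/transversion ratio from SNV allele pairs and an upper-tail hypergeometric p-value, P(X ≥ k). How the captions' quantities map onto the hypergeometric parameters is an assumption here:

- The population N is the capture space.
- The number of "successes" K is an expected variant count (for example, the high or low estimate).
- n is the number of bases searched (a hypothetical value here).
- k is the number of observed variants.

This is not the Rothstein lab calculator itself, and the function names are hypothetical.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs):
    """Transition/transversion ratio from (ref, alt) pairs; indels are skipped."""
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue  # not a single-nucleotide substitution
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    lo = max(k, 0, n - (N - K))  # start of the upper tail, clipped to the support of X
    hi = min(K, n)
    if lo > hi:
        return 0.0
    log_total = log_comb(N, n)
    tail = sum(
        math.exp(log_comb(K, i) + log_comb(N - K, n - i) - log_total)
        for i in range(lo, hi + 1)
    )
    return min(tail, 1.0)


print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("AT", "A")]))  # 2.0
# Hypothetical query: 2 observed variants in 10,000 searched bases, K = 608.
print(hypergeom_sf(2, N=74_691_693, K=608, n=10_000))
```

Working in log space with `math.lgamma` avoids forming binomial coefficients over a population of roughly 75 million bases directly.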