# PRECISE GENOME EDITING TECHNIQUES AND APPLICATIONS

EDITED BY: Zhiying Zhang, David Jay Segal and Kun Xu

PUBLISHED IN: Frontiers in Genetics

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-779-9 DOI 10.3389/978-2-88963-779-9



Topic Editors: Zhiying Zhang, Northwest A&F University, China; David Jay Segal, University of California, Davis, United States; Kun Xu, Northwest A&F University, China

Citation: Zhang, Z., Segal, D. J., Xu, K., eds. (2020). Precise Genome Editing Techniques and Applications. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-779-9

# Table of Contents


Tao Huang, Qiang Gao, Tongying Feng, Yi Zheng, Jiayin Guo and Wenxian Zeng

*Precise and Rapid Validation of Candidate Gene by Allele Specific Knockout With CRISPR/Cas9 in Wild Mice*

Tianzhu Chao, Zhuangzhuang Liu, Yu Zhang, Lichen Zhang, Rong Huang, Le He, Yanrong Gu, Zhijun Chen, Qianqian Zheng, Lijin Shi, Wenping Zheng, Xinhui Qi, Eryan Kong, Zhongjian Zhang, Toby Lawrence, Yinming Liang and Liaoxun Lu

*Guidelines for Fluorescent Guided Biallelic HDR Targeting Selection With PiggyBac System Removal for Gene Editing*

Javier Jarazo, Xiaobing Qing and Jens C. Schwamborn

*Programmable Base Editing of the Sheep Genome Revealed No Genome-Wide Off-Target Mutations*

Shiwei Zhou, Bei Cai, Chong He, Ying Wang, Qiang Ding, Jiao Liu, Yao Liu, Yige Ding, Xiaoe Zhao, Guanwei Li, Chao Li, Honghao Yu, Qifang Kou, Wenzhi Niu, Bjoern Petersen, Tad Sonstegard, Baohua Ma, Yulin Chen and Xiaolong Wang

*sgRNA-shRNA Structure Mediated SNP Site Editing on Porcine IGF2 Gene by CRISPR/StCas9*

Yongsen Sun, Nana Yan, Lu Mu, Bing Sun, Jingrong Deng, Yuanyuan Fang, Simin Shao, Qiang Yan, Furong Han, Zhiying Zhang and Kun Xu


Yong Zhao, Peijuan Liu, Zhiqian Xin, Changhong Shi, Yinlan Bai, Xiuxuan Sun, Ya Zhao, Xiaoya Wang, Li Liu, Xuan Zhao, Zhinan Chen and Hai Zhang

*Enhancement of Precise Gene Editing by the Association of Cas9 With Homologous Recombination Factors*

Ngoc-Tung Tran, Sanum Bashir, Xun Li, Jana Rossius, Van Trung Chu, Klaus Rajewsky and Ralf Kühn

*Efficient and Precise CRISPR/Cas9-Mediated MECP2 Modifications in Human-Induced Pluripotent Stem Cells*

Thi Thanh Huong Le, Ngoc Tung Tran, Thi Mai Lan Dao, Dinh Dung Nguyen, Huy Duong Do, Thi Lien Ha, Ralf Kühn, Thanh Liem Nguyen, Klaus Rajewsky and Van Trung Chu

## Editorial: Precise Genome Editing Techniques and Applications

Kun Xu<sup>1</sup>, David Jay Segal<sup>2</sup>\* and Zhiying Zhang<sup>1</sup>\*

*<sup>1</sup> College of Animal Science and Technology, Northwest A&F University, Yangling, China, <sup>2</sup> Department of Biochemistry and Molecular Medicine, Genome Center, University of California, Davis, Davis, CA, United States*

Keywords: precise genome editing, CRISPR, HDR efficiency, biallelic HDR targeting, off-target effect, animal model, base editing

**Editorial on the Research Topic**

#### **Precise Genome Editing Techniques and Applications**

The CRISPR/Cas system, particularly CRISPR/Cas9 (Jinek et al., 2012; Cong et al., 2013), has been developed as a robust and versatile platform for manipulating the genomes of a variety of species. In recent years, numerous reports have pointed to its strong potential for human gene therapy and life science research, as well as for animal and plant breeding. The articles collected in this Research Topic, "Precise Genome Editing Techniques and Applications," provide further evidence of this potential.

Generally, the CRISPR/Cas9 nuclease is used to cleave target genomic DNA to generate site-specific double-strand breaks (DSBs), which are predominantly repaired via non-homologous end joining (NHEJ) or, to a lesser extent, by homology-directed repair (HDR). The classical NHEJ repair pathway can generate small insertions or deletions (indels), resulting in loss of function of targeted coding genes by introducing a frameshift in the open reading frame (ORF). NHEJ mutagenesis is a highly popular strategy for gene manipulation. In addition to classical NHEJ, alternative or accurate NHEJ-mediated repair can achieve precise genomic DNA deletions (Guo et al., 2018; Shou et al., 2018).
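The link between indel length and loss of function can be illustrated with a minimal Python sketch. It assumes the simplest case, in which an indel shifts the reading frame whenever its net length is not a multiple of three; the function names and the toy ORF are hypothetical and chosen only for illustration.

```python
def causes_frameshift(indel_length: int) -> bool:
    """Return True if an indel of this net length shifts the reading frame.

    indel_length is the net change in bases: positive for insertions,
    negative for deletions. Only multiples of 3 preserve the frame.
    """
    return indel_length % 3 != 0


def apply_deletion(orf: str, start: int, length: int) -> str:
    """Delete `length` bases from `orf`, beginning at 0-based index `start`."""
    return orf[:start] + orf[start + length:]


def codons(orf: str) -> list[str]:
    """Split a sequence into complete codons, dropping any trailing bases."""
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3)]


# Toy ORF (hypothetical): ATG GCC AAA TAG
wild_type = "ATGGCCAAATAG"
mutant = apply_deletion(wild_type, start=3, length=1)  # 1-bp deletion
```

Reading `codons(wild_type)` and `codons(mutant)` side by side shows that every codon downstream of the 1-bp deletion changes, whereas a 3-bp deletion would remove one codon and leave the rest of the frame intact.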

Two papers in this Research Topic by Chao et al. and Zhao et al. describe the generation of allele-specific knockout and double gene knockout mouse models for rapid disease gene validation and human xenograft studies, respectively. N<sup>6</sup>-methyladenosine (m6A) is a well-established epigenetic modification on eukaryotic mRNA. An increasing number of studies have uncovered the significance of m6A methylation, which has given rise to the nascent field of "epitranscriptomics." Another article in this volume (Huang et al.) describes a knockout study in mouse spermatogonial GC-1 cells of the fat mass and obesity-associated (Fto) gene, which has been shown to act on the epitranscriptome as an m6A demethylase (Li et al., 2017; Lin et al., 2017).

On the other hand, the HDR repair pathway relies on homologous donor DNA to produce targeted gene knock-ins at the DSB site or gene replacement between two DSB sites. Precise point mutations and designed small indels can also be achieved by this method. One paper in this topic describes efforts to precisely correct the methyl-CpG binding protein 2 (MECP2) gene in the context of Rett syndrome (RTT) by CRISPR/Cas9-mediated HDR in human induced pluripotent stem cells (iPSCs). This report provides a reference for iPSC-based disease modeling and gene correction therapy (Le et al.).

Although HDR-based genome editing can achieve gene insertions and precise substitutions, it still faces several obstacles, including low HDR efficiency, failure of biallelic targeting, complications of positive selection, and the need to subsequently remove selection markers.

#### Edited and reviewed by:

*Youri I. Pavlov, University of Nebraska Medical Center, United States*

#### \*Correspondence:

*David Jay Segal djsegal@ucdavis.edu Zhiying Zhang zhangzhy@nwafu.edu.cn*

#### Specialty section:

*This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics*

Received: *19 February 2020* Accepted: *01 April 2020* Published: *21 April 2020*

#### Citation:

*Xu K, Segal DJ and Zhang Z (2020) Editorial: Precise Genome Editing Techniques and Applications. Front. Genet. 11:412. doi: 10.3389/fgene.2020.00412*


It has been reported that inhibiting key molecules of the competing NHEJ pathway, such as DNA ligase IV (LIG4) and KU70, can effectively improve HDR efficiency (Chu et al., 2015; Maruyama et al., 2015). We have previously developed a novel sgRNA-shRNA structure that transcribes multiple sgRNAs for multiplex genome targeting (Yan et al., 2016). Here, the structure was further used for simultaneous LIG4 RNA interference and enhanced HDR-based IGF2 SNP editing (Sun et al.). The HDR pathway can also be enhanced by the association of Cas9 with a variety of homologous recombination factors, such as yRad52 (Shao et al., 2017), dn53BP1 (Paulsen et al., 2017; Jayavaradhan et al., 2019), hRad51 (Rees et al., 2019), and CtIP (Tran et al.). A review paper in this topic (Liu et al.) further summarizes the methodologies and other considerations for improving HDR efficiency.

Regarding biallelic targeting, we have previously reported a novel strategy using two donors with paired selectable markers (Wu et al., 2017). However, removal of the selection marker is often required to allay concerns about marker-dependent effects. There are several "pop in and out" two-step techniques for marker-free genome engineering, including the Cre/LoxP system (Zhu et al., 2015), the piggyBac transposon (Xie et al., 2014), and the SSA repair mechanism (Li et al., 2018). This Research Topic presents a protocol article for biallelic HDR targeting using piggyBac-mediated selection removal (Jarazo et al.).

The ever-expanding repertoire of CRISPR editing systems includes the widely used Cas9 of Streptococcus pyogenes (SpCas9) (Jinek et al., 2012; Cong et al., 2013), as well as those of Streptococcus thermophilus (StCas9) (Xu et al., 2015) and Neisseria meningitidis (NmCas9) (Hou et al., 2013). In addition, other proteins in the CRISPR family such as Cpf1/Cas12a (Zetsche et al., 2015) have been applied for genome editing. More recently, CRISPR/Cas-derived novel genome editing tools that do not create DSBs have been developed, including the cytidine and adenine base editors (CBE and ABE) (Komor et al., 2016; Gaudelli et al., 2017), as well as prime editors (PE) (Anzalone et al., 2019). The paper by Wu et al. in this topic describes efforts to increase the CBE scope and efficiency in rice.

The rapid development of genome editing technology has provided opportunities for modifying large animal models and for domestic animal breeding. Pigs serve as an important agricultural resource as well as animal models for biomedical studies. In this topic, Yang and Wu summarize the genome editing of pigs in agricultural and biomedical applications. Off-target effects are one of the major concerns for genome editing research. The last two articles (Li et al.; Zhou et al.) in this Research Topic report no obvious off-target events in the offspring of genetically edited goats and in base-edited sheep, respectively. However, it remains to be determined whether these observations might be affected by survivorship bias, and whether off-target events differ between human gene therapy and animal genetic breeding.

In conclusion, the articles contained within this Research Topic illustrate the mechanisms and great potential of precise genome editing techniques to further scientific inquiry and produce useful outcomes that benefit society.

#### AUTHOR CONTRIBUTIONS

KX wrote the manuscript draft. DS and ZZ supervised the topic and read the manuscript.

#### FUNDING

This work was supported by grants from the China Postdoctoral Science Foundation (2018T111111 and 2015M580887), the National Natural Science Foundation of China (NSFC, 31702099), and the National Transgenic Major Project of China (2018ZX08010-09B).


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Xu, Segal and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genome Editing of Pigs for Agriculture and Biomedicine

#### Huaqiang Yang\* and Zhenfang Wu\*

National Engineering Research Center for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, China

Pigs serve as an important agricultural resource and animal model in biomedical studies. Efficient and precise modification of the pig genome by using recently developed gene editing tools has significantly broadened the application of pig models in various research areas. The three types of site-specific nucleases, namely, zinc-finger nucleases, transcription activator-like effector nucleases, and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein, are the main gene editing tools that can efficiently introduce predetermined modifications, including knockouts and knockins, into the pig genome. These modifications can confer desired phenotypes on pigs to improve production traits, such as optimal meat production, enhanced feed digestibility, and disease resistance. In addition, given their genetic, anatomic, and physiologic similarities to humans, pigs can also be modified to model human diseases or to serve as an organ source for xenotransplantation to save human lives. To date, many genetically modified pig models with agricultural or biomedical value have been established by using gene editing tools. These pig models are expected to accelerate research progress in related fields and benefit humans.

Keywords: pig, genome editing, ZFN, TALEN, CRISPR/Cas9, disease model, xenotransplantation, breeding

## INTRODUCTION

Pigs hold great promise in agriculture and biomedicine. As an important meat source, domestic pigs provide the most commonly consumed meat worldwide. Through selective breeding, humans produce pigs that harbor desired characteristics for agriculture, although selection is a long and slow process (Ruan et al., 2017). However, the process can now be substantially accelerated through genetic modification, including random transgenesis and gene knockouts and knockins (Gaj et al., 2013; Garas et al., 2015). With the improved efficiency of genetic modification, pig genome modification can confer desired, predetermined genetic changes that would take years to realize through traditional selective breeding. Numerous economically significant characteristics, such as increased meat production (Qian et al., 2015; Wang et al., 2015, 2017b; Bi et al., 2016; Rao et al., 2016), reduced fat deposition (Zheng et al., 2017), or enhanced disease resistance (Whitworth et al., 2016; Burkard et al., 2017; Wells et al., 2017; Yang et al., 2018), have been achieved simply and efficiently through genetic modification in pigs, and the resulting animals can be used as valuable breeding materials to advance pig production. In biomedical research, pigs serve as an important large animal model given their advantages over other models. Compared with rodent models, pigs share a higher similarity to human beings in terms of body/organ size, lifespan, anatomy, physiology, and metabolic profile. Compared with non-human primates, pigs are less costly and benefit from mature embryo manipulation techniques. Pigs can be modified to carry the same gene mutation found in humans to replicate inherited diseases (Perleberg et al., 2018), or offer organs with minimal transplant rejection during xenotransplantation (Hryhorowicz et al., 2017). Pigs bridge the gap between humans and the heavily used small rodent models, supporting biomedical research ranging from basic science to translational medicine.

#### Edited by:

Kun Xu, Northwest A&F University, China

#### Reviewed by:

Dengke Pan, Beijing Academy of Agriculture and Forestry Sciences, China Angelika Schnieke, Technische Universität München, Germany

#### \*Correspondence:

Huaqiang Yang yangh@scau.edu.cn Zhenfang Wu wzfemail@163.com

#### Specialty section:

This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics

Received: 28 April 2018 Accepted: 21 August 2018 Published: 04 September 2018

#### Citation:

Yang H and Wu Z (2018) Genome Editing of Pigs for Agriculture and Biomedicine. Front. Genet. 9:360. doi: 10.3389/fgene.2018.00360

However, production of gene-targeted mammals other than mice remained difficult until the late 2000s because the traditional gene targeting technique developed in mice requires homologous recombination (HR) manipulation in embryonic stem (ES) cells (Mansour et al., 1988; Capecchi, 1989). The lack of bona fide germline-competent ES cells in large animals compelled researchers to perform HR in somatic cells instead and then use somatic cell nuclear transfer (SCNT) to produce genetically modified large animals (Polejaeva et al., 2000). Although theoretically possible, the modification of somatic cells by HR is extremely inefficient, which makes the generation of such cells impractical (Wells and Prather, 2017). Therefore, only very few gene-targeted pigs were created within two decades (between the establishment of the gene targeting technique prior to 1990 and the late 2000s, when the novel gene targeting tools began to be used in large animals) (Dai et al., 2002; Lai et al., 2002; Rogers et al., 2008; Suzuki et al., 2012; Davis et al., 2014). This situation changed when newly developed gene targeting technologies called site-specific engineered nucleases or "gene scissors" became available. The engineered nucleases, such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), or clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas), are very effective in creating double-strand breaks (DSBs) at a specific locus of a genome, thereby facilitating genetic modifications, including knockouts via non-homologous end joining (NHEJ) and knockins via homology-directed repair (HDR) (Hsu et al., 2014) (**Figure 1**). Creation of an intentional DSB in the genomic target can stimulate HR by 50–1000-fold (Jasin, 1996). Therefore, a high rate of DSB formation results in a high rate of modification in either somatic cells or embryos. With the use of these gene scissors, a large number of genetically modified pigs have been generated through SCNT of modified somatic cells or direct microinjection of engineered nucleases into embryos. In addition to the establishment of genetically modified pigs with agricultural and biomedical value, this technology might have a wider range of applications in pigs, such as use as a therapeutic tool against viral infections and gene therapy to correct mutations in pig disease models. These areas still await further investigation.

## NUCLEASE-BASED GENE EDITING TOOLS

#### Zinc Finger Nucleases (ZFNs)

Zinc finger nucleases are artificial chimeric proteins consisting of a specific DNA-binding domain, which comprises tandem zinc finger-binding motifs, fused to the non-specific cleavage domain of the restriction endonuclease FokI (Kim et al., 1996; Urnov et al., 2010). A zinc finger protein characteristically consists of two beta sheets and an alpha helix, with one or more coordinated zinc ions at the core to confer rigidity on the finger (Pavletich and Pabo, 1991). Given that one zinc finger unit recognizes 3 bp of DNA, a ZFN usually combines 3–6 zinc finger units to recognize a 9–18 bp DNA sequence and thereby achieve specific targeting. By designing two zinc finger arrays that recognize either side of a 5–6 bp spacer sequence at a target region, the FokI nuclease domains fused to the zinc fingers can introduce DSBs within the target region (Kim et al., 1996; Smith et al., 1999; Bibikova et al., 2003; Porteus and Baltimore, 2003; Urnov et al., 2010).
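The arithmetic behind these recognition lengths can be sketched in a few lines of Python. This is only an illustration of the figures quoted above (3 bp per finger, a 5–6 bp spacer); the function names are hypothetical, and the sketch ignores real design constraints such as finger context effects.

```python
BP_PER_FINGER = 3  # each zinc finger unit recognizes 3 bp of DNA (from the text)


def zf_array_site_length(n_fingers: int) -> int:
    """Length in bp of the half-site recognized by one zinc finger array."""
    if n_fingers < 1:
        raise ValueError("a zinc finger array needs at least one finger")
    return n_fingers * BP_PER_FINGER


def zfn_pair_footprint(left_fingers: int, right_fingers: int, spacer_bp: int) -> int:
    """Total bp spanned by a ZFN pair: left half-site + spacer + right half-site."""
    if spacer_bp < 0:
        raise ValueError("spacer length cannot be negative")
    return (
        zf_array_site_length(left_fingers)
        + spacer_bp
        + zf_array_site_length(right_fingers)
    )
```

For example, `zf_array_site_length(3)` and `zf_array_site_length(6)` reproduce the 9 bp and 18 bp limits given in the text, and a pair of four-finger arrays separated by a 6 bp spacer spans 30 bp in total.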

As the earliest engineered nucleases, ZFNs opened new possibilities for gene targeting in pigs, although the technology still suffers from a complicated construction process and unpredictable targeting activity. In general, the rational design and assembly of ZFNs is a difficult task for many laboratories (Klug, 2010; Lam et al., 2011; Chandrasegaran and Carroll, 2016). Effective ZFN reagents can only be obtained from commercial sources at a prohibitive price or from laboratories that work intensively on ZFNs. To the best of our knowledge, all studies creating ZFN-mediated genetically modified pigs have used commercially synthesized ZFN reagents. The difficulty of generating active ZFN reagents has impeded their extensive use. Nevertheless, compared with the extremely low gene targeting efficiency of less than 10<sup>−6</sup> achieved by conventional HR in somatic cells, ZFNs can achieve a gene targeting rate of approximately 1–4% in selected pig somatic cells (Hauschild et al., 2011; Yang et al., 2011), thereby allowing the cost-effective generation of genetically modified pigs.

#### Transcription Activator-Like Effector Nucleases (TALENs)

Transcription activator-like effector nucleases (TALENs) are conceptually similar to ZFNs: they comprise a DNA binding domain and a DNA cleavage domain and act in pairs to satisfy the requirement for dimerization (Christian et al., 2010; Miller et al., 2011). The DNA binding domain of TALENs, named transcription activator-like effector (TALE), originates from the plant-pathogenic bacterium Xanthomonas and includes tandem repeat modules of 34 amino acids, with each module specifying the binding to a single base pair. The repeat modules can be rearranged according to a simple cipher to target any DNA sequence (Boch et al., 2009; Moscou and Bogdanove, 2009). Unlike ZFNs, TALEN reagents are easy to build by several assembly schemes and can be produced routinely by many laboratories (Cermak et al., 2011; Li et al., 2011; Morbitzer et al., 2011). Aside from their simple design and assembly, TALENs have a broad target range and a substantially improved targeting activity; thus, active TALENs may be designed for almost any DNA target in a genome. About 64% of synthesized TALENs are active in livestock fibroblasts. Three-quarters of these active TALENs demonstrate a high cleavage efficiency (19–40% NHEJ rates). Moreover, TALEN pairs efficiently induce gene knockouts after direct injection of TALEN-encoding mRNA into the cytoplasm of swine and bovine embryos, with 29% and 43–75% knockout efficiency, respectively (Carlson et al., 2012).
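The "simple cipher" can be made concrete with a short Python sketch. The text above does not spell out the code, so the mapping used here is an assumption taken from the commonly cited repeat-variable diresidue (RVD) assignments (NI for A, HD for C, NN for G, NG for T); the function names are hypothetical.

```python
# Assumed RVD code, not stated in the text: one repeat module per base.
# NN is used here for G, although it is also reported to bind A.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
BASE_FOR_RVD = {rvd: base for base, rvd in RVD_FOR_BASE.items()}


def rvds_for_target(target: str) -> list[str]:
    """Return the ordered list of RVDs whose repeats would read `target`."""
    target = target.upper()
    unknown = set(target) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [RVD_FOR_BASE[base] for base in target]


def target_for_rvds(rvds: list[str]) -> str:
    """Decode an ordered RVD list back into the DNA sequence it specifies."""
    return "".join(BASE_FOR_RVD[rvd] for rvd in rvds)
```

Because each repeat module specifies one base pair, a target of length n needs n modules, which is why retargeting a TALE amounts to reordering modules rather than redesigning a protein interface.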

Given that HR-mediated gene knockin results in precise alteration, including point mutation, DNA fragment replacement, and insertion of a new DNA sequence at a target site, knockin modification has a wider range of applications than NHEJ-mediated gene knockout, which only causes loss of gene function. However, most DSBs are repaired by NHEJ, and only a few are repaired by HR even when a donor DNA repair template is provided (Mao et al., 2008). Therefore, HR-mediated knockin is less efficient than NHEJ-mediated knockout even in the presence of engineered nucleases. Nevertheless, the high activity of TALENs makes effective gene knockin achievable. Using TALENs and oligonucleotide donor transfection to introduce defined nucleotide changes into multiple targets in the genome of livestock fibroblasts, Tan et al. obtained 10–64% knockin cell colonies, with up to 32% homozygous knockin colonies, in just one round of transfection (Tan et al., 2013). Another study established Rosa26 knockin pig models by transfecting a TALEN plasmid together with donor DNA carrying long homology arms. In this study, the knockin efficiency in selected fibroblasts was as high as 31.3% (60 positive colonies of 192 selected fibroblast colonies) (Li et al., 2014).

#### Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-Associated Protein (Cas)

The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) system, originally identified as a microbial adaptive immune system, has recently been adapted for mammalian gene editing. In bacteria and archaea, CRISPR/Cas defends against invading foreign genetic elements through DNA or RNA interference (Gasiunas et al., 2012; Jinek et al., 2012; Wiedenheft et al., 2012). Through mammalian codon optimization, CRISPR/Cas has been adapted for precise DNA/RNA targeting and is highly efficient in mammalian cells and embryos. The most commonly used and intensively characterized CRISPR/Cas system for genome editing is the type II CRISPR system from Streptococcus pyogenes; this system uses a combination of Cas9 nuclease and a short guide RNA (gRNA) to target specific DNA sequences for cleavage. A 20-nucleotide gRNA sequence complementary to the target DNA that lies immediately 5′ of a protospacer adjacent motif (PAM) sequence (NGG) directs Cas9 to the target DNA and mediates cleavage of double-stranded DNA to form a DSB (Cong et al., 2013; Mali et al., 2013). Thus, CRISPR/Cas9 can, in principle, target any N20-NGG site.
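The N20-NGG rule lends itself to a simple sequence scan. The Python sketch below is a minimal illustration, not a guide design tool: it lists every 20-nt protospacer followed by an NGG PAM on both strands and makes no attempt to score activity or off-target risk. The function names and the tuple layout are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# A lookahead is used so that overlapping candidate sites are all reported.
_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str) -> list[tuple[str, int, str, str]]:
    """List candidate N20-NGG sites as (strand, start, protospacer, pam).

    `start` is the 0-based index of the protospacer in the strand that was
    scanned; for the "-" strand this is an index into the reverse complement.
    """
    seq = seq.upper()
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in _SITE.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites
```

Calling `find_spcas9_sites` on a 23-nt sequence made of twenty A residues followed by TGG returns a single plus-strand site starting at index 0. Sequences containing ambiguous bases such as N are simply skipped by the pattern.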

Since CRISPR/Cas9 first emerged, researchers have been impressed by its high gene targeting efficiency and the simple construction of customized vectors compared with previous site-specific nucleases. These characteristics render CRISPR/Cas9 accessible to almost any laboratory, thereby significantly contributing to the progress of research in many areas of biomedicine and agriculture. The high cleavage activity of CRISPR/Cas9 allows simultaneous targeting of multiple loci in a single cell within a single reaction (Cong et al., 2013). GGTA1/iGb3S and GGTA1/CMAH double knockout and GGTA1/iGb3S/CMAH triple knockout pigs, which have potential for xenotransplantation, were created by a single-step transfection of multiplexed sgRNA and Cas9 nuclease together with a single nuclear transfer (Li et al., 2015). We generated homozygous Pink1/Parkin double knockout pigs as Parkinson's disease models through a single transfection of CRISPR/Cas9 and SCNT. The frequency of homozygous double knockout fibroblast colonies among those selected reached up to 38.1% (Zhou et al., 2015).

## GENETICALLY MODIFIED PIGS FOR AGRICULTURAL APPLICATION

Traditional selective breeding has produced a series of superior livestock varieties that demonstrate a dramatically enhanced production performance compared with their original counterparts. However, an individual trait shows only a 0.5–3.0% genetic response to selection per year, and some traits such as fertility and disease resistance remain difficult to improve (Clark and Whitelaw, 2003). Furthermore, once production traits have been improved to a certain degree, their further optimization becomes even more difficult and requires an even longer breeding cycle to achieve slight progress. Genome editing offers an alternative approach to rapidly and directly realize genetic improvement in livestock (Ruan et al., 2017). Full improvement of individual and even multiple traits can be accomplished within only one generation. Importantly, by using genome editing, we can confer upon animals favorable genetic traits that are unavailable in natural genetic sources, thereby generating novel livestock varieties that cannot be achieved through traditional breeding.

#### Meat Production

Myostatin (MSTN) is a negative regulator of skeletal muscle mass in vivo. MSTN knockout mice exhibit a two- to threefold increase in muscle mass due to muscle fiber hyperplasia and hypertrophy (McPherron et al., 1997). Natural mutations of MSTN have been found in several species, including cattle (Grobet et al., 1997; Kambadur et al., 1997; McPherron and Lee, 1997), sheep (Clop et al., 2006), and dogs (Mosher et al., 2007). MSTN modification is an effective approach to enhance muscle growth in various animals. Given that this gene is a "hot" candidate that is possibly beneficial in agriculture, several strains of MSTN-knockout pigs have been generated by using ZFNs (Qian et al., 2015), TALENs (Rao et al., 2016), and CRISPR/Cas9 (Wang et al., 2015, 2017b; Bi et al., 2016). These MSTN-knockout pigs demonstrate muscle hypertrophy or a double-muscled (DM) phenotype, with increased muscle mass and decreased fat accumulation compared with wild-type (WT) pigs. One study reported some MSTN-mutant pigs with one extra thoracic vertebra (Qian et al., 2015). MSTN-knockout pigs are valuable breeding materials for rapid genetic improvement to produce lean meat from fat-type (indigenous) pig breeds.

However, deleterious effects were found in some MSTN-null DM pigs. In homozygous MSTN knockout piglets of the Landrace breed, newborns have abnormal forelegs and/or hind legs, and their motor function is thus severely impaired. The affected piglets usually die quickly after birth (Zou et al., 2018). MSTN possibly plays an important role in the development and function of muscles and other organs, as it is expressed from early embryogenesis through adulthood. MSTN modification therefore appears to cause severe side effects in some pig breeds, although these have not been observed in all reported MSTN-knockout pigs. Consequently, establishing MSTN-knockout DM pig breeds requires careful selection of breeds in which MSTN modification increases meat production with minimal collateral damage. A new candidate target that influences muscle production, FBXO40, has been found recently. FBXO40 knockout pigs display a muscle hypertrophy phenotype and survive normally without detectable pathological changes in major organs (Zou et al., 2018). It is also of note that DM animals are more susceptible to respiratory disease, lameness, stress, and dystocia, and thus require extra attention in husbandry to preserve animal welfare (Fiems, 2012).

#### Viral Resistance

Porcine reproductive and respiratory syndrome virus (PRRSV) causes the most economically important swine disease worldwide, currently resulting in huge economic losses in the swine industry. Genome editing has enabled the establishment of PRRSV-resistant pigs through knockout of viral receptors in pigs. The potential PRRSV entry mediators SIGLEC1 and CD163 were knocked out in pigs through conventional HR and CRISPR/Cas9, respectively (Prather et al., 2013; Whitworth et al., 2014). PRRSV challenge in knockout pigs demonstrated that SIGLEC1 is unnecessary for infectivity (Prather et al., 2013), whereas CD163 is the definitive receptor for PRRSV. CD163 knockout pigs are fully resistant to PRRSV challenge, with no obvious PRRSV-related symptoms and no detectable PRRSV antibody or RNA in the serum (Whitworth et al., 2016). Furthermore, multiple genotypes of CD163 modifications were achieved by using CRISPR/Cas9 in different laboratories; these modifications include CD163 total knockout (CD163 null) (Yang et al., 2018), domain swap of CD163 exon 7 (corresponding to SRCR 5, the PRRSV binding domain at the protein level) with human CD163L-1 exon 11 (chimeric CD163) (Wells et al., 2017), and CD163 truncation with a deletion of exon 7 (ΔSRCR5 CD163) (Burkard et al., 2017). Among these CD163-modified pigs, those with the CD163 null phenotype were completely resistant to both Type 1 and Type 2 PRRSV isolates (Wells et al., 2017). The chimeric CD163 phenotype was resistant to Type 1 PRRSV but still supported the replication of the Type 2 virus (Wells et al., 2017). Our group also generated CD163 knockout pigs with a null phenotype, and these pigs were completely resistant to the highly pathogenic PRRSV, which is the dominant circulating strain in China and other Asian countries (Yang et al., 2018).

#### Thermoregulation

Due to their lack of a functional UCP1 gene, pigs lack brown adipose tissue (BAT). As a result, the BAT-mediated adaptive non-shivering thermogenesis is absent in pigs (Trayhurn et al., 1989; Berg et al., 2006; Jastroch and Andersson, 2015). Newborn piglets are thus susceptible to cold stress, which may result in neonatal death. To address this issue, Zheng et al. inserted a mouse adiponectin-driven UCP1 into the porcine endogenous UCP1 locus by using a CRISPR/Cas9-mediated knockin strategy combined with SCNT. The resultant UCP1 knockin pigs showed an improved ability to maintain body temperature when acutely exposed to cold. Moreover, UCP1 prevented obesity by reducing fat deposition, an economically important trait targeted in pig breeding. UCP1 knockin pigs demonstrated reduced fat deposition through UCP1-promoted lipolysis (Zheng et al., 2017). Thus, UCP1 knockin pigs are a potentially valuable genetic resource for agricultural production on the basis of their improved thermoregulation and decreased fat deposition.

### Recipients for Spermatogonial Stem Cell (SSC) Transplantation

Spermatogonial stem cells sustain normal spermatogenesis and maintain male fertility through self-renewal and differentiation. SSCs can be transplanted to generate donor-derived offspring (Brinster and Zimmermann, 1994). From the agricultural perspective, SSC transplantation is a potential tool to rapidly expand the availability of gametes from desirable superior livestock, thereby dramatically influencing production efficiency, quality, and other production traits in a population (Ehmcke et al., 2006). A recipient male lacking endogenous SSC and other germ cells but preserving intact somatic support cells is required for a successful SSC transplantation. Although chemotoxic drug or irradiation was employed to destroy spermatogenesis and cause infertility, the outcome was not ideal as manifested by the incomplete elimination of endogenous germ cells or the severe side effect on the recipient animals (Oatley, 2017; Park et al., 2017). An approach to eliminate SSC by knockout of the gene essential for SSC development has established an ideal surrogate for SSC transplantation. Park et al. generated NANOS2 knockout pigs by directly injecting CRISPR reagents into the cytoplasm of embryos. Knockout males could not produce sperm but still kept intact seminiferous tubules structure, thus had the potential to serve as an ideal SSC recipient (Park et al., 2017).

#### GENETICALLY MODIFIED PIGS AS DISEASE MODELS

Genome editing has greatly promoted the application of pigs as human disease models. Editing the pig genome allows these animals to carry mutations resembling those that cause genetic disorders in humans, and pigs can phenocopy human disease manifestations more accurately than the commonly used mouse models. The suitable size and long lifespan of pig disease models also facilitate surgical manipulation closer to clinical conditions, as well as long-term tracking and evaluation of therapeutics over clinically relevant time frames. With genome editing tools, genomic changes can be introduced not only in a single gene but also in multiple genes simultaneously with a high editing efficiency, thereby paving a way to mimic and decipher complex polygenic hereditary diseases in large animal models similar to humans.

#### Neurodegenerative Diseases

Genetically modified pigs have been successfully used to establish animal models of neurodegenerative diseases, including Huntington's disease (HD) and Parkinson's disease (PD) (Zhou et al., 2015; Yan et al., 2018). Yan et al. reported a huntingtin (HTT) knockin pig as an HD model, in which pig HTT exon 1 containing 18 CAG repeats was replaced with human HTT exon 1 containing a 150-CAG repeat by using CRISPR/Cas9. An expanded CAG repeat causes the HD phenotype (Mangiarini et al., 1996). The cloned HTT knockin pigs showed less weight gain compared with age- and sex-matched WT pigs, as well as HD-like symptoms, including the deficient motor function and respiratory difficulty usually observed in HD patients. Morphological analysis revealed neuropil and nuclear HTT aggregates in the brain of knockin pigs, as well as a marked decrease in the number of neurons in the striatum compared with that in the cortex and cerebellum; this condition is similar to the selective neurodegeneration in the striatum of brains of HD patients (Yan et al., 2018).

Parkinson's disease is the second most common neurodegenerative disorder. Approximately 10% of PD cases are familial, and many disease-associated or causative genetic mutations have been identified for them (Lesage and Brice, 2009). Zhou et al. (2015) reported on Pink1/Parkin double knockout pigs generated using CRISPR/Cas9 in a one-step transfection and cloning. The brains of the cloned knockout pigs lost the expression of Pink1 and Parkin, but no PD-associated phenotypic changes were found (Zhou et al., 2015). These pigs are expected to show late onset of the disease because PD is a progressive disease with a mean onset at around the age of 60 in humans.

#### Cardiovascular Diseases

Cardiovascular diseases are the number one cause of death and disability worldwide (Joseph et al., 2017). Although mice are the most commonly used disease models, they usually cannot accurately model the physiology and pathology of the human cardiovascular system, given their significantly different heart size and rate compared with humans (Milani-Nejad and Janssen, 2014). Therefore, mouse models often fail to predict human outcomes. For example, thiazolidinediones (TZDs), selective ligands of PPAR-γ, act as insulin sensitizers to treat type 2 diabetes. Although TZD-mediated activation of PPAR-γ exerts a beneficial effect on cardiovascular diseases in mouse models, the therapeutic efficacy of TZDs has been severely compromised because of the increased risk of adverse cardiovascular events in patients (Nissen and Wolski, 2007). Pig and human myocardia are highly similar. Thus, pigs are the most attractive models bridging the gap between humans and mice. Yang et al. (2011) established PPAR-γ heterozygous knockout pigs by using ZFN. Their work was the first to use a genome editing tool to knock out an endogenous gene in large animals (Yang et al., 2011).

The loss of function of low-density lipoprotein receptor (LDLR) and Apolipoprotein E (ApoE) has been implicated in the progression of atherosclerosis, the primary culprit for cardiovascular diseases (Mahley, 1988; Hasler-Rapacz et al., 1998; Sehayek et al., 2000; Rader et al., 2003). Given that the lipoprotein profiles and metabolism in mice differ from those in humans, atherosclerosis pig models may benefit atherosclerosis research. ApoE/LDLR double knockout pigs have been recently established by using CRISPR/Cas9 (Huang et al., 2017). These pigs show an abnormal lipid metabolism related to atherosclerosis. Prior to that, several strains of LDLR knockout pigs, including Yucatan miniature pigs engineered using traditional HR (Davis et al., 2014) and Ossabaw pigs engineered by using TALENs (Carlson et al., 2012), were generated. Among the models, LDLR knockout Yucatan miniature pigs, which are used to model atherosclerosis, have been well characterized. LDLR-deficient pigs had considerably elevated total and LDL cholesterol levels when fed a standard diet, resulting within a short time in atherosclerotic lesions in the coronary arteries and abdominal aorta that resemble human familial hypercholesterolemia and atherosclerosis (Davis et al., 2014). Recently, LDLR-deficient Yucatan pigs were used to test the effect of bempedoic acid on lowering LDL cholesterol and attenuating atherosclerosis, indicating the value of LDLR-deficient pigs in preclinical evaluation of therapeutics (Burke et al., 2018).

#### Cancer

Acquired mutations are the most common causes of cancer. Genome instability, activation of oncogenes, and inactivation of tumor-suppressor genes can all result in different types of cancer. An optimal cancer animal model should be established by inducing genetically defined tumors in a tissue-specific manner rather than by genetically engineering the animal. Wang et al. (2017a) established a Cre-dependent inducible Cas9-expression pig with CRISPR/Cas9-mediated knockin in the Rosa26 locus. This pig model allows ex vivo genome editing in isolated pig cells and in vivo genome editing by introducing a corresponding gRNA. To mimic an abnormal oncogenic EML4-ALK fusion gene identified in a subset of non-small-cell lung cancers (NSCLCs) (Soda et al., 2007), two gRNAs targeting EML4 and ALK were introduced into the isolated pig fibroblasts via lentivirus. Fibroblasts with an oncogenic EML4-ALK fusion gene arising through a paired CRISPR/Cas9-mediated genome inversion could be generated. Moreover, the study investigated the feasibility of lung cancer induction in vivo through intranasal delivery of multiplexed gRNAs targeting the tumor suppressor genes TP53, PTEN, APC, BRCA1, and BRCA2, as well as the oncogene KRAS. Three months after gRNA administration, the Cre-dependent Cas9-expressing pigs presented signs of pneumonopathy, and morphological analysis of lung tissue showed pathological features similar to those of human adenocarcinoma. The target genes harbored insertions/deletions close to the CRISPR cleavage site, indicating loss-of-function mutations in the tumor suppressor genes. For the oncogene KRAS, gain-of-function mutations induced by CRISPR/Cas9 were observed, with genotypes similar to those of the potent oncogenic mutations in human lung tumors (Wang et al., 2017a). The inducible Cas9 pig models offer an ideal platform for inducing tumor-associated somatic mutations in situ to model human cancer.
Apart from this model, several other pig cancer models were produced, including knockout of tumor suppressors P53 and RUNX3 by TALENs and CRISPR/Cas9 as the germline mutations, respectively (Kang et al., 2016; Shen et al., 2017). The carcinogenic phenotypes of these animals require further investigation.

#### Immunodeficiency

An animal with severe combined immune deficiency (SCID) not only mimics human diseases but also serves as a valuable research tool for cancer, stem cell, cell therapy, and organ transplantation. The interleukin-2 receptor gamma (IL2RG) knockout pigs that were generated using conventional HR or ZFN exhibited X-linked SCID, in which T and NK cells were absent (Suzuki et al., 2012; Watanabe et al., 2013). In generation of ZFN-mediated IL2RG knockout pigs, ZFN-encoding mRNA was transfected into male porcine fibroblasts to target IL2RG. The researchers obtained 1 ZFN-induced knockout cell line from 192 single cell-derived cell lines obtained by limiting dilution (0.5% targeting efficiency) (Watanabe et al., 2013). In the allogeneic bone marrow transplantation, the lymphoid lineage of the SCID pigs was reconstituted by donor cells and survival time was prolonged through restoration of immune function (Suzuki et al., 2012). This SCID pig serves as an essential preclinical model to evaluate stem cell therapy. In 2014, two laboratories separately established RAG1/2 knockout pigs by using TALENs (Huang et al., 2014; Lee et al., 2014). RAG knockout pigs exhibited an SCID phenotype and lacked T and B cells. In one study, human induced pluripotent cells were injected into these SCID pigs, which developed teratomas that represent a wide range of human tissues.

In another study, B cell-deficient pigs were generated by applying CRISPR/Cas9 to target the IgM heavy chain gene, which is crucial in B cell development and differentiation (Chen et al., 2015). The cloned modified pigs manifested a depletion of both B cells and antibody in their blood and thus can be used to model human B cell deficiency. Moreover, the pig models can be further engineered for large-scale production of therapeutic humanized polyclonal antibodies for clinical use.

#### GENETICALLY MODIFIED PIGS FOR XENOTRANSPLANTATION

Domestic pigs are the most suitable donors for xenotransplantation to alleviate the growing shortage of allogeneic donor organs for clinical transplantation to treat patients with end-stage organ failure. Pigs can provide size-matched organs at an affordable cost. However, the high immune incompatibility between the donor and the recipient is the major barrier for xenotransplantation. The immunological hurdles to xenotransplantation include, but are not limited to, hyperacute rejection, delayed xenograft rejection, acute cellular rejection, and chronic rejection (Yang and Sykes, 2007; Hryhorowicz et al., 2017). Advances in genome editing enable genetic modifications in pigs to reduce the cross-species immune barrier and prevent xenograft rejection.

The host normally destroys xenografts within minutes or hours through hyperacute xenograft rejection, which inevitably leads to failure. The main reason for hyperacute rejection of xenografts is the presence of naturally occurring antibodies in human plasma that recognize the Galactose α(1–3)Galactose (Galα(1–3)Gal) antigen on the surface of porcine endothelial cells. Synthesis of Galα(1–3)Gal is catalyzed by the enzyme α-1,3-galactosyltransferase (GGTA1), which is present in pigs but absent in humans (Good et al., 1992; Galili, 1993). As early as 2002, pigs with heterozygous GGTA1 knockout were produced using HR (Dai et al., 2002; Lai et al., 2002), and homozygous GGTA1 knockout (GTKO) piglets were born by further screening of homozygous knockout cells from the heterozygous pigs (Phelps et al., 2003; Kolber-Simonds et al., 2014). Heart xenotransplantation from GTKO pigs into immune-suppressed baboons showed a mean graft survival period of 99 days, and the longest surviving graft functioned in the recipient for 179 days (Kuwaki et al., 2005). Generation of GTKO pigs indicates that hyperacute xenograft rejection has been overcome to a great extent. A series of GTKO pigs with different genetic backgrounds has been recently established by using genome editing tools (Hauschild et al., 2011; Xin et al., 2013; Bao et al., 2014; Feng et al., 2016; Petersen et al., 2016; Chuang et al., 2017). Furthermore, other xenoreactive antigens present in pigs but absent in humans have been identified; these antigens include the Neu5Gc antigen (N-glycolylneuraminic acid), whose synthesis is catalyzed by cytidine monophosphate-N-acetylneuraminic acid hydroxylase (CMAH), and a glycan produced by β1,4-N-acetylgalactosaminyltransferase (B4GALNT2) (Byrne et al., 2015, 2018). In addition, corresponding knockout pigs with double (GGTA1/CMAH) and triple (GGTA1/CMAH/B4GALNT2) gene inactivation were established (Lutz et al., 2013; Estrada et al., 2015; Miyagawa et al., 2015; Gao et al., 2017). Based on the GTKO pigs, numerous other genetic modifications were performed to further overcome xenogeneic barriers; these approaches mainly include individual or combined overexpression of transgenes, such as CD46, CD55, CD59, CD39, thrombomodulin, heme oxygenase 1, A20, HLA-E, and CD47, to prevent complement activity, delayed xenograft rejection, and/or acute cellular rejection (Fischer et al., 2016; Hryhorowicz et al., 2017; Laird et al., 2017). These pigs were produced through transgenesis and are thus not discussed in depth in this review. These multi-modified pigs may exhibit further immune tolerance to attenuate xenograft injury.

Besides the concern about the pig-to-human immune barrier, another notable issue in xenotransplantation is the risk of cross-species transmission of porcine endogenous retroviruses (PERVs), which are dormant (inactive) endogenous retroviruses constituting an integral part of the porcine genome; PERVs may be reactivated by certain factors or changes in the environment and thus become infectious (Patience et al., 1997; van der Laan et al., 2000). PERVs in xenografts possess the potential to become pathogenic and infectious in the recipient. Elimination of PERVs from the porcine genome is difficult because they are integrated at multiple locations in the genome. Yang et al. (2015) used CRISPR/Cas9 to disrupt all 62 copies of the PERV pol gene in the immortal pig cell line PK15; by using the PERV knockout cells, they demonstrated a >1000-fold reduction in PERV transmission to human cells (Yang et al., 2015). The same group has recently cloned PERV-inactivated pigs by combining CRISPR/Cas9 and SCNT, in which all 25 copies of functional PERVs were inactivated (Niu et al., 2017). The use of PERV knockout pigs addresses this safety concern in clinical xenotransplantation. Moreover, the powerful ability of the CRISPR tool to target dozens of genomic sites simultaneously opens broad possibilities for creating animals harboring complex modifications.

In this section, we present an emerging cutting-edge technology, namely the growing of humanized organs in pigs, also called xeno-generation; this technology is greatly promoted by the use of genome editing. In contrast to directly modifying the pig as an organ donor, xeno-generation relies on interspecies blastocyst complementation, which combines donor (human) pluripotent stem cells with organogenesis-disabled hosts (pigs). This allows the enrichment of donor cells in a target organ to form a chimeric animal harboring a humanized target organ, which can finally be used for transplantation (Wu et al., 2016). In this regard, organogenesis-disabled pigs can be readily produced by genome editing once the gene controlling the development of the target organ is identified. Matsunari et al. (2013) demonstrated the feasibility of isogeneic organ generation using blastocyst complementation in pigs. In that study, the transgene Hes1 under the Pdx1 promoter (Pdx1-Hes1) was used to suppress pancreatic development, resulting in the creation of apancreatic pigs compatible with pancreatogenesis derived from donor cells. Wu et al. (2017a) reported interspecies chimerism in which human pluripotent stem cells could integrate and differentiate in a pig embryo, constituting a big step toward xenogeneic organ generation. Furthermore, this group disabled pancreatogenesis in pigs through knockout of PDX1 using CRISPR/Cas9, creating a suitable platform for realizing human organogenesis in pigs (Wu et al., 2017b).

### CONCLUSIVE THOUGHTS

#### Comparison of Three Engineered Nucleases

Engineered nuclease-based genome editing has revolutionized the creation of genetically modified pigs, thus expanding their utilization in diverse research fields. As the prelude to next-generation genome editing technology, ZFN surprised researchers by opening an effective approach to modify a target genome site. The gene targeting efficiency of ZFN, although limited, remains considerably higher than that of traditional HR. TALEN quickly overtook ZFN technology when it first appeared due to its higher efficiency in gene targeting, greater flexibility in targeting specific sequences, and ease of construction. Moreover, the emergence of the CRISPR/Cas system represents a major leap toward remarkably efficient and specific genetic modification in mammalian cells and zygotes. In addition, gRNAs are designed and produced through a quick and simple procedure, either by in vitro transcription of synthetic DNA oligonucleotides or by cloning of oligonucleotides into expression vectors; this process offers a clear advantage over the production of ZFN and TALEN. However, a few concerns have emerged regarding CRISPR/Cas, such as the requirement of a PAM sequence adjacent to the 3′ end of the target sequence and a high frequency of off-target cleavage. The requirement of a specific PAM sequence often restricts the range of targetable sequences. The commonly used Cas9 protein derived from S. pyogenes utilizes NGG as the PAM sequence, but recently exploited Cas9 orthologs from other bacterial species and redesigned/evolved Cas9 variants can recognize different PAM sequences, thereby increasing flexibility in genome editing. One study reported the fusion of Cas9 with a programmable DNA-binding domain to improve precision and increase targeting range. The Cas9 fusion protein was equally efficient for a range of PAMs, including NAG, NGA, NGC, and NGG (Bolukbasi et al., 2015). Another study used phage-assisted continuous evolution to generate an expanded PAM SpCas9 variant that can recognize a broad range of PAM sequences, including NG, GAA, and GAT. Moreover, the SpCas9 variant demonstrated a considerably greater DNA specificity than the original SpCas9 (Hu et al., 2018). The current Cas9 family expands the DNA targeting scope of CRISPR systems and is applicable at nearly any genomic locus. With regard to off-target effects, a typical TALEN target sequence usually covers 30 nt, which is unique within the genome, whereas CRISPR/Cas recognizes a 20 nt target and can tolerate multiple mismatches in the guide sequence, thereby increasing the likelihood of off-target effects (Fu et al., 2013; Hsu et al., 2013). A possible solution is the use of Cas-nickases guided by a pair of gRNAs targeting the opposite strands for cooperative genome editing, as the longer target site increases the on-target precision (Mali et al., 2013; Ran et al., 2013). In addition, various strategies have been developed to improve the targeting specificity; these strategies include optimized design of gRNA, Cas9 enzyme engineering, and off-target detection assays (Tycko et al., 2016). Notably, two recently engineered enzymes (eSpCas9 and SpCas9-HF) have elegantly increased SpCas9 specificity by reducing tolerance for mismatched DNA binding (Kleinstiver et al., 2016; Slaymaker et al., 2016).
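The effect of the PAM requirement on the range of targetable sequences can be illustrated with a small script. The following is a minimal sketch, not a published design tool: the function name `find_targets`, its parameters, and the toy sequence are hypothetical, and only the given strand is scanned (a real design tool would also scan the reverse complement and score each guide).

```python
import re

# IUPAC codes needed for the PAMs mentioned in the text (N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_targets(seq, pam="NGG", protospacer_len=20):
    """Return (start, protospacer, pam_found) for each site on the given strand.

    A site is a protospacer of `protospacer_len` nt that is immediately
    followed on its 3' side by a sequence matching `pam`.
    """
    seq = seq.upper()
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (protospacer_len, pam_regex))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical toy sequence: 20 adenines followed by TGG.
toy = "A" * 20 + "TGG"
ngg_sites = find_targets(toy, pam="NGG")  # canonical SpCas9 PAM
ng_sites = find_targets(toy, pam="NG")    # relaxed PAM, as for the evolved variant
print(len(ngg_sites), len(ng_sites))
```

On this toy sequence the relaxed NG pattern yields more candidate sites than NGG, which is the sense in which PAM-relaxed variants widen the targeting range.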

#### Approaches in Pig Genome Editing: SCNT Versus Embryo Injections

Somatic cell nuclear transfer, or cloning, involves screening of somatic cells (typically fetal fibroblasts) that carry the intended genetic alterations, followed by nuclear transfer of the modified cells in a cloning process. Engineered nucleases can be easily applied to create either NHEJ- or HDR-induced mutations within a donor cell in vitro, and a pre-screening or selection strategy enables enrichment of cells carrying the desired mutation (Zhou et al., 2015). An alternative to SCNT is direct gene editing in single-cell embryos. The mRNA of editors alone (for knockout) or together with donor DNA (for knockin) can be microinjected into the cytoplasm or pronucleus of zygotes, which are then transferred into synchronized surrogates to generate edited animals. This procedure is far simpler than SCNT (Lillico et al., 2013).

The major advantage of SCNT over direct embryo injection is the predictable genotype of the founder pigs. By contrast, pigs generated via embryo injection usually contain mosaic genotypes with multiple modification types in different cells, and several cycles of breeding are usually needed to produce homozygously modified pigs with identical genotypes. In some situations, the chimeric founder includes intact WT germ cells and thus cannot generate genetically modified progeny. However, SCNT of genome-edited cells suffers from impaired embryonic development, probably due to off-target effects or other unidentified toxicity of the genome editing nucleases. Therefore, genome-edited somatic cells usually have a relatively lower cloning efficiency than WT and randomly integrated transgenic cells. Taking into account the aspects mentioned above, embryo injection is currently the preferred way to produce CRISPR/Cas9-edited pigs in many laboratories. The extremely high editing activity and low toxicity to embryos during microinjection of the CRISPR/Cas9 mRNA cocktail can reduce mosaicism and even generate homozygous knockout founder pigs with identical mutations (Whitworth et al., 2017). However, knockin manipulation remains a limitation because few knockin pigs generated through zygote injection have been reported to date (Peng et al., 2015; Zhou et al., 2016). The field still awaits optimal strategies that enhance HDR, such as drug treatment, optimal donor design, and new generations of editing tools.

#### Other Emerging Genome Editing Nucleases

The number of reported genetically unique large animals is increasing dramatically as a result of the extensive use of genome editing tools. Moreover, the types of endonucleases used in genome editing are rapidly increasing. In addition to Cas9, many Cas9-like nucleases have been developed given the natural diversity of bacterial CRISPR systems. Cpf1, a putative Class 2 CRISPR effector, mediates target DNA editing with features distinct from those of Cas9 (Zetsche et al., 2015). In contrast to Cas9, which generates blunt ends, Cpf1 generates a 5 nt staggered cut with a 5′ overhang, which is particularly advantageous in facilitating NHEJ-based gene insertion (knockin) into a genome. Recently, CRISPR/Cpf1-mediated dystrophin knockout pigs and phospholamban knockin pigs with a 3-nt deletion, generated in the presence of a single-stranded oligo donor, have been established as Duchenne muscular dystrophy (DMD) and dilated cardiomyopathy (DCM) models, respectively. In this study, CRISPR/Cpf1 induced a 41.8% knockout rate and a 2% knockin rate in the selected fibroblast colonies (Wu et al., 2018). A hybrid enzyme combining the Cas9-nickase and PmCDA1, an activation-induced cytidine deaminase, could perform targeted nucleotide substitution (C to U) without the use of template DNA, providing a novel route for point mutation (Nishida et al., 2016). A CRISPR system (Cas13a) that targets RNA has also been developed recently (Abudayyeh et al., 2017). In addition to the CRISPR system, Xu et al. (2016) designed a structure-guided endonuclease (SGN) consisting of flap endonuclease-1, which recognizes the 3′ flap structure, and the cleavage domain of Fok I, which cleaves DNA strands. A guide DNA complementary to the target with an unpaired 3′ end is needed to form a 3′ flap structure. SGN recognizes and cleaves the target DNA on the basis of the 3′ flap structure of a double-flap complex formed between the target and the guide DNA.
The SGN offers a strategy for a structure-based recognition, capture, and editing of any desired target DNA, thereby expanding the toolkit for genetic modification (Xu et al., 2016). Taken together, these systems are expected to substantially broaden the application of artificial engineered nucleases and facilitate the establishment of excellent pig models with desirable genotypes in agriculture and biomedicine.

### AUTHOR CONTRIBUTIONS


HY wrote the initial draft of the manuscript and worked on subsequent revisions. ZW worked on revising the manuscript.



### FUNDING

This work was supported by a grant from the National Natural Science Foundation of China (31772555).






**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Yang and Wu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Trio-Based Deep Sequencing Reveals a Low Incidence of Off-Target Mutations in the Offspring of Genetically Edited Goats

#### Edited by:

Zhiying Zhang, Northwest A&F University, China

#### Reviewed by:

Alejo Menchaca, Fundación IRAUy, Uruguay; Rui Chen, Baylor College of Medicine, United States; Robin Ketteler, University College London, United Kingdom

#### \*Correspondence:

Bjoern Petersen, bjoern.petersen@fli.de; Xiaolong Wang, xiaolongwang@nwafu.edu.cn

†These authors have contributed equally to this work as co-first authors

#### Specialty section:

This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics

Received: 02 August 2018; Accepted: 18 September 2018; Published: 04 October 2018

#### Citation:

Li C, Zhou S, Li Y, Li G, Ding Y, Li L, Liu J, Qu L, Sonstegard T, Huang X, Jiang Y, Chen Y, Petersen B and Wang X (2018) Trio-Based Deep Sequencing Reveals a Low Incidence of Off-Target Mutations in the Offspring of Genetically Edited Goats. Front. Genet. 9:449. doi: 10.3389/fgene.2018.00449

Chao Li<sup>1</sup>†, Shiwei Zhou<sup>1</sup>†, Yan Li<sup>1</sup>†, Guanwei Li<sup>1</sup>, Yige Ding<sup>1</sup>, Lan Li<sup>1</sup>, Jing Liu<sup>1</sup>, Lei Qu<sup>2</sup>, Tad Sonstegard<sup>3</sup>, Xingxu Huang<sup>4</sup>, Yu Jiang<sup>1</sup>, Yulin Chen<sup>1</sup>, Bjoern Petersen<sup>5</sup>\* and Xiaolong Wang<sup>1</sup>\*

<sup>1</sup> College of Animal Science and Technology, Northwest A&F University, Yangling, China, <sup>2</sup> Life Science Research Center, Yulin University, Yulin, China, <sup>3</sup> Recombinetics, Saint Paul, MN, United States, <sup>4</sup> School of Life Sciences and Technology, ShanghaiTech University, Shanghai, China, <sup>5</sup> Institut für Nutztiergenetik, Friedrich-Loeffler-Institut, Neustadt an der Weinstraße, Germany

Unintended off-target mutations induced by CRISPR/Cas9 nucleases may result in unwanted consequences, which will impede the efficient applicability of this technology for genetic improvement. We have recently edited the goat genome through CRISPR/Cas9 by targeting MSTN and FGF5, which increased muscle fiber diameter and hair fiber length, respectively. Using family trio-based sequencing, which allows better discrimination of variant origins, we herein generated offspring from edited goats and sequenced the members of four family trios (gene-edited goats and their offspring) to an average of ∼36.8× coverage. These data were systematically examined for mutation profiles using a stringent pipeline that comprehensively analyzed the sequence data for de novo single nucleotide variants, indels, and structural variants across the genome. Our results revealed that the incidence of de novo mutations in the offspring was equivalent to that in normal populations. We further conducted RNA sequencing using muscle and skin tissues from the offspring and control animals; the differentially expressed genes (DEGs) were related to muscle fiber development in muscles, and to skin development and immune responses in skin tissues. Furthermore, in contrast to recent reports of Cas9-triggered p53 expression alterations in cultured cells, we provide primary evidence to show that Cas9-mediated genetic modification does not induce apparent p53 expression changes in animal tissues. This work provides adequate molecular evidence to support the reliability of conducting Cas9-mediated genome editing in large animal models for biomedicine and agriculture.

Keywords: genome editing, CRISPR/Cas9, whole genome sequencing, off-target, de novo mutation

## INTRODUCTION

fgene-09-00449 October 1, 2018 Time: 16:45 # 2

Recent advances in genome editing using the type II bacterial clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) system have enabled efficient genetic modification in the genomes of many organisms, including large animal models for biomedical or agricultural purposes. Within the CRISPR/Cas9 system, the Cas9 from Streptococcus pyogenes recognizes a 5′-NGG-3′ PAM sequence on the non-target DNA strand, and allows complementation for 20 base pairs of target DNA sequence (Mali and Church, 2013). However, unwanted off-target mutations and chromosomal translocations are potential drawbacks. They raise concerns about the precision of the CRISPR/Cas system, which would prohibit its use in correcting human genetic diseases and hinder its commercialization within livestock genetic improvement programs (Garas et al., 2015; Ruan et al., 2017).

Off-target detection in advance is a challenge, as the existing proven analysis methods depend largely on amplification and sequencing of pre-selected off-target sites, identified by several bioinformatics tools [e.g., CasOT (Xiao et al., 2014) and CT-Finder (Zhu et al., 2016)]. This approach can be more difficult to implement when the analysis aims to interrogate all possible non-unique matches and allowed mismatches distal from the PAM sequences. Compared with Sanger sequencing or short-read deep sequencing of pre-selected off-target PCR amplicons, whole genome sequencing (WGS) is a less biased assessment of off-target mutations caused by Cas9. WGS is able to fully characterize the genome-wide mutation profiles, which include not only small insertions and deletions (indels) and SNPs but also structural variants such as inversions, rearrangements, duplications, and major deletions (Zischewski et al., 2017). This approach has been used to screen for off-target mutations induced by CRISPR/Cas9 in human cells (Smith et al., 2014; Kim et al., 2015), mice (Veres et al., 2014; Iyer et al., 2015), and plants (Zhang et al., 2017). Screening for off-target mutagenesis in gene-edited animals is rare, yet it will be highly important in farm animals, since gene editing brings the commercial benefits of improving the genetics of livestock and also provides research models for biomedical studies. In addition, investigating the mutation profiles in the offspring of edited animals will provide fundamental evidence to support the reliability of the CRISPR/Cas9 system.

Genetically modified goats were successfully generated through multiplex injection of four sgRNAs targeting two functional genes (MSTN and FGF5) and Cas9 mRNA in one-cell-stage embryos (Wang et al., 2015). The desired phenotypes were independently observed in the edited animals. For example, disruption of MSTN caused increased muscle mass (Wang et al., 2018b), while disruption of FGF5 increased the number of secondary hair follicles and enhanced fiber length (Wang et al., 2016). Although healthy edited goats with ideal phenotypes were generated by this effort, the rate of genome-wide off-target mutations in the edited animals and their progenies has not been well documented. As such, 11 goats from four family trios were sequenced at a high coverage (an average of ∼36.8×), and the mutational profiles of these animals were systematically characterized to determine rates of off-target activity. The de novo mutation rate in the offspring was determined to be largely equivalent to the mutation rates of other populations such as human and cattle. Together with our previous results using trio-based WGS to show a low incidence of off-target mutations in gene-edited sheep (Wang et al., 2018a), this work confirms the reliability of a multiplex-based CRISPR/Cas9 approach for the production of offspring from genetically modified large animal models that are intended for biomedical studies and food production.

## MATERIALS AND METHODS

#### Animals

The genetically edited animals (Shaanbei Cashmere goats) used in the present study were generated previously (Wang et al., 2015). Two edited males (#009 and #040) were selected and mated with either edited (#076, #082, #073) or wild-type (#1392) female animals of the same breed after puberty (1.5 years old). All animals involved in this study were maintained at the Research Farm of Yulin University, Yulin, China. All protocols involving animals were approved by the College of Animal Science and Technology, Northwest A&F University (Approval ID: 2014ZX08008002). The pedigree information of these four family trios was validated by estimating the kinship coefficient according to a previous study (Manichaikul et al., 2010), and is presented in **Figure 1A**. Additional WGS data from two goat trios were selected as control trios for the present study (unpublished data).

#### Whole-Genome Sequencing and Data Analyses

A total of 11 goats from four family trios were chosen for WGS (**Figure 1A**). For each animal, genomic DNA was extracted from peripheral venous blood samples with a Qiagen DNeasy Blood and Tissue Kit (Qiagen). To construct the WGS library, 1 µg of genomic DNA was fragmented to around 300 bp by ultrasonication using a Covaris S2 system. The sheared DNA fragments were then used for library construction with an Illumina TruSeq DNA library preparation kit at Novogene<sup>1</sup>. The final quality-checked libraries were sequenced on an Illumina HiSeq 3000 to produce 125-bp paired-end reads. The raw sequencing reads were first filtered to remove low-quality paired reads according to the following criteria: (1) reads with >10% N bases, (2) reads in which >50% of bases had a sequencing quality of <3, and (3) reads with a residual length of <40 bases after the adaptor sequences were trimmed. All reads that passed the quality control procedures were converted into FASTQ files.
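The three read-level criteria above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study; the function names, the representation of a read as a sequence string plus a list of integer base qualities, and the rule that a pair is discarded when either mate fails are assumptions made for illustration only.

```python
def passes_read_qc(seq, quals, max_n_frac=0.10, low_q=3,
                   max_low_q_frac=0.50, min_len=40):
    """Return True if one adaptor-trimmed read passes the three filters.

    seq   : read sequence after adaptor trimming
    quals : per-base quality scores (integers), same length as seq
    """
    # Criterion 3: residual length below 40 bases after trimming.
    if len(seq) < min_len:
        return False
    # Criterion 1: more than 10% N bases.
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    # Criterion 2: more than 50% of bases with quality below 3.
    n_low = sum(1 for q in quals if q < low_q)
    if n_low / len(quals) > max_low_q_frac:
        return False
    return True


def passes_pair_qc(read1, read2):
    """Assumption: a read pair is kept only if both mates pass."""
    return passes_read_qc(*read1) and passes_read_qc(*read2)
```

In this sketch the thresholds are strict inequalities, so a read with exactly 10% N bases is retained; whether the original filter treated the boundary in the same way is not stated in the text.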

To analyze the mutational classes in all the mutations, 12 different mutation types were categorized into nine classes (T > G, T > C, T > A, G > T, G > A, C > T, A > T, A > G, and A > C), and the base changes were measured in each animal and each family trio (Jónsson et al., 2017).

<sup>1</sup>http://www.novogene.com

substitution ratios from paternal or maternal origins in the F1 progenies.

#### Identification of de novo Variants

All reads after quality control were mapped to the goat reference sequence assembly ARS1 (Bickhart et al., 2017), using BWA (v0.7.13) tools (Li and Durbin, 2009) with default parameters. Local realignment and base quality recalibration were conducted using the Genome Analysis Toolkit (GATK, v.2.7-2) (Mckenna et al., 2010). Both single nucleotide variants (SNVs) and indels (2–100 bp) were called using GATK and Samtools (Li et al., 2009; Mckenna et al., 2010).

De novo SNVs and indels for each F1 animal were extracted according to the following criteria: (1) SNVs were independently identified by GATK and Samtools within a single trio; (2) the overlapping SNVs identified by both GATK and Samtools were chosen, and the SNVs specific to each F1 animal were selected; (3) SNVs that exist in a goat SNP database (n = 234, 11 populations including 30 cashmere goats, >79 million SNPs, unpublished data) were filtered out; (4) SNVs with a read depth of <12 in parents, or of <1/10 of the sum of the coverage in both parents in F1 animals, were filtered out (Allen et al., 2013; Jónsson et al., 2017); (5) SNVs were retained only if the normalized phred-scaled likelihood (PL) scores for the genotypes AA, AB, and BB (A, reference allele; B, alternate allele) were >20, 0, and >0 in the F1 animals, and 0, >20, and >20 in both parents (Allen et al., 2013); (6) SNVs with >10% average soft clipping per read were filtered out; (7) mis-aligned or miscalled SNVs/indels were removed by manual examination. Copy number variations (CNVs) were identified using CNVcaller, a tool that we developed recently (Wang X. et al., 2017). The scanning window was defined as 400 bp and the effective window value was set to 7, after which low-quality CNVs were removed manually. Structural variations (SVs) were independently detected using BreakDancer (Chen et al., 2009) and Lumpy (Layer et al., 2014) with the suggested parameters. The detailed filtering pipelines for de novo SNVs are summarized in **Table 1**.
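As an illustration of how criteria (2) to (5) combine, the following Python sketch filters a list of trio calls. It is a simplified reading of the text, not the authors' pipeline: the `TrioCall` record, all function names, and the interpretation that the depth threshold of 12 applies to each parent separately are assumptions. Criteria (6) and (7), soft clipping and manual examination, are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioCall:
    """Hypothetical record for one variant site called in a trio."""
    site: tuple        # (chromosome, position, ref, alt)
    child_depth: int
    sire_depth: int
    dam_depth: int
    child_pl: tuple    # normalized PL scores for (AA, AB, BB)
    sire_pl: tuple
    dam_pl: tuple


def child_pl_ok(pl):
    # F1 animal: heterozygous call, PL(AA) > 20, PL(AB) = 0, PL(BB) > 0.
    aa, ab, bb = pl
    return aa > 20 and ab == 0 and bb > 0


def parent_pl_ok(pl):
    # Parent: homozygous reference, PL(AA) = 0, PL(AB) > 20, PL(BB) > 20.
    aa, ab, bb = pl
    return aa == 0 and ab > 20 and bb > 20


def depth_ok(call, min_parent_depth=12):
    # Assumption: each parent needs at least 12 reads at the site.
    if min(call.sire_depth, call.dam_depth) < min_parent_depth:
        return False
    # F1 depth must reach 1/10 of the summed parental coverage.
    return call.child_depth >= (call.sire_depth + call.dam_depth) / 10


def candidate_de_novo(gatk_calls, samtools_sites, known_snps):
    """Keep F1 calls supported by both callers and passing all filters."""
    kept = []
    for call in gatk_calls:
        if call.site not in samtools_sites:   # criterion 2: both callers
            continue
        if call.site in known_snps:           # criterion 3: known SNPs
            continue
        if not depth_ok(call):                # criterion 4: read depth
            continue
        if not (child_pl_ok(call.child_pl)    # criterion 5: PL scores
                and parent_pl_ok(call.sire_pl)
                and parent_pl_ok(call.dam_pl)):
            continue
        kept.append(call)
    return kept
```

Passing `samtools_sites` and `known_snps` as Python sets keeps each membership test constant-time, which matters when the database holds tens of millions of SNPs.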

#### Prediction of Off-Target Sites

The putative off-target sites in the goat genome that might be recognized by the sgRNAs targeting the MSTN and FGF5 genes were predicted by CasOT (Xiao et al., 2014), and Cas-OFFinder (Bae et al., 2014). The potential off-target sites were defined as up to five mismatches according to a recent study (Boyle et al., 2017). The detailed information of predicted off-target sites is summarized in **Supplementary Table S1**.
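The mismatch criterion can be made concrete with a short Python sketch. It does not reproduce CasOT or Cas-OFFinder, which also search the genome and handle both strands; it only shows how a single candidate site would be scored against a guide. The function names are hypothetical, and any guide sequence passed to them is supplied by the caller.

```python
def count_mismatches(guide, site):
    """Number of positions at which a candidate site differs from the guide."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)


def has_ngg_pam(pam):
    """True for a 3-nt PAM of the form NGG (any base followed by GG)."""
    pam = pam.upper()
    return len(pam) == 3 and pam[1:] == "GG"


def is_putative_off_target(guide, site, pam, max_mismatches=5):
    """Apply the 'up to five mismatches' definition to one candidate site.

    A perfect match (zero mismatches) also passes, so the on-target
    locus itself has to be excluded separately by the caller.
    """
    return has_ngg_pam(pam) and count_mismatches(guide, site) <= max_mismatches
```

A different mismatch limit or PAM rule would require changing only `max_mismatches` or `has_ngg_pam`.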

#### Estimation of Mutation Rate

TABLE 1 | Summary of the filtering process for SNVs and indels in the F1 progenies.

The mutation rate per base pair per generation was estimated according to a recent study (Jónsson et al., 2017). Briefly, the coverage of the short-read alignments (.bam files) was averaged in 10,000-base windows, and autosomal windows with 13× to 130× coverage were selected. This resulted in 245,722 effective windows, or 2,457,220,000 base pairs (R), within our coverage range. We then estimated the mutation rate per base pair per generation for each F1 animal by dividing the average number of de novo mutations (µα) by twice R.

$$
\mu_g = \frac{\mu_\alpha}{2 \times R}
$$
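The calculation can be sketched in a few lines of Python. The function names are hypothetical, the coverage bounds are assumed to be inclusive, and the mutation count in the usage note is a placeholder rather than a value from this study.

```python
WINDOW_SIZE = 10_000  # bases per coverage window, as described in the text


def callable_bases(window_coverages, low=13.0, high=130.0,
                   window_size=WINDOW_SIZE):
    """Total bases (R) in windows whose mean coverage lies within [low, high].

    Assumption: the 13x and 130x bounds are inclusive.
    """
    n_windows = sum(1 for c in window_coverages if low <= c <= high)
    return n_windows * window_size


def mutation_rate(n_de_novo, callable_bp):
    """Mutation rate per base pair per generation: count / (2 x R).

    The factor 2 accounts for the two haploid genomes an offspring
    inherits, one from each parent.
    """
    return n_de_novo / (2 * callable_bp)
```

For example, the 245,722 effective windows reported above correspond to `245_722 * WINDOW_SIZE = 2_457_220_000` base pairs, and a placeholder count of 10 de novo mutations would be evaluated as `mutation_rate(10, 2_457_220_000)`.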

#### Validation of Edited Sites, SNVs, and Indels

PCR-based Sanger sequencing was conducted to validate the edited genomic regions and the de novo variants (SNVs or indels) identified by WGS. Primers for amplifying the edited sites or the regions encompassing de novo variants are listed in **Supplementary Tables S2, S3**. Purification of PCR products, the T7E1 cleavage assay, and Sanger sequencing were conducted according to our previous report (Wang et al., 2015).

#### RNA Sequencing and Data Analysis

Muscle tissues from MSTN-edited and skin tissues from FGF5-edited F1 animals (n = 3) and wild-type animals (n = 3) of the same age (4 months) on the same farm were collected for RNA-seq analysis. RNA extraction and sequencing were performed as described previously (Wang L. et al., 2017). Total RNA was isolated using Trizol Reagent (Invitrogen) and then treated with RNase-free DNase I (Qiagen) according to the manufacturer's instructions. The quality and concentration of the total RNA were determined using an Agilent 2100 Bioanalyzer (Agilent). In total, 12 RNA libraries were constructed from these samples, and oligo(dT) was used to isolate poly(A) mRNA. The mRNA was fragmented and reverse transcribed using random primers. Second-strand cDNAs were synthesized using RNase H and DNA polymerase I. The double-strand cDNAs were then purified using the QiaQuick PCR extraction kit. The required fragments were purified via agarose gel electrophoresis and enriched through PCR amplification. Finally, the amplified fragments were sequenced using an Illumina HiSeq 3000 system at Novogene (see text footnote 1), according to the manufacturer's instructions.

From the raw RNA-seq data, sequencing adaptors, reads with more than 5% unknown nucleotides, and low-quality reads (those in which more than half of the bases had a quality score of less than 10) were removed. The remaining clean data were mapped to the currently available goat genome sequence assembly (ARS1) (Bickhart et al., 2017) using TopHat2 (Kim et al., 2013) to screen for differentially expressed genes (DEGs). Counts for each gene were computed with the HTSeq Python package (Anders et al., 2015), and DEGs between the F1 progenies and control groups were determined with the edgeR Bioconductor package using the classic method (Robinson et al., 2010). Gene Ontology (GO) functional enrichment analysis was conducted using g:Profiler to identify the functional categories enriched in DEGs (Reimand et al., 2016). The default settings were used, and GO terms with a corrected P-value of less than 0.05 were considered significantly enriched.

The same RNA from muscle or skin tissues was used for qPCR analyses to validate the RNA-seq results. First-strand cDNA was synthesized using the Thermo Scientific RevertAid First Strand cDNA Synthesis kit (#K1622, Thermo Fisher Scientific, United States) according to the manufacturer's instructions, and was then subjected to quantification using a standard SYBR Premix Ex Taq (Tli RNaseH Plus) kit (#RR420A, Takara, China) on the Bio-Rad CFX96 Real-Time System. The primers for eight genes used for this study are listed in **Supplementary Table S4**. Biological and technical replicates were performed in triplicate for each sample. Relative gene expression was calculated using the 2<sup>−ΔΔCt</sup> method, quantified relative to the housekeeping gene GAPDH.
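For readers unfamiliar with the 2<sup>−ΔΔCt</sup> calculation, a minimal Python sketch follows. The function names and the choice to average replicate Ct values before taking differences are assumptions for illustration; the Ct values in the usage note are invented and are not data from this study.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the reference gene."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(target_edited, gapdh_edited,
                        target_control, gapdh_control):
    """Fold change of a target gene in edited versus control animals.

    Each argument is a list of replicate Ct values. GAPDH serves as the
    housekeeping reference, as in the text.
    """
    ddct = (delta_ct(target_edited, gapdh_edited)
            - delta_ct(target_control, gapdh_control))
    return 2 ** (-ddct)
```

With invented Ct values, `relative_expression([24, 24, 24], [18, 18, 18], [25, 25, 25], [18, 18, 18])` gives a ΔΔCt of −1 and therefore a two-fold higher expression in the edited group.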

#### Immunofluorescent Staining and Western Blotting

The biopsied tissues were immediately fixed in 4% paraformaldehyde at 4 °C overnight. The fixed tissues were then embedded in paraffin using standard immunohistochemical protocols. Immunofluorescence staining was conducted with an anti-p53 primary antibody (Cell Signaling Technology); the sections were then counterstained with Hoechst 33342 and analyzed by confocal laser microscopy. Total protein was extracted from muscles and quantified using the Bradford assay. Equal amounts of soluble protein were separated by SDS-PAGE and transferred onto a polyvinylidene difluoride (PVDF) membrane (Roche). Immunoblotting was conducted using antibodies specific for phospho-p53 (Ser15) (Cell Signaling Technology, 1:1000) and β-actin (Proteintech, 1:1000). Primary antibodies were visualized using a fluorescence imager system (Sagecreation).

## RESULTS

#### Generation of F1 Progenies

Cas9 mRNA and single guide RNAs targeting two functional genes, MSTN and FGF5, were multiplex-injected into one-cell-stage goat embryos to generate animals with gene disruption (Wang et al., 2015). From this treatment, edited goats with improved phenotypes for muscling and fiber length were successfully obtained (Wang et al., 2016, 2018b). We then selected five founder animals (#009, #040, #076, #082, and #073) and one wild-type individual (#1392) for natural breeding, and obtained five F1 progenies (**Figure 1A**). We next genotyped the targeted sites in the progenies and their parents through Sanger sequencing. The genotypes of the mutations at the on-target sites were validated by both WGS data (**Supplementary Figure S1**) and Sanger sequencing (**Supplementary Figure S2**). The sequencing data confirmed that the edited sites in the founder animals were successfully transmitted to their offspring, except that the F1 animal #P59 was wild-type at the MSTN\_sg1 locus, most likely because its dam is wild-type at this site (**Supplementary Figure S2**). In particular, mutations in #009 were transmitted to the twin progeny #P8 and #P9, even though #009 was mated with a wild-type female goat.

#### Deep Sequencing of Family-Trio Individuals

These 11 animals representing four family trios were subjected to WGS variant analyses. The kinship coefficient values of each animal were used to ensure the correct pedigree information (**Supplementary Table S5**). WGS of the 11 genomic DNA samples yielded a total of 722 Gb of raw data, with between 516 and 944 million mapped sequence reads per animal (**Supplementary Table S6**). Over 99.02% of the generated sequence reads were mapped, indicating that high-quality sequences were obtained. After alignment to the goat reference genome (ARS1) (Bickhart et al., 2017), an average sequencing depth of ∼36.8× (25.0–47.8×) was obtained for further analysis (**Supplementary Table S6**).

For all the SNVs identified by GATK, the mutation spectra in each animal and each trio were analyzed. The C > T, A > G, G > A, and T > C substitutions were predominant among all the mutation types in each sequenced animal, and each of these base change types represented >17% of base changes (**Figure 1B**). Additionally, the proportions of base changes did not differ significantly between parents and offspring in each trio (p = 0.326, Student's t-test) (**Supplementary Figure S3**). The base changes in each family trio were further examined, and no significant differences were found among the trios used for sequencing (**Figure 1C**). Moreover, the differences in nucleotide substitution patterns between paternal and maternal mutations in F1 animals were not significant (**Figure 1D**).

#### Identification and Validation of de novo SNVs

To detect SNVs that may be derived from Cas9-mediated genetic modification, we employed a series of stringent variant filtering procedures (**Figure 2A** and **Table 1**). We initially called >11.6 million SNVs by GATK and >12.0 million SNVs by Samtools independently in each progeny. We then selected the SNVs that were specific to each F1 animal, and retained those identified by both GATK and Samtools. Next, any SNVs that already existed in the goat SNP database (294 individuals from 11 wild and domestic populations, >79 million SNPs) were removed, and further filtering procedures included read depth, phred-scaled likelihood (PL) scores, and manual examination according to a previous study (Allen et al., 2013). After the manual check, 18, 24, 18, 11, and 14 SNVs remained in the F1 progenies P6, P97, P59, P8, and P9, respectively (**Table 1** and **Supplementary Table S4**). These de novo SNVs were distributed across all chromosomes and did not cluster near the gene target sites (**Figure 2B**). We selected the SNVs from P6, P59, and P9 for PCR amplification followed by Sanger sequencing validation, which confirmed that over 70% of the SNVs truly exist (**Table 1** and **Supplementary Table S2**). The germline de novo mutation rate in these F1 progenies was estimated to range from 0.85 × 10<sup>−8</sup> to 1.42 × 10<sup>−8</sup> base substitutions per site per generation (**Table 1**). We next predicted the genome-wide off-target sites using two programs, CasOT and Cas-OFFinder, and found that the vast majority of the off-targets identified by Cas-OFFinder were also included in the off-targets predicted by CasOT (**Figure 2C**). To ensure the reliability of the predicted off-targets, we chose the overlapping off-targets for further analysis. The distances from 100,000 randomly selected SNV sites and from the de novo SNVs to 534 predicted off-target sites (five mismatches) were simulated (**Supplementary Table S1** and **Supplementary Figure S4**), and no significant differences between the randomly selected SNVs and the de novo SNVs were observed in the five F1 progenies or in the two F1 animals from the control trios (p > 0.05, Kolmogorov–Smirnov test) (**Figure 2D**). Together, these results indicated that de novo SNVs in the F1 progenies resulted from normal spontaneous mutagenesis rather than from CRISPR/Cas-mediated gene editing.

FIGURE 2 | Structural features of de novo mutations in the goat genome. (A) Workflow of filtering procedures of de novo variants in goat family trios. (B) Genomic distribution of de novo variants (SNVs, indels, and CNVs) in the goat genome. Red dots indicate the location of the two target sites in MSTN and FGF5. (C) Summary of off-target sites predicted by CasOT and Cas-OFFinder at each target site. (D) The distances from 100,000 randomly selected sites (upper) and from de novo SNVs (lower) to predicted off-target sites. The off-target sites were defined as one mismatch at seed regions, and up to four mismatches at non-seed regions. The least distance to predicted off-target sites was chosen. The area between the two dashed lines represents the 95% confidence interval. (E) Genomic distribution of SVs across the goat genome.
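The distance comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code of the study: positions are treated as integers on a single chromosome, the function names are hypothetical, and only the two-sample Kolmogorov–Smirnov statistic is computed, without a p-value.

```python
from bisect import bisect_left, bisect_right


def nearest_distance(position, sorted_sites):
    """Distance from one position to the closest predicted off-target site.

    sorted_sites must be a non-empty, ascending list of positions on the
    same chromosome as `position`.
    """
    i = bisect_left(sorted_sites, position)
    candidates = []
    if i < len(sorted_sites):
        candidates.append(sorted_sites[i] - position)      # site at or right of position
    if i > 0:
        candidates.append(position - sorted_sites[i - 1])  # site left of position
    return min(candidates)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions that change only at
    # observed values, so it suffices to evaluate them there.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d
```

In use, `nearest_distance` would be applied to every de novo SNV and to every randomly drawn site, and the two resulting lists of distances would be passed to `ks_statistic`. A statistic close to zero indicates that the de novo SNVs lie no closer to predicted off-target sites than random sites do.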

#### Identification of Indels, CNVs, and SVs

Next, a comparative analysis within each trio was performed to identify Cas9-induced small indels, which are the likely outcomes of Cas9-induced double-strand breaks (DSBs) repaired via non-homologous end-joining (NHEJ). Filtering procedures similar to those used for de novo SNVs were applied to screen candidate de novo indels. After stringent indel filtering procedures including read depth, PL value, and manual examination, a total of 19 indels were determined to be de novo indels across the five F1 animals (**Table 2** and **Supplementary Table S3**). PCR-based Sanger sequencing confirmed the existence of 13 indels (**Table 1** and **Supplementary Table S3**).

We next examined whether large-scale genomic alterations (CNVs and SVs) could be attributed to Cas9 nucleases. CNVcaller (Wang X. et al., 2017) was used to search for CNVs, and after filtering CNVs by their genotypes and the effective window value, only four candidate CNVs were left in the F1 animals (**Figure 2B** and **Table 2**). A number of SVs were identified using BreakDancer (Chen et al., 2009) and Lumpy (Layer et al., 2014); only a few remained after filtering and were considered candidate de novo SVs in each animal (**Figure 2E**, **Table 2**, and **Supplementary Table S7**).

#### Analysis of Off-Target Mutations

To assess the off-target effects in F1 animals, we used Cas-OFFinder to identify potential off-target sites with up to 3 nucleotide mismatches and NRG PAM sites in the goat genome. We then determined whether the de novo mutations, as well as the mutations shared by parents and progenies, were located within the potential off-target sites; only two indels were determined to be off-target mutations (**Supplementary Table S8**). Sanger sequencing further validated these two variants (**Supplementary Figure S5**), indicating that off-target mutations are rare in the offspring of edited animals and are guide RNA specific.

#### RNA-seq Analyses of Edited Animals

We have recently analyzed the transcriptome profiles of muscle tissues from MSTN and/or FGF5-edited cashmere goats (Wang L. et al., 2017), and showed that MSTN disruption resulted in substantial changes in genes involved in lipid metabolism and biosynthesis. To better understand the transcriptional consequences of gene disruption in the genome of F1 progenies, we conducted transcriptome sequencing (RNA-seq) analysis in the edited progenies and WT animals using muscle or skin tissues. The volcano plot demonstrated that the expression of MSTN did not change significantly (**Figure 3A**). However, the disruption of MSTN resulted in 43 genes (23 up-regulated and 20 down-regulated) with significantly changed expression (**Figure 3C** and **Supplementary Table S9**). Some of these genes are known to be associated with muscle development, including FMOD, ARG2, TNMD, CSRP3, PCK2, EGR1, and TNC. Meanwhile, disruption of FGF5 led to the identification of 140 DEGs (74 up-regulated and 66 down-regulated) between the skins of F1 progenies and control animals (**Figures 3B,D** and **Supplementary Table S10**). Key regulators related to skin/hair follicle development such as AQP3, AQP5, SPINK7, and WIF1 were involved, indicating that FGF5 disruption may stimulate hair follicle functions, resulting in increased fiber length (Wang et al., 2016). We performed qPCR on ten DEGs (including MSTN and FGF5) using RNA isolated from tissues of the same individuals. The validation showed that the RNA-seq and qPCR results were consistent (**Figures 3E,F** and **Supplementary Figure S6**), suggesting the reliability of the RNA-seq analyses.

Subsequently, we performed GO enrichment analysis to predict the over-represented GO terms associated with the DEGs identified in muscle and/or skin tissues. The DEGs identified in muscles exhibited significant over-representation of GO terms related to muscle fiber development (such as "sarcomere," "contractile fiber," and "myofibril") (**Figure 3G**). The DEGs identified in skins were significantly enriched in GO terms related to skin and hair follicle development ("skin development," "keratinization," "keratinocyte differentiation," and "epidermis development") and immune responses ("granulocyte chemotaxis and migration," "leukocyte chemotaxis and migration," and "myeloid leukocyte migration") (**Figure 3H**).

[...] muscles (**Figure 4C**). These results are consistent with the p53 expression in mice (personal communication), and suggest that Cas9-mediated modification did not induce apparent p53 expression changes in animal tissues. This result is probably because the p53-dependent molecular response in animals may differ from that in cultured cells (Haapaniemi et al., 2018), or because the DNA damage was repaired during embryonic development.

## DISCUSSION

Deep sequencing is able to fully characterize the mutational profiles in genomes, and is used to detect mutational changes in genetically edited organisms (Huang et al., 2016). In this study, by sequencing four family trios at a high coverage, the de novo variants in F1 animals that could be attributed to the engineered nucleases were determined to be negligible, representing a low incidence of CRISPR/Cas9-mediated off-target mutations. Our results also further demonstrate the reliability of WGS in documenting mutations induced by genome editing.


Previous studies have shown that de novo mutations exhibit variant type preferences and distinct parental origins, and that the mutational signatures are influenced by multiple factors including nucleotide type, sequence context, replication timing, and epigenetics. In this study, the mutational spectrum in a genome other than those of human and mouse is reported. The C-T and A-G transitions were predominant in goat genomes; the enrichment of C > T transitions at CpG dinucleotides could reflect spontaneous deamination of methylated cytosine to thymine (Bolli et al., 2014). We observed that the proportions of base changes did not differ significantly between the parents and the F1 progenies (**Figure 1C**), indicating that the mutation profiles are largely stable in the edited animals and their offspring.

Mutation rate is a key parameter for calibrating the timescale of sequence divergence. The estimated average germline de novo SNV rate (per base pair per generation) in the present study is 1.15 × 10<sup>−8</sup>, which is equivalent to the average germline mutation rate in several trio-based human populations including CEU (1.2 × 10<sup>−8</sup>) and YRI (1.0 × 10<sup>−8</sup>) (Consortium et al., 2010), Icelanders (1.29 × 10<sup>−8</sup>) (Jónsson et al., 2017), and Danish (1.28 × 10<sup>−8</sup>) (Maretty et al., 2017), as well as in a large cattle population (1.2 × 10<sup>−8</sup>) (Harland et al., 2017), two goat trios (unpublished data), and three family trios of gene-edited sheep (Wang et al., 2018a) (**Figure 5**). These findings further support the conclusion that the de novo SNVs in the F1 animals were generated naturally rather than induced by genetic modification.

The CRISPR-based approach relies on micro-injection of recombinant Cas9 mRNA/protein and guide RNAs and often results in off-target mutagenesis (Fu et al., 2013; Cho et al., 2014). Off-target sites predicted by different computational programs may overlap only partially (Tsai et al., 2015). In the present study, CasOT recognized most of the off-targets that were predicted by Cas-OFFinder, indicating that the overlapping off-target sites most likely represent the bona fide off-target sites for further analysis. Furthermore, our work presents a trio-based WGS analysis to examine genome-wide de novo variants that may be induced by genetic modification in the F1 animals. Consistent with the off-target mutations observed in human cells (Veres et al., 2014; Yang et al., 2014) and mice (Iyer et al., 2015), we observed a low incidence of nuclease-induced mutations in large animal models through deep sequencing. These results therefore support the reliability of the CRISPR approach for the production of viable animals.

We have previously demonstrated that disruption of MSTN in goats resulted in increased body weight and enlarged myofiber diameters in muscles (Wang et al., 2018b), and that disruption of the FGF5 gene led to longer hair fibers in goats (Wang et al., 2016). To test the effect of Cas9 modification on the global transcriptional status in the F1 progenies, we conducted RNA-seq on muscle and skin tissues to independently characterize the transcriptional consequences and genetic mechanisms of knocking out MSTN and FGF5 in F1 progenies. In contrast to previous studies (Wang et al., 2016, 2018b), we did not observe significant expression changes of either MSTN or FGF5 by RNA-seq, which was validated by qPCR (**Figures 3E,F**). A plausible reason for this difference is that animals from two different generations were used for the analyses. However, in the present study, we did identify a list of DEGs that are known to be associated with muscle development (e.g., FMOD, ARG2, TNMD, CSRP3, PCK2, EGR1, and TNC) in the muscle tissues of MSTN-disrupted animals, or with skin and hair follicle development (AQP3, AQP5, SPINK7, and WIF1) in the skin tissues of FGF5-disrupted animals. Interestingly, functional enrichment analyses indicated that the DEGs are over-represented in GO terms associated with muscle fiber development in MSTN-disrupted goats, and in GO terms related to skin development and immune responses in FGF5-edited animals. MSTN is primarily thought to inhibit muscle differentiation and growth (Poncelet, 1997; Mosher et al., 2007), while FGF5 represses hair growth by blocking dermal papilla cell activation (Hebert et al., 1994; Ota et al., 2002). Therefore, it is plausible that disruption of these key regulators triggers multiple functional regulatory genes at post-transcriptional levels, eventually resulting in the observed phenotypic changes.

## CONCLUSION

In summary, the present study provides a comprehensive analysis of genomes from edited animals and their progenies through deep sequencing. We provide sufficient evidence to show that the incidence of de novo mutations is low not only in edited founder animals (Wang et al., 2018a) but also in the F1 progenies, and that their mutation rate does not differ from the rate of spontaneous mutations that normally occurs in wild-type animals. This study will serve as a valuable resource for evaluating the reliability of CRISPR-based genome editing technologies for engineering the genomes of large animals.

## DATA AVAILABILITY

All relevant results are within the paper and its **Supplementary data files**. The raw WGS data of 11 animals are available at the NCBI-SRA under accession nos. SRR6378093, SRR6378094, SRR6378095, SRR6378096, SRR6378097, SRR6378098, SRR6378099, SRR6378100, SRR6378101, SRR6378102, and SRR6378103. Transcriptomic data are available at NCBI-SRA under accession nos. SRR6411189, SRR6411190, SRR6411191, SRR6411192, SRR6411193, SRR6411194, SRR6411195, SRR6411196, SRR6411197, SRR6411198, SRR6411199, SRR6411200, and SRX743626.

## AUTHOR CONTRIBUTIONS


XW, TS, BP, XH, YJ, and YC conceived the research plans. CL, SZ, YL, GL, YD, LL, and JL performed the experiments. LQ provided the samples. XW, TS, and BP wrote the article.

## REFERENCES


## FUNDING

This study was supported by grants from National Natural Science Foundation of China (31772571 and 31572369), and local grants (2017NY-072 and 2018KJXX-009), as well as by China Agriculture Research System (CARS-39). XW is a Tang Scholar at Northwest A&F University.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00449/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a shared affiliation, though no other collaboration, with the authors XW, CL, SZ, YL, GL, YD, LL, JL, YJ, and YC at time of review.

Copyright © 2018 Li, Zhou, Li, Li, Ding, Li, Liu, Qu, Sonstegard, Huang, Jiang, Chen, Petersen and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Methodologies for Improving HDR Efficiency

Mingjie Liu<sup>1</sup>, Saad Rehman<sup>1</sup>, Xidian Tang<sup>1</sup>, Kui Gu<sup>1</sup>, Qinlei Fan<sup>2</sup>, Dekun Chen<sup>1</sup>\* and Wentao Ma<sup>1</sup>\*

*<sup>1</sup> College of Veterinary Medicine, Northwest A&F University, Xianyang, China, <sup>2</sup> China Animal Health and Epidemiology Center, Qingdao, China*

Clustered regularly interspaced short palindromic repeats (CRISPR)-associated protein 9 (Cas9) is a precise genome manipulation technology that can be programmed to induce a double-strand break (DSB) wherever needed in the genome. After nuclease cleavage, DSBs can be repaired by the non-homologous end joining (NHEJ) or the homology-directed repair (HDR) pathway. To produce a targeted gene knock-in or other specific mutations, DSBs should be repaired by the HDR pathway, whereas NHEJ can cause insertion/deletion mutations (indels) of various lengths, which can lead the targeted gene to lose its function by shifting the open reading frame (ORF). However, HDR has low efficiency compared with the NHEJ pathway. To modify genes precisely, numerous methods have been developed that inhibit NHEJ or enhance HDR, such as chemical modulation, synchronized expression, and overlapping homology arms. Here we focus on the efficiency and other considerations of these methodologies.

#### Edited by:

*David Jay Segal, University of California, Davis, United States*

#### Reviewed by:

*James Carney, Sandia National Laboratories (SNL), United States Alex Michael Ward, Sangamo BioSciences, United States*

#### \*Correspondence:

*Wentao Ma mawentao@nwsuaf.edu.cn Dekun Chen cdk@nwsuaf.edu.cn*

#### Specialty section:

*This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics*

Received: *31 August 2018* Accepted: *11 December 2018* Published: *07 January 2019*

#### Citation:

*Liu M, Rehman S, Tang X, Gu K, Fan Q, Chen D and Ma W (2019) Methodologies for Improving HDR Efficiency. Front. Genet. 9:691. doi: 10.3389/fgene.2018.00691*

Keywords: CRISPR-Cas9, HDR, NHEJ, HDR enhancement, DSB, cell arrest, NHEJ inhibitors

#### OVERVIEW OF CRISPR-CAS9 SYSTEM

Clustered regularly interspaced short palindromic repeats (CRISPR) represent a family of DNA sequences in bacteria and archaea (Barrangou, 2015). This family is characterized by direct palindromic repeats, where sequences are the same in both directions, varying in size from 21 to 37 bp (Barrangou and Marraffini, 2014), interspaced by spacers, which contain fragments acquired from viruses or phages that previously tried to infect the cell (Horvath and Barrangou, 2010; Morange, 2015). This characteristic array gives CRISPR its name (Jansen et al., 2002; Mojica and Rodriguez-Valera, 2016). CRISPR-associated (cas) genes are invariably located adjacent to a CRISPR locus (Jansen et al., 2002). The CRISPR-Cas system can be grouped into three types: type I, type II, and type III. In addition, there are 12 subtypes of the CRISPR-Cas system, which are distinguished by their exclusive genetic content and structural differences (Makarova et al., 2015). Cas1 and cas2 are universal across types and subtypes, whereas cas3, cas9, and cas10 are signature genes for type I, type II, and type III, respectively (Makarova et al., 2011). In this review we focus only on type II. The CRISPR-Cas system functions as a defense system in bacteria and archaea against bacteriophage infection, conjugation, and natural transformation by degrading foreign nucleic acid that enters the cell (Marraffini, 2015). The CRISPR-Cas system involves three distinct mechanistic stages: adaptation, biogenesis, and interference (Marraffini and Sontheimer, 2010). The adaptation stage involves the integration of fragments of foreign DNA (termed "protospacers," captured, excised, and inserted by Cas proteins) into the CRISPR array as new spacers. New spacers are usually added at the beginning of the CRISPR locus next to the leader sequence, creating a chronological record of viral infections (Sorek et al., 2013) and protecting the cell from further
infection. During the biogenesis stage, the CRISPR array is transcribed as a single long transcript (termed "pre-crRNA") containing much of the CRISPR array (Marraffini and Sontheimer, 2010) and is then processed and matured to produce CRISPR RNAs (crRNAs) with only one spacer sequence. In the interference stage, the spacers in these crRNAs guide Cas proteins to foreign DNAs, which are then cleaved (van der Oost et al., 2009; Wiedenheft et al., 2012; Barrangou, 2013). The type II CRISPR-Cas system needs only Cas9 to execute immunity in the presence of an existing targeting spacer sequence (Sapranauskas et al., 2011). It requires two small RNAs: the crRNA and the trans-encoded crRNA (tracrRNA) (Deltcheva et al., 2011). TracrRNA forms a secondary structure that interacts with the Cas9 protein, and it has a complementary region that enables it to bind to pre-crRNA (Anders et al., 2014; Jinek et al., 2014; Nishimasu et al., 2014). The dsRNA formed between pre-crRNA and tracrRNA is then processed by RNase III to form mature crRNA guides that are used in genome editing. When crRNA and tracrRNA are combined, they are collectively termed guide RNA (gRNA) (Jinek et al., 2012). A short (3–5 bp) DNA sequence termed the protospacer adjacent motif (PAM) is also required for targeting. The PAM is a component of the invading virus or plasmid, but it is not a component of the bacterial CRISPR locus. Cas9 will not successfully bind to or cleave the target DNA sequence if it is not followed by the PAM sequence (Mojica et al., 2009). The first step in target recognition is the transient binding of Cas9 to PAM sequences within the target DNA, which promotes the unwinding of the two DNA strands immediately upstream of the PAM (Sternberg et al., 2014). The spacer sequence of the crRNA then binds to the unwound DNA (6–8 bp in length), forming an RNA-DNA heteroduplex that triggers cleavage at the targeted site (Sternberg et al., 2014; Szczelkun et al., 2014). After recognition, the CRISPR-Cas9 system introduces a crRNA-specific DSB in the target sequence, which is further resolved either by homology-directed repair (HDR) or non-homologous end joining (NHEJ).
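The PAM requirement described above can be illustrated with a short sequence scan. The sketch below is only an illustration under stated assumptions: it assumes the widely used *Streptococcus pyogenes* Cas9, whose PAM is 5′-NGG-3′, and a 20-nt protospacer, neither of which is specified in the text above. The function names are hypothetical, and the scan covers only the strand it is given.

```python
import re


def find_cas9_targets(seq, protospacer_len=20, pam_regex="[ACGT]GG"):
    """Return (start, protospacer, pam) for each candidate site on the given strand.

    Assumption: SpCas9-style targeting, i.e. a protospacer immediately
    followed (3') by an NGG PAM. Other Cas9 orthologs use different PAMs.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def reverse_complement(seq):
    """Reverse complement, so the opposite strand can be scanned as well."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    # Hypothetical demo sequence, not taken from any real locus.
    demo = "TTGACCTGAAGCTCATCGGTACCAGGAATTC"
    for strand, s in (("+", demo), ("-", reverse_complement(demo))):
        for start, protospacer, pam in find_cas9_targets(s):
            print(strand, start, protospacer, pam)
```

A site without a PAM immediately 3′ of the protospacer is never reported, which mirrors the statement that Cas9 will not bind or cleave a target that lacks the PAM.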

### NHEJ PATHWAY

Typically, cells employ two main mechanisms to repair DSBs: classical NHEJ and HDR (Symington and Gautier, 2011). There are also alternative error-prone DSB repair pathways, such as single-strand annealing (SSA) and break-induced replication (BIR) (Pardo et al., 2009; Jasin and Rothstein, 2013). SSA does not require a homologous template and rejoins DNA ends that carry direct sequence repeats (Symington, 2014). BIR repairs one-ended DSBs, which arise from the collapse of a replication fork (Mayle et al., 2015). When DSBs occur in cells, the first response is usually repair by NHEJ. Compared to other DNA repair and DNA recombination pathways, the NHEJ pathway is a robust, error-prone but predominant and fast pathway with high flexibility. It can recognize diverse end structures at DSBs and accomplish diverse repair results (Aravind and Koonin, 2001; Gu and Lieber, 2008; Salsman and Dellaire, 2017). NHEJ can be classified into two types: canonical NHEJ (c-NHEJ) and alternative NHEJ (alt-NHEJ), also called microhomology-mediated end-joining (MMEJ) (Bae et al., 2014). c-NHEJ is active throughout the cell cycle and protects DSB ends against translocations (Roth et al., 1995; Soutoglou et al., 2007). Depending on the DNA ends, NHEJ is capable of employing different strategies. The whole process starts with assembly of the core complex, which recognizes broken ends and keeps them together so that the subsequent processing factors can act (Waters et al., 2014). The core complex is considered to include the Ku heterodimer (Ku80/70), the DNA-dependent protein kinase catalytic subunit (DNA-PKcs), DNA ligase IV, the X-ray repair cross-complementing protein 4 (XRCC4), and the XRCC4-like factor (XLF, or Cernunnos). Ku is a heterodimer, composed of two subunits (70 and 83 kD), that recognizes and binds to blunt DSBs first (Walker et al., 2001). 
In c-NHEJ, Ku recruits DNA-PKcs to the DSB site and forms a very stable complex that remains bound to the end (Weterings et al., 2003). Their assembly activates the kinase activity of DNA-PK and orchestrates c-NHEJ (Davis et al., 2014; Radhakrishnan et al., 2014). DNA-PK phosphorylates a host of DNA damage response proteins and thus regulates c-NHEJ and DSB processing and recruits Artemis nuclease (Moshous et al., 2001). However, DNA-PK mostly phosphorylates itself, which is crucial in DSB processing (Neal et al., 2014). Artemis has 5′ to 3′ single-stranded DNA exonuclease activity and DNA-PKcs-dependent 5′ and 3′ endonuclease activity on hairpins and single-stranded overhangs (Moshous et al., 2001). Ku and DNA-PKcs alone can also promote multiple DNA end-processing activities at the break site. The X family of DNA polymerases (pol mu and pol lambda) adds missing nucleotides at the DSB ends (Daley et al., 2005; Paull, 2005). Next, the DSBs will be ligated by Ligase IV/XRCC4/XLF, which is regulated by DNA-PK. Ligase IV/XRCC4/XLF forms an extended filament that wraps and stabilizes DNA and stimulates ligation (Tsai et al., 2007; Andres et al., 2012). Recent research also showed that a newly identified PAXX (a paralog of XRCC4 and XLF), a member of the XRCC4 superfamily, is another important mediator of c-NHEJ, which interacts directly with Ku. In most cases DNA is repaired via the c-NHEJ pathway and its efficiency can approach nearly 90% (Yang et al., 2013; Dow et al., 2015), which constitutes the basis of CRISPR/Cas9 technology (Vartak and Raghavan, 2015). The NHEJ process is illustrated in **Figure 1**.
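Because NHEJ repair products are indels of varying length, whether a given outcome shifts the open reading frame depends on whether the net length change is a multiple of three. The following minimal sketch shows that arithmetic; the function name is hypothetical, and it deliberately ignores in-frame indels that still inactivate a gene (for example by deleting essential residues or creating a stop codon).

```python
def is_frameshift(inserted_bp, deleted_bp):
    """Return True when the net indel length is not a multiple of three.

    Illustrative only: an in-frame indel can still disrupt protein function,
    so a False result does not mean the allele is functional.
    """
    net_change = inserted_bp - deleted_bp
    return net_change % 3 != 0


if __name__ == "__main__":
    # Hypothetical repair outcomes as (inserted bp, deleted bp).
    for ins, dele in [(0, 1), (0, 3), (2, 0), (1, 4)]:
        label = "frameshift" if is_frameshift(ins, dele) else "in-frame"
        print(f"+{ins}/-{dele} bp: {label}")
```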

### HDR PATHWAY

HDR is a faithful repair pathway. It comes into action mainly in the S- or G2-phase of the cell cycle and requires homologous DNA sequences. Homologous recombination is the desired mechanism for precise genome editing, and it only happens in the presence of a homologous duplex template to repair the broken site. When a DSB occurs, pathway choice depends on end resection (Symington and Gautier, 2011). The MRE11-RAD50-NBS1 (MRN, MRX in yeast) complex recognizes dsDNA and first creates a nick 15–20 bp from the 5′-ends of the DSB (Symington, 2014). Exonucleases such as SGS1-DNA2 and EXO1 complete the resection step (Kim and Mirkin, 2018). MRN then moves into flanking dsDNA regions, recruits ataxia telangiectasia mutated (ATM) kinase, the key upstream kinase of DSB signaling (Falck et al., 2005), and interacts with CtIP (Makharashvili and Paull, 2015). MRN also tethers DNA ends, which increases their local concentration and thus facilitates ATM activation (Dupre et al., 2006).

The MRN complex consists of three subunits. MRE11 is a Mn<sup>2+</sup>-dependent nuclease involved in homologous recombination, telomere maintenance, and DNA DSB repair (Paull, 2015). SAE2 activates the dsDNA-specific endonuclease activity of MRE11 (Cannavo and Cejka, 2014) and regulates the resection step during appropriate stages of the cell cycle (Mathiasen and Lisby, 2014). RAD50 belongs to the structural maintenance of chromosomes (SMC) family and has ATPase activity (de Jager et al., 2001). After ATP binding, RAD50 dimerizes and its DNA-binding activity is activated. Two MRE11 molecules then bind to the ATPase heads of the RAD50 homodimer, allowing MRE11 to interact with RAD50 (Williams et al., 2008). RAD50 forms the core of MRN and uses its extended coiled-coil domain to tether DSB ends during HR (Williams et al., 2010; Hohl et al., 2011). NBS1 contains a Fork-Head associated (FHA) domain and a BRCA1 C-terminal (BRCT) domain at its N terminus, binds MRE11, and recruits ATM, linking the core MRN activities to DNA damage response (DDR) proteins (Glover et al., 2004; Williams et al., 2009).

Histone variant H2AX is phosphorylated by ATM, which becomes γH2AX throughout the area surrounding the breakage within seconds after damage occurs (Rogakou et al., 1998). This sets off elaborate ubiquitylation and SUMOylation cascades to promote recruitment of BRCA1 (Morris et al., 2009) and 53BP1 (Stewart, 2009) but it is not crucial for the activation of ATM substrates such as CHK2 and p53 (Kang et al., 2005). Then MDC1, a large nuclear factor, directly binds to γH2AX and functions as a molecular scaffold that interacts with ATM and NBS1, promoting further MRN accumulation. In addition, MDC1 also helps ATM spread on DSB-flanking chromatin and furthers H2AX activation (Spycher et al., 2008). It also mediates the accumulation of many DDR factors, including 53BP1 and BRCA1 (Wang et al., 2002; Stucki and Jackson, 2006; Kim et al., 2007). ATM leads to phosphorylation of DDR cascades such as BRCA1, Chk2, p53, etc., (Shiloh, 2003; Lavin, 2008). DSBs also activate ataxia telangiectasia and RAD3-related protein (ATR). Full ATR activation requires not only itself and DNA damage sensors but also proteins that function as signal transducers and effectors, such as RPA, RAD17, TopBP1, Claspin, and Chk1. ssDNA overhangs will be coated by replication protein A (RPA) rapidly, and the ssDNA-RPA complex acts as a scaffold to attract ATR/ATR-interacting protein (ATRIP) (Zou and Elledge, 2003) and other DNA damage checkpoint kinases to trigger DDR (Chen and Wold, 2014). It will then be replaced with adenosine triphosphate (ATP)-dependent recombinase RAD51 (San Filippo et al., 2008) with the help of BRCA1 and BRCA2 as described above (Prakash et al., 2015). ssDNA can be generated by nuclease resection, such as the MRN-C-terminal binding protein interacting protein (CtIP) for short resection, and EXO1/BLM for long resection (Mladenov et al., 2016). In mammals, resection depends on CtIP and needs to be phosphorylated by CDK first (Huertas and Jackson, 2009). 
BRCA1 contributes to HR by colocalizing with MRN after DNA damage occurs and interacts directly with the resection factor CtIP (Sartori et al., 2007). BRCA1 assists RAD51 binding to ssDNA by evicting RPA (Zelensky et al., 2014) and promotes the recruitment of BRCA2 to DSBs through the bridging protein PALB2 (Sy et al., 2009). BRCA1 also appears to inhibit the resection suppressor 53BP1 (Bunting et al., 2010). RAD51 is a DNA strand-exchange protein that exists in mammalian cells and forms a filament referred to as the presynaptic complex (van der Heijden et al., 2007; Hilario et al., 2009). The assembly of a RAD51 nucleoprotein filament promotes homology search by locating and pairing the 3′-overhang with a homologous duplex DNA and catalyzing strand invasion (termed single-end invasion, SEI) (Morrical, 2015; Ma et al., 2017). The two ends of the DSB are identical, but one end serves as the "first end," which searches for the homologous sequence and forms a displacement loop (D-loop) structure while the other end waits for the later steps (Kim and Mirkin, 2018). Besides RAD51, DNA strand exchange also requires RAD54 and RDH54/TID1, which perform this step by stabilizing RAD51-ssDNA presynaptic filaments (Mazin et al., 2003).

Resolution of the exchanged DNA strands includes the Holliday Junction (HJ) pathway and the synthesis-dependent strand annealing (SDSA) pathway. Dissolution is the primary pathway for HJ resolution, which involves the BLM helicase-Topoisomerase IIIα-RMI1-RMI2 (BTR) complex. The BTR complex promotes branch migration of Holliday junctions (Karow et al., 2000) and also acts to suppress crossing over during homologous recombination (Wu and Hickson, 2003). Thus, this dissolution pathway gives rise exclusively to non-crossovers. The other pathways use structure-selective resolvases (SLX-MUS complex and GEN1) to process the exchange intermediates and can produce both crossover and non-crossover products (West et al., 2015). The SLX1 and MUS81-EME1 nucleases bind in close proximity on the SLX4 scaffold and process HJs (Castor et al., 2013). SLX1 catalyzes the initial incision and MUS81 introduces the second cut on the opposing strand (Wyatt and West, 2014). GEN1 is a member of the RAD2/XPG family and can only access and cleave recombination intermediates when the nuclear membrane breaks down (Rass et al., 2010). GEN1 first forms a dimeric complex that contains the two active sites and then performs a dual symmetric incision at HJs, generating nicked duplex products that can be ligated.

The SDSA pathway, like the HJ pathway, begins with the generation of a D-loop structure, but it also includes DNA synthesis in the 3′-direction, which extends the heteroduplex (Daley et al., 2014). The translocating D-loop then collapses, and the other resected DSB end anneals to this extended DSB end. Both ends then go through replicative extension and ligation, which generates non-crossover products.

When it comes to single-stranded template repair (SSTR), the repair mechanism is quite different from the dsDNA repair template scenario. Richardson et al. found that Cas9-induced SSTR in human cells requires the Fanconi anemia (FA) pathway, which was previously implicated in responses to interstrand crosslinks rather than nuclease-induced breaks (Richardson et al., 2018). They confirmed that SSTR is RAD51-independent, whereas HDR from a dsDNA donor is RAD51-dependent. After FA pathway knockdown, the efficiency of SSTR decreased while the levels of NHEJ simultaneously increased, and total editing stayed relatively constant. This means that the FA pathway can divert repair events from NHEJ to SSTR. Additionally, FA pathway knockdown specifically inhibits SSTR and has no effect on NHEJ. RAD51C and XRCC3 are required for SSTR, but RAD51B and XRCC2 are not. They also found that FANCD2, a central player in the FA pathway, was enriched even in the absence of an exogenous homology donor. In short, SSTR is much more efficient than HDR from a dsDNA donor but still requires further investigation. The HDR pathway is illustrated in **Figure 2**.

#### FAVORING THE HDR PATHWAY USING CHEMICAL AND GENETIC MODULATION

DSBs caused by Cas9 can go through both the NHEJ and HDR pathways, but in most cases they are handled by NHEJ (Frit et al., 2014), so it seems reasonable to inhibit key enzymes (e.g., DNA ligase IV) of the NHEJ pathway. Maruyama et al. (2015) investigated, in human epithelial (A549) and melanoma (MelJuSo) cell lines, the effect of SCR7, a putative inhibitor of ligase IV that targets its DNA-binding domain and impedes its ability to bind to DSBs (Srivastava et al., 2012). Results showed that SCR7 promotes a 21 bp precise insertion (ssDNA donor with two 100 bp homology arms) 3-fold in A549 cells at 0.01 µM and 19-fold in MelJuSo cells at 1 µM. They also assessed the effect of SCR7 on insertion of a ∼800 bp fragment (ssDNA donor with ∼80 bp homology flanking sequence on both sides of the DSB) into a murine bone marrow-derived dendritic cell line (DC2.4 cells). After treating the DC2.4 cells with 1 µM SCR7, the efficiency of insertion increased by ∼13-fold. It is worth mentioning that SCR7 affects lymphocyte development and can induce apoptosis. As an additional approach to DNA ligase IV inhibition, Chu et al. used the adenovirus 4 (Ad4) E1B55K and E4orf6 proteins to suppress NHEJ. These two proteins can mediate the ubiquitination and proteasomal degradation of DNA ligase IV (Forrester et al., 2011). Results showed that HDR efficiency was enhanced up to 7-fold (5 to 36%) by the Ad4 proteins in human HEK293 cells. In a mouse Burkitt lymphoma cell line, the addition of Ad4 proteins reduced transfection efficiency from 40 to 27% but promoted HDR by 5-fold (Chu et al., 2015). Yu et al. identified two small molecules (L755507 and Brefeldin A) that could improve HDR efficiency. L755507, a β3-adrenergic receptor agonist, can increase HDR by 3-fold at 5 µM in mouse ESCs. Brefeldin A, an inhibitor of intracellular protein transport from the ER to the Golgi apparatus, promotes HDR by 2-fold at 0.1 µM in mouse ESCs (Yu et al., 2015).
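The fold changes quoted in this section are simple ratios of treated to baseline efficiency. As a quick check of the arithmetic, the sketch below recomputes the Ad4 result from the two percentages given above (5% baseline, 36% with Ad4 proteins); the function name is hypothetical and nothing beyond those two values is taken from the cited studies.

```python
def fold_change(baseline_pct, treated_pct):
    """Ratio of treated to baseline editing efficiency (both in percent)."""
    if baseline_pct <= 0:
        raise ValueError("baseline efficiency must be positive")
    return treated_pct / baseline_pct


if __name__ == "__main__":
    # HDR efficiency in HEK293 cells without and with Ad4 proteins, as quoted above.
    print(f"{fold_change(5, 36):.1f}-fold")
```

The ratio 36/5 is 7.2, consistent with the "up to 7-fold" figure reported in the text.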

Pinder et al. identified that RS-1 can enhance HDR between 3- and 6-fold, varying with the locus and transfection factor in HEK-293A human embryonic kidney and U2OS osteosarcoma cell lines (Pinder et al., 2015). RS-1 is a compound that stabilizes the association between Rad51 and DNA. They also found that an optimized ratio for Cas9/gRNA to homology donor plasmid doubled the HDR efficiency. Upon BRCA1 over-expression, HDR is increased by 2- to 3-fold.

Besides these small molecules, HDR can also be promoted by using small interfering RNA (siRNA) to inhibit the expression of Ku protein, which is the pioneer protein in the NHEJ pathway. Li et al. assessed this method in pig fetal fibroblasts (Li et al., 2018). The results showed that inhibiting Ku70 or Ku80 expression promotes HR by 1.6- to 3-fold, and also promotes SSA- and ssODN-mediated repair. Yu et al. constructed a Rad51 and Rad50 co-expression vector to evaluate its performance (Yu et al., 2011) and determined that HR efficiency increased by 110–245%. Chu et al. used short hairpin RNA (shRNA) in HEK293 cells to suppress key NHEJ pathway proteins such as KU70, KU80, and DNA ligase IV. Results showed that HDR efficiency was enhanced from 5 to 8–14% when cells were transfected with single shRNAs against KU70, KU80, or DNA ligase IV. Furthermore, they found that the expression of the target gene diminished in the cells undergoing NHEJ blockade, which may be caused by local chromatin remodeling through extended DNA damage (Chu et al., 2015).

exchange intermediate: the D-loop structure. Most D-loop structures will be extended by DNA synthesis (dashed arrow). The second end pairs to the D-loop and starts extension. This pathway is called the double Holliday junction pathway. Ligation generates the characteristic double Holliday junction, which may be cleaved by HJ resolvases into either crossover or non-crossover products. The synthesis-dependent strand annealing pathway is illustrated on the right. After D-loop formation, replication and branch migration take place which can lead to D-loop translocation. The translocating D-loop is unstable and collapses easily. After collapse, the extended first end may anneal to complementary ssDNA in the resected second end. Replicative extension of both ends and ligation generates non-crossover products.

#### TIMED DELIVERY OF THE CRISPR-CAS9 SYSTEM

Because HDR is typically restricted to the S and G2 phases of the cell cycle, its efficiency can be increased by synchronizing and capturing cells at the S and G2 phases or by using timed delivery. Lin et al. combined cell cycle synchronization, using chemical inhibitors to arrest cells at specific phases of the cell cycle, with direct nucleofection of pre-assembled Cas9 ribonucleoprotein (RNP) in HEK293T cells (Lin et al., 2014). Results showed that using lower Cas9 RNP concentrations together with cell cycle arrest can improve HDR efficiency to a maximum of 31% (3.4-fold). Consistent with this strategy, Yang et al. successfully enhanced HDR by three- to six-fold using the microtubule polymerization inhibitor nocodazole or ABT-751 (Yang et al., 2016). As for non-proliferating cells, BRCA1 is inhibited by the 53BP1 (Escribano-Diaz et al., 2013) and KEAP1-CUL3 complexes (Orthwein et al., 2015), and RIF1 thus will not bind to DSB. In addition, CtIP can function normally after being phosphorylated by CDK, but CDK is absent when cells stay in the G0/G1 phase (Escribano-Diaz et al., 2013). To overcome this inhibition, Orthwein et al. overexpressed a mutated, activated form of CtIP. This depleted the 53BP1 and KEAP1-CUL3 complexes simultaneously, which successfully activated the HDR pathway in G1 cells (Orthwein et al., 2015).

#### ENHANCING HDR BY USING OVERLAPPING SEQUENCES

Several types of donor DNA have been used, such as plasmid DNA and synthetic oligonucleotides (Carroll and Beumer, 2014). Rational design of homology repair templates strongly enhances HDR efficiency (Renaud et al., 2016). By using a linear repair template with homologous flanks in zebrafish, HDR can be increased by almost 10-fold (Irion et al., 2014). Chu et al. assessed the influence of the lengths of homology regions of the repair template on HDR efficiency (Chu et al., 2015). A homologous template is the most important component of HDR-mediated genome editing. It usually contains the intended mutations or insertions flanked by homologous regions. Templates can be plasmids (for modifications up to kilobases in size) or single-stranded oligodeoxynucleotides (ssODN) (for modifications of 50–100 nt). Sequencing around the region of interest is recommended, because cell-specific mutations and single nucleotide polymorphisms (SNPs) can influence gRNA targeting by up to 6-fold, even with merely one mismatch in 100 bases (Tham et al., 2016). This problem can be overcome by amplifying the homology arms from genomic DNA extracted from the target cells or by synthesizing them according to the sequence analysis (Salsman and Dellaire, 2017). It is important to remember that the DSB should always be as close as possible to the region of homology, ideally within 10 nt and at most 100 nt away (Elliott et al., 1998). Furthermore, making each homology arm about 50–100% the size of the payload can promote HDR (Li et al., 2014).
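The two rules of thumb in this section (a DSB within about 10 nt, and at most 100 nt, of the region of homology; homology arms of roughly 50–100% of the payload size) can be written as a simple design check. The sketch below is a hypothetical helper, not a published tool: the function name, the parameter names, and the decision to flag only arms shorter than 50% of the payload are assumptions made for illustration.

```python
def check_donor_design(payload_len, left_arm_len, right_arm_len, cut_to_edit_nt):
    """Return a list of warnings for a donor template design.

    Thresholds follow the guidelines quoted in the text: arms of about
    50-100% of the payload size, and a cut site ideally within 10 nt
    (at most 100 nt) of the region of homology. All lengths are in nt.
    """
    warnings = []
    min_arm = 0.5 * payload_len
    for name, arm in (("left", left_arm_len), ("right", right_arm_len)):
        if arm < min_arm:
            warnings.append(
                f"{name} arm ({arm} nt) is shorter than 50% of the payload ({payload_len} nt)"
            )
    if cut_to_edit_nt > 100:
        warnings.append(f"cut site is {cut_to_edit_nt} nt away, beyond the 100 nt maximum")
    elif cut_to_edit_nt > 10:
        warnings.append(f"cut site is {cut_to_edit_nt} nt away, beyond the preferred 10 nt")
    return warnings


if __name__ == "__main__":
    # Hypothetical design: an 800 nt payload with 80 nt arms and a cut 5 nt away.
    for warning in check_donor_design(800, 80, 80, 5):
        print("warning:", warning)
```

These are guidelines rather than hard limits, which is why the sketch returns warnings instead of rejecting a design outright.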

#### ENHANCING HDR BY USING MODIFIED CAS9

As mentioned above, DSB ends must be resected so that they can enter the HDR pathway. CtIP, a key protein in early steps of DSB resection, is essential for HDR initiation. In order to ensure the presence of functional activated CtIP, Charpentier et al. fused a minimal N-terminal fragment of CtIP to the Cas9 protein (Charpentier et al., 2018). Forcing CtIP to the DNA cleavage site, through fusion to either catalytically dead Cas9 (dCas9) or Cas9 together with 800 bp homology arms, obtained a 2-fold increase in the efficiency of HDR in human fibroblast RG37DR cells, iPS cells, and rat zygotes. However, the expression of dCas9-CtIP is not sufficient by itself to stimulate integration. In addition, the patterns of indels induced by modified Cas9 were different from Cas9. This may be because the modified Cas9 induced a different balance of the end-joining pathway.

As for point mutations, current approaches to targeted base correction are inefficient and typically induce an abundance of random insertions and deletions at the target locus. Komor et al. reported a powerful approach called a "base-editing (BE) system" that introduces specific point mutations without introducing a DSB or a donor template by linking a deaminase to dCas9 (Komor et al., 2016). dCas9 guides the deaminase to the target locus, where the deaminase converts cytidine to uridine. This conversion is then repaired through various pathways (Hess et al., 2017). BE2 (an optimized BE system) results from the addition of the uracil DNA glycosylase inhibitor (UGI), which increases the base-editing efficiency of the C>T substitution 3-fold (Hess et al., 2016). Additionally, the BE3 system was realized by replacing dCas9 with the Cas9 D10A nickase. This improvement achieved a 6-fold increase in efficiency over BE2 but exhibited a slightly increased indel frequency, as nicks can lead to NHEJ at a low rate (Certo et al., 2011). A similar system, Target-AID (target-activation-induced cytidine deaminase), uses the nickase Cas9 D10A to recruit the cytidine deaminase PmCDA1 to the target, achieving a mutation frequency of 35%. Adding UGI yields a 2- to 3-fold increase in efficiency and a reduction in deletions (Nishida et al., 2016). In addition, targeted AID-mediated mutagenesis (TAM) (Ma et al., 2016) and CRISPR-X (Hess et al., 2016) can generate transitions and transversions. These systems can achieve both precise editing of single C>T bases with a low rate of indels and sequence diversification. With further Cas9 engineering, the inhibition of HR during the G1 phase may be overcome.
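The cytidine-to-uridine conversion described above, read out as a C>T substitution, can be sketched as a toy model. The example below is an idealized illustration, not a predictor: it assumes an editing window at protospacer positions 4 to 8 (counted from the PAM-distal end), a value not given in the text and included only as an adjustable parameter, and it converts every C in that window, whereas real editing produces a mixture of outcomes. The function name and the input sequence are hypothetical.

```python
def simulate_cytidine_base_edit(protospacer, window=(4, 8)):
    """Convert every C inside the editing window to T (idealized C>T editing).

    Positions are 1-based and counted from the PAM-distal end of the
    protospacer. The default window is an assumption for illustration.
    """
    first, last = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        if base == "C" and first <= position <= last:
            edited.append("T")
        else:
            edited.append(base)
    return "".join(edited)


if __name__ == "__main__":
    # Hypothetical 20-nt protospacer with cytidines at positions 4 and 5.
    original = "ACGCCGTACGTAACGTTAGC"
    print(original)
    print(simulate_cytidine_base_edit(original))
```

Cytidines outside the window are left untouched, which is the sense in which base editing targets single bases without a DSB or donor template.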

The CRISPR-Cas system has been extensively studied and adapted for various applications over the decades, which gives us the ability to manipulate the genome as we desire. It has certain limitations, such as off-target effects (which can be overcome by rational sgRNA design; Doench et al., 2016) and low efficiency, which can be improved by the methodologies described above. These promising strategies have repeatedly been shown to enhance the HDR pathway, with results varying from 2- to 30-fold. Combining various approaches is a potential way of maximizing the rates of HDR. Here we reviewed only NHEJ inhibition by using inhibitors or by suppressing the expression of certain genes with siRNA or shRNA, CRISPR-Cas delivery in the S/G2 phase, adding homologous arms to donor templates, and using modified Cas9. These tactics make the CRISPR-Cas system more efficient, and more ways to boost CRISPR-Cas are likely to follow.

### AUTHOR CONTRIBUTIONS

ML, WM, and DC designed the structure of this review. ML wrote the first version of the manuscript. SR, XT, KG, and QF helped to revise the manuscript. All authors have reviewed the final version of the manuscript.

## FUNDING

This research was supported by the Qinghai Province Major R&D and Transformation Project (2018-NK-125), Xianyang Science and Technology Major Project (2017K01-34), Key Industrial Innovation Chains of Shaanxi Province (2018ZDCXL-NY-01-06), and the PhD research startup fund of the Northwest Agriculture and Forestry University (00500/Z109021716).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Liu, Rehman, Tang, Gu, Fan, Chen and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# FTO Knockout Causes Chromosome Instability and G2/M Arrest in Mouse GC-1 Cells

#### Tao Huang, Qiang Gao, Tongying Feng, Yi Zheng, Jiayin Guo and Wenxian Zeng\*

Laboratory of Reproductive Biology and Cell Engineering, College of Animal Science and Technology, Northwest A&F University, Xianyang, China

N<sup>6</sup>-methyladenosine (m6A) is the most abundant modification on eukaryotic mRNA. m6A plays important roles in the regulation of post-transcriptional RNA splicing, translation, and degradation. A growing number of studies have uncovered the significance of m6A in various biological processes such as stem cell fate determination, carcinogenesis, adipogenesis, and stress response, giving rise to a new concept called the epitranscriptome. However, the functions in spermatogenesis of the fat mass and obesity-associated protein (FTO), the first characterized m6A demethylase, remain obscure. Here we report that depletion of FTO by CRISPR/Cas9 induces chromosome instability and G2/M arrest in mouse spermatogonia, which was partially rescued by expression of wild type FTO but not demethylase-inactivated FTO. FTO depletion significantly decreased the expression of the mitotic checkpoint complex and G2/M regulators. We further demonstrated that the m6A modifications on Mad1, Mad2, Bub1b, Cdk1, and Ccnb2 are directly targeted by FTO. Therefore, FTO regulates the cell cycle and the mitotic checkpoint in spermatogonia through its m6A demethylase activity. The findings give novel insights into the role of RNA methylation in spermatogenesis.

#### Keywords: N<sup>6</sup>-methyladenosine, FTO, spermatogonia, cell cycle, chromosome instability, mitotic checkpoint

### INTRODUCTION

Life-long male fertility relies on spermatogenesis, which is responsible for the generation of millions of sperm (Fok et al., 2014). Spermatogenesis is a complex developmental process that consists of three stages: mitosis of spermatogonia, meiosis of spermatocytes, and transformation of haploid spermatids into sperm (Kanatsu-Shinohara and Shinohara, 2013). Thus, spermatogonia are the cornerstone of sperm production (Hamra et al., 2004). Nevertheless, the underlying mechanism regulating spermatogonial proliferation and differentiation remains largely elusive.

Over 100 different types of chemical modifications have been found in RNAs, among which N<sup>6</sup>-methyladenosine (m6A) is the most prevalent in eukaryotes (Niu et al., 2013). In general, m6A is mainly enriched near the stop codon, within the consensus motif DRACH (D = A, G, U; R = A, G; H = A, C, U) (Dominissini et al., 2012). In most species, m6A is installed by the "writer" complex, which is composed of METTL3, METTL14, WTAP, and several unknown components (Liu et al., 2014). Interestingly, m6A can be erased by the demethylases FTO and ALKBH5 (Zheng et al., 2013), and is recognized by the YTH-domain-containing proteins (Zhu et al., 2014). Growing evidence indicates that m6A is involved in post-transcriptional processes including translation, mRNA degradation, alternative splicing, and microRNA maturation, thus affecting gene expression

#### Edited by:

David Jay Segal, University of California, Davis, United States

#### Reviewed by:

Huabing Li, Shanghai Jiao Tong University School of Medicine, China Milind Ratnaparkhe, ICAR-Indian Institute of Soybean Research, India Masoud Zamani Esteki, Maastricht University Medical Centre, Netherlands

> \*Correspondence: Wenxian Zeng zengwenxian2015@126.com

#### Specialty section:

This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics

Received: 31 August 2018 Accepted: 22 December 2018 Published: 21 January 2019

#### Citation:

Huang T, Gao Q, Feng T, Zheng Y, Guo J and Zeng W (2019) FTO Knockout Causes Chromosome Instability and G2/M Arrest in Mouse GC-1 Cells. Front. Genet. 9:732. doi: 10.3389/fgene.2018.00732


(Wang et al., 2014, 2015; Alarcon et al., 2015). Recent studies have elucidated the significance of m6A in the regulation of stem cell fate determination, pre-adipocytes differentiation, DNA damage response, self-renewal of neural stem cells, and T cell homeostasis (Batista et al., 2014; Zhao et al., 2014; Li H.B. et al., 2017; Xiang et al., 2017; Wang et al., 2018).

The importance of m6A in spermatogenesis has been preliminarily revealed. Depletion of METTL3 leads to inhibition of spermatogonial differentiation and arrest of meiosis initiation, resulting in infertility (Xu et al., 2017). Furthermore, double knockout of METTL3 and METTL14 in advanced germ cells with Stra8-Cre disrupts spermiogenesis (Lin et al., 2017). Knockout of ALKBH5 causes severe apoptosis in spermatogonia and spermatocytes (Zheng et al., 2013). Consistently, YTHDC2 deficiency leads to meiotic arrest at the zygotene spermatocyte stage (Hsu et al., 2017). This evidence provides proof of concept for the involvement of m6A in spermatogenesis.

The fat mass and obesity-associated (Fto) gene, located on chromosome 16 in humans, encodes the FTO protein, which belongs to the α-ketoglutarate-dependent dioxygenase AlkB family. Loss of FTO leads to postnatal growth retardation and a significant reduction in adipose tissue and lean body mass (Fischer et al., 2009). It has been well demonstrated that FTO possesses m6A demethylase activity, which regulates preadipocyte differentiation, leukemic oncogene-mediated cell transformation, and tumorigenesis of glioblastoma stem cells (Li L. et al., 2017; Li Z.J. et al., 2017). Recent studies have revealed the significance of FTO in the regulation of neural development and stress response (Du et al., 2018; Engel et al., 2018). Interestingly, two missense mutations in Fto are associated with reduced semen quality in azoospermic patients (Landfors et al., 2016). However, little is known about the functions of FTO in spermatogenesis.

The aim of the present study was to gain more insight into the role of FTO in spermatogonial division. To this end, we employed the mouse GC-1 spermatogonial cell line as a research model and performed a loss-of-function study using CRISPR/Cas9. We found that knockout of FTO triggered abnormal chromosome segregation and cell cycle arrest. This phenotype could be partially rescued by wild-type FTO but not mutant FTO. FTO depletion elevated the m6A level of core mitotic checkpoint complex (MCC) components and G2/M regulators. Therefore, FTO regulates the cell cycle and the mitotic checkpoint in spermatogonia through its m6A demethylase activity.

#### MATERIALS AND METHODS

#### Cell Culture and Plasmid Transfection

The mouse spermatogonia cell line (GC-1) was maintained in Dulbecco's Modified Eagle's Medium (GE) with 10% fetal bovine serum (Gibco), 100 U/ml penicillin and 0.1 mg/ml streptomycin (PS) and incubated at 37°C with 5% CO<sub>2</sub>. For plasmid transfection, cells were seeded in 6-well plates (2 × 10<sup>5</sup> cells per plate) and cultured overnight. Plasmids were transfected into cells using TurboFect™ Transfection Reagent (Thermo Fisher Scientific™) following the manufacturer's instructions. Twenty-four hours post-transfection, cells were subjected to puromycin (2 µg/ml, Sigma) selection for 2 days.

### Antibodies

The primary and secondary antibodies were purchased from commercial sources as follows: mouse anti-FTO, mouse anti-Mad2, mouse anti-Cdc20, mouse anti-Bub1, mouse anti-Bub1b, mouse anti-Bub3, and mouse anti-Tubulin (Santa Cruz Biotechnology); rabbit anti-m6A (Synaptic Systems); rabbit anti-Actin (Sigma-Aldrich); HRP-goat anti-rabbit IgG (CWbio) and HRP-goat anti-mouse IgG (CWbio).

#### Vectors Construction

To knock out FTO in GC-1 cells, the following sgRNA oligos were designed and synthesized: sg-FTO1U: 5′-ACCGCCGTCCTGCGATGATGAAG-3′, sg-FTO1D: 5′-AAACCTTCATCATCGCAGGACGG-3′, sg-FTO2U: 5′-ACCGGAACTCTGCCATGCACAG-3′, sg-FTO2D: 5′-AAACCTGTGCATGGCAGAGTTC-3′. The PGL3-U6-PGK plasmid (a gift from Shanghai Tech University) was used as the backbone. The plasmid was ligated with the annealed sgRNA oligos using T4 ligase (Thermo Fisher Scientific). For the FTO rescue experiment, total RNA was extracted from GC-1 cells using RNAiso Plus reagent (Takara Clontech). cDNA was synthesized with the first strand cDNA synthesis kit (Takara Clontech) following the manufacturer's instructions. The following primers were designed and synthesized for the amplification of the FTO CDS: FTO-res-F: 5′-GAATCTAGAATGAAGCGCGTCCAGAC-3′, FTO-res-R: 5′-GGAGAATTCTGCTGGAAGCAAGATCCTAG-3′. PCR products were purified with the PCR clean-up Kit (Axgen). The CD513B plasmid and the purified PCR products were digested with the restriction enzymes EcoRI and XbaI (NEB), followed by ligation using T4 ligase.

For the FTO mutant experiment, the following primers were designed and synthesized: FTO-mut-1F: 5′-GAATCTAGAATGAAGCGCGTCCAGAC-3′, FTO-mut-1R: 5′-GCGTGAGTGGAACTAAACGCAGGCTGTGAGCCAGC-3′, FTO-mut-2F: 5′-GCTGGCTCACAGCCTGCGTTTAGTTCCACTCACCG-3′, FTO-mut-2R: 5′-GGAGAATTCTGCTGGAAGCAAGATCCTAG-3′. cDNA of FTO was used as the PCR template. PCR products were purified using the Gel Extraction Kit (Omega), followed by recombination using Neotec reagent. Recombined fragments were purified and ligated with CD513-B1 plasmids.
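The sgRNA oligo pairs listed above can be sanity-checked before ordering. The following is a minimal sketch, not the authors' procedure: it assumes that the first four bases of each oligo (ACCG on the upper oligo, AAAC on the lower one) are cloning overhangs, and that the remaining bases of the two oligos should be exact reverse complements. The function names are illustrative.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


# Oligo sequences as listed in the text (5'->3'): (upper, lower).
OLIGOS = {
    "sg-FTO1": ("ACCGCCGTCCTGCGATGATGAAG", "AAACCTTCATCATCGCAGGACGG"),
    "sg-FTO2": ("ACCGGAACTCTGCCATGCACAG", "AAACCTGTGCATGGCAGAGTTC"),
}


def oligos_anneal(upper: str, lower: str, overhang: int = 4) -> bool:
    """Check that the oligos pair once the assumed 5' overhangs are removed."""
    return upper[overhang:] == reverse_complement(lower[overhang:])


# True for a pair whose guide portions are perfectly complementary.
RESULTS = {name: oligos_anneal(up, down) for name, (up, down) in OLIGOS.items()}
```

A check of this kind only confirms internal consistency of the oligo pair; it says nothing about on-target or off-target activity.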

#### T7E1 Assay

Genomic DNA was extracted using phenol-chloroform followed by ethanol precipitation. For indel detection, the following primers were designed and synthesized: FTO-F: 5′-CCAGTGTCTCGCATCCTCATC-3′, FTO-R: 5′-TTACTCATCCTCAGAGCCTCAGA-3′. PCR products were purified using the PCR clean-up Kit. The purified DNA was denatured and re-annealed, and then digested with T7 endonuclease (NEB). After digestion at 37°C for 30 min, the DNA was analyzed by agarose gel electrophoresis. ImageJ was used to calculate the cleavage efficiency.
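The text does not state how band intensities were converted into a cleavage efficiency. As an illustration only, the sketch below applies a widely used estimate for mismatch-cleavage assays, indel % = 100 × (1 − √(1 − f)), where f is the cleaved fraction of total band intensity. The function name and the example intensities are hypothetical.

```python
import math


def t7e1_indel_percent(parental: float, cleaved_a: float, cleaved_b: float) -> float:
    """Estimate indel frequency (%) from densitometry of a T7E1 digest.

    parental  : intensity of the uncut PCR band
    cleaved_a : intensity of the first cleavage product
    cleaved_b : intensity of the second cleavage product
    """
    total = parental + cleaved_a + cleaved_b
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cleaved_a + cleaved_b) / total
    # The square root accounts for random re-annealing of wild-type and
    # mutant strands into heteroduplexes before digestion.
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Hypothetical intensities: 75% of the signal is in the cleaved bands.
example = t7e1_indel_percent(parental=25.0, cleaved_a=50.0, cleaved_b=25.0)
```

Because the estimate assumes random re-annealing, it tends to be most informative for pooled populations rather than for clonal lines.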

## Establishment of the FTO<sup>−/−</sup> Cell Strain

Plasmids expressing Cas9 and sgRNAs were co-transfected into spermatogonia using the TurboFect™ Transfection Reagent as described above. Twenty-four hours post-transfection, cells were selected with 2 µg/ml puromycin for 2 days. The surviving cells were resuspended at 300 cells/ml and seeded into a 100-mm dish. After 7 days of culture, monoclones were observed under the microscope. Monoclones were picked and transferred to a 96-well plate (one clone per well), followed by a 7-day culture. Subsequently, genomic DNA of each cell clone was extracted using the QuickExtract™ DNA Extraction Solution 1.0 (Epicenter) following the manufacturer's instructions. The DNA fragments containing the sgRNA target sites were amplified by PCR, followed by Sanger sequencing. Cell strains harboring frameshift mutations in both alleles of the Fto locus were considered Fto<sup>−/−</sup> cell strains.

### m6A Dot Blot

Total RNA was extracted from cells using Trizol reagent (TAKARA). mRNA was isolated and purified using the PolyATtract mRNA Isolation System III with Magnetic Stand (Promega) following the manufacturer's instructions. For the m6A dot blot, mRNA was spotted onto a Hybond-N+ membrane (GE Healthcare). After crosslinking at 80°C for 30 min, the membrane was blocked with 5% non-fat milk (Bio-Rad) for 1 h and incubated with rabbit anti-m6A antibody (1:1000, Synaptic Systems) at 4°C overnight. The membrane was then incubated with HRP-conjugated goat anti-rabbit IgG at room temperature for 2 h. After incubation with Immobilon Western Chemiluminescent HRP Substrate (Millipore), the immunocomplex was photographed using the ECL imaging system (Bio-Rad). Finally, the membrane was stained with 0.02% methylene blue to control for differences in the amount of mRNA loaded. The relative m6A level was quantified by gray-intensity analysis using ImageJ.

#### Western Blot Assay

Cells were lysed with RIPA buffer containing 1% PMSF, followed by ultrasonication. Cell lysates were incubated on ice for 30 min and centrifuged at 10,000 × g for 10 min. The supernatants were collected and the protein concentration was determined using a BCA detection Kit. Equal amounts of protein were loaded onto the polyacrylamide gel. The proteins were separated by SDS-PAGE using an electrophoresis apparatus (Bio-Rad). After electrophoresis, the proteins were transferred to a PVDF membrane (Millipore, IBFP0785C) using a semi-dry transfer instrument (Bio-Rad). The membranes were blocked with 5% non-fat milk for 1 h at room temperature and incubated with primary antibodies at 4°C overnight. Subsequently, the membranes were washed with PBST and incubated with HRP-conjugated secondary antibodies for 1 h at room temperature. After washing, the membranes were incubated with the Immobilon Western Chemiluminescent HRP Substrate (Millipore, United States) and photographed using the ECL imaging system (Bio-Rad, United States).

### Flow Cytometric Analysis

For cell cycle analysis, cells were suspended in cold 75% ethanol and treated with 0.1% Triton X-100 and 100 µg/ml RNase at 37°C for 30 min. Subsequently, the cells were stained with 50 µg/ml PI for 2 h and analyzed by flow cytometry. For cell clustering analysis, cells were fixed in cold 70% ethanol and permeabilized with 0.1% Triton X-100. The cells were then stained with 4′,6-diamidino-2-phenylindole (DAPI, Thermo Fisher Scientific) for 30 min and analyzed by flow cytometry.

#### Quantitative Real-Time PCR

Cells were lysed with Trizol reagent (TAKARA). Total RNA was isolated with chloroform followed by precipitation with isopropanol. cDNA was synthesized with the PrimeScript™ RT reagent Kit (TAKARA) following the manufacturer's instructions. Primers designed and synthesized for RT-qPCR are listed in **Supplementary Table S1**. Quantitative PCR was performed using the SYBR Green II PCR Mix (TAKARA) and the IQ5 (Bio-Rad).
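The quantification model used for the RT-qPCR data is not specified in the text. Purely as an illustration, the sketch below shows the common 2<sup>−ΔΔCt</sup> calculation, which assumes roughly 100% amplification efficiency for both the target and a reference gene. The reference gene, the function name, and the Ct values are all hypothetical.

```python
def relative_expression(
    ct_target_sample: float,
    ct_ref_sample: float,
    ct_target_control: float,
    ct_ref_control: float,
) -> float:
    """Fold change of a target gene in a sample relative to a control.

    Uses the 2^-ddCt model: each Ct is first normalized to a reference
    gene, then the sample is compared with the control.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target crosses threshold one cycle later
# in the knockout sample, with identical reference-gene Ct values.
fold_change = relative_expression(26.0, 18.0, 25.0, 18.0)
```

A one-cycle delay with an unchanged reference corresponds to half the transcript abundance under this model.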

#### Chromosome Spread Assays

Wild-type and FTO-KO cells were cultured in complete medium to 70% confluence and treated with 50 ng/µL nocodazole for 16 h. Cells were collected and subjected to hypotonic swelling in 75 mM KCl at 37°C for 30 min. Subsequently, cells were fixed in Carnoy's fluid (methanol:acetic acid, 3:1) at room temperature for 30 min. Cells were dropped onto pre-cooled glass slides and air dried. Slides were stained with Hoechst 33342 (1:500) and photographed under a fluorescence microscope. For each biological replicate, the chromosomes of 150 cells were counted and analyzed.

#### Immunofluorescence

For immunofluorescence analysis, cells were fixed in 4% paraformaldehyde/PBS for 30 min, permeabilized in 0.5% Triton X-100/PBS and blocked with 5% bovine serum albumin (BSA). After being washed three times with PBS, the cells were incubated with rabbit anti-CREST antibody (1:200) and mouse anti-β-tubulin antibody (1:200) at 4°C overnight. The cells were then washed another three times with PBS and incubated with FITC-conjugated goat anti-rabbit and rhodamine red-conjugated goat anti-mouse secondary antibodies (1:2000) at room temperature for 1 h. Cells were washed three times in PBS and counterstained with DAPI. Images were acquired with an inverted fluorescence microscope (Olympus, IX71).

## m6A-IP-qPCR

Total RNA was extracted from cells using the RNAiso Plus reagent (TAKARA). mRNA was isolated using the PolyATtract® mRNA Isolation Systems (Promega, Z5310) following the manufacturer's instructions. The m6A-IP was performed as previously described (Dominissini et al., 2012). In brief, 3 µg mRNA was mixed with 12.5 µL of rabbit anti-m6A antibody (0.5 mg/mL, Synaptic Systems, 202003), 5× IP buffer (50 mM Tris–HCl, pH 7.4, 750 mM NaCl, and 0.5% NP-40), RNase inhibitor and DEPC-treated nuclease-free water to make 500 µL of IP mixture. Protein A beads were washed three times with wash buffer (IP buffer mixed with RNase inhibitor), and then blocked with 0.5 mg/mL BSA. After blocking, the beads were incubated with the IP mixture and rotated at 4°C overnight, followed by extensive washing. Bound RNA was eluted using 100 µL elution buffer (1× IP buffer, 6.7 mM m6A). For m6A level measurement, 40 ng each of IP-RNA and Input-RNA was used for cDNA synthesis. The m6A<sup>+</sup> mRNA level was finally determined by real-time quantitative PCR.
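The IP mixture described above implies some simple dilution arithmetic. The sketch below is a worked example under the assumption that the 5× IP buffer is diluted to 1× in the final 500 µL; the volume left for mRNA, inhibitor and water is derived here and is not stated in the text. The function and key names are illustrative.

```python
def ip_mixture(
    total_ul: float = 500.0,
    buffer_stock_x: float = 5.0,
    antibody_ul: float = 12.5,
    antibody_mg_per_ml: float = 0.5,
) -> dict:
    """Derive component amounts for the m6A-IP mixture (assumes 1x final buffer)."""
    buffer_ul = total_ul / buffer_stock_x          # volume of 5x stock needed
    antibody_ug = antibody_ul * antibody_mg_per_ml  # uL x mg/mL = ug
    return {
        "buffer_5x_ul": buffer_ul,
        "antibody_ug": antibody_ug,
        # Volume remaining for mRNA, inhibitor and nuclease-free water.
        "remaining_ul": total_ul - buffer_ul - antibody_ul,
        # Final 1x concentrations derived from the 5x recipe in the text.
        "final_tris_mM": 50.0 / buffer_stock_x,
        "final_nacl_mM": 750.0 / buffer_stock_x,
        "final_np40_percent": 0.5 / buffer_stock_x,
    }


MIX = ip_mixture()
```

With the default values this gives 100 µL of 5× buffer, 6.25 µg of antibody, and a final buffer of 10 mM Tris–HCl, 150 mM NaCl and 0.1% NP-40.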

#### RNA-Decay Assay

WT cells and FTO-KO cells were treated with 5 µg/mL actinomycin D for 0, 3, or 6 h. Cells were harvested and subjected to RNA extraction. Real-time quantitative PCR was used to analyze the mRNA level of target genes in each group.
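One common way to summarize an actinomycin D time course is to fit first-order decay and report a half-life. The text does not say that half-lives were computed, so the following is only a sketch under that assumption: it fits ln(remaining fraction) against time by least squares and returns ln 2 divided by the decay constant. The data values are hypothetical.

```python
import math


def half_life_hours(times_h: list, remaining_fraction: list) -> float:
    """Estimate mRNA half-life from a transcription-shutoff time course.

    Fits ln(fraction) = ln(N0) - k * t by ordinary least squares and
    returns ln(2) / k. Fractions must be positive.
    """
    if len(times_h) != len(remaining_fraction) or len(times_h) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(v) for v in remaining_fraction]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return math.log(2) / -slope


# Hypothetical data at the 0, 3 and 6 h time points used in the assay.
t_half = half_life_hours([0.0, 3.0, 6.0], [1.0, 0.5, 0.25])
```

With only three time points the fit is coarse, so comparisons between genotypes are better made on replicate-level estimates than on a single curve.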

#### Statistical Analysis

All data were collected from at least three independent experiments. Data were analyzed using a two-tailed Student's t-test or one-way ANOVA followed by Duncan's multiple range test (SPSS 22 for Windows). Significance is presented as \*p < 0.05, \*\*p < 0.01, and \*\*\*p < 0.001. Error bars represent the SEM.
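For readers reproducing the summary statistics outside SPSS, the mean and SEM reported in the figures can be computed as below. This is a minimal sketch using only the Python standard library; the replicate values are hypothetical.

```python
import math
import statistics


def mean_and_sem(values: list) -> tuple:
    """Return (mean, standard error of the mean) for replicate measurements.

    SEM is the sample standard deviation divided by sqrt(n).
    """
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical values from three independent experiments.
summary = mean_and_sem([1.0, 2.0, 3.0])
```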

## RESULTS

### Depletion of FTO in Spermatogonia Subjected to CRISPR-Cas9 Gene Editing

The endogenous FTO of spermatogonia was abrogated using the CRISPR-Cas9 genome editing technique with Fto-specific sgRNAs. Two sgRNAs targeting exon 3 of Fto were designed and synthesized (**Figure 1A**). The CRISPR-Cas9 mutations resulted in 7- and 49-nucleotide deletions in the two alleles of the Fto gene, respectively, and thus induced frameshift mutations at the target sites (**Figure 1A**). Western blot analysis showed that the FTO protein was completely absent in the KO cells (**Figure 1B**). We next measured the total m6A level of mRNA extracted from wild-type cells and FTO-KO cells. The dot blot signal was significantly stronger for mRNA from FTO-depleted cells (**Figure 1C**), indicating that depletion of FTO elevated the m6A level in spermatogonia.
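The frameshift reasoning above rests on simple arithmetic: an indel shifts the reading frame when its length is not a multiple of three. A minimal sketch follows, using the two deletion lengths reported in the text; the function and variable names are illustrative.

```python
def causes_frameshift(indel_length_nt: int) -> bool:
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length_nt % 3 != 0


# Deletion lengths reported for the two Fto alleles.
ALLELE_DELETIONS = {"allele 1": 7, "allele 2": 49}

# True only if every allele carries a frame-shifting deletion.
BIALLELIC_FRAMESHIFT = all(causes_frameshift(n) for n in ALLELE_DELETIONS.values())
```

Both 7 and 49 leave a remainder of 1 when divided by 3, which is consistent with the complete loss of FTO protein seen by Western blot.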

### FTO Depletion Induces Formation of Multinuclear Giant Cells

A CCK-8 assay was used to measure cell viability. FTO depletion did not affect cell viability (**Figure 2A**). The morphology of the FTO-KO cells was markedly different from that of the Cas9-transfected controls, and was characterized by an increased population of cells with large cell sizes and spreading areas (**Figure 2B**). In general, a small number of giant cells can be found among wild-type cells, and these were considered to be binucleated spermatocytes. Unexpectedly, in the FTO-KO cells the proportion and size of giant cells increased dramatically (**Figure 2D**). To investigate the giant cells in more detail, we stained cell nuclei using DAPI. As shown in **Figure 2C**, the giant cells were aneuploid and contained large, irregular nuclei. Flow cytometry analysis showed that the proportion of aneuploid cells was significantly increased in FTO-KO cells compared with WT cells (**Supplementary Figure S1**). These results suggested that FTO deletion caused aneuploidy formation in spermatogonia.

### FTO Depletion Suppresses Chromosome Segregation

Either cell fusion or abnormal chromosome segregation could lead to aneuploidy formation. To determine whether the increase in the proportion of aneuploid cells was caused by cell fusion, we used a double fluorescent tracer assay. Cells stained with a red dye and cells stained with a green dye were mixed and cultured for 48 h; fused cells showed both fluorescence signals (**Figure 2E**). However, the proportion of fused cells among FTO-KO cells did not differ from that among wild-type cells (**Figure 2F**), indicating that the giant cells were not induced by cell fusion.

To investigate whether the aneuploidy was induced by abnormal chromosome segregation, we counted the chromosomes in wild-type cells and FTO-KO cells using the chromosome spread assay. Interestingly, the chromosome number in FTO-KO cells was significantly higher than in wild-type cells (**Figures 3A,B**). The mitotic checkpoint complex (MCC) is the effector of the spindle assembly checkpoint (SAC), which prevents cells from undergoing cytokinesis when chromosomes are improperly attached to the spindle at metaphase (Lara-Gonzalez et al., 2012). Previous studies have reported that dysregulation of MCC components results in chromosomal instability and aneuploidy (Kapanidou et al., 2015). We therefore hypothesized that FTO depletion induced aneuploidy formation through aberrant expression of the MCC. To test this, we measured the expression of the core MCC components Mad1, Mad2, Bub1, Bub1b, Bub3, and Cdc20. Interestingly, the expression of all MCC components examined was significantly decreased in FTO-KO cells at both the mRNA and protein levels (**Figures 3C,D**). These results suggested that FTO deletion suppressed chromosome segregation and induced aneuploidy formation through down-regulation of MCC expression.

#### FTO Depletion Arrests G2/M Transition

Previous studies have shown that m6A methylation is correlated with cell cycle progression during oocyte meiotic maturation (Qi et al., 2016). We therefore hypothesized that FTO regulates the cell cycle in spermatogonia. To test this, we analyzed the cell cycle by flow cytometry. Interestingly, we found that the proportion of G2-stage cells was significantly increased in FTO-KO cells compared with wild-type cells (**Figures 4A,B**). We next examined the expression of core regulatory proteins involved in the G2/M transition. As shown in **Figures 4C,D**, the expression of CDK1 and CCNB2 was significantly downregulated in FTO-KO cells, indicating that FTO modulates the G2/M transition by regulating the expression of the Cdk1/Ccnb2 complex.

### FTO Regulates Cell Cycle and Aneuploidy Formation Through the m6A Demethylase Activity

Previous studies have reported that mutation of the critical amino acid residue 313R to A (R313A) in the catalytic center of the FTO protein can completely ablate its m6A demethylase activity (Zhao et al., 2014). Hence, to investigate whether the FTO knockout phenotype in spermatogonia is due to loss of its m6A demethylase activity, we constructed two lentiviral vectors that expressed wild-type FTO (named FTO-wt) and R313A mutant FTO (named FTO-mut), respectively (**Figure 5A**). We next established three cell lines by transducing the FTO-KO cells with the FTO-wt, the FTO-mut, or the control (GFP) lentivirus. Western blot analysis showed that FTO expression was restored in FTO-wt and FTO-mut cells, but not in control cells (**Figure 5B**). We next measured the proportions of aneuploid and G2-stage cells in the three cell lines. Interestingly, the proportions of aneuploid and G2-stage cells in the FTO-wt group were significantly lower than those in the FTO-mut and control cells, indicating that the FTO depletion phenotype could be partially rescued by wild-type FTO but not mutant FTO (**Figures 5C–F**). These results suggested that FTO regulates the cell cycle in spermatogonia mainly through its m6A demethylase activity.

To verify whether FTO deletion leads to an increase in the m6A level of target gene transcripts, we performed m6A-IP-qPCR. As shown in **Figure 6A**, in the m6A-IP transcripts, the abundance of Bub1b, Mad1, Mad2, Cdk1 and Ccnb2 was significantly up-regulated in the FTO depletion group, while Bub1, Cdc20, and Bub3 were below the detection limit of qPCR, suggesting that the m6A level in the transcripts of Bub1b, Mad1, Mad2, Cdk1, and Ccnb2 is elevated by FTO knockout.

To further determine whether the increased m6A level accelerates the degradation of target mRNAs, we performed an RNA decay assay. Cells were treated with 5 µg/mL actinomycin D for 0, 3, and 6 h and harvested for RNA extraction. The remaining mRNA level was quantified by real-time quantitative PCR. As shown in **Figure 6B**, the mRNA stability of Mad1, Mad2, Bub1b, Cdk1, and Ccnb2 was significantly decreased after FTO depletion. These results suggested that FTO regulates the expression of target transcripts by regulating RNA stability.

Together, these data indicate that FTO directly regulates the expression of the core MCC components and G2/M regulators through the m6A/RNA decay pathway, thus regulating cell cycle and mitosis checkpoint in spermatogonia.

### DISCUSSION

Spermatogenesis is a highly dynamic developmental process involving intricate regulation of gene expression. The significance of m6A in spermatogenesis has increasingly been unraveled. FTO, the first discovered m6A demethylase, regulates RNA splicing, stability or translation, thereby influencing cell fate determination (Li L. et al., 2017). However, the function of FTO in spermatogonia remains unclear. Here, we established an Fto-null mouse spermatogonial cell line using the CRISPR/Cas9 system. We found that FTO deletion led to aneuploidy formation and G2/M arrest. We further demonstrated that FTO demethylated five transcripts of core MCC components and G2/M regulators. The findings suggest that FTO regulates chromosome segregation and cell cycle progression via its m6A demethylase activity.

Accurate segregation of duplicated chromosomes is indispensable for cell division. Accurate chromosome segregation relies on precise temporal regulation of sequential processes, including the orientation of the bipolar spindle, the attachment of kinetochores to microtubules, and the separation of daughter cells during cytokinesis (Thompson et al., 2010). An error at any step may lead to chromosome missegregation and aneuploidy formation (Meraldi, 2016). The mitotic checkpoint is a safeguard mechanism against aneuploidy formation (London and Biggins, 2014). When chromosomes fail to attach to the spindle, the checkpoint is activated to inhibit the downstream anaphase-promoting complex (APC/C), which prevents cells from entering the next cell cycle (Thompson et al., 2010). The importance of the MCC in the regulation of chromosome instability has been widely reported (Lara-Gonzalez et al., 2012). Chromosome segregation errors in mitosis are the most common cause of aneuploidy formation in vitro, as well as in clinical cancer samples (van Jaarsveld and Kops, 2016). Hence, the MCC components have been selected as promising targets for cancer therapy (Tanaka and Hirota, 2016). In the present study, we found that FTO regulates the expression of core MCC components, thus regulating chromosome segregation. The roles of FTO and m6A in the regulation of chromosome stability have not been reported previously. Our results therefore suggest, for the first time, that FTO may play important roles in the progression of seminoma. It will be interesting to investigate the functions of FTO in seminoma carcinogenesis in depth.

The Cdk1/Ccnb complex is the main component of the maturation-promoting factor (MPF) that triggers the G2/M transition (Adhikari and Liu, 2014). The role of Cdk1/Ccnb in the regulation of metaphase arrest during oogenesis has been well documented (Turner, 2015). In contrast, studies on the function of MPF in spermatogenesis are limited. Clement et al. (2015) reported that decreased expression of Cdk1 caused late meiotic arrest and infertility in mice. Recent studies showed that FTO regulates the expression of CDK2 and CCNB2, thus affecting cell cycle progression during adipogenesis (Wu et al., 2018). In the present study, we showed that FTO depletion led to decreased expression of Cdk1 and Ccnb2, resulting in G2/M arrest in spermatogonia.

A few studies have reported the significance of m6A-related proteins in spermatogenesis (Zheng et al., 2013; Hsu et al., 2017; Lin et al., 2017). The underlying mechanisms by which m6A regulates spermatogenesis remain obscure. Previous studies mainly analyzed the global m6A methylome and combined the variation in m6A peaks with differences in global expression or splicing of the transcriptome, thereby inferring the function of m6A in the stability or splicing of target transcripts (Xu et al., 2017). In the present study, we measured the m6A level of target genes with the m6A-IP-qPCR assay, which can precisely reveal m6A variation on target transcripts. We found that three transcripts of the core MCC components (Bub1b, Mad1, and Mad2) and two transcripts of the G2/M regulatory proteins (Cdk1 and Ccnb2) were directly targeted by FTO. We also demonstrated that the increased m6A level reduced the stability of the target transcripts. Recent reports have shown that FTO demethylates both m6A and m6A<sup>m</sup> in mammalian cells. To further elucidate the mechanism by which FTO knockout leads to the phenotypes, it will be interesting to detect m6A<sup>m</sup> by miCLIP-seq (Mauer et al., 2017; Wei et al., 2018). Additionally, our results showed that m6A modification was not detected in the transcripts of Bub1, Bub3, and Cdc20, indicating that FTO regulates these three genes through other pathways. There is no doubt that changes in gene expression after FTO depletion can be global. Although the target genes we focused on are directly associated with the phenotypes, contributions of other differentially expressed genes through other pathways should not be ignored. The mechanisms by which m6A regulates gene expression are complex. Changes in m6A level can affect mRNA decay, splicing or translation, depending on recognition by different readers. Here we mainly focused on the effects of FTO on the decay of target transcripts. Hence, to gain deeper insight into how FTO shapes the phenotype, RNA-seq combined with splicing analysis and translation efficiency assays will be necessary.

Male infertility has become a worldwide problem in recent years (Mascarenhas et al., 2012). Understanding the underlying mechanisms of spermatogenesis is important for the precise therapy of male infertility. As spermatogonia are the precursors of male germ cells, elucidating the regulation of spermatogonial homeostasis is important for understanding male infertility. The present study is the first to reveal the role of the RNA demethylase FTO in the regulation of chromosome instability and cell cycle progression in spermatogonia, thus giving novel insights into the role of RNA methylation in spermatogenesis and, potentially, in seminoma progression. Our study is limited to the functions of FTO in an immortalized cell line. It will be important to generate conditional knockout mice to gain a better understanding of the roles that FTO plays in spermatogenesis.

### CONCLUSION

In conclusion, knockout of FTO triggered aberrant chromosome segregation and cell cycle arrest, which could be partially rescued by wild-type FTO but not mutant FTO. FTO depletion elevated the m6A level of core MCC components and G2/M regulators. Therefore, FTO regulates cell cycle and mitosis checkpoint in spermatogonia through the m6A/mRNA degradation pathway. Our findings give novel insights into the role of RNA methylation in spermatogenesis.

### DATA AVAILABILITY STATEMENT

All datasets (generated/analyzed) for this study are included in the manuscript.

### AUTHOR CONTRIBUTIONS

TH and WZ conceived and designed the experiments. TH, QG, TF, and JG performed the experiments. TH analyzed the data. TH, WZ, and YZ wrote the manuscript.

#### FUNDING

This study was supported in part by the National Natural Science Foundation of China (Grant No. 31572401) to WZ.

#### ACKNOWLEDGMENTS

We thank Yungui Yang and Ying Yang from the Beijing Institute of Genome Research for their advice on m6A-IP. Thanks to Yinghua Lv for the technical support. We also thank Jiaying Li from the Northwest A&F University for the analysis of flow cytometry. Thanks to all the members of the Zeng laboratory for the helpful discussion.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00732/full#supplementary-material

#### REFERENCES

self-renewal through histone modifications. Nat. Neurosci. 21, 1139–1139. doi: 10.1038/s41593-018-0169-2


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Huang, Gao, Feng, Zheng, Guo and Zeng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Precise and Rapid Validation of Candidate Gene by Allele Specific Knockout With CRISPR/Cas9 in Wild Mice

Tianzhu Chao<sup>1,2†</sup>, Zhuangzhuang Liu<sup>1,2†</sup>, Yu Zhang<sup>1,2</sup>, Lichen Zhang<sup>1,2</sup>, Rong Huang<sup>1,2</sup>, Le He<sup>1,2</sup>, Yanrong Gu<sup>1,2</sup>, Zhijun Chen<sup>1,2</sup>, Qianqian Zheng<sup>1,2</sup>, Lijin Shi<sup>1,3</sup>, Wenping Zheng<sup>1,2</sup>, Xinhui Qi<sup>1,2</sup>, Eryan Kong<sup>1</sup>, Zhongjian Zhang<sup>1</sup>, Toby Lawrence<sup>4</sup>, Yinming Liang<sup>1,2,3</sup>\* and Liaoxun Lu<sup>1,2,3</sup>\*

#### Edited by:

Zhiying Zhang, Northwest A&F University, China

#### Reviewed by:

Shun Li, Fudan University, China Ashwin S. Shetty, Harvard University, United States Feng Gu, Wenzhou Medical University, China

#### \*Correspondence:

Yinming Liang yinming.liang@foxmail.com

Liaoxun Lu luliaoxun@foxmail.com

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics

Received: 26 August 2018 Accepted: 04 February 2019 Published: 19 February 2019

#### Citation:

Chao T, Liu Z, Zhang Y, Zhang L, Huang R, He L, Gu Y, Chen Z, Zheng Q, Shi L, Zheng W, Qi X, Kong E, Zhang Z, Lawrence T, Liang Y and Lu L (2019) Precise and Rapid Validation of Candidate Gene by Allele Specific Knockout With CRISPR/Cas9 in Wild Mice. Front. Genet. 10:124. doi: 10.3389/fgene.2019.00124

<sup>1</sup> Institute of Psychiatry and Neuroscience, Xinxiang Medical University, Xinxiang, China, <sup>2</sup> Henan Key Laboratory of Immunology and Targeted Therapy, School of Laboratory Medicine, Xinxiang Medical University, Xinxiang, China, <sup>3</sup> Laboratory of Genetic Regulators in the Immune System, Henan Collaborative Innovation Center of Molecular Diagnosis and Laboratory Medicine, School of Laboratory Medicine, Xinxiang Medical University, Xinxiang, China, <sup>4</sup> Centre for Inflammation Biology and Cancer Immunology, King's College London, London, United Kingdom

Identifying the causative genes underlying phenotypic differences among inbred strains of mice is a tempting goal, since these strains are a huge reservoir of genetic resources for understanding mammalian pathophysiology. In particular, wild-derived mouse strains harbor enormous genetic variation acquired during evolutionary divergence over hundreds of thousands of years. However, validating genetic variants in non-classical strains was extremely difficult until the advent of CRISPR/Cas9 genome editing tools. In this study, we first describe a T cell phenotype in both wild-derived PWD/PhJ parental mice and F1 hybrids from a cross to C57BL/6 (B6) mice, and we isolate a genetic locus on Chr2 using linkage mapping and chromosome substitution mice. Importantly, we validate the identification of the functional gene controlling this T cell phenotype, Cd44, by allele specific knockout of the PWD copy, leaving the B6 copy completely intact. Our experiments using F1 mice with a dominant phenotype allowed rapid validation of candidate genes by designing sgRNAs whose PAM sequences exist only in the PWD genome. We obtained 10 animals derived from B6 eggs fertilized with PWD sperm cells, which were subjected to microinjection of the CRISPR/Cas9 gene targeting machinery. Among these F1 hybrid newborns, 80% (n = 10) had allele specific knockout of the candidate gene Cd44 of PWD origin, and no mice showed mistargeting of the B6 copy. In the resultant allele-specific knockout F1 mice, we observed full recovery of the T cell phenotype. Therefore, our study provides a precise and rapid approach to functionally validate genes, which could facilitate gene discovery in classic mouse genetics. More importantly, as we succeeded in genetic manipulation of mice, allele specific knockout could provide the possibility to inactivate disease alleles while keeping the normal allele of the gene intact in human cells.

Keywords: CD44, allele specific knockout, CRISPR/Cas9, functional genomics, wild mice

## INTRODUCTION


Wild mice refer to both inbred lines and individual animals from natural house mouse populations, both of which harbor enormous genetic variation. Such models are particularly useful for studying genetic and environmental factors contributing to the host immune response to pathogens (Rosshart et al., 2017). The C57BL/6 and PWD/Ph inbred strains represent two major subspecies of house mouse, namely M. m. domesticus and M. m. musculus, respectively (Guenet and Bonhomme, 2003). The M. m. domesticus derived B6 mice and M. m. musculus derived PWK mice have highly diverged genomes with over 17 million single nucleotide polymorphisms (SNPs), which far outnumbers the genetic variation between classical laboratory strains (Keane et al., 2011). Over 90% of the genome of classical laboratory mouse strains is derived from the subspecies M. m. domesticus, and remnants of the M. m. musculus genome are extremely rare (Yang et al., 2011). Therefore, from a genetics perspective, it is interesting to analyze the phenotype of the M. m. musculus derived strain and harness functional genetic variation that cannot be found in classical mice (Gregorova and Forejt, 2000). We compared the phenotype of T lymphocytes between B6 and PWD strains in an attempt to search for genetic factors contributing to T cell biology, which is essential to understand host defense against infection and cancer (Malissen and Bongrand, 2015). We found that a typical subset of naïve CD4 T cells, expressing high levels of CD62L and low levels of CD44, was absent in the PWD strain. Both CD4 and CD8 T cells express higher levels of CD44 on the cell surface in PWD mice. To map the genetic factor(s) responsible for this phenotype, we crossed B6 and PWD mice to generate F1 hybrids. Interestingly, the F1 mice had an identical phenotype to PWD mice, suggesting a dominant effect of the gene(s). We then backcrossed F1 mice to B6 to segregate the causative genetic alleles. Indeed, in the backcrossed population, approximately 50% of the mice (275 out of 559) displayed the PWD T cell phenotype.

By means of genome wide scanning with genetic markers, we identified a single locus on Chr2 of PWD mice. The PWD strain was used to build a very particular genetic resource, the chromosome substitution strains, which allow rapid validation of genetic mapping. We therefore used the C57BL/6J-Chr2<sup>PWD/Ph</sup>/ForeJ strain, which carries the entire Chr2 from PWD on the pure B6 background (Gregorova et al., 2008). We found that such B6.PWD-Chr2 mice have an identical T cell phenotype to PWD mice. In further fine mapping experiments, we found that Cd44 per se was among the candidate genes in the mapped locus responsible for the T cell phenotype. CD44 is a cell surface marker for memory T cells and regulates memory cell survival (Baaten et al., 2010). We then sought to inactivate the PWD derived Cd44 locus in F1 hybrids via CRISPR/Cas9 genome editing, as previous studies showed that PAM sequences are necessary to cleave target DNA and that allele specific modification of DNA sequence can be performed in mice (Hsu et al., 2013; Wu et al., 2013). To perform functional gene validation involving wild mice, we employed allele specific genome editing, which has also been reported in human iPSCs and rats (Yoshimi et al., 2014; Smith et al., 2015), and we first compared the Cd44 sequences of PWD and B6 mice. In the coding sequence of Cd44, two SNPs constitute CRISPR/Cas9 PAM sequences only in PWD mice, which enables allele specific knockout of Cd44. Therefore, we could validate the functional relevance of this gene in F1 hybrids by specific knockout of the PWD allele. We designed two sgRNAs, which were co-injected along with the CRISPR/Cas9 machinery into B6 eggs fertilized by PWD sperm. The results showed that among the 10 newborns, 80% of the mice carried ORF loss mutations of Cd44 of PWD origin, with no mistargeting of the B6 Cd44 allele. The CRISPR/Cas9 engineered F1 mutant mice had a phenotype identical to B6 mice. Therefore, we validated the causative gene for this phenotype, and our study provides a strategy for functional gene identification via allele-specific knockout, which could be useful for forward genetic studies in mice.

## MATERIALS AND METHODS

#### Animals

PWD/PhJ mice and the chromosome substitution strain C57BL/6J-Chr2<sup>PWD/Ph</sup>/ForeJ (B6.PWD-Chr2, Stock No: 005995) were purchased from the Jackson Laboratory<sup>1</sup> via distribution by Shanghai MAOSHENGYAN Biologic Science & Technology Co., Ltd. ICR outbred foster mice and C57BL/6 mice were purchased from Beijing Vital River Laboratory Animal Technology Co., Ltd. All animal procedures were performed according to guidelines approved by the committee on animal care at Xinxiang Medical University.

#### Generation of CD44 Allele Specific Knockout F1 Mice

Single nucleotide polymorphism (SNP) information for the Cd44 gene between PWK and C57BL/6 mice was obtained from the Mouse Genomes Project<sup>2</sup>. The SNPs were validated in PWD mice by Sanger sequencing before the design of allele specific knockout sgRNAs. The sgRNA (Shao et al., 2014) and Cas9 mRNA (Liang et al., 2015) were produced by in vitro transcription (IVT) as described previously. Four-week-old female C57BL/6N mice were injected intraperitoneally with Pregnant Mare Serum Gonadotropin (PMSG) at 5 p.m., followed by Human Chorionic Gonadotropin (hCG, 10 units/mouse) 48 h later, and were then immediately mated with 12-week-old male PWD mice. Fertilized embryos were collected the next morning, and Cas9 mRNA (50 ng/µL) and sgRNA (50 ng/µL) were microinjected into the cytoplasm of fertilized embryos using a standard microinjection system (Eppendorf TransferMan® 4r, Eppendorf, Germany). Surviving eggs were cultured at 37 °C in 5% CO2 overnight and were transferred the next day into the oviductal ampullae of surrogate ICR mice.

<sup>1</sup>www.jax.org

<sup>2</sup>https://www.sanger.ac.uk/science/data/mouse-genomes-project

Genomic DNA from the tails of F1 newborns was subjected to PCR analysis using Phusion High-Fidelity DNA Polymerase (Thermo Fisher Scientific) with 5′-FAM-labeled primers to amplify the two loci targeted by the sgRNAs (Primer pair 1: GCTTTCTGGGGTGCTCTTCT; AGAGTATGTGGGTGAAGGGG. Primer pair 2: TGGATGTGAGATTGGGTCGAAG; GGCAGCATGTGTCGAGAATTAC). The PCR products were run on an ABI 3730 DNA analyzer, and the data were analyzed with GeneMapper software V3.1. The positions of the peaks indicate the lengths of the PCR products (Velasco et al., 2007). For sequencing, PCR products were further cloned into a T-vector (Tiangen, China). In general, 10 colonies were picked from each agar plate and subjected to Sanger sequencing.
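Because peak positions report PCR product lengths, indel sizes can be read off as the difference from the unedited amplicon length. The following is a minimal sketch of that arithmetic, not the GeneMapper workflow itself; the function name, the tolerance, and the example amplicon length are hypothetical, and an in-frame call here says nothing about whether the protein remains functional.

```python
from dataclasses import dataclass


@dataclass
class IndelCall:
    size_bp: int       # signed size: negative = deletion, positive = insertion
    frameshift: bool   # True when the size is not a multiple of 3


def call_indels(peak_sizes_bp, wild_type_bp, tolerance_bp=0.5):
    """Convert capillary peak sizes into signed indel calls.

    peak_sizes_bp: fragment lengths reported by the analyzer (may be fractional).
    wild_type_bp:  length of the unedited amplicon (hypothetical value in the test).
    tolerance_bp:  peaks within this distance of the wild type count as unedited.
    """
    calls = []
    for size in peak_sizes_bp:
        if abs(size - wild_type_bp) <= tolerance_bp:
            continue  # peak matches the unedited amplicon
        delta = round(size - wild_type_bp)
        calls.append(IndelCall(size_bp=delta, frameshift=(delta % 3 != 0)))
    return calls
```

In an F1 hybrid, one unedited peak is expected from the untargeted allele, so any additional peak indicates an indel on the targeted allele; clone sequencing is still needed to assign the indel to a parental allele.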

#### Immunophenotyping by Flow Cytometry

The splenocytes and thymocytes of mice were stained with a monoclonal antibody mixture and analyzed by flow cytometry. For activated T cell analysis, in vitro stimulation with anti-CD3 (3 µg/mL, 145-2C11, BD Biosciences) and anti-CD28 (1 µg/mL, 37.51, BD Biosciences) was performed in total splenocytes. The antibody labeling experiments were done as described in our previous studies for mouse immunophenotyping (Liang et al., 2013). In brief, for splenocytes, 1 million cells were stained in 50 µL with an antibody mix containing CD3 (Alexa700), CD4 (eFluor450), CD8 (PE), CD25 (PE-Cy5.5), CD44 (Brilliant Violet 605), CD45 (APC-eFluor780), CD62L (APC), and TCRβ (FITC). Live cells were gated by Sytox blue staining and acquired on the FACS Canto flow cytometer (BD, United States). For immunophenotyping of the thymocytes, 2 million cells were stained in 50 µL with an antibody mixture containing CD4 (PE), CD8 (Brilliant Violet 421), CD25 (APC), and CD44 (Brilliant Violet 605) and acquired on the FACS Canto flow cytometer (BD, United States). The FACS data were analyzed using FlowJo software version 10.0.

#### QTL Mapping

Inbred C57BL/6 female mice and PWD/PhJ male mice were crossed to obtain F1 hybrids. The resultant F1 mice were crossed to C57BL/6 mice to establish an N2 backcross population for linkage mapping. One hundred and twenty short tandem repeat (STR) markers were used to construct a genetic map using the function est.map from the R/qtl package. The phenotypic data and genetic map were analyzed with R/qtl using a standard interval mapping method (Broman et al., 2003). Initial interval mapping was performed with the R/qtl function scanone using the EM algorithm (Lander and Botstein, 1989), and a significance threshold at p = 0.05 for the selected T cell trait was determined with 1,000 permutations. Next, we refined the localization of QTL with a logarithm of odds (LOD) score above the threshold. To fine-map the major QTL implicated in the mouse T cell phenotype, CD44 expression in CD4 T cells, we selected markers flanking the candidate gene on Chr2: three STR markers, D2Mit75, D2Mit97 and D2Mit100, were used to screen 491 N2 individuals for recombinants. Then 37 recombinant mice were genotyped with 6 additional STR markers to narrow the mapped interval. The primers of the markers used for genotyping are listed in **Supplementary Table S2**.
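To make the LOD score concrete, the sketch below computes a simple two-point LOD for a single marker in a backcross. It is deliberately not the interval mapping with the EM algorithm used in R/qtl: it assumes a fully penetrant trait locus, complete genotype data, and a hypothetical function name, and it only compares the likelihood at the estimated recombination fraction with the likelihood under free recombination (θ = 0.5).

```python
import math


def two_point_lod(n_total, n_recombinant):
    """Two-point LOD score between a marker and a trait locus in a backcross.

    Compares the likelihood at the maximum-likelihood recombination fraction
    (theta = recombinants / total, capped at 0.5) with the likelihood under
    no linkage (theta = 0.5).
    """
    if n_total <= 0 or not 0 <= n_recombinant <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_recombinant <= n_total")
    theta = min(n_recombinant / n_total, 0.5)  # theta above 0.5 is not meaningful
    n_parental = n_total - n_recombinant
    log_l_linked = 0.0
    if n_recombinant:
        log_l_linked += n_recombinant * math.log10(theta)
    if n_parental:
        log_l_linked += n_parental * math.log10(1 - theta)
    log_l_unlinked = n_total * math.log10(0.5)
    return log_l_linked - log_l_unlinked
```

With this formula, a marker showing no recombinants among n animals reaches the maximum possible score of n·log10(2), while a marker recombining in half of the animals scores 0.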

#### Statistical Analysis

GraphPad Prism software (version 7.0) was used for data analysis, and statistical significance was assessed by unpaired, two-tailed Student's t-test. Data are presented as mean ± SEM. ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ∗∗∗∗p < 0.0001.
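For readers without access to GraphPad Prism, the summary statistics named above can be sketched with the Python standard library. This is an illustrative re-implementation with hypothetical function names, assuming equal variances (the classical Student's test); it returns the t statistic and degrees of freedom only, since the two-tailed p-value requires the t distribution from a statistics package (for example `scipy.stats.ttest_ind`).

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    df = na + nb - 2
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, df
```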

## RESULTS

#### Wild Derived PWD/PhJ Mice Display a Distinct T Cell Phenotype From C57BL/6 Mice

T lymphocytes are pivotal components of the mammalian immune system in the fight against pathogens and tumors, and their development is under sophisticated molecular control which requires genetic dissection (Liang et al., 2013; Wang et al., 2017). In both CD4 and CD8 T cells, the two major subsets of T lymphocytes, CD44 expression on the cell surface is used as an important marker to monitor T cell activation (Huet et al., 1989; DeGrendele et al., 1997). In a standard B6 mouse, assessment of CD44 expression on the cell surface helps define activated memory T cells. In vitro, T cell receptor (TCR) stimulation with anti-CD3 and anti-CD28 results in high CD44 expression (Flaherty and Reynolds, 2015). Interestingly, T cells from PWD mice displayed a distinct CD44 expression pattern in steady-state compared to B6 mice, and an increase of effector memory T cells upon TCR (CD3 and CD28) stimulation. As shown in **Figures 1A,B**, the CD44<sup>high</sup>CD62L<sup>high</sup> compartment among CD4-positive T cells, termed central memory cells (R2), was strikingly increased in PWD mice compared to B6 counterparts, while CD44<sup>low</sup>CD62L<sup>high</sup> naïve T cells (R1), conversely, were dramatically reduced. In addition, age matched PWD mice had significantly fewer CD44<sup>high</sup>CD62L<sup>low</sup> effector memory CD4 T cells (R3). Among CD8 T cells of PWD mice, the CD44<sup>high</sup>CD62L<sup>high</sup> compartment was also dramatically increased in frequency (R5), while CD44<sup>low</sup>CD62L<sup>high</sup> naïve T cells (R4) were again significantly reduced (**Figure 1C**). Next, we compared the mean fluorescence intensity (MFI) of CD44 by surface staining of CD4 and CD8 T cells; the MFI of CD44 in PWD CD4 T cells was 2.3-fold higher than in B6 mice, and in CD8 T cells CD44 expression was increased 5.8-fold (**Figures 1D,E**). Since CD44 expression can be induced by TCR stimulation, we compared the CD44 levels of CD4 and CD8 T cells between PWD and B6 mice following TCR stimulation.
Notably, CD4 and CD8 T cells from PWD and B6 mice both showed a dramatic increase of CD44 expression 24 h after TCR stimulation, which could facilitate T cell migration to sites of inflammation. PWD mice had more CD44<sup>high</sup>CD62L<sup>low</sup> effector memory cells among both CD4 and CD8 T cells (R6, R7) following in vitro activation by TCR stimulation (**Figures 1F,G**). Unexpectedly, the MFI of CD44 on both CD4 and CD8 T cells was 3-fold higher in PWD mice following TCR stimulation (**Figures 1H,I**). These results showed that PWD mice had significantly higher expression of CD44 in steady-state and, more importantly, that CD44 upregulation following TCR stimulation was more potent in T cells of PWD mice.

and TCRβ was used to define T cells in which CD4 and CD8 T cell subsets were further analyzed. For activated T cell analysis, in vitro stimulation with anti-CD3 and anti-CD28 was performed in total splenocytes. CD4 and CD8 T cells were analyzed for naïve and memory T cell frequencies in percentage and mean fluorescence intensity (MFI). (A) The gating method for T cells in the spleen of B6 and PWD mice. (B) Frequency of CD44<sup>low</sup>CD62L<sup>high</sup> cells (R1), CD44<sup>high</sup>CD62L<sup>high</sup> cells (R2) and CD44<sup>high</sup>CD62L<sup>low</sup> cells (R3) in CD4 T cells (CD4<sup>+</sup>CD8<sup>−</sup> cells) of B6 and PWD mice. (C) Frequency of CD44<sup>low</sup>CD62L<sup>high</sup> cells (R4) and CD44<sup>high</sup>CD62L<sup>high</sup> cells (R5) in CD8 T cells (CD4<sup>−</sup>CD8<sup>+</sup> cells) of B6 and PWD mice. (D) MFI of CD44 in CD4 T cells of B6 and PWD mice; an FMO control, stained with all the surface labeling antibodies except CD44, was used as the negative control. (E) MFI of CD44 in CD8 T cells of B6 and PWD mice; an FMO control, stained with all the surface labeling antibodies except CD44, was used as the negative control. (F) The gating method for immune cells in the spleen of B6 and PWD mice after anti-CD3 (3 µg/mL, coated) and anti-CD28 (1 µg/mL, soluble) stimulation. (G) Frequency of CD44<sup>high</sup>CD62L<sup>low</sup> cells in CD4 T cells (R6) and CD8 T cells (R7) of B6 and PWD mice after anti-CD3 (3 µg/mL, coated) and anti-CD28 (1 µg/mL, soluble) stimulation. (H) MFI of CD44 in CD4 T cells of B6 and PWD mice after anti-CD3 (3 µg/mL, coated) and anti-CD28 (1 µg/mL, soluble) stimulation. (I) MFI of CD44 positive cells in CD8 T cells of B6 and PWD mice after anti-CD3 (3 µg/mL, coated) and anti-CD28 (1 µg/mL, soluble) stimulation. Representative FACS data were from two independent experiments involving at least six animals for each group of mice. Data were analyzed by two-tailed Student's t-test and are presented as mean ± SEM. ∗p < 0.05, ∗∗∗p < 0.001, ∗∗∗∗p < 0.0001.

#### The Splenic CD4 T Cell Phenotype of PWD Mice Is Dominant in F1 Hybrids

Since CD44 is implicated in T cell mobilization and differentiation into immunotolerant T cells (DeGrendele et al., 1997; Wu et al., 2014), we analyzed the inheritance pattern of the T cell phenotype observed in PWD mice prior to genetic mapping of the chromosomal region containing the causative gene(s). In F1 hybrids derived from B6 and PWD crosses, the CD4 T cells had significantly higher CD44 expression in ex vivo assays, closely resembling the phenotype of parental PWD mice with respect to the R1 and R2 compartments (**Figure 2A**). Furthermore, both F1 and PWD parental mice had dramatically different distributions of naïve and central memory T cells compared with B6 mice (**Figure 2B**). CD44 expression by CD4 T cells was significantly higher in the PWD parental strain and F1 mice than in B6 mice, and this increase in CD44 was not distinguishable between male and female F1 mice (**Figure 2C**). Since the increased CD44 expression on CD4 T cells was observed in both F1 and parental PWD mice, we regard this phenotype as "dominant," even though the phenotype in CD8 T cells was less stringent (data not shown). We further analyzed CD44 expression in the thymocytes, which give rise to T cells, of F1 hybrids. CD4<sup>−</sup>CD8<sup>−</sup> double negative (DN) thymocytes, which are the progenitors of T cells, were further divided into DN1, DN2, DN3, and DN4 populations (**Figure 2D**). As shown in **Figures 2E,F**, F1 mice either displayed an intermediate phenotype between the two parental strains, B6 and PWD, or showed no significant difference from B6 mice for multiple parameters, including thymocyte number and CD44 expression. In more complete comparisons between F1 and parental mice for CD44 expression in various thymic populations, we found that the F1 phenotype was generally intermediate (**Figure 2G**). These results indicated that the dominant PWD phenotype of increased CD44 expression in CD4 T cells was restricted to T cells in the periphery. Therefore, we used the CD4 T cell phenotype to perform further genetic mapping experiments to facilitate identification of the causative gene.

#### A Single Genetic Locus on Chr2 Controls the PWD CD4 T Cell Phenotype

To map the genetic factor(s) regulating the T cell phenotype we observed in PWD mice, we constructed a backcross pedigree for linkage mapping by crossing F1 hybrids to B6 mice (Mihola et al., 2009; Su et al., 2009). In the resultant backcross N2 animals, we observed obvious phenotype segregation in a 1:1 ratio, suggesting that a single gene or closely linked genes were responsible for the phenotype. Among the 559 N2 mice, 275 had high CD44 expression and 284 had low CD44 expression, and the sex ratio was close to 1:1. Genome wide scanning with 120 microsatellite markers was performed initially with 68 N2 mice (**Figure 3A**). We found that the highest LOD score was located at D2Mit395 on Chr2 (**Figures 3B,C**). Among the chromosome substitution (CS) mouse resource, also known as consomic mice, the C57BL/6J-Chr2<sup>PWD/Ph</sup>/ForeJ (B6.PWD-Chr2, Stock No: 005995) strain carries a complete Chr2 from PWD, while the rest of the genome is of C57BL/6 origin. In these CS mice we confirmed the T cell phenotype originally found in PWD mice: CD44 expression on CD4 T cells of the B6.PWD-Chr2 CS strain was comparable to that of PWD mice and significantly higher than that of B6 mice (**Supplementary Figures S1A,B**). Therefore, we mapped the genetic locus for this phenotype to Chr2 and further validated the localization of the causative locus in CS mice. Following initial linkage mapping, fine mapping using an additional 491 N2 mice identified a 1 cM chromosomal segment between D2Mit300 and D2Mit127 as responsible for this phenotype (**Figure 3D** and **Supplementary Table S1**). Within this chromosomal segment, there were 21 protein coding genes. According to the Immunological Genome Project database (Heng et al., 2008; Benoist et al., 2012), 8 genes were expressed in T cells: Cd44, Cat, Caprin, Trim44, Ldlrad3, Traf6, Rag1, and Rag2 (**Supplementary Table S1**). We suspected Cd44 as the candidate gene, since the T cell phenotype was defined by changes in CD44 expression and, more notably, among the candidate genes only Cd44 mRNA expression was higher in PWD CD4 T cells than in the B6 controls (Immgen database, and data not shown).
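The 1:1 segregation reported for the N2 animals can be checked with a chi-square goodness-of-fit test. The sketch below is an illustration added here rather than an analysis reported by the authors; the function name is hypothetical, and the p-value uses the identity that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_1to1(n_high, n_low):
    """Goodness-of-fit test of two phenotype classes against a 1:1 ratio.

    Returns (chi2, p_value) for one degree of freedom.
    """
    expected = (n_high + n_low) / 2
    chi2 = ((n_high - expected) ** 2 + (n_low - expected) ** 2) / expected
    # Survival function of chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the text: 275 CD44-high and 284 CD44-low N2 mice.
chi2, p_value = chi_square_1to1(275, 284)
print(f"chi2 = {chi2:.3f}, p = {p_value:.2f}")
```

For the reported counts this gives a chi-square of about 0.145 and a p-value of about 0.70, so the data show no departure from the 1:1 ratio expected for a single dominant locus in a backcross.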

#### Allele Specific Knockout of PWD Cd44 in F1 Mice by CRISPR/Cas9

We suspected that Cd44 per se was responsible for the dominant T cell phenotype in PWD and F1 mice, based on fine mapping which resulted in 21 candidate genes and 2 differentially expressed genes between the parental strains (**Supplementary Table S1**). Because the phenotype was dominant, we designed an allele specific knockout (ASK) to inactivate only the PWD copy of the Cd44 gene and thereby test whether PWD Cd44 itself determines the T cell phenotype we observed in B6/PWD F1 hybrids. CRISPR/Cas9 genome editing requires a guide RNA and a PAM sequence to target specific DNA elements; in the absence of a PAM, genome editing is not detectable. The exon and intron structure of murine Cd44 and the SNPs between B6 and PWK, which is closely related to PWD, are depicted in **Figure 4A**. The whole genome sequence and SNPs for PWK, but not PWD, mice are available from the Mouse Genomes Project<sup>3</sup>. The CD44 protein coding sequences of B6 and PWK were aligned, and 8 out of 19 exons had SNPs (**Figure 4A**). Among these SNPs we selected those that formed PAM (5′-NGG-3′) sequences existing only in the PWK mice, and such SNPs were later confirmed in PWD mice by Sanger sequencing (**Figure 4B**). The cDNA of the PWD copy of Cd44 was sequenced, and we found that 2 amino acids were absent in comparison to the B6 protein sequence (**Supplementary Data Sheet S1**). The guide RNAs were prepared by IVT, and potential off-targets were analyzed using the CRISPOR software, as described previously (Luo et al., 2018). Before microinjection of the CRISPR/Cas9 machinery designed to specifically target Cd44 of PWD mice, fertilized eggs were prepared by superovulation of B6 female mice aged 4–6 weeks and fertilization with sperm cells from PWD male mice aged 10–12 weeks.
Two sets of allele specific guide RNAs and Cas9 mRNA were co-injected (**Figure 4C**). The injected eggs developed in ICR outbred foster mice, and newborns were genotyped using tail tip DNA by fluorescent PCR and capillary gel electrophoresis (**Figure 4D**). As shown in **Figure 4E**, in total 170 fertilized eggs were subjected to microinjection and 111 live eggs were transplanted, resulting in 8 mice with indels. In theory, the indels should only occur in the PWD allele of Cd44, which possesses PAM sequences that are absent in B6 mice.
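The selection of SNPs that create a PAM on only one allele can be sketched as a scan over two aligned sequences. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name is hypothetical, the inputs must be equal-length alignments that differ by SNPs only, and the toy sequences in the usage note are invented rather than taken from Cd44.

```python
def allele_specific_pams(ref, alt):
    """Find NGG PAMs that are present in `alt` but absent in `ref`.

    ref, alt: equal-length, SNP-only aligned DNA sequences (no indels).
    Returns a list of (position, strand), where position is the 0-based
    index of the first base of the 3-mer on the given sequence.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned and of equal length")
    ref, alt = ref.upper(), alt.upper()
    hits = []
    for i in range(len(alt) - 2):
        ref3, alt3 = ref[i:i + 3], alt[i:i + 3]
        # Forward strand PAM: N-G-G
        if alt3[1:] == "GG" and ref3[1:] != "GG":
            hits.append((i, "+"))
        # Reverse strand PAM: NGG reads as C-C-N on the given sequence
        if alt3[:2] == "CC" and ref3[:2] != "CC":
            hits.append((i, "-"))
    return hits
```

For example, `allele_specific_pams("ACGTAGATCA", "ACGTAGGTCA")` reports a forward-strand PAM at position 4, where the A-to-G SNP turns AGA into AGG. A guide designed against the 20 nt adjacent to such a PAM should then cut only the allele carrying it, which is the principle used here for the PWD copy.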

<sup>3</sup>www.sanger.ac.uk/sanger/Mouse\_SnpViewer

To validate the consequence of genome editing by allele specific targeting, we sequenced all the DNA samples from mice which carried indel mutations by TA cloning of the PCR products and Sanger sequencing (Luo et al., 2018). Indeed, the indel mutations were stringently restricted to the PWD allele, and all the B6 alleles tested were completely intact (**Figure 4F** and **Supplementary Table S3**). By selecting PAM sequences specific to the PWD allele, we obtained F1 hybrids in which only the B6 copy of the Cd44 gene remained intact, and these animals were used for further phenotyping.

by using capillary electrophoresis analysis (left) and Sanger sequencing (right).

#### Allele Specific Knockout of PWD Cd44 Rescues the T Cell Phenotype in Hybrid F1 Mice

The T cell phenotype we observed in PWD and F1 mice was comparable. Furthermore, the phenotype was consistent in N2 backcross mice and segregated in a Mendelian ratio. We mapped the causative PWD gene to Chr2 and further validated the phenotype using chromosome substitution mice, which have a clean genomic background identical to B6 animals, except for the donor chromosome. CRISPR/Cas9 genome editing allowed us to obtain F1 animals that were specifically deficient in the Cd44 allele of PWD origin. In such allele specific knockout (ASK) mice, we tested the T cell phenotype and found that it differed significantly from that of PWD and F1 mice and resembled the B6 phenotype. As shown in **Figures 5A–C**, comparison of the T cell phenotype between F1 mice and Cd44<sup>ASK</sup> F1 mice for CD4 and CD8 subsets revealed that inactivation of PWD CD44 resulted in significant alteration, notably an increased frequency of CD44<sup>low</sup>CD62L<sup>high</sup> naïve CD4 (R1) and CD8 T cells (R3) and, conversely, a decrease of CD44<sup>high</sup>CD62L<sup>high</sup> central memory CD4 (R2) and CD8 T cell (R4) frequencies. Cd44<sup>ASK</sup> F1 mice, which were deficient in only the PWD copy of Cd44, had significantly decreased CD44 MFI in both CD4 and CD8 T cells (**Figures 5D,E**). We performed further experiments to compare Cd44<sup>ASK</sup> F1 mice and B6 mice. Interestingly, both CD4 and CD8 T cells in allele specific knockout F1 mice had CD44 expression decreased to a level comparable to B6 T cells (**Supplementary Figures S2A–C**). Therefore, allele specific knockout of PWD CD44 in F1 mice was sufficient to restore the T cell phenotype to that of parental B6 mice. This allele specific knockout strategy demonstrated that a candidate gene from a specific parental origin can be precisely targeted in hybrid mice.

## DISCUSSION

Mouse genetics contributes enormously to the understanding of human pathophysiology (Demant, 2003; Nguyen and Xu, 2008). Functional mirroring of human immune cells in mice has led to the discovery of novel cellular and genetic components of the immune system (Liang et al., 2013; Wang et al., 2016). Gene discovery via forward genetic mapping in mice has been highly successful in the last three decades, as genome wide scanning revealed thousands of genetic loci implicated in various diseases and immunological traits (Peters et al., 2007; Siggs, 2014). However, identification of the causative gene in the mapped loci remains extremely difficult, due to extensive linkage disequilibrium and low efficiency in validation of candidate genes (Peters et al., 2007). The development of chromosome substitution mice provides a straightforward and rapid method for validation of mapped loci while maintaining a homogeneous genomic background identical to the reference C57BL/6 mice; however, cloning the causative gene is still challenging. In this study, we first discovered a T cell phenotype which was dramatically different between B6 and PWD mice, and further mapped a locus on Chr2 responsible for the phenotypic difference. To validate the causative role of the mapped locus, we analyzed the chromosome substitution line B6.PWD-Chr2 and found the same phenotype as observed in PWD parental mice. Such results excluded genetic factors outside Chr2 that could co-contribute to the T cell phenotype we observed in PWD mice.

Fine mapping can be achieved with the N2 progeny by increasing the number of samples to analyze more recombinants, or by increasing the generations of backcrossing to generate more recombination flanking the causative gene(s) (Jeffs et al., 2000). In the initial mapping of the dominant locus, we used 68 mice, and for fine mapping we analyzed 491 animals, which confined the causative gene to a 1 cM segment on Chr2 of the PWD strain. Among the candidate genes, we highly suspected Cd44 itself as the causative gene, as the CD44 protein was more abundantly expressed in CD4 T cells of PWD mice (Heng et al., 2008). Our experiments reflect numerous other studies using forward genetics in mice to identify genetic variation in inbred lines based on phenotyping and to test for candidate genes via linkage mapping. However, the challenge in confirming the role of the PWD allele in regulating the T cell phenotype lay in obtaining a selective knockout of this allele and testing its consequence on a background that would otherwise maintain the phenotype. Knockout of the gene on a B6 background does not provide direct evidence of the functional role, since B6 mice themselves do not have the PWD phenotype. We found that F1 hybrids keep the PWD phenotype; therefore, we could test the candidate gene Cd44 by CRISPR/Cas9 mediated targeting specific to its PWD allele.

We have established a procedure for producing allele specific knockout of a candidate gene by designing sgRNAs with PAM sequences that only exist on the PWD background. Even though both copies of the gene from B6 and PWD parents were exposed to the CRISPR/Cas9 machinery, precise and allele specific targeting of only the PWD allele of Cd44 was achieved in our study. It is interesting to note that such experiments could be further improved by applying variants of Cas9 nuclease to obtain higher fidelity in genome editing, and non-canonical PAM sequences could be applied to fit a broader range of genomic contexts (Zhang et al., 2014; Kleinstiver et al., 2016). The F1 animals that were mutated only in the PWD copy of Cd44 had an altered phenotype compared to unedited F1 hybrids, resembling B6 parental mice. Therefore, by allele specific targeting of PWD CD44, we confirmed the functional role of CD44 itself in maintaining the CD44<sup>high</sup>CD62L<sup>high</sup> population only found in PWD mice. The molecular mechanisms by which PWD CD44 contributes to this T cell phenotype remain to be elucidated and could provide further hints to uncover new roles for CD44, since this molecule has already been found to be crucial for regulatory T cell development (Wu et al., 2014). More importantly, our approach of allele specific knockout could provide a new strategy to inactivate disease alleles while keeping the normal allele of the gene intact.



#### DATA AVAILABILITY STATEMENT

All datasets for this study are included in the manuscript and the **Supplementary Files**.

#### AUTHOR CONTRIBUTIONS

YL designed the project and wrote the manuscript. LL supervised the project, analyzed the data, and prepared the figures. TC and ZL performed the experiments and analyzed the data. YZ established the N2 population and performed the initial QTL mapping. ZL, RH, LH, YG, ZC, QZ, LS, WZ, and XQ were involved in preparation of mRNA, mouse embryos and genotyping. EK and ZZ assisted in supervising the project. TL contributed to writing and revising the manuscript. All authors read and approved the final manuscript.

#### FUNDING

The work was supported by NSFC No. 31400759 and No. 81471595 to YL, NSFC No. 81501342 to LL, and by the Foundation of Henan Educational Committee No. 16HASTIT030 to YL.

#### ACKNOWLEDGMENTS

We would like to thank Assegai Medical Laboratory Xinxiang for assistance with the capillary array electrophoresis-based genotyping, and we would also like to thank the core facility of flow cytometry in Xinxiang Medical University for technical support.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00124/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Chao, Liu, Zhang, Zhang, Huang, He, Gu, Chen, Zheng, Shi, Zheng, Qi, Kong, Zhang, Lawrence, Liang and Lu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Guidelines for Fluorescent Guided Biallelic HDR Targeting Selection With PiggyBac System Removal for Gene Editing

Javier Jarazo† , Xiaobing Qing† and Jens C. Schwamborn\*

Developmental and Cellular Biology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg

#### Edited by:

David Jay Segal, University of California, Davis, United States

#### Reviewed by:

Amar M. Singh, University of Georgia, United States

Michael Tsang, University of Pittsburgh, United States

\*Correspondence:

Jens C. Schwamborn jens.schwamborn@uni.lu

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics

Received: 30 November 2018 Accepted: 22 February 2019 Published: 13 March 2019

#### Citation:

Jarazo J, Qing X and Schwamborn JC (2019) Guidelines for Fluorescent Guided Biallelic HDR Targeting Selection With PiggyBac System Removal for Gene Editing. Front. Genet. 10:190. doi: 10.3389/fgene.2019.00190

The development of new and easy-to-use nucleases, such as CRISPR/Cas9, made tools for gene editing widely accessible to the scientific community. Cas9-based gene editing protocols are robust for creating knock-out models, but the generation of single nucleotide transitions or transversions remains challenging. This is mainly due to the low frequency of homology directed repair, which makes it necessary to screen a high number of clones to identify positive events. Moreover, the lack of simultaneous biallelic modification frequently results in indels in the second allele. For example, while one allele might undergo homology directed repair, the second can undergo non-homologous end joining repair. Here we present a step-wise protocol for biallelic gene editing. It uses two donors carrying a combination of fluorescent reporters alongside homology arms directed to the same genomic region for biallelic targeting. These homology arms carry the desired combination of modifications to be introduced (homozygous or heterozygous changes). In addition, the backbone of the plasmid carries a third fluorescent reporter for negative selection (to discard random integration events). Fluorescent selection of non-random biallelic targeted clones can be performed by microscopy guided picking or cell sorting (FACS). The positive selection module (PSM), carrying the fluorescence reporter and an antibiotic resistance, is flanked by inverted terminal repeats (ITR) that are recognized by the transposase. Once the correctly modified clones have been purified, transfection of the excision-only transposase removes the PSM, leaving only the desired modifications integrated.

Keywords: CRISPR, biallelic, HDR, genome editing, IPSC

## INTRODUCTION

Disease modeling in vitro took a technological leap with the advent of biotechnology tools such as the induction of pluripotent states and targeted nucleotide modification by gene editing techniques (Hockemeyer and Jaenisch, 2016). Combining both techniques allows us to validate the effect of disease-causing point mutations, as well as the influence of risk variants, in the context of a human cell model (Jehuda et al., 2018). Moreover, it can help in the assessment of disease modifiers by introducing mutations that lead to phospho-mimetic or phospho-null protein variants, identifying them as novel targets for drug development without the influence of exogenous or overexpressed sequences (Chen and Cole, 2015).

Even though CRISPR/Cas9 represents the democratization of gene editing tools for most research labs (Jasin and Haber, 2016), certain aspects of the process have proven cumbersome in practice, such as the number of clones to be screened, reduced biallelic targeting, and frequent on-target non-homologous end joining (NHEJ). We previously reported the concept of circumventing these issues by using two constructs that target the same genomic region but carry different positive selection modules (PSM) (Arias-Fuenzalida et al., 2017). These PSMs carry different fluorescent proteins (namely EGFP and dTomato), allowing the identification of a correct knock-in in both alleles simultaneously. Compared to other systems using only an antibiotic resistance (e.g., puromycin) in the PSM, the use of fluorescent proteins makes it possible to exclude clones that underwent NHEJ repair in the second allele. The PSM is flanked by the inverted terminal repeats (ITRs) of the piggyBac transposon system so that it can be removed after selection. The transposase enzyme recognizes these ITRs and excises the sequence flanked by them, reconstituting a TTAA motif in the host genome (Yusa et al., 2011). The use of excision-only variants prevents reintegration of the transposon into the genome (Li et al., 2013).

The previously reported workflow faces challenges when editing genomic regions with a high density of repetitive elements, since these increase the chances of homologous recombination in other genomic regions (Saito and Adachi, 2016). In our previous work we modeled the influence of the different types of repetitive elements and showed that homology arms containing repetitive elements of the Short Interspersed Nuclear Element (SINE) family show a higher frequency of random integration (Arias-Fuenzalida et al., 2017). Our model matched previously reported observations (Ishii et al., 2014). Because mammalian genomes have a high content of repetitive elements, mainly derived from transposable elements integrated during evolution (de Koning et al., 2011), it is in some cases difficult to design homology arms of an appropriate size that are free of repetitive elements. The donor plasmids in the presented design carry negative selection modules for the identification and exclusion of random integration events. As random integration events could occur without the negative selection module, we explore adapting the genome engineering pipeline to perform fluorescence microscopy guided colony picking. The clones selected and picked carry the EGFP+ dTomato+ BFP- fluorescent phenotype. Colonies must then be screened by PCR for the presence of the donor construct backbone before continuing with the rest of the workflow. Here we present a detailed protocol for this process.

#### STEPWISE PROTOCOL

Please read the entire process before starting since elements listed in the reagents table (**Supplementary Table 1**) are only those specific for this pipeline (summarized in **Figure 1**), and common cell culture reagents are not described in detail. Please also use as a reference the **Supplementary Table 2** containing the list of primers used in the protocol.

### CONCRETE EXAMPLE

To make the pipeline presented here easier to follow, we provide an example of a particular Single Nucleotide Polymorphism (SNP) we have edited. The SNP rs45539432<sup>1</sup> is a transition (c.1366C > T, NM\_032409.2) in the PINK1 gene that generates a premature stop codon (p.Gln456Ter, NP\_115785.1). This SNP has been linked to early-onset Parkinson's disease (Hedrich et al., 2006). In this case, the mutations to be corrected are homozygous, hence the design of the homology arms for both donors (carrying EGFP or dTomato in the PSM) is identical. For biallelic targeting of heterozygous modifications, one of the donors (either the EGFP or the dTomato one) should not carry the SNP, hence a different homology arm would have to be generated.

#### IN SILICO WORK

One of the first steps in designing your plasmids for gene editing is the identification of the region of interest to be edited (**Figure 1A**). It is important to assess if the gene to be modified presents splicing variants that might show unexpected effects of the modification when performing downstream assays for phenotyping. The in silico work is required for designing the donors, the sgRNA and the oligonucleotides used to generate the constructs or to screen the editing process (**Figure 1A**).

#### Designing of the Donors

For designing the donors, the identification of the Base to Edit (BTE) and the context of the genomic region allows the user to screen for the presence of repetitive elements that could define the boundaries of the homology arms (**Figure 2A**). Considering a broader genomic region around the desired site for introducing the mutation helps the user to create the entire pipeline for screening the editing process. We recommend the usage of a sequence editor software (SES) such as ApE<sup>2</sup> or SnapGene<sup>3</sup> for working with the sequences over all the steps of the design. The steps required for designing the donors can be summarized in: identification of the region to be edited, evaluation of the presence of repetitive elements, identification of a TTAA site and design of primers for generating the arms.

#### Identification of the Region to Be Edited


<sup>1</sup>https://www.ncbi.nlm.nih.gov/projects/SNP/snp\_ref.cgi?rs=45539432

<sup>2</sup>http://jorgensen.biology.utah.edu/wayned/ape/

<sup>3</sup>http://www.snapgene.com/

FIGURE 2 | Representation of the genomic region to correct the transition (c.1366T > C, NM\_032409.2) in the PINK1 gene. (A) Genomic region around the PINK1 Q456X mutation, identifying the position of the base to edit with respect to the repetitive elements in the area. (B) Close-up of the genomic sequence centered on the Base to Edit (BTE), with the design of the primers Right Homology Arm Forward (RHAF) and Left Homology Arm Reverse (LHAR). Note that the RHAF primer carries the correction of the BTE and a silent mutation to avoid PAM recognition, and that the LHAR primer carries the silent mutation that generates a TTAA site close to the BTE. (C) Close-up of the genomic region that forms the boundary of the left homology arm, with the design of the Left Homology Arm Forward (LHAF) primer. (D) Close-up of the genomic region that forms the boundary of the right homology arm, with the design of the Right Homology Arm Reverse (RHAR) primer.

protein sequence is known, we recommend following the steps mentioned on **Box 1**.

3. Centered on the BTE, select a genomic region that extends 3 kbp upstream and downstream, and transfer this sequence into a SES.

#### Evaluation of the Presence of Repetitive Elements


#### Identification of a TTAA Region in the Vicinity of the BTE


#### Designing of the sgRNAs

#### Selection of the Guides


BOX 1 | Guidelines for identifying the base to edit (BTE) in the genomic sequence.


BOX 2 | Rough guidelines for reducing the size of the homology arms to avoid the repetitive elements (RepEl).


Moreover, the distance between the double strand break (DSB) and the BTE should not be more than 25 bp. If microhomology-mediated end-joining (MMEJ) repair occurs, the selection cassette can be integrated without the SNP when the SNP lies outside this limit (Nakade et al., 2014).

3. As previously reported, it is recommended to pick the guides that hit the reading DNA strand of the gene. It is reported to increase efficiency since the RNA polymerase dislodges the bound Cas9 allowing the access to the cell's repair mechanism (Clarke et al., 2018).
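The 25 bp distance criterion lends itself to a quick in silico pre-filter. The following is a minimal sketch, assuming SpCas9 with an NGG PAM and a break 3 bp upstream of the PAM. The function name `find_guides`, the 0-based `bte_index`, and the way the distance is measured (from the break boundary to the BTE index) are illustrative choices and not part of the published pipeline. It does not score on-target or off-target activity, which dedicated design tools handle.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_guides(seq, bte_index, max_dist=25):
    """List (strand, protospacer, cut, distance) for candidate guides.

    `cut` is the plus-strand index of the first base to the right of the
    predicted double strand break; `distance` is |cut - bte_index|.
    """
    n = len(seq)
    guides = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # The lookahead keeps overlapping candidates: 20 nt protospacer + NGG.
        for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", s):
            boundary = m.start() + 17  # assumed break: 3 bp 5' of the PAM
            cut = boundary if strand == "+" else n - boundary
            dist = abs(cut - bte_index)
            if dist <= max_dist:
                guides.append((strand, m.group(1), cut, dist))
    # Closest break to the BTE first.
    return sorted(guides, key=lambda g: g[3])


# Hypothetical toy sequence with a single NGG PAM on the plus strand.
toy = "AT" * 15 + "TGG" + "AT" * 5
for guide in find_guides(toy, bte_index=20):
    print(guide)
```

Candidates that pass this filter would still need to be checked for predicted efficiency and off-targets before ordering oligonucleotides.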

#### Oligonucleotides Design

#### Primer Design to Obtain the Arms


<sup>4</sup>http://repeatmasker.org/

<sup>5</sup>http://resitefinder.appspot.com/

<sup>6</sup>https://www.genscript.com/tools/codon-frequency-table

<sup>7</sup>https://portals.broadinstitute.org/gpp/public/analysis-tools/ssgRNA-design

<sup>8</sup>https://www.ncbi.nlm.nih.gov/tools/primer-blast/

Homology Arm Reverse (LHAR), Right Homology Arm Forward (RHAF), and Right Homology Arm Reverse (RHAR) (**Figures 2B–D**). Each primer possesses a region homologous to the genomic DNA and a region homologous to the donor plasmid for assembly. For the region homologous to the genomic DNA, take enough bases at the border of your homology arm to generate an oligo with a Tm of 60 °C (this can be assessed in the SES) (**Figures 2B–D**). For the region homologous to the donor, overhangs (20 bp in length) need to be added to the designed oligos, matching the split scaffold after digestion with the restriction enzyme HpaI (**Figures 3A–C**). These primers will be used to perform Gibson assembly (Gibson, 2011) of the homology arms into the donor scaffold (**Figure 3D**). The homology arms are assembled at the TTAA splitting point of the ITR of the donor (**Figure 3B**).


Left Homology Arm Forward (LHAF)

AAGCTTGGATCCCCTAGGTT (+ sequence into your left homology arm) 3′ end

Left Homology Arm Reverse (LHAR) at the splitting point of TTAA

CAGACTATCTTTCTAGGGTT (+ sequence into your left homology arm TTAA site) 3′ end

Right Homology Arm Forward (RHAF) at the splitting point of TTAA

ATGATTATCTTTCTAGGGTT (+ sequence into your right homology arm TTAA site) 3′ end

Right Homology Arm Reverse (RHAR)

GCATACGCGTATACTAGGTT (+ sequence into your right homology arm) 3′ end
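The construction of these four primers can be sketched in a few lines. This is a minimal illustration under stated assumptions: the 20 bp overhangs are copied from the list above, the Tm is estimated with the simple Wallace rule (2 °C per A/T, 4 °C per G/C) instead of the calculation a sequence editor would perform, and the helper names (`wallace_tm`, `annealing_region`, `assembly_primers`) are hypothetical. How the TTAA bases are shared between the overhang and the arm follows **Figure 3** and is not modeled here, so the arm sequences passed in must already be trimmed accordingly.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Donor-matching 5' overhangs, copied from the primer list above.
OVERHANGS = {
    "LHAF": "AAGCTTGGATCCCCTAGGTT",
    "LHAR": "CAGACTATCTTTCTAGGGTT",
    "RHAF": "ATGATTATCTTTCTAGGGTT",
    "RHAR": "GCATACGCGTATACTAGGTT",
}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def wallace_tm(seq):
    """Rough Tm estimate in degrees C (assumption: Wallace rule)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def annealing_region(template, target_tm=60, min_len=18):
    """Shortest 5' prefix of `template` whose estimated Tm reaches target_tm."""
    for length in range(min_len, len(template) + 1):
        if wallace_tm(template[:length]) >= target_tm:
            return template[:length]
    raise ValueError("template too short to reach the target Tm")


def assembly_primers(left_arm, right_arm):
    """Return the four primers (5'->3') for plus-strand arm sequences."""
    return {
        "LHAF": OVERHANGS["LHAF"] + annealing_region(left_arm),
        "LHAR": OVERHANGS["LHAR"] + annealing_region(revcomp(left_arm)),
        "RHAF": OVERHANGS["RHAF"] + annealing_region(right_arm),
        "RHAR": OVERHANGS["RHAR"] + annealing_region(revcomp(right_arm)),
    }


# Hypothetical arm sequences, far shorter than real homology arms.
for name, primer in assembly_primers("ACGT" * 10, "GATTACAGGC" * 4).items():
    print(name, primer)
```

The reverse primers are built from the reverse complement of each arm, so every returned string can be ordered as written; the Tm values should be rechecked in the SES before synthesis.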

#### Primer Design to Introduce the SNPs

The donor needs to carry not only the modification of the BTE to be introduced but also the modification of the PAM and, if needed, a silent mutation that generates a TTAA site. We recommend introducing a silent mutation in the PAM of at least two different sgRNAs. We recommend cloning the extracted amplicon containing the genomic region surrounding the BTE into a TOPO vector (Zero Blunt™ TOPO™ PCR Cloning Kit, Thermo Fisher Scientific) before inserting the SNPs (see section "Preparation of the homology arms template"). If the SNPs are close to the ends of the arms, they can be introduced with the primers used to generate the homology arms (see the previous section). If not, the SNPs can be inserted by Site Directed Mutagenesis (SDM, e.g., Q5® Site-Directed Mutagenesis Kit, NEB), for which primers carrying these mutations have to be designed. Alternatively, this process can be outsourced to a de novo DNA sequence synthesis company (e.g., GeneArt® Gene Synthesis, Thermo Fisher Scientific). In that case, the best option could be to synthesize the left and right homology arms independently and then ligate them with the donor scaffold.
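Candidate silent mutations that create a TTAA site can be enumerated systematically. The sketch below is an illustration under assumptions: it takes an in-frame, uppercase coding sequence, uses the standard genetic code, and reports every single-base substitution that leaves the encoded amino acid unchanged while creating a TTAA motif overlapping the changed base. The function name `silent_ttaa_sites` and the toy sequence are hypothetical, and codon usage, splice sites and distance to the BTE are not considered.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def silent_ttaa_sites(cds):
    """Single-base changes that keep the protein but create a TTAA site.

    `cds` must be an uppercase coding sequence starting in frame, with a
    length that is a multiple of three. Returns (index, ref, alt) tuples.
    """
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    hits = []
    for pos, ref in enumerate(cds):
        frame = pos % 3
        codon = cds[pos - frame:pos - frame + 3]
        for alt in "ACGT":
            if alt == ref:
                continue
            new_codon = codon[:frame] + alt + codon[frame + 1:]
            if CODON_TABLE[new_codon] != CODON_TABLE[codon]:
                continue  # not a silent change
            mutated = cds[:pos] + alt + cds[pos + 1:]
            # Every 4-mer in this window overlaps the changed base.
            if "TTAA" in mutated[max(0, pos - 3):pos + 4]:
                hits.append((pos, ref, alt))
    return hits


# Toy example: CTC (Leu) -> CTT (Leu) turns CTCAAA into CTTAAA.
print(silent_ttaa_sites("CTCAAA"))  # [(2, 'C', 'T')]
```

Any hit would still need to be checked against the codon frequency of the target organism before it is built into a primer.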

#### Oligonucleotide Design for sgRNA

	- 5′-NNNNNNNNNNNNNNNNNNNNNGG-3′
	- 3′-NNNNNNNNNNNNNNNNNNNNNCC-5′
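Once a target of this form has been chosen, the two oligonucleotides to be annealed can be derived from the 20 nt protospacer that precedes the NGG PAM. The sketch below assumes the widely used BbsI/BpiI cloning convention for px330, with CACC and AAAC 5′ overhangs and an extra G added when the guide does not start with G. These details are not stated in the text here and should be verified against Ran et al. (2013) before ordering. The function name `sgrna_oligos` is hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sgrna_oligos(protospacer):
    """Return (top, bottom) oligos, both written 5'->3'.

    `protospacer` is the 20 nt target without the PAM. The overhangs
    (CACC / AAAC) are an assumption based on the common BbsI convention.
    """
    if len(protospacer) != 20 or set(protospacer) - set("ACGT"):
        raise ValueError("expected 20 nt of A/C/G/T without the PAM")
    # Assumed: a 5' G is added for U6 transcription if not already present.
    insert = protospacer if protospacer.startswith("G") else "G" + protospacer
    top = "CACC" + insert
    bottom = "AAAC" + revcomp(insert)
    return top, bottom


# Hypothetical protospacer, for illustration only.
top, bottom = sgrna_oligos("ATATATATATATATATATAT")
print(top)
print(bottom)
```

After annealing, the two oligos form a duplex whose single-stranded ends are intended to match the overhangs left in the digested vector.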

#### Oligonucleotide Design for Validating the Knock-In (VKI)

In order to validate the knock-in, a set of primers has to be designed to verify the right and left junctions between the PSM and the genomic DNA. The left junction forward primer (VKI Primer 1) and the right junction reverse primer (VKI Primer 4) depend on the genomic region of interest (**Figure 1A**). We recommend designing oligonucleotides in the genomic region at a distance of 500 bp from the junction between the homology arms and genomic DNA for VKI Primer 1 and 4. Alternatively, the same SEQPRA designed in section "Primer design to obtain arms" could be used. For the left junction reverse primer (VKI Primer 2) and the right junction forward primer (VKI Primer 3) we recommend using:

VKI Primer 2 5′-AGATGTCCTAAATGCACAGCG-3′

VKI Primer 3 5′-CGTCAATTTTACGCATGATTATCTTTAAC-3′

In addition, a set of primers is needed to obtain an amplicon extending from the PSM to the backbone of the plasmid, in order to identify random integration events that might have left out the BFP during the integration process. For detecting the presence of the backbone of the plasmid, we recommend using the left backbone forward primer (VKI Primer 5) and the right backbone reverse primer (VKI Primer 6) (**Figure 1A**):

VKI Primer 5 5′-GCTGCCTATCAGAAGGTGGTG-3′

VKI Primer 6 5′-GCAGCCACTGGTAACAGGAT-3′

FIGURE 3 | Schematics of the EGFP donor plasmid and the homology in the primers for performing the assembly. (A) Representation of the pDONOR-tagBFP-PSM-EGFP with the restriction sites of HpaI. (B) Close up of the HpaI region where the left homology arm will be assembled. (C) Close up of the HpaI region where the right homology arm will be assembled. (D) Representation of the donor after assembly. On each primer from (E–H) it is represented in light blue the homology to the donor and in orange the homology to the genomic region of the example. Every double stranded DNA section in E–H represents the sequence of the homology arms to be assembled, showing in green (E,F) the genomic sequence incorporated in the left arm, and in dark blue (G,H) the genomic sequence incorporated in the right arm. In light orange (F,G) the homology in the arms to the ITR sequence of the donor, and unlabeled (E,H) the backbone of the donor plasmid.

#### Oligonucleotide Design for Final Sequencing

An oligonucleotide (SEQPR) at a distance of around 100 bp from the BTE has to be designed for doing the final sequencing of the edited clone.

### BENCH WORK

The steps performed in this section are summarized in **Figure 1B**.

#### Generation of the Guides

This protocol established by the Zhang lab has been explained in detail previously (Ran et al., 2013). Here we summarize the steps needed for the generation of the sgRNAs.

#### Preparation of px330 Scaffold

1. Digest the vector px330 with BpiI (FastDigest, Thermo Fisher Scientific) for 3 h at 37 °C for complete digestion. Prepare a maximum of 1 µg of DNA per single reaction:


2. Column purify the digestion product (e.g., with QIAquick PCR Purification Kit, Qiagen). Assume a 50% loss during column purification and elute in a volume of nuclease-free water.

3. Determine purified plasmid concentration (e.g., by NanoDrop Spectrophotometer).

#### Annealing of sgRNA Oligonucleotides




4. Annealing. Use a ramp protocol for annealing in a thermocycler.


#### Ligation of Annealed Oligonucleotides and px330

1. Set up the ligation reaction as below.



#### Picking of Colonies and Sequencing


5′-GAGGGCCTATTTCCCATGATTCC-3′

#### Generation of the Donor

#### Preparation of the Donor Scaffold DNA



4. Incubate for 2 h at 37 °C in an incubator.

5. Purify the digestion product using a column purification kit (e.g., with QIAquick PCR Purification Kit, Qiagen).

#### Preparation of the Homology Arms Template

If the homology arm generation was outsourced to a de novo DNA sequence synthesis company skip this section.





#### Introduction of Mutations

As explained previously in the section "Primer design to introduce the SNPs," SNPs to be introduced in the region of interest should be performed on the TOPO vector generated in the previous section using SDM or introduced with the primers designed to obtain the arms (see step 4 of "Primer design to obtain the arms").

#### Preparation of the Homology Arms for Assembly

The assembly of the donor DNA is performed using Gibson assembly (Gibson, 2011). Homology between the homology arms and the scaffold is required. Use the primers designed in step 5 of section "Primer design to obtain the arms" to incorporate the arms in the HpaI splitting sites of the scaffold donor.


#### Preparation of Transposase mRNA

Removal of the selection cassette can then be performed by transfection of transposase mRNA. This mRNA can either be generated in vitro in the lab, using the template described in Yusa et al. (2011) and Yusa (2013) (construct pCMV-HAhyPBase) or in Li et al. (2013), or be acquired commercially<sup>11</sup>.


#### In vitro Testing of sgRNA Efficiency

The efficiency predicted in silico for the guides can be further tested with in vitro assays for EGFP reconstitution, as described in Mashiko et al. (2013, 2014).

### CELL CULTURE WORK

Editing of cells under this protocol is normally performed in hiPSCs cultured in Matrigel (Corning) coated plates with daily changes of Essential 8 media (Thermo Fisher Scientific) supplemented with 1% Penicillin/Streptomycin. Cells are normally passaged and handled as single cells (**Figure 1C**). This is done using Accutase (Thermo Fisher Scientific) and supplementing the Essential 8 media with ROCK inhibitor (Y-27632, Merck Millipore) for 24 h after passaging to prevent apoptosis.

### Nucleofection of Parental Cells and Selection

Nucleofection is performed using a 4D-Nucleofector™ X Unit (Lonza) and the P3 Primary Cell 4D-Nucleofector™ X Kit L (Lonza). Expansion of the cells before nucleofection can be performed in flasks or 10 cm dishes. After nucleofection, the cells should be seeded in a Matrigel pre-coated one-well plate (Nunc™ OmniTray™, Thermo Fisher Scientific) to allow for fluorescence guided picking. Ideally, perform two batches of electroporation, each with a different sgRNA-Cas9 plasmid, and make five electroporations per plasmid.

<sup>9</sup>http://nebiocalculator.neb.com/#!/ligation

<sup>10</sup>http://nebuilder.neb.com/

<sup>11</sup>https://www.transposagenbio.com


#### FACS or Fluorescence Guided Picking

As explained in the introduction, the presence of repetitive elements in the homology arms increases the chances of random integration events. Even though the BFP in the backbone of the plasmid is used to detect these events, there is still a chance of integration at an unspecific site that does not include the BFP. For this reason, if repetitive elements in the homology arm cannot be avoided during the design phase, we recommend performing fluorescence guided picking of EGFP+/dTomato+/BFP- colonies rather than generating a panclone through FACS (**Figure 1C**). It is important to note that some cells not harboring the PSM (and hence not resistant to puromycin) can still survive the puromycin treatment when they are part of a colony containing resistant cells. These WT cells could easily be carried over, reducing the specificity of drug-based approaches (**Figure 4A**). One advantage of using fluorescent reporters driven by strong promoters, as in these donors, is that such cells can be recognized by their lack of fluorescence.

#### Fluorescence Guided Picking

1. Colonies of around 0.25 mm in size are suitable for fluorescence guided picking. The time needed to reach this size will depend on the division rate of the iPSCs you are working with.


#### FACS

The starting point for cell sorting can be either the puromycin-selected culture one to two weeks after nucleofection or the isolated clone obtained from fluorescence guided picking. In the former case, around 2% of cells are expected to present an EGFP+/dTomato+/BFP- pattern (Arias-Fuenzalida et al., 2017). In the latter case, positive cells can range between 50 and 100% depending on the number of WT cells that could be forming the colony (**Figures 4C,D**). If using a BD FACSAria II for cell sorting, we recommend using an 85 µm nozzle, the 2.0 neutral density filter and the refrigeration system set at 4 °C.

FIGURE 4 | Cell culture work of the gene correction example. (A) Representative images of hiPSCs colonies expressing the different possible outcomes of gene modification. Scale bar = 500 µm. (B) A one-well plate screened for detecting correct biallelic targeting (dTomato+/EGFP+/BFP-). Bounding box shows selected isolated region. Scale bar = 2 cm. (C) Single cell isolation gating strategy performed, plus exclusion of BFP+ cells from an expanded culture derived from the selected region in (B). (D) First (upper panel) and second (lower panel) purity sort for purifying dTomato+/EGFP+/BFP- cells. (E) Removal of the positive selection module after treatment with transposase. (F) First sort after transposase induction (top left panel). Last purity sort before sequencing (n = 3) (top right panel and bottom panels). (G) Sequencing results of the gene correction of the patient line.


### Removal of Positive Selection Module With Transposase


### FACS of Cells That Underwent PSM Removal

Follow the instructions in section "FACS" to obtain a single cell suspension ready for sorting. We recommend performing a first round of sorting using a "yield" mask to maximize the recovery of EGFP-/dTomato-/BFP- cells. PSM removal should be expected in 5 to 15% of the cells (**Figure 4F**). In successive runs, change to a "4-way purity" mask. Final sequencing should be performed only after the culture presents 100% EGFP-/dTomato-/BFP- cells (**Figure 4F**).

#### Confirming by DNA Sequencing


#### CONCLUSION

Gene editing technologies, specifically CRISPR/Cas9 gene editing, are starting to revolutionize the biological sciences to a similar extent as the invention of PCR or hiPSCs did (Ledford, 2015). The real and potential applications of gene editing range from increasing crop and livestock yields to disease diagnostics and gene drives (Doudna and Barrangou, 2016). In the context of medicine, one of the applications of this technique is disease modeling by targeted modification of the genome. Here we covered a procedure for the generation of isogenic lines for disease modeling, which makes it possible to evaluate the influence of a specific point mutation, or the effect of the genetic background of the patient, on the onset and progression of a disease (Bolognin et al., 2018). Because of the high number of clones that need to be screened to obtain a positive one (Paquet et al., 2016), the research community needed alternative techniques.

We provided a detailed guideline for our previous work (Arias-Fuenzalida et al., 2017), adding the alternative path of doing fluorescence guided picking when the design of the homology arms cannot avoid the inclusion of repetitive elements. We consider that the strength of our protocol resides in the certainty obtained after reaching the different milestones of the pipeline, and the simultaneous biallelic targeting to generate isogenic lines for disease modeling.

#### AUTHOR CONTRIBUTIONS

JJ, XQ, and JS contributed conception and design of the protocol. JJ wrote the first draft of the manuscript. All authors contributed to manuscript revision, and read and approved the submitted version.

#### FUNDING

This project was supported by the LCSB Pluripotent Stem Cell Core Facility. JJ and XQ were supported by fellowships from the FNR (AFR, Aides à la Formation-Recherche). JJ was supported by a Pelican award from the Fondation du Pelican de Mie et Pierre Hippert-Faber. This is an EU Joint Programme-Neurodegenerative Disease Research (JPND) project (INTER/JPND/14/02; INTER/JPND/15/11092422). Acquisition of flow cytometry data was supported by the flow cytometry core of the LCSB bio-imaging platform. Further support comes from the SysMedPD project, which has received funding from the European Union's Horizon 2020 Research and Innovation Program under grant agreement no. 668738.

#### ACKNOWLEDGMENTS

We would like to thank Prof. F. Zhang from the McGovern Institute for Brain Research for providing the Cas9 vector.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00190/full#supplementary-material


**Conflict of Interest Statement:** The authors are inventors in patent PCT/EP2017/051889.

Copyright © 2019 Jarazo, Qing and Schwamborn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Programmable Base Editing of the Sheep Genome Revealed No Genome-Wide Off-Target Mutations

Shiwei Zhou<sup>1</sup>† , Bei Cai<sup>1</sup>† , Chong He<sup>2</sup> , Ying Wang<sup>1</sup> , Qiang Ding<sup>1</sup> , Jiao Liu<sup>1</sup> , Yao Liu<sup>1</sup> , Yige Ding<sup>1</sup> , Xiaoe Zhao<sup>3</sup> , Guanwei Li<sup>1</sup> , Chao Li<sup>1</sup> , Honghao Yu<sup>4</sup> , Qifang Kou<sup>5</sup> , Wenzhi Niu<sup>5</sup> , Bjoern Petersen<sup>6</sup> , Tad Sonstegard<sup>7</sup> , Baohua Ma<sup>3</sup> \*, Yulin Chen<sup>1</sup> \* and Xiaolong Wang<sup>1</sup> \*

<sup>1</sup> College of Animal Science and Technology, Northwest A&F University, Yangling, China, <sup>2</sup> College of Information Engineering, Northwest A&F University, Yangling, China, <sup>3</sup> College of Veterinary Medicine, Northwest A&F University, Yangling, China, <sup>4</sup> Guilin Medical University, Guilin, China, <sup>5</sup> Ningxia Tianyuan Tan Sheep Farm, Hongsibu, China, <sup>6</sup> Institute of Farm Animal Genetics, Friedrich-Loeffler-Institut, Neustadt, Germany, <sup>7</sup> Recombinetics, Saint Paul, MN, United States

#### Edited by:

David Jay Segal, University of California, Davis, United States

#### Reviewed by:

Pavel Georgiev, Institute of Gene Biology (RAS), Russia Nathan Ellis, The University of Arizona, United States

#### \*Correspondence:

Baohua Ma mabh@nwafu.edu.cn

Yulin Chen chenyulin@nwafu.edu.cn

Xiaolong Wang xiaolongwang@nwafu.edu.cn

†Co-first authors

#### Specialty section:

This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics

Received: 16 November 2018 Accepted: 27 February 2019 Published: 15 March 2019

#### Citation:

Zhou S, Cai B, He C, Wang Y, Ding Q, Liu J, Liu Y, Ding Y, Zhao X, Li G, Li C, Yu H, Kou Q, Niu W, Petersen B, Sonstegard T, Ma B, Chen Y and Wang X (2019) Programmable Base Editing of the Sheep Genome Revealed No Genome-Wide Off-Target Mutations. Front. Genet. 10:215. doi: 10.3389/fgene.2019.00215

Since their emergence, CRISPR/Cas9-mediated base editors (BEs) with cytosine deaminase activity have been used to precisely and efficiently introduce single-base mutations in genomes, including those of human cells, mice, and crop species. Most production traits in livestock are induced by point mutations, and genome editing using BEs can directly alter single nucleotides without homology-directed repair of double-strand breaks. The p.96R > C variant of Suppressor of cytokine signaling 2 (SOCS2) has profound effects on body weight, body size, and milk production in sheep. In the present study, we successfully obtained lambs with defined point mutations resulting in a p.96R > C substitution in SOCS2 by the co-injection of BE3 mRNA and a single guide RNA (sgRNA) into sheep zygotes. The observed efficiency of the single nucleotide exchange in newborn animals was as high as 25%. Observations of body size and body weight in the edited group showed that gene modification contributes to enhanced growth traits in sheep. Moreover, targeted deep sequencing and unbiased family trio-based whole genome sequencing revealed no detectable off-target mutations in the edited animals. This study demonstrates the potential for the application of BE-mediated point mutations in large animals for the improvement of production traits in livestock species.

Keywords: base editing, genome editing, point mutation, whole genome sequencing, off-target mutation

### INTRODUCTION

Clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated (Cas) 9 is widely used to establish site-specific genome-edited cell lines and animal models (Sander and Joung, 2014). The Cas9 protein, under the guidance of a single guide (sg) RNA, cleaves DNA at sequence-specific sites in the genome and produces a double-strand break (DSB). In response to DSBs, cellular DNA repair pathways generate insertions and deletions (indels) by non-homologous end-joining (NHEJ) far more often than they achieve homology-directed repair (HDR)-mediated gene correction (Chu et al., 2015; Paquet et al., 2016; Sakuma et al., 2016). Therefore, alternative approaches that correct point mutations without requiring DSBs are highly desirable. Rat cytidine deaminase (rAPOBEC1) linked to nCas9 (Cas9 nickase) was reported to efficiently convert C→T at target sites without introducing DSBs (Komor et al., 2016). After several generations of modification, base editor 3 (BE3), comprising rAPOBEC1, nCas9 (A840H), and uracil DNA glycosylase inhibitor (UGI), was developed; the mutation efficiency was up to 74.9% in mammalian cells (Komor et al., 2016). To further optimize BE3, studies have been conducted to improve target specificity (Kim D. et al., 2017), editing efficiency and product purity (Komor et al., 2017), expand the genome-targeting scope (Kim K. et al., 2017), and reduce off-target effects (Kim D. et al., 2017). Taking advantage of the high efficiency of BE3 in converting C:G to T:A base pairs, several groups have used BE3 to silence genes by introducing nonsense mutations (Billon et al., 2017; Kuscu et al., 2017).

Sheep are a phenotypically diverse livestock species raised globally for meat, milk, and fiber production. Suppressor of cytokine signaling 2 (SOCS2), a member of the SOCS protein family, is a negative regulator of biological processes mediated by various cytokines, such as metabolism, skeletal muscle development, and the response to infection (Inagaki-Ohara et al., 2014; Letellier and Haan, 2016). The most important of these processes is the regulation of GH signaling during growth and development (Yang et al., 2012; Dobie et al., 2018). SOCS2 is the major gene involved in the promotion of bone development in mice, and it plays a vital role in the control of bone mass and body weight (Metcalf et al., 2000; Dobie et al., 2018). A point mutation g.C1901T (p.R96C) in SOCS2 that completely abrogates SOCS2 binding affinity for the phosphopeptide of growth hormone receptor (GHR) is highly associated with increased body weight and size in sheep (Rupp et al., 2015). We recently reported the use of the BE3 system to induce nonsense mutations in the goat FGF5 gene and thereby generate animals with longer hair fibers (Li G. et al., 2018). That was the first base editing study in large animals, and it inspired us to examine the feasibility of inducing amino acid exchanges in sheep. In the present study, we obtained BE3-edited lambs by co-injection of BE3 mRNA and a guide RNA targeting the p.R96C variant in SOCS2. In addition, we used a parent-progeny whole genome sequencing (WGS) approach to show that no off-target mutations were detected and that the mutation frequency in edited animals is equivalent to that in control groups.

### MATERIALS AND METHODS

### Animals

Tan sheep were maintained at the Ningxia Tianyuan Sheep Farm, Hongsibu, Ningxia Autonomous Region, China. All experimental animals were provided water and standard feed ad libitum, consistent with normal sheep, and were treated according to Guidelines for the Care and Use of Laboratory Animals formulated by the College of Animal Science and Technology, Northwest A&F University. The experimental study was approved by the Northwest A&F University Animal Care and Use Committee (Approval ID: 2016NXTS001).

### Design of sgRNA

The sgRNA sequence targeting the g.C1901T (p.R96C) variant in the ovine SOCS2 gene is listed in **Supplementary Table S1**. Two oligonucleotides (**Supplementary Table S2**) used for the in vitro transcription of the sgRNA were synthesized and annealed to form double-stranded oligos. These double-stranded oligos were subcloned into the pUC57-T7-gRNA vector as described previously (Shen et al., 2013). Clones containing the desired sequence were selected and expanded by cultivation, and the plasmid was extracted using a plasmid extraction kit (AP-MN-P-250G; Axygen, Union City, CA, United States). The sgRNA was transcribed in vitro using the MEGAshortscript Kit (AM1354; Ambion, Foster City, CA, United States) and purified using the MEGAclear Kit (AM1908; Ambion). Subsequently, the BE3 mRNA in vitro transcription vector (No. 44758; Addgene, Cambridge, MA, United States) was used as a template to produce BE3 mRNA following a previously published protocol (Shen et al., 2013).
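As a rough illustration of why a cytosine base editor suits an arginine-to-cysteine change such as p.R96C, the sketch below enumerates which arginine codons become cysteine codons after a single C→T conversion. It relies only on the standard genetic code; the function names are hypothetical, and the actual SOCS2 codon is not stated in the text, so nothing here should be read as the authors' design procedure.

```python
# Codons from the standard genetic code needed for this illustration only.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}
CYS_CODONS = {"TGT", "TGC"}


def c_to_t_products(codon):
    """Return every codon reachable by converting exactly one C to T."""
    return [
        codon[:i] + "T" + codon[i + 1:]
        for i, base in enumerate(codon)
        if base == "C"
    ]


def arg_to_cys_edits():
    """List (Arg codon, Cys codon) pairs linked by a single C->T conversion."""
    pairs = []
    for codon in sorted(ARG_CODONS):
        for product in c_to_t_products(codon):
            if product in CYS_CODONS:
                pairs.append((codon, product))
    return pairs


if __name__ == "__main__":
    for before, after in arg_to_cys_edits():
        print(f"{before} (Arg) -> {after} (Cys)")
```

Only CGC and CGT qualify, and in both cases the edited base is the first position of the codon, which is the position an sgRNA would need to place inside the editing window.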

### Production of Single-Nucleotide Mutation Sheep

Healthy ewes (3–5 years old) with regular estrous cycles were selected as donors for zygote collection. The superovulation treatment of donors and the procedures for zygote collection were as described previously (Wang et al., 2015). Briefly, an EAZI-BREED controlled internal drug release (CIDR) Sheep and Goat Device (containing 300 mg of progesterone) was inserted into the vagina of the donor sheep for 12 days, and superovulation was performed 60 h before CIDR Device removal. Zygotes at the 1-cell stage were surgically collected and immediately transferred to TCM-199 medium (Gibco, Gaithersburg, MD, United States). A mixture of BE3 mRNA (25 ng µL<sup>−1</sup>) and sgRNA (10 ng µL<sup>−1</sup>) was co-injected into the cytoplasm of 1-cell stage zygotes using an Eppendorf FemtoJet system. The injection pressure, injection time, and compensatory pressure were 45 kPa, 0.1 s, and 7 kPa, respectively. Microinjections were performed on the heated stage of the Olympus ON3 micromanipulation system. Injected embryos were cultured in Quinn's Advantage Cleavage Medium and Blastocyst Medium (Sage BioPharma, Toronto, ON, Canada) for ∼24 h and were then transferred into surrogates, as reported previously (Wang et al., 2016). Pregnancy was determined by observing the estrous behavior of surrogates at every ovulation cycle. After 150 days of pregnancy, newborn lambs were delivered and genotyped.

### Genotyping of Delivered Animals

Peripheral venous blood samples were collected from newborn lambs at day 15 after birth for genomic DNA extraction. Polymerase chain reaction (PCR) amplification-based Sanger sequencing was conducted using KOD-NEO-Plus enzyme (DR010A; TOYOBO, Osaka, Japan); the primers are listed in **Supplementary Table S3**.

### Prediction of Off-Target Sites

Potential off-target sites with up to three mismatches were predicted using the openly available tool SeqMap (Jiang and Wong, 2008). The process for searching for off-target sites was implemented as previously described (Wang et al., 2015; Niu et al., 2017). The primers used for amplifying off-target sites by captured deep sequencing are given in **Supplementary Table S4**.
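The idea behind mismatch-tolerant off-target prediction can be sketched in a few lines. The example below is not SeqMap and does not reproduce its algorithm; it is a naive forward-strand scan, assuming the NGG PAM of SpCas9-derived editors, and the function names and toy sequences are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(genome, protospacer, max_mismatches=3):
    """Scan the forward strand for protospacer-like sites followed by NGG.

    Returns (position, site, mismatches) tuples. A real tool would also
    scan the reverse strand and use an index instead of a linear scan.
    """
    n = len(protospacer)
    hits = []
    for i in range(len(genome) - n - 2):
        site = genome[i:i + n]
        pam = genome[i + n:i + n + 3]
        if pam[1:] != "GG":
            continue  # PAM must match N-G-G
        mismatches = hamming(site, protospacer)
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits


if __name__ == "__main__":
    # Toy 10-nt protospacer and genome, for illustration only.
    toy_genome = "TTTTACGTACGTACTGGTTTTACGAACGTACAGGTTTT"
    for hit in find_candidate_sites(toy_genome, "ACGTACGTAC"):
        print(hit)
```

The perfect match is the on-target site; every other hit within the mismatch budget is a candidate to check by amplicon sequencing.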

#### Captured Deep-Sequencing

On-target and potential off-target mutations were amplified using a KAPA HiFi HotStart PCR Kit (#KK2501; KAPA Biosystems, Wilmington, MA, United States) for deep sequencing library generation. Pooled PCR amplicons were sequenced using the MiSeq with TruSeq HT Dual Index system (Illumina, San Diego, CA, United States).

#### Whole Genome Sequencing

Genomic DNA from nine animals in three edited families was used for WGS. Nine DNA libraries with insert sizes of approximately 300 bp were constructed following the manufacturer's instructions, and 150-bp paired-end reads were generated using the Illumina HiSeq XTen PE150 platform. The qualified reads were mapped to the sheep reference genome (Jiang et al., 2014) using the BWA (v0.7.13) tool (Li and Durbin, 2009). Local realignment and base quality recalibration were performed with the Genome Analysis Toolkit (GATK) (McKenna et al., 2010). Single-nucleotide polymorphisms (SNPs) and small indels (<50 bp) were called using GATK (McKenna et al., 2010) and SAMtools (Li et al., 2009).

#### Identification of Off-Target Mutations

The called SNPs were filtered according to the following criteria: (1) SNPs had to be identified by both GATK and SAMtools; (2) SNPs present in the NCBI sheep SNP database (>59 million SNPs) were excluded; (3) SNPs present in our sheep SNP database (n = 294, >79 million SNPs<sup>1</sup>) were excluded; (4) among the remaining SNPs, those in which C or G was converted to another base were selected. The potential off-target sites were predicted using Cas-OFFinder (Bae et al., 2014), allowing up to five mismatches. SNPs within the predicted off-target sites were identified as off-target mutations.

<sup>1</sup>http://animal.nwsuaf.edu.cn
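The filtering criteria above amount to a sequence of set operations, which can be sketched as follows. This is an illustrative outline rather than the pipeline used in the study: the tuple layout `(chrom, pos, ref, alt)`, the function name, and the window representation are all assumptions made for the example.

```python
def filter_offtarget_snps(gatk_calls, samtools_calls, known_snps, windows):
    """Apply the four criteria, then keep SNPs inside predicted windows.

    Each SNP is a (chrom, pos, ref, alt) tuple; each window is a
    (chrom, start, end) tuple with inclusive coordinates.
    """
    # (1) keep SNPs called by both tools
    shared = set(gatk_calls) & set(samtools_calls)
    # (2)-(3) drop SNPs already present in public or in-house databases
    novel = shared - set(known_snps)
    # (4) keep only substitutions whose reference base is C or G
    c_or_g = {snp for snp in novel if snp[2] in ("C", "G")}
    # finally, keep SNPs that fall inside a predicted off-target window
    inside = {
        snp for snp in c_or_g
        if any(c == snp[0] and start <= snp[1] <= end
               for c, start, end in windows)
    }
    return sorted(inside)


if __name__ == "__main__":
    # Toy calls, for illustration only.
    gatk = [("chr1", 100, "C", "T"), ("chr1", 200, "A", "G"), ("chr2", 50, "G", "A")]
    samtools = [("chr1", 100, "C", "T"), ("chr2", 50, "G", "A")]
    known = [("chr2", 50, "G", "A")]
    print(filter_offtarget_snps(gatk, samtools, known, [("chr1", 90, 112)]))
```

An empty result for every animal corresponds to the situation in which no SNP lies within any predicted site.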

### Identification of de novo Mutations

Putative de novo SNPs and indels were identified according to our recent report (Wang et al., 2018). Briefly, we retained SNPs/indels identified by both GATK and SAMtools and removed those found in the NCBI and our own sheep SNP databases. Next, SNPs/indels inherited from the parents were excluded. Additional SNPs/indels were filtered based on parameters including read depth and Phred-scaled likelihood (PL) scores (Wang et al., 2018). Finally, misaligned or miscalled SNPs/indels were removed manually. Genome-wide structural variations (SVs) were called using BreakDancer (Chen et al., 2009), and only the SVs specific to the edited animals were retained. To identify de novo SVs, we removed SVs shared by any two founders, SVs with a read depth <50%, and SVs located on scaffolds.
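The core of a trio-based de novo filter is the exclusion of anything seen in either parent, followed by quality thresholds. The sketch below illustrates that logic only; the thresholds `min_depth` and `min_pl` are hypothetical placeholders (the values actually used are given in Wang et al., 2018, not in this text), and the data layout is an assumption.

```python
def de_novo_candidates(child_calls, sire_sites, dam_sites, known_sites,
                       min_depth=10, min_pl=20):
    """Return child variants absent from both parents and from databases.

    child_calls maps (chrom, pos, ref, alt) -> (read_depth, pl_score).
    min_depth and min_pl are illustrative defaults, not the study's values.
    """
    parental = set(sire_sites) | set(dam_sites)
    known = set(known_sites)
    kept = []
    for site, (depth, pl_score) in child_calls.items():
        if site in known or site in parental:
            continue  # inherited or previously catalogued variant
        if depth < min_depth or pl_score < min_pl:
            continue  # insufficient support for a confident call
        kept.append(site)
    return sorted(kept)


if __name__ == "__main__":
    # Toy trio, for illustration only.
    child = {
        ("chr1", 10, "A", "G"): (35, 99),   # candidate de novo
        ("chr1", 20, "C", "T"): (40, 99),   # present in sire
        ("chr2", 30, "G", "A"): (4, 99),    # low depth
    }
    print(de_novo_candidates(child, [("chr1", 20, "C", "T")], [], []))
```

Candidates that survive such a filter would still need manual inspection and, as in the study, Sanger validation.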


#### RESULTS AND DISCUSSION

### Generation of Edited Animals

To obtain lambs carrying the precise g.1901C > T mutation in SOCS2, we micro-injected the BE3 mRNA and sgRNA into the cytoplasm of 1-cell-stage embryos. The sgRNA was designed to encompass the target point mutation p.R96C in SOCS2 (**Figure 1A**). Five mated Tan sheep that were treated for superovulation yielded 54 one-cell fertilized oocytes; after 53 embryos were subjected to micro-injection, 20 developing embryos were transplanted into eight recipients. Three recipient sheep were confirmed to be pregnant. After ∼150 days of gestation, four lambs (#28, #34, #41, and #42) were obtained (**Table 1**).

Genomic DNA was isolated from the blood samples of the four lambs (#28, #34, #41, and #42) and the targeted region was evaluated by PCR-based Sanger sequencing; this analysis confirmed that three animals (#28, #34, and #42) were edited at the target site (**Supplementary Figure S1A**). We then used TA cloning and sequencing to further validate the genotypes of the three edited animals, and a nucleotide substitution at the p.R96C mutation site was found in #28 and #42 (**Supplementary Figure S1B**). TA-clone sequencing further revealed short indels in the edited animals; for example, the founder animal #34 had 19- and 23-bp deletions, and #42 was mosaic, carrying the defined point mutation and a 5-bp deletion (**Supplementary Figure S1B**). To fully screen the genotypes in the edited animals, these three animals were subjected to targeted deep sequencing, which confirmed the TA-sequencing results and identified additional low-incidence C-to-T genotypes within the editing window (**Figures 1B,C**; Gehrke et al., 2018). We demonstrated that BE3-mediated modification in sheep led to a gene knockout animal (#34), apparent mosaics, and a low incidence of short indels in edited animals (**Figure 1D**). To further investigate the mosaicism in the BE-edited animals, we sequenced the modified loci in additional biopsied tissues (tail, muscle, and skin) of the three animals (#28, #34, and #42). We identified the same heterozygous genotypes in these tissues as observed in whole blood in #28 and #42 (**Supplementary Figure S1A**), indicating that the genetic modification occurred during early embryogenesis. The non-specificity of the programmable deaminase BE3 often results in short indels and mosaicism (Kim D. et al., 2017; Park et al., 2017; Sasaguri et al., 2018).
Efforts have been made to optimize DNA specificity and minimize bystander effects of BEs (Rees et al., 2017; Gehrke et al., 2018) or to develop advanced cytidine and adenine BEs with high efficiency (e.g., BE4max, AncBE4max, and ABEmax; Koblan et al., 2018).

Although the editing efficiency in this study was as high as 75.0% (3/4), we only generated one animal with the precise point substitution (#28, 25%, 1/4) (**Table 1**). The efficiency of precise single-base substitution was equivalent to that observed in goats (24%) (Niu et al., 2018) and sheep (Zhou et al., 2018), and was significantly higher than that in zebrafish (4%) (Armstrong et al., 2016). We expect to use a variety of newly invented BEs to improve the DNA specificity and diminish bystander effects in the editing window.

#### Phenotypes of Edited Animals

Subsequently, we analyzed the growth curve and body size of mutant and control lambs to assess whether the p.R96C mutation impaired the function of the SOCS2 protein associated with morphology. The body weight of three edited sheep (#28, #34, and #42) was higher than that in the control group on D0, D30, and D60; body length and height in modified sheep were higher than those in the control group (**Figure 2**). We did not observe clear phenotypic differences among the three edited animals that were related to their genotypes (substitutions, deletions, or both), although the body parameters of #42 were higher than those of the other two animals (**Figure 2**). Considering the mosaicism in the edited animals, more phenotypic data collected over a longer period are needed to address the correlation between mutation types and phenotypes. Nevertheless, these results are consistent with reports that a spontaneous mutation in SOCS2 causes a 30–50% increase in the postnatal growth of mice (Horvat and Medrano, 2001) and that the SOCS2 p.R96C mutation in sheep leads to increased body weight, body size, and milk yield (Rupp et al., 2015). Moreover, SOCS2 deletion protects bone health in inflammatory bowel disease and causes a high-growth phenotype in mice (Horvat and Medrano, 2001; Dobie et al., 2018). We found that the two animals (#34 and #42) with SOCS2 indels were as healthy as normal sheep, and we did not observe any health issues.

FIGURE 3 | Detection of potential off-target sites by deep sequencing. Five potential off-target sites (OT1–OT5) were predicted by Cas-OFFinder. Deep sequencing was used to determine substitution frequencies at predicted target sites for the three founder animals. Mismatched nucleotide and PAM sequences are indicated in red and in blue, respectively.

### Off-Target Mutations in Edited Animals

To characterize off-target effects induced by the BE system, a deep sequencing assay was used to amplify predicted off-target sites in all three edited animals. Five off-target sites (OT1–OT5) were predicted using the SeqMap tool (Jiang and Wong, 2008) (**Supplementary Table S5**). Targeted deep sequencing revealed that the frequency of BE3-induced point mutations is low at all predicted sites in all three founder animals (#28, #34, and #42) (**Figure 3**), indicating that the incidence of BE3-induced off-target mutations is rare.

To further characterize off-target mutations at the whole-genome scale, we conducted family trio-based WGS to assess off-targets and de novo mutations in the three edited animals (#28, #34, and #42) (**Figure 4A**). We calculated the kinship coefficient for pairwise animals to verify the pedigree information (**Supplementary Table S6**). The WGS yielded an average sequence coverage of 37.3× per individual, within a range of 34- to 41-fold, and generated 12–13 million SNPs for each animal (**Supplementary Table S7**). SNPs were first called by both GATK and SAMtools, and an average of 16 million SNPs were identified for each founder. Of the SNPs we were able to map in this study, we next removed naturally occurring variants in the NCBI sheep SNP database (>59 million SNPs) and in our own sheep SNP database (>79 million SNPs from 294 individuals) and filtered out SNPs that were inherited from parents, resulting in ∼37,000 remaining SNPs for each founder animal. We then selected base substitutions of the SNP types C to T/A/G and their antisense types G to A/T/C, following a recent study (Kim D. et al., 2017). Subsequently, we assessed whether the remaining SNPs fell within the off-target sites predicted by Cas-OFFinder (Bae et al., 2014), allowing up to five mismatches (**Supplementary Table S8**); no variants were identified (**Figure 4B**), indicating that no off-target mutations were induced by BE3 in the present study. The detailed filtering procedure is summarized in **Supplementary Table S9**.

To identify de novo mutations (SNPs and indels) in the edited animals, we used a stringent pipeline for variant filtering, as previously described (Li C. et al., 2018; Wang et al., 2018). Briefly, we selected SNPs that were identified by both GATK and SAMtools and removed existing SNPs in both the NCBI SNP database and our own sheep SNP database as well as the SNPs found in their parents. Next, we filtered out SNPs according to sequence read depth, PL scores, and manual examination of the FASTQ files (Wang et al., 2018). The remaining 15, 18, and 17 SNPs in individuals #28, #34, and #42, respectively, were identified as de novo SNPs for each progeny (**Figure 4C**). We next validated these de novo SNPs with Sanger sequencing. Of the 46 successfully amplified and sequenced SNPs, 44 were confirmed as true variants (**Supplementary Figure S2** and **Supplementary Table S10**), indicating that the pipeline for identification of de novo SNPs was robust. Similarly, we identified 5, 5, and 10 de novo indels in individuals #28, #34, and #42, respectively (**Figure 4D** and **Supplementary Table S11**). To further characterize the large-scale genomic alterations induced by base editing, we called SVs with BreakDancer (Chen et al., 2009) and identified a total of ten de novo SVs in the three BE-edited animals (**Supplementary Tables S12, S13**); none of these variants were adjacent to the SOCS2 site.

Additionally, we estimated the mutation rates per base pair per generation (Li C. et al., 2018) and found no apparent differences between the BE3 and control animals in terms of the frequency of de novo SNPs in the present study and former studies (**Figure 4E**). Although only three trios were analyzed in this study, the mutation rate in base-edited sheep was equivalent to that in human populations (1000 Genomes Project Consortium et al., 2010; Maretty et al., 2017), cattle (Harland et al., 2017), and our previously generated CRISPR/Cas9-edited sheep and goat populations (Li C. et al., 2018; Wang et al., 2018). Together with our previous studies reporting the de novo mutations in edited animals and their offspring (Li C. et al., 2018; Li G. et al., 2018; Wang et al., 2018) and two recent trio-based studies in mice (Iyer et al., 2018; Willi et al., 2018), we demonstrate that the mutation frequency does not differ between Cas9-mediated and BE-mediated animals, thereby providing evidence to support the reliability of genome editing in large animals for biomedicine and agriculture.
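For orientation, a per-base-pair, per-generation mutation rate is commonly estimated from a trio as the number of de novo mutations divided by twice the number of callable haploid bases (the factor of two accounts for the two parental genomes transmitted). The sketch below assumes that common estimator; the text refers to Li C. et al. (2018) for the actual method, so the formula, the function name, and the toy numbers are assumptions.

```python
def mutation_rate_per_bp(n_de_novo, callable_haploid_bp):
    """Estimate de novo mutations per base pair per generation.

    Assumes rate = n / (2 * G), where G is the number of haploid bases
    at which a de novo call could have been made.
    """
    if callable_haploid_bp <= 0:
        raise ValueError("callable_haploid_bp must be positive")
    return n_de_novo / (2 * callable_haploid_bp)


if __name__ == "__main__":
    # Toy values for illustration; not taken from the study.
    rate = mutation_rate_per_bp(n_de_novo=10, callable_haploid_bp=1_000_000_000)
    print(f"{rate:.2e} mutations per bp per generation")
```

The estimate is sensitive to the callable genome size, which shrinks as filtering becomes more stringent, so rates are only comparable across studies that define it similarly.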

### CONCLUSION

In summary, a single sheep carrying the SOCS2 p.R96C mutation was successfully generated using the programmable deaminase BE3. We confirmed that BE3 did not induce unintended off-target mutations at the genome-wide scale, and the mutation frequency in BE-mediated animals was equivalent to that in Cas9-edited animals and in natural populations. This study facilitates the correction of single-base mutations and the genetic improvement of large animals through single-base changes.

### DATA AVAILABILITY

All relevant results are within the paper and its **Supplementary Data Files**. The raw WGS data are available in the NCBI SRA database under BioProject ID: PRJNA505205.

### AUTHOR CONTRIBUTIONS

SZ, XZ, XW, BM, and YC conceived the study. BC, YW, QD, XZ, JL, YL, YD, GL, HY, and BM performed the experiments. CH, CL, and XW analyzed the dataset. QK and WN provided samples. XW, TS, and BP wrote the article.

### FUNDING

This work was supported by the National Natural Science Foundation of China (31772571), and Local Grants (NXTS20118-001, 2017NY-072, and 2018KJXX-009). XW is a Tang Scholar at Northwest A&F University.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00215/full#supplementary-material

#### REFERENCES



Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci. Adv. 3:eaao4774. doi: 10.1126/sciadv.aao4774



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Zhou, Cai, He, Wang, Ding, Liu, Liu, Ding, Zhao, Li, Li, Yu, Kou, Niu, Petersen, Sonstegard, Ma, Chen and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# sgRNA-shRNA Structure Mediated SNP Site Editing on Porcine IGF2 Gene by CRISPR/StCas9

Yongsen Sun† , Nana Yan† , Lu Mu, Bing Sun, Jingrong Deng, Yuanyuan Fang, Simin Shao, Qiang Yan, Furong Han, Zhiying Zhang\* and Kun Xu\*

Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China

#### Edited by:

Youri I. Pavlov, University of Nebraska Medical Center, United States

#### Reviewed by:

Claudio Mussolino, Institut für Zell- und Gentherapie (IZG), Germany Magdy Mahfouz, King Abdullah University of Science and Technology, Saudi Arabia

#### \*Correspondence:

Kun Xu xukunas@nwafu.edu.cn Zhiying Zhang zhangzhy@nwafu.edu.cn †These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics

Received: 29 November 2018 Accepted: 01 April 2019 Published: 18 April 2019

#### Citation:

Sun Y, Yan N, Mu L, Sun B, Deng J, Fang Y, Shao S, Yan Q, Han F, Zhang Z and Xu K (2019) sgRNA-shRNA Structure Mediated SNP Site Editing on Porcine IGF2 Gene by CRISPR/StCas9. Front. Genet. 10:347. doi: 10.3389/fgene.2019.00347

The SNP within intron 3 of the porcine IGF2 gene (G3072A) plays an important role in muscle growth and fat deposition in pigs. In this study, the StCas9 derived from Streptococcus thermophilus was combined with the Drosha-mediated sgRNA-shRNA structure to boost G-to-A base editing at the IGF2 SNP site, which we call "SNP editing." The codon-humanized StCas9 that we reported previously was first compared with the prevalently used SpCas9 derived from Streptococcus pyogenes using our in-house surrogate reporter assay, and the StCas9 demonstrated a comparable targeting activity. In addition, by combining shRNA with sgRNA, simultaneous gene silencing and genome targeting can be achieved. Thus, the novel IGF2.sgRNA-LIG4.shRNA-IGF2.sgRNA structure was constructed to enhance the sgRNA/Cas9-mediated HDR-based IGF2 SNP editing by silencing the LIG4 gene, a key component of the NHEJ pathway that competes with HDR. The sgRNA-shRNA/StCas9 all-in-one expression vector and the IGF2.sgRNA/StCas9 control were separately used to transfect porcine PK15 cells together with an ssODN donor for the IGF2 SNP editing. The editing events were detected by the RFLP assay, Sanger sequencing, and deep sequencing, and the deep-sequencing results demonstrated a significantly higher HDR-based editing efficiency (16.38%) for our sgRNA-shRNA/StCas9 strategy. In short, we achieved effective IGF2 SNP editing by using the combined sgRNA-shRNA/StCas9 strategy, which will facilitate the further production of base-edited animals and may be extended to gene therapy for the base correction of some genetic diseases.

Keywords: IGF2 gene, SNP, base editing, CRISPR, StCas9, sgRNA-shRNA

## INTRODUCTION

The CRISPR/Cas9 technology (Jinek et al., 2012; Mali et al., 2013) has been widely used for genome editing in various cell types and organisms since its advent. So far, the S. pyogenes-derived SpCas9 is the most prevalently applied Cas9 enzyme (Cong et al., 2013). Nevertheless, Cas9 variants from other microbial species can also mediate efficient genome editing, such as those from S. thermophilus (StCas9) (Xu et al., 2015a) and Staphylococcus aureus (SaCas9) (Kleinstiver et al., 2015; Ran et al., 2015).

The Cas9 endonuclease is directed by an artificial single-guide RNA (sgRNA or gRNA) to specifically recognize, by base pairing, the target DNA sequence adjacent to a given protospacer adjacent motif (PAM) (Jinek et al., 2012). In mammalian cells, the DNA double-strand breaks (DSBs) induced by the endonucleases can be repaired by two main mechanisms, the error-prone non-homologous end joining (NHEJ) and the donor-dependent homology-directed repair (HDR). The NHEJ pathway generates stochastic nucleotide insertions and deletions (Indels) at the target locus, resulting in open reading frame (ORF) shift and loss-of-function of target genes. Alternatively, the HDR pathway can result in desired genome editing events by targeted recombination of designed homologous DNA template donors (Salsman and Dellaire, 2017).

In animal breeding research, genes of interest (GOI) are usually knocked out or knocked in to study their function and their relationships within signaling pathway networks (Komor et al., 2017). However, creating gene knock-out/in animals always raises concerns about genetic safety. Precise genome editing, in contrast, takes advantage of the HDR pathway to make point mutations for gene function studies as well as for gene therapy and animal breeding research (Komor et al., 2017; Salsman and Dellaire, 2017). However, the efficiency of HDR is much lower than that of NHEJ in mammalian cells. Hence, different approaches have been reported to enhance the efficiency of HDR-based precise genome editing: first, inhibiting key molecules of the competing NHEJ pathway, such as DNA ligase IV (LIG4) and KU70 (Maruyama et al., 2015; Hu et al., 2018); second, optimizing the DNA template donors (ssDNA or dsDNA); and third, using molecules or small compounds reported to improve HDR efficiency significantly, such as RAD51 and RAD52 (homologous recombination-related proteins) and Nocodazole and CCND1 (which synchronize the cell cycle at specific phases) (Lin et al., 2014; Chu et al., 2015; Salsman and Dellaire, 2017; Shao et al., 2017).

We have previously developed the novel Drosha-mediated sgRNA-shRNA structure for transcribing multiple sgRNAs to promote CRISPR/Cas9-based multiplex genome targeting (Yan et al., 2016). Interestingly, we noticed that the byproduct shRNA could be used to silence the LIG4 gene and thereby aid HDR-based genome editing. Insulin-like growth factor 2 (IGF2) is an important gene involved in pig muscle growth and fat deposition, and it also influences heart size. It has been reported that the substitution of guanine (G) by adenine (A) at nucleotide 3072 within intron 3 of the IGF2 gene obstructs the binding of the transcriptional repressor ZBED6, resulting in increased IGF2 expression and muscle yield (Van Laere et al., 2003; Markljung et al., 2009). Our previous research confirmed that local pig breeds carry the wild-type G at the IGF2 SNP site (Shao et al., 2017).

In this study, the codon-humanized StCas9 derived from S. thermophilus (Xu et al., 2015a) and the novel sgRNA-shRNA structure (Yan et al., 2016), both previously reported by us, were combined to enhance SNP editing of the porcine IGF2 gene. With the sgRNA-shRNA structure, simultaneous IGF2 genome targeting by sgRNA/Cas9 and transient LIG4 gene silencing by shRNA could be achieved. In addition, the CRISPR/StCas9 system, with its stricter PAM requirement of NGGNG, could reduce off-target events compared with the CRISPR/SpCas9 system, whose PAM pattern is NGG (Muller et al., 2016). Moreover, SNP editing offers a way to avoid the genetic safety concerns of animal breeding studies. Therefore, our novel sgRNA-shRNA/StCas9 strategy is of clear significance for gene editing or base correction, and it will facilitate further animal breeding research and gene therapy studies aimed at correcting genetic mutations.
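The effect of a stricter PAM on the number of targetable (and therefore potentially off-targetable) sites can be illustrated by counting PAM occurrences in a sequence. The sketch below is a toy forward-strand count using the two PAM patterns named above; the function name and the example sequence are hypothetical, and PAM density is only one of several determinants of specificity.

```python
import re


def count_pam_sites(sequence, pam_pattern):
    """Count forward-strand PAM matches, including overlapping ones.

    pam_pattern uses N for any base (e.g. "NGG" or "NGGNG"). A zero-width
    lookahead is used so that overlapping matches are all counted.
    """
    regex = "(?=" + pam_pattern.replace("N", "[ACGT]") + ")"
    return len(re.findall(regex, sequence.upper()))


if __name__ == "__main__":
    # Toy sequence, for illustration only.
    toy = "ATGGCAAGGTGCC"
    print("NGG sites:  ", count_pam_sites(toy, "NGG"))
    print("NGGNG sites:", count_pam_sites(toy, "NGGNG"))
```

Every NGGNG site is also an NGG site, so the longer motif can only match as often or less often; the same restriction that narrows the targeting scope also reduces the pool of sequences at which unintended cleavage could occur.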

#### MATERIALS AND METHODS

#### Construction of sgRNA/Cas9 Expression Vectors and Surrogate Reporters

The IGF2.sgRNA/StCas9 expression vector (pll3.7-mU6-IGF2.sgRNA-CMV-hStCas9) was constructed with a further modified sgRNA scaffold as shown in **Supplementary Figure S1**, which was designed with reference to our previous study (Xu et al., 2015a) and another study (Chen et al., 2013) on optimizing the sgRNA structure. The IGF2.sgRNA/SpCas9 expression vector was subsequently constructed by replacing the CMV-StCas9 cassette of the IGF2.sgRNA/StCas9 vector with the CBh-SpCas9 cassette amplified from the plasmid pX330-U6-Chimeric\_BB-CBh-hSpCas9 (Addgene, #42230).

A series of single-strand annealing (SSA)-based surrogate reporters have been developed in our previous studies (Ren et al., 2019). The DsRed-eGFP (RG) and eGFP surrogate reporters (Xu et al., 2015a) were first designed and used for sgRNA/Cas9 activity verification in mammalian cells. The DsRed-Puro<sup>R</sup>-eGFP (RPG) surrogate reporter with dual-reporter genes was further constructed to assist the enrichment and screening of genetically modified cells by either puromycin selection or fluorescence-activated cell sorting (FACS) (Ren et al., 2015). The IGF2.RG and IGF2.eGFP surrogate reporters were constructed in this study as described previously (Xu et al., 2015a), while the IGF2.RPG surrogate reporter was constructed in our previous study (Shao et al., 2017).

### Cell Culture and Transfection

The human embryonic kidney 293T (HEK293T) and porcine kidney epithelial (PK15) cells were cultured routinely in DMEM supplemented with FBS, penicillin, and streptomycin as described previously (Shao et al., 2017), with an additional 250 ng/ml of the antimycotic amphotericin B. The transfection assays were conducted in six-well plates using Lipofectamine™ 2000 reagent (Invitrogen) following the manufacturer's protocol, with a total of 3 µg plasmid DNA per well. At least three independent wells were used for parallel transfections for each experimental group.

### Surrogate Reporter Assay for Comparing StCas9 and SpCas9

HEK293T cells were co-transfected with the IGF2.sgRNA/StCas9 or IGF2.sgRNA/SpCas9 expression vector and the IGF2.RG surrogate reporter in six-well plates. At least three wells were used for parallel transfections for each group, and the molar ratio of the sgRNA/Cas9 vector to the RG reporter was 1:1. Two days after transfection, the cells from each parallel well were harvested independently for flow cytometric analysis to count the DsRed<sup>+</sup> single-positive and DsRed<sup>+</sup>eGFP<sup>+</sup> dual-positive cells. The flow cytometric data were analyzed with FlowJo v10 software. The percentage of dual-fluorescence positive cells, calculated as DsRed<sup>+</sup>eGFP<sup>+</sup>/(DsRed<sup>+</sup>eGFP<sup>+</sup> + DsRed<sup>+</sup>), was used to evaluate the Cas9 activity indirectly.
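The activity readout described above is a simple ratio computed per well and then summarized across replicate wells. A minimal sketch follows; the function names and the cell counts are hypothetical and serve only to make the calculation concrete.

```python
from statistics import mean


def dual_positive_percentage(n_dsred_only, n_dual):
    """Percentage of dual-positive cells among all DsRed-positive cells.

    Implements DsRed+eGFP+ / (DsRed+eGFP+ + DsRed+) * 100, where
    n_dsred_only counts DsRed single-positive cells.
    """
    total = n_dsred_only + n_dual
    if total == 0:
        raise ValueError("no DsRed-positive cells were counted")
    return 100.0 * n_dual / total


def summarize_wells(wells):
    """Average the per-well percentages of a list of (single, dual) counts."""
    return mean(dual_positive_percentage(single, dual) for single, dual in wells)


if __name__ == "__main__":
    # Toy counts for three parallel wells, for illustration only.
    replicate_wells = [(300, 100), (280, 120), (320, 80)]
    print(f"{summarize_wells(replicate_wells):.1f}% dual-positive")
```

Normalizing to DsRed-positive cells restricts the denominator to cells that received the reporter, so the ratio is less sensitive to well-to-well differences in transfection efficiency.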

### shRNA Design and Verification for Porcine LIG4 Gene Interference

Before designing the shRNAs against the porcine LIG4 gene, three overlapping fragments (**Supplementary Figure S2**) from its complete CDS were amplified by RT-PCR with the PK15 cDNA as the template. The primers used are shown in **Supplementary Table S1**. The three PCR fragments were then sequenced and aligned to verify the sequence of the LIG4 gene. Afterward, three shRNA candidates were predicted with the Invitrogen BLOCK-iT™ RNAi Designer, and the corresponding oligonucleotides shown in **Supplementary Table S2** were synthesized (Invitrogen). The three shRNA-1/2/3 cassettes were then generated by oligonucleotide annealing (Xu et al., 2015b) and cloned into the pLenti-H1 expression vector, respectively. The non-specific shRNA vector pLenti-H1-SC (shRNA control) had been constructed previously in our lab. The H1-shRNA-CMV-eGFP expression cassettes from these four pLenti-H1 vectors were further amplified by PCR and cloned into the pB-CBh-puro vector. In addition to the fluorescent eGFP marker gene, the upgraded shRNA vectors (pB-CBh-Puro-H1-shRNA-CMV-eGFP) contained the puromycin resistance gene (Puro<sup>R</sup>) (**Figure 2A**), which was intended for the enrichment of the transfected cells by puromycin selection. The PK15 cells were transfected with these pB-based shRNA vectors. At 24 h after the transfection, the cells were selected with puromycin (3 µg/ml) for about another 2 days and were then harvested for total RNA preparation and quantitative RT-PCR analysis.

### Quantitative RT-PCR Assay for LIG4 Gene Expression

Quantitative RT-PCR (qRT-PCR) assays were conducted as previously described (Yan et al., 2016) to detect the relative transcript level of the LIG4 gene. In general, the parallel wells of transfected PK15 cells for each experiment or control group were collected independently for total RNA isolation, first-strand cDNA preparation and the subsequent quantitative PCR analysis. The porcine β-actin gene was used as the internal control. The primers used for the qRT-PCR assays are listed in **Supplementary Table S1**.
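The text does not state how the relative transcript level was computed from the raw data. Purely as an illustration, and assuming the commonly used 2<sup>−ΔΔCt</sup> method with β-actin as the reference gene, the calculation could be sketched as follows. The function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^-ddCt method (an assumption, see lead-in).

    ct_target_*: Ct of the gene of interest (here LIG4)
    ct_ref_*:    Ct of the internal control (here beta-actin)
    """
    delta_sample = ct_target_sample - ct_ref_sample      # dCt, treated sample
    delta_control = ct_target_control - ct_ref_control   # dCt, control sample
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target amplifies one cycle later in the
# shRNA-treated sample than in the control, at equal beta-actin Ct.
fold = relative_expression(26.0, 18.0, 25.0, 18.0)
print(fold)
```

A value below 1 indicates reduced expression relative to the control group.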

## sgRNA-shRNA Structure Design and Activity Verification

We reported the novel Drosha-mediated sgRNA-shRNA structure for multiplex genome targeting in our previous study (Yan et al., 2016). Here, LIG4.shRNA-1, which showed the highest activity for silencing the porcine LIG4 gene in the preceding experiment, was used for the sgRNA-shRNA structure design. For further improvement, a pair of more efficient Drosha-processing sequences from miR-30 (Zeng and Cullen, 2005) was used to replace the former Drosha recognition sites. As designed in **Figure 3A**, the LIG4.shRNA sequence flanked by the Drosha-processing sequences (as shown in **Supplementary Figure S3**) was synthesized directly and inserted between two identical IGF2.sgRNA sequences. The IGF2.sgRNA-LIG4.shRNA-IGF2.sgRNA cassette was then cloned into the pll3.7-mU6-CMV-hStCas9 vector (Xu et al., 2015a), generating the sgRNA-shRNA/StCas9 all-in-one expression vector, which was expected to be capable of simultaneous IGF2 gene targeting and LIG4 gene silencing.

The surrogate reporter assay was conducted to verify the sgRNA activity as described above. In brief, HEK293T cells were co-transfected with the IGF2.eGFP surrogate reporter and the sgRNA-shRNA/StCas9, the IGF2.sgRNA/StCas9, or the single StCas9 (pll3.7-mU6-CMV-hStCas9 with no sgRNA as the negative control) expression vector. The linearized IGF2.eGFP surrogate reporter, which was expected to be repaired spontaneously in cells after the transfection, was co-transfected with the single StCas9 expression vector as the positive control, as previously described (Xu et al., 2015a). Two days after transfection, the cells were photographed and the cells from each parallel well were harvested independently for flow cytometric analysis. The percentage of eGFP-positive cells was used as an indirect measurement of the IGF2.sgRNA activity.

In addition to the IGF2.sgRNA-LIG4.shRNA-IGF2.sgRNA (Sg-Sh) cassette, an IGF2.sgRNA-SC-IGF2.sgRNA (Sg-SC) cassette was generated by replacing the LIG4.shRNA-1 with the non-specific shRNA control. To verify the shRNA activity driven by the sgRNA-shRNA structure, the Sg-SC cassette, as well as Sg-Sh, was further cloned into the pLenti-H1 vector. Then, the pLenti-H1 based shRNA control (SC), LIG4.shRNA-1(Sh-1), Sg-SC and Sg-Sh expression vectors were used to transfect the PK15 cells, respectively, along with the pB-CBh-Puro vector. The transfected cells were enriched by puromycin selection for about 2 days as above, and then were harvested for detecting the relative expression of the LIG4 gene by qRT-PCR analysis.

### Genome Editing of the Porcine IGF2 Gene

To conduct the HDR-based IGF2 SNP editing, a 110-nt single-stranded oligodeoxynucleotide (ssODN) with the desired G > A substitution (**Supplementary Table S2**) was synthesized by GenScript (Nanjing, China). The PAM motif of the IGF2.sgRNA target site within the ssODN donor was mutated to an NheI restriction endonuclease (RE) site for the subsequent restriction fragment length polymorphism (RFLP) assay.

PK15 cells were co-transfected with the sgRNA-shRNA/StCas9 or IGF2.sgRNA/StCas9 expression vector, the IGF2.RPG surrogate reporter and the ssODN HDR donor. The transfections were conducted in six-well plates and the molar ratio of RNA/Cas9 vector:RPG reporter:ssODN donor was 2:1:1. At least three wells were used for parallel transfections for each group. Two days after the transfection, the cells were transferred into 60 mm dishes and maintained under continuous puromycin treatment for another 5 days. After the puromycin selection, the resistant cell clones from the parallel dishes of each experiment group were collected as a pool, as previously described (Shao et al., 2017). The genomic DNA of each pool was extracted separately, and the target locus was amplified by PCR for the subsequent RFLP and sequencing analyses. The primers used are shown in **Supplementary Table S1**.

For the RFLP assay, the PCR products of the IGF2 locus were 1319 bp in length. When the IGF2 gene was edited successfully as designed, the PCR product would be cut by NheI at the introduced RE site into two fragments (1081 and 238 bp). In addition, the PCR products from the two experimental groups were cloned into the pMD19-T "T-A" cloning vector, respectively, and a total of 40–50 clones were picked from each group for Sanger sequencing. In parallel, the IGF2 locus was also amplified for the deep-sequencing analysis.
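The expected RFLP band pattern can be illustrated with a short Python sketch that simulates a complete NheI digest (recognition site GCTAGC, cut after the first G) of an amplicon. The toy sequences below are placeholders built only to reproduce the fragment lengths given in the text; they are not the real IGF2 amplicon, and the orientation of the two fragments is an assumption.

```python
NHEI_SITE = "GCTAGC"   # NheI recognition sequence
NHEI_CUT_OFFSET = 1    # NheI cuts after the first G (G^CTAGC)


def digest_fragments(amplicon: str, site: str = NHEI_SITE,
                     offset: int = NHEI_CUT_OFFSET) -> list[int]:
    """Return fragment lengths after complete digestion at every `site`."""
    amplicon = amplicon.upper()
    cuts = []
    start = amplicon.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy 1319-bp amplicons (placeholder bases, not the real IGF2 sequence).
edited = "A" * 1080 + NHEI_SITE + "A" * 233   # carries the introduced NheI site
unedited = "A" * 1319                         # no NheI site

print(digest_fragments(edited))    # [1081, 238]
print(digest_fragments(unedited))  # [1319]
```

In a pooled cell population, both the uncut 1319-bp band and the two cleavage products are expected, because only a fraction of the alleles carry the HDR-introduced site.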

#### Deep Sequencing Analysis

PCR amplicons of the two experiment groups were amplified, respectively, using different barcode-primer pairs (**Supplementary Table S1**), purified using a gel extraction kit (OMEGA Bio-Tek, China), and sequenced on an Illumina HiSeq (GENEWIZ, China). Among the files provided by GENEWIZ, reads with the sequence CTCaCAGCGCGctAGC (harboring both the desired G > A mutation and the PAM mutations) were considered as the HDR-based editing, and the percentage of HDR reads relative to all reads was calculated as the HDR efficiency for each group. The deep sequencing data are available under the BioProject ID: PRJNA526113.
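The read-classification rule described above can be sketched in Python as follows. This is an illustration rather than the pipeline actually used: the reads are hypothetical, and matching the reverse complement of the signature is an added assumption for reads sequenced from the opposite strand.

```python
# HDR signature from the text (CTCaCAGCGCGctAGC), upper-cased for matching.
HDR_SIGNATURE = "CTCACAGCGCGCTAGC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hdr_efficiency(reads) -> float:
    """Percentage of reads that contain the HDR signature on either strand."""
    signature_rc = reverse_complement(HDR_SIGNATURE)  # assumption, see lead-in
    total = 0
    hdr = 0
    for read in reads:
        read = read.upper()
        total += 1
        if HDR_SIGNATURE in read or signature_rc in read:
            hdr += 1
    return 100.0 * hdr / total if total else 0.0


# Hypothetical reads: two carry the signature (one on the reverse strand),
# two do not.
reads = [
    "GGA" + HDR_SIGNATURE + "TTC",
    reverse_complement("GGA" + HDR_SIGNATURE + "TTC"),
    "GGACTCGCAGCGCGGGAGCTTC",
    "ACGTACGTACGT",
]
efficiency = hdr_efficiency(reads)
print(efficiency)
```

Because the signature requires both the G > A substitution and the PAM mutations, reads carrying only one of the two changes are not counted as HDR events under this rule.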

#### Statistical Analysis

For the histograms except **Figure 4H**, the data were collected from three independent experiments and were analyzed by an unpaired two-tailed t-test. Differences were considered statistically significant (<sup>∗</sup>) when P < 0.05. Error bars represent the standard error.
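For illustration, the standard error and the t statistic of an unpaired comparison can be computed with the Python standard library as sketched below. The text does not specify whether equal variances were assumed; this sketch assumes the pooled-variance (Student) form, and the replicate values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standard_error(values) -> float:
    """Standard error of the mean (sample standard deviation / sqrt(n))."""
    return sqrt(variance(values) / len(values))


def student_t(group_a, group_b):
    """Unpaired Student t statistic with pooled variance (an assumption).

    Returns (t, degrees_of_freedom).
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical triplicate measurements for two groups.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
t_stat, dof = student_t(group_a, group_b)
print(standard_error(group_a), t_stat, dof)
```

Converting the t statistic into a two-tailed P value requires the t distribution with the returned degrees of freedom, which is typically taken from a statistics package.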

### RESULTS

#### StCas9 Showed Similar Activity With SpCas9

We developed the CRISPR/StCas9 system and the SSA-based DsRed-eGFP (RG) and eGFP surrogate reporters in our previous study (Xu et al., 2015a). Here, our codon-humanized StCas9 was first compared with SpCas9 using the dual-fluorescent RG surrogate reporter (**Figure 1A**). The DsRed gene was used as the transfection marker, and the interrupted eGFP gene was used as the reporter, which was designed to be repaired accurately by SSA when targeted by the sgRNA/Cas9 complex (Xu et al., 2015a). The HEK293T cells were observed under a fluorescence microscope 2 days after the transfection. Robust red fluorescence and obvious green fluorescence were observed in the cells from both the SpCas9 and StCas9 experiment groups (**Figure 1B**). The cells were then harvested, and the DsRed<sup>+</sup> single-positive and DsRed<sup>+</sup>eGFP<sup>+</sup> dual-positive cells were counted by flow cytometric analysis (**Figure 1C**). The percentage of DsRed<sup>+</sup>eGFP<sup>+</sup> cells was calculated to evaluate the Cas9 activity, and our StCas9 demonstrated an activity (20.85%) comparable to that of the prevalently used SpCas9 (17.65%) (**Figure 1D**, P = 0.221).

#### shRNA Verification for Porcine LIG4 Gene Interference

Since the transfection efficiency of PK15 cells is limited, we constructed the upgraded shRNA vectors (**Figure 2A**), which contained the Puro<sup>R</sup> gene for puromycin selection of the transfected cells, as well as the fluorescent eGFP gene for visualization. Representative pictures of the untransfected and transfected cells are shown in **Figure 2B**, which demonstrate that the PK15 cells transfected with the shRNA vector were enriched significantly after the puromycin selection. The results of the qRT-PCR analysis further confirmed that the relative expression of the LIG4 gene was reduced by about 60% by two of the shRNAs (Sh-1/3, **Figure 2C**, <sup>∗</sup>P < 0.05 compared with SC).

### Functional Assay of the sgRNA-shRNA Structure

The sgRNA-shRNA structure was constructed with two identical sgRNAs targeting the IGF2 gene and one shRNA against the LIG4 gene (**Figure 3A**), as previously described (Yan et al., 2016). The adjacent sgRNA and shRNA were linked by the optimized Drosha cutting sequences (**Supplementary Figure S3**).

Figure 3 caption (fragment): [...] used as an indirect measurement for the IGF2.sgRNA activity. (E) The relative expression of LIG4 gene down-regulated by different shRNAs or structures (n = 3, <sup>∗</sup>P < 0.05 compared with SC). SC, the non-specific shRNA control; Sg-SC, sgRNA-shRNA structure with non-specific shRNA control; Sg-Sh, sgRNA-shRNA structure with LIG4.shRNA-1; Sh-1, LIG4.shRNA-1.

The surrogate reporter assay was then conducted using the single-fluorescent IGF2.eGFP reporter (**Figure 3B**) to verify the sgRNA activity. The linearized IGF2.eGFP surrogate reporter, which was expected to be repaired spontaneously in cells after the transfection, was co-transfected with the single StCas9 expression vector as the positive control. Representative pictures of fluorescent cells and the flow cytometric counting results for the different experiment groups are shown in **Figure 3C**. The percentage of eGFP<sup>+</sup> cells was used as an indirect measurement of the IGF2.sgRNA activity. As shown in **Figure 3D**, the IGF2.sgRNA driven by the sgRNA-shRNA/StCas9 all-in-one expression vector (38.37%) demonstrated activity similar to that driven by the IGF2.sgRNA/StCas9 vector (32.33%, P = 0.265). To further verify the LIG4.shRNA activity driven by the sgRNA-shRNA structure, the qRT-PCR analysis was performed with both the SC (non-specific shRNA control) and Sg-SC (sgRNA-shRNA with non-specific shRNA) negative controls. The results suggested that the relative expression of the LIG4 gene was reduced by about 50% by the sgRNA-shRNA structure with LIG4.shRNA (Sg-Sh, **Figure 3E**, <sup>∗</sup>P < 0.05 compared with SC).

#### Efficient IGF2 Gene Editing by sgRNA-shRNA/StCas9

The IGF2.RPG surrogate reporter was constructed and used for the selection of genetically modified positive cells as we previously reported (Ren et al., 2015; Shao et al., 2017). PK15 cells were co-transfected with the sgRNA-shRNA/StCas9 or the IGF2.sgRNA/StCas9 expression vector and the ssODN HDR donor (**Figure 4A**), along with the IGF2.RPG surrogate reporter (**Figure 4B**). After the puromycin selection (**Figure 4C**), the resistant positive cell clones (**Figure 4D**) were pooled and the genomic DNA was extracted for the subsequent RFLP and Sanger sequencing analyses. For the RFLP assay, the PCR products (1319 bp) of the IGF2 locus would be cut into two fragments (1081 and 238 bp) by NheI when the IGF2 gene was edited successfully as designed (**Figures 4A,E**). The editing events were first confirmed by the results of both the RFLP assay (**Figure 4E**) and Sanger sequencing (**Figure 4F**). The Sanger sequencing results of the "T-A clones" for the IGF2 target locus (**Figure 4G**) further demonstrated 9.3% (4/43) HDR-based and 81.4% (35/43) NHEJ-based repair efficiencies driven by sgRNA-shRNA/StCas9, while the HDR-based and NHEJ-based repair efficiencies for the IGF2.sgRNA/StCas9 group were 8.6% (3/35) and 80.0% (28/35), respectively (**Figure 4H**). Because only a limited number of clones were analyzed by Sanger sequencing, which may not reveal the real difference between the two experiment groups, we further conducted the deep-sequencing analysis, and the results demonstrated a significantly higher HDR-based editing efficiency (16.38%) for the sgRNA-shRNA/StCas9 group (**Figure 4I**, <sup>∗</sup>P < 0.05).
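The clone-based repair efficiencies quoted above follow directly from the clone counts given in the text, as the short Python check below shows. Only the counts come from the article; the variable and function names are illustrative.

```python
# Clone counts as reported in the text (Sanger sequencing of T-A clones).
clone_counts = {
    "sgRNA-shRNA/StCas9": {"HDR": 4, "NHEJ": 35, "total": 43},
    "IGF2.sgRNA/StCas9": {"HDR": 3, "NHEJ": 28, "total": 35},
}


def percentage(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100.0 * count / total, 1)


efficiencies = {
    group: {
        "HDR": percentage(counts["HDR"], counts["total"]),
        "NHEJ": percentage(counts["NHEJ"], counts["total"]),
    }
    for group, counts in clone_counts.items()
}

for group, result in efficiencies.items():
    print(group, result)
```

With 35 to 43 clones per group, a single clone shifts the estimate by more than two percentage points, which is why the clone-based comparison cannot resolve a small difference between the groups.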

### DISCUSSION

HDR-based genome editing holds great promise for the development of safe and highly precise approaches for gene therapy and animal breeding research (Steyer et al., 2018). In recent years, massive efforts have been made to enhance the CRISPR/Cas9-mediated HDR efficiency. Here, we combined the StCas9 and the novel sgRNA-shRNA structure to boost the HDR-based "SNP editing" of the porcine IGF2 gene.

We developed the S. thermophilus-derived CRISPR/StCas9 system for eukaryotic genome editing in our previous study (Xu et al., 2015a). Although the CRISPR/StCas9 system can share the same NGG PAM with the S. pyogenes-derived CRISPR/SpCas9 system, it requires a stricter NGGNG PAM pattern for full activity (Xu et al., 2015a), which may help to reduce the off-target effect compared with the CRISPR/SpCas9 system (Muller et al., 2016). We found that the key elements of our optimized sgRNA scaffold (Xu et al., 2015a) were almost the same as those of the CRISPR/SpCas9 system (Mali et al., 2013). A series of our subsequent applications also suggested that the two systems may share the same sgRNA structure. However, our StCas9 had not yet been compared directly with the prevalently used SpCas9. To compare the targeting activities of the two Cas9 variants, we used a uniform and further modified sgRNA scaffold (**Supplementary Figure S1**), and the surrogate reporter assay was conducted using our established SSA-based RG surrogate reporter. Our StCas9 demonstrated activity comparable to that of SpCas9 (**Figure 1D**).
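To illustrate why a stricter PAM narrows the set of candidate sites, the following Python sketch counts forward-strand positions at which a 20-nt protospacer is followed by an NGG or an NGGNG motif. The 20-nt protospacer length, the restriction to the forward strand, and the toy sequence are assumptions made for this example only.

```python
import re

SPCAS9_PAM = "[ACGT]GG"          # NGG
STCAS9_PAM = "[ACGT]GG[ACGT]G"   # NGGNG (stricter pattern for full activity)


def count_target_sites(seq: str, pam_regex: str, protospacer_len: int = 20) -> int:
    """Count forward-strand positions where a protospacer precedes the PAM.

    A lookahead is used so that overlapping candidate sites are all counted.
    """
    pattern = re.compile(rf"(?=[ACGT]{{{protospacer_len}}}{pam_regex})")
    return sum(1 for _ in pattern.finditer(seq.upper()))


# Toy sequence (hypothetical): two NGG sites, only one of which is NGGNG.
demo = "A" * 20 + "TGGAG" + "A" * 20 + "CGGTT"

ngg_sites = count_target_sites(demo, SPCAS9_PAM)
nggng_sites = count_target_sites(demo, STCAS9_PAM)
print(ngg_sites, nggng_sites)
```

Every NGGNG site is also an NGG site, so the second count can never exceed the first. The same logic applies to off-target candidates across a genome.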

We initially developed the novel Drosha-mediated sgRNA-shRNA structure mainly for multiplex genome targeting (Yan et al., 2016). However, by taking advantage of the combined sgRNA-shRNA structure, simultaneous genome targeting by sgRNA/Cas9 and transient gene silencing by shRNA can be achieved. Interestingly, we noticed that the by-product shRNA could be used to interfere with the LIG4 gene and thereby enhance HDR-based genome editing. During the multiplex genome targeting assays, we found that the sgRNAs driven by the sgRNA1-shRNA-sgRNA2 structure showed lower activity than the independent sgRNA controls driven by routine sgRNA/Cas9 expression vectors. Hence, we used the sgRNA-LIG4.shRNA-sgRNA structure with identical sgRNAs flanking the LIG4.shRNA to guarantee the sgRNA activity for the enhanced HDR-based genome editing (Yan et al., 2016). In this study, we used the same strategy and further applied a pair of more efficient Drosha-processing sequences (Zeng and Cullen, 2005) for the processing of the short RNAs. The surrogate reporter and qRT-PCR assays demonstrated effective sgRNA and shRNA activities driven by sgRNA-shRNA/StCas9 (**Figures 3D,E**). It should also be noted that, because we have already compared the sgRNA-shRNA structures carrying LIG4.shRNA and non-specific shRNA for enhancing HDR-based genome editing, this study was designed mainly to address the practical question of whether our sgRNA-shRNA strategy (with LIG4.shRNA) is better than the routine single-sgRNA strategy.

It has been reported that numerous human genetic diseases are caused by single-nucleotide mutations, such as the 878 G > A (AVPR2 W293X) mutation in X-linked nephrogenic diabetes insipidus and the 1517 G > A (FANCC W506X) mutation in Fanconi anemia (Cox et al., 2017). Moreover, many SNP sites have been found in livestock, and most of them are related to animal diseases or production traits, such as the 3072 G > A site in the porcine IGF2 gene addressed in this study. Thus, SNP editing offers a promising approach for the production of base-edited animals, as well as for gene therapy aimed at the base correction of some genetic diseases.

In view of the unavoidable concern about off-target effects during genome editing, our sgRNA-shRNA/StCas9 system still has room for improvement. It has been reported that paired Cas9 nickases (Cas9n) can be used for efficient genome editing with significantly decreased off-target events (Ran et al., 2013). Hence, an improved sgRNA-shRNA/StCas9n system may be worth developing in a further study.

#### AUTHOR CONTRIBUTIONS

KX and ZZ conceived the research plans. YS, NY, LM, BS, JD, and YF performed the experiments. SS, QY, and FH provided related plasmids. YS, NY, and KX wrote the article. KX provided financial support.

#### FUNDING

This work was supported by grants from the Shaanxi Natural Science Foundation of China (2017JQ3007), the China Postdoctoral Science Foundation (2018T111111 and 2015M580887), the National Natural Science Foundation of China (NSFC, 31702099) and the National Transgenic Major Project of China (2018ZX08010-09B).

### ACKNOWLEDGMENTS

The authors would like to thank Associate Professor Zehui Wei for his help for the statistical analysis of the data.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00347/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Sun, Yan, Mu, Sun, Deng, Fang, Shao, Yan, Han, Zhang and Xu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Increasing Cytosine Base Editing Scope and Efficiency With Engineered Cas9-PmCDA1 Fusions and the Modified sgRNA in Rice

Ying Wu, Wen Xu, Feipeng Wang, Si Zhao, Feng Feng, Jinling Song, Chengwei Zhang\* and Jinxiao Yang\*

Beijing Key Laboratory of Maize DNA Fingerprinting and Molecular Breeding, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China

#### Edited by:

Kun Xu, Northwest A&F University, China

#### Reviewed by:

Sachin Rustgi, Clemson University, United States Cem Kuscu, The University of Tennessee Health Science Center (UTHSC), United States

#### \*Correspondence:

Chengwei Zhang zhangchengwei2017@126.com Jinxiao Yang yangjinxiao@maizedna.org

#### Specialty section:

This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics

Received: 16 November 2018 Accepted: 09 April 2019 Published: 26 April 2019

#### Citation:

Wu Y, Xu W, Wang F, Zhao S, Feng F, Song J, Zhang C and Yang J (2019) Increasing Cytosine Base Editing Scope and Efficiency With Engineered Cas9-PmCDA1 Fusions and the Modified sgRNA in Rice. Front. Genet. 10:379. doi: 10.3389/fgene.2019.00379

Base editors that do not require double-stranded DNA cleavage or homology-directed repair enable higher efficiency and cleaner substitution of targeted single nucleotides in genomic DNA than conventional approaches. However, their broad applications are limited within the editing window of several base pairs from the canonical NGG protospacer adjacent motif (PAM) sequence. In this study, we fused the D10A nickase of several Streptococcus pyogenes Cas9 (SpCas9) variants with Petromyzon marinus cytidine deaminase 1 (PmCDA1) and uracil DNA glycosylase inhibitor (UGI) and developed two new effective PmCDA1-based cytosine base editors (pBEs), SpCas9 nickase (SpCas9n)-pBE and VQR nickase (VQRn)-pBE, which expanded the scope of genome targeting for cytosine-to-thymine (C-to-T) substitutions in rice. Four of six and 12 of 18 target sites selected randomly in SpCas9n-pBE and VQRn-pBE, respectively, were base edited with frequencies of 4–90% in T<sup>0</sup> plants. The effective deaminase window typically spanned positions 1–7 within the protospacer and the single target C showed the maximum C-to-T frequency at or near position 3, counting the end distal to PAM as position 1. In addition, the modified single guide RNA (sgRNA) improved the base editing efficiencies of VQRn-pBE with 1.3- to 7.6-fold increases compared with the native sgRNA, and targets that could not be mutated using the native sgRNA were edited successfully using the modified sgRNA. These newly developed base editors can be used to realize C-to-T substitutions and may become powerful tools for both basic scientific research and crop breeding in rice.

#### Keywords: base editing, cytosine base editor, SpCas9, VQR, the modified sgRNA, rice

**Abbreviations:** Cas9, CRISPR-associated protein 9; CBE, cytosine base editor; CRISPR, clustered regularly interspaced short palindromic repeats; C-to-T, cytosine-to-thymine; EQR, engineered SpCas9 variant with residue substitutions D1135E/R1335Q/T1337R; Indel, insertion and deletion; PAM, protospacer adjacent motif; pBE, PmCDA1-based cytosine base editor; PmCDA1, Petromyzon marinus cytidine deaminase 1; rAPOBEC1, rat cytidine deaminase APOBEC1; SaCas9, Staphylococcus aureus Cas9; SaCas9n, SaCas9 nickase; SaKKH, engineered SaCas9 variant with residue substitutions E782K/N968K/R1015H; SaKKHn, SaKKH nickase; sgRNA, single guide RNA; SpCas9, Streptococcus pyogenes Cas9; SpCas9n, SpCas9 nickase; SpCas9-NG, engineered SpCas9 variant with residue substitutions R1335V/L1111R/D1135V/G1218R/E1219F/A1322R/T1337R; SpCas9n-NG, SpCas9-NG nickase; tRNA, transfer RNA; UGI, uracil DNA glycosylase inhibitor; VQR, engineered SpCas9 variant with residue substitutions D1135V/R1335Q/T1337R; VQRn, VQR nickase; VRER, engineered SpCas9 variant with residue substitutions D1135V/G1218R/R1335E/T1337R; VRERn, VRER nickase; xCas9, engineered SpCas9 variant with residue substitutions E480K/E543D/E1219V.

## INTRODUCTION

Genome-wide association studies have shown that point mutations create elite trait variations in crop plants, and point mutagenesis is one of the main strategies for crop improvement (Henikoff and Comai, 2003; Zhao et al., 2011; Yin et al., 2017). The discovery and development of the CRISPR–Cas9 system (Doudna and Charpentier, 2014; Hsu et al., 2014; Shalem et al., 2015; Wang et al., 2016; Komor et al., 2017a) has provided a powerful genome engineering tool for generating point mutations in plants through precise irreversible base conversion (base editing) without the need for double-stranded DNA backbone cleavages or donor DNA templates (Komor et al., 2016; Nishida et al., 2016). Base editing is much cleaner and more efficient than current methods used in plants [e.g., targeting induced local lesions in genomes (TILLING) and conventional nuclease-mediated, homology-directed repair (HDR)-dependent genome editing] (Henikoff et al., 2004; Slade et al., 2005; Hess et al., 2017; Yang et al., 2017; Kim, 2018).

The first reported CBEs that mediate C-to-T conversion were developed in a wide variety of organisms by fusion of Cas9n with the rat cytidine deaminase rAPOBEC1 or the activation-induced cytidine deaminase ortholog PmCDA1 (Lu and Zhu, 2017; Li et al., 2017; Ren et al., 2017; Shimatani et al., 2017; Zong et al., 2017). Although highly efficient and useful, these CBEs were restricted to editing sites that contained NGG PAM sequences because of the common SpCas9n that was used (Anders et al., 2014; Nishimasu et al., 2014). This characteristic limited the base editing to a narrow window of several base pairs from the PAM distal region. To circumvent this limitation, several studies have reported new CBEs that use SpCas9 variants or Cas9 homologs that recognize expanded or altered PAMs to increase the targets suitable for base editing. In human cells, several engineered SpCas9 variants that accept NGA (VQR), NGCG (VRER), NGAG (EQR), or NG (xCas9 and SpCas9-NG) PAM sequences have been employed with rAPOBEC1 or activation-induced cytidine deaminase to generate new CBEs (Kim Y.B. et al., 2017; Hu et al., 2018; Nishimasu et al., 2018). In addition, SaCas9, which recognizes the NNGRRT PAM, and its engineered variant SaKKH, which recognizes the NNNRRT PAM sequence, have also been used to create base editors that expand the editing capability of CBEs (Kim Y.B. et al., 2017).
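The PAM specificities listed above can be expressed as IUPAC patterns and checked programmatically. The Python sketch below, offered only as an illustration, translates each PAM into a regular expression and reports which variants would accept the bases immediately 3' of a protospacer. The dictionary simply restates the PAMs named in the text; the function names are hypothetical, and real PAM preferences are graded rather than all-or-nothing.

```python
import re

# IUPAC codes needed for the PAMs mentioned in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}

# PAMs as listed in the text.
PAMS = {
    "SpCas9": "NGG",
    "VQR": "NGA",
    "VRER": "NGCG",
    "EQR": "NGAG",
    "xCas9/SpCas9-NG": "NG",
    "SaCas9": "NNGRRT",
    "SaKKH": "NNNRRT",
}


def pam_to_regex(pam: str) -> re.Pattern:
    """Translate an IUPAC PAM string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in pam))


def compatible_variants(downstream: str) -> list[str]:
    """Variants whose PAM matches the start of `downstream`.

    `downstream` holds the bases immediately 3' of the protospacer.
    """
    downstream = downstream.upper()
    return [name for name, pam in PAMS.items()
            if pam_to_regex(pam).match(downstream)]


print(compatible_variants("TGAGTT"))
print(compatible_variants("AGG"))
```

A binary match of this kind is a simplification, but it makes clear how each additional variant opens target sites that an NGG-only editor cannot reach.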

Most of the Cas9s described above have been used to create new CBEs for plants; the exceptions are VRER and EQR (Hua et al., 2018; Qin et al., 2018; Endo et al., 2019). In addition, wild type SpCas9 was used to broaden the base editing targets to non-canonical NAG PAMs in plants (Hua et al., 2018). Among them, SpCas9n-NG, SaCas9n, and SaKKHn CBEs, both rAPOBEC1-based and activation-induced cytidine deaminase-based or PmCDA1-based, were successfully developed in rice (Qin et al., 2018; Endo et al., 2019). However, only rAPOBEC1-based CBEs were created with SpCas9n and VQRn, and only one editable target site was reported for each CBE (Hua et al., 2018). In this study, to better utilize SpCas9 and VQR to enlarge the base editing scope in rice, we developed two new effective PmCDA1-based CBEs (pBEs), SpCas9n-pBE and VQRn-pBE. These two pBEs substantially broaden the target sites from those with NGG PAMs to those with NAG and NGA PAMs. Additionally, the editing efficiency of VQRn-pBE was further increased using the modified sgRNA.

#### MATERIALS AND METHODS

#### Plasmid Construction

We modified the pCambia2300 plasmid to construct a vector called 2300-Spe. A schematic illustration of the 2300-Spe vector construction is given in **Supplementary Figure S1**. Four fragments were digested at each end by restriction endonucleases to construct the SpCas9n-pBE-basic vector (**Supplementary Figure S2**). The four digested fragments, together with the KpnI- and SbfI-digested 2300-Spe backbone, were then ligated using T4 ligase (NEB, Cat# M0202L) to generate SpCas9n-pBE-basic. Based on the SpCas9n-pBE-basic vector, specific point mutations described by Kleinstiver et al. (2015) were introduced into SpCas9n (D10A) using a Fast MultiSite Mutagenesis System (TransGen Biotech, Beijing, China) to generate the VQRn-pBE-basic and VRERn-pBE-basic vectors. Target sequences were cloned before the sgRNA using BsaI according to Xie et al. (2015) to generate the pBE constructs. The modified sgRNA linked with tRNA and the Oryza sativa U3 (OsU3) terminator was synthesized, digested with BamHI and HindIII, and used to replace the native sgRNA in the SpCas9n-pBE and VQRn-pBE constructs to obtain the corresponding pBEs with the modified sgRNA. Target sites in the same constructs are shown in **Supplementary Table S1**. The primers used in this study are listed in **Supplementary Table S2**.

#### Rice Transformation

The wild type Agrobacterium tumefaciens strain LBA4404 (Weidi Biotech, Shanghai, China) was transformed with the resultant pBE constructs using a freeze/thaw method. Embryogenic calli induced from mature seeds of rice variety Nipponbare (O. sativa L. japonica cv. Nipponbare) were used for the transformation, which was conducted as previously described (Hiei and Komari, 2008). After incubation with Agrobacterium for 10 min, the calli were recovered for 3 days and selected on 50 µg/ml hygromycin for 4 weeks to obtain resistant calli. Then, the resistant calli were transferred to regeneration medium (not containing hygromycin) to induce shoot regeneration for 1 month. When the shoots were 4–5 cm long, they were transferred to rooting medium for root induction for about 2 weeks to obtain T<sup>0</sup> plants.

### DNA Extraction and Identification of Transgenic Resistant Calli and T<sup>0</sup> Plants

Resistant calli and T<sup>0</sup> plants were harvested for genomic DNA extraction using a DNA-quick Plant System kit (Tiangen Biotech, Beijing, China). The target locus was amplified by PCR with Cas9 specific primers (**Supplementary Table S2**) and samples with a 1150-bp nucleic acid band in agarose gel electrophoresis were identified as transgenic resistant calli or T<sup>0</sup> plants.

#### Mutant Identification

Several transgenic resistant calli and T<sup>0</sup> plants in a single experiment were used to detect C-to-T conversions and indels. Target loci were amplified by specific primers and the PCR products were purified using an EasyPure PCR Purification Kit (TransGen Biotech). The PCR products were sent for Sanger sequencing (Tsingke Biological Technology, Beijing, China) to detect mutations. C-to-T frequency in calli or T<sup>0</sup> plants was defined as the percentage of mutants with any target C-to-T substitution among all the transgenic samples. Indel frequency was defined as the percentage of mutants with any indels among the resulting C-to-T mutants. Single C-to-T frequency was defined as the percentage of mutants with C-to-T substitution at a specific single position among all the transgenic samples. Homozygous mutants were designated when all the mutations were homozygous. Frequency of mutant genotype was defined as the percentage of mutants with the same genotype among all the mutants.
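The frequency definitions above translate directly into code. In the following Python sketch, each transgenic sample is represented by the set of target C positions converted to T and an indel flag. The `Sample` class, the function names and the example samples are hypothetical and serve only to make the denominators explicit.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One transgenic callus or T0 plant (hypothetical representation)."""
    c_to_t_positions: set[int] = field(default_factory=set)  # edited C positions
    has_indel: bool = False


def c_to_t_frequency(samples) -> float:
    """% of samples with any target C-to-T substitution, among all samples."""
    mutants = [s for s in samples if s.c_to_t_positions]
    return 100.0 * len(mutants) / len(samples)


def indel_frequency(samples) -> float:
    """% of samples with any indel, among the C-to-T mutants."""
    mutants = [s for s in samples if s.c_to_t_positions]
    if not mutants:
        return 0.0
    return 100.0 * sum(s.has_indel for s in mutants) / len(mutants)


def single_c_to_t_frequency(samples, position: int) -> float:
    """% of samples with C-to-T at one specific position, among all samples."""
    hits = sum(position in s.c_to_t_positions for s in samples)
    return 100.0 * hits / len(samples)


# Hypothetical samples: two edited (one also carrying an indel), two unedited.
samples = [Sample({3}), Sample({3, 5}, has_indel=True), Sample(), Sample()]
print(c_to_t_frequency(samples), indel_frequency(samples),
      single_c_to_t_frequency(samples, 3), single_c_to_t_frequency(samples, 5))
```

Note that the indel frequency uses the C-to-T mutants as its denominator, whereas the two C-to-T frequencies use all transgenic samples.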

#### Detection of Off-Target Mutations

Five to eight single T<sup>0</sup> plants, including base mutated lines and wild type lines, were selected for each off-target site detection. Potential off-target sites were searched on Cas-OFFinder (Bae et al., 2014) and amplified using the primers listed in **Supplementary Table S2**. The PCR products were purified using an EasyPure PCR Purification Kit (TransGen Biotech) and sent for Sanger sequencing (Tsingke Biological Technology) to detect off-target mutations.

### RESULTS

#### SpCas9n-pBE Enables Base Editing at NAG PAM Target Sites in Rice

Previous studies revealed that the most widely used wild type SpCas9 enables efficient genome editing at target sites bearing both the canonical NGG PAM and the non-canonical NAG PAM in rice (Meng et al., 2018). Because the combination of PmCDA1 with SpCas9n leads to C-to-T substitutions at targets with the NGG PAM (Shimatani et al., 2017), we hypothesized that a SpCas9n base editor also could function at targets with the NAG PAM. We fused the D10A nickase of SpCas9 with PmCDA1 and UGI. The fusion protein was driven by the O. sativa ubiquitin (OsUbq) promoter and the corresponding cassette was introduced into our tRNA–sgRNA editing system to generate a pBE designated as SpCas9n-pBE (**Figure 1A**).

We first used a resistant rice calli system to determine the feasibility of C-to-T base editing using SpCas9n-pBE. Six targets with NAG PAMs from the OsWaxy gene, which encodes an enzyme essential in the biosynthesis of granule-bound starch, were selected (**Supplementary Table S3**). In three of the six targets, C-to-T base editing was detected with frequencies of 13.3–60% (**Figure 1B**). No indels were detected at any of the six on-target loci (**Supplementary Table S3**). Moreover, by analyzing the C-to-T frequency at each single C in the three edited targets, we found that the editing window spanned bases at positions 1 to 6 upstream of the PAM sequence (**Figure 1C**).
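To make the position numbering concrete, the Python sketch below lists the cytosines of a protospacer and flags those that fall inside an editing window. It assumes the convention stated in the abstract of this article, in which the PAM-distal end of the protospacer is position 1, and uses the window of positions 1–7 given there as the default. The protospacer sequence is hypothetical.

```python
def cytosine_positions(protospacer: str) -> list[int]:
    """1-based positions of C, counting the PAM-distal (5') end as position 1."""
    return [pos for pos, base in enumerate(protospacer.upper(), start=1)
            if base == "C"]


def cytosines_in_window(protospacer: str, window: tuple[int, int] = (1, 7)) -> list[int]:
    """Positions of C that fall inside the inclusive editing window."""
    low, high = window
    return [pos for pos in cytosine_positions(protospacer) if low <= pos <= high]


# Hypothetical 20-nt protospacer, written 5' to 3' with the PAM on its 3' side.
protospacer = "ACCTGACGTTAGCATGGATC"

print(cytosine_positions(protospacer))           # [2, 3, 7, 13, 20]
print(cytosines_in_window(protospacer))          # [2, 3, 7]
print(cytosines_in_window(protospacer, (1, 6)))  # [2, 3]
```

Passing `(1, 6)` as the window reproduces the narrower range observed in the resistant calli.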

To further assess the use of SpCas9n-pBE in rice plants, the resistant calli were transferred to a regeneration culture to generate stable transgenic T<sub>0</sub> plants. In T<sub>0</sub> plants, four of the six target sites had C-to-T substitutions with frequencies of 7.7–53.8% (**Figure 1B**). Three of the target sites (W-T1, W-T3, and W-T6) were edited in both T<sub>0</sub> plants and calli, whereas W-T2 was edited only in T<sub>0</sub> plants, with a frequency of 7.7% (**Figure 1B**). Among the four edited sites, indels were detected only at the W-T1 site in T<sub>0</sub> plants (**Supplementary Table S4**). Except for the edited positions 13 and 12 in targets W-T3 and W-T6, respectively, the deamination window in T<sub>0</sub> plants was consistent with that in the resistant calli (**Figures 1C,D**). Single- and double-base substitutions were predominant in the edited targets in T<sub>0</sub> plants. Triple- or quadruple-base substitutions were also obtained for targets W-T6 and W-T1 (**Supplementary Table S4**). Furthermore, SpCas9n-pBE could be used for multiplex genome editing because two or three target sites were edited simultaneously in the same T<sub>0</sub> plant line (**Supplementary Table S5**). Taken together, our results indicated that SpCas9n-pBE could broaden PAM recognition from NGG to NAG in rice.

#### VQRn-pBE Enables Base Editing at NGA PAM Target Sites in Rice

Two SpCas9 variants with altered PAM specificities, VRER (for NGCG PAMs) and VQR (for NGA PAMs), were reported to enable efficient genome editing of endogenous genes in zebrafish, human cells, and rice (Kleinstiver et al., 2015; Hu et al., 2016). To further enlarge the scope of C-to-T base editing in rice, we engineered these two SpCas9 variants and individually fused them with PmCDA1 and UGI to generate VRERn-pBE and VQRn-pBE (**Figure 2A**).

Eleven targets with NGCG PAMs were selected for VRERn-pBE editing, but the sequencing results showed that none of them had C-to-T mutations in the resistant calli (**Supplementary Table S6**), implying that VRERn-pBE had poor base editing activity in rice.

Because VQR mediated knockout mutations with a preference for NGAG > NGAT = NGAA in human cells (Kleinstiver et al., 2015), we tested the editing efficiency of VQRn-pBE for targets with NGAG PAMs. Four targets with NGAG PAMs from the OsWaxy gene were selected (**Supplementary Table S7**), and three of them were mutated successfully with frequencies of 5.7–77.1% in resistant calli and 10–90% in T<sub>0</sub> plants (**Figure 2B**). The deamination windows spanned positions 1 to 5 of the protospacers, counting from the 5′ end of the target (**Figure 2C**). Indels were detected in two targets with relatively high substitution frequencies, and the corresponding indel frequencies were higher in T<sub>0</sub> plants than in calli (**Figure 2D**). Single or double C conversions were the most common genotypes (**Figure 2E**). Among all the T<sub>0</sub> base-edited mutants, one homozygous mutant for W-T8 and three for W-T9 were obtained (**Figure 2E**). These results indicate that VQRn-pBE could be used to enlarge the scope of base editing in rice.

We also examined the off-target effects of VQRn-pBE. Potential off-target sites that contained three to five mismatches with targets W-T7, W-T8, and W-T9 were chosen for the analysis (**Supplementary Table S8**). All these sites were sequenced in T<sub>0</sub> plants, and no mutations were detected at any of the selected off-target sites (data not shown).

To avoid restriction to a single gene and to confirm the capability of VQRn-pBE for base editing in rice, we used the O. sativa acetolactate synthase gene (OsALS), which encodes an essential enzyme in the biosynthesis of branched-chain amino acids. Six target sites with NGAG PAMs were selected (**Supplementary Table S9**). Among all the regenerated T<sub>0</sub> events, five of the six sites were base edited with frequencies of 10–80%; the exception was the ALS-T4 site, which was not edited (**Table 1**). Indels were identified only in two targets with relatively high editing efficiencies (80 and 70.8%) (**Table 1**), similar to the results for the OsWaxy gene target sites. The effective deamination window typically spanned positions 2–7 within the protospacer, and the frequency of single C-to-T conversion was highest at or near position 3 (**Figure 3A**). Single C-to-T mutants were detected in all edited sites, and double C-to-T mutants were more frequent than triple or quadruple mutants (**Table 1**). Additionally, three mutant lines containing homozygous substitutions were obtained at the ALS-T3 (C3 > T3 and C3C7 > T3T7) and ALS-T6 (C3 > T3 and C2C3 > T2T3) sites (**Figures 3B,C** and **Table 1**).

FIGURE 2 | [...] (B) Frequencies of mutations induced by VQRn-pBE at NGAG PAM target sites in resistant calli and T<sub>0</sub> plants. (C) Frequencies of targeted single C-to-T substitutions in the targets edited by VQRn-pBE at NGAG PAM sites in resistant calli and T<sub>0</sub> plants. (D) Indel frequencies in the targets edited by VQRn-pBE at NGAG PAM target sites in resistant calli and T<sub>0</sub> plants. (E) Mutations induced by VQRn-pBE at NGAG PAM sites in T<sub>0</sub> plants.

Because VQRn-pBE mediated efficient base editing at sites that contained NGAG PAMs in rice, we tested whether it also worked on targets with NGAT, NGAC, or NGAA PAMs. We selected 12 endogenous genomic target sites in the OsWaxy gene (**Supplementary Table S10**). In T<sub>0</sub> plants, VQRn-pBE edited more target sites with the NGAT PAM than with the NGAC PAM, although the editing efficiencies were much lower at the NGAT sites (**Table 2**). Similar to the results for the NGAG PAM target sites, VQRn-pBE produced indels only at the NGAC PAM W-T15 site, which had a relatively high base editing frequency (41.2% in T<sub>0</sub> plants) (**Supplementary Table S11**). Single and double C-to-T mutants accounted for most of the detected mutants (**Supplementary Table S11**). VQRn-pBE produced no C-to-T editing at the four sites with the NGAA PAM in either calli or T<sub>0</sub> plants (data not shown).

#### The Modified sgRNA Increases the Base Editing Efficiency of VQRn-pBE

Several studies have reported that modified sgRNAs with a mutation in the streak of T and an extended duplex increased editing efficiency in mammalian cells and rice (Chen et al., 2013; Dang et al., 2015; Hu et al., 2017). To try to enhance the C-to-T substitution frequency, we modified the sgRNA as described previously in rice (Hu et al., 2017). We replaced the fourth T in the streak of T with C and extended the duplex by 5 bp (**Supplementary Figure S3**). We then tested all 22 target sites in the OsWaxy gene using SpCas9n-pBE and VQRn-pBE with the modified sgRNA (**Supplementary Tables S3**, **S7**, **S10**). With SpCas9n-pBE, the modified sgRNAs showed equal or slightly higher editing frequencies (1.2- and 1.5-fold) than the native sgRNAs at the W-T2, W-T3, and W-T6 sites, an increase in efficiency from 0 to 6.3% at the W-T4 site, and a sharp decrease in efficiency (from 33.3% with the native sgRNA to 5.3% with the modified sgRNA) at the W-T1 site (**Figure 4A**). These results indicate that the modified sgRNA had little or no enhancement effect compared with the native sgRNA for SpCas9n-pBE.

TABLE 2 | Frequencies of mutations induced by VQRn-pBE at target sites on the OsWaxy gene with NGAT and NGAC PAMs in rice resistant calli and T<sub>0</sub> plants.

With VQRn-pBE, the modified sgRNA produced no base mutations at any of the four target sites with NGAA PAMs, similar to the results obtained with the native sgRNA (data not shown). However, base editing efficiencies were significantly enhanced by 1.3- to 7.6-fold for target sites with the other three PAMs using VQRn-pBE with the modified sgRNA (**Figure 4A**). Moreover, the modified sgRNA produced C-to-T editing events at the W-T10 and W-T17 sites, where efficiencies rose from 0 to 12.5% and 5.9%, respectively, but gave equal or slightly decreased efficiencies at the W-T9, W-T11, and W-T13 sites compared with the native sgRNA (**Figure 4A**). These results indicate that the modified sgRNA was more effective in promoting the base editing efficiency of VQRn-pBE than that of SpCas9n-pBE.
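A fold change of this kind is the ratio of the editing frequency obtained with the modified sgRNA to that obtained with the native sgRNA. The short Python sketch below, in which the function name is hypothetical, applies that ratio to the W-T1 frequencies reported for SpCas9n-pBE (33.3% and 5.3%) and shows why a site that was unedited with the native sgRNA can only be described by its absolute frequency.

```python
def fold_change(native_pct, modified_pct):
    """Return the modified/native ratio of editing frequencies.

    Returns None when the native frequency is 0, because the ratio is
    undefined for a site that was not edited with the native sgRNA.
    """
    if native_pct == 0:
        return None
    return modified_pct / native_pct

w_t1 = fold_change(33.3, 5.3)    # W-T1, SpCas9n-pBE: sharp decrease
w_t10 = fold_change(0.0, 12.5)   # W-T10, VQRn-pBE: edited only with the modified sgRNA
print(round(w_t1, 2), w_t10)
```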

Interestingly, except for the four targets with NGAG PAMs, no indels were identified in any of the targets with the other three PAMs using the modified sgRNA (**Supplementary Figure S4**). We compared the mutant genotypes of four sites with the highest editing efficiencies using VQRn-pBE with the native or modified sgRNAs. We found that the frequencies of single C-to-T mutations decreased and the frequencies of double or multiple C-to-T mutations increased when the modified sgRNA was used (**Figure 4B**). New genotypes were also produced in some target sites, such as C3C13 > T3T13 at the W-T7 site and C1C2C5 > T1T2T5 at the W-T9 site (**Figures 4B,C**). Moreover, the editing window was enlarged to positions 13 and 10 within the protospacer at the W-T7 and W-T8 sites, respectively, when the modified sgRNA was used (**Figures 4B,C**).

Because the base editing efficiency of VQRn-pBE was increased with the modified sgRNA, we also examined its off-target effects. The potential off-target sites were the same as those analyzed using the native sgRNA (**Supplementary Table S8**). The sequences were amplified from T<sub>0</sub> plants for Sanger sequencing, and no mutations were found at any of the off-target loci tested (data not shown).

### DISCUSSION

Cytosine base editors (CBEs) are powerful new tools for targeted base editing in cells and organisms (Hess et al., 2017; Kim, 2018). The NGG PAM requirement of canonical SpCas9 greatly limits the targeting scope of CBEs. In this study, by fusing SpCas9n and its variants VRERn and VQRn with PmCDA1, we obtained two effective base editors, SpCas9n-pBE and VQRn-pBE. Consistent results from both calli and T<sub>0</sub> plants confirmed the editing ability of both base editors. About 66.7% of the selected target sites were base edited using SpCas9n-pBE (4/6) and VQRn-pBE (12/18), with frequencies of 4–90% in T<sub>0</sub> plants (**Figures 1B**, **2B** and **Tables 1**, **2**). Therefore, the pBEs enlarged the range of cytidine base editing from the NGG PAM to NAG and NGA PAMs in rice.

Although VRERn fused with rAPOBEC1 was active in human cells (Kim Y.B. et al., 2017), no mutations were detected among the selected targets in rice with VRERn-pBE. This may be explained by the different genome environments in plant and human cells and the different deaminase base editing systems used. This result suggests that not all Cas9 variants that perform well in human cells will perform well in rice, and other Cas9 variants need to be tested before they are applied in plants.

FIGURE 4 | [...] with the native or modified sgRNA at four sites with the highest editing efficiencies. No mutations were detected in any cytidine residue located after position 13 in the targets, so the data from positions 14–20 upstream of the PAM sequence are omitted. (C) Sequencing chromatograms of T<sub>0</sub> plants at the W-T7 target of line 9 and the W-T9 target of line 7, with additional C13 and C1 base substitutions obtained using the modified sgRNA. Blue arrows indicate the additional edited bases obtained using the modified sgRNA; black arrows indicate the edited bases obtained using the native sgRNA.

In human cells, VQR showed different cleavage efficiencies at sites that contained NGAN PAMs, as follows: NGAG > NGAT = NGAA > NGAC (Kleinstiver et al., 2015). Therefore, we designed targets with all four NGAN PAMs in rice and tested the base editing activity of VQRn-pBE at the different target sites. Mutations were detected in 80% (8/10) of the target sites with the NGAG PAM, and in 75% (3/4) and 25% (1/4) of the target sites with the NGAT and NGAC PAMs, respectively. No mutations were found in target sites with the NGAA PAM. These results imply that VQRn-pBE may have a strong preference for targets with NGAN PAMs as follows: NGAG > NGAT > NGAC > NGAA. However, the editing efficiencies were very low (4–7.1%) at the three edited targets with an NGAT PAM and much higher (41.2%) at the one edited target with an NGAC PAM. Hence, we cannot conclude that the editing efficiency of VQRn-pBE was higher for targets with the NGAT PAM than for targets with the NGAC PAM. Recently, Hua and coworkers fused VQRn with rAPOBEC1 to achieve C-to-T substitutions in rice and reported 71.4% editing efficiency at one target site with an NGAG PAM (Hua et al., 2018). This is similar to our results with VQRn-pBE, which was more efficient at sites harboring NGAG PAMs.
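The PAM ranking discussed above rests on the fraction of tested sites that were edited for each PAM. As a minimal sketch, the Python snippet below recomputes those percentages from the counts quoted in the text; the variable and function names are chosen for illustration only. It ranks PAMs by the share of edited sites, not by per-site editing efficiency, which is exactly the distinction that prevents a firm conclusion about NGAT versus NGAC.

```python
# (edited sites, tested sites) for each PAM, as reported in the text
SITES_BY_PAM = {
    "NGAG": (8, 10),
    "NGAT": (3, 4),
    "NGAC": (1, 4),
    "NGAA": (0, 4),
}

def edited_percentages(sites_by_pam):
    """Percentage of tested target sites that carried any C-to-T edit."""
    return {pam: 100 * edited / tested
            for pam, (edited, tested) in sites_by_pam.items()}

percentages = edited_percentages(SITES_BY_PAM)
# Rank PAMs by the share of edited sites (highest first)
ranking = sorted(percentages, key=percentages.get, reverse=True)
print(percentages)
print(" > ".join(ranking))
```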

For both SpCas9n-pBE and VQRn-pBE, the editing window typically spanned positions 1 to 7 within the protospacer, and the target C at or near position 3 showed the highest C-to-T substitution frequency. This result differs slightly from that of the reported Cas9n-plant base editor composed of rAPOBEC1, Cas9n, and UGI in rice, in which the deamination window spanned positions 3–9 and the target C at or near position 7 showed the highest editing efficiency (Zong et al., 2017). Moreover, indels seem to be produced at the targets with relatively high editing efficiencies. In the future, introduction of the Gam protein of bacteriophage Mu (Komor et al., 2017b) could be tested as a way to reduce the indel frequency and improve product purity.

Modified sgRNAs were reported to improve knockout activity in mammalian cells and in rice when combined with wild-type SpCas9 or its VQR variant (Chen et al., 2013; Hu et al., 2017). In our study, some targets that were not mutated with the native sgRNA were successfully base edited with the modified sgRNA. Moreover, the enhancement in efficiency with the modified sgRNA was more pronounced for VQRn-pBE than for SpCas9n-pBE. Analysis of the mutant genotypes in targets with high editing frequencies revealed changes that occurred when the modified sgRNA was used: (i) the frequencies of single C-to-T conversions decreased and those of double or multiple C-to-T substitutions increased, and (ii) mutants with novel single C-to-T conversions in different positions or with new multiple C-to-T mutations at various positions were obtained. Together, these results suggest that the modified sgRNA could be used to increase the editing frequency of VQRn-pBE in rice under applicable circumstances.

Other natural or evolved CRISPR nucleases with different PAM requirements that can broaden the editable targets include Lachnospiraceae bacterium Cpf1, Neisseria meningitidis Cas9, Campylobacter jejuni Cas9, and Streptococcus thermophilus Cas9 (Hou et al., 2013; Glemzaite et al., 2015; Kim E. et al., 2017; Tang et al., 2017; Li et al., 2018; Zhong et al., 2018). Recently, a human cytidine deaminase APOBEC3A was used to generate an effective base editor in mammalian cells and plants (Gehrke et al., 2018; Zong et al., 2018). To further expand the scope of base editing in rice, all the above CRISPR nucleases and human APOBEC3A should be tested in the future.

### CONCLUSION

In this study, we described two efficient PmCDA1-based CBE systems, SpCas9n-pBE and VQRn-pBE, that will help to expand the scope of cytosine base editing in rice. The effective deamination window typically spanned positions 1–7 of the protospacer and the target single C showed the highest editing frequency at or near position 3. The mutant genotypes were mainly single or double C-to-T substitutions. Furthermore, the editing efficiency of VQRn-pBE was increased by the modified sgRNA. These base editors will be useful tools for scientific research and crop breeding in rice.

#### AUTHOR CONTRIBUTIONS

JY, CZ, and YW designed the experiments and wrote the manuscript. YW, FW, SZ, FF, and JS performed all the experiments. WX and FW analyzed the results. JY supervised the project. All authors read and approved the final manuscript.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00379/full#supplementary-material


**Conflict of Interest Statement:** The authors submitted a patent application based on the results reported in this paper.

Copyright © 2019 Wu, Xu, Wang, Zhao, Feng, Song, Zhang and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Biological Characteristics of Severe Combined Immunodeficient Mice Produced by CRISPR/Cas9-Mediated Rag2 and IL2rg Mutation

Yong Zhao<sup>1</sup>, Peijuan Liu<sup>1</sup>, Zhiqian Xin<sup>1</sup>, Changhong Shi<sup>1</sup>, Yinlan Bai<sup>2</sup>, Xiuxuan Sun<sup>3</sup>, Ya Zhao<sup>1</sup>, Xiaoya Wang<sup>1,4</sup>, Li Liu<sup>1,5</sup>, Xuan Zhao<sup>1,4</sup>, Zhinan Chen<sup>3</sup>\* and Hai Zhang<sup>1,6</sup>\*

<sup>1</sup> Laboratory Animal Center, Air Force Medical University, Xi'an, China, <sup>2</sup> Department of Microbiology, Air Force Medical University, Xi'an, China, <sup>3</sup> Department of Cell Biology, National Translational Science Center for Molecular Medicine, Air Force Medical University, Xi'an, China, <sup>4</sup> College of Veterinary Medicine, Northwest A&F University, Yangling, China, <sup>5</sup> Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi'an, China, <sup>6</sup> National Translational Science Center for Molecular Medicine, Air Force Medical University, Xi'an, China

#### Edited by:

David Jay Segal, University of California, Davis, United States

#### Reviewed by:

Serap Yalın, Mersin University, Turkey; Wei Xu, Texas A&M University–Corpus Christi, United States

\*Correspondence:

Zhinan Chen, znchen@fmmu.edu.cn; Hai Zhang, hzhang@fmmu.edu.cn

#### Specialty section:

This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics

Received: 27 November 2018; Accepted: 12 April 2019; Published: 30 April 2019

#### Citation:

Zhao Y, Liu P, Xin Z, Shi C, Bai Y, Sun X, Zhao Y, Wang X, Liu L, Zhao X, Chen Z and Zhang H (2019) Biological Characteristics of Severe Combined Immunodeficient Mice Produced by CRISPR/Cas9-Mediated Rag2 and IL2rg Mutation. Front. Genet. 10:401. doi: 10.3389/fgene.2019.00401

Clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas)9 is a novel and convenient gene editing system that can be used to construct genetically modified animals. Recombination activating gene 2 (Rag2) is a core component involved in the initiation of V(D)J recombination during T- and B-cell maturation. Separately, the interleukin-2 receptor gamma chain gene (IL2rg) encodes a protein that regulates the activity of natural killer (NK) cells and is a subunit shared by the receptors of several cytokines. Rag2 and IL2rg mutations cause immune system disorders associated with T-, B-, and NK cell function and some cytokine activities. In the present study, two single-guide RNAs (sgRNAs) targeting the Rag2 and IL2rg genes were microinjected with Cas9 messenger RNA (mRNA) into the zygotes of BALB/c mice to create Rag2/IL2rg<sup>−/−</sup> double knockout mice, and the biological characteristics of the mutated mice were subsequently analyzed. The results showed that CRISPR/Cas9-induced indel mutations caused frameshifts in the Rag2 and IL2rg genes, resulting in a decrease in the number of T-, B-, and NK cells and the destruction of immune-related tissues such as the thymus and spleen. Mycobacterium tuberculosis 85B antigen could not induce cellular or humoral immune responses in the mutated mice. However, this aberrant immune activity compromised the growth of several tumor xenografts in the mutated mice, including orthotopic and subcutaneous transplantation tumors. Thus, Rag2/IL2rg<sup>−/−</sup> knockout mice possessed features of severe combined immunodeficiency (SCID) and are an ideal model for human xenografts.

Keywords: Rag2, IL2rg, CRISPR/Cas9, severe combined immunodeficient mice, biological characteristic

### INTRODUCTION

The construction of chimeric rodent models that harbor human tissues has provided valuable in vivo assay systems for biomedical research. Mutations in immune-related genes make it possible to construct such chimeric rodents. The nude mouse (or athymic nude mouse), first described by Flanagan (1966), carries a spontaneous mutation in the Foxn1 gene that results in a lack of fur development and impaired T-cell function (Schorpp et al., 1997). Thereafter, CBA/N and Beige mice, which carry mutations in the xid and beige genes, respectively, leading to failure of B-cell- and natural killer (NK)-cell-mediated immune responses, were also discovered (Clark et al., 1981; Klaus et al., 1997). Subsequently, mice with prkdc or Rag2 mutations, which show T- and B-cell dysregulation, were defined as severe combined immunodeficiency (SCID) mice and used widely in biomedical research (Shinkai et al., 1992; Greiner et al., 1998). SCID mice were then greatly improved by the development of non-obese diabetic (NOD) mice, and a new strain of NOD/SCID mice was created by backcrossing SCID mice with NOD mice (Shultz et al., 1995). In these mice, mature, functional lymphocytes were absent, and lower levels of NK cells and cytokine production were present. Further studies were carried out by mating NOD/shi-SCID mice or Rag2 mutation mice with interleukin-2 receptor gamma chain gene (IL2rg) mutation mice, which generated mice with combined T-, B-, and NK-cell deficiency, such as NOG, NSG, and Rag2/IL2rg<sup>−/−</sup> double knockout mice (Shultz et al., 2005; Belizário, 2009). These mice are more severely immunocompromised than the previously mentioned mice owing to the simultaneous absence of mature T-cells, B-cells, and NK cells as well as defective macrophage activity and reduced dendritic cell function (Ito et al., 2002; Shultz et al., 2007; McDaniel and Grisham, 2018). All of the above-mentioned rodents are known as immunodeficient mice because one or more immune responses are impaired. As a result, these immunodeficient mice are advantageous for engraftment, infection control, and tumor control studies, and thus are a useful tool in biomedical research.

Zhao et al. SCID Mice Produced by CRISPR/Cas9

Clustered regularly interspaced short palindromic repeats (CRISPR) are DNA loci that contain multiple short direct repeats and are found in bacteria and archaea (Jansen et al., 2002; Ishino et al., 2018). In the region adjacent to a CRISPR sequence, conserved protein-coding sequences were also identified and named CRISPR-associated (Cas) genes; correspondingly, the proteins they encode are referred to as Cas proteins (Barrangou, 2015). Cas proteins form a large family that includes many subtypes; among these, the Cas9 protein originating from the bacterial type II CRISPR/Cas system is a programmable RNA-guided endonuclease capable of site-specific binding and cleavage of double-stranded DNA (Mojica and Montoliu, 2016). Guided by tracrRNA and crRNA, the Cas9 enzyme recognizes the protospacer adjacent motif (PAM) sequence 5′-NGG-3′ and cleaves the DNA 3 bp to 4 bp upstream of the PAM. The damaged DNA is subsequently repaired by one of two main pathways: nonhomologous end joining (NHEJ) or homology-directed repair (HDR). NHEJ typically generates indel (insertion or deletion) mutations, while HDR occurs when a repair template is present (Hsu et al., 2014). The original biological function of CRISPR/Cas9 is as an adaptive immune defense mechanism of bacteria against phages: the invading DNA is recognized by CRISPR, after which Cas9 cleaves the exogenous DNA, inactivating the invading phage (Sampson and Weiss, 2014). Because it represents a more convenient, rapid, and efficient way to introduce a mutation into a genome sequence, CRISPR/Cas9 is now known as a key novel genome engineering tool for replacing, deleting, or inserting base pairs in a DNA sequence, and it can be used to construct genetically modified animals (Sander and Joung, 2014; Tschaharganeh et al., 2016).
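The PAM rule described above can be made concrete with a small search routine. The Python sketch below is illustrative only: the function name, the 20-nt guide length default, and the example sequence are assumptions made for this example. It scans one strand for 5′-NGG-3′ motifs, reports the 20-nt protospacer immediately upstream of each, and places the cut 3 bp upstream of the PAM.

```python
import re

def find_ngg_sites(sequence, guide_length=20):
    """List candidate SpCas9 target sites on the given strand.

    Returns (protospacer, pam, cut_index) tuples, where cut_index is the
    0-based index at which the strand is split (3 bp upstream of the PAM).
    """
    sequence = sequence.upper()
    sites = []
    # A lookahead is used so that overlapping NGG motifs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", sequence):
        pam_start = match.start()
        if pam_start < guide_length:
            continue  # not enough upstream sequence for a full protospacer
        protospacer = sequence[pam_start - guide_length:pam_start]
        sites.append((protospacer, match.group(1), pam_start - 3))
    return sites

# Hypothetical sequence: 20-nt protospacer + TGG PAM + 3 flanking bases
example = "ACGTACGTACGTACGTACGT" + "TGG" + "AAA"
for protospacer, pam, cut in find_ngg_sites(example):
    # Show the two fragments that a blunt cut at this index would produce
    print(protospacer, pam, cut, example[:cut], example[cut:])
```

Only the strand passed in is searched; a fuller search would also scan the reverse complement.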

Recombination activating gene 2 (Rag2), expressed in the adult thymus (Wilson et al., 1994), is an immune-related molecule involved in the initiation of immunoglobulin V(D)J gene rearrangement and T-cell-receptor gene recombination during T- and B-cell development (Notarangelo et al., 2016). Rag2 is essential to the generation of mature T- and B-lymphocytes; importantly, mutations of this gene in humans retard T- and B-cell development, resulting in SCID associated with autoimmune-like Omenn syndrome (Corneo et al., 2001; Notarangelo et al., 2016). Separately, IL2rg, expressed in the thymus and spleen (Cao et al., 1993), is known as an immune regulator in cytokine secretion; in the growth and differentiation of T-cells, B-cells, and NK cells; and in maintaining the homeostasis of the immune system (Aliyari et al., 2015). Mutation of the IL2rg gene causes a deficiency in functional NK cells and reduced cytokine secretion, including that of IL-2, IL-4, IL-7, IL-9, IL-15, and IL-21 (Puck et al., 1997). In the present study, we constructed SCID mice by mutating the Rag2 and IL2rg genes using the CRISPR/Cas9 gene editing tool and sought to determine the biological characteristics of the mutated mice by investigating the immune response against the 85B antigen of Mycobacterium tuberculosis and by establishing a human tumor xenograft model in vivo.

### MATERIALS AND METHODS

#### Animals

BALB/c mice were obtained from the Laboratory Animal Center of Air Force Medical University. ICR mice, which were used as recipient animals for transplanting microinjected zygotes, were purchased from Vital River Laboratory Animal Technology Co., Ltd. (Beijing, China). The mice were housed in a temperature- and climate-controlled specific-pathogen-free facility with a 12-h light/dark schedule. Body weight and the intake of food and water were recorded weekly. All mouse experiments were approved by the Institutional Animal Care and Use Committee of Air Force Medical University.

#### Reagents and Plasmid

A MEGAshortscript™ T7 high-yield transcription kit and a MEGAclear™ kit were provided by Thermo Fisher Scientific (Waltham, MA, United States). Cas9 messenger RNA (mRNA) and protein were purchased from Biomics Biotechnologies (Nantong, China) and New England Biolabs (Ipswich, MA, United States), respectively. Pregnant mare's serum gonadotropin (PMSG) and human chorionic gonadotropin (hCG) were purchased from the Ningbo Second Hormone Factory (Ningbo, China). M2 medium was provided by Sigma-Aldrich (St. Louis, MO, United States), while a KSOM powdered media kit (cat: MR-020P-SF) was obtained from Millipore (Burlington, MA, United States). Mouse FITC-CD3, PE-NKp46, and APC-B220 antibodies were purchased from BioLegend (San Diego, CA, United States). LongAmp Taq DNA polymerase and Bbs I restriction enzyme were provided by New England Biolabs (Ipswich, MA, United States). A mouse tail genome extraction kit was sourced from Foregene Biological Technology Co., Ltd. (Chengdu, China). The pX330 plasmid was purchased from Addgene. Interferon (IFN) γ, IL-2, and IL-10 cytokine enzyme-linked immunosorbent assay (ELISA) detection kits were purchased from eBioscience (San Diego, CA, United States).

#### Cell Culture

The brain glioma cell line U87 was purchased from the Type Culture Collection of the Chinese Academy of Sciences (Shanghai, China). Luciferase-labeled human primary gastric, renal, and bladder carcinoma cells and the passaged Burkitt's lymphoma cell line Raji-luciferase were obtained from the Laboratory Animal Center of Air Force Medical University. Cells were incubated in high-glucose Dulbecco's modified Eagle medium or Roswell Park Memorial Institute 1640 medium supplemented with 10% fetal bovine serum under a humidified atmosphere of 5% CO<sub>2</sub> at 37°C.

#### Preparation of Single-Guide RNA and Microinjection

For in vitro transcription of single-guide RNA (sgRNA), two 20-bp sgRNA sequences targeting Rag2 exon 3 (gene ID: 19374) and IL2rg exon 1 (gene ID: 16186) were selected using the website http://crispr.mit.edu and synthesized by TsingKe Biological Technology (Xi'an, China). After annealing, the double-stranded DNA was digested with Bbs I restriction enzyme and cloned into the pX330 plasmid. Polymerase chain reaction (PCR) was performed to obtain an sgRNA sequence carrying the T7 promoter, and the 121-bp PCR product was then transcribed with the MEGAshortscript™ T7 high-yield transcription kit according to the manufacturer's protocol and purified. Mouse superovulation and microinjection were carried out according to a previous report (Esmail et al., 2016). Briefly, a mixture of 20 µg of Rag2 sgRNA, 20 µg of IL2rg sgRNA, and 10 µg of Cas9 mRNA was microinjected into the cytoplasm of collected fertilized eggs. After incubation for 24 h at 37°C, the 2-cell-stage embryos were transplanted into the ampulla of pseudopregnant ICR female recipient mice.
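The text does not give the oligonucleotide design used for cloning into pX330. As a hedged illustration, the Python sketch below follows a widely used convention for Bbs I-digested pX330, in which the top oligo carries a 5′-CACC overhang (with an extra G added when the guide does not begin with G) and the bottom oligo carries a 5′-AAAC overhang on the reverse complement. The function names and the guide sequence are hypothetical, and the authors' actual oligos may differ.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def bbsi_cloning_oligos(guide):
    """Design an annealing oligo pair for a 20-nt guide (assumed convention).

    Returns (top, bottom), both written 5'->3'.
    """
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("guide must be 20 nt of A/C/G/T")
    # Prepend G if needed so that U6-driven transcription starts with G
    insert = guide if guide.startswith("G") else "G" + guide
    top = "CACC" + insert
    bottom = "AAAC" + reverse_complement(insert)
    return top, bottom

# Hypothetical guide sequence, not one of the guides used in the study
top_oligo, bottom_oligo = bbsi_cloning_oligos("ACGTACGTACGTACGTACGT")
print(top_oligo)
print(bottom_oligo)
```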

#### Single-Guide RNA in vitro Cleavage Efficiency Assay

PCR was performed with Rag2- and IL2rg-specific primers to obtain substrate DNA. After purification, 1 µg of substrate DNA was digested with 2 µg of Cas9 protein, 200 ng of sgRNA, and 2 µL of 10× Cas9 buffer at 37°C for 1 h in a 20-µL reaction volume. Reaction products were run on a 1.5% agarose gel to examine cleavage efficiency.

#### Flow Cytometry

Peripheral blood (50 µL) was collected from the tail veins of homozygous mice. Samples were lysed with erythrocyte lysing solution and incubated for 30 min in the dark with 1:1,000-diluted FITC-CD3, PE-NKp46, and APC-B220 antibodies. Samples were then analyzed by flow cytometry (Becton, Dickinson and Company, Franklin Lakes, NJ, United States), and data were analyzed with FlowJo software (FlowJo LLC, Ashland, OR, United States).

#### Real-Time Quantitative RT-PCR

Total RNA was extracted from the spleen and/or thymus of homozygous mice with TRIzol reagent (Invitrogen, Carlsbad, CA, United States) according to the manufacturer's instructions. Total RNA (500 ng) was reverse-transcribed to cDNA, and qPCR was performed using a SYBR Green PCR kit (TaKaRa, Dalian, China). Each sample was run in triplicate in a final reaction volume of 25 µL, which contained 1 µL of cDNA template, 10 pmol of Rag2- and IL2rg-specific primers (**Table 1**), and 12.5 µL of SYBR Green solution. Assays were run using the following procedure: 1 cycle of 95°C for 30 s, followed by 40 cycles of 95°C for 20 s and 60°C for 30 s. Data were analyzed with the 2<sup>−ΔΔCT</sup> method.
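For readers unfamiliar with the 2<sup>−ΔΔCT</sup> calculation, a minimal Python sketch follows. It assumes that each sample has triplicate CT values for the target gene and for a reference (housekeeping) gene; the reference gene is not named in the text, and the function name and CT values below are hypothetical.

```python
from statistics import mean

def relative_expression(target_ct, reference_ct,
                        control_target_ct, control_reference_ct):
    """Relative expression by the 2^(-ddCT) method.

    Each argument is a list of replicate CT values. The first two belong to
    the test sample (e.g. a mutant mouse), the last two to the control
    sample (e.g. a wild-type mouse).
    """
    delta_ct_sample = mean(target_ct) - mean(reference_ct)
    delta_ct_control = mean(control_target_ct) - mean(control_reference_ct)
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** -delta_delta_ct

# Hypothetical triplicates: the target crosses threshold 4 cycles later
# in the mutant than in the control after normalization.
fold = relative_expression(
    target_ct=[28.0, 28.0, 28.0],
    reference_ct=[18.0, 18.0, 18.0],
    control_target_ct=[24.0, 24.0, 24.0],
    control_reference_ct=[18.0, 18.0, 18.0],
)
print(fold)
```

A value below 1 indicates lower expression in the test sample than in the control; a ΔΔCT of 4 corresponds to a 16-fold reduction.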

#### Tumor Xenograft Model

Fifteen Rag2/IL2rg<sup>−/−</sup> mice were divided into three groups: (1) a human primary tumor cell inoculation group; (2) a Raji cell inoculation group; and (3) a U87 cell inoculation group. Human primary gastric, renal, and bladder carcinoma cells in the logarithmic growth phase were collected, and 1 × 10<sup>7</sup> cells/mouse were implanted subcutaneously in the flank; the mice were then kept for 3 weeks. Meanwhile, 1 × 10<sup>7</sup> Raji cells were inoculated intravenously to replicate a hematopoietic model. For the glioma xenograft model, 1 × 10<sup>7</sup> U87 cells were stereotaxically injected into the precuneus, while the other mice were implanted subcutaneously in the flank. Three weeks later, all mice were euthanized; tumor formation was observed through skull anatomy in glioma xenograft mice, while the luciferase-labeled cell xenograft mice were visualized using the IVIS Lumina II imaging system (PerkinElmer, Waltham, MA, United States).

#### Hematoxylin and Eosin Staining

Once the mice were euthanized, the thymus, spleen, and the xenograft tumor tissue samples were sectioned and fixed in 4% formaldehyde for 24 h, followed by dehydration with a series of ethanol solutions and subsequent embedding in paraffin. Then, 5-µm-thick sections were cut and stained with hematoxylin and eosin (H&E) according to protocol. The histopathological changes were examined under a light microscope (BX43; Olympus, Tokyo, Japan).

#### Genotype Analysis

Genomic DNA was extracted from the tail tips of 1-week-old mice, and PCR was performed with Rag2- and IL2rg-specific forward and reverse primers, respectively. After purification, PCR products were subjected to Sanger sequencing and the results were analyzed with the SnapGene 3.1.1 software (GSL Biotech, Chicago, IL, United States).

#### Immunization, Lymphocyte Proliferation, Antibody and Cytokine Assay

Ten Rag2/IL2rg<sup>−/−</sup> mice were divided into two groups: (1) 85B antigen-treated group; and (2) control group. Recombinant 85B antigen of Mycobacterium tuberculosis (MTB) was mixed with aluminum hydroxide adjuvant at a 1:1 ratio. Then, the mutated mice (experimental group) and WT BALB/c mice (control group) were inoculated intramuscularly in the hind leg three times at 2-week intervals with 50 µg of the 85B antigen/adjuvant mixture. Following immunization, antibody and lymphocyte proliferation assays were performed as previously reported (Zhang et al., 2010; Wang et al., 2014). Briefly, plates were coated with recombinant 85B antigen and the titer of anti-85B specific antibody was determined by ELISA. Meanwhile, lymphocytes were isolated from the spleens of immunized mice and stimulated with recombinant 85B antigen (experimental wells) or PPD (positive control), after which the supernatants were collected for cytokine measurement. Next, 20 µL of MTS (Promega, Madison, WI, United States) was added to each well and incubated for another 4 h; optical density was measured at 490 nm; and the stimulation index (SI) was used to evaluate lymphocyte proliferation, as follows: SI = (A<sub>490</sub> of stimulated wells – A<sub>490</sub> of blank wells) / (A<sub>490</sub> of negative wells – A<sub>490</sub> of blank wells).
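The stimulation index defined above is a blank-corrected ratio of absorbances, and a short Python sketch may make the arithmetic explicit. The absorbance readings below are hypothetical and serve only to illustrate the formula; the function name is an assumption of this sketch.

```python
def stimulation_index(a490_stimulated, a490_negative, a490_blank):
    """SI = (A490 stimulated - A490 blank) / (A490 negative - A490 blank)."""
    denominator = a490_negative - a490_blank
    if denominator <= 0:
        # A negative-control reading at or below the blank makes SI undefined.
        raise ValueError("negative-control A490 must exceed the blank A490")
    return (a490_stimulated - a490_blank) / denominator


# Hypothetical A490 readings (illustration only).
si = stimulation_index(a490_stimulated=0.85, a490_negative=0.35, a490_blank=0.10)
print(f"SI = {si:.2f}")
```

In this illustration the stimulated wells give about three times the blank-corrected signal of the unstimulated wells; an SI close to 1 would indicate no antigen-specific proliferation.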

#### Statistical Analysis

Data are expressed in the form of mean ± standard deviation. A Student's t test and one-way analysis of variance were used for assessing significant differences among experimental groups using the Statistical Package for the Social Sciences software, version 17 (IBM Corp., Armonk, NY, United States). p-values of <0.05 or <0.01 were considered to be statistically significant.
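As an illustration of the summary statistics named above, the following Python sketch computes mean ± standard deviation and the pooled-variance Student's t statistic for two groups using only the standard library. The input values are hypothetical, the function names are assumptions of this sketch, and the p-value itself would still be obtained from the t distribution in a statistics package such as the one used by the authors.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sd(values):
    """Return (mean, sample standard deviation) for one group."""
    return mean(values), stdev(values)


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in both groups.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
                  (n_b - 1) * stdev(group_b) ** 2) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical measurements (illustration only).
mutant = [1.0, 2.0, 3.0]
wild_type = [4.0, 5.0, 6.0]

m, sd = mean_sd(mutant)
t_stat, dof = student_t(mutant, wild_type)
print(f"mutant: {m:.2f} ± {sd:.2f}; t = {t_stat:.3f}, df = {dof}")
```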

### RESULTS

#### Construction of Rag2/IL2rg<sup>−/−</sup> Gene Double-Knockout Mice With CRISPR/Cas9 System From BALB/c Strain

Rag2 and IL2rg are involved in the development of T-, B-, and NK cells and the production of cytokines. Mutation of these genes arrests the development of these cells at an immature stage and contributes to a lack of both innate and adaptive immune responses. Based on this, we planned to construct a SCID BALB/c mouse model by targeting the Rag2 and IL2rg genes simultaneously with the CRISPR/Cas9 gene editing tool. After screening on the http://crispr.mit.edu website, a pair of 20 bp oligonucleotides was selected as sgRNA targeting sequences from the Rag2 exon 3 sense strand and the IL2rg exon 1 anti-sense strand, respectively (**Figure 1A**). Subsequently, an in vitro cleavage assay demonstrated that the Rag2 and IL2rg sgRNAs had strong cutting activity on the target sequences, producing two obvious bands on a 1.5% agarose gel (**Figure 1B**). Similarly, Sanger sequencing also suggested that the CRISPR/Cas9 system possessed high gene editing efficiency in vivo. There were 40 pups born after transplantation, of which 20 pups (50%) and 18 pups (45%) showed indel mutations in the Rag2 and IL2rg target sequences, respectively. Among these, 4 pups showed mutations in both sequences simultaneously, leading to small-fragment deletions or insertions (**Figure 1C**). Mouse #10 was then mated with wild-type female BALB/c mice to examine germline transmission. Three generations later, a similar genotype was observed in homozygous Rag2/IL2rg<sup>−/−</sup> mice, suggesting that the indel mutations were stably inherited by offspring. Rag2 and IL2rg expression was measured in adult thymus and/or spleen with specific primers (**Table 1**), and the data showed that Rag2 and IL2rg mRNA levels were decreased significantly (**Figure 1D**). Thus, Rag2/IL2rg<sup>−/−</sup> gene double-knockout mice were constructed on the BALB/c background.

mutated mice in target region (C). mRNA relative expression of Rag2 and IL2rg in wild-type BALB/c and mutated mice (D).
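The selection of 20 bp sgRNA target sequences on the sense or anti-sense strand, as described in this section, can be sketched in Python. This is a simplified illustration and not the scoring procedure of the design website: it assumes an NGG protospacer-adjacent motif (the usual requirement for Streptococcus pyogenes Cas9, which the text does not specify), it ignores off-target scoring, and the demo sequence is made up rather than taken from Rag2 or IL2rg.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_targets(seq, strand="+"):
    """List candidate (start, protospacer, pam) sites on one strand.

    A site is a 20-nt protospacer immediately followed by an NGG PAM.
    For strand "-", positions refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    search = seq if strand == "+" else reverse_complement(seq)
    # The lookahead lets overlapping candidate sites be reported.
    pattern = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(search)]


# Made-up 23-nt sequence: one 20-nt protospacer followed by a TGG PAM.
demo = "ACGTACGTACGTACGTACGT" + "TGG"
sense_hits = find_targets(demo, "+")
antisense_hits = find_targets(demo, "-")
print(sense_hits)
print(antisense_hits)
```

In practice, candidates found this way would still need to be ranked for predicted activity and off-target risk before a guide is chosen.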

#### Rag2/IL2rg<sup>−/−</sup> Double Knockout Alters the Number of Granulocytes, but Not the Physiological Behavior of Mutated Mice

Gene mutation might alter mouse phenotype, so we asked whether Rag2/IL2rg<sup>−/−</sup> double knockout influenced the normal behavior of the mutated mice. Hematological parameters in a routine blood test were assayed with a biochemical analyzer. The data indicated that granulocyte counts in Rag2/IL2rg<sup>−/−</sup> mice were decreased significantly, resulting in increased percentages of neutrophils, monocytes, eosinophils, and basophils, while other hematological parameters were unchanged (**Table 2**). There were no significant differences in body weight or food and water intake between mutated mice and wild-type mice (**Figures 2A–C**). Thus, the Rag2/IL2rg mutation did not influence the physiological behavior of the mice.

TABLE 2 | Comparison of hematological parameters between Rag2/IL2rg<sup>−/−</sup> and BALB/c mice.


p < 0.05 or <0.01 was considered statistically significant, compared with BALB/c mice.

#### Rag2/IL2rg<sup>−/−</sup> Double Knockout Retarded Thymus and Spleen Development and Reduced the Number of Lymphocytes

Since the Rag2 and IL2rg genes are involved in the development of immune-related tissues, we investigated histopathological changes in thymus and spleen tissue. Notably, the volume and organ/weight ratio of the thymus and spleen were decreased significantly after mutation (**Figures 3A,B**). Thymus atrophy, spleen dysplasia, and lymphocyte reduction could be observed with H&E staining: in the thymus, thymus cells decreased and stromal cells increased, cortical staining became lighter than that of the medulla, and the boundary between the cortex and medulla was blurred. Additionally, in the spleen, the white pulp shrank and the red pulp expanded; the numbers of lymphocytes, hematopoietic cells, and monocytes in the white pulp were reduced; and the boundary between white pulp and red pulp was less distinct. In contrast, the histopathologic structure of the spleen and thymus was normal in wild-type BALB/c mice (**Figure 3C**). Consistent with these changes, the numbers of T-cells, B-cells, and NK cells in the peripheral blood were decreased significantly (**Figure 3D**).

#### MTB Antigen 85B Could Not Stimulate Immune Response in Rag2/IL2rg<sup>−/−</sup> Double-Knockout Mice

Mycobacterium tuberculosis 85B protein is a prominent antigen used to stimulate strong cellular and humoral immune responses in inbred mice (Lu et al., 2018). To investigate the immune response induced by 85B antigen in Rag2/IL2rg<sup>−/−</sup> double-knockout mice, the mice were immunized three times with recombinant 85B protein and the titer of anti-85B specific antibody was measured by ELISA. High-titer antibody could be induced in wild-type BALB/c mice, but not in Rag2/IL2rg<sup>−/−</sup> double-knockout mice, even with an extended immunization time (**Figures 4A,B**), and lymphocyte proliferation was inhibited, as indicated by a lower SI (**Figure 4C**). Meanwhile, the expression of IL-2, IL-10, and IFN-γ was decreased significantly (**Figures 4D–F**). Taken together, Rag2/IL2rg double knockout not only disrupted the histopathological structure of thymus and spleen tissues but also attenuated the cellular and humoral immune responses in mutated mice, suggesting that these mice present the features of an immunocompromised animal.

#### Rag2/IL2rg<sup>−/−</sup> Double-Knockout Mice Were More Suitable for the Construction of Human Tumor Xenograft Model

Rag2/IL2rg<sup>−/−</sup> double knockout induced SCID in mice; next, we attempted to transplant various human-tissue-derived primary and passaged tumor cells into these mutated mice. To establish orthotopic transplantation of glioma, U87 cells were inoculated intracerebrally into the mutated mice. Three weeks later, marked tumor growth could be observed in the brain parenchyma. H&E staining showed a dense arrangement of spindle-shaped tumor cells. Notably, nuclear hyperchromatism and pathological mitosis were common in the carcinoma, while abundant hemorrhage and necrosis were also observed in this region. Although tumor cell infiltration was seen in the junction region, there was a clear boundary between carcinoma and paracancerous tissue, which was consistent with the pathological characteristics of glioma (**Figure 5A**). We also used lymphoma Raji cells to replicate a hematopoietic tumor model: Raji cells penetrated the blood–brain barrier and were distributed all over the body after intravenous injection, including in the brain (**Figure 5B**). Primary cultured bladder, renal, and gastric cancer cells from clinical patients were transplanted subcutaneously, and the volume of the xenotransplanted tumors increased over time (**Figure 5C**). Taken together, our findings suggest that Rag2/IL2rg<sup>−/−</sup> double-knockout mice represent an ideal xenograft tumor model, as these mice supported the growth of cancer cells derived from various tissues and introduced by different inoculation methods.

wild-type BALB/c mice were sacrificed. Various organs were extracted, and the organ/weight ratio was calculated (A,B). Spleen and thymus tissues were sectioned and stained with H&E (C). Peripheral blood was collected from tail veins and, after dilution, samples were stained with FITC-CD3, PE-NKp46, and APC-220 antibodies and analyzed by flow cytometry (FCM) (D). Data are presented as mean ± standard deviation of 3 independent experiments performed in triplicate (∗p ≤ 0.05, ∗∗p ≤ 0.01, compared with controls).

FIGURE 4 | Immune response level induced by MTB 85B antigen. Mice were immunized three times with recombinant 85B protein, serum and spleen cells were collected, and anti-85B specific antibody titer was measured by ELISA (A,B). The stimulation index (SI) of spleen lymphocytes was measured by MTS (C). IL-2, IL-10, and IFN-γ concentrations in serum from immunized mice were assayed by commercial ELISA kit (D–F). Data are presented as mean ± standard deviation of 3 independent experiments performed in triplicate (∗p ≤ 0.05, ∗∗p ≤ 0.01, compared with controls).

### DISCUSSION


In the present study, severe combined immunodeficient mice were generated by CRISPR/Cas9-mediated Rag2 and IL2rg mutation. These mice were produced on the well-defined genetic background of the BALB/c inbred strain, with homozygosity as the main characteristic. Although severe combined immunodeficient mice can also be obtained by mating Rag2-mutated with IL2rg-mutated mice (Belizário, 2009), such mating might increase heterozygosity, since the Rag2- and IL2rg-mutated mice were from different strains. In addition, the biological characteristics of these immunodeficient mice were studied: the mice not only developed abnormal lymphatic organs, leading to decreased numbers of immune cells, but also failed to mount an immune response even when stimulated with recombinant antigen. Interestingly, this attenuated immune response made the mice more permissive to tumor xenotransplantation and therefore better suited to tumor xenograft models.

Gene mutations can give rise to immunodeficient animals. Foxn1 gene mutation causes a defect of the T-lymphocyte-mediated cellular immune response in hairless nude mice (Vaidya et al., 2016), Xid gene mutations impair the B-lymphocyte-mediated humoral immune response in CBA/N mice (Szymczak et al., 2013), and beige gene mutations cause NK cell dysfunction in beige mice (Pflumio et al., 1989). These immunodeficient animals show a disorder of only a single immune cell type. Gene mutations can additionally cause multiple immune dysfunctions, with spontaneous Prkdc and Rag2 gene mutations resulting in SCID mice. Although there are some similarities between these genes, Rag2 is an essential component of the V(D)J recombination machinery. T- and B-cell development is arrested when the Rag2 gene is mutated because V(D)J recombination fails (Carmona et al., 2016). In this case, T-cell development is blocked at the immature CD3<sup>−</sup>CD4<sup>−</sup>CD8<sup>−</sup>CD25<sup>+</sup> stage and B-cells are blocked at the B220<sup>−</sup>CD43<sup>+</sup>IgM<sup>−</sup> progenitor B-cell stage (Riccetto et al., 2014; von Muenchow et al., 2017). The difference is that Prkdc mutation leads to immune leakiness in 20% of animals at 12 weeks of age, where immunoglobulin can be detected in the serum of Prkdc-mutated mice (Danska et al., 1996; Katano et al., 2011). This leakiness interferes with experimental results, whereas no immune leakiness occurs with the Rag2 gene mutation, which is why we selected the Rag2 gene in this study.

Immunodeficient mice have been used to construct humanized animal models after the transplantation of human cells or tissues, so we explored the possibility of building a humanized animal model by transplanting human CD34<sup>+</sup> hematopoietic stem cells and NK cells into Rag2/IL2rg<sup>−/−</sup> mice. Unfortunately, the transplanted CD34<sup>+</sup> hematopoietic stem cells and NK cells failed to survive and proliferate in the mutated mice (data not shown), suggesting that mice with the Rag2/IL2rg<sup>−/−</sup> gene mutation were not suitable for the construction of humanized mice. Several reasons might explain this phenomenon: (1) the Rag2 and IL2rg sgRNAs produced indel mutations in both genes, which resulted in small-fragment deletions rather than the loss of large fragments. Although indels can also cause frameshift mutations in the Rag2/IL2rg genes, small indels seem to have less impact on T- or B-cells than the loss of large fragments. The use of dual sgRNAs in follow-up studies to delete large fragments of the Rag2/IL2rg genes might be a better solution (Song et al., 2016). (2) Prkdc-mutated NOG or NSG mice are currently the most widely used humanized mice, although there is immune leakiness after Prkdc mutation. From the point of view of humanized mouse construction, Prkdc mutation is more suitable than Rag2 mutation. (3) In addition to T-cells, B-cells, and NK cells, neutrophils, macrophages, and cytokines can also cause a graft-versus-host disease reaction; although the Rag2/IL2rg mutation reduced the numbers of neutrophils and lymphocytes in this study, residual granulocytes, macrophages, and cytokines might make it difficult for grafts to survive in the mice. Although these immunodeficient mice were not suitable as a humanized animal model, they were a good tool for human tumor xenotransplantation, whether for orthotopic, hematopoietic, or other xenotransplanted tumors. Therefore, these immunodeficient mice are well suited to tumor xenograft applications.

In addition to applications in tumor research, immunodeficient mice can also be used in infection and immunity studies. A previous study demonstrated that innate lymphoid cells play a critical role against Clostridium difficile infection, based on data from Rag1<sup>−/−</sup> single-gene and Rag2/IL2rg<sup>−/−</sup> double-gene mutated mice (Abt et al., 2015). T- and B-cells were absent in both Rag1<sup>−/−</sup> and Rag2/IL2rg<sup>−/−</sup> mice; however, innate lymphoid cells, such as NK cells, Th17, and Th22 cells, were normal in Rag1<sup>−/−</sup> mice but not in Rag2/IL2rg<sup>−/−</sup> mice. Compared with Rag1<sup>−/−</sup> mice, Rag2/IL2rg<sup>−/−</sup> mice succumbed to C. difficile infection owing to the absence of NK cells, Th17, and Th22 cells. Thus, innate lymphoid cells play a protective role against C. difficile infection. In that study, survival rate and the expression of cytokines such as IL-22, IL-17, and IFN-γ were assayed after challenge with a virulent C. difficile strain. In the present study, we evaluated the level of the immune response, including antibody titer, lymphocyte proliferation index, and Th1 and Th2 cytokines, after immunization with recombinant MTB 85B. Similarly, both mice lacked the ability to mount an immune response, whether challenged with a virulent strain or immunized with recombinant MTB antigen. The difference lay in the indexes measured; the immune properties of innate lymphoid cells were not investigated in this manuscript. Although we did not compare these two types of mice directly, their immune properties should be similar because both were mutated in the same genes.

In summary, we have constructed a Rag2/IL2rg gene mutant mouse model using the CRISPR/Cas9 gene editing technology. The Rag2/IL2rg gene mutation did not affect the normal physiological behavior of mice, but the mutated mice displayed the typical characteristics of immunodeficiency. This mouse model could be used as a good animal model option in tumor research and other related fields.

#### AUTHOR CONTRIBUTIONS

HZ and ZC conceived the project. HZ designed the experiments. YoZ, PL, ZX, CS, YB, XS, YaZ, XW, LL, and XZ performed the experiments and prepared the manuscript. HZ and ZC supervised the study and contributed reagents and materials. All authors contributed to data analysis.

#### FUNDING

This work was supported by Scientific and technological resources coordination project of Shaanxi Province (2018PT-03) and Special fund for military laboratory animals (SYDW(2016)001).

#### REFERENCES

smegmatis expressing Ag85B-ESAT6 fusion protein against persistent tuberculosis infection in mice. Hum. Vaccin. Immunother. 10, 150–158. doi: 10.4161/hv.26171


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Zhao, Liu, Xin, Shi, Bai, Sun, Zhao, Wang, Liu, Zhao, Chen and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Enhancement of Precise Gene Editing by the Association of Cas9 With Homologous Recombination Factors

Ngoc-Tung Tran<sup>1†</sup>, Sanum Bashir<sup>1,2†</sup>, Xun Li<sup>1</sup>, Jana Rossius<sup>1,2</sup>, Van Trung Chu<sup>1,2</sup>, Klaus Rajewsky<sup>1</sup> and Ralf Kühn<sup>1,2</sup>\*

<sup>1</sup> Max-Delbrück-Centrum für Molekulare Medizin, Berlin, Germany, <sup>2</sup> Berlin Institute of Health, Berlin, Germany

#### Edited by:

David Jay Segal, University of California, Davis, United States

#### Reviewed by:

Robin Ketteler, University College London, United Kingdom

Ashok S. Bhagwat, Wayne State University, United States

#### \*Correspondence:

Ralf Kühn ralf.kuehn@mdc-berlin.de

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics

Received: 30 November 2018 Accepted: 05 April 2019 Published: 30 April 2019

#### Citation:

Tran N-T, Bashir S, Li X, Rossius J, Chu VT, Rajewsky K and Kühn R (2019) Enhancement of Precise Gene Editing by the Association of Cas9 With Homologous Recombination Factors. Front. Genet. 10:365. doi: 10.3389/fgene.2019.00365

The CRISPR-Cas9 system is used for genome editing in mammalian cells by introducing double-strand breaks (DSBs), which are predominantly repaired via non-homologous end joining (NHEJ) or, to a lesser extent, by homology-directed repair (HDR). To enhance HDR for improving the introduction of precise genetic modifications, we tested fusion proteins of Cas9 nuclease with HDR effectors to enforce their localization at DSBs. Using a traffic-light DSB repair reporter (TLR) system for the quantitative detection of HDR and NHEJ events in human HEK cells, we found that Cas9 fusions with CtIP, Rad52, and Mre11, but not Rad51C, promote HDR up to twofold in human cells and significantly reduce NHEJ events. We further compared, as an alternative to the direct fusion with Cas9, two-component configurations that associate CtIP fusion proteins with a Cas9-SunTag fusion or with guide RNA that includes MS2 binding loops. We found that the Cas9-CtIP fusion and the MS2-CtIP system, but not the SunTag approach, increase the ratio of HDR/NHEJ 4.5–6-fold. Optimal results are obtained by the combined use of Cas9-CtIP and MS2-CtIP, shifting the HDR/NHEJ ratio by a factor of 14.9. Thus, our findings provide a simple and effective tool to promote precise gene modifications in mammalian cells.

Keywords: Cas9, gene editing, homologous recombination, CtIP, CRISPR

### INTRODUCTION

The RNA-guided Cas9 nuclease is used to create targeted double-strand breaks (DSBs) in the genome of mammalian cells and represents a versatile tool for genome editing (Barrangou and Doudna, 2016; Komor et al., 2016). CRISPR-Cas9 mediated DSBs are repaired by either the nonhomologous end joining (NHEJ) repair pathway that leads to randomly sized small deletions or insertions (Indels), or by homology-directed repair (HDR) enabling precise sequence modifications that are copied from a repair template. Since HDR requires the presence of a repair template and is restricted to the S and G2 phases of the cell cycle, it occurs less frequently than NHEJ. This presents a barrier to applications that rely on precise sequence modifications, such as the correction of mutations in somatic gene therapy or the modeling of disease mutations. To reinforce precise gene editing at Cas9 induced DSBs, tools for shifting the repair pathway choice in favor of HDR must be developed. DSB repair pathway choice is largely determined by the competition between the 53BP1 and BRCA1 regulator proteins, triggering either the protection or resection of DSB ends and the subsequent engagement of the NHEJ or HDR pathway, respectively (**Figure 1A**; Daley and Sung, 2014; Gupta et al., 2014; Zimmermann and de Lange, 2014). 53BP1 is recruited to DSBs by recognition of the Ubiquitin mark at Lysine 15 of histone H2A (Fradet-Turcotte et al., 2013) in chromatin flanking the break sites. 53BP1 blocks CtIP-based end resection (Bunting et al., 2010) and recruits Rif1 and the Shieldin complex (Mirman et al., 2018), which further block end resection and inhibit BRCA1 accumulation (Escribano-Díaz et al., 2013; Zimmermann et al., 2013). In contrast, the HDR pathway requires the exclusion of 53BP1 and the resection of DSB ends in order to be initiated. During the S/G2 phase, BRCA1 excludes Rif1 from DSB repair foci, and recruits CtIP and the MRE11-Rad50-NBS1 (MRN) complex. This complex initiates a cleavage step, and the DSB ends are then further resected at the 5′ end by Exo1 (Sartori et al., 2007; Symington and Gautier, 2011; Symington, 2016), extending on each side of the DSB (Zakharyevich et al., 2010). The exposed single-stranded DNA (ssDNA) is protected by binding of RPA1 that is subsequently replaced by Rad51 through the action of BRCA2 and Rad52, forming a nucleofilament competent for homology search and strand invasion (Liu et al., 2011).

Previous approaches to enhance HDR include enrichment of cells in the S/G2 phase (Lin et al., 2014; Yang et al., 2016), restriction of Cas9 activity to the S/G2 phase (Gutschner et al., 2016; Howden et al., 2016), inhibition of the NHEJ key molecules DNA ligase IV (Chu et al., 2015; Maruyama et al., 2015) or 53BP1 (Paulsen et al., 2017; Canny et al., 2018) and the fusion of Cas9 with CtIP (**Figure 1B**; Charpentier et al., 2018). Here we extended the Cas9 fusion approach to the HDR key proteins Mre11, CtIP, RPA1, Rad51C, and Rad52. Using CtIP, we compared the direct fusion with Cas9 to alternative molecular configurations that associate CtIP with guide RNA that includes MS2 binding loops (**Figure 1C**) or with a Cas9-SunTag fusion protein (**Figure 1D**), previously developed for binding of transcriptional activators to dCas9 (Tanenbaum et al., 2014; Konermann et al., 2015). In addition, we compared wildtype CtIP, which is activated in the S/G2 cell cycle phases through phosphorylation at the Threonine residue 847, with the phosphomimetic T847E mutant that is also active in the G1 phase (Huertas and Jackson, 2009). Using a traffic-light DSB repair reporter (TLR) system for the quantitative detection of HDR and NHEJ events in human HEK cells, we found that both the fusion of CtIP with Cas9 and the MS2 system, but not the SunTag system, strongly shift the balance of DSB repair pathway choice toward HDR. Best results were obtained by the combined use of both systems, increasing the HDR/NHEJ ratio 14.9-fold.

### RESULTS

#### DSB Repair Modification by Cas9 Fusion Proteins

To quantitatively determine CRISPR/Cas9-induced DSB repair by HDR or NHEJ, we used a traffic light reporter (TLR) construct, integrated into the AAVS1 locus of human HEK293 cells (HEKTLR) as previously described (Chu et al., 2015). Briefly, the reporter cassette includes a CAG promoter for expression of a non-functional coding region for yellow fluorescent (Venus) protein, disrupted by the replacement of codons 117–152 with a 23 bp gRNA target sequence from the mouse Rosa26 locus (sgRosa26), followed by a P2A peptide and the coding region for a red fluorescent (TagRFP) protein in a reading frame shifted by 2 bp (**Figure 2A**). If an intact Venus coding sequence is provided as a template and the repair of DSBs occurs via the HDR pathway, the reporter cells are detected by the expression of Venus. CRISPR/Cas9-induced DSBs in the target region that are repaired via NHEJ and acquire Indels that shift translation into the frame (+2) of P2A-RFP are detectable by the expression of RFP in reporter cells. Analysis by the inDelphi tool (Shen et al., 2018) for the repair of the Rosa26 target site in HEK293 cells predicts a frequency of 16% of products in the +2 frame. Therefore, Venus positive reporter cells represent all HDR events, but the number of RFP positive cells in a sample indicates only a fraction of NHEJ repair events. For the assessment of HDR modifiers we constructed N- or C-terminal fusions of Cas9 with the coding regions of human MRE11A, CtIP (wildtype or the phosphomimetic T847E mutant), RPA1, Rad51C, or Rad52 separated by a flexible linker of 16 residues (**Figure 2B**). To monitor the effects of Cas9 fusions on DSB repair pathways, we co-transfected HEKTLR cells with plasmids expressing either Cas9 or Cas9 fusions, a vector for expression of sgRosa26 and a Blasticidine resistance gene together with the donor plasmid (pTLR-repair) for repair of the defective Venus reporter gene (**Figure 2C**).
The transfected cells were enriched by Blasticidine selection and the frequency of Venus<sup>+</sup> and RFP<sup>+</sup> cells was analyzed 4 days later by flow cytometry (**Supplementary Figure 1**) in 4 independent samples. The results were used to calculate mean values and standard deviation. The ratio of Venus<sup>+</sup> versus RFP<sup>+</sup> cells is used as a relative index for DSB repair of the reporter by HDR or by NHEJ events resulting in the +2 reading frame. As shown in **Figure 2D**, upon expression of Cas9 we observed 0.95% of Venus<sup>+</sup> and 7.55% of RFP<sup>+</sup> cells, a ratio of 0.13 (sample 1). The expression of either an N- or a C-terminal fusion protein of Cas9 with Rad52 led to an increase of Venus<sup>+</sup> and a decrease of RFP<sup>+</sup> cells, shifting the Venus/RFP ratio to values of 0.35 and 0.42, respectively. The expression of an N- or C-terminal fusion protein of Cas9 with MRE11A led to a higher increase of Venus<sup>+</sup> cells but a lower decrease of RFP<sup>+</sup> cells (samples 7 and 8), exhibiting Venus/RFP ratios of 0.35 and 0.37, respectively. The expression of a C-terminal fusion of Cas9 with CtIP or its N-terminal fusion with the CtIP (T847E) mutant (samples 5 and 6) increased the level of Venus<sup>+</sup> cells up to 2.6%, showing Venus/RFP ratios of 0.42 and 0.53, respectively. In contrast to Rad52, MRE11A, and CtIP, the Cas9-RPA1 fusion protein (sample 4) led to a smaller shift of the Venus/RFP ratio (0.22). As compared to the control (Cas9, sample 1), the increase of Venus<sup>+</sup> cells and the decrease of RFP<sup>+</sup> cells in samples 2–8 were significantly different (p < 0.05). Only the expression of the Cas9-Rad51C fusion (sample 9) resulted in levels of Venus<sup>+</sup> cells and RFP<sup>+</sup> cells that were not significantly different from the control (sample 1). These results show that Cas9 fusions with multiple proteins of the HDR pathway, specifically CtIP, MRE11A and Rad52, can be used to stimulate DSB repair by HDR up to 2.7-fold.
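The two summary quantities used in this section, the Venus/RFP ratio and the fold increase in Venus<sup>+</sup> cells over the Cas9 control, follow directly from the reported percentages. The short Python sketch below reproduces that arithmetic from the values quoted in the text (0.95% Venus<sup>+</sup> and 7.55% RFP<sup>+</sup> for Cas9, and up to 2.6% Venus<sup>+</sup> for the CtIP fusions); the function names are illustrative and not part of the authors' analysis.

```python
def venus_rfp_ratio(venus_pct, rfp_pct):
    """Relative index of HDR versus +2-frame NHEJ repair of the TLR reporter."""
    if rfp_pct <= 0:
        raise ValueError("RFP+ percentage must be positive")
    return venus_pct / rfp_pct


def fold_change(value, control):
    """Fold change of a measurement relative to the control sample."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return value / control


# Percentages quoted in the text for the Cas9 control (sample 1).
control_ratio = venus_rfp_ratio(venus_pct=0.95, rfp_pct=7.55)
# Highest Venus+ level quoted for the CtIP fusions versus the control.
hdr_fold = fold_change(value=2.6, control=0.95)

print(f"Venus/RFP ratio (Cas9): {control_ratio:.2f}")
print(f"Fold increase in Venus+ cells: {hdr_fold:.1f}")
```

Because only a fraction of NHEJ products fall into the +2 frame, the ratio is a relative index for comparing samples rather than an absolute HDR/NHEJ frequency.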

#### Cas9 Fusion Proteins in 53BP1 Knockout HEKTLR Cells

The 53BP1 key regulator of NHEJ as well as its interaction partners Rif1 and the Shieldin complex counteract DSB end resection. In order to elucidate the role of 53BP1 in the TLR assay with Cas9 fusion proteins we generated reporter cells harboring frameshift knockout alleles of the 53BP1 gene (HEKTLR/Δ53BP1) using CRISPR-Cas9 (**Supplementary Figure 2A**). HEKTLR and HEKTLR/Δ53BP1 reporter cells were transfected with expression vectors for Cas9 or Cas9 fusions, the sgRosa26 and a BFP reporter together with pTLR-repair (**Figure 3A**). The BFP<sup>+</sup> population representing the transfected cells was analyzed 4 days later by flow cytometry (**Supplementary Figure 2B**) to determine the frequency of Venus<sup>+</sup> and RFP<sup>+</sup> cells. The results were used to calculate mean values shown in bar plots with standard deviation. As shown in **Figure 3B**, in HEKTLR cells with wildtype alleles of 53BP1 the expression of Cas9-MRE11A, -Rad52, or -CtIP led to a statistically significant (p < 0.05) increase of Venus<sup>+</sup> cells and a decrease of RFP<sup>+</sup> cells, as observed previously (**Figure 2D**). In contrast, the transfection of Cas9 into 53BP1 deficient HEKTLR/Δ53BP1 cells led to twofold higher levels of Venus<sup>+</sup> cells and a more than threefold reduction of RFP<sup>+</sup> cells to 1.7% (**Figure 3C**, sample 1), as compared to 6.73% RFP<sup>+</sup> in HEKTLR cells (**Figure 3B**, sample 1). As compared to HEKTLR cells, the transfection of HEKTLR/Δ53BP1 cells with Cas9-MRE11A, -Rad52, and -CtIP wildtype fusion proteins did not significantly increase the levels of Venus<sup>+</sup> cells (samples 2–4), whereas the levels of RFP<sup>+</sup> cells fell below 2% and were not significantly different from the control (sample 1). Only the expression of the Cas9-CtIP(T847E) fusion protein led to a significant increase of Venus<sup>+</sup> cells.
These results suggest that in wildtype HEKTLR cells Cas9 fusion proteins with MRE11A, Rad52, or CtIP efficiently counteract the inhibitory action of 53BP1 on DSB end resection, leading to increased HDR repair. Nevertheless, Cas9 fusion proteins interfere only weakly with the NHEJ promoting activity of 53BP1, since the numbers of RFP<sup>+</sup> cells in wildtype HEK cells were only moderately suppressed, but strongly reduced in 53BP1 knockout cells. In conclusion, we found that Cas9 fusions with MRE11A, Rad52, and CtIP are useful tools for increasing HDR mediated gene editing but only weak suppressors of NHEJ. Thus, experimental conditions for the concurrent increase of HDR together with NHEJ suppression would require an additional, active inhibition of 53BP1.

### Targeting of the Beta-2 Microglobulin Gene in HEK Cells

In addition to the TLR reporter construct, we assessed the effect of Cas9 fusion proteins on targeting a red fluorescent mCherry reporter gene into the last exon of the beta-2 microglobulin (B2M) gene of HEK cells. For targeting of the B2M locus we used a sgRNA against B2M (sgB2M) and a repair template vector for HDR (B2M-donor) that leads to the insertion of a 2A peptide and the mCherry coding region upstream of the B2M stop codon (**Figure 4A**). HEK cells were transfected with expression vectors for Cas9 or Cas9 fusions, sgB2M and the B2M donor vector (**Figure 4B**). The transfected cell population was analyzed 4 days later by flow cytometry for the presence of Cherry<sup>+</sup> cells (**Supplementary Figure 3**). The results were used to calculate mean values of 4 independent samples with standard deviation. As shown in **Figure 4C**, the transfection with Cas9, sgB2M, and B2M donor resulted in 6.4% Cherry<sup>+</sup> cells. The use of expression vectors for the Cas9-MRE11A or Cas9-Rad52 fusion protein moderately but significantly (p < 0.05) increased the frequency of Cherry<sup>+</sup> cells to 8.1 or 10.9%, whereas the expression of the Cas9-CtIP or Cas9-CtIP(T847E) fusion protein more than doubled the number of Cherry<sup>+</sup> cells to 18.4 or 19.5%, respectively (**Figure 4C**). These results show that in particular the Cas9-CtIP fusion proteins are able to support HDR at the endogenous B2M target locus in HEK cells.

FIGURE 2 | DSB repair modification by Cas9 fusion proteins. (A) The traffic light reporter (TLR) construct indicates DSB repair by NHEJ or HDR and was integrated into the AAVS1 locus of HEK cells (HEKTLR). Upon induction of a DSB in the defective Venus coding region, RFP is expressed upon NHEJ repair resulting in deletions that shift translation by 2 bp. Venus expression occurs upon HDR with a repair template vector that includes the intact Venus coding region (pTLR-repair). (B) Vectors for the expression of Cas9 or fusion proteins between the N- or C-terminal end of Cas9 and Rad52, RPA1, CtIP wildtype (wt), or the T847E mutant (mut), Mre11A or Rad51C, driven by the CBh promoter. pA, polyadenylation signal. (C) For DSB repair assays HEKTLR reporter cells were cotransfected with expression vectors for Cas9 or Cas9 fusion proteins, sgRosa26, blasticidin resistance and pTLR-repair. After 4 days of blasticidin selection the samples were analyzed by FACS for the presence of Venus and RFP positive cells. (D) Bar graph representation of Venus and RFP positive cells, indicating HDR or NHEJ repair of the TLR reporter, upon transfection with an expression vector for Cas9, or Cas9 in C-terminal (Cter) or N-terminal (Nter) fusion with Rad52, RPA1, CtIP wildtype (wt), CtIP mutant T847E (mut), MRE11A, or Rad51C. Bars show mean values of three independent samples with standard deviation. These values were used to calculate the ratio of Venus<sup>+</sup> to RFP<sup>+</sup> cells and p-values (T-test) to determine the significance in levels of Venus<sup>+</sup> or RFP<sup>+</sup> cells between samples 1 and 2–9. n.s., not significant, p > 0.05.

### Association of CtIP and Cas9 via the MS2 and SunTag Systems

Besides Cas9-CtIP fusion proteins we further assessed the MS2 and the SunTag system as alternative molecular configurations for the association of CtIP to Cas9 induced DSBs (**Figures 1C,D**). In a first approach we fused CtIP with a monomer of the MS2 bacteriophage coat protein (MS2-CtIP), which binds as a dimer to an MS2-derived 34 nt RNA aptamer sequence motif that can be included in the tetraloop and stem loop of sgRNAs, as described (Konermann et al., 2015; **Supplementary Figures 4A,B**). Since CtIP must associate via its N-terminus into a tetramer to become functional, the binding of two closely neighbored MS2-CtIP molecules at each MS2-sgRNA loop (**Supplementary Figure 4B**) may be inefficient or interfere with its tetramerization. Therefore, we also constructed a fusion of CtIP with a single chain MS2 dimer (MS2di-CtIP) that binds to the MS2 aptamer RNA as a monomer (Peabody and Lim, 1996; **Supplementary Figure 4C**). In a second approach Cas9 was fused with 10 repeats of a GCN4-derived peptide motif (Cas9-SunTag) (Tanenbaum et al., 2014) that is recognized by a high affinity single-chain antibody, designated here as SunLigand (SunL) (**Supplementary Figure 4D**). For the association of CtIP with Cas9-SunTag we constructed an expression vector for a SunL-CtIP(T847E) fusion protein. For these assays we used HEK reporter cells harboring in the AAVS1 locus a modified TLR reporter construct (HEKTLR6) that includes a Venus coding region disrupted by the replacement of codons 95–97 with the same Rosa26 derived target sequence used in the TLR reporter. Since HEKTLR6 cells are homozygous for the reporter construct, a small population of double positive cells appears upon DSB induction, undergoing HDR repair on one reporter allele and a mutagenic NHEJ event on the other. For DSB repair assays the number of single positive Venus and RFP positive cells was determined, excluding double positive cells. For repair of the TLR6 reporter we used a template vector (pTLR-donor) that includes the Venus coding sequence, excluding the start codon to prevent background expression. HEKTLR6 cells were cotransfected with plasmids for expression of Cas9, Cas9-CtIP or Cas9-SunTag driven by the CAG promoter together with pTLR-donor and the Rosa26 specific sgRNA including two MS2 aptamer sequences [sgRosa26(MS2)]. Three days after transfection the frequency of Venus<sup>+</sup> and RFP<sup>+</sup> cells from triplicate samples was determined by flow cytometry and the mean values and standard deviations were calculated (**Figure 5A**). The ratio of Venus<sup>+</sup>/RFP<sup>+</sup> cells is used as an index for DSB repair choice by HDR or NHEJ. Of note, this relative value represents the ratio of all reporter HDR events to only a fraction of NHEJ mediated Indels that reconstitute the RFP reading frame.

As shown in **Figure 5B**, the transfection of Cas9, sgRosa26(MS2) and pTLR-donor resulted in 5.6% Venus and 11.4% RFP cells (sample 2; Venus/RFP ratio: 0.5), whereas a control without pTLR-donor (sample 1) showed a Venus background of 0.4%. The expression of Cas9 and free CtIP (sample 3) yielded 6.8% Venus and 12.2% RFP cells, showing a similar ratio (0.56) as the sample with Cas9 alone. The expression of MS2-CtIP(T847E) led to a moderate but statistically significant increase of Venus<sup>+</sup> cells with a ratio of 0.54 (sample 4). In contrast to MS2-CtIP(T847E), the expression of MS2di-CtIP(T847E) strongly increased (p = 0.0018) the Venus<sup>+</sup> cells to 14.6% and reduced the RFP<sup>+</sup> cells (p = 0.018) to 8.3%, resulting in a Venus/RFP ratio of 1.76 (sample 5). In contrast to MS2di-CtIP, the expression of Cas9-SunTag and SunL-CtIP did not increase the Venus<sup>+</sup> cells (sample 6, ratio: 0.82), as compared to the sample with free CtIP. These results show that in fusion with CtIP only the MS2di, but not the MS2 or the SunTag configuration, leads to a strong shift of the HDR/NHEJ balance upon DSB repair of the reporter. Since the MS2 and SunTag systems assemble two or more CtIP molecules in close proximity, these configurations may interfere with CtIP oligomerization and the stimulation of end resection.

FIGURE 5 | DSB repair modification by MS2-CtIP and SunL-CtIP fusion proteins in HEKTLR6 reporter cells. (A) Expression vectors for Cas9 or Cas9-SunTag, sgRosa26(MS2), pTLR-donor and CtIP, MS2-CtIP, MS2di-CtIP, or SunL-CtIP were cotransfected into HEKTLR6 cells and the frequency of Venus<sup>+</sup> cells (green columns) and RFP<sup>+</sup> cells (red columns), indicating DSB repair by HDR or NHEJ, was determined by flow cytometry (FACS) 72 h after transfection. (B) Transfection results are shown as bar graphs representing mean values with standard deviation of triplicate samples. Plasmids selected (+) for individual transfection samples are indicated in the table below. Mean values were used to calculate the ratio of Venus<sup>+</sup> to RFP<sup>+</sup> cells and p-values (T-test) to determine the significance in levels of Venus<sup>+</sup> or RFP<sup>+</sup> cells between samples 2 and 3–6 (table bottom) and to compare samples 4 and 5 (top horizontal lines). n.s., not significant, p > 0.05. Two independent replicates of the assay were performed that confirmed the results.
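The Venus/RFP index used for Figure 5B is a simple quotient of the two population frequencies. As a minimal sketch (Python; the function name `venus_rfp_ratio` and the dictionary layout are illustrative assumptions, not part of the original analysis), the ratios for samples 2, 3, and 5 can be recomputed from the percentages quoted in the text:

```python
def venus_rfp_ratio(venus_pct: float, rfp_pct: float) -> float:
    """Return the Venus+/RFP+ ratio used as an index of HDR versus NHEJ repair."""
    if rfp_pct <= 0:
        raise ValueError("RFP+ frequency must be positive")
    return venus_pct / rfp_pct

# Percentages of Venus+ and RFP+ cells as quoted in the text for Figure 5B.
reported = {
    "sample 2 (Cas9)": (5.6, 11.4),
    "sample 3 (Cas9 + free CtIP)": (6.8, 12.2),
    "sample 5 (Cas9 + MS2di-CtIP(T847E))": (14.6, 8.3),
}

ratios = {name: round(venus_rfp_ratio(v, r), 2) for name, (v, r) in reported.items()}
for name, ratio in ratios.items():
    print(f"{name}: Venus/RFP = {ratio}")
```

Because the reporter detects only a fraction of NHEJ events, the index is best read as a relative measure for comparing samples rather than as an absolute estimate of pathway usage.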

### Comparison of Cas9-CtIP, MS2di-CtIP, and SunL-CtIP Fusion Proteins

Next, we compared the performance of the Cas9-CtIP fusion protein with the MS2di-CtIP and SunL-CtIP systems in HEKTLR6 cells to determine which approach is most effective for HDR stimulation. For a side-by-side comparison we transferred the Cas9-CtIP coding region into the same vector used for Cas9 expression in MS2di-CtIP assays that includes a CAG promoter region and the sgRosa26(MS2) expression cassette (**Figure 6A**). As shown in **Figure 6B**, the transfection of Cas9, sgRosa26(MS2), and pTLR-donor resulted in 3.6% Venus and 9.2% RFP cells (sample 2; ratio: 0.38), whereas a control without pTLR-donor (sample 1) showed a Venus background of 0.7%. The expression of MS2di-CtIP (sample 3) yielded 12.4% Venus and 7.2% RFP cells, showing a Venus/RFP ratio of 1.7. The expression of the Cas9-CtIP fusion led to a further increase of the Venus/RFP ratio to 2.31, although the absolute levels of Venus and RFP cells were decreased (sample 4). Interestingly, the combined expression of Cas9-CtIP and MS2di-CtIP led to a further decrease of RFP<sup>+</sup> cells to 2.1% (sample 5), resulting in an increased ratio of 5.67. The expression of Cas9-SunTag and SunL-CtIP led to a Venus/RFP ratio of 1.62, but lower levels of Venus<sup>+</sup> and RFP<sup>+</sup> cells as compared to the sample with MS2di-CtIP. These results show that both Cas9-CtIP and MS2di-CtIP shift the ratio of HDR/NHEJ repair of the reporter construct. Best results were obtained when both systems were combined, leading to a significant decrease of RFP<sup>+</sup> cells (p < 0.05) as compared to the use of Cas9-CtIP or MS2di-CtIP alone and shifting the HDR/NHEJ balance of DSB repair at the reporter by a factor of 14.9 relative to Cas9 alone (Venus/RFP ratio of 5.67 versus 0.38).
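The factor of 14.9 follows from dividing the Venus/RFP ratio of the combined sample by that of the Cas9-only sample. A short sketch of this arithmetic is given here (Python; `fold_shift` is a hypothetical helper name, and the inputs are the ratios quoted in the text for Figure 6B):

```python
def fold_shift(ratio_treated: float, ratio_control: float) -> float:
    """Fold change of the Venus/RFP ratio relative to a control sample."""
    if ratio_control <= 0:
        raise ValueError("control ratio must be positive")
    return ratio_treated / ratio_control

# Venus/RFP ratios quoted in the text (Figure 6B).
CAS9_ONLY = 0.38   # sample 2
MS2DI_CTIP = 1.7   # sample 3
CAS9_CTIP = 2.31   # sample 4
COMBINED = 5.67    # sample 5

shifts = {
    "MS2di-CtIP": fold_shift(MS2DI_CTIP, CAS9_ONLY),
    "Cas9-CtIP": fold_shift(CAS9_CTIP, CAS9_ONLY),
    "Cas9-CtIP + MS2di-CtIP": fold_shift(COMBINED, CAS9_ONLY),
}
for name, value in shifts.items():
    print(f"{name}: {value:.1f}-fold")
```

Expressing each configuration relative to the same Cas9-only baseline makes the single and combined systems directly comparable within one experiment.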

## DISCUSSION

For the improvement of HDR and precise gene editing at Cas9 induced DSBs we compared three approaches for associating HDR effector proteins with the nuclease or guide RNA. Firstly, we tested the ability of Cas9/HDR effector fusion proteins and found that CtIP but also Rad52 and MRE11A are effective for HDR stimulation, counteracting the inhibition of end resection by 53BP1 without strong suppression of NHEJ. Using CtIP as paradigm we compared the Cas9-CtIP fusion to methods based on anchoring of SunL-CtIP to a Cas9-SunTag protein or MS2-CtIP fusions to gRNA that includes MS2 recognition sequences. Our results show that the performance of the single chain MS2di-CtIP fusion was comparable to the Cas9-CtIP direct fusion and that the combination of both proteins had an additive effect on NHEJ suppression, shifting the HDR/NHEJ repair of the reporter construct by a factor of 14.9. Our findings provide simple and effective tools to promote precise gene modifications in mammalian cells.

Previous studies on HDR stimulation rarely explored the association of HDR effectors with Cas9/gRNA complexes, except for the work of Charpentier et al. (2018), which described the N-terminal fusion of CtIP with Cas9. That study found that CtIP-Cas9 increased HDR at the AAVS1 locus in human cells twofold without reducing Indel formation and that the N-terminal oligomerization domain of CtIP is sufficient for HDR stimulation. Our results extend the analysis of Cas9 fusion proteins to the HDR effectors MRE11A and Rad52, confirm the previous finding for the N-terminal fusion, and show that the C-terminal fusion of CtIP to Cas9 also stimulates HDR. In addition, we validated the MS2 aptamer system as a useful tool for HDR stimulation by the association of CtIP with sgRNA loops. In fusion with CtIP the MS2di single chain configuration was more effective than MS2 monomers that require dimer formation for binding to the MS2 RNA aptamer. Whereas the MS2 tagged sgRNA provides two binding sites for MS2 fusion proteins, the SunTag system has previously shown utility for signal amplification by binding of up to 24 SunL-GFP molecules to a Cas9-SunTag fusion protein (Tanenbaum et al., 2014). In fusion with CtIP and with Cas9-SunTag providing 10 SunL binding sites, we found the SunTag approach was less effective for HDR stimulation than the MS2 system, which might be caused by an interference with CtIP oligomerization in the presence of multiple localized molecules. A similar effect may account for the improved performance of the MS2di single chain dimer as compared to the use of MS2-CtIP monomers. Nevertheless, it remains to be investigated whether MS2 monomers or the SunTag system provide valid options for the use of other HDR effectors which do not form oligomers.

Remarkably, we observed that the combined use of Cas9-CtIP and MS2di-CtIP had a cumulative effect on NHEJ suppression but not on HDR stimulation, pointing to the enhanced counteraction against the initiation of NHEJ through 53BP1 by localization of three instead of two or a single CtIP molecule at the Cas9/sgRNA complex. This observation may allow further improvements of the toolbox for manipulation of the DSB repair pathway choice by the assembly of two cooperative HDR effector proteins at the DSB site, such as CtIP together with Rad52, MRE11A or other factors.

FIGURE 6 | Comparison of Cas9-CtIP, MS2di-CtIP, and SunL-CtIP fusion proteins. (A) Expression vectors for Cas9, Cas9-CtIP, or Cas9-SunTag, sgRosa26(MS2), pTLR-donor, and MS2di-CtIP or SunL-CtIP were cotransfected into HEKTLR6 cells and the frequency of Venus<sup>+</sup> cells (green columns) and RFP<sup>+</sup> cells (red columns), indicating DSB repair by HDR or NHEJ, was determined by FACS analysis 72 h after transfection. (B) Transfection results are shown as bar graphs representing mean values ± standard deviation of triplicate samples. Plasmids selected (+) for individual transfections are indicated as in the table below. Mean values were used to calculate the ratio of Venus<sup>+</sup> to RFP<sup>+</sup> cells and p-values (T-test) to determine the significance in levels of Venus<sup>+</sup> or RFP<sup>+</sup> cells between samples 5 and 3 or 4 (top horizontal lines). Two independent replicates of the assay were performed that confirmed the results.

Furthermore, it may be possible to combine the N-terminal CtIP HDR enhancer domain with one or more active domains from other effectors into multifunctional fusions that are able to bypass rate limiting steps of HDR initiation or HDR processing. Besides the DSB repair pathway choice, the availability of donor templates at DSBs represents a limiting factor for the completion of HDR. It has been previously shown that HDR can be enhanced by the co-localization of donor templates and Cas9 via covalent linkage, using the SNAP-tag in fusion with Cas9, which couples to oligodeoxynucleotides modified with benzylguanine (Savic et al., 2018). Therefore, it will be interesting to explore whether the co-localization of donor templates and of HDR effectors at DSBs by using Cas9 fusion proteins together with the MS2 aptamer system has a synergistic effect on HDR enhancement.

In summary, here we employed a TLR reporter system for the assessment of DSB repair by HDR and NHEJ in HEK cells that we had used earlier to validate DNA Ligase IV as a target for NHEJ suppression (Chu et al., 2015). Although we expect that HDR enhancement at the TLR reporter system is predictive for other genomic targets, as previously shown, and we indeed confirmed the effect of Cas9-CtIP at the B2M gene, the Cas9 fusion and the MS2 aptamer approach will require further validation using additional target genes and other cell types. For reporting NHEJ activity, the present configuration of the TLR construct has a restriction since it reports only the fraction of NHEJ repair events that restore the RFP reading frame. To increase the utility of TLR reporter constructs in future, modified constructs are required that report NHEJ repair products occurring in all three reading frames. Furthermore, we did not discriminate between DSB repair choice in the G1 and S/G2 cell cycle phases. We anticipate that HDR enhancement occurs primarily in the S/G2 phases in which the HDR pathway usually operates. It has been shown that HDR can be at least partially reactivated in G1 by the combined suppression of 53BP1 and the expression of a degradation resistant Palb2 mutant together with the phosphomimetic CtIP (T847E) mutant (Orthwein et al., 2015). Thus, it will be interesting to determine the effect of Cas9 and MS2 fusions with HDR effectors specifically on DSB repair in the G1 phase and whether the current protocol or further modifications will enable HDR mediated DSB repair in the G1 phase or even in resting cells.
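The reading-frame restriction of the TLR construct noted in this paragraph can be made concrete: an indel leads to RFP expression only if its net length change falls into one of the three possible frame classes. The following sketch (Python) assumes, for illustration only, that the reporting class is a net shift of 2 modulo 3, following the Figure 2 legend; the sign convention and the helper name `reports_rfp` are assumptions rather than details taken from the construct.

```python
REPORTING_SHIFT = 2  # assumed frame class (net length change modulo 3) that places RFP in frame

def reports_rfp(net_length_change: int) -> bool:
    """True if an indel with this net length change (bp; negative = deletion)
    falls into the frame class assumed to restore RFP translation."""
    return net_length_change % 3 == REPORTING_SHIFT

# Hypothetical indel sizes, chosen only to illustrate the three frame classes.
for indel in (-4, -3, -2, -1, 1, 2, 3):
    status = "RFP+" if reports_rfp(indel) else "not reported"
    print(f"{indel:+d} bp: {status}")
```

Under this assumption two of the three frame classes go undetected, which is why the Venus/RFP ratio compares all HDR events with only a subset of NHEJ events.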

In the current format of using plasmid-based expression vectors we expect that the CtIP co-localization approach can readily support applications of precise gene editing in human cell lines, such as modeling or correcting disease-causing mutations by cotransfection with Cas9/sgRNA vectors. In combination with recombinant proteins its applications may be extended in future to primary human cells such as hematopoietic stem cells, muscle satellite or other stem cells exhibiting sufficient basal levels of HDR, to assess its utility for the precise correction of mutations required for somatic gene therapy. Since DSB repair mechanisms are conserved in evolution, we expect that the co-localization approach using CtIP or other HDR effectors can also be applied for HDR enhancement in other species.

#### MATERIALS AND METHODS

#### Plasmid Constructions

In the first set of experiments (**Figures 2–4**) we used Cas9 fusion vectors that were constructed based on the pX330 vector backbone (Addgene ID 42230), including a CBh promoter for protein expression. First, an oligonucleotide encoding the 16 mer flexible linker (SGSETPGTSESATPES) and multiple cloning sites (PacI and SalI) was cloned in frame into either the N- or C-terminal end of Cas9. Then, cDNAs of DNA repair factors (RAD51, RAD52, CtIP, CtIP\_mutant, MRE11A, NBN, RPA1, and RAD50) were cloned between the PacI and SalI sites by standard PCR cloning methods. All primers for cloning are listed in **Table 1**. sgRNAs against 53BP1 were purchased as separate oligonucleotides, phosphorylated, annealed, and cloned into pX330 by standard cloning techniques. pTLR-repair (Addgene 64322) was used as repair template vector for DSB repair assays in HEKTLR reporter cells (Chu et al., 2015). In the second set of experiments (**Figures 5**, **6**) we used pU6Rosa-CAG-Cas9 for expression of Rosa26 sgRNA and of Cas9 from the CAG promoter, constructed by ligation of oligonucleotides sgRosa-A/-B into the BbsI sites downstream of a human U6 promoter in plasmid pU6(BbsI). The U6-sgRosa cassette was recovered as an AscI fragment and inserted into pCAG-Cas9-bpA-EF1-BFP, upstream of the CAG promoter driving Cas9 expression, followed by a BFP coding region under control of the human EF1α promoter. For cloning of pCAG-Cas9-CtIP a SphI fragment was isolated from pX330-CBh-Cas9-CtIP and ligated between the SphI sites of pCAG-Cas9-bpA-EF1-BFP. Plasmid pTLR-donor was generated by whole plasmid PCR amplification using 5′-phosphorylated primers TLRtv-1 and TLRtv-2 and pTLR-repair as template, followed by ligation of the PCR fragment. The modification of pTLR-repair removes the start codon of the Venus coding region, eliminating background fluorescence upon transient transfection.
Plasmid pU6Rosa-CAG-CtIP(T847E)-EF1-BFP was generated by cloning of a PCR product, amplified with the primers CtIP-for/-rev using as template pCW-GFP-CtIP-T847E (Addgene ID 71111), into the PacI/NotI sites of pU6Rosa-CAG-Pac-Not-EF1BFP. Plasmid pU6Rosa(MS2) was cloned by ligation of oligonucleotides sgRosa-A/-B into the BbsI sites downstream of a human U6 promoter in plasmid sgRNA(MS2) (Addgene ID 61424). The U6Rosa(MS2) cassette was recovered by PCR (primers U6MS2-for/-rev) and cloned in between the AscI sites of pU6Rosa-CAG-CtIP(T847E)-EF1-BFP, replacing the U6Rosa cassette, to derive plasmid pU6Rosa(MS2)-CAG-CtIP(T847E)-EF1-BFP. The latter plasmid was digested with PacI upstream of the CtIP coding region for the in-frame insertion of the MS2 coat protein coding region, amplified with primers MS2-for/-rev from plasmid MS2-P65-HSF1\_GFP (Addgene ID 61423), to complete the pU6Rosa(MS2)-CAG-MS2-CtIP(T847E)-EF1-BFP vector. Alternatively, an 875 bp synthetic gene fragment from pMS2-dimer (Thermo Scientific) encoding a single chain dimer of the MS2 protein (Peabody and Lim, 1996) was cloned into the PacI site of pU6Rosa(MS2)-CAG-CtIP(T847E)-EF1-BFP to obtain the pU6Rosa(MS2)-CAG-MS2di-CtIP(T847E)-EF1-BFP vector. The SunLigand single chain antibody coding region was amplified using primers SunLigand-for/-rev from pHRdSV40-scFv-GCN4-sfGFP-VP64-GB1-NLS (Addgene ID 60904) and cloned into the PacI and AfeI sites of pMS2-dimer to derive plasmid pSunLigand. Next, the SunLigand coding region was recovered by PCR (primers SunLigand-for, MS2-rev) and cloned into the PacI site of pU6Rosa-CAG-CtIP(T847E)-EF1-BFP in frame with CtIP to derive pU6Rosa-CAG-SunL-CtIP-EF1-BFP. Plasmid pCAG-Cas9-SunTag-bpA was cloned by ligation of a 1483 bp PCR product, amplified from pHRdSV40-dCas9-10xGCN4\_v4-P2A-BFP (Addgene ID 60903) using primers Bsmup and SunTag, into the BsmI/MluI sites of pCAG-Cas9v3abpA.
pAAVS1-TLR6 was constructed by cloning of a fusion PCR product from a PCR fragment (using primers TLRvenus-1 and TLR6-1) and a PCR fragment (using primers TLR6-2 and TagRFP-2), using pAAVS1-TLR donor (Addgene ID 64215) (Chu et al., 2015) as template, into the backbone of plasmid pCAGvenusTarget+1P2A+3TagRFP (opened with PacI and MluI), resulting in pCAG-TLR6. pCAG-TLR6 was used for isolation of an AscI-AsiSI fragment that was ligated into pAAVS1-TLR donor, resulting in pAAVS1-TLR6, which serves as AAVS1 targeting vector with the TLR6 reporter insert.

#### Generation of HEK Reporter Cells

The generation of HEKTLR cells was previously described (Chu et al., 2015). Cells were cultured in DMEM (Gibco) supplemented with 10% FBS (Gibco), 2 mM sodium pyruvate (Gibco), 2 mM L-glutamine (Gibco), and 1× NEAA (Gibco). Cells were maintained in the exponential phase. Cells transfected with a blasticidin resistance vector were selected with 10 µg/ml of blasticidin for 4 days before analysis. The HEKTLR6 line was maintained in Dulbecco's Modified Eagle's medium with Glutamax (Gibco) supplemented with 10% fetal bovine serum (Gibco) and was generated by transfection of pAAVS1-TLR6 together with an sgRNA plasmid against AAVS1 and a Cas9 plasmid (750 ng each) using X-tremeGENE transfection reagent (Roche). Antibiotic selection was performed using 0.4 µg/ml puromycin. Single clones were generated and genotyped using genomic DNA isolated with the Wizard genomic DNA purification kit (Promega #A1125). PCR was performed for knock-in of the TLR6 construct (puro 5′) as well as for the AAVS1 WT locus as a control. The PCR for the TLR knock-in was performed using primers ST\_puro\_gt\_fw and ST\_puro\_gt\_rv, Phusion HF DNA polymerase (NEB #M0530L) and 200 ng genomic DNA under the following conditions: initial denaturation at 98°C for 3 min, followed by 35 cycles of 98°C for 30 s, 58°C for 30 s and 72°C for 90 s, and a final extension at 72°C for 5 min. The AAVS1 WT PCR was performed using primers hAAVS1-For and hAAVS1-Rev with an initial denaturation at 98°C for 5 min, followed by 40 cycles of 98°C for 30 s, 60°C for 30 s and 72°C for 45 s, and a final extension at 72°C for 5 min. After confirmation of TLR insertion by genotyping, selected clones were tested for activity of the TLR allele by a FACS-based assay and a single clone was chosen for further assays. All cell lines were confirmed for the absence of mycoplasma using the PCR assay of Uphoff and Drexler (2002).

TABLE 1 | Oligodeoxynucleotides.
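The two genotyping PCR programs described in this section can be written down as structured data, which makes the cycling parameters easier to compare. The sketch below (Python) encodes the temperatures and times given in the text; the class and variable names are illustrative, and the computed run time ignores ramping between steps.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    temp_c: int    # temperature in degrees Celsius
    seconds: int   # hold time in seconds

@dataclass(frozen=True)
class PcrProgram:
    initial_denaturation: Step
    cycles: int
    cycle_steps: tuple  # denaturation, annealing, extension
    final_extension: Step

    def hold_time_seconds(self) -> int:
        """Total programmed hold time, excluding temperature ramping."""
        per_cycle = sum(step.seconds for step in self.cycle_steps)
        return (self.initial_denaturation.seconds
                + self.cycles * per_cycle
                + self.final_extension.seconds)

# TLR6 knock-in PCR (primers ST_puro_gt_fw / ST_puro_gt_rv)
tlr_knockin = PcrProgram(Step(98, 180), 35,
                         (Step(98, 30), Step(58, 30), Step(72, 90)),
                         Step(72, 300))

# AAVS1 wild-type locus PCR (primers hAAVS1-For / hAAVS1-Rev)
aavs1_wt = PcrProgram(Step(98, 300), 40,
                      (Step(98, 30), Step(60, 30), Step(72, 45)),
                      Step(72, 300))

for name, program in (("TLR knock-in", tlr_knockin), ("AAVS1 WT", aavs1_wt)):
    print(f"{name}: {program.hold_time_seconds() / 60:.1f} min")
```

Keeping both programs in one structure highlights the differences between them: annealing temperature, extension time, and cycle number.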

#### Targeting of the 53BP1 Gene in HEKTLR Cells

sgRNAs against exons 1 and 2 of the 53BP1 gene were designed using the CrispRGold program<sup>1</sup> and cloned into the pX330 vector using the oligonucleotides sgRNA\_53BP1\_exon1.1/-1.2 and \_exon2.1/-2.2. Plasmid px330-sgRNA\_exon1.1 and

<sup>1</sup>https://crisprgold.mdc-berlin.de/



#### DSB Repair Assays


HEKTLR or HEKTLR/Δ53BP1 cells were plated into 6-well plates 1 day before transfection at a density of 1 × 10<sup>5</sup> cells/well. Transfection was carried out using FuGENE HD Reagent (Promega) according to the manufacturer's instructions. Briefly, DNA was diluted in OptiMEM medium (Gibco), then FuGENE was added to the mixture at a ratio of 3:1 (FuGENE:DNA). The mixture was incubated at room temperature for 10 min and then added dropwise to the pre-plated cells. The next day, the medium was changed to stop the reaction. When multiple plasmids were used, they were transfected at the same molar ratio. HEKTLR6 cells were seeded in 24-well plates (50,000 cells per well) 1 day before transfection. Cells were transfected with pU6Rosa-CAG-Cas9, pTLR-donor template and one or two plasmids for the expression of CtIP or Cas9 fusion proteins (375 ng each) using X-tremeGENE transfection reagent according to the manufacturer's recommendation. All samples were transfected in triplicate unless otherwise stated. FACS analysis for all experiments was performed 72 h after transfection. For preparation of cells for FACS analysis the medium was aspirated from the wells and each well was washed with PBS and treated with trypsin for cell detachment. After addition of medium, the cells were collected by centrifugation at 300 g for 4 min and finally resuspended in 500 µl PBS. The cells were kept on ice and FACS analysis was performed immediately. Single cells were gated for BFP positive populations and the frequency of Venus (HDR) and RFP (NHEJ) positive cells was determined using a BD LSR Fortessa flow cytometer (BD Biosciences). The results from replicate wells of each sample were used to calculate the mean value and standard deviation (SD). Mean values and SD were used to calculate p-values (T-test) using the GraphPad Prism 8 software (GraphPad Software Inc., San Diego, CA, United States).
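The summary statistics described for the replicate wells (mean, SD, and a t-test) can be sketched with the Python standard library. The replicate values below are placeholders invented for illustration, not measurements from this study, and the unpaired equal-variance form of the t statistic is an assumption, since the text does not specify the test variant. Converting the statistic into a p-value requires the t distribution, which GraphPad Prism or a statistics package would supply.

```python
from math import sqrt
from statistics import mean, stdev, variance

def student_t(sample_a, sample_b):
    """Unpaired two-sample t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df

# Placeholder percentages of Venus+ cells from triplicate wells (hypothetical values).
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]

t_stat, dof = student_t(treated, control)
print(f"control: mean = {mean(control):.2f}, SD = {stdev(control):.2f}")
print(f"treated: mean = {mean(treated):.2f}, SD = {stdev(treated):.2f}")
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

With only three wells per sample the SD estimate is coarse, so a statistic of this kind is best read alongside the independent repeats mentioned in the figure legends.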

#### Synthetic Oligodeoxynucleotides

Synthetic oligonucleotides used in this study are shown in **Table 1**.

#### AUTHOR CONTRIBUTIONS

N-TT, SB, KR, and RK conceived and designed the project. N-TT, JR, and SB acquired the data. N-TT, SB, and RK analyzed and interpreted the data. VTC, XL, and JR provided materials. N-TT, SB, KR, and RK wrote the manuscript.

### FUNDING

This work was supported by the German Ministry of Education and Research within the VIP program (TAL-CUT 03V0261 to RK).

### ACKNOWLEDGMENTS

We thank Hans Peter Rahn (MDC FACS core facility) for excellent service and support and Miguel Rodriguez de los Santos for cloning of the pTLR-donor. Plasmids pCW-GFP-CtIP-T847E (Addgene ID 71111), MS2-P65-HSF1\_GFP (Addgene ID 61423), sgRNA(MS2) cloning backbone (Addgene ID 61424), pHRdSV40-dCas9-10xGCN4\_v4-P2A-BFP (Addgene ID 60903), and pHRdSV40-scFv-GCN4-sfGFP-VP64-GB1-NLS (Addgene ID 60904) were gifts from Daniel Durocher, Feng Zhang, and Ron Vale, obtained via Addgene.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00365/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Tran, Bashir, Li, Rossius, Chu, Rajewsky and Kühn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Efficient and Precise CRISPR/Cas9-Mediated MECP2 Modifications in Human-Induced Pluripotent Stem Cells

*Thi Thanh Huong Le<sup>1†</sup>, Ngoc Tung Tran<sup>2†</sup>, Thi Mai Lan Dao<sup>1</sup>, Dinh Dung Nguyen<sup>1</sup>, Huy Duong Do<sup>1</sup>, Thi Lien Ha<sup>1</sup>, Ralf Kühn<sup>2,3</sup>, Thanh Liem Nguyen<sup>1</sup>, Klaus Rajewsky<sup>2</sup> and Van Trung Chu<sup>2,3</sup>\**

*<sup>1</sup> Department of Gene Technology, Vinmec Research Institute of Stem Cell and Gene Technology, Hanoi, Vietnam, <sup>2</sup> Immune Regulation and Cancer, Max-Delbrück-Center for Molecular Medicine, Berlin, Germany, <sup>3</sup> iPS Cell Based Disease Modeling, Berlin Institute of Health, Berlin, Germany*

#### *Edited by:*

*Kun Xu, Northwest A&F University, China*

#### *Reviewed by:*

*Daniela Tropea, Trinity College Dublin, Ireland; Wei Li, University of Alabama at Birmingham, United States; Robin Ketteler, University College London, United Kingdom*

*\*Correspondence:*

*Van Trung Chu vantrung.chu@mdc-berlin.de*

*†These authors have contributed equally to this work.*

#### *Specialty section:*

*This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics*

*Received: 21 February 2019 Accepted: 17 June 2019 Published: 02 July 2019*

#### *Citation:*

*Le TTH, Tran NT, Dao TML, Nguyen DD, Do HD, Ha LT, Kühn R, Nguyen TL, Rajewsky K and Chu VT (2019) Efficient and Precise CRISPR/Cas9-Mediated MECP2 Modifications in Human-Induced Pluripotent Stem Cells. Front. Genet. 10:625. doi: 10.3389/fgene.2019.00625*

Patients with Rett syndrome (RTT) have severe mental and physical disabilities. The majority of RTT patients carry a heterozygous mutation in methyl-CpG binding protein 2 (MECP2), an X-linked gene encoding an epigenetic factor crucial for normal nerve cell function. No curative therapy for RTT syndrome exists, and cellular mechanisms are incompletely understood. Here, we developed a CRISPR/Cas9-mediated system that targets and corrects the disease relevant regions of the MECP2 exon 4 coding sequence. We achieved homologous recombination (HR) efficiencies of 20% to 30% in human cell lines and iPSCs. Furthermore, we successfully introduced a MECP2R270X mutation into the MECP2 gene in human induced pluripotent stem cells (iPSCs). Consequently, using CRISPR/Cas9, we were able to repair such mutations with high efficiency in human mutant iPSCs. In summary, we provide a new strategy for MECP2 gene targeting that can be potentially translated into gene therapy or for iPSCs-based disease modeling of RTT syndrome.

Keywords: MECP2 mutations, CRISPR/Cas9, RETT syndrome, homologous recombination, iPSCs

#### INTRODUCTION

Rett (RTT) syndrome is a genetic neurodevelopmental disorder found predominantly in female infants. Disease symptoms, including cognitive disabilities, decreased coordination and mobility, repetitive movements, and slowed brain growth, typically appear after 6 to 18 months of age (Hagberg et al., 1983; Hagberg, 1985; Trevathan and Naidu, 1988). RTT syndrome is caused by dominant negative mutations in the X-linked transcription factor methyl CpG-binding protein 2 (MECP2). Most mutations occur within one of two functional domains of MECP2: the highly conserved methyl binding domain (MBD) or the transcription repressing domain (TRD) (Rasmussen et al., 1975; Amir et al., 1999). Gene therapy using an adeno-associated viral vector to deliver a functional sequence of the *MECP2* gene *in vivo* has been intensively investigated in mouse models (Sinnett and Gray, 2017). However, this approach induces toxicity and side effects due to the supraphysiological expression of exogenous *MECP2* (Sinnett and Gray, 2017). The study by Guy et al. using conditional MECP2 alleles has shown that by restoring the physiological expression level of *MECP2*, the symptoms of RTT syndrome could be reversed in affected adult mice (Guy et al., 2007; Clarke and Abdala Sheikh, 2018). This study suggests that CRISPR/Cas9-mediated correction of MECP2 mutant alleles is a potential gene therapy for RTT.

The type II CRISPR/Cas9 system is an RNA-guided nuclease system providing adaptive immunity in *Streptococcus pyogenes*. In mammalian cells, Cas9 nuclease can be used to edit genomic sequences by inducing targeted DNA double-strand breaks (DSBs). The induced DSBs are mostly repaired by the non-homologous end-joining (NHEJ) pathway, creating micro-deletions or insertions (INDELs), or, to a lesser extent, by the homologous recombination (HR) pathway, which allows precise genetic manipulations if a repair template is provided (Cong et al., 2013; Hsu et al., 2013; Mali et al., 2013; Chu et al., 2015). Many studies have shown that CRISPR/Cas9-mediated mutagenesis can lead to efficient HR in human iPSCs (Byrne et al., 2015; Zhang et al., 2017; Li et al., 2018).

To our knowledge, there has been no previous attempt to correct endogenous MECP2 mutant alleles in human cells using the CRISPR/Cas9 system. Here, we provide a system to repair the mutation hotspots of MECP2 exon 4. We achieved HR efficiencies of 20% to 30% in human cell lines and iPSCs. As a proof of principle, we precisely inserted the MECP2R270X mutation into the MECP2 gene in wild-type iPSCs. We then successfully repaired this mutation in the mutant iPSCs using the CRISPR/Cas9 system. Overall, we provide an efficient strategy for MECP2 correction that is crucial for disease modeling and can potentially be translated into gene therapy for RTT syndrome.

### MATERIALS AND METHODS

#### Cell Culture and Reagents

HEK293 cells were maintained in DMEM (Gibco) supplemented with 10% FBS (Gibco). K562 cells were cultured in RPMI 1640 supplemented with 10% FBS and 2 mM L-glutamine (Gibco). Human iPSCs (BCRT cell line) were cultured in E8 Flex medium (Gibco) according to the manufacturer's manual. Cas9 protein was purchased from NEB (M0386S), synthetic sgRNA was purchased from Synthego, and ssODN was ordered from IDT. To generate the RNP complex, the Cas9 protein and synthetic sgRNA were mixed at a 1:2 ratio and incubated at 25°C for 10 min.

#### CRISPR/Cas9 and Donor Vectors Construction

sgRNAs were designed using the CrisprGold software (Chu et al., 2016). Forward and reverse oligos were mixed and phosphorylated individually. Then, annealed oligo duplexes were cloned into the BbsI sites of the CRISPR/Cas9-T2A reporter plasmid (Addgene, 64216). To generate the pMECP2-T2A-mCherry reporter donor vector, the 5′ and 3′ homology arms (HA) were amplified from genomic DNA using Herculase II Fusion DNA Polymerase (Agilent). The 5′ HA fragment was cloned into the XhoI/EcoRI sites of pTV-T2A-mCherry; the 3′ HA fragment was cloned into the AsiSI/KpnI sites of the pTV-T2A-mCherry vector. To generate the pMECP2 donor vector, a 5-kb MECP2 fragment was amplified from genomic DNA, and silent mutations were introduced at the cleavage site to generate a new PstI recognition site.
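The donor design described above relies on nucleotide changes that create a PstI site without altering the encoded protein. The following is a minimal sketch of how such a design could be checked in silico. The 12-nucleotide fragment is a hypothetical in-frame example and not the MECP2 sequence, and the function names are illustrative assumptions rather than part of the authors' workflow.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

PSTI_SITE = "CTGCAG"  # PstI recognition sequence


def translate(seq):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    usable = len(seq) - len(seq) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def is_silent_site_gain(original, edited, site=PSTI_SITE):
    """Return True if `edited` gains `site` while encoding the same protein."""
    return (
        len(original) == len(edited)
        and translate(original) == translate(edited)
        and site not in original
        and site in edited
    )


# Hypothetical fragment (NOT the MECP2 sequence): Ala-Leu-Gln-Lys.
original = "GCCCTCCAGAAG"
edited = "GCCCTGCAGAAG"  # CTC -> CTG keeps Leu and creates CTGCAG

print(translate(original), translate(edited), is_silent_site_gain(original, edited))
```

A check of this kind only confirms that the coding sequence is preserved and the restriction site is gained. It does not assess whether the edit also disrupts the sgRNA target or PAM, which would have to be verified separately.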

#### Transfection, Electroporation, and Flow Cytometry

Human HEK293 cells were plated into six-well plates 1 day before transfection. On the day of transfection, cells were supplied with fresh complete medium, and the DNA was mixed with FuGENE® HD Reagent (Promega) in Opti-MEM (Invitrogen) according to the manufacturer's instructions. After 15 min of incubation at RT, the mixture was added dropwise to the well. The next day, the medium was exchanged. The transfected cells were analyzed at different time points. For flow cytometry analysis, HEK293 cells were trypsinized, resuspended in fluorescence-activated cell sorting (FACS) buffer (PBS/1% bovine serum albumin, BSA), and analyzed with a Fortessa machine (Becton Dickinson).

For electroporation, human K562 cells were harvested and counted. Then, 2 × 10⁵ cells were resuspended with sgRNA/Cas9 RNP and 100 pmol ssODN in 20 µl electroporation buffer P3 (Lonza), transferred to a 16-strip cuvette, and electroporated using a 4D Nucleofector X unit (Lonza). The cells were then transferred into pre-warmed complete medium.

For human BCRT-iPSCs, we used Lipofectamine® 3000 (Life Technologies) according to the manufacturer's manual. Briefly, 1 day before transfection, iPSCs were plated as small clumps on a plate precoated with 500 μg/ml vitronectin (Life Technologies) and filled with complete E8 Flex medium supplemented with ROCK inhibitor (10 µM) (Bio Cat). The next day, the medium was replaced with medium lacking ROCK inhibitor 6 to 8 hours before transfection. Diluted plasmids and Lipofectamine were prepared as described in the manual. When multiple plasmids (pX330 and the targeting vector) were co-transfected, we used a 1:1 molar ratio. The medium was changed on the next day to stop the reaction. Two days post-transfection, transfected iPSCs were collected by incubation with Accutase (Merck) for 5 min, and the cell pellet was resuspended in FACS-PREP medium, consisting of complete E8 Flex medium, ROCK inhibitor (10 µM), RevitaCell Supplement (Life Technologies), and gentamicin (Lonza). GFP+ iPSCs were sorted on a FACSAria (Becton Dickinson) and placed into a vitronectin-precoated plate filled with FACS-PREP medium. The next day, the medium was replaced with complete E8 Flex medium. Cells were maintained in culture for several passages before analysis.
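The 1:1 molar ratio used for co-transfection translates into unequal plasmid masses whenever the two plasmids differ in size, because the mass per mole of double-stranded DNA scales with its length. A minimal sketch of the calculation follows. The plasmid sizes and the total DNA amount are placeholder values chosen for illustration and are not reported in this study.

```python
def equimolar_masses(total_ng, sizes_bp):
    """Split a total DNA mass so that every plasmid is present in equal moles.

    For double-stranded DNA the mass per mole is proportional to length,
    so equal molar amounts mean each mass is proportional to plasmid size.
    """
    total_bp = sum(sizes_bp.values())
    return {name: total_ng * bp / total_bp for name, bp in sizes_bp.items()}


# Placeholder sizes and total mass (assumptions, not values from the study).
sizes = {"cas9_sgRNA_plasmid": 8500, "donor_plasmid": 10000}
masses = equimolar_masses(2000, sizes)

for name, ng in masses.items():
    print(f"{name}: {ng:.0f} ng")
```

With these placeholder numbers, the larger donor plasmid receives the larger share of the total mass, while the mass-to-size ratio is identical for both plasmids.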

To introduce the MECP2R270X mutation into the MECP2 gene, wild-type iPSCs were transfected with 2 μg of the plasmid expressing Cas9 and sgRNA-5 and 60 pmol of the ssODN-R270X donor using Lipofectamine® 3000 (Life Technologies). Two days post-transfection, 10³ GFP+ iPS cells were sorted and plated on a vitronectin-coated well of a six-well plate in FACS-PREP medium. The medium was changed the next day. Seven days after sorting, iPSC clones were picked and plated on a new vitronectin-coated well of a six-well plate. iPSC clones were expanded for 2 weeks and then harvested to analyze insertion efficiency by genotyping PCR, RFLP assay, and Sanger sequencing. To repair the MECP2R270X mutation, a homozygous mutant iPSC clone (clone 18) was transfected with the plasmid expressing Cas9 and sgRNA-3 together with the donor template plasmid containing silent mutations and a PstI recognition site, as described above.

#### Genomic DNA Isolation, PCR, T7EI, and RFLP Assay

Reporter+ cells were cultured and harvested at different time points. Single-cell clones were sorted in 96-well plates. Genomic DNA was extracted using the QuickExtract DNA extraction kit (Epicentre) following the manufacturer's instructions. For the T7EI assay, PCR was performed using Herculase II Fusion DNA Polymerase (Agilent Technology) with gene-specific primers (**Table 1**) under the following conditions: 98°C for 3 min; 39 cycles (95°C for 20 s, 60°C for 20 s, 72°C for 20 s), and 72°C for 3 min. PCR products were run on 2% agarose gels, purified, denatured, annealed, and treated with T7EI (NEB). Cleaved DNA fragments were separated on 2% agarose gels, and the DNA concentration of each band was quantified using the ImageJ software. For the genotyping RFLP assay, the PCR product was purified and digested with PstI restriction enzyme for 1 h. The digestion products were separated on 2% agarose gels.
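The text above states that T7EI band intensities were quantified with ImageJ but does not give the formula used to convert them into an editing frequency. A widely used estimate, shown here as an assumption rather than as the authors' documented procedure, is indel (%) = 100 × (1 − √(1 − f)), where f is the fraction of total signal found in the cleaved bands. The function name and the intensity values below are illustrative.

```python
import math


def t7e1_indel_percent(uncut, cut_bands):
    """Estimate indel frequency (%) from T7EI gel band intensities.

    uncut     -- intensity of the undigested PCR product band
    cut_bands -- intensities of the cleavage product bands
    """
    cut = sum(cut_bands)
    total = uncut + cut
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    fraction_cleaved = cut / total
    # The square root accounts for random re-annealing of mutant and
    # wild-type strands into heteroduplexes during the denature/anneal step.
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Illustrative intensities in arbitrary units (not data from the study).
print(round(t7e1_indel_percent(64, [18, 18]), 1))
```

This estimate assumes that cleavage is complete and that band intensity is proportional to DNA mass, so it is best treated as a semi-quantitative readout.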

#### DNA Sequencing

PCR products were directly sequenced by specific primers or cloned into the pSTBlue-1 Blunt vector (Novagen) following the manufacturer's protocol. Plasmid DNAs were isolated using the NucleoSpin Plasmid (Macherey-Nagel). Plasmids were sequenced using T7 forward primer (5′-TAATACGACTCACTATAGGG-3′) by the Sanger method (LGCgenomics, Berlin, Germany).

#### qRT-PCR

Total RNA was extracted from wild-type, mutant, and repaired iPSC clones with the RNeasy Mini Kit (Qiagen) and was reverse-transcribed into cDNA with a SuperScript™ III kit (Invitrogen). The expression of MECP2 was measured by real-time PCR using SYBR green PCR Master Mix (Thermo Scientific) and StepOnePlus™ (Applied Biosystems). The relative expression level of MECP2 was normalized to that of the GAPDH housekeeping gene.
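The normalization to GAPDH described above is commonly carried out with the comparative Ct (2^−ΔΔCt) method. The text does not state which calculation was used, so the sketch below is an assumption about one plausible approach. It presumes near-100% amplification efficiency for both primer pairs, and the Ct values are invented for illustration.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    ct_target, ct_reference         -- Ct values of the sample of interest
    ct_target_cal, ct_reference_cal -- Ct values of the calibrator sample
                                       (for example, wild-type cells)
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Invented Ct values: target = MECP2, reference = GAPDH, calibrator = wild type.
wild_type = relative_expression(24.0, 18.0, 24.0, 18.0)
sample = relative_expression(26.0, 18.0, 24.0, 18.0)
print(wild_type, sample)
```

In this invented example, a target Ct that is two cycles later than in the calibrator at equal reference Ct corresponds to a fourfold lower relative expression.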

#### Statistical Analysis

Statistical tests were performed in Prism 7.0 (GraphPad) using a paired two-tailed Student's t-test. \*\*\*\*P < 0.0001.

#### RESULTS

#### CRISPR/Cas9-Mediated Reporter Insertion in the MECP2 Locus

Our previous study identified the mutation spectrum of the MECP2 gene in Vietnamese patients with RTT syndrome. The recurrent mutations T158M, G269fs, R270X, and R306H are located in the MBD and TRD domains of the MECP2 protein encoded by exon 4 of the *MECP2* gene (**Figure 1A**). These mutations are specific for the Vietnamese patients and are listed in RettBASE (Le Thi Thanh et al., 2018). We first developed a system to quantitatively determine HR efficiency in the human *MECP2* locus by inserting, in frame, the coding sequence of a self-cleaving peptide (T2A) and an mCherry reporter into the last exon of the *MECP2* gene. As a result, correctly targeted cells express the mCherry reporter (**Figure 1B**). We used the CRISPRGold tool (Chu et al., 2016) to design two sgRNAs (sg1 and sg2) targeting sequences proximal to the MECP2 stop codon. The T7EI assay indicated that both sgRNAs efficiently targeted the *MECP2* locus upon delivery into HEK293 cells along with Cas9. Sanger sequencing showed a broad range of INDELs at the targeted site (**Figure 1C**). To assess HR efficiency, plasmids expressing sg1 and Cas9 were transfected into HEK293 cells together with a donor plasmid carrying the MECP2\_T2A\_mCherry repair template. Transfected cells were analyzed by flow cytometry at days 14 and 21 post-transfection. We detected about 23% mCherry+ cells among HEK293 cells that received sg1, Cas9, and donor template, but only background signals in control groups transfected with sgRNA/Cas9 or donor template alone (**Figure 1D** and **E**). We confirmed the correct integration of the reporter into the targeted *MECP2* locus by PCR using an external forward primer annealing to a genomic sequence outside of the 5′ HA and a T2A-specific reverse primer. The expected ~3.6-kb fragment was amplified only in HEK293 cells transfected with sg1, Cas9, and donor template (**Figure 1F**). These data indicate the correct configuration of the HR alleles in mCherry+ HEK293 cells.
Single mCherry+ cells were sorted into individual wells of 96-well plates for genotyping PCR. We found that about 75% of these cells harbor heterozygous integrations of the T2A-mCherry sequence into MECP2, whereas homozygous integrations represent about 25% (**Figure 1G**). Taken together, using CRISPR/Cas9 and reporter systems, we achieved efficient HR in the human *MECP2* locus.

FIGURE 1 | […] (above) in the targeted MECP2 sequence. Sequencing data (below) indicate the INDEL spectrum of sg1 in the targeted MECP2 locus. (D) The percentage of mCherry+ cells was analyzed by flow cytometry at days 14 and 21 post-transfection. (E) The HR efficiency was summarized in the graph from three independent experiments, data show means ± SD (\*\*\*\* p<0.0001). (F) Correct integration PCR. (G) Mono-allelic and bi-allelic knock-in analysis in single mCherry+ cell clones. The data represent at least two independent experiments.

#### CRISPR/Cas9-Mediated Precise Modification in MECP2 Locus

Next, we developed the CRISPR/Cas9-mediated system to correct the recurrent mutations in MECP2 exon 4: G269fs, R270X, and R306H. We designed two sgRNAs close to these mutations (sg3 and sg4) and a repair template containing HAs of 2.5 kb. To facilitate quantification of the HR efficiency, a PstI restriction site was created by introducing silent mutations into the repair template (**Figure 2A**). High editing activities of sg3 and sg4 were validated in HEK293 cells by T7EI assays (**Figure 2B**, top panel). Sequencing data of sg3-targeted cells showed a broad range of INDELs (**Figure 2B**, bottom panel). Next, plasmids carrying Cas9/sg3 and the donor template were transfected into HEK293T cells. Thirty days post-transfection, genomic DNA of transfected cells was isolated, and the targeted region was amplified by PCR. PstI-mediated restriction fragment length polymorphism (RFLP) analysis showed that PstI-cleaved bands were detected only in cells co-transfected with sg3, Cas9, and repair template. Band quantification showed an HR efficiency of 20% to 30% in the *MECP2* locus (**Figure 2C**). Sequencing data of the targeted homozygous clone confirmed that the precise modifications were correctly inserted into the MECP2 gene (**Figure 2D**).

FIGURE 2 | […] quantification-based knock-in efficiency. The data represent at least two independent experiments.

HR efficiency is known to be determined by many factors, of which the type of donor template is considered the most important (Song and Stieger, 2017). Double-stranded DNA plasmids, PCR products, and single-stranded DNA oligonucleotides (ssODN) are often used as donor templates for the precise insertion of large or small sequence changes at CRISPR/Cas9-induced DSBs. To test whether ssODN can also be used as a donor template for precise MECP2 modifications, we designed a 100-nucleotide ssODN with homology regions of 45 nucleotides each and silent replacements that create a PstI restriction site (**Figure 2E**). Next, we electroporated the sg3/Cas9 RNP complexes with ssODN into human leukemic K562 cells. Four days post-targeting, the targeted cells were harvested to analyze HR efficiency. As shown in **Figure 2E**, we detected PstI-cleaved PCR products only in cells that received both RNP and ssODN. The HR efficiency ranged from 29% to 34% (**Figure 2E**).
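The HR efficiencies reported from the PstI RFLP assays are based on band quantification. One straightforward way to derive such a value, given here as a sketch under the assumption that efficiency is taken as the cleaved fraction of the total PCR signal, is shown below. The exact formula is not stated in the text, and the intensity values are illustrative.

```python
def rflp_hr_percent(uncleaved, cleaved_bands):
    """Estimate HR efficiency (%) as the PstI-cleaved share of the PCR product.

    uncleaved     -- intensity of the band that PstI did not cut
    cleaved_bands -- intensities of the PstI cleavage product bands
    """
    cleaved = sum(cleaved_bands)
    total = uncleaved + cleaved
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * cleaved / total


# Illustrative intensities in arbitrary units (not data from the study).
print(rflp_hr_percent(70, [20, 10]))
```

Because only amplicons carrying the introduced PstI site are cut, the cleaved fraction reports on precise template-derived edits and ignores INDEL-bearing alleles, which remain in the uncleaved band together with unedited alleles.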

#### CRISPR/Cas9-Mediated Reporter Insertion in Human iPSCs

Patient-derived iPSCs have been intensively studied for disease modeling, drug discovery, and potential somatic cell therapy. Thus, we next tested whether our system works efficiently in human iPSCs. We exploited the reporter system as described above to evaluate HR efficiency. The sg1/Cas9 vector and donor template vector were co-transfected into human iPSCs. Two days later, the transfected iPSCs were enriched and expanded (**Figure 3A**). The transfection efficiency in human iPSCs was about 25% to 28% (**Figure 3B** and **C**). Ten days after expansion, the percentage of mCherry+ iPSCs was analyzed by flow cytometry. As shown in **Figure 3D** and **E**, the percentage of mCherry+ iPSCs was about 20% when transfected with sg1/Cas9 and donor template vectors, whereas there were only background signals in control cells transfected with only sg1/Cas9 or donor template vectors alone (**Figure 3D** and **E**). Correct integration PCR proved that the T2A-mCherry reporter was successfully inserted into the *MECP2* locus in human iPSCs (**Figure 3F**). Sequencing data confirmed that the targeted MECP2 sequences were configured as planned (**Figure 3G**). Overall, we provide a new efficient system to precisely modify the human *MECP2* gene in human iPSCs.

FIGURE 3 | CRISPR/Cas9-mediated reporter insertion in human iPSCs. (A) Experimental scheme of gene editing in human iPSCs. (B) Transfection efficiency in iPSCs using Lipofectamine 3000 was analyzed by FACS. (C) A bar graph indicated the transfection efficiency of human iPSCs, data show means ± SD (\*\*\*\*p < 0.0001). (D) FACS analysis showed the knock-in efficiency in human iPSCs represented by the mCherry+ cells. (E) A bar graph showed knock-in efficiencies in human iPSCs, data show means ± SD (\*\*\*\*p < 0.0001). (F) Correct integration PCR and Sanger sequencing data showed the junctions of 5′ HA. The data represent at least two independent experiments.

#### CRISPR/Cas9-Based Modeling MECP2R270X Mutation in Human iPSCs

To precisely introduce a MECP2R270X mutation (c.808 C>T) into the *MECP2* gene in human iPSCs, we designed a new sgRNA (sg5) and an ssODN (ssODN-R270X) with homology regions of 60 nucleotides, including the C>T mutation and an XbaI recognition site (**Figure 4A**). Next, we transfected human iPSCs with the sg5/Cas9 vectors and ssODN-R270X donor template. Two days post-transfection, the transfected iPSCs were sorted and single cell-derived clones were expanded (**Figure 4B**). The cell clones were then harvested to analyze the insertion efficiency of the MECP2R270X mutation. Among 22 cell clones, we detected two (9%) homozygous and five (23%) heterozygous clones (**Figure 4C** and **4D**). Sequencing data confirmed that we succeeded in inserting the R270X mutation into the *MECP2* gene in human iPSCs (**Figure 4E**). As a proof of principle of our correction system, we used the strategy depicted in **Figure 2A**. The sg3/Cas9 and donor template vectors were transfected into MECP2R270X/R270X iPS cells (clone 18, see **Figure 4C**). Two days later, single cell-derived clones were isolated and expanded for 2 weeks. As shown in **Figure 4F**, the PstI-mediated RFLP assay showed that we were able to correct the MECP2R270X mutation in mutant iPSCs. To assess the expression of the MECP2 gene, we isolated mRNA from the wild-type, mutant, and repaired iPSC clones. Quantitative RT-PCR data showed that the expression level of the MECP2 gene in the repaired iPSCs was restored to a level comparable to that of the wild-type iPSCs (**Figure 4G**). Sequencing data of MECP2 cDNA confirmed that the mutant MECP2 allele was successfully repaired in the corrected iPSCs (**Figure 4H**). Thus, this system will open new avenues for iPSC-based disease modeling and therapeutic development for RTT syndrome.
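The relationship between the nucleotide change c.808 C>T and the protein change R270X can be worked through in a few lines. The sketch below assumes standard coding-sequence numbering that starts at the A of the ATG start codon. The codon CGA is inferred rather than read from the MECP2 sequence: an arginine codon whose first base changes from C to T can only yield a stop codon if it is CGA, because TGA is the only stop codon of the form TGN. The function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_of_cds_position(position):
    """Map a 1-based coding-sequence position to (codon number, base in codon)."""
    index = position - 1
    return index // 3 + 1, index % 3 + 1


def substitute(codon, base_in_codon, new_base):
    """Return the codon with the base at the 1-based position replaced."""
    i = base_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_number, base_in_codon = codon_of_cds_position(808)
mutant_codon = substitute("CGA", base_in_codon, "T")  # CGA (Arg) is inferred

print(codon_number, base_in_codon, mutant_codon, mutant_codon in STOP_CODONS)
```

The same position arithmetic explains why a single C>T transition at the first base of codon 270 is sufficient to truncate the protein within the TRD.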

### DISCUSSION

In this study, we developed an efficient CRISPR/Cas9-mediated system that precisely modifies the human *MECP2* locus. Restoration of the physiological expression level of the MECP2 gene is critical for the reversion of disease symptoms. Thus, the correction of mutations in the endogenous MECP2 gene holds promise for gene therapy of RTT syndrome. Using preclinical mouse models of RTT, many laboratories have attempted to deliver intact MECP2 cDNA *in vivo* using AAV vectors (Sinnett and Gray, 2017). Although AAV-mediated MECP2 delivery leads to positive effects, such as increased survival and improved weight, this system still causes multiple side effects and toxicity in animals due to the uncontrolled expression level of the MECP2 transgene (Collins et al., 2004; Luikenhuis et al., 2004; Gadalla et al., 2013). Thus, the expression level of MECP2 is essential for restoring the function of defective neurons. Too much MECP2 protein is as harmful to neural cells as too little. It has been shown that patients with the MECP2 duplication syndrome have two copies of the *MECP2* gene on their single X-chromosome, leading to mental disability and autistic-like behavior (Moretti and Zoghbi, 2006). Furthermore, a reduction of MECP2 protein due to protein instability is also associated with milder neurological and psychiatric symptoms, such as anxiety and depression (Ramocki et al., 2009; Chao and Zoghbi, 2012). CRISPR/Cas9-mediated mutation correction restores the endogenous expression of the MECP2 gene, which might fully reverse RTT symptoms. This system can potentially be applied as gene therapy for RTT patients *in vivo*. However, many challenges remain. For example, the mainly affected cells are neurons, which are non-dividing. The HR pathway is known to be active in the S-G2 phase of the cell cycle. Thus, non-dividing or slowly proliferating cells, such as neurons, are not well suited for HR-mediated mutation correction.
However, this issue can be potentially addressed by either suppressing the NHEJ pathway or activating HR-related factors (Chu et al., 2015; Charpentier et al., 2018). In addition, the safety of the system needs to be further investigated.

In addition, many studies have shown that MECP2 is not only essential for the functions of neurons but also important for the functions of glial subtypes in the brain as well as of many cell types in the immune system, including microglia and macrophages (Maezawa and Jin, 2010; Derecki et al., 2012; O'Driscoll et al., 2013; Cronk et al., 2015; Jin et al., 2015; Cronk et al., 2016). Physiological restoration of normal MECP2 functions in neurons and in all immune cells should therefore be considered an optimal therapy. In support of this, it has been reported that hematopoietic stem cell transplantation (HSCT) is beneficial in Mecp2-deficient mice and that MeCP2 plays important roles in immune cells both in the brain and in the periphery (Maezawa and Jin, 2010; Derecki et al., 2012; O'Driscoll et al., 2013; Cronk et al., 2015; Jin et al., 2015). Thus, correction of the MECP2 mutations *ex vivo* with the CRISPR/Cas9 system followed by autologous HSCT is a promising future therapeutic option for RTT syndrome.

Our system also works efficiently in human iPSCs, opening great avenues for disease modeling, drug screening, and somatic cell therapy. The generation of RTT patient-derived iPSCs (RTT-hiPSCs) has been reported by many groups; however, the natural random X-chromosome inactivation (XCI) status of RTT-hiPSCs is inconsistent. XCI results in cellular mosaicism where some cells express wild-type MECP2 (isogenic), whereas other cells express mutant MECP2. Some studies showed that the maintenance of the inactive X-chromosome of the founder cell allows RTT-hiPSCs to express either the wild-type or the mutant MECP2 allele (Ananiev et al., 2011; Cheung et al., 2011; Pomp et al., 2011). In contrast, others reported that the inactive X-chromosome of the founder cells is reactivated during the reprogramming process (Kim et al., 2009; Marchetto et al., 2010). Importantly, RTT-hiPSCs could differentiate into neurons that exhibit disease phenotypes of RTT syndrome. Thus, these cells will be valuable targets for rescue by CRISPR/Cas9-mediated MECP2 correction. Using CRISPR/Cas9 technology, we have achieved efficient precise MECP2 modifications in human iPSCs. Overall, our work provides an efficient system for repairing MECP2 mutations in RTT-hiPSCs.

#### DATA AVAILABILITY STATEMENT

All datasets generated/analyzed in this study are included in the manuscript or supplementary files.

#### AUTHOR CONTRIBUTIONS

TTHL and VTC designed the project. TTHL, NTT, TMLD, DDN, HDD, TLH, and VTC performed experiments and acquired the data. NTT, VTC, RK, KR, and TLN analyzed and interpreted the data. VTC, NTT, RK, and KR wrote the article.

#### FUNDING

This work was supported by Vinmec Healthcare System (ISC.17.05 to TTHL and TLN).

#### ACKNOWLEDGMENTS

The authors thank H.P. Rahn for excellent FACS-related support. The authors thank Dr. Harald Stachelscheid from Berlin-Brandenburg Center for Regenerative Therapies for kindly providing human BCRT-iPSC line. The authors would also like to thank K. Petsch and J. Pempe for technical support.

#### REFERENCES

human induced pluripotent stem cells. *Cell* 143, 527–539. doi: 10.1016/j.cell.2010.10.016


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Le, Tran, Dao, Nguyen, Do, Ha, Kühn, Nguyen, Rajewsky and Chu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
