Original Research ARTICLE
Mapping Transgene Insertion Sites Reveals Complex Interactions Between Mouse Transgenes and Neighboring Endogenous Genes
- Center for Brain Science and Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, United States
Transgenic mouse lines are routinely employed to label and manipulate distinct cell types. The transgene generally comprises cell-type specific regulatory elements linked to a cDNA encoding a reporter or other protein. However, off-target expression seemingly unrelated to the regulatory elements in the transgene is often observed, it is sometimes suspected to reflect influences related to the site of transgene integration in the genome. To test this hypothesis, we used a proximity ligation-based method, Targeted Locus Amplification (TLA), to map the insertion sites of three well-characterized transgenes that appeared to exhibit insertion site-dependent expression in retina. The nearest endogenous genes to transgenes HB9-GFP, Mito-P, and TYW3 are Cdh6, Fat4 and Khdrbs2, respectively. For two lines, we demonstrate that expression reflects that of the closest endogenous gene (Fat4 and Cdh6), even though the distance between transgene and endogenous gene is 550 and 680 kb, respectively. In all three lines, the transgenes decrease expression of the neighboring endogenous genes. In each case, the affected endogenous gene was expressed in at least some of the cell types that the transgenic line has been used to mark and study. These results provide insights into the effects of transgenes and endogenous genes on each other’s expression, demonstrate that mapping insertion site is valuable for interpreting results obtained with transgenic lines, and indicate that TLA is a reliable method for integration site discovery.
The invention of a method for generating transgenic mice by injection of a plasmid into the fertilized oocyte (Brinster et al., 1981; Gordon and Ruddle, 1981; Brinster and Palmiter, 1984/1985) was a transformative advance in biology. In most cases, the plasmid encodes a cDNA linked to regulatory elements (promoter and enhancer) that direct its expression. These mice have been used in three main ways. In one, the purpose is to identify regulatory sequences that govern temporal and spatial patterns of gene expression; here, the cDNA encodes a reporter gene that enables expression to be mapped. In the second, the purpose is to analyze the roles of a gene product by expressing it, or a protein that interferes with it; here, the cDNA encodes the protein under study and the regulatory elements are chosen to promote the desired expression pattern. In the third, the reporter is used to mark and analyze cells that express the gene from which the regulatory elements are derived. In all three cases, the expectations are (a) that regulatory elements will direct expression in some or all of the cells in which the parent gene is normally expressed and (b) that all transgenic lines established from the same plasmid will exhibit qualitatively similar expression patterns. Indeed, these conditions are frequently met. In some cases, however, expression does not correspond to that expected from the regulatory elements included in the transgene, and/or expression patterns vary among lines. Such unexpected labeling patterns can be an advantage or a detriment. Most often, they foil attempts to map enhancers, mark cells, or interfere with a biological process in a desired manner. They can also, however, provide unanticipated opportunities to define and mark cell types that had been undiscovered or inaccessible (e.g., Weis et al., 1991; Kim et al., 2010).
What accounts for these unpredictable expression patterns? When patterns are similar among lines established from the same plasmid, the likely explanations are that juxtapositions among normally separate regulatory elements or isolation of such sequences from their native context lead to new specificities (Swanson et al., 1985; Donoghue et al., 1991; Rao et al., 1996). In contrast, when expression patterns differ among independently generated lines, variations are generally presumed to reflect influences of endogenous sequences near the chromosomal site of integration (Palmiter et al., 1983) and are therefore termed “integration site-dependent.” The simplest explanation is that the reporter comes to be controlled by regulatory elements of a nearby endogenous gene, as seen in “enhancer traps” in transgenic Drosophila (Bier et al., 1989), zebrafish (Golling et al., 2002) and, recently, mice (Shima et al., 2016), all of which incorporate a minimal, tissue non-specific promoter but no strong regulatory elements. This mechanism is unlikely to provide a full explanation for conventional transgenes, however, which generally include tissue-specific promoters and enhancers. Other possibilities include novel specificities generated by juxtapositions of transgene and endogenous sequences, variations in chromatin conformation near the integration site, differences in transgene copy number, or mutations of either the transgene or flanking sequences that occur upon transgene integration (Palmiter and Brinster, 1986; Feng et al., 2000).
Having generated many transgenic lines with such insertion site-dependent expression patterns (Weis et al., 1991; Feng et al., 2000; Kim et al., 2010), we have become interested in the relationship between insertion sites and transgene expression patterns. In a few cases, the expression of a transgene has been related to that of a specific, nearby endogenous gene (Kothary et al., 1988; Sharpe et al., 1999; Narboux-Nême et al., 2012). We reasoned that if this were a general phenomenon, the endogenous gene might play a role in the development or function of the marked cells. Unfortunately, although identifying insertion site is straightforward for invertebrates, available methods for mice (Burgess et al., 1995; Sharpe et al., 1999; Suzuki et al., 2006; Sha et al., 2007; Liang et al., 2008; Dubose et al., 2013; Srivastava et al., 2014; Raman et al., 2015) have been cumbersome, little used and, in our hands, largely unsuccessful. Recently, however, a newly developed method termed Targeted Locus Amplification (TLA) was introduced that appeared to be more promising (de Vree et al., 2014; Cain-Hom et al., 2017). In TLA, genomic DNA in nuclei is cross-linked by formaldehyde, digested into small fragments by the frequently cutting NlaIII restriction enzyme (average fragment size ∼0.2 kb) and religated to form larger circular DNA containing fragments that were likely to have been near neighbors on a chromosome. These fragments are de-crosslinked and digested by another restriction enzyme, NspI, to create ∼2 kb fragments, which are then subjected to PCR with primers derived from sequences unique to the transgene. By amplifying fragments that contain the transgene sequence, this step selectively amplifies genomic sequences near the transgene insertion site. The product is sequenced and mapped to the genome, thereby localizing the transgene and also revealing insertions, deletions or other structural rearrangements both within the transgene and in flanking sequences.
We used TLA to determine insertion sites for three transgenic lines that incorporate fluorescent proteins as reporters: HB9-GFP (green; Wichterle et al., 2002), Mito-P (cyan, CFP; Misgeld et al., 2007), and TYW3 (yellow, YFP; Kim et al., 2010). All label subsets of cells in retina by what appears to be an insertion site-dependent mechanism, and have been used in studies of retinal development and function (Schubert et al., 2008; Kim et al., 2010; Trenholm et al., 2011; Kay et al., 2011a,b, 2012; Duan et al., 2014; Krishnaswamy et al., 2015; Shekhar et al., 2016; Peng et al., 2017; Sethuramanujam et al., 2017; Ray et al., 2018). For two of them, our interest was heightened by breeding experiments in which we attempted to generate transgenic animals that also carried mutations of genes expressed in cell types labeled by the transgene. Our inability to generate such animals by conventional mating strategies suggested that the transgenes were linked to genes expressed in some of the retinal cells marked by the fluorescent protein: Cdh6 in one line and Fat4 in another. For all three transgenic lines, we document interactions between the transgene and the closest endogenous gene. For two of them, the nearest endogenous gene is hundreds of kilobases (kb) from the transgene and yet it appears to strongly influence transgene expression. For all three, the transgene decreases expression of a nearby endogenous gene in a position-dependent manner. Together, our results provide novel insights into insertion site-dependent transgene expression and strengthen the argument that determination of insertion sites can be useful both for gene discovery and for assessing effects of transgene insertion that would otherwise go undetected.
Materials and Methods
Animal protocols were approved by the Institutional Animal Care and Use Committee (IACUC) at Harvard University. Animals were used in accordance with NIH guidelines. Mutants were maintained on a C57BL/6J background (JAX Stock No. 000664). We obtained both the HB9-GFP (Wichterle et al., 2002) and the Thy1-mitoCFP-P (Misgeld et al., 2007) transgenic mouse lines from Jackson Laboratories (JAX Stock No. 005029 and 006617). For brevity, we refer to Thy1-mitoCFP-P as Mito-P. The Fat4 conditional mutant (Saburi et al., 2008) was a kind gift of Helen McNeill (U. Toronto). The TYW3 line was generated in our laboratory using a Thy1-lox-YFP-STOP-lox-WGA-ires-LacZ sequence, as previously reported (Kim et al., 2010). The Cdh6CreER line was also generated in house by targeted insertion of a frt-neo-frt cassette, a 6xMyc-tagged CreER-T2, and a poly-adenylation signal at the translational start site of the Cdh6 coding sequence (Kay et al., 2011a) (JAX Stock No. 029428). HB9-GFP, Mito-P and TYW3 mice were back-crossed to C57BL/6J mice for at least 10 generations before being used for TLA.
For histology, in situ hybridization, and TLA, HB9-GFP and Mito-P tissue was collected at post-natal day 15 (P15) and P30 respectively. TYW3 retinas and spleen were collected at P56. For PCR of genomic DNA and RT-qPCR, samples were collected at P6–8 for HB9-GFP, P25–30 for Mito-P, and P50–60 for TYW3. Age-matched control animals were either wild-type littermates or C57BL/6J. Animals of both sexes were included in all experiments in roughly equal numbers.
Mice were euthanized by intraperitoneal injection of Euthasol (Virbac). Eyes were removed and fixed in 4% PFA in PBS for 90 min. Retinas were then dissected and rinsed with PBS. Retinas to be sectioned were sunk in 30% sucrose in PBS overnight at 4°C, embedded in tissue freezing medium, frozen in dry ice and stored at -80°C until processing. Retinas were then sectioned at 20 μm on a cryostat. Sections were rehydrated in PBS, incubated in 5% Normal Donkey Serum (NDS) (Jackson ImmunoResearch), 0.3% Triton X-100 (Sigma-Aldrich) in PBS for 2 h and then incubated with primary antibodies overnight at 4°C. Sections were then washed in PBS, incubated with secondary antibodies for 2 h at room temperature, washed again, dried, and mounted with Vectashield (Vector Lab).
For whole mounts, fixed retinas were incubated with 5% NDS, 1% Triton X-100 in PBS for 3 h and then incubated in primary antibody for 5 days at 4°C. Retinas were then washed in PBS and incubated overnight in secondary antibody. Finally, retinas were washed in PBS, flat-mounted on cellulose membrane filters (Millipore), coverslipped with Fluoro-Gel (Electron Microscopy Sciences), and sealed.
Antibodies used were as follows: rabbit and chicken anti-GFP (1:2000, Millipore Cat#AB3080P; 1:1000, Abcam Cat#ab13970); goat anti-choline acetyltransferase (ChAT) (1:500, Millipore Cat#AB144P); mouse anti-TFAP2 (1:200, DSHB Cat#3B5); guinea pig and rabbit anti-Rbpms (1:2000, PhosphoSolutions Cat#1832-RBPMS; 1:300, Abcam Cat#ab194213). Rabbit and guinea pig antibodies against Slm1 (1:5000) and Slm2 (1:2500) were the generous gift of Peter Scheiffele (Iijima et al., 2014). Secondary antibodies were conjugated to Alexa Fluor 488 (Invitrogen), Alexa Fluor 568 (Invitrogen), or Alexa Fluor 647 (Jackson ImmunoResearch) and used at 1:1000. Nuclei were stained with ToPro Cy5 (1:5000, Thermo Fisher).
In situ hybridization was performed as described elsewhere (Kay et al., 2011a; Duan et al., 2014). Tissue was collected and prepared with RNase-free reagents, sectioned and imaged as described above. Section hybridization was carried out at 65°C. Probes were detected using anti-digoxigenin (DIG) antibodies conjugated to horseradish peroxidase (HRP), followed by amplification with Cy3-tyramide (TSA-Plus System; Perkin-Elmer Life Sciences, Waltham, MA, United States) for 2 h.
Images were acquired using 488, 568, and 647 nm lasers on an Olympus-FV1000 Confocal Microscope. We used ImageJ (NIH) software to analyze confocal stacks and generate maximum intensity projections.
Targeted Locus Amplification
Targeted locus amplification (TLA) technology uses the physical proximity of nucleotides within a locus of interest to generate a map of original sequences and corresponding inserted transgenes (de Vree et al., 2014). Transgenic homozygotes and wild-type controls were euthanized and cells were prepared from their spleens (Cain-Hom et al., 2017). Homozygotes were distinguished from heterozygotes by fluorescent quantitative PCR (qPCR) results from a commercial genotyping service (Transnetyx)1. In some cases we confirmed their results in our laboratory by mating or additional PCR. The cells were frozen and shipped to Cergentis (Utrecht, Netherlands). TLA was then performed as described in de Vree et al. (2014) and Hottentot et al. (2017). Briefly, DNA was crosslinked, fragmented, religated, and decrosslinked. This product served as the TLA template, which was subsequently fragmented, circularized, and amplified with inverse primers complementary to a short locus-specific sequence. Once the complete locus was amplified, ∼2 kb segments were sheared. Libraries were prepared for sequencing by MiSeq or HiSeq technologies on an Illumina platform.
Primer Sequences for TLA:
Set 1: CAACAGCCACAACGTCTATA
Set 2: GTCCTCCATCTGCCTAAGGG
Set 1: CGACAGCAGTTGAGTTCA
Set 2: CGAGAAGCGCGATCACATG
Set 1: GAGTCCAGGTGAGAGCAG
Set 2: CGACAGCAGTTGAGTTCAGCA
PCR of Genomic DNA
Genomic PCR was used to confirm insertion sites and deletions revealed by TLA. Primers were designed as shown in Figure 3 for HB9-GFP (primers 1–4), Figure 4 for Mito-P (primers 5–7) and Figure 5 for TYW3 (primers 8–11). Sequences were as follows:
Primer 1: 5′-AACTTGTGCGGTTCTGTCCT-3′
Primer 2: 5′-TTGACAAAGTGGGGGTTAGGC-3′
Primer 3: 5′-CAGCGAAGGGGAAATTTGCATAT-3′
Primer 4: 5′-GCATCTGTGTGTCACAGCAGTGGT-3′
Primer 5: 5′-CTAGCCAAAGGGATTAACAATGTG-3′
Primer 6: 5′-CAATCATAATGCAGACAGGAATGT-3′
Primer 7: 5′-CAGAGCTCTGGGTCCAGTCAGTA-3′
Primer 8: 5′-TATGTGCGCCACTGTGTAGTT-3′
Primer 9: 5′-TTTGGTTCCCGGTCTCTGAAG-3′
Primer 10: 5′-ATCCTGTTGCAGCGTCGTTA-3′
Primer 11: 5′-GTCAGGGACCTCTGTGGTTG-3′
We confirmed the predicted insertion of HB9-GFP on Chromosome 15 using two primer pairs, one for the left junction Primers 1 and 2, and one for the right junction, Primers 3 and 4. We confirmed the predicted insertion of Mito-P on Chromosome 3 with Primers 5 and 6 and the predicted deletion in Chromosome 3 with Primers 5 and 7. We confirmed the insertion of TYW3 with Primers 8 and 9 and the predicted deletion in Chromosome 1 with Primers 10 and 11.
We extracted DNA from tail-clips of wild-type, heterozygous, and homozygous animals with 50 μL Quick Extract (Lucigen) at 68°C for 30 min and 98°C for 3 min in a PCR machine. Zygosity was determined as described above. PCR reaction mixtures were 3 μL DNA, 12.5 μL Econotaq Plus Green 2X Master Mix (Lucigen), 7.5 μL H20, 0.4 μL F primer 10 mM, 0.4 μL R primer 10 mM for 25 μL reactions. The reaction program was 94°C for 2 min; [94°C for 30 s, 50–55°C for 30 s, 72°C for 1 min] x 40 cycles; 72°C for 5 min. The annealing temperature varied in accordance with the melting temperature of the primer pairs tested.
Mice were euthanized by intraperitoneal injection of Euthasol. Retinas were dissected and RNA was extracted with 250 μL Trizol Reagent (Invitrogen) and DirectZol RNA miniprep kit (Zymo Research). cDNA synthesis was carried out by incubating 8 μL (100–500 ng) extracted RNA with 1 μL oligodT(20) (Thermo Fisher) and 1 μL dNTP (Thermo Fisher) for 5 min at 65°C. Next, reverse transcription was performed by incubating samples with 2 μL 10X RT buffer (Invitrogen), 4 μL 25mM MgCl2 (NEB), 2 μL 0.1 DTT (Invitrogen), 1 μL RnaseOUT Recombinant RNase Inhibitor (Thermo Fisher), and 1 μL SuperScript III (Invitrogen) in a PCR cycler for 50 min at 50°C, 5 min at 85°. Remaining RNA was removed by addition of 1 μL Ribonuclease H (Thermo Fisher) and incubation at 37°C for 20 min. qPCR was performed using KAPA SYBR FAST qPCR master mix (Kapa Biosystems). Reactions were carried out with 8.2 μL H2O, 10 μL SYBR FAST master mix, 1 μL 10 mM forward primer, 1 μL 10 mM reverse primer, and 1 μL DNA. The reaction program was run on an ABI 7900 as follows: 95°C for 5 min; [95°C for 30 s, 60°C for 45 s]x 40 cycles; 95°C for 15 s; 60°C for 15 s; 95°C for 15 s. For Fat4, the reactions differed only in that the annealing temperature was set to 54°C instead of 60°C. We ran samples in triplicates and normalized expression of our genes of interest to Gapdh levels. Resulting CT values were used to calculate ΔΔCT and fold changes in the expression of endogenous genes in our three transgenic lines. Both technical and experimental replicates were included. In most case, wild-type littermates were used as controls. Primers were as follows:
Estimating Copy Number
We estimated transgene copy number from quantitative fluorescent PCR data provided by our genotyping service (Transnetyx). The raw signal returned for every sample was a function of the ΔCT between the housekeeping gene c-Jun and a probe for our gene of interest:
We compiled raw signal data for heterozygous animals from the HB9-GFP, Mito-P, and TYW3 lines genotyped with either a GFP or a LacZ probe. We then averaged the signal for each transgenic strain. We normalized these values to the average intensity returned for our single copy Cre-GFP and LacZ knock-in lines and calculated an estimated copy number for our three transgenic lines of interest.
Comparisons were performed using GraphPad Prism software. For single comparisons, we used Student’s t-test. For multiple comparisons, we used one-way ANOVA.
We calculated the odds of transgene and endogenous gene expression overlapping in individual cell types using elementary combinatorics. The combinatoric formula yields the probability that the transgene and the endogenous gene would be expressed in the same cell type by chance, given an equal probability of being expressed in any cell type. Thus, given 3/120 Cdh6+ types and 2/120 GFP+ types in the retina, the odds of finding at least 1 double + type is:
Similarly, given 2/120 Fat4+ types and 3/120 CFP+ types in the retina, the odds of finding 2 double + type is:
The null hypothesis is that expression of transgene within a given cell type is unrelated to whether that cell type expresses the endogenous gene. Thus, we can reject the null hypothesis at p∼0.05 for Cdh6 and HB9-GFP and at p∼0.0004 for Fat4 and Mito-P.
The HB9-GFP Transgene Is Inserted Near the Cdh6 Locus on Mouse Chromosome 15
The HB9-GFP transgene is composed of a 9 kb fragment from the 5′ end of the Mnx1 gene (previously called Hb9) that extends into the first exon, linked to a cDNA encoding the enhanced green fluorescent protein (GFP) (Wichterle et al., 2002). It was generated to label motor neurons, which express Mnx1, but was later shown to also mark two types of retinal neurons: a subset of cone photoreceptors and a type of retinal ganglion cell (RGC) that responds selectively to dark or bright objects moving in a dorsal-to-ventral direction across the retina (ventral-preferring on-off direction-selective retinal ganglion cells or V-ooDSGCs; Trenholm et al., 2011, 2013; Figures 1A,B,E). RNA-Seq data generated in our laboratory show that neither cones nor V-ooDSGCs express Mnx1 at detectable levels (Peng et al., 2017; Sarin et al., 2018).
FIGURE 1. Retinal expression pattern of three transgenic mouse lines. (A) Section of an adult mouse retina stained with ToPro, a nuclear stain. Photoreceptors are located in the outermost layer, termed Outer Nuclear Layer (ONL). These form synapses in the Outer Plexiform Layer (OPL) with interneurons, whose cell bodies reside in the Inner Nuclear Layer (INL). Bipolar and amacrine cells of the INL also form synapses in the Inner Plexiform Layer (IPL), with ganglion cells from the Ganglion Cell Layer (GCL). (B,E) Expression of HB9-GFP in V-ooDSGCs (solid arrowheads) and cone photoreceptors (open arrowheads). (C,F) Expression of Mito-CFP in Type 1a (solid arrowheads) and 1b Bipolar Cells (BCs), as well as other bipolar, amacrine, and ganglion cells. (D,G) Expression of TYW3 in several types of retinal ganglion cells (RGCs) that stratify in the middle part of the IPL. All cells are GFP positive. Scale bars (A–D) are 40 μm and scale bars (E–G) are 20 μm. Boxed regions in (B–D) are shown at higher magnification in (E–G).
In a study of ooDSGCs, we discovered that V-ooDSGCs and dorsal preferring ooDSGCs (D-ooDSGCs) both express Cdh6, which encodes the recognition molecule Cadherin 6 (Kay et al., 2011a). We confirmed this previously documented co-expression for V-ooDSGCs using the HB9-GFP line (Figure 2A). Cdh6 is also expressed in a set of amacrine interneurons, termed starburst amacrine cells, which innervate ooDSGCs but do not express HB9-GFP; conversely, cones are HB9-GFP-positive but Cdh6-negative. Thus, of >120 retinal neuronal types (Laboulaye et al., unpublished), one expresses Cdh6 but not HB9-GFP, one expresses HB9-GFP but not Cdh6, one expresses both and >115 express neither.
FIGURE 2. Co-expression of transgenes and endogenous genes. (A) Retinal expression pattern of Cdh6 mRNA shown by in situ hybridization. Cdh6 is expressed in HB9-GFP positive V-ooDSGCs. (B) Retinal expression pattern of Fat4 mRNA shown by in situ hybridization. Fat4 is expressed in Mito-P positive BCs. Scale bars are 20 μm.
To study the role of Cdh6 in the development and function of V-ooDSCGs, we generated Cdh6 mutants and attempted to generate HB9-GFP+/-;Cdh6-/- mice by crossing HB9-GFP+/-;Cdh6+/- and Cdh6-/- mice. We retrieved no HB9-GFP+/-;Cdh6-/- animals from >200 offspring, suggesting that the HB9-GFP transgene and the endogenous Cdh6 gene were linked. We used TLA to test this possibility. Two primer pairs complementary to the transgene sequence were designed, one complementary to GFP sequences at the 3′ end of the transgene and the other complementary to Mnx1 sequences at the 5′ end of the transgene (Figure 3A). Both were used to generate products that were sequenced to a depth of 5 Mb.
FIGURE 3. HB9-GFP locus identification. (A) Schematic of the HB9-GFP transgene, showing positions of primers used for TLA. Primer sets were designed within the Mxn1 and the GFP sequences. (B) Genome-wide TLA coverage using Mnx1 primers. Peak at Chromosome 5 shows endogenous Mnx1 (denoted by black circle) and peak at Chromosome 15 shows inserted sequence (red circle). (C) Genome-wide TLA coverage using GFP primers, showing a peak at Chromosome 15 (red circle). (D) Regional coverage of the HB9-GFP insertion site on Chromosome 15, spanning 74 kb surrounding the integration site. (E) Schematic of the inserted sequence. Multiple copies of the transgene were inserted 3′ to 5′. HB9-GFP is located between Cdh6 and Cdh9 on Chromosome 15. Primers were designed to confirm the left and right junctions of the transgene with Chromosome 15. PCR product from homozygous HB9-GFP, heterozygous HB9-GFP, and C57BL/6J animals is denoted as “Tg/Tg,” “Tg/+,” “WT” respectively. (F) Confirmed insertion of the transgene with primers specific to the junction between the 3′ end of the transgene and Chromosome 15 (Primers 1 and 2). (G) Confirmed insertion of the transgene with primers specific to the 5′ end of the transgene and Chromosome 15 (Primers 3 and 4).
Results from both sets of primers identified the insertion site of the HB9-GFP transgene on Chromosome 15 (Chr15: 13,853,116 -13,862,097) (Figures 3B–D). Consistent with our prediction, the gene nearest to the HB9-GFP transgene was Cdh6 (Chr15:13,034,200–13,173,675), with the 3′ end of Cdh6 ∼680 kb upstream of the 5′end of HB9-GFP (Figure 3E). To confirm the insertion, we designed primers flanking the predicted left and right junctions. Genomic PCR confirmed the insertion of HB9-GFP in heterozygotes and homozygotes for both of these primer sets (Figures 3F,G). Based on the orientation of the junctions and the relative position of the transgene, we conclude that the transgene was inserted 3′ to 5′.
The Mito-P Transgene Is Inserted Near Fat4 on Chromosome 3
In the Mito-P transgene, the coding sequence of the enhanced cyan fluorescent protein (CFP) was fused to a sequence encoding a 31 amino acid fragment from the human subunit VIII of the cytochrome c oxidase gene sufficient to drive expression in mitochondria (Misgeld et al., 2007). This construct was inserted into a 6.5 kb fragment of the Thy1 gene that is known to drive expression in central projection neurons including motor and sensory neurons and RGCs (Caroni, 1997; Feng et al., 2000) (Figure 4A). This transgene was designed and used to monitor mitochondrial dynamics in motor axons. Several lines were generated, each of which labeled distinct neuronal types, indicating an insertion site-dependent pattern (Misgeld et al., 2007). In addition to motor axons, Mito-P also labels two of 15 types of bipolar interneurons (Type 1A and Type 1B) as well as one of ∼50 types of amacrine interneuron (nGnG) in retina (Schubert et al., 2008; Kay et al., 2011b; Shekhar et al., 2016). Bipolar and amacrine cells express Thy1 at low levels; a small number of RGCs, which express Thy1 at far higher levels, are also labeled (Figures 1A,C,F).
FIGURE 4. Mito-P locus identification. (A) Schematic of the Mito-P transgene, showing positions of primers used for TLA. Primer sets were designed within the Thy1 sequence and within the CFP sequence. (B) Genome-wide TLA coverage using Thy1 primers. Peak at Chromosome 9 shows endogenous Thy1 (black circle) and peak at Chromosome 3 shows inserted sequence (red circle). (C) Genome-wide TLA coverage using CFP primers, with a peak at Chromosome 3 showing the inserted sequence (red circle). (D) Regional coverage of the Mito-P insertion site on Chromosome 3 using both sets of primers, showing a 3 kb deletion and spanning 49 kb. (E) Schematic of the inserted sequence. The transgene was inserted multiple times between Fat4 and Intu in Chromosome 3, in the 5′ to 3′ direction. Primers were designed to confirm the left junction of the transgene to Chromosome 3, as well as the predicted deletion. PCR product from homozygous Mito-P, heterozygous Mito-P, and C57BL/6J animals is denoted as “Tg/Tg,” “Tg/+,” “WT” respectively. (F) Primers 5 and 6 were used to confirm the junction between the transgene and Chromosome 3 in heterozygous and homozygous animals. (G) Primers 5 and 7 were used to confirm the deletion engendered by the transgene. The band is absent in putative homozygous Mito-P animals.
In a study of mouse bipolar cells, we found that Types 1A and 1B expressed Fat4, which encodes a cell surface protein implicated in cell polarity (Shekhar et al., 2016; Figure 2B). To study the role of Fat4 in retinal development, we obtained conditional Fat4 mutants (Saburi et al., 2008) and attempted to generate Mito-P;Fat4loxP/loxP mice by breeding. However, we retrieved no Mito-P;Fat4loxP/loxP animals among >100 offspring generated from Mito-P;Fat4loxP/+ x Fat4loxP/loxP matings. This result, which paralleled that described above for HB9-GFP and Cdh6, suggested that the Mito-P transgene was inserted near the Fat4 locus.
TLA revealed that the Mito-P transgene was inserted in Chromosome 3 (Chr3: 39,505,947–39,508,740), and that the insertion was accompanied by a 3 kb deletion (Figures 4A–D). The insertion site is located 550 kb from the 3′ end of the Fat4 gene (Chr3: 38,886,940–38,952,429), accounting for our failure to recover Mito-P;Fat4loxP/loxP offspring via conventional recombination. We confirmed both the insertion site and the accompanying deletion by PCR on genomic DNA (Figures 4E–G).
The TYW3 Transgene Is Inserted Near Khdrbs2 on Chromosome 1
We generated a set of transgenic mice called TYW using the Thy1 sequences described above (Caroni, 1997; Feng et al., 2000). The transgene included a cDNA encoding YFP flanked by LoxP sites followed by cDNAs for E. coli beta galactosidase (LacZ) and wheat germ agglutinin (WGA) (Figure 5A). It was designed to express YFP constitutively and LacZ plus WGA following excision of the floxed cassette with Cre recombinase (Kim et al., 2010). In practice, however, LacZ and WGA were expressed at undetectable levels, but the YFP was expressed strongly. Each of several lines labeled distinct sets of retinal neurons. TYW3 labeled six of ∼45 RGC types. One type was labeled most brightly; we called it W3B and analyzed its development and function in detail (Zhang et al., 2012; Krishnaswamy et al., 2015). Remarkably, dendrites of all 6 (W3B and five dimmer types together called W3D) laminated in a narrow central stratum in the central third of the inner plexiform layer (Figures 1A,D,G) (Whitney and Sanes, unpublished). Assuming that all thirds of the IPL are populated by equal numbers of dendrites, the odds of 6 stratifying in the same third are (1/36) × 3 or 1/243. We therefore wondered whether sequences near the TYW3 insertion site contributed to this unusual expression pattern.
FIGURE 5. TYW3 locus identification. (A) Schematic of the TYW3 transgene, showing positions of primers used for TLA. Both primer sets were designed within the Thy1 sequence. (B) Genome-wide TLA coverage using Thy1 primer set 1. Peak at Chromosome 9 shows endogenous Thy1 (black circle) and peak at Chromosome 1 shows inserted sequence (red circle). (C) Genome-wide TLA coverage using Thy1 primer set 2. Peak at Chromosome 9 shows endogenous Thy1 (black circle) and peak at Chromosome 1 shows inserted sequence (red circle). (D) Regional coverage of the TYW3 insertion site on Chromosome 1 using both sets of primers, showing a 419 kb deletion and spanning 5 Mb. (E) Schematic of the inserted sequence on Chromosome 1. The transgene was inserted multiple times between Lgsn and Khdrbs2 in Chromosome 1, in the 5′ to 3′ orientation. Primers were designed to confirm the left junction between the transgene and Chromosome 1, as well as the predicted deletion. PCR product from homozygous TYW3 and C57BL/6J animals is denoted as “Tg/Tg” and “WT” respectively. (F) Primers 8 and 9 were used to confirm the junction between the 5′ end of the transgene and the 3′ end of Chromosome 1 in homozygous animals. This 292 bp band is absent in wild-type animals (G) Primers 10 and 11 were used to confirm the deletion engendered by the transgene. The 173 bp band is absent in putative homozygous TYW3 animals.
TLA revealed that the TYW3 transgene was inserted in Chromosome 1 (Chr1: 31,745,752–32,165,062). The insertion was accompanied by a 420 kb deletion directly upstream of the insertion site (Figures 5A–D). The 3′ end of the TYW3 transgene is 7 kb upstream of the initiation codon of the Khdrbs2 gene (Figures 5D,E), which encodes an RNA binding regulator of alternative splicing also called Slm1 (Ehrmann et al., 2016). We validated the insertion site with primers spanning the junction between the 3′ end of Chromosome 1 and the 5′ end of the transgene (Figure 5F), as well as the deletion caused by the insertion using primers for a sequence within this putatively deleted region (Figure 5G).
Insertion Site Maps Enable Determination of Zygosity by Conventional PCR Genotyping
When transgenic mice are inbred, heterozygotes are generally distinguished from homozygotes either by qPCR of genomic DNA or by outcrossing to wild type animals. The former, which we used to assess zygosity for TLA, is subject to considerable variation and the latter is cumbersome; both are time-consuming and costly. Moreover, the use of primers derived from the reporter (e.g., GFP) can also give ambiguous results if, for example, more than one line contains a fluorescent protein as occurs in some complex mating schemes. Once insertion sites are mapped, however, line-specific primers can be designed that allow one to distinguish zygosity without relying on relative fluorescent RT-PCR intensities (Cain-Hom et al., 2017). We demonstrate this for two of the three lines analyzed here. For Mito-P, Primers 5 and 7 in Figure 4E generate a band in wild types or heterozygotes but none in homozygotes, because the sequence recognized by Primer 7 is deleted (Figure 4G). Likewise, Primers 10 and 11 in Figure 5E recognize a sequence deleted in the TWY3 line, so PCR using these primers generates a band in wild types or heterozygotes but not in homozygotes (Figure 5G). In addition, one primer set in each case generates a band unique to the lines: Primers 1 and 2 or 3 and 4 for Hb9-GFP (Figures 3E–G), Primers 5 and 6 for Mito-P (Figures 4E,F) and Primers 8 and 9 for TYW3 (Figures 5E,F). Thus, information gained from the insertion site greatly simplifies genotyping.
All Three Transgenes Are Present in Multiple Copies
Multiple copies of transgenes are frequently integrated in a head-to-tail tandem array at a single genomic site (Palmiter and Brinster, 1986). Transgene copy number can vary from one to over one hundred, and the number can influence both qualitative and quantitative aspects of transgene expression (e.g., Williams et al., 2008). To more completely characterize the HB9-GFP, Mito-P and TYW3 transgenes, we estimated their copy number. To this end, we analyzed data from the commercial service (Transnetyx)2 that we employ to genotype our transgenic lines. Transnetyx uses a quantitative fluorescent PCR method to detect transgene-specific sequences. The raw quantitative PCR results are normalized to a control endogenous gene (cJun). We compared the relative intensities of our three transgenic lines to intensities of single copy knock-in lines. After averaging the signals of 46–56 heterozygous animals from each line and normalizing to single copy intensities of GFP (n = 20) or LacZ (n = 22), we estimate that the HB9-GFP, Mito-P and TYW3 transgenes contain 10–11 (10.57 ± 0.60), 13 (12.92 ± 0.53) and 5 copies (5.21 ± 0.13) respectively (Figure 6) (Mean ± SEM).
FIGURE 6. Estimation of transgene copy number. Copy number was determined by quantitative PCR of genomic DNA, with values normalized to a line with a single copy of GFP or LacZ. Signal intensities were garnered from the following number of animals: n = 20–22 for single GFP and LacZ lines; n = 46 for HB9-GFP; n = 56 for Mito-P; n = 51 for TYW3.
In each case, we are confident that all copies are inserted at a single genomic site for two reasons. First, TLA revealed only a single insertion site for each transgene (Figures 3–5). Second, even if more than a single insertion occurred initially, the lines have been bred for at least 10 years, or >40 generations, which is more than enough to segregate multiple inserts.
Effect of Transgenes on Expression of Neighboring Endogenous Genes
We next asked if the transgenes we had studied affected expression of neighboring endogenous genes. Using quantitative PCR (qPCR), we found that levels of Khdrbs2 mRNA were reduced by ∼45% in TYW3 homozygotes compared to controls; Fat4 mRNA levels were reduced by ∼25% in Mito-P homozygotes compared to controls; and Cdh6 mRNA levels were reduced by ∼10% in HB9-GFP homozygotes compared to controls. The reductions were statistically significant for all three lines (p < 0.0001 for TYW3 and Mito-P; p = 0.0094 for HB9-GFP by Student’s t-test). Interestingly, the effect size of these transgenes on endogenous gene expression is related to the distance between the two (Figure 7). However, numerous other factors may affect expression, including deletions of endogenous sequences, transgene size and transgene copy number; our sample size is too small to distinguish among these possible explanations.
FIGURE 7. Effect of transgenes on expression of neighboring endogenous genes. Expression of Slm1, Fat4, and Cdh6 mRNA from TYW3, Mito-P, and HB9-GFP homozygous animals, respectively, was determined by RT qPCR. Retinas were collected at P56 for TYW3, P25 for Mito-P, and P6 for HB9-GFP. Values were compared to those from wild-type littermates for Mito-P and HB9-GFP and to age-matched C57BL/6J controls for TYW3. ΔΔ CT values were calculated against Δ CT values of Gapdh. The difference was calculated as 2ˆ-ΔΔCT. The change in expression compared to control was significant for all three transgenic lines: Slm1 in TYW3 homozygotes (p < 0.0001), Fat4 in Mito-P homozygotes (p < 0.0001), and Cdh6 in HB9-GFP homozygotes (p < 0.009) (HB9-GFP n = 4; Mito-P n = 8; TYW3 n = 4). Significance was calculated by one-way ANOVA and Tukey’s multiple comparison tests. Effect of transgene on endogenous gene expression varies with distance between the transgene and the endogenous gene.
Interactions Between the TYW3 Transgene and the Endogenous Khdrbs2 Gene
Because the TYW3 transgene exerted a strong effect on expression of Khdrbs2, we used immunohistochemical methods to examine interactions between the transgene and the endogenous gene in cellular detail. In wild-type retinas, Slm1 was present in most RGCs, as identified by double-labeling with the pan-RGC marker Rbpms (Rodriguez et al., 2014) and most amacrine cells, identified with the pan-amacrine marker AP2 (Bassett et al., 2007) (Figures 8A,B). Slm1 also appeared to be expressed by horizontal cells, identified by soma position. Bipolar, photoreceptors, and Müller glial cells were not detectably labeled. Patterns of Slm1 labeling in TYW3 heterozygotes appeared similar to those in wild-type retina (Figure 8C). We also noted that YFP-positive RGCs, which comprise ∼15% of all RGCs, were nearly all Slm1-positive (Figures 8C, 9I).
FIGURE 8. Expression of Slm1 in wild-type retina and TYW3 retinal architecture. (A) Slm1 is expressed by most RGCs. Section stained with Rbpms and Slm1. (B) Slm1 is expressed by most amacrine cells. Section stained with TFAP2 (AP2) and Slm1. (C) Slm1 is expressed by YFP-positive RGCs in the TYW3 line. Section stained with GFP and Slm1. Sections from wild type (D) and TYW3 homozygous (E) retina. The difference between panels (D) and (E) falls within the normal range of variation owing to section quality, staining intensity and differences among individuals. Section stained with ToPro. Lamination of TYW3-RGCs in heterozygous (F) and homozygous (G) retina. Section stained with GFP and ChAT, labeling starburst amacrine cells. (H) Rbpms counts in wild-type, TYW3 Tg/+, and TYW3 Tg/Tg retinas (Mean ± SEM). No significant difference between conditions (n = 5 for WT, n = 4 for TYW3 Tg/+ and TYW3 Tg/Tg). (I) TYW3-RGC counts in TYW3 Tg/+ and TYW3 Tg/Tg retinas (Mean ± SD). No significant difference between conditions (n = 4 for both). Scale bars for (A–G) are 40 μm.
FIGURE 9. Expression of Slm1 and Slm2 in TYW3 homozygotes. Expression of Slm1 in RGCs in wholemount of wild-type (A) and TYW3 homozygous (B) retinas. Stained with Rbpms and Slm1. (C) Quantification of Slm1 expression in RGCs across TYW3 genotypes. In wild-type animals, 96.1 ± 0.8% of RGCs express Slm1, while in TYW3 homozygotes, only 39.9 ± 2.3% of Rbpms-positive RGCs express Slm1 (Mean ± SEM). There is no significant decrease in Slm1 expression by RGCs in TYW3 heterozygotes. Significance determined by one-way ANOVA (n = 5 for WT, n = 4 for TYW3 Tg/+ and TYW3 Tg/Tg). Expression of Slm1 in amacrine cells in wholemount in wild-type (D) and TYW3 homozygous (E) retina. Stained with AP2 and Slm1. (F) Quantification of Slm1 expression in amacrine cells across TYW3 genotypes. In wild-type animals, 85.0 ± 0.9% of RGCs express Slm1, while in TYW3 homozygotes, only 46.8 ± 0.5% of AP2-positive ACs express Slm1. Significance determined by t-test (n = 3 for both) (Mean ± SEM). Slm1 expression in TYW3-RGCs in wholemount in TYW3 heterozygous (G) and TYW3 homozygous (H) retina. Stained with GFP and Slm1. (I) Quantification of Slm1 expression in TYW3-RGCs. As in section, 97.9 ± 1.0% of heterozygous TYW3-RGCs express Slm1, while in TYW3 homozygotes, only 5.2 ± 2.3% still express Slm1. Significance determined by t-test (n = 4 for both) (Mean ± SEM). (J) Slm1 and Slm2 expression overlaps in wild-type retina. Slm2 is also expressed in bipolar cells of the INL. (K) Slm1 and Slm2 expression in TYW3 homozygous retina. While Slm1 levels decrease, there does not appear to be a significant change in the pattern of Slm2 expression. Scale bars are 40 μm.
We then assessed TYW3 homozygotes, which, as noted above, are Khdrbs2 hypomorphs. We detected no alterations in the overall structure of the retina (Figures 8D,E) or in the lamination pattern of TYW3 RGCs (Figures 8F,G). We found no significant change in the total number of RGCs (Rbpms-positive) in TYW3 homozygotes (Figure 8H). Likewise, the number of YFP-positive RGCs did not differ significantly between TYW3 heterozygotes and homozygotes (Figure 8I).
Although there were no changes in the general organization of TYW3 Tg/Tg retinas, decreased levels of Slm1 were apparent in both RGCs and amacrine cells. We found that only 39.9 ± 3.4% of Rbpms-positive RGCs (Figures 9A–C) and 46.8 ± 1.0% of AP2-positive amacrines were Slm1-positive in TYW3 homozygotes (Mean ± SEM) (Figures 9D–F). Interestingly, however, the loss of Slm1 from transgene-positive RGCs in TYW3 homozygotes was greater than that in RGCs generally: only 5.2 ± 2.3% of YFP-RGCs were detectably Slm1-positive in homozygotes (Figures 9G–I).
Because Slm2 is upregulated in the brains of Khdrbs2 knock out mice (Traunmüller et al., 2014), we investigated Slm2 expression in TYW3 homozygotes. In wild-types, Slm2 was expressed by retinal ganglion, amacrine, and bipolar cells (Figure 9J). We found no detectable upregulation of Slm2 in TYW3 Tg/Tg retinas (Figure 9K), but its expression in most Slm1-positive cells even in wild-types suggests that it may be able to compensate for loss of Slm1 in homozygotes.
Many researchers have benefited from transgenic mice in which reporters are expressed in specific cell types that were not readily predictable based on the expression of the gene from which the transgene’s regulatory elements were derived – in other words, transgenes exhibiting what was presumed to be insertion site-dependent expression (for example, Cohen-Tannoudji et al., 1994; Young et al., 2008; Huberman et al., 2009; Haverkamp et al., 2009; Trenholm et al., 2011; Kay et al., 2011b; Dhande et al., 2013; Krishnaswamy et al., 2015; Peng et al., 2017). Lines generated with Thy1 derived regulatory elements have been a particularly rich source of such variation: an initial set of 25 such “XFP” lines (Feng et al., 2000) have been used to study neuronal subsets in cortex, hippocampus, spinal cord and dorsal root ganglia as well as retina. Lines in other sets using the same regulatory elements, such as the Brainbow series, incorporating multiple XFPs (Livet et al., 2007) the SLICK series, incorporating a Cre recombinase (Young et al., 2008), and the TYW series, incorporating lacZ (Kim et al., 2010) show similar line-to-line variation; for these as well, different lines have been used to mark and analyze different cell types, including some in non-neuronal tissues.
Although identifying endogenous genes near the transgene could aid in interpreting transgene expression patterns, this has been attempted infrequently, in large part because methods for determining insertion sites have been unreliable. We were motivated to reexamine this issue for two reasons. First, initial reports suggested that the TLA method would be more reliable than its predecessors (de Vree et al., 2014; Cain-Hom et al., 2017). Second, in the course of our developmental studies, we obtained suggestive evidence that two such transgenes were inserted in close proximity to genes expressed in small neuronal subsets that the transgenes marked: HB9-GFP near Cdh6 and Mito-P near Fat4. Results reported here confirm those suppositions, demonstrate linkage of the TYW3 transgene to Khdrbs2 and, more important, provide new insights into the influence of transgenes and endogenous genes on each other.
Endogenous Genes Affect Expression of Neighboring Transgenes
Our claim for an effect of endogenous genes on transgene expression is based on the selective expression of the HB9-GFP transgene in Cdh6-positive V-ooDSGCs and the selective expression of the Mito-P transgene in Fat4-positive BC1 bipolar cells. Although we cannot entirely rule out the possibility that the correspondence is coincidental, it is highly unlikely. Of >120 retinal cell types, 3 express Cdh6 (V-ooDSGCs, D-ooDSGCs, and starburst amacrine cells) and 2 are GFP-positive in HB9-GFP transgenic mice (V-ooDSGCs and cones). The odds of one of the GFP-positive types being Cdh6-positive by chance are 0.05 (see Materials and Methods). Likewise, 2 cell types are Fat4-positive (BC1A and BC1B) and 3 are CFP-positive in the Mito-P line (BC1A, BC1B and nGnG amacrines), with the odds of two CFP-positive types being Fat4 positive by chance being 0.0004. This level of co-expression is therefore unlikely to be random.
Two aspects of the influence of the insertion site are noteworthy. First, as detailed above, it is incomplete, differing from patterns seen in “enhancer traps” that often faithfully report on the expression of a neighboring gene. What other factors might influence transgene expression? One possibility is that it reflects expression of the gene from which the transgene’s regulatory elements are derived, but this seems unlikely. Hb9 is not detectably expressed in RGCs or cones (Peng et al., 2017; Sarin et al., 2018) and although some bipolar cells express Thy1, they do so at substantially lower levels than RGCs (Barnstable and Dräger, 1984; Macosko et al., 2015). Another possibility is that rearrangements within the transgene or at the insertion site have generated new specificities, such as the deletion directly upstream of the Mito-P transgene and the duplication within the HB9-GFP transgene (Figures 3, 4). Yet another, and perhaps most likely, is the one initially proposed for unexpected patterns of transgene expression, that juxtaposition of sequences within and outside of the transgene generates novel specificities (Palmiter et al., 1983).
A second point of interest is that the distances between these two transgenes and the nearest annotated genes are rather large: HB9-GFP is ∼680 kb from Cdh6 and Mito-P is ∼550 kb downstream of Fat4. Although such long-distance interactions were once thought to be unusual, recent studies have shown that chromatin is organized into regions such as topologically associating domains or TADs, ranging from a few hundred kilobases to a few megabases, within which gene expression is coordinated by enhancers that act over the entire domain (Symmons et al., 2014; Dekker and Heard, 2015; Dixon et al., 2016). TADs represent islands of the genome that are physically isolated from each other and are increasingly viewed as basic units of chromatin folding. The physical separation may enhance co-regulation of genes in the same TAD and prevent regulatory elements from affecting expression of genes in other neighboring TADs. Using the dataset of Dixon et al. (2016; Wang et al., 2017), we find that HB9-GFP and Cdh6 are indeed within the same TAD, as are Mito-P and Fat4 (Figure 10). This 3-dimensional genomic compartmentalization produces secondary and tertiary structures, leading to interactions between regions that are separated by substantial linear sequence. Moreover, both transgenes are inserted in gene-poor regions, with no other annotated genes over spans of 2.9 and 1.0 Mb surrounding HB9-GFP and Mito-P respectively (Figures 3, 4). This contrasts with an average intergenic distance of approximately 100 kb in the mouse genome (approximately 2 × 104 genes in a 2 × 109bp genome; see also, Mayer et al., 2005). In such regions, complex interfering signals from multiple genes or boundaries that insulate genes from each other may be minimized. Thus, while the HB9-GFP and Mito-P transgenes are 100s of kilobases from their endogenous neighbors, gene-poor regions adjacent to the transgenes and their insertion into TADs may explain the selective expression of these transgenes by cell types also expressing their nearest neighbors.
FIGURE 10. Topologically associating domains (TADs) surrounding HB9-GFP, Mito-P, and TYW3. Predicted topologically associating domains (conserved regions of 3-dimensional interaction) surrounding (A) Cdh6 on Chromosome 15, (B) Fat4 on Chromosome 3 (B), and Khdrbs2 on Chromosome 1 (C), calculated using the tools in Wang et al. (2017). Cdh6 and HB9-GFP appear in the same TAD, as do Fat4 and Mito-P. There are no other genes within these TADs. The insertion site of TYW3 does not appear to be within a particularly interactive region of Chromosome 1. Individual bins represent 40 kb. Analysis was performed using the Hi-C mouse cortex dataset of Dixon et al. (2016) and yielded heat maps of predicted interactions between bins within the specified regions. Areas of higher intensity on the heatmap may be considered sub-TADs or individual loops within the larger TAD.
In contrast to Cdh6/HB9-GFP and Fat4/Mito-P, it is uncertain whether the TYW3 expression pattern is influenced by the nearby Khdrbs2 for several reasons. Thy1 is already expressed by all RGCs in wild-type retina and most transgenes incorporating Thy1 regulatory elements are expressed by at least some RGCs (Feng et al., 2000; Kim et al., 2010; Misgeld et al., 2007). Likewise, most RGCs are Khdrbs2-positive, and TYW3-RGCs did not express Khdrbs2 at detectably higher levels than their YFP-negative neighbors. Thus, it is not possible to disentangle the effects of Thy1- and Khdrbs2-derived regulatory sequences.
Transgenes Affect Expression of Neighboring Endogenous Genes
Many cases have been described in which insertion of a transgene mutates an endogenous gene, leading to severe defects or lethality (e.g., Soriano et al., 1987; Keller et al., 1990; Woychik and Alagramam, 1998). Indeed, some genetic screens have relied on insertional mutagenesis, using transposons and retroviral vectors as mutagens (Golling et al., 2002; Shima et al., 2016). Nonetheless, the possibility that transgenic reporters may affect endogenous genes in ways that lead to subtle defects is less often considered. It is therefore sobering that all three of the transgenes we studied affected expression of a neighboring endogenous gene. The effect, in this admittedly small sample, was distance dependent: greatest for TYW3 and Khdrbs2, separated by 7 kb, modest for Mito-P and Fat4, separated by 550 kb and small but significant for HB9-GFP and Cdh6, separated by 680 kb. Interference with the endogenous gene might result from interruption of endogenous regulatory elements, the deletions and rearrangements that accompany insertion, alterations in chromatin structure, or some combination.
For TYW3, the decrease in Slm1 expression was striking. We observed no overt phenotype, consistent with the finding that even Slm1 null mutants are viable, fertile and outwardly normal (Iijima et al., 2014; Traunmüller et al., 2014). Nonetheless, Khdrbs2 (Slm1) and its homologues, Khdrbs1 (Sam68) and Khdrbs3 (Slm2) are known to regulate alternative splicing of critical neuronal genes (Ehrmann et al., 2016) so alterations in its activity could affect neuronal development or function. Likewise, Fat4 has been implicated in several developmental processes, with the null mutant being neonatally lethal (Saburi et al., 2008). Thus, even the modest defects observed in Mito-P could be consequential under some circumstances.
Mapping Mouse Transgene Integration Sites Is Feasible and Useful
We have demonstrated complex reciprocal interactions between transgenes and neighboring endogenous genes. These interactions are interesting, but also potentially worrisome, since the endogenous gene affected by the transgene has a substantial chance of being expressed in the very cells that the transgene is being used to study.
We analyzed only three lines and we deliberately chose ones that exhibited interesting integration site-dependent expression patterns in retina, so it is difficult to draw strong conclusions about the frequency of these interactions. Nonetheless, there are several reasons to believe that they are more frequent than has been generally appreciated. First, we observed effects of an endogenous gene on transgene expression in two and possibly all three of the lines, and effects of the transgene on expression of the neighboring endogenous gene in all three of the lines. Second, for two of the three lines, the distance between transgene and nearest endogenous neighbor is >500 kb, dispelling the notion that insertion within a gene is prerequisite to interaction. Third, sporadic reports appearing over a long period have provided additional cases in which transgene insertions near but not within endogenous genes affect expression of the endogenous gene (e.g., Sharpe et al., 1999; Mukai et al., 2006), or the endogenous gene influences expression of the transgene (e.g., Kothary et al., 1988; Sharpe et al., 1999; Narboux-Nême et al., 2012).
Together, our results show that mapping of transgene insertion sites can be useful in at least three respects: First, once neighboring genes have been identified, their expression can be assayed to test the possibility that a transgenic line is in fact a hypomorph. Second, if the endogenous gene is expressed in cells marked by the transgene, it becomes a candidate effector of that cell’s development or function. Third, once the insertion site has been mapped, it becomes straightforward to devise genotyping protocols that are specific to the line and that readily distinguish heterozygotes from homozygotes. Additional potential uses include targeting new transgenes to insertion sites expected to confer desirable expression patterns on them.
To date, mapping of transgene insertion sites has not been standard practice, both because its value has been questionable and reliable methods for doing so have not been available. With improved methods now available and evidence accumulating that interactions of transgenes and endogenous genes are frequent, it may be advisable to make this a more common practice.
XD, ML, MQ, and IW performed the experiments and analyzed the data. JS conceived the project and analyzed the data. ML and JS wrote the manuscript. All the authors reviewed and edited the manuscript.
This work was supported by NIH grants R01 EY022073 and R37 NS029169 to JS and a Klingenstein-Simons Neuroscience Fellowship to XD.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank Karthik Shekhar for statistical help; Max van Min and Judith Bergboer (Cergentis) for assistance in interpreting TLA results and permission to use material from their reports in Figures 3–5; Helen McNeil (Lunenfeld Institute) for providing the Fat4 conditional mutant and Peter Scheiffele. (Biozentrum, Basel) for antibodies to Slm1 and Slm2.
Bassett, E. A., Pontoriero, G. F., Feng, W., Marquardt, T., Fini, M. E., Williams, T., et al. (2007). Conditional deletion of activating protein 2alpha (AP-2alpha) in the developing retina demonstrates non-cell-autonomous roles for AP-2alpha in optic cup development. Mol. Cell. Biol. 27, 7497–7510. doi: 10.1128/MCB.00687-07
Bier, E., Vaessin, H., Shepherd, S., Lee, K., McCall, K., Barbel, S., et al. (1989). Searching for pattern and mutation in the Drosophila genome with a P-lacZ vector. Genes Dev. 3, 1273–1287. doi: 10.1101/gad.3.9.1273
Brinster, R. L., Chen, H. Y., Trumbauer, M., Senear, A. W., Warren, R., and Palmiter, R. D. (1981). Somatic expression of herpes thymidine kinase in mice following injection of a fusion gene into eggs. Cell 27, 223–231. doi: 10.1016/0092-8674(81)90376-7
Burgess, D. L., Kohrman, D. C., Galt, J., Plummer, N. W., Jones, J. M., Spear, B., et al. (1995). Mutation of a new sodium channel gene, Scn8a, in the mouse mutant ‘motor endplate disease’. Nat. Genet. 10, 461–465. doi: 10.1038/ng0895-461
Cain-Hom, C., Splinter, E., van Min, M., Simonis, M., van de Heijning, M., Martinez, M., et al. (2017). Efficient mapping of transgene integration sites and local structural changes in Cre transgenic mice using targeted locus amplification. Nucleic Acids Res. 45:e62. doi: 10.1093/nar/gkw1329
de Vree, P. J., de Wit, E., Yilmaz, M., van de Heijning, M., Klous, P., Verstegen, M. J., et al. (2014). Targeted sequencing by proximity ligation for comprehensive variant detection and local haplotyping. Nat. Biotechnol. 32, 1019–1025. doi: 10.1038/nbt.2959
Dhande, O. S., Estevez, M. E., Quattrochi, L. E., El-Danaf, R. N., Nguyen, P. L., Berson, D. M., et al. (2013). Genetic dissection of retinal inputs to brainstem nuclei controlling image stabilization. J. Neurosci. 33, 17797–17813. doi: 10.1523/JNEUROSCI.2778-13.2013
Donoghue, M. J., Merlie, J. P., Rosenthal, N., and Sanes, J. R. (1991). A rostrocaudal gradient of transgene expression in adult skeletal muscle. Proc. Natl. Acad. Sci. U.S.A. 88, 5847–5851. doi: 10.1073/pnas.88.13.5847
Dubose, A. J., Lichtenstein, S. T., Narisu, N., Bonnycastle, L. L., Swift, A. J., Chines, P. S., et al. (2013). Use of microarray hybrid capture and next-generation sequencing to identify the anatomy of a transgene. Nucleic Acids Res. 41:e70. doi: 10.1093/nar/gks1463
Feng, G., Mellor, R., Bernstein, M., Keller-Peck, C., Nguyen, Q. T., Wallace, M., et al. (2000). Imaging neuronal subset in transgenic mice expressing multiple spectral variants of GFP. Neuron 28, 41–51. doi: 10.1016/S0896-6273(00)00084-2
Golling, G., Amsterdam, A., Sun, Z., Antonelli, M., Maldonado, E., Chen, W., et al. (2002). Insertional mutagenesis in zebrafish rapidly identifies genes essential for early vertebrate development. Nat. Genet. 31, 135–140. doi: 10.1038/ng896
Haverkamp, S., Inta, D., Monyer, H., and Wässle, H. (2009). Expression analysis of green fluorescent protein in retinal neurons of four transgenic mouse lines. Neuroscience 160, 126–139. doi: 10.1016/j.neuroscience.2009.01.081
Huberman, A. D., Wei, W., Elstrott, J., Stafford, B. K., Feller, M. B., and Barres, B. A. (2009). Genetic identification of an On-Off direction-selective retinal ganglion cell subtype reveals a layer-specific subcortical map of posterior motion. Neuron 62, 327–334. doi: 10.1016/j.neuron.2009.04.014
Iijima, T., Iijima, Y., Witte, H., and Scheiffele, P. (2014). Neuronal cell type-specific alternative splicing is regulated by the KH domain protein SLM1. J. Cell. Biol. 204, 331–342. doi: 10.1083/jcb.201310136
Kay, J. N., De la Huerta, I., Kim, I. J., Zhang, Y., Yamagata, M., Chu, M. W., et al. (2011a). Retinal ganglion cells with distinct directional preferences differ in molecular identity, structure, and central projections. J. Neurosci. 31, 7753–7762. doi: 10.1523/JNEUROSCI.0907-11.2011
Kay, J. N., Voinescu, P. E., Chu, M. W., and Sanes, J. R. (2011b). Neurod6 expression defines novel retinal amacrine cell subtypes and regulates their fate. Nat. Neurosci. 14, 965–972. doi: 10.1038/nn.2859
Kim, I. J., Zhang, Y., Meister, M., and Sanes, J. R. (2010). Laminar restriction of retinal ganglion cell dendrites and axons: subtype-specific developmental patterns revealed with transgenic markers. J. Neurosci. 30, 1452–1462. doi: 10.1523/JNEUROSCI.4779-09.2010
Kothary, R., Clapoff, S., Brown, A., Campbell, R., Peterson, A., and Rossant, J. (1988). A transgene containing lacZ inserted into the dystonia locus is expressed in neural tube. Nature 335, 435–437. doi: 10.1038/335435a0
Krishnaswamy, A., Yamagata, M., Duan, X., Hong, Y. K., and Sanes, J. R. (2015). Sidekick 2 directs formation of a retinal pathway that detects differential motion. Nature 524, 466–470. doi: 10.1038/nature14682
Livet, J., Weissman, T. A., Kang, H., Draft, R. W., Lu, J., Bennis, R. A., et al. (2007). Transgenic strategies for combinatorial expression of fluorescent proteins in the nervous system. Nature 450, 56–62. doi: 10.1038/nature06293
Macosko, E. Z., Basu, A., Satija, R., Nemesh, J., Shekhar, K., Goldman, M., et al. (2015). Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214. doi: 10.1016/j.cell.2015.05.002
Mayer, R., Brero, A., von Hase, J., Schroeder, T., Cremer, T., and Dietzel, S. (2005). Common themes and cell type specific variations of higher order chromatin arrangements in the mouse. BMC Cell Biol. 6:44. doi: 10.1186/1471-2121-6-44
Mukai, H. Y., Motohashi, H., Ohneda, O., Suzuki, N., Nagano, M., and Yamamoto, M. (2006). Transgene insertion in proximity to the c-myb gene disrupts erythroid-megakaryocytic lineage bifurcation. Mol. Cell. Biol. 26, 7953–7965. doi: 10.1128/MCB.00718-06
Narboux-Nême, N., Goïame, R., Mattéi, M. G., Cohen-Tannoudji, M., and Wassef, M. (2012). Integration of H-2Z1, a somatosensory cortex-expressed transgene, interferes with the expression of the Satb1 and Tbc1d5 flanking genes and affects the differentiation of a subset of cortical interneurons. J. Neurosci. 32, 7287–7300. doi: 10.1523/JNEUROSCI.6068-11.2012
Palmiter, R. D., Norstedt, G., Gelinas, R. E., Hammer, R. E., and Brinster, R. L. (1983). Metallothionein-human GH fusion genes stimulate growth of mice. Science 222, 809–814. doi: 10.1126/science.6356363
Peng, Y. R., Tran, N. M., Krishnaswamy, A., Kostadinov, D., Martersteck, E. M., and Sanes, J. R. (2017). Satb1 regulates contactin 5 to pattern dendrites of a mammalian retinal ganglion cell. Neuron 95, 869–883. doi: 10.1016/j.neuron.2017.07.019
Raman, P., Grachtchouk, V., Lyons, R. H. Jr., and Koenig, R. J. (2015). Identification of the genomic insertion site of the thyroid peroxidase promoter-cre recombinase transgene using a novel, efficient, next-generation DNA sequencing method. Thyroid 25, 1162–1166. doi: 10.1089/thy.2015.0215
Rao, M. V., Donoghue, M. J., Merlie, J. P., and Sanes, J. R. (1996). Distinct regulatory elements control muscle-specific, fiber type-selective, and axially-graded expression of a myosin light chain gene in transgenic mice. Mol. Cell. Biol. 16, 3909–3922. doi: 10.1128/MCB.16.7.3909
Ray, T. A., Roy, S., Kozlowski, C., Wang, J., Cafaro, J., Hulbert, S. W., et al. (2018). Formation of retinal direction-selective circuitry initiated by starburst amacrine cell homotypic contact. eLife 7:e34241. doi: 10.7554/eLife.34241
Rodriguez, A. R., de Sevilla Müller, L. P., and Brecha, N. C. (2014). The RNA binding protein RBPMS is a selective marker of ganglion cells in the mammalian retina. J. Comp. Neurol. 522, 1411–1443. doi: 10.1002/cne.23521
Saburi, S., Hester, I., Fischer, E., Pontoglio, M., Eremina, V., Gessler, M., et al. (2008). Loss of Fat4 disrupts PCP signaling and oriented cell division and leads to cystic kidney disease. Nat. Genet. 40, 1010–1015. doi: 10.1038/ng.179
Sarin, S., Zuniga-Sanchez, E., Kurmangaliyev, Y. Z., Cousins, H., Patel, M., Hernandez, J., et al. (2018). Role for Wnt signaling in retinal neuropil development: analysis via RNA-seq and in vivo somatic CRISPR mutagenesis. Neuron 98, 109–126. doi: 10.1016/j.neuron.2018.03.004
Schubert, T., Kerschensteiner, D., Eggers, E. D., Misgeld, T., Kerschensteiner, M., Lichtman, J. W., et al. (2008). Development of presynaptic inhibition onto retinal bipolar cell axon terminals is subclass-specific. J. Neurophysiol. 100, 304–316. doi: 10.1152/jn.90202.2008
Sethuramanujam, S., Yao, X., deRosenroll, G., Briggman, K. L., Field, G. D., and Awatramani, G. B. (2017). “Silent” NMDA synapses enhance motion sensitivity in a mature retinal circuit. Neuron 96, 1099–1111. doi: 10.1016/j.neuron.2017.09.058
Sha, H., Xu, J., Tang, J., Ding, J., Gong, J., Ge, X., et al. (2007). Disruption of a novel regulatory locus results in decreased Bdnf expression, obesity, and type 2 diabetes in mice. Physiol. Genomics 31, 252–263. doi: 10.1152/physiolgenomics.00093.2007
Sharpe, J., Lettice, L., Hecksher-Sorensen, J., Fox, M., Hill, R., and Krumlauf, R. (1999). Identification of sonic hedgehog as a candidate gene responsible for the polydactylous mouse mutant Sasquatch. Curr Biol. 9, 97–100. doi: 10.1016/S0960-9822(99)80022-0
Shekhar, K., Lapan, S. W., Whitney, I. E., Tran, N. M., Macosko, E. Z., Kowalczyk, K., et al. (2016). Comprehensive classification of retinal bipolar neurons by single-cell transcriptomics. Cell 166, 1308–1323. doi: 10.1016/j.cell.2016.07.054
Shima, Y., Sugino, K., Hempel, C. M., Shima, M., Taneja, P., Bullis, J. B., et al. (2016). A mammalian enhancer trap resource for discovering and manipulating neuronal cell types. eLife 5:e13503. doi: 10.7554/eLife.13503
Soriano, P., Gridley, T., and Jaenisch, R. (1987). Retroviruses and insertional mutagenesis in mice: proviral integration at the Mov 34 locus leads to early embryonic death. Genes Dev. 1, 366–375. doi: 10.1101/gad.1.4.366
Srivastava, A., Philip, V. M., Greenstein, I., Rowe, L. B., Barter, M., Lutz, C., et al. (2014). Discovery of transgene insertion sites by high throughput sequencing of mate pair libraries. BMC Genomics 15:367. doi: 10.1186/1471-2164-15-367
Suzuki, O., Hata, T., Takekawa, N., Koura, M., Takano, K., Yamamoto, Y., et al. (2006). Transgene insertion pattern analysis using genomic walking in a transgenic mouse line. Exp. Anim. 55, 65–69. doi: 10.1538/expanim.55.65
Swanson, L. W., Simmons, D. M., Arriza, J., Hammer, R., Brinster, R., Rosenfeld, M. G., et al. (1985). Novel developmental specificity in the nervous system of transgenic animals expressing growth hormone fusion genes. Nature 317, 363–366. doi: 10.1038/317363a0
Symmons, O., Uslu, V. V., Tsujimura, T., Ruf, S., Nassari, S., Schwarzer, W., et al. (2014). Functional and topological characteristics of mammalian regulatory domains. Genome Res. 24, 390–400. doi: 10.1101/gr.163519.113
Traunmüller, L., Bornmann, C., and Scheiffele, P. (2014). Alternative splicing coupled nonsense-mediated decay generates neuronal cell type-specific expression of SLM proteins. J. Neurosci. 34, 16755–16761. doi: 10.1523/JNEUROSCI.3395-14.2014
Wang, Y., Zhang, B., Zhang, L., An, L., Xu, J., Li, D., et al. (2017). The 3D Genome Browser: A Web-Based Browser for Visualizing 3D Genome Organization and Long-Range Chromatin Interactions. Available at: https://www.biorxiv.org/content/early/2017/02/27/112268/
Weis, J., Fine, S. M., David, C., Savarirayan, S., and Sanes, J. R. (1991). Integration site-dependent expression of a transgene reveals specialized features of cells associated with neuromuscular junctions. J. Cell Biol. 113, 1385–1397. doi: 10.1083/jcb.113.6.1385
Williams, A., Harker, N., Ktistaki, E., Veiga-Fernandes, H., Roderick, K., Tolaini, M., et al. (2008). Position effect variegation and imprinting of transgenes in lymphocytes. Nucleic Acids Res. 36, 2320–2329. doi: 10.1093/nar/gkn085
Young, P., Qiu, L., Wang, D., Zhao, S., Gross, J., and Feng, G. (2008). Single-neuron labeling with inducible cre-mediated knockout in transgenic mice. Nat. Neurosci. 11, 721–728. doi: 10.1038/nn.2118
Keywords: transgenic mice, GFP, Mnx1, Khdrbs2, Fat4, retina, targeted locus amplification
Citation: Laboulaye MA, Duan X, Qiao M, Whitney IE and Sanes JR (2018) Mapping Transgene Insertion Sites Reveals Complex Interactions Between Mouse Transgenes and Neighboring Endogenous Genes. Front. Mol. Neurosci. 11:385. doi: 10.3389/fnmol.2018.00385
Received: 20 July 2018; Accepted: 25 September 2018;
Published: 23 October 2018.
Edited by:Peter K. Giese, King’s College London, United Kingdom
Reviewed by:E. Michelle Southard-Smith, Vanderbilt University Medical Center, United States
Johannes Hirrlinger, Leipzig University, Germany
Copyright © 2018 Laboulaye, Duan, Qiao, Whitney and Sanes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Joshua R. Sanes, firstname.lastname@example.org
†Present address: Xin Duan, Department of Ophthalmology, University of California, San Francisco, San Francisco, CA, United States; Mu Qiao, Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, United States