Mapping Transgene Insertion Sites Reveals Complex Interactions Between Mouse Transgenes and Neighboring Endogenous Genes

Transgenic mouse lines are routinely employed to label and manipulate distinct cell types. The transgene generally comprises cell-type specific regulatory elements linked to a cDNA encoding a reporter or other protein. However, off-target expression seemingly unrelated to the regulatory elements in the transgene is often observed and is sometimes suspected to reflect influences related to the site of transgene integration in the genome. To test this hypothesis, we used a proximity ligation-based method, Targeted Locus Amplification (TLA), to map the insertion sites of three well-characterized transgenes that appeared to exhibit insertion site-dependent expression in retina. The nearest endogenous genes to transgenes HB9-GFP, Mito-P, and TYW3 are Cdh6, Fat4 and Khdrbs2, respectively. For two lines, we demonstrate that expression reflects that of the closest endogenous gene (Fat4 and Cdh6), even though the distance between transgene and endogenous gene is 550 and 680 kb, respectively. In all three lines, the transgenes decrease expression of the neighboring endogenous genes. In each case, the affected endogenous gene was expressed in at least some of the cell types that the transgenic line has been used to mark and study. These results provide insights into the effects of transgenes and endogenous genes on each other’s expression, demonstrate that mapping insertion site is valuable for interpreting results obtained with transgenic lines, and indicate that TLA is a reliable method for integration site discovery.


INTRODUCTION
The invention of a method for generating transgenic mice by injection of a plasmid into the fertilized oocyte (Brinster et al., 1981;Gordon and Ruddle, 1981;Palmiter, 1984/1985) was a transformative advance in biology. In most cases, the plasmid encodes a cDNA linked to regulatory elements (promoter and enhancer) that direct its expression. These mice have been used in three main ways. In one, the purpose is to identify regulatory sequences that govern temporal and spatial patterns of gene expression; here, the cDNA encodes a reporter gene that enables expression to be mapped. In the second, the purpose is to analyze the roles of a gene product by expressing it, or a protein that interferes with it; here, the cDNA encodes the protein under study and the regulatory elements are chosen to promote the desired expression pattern. In the third, the reporter is used to mark and analyze cells that express the gene from which the regulatory elements are derived.
In all three cases, the expectations are (a) that regulatory elements will direct expression in some or all of the cells in which the parent gene is normally expressed and (b) that all transgenic lines established from the same plasmid will exhibit qualitatively similar expression patterns. Indeed, these conditions are frequently met. In some cases, however, expression does not correspond to that expected from the regulatory elements included in the transgene, and/or expression patterns vary among lines. Such unexpected labeling patterns can be an advantage or a detriment. Most often, they foil attempts to map enhancers, mark cells, or interfere with a biological process in a desired manner. They can also, however, provide unanticipated opportunities to define and mark cell types that had been undiscovered or inaccessible (e.g., Weis et al., 1991;Kim et al., 2010).
What accounts for these unpredictable expression patterns? When patterns are similar among lines established from the same plasmid, the likely explanations are that juxtapositions among normally separate regulatory elements or isolation of such sequences from their native context lead to new specificities (Swanson et al., 1985;Donoghue et al., 1991;Rao et al., 1996). In contrast, when expression patterns differ among independently generated lines, variations are generally presumed to reflect influences of endogenous sequences near the chromosomal site of integration (Palmiter et al., 1983) and are therefore termed "integration site-dependent." The simplest explanation is that the reporter comes to be controlled by regulatory elements of a nearby endogenous gene, as seen in "enhancer traps" in transgenic Drosophila (Bier et al., 1989), zebrafish (Golling et al., 2002) and, recently, mice (Shima et al., 2016), all of which incorporate a minimal, tissue non-specific promoter but no strong regulatory elements. This mechanism is unlikely to provide a full explanation for conventional transgenes, however, which generally include tissue-specific promoters and enhancers. Other possibilities include novel specificities generated by juxtapositions of transgene and endogenous sequences, variations in chromatin conformation near the integration site, differences in transgene copy number, or mutations of either the transgene or flanking sequences that occur upon transgene integration (Palmiter and Brinster, 1986;Feng et al., 2000).
Having generated many transgenic lines with such insertion site-dependent expression patterns (Weis et al., 1991;Feng et al., 2000;Kim et al., 2010), we have become interested in the relationship between insertion sites and transgene expression patterns. In a few cases, the expression of a transgene has been related to that of a specific, nearby endogenous gene (Kothary et al., 1988;Sharpe et al., 1999;Narboux-Nême et al., 2012). We reasoned that if this were a general phenomenon, the endogenous gene might play a role in the development or function of the marked cells. Unfortunately, although identifying insertion site is straightforward for invertebrates, available methods for mice (Burgess et al., 1995;Sharpe et al., 1999;Suzuki et al., 2006;Sha et al., 2007;Liang et al., 2008;Dubose et al., 2013;Srivastava et al., 2014;Raman et al., 2015) have been cumbersome, little used and, in our hands, largely unsuccessful. Recently, however, a newly developed method termed Targeted Locus Amplification (TLA) was introduced that appeared to be more promising (de Vree et al., 2014;Cain-Hom et al., 2017). In TLA, genomic DNA in nuclei is cross-linked by formaldehyde, digested into small fragments by the frequently cutting NlaIII restriction enzyme (average fragment size ∼0.2 kb) and religated to form larger circular DNA containing fragments that were likely to have been near neighbors on a chromosome. These fragments are de-crosslinked and digested by another restriction enzyme, NspI, to create ∼2 kb fragments, which are then subjected to PCR with primers derived from sequences unique to the transgene. By amplifying fragments that contain the transgene sequence, this step selectively amplifies genomic sequences near the transgene insertion site. The product is sequenced and mapped to the genome, thereby localizing the transgene and also revealing insertions, deletions or other structural rearrangements both within the transgene and in flanking sequences.
We used TLA to determine insertion sites for three transgenic lines that incorporate fluorescent proteins as reporters: HB9-GFP (green; Wichterle et al., 2002), Mito-P (cyan, CFP; Misgeld et al., 2007), and TYW3 (yellow, YFP; Kim et al., 2010). All label subsets of cells in retina by what appears to be an insertion site-dependent mechanism, and have been used in studies of retinal development and function (Schubert et al., 2008;Kim et al., 2010;Trenholm et al., 2011;Kay et al., 2011a,b, 2012;Duan et al., 2014;Krishnaswamy et al., 2015;Shekhar et al., 2016;Peng et al., 2017;Sethuramanujam et al., 2017;Ray et al., 2018). For two of them, our interest was heightened by breeding experiments in which we attempted to generate transgenic animals that also carried mutations of genes expressed in cell types labeled by the transgene. Our inability to generate such animals by conventional mating strategies suggested that the transgenes were linked to genes expressed in some of the retinal cells marked by the fluorescent protein: Cdh6 in one line and Fat4 in another. For all three transgenic lines, we document interactions between the transgene and the closest endogenous gene. For two of them, the nearest endogenous gene is hundreds of kilobases (kb) from the transgene and yet it appears to strongly influence transgene expression. For all three, the transgene decreases expression of a nearby endogenous gene in a position-dependent manner. Together, our results provide novel insights into insertion site-dependent transgene expression and strengthen the argument that determination of insertion sites can be useful both for gene discovery and for assessing effects of transgene insertion that would otherwise go undetected.

Animals
Animal protocols were approved by the Institutional Animal Care and Use Committee (IACUC) at Harvard University. Animals were used in accordance with NIH guidelines. Mutants were maintained on a C57BL/6J background (JAX Stock No. 000664). We obtained both the HB9-GFP (Wichterle et al., 2002) and the Thy1-mitoCFP-P (Misgeld et al., 2007) transgenic mouse lines from Jackson Laboratories (JAX Stock No. 005029 and 006617). For brevity, we refer to Thy1-mitoCFP-P as Mito-P. The Fat4 conditional mutant (Saburi et al., 2008) was a kind gift of Helen McNeill (U. Toronto). The TYW3 line was generated in our laboratory using a Thy1-lox-YFP-STOP-lox-WGA-ires-LacZ sequence, as previously reported (Kim et al., 2010). The Cdh6 CreER line was also generated in house by targeted insertion of a frt-neo-frt cassette, a 6xMyc-tagged CreER-T2, and a poly-adenylation signal at the translational start site of the Cdh6 coding sequence (Kay et al., 2011a) (JAX Stock No. 029428). HB9-GFP, Mito-P and TYW3 mice were back-crossed to C57BL/6J mice for at least 10 generations before being used for TLA.
For histology, in situ hybridization, and TLA, HB9-GFP and Mito-P tissue was collected at post-natal day 15 (P15) and P30 respectively. TYW3 retinas and spleen were collected at P56. For PCR of genomic DNA and RT-qPCR, samples were collected at P6-8 for HB9-GFP, P25-30 for Mito-P, and P50-60 for TYW3. Age-matched control animals were either wild-type littermates or C57BL/6J. Animals of both sexes were included in all experiments in roughly equal numbers.

Histology
Mice were euthanized by intraperitoneal injection of Euthasol (Virbac). Eyes were removed and fixed in 4% PFA in PBS for 90 min. Retinas were then dissected and rinsed with PBS. Retinas to be sectioned were sunk in 30% sucrose in PBS overnight at 4 °C, embedded in tissue freezing medium, frozen in dry ice and stored at −80 °C until processing. Retinas were then sectioned at 20 µm on a cryostat. Sections were rehydrated in PBS, incubated in 5% Normal Donkey Serum (NDS) (Jackson ImmunoResearch), 0.3% Triton X-100 (Sigma-Aldrich) in PBS for 2 h and then incubated with primary antibodies overnight at 4 °C. Sections were then washed in PBS, incubated with secondary antibodies for 2 h at room temperature, washed again, dried, and mounted with Vectashield (Vector Lab).
For whole mounts, fixed retinas were incubated with 5% NDS, 1% Triton X-100 in PBS for 3 h and then incubated in primary antibody for 5 days at 4 °C. Retinas were then washed in PBS and incubated overnight in secondary antibody. Finally, retinas were washed in PBS, flat-mounted on cellulose membrane filters (Millipore), coverslipped with Fluoro-Gel (Electron Microscopy Sciences), and sealed.
In situ hybridization was performed as described elsewhere (Kay et al., 2011a;Duan et al., 2014). Tissue was collected and prepared with RNase-free reagents, sectioned and imaged as described above. Section hybridization was carried out at 65 °C. Probes were detected using anti-digoxigenin (DIG) antibodies conjugated to horseradish peroxidase (HRP), followed by amplification with Cy3-tyramide (TSA-Plus System; Perkin-Elmer Life Sciences, Waltham, MA, United States) for 2 h.
Images were acquired using 488, 568, and 647 nm lasers on an Olympus-FV1000 Confocal Microscope. We used ImageJ (NIH) software to analyze confocal stacks and generate maximum intensity projections.

Targeted Locus Amplification
Targeted locus amplification (TLA) technology uses the physical proximity of nucleotides within a locus of interest to generate a map of original sequences and corresponding inserted transgenes (de Vree et al., 2014). Transgenic homozygotes and wild-type controls were euthanized and cells were prepared from their spleens (Cain-Hom et al., 2017). Homozygotes were distinguished from heterozygotes by fluorescent quantitative PCR (qPCR) results from a commercial genotyping service (Transnetyx). In some cases we confirmed their results in our laboratory by mating or additional PCR. The cells were frozen and shipped to Cergentis (Utrecht, Netherlands). TLA was then performed as described in de Vree et al. (2014) and Hottentot et al. (2017). Briefly, DNA was crosslinked, fragmented, religated, and decrosslinked. This product served as the TLA template, which was subsequently fragmented, circularized, and amplified with inverse primers complementary to a short locus-specific sequence. Once the complete locus was amplified, ∼2 kb segments were sheared. Libraries were prepared for sequencing by MiSeq or HiSeq technologies on an Illumina platform.
We confirmed the predicted insertion of HB9-GFP on Chromosome 15 using two primer pairs, one for the left junction (Primers 1 and 2) and one for the right junction (Primers 3 and 4). We confirmed the predicted insertion of Mito-P on Chromosome 3 with Primers 5 and 6 and the predicted deletion in Chromosome 3 with Primers 5 and 7. We confirmed the insertion of TYW3 with Primers 8 and 9 and the predicted deletion in Chromosome 1 with Primers 10 and 11.
We extracted DNA from tail-clips of wild-type, heterozygous, and homozygous animals with 50 µL Quick Extract (Lucigen) at 68 °C for 30 min and 98 °C for 3 min in a PCR machine. Zygosity was determined as described above. PCR reaction mixtures were 3 µL DNA, 12.5 µL Econotaq Plus Green 2X Master Mix (Lucigen), 7.5 µL H2O, 0.4 µL F primer 10 mM, 0.4 µL R primer 10 mM for 25 µL reactions. The reaction program was 94 °C for 2 min; [94 °C for 30 s, 50-55 °C for 30 s, 72 °C for 1 min] × 40 cycles; 72 °C for 5 min. The annealing temperature varied in accordance with the melting temperature of the primer pairs tested.

RT qPCR
Mice were euthanized by intraperitoneal injection of Euthasol. Retinas were dissected and RNA was extracted with 250 µL Trizol Reagent (Invitrogen) and DirectZol RNA miniprep kit (Zymo Research). cDNA synthesis was carried out by incubating 8 µL (100-500 ng) extracted RNA with 1 µL oligo(dT)20 (Thermo Fisher) and 1 µL dNTP (Thermo Fisher) for 5 min at 65 °C. Next, reverse transcription was performed by incubating samples with 2 µL 10X RT buffer (Invitrogen), 4 µL 25 mM MgCl2 (NEB), 2 µL 0.1 M DTT (Invitrogen), 1 µL RNaseOUT Recombinant RNase Inhibitor (Thermo Fisher), and 1 µL SuperScript III (Invitrogen) in a PCR cycler for 50 min at 50 °C, then 5 min at 85 °C. Remaining RNA was removed by addition of 1 µL Ribonuclease H (Thermo Fisher) and incubation at 37 °C for 20 min. qPCR was performed using KAPA SYBR FAST qPCR master mix (Kapa Biosystems). Reactions were carried out with 8.2 µL H2O, 10 µL SYBR FAST master mix, 1 µL 10 mM forward primer, 1 µL 10 mM reverse primer, and 1 µL DNA. The reaction program was run on an ABI 7900 as follows: 95 °C for 5 min; [95 °C for 30 s, 60 °C for 45 s] × 40 cycles; 95 °C for 15 s; 60 °C for 15 s; 95 °C for 15 s. For Fat4, the reactions differed only in that the annealing temperature was set to 54 °C instead of 60 °C. We ran samples in triplicate and normalized expression of our genes of interest to Gapdh levels. Resulting CT values were used to calculate ΔCT and fold changes in the expression of endogenous genes in our three transgenic lines. Both technical and experimental replicates were included. In most cases, wild-type littermates were used as controls. Primers were as follows:

Estimating Copy Number
We estimated transgene copy number from quantitative fluorescent PCR data provided by our genotyping service (Transnetyx). The raw signal returned for every sample was a function of the ΔCT between the housekeeping gene c-Jun and a probe for our gene of interest. We compiled raw signal data for heterozygous animals from the HB9-GFP, Mito-P, and TYW3 lines genotyped with either a GFP or a LacZ probe. We then averaged the signal for each transgenic strain. We normalized these values to the average intensity returned for our single copy Cre-GFP and LacZ knockin lines and calculated an estimated copy number for our three transgenic lines of interest.
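The normalization described above can be sketched in a few lines of Python. This is a minimal illustration and not the genotyping service's actual pipeline: the function names are hypothetical, and the sketch assumes that the raw signal scales linearly with template amount (for example, as 2^-ΔCT relative to the c-Jun reference), so that the ratio of mean signals approximates the copy number per transgenic allele.

```python
from statistics import mean


def relative_signal(ct_target, ct_reference):
    """Hypothetical raw signal: 2^-(CT_target - CT_reference).

    Assumes roughly 100% PCR efficiency; the real service may use a
    different function of the CT difference.
    """
    return 2.0 ** -(ct_target - ct_reference)


def estimate_copy_number(het_signals, single_copy_signals):
    """Mean heterozygote signal divided by mean single-copy knock-in signal."""
    if not het_signals or not single_copy_signals:
        raise ValueError("both signal lists must be non-empty")
    reference = mean(single_copy_signals)
    if reference <= 0:
        raise ValueError("single-copy reference signal must be positive")
    return mean(het_signals) / reference


if __name__ == "__main__":
    # Made-up signals, for illustration only.
    hets = [8.0, 12.0]
    single_copy = [1.0, 1.0]
    print(estimate_copy_number(hets, single_copy))
```

Because both the transgenic animals and the knock-in references are heterozygous, the ratio compares one transgenic allele against one single-copy allele.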

Statistical Analysis
Comparisons were performed using GraphPad Prism software. For single comparisons, we used Student's t-test. For multiple comparisons, we used one-way ANOVA.
We calculated the odds of transgene and endogenous gene expression overlapping in individual cell types using elementary combinatorics. The combinatoric formula yields the probability that the transgene and the endogenous gene would be expressed in the same cell type by chance, given an equal probability of being expressed in any cell type. Thus, we computed the odds of finding at least one double-positive type given 3/120 Cdh6+ types and 2/120 GFP+ types in the retina and, similarly, the odds of finding two double-positive types given 2/120 Fat4+ types and 3/120 CFP+ types in the retina. The null hypothesis is that expression of the transgene within a given cell type is unrelated to whether that cell type expresses the endogenous gene.
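One way to carry out such a calculation is with the hypergeometric distribution. The Python sketch below is an assumption about the form of the calculation, not a reproduction of the formula used in the study; the function names are hypothetical. It treats the transgene-positive types as a random draw from 120 equally likely retinal cell types.

```python
from math import comb


def prob_exact_overlap(total, n_gene, n_transgene, k):
    """P(exactly k cell types express both), hypergeometric model."""
    return (comb(n_gene, k) * comb(total - n_gene, n_transgene - k)
            / comb(total, n_transgene))


def prob_at_least_overlap(total, n_gene, n_transgene, k_min):
    """P(at least k_min cell types express both)."""
    k_max = min(n_gene, n_transgene)
    return sum(prob_exact_overlap(total, n_gene, n_transgene, k)
               for k in range(k_min, k_max + 1))


if __name__ == "__main__":
    # 3/120 Cdh6+ types, 2/120 GFP+ types, at least one shared type:
    # 354/7140, about 0.0496.
    print(prob_at_least_overlap(120, 3, 2, 1))
    # 2/120 Fat4+ types, 3/120 CFP+ types, exactly two shared types:
    # 118/280840, about 4.2e-4.
    print(prob_exact_overlap(120, 2, 3, 2))
```

Under this model, chance overlap is unlikely in both cases, which is the sense in which the null hypothesis of unrelated expression can be evaluated.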

RESULTS
The HB9-GFP Transgene Is Inserted Near the Cdh6 Locus on Mouse Chromosome 15
The HB9-GFP transgene is composed of a 9 kb fragment from the 5′ end of the Mnx1 gene (previously called Hb9) that extends into the first exon, linked to a cDNA encoding the enhanced green fluorescent protein (GFP) (Wichterle et al., 2002). It was generated to label motor neurons, which express Mnx1, but was later shown to also mark two types of retinal neurons: a subset of cone photoreceptors and a type of retinal ganglion cell (RGC) that responds selectively to dark or bright objects moving in a dorsal-to-ventral direction across the retina (ventral-preferring on-off direction-selective retinal ganglion cells or V-ooDSGCs; Trenholm et al., 2011, 2013; Figures 1A,B,E). RNA-Seq data generated in our laboratory show that neither cones nor V-ooDSGCs express Mnx1 at detectable levels (Peng et al., 2017;Sarin et al., 2018). In a study of ooDSGCs, we discovered that V-ooDSGCs and dorsal preferring ooDSGCs (D-ooDSGCs) both express Cdh6, which encodes the recognition molecule Cadherin 6 (Kay et al., 2011a). We confirmed this previously documented co-expression for V-ooDSGCs using the HB9-GFP line (Figure 2A). Cdh6 is also expressed in a set of amacrine interneurons, termed starburst amacrine cells, which innervate ooDSGCs but do not express HB9-GFP; conversely, cones are HB9-GFP-positive but Cdh6-negative. Thus, of >120 retinal neuronal types (Laboulaye et al., unpublished), one expresses Cdh6 but not HB9-GFP, one expresses HB9-GFP but not Cdh6, one expresses both and >115 express neither.
To study the role of Cdh6 in the development and function of V-ooDSGCs, we generated Cdh6 mutants and attempted to generate HB9-GFP+/−;Cdh6−/− mice by crossing HB9-GFP+/−;Cdh6+/− and Cdh6−/− mice. We retrieved no HB9-GFP+/−;Cdh6−/− animals from >200 offspring, suggesting that the HB9-GFP transgene and the endogenous Cdh6 gene were linked. We used TLA to test this possibility. Two primer pairs complementary to the transgene sequence were designed, one complementary to GFP sequences at the 3′ end of the transgene and the other complementary to Mnx1 sequences at the 5′ end of the transgene (Figure 3A). Both were used to generate products that were sequenced to a depth of 5 Mb.
Results from both sets of primers identified the insertion site of the HB9-GFP transgene on Chromosome 15 (Chr15: 13,853,116-13,862,097) (Figures 3B-D). Consistent with our prediction, the gene nearest to the HB9-GFP transgene was Cdh6 (Chr15:13,034,173,675), with the 3′ end of Cdh6 ∼680 kb upstream of the 5′ end of HB9-GFP (Figure 3E). To confirm the insertion, we designed primers flanking the predicted left and right junctions. Genomic PCR confirmed the insertion of HB9-GFP in heterozygotes and homozygotes for both of these primer sets (Figures 3F,G). Based on the orientation of the junctions and the relative position of the transgene, we conclude that the transgene was inserted 3′ to 5′.

The Mito-P Transgene Is Inserted Near Fat4 on Chromosome 3
In the Mito-P transgene, the coding sequence of the enhanced cyan fluorescent protein (CFP) was fused to a sequence encoding a 31 amino acid fragment from the human subunit VIII of the cytochrome c oxidase gene sufficient to drive expression in mitochondria (Misgeld et al., 2007). This construct was inserted into a 6.5 kb fragment of the Thy1 gene that is known to drive expression in central projection neurons including motor and sensory neurons and RGCs (Caroni, 1997;Feng et al., 2000) (Figure 4A). This transgene was designed and used to monitor mitochondrial dynamics in motor axons. Several lines were generated, each of which labeled distinct neuronal types, indicating an insertion site-dependent pattern (Misgeld et al., 2007). In addition to motor axons, Mito-P also labels two of 15 types of bipolar interneurons (Type 1A and Type 1B) as well as one of ∼50 types of amacrine interneuron (nGnG) in retina (Schubert et al., 2008;Kay et al., 2011b;Shekhar et al., 2016). Bipolar and amacrine cells express Thy1 at low levels; a small number of RGCs, which express Thy1 at far higher levels, are also labeled (Figures 1A,C,F).
In a study of mouse bipolar cells, we found that Types 1A and 1B expressed Fat4, which encodes a cell surface protein implicated in cell polarity (Shekhar et al., 2016; Figure 2B).
To study the role of Fat4 in retinal development, we obtained conditional Fat4 mutants (Saburi et al., 2008) and attempted to generate Mito-P;Fat4 loxP/loxP mice by breeding. However, we retrieved no Mito-P;Fat4 loxP/loxP animals among >100 offspring generated from Mito-P;Fat4 loxP/+ x Fat4 loxP/loxP matings. This result, which paralleled that described above for HB9-GFP and Cdh6, suggested that the Mito-P transgene was inserted near the Fat4 locus.
TLA revealed that the Mito-P transgene was inserted in Chromosome 3 (Chr3: 39,505,508,740), and that the insertion was accompanied by a 3 kb deletion (Figures 4A-D). The insertion site is located 550 kb from the 3′ end of the Fat4 gene (Chr3: 38,886,952,429), accounting for our failure to recover Mito-P;Fat4 loxP/loxP offspring via conventional recombination. We confirmed both the insertion site and the accompanying deletion by PCR on genomic DNA (Figures 4E-G).
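To illustrate why such tight linkage makes the desired offspring so rare, the Python sketch below estimates how many offspring carrying both the transgene and the mutant genotype would be expected from crosses like those described for HB9-GFP and Mito-P. It rests on assumptions that are not taken from the study: the transgene and the mutant allele are assumed to lie on opposite homologs in the doubly heterozygous parent, and recombination is assumed to occur at a genome-average rate of roughly 0.5 cM per Mb (local rates can differ substantially). The function names are hypothetical.

```python
def recombinant_fraction(distance_mb, cm_per_mb=0.5):
    """Fraction of offspring inheriting transgene and mutant allele together.

    Assumes the two are in trans in the doubly heterozygous parent, so only
    one of the two recombinant gamete classes carries both.
    """
    recombination_frequency = distance_mb * cm_per_mb / 100.0
    return recombination_frequency / 2.0


def expected_double_carriers(distance_mb, n_offspring, cm_per_mb=0.5):
    """Expected number of offspring with both transgene and mutant allele."""
    return n_offspring * recombinant_fraction(distance_mb, cm_per_mb)


def prob_no_double_carriers(distance_mb, n_offspring, cm_per_mb=0.5):
    """Probability that none of n_offspring carries both."""
    return (1.0 - recombinant_fraction(distance_mb, cm_per_mb)) ** n_offspring


if __name__ == "__main__":
    # Distances and offspring counts taken from the text.
    for name, dist_mb, n in [("HB9-GFP/Cdh6", 0.68, 200),
                             ("Mito-P/Fat4", 0.55, 100)]:
        print(name,
              expected_double_carriers(dist_mb, n),
              prob_no_double_carriers(dist_mb, n))
```

Under these assumptions, fewer than one such animal is expected in either cross. For unlinked loci, by contrast, roughly a quarter of the offspring would be expected to carry both.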

The TYW3 Transgene Is Inserted Near Khdrbs2 on Chromosome 1
We generated a set of transgenic mice called TYW using the Thy1 sequences described above (Caroni, 1997;Feng et al., 2000). The transgene included a cDNA encoding YFP flanked by LoxP sites followed by cDNAs for E. coli beta galactosidase (LacZ) and wheat germ agglutinin (WGA) (Figure 5A). It was designed to express YFP constitutively and LacZ plus WGA following excision of the floxed cassette with Cre recombinase (Kim et al., 2010). In practice, however, LacZ and WGA were expressed at undetectable levels, but the YFP was expressed strongly. Each of several lines labeled distinct sets of retinal neurons. TYW3 labeled six of ∼45 RGC types. One type was labeled most brightly; we called it W3B and analyzed its development and function in detail (Zhang et al., 2012;Krishnaswamy et al., 2015). Remarkably, dendrites of all 6 (W3B and five dimmer types together called W3D) laminated in a narrow central stratum in the central third of the inner plexiform layer (Figures 1A,D,G) (Whitney and Sanes, unpublished). Assuming that all thirds of the IPL are populated by equal numbers of dendrites, the odds of 6 stratifying in the same third are (1/3)^6 × 3, or 1/243. We therefore wondered whether sequences near the TYW3 insertion site contributed to this unusual expression pattern.
TLA revealed that the TYW3 transgene was inserted in Chromosome 1 (Chr1: 31,745,752-32,165,062). The insertion was accompanied by a 420 kb deletion directly upstream of the insertion site (Figures 5A-D). The 3′ end of the TYW3 transgene is 7 kb upstream of the initiation codon of the Khdrbs2 gene (Figures 5D,E), which encodes an RNA binding regulator of alternative splicing also called Slm1 (Ehrmann et al., 2016). We validated the insertion site with primers spanning the junction between the 3′ end of Chromosome 1 and the 5′ end of the transgene (Figure 5F), as well as the deletion caused by the insertion using primers for a sequence within this putatively deleted region (Figure 5G).
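The stratification odds quoted above can be checked with exact arithmetic. The short Python sketch below uses a hypothetical function name and the same simplifying assumption as the text, namely that each labeled type is equally likely to stratify in any third of the IPL.

```python
from fractions import Fraction


def prob_all_in_same_sublayer(n_types, n_sublayers):
    """P(all n_types fall in one common sublayer), any sublayer allowed."""
    return n_sublayers * Fraction(1, n_sublayers) ** n_types


if __name__ == "__main__":
    # Six labeled RGC types, three IPL thirds: 3 * (1/3)^6 = 1/243.
    print(prob_all_in_same_sublayer(6, 3))
```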

Insertion Site Maps Enable Determination of Zygosity by Conventional PCR Genotyping
When transgenic mice are inbred, heterozygotes are generally distinguished from homozygotes either by qPCR of genomic DNA or by outcrossing to wild type animals. The former, which we used to assess zygosity for TLA, is subject to considerable variation and the latter is cumbersome; both are time-consuming and costly. Moreover, the use of primers derived from the reporter (e.g., GFP) can also give ambiguous results if, for example, more than one line contains a fluorescent protein, as occurs in some complex mating schemes. Once insertion sites are mapped, however, line-specific primers can be designed that allow one to distinguish zygosity without relying on relative fluorescent RT-PCR intensities (Cain-Hom et al., 2017). We demonstrate this for two of the three lines analyzed here. For Mito-P, Primers 5 and 7 in Figure 4E generate a band in wild types or heterozygotes but none in homozygotes, because the sequence recognized by Primer 7 is deleted (Figure 4G). Likewise, Primers 10 and 11 in Figure 5E recognize a sequence deleted in the TYW3 line, so PCR using these primers generates a band in wild types or heterozygotes but not in homozygotes (Figure 5G). In addition, one primer set in each case generates a band unique to the lines: Primers 1 and 2 or 3 and 4 for HB9-GFP (Figures 3E-G), Primers 5 and 6 for Mito-P (Figures 4E,F) and Primers 8 and 9 for TYW3 (Figures 5E,F). Thus, information gained from the insertion site greatly simplifies genotyping.
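The genotyping logic described above reduces to a lookup on two PCR results per animal: a junction band that appears only when the transgene is present, and a band from the deleted region that appears only when at least one wild-type allele remains. A minimal Python sketch follows; the function name and the "PCR failed" label are hypothetical, and the genotype labels follow the figure legends.

```python
def call_zygosity(junction_band, deleted_region_band):
    """Infer genotype from presence (True) or absence (False) of two bands.

    junction_band: product of transgene-genome junction primers
        (e.g., Primers 5 and 6 for Mito-P, Primers 8 and 9 for TYW3).
    deleted_region_band: product of primers within the region deleted by
        the insertion (e.g., Primers 5 and 7, or Primers 10 and 11).
    """
    if junction_band and deleted_region_band:
        return "Tg/+"    # one transgenic and one wild-type allele
    if junction_band:
        return "Tg/Tg"   # no intact wild-type allele remains
    if deleted_region_band:
        return "WT"      # no transgene junction detected
    return "PCR failed"  # neither band: uninformative reaction


if __name__ == "__main__":
    print(call_zygosity(True, True))
    print(call_zygosity(True, False))
    print(call_zygosity(False, True))
```

This approach applies to Mito-P and TYW3, where the insertion deleted endogenous sequence. The text describes only junction primers for HB9-GFP, which on their own report presence of the transgene rather than zygosity.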

All Three Transgenes Are Present in Multiple Copies
Multiple copies of transgenes are frequently integrated in a head-to-tail tandem array at a single genomic site (Palmiter and Brinster, 1986). Transgene copy number can vary from one to over one hundred, and the number can influence both qualitative and quantitative aspects of transgene expression (e.g., Williams et al., 2008). To more completely characterize the HB9-GFP, Mito-P and TYW3 transgenes, we estimated their copy number. To this end, we analyzed data from the commercial service (Transnetyx) that we employ to genotype our transgenic lines. Transnetyx uses a quantitative fluorescent PCR method to detect transgene-specific sequences. The raw quantitative PCR results are normalized to a control endogenous gene (cJun). We compared the relative intensities of our three transgenic lines to intensities of single copy knock-in lines. After averaging the signals of 46-56 heterozygous animals from each line and normalizing to single copy intensities of GFP (n = 20) or LacZ
In each case, we are confident that all copies are inserted at a single genomic site for two reasons. First, TLA revealed only a single insertion site for each transgene (Figures 3-5). Second, even if more than a single insertion occurred initially, the lines have been bred for at least 10 years, or >40 generations, which is more than enough to segregate multiple inserts.

FIGURE 4 | (B) Genome-wide TLA coverage using Thy1 primers. Peak at Chromosome 9 shows endogenous Thy1 (black circle) and peak at Chromosome 3 shows inserted sequence (red circle). (C) Genome-wide TLA coverage using CFP primers, with a peak at Chromosome 3 showing the inserted sequence (red circle). (D) Regional coverage of the Mito-P insertion site on Chromosome 3 using both sets of primers, showing a 3 kb deletion and spanning 49 kb. (E) Schematic of the inserted sequence. The transgene was inserted multiple times between Fat4 and Intu in Chromosome 3, in the 5′ to 3′ direction. Primers were designed to confirm the left junction of the transgene to Chromosome 3, as well as the predicted deletion. PCR product from homozygous Mito-P, heterozygous Mito-P, and C57BL/6J animals is denoted as "Tg/Tg," "Tg/+," "WT" respectively. (F) Primers 5 and 6 were used to confirm the junction between the transgene and Chromosome 3 in heterozygous and homozygous animals. (G) Primers 5 and 7 were used to confirm the deletion engendered by the transgene. The band is absent in putative homozygous Mito-P animals.

FIGURE 5 | (B) Genome-wide TLA coverage using Thy1 primer set 1. Peak at Chromosome 9 shows endogenous Thy1 (black circle) and peak at Chromosome 1 shows inserted sequence (red circle). (C) Genome-wide TLA coverage using Thy1 primer set 2. Peak at Chromosome 9 shows endogenous Thy1 (black circle) and peak at Chromosome 1 shows inserted sequence (red circle). (D) Regional coverage of the TYW3 insertion site on Chromosome 1 using both sets of primers, showing a 419 kb deletion and spanning 5 Mb. (E) Schematic of the inserted sequence on Chromosome 1. The transgene was inserted multiple times between Lgsn and Khdrbs2 in Chromosome 1, in the 5′ to 3′ orientation. Primers were designed to confirm the left junction between the transgene and Chromosome 1, as well as the predicted deletion. PCR product from homozygous TYW3 and C57BL/6J animals is denoted as "Tg/Tg" and "WT" respectively. (F) Primers 8 and 9 were used to confirm the junction between the 5′ end of the transgene and the 3′ end of Chromosome 1 in homozygous animals. This 292 bp band is absent in wild-type animals. (G) Primers 10 and 11 were used to confirm the deletion engendered by the transgene. The 173 bp band is absent in putative homozygous TYW3 animals.

Effect of Transgenes on Expression of Neighboring Endogenous Genes
We next asked if the transgenes we had studied affected expression of neighboring endogenous genes. Using quantitative PCR (qPCR), we found that levels of Khdrbs2 mRNA were reduced by ∼45% in TYW3 homozygotes compared to controls; Fat4 mRNA levels were reduced by ∼25% in Mito-P homozygotes compared to controls; and Cdh6 mRNA levels were reduced by ∼10% in HB9-GFP homozygotes compared to controls. The reductions were statistically significant for all three lines (p < 0.0001 for TYW3 and Mito-P; p = 0.0094 for HB9-GFP by Student's t-test). Interestingly, the effect size of these transgenes on endogenous gene expression is related to the distance between the two (Figure 7). However, numerous other factors may affect expression, including deletions of endogenous sequences, transgene size and transgene copy number; our sample size is too small to distinguish among these possible explanations.
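Reductions of this kind are conventionally derived from CT values by the 2^-ΔΔCT method, with Gapdh as the reference gene. The Python sketch below shows the arithmetic under the usual assumption of roughly 100% amplification efficiency. The function names and the CT values are hypothetical and are not data from this study.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """ΔCT: target gene CT minus reference gene (e.g., Gapdh) CT."""
    return ct_target - ct_reference


def fold_change(sample_delta_cts, control_delta_cts):
    """Relative expression 2^-ΔΔCT of sample versus control."""
    delta_delta_ct = mean(sample_delta_cts) - mean(control_delta_cts)
    return 2.0 ** -delta_delta_ct


def percent_reduction(sample_delta_cts, control_delta_cts):
    """Reduction in expression relative to control, in percent."""
    return (1.0 - fold_change(sample_delta_cts, control_delta_cts)) * 100.0


if __name__ == "__main__":
    # Made-up CT pairs (target, Gapdh), for illustration only.
    transgenic = [delta_ct(26.0, 20.0), delta_ct(26.2, 20.2)]
    control = [delta_ct(25.0, 20.0), delta_ct(25.1, 20.1)]
    print(fold_change(transgenic, control))
    print(percent_reduction(transgenic, control))
```

A ΔΔCT of one cycle corresponds to a twofold (50%) reduction, so the effects reported here correspond to ΔΔCT values of less than one cycle.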

Interactions Between the TYW3 Transgene and the Endogenous Khdrbs2 Gene
Because the TYW3 transgene exerted a strong effect on expression of Khdrbs2, we used immunohistochemical methods to examine interactions between the transgene and the endogenous gene in cellular detail. In wild-type retinas, Slm1 was present in most RGCs, as identified by double-labeling with the pan-RGC marker Rbpms (Rodriguez et al., 2014), and in most amacrine cells, identified with the pan-amacrine marker AP2 (Bassett et al., 2007) (Figures 8A,B). Slm1 also appeared to be expressed by horizontal cells, identified by soma position. Bipolar cells, photoreceptors, and Müller glial cells were not detectably labeled. Patterns of Slm1 labeling in TYW3 heterozygotes appeared similar to those in wild-type retina (Figure 8C). We also noted that YFP-positive RGCs, which comprise ∼15% of all RGCs, were nearly all Slm1-positive (Figures 8C, 9I).

FIGURE 7 | Effect of transgenes on expression of neighboring endogenous genes. Expression of Slm1, Fat4, and Cdh6 mRNA from TYW3, Mito-P, and HB9-GFP homozygous animals, respectively, was determined by RT qPCR. Retinas were collected at P56 for TYW3, P25 for Mito-P, and P6 for HB9-GFP. Values were compared to those from wild-type littermates for Mito-P and HB9-GFP and to age-matched C57BL/6J controls for TYW3. ΔCT values were calculated against CT values of Gapdh. The difference was calculated as 2^-ΔΔCT. The change in expression compared to control was significant for all three transgenic lines: Slm1 in TYW3 homozygotes (p < 0.0001), Fat4 in Mito-P homozygotes (p < 0.0001), and Cdh6 in HB9-GFP homozygotes (p < 0.009) (HB9-GFP n = 4; Mito-P n = 8; TYW3 n = 4). Significance was calculated by one-way ANOVA and Tukey's multiple comparison tests. Effect of transgene on endogenous gene expression varies with distance between the transgene and the endogenous gene.
We then assessed TYW3 homozygotes, which, as noted above, are Khdrbs2 hypomorphs. We detected no alterations in the overall structure of the retina (Figures 8D,E) or in the lamination pattern of TYW3 RGCs (Figures 8F,G). We found no significant change in the total number of RGCs (Rbpms-positive) in TYW3 homozygotes (Figure 8H). Likewise, the number of YFP-positive RGCs did not differ significantly between TYW3 heterozygotes and homozygotes (Figure 8I).
Although there were no changes in the general organization of TYW3 Tg/Tg retinas, decreased levels of Slm1 were apparent in both RGCs and amacrine cells. We found that only 39.9 ± 3.4% of Rbpms-positive RGCs (Figures 9A-C) and 46.8 ± 1.0% of AP2-positive amacrines were Slm1-positive in TYW3 homozygotes (Mean ± SEM) (Figures 9D-F). Interestingly, however, the loss of Slm1 from transgene-positive RGCs in TYW3 homozygotes was greater than that in RGCs generally: only 5.2 ± 2.3% of YFP-RGCs were detectably Slm1-positive in homozygotes (Figures 9G-I).
Because Slm2 is upregulated in the brains of Khdrbs2 knockout mice (Traunmüller et al., 2014), we investigated Slm2 expression in TYW3 homozygotes. In wild-types, Slm2 was expressed by retinal ganglion, amacrine, and bipolar cells (Figure 9J). We found no detectable upregulation of Slm2 in TYW3 Tg/Tg retinas (Figure 9K), but its expression in most Slm1-positive cells even in wild-types suggests that it may be able to compensate for loss of Slm1 in homozygotes.
Slm1 is expressed by most amacrine cells. Section stained with TFAP2 (AP2) and Slm1. (C) Slm1 is expressed by YFP-positive RGCs in the TYW3 line. Section stained with GFP and Slm1. Sections from wild type (D) and TYW3 homozygous (E) retina. The difference between panels (D) and (E) falls within the normal range of variation owing to section quality, staining intensity and differences among individuals. Section stained with ToPro. Lamination of TYW3-RGCs in heterozygous (F) and homozygous (G) retina. Section stained with GFP and ChAT, labeling starburst amacrine cells. (H) Rbpms counts in wild-type, TYW3 Tg/+, and TYW3 Tg/Tg retinas (Mean ± SEM). No significant difference between conditions (n = 5 for WT, n = 4 for TYW3 Tg/+ and TYW3 Tg/Tg). (I) TYW3-RGC counts in TYW3 Tg/+ and TYW3 Tg/Tg retinas (Mean ± SD). No significant difference between conditions (n = 4 for both). Scale bars for (A-G) are 40 µm.
FIGURE 9 | Expression of Slm1 and Slm2 in TYW3 homozygotes. Expression of Slm1 in RGCs in wholemount of wild-type (A) and TYW3 homozygous (B) retinas. Stained with Rbpms and Slm1. (C) Quantification of Slm1 expression in RGCs across TYW3 genotypes. In wild-type animals, 96.1 ± 0.8% of RGCs express Slm1, while in TYW3 homozygotes, only 39.9 ± 2.3% of Rbpms-positive RGCs express Slm1 (Mean ± SEM). There is no significant decrease in Slm1 expression by RGCs in TYW3 heterozygotes. Significance determined by one-way ANOVA (n = 5 for WT, n = 4 for TYW3 Tg/+ and TYW3 Tg/Tg). Expression of Slm1 in amacrine cells in wholemount in wild-type (D) and TYW3 homozygous (E) retina. Stained with AP2 and Slm1. (F) Quantification of Slm1 expression in amacrine cells across TYW3 genotypes. In wild-type animals, 85.0 ± 0.9% of AP2-positive ACs express Slm1, while in TYW3 homozygotes, only 46.8 ± 0.5% of AP2-positive ACs express Slm1. Significance determined by t-test (n = 3 for both) (Mean ± SEM). Slm1 expression in TYW3-RGCs in wholemount in TYW3 heterozygous (G) and TYW3 homozygous (H) retina. Stained with GFP and Slm1. (I) Quantification of Slm1 expression in TYW3-RGCs. As in section, 97.9 ± 1.0% of heterozygous TYW3-RGCs express Slm1, while in TYW3 homozygotes, only 5.2 ± 2.3% still express Slm1. Significance determined by t-test (n = 4 for both) (Mean ± SEM). (J) Slm1 and Slm2 expression overlaps in wild-type retina. Slm2 is also expressed in bipolar cells of the INL. (K) Slm1 and Slm2 expression in TYW3 homozygous retina. While Slm1 levels decrease, there does not appear to be a significant change in the pattern of Slm2 expression. Scale bars are 40 µm.

DISCUSSION
Many researchers have benefited from transgenic mice in which reporters are expressed in specific cell types that were not readily predictable based on the expression of the gene from which the transgene's regulatory elements were derived, in other words, transgenes exhibiting what was presumed to be insertion site-dependent expression (for example, Cohen-Tannoudji et al., 1994; Young et al., 2008; Huberman et al., 2009; Haverkamp et al., 2009; Trenholm et al., 2011; Kay et al., 2011b; Dhande et al., 2013; Krishnaswamy et al., 2015; Peng et al., 2017). Lines generated with Thy1-derived regulatory elements have been a particularly rich source of such variation: an initial set of 25 such "XFP" lines (Feng et al., 2000) has been used to study neuronal subsets in cortex, hippocampus, spinal cord and dorsal root ganglia as well as retina. Lines in other sets using the same regulatory elements, such as the Brainbow series, incorporating multiple XFPs (Livet et al., 2007), the SLICK series, incorporating a Cre recombinase (Young et al., 2008), and the TYW series, incorporating lacZ (Kim et al., 2010), show similar line-to-line variation; for these as well, different lines have been used to mark and analyze different cell types, including some in non-neuronal tissues.
Although identifying endogenous genes near the transgene could aid in interpreting transgene expression patterns, this has been attempted infrequently, in large part because methods for determining insertion sites have been unreliable. We were motivated to reexamine this issue for two reasons. First, initial reports suggested that the TLA method would be more reliable than its predecessors (de Vree et al., 2014; Cain-Hom et al., 2017). Second, in the course of our developmental studies, we obtained suggestive evidence that two such transgenes were inserted in close proximity to genes expressed in small neuronal subsets that the transgenes marked: HB9-GFP near Cdh6 and Mito-P near Fat4. Results reported here confirm those suppositions, demonstrate linkage of the TYW3 transgene to Khdrbs2 and, more important, provide new insights into the influence of transgenes and endogenous genes on each other.

Endogenous Genes Affect Expression of Neighboring Transgenes
Our claim for an effect of endogenous genes on transgene expression is based on the selective expression of the HB9-GFP transgene in Cdh6-positive V-ooDSGCs and the selective expression of the Mito-P transgene in Fat4-positive BC1 bipolar cells. Although we cannot entirely rule out the possibility that the correspondence is coincidental, it is highly unlikely. Of >120 retinal cell types, 3 express Cdh6 (V-ooDSGCs, D-ooDSGCs, and starburst amacrine cells) and 2 are GFP-positive in HB9-GFP transgenic mice (V-ooDSGCs and cones). The odds of one of the GFP-positive types being Cdh6-positive by chance are 0.05 (see Materials and Methods). Likewise, 2 cell types are Fat4-positive (BC1A and BC1B) and 3 are CFP-positive in the Mito-P line (BC1A, BC1B and nGnG amacrines), with the odds of two CFP-positive types being Fat4 positive by chance being 0.0004. This level of co-expression is therefore unlikely to be random.
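The chance-overlap figures quoted above can be approximated with a simple counting argument. The sketch below assumes a hypergeometric model (drawing the reporter-positive types at random from the full set of cell types) and rounds the ">120" in the text down to exactly 120 types; both are assumptions made here for illustration. The authors' actual calculation is described in their Materials and Methods and may differ. The helper name `hypergeom_pmf` is hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k of the n reporter-positive types fall among the K
    gene-positive types) when n types are drawn at random from N."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N_TYPES = 120  # assumption: the text says ">120" retinal cell types

# HB9-GFP: 3 Cdh6-positive types, 2 GFP-positive types;
# probability that at least one GFP-positive type is Cdh6-positive.
p_cdh6 = 1 - hypergeom_pmf(0, N_TYPES, K=3, n=2)

# Mito-P: 2 Fat4-positive types, 3 CFP-positive types;
# probability that both Fat4-positive types are among the CFP-positive types.
p_fat4 = hypergeom_pmf(2, N_TYPES, K=2, n=3)

print(f"HB9-GFP/Cdh6 overlap by chance: {p_cdh6:.3f}")
print(f"Mito-P/Fat4 overlap by chance:  {p_fat4:.5f}")
```

Under these assumptions the two probabilities come out near 0.05 and 0.0004, in line with the values quoted in the text; a larger assumed number of cell types would make both smaller.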
Two aspects of the influence of the insertion site are noteworthy. First, as detailed above, it is incomplete, differing from patterns seen in "enhancer traps" that often faithfully report on the expression of a neighboring gene. What other factors might influence transgene expression? One possibility is that it reflects expression of the gene from which the transgene's regulatory elements are derived, but this seems unlikely. Hb9 is not detectably expressed in RGCs or cones (Peng et al., 2017; Sarin et al., 2018) and although some bipolar cells express Thy1, they do so at substantially lower levels than RGCs (Barnstable and Dräger, 1984; Macosko et al., 2015). Another possibility is that rearrangements within the transgene or at the insertion site have generated new specificities, such as the deletion directly upstream of the Mito-P transgene and the duplication within the HB9-GFP transgene (Figures 3, 4). Yet another, and perhaps most likely, is the one initially proposed for unexpected patterns of transgene expression, that juxtaposition of sequences within and outside of the transgene generates novel specificities (Palmiter et al., 1983).
A second point of interest is that the distances between these two transgenes and the nearest annotated genes are rather large: HB9-GFP is ∼680 kb from Cdh6 and Mito-P is ∼550 kb downstream of Fat4. Although such long-distance interactions were once thought to be unusual, recent studies have shown that chromatin is organized into regions such as topologically associating domains or TADs, ranging from a few hundred kilobases to a few megabases, within which gene expression is coordinated by enhancers that act over the entire domain (Symmons et al., 2014; Dekker and Heard, 2015; Dixon et al., 2016). TADs represent islands of the genome that are physically isolated from each other and are increasingly viewed as basic units of chromatin folding. The physical separation may enhance co-regulation of genes in the same TAD and prevent regulatory elements from affecting expression of genes in other neighboring TADs. Using the dataset of Dixon et al. (2016; Wang et al., 2017), we find that HB9-GFP and Cdh6 are indeed within the same TAD, as are Mito-P and Fat4 (Figure 10). This 3-dimensional genomic compartmentalization produces secondary and tertiary structures, leading to interactions between regions that are separated by substantial linear sequence. Moreover, both transgenes are inserted in gene-poor regions, with no other annotated genes over spans of 2.9 and 1.0 Mb surrounding HB9-GFP and Mito-P respectively (Figures 3, 4). This contrasts with an average intergenic distance of approximately 100 kb in the mouse genome (approximately 2 × 10^4 genes in a 2 × 10^9 bp genome; see also, Mayer et al., 2005). In such regions, complex interfering signals from multiple genes or boundaries that insulate genes from each other may be minimized.
Thus, while the HB9-GFP and Mito-P transgenes are 100s of kilobases from their endogenous neighbors, gene-poor regions adjacent to the transgenes and their insertion into TADs may explain the selective expression of these transgenes by cell types also expressing their nearest neighbors.
Wang et al. (2017). Cdh6 and HB9-GFP appear in the same TAD, as do Fat4 and Mito-P. There are no other genes within these TADs. The insertion site of TYW3 does not appear to be within a particularly interactive region of Chromosome 1. Individual bins represent 40 kb. Analysis was performed using the Hi-C mouse cortex dataset of Dixon et al. (2016) and yielded heat maps of predicted interactions between bins within the specified regions. Areas of higher intensity on the heatmap may be considered sub-TADs or individual loops within the larger TAD.
In contrast to Cdh6/HB9-GFP and Fat4/Mito-P, it is uncertain whether the TYW3 expression pattern is influenced by the nearby Khdrbs2 for several reasons. Thy1 is already expressed by all RGCs in wild-type retina and most transgenes incorporating Thy1 regulatory elements are expressed by at least some RGCs (Feng et al., 2000; Kim et al., 2010; Misgeld et al., 2007). Likewise, most RGCs are Khdrbs2-positive, and TYW3-RGCs did not express Khdrbs2 at detectably higher levels than their YFP-negative neighbors. Thus, it is not possible to disentangle the effects of Thy1- and Khdrbs2-derived regulatory sequences.

Transgenes Affect Expression of Neighboring Endogenous Genes
Many cases have been described in which insertion of a transgene mutates an endogenous gene, leading to severe defects or lethality (e.g., Soriano et al., 1987; Keller et al., 1990; Woychik and Alagramam, 1998). Indeed, some genetic screens have relied on insertional mutagenesis, using transposons and retroviral vectors as mutagens (Golling et al., 2002; Shima et al., 2016). Nonetheless, the possibility that transgenic reporters may affect endogenous genes in ways that lead to subtle defects is less often considered. It is therefore sobering that all three of the transgenes we studied affected expression of a neighboring endogenous gene. The effect, in this admittedly small sample, was distance dependent: greatest for TYW3 and Khdrbs2, separated by 7 kb, modest for Mito-P and Fat4, separated by 550 kb, and small but significant for HB9-GFP and Cdh6, separated by 680 kb. Interference with the endogenous gene might result from interruption of endogenous regulatory elements, the deletions and rearrangements that accompany insertion, alterations in chromatin structure, or some combination.
For TYW3, the decrease in Slm1 expression was striking. We observed no overt phenotype, consistent with the finding that even Slm1 null mutants are viable, fertile and outwardly normal (Iijima et al., 2014; Traunmüller et al., 2014). Nonetheless, Khdrbs2 (Slm1) and its homologues, Khdrbs1 (Sam68) and Khdrbs3 (Slm2), are known to regulate alternative splicing of critical neuronal genes (Ehrmann et al., 2016), so alterations in its activity could affect neuronal development or function. Likewise, Fat4 has been implicated in several developmental processes, with the null mutant being neonatally lethal (Saburi et al., 2008). Thus, even the modest defects observed in Mito-P could be consequential under some circumstances.

Mapping Mouse Transgene Integration Sites Is Feasible and Useful
We have demonstrated complex reciprocal interactions between transgenes and neighboring endogenous genes. These interactions are interesting, but also potentially worrisome, since the endogenous gene affected by the transgene has a substantial chance of being expressed in the very cells that the transgene is being used to study.
We analyzed only three lines and we deliberately chose ones that exhibited interesting integration site-dependent expression patterns in retina, so it is difficult to draw strong conclusions about the frequency of these interactions. Nonetheless, there are several reasons to believe that they are more frequent than has been generally appreciated. First, we observed effects of an endogenous gene on transgene expression in two and possibly all three of the lines, and effects of the transgene on expression of the neighboring endogenous gene in all three of the lines. Second, for two of the three lines, the distance between transgene and nearest endogenous neighbor is >500 kb, dispelling the notion that insertion within a gene is prerequisite to interaction. Third, sporadic reports appearing over a long period have provided additional cases in which transgene insertions near but not within endogenous genes affect expression of the endogenous gene (e.g., Sharpe et al., 1999; Mukai et al., 2006), or the endogenous gene influences expression of the transgene (e.g., Kothary et al., 1988; Sharpe et al., 1999; Narboux-Nême et al., 2012).
Together, our results show that mapping of transgene insertion sites can be useful in at least three respects: First, once neighboring genes have been identified, their expression can be assayed to test the possibility that a transgenic line is in fact a hypomorph. Second, if the endogenous gene is expressed in cells marked by the transgene, it becomes a candidate effector of that cell's development or function. Third, once the insertion site has been mapped, it becomes straightforward to devise genotyping protocols that are specific to the line and that readily distinguish heterozygotes from homozygotes. Additional potential uses include targeting new transgenes to insertion sites expected to confer desirable expression patterns on them.
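To illustrate the genotyping point, a two-reaction PCR scheme like the one described for TYW3 (a junction band present only when the transgene is inserted, and a band from the deleted region present only when a wild-type allele remains) can be read out with a simple decision rule. The sketch below is an assumption-laden illustration of that logic, not a protocol from this study; the function name `call_genotype` and the "no call" outcome are hypothetical.

```python
def call_genotype(junction_band: bool, wildtype_band: bool) -> str:
    """Infer genotype from two PCR reactions.

    junction_band: product spanning the transgene/genome junction
                   (expected only if the transgene is present).
    wildtype_band: product from sequence deleted at the insertion site
                   (expected only if a wild-type allele is present).
    """
    if junction_band and wildtype_band:
        return "Tg/+"      # one transgenic and one wild-type allele
    if junction_band:
        return "Tg/Tg"     # no wild-type allele left to amplify
    if wildtype_band:
        return "WT"        # no transgene junction detected
    return "no call"       # both reactions failed; repeat the PCR

# Example readouts for three hypothetical animals.
for bands in [(True, True), (True, False), (False, True)]:
    print(bands, "->", call_genotype(*bands))
```

Including the "no call" branch keeps a failed reaction from being misread as a genotype, which matters because the homozygous call rests on the absence of a band.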
To date, mapping of transgene insertion sites has not been standard practice, both because its value has been questionable and because reliable methods for doing so have not been available. With improved methods now available and evidence accumulating that interactions of transgenes and endogenous genes are frequent, it may be advisable to make this a more common practice.

AUTHOR CONTRIBUTIONS
XD, ML, MQ, and IW performed the experiments and analyzed the data. JS conceived the project and analyzed the data. ML and JS wrote the manuscript. All the authors reviewed and edited the manuscript.

FUNDING
This work was supported by NIH grants R01 EY022073 and R37 NS029169 to JS and a Klingenstein-Simons Neuroscience Fellowship to XD.

ACKNOWLEDGMENTS
We thank Karthik Shekhar for statistical help; Max van Min and Judith Bergboer (Cergentis) for assistance in interpreting TLA results and permission to use material from their reports in