Detection of Human Endogenous Retrovirus K (HERV-K) Transcripts in Human Prostate Cancer Cell Lines

Human endogenous retroviruses (HERVs) are transcribed in many cancers including prostate cancer. Human endogenous retrovirus K (HERV-K) of the HML2 subtype is the most recently integrated and most intact retrovirus in the human genome, with many of the viral genomes encoding full- or partial-length viral proteins. To assess transcripts of HERV-K in prostate cancer cell lines and identify the specific HERV-K elements in the human genome that are transcribed, reverse transcriptase-PCR (RT-PCR) and cDNA sequencing were undertaken. Strand-specific RT-PCR, plasmid subcloning, and cDNA sequencing detected HERV-K(HML2) coding strand transcripts in four prostate cancer cell lines (LNCaP, DU145, PC3, and VCaP). RT-PCR across splice junctions revealed splicing variants for env gene mRNA in three cell lines, two involving previously undescribed alternative splice sites. To determine the HERV-K loci from which the transcripts arose, RepeatMasker was used to compile a list of over 200 HERV-K internal genome segment fragments and over 1,000 HERV-K solo long terminal repeat (LTR) fragments in the human genome. Surprisingly, the sequences identified from internal positions of the viral genome were mostly smaller segments, while the LTRs were relatively intact. Possible reasons for this are discussed. The transcripts in the cell lines tested arose from several HERV-K loci, with some proviruses being detected in multiple cell lines and others in only one of the four used. In some instances, transcripts from viral antisense strands were also detected. In addition, transcripts from both strands of solo LTRs were detected. These data show that transcripts from HERV-K loci commonly occur in prostate cancer cell lines and that transcription of either strand can occur. They also emphasize the importance of single nucleotide level analysis to identify the specific, individual HERV-K loci that are transcribed, and indicate that HERV-K expression in prostate cancer warrants further study.


INTRODUCTION
Human endogenous retroviruses (HERVs) exist as the integrated form of retrovirus DNA, called proviruses, at many loci in the human genome. They are the result of ancient retroviral infections of human germline cells during evolution (1). There are many different species of endogenous retroviruses in humans today, and collectively they comprise about 8% of the human genome. Since the divergence of the human and chimpanzee lineages approximately six million years ago, the only retrovirus known to have entered the genome of the human lineage is the HML2 subset of human endogenous retrovirus K (HERV-K) (2-8). As the newest HERV, HERV-K(HML2) has had the least time to accumulate mutations and constitutes the most intact set of retroviruses in the human genome (2, 9-16). Some HERV-K(HML2) proviruses entered the germline of the human lineage earlier in evolutionary time, in the common ancestors of humans and other catarrhines (2, 17-19). In the absence of selection pressure on the host to maintain the proviral genomes in an intact form, recombination events and various other types of mutations inevitably accumulate in the proviruses over evolutionary time. The most common mutation of HERV-K is the formation of solo long terminal repeats (LTRs), which occurs by homologous recombination between the two LTRs of an individual provirus (16, 20). For HERV-K, solo LTRs outnumber proviruses by more than 10 to 1 (16). No HERV-K proviral locus has been found that produces fully functional, infectious virions, although the components to assemble an infectious virus genome exist in the human genome today (14). There are over 1,000 HERV-K(HML2)-containing loci in the human genome today, ranging from solo LTRs and fragments of proviruses to almost intact, full-length proviruses. Many of the HERV-K proviruses are sufficiently intact to encode functional proteins (2, 12-16), and these proteins have been suggested to contribute to diseases including cancers (21-25, 39).
Human endogenous retrovirus K is transcribed to various extents in several human diseases including cancer (26-30), HIV infection (31-35), and autoimmune disorders (36, 37). Human-specific HERV-K(HML2) elements in the human genome are approximately 99% identical to each other in pairwise comparisons, and many of the differences among them are single base pair changes. Thus the techniques used to detect HERV-K transcription often do not allow a determination of which specific viral loci in the human genome were the source of the detected RNAs. Individual loci differ in the integrity of specific open reading frames (ORFs), functionality of the encoded proteins, and the capacity of DNA sequences flanking the viral DNA as well as unique mutations within individual loci to affect transcription. In certain instances, transcription of specific HERV-K loci has been recognized (38-41), although for most HERV-K loci, particularly solo LTRs, HERV-K transcription has not been assessed. In addition, many HERV-K loci are situated within introns of genes, often in the reverse transcriptional orientation, and they may be transcribed as a segment of the gene being expressed (42). Transcription of multiple HERV types including HERV-K was reported in human prostate cancer (PrCa) and occurs to varying extents (43-46). To understand the biological role, if any, and potentially to predict functionality of HERV-K in human diseases, a thorough analysis of viral transcription that includes solo LTRs, viral fragments, DNA strand, and genomic loci of origin is necessary. As a step toward these ultimate goals, we analyzed four PrCa cell lines by strand-specific reverse transcriptase-PCR (RT-PCR). Sequencing of PCR products was used to determine the loci of specific transcripts.

HERV-K LOCI IN HUMAN GENOME
To define the loci of origin for individual HERV-K transcripts, it was first necessary to generate a comprehensive list of the HERV-K segments in the human genome. This was accomplished using the Table Browser tool (47) at UCSC Genome Browser (48) and the GRCh37/hg19 human genome assembly. The sequences and genomic positions of LTR and internal genome segments were downloaded using RepeatMasker (49) and RepBase definitions (50) for LTR5_Hs, LTR5A, or LTR5B for the HERV-K LTRs, and HERVK-int for the internal HERV-K segments.
A total of 1,361 LTR fragments and 255 HERV-K internal genome fragments were obtained. This is somewhat larger than the number of loci assessed by Subramanian et al. (16), mainly because RepeatMasker identified multiple fragments derived from individual HERV-K genomes, and these were counted separately here. RepeatMasker identified the fragments separately when HERV-K loci were disrupted by deletions, insertions, or extensive substitution mutations. In Figure 1, each locus was plotted as a function of its length. The LTRs were mostly intact and not broken into fragments. The peak around 970 bp (Figure 1A) reflected the size of the LTRs of the most recently acquired, Homo sapiens-specific proviruses. In contrast, the viral internal genome segments were mostly disrupted into smaller fragments (Figure 1B). Only 26 loci had full-length or near full-length internal sequences. HERV-K proviruses were categorized as type I or II (51) depending on the presence of a 292-bp stretch spanning the pol-env gene boundary (type II) or the deletion of that segment (type I). The 26 loci included 18 that fell into the 7,400- to 7,600-bp window, plus 8 type I proviruses where RepeatMasker separately identified the 5′ part of the internal sequences in the 5,400- to 5,600-bp window and the 3′ part in the 1,600- to 1,800-bp window. These loci and a few additional loci relevant to this study are listed in Table 1.
Similarly, the genomic coordinates for all known genes were obtained using the knownGene table function in the UCSC Genes track for whole genes, exons, introns, 5′ untranslated regions (5′ UTRs), 3′ untranslated regions (3′ UTRs), and coding sequences (CDSs) (Table 2). Thirty-three percent of LTRs and 31% of internal HERV-K genome sequences were located within transcribed portions of UCSC genes, which is similar to the fraction of the human genome encompassed by the transcribed portions of genes. Four percent of the viral loci overlapped with exons, mainly with untranslated regions (UTRs). Three of the viral loci overlapped CDSs of putative human genes. None of those three genes has been firmly characterized, and for two of them, the overlapping segments were short. Details of the overlaps are described below in the Materials and Methods section.

HERV-K TRANSCRIPTION IN PROSTATE CANCER CELL LINES
The general approach used to analyze HERV-K RNA expression was to perform RT-PCR on RNAs isolated from prostate cancer cell lines (Figure 2). These analyses were undertaken as part of ongoing studies of these tumors. Primers were designed in conserved stretches of viral sequences and were expected to amplify most of the HERV-K loci in the human genome identified by RepeatMasker above. Nonetheless, it is possible that particular loci were not recognized by the particular primer pairs used. Transcripts were amplified using RNAs isolated from four PrCa cell lines. RT-PCR using a primer pair to detect a 2-kb amplicon spanning the pol-env junction in unspliced viral RNA ("unspliced," Figure 2) is shown in Figure 3A. Two-kilobase products were detected in all four PrCa cell lines. Parallel amplification reactions were performed without RT and were uniformly negative, thus showing that amplification was from RNA templates and not from any potentially contaminating genomic DNA (Figure 3A). The larger band in VCaP cells corresponded to that from type II proviruses, while the smaller band observed in all four lines corresponded to type I proviruses.

MULTIPLE HERV-K LOCI ARE ACTIVE
Individual proviruses inevitably accumulate unique mutations over evolutionary time. These differences and the sequences flanking the proviruses could affect the expression of specific HERV-K loci. They can also be used to identify the specific viral loci from which transcripts originated. Pairwise comparisons of the sequences of human-specific HERV-K proviruses showed that they were about 99% identical. Thus sequencing of the PCR products was necessary to identify the specific loci of origin for the transcripts, and it was important for the sequence reads to be long enough to encompass the limited polymorphisms among the newest proviruses in the human genome. Sequence reads were matched to the closest HERV-K locus in the human genome. Many sequence reads matched a genomic provirus with 100% identity. Some had one or more mismatches, but all of these matched a genomic locus with >99% identity. The differences may be due either to polymorphisms among the HERV-K loci in the human population, the extent of which is largely unknown, or to misincorporation errors during RT-PCR. Both RT and the PCR polymerase can cause nucleotide misincorporations.
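The need for long sequence reads can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that the roughly 1% pairwise divergence is spread evenly along the provirus; the function name and the example read lengths are hypothetical and are not taken from the study.

```python
def expected_differences(read_length_bp: int, pairwise_identity: float = 0.99) -> float:
    """Expected number of positions at which two loci differ within one read.

    Assumes differences are spread uniformly along the sequence, which is a
    simplification; real polymorphisms may cluster.
    """
    return read_length_bp * (1.0 - pairwise_identity)


if __name__ == "__main__":
    # Illustrative read lengths only (bp).
    for length in (200, 500, 2000):
        print(f"{length} bp read: ~{expected_differences(length):.0f} informative positions")
```

Under this assumption a short read would carry only a couple of distinguishing positions, so a single misincorporation could change the apparent locus of origin, whereas a read of about 2 kb would carry on the order of 20.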
To determine whether the HERV-K transcripts detected in PrCa cells arose from a single HERV-K locus or from multiple loci in the human genome, the PCR products shown in Figure 3A were sequenced (Table 3). Direct sequencing of the amplicons from DU145 and LNCaP cells allowed identification of the transcripts as containing mutations unique to the HERV-K102 and HERV-K118 proviruses, respectively, thus showing that they derived principally from these loci (Table 3) (hereafter proviruses are called simply K102, etc.). PC3 and VCaP showed mixed peaks in the chromatograms, and thus the PCR products were shotgun cloned into plasmid vectors, and at least seven individual clones of each were sequenced. Both lines contained transcripts from K118, while K102 was observed in PC3 and K50F was found in VCaP. Based on the nearly equal mixed peak heights in the chromatograms of the initial PCR products (data not shown), it is plausible to estimate that the proportion of transcription of the two loci was roughly comparable. In summary, the sequencing analysis showed that the transcripts originated from at least three different HERV-K loci in the human genome, and that the transcribed loci differed in part among the cell lines. The analysis detected only the most abundant cDNAs that were amplified, and more extensive sequencing might identify additional transcribed loci of lower abundance.
Table 3 footnotes: (a) from direct sequencing of PCR products in Figure 3B; (b) from direct sequencing of PCR products in Figure 3A; (c) from sequencing of plasmid subclones of PCR products in Figure 3A.

SOME HERV-K TRANSCRIPTS ARE SPLICED IN PROSTATE CANCER CELL LINES
In addition to assessing which proviruses had sufficient integrity for the RNAs to be spliced, the detection of viral splicing would provide further evidence that the RT-PCR products derived from viral RNAs and not from any potentially contaminating DNA. To investigate whether the HERV-K transcripts could be spliced, we performed strand-specific RT-PCR across one of the viral splicing junctions in the four PrCa cell lines. Using primers designed to detect the 1×-spliced env mRNA (Figure 2), three out of four cell lines were found to be positive (Figure 3B), although the sizes of the products differed among the lines. To investigate the basis for this, the products were sequenced (Table 3). This analysis showed that a different locus was responsible for the predominant spliced cDNA product in each line: K108 in DU145, K118 in LNCaP, and K60 in PC3. K108 is one of the most intact HERV-K proviruses in the human genome, having full-length ORFs for all viral proteins (2, 12). Splicing of RNA from this provirus in DU145 cells occurred at the canonical env mRNA splice sites in the viral genome (Figure 4). Detection of the expected splice junction for K108 env mRNA further confirmed that the RT-PCR products were derived from RNA templates. Splicing for both K118 and K60 was detected at unique, non-canonical sites, and the positions were different for each (Figure 4). For K60 in PC3 cells, the 5′ splice site matched the consensus sequence for the minor spliceosome (52) and was located 435 nt downstream of the canonical 5′ splice site. The 3′ splice site matched no apparent consensus sequence and was only 6 nt downstream of the canonical env mRNA 3′ splice site. The alternative splice sites used and the presence or absence of the 292-bp pol-env junction segment that distinguishes type II from type I HERV-K proviruses (Figure 4) accounted for the differences in sizes of the RT-PCR products in Figure 3B. The absence of unspliced products corresponding to K108 and K60 (Table 3) may be due to unspliced transcripts from these proviruses being less abundant than those from the other proviruses in these cells and/or transcripts from these proviruses being more completely spliced.
The spliced HERV-K products from DU145, LNCaP, and PC3 cells were not generated in the absence of RT (Figure 3B), showing that they were indeed derived from RNA templates. The possibility that they were derived from spliced pseudogenes was also considered by searching existing human genome sequences for the sequences that we detected using BLAT. No such products were detected. Moreover the splicing of K108 RNA exactly at the canonical sites as expected provided a control that expected splicing was detectable under the conditions used. Thus it appears that the unexpected use of noncanonical splice sites for some HERV-K transcripts accurately reflected unusual splicing that occurred in the PrCa cell lines used.

SOLO LTRs ARE ACTIVELY TRANSCRIBED
The question whether solo LTRs are transcribed was also addressed. As these, like viral LTRs in general, contain the viral U3 region transcriptional regulatory elements, they thus have the potential to initiate transcription (53, 54). In addition, many HERV-K solo LTRs are present within transcription units of cellular genes and may be transcribed as segments of such units. Since solo LTRs outnumber proviruses containing the viral internal sequences (16), we hypothesized that the levels of their RNAs might be higher than those from HERV-K internal sequences. If, alternatively, LTR transcripts were predominantly components of longer viral transcripts originating from full-length HERV-K proviruses, then the levels would be similar.
To answer this question, two-step quantitative RT-PCR was performed with one primer pair designed in the LTR and one in the env gene (q-LTR and q-env, Figure 2). The q-LTR pair was positioned in the R portion of the LTR, and the q-env pair was positioned so that it detected both unspliced and 1×-spliced env mRNA. In all four PrCa cell lines, the average difference in threshold cycle (Ct) between LTR and env transcripts was about 7.2 Ct (Figure 5), which corresponds to roughly 100-fold or greater levels of LTR RNA than unspliced or 1×-spliced RNA. Since the unspliced and 1×-spliced RNAs have two copies of the LTR R region for each copy of env (Figure 2), the great excess of LTR transcripts presumably reflects a large excess of transcripts derived from solo LTRs.
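The conversion from a Ct difference to a fold difference can be sketched as follows. This is a minimal illustration that assumes ideal amplification, in which product doubles every cycle; the function name and the `efficiency` parameter are hypothetical, and real primer pairs may amplify less efficiently.

```python
def fold_difference(delta_ct: float, efficiency: float = 2.0) -> float:
    """Convert a threshold-cycle difference into a fold difference.

    `efficiency` is the per-cycle amplification factor; 2.0 assumes perfect
    doubling each cycle, which is an idealization.
    """
    return efficiency ** delta_ct


if __name__ == "__main__":
    # 7.2 Ct is the average LTR-vs-env difference reported in the text.
    print(f"~{fold_difference(7.2):.0f}-fold")
    # A lower, hypothetical efficiency gives a smaller estimate.
    print(f"~{fold_difference(7.2, efficiency=1.9):.0f}-fold")
```

With perfect doubling, 7.2 cycles corresponds to about 147-fold, consistent with the "roughly 100-fold or greater" estimate.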

HERV-K ELEMENTS ARE TRANSCRIBED FROM BOTH STRANDS
Human endogenous retrovirus K LTRs, like those of other retroviruses, contain the viral enhancers and promoters that may potentially drive transcription of the viral sense strand (53, 54). They also contain signals for polyadenylation (55). In addition, proviruses and solo LTRs may be integrated into other transcription units and be transcribed as part of them (42). Thus transcribed HERV-K sequences may arise from either the sense strand, the antisense strand, or both, depending on where active promoters are located. Viral sequence-containing transcripts may include introns in pre-mRNAs, long non-coding RNAs, standard viral transcripts, solo-LTR initiated RNAs, LTR-polyadenylated transcripts, and even short RNAs of unknown significance. To assess transcripts in the PrCa cell line more comprehensively, five different primer pairs for short PCR amplicons were designed for LTR, pro, and env genes (Figure 2). For the env gene, three nonoverlapping regions were used. Sequencing was used to identify the loci of origin for the RNAs detected. By including either the forward or the reverse primer separately during the reverse transcription step, strand specificity of HERV-K transcripts could be determined. RT-PCR was performed on LNCaP cells (Figure 6). The primary PCR products were directly sequenced, and, in addition, the products were cloned into plasmid vectors, and 7-10 clones were sequenced for each PCR product ( Table 4).
Analysis of sense strand transcripts from the internal portion of the HERV-K genome in LNCaP cells showed that 27/27 clones derived from HERV-K118 (Table 4), similar to what was observed above for the unspliced primer pair spanning the pol-env junction (Table 3). In contrast, sequencing of products from the HERV-K antisense strand showed that they were derived from at least five different loci, including full-length, recently inserted HERV-K proviruses and more ancient HERV-K loci. Among these, the older HERV-K(I) locus was the one most frequently detected among the plasmid clones derived from HERV-K antisense transcripts (39%) (Table 4). Other proviruses from which antisense transcripts were detected were K106, K50F, and two unnamed, fragmented loci (Table 4). Two of the proviruses identified by this approach, K50F and the fragment at 11q12.3, are known to be inserted in introns of human genes and in the opposite transcriptional orientation. Thus the viral antisense strand transcripts for these might be a consequence of their being within the primary transcripts of the host genes. The other four HERV-K loci are not within introns of human genes, and the basis for transcription of their antisense strands is unclear.
Reverse transcriptase-PCR products from the LTR region were also analyzed (Table 4). Both strands of the HERV-K LTR were represented. For 94% (17/18) of the clones sequenced, the best genomic locus or loci matching the obtained sequences were not in the LTRs of proviruses with internal viral sequences. Instead, one or more solo LTRs were the loci of origin. Since the primer pair was in a well-conserved component (the R region) of the LTR, it was not always possible to assign a sequence uniquely to a single solo LTR locus. Nonetheless, these data provided further evidence that solo LTR transcription is widespread, in addition to showing that both the sense and antisense strands of various HERV-K proviruses and solo LTRs were transcribed.

DISCUSSION
Human endogenous retrovirus K(HML2) transcripts were detected in all four prostate cancer cell lines examined. These included transcripts of both viral strands and transcripts that were spliced. HERV-K transcripts were previously detected in prostate cancer and include loci in addition to those identified here (35, 45). The results here employed cDNA sequencing to distinguish from which among the highly similar HERV-K loci in the human genome the most predominant transcripts arose. Certain HERV-K loci in the human genome, K118 and K102, were transcribed in multiple prostate cancer cell lines, raising the possibility that the genomic insertion sites, mutations unique to them, and/or epigenetic states of these proviruses make them prone to transcription in these cell lines. Other HERV-K loci were detected only in individual lines. However, detection here was limited by the number of plasmid clones sequenced, and deeper sequencing might reveal transcripts from more HERV-K loci. Nonetheless, the loci identified here likely reflected the most abundantly transcribed loci in these cell lines. This study emphasizes the importance of analyzing HERV-K transcripts at the single nucleotide level to distinguish the specific loci from which they originated. An additional finding here was that HERV-K solo LTRs were abundantly transcribed in these cell lines. This is not surprising in that these are more abundant than HERV-K internal segments in the human genome (16). However, their abundance complicates identification of the specific, individual loci of origin, and their detection raises questions about the mechanisms that cause them to be transcribed. Many are present in introns of human coding genes (Table 2), and one possibility is that they may be transcribed as components of the primary transcripts of these genes.
One unexpected finding here was the less intact nature of HERV-K internal genome segments relative to LTRs as detected by RepeatMasker (Figure 1). The LTR sizes clustered around large, full-length, and near full-length segments, while those of internal genome segments clustered at the very low end (Figure 1A vs. 1B). Multiple factors could contribute to the internal segments being recognized as separate fragments by RepeatMasker. One is that if a portion of a repeated element suffered sufficiently extensive mutation over evolutionary time that RepeatMasker no longer recognized that portion as part of the element, then the two flanking segments of the element would be reported as different fragments. Likewise, if genetic breakage and rejoining events including insertions, deletions, or other rearrangements occurred, then an element would be reported by RepeatMasker as separate segments. The internal segment of HERV-K is longer than the LTR, roughly 7,500 vs. 1,000 bp, respectively, and thus would be more likely to have suffered mutations. The 1,361 LTR fragments encompassed 1,162,766 bp. The mean length of 864 bp corresponded to about 86% of a full-length LTR or roughly 1.2 fragments per LTR. The 255 HERV-K internal segment fragments encompassed 487,441 bp. The mean length of 1,911 bp corresponded to about 25% of a full-length internal segment or roughly 4 fragments per internal genome segment. The difference between LTRs and internal segments is greater than could be caused even if half the internal HERV-K segments were type I proviruses (missing the 292-bp segment) and thus recognized as two fragments. There are many potential explanations for this unexpected difference. One is that the HERV-K internal fragments detected by RepeatMasker may include evolutionarily older, and thus more fragmented, HERV-K variants than the LTR segments do. Another is gene conversion. HERV-K LTRs have been well documented to undergo gene conversion events among paralogous LTR loci (4, 18, 56). The greater abundance of LTRs might have caused them to undergo more inter-locus gene conversion events, resulting in more homogeneous sequences that RepeatMasker recognizes as intact more often than it does internal segments. Other unprecedented and seemingly less plausible alternatives might include different mutation rates or effects on host fitness. The genetic basis for the unexpected difference between LTRs and internal segments requires further study and might turn out to be biologically interesting.
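The fragment arithmetic in this paragraph can be reproduced with a short calculation. The sketch below uses the fragment counts and total lengths stated above and the approximate full-length sizes of 1,000 bp (LTR) and 7,500 bp (internal segment); the function and variable names are illustrative only.

```python
# Fragment counts and total base pairs as stated in the text.
FRAGMENTS = {
    "LTR": (1361, 1_162_766),
    "internal": (255, 487_441),
}
# Approximate full-length sizes quoted in the text (bp).
FULL_LENGTH_BP = {"LTR": 1000, "internal": 7500}


def summarize(count: int, total_bp: int, full_length_bp: int):
    """Return (mean fragment length, fraction of full length, fragments per element)."""
    mean_bp = total_bp / count
    fraction = mean_bp / full_length_bp
    return mean_bp, fraction, 1.0 / fraction


if __name__ == "__main__":
    for kind, (count, total_bp) in FRAGMENTS.items():
        mean_bp, fraction, per_element = summarize(count, total_bp, FULL_LENGTH_BP[kind])
        print(f"{kind}: mean {mean_bp:.0f} bp, {fraction:.0%} of full length, "
              f"~{per_element:.1f} fragments per element")
```

Dividing the stated totals gives a mean of about 854 bp for the LTR fragments, close to but not identical with the 864 bp quoted above, and about 1,911 bp for the internal fragments. Either LTR value supports the same conclusion of roughly 1.2 fragments per LTR versus roughly 4 per internal segment.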
Another unexpected finding was the detection of splicing at non-standard sites. Use of the expected splice sites was observed for K108, which is a type II provirus and one of the most intact loci in the human genome, containing full-length ORFs for all viral proteins (2, 12). The non-standard splicing involved type I proviruses (K118 and K60). The 292 nt that are deleted in type I HERV-K proviruses span a segment that begins just downstream of the standard env 3′ splice site and encompasses the 5′ splice site for the second intron of HERV-K (Figure 4). Perhaps this deletion and/or other mutations in these proviruses affect splice site usage.
These studies bear on the long-term goals of understanding HERV-K expression in cancer cells and possibly exploiting it for T-cell-based immunotherapy in combination with conventional chemoradiotherapy. The detection of HERV-K transcripts in prostate cancer cell lines adds to the list of human tumors in which RNAs from this retrovirus have been observed and indicates that further study of the virus in this disease is warranted, as it is in other diseases. Immune responses against HERV antigens have been reported in human diseases (27, 28, 33, 57-59), and such responses might be of significance for certain types of human cancers if HERV-K is expressed. Detailed analyses such as those performed here provide a step toward clarifying the spectrum of HERV-K expression in human tumor cells.

DETERMINATION OF HERV-K LOCUS NUMBER AND SIZE IN HUMAN GENOME
The Table Browser tool (47) at the UCSC Genome Browser (62) website was used to download the sequences and genomic positions of HERV-K elements in the human genome sequence. RepeatMasker (49) definitions based on RepBase (50) were filtered by repName matching for LTR5_Hs, LTR5A, or LTR5B for the HERV-K LTRs and HERVK-int for proviral internal HERV-K genome segments. R software (60) was used to plot the loci against their size for LTRs (Figure 1A) and HERVK-int (Figure 1B).
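The repName filtering step can be sketched in a few lines. This is an illustrative reconstruction, not the procedure actually run: it assumes a tab-separated export with `genoName`, `genoStart`, `genoEnd`, `strand`, and `repName` columns and zero-based, half-open coordinates, and the sample rows are made-up placeholders rather than real annotations.

```python
import csv
import io

LTR_NAMES = {"LTR5_Hs", "LTR5A", "LTR5B"}   # HERV-K LTR repNames named in the text
INTERNAL_NAMES = {"HERVK-int"}              # internal HERV-K segments


def split_herv_k(rows):
    """Partition RepeatMasker rows into HERV-K LTR and internal fragments.

    Each returned record is (chrom, start, end, strand, length_bp); length is
    end - start, assuming zero-based half-open coordinates.
    """
    ltrs, internal = [], []
    for row in rows:
        start, end = int(row["genoStart"]), int(row["genoEnd"])
        record = (row["genoName"], start, end, row["strand"], end - start)
        if row["repName"] in LTR_NAMES:
            ltrs.append(record)
        elif row["repName"] in INTERNAL_NAMES:
            internal.append(record)
    return ltrs, internal


# Hypothetical rows for illustration only (not real genome annotations).
SAMPLE = (
    "genoName\tgenoStart\tgenoEnd\tstrand\trepName\n"
    "chrA\t1000\t1968\t+\tLTR5_Hs\n"
    "chrA\t1968\t9504\t+\tHERVK-int\n"
    "chrB\t500\t800\t-\tAluY\n"
)

if __name__ == "__main__":
    ltrs, internal = split_herv_k(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
    print(len(ltrs), "LTR fragment(s);", len(internal), "internal fragment(s)")
```

The fragment lengths collected this way are the values that would be plotted per locus, as in Figure 1.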
Similarly, the genomic coordinates for all known genes in the human genome were obtained using knownGene in the UCSC Genes track, separately for whole genes, exons, introns, 5′ UTRs, 3′ UTRs, and CDSs. The HERV-K and gene coordinate data were uploaded to Galaxy (61) for further analysis. To determine the overlap of each locus with annotated genes in the human genome, we used the Intersect tool (version 1.0.0) to compare genomic positions of HERV-K loci with known genes and their subcomponents (exons, introns, 5′ UTRs, 3′ UTRs, and CDSs). An intersection of at least 1 nt was considered an overlap in generating the data summarized in Table 2.
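The 1-nt overlap criterion can be illustrated with a small interval comparison. This sketch is not the Galaxy Intersect tool itself; it assumes zero-based, half-open coordinates, and the function names, locus, and feature records are hypothetical.

```python
def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def overlapping_features(locus, features, min_overlap: int = 1):
    """Return names of features on the same chromosome sharing >= min_overlap nt."""
    chrom, start, end = locus
    return [
        name
        for (f_chrom, f_start, f_end, name) in features
        if f_chrom == chrom and overlap_bp(start, end, f_start, f_end) >= min_overlap
    ]


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    locus = ("chrA", 1000, 1968)
    features = [
        ("chrA", 1967, 2500, "exon_overlapping_by_1_nt"),
        ("chrA", 1968, 2500, "exon_adjacent_no_overlap"),
        ("chrB", 1000, 1968, "exon_on_other_chromosome"),
    ]
    print(overlapping_features(locus, features))
```

A feature that merely abuts the locus shares no bases and is not counted, whereas a single shared nucleotide is enough to register an overlap.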
The finding that three HERV-K fragments overlapped with CDSs was unexpected and was analyzed further. In one instance, a 772-bp fragment of a solo LTR was found to be located in the TBC1D29 gene on chromosome 17. The LTR fragment was in the opposite orientation to the gene, and the 3′ end of the fragment comprised the last 19 coding nucleotides of the ORF, two tandem stop codons, and the small 3′ UTR of one isoform of TBC1D29. This gene encodes a novel, putative TBC-1 domain protein, a member of a family of Rab-GAP GTPase activator proteins.
A second instance was only a short 49-bp LTR5B segment in the opposite orientation to the R3HCC1 gene on chromosome 8. The LTR segment formed the 3′ part of the second coding exon of the gene, through the third nucleotide before the splice junction. Multiz alignments on the UCSC Genome Browser showed that a similar sequence was present in the rhesus macaque (Macaca mulatta) genome assembly, and no comparable element has been found for mouse and dog, consistent with a HERV-K insertion after the catarrhine divergence from other primates. However, similar sequences have also been identified for elephant and opossum. In particular, the elephant sequence encoded an identical amino acid sequence except at one position, which is not consistent with a HERV-K element. Given the short extent of this sequence and the apparent existence of orthologous sequences in elephant and opossum, it is difficult to discern whether this is a bona fide HERV-K segment or an unusual sequence convergence, and caution is warranted.
The third was a viral internal segment that overlapped a cDNA encoding a single exon transcript AX747630 on chromosome 17 with an ORF encoding 180 amino acids. The N-terminal 98 amino acids of the HERV-K Gag protein provided the first portion of the putative human protein. The same HERV-K insertion was present in the rhesus macaque genome assembly, and the CDSs of the human and macaque ORFs were 84% identical. Whether a real protein is produced from the ORF, and whether or not it has any function are unknown.

CELL LINES AND CULTURE CONDITIONS
Four PrCa cell lines (LNCaP, DU145, PC3, and VCaP) were purchased from ATCC (Manassas, VA, USA). Cell lines were grown in 5% CO2 in a humidified 37°C incubator to 90% confluence.

RNA EXTRACTION
Total RNA was extracted using Trizol (Invitrogen). RNA was then subjected to DNase I digestion (TURBO DNA-free kit, Applied Biosystems #AM1907) to remove any genomic DNA contamination. RNA quality and concentration were determined by evaluation of rRNA bands in agarose gel electrophoresis and by NanoDrop spectrophotometric analysis (Thermo Scientific).

RT-PCR AND SEQUENCING
To identify the specific active HERV-K loci, 1 µg of each RNA was used for gene-specific RT-PCR. Primers were designed in well-conserved segments of the HERV-K genome to ensure amplification of as many loci as possible. A set of primers was designed across the viral env splicing junction to identify the singly spliced variant of the HERV-K transcript. Parallel controls were performed to detect beta-actin and GAPDH transcripts. Primer sequences were as follows:
The reverse transcription reactions (SuperScript III First-Strand Synthesis System, Invitrogen #18080-051) were performed using 1 µg RNA in an initial volume of 10 µL with 1 µL of 10 mM dNTP mix, gene-specific primer for a final concentration of 2 µM, and DEPC-treated water. After an initial denaturation step of 5 min at 65°C followed by 1 min at 4°C, 2 µL of 10× RT buffer, 4 µL of 25 mM MgCl2, 2 µL of 0.1 M DTT, 40 U RNaseOUT, and 200 U SuperScript III RT enzyme were added to reach a final volume of 20 µL. Gene-specific primers used were unspliced-Rev and 1×-env(1)-Rev for experiments shown in Figure 3; LTR-Rev, pro-1-Rev, env-1-Rev, env-2-Rev, and env-3-Rev for experiments to detect viral plus strand transcripts in Figure 6; and LTR-Fwd, pro-1-Fwd, env-1-Fwd, env-2-Fwd, and env-3-Fwd for experiments to detect viral minus strand transcripts in Figure 6. The RT elongation step was performed at 50°C for 50 min, followed by enzyme heat inactivation at 85°C for 5 min. After brief cooling of the sample at 4°C, digestion of residual RNA was performed with 2 U RNase H at 37°C for 20 min. Parallel experiments in which no RT enzyme was added were carried out simultaneously.
PCR was performed with 2 µL of RT product, 200 nM primers, Platinum PCR SuperMix (Invitrogen), and nuclease-free water to a total volume of 20 µL. After a denaturation step at 94°C for 2 min, 30 cycles of denaturation-annealing-elongation were performed, followed by final elongation at 72°C for 5 min. In each cycle, denaturation was performed at 94°C for 25 s, annealing for 25 s, and extension at 72°C, with the annealing temperature and extension time set for each experiment as follows. Detection of the unspliced amplicon (Figure 2A) was performed with an annealing temperature of 58°C and an extension time of 2.5 min. Detection of the 1×-env amplicon (Figure 2B) was performed with an annealing temperature of 58°C and an extension time of 3 min; 2 µL of a 1:200 dilution of the PCR product was then used in a nested PCR reaction with the 1×-env(2) primers, with an annealing temperature of 61°C and an extension time of 3 min. Detection of the LTR, pro, env-1, env-2, and env-3 amplicons (Figure 6) was performed with an annealing temperature of 55°C and an extension time of 1 min.
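The cycling conditions above can be collected into a small lookup table, which makes the per-amplicon differences easier to compare. The following Python sketch is illustrative only: the dictionary keys and function name are hypothetical, the nested 1×-env reaction is assumed to use the same 30-cycle program, and the computed durations count only programmed hold times, not instrument ramp times.

```python
# Shared program parameters taken from the text (all times in seconds).
INITIAL_DENATURATION_S = 2 * 60   # 94 C for 2 min
FINAL_EXTENSION_S = 5 * 60        # 72 C for 5 min
CYCLES = 30
DENATURE_S = 25                   # 94 C for 25 s
ANNEAL_S = 25                     # annealing for 25 s

# Per-amplicon annealing temperature and extension time.
# Key names are hypothetical labels chosen for this sketch.
PROGRAMS = {
    "unspliced": {"anneal_c": 58, "extension_s": 150},
    "1x-env first round": {"anneal_c": 58, "extension_s": 180},
    "1x-env nested": {"anneal_c": 61, "extension_s": 180},  # assumed 30 cycles
    "LTR/pro/env-1/env-2/env-3": {"anneal_c": 55, "extension_s": 60},
}


def programmed_hold_seconds(extension_s: int) -> int:
    """Sum of programmed hold times for one run, excluding ramp times."""
    per_cycle = DENATURE_S + ANNEAL_S + extension_s
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


if __name__ == "__main__":
    for name, prog in PROGRAMS.items():
        minutes = programmed_hold_seconds(prog["extension_s"]) / 60
        print(f"{name}: anneal {prog['anneal_c']} C, about {minutes:.0f} min")
```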
PCR products were resolved by electrophoresis in 1% agarose gels and purified (PCR purification, Qiagen), and the recovered cDNA was sequenced. PCR products were also cloned (TOPO TA cloning, Invitrogen Inc.) into pCR4 plasmids using MACH-1 competent cells and sequenced.

IDENTIFICATION OF HERV-K LOCI
Each sequence obtained, trimmed of any vector and primer sequences, was aligned to the human genome using BLAT software (62). The sequence was assigned to the locus obtaining the highest identity score (Tables 3 and 4). If more than one locus had the same identity score, the sequence was considered ambiguous (Table 4). All sequences meeting the minimum length requirement of GenBank (>200 bp) were submitted there (accession numbers KF254334-KF254392).
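The assignment rule described above (best identity wins, ties are ambiguous) can be expressed in a few lines. This is a minimal sketch, not the procedure actually used: the `Hit` record, the function names, and the locus labels are hypothetical, and it assumes the BLAT identity scores have already been parsed into numbers.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Hit:
    """One alignment of a sequenced amplicon (hypothetical record layout)."""
    locus: str       # label for the genomic locus, e.g. a chromosome band
    identity: float  # percent identity reported for the alignment


def assign_locus(hits: Sequence[Hit]) -> Tuple[Optional[Tuple[str, ...]], str]:
    """Apply the rule from the text: highest identity wins; ties are ambiguous."""
    if not hits:
        return None, "unaligned"
    best = max(h.identity for h in hits)
    # Collect every distinct locus that shares the top identity score.
    top = tuple(sorted({h.locus for h in hits if h.identity == best}))
    status = "unique" if len(top) == 1 else "ambiguous"
    return top, status


def meets_genbank_minimum(sequence: str) -> bool:
    """Length filter stated in the text: longer than 200 bp."""
    return len(sequence) > 200


if __name__ == "__main__":
    hits = [Hit("locus_A", 99.6), Hit("locus_B", 98.1)]
    print(assign_locus(hits))
```

Comparing scores for exact equality mirrors the wording of the rule; in practice a small tolerance might be preferable if identity scores are computed rather than read from a report.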

QUANTITATIVE RT-PCR
To quantify HERV-K transcripts, 1 µg of RNA from each set was used for two-step quantitative RT-PCR (qRT-PCR) with the SuperScript III First-Strand system (Invitrogen) and Power SYBR Green (Applied Biosystems). RT (SuperScript III First-Strand Synthesis System, Invitrogen #18080-051) of 1 µg RNA was performed in an initial volume of 10 µL with 1 µL of 10 mM dNTP mix, 1 µL of 50 µM Oligo(dT)20 primer, and DEPC-treated water. After an initial denaturation step of 5 min at 65°C followed by 1 min at 4°C, 2 µL of 10× RT buffer, 4 µL of 25 mM MgCl2, 2 µL of 0.1 M DTT, 40 U RNaseOUT, and 200 U SuperScript III RT enzyme were added to reach a final volume of 20 µL. The RT elongation step was performed at 50°C for 50 min, followed by enzyme heat inactivation at 85°C for 5 min. After brief cooling of the sample at 4°C, residual RNA was digested with 2 U RNase H at 37°C for 20 min. Parallel experiments with no RT enzyme were carried out simultaneously.
One microliter of a 1:10 dilution of the RT product was then added to 2× SYBR Green PCR Master Mix (Applied Biosystems #4367659) and nuclease-free water to a final volume of 8 µL. Primers q-env-Fwd and q-env-Rev were designed in a well-conserved region of the HERV-K genome common to unspliced and once-spliced transcript variants, downstream of the ∆292 deletion characteristic of type I HERV-Ks. Primers q-LTR-Fwd and q-LTR-Rev were designed in a well-conserved region of the HERV-K LTR corresponding to the R region. Two housekeeping genes, human beta-actin (hACTB) and human GAPDH (hGAPDH), were used as endogenous controls for normalization. Three replicates per sample were run in a 384-well plate PCR system (7900HT, Applied Biosystems) with cycling conditions of 50°C for 2 min, 95°C for 10 min, and 40 cycles of 95°C for 10 s, 60°C for 20 s, and 72°C for 30 s. Data analysis was performed with SDS 2.4 software (Applied Biosystems).
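For readers unfamiliar with normalization against endogenous controls, the sketch below shows one common approach: a ΔCt calculation against the mean Ct of the two reference genes. The text does not state which formula was applied within SDS 2.4, so this is an assumption for illustration only; the function names and Ct values are hypothetical, and the 2^(-ΔCt) step assumes close to 100% amplification efficiency for all primer pairs.

```python
from statistics import mean
from typing import Dict, Sequence


def delta_ct(target_cts: Sequence[float],
             reference_cts: Dict[str, Sequence[float]]) -> float:
    """Mean target Ct minus the mean of the per-gene reference Ct means.

    Averaging Ct values across reference genes corresponds to taking the
    geometric mean of their relative quantities.
    """
    target = mean(target_cts)
    reference = mean(mean(cts) for cts in reference_cts.values())
    return target - reference


def relative_expression(dct: float) -> float:
    """Relative quantity 2^(-dCt), assuming ~100% amplification efficiency."""
    return 2.0 ** (-dct)


if __name__ == "__main__":
    # Hypothetical triplicate Ct values, not data from this study.
    env_cts = [24.0, 24.5, 23.5]
    controls = {"hACTB": [18.0, 18.5, 17.5], "hGAPDH": [20.0, 20.5, 19.5]}
    dct = delta_ct(env_cts, controls)
    print(f"dCt = {dct:.2f}, relative expression = {relative_expression(dct):.5f}")
```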

ACKNOWLEDGMENTS
The work was partially supported by NIBIB 1 R01 EB009040 (to Chandan Guha).