Systematic Identification of Endogenous Retroviral Protein-Coding Genes Expressed in Canine Oral Malignant Melanoma

Endogenous retroviruses (ERVs) are remnants of ancestral retroviruses that infected host germ cells in the past. Most ERVs are thought to be non-functional elements, but some ERVs retain open reading frames (ORFs) capable of expressing proteins. The proteins encoded by ERV-ORFs have potential roles in oncogenesis; however, studies on mammals other than humans and mice are limited. Here, we identified ERV-derived genes expressed in canine oral malignant melanoma (OMM). We identified 11 ERV-derived genes in our OMM samples. Differential expression gene analysis revealed that four ERV-derived genes (PEG10, LOC102155597, and two newly identified genes) were upregulated in OMM compared to healthy tissues. PEG10 is a conserved long terminal repeat (LTR)-type retrotransposon-derived gene among mammals and is involved in human cancers. LOC102155597 is a retroviral env gene conserved in Carnivora. This Env protein harbors an immunosuppressive domain, implying the potential adverse effects on the immune system. While the production of viral particles from ERVs has been reported in human and mouse melanoma, we found no ERV-derived genes having the potential to produce viral particles. These results provide insights into the different and conserved features of ERV-derived genes in mammalian melanoma.


INTRODUCTION
Endogenous retroviruses (ERVs) are remnants of ancient retroviral infections to germ cells and have accumulated in all mammalian genomes (1). Retroviruses have three structural and enzymatic genes: gag gene encoding the major structural protein; pol gene encoding a polyprotein consisting of multiple proteins including protease, reverse transcriptase, RNase H, and integrase; env gene encoding the envelope protein. In addition, some retroviruses encode regulatory and/or accessory genes that control viral gene expression and/or suppress host defense mechanisms. Most ERVs have lost their open reading frames (ORFs) due to the accumulation of mutations. However, some genes derived from ERVs are expressed as proteins in host cells when the ORFs are still intact because they are young ERVs, or when the ORFs are domesticated as de novo host genes (1).
In human melanoma, aberrant expression of ERVs has been reported to be associated with oncogenesis (2). Human ERV-K (HERV-K) is actively expressed in human melanoma, and their protein expression, as well as particle formation, have been reported (3,4). ERV3-1, an env gene of HERV-R, is also reported to be expressed in colorectal cancers (5) and acute myelogenous leukemia (6). In mice, active expression of an ERV called MelARV has been reported in B16/BL6 melanoma cells (7,8). Knockdown of MelARV reduces the aggressiveness of B16 melanoma in transplanted mice (9). They proposed a functional model in which Env protein of MelARV regulates regulatory T cells (9). Thus, aberrantly expressed ERV genes may be a promising therapeutic target for melanoma. However, the association between melanoma and ERVs in mammals other than humans and mice remains unresolved.
Long terminal repeat (LTR)-type retrotransposons that mainly correspond to ERVs comprise 8 and 10% of the human and mouse genomes, respectively (10,11). The overall proportion of ERVs as well as their lineages varies greatly among mammalian species (12). In the dog genome, for example, a proportion of genomic regions corresponding to ERVs is relatively low at 4.85% (13). According to a comprehensive study identifying ERV-derived ORFs (ERV-ORFs) in the genome, the number of canine ERV-ORFs is one-third of those in humans (14). Therefore, the expressed ERV-derived genes in canine melanoma are supposed to be fewer than in human and mouse melanoma; however, comprehensive characterization of ERVs expressed in canine melanoma has not been performed.
Canine oral malignant melanoma (OMM) is a highly metastatic tumor in dogs, spreading to the lung, lymph nodes, and liver, and shows a grave prognosis (15). Surgical resection and radiation are first-line treatments for OMM cases with no metastasis; however, there are no effective systemic treatments after metastasis (15). Canine cancers, including OMM, have been proposed as research models for human cancer studies (16,17). On the contrary, the applications of the cancer therapies developed in humans to dogs are also underway, such as the administration of immune checkpoint inhibitors (18,19). These studies have shown the effectiveness of strategies targeting genes conserved in humans and dogs. Though most ERVs are not conserved in humans and dogs, considering the active expression of ERVs in human and mouse melanoma, it is plausible that ERVs, including canine-specific ones, are expressed in canine OMM. Since cancer-specific ERVs may not be detectable in normal tissues, de novo transcripts in OMM samples, in addition to the reference transcripts, are required for the comprehensive gene expression analysis. Here, we made a catalog of transcripts retaining the ERV-ORFs that are expressed in canine OMM samples and conducted differential gene expression analyses. Since ERVs are involved in oncogenesis in humans and mice, these canine ERVs have the potential to be new therapeutic targets and markers. This study also will provide a model for studying ERVs' common roles in melanoma across mammalian species.

RNA Sequencing
Canine OMM samples (n = 8) were obtained following surgical resection as treatment at Animal Medical Center, Yamaguchi University (
To remove LINE-derived sequences, ERV-ORFs which were annotated as "LINE" by RepeatMasker and/or "YP_073558.1" or "NP_048132.1" by BLASTP against the NCBI Viral Genome Database were removed. Transcripts in the merged GTF file overlapped with ERV-ORFs were identified using BEDtools intersect with "-s" option to consider strandedness (22). Fortythree ERV-derived genes detected at this step were listed in Supplementary Table 2. All ERV-derived genes were manually checked using Integrative Genomics Viewer (version 2.4.9) (23). Transcripts per kilobase million (TPM) was calculated using StringTie2 with "-e -rf " options using the merged GTF file. Eleven ERV-derived genes with the mean TPM > 1 were listed including Canis-env2 ( Table 2). Differentially expressed genes were identified and visualized with an MA plot and a volcano plot using DESeq2 (version 1.26.0) (24) in R (version 3.6.1).

Evolutionally Analyses of Canis-Env2
To collect Canis-env2 orthologs in Carnivora genomes, coding sequences of Canis-env2 in the dog was searched using Blat (25) in the following represented species in the UCSC genome browser (https://genome.ucsc.edu): Southern Sea otter, enhLutNer1 (Jun. 2019); Ferret, musFur1 (Apr. 2011); Hawaiian monk seal, neoSch1 (Jun. 2017); panda, ailMel1 (Dec. 2009); and Cat, felCat9 (Nov. 2017). The hits with the highest similarity scores were confirmed to be located between BTN2A1 and BTN1A1. To conduct the following evolutionary analyses of Canis-env2, we used MEGA-X (26). We first aligned nucleotide sequences of Canis-env2 genes by MUSCLE program with "Align codons" option (27), and pairwise p-distance values of amino acids, which are proportions of amino acid sites at which the two sequences to be compared are different, were calculated. The numbers of non-synonymous substitutions (dN) and synonymous substitutions (dS) per site were estimated by Nei-Gojobori method with Jukes-Canter correlation (28).

Identification of Transcriptionally Active ERV-Derived Genes in Canine OMM
We conducted RNA-seq analyses using canine oral malignant melanoma (OMM) cases (n = 8). To identify ERV-derived genes from these sequence data, we constructed a pipeline to identify ERV-ORFs expressed in canine melanoma ( Figure 1A). First, we constructed de novo transcript assemblies with reference to the RefSeq gene coordinates using StringTie2. In this procedure, transcripts expressed from the overlapped locus were given a single gene name. In total, we obtained 106,209 transcripts from the 8 OMM RNA-seq samples, and these transcripts were organized into 34,292 genes, of which 3,035 genes were not reported in the reference gene annotation ( Figure 1B). Next, we extracted genes whose transcripts were overlapped with ERV-ORFs in the gEVE database (http://geve.med.utokai.ac.jp/) (14) (Figure 1C). As a result, we obtained 43 genes, of which five genes were newly identified ( Figure 1D, Supplementary Table 2).

Characterization of ERV-Derived Genes
Next, we examined the structures of these 43 genes overlapped with ERV-ORFs. In nine genes, ERV-ORFs were identical to full or partial coding sequences of host genes: ASPRV1, CAR1, CNBP, CTSD, GIN1, PEG10, RTL1, SUGP2, and TAF1 (Figure 2). In these genes, four genes-ASPRV1, CAR1, PEG10, and RTL1 -were known to originate from ERVs or LTRtype retrotransposons and mammalian or Carnivora-specific genes. ASPRV1 codes an aspartic protease conserved in mammals and has roles in skin maintenance (29). PEG10 and RTL1 are gag-pol genes derived from Ty3/Gypsy retrotransposon and were involved in placental development (30,31). CAR1, also known as syncytin-Car1, is a retroviral env gene conserved in Carnivora (32). Syncytin-Car1 protein is involved in the cell fusion of syncytiotrophoblast during placentation (32). We used these four definite retroviral genes for further analysis. The remaining five genes-CNBP, CTSD, GIN1, SUGP2, and TAF1-consist of several exons, and one exon identified as containing ERV-ORFs (Figure 2A). Orthologs of these exons were found in at least chickens, mice, and humans and may be derived from ERVs before the divergence of birds and mammals. This possibility has been particularly suggested for GIN1, where its integrase domain shows a high similarity to the GIN element, an animal-specific DNA transposon that encodes a Gypsy/Ty3 retrotransposon-like integrase (33). On The extraction method for ERV-derived genes. Genes of which exon(s) were overlapped with the same coordinates annotated in the gEVE database were defined as ERV-derived genes. Note that genes whose intron(s) were overlapped with the gEVE annotation, or genes overlapped with the gEVE annotation in reverse direction were not regarded to ERV-derived genes. (D) Names of ERV-derived genes. Five ERV-derived genes were not found in the RefSeq genes. Twenty ERV-derived genes were annotated in the RefSeq database and expressed in the OMM samples. Eighteen ERV-derived genes were RefSeq genes, but those were not expressed in the OMM samples. the other hand, to the best of our knowledge, no evidence has been reported to indicate that CNBP, CTSD, SUGP2, and TAF1 were derived from transposons. It is also plausible that these genes independently acquired protein motifs similar to ERVs. Since it is beyond the focus of this study to make conclusions, we did not regard these five genes as ERV-derived genes in this study. Two splicing variants of CRADD were overlapped with two ERV-ORFs, although they were in the 5 ′ UTR. Therefore, CRADD was also removed for further analyses. Others were uncharacterized genes in the RefSeq database (i.e., genes whose names start with LOC), including non-coding genes or newly identified non-RefSeq genes. To identify sufficiently expressing genes, we collected genes that were expressed in the eight canine OMM samples with an average TPM > 1. Finally, we identified 11 ERV-derived genes, including the newly identified melanoma genes 1 to 4 (NMG-1 to 4), which were expressed in canine oral melanoma ( Table 2). Since NMG-1 to 4 overlapped with RepeatMasker track of an ERV group, CfERV1-int (Figure 2B; Supplementary Figure 1), these genes are presumed to be derived from ERVs.

Identification of Upregulated ERV-Derived Genes in Canine OMM
Next, we identified the ERV-derived genes that showed higher expression levels in the OMM samples than those in healthy tissues. Although we did not conduct the transcriptome sequencing of control samples, we utilized publicly available RNA-seq data of eight canine OMM samples and three canine oral healthy samples, all of which were obtained from the SRA database (20) (Supplementary Table 1). Then, we found that four ERV-derived genes were significantly upregulated (PEG10, NMG-1, NMG-4, and LOC102155597), and two ERV-derived genes were down-regulated (ASPRV1 and LOC10215559) in OMM (adjusted p < 0.05) (Figures 3A,B,  Supplementary Table 4). TPMs of these four upregulated genes varied among melanoma samples, but their expression levels were low in all healthy tissues (Figure 3C). PEG10 was reported as an important gene in several species of human cancers, such as hepatocellular carcinoma (34), breast cancer (35), lung cancer (36), and neuroendocrine prostate cancer (37). NMG-1 and NMG-4 are newly identified genes, and their expressions were experimentally verified by RT-PCR ( Figure 3D). NMG-1 gene has a single transcript isoform that is composed of two exons, and its second exon was overlapped with four ERV-ORFs annotated by gEVE database (Figure 2B). NMG-4 gene has two transcripts, and both are composed of three exons. LOC102155597 was overlapped with an Env-like ERV-ORF of 447 amino acid length ( Figure 2B). This env gene was previously designated as Canis-env2 (32). We found that Canis-env2 homologs located between BTN2A1 and BTN1A1 among six Carnivora species available in UCSC genome browser, and they were presumed to be orthologs ( Figure 4A). We found that the coding sequences of Canis-env2 orthologs were under purifying selection, suggesting that the proteins are functionally important (Figure 4B). This Canis-Env2 protein retains the signal peptide, the furin cleavage site, and the immunosuppressive domain, but not the transmembrane (TM) domain ( Figure 4C). Together, we successfully identified newly identified or uncharacterized retroviral coding genes expressed in canine OMM.

DISCUSSION
In this study, we identified and characterized ERV-derived genes that are expressed in canine OMM samples (Figures 1,  2). Among them, four ERV-derived genes are upregulated in melanoma compared to healthy tissues (Figure 3). Though ERV-derived viral particles have been found in melanoma in humans and mice (3, 7), we could not find such ERVderived genes that could potentially produce viral particles. Since this study is based on the canine reference genome (canFam3), we could not detect newly generated genes in OMM genomes. Therefore, the relationship between polymorphisms and ERVs in the genome of canine OMM remains to be investigated. Even with these limitations, evolutionarily conserved as well as expressed ERV-derived genes were newly identified in canine OMM in this study (Figure 4).
PEG10 is upregulated in several human cancers (39). Comprehensive expression analysis revealed that PEG10 is upregulated, especially in hepatocellular carcinoma and breast carcinoma (39). In hepatocellular carcinoma, PEG10 is involved in the TGF-β1-triggered epithelial-mesenchymal transmission, progressing metastasis (34). PEG10 interacts with SIAH1 and suppresses apoptosis of hepatocellular carcinoma (40). Though a correlation between PEG10 and human melanoma has not been reported yet, considering the upregulation of PEG10 in canine OMM, this gene may be involved in tumor development in various mammalian species.
NMG-1 and NMG-4 are two ERV-derived genes that were newly identified in this study. Both retained fragmented ORFs and overlapped with CfERV1-int, a canine-specific ERV family. Therefore, these two genes are relatively new ERVs and may be under being disrupted in the canine genome by mutation. Since their expressions were suppressed to very low levels in healthy samples, they may become a valuable marker for OMM. Future studies on the correlation of the expression levels with malignancy are needed.
LOC111094052 is an Env-coding gene, conveniently named Canis-env2 in the previous study (32). Canis-env2 orthologs were identified in all mammals belonging to the four major lineages of the order Carnivora ( Figure 4A). Pairwise dN/dS values among orthologs were <1 (Figure 4B), and their amino acid sequences were conserved (Figure 4C), suggesting the functional importance of the protein. Therefore, Canis-env2 has been under co-option as a host gene in the Carnivora. While the function of Canis-Env2 protein is still unknown, the structure of Canis-Env2 is very similar to a primatespecific Env protein, HEMO, which lost the TM domain by protease cleavage (41). In addition, env-panMars, an env gene conserved in marsupials, also lost the TM domain due to a stop codon before the TM domain. HEMO, Env-panMars, and Canis-Env2 retain immunosuppressive domains and may be involved in cancer immunity. Tumors expressing Env can avoid immune rejection when transplanted into mice (42).
MCA205 tumor cells expressing ERV-FRD1 Env (Syncytin-2) and ERV3 Env suppress the immune rejection in mice depending on the immunosuppressive domains. Syncytin-2 and ERV3 are expressed in the placenta, and their immunosuppressive activities are thought to be conserved for fetal-maternal immunosuppression (43). Thus, aberrant expression of Canis-Env2 may have a role in suppressing immune systems against OMM.
Our current study has some limitations: the small number of samples as well as the lack of well-designed control groups. These are partially due to the inevitable circumstances that our sample collection was dependent on the hospital surgeries for domesticated dogs. Furthermore, the expression of the transcripts and proteins identified in our RNA-seq analysis were not immunohistologically validated. However, our analysis clearly indicated that four ERV-derived genes (PEG10, LOC111094052, and two fragmented-ORF genes) was expressed in canine OMM. These results indicate that even in dogs with relatively low ERV copy numbers, ERV-derived genes similar to those of humans and mice are expressed.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The animal study was reviewed and approved by the Institutional Animal Experiment Ethics Committee of Yamaguchi University. Written informed consent was obtained from the owners for the participation of their animals in this study.

AUTHOR CONTRIBUTIONS
KK and TMiy designed the study. AS, MY, and TMiz performed experiments. KK performed the computational analyses, with supervision by SN. KK, SN, TMiz, and TMiy prepared the manuscript. All authors reviewed the manuscript.

FUNDING
This work was supported by JSPS KAKENHI (Grant Numbers 20J22607 to KK, 19K22365 to SN, TMiz, and TMiy, 20K06775 to SN and TMiy, and 20H03150 to SN and TMiy).