<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Microbiol.</journal-id>
<journal-title>Frontiers in Microbiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Microbiol.</abbrev-journal-title>
<issn pub-type="epub">1664-302X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmicb.2018.02534</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Faustovirus E12 Transcriptome Analysis Reveals Complex Splicing in Capsid Gene</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Cherif Louazani</surname> <given-names>Amina</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/594675/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Baptiste</surname> <given-names>Emeline</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/627065/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Levasseur</surname> <given-names>Anthony</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/240801/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Colson</surname> <given-names>Philippe</given-names></name>
<uri xlink:href="http://loop.frontiersin.org/people/75543/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>La Scola</surname> <given-names>Bernard</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/198600/overview"/>
</contrib>
</contrib-group>
<aff><institution>Assistance Publique &#x2013; H&#x00F4;pitaux de Marseille (AP-HM), Microbes, Evolution, Phylogeny and Infection (ME&#x03A6;I), Institut Hospitalo-Universitaire (IHU) M&#x00E9;diterran&#x00E9;e Infection, Institut de Recherche pour le D&#x00E9;veloppement IRD 198, Aix-Marseille Universit&#x00E9; UM63</institution>, <addr-line>Marseille</addr-line>, <country>France</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Erna Geessien Kroon, Universidade Federal de Minas Gerais (UFMG), Brazil</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Juliana Cortines, Universidade Federal do Rio de Janeiro, Brazil; Masaharu Takemura, Tokyo University of Science, Japan</p></fn>
<corresp id="c001">&#x002A;Correspondence: Bernard La Scola, <email>bernard.la-scola@univ-amu.fr</email></corresp>
<fn fn-type="other" id="fn002"><p>This article was submitted to Virology, a section of the journal Frontiers in Microbiology</p></fn></author-notes>
<pub-date pub-type="epub">
<day>23</day>
<month>10</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="collection">
<year>2018</year>
</pub-date>
<volume>09</volume>
<elocation-id>2534</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>07</month>
<year>2018</year>
</date>
<date date-type="accepted">
<day>04</day>
<month>10</month>
<year>2018</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2018 Cherif Louazani, Baptiste, Levasseur, Colson and La Scola.</copyright-statement>
<copyright-year>2018</copyright-year>
<copyright-holder>Cherif Louazani, Baptiste, Levasseur, Colson and La Scola</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Faustoviruses are the first giant viruses of amoebae isolated on <italic>Vermamoeba vermiformis</italic>. They are distantly related to African swine fever virus, the causative agent of lethal hemorrhagic fever in domestic pigs. Structural studies have shown the presence of a double protein layer encapsidating the double-stranded DNA genome of Faustovirus E12, the prototype strain. The major capsid protein (MCP) forming the external layer has been shown to be 645 amino acids long. Unexpectedly, its encoding sequence has been found to be scattered along a 17-kbp genomic region. Using RNA-seq, we studied expression of Faustovirus E12 genes at nine time points over its entire replicative cycle. Paired-end sequencing of 250-bp reads on a MiSeq instrument and double-round spliced alignment enabled the identification of 26 different splice junctions. Reads corresponding to junctions represented 2% of mapped reads and mostly matched the predicted MCP-encoding sequences. Moreover, our study allowed us to describe a 1,939-bp transcript that corresponds to the MCP, delineating 13 exons. At least two types of introns coexist in the MCP gene: group I introns that can self-splice (<italic>n</italic> = 5) and spliceosome-like introns with non-canonical splice sites (<italic>n</italic> = 7). All splice sites were non-canonical, with five types of donor/acceptor splice sites, among which AA/TG was the most frequent association.</p>
</abstract>
<kwd-group>
<kwd>giant virus</kwd>
<kwd>faustovirus</kwd>
<kwd>transcriptome</kwd>
<kwd>capsid</kwd>
<kwd>splicing</kwd>
</kwd-group>
<counts>
<fig-count count="5"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="45"/>
<page-count count="10"/>
<word-count count="0"/>
</counts>
</article-meta>
</front>
<body>
<sec><title>Introduction</title>
<p>Faustoviruses are the first giant viruses of amoebae isolated using <italic>Vermamoeba vermiformis</italic> as cellular culture support (<xref ref-type="bibr" rid="B34">Reteno et al., 2015</xref>). Their capsids are icosahedral and virions are 200&#x2013;240 nm in diameter (<xref ref-type="bibr" rid="B10">Benamar et al., 2016</xref>). These viruses are distantly related to African swine fever virus, the causative agent of lethal hemorrhagic fever in domestic pigs (<xref ref-type="bibr" rid="B3">Alonso et al., 2018</xref>) and the sole species of the family <italic>Asfarviridae</italic> (<xref ref-type="bibr" rid="B22">Iyer et al., 2001</xref>). In addition, two other faustovirus relatives have recently been described. Kaumoebavirus, also isolated on <italic>V. vermiformis</italic>, stands phylogenetically outside the asfarvirus&#x2013;faustovirus group (<xref ref-type="bibr" rid="B9">Bajrai et al., 2016</xref>). Pacmanvirus, isolated on <italic>Acanthamoeba castellanii</italic>, is nested in phylogenetic analyses between asfarviruses and faustoviruses (<xref ref-type="bibr" rid="B6">Andreani et al., 2017</xref>). So far, 11 faustovirus strains have been isolated, in all cases from sewage samples collected in France, Lebanon and Senegal (<xref ref-type="bibr" rid="B14">Cherif Louazani et al., 2017</xref>). Faustovirus-like sequences were also identified in metagenomes generated from arthropods, febrile patients, healthy people, and rodents (<xref ref-type="bibr" rid="B39">Temmam et al., 2015</xref>).</p>
<p>To better characterize the genomic diversity of faustoviruses, the genomes of the 11 isolates have been sequenced and annotated. These double-stranded DNA genomes contain between 456 and 491 kilobase pairs (kbp), have a G + C content between 36.2 and 39.6%, and were predicted to encode between 457 and 519 genes (<xref ref-type="bibr" rid="B10">Benamar et al., 2016</xref>). Four lineages could be inferred from phylogenetic analyses of the core genome, with no clustering of the strains according to their geographical origin (<xref ref-type="bibr" rid="B10">Benamar et al., 2016</xref>; <xref ref-type="bibr" rid="B14">Cherif Louazani et al., 2017</xref>). For all these isolates, many hypothetical proteins were predicted, for which no function could be inferred due to the absence of recognizable homologs or conserved domains; these include 148 of the proteins encoded by the core genes.</p>
<p>In Faustovirus E12, the prototype virus of this group, proteomic analyses confirmed the presence in mature virions of 162 (33%) of the predicted proteins (<xref ref-type="bibr" rid="B34">Reteno et al., 2015</xref>). Moreover, cryo-electron microscopy showed the presence of a double protein layer encapsidating its genome (<xref ref-type="bibr" rid="B27">Klose et al., 2016</xref>). The major capsid protein (MCP) forming its external protein layer has been shown to be 645 amino acids long. In addition, it folds into the double jelly roll motif that is characteristic of the capsid proteins of nucleo-cytoplasmic large double-stranded DNA viruses (NCLDV), a group of viral families that comprises the <italic>Asfarviridae</italic> family (<xref ref-type="bibr" rid="B22">Iyer et al., 2001</xref>). Strikingly, the sequences encoding the Faustovirus E12 MCP appeared to be scattered along a 17-kbp genomic region, with fragments located in both annotated and unannotated ORFs. This observation suggested that Faustovirus E12 uses extensive splicing during the expression of its MCP (<xref ref-type="bibr" rid="B34">Reteno et al., 2015</xref>; <xref ref-type="bibr" rid="B27">Klose et al., 2016</xref>).</p>
<p><italic>In silico</italic> gene finding approaches have limitations in identifying genes, especially those that undergo post-transcriptional modifications or are present in the genomes of non-model organisms (<xref ref-type="bibr" rid="B26">Klasberg et al., 2016</xref>). RNA-seq technology is particularly helpful in such cases. Using high-throughput sequencing, RNA-seq allows high-resolution identification of transcripts across the whole genome, of splicing events and of splice junctions. It delineates the transcriptional structure of genes and provides information on gene expression levels and kinetics (<xref ref-type="bibr" rid="B42">Wang et al., 2009</xref>). Thus, previous studies of giant virus transcriptomes used RNA-seq to validate gene predictions and determine the precise 5&#x2032; and 3&#x2032; UTR structures of transcripts (<xref ref-type="bibr" rid="B29">Legendre et al., 2014</xref>, <xref ref-type="bibr" rid="B30">2015</xref>). For Mimivirus, this approach increased the gene repertoire by 49 genes and detected a new component of the transcription apparatus (<xref ref-type="bibr" rid="B31">Legendre et al., 2011</xref>).</p>
<p>In the present study, we provide a comprehensive view of Faustovirus E12 gene expression through massively parallel sequencing of total RNA-derived cDNA. We focus in particular on identifying splicing events in the transcription of the MCP-encoding gene over the entire replicative cycle.</p>
</sec>
<sec id="s1" sec-type="materials|methods">
<title>Materials and Methods</title>
<p>A flowchart summarizing the main steps used in this study is presented in Figure <xref ref-type="fig" rid="F1">1</xref>.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Flowchart illustrating the workflow of this study. This flowchart shows the general pipeline of this RNA-seq study, from sample preparation and RNA extraction to cDNA sequencing and data analyses. Expression counts were interpreted biologically by clustering the expressed genes into functional categories.</p></caption>
<graphic xlink:href="fmicb-09-02534-g001.tif"/>
</fig>
<sec><title>Data Acquisition</title>
<sec><title>Virus Production and Infection Cycle</title>
<p>Faustovirus E12 was produced on <italic>V. vermiformis</italic> (strain CDC19) as previously described (<xref ref-type="bibr" rid="B34">Reteno et al., 2015</xref>). Briefly, confluent monolayers of amoebae in Peptone-Yeast extract-Glucose (PYG) medium incubated at 28&#x00B0;C were rinsed with Page&#x2019;s Amoeba Saline buffer (PAS) and centrifuged twice at 720 &#x00D7; <italic>g</italic> for 10 min, then put in a starvation medium at an adjusted concentration of 10<sup>6</sup> cells/mL. The amoebae were then incubated at 30&#x00B0;C with a viral suspension at an MOI of 5 until complete cell lysis. The culture supernatant was then filtered at 0.45 &#x03BC;m to eliminate cellular debris and the filtrate was titrated by limiting dilution assay.</p>
<p>For the interrupted infection cycle, adherent <italic>V. vermiformis</italic> incubated in PYG medium were inoculated with viral suspension at an MOI of 100. After incubation at 30&#x00B0;C for 1 h, the supernatant was removed, and the cultures were gently rinsed three times with PAS to eliminate excess virus. This marked time 0 (T0). For later time points, infected and rinsed amoebae were incubated at 30&#x00B0;C in PYG. At each time point, infected cells were pelleted by centrifugation at 720 &#x00D7; <italic>g</italic> for 10 min and stored at -80&#x00B0;C in PBS.</p>
<p>In total, we performed two infection cycles, which provided duplicate samples at the following post-infection time points (<italic>t</italic> = 0, 15 min, 90 min, 3 h, 6 h, and 8 h), hereafter referred to as T0min-1, T15min-1, T90min-1, T3H-1, T6H-1, T8H-1 for cycle 1 and T0min-2, T15min-2, T90min-2, T3H-2, T6H-2, and T8H-2 for cycle 2. The second cycle included three additional late time points (<italic>t</italic> = 11 h, 17 h, and 20 h): T11H, T17H, and T20H.</p>
</sec>
</sec>
<sec><title>RNA Extraction and cDNA Sequencing</title>
<p>RNA was extracted using the RNeasy mini kit (Cat No: 74104, Qiagen, France) according to the manufacturer&#x2019;s instructions. Total RNA was eluted in a 50 &#x03BC;L volume of RNase-free water. RNaseOUT (Thermo Fisher Scientific, France) was added to the eluate to prevent RNA degradation. Genomic DNA contamination was checked by PCR targeting Faustovirus E12 DNA (forward primer: TCGGCATCAATCGCCTTATAG; reverse primer: GGCCAGAAGGGTCATTAACA). Two 30-min cycles of treatment with TURBO DNase (Invitrogen, France) at 37&#x00B0;C were performed on the samples to eliminate DNA contamination. The RNeasy MinElute Cleanup Kit (Qiagen) was used to purify DNA-free total RNA, following the manufacturer&#x2019;s protocol with an RNA elution volume of 14 &#x03BC;L in RNase-free water.</p>
<p>The extracted total RNAs were reverse transcribed into cDNA using random primers with the SuperScript VILO Synthesis Kit (Invitrogen, France). cDNA amplicons were purified with the Agencourt AMPure XP system (Beckman Coulter Inc., CA, United States). Two sets of purified cDNA, corresponding to an early and a complete Faustovirus E12 infection cycle, were sequenced on a MiSeq instrument with the 2 &#x00D7; 250-bp paired-end strategy, using the Nextera XT DNA sample prep kit (Illumina Inc., CA, United States). Quantified cDNAs were fragmented, tagged, then barcoded through limited-cycle PCR amplification (12 cycles). After purification on Agencourt AMPure XP beads (Beckman Coulter Inc., CA, United States), the libraries were normalized on specific beads and pooled for sequencing. Each set was loaded on a separate flowcell.</p>
</sec>
<sec><title>Data Analyses</title>
<sec><title>Quality Control and Pre-processing of Reads</title>
<p>The raw paired-end reads were adapter trimmed. Adapter-free reads were checked for quality using the PrinSeq web version 0.20.1 (<xref ref-type="bibr" rid="B36">Schmieder and Edwards, 2011</xref>). Reads with over 10% Ns were filtered out. PolyA/T tails of over seven nucleotides (nt) were trimmed. Reads were quality trimmed from the 5&#x2032;-end with a sliding window of four and a step of three, with a mean Phred-scaled quality score cutoff of 20.</p>
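<p>The three filters described above can be sketched as follows. This is a minimal illustration written for this text, not the PrinSeq implementation: the function names, the representation of qualities as a list of integer Phred scores, and the exact handling of window boundaries are all assumptions.</p>
<preformat>
```python
def too_many_ns(seq, max_fraction=0.10):
    """Return True when more than max_fraction of the bases are N."""
    return seq.upper().count("N") > max_fraction * len(seq)


def trim_poly_tail(seq, qual, min_len=8):
    """Remove a 3' poly-A or poly-T run of at least min_len bases.

    min_len=8 encodes "over seven nucleotides" (assumed reading).
    """
    for base in "AT":
        stripped = seq.rstrip(base)
        if len(seq) - len(stripped) >= min_len:
            return stripped, qual[:len(stripped)]
    return seq, qual


def trim_5prime(seq, qual, window=4, step=3, min_mean=20):
    """Slide a window from the 5' end and cut everything that precedes
    the first window whose mean quality reaches min_mean."""
    start = 0
    while len(qual) - start >= window:
        if sum(qual[start:start + window]) / window >= min_mean:
            return seq[start:], qual[start:]
        start += step
    # No window passed the cutoff: the whole read is discarded.
    return "", []
```
</preformat>
<p>For example, under these assumptions a read whose first three bases have a Phred score of 2 and whose remaining bases have a score of 30 loses exactly those three leading bases.</p>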
</sec>
<sec><title>Study of Faustovirus Genes Expression</title>
<p>To identify potential splicing events in Faustovirus E12, we used a two-round alignment approach with a spliced aligner. In the first round, both pre-processed paired reads and singletons were mapped against the genomic sequence of Faustovirus E12 (GenBank accession no. KJ614390.1) using HISAT2, with the minimum and maximum intron sizes set to 20 and 5,000 bp (<xref ref-type="bibr" rid="B25">Kim et al., 2015</xref>). Spliced reads were extracted, and junctions were manually validated using the Gene BED To Exon/Intron/Codon BED expander (Galaxy Version 1.0.0) (<xref ref-type="bibr" rid="B1">Afgan et al., 2016</xref>) and the Integrative Genomics Viewer (IGV) tool (<xref ref-type="bibr" rid="B40">Thorvaldsdottir et al., 2013</xref>). Junctions supported by at least two reads were included as known junctions in the second alignment round.</p>
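<p>The read-support criterion can be illustrated with a short sketch. It assumes that first-round alignments are available as pairs of a 1-based leftmost position and a SAM-style CIGAR string, in which an N operation marks a skipped (intron) region; the function names and the input representation are hypothetical and do not reproduce the tools used in this study.</p>
<preformat>
```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_alignment(pos, cigar):
    """Yield (intron_start, intron_end), 1-based inclusive, for each N op."""
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (ref, ref + length - 1)
        if op in "MDN=X":  # operations that consume the reference
            ref += length


def supported_junctions(alignments, min_reads=2, min_intron=20, max_intron=5000):
    """Count reads per junction and keep junctions with enough support."""
    counts = Counter()
    for pos, cigar in alignments:
        for start, end in junctions_from_alignment(pos, cigar):
            size = end - start + 1
            if size >= min_intron and max_intron >= size:
                counts[(start, end)] += 1
    return {junction: n for junction, n in counts.items() if n >= min_reads}
```
</preformat>
<p>With these assumptions, two reads spanning the same 200-bp gap yield one retained junction, whereas a junction seen in a single read, or a 10-bp gap below the minimum intron size, is dropped.</p>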
<p>For each time point, reads mapping to the viral genes were quantified and the counts normalized with the geometric method using Cuffnorm (Galaxy Version 2.2.1.1) (<xref ref-type="bibr" rid="B41">Trapnell et al., 2010</xref>). For <italic>t</italic> = 0 to <italic>t</italic> = 8 h p.i., for which two biological replicates were available, both replicates were used as inputs to compute a common normalized count (T0-c to T8H-c). To study the functional profile of the genes expressed during the replicative cycle, a BLASTp (<xref ref-type="bibr" rid="B4">Altschul et al., 1997</xref>) search of Faustovirus E12 annotated ORFs was performed against the Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) proteins database<sup><xref ref-type="fn" rid="fn01">1</xref></sup> (<xref ref-type="bibr" rid="B44">Yutin et al., 2014</xref>). Hits with e-values below 1e-03 were considered significant and assigned to their corresponding NCVOGs. For each functional category, a weighted average of the expression of its genes, in Fragments Per Kilobase of transcript per Million mapped reads (FPKM), was calculated at each time point.</p>
<p>Proteins of African swine fever virus (ASFV) identified in purified particles (<xref ref-type="bibr" rid="B2">Alejo et al., 2018</xref>) were used to search for homologs in Faustovirus E12 using BLASTp (<xref ref-type="bibr" rid="B4">Altschul et al., 1997</xref>) with an e-value cutoff of 1e-03.</p>
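<p>As an illustration of the per-category summary, the sketch below computes FPKM with its standard definition and then a weighted mean per functional category. The weighting scheme is not specified in the text, so the weights passed in here (gene length in the usage note) are an assumption, as are the function names.</p>
<preformat>
```python
from collections import defaultdict


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def category_weighted_mean(genes):
    """genes: iterable of (category, fpkm_value, weight) tuples.

    Returns the weighted mean FPKM of each category; the choice of
    weight is left to the caller (an assumption of this sketch).
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for category, value, weight in genes:
        numerator[category] += value * weight
        denominator[category] += weight
    return {c: numerator[c] / denominator[c] for c in numerator if denominator[c] > 0}
```
</preformat>
<p>For instance, with gene length as the weight, two genes of 1,000 and 3,000 bp with FPKM values of 10 and 30 give a category mean of 25 rather than the unweighted 20.</p>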
</sec>
<sec><title>Study of the Major Capsid Protein Encoding Gene in Faustovirus E12</title>
<p>The 645-amino-acid protein sequence of the Faustovirus E12 MCP (UniProtKB accession no.: A0A0H3TLP8) was used to predict coding regions in the viral genome, using GeneWise (online version: wise2-4-1) (<xref ref-type="bibr" rid="B32">Li et al., 2015</xref>) with the GeneWise 623 algorithm, the flat null model and modeled splice sites as input parameters. Predicted positions of exons were manually curated using information from junction reads. The coordinates of the corresponding junctions were added to the file of known splice junctions for the second-round alignment of total RNA-derived cDNA.</p>
</sec>
</sec></sec>
<sec><title>Results</title>
<sec><title>Faustovirus E12 Gene Expression</title>
<p>Transcriptome sequencing of Faustovirus E12-infected <italic>V. vermiformis</italic> yielded 8,909,144 read pairs distributed over nine time points, with two biological replicates for <italic>t</italic> = 0 min, 15 min, 90 min, 3 h, 6 h, and 8 h, and one replicate for <italic>t</italic> = 11, 17, and 20 h. After quality control, pre-processing and double-round mapping, reads corresponding to Faustovirus E12 represented &#x003C;1% of the total number of generated reads, yet covered 93.5% of the genome positions with at least one read in at least one dataset. Single-base-resolution coverage maps across the genome for datasets of both cycles are reported in Figure <xref ref-type="fig" rid="F2">2</xref> and Supplementary Figure <xref ref-type="supplementary-material" rid="SM2">1</xref>. We observed a gradual increase in genome coverage during the replication cycle, illustrating an active transcription process starting early after infection. Two major shifts in coverage peak profiles were observed after <italic>t</italic> = 90 min and <italic>t</italic> = 8 h, marking transitions from early to intermediate and from intermediate to late infection time points.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Map of Faustovirus E12 genome coverage during the replication cycle. The predicted protein coding sequences are represented on the external circle as <italic>red</italic> and <italic>blue boxes</italic> for the forward and reverse strand, respectively. The single-base-resolution coverage for each time point is reported in the colored concentric circles from <italic>t</italic> = 0 to 20 h for the sample set of the complete replication cycle. Position 0 is at the 12 o&#x2019;clock position.</p></caption>
<graphic xlink:href="fmicb-09-02534-g002.tif"/>
</fig>
<p>Over its replicative cycle, Faustovirus E12 expressed 90% (445/492) of its predicted genes, including all but two of the genes that were assigned an NCVOG ID (116/118) (Supplementary Table <xref ref-type="supplementary-material" rid="SM1">1</xref>). These two genes encode a putative metal-dependent hydrolase (PRJ_Fausto_00294) and an uncharacterized protein (PRJ_Fausto_00234). Genes related to DNA replication, recombination and repair, to nucleotide metabolism, and to transcription and RNA processing were expressed early and throughout the whole cycle. These include a hydrolase and a putative P-loop containing nucleoside triphosphate hydrolase, the ribonucleotide reductase small and large subunits and the hypothetical protein (PRJ_Fausto_00128) containing a Rho factor transcription termination domain.</p>
<p>A large proportion (32&#x2013;75%) of the transcripts detected at early time points and up to 6 h p.i. corresponded to uncharacterized or poorly characterized proteins. Among the early expressed genes, we also found genes predicted to be involved in the ubiquitin-proteasome pathway and in host response regulation, notably genes encoding proteins containing ankyrin repeats or membrane occupation and recognition nexus (MORN) repeats. DNA-directed RNA polymerase subunits were expressed starting from 90 min p.i., along with the transcription factor S-II (TFIIS), the mRNA capping enzyme and the translation initiation factor SUI1. The first transcripts corresponding to the MCP appeared at 3 h p.i., while genes related to virion structure and morphogenesis were expressed starting from 6 h p.i. with increasing abundance at late time points. From 8 h p.i., the majority (50&#x2013;73%) of the transcripts corresponded to proteins involved in virion structure and morphogenesis (Figure <xref ref-type="fig" rid="F3">3</xref>). Table <xref ref-type="table" rid="T1">1</xref> lists Faustovirus E12 genes predicted to encode homologs of proteins detected in the proteome of purified ASFV particles, together with their expression at late time points. All proteins forming the core shell in ASFV have homologs in Faustovirus E12 that were expressed starting from 6 h p.i.: the 220 kDa polyprotein, the 62 kDa polyprotein and the protease necessary for their cleavage into their corresponding products. Other ASFV nucleoid proteins whose predicted homologs were expressed in Faustovirus E12 include all RNA polymerase subunits and RNA modification enzymes, transcription factors and DNA repair enzymes. Interestingly, sequence homology searches did not identify Faustovirus E12 genes predicted to encode homologs of proteins detected in the outer and inner envelopes of ASFV.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Distribution of the functional categories of genes expressed during the replication cycle of Faustovirus E12. Faustovirus E12 ORFs were assigned a functional category based on sequence homology with the Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOG) proteins database. The ratio of expression per functional category is reported for each time point.</p></caption>
<graphic xlink:href="fmicb-09-02534-g003.tif"/>
</fig>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>ASFV virion-forming proteins with homologs detected in the transcriptome of Faustovirus E12.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left">ASFV gene</th>
<th valign="top" align="left">ASFV protein</th>
<th valign="top" align="left">Function</th>
<th valign="top" align="left">Faustovirus E12 homolog protein</th>
<th valign="top" align="left">T8H</th>
<th valign="top" align="left">T11H</th>
<th valign="top" align="left">T17H</th>
<th valign="top" align="left">T20H</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">B646L</td>
<td valign="top" align="left">Major capsid protein p72</td>
<td valign="top" align="left">Morphogenesis</td>
<td valign="top" align="left">MCP</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">CP2475L</td>
<td valign="top" align="left">Polyprotein pp220</td>
<td valign="top" align="left">Morphogenesis</td>
<td valign="top" align="left">AIB52024</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">CP530R</td>
<td valign="top" align="left">Polyprotein pp62</td>
<td valign="top" align="left">Morphogenesis</td>
<td valign="top" align="left">AIB52025</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">S273R</td>
<td valign="top" align="left">Polyprotein processing protease</td>
<td valign="top" align="left">Morphogenesis</td>
<td valign="top" align="left">AIB52094</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">O174L</td>
<td valign="top" align="left">DNA polymerase X</td>
<td valign="top" align="left">DNA integrity</td>
<td valign="top" align="left">AIB52077</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
</tr>
<tr>
<td valign="top" align="left">E296R</td>
<td valign="top" align="left">AP endonuclease</td>
<td valign="top" align="left">DNA integrity</td>
<td valign="top" align="left">AIB52085</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">E165R</td>
<td valign="top" align="left">dUTPase</td>
<td valign="top" align="left">DNA integrity</td>
<td valign="top" align="left">AIB51748</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">NP419L</td>
<td valign="top" align="left">DNA ligase</td>
<td valign="top" align="left">DNA integrity</td>
<td valign="top" align="left">AIB52048</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">NP1450L</td>
<td valign="top" align="left">RNA polymerase subunit 1</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB52040</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">EP1242L</td>
<td valign="top" align="left">RNA polymerase subunit 2</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB51752</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">H359L</td>
<td valign="top" align="left">RNA polymerase subunit 3-11</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB52132</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">D205R</td>
<td valign="top" align="left">RNA polymerase subunit 5</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB52137</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">C147L</td>
<td valign="top" align="left">RNA polymerase subunit 6</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB51823</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
</tr>
<tr>
<td valign="top" align="left">D339L</td>
<td valign="top" align="left">RNA polymerase subunit 7</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB52143</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">Q706L</td>
<td valign="top" align="left">VACV D11-like helicase</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB52129</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">B962L</td>
<td valign="top" align="left">VACV I8-like RNA helicase</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB51810</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">D1133L</td>
<td valign="top" align="left">VACV D6-like RNA helicase</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB52142</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">G1340L</td>
<td valign="top" align="left">VACV A7 early transcription factor large subunit-like</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB52005</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">NP868R</td>
<td valign="top" align="left">mRNA-capping enzyme</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB52055</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">C475L</td>
<td valign="top" align="left">Poly(A) polymerase</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB51816</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
</tr>
<tr>
<td valign="top" align="left">EP424R</td>
<td valign="top" align="left">Putative RNA methyltransferase</td>
<td valign="top" align="left">Transcription</td>
<td valign="top" align="left">AIB52114</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">EP152R</td>
<td valign="top" align="left">Protein EP152R</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">AIB51785</td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">B169L</td>
<td valign="top" align="left">Uncharacterized protein</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">AIB51862</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
</tr>
<tr>
<td valign="top" align="left">H339R</td>
<td valign="top" align="left">&#x03B1;-NAC binding protein pH339R</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">AIB52116</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">M1249L</td>
<td valign="top" align="left">Uncharacterized protein</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">AIB51842</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">C129R</td>
<td valign="top" align="left">Uncharacterized protein</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">AIB51831</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">K421R</td>
<td valign="top" align="left">Uncharacterized protein</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">AIB51770</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">H240R</td>
<td valign="top" align="left">Uncharacterized protein</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">AIB52125</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">QP383R</td>
<td valign="top" align="left">Uncharacterized protein</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">AIB52104</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
</tr>
<tr>
<td valign="top" align="left">C122R</td>
<td valign="top" align="left">Uncharacterized protein</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">AIB51826</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
<td valign="top" align="left"></td>
</tr>
<tr>
<td valign="top" align="left">M448R</td>
<td valign="top" align="left">Uncharacterized protein</td>
<td valign="top" align="left"></td>
<td valign="top" align="left">AIB51841</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td>
<td valign="top" align="left">+</td></tr>
</tbody></table>
<table-wrap-foot>
<attrib><italic>The transcripts corresponding to the proteins of Faustovirus E12 with a homolog in African swine fever virus (ASFV) are indicated with a &#x201C;+&#x201D; when detected in the corresponding dataset</italic>.</attrib>
</table-wrap-foot>
</table-wrap>
</sec>
<sec><title>Splicing Events in Faustovirus E12</title>
<p>Using a splice-aware mapper and a double-round alignment strategy, with manual validation of splice-junctions, we identified 26 potential splice-junctions, each represented by at least two reads, with insert sizes reaching up to 3,256 bp. Figure <xref ref-type="fig" rid="F4">4</xref> illustrates their distribution across the genome and throughout the replicative cycle of Faustovirus E12 in <italic>Vermamoeba vermiformis</italic>. We observed an uneven distribution of potential introns, with a high proportion of splice-junctions clustered in a single region of the genome and appearing at late time points p.i. This region was predicted in previous studies to encode the MCP of the virus (<xref ref-type="bibr" rid="B27">Klose et al., 2016</xref>). Overall, junction-reads accounted for 2.7% (1,386) of the total mapped reads, with 95.7% (1,326) of these reads aligning to the MCP-encoding region.</p>
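<p>As a quick consistency check on the proportions reported above, the following minimal Python sketch recomputes the share of junction-reads that fall in the MCP-encoding region and back-calculates the approximate number of mapped reads implied by the rounded 2.7% value. The variable names are illustrative, and the implied total is only an estimate derived from rounded figures, not a value reported in this study.</p>
<preformat>
```python
# Junction-read counts as reported in the text.
junction_reads = 1386
mcp_junction_reads = 1326
junction_fraction_of_mapped = 0.027  # 2.7%, a rounded value

# Share of junction-reads aligning to the MCP-encoding region.
mcp_share = mcp_junction_reads / junction_reads

# Total mapped reads implied by the rounded percentage (estimate only).
implied_total_mapped = junction_reads / junction_fraction_of_mapped

print(f"MCP share of junction-reads: {mcp_share:.1%}")
print(f"Implied total mapped reads: about {implied_total_mapped:,.0f}")
```
</preformat>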
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Genome-wide map of splicing events in Faustovirus E12 across its replication cycle. Predicted and curated splice junctions resulting from the second-round mapping of RNA-seq reads against the genome with HISAT2 are reported in <italic>colored boxes</italic> corresponding to the dataset where they were detected. When in close genomic coordinates, junctions appear in two layers for display purpose. The annotated protein coding sequences are represented on the external circle in <italic>blue wedges</italic>.</p></caption>
<graphic xlink:href="fmicb-09-02534-g004.tif"/>
</fig>
</sec>
<sec><title>Faustovirus E12 Major Capsid Protein Transcription</title>
<p>To study the transcription of the MCP, we used both the gene prediction results of GeneWise and the junction-reads obtained after the first-round alignment. Junction-reads confirming the positions of predicted exons were added to the validated junctions file for the second-round alignment. The complete MCP transcript appears to be composed of 13 exons delineating 12 introns. Nine of the 13 exon&#x2013;intron boundaries are supported by detected junction-reads. The mean intron length is 1,273 bp, with minimum and maximum lengths of 396 and 3,256 bp, respectively, and a mean G + C content of 35.2%. Exons forming the MCP coding transcript are significantly shorter than the introns (<italic>p</italic> = 0.0007, unpaired <italic>t</italic>-test), with lengths varying from 13 to 527 bp, a mean length of 149 bp and a mean G + C content of 43.9%. The exonic G + C content is significantly higher than that observed in introns (<italic>p</italic> &#x003C; 0.0001, unpaired <italic>t</italic>-test).</p>
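<p>The exon versus intron comparison described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the procedure used in this study: the sequences are hypothetical toy strings rather than the MCP exons or introns, the function names are our own, and only the pooled-variance (Student) unpaired <italic>t</italic> statistic is computed, without a <italic>p</italic>-value.</p>
<preformat>
```python
from math import sqrt
from statistics import mean, variance


def gc_content(seq):
    """Return the G + C percentage of a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def unpaired_t_statistic(a, b):
    """Student's unpaired t statistic (pooled variance, equal variances assumed)."""
    n_a, n_b = len(a), len(b)
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / n_a + 1 / n_b))


# Hypothetical toy sequences, NOT the Faustovirus E12 MCP exons or introns.
exons = ["ATGGCGCC", "GGCATGCA", "CCGGATGC"]
introns = ["ATATTTAAGCATTAAT", "TTAATAGCATATAT", "AATTCGTATAAA"]

exon_gc = [gc_content(s) for s in exons]
intron_gc = [gc_content(s) for s in introns]
exon_len = [len(s) for s in exons]
intron_len = [len(s) for s in introns]

t_gc = unpaired_t_statistic(exon_gc, intron_gc)
t_len = unpaired_t_statistic(exon_len, intron_len)

print(f"mean length (bp): exons {mean(exon_len):.1f}, introns {mean(intron_len):.1f}")
print(f"mean G + C (%): exons {mean(exon_gc):.1f}, introns {mean(intron_gc):.1f}")
print(f"t statistic: G + C {t_gc:.2f}, length {t_len:.2f}")
```
</preformat>
<p>With real data, the exon and intron sequences would be extracted from the genome using the curated junction coordinates, and a statistics package would be needed to convert the <italic>t</italic> statistic into a <italic>p</italic>-value.</p>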
<p>An A/T substitution at transcript position 1,879 was found to generate a premature stop codon at protein position 631, suggesting the presence of a potential frame shift or post-transcriptional RNA editing mechanism.</p>
<p>Reads mapping to the MCP region represented 23.5% of the total mapped reads. The coverage of the intronic and exonic regions shifts during the replicative cycle. At early time points and until 3 h p.i., Faustovirus E12 appears to express transcripts corresponding to the intronic regions, with coverage varying from 1.36 to 3.56, while for the same samples the exonic regions have no coverage. Starting from 6 h p.i., the exonic regions are detected, and the highest coverage was observed in sample T11H, with an average coverage of 265.12 in exonic regions versus 16.89 in intronic regions of the same sample.</p>
<p>To examine more closely the mechanisms involved in the expression of the MCP gene, exon&#x2013;intron boundaries were inspected for conserved splice-sites that would suggest the presence of spliceosome-processed introns. Moreover, the intronic sequences were searched against the Rfam database for conserved motifs. Through this approach, five group I self-splicing introns were identified, and two introns were shown to contain an inserted ORF encoding a GIY-YIG homing endonuclease (Figure <xref ref-type="fig" rid="F5">5A</xref>). All the MCP gene exon&#x2013;intron boundaries show non-canonical splice-sites with five types of donor-acceptor associations, among which AA/TG was the most represented (Figure <xref ref-type="fig" rid="F5">5B</xref>).</p>
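<p>The tallying of donor/acceptor associations described above can be illustrated with a short Python sketch. It is a simplified example under stated assumptions, not the procedure used in this study: the sequence and intron coordinates are hypothetical, coordinates are 0-based and end-exclusive, only the forward strand is handled, and the donor and acceptor sites are taken as the first and last two intronic nucleotides.</p>
<preformat>
```python
from collections import Counter


def splice_site_pairs(genome, introns):
    """Count donor/acceptor dinucleotide pairs for (start, end) intron intervals."""
    pairs = Counter()
    for start, end in introns:
        donor = genome[start:start + 2]   # first two intronic bases
        acceptor = genome[end - 2:end]    # last two intronic bases
        pairs[f"{donor}/{acceptor}"] += 1
    return pairs


# Hypothetical toy gene: exon, intron (AA...TG), exon, intron (GT...AG), exon.
genome = "CCC" + "AATTTTTG" + "CCC" + "GTAAAAAG" + "CCC"
introns = [(3, 11), (14, 22)]

pairs = splice_site_pairs(genome, introns)
for pair, count in pairs.most_common():
    label = "canonical" if pair == "GT/AG" else "non-canonical"
    print(pair, count, label)
```
</preformat>
<p>For a gene on the reverse strand, the intron sequences would first need to be reverse-complemented before the dinucleotides are read.</p>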
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Faustovirus E12 major capsid protein gene structure. The MCP gene contains 13 exons and 12 introns, among which five are group I self-splicing introns and two contain an inserted ORF encoding a GIY-YIG homing endonuclease. The donor and acceptor splice sites are represented for potentially spliceosome-processed introns <bold>(A)</bold>. The association of donor/acceptor splice sites in these introns shows the high abundance of the donor site AA and its frequent association with the TG acceptor site <bold>(B)</bold>.</p></caption>
<graphic xlink:href="fmicb-09-02534-g005.tif"/>
</fig>
</sec>
</sec>
<sec><title>Discussion</title>
<sec><title>An Overview of the Transcriptional Landscape of Faustovirus E12</title>
<p>This study represents the first exploration of the transcriptional landscape of Faustovirus E12. Using total RNA sequencing of infected cells at nine different time points covering the whole replicative cycle of the virus in <italic>Vermamoeba vermiformis</italic>, we were able to follow the temporal regulation of viral transcription. Faustovirus E12 gene expression seems to follow the classical temporal regulation described in other giant viruses of amoebae and those of the former NCLDV group. Early on, transcripts related to the ubiquitin pathway were detected. This pathway has been described as a viral adaptation mechanism against host defenses. By transcribing its own components of the ubiquitin pathway, the virus can alter the host response to infection by modulating or degrading cell proteins (<xref ref-type="bibr" rid="B23">Iyer et al., 2006</xref>). Ankyrin repeat-containing proteins are also expressed early and throughout the replicative cycle. In poxviruses, these motif-containing proteins have been described as modulators of host range, and their early expression could play a role in repressing the host response by targeting the NF-&#x03BA;B pathway (<xref ref-type="bibr" rid="B20">Herbert et al., 2015</xref>). In parallel, to prepare its replication, the virus encodes the ribonucleotide reductase small and large subunits that provide the dNTPs necessary for viral DNA synthesis, therefore allowing virus growth in non-dividing cells (<xref ref-type="bibr" rid="B18">Gammon et al., 2010</xref>). Early mRNA transcripts are likely expressed using viral enzymes packaged within the infectious particles. As described in ASFV, the viral RNA polymerase is responsible for the transcription of all the viral genes but is expressed later during the replicative cycle (<xref ref-type="bibr" rid="B35">Rodr&#x00ED;guez and Salas, 2013</xref>).
Indeed, different RNA polymerase subunits, transcription factors and RNA modification enzymes are expressed late during the infection cycle, and are likely translated into proteins incorporated into the virions during the assembly step. The comparison of the nucleoid components described by the proteomics analysis of ASFV particles and the late transcribed genes in Faustovirus E12 supports this hypothesis (<xref ref-type="bibr" rid="B2">Alejo et al., 2018</xref>). Among the late transcribed gene products, we identified three enzymes homologous to the components of the base excision repair (BER) pathway described in ASFV. This pathway has been hypothesized to serve as an adaptation mechanism for viral replication in the cytoplasm of macrophages while not expressed in tissue cell cultures (<xref ref-type="bibr" rid="B17">Dixon et al., 2013</xref>). Formed by a DNA polymerase type X, a class II apurinic/apyrimidinic (AP) endonuclease and a DNA ligase, all three detected in the transcriptome of Faustovirus E12-infected <italic>Vermamoeba vermiformis</italic>, this pathway could support the potential role of amoebae as a training ground for microorganisms&#x2019; resistance to macrophages (<xref ref-type="bibr" rid="B19">Greub and Raoult, 2004</xref>). A comparative study of Faustovirus E12 transcription in different host cells warrants further investigation.</p>
<p>The Faustovirus E12 homologs of the DNA primase responsible for the initiation of DNA replication in ASFV (AIB51821) and of the proliferating cell nuclear antigen-like protein that clamps the DNA polymerase to the DNA (AIB52098) are expressed starting from 6 h p.i., with the onset of DNA replication (<xref ref-type="bibr" rid="B17">Dixon et al., 2013</xref>; <xref ref-type="bibr" rid="B34">Reteno et al., 2015</xref>). The most abundant transcripts detected in our study appear late during the infection cycle, after <italic>t</italic> = 6 h, at the viral factory step, and correspond to structural proteins responsible for the particles&#x2019; morphogenesis and packaging: the MCP, forming the external protein shell, is the most abundant transcript at late time points. It is followed by the 220 kDa polyprotein and the 62 kDa polyprotein, both described as essential for the assembly of the core shell and the incorporation of the genomic DNA and nucleoid components in the mature virions (<xref ref-type="bibr" rid="B7">Andr&#x00E9;s et al., 2002</xref>; <xref ref-type="bibr" rid="B37">Su&#x00E1;rez et al., 2010</xref>). At <italic>t</italic> = 20 h p.i., as described in the developmental cycle of Faustovirus E12, most amoebae are lysed (<xref ref-type="bibr" rid="B34">Reteno et al., 2015</xref>) or appear at different stages of the replicative cycle of Faustovirus E12. This is reflected in our data by a mix of early and late transcribed genes in this dataset.</p>
<p>Although our data confirm the expression of most of the Faustovirus E12 predicted protein-encoding genes, the low abundance of viral reads does not allow further interpretation. Given the high abundance of amoebal rRNA and mRNA in the total RNA extract, and in the absence of a complete <italic>V. vermiformis</italic> genome sequence in international sequence databases, the reads that could not be aligned against the Faustovirus E12 genome could not be unequivocally attributed to this amoeba. The possibility of using a ribodepletion strategy should be explored for future transcriptomic studies targeting giant viruses of amoebae.</p>
</sec>
<sec><title>Corrected MCP Transcript</title>
<p>This study represents a first step toward understanding the non-canonical splicing involved in Faustovirus E12 MCP expression. The use of paired-end 250 bp-long read sequencing on the MiSeq instrument allowed us to unambiguously identify splice junctions using a splice-aware mapper. Although HISAT2 is adapted to eukaryotic model organisms, the use of both prediction data and manual curation of the junction reads allowed us to describe a 1,939 bp-long transcript generated from a 17 kbp-long gene and corresponding to the 645 amino acid-long sequence of the MCP forming the external protein shell of the mature Faustovirus E12 virions.</p>
<p>At early time points of the replicative cycle, we observed transcription of regions corresponding to the introns of the MCP gene. Moreover, in the absence of an RNA enrichment or selection step in our protocol, the transcribed introns observed at later time points could be partially due to the presence of immature pre-mRNA molecules in the total RNA extract.</p>
<p>Gene splicing was first described by two teams in Adenovirus 2 in 1977 (<xref ref-type="bibr" rid="B11">Berget et al., 1977</xref>; <xref ref-type="bibr" rid="B15">Chow et al., 1977</xref>). Subsequently, it has proved to be widespread in eukaryotes and a central mechanism in gene regulation and the generation of protein diversity (<xref ref-type="bibr" rid="B24">Kelemen et al., 2013</xref>). The presence of introns has been suggested to increase gene expression by controlling DNA accessibility or through the regulatory effect of some introns on the RNA polymerase (<xref ref-type="bibr" rid="B21">Hir et al., 2003</xref>). In Faustovirus E12, splicing was detected in the MCP gene, which encodes a highly abundant protein, with most spliced reads corresponding to this gene. This could reinforce the hypothesis that splicing plays a role in increasing gene expression (<xref ref-type="bibr" rid="B2">Alejo et al., 2018</xref>). However, the low abundance of viral reads in our datasets limited the high-confidence identification of other spliced genes.</p>
<p>Among giant viruses, introns were first described in the MCP-encoding gene of <italic>Acanthamoeba polyphaga mimivirus</italic>, the first giant virus of amoebae to be discovered (<xref ref-type="bibr" rid="B8">Azza et al., 2009</xref>; <xref ref-type="bibr" rid="B28">Legendre et al., 2010</xref>). This gene was depicted as composed of three exons separated by two introns. A recent study compared MCP gene splicing profiles in <italic>Mimiviridae</italic> members from lineages A, B, and C and showed a lineage-independent variation in the structure and synteny of exons and intronic regions of this gene (<xref ref-type="bibr" rid="B13">Boratto et al., 2018</xref>). Introns were also detected in other conserved genes from giant viruses of amoebae, including those encoding DNA-dependent RNA polymerases and DNA polymerases (<xref ref-type="bibr" rid="B43">Yoosuf et al., 2012</xref>; <xref ref-type="bibr" rid="B33">Philippe et al., 2013</xref>; <xref ref-type="bibr" rid="B16">Deeg et al., 2018</xref>). Other spliced NCLDV genes include several genes of Paramecium bursaria chlorella virus 1 (PBCV-1), in which two to three different types of introns have been described: spliceosome processed-like introns are present in the DNA polymerase and the pyrimidine dimer-specific glycosylase (PDG) genes and are conserved in different chlorella viruses (<xref ref-type="bibr" rid="B38">Sun et al., 2000</xref>; <xref ref-type="bibr" rid="B45">Zhang et al., 2001</xref>). Group IB self-splicing introns are reported in a putative transcription factor TFII-like gene (ORF A125L) and in other regions of the viral genome where this intron propagated (<xref ref-type="bibr" rid="B12">Blanc et al., 2014</xref>).</p>
<p>In Faustovirus E12, a mixed mechanism may be involved in the expression of the MCP gene: the five group I introns could self-splice, while the other introns use non-canonical splice-sites for their excision. The splice-sites, defined by the exon&#x2013;intron boundaries in this virus, differ from the canonical splice-sites observed in amoebae and other eukaryotic cells, making it difficult to identify them accurately using existing mapping programs alone. The use of a known protein sequence to validate splice-junctions and a two-round alignment approach were beneficial for the definition of the MCP gene structure. The 13 exons forming this gene exhibit a higher G + C content than their long flanking introns. This difference in G + C content could therefore play a role in the recognition of the exons by the splicing machinery, lowering the constraint on the intron-defined splice-sites, as hypothesized in higher eukaryotes (<xref ref-type="bibr" rid="B5">Amit et al., 2012</xref>).</p>
<p>Moreover, the Faustovirus E12 MCP gene contains ORFs encoding GIY-YIG homing endonucleases inserted in two different introns. The presence of this enzyme has been thought to play a role in host competition among related viruses, impeding the replication of competing viruses by cleaving genes essential to their replication and contributing to the creation of chimeric genomic regions containing parasitic genetic elements in these genomes (<xref ref-type="bibr" rid="B16">Deeg et al., 2018</xref>).</p>
<p>In summary, Faustovirus E12 MCP splicing presents three main features that make it unusual: (i) The number of introns: although splicing has been described in other viruses, the number of introns is generally limited to one to three. This is, to our knowledge, the first description of a viral gene containing 12 introns. (ii) The size of the introns: with a mean length of 1,273 bp, the introns of the Faustovirus E12 MCP gene are larger than previously described introns in viruses. A gene structure with multiple large introns is otherwise common in cellular organisms. (iii) The mixed mechanisms that could be at play in the splicing of the MCP gene: the Faustovirus E12 MCP gene contains both group I introns and potential spliceosomal introns. Moreover, the potentially spliceosomal introns use non-canonical splice-sites in their excision. Overall, the complexity and unusual splicing observed in Faustovirus E12 contribute to blurring the border between giant viruses of amoebae and cellular organisms, and thus strengthen the delineation of these viruses as different complex entities compared to classical viruses.</p>
</sec>
</sec>
<sec><title>Data Availability</title>
<p>The datasets generated for this study were submitted to the European Nucleotide Archive database and are available under the accession numbers <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="ERR2724024">ERR2724024</ext-link> to <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="ERR2724038">ERR2724038</ext-link>.</p>
</sec>
<sec><title>Author Contributions</title>
<p>ACL and BLS conceived and designed the experiments. AL, PC, and EB contributed to materials and analysis tools. ACL, AL, PC, and EB analyzed the data. ACL, AL, PC, and BLS wrote the paper.</p>
</sec>
<sec><title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="financial-disclosure">
<p><bold>Funding.</bold> This work was supported by a grant from the French State managed by the National Research Agency under the &#x201C;Investissements d&#x2019;avenir&#x201D; (Investments for the Future) program with the reference ANR-10-IAHU-03 (M&#x00E9;diterran&#x00E9;e Infection) and R&#x00E9;gion Provence-Alpes-C&#x00F4;te d&#x2019;Azur and European funding FEDER PRIMI.</p>
</fn>
</fn-group>
<ack>
<p>We are thankful to Prof. Christophe Beroud for fruitful discussions about splicing.</p>
</ack>
<sec sec-type="supplementary material">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fmicb.2018.02534/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fmicb.2018.02534/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Table_1.XLSX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Data_Sheet_1.pdf" id="SM2" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Afgan</surname> <given-names>E.</given-names></name> <name><surname>Baker</surname> <given-names>D.</given-names></name> <name><surname>van den Beek</surname> <given-names>M.</given-names></name> <name><surname>Blankenberg</surname> <given-names>D.</given-names></name> <name><surname>Bouvier</surname> <given-names>D.</given-names></name> <name><surname>&#x010C;ech</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>44</volume> <fpage>W3</fpage>&#x2013;<lpage>W10</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkw343</pub-id> <pub-id pub-id-type="pmid">27137889</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alejo</surname> <given-names>A.</given-names></name> <name><surname>Matamoros</surname> <given-names>T.</given-names></name> <name><surname>Guerra</surname> <given-names>M.</given-names></name> <name><surname>Andr&#x00E9;s</surname> <given-names>G.</given-names></name></person-group> (<year>2018</year>). <article-title>A proteomic atlas of the African swine fever virus particle.</article-title> <source><italic>J. Virol.</italic></source> (in press). <pub-id pub-id-type="doi">10.1128/JVI.01293-18</pub-id> <pub-id pub-id-type="pmid">30185597</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alonso</surname> <given-names>C.</given-names></name> <name><surname>Borca</surname> <given-names>M.</given-names></name> <name><surname>Dixon</surname> <given-names>L.</given-names></name> <name><surname>Revilla</surname> <given-names>Y.</given-names></name> <name><surname>Rodriguez</surname> <given-names>F.</given-names></name> <name><surname>Escribano</surname> <given-names>J. M.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>ICTV virus taxonomy profile: <italic>Asfarviridae</italic>.</article-title> <source><italic>J. Gen. Virol.</italic></source> <volume>99</volume> <fpage>10</fpage>&#x2013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1099/jgv.0.000985</pub-id> <pub-id pub-id-type="pmid">29214972</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Altschul</surname> <given-names>S. F.</given-names></name> <name><surname>Madden</surname> <given-names>T. L.</given-names></name> <name><surname>Sch&#x00E4;ffer</surname> <given-names>A. A.</given-names></name> <name><surname>Zhang</surname> <given-names>J.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Miller</surname> <given-names>W.</given-names></name><etal/></person-group> (<year>1997</year>). <article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>25</volume> <fpage>3389</fpage>&#x2013;<lpage>3402</lpage>. <pub-id pub-id-type="doi">10.1093/nar/25.17.3389</pub-id> <pub-id pub-id-type="pmid">9254694</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Amit</surname> <given-names>M.</given-names></name> <name><surname>Donyo</surname> <given-names>M.</given-names></name> <name><surname>Hollander</surname> <given-names>D.</given-names></name> <name><surname>Goren</surname> <given-names>A.</given-names></name> <name><surname>Kim</surname> <given-names>E.</given-names></name> <name><surname>Gelfman</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>Differential GC content between exons and introns establishes distinct strategies of splice-site recognition.</article-title> <source><italic>Cell Rep.</italic></source> <volume>1</volume> <fpage>543</fpage>&#x2013;<lpage>556</lpage>. <pub-id pub-id-type="doi">10.1016/j.celrep.2012.03.013</pub-id> <pub-id pub-id-type="pmid">22832277</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andreani</surname> <given-names>J.</given-names></name> <name><surname>Khalil</surname> <given-names>J. Y. B.</given-names></name> <name><surname>Sevvana</surname> <given-names>M.</given-names></name> <name><surname>Benamar</surname> <given-names>S.</given-names></name> <name><surname>Di Pinto</surname> <given-names>F.</given-names></name> <name><surname>Bitam</surname> <given-names>I.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Pacmanvirus, a new giant icosahedral virus at the crossroads between <italic>Asfarviridae</italic> and <italic>Faustovirus</italic>.</article-title> <source><italic>J. Virol.</italic></source> <volume>91</volume> <fpage>e212</fpage>&#x2013;<lpage>e217</lpage>. <pub-id pub-id-type="doi">10.1128/JVI.00212-17</pub-id> <pub-id pub-id-type="pmid">28446673</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andr&#x00E9;s</surname> <given-names>G.</given-names></name> <name><surname>Garc&#x00ED;a-Escudero</surname> <given-names>R.</given-names></name> <name><surname>Salas</surname> <given-names>M. L.</given-names></name> <name><surname>Rodr&#x00ED;guez</surname> <given-names>J. M.</given-names></name></person-group> (<year>2002</year>). <article-title>Repression of African swine fever virus polyprotein pp220-encoding gene leads to the assembly of icosahedral core-less particles.</article-title> <source><italic>J. Virol.</italic></source> <volume>76</volume> <fpage>2654</fpage>&#x2013;<lpage>2666</lpage>. <pub-id pub-id-type="doi">10.1128/JVI.76.6.2654-2666.2002</pub-id> <pub-id pub-id-type="pmid">11861832</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Azza</surname> <given-names>S.</given-names></name> <name><surname>Cambillau</surname> <given-names>C.</given-names></name> <name><surname>Raoult</surname> <given-names>D.</given-names></name> <name><surname>Suzan-monti</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>Revised <italic>Mimivirus</italic> major capsid protein sequence reveals intron-containing gene structure and extra domain.</article-title> <source><italic>BMC Mol. Biol.</italic></source> <volume>10</volume>:<issue>39</issue>. <pub-id pub-id-type="doi">10.1186/1471-2199-10-39</pub-id> <pub-id pub-id-type="pmid">19432951</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bajrai</surname> <given-names>L. H.</given-names></name> <name><surname>Benamar</surname> <given-names>S.</given-names></name> <name><surname>Azhar</surname> <given-names>E. I.</given-names></name> <name><surname>Robert</surname> <given-names>C.</given-names></name> <name><surname>Levasseur</surname> <given-names>A.</given-names></name> <name><surname>Raoult</surname> <given-names>D.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Kaumoebavirus, a new virus that clusters with <italic>Faustoviruses</italic> and <italic>Asfarviridae</italic>.</article-title> <source><italic>Viruses</italic></source> <volume>8</volume>:<issue>278</issue>. <pub-id pub-id-type="doi">10.3390/v8110278</pub-id> <pub-id pub-id-type="pmid">27801826</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benamar</surname> <given-names>S.</given-names></name> <name><surname>Reteno</surname> <given-names>D. G. I.</given-names></name> <name><surname>Bandaly</surname> <given-names>V.</given-names></name> <name><surname>Labas</surname> <given-names>N.</given-names></name> <name><surname>Raoult</surname> <given-names>D.</given-names></name> <name><surname>La Scola</surname> <given-names>B.</given-names></name></person-group> (<year>2016</year>). <article-title><italic>Faustoviruses</italic>: comparative genomics of new megavirales family members.</article-title> <source><italic>Front. Microbiol.</italic></source> <volume>7</volume>:<issue>3</issue>. <pub-id pub-id-type="doi">10.3389/fmicb.2016.00003</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Berget</surname> <given-names>S. M.</given-names></name> <name><surname>Moore</surname> <given-names>C.</given-names></name> <name><surname>Sharp</surname> <given-names>P. A.</given-names></name></person-group> (<year>1977</year>). <article-title>Spliced segments at the 5&#x2032; terminus of adenovirus 2 late mRNA.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>74</volume> <fpage>3171</fpage>&#x2013;<lpage>3175</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.74.8.3171</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blanc</surname> <given-names>G.</given-names></name> <name><surname>Mozar</surname> <given-names>M.</given-names></name> <name><surname>Agarkova</surname> <given-names>I. V.</given-names></name> <name><surname>Gurnon</surname> <given-names>J. R.</given-names></name> <name><surname>Yanai-Balser</surname> <given-names>G.</given-names></name> <name><surname>Rowe</surname> <given-names>J. M.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Deep RNA sequencing reveals hidden features and dynamics of early gene transcription in <italic>Paramecium bursaria Chlorella</italic> virus 1.</article-title> <source><italic>PLoS One</italic></source> <volume>9</volume>:<issue>e90989</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0090989</pub-id> <pub-id pub-id-type="pmid">24608750</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boratto</surname> <given-names>P. V. M.</given-names></name> <name><surname>Dornas</surname> <given-names>F. P.</given-names></name> <name><surname>da Silva</surname> <given-names>L. C. F.</given-names></name> <name><surname>Rodrigues</surname> <given-names>R. A. L.</given-names></name> <name><surname>Oliveira</surname> <given-names>G. P.</given-names></name> <name><surname>Cortines</surname> <given-names>J. R.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Analyses of the kroon virus major capsid gene and its transcript highlight a distinct pattern of gene evolution and splicing among mimiviruses.</article-title> <source><italic>J. Virol.</italic></source> <volume>92</volume>:<issue>e01782-17</issue>. <pub-id pub-id-type="doi">10.1128/JVI.01782-17</pub-id> <pub-id pub-id-type="pmid">29118120</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cherif Louazani</surname> <given-names>A.</given-names></name> <name><surname>Andreani</surname> <given-names>J.</given-names></name> <name><surname>Ouarhache</surname> <given-names>M.</given-names></name> <name><surname>Aherfi</surname> <given-names>S.</given-names></name> <name><surname>Baptiste</surname> <given-names>E.</given-names></name> <name><surname>Levasseur</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Genome sequences of new <italic>Faustovirus</italic> strains st1 and lc9, isolated from the South of France.</article-title> <source><italic>Genome Announc.</italic></source> <volume>5</volume>:<issue>e00613-17</issue>. <pub-id pub-id-type="doi">10.1128/genomeA.00613-17</pub-id> <pub-id pub-id-type="pmid">28705976</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chow</surname> <given-names>L. T.</given-names></name> <name><surname>Gelinas</surname> <given-names>R. E.</given-names></name> <name><surname>Broker</surname> <given-names>T. R.</given-names></name> <name><surname>Roberts</surname> <given-names>R. J.</given-names></name></person-group> (<year>1977</year>). <article-title>An amazing sequence arrangement at the 5&#x2032; ends of adenovirus 2 messenger RNA.</article-title> <source><italic>Cell</italic></source> <volume>12</volume> <fpage>1</fpage>&#x2013;<lpage>8</lpage>. <pub-id pub-id-type="doi">10.1016/0092-8674(77)90180-5</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deeg</surname> <given-names>C. M.</given-names></name> <name><surname>Chow</surname> <given-names>C.-E. T.</given-names></name> <name><surname>Suttle</surname> <given-names>C. A.</given-names></name></person-group> (<year>2018</year>). <article-title>The kinetoplastid-infecting <italic>Bodo saltans</italic> virus (BsV), a window into the most abundant giant viruses in the sea.</article-title> <source><italic>eLife</italic></source> <volume>7</volume>:<issue>e33014</issue>. <pub-id pub-id-type="doi">10.7554/eLife.33014</pub-id> <pub-id pub-id-type="pmid">29582753</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dixon</surname> <given-names>L. K.</given-names></name> <name><surname>Chapman</surname> <given-names>D. A. G.</given-names></name> <name><surname>Netherton</surname> <given-names>C. L.</given-names></name> <name><surname>Upton</surname> <given-names>C.</given-names></name></person-group> (<year>2013</year>). <article-title>African swine fever virus replication and genomics.</article-title> <source><italic>Virus Res.</italic></source> <volume>173</volume> <fpage>3</fpage>&#x2013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1016/j.virusres.2012.10.020</pub-id> <pub-id pub-id-type="pmid">23142553</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gammon</surname> <given-names>D. B.</given-names></name> <name><surname>Gowrishankar</surname> <given-names>B.</given-names></name> <name><surname>Duraffour</surname> <given-names>S.</given-names></name> <name><surname>Andrei</surname> <given-names>G.</given-names></name> <name><surname>Upton</surname> <given-names>C.</given-names></name> <name><surname>Evans</surname> <given-names>D. H.</given-names></name></person-group> (<year>2010</year>). <article-title><italic>Vaccinia virus</italic>-encoded ribonucleotide reductase subunits are differentially required for replication and pathogenesis.</article-title> <source><italic>PLoS Pathog.</italic></source> <volume>6</volume>:<issue>e1000984</issue>. <pub-id pub-id-type="doi">10.1371/journal.ppat.1000984</pub-id> <pub-id pub-id-type="pmid">20628573</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Greub</surname> <given-names>G.</given-names></name> <name><surname>Raoult</surname> <given-names>D.</given-names></name></person-group> (<year>2004</year>). <article-title>Microorganisms resistant to free-living amoebae.</article-title> <source><italic>Clin. Microbiol. Rev.</italic></source> <volume>17</volume> <fpage>413</fpage>&#x2013;<lpage>433</lpage>. <pub-id pub-id-type="doi">10.1128/CMR.17.2.413-433.2004</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Herbert</surname> <given-names>M. H.</given-names></name> <name><surname>Squire</surname> <given-names>C. J.</given-names></name> <name><surname>Mercer</surname> <given-names>A. A.</given-names></name></person-group> (<year>2015</year>). <article-title>Poxviral ankyrin proteins.</article-title> <source><italic>Viruses</italic></source> <volume>7</volume> <fpage>709</fpage>&#x2013;<lpage>738</lpage>. <pub-id pub-id-type="doi">10.3390/v7020709</pub-id> <pub-id pub-id-type="pmid">25690795</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Le Hir</surname> <given-names>H.</given-names></name> <name><surname>Nott</surname> <given-names>A.</given-names></name> <name><surname>Moore</surname> <given-names>M. J.</given-names></name></person-group> (<year>2003</year>). <article-title>How introns influence and enhance eukaryotic gene expression.</article-title> <source><italic>Trends Biochem. Sci.</italic></source> <volume>28</volume> <fpage>215</fpage>&#x2013;<lpage>220</lpage>. <pub-id pub-id-type="doi">10.1016/S0968-0004(03)00052-5</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Iyer</surname> <given-names>L. M.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name></person-group> (<year>2001</year>). <article-title>Common origin of four diverse families of large eukaryotic DNA viruses.</article-title> <source><italic>J. Virol.</italic></source> <volume>75</volume> <fpage>11720</fpage>&#x2013;<lpage>11734</lpage>. <pub-id pub-id-type="doi">10.1128/JVI.75.23.11720-11734.2001</pub-id> <pub-id pub-id-type="pmid">11689653</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Iyer</surname> <given-names>L. M.</given-names></name> <name><surname>Balaji</surname> <given-names>S.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name> <name><surname>Aravind</surname> <given-names>L.</given-names></name></person-group> (<year>2006</year>). <article-title>Evolutionary genomics of nucleo-cytoplasmic large DNA viruses.</article-title> <source><italic>Virus Res.</italic></source> <volume>117</volume> <fpage>156</fpage>&#x2013;<lpage>184</lpage>. <pub-id pub-id-type="doi">10.1016/j.virusres.2006.01.009</pub-id> <pub-id pub-id-type="pmid">16494962</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kelemen</surname> <given-names>O.</given-names></name> <name><surname>Convertini</surname> <given-names>P.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name> <name><surname>Wen</surname> <given-names>Y.</given-names></name> <name><surname>Shen</surname> <given-names>M.</given-names></name> <name><surname>Falaleeva</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title>Function of alternative splicing.</article-title> <source><italic>Gene</italic></source> <volume>514</volume> <fpage>1</fpage>&#x2013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1016/j.gene.2012.07.083</pub-id> <pub-id pub-id-type="pmid">22909801</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname> <given-names>D.</given-names></name> <name><surname>Langmead</surname> <given-names>B.</given-names></name> <name><surname>Salzberg</surname> <given-names>S. L.</given-names></name></person-group> (<year>2015</year>). <article-title>HISAT: a fast spliced aligner with low memory requirements.</article-title> <source><italic>Nat. Methods</italic></source> <volume>12</volume> <fpage>357</fpage>&#x2013;<lpage>360</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.3317</pub-id> <pub-id pub-id-type="pmid">25751142</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klasberg</surname> <given-names>S.</given-names></name> <name><surname>Bitard-Feildel</surname> <given-names>T.</given-names></name> <name><surname>Mallet</surname> <given-names>L.</given-names></name></person-group> (<year>2016</year>). <article-title>Computational identification of novel genes: current and future perspectives.</article-title> <source><italic>Bioinform. Biol. Insights</italic></source> <volume>10</volume> <fpage>121</fpage>&#x2013;<lpage>131</lpage>. <pub-id pub-id-type="doi">10.4137/BBI.S39950</pub-id> <pub-id pub-id-type="pmid">27493475</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klose</surname> <given-names>T.</given-names></name> <name><surname>Reteno</surname> <given-names>D. G.</given-names></name> <name><surname>Benamar</surname> <given-names>S.</given-names></name> <name><surname>Hollerbach</surname> <given-names>A.</given-names></name> <name><surname>Colson</surname> <given-names>P.</given-names></name> <name><surname>La Scola</surname> <given-names>B.</given-names></name><etal/></person-group> (<year>2016</year>). <article-title>Structure of <italic>Faustovirus</italic>, a large dsDNA virus.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>113</volume> <fpage>6206</fpage>&#x2013;<lpage>6211</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1523999113</pub-id> <pub-id pub-id-type="pmid">27185929</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Legendre</surname> <given-names>M.</given-names></name> <name><surname>Audic</surname> <given-names>S.</given-names></name> <name><surname>Poirot</surname> <given-names>O.</given-names></name> <name><surname>Hingamp</surname> <given-names>P.</given-names></name> <name><surname>Seltzer</surname> <given-names>V.</given-names></name> <name><surname>Byrne</surname> <given-names>D.</given-names></name><etal/></person-group> (<year>2010</year>). <article-title>mRNA deep sequencing reveals 75 new genes and a complex transcriptional landscape in <italic>Mimivirus</italic>.</article-title> <source><italic>Genome Res.</italic></source> <volume>20</volume> <fpage>664</fpage>&#x2013;<lpage>674</lpage>. <pub-id pub-id-type="doi">10.1101/gr.102582.109</pub-id> <pub-id pub-id-type="pmid">20360389</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Legendre</surname> <given-names>M.</given-names></name> <name><surname>Bartoli</surname> <given-names>J.</given-names></name> <name><surname>Shmakova</surname> <given-names>L.</given-names></name> <name><surname>Jeudy</surname> <given-names>S.</given-names></name> <name><surname>Labadie</surname> <given-names>K.</given-names></name> <name><surname>Adrait</surname> <given-names>A.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Thirty-thousand-year-old distant relative of giant icosahedral DNA viruses with a <italic>pandoravirus</italic> morphology.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>111</volume> <fpage>4274</fpage>&#x2013;<lpage>4279</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1320670111</pub-id> <pub-id pub-id-type="pmid">24591590</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Legendre</surname> <given-names>M.</given-names></name> <name><surname>Lartigue</surname> <given-names>A.</given-names></name> <name><surname>Bertaux</surname> <given-names>L.</given-names></name> <name><surname>Jeudy</surname> <given-names>S.</given-names></name> <name><surname>Bartoli</surname> <given-names>J.</given-names></name> <name><surname>Lescot</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>In-depth study of <italic>Mollivirus sibericum</italic>, a new 30,000-y-old giant virus infecting <italic>Acanthamoeba</italic>.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>112</volume> <fpage>E5327</fpage>&#x2013;<lpage>E5335</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1510795112</pub-id> <pub-id pub-id-type="pmid">26351664</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Legendre</surname> <given-names>M.</given-names></name> <name><surname>Santini</surname> <given-names>S.</given-names></name> <name><surname>Rico</surname> <given-names>A.</given-names></name> <name><surname>Abergel</surname> <given-names>C.</given-names></name> <name><surname>Claverie</surname> <given-names>J.-M.</given-names></name></person-group> (<year>2011</year>). <article-title>Breaking the 1000-gene barrier for <italic>Mimivirus</italic> using ultra-deep genome and transcriptome sequencing.</article-title> <source><italic>Virol. J.</italic></source> <volume>8</volume>:<issue>99</issue>. <pub-id pub-id-type="doi">10.1186/1743-422X-8-99</pub-id> <pub-id pub-id-type="pmid">21375749</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname> <given-names>W.</given-names></name> <name><surname>Cowley</surname> <given-names>A.</given-names></name> <name><surname>Uludag</surname> <given-names>M.</given-names></name> <name><surname>Gur</surname> <given-names>T.</given-names></name> <name><surname>McWilliam</surname> <given-names>H.</given-names></name> <name><surname>Squizzato</surname> <given-names>S.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>The EMBL-EBI bioinformatics web and programmatic tools framework.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>43</volume> <fpage>W580</fpage>&#x2013;<lpage>W584</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv279</pub-id> <pub-id pub-id-type="pmid">25845596</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Philippe</surname> <given-names>N.</given-names></name> <name><surname>Legendre</surname> <given-names>M.</given-names></name> <name><surname>Doutre</surname> <given-names>G.</given-names></name> <name><surname>Cout&#x00E9;</surname> <given-names>Y.</given-names></name> <name><surname>Poirot</surname> <given-names>O.</given-names></name> <name><surname>Lescot</surname> <given-names>M.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title><italic>Pandoraviruses</italic>: amoeba viruses with genomes up to 2.5 Mb reaching that of parasitic eukaryotes.</article-title> <source><italic>Science</italic></source> <volume>341</volume> <fpage>281</fpage>&#x2013;<lpage>286</lpage>. <pub-id pub-id-type="doi">10.1126/science.1239181</pub-id> <pub-id pub-id-type="pmid">23869018</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Reteno</surname> <given-names>D. G.</given-names></name> <name><surname>Benamar</surname> <given-names>S.</given-names></name> <name><surname>Bou Khalil</surname> <given-names>J.</given-names></name> <name><surname>Andreani</surname> <given-names>J.</given-names></name> <name><surname>Armstrong</surname> <given-names>N.</given-names></name> <name><surname>Klose</surname> <given-names>T.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title><italic>Faustovirus</italic>, an asfarvirus-related new lineage of giant viruses infecting amoebae.</article-title> <source><italic>J. Virol.</italic></source> <volume>89</volume> <fpage>6585</fpage>&#x2013;<lpage>6594</lpage>. <pub-id pub-id-type="doi">10.1128/JVI.00115-15</pub-id> <pub-id pub-id-type="pmid">25878099</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rodr&#x00ED;guez</surname> <given-names>J. M.</given-names></name> <name><surname>Salas</surname> <given-names>M. L.</given-names></name></person-group> (<year>2013</year>). <article-title>African swine fever virus transcription.</article-title> <source><italic>Virus Res.</italic></source> <volume>173</volume> <fpage>15</fpage>&#x2013;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.1016/j.virusres.2012.09.014</pub-id> <pub-id pub-id-type="pmid">23041356</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schmieder</surname> <given-names>R.</given-names></name> <name><surname>Edwards</surname> <given-names>R.</given-names></name></person-group> (<year>2011</year>). <article-title>Quality control and preprocessing of metagenomic datasets.</article-title> <source><italic>Bioinformatics</italic></source> <volume>27</volume> <fpage>863</fpage>&#x2013;<lpage>864</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btr026</pub-id> <pub-id pub-id-type="pmid">21278185</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Su&#x00E1;rez</surname> <given-names>C.</given-names></name> <name><surname>Salas</surname> <given-names>M. L.</given-names></name> <name><surname>Rodr&#x00ED;guez</surname> <given-names>J. M.</given-names></name></person-group> (<year>2010</year>). <article-title>African swine fever virus polyprotein pp62 is essential for viral core development.</article-title> <source><italic>J. Virol.</italic></source> <volume>84</volume> <fpage>176</fpage>&#x2013;<lpage>187</lpage>. <pub-id pub-id-type="doi">10.1128/JVI.01858-09</pub-id> <pub-id pub-id-type="pmid">19846532</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sun</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>McCullough</surname> <given-names>A. K.</given-names></name> <name><surname>Wood</surname> <given-names>T. G.</given-names></name> <name><surname>Lloyd</surname> <given-names>R. S.</given-names></name> <name><surname>Adams</surname> <given-names>B.</given-names></name><etal/></person-group> (<year>2000</year>). <article-title>Intron conservation in a UV-specific DNA repair gene encoded by <italic>Chlorella</italic> viruses.</article-title> <source><italic>J. Mol. Evol.</italic></source> <volume>50</volume> <fpage>82</fpage>&#x2013;<lpage>92</lpage>. <pub-id pub-id-type="doi">10.1007/s002399910009</pub-id> <pub-id pub-id-type="pmid">10654262</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Temmam</surname> <given-names>S.</given-names></name> <name><surname>Monteil-Bouchard</surname> <given-names>S.</given-names></name> <name><surname>Sambou</surname> <given-names>M.</given-names></name> <name><surname>Aubadie-Ladrix</surname> <given-names>M.</given-names></name> <name><surname>Azza</surname> <given-names>S.</given-names></name> <name><surname>Decloquement</surname> <given-names>P.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Faustovirus-like asfarvirus in hematophagous biting midges and their vertebrate hosts.</article-title> <source><italic>Front. Microbiol.</italic></source> <volume>6</volume>:<issue>1406</issue>. <pub-id pub-id-type="doi">10.3389/fmicb.2015.01406</pub-id> <pub-id pub-id-type="pmid">26733117</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thorvaldsdottir</surname> <given-names>H.</given-names></name> <name><surname>Robinson</surname> <given-names>J. T.</given-names></name> <name><surname>Mesirov</surname> <given-names>J. P.</given-names></name></person-group> (<year>2013</year>). <article-title>Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration.</article-title> <source><italic>Brief. Bioinform.</italic></source> <volume>14</volume> <fpage>178</fpage>&#x2013;<lpage>192</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbs017</pub-id> <pub-id pub-id-type="pmid">22517427</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trapnell</surname> <given-names>C.</given-names></name> <name><surname>Williams</surname> <given-names>B. A.</given-names></name> <name><surname>Pertea</surname> <given-names>G.</given-names></name> <name><surname>Mortazavi</surname> <given-names>A.</given-names></name> <name><surname>Kwan</surname> <given-names>G.</given-names></name> <name><surname>van Baren</surname> <given-names>M. J.</given-names></name><etal/></person-group> (<year>2010</year>). <article-title>Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.</article-title> <source><italic>Nat. Biotechnol.</italic></source> <volume>28</volume> <fpage>511</fpage>&#x2013;<lpage>518</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.1621</pub-id> <pub-id pub-id-type="pmid">20436464</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname> <given-names>Z.</given-names></name> <name><surname>Gerstein</surname> <given-names>M.</given-names></name> <name><surname>Snyder</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>RNA-Seq: a revolutionary tool for transcriptomics.</article-title> <source><italic>Nat. Rev. Genet.</italic></source> <volume>10</volume> <fpage>57</fpage>&#x2013;<lpage>63</lpage>. <pub-id pub-id-type="doi">10.1038/nrg2484</pub-id> <pub-id pub-id-type="pmid">19015660</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yoosuf</surname> <given-names>N.</given-names></name> <name><surname>Yutin</surname> <given-names>N.</given-names></name> <name><surname>Colson</surname> <given-names>P.</given-names></name> <name><surname>Shabalina</surname> <given-names>S. A.</given-names></name> <name><surname>Pagnier</surname> <given-names>I.</given-names></name> <name><surname>Robert</surname> <given-names>C.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>Related giant viruses in distant locations and different habitats: <italic>Acanthamoeba polyphaga</italic> moumouvirus represents a third lineage of the Mimiviridae that is close to the megavirus lineage.</article-title> <source><italic>Genome Biol. Evol.</italic></source> <volume>4</volume> <fpage>1324</fpage>&#x2013;<lpage>1330</lpage>. <pub-id pub-id-type="doi">10.1093/gbe/evs109</pub-id> <pub-id pub-id-type="pmid">23221609</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yutin</surname> <given-names>N.</given-names></name> <name><surname>Wolf</surname> <given-names>Y. I.</given-names></name> <name><surname>Koonin</surname> <given-names>E. V.</given-names></name></person-group> (<year>2014</year>). <article-title>Origin of giant viruses from smaller DNA viruses not from a fourth domain of cellular life.</article-title> <source><italic>Virology</italic></source> <volume>466&#x2013;467</volume> <fpage>38</fpage>&#x2013;<lpage>52</lpage>. <pub-id pub-id-type="doi">10.1016/j.virol.2014.06.032</pub-id> <pub-id pub-id-type="pmid">25042053</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Adams</surname> <given-names>B.</given-names></name> <name><surname>Sun</surname> <given-names>L.</given-names></name> <name><surname>Burbank</surname> <given-names>D. E.</given-names></name> <name><surname>Van Etten</surname> <given-names>J. L.</given-names></name></person-group> (<year>2001</year>). <article-title>Intron conservation in the DNA polymerase gene encoded by <italic>Chlorella</italic> viruses.</article-title> <source><italic>Virology</italic></source> <volume>285</volume> <fpage>313</fpage>&#x2013;<lpage>321</lpage>. <pub-id pub-id-type="doi">10.1006/viro.2001.0935</pub-id> <pub-id pub-id-type="pmid">11437665</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn01"><label>1</label><p><ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nih.gov/pub/wolf/COGs/NCVOG/">ftp://ftp.ncbi.nih.gov/pub/wolf/COGs/NCVOG/</ext-link></p></fn>
</fn-group>
</back>
</article>