Evolution and Phylogeny of Large DNA Viruses, Mimiviridae and Phycodnaviridae Including Newly Characterized Heterosigma akashiwo Virus

Nucleocytoplasmic DNA viruses are a large group of viruses that harbor double-stranded DNA genomes with sizes of several 100 kbp, challenging the traditional concept of viruses as small, simple ‘organisms at the edge of life.’ The most intriguing questions about them may be their origin and evolution, which have yielded the variety we see today. Specifically, the phyletic relationship between two giant dsDNA virus families that are presumed to be close, Mimiviridae, which infect Acanthamoeba, and Phycodnaviridae, which infect algae, is still obscure and needs to be clarified by in-depth analysis. Here, we studied Mimiviridae–Phycodnaviridae phylogeny including the newly identified Heterosigma akashiwo virus strain HaV53. Gene-to-gene comparison of HaV53 with other giant dsDNA viruses showed that only a small proportion of HaV53 genes show similarities with the others, revealing its uniqueness among Phycodnaviridae. Phylogenetic/genomic analysis of Phycodnaviridae including HaV53 revealed that the family can be classified into four distinctive subfamilies, namely, Megaviridae (Mimivirus-like), Chlorovirus-type, and Coccolitho/Phaeovirus-type groups, and HaV53 independent of the other three groups. Several orthologs found in specific subfamilies while absent from the others were identified, providing potential family marker genes. Finally, reconstruction of the evolutionary history of Phycodnaviridae and Mimiviridae revealed that these viruses are descended from a common ancestor with a small set of genes and reached their current diversity by differentially acquiring gene sets during the course of evolution. Our study illustrates the phylogeny and evolution of Mimiviridae–Phycodnaviridae and proposes classifications that better represent phyletic relationships among the family members.

Nucleocytoplasmic large DNA viruses infecting marine algae as natural hosts, including HaV53, have been collectively classified as a family, Phycodnaviridae, with prefix 'phyco-' meaning algae. The term 'algae' is rather broad and includes both multi-and unicellular, brown and green, aquatic photosynthetic organisms. Presumably reflecting the diversity of host species, several past studies have suggested that Phycodnaviridae consists of groups of viruses with different features. Notably, while classified as phycodnaviruses, Chrysochromulina ericina virus (CeV), PgV and AaV are similar in many respects to Mimiviridae (Brussaard et al., 2004;Monier et al., 2008;Moniruzzaman et al., 2014), giant NCLDVs that infect Acanthamoeba (Raoult et al., 2004(Raoult et al., , 2007Claverie et al., 2006Claverie et al., , 2009Raoult and Forterre, 2008), rather than to other phycodnaviruses. Several studies strongly suggest that these three viruses are closely related to Mimivirus (Yutin et al., 2009(Yutin et al., , 2013(Yutin et al., , 2014Fischer et al., 2010;Thomas et al., 2011;Santini et al., 2013;Moniruzzaman et al., 2014;Legendre et al., 2015). The terms 'extended Mimivirus' and 'Megaviridae' have been used in several studies to encompass the Mimivirus lineages, smaller Mimiviruses (i.e., Cafeteria roenbergensis virus, CroV), and phycodnaviruses that share characteristic features with Mimiviruses, although 'Megaviridae' is yet to be adopted by the International Committee for Taxonomy of Viruses (ICTV) as a family classification. Collectively, the current family Phycodnaviridae, officially adopted by ICTV, includes viruses that may not necessarily be closely related evolutionarily or phylogenetically.
While the origin and evolutionary history of NCLDVs in general are of great interest, the diversity of NCLDVs imposes difficulties in collectively evaluating their phylogenetic relationships. Attempts to classify NCLDVs and to infer their evolutionary history have involved comparisons of their sequences and genomic compositions. Orthologous genes in NCLDVs can be identified by direct comparisons of viral genes, or using the established Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) database (Yutin et al., 2009). In this study, we analyzed the phylogenetic relatedness of HaV53 to members of the Phycodnaviridae and Mimiviridae. Our results underscore the validity of demands for the reclassification of the current Phycodnaviridae family, in addition to providing insights into the evolution of Mimiviridae and Phycodnaviridae including HaV53. For NCLDV CP (NCVOG0022), D5-like helicase primase (NCVOG0023), and DNA polymerase B (NCVOG0038) phylogenetic analyses (Supplementary Figure S2), the orthologs were determined by choosing best-hit target sequences obtained by BLASTP search (E-value < 10 −20 ) using the NCVOG orthologs as queries, and the databases were created from the amino acid sequences coded by the genomes of above mentioned viruses and of Ambystoma tigrinum virus Software BLAST+ (version 2.2.31) executables were downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/. Databases for BLASTP and PSI-BLAST searches were constructed according to the provided manual. For phyletic studies, 18 viruses, namely AaV, AtCV, BpV, CeV, CroV, EhV86, EsV, FsV, HaV53, MegaV, MimiV, MoumouV, MpV1, OlV5, OtV1, OtV5, PBCV1, and PgV were selected. Proteins equal to or larger than 100 aa encoded by each virus were extracted and used as queries. When a single open reading frames (ORF) hit multiple target sequences in databases, the hit with the highest bit score was selected for further study. Similarly, when multiple ORFs in a viral genome hit the same target sequence in NCVOG, the ORF that hit with the highest bit score was selected for further study to identify a true ortholog rather than paralogs.

Sequence Information and Database
Multiple sequence alignments and phylogenetic reconstructions by neighbor-joining were performed in ClustalX version 2.1 (Larkin et al., 2007). Poorly conserved regions and positions including gaps were removed prior to phylogenetic analysis. Neighbor-joining phylogenetic inferences were conducted, and the confidence of the branching was assessed using 1,000 bootstrap resampling replicates of the analyzed dataset.
Pan-genome analysis was conducted using PGAP software using cut-off values of 20% identity and E-value < 10 −5 (Zhao et al., 2012). In this analysis, orthologs in each virus in the dataset were determined by all-to-all BLASTP search followed by MCL, and phyletic inference calculated by neighbor-joining based on the presence/absence matrix of the orthologs in each combination of the viruses (Zhao et al., 2012).
Gain and loss of gene families during evolution was mapped on a guide tree based on the concatenated sequence of nine preserved genes ( Figure 4A) using COUNT software (Csuros, 2010;Kamneva and Ward, 2014). For each gene family, Wagner parsimony with gene gain penalties of 1 and 5 were used to infer the most parsimonious ancestral gene sets with different gain/loss pressures. We chose Wagner parsimony, rather than other protocols, because it allows multiple gains with penalties and infers gene family expansion and contraction (Csuros, 2010). For both PGAP and COUNT analyses, we selected genes coding for proteins with 100 aa or more. The resulting trees from all the analyses were visualized using Geneious 9.0.5.

HaV53 Genes and NCLDV Orthologs
Recently, we completed sequencing of the HaV53 genome (GenBank accession number KX008963, (Ogura et al., 2016). To gain further insight into the potential functions of HaV53 gene products, we predicted the functions of HaV53 ORF using the NCVOG database (Yutin et al., 2009(Yutin et al., , 2014 and the NCBI NR protein database. HaV53 genes were annotated by searching the databases using BLASTP with E-value cutoff set to 10 −5 (Supplementary Figure S1; Supplementary  Table S1). All the search results were further supported by PSI-BLAST results (E-value < 10 −8 , three iterations), confirming homology to the target sequences. Seventy-four HaV53 ORFs exhibited similarities to NCVOGs, while for 27, the best-hit proteins were from bacterial, archaeal, and eukaryotic genomes (Supplementary Table S1; Figure 1). The remaining HaV53 ORFs did not exhibit significant similarity to any protein in the databases (Figure 1). As expected, HaV53 possesses orthologs of four previously sequenced HaV01 genes, AGB-1, UKCH-2, NCLDV major capsid protein, and B-family DNA polymerase, with high sequence similarities (Supplementary Table S1). Like other NCLDVs, multiple occurrences of genes annotated with identical functions were also observed within the subset of HaV53 ORFs. To evaluate if these genes were originated from FIGURE 1 | Phyletic distribution of HaV53 gene homologs. The best-hit homolog in the NCBI NR database to the each HaV53 open reading frames (ORF) was determined by BLASTP (E-value < 10 −5 ), and source organisms were identified. insertion of multiple homologous genes from different source organisms, or due to duplication of a gene acquired from single horizontal gene transfer, we evaluated the homologies among the HaV53 ORFs (paralogs) and compared their homologies to their potential orthologs found in NR database by BLASTP search (Table 1). When homologies among the HaV53 paralogs and their orthologs in other organisms were compared, identities among HaV53 paralogs were much higher than identities to orthologs, presumably suggesting that these redundancies were based on recent gene duplication rather than horizontal gene transfer from the species with the closest orthologs (Santini et al., 2013).
To certify that HaV53 is indeed a phycodnavirus, we conducted phylogenetic analysis of DNA polymerase B, capsid protein, and D5-like helicase primase, and found that HaV53 genes cluster with their orthologs from other phycodnaviruses, confirming that HaV53 is a new member of the family (Supplementary Figure S2).

Similarity of HaV53 to Other NCLDVs
To further evaluate the relatedness of HaV53 and other NCLDVs, we first conducted a blanket comparison of all the HaV53 ORFs with NCLDV genes. To this end, we constructed an NCLDV protein sequence database consisting of all the proteins encoded by representative, fully sequenced and annotated NCLDVs, including Megaviridae, Phycodnaviridae, Marseilleviridae, Ascoviridae, Asfarviridae, Baculoviridae, Herpesviridae, Iridoviridae, Poxviridae, Pandraviruses, and Pithovirus. First, we identified the NCLDV orthologs of each gene in HaV53, and identified the source viral species of the best-hit target genes (Figure 2). For comparison, the same analyses were conducted for genes carried by 17 members of Phycodnaviridae and proposed Megaviridae (Figure 2). Each virus showed a characteristic pattern in the distribution of best-hit sources. As expected, three lineages of mimiviruses, MimiV (lineage A), MoumouV (lineage B), and MegaV (lineage C), were found to be significantly related by this analysis; about 88, 92, and 92%, of MimiV, MoumouV, and MegaV genes, respectively, were most homologous to the other two mimivirus lineages. Among the proposed Megaviridae with smaller genome sizes, AaV and CroV exhibited similarities to both mimiviruses and smaller Megaviridae, while PgV and CeV showed significant relatedness to each other. AaV, PgV, and CeV are currently classified as Phycodnaviridae, Redundant genes, or paralogs, in HaV53, presumably products of gene duplications, were identified by all-to-all BLASTP using HaV53 ORFs as query and database.
FIGURE 2 | Source viral species of the best-matching nucleocytoplasmic large DNA viruses (NCLDV) orthologs for genes from HaV53, Megaviridae, and Phycodnaviridae. The best-hit homologs in the NCLDV database, to viral ORFs were determined by BLASTP (E-value < 10 −5 ), and source NCLDVs were identified.
Frontiers in Microbiology | www.frontiersin.org although they did not show strong similarities to the other members of the family. A group including chloroviruses, OtV1/5, OlV5, MpV, and BpV showed large proportions of orthologs identified from the group. EhV did not show significant similarities to any NCLDV, and contained the smallest proportion of genes (16.5%) showing similarities to NCLDVs. As expected, two phaeoviruses, EsV and FsV, showed significant similarities to each other. HaV53 genes showed a similar degree of similarity to both Megaviridae and Phycodnaviridae minus Megaviridae, with 14.0 and 12.8%, respectively. These results indicate that members of Phycodnaviridae exhibit homologies to particular group of other family members, or showed low homologies to others. These observation implies that Phycodnaviridae comprise several cluster of the members, which are not necessarily homologous to each other. We next explored the presence/absence of all NCVOGs in the 18 viruses. We searched for NCVOG homologs for each viral ORF with size 100 aa or larger using BLASTP (E-value < 10 −5 ), then identified the target sequence that gave the highest bit score (Figure 3). When more than one viral ORF hit the same NCVOG, the ORF that gave the highest bit score was identified as the NCVOG ortholog. Nine NCVOGs were found in all the analyzed viruses including HaV53 FIGURE 3 | Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOG) orthologs in Phycodnaviridae and Mimiviridae. The NCVOG orthologs in each virus were identified by BLASTP search (E-value < 10 −5 ), and similarities between the query viral factors and NCVOGs are displayed. When several different viral ORFs hit an NCVOG, the viral factor with the highest bit score was chosen. (Figure 3): NCLDV major capsid protein (NCVOG0022), D5-like helicase primase (NCVOG0023), DNA polymerase B (NCVOG0038), an uncharacterized protein (NCVOG0158), proliferating cell nuclear antigen protein (NCVOG0241), A32-type packaging ATPase (NCVOG0249), poxvirus late transcription factor VLTF3-like protein (NCVOG0262), A1L transcription factor/late transcription factor VLTF2 (NCVOG1164), and a protein with uncharacterized C-terminal domain conserved in iridovirus, phycodnavirus, and mimivirus (NCVOG1423). The homologies among the orthologs were further confirmed by reciprocal BLASTP searches. For three orthologs, NCVOG0038, NCVOG0249, and NCVOG0262, all combinations identified from the eighteen viruses showed significant similarities to each other (E-value < 10 −5 ), and all combinations for NCVOG1423 showed significant similarities except CeV and EhV. The other five orthologs showed similarities for some of the analyzed combinations ( Supplementary Figures S3A-E Figure S1C), suggesting that NCVOG0023 may not be suitable to be used as a hallmark gene for classification.

Phylogenetic/Phylogenomic Positions of HaV53 among Phycodnaviridae and Megaviridae Members
Based on the set of common orthologs identified above, we next analyzed phyletic relationships between the 18 viruses ( Figure 4). As previously reported, three viruses currently classified as phycodnaviruses, AaV, PgV, and CeV, associate with mimivirus and CroV (Santini et al., 2013;Moniruzzaman et al., 2014), segregating from the remaining Phycodnaviridae and HaV53, with some ambiguity concerning the position of AaV (Figure 4A). The rest of Phycodnaviridae clustered into three distinct groups: one with EhV, EsV, and FsV; HaV53; and the seven remaining viruses including PBCV ( Figure 4A).
Further, we analyzed phyletic relations by pan-genome analysis ( Figure 4B). This approach allows us to evaluate phylogenomic relationships among the dataset by de novo clustering, and is thus independent of the NCVOG database. Other than AaV and HaV53 being clustered with low confidence, overall taxonomic/phylogenetic relationships were coherent with the results based on phylogenetic analysis (Figures 4A,B). Results of these two analyses revealed that current Phycodnaviridae can be categorized into four groups. The first group includes AtCV, PBCV, OlV5, OtV1, OtV5, MpV, and BpV. The second group consists of EhV, EsV, and FsV.
The third group includes Phycodnavidae members that belong to Megaviridae, with some ambiguity concerning the position of AaV. HaV53 appears to be independent of the other three groups, positioned between the EhV group and the Megaviridaephycodnaviruses.
As shown in Figure 3, we also identified several NCVOGs that characteristically exist in specific viral groups deduced from the phylogenetic/pan-genomic analysis (Figure 4). For example, five NCVOGs with unknown functions (NCVOG1111, 1220(NCVOG1111, , 1265(NCVOG1111, , 1318(NCVOG1111, , and 1329 were found in the PBCV group, but not in others. Also, four NCVOGs, unknown functions (NCVOG0877, 0881, 1017, and 5034) were found in EhV, EsV, and FsV exclusively, but not in other groups. In contrast, metallopeptidase WLM (NCVOG1120), transcription initiation factor IIB (NCVOG1127), and two uncharacterized proteins (NCVOG1131 and 1216) exist in all the analyzed viruses but EhV, EsV, and FsV. Some of the previously identified Megaviridae hallmark genes, DNA topoisomerase I (NCVOG0033), ATP-dependent protease (NCVOG0228), and DNA-directed RNA polymerase II subunit RPB7 (NCVOG2249), were found in the proposed Megaviridae including mimiviruses. Further, homologies among these orthologs were analyzed, and the orthologs that showed significant similarities between all combinations of species are summarized in Table 2.
One of the previously identified hallmark genes (Ogata et al., 2011), MutS7 (NCVOG0105), was not found in MoumouV by this analysis. However, MoumouV does possess MutS (YP_007354438.1), while it is categorized as NCVOG0199 by this method. The protein possesses MutS7 features including MutS domains II, III, and IV, and the orthologs in Megaviridae, including MoumouV NCVOG0199, exhibit significant similarities in all combinations, while being absent in other Phycodnaviridae ( Table 2; Supplementary Figure S4A). Furthermore, asparagine synthetase (NCVOG0061) and polyA polymerase (NCVOG0575) genes were not found in AaV (Supplementary Figures S4B,C). With consistently higher e-values for other Megaviridae hallmark genes, AaV could be defined as an outlier of Megaviridae. In addition, NCVOG0061 orthologs were also found in several PBCV-type Phycodnaviridae members, and these displayed sequence similarities with those in Megaviridae members, casting questions on its status as a hallmark gene (Moniruzzaman et al., 2014) Figure S4B).

(Supplementary
Finally, evolutionary scenarios of viruses were reconstructed using COUNT software (Csuros, 2010;Kamneva and Ward, 2014) with the core gene phylogenetic tree as the guide tree topology (Figure 5). By this program, the size of the common ancestor virus and subsequent evolutional process can be inferred assuming different pressures for acquisitions and loss of genes, using different gene gain/loss penalties. When lower penalty for gene gains that gives the ancestor with only sixty-one NCVOG genes was chosen, Mimiviridae members were presumed to go through major gene gain when diverged from the smaller Megaviridae to reach to contemporary genome sizes. On the other hand, when high penalty for gene gain that gives the  NCVOG specifically associated with the groups of viruses were identified ( Figure 5), and the genes that showed significant homologies by comparisons of all the combinations viruses in the group are listed.
common ancestor with 996 NCVOGs was adopted, massive gene reduction at the timing of the divergence of the ancestor of Phycodnaviridae minus Megaviridae group from the ancestor of Megaviridae was inferred. The nodes where the hallmark genes identified in Figure 3 emerged or were lost were also predicted by the analysis (Figure 5). The inferred gene gain/loss ratio and estimated numbers of gene gains/losses per COG varied widely between clades as reported previously by analyzing evolutionally processes of closely related mimiviruses and phycodnaviruses (Filee, 2015).

DISCUSSION
The present study reveals that HaV53, the first raphidovirus isolated and characterized (Nagasaki and Yamaguchi, 1997;Nagasaki et al., 2005), is a unique NCLDV in many respects. While ∼31% of HaV53 genes display homology to NCLDV proteins, ∼11% show homology to bacterial and eukaryotic proteins, and ∼58% show no homology to any proteins identified to date. That a significant percentage of HaV53 genes do not show homology with proteins in the databases is typical of viruses belonging to a newly characterized lineage with no other sequenced representatives. Gene duplications followed by mutations or genetic drift and horizontal gene transfer from host to virus are presumable sources of unique HaV53 genes. By a heuristic approach, several HaV53 ORFs were identified as potential results of gene duplication ( Table 1).
On the other hand, possibility of horizontal gene transfer cannot be investigated, at this point, because of the scarcity of H. akashiwo genome/transcriptome information obtained to date.
HaV53 possesses 74 genes exhibiting homology to NCVOGs (Supplementary Table S1), while its composition is unique relative to other members of Phycodnaviridae or proposed Megaviridae (Figures 1 and 2). For example, HaV53 does not possess the Megaviridae hallmark genes MuTS7, DNAdirected RNA polymerases, polyA polymerase, and DNA topoisomerase I (Ogata et al., 2011;Santini et al., 2013;Moniruzzaman et al., 2014), supporting the conclusion that HaV53 is not likely a member of the proposed family. Notably, HaV53_ORF179 exhibits significant homology to bacterial asparagine synthetase A with an aminoacyl-transfer RNA synthetase domain (WP_003512022.1). Some mimiviruses characteristically possess aspartyl/asparaginyl-tRNA synthetase (Yutin et al., 2014). In the case of HaV53_ORF179, however, the motif corresponding to the anti-codon binding domain is missing, suggesting that the protein may not exhibit FIGURE 5 | Inferred gene gain/loss patterns and NCLDV hallmark genes. Numbers of genes gained (green triangles) and lost (red triangles) inferred using COUNT implementing Wagner parsimony are indicated at each node. The former and the latter numbers, separated by slashes, indicated at the symbols are numbers of gained or lost genes inferred by the analysis with gain penalties 5 and 1, respectively. Magenta and yellow circles at the nodes indicate inferred NCVOGs numbers with gain penalties 5 and 1, respectively, and blue circles indicate numbers of genes of each analyzed virus. Sizes of the circles represent numbers of genes.
tRNA synthase activity. Several orthologs that are shared among members of the PBCV and EhV groups are not found in HaV53 (Figure 3). These data underscore the uniqueness of HaV53 among Phycodnaviridae and proposed Megaviridae.
Importantly, HaV53 does not possess DNA-directed RNA polymerase or polyA RNA polymerase, indicating that HaV53 depends on its host's transcription machinery. On the other hand, as observed in many different NCLDVs, HaV53 harbors several genes related to regulation of transcription, including transcription initiation factors, mRNA capping enzyme subunits, and ribonuclease III.
Among the viruses analyzed in this study, AaV, EhV, and HaV53 possess higher proportions of unique genes that are not homologous to other NCLDV genes (Figures 2  and 5). This may be the main reason for the less welldefined phyletic positions of these three viruses in the results of pan-genomic analysis ( Figure 4B). In particular, AaV has been characterized as a Megaviridae-type phycodnavirus (Moniruzzaman et al., 2014). However, NCVOG orthologs commonly found in Megaviridae-type phycodnaviruses exhibit low homology to the corresponding genes in AaV (Figure 4). In addition, polyA polymerase (Supplementary Figure S4C) and asparagine synthetase (Moniruzzaman et al., 2014) are missing exclusively in AaV. These observations and Figure 4 show that AaV may be a non-standard member, or rather, outlier of the Megaviridae. AaV and HaV53 clustered closely, though with low confidence, in our pan-genomic analysis. We also directly compared AaV and HaV53 genes by allto-all BLASTP, and consistent with the results presented in Figure 2, they did not exhibit particularly high homology to each other.
We observe a segregation of viruses currently considered to be phycodnaviruses into at least four groups. The proposed Megaviridae-phycodnavirus group segregates from the rest. In addition, the EhV group clearly segregates from other Phycodnaviridae, consistent with the argument of Allen et al. (2006a;2006b). HaV53 does not show a strong association with any of the three groups, and thus presumably represents a novel, independent group. Accordingly, we found several orthologs that specifically associate with each group of Phycodnaviridae ( Table 2). These group-specific genes can be utilized as hallmarks to classify Phycodnaviridae in the future.
Currently, there were two major scenarios for evolution of Giant dsDNA viruses; the 'reduction model' and the 'expansion model.' The 'reduction model' is based on the idea that the viruses presumably emerged from much more complex organisms with larger sizes of genome, and reached to current status by genome simplifications (Raoult et al., 2004;Martin et al., 2010;Boyer et al., 2011;Nasir and Caetano-Anollés, 2015). In the 'expansion model, ' the viruses are presumed to descend from common ancestor virus with much smaller genomes, and reaching to contemporary sizes and diversity by progressively acquiring genes (Yutin et al., 2014). Assuming distinctive gene gain/loss penalty scores to yield ancestor virus with distinctive NCVOG numbers, two evolutional paths resulting from the contrasting models were reproduced (Figure 5). When the 'reduction model' was reproduced with high gene gain penalty, the massive gene losses during the early stage of divergence of Phycodnaviridae minus Megaviridae from the rest (i.e., Megaviridae) were inferred [ Figure 5, at node (I)]. On contrary, according to the 'expansion model' inferred by using lower gain penalty, major gene gains were observed after Mimiviridae diverged from smaller members of proposed Megaviridae. Comparative genome analyses of closely related members of Phycodnaviridae and Mimiviridae revealed specific patterns of gene gains and losses during the divergence of the lineages (Filee, 2015). Such future studies comprise of more distant viruses, possibly with more lineages, will provide insights into the overall evolutionally process of the Giant dsDNA viruses.
As Phycodnaviridae encompasses viruses infecting hosts of such vast diversity, they are expected to adopt varied strategies, and thus to develop genomes coding for distinctive genes with an array of functions during evolution. Our results, along with several previous observations, strongly suggest that classification of Phycodnaviridae does not represent current similarity in their genetic components, viral life cycle, and evolutionary relatedness. Systematic reclassification of the family based on current knowledge may not only provide better taxonomy of viruses but also lead to a better representation and understanding of evolution of NCLDVs, which remain enigmatic biological entities.

AUTHOR CONTRIBUTIONS
FM conducted data analysis and participated in discussion and writing the paper. SU designed research, conducted data analysis, and wrote the paper.

FUNDING
This work was supported by KAKENHI (Grant numbers 26291092, 26660153, 15H01263) provided by the Ministry of Education, Culture, Sports, Science and Technology of Japan to SU. The funder had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

ACKNOWLEDGMENTS
This work was supported by KAKENHI (Grant numbers 26291092, 26660153, 15H01263), and Priority Areas "Comprehensive Genomics" (No. 221S0002) provided by the Ministry of Education, Culture, Sports, Science and Technology of Japan to SU. Some of the computational resources used in this study were provided by the Data Integration and Analysis Facility, National Institute for Basic Biology.