# A New Algorithm for Identifying Genome Rearrangements in the Mammalian Evolution

^{1}School of Computer Science, Inner Mongolia University, Hohhot, China

^{2}School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China

^{3}Beijing University of Civil Engineering and Architecture, Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China

Genome rearrangements are evolutionary events at the level of whole genomes. Analyzing them gives a global view of the evolution of species. We introduce a new method called RGRPT (recovering the genome rearrangements based on phylogenetic tree) to identify genome rearrangements. We test RGRPT using simulated data. The experimental results show that RGRPT has high sensitivity and specificity compared with other tools when predicting rearrangement events. We use RGRPT to predict the rearrangement events of six mammalian genomes (human, chimpanzee, rhesus macaque, mouse, rat, and dog). At 10 kb resolution, RGRPT recognizes a total of 1,157 rearrangement events for them, including 858 reversals, 16 translocations, 249 transpositions, and 34 fusions/fissions. At 50 kb resolution, RGRPT recognizes 475 rearrangement events, including 332 reversals, 13 translocations, 94 transpositions, and 36 fusions/fissions. The source code of RGRPT is available from https://github.com/wangjuanimu/data-of-genome-rearrangement.

## Introduction

The rapid development of sequencing technologies makes phylogenetic analysis at the level of the whole genome possible. A genome under study is represented as a sequence of conserved segments (called syntenic blocks). Genome rearrangements are changes in the ordering of syntenic blocks and losses of sequence blocks. These events include reversal, translocation, transposition, fusion, fission, and so on (Xu et al., 2017; Cheng et al., 2019; Dong et al., 2018). Research on genome rearrangements covers three main aspects.

The first is the computation of the evolutionary distance between two species based on genome rearrangements. Researchers have proposed many metrics for measuring the evolutionary dissimilarity between species and many algorithms for computing these metrics. The breakpoint distance is the minimum number of rearrangement operations transforming one genome into the other, which is computed by means of the breakpoint graph (Blanchette et al., 1997; Sankoff and Blanchette, 1998). There are many algorithms for computing the breakpoint distance. In 1995, Hannenhalli and Pevzner put forward an algorithm with $O(n^4)$ time complexity to compute the breakpoint distance considering only reversal events (Hannenhalli and Pevzner, 1999). Later, Kaplan et al. improved the time complexity to $O(n^2)$ (Kaplan et al., 2000). In 1996, Hannenhalli designed an algorithm with $O(n^3)$ time complexity to compute the distance considering translocation events (Hannenhalli, 1995). In 2001, Zhu et al. improved the time complexity to $O(n^2 \log n)$ (Zhu and Ma, 2002), and Zhu et al. then devised an algorithm with $O(n^2)$ time complexity (Liu et al., 2004). The DCJ distance was introduced by Yancopoulos et al. (Sophia et al., 2005); it uses the double cut and join (DCJ) operation to model rearrangement events such as reversal, translocation, transposition, fusion, and fission in a unified way. Yancopoulos et al. first proposed a method to compute the DCJ distance by considering only translocations and reversals on linear chromosomes (Sophia et al., 2005). Lu et al. (2006) proposed an $O(n^2)$ time algorithm to compute the distance by considering fusions and fissions between circular unsigned chromosomes. Unimog (Hilker et al., 2012) is software for computing the DCJ distance that implements several algorithms (Erdös et al., 2011; Jakub et al., 2011). SoRT is a tool to compute the breakpoint distance and the DCJ distance for linear/circular multi-chromosomal gene orders (Yen-Lin et al., 2010). The SCJ distance (Feijão and Meidanis, 2011) is defined using single cut and join (SCJ) operations, in analogy to the DCJ measure, and it can be computed quickly.

The second is the reconstruction of ancestral gene orders from the genomes of extant species. Ma et al. (2006) use the maximum parsimony principle to reliably recover ancestral genomes, starting from a phylogenetic tree and the adjacent genes in the genomes, and analyze the probabilistic reconstruction accuracy for six genomes (human, mouse, rat, dog, opossum, and chicken) based on an improved Jukes–Cantor model. PMAG uses Bayes' theorem in a probabilistic framework to infer ancestral genomes (Yang et al., 2014). Multiple Genome Rearrangements (MGR) recovers the ancestral genome by minimizing the rearrangement distance (Bourque and Pevzner, 2002). Multiple Genome Rearrangements and Ancestors (MGRA) reconstructs ancestral genomes based on multiple breakpoint graphs and has been used to analyze the rearrangement events of seven mammalian genomes (human, chimpanzee, macaque, mouse, rat, dog, and opossum) (Alekseyev and Pevzner, 2009). Decostar (Duchemin et al., 2017) is software that reconstructs the neighborhood relations of ancestral genes, with the aim of reconstructing the organization of ancestral genomes.

The third is the recognition of the rearrangement events of existing species. Efficient Method to Recover Ancestral Events (EMRAE) is an algorithm that recognizes the rearrangement events along the edges of a phylogenetic tree by means of adjacent genes in the genomes (Zhao and Bourque, 2009).

## Materials and Methods

### Preliminaries

A genome is composed of several chromosomes, and each chromosome is an ordering of syntenic blocks. For convenience, each syntenic block is recorded as an integer, so a chromosome is represented by a signed permutation $X = c_1 c_2 \cdots c_n$, where $c_i$ $(1 \le i \le n)$ is an integer representing a syntenic block whose sign gives its orientation, either positive (recorded as $c_i$) or negative (recorded as $-c_i$). The chromosome $X = c_1 c_2 \cdots c_n$ is the same as $-X = -c_n\ {-c_{n-1}} \cdots {-c_1}$.
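As a minimal sketch of this representation, assuming a chromosome is stored as a Python list of signed integers (the helper names `flip` and `same_chromosome` are illustrative and not taken from the RGRPT code), the equivalence of $X$ and $-X$ can be checked as follows:

```python
def flip(chromosome):
    """Return -X: the same chromosome read from the other end."""
    return [-block for block in reversed(chromosome)]


def same_chromosome(x, y):
    """True if x and y denote the same chromosome (either X or -X)."""
    return list(x) == list(y) or flip(x) == list(y)


X = [1, -3, -2, 4]
print(flip(X))                               # [-4, 2, 3, -1]
print(same_chromosome(X, [-4, 2, 3, -1]))    # True
```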

A reversal $r(i, j)$ $(i \le j)$ converts the chromosome $X = c_1 c_2 \cdots c_n$ into a new chromosome $X' = c_1 c_2 \cdots c_{i-1}\ {-c_j}\ {-c_{j-1}} \cdots {-c_{i+1}}\ {-c_i}\ c_{j+1} \cdots c_n$, where the reversal runs from $c_i$ to $c_j$.

A translocation event breaks two chromosomes into four segments and then reconnects them into two new chromosomes. Given two chromosomes $X = X_1 X_2$ and $Y = Y_1 Y_2$, where $X_1 = x_1 x_2 \cdots x_{i-1}$, $X_2 = x_i x_{i+1} \cdots x_m$, $Y_1 = y_1 y_2 \cdots y_{j-1}$, and $Y_2 = y_j y_{j+1} \cdots y_n$, a translocation is represented by $tl(i, j)$. Either $X_1$ and $Y_1$ are exchanged to form two new chromosomes $X' = Y_1 X_2$ and $Y' = X_1 Y_2$, or $X_1$ and $Y_2$ are exchanged to form two new chromosomes $X'' = {-Y_2}\, X_2$ and $Y'' = X_1\, {-Y_1}$.
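The two definitions above can be sketched in a few lines, assuming chromosomes are Python lists of signed integers and positions are 1-based as in the text. The function names and the `exchange_prefixes` flag are illustrative choices, not part of RGRPT:

```python
def negate(segment):
    """Return -S: the segment reversed with every sign flipped."""
    return [-c for c in reversed(segment)]


def reversal(x, i, j):
    """Apply r(i, j) to chromosome x (1-based positions, i <= j)."""
    return x[:i - 1] + negate(x[i - 1:j]) + x[j:]


def translocation(x, y, i, j, exchange_prefixes=True):
    """Apply tl(i, j): cut x before x_i and y before y_j, then rejoin."""
    x1, x2 = x[:i - 1], x[i - 1:]
    y1, y2 = y[:j - 1], y[j - 1:]
    if exchange_prefixes:
        return y1 + x2, x1 + y2                  # X' = Y1 X2,   Y' = X1 Y2
    return negate(y2) + x2, x1 + negate(y1)      # X'' = -Y2 X2, Y'' = X1 -Y1


print(reversal([1, 2, 3, 4], 2, 3))                   # [1, -3, -2, 4]
print(translocation([1, 2, 3, 4], [5, 6, 7], 3, 2))   # ([5, 3, 4], [1, 2, 6, 7])
```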

A transposition event exchanges two adjacent fragments on one chromosome, producing a new chromosome. A transposition is represented by $tp(i, j, k)$, i.e., the fragment $c_i \cdots c_j$ of one chromosome is inserted after $c_k$. If $c_k$ is on the same chromosome ($k > j$ or $k < i$), then the transposition $tp(i, j, k)$ is called intra-chromosomal; otherwise, it is inter-chromosomal. Given a chromosome $X = c_1 c_2 \cdots c_i c_{i+1} \cdots c_{j-1} c_j \cdots c_k \cdots c_n$ and an intra-chromosomal transposition, $X$ is converted into $X' = c_1 c_2 \cdots c_{i-1}\, c_{j+1} \cdots c_k\, c_i c_{i+1} \cdots c_j\, c_{k+1} \cdots c_n$.

A fusion event connects two chromosomes into a new chromosome. A fusion acting on chromosomes $X_1$ and $X_2$ is represented by $fu(X_1, X_2)$ and forms a new chromosome $X_1 X_2$ or $X_1\, {-X_2}$. A fission splits a chromosome into two new chromosomes. A fission acting on the chromosome $X = X_1 X_2$ is represented by $fi(X)$ and forms two new chromosomes $X_1$ and $X_2$ (where $X_1$ and $X_2$ are non-empty segments).
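A small sketch of these three operations, again assuming list-of-signed-integers chromosomes and 1-based positions. The function names, the `flip_second` flag, and the `cut` parameter are assumptions made for illustration only:

```python
def transposition(x, i, j, k):
    """Intra-chromosomal tp(i, j, k): move c_i..c_j to just after c_k."""
    segment = x[i - 1:j]
    if k > j:
        return x[:i - 1] + x[j:k] + segment + x[k:]
    if k < i:
        return x[:k] + segment + x[k:i - 1] + x[j:]
    raise ValueError("c_k must lie outside the moved fragment")


def fusion(x1, x2, flip_second=False):
    """fu(X1, X2): join two chromosomes as X1 X2 or X1 -X2."""
    tail = [-c for c in reversed(x2)] if flip_second else list(x2)
    return list(x1) + tail


def fission(x, cut):
    """fi(X): split x after position `cut` into two non-empty chromosomes."""
    if not 0 < cut < len(x):
        raise ValueError("both parts must be non-empty")
    return x[:cut], x[cut:]


print(transposition([1, 2, 3, 4, 5, 6], 2, 3, 5))   # [1, 4, 5, 2, 3, 6]
print(fusion([1, 2], [3, 4], flip_second=True))     # [1, 2, -4, -3]
print(fission([1, 2, 3, 4], 1))                     # ([1], [2, 3, 4])
```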

An adjacency $a(c_i, c_{i+1})$ of a genome $X$ is a pair of adjacent integers in one chromosome of $X$. The adjacency $a(c_i, c_{i+1})$ is the same as $a(-c_{i+1}, -c_i)$. For example, all adjacencies on the chromosome $X = 1\ 2\ 3\ 4$ are $a(1, 2)$, $a(2, 3)$, and $a(3, 4)$. For a set of genomes $S$, an adjacency $a$ is effective w.r.t. $S$ if it belongs to at least one genome but not to all genomes. For example, consider two uni-chromosomal genomes $G_1$ and $G_2$ with the chromosome $X = 1\ 2\ 3\ 4$ of $G_1$ and the chromosome $Y = 1\ {-3}\ {-2}\ 4$ of $G_2$. Since $a(2, 3)$ is the same as $a(-3, -2)$, it belongs to both genomes and is not effective; all effective adjacencies w.r.t. $G_1$ and $G_2$ are $a(1, 2)$, $a(3, 4)$, $a(1, -3)$, and $a(-2, 4)$.
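The example above can be reproduced with a short sketch. It assumes a genome is a list of chromosomes, each a list of signed integers, and it picks one representative of the pair $a(p, q)$, $a(-q, -p)$ so that equal adjacencies compare equal. The names `canonical`, `adjacencies`, and `effective_adjacencies` are illustrative:

```python
def canonical(p, q):
    """a(p, q) and a(-q, -p) are the same adjacency; pick one representative."""
    return max((p, q), (-q, -p))


def adjacencies(genome):
    """Canonical adjacencies of a genome given as a list of chromosomes."""
    return {canonical(p, q)
            for chromosome in genome
            for p, q in zip(chromosome, chromosome[1:])}


def effective_adjacencies(genomes):
    """Adjacencies found in at least one genome but not in all of them."""
    sets = [adjacencies(g) for g in genomes]
    return set.union(*sets) - set.intersection(*sets)


G1 = [[1, 2, 3, 4]]
G2 = [[1, -3, -2, 4]]
expected = {canonical(1, 2), canonical(3, 4), canonical(1, -3), canonical(-2, 4)}
print(effective_adjacencies([G1, G2]) == expected)   # True
```

Note that $a(2, 3)$ drops out because its canonical form is shared by both genomes.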

### EMRAE

Given a phylogenetic tree $T$ describing the evolution of the genomes $G$, EMRAE first computes all effective adjacencies w.r.t. $G$. Then, it predicts the rearrangement events for each edge of $T$ by means of inference rules (introduced below).

Figure 1 shows a reversal $r(2, 3)$ during the evolution from $A$ to $B$, where $A$ and $B$ are two uni-chromosomal genomes with the chromosomes $X = 1\ 2\ 3\ 4$ and $Y = 1\ {-3}\ {-2}\ 4$, respectively. Removing the edge $e$ from $T$ divides the set of genomes into two subsets, denoted $S_A$ and $S_B$. Suppose there are no rearrangement events inside $S_A$ and $S_B$. Then, the adjacencies $a(1, 2)$ and $a(3, 4)$ can be found in every genome of $S_A$ and in no genome of $S_B$; $a(1, -3)$ and $a(-2, 4)$ can be found in every genome of $S_B$ and in no genome of $S_A$. Conversely, we can use the four adjacencies $a(1, 2)$, $a(3, 4)$, $a(1, -3)$, and $a(-2, 4)$ to identify a reversal $r(2, 3)$ occurring on the edge $e$. The EMRAE method infers the rearrangement events by means of similar rules.
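The observation in this example can be sketched as a set computation, assuming uni-chromosomal genomes stored as lists of signed integers. The helper `side_specific` is a hypothetical name, and the second genome in `S_A` is simply the same chromosome read from the other end:

```python
def canonical(p, q):
    """a(p, q) and a(-q, -p) are the same adjacency; pick one representative."""
    return max((p, q), (-q, -p))


def adjacencies(chromosome):
    """Canonical adjacencies of a uni-chromosomal genome."""
    return {canonical(p, q) for p, q in zip(chromosome, chromosome[1:])}


def side_specific(side, other):
    """Adjacencies in every genome of `side` and in no genome of `other`."""
    in_every = set.intersection(*(adjacencies(g) for g in side))
    in_some_other = set.union(*(adjacencies(g) for g in other))
    return in_every - in_some_other


S_A = [[1, 2, 3, 4], [-4, -3, -2, -1]]
S_B = [[1, -3, -2, 4]]
print(side_specific(S_A, S_B) == {canonical(1, 2), canonical(3, 4)})     # True
print(side_specific(S_B, S_A) == {canonical(1, -3), canonical(-2, 4)})   # True
```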

**Figure 1** A reversal $r(2, 3)$ during the evolution from $A$ to $B$; $S_A$ and $S_B$ are the two subsets of all leaf species divided by the edge $e$.

Let $e = (A, B)$ be an edge of $T$, $G = \{G_1, G_2, \cdots, G_m\}$ the genomes of the leaves, $a_1, a_2, \cdots, a_i$ the children of $A$, and $b_1, b_2, \cdots, b_j$ the children of $B$. EMRAE first selects a number of adjacencies as candidate adjacencies $Ca(e, A)$ for edge $e$ and node $A$ according to the following steps.

1. Find the adjacencies that are in every genome of $S_A$ and in no genome of $S_B$, and put them into $Ca(e, A)$;

2. If $A$ is an internal node, find all edges connected with $A$ except $e$ and denote them $e_1, e_2, \cdots, e_k$. For each $e_i = (u_i, A)$ $(1 \le i \le k)$, removing $e_i$ divides $G$ into two parts; $S_{u_i}$ is the part not including $A$.

a. Find the adjacencies that are in at least one genome of each $S_{u_i}$ $(1 \le i \le k)$ and in no genome of $S_B$, and put them into $Ca(e, A)$;

b. Compute $Ca(e_i, u_i)$ and $Ca(e_i, A)$ $(1 \le i \le k)$. For each $Ca(e_i, u_i)$, find every adjacency $a_1$ in $Ca(e_i, u_i)$ such that $a_1$ shares no gene with any adjacency in $Ca(e_i, A)$, $a_1$ shares a gene with some adjacency $a_2$ in each $Ca(e_j, u_j)$ $(1 \le j \le k,\ j \ne i)$, and $a_2$ shares a gene with at least one adjacency in $Ca(e_j, A)$; then put $a_1$ into $Ca(e, A)$.
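Step a can be sketched as follows for uni-chromosomal genomes. This is only an illustration under stated assumptions: `subtree_groups` stands for the sets $S_{u_1}, \cdots, S_{u_k}$, `s_b` for $S_B$, and all function names are hypothetical rather than taken from EMRAE:

```python
def canonical(p, q):
    """a(p, q) and a(-q, -p) are the same adjacency; pick one representative."""
    return max((p, q), (-q, -p))


def adjacencies(chromosome):
    """Canonical adjacencies of a uni-chromosomal genome."""
    return {canonical(p, q) for p, q in zip(chromosome, chromosome[1:])}


def step_a(subtree_groups, s_b):
    """Adjacencies in at least one genome of every group, in no genome of s_b."""
    per_group = [set.union(*(adjacencies(g) for g in group))
                 for group in subtree_groups]
    excluded = set.union(*(adjacencies(g) for g in s_b))
    return set.intersection(*per_group) - excluded


S_u1 = [[1, 2, 3, 4, 5]]
S_u2 = [[1, 2, 3, 4, 5], [1, 2, 3, -5, -4]]
S_B = [[1, -3, -2, 4, 5]]
print(step_a([S_u1, S_u2], S_B) == {canonical(1, 2), canonical(3, 4)})   # True
```

Here $a(3, 4)$ is missing from one genome of the second group, so step 1 alone would not select it, but step a does.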

EMRAE then infers rearrangements from $Ca(e, A)$ and $Ca(e, B)$ for the edge $e = (A, B)$ with the help of the inference rules in the following section. From the definitions of genome rearrangements, we find that each genome rearrangement changes several adjacencies. For example, each reversal $r(i, j)$ $(i \le j)$ changes the two adjacencies $a_1 = a(c_{i-1}, c_i)$ and $a_2 = a(c_j, c_{j+1})$ into $b_1 = a(c_{i-1}, -c_j)$ and $b_2 = a(-c_i, c_{j+1})$. Based on these facts, we obtain the inference rules introduced in the following section.
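This effect of a reversal can be checked directly by comparing adjacency sets before and after the operation. The sketch below assumes list-based chromosomes, 1-based positions, and an interior reversal ($i > 1$ and $j < n$); the function names are illustrative:

```python
def canonical(p, q):
    """a(p, q) and a(-q, -p) are the same adjacency; pick one representative."""
    return max((p, q), (-q, -p))


def adjacencies(chromosome):
    """Canonical adjacencies of one chromosome."""
    return {canonical(p, q) for p, q in zip(chromosome, chromosome[1:])}


def reversal(x, i, j):
    """Apply r(i, j) to chromosome x (1-based positions, i <= j)."""
    return x[:i - 1] + [-c for c in reversed(x[i - 1:j])] + x[j:]


def changed_by_reversal(x, i, j):
    """Return (adjacencies lost, adjacencies gained) when r(i, j) acts on x."""
    before, after = adjacencies(x), adjacencies(reversal(x, i, j))
    return before - after, after - before


lost, gained = changed_by_reversal([1, 2, 3, 4, 5], 2, 4)
print(lost == {canonical(1, 2), canonical(4, 5)})       # True: a1 and a2
print(gained == {canonical(1, -4), canonical(-2, 5)})   # True: b1 and b2
```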

### Inference Rule

Let $e = (A, B)$ be an edge of the phylogenetic tree $T$. Given adjacencies $a_1 = a(c_{i-1}, c_i)$, $a_2 = a(c_j, c_{j+1})$ in $Ca(e, A)$ and $b_1 = a(c_{i-1}, -c_j)$, $b_2 = a(-c_i, c_{j+1})$ in $Ca(e, B)$, EMRAE infers a reversal $r(i, j)$ from $A$ to $B$ if all genomes are uni-chromosomal, or if $a_1$ and $a_2$ are on the same chromosome in $S_A$ and $b_1$ and $b_2$ are on the same chromosome in $S_B$. Otherwise, we infer a translocation $tl(i, j)$. Similarly, given adjacencies $a_1 = a(c_{i-1}, c_i)$, $a_2 = a(c_j, c_{j+1})$ in $Ca(e, A)$ and $b_1 = a(c_{i-1}, c_{j+1})$, $b_2 = a(c_j, c_i)$ in $Ca(e, B)$, EMRAE infers a translocation $tl(i, j)$, or a reversal, from the adjacencies $a_1$, $a_2$ in $Ca(e, A)$ and $b_1$, $b_2$ in $Ca(e, B)$.
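The reversal rule can be sketched for the uni-chromosomal case as a scan over positions of a reference chromosome from $S_A$. This is a simplified illustration, not EMRAE's implementation: the reference chromosome `x` and the function name `infer_reversals` are assumptions, and the same-chromosome test that separates reversals from translocations is omitted:

```python
def canonical(p, q):
    """a(p, q) and a(-q, -p) are the same adjacency; pick one representative."""
    return max((p, q), (-q, -p))


def infer_reversals(x, ca_a, ca_b):
    """Return 1-based (i, j) pairs in x that match the reversal rule."""
    ca_a = {canonical(*a) for a in ca_a}
    ca_b = {canonical(*b) for b in ca_b}
    hits = []
    n = len(x)
    for i in range(2, n):          # c_{i-1} must exist
        for j in range(i, n):      # c_{j+1} must exist
            left, ci, cj, right = x[i - 2], x[i - 1], x[j - 1], x[j]
            a1, a2 = canonical(left, ci), canonical(cj, right)
            b1, b2 = canonical(left, -cj), canonical(-ci, right)
            if a1 in ca_a and a2 in ca_a and b1 in ca_b and b2 in ca_b:
                hits.append((i, j))
    return hits


# Figure 1 example: X = 1 2 3 4 becomes Y = 1 -3 -2 4.
print(infer_reversals([1, 2, 3, 4],
                      ca_a={(1, 2), (3, 4)},
                      ca_b={(1, -3), (-2, 4)}))   # [(2, 3)]
```

Because the scan needs both $c_{i-1}$ and $c_{j+1}$, a reversal touching the first or last block of the chromosome can never match.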

Assume that there are adjacencies $a_1 = a(c_{i-1}, c_i)$, $a_2 = a(c_j, c_{j+1})$, and $a_3 = a(c_k, c_{k+1})$ in $Ca(e, A)$ and $b_1 = a(c_{i-1}, c_{j+1})$, $b_2 = a(c_k, c_i)$, and $b_3 = a(c_j, c_{k+1})$ in $Ca(e, B)$. EMRAE predicts a transposition $tp(i, j, k)$ during the evolution from $A$ to $B$ if all genomes are uni-chromosomal. Otherwise, suppose $m$ genomes in $S_A$ have $a_1$ and $a_2$; then EMRAE predicts a transposition $tp(i, j, k)$ if there are at least $m/2$ genomes in which the four integers of $a_1$ and $a_2$ are on the same chromosome, or at least $m/2$ genomes in which the four integers of $a_2$ and $a_3$ are on the same chromosome.
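For the uni-chromosomal case, the transposition rule can be sketched in the same positional style. The reference chromosome `x`, the restriction to $k > j$, and the name `infer_transpositions` are assumptions of this sketch; the $m/2$ majority test for multi-chromosomal genomes is not modeled:

```python
def canonical(p, q):
    """a(p, q) and a(-q, -p) are the same adjacency; pick one representative."""
    return max((p, q), (-q, -p))


def infer_transpositions(x, ca_a, ca_b):
    """Return 1-based (i, j, k) with k > j that match the transposition rule."""
    ca_a = {canonical(*a) for a in ca_a}
    ca_b = {canonical(*b) for b in ca_b}
    n, hits = len(x), []
    for i in range(2, n):
        for j in range(i, n):
            for k in range(j + 1, n):
                before = [(x[i - 2], x[i - 1]), (x[j - 1], x[j]), (x[k - 1], x[k])]
                after = [(x[i - 2], x[j]), (x[k - 1], x[i - 1]), (x[j - 1], x[k])]
                if (all(canonical(*a) in ca_a for a in before)
                        and all(canonical(*b) in ca_b for b in after)):
                    hits.append((i, j, k))
    return hits


# X = 1 2 3 4 5 6 becomes 1 4 5 2 3 6 by tp(2, 3, 5).
print(infer_transpositions([1, 2, 3, 4, 5, 6],
                           ca_a={(1, 2), (3, 4), (5, 6)},
                           ca_b={(1, 4), (5, 2), (3, 6)}))   # [(2, 3, 5)]
```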

Assume that there is an adjacency $a = a(c_i, c_j)$ in $Ca(e, A)$. EMRAE predicts a fission that splits the adjacency $a = a(c_i, c_j)$ if $a$ is sign-compatible for each genome $G_k$ in $S_B$. A fusion from $A$ to $B$ can be seen as a fission from $B$ to $A$.

### Recovering the Genome Rearrangements Based on Phylogenetic Tree

EMRAE cannot identify rearrangements occurring at the frontier (the ends) of chromosomes. Take Figure 2 as an example, where species $A$, $B$, and $C$ have the uni-chromosomal genomes $A = 1\ 2\ 3\ 4$, $B = {-2}\ {-1}\ 3\ 4$, and $C = 1\ 2\ 3\ 4$. A reversal $r(1, 2)$ has occurred in the evolution from $A$ to $B$. EMRAE computes only the candidate adjacency $a(-1, 3)$ for $Ca(e_1, B)$ and $a(2, 3)$ for $Ca(e_1, A)$. Because the reversal rule needs two adjacencies on each side, EMRAE cannot infer the reversal $r(1, 2)$ on the edge $e_1$ from these candidate adjacencies.

We improve EMRAE so that the improved method (called RGRPT) is able to infer the rearrangement events occurring in the frontier region. The inference rules of RGRPT are the same as those of EMRAE. The difference between RGRPT and EMRAE lies in their candidate adjacencies. RGRPT adds a 0 to the head and tail of each chromosome, which adds adjacencies to each genome. For example, for the uni-chromosomal genomes $X = 1\ 2\ 3\ 4$ and $Y = {-2}\ {-1}\ 3\ 4$, the two additional candidate adjacencies $a(0, 1)$ and $a(0, -2)$ are added.
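The effect of the added 0 can be illustrated by comparing the two genomes with and without caps. This is a minimal sketch assuming list-based chromosomes; `cap` and `differing` are hypothetical helper names, and the pairwise comparison stands in for the full candidate-adjacency computation on a tree:

```python
def canonical(p, q):
    """a(p, q) and a(-q, -p) are the same adjacency; pick one representative."""
    return max((p, q), (-q, -p))


def adjacencies(chromosome):
    """Canonical adjacencies of one chromosome."""
    return {canonical(p, q) for p, q in zip(chromosome, chromosome[1:])}


def cap(chromosome):
    """Put 0 at the head and tail of a chromosome."""
    return [0] + list(chromosome) + [0]


def differing(x, y):
    """Adjacencies only in x, and adjacencies only in y."""
    ax, ay = adjacencies(x), adjacencies(y)
    return ax - ay, ay - ax


X, Y = [1, 2, 3, 4], [-2, -1, 3, 4]

only_x, only_y = differing(X, Y)
print(only_x == {canonical(2, 3)}, only_y == {canonical(-1, 3)})   # True True

only_x, only_y = differing(cap(X), cap(Y))
print(only_x == {canonical(0, 1), canonical(2, 3)})                # True
print(only_y == {canonical(0, -2), canonical(-1, 3)})              # True
```

With the caps, each side has two adjacencies, which is the pattern $a_1, a_2, b_1, b_2$ required by the reversal rule for $r(1, 2)$.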

RGRPT adds candidate adjacencies in step b of EMRAE. For each $Ca(e_i, u_i)$ and each adjacency $a_1$ in $Ca(e_i, u_i)$, if every $Ca(e_j, u_j)$ $(1 \le j \le k,\ j \ne i)$ contains an adjacency $a_2$ that shares a gene with $a_1$, then $a_1$ is put into $Ca(e, A)$.
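A rough sketch of this relaxed rule follows. It assumes that "sharing a gene" means having a syntenic block in common regardless of sign, which is an interpretation rather than something the text states, and it takes the sets $Ca(e_i, u_i)$ as already computed. The names `share_block` and `rgrpt_extra_candidates` are illustrative:

```python
def share_block(a1, a2):
    """True if two adjacencies have a syntenic block in common (sign ignored)."""
    return bool({abs(a1[0]), abs(a1[1])} & {abs(a2[0]), abs(a2[1])})


def rgrpt_extra_candidates(ca_children):
    """ca_children[i] plays the role of Ca(e_i, u_i).

    Keep a1 from Ca(e_i, u_i) if every other Ca(e_j, u_j) contains some a2
    that shares a block with a1.
    """
    extra = set()
    for i, ca_i in enumerate(ca_children):
        others = [ca for j, ca in enumerate(ca_children) if j != i]
        for a1 in ca_i:
            if all(any(share_block(a1, a2) for a2 in ca_j) for ca_j in others):
                extra.add(a1)
    return extra


print(sorted(rgrpt_extra_candidates([{(1, -3), (7, 8)}, {(3, 5)}])))
# [(1, -3), (3, 5)]
```

In this toy input, $a(7, 8)$ is rejected because no adjacency of the other child set shares a block with it.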

## Results

All experiments were performed on a computer with an Intel Vostro 14 2.0 GHz CPU, 4 GB RAM, and a 500 GB hard disk drive (HDD). The operating system was 64-bit Windows 10 with Java 1.6 installed. RGRPT was written in Java.

We tested RGRPT with both simulated data and practical data (i.e., real biological data), which are described in the following sections.

### Simulated Data

Here, we start with a uni-chromosomal genome as the ancestor, which evolves along a phylogenetic tree with *n* taxa whose topology is shown in Figure 3.

We generate two simulated data sets to test the effectiveness of RGRPT. The first is created from a phylogeny with reversal events only. The second is generated from a phylogeny with several kinds of events (reversals, translocations, transpositions, fusions, and fissions) whose numbers follow a fixed ratio. Together, the two data sets test the ability of the methods to recover simple and complex evolutionary histories. The first data set is created using only reversal events. Since a reversal of a single gene is rare (Korbel et al., 2007), we set the ratio of single-gene reversals to multi-gene reversals to 1:3. The number of leaves ranges from 3 to 10 in steps of 1. For each number of leaves, the ancestor genome has *m* genes, where *m* ranges from 50 to 150 in steps of 10. On each edge, *k* reversals occur, where *k* is a random integer from 3 to 10. So, there are 11 groups of data for each leaf number. Sensitivity is the percentage of correctly predicted events among all actual events. Specificity is the percentage of correctly predicted events among all predicted events. We compute the sensitivity and specificity of RGRPT and EMRAE for each group of data. Table 1 shows the average sensitivity and specificity for each leaf number. The second column of the table records the number of all events, and its last row records the average values.
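The two measures defined above can be computed as follows. This is a sketch under the assumption that each event is encoded as a hashable tuple such as `(edge, kind, parameters...)`; that encoding and the function name are illustrative, not the format used by RGRPT:

```python
def sensitivity_specificity(actual, predicted):
    """Sensitivity = correct / actual events; specificity = correct / predicted events."""
    actual, predicted = set(actual), set(predicted)
    correct = len(actual & predicted)
    sensitivity = correct / len(actual) if actual else 0.0
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


actual = {("e1", "rev", 2, 3), ("e1", "rev", 5, 8),
          ("e2", "rev", 1, 1), ("e2", "rev", 4, 6)}
predicted = {("e1", "rev", 2, 3), ("e1", "rev", 5, 8), ("e2", "rev", 7, 9)}

sens, spec = sensitivity_specificity(actual, predicted)
print(sens)             # 0.5
print(round(spec, 3))   # 0.667
```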

**Table 1** Results of EMRAE and recovering the genome rearrangements based on phylogenetic tree algorithms in predicting reversal events.

Table 1 shows that RGRPT achieves higher sensitivity than EMRAE and comparable specificity. RGRPT can therefore identify more of the events that actually occurred than EMRAE. The experimental results thus show that RGRPT is more effective than EMRAE at predicting reversal events.

The second data set is generated using all events, i.e., reversal, translocation, transposition, fusion, and fission. Reversals are generally more frequent than the other rearrangement events. Fusions and fissions are very rare, so we count these two events together. Here, we set the ratio of reversals, translocations, transpositions, and fusions/fissions to 10:2:2:0.1. The ancestor genome has 5 chromosomes, each with 100 genes. The ancestor genome evolves along the topology with four leaves (see Figure 3). On each edge, *k* events occur, where *k* is a random number from 1 to μ and μ is 6, 12, 18, or 24. For each μ, the simulation is run 10 times, giving 10 groups of data. Table 2 shows the average over the 10 groups for each μ. The table indicates that RGRPT is more effective than EMRAE at predicting all events.
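One way to draw the event types for an edge under this setup is weighted sampling. The sketch below assumes the ratio 10:2:2:0.1 maps, in order, to reversal, translocation, transposition, and fusion/fission, and that *k* is drawn uniformly from 1 to μ; both are readings of the text, and the names are illustrative:

```python
import random

EVENT_TYPES = ["reversal", "translocation", "transposition", "fusion/fission"]
WEIGHTS = [10, 2, 2, 0.1]


def sample_edge_events(mu, rng):
    """Draw k event types for one edge, with k uniform in 1..mu."""
    k = rng.randint(1, mu)
    return rng.choices(EVENT_TYPES, weights=WEIGHTS, k=k)


rng = random.Random(0)   # fixed seed so the run is repeatable
for mu in (6, 12, 18, 24):
    events = sample_edge_events(mu, rng)
    print(mu, len(events), events.count("reversal"))
```

Only the event types are sampled here; choosing positions and applying each event to the genome would be a separate step.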

**Table 2** Results of EMRAE and recovering the genome rearrangements based on phylogenetic tree algorithms in predicting all events.

### Practical Data

The practical data come from Zhao and Bourque (2009). They contain six mammalian genomes, i.e., human, chimpanzee, rhesus macaque, mouse, rat, and dog. The data are created at two levels of resolution, 10 kb and 50 kb. Figure 4 is the tree describing the phylogeny of the species. The results are shown in Tables 3 and 4. EM and RG represent EMRAE and RGRPT, respectively, and Rev, Tloc, Tran, Fus, and Fis represent reversal, translocation, transposition, fusion, and fission, respectively. Each row in the tables records the ancestor rearrangement events of one edge. For example, the values in the human row are the rearrangement events from D to human; the values in the MR row are the rearrangement events on the edge between A and B.

**Table 3** Genome rearrangement predictions of EMRAE and recovering the genome rearrangements based on phylogenetic tree at 10 kb resolution.

**Table 4** Genome rearrangement predictions of EMRAE and recovering the genome rearrangements based on phylogenetic tree at 50 kb resolution.

At 10 kb resolution, the RGRPT algorithm predicts 1,157 ancestor rearrangement events, including 858 reversals, 16 translocations, 249 transpositions, and 34 fusions and fissions. This is 48 more rearrangement events than EMRAE identifies. Reversals make up the majority of all predicted events. At 50 kb resolution, the RGRPT algorithm predicts 475 ancestor rearrangement events, including 332 reversals, 13 translocations, 94 transpositions, and 36 fusions and fissions, which is 21 more than the EMRAE algorithm identifies. The rat edge has the most identified rearrangement events of all edges at both 10 kb and 50 kb resolution. The genomes contain more syntenic blocks at 10 kb resolution than at 50 kb resolution, which is why more rearrangement events are recognized at 10 kb than at 50 kb resolution. The experiments show that RGRPT can recover more ancestor events than EMRAE.

## Discussion

This paper proposes a new method, RGRPT, to infer ancestor rearrangement events. RGRPT takes as input a phylogenetic tree describing the evolution of the species together with the genomes of those species. Experiments on simulated and practical data show that RGRPT is more effective than EMRAE and can recover more ancestor rearrangement events. RGRPT provides a method for studying the genome rearrangements of species. In the future, RGRPT can be used to recognize ancestral genome rearrangements in the evolution of other species (Tian et al., 2018).

## Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://github.com/wangjuanimu/data-of-genome-rearrangement.

## Author Contributions

JW proposed and implemented the RGRPT method. JW and BC designed all experiments. All authors participated in designing the algorithm and writing the paper.

## Funding

The work was supported by the National Natural Science Foundation of China (61661040, 61661039, 61571163, 61532014, 61671189, 91735306, 61751104); the National Key Research and Development Plan Task of China (Grant No. 2016YFC0901902).

## Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## References

Alekseyev, M. A., Pevzner, P. A. (2009). Breakpoint graphs and ancestral genome reconstructions. *Genome Res.* 19 (5), 943–957.

Blanchette, M., Bourque, G., Sankoff, D. (1997). Breakpoint phylogenies. *Genome Inform. Ser. Workshop Genome Inform.* 8, 25–34.

Bourque, G., Pevzner, P. A. (2002). Genome-scale evolution: reconstructing gene orders in the ancestral species. *Genome Res.* 11 (1), 26–36.

Cheng, L., Yang, H., Zhao, H., Pei, X., Shi, H., Sun, J., et al. (2019). Metsigdis: a manually curated resource for the metabolic signatures of diseases. *Briefings Bioinf.* doi: 10.1093/bib/bbx103

Dong, S., Zhao, C., Fei, C., Liu, Y., Zhang, S., Hong, W., et al. (2018). The complete mitochondrial genome of the early flowering plant nymphaea colorata is highly repetitive with low recombination. *Bmc Genomics* 19 (1), 614–626.

Duchemin, W., Anselmetti, Y., Patterson, M., Ponty, Y., Brard, S., Chauve, C., et al. (2017). Decostar: Reconstructing the ancestral organization of genes or genomes using reconciled phylogenies. *Genome Biol. Evol.* 9 (5), 1312–1319.

Erdös, P. L., Soukup, L., Stoye, J. (2011). Balanced vertices in trees and a simpler algorithm to compute the genomic distance. *Appl. Math. Lett.* 24 (1), 82–86.

Feijão, P., Meidanis, J. (2011). Scj: a breakpoint-like distance that simplifies several rearrangement problems. *IEEE/ACM Trans. Comput. Biol. Bioinform.* 8 (5), 1318–1329.

Hannenhalli, S. (1995). Polynomial-time algorithm for computing translocation distance between genomes. *Discrete Appl. Math.* 71 (1–3), 137–151.

Hannenhalli, S., Pevzner, P. A. (1999). Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. *J. Acm* 46 (1), 1–27.

Hilker, R., Sickinger, C., Pedersen, C. N., Stoye, J. (2012). Unimog–a unifying framework for genomic distance calculation and sorting based on dcj. *Bioinformatics* 28 (19), 2509.

Jakub, K., Robert, W., Braga, M. D. V., Jens, S. (2011). Restricted dcj model: rearrangement problems with chromosome reincorporation. *J. Comput. Biol. J. Comput. Mol. Cell Biol.* 18 (9), 1231–1241.

Kaplan, H., Shamir, R., Tarjan, R. E. (2000). Faster and simpler algorithm for sorting signed permutations by reversals. *SIAM J. Comput.* 29 (3), 880–892.

Korbel, J. O., Urban, A. E., Affourtit, J. P., Godwin, B., Grubert, F., Simons, J. F., et al. (2007). Paired-end mapping reveals extensive structural variation in the human genome. *Science* 318 (5849), 420–426.

Liu, X., Zhu, D., Ma, S., Li, Z., Wang, L. (2004). An o(n2) algorithm for sorting oriented genomes by translocations. *Chin. J. Comput.* 27 (10), 1354–1360.

Lu, C. L., Huang, Y. L., Wang, T. C., Chiu, H. T. (2006). Analysis of circular genome rearrangement by fusions, fissions and block-interchanges. *Bmc Bioinf.* 7 (1), 295.

Ma, J., Zhang, L., Suh, B., Raney, B., et al. (2006). Reconstructing contiguous regions of an ancestral genome. *Genome Res.* 16 (12), 1557–1565.

Sankoff, D., Blanchette, M. (1998). Multiple genome rearrangement and breakpoint phylogeny. *J. Comput. Biol.* 5, 555–570.

Sophia, Y., Oliver, A., Richard, F. (2005). Efficient sorting of genomic permutations by translocation, inversion and block interchange. *Bioinformatics* 21 (16), 3340–3346.

Tian, Z., Teng, Z., Cheng, S., Guo, M. (2018). Computational drug repositioning using meta-path-based semantic network analysis. *BMC Syst. Biol.* 12 (S9), 134.

Xu, Y., Wang, Y., Luo, J., Zhao, W., Zhou, X. (2017). Deep learning of the splicing (epi)genetic code reveals a novel candidate mechanism linking histone modifications to esc fate decision. *Nucleic Acids Res.* 45 (21), 12100–12112.

Yang, N., Hu, F., Zhou, L., Tang, J. (2014). Reconstruction of ancestral gene orders using probabilistic and gene encoding approaches. *PLoS One* 9 (10), e108796.

Yen-Lin, H., Chen-Cheng, H., Chuan Yi, T., Chin Lung, L. (2010). Sort2: a tool for sorting genomes and reconstructing phylogenetic trees by reversals, generalized transpositions and translocations. *Nucleic Acids Res.* 38 (Web Server issue), W221–W227.

Zhao, H., Bourque, G. (2009). Recovering genome rearrangements in the mammalian phylogeny. *Genome Res.* 19 (5), 934–942.

Keywords: genome rearrangements, mammal, phylogenetic tree, evolution, algorithm

Citation: Wang J, Cui B, Zhao Y and Guo M (2019) A New Algorithm for Identifying Genome Rearrangements in the Mammalian Evolution. *Front. Genet.* 10:1020. doi: 10.3389/fgene.2019.01020

Received: 02 July 2019; Accepted: 24 September 2019;

Published: 29 October 2019.

Edited by:

Lei Deng, Central South University, China

Reviewed by:

Yungang Xu, University of Texas Health Science Center at Houston, United States

Zhen Tian, Zhengzhou University, China

Wei Lan, Guangxi University, China

Copyright © 2019 Wang, Cui, Zhao and Guo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Maozu Guo, guomaozu@bucea.edu.cn