MeiosisOnline: A Manually Curated Database for Tracking and Predicting Genes Associated With Meiosis

Meiosis, an essential step in gametogenesis, is the key event in sexually reproducing organisms. Thousands of genes have been reported to be involved in meiosis. A specialist database is therefore needed so that scientists can quickly look up the function of these genes and search for genes with potential roles in meiosis. Here, we developed “MeiosisOnline,” a publicly accessible, comprehensive database of known functional genes and potential candidates in meiosis (https://mcg.ustc.edu.cn/bsc/meiosis/index.html). A total of 2,052 meiotic genes were manually curated from the literature and classified into different categories. Annotation information is provided for both meiotic genes and predicted candidates, including basic information, function, protein–protein interaction (PPI), and expression data. In addition, 165 mouse genes were predicted as potential candidates in meiosis using the “Greed AUC Stepwise” (GAS) algorithm. Thus, MeiosisOnline provides up-to-date and detailed information on experimentally verified and predicted genes in meiosis. Furthermore, the search tools and user-friendly interface of MeiosisOnline will help researchers study meiosis easily and efficiently.


BACKGROUND
Meiosis, the process that generates daughter cells with an intact, haploid genome through one round of DNA replication followed by two rounds of cell division, is a basic feature of sexually reproducing organisms (Gerton and Hawley, 2005; Miller et al., 2013; Bolcun-Filas and Handel, 2018; Biswas et al., 2021). Compared with mitosis, meiosis is characterized by homologous chromosome separation, which ensures the genetic integrity of all daughter cells (Sato et al., 2021). A series of biological processes takes place during meiotic prophase I to guarantee the formation and repair of programmed meiotic DNA double-strand breaks (DSBs), the pairing and synapsis of homologous chromosomes, and the formation of meiotic crossovers (Handel and Schimenti, 2010; Baudat et al., 2013; Gray and Cohen, 2016; Ranjha et al., 2018; Jiao et al., 2020; Li et al., 2021).
With the development of genomic technologies in model organisms and recent advances in transcriptomics and proteomics, a large number of articles on meiosis in different species have been published, giving a clearer understanding of the genetic control of key meiotic events (Watanabe et al., 2001; Wang et al., 2009; Chalmel and Rolland, 2015; Chen et al., 2018). However, information about meiotic genes is highly fragmented, which makes it difficult to highlight the genes, molecular complexes, and/or signaling pathways involved in meiosis. Moreover, identifying novel meiotic genes remains challenging, especially in mammals, because genetic modification in model organisms is time-consuming and its outcome is often unpredictable (Khan et al., 2018; Huang et al., 2019; Xie et al., 2019; Yousaf et al., 2020). Thus, a specialist database that provides integrated annotation of meiotic genes and predicts novel functional genes is urgently needed.
Here, we report a publicly accessible, comprehensive database, MeiosisOnline. 1 It is the first resource that is not only a well-structured repository of experimentally verified meiotic genes with detailed annotation, but also a powerful tool for predicting genes that may function in meiosis.

Manual Curation of Literature
To collect information on meiotic genes, specific keywords were used to search PubMed (Supplementary Table 1). All collected papers were then curated manually, and genes that had been validated by experiments were deemed functional meiotic genes.
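As a hedged illustration of the keyword-based literature search, the sketch below builds a request URL for the NCBI E-utilities `esearch` endpoint. The keywords are placeholders (the actual terms are listed in Supplementary Table 1), the function name `build_pubmed_query` is hypothetical, and the default date cutoff simply mirrors the "published before January 1, 2021" limit reported in this article. It is not the authors' retrieval pipeline.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(keywords, max_date="2020/12/31", retmax=100):
    """Build an esearch URL that matches any keyword in title or abstract.

    `keywords` are illustrative placeholders, not the curated search terms.
    """
    # Combine keywords with OR and restrict each to the title/abstract fields.
    term = " OR ".join(f'"{kw}"[Title/Abstract]' for kw in keywords)
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",       # filter on publication date
        "mindate": "1900/01/01",  # mindate and maxdate must be given together
        "maxdate": max_date,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_pubmed_query(["meiosis", "synaptonemal complex"]))
```

The function only constructs the URL; fetching and paging through results (and respecting NCBI rate limits) is left out, since manual curation of the returned papers is the step that actually defines the gene set.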

Gene Expression Data Collection
Gene expression information was retrieved from the ArrayExpress database. 2 Datasets from the Affymetrix GeneChip platform were downloaded and divided into categories, including "developmental stages," "gene disturbance," "before and after treatment," and "tissues and cell types" (Supplementary Table 2). Gene expression data combined with category information are provided as annotation in MeiosisOnline and were used to predict genes that may function in meiosis.

Annotation
Annotation information for each gene in MeiosisOnline comprises "basic information," "function annotation and classification," and "protein-protein interaction (PPI) and gene expression."
(1) Basic information: gene name/synonyms, nucleotide sequences, etc., were extracted from GenBank 3 and UniProt Knowledgebase. 4
(2) Function annotation and classification: detailed functional information was manually collected from literature reports, covering the following items:
(i) The meiotic stage in which the gene is involved.
(ii) Whether the gene functions in one sex or both sexes.
(iii) Whether deletion or mutation of the gene in a model organism causes a fertility phenotype.
(iv) The protein complex in which the gene product is involved.
(v) The cellular location and expression pattern in tissues or cell lines.
(vi) Experimental methods used for functional analysis.
(vii) Related literature and figures illustrating the function of the protein/gene.
(viii) Gene ontology annotation for the collected genes.
(3) Protein-protein interaction and gene expression: both verified and predicted PPI information is provided. Gene expression patterns in the reproductive system are also provided graphically.
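To make the annotation layout above more concrete, the following sketch models one gene record as a Python dataclass. All field names and the class name `MeioticGeneRecord` are hypothetical and chosen only to mirror the three annotation groups; the actual MeiosisOnline schema is not described in this article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MeioticGeneRecord:
    """Hypothetical record mirroring the three annotation groups."""

    # (1) Basic information
    mg_id: str
    gene_name: str
    species: str
    synonyms: List[str] = field(default_factory=list)

    # (2) Function annotation and classification
    meiotic_stages: List[str] = field(default_factory=list)
    sex: Optional[str] = None  # assumed vocabulary: "male", "female", "both"
    fertility_phenotype: Optional[str] = None
    protein_complexes: List[str] = field(default_factory=list)
    cellular_location: Optional[str] = None
    experimental_methods: List[str] = field(default_factory=list)
    go_terms: List[str] = field(default_factory=list)

    # (3) Protein-protein interaction and gene expression
    ppi_partners: List[str] = field(default_factory=list)

    def functions_in_both_sexes(self) -> bool:
        """True only when the record is explicitly annotated as 'both'."""
        return self.sex == "both"


if __name__ == "__main__":
    # Only the ID, name, and species come from the article; other fields stay empty.
    record = MeioticGeneRecord(
        mg_id="MG0001089", gene_name="Stra8", species="Mus musculus"
    )
    print(record.mg_id, record.gene_name, record.functions_in_both_sexes())
```

Leaving unannotated fields as empty lists or `None` keeps "not curated" distinct from "curated as negative," which matters for manually collected data.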

Implementation
To execute more jobs in parallel, the MeiosisOnline database is hosted on a Dell 730 server with a LAMP architecture. The server is equipped with 128 GB of RAM and two 12-core Intel processors (2.2 GHz). jQuery is used to render the interface, and Python and R are used for the backend.

The Manual Curation of Meiosis-Related Genes From the Literature
MeiosisOnline aims to provide a functional annotation pipeline for meiosis-associated genes from published articles. After keyword queries in PubMed, about 45,000 research articles published before January 1, 2021, were collected. All collected papers were manually curated, and only genes with functional experimental validation were included as functional meiotic genes (Supplementary Table 3). In total, 2,052 unique meiotic genes with experimentally verified functions from 84 species were curated along with functional information in MeiosisOnline. The largest share of functional meiotic genes comes from mouse, which accounts for 28.74% of the total reported genes, followed by human (5.16%) and rat (5.07%); other species make up the remaining 61.03% (Supplementary Table 4). Notably, genes often show distinct expression profiles: for example, Mlh3 (mg0000873) is expressed during both male and female meiosis, Sun5 (mg0000693) is present only in male germ cells, and Bmp15 (mg0000982) is specifically expressed in oocytes.

The Overall Framework of MeiosisOnline
MeiosisOnline is developed in a user-friendly manner and the major functional modules of the database (Figure 1) include:

Search Page
Users can find genes of interest using the Search page. 5 Four additional search options are also provided 6 : (1) Advanced search. Users can query up to three keywords and combine them with the operators "and," "or," or "exclude" to narrow the results (Supplementary Figure 1A).
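The advanced search logic can be sketched as set operations over matching records. This is a minimal illustration under stated assumptions: the left-to-right evaluation order, the case-insensitive substring matching, and the names `advanced_search` and `record_matches` are all hypothetical, and the toy records are not database content.

```python
def record_matches(record, keyword):
    """Case-insensitive substring match against any field value."""
    needle = keyword.lower()
    return any(needle in str(value).lower() for value in record.values())


def advanced_search(records, first_keyword, clauses=()):
    """Combine up to three keywords with 'and', 'or', or 'exclude'.

    `clauses` is a sequence of (operator, keyword) pairs applied left to right.
    """
    if len(clauses) > 2:
        raise ValueError("at most three keywords are supported")

    # Indices of records matching the first keyword.
    selected = {i for i, r in enumerate(records) if record_matches(r, first_keyword)}

    for operator, keyword in clauses:
        hits = {i for i, r in enumerate(records) if record_matches(r, keyword)}
        if operator == "and":
            selected &= hits
        elif operator == "or":
            selected |= hits
        elif operator == "exclude":
            selected -= hits
        else:
            raise ValueError(f"unknown operator: {operator!r}")

    # Return matches in their original order.
    return [records[i] for i in sorted(selected)]


if __name__ == "__main__":
    toy_records = [  # toy data for illustration only
        {"gene": "Stra8", "species": "Mus musculus"},
        {"gene": "Mlh3", "species": "Mus musculus"},
        {"gene": "STRA8", "species": "Homo sapiens"},
    ]
    print(advanced_search(toy_records, "stra8", [("exclude", "homo sapiens")]))
```

Evaluating clauses strictly left to right avoids having to define operator precedence, which is a reasonable simplification when at most three keywords are allowed.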

Browse Page
On the Browse page, users can browse genes by classification or species 7 (Supplementary Figure 2A). All genes belonging to a given category are returned in tabular form. For example, users can browse MeiosisOnline Genes (MGs) collected from knockout mice (Supplementary Figure 2B) or MGs identified in different species, e.g., Homo sapiens (Supplementary Figure 2C).

Candidates Page
On the Candidates page, MeiosisOnline lists all the predicted functional genes in mouse 8 (Supplementary Figure 3A). Clicking an MG ID displays detailed information for the candidate gene (Supplementary Figure 3B).

Feedback Page
Users can submit suggestions about the records integrated in MeiosisOnline or submit novel verified meiotic gene information to our database. 9

MeiosisOnline Integrates Information of Functional Genes in Meiosis
Besides general information such as gene ID, protein ID, taxonomy ID, and basic descriptions, MeiosisOnline also provides high-quality functional annotation for the collected genes. Based on this functional annotation, the experimentally verified genes were classified into different categories. Figures and/or tables illustrating the function of the collected meiotic genes were also incorporated, together with manually annotated functions, signaling pathways, and associated protein complexes (Figure 2). The functional distribution of these genes across the stages of meiosis and in fecundity is also listed (Supplementary Tables 4, 5). For instance, when "Stra8" is used as a query, results are listed in tabular form, including MeiosisOnline Gene ID (MG ID), gene names, UniProt ID, etc. (Figure 2A). Clicking the MG ID (MG0001089) displays detailed information for mouse Stra8, which includes the following: (1) basic information (gene name, nucleotide and protein sequences, etc.), (2) functional annotation and classification from related literature (developmental stages, experimental methods, literature abstract, relevant figures, etc.), and (3) PPI and gene expression information (Figure 2B).

MeiosisOnline Facilitates the Discovery of Functional Meiotic Genes
To extend the utility of the MeiosisOnline database, a prediction model was constructed and used to predict candidate functional meiotic genes. Because mouse is one of the best-studied animal models, the GAS algorithm (Zhang et al., 2013) was used to predict potential meiotic functional genes in Mus musculus (Supplementary Figure 4).
To verify the efficiency of GAS, we randomly separated the training data into two equal parts: one as a new training dataset and the other as a testing dataset. The model was split into three stages: stage 1 was constructed with features of the category "developmental stages," stage 2 included features from the categories "tissues and cell types" and "before and after treatment," and stage 3 included features of the category "gene disturbance." As shown in Supplementary Figure 5, the performance of the GAS models improved as more features were added. Then, based on the experimentally verified meiotic genes and gene expression data, 590 mouse genes with experimentally verified function in meiosis were used as the positive training dataset. The negative training dataset contained 5,868 genes from MGI (Mouse Genome Informatics; http://www.informatics.jax.org/) whose knockout mice did not show any abnormalities in the reproductive system. The 394 features used for GAS construction and prediction were extracted from 85 microarray datasets (Supplementary Table 2).
Ultimately, 165 candidate genes (GAS probability > 0.5) with a potential role in meiosis were identified and annotated in MeiosisOnline (see text footnote 8). For the candidate genes, information that implicates their function in meiosis, including gene expression, protein localization, structure, and protein interactions, is included in MeiosisOnline.
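The evaluation idea described above (split the labeled genes in half, then judge feature sets by AUC) can be sketched generically. The code below is a plain greedy forward feature selection driven by AUC; it is not the published GAS implementation (Zhang et al., 2013), whose details are not given in this article. The scoring rule (sum of the selected feature values), the stopping rule, and all function names are assumptions made for illustration.

```python
import random


def split_in_half(items, seed=0):
    """Randomly split items into two equal-sized parts (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def greedy_auc_stepwise(X, y, max_features=None):
    """Greedily add the feature that most improves AUC; stop when none does.

    X is a list of feature rows, y the 0/1 labels. A gene's score is the
    sum of its selected feature values (an illustrative assumption).
    """
    n_features = len(X[0])
    limit = max_features or n_features
    selected, best_auc = [], 0.5  # 0.5 is the chance-level baseline

    while len(selected) < limit:
        best_candidate = None
        for j in range(n_features):
            if j in selected:
                continue
            trial = selected + [j]
            scores = [sum(row[k] for k in trial) for row in X]
            trial_auc = auc(scores, y)
            if trial_auc > best_auc:
                best_auc, best_candidate = trial_auc, j
        if best_candidate is None:
            break  # no remaining feature improves the AUC
        selected.append(best_candidate)

    return selected, best_auc


if __name__ == "__main__":
    # Toy expression-like features for four genes (two positives, two negatives).
    X = [[0.9, 0.1, 0.5], [0.8, 0.3, 0.4], [0.2, 0.2, 0.6], [0.1, 0.4, 0.5]]
    y = [1, 1, 0, 0]
    print(greedy_auc_stepwise(X, y))
```

In a real evaluation the selection would be run on the training half and the resulting feature set scored on the held-out half, so that the reported AUC is not inflated by the selection step itself.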
We also performed GO enrichment analysis on both literature-reported genes in M. musculus (Supplementary Table 6) and GAS-predicted candidate genes (Supplementary Table 7). Using the whole genome as background, we calculated the enrichment of MGs in cellular components, biological processes, and molecular functions in R (hypergeometric distribution, p < 0.05) (Yu et al., 2012). Among all GO terms, 118 in biological processes, 1 in molecular functions, and 32 in cellular components overlapped between the reported and predicted gene sets (Figure 3A). Predicted genes enriched in overlapping GO terms are more likely to regulate meiosis; notably, meiotic cell cycle (GO:0051321) was enriched in both the reported and predicted gene sets (Supplementary Table 8).
Furthermore, as genes in meiosis are mostly regulated through networks, we mapped the PPIs among all of the genes and constructed a potential meiosis network with 1,083,566 reported PPIs among 26,569 proteins in MeiosisOnline. For example, Cct5 and Sf3a3, which have not been reported as meiotic genes, show very high connectivity with reported genes (Figure 3B). Further investigation of Cct5 and Sf3a3 may reveal how these two genes interact with reported genes and what the functions of these interactions are. Moreover, we found that reported genes such as Ccna2 interact widely with predicted genes (Figure 3C).
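For readers unfamiliar with the hypergeometric enrichment test mentioned above, the following sketch computes the upper-tail probability for a single GO term. The analysis in this study was run in R; this Python version is only an illustration, the function name `hypergeom_tail_p` is hypothetical, and the numbers in the example are made up.

```python
from math import comb


def hypergeom_tail_p(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric draw without replacement.

    population: genes in the background (e.g., whole genome)
    annotated:  background genes carrying the GO term
    sample:     genes in the query set
    hits:       query genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for `hits` or more annotated genes in the sample.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Illustrative numbers only: 20,000 background genes, 200 with the term,
    # 165 query genes, 10 of which carry the term.
    p = hypergeom_tail_p(20000, 200, 165, 10)
    print(f"p = {p:.3g}", "(enriched)" if p < 0.05 else "(not enriched)")
```

Exact integer binomials avoid floating-point underflow for small tails; when many GO terms are tested, a multiple-testing correction would normally be applied on top of the per-term p-values.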
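The notion of a candidate's connectivity with reported genes can be illustrated by counting, for each unreported gene, how many distinct reported genes it interacts with. This is a minimal sketch; the function name `connectivity_to_reported` and the toy edge list are hypothetical and do not reproduce the MeiosisOnline network.

```python
from collections import defaultdict


def connectivity_to_reported(ppi_edges, reported):
    """Count distinct reported interaction partners for each unreported gene.

    ppi_edges: iterable of (gene_a, gene_b) undirected interactions
    reported:  iterable of genes already reported as meiotic
    """
    reported = set(reported)
    neighbors = defaultdict(set)
    for a, b in ppi_edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)

    return {
        gene: len(partners & reported)
        for gene, partners in neighbors.items()
        if gene not in reported
    }


if __name__ == "__main__":
    # Toy network with placeholder names, for illustration only.
    edges = [
        ("cand1", "rep1"),
        ("cand1", "rep2"),
        ("cand2", "rep1"),
        ("rep1", "rep2"),
        ("cand1", "cand2"),
    ]
    degrees = connectivity_to_reported(edges, reported={"rep1", "rep2"})
    # Rank candidates by how many reported genes they touch.
    for gene, degree in sorted(degrees.items(), key=lambda kv: -kv[1]):
        print(gene, degree)
```

Storing neighbors as sets deduplicates repeated edges, so an interaction listed in several source datasets is counted once.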

DISCUSSION
Studies on animal models, especially genetically modified mice, have revealed many critical regulators involved in meiosis (Marston and Amon, 2004; Handel and Schimenti, 2010; Robert et al., 2016; Jiang et al., 2017, 2018); however, the information on these meiotic genes is scattered among thousands of papers. It is therefore difficult to collect and compare information on meiotic genes across species. Here, based on manual curation of meiosis-related genes from the literature, we developed MeiosisOnline, the first comprehensive database focusing on meiosis. Because meiosis is a fundamental process of gametogenesis, systematic annotation of meiotic genes is important for guiding further experimental studies. Currently, only a few databases provide information related to meiosis, and their features are limited. Some are only repositories of gene expression data, such as GermOnline 4.0 (Lardenois et al., 2010), SpPress (Vibranovski et al., 2009), and GermSAGE (Lee et al., 2009), and are of limited use for obtaining information on experimentally verified function. Others are limited to a specific species or a certain biological process, such as SpPress, which focuses only on spermatogenesis in Drosophila (Vibranovski et al., 2009); ReCGiP, which focuses on reproduction in pig (Yang et al., 2010); and MeioBase, which focuses on meiotic genes in plants (Li et al., 2014). In contrast, MeiosisOnline provides detailed and comprehensive information and annotation of meiotic genes, including basic information, functional annotation and classification, PPI, and gene expression. With this diverse information, users can easily access detailed functional information on meiosis-associated genes.
Additionally, among the 2,300 genes that are predominantly expressed in the testis (Schultz et al., 2003), the function of most in meiosis is still unclear. In MeiosisOnline, based on literature-reported genes and genome-wide transcriptional data from ArrayExpress, 165 mouse genes (GAS probability > 0.5) are predicted to be involved in meiosis. Homologous recombination is a basic feature of meiosis (Zickler and Kleckner, 2015), and when we performed GO enrichment analysis on the genes predicted by MeiosisOnline, double-strand break repair via homologous recombination (GO:0000724) was one of the most enriched GO terms (Supplementary Table 8). This implies that these predicted genes may function during meiosis and that further functional studies in animal models should be conducted.
What is more, MeiosisOnline can support studies focusing on complex molecular and/or signaling networks in meiosis. During meiosis, some genes are regulated through networks; for example, deletion of Hdac1 or Hdac2 alone did not affect meiosis, whereas their combined deletion resulted in meiotic arrest and subsequent oocyte depletion (Ma et al., 2012). Hence, mapping PPIs among known and predicted genes is useful for uncovering novel regulatory mechanisms of meiosis.
In summary, MeiosisOnline is the first specialist database on meiosis. It not only provides comprehensive information on experimentally verified meiotic genes but also predicts genes that may function in meiosis. It should be a helpful resource for researchers seeking new insights into meiosis.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
YZ and QS conceived and supervised the project. XJ, HZ, AA, and WL collected the data from the literature and ArrayExpress, as well as positive and negative training data for the prediction of potential meiotic functional genes. DZ developed the web interface. XJ, DZ, YZ, and QS wrote the manuscript. JW reviewed the manuscript. All authors contributed to the article and approved the submitted version.