GlioMarker: An integrated database for knowledge exploration of diagnostic biomarkers in gliomas

Gliomas are the most frequent malignant and aggressive tumors in the central nervous system. Early and effective diagnosis of glioma using diagnostic biomarkers can prolong patients’ lives and aid in the development of new personalized treatments. Therefore, a thorough and comprehensive understanding of the diagnostic biomarkers in gliomas is of great significance. To this end, we developed the integrated, web-based database GlioMarker (http://gliomarker.prophetdb.org/), the first comprehensive database for knowledge exploration of glioma diagnostic biomarkers. In GlioMarker, accurate information on 406 glioma diagnostic biomarkers from 1559 publications was manually extracted, including biomarker descriptions, clinical information, associated literature, experimental records, associated diseases, statistical indicators, etc. Importantly, we integrated many external resources so that clinicians and researchers can further explore knowledge on these diagnostic biomarkers in three ways: (1) obtaining more ontology annotations of the biomarker; (2) identifying the relationship between any two or more components of diseases, drugs, genes, and variants to explore the knowledge related to precision medicine; and (3) exploring the clinical application value of a specific diagnostic biomarker through online analysis of genomic and expression data from glioma cohort studies. GlioMarker provides a powerful, practical, and user-friendly web-based tool that may serve as a specialized platform for clinicians and researchers by providing rapid and comprehensive knowledge of glioma diagnostic biomarkers, thereby facilitating high-quality research and applications.

KEYWORDS gliomas, biomarker, diagnostic, database, knowledge exploration

Introduction
Gliomas, arising from the glial support cells within the brain, are classified into four grades (World Health Organization [WHO] grades I-IV) and are the most common primary tumors of the central nervous system (CNS) (1). Histologically, gliomas include ependymomas, astrocytomas (including glioblastoma [GBM]), oligodendrogliomas, mixed gliomas, and a few others, such as optic nerve and brain stem gliomas (2), and they exhibit considerable variability in age of onset, grade of severity, and ability to progress and metastasize (3). Due to the absence of effective diagnostic strategies, patients mainly rely on neurological examination and neuroimaging methods performed when the disease is already at an advanced stage, especially in glioblastoma (4). Therefore, early and efficient diagnosis is essential for implementing precise therapy, which is crucial for prolonging patients' lives and improving their quality of life (5).
Effective medical practice depends on high-fidelity and technically suitable diagnostic biomarkers to accurately diagnose diseases and conditions (6). In recent decades, many researchers have focused on the study of new diagnostic biomarkers and their functions in gliomas (7,8). In a landmark change in 2016, the WHO classification of gliomas used molecular parameters in addition to histology to define many tumor entities for the first time, significantly improving the accuracy of tumor diagnosis (9). Thus, a thorough and comprehensive understanding of the existing diagnostic biomarkers in gliomas and the identification of more valuable potential biomarkers are of great significance in the era of precision oncology (5).
Although outstanding achievements have been made in biomarker research for glioma diagnosis, many challenges still exist. (i) Due to the lack of systematic knowledge, the molecular mechanisms of glioma pathogenesis, malignancy, and clinical aggressiveness remain unclear. (ii) With the rapid increase in publications, it has become a complex, time-consuming, and challenging process for biologists or clinicians to mine crucial diagnostic biomarkers and obtain knowledge from the multifarious literature and data sources with corroborative analysis. (iii) With high-throughput sequencing and multi-omics technology, glioma molecular characteristics have gradually been understood (10). Therefore, integrating glioma-related omics data with biomarker information from the literature to mine their biological functions and regulatory mechanisms through online analysis is worthy of further exploration. (iv) Although several biomarker databases have been established, such as the Tuberculosis Biomarker Database for tuberculosis (11), LiverCancerMarkerRIF (12) and CancerLivER (13) for liver cancer, ExoBCD (14) for exosomal biomarkers in breast cancer, and CBD (15) for colorectal cancer, no diagnostic biomarker database for gliomas has been reported for public use, which highlights the need for such a resource in the research and clinical application of biomarkers for glioma diagnosis.
In this study, we developed an integrated, web-based database, GlioMarker (http://gliomarker.prophetdb.org/), to overcome the above limitations. GlioMarker is part of the Prophet project, which aims to establish a cancer knowledge map ecosystem. In the current GlioMarker (version 1.0), accurate information on 406 glioma diagnostic biomarkers from 1559 publications (May 1989 to May 2022), including biomarker descriptions, clinical information, associated literature, experimental records, associated diseases, statistical indicators, etc., was manually extracted. To better understand the biological significance of these biomarkers, we integrated many external resources to provide clinicians and researchers with capabilities for exploring ontology, precision medical knowledge, and genomic and expression data of glioma patients. As a powerful and practical tool with a user-friendly interface, GlioMarker can provide rapid and comprehensive knowledge of glioma diagnostic biomarkers for clinicians and researchers, facilitating high-quality research and applications. In the future, multiple types of biomarkers will be covered in GlioMarker version 2.0.

Materials and methods
The flowchart of GlioMarker construction is shown in Figure 1, including literature-based mining and the establishment of biomarker knowledge exploration. The second part includes ontology annotations, precision medical knowledge exploration, and genomic and expression data exploration. The details are described in the following sections.

Literature survey and data selection criteria
To guarantee high-quality data collection, all the data for GlioMarker were collected from the public database PubMed (https://pubmed.ncbi.nlm.nih.gov/) by manual text mining as follows (Figure 2). First, a detailed keyword search of the literature was performed in PubMed. We then filtered these articles according to the following specifications:
(1) Only full publications were considered, whereas case reports, communication letters, comments, and review articles were excluded (1334 were retained).
(2) After reading the titles and abstracts, articles that did not address a diagnostic biomarker relationship were excluded (621 retained).
(3) Articles without the necessary data were excluded (500 retained).
Figure 1. The flow chart of GlioMarker construction.
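The three exclusion steps above can be pictured as a chain of filters, each applied to whatever survived the previous one. The following is only a minimal sketch under stated assumptions: the `Article` fields, the `screen` function, and the four toy articles are hypothetical and are not taken from the GlioMarker pipeline, in which these decisions were made manually by curators.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # All fields are hypothetical stand-ins for judgments made by human curators.
    pmid: str
    publication_type: str
    reports_diagnostic_biomarker: bool
    has_required_data: bool


# Publication types excluded in step (1) of the text.
EXCLUDED_TYPES = {"case report", "letter", "comment", "review"}


def screen(articles):
    """Apply the three exclusion steps in order and record how many articles remain."""
    steps = [
        ("full publications only", lambda a: a.publication_type not in EXCLUDED_TYPES),
        ("diagnostic biomarker relationship", lambda a: a.reports_diagnostic_biomarker),
        ("necessary data present", lambda a: a.has_required_data),
    ]
    remaining = list(articles)
    counts = []
    for label, keep in steps:
        remaining = [a for a in remaining if keep(a)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Toy input with placeholder identifiers (not real PMIDs).
toy = [
    Article("A1", "article", True, True),
    Article("A2", "review", True, True),
    Article("A3", "article", False, True),
    Article("A4", "article", True, False),
]
kept, counts = screen(toy)
```

Recording the count after each step mirrors the way the retained numbers (1334, 621, 500) are reported for each stage of the screening.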

Information extraction
We manually curated the 500 articles by reading the full text, and each article was reviewed by two independent curators. Their curated information was then integrated. Other members of our team routinely performed additional quality checks.
Information about the diagnostic biomarkers was adequately extracted based on 33 fields, covering biomarker descriptions, clinical information, associated literature, experimental records, associated diseases, statistical indicators, etc. Importantly, we manually sorted out the knowledge points of each biomarker, which were defined as the original results of the biomarker. The data dictionary for the 33 fields is presented in Supplementary Table S1.
It is worth noting that if a biomarker was reported in several publications, multiple corresponding records were included in GlioMarker. After quality control checks, 406 diagnostic biomarker entries were finally included in GlioMarker.
Figure 2. The pipeline of literature curation.
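One way to picture the integration of two independent curations, and the retention of one record per publication, is sketched below. This is an illustrative assumption rather than the procedure actually used: the field names (`biomarker`, `source`, `auc`, `ref`), the marker name `MARKER_X`, and both helper functions are hypothetical, and the 33 real fields are defined in Supplementary Table S1.

```python
from collections import defaultdict


def reconcile(curation_a, curation_b):
    """Merge two curators' field dictionaries and list the fields on which they disagree."""
    merged, conflicts = {}, []
    for field in sorted(set(curation_a) | set(curation_b)):
        value_a, value_b = curation_a.get(field), curation_b.get(field)
        if value_a == value_b:
            merged[field] = value_a
        else:
            conflicts.append(field)  # left unresolved for an additional quality check
    return merged, conflicts


def group_by_biomarker(records):
    """Keep every publication-level record, grouped under its biomarker name."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["biomarker"]].append(record)
    return dict(grouped)


# Toy curations of one hypothetical article by two curators.
curator_a = {"biomarker": "MARKER_X", "source": "Serum", "auc": 0.81}
curator_b = {"biomarker": "MARKER_X", "source": "Serum", "auc": 0.80}
merged, conflicts = reconcile(curator_a, curator_b)

# Two toy records for the same hypothetical biomarker from different publications.
records = [
    {"biomarker": "MARKER_X", "ref": "study-1"},
    {"biomarker": "MARKER_X", "ref": "study-2"},
    {"biomarker": "MARKER_Y", "ref": "study-3"},
]
grouped = group_by_biomarker(records)
```

In this sketch a disagreement is never resolved automatically; it is only flagged, which matches the idea of a further manual check by other team members.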

The establishment of biomarker knowledge exploration
We integrated many external resources to provide GlioMarker with further knowledge exploration functions of diagnostic biomarkers obtained from the literature mining.
(1) Ontology exploration. In this function, users can obtain more ontology information about the biomarker. Ontology annotations were supported by HUGO (16), NCBI Gene (17), and Ensembl BioMart (18), among other resources.
(2) Precision medical knowledge exploration. In this function, users can find the relationship between any two or more components of diseases, drugs, genes, and variants to explore the knowledge of precision medicine, which was supported by PreMedKB (28).
(3) Omics data exploration. With this function, users can explore the clinical application value of a specific diagnostic biomarker through online analysis of genomic and expression data from glioma cohort studies. The visualization of genomic and expression data was based on the customized cBioPortal (29) and GEPIA (30) platforms, respectively.

Database construction
GlioMarker was constructed with PostgreSQL (10.0), Django, and Python (3.8). HTML, CSS, JavaScript, and Vue.js were used to build the web interface. Nginx was selected as the HTTP server. The web service was deployed on the CentOS (7.5.1804) operating system.
The receiver operating characteristic (ROC) curve, the area under it (AUC), sensitivity, and specificity are essential to globally assess the diagnostic performance of a biomarker in clinical use (31). Thus, in the process of literature extraction, these relevant indicators were included in GlioMarker. Among them, 81 biomarkers have been evaluated by ROC, and the AUC values were all greater than 0.6. Importantly, we have assigned evidence levels to the biomarkers in the GlioMarker database. A total of 4 levels of evidence were defined, namely "Biomarker has been approved by the FDA", "Biomarker is validated in clinical trials", "Biomarker is validated in preclinical research (in vitro or in vivo models)", and "Biomarker is putative one based on data analysis", to indicate the robustness of the biomarkers. For each study, we manually compiled relevant knowledge about the biomarker, including its up regulators, targets, and knowledge points. In total, 1718 knowledge points, 16 up regulators, and 84 targets were retrieved.
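For readers less familiar with these indicators, the sketch below shows how sensitivity, specificity, and AUC can be computed from raw measurements. It assumes a binary case/control labelling and that higher scores indicate disease; the function names and the six toy scores are illustrative only and do not come from any study in GlioMarker. The AUC is computed through its rank interpretation, that is, the probability that a randomly chosen case scores higher than a randomly chosen control.

```python
def sensitivity_specificity(scores, labels, threshold):
    """labels: 1 = case, 0 = control; a score >= threshold is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, labels):
    """AUC as the fraction of case/control pairs ranked correctly (ties count as 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


# Toy data: three cases followed by three controls.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
auc = roc_auc(scores, labels)
```

Sensitivity and specificity depend on the chosen cutoff, whereas the AUC summarizes performance over all cutoffs, which is why both kinds of indicator are recorded when a study reports them.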
Some diagnostic biomarkers that were reported multiple times may have more application prospects, and multiple corresponding records are provided in GlioMarker. Four different biomarker types are involved, among which GFAP (32-38) and miR-21 (39-45) were reported in more than 7 different publications (Table 2).

Web interface
To allow users to better explore the glioma diagnostic biomarkers, an integrated GlioMarker with a user-friendly web interface was constructed. Database navigation occurs based on a set of menus, including HOME, BIOMARKER, DOWNLOAD, CURATION, ABOUT, and FEEDBACK.
A navigation menu and a data summary block are presented on the HOME page (Figure 3A). Information on 406 biomarkers, 1718 knowledge points, 12 ontology annotations, and 3 integrated external resources is displayed in the data summary block.
On the BIOMARKER page, the list of diagnostic biomarkers is provided (Figure 3B). The biomarker list has a filter function, and the tabular information can be customized to allow users to build specific search strategies to suit different study designs. Keyword search accepts different types of names and accession IDs as search queries. Moreover, five subpages can be accessed by clicking on the name of a biomarker to facilitate further knowledge exploration: Curation, Ontology, Knowledge, Genomic Data, and Expression Data. On the Curation subpage, biomarker information extracted from the literature is displayed in detail and classified as "General", "Clinical", "Experimental", "Disease", "Statistics", etc. (Figure 3E). The Ontology subpage allows users to obtain more ontology annotations about the biomarker from other resources (Figure 3F). The Knowledge subpage displays precision medicine exploration results on the relationships between diseases, drugs, genes, and variants (Figure 3G). The Genomic Data and Expression Data subpages provide online analysis of genomic and expression data from glioma cohort studies to explore the clinical application value of a specific diagnostic biomarker (Figures 3H, I).
On the CURATION page, the main contents of the diagnostic biomarker-related publications are summarized. The original articles are linked to PubMed via PMID (Figure 3D).
The biomarker list and related documentation are available on the DOWNLOAD page (Figure 3C). Help documents and our contact information are available on the ABOUT page. Finally, on the FEEDBACK page, users can provide comments or suggestions about GlioMarker.

Case study
Here, we first use GFAP as a case to illustrate biomarker knowledge exploration in GlioMarker. When a user enters "GFAP" in the search box, seven records of GFAP studies published from 2007 to 2021 are returned (Figure 4A). The results reveal that GFAP was significantly elevated in the plasma (33,36) and serum (34,35,37) of glioblastoma (GBM) patients. Serum GFAP levels were able to distinguish GBM from non-GBM patients, with a maximum AUC of 0.9 (34). In addition, the expression pattern of GFAP-d can also be used as a histopathological diagnostic biomarker for spinal astrocytoma (32) (Figure 4B). Users can further explore knowledge of GFAP through the three knowledge exploration functions.
(1) In ontology exploration, users can access other external resources by using the hyperlink to obtain more ontology annotations (Figure 4C).
(2) Precision medical knowledge exploration results showed many variants in GFAP, and some were significantly related to Alexander disease (Figure 4D).
(3) Genomic data exploration revealed GFAP copy number alterations, networks, pathway reports, etc., at the genome level (Figure 4E). Among them, pathway reports indicate that GFAP is involved in autophagy signal transduction mediated by molecular chaperones, which may provide insights for further research on the mechanism of GFAP in the occurrence and development of gliomas. Moreover, the exploration of expression data shows that GFAP is highly expressed in low-grade glioma (LGG) patients, and this high expression is associated with a lower survival rate (Figure 4F).
In addition, GlioMarker has incorporated biomarkers that allow users to differentiate between low- and high-grade disease. For example, CHI3L1 (alias symbol: YKL-40) was highly differentially expressed in high-grade glioma (HGG) tissue (46), and this protein can also be monitored in patients' serum to help confirm the absence of active disease in GBM (47). Alterations in the serum levels of Galectin-1 (48), miR-766-5p, and miR-376b-5p (49) and in the tissue levels of ALDH1A1 (50) and WEE1 (51) might also be used as auxiliary diagnostic indicators of HGG. Moreover, users can also find potential LGG diagnostic biomarkers in GlioMarker. For example, the serum anti-FLNC autoantibody (52), the level of which was significantly higher in low-grade glioma patients than in high-grade glioma patients or in normal volunteers, represents a potential serum biomarker for the early diagnosis of LGG. In addition, HLA-DRA (53) and Fam20C (54) are also promising biomarkers for LGG diagnosis and prognosis.
GlioMarker can also aid in the identification of biomarkers suitable for liquid biopsies. By filtering "Source" in the columns and checking "Serum" and "Plasma", 66 relevant studies were identified, and most of these studies focus on circulating miRNAs. Taking miR-210 as an example, it can be robustly detected in exosomes (55), serum (56), and plasma (57) to distinguish patients with glioma from healthy controls, and it may be a promising diagnostic and prognostic biomarker. Additionally, through precision medical knowledge exploration, we found that miR-210 was also related to other diseases, such as siderosis (58) and pediatric osteosarcoma (59).
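The "Source" filter described above amounts to keeping only the records whose sample source is among the checked values. A minimal sketch follows, assuming each record exposes a `source` field; the field name, the `filter_by_source` helper, the "Exosome" and "Tissue" values, and the placeholder `MARKER_T` record are assumptions for illustration, not the database's actual column values.

```python
def filter_by_source(records, sources):
    """Return the records whose 'source' matches any checked value (case-insensitive)."""
    wanted = {s.lower() for s in sources}
    return [r for r in records if r["source"].lower() in wanted]


# Toy records: the miR-210 rows follow the sources named in the text,
# while MARKER_T is a hypothetical tissue-based record.
records = [
    {"biomarker": "miR-210", "source": "Exosome"},
    {"biomarker": "miR-210", "source": "Serum"},
    {"biomarker": "miR-210", "source": "Plasma"},
    {"biomarker": "MARKER_T", "source": "Tissue"},
]

liquid_biopsy = filter_by_source(records, ["Serum", "Plasma"])
```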

A wide variety of biomarker types
GlioMarker focuses on glioma diagnostic biomarkers, which span a diverse range of biomolecule types. Biomarkers described in GlioMarker include proteins, miRNAs, lncRNAs, circRNAs, mRNAs, and DNA, as well as imaging, epigenetic, immunological, and metabolic features.

Comprehensive coverage of curated information
The original data of the diagnostic biomarkers were manually curated based on the full text. Each entry into GlioMarker was reviewed by two independent curators, and each biomarker was defined in the context of associated literature, experimental records, disease type, and clinical relevance, among other indicators. Therefore, information about the biomarkers in GlioMarker is more accurate and comprehensive. Importantly, these knowledge points are the actual results of the biomarker and help users quickly access the biomarker's research content, providing evidence for the potential clinical diagnostic biomarker from a contextual perspective. This feature represents a significant advantage that distinguishes GlioMarker from other databases and will help construct reliable knowledge graphs in the future. GlioMarker will be updated every 12 months.

Powerful biomarker knowledge exploration capabilities
By integrating many external resources, clinicians or researchers can further explore the knowledge of these diagnostic biomarkers from three aspects. (1) Obtain more ontology annotations.
(2) Identify the relationship between any two or more components of diseases, drugs, genes, and variants to explore information regarding precision medicine.
(3) Explore the clinical application value of a specific diagnostic biomarker through online analysis of genomic and expression data from glioma cohort studies.

User-friendly search methods
In GlioMarker, users can access the biomarker of interest through the list or keyword search. The biomarker list has a filter function, allowing users to build specific search strategies to suit different study designs. The keyword search feature accepts different types of names and accession IDs as search queries.
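A keyword search that accepts different names and accession IDs can be thought of as matching the query against every identifier attached to a record. The sketch below is a simplified assumption about how such a lookup might behave: the record layout, the `keyword_search` function, and the accession strings `ID-0001` and `ID-0002` are hypothetical placeholders, while the CHI3L1/YKL-40 alias pair is the one mentioned in the case study.

```python
def keyword_search(records, query):
    """Return symbols of records whose symbol, alias, or accession ID equals the query."""
    q = query.strip().lower()
    hits = []
    for record in records:
        terms = [record["symbol"], *record["aliases"], *record["accession_ids"]]
        if any(q == term.lower() for term in terms):
            hits.append(record["symbol"])
    return hits


# Toy index; accession IDs are placeholders, not real database identifiers.
records = [
    {"symbol": "CHI3L1", "aliases": ["YKL-40"], "accession_ids": ["ID-0001"]},
    {"symbol": "GFAP", "aliases": [], "accession_ids": ["ID-0002"]},
]

by_alias = keyword_search(records, "ykl-40")
by_accession = keyword_search(records, "ID-0002")
no_match = keyword_search(records, "unknown")
```

Exact, case-insensitive matching is chosen here only to keep the example short; a production search would more likely support partial matches as well.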

Direct data transfer and integration
GlioMarker allows users to extract the knowledge they are interested in through convenient downloads. Users can also submit their own biomarker curation data, prepared according to our template, for integration into GlioMarker.

Limitations and future perspectives
In line with the vision of predictive, preventive, personalized, and participatory medicine (P4 Medicine) (62-64), GlioMarker is a database focused on diagnostic biomarkers in glioma. While considerable effort has been made by our curation team to capture all the relevant information, some biomarkers or newly emerging molecular biomarker types have undoubtedly been missed. Similarly, although some of the included diagnostic biomarkers also have prognostic, therapeutic, or predictive value, GlioMarker currently does not include a sufficient number of prognostic, predictive, therapeutic, adverse drug effect, or drug efficacy biomarkers.
We are actively exploring this area and expect that these biomarkers will be included in version 2.0 of the database. In addition, the literature related to biomarkers is growing rapidly, so we need to filter the literature more strictly before curation in the next version, for example, by focusing on higher-quality papers and on richer biomarker performance data (i.e., quantitative expression results, sensitivity, specificity, ROC curve, reproducibility, statistical significance, threshold or cutoff value).
The 2016 WHO classification of central nervous system tumors added molecular features to the histological diagnosis for the first time (9). With the rapid development of molecular oncology and the discovery of promising biological biomarkers, the 2021 edition adds more types/subtypes of tumors defined by biological and molecular characteristics and no longer reflects histological subtypes (65). We are inspired to witness these changes, and this is where GlioMarker's vision lies. However, there is still some controversy regarding the 2021 edition classification, and in-depth follow-up studies are still needed. Many researchers believe that histologic morphology is still the primary requirement for diagnosing diffuse glioma and that the diagnosis does not rely exclusively on molecular alterations. For example, diffuse astrocytomas with lower histologic grade and IDH wild type with TERT promoter mutation are classified as glioblastoma in the 2021 fifth edition classification. However, different histologic grades still have different prognoses (66). In addition, since the new classification system was only recently proposed, this change was not reflected in many previous studies. Therefore, the 2016 edition was still referenced in GlioMarker, and the classification of diseases according to histological morphology was retained. To improve the robustness and adaptability of GlioMarker, we plan to introduce new classification systems in subsequent versions, for example, by adding new fields to label the types of diseases classified according to the 2021 fifth edition and by using typing trees to help researchers move between the old and new editions of the classification.
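The idea of adding a new field for the 2021 classification while retaining the histology-based label can be sketched as follows. This is a hypothetical illustration, not a planned schema: the class, its field names, and the `annotate_2021` function are assumptions, and the only rule encoded is the single example given in the text above. It is not a complete or authoritative implementation of the 2021 classification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiseaseAnnotation:
    histology_2016: str               # histology-based label, retained as in version 1.0
    idh_wildtype: bool
    tert_promoter_mutated: bool
    who_2021: Optional[str] = None    # hypothetical added field for the 2021 edition


def annotate_2021(record: DiseaseAnnotation) -> DiseaseAnnotation:
    """Fill the hypothetical who_2021 field using only the example rule from the text."""
    if (
        record.histology_2016 == "diffuse astrocytoma"
        and record.idh_wildtype
        and record.tert_promoter_mutated
    ):
        record.who_2021 = "glioblastoma"
    return record  # records not covered by the example rule are left unlabelled


example = annotate_2021(DiseaseAnnotation("diffuse astrocytoma", True, True))
unchanged = annotate_2021(DiseaseAnnotation("oligodendroglioma", False, False))
```

Keeping both labels side by side, rather than overwriting the older one, lets studies curated under the 2016 edition remain searchable by their original disease names.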
In the era of precision medicine, the application of biomarkers is crucial. However, many biomarkers remain at the stage of scientific research and are still far from clinical application. Our Prophet project is committed to building a broad tumor biomarker platform that bridges scientific research and clinical applications, and GlioMarker aims to fill this role for glioma as a comprehensive biomarker platform.
In addition, most of the biomarkers contained in GlioMarker only have potential application value, and the underlying mechanisms for most diagnostic biomarkers remain unknown. Therefore, in the next version of GlioMarker, we will establish a knowledge map based on knowledge points, and more omics data will be integrated. The biomarkers in GlioMarker will also be used for further meta-analysis to obtain more robust evidence. We look forward to greater participation from the user community and to receiving feedback on GlioMarker version 1.0.

Conclusion
GlioMarker is the first diagnostic biomarker database in the field of gliomas to provide biomarker knowledge exploration, including comprehensive literature mining, ontology annotations, precision medical knowledge exploration, and online omics data analysis and visualization. GlioMarker may function as a professional platform for clinicians and researchers to promote the high-quality research and application of diagnostic biomarkers in gliomas as a powerful and practical web-based tool with a user-friendly interface.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Author contributions
ZR, JY, and YL, the main authors of the study, conceived the study and contributed to the writing. ZR, JY, YL, XL, and SS took part in designing and conducting the study. XC, ZM, SW, XL, SS, and ZR manually collected the literature. YG, SZ, MF, JL, and ZR curated the biomarker-related data. JY, YS, and YH developed the web interface. QC and ZC helped in the interpretation and analysis of data. TY contributed to the manuscript review, revision, and funding support. All of the authors read and approved the final manuscript.