- 1National Centre for Biological Sciences (TIFR), Bangalore, Karnataka, India
- 2Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- 3Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
Introduction: The translation of India's extensive traditional knowledge on indigenous medicinal plants into modern therapeutic solutions is contingent upon a systematic framework. While traditional Indian medicine offers a rich source of therapeutic leads, this knowledge is often not structured for modern computational analysis, creating a barrier to systematic drug discovery.
Methods: To this end, we present GRAYU, a curated and comprehensive online database that integrates data across multiple categories, connecting 1,039 traditional formulations to 12,743 indigenous plants, 129,542 phytochemicals, and 13,480 indicated diseases.
Results: GRAYU provides insights into 1,370,257 plant-phytochemical, 116,531 plant-disease, 2,389 plant-formulation, and 4,087 formulation-disease associations. We show potential applications on phytochemical analogs, sustainable plant substitution, and disease-network analysis, highlighting the potential of integrative graphs to decode shared molecular signatures and therapeutic networks across traditional Ayurvedic formulations.
Discussion: GRAYU represents a user-friendly resource for researchers to investigate complex bio-associations and formulate novel therapeutic hypotheses, with insights from traditional Indian medicine. GRAYU organises reported associations and computational relationships and can hint at mechanistic causality or biological activity; however, all outputs require contextual interpretation and further experimental validation. The database is available at https://caps.ncbs.res.in/GRAYU/.
1 Introduction
Traditional medical systems such as Ayurveda, practiced in India for more than 3,000 years, constitute a vast repository of empirical knowledge on medicinal plants, polyherbal formulations, and their therapeutic applications (Gurib-Fakim, 2006). Despite this long history and widespread usage, much of the information remains dispersed across classical texts, ethnobotanical surveys, and modern phytochemical studies, limiting its accessibility and integration into contemporary biomedical research. As natural products continue to play a critical role in drug discovery, with approximately 34% of approved small-molecule drugs in the last 4 decades being natural products (Newman and Cragg, 2020), natural product derivatives, or botanical drugs, there is a pressing need to digitize and standardize traditional medicine knowledge for computational and translational use.
A growing body of contemporary research continues to demonstrate the broad biological relevance of plant-derived phytochemicals, reinforcing the value of traditional medical systems as reservoirs of pharmacologically active molecules. For example, curcumin from Curcuma longa has been widely studied for its diverse effects across metabolic, inflammatory, and neurological pathways (Sureda et al., 2023), while numerous other herbs/phytochemicals have been shown to influence key physiological processes relevant to human health (Salehi et al., 2018). Such studies illustrate the depth of bioactive chemical diversity within traditional herbal knowledge and further highlight the importance of organizing this information within structured, interoperable digital frameworks.
In recent years, several databases have been developed to collate information on medicinal plants and their phytochemicals. For example, the IMPPAT 2.0 database provides a manually curated resource of over 4,000 Indian medicinal plants, nearly 18,000 phytochemicals, and more than 1,000 therapeutic uses. It offers chemical structures (2D and 3D), physicochemical and drug-likeness scores, and ADMET predictions (Vivek-Ananth et al., 2023). Similarly, OSADHI (Kiewhuo et al., 2023) is another platform aiming to digitize traditional Indian medicine, compiling phytochemical data for Indian medicinal plants and enabling chemoinformatic methods in natural-product drug development. Beyond the Indian traditional medicine sphere, integrative efforts like SymMap (Wu et al., 2019) have sought to combine Traditional Chinese Medicine (TCM) with modern medicine through symptom mapping and molecular mechanism links. Similarly, BATMAN-TCM (Liu et al., 2016; Kong et al., 2024) provides predicted TCM phytochemical–target protein interactions, while HERB (Fang et al., 2021; Gao et al., 2025) integrates clinical trials and meta-analyses with experimental evidence for TCM. These efforts highlight the growing importance of databases that connect traditional formulations to modern molecular and clinical data.
Despite these advances, important gaps remain. Many existing platforms emphasize plant–phytochemical or plant–therapeutic associations, but do not systematically incorporate classical polyherbal formulations, which are central to Ayurveda. In addition, Ayurvedic disease terms, often recorded in Sanskrit or regional languages, lack direct mapping to standardized biomedical vocabularies, creating barriers to integration with global biomedical resources. Finally, most existing interfaces provide limited functionality for multi-layered exploration, restricting users to linear searches rather than enabling graph-based traversals across plants, phytochemicals, formulations, and diseases.
To address these limitations, we present GRAYU, a graph-based resource for Ayurveda that integrates over 12,000 medicinal plants, more than 1,000 Ayurvedic formulations, nearly 130,000 phytochemicals, and over 13,000 diseases into a unified framework. GRAYU maps Ayurvedic nosology to modern disease ontologies to enable interoperability, while providing interactive graph visualization and advanced search pipelines to support complex queries. By combining curated associations with computational tools, GRAYU not only preserves traditional knowledge but also provides a platform for drug discovery and mechanistic exploration.
2 Materials and methods
2.1 Data curation
The GRAYU database was constructed by integrating extensive data from multiple sources, covering Ayurvedic formulations, medicinal plants, phytochemicals, and diseases. The curation process involved several key stages to ensure data accuracy, consistency, and comprehensive integration.
2.2 Curation of formulations
Formulations were obtained from the Ayurvedic Standard Treatment Guidelines, the Ayurvedic Pharmacopoeia of India and the Ayurvedic Formulary of India. The Ayurvedic Standard Treatment Guidelines are available on the website of the National Ayush Mission, Ministry of Ayush, Government of India, while the Ayurvedic Pharmacopeia and Formulary are available on the website of the Pharmacopeia Commission for Indian Medicine & Homeopathy, Ministry of Ayush, Government of India. While the Ayurvedic Standard Treatment Guidelines is the primary source of the data that maps formulations to diseases, the Ayurvedic Pharmacopeia and the Ayurvedic Formulary are the predominant sources for mapping the formulations to their constituent plants. Ayurvedic terms and their descriptions were adapted from Ayurvedic Standard Treatment Guidelines and Ayurvedic Pharmacopeia (Supplementary Table S1). The integration between the sources was done by assigning a unique standardised name based on the transliteration of the Sanskrit name, and a single node was created for multiple instances of the same formulation across both sources. Data integration in cases such as multiple mentions of a formulation, across sources, as well as standardising the transliteration of the Sanskrit names of the formulations, was done manually.
2.3 Curation of phytochemicals and medicinal plants
2.3.1 Phytochemical data consolidation
The phytochemical data was aggregated from several specialized databases, including CMAUP (Zeng et al., 2019; Hou et al., 2024), FooDB (FooDB, 2025), HMDB (Wishart et al., 2007; 2022), IMPPAT (Mohanraj et al., 2018; Vivek-Ananth et al., 2023), and NPASS (Zeng et al., 2018; Zhao et al., 2023). To effectively integrate the information from these varied sources, we established PubChem (Kim et al., 2025) as the foundational reference for consolidation.
First, a comprehensive mapping was created to link phytochemical records from the various source databases to their corresponding PubChem Compound IDs (CIDs). The core of our phytochemical dataset was then initialized using detailed records from PubChem. Subsequently, a script systematically processed each phytochemical entry from the source databases. For each entry, relevant data fields were extracted based on a predefined template that specified the path to the information within the original JSON structure.
To maintain data integrity, properties from the source databases were only added to a phytochemical’s record if that property was not already present from the PubChem data. This approach ensured that the high-quality, standardized information from PubChem was preserved while enriching it with additional data from other specialized resources. Values were cleaned to remove extraneous characters, and in cases where a property contained a list of values, they were concatenated into a single string. The final, consolidated phytochemical data was then written to a CSV file, forming the phytochemical nodes of our knowledge graph.
2.3.2 Medicinal plant data integration
Data for medicinal plants was curated and unified from three primary databases: CMAUP, IMPPAT, and OSADHI. The integration process was designed to create a single, comprehensive record for each unique plant species.
A script iterated through the JSON files from each of the plant databases. The scientific name of the plant was designated as the primary identifier for unifying records from different sources. For each plant, a master record was created, and properties from the various databases were progressively merged. To correct typographical errors in plant names, we mapped names to external taxonomic databases, Plants of the World Online (POWO) and World Flora Online (WFO), and merged obvious typographical variants among names that did not map to either database (Govaerts et al., 2021; WFO Team, 2021).
A predefined schema guided the extraction and merging process, distinguishing between unique properties (e.g., scientific name, taxonomic IDs), where only a single value is expected, and list-based properties (e.g., synonyms, common names), where multiple values are permissible. For list-based properties, values from different sources were appended and delimited by a semicolon. A specific function was also implemented to clean and standardize the scientific names to a consistent format (e.g., “Genus species”). This meticulous process of extraction, cleaning, and merging resulted in a unified dataset of medicinal plants, which was then exported to a CSV file to serve as the plant nodes in the graph. Furthermore, the script also parsed and extracted relationships to diseases and phytochemicals, which were written to separate edge files for constructing the knowledge graph.
2.4 Curation of disease nodes and ontologies
Disease ontologies were obtained from two primary sources: the Medical Subject Headings (MeSH; (Dhammi and Kumar, 2014)) database (https://www.nlm.nih.gov/mesh/) and the Disease Ontology (DOID; (Baron et al., 2024)) via the Ontology Lookup Service (McLaughlin et al., 2025) (https://www.ebi.ac.uk/ols4/ontologies/doid). From MeSH, only entries under the Diseases branch (MeSH Tree ID: C) were included, with each unique identifier treated as a distinct disease node. Similarly, disease and symptom identifiers from the DOID ontology were extracted and converted into corresponding nodes. To avoid redundancy, nodes were merged wherever DOID provided mappings to MeSH terms.
Two types of intra-category relationships were incorporated: “IS_SUBCLASS_OF” and “HAS_SYMPTOM.” Parent-child hierarchies were derived from the MeSH and DOID ontology trees, while “has symptom” relationships were taken directly from DOID. Finally, Ayurvedic descriptions of diseases and conditions were manually mapped to the most closely corresponding disease nodes, based on semantic similarity between Ayurvedic terminology and the standardized MeSH or DOID labels (Supplementary Table S2).
2.5 Knowledge graph construction
The GRAYU database is implemented using Neo4j, a native graph database, to effectively model and query the complex relationships between Ayurvedic formulations, medicinal plants, phytochemicals, and diseases. The construction process involved defining a graph schema with constraints and indexes, followed by the systematic loading of nodes and the creation of relationships from the curated CSV files. The core schema consists of four primary node labels: Phytochemical, Plant, Disease, and Formulation. To ensure data integrity and optimize query performance, unique constraints were established for primary identifiers: the PubChem CID for Phytochemical nodes, the standardized scientific name for Plant nodes, a unique disease node id for Disease nodes, and the formulation name for Formulation nodes. Additionally, indexes were created on frequently searched properties, such as phytochemical names, MESH and DOID identifiers, to accelerate data retrieval. Data was loaded into the graph in batches using the apoc.periodic.iterate procedure for efficient, parallelized processing. Each row from the respective CSV files was used to create a node with its properties meticulously mapped and type-cast (e.g., to integer, float, or boolean). Text fields containing multiple values delimited by semicolons were converted into list properties within the nodes. Once the nodes were loaded, relationships were created to connect them, forming the knowledge graph:
1. FOUND_IN: A relationship from a Phytochemical node to a Plant node, established by matching the phytochemical’s CID and the plant’s scientific name.
2. ASSOCIATED_WITH_DISEASE: A relationship from a Plant node to a Disease node, created by matching plants to MESH or DOID identifiers listed in the curated edge files.
3. IS_INGREDIENT_IN: A relationship from a Plant node to a Formulation node, with properties such as parts and quantity attached to the relationship itself.
4. ASSOCIATED_WITH: A relationship from a Formulation node to a Disease node, also including the corresponding ayurvedic_term as a property on the relationship.
5. IS_SUBCLASS_OF and HAS_SYMPTOM: Intra-disease relationships created by linking Disease nodes to each other based on the hierarchical and symptomatic information curated from the MESH and DOID ontologies.
This structured approach results in a robust and queryable knowledge graph, providing a comprehensive digital framework for exploring the multifaceted data of traditional Indian medicine.
2.6 Online database construction
The online database was constructed using a backend server built with the Flask web framework in Python. A data access layer was developed to encapsulate all database interactions, translating high-level application logic-such as paginated data retrieval for browsing, execution of simple and complex multi-step advanced searches, and fetching detailed information for individual entities into optimized Cypher queries executed via the official Neo4j Python driver. The front-end was constructed with standard web technologies, including HTML for structure, CSS for styling, and JavaScript for dynamic interactivity. For data visualization, graph-specific endpoints provide data formatted for the Cytoscape.js library, enabling users to interactively explore the network of relationships between entities.
3 Results
3.1 Statistics of the graph
The comprehensive data integration process culminated in the creation of GRAYU, a large-scale knowledge graph comprising 157,010 nodes and 1,520,687 relationships (edges). The phytochemicals constitute the largest set of nodes, underscoring the chemical diversity captured within the database. The most prevalent relationship is FOUND_IN (Phytochemical-Plant), followed by ASSOCIATED_WITH_DISEASE edges (Plant-Disease), representing the therapeutic links between plants and diseases. Other relationships, such as ASSOCIATED_WITH (Formulation-Disease) and IS_INGREDIENT_IN (Plant-Formulation), establish the core therapeutic framework of Ayurveda, while hierarchical relationships like SUBCLASS_OF provide ontological depth to the disease classifications.
The overall architecture of these connections is visually summarized in the meta-graph shown in (Figure 1). The schematic illustrates the primary node types and the directed edges that define the relationships between them, such as a plant being an ‘ingredient in' a formulation, which in turn ‘is associated with’ a disease. This high-level view provides a clear conceptual framework of the GRAYU database, representing how multilayered traditional knowledge has been structured and interconnected within our system.
Figure 1. The meta-graph of the database which shows the types of nodes present, and the relationships which exist between them.
3.2 Website features and use cases
The GRAYU web portal provides a user-friendly interface for querying, browsing, and visualizing the integrated knowledge graph (Figure 2). The platform is designed to support a range of enquiries, from simple keyword searches to complex, multi-step queries, facilitating the exploration of relationships between formulations, plants, phytochemicals, and diseases.
Figure 2. The homepage of the GRAYU web portal, presenting a navigation bar, a search bar and summary statistics of the database.
3.2.1 Data access and search functionalities
Users can access the data through several mechanisms. A quick search bar, available throughout the site, allows for keyword searches across all four categories (Formulations, Plants, Phytochemicals, and Diseases) and accepts scientific names, common names, and synonyms. Alternatively, a browse functionality presents the data in paginated, sortable tables for each category. Each entity in the database has a dedicated detailed results page, which presents its specific properties and associated connections. For example, a formulation page displays its ingredients (herbs) and indicated diseases, while a plant page lists its constituent phytochemicals, associated diseases, and inclusion in formulations. These pages also feature an interactive graph visualization of the entity’s immediate network connections. For more complex inquiries, an advanced search page enables the construction of multi-step, filtered queries. Users can build a search pipeline where the results from one query serve as the input for the next, allowing for the progressive filtering and exploration of interconnected data. This feature supports the application of multiple filters based on the properties of both the primary entities and their connected nodes.
3.2.2 Interactive network visualization
A dedicated graph page offers a powerful tool for dynamic network exploration. Users can search for any entity and render it as a central node in an interactive graph. The initial display can be customized to show a specific number of connections, which can then be dynamically expanded. Right-clicking on any node reveals options to expand its connections, either by a specific category (for instance, to show all phytochemicals for a plant) or by showing all of its relationships. A properties panel provides detailed information for any selected node, and the interface includes standard controls for navigation and visualization management.
3.3 Applications of the GRAYU database
To illustrate the utility of GRAYU, three potential applications are presented, each highlighting a distinct application: (i) identification of molecular analogs through cross-species mapping, (ii) selection of sustainable substitutes for threatened plants, and (iii) Graph-based analysis of Ayurvedic formulations associated with Anemia. Together, they demonstrate the potential of GRAYU to integrate traditional pharmacology with molecular, ecological, and systems-level evidence.
3.3.1 Molecular analogs of benzylamine in ayurvedic medicinal plants
To illustrate how GRAYU can be applied for metabolite-centered exploration using curated knowledge, we focus on benzylamine (also known as moringine), a small metabolite reported from Moringa oleifera. Benzylamine has been discussed in the literature in the context of antidiabetic activity, with reports linking it to metabolic enzyme pathways involving dipeptidyl peptidase-IV (DPP-IV), α-glucosidase, and α-amylase (Shafi et al., 2022). We focused on this potent compound for demonstrating how known bioactive metabolites can be contextualized within an integrative knowledge graph. Rather than treating benzylamine as an endpoint, GRAYU enables its use as a starting node to explore broader chemical neighbourhoods and related molecular entities captured across multiple data sources.
Starting from benzylamine, GRAYU allows systematic traversal across interconnected layers of information, including its distribution across medicinal plants, associations with metabolic disease annotations, and links to structurally related metabolites curated from phytochemical databases (Supplementary Table S3). This graph-based exploration reveals that benzylamine analogs are present in several plant species beyond Moringa oleifera, highlighting alternative botanical sources and related molecular scaffolds that share chemical similarity with a literature-supported antidiabetic compound. By integrating metabolite occurrence, plant associations, and disease-linked annotations within a unified framework, this application demonstrates how GRAYU can be used to organize existing evidence, identify structurally related candidates, and support prioritization of analogs for future comparative or experimental studies. The example emphasizes the role of GRAYU as an exploratory and integrative platform for navigating curated metabolite-centric knowledge across traditional medicine resources.
3.3.2 Replacement strategy for threatened medicinal species
In Ayurveda, substitution of medicinal plants (Abhava Pratinidhi Dravya) is practiced when a herb becomes scarce or unavailable. Using GRAYU, we examined the substitution pair Myristica fragrans (nutmeg) and S. aromaticum (clove) closely, where the aril of M. fragrans (mace) is traditionally replaced by Syzygium aromaticum (Venkatasubramanian et al., 2010). GRAYU searches identified 155 shared phytochemicals, including eugenol, isoeugenol, myristicin, elemicin, 4-terpineol, camphene, β-caryophyllene, quercetin, kaempferol, ursolic acid, stigmasterol, dehydrodieugenol, p-cymene, β-phellandrene and linalool and 58 common disease associations, encompassing metabolic, inflammatory, and infectious disorders. While M. fragrans is listed as data deficient on the IUCN Red List, its wild relative Myristica malabarica is classified as vulnerable, highlighting the need for sustainable alternatives. Comparative graph exploration revealed 66 shared phytochemicals and 15 overlapping disease associations between M. malabarica and S. aromaticum, validating the latter as a promising substitute based on both biochemical and therapeutic similarity (Supplementary Table S4; Figure 3A). This case underscores GRAYU potential for conservation-driven substitution analysis, enabling evidence-based selection of ecologically viable alternatives without compromising on therapeutic equivalence. However, detailed comparisons of each feature, such as dosage levels, need to be carefully considered for replacement purposes.
Figure 3. GRAYU graph-based case studies illustrating phytochemical, plant, and disease interconnections. (A) Replacement strategy for threatened species: graph depicting the biochemical and therapeutic overlap between Myristica malabarica (vulnerable species) and Syzygium aromaticum, sharing multiple phytochemicals and 15 common disease associations including arthritis, asthma, cardiovascular disorders, and diabetes. (B) Network pharmacology of anemia: multi-layered network connecting 17 Ayurvedic formulations (blue), 10 hub plants (green), associated diseases (red) and phytochemicals (orange). The network highlights molecular convergence between Moringa oleifera and Trigonella foenum-graecum and identifies shared antioxidant and hematinic phytochemicals underlying anti-anemic formulations.
3.3.3 Graph-based network analysis of anemia
Two medicinal plants, also used as traditional foods, M. oleifera and Trigonella foenum-graecum are popularly used for diabetes management (Pasha et al., 2020; Naika et al., 2022). Next, we demonstrate how GRAYU can be used to identify shared plants and metabolites across Ayurvedic formulations associated with Anemia through graph-based exploration Both plants are widely utilized in traditional medicine for metabolic and hematinic disorders and are rich in overlapping flavonoids and phenolic acids such as quercetin, kaempferol, rutin, β-sitosterol, and ferulic acid. These metabolites are well documented for their antioxidant, anti-inflammatory, and glucose-lowering activities, supporting the traditional co-usage of these plants in formulations targeting diabetes, arthritis, obesity, and anemia (Pasha et al., 2020; Naika et al., 2022). The integrative graph of GRAYU highlighted these shared phytochemicals, which might suggest that such sets of chemicals can yield functionally redundant or synergistic therapeutic effects (Figure 3B; Supplementary Table S6), as two plants with differing phytochemical composition share therapeutic activity. This overlap supports the Ayurvedic concept of therapeutic equivalence and indicates potential synergistic or complementary use of the two herbs in multi-target formulations addressing metabolic and inflammatory disorders.
Extending from this plant-pair intersection to a disease-centric network, Anemia (Pandu Roga) was selected as a representative condition to explore multi-herbal connectivity. GRAYU identified 17 Ayurvedic formulations associated with Anemia, comprising nearly 90 unique medicinal plants traditionally prescribed for Rakta-vardhana (blood enrichment). Network centrality analysis revealed ten recurrent hub species, including Piper longum, Terminalia chebula, Phyllanthus emblica, Terminalia bellirica, and Zingiber officinale, that form the structural backbone of anti-anemic formulations (Supplementary Table S5; Supplementary Table S6). These plants possess well-established hematopoietic, antioxidant, and Rasayana properties, experimentally validated for enhancing iron absorption, erythropoiesis, and oxidative stress regulation. Further expansion to phytochemical nodes revealed 20 core metabolites, notably β-sitosterol, α-tocopherol, gallic acid, palmitic acid, stearic acid, and rutin, that collectively define a shared biochemical scaffold mediating hematinic and metabolic homeostasis (Kim and Nemeth, 2015; Cotoraci et al., 2021). The network suggests that Ayurvedic formulations for Anemia are systematically organized molecular ecosystems, designed to act through synergistic modulation of metabolic and redox pathways. GRAYU, thus, provides a computational framework to visualize, quantify, and rationalize these cross-layer interactions, bridging traditional therapeutic logic with systems pharmacology.
4 Discussion
The GRAYU database provides a unified framework for systematically connecting the diverse layers and categories of Ayurvedic knowledge with contemporary biomedical ontologies. The integration of over 12,000 medicinal plants, nearly 130,000 phytochemicals, more than 1,000 Ayurvedic formulations and over 13,000 diseases in one platform represents one of the largest resources to date focused on traditional Indian medicine. The scale and breadth of the associations, spanning plant–phytochemical, plant–formulation, formulation–disease and plant–disease connections, highlight the potential of Ayurveda as a knowledge system that can be interrogated using modern computational tools.
In comparison with other existing resources, GRAYU expands the landscape of natural product databases in important ways. Earlier efforts such as IMPPAT 2.0 (Vivek-Ananth et al., 2023) and OSADHI (Kiewhuo et al., 2023) provided comprehensive collections of Indian medicinal plants and phytochemicals, but their focus remained largely at the level of plant–phytochemical associations without embedding them within the context of classical formulations or mapping to standardized disease vocabularies. Similarly, global resources such as HMDB (Wishart et al., 2022) and NPASS (Zhao et al., 2023) have provided metabolite- and activity-centric perspectives, while Chinese medicine repositories including SymMap (Wu et al., 2019), BATMAN-TCM 2.0 (Kong et al., 2024), HERB 2.0 (Gao et al., 2025), and ETCM v2.0 (Zhang et al., 2023) have created powerful examples of integrating traditional systems with modern targets and diseases. GRAYU complements and extends these approaches by creating a graph-based architecture that is specific to Ayurveda; and explicitly bridges Sanskrit nosology with MeSH and DOID disease ontologies. This mapping is particularly important as it enables interoperability between Ayurvedic descriptions and widely adopted biomedical standards, thereby opening avenues for cross-database analyses and translational research.
The statistical analysis of GRAYU demonstrates the richness of the network, revealing a highly interconnected system where many plants contain phytochemicals shared across multiple formulations, and where a single disease node may be targeted through numerous plant–formulation combinations. This property reflects the holistic and polyherbal principles of Ayurveda, while at the same time, creating opportunities for network pharmacology approaches to identify convergent mechanisms. An important consideration when interpreting GRAYU outputs is the heterogeneity of evidence across the underlying sources. Canonical Ayurvedic texts provide high-confidence traditional associations, whereas curated literature and database derived links may include incomplete, inconsistent, or low-confidence records. Plant metabolites such as polyphenols are well known to generate false-positive associations in vitro and in silico due to their non-specific interaction profiles and potential classification as PAINS (Bolz et al., 2021). GRAYU therefore does not assign mechanistic meaning to any plant–metabolite–disease link, nor does it imply therapeutic efficacy. Instead, the resource organises relationships to facilitate exploration, prioritisation, and hypothesis generation, all of which require further experimental validation.
A major strength of GRAYU lies in its user interface and graph-based exploration features. The implementation of Interactive Knowledge graph and advanced search pipelines allows researchers to construct complex, multi-step queries, tracing the relationships from plants to phytochemicals and onwards to diseases or formulations. The expandable graph visualizations provide an intuitive view of the immediate neighbourhood of a node, while the advanced filtering enables detailed interrogation of properties. Such functionality transforms GRAYU from a static catalogue into a dynamic exploratory platform with intuitive dialogues. This is aligned with the evolving expectations of modern bioinformatics resources, where the ability to interrogate complex associations is valued as highly as data comprehensiveness.
The GRAYU-based applications collectively emphasize the capability to interlink traditional pharmacological wisdom with molecular and ecological data. All applications presented in this work are computational demonstrations and can only suggest pharmacodynamic activity or therapeutic recommendations. The Benzylanime analog study demonstrates the ability to pinpoint structural derivatives and cross-species analogs that may possess improved pharmacodynamic potential, serving as entry points for rational natural product modification leading to rational drug design. The substitution analysis between Myristica and Syzygium highlights the utility of GRAYU in identifying molecularly and therapeutically equivalent alternatives, supporting conservation and sustainable sourcing. Traditional foods and superfoods may belong to taxonomically diverse lineages, yet converge in chemical resources and in enabling better disease management. This is shown by comparing Moringa and Trigonella. Finally, the anemia network model shows that Ayurvedic polyherbal systems are not arbitrary aggregates, but systematically optimized biochemical ensembles, with recurring phytochemicals providing a unifying molecular framework. Collectively, these case studies reveal that GRAYU enables systems-level exploration of Ayurveda, supporting applications from drug discovery and sustainability planning to mechanistic hypothesis generation. By integrating graph-based visualization with experimental and clinical annotations, GRAYU bridges traditional and modern pharmacology, offering a reproducible computational platform for future integrative medicine research.
There are, however, limitations to be acknowledged. Because GRAYU is a data-integration and computational resource, all plant species and their associated phytochemicals are presented solely as curated entries drawn from authoritative sources, and the case studies illustrate only how graph-based exploration can reveal patterns, overlaps, or relationships within existing knowledge. These examples are therefore conceptual demonstrations of the utility of the knowledge graph and are not intended to imply pharmacological activity, therapeutic recommendations, or mechanistic claims. Any associations surfaced through GRAYU must be interpreted within the context of their original evidence base and require experimental validation before biological relevance can be inferred.
The extent of coverage of knowledge captured within GRAYU in the four categories critically depends on the information arising from primary data sources, many of which contain inconsistent or partially digitized information. Several Ayurvedic Pharmacopoeia of India (API) volumes are not fully digitized or publicly accessible, and the Ayurvedic Standard Treatment Guidelines (ASTG) often provide indication information without corresponding herb compositions. As a result, a subset of formulations presently lacks fully resolved plant mappings. GRAYU also does not currently incorporate ethnopharmacological activity labels (e.g., antioxidant, hepatoprotective), since these descriptors vary substantially across regional surveys and lack standardization; for now, we prioritize canonical Ayurvedic indications documented in official pharmacopeias. Importantly, the diverse evidence levels across the integrated datasets may include low-confidence or false-positive associations. Some plant metabolites, particularly polyphenols, are known to behave as pan-assay interference compounds (PAINS), which makes purely computational associations prone to over-interpretation. Methodologically, the graph networks necessarily follow a path across the categories - for example, from Plant-to-Disease and not from Phytochemical-to-Disease directly (as shown in Figure 1). The diversity of preparation methods and variation in formulation proportions cannot be fully captured in the current dataset. The mapping of Ayurvedic disease categories to modern ontologies, while carefully curated, remains an approximation in cases where concepts are not directly comparable, due to epistemological differences. Moreover, the majority of associations remain at the level of traditional knowledge and computational inference, underscoring the need for systematic experimental validation. Despite these challenges, GRAYU represents a significant advance by creating a scaffold upon which such validations can be prioritised and guided. GRAYU establishes a scalable foundation that can be expanded as more pharmacopeial sources become digitized, ethnopharmacological activities become standardized, and additional experimental or clinical evidence becomes available. Future releases will incorporate these layers to improve completeness, precision, and translational relevance.
In conclusion, GRAYU establishes a large-scale, curated, and graph-enabled platform that integrates data across categories (Ayurvedic formulations, plants, phytochemicals, and diseases) into a unified framework. By combining data richness, ontology mapping, advanced search functionality, and illustrative case studies, GRAYU not only preserves traditional knowledge in digital form but also positions it for active use in modern biomedical research and drug discovery.
Data availability statement
Publicly available datasets were analyzed in this study. This data can be found here: https://caps.ncbs.res.in/GRAYU/.
Author contributions
SJ: Data curation, Methodology, Software, Visualization, Writing – original draft. AP: Data curation, Methodology, Software, Visualization, Writing – original draft. DR: Data curation, Methodology, Software, Visualization, Writing – original draft. RM: Data curation, Formal Analysis, Validation, Writing – original draft. DA: Methodology, Visualization, Writing – original draft. AdS: Methodology, Visualization, Writing – original draft. NV: Formal Analysis, Validation, Writing – original draft. PR: Formal Analysis, Validation, Writing – original draft. VK: Data curation, Formal Analysis, Validation, Writing – original draft. PP: Data curation, Formal Analysis, Validation, Writing – original draft. SB: Data curation, Formal Analysis, Validation, Writing – original draft. SV: Formal Analysis, Validation, Writing – original draft. AnS: Data curation, Formal Analysis, Validation, Writing – original draft. KS: Data curation, Formal Analysis, Methodology, Validation, Writing – original draft. MP: Conceptualization, Formal Analysis, Project administration, Supervision, Visualization, Writing – review and editing. RS: Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review and editing.
Funding
The author(s) declared that financial support was received for this work and/or its publication. RS is a J.C. Bose National Fellow (JBR/2021/000006) from the Science and Engineering Research Board, India. RS would also like to thank Bioinformatics Centre Grant funded by the Department of Biotechnology, India (BT/PR40187/BTIS/137/9/2021) and the Institute of Bioinformatics and Applied Biotechnology for the funding through her Mazumdar-Shaw Chair in Computational Biology (IBAB/MSCB/182/2022).
Acknowledgements
Authors would like to acknowledge Prof. Sudhir Krishna, Mr Naren Sridharan, Drs Adwait Joshi, Abhishek Sharma and Meenakshi Iyer for discussions and suggestions. We thank the vibrant and helpful discussions from various Ayurvedic practitioners and formulators, especially Rajesh Sreenivasan of BV Pundit Sadvaidyasala, Nanjangud and Drs. Darshan Shankar, Subrahmanya, Aswini Mohan L. and Vishnuprasad of Trans-Disciplinary University, Bangalore, which helped to curate the database. We thank NCBS(TIFR) for infrastructural facilities.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that generative AI was used in the creation of this manuscript. During the preparation of this manuscript, the authors utilized generative AI to assist in the development of Python scripts for data processing and the implementation of HTML, CSS, and JavaScript for the database web interface. Additionally, AI tools were used to enhance the grammatical accuracy and readability of the text. The authors have rigorously reviewed and verified all AI-generated code and manuscript revisions for technical accuracy and scientific integrity. The authors take full responsibility for the final content and the results presented in this publication.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2025.1727224/full#supplementary-material
References
Baron, J. A., Johnson, C. S., Schor, M. A., Olley, D., Nickel, L., Felix, V., et al. (2024). The DO-KB Knowledgebase: a 20-year journey developing the disease open science ecosystem. Nucleic Acids Res. 52 (D1), D1305–D1314. doi:10.1093/nar/gkad1051
Bolz, S. N., Adasme, M. F., and Schroeder, M. (2021). Toward an understanding of pan-assay interference compounds and promiscuity: a structural perspective on binding modes. J. Chem. Inf. Model. 61, 2248–2262. doi:10.1021/acs.jcim.0c01227
Cotoraci, C., Ciceu, A., Sasu, A., and Hermenean, A. (2021). Natural antioxidants in anemia treatment. Int. J. Mol. Sci. 22 (4), 1883. doi:10.3390/ijms22041883
Dhammi, I. K., and Kumar, S. (2014). Medical subject headings (MeSH) terms. Indian J. Orthop. 48, 443–444. doi:10.4103/0019-5413.139827
Fang, S., Dong, L., Liu, L., Guo, J., Zhao, L., Zhang, J., et al. (2021). HERB: a high-throughput experiment- and reference-guided database of traditional Chinese medicine. Nucleic Acids Res. 49, D1197–D1206. doi:10.1093/nar/gkaa1063
FooDB (2025). FooDB. Available online at: www.foodb.ca (Accessed September 11, 2025).
Gao, K., Liu, L., Lei, S., Li, Z., Huo, P., Wang, Z., et al. (2025). HERB 2.0: an updated database integrating clinical and experimental evidence for traditional Chinese medicine. Nucleic Acids Res. 53, D1404–D1414. doi:10.1093/nar/gkae1037
Govaerts, R., Nic Lughadha, E., Black, N., Turner, R., and Paton, A. (2021). The world checklist of vascular plants, a continuously updated resource for exploring global plant diversity. Sci. Data 8, 215. doi:10.1038/s41597-021-00997-6
Gurib-Fakim, A. (2006). Medicinal plants: traditions of yesterday and drugs of tomorrow. Mol. Asp. Med. 27, 1–93. doi:10.1016/j.mam.2005.07.008
Hou, D., Lin, H., Feng, Y., Zhou, K., Li, X., Yang, Y., et al. (2024). CMAUP database update 2024: extended functional and association information of useful plants for biomedical research. Nucleic Acids Res. 52, D1508–D1518. doi:10.1093/nar/gkad921
Kiewhuo, K., Gogoi, D., Jyoti Mahanta, H., Rawal, R. K., Das, D., Jamir, E., et al. (2023). OSADHI–an online structural and analytics based database for herbs of India. Comput. Biol. Chem. 102, 107799. doi:10.1016/j.compbiolchem.2022.107799
Kim, A., and Nemeth, E. (2015). New insights into iron regulation and erythropoiesis. Curr. Opin. Hematol. 22, 199–205. doi:10.1097/MOH.0000000000000132
Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., et al. (2025). PubChem 2025 update. Nucleic Acids Res. 53, D1516–D1525. doi:10.1093/nar/gkae1059
Kong, X., Liu, C., Zhang, Z., Cheng, M., Mei, Z., Li, X., et al. (2024). BATMAN-TCM 2.0: an enhanced integrative database for known and predicted interactions between traditional Chinese medicine ingredients and target proteins. Nucleic Acids Res. 52, D1110–D1120. doi:10.1093/nar/gkad926
Liu, Z., Guo, F., Wang, Y., Li, C., Zhang, X., Li, H., et al. (2016). BATMAN-TCM: a bioinformatics analysis tool for molecular mechANism of traditional Chinese medicine. Sci. Rep. 6, 21146. doi:10.1038/srep21146
McLaughlin, J., Lagrimas, J., Iqbal, H., Parkinson, H., and Harmse, H. (2025). OLS4: a new ontology lookup service for a growing interdisciplinary knowledge ecosystem. Bioinformatics 41, btaf279. doi:10.1093/bioinformatics/btaf279
Mohanraj, K., Karthikeyan, B. S., Vivek-Ananth, R. P., Chand, R. P. B., Aparna, S. R., Mangalapandi, P., et al. (2018). IMPPAT: a curated database of Indian medicinal plants, phytochemistry and therapeutics. Sci. Rep. 8, 4329. doi:10.1038/s41598-018-22631-z
Naika, M. B. N., Sathyanarayanan, N., Sajeevan, R. S., Bhattacharyya, T., Ghosh, P., Iyer, M. S., et al. (2022). Exploring the medicinally important secondary metabolites landscape through the lens of transcriptome data in fenugreek (trigonella foenum graecum L.). Sci. Rep. 12, 13534. doi:10.1038/s41598-022-17779-8
Newman, D. J., and Cragg, G. M. (2020). Natural products as sources of new drugs over the nearly four decades from 01/1981 to 09/2019. J. Nat. Prod. 83, 770–803. doi:10.1021/acs.jnatprod.9b01285
Pasha, S. N., Shafi, K. M., Joshi, A. G., Meenakshi, I., Harini, K., Mahita, J., et al. (2020). The transcriptome enables the identification of candidate genes behind medicinal value of drumstick tree (Moringa oleifera). Genomics 112, 621–628. doi:10.1016/j.ygeno.2019.04.014
Salehi, B., Mnayer, D., Özçelik, B., Altin, G., Kasapoğlu, K. N., Daskaya-Dikmen, C., et al. (2018). Plants of the genus lavandula: from farm to pharmacy. Nat. Product. Commun. 13, 1934578X1801301037. doi:10.1177/1934578X1801301037
Shafi, K. M., Sajeevan, R. S., Kouser, S., Vishnuprasad, C. N., and Sowdhamini, R. (2022). Transcriptome profiling of two moringa species and insights into their antihyperglycemic activity. BMC Plant Biol. 22, 561. doi:10.1186/s12870-022-03938-6
Sureda, A., Tejada, S., Mir Khan, U., and Selamoglu, Z. (2023). An overview of the biological function of curcumin in the processes of oxidative stress, inflammation, nervous system, and lipid levels. Central Asian J. Med. Pharm. Sci. Innovation 3, 1–11.
Venkatasubramanian, P., Kumar, S. K., and Nair, V. S. N. (2010). Cyperus rotundus, a substitute for aconitum heterophyllum: studies on the ayurvedic concept of abhava pratinidhi dravya (drug substitution). J. Ayurveda Integr. Med. 1, 33–39. doi:10.4103/0975-9476.59825
Vivek-Ananth, R. P., Mohanraj, K., Sahoo, A. K., and Samal, A. (2023). IMPPAT 2.0: an enhanced and expanded phytochemical atlas of Indian medicinal plants. ACS Omega 8, 8827–8845. doi:10.1021/acsomega.3c00156
WFO Team (2021). World flora online gets a new look, a major data update, and launches the WFO plant list. TAXON 70, 1418–1419. doi:10.1002/tax.12557
Wishart, D. S., Tzur, D., Knox, C., Eisner, R., Guo, A. C., Young, N., et al. (2007). HMDB: the human metabolome database. Nucleic Acids Res. 35, D521–D526. doi:10.1093/nar/gkl923
Wishart, D. S., Guo, A., Oler, E., Wang, F., Anjum, A., Peters, H., et al. (2022). HMDB 5.0: the human metabolome database for 2022. Nucleic Acids Res. 50, D622–D631. doi:10.1093/nar/gkab1062
Wu, Y., Zhang, F., Yang, K., Fang, S., Bu, D., Li, H., et al. (2019). SymMap: an integrative database of traditional Chinese medicine enhanced by symptom mapping. Nucleic Acids Res. 47, D1110–D1117. doi:10.1093/nar/gky1021
Zeng, X., Zhang, P., He, W., Qin, C., Chen, S., Tao, L., et al. (2018). NPASS: natural product activity and species source database for natural product research, discovery and tool development. Nucleic Acids Res. 46, D1217–D1222. doi:10.1093/nar/gkx1026
Zeng, X., Zhang, P., Wang, Y., Qin, C., Chen, S., He, W., et al. (2019). CMAUP: a database of collective molecular activities of useful plants. Nucleic Acids Res. 47, D1118–D1127. doi:10.1093/nar/gky965
Zhang, Y., Li, X., Shi, Y., Chen, T., Xu, Z., Wang, P., et al. (2023). ETCM v2. 0: an update with comprehensive resource and rich annotations for traditional Chinese medicine. Acta Pharm. Sin. B 13 (6), 2559–2571. doi:10.1016/j.apsb.2023.03.012
Keywords: data integration, drug repurposing, endangered plants, knowledge graph, network pharmacology, traditional foods
Citation: Joshi S, Pathak A, Regati DR, Menon R, Ajith DS, Sheshadri A, Viswanathan N, Ray P, Koul V, Panda P, Bhambore SA, Verma S, Sinha A, Shafi KM, Pavalam M and Sowdhamini R (2026) GRAYU: graph-based database integrating Ayurvedic formulations, medicinal plants, phytochemicals and diseases. Front. Pharmacol. 16:1727224. doi: 10.3389/fphar.2025.1727224
Received: 17 October 2025; Accepted: 29 December 2025;
Published: 22 January 2026.
Edited by:
Rajesh Kumar Pathak, University of Rovira i Virgili, SpainReviewed by:
Zeliha Selamoglu, Niğde Ömer Halisdemir University, TürkiyeMarilena Gilca, Carol Davila University of Medicine and Pharmacy, Romania
Kumud Pant, Graphic Era University, India
Copyright © 2026 Joshi, Pathak, Regati, Menon, Ajith, Sheshadri, Viswanathan, Ray, Koul, Panda, Bhambore, Verma, Sinha, Shafi, Pavalam and Sowdhamini. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Murugavel Pavalam, cG11cnVnQGhvdG1haWwuY29t; Ramanathan Sowdhamini, bWluaUBuY2JzLnJlcy5pbg==
†These authors have contributed equally to this work and share first authorship
Deepthi S. Ajith1