Challenges and opportunities of molecular epidemiology: using omics to address complex One Health issues in tropical settings

Tigistu-Sahle, Feven; Mekuria, Zelalem H.; Satoskar, Abhay R.; Sales, Gustavo F. C.; Gebreyes, Wondwossen A.; Oliveira, Celso J. B.

doi:10.3389/fitd.2023.1151336

REVIEW article

Front. Trop. Dis., 28 July 2023

Sec. Major Tropical Diseases

Volume 4 - 2023 | https://doi.org/10.3389/fitd.2023.1151336

This article is part of the Research Topic: Recent Advances In “Omics” Of Tropical Diseases

Feven Tigistu-Sahle¹*

Zelalem H. Mekuria²,³

Abhay R. Satoskar⁴

Gustavo F. C. Sales⁵

Wondwossen A. Gebreyes²,³

Celso J. B. Oliveira⁵

¹Ohio State Global One Health initiative, Addis Ababa, Ethiopia
²Global One Health initiative (GOHi), The Ohio State University, Columbus, OH, United States
³Department of Veterinary Preventive Medicine, College of Veterinary Medicine, The Ohio State University (OSU), Columbus, OH, United States
⁴Department of Pathology, Ohio State University (OSU), Columbus, OH, United States
⁵Department of Animal Science, College of Agricultural Sciences, Federal University of Paraíba (UFPB), Areia, PB, Brazil

The molecular biology tools available since the early 1970s have been crucial to the development of molecular epidemiology as an important branch of public health, and are used for the identification of host genetic and environmental factors associated with both communicable (CDs) and non-communicable diseases (NCDs) across human and animal populations. Molecular epidemiology has significantly contributed to the understanding of etiological agents, disease distribution, and how to track outbreaks, as well as to prevention and control measures against tropical infectious diseases. However, there have been significant limitations compromising the successful application of molecular epidemiology in low-to-middle income countries (LMICs) to address complex issues at the animal–human–environment interface. Recent advances in our capacity to generate information by means of high-throughput DNA genomic sequencing, transcriptomics, and metabolomics have allowed these tools to become accessible at ever-lower costs. Furthermore, recently emerged omics fields such as lipidomics are improving our insights into molecular epidemiology by measuring lipid phenotypes that gauge environmental and genetic factors in large epidemiological studies. In parallel, the development of bioinformatic tools has revolutionized the utility of omics, providing novel perspectives to better characterize pools of biological molecules and translate them into the structure, function, and dynamics of organisms. Unfortunately, the use of such powerful tools has not been optimal for a One Health approach to both CDs and NCDs, particularly in low-resource tropical settings. The aim of this review is to present the fundamentals of omics tools and their potential use in molecular epidemiology, and to critically discuss the impact of omics on the evolving One Health dimension applied to tropical diseases. 
We use Ethiopia and Brazil as model systems to illustrate existing gaps and opportunities, while also addressing global applications. We also discuss perspectives on exploring omics-based molecular epidemiology in the context of One Health as a crucial approach to preventing and mitigating the burden of CDs and NCDs at the interface of human health, animal health, and the environment. This review shows that building capacity in tropical regions is crucial to establishing equitable global health.

Overview: molecular epidemiology of tropical diseases and the One Health approach

Today, our planet is facing major complex global health challenges. Approximately 75% of public health emergencies associated with infectious diseases are of animal or environmental origin (1). Tropical regions such as sub-Saharan Africa, Latin America, and Southeast Asia are known to be major hotspots of emerging and re-emerging infectious zoonotic diseases of global significance (2). Furthermore, the consistent increase in major chronic non-communicable diseases (NCDs) in the tropical belt is known to be associated with environmental factors, mainly chemical hazards (3). The One Health approach addresses such complex issues at the interface of human health, animal health, plant health, and the environment (4). Recent examples of major epidemics and pandemics that necessitated One Health-oriented initiatives for prevention or mitigation include COVID-19, Ebola virus disease (EVD), Zika virus, Middle East respiratory syndrome (MERS), and many emerging and re-emerging vector-borne diseases, including dengue fever and chikungunya (5, 6).

Tropical regions, particularly those in developing countries, have experienced significant limitations in implementing molecular genotyping approaches to address epidemiologic problems involving both communicable diseases (CDs) and non-communicable diseases (NCDs). We have previously published a detailed review of the use of molecular epidemiology to address leading bacterial and viral zoonotic diseases, and addressed the critical capacity-building needs at the interface of human health and animal health (4, 7). The problem is further accentuated with regard to the application of omics approaches. Although the use of omics approaches is rapidly expanding worldwide, with more advanced tools at ever-lower running costs, their implementation remains very limited in developing tropical regions compared with developed regions, where there is usually ample laboratory infrastructure and well-trained personnel. Moreover, although high-resolution genomic epidemiology can provide invaluable information about dissemination patterns of infectious agents, as shown in the COVID-19 pandemic (8), there is a paucity of information about the role of the environment and other potential reservoirs of infectious agents (including zoonotic pathogens) in developing tropical regions. The successful integration of omics and non-omics data in analytical epidemiological models is a key bottleneck, and the application of more effective and relevant bioinformatics approaches to epidemiologic investigations is required (9).

Most diseases are complex in nature and result from the interaction of biotic and abiotic hazards (such as pathogens) with environmental factors or exposures at the system-level interface. Therefore, understanding these complex associations requires modeling exhaustive and appropriate data that characterize such features and conditions in detail at the molecular level. The statistical power to detect associations between risk factors and disease states is weakened by the fact that the phenotypes of different exposures are encoded not only at the genetic level but also at the macromolecular (lipid, protein, etc.) and molecular (fatty acid, amino acid, etc.) levels.

In this review, we describe the conceptual fundamentals and current status of the use of various omics to address tropical and global diseases of significance to the One Health approach, including examples concerning foodborne, waterborne, vector-borne, and interrelated chronic diseases. We will highlight gaps associated with implementing omics tools in tropical developing regions using model systems from Latin America (Brazil) and eastern Africa (Ethiopia). These two tropical countries were chosen for convenience, yet they represent developing regions with a distinct disparity in terms of economic status and availability of skilled personnel and laboratory infrastructure. We will highlight details of how genomics, metagenomics, transcriptomics, and metabolomics (including lipidomics) tools are used to address communicable and non-communicable diseases. Finally, we recommend potential ways to develop and strengthen capacity for the application of omics tools and associated bioinformatic analyses in tropical developing regions.

The term “molecular epidemiology” emerged around the early 1970s with the use of molecular biology tools to link swine and human type A influenza virus H1N1 to the 1918 influenza pandemic (10). Over the last half century, a plethora of molecular biology techniques have been used to more accurately address epidemiologic investigations by, for example, discerning etiology, understanding risk factors, tracking sources of disease outbreaks, developing markers for diagnostic purposes, and assisting prevention and control approaches to epidemics and pandemics. Present-day molecular epidemiology is moving from the traditional reductionist approach toward omics-driven research, which is mainly spurred by technological advances in high-throughput laboratory techniques that measure multiple biomarkers from different sources (11–14).

Molecular epidemiology must be recognized as a dynamic and rapidly evolving discipline. The molecular tools commonly applied to molecular epidemiology are usually used with the aim of investigating disease occurrence, disease distribution, or the determinants associated with disease distribution. However, sometimes the term molecular epidemiology has been misused in studies that should be more appropriately described as involving molecular taxonomy or phylogeny, as no epidemiological problem is addressed, despite the invaluable contribution of both phylogeny and taxonomy to the field of molecular epidemiology (15). Box 1 shows the definitions of terminologies pertinent to this review.

Box 1: Definitions of pertinent terminologies
Molecular epidemiology is the field of epidemiology in which information at the molecular level is obtained or explored by molecular tools to improve our capacity to make epidemiological decisions. Therefore, molecular epidemiology can provide additional information to classic epidemiological approaches (descriptive or analytic) that can be applied to diverse health-related outcomes or events, such as investigating outbreaks, establishing surveillance systems, developing rapid diagnostic methods, and detecting host genetic markers for disease susceptibility.

Genetic epidemiology is the field of molecular epidemiology in which molecular biomarkers are used to assess genetic variations that allow the prediction of susceptibility to, and progression of, disease. The term is usually applied to the study of non-communicable diseases (NCDs), such as cancer, diabetes, and cardiovascular diseases, among others. Genetic epidemiology is particularly important in the control of inherited diseases and in the understanding of the multifactorial causes of diseases related to genetic aberrations among individuals or (sub-)populations.

Genomic epidemiology is a further evolutionary step in the progression of molecular epidemiology, in which genomic information, usually obtained by high-throughput sequencing methods and bioinformatics, such as whole-genome sequencing (WGS) or metagenomics, is used to investigate the distribution and spread of infectious agents and to mitigate disease in a population. Although this term has also been used in studies involving host-related genetic background, it is more frequently used to address etiologic agents.

The tools that have been commonly used over the last two-to-three decades have primarily focused on methodologies that indirectly show sequence variations of the targeted pathogenic organisms or eukaryotic gene biomarkers. Such methods are primarily based on the profiling of chromosomal or plasmid-based genetic determinants, DNA fragmentation built on targeted restriction digestion, the amplification of targeted gene(s), hybridization using targeted probes, the amplification of repetitive elements, or a combination of these approaches. Some of the most common methods used in molecular epidemiology of infectious diseases include random amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), pulsed-field gel electrophoresis (PFGE) (16, 17), amplified fragment length polymorphism (AFLP) (18), and repetitive extragenic palindromic PCR (REP-PCR) (19), among many others.

These indirect DNA fingerprinting (also known as genotyping) approaches come with various advantages as well as limitations. They vary greatly in terms of discriminatory power (measured using Simpson’s index of diversity, i.e., the discriminatory index, DI) (20), repeatability (the ability to give consistent results under the same conditions, usually referred to as intra-laboratory reproducibility), reproducibility (the ability to give consistent results under different conditions, usually referred to as inter-laboratory reproducibility), typeability (applicability to a specific organism or targeted genetic element), throughput (the ability to test a large number of specimens in a defined period), and other optional criteria such as speed and cost. The relative importance of each criterion depends on the objectives of the epidemiologic scenario (outbreak investigation, surveillance, or research) as well as the availability of laboratory infrastructure, laboratory supplies, and skilled personnel. The incorporation of sequencing information provided by first-generation sequencing platforms (Sanger sequencing) represented an important evolutionary step in molecular epidemiology toward the development of methods such as multilocus sequence typing (MLST) (21) and spa typing, which is based on sequencing of the polymorphic X region of the protein A gene (spa) present in Staphylococcus aureus (22). A summary of the evolution of epidemiology with the application of various genotyping tools and their unique features is shown in Figure 1.
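To make the notion of discriminatory power concrete, the following minimal Python sketch computes Simpson's index of diversity for a typing method. It assumes the commonly used unbiased form, D = 1 - [sum over types of n_j(n_j - 1)] / [N(N - 1)], where N is the number of isolates and n_j the number of isolates assigned to type j; the source does not spell out the formula, so this choice is an assumption. The function name and the two sets of typing results are hypothetical and serve only as an illustration.

```python
from collections import Counter


def simpsons_diversity_index(type_assignments):
    """Probability that two isolates drawn at random (without replacement)
    are assigned to different types by the typing method.

    type_assignments: iterable holding one type label per isolate.
    """
    counts = Counter(type_assignments)
    n_isolates = sum(counts.values())
    if n_isolates < 2:
        raise ValueError("at least two isolates are required")
    # Number of ordered pairs of isolates that share the same type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n_isolates * (n_isolates - 1))


# Hypothetical results for the same ten isolates typed by two methods.
coarse_method = ["A"] * 8 + ["B"] * 2  # two types only
fine_method = ["A", "A", "B", "C", "D", "E", "F", "G", "H", "I"]  # nine types

print(round(simpsons_diversity_index(coarse_method), 3))
print(round(simpsons_diversity_index(fine_method), 3))
```

A value close to 1 indicates that the method separates almost all unrelated isolates, whereas a low value indicates that many isolates are lumped into the same type.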

Figure 1 Schematic diagram depicting the evolution of epidemiology as a discipline from the classic descriptive/analytic epidemiology to molecular epidemiology, progressing from traditional indirect genotyping methods to the current and emerging omics applications. Block arrow (gray) shows the progression pattern of the molecular epidemiology approaches.

Overview of omics as a tool for understanding molecular epidemiology

Omics approaches developed over the last few decades have been of increasing importance, as they allow us to address molecular epidemiologic questions in the most direct and discriminatory ways. These approaches are rapidly expanding across the world with more advanced tools that can rapidly sequence small or large RNA- or DNA-based genomes of bacterial, viral, or parasitic organisms. Consistent with this, the whole-genome sequencing (WGS) of pathogens has become an indispensable tool in epidemiologic investigations of veterinary and livestock infectious diseases. The cost of these genomic tools is also rapidly declining, making them more affordable to use for epidemiologic research, surveillance, and outbreak investigations. However, their implementation has been largely limited to developed regions where there is ample laboratory infrastructure and well-trained personnel. The need for capacity building in low- and middle-income countries (LMICs), most of which are located in tropical regions, is of paramount importance.

More recently, approaches using next-generation sequencing (NGS), also known as second-generation sequencing (SGS) or massively parallel sequencing (MPS), have enabled us to obtain high-throughput genomic information capable of determining the order of nucleotides in the whole genome or targeted regions of DNA or RNA (cDNA) with scalability and speed, providing completely new perspectives for genomic approaches. We are now experiencing the expansion of third-generation sequencing technologies, also known as long-read sequencing technologies, which represent very powerful tools for genome assembly, especially by using short- and long-read hybrid assembly strategies. Later in this review, NGS and its utilities for molecular epidemiology in the context of tropical diseases are described in greater detail.

In parallel with other high-throughput measurement methods, such as liquid chromatography combined with high-resolution mass spectrometry (LC-MS/MS) and nuclear magnetic resonance (NMR) spectroscopy, we are now able to collectively detect and quantify pools of biological molecules, such as small molecule substrates, intermediates, proteins, and products of cell metabolism, providing unique chemical fingerprints associated with specific cellular processes (23). The possibility of collectively characterizing these biological molecules and analyzing how they translate into structure and biological function, supported by increasing computational biology and bioinformatics, was the milestone for omics (genomics, transcriptomics, proteomics, and metabolomics including lipidomics). These tools enable the identification of biomarkers that are often crucial for the development of diagnostic or therapeutic tools. Biomarker datasets characterizing biological features by means of genomics, transcriptomics, proteomics, lipidomics, and metabolomics are commonly known as omics data, while non-omics data are usually either phenotypic or genotypic data measuring indirect differences in genetic makeup, as described in the earlier section.

These biomarker datasets, which characterize biological phenotypes at different levels of a biological system using highly discriminatory and high-throughput biotechnologies applied to DNA sequences (genomics), RNA expression levels (transcriptomics), protein translation levels (proteomics), lipid synthesis levels (lipidomics), and metabolite levels (metabolomics), are commonly called omics data (24–26). Therefore, omics is the field of biomarker discovery research using sets of advanced tools to discover multiple factors that can be applied to understanding large sets of biological molecules involved in disease or health conditions; it explores the comprehensive set of an organism’s genetic and phenotypic makeup (24).

Advanced technologies including next-generation sequencing, single- or multi-cell RNA assays, high-definition spectrometry tools, and expression assays have enabled the profiling of biological systems at the genomic, transcriptomic, proteomic, lipidomic, and metabolomic levels. In doing so, omics studies have helped to indicate early molecular signatures of low-level exposures, broadening our understanding of healthy and diseased states. At the same time, epidemiological registries gather information such as individual reports of habits and symptoms, the characterization of diseases by pathologists, and clinical electronic health records, commonly referred to as metadata. Consequently, as the application of omics tools in molecular epidemiology has increased in recent years with the improved sensitivity, higher resolution, and greater volume of data garnered by omics-based assays, omics data have been combined with robust metadata to decipher classical epidemiologic questions.

Through these advancements in multiple biomarker discovery techniques, modern population studies have moved from “black-box epidemiology” to “systems epidemiology” (27). Nevertheless, omics-based population studies will still require attention to the basic principles of epidemiological studies, such as confounding, interaction, selection bias, and measurement error, while also exchanging the reductionist view of the traditional approach for a more complex interpretation of different exposures on biological systems using multiple biomarkers (28). Therefore, in the context of understanding the complex effects of environmental factors on biological systems, it is crucial to integrate omics data and non-omics data (metadata) in the same models of association and prediction of health or disease status. This endeavor poses several challenges regarding data generation, capture, curation, sharing, analysis, and visualization, and regarding information privacy and storage (29).
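As a minimal illustration of the data capture and curation step, the Python sketch below links per-sample omics measurements to epidemiological metadata through a shared sample identifier and reports samples that cannot be matched. All field names, sample identifiers, and values are hypothetical, and the approach is only a sketch: real studies would additionally need harmonized identifiers, controlled vocabularies, and privacy safeguards before any modeling is attempted.

```python
def merge_omics_with_metadata(omics, metadata):
    """Join two dictionaries keyed by sample ID.

    Returns the merged records for samples present in both sources, plus the
    IDs found in only one source so that gaps can be curated rather than
    silently dropped. If both sources use the same field name, the omics
    value overwrites the metadata value.
    """
    shared_ids = sorted(omics.keys() & metadata.keys())
    merged = [
        {"sample_id": sample_id, **metadata[sample_id], **omics[sample_id]}
        for sample_id in shared_ids
    ]
    unmatched = {
        "omics_only": sorted(omics.keys() - metadata.keys()),
        "metadata_only": sorted(metadata.keys() - omics.keys()),
    }
    return merged, unmatched


# Hypothetical omics measurements (e.g., one lipid feature per sample).
omics = {
    "S1": {"lipid_x": 1.2},
    "S2": {"lipid_x": 0.7},
    "S4": {"lipid_x": 2.0},
}
# Hypothetical non-omics metadata collected in the field.
metadata = {
    "S1": {"host": "human", "exposure": "yes"},
    "S2": {"host": "bovine", "exposure": "no"},
    "S3": {"host": "human", "exposure": "no"},
}

merged, unmatched = merge_omics_with_metadata(omics, metadata)
print(merged)
print(unmatched)
```

Reporting the unmatched identifiers explicitly is a deliberate choice: samples with omics data but no metadata (or the reverse) are a common source of selection bias in downstream association models.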

Omics application in One Health-related molecular epidemiology: some examples applied to NCDs

One Health is an integrated, unifying approach that aims to sustainably balance and optimize the health of humans, animals, plants, and ecosystems. It recognizes that the health of humans, domestic and wild animals, plants, and the wider environment (including ecosystems) are closely linked and interdependent. More than two-thirds of emerging and re-emerging diseases in humans are known to be of animal and environmental origin (30). Omics-based research provides in-depth information about the effect of exposure on molecular profiles (the transcriptome, microbiome, lipidome, etc.) and how these complex changes may allow for the identification of associations between the gene expression profile and confounding environmental factors (31).

Although recent high-resolution and high-throughput omics tools are accelerating and expanding epidemiological population studies, the use of molecular tools to identify biomarkers associated with disease or health conditions started decades ago. A good example of the application of systems-level multi-omics is the Framingham Heart Study, established in 1948 (32), which revealed the complex dynamics between smoking, lifestyle, and metabolic disorders and susceptibility to coronary heart disease development (33, 34). In addition, genotyping studies have generated an important hypothesis on the environmental effect of benzene exposure on the development of hematological cancer (35, 36). Some of the approaches that have been used in the context of One Health include a microarray-based measurement of approximately 22,000 gene transcripts revealing a set of genes that were differentially expressed and further elucidating the biological pathways that are involved in arsenic-induced skin lesions (37).

Although important advances have been achieved in terms of the identification of genetic markers associated with NCDs, such as diabetes mellitus (38, 39), cancer (40), and even obesity (39, 41), the vast majority of investigations have originated from developed high-income countries (42). Importantly, there is not only a scarcity of information on the genetic background of populations in LMICs, but also a lack of knowledge of the environmental risks affecting these populations. This is of paramount importance, as gene functionality can be affected by environment and nutrition (43). Moreover, the effects of environmental factors on gene functionality, driven by molecular mechanisms, such as DNA methylation and micro-RNA modifying gene expression, are heritable (44). This is particularly important for public health as an understanding of the implications of exposure factors, especially those associated with embryogenesis and gestation, can be key to preventing NCD in later life and future generations. This is the field of epigenetics (45); the mechanisms of human disease-causing mutations associated with changes in the epigenome or in the abundance and activity of proteins regulating chromatin structure are reported by Zoghbi and Beaudet (46). Epigenetic epidemiology can provide unprecedented insights into the evolutionary aspects of diseases at both the individual and population level. However, the particularities of the genetic background and biotic and abiotic exposure factors in LMICs should be considered when building rigorous causal models in these regions.

Genomics and metagenomics frontiers in genomic epidemiology

The term genomics is often used to refer to a whole-genome-sequencing (WGS) approach to understanding and managing infectious disease outbreaks, and it is inarguably one successful application of omics tools in investigating infectious diseases (47). The application of genomics in the epidemiological framework involves inferring relatedness among organisms by incorporating sets of epidemiological parameters known as metadata together with organism sequence information (48). In such a case, sequencing is usually performed on previously isolated organisms, especially cultured bacteria. For organisms with smaller genomes, such as viruses, the whole genome can usually be obtained either by combining amplicon sequencing with second-generation sequencing technologies or by applying third-generation long-read sequencing technologies. On the other hand, shotgun metagenome sequencing (the sequencing of the total DNA of a sample without the previous isolation of targeted organisms) allows for the identification of the microbial community present in a sample and the characterization of these microorganisms in terms of the presence of virulence or resistance genes (49). For instance, shotgun metagenomics of blood culture bottles with periprosthetic tissue enabled the prediction of phenotypic antimicrobial resistance and virulence profiles for Staphylococcus aureus (50).

Although (meta)genomics can provide high-resolution information that is usually explored in epidemiological research, the use of genomics in routine clinical microbiology can differ significantly as the time to response is a key factor, whether it is applied to the rapid characterization of infectious agents directly from clinical samples or to the identification of pathogens to combat disease outbreaks (51). Importantly, the structural complexity and genome size of organisms have a profound impact on the practical aspects of applying genomics in the field, including cost and speed. Although small genomes, such as those of viruses, facilitate the use of tools such as WGS, eukaryotic pathogens such as malaria parasites are distributed in a wider variety of phylogenetic lineages. Their functional complexity relies on evolutionary aspects associated with pathogenicity and host adaptation (52, 53), making comparative genomics (for the identification of vaccine targets, for instance) more complicated, costly, and time-consuming (54–56).

Genomics spurred by technological innovation

Genomics, as is the case for all other omics fields, was spurred by technological innovation (57). The conventional methods for identifying and characterizing bacterial, viral, and parasitic pathogens from patient or environmental samples were established in the 19th century and are based on targeting presumptive organisms in a system designed to selectively enrich specific pathogens (58). Toward the end of the 20th century, molecular and nucleic acid-based assays were used in various ways for pathogen genotyping and epidemiological traceability. NGS methods introduced a massive parallelization of DNA sequencing, exponentially increased the volume of data generated, and at the same time lowered the cost of WGS (59). Hence, NGS has allowed researchers to sequence hundreds of organisms simultaneously (58, 60).

Since 2008, the National Institutes of Health (NIH) have reported a 50-fold drop in the cost per megabase of DNA sequencing with the introduction of NGS methods. For bacterial genome sequencing, NGS methods cost approximately US $150 to US $250, compared with US $94 for PFGE, which was the gold standard method for genotyping foodborne bacterial infections on PulseNet (61). Further refinement of technologies has occurred over time, especially with the introduction of third-generation sequencing technologies (59). Third-generation sequencing technologies use single DNA/RNA molecules and offer long to ultra-long reads, which are ideal for assembling repetitive mobile genetic elements and plasmids (62, 63).

On top of these technological innovations, the development of highly scalable and adaptable sequencing protocols facilitated the widespread use of WGS. A good example is the tiled amplicon sequencing approach, which was developed for WGS of the Ebola virus in 2015 (64), Zika virus (ZIKV) in 2017, and West Nile virus (WNV) (65), and was quickly adapted for SARS-CoV-2 sequencing (the ARTIC protocol), enabling large-scale sequencing with a considerable cost reduction ($10–50 per genome) (66). At the time of writing, more than 13 million SARS-CoV-2 genomes and more than five million raw unassembled (FASTQ) sequences have been shared openly through the Global Initiative on Sharing Avian Influenza Data (GISAID) and the National Center for Biotechnology Information (NCBI) databases, mostly using the tiled amplicon sequencing protocol. This constitutes the largest single pathogen genome collection in history.
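The idea behind tiled amplicon sequencing can be sketched in a few lines of Python: the genome is covered by overlapping amplicons, and neighboring amplicons are assigned to alternating reaction pools so that overlapping products are not amplified together. The sketch below only lays out coordinates; it is not the ARTIC primer design procedure, which also takes primer sequence properties into account. The function name, the amplicon length, the overlap, and the two-pool layout are illustrative assumptions rather than parameters taken from the cited protocols.

```python
def tile_amplicons(genome_length, amplicon_length=400, overlap=75):
    """Lay out overlapping amplicons across a genome.

    Returns a list of (start, end, pool) tuples using 0-based, half-open
    coordinates. Pools alternate between 1 and 2 so that adjacent,
    overlapping amplicons fall into different reactions. The final amplicon
    is truncated at the genome end and may therefore be shorter.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if not 0 <= overlap < amplicon_length:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")
    step = amplicon_length - overlap
    tiles = []
    start = 0
    index = 0
    while True:
        end = min(start + amplicon_length, genome_length)
        tiles.append((start, end, 1 + index % 2))
        if end == genome_length:
            break
        start += step
        index += 1
    return tiles


# Hypothetical 1,000-nt genome, 400-nt amplicons, 100-nt overlap.
for tile in tile_amplicons(1000, amplicon_length=400, overlap=100):
    print(tile)
```

Because every position is covered by at least one short amplicon, this layout tolerates degraded or low-titer samples, which is one reason the approach scaled well during outbreaks.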

Genomics provide high-resolution insights for infectious disease epidemiology

A series of molecular epidemiology methods, including PFGE, MLST, and targeted amplicon-based sequencing, have been developed and used to construct phylogenetic relationships (7). NGS and the ability to perform WGS have leapfrogged the iterative improvements of these older genotyping methods (67). Whole-genome sequencing of pathogens provides single-base-pair-level resolution, making it ideal for investigating outbreaks in a population or hospital setting involving clonal pathogens that conventional molecular epidemiologic tools were unable to discern (67). Notable examples include an S. aureus outbreak investigation in intensive care and neonatal units showing that spa typing captured a minority of the transmission events, in contrast to WGS (68). Similarly, the US Food and Drug Administration (FDA) documented multiple incidents of Salmonella (69, 70), E. coli (71), and Listeria (72) outbreaks where conventional typing tools, such as PFGE, were unable to attribute a source. In those instances, WGS was used to investigate the relatedness of clinical isolates to food and environmental sources containing identical bacterial strains (73). A good example of the use of genomics is the mapping of Vibrio cholerae subtypes introduced in Haiti from South Asia that caused more than 9,000 deaths in the 2010 outbreak (74).
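The single-base-pair resolution of WGS is often summarized as a pairwise single-nucleotide polymorphism (SNP) distance between isolates. The following Python sketch counts differences between already aligned sequences of equal length, skipping ambiguous or gapped positions. The isolate names and the short sequences are hypothetical toy data, and the function names are ours; production pipelines typically call variants against a reference genome, mask problematic regions, and apply organism-specific relatedness thresholds, none of which is shown here.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions at which two aligned sequences differ.

    Positions where either sequence carries an ambiguous base (N) or a gap (-)
    are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_snp_distances(aligned):
    """Return {(name_1, name_2): distance} for every pair of isolates."""
    return {
        (first, second): snp_distance(aligned[first], aligned[second])
        for first, second in combinations(sorted(aligned), 2)
    }


# Hypothetical aligned fragments from clinical, food, and unrelated isolates.
aligned = {
    "clinical_1": "ACGTACGTAC",
    "clinical_2": "ACGTACGTAC",
    "food_1": "ACGTACGTAT",
    "unrelated_1": "ACCTAGGTNT",
}

for pair, distance in pairwise_snp_distances(aligned).items():
    print(pair, distance)
```

In this toy example, the two clinical isolates are indistinguishable and differ from the food isolate by a single position, the kind of fine-grained difference that band-based methods such as PFGE cannot resolve.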

Moreover, a contemporary population-level genetic analysis powered by WGS can comprehensively dissect a pathogen’s evolutionary history and provide insights into origins and cross-species transmission events. A recent study that looked at 10,254 S. aureus genomes including 1,896 bovine isolates showed four independent host-jump events in the last 2,500 years (Figure 2). S. aureus human-to-animal transmission facilitated evolutionary adaptation in livestock, which was then followed by the dissemination of seven endemic S. aureus clones through the cattle trade, causing bovine mastitis around the world (75). Beyond high-resolution genotyping, WGS is applied as a diagnostic tool in public health and clinical laboratories to guide treatments, for example, to guide appropriate antimicrobial therapy in the case of foodborne pathogens or to monitor resistance to antiviral therapy in the case of human immunodeficiency virus (HIV) infections (76). It also constitutes a second-line confirmatory test to clarify test ambiguities, differentiate between species within complex groups, such as the Mycobacterium tuberculosis complex, and verify relapse or reinfection in cases of a second episode of infection (77).

Figure 2 Time-scaled phylogenetic reconstruction of Staphylococcus aureus that originated in humans, with cross-species transmission to bovines 2,500 years ago. Branches are color coded according to the host and arrows point to the main evolutionary jumps into the bovine population. Adopted (with permission) from (75); Proceedings of the National Academy of Sciences (PNAS).

Whole-genome sequencing and genomic epidemiology have contributed significantly to the response to major infectious disease outbreaks in tropical regions in the last two decades. Real-time monitoring of virus spread using portable sequencers was demonstrated in the 2014 Ebola virus disease (EVD) outbreak that spread through Guinea, Liberia, Sierra Leone, and Nigeria (64). The genomic analysis showed that a West African Ebola virus (EBOV) variant diverged from Central African lineages around 2004 (78) and crossed from Guinea to Sierra Leone around May 2014 to cause sustained human-to-human transmission (79, 80). Furthermore, large-scale sequencing efforts involving the virus from 2013 to 2016 showed that the outbreak involved a single spillover event of the Zaire Ebola virus from an animal reservoir before it was sustained through human-to-human and cross-border transmission events (78). For EVD, direct contact and human-to-human transmission remain the typical pathways. Genomic studies have also shown that outbreaks can arise through sexual transmission of the virus from persistently infected individuals. This finding was a basis for the improvement of the WHO guidance on the repeated testing of semen samples before clearing a patient (81, 82).

Another major epidemic of the last decade was the Zika virus (ZIKV) outbreak in the Americas, which emerged in Brazil, where viral whole-genome sequencing data from the vector (mosquitoes) were used to shed light on the number of country-level virus introduction events. Using ZIKV whole-genome phylogenetic reconstruction, studies were able to plot the rapid expansion of ZIKV from Brazil to Puerto Rico, Honduras, Colombia, other Caribbean islands, and the continental United States (83). Modeling involving genomic data and traveler information has provided evidence of unreported hidden outbreaks of ZIKV in Cuba, providing a framework for understanding ZIKV dynamics in the Americas (84).

Most importantly, the past 3 years have been an illustrative example of how genomic data can support conventional surveillance. More than 13 million SARS-CoV-2 sequences, compiled through GISAID, mainly from human sources, were used for real-time monitoring of the pandemic’s spread and to design mitigation strategies. In those countries with periodic sampling, the SARS-CoV-2 genome provided early insight into the emergence of viral mutations and provided a platform from which to monitor SARS-CoV-2 evolution in real time (85, 86). Large-scale genomic comparisons allowed for rapid estimation of the rate of SARS-CoV-2 mutation and evolution in the early stages of the pandemic. The initial calculation showed that the SARS-CoV-2 mutation rate was at least 10 times lower than that of seasonal influenza. However, with the emergence of variants of concern (VOCs), the virus acquired a higher number of non-synonymous mutations, specifically in the spike gene, enabling SARS-CoV-2 to evade natural and vaccine-acquired immunity and establish re-infections in previously exposed individuals (87–90). The genomic evidence, coupled with additional experimental findings, helped to design a bivalent booster dose of the mRNA vaccines to protect patients from severe COVID-19-associated pneumonia and complications (88, 91, 92).
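As an illustration of how an evolutionary rate can be estimated from dated genomes, the sketch below uses a root-to-tip regression, one common exploratory approach: the genetic distance of each sampled genome from the root of the tree is regressed on its sampling date, and the slope is read as substitutions per site per year. This is not necessarily the method used in the studies cited above. The function name and the input values are hypothetical, and the numbers are synthetic rather than estimates for any real virus.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all sampling times are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic, perfectly clock-like data (assumption for illustration only):
# sampling time in years since the first sample, and root-to-tip distance
# in substitutions per site.
years_since_first_sample = [0.0, 0.5, 1.0, 1.5, 2.0]
root_to_tip_distance = [0.0002 + 0.001 * t for t in years_since_first_sample]

# The slope is the estimated rate (substitutions per site per year).
rate, intercept = least_squares_fit(years_since_first_sample, root_to_tip_distance)
print(f"estimated rate: {rate:.4g} substitutions/site/year")
```

Real datasets are far noisier than this toy input, and published rate estimates generally rely on dedicated phylogenetic software with explicit molecular clock models rather than a plain regression.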

Another use for SARS-CoV-2 genomic data was for spotting sources of novel variants. By sequencing viral genomes in a longitudinal manner from persistently infected individuals, studies elucidated the process of variant emergence, at least in part (93, 94). In addition, the SARS-CoV-2 genomes from non-human hosts were used to track the origins of some mutations associated with animal adaptations and to identify transmission routes to humans (95, 96). Also, genomic surveillance of SARS-CoV-2 originating from environmental and wastewater sequencing provided information on upcoming waves of infection and novel variants before detection in human surveillance (97).

Metagenomics: the microbial community approach

Metagenomics comprises NGS approaches that sequence and analyze multiple genomes of organisms simultaneously. Metagenomics combined with untargeted nucleic acid extraction allows a mixture of multiple genomes that cannot be cultured in the laboratory to be detected, enumerated, and functionally characterized. The key feature characterizing metagenomics as a unique technique is that it can be universally applied to bacterial, viral, and eukaryotic microbes all at once (98). It also circumvents the challenges of viable but non-culturable (VBNC) microbes and opens a transformative avenue for infectious disease investigations (99). Prior to metagenomics, analyzing these microbes required either isolating and amplifying a pure culture of any suspected pathogen or performing whole-genome sequencing in repeated rounds, which was highly costly, laborious, and time consuming. The utilization of nucleic acid for assessing bacterial diversity was developed based on the 16S rRNA marker gene and phylogenetic techniques pioneered in the field of environmental microbiology (100, 101). Other markers that are also specific to prokaryotic taxa, such as cpn60, rpoB, and 23S rRNA have been less frequently used (102, 103). Universal targets for eukaryotic organisms include the 18S rRNA gene or, alternatively, the nuclear ribosomal internal transcribed spacer (ITS) for specifically characterizing fungal populations (104, 105). Importantly, these targeted NGS approaches (amplicon sequencing) aiming at microbiome characterization (metataxonomics) have reduced sequencing costs compared with shot-gun metagenomics, contributing to their widespread use in microbiome studies, particularly in those focusing on gut microbial composition and dysbiosis. 
Although these methods provide limited information compared with metagenomics, as functional genes are not covered, the reasonable cost–benefit trade-off owing to the need for lower sequencing depth per sample has contributed to their increasing use in LMICs. Furthermore, functional predictive bioinformatic tools based on taxonomic data are available (106) and can be exploited despite their inherent limitations.
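To make the idea of microbial composition from amplicon data concrete, the sketch below converts per-taxon read counts into relative abundances and computes the Shannon diversity index, a widely used summary of community diversity. It is a minimal illustration only: the function names and the genus-level counts are hypothetical, and real metataxonomic pipelines add steps such as denoising, chimera removal, and taxonomic assignment that are not shown here.

```python
import math


def relative_abundance(counts):
    """Convert raw read counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    proportions = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)


# Hypothetical genus-level read counts for two stool samples (illustration only).
balanced = {"Bacteroides": 250, "Prevotella": 250,
            "Faecalibacterium": 250, "Escherichia": 250}
skewed = {"Bacteroides": 970, "Prevotella": 10,
          "Faecalibacterium": 10, "Escherichia": 10}

print(f"balanced sample: H = {shannon_index(balanced):.3f}")
print(f"skewed sample:   H = {shannon_index(skewed):.3f}")
```

A community dominated by a single genus yields a lower index than an evenly distributed one, which is one simple way in which a shift in composition can be summarized across samples.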

Dysbiosis of the intestinal microbiota is closely associated with human infectious diseases. This is because the intestinal epithelial cells associated with the microbiota form the major host immune barrier. It is estimated that human somatic cells are less numerous than the microbial cells colonizing all the tissues (107). The stability of the human microbiome is influenced by several factors, including human genetics and interactions with the immune system during early development, body site, diet, antibiotic administration, and lifestyle (108). Understanding the human microbiota, how it interacts with the host, and how these microorganisms respond to infectious diseases is of critical importance for the understanding of disease epidemiology and for the development of new clinical interventions in tropical regions. For instance, unique gut microbial species have been identified in COVID-19 patients, such as Streptococcus thermophilus, Bacteroides oleiciplenus, Fusobacterium ulcerans, and Prevotella bivia (49), alongside significant changes in other taxa possibly associated with disease severity. These examples highlight the invaluable power of metagenomics in shedding light on the complex microbial interactions that clearly influence disease outcome.

The power of metagenomics as a One Health discovery tool

Infectious diseases of viral origin, particularly RNA viruses, are shown to cause the majority of the recurrent and novel emerging zoonotic, foodborne, waterborne, and vector-borne disease epidemics globally. The timely identification of novel pathogens has a tremendous impact on the subsequent mitigation and response. Unlike bacterial (16S rRNA) and fungal (ITS) agents, viral agents lack unifying marker genes that can be used to characterize the diversity of the virome. As a result, there is a global trend toward the utilization of whole-genome metagenomic sequencing approaches as mainstream methods to identify novel viruses. A recent example is the first identification of SARS-CoV-2 from a single patient admitted to the Central Hospital of Wuhan on 26 December 2019 experiencing severe respiratory syndrome. Metagenomic sequencing of the bronchoalveolar lavage provided the first conclusive evidence of a novel RNA virus from the family Coronaviridae (109). This discovery had a clear impact: within 12 days, species-specific SARS-CoV-2 RT-PCR was developed for the global monitoring of the spread of infection and to support the public health response (110).

Nearly two-thirds of emerging infectious diseases that affect humans are zoonotic, and three-quarters of these originate in wildlife, making surveillance of wildlife for novel pathogens part of a logical strategy to prevent future emergence. The greatest paradigm shift in recent viral surveillance is the application of metagenomics techniques at prioritized One Health interfaces to identify viruses with the risk of novel emergence. Global projects, such as PREDICT and the Global Virome, utilize this approach in wildlife surveillance of potentially zoonotic viruses. Over the span of 10 years, these efforts have characterized 1,000 unique viruses, including zoonotic diseases of public health concern such as Bombali ebolavirus, Zaire ebolavirus, Marburg virus, and MERS- and SARS-like coronaviruses (111, 112). Furthermore, metagenomic approaches can detect co-infections, attribute sources, and screen environmental samples for previously unreported or divergent pathogens (113, 114). Innovative metagenomic protocols such as meta-total RNA sequencing (MeTRS) displayed improved sensitivity and linearity in detecting viral, bacterial, and fungal communities with the greatest reproducibility while requiring lower sequencing depth (Figure 3). As a result, this approach has been increasingly used to investigate the prevalence, diversity, abundance, and co-occurrence of low-abundance pathogens; for instance, this approach was applied to Campylobacter in infants to explain colonization patterns that may have occurred through multiple reservoirs or from a reservoir that several Campylobacter species coinhabit (113). In this study, our team has also shown the use of metagenomics in studying 105 stool samples collected from children infected with multiple viral genera (Figure 4). Although Enterovirus was the most common, several children were shown to be infected with three or four other viruses.

FIGURE 3

Figure 3 Campylobacter spp. prevalence, diversity, and abundance in children’s stools detected using MeTRS. Blue and red cells represent the absence or presence, respectively, of Campylobacter spp. in stool samples. Adopted from the study by Terefe et al. (113; Frontiers in Public Health). MeTRS is found to be more sensitive than PCR and detects more species of Campylobacter than reported in other studies that utilized the 16S method.

FIGURE 4

Figure 4 Metagenomic profile of the gastrointestinal tract from 105 children’s stool samples from Ethiopia. Vertical columns represent each child’s sample. The first four columns are negative controls. Enteroviruses were the most common in most children. The reads were filtered to include viruses with a large number of gene segment reads.

Transcriptomics

Analysis of the pathogen transcriptome is critical for understanding host–pathogen interactions, predicting antimicrobial resistance, predicting virulence, and tracking disease progression in humans and animals. A hybridization-based microarray approach has been used in the past for studying gene expression, but it has many limitations, such as the need to know the sequence of genes to be studied in advance, artifacts due to cross-hybridization, and inaccuracies in quantifying gene transcript levels (115, 116). Other sequencing techniques, such as serial analysis of gene expression (SAGE) and cap analysis gene expression (CAGE), were found to be better than a microarray approach for the quantification of genes, but they are labor intensive, require large quantities of RNA, and are not useful for quantifying spliced isoforms (117). In addition, all these previous methods are expensive and very difficult to implement in tropical developing regions because of a lack of skilled personnel. Next-generation RNA sequencing (RNA-Seq) is now routinely used for studying gene expression in pathogens and the host to predict and identify regulatory pathways involved in disease pathogenesis. RNA-Seq enables high-throughput quantitative profiling of transcriptional gene expression in an unbiased manner. Several technical approaches are currently available for RNA-Seq, with each approach having its advantages and disadvantages. Bulk RNA-Seq involves the sequencing of total RNA, which contains ribosomal RNA (rRNA), pre-mRNA, and different classes of non-coding RNA (ncRNA). This sequencing approach is useful for measuring global gene expression patterns, alternative splicing, gene fusion, new transcripts, and isoform expression (118). The most commonly used bulk RNA-Seq approach involves single-end short-read sequencing, which is largely focused on mRNA and targets the poly(A) tail of mRNA using poly(T) oligonucleotides rather than depleting rRNA.
This approach is useful for studying differentially expressed genes (DEGs) between the biological samples. The other approach, which is primarily useful for analyzing point mutations, novel transcripts, and long non-coding RNAs, usually involves the sequencing of rRNA-depleted libraries for better in-depth coverage. Nonetheless, bulk RNA-Seq fails to capture transcriptomes of individual cells within the biological sample and their heterogeneity. Bulk RNA-seq has been used to assess drug resistance (119); the RNA interactome (120); host–parasite interactions (121); and the characterization of the pathogen stages in the vector (122) in zoonotic diseases such as leishmaniasis, which is caused by a protozoan parasite Leishmania.
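The comparison of expression between biological samples described above can be sketched in its simplest form as library-size normalization followed by a log2 fold change per gene. The example below is a minimal illustration under stated assumptions: the gene names and read counts are hypothetical, counts per million (CPM) is only one of several normalization schemes, and a real DEG analysis requires biological replicates and a statistical model, neither of which is shown.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts by library size so that samples are comparable."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2(treated / control) on CPM values.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2((cpm_treated[gene] + pseudocount)
                        / (cpm_control[gene] + pseudocount))
        for gene in control
    }


# Hypothetical raw read counts for three genes in two samples (illustration only).
control = {"geneA": 100, "geneB": 100, "geneC": 800}
treated = {"geneA": 400, "geneB": 100, "geneC": 500}

lfc = log2_fold_change(control, treated)
for gene, value in lfc.items():
    print(f"{gene}: log2FC = {value:+.2f}")
```

In this toy input, geneA is roughly fourfold higher in the treated sample (log2 fold change close to 2), geneB is unchanged, and geneC is lower.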

Recently, the single-cell RNA-Seq (scRNA-Seq) technique has become one of the preferred tools for studying transcriptome heterogeneity and discovering rare and uncommon cells. This technique allows for a rapid analysis of gene expression profiles of up to 20,000 individual cells in a single assay, and the identification and characterization of cell populations based on their transcriptomic signatures. scRNA-Seq studies using fluorescent bacterial strains have revealed host-cell heterogeneity in vitro (123–125). Furthermore, Pisu et al. (126) used a multimodal scRNA-Seq approach in Mycobacterium tuberculosis infection that enabled the simultaneous acquisition of the host cell transcriptome and bacterial phenotyping of the individual infected cell. They demonstrated the functional heterogeneity of infected host cells in vivo, which could not be revealed using conventional technology (126). In addition, scRNA-Seq has proved critical for investigating transcriptomic diversity in eukaryotic pathogens such as Plasmodium and Leishmania. In malaria, scRNA-Seq analysis of different parasite life stages has revealed transcriptome signatures that are associated with pathogenicity (127), sexual commitment (128), asynchronous development, and sporozoite maturation (129). In addition, single-cell epigenomics and scRNA-Seq can identify the transcriptome changes that regulate the development of host Th1 and Tfh cells in malaria (130, 131). An interactive Malaria Cell Atlas developed following comprehensive functional genomics analysis by scRNA-Seq of various Plasmodium species and parasitized host cells is a major milestone for malaria research (132). Similarly, the scRNA-Seq technique has been effectively employed to identify specific Leishmania parasite populations with a discernible transcriptomic signature that are involved in the formation of parasite hybrids between different Leishmania species in vitro under stress (133).
Tissue-specific dual RNA-Seq, in which both host and parasite transcriptomes are sequenced by bulk RNA-Seq, of the liver and spleen of Leishmania donovani- and Leishmania infantum-infected mice has identified distinct pathways that could be involved in the pathogenesis of these two infections (134). In a recent study using in silico cell-type deconvolution for transcriptome analysis, a unique transcriptome signature in bone marrow stem cells from L. donovani-infected mice and the blood of VL patients from India, Ethiopia, and Brazil was identified as associated with parasite persistence (135). In addition, scRNA-Seq of the cells isolated from L. major-infected mice identified cellular heterogeneity and transcriptome alterations in the cells at the site of infection (136). Nonetheless, single-cell-based RNA-Seq methods have not yet been used for simultaneous analysis of both host and parasite transcripts in vivo. The analysis of host and parasite transcripts at the single-cell level would allow for the identification of novel cell types via cell annotation by transcriptomic analysis, which is not possible using conventional methods such as flow cytometry. Such an unbiased approach can be widely applied to other infectious diseases for which reference transcriptomes are available.

Challenge: genomics is expensive and metagenomics even more expensive

Since 2005, the number of NGS platforms with different costs, chemistries, capacities, and applications in microbial genomics has increased. With every emerging NGS technology, it is important to evaluate its use in accordance with the laboratory’s research needs and throughput requirements before adoption. Selection of the appropriate sequencing platform requires grounded knowledge on reagent accessibility, cost, and instrument preventive and curative technical support. Several guidance documents have reviewed the infrastructural and operational need for the establishment of genomic- and bioinformatics-associated capacities (137, 138). In addition, major companies in the sequencing space offer NGS solutions varying in size, cost, and throughput according to research and clinical service needs, as in the case of high-throughput sequencing cores. However, as of today, the NGS market share of infectious diseases/microbiological research is below that of human genetic, germline testing, and cancer-related applications, making it difficult to obtain the technical support and interest from companies necessary to invest in innovative means to scale down NGS costs for routine use in infectious disease research. Often, whole-genome bacterial and viral sequencing is performed with benchtop sequencers with a few exceptions for highly repetitive bacterial genomes that may require long reads, although hybrid sequencing usually seems to be the optimal strategy for superior genome assembly in such cases. Most pathogen sequencing is performed with short-read sequencing technology for high raw read accuracy. For applications that need a greater depth of sequencing, such as the characterization of minor variants in a microbial population and metagenomic studies, high-throughput sequencers are often needed. These requirements increase the initial investment from a few thousand to millions of dollars, depending on the sequencing platform. 
Substantial investments are also needed for ancillary equipment associated with quality control, measurement, and library preparation. The different chemistries available for NGS also have cost implications as, for example, metagenomic sequencing requires unbiased pathogen detection, hence careful assessment and consideration is needed to avoid errors and biased amplification of organisms. This is especially important with microbiologic and clinical research where abundance can be associated with disease severity or progression (139).

Bioinformatics: computational needs and analytical tools

In most developing countries, investment in genomics and bioinformatics capabilities was spurred only recently by the COVID-19 pandemic, as a reactive measure rather than a proactive investment (140). As a result, the laboratory facilities established have minimal or no computational capacity or expertise in the area of bioinformatics. An essential part of NGS (including genomics, metagenomics, and transcriptomics) capacity is the availability of infrastructure for data acquisition, analysis, and storage. Currently, instruments on the market produce sequencing data in quantities too large for most standard computers to process (60). Hence, laboratories in developing countries generate data that are beyond their capacity to store, manage, and analyze. In addition, the lack of harmonization in data streaming has created a choice of multiple (i.e., local, offsite, or cloud-based) platforms for data storage, and the preferred approaches vary depending on cost, privacy risk, internet reliability, and ease of accessibility.

Genomic, metagenomic, and transcriptomic data analysis steps remain a challenge because the NGS raw sequence data (FASTQ) require complex bioinformatic steps to assemble fragments. Bioinformatic workflows for read-level processing, mapping, genomic assembly, polishing, variant calls, and clustering using phylogenetic approaches require individual tools that are not yet harmonized. Handling, interpreting, and making use of the massive datasets available, especially through metagenomics, requires a high level of computational skill and an up-to-date understanding of the taxonomic landscape via a well-curated database to provide context for and an interpretation of the result (67).
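As a small illustration of the read-level processing step mentioned above, the sketch below parses FASTQ records and keeps only reads whose mean base quality meets a threshold. It assumes four-line records and the Phred+33 quality encoding; the function names, the threshold, and the two example reads are hypothetical, and production workflows would normally rely on established quality-control tools rather than hand-written parsers.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if (not header.startswith("@") or not sequence
                or len(sequence) != len(quality)):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(handle, min_mean_quality=20.0):
    """Keep (name, sequence) pairs whose mean quality meets the threshold."""
    return [(name, seq) for name, seq, qual in read_fastq(handle)
            if mean_phred(qual) >= min_mean_quality]


# Two hypothetical reads: 'I' encodes Phred 40 and '!' encodes Phred 0.
example = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"
kept = filter_reads(io.StringIO(example))
print(kept)
```

Only the first read passes the filter in this toy input. Later steps such as mapping, assembly, and variant calling each depend on separate tools, which is part of the harmonization problem described in the text.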

Another important challenge that has been observed in the context of NGS-generated information relates to data ownership and regulations for data sharing. It is undeniable that technological innovation and infectious disease control are affected by data access. In the context of microbial sequencing data, the sharing of pathogen information can improve surveillance systems and support drug development to control infectious diseases (141). An important example of the benefits of data sharing was observed during the SARS-CoV-2 pandemic, enabling a rapid response to the pandemic at both individual and population levels and contributing to the control of infection not only in high- but also middle- and low-income countries (142). However, complex barriers exist, primarily related to the conflicting interests and motivations of different sectors involved in data sharing, such as prioritizing public health, basic research, economic prosperity, and innovation. Broad discussion and coordination among sectors are urgently needed to establish rules and to ensure that data access is not a limiting factor for infectious disease control and inequality reduction.

Considering these limitations, internet and cloud technologies are providing remarkable opportunities for remote learning and data analysis, directly benefiting laboratories with limited infrastructure and a lack of specialized staff, as commonly seen in LMICs. Applications and pipelines for NGS-data analysis have been discussed elsewhere (67), including tools such as Taxonomer, EDGE, and Pathosphere. These and other low-cost or free bioinformatics platforms are shown in the Supplementary Material (S1).

Metabolomics application in One Health

The metabolome represents the entire low-molecular-weight metabolites of an organism and thus is distinctly different from genomic or transcriptomic sequences. The metabolite profiling of these entities using spectrometry techniques is termed metabolomics. As a relatively newer member of the ‘omics’ family, metabolomics seeks to define the entire complement of metabolites within a cell, tissue, or biofluid (143, 144). Metabolomics provides an analytical approach to reveal the response of a biological system to endogenous or exogenous stimuli, and, consequently, depict its steady-state physiological state (145). As the downstream product of gene transcription, proteome expression, and lipidome composition, the metabolome (including lipidomics) is tightly coupled to other omics fields such as genomics, transcriptomics, and proteomics. It represents an ideal platform for understanding the global systems biology of an organism and its relationship with environmental stimuli (146).

The development of metabolomics over the past two decades has mainly been spurred by the advancement of analytical techniques and bioinformatic tools (147). The most common instruments used in metabolomic analyses include liquid or gas chromatography coupled with high-resolution mass spectrometry (LC-MS/MS, GC-MS/MS) and nuclear magnetic resonance (NMR). The metabolomic approaches used for such purposes can generally be grouped into two classes: metabolite profiling (untargeted) and metabolite fingerprinting (targeted) strategies. The former is a hypothesis-free global investigation of metabolites in a biological system, whereas the latter represents a hypothesis-driven analysis of predefined metabolites (148–150; Castro-Puyana et al., 2013). The main objective of an untargeted metabolite-profiling approach is to establish patterns between different biological samples to potentially indicate significant differences and lead to biomarker identification. On the other hand, metabolic fingerprinting involves analysis of a specified list of metabolites as a result of certain biochemical questions about a particular pathway of an organism. Such targeted analysis characterizes phenotypes and attempts to describe the metabolic signatures or patterns left by environmental factors (151). Between these two approaches to metabolomics, the untargeted profiling of metabolites provides more potential for the discovery of novel metabolic associations of diseases, and has the potential to better characterize exposure and detect early markers of diseases, thus improving diagnostic and therapeutic methods.

The application of metabolomic techniques can provide several complementary inputs to classical molecular epidemiology approaches. The computational and technological tools of metabolomics allow metabolite data to be strongly correlated with metabolic pathways (152, 153). Moreover, the application of sensitive and robust analytical methods such as high-resolution mass spectrometry combined with chromatographic techniques (HPLC-MS/MS) or time-of-flight mass spectrometry (LC-TOF MS) significantly improves the sensitivity and specificity of small-molecule detection and, consequently, allows the characterization and quantification of complex metabolic profiles in biological samples, which in turn results in the simultaneous measurement of hundreds of metabolites in a single sample (154, 155). In this section, we will review the application of metabolomics within the One Health concept by taking two countries, Ethiopia and Brazil, as model references and highlight the role of this technique in the clarification of issues that are relevant to the One Health topic.

Metabolomics applications in zoonotic infectious diseases and vector-borne diseases

The field of metabolomics has provided researchers with a new generation of tools that detect the small-molecular metabolites of organisms with improved sensitivity, ultimately improving our understanding of pathogen detection and analysis of diversity. Although this powerful analytical method has been extensively applied to molecular epidemiology and related fields, its use in the elucidation of zoonotic infections is still underexplored. This limitation is particularly prominent in the Global South, including sub-Saharan Africa and Latin America. However, there are a few studies reporting the use of metabolomics-based approaches to address zoonotic infections. For example, Lagatie et al. (156) reported on the use of metabolomics to identify potential markers of infection by the nematode Ascaris lumbricoides in populations from Ethiopia, Kenya, Belgium, and Indonesia. In another example of the application of metabolomics in One Health, serologic markers have been identified by NMR-based metabolomics in ruminants affected by caseous lymphadenitis caused by Corynebacterium pseudotuberculosis (157).

Modern metabolomic-based methods encompass the application of advanced analytical instrumentation to study the vector–host–pathogen interplay. Several metabolomic changes have been detected in response to vector-borne infections (including dengue, chikungunya, malaria, and Zika virus) in serum, blood, and urine from humans, mice, and monkeys (158). Given that such markers could be identified in biofluids, such as serum and urine, metabolomic-based analysis can be used for non-invasive, population-based screening of various vector-borne diseases (159). Although there are few studies reporting the use of metabolomics in regions affected by neglected tropical and vector-borne diseases, interesting findings related to shifts in the metabolic profiles of serum samples of Zika-virus-infected newborns with or without microcephaly have been reported (160). The use of metabolomics together with other omics approaches (lipidomics, proteomics, and genomics) may provide new insights and aid future studies focused on disease comprehension and therapeutic applications for vector-borne diseases.

Metabolomics applications in food safety and toxicity

Food safety is a major concern for governments and regulatory agencies as a result of the significant increase in demand for animal- and plant-derived food globally in parallel with the rise of food fraud and adulteration. This is a potential field for the application of metabolomics, as there are numerous laws around the world regarding hazardous compounds such as chemical contaminants and toxins and their maximum residue limits (MRLs) (161). Current trends in food science have moved toward a multi-omics examination of metabolites using metabolomic analysis in combination with other omics technologies (e.g., genomics, transcriptomics, proteomics, and lipidomics) to elucidate the fundamental mechanisms of food spoilage and adulteration. In a study using LC-MS/MS, metabolomic methods showed the contamination of Brazil nuts with aflatoxins B1, B2, G1, and G2 (162, 163).

Owing to its robustness and high sensitivity, metabolomics has now been applied to the concept of the “exposome” (i.e., the entirety of human environmental exposures) and to the detection of the molecular imprint of intermediate markers of exposure and diseases, leading to the identification of causative agents (164). A study of Hirmi Valley liver disease (HVLD) in the Tigray region of Ethiopia by Robinson et al. (165) presented a unique application of metabolomics to human liver injury caused by plant hepatotoxins known as pyrrolizidine alkaloids. Using untargeted global ¹H NMR analysis, a number of metabolites associated with liver dysfunction have been identified in urine samples of HVLD cases, together with indicators of changes to the gut microbiome in humans in response to this plant hepatotoxin.

Metabolomics applications in environmental health

Biological systems are continuously exposed to several environmental factors with biochemical consequences, and, subsequently, environmental metabolomics is gaining ground as one of the emerging omics fields. It can be used for examining the metabolic fingerprints of environmental stressors, and, subsequently, for the development of biomarkers of toxicant exposure, metabolic response, susceptibility, and disease diagnosis and monitoring (166, 167). The application of metabolomics in this field will bring an improved understanding of the underlying mechanisms of toxic compounds in the environment. However, little evidence exists of the application of this field in resource-limited regions globally. The use of metabolomics in tropical regions, where environmental stressors are rapidly increasing as a result of urbanization, industrialization, climate changes, and other factors, is encouraged.

Metabolomics applications in non-communicable diseases

NCDs are responsible for 74% of all deaths globally, of which 77% are in low- and middle-income countries (168). Current clinical practices with respect to NCDs mainly focus on a select number of biochemicals that are directly related to pathophysiology and disregard potential interactions with the rest of a patient’s metabolome. Clinicians are thus prevented from making the best possible therapeutic interventions, because the lack of such multi-omics data limits a systems-level approach to disease. Several molecular epidemiology studies have used metabolomic data to explain the metabolic pathways associated with various pathologies, subsequently developing several biomarkers for disease diagnosis and prognosis in type II diabetes, cardiovascular diseases, hypertension, cancer, and heart failure (Wang et al., 2011; 169–173).

Although no such studies were found in our model countries of Brazil and Ethiopia, a study carried out in Western Africa (i.e., the Gambia) applied metabolomics analysis to the diagnosis of pneumonia, which was shown to be successful in identifying a number of urinary and plasma metabolites with a strong correlation to the incidence and severity of childhood pneumonia infection (174).

Challenges of applying metabolomics in One Health

At the macro level, as with other omics approaches, the main challenges of applying metabolomics in tropical developing regions are the lack of skilled personnel and the lack of necessary infrastructure. Even when these are available, there are technical and statistical challenges intrinsic to the processing of metabolomic data. The technical challenges include signal fluctuation, sensitivity reduction, environment alterations, and analytical variances (152), which usually arise during sample handling and analytical laboratory preparation. Statistical challenges arise from the lack of suitable software applications that can process the multidimensional and highly correlated metabolite variables of datasets. Moreover, there is a lack of standardized protocols concerning the type of specimens, sampling methods, storage conditions, and profiling-related variables (150, 175, 176). This is a critical factor for population-based epidemiological studies that combine metabolomic data from different sources.
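To make the point about highly correlated metabolite variables concrete, the following minimal Python sketch flags pairs of metabolites whose intensity profiles are strongly correlated across samples. It is an illustration only, not a method taken from the cited studies: the function names, the metabolite names, the intensity values, and the 0.9 threshold are all hypothetical assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(profiles, threshold=0.9):
    """Return metabolite pairs whose absolute correlation meets the threshold.

    `profiles` maps a metabolite name to its intensities across samples.
    The threshold of 0.9 is an arbitrary, illustrative choice.
    """
    pairs = []
    for (name_a, a), (name_b, b) in combinations(profiles.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


# Hypothetical intensities for three metabolites measured in five samples.
profiles = {
    "metabolite_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "metabolite_B": [2.1, 3.9, 6.2, 8.0, 9.9],
    "metabolite_C": [5.0, 1.0, 4.0, 2.0, 3.0],
}

for name_a, name_b, r in correlated_pairs(profiles):
    print(f"{name_a} and {name_b} are strongly correlated (r = {r})")
```

In this toy dataset only the first two metabolites are flagged. In real datasets with hundreds or thousands of metabolites, such correlated groups are common, which is one reason why methods that treat each variable as independent can be misleading.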

Standardization is urgently required for the epidemiological application of metabolomics; a number of international collaborations are now working toward the development of standardized protocols guiding sample collection, processing, and data acquisition (177). Similarly, international collaborations are under way to address statistical challenges such as missing data and multiple correlated datasets, which for the most part have not yet yielded any gold standard approaches (175, 178–180).
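As an illustration of the missing-data problem mentioned above, the sketch below shows one simple and widely used heuristic: metabolites with too many missing values are dropped, and the remaining gaps are filled with half of the lowest observed value for that metabolite. This is a hedged example under stated assumptions, not a gold standard and not an approach endorsed by the cited collaborations; the function name, the 50% cut-off, and the data are hypothetical.

```python
def impute_half_minimum(table, max_missing_fraction=0.5):
    """Drop sparse metabolites, then fill remaining gaps with half the minimum.

    `table` maps a metabolite name to a list of intensities, where None marks
    a missing measurement. Metabolites whose missing fraction exceeds
    `max_missing_fraction` (an illustrative cut-off) are excluded.
    """
    cleaned = {}
    for name, values in table.items():
        observed = [v for v in values if v is not None]
        missing_fraction = 1 - len(observed) / len(values)
        if not observed or missing_fraction > max_missing_fraction:
            continue  # too sparse to impute reliably
        fill = min(observed) / 2
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Hypothetical intensities for two metabolites across four samples.
example = {
    "metabolite_1": [4.0, None, 6.0, 8.0],
    "metabolite_2": [None, None, None, 2.0],
}
print(impute_half_minimum(example))
```

Half-minimum imputation assumes that values are missing because they fall below the detection limit. When values are missing for other reasons, this assumption can bias downstream statistics, which is part of why no single approach has been accepted as a standard.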

Concluding remarks and recommendations

Molecular epidemiology is a crucial discipline that allows us to address critical public health challenges. This is particularly significant considering the rising burden of communicable and non-communicable diseases worldwide, especially those at the interface of human health, animal health, and the environment, which ultimately require the One Health approach. The omics approaches described in this review have brought much more powerful, discriminatory techniques that allow us to discern etiologic agents, track infectious strains, and characterize metabolites at a large scale in complex biological systems. These technological developments have enabled developed regions (mainly in the Global North) to accelerate their capability for the early detection of potential public health emergencies and for the development and use of biomarkers for the early detection of NCDs such as cancer, among many others. In contrast, tropical developing regions (mainly in the Global South) have lagged far behind in developing skilled personnel and strengthening laboratory infrastructure. This disparity must be narrowed so that global health can be improved equitably and the planet can be made healthy and livable for everyone.

Increasingly, omics tools have become essential for surveillance, research, and outbreak control in epidemic and pandemic settings. However, there is great variation in their implementation across the world. This disparity exists not only between countries with a high GDP (commonly referred to as the Global North) and those with a low GDP (commonly referred to as the Global South), but also among countries in tropical regions, where most Global South nations are located. We recommend the following:

● A major emphasis on strengthening laboratory infrastructure capacity for omics is important.

● Considering the significant gap in human capacity development, we recommend workforce training at PhD and laboratory technician levels to establish a functional critical mass of skilled workers who can address the needs in omics.

● In order to realize the above two points, it is crucial that developing countries in tropical regions invest significant financial resources in building and strengthening omics capacity.

● To accelerate the implementation of omics, it is also important to establish more robust technology transfer and data exchange systems, encompassing data access, security, and other important areas.

● It should also be noted that omics is much broader than genomics. We recommend the implementation of a more complete package of omics, including metabolomics and transcriptomics, in order to realize effective genotyping and phenotyping profiles.

Author contributions

FT-S: writing—original draft, reviewing, and editing. ZM: writing—original draft, reviewing, and editing. AS: writing—original draft, reviewing, and editing. GFCS: writing—original draft, reviewing, and editing. WG: conceptualization, writing—original draft, reviewing, and editing. CO: conceptualization, writing—original draft, reviewing, and editing. All authors contributed to the article and approved the submitted version.

Acknowledgments

We thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, proc. 3136678/2020-0) for providing a Research Productivity Fellowship (Level 1-D) granted to CJBO. Part of the funding for this work was provided through the NIH Fogarty International Center, GID D43 TW008650. Parts of this work were also funded by the Bill & Melinda Gates Foundation (OPP#1175487). Under the grant conditions of the Foundation, a Creative Commons Attribution 4.0 Generic License has already been assigned to the Author Accepted Manuscript version that might arise from this submission.

Conflict of interest

The author AS declares that they were an editorial board member of Frontiers at the time of submission. This had no impact on the peer review process and the final decision.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fitd.2023.1151336/full#supplementary-material

References

1. Jones KE, Patel N, Levy M, Storeygard A, Balk D, Gittleman JL, et al. Global trends in emerging infectious diseases. Nature (2008) 451(7181):990–94. doi: 10.1038/nature06536

2. Allen T, Murray KA, Zambrana-Torrelio C, Morse SS, Rondinini C, Di Marco M, et al. Global hotspots and correlates of emerging zoonotic diseases. Nat Commun (2017) 8(1):1124. doi: 10.1038/s41467-017-00923-8

3. Fuller R, Rahona E, Fisher S, Caravanos J, Webb D, Kass D, et al. Pollution and non-communicable disease: time to end the neglect. Lancet Planet Health (2018) 2(3):e96–8. doi: 10.1016/S2542-5196(18)30020-2

4. Gebreyes WA, Dupouy-Camet J, Newport MJ, Oliveira CJ, Schlesinger LS, Saif YM, et al. The global one health paradigm: challenges and opportunities for tackling infectious diseases at the human, animal, and environment interface in low-resource settings. PLoS Negl Trop Dis (2014) 8(11):e3257. doi: 10.1371/journal.pntd.0003257

5. Belay ED, Kile JC, Hall AJ, Barton-Behravesh C, Parsons MB, Salyer S, et al. Zoonotic disease programs for enhancing global health security. Emerg Infect Dis (2017) 23(13):S65–70. doi: 10.3201/eid2313.170544

6. Sharun K, Dhama K, Pawde AM, Gortázar C, Tiwari R, Bonilla-Aldana DK, et al. SARS-CoV-2 in animals: potential for unknown reservoir hosts and public health implications. Vet Q (2021) 41(1):181–201. doi: 10.1080/01652176.2021.1921311

7. Gebreyes WA, Jackwood D, de Oliveira CJB, Lee CW, Hoet AE, Thakur S. Molecular epidemiology of infectious zoonotic and livestock diseases. Microbiol Spectr (2020) 8(2):8–2. doi: 10.1128/microbiolspec.AME-0011-2019