A Machine-Generated View of the Role of Blood Glucose Levels in the Severity of COVID-19

SARS-CoV-2 started spreading toward the end of 2019 causing COVID-19, a disease that reached pandemic proportions among the human population within months. The reasons for the spectrum of differences in the severity of the disease across the population, and in particular why the disease affects more severely the aging population and those with specific preconditions are unclear. We developed machine learning models to mine 240,000 scientific articles openly accessible in the CORD-19 database, and constructed knowledge graphs to synthesize the extracted information and navigate the collective knowledge in an attempt to search for a potential common underlying reason for disease severity. The machine-driven framework we developed repeatedly pointed to elevated blood glucose as a key facilitator in the progression of COVID-19. Indeed, when we systematically retraced the steps of the SARS-CoV-2 infection, we found evidence linking elevated glucose to each major step of the life-cycle of the virus, progression of the disease, and presentation of symptoms. Specifically, elevations of glucose provide ideal conditions for the virus to evade and weaken the first level of the immune defense system in the lungs, gain access to deep alveolar cells, bind to the ACE2 receptor and enter the pulmonary cells, accelerate replication of the virus within cells increasing cell death and inducing an pulmonary inflammatory response, which overwhelms an already weakened innate immune system to trigger an avalanche of systemic infections, inflammation and cell damage, a cytokine storm and thrombotic events. We tested the feasibility of the hypothesis by manually reviewing the literature referenced by the machine-generated synthesis, reconstructing atomistically the virus at the surface of the pulmonary airways, and performing quantitative computational modeling of the effects of glucose levels on the infection process. We conclude that elevation in glucose levels can facilitate the progression of the disease through multiple mechanisms and can explain much of the differences in disease severity seen across the population. The study provides diagnostic considerations, new areas of research and potential treatments, and cautions on treatment strategies and critical care conditions that induce elevations in blood glucose levels.

SARS-CoV-2 started spreading toward the end of 2019 causing COVID-19, a disease that reached pandemic proportions among the human population within months. The reasons for the spectrum of differences in the severity of the disease across the population, and in particular why the disease affects more severely the aging population and those with specific preconditions are unclear. We developed machine learning models to mine 240,000 scientific articles openly accessible in the CORD-19 database, and constructed knowledge graphs to synthesize the extracted information and navigate the collective knowledge in an attempt to search for a potential common underlying reason for disease severity. The machine-driven framework we developed repeatedly pointed to elevated blood glucose as a key facilitator in the progression of COVID-19. Indeed, when we systematically retraced the steps of the SARS-CoV-2 infection, we found evidence linking elevated glucose to each major step of the life-cycle of the virus, progression of the disease, and presentation of symptoms. Specifically, elevations of glucose provide ideal conditions for the virus to evade and weaken the first level of the immune defense system in the lungs, gain access to deep alveolar cells, bind to the ACE2 receptor and enter the pulmonary cells, accelerate replication of the virus within cells increasing cell death and inducing an pulmonary inflammatory response, which overwhelms an already weakened innate immune system to trigger an avalanche of systemic infections, inflammation and cell damage, a cytokine storm and thrombotic events. We tested the feasibility of the hypothesis by manually reviewing the literature referenced by the machine-generated synthesis, reconstructing atomistically the virus at the surface of the pulmonary airways, and performing quantitative computational modeling of the effects of glucose levels on the infection process. We conclude that elevation in glucose levels can facilitate the progression of the disease through multiple mechanisms and can explain much of the differences in disease severity seen across the population. The study provides diagnostic considerations, new areas of research and potential treatments, and cautions on treatment strategies and critical care conditions that induce elevations in blood glucose levels.

INTRODUCTION
SARS-CoV-2, a novel coronavirus closely related to its predecessor SARS-CoV-1 that was responsible for an outbreak in 2003, emerged toward the end of 2019 in China and reached pandemic proportions, probably within a month (1,2) causing the disease COVID-19. The actual average mortality rate is lower than the current 2-3% of all confirmed infections because this coronavirus also causes asymptomatic infections in a larger proportion of the population (3,4). Nevertheless, even with an order of magnitude more asymptomatic than symptomatic infections, this virus would cause over a 100 million hospitalizations and tens of million deaths if allowed to fully penetrate the world population. There are also increasing reports of persistent symptoms and various long-term sequelae from COVID-19 (5)(6)(7)(8), warning of an even deeper health crisis. Containment strategies, and lockdown when these fail, slow down full penetration of the world's population allowing nations time to prepare a public health strategy, improve treatments and develop vaccines. This is a major challenge since the current rate of infections is still in the hundreds of thousands per day, which provide ideal conditions for the virus to mutate. The disease has thus become endemic in the world and will most likely remain a health crisis for many years to come. It is thus of paramount importance to gain deep insight into the factors responsible for the progression of the disease, to improve disease management, and to develop new treatment strategies.
The main symptoms of COVID-19 are fever, cough, fatigue, dyspnea, myalgia, and chest pain, with diarrhea included among the less common symptoms (9)(10)(11)(12)(13). In addition, anosmia and a loss of taste are other early and long-lasting typical symptoms (14,15). In 70-80% of known cases, patients present with mild to moderate symptoms and the disease is manageable without hospitalization, with patients recovering within a few days or weeks. However, in about 15% of known infections the disease progresses to a severe form, with pneumonia as the primary complication often requiring hospitalization. Lung capacity decreases significantly and blood oxygen levels drop dangerously low, requiring nasal oxygen, and in more severe cases, intubation using mechanical ventilators. In 4-7% of known cases the disease becomes life threatening, requiring intensive care (16), with acute respiratory failure in around 20% of these cases (17).
The substantial amount of patient data that has become available has allowed the early identification of groups of people at higher risk of the disease progressing to a severe form and with a higher mortality rate. Of all COVID-19 deaths, more than 50% are patients over 80 years old (Supplementary Figure 1A). Indeed, the case fatality rate (CFR; the percentage of deaths among positively diagnosed infections) increases sharply with age: from <1% below the age of 50 years, to 2-3% around 60 years, and as much as 10-20% above the age of 80 (Supplementary Figure 1B). The main risk factors that add to this age-related CFR include hypertension, cardiovascular diseases, diabetes mellitus (DM) and severe obesity (18)(19)(20)(21)(22)(23)(24), with varying impact depending on the country (25). The precedent SARS-CoV-1 showed a similar clinical profile and also affected more severely the elderly and those with diabetes and hypertension (26)(27)(28). In fact, the mortality rate (MR; the percentage of deaths among all people) increases with age for many other diseases as well, and patients with diabetes, hypertension or cardiovascular disease are also more susceptible to succumbing to a range of diseases (29,30), including even seasonal influenza infections (31,32).
A puzzling aspect of COVID-19 is why the disease becomes so severe with age and preconditions, and in some apparently healthy or young patients. Most of these critical cases seem to be associated with a "cytokine storm" in the lungs (33,34), an exaggerated immune response that produces high levels of cytokines that damages the airway epithelium, leading to acute respiratory distress syndrome (ARDS), requiring ventilation or intensive care with intubation, which is fatal in 20-50% of cases (24,(35)(36)(37)(38). Survivors of the cases that require invasive ventilation also need long-term rehabilitation (39). The 20-50% deaths in intensive care units (ICU) is due to respiratory failure, multi-organ failure and/or septic shock (40,41). It has furthermore emerged that the virus affects blood coagulation, leading to micro-or macro-vascular thromboses often associated with acute pulmonary embolism and cardiac injury (42)(43)(44)(45)(46).
This pandemic has accelerated the development of a large number of vaccines on an unprecedented timeline (64)(65)(66). Several vaccines, based on different strategies (vector-based, mRNA-based or protein-based) and delivery systems (lipid nanoparticles, attenuated viruses) with proven efficacy and safety, are now available (60,(67)(68)(69)(70)(71). The vaccination campaign has significantly progressed in many countries, but the time required to get enough people vaccinated worldwide to eradicate the virus, or to lower the risk of new variants emerging is still too slow to stop the spread of the virus. With global travel mobility, virus variants may require vaccination boosters or complete restarts in nations previously fully vaccinated. Other uncertainties include the period of immunity and efficacy of the vaccines in the various groups at risk (72) and hence, investigations into the pathophysiology of SARS-CoV-2 and new treatments must continue in parallel and with urgency.
Understanding why some groups are naturally protected while others are vulnerable (73) may improve management of this disease. All the known preconditions (i.e., aging, DM, obesity, hypertension) are commonly accepted to be associated with chronic inflammation and a weaker immune system, which could explain the higher sensitivity and complications of the disease (74)(75)(76)(77)(78). Another association with severe cases that is emerging is hyperglycemia (53,55,79,80), and it is now well-accepted that a tight control of glucose levels is important in the management of COVID-19, not only in patients with diabetes (81)(82)(83)(84)(85) but also in general (86). However, the role that glucose plays in the progression of the disease and the importance of managing glucose levels in the aging population, in people with diabetes and in apparently healthy groups, is unclear.
In 2020, the White House launched the CORD-19 database (COVID-19 open Research Dataset), a dataset of full text articles on COVID-19, SARS-CoV-2, and related coronaviruses (87) that has been made open access to facilitate global collaboration in understanding and management of the pandemic and to accelerate development of treatments. The resource was created with an advanced data preparation pipeline (87), including clustering articles and removing potentially duplicated results and filtering out irrelevant data. Since it is humanly impossible for any researcher to read all these articles, let alone synthesize all the results, findings and knowledge, we attempted to make sense of this large amount of data by developing natural language and machine learning tools to automatically mine the contents, as well as knowledge graph technologies to synthesize the data and navigate the knowledge.
Specifically, we developed deep learning and natural language processing applications (entity extraction and linking) to mine and extract structured information from the large number of open-access publications of the CORD-19 dataset, and then used the knowledge graph as an expert knowledge system to follow the molecular steps of the infection and explore the role of glucose metabolism at each step of the disease progression; from the most superficial symptomatic associations to the deepest biochemical mechanisms implicated in the disease. The expert knowledge system (the "machine"), allowed us to navigate deep into numerous biochemical, homeostatic, and metabolic mechanisms of action of glucose in the context of this disease, and to find sets of articles that implicate glucose in the SARS-CoV-2 infection.
This approach has many potential pitfalls because the machine learning models treats all data in all articles equally. Indeed, the machine-generated output is analogous to a community vote on the concepts present in the literature, which has its strengths and weaknesses. The main weakness is that the model cannot judge the quality of each article, its output is vulnerable to biases within articles and to over representation of potentially erroneous concepts in the literature, and it filters out forefront research that has not yet reached the wider research community. The strengths are that it can digest a vast number of articles, represent all the concepts present in the dataset without human bias of the concepts, and filter out unsupported concepts. In a sense, the output is the common denominator of the community knowledge. To compensate for the machine learning's weaknesses, we manually validated some of the concepts by manually reviewing the literature and performed targeted original research to test some of the conclusions drawn. The limits of human interpretation of the machine-generated output are also subject to weakness, which are pointed out in the discussion.
A link between any two entities in the knowledge graph represent a non-classified and non-qualified association. It is however, an association that survived the algorithms that effectively manage a community vote, and is therefore a relevant, significant and unbiased association. Despite the fact that this is one of the simplest associations between any two concepts that can be constructed, we were surprised to find that the machinegenerated could tell a meaningful and potentially significant story. We found that abnormal glucose metabolism may not only be a strong predictor of disease severity, but may be the most likely fundamental reason why some people suffer a more serious form of COVID-19 than others. We tested the feasibility of this hypothesis by extracting and analyzing data cross articles, by testing the relevance and implications of some of the data reported in multiple articles using computational modeling, and by digitally reconstructing the virus and its immediate environment at the inner lining of the lungs, at an atomistic level. Methodologically, the study shows how an expert knowledge system can and cannot be used to review vast literature datasets to gain insight to the consensus being reached by the research community. The study suggests that a unifying hypothesis has actually already emerged at the collective research community level, which could impact the course of this and future pandemics.
A clinical hypothesis must be tested in well-controlled clinical trials before it can be used to take any medical actions. If clinically validated, the hypothesis of reduced glucose metabolic capacity as a pre-condition underlying age dependency and other pre-conditions of disease severity, and induced elevations of glucose in otherwise healthy patients, as favoring disease progression, would have implications for diagnostic measures during admission. These include measurement of postprandial glucose (PPG), ideally combined with HbA1c (glycated hemoglobin A1c). An alternative, perhaps more pragmatic measurement is the level of fructosamine that reflects glucose control over the previous few weeks. In addition, other measurements of insulin metabolism that not only aim at detecting diabetes, but any possible dysregulation of glucose metabolism such as occurring in pre-diabetes, acute hyperglycemia, impaired glucose tolerance (IGT), or stress induced hyperglycemia could be considered. This hypothesis also has implications for disease management where assisted control of blood glucose levels during hospitalization, prevention of hyperglycemia during critical care, and avoiding high levels of intravenous glucose in ICU becomes important. Glucose tolerance screening of those not yet infected by the virus could predict those groups with the highest risk of severe disease and enable improved mitigation strategies and prioritization for vaccinations.

Analysis of the CORD-19 Dataset
For our analysis, we used the CORD-19v47 dataset that contained, at the time of the study, over 240,000 scientific articles (see section Methods). Given that it is humanly impossible to read this number of articles, we developed machine learning models to extract the most frequent entities mentioned in the context of respiratory viral infections, coronaviruses in general, and SARS-CoV-2 in particular. We then constructed a knowledge graph of these entities to synthesize the data and navigate the subset of knowledge that specifically relates to a potential role of glucose in the progression of COVID-19 (Figure 1).
We began by extracting entities using named entity recognition (NER) models trained to recognize nine selected entity types (see section Methods, Entities Extraction). Each FIGURE 1 | Knowledge graph construction and analysis pipeline. Entities of interest are extracted using the named entities recognition (NER) techniques on the entire CORD-19 dataset, or on a subset of articles selected for matching user-specified query (Search). The extracted entities are then linked to the NCIT ontology terms and passed to the knowledge graph construction stage. On this stage we build a knowledge graph using extracted and linked terms as nodes and their co-occurrences as edges. The association strength between terms is quantified using a mutual-information based score of their co-occurrences. We perform various graph analytics tasks (e.g., community detection, BMIPs, minimum spanning trees computation) that allow us to navigate the knowledge graph and reveal connectivity patterns and subgraphs carrying the most important terms and their co-occurrences. Data from the graphs are then used to guide further literature review: a first seed set of article was obtained from the CORD-19, examining the articles featuring co-occurrence of terms of interest (first automatic selection of CORD-19 articles). Guided by this set of articles, we then perform a second step of classical literature review and consider other sources, not included in CORD-19 (see detailed description of the steps in section Methods). extracted entity was mapped to a term in the National Cancer Institute Thesaurus (NCIt) ontology allowing to resolve most of the ambiguities of lexical variations as well as synonyms, aliases and acronyms (see section Methods, Entity linking). Linking to the NCIt ontology also enabled access to standardized semantics of the entities, their humanreadable definitions, and their hierarchical structure within the ontology. This approach yielded over 400,000 unique and relevant entities.
Next, we constructed a knowledge graph by creating a node for each extracted entity and building a link when entities were co-mentioned. The importance of a node was computed as the weighted degree centrality, and the strength of a link was computed using mutual information techniques (see section Methods). In this context, the weighted degree centrality can be interpreted as the relative importance of the entity in the dataset, an edge between a pair of entities as the presence of some association between them, and the corresponding edge weight as a quantification of the strength of the association. The links do not on their own represent the type of association, which rather emerges from the overall structure of the graph. The network was then partitioned into the nine entity types to obtain a first high-level view of the contents of the CORD-19 dataset (Figure 2). The entity types protein and symptom/disease are the most represented entities in the CORD-19 dataset (27 and 21% respectively), whereas cell compartment is the least common. The six remaining entity types are roughly equally represented (between 6 and 11%). This rather trivial analysis does provide a first high-level view of the distribution of different entity types found in the dataset.
To validate that the associations between entities are semantically meaningful (as opposed to incidental), we applied community detection methods to objectively partition the knowledge graph into clusters of strongly associated entities (see section Methods, Community detection). The emergent communities that were automatically detected, revealed five different conceptually coherent topics (biology of viruses, diseases and symptoms, immune response, infectious disorders, and chemical compounds) supporting some degree of relevance of the associations (Supplementary Figure 3).

Presence of the Entity Glucose in the CORD-19 Database
To obtain a next deeper level view of the contents of the dataset, we measured the frequency of entity mentions in each article. COVID-19 is indeed the most frequently mentioned entity providing a minimal validation of the automatic entity extraction by the ML models ( Table 1). The entity glucose is found in 6,326 of the 240,000 articles, making it the 179th most frequently mentioned entity among more than 400,000 entities extracted. It is also the 17th most frequently mentioned entity in the entity type chemical (over >20,000 chemical entities extracted) ( Supplementary Table 2B), indicating the extent to which glucose is present in the CORD-19 database. Of these chemicals, the entity glucose in the one biochemical with the The entity glucose ranks 179th most frequent among the terms in the CORD-19v47 database following COVID-19-related entity-types recognition. The list of the 100 most frequent entities is available in Supplementary Table 2A. deepest and broadest association with all stages of the virus infection (see below).

Knowledge Graph of "Glucose in Coronaviruses Infection"
In order to obtain the context in which glucose is mentioned in the dataset, we performed a mutual information-guided search of the paths that are formed by the links of the knowledge graph from "glucose" to "SARS-CoV-2." Since there are a large number of potential paths between any two entities, we filtered them by the best mutual information pathways approach (see section Methods "BMIPs Search"), and then aggregated the entities according to their BMIP. Examination of the clusters revealed five coherent coronavirus-specific topics [comorbidities in high-risk group, SARS-related symptoms and complications, SARS-related drugs, SARS disease biomarkers and inflammation, and coronavirus receptors and RAAS (reninangiotensin-aldosterone system), see Figure 3], showing that glucose is mentioned in the context of numerous stages of the coronavirus infection: from high-risk groups through to disease development and complications. In addition, three entities directly associated with glucose (glucose transport, glucose uptake, FIGURE 3 | Subgraph obtained by aggregating the 20 Best Mutual Information Pathways from "glucose" to "SARS-CoV-2." The subgraph was constructed with the entities encountered during the mutual information guided shortest path search (see section Methods) from "glucose" to "SARS-COV-2" Node sizes are proportional to the weighted degree of nodes and node colors represent different entity types. Analysis of entities detected allows us to identify five groups of entities, each related to a specific field in coronavirus infection: (A) comorbidities in high-risk group, (B) SARS-related symptoms and complications, (C) SARS-related drugs, (D) SARS biomarkers and inflammation, (E) coronavirus receptors and RAAS system. No entities from cell compartment or organ/system entity types were detected in this analysis.
and glucose tolerance test) were found in the BMIPs (Figure 3, red nodes), indicating that glucose is also mentioned in the context of glucose metabolism.
Knowledge Graph of "Glucose in COVID-19" The first level of analysis thus far shows that glucose is extensively covered in the CORD-19 dataset and is associated with numerous key events in the infection process of coronaviruses in general. Our next level of analysis aimed to understand to what extent, and how, glucose is associated specifically with COVID-19. First, we extracted the 3,000 of the most relevant articles in the CORD-19 database using a customized ML semantic search with the phrase "glucose as a risk factor for COVID-19" (see section Methods "Query-based literature search") and then subjected these articles to entity extraction, that yielded over 20,000 entities extracted. We then constructed a knowledge graph as above, but using only the 1,500 most frequent entities, from these 20,000 extracted. Since the resulting graph is still extremely dense (see  Knowledge graphs in section Methods), we constructed a minimum spanning-tree (see section Methods) from the 150 most frequent entities (Figure 4) to allow focusing only on the most important associations between entities, in the context of "glucose as a risk factor for COVID-19." The tree structure that emerged reflects those associations that survive the greedy minimum spanning tree algorithm, guided us to the most frequent entities linking glucose to various aspects of the disease. For example, we could identify hyperglycemia as the main entity that links glucose to all groups at risk for COVID-19 (i.e., DM, obesity, hypertension, and cardiovascular disorder) in this dataset. It also shows links from glucose to immune responses, inflammatory processes, and oxidative stress in one part of the tree, vascular system and thromboses in another, and to airways of the lung, ARDS, multi-organ failure, and death in another part, among other important entities (Figure 4). From the graph created with the 1,500 most frequent entities, we next identified the top 25 BMIPs from glucose to COVID-19 as further analysis. Figure 5 shows all the most important entities linking glucose to COVID-19 in this subset of the CORD-19 dataset. Most of the entities are pathologies and biomarkers known to be associated with COVID-19. The BMIP graph in Figure 5 shows the context in which glucose is found in the CORD-19 dataset obtained from a search for "glucose as a risk factor for COVID-19." On the other hand, BMIPs subgraphs of entities linking glucose to COVID-19 according to each entity-type (as described in Figure 2), show the strongest associations with the immune defense of the lung through the entities respiratory system, alveolar epithelium, innate immunity, alveolar cell type II, immune cells, interleukins, chemokines among others (Figure 6). The subgraphs also show strong associations between glucose and the entities that concern viral entry and replication: entities such as glycosylation, glycolysis, glucose uptake, lactic acid, or lactate dehydrogenase. Finally, the subgraphs show associations with COVID-19 symptoms and complications through the entities inflammation, CRP, ARDS, cardiovascular complication, thrombosis and associations with the vasculature by the entities vascular system, fibrinogen, D-dimer, ferritin, platelet, or endothelial cells.
The subgraphs linking glucose to COVID-19 generated according to entity types additionally guided us to and through the specific symptoms, drugs, pathways, chemicals, organs, celltypes, cell compartments, and proteins where the strongest associations with glucose exist in the dataset (see Figures 6A-H). For example, the associations found in the context of the phrase "glucose as a risk factor in COVID-19" in the organ/system entity type, includes all organs known to be affected in COVID-19. In the entity type pathways, we find homeostatic, immune, infectious pathways as well as other biochemical and metabolic pathways where glucose is known to be involved.
The minimum spanning tree enables a different and deeper way to navigate the knowledge contained in the dataset. We therefore again constructed a minimum spanning tree, but this time from the entire knowledge graph containing all the 1,500 most frequent entities. This allowed us to then zoom into specific entities and navigate to deeper associations in the dataset (Supplementary Figure 4 and Figure 7). For example, a zoomin on the entity glucose reveals the entities prediabetes, glucose tolerance test, homeostatic process, HbA1c, insulin, or beta-cells as key entities associated with the groups at risk to COVID-19 ( Figure 7A). Zooming in on lung and alveolar epithelium, close entities in the spanning tree, reveals airway, mucociliary clearance, airway surface liquid, mucus, alveolar macrophage, lung surfactant, surfactant protein-D, phagocytosis, and alveolar cell-type II as key entities associating glucose in the lung (Figures 7B,C). Zooming in on viral entry ( Figure 7D) reveals viral load, DNA replication, S protein, glycoprotein, carbohydrates, lectin, or glycosylation as key entities in the viral entry process.
To summarize, the knowledge graphs generated from the CORD-19 database enables navigation of the contents of the CORD-19 dataset in terms of entities, different associations between entities, and in the specific context of "glucose as a risk factor for COVID-19, " and enable instant access to the underlying article(s) and the specific text where these entities are mentioned. We chose this way to construct the knowledge graphs because it finds all types of meaningful associations for an objective view of the dataset, rather than focusing our extraction on a predetermined subset of association types that may bias the view. The methodology used delivers unbiased access to all entities and their associations in over 240,000 scientific articles that are relevant to a potential role played by glucose in the infection. We complemented this representative review by research in the general literature, analyses and computational modeling of specific parameters extracted from multiple articles and using atomistic reconstructions of the virus and its immediate environment to gain a deeper insight into the biophysical constraints that may need to be considered.

Overview of Blood Glucose Metabolism in High-Risk Patients
Two measures are frequently used as indicators of glucose metabolism; FPG (fasting plasma glucose), measured as the blood glucose concentration after a minimum fasting period of 8 h, and PPG (postprandial plasma glucose), measured as the blood glucose concentration 1 or 2 h after a meal or ingestion of a bolus of glucose. Under normal conditions, FPG values range from 4.4 to 6.1 mmol/L (79-110 mg/dL) (average of 5.5 mmol/L), and PPG values should be lower than 7.8 mmol/L (<140 mg/dL). Hyperglycemia is generally diagnosed when FPG is >7 mmol/L (>126 mg/dL) or PPG >11 mmol/L (>190 mg/dL). Such a high FPG value is sufficient to diagnose chronic FIGURE 4 | Minimum spanning-tree constructed from the knowledge subgraph containing the 150 most frequently mentioned entities. The knowledge graph is built from the 3,000 most relevant articles related to the query "glucose as a risk factor for COVID-19" as detailed in methods. The spanning tree is obtained by minimizing an edge distance score based on mutual information (see section Methods). The entity types are color-coded as presented in Figure 2. Such sparsified view of the knowledge graph allows us to get insight on the most frequent entities linking glucose (top left) to various aspects of the disease in the CORD-19. For example, the term "hyperglycemia" provides an important link between glucose and all risk groups for COVID-19 (i.e., DM, obesity, hypertension, and cardiovascular disorder). It also illustrates that the literature supports a strong association of glucose-related terms to immune responses, inflammatory processes, vascular system, thromboses or airways of the lung. hyperglycemia, however normal or modestly elevated blood glucose (FPG ranging 6.1-7 mmol/L or PPG ranging 7.8-11 mmol/L), called impaired fasting glucose (IFG), could reveal an impaired glucose tolerance (IGT) that leads to greater and more frequent glucose fluctuations than normal (88). Because there are no symptoms of IGT, many people with this condition are unaware of it. Diagnosis of IGT is done following an oral glucose tolerance test (OGTT), the measure of blood glucose concentration 2 h after ingestion of a standardized bolus of glucose (usually 75 g) to detect how quickly the body can clear the glucose from the blood. IGT is indicated when OGTT is between 7.8 and 11 mmol/L and could be a sign of pre-diabetes or other metabolic disorders.
As mentioned, aging, hypertension, cardiovascular diseases, DM, and obesity are strong risk factors for more severe symptoms and higher death rates from SARS-CoV-2 infection. We find that the literature strongly supports abnormal FPG, IGT, or hyperglycemia in all these conditions as described below.

Aging
A hallmark feature of COVID-19 is its preferential impact on the elderly, but the reason is not clear. One of the many changes that occur with aging, is a steady increase in FPG and PPG, an increased rate of IGT (89)(90)(91), as well as an increase frequency of asymptomatic hyperglycemia (92,93). FPG reflects the steady-state of blood glucose, while PPG reflects how well perturbations in glucose levels are tolerated, or the capacity to clear sudden elevations in glucose. We could not find sufficient data on variations in blood glucose metabolism with aging in any one study and therefore compiled data from multiple articles (Supplementary Table 5) and plotted the average trajectories of FPG and 2 h PPG (following an OGTT test) within different age ranges (see Figures 8A-C). Figure 8A shows that FPG concentration increases linearly by ∼0.165 mmol/L per decade starting from around 30 years old, with no significant difference between gender ( Figure 8B). On the other hand, 2 h PPG concentration only changes marginally until the age of around 60 years, but then starts increasing markedly by around 0.64 mmol/L per decade ( Figure 8C).
The fatality rate for COVID-19 with age of the patient is well-characterized by an exponential increase, with a dramatic increase after the age of 60 years (Supplementary Figure 1B). While both a decline in glucose management with aging and an increase in CFR in COVID-19 with age are known, how these two age-related variables are correlated has not yet been evaluated in the literature. We estimated this correlation using data gathered from literature (Figures 8A,C and Supplementary Figure 1B). The correlation coefficient between age-related COVID-19 CFR and age-related changes in FPG was 0.87 (R 2 = 0.75) (Figure 8D), and between age-related COVID-19 CFR and agerelated changes in 2 h PPG was 0.95 (R 2 = 0.90) (Figure 8E). Even if both correlations are high, the data is insufficient to establish whether the higher correlation for 2 h PPG is significant, which would indicate that disease severity is more predicted by a compromised capacity for glucose clearance (reflected in 2 h-PPG levels) than steady-state blood glucose levels (reflected in FPG levels).
While the correlations are striking, they do not prove causality and the exponential increase in CFR with age is most likely due to the convergence of multiple factors, including those occurring in age-related comorbidities. The CORD-19 dataset did not contain sufficient data to isolate and quantitatively evaluate other agerelated variables that could be predictive of COVID-19 CFR. We therefore reviewed the extent to which dysregulation of glucose metabolism is common among some of the known comorbidities of COVID-19.

Diabetes Mellitus
DM is one of the comorbidities that is a strong risk factor for COVID-19 mortality since around 50% of hospitalized patients who died from COVID-19, independent of age, had DM (Supplementary Figure 5A). On the other hand, one in five COVID-19 patients with diabetes succumb to the disease (94). A persistent hyperglycemia (FPG > 7 mmol/L or PPG > 11 mmol/L) is the hallmark of DM. Because hyperglycemia could also be acute, glycated HbA1c (glycated hemoglobin), a measure of the glycemic variation over the past 2-3 months, is an additional marker that is used in the diagnosis of DM (95). Hence, FPG > 7 mmol/L with HbA1c > 6.5% is the definite indicator of DM, whereas FPG > 7 mmol/L with normal HbA1c (<6%) reflects an acute hyperglycemia without DM. Indeed, nondiabetic acute hyperglycemia is often asymptomatic and therefore undiagnosed, but could mask an impaired glucose tolerance (IGT). Finally, HbA1c ranging between 6 and 6.4% is a sign of pre-diabetes. Figure 9A shows the average values of FPG, 2 h-PPG, and Hb1Ac in the diabetic population, according to several reports in literature (Supplementary Table 4A).

Hypertension
Hypertension is the second most frequent comorbidity in COVID-19 related deaths (25) and is also correlated with FIGURE 6 | BMIPs subgraphs from glucose to COVID-19 in each entity type. To obtain the subgraphs we, first, sliced the knowledge graph (built from the 3,000 most relevant articles related to the query "glucose as a risk factor for COVID-19") according to different entity types (see Figure 2). As a result, we obtained a subgraph per entity type, containing only the entities of the given type. We then added the terms "glucose" and "COVID-19" to each of the subgraphs and then computed BMIPs from "glucose" to "COVID-19.  Figure 5A); in patients <44 years, 35% of the deaths are associated with hypertension, but in patients >75 years, the association rises up to 70% of the deaths. Hypertension is one of the most prevalent conditions found in the general population [from 20 to 45% depending on the country (96)], and is positively correlated with advancing age (97,98). In addition, hypertension frequently coexists with the other risk factors such as DM, overweight and obesity (93,(99)(100)(101). Indeed, a high proportion of COVID-19 patients present with both diabetes and hypertension (Supplementary Figure 5A). It is therefore difficult to separate hypertension as a risk factor by itself from its association with advancing age and other agerelated comorbidities, and there is a real need to address if hypertension, by itself, is an independent risk factor for COVID-19 mortality (102,103). However, what is clear in the literature is that hypertension is strongly associated with poor glucose metabolism. Firstly, a hypertensive state is positively associated with increased FPG (Figure 9B and Supplementary Figure 6A) and in parallel, higher levels of glucose are considered to be one of the causes of hypertension (104). Secondly, in 70% of cases, hypertension is associated with a disturbance in glucose metabolism [i.e., previously known or newly diagnosed DM (25%), IGT (22%), insulin resistance (9%), or IFG (11%) (105)]. Thirdly, a study on 63,443 men (ages 21-60 years) showed that IFG increases more with age if blood pressure is also elevated (98) (Supplementary Figure 6B). Dysfunction in glucose metabolism in hypertensive patients is therefore frequent and often undiagnosed because an OGTT test to detect IGT is not commonly conducted in the management of hypertension (106). Additionally, some β-blockers, the first drugs prescribed in the management of hypertension, have the common side effect of inducing acute hyperglycemia (107,108).

Obesity
Overweight and obesity are risk factors for COVID-19 complications and mortality (109,110). Overweight is defined as a condition where the body mass index (BMI) is between 25 and 30, while obesity is indicated when BMI > 35, and severe obesity when the value exceeds 40. BMI is positively correlated with FPG levels (111)(112)(113) (Supplementary Figure 6A). Mild or severe obesity is directly correlated with hyperglycemia and the incidence of diabetes (90), and IGT is a common finding in obese patients (114). Additionally, it was shown that the incidence of IGT and DM increase proportionally with BMI (i.e., the study shows 20 and 1% IGT and DM incidence, respectively, for BMI >21; 29 and 6% for BMI ranging 25-26.9; and up to 55 and 20% for BMI > 31 [from

Intensive Care
Patients in ICU have a high risk of hyperglycemia, independent of a history of diabetes, due to the stress of the disease and/or hospitalization (termed "stress hyperglycemia") (116), or due to the enteral or parenteral feeding that is commonly rich in glucose (92,(117)(118)(119); and hyperglycemia has been reported to predict a poor prognosis for diverse critically ill patients (120)(121)(122). Additionally, common drugs, that are sometimes used for the treatment of severe viral infection such as catecholamine vasopressors, and some immunosuppressants and corticosteroids, can predispose patients to hyperglycemia (92,123,124).  Figure 1B] are represented to detect possible correlation in between these age-related events. The correlation coefficient "R" that indicates how strong is the relationship between the two variables is calculated between the two respective series of values for the same age range, then R 2 , the coefficient of determination = R ∧ 2.
A review of the literature thus far shows that the different groups known to be at risk for severe COVID-19 are all likely to present with some level of hyperglycemia, impaired fasting glucose (increased FPG), or IGT (Figure 10), suggesting that reduced glucose metabolic capacity and/or induced elevations in blood glucose could explain why the known preconditions are risk factors for COVID-19 complications and mortality. Findings on the role of glucose during the previous SARS-CoV-1 and MERS outbreaks and preliminary reports on COVID-19 pathogenesis further support this hypothesis. Firstly, even a mild increase in FPG (5.78-7.9 mmol/L) was linked to increased morbidity and mortality in SARS-infected patients during the 2003 outbreak (27). Secondly, it was reported in China, in a small cohort, that 52% of patients presenting clinical characteristics of COVID-19 were hyperglycemic (12). Finally, numerous more recent studies showed that increased FPG is associated with a poor prognosis and increased risk of death from COVID-19, whereas well-controlled FPG is associated with a better outcome ( Table 2). More importantly, not only diabetes or hyperglycemia, but IFG specifically has been associated with a higher risk of poor outcome and mortality (52), suggesting that even a modest increase in FPG is a prognostic indicator.
In summary, the literature supports the hypothesis of glucose dysregulation as a common factor within the most common groups at risk. To understand whether there exists a causal foundation for these correlations, we investigated potential mechanisms of action of glucose in the life cycle of the infection. To do so, we traced the various steps of the pathogenesis of COVID-19 that were indicated by the expert knowledge system.

Glucose in the SARS-CoV-2 Life Cycle
SARS-CoV-2 belongs to the coronavirus family, whose name comes from the shape that the structural spike gives to the virion; protruding as spikes at the surface of the envelope and forming a crown (130). The S protein holds a receptor binding domain (RBD) at the termini of the ectodomain that allows the recognition and binding to its host receptor angiotensinconverting enzyme 2 (ACE2) (131,132). Each spike is a homotrimer of the S protein, but only one RBD acquires the so-called up-conformation to allow the binding to ACE2 (133,134). Subsequently, a complex sequence of cleavages by host proteases [membrane TMPRSS2 (135) and furin (136)] allows conformational changes of the spike necessary for the subsequent fusion of the virion with the host cell membrane, the cell entry  and genome delivery inside the cell for further replication (16,137). Once inside the cell, the virus relies entirely on the host for energy, and must hijack the cellular machinery of the host to produce more copies of virions. Glycosylation and glycolysis are two key pathways necessary for viral entry and replication and therefore hijacking these metabolic processes is of critical importance for the infection.

Glycolysis as a Key Mechanism for Viral Replication
Viruses are non-living entities and, as such, do not have their own metabolism. Hence, viruses need a supply of nucleotides for genome replication, amino-acids for new protein synthesis, fatty-acids for their membrane, as well as adenosine triphosphate (ATP) for the viral packaging process (138). For this purpose, most viruses have evolved to modify the cellular metabolism of host cells upon entry to increase the availability of energy and nutrients for their own reproduction. One of the most common modifications is the switch to glycolysis as the main metabolic pathway, a fast process for providing the virus with ATP without requiring oxygen, but needing an increase in the uptake of extracellular glucose.
To achieve this, viruses induce glucose transporter expression, glucose uptake, glycolytic enzymes expression (hexokinase 2), and lactic acid production (139,140), as early as 8-12 h post infection. The activation of any one of these metabolic pathways is dependent on the cell type infected and on the type of virus (141,142). The correlation between glucose availability and viral replication is well-known, especially for the influenza virus. For example, Reading et al. (143) showed that viral replication of influenza in the lung is proportional to blood glucose concentration. Kohio and Adamson (144) also showed that in vitro exposure of pulmonary epithelial cells to elevated glucose concentrations significantly increased influenza virus infection and replication, whereas the treatment of cells with glycolysis inhibitors significantly suppressed the viral replication. Similarly, glucose reduction during infection reduces viral replication (138,145). Importantly, SARS-CoV-2 replication in monocytes was shown to rely entirely on ATP produced by glycolysis (146). Glucose supply and glycolytic efficiency are therefore crucial parameters for viral replication.

Glycosylation as a Key Process in Viral Pathogenesis
Glucose is not only an essential energy and carbon source for viral replication, it is also the precursor for glycan trees synthesis, a key process in viral pathogenesis. N-glycosylation, that consists of the addition of glycan trees at N(X)T/S consensus sites of proteins, is a post-translational modification that affects more than 50% of mammalian proteins, most importantly membrane proteins (147). This modification has a crucial role in ensuring the correct structure and function of the proteins, the regulation of protein-protein interactions, cell signaling, and pathogen-host recognition (148,149). Glycan trees are hydrophilic structures also conferring a high solubility to secreted proteins. They consist of assemblies of monosaccharides [sugar molecules such as glucose, galactose, N-acetylglucosamine, Nacetylgalactosamine, glucuronic acid, xylose, mannose, fucose, or sialic acids (150)] and can be divided into three main types: (1) the oligomannose types [or high-mannose (HM)], considered to be under-processed glycan trees, that exclusively contain mannose residues and are rarely found in mammalian membrane proteins; (2) the complex types, that are bulky, but flexible trees, containing multiple branches with any number of the other type of saccharides mentioned, and (3) the hybrid types which are composed of one branch of mannose residues and a second branch with complex residues (151). Importantly, glucose, the main monosaccharide in carbohydrate metabolism, can be converted into all the types of sugars required to build glycan trees.
Glycosylation is key in multiple biological mechanisms of viruses [infectivity, virulence, immune interactions among others (152), and implicated in species-to-species transmissibility (153)]. Transmission of zoonotic viruses into humans are accompanied by drastic changes in glycosylation, as exemplified by the human influenza H3 hemagglutinin where the number of glycosylation sites have doubled since the 1968 pandemic while its amino acid sequence has remained 88% unchanged (154). Glycosylation is essential for particular mechanisms such as maintaining the structural shape of the viruses, recognizing the host cells and binding sites, as well as for cell entry (155)(156)(157). It is also used to evade the immune system; indeed, it allows the virus to deceive the humoral and adaptive immune system of the host by imitating its glycosylation coat (in a process called molecular FIGURE 10 | Interconnection of groups at-risk and their link with dysregulation of blood glucose metabolism. Hyperglycemia, increased FPG or IGT (impaired glucose tolerance) are glucose metabolism dysregulation observed in the different group at risk for COVID-19. Hyperglycemia is the define characteristic of DM. ICU is associated with stress induced hyperglycemia. Hypertension, obesity and aging are all conditions strongly linked to increased FPG or IGT. Beta-blockers, used as treatment for hypertension or drugs used in ICU may also induce hyperglycemia. Additionally, obesity is a comorbidity often associated with DM and hypertension.  mimicry), and shield its immunogenic epitopes from antibody recognition (153,155,(157)(158)(159)(160). The glycan coat of SARS-CoV S proteins is however relatively sparse compared to strong immune evaders such as HIV or Ebola (153). Viral glycoproteins are thought to be more heavily glycosylated than host glycoproteins, and the glycan composition can differ from host compositions, and from host to host. The under-processed HM type is, for example, rarely found on the host cells, but frequently found in enveloped virus protein. This is explained because the distribution of oligo-mannose or complex glycans is determined by the accessibility and crowding of the carbohydrate chains, more than the protein sequence itself (161). Indeed, densification of glycans over a protein sequence results in inhibition of glycan processing and poorer conservation of glycan trees across viral copies (162). In SARS-CoV-2, the S protein forming the spike is particularly highly glycosylated, with 22 sites of N-glycosylation per monomer, holding mostly complex-type glycans, and ∼30% oligomannose-type (133,134,163,164).
We used an atomistic visualization tool (BioExplorer, see section Methods) to reconstruct the glycosylation profile of the SARS-CoV-2 S protein, in order to obtain a realistic view of the organization of the different types of glycans on the different domains of the spike. Several groups have reported the glycosylation profiling of the protein S (160,163,165), with some discrepancies in the reports. This is likely because the glycosylation profile of a protein can differ from cell type to cell type, and because of glycosylation microheterogeneity, i.e., the inherent variation of glycan structure at a specific site (166). In our study, we considered data reported in Watanabe et al. (163), considering only the most frequently represented glycan type (HM, complex or hybrid) for each specific site, without including microheterogeneity (see section Methods, Glycan types and position). The resulting distribution of glycans is schematically represented in Figure 11A (detailed in section Methods). This atomistic representation of the glycosylated spike shows the extent to which the spike is physically shielded by glycan trees (Figures 11B-E) making the virus appear as a large sugar molecule to the host.
Whereas, the complex glycans are mostly localized at the extremities of the spikes and around the connector domain (CD), the HM glycans are concentrated around the central core of the spike (in a ring-like formation), and only rarely localized at the extremities (Figures 11A-E). We can reasonably hypothesize that the bulky complex glycans, mimicking the host cell glycan types, are exposed at the extremity to help hide the spike from detection by the immune system. In contrast, underprocessed HM glycans, that require less enzymatic processing, could be sufficient to cover less exposed immunogenic domains such as the fusion peptide ( Figures 11A,D,E). In addition, the HM types, which are the glycans recognized as foreign by the innate humoral immune system, are logically less exposed than the host-like complex types (160). Complex glycans on the RBD are surrounding the RBM (receptor binding motif), that is itself completely glycan-free to allow binding to its receptor (Figures 11C,E). These complex glycans may also serve a different function such as aiding the recognition and binding to the receptor. Indeed, ablation of two N-sites of the RBD (N331-N343) drastically reduces infectivity (167). Glycans located on the N-terminal domain (NTD) could also be involved in receptor recognition as molecular dynamics simulations have suggested that, apart from the shielding, glycans at two sites, the N165 and N234 in the NTD, may provide conformational stability of the receptor-binding domain during recognition of ACE2 (168). ACE2 is also glycosylated, holding six putative N-glycosylation sites [(169) and section Methods]. It has been reported that ACE2 glycosylation does not affect its expression on the cell surface, but it is required for the binding to SARS-CoV-2 glycosylated spike and for fusion with the membrane (170,171). To gain insight into the involvement of glycans in the spike-receptor interaction, we represented the interaction of the glycosylated spike in its open conformation with glycosylated ACE2 (Figure 12). The domains on the spike and ACE2 involved in the interaction (binding domains) are highlighted in green and blue, respectively, showing the accuracy of the models. Interestingly, one can observe that both binding domains are almost exclusively surrounded by complex glycans (Figure 12) that seem to be connected. The complex glycans on the spike might therefore not only serve to protect the ACE2 binding domain when in closed conformation or to stabilize the interaction, but could also enable the conformational change of the protein S into its up-position, required to be able to bind to the receptor. This is in agreement with a recent proposal reported by Casalino et al. (168), that the glycan composition of SARS-CoV-2 spike is crucial for the RBD up/down conformational changes. Similarly, the complex glycans of ACE2, almost all concentrated near the spike interacting domains, may serve to allow and stabilize the interaction with the spike.
More than an effect in receptor binding and infectivity previously mentioned, mutations of some glycosylation sites are known to render the virus resistant to neutralizing antibodies (172). The S protein is the major antigen responsible for the adaptive immune response (131,134,173,174); it is therefore natural to direct vaccines at the spike protein (175). Several amino acid changes in the S protein could affect viral infectivity, transmissibility and efficacy of neutralizing antibodies. If they involve glycosylation sites, then the virus can change its glycan coat needed for infection, transmission, and deceive the host's immune system. Three variants of SARS-CoV-2 are of particular concern; the variant B.1.1.7 (or 501Y-V1) first emerged in the UK, the variant 501Y-V2 first emerged in South Africa (SA), and the P.1 variant first emerged in Brazil-all with mutations identified in the sequence coding for the S protein (see https:// www.ecdc.europa.eu/en/publications-data/covid-19-riskassessment-spread-new-variants-concern-eueea-first-update). All three variants share the N501Y mutation, located in the RBM, responsible for a more infectious phenotype with higher infectivity but apparently little change in severity (176,177), due to increased binding affinity for its receptor ACE2. The SA and Brazilian variants hold an additional mutation (E484K), which may be an immune evader mutation (178,179). They all have undergone additional mutations, but none of these impact glycosylation sites. However, the Brazilian variant, from which there is less information on infectivity, severity, immune evading, possesses a T20N mutation, a mutation located at the beginning of the NTD region, which could potentially become a functional glycosylation site according to the NetNGlyc 1.0 software (http://www.cbs.dtu.dk/services/NetNGlyc/).
In order to better understand the potential impact of this additional glycan site, we modeled the glycosylation profile of the P.1 variant using the BioExplorer tool we developed. Interestingly, the additional glycan (N20) would be localized at the very top of the spike (Figure 11F), adding to the shielding of the RBM surrounding region (Figure 11G), which may suggest that it would be better at evading the immune system. In addition, it is localized very close to the N331 and N343 glycans sites [see  (167)] shown involved in receptor binding, which may also increase efficacy of receptor binding and account for transmission with lower viral loads. Overall, this potentially new functional N-site, in addition to the welldescribed N501Y and E484K mutations, could render the P.1 variant more infectious and even a stronger immune evader.
Glucose in the Antiviral Defense of the Lung SARS-CoV-1 and SARS-CoV-2 are respiratory viruses that mainly invade the human body through droplets first inhaled into the upper airways, where they infect host cells by binding to the host receptor ACE2, and then may migrate to the lower airways where more cells can be more easily infected (180). ACE2 is expressed in many different tissues, but mainly found in lungs, pancreas, kidneys, as well as the gastrointestinal tract and endothelial cells (181)(182)(183)(184). Because ACE2 is the entry point for the virus, several studies have focused on the role of the ACE2-spike interaction, the expression level of ACE2, and the glycosylation status of ACE2 to explain the severity of the disease. The data is however inconsistent with no clear correlation between ACE2 expression levels and disease severity (185)(186)(187)(188)(189). However, before reaching the lower airway where it can bind to ACE2, the virus has to break through the first non-specific anti-pathogen defense system of the lung formed by the pulmonary epithelium and the airway surface liquid (ASL). This defense system is the first-line protective barrier from constant exposure to bacteria, fungi, viruses and toxic particles (190).

The Pulmonary Epithelium and the ASL as the First-Lines of Defense Against Pathogens
The non-alveolar epithelium of the respiratory zone is composed of many types of secretory cells that produce cytokines, antimicrobial agents as well as mucins forming the mucus (Figure 13) (191)(192)(193). This epithelium possesses a high number of ciliated cells with hair-like projections that beat rhythmically, propelling pathogens and inhaled particles trapped in the mucus out of the airways. This process, called mucociliary clearance, is the very first defense that starts in the upper airway and that attempts to expel the pathogen before it can reach the epithelial cells (194,195). Some pathogens may get through and reach the lower alveoli, where the epithelium is mainly composed of alveolar epithelial cells type I and II (AECI and AECII) along with numerous resident macrophages (193). The thin AECI cover 95% of the alveolar surface area and are largely devoid of organelles since they specialize on passive gas exchange (196), whereas the cuboidal AECII secrete surfactant, a fluid composed of a mixture of proteins and lipids involved in both the maintenance of surface tension, to avoid the collapse of the alveoli, and alveolar protection (197)(198)(199). The AECII pneumocytes are the cells of the respiratory tract showing the highest expression of ACE2 as compared to lower levels of ACE2 that are found on the clara cells, the ciliated airway cells and the epithelial cells of the nasal cavity (200).
The ASL is composed of a periciliary layer and the overlying mucosal layer, and lines most of the respiratory tract (Figure 13).
The mucosal layer is composed of mucins, large glycosylated proteins secreted by the specialized mucosal and goblet cells (191,201) that form a physical barrier to trap inhaled particles or pathogens. The periciliary layer has a lower viscosity to allow the ciliary beating for mucociliary clearance. The ASL volume, depth and hydration level are critical for the functioning of the mucociliary escalator and these parameters are therefore homeostatically regulated by an intricate orchestration of mucin production and expression of a complex combination of ion channels, exchangers and pumps [see (202) for an extensive review]. Na + absorption and secretion of HCO − 3 and Cl − are mediated through the specific transporters ENaC and CFTR (203,204). Importantly, the deeper alveoli in the lungs are lined with a thin surfactant layer to permit efficient gaseous exchange (198), which contains several other molecules, including aminoacids, proteins, lipids and glucose, all of which are under strict homeostatic control to avoid conditions that would support bacterial growth (205), while ensuring a proper functioning of the ASL. The glucose concentration in the ASL is especially carefully regulated (206).

Regulation of Glucose Concentration in the ASL
Glucose is 10-12 times less concentrated in the ASL than in blood (207). This low concentration of glucose (0.4 mmol/L ± 0.2 in normal condition) is necessary to maintain the proper functioning and the sterility of the ASL (208). Glucose is exclusively supplied to the airways from the circulating blood, reaching the basolateral side of epithelial cells, where uptake of glucose can occur through glucose transporters (GLUT). The low concentration of glucose in the ASL is tightly regulated by homeostatic mechanisms that include paracellular passive diffusion controlled by tight junction barriers, and facilitative transcellular epithelial glucose transport (Figure 14); the paracellular diffusion being the primary mechanism (209).
The transcellular transport of glucose is mediated by the facilitative transporter GLUT, expressed at the basolateral membranes, and by GLUT or SGLT1 (sodium-glucose linked transporter) at the apical membrane of the airway and alveoli, respectively (Figure 14 left panel) and (207,210). Glucose normally moves through GLUTs by passive diffusion down a concentration gradient generated by the activity of hexokinases, which phosphorylate intracellular glucose to maintain a low-intracellular concentration of glucose (209). In contrast, transport via SGLT is driven by sodium (Na + ) and glucose gradients. This co-transport of Na + in the alveolus would be advantageous for the maintenance of the low volume of fluid required for efficient gaseous exchange (211)(212)(213). Several pathological conditions lead to a disruption of glucose homeostasis in the lung and a subsequent increased glucose concentration in the ASL (210,211). Indeed, defects in tight junction permeability or an increase in blood glucose concentration (hyperglycemia) could both lead to a rise of glucose in the ASL (Figure 14, right panel), with the greatest effect when they coexist. Any elevation is directly countered by apical reuptake by the epithelial cells through the GLUT and SGLT transporters, followed by rapid metabolism by hexokinase in the glycolysis pathway. Hence, the direct conversion of glucose FIGURE 13 | Overview of cell types and innate immunity in the epithelium of the lung. The airway epithelium is a pseudostratified columnar epithelium involved in air conduction, lung moisture and protection. It lines the respiratory airways and is made up of different cell types. Basal cells serve as progenitor for other cells; goblet and serous cells produce and secrete antimicrobials and mucins as part of the airway surface liquid (ASL, composed of a periciliary layer and a more viscous upper mucosal layer); clara cells, with short microvilli, secrete defensive surfactants; and ciliated cells which act in mucociliary clearance. At the endpoint of the respiratory system, the alveolar epithelium that lines alveoli is a thin simple squamous epithelium made up of two cell types; AEC I are flattened squamous cells covering 95% of the surface, in close interaction with the capillaries to facilitate gas exchanges; and AEC II which are cuboidal cells producing and secreting the surfactants. Additionally, to ensure the host protection, many immune cells are resident; the DC, with their snorkels-shaped extension across epithelial tight junctions act as phagocytes and antigen presenting cells in contact with antigens; T cells, γδ T, and NKT cells are characterized by a rapid production and secretion of cytokines upon interactions with pathogens; and alveolar macrophages which are sentinel phagocytes in close proximity with AEC I and II in the alveoli. DC, dendritic cell; NKT, natural killer T cell; γδ T, gamma delta T cell; AEC I and AEC II, alveolar epithelial cells; SP-A, SP-D, surfactant protein A and D.
FIGURE 14 | Overview of glucose transport in the lung epithelial cells. In normal state, glucose is transcellularly transported from bloodstream into the ALS by specialized basolateral transporters GLUT, and is controlled by the intracellular hexokinase activity. Glucose is also transported by a passive paracellular diffusion. The latter is limited through tight junctions which link epithelial cells and maintain the tissue cohesion and integrity, rendering the paracellular diffusion extremely low. Reuptake of glucose from ASL occurs through apical GLUT or SGLT1 transporters. The overall glucose transfer results in ALS concentration of glucose 10-12 times lower than in blood. Conversely, in hyperglycemia where glucose concentration in the blood is higher (right panel), the glucose gradient is increased toward ASL leading to augmented transcellular transport. Additionally, defective tight junctions result in higher paracellular permeability. In such condition, the reuptake of glucose and rapid hexokinase metabolism of epithelial cells is then not sufficient to maintain a low concentration of glucose in the ASL, that could dramatically increase. GLUT, glucose transporter; SGLT, sodium-glucose linked transporter; Na + , sodium ion. to glucose 6-phosphate (G6P) allows the cells to maintain a steep gradient of glucose concentration needed for a strong driving force for the reuptake of glucose from the ASL.

Elevated Glucose in the ASL Impairs Primary Lung Defenses
A high concentration of glucose in the ASL has multiple effects that lead to general impairment in its defense capability (210), as summarized in Figure 15 and detailed below.
Glucose impairs the humoral arm of lung defenses. As mentioned before, the airway epithelial cells secrete a wide range of antimicrobial agents (see Figure 13). The combined activity of these proteins is a crucial step in the first phase of the innate defense of the lung against infections by viruses, bacteria and fungi. Among them, enzymes (lysozymes, proteases), proteases inhibitors, and soluble factors [cytokines, lactoferrin, ß-defensin, and LL-37 (cathelicidin-related peptide)] that are dedicated to humoral immunity against a variety of pathogens (190,192,214,215). However, the protection against viruses is mainly mediated by the soluble C-type lectins SP-A and SP-D (surfactant protein A and D, pattern recognition molecules of the collectins family) (216)(217)(218), produced by the AECII cells and secreted in the distal alveolar airway (see Figure 13) (219,220). In case of viral invasion, C-type lectins bind to the high-mannose glycans exposed at the surface of the enveloped viruses through their carbohydrate recognition domain (CRD) (221,222), and exert their antiviral activity through two different mechanisms: first, by aggregating the pathogens, that physically impairs the binding to the receptors, and second, by recruiting and activating the resident alveolar macrophages, neutrophils, and chemoattracted phagocytes to phagocytose the aggregated viruses (216,218,220,223,224). SP-A and SP-D show significant differences in ligand preferences; in the case of SARS-CoV, it seems to be mainly targeted by SP-D recognition (218,225). Using the BioExplorer, and the data reported in the literature, we reconstructed a model of the environment of SARS-CoV-2 in the ASL during primary infection under normal glucose concentration (Figure 16).
Importantly, the CRD domain of collectins recognize other varieties of carbohydrates with different affinities (217,226). Ctype lectin with an EPN tripeptide motif on their CRD, such as SP-D, show a high affinity for glucose (227). Hence, at high concentrations, glucose can bind to the CRD domain of these C-type lectin that competitively blocks the viral recognition (143,(228)(229)(230). In addition, glucose can indirectly impair ßdefensins and lactoferrins activities (see below on ASL acidosis). Finally, high ASL glucose concentrations could impair not only the activity, but also the secretion of these antimicrobial factors (210) (Figure 15).
Elevated glucose impairs the cellular immunity in the lung. The alveolar phagocytes play a key role in the non-specific elimination of pathogens as well as in the orchestration of the adaptive immune system through crosstalk. Alveolar macrophages, interstitial macrophages and dendritic cells (DC) are some of the few cell types that reside in healthy airspaces [(231) and Figure 13]. In brief, at resting state, alveolar macrophages participate in the homeostasis of the lung, mainly clearing apoptotic cells and recycling surfactant. Upon infection, alveolar macrophages recognize early alarm signals from infected cells, such as elevations in type 1 interferongamma (IFN-γ), pathogen-associated molecular patterns (PAMPs) and danger-associated molecular patterns (DAMPs), migrate to the site of infection, and initiate a pro-inflammatory response. These activated macrophages (M1) express various cell surface receptors, including surfactant-CRD receptors or pattern recognition receptors (PRRs), leading to pathogens recognition, phagocytosis and clearance (232,233). They also secrete reactive oxygen species (ROS) to kill pathogens, as well as pro-inflammatory cytokines necessary for the chemoattraction of additional phagocytes and immune cells that migrate across the epithelium to access the site of infection. Additionally, they facilitate the clearance of infected cells to limit the propagation of the infection. In a second phase, guided by anti-inflammatory cytokines and surfactant proteins [such SP-A and SP-D that play a crucial immunomodulatory function (234)], macrophages switch to the alternatively activated macrophage (M2) state to begin winding down the inflammatory response, phagocytose apoptotic cells, repair damaged cells, and restore homeostasis (224,235). Increasing glucose above physiological concentrations is associated with a reduction in the chemotactic migration capacity of neutrophils and in their phagocytotic efficiency (76,236,237). Interestingly, aging and hyperglycemia are also two conditions associated with a decreased number of alveolar macrophages and DCs, with altered function of antigen presenting cells (APCs) (238)(239)(240), significantly impairing the cellular arm of innate defense.
Elevated glucose causes acidosis of the ASL. Regulation of the pH of the ASL, neutral under normal conditions (6.9-7), is also tightly controlled as it may affect the general capability of the innate immune defense of the ASL (202). As previously mentioned, elevation in the glucose concentration in the ASL is countered by apical reuptake by the epithelial cells and rapid metabolism by hexokinases in the glycolytic pathway. One main consequence is the production of lactate, in part released into the ASL (212, 241) through apical monocarboxylate transporters (MCT) that are lactate/H + cotransporters. Secretion of lactate into airway secretions leads to an acidification of the ASL that inhibits numerous pH-dependent antimicrobial agents, such as lysozyme, lactoferrin, ß-defensin, and LL-37 (242,243). Acidic pH could also affect the activity of the surfactant protein (229).
The acidification is normally neutralized by secretion of HCO − 3 -rich fluid through CFTR channels (202). However, the accompanying secretion of a HCO − 3 -rich fluid leads to an imbalance of ions and water, impacting the ASL osmolarity and volume, resulting in increased viscosity of the ASL fluid, diminished beating of the cilia, and reduced mucociliary clearance of waste and pathogens (202,(244)(245)(246)-direct consequences of hyperglycemia (202,241). Indeed, pathological conditions such DM, aging, and hypertension are associated with impaired mucociliary clearance (247,248). Acidification additionally impairs immune cell migrations such FIGURE 15 | Schematic of the impacts of increase in ASL glucose on ASL functions. High glucose in the ASL has multiple consequences on the immune defense system of the ASL. Innate defense capacities, such as the activity of lectins and macrophages are directly impaired by elevated glucose. High glucose also leads to the production of AGEs that reduces the activity of other antimicrobial factors (such as defensin or lactoferrin), and leads to a reduction of the MCC capacity. Additionally, more glucose from the ASL is uptaken by the epithelial cells, leading to an increased production of intracellular lactate. Lactate is then partially released in the ASL trough MCT transporters, consequently leading to an acidification of the ASL due to the co-transport of H + . Acidic pH aggravates the impairments of antimicrobial and MCC activities. Finally, elevated glucose is a direct supply for bacterial proliferation in the ASL. AGEs, Advanced glycation end products; MCC, mucociliary clearance; MCT, monocarboxylate transporter.
as neutrophil chemotaxis and consequently the efficacy of innate phagocytosis (249).
Elevated glucose leads to increased production of AGEs. In contrast to N-glycosylation, which requires a complex sequence of enzymatic reactions during protein synthesis in the endoplasmic reticulum (ER) and Golgi apparatus, advanced glycation end products (AGEs) are proteins and lipids modified from a non-enzymatic covalent linking through direct exposure to high amount of sugars (glucose, fructose and derivatives) (250). Glycation of proteins can interfere with their normal functions by disrupting molecular conformational changes, altering enzymatic activity, impeding protein-protein interactions and functioning of receptors. The normal physiological rate of AGE production is markedly increased in hyperglycemia [caused by diabetes for example (251)], but also increases with advancing age, oxidative stress, and inflammation (252)(253)(254)(255)(256).
The presence of high concentrations of glucose in the ASL changes its overall glycation profiling with an increased expression of AGEs, leading to serious consequences for ASL function. First, the activity of lysozymes and lactoferrins, the most abundant antimicrobial peptides in the ASL (215), is significantly reduced (257,258). Second, AGEs are known ligands for RAGE (receptors for AGEs) that are highly expressed in lung tissue, such as in AECI and AECII cells. RAGE is a critical proinflammatory mediator of the innate immune response, with its main role being the amplification of the cellular inflammatory response by producing reactive oxygen species (ROS) through the NFkB pathway (259). RAGE is essentially a PRR (pattern recognition receptor) with the ability to bind not only AGEs but also numerous ligands such as exogenous PAMPs, and DAMPs released by infected cells. RAGE is also a regulator of lung physiology, participating to the regulation of cell adhesion and morphology and therefore may play a critical role in gaseous exchange (260). Importantly, AGEs increase the expression of RAGE in macrophages, activating their polarization in the M1 pro-inflammatory phenotype (261). The presence of excessive AGEs in the ASL, through activation of RAGE, hence leads to a pro-inflammatory status of the pulmonary epithelium. Indeed, RAGE has been shown to be an important factor in respiratory viral infection, as RAGE −/− mice showed delayed mortality and accelerated viral clearance upon influenza A virus (IAV) infection (262). Finally, we hypothesize that the properties of mucins, highly glycosylated proteins, may also be altered by excessive glycation, leading to a disturbance of the mucus viscosity (263), affecting the efficacy of the mucociliary clearance. Overall, the presence of AGEs in the ASL would not only impair some aspects of the innate defense of the lung, but would also participate in generating a strong activation of the pro-inflammatory state.
In summary, high glucose in the ASL is associated with the impairment of multiple aspects of the innate antiviral defense of the lung, including the mucociliary clearance capacity, the lectin-mediated recognition of the virus, the general activity of the antimicrobial agents, as well as the number, the migration capacity and the function of the resident neutrophils and macrophages (Figure 15). Taken together, the overall efficiency of the early phase of viral elimination and clearance of infected cells could be seriously compromised by elevations of glucose in the ASL. The integrity of this early non-pathogen-specific phase is critical because if the virus breaks through these defenses, cascades of other pathogen-specific effects are initiated that make it increasingly more difficult for the immune system to protect the body from the virus, especially if it is a novel virus, as is the case of SARS-CoV-2.

Modeling of ASL Glucose Concentrations in Patients at-Risk
As mentioned before, defects in tight junction permeability or hyperglycemia could both lead to a rise of glucose in the ASL, with the greatest effect when they coexist. Hence, DM, obesity or acute hyperglycemia, are pathological conditions known to induce an increased concentration of glucose in the ASL (117,206,210). For example, ASL glucose is reported 1.2 (±0.7) mmol/L in diabetic patients compared to 0.4 (±0.2) mmol/L in non-diabetic (264). Concerning epithelial permeability, a defect in tight junction resistance can be induced by exposure to toxic particles from air pollution or smoking (265, 266), but also, and especially, by chronic inflammatory conditions associated with chronic lung diseases such as cystic fibrosis (CF), chronic obstructive pulmonary disease (COPD) or severe asthma (117,266,267). In such inflammatory conditions, glucose in the ASL has been reported to reach 1.6 (±0.1) mmol/L, or even 2 (±1.1) mmol/L depending on the pathology (211). Diabetic patients not only suffer from hyperglycemia, but they also often present with chronic inflammation (247,268), aggravating the disruption of glucose flux from the blood to the ASL.
To attempt to quantitatively evaluate to what extent changes in blood glucose can change glucose levels in the ASL under various permeabilities of the tight junctions, we produced a computational model using data obtained from the literature (see section Methods and Figure 17) and used this model to estimate the ASL glucose concentration for a control case [with normal blood glucose and epithelial resistance (Rt)] and a diabetic case (hyperglycemic and impaired Rt) (Figure 17) We then used the model to predict ASL glucose concentration in the other group at-risk for COVID-19, for which there is no data available in the literature. Aging is a condition that, in addition to reduced glucose metabolic capacity, is strongly linked to a general decrease in paracellular resistance in many tissues, including the lungs (269). We therefore used the model to infer an age-related increase of FPG ( Figure 8A) based on reported increases in the paracellular permeability with aging (Figure 17). Indeed, the model predicts that the glucose concentration in ASL increases significantly with age, as expected, because FPG increases and epithelial resistance decreases with age.
Hypertension is associated with chronic inflammation (270), which could also be responsible for a general impairment of cellular epithelial resistance. We reviewed above how hypertension is linked to an increased FPG and a higher risk of developing IGT. Based on these known qualitative effects and the quantitative modeling, it is reasonable to assume that people with hypertension will also present with higher concentrations of glucose in their ASL.
Importantly, higher glucose in the airway secretions has been observed in ventilated patients in the ICU (206,271), not surprisingly correlated with stress hyperglycemia, and not necessarily only in those patients with a chronically compromised glucose metabolism. Hence, it is most likely that all groups defined at risk for COVID-19, present with a higher concentration of glucose in their ASL as summarized in Figure 18.
It is also important to emphasize that viral infection itself is a condition known to affect the tight junction resistance (272), which would act synergistically to facilitate the infection if the virus breaks through the primary defenses of the lungs.

Infection in Healthy Patients
In brief, when a healthy person becomes infected, droplets containing virions that reach the respiratory tract will activate the mucociliary clearance as well as a reflex cough to normally expel the virus. If some virions reach the deeper airways and alveoli, they become trapped and inactivated by a second layer of defense, namely the humoral defense of the ASL, composed of numerous antimicrobial peptides, such as the lactoferrin, ßdefensins, or the SP-D proteins (see Figure 16), as previously mentioned. Beyond this barrier, those cells that do get infected quickly produce pro-inflammatory cytokines and type 1 IFN to alert the immune system and the neighbor's healthy cells to protect themselves. The resident macrophages and DC cells, also activated by PAMPs, convert into the M1 phenotype and promptly release proinflammatory molecules, such as type 1 IFN, TNF-a, and Il-1b (273), as well as a panel of chemokines to attract and activate more resident and circulating phagocytes and immune cells (neutrophils, DC cells and monocytes, as well as cytotoxic NK cells). SP-D proteins help drive the phagocytosis of the virus by the alveolar macrophages that also produce ROS to help clear the virus and the infected cells. Once at the infected site, immune cells themselves release a battery of pro-inflammatory and anti-inflammatory cytokines as well as ROS to further orchestrate an even more elaborate immune response (274). Both an impaired pulmonary epithelial resistance (due to aging or chronic inflammation among others), or an increase in blood glucose concentration (due to chronic or acute hyperglycemia, increased FPG or IGT) lead to an increase glucose concentration in the ASL, with a greater effect when both coexist (as modeled in Figure 17). All group at risk for COVID-19 are presenting one or several of these conditions, explaining that they are all susceptible to present an increased concentration of glucose in their ASL.
The infected AECII cells express damage-associated molecular patterns (DAMPs) on their plasma membrane, produce ROS and cytokines to activate their own phagocytosis and clearance by macrophages, converted to the M2 phenotype-all to limit viral propagation to the neighboring cells (274). AECII cells also secrete more surfactant proteins to further amplify the local innate defense and help drive the resolution of the inflammation through their immunomodulatory activity and their capacity to stimulate phagocytosis of apoptotic cells (219,224). Viral replication is contained by this timely orchestration of non-pathogen specific humoral and cellular innate pulmonary defenses (275) and is therefore a determining step to avoid a deeper infection (192). Complete viral clearance is finally achieved through the adaptive immune response orchestrated by the T and B lymphocytes coming from the bloodstream (65), reaching the site of infection by diapedesis: but of course, even more effectively if the body has previously been exposed to the pathogen.

Elevated Glucose Favors the Primary Infection and Viral Replication Elevated Glucose Impairs the Primary Non-specific Defense of the ASL
We reviewed and showed above that all groups at high risk for COVID-19 are likely to present with higher glucose in their ASL (see Figure 18), which acts to impair numerous facets of the primary innate humoral and cellular defenses (detailed in Figure 15). Reduced capacity of the early innate immune response and in consequence reduced physical viral clearance by glucose in these patients may explain the general increased susceptibility to infection with respiratory viruses such as SARS-CoV and influenza (see Figure 19). Using the BioExplorer, we have produced a movie (video, see "source code" in BioExplorer method section) showing the main impacts of high glucose in ASL on the primary step of infections in the lung, to explain the increased susceptibility to respiratory viruses in atrisk patients.

Elevated Glucose Levels Facilitate ACE2 Binding and Cell Entry
When the virus reaches the receptor, the RBD domain of the spike must move into the "up-conformation" to bind to the receptor, which triggers a sequence of cleavages by host proteases (membrane TMPRSS2 and furin) required for the virus to fuse with the host cell membrane and enter the cell. The virus must then strip off its coat to deliver the mRNA package inside the cell. We have described how elevated blood glucose can elevate glucose in the ASL and cause acidification of the ASL. Additionally, a study has shown that a mere bolus of glucose can also produce a more acidic intracellular environment (144,276). A lower pH in the ASL is thought to help the SARS-CoV-2 spike conformational masking to avoid detection by the immune system while it binds to the receptor (277). According to the literature, we speculate that more acidic pH in the ASL increases proteases activity (246), that would facilitate the membrane fusion (278)(279)(280). In addition, lowered pH intracellularly could further help the uncoating of the virus (144,276). Taken together, elevated blood glucose can not only compromise the early physical and immune barriers making it easier for the virus to reach its target receptor, but can also facilitate all the main steps of the actual process of infecting the cells, such as receptor binding, membrane fusion and cell entry.

Elevated Glucose Favors Viral Replication
Increased Glycolysis Rate. Enveloped viruses have evolved the ability to reprogram carbon metabolism of cells and hijack the glycolysis pathway for their own replication (138,145). As mentioned above, in the case of high glucose in the ASL, the epithelial cells have the capacity to uptake this glucose from the apical side to keep concentration low in the ASL. However, glucose levels (for example blood glucose reaching >6.7-9.7 mmol/L) can exceed the capacity for re-uptake (117). As a consequence, not only the glucose concentration in the airway, but also the glucose level inside the cells rises, saturating the hexokinase capacity for glucose phosphorylation. This negative feedback could lead to an abrupt runaway elevation of glucose levels in the ASL as the increased concentration of unmetabolized glucose inside the cells lowers the driving force for the reuptake of glucose from ASL. Thus, once inside the cell, the virus has access to an abundant supply of glucose for producing the nucleotides, amino acids, lipids and the ATP needed for replication (146). Therefore, not only does elevated glucose allow more viral particles to access and enter the cells, but also provides an ideal environment for efficient and fast intracellular replication. This analysis is in agreement with a recent study showing that the use of the glucose analog 2-deoxy-D-glucose blocks SARS-CoV-2 replication in Caco-2 cells (281). (168)]. Enveloped viruses have the capability to hijack the host cell N-glycosylation machinery to adorn their own glycoproteins with host glycans; however, to do so, a readily available glucose supply is essential. Glycosylation of some of the host proteins, such as ACE2 are also essential to allow more viruses to enter (171,181). A large supply of intracellular glucose therefore provides the ideal environment to ensure maintenance of glycosylation profiles of both viral and host proteins throughout the progression of the infection.

Efficient glycosylation process. A specific glycosylation coating is required for viruses to efficiently evade the immune system and invade cells [see above and
Exponential viral replication. High viral replication rates result in host cell damage and death, with numerous adverse effects. In the non-alveolar epithelium, damaged ciliated cells lead to reduced mucociliary clearance capacity (282,283), and compromised integrity of tight junctions increases paracellular flux of glucose into the ASL (117,207), escalating the damage through the numerous mechanisms described before (see Figure 20). This is in agreement with a recent publication on IAV infection showing that elevated glucose prior to viral infection increases virus-induced pulmonary barrier damage (284). Host cell damage together with elevated glucose turns negative feedback homeostatic mechanisms into a positive feedback loop of pathological processes. Increased production of ROS caused by cellular damages and hypoxia induces expression of the HIF1α (hypoxia inducible factor 1α) transcription factor (146) stimulating the expression of GLUT transporters (145,211,212,285) and glycolytic genes (286,287), that increase extracellular glucose uptake and glycolytic capacity, amplifying the viral replication (286)(287)(288)(289). HIF1α also induces the expression of LDH (286) causing an increase in the conversion of pyruvate into lactate (a biomarker of a poor COVID-19 prognosis), further decreasing the pH, and further compromising innate defenses. Thus, the increased glucose levels in the ASL observed in high-risk patients, not only favors viral access to the cells, receptor binding, cellular entry, and the delivery of its genetic material, but also a vicious cycle of exponential viral replication depicted in Figure 20. This causes significant damage to the local alveolar epithelium and reduced capacity for gaseous exchange, likely correlating with the appearance of respiratory distress symptoms in the patient such as shortness of breath, dyspnea, and fatigue.

Modeling of the Impact of Glucose Concentration on the Different Steps of SARS-CoV-2 Primary Infection
To better understand the interplay between the key variables of the numerous glucose-mediated actions implicated in the SARS-CoV-2 primary infection and to attempt to quantitatively evaluate the impact of elevations in blood glucose levels on COVID-19 severity, we built a computational model to simulate numerous glucose-mediated actions and predict FIGURE 20 | Graphical representation of the effects of high glucose in ASL on SARS-CoV-2 primary infection and viral replication. Elevation of glucose in the ASL has multiple consequences on the primary innate defense functions of the ASL (decreased MCC, lectin and macrophages activities, increased production of AGEs, acidification of the ASL; detailed in Figure 15) with the consequence of an increase capability of the virus to reach and bind its receptor. Acidic pH could also favor the viral entry process. Elevation of glucose in the ASL also increases the glucose uptake inside the cell, favoring the cellular replication capacity of the virus, that hijacks the host glycolysis and N-glycosylation pathways. High viral replication provokes inflammation and hypoxia that tend to affect the paracellular resistance (Rt) of the cell aggravating the efflux of glucose in the ASL. Hypoxia also leads to increase the expression of GLUT transorters and glycolytic genes amplifying the glycolytic rate, and in consequence, the viral replication rate, the production of lactate, the decrease of the pH in the ASL, in a vicious circle of infection escalation. AGEs, Advanced glycation end products; Rt, epithelial resistance; GLUT, Glucose transporters.

Modeling of SARS-CoV-2 Binding With Its Receptor ACE2
A schematic of the model is presented in Figure 21A. Briefly, lectin traps a fraction of the virus in the ASL before it reaches the receptors. Higher glucose concentrations act competitively for binding to lectin, leaving less lectin available for trapping the virus and allowing more viruses the chance to reach the receptor. In the presence of a constant concentration of lectin, binding of the virus to the receptor depends on both the viral load and glucose concentration in the ASL. After endocytosis, the virus uses epithelial glucose to replicate, leading to the production of lactate, which when released in the ASL, further lowers the pH of the ASL.
We first simulated the binding of SARS-CoV-2 to the receptor ACE2 as a function of three different viral loads at the time of infection in a normoglycemic or hyperglycemic condition (see section Methods). The model illustrates the extent to which SARS-CoV-2 may bind to ACE2 depending on both the viral load (see section Methods for description of viral loads) and the glucose concentration in ASL. Increased glucose in the ASL due to hyperglycemia increases receptor binding for all viral loads (Figure 21B). In the hyperglycemic case, binding to the receptor is nearly doubled for all viral loads, suggesting that the receptor binding at any viral load is amplified under hyperglycemic conditions. Additionally, a high viral load results in near maximal receptor binding if patients are hyperglycemic.

Simulating SARS-CoV-2 Cell Entry
We next explored the efficiency of SARS-CoV-2 endocytosis after a simulated "sneeze" that delivers virions in a pulse-like manner with a time course of concentration decay (see section Methods). We used three "sneezes" of varying viral content and explored cell-entry after receptor binding; again, in a normoglycemic or hyperglycemic condition. The rise time is a function of inhaling the virus and the decay time represents clearance from the lung ( Figure 22A).
The model suggests that SARS-CoV-2 endocytosis in epithelial cells is substantially increased by high glucose in the ASL in all cases, but with the greatest effect seen for low viral loads ( Figure 22B, blue lines). The decreasing effect of elevated glucose concentration as the viral load increases (left vs. right panels) is due to saturation of the endocytic process.

Modeling of SARS-CoV-2 Replication Rate
In the next step, we examined the viral replication rate which takes into account receptor binding, endocytosis and subsequent intracellular viral load, which depends on glucose concentration and the viral load in the ASL (Figures 23A,B and section Methods). The model suggests that the viral replication rate for low viral loads in the hyperglycemic case is equivalent to the rate of replication induced by high viral loads in the normoglycemic condition. It also suggests that the hyperglycemic condition can further amplify the replication rates induced by any viral load by three to four times compare to normal condition ( Figure 23B).

Modeling SARS-CoV-2 Viral Numbers
Finally, we simulated the hypothetical viral number produced in epithelial cells after a "sneeze" stimulus which takes into  account receptor binding, endocytosis and viral replication rate, again as a function of both glycemic conditions and viral loads ( Figure 24A). We then used the resulting value as a biomarker of primary infection severity (Figure 24B and section Methods). We took the number of virions in the epithelium for each viral load in a normoglycemic (0.4 mM ASL glucose) and a hyperglycemic condition (1.2 mM ASL glucose), and set a hypothetical threshold for primary infection severity (number of virions generated per epithelial cell that would induce severe epithelial damage) at around 1 × 10 4 virions ( Figure 24B). While the hypothetical severity threshold was slightly crossed in the normoglycemic condition at the intermediate viral load, condition it was already reached at the low viral load in the hyperglycemic condition. Furthermore, the degree of the effect of hyperglycemia on severity of outcome depends on the viral load, as the severity threshold is only slightly cross in the normoglycemic condition in case of high viral load, whereas it is dramatically exceeded in the hyperglycemic condition already in case of intermediate viral load (Figure 24B).
To summarize, we modeled the various interactions of glucose in the lung reported in the literature to test the feasibility of the effects of a range of glucose concentration in the ASL on viral receptor binding, endocytosis and replication in lung epithelial cells. The model suggests that while under normal conditions, a high viral load is required to cause severe epithelial damage, under hyperglycemic conditions, even low viral loads could start causing damage. This emphasizes both the importance of the viral load in the infection process and the susceptibility conferred by elevated glucose levels. Consequent viral binding, endocytosis and replication is predicted in hyperglycemic conditions already with intermediate viral loads, and becoming even extreme with high viral loads. While these results could vary somewhat quantitatively due to limitations in the model resulting from simplification steps or unknown mechanisms and parameters, the direction of the results is unlikely to change since all reported effects of glucose only facilitate the infection process. These first steps of the infection are some of the defining moments for how the disease progresses further. In the next part, we show how, if the escalation is not contained at this stage, high blood glucose also facilitates the subsequent development of the disease and its complications.

Elevated Glucose Contributes to Severe Complications of COVID-19
Patients with DM, obesity, hypertension or the elderly are generally more sensitive to respiratory viruses such as influenza or respiratory syncytial viruses (RSV) (290,291). The common impairment of non-specific innate immune defenses of the lung due to high glucose in ASL that we have previously described, could explain the general susceptibility of these patients to respiratory viruses. However, the reason why SARS-CoV-2 infection leads to a worsening of the disease with severe symptoms, such as ARDS, multi-organ failures, or pulmonary embolism in some cases is still elusive. Specificity of viruses resides, among others, in their binding receptor, i.e., ACE2 in the case of SARS-CoV viruses. ACE2 is an effector of the RAAS system that converts angiotensin II (Ang II) to angiotensin 1-7 (292). Following the binding of SARS-CoV-2 to its receptor, the virus-receptor complex is internalized, leading to the inactivation of ACE2, and consequently to an extracellular increase of Ang II concentration (189,293,294)-first locally then systemically. Ang II accumulation has numerous physiological consequences such as an increase in endothelial permeability, vasoconstriction, inflammation and thrombosis (188,(295)(296)(297), and further causes glycemic dysregulation and increased insulin resistance (298,299). Indeed, plasma levels of Ang II are increased in SARS-CoV-2 infected patients compared to healthy individuals, which has been associated with the viral load and lung injury (300), suggesting that inactivation of ACE2 is specifically implicated in the disease severity.

Elevated Glucose Drives the Immune Response Into a Cytokine Storm and ARDS
As mentioned above, high glucose in the ASL is mainly responsible for a weak migration capacity and activation of the innate immune cells at the site of infection, delaying the secretion of the pro-inflammatory mediators, necessary for a well-timed effective immune response. Indeed, previous studies have shown that hyperglycemia impairs the diapedesis capacity (recruitment from the blood) of immune cells (236,238,240,301) delaying the immune response. This is also in agreement with the delayed recruitment of monocytes and secretion of type 1 IFNs observed in severe cases of SARS-CoV-1 and SARS-CoV-2 infections (302-304). Moreover, it was shown that SARS coronaviruses have evolved an elegant way when infecting host cells, to inhibit their production of type 1 IFNs (305,306) which are key mediators of the antiviral response, an effect that also could be explained by high glucose and related increased glycolysis and lactate production by infected cells (307). This overall lag in the pro-inflammatory signal, that could be attributed to consequences of elevated glucose, favors viral propagation, leading to greater epithelial damage associated with an increased level of DAMPs, ROS, and pro-inflammatory cytokines secretion by the many infected cells. These effects, combined with a late and excessive infiltration of M1 monocytes, M1 macrophages, and T cells at the site of infection, cause an exaggerated local inflammation. Indeed, late but excessive infiltration of macrophages, monocytes and neutrophils has been observed in COVID-19 patients (65). Moreover, the subsequent anti-inflammatory signal (M1-M2 shift) necessary to resolve the inflammation is not triggered on time leaving the immune cells in a pro-inflammatory state. Impaired apoptosis and clearance of infected cells by macrophages also adds to a prolonged secretion of inflammatory cytokines by infected cells-an effect that could be further aggravated by the inhibition of the SP-D and SP-A in elevated glucose conditions and a failure to trigger the clearance of apoptotic cells to resolve the inflammation (219,224).
According to this cascade, a growing body of evidence suggests an overactivation of many immune cells in hyperglycemic conditions. In high glucose conditions, DC cells, monocytes, M1 macrophages, effector CD4 + and CD8 + T lymphocytes, once recruited, show a hyper-responsiveness with an exaggerated expression of cytokines, mainly IL-6 and IL-1 (146,236,238,240,301). This exaggerated response is amplified by the SARS-CoV-2 specific inactivation of ACE2 with the resulting local increase in Ang II levels that further aggravates the pro-inflammatory phenotype [IL-6 and ROS (308,309)].
To recap our analysis, we propose that in hyperglycemic conditions, the local overproduction of ROS and cytokines by infected epithelial cells, combined with the hyperactivation of the immune cells and the general imbalance of the pro-and anti-inflammatory signals, and aggravated by the inactivation of ACE2, most likely lays the foundations for the cytokine storm syndrome (CSS) observed upon SARS-CoV-2 infection in severe cases (68). Importantly, the immune cells themselves express ACE2 and are therefore also targeted by the SARS-CoV-2 virus. Infected circulating immune cells and increased apoptosis of T-lymphocytes can lead to lymphopenia, adding to the overall dysfunction of the immune response (65,310). To model these conditions, we have attempted to represent the main events involved in the course of the immune response upon SARS-COV-2 infection in a healthy patient (Figure 25A), and the main impact due to high glucose in high-risk patients (Figure 25B), a representation that is in perfect alignment with other reports (57,311).
With uncontrolled viral propagation and cell damage, the overproduction of cytokines aggravates the damage of the alveolar epithelium as well as the thin pulmonary vascular endothelium. Excessive fluid accumulates in the alveolar spaces causing pulmonary edema, impairment of gaseous exchange, spiraling into the acute respiratory distress syndrome (ARDS) that is characteristic of the severe forms of COVID-19 (65,274,312). At this stage, oxygenation or mechanical ventilation is necessary. ARDS often leads to hypoxemia, respiratory failure and in critical cases, the death of the patient.
ARDS is a severe complication observed in other respiratory viral infections such as influenza (313). However, the overall incidence of ARDS caused by the seasonal IAV is only around 2.7 cases per 100,000 person-years (314), whereas it reaches 15-30% in COVID-19 (315). This huge difference is likely due to the specificity of SARS-CoV-2 for its receptor ACE2 and the consequent higher levels of Ang II. First, as previously described, the Ang II accumulation adds to the uncontrolled pro-inflammatory status, aggravating the cytokine overproduction (IL-6, ROS) characteristic of the cytokine storm. Second, Ang II was shown to inhibit alveolar fluid clearance, to dysregulate ENaC expression that worsens alveolar edema (316). Third, it was shown that Ang II leads to the overexpression of RAGE (317), the pro-inflammatory receptor of AGEs. The higher level of AGEs due to high glucose present in DM, aging or hyperglycemic patients, combined with the Ang IIdependent overexpression of RAGE will lead to a subsequent hyperactivation of the AGEs-RAGE signaling pathway with overproduction of ROS and IL-6 (251,318), that may add to the sustained pro-inflammation of the lung, responsible for the ARDS.
RAGE is a receptor for other specific ligands, including HMGB1 (high mobility group box 1 protein; recognized as a DAMP protein), whose expression increases in hyperglycemic conditions or in conditions such as obesity, systemic The efficient humoral and cellular innate defense of the lung (antimicrobial agents, phagocytes, DC cells, and NK cells) leads to a high phagocytotic rate of the viruses, resulting in a low primary viral replication rate and limiting epithelial damage. The pro-inflammatory signal targeted by the innate immune cells and the infected epithelial cells allows the recruitment of the adaptive immune system (T-cells and B-cells) for further viral clearance. The pro-inflammatory cytokines production is well-balanced by the anti-inflammatory secondary response to resolve inflammation. Viruses are finally cleared, and epithelial damages are limited. (B) In case of high glucose in ASL due to high blood glucose and/or increased epithelial permeability, the humoral and cellular innate defenses of the lung are strongly impaired, reducing the primary viral clearance and delaying the pro-inflammatory signal. The inefficient primary viral elimination response leads to a rapid increase in viral replication as well as important epithelial cellular damage, combined with a delay in the signal for the adaptive response. Additionally, high blood glucose is linked to exaggerated pro-inflammatory cytokine production as well as an impaired anti-inflammatory secondary response, together responsible for the cytokine storm. The cytokine overproduction and the important cellular damages are leading to development of ARDS and eventually respiratory failure. inflammation (319) or diabetes. HMGB1 is also released from activated platelets during vascular damage. Indeed, HMGB1 is elevated in COVID-19 patients and could be an additional potential biomarker of the disease progressing to a more severe form (320). In patients with elevated blood glucose, the HMGB1-RAGE signaling pathway could exacerbate and drive the inflammatory response into a cytokine storm and subsequent ARDS (321).
To summarize thus far, in addition to all the glucose-mediated effects, the specific inactivation of ACE2 and consequent overproduction of Ang II may contribute to the higher proportion and more severe form of ARDS upon SARS-CoV-2 infection, as compared to other respiratory viral infections, in this group of patients.

Elevated Glucose Contributes to the Development of Multi-Organ Failure
ARDS is one of the leading causes of death upon SARS-CoV-2 infection. However, SARS-CoV-2 shows the peculiarity to degenerate into other deadly complications such as multi-organ failures, which has been observed in 47% of the severe cases (35,38,39,322). As mentioned before, the local cytokine storm is responsible for the destruction of both the alveolar and vascular epithelia. Consequently, the inflammatory components present at the infection site (i.e., cytokines, oxidative species, antimicrobials peptides, as well as the virions particles) diffuse and circulate in the bloodstream damaging the vasculature itself and the peripheral organs. Importantly, hyperglycemia, aging and hypertension are all conditions associated with pre-existing endothelium impairments, such as a generalized increase in endothelial permeability (76), further facilitating the transport of these cytotoxic agents from the blood to the peripheral organs. The excessive amount of circulating antimicrobial components, pro-inflammatory cytokines and ROS can cause damage and inflammation to many organs. Additionally, all organs expressing ACE2 become potential additional targets for the circulating SARS-CoV-2 virions such as the heart, kidneys, the gastrointestinal tract, the brain or the vasculature (41,323). As previously detailed for AECII cells, the binding, replication and dissemination of SARS-CoV-2 may also be facilitated in the peripheral organs in hyperglycemic patients. Indeed, SP-D is expressed and is part of the innate immune defense of other organs such as the gastrointestinal tract and kidneys (173,226,324,325). For this reason, the innate defense of these organs could also be directly impaired by high glucose, consistent with the gastrointestinal symptoms and high rate of kidney failure reported (326,327).
More importantly, the pancreatic β-cells express significant levels of ACE2 (200) and may become damaged and inflamed upon SARS-CoV-2 infection (328). The dysfunction of the β-cells may cause a reduction in insulin release and secondary acute hyperglycemia, as it was observed in the preceding 2003 SARS-CoV-1 outbreak (329), and more recently also proposed for SARS-CoV-2 (86,126), aggravating the hyperglycemia and enabling the multiple damaging effects of high blood glucose. Accumulation of Ang II itself can also lead to β-cells apoptosis (330), further amplifying the glycemic dysregulation, insulin resistance, driving a positive feedback paracrine loop and vicious cycle of adverse effects in the disease's progression.

Elevated Glucose Favors Thrombotic Events
In addition to pneumonia, ARDS and multi-organs failures, a high proportion of COVID-19 patients were diagnosed with thrombotic events at a much higher rate than in other types of lung infection (331). Indeed, alveolar capillary microthrombi are nine times more frequent in COVID-19 patients than in people with influenza (332), and occur in 80-100% of severe cases (333,334). These patients present blood clots disseminated throughout the lungs associated with elevated levels of thrombotic markers such as fibrinogen, thrombin, plasmin, and D-dimer (333,335). The blood clots are responsible for pulmonary embolism, heart attack and stroke (336), and could also contribute to a dramatic drop in blood oxygen levels in severe cases of COVID-19.
It is reported that this coagulopathy arises from a thromboinflammation mechanism (337,338); the infection and destruction of endothelial cells (ECs) expressing the receptor ACE2 (339,340) trigger an intricate cascade of inflammatory and pro-coagulant events. Under normal physiological conditions, quiescent endothelial cells (ECs) preserve vascular homeostasis, ensure barrier integrity and function, prevent inflammation, and inhibit coagulation by expressing blood clot-lysing enzymes and producing the glycocalyx, a protective layer of glycoproteins and glycolipids with anticoagulant properties. The infection of the ECs by SARS-CoV-2 is responsible for a massive endothelial pyroptosis, a highly inflammatory form of cell apoptosis (333,341), associated with the disruption of the glycocalyx, the exposure of the basement membrane and activation of pro-coagulant factors (e.g., P-selectin, von Willebrand factor, leukocyte adhesion molecules and fibrinogen) (342). Additionally, pulmonary ECs are antigen presenting cells and assumed to play a role in the immune surveillance against respiratory pathogens. Consequently, infected ECs also release ROS, proinflammatory chemokines and cytokines such as IL-6 (343). The resulting pulmonary endotheliitis contributes to the innate immune hyperactivation, promotes inflammatory cell infiltration (such as by neutrophils) and exacerbates the cytokine storm (333,340). Importantly, as previously mentioned, the SARS-CoV-2 invasion also especially provokes the internalization of ACE2. The resulting accumulation of Ang II in the endothelium amplifies the pro-inflammatory status and induces a local pulmonary vasoconstriction that, combined with the pro-coagulant and pro-inflammatory effects of the virus, degenerates to a severe thrombotic phenotype (Figure 26). The alveolar microcirculatory thrombosis may degenerate to a systemic disseminated intravascular coagulopathy, associated with a pro-hemorrhagic pattern that exacerbates organ injury and increases the risk of mortality (332,338,339).
Patients presenting with glucose metabolism dysregulation may particularly be at risk for these thrombotic complications. First, we have highlighted that hyperglycemia or IGT favors a higher viral replication rate, which we assume would also occur in ECs. Second, chronic or acute hyperglycemia is itself a risk factor for coagulation (344), with increased level of prothrombotic factors and endothelial dysfunctions (e.g., blood viscosity, coagulation, and vascular construction) (345)(346)(347). Hyperglycemia-induced increase of AGEs, as well as glycation of fibrinogen and collagen, are largely involved in the multiple mechanisms of hyperglycemia-dependent endothelial dysfunction (348)(349)(350). Third, increased levels of plasmin fibrinogen or prothrombin was demonstrated in all subsets of patients at risk (347,(351)(352)(353)(354). Fourth, aging, hyperglycemia, DM, obesity, and especially hypertension, are all conditions associated with a pre-existing increased level of Ang II (298,(355)(356)(357)(358)(359). Hence, these patients in particular, may present an excessive concentration of Ang II upon SARS-CoV infection, and then are susceptible to exaggerated vascular vasoconstriction. In addition, the overexpression of Ang II amplifies the glucose dysregulation in these patients, leading to an unstoppable vicious cycle where symptoms become ever more severe.
In summary, patients presenting with a pre-existing procoagulant condition and increased Ang II levels, are more prone to develop severe coagulation disorders and thrombotic events upon SARS-CoV-2 infection, especially if accompanied by hyperglycemia as summarized in Figure 26.

Elevated Glucose Increases the Risk of Secondary Pulmonary Infection
Glucose in the ASL is an important and direct nutrient for bacteria or others pathogens (208) and therefore supports secondary bacterial infections, as established in DM (119,275), cystic fibrosis (360), COPD (361), or patients in the ICU (see section Discussion). Secondary bacterial infection during pulmonary disease management is an important cause of mortality, especially in ICUs in European countries (362). Even if apparently less frequent than in case of IAV infections, bacterial infection is one of the complications in patients critically ill with COVID-19 (363,364), that could also be linked to elevated blood glucose or IGT in patients at risk.

Therapeutic Approaches and Research Strategies
The evidence that patients with elevated blood glucose or IGT are more prone to severe primary infection and COVID-19 complications and death in the literature is overwhelming. Elevated glucose can not only explain much of the variance in COVID-19 severity as a correlative biomarker, but because virtually every action of glucose in biochemical, metabolic and homeostatic pathways seems to serve only to facilitate the infection, it could also be a primary determining factor in the Other consequence of SARS-CoV-2 infection is the increased production of Ang II (as consequence of ACE2 inactivation after internalization), with a vasoconstriction effect. The combination of the activation of pro-coagulation and vasonconstriction is then associated with a high risk of thrombotic events. The majority of patients at risk present a pre-existing pro-coagulant state due to (1) hyperglycemia-induced increase in AGEs production, (2) increased level of pro-coagulant factor such as fibrinogen. In addition, all group at risk are also described to present a pre-existing increased level of Ang II. Then, the presence of these pre-existing conditions (pro-coagulant factors and Ang II level) easily explain the increased risk of thrombotic events upon SARS-CoV-2 infection in these groups of patients. Moreover, and as previously described, the majority of patients at-risk for COVID-19 presents a hyperglycemia or an IGT that increase the viral replication rate in cells. Then, these patients may also present an increased infection rate in their endothelial cells further amplifying the associated events above described. Hence, this schema highlights why these specific groups of patients present a high proportion of severe micro and macro-vascular thrombotic events upon SARS-CoV-2 infection.
severity of the disease. Controlling glucose levels could therefore reduce the severity of the disease and consequently also the mortality rate.
Obtaining an unbiased representation of the findings in such a vast database of relevant literature was only possible with the aid of text mining, machine learning and knowledge engineering approaches. The knowledge graph provided an overview of the contents of the dataset, revealed the high-level structure of the information it contains, and helped guide us through and down to the deepest levels of knowledge, where accessing the specific articles referenced allowed human verification of the findings. The hypothesis that arose called for supplementing the review by analyzing data found across different articles, performing computational modeling to test the feasibility of some of the actions of glucose, and producing atomistic reconstructions to better appreciate some of the physical and biophysical parameters involved. This supplemental analysis further supported the hypothesis that elevated glucose is a primary risk factor for the severity of COVID-19.
Interventions that reduce the availability of glucose would allow tackling multiple facets of the primary viral infection: improving the innate defense of the ASL, decreasing the viral replication capacity inside the cells, impairing N-glycosylation process that would compromise the immune evasion facet of the virus and possibly increase the immune recognition, improving timely orchestration of the immune system. Improving glucose metabolism would also diminish the risk of developing secondary SARS-CoV-2 specific complications such as coagulopathy. The literature actually already contains tests of this hypothesis. In fact, the mortality rate is lower among diabetic patients where glycemia is well-controlled (79) and recent studies show that patients with uncontrolled hyperglycemia or newly diagnosed DM (i.e., untreated) are even more at risk than those with known DM (i.e., treated) (53,80). The hypothesis can be refuted, in part or in full, by finding severe COVID-19 cases where comprehensive measurements fail to detect even normally subclinical glucose dysregulation.

Glucose Lowering Drugs
According to the hypothesis that well-controlled glycemia is critical for determining the outcome of COVID-19, the most standard strategy is to use glucose lowering drugs, widely available and low cost. A wide variety of these drugs is available, with different mechanisms of actions that could all present pros and cons for management of COVID-19 [comprehensively reviewed in (83,365,366)]. For example, ACE agonists would be contra-indicated because they would further increase the inhibition of the ACE pathway and increase expression of ACE2 (367). Also, most of the glucose lowering drugs risk inducing hypoglycemia, which is not recommended. Insulin is widely used for managing glucose levels during hospitalization in ICU (368,369) where it does seem to reduce mortality, length of stay in ICU, and ventilator dependence in COVID-19 (370). However, correcting glucose levels in ICU is extremely challenging (371,372), especially in a pandemic situation where hospitals are being overwhelmed and should only be attempted by experienced staff. Insulin protocols for any COVID-19 patient would therefore have to be explored, developed and clinically tested.
According to the literature reviewed, metformin may be an effective glucose-lowering drug for COVID-19. Metformin is an old drug and the first line therapy for diabetes management. Apart from its safer glucose-lowering effect (i.e., reducing glucose without provoking hypoglycemia) metformin has several other interesting effects that may be advantageous for the management of COVID-19. Firstly, metformin is known to restore the permeability function of tight junctions by increasing the transepithelial electrical resistance as well as the expression of the tight junction proteins claudin 1 and occludin (210,212). Indeed, it has been shown that metformin reduces the airway glucose permeability and the hyperglycemia-induced Staphylococcus aureus load, independently of its effect on blood glucose (373). Secondly, this drug has some anti-inflammatory effects, reducing plasma CRP levels, a biomarker for the poor prognosis for the disease (374). Importantly, this anti-inflammatory effect was also observed in the airway epithelium (375). Thirdly, metformin decreases the glycation of hemoglobin and presents cardiovascular protective, vasodilative and anti-thrombotic effects (376)(377)(378)(379). Finally, metformin inhibits formation of AGEs (380,381) as well as cytosolic and mitochondrial ROS production induced by AGEs in endothelial and smooth muscle cells (382). The combination of these various actions may be protective against the SARS-CoV-2 infection, especially in patients at risk. It is important to mention that metformin is contra-indicated in case of respiratory failure or severe hypoxemia because a side effect, even if rarely reported, is lactic acidosis. Concerning indication of metformin, two studies showed beneficial effects in diabetic users compare to non-users with a reduction in COVID-19 mortality (383) or in heart failure and inflammation (384), which was confirmed by a further meta-analysis (385). However, Cheng et al. (384) showed that metformin was associated with an increased acidosis. However, they showed that this occurred only in severe cases treated with high doses of metformin and in patients that presented with pre-existing renal dysfunction. We found one study where metformin usage seemed to be associated with higher risk of severe COVID-19 in diabetic patients (386).
In summary, the literature supports the notion that patients on metformin do better than those on other diabetes medications. However, this observation should be interpreted with caution since metformin is often the treatment for patients with early or easy to control diabetes, while other medications are introduced as the disease progresses, with insulin introduced when the disease is at its most advanced state. Hence, it remains possible that patients on metformin show better outcomes not because of the beneficial effects of metformin, but because their underlying dysregulation of glucose metabolism is less severe (366). This emphasizes that even strong cases for repurposing a drug, as is the case for metformin, requires randomized clinical trials. Perhaps even more important, clinical trials should be performed on non-diabetic healthy people.

Lowering Carbohydrates in Diet
The strong link between COVID-19 severity and diabetes and obesity has led to consideration of nutritional interventions in the treatment of the disease (387), as for example the use of low-carbohydrate diets or ketogenic diet (low-carb, high fat diet) The basic principle is to diminish the intake of carbohydrates, providing fat instead of carbohydrates for the body to switch on ketosis and produce ketones as the primary energy source (388,389). Indeed, there is growing evidence of therapeutic benefits of a ketogenic diet for severe pathologies (390) such as cancer (391,392), diabetes (393,394), and pharmaco-resistant epilepsy (395,396), and for the prevention of Alzheimer disease (397,398) and other neurodegenerative disorders (399). It is also the first line therapy for the management of the Glut1DS (Glut1 deficiency syndrome) rare disease (400) and there is evidence that a ketogenic diet decreases comorbidities linked to hyperglycemia (401,402). Since viruses are high glucose consumers, just like cancers (141,403,404), diminishing the indispensable primary source of energy for the virus may be an effective intervention. Importantly, it was recently shown to be a safe intervention for patients even in ICU (405,406).
One study recently showed that the ketogenic diet (KD) activates protective gamma delta T (γδT) cell responses against influenza virus infection (407). These γδT cells are IL-17producing T cells that play an important antiviral protection in the lung (see Figure 13), maintaining epithelial integrity, regulating homeostasis and providing a first line of defense against pathogens and injury (408). Importantly, γδT cells were shown to be reduced in COVID-19 patients (409). Goldberg et al. showed a significant increase in the frequency and absolute number of γδT cells in the lungs of KD-fed mice; this increase was required for the KD-mediated protection against influenza disease, resulting in lower viral titers and overall better preservation of airway tissue integrity. Interestingly, in 2010, Taylor et al. (408) reported that γδT cells were reduced and impaired by hyperglycemia in a mouse model of obesity. In a similar way, KD could also have the potential to help the immune defense against pulmonary viral infection, but this has to be confirmed for SARS-CoV infection.
Supporting this hypothesis, a clinical trial on KD for intubated critical care COVID-19 was initiated (https://clinicaltrials.gov/ ct2/show/NCT04358835). The aim of the study was to measure the benefit of KD on gas exchange, inflammation, and duration of mechanical ventilation in intubated patients with COVID-19 infection (on 15 patients at start). Additionally, two recent reports advised on the use of a carbohydrate-restricted diet for the management of the disease (410,411) and a randomized controlled trial on KD has been developed [see (410)]. Cooper et al. (411), claim that the ketogenic diet would be more beneficial than insulin therapy because large fluctuations in blood glucose concentrations are primarily driven by dietary sources, and it would also avoid the adverse effects of hyperinsulinemia. Lowering carbohydrate consumption could therefore manage both hyperglycemia, hyperinsulinemia and may additionally help manage hypertension (412).

Importance of Systematic Glucose Metabolism Measurement (FPG, PPG, HbA1c, Insulin)
The numerous lines of evidence in the literature for elevated blood glucose as a correlative risk factor is overwhelming and makes a strong case for far more thorough monitoring during the management of COVID-19. As previously mentioned, increased FPG becomes an important marker of mortality and morbidity and should be systematically measured. Importantly, a single normal FPG value is not sufficient to exclude acute hyperglycemia or IGT. For this reason, regular FPG and PPG measurements should be systematically obtained. We emphasize that the 2 h OGTT test is not recommended as it requires the ingestion of a high amount of glucose, that could be detrimental and escalate the disease progression. We propose that measuring HbA1c and insulin should also be included to detect any glucose metabolism dysregulation (e.g., diabetes, prediabetes, hyperglycemia, IGT, IFG, hyperinsulinemia). Indeed, it appears that the proportion of undiagnosed prediabetic patients is high in severe cases (126). On this line, HbA1c was recently found as a predictor of COVID-19 severity (413). Finally, even moderate dysregulation, and not only severe hyperglycemia, should be taken into consideration as it could be the starting point for an unstoppable viral infection. However, routine measurement of insulin and HbA1c could be challenging in many hospitals, especially during a pandemic where hospitals become overwhelmed. Measuring fructosamine could be a simpler alternative or a complementary measure for assessing recent glycemia (414,415).

Alternative Biomarkers
Ang II plasma level. Ang II is strongly associated with dysglycemia, and patients who are at risk for severe COVID-19 disease (e.g., diabetic, obese, elderly, or hypertensive) present with an increased basal plasma level of Ang II, that is further amplified by SARS-CoV-2 infection. Ang II possesses inflammatory and vasoconstrictive effects that appear to play a critical role in the COVID-19 disease severity. Hence, measuring Ang II at admission or/and during the course of the disease, could be an additional biomarker for risk stratification or as a prognostic indicator.
SP-D plasma level. SP-D is a key player in the development and regulation of the innate immune defense of the lung against SARS infection (219,225). Serum levels of SP-D are elevated in patients with SARS related pneumonia and has been suggested as a marker of alveolar damage in this condition (416). SP-D is mainly synthetized by AECII cells of the lung and is released into the blood during certain types of lung injury. Furthermore, SP-D plasma level is considered to be a putative biomarker for pulmonary disease (325,417), such as acute airway inflammation (418), or exacerbation of COPD (419). Importantly, it was also described as a biomarker of cardiovascular diseases (420,421) or atherosclerosis (325). Hence, comparing SP-D plasma level in non-critical vs. critical patients could indicate if SP-D could be an additional biomarker of COVID-19 disease severity.

Glucose Management in ICU
The high glucose content in parenteral feeding used in some ICU cases could be more detrimental than beneficial for COVID-19 patients since high glucose favors all stages of the infection. The accepted range of FPG in ICU is [8-11 mmol/L (118)] which is much higher than the normal range. In fact, the common thinking was that "high blood glucose concentrations are believed to be a normal physiologic reaction in stressed patients and that excess glucose is necessary to support the energy needs of glucose-dependent organs" (92). This strategy may be indicated for some other diseases, but is not supported for SARS-CoV and other viral infections.
The exact target of blood glucose concentration in ICU remains a matter of debate as reports yield contradictory conclusions (422), mostly due to the heterogeneity of the studies, but the beneficial effects of lower glucose in parenteral feeding is consistently supported by multiple studies. For example, Patiño et al. (423) have demonstrated that patients receiving hypoenergetic-hyperproteic total parenteral nutrition regimens on a surgical ICU have a more physiological clinical course, with less metabolic stress than those receiving high-energy loads. Later in 2001, Van den Berghe et al. (369), reported that intensive insulin therapy, to maintain blood glucose at or below 110 mg per deciliter, reduces morbidity and mortality among critically ill patients in the surgical intensive care unit. Hypoglycemia has to be avoided, however, in the case of SARS-CoV-2 infection, a tight control of glucose metabolism should be mandatory for ICU patients, as recently proposed (370,371). Although it is challenging to manage glucose levels in ICU patients, some effective protocols for tight glucose control in these conditions are emerging (85).

DISCUSSION
CORD-19, a valuable literature dataset, was made open-access to stimulate collaboration and accelerate solutions in the global crisis caused by the SARS-CoV-2 virus. While there are numerous questions one could ask using such a vast dataset, we chose to ask why some people get more affected than others. It is no longer possible for humans to read, let alone synthesize the hundreds of thousands of scientific studies produced with a wide range of scientific expertise and across numerous scientific disciplines. We therefore developed an expert knowledge system to mine the dataset and help navigate this literature resource. We then combined expertise in molecular and cell biology, data and knowledge engineering, machine learning, and scientific modeling and visualization to address this urgent question, and provide potential clinical and research guidance to help fight against this virus.
We first performed a trivial analysis of the entities mentioned in the entire set of articles of the CORD-19. Glucose stood out as the most frequently mentioned biochemical that could be a common and important biological variable in all patients with COVID-19. We therefore constructed specific knowledge graphs to focus on all findings that consider glucose in the context of respiratory diseases, coronaviruses, and COVID-19. This allowed us to explore the potential role of glucose across many levels, from the most superficial symptomatic associations to the deepest biochemical mechanisms implicated in the disease. We found strong support for elevated blood glucose as a fundamental risk factor across groups with identified pre-conditions. We followed this review of the correlations between glucose control and the disease with a review of the mechanisms of action of glucose in the various steps of the infection. Our analysis showed how elevated blood glucose can impair the first level of innate immune defense in the lung and create ideal conditions for the virus to access, enter and replicate in target cells. It also revealed how elevated glucose can facilitate the development of multiple complications of the disease such as hyperinflammation and pro-coagulation. A case for impaired glucose metabolism as a common pre-condition for the severity of COVID-19 becomes even more compelling when the data published in the CORD-19 dataset is combined with established knowledge of glucose biochemistry, metabolism, and homeostasis, and with the role of glucose in related pathologies.
We used the knowledge graphs to find the stronger and more consistent claims, tested the feasibility of the parameters reported in multiple articles using computational modeling, and attempted to obtain an atom-level realistic view of the virus and some of the compounds it interacts with. We conclude that the literature strongly supports a case for compromised glucose metabolism that causes elevation of glucose levels in extracellular fluids, blood and tissue as a single pathology that can facilitate virtually every step in the life-cycle of SARS-CoV-2, and that induced elevations of glucose by stress during hospitalization, treatment drugs, and in intravenous infusions can contribute to disease severity. Reduced glucose metabolic capacity could be a common pathology that can contribute to age-dependency of the disease and can explain why the specific comorbidities render these groups more vulnerable to the infection. Subclinical pathology of glucose metabolism may also be one of the reasons why some young and apparently healthy people can contract a more severe form of the disease.
Elevated glucose naturally does not act alone. It acts in concert with numerous other pathophysiological pathways to facilitate the primary infection and replication of SARS-CoV-2. For example, the effects of elevated glucose can act synergistically with the virus's inactivation of the ACE2 receptor to drive a more severe form of COVID-19. Indeed, hyperglycemia or impaired glucose tolerance can cause multiple physiological disturbances that are all linked to the severity of the disease such as an impaired innate immune system, impaired lung epithelial resistance, subsequent hyperactivation and dysregulation of the immune system, increased vascular permeability, and a procoagulant state. Patients presenting with compromised glucose metabolic capacity struggle to contain and eliminate the virus and to prevent the progression of the infection and the occurrence of complications. Patients with high glucose metabolic capacity such as healthy people at any age and in particular young people, have primary lung defenses that are sufficient to contain and expel the virus before advanced disease sets in, slow the infection of and replication in cells, and lower the risk of fatal complications. The effects of elevated blood glucose can also act synergistically if lung epithelial tight junctions are compromised causing a positive feedback in the increase of glucose in the ASL, a disruption of glucose homeostasis and a subsequent breakdown of the lung primary defenses. Our computational model also illustrates the critical importance of the viral load in determining disease severity since low doses of the virus can escape the primary defenses if glucose metabolism is compromised, and high doses of the virus can overcome the intact immune defenses of even the most healthy.
We could not find another biological variable, other than glucose metabolism, in the literature that could better explain why the disease is more severe in some than in others. Young people do have lower levels of ACE2 receptors and TMPRSS2 in their lower airways (424), but the literature is inconclusive as to whether the level of ACE2 receptors can underlie all the differences reported in the severity of COVID-19 (186,187). However, these apparently inconsistent results may be still be because of these studies could not adjust their results for the integrity of the primary lung defenses, which is what determines the viruses access to most of the ACE2 receptors in the alveolar cells. The virus has found an ACE2 receptor-independent pathway to enter cells so (425), hence, while the levels of this receptor remain an important variable, it is unlikely to be able to explain the variance in disease severity as comprehensively as the level of glucose control.
The importance of elevated glucose levels compromising the very first defense of the lung as the key barrier to contain the virus and prevent an avalanche of infection and complications, is underappreciated in the CORD-19 database. Similarly, the potential importance of elevated glucose levels acting to provide the virus ideal conditions to coat the spike protein with glycans that can confer its pathogenicity and immune evasion is underappreciated. It required piecing together numerous lines of evidence across many sources to reveal how elevated glucose could be involved in the complications of the disease, such as driving the immune response into a cytokine storm and participating in the dysregulation of coagulation and thrombotic features. Management of COVID-19 to some extent considers management of glucose as an important component, but if this hypothesis is correct, glucose management may need to become a central strategy.
Tight control and management of glycemia in COVID-19 patients may be critical in order to lower the first phase of infection and decrease the escalation of the disease. Managing glycemia in ICUs, where more than 80% of the patients were reported to present with hyperglycemia, also seems critically important. Even if glycemia is checked during hospital admission, only FPG of more than 10 mmol/L is considered serious (118), while even a moderate increase in FPG could be a risk factor. Furthermore, FPG reflects mostly the resting glucose levels and may not reveal sufficiently abnormal glucose metabolic capacity to clear glucose. Impaired glucose tolerance (IGT) should therefore be specifically tested, but the OGGT, the usual gold standard test for IGT involves ingestion of a large bolus of glucose, could drive the progression of the disease. HbA1c measurements may serve as an alternative biomarker for IGT, and insulin levels could be more systematically measured to detect undiagnosed diabetes, pre-diabetes, or insulin resistance. It is also necessary to consider the practicalities whether such tests are feasible in general hospitals around the world and to consider alternative markers such as fructosamine. Nevertheless, the monitoring of glycemia during the course of the disease should at least be as important as the monitoring of the more common biomarkers such as IL-6, CRP, D-dimer, or ferritin. Interventions to control glycemia should seriously be considered. Ang II and serum SP-D could be additional biomarkers to assess the risk of complications such as the cytokine storm and disseminated intravascular coagulation, and could also help in estimating the time course of the disease.
Even at a late stage of the pandemic, approaches to detect and manage abnormal glucose metabolism and administer appropriate glucose-lowering drugs or diets, are indicated to help weaken the infection. Metformin, an old, safe, and FDA approved drug, is an interesting glucose management drug that also has multiple other effects that could be beneficial in the management of COVID-19. Metformin not only reduces blood glucose levels and clearance following a bolus of glucose, but also has anti-inflammatory properties as well as cardio-vascular protective effects (i.e., anti-thrombotic). Patients with diabetes on metformin seem to be at lower risk of severe disease, but studies on the potential beneficial effects in healthy and diverse groups and in groups presenting with the other comorbidities of COVID-19 are lacking. However, due to high risk of lactic acidosis, metformin is not recommended in critically ill patients, especially those at risk of ARDS.
The literature makes a strong case for using a ketogenic diet (KD) in the management of COVID-19, but there are challenges to clinical implementation. It is contra-indicated for some groups such as those with type 1 DM (410,411). The diet is difficult to set-up properly to ensure nutritional, electrolyte and fluid balance. Beneficial effects of this diet, that has been found in the management of other diseases, are also usually expected over the long-term (390). The time needed for the body to enter ketosis also varies for different groups and the reasons are unclear. In addition, a transition to ketosis is often associated with flu-like symptoms (keto flu) (426), which may interfere with the innate immune response to the virus. Hence, reducing glucose in the diet may be an interesting approach as a potential preventive measure to reduce the risk of developing severe symptoms. However, implementing such an intervention for severely ill patients in critical care in a pandemic, often with non-specialist staff, is not without potentially serious risk to the patient.
We provide a machine learning's view of the role of glucose in the severity of COVID-19, but to what extent can we trust the machine-generated view? In its current state, the machinegenerated output has important flaws as mentioned in the introduction. We attempted to moderate the output to mitigate some of these flaws, but our interpretation of the output also has potential flaws. This is especially true given the wide range of disciplines covered, which the multiple authors may not necessarily cover. Indeed, the reviewers of this article added significantly to this review by pointing out cases where the machine-generated output and the human translation failed.
An example of how the machine learning models can fail was provided by the reviewers. In our guidance that hospitals should test for glycated hemoglobin, the reviewer pointed out that that may not be practical in the general hospital setting, especially during the rush of a pandemic, and suggested measuring fructosamine levels. This is clearly knowledge that the models could not extract and the human missed.
The reviewers also pointed out that GRP78 (glucose-regulated protein 78) and CD147 are two proteins reported to interact with the S-protein and are alternative receptors or co-receptors for SARS-CoV-2. This omission is serious because GRP78 is induced by glycemic stress and Cd147 is upregulated by high glucose levels and by AGEs. Additionally, blockade of both GRP78 and Cd147 is shown to inhibit SARS-CoV-2 viral entry and replication. In response to the reviewers, we examined why the model missed this concept. We found that CD147 and GRP78 were in fact mentioned 307 times and 438 times, respectively, in the entire CORD-19v47 dataset before any filtering and that they were included in the knowledge graph containing the 10,000 most frequent entities (a minimum included frequency was 119 mentions). They are therefore well-represented in the CORD-19 literature, but they didn't survive the "community vote" performed by the mutual information algorithm (see section Methods) when examining a subset of the article in the context of the role of glucose in COVID-19. This suggest that while they are crucial forefront concepts, that they had not yet reached a significant level of appreciation in the research community in the context of the role of glucose in COVID-19 (mentioned only 46 times and 34 times, respectively, after filtering the articles for glucose in COVID-19). These concepts therefore did not make it as part of the top-most concepts containing the most frequent 1,500 entities (a minimum included frequency was 67 mentions). We do need to keep in mind that the CORD-19 dataset has expanded enormously since our analysis, and the potential importance of GRP78 and CD147 in the role of glucose in COVID-19 may have already changed.
Another example was the missing of the entity HMGB1, another ligand for RAGE, the Receptor for Advanced Glycation End-products, which has been shown to be involved in the inflammatory response of the lung (see section Elevated Glucose Leads to Increased Production of AGEs). This is also actually a very important omission of the model that was detected by an expert human (the reviewer), because expression of HMGB1 is enhanced by hyperglycemia, in conditions such as obesity, systemic inflammation and diabetes. In addition, HMGB1 levels are elevated in COVID-19 patients and therefore could be a novel biomarker in managing COVID-19. The HMGB1 was not discussed because it was only mentioned 111 times in the entire CORD-19v47 dataset, and only 8 times in the 3,000 articles after filtering the articles for glucose in COVID- 19. Surprisingly, a related entity HMGB2 was detected after this filter (74 mentions in the 3,000 papers), but in this case the human authors failed to find a link with COVID-19 and glucose during the manual review. This illustrates the shortcomings of the machine-generated output and the human translation of the output that should be taken into account to mitigate the weaknesses when building future incarnations of this assistive technology. For example, when we manually imposed the inclusion of these entities in the knowledge graph, then both GRP78 and CD147 do appear in the context of SARS-CoV-2 receptors, and HMGB1 appears as a ligand for RAGE as expected (see Supplementary Figure 8). Thus, with some "normalization, " the model can represent these concepts in an appropriate manner.
As already pointed out, the model can also not yet detect concepts that are more common expert knowledge and less objectively reported science, such as the practicalities of HBA1c measurement, routine insulin dosage or dietary adaptations, or the fact that these interventions require specialized staff.
These examples illustrate the barriers that future incarnations of such a machine-driven framework would need to overcome to become stand-alone reliable assistant to reviewing of the scientific literature. Future incarnations could also have rigorous processes in place to ensure that some concepts are not over-represented for reasons other than scientific merit, such as funding, political and research bias. It could have safe guards in place to ensure that forefront research is not filtered out as illustrated above. Finally, such a machine learning framework can be vastly improved by qualifying the kinds of associations between concepts and achieving deeper natural language understanding and reasoning capabilities.
In conclusion, we present a powerful initial approach to reviewing the COVID-19 literature using machine learning, natural language processing and knowledge graph technologies. At this stage, the model still requires significant human curation, including from the external reviewers of this article. Despite the short-comings of the approach, it is today the only way to obtain a community view of the contents of such a vast literature dataset and it did allow us to extract a powerful community consensus on the role of glucose in the infection that has far reaching potential implications for this in future pandemics. The result is overwhelming collective evidence that elevated blood glucose, arising from clinically or subclinical pathology in glucose metabolism or from induced hyperglycemia due to hospitalization, drug treatments and intravenous infusions in ICU should be considered as a biomarker that correlates with and hence is predictive of severity of COVID-19, as well as evidence that elevated glucose can cause an acceleration of virtually every step of the SARS-CoV2 infection. Rigorous clinical studies are therefore called for to determine whether elevated glucose is in fact the predominant underlying driver of disease severity.  (87). For our analysis, we used the CORD-19 v47, released on February 11th 2020, including over 240,000 articles, with over 100,000 being full-text.

References cited in the methods can be found in
Named Entity Recognition (NER): Our text mining pipeline for extracting information from the CORD-19 database starts by using machine learning models for named entity recognition (NER). These models are based on scispaCy models (427) that we fine-tuned on a manually annotated subset of sentences from the CORD-19 dataset in order to recognize nine custom entity types of interest: "cell compartment, " "cell type, " "chemical, " "symptom/disease, " "drug, " "organ/system, " "organism, " "biological process/pathway, " and "protein. " In order to create the dataset to train and test our NER models, two scientific experts used the annotation Prodigy (https://prodi. gy/) to label a total of 1,355 sentences containing mentions of each of the nine entity types of interest. In particular, the human experts located all the text spans in the sentences that correspond to a given entity, and then classified such spans with respect to their correct entity type. Then, for each entity type of interest, we evaluated if a pre-trained scispaCy model supporting a similar entity type was available. If such a model existed (e.g., for entity type "organism"), we fine-tuned its weights on our annotated sentences. On the other side, if such a model could not be found (e.g., for "pathway"), we used our annotated sentences to train the model from scratch.
Entity linking: Extracted words in articles do not necessarily correspond to unique entities. In standard text search, this leads to ambiguity because an entity may be spelled differently in different articles or even within the same article. Entity linking addresses this problem by resolving extracted entities to unique identifiers taken from a knowledge base while taking into account lexical variations as well as synonyms, aliases, and acronyms. The resolved identifier can therefore be used to unambiguously reference an entity on subsequent text mining and knowledge graph tasks.
In this study, the National Cancer Institute Thesaurus (NCIt) ontology (428) was used as the knowledge base to which extracted entities were linked. Such linking also gave us access to the semantics of the entities with their human-readable definitions and hierarchically structured semantic types (e.g., the entity angiotensin-1 is a AGT gene product, which is a subtype of peptide hormone, which is a protein). The entity types obtained during the extraction phase were therefore enriched with ontology types which allowed labeling of the resulting concepts into nine unique entity types summarized in the table below: 2) Query-based literature search Amongst the hundreds of thousands of articles contained in the CORD-19 database, we wanted to be able to focus our analysis on subsets of publications. An information retrieval tool for literature search was therefore implemented, allowing users to query the CORD-19 database both for various simple filtering criteria (e.g., "publication date" or "journal") and for relevance with respect to a given query.
Our information retrieval model is based on a two phases approach. First, for a given query, we computed a relevance score for each of the sentences in the CORD-19 database. Then, we ranked these sentences with respect to their score and returned the sorted results to the user.
In order to compute the relevance score, we used a machine learning model based on BioBERT (429) and fine-tuned on the CORD-19 corpus in order to produce sentence embeddings vectors. For any given sentence, its embedding vector encodes its semantic information, so that semantically similar sentences are mathematically represented by similar vectors. Then, we use the cosine distance to compute similarity between pairs of sentences embedding vectors, which enables us to compute the relevance score of each sentence in the CORD-19 as the similarity between its embedding and the embedding of the query. The criteria set for the search in the current study are the following: -Query: "Glucose as a risk factor for COVID-19" -Granularity: "Articles" -Number of top articles: "3,000" -Date range: "2000-2020" -Journal Type: "All."

3) Knowledge graph construction method
We examined co-mentions of entity pairs on an articleand paragraph-level, adding an edge between a pair of entities if they co-occur in at least one article. We then assigned several weight metrics to each co-mention edge including raw co-mention frequency, positive pointwise mutual information (PPMI) and normalized pointwise mutual information (NPMI) (430) calculated based on the co-occurrence in the same articles and the same paragraphs. In this context, the presence of an edge between a pair of entities can be interpreted as the presence of some association between the pair of concepts they represent, and the corresponding edge weights quantify the strength of such association. To assign weights on the nodes of the constructed graph that would reflect the importance of entities in the extracted dataset, we have computed nodes' weighted degree centrality using the previously described edge weights (given by raw frequency, PPMI and NMPI).
-CORD-19 knowledge graphs Following the methods described above, we constructed two knowledge graphs. The first knowledge graph is based on the entire CORD-19 dataset. Out of more than 400,000 entities extracted and linked during data preparation, the 10,000 most frequently mentioned entities were selected and used as nodes of the knowledge graph. The constructed graph is very dense containing over 44 million edges (resulting in a density of 0.87, i.e., 87% of all possible pairs of co-mentions) out of which 12 million edges have non-zero paragraph-level cooccurrence (making the density of the paragraph-based comention network 0.25).
The second knowledge graph was built using the results of the query-based literature search (see section Results). Out of more than 20,000 extracted entities, the 1,500 most frequently mentioned entities were selected and used as nodes of the knowledge graph. Only the edges that correspond to the nonzero paragraph-level co-occurrence were considered. As a result, we generated ∼700,000 edges, giving a total density of 0.62.

4) Community detection
To partition the knowledge graph constructed based on the entire CORD-19 dataset into clusters of strongly associated entities, we performed node community detection using the Louvain algorithm (431) on paragraph-level co-occurrences. A community represents a cluster of nodes whose connections are stronger within the community than with the rest of the network. The algorithm detected five different communities of entities. Having examined the most frequent entities in each community, we have mapped the communities to the following five topics: biology of viruses, diseases and symptoms, immune response, infectious disorders, chemical compounds (Supplementary Figure 3). Note that despite the fact that our network is given by a highly dense graph, such community detection is still possible when taking into account edge weights (in our case, the NPMI values of the co-occurrence edges). The resulting partition gives us a modularity value of 0.21.

5) Minimum spanning tree analysis
To sparsify our knowledge graph and gain insight into the most important and relevant associations between entities, we computed the minimum weight spanning tree with the weight being assigned to the inverse of the edge NPMI based on paragraph-level co-occurrence (therefore, a higher pointwise mutual information between entities implies a smaller distance). The minimum spanning trees on Figure 4 and Supplementary Figure 4 were generated from the knowledge graph built using the results of the query-based literature search.

6) Best mutual information pathways search
To gain insight on the most important and informative sets of entities relating "glucose" and "COVID-19" or "SARS-COV-2" in the literature, we have used the approach called "best mutual information pathways" (BMIPs). Such pathways are constructed using the shortest weighted paths from the source to the target entity (e.g., from "glucose" to "COVID-19" or "SARS-COV-2"), where the weight corresponds to the articlelevel NPMI (i.e., the higher NPMI associated with an edge, the smaller the "distance" between the corresponding source and target entities).
To be able to navigate and explore the literature, we aimed to identify concepts that relate a source to a target entity. Therefore, we focused on finding a set of such shortest paths, rather than a single shortest path. Classical algorithms exist to find the N shortest paths between two nodes in a weighted graph (such as (432)). However, due to the density of our graph, such algorithms perform poorly in terms of execution time. For the same reason it is extremely rare that one of the shortest paths between two nodes consists of more than two hops. We therefore adopted a naïve strategy that exploits this property. We first find all the indirect shortest paths (discarding the direct edge from the source to the target) in terms of number of hops ignoring the edge weight (usually such paths consist of two hops, which greatly reduces the space of all possible paths). Then, the algorithm ranks these paths by computing the cumulative distance score and chooses the ones with the largest score. Such distance scores are simply given by the sum of the inverse of the NPMI associated with the path's edges. Finally, to further explore the space of co-mentioned entities in depth, we can run the path search procedure in a nested manner. For each edge encountered on a path e 1 , e 2 , . . . , e n from e 1 to e n , we further expanded it into n shortest paths between each pair of successive entities (i.e., paths between e 1 and e 2 , e 2 , and e 3 , etc.). For example, the graph in Figure 3 is obtained by aggregating the nodes and the edges encountered by, first, searching for the 20 BMIPs and, second, by further expanding each of the encountered edges into their five BMIPs.

7) Knowledge graphs visualization
We have developed graphical interfaces that allow performing interactive entity curation, exploration and analysis of cooccurrence graphs, that were used to produce graphs and BMPIs in the figures. Semantic issues that were not identified by the ontology linking process were fixed during this manual curation process. Then, all knowledge graphs-related figures were produced using the Gephi software (https://gephi.org/).
The sizes of the nodes are proportional to their weighted degree centralities. The color of the nodes corresponds to different entity types (except for Supplementary Figure 3 where colors correspond to different communities). The thickness values of the edges correspond to NPMI values, while their length is arbitrary.

8) Literature review
The previously mentioned graphical interfaces allowed us to explore the knowledge graph and to examine the CORD-19 articles supporting different entity co-occurrences (for example, "glucose" and "CRP"). This first set of automatically selected articles guided us to then further perform a second step of classical literature review and considered other sources, not necessarily included in CORD-19.

9) Source code
The source code for the pipeline stages relative to Semantic Literature Search and Named Entity Recognition can be found here: https://github.com/BlueBrain/Search/tree/v0.1.0.
The ML models, and the data used to train them, can be found here: https://zenodo.org/record/4589007. The code, data, and instructions to reproduce the entity linking, knowledge graphs generation and analysis can be accessed from: https://github.com/BlueBrain/BlueBrainGraph/ tree/master/cord19kg.

B) GLUCOSE-DEPENDANT SARS-CoV-2 INFECTION COMPUTATIONAL MODELS
The computational model of the glucose-dependence of SARS-CoV-2 infectivity in alveolar epithelial cells was deterministic and written in the MATLAB simulation language (https://www.mathworks.com/), requiring 64 bit version R2016a or later running on an i5 CPU or equivalent or better with minimum 8 GB RAM. The implementation of the model was based on the parameters and governing equations provided below.

C) BIOEXPLORER DESIGN AND IMPLEMENTATION
The Blue Brain BioExplorer (BBBE) application is built on top of Brayns (https://github.com/BlueBrain/Brayns), the BBP rendering platform. The role of the application is to use the underlying technical capabilities of the rendering platform to create large scale and accurate 3D scenes from Jupyter notebooks.
The viral and host cell membranes were generated from phospholipids structures created following the process described in the VMD (https://www.ks.uiuc.edu/Research/ vmd/) Membrane Proteins Tutorial (http://www.ks.uiuc.edu/ Training/Tutorials/). Then, an assembly of phospholipids elements is generated by the BioExplorer, with a given shape, and a given number of instances of phospholipids.
Molecular components: Dimension and PDB-ID of each molecular component represented in Figures 11, 12, 16, 19 and in the Movie are described in the panel below: The spike in closed conformation was self-generated using "I-Tasser Protein structure prediction" and "Modeler" from Uniprot ID P0DTC2. The model for ACE2 corresponds to ACE2/B • AT1 complex (14). The same models were used to generate surfactant protein-D and surfactant protein-A, only the conformation was different. Dimension and conformation of SP-D and SP-A were obtained from literature (220,442).

3) Glycans a. Formula
Glycan trees are retrieved from Glycam Builder (http:// glycam.org/Pre-builtLibraries.jsp) with the corresponding formula for each of the four types of N-glycans (No diversity was included in the present model).

b. Types and positions
The glycan-type (HM, complex, or hybrid) with the highest representation reported for each specific site was considered, hence no microheterogeneity is included in our representation.

DATA AVAILABILITY STATEMENT
The links to access the simulation codes used to generate the datasets presented in this study are provided in the method section, and can be run using the instructions in the "README.md" files. All simulation codes are publicly open sourced.