Complex Dietary Topologies in Non-alcoholic Fatty Liver Disease: A Network Science Analysis

Background and Aims: Previous studies have explored the associations between nutrition (food groups, nutrients, and dietary patterns) and the prevalence of non-alcoholic fatty liver disease. However, it remains unclear whether the way foods are consumed together is associated with non-alcoholic fatty liver disease. The present study aims to construct dietary networks using network science and to explore the associations between complex dietary networks and non-alcoholic fatty liver disease. Methods: The present case-control study generated 2,043 multivariate matched controls for 2,043 newly diagnosed non-alcoholic fatty liver disease cases. Mutual information, which represents both linear and non-linear dependencies among food groups, was used to construct the network topologies. Results: The dietary topologies of the case and control groups differed even though only a few food groups showed differences in absolute intake. The dietary structure of the case group centered on two major components with more cohesion among food groups, whereas the control group had one major component with a higher diversity of food groups. The dietary topology of the case group showed equality in connections among beneficial and detrimental food groups, whereas the control group focused more on healthier food choices. Conclusions: This study suggests that how foods are consumed, in addition to the absolute intake, could be an important determinant of the occurrence of non-alcoholic fatty liver disease. A diverse diet that focuses on whole grain, tubers, and vegetables could yield beneficial effects regarding non-alcoholic fatty liver disease. Network science could offer a complementary tool in nutritional epidemiology.


INTRODUCTION
Non-alcoholic fatty liver disease (NAFLD) develops without alcohol abuse. It is defined as the presence of at least 5% hepatic steatosis without evidence of hepatocellular injury in the form of hepatocyte ballooning (1). According to a meta-analysis conducted in 2016, 25% of the global adult population was affected by NAFLD (2). NAFLD not only follows a potentially progressive course that can lead to liver fibrosis, cirrhosis, hepatocellular carcinoma, and liver transplantation (3) but is also associated with other non-communicable diseases, such as type 2 diabetes (4) and cardiovascular diseases (5). Considering the increasing disease burden of NAFLD, it is important to identify risk factors and develop appropriate treatment strategies. Lifestyle interventions, particularly a healthy diet, have been recognized as effective treatments in early to advanced stages of NAFLD (1).
Previous studies have explored the associations between NAFLD and the intake of nutrients and food items, such as mushrooms (6), yogurts (7), raw garlic (8), nuts (9), oranges (10), soft drinks (11), dietary fibers (12), and fructose (13). Moreover, dietary patterns, which encompass the effects of the overall diet and closely parallel the real-world situation (14), have also been shown to be associated with the prevalence of NAFLD (15-22). In fact, our previous study showed that higher carbohydrate/sweet pattern scores are associated with a higher prevalence of NAFLD among females (16). Another study demonstrated that greater adherence to a healthy dietary pattern (characterized by higher intake of fruits, vegetables/legumes, white meats, olive oil, margarine, and bread/toast) is associated with a lower prevalence of NAFLD (20). It should be noted, however, that all the dietary patterns assessed in these studies were derived under the assumption that the associations among the intakes of food items are linear. For example, factor analysis reduces data into patterns that explain the maximum variation in food intake based on linear inter-correlations between dietary items (23). Reduced rank regression, in turn, identifies linear functions of food groups that explain as much variation as possible in a set of intermediate response variables (24). However, the associations between food item intakes could be non-linear. A recent study using a network science approach derived dietary patterns that reflect the complex interconnectedness of food intakes, and explored the associations between these dietary patterns and dementia (25). Networks are data-based mathematical models of complex systems that can identify both linear and non-linear associations and explore complex dynamics (26). Compared with traditional statistical methods used in the derivation of dietary patterns, network science can help discover the potential role of food groups in overall dietary patterns and provide new insight into the complexity, particularly the non-linearity, of dietary patterns (25). To the best of our knowledge, no study has explored the association between NAFLD and dietary patterns constructed based on non-linear associations among food groups. Moreover, no study has investigated the differences in comprehensive interactions among food groups between NAFLD patients and their controls, referred to as the case group and the control group, respectively. These are the topics and main contributions of the present study, which uses a case-control design to explore the differences in dietary pattern structures between NAFLD patients and their controls using network science tools.
Abbreviations: BMI, Body mass index; FFQ, Food frequency questionnaire; MI, Mutual information; NAFLD, Non-alcoholic fatty liver disease; WDR, Weighed diet records.

Participants
The present case-control study was conducted based on the Tianjin Chronic Low-grade Systemic Inflammation and Health (TCLSIHealth) Cohort Study, a large prospective dynamic cohort study focusing on the associations between chronic low-grade systemic inflammation and the healthy status of a population living in Tianjin, China (15,16). Participants were recruited when they were taking annual health examinations at the Tianjin Medical University General Hospital-Health Management Center and some other community management centers in Tianjin.
Participants with missing variables or those with implausible energy intakes (≤400 or ≥10,000 kcal/day) were excluded (n = 1,521) during data cleaning. After this step, 23,063 participants without acute inflammatory disease had completed comprehensive health examinations and answered questionnaires between May 2013 and December 2016. We excluded participants who had changed their lifestyles (e.g., diet, drinking, smoking, physical activity, and sleeping) in the last 5 years (n = 5,883) or those with a history of cardiovascular diseases (n = 1,052) or cancer (n = 197). We also excluded participants who had a history of NAFLD (n = 2,463). As a result, the final population comprised 13,468 participants (3,008 cases with newly diagnosed NAFLD and 10,460 controls) for propensity score matching (Figure 1). The study protocol was approved by the Institutional Review Board of the Tianjin Medical University. All participants provided written informed consent prior to enrolment in the study.

Propensity Score Matching
Propensity scores were calculated for all participants using a logistic regression model with the following covariates: sex, age, body mass index (BMI), physical activity, energy intake, education level, household income, smoking status, drinking status, employment status, metabolic syndrome status, and family history of cardiovascular disease, hypertension, and diabetes. Using these propensity scores, cases were individually matched to controls using nearest neighbor matching within a caliper distance. Nearest neighbor matching selects, for each case, the control subject whose propensity score is closest to that of the case. The caliper imposes a further restriction: the absolute difference in the propensity scores of matched subjects must be below a prespecified threshold (the caliper distance) (27). Participants for whom no match could be found within the caliper distance were therefore excluded from further analysis. As suggested by Austin (27), a caliper width equal to 0.2 of the standard deviation of the logit of the propensity score was used, because this value minimizes the mean squared error of the estimated treatment effects in several scenarios. To better match cases and controls, we used 1:1 ratio matching. Cases that could not be matched to any controls were discarded. Finally, 2,043 cases and 2,043 controls were generated using this propensity score matching method (Figure 1).
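The matching rule described above can be illustrated with a minimal Python sketch. It is not the procedure actually run in this study (which was performed in SAS): it assumes the propensity scores have already been estimated, uses a greedy pass over cases in their given order, matches without replacement, and computes the caliper from the standard deviation of the logit across all participants. The function name `caliper_match` and these choices are illustrative assumptions.

```python
import numpy as np

def caliper_match(ps_cases, ps_controls, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbor matching on the logit of the propensity score.

    Assumptions (not stated in the text): cases are processed in input order,
    controls are used at most once, and the caliper is caliper_sd times the
    sample SD of the logit over all participants.
    """
    logit = lambda p: np.log(p / (1.0 - p))
    logit_cases = logit(np.asarray(ps_cases, dtype=float))
    logit_controls = logit(np.asarray(ps_controls, dtype=float))
    caliper = caliper_sd * np.std(
        np.concatenate([logit_cases, logit_controls]), ddof=1
    )

    available = np.ones(len(logit_controls), dtype=bool)
    pairs = []
    for i, value in enumerate(logit_cases):
        if not available.any():
            break
        # Distance to every unused control; used controls are masked out.
        dist = np.where(available, np.abs(logit_controls - value), np.inf)
        j = int(np.argmin(dist))
        if dist[j] <= caliper:  # keep the pair only if it is within the caliper
            pairs.append((i, j))
            available[j] = False
        # Otherwise the case stays unmatched and is discarded.
    return pairs, caliper
```

Each returned pair holds the index of a case and the index of its matched control; cases without a control inside the caliper simply do not appear in the list.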

Assessment of Dietary Intake
Dietary intake was assessed using a modified version of the food frequency questionnaire (FFQ) that includes 100 food items [the initial version of the FFQ included only 81 food items (16)] with specified serving sizes. The FFQ includes seven frequency categories ranging from "almost never eat" to "twice or more per day" for foods and eight frequency categories ranging from "almost never drink" to "four or more times per day" for beverages. The reproducibility and validity of the questionnaire were assessed in a random sample of 150 participants from our cohort using data from repeated measurements of the FFQ ∼3 months apart and 4-d weighed diet records (WDR). The Spearman rank correlation coefficient for energy intake between the two FFQs was 0.68 (P < 0.05). The correlation coefficients for food items (i.e., fruits, vegetables, fish, meat, and beverages) between the two FFQs ranged from 0.62 to 0.79 (all P < 0.05). Meanwhile, the Spearman rank correlation coefficient for energy intake between the WDR and the FFQ was 0.49 (P < 0.05). Correlation coefficients for nutrients (i.e., vitamin C, vitamin E, polyunsaturated fatty acids, saturated fatty acids, carbohydrates, and calcium) between the WDR and the FFQ ranged from 0.35 to 0.54 before and from 0.39 to 0.72 after adjustment for energy intake (all P < 0.05). The mean daily intakes of nutrients were calculated using an ad-hoc computer program developed to analyse the questionnaire responses. Consumption of each food item was calculated by multiplying the portion size (g/time) by the frequency with which the item was consumed per day. Furthermore, Chinese food composition tables (28) were used as the nutrient database to calculate nutrient intakes. Nutrient intake was calculated by first multiplying the amount (in grams) consumed of each food item by its nutrient content per gram and then summing the nutrient contributions across all food items.
Similar food items were further collapsed into 25 food groups based on the characteristics of food items for network science analyses.
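As a minimal sketch of the arithmetic described above (portion size times daily frequency, nutrient content summed across items, and items collapsed into groups), the following Python functions may help. The function names, the item-to-group mapping, and any numeric values passed to them are hypothetical illustrations; they are not taken from the FFQ or from the Chinese food composition tables.

```python
def daily_intake_g(portion_g, times_per_day):
    """Daily consumption of one food item: portion size (g/time) x frequency (times/day)."""
    return portion_g * times_per_day

def nutrient_intake(intakes_g, nutrient_per_g):
    """Total intake of one nutrient: sum over items of grams consumed x content per gram."""
    return sum(grams * nutrient_per_g[item] for item, grams in intakes_g.items())

def collapse_to_groups(intakes_g, item_to_group):
    """Sum item-level intakes (g/day) into food groups using a caller-supplied mapping."""
    groups = {}
    for item, grams in intakes_g.items():
        group = item_to_group[item]
        groups[group] = groups.get(group, 0.0) + grams
    return groups
```

For instance, an item eaten twice per day in 150 g portions contributes 300 g/day, and its nutrient contribution is that amount multiplied by the per-gram content looked up for that item.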

Liver Ultrasonography and Definitions of NAFLD
Liver ultrasonography was performed by trained sonographers using a TOSHIBA SSA-660A ultrasound machine (Toshiba, Tokyo, Japan) with a 2-5 MHz curved array probe. According to the revised definition and treatment guidelines for NAFLD put forth by the Chinese Association for the Study of Liver Disease in 2010 (29), we defined "heavy drinking" as >140 g alcohol intake per week in men and >70 g per week in women. Total alcohol intake in the past week was assessed using the FFQ. Participants were diagnosed as having NAFLD if abdominal ultrasonography showed fatty liver (evidenced by brightness of the liver and a diffusely echogenic change in the liver parenchyma) and they had no history of heavy drinking. Participants with a history of self-reported or previously diagnosed NAFLD were excluded from the present study. Thus, all participants with NAFLD in the present study were newly diagnosed cases.

Assessment and Definition of Matching Variables
Sociodemographic variables (including sex, age, education, employment status, smoking status, drinking status, and household income) were also assessed using the questionnaire. The educational level was assessed by asking the question "what is the highest degree you earned?," which was divided into two categories: <college graduate or ≥college graduate. Employment statuses were classified as either senior officials and managers or professionals. Information on smoking status ("never," "former," and "current smoking") and drinking status ("never," "former," "current drinking everyday," and "current drinking sometime") among the participants was obtained from the questionnaire survey. Physical activity in the most recent week was assessed using the short form of the International Physical Activity Questionnaire (IPAQ) (30). BMI (in kg/m²) was calculated by dividing the weight (in kilograms) by the square of the height (in meters). Waist circumference was measured at the umbilical level with participants standing and breathing normally. The blood pressure was measured twice in the left upper arm using a TM-2655P automatic monitor (A&D Co., Tokyo, Japan) in a seated position, with a 5-min rest in between. The mean of these two measurements was taken as the blood pressure value.
Fasting blood samples were obtained via venepuncture of the cubital vein and immediately mixed with ethylenediaminetetraacetic acid. Fasting blood glucose concentrations were measured using the glucose oxidase method, triglyceride levels were measured using enzymatic methods, and high-density lipoprotein cholesterol levels were measured using the chemical precipitation method with reagents from Roche Diagnostics GmbH (Mannheim, Germany) on an automatic biochemistry analyser (Roche Cobas 8000 modular analyzer). Finally, metabolic syndrome was defined in accordance with the criteria of the American Heart Association scientific statement of 2009 (31).

Statistical Analysis
The networks of dietary patterns among NAFLD patients and controls were built using mutual information (MI), which was used to infer the associations among food groups. MI measures the information shared by two discrete random variables, that is, how much knowing one of these variables reduces the uncertainty about the other (32). It quantifies the amount of information obtained about one random variable X through the other random variable Y by determining how similar the joint distribution p(x, y) is to the product of the factored marginal distributions, p(x)p(y) (25):
$$I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$$
The MI is non-negative and symmetric in X and Y. The MI is zero when X is independent of Y. Compared with traditional correlation measures, which capture only linear dependence, the MI contains information about both linear and non-linear dependencies (33). First, we computed the MI matrix for cases and controls with the Miller-Madow estimator (34) using the build.mim function in the minet R package (35). As suggested by Meyer (35), considering that the intakes of food groups were continuous variables, we partitioned the intake of each food group into subintervals with equal frequencies, called bins. The number of bins used for discretisation is set by default to √m, where m is the number of samples (36). The MI matrices for the case and control groups are presented in Supplementary Figures 1 and 2, respectively.
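To make the estimation step concrete, the following Python sketch discretises two continuous variables into equal-frequency bins and computes MI as H(X) + H(Y) - H(X, Y), with each entropy corrected by the Miller-Madow term (k - 1)/(2n), where k is the number of non-empty bins and n the number of samples. It is an illustrative reimplementation under stated assumptions, not the minet code: tied values are split across bins by rank order, the default number of bins is the rounded √m, and entropies are in nats. With the bias correction, small-sample estimates are not guaranteed to be non-negative.

```python
import numpy as np

def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins holding (nearly) equal counts.

    Assumption: ties are broken by position via a stable rank.
    """
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return (ranks * n_bins) // len(values)

def entropy_mm(labels):
    """Miller-Madow entropy in nats: plug-in entropy plus (k - 1) / (2 n)."""
    _, counts = np.unique(labels, return_counts=True)
    n = counts.sum()
    p = counts / n
    return float(-(p * np.log(p)).sum() + (len(counts) - 1) / (2 * n))

def mutual_information_mm(x, y, n_bins=None):
    """MI estimate between two continuous samples after equal-frequency binning."""
    x = np.asarray(x)
    y = np.asarray(y)
    if n_bins is None:
        n_bins = max(2, int(round(np.sqrt(len(x)))))  # sqrt(m) bins by default
    bx = equal_frequency_bins(x, n_bins)
    by = equal_frequency_bins(y, n_bins)
    joint = bx * n_bins + by  # unique label for every (bin_x, bin_y) cell
    return entropy_mm(bx) + entropy_mm(by) - entropy_mm(joint)
```

Applying `mutual_information_mm` to every pair of food-group intake columns would yield a symmetric MI matrix of the kind used as input to the network inference step.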
Second, the edge score for each pair of food groups in each network was inferred using the mrnet inference algorithm (37) in the minet R package. This function takes the MI matrix as an input and returns the adjusted MI values in the form of a weighted adjacency matrix of the network. Weights >0 can be interpreted as implying higher confidence associations (25).
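The idea behind this inference step can be sketched as follows. In the published description of MRNET, each food group is treated in turn as a target, and the other groups are selected greedily by a maximum relevance/minimum redundancy score: the MI with the target minus the mean MI with the groups already selected. Selection stops when the best score is no longer positive, and the final edge weight is the larger of the two directed scores. The Python function below is a simplified sketch of that idea; the name `mrnet_sketch` is hypothetical, and details of the minet implementation (such as tie handling or any rescaling of weights) are not reproduced.

```python
import numpy as np

def mrnet_sketch(mim):
    """Simplified MRNET-style edge scores from a symmetric MI matrix."""
    mim = np.asarray(mim, dtype=float)
    n = mim.shape[0]
    weights = np.zeros((n, n))
    for target in range(n):
        candidates = [j for j in range(n) if j != target]
        selected = []
        while candidates:
            scores = []
            for j in candidates:
                # Redundancy: mean MI between candidate j and already selected nodes.
                redundancy = np.mean([mim[j, k] for k in selected]) if selected else 0.0
                scores.append(mim[j, target] - redundancy)
            best = int(np.argmax(scores))
            if scores[best] <= 0:
                break  # no remaining candidate adds positive net information
            chosen = candidates.pop(best)
            selected.append(chosen)
            weights[target, chosen] = scores[best]
    # Undirected network: keep the larger of the two directed scores.
    return np.maximum(weights, weights.T)
```

In this sketch, a pair of nodes whose shared information is explained by a third, more strongly related node ends up with weight zero, which is how indirect associations are pruned from the network.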
For visualization, the food intake networks for the case and control groups were constructed with food groups as nodes and the associations between them as edges. Since the adjusted MI values were returned in the form of a weighted adjacency matrix of the network, the edge weights were taken directly from that matrix. The width of each edge was set proportional to the weight of the connection (for better interpretability, plots were limited to edges with inferred weight ≥0.30), and the node size was set proportional to the absolute intake of the corresponding food group. Node colors, from light to dark, were proportional to the strengths of the nodes.
The structural properties of the networks were described by both weighted degrees (namely, the strength) and hubs (similar to authorities in undirected networks). The strength was calculated by summing the weights of the edges adjacent to each node (38). The hub scores of the nodes were defined as the principal eigenvector of A × t(A), where A is the adjacency matrix of the graph and t(A) is its transpose (39). Compared with strength, which represents the direct association between each node and the others, a hub score describes the importance of a node considering both itself and all the nodes to which it is connected, and is computed via an iterative algorithm that maintains and updates numerical weights for each node. In summary, strength represents the direct interaction of each node with the others, while the hub score measures the importance of each node in the entire network. The differences in strength and hub for each food group between the case and control groups were calculated (by subtracting the values in the control group from those in the case group). All the above statistical analyses were performed using SAS version 9.4 for Windows (SAS Institute Inc., Cary, NC, USA) and the minet package in the R environment (version 4.0; R Development Core Team, Vienna, Austria). The topologies of networks were visualized using Gephi version 0.9.2 for Windows (www.gephi.org).
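The two node-level measures can be sketched in Python as follows. Strength is the row sum of the weighted adjacency matrix, and the hub score is taken from the eigenvector of A × t(A) belonging to the largest eigenvalue. Scaling the hub scores so that the largest equals 1 is an assumption made here for readability (a common software convention); the text does not state which scaling was used. The sketch also assumes that the largest eigenvalue is not repeated, which can fail for disconnected or bipartite networks.

```python
import numpy as np

def node_strength(adj):
    """Strength of each node: sum of the weights of its adjacent edges."""
    return np.asarray(adj, dtype=float).sum(axis=1)

def hub_scores(adj):
    """Hub scores: principal eigenvector of A x t(A), scaled so the maximum is 1.

    Assumption: the largest eigenvalue of A x t(A) is simple (not repeated).
    """
    a = np.asarray(adj, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(a @ a.T)  # eigenvalues in ascending order
    principal = np.abs(eigvecs[:, -1])          # eigenvector of the largest eigenvalue
    return principal / principal.max()

def case_minus_control(case_values, control_values):
    """Per-node difference, with control values subtracted from case values."""
    return np.asarray(case_values, dtype=float) - np.asarray(control_values, dtype=float)
```

Because the adjacency matrix of an undirected network is symmetric, A × t(A) equals A squared, so hub and authority scores coincide in this setting.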

Characteristics of Participants
The characteristics of participants before matching are presented in Supplementary Table 1. Among the 13,468 participants, 22.3% were classified as having newly diagnosed NAFLD. Participants with NAFLD were more likely to be men, older, current or former smokers, and current drinkers; they were also more likely to have metabolic syndrome and a family history of diabetes, had higher BMI, daily energy intake, alanine aminotransferase, aspartate aminotransferase, and γ-glutamyl transpeptidase levels, had lower education levels, and were less likely to be managers (all P < 0.05). The characteristics of participants (2,043 NAFLD cases and 2,043 controls) after matching are presented in Table 1. There were no significant differences in matching variables between the case and the control groups.
The average intakes of food groups according to NAFLD status are presented in Table 2. Participants with NAFLD showed higher intakes of fish, ice cream and candy, tea and tea beverages, and sugar-containing beverages but lower intakes of whole grain (all P < 0.05).

Network Topologies of Dietary Patterns
The network topologies of dietary patterns among cases and controls are presented in Figure 2 (case group, red; control group, blue). These network topologies show the connections between food groups and the entire structure of the dietary patterns in the case and control groups. There were five components in the case group and six in the control group. In the case group, two of these were major components. The core nodes of the first component, which had high strengths and may play central roles, were tubers, whole grain, ginger and garlic, fish, animal organs, meat, and processed meat. The core nodes of the second component comprised vegetables, legume and legume products, refined grain, and fruits. In contrast, there was only one major component in the control group, whose core nodes comprised tubers, whole grain, vegetables, and fruits. Moreover, there were two small clusters of food groups in both the case and control groups. The first contained Chinese cakes, western-style cakes and cookies, and ice cream and candy. The second contained fruit and vegetable juices, sugar-containing beverages, and coffee. Furthermore, there were more circles in the case topology than in the control topology. For example, tubers, processed meat, meat, and ginger and garlic together formed a closed circle in the case group. The circles in the case topology suggested that the dietary habits of cases were more cohesive in terms of connectivity, while the control structure showed equality among most food groups in terms of topology.

Differences in Networks Between Cases and Controls
We calculated the strengths and hubs for all food groups according to NAFLD status (Supplementary Table 2). The mean values of strengths in the case and the control groups were 2.32 and 2.40, respectively. The mean values of hubs in the case and the control groups were 0.37 and 0.41, respectively. Tubers yielded the highest strengths and hubs while eggs yielded the lowest values among both cases and controls.
The differences (case values minus control values) in the strengths and hubs of the food groups between cases and controls are presented in Figures 3 and 4, respectively. Overall, the control group had higher strengths and, particularly, hubs in most food groups. For strengths, which represent the direct interactions of each food group with others, ginger and garlic yielded the largest positive difference while nuts yielded the largest negative difference. For hubs, which represent the importance of each food group in the entire network, fish yielded the largest positive difference while vegetables yielded the largest negative difference.

DISCUSSION
To our knowledge, this study is the first to use network science tools to explore the differences in dietary topologies between patients with NAFLD and controls. The dietary network topologies were constructed using MI, which contains information about both linear and non-linear dependencies among food groups (33). Further, the dietary network topologies provided information not only on simple associations among food groups but also on comprehensive interactions in dietary intake habits among participants. The results suggest that the dietary structures differ between the case and the control groups. The dietary structure of the case group centers on two major components, whereas the control group has only one major component. In the case group, the dietary habits were more cohesive in terms of connectivity within each component, whereas the control structure showed equality among most food groups. Beyond their absolute intake, food groups play a role in the overall dietary structure, which in turn is associated with NAFLD.
Previous studies have explored the associations between absolute intake of single food groups or nutrients and NAFLD. For example, a previous study found that consumption of raw garlic was inversely associated with NAFLD among Chinese men (8). Another study suggested that a higher intake of insoluble dietary fiber is associated with a lower prevalence of newly diagnosed NAFLD (12). In recent years, some studies have focused on the overall effect of diet and explored the associations between dietary patterns and NAFLD (15-22). For example, our previous study showed that an animal food pattern was positively associated with the prevalence of NAFLD (15). However, in the above studies, the dietary pattern scores were calculated based on the absolute intake of food groups and their importance in dietary patterns. Thus, an association between a dietary pattern and NAFLD in fact implies that the sum of weighted absolute intakes of food groups is associated with NAFLD. However, no study has explored the associations between how we eat foods as a whole (as opposed to how much we eat) and the prevalence of NAFLD. Only one previous study applied network science to explore the associations between complex dietary behaviors and dementia (25). Its results suggested that how foods are consumed (and not only the quantity consumed) may be important for dementia prevention (25). In line with that study (25), we found that, compared with studies that focused on single food groups, nutrients, or dietary patterns derived using traditional methods, network science provided an additional layer of complexity in the associations between dietary intake and NAFLD. For example, the results suggest that tubers were core nodes in the network topologies of both cases and controls, and there was no statistical difference in tuber intake between the two groups. However, in the case group, tuber intake was directly associated with several nodes with nearly equal hubs (whole grain, fish, processed meat, and ginger and garlic), whereas in the control group only two core nodes (whole grain and vegetables) were directly associated with tuber intake. This suggests that how tubers are consumed (not only their absolute intake) could be an important determinant of NAFLD occurrence.
FIGURE 2 | Dietary topologies among cases of non-alcoholic fatty liver disease (red) and matched controls (blue). Dietary topologies were computed separately among cases of non-alcoholic fatty liver disease (red) and matched controls (blue) using mutual information. Edge width is set proportional to the weight of the connection (for better interpretability, plots are limited to edges with inferred weight ≥0.30), and node size is set proportional to the absolute intake of the corresponding food group; colors from light to dark are proportional to the strength of each node.
Moreover, at the level of the entire network, the results suggest that the dietary structure of the case group had two major components. Interestingly, the two major components in the case group were both characterized by food groups with beneficial and detrimental effects on NAFLD, while the hub nodes were equal. The core nodes in the first component comprised tubers, whole grain, ginger and garlic, fish, animal organs, meat, and processed meat. For example, a previous study showed that consumption of whole grains had beneficial effects on hepatic steatosis and liver enzyme concentrations among patients with NAFLD (40). Meanwhile, frequent consumption of raw garlic was also inversely associated with NAFLD among Chinese men (8). However, consumption of animal organs and meat was positively associated with NAFLD (15). Thus, although the absolute intakes of whole grain, garlic, animal organs, and meat were the same in the case and the control groups in the present study, the beneficial effects of whole grain and garlic could be offset by the detrimental effects of animal organs and meat in the case group, and vice versa. A similar structure was found in the second component in the case group, which was characterized by core nodes comprising legume and legume products, fruits, vegetables, and refined grain. The beneficial effects of legume and legume products, fruits, and vegetables could be offset by the intake of refined grain, and vice versa. However, in the control group, we found only one major component, typified by food groups with beneficial effects on NAFLD as hub nodes, such as whole grain, tubers, vegetables, and fruits.
FIGURE 3 | Differences in the strengths of food groups between case and control dietary topologies. For each node, strength was computed as the sum of edge weights (mutual information) associated with other nodes, which represented the direct associations between each node and others. The differences in strengths of food nodes were calculated by subtracting control values from case values.
Furthermore, we found that, compared with the case group, the control group had higher strengths and, particularly, hubs for most food groups. Meanwhile, there were more circles in the case topology (as opposed to stars in the control topology). We observed that whole grain, tubers, and vegetables were the core nodes, forming stars, in the dietary structure of the control group. The results suggest that the dietary habits of the case group were focused on some specific food groups and circles of food groups, while the control group showed a higher healthy diversity in food choices. Thus, a well-diversified diet that focuses on whole grain, tubers, and vegetables could yield beneficial effects regarding NAFLD. There are several plausible mechanisms underlying the results. First, a previous study suggested that higher healthy food diversity was inversely associated with indicators of body adiposity in the United States (41). Meanwhile, high visceral adiposity was associated with a high risk of NAFLD (42). Second, the hub food groups (whole grain, tubers, and vegetables) in the control group contain more fiber, which leads to slower digestion of macronutrients and has a beneficial effect on blood glucose burden and insulin concentrations (43). Disruption of glucose and insulin regulation plays an important role in the development of NAFLD (44,45). Third, other components in vegetables, such as polyphenols, may also contribute to a lower prevalence of NAFLD (46).
The use of network science to derive dietary patterns was the main strength of the present study. Compared to the methods previously used in the derivation of dietary patterns, the network topologies here were constructed using MI, which contains information about both linear and non-linear dependencies among food groups. Moreover, comparing the network topologies between the case and the control groups supports a dietary suggestion for preventing NAFLD that is based on the overall food system: how to eat, and not only how much to eat, is important. The second strength of the present study lies in the inclusion of participants who were newly diagnosed with NAFLD and the exclusion of participants who had changed their lifestyles in the last 5 years. These inclusion and exclusion criteria were intended to minimize reverse causation (i.e., participants with NAFLD changing their diet to reduce weight).
FIGURE 4 | Differences in the hubs of food groups between case and control dietary topologies. For each node, hub was defined as the principal eigenvector of A × t(A), where A is the adjacency matrix of the graph. The hub can describe the importance of a node considering both itself and all connected nodes, which was computed via an iterative algorithm that maintains and updates numerical weights for each node. The differences in hubs for food nodes were calculated by subtracting control values from case values.
Nevertheless, this study had some limitations. First, recall bias may have arisen from our use of a self-reported questionnaire. Second, using the network method, we were unable to explore the differences in network topologies between the case and the control groups at an individual level. Moreover, confounding factors could not be adjusted for in the network analysis. For this reason, we used the propensity score matching method to balance the case and the control groups. As shown in Table 1, all measured matching factors were balanced between the two groups. Third, we used hepatic ultrasonography instead of liver biopsies to detect fatty liver, as liver biopsies were unavailable during health examinations of the target population in our data collection. A previous study found that ultrasonography has a sensitivity of 89% and a specificity of 93% for detecting NAFLD, and it is widely used in population-based studies due to its non-invasiveness and accessibility (47). Yet, ultrasonography has limited sensitivity and does not reliably detect steatosis when the amounts of fat are low or when individuals have high BMI.

CONCLUSIONS
This study suggests that how foods are consumed, and not only the absolute intake, could be important in determining the occurrence of NAFLD. A diverse diet that focuses on whole grain, tubers, and vegetables could yield beneficial effects regarding NAFLD. Thus, in addition to the absolute intake of food groups, dietary intervention strategies for NAFLD should also focus on whole dietary structures. Future randomized controlled trials that explore the effect of such dietary structures on NAFLD are needed to clarify the results of the present study. Moreover, this study indicates that network science could provide a complementary tool for in-depth research in nutritional epidemiology.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The protocol of this study was approved by the Institutional Review Board of the Tianjin Medical University and participants gave written informed consent before participation in the study.