A Description of Risk Factors for Non-alcoholic Fatty Liver Disease in the Southern Community Cohort Study: A Nested Case-Control Study

Background: Non-alcoholic fatty liver disease (NAFLD) is associated with obesity and hypercholesterolemia. In addition, total fat and folate intake have been associated with NAFLD. Aims: We investigated risk factors for NAFLD among individuals of largely low socioeconomic status, and whether these associations differed by race. Methods: A nested case-control study was conducted within the Southern Community Cohort Study. Through linkage of the cohort with Centers for Medicare and Medicaid Services claims, International Classification of Diseases, Ninth Revision, Clinical Modification codes were used to identify incident NAFLD cases. Controls were matched 4:1 to cases on enrollment age, sex, and race. Conditional logistic regression was used to estimate odds ratios for the associations of NAFLD with covariates of interest. Results: Neither total fat nor folate intake was significantly associated with NAFLD. Hypercholesterolemia (odds ratio 1.21) and higher body mass index (75th vs. 25th percentile; odds ratio 1.96 for blacks and 2.33 for whites) were associated with an increased risk of NAFLD. No significant interaction with race for any of the studied variables was noted. Conclusions: Both hypercholesterolemia and increasing body mass index, but not total fat and folate intake, were risk factors for NAFLD in the Southern Community Cohort Study.


INTRODUCTION
The prevalence of non-alcoholic fatty liver disease (NAFLD) is generally estimated to be 20-30% in Western countries (1). Because of its high prevalence and associated adverse health effects, NAFLD has been identified as a significant contributor to increased health care costs, even after controlling for factors such as age, body mass index (BMI), diabetes, and other comorbidities (2). Furthermore, about 10-22% of patients with NAFLD develop non-alcoholic steatohepatitis (NASH) (3), a progressive form of NAFLD (4), and by 2020, NASH is projected to be the primary etiology for liver transplantation (5). Other than lifestyle changes including decreased caloric intake or increased physical activity, there are no currently recommended long-term treatments for NAFLD (6).
Variations in the prevalence of NAFLD across ethnic groups have been noted (7), and NAFLD is more prevalent in countries with higher per capita gross national incomes (8). One explanation for racial disparities in NAFLD is thought to be differences in adipose tissue distribution, specifically visceral adiposity (7,9,10). With respect to other risk factors for NAFLD development, there is emerging interest in the role of nutrients such as dietary folate and fatty acids. Data from animal models in which there is folate restriction or supplementation in high-risk mice reveal a significant association with liver injury or NAFLD, and human studies, while less consistent, suggest a positive association between low folate and/or vitamin B12 (and high homocysteine) and NAFLD (11-19). Increased total fat intake has been found to be positively associated with prevalent NASH (6,20). Of note, lower folate and other micronutrient intakes have been reported in minority populations (21,22), but there are limited data on risk factors for NAFLD among low-income populations, who tend to have less access to nutrient-rich diets and to consume energy-dense diets (23). Furthermore, it is not known whether there exists an interaction between race and specific dietary factors that may influence the risk of NAFLD.
The Southern Community Cohort Study (SCCS) is a large ongoing prospective study of adult participants, two-thirds of whom are black and over half of whom have an annual household income below $15,000 (24,25). This cohort presents an ideal context in which to study the joint relationship between race and dietary factors, such as total fat and folate intake, in NAFLD. In this case-control study nested within the prospective SCCS, we sought to identify factors contributing to incident NAFLD in the SCCS, such as diabetes, hypercholesterolemia, BMI, and lifestyle factors such as daily energy expenditure, and to determine whether there are differences by race in the effect of BMI or dietary factors (i.e., intake of total fat and folate) on NAFLD.

MATERIALS AND METHODS
Cohort Description
The sample for this nested case-control study was derived from individuals enrolled in the SCCS. Between March 2002 and September 2009, ∼85,000 participants aged 40-79 years and residing in 12 states in the southeastern United States were enrolled in the SCCS. Approximately 86% of participants were recruited from community health centers (CHCs), settings that provide primary health and preventive care to underserved populations (25), while the remaining 14% were recruited from the general population. At cohort enrollment, CHC participants completed a computer-assisted personal interview, and general population participants completed and mailed in the study questionnaires. Detailed descriptions of SCCS methods have been previously published (25). Informed consent was obtained from all participants, and the study protocols were approved by the Institutional Review Boards at both Vanderbilt University Medical Center and Meharry Medical College.

Study Population
The SCCS cohort was linked to the Centers for Medicare and Medicaid Services (CMS) Research Identifiable Files from January 1, 1999 to December 31, 2010 using Social Security number, date of birth, and first and last name in order to ascertain claims for medical conditions that were diagnosed after study enrollment (25,26). The eligible source population for this study was selected from the SCCS using the following inclusion criteria: age ≥ 65 years old at the time of enrollment in the SCCS, or age < 65 years old at SCCS enrollment and having ≥ 2 CMS claims between enrollment and the end of follow-up (December 31, 2010). These criteria were used in order to increase the likelihood of continuous Medicare and/or Medicaid coverage from enrollment to the end of the follow-up period in order for NAFLD cases to be ascertained.
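The inclusion rule above can be written as a short predicate. The following is a minimal sketch only; the class and field names are hypothetical and are not taken from the SCCS data dictionary.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical field names, chosen for illustration only.
    age_at_enrollment: int       # age in years at SCCS enrollment
    cms_claims_in_followup: int  # CMS claims between enrollment and December 31, 2010


def is_eligible(p: Participant) -> bool:
    """Source-population rule described in the text:
    age >= 65 at enrollment, or age < 65 with >= 2 CMS claims during follow-up."""
    if p.age_at_enrollment >= 65:
        return True
    return p.cms_claims_in_followup >= 2
```

For example, under this sketch a 50-year-old participant with a single claim would not enter the source population, whereas the same participant with two claims would.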
Based on a set of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes previously used to identify individuals with NAFLD in the Surveillance, Epidemiology, and End Results (SEER) registries (27), participants in the source population who fulfilled the criteria for NAFLD after the date of SCCS enrollment were identified as incident NAFLD cases. Participants who had NAFLD before enrollment in the SCCS were considered to have prevalent NAFLD and excluded from the eligible study population.
The inclusion criteria for ICD-9-CM codes were as follows: 571.5 (cirrhosis of liver without alcohol), 571.8 (other chronic nonalcoholic liver disease), or 571.9 (unspecified chronic liver disease without alcohol). The exclusion criteria included having . NAFLD cases, as well as controls, who reported having hepatitis on the baseline questionnaire were excluded. Because the number of participants who reported a race other than black or white was small (about 5%), they were removed from the analyses. Individually matched controls were randomly selected by incidence density sampling from among cohort members of similar age at SCCS enrollment (± 5 years), sex, and race. Controls were defined as participants from the eligible population who did not fulfill the ICD-9-CM definition for diagnosis of incident NAFLD and were individually matched 4:1 to cases (Figure 1). As such, 1,201 cases and 4,533 matched controls were identified.
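The case definition and matching described above can be sketched as follows. This is a simplified illustration, not the study's actual procedure: the record layout and function names are hypothetical, only the three inclusion codes named in the text are used, the exclusion codes are not modeled, and the risk set is approximated as matched members without a qualifying claim who had enrolled by the case's diagnosis date.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Inclusion codes listed in the text (ICD-9-CM).
NAFLD_CODES = {"571.5", "571.8", "571.9"}


@dataclass
class Member:
    # Hypothetical record layout for illustration.
    pid: int
    enroll_age: int
    sex: str
    race: str
    enroll_date: date
    nafld_date: Optional[date] = None  # date of first qualifying claim, if any


def first_nafld_date(claims: List[Tuple[date, str]]) -> Optional[date]:
    """Earliest claim date carrying one of the inclusion codes, or None."""
    dates = [d for d, code in claims if code in NAFLD_CODES]
    return min(dates) if dates else None


def is_incident_case(m: Member) -> bool:
    """Incident: first qualifying claim falls after enrollment.
    A qualifying claim on or before enrollment is treated as prevalent."""
    return m.nafld_date is not None and m.nafld_date > m.enroll_date


def sample_controls(case: Member, cohort: List[Member], k: int = 4,
                    rng: Optional[random.Random] = None) -> List[Member]:
    """Randomly draw up to k controls matched on sex, race, and enrollment
    age (within 5 years). Assumes `case` is an incident case."""
    rng = rng or random.Random(0)
    risk_set = [
        m for m in cohort
        if m.pid != case.pid
        and m.nafld_date is None                      # never met the case definition
        and m.sex == case.sex
        and m.race == case.race
        and abs(m.enroll_age - case.enroll_age) <= 5
        and m.enroll_date <= case.nafld_date          # enrolled when the case was diagnosed
    ]
    return rng.sample(risk_set, min(k, len(risk_set)))
```

When fewer than four eligible members exist, the sketch returns an incomplete set; in the study, incomplete case-control sets were excluded from the analysis.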

Data Collection
The baseline questionnaire (available at http://www.southerncommunitystudy.org/) collected information on participants' demographic characteristics, personal and family medical history, and lifestyle and anthropometric factors. Dietary intake was assessed at baseline using an 89-item food-frequency questionnaire (28). Participants selected the amount of average intake over the past year for different food items and supplements, with a range from "Never" to "2+/day." Intake levels for individual food items were derived from race- and sex-specific portion size information from the National Health and Nutrition Examination Survey and the United States Department of Agriculture (USDA) Continuing Survey of Food Intakes by Individuals. Total energy and nutrient intake were derived from these data (23). Calculation of folate intake took into account intake from food sources only.
Daily energy expenditure was derived from the participants' reports of how much daily light, moderate, and strenuous activity they performed and expressed as standard metabolic equivalent-hours per day (MET-h/day), which indicates the intensity and duration of activity. These calculations were made using standard methods described in the Compendium of Physical Activity, which describes the energy cost of more than 600 different activity-related behaviors (29).
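As an illustration of the MET-h/day calculation, the sketch below multiplies reported hours in each intensity category by a MET intensity and sums the products. The MET values shown are placeholder assumptions for illustration; the study derived its values from the Compendium of Physical Activity (29), and the exact values used are not stated here.

```python
from typing import Dict

# Assumed MET intensities per category (illustrative placeholders only).
ASSUMED_MET = {"light": 2.0, "moderate": 4.0, "strenuous": 8.0}


def met_hours_per_day(hours_by_intensity: Dict[str, float],
                      met_values: Dict[str, float] = ASSUMED_MET) -> float:
    """Sum of (MET intensity x hours per day) across activity categories."""
    return sum(met_values[category] * hours
               for category, hours in hours_by_intensity.items())
```

Under these assumed intensities, 3 h of light, 1 h of moderate, and 0.5 h of strenuous activity per day would correspond to 2.0 × 3 + 4.0 × 1 + 8.0 × 0.5 = 14 MET-h/day.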

Statistical Analyses
The design of the study was a nested case-control study. The data were modeled using a conditional logistic regression model, conditioning on the matched case-control sets (each set included four controls and one case), with NAFLD as the dependent variable. Participants were excluded from the analysis if they were part of incomplete case-control sets or were missing any of the following variables: age, race, sex, history of diabetes, hypercholesterolemia, myocardial infarction (MI) or coronary artery bypass surgery, BMI, annual household income (< $15,000, ≥ $15,000), number of daily alcoholic drinks, daily energy expenditure, and last visit to a doctor (prior to enrollment visit, in months) (Supplementary Table 1). The independent variables, all self-reported on the baseline questionnaire, were included in a single model and were as follows: BMI, history of physician diagnosis of diabetes (no or yes), hypercholesterolemia (no or yes), MI or coronary artery bypass surgery (no or yes), annual household income (< $15,000, ≥ $15,000), number of daily alcoholic drinks, daily energy expenditure, and last visit to a doctor (prior to enrollment visit, in months). BMI was calculated from height and weight, which were reported by the participants. Folate and total fat intake, the primary dietary variables of interest, were derived from a validated food frequency questionnaire and adjusted for total daily energy intake using the residual method (30,31). BMI, daily energy expenditure, number of daily alcoholic drinks, last visit to a doctor, daily energy intake, total fat intake, and folate intake were added in the model as non-linear variables using restricted cubic splines. Next, we tested for interactions between race and our primary dietary variables (folate and total fat) as well as between race and BMI in the conditional logistic regression model with the matching variables and covariates specified above. The analyses were performed using R version 3.3.0 (2016-05-03). 
P-values ≤ 0.05 were considered statistically significant.
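Two steps of the analysis above lend themselves to a short illustration: energy adjustment by the residual method, and the contribution of one matched set to the conditional likelihood. The sketch below is written in Python for illustration only (the study's analyses were performed in R); the function names are hypothetical, the nutrient is regressed on energy with a simple linear fit, and the restricted cubic spline terms used in the study are not reproduced.

```python
import math
from statistics import mean
from typing import List, Sequence


def energy_adjust(nutrient: Sequence[float], energy: Sequence[float]) -> List[float]:
    """Residual method: regress nutrient intake on total energy intake, then add
    each residual to the intake predicted at the mean energy intake."""
    mx, my = mean(energy), mean(nutrient)
    sxx = sum((x - mx) ** 2 for x in energy)
    slope = sum((x - mx) * (y - my) for x, y in zip(energy, nutrient)) / sxx
    intercept = my - slope * mx
    expected_at_mean_energy = intercept + slope * mx
    return [y - (intercept + slope * x) + expected_at_mean_energy
            for x, y in zip(energy, nutrient)]


def matched_set_log_likelihood(beta: Sequence[float],
                               case_x: Sequence[float],
                               control_xs: Sequence[Sequence[float]]) -> float:
    """Conditional-likelihood contribution of one matched set:
    log( exp(x_case . beta) / sum over set members of exp(x_j . beta) )."""
    def linear_predictor(x: Sequence[float]) -> float:
        return sum(b * xi for b, xi in zip(beta, x))

    scores = [linear_predictor(case_x)] + [linear_predictor(x) for x in control_xs]
    top = max(scores)  # subtract the maximum for numerical stability
    log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[0] - log_denominator
```

Because the likelihood conditions on each matched set, the matching variables (enrollment age, sex, and race) drop out of the set-level contribution; with all coefficients equal to zero, a set of one case and four controls contributes log(1/5).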

RESULTS
Of the 84,523 participants of the SCCS, 551 participants with a diagnosis of NAFLD prior to SCCS enrollment were not included in our study. Of the remaining participants, 8,741 participants had an age ≥ 65 years at enrollment and 39,262 participants had an age < 65 years but had ≥ 2 CMS claims between enrollment and the end of follow-up. Of these, 8,457 participants were identified as incident NAFLD cases or controls, but 2,723 participants were excluded for having missing data or being part of an incomplete case-control set (Figure 1).
Baseline characteristics of the 1,201 cases and 4,533 matched controls are presented in Table 1. As a result of matching, the median age at enrollment of both the cases and controls was 58 years old [interquartile range (IQR) 50-64]. Fifty-seven percent of the cases and 57% of the controls were black, and 23% were male. Of the cases, 37% had diabetes and 52% had hypercholesterolemia, while among the controls, 29% had diabetes and 45% had hypercholesterolemia. As shown in Table 2, total fat intake (75th percentile vs. 25th percentile of intake) was not associated with odds of being diagnosed with NAFLD among either blacks (odds ratio (OR) = 0.98; 95% confidence interval (CI) 0.73-1.32) or whites (OR = 0.99; 95% CI 0.72-1.35), nor was there evidence of interaction with race (p > 0.10). Folate intake (comparing the 75th percentile to the 25th percentile of intake) was associated with a non-significant decrease in odds of being diagnosed with NAFLD (OR = 0.83; 95% CI 0.60-1.15) for whites, but not for blacks (OR = 1.10, 95% CI 0.82-1.47), but there was no statistically significant interaction (p > 0.10).
Having a higher BMI (75th percentile compared to 25th percentile) was associated with significantly greater odds of being diagnosed with NAFLD for both blacks (OR 1.96; 95% CI 1.51-2.56) and whites (OR 2.33; 95% CI 1.70-3.19), but the interaction of race and BMI was not significant (p > 0.10) (Figure 2). There was an inverse association of small magnitude with the number of daily alcoholic drinks (75th percentile of intake vs. 25th percentile of intake) and NAFLD (OR = 0.95; 95% CI 0.93-0.97). Hypercholesterolemia was associated with significantly greater odds of being diagnosed with NAFLD (OR = 1.21; 95% CI 1.05-1.40). Diabetes, MI or coronary artery bypass surgery, time since last doctor's visit, annual income, daily energy expenditure, and total energy intake were not significantly associated with NAFLD in the model (Table 2).
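For readers who wish to relate the reported odds ratios to their confidence intervals, the sketch below works on the log-odds scale. It assumes Wald-type 95% intervals that are symmetric on the log scale (z = 1.96), which is an assumption and not something stated in the text; the function names are hypothetical, and back-calculated values are approximate because the reported figures are rounded.

```python
import math
from typing import Tuple


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float,
                  z: float = 1.96) -> Tuple[float, float]:
    """Approximate log odds ratio and its standard error from a reported
    OR and CI, assuming the interval is exp(log OR +/- z * SE)."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se


def wald_ci(log_or: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Confidence interval on the odds-ratio scale from a log OR and its SE."""
    return math.exp(log_or - z * se), math.exp(log_or + z * se)
```

Applying `log_or_and_se` to a reported estimate and passing the result back through `wald_ci` should approximately recover the reported interval, up to rounding.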

DISCUSSION
In this study of black and white middle-aged and older adults of generally low income who were eligible for Medicaid or Medicare in the SCCS, we found that BMI and hypercholesterolemia were associated with greater odds of NAFLD. We also showed that folate intake and total fat intake were not significantly associated with NAFLD, and that neither of these associations, nor that of BMI with NAFLD, varied by race. Folate intake varies by race/ethnicity and by SES in the United States. For example, Black and Hispanic women have lower pre-pregnancy red blood cell folate concentrations than Non-Hispanic White women (32), despite mandatory folic acid fortification in the United States. In addition, in the lowest quartile income groups compared to the highest quartile income group, decreases were observed in the absolute differences in red blood cell folate concentration after mandatory folic acid fortification, but increases in the relative ratios were noted (33).
Both decreased folate intake and increased total fat intake have been directly associated with prevalent NAFLD or NASH in previous studies (19,20). Among obese female patients undergoing bariatric surgery, Hirsch et al. found that serum folate levels were greater in those with normal liver biopsies [27.7 ± 7.04 nmol/L, (mean ± SD)] than that of those with NAFLD (21.1 ± 7.9 nmol/L, p = 0.005) (19). On a similar note, among obese female children, Frelut et al. found that elevated alanine aminotransferase levels, an indicator of NAFLD, were noted in those who were homozygous for a mutation in folate metabolism compared to those who were not homozygous for the mutation (14). Moreover, in a study from South Korea, folate intake was associated with a decreased risk of NAFLD (OR = 3.37) in male participants (15), while another study from Iran showed that the mean intake of folate in healthy controls was significantly higher than that of patients with NAFLD (18). With regards to fat intake, in a cross-sectional study of NAFLD participants, Vilar et al. showed that total lipid intake as a percent of total energy intake among participants with NASH was greater at 37.5 ± 8.0% (mean ± SD) than that of participants with steatosis alone at 31.2 ± 7.8% (p = 0.003) (20). While the Vilar et al. study compared participants with biopsy-proven NASH vs. simple hepatic steatosis (20), our study compared those with and without NAFLD, based on ICD-9-CM codes.
We found that both hypercholesterolemia and higher BMI were associated with an increased risk of NAFLD and that the association between BMI and NAFLD was not significantly modified by race. Hypercholesterolemia is a known risk factor for NAFLD (34), and other traditional risk factors for NAFLD, such as diabetes and obesity, disproportionately affect black adults (35,36), who comprise a substantial proportion of the SCCS. However, the distribution of body fat, specifically percent visceral fat, is thought to be a particular factor that accounts for racial differences in the prevalence of NAFLD (10), and BMI does not reflect body fat distribution or distinguish between adipose tissue, fat free mass and skeletal mass which vary widely across multi-ethnic populations for a given BMI value (37).
In the current study, the number of daily alcoholic drinks had a small inverse relationship with NAFLD. This finding likely arises because NAFLD is usually diagnosed after excluding other causes of liver disease, such as chronic alcohol use, and participants who reported significant alcohol use would be less likely to be diagnosed as having NAFLD. In this study, current drinkers, defined as those participants who had ≥ 1 daily alcoholic drink, reported mild to moderate alcohol consumption, with medians of 1.3 daily alcoholic drinks among controls and 1.1 among cases. Dunn et al. found that among patients with biopsy-proven NAFLD, those who reported moderate alcohol consumption had lower odds of progressive liver disease with fibrosis (OR = 0.56, 95% CI 0.41-0.77) than non-drinkers (38). As such, the findings of our study are consistent with those of Dunn et al. (38), but since heavy alcohol intake may preclude the diagnosis of NAFLD, it is difficult to draw any conclusions on alcohol use and NAFLD.
NAFLD has become increasingly widespread, but it has been relatively less studied among low socioeconomic status (SES) individuals. Zhu et al. (8) noted that in countries in Asia and Europe with a higher national income, there were higher prevalences of NAFLD. The authors postulated that this was the result of more sedentary behavior and a move away from traditional diets toward diets with an increased amount of calories (8). In the SCCS population of generally low SES, the observed lower risk of NAFLD among those with lower income, although not statistically significant, is suggestive of a similar association with income. However, in the current study, we accounted for total energy intake in the model.
The strengths of the study include studying NAFLD in the setting of a large, low SES cohort, with a majority of black participants. This represents a population that has previously been less studied with respect to NAFLD, and one with a relatively high burden of risk factors previously shown to be associated with NAFLD. Moreover, the use of a validated dietary questionnaire in the SCCS allowed examination of associations between nutrient intake, specifically total fat and folate, and NAFLD.
The current study has some limitations. NAFLD was identified through CMS claims using a set of ICD-9-CM codes. Clinically, methods of identifying NAFLD include using liver aminotransferase levels and hepatic imaging. However, like us, other groups have used ICD-9-CM codes to determine which patients had NAFLD, as in SEER registries, which were linked to Medicare data (27). Although the use of ICD-9-CM codes provides a means by which to determine which individuals have NAFLD in a large cohort such as the SCCS, it likely underestimates the true number of individuals with NAFLD because NAFLD is underdiagnosed by clinicians (39). Future directions include determining the sensitivity and specificity of ICD-9-CM code algorithms for identifying NAFLD in large population cohorts, as well as using natural language processing and other bioinformatics approaches in electronic health records to identify NAFLD. Another limitation is the use of a case-control study design. However, we did use interaction terms between race and BMI, folate, and total fat, respectively, to help mitigate bias.
Our results suggest no significant associations between total fat intake and folate intake with NAFLD in this predominantly black, low SES cohort. Hypercholesterolemia and BMI, generally accepted as traditional NAFLD risk factors, were associated with an increased risk of NAFLD among both blacks and whites >40 years old.

DATA AVAILABILITY STATEMENT
The datasets generated for this study will not be made publicly available. Requests for data need to be reviewed and approved by the SCCS Data and Biospecimen Use Committee. Requests can be made directly to the Southern Community Cohort Study: https://www.southerncommunitystudy.org/.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Institutional Review Boards at Vanderbilt University Medical Center and Meharry Medical College.
The patients/participants provided their written informed consent to participate in this study.

FUNDING
This study was supported by grants from the National Cancer Institute (R01 CA092447), the American Recovery and Reinvestment Act (3R01 CA092447-08S1), the National Institutes of Health (T32 DK007569), and the National Institute of Diabetes and Digestive and Kidney Diseases (K24 DK62849), a Veterans Affairs Merit Award (1I01CX000982-01A1), and institutional funds from the Vanderbilt Center for Kidney Disease, a National Institutes of Health George O'Brien Kidney and Urological Disease grant, and the Vanderbilt Epidemiology Center.