Diet Sustainability Analyses Can Be Improved With Updates to the Food Commodity Intake Database

Diet sustainability analyses inform policymaking decisions and provide clinicians and consumers with evidence-based information to make dietary changes. In the United States, the Food Commodity Intake Database (FCID) provides a crosswalk for integrating nationally representative data on food intake from the National Health and Nutrition Examination Survey (NHANES) with data on sustainability outcomes from other publicly available databases. However, FCID has not been updated since 2010 and does not link with contemporary NHANES data, which limits further advancements in sustainability research. This study fills this research gap by establishing novel linkages between FCID and NHANES 2011–2018, comparing daily per capita food intake with and without these linkages, and making these data publicly available for use by other researchers. To update FCID, two investigators independently established novel data linkages, a third investigator resolved discrepancies, and a fourth investigator audited linkages for accuracy. Dietary data were acquired from nearly 45,000 adults from 2001 to 2018, and food intake was compared between updated vs. non-updated FCID versions. Total food intake from 2011 to 2018 was 5–23% higher using the updated FCID compared to the non-updated version, and intake was over 100% higher in some years for some food categories including poultry, eggs, legumes, starchy vegetables, and tropical oils (P < 0.001 for all comparisons). Further efforts may be needed to create new food composition data to reflect new products and reformulations that enter the food supply over time. This study removes a barrier to further diet sustainability analyses by establishing a data crosswalk between contemporary NHANES and other publicly available databases on agricultural resource use, environmental impacts, and consumer food expenditures.

INTRODUCTION
Diet sustainability analyses have increased in number over the past decade (1) in response to growing global awareness that food system transformation is needed to address concerns about human health, environmental impacts, food affordability, and social justice (2, 3). Unlike food system sustainability analyses, which focus broadly on the conditions and decisions that occur throughout a food system (e.g., production, processing, transport, and consumption), diet sustainability analyses focus more narrowly on the sustainability impacts of consumer food choices. As a result, these findings inform consumer-oriented policy action including the development of sustainable dietary guidance, and are directly relevant to clinicians and consumers seeking evidence-based information on how to make impactful dietary changes (4).
For example, a growing number of countries have adopted sustainable dietary guidelines and several more have attempted to do so, including the United States (US) (5). Over one-third of US consumers report that environmental sustainability is an important driver of their food choices, and nearly one-third report that it has had much more or somewhat more of an impact on their food purchasing decisions over the previous 10 years (6). Willits-Smith et al. (7) demonstrated that targeted dietary shifts among individuals motivated by health and environmental concerns (16% of the total population) can reduce greenhouse gas (GHG) emissions by up to 6.7%, further demonstrating the potential impact of consumer behavior changes. It bears noting that dietary sustainability cannot be achieved solely by shifts in motivated consumers' behavior; it will require multifaceted, population-level interventions (e.g., regulation, subsidies, and changes in public procurement) (8).
Although most diet sustainability analyses have been conducted using data collected from other countries, the number of studies conducted using US-based data has increased (1) as data integration methods have improved (9). For example, Canning et al. (10) combined dietary data from the National Health and Nutrition Examination Survey (NHANES) with an environmentally-extended economic model and a biophysical model and found that food demand in the US accounted for 28% of freshwater withdrawals, 25% of total land area, and 18% of greenhouse gas (GHG) emissions, but only 8.6% of gross domestic product (GDP). More recently, He et al. (11) showed that shifts toward healthier diets can reduce some, but not all, environmental impacts but may be unaffordable for some lower-income groups.
As consumers continue to seek ways to improve the sustainability of their diets, these analyses will continue to rise in importance. NHANES is the backbone of diet sustainability analyses in the US because it is the richest source of nationally representative dietary data. Survey respondents typically report consumption of mixed dishes that contain multiple ingredients, so food composition databases are used to quantify these ingredients, which provides a crosswalk to environmental and economic databases (9). Key among these food composition databases is the Food Commodity Intake Database (FCID), which disaggregates NHANES foods into nearly 500 highly differentiated ingredients and has been used to evaluate dietary intake (12-16), chemical exposure (17-20), environmental impacts (7, 21-25), agricultural resource use (26, 27), and food expenditures (28, 29). However, FCID has not been updated since 2010, so it does not link with more contemporary NHANES data and therefore presents a barrier for further diet sustainability analyses.
To address this research need, the objectives of this study are to (1) link FCID 2001-2010 to NHANES 2011-2018, (2) compare daily per capita food intake with and without these novel linkages, and (3) make these linkages publicly available for use by other researchers.

National Health and Nutrition Examination Survey (NHANES)
Data on individual-level food intake were acquired from NHANES, 2001-2018. NHANES is a continuous, multistage, cross-sectional survey of individual-level food intake, health behaviors, health status, and sociodemographics. Data are collected from ∼5,000 non-institutionalized individuals per year using in-person surveys, physical examinations, and laboratory tests performed by trained staff. Data have been collected continuously since 1999 and are released in 2-year cycles (30). Respondents are assigned survey weights that reduce the potential for bias from differential probabilities of selection and nonresponse, and some demographic groups are oversampled to increase the reliability and precision of subgroup analyses (31). The dietary component of NHANES is What We Eat In America, which captures intake of ∼4,500 different foods. A portion of these foods is updated in each NHANES survey cycle to reflect new products that enter the market and reformulations of existing products.

Food Commodity Intake Database
Data on the ingredient composition of NHANES mixed dishes were acquired from the Food Commodity Intake Database (FCID), 2001-2010. FCID was developed by the US Environmental Protection Agency (US EPA) to estimate dietary exposure to pesticides when used in conjunction with the Dietary Exposure Evaluation Model (DEEM), and to estimate the food consumption rates provided in EPA's Exposure Factors Handbook. FCID provides the gram weight of nearly 500 ingredients present in each NHANES mixed dish in their as-consumed forms, which were determined by EPA staff using popular, regional, and specialty cookbooks, as well as professional judgement.
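The disaggregation step described above can be illustrated with a minimal sketch. It assumes, for illustration only, that each recipe is expressed as grams of ingredient per 100 g of food as consumed; the food codes, ingredient names, and weights are hypothetical and are not taken from FCID.

```python
from collections import defaultdict

# Hypothetical recipe table: grams of each ingredient per 100 g of food.
# Codes, names, and weights are invented for illustration, not FCID values.
RECIPES = {
    "pizza_cheese": {"wheat_flour": 35.0, "tomato": 20.0, "cheese": 25.0},
    "omelet_plain": {"egg": 85.0, "milk": 10.0},
}


def disaggregate(intakes, recipes):
    """Convert reported food intake (grams) into ingredient intake (grams).

    intakes: list of (food_code, grams_consumed) for one respondent-day.
    Foods without a recipe are returned separately because they add
    nothing to the ingredient totals.
    """
    totals = defaultdict(float)
    unmatched = []
    for food_code, grams in intakes:
        recipe = recipes.get(food_code)
        if recipe is None:
            unmatched.append(food_code)
            continue
        for ingredient, grams_per_100g in recipe.items():
            totals[ingredient] += grams * grams_per_100g / 100.0
    return dict(totals), unmatched


# One hypothetical respondent-day; the last food has no recipe.
day = [("pizza_cheese", 200.0), ("omelet_plain", 120.0), ("new_snack_bar", 40.0)]
ingredient_grams, unmatched_foods = disaggregate(day, RECIPES)
```

In this sketch, a food without a recipe drops out of the ingredient totals, which is the mechanism by which missing linkages lower estimated intake.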

Matching Procedure
For each new NHANES cycle, many of the foods are retained from previous cycles but some are replaced with new foods to account for changes in the food supply. Therefore, NHANES 2011-2018 includes many foods that are not included in FCID 2001-2010. Two investigators independently matched each of these foods with the most similar food in FCID 2001-2010 based on professional judgement. Perfect agreement between the investigators was achieved for 60% of the foods, and the remaining discrepancies were minor (e.g., the NHANES food was "pizza, with cheese and extra vegetables, not specified as to type of crust," yet investigator 1 matched it with "pizza with cheese and extra vegetables, regular crust" and investigator 2 matched it with "pizza, cheese, with vegetables, not specified as to type of crust"). All matches were audited by a third investigator who resolved discrepancies (40% of matches) and flagged instances in which investigators 1 and 2 agreed but a closer match was available (< 1% of matches). A fourth investigator reviewed all matches for accuracy. After the discrepancies were resolved, 100% of the NHANES foods were linked with FCID ingredient composition data.
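As a rough illustration of how agreement between two independent coders might be tallied before reconciliation, the sketch below compares two hypothetical sets of matches. The food identifiers and matches are invented, and the function is an assumption for illustration rather than the procedure used in this study.

```python
# Hypothetical matches: new NHANES food -> existing food chosen as its proxy.
# All identifiers are invented for illustration.
matches_investigator_1 = {
    "food_A": "proxy_1",
    "food_B": "proxy_2",
    "food_C": "proxy_3",
    "food_D": "proxy_4",
    "food_E": "proxy_5",
}
matches_investigator_2 = {
    "food_A": "proxy_1",
    "food_B": "proxy_9",
    "food_C": "proxy_3",
    "food_D": "proxy_4",
    "food_E": "proxy_7",
}


def compare_matches(first, second):
    """Split matches into agreements and discrepancies.

    Returns (agreed, discrepancies, agreement_rate), where discrepancies
    maps each food to the pair of competing matches that a third reviewer
    would need to resolve.
    """
    if first.keys() != second.keys():
        raise ValueError("both investigators must code the same foods")
    agreed, discrepancies = {}, {}
    for food, match in first.items():
        if match == second[food]:
            agreed[food] = match
        else:
            discrepancies[food] = (match, second[food])
    return agreed, discrepancies, len(agreed) / len(first)


agreed, discrepancies, agreement_rate = compare_matches(
    matches_investigator_1, matches_investigator_2
)
```

The discrepancy dictionary gives a reviewer a compact worklist; agreed matches would still need auditing, since two coders can agree on a match when a closer one exists.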

Statistical Analyses
All FCID ingredients (n = 484) were grouped into 21 food categories (Supplementary Table 2) for analysis based on the Healthy Dietary Patterns in the Dietary Guidelines for Americans (32), and more specific categories were established where possible (for example, meat was further categorized into beef, pork, and other meat). Mean per capita intake of each food category was estimated for each NHANES cycle from 2001 to 2018. Temporal trends from 2011 to 2018 were estimated with and without FCID updates using linear regression models, and were compared using paired Wald tests with statistical significance set at P < 0.05. Respondents with incomplete dietary data were identified by trained NHANES staff and were excluded from the analyses. To ensure equal sample sizes for analytic comparisons between updated and non-updated intakes for each food category, additional respondents were deemed to have incomplete data if they did not consume any foods included in NHANES 2001-2010. All analyses were adjusted for age (continuous), gender (male/female), and energy intake (continuous) using linear regression. Stata 16.1 (Stata Corp; College Station, TX) was used for data management and statistical analyses.
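The analyses in this study were run in Stata; the Python sketch below is only a simplified illustration of two of the ideas above, namely a survey-weighted mean per cycle and a linear trend across cycles. It uses invented records, omits covariate adjustment, and ignores the complex survey design when fitting the trend, so it should not be read as a reproduction of the study's models.

```python
from collections import defaultdict

# Hypothetical records: (NHANES cycle, survey weight, grams per day of one
# food category). Values are invented for illustration.
records = [
    ("2011-2012", 1.0, 40.0),
    ("2011-2012", 3.0, 60.0),
    ("2013-2014", 2.0, 50.0),
    ("2013-2014", 2.0, 70.0),
    ("2015-2016", 1.0, 80.0),
    ("2015-2016", 1.0, 50.0),
]


def weighted_means_by_cycle(rows):
    """Survey-weighted mean intake per cycle: sum(w * x) / sum(w)."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for cycle, weight, grams in rows:
        numerator[cycle] += weight * grams
        denominator[cycle] += weight
    # Cycle labels such as "2011-2012" sort chronologically as strings.
    return {cycle: numerator[cycle] / denominator[cycle] for cycle in sorted(numerator)}


def ols_slope(values):
    """Ordinary least squares slope of values against 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    s_xx = sum((x - x_mean) ** 2 for x in range(n))
    return s_xy / s_xx


cycle_means = weighted_means_by_cycle(records)
trend = ols_slope(list(cycle_means.values()))  # change in grams/day per cycle
```

Running the same two steps on intake estimated with and without the updated linkages would give two slopes per food category, which is the comparison the paired tests above formalize.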

Data Availability
The updated FCID database is available for download at Data Archiving and Networking Services (DANS) through a Creative Commons license (CC0-1.0). doi: 10.17026/dans-zqx-a23v.

DISCUSSION
For the first time, this study integrated data on food composition from the Food Commodity Intake Database (FCID) with data on food intake from the National Health and Nutrition Examination Survey (NHANES) 2011-2018. Using dietary data from nearly 45,000 individuals, this study demonstrated that total food intake estimated with FCID would be 5-23% lower without these updates, and larger differences were observed for certain food categories. These data are made publicly available for use by other researchers to catalyze advancements in diet sustainability science.
Other food composition databases are available to disaggregate NHANES mixed dishes into their component ingredients, but these have limitations that are now overcome with FCID (Supplementary Table 4). The Food Intakes Converted to Retail Commodities Database (FICRCD) has not been updated since 2008 (33) and may require imputation to fill in missing food recipes (34), although the embedded computations on food processing conversions may still be useful for specific research purposes (35). Others (11) have used the Food and Nutrient Database for Dietary Studies (FNDDS) (36) and the Food Patterns Equivalents Database (FPED) (37) to disaggregate NHANES foods for diet sustainability analyses, but these databases do not account for food waste, which represents ∼30% (by weight) of food available for consumption (26), and will therefore underestimate the associated sustainability outcomes. By contrast, FCID is the only food composition database that disaggregates NHANES mixed dishes into ingredients that map onto agricultural commodities, which can then be linked with data on food waste from the Loss-adjusted Food Availability (LAFA) data series (38), as described elsewhere (9). These linked FCID-LAFA data can be used to evaluate the association between food waste and multiple indicators of sustainability, including agricultural resource use (26, 27), environmental impacts (25), diet quality (26, 27), and consumer food expenditures (28, 29). The present study allows these linkages to be extended to more contemporary data on food intake from NHANES 2011-2018, thereby filling an important data gap.
When using the non-updated FCID to estimate food intake, consumption of nearly all food categories decreased from 2011 to 2018 due to incomplete linkages with NHANES. The proportion of NHANES foods not matched with FCID ingredients increased with each NHANES cycle and reached 57% by 2017-2018, which resulted in lower intakes over time for many food categories. The updated database filled those linkage gaps and increased estimates by up to 65% for 16 out of 21 food categories and over 100% for the remaining 5 food categories. The largest changes were observed for eggs (up to 324% increase) and poultry (up to 148% increase), possibly due to their increased use as an ingredient in processed foods that had entered the market since FCID was last updated in 2010 (see below). Temporal trends using the updated database were consistent with estimates of loss-adjusted per capita food availability for all food categories, although a minor discrepancy was observed for other vegetables (38). Other vegetables is a heterogeneous category and LAFA only includes a subset of those included in FCID.
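The percentage differences reported above compare intake estimated with the updated database against intake estimated with the non-updated database. A minimal sketch of that arithmetic follows, using invented intake values and taking the non-updated estimate as the denominator, which is an assumption of this illustration.

```python
def percent_increase(updated, non_updated):
    """Percent by which the updated estimate exceeds the non-updated one."""
    if non_updated <= 0:
        raise ValueError("non-updated intake must be positive")
    return (updated - non_updated) / non_updated * 100.0


# Hypothetical mean intakes (grams per day): (non-updated, updated).
# These are invented values, not results from this study.
intake_by_category = {
    "eggs": (10.0, 25.0),
    "poultry": (40.0, 60.0),
    "beef": (50.0, 55.0),
}

increases = {
    category: percent_increase(updated, non_updated)
    for category, (non_updated, updated) in intake_by_category.items()
}

# Categories whose estimate more than doubles under the updated linkages.
over_100_percent = [c for c, pct in increases.items() if pct > 100.0]
```

Note that the choice of denominator matters: an estimate that is 150% higher with the update corresponds to one that is 60% lower without it.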
Approximately 20,000 new food products entered the US marketplace every year from 2011 to 2018 (39), and a portion  intake even further than what was observed in the present study. Researchers have several options for addressing this limitation. First, new recipes can be created for processed foods that entered the US food supply since 2011, just as EPA did when FCID was updated in 2005 and 2010 (this explains why intake of foods in the "other" category increased dramatically in 2005, which led to an increase in total food intake at that time). Second, researchers can derive the intake of some food categories in mass quantity from other food composition databases, like FICRCD, FNDDS, and FPED (described above).
FCID can be used to estimate the environmental impacts of dietary patterns by linking with the database of Food Impacts on the Environment for Linking to Diets (dataFIELD), which provides data on GHG emissions and energy use associated with the production of each FCID ingredient (21). dataFIELD was created by aggregating impact data from a review of life cycle assessments (LCAs) that evaluated impacts from cradle-to-farm gate for most ingredients and cradle-to-processing for others, and therefore these data do not include the impacts that occur downstream in the food system (e.g., manufacturing and home cooking). A similar approach has been adopted by others (23). These system boundaries were adopted due to the use of FCID as a crosswalk between LCAs and NHANES, as well as limited data availability from LCA studies on downstream impacts (21). Future efforts will be needed to update food impact estimates with new system boundaries as the LCA literature continues to expand.
Limited data linkage between FCID 2001-2010 and NHANES 2011-2018 may have impacted prior sustainability analyses. In some cases, researchers only used NHANES data up until 2010 to align with the year FCID was last updated (21, 25, 40), which does not reflect changes in food consumption that have occurred since that time. Others have combined FCID 2001-2010 with NHANES data up to 2016 with (28, 29) and without (26) imputation to fill data gaps, and demonstrated that incomplete linkages led to a reduction of daily Total Food Demand (sum of retail loss and consumer purchase amount) by 10% (26) and a reduction of daily consumer food expenditures by 7% (28) to 15% (29).
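To illustrate how ingredient-level intake can serve as a crosswalk to an impact database such as dataFIELD, the sketch below multiplies ingredient intake by a per-kilogram emission factor and sums the result. The emission factors, ingredient names, and units (kg CO2-eq per kg of ingredient) are hypothetical placeholders and are not values from dataFIELD.

```python
# Hypothetical emission factors in kg CO2-eq per kg of ingredient.
# Placeholders for illustration only, not dataFIELD values.
EMISSION_FACTORS = {
    "beef": 30.0,
    "egg": 4.0,
    "tomato": 1.0,
}


def diet_ghg(ingredient_grams, factors):
    """Estimate daily GHG emissions (kg CO2-eq) from ingredient intake.

    ingredient_grams: mapping of ingredient -> grams consumed per day.
    Ingredients without an emission factor are reported separately so the
    gap is visible rather than silently counted as zero impact.
    """
    total = 0.0
    missing = []
    for ingredient, grams in ingredient_grams.items():
        factor = factors.get(ingredient)
        if factor is None:
            missing.append(ingredient)
            continue
        total += grams / 1000.0 * factor
    return total, missing


# One hypothetical respondent-day of ingredient intake in grams.
daily_intake = {"beef": 100.0, "egg": 50.0, "tomato": 200.0, "quinoa": 30.0}
total_kg_co2e, missing_factors = diet_ghg(daily_intake, EMISSION_FACTORS)
```

Because the total is a sum over linked ingredients, any NHANES food that fails to link to ingredients contributes nothing to it, so incomplete linkages lower the impact estimate in the same way they lower the intake estimate.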
This study has several strengths. To reduce bias in the data linkage procedure, matches were completed independently by two investigators, discrepancies were reconciled by a third investigator, and data were audited by a fourth investigator. Ingredients were categorized into 21 distinct food categories to further investigate bias within each food category, and the raw data are made publicly available so that others can create their own food categories to address specific research questions. Data on food intake were acquired from a large, nationally representative sample over an 18-year time period, which increases generalizability. Finally, this study fills an important research gap by providing ingredient recipes for contemporary NHANES data, which removes a barrier to further diet sustainability analyses.
This study also has limitations. The data linkage procedure was performed by hand coding over 4,000 NHANES foods to nearly 500 FCID ingredients, so misclassification bias cannot be ruled out. This hand-coding method was tested against automated natural language processing during the design phase of this study, and the hand-coding method demonstrated superior performance when audited by investigators. Nonetheless, it is possible that further refinement of automated methods may yield similar or improved outcomes; further investigation is warranted to reduce bias and investigator burden. This study used proxy recipes that were already included in FCID 2001-2010 rather than creating new recipes for NHANES foods, so the linkages may not reflect new products and reformulations that entered the US food supply since 2011. Further efforts are needed by the federal government or others to create new recipes for newly added NHANES foods, which may increase estimates of food intake beyond what was demonstrated in this study. Therefore, the results presented in this study should be considered conservative. Finally, self-reported dietary data are subject to social desirability bias and may have introduced measurement error.

CONCLUSIONS
This study removes a barrier to future diet sustainability analyses by linking data on food composition from FCID 2001-2010 with nationally representative data on food intake from NHANES 2011-2018. As a result, contemporary dietary data can be linked to publicly available data on agricultural resource use, environmental impacts, consumer food expenditures, and other sustainability indicators, which was not previously possible. All data are made publicly available.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: Data Archiving and Networking Services (DANS) through a Creative Commons license (CC0-1.0). doi: 10.17026/dans-zqx-a23v.

ETHICS STATEMENT
The data collection protocol for the National Health and Nutrition Examination Survey was reviewed and approved by the National Center for Health Statistics Review Board. The patients/participants provided their written informed consent to participate. The data analysis protocol for the present study was reviewed and approved by the Institutional Review Board at William & Mary.

AUTHOR CONTRIBUTIONS
ZC was responsible for data management and analysis, designed the research, and wrote the paper. ZC, AC, CK, EJ, BH, JL, AM, MS, and TW conducted the research. All authors read, edited, and approved the final manuscript.