Validity of Dietary Assessment Methods When Compared to the Method of Doubly Labeled Water: A Systematic Review in Adults

Accuracy in quantifying energy intake (EI) using common dietary assessment methods is crucial for interpreting the relationship between diet and chronic disease. The aim of this systematic review was to evaluate the validity of dietary assessment methods used to estimate the EI of adults in comparison to total energy expenditure (TEE) measured by doubly labeled water (DLW). Articles in English, published between 1973 and February 2019, were retrieved from nine electronic databases. Studies were included if participants were adults (≥18 years) and the DLW technique was used to measure TEE for comparison with self-reported EI. A total of 59 studies were included, with a total of 6,298 free-living adults and a mean of 107 participants per study. The majority of studies, including 16 studies that included a technology-based method, reported significant (P < 0.05) under-reporting of EI when compared to TEE, with few over-reporting EI. Misreporting was more frequent among females compared to males within recall-based dietary assessment methods. The degree of under-reporting was highly variable within studies using the same method, with 24 h recalls having less variation and a lower degree of under-reporting compared to other methods.


INTRODUCTION
The accuracy of measuring food and nutrient intakes using various dietary assessment methods is crucial for interpreting the relationship between diet and the development of diet-related chronic diseases, including type 2 diabetes mellitus, cardiovascular disease, and some cancers (1). These chronic diseases contribute significantly to the global burden of disease (2). The validity of dietary assessment methods plays an important role in accurately describing the dietary patterns and nutrient intakes of populations, comparing dietary intakes to recommended dietary guidelines, and following trends in dietary intakes in populations over time (3)(4)(5). While self-report measures of EI have received criticism, recommendations have been made to minimize bias when collecting, analyzing, and interpreting dietary data assessed using self-reported methods (6).
The incorporation of technologies to assess dietary intake, including by way of smartphone and the Internet, has facilitated key developments in the collection, analysis and interpretation of dietary intake data (7). This includes reducing costs associated with data collection and analysis, lowering subject and researcher burden and facilitating more timely approaches to data analysis (7). However, the emergence of newer dietary assessment methods with technology assisted components, such as image-based methods and wearable devices (e.g., micro-camera) that incorporate technology for data collection, means that a review of the validity of technology based methods is also timely (8,9).
A variety of established self-reported dietary assessment methods exist, including 24 h recalls, diet histories, food frequency questionnaires (FFQs) and food records. Many methods are subject to misreporting, which is often classified as over- or under-reporting (10,11), with an additional selection bias in terms of the type of people who volunteer to participate in these studies, due to high participant burden (12,13). Other potential biases within assessment of dietary intake can stem from issues relating to memory, perception and conceptualization of portion sizes, and knowledge of and confidence with technology, all of which could impact adversely on the accuracy of reported EI (14,15).
Image-based methods require participants to capture digital images of food and beverages pre- and post-consumption with a camera device, and as such are similar to a food record (7). Image-based methods may be susceptible to misreporting due to reactivity bias, in that knowing one must take an image of the foods about to be eaten may influence what foods the person chooses to eat on that occasion (9). In addition, measurement using a technology-based dietary intake method is dependent on, and subject to, the inherent limitations of technology-based approaches: identification of the food and its components, accounting for intra- and inter-individual variability, and complexities (7) related to whether food is consumed from one's own plate or shared plates (16) and/or consumed with additional condiments.
Measuring the validity of dietary assessment tools requires an objective measure that does not face the same inherent errors found in the dietary assessment tool being assessed. The doubly labeled water (DLW) technique is an objective method of measuring total energy expenditure (TEE), and is considered a reference method for evaluating validity of self-reported EI in relatively weight stable individuals (3,4,12). It is also independent of self-reported error (17,18). An initial DLW dose is determined by standardized equations according to body weight. Following consumption, urine samples are collected over a period of seven to 14 days to account for short-term day-to-day variation in physical activity (19).
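The comparison described above is usually summarized as the percentage difference between reported EI and DLW-measured TEE. The following is a minimal sketch of that arithmetic, assuming EI and TEE are expressed in the same unit and that the individual is weight stable; the function names and the example values are hypothetical and are not taken from any included study.

```python
def percent_misreporting(ei: float, tee: float) -> float:
    """Percent difference of reported EI from DLW-measured TEE.

    Negative values indicate under-reporting, positive values over-reporting.
    EI and TEE must be in the same unit (e.g. kJ/day or kcal/day).
    """
    if tee <= 0:
        raise ValueError("TEE must be positive")
    return (ei - tee) * 100.0 / tee


def direction(pct: float) -> str:
    """Label the sign of the difference (no significance test is implied)."""
    if pct < 0:
        return "under-reporting"
    if pct > 0:
        return "over-reporting"
    return "no difference"


if __name__ == "__main__":
    # Hypothetical participant: reports 8,000 kJ/day, DLW gives 10,000 kJ/day.
    pct = percent_misreporting(8000.0, 10000.0)
    print(f"{pct:.1f}% ({direction(pct)})")
```

Individual studies may define the cut-off for misreporting differently (for example, using confidence limits around the EI:TEE ratio), so this sign-based label is only a simplification.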
A previous review (2001) provides valuable insight that EI is consistently under-reported when compared against DLW, with the majority of studies at the time of publication using food records or diaries (17). An additional review by Hill and Davies in 2001 went further to describe characteristics associated with under-reporting, which included: (1) Dietary restraint, (2) Socioeconomic status, and (3) Gender (under-reporting more common in women than in men) (20). A further review by Livingstone and Black (21) detailed additional factors relating to low energy reporters, which included possible cultural influences. However, there have been no reviews in adults since then that investigate the misreporting of energy intake. It is within this context that this review aims to evaluate the validity of self-reported dietary assessment methods in estimating the daily EI of adults (≥18 years) in comparison to TEE measured by DLW.

Search Strategy
Initially, searches of online databases were conducted in Cochrane, CINAHL, MEDLINE, EMBASE, Scopus, Cumulative Index to Nursing and Allied Health Literature, ProQuest, PubMed and Excerpta Medica Database. Keywords and combinations of keywords used included adult, dietary assessment, food frequency questionnaire, dietary recall, 24 h food recall, diet record, food record, food diary, energy intake, energy expenditure, doubly labeled water, valid*, accuracy*, precise* and combinations of all of the above (see Supplementary Material for an example search strategy). Articles retrieved were limited to those published in English-language journals between 1973 and February 2019. The reference lists of articles that met the inclusion criteria were hand searched, and key articles identified were used for further searches via the Web of Science database Cited Reference function. Authors were not contacted for any missing information and gray literature was not searched. The protocol for this review was developed and registered with PROSPERO, an international prospective register of systematic reviews, under the registration number CRD42017064545.

Study Selection
The flow of studies at each stage of the review is depicted in Figure 1. Following the initial database searches, titles and abstracts were screened to determine which studies required full-text retrieval. The full-text articles retrieved were assessed for eligibility using the inclusion criteria. The screening was done by two independent reviewers (YH and TB). Articles were identified as relevant if they were studies that aimed to compare dietary intake with TEE, if they included adult participants (aged ≥18 years), if they reported EI measured by a dietary assessment method, if DLW was used to estimate TEE and if the primary purpose of the study was to validate the dietary assessment method. Full articles were retrieved if eligible for inclusion or if eligibility for inclusion was unclear after screening the abstracts. Articles were reviewed by two independent reviewers (YH and TB). Any disagreement between the two reviewers was resolved by discussion with a third independent reviewer (MR).

Data Extraction and Quality Evaluation
All relevant articles were then independently assessed for quality using the American Dietetic Association quality checklist for primary studies as outlined in the Evidence Analysis Manual (22). A study was rated as "positive" quality if it satisfied a majority of the quality criteria, including four priority criteria pertaining to (1) Selection of study participants, (2) Comparability of study groups, (3) Intervention description and (4) Outcomes. A study was rated as having "neutral" or "negative" quality based on the number of criteria that were met or not met. No studies were excluded from the review based on quality assessments.
Data relevant to this review were extracted using a standardized tool which was initially piloted using four studies, with minor wording changes made for reviewer clarity. Data were then extracted by two independent reviewers (YH, TB), including study design, participant characteristics, dietary assessment method(s) used, and DLW results. Any discrepancy was resolved via discussion with a third reviewer (MR). Dietary assessment methods were categorized using the National Cancer Institute Dietary Assessment Primer definitions (23). Dietary assessment methods with technology components were also recorded if any form of communication and/or information technology was used, such as a mobile phone or smartphone, the Internet, or sensors collecting image, movement or auditory data. The technology could be utilized in the collection, analysis or interpretation of the dietary method.

Population
The search strategy identified 572 records (Figure 1). After review of full-text papers, 59 articles were included and underwent critical appraisal and data extraction. Major reasons for exclusion were: the study did not report dietary validation results (n = 12), not a study (n = 3) or not conducted in an adult population (n = 1). Table 1 summarizes study details including number of participants and anthropometry, dietary assessment methods used and DLW reporting period. Across the 59 included studies there was a total of 6,298 adults. The majority of studies were conducted in free-living settings, with one conducted in a military population (78), one in a clinical population group with short bowel syndrome (36), one in obese pregnant women (60) and one in wrestlers (71). The mean number of participants per study was 107 (ranging from 6 to 1,075), with the age of participants ranging from 18 to 96 years.

Study Design
The reporting period for DLW measurement of TEE ranged from 7 to 22 days (Supplementary Material). Five studies collected additional saliva samples for DLW purposes (31,33,42,45,56) and two also collected blood samples (5,64).
A total of five different dietary assessment methodologies were used across the 59 studies. The most commonly used dietary assessment method was a food record (FR) (n = 36), 12 of which were weighed food records (WFR) (26,27,29,30,33,41,51,56,69,71,72,76). The number of recording days ranged from 2 to 16, with the majority (n = 12) of studies having a reporting period of 7 days. The next most frequently used method was the 24 h recall (n = 24), with the multi-pass method (MPR) used in 13 studies and recall days ranging from two to seven. Seven of the MPR studies had a reporting period of 2 days and an additional six studies reported for 3 days. Of the studies that used a 24 h recall approach (n = 24), the range was from two (42) to 14 recalls (41). A total of 18 studies clearly described that they used non-consecutive days for recalls (5,25,27,31,38,39,42,44,49,50,52,53,59,66,67,72,74,79,82).

Food Record
Of the studies that reported the accuracy of food records at the group level, the majority of studies (n = 19) found significant under-reporting of EI, by 11 to 41% (26,35,42,43,46,53,54,61,65,73,78,80), with over-reporting found in only one study, by 8% (64). Three studies found no significant difference between absolute EI estimated by food record and TEE measured by DLW (31,47,58). Six studies using food records reported outcomes by sex (26,29,41,53,76,83), with three studies (26,29,53) reporting no significant difference between sexes, while one study each identified males (76) and females (83) as having a lower degree of misreporting. One study (41) found that females under-reported while males slightly over-reported.
Two additional studies reported a negative correlation (35,46) between EI reporting accuracy and BMI while no association with BMI was reported in two studies (56,72). Two studies found that individuals with overweight and obesity were more likely to under-report compared to normal weight individuals (54,80), although only one study reported this difference to be statistically significant (p = 0.032) (80).

Food Record with technology component
Technology was applied to the food record method most commonly using a digital camera (n = 4) (45, 67, 68, 71), a mobile phone (image based) (n = 3) (55, 60, 69), a wearable camera (n = 1) (65), the Internet (n = 1) (43), and a PDA (n = 1) (58). Of the studies that used a digital camera, three studies reported under-reporting of 6, 17, and 24%, respectively (45,68,71), while one study found no significant difference between EI and TEE (67). However, those with overweight or obesity were more likely to over-report EI. Image-based methods using a smartphone under-reported EI by between 20 and 37% compared to DLW (54), and in one study where a wearable camera was used in addition to a food record and compared with a food record alone, the use of the wearable camera reduced the level of under-reporting from 34 to 30% (65).
One study found EI was over-reported in a clinical group of individuals with short bowel syndrome (36).

24 h MPR with technology component
Technology was mostly added to 24 h recalls through use of a web-based system to assist in standardizing the multiple-pass approach (25,31,63). In one study, the 24 h MPR method was compared with the same method but with the addition of a wearable camera (39). While both methods were found to under-report EI in comparison with DLW, the camera-based method had a lower degree of under-reporting (13 and 7% for females and 17 and 9% for males for the 24 h MPR and 24 h MPR with camera, respectively) (39). The camera used in this study was a wearable camera worn around the neck with movement, heat, and light sensors.

Studies Using Multiple Methods
Seven studies reported outcomes of EI misreporting using three different dietary methods within the same study. The combination of dietary assessment methods most often used was a 24 h recall, FFQ and food record (n = 5) (27,31,63,72,73). Three studies reported that under-reporting was lowest for the MPR method (31,63,73), while one reported that it was lowest for the food record (72) and one reported that it was lowest for the FFQ (27).

Food Frequency Questionnaire
Significant under-reporting of EI was found at the group level in all studies using an FFQ when compared to the DLW method. EI under-reporting ranged from 4.6 to 42% (5, 24, 25, 27, 31, 34, 37, 38, 48, 50, 54, 61, 62, 66, 72-74, 76, 77). One study showed no significant difference between reported EI and TEE on average when using an FFQ adapted from a validated FFQ among low-income women in Brazil; however, at the individual level significant misreporting remained (49). Three studies compared the validity of different FFQs, i.e., the Block FFQ vs. the National Cancer Institute's Diet History Questionnaire (DHQ) (72) and a full vs. brief FFQ, i.e., Meal-Q vs. MiniMeal-Q (28,34,72,77). No significant difference in validity was found between the Block FFQ and DHQ, with both having similar, significant EI under-reporting, by ∼27% in 20 female adults (72). The other study found significant (P < 0.001) under-reporting of 30 and 36% by Meal-Q and MiniMeal-Q, respectively. The difference between EI estimated by Meal-Q and MiniMeal-Q was found to be significant (P < 0.001) (34). In the study by Sawaya et al. (72), both FFQs were also found to under-report EI in young females.
One study using an FFQ identified that individuals with obesity under-reported to a greater extent than their non-obese counterparts (50). Another study indicated that the difference between the EI from an FFQ and the DLW method was significantly correlated with BMI (r = 0.50) (48). One study used an FFQ, known as the Short Dietary Questionnaire (SDQ), and identified that EI was significantly (P < 0.001) under-reported by ∼26%, and that females with overweight/obesity under-reported more than normal weight females (77).

Diet History
Four out of the five studies found EI was under-reported by 1.3-47% (26,30,35,70). One study found females under-report to a greater extent than males by 47 and 1.3% respectively (26).

DISCUSSION
The aim of the current review was to evaluate the validity of self-reported dietary assessment methods used to estimate EI of adults in comparison to TEE measured by the DLW method. A total of 59 studies were included, which utilized a number of dietary assessment methods, of which food records were the most commonly used method (n = 36). The main finding from the review is that EI was underestimated for the majority of dietary assessment methods, in the range of 11-41% for food records, 1.3-47% for diet histories and 4.6-42% for FFQs. The method with the lowest degree of under-reporting and the lowest level of variation was found to be 24 h recalls, with underestimations of EI ranging from 8 to 30%. This variation could be attributed to recall bias, length of reporting period and use of visual aids to estimate portion size.
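As a rough illustration of the comparison above, the quoted ranges can be tabulated and ordered by their width. This is only a sketch: the range width is used here as a crude proxy for variation (an assumption of this example, not a statistic reported in the review), and the variable and function names are hypothetical.

```python
# Ranges of EI under-reporting (%) quoted in the paragraph above.
UNDER_REPORTING_RANGES = {
    "food record": (11.0, 41.0),
    "diet history": (1.3, 47.0),
    "FFQ": (4.6, 42.0),
    "24 h recall": (8.0, 30.0),
}


def summarize(ranges):
    """Return (method, low, high, width) rows sorted by range width."""
    rows = [
        (method, low, high, round(high - low, 1))
        for method, (low, high) in ranges.items()
    ]
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    for method, low, high, width in summarize(UNDER_REPORTING_RANGES):
        print(f"{method:<13} {low:>5.1f} to {high:>5.1f}%  width {width:>5.1f}")
```

A range reflects only the two most extreme studies for each method, so this ordering should not be read as a formal measure of between-study variability.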
Methods utilizing a technology component are relatively new compared to traditional methods. They are often more appropriate for some population groups when compared to more traditional methods, such as individuals with language barriers (84). They can also help reduce reliance on respondents' memory and assist with estimating portion size by capturing intakes in real time via images and/or audio recordings (85). The current review included 15 studies that used a technology component, with only two studies making direct comparisons with traditional methods. The Handheld PDA and the Remote Food Photography Method (RFPM), both categorized as food records, were found to have a lower degree of misreporting; however, these technologies were only supported by one study each (55,58). For many studies in the current review, the technology component was primarily utilized in the collection phase (31,34,39,43); however, this was unclear in many studies. To date, research estimating EI using wearable devices has been limited to small sample sizes, a limited variety of foods and controlled environments (8,86). Objective measurement of intake in larger sample sizes and free-living individuals is required to determine the performance of technology-based methods, including those that utilize sensors or wearable devices (7).
The current review also identified sex differences in the validity of EI, with females having a greater tendency than males to misreport EI when using MPR (38,39,53), diet history (26) and FFQ. However, for food records and FFQ the differences by sex in self-reported EI were inconsistent (37,76). In study populations of adults with overweight or obesity, under-reporting of EI was identified to a greater degree compared to adults with normal weight when comparing MPR (50,73), diet history and food record to TEE using DLW. These results could reflect a range of reasons, including difficulty capturing dietary intake using the aforementioned methods in this population group, such as differences in portion size or frequency of consumption, as well as dieting practices in these individuals, which have been reported previously (87).
In this systematic review, 32 studies used the method of triads (i.e., 2+ measures of diet + DLW) to evaluate the validity of dietary assessment methods (e.g., FFQ, 24 h recall, DLW). Nine of these studies used a technology-assisted method (25,31,39,48,57,65,67,69,71). The method of triads is a statistical approach occasionally used in dietary assessment research (88)(89)(90). This method began to be utilized for validation of dietary assessment methods in the twentieth century and involves three separate methods to measure dietary intake. These could include a primary method, a reference method and a biomarker (90). The method assumes linearity between the three measurements and the true intake, and independence between the three measurement errors. There are several limitations and systematic errors known to affect this approach, including the occurrence of correlation coefficients >1 or negative coefficients, which limits its application (90).
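To make the triads calculation concrete, the sketch below follows the formulation commonly given in the validation literature, in which each validity coefficient is the square root of the product of two pairwise correlations divided by the third. The labels questionnaire, reference and biomarker, the parameter names r_qr, r_qb and r_rb, and the example correlations are assumptions for illustration only and are not values from the included studies.

```python
import math


def triad_validity(r_qr: float, r_qb: float, r_rb: float) -> dict:
    """Validity coefficients from three pairwise correlations.

    r_qr: correlation between questionnaire and reference method
    r_qb: correlation between questionnaire and biomarker
    r_rb: correlation between reference method and biomarker
    Returns NaN where the coefficient is undefined (negative or zero ratio terms).
    """
    def coeff(num_a: float, num_b: float, den: float) -> float:
        if den == 0:
            return math.nan
        ratio = num_a * num_b / den
        # A negative ratio has no real square root.
        return math.sqrt(ratio) if ratio >= 0 else math.nan

    return {
        "questionnaire": coeff(r_qr, r_qb, r_rb),
        "reference": coeff(r_qr, r_rb, r_qb),
        "biomarker": coeff(r_qb, r_rb, r_qr),
    }


def out_of_range(coeffs: dict) -> list:
    """Names of coefficients that are undefined or exceed 1."""
    return [name for name, v in coeffs.items() if math.isnan(v) or v > 1.0]


if __name__ == "__main__":
    # Hypothetical pairwise correlations.
    result = triad_validity(r_qr=0.5, r_qb=0.4, r_rb=0.5)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
    print("problem coefficients:", out_of_range(result))
```

The out_of_range helper flags the two failure modes mentioned above: a coefficient greater than 1, and a coefficient that cannot be computed because a pairwise correlation is negative or zero.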
Interestingly, FFQ was the most common method used in the included validation studies (n = 12) (5, 25, 38, 48-50, 54, 61, 66, 74, 76, 77). Similar to other methods, FFQs significantly underestimated EI, and their reliability is low due to the degree of variation in underestimation across studies, with under-reporting ranging from 4.6 to 42%. This may be driven by variation within the FFQ method itself, such as length of reporting period and number of foods and beverages on the questionnaire. Despite this, other dietary assessments, including diet history, FR, WFR, 24 h recall, 24 h MPR and SDQ, also underestimated EI. Investigating ways to improve the accuracy of EI estimates is needed, and technology-based methods may help to better capture portion size and reduce participant burden (84).
The limitations of using self-reported EI from dietary assessment methods have been previously reported (6,91). These include the fact that the timeframe of DLW measurement does not necessarily overlap with the period over which EI was measured. If the total EI of participants was atypical during the DLW measurement period, the degree of under- or overestimation would be greater than usual. It should also be acknowledged that TEE measured by DLW is not always equal or nearly equal to energy intake in non-weight-stable individuals (92,93). True misreporting of EI may have occurred in the included studies. A lack of agreement between methods may be the result of reporter bias or reactivity, which occurs when individuals change their dietary behavior due to greater awareness of the measurement of their dietary intake. Reactivity may stem from an individual's desire to reduce burden by simplifying the reporting process (e.g., consuming single foods rather than combination foods) or to comply with socially desirable norms (i.e., to appear to have a healthy diet by reporting intake as recommended in dietary guidelines).

CONCLUSION
The majority of dietary assessment methods included in the current review were found to significantly underestimate EI when compared to TEE measured using the DLW technique. The degree of under-reporting was highly variable across all methods; however, 24 h recalls were associated with a lower degree of misreporting and less variation in the degree of under-reporting compared to other dietary assessment methods.