Edited by: Megan A. McCrory, Boston University, United States
Reviewed by: Janet Tooze, Wake Forest School of Medicine, United States; Francesco Bonomi, University of Milan, Italy
This article was submitted to Nutritional Epidemiology, a section of the journal Frontiers in Nutrition
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Dietary interventions seek to change dietary behaviors, either to affect some clinical outcome or to change the behavior itself. These studies might use only one time point after baseline to assess participant outcomes, or they may be longitudinal, with participant outcomes measured several times over months or years after initial group assignment.
Dietary intervention studies usually require investigators to collect nutrient intake data, such as sodium consumption in study participants, to estimate the effect of the intervention on diet. Yet properly measuring dietary intake, especially over time, is difficult. True nutrient intake is rarely observed directly, so in dietary studies researchers frequently resort to one of two measurement methods: self-report or biomarkers.
Self-reported measures generally rely on participants reporting their dietary intake over some period of time, such as the past 24 h or 7 days. This often takes the form of a food frequency questionnaire (FFQ), in which participants fill out a survey about their eating habits, or a 24-h dietary recall, in which participants report everything consumed over the previous day. Nutrient intake is then estimated from the foods reported as consumed. Biomarkers are biologic specimens from participants, such as blood, urine, or hair, which contain information about a person's nutrient levels. Biomarkers are useful because they measure intake objectively, and some provide unbiased estimates of intake. Therefore, biomarkers may be closer to the "truth" than self-reported methods (though still subject to measurement error), and hence provide a better estimate of a person's nutrient intake.
Unfortunately, biomarkers are often expensive, invasive, and/or difficult to implement in a study.
Both of these methods (biomarkers and self-report) act as "proxy" measurements of true intake: representative, but potentially imprecise, versions of the truth. They are subject to two main types of error: systematic and random. Systematic error, or bias, means that a measure consistently departs from the truth in the same direction (i.e., always higher or lower), and can be hard to detect and analyze statistically. Random error, by contrast, scatters measurements around the truth with no consistent direction.
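To make the distinction concrete, the following toy simulation (not based on the study data; all numbers are hypothetical) contrasts a systematically biased instrument with a noisy but unbiased one:

```python
# Toy illustration of systematic vs. random error (hypothetical numbers).
import numpy as np

rng = np.random.default_rng(0)
true_intake = rng.normal(3600, 800, size=1000)        # "true" sodium, mg/day

biased = 0.75 * true_intake                           # systematic under-reporting
noisy = true_intake + rng.normal(0, 500, size=1000)   # random error only

print(biased.mean() - true_intake.mean())   # large negative shift: bias persists
print(noisy.mean() - true_intake.mean())    # near zero: random error averages out
```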
If researchers are concerned with measurement error, they may have a slight preference for biomarkers, whose objective nature leads to less systematic error, although biomarkers remain subject to random errors such as day-to-day variation in diet.
Given these measurement challenges in nutrition (and many other fields), researchers have developed statistical correction methods, such as regression calibration, that use a reference measure to adjust estimates for measurement error.
The existing measurement error literature in dietary studies, and the correction methods it proposes, typically examines measurement error at one specific time point and/or in a single observational cohort. However, these measurement error patterns may not remain constant in longitudinal lifestyle interventions.
In addition, in randomized controlled trials (RCTs), where individuals are randomly assigned to treatment conditions and the intervention and comparison groups have different experiences, self-reporting behaviors could change over time and/or by treatment assignment. Those in the treatment group may become more cognizant of nutrient intake through intervention exposure, leading to increased reporting accuracy. Participants may also modify their self-reported values (even if not their true intake) to appear compliant with intervention recommendations, which decreases accuracy.
Self-reported precision could also wane over time as participants experience fatigue with repeated reporting.
As a case study, we examined sodium intake in two longitudinal intervention trials: the Trials of Hypertension Prevention (TOHP) and PREMIER.
TOHP was a U.S.-based, multicenter, randomized trial of 2,182 participants testing the efficacy of a lifestyle intervention aimed at lowering diastolic blood pressure (DBP) from the high normal range (80–89 mmHg).
Participants were considered eligible if they were healthy men and women, aged 30 through 54 years, who had high normal DBP and had not taken antihypertensive drugs in the prior 2 months.
Study characteristics and participant demographics in the TOHP and PREMIER studies.
| | TOHP | PREMIER |
| --- | --- | --- |
| N | 751 | 818 |
| Enrollment dates | 1988–1990 | 1999–2001 |
| Timing of sodium assessment | Baseline, 6 months, 18 months | Baseline, 6 months, 18 months |
| Assessment method | 24-h recall; 24-h urine | Two 24-h recalls; 24-h urine |
| Treatment categories | Sodium reduction; Control | Established; Established Plus DASH; Advice Only |
| Male N (%) | 534 (71) | 310 (38) |
| Mean baseline BMI (SD) | 27.3 (3.6) | 33.2 (5.7) |
| Mean baseline age (SD) | 43 (6.4) | 50 (8.9) |
PREMIER was also a U.S.-based, multicenter randomized trial, testing the effects of various lifestyle interventions on blood pressure outcomes in 810 adults with above-optimal DBP (80–95 mmHg) who were not taking antihypertensive medications.
Participants were randomly assigned to one of three treatment groups: Established, Established Plus DASH, or Advice Only. The Established group received guidance on improving their dietary habits (including reducing sodium consumption) and increasing physical activity. Established Plus DASH received an intervention similar to Established but also received education on the DASH diet, a diet high in fruits, vegetables, and low-fat dairy products. Finally, Advice Only received general healthy behavior advice, but no specific counseling on sodium intake or physical activity.
All eligible participants attended a randomization visit, where researchers randomized them to a group and then collected baseline measurements, including two 24-h dietary recalls and a 24-h urine sample. Trial researchers contacted all participants unannounced at 6 and 18 months after enrollment, at which points individuals again provided two 24-h dietary recalls and 24-h urine samples.
Intake of nutrients and food groups was assessed from unannounced 24-h dietary recalls conducted by telephone interviewers. Two recalls (one obtained on a weekday and the other on a weekend day) were obtained at baseline, 6, and 18 months by the Diet Assessment Center of Pennsylvania State University. The Nutrition Data System (NDS), developed and maintained by the Nutrition Coding Center of the University of Minnesota, was used to generate the estimates of individual nutrient intake from the recalls.
We obtained the datasets for TOHP and PREMIER through an online request from the National Heart, Lung, and Blood Institute BioLINCC data repository after receiving IRB approval through Johns Hopkins Bloomberg School of Public Health and Northwestern University.
For both datasets we consolidated the original treatment and control groups into new ones for our purposes. In TOHP, only the sodium reduction group received counseling on sodium management. Hence, we discarded the stress management and weight reduction groups and used only the original control group as the control arm. For the PREMIER study we considered both behavioral intervention groups (Established, Established Plus DASH) as the "treatment" condition and used the Advice Only condition as the control condition. We are interested in whether participants in the sodium reduction interventions more (or less) accurately report their actual sodium intake compared with those in the control conditions, and whether the pattern of measurement error varies over time.
The same data cleaning procedures were used for both studies prior to analysis. First, the biomarker sodium values were converted to dietary sodium values by dividing urine sodium values by 0.86, as only 86% of sodium intake appears in urine. Both the biomarker and self-reported sodium values were then log-transformed for analysis.
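To make these steps concrete, here is a minimal cleaning sketch in Python; the file name and the column names (urine_na, sr_na, month) are hypothetical, and the centering of log self-report anticipates the models described below:

```python
# Minimal cleaning sketch, assuming a long-format file with one row per
# person-visit; file and column names are hypothetical.
import numpy as np
import pandas as pd

df = pd.read_csv("sodium_long.csv")

# Only ~86% of ingested sodium is excreted in urine, so divide urinary
# sodium by 0.86 to put it on the dietary-intake scale.
df["urine_na_diet"] = df["urine_na"] / 0.86

# Subsequent models and plots work on the natural log scale.
df["urine_log"] = np.log(df["urine_na_diet"])
df["sr_log"] = np.log(df["sr_na"])

# Center log self-report at the pooled baseline mean for the regressions.
df["sr_c"] = df["sr_log"] - df.loc[df["month"] == 0, "sr_log"].mean()
```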
Our model of interest is a calibration model in which a reference measure (urinary sodium) is regressed on its self-reported version.
We began by plotting the data to visualize the relationship between urinary sodium and self-reported sodium and to help inform our modeling efforts. We used scatterplots of urinary sodium against self-reported sodium, grouped by time, with an overlaid linear regression line for each condition at each time point.
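A sketch of these exploratory plots, continuing with the hypothetical data frame from the cleaning sketch above (the 0/1 column tx is also an assumed name):

```python
# Exploratory plot sketch: log urinary vs. log self-reported sodium,
# one panel per visit, with a fitted line per treatment condition.
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)
for ax, month in zip(axes, [0, 6, 18]):
    panel = df[df["month"] == month]
    for tx, style in [(0, "solid"), (1, "dashed")]:
        grp = panel[panel["tx"] == tx].dropna(subset=["sr_log", "urine_log"])
        ax.scatter(grp["sr_log"], grp["urine_log"], s=8, alpha=0.4)
        slope, intercept = np.polyfit(grp["sr_log"], grp["urine_log"], 1)
        xs = np.linspace(grp["sr_log"].min(), grp["sr_log"].max(), 50)
        ax.plot(xs, intercept + slope * xs, linestyle=style)
    ax.axline((0, 0), slope=1, color="gray", linewidth=0.5)  # 45-degree line
    ax.set_title(f"Month {month}")
    ax.set_xlabel("log self-reported sodium")
axes[0].set_ylabel("log urinary sodium")
plt.tight_layout()
plt.show()
```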
Mixed effects linear regression was used to model this relationship while accounting for repeated measurements within participants.
For each trial, we started with an initial model that included main effects for follow-up time (indicators for 6 and 18 months) and subjects' self-reported intake, as well as two-way interactions between self-reported intake and time and between time and treatment assignment, and a three-way interaction between self-reported intake, time, and treatment. We allowed each individual to have a random intercept and a random slope on the (log, centered) self-reported values, and used an unstructured covariance matrix to model the random effects.
For each person $i$ at time $t$, the initial model is

$$
\begin{aligned}
Y_{it} ={} & \beta_0 + \beta_1 X_{it} + \beta_2 M6_{it} + \beta_3 M18_{it} + \beta_4 (M6_{it} \cdot TX_i) + \beta_5 (M18_{it} \cdot TX_i) \\
& + \beta_6 (X_{it} \cdot M6_{it}) + \beta_7 (X_{it} \cdot M18_{it}) + \beta_8 (X_{it} \cdot M6_{it} \cdot TX_i) \\
& + \beta_9 (X_{it} \cdot M18_{it} \cdot TX_i) + b_{0i} + b_{1i} X_{it} + \epsilon_{it} \qquad (1)
\end{aligned}
$$

In Equation (1), $Y_{it}$ is log urinary sodium, $X_{it}$ is centered log self-reported sodium, $M6_{it}$ and $M18_{it}$ are indicators for the 6- and 18-month visits, $TX_i$ indicates assignment to the treatment condition, $b_{0i}$ and $b_{1i}$ are the person-specific random intercept and slope, and $\epsilon_{it}$ is residual error.
We excluded a main effect for treatment ($TX_i$) from the model because its coefficient was ~0. This is expected because, by randomization, the treatment and control groups should have similar sodium levels at baseline, at least in expectation; excluding this term removes an unnecessary parameter.
Including the three-way (self-reported intake by time by treatment) interactions in this initial model allows the relationship between urinary sodium and self-reported sodium to vary over time and across the treatment and control groups. We include a time by treatment interaction to examine whether average levels of urinary sodium differ by time and treatment condition at a fixed level of self-report.
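As a concrete illustration only (not the authors' original code), a minimal sketch of this initial model using Python's statsmodels, with the hypothetical column names from the cleaning sketch above:

```python
# Hedged sketch of the initial calibration model in Equation (1).
# Assumed columns: id, urine_log (log urinary sodium), sr_c (centered log
# self-report), t6/t18 (0/1 visit indicators), tx (0/1 treatment indicator).
import statsmodels.formula.api as smf

formula = ("urine_log ~ sr_c + t6 + t18"
           " + t6:tx + t18:tx"             # time x treatment (beta4, beta5)
           " + sr_c:t6 + sr_c:t18"         # self-report x time (beta6, beta7)
           " + sr_c:t6:tx + sr_c:t18:tx")  # three-way (beta8, beta9)

# Random intercept and random slope on sr_c for each participant;
# statsmodels estimates an unstructured covariance for the random effects.
model = smf.mixedlm(formula, data=df, groups="id", re_formula="~sr_c")
fit = model.fit()
print(fit.summary())
```

Note that, matching the description above, the formula omits a tx main effect and a sr_c:tx two-way term.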
A backwards variable selection approach was used to obtain a final analysis model. First, the initial saturated model with the three-way interactions shown in Equation (1) was fit, and the three-way coefficients (β8, β9) were tested; if both coefficients were non-significant, these terms were removed from the model.
In our second-stage model, we tested the significance of the self-report*time terms (β6, β7), which measure whether the relationship between urinary sodium and self-reported sodium changes over time, assuming any change is constant across the treatment and control groups. Once again, if both coefficients were non-significant, these terms were removed.
Our final model allows urinary sodium levels to change across time and treatment status. In this model we test the time*treatment interactions (β4, β5). If both coefficients had been non-significant, we would have removed these terms as well, leaving only main effects for time and self-report.
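The selection logic might look like the following sketch; the 0.05 threshold is an assumption, and all names carry over from the model sketch above:

```python
# Backward-selection sketch: drop a block of interaction terms only when
# every coefficient in the block is non-significant.
import statsmodels.formula.api as smf

TERMS = ["sr_c", "t6", "t18", "t6:tx", "t18:tx",
         "sr_c:t6", "sr_c:t18", "sr_c:t6:tx", "sr_c:t18:tx"]

def fit_mixed(terms, data):
    f = "urine_log ~ " + " + ".join(terms)
    # ML (not REML) fits so that models with different fixed effects are comparable.
    return smf.mixedlm(f, data, groups="id", re_formula="~sr_c").fit(reml=False)

def drop_block_if_nonsig(terms, block, data, alpha=0.05):
    fit = fit_mixed(terms, data)
    if all(fit.pvalues[t] > alpha for t in block):
        terms = [t for t in terms if t not in block]
    return terms

terms = drop_block_if_nonsig(TERMS, ["sr_c:t6:tx", "sr_c:t18:tx"], df)  # three-way
terms = drop_block_if_nonsig(terms, ["sr_c:t6", "sr_c:t18"], df)        # self-report x time
final_fit = fit_mixed(terms, df)
print(final_fit.summary())
```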
After selecting our final model, we standardized the regression coefficients. To standardize the exposure, self-reported intake, we subtracted the pooled (control and treatment) mean self-reported intake at baseline from all self-reported values and then divided the result by the standard deviation of self-reported intake at baseline. The outcome, urinary sodium, was similarly standardized, using the pooled mean and standard deviation of urinary sodium at baseline.
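A short sketch of this standardization step, again using the hypothetical column names from the earlier sketches:

```python
# Standardize exposure and outcome by the pooled baseline mean and SD.
baseline = df[df["month"] == 0]

sr_mu, sr_sd = baseline["sr_log"].mean(), baseline["sr_log"].std()
ur_mu, ur_sd = baseline["urine_log"].mean(), baseline["urine_log"].std()

df["sr_std"] = (df["sr_log"] - sr_mu) / sr_sd
df["urine_std"] = (df["urine_log"] - ur_mu) / ur_sd
```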
Both datasets include people who over- and under-report, across time and treatment status, as shown in the scatterplots below.
Scatterplots of log urinary sodium vs. log self-reported sodium by time and treatment conditions in TOHP. The solid orange (control) and dashed blue (treatment) lines are linear smoothers of urinary sodium as a function of self-reported sodium in each treatment condition. The 45-degree line represents where urinary sodium equals self-reported sodium. Units are on the natural log scale.
Scatterplots of log urinary sodium vs. log self-reported sodium by time and treatment conditions in PREMIER. The solid orange (control) and dashed blue (treatment) lines are linear smoothers of urinary sodium as a function of self-reported sodium in each treatment condition. The 45-degree line represents where urinary sodium equals self-reported sodium. Units are on the natural log scale.
We overlaid a linear smoother on top of each scatterplot to highlight reporting differences between the treatment and control conditions. These lines should be treated as preliminary models, as they are fit separately by time and group and thus do not allow formal model comparisons across time or group, but the relationships between self-reported and biomarker values appear broadly similar. In both studies at baseline, the two study conditions are approximately equal in urinary vs. dietary sodium levels, as expected under randomization.
Using the stepwise procedure described above, neither the three-way interactions in Equation (1) nor the two-way self-report*time interactions were statistically significant in either study, so both sets of terms were removed to obtain the final model.
This model implies that average measured urinary sodium changes over time (β2, β3), and at different rates in the treatment group vs. the control group (β4, β5), but that there is no differential change in the slope of self-reported sodium across groups over time. Notably, the final regression results in the two datasets were very similar to one another.
In TOHP, for a given level of self-reported sodium, urinary sodium in the treatment group was markedly lower at 6 months (β4 = −0.81) and 18 months (β5 = −0.65), while the control group showed a smaller decline at 18 months only (β3 = −0.19).
Standardized regression output from the final regression model in each study.
| Coefficient | TOHP β (95% CI) | TOHP p | PREMIER β (95% CI) | PREMIER p |
| --- | --- | --- | --- | --- |
| Centered self-report β1 | 0.29 (0.23, 0.34) | <0.001 | 0.21 (0.17, 0.26) | <0.001 |
| Month 6 control β2 | 0.03 (−0.10, 0.16) | 0.68 | −0.24 (−0.37, −0.10) | <0.001 |
| Month 18 control β3 | −0.19 (−0.32, −0.06) | 0.005 | −0.08 (−0.21, 0.05) | 0.23 |
| Month 6 Trt. β4 | −0.81 (−1.0, −0.63) | <0.001 | −0.10 (−0.26, 0.06) | 0.20 |
| Month 18 Trt. β5 | −0.65 (−0.84, −0.47) | <0.001 | −0.15 (−0.30, 0.00) | 0.06 |
In PREMIER, urinary sodium for a given level of self-report was significantly lower at 6 months in the control group (β2 = −0.24), while the changes in the treatment group were smaller and did not reach statistical significance (β4 = −0.10, β5 = −0.15).
If the relationship between urinary sodium and self-reported sodium did not change over time or by treatment condition, we would expect β2, β3, β4, β5 = 0. Instead, most of these coefficients are negative, and several significantly so, an indication that the relationship between urinary sodium and self-reported sodium does in fact change over time and by treatment status. In general, for a given level of self-report, urinary sodium is lower at follow-up than at baseline, with the size of the decrease depending on study, time point, and treatment condition.
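As a worked reading of one coefficient, using the standardized TOHP estimates in the table above: because the final model contains no interactions involving self-report, the 6-month treatment-vs-control contrast at a fixed level of self-report reduces to a single coefficient,

$$
E[Y \mid X = x, t = 6, TX = 1] - E[Y \mid X = x, t = 6, TX = 0] = \beta_4 = -0.81,
$$

that is, at 6 months a treatment participant is expected to have urinary sodium roughly 0.81 baseline standard deviations lower than a control participant reporting the same intake.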
We expand on the current nutrition literature by focusing on the differential measurement error structure of self-reported intake, which may arise when the treatment group self-reports its sodium intake with increased or decreased accuracy.
The final models for TOHP and PREMIER look very similar to one another, with slightly different coefficient values. The slopes of self-reported sodium did not change as a function of time or treatment condition. The non-significance of the three-way self-report*time*treatment interaction and of the two-way self-report*time interaction indicates no detectable difference in systematic error, in terms of the slope relating self-reported to urinary sodium, between the treatment arms across the three time points. However, the intercepts do change by time and/or treatment condition, indicating that measurement error is affected by time and/or treatment condition. Further, our final models were much more parsimonious than our initial, fully saturated model. This result suggests that relatively simple measurement error correction models, involving only shifts in the intercept of the calibration model, are sufficient to appropriately correct for measurement error.
In PREMIER, we see a decrease in measured urine sodium, conditioning on self-report, at 6 months in the control group, whereas in TOHP we see a much stronger decrease in the treatment group at 6 and 18 months. These results suggest that the relationship between biomarker and self-report can differ by treatment group and/or time; however, these differences may be study specific.
A failure to take into account differential measurement error could result in biased estimates of the treatment effect. For example, in TOHP at 6-months, for a given level of self-reported sodium, participants in the treatment condition had lower urinary sodium than did control participants. A measurement error correction model that did not take this difference into account would result in an attenuated treatment effect because this difference in reporting would not be incorporated into the difference between groups.
Discrepancies in the literature still exist about the relationship between treatment and self-reporting error. Other studies have found evidence for a relationship between treatment assignment and self-report bias, similar to the results of TOHP. In the Women's Health Eating and Living Study, a longitudinal randomized intervention trial with validation data, reporting accuracy likewise differed by treatment assignment.
One possible solution to examine and address measurement error across time and treatment groups would be internal validation datasets with longitudinal intervention aspects. While this route is resource intensive, it may be worthwhile if it allows researchers to estimate treatment effects with less bias and greater power to detect significant effects. A cheaper or less invasive biomarker would make creating such a dataset more feasible. Another option would be to develop measurement error correction methods that explicitly accommodate these changes, which is why it is important to study how measurement error structures change over time and by treatment status. Siddique et al. have proposed correction methods along these lines.
One limitation of this study is the amount of missing data, with the highest being 29% at 18 months in TOHP and the lowest being 1% at baseline in both studies. The regression models were fit assuming that the missing data were "missing at random" (MAR). This means we assume participants with unobserved dietary sodium information at a given time point have intake values similar to those of observed participants at the same time point, after conditioning on other observed values.
In both studies, the 24-h recalls and the 24-h urine samples were not required to capture the same day of measurement. We assume that these two measures capture estimates of short-term intake. Even so, the limited number of measurements at each time point is likely not adequate to capture usual intake. Estimates from both the biomarker and self-reported data are therefore subject to additional variability due to day-to-day variation in diet.
The biomarker sodium levels—measured through urine—are also subject to additional sources of variability. Urinary sodium excretion may reflect more than 1 day of intake.
We found that the measurement error structure in longitudinal studies can differ by time and treatment condition. When correcting for measurement error, intervention researchers need to take these differences into account, either by designing internal validation studies that are also longitudinal or by implementing measurement error correction methods explicitly designed to account for these changes in measurement error. Lifestyle intervention trials that fail to do this may draw erroneous conclusions from their results.
Publicly available datasets were analyzed in this study. These data can be found in the National Heart, Lung, and Blood Institute BioLINCC repository (https://biolincc.nhlbi.nih.gov/).
The studies involving human participants were reviewed and approved by the Johns Hopkins Bloomberg School of Public Health IRB and the Northwestern University IRB. The patients/participants provided their written informed consent to participate in this study.
JS originally conceived the idea for studying measurement error in longitudinal studies, and created the original model. AP performed the analyses and drafted the manuscript. ES provided assistance on analyses. All authors contributed to the article and approved the submitted version.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.