1 Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, MD.
2 Department of Mathematics, Statistics and Computer Science, Bar Ilan University, Ramat Gan, Israel.
3 Dunn Human Nutrition Unit, Medical Research Council, Cambridge, United Kingdom.
4 Nutritional Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD.
5 Applied Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD.
6 Department of Statistics, Texas A&M University, College Station, TX.
ABSTRACT |
---|
biological markers; dietary assessment methods; epidemiologic methods; measurement error; models, statistical; model selection; regression analysis; research design
Abbreviations: AIC, Akaike Information Criterion; BIC, Bayes Information Criterion; FFQ, food frequency questionnaire
INTRODUCTION |
---|
Usually, in large studies, a relatively inexpensive method of measurement, such as a food frequency questionnaire (FFQ), is employed. Investigators now recognize that errors in the values reported on FFQs can profoundly affect the results and interpretation of nutritional epidemiologic studies (3–5). Dietary measurement error often attenuates (biases toward 1) the estimated disease relative risk and reduces the statistical power to detect an effect. An important relation between diet and disease may therefore be obscured.
Realization of this problem has prompted the integration into large epidemiologic investigations of calibration substudies that involve a more intensive but presumably more accurate dietary reporting method, called the "reference" instrument. Typically, the instruments chosen for reference measurements have been multiple-day food records, sometimes with weighed quantities instead of estimated portion sizes, or multiple 24-hour recalls. FFQs have been "validated" against such instruments, and correlations between FFQs and reference instruments, sometimes adjusted for within-person random error in the reference instrument, have been quoted as evidence of FFQ validity (6, 7). Additionally, on the basis of such studies, statistical methods have been employed to adjust FFQ-based relative risks for measurement error (8), using the regression calibration approach.
The correct application of the regression calibration approach relies on the assumptions that errors in the reference instrument are uncorrelated with 1) true intake and 2) errors in the FFQ (9). Throughout this paper, we take these two conditions as requirements for a valid reference instrument.
Recent evidence suggests that these assumptions may be unwarranted for dietary report reference instruments. Studies involving biomarkers, such as doubly labeled water for measuring energy intake and urinary nitrogen for measuring protein intake (10–16), suggest that reports using food records or recalls are biased (on average, towards underreporting) and that individuals may systematically differ in their reporting accuracy. This could mean that all dietary report instruments involve bias at the individual level, although direct evidence for individual macronutrients other than protein is not yet available. Part of the bias may depend on true intake (which manifests itself in what we call group-specific bias), thereby violating the first assumption for a reference instrument. Part of the bias may also be person-specific (defined below in detail) and may correlate with its counterpart in the FFQ, thereby violating the second assumption.
For this reason, Kipnis et al. (9) proposed a new measurement error model that allows for person-specific bias in the dietary report reference instrument as well as in the FFQ. Using sensitivity analysis, they showed that if the correlation between person-specific biases in the FFQ and the reference instrument was 0.3 or greater, the usual adjustment for measurement error in the FFQ would be seriously incorrect. However, the paper presented no empirical evidence that such correlations exist.
In this paper, we present results of a reanalysis of a calibration study conducted in Cambridge, United Kingdom (17–19) that employed urinary nitrogen excretion as a biomarker for assessing nitrogen intake (20) in addition to the conventional dietary instruments. The biomarker measurements allowed us to generalize the model by Kipnis et al. (9) and further explore the structure of measurement error in dietary assessment instruments and its implications for nutritional epidemiology.
MODELS AND METHODS |
---|
![]() | (1) |
Fitting model 1 to observed intake Q instead of true intake T yields a biased estimate of the exposure effect. To an excellent approximation (21), the expected observed effect is expressed as
![]() | (2) |
![]() | (3) |
Although, in principle, when measurement error eQ is correlated with true exposure T, the factor λ1 that multiplies the true effect could be negative or greater than 1 in magnitude, in nutritional studies λ1 usually lies between 0 and 1 (22) and can be thought of as an attenuation of the true effect α1.
Measurement error also leads to loss of statistical power for testing the significance of the disease-exposure association. Assuming that the exposure is approximately normally distributed, the sample size required to reach the requested statistical power for a given exposure effect is proportional to (22)
![]() | (4) |
Commonly used measurement error adjustment
Following equations 2 and 3, the unbiased (adjusted) effect can be calculated by dividing the observed effect by the estimated attenuation factor. Estimation of the attenuation factor usually requires simultaneous evaluation of additional dietary intake measurements made by the reference instrument in a calibration substudy. The common approach in nutritional epidemiology, introduced and made popular by Rosner et al. (8), uses food records/recalls as reference measurements (F), assuming that they are unbiased instruments for true long-term nutrient intake at the personal level. For person i and repeat measurement j, the common model can be expressed as
![]() | (5) |
![]() | (6) |
![]() | (7) |
![]() | (8) |
![]() | (9) |
The calibration data
The data were obtained from a dietary assessment validation study carried out at the Medical Research Council's Dunn Clinical Nutrition Center, Cambridge, United Kingdom (17). One hundred and sixty women aged 50–65 years were recruited through two general medical practices in Cambridge. Subjects from practice 1 (group 1) were studied from October 1988 to September 1989, and those from practice 2 (group 2) were studied from October 1989 to September 1990. The principal measures for this study were a 4-day weighed food record and two 24-hour urine collections obtained on each of four occasions (seasons) over the course of 1 year. Season 1 was October–January; season 2, February–March; season 3, April–June; and season 4, July–September.
The weighed food record was the primary dietary report instrument of interest. The weighed records were obtained using portable electronic tape-recorded automatic scales that automatically record verbal descriptions and weights of food without revealing the weight to the subject. Each 4-day period included different days chosen to ensure that all days of the week were studied over the year, with an appropriate ratio of weekend days to weekdays.
Urine specimens were checked for completeness with p-aminobenzoic acid and were used to calculate urinary nitrogen excretion (23). Since it is estimated that approximately 81 percent of nitrogen intake is excreted through the urine (20), the urinary nitrogen values were adjusted, dividing by 81 percent, to estimate the total nitrogen intake of each individual. Subjects were asked to collect the first 24-hour urine sample on the third or fourth day of their food record procedure and the second sample 3–4 days later.
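The adjustment of urinary nitrogen to estimated intake is a single division. The following is a minimal sketch, assuming urinary nitrogen is recorded in g/day; the function name and the sample values are hypothetical and are not taken from the study data.

```python
# Fraction of nitrogen intake assumed to be excreted in urine (81 percent, as stated in the text).
URINARY_FRACTION = 0.81


def estimate_nitrogen_intake(urinary_nitrogen_g_per_day, fraction=URINARY_FRACTION):
    """Return estimated total nitrogen intake (g/day) from 24-hour urinary nitrogen."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must lie in (0, 1]")
    return urinary_nitrogen_g_per_day / fraction


# Hypothetical 24-hour urinary nitrogen values (g/day) for one subject in one season.
urine_samples = [8.1, 9.72]
adjusted = [estimate_nitrogen_intake(u) for u in urine_samples]
```

With these illustrative inputs, the two adjusted values are approximately 10 and 12 g/day.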
In this analysis, we studied nitrogen intake (g/day) and analyzed the Oxford FFQ, which is based on the widely used FFQ of Willett et al. (24), modified to accommodate the characteristics of a British diet. Nitrogen in foods is analyzed directly and then converted to dietary protein content using established factors of 5.18–6.38 (25). The FFQ was administered 1 day before the start of the weighed food record in season 3. We used the weighed food record as the dietary report reference instrument and the adjusted urinary nitrogen measurements as the biomarker. Urinary nitrogen has long been used as a critical measure of protein nutriture in nitrogen balance studies (20, 26–39), and adjusted urinary nitrogen appears to provide a marker for nitrogen intake that is valid as a reference instrument, as defined in the Introduction. (See the Appendix for more details.)
Note that both weighed food records and urinary nitrogen measure intake over a short period of time, while the FFQ assesses diet during the previous year. Therefore, errors in weighed food records and urinary nitrogen may reflect seasonal patterns in food consumption, but FFQ errors should not, in principle, contain seasonality.
In all of our analyses, we applied logarithmic transformation to the data to better approximate normality. Table 1 lists the mean values and variances of the transformed data according to instrument and season.
Suppose that the common assumptions (equations 5–9) for a reference instrument hold for the weighed food record. We would then expect that using the common approach (8) with the weighed food record as the reference instrument should lead to nearly the same estimated attenuation as using the urinary nitrogen as the reference instrument. Figures 1 and 2 display scatterplots of averaged weighed food record data versus FFQ data and averaged urinary nitrogen data versus FFQ data, respectively; the slopes of the regression lines give the estimates of the respective attenuation factors. The former method yielded an estimated attenuation factor of 0.282, while the latter estimated it as 0.187; using a statistical test based on their bootstrap distributions, the difference between these two estimates is statistically significant (p = 0.022). This important finding means that the attenuation caused by measurement error in the FFQ is in fact more severe than it would appear when using the weighed food record as the reference instrument. If we accept the previously stated assumptions concerning urinary nitrogen, this result suggests that the weighed food record does not satisfy at least one of the two major requirements for a reference instrument, namely, that its error be unrelated to true intake and independent of error in the FFQ.
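The comparison of the two attenuation estimates can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' procedure: the data are synthetic with hypothetical parameter values, and the paired bootstrap with a two-sided sign-based p value is one plausible way to build a test from bootstrap distributions, since the text does not specify the details.

```python
import random


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def bootstrap_slope_difference(q, f, m, n_boot=1000, seed=0):
    """Resample subjects with replacement and return the bootstrap values of
    slope(F on Q) - slope(M on Q)."""
    rng = random.Random(seed)
    n = len(q)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        qb = [q[i] for i in idx]
        fb = [f[i] for i in idx]
        mb = [m[i] for i in idx]
        diffs.append(slope(qb, fb) - slope(qb, mb))
    return diffs


# Synthetic log-scale data (hypothetical parameters, not the study's estimates).
rng = random.Random(1)
q, f, m = [], [], []
for _ in range(160):
    t = rng.gauss(0.0, 0.2)               # true intake (centered)
    r = rng.gauss(0.0, 0.2)               # person-specific bias in the FFQ
    s = 0.8 * r + rng.gauss(0.0, 0.1)     # correlated person-specific bias in the food record
    q.append(0.5 * t + r + rng.gauss(0.0, 0.2))   # FFQ
    f.append(0.8 * t + s + rng.gauss(0.0, 0.1))   # averaged food record
    m.append(t + rng.gauss(0.0, 0.1))             # averaged biomarker

lambda_record = slope(q, f)   # attenuation estimated with the food record as reference
lambda_marker = slope(q, m)   # attenuation estimated with the biomarker as reference
diffs = bootstrap_slope_difference(q, f, m)

# Two-sided p value: twice the smaller share of bootstrap differences on either side of zero.
below = sum(d <= 0 for d in diffs)
above = sum(d >= 0 for d in diffs)
p_value = min(1.0, 2 * min(below, above) / len(diffs))
```

In this simulated setting, the correlated person-specific biases inflate the slope obtained with the food record relative to the slope obtained with the biomarker, which is the qualitative pattern reported for the study data.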
The difference r between within-person bias and its group-specific component varies from person to person and may be determined by personality characteristics such as susceptibility to social/cultural influences. We will call it person-specific bias. Note that this error component is part of within-person systematic error and will be reproduced in repeated measurements on the same individual.
Gathering all of the error components together, we model the FFQ intake Qij for individual i and repeat measurement (season) j as
![]() | (10) |
Model for the dietary report reference instrument. As we have argued, we need to allow for systematic group-specific and person-specific biases in dietary report reference instruments. Thus, we now make the same assumptions regarding the error structure for the reference instrument as for the FFQ and use a model which is analogous to that of model 10.
In the Medical Research Council study, each individual i was requested to provide the weighed food record in each (j) of the four seasons. We model these data as
![]() |
![]() | (11) |
Note that the term si in equation 11 is parallel to the term ri in equation 10 for the FFQ. Since the same personality traits can influence both person-specific biases, one may anticipate that the two will have a nonzero correlation ρ(r,s).
Because there was only one application of the FFQ in the Medical Research Council study (17), we cannot estimate the variance of the person-specific bias r and the variance of the within-person random error in the FFQ separately, only their sum. Thus, we can estimate the covariance between r and s and the correlation between s and the sum of r and the FFQ random error, but not the correlation ρ(r,s) itself. The estimable correlation will be smaller in magnitude than ρ(r,s), because the FFQ random error is independent of s.
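Why the estimable correlation understates ρ(r,s) can be seen from a short calculation: adding an independent error to r leaves the covariance with s unchanged but inflates the variance. The sketch below assumes the FFQ random error is independent of both r and s; the function name and the numerical values are hypothetical.

```python
import math


def observable_correlation(rho_rs, var_r, var_error):
    """Correlation between (r + error) and s when the error is independent of r and s.

    cov(r + error, s) = cov(r, s), while var(r + error) = var_r + var_error,
    so the correlation shrinks by sqrt(var_r / (var_r + var_error)).
    """
    return rho_rs * math.sqrt(var_r / (var_r + var_error))


# Hypothetical values: true correlation 0.5, bias variance 0.03, FFQ error variance 0.01.
observed = observable_correlation(0.5, 0.03, 0.01)
```

Because the shrinkage factor never exceeds 1, an estimate of the observable correlation serves as a lower bound on the magnitude of ρ(r,s).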
Model for the biomarker. As we mentioned above, it is reasonable to assume that adjusted urinary nitrogen has errors that are unrelated to true intake and to errors in dietary assessment instruments. The Medical Research Council study included two repeat urinary nitrogen measurements in each of the four seasons. Letting j denote season (j = 1, 2, 3, 4), as before, and k denote the repeat measurement within the season (k = 1, 2), we write this model as
![]() | (12) |
As we explain in the Appendix, external evidence suggests that the variance of the person-specific bias, wi, is very small relative to the variance of other terms in the model. Therefore, we assume in our main analysis that its variance is actually zero, and we show in the Appendix that our results do not change appreciably when other reasonable values of the variance are used.
Unlike model 10–11 for dietary assessment methods, which is not identifiable without biomarker data (9), model 12 with a specified value for the variance of wi, such as zero, is identifiable on its own. Fitting it to the Medical Research Council data supports the assumption that the within-person random errors in the biomarker are mutually independent (i.e., they are not correlated within season) and have constant variances within seasons but not between seasons. In particular, season 2 has a different error variance than the other three seasons, which have similar variances, so the biomarker error variance is taken to be common to seasons 1, 3, and 4 and distinct in season 2.
In contrast, the variances of the within-person random errors in the FFQ and in the weighed food record (uij) are assumed to be constant for all i and j; this assumption is supported by examination of plots of residuals after fitting model 10–12 to the data. The within-person random errors in the FFQ, the weighed food record, and the biomarker are assumed to be mutually independent, except when the instruments are administered in the same season, in which case seasonal fluctuations in diet are assumed to produce nonzero correlation between the weighed food record errors uij and the biomarker errors. To verify that FFQ errors were not affected by seasonality, we initially allowed for nonzero correlations between the FFQ error and each of the weighed food record and biomarker errors in season 3. As we expected, these correlations were found to be very small and statistically nonsignificant, and we did not include them in the final model.
Model 10–12 involves 20 unknown parameters. From the data, we can estimate 19 unique variances and covariances. These, together with an assumed value for the variance of wi, allow us to estimate all of the parameters of the model. In practice, we use the method of maximum likelihood for estimation, which increases efficiency when there are missing values in the data.
Alternative measurement error models
Several alternatives to measurement model 10–12 have been proposed in the literature. In table 2, we list six models that are special cases of (and nested within) the more general model 10–12. These include the common model of Rosner et al. (8) and models proposed by Freedman et al. (42), Kaaks et al. (40), Spiegelman et al. (43), and Kipnis et al. (9). The defining manner in which each model departs from model 10–12 is given in the table. To test the significance of the correlation between person-specific biases in the FFQ and the weighed food record, we also included in the comparison a version of model 10–12 with ρ(r,s) = 0.
Plummer and Clayton (19) suggest a quite general model (their model II(c)) that includes our model as a special case. They do not consider person-specific biases but allow group-specific biases to vary in repeat administrations of the same instrument. In addition, within-person random errors are assumed to be correlated, both across repeat administrations of the same instrument and across instruments, with the exception of errors in the biomarker. These are assumed to be correlated across repeat administrations within the same season and with errors in dietary report instruments in the same season but to be independent of measurements taken in different seasons. Moreover, all of the correlations and variances that are assumed to exist are allowed to differ from one another.
Prentice (44) suggested a model similar to that presented by Kipnis et al. (9), except that he explicitly assumed that ρ(r,s) = σs/σr (9). However, all model parameters are allowed to depend on body mass index, and we do not include his model in this comparison.
MODEL COMPARISON USING MEDICAL RESEARCH COUNCIL DATA |
---|
Model comparison results
The results of model comparison are given in table 3. Ideally, one aims to find a model that passes the goodness-of-fit test, is not significantly different from any more complex model, provides a significantly better fit than all models nested within it, and has the highest AIC and BIC scores among all models. For the Medical Research Council data, model 10–12 emerges as best by these criteria. First, it is one of only four models, together with its two simplified versions and the model of Plummer and Clayton (19), to pass the goodness-of-fit test. Second, it does not fit the data significantly worse than the more general model of Plummer and Clayton. The likelihood ratio χ² statistic comparing the two models is 38.8 (38.8 = 1,173.2 − 1,134.4), based on 37 degrees of freedom (37 = 56 − 19) (p = 0.39). Third, model 10–12 provides a significantly better fit (p ≤ 0.0011) than any model nested in it. For example, comparing it with its version with uncorrelated person-specific biases, the likelihood ratio χ² statistic is 10.7 (10.7 = 1,134.4 − 1,123.7), based on 1 degree of freedom (1 = 19 − 18), with a p value of 0.0011.
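The two likelihood ratio tests quoted above can be reproduced from the reported fit statistics and parameter counts. The sketch below uses only the Python standard library; `chi2_sf` is a hypothetical helper that evaluates the chi-square upper-tail probability with the standard finite-sum closed forms for integer degrees of freedom, and is not taken from the paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        # Odd df: normal tail term plus a finite sum of odd powers of sqrt(x).
        root = math.sqrt(x)
        total = math.erfc(root / math.sqrt(2.0))
        coef = 2.0 * math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
        term = root  # first term of the sum: sqrt(x) / 1
        for r in range(1, (df - 1) // 2 + 1):
            if r > 1:
                term *= x / (2 * r - 1)  # next term: multiply by x / (2r - 1)
            total += coef * term
        return total
    # Even df: exp(-x/2) times a truncated exponential series in x/2.
    term, total = 1.0, 1.0
    for r in range(1, df // 2):
        term *= (x / 2.0) / r
        total += term
    return math.exp(-x / 2.0) * total


# Model 10-12 versus the more general model of Plummer and Clayton.
lr_general = 1173.2 - 1134.4            # 38.8
p_general = chi2_sf(lr_general, 56 - 19)

# Model 10-12 versus its version with uncorrelated person-specific biases.
lr_nested = 1134.4 - 1123.7             # 10.7
p_nested = chi2_sf(lr_nested, 19 - 18)
```

Rounded, the two p values agree with the 0.39 and 0.0011 reported in the text.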
Attenuation of estimated effect and statistical power
Table 4 displays the estimates of the most interesting parameters for model 10–12 and the common model. They include the attenuation factor λ1, the variance of true intake, the correlation ρ(Q,T) between the FFQ and true usual intake, and the slopes βQ1 and βF1 that represent group-specific biases in the FFQ and weighed food record, respectively. For all parameters, except the variance of true intake, there are major differences between model 10–12 and the common approach. First, the slope of the regression of the weighed food record on true intake, βF1, assumed to be 1 in the common approach, is estimated as 0.766 in our model, thereby demonstrating the flattened slope phenomenon in the reference instrument. In addition, the common approach suggests that the slope in the regression of the FFQ on true intake, βQ1, is 0.661 and the correlation ρ(Q,T) between the FFQ and true usual intake is 0.432, while our model estimates them as 0.430 and 0.284, respectively, indicating much less accuracy.
DISCUSSION |
---|
Our statistical framework allows evaluation of two major common assumptions about a dietary report reference instrument: 1) there is no correlation between its measurement error and true intake; and 2) there is no correlation between its measurement error and that of the FFQ. Our results using the Medical Research Council data suggest that both assumptions are violated because of the presence of both group- and person-specific biases in the weighed food record and the correlation of the person-specific bias with that in the FFQ.
The statistical model we used rests on the requirement that the urinary nitrogen marker for nitrogen intake does itself satisfy assumptions 1 and 2 above. Assumption 1 is supported by several studies, documented in the Appendix, that have examined urinary nitrogen under various controlled feeding situations. Assumption 2 is based on the strong intuition that discrepancies between this biomarker measurement and true intake are caused by physiologic factors and therefore will be unrelated to errors in a dietary report instrument.
We have thus demonstrated that, at least for these data, the weighed food record may well be a flawed reference instrument. There still remains the question, Do these flaws translate into anything of importance? We believe that they do. As was shown above, using the common approach yields the estimated attenuation factor of 0.282, but it is estimated as 0.187 when using the new model. In addition, the estimated correlation between the FFQ-based nitrogen intake and true intake is 0.432 by the common approach but only 0.284 by the new model. This correlation is used as a measure of the FFQ validity, and its squared value represents the loss in statistical power to test the significance of a disease-exposure association. Thus, for these data, the real effect of measurement error in the FFQ is a greater attenuation (51 percent) and a greater loss of power (52 percent) for testing the true effect than would be estimated by the common approach.
Our estimates of the attenuation factor also indicate that the common approach may lead to unexpectedly underpowered studies. For the Medical Research Council data, our model suggests the need for a study 2.3 times larger than would have been designed had the common approach been used.
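The percentages and the sample size ratio quoted in this section follow from simple arithmetic on the reported estimates. The sketch below assumes that the required sample size is inversely proportional to the squared correlation between the FFQ and true intake, with the variance of true intake taken as equal under both models; the variable names are illustrative.

```python
# Estimates quoted in the text (common approach versus the new model).
attenuation_common, attenuation_new = 0.282, 0.187
corr_common, corr_new = 0.432, 0.284

# Relative excess of the common-approach estimates over the new-model estimates.
attenuation_excess = attenuation_common / attenuation_new - 1   # about 0.51
correlation_excess = corr_common / corr_new - 1                 # about 0.52

# Ratio of required sample sizes under the stated proportionality assumption.
sample_size_ratio = (corr_common / corr_new) ** 2               # about 2.3
```

Under these assumptions, the squared ratio of the two correlations gives the factor of approximately 2.3 by which the study would need to be enlarged.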
In summary, our results suggest that the impact of measurement error in dietary assessment instruments on the design, analysis, and interpretation of nutritional studies may be much greater than has been previously suspected, at least regarding protein intake. Both the attenuation of diet effect and the loss of statistical power in FFQ-based epidemiologic studies may be greater than previously estimated, because of the use of dietary reporting methods as reference instruments. This means that current and past studies may be underpowered and may explain some of the null results that have been found in nutritional epidemiology. There is a need to confirm our results by conducting further studies with biomarkers.
Our paper covers only the analysis of protein intake unadjusted for total energy intake. Further work is needed on the effects of measurement error on the analysis of protein density or energy-adjusted protein intake (6), an approach that is often used in nutrition analyses. This will require simultaneous consideration of both energy intake, using a biomarker such as doubly labeled water (10), and protein intake, using urinary nitrogen excretion. Black et al. (16) reported results from a small study with such data that supported a correlation between underreporting of protein and underreporting of energy, but also higher rates of underreporting of energy than of protein. As was reported previously (45), the effect of measurement error in energy-adjusted models can be more complex than in univariate analysis. Therefore, further studies are needed in which data from questionnaires, dietary report reference instruments, and biomarkers for protein and energy intakes are all collected and analyzed simultaneously to investigate the effects of measurement error on protein density or energy-adjusted protein intake.
APPENDIX |
---|
Among studies with varying levels of controlled conditions in which protein intakes were provided at levels necessary to maintain a positive nitrogen balance (a near-given in diets in developed countries), the long-term ratio of urinary nitrogen to dietary nitrogen among individuals is generally within a range of 70–90 percent (20, 26–39). Bingham and Cummings (20) specifically addressed the question of nitrogen output and validation of dietary intakes in a rigorously controlled feeding study of eight adults adhering to their regular diets and found that the mean ratio of urinary nitrogen to dietary nitrogen was 81 percent, with a standard error of 2 percent (range, 78–83 percent). In other well-controlled studies, group means have ranged from 77 percent to 88 percent (26–32). Generally, urinary nitrogen is robust in free-living adults, except when there is inadequate total energy and/or protein intake, inadequate essential amino acid intake, a very high fiber intake, or profuse sweating (46–49). None of these conditions are prevalent in adequately nourished populations, and a range of 70–90 percent represents a realistic range for biologic variability in the ratio of urinary nitrogen to dietary nitrogen that does not depend on age, gender, and source of protein, as long as subjects maintain a positive nitrogen balance. This is supported by different studies that measured this range in old and young participants and in men and women with soy, egg, meat, or mixed sources of protein in their diets (20, 26–39).
Nevertheless, the ratio of urinary nitrogen to dietary nitrogen does not represent an exact biologic constant and may still include interperson variability, or person-specific bias. Three studies described by Bingham and Cummings (20), Oddoye and Margen (28), and Castaneda et al. (32) and two studies described by Young et al. (39) provided information on within-person variation in the ratio (R) of urinary nitrogen to dietary nitrogen and therefore can be analyzed by analysis of variance to estimate and/or test the presence of person-specific bias in the urinary nitrogen biomarker. These five studies represent a valuable subsample of the controlled feeding studies and include both men (20, 28, 39) and women (32), young (28, 39), middle-aged (20), and elderly (32) participants, and a variety of protein sources, including soy protein (39), meat-free protein (32), formula diets (26), beef protein (39), and usual diet (20).
We carried out a meta-analysis of these five studies using a random effects model for ratio R that included both a random study effect and, nested in it, a random person effect (person-specific bias). The study effect was very small and not statistically significant (p = 0.21), while the person effect w was also relatively small but highly statistically significant (p = 0.0008). These results provide some evidence that although ratio R does not seem to depend on age, gender, and source of protein intake, it does contain a small person-specific bias. After we pooled all of the participants from the five studies and fitted a random effects model with a random effect representing person-specific bias, the variance of this bias was estimated as 0.0027 (standard deviation 5.2 percent). The mean long-term ratio of urinary nitrogen to dietary nitrogen was estimated as 83.5 percent (standard error 2.3 percent), which agrees well with the calibration constant of 81 percent suggested by Bingham and Cummings (20). The mean ratio (83.5 percent) and the standard deviation of its person-specific bias (5.2 percent) agree well with the general observation that individual ratios fall between 70 percent and 90 percent.
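The link between the estimated variance and the quoted standard deviation, and the spread of individual ratios it implies, can be checked directly. This is a rough sketch: the interval of the mean plus or minus two standard deviations assumes an approximately normal distribution of person-specific bias, which is an assumption of this illustration rather than a statement from the study.

```python
import math

mean_ratio = 0.835      # estimated mean ratio of urinary to dietary nitrogen
bias_variance = 0.0027  # estimated variance of the person-specific bias

bias_sd = math.sqrt(bias_variance)   # about 0.052, i.e., 5.2 percent

# Approximate range covering most individuals under the normality assumption.
low = mean_ratio - 2 * bias_sd
high = mean_ratio + 2 * bias_sd
```

The resulting interval, roughly 73 to 94 percent, is broadly in line with the 70–90 percent range cited for individual ratios.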
These results suggest that urinary nitrogen level satisfies both requirements for a reference instrument. The stability of the urinary nitrogen:dietary nitrogen ratio and the relatively low person-specific bias support the essential absence of correlation between errors in adjusted urinary nitrogen and true nitrogen. The relatively low person-specific bias and the fact that the bias is probably physiologically based rather than psychologically based also support the essential absence of correlation between errors in adjusted urinary nitrogen and errors in dietary report instruments.
It is interesting to note that the estimated variation due to person-specific bias in the urinary biomarker for protein intake constitutes only about 10 percent of the estimated variation of true protein intake. Nevertheless, to investigate how this person-specific bias might change the result of our model fit, we conducted a sensitivity analysis by including person-specific bias in the biomarker model and changing its variance from 0 (the value assumed in the main text) to 0.0027 (the value estimated in this appendix). The results are reported in appendix table 1. The estimated attenuation factor was not affected by the presence of person-specific bias in the biomarker, since this bias does not violate the two major requirements for the reference instrument. Other parameters in the model changed slightly. The estimated variance of true intake was reduced by the variation due to person-specific bias. The estimated correlation between true intake and its FFQ measure was increased by 4.5 percent, and the estimated slopes in the regressions of FFQ and weighed food record on true intake were increased by approximately 10 percent each. However, the general conclusions reached in the paper remain the same.