From the Department of Epidemiology, German Institute of Human Nutrition, Potsdam-Rehbrücke, Germany.
Received for publication October 10, 2001; accepted for publication June 20, 2002.
ABSTRACT
calibration; diet; epidemiologic methods; questionnaires
Abbreviations: EPIC, European Prospective Investigation into Cancer and Nutrition; FFQ, food frequency questionnaire.
INTRODUCTION
Standardization requires a reference method that is 1) applicable to all centers and all ethnic groups in the same manner and 2) more accurate than the FFQ itself. It is sufficient to have reference measurements for subgroups of randomly selected subjects in each center and group. In the European Prospective Investigation into Cancer and Nutrition (EPIC) and the Hawaii–Los Angeles Multiethnic Cohort studies, 24-hour recalls are used as a reference assessment method (6–8). Other recommended reference methods are diet records and biomarkers (1). Note that, compared with FFQs, they all refer to very short periods of exposure. Thus, a standardization procedure should allow for the possible nonconformity of time periods of the assessment methods being compared.
Standardization can be carried out by the technique of calibration. For each center, a calibration function has to be determined on the basis of measurement pairs available for a subgroup. Denote dietary intake data obtained by FFQ and the reference method as Q and R, respectively; then, the calibration function is a function of Q that approximates R in some sense and that will be applied afterwards to the FFQ data for all subjects in the center. An often-used statistical procedure to find a good approximation is linear regression based on the method of least squares or on the maximum likelihood method (9, 10). If a positive correlation between Q and R is assumed, the corresponding calibration function is strictly monotonically increasing and therefore does not change the rankings of the subjects. Linear regression calibration also guarantees that the arithmetic means of the calibrated FFQ intakes and the reference measurements always coincide for each center. However, apart from the equality of means, the distribution of the calibrated questionnaire data can be very different from that expected from the reference measurements for the same period of exposure. In particular, the estimated variance and the range of measurements are often too small after linear regression calibration. This discrepancy is a serious weakness of linear regression calibration if applied in a multicenter study. In a pooled data set of calibrated dietary intake, the rankings for one center can deviate markedly from those expected from the reference measurements. Consequently, the center effect and the diet effect on a disease are confounded, implying biased results in nutritional epidemiology.
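The variance deficit of linear regression calibration can be checked numerically. The following Python sketch uses purely hypothetical simulated data (the names `regression_calibrate`, `usual`, `q`, and `r` are illustrative, not from the study) and an ordinary least-squares fit of R on Q. The calibrated values reproduce the mean of R and keep the ranking of Q, but their variance equals ρ² times the variance of R, where ρ is the Pearson correlation between Q and R, and it falls well below the variance of the simulated usual intake.

```python
import numpy as np


def regression_calibrate(q, r):
    """Linear regression calibration: least-squares fit of R on Q, applied to Q."""
    q = np.asarray(q, dtype=float)
    r = np.asarray(r, dtype=float)
    beta = np.cov(q, r, ddof=1)[0, 1] / np.var(q, ddof=1)  # slope
    alpha = r.mean() - beta * q.mean()                      # intercept
    return alpha + beta * q


# Hypothetical simulated data for one center (not study data)
rng = np.random.default_rng(0)
n = 500
usual = rng.normal(80.0, 15.0, n)              # usual intake, g/day
q = 0.7 * usual + 20.0 + rng.normal(0, 12, n)  # FFQ with a scaling bias
r = usual + rng.normal(0, 10, n)               # single reference measurement

cal = regression_calibrate(q, r)
rho = np.corrcoef(q, r)[0, 1]

print("mean R:", r.mean(), "mean calibrated:", cal.mean())
print("var calibrated:  ", np.var(cal, ddof=1))
print("rho^2 * var R:   ", rho**2 * np.var(r, ddof=1))
print("var usual intake:", np.var(usual, ddof=1))
```

The identity var(calibrated) = ρ² var(R) holds for any least-squares fit, so the shrinkage is a property of the method rather than of this particular simulation.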
In this paper, a nonlinear calibration approach is proposed that prevents the confounding effects caused by pooling data. Its application ensures that the rankings for one center are similar in both pooled data sets, that of calibrated FFQ intake and that of estimated usual intake. Thus, the reference data control the ranking of calibrated data from different centers. To ensure the same time frame as the FFQ, the calibration procedure starts by estimating the usual dietary intake distribution from the short-term reference measurements for each center. Then, the estimated center-specific usual intake distribution is approximated by using a strictly monotonically increasing but nonlinear function of Q. The calibrated FFQ data can be considered standardized long-term dietary intake, where standardization refers to mean, variance, skewness, and kurtosis.
The following section describes the proposed method in detail. It is then applied to data from a validation study performed within the EPIC-Potsdam study (11). In this paper, men and women are considered two groups with different FFQ biases, and dietary intake is calibrated separately. Using these data, we compare the nonlinear method with three different linear calibration methods, including the classic linear regression calibration.
A NONLINEAR CALIBRATION METHOD
In step 1, the reference measurements $R_{ij}$ are transformed to bring their sample distribution closer to normality. To achieve this, the well-known two-parameter family of Box-Cox transformations (16) is used, which is defined by

$$g(R) = \begin{cases} \dfrac{(R + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \lambda_1 \neq 0, \\[4pt] \ln(R + \lambda_2), & \lambda_1 = 0. \end{cases} \qquad (1)$$

We confine ourselves to Box-Cox transformations with a power parameter $\lambda_1$ that is zero or the inverse of a positive integer. This restriction simplifies the back-transformation because an exact formula can be used (refer to the Appendix). Because of the parameter restriction, we cannot estimate $\lambda_1$ and $\lambda_2$ by using the common maximum likelihood method (16–18). Therefore, we apply a grid search procedure to maximize the Shapiro-Wilk statistic, the statistic most often applied to test the hypothesis of an underlying normal distribution. A special macro was written by using SAS software (SAS Institute, Inc., Cary, North Carolina), where $\lambda_1$ varies over the grid (1, 1/2, 1/3, ..., 1/10, 0) and $\lambda_2$ varies over the same grid multiplied by the mean of the original data.
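Step 1 can be sketched in Python as follows. This is a minimal illustration assuming NumPy and SciPy (`scipy.stats.shapiro`), not the authors' SAS macro; the function names and the simulated log-normal data are hypothetical. It evaluates the two-parameter Box-Cox transformation of equation 1 over the grid just described and keeps the pair $(\lambda_1, \lambda_2)$ with the largest Shapiro-Wilk statistic W. As a conservative assumption, grid points that leave some shifted value $R + \lambda_2$ non-positive are skipped.

```python
import numpy as np
from scipy import stats


def box_cox(r, lam1, lam2):
    """Two-parameter Box-Cox transformation g(R) of equation 1."""
    shifted = np.asarray(r, dtype=float) + lam2
    if lam1 == 0:
        return np.log(shifted)
    return (shifted ** lam1 - 1.0) / lam1


def grid_search_box_cox(r):
    """Return (W, lam1, lam2) maximizing the Shapiro-Wilk statistic W.

    lam1 runs over 1, 1/2, ..., 1/10, 0; lam2 over the same grid times mean(R).
    """
    r = np.asarray(r, dtype=float)
    grid = [1.0 / p for p in range(1, 11)] + [0.0]
    best = None
    for lam1 in grid:
        for g in grid:
            lam2 = g * r.mean()
            # Assumption: skip shifts that leave non-positive values
            if np.any(r + lam2 <= 0):
                continue
            w, _ = stats.shapiro(box_cox(r, lam1, lam2))
            if best is None or w > best[0]:
                best = (w, lam1, lam2)
    return best


# Hypothetical right-skewed reference data (one nutrient, g/day)
rng = np.random.default_rng(1)
r = rng.lognormal(mean=4.0, sigma=0.5, size=300)
w_best, lam1_best, lam2_best = grid_search_box_cox(r)
w_raw, _ = stats.shapiro(r)
print(f"W raw = {w_raw:.4f}, W best = {w_best:.4f}, "
      f"lam1 = {lam1_best}, lam2 = {lam2_best:.2f}")
```

Because W is invariant to affine transformations, the grid point $\lambda_1 = 1$ reproduces the untransformed data, so the search can never do worse than leaving R unchanged.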
Subsequent steps 2 and 3 are based on the assumption that the classic measurement error model with normally distributed components holds for the transformed reference measurements $X_{ij} = g(R_{ij})$. This model has the form

$$X_{ij} = T_i + \varepsilon_{ij}. \qquad (2)$$

Here, $T_i$ denotes the true usual intake of the ith individual on the transformed scale, where "usual" refers to a long-term daily average, in most cases an average over 1 year. The error term $\varepsilon_{ij}$ includes the day-to-day variation as well as the random measurement error. Because $T_i$ and $\varepsilon_{ij}$ are assumed to be independent and normally distributed with expectations $\mu$ and 0 and variances $\sigma_T^2$ and $\sigma_\varepsilon^2$, respectively, the resulting distribution of $X_{ij}$ is also normal, with expectation $\mu$ and the summed variance $\sigma_T^2 + \sigma_\varepsilon^2$. Moreover, the average $\bar X_i$ for the ith individual, derived in step 3, has the same distribution as $X_{ij}$, with the only exception being that the variance is reduced to

$$\sigma_T^2 + \frac{\sigma_\varepsilon^2}{k}. \qquad (3)$$
Here, k denotes the number of replicates. Finally, the variable defined by

$$\tilde X_i = \mu + (\bar X_i - \mu)\sqrt{\frac{\sigma_T^2}{\sigma_T^2 + \sigma_\varepsilon^2/k}} \qquad (4)$$

has the same distribution as the transformed usual intake $T_i$. Therefore, if the standard estimators for the unknown parameters are applied, $T_i$ can be estimated by

$$\hat T_i = \bar X_{\cdot} + (\bar X_i - \bar X_{\cdot})\sqrt{\frac{s_{\bar X}^2 - s_\varepsilon^2/k}{s_{\bar X}^2}} \qquad (5)$$

in step 3, where $s_{\bar X}^2$ and $s_\varepsilon^2$ denote the empirical variances of the individual means and of the within-individual deviations, respectively, and $\bar X_{\cdot}$ stands for the grand mean. The ratio on the right-hand side of equation 5 is called the shrinkage factor because it is always less than 1. Thus, $\hat T_i$ is a shrinkage estimator that shifts the individual mean $\bar X_i$ toward the grand mean to remove the remaining intraindividual variation in the individual means. This estimator should not be confused with an empirical Bayes or Stein-type estimator, which has a similar form but a different motivation. Note that the quantity under the square-root sign in this equation can be negative. In this rare case, the variance component of $T_i$ should be estimated by using the nonnegative minimum biased invariant estimator of Hartung (19).
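The shrinkage step translates into a few lines of array code. The sketch below is illustrative only: it assumes a balanced design (every subject has the same number k of replicates), estimates $\sigma_\varepsilon^2$ by the pooled within-subject variance, and simply raises an error when the variance estimate under the square root is negative instead of implementing Hartung's estimator (19). The function name `shrink_to_usual` and the simulated data are hypothetical.

```python
import numpy as np


def shrink_to_usual(x):
    """Shrinkage estimator of transformed usual intake (equation 5).

    x: array of shape (n_subjects, k) with transformed replicates X_ij;
    assumes a balanced design with the same k for every subject.
    """
    x = np.asarray(x, dtype=float)
    _, k = x.shape
    ind_means = x.mean(axis=1)                 # individual means
    grand_mean = ind_means.mean()              # grand mean
    s2_means = ind_means.var(ddof=1)           # empirical variance of means
    s2_within = x.var(axis=1, ddof=1).mean()   # pooled within-subject variance
    s2_true = s2_means - s2_within / k         # estimate of sigma_T^2
    if s2_true < 0:
        # The paper recommends Hartung's estimator here; not implemented.
        raise ValueError("negative variance component estimate")
    factor = np.sqrt(s2_true / s2_means)       # square root of shrinkage ratio
    return grand_mean + (ind_means - grand_mean) * factor, s2_true


# Hypothetical data: 1,000 subjects with k = 12 replicates each
rng = np.random.default_rng(2)
t_true = rng.normal(5.0, 1.0, size=(1000, 1))
x = t_true + rng.normal(0.0, 2.0, size=(1000, 12))
t_hat, s2_true = shrink_to_usual(x)
print("var of individual means:", x.mean(axis=1).var(ddof=1))
print("var of shrunken values: ", t_hat.var(ddof=1), "=", s2_true)
```

By construction, the shrunken values keep the grand mean and the ranking of the individual means, while their empirical variance equals the estimated $\sigma_T^2$.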
In step 4, the estimated usual intake is back-transformed to the original scale of the reference measurements by integrating the inverse function $g^{-1}(t + \varepsilon)$ over the normal distribution of the error term $\varepsilon$, ending with usual intakes in the original scale. Instead of approximating the integral as proposed in the original Nusser method (12), an explicit formula for the integral can be used in the back-transformation step to simultaneously improve the accuracy and reduce the computational effort (refer to the Appendix). Applying the back-transformation formula ensures the equality of the arithmetic means of the original and back-transformed data, provided that the distribution of the transformed data is approximately normal or at least symmetric.
The second phase, consisting of steps 5 and 6, is the centerpiece of the calibration procedure because it relates the questionnaire measurements to the estimated usual intake. In effect, the FFQ data are standardized so that they are distributed similarly to the usual intake obtained in step 4. First, a power transformation

$$f(Q_i) = (Q_i + d)^c \qquad (6)$$

is applied to approximate the skewness and kurtosis of the estimated usual intake distribution. In principle, any optimization method for solving equations that allows for the two parameter restrictions $c > 0$ and $d > -\min Q_i$ can be used. We estimated the parameters by minimizing the sum of squared deviations, where $c$ varied in an interval $(0, c_{\max}]$ and $d$ in an interval $(-\min Q_i, d_{\max}]$. Next, a linear function

$$f^*(Q_i) = a f(Q_i) + b = a(Q_i + d)^c + b \qquad (7)$$

is determined to have the same mean and standard deviation as the estimated usual intake distribution. Note that a linear transformation does not change the skewness and kurtosis of a distribution and therefore does not undo step 5. The function $f^*$ represents the nonlinear calibration function. In general, calibrated values will be positive; in the rare case of negative values, they should be replaced by zero. Since the parameter $a$, defined as a ratio of two standard deviations, and the power $c$ are both positive, the nonlinear calibration function is always strictly monotonically increasing and does not change the ranking of the subjects.
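Steps 5 and 6 can be sketched as a grid search followed by moment matching. The Python code below is a hypothetical illustration, not the authors' implementation. It assumes that the "sum of squared deviations" refers to the differences in sample skewness and excess kurtosis between $f(Q)$ and the estimated usual intake, uses SciPy's `stats.skew` and `stats.kurtosis`, and picks the defaults `c_max=3` and `d_max` equal to the mean of Q purely for illustration. All names and data are made up.

```python
import numpy as np
from scipy import stats


def fit_nonlinear_calibration(q, u, c_max=3.0, d_max=None, n_grid=60):
    """Fit f*(Q) = a (Q + d)^c + b (equations 6 and 7) to target intakes u.

    Assumed criterion: squared deviations in skewness and excess kurtosis;
    a and b then match the mean and standard deviation of u.
    """
    q = np.asarray(q, dtype=float)
    u = np.asarray(u, dtype=float)
    target = np.array([stats.skew(u), stats.kurtosis(u)])
    if d_max is None:
        d_max = q.mean()                                   # illustrative default
    c_grid = np.linspace(c_max / n_grid, c_max, n_grid)    # (0, c_max]
    d_grid = np.linspace(-q.min(), d_max, n_grid + 1)[1:]  # (-min Q, d_max]
    best = None
    for c in c_grid:
        for d in d_grid:
            fq = (q + d) ** c
            dev = np.array([stats.skew(fq), stats.kurtosis(fq)]) - target
            loss = float(dev @ dev)
            if best is None or loss < best[0]:
                best = (loss, c, d)
    _, c, d = best
    fq = (q + d) ** c
    a = u.std(ddof=1) / fq.std(ddof=1)   # ratio of two standard deviations
    b = u.mean() - a * fq.mean()         # matches the means
    return a, b, c, d


def calibrate(q, a, b, c, d):
    """Apply f* and replace the rare negative values by zero."""
    return np.maximum(a * (np.asarray(q, dtype=float) + d) ** c + b, 0.0)


# Hypothetical data: estimated usual intakes and skewed FFQ reports
rng = np.random.default_rng(3)
u = rng.normal(80.0, 12.0, 400)
q = np.exp(3.0 + 0.015 * u + rng.normal(0.0, 0.3, 400))
a, b, c, d = fit_nonlinear_calibration(q, u)
q_cal = calibrate(q, a, b, c, d)
print(f"a={a:.3f}, b={b:.2f}, c={c:.2f}, d={d:.2f}")
print("skewness FFQ / calibrated / usual:",
      stats.skew(q), stats.skew(q_cal), stats.skew(u))
```

A grid search mirrors the interval restrictions on c and d stated above; a continuous optimizer with the same bounds would serve equally well.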
APPLICATION AND COMPARISON OF NONLINEAR AND LINEAR CALIBRATION
As shown in table 2, we compared the arithmetic mean, standard deviation, skewness, and kurtosis of the sample distribution of all 24-hour recalls; the distribution of individual mean intake as the average of 12 recalls; and the distribution of estimated usual intake. For energy, total protein, total fat, and total carbohydrate, and for both genders, the sample distribution was flat with a high standard deviation reflecting both inter- and intraindividual variation. As a result of reduced intraindividual variation, the individual means of repeated recalls had a lower standard deviation along with increased lower and decreased higher percentiles. Since usual intake was estimated after all intraindividual variation was eliminated, its distribution was even more shrunken than that of the individual means. Whereas the arithmetic means of the three distributions coincided, skewness and kurtosis were generally closer to zero if intraindividual variation was reduced or eliminated. Thus, the day-to-day individual variation in dietary intake had a higher degree of nonnormality than the variation in usual intake in the population. Altogether, table 2 demonstrates that the estimated usual intake distribution clearly differed from the sample distribution of short-term reference measurements.
In tables 3 and 4, four different calibration methods were evaluated by comparing the distribution of calibrated FFQ intake with that of usual intake for energy, total protein, total fat, and total carbohydrate. In addition to the nonlinear calibration method described in this paper and classic linear regression calibration, two further linear methods were included. Additive and multiplicative calibration were defined by linear functions with only one parameter. Whereas the slope was fixed at 1 in the additive procedure, multiplicative calibration had an intercept of 0. In both cases, the single parameter was estimated by equating the means of the calibrated and reference measurements.
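The two one-parameter methods are simple enough to state directly. In this hypothetical Python sketch (function names and toy values are illustrative), the additive method shifts Q by the difference of the means, and the multiplicative method rescales Q by the ratio of the means. Both reproduce the reference mean and preserve rankings (the multiplicative one assuming positive means), but neither adapts the shape of the distribution: the additive method keeps the standard deviation of Q, and the multiplicative method scales it by the same factor as the mean.

```python
import numpy as np


def additive_calibration(q, r):
    """Slope fixed at 1; intercept chosen so that the means coincide."""
    q = np.asarray(q, dtype=float)
    return q + (np.mean(r) - q.mean())


def multiplicative_calibration(q, r):
    """Intercept fixed at 0; slope chosen so that the means coincide."""
    q = np.asarray(q, dtype=float)
    return q * (np.mean(r) / q.mean())


# Hypothetical toy values (g/day)
q = np.array([50.0, 60.0, 90.0])   # FFQ, mean 200/3
r = np.array([70.0, 75.0, 80.0])   # reference, mean 75
add = additive_calibration(q, r)        # shift by 75 - 200/3
mult = multiplicative_calibration(q, r)  # scale by 75 / (200/3) = 1.125
print(add)
print(mult)
```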
After separate calibration of FFQ data for men and women, we pooled both data sets. We also pooled the estimated usual intake values for both genders. To describe the order in the pooled data set, we considered the following two proportion functions:

$$P_L(x) = \frac{\text{number of men with intake} < x}{\text{number of subjects with intake} < x} \times 100\%$$

and

$$P_H(x) = \frac{\text{number of men with intake} > x}{\text{number of subjects with intake} > x} \times 100\%.$$

If x is chosen as a low intake, $P_L(x)$ can be interpreted as the percentage of men among low consumers; in the case of a high intake value x, $P_H(x)$ represents the percentage of men among high consumers. Obviously, the corresponding proportion functions for women are simply the differences between these values and 100 percent. If the two proportion functions $P_L(x)$ and $P_H(x)$ of the calibrated data are similar to those for estimated usual intake, between-group validity of the calibrated intakes is high. Otherwise, the calibration method fails to rank men and women correctly.
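The proportion functions are straightforward to compute on a pooled data set. The following Python sketch is illustrative only; the function name `proportion_men`, its `below` switch, and the toy data are hypothetical. Evaluating it over a range of thresholds x for calibrated and for usual intake gives curves of the kind compared in figures 1 and 2.

```python
import numpy as np


def proportion_men(x, intake, is_man, below=True):
    """Percentage of men among subjects with intake < x (below=True) or > x."""
    intake = np.asarray(intake, dtype=float)
    is_man = np.asarray(is_man, dtype=bool)
    selected = intake < x if below else intake > x
    if not selected.any():
        return float("nan")   # undefined when nobody passes the threshold
    return 100.0 * is_man[selected].mean()


# Hypothetical pooled intakes (g/day) with gender labels
intake = np.array([50, 60, 70, 80, 90, 100])
is_man = np.array([False, False, True, False, True, True])
p_low = proportion_men(75, intake, is_man, below=True)    # 1 man among 3
p_high = proportion_men(75, intake, is_man, below=False)  # 2 men among 3
print(p_low, p_high)
```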
The effect of calibration on the mixture of groups in a pooled data analysis was illustrated for total protein intake (figures 1 and 2). In figure 1, the proportion of men among study participants whose total protein intake was less than x was shown for varying threshold values x. The proportion of men increased from 0 percent to 55 percent when the upper limit of usual protein intake was increased from 55 g/day to 100 g/day. The proportion functions of usual and nonlinearly calibrated intake were very close, reflecting a similar mixture of groups in the ordered sample. In contrast, the proportion function for linear regression remained equal to zero up to an intake of 72 g/day and was steeper in the narrow interval from 75 g/day to 85 g/day. Consequently, linear regression separated the data for the two genders more strongly than it should have. On the other hand, additive and multiplicative calibration diluted the separation of genders by protein intake. The same effect can be seen in figure 2, which considers the percentage of men among individuals whose protein intake was above a specified threshold x.
DISCUSSION
In this paper, we described and applied a standardization procedure based on nonlinear calibration. The proposed method is characterized by three features. First, it is aimed at usual intake rather than at the short-term intake directly assessed by the reference measurements. This focus on usual intake is relevant for applications in nutritional epidemiology because long-term habitual dietary intake is the primary exposure of interest. The nonlinear, multistep procedure enables us to eliminate the unwanted intraindividual variation in the reference measurements and to synchronize the time frames of the intake data obtained by both assessment methods. Second, the calibration procedure yields an approximation of the whole usual intake distribution, not only of its mean. In contrast to all linear calibration methods, there is no common correction factor for low, medium, and high reported intake. Rather, a flexible correction function ensures a good overall fit of the distribution and a simultaneous treatment of over- and underestimation depending on the reported intake. Third, the method handles different degrees of nonnormality of the intake data. Thus, it can also be applied to highly skewed reference measurements, a common phenomenon in nutritional data.
In a complex study, standardization of dietary intake requires separate calibration of centers, groups, or substudies. The chosen calibration method should maintain the within-center validity and establish a high between-center validity of the assessment method. If validity is interpreted as the capability to rank subjects correctly, within-center validity can be maintained by using a strictly monotonically increasing calibration function at each center. In that case, the Spearman correlation coefficient between reference measurements and FFQ after calibration equals the one determined before calibration. None of the calibration methods evaluated in this paper affects within-center validity because they do not change the ranking of subjects within centers. However, they clearly differ in their capability to reach a high between-center validity. The empirical results presented in this paper demonstrate that only the described nonlinear method can rank intakes from different centers similarly to the expected ranking of usual intake.
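The invariance of the Spearman coefficient follows because a strictly increasing function leaves every rank unchanged. A small Python check illustrates this; it uses hypothetical simulated data, an arbitrary calibration function of the form of equation 7 with made-up parameters, and `scipy.stats.spearmanr`. The Pearson correlation, by contrast, is not invariant under a nonlinear transformation.

```python
import numpy as np
from scipy import stats

# Hypothetical data for one center
rng = np.random.default_rng(4)
r = rng.gamma(shape=4.0, scale=20.0, size=300)         # reference intakes
q = r * rng.lognormal(mean=0.0, sigma=0.4, size=300)   # FFQ intakes


def calibrate(qv):
    """Illustrative strictly increasing calibration a (Q + d)^c + b, a, c > 0."""
    return 1.8 * (qv + 5.0) ** 0.6 + 10.0


rho_before, _ = stats.spearmanr(r, q)
rho_after, _ = stats.spearmanr(r, calibrate(q))
pearson_before = np.corrcoef(r, q)[0, 1]
pearson_after = np.corrcoef(r, calibrate(q))[0, 1]
print("Spearman:", rho_before, rho_after)        # identical rank correlations
print("Pearson: ", pearson_before, pearson_after)
```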
It is well known that linear regression calibration in a one-center epidemiologic study can be performed either before or after relative risks are estimated. The two approaches, sometimes referred to as the imputation and risk correction methods, yield the same final estimates (9). In multicenter studies, however, this property of linear regression calibration no longer holds. Thus, we must decide at which point data should be calibrated. From a theoretical point of view, calibration is a data-processing step, and risk estimation is part of the statistical analysis that requires processed data; consequently, relative risk estimates should already be based on calibrated data so that later correction is not necessary. Since the bias of the assessment method chosen for a multicenter study depends on the center, calibration must be performed in each center separately before data are pooled. Thus, relative risks should be estimated by using pooled calibrated data. Doing so ensures that the often-criticized overall measures of meta-analysis are not needed.
The proposed nonlinear calibration method requires repeated reference measurements. Repetitions are necessary to obtain an estimate of intraindividual variance, which acts as a connecting link between the observed distribution of reference measurements and the usual intake distribution. Without knowledge of intraindividual variation, shrinkage of the individual means to the grand mean on the normal scale cannot be quantified. Other assumptions concerning the reference measurements tacitly made in this paper, such as equal intraindividual variances and the absence of nuisance effects, can be avoided by performing initial data adjustments well known in the estimation theory of usual intake distributions (12–14). However, to apply the proposed calibration method, one assumption must always be fulfilled: The subjects selected for reference measurements must be representative of the study population. A random selection procedure and a sufficiently large number of selected subjects should ensure that the usual intake distributions of the study population and the subgroup do not differ greatly.
Nonlinear calibration is not only a standardization procedure but also a method to improve the accuracy of FFQs, supposing that reference measurements are more accurate and reliable. The reference methods commonly used are 24-hour recalls and food records, although in the last few years there has been broad discussion about biomarkers being the preferred reference method. Because the accuracy of the calibrated FFQ strongly depends on the accuracy of the reference method, the bias of reference measurements is the crucial issue, and a continuing search for better reference methods is necessary. The proposed nonlinear calibration procedure should be applied only if well-accepted reference measurements are available.
ACKNOWLEDGMENTS
APPENDIX
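Let t be any value on the transformed scale. Since t was measured with an error $\varepsilon$, the inverse function $g^{-1}$ of the Box-Cox transformation (16) must be applied to the term $t + \varepsilon$. Subsequent integration over the error distribution yields the general formula

$$\bar g^{-1}(t) = \int_{-\infty}^{\infty} g^{-1}(t + \varepsilon)\,\varphi(\varepsilon)\,d\varepsilon \qquad (8)$$

for the back-transformation. Here, $\varphi$ is the density of the normal distribution with zero mean and variance $\sigma_\varepsilon^2$. In the special case $g(R) = \ln(R + \lambda_2)$, we obtain the well-known result

$$\bar g^{-1}(t) = \exp\!\left(t + \frac{\sigma_\varepsilon^2}{2}\right) - \lambda_2. \qquad (9)$$

Now let the inverse power $p = 1/\lambda_1$ of the transformation function g be a positive integer. Then, the back-transformation can be calculated by using the binomial formula

$$\bar g^{-1}(t) = \sum_{m=0}^{\lfloor p/2 \rfloor} \binom{p}{2m} \left(1 + \frac{t}{p}\right)^{p-2m} \frac{(2m-1)!!\,\sigma_\varepsilon^{2m}}{p^{2m}} - \lambda_2, \qquad (10)$$

where x!! denotes the product of all odd integers from 1 to x, with the exception of (−1)!! = 1.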
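As a check on the closed-form back-transformations, the following Python sketch compares them with direct numerical integration of the general integral formula, truncated at ±12 standard deviations of the error distribution. It assumes SciPy's `integrate.quad`; all function names and the test values t = 2.0, $\sigma_\varepsilon^2$ = 0.3, and $\lambda_2$ = 5.0 are hypothetical. For integer p, the inverse transformation is a polynomial in t + ε, so the binomial expansion with the normal moments $E[\varepsilon^{2m}] = (2m-1)!!\,\sigma_\varepsilon^{2m}$ is exact.

```python
import math

import numpy as np
from scipy import integrate


def g_inv(y, lam1, lam2):
    """Inverse of the Box-Cox transformation (lam1 = 0 or 1/p)."""
    if lam1 == 0:
        return math.exp(y) - lam2
    p = round(1 / lam1)
    return (1.0 + y / p) ** p - lam2


def odd_double_factorial(x):
    """x!! for odd x >= -1: product of odd integers 1..x, with (-1)!! = 1."""
    return math.prod(range(1, x + 1, 2)) if x > 0 else 1


def back_transform(t, sigma2, lam1, lam2):
    """Closed-form back-transformation: log case and binomial case."""
    if lam1 == 0:
        return math.exp(t + sigma2 / 2.0) - lam2
    p = round(1 / lam1)
    if p < 1 or not math.isclose(p * lam1, 1.0):
        raise ValueError("lam1 must be 0 or 1/p for a positive integer p")
    base = 1.0 + t / p
    total = sum(
        math.comb(p, 2 * m) * base ** (p - 2 * m)
        * odd_double_factorial(2 * m - 1) * sigma2 ** m / p ** (2 * m)
        for m in range(p // 2 + 1)
    )
    return total - lam2


def back_transform_numeric(t, sigma2, lam1, lam2):
    """General integral evaluated numerically over +/- 12 SD, for checking."""
    sd = math.sqrt(sigma2)

    def integrand(e):
        density = math.exp(-e * e / (2.0 * sigma2)) / (sd * math.sqrt(2.0 * math.pi))
        return g_inv(t + e, lam1, lam2) * density

    value, _ = integrate.quad(integrand, -12.0 * sd, 12.0 * sd)
    return value


for lam1 in (0.0, 1.0, 1 / 2, 1 / 3, 1 / 4):
    exact = back_transform(2.0, 0.3, lam1, 5.0)
    numeric = back_transform_numeric(2.0, 0.3, lam1, 5.0)
    print(f"lam1={lam1:.3f}: closed form {exact:.6f}, numerical {numeric:.6f}")
```

For p = 2 and $\lambda_2$ = 0, for example, the sum reduces to $(1 + t/2)^2 + \sigma_\varepsilon^2/4$, which is 4.075 at t = 2 and $\sigma_\varepsilon^2$ = 0.3.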
NOTES
REFERENCES