a Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, USA.
b National Cancer Institute, Bethesda, MD, USA.
c Department of Environmental Health, Colorado State University, Fort Collins, CO, USA.
d European Institute of Oncology, Milan, Italy.
Reprint requests: Dr Theodore R Holford, Department of Epidemiology and Public Health, Yale University School of Medicine, 60 College Street, PO Box 208034, New Haven, CT 06520-8034, USA.
Abstract |
---|
Methods Patients with breast-related surgery at Yale-New Haven Hospital were interviewed using a standardized questionnaire, and breast adipose tissue samples were analysed for nine PCB congeners (74, 118, 138, 153, 156, 170, 180, 183, 187). The study recruited 490 women (304 cases and 186 controls) between 1994 and 1997. Logistic ridge regression was used to analyse the instability caused by collinearity.
Results Although total PCB did not appear to be associated with breast cancer risk, significant differences in effect were observed among the nine congeners. Logistic ridge regression demonstrated a protective effect on breast cancer risk for a potentially anti-oestrogenic and dioxin-like congener, 156, while two phenobarbital, CYP1A and CYP2B inducers, 180 and 183, had an adverse effect. This analysis also suggested that a protective effect for another phenobarbital congener, 153, was largely explained by instability caused by collinearity.
Conclusions These results indicate that studies of PCB congeners and health require an in-depth statistical analysis in order to better understand the complex issues related to their collinearity.
Keywords Breast cancer, environmental epidemiology, PCB, logistic regression, ridge regression
Accepted 16 May 2000
Introduction |
---|
Potential mechanisms by which PCB could affect risk of diseases like breast cancer include various types of hormonal activities that have been observed for specific congeners. Some exhibit oestrogenic behaviour,3,4 thus possessing the theoretical potential for increasing breast cancer risk. Indeed, Dewailly et al.5 suggest that accumulation of such organochlorines may increase risk for hormone-responsive breast cancers. On the other hand, some congeners behave as anti-oestrogens, which could in principle counteract the putative harmful effects of oestrogen and result in an ameliorative effect on breast cancer risk, not unlike the possible protection offered by tamoxifen.6 Wolff and Toniolo7 have suggested that the common practice of analysing epidemiological studies by considering total PCB as a putative risk factor could be missing important nuances in exposure. If the putative effects of PCB arose because of oestrogenic activity, then it would make better sense to analyse these exposure groups separately. Other factors that could be considered in such groupings include persistence, dioxin-like activity, phenobarbital induction, potential for inducing cytochrome P450 enzymes,4,7 and degree of chlorination.8 An analysis that included these factors would reflect the underlying biology thought to be relevant in the aetiology of breast cancer. However, this approach could still miss potentially important combinations of congener exposure if the underlying mechanism, and thus the appropriate grouping of congeners, was unknown.
Commercial PCB are marketed by the proportion of chlorine by weight, so that a variety of congeners are usually present in a particular product.1 Hence, exposure to one congener is likely to be correlated with exposure to others that happened to be in the same product. The resulting high correlation among the exposure levels for individual congeners can result in collinearity, which confounds the estimates of effect on breast cancer risk. For example, if the exposure to an oestrogenic congener was highly correlated with exposure to an anti-oestrogenic congener, then the effect of either one on breast cancer risk might not be apparent, because when one was high the other would also be high and their effects would tend to cancel each other out. In this hypothetical case, a better analysis might be to include the difference in exposure levels for these two congeners as a covariate in the analysis instead of their total.
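The hypothetical case above can be written out as a small identity. As a sketch only, assume a logistic model with just two congeners, where $x_1$ and $x_2$ denote their exposure levels, $\beta_1$ and $\beta_2$ their log odds ratios per unit of exposure, and $\alpha$ an intercept (these symbols are illustrative and are not taken from the study):

```latex
\begin{aligned}
\operatorname{logit} p &= \alpha + \beta_1 x_1 + \beta_2 x_2 \\
  &= \alpha + \frac{\beta_1 + \beta_2}{2}\,(x_1 + x_2)
            + \frac{\beta_1 - \beta_2}{2}\,(x_1 - x_2)
\end{aligned}
```

If the two effects are opposite and of similar size ($\beta_1 \approx -\beta_2$), the coefficient of the total is close to zero and the association is carried by the difference. When $x_1$ and $x_2$ are highly correlated, the difference varies little across subjects, so its coefficient would be expected to be estimated imprecisely.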
We consider below the joint effect of nine PCB congeners measured in breast adipose tissue on the risk of breast cancer. These data are from a case-control study in Connecticut, and the effect of total PCB exposure along with the effects of individual congeners taken one-at-a-time was considered in an earlier manuscript.9 Because the exposure levels for individual congeners were highly correlated, collinearity could have a profound effect on the estimated associations. The statistical techniques used to assist in the interpretation of these results included principal components analysis and ridge regression.
Methods |
---|
Breast adipose tissue and chemical analysis
Breast adipose tissue not needed for diagnostic purposes was collected and placed into a glass bottle on ice, coded and frozen within 30 minutes after biopsy. Samples were stored at −84°C until being packed in blue ice and mailed in batches to the study laboratory at Colorado State University, where they were stored frozen until analysis. Laboratory personnel did not know the case-control status of samples.
Nine PCB congeners that are known to be abundant in human adipose tissue were measured in this study, identified by their International Union of Pure and Applied Chemistry (IUPAC) congener numbers: 74, 118, 138, 153, 156, 170, 180, 183, and 187. These congeners were found to have higher concentrations than others in preliminary analyses of some study samples, and thus they were likely to provide the greatest precision of measurement in the chemical analysis. Stellman et al.10 report the relative abundance of 14 PCB in Long Island, NY, and the reported means for our nine congeners represent 87% of their total mean concentration. Earlier data reported by McFarland and Clark1 provide per cent of total PCB for what was known about all 209 congeners, and our nine congeners comprise 59% of the total found in human fat, compared to 66% for Stellman et al.'s 14.
The laboratory method for analysing PCB congeners in breast adipose tissue has been described elsewhere.11 The method involves extraction of the compounds of interest in hexane, separation of the organochlorine pesticides from the PCB and purification of the sample using Florisil® chromatography, and identification and quantification of the compounds using gas chromatography. The quantitation limits (ppb) of this method were: 7.5 for 74; 12.5 for 118; 10.0 for 138; 25.0 for 153; 25.0 for 156; 20.0 for 170; 12.5 for 180; 10.0 for 183; and 12.5 for 187.
Strict quality control/quality assessment procedures were followed throughout sample analysis, including method spikes, reagent blanks, and the establishment of quality control windows. Quality control (QC) spike recoveries for the PCB throughout the sample analyses ranged from 83% to 108% with a coefficient of variation (CV) that ranged from 12% to 24%. The QC spike recoveries for the PCB isomers during the sample analyses ranged from 82% to 96% with a CV of 10–25%. Adipose tissue levels of PCB congeners were reported as parts per billion (ppb), which is equivalent to nanograms of PCB congener per gram of lipid. The amount of lipid in the sample was quantified gravimetrically.
Statistical methods
The data analyses were based on lipid-adjusted adipose tissue levels of PCB using a linear logistic regression model as implemented in PROC GENMOD in SAS.12 Covariates included in the final model were age, body mass index, lifetime months of lactation, age at menarche, age at first full pregnancy (<25 and ≥25), number of live births (none, <3, and ≥3), dietary fat intake, household income (<$8750, 8750–14 284, 14 396–24 999, ≥25 000, and unknown), and fat levels of DDE (<435.2, 435.2–784.3, 784.4–1437.3, ≥1437.4). DDE was included as a covariate because earlier studies suggested that it might be associated with breast cancer risk.
Of primary interest in this analysis were the joint effects of individual PCB congeners on risk of breast cancer, and whether the effect of each congener was the same, which was tested using a linear contrast. If these results suggested that the magnitude of effect on breast cancer risk differed among the congeners, then it would make sense not to evaluate total PCB exposure but rather to investigate the joint effects of each congener. Regression diagnostics were used to determine whether the results were sensitive to one or more influential observations, but the overall conclusions were found to be stable. Bootstrap methods were used to estimate bias in the estimates of risk, as well as providing alternative estimates of standard errors.13 While the resulting standard errors were slightly greater, the conclusions were essentially unchanged, so these results are not presented.
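The test of whether congener effects are equal can be illustrated with a likelihood ratio comparison. The sketch below is not the SAS analysis used in the study: it assumes only two congeners, uses made-up grouped data (`ROWS`), and compares a model with separate coefficients against one in which both share a common coefficient, which is equivalent to fitting their total. All names and values are hypothetical.

```python
import math

# Hypothetical grouped data: (x1, x2, cases, controls). Values are made up.
ROWS = [
    (-1.5, -1.3, 1, 3), (-1.0, -1.2, 2, 3), (-0.5, -0.3, 2, 2),
    (0.0, 0.2, 2, 2), (0.5, 0.3, 3, 2), (1.0, 1.2, 2, 1), (1.5, 1.3, 3, 1),
]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(design, iterations=50):
    """Newton-Raphson fit of a logistic model; returns (beta, log-likelihood)."""
    dim = len(design[0])
    beta = [0.0] * dim
    for _ in range(iterations):
        score = [0.0] * dim
        info = [[0.0] * dim for _ in range(dim)]
        for x, (_, _, cases, controls) in zip(design, ROWS):
            n = cases + controls
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for i in range(dim):
                score[i] += x[i] * (cases - n * p)
                for j in range(dim):
                    info[i][j] += n * p * (1.0 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    # Binomial log-likelihood, omitting the constant shared by both models.
    loglik = 0.0
    for x, (_, _, cases, controls) in zip(design, ROWS):
        eta = sum(b * v for b, v in zip(beta, x))
        loglik += cases * eta - (cases + controls) * math.log1p(math.exp(eta))
    return beta, loglik

full = [(1.0, x1, x2) for x1, x2, _, _ in ROWS]    # separate congener effects
equal = [(1.0, x1 + x2) for x1, x2, _, _ in ROWS]  # common effect (the total)
beta_full, ll_full = fit(full)
beta_equal, ll_equal = fit(equal)

lr_stat = 2.0 * (ll_full - ll_equal)
# Upper tail of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(max(lr_stat, 0.0) / 2.0))
print(f"LR statistic = {lr_stat:.3f}, p = {p_value:.3f}")
```

With nine congeners the same comparison would have eight degrees of freedom, and the closed-form tail probability used here would no longer apply.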
If individual congener levels are highly correlated, as was the case, then collinearity must be considered. It is well known that when the collinearity is extreme, the numerical accuracy of the results can be affected. While we did not find evidence of such inaccuracies in this analysis, the correlations were high enough to produce profound effects on the estimated associations with breast cancer risk. This was addressed in the analysis by (a) variable reduction, (b) principal components, and (c) ridge regression, which are described below.
Variable reduction
The simplest way of dealing with collinearity is to drop redundant variables from the regression model. This approach is predicated on the idea that only a subset of variables is needed in the regression model. If, on the other hand, we have two chemicals that have highly correlated exposure levels, and they both affect the risk of disease, it may not be possible to accurately estimate their separate effects on risk. Dropping one of the chemicals from a model may not change the accuracy of the prediction, but it forces that regression coefficient to be zero, thus providing a biased estimate of effect for both chemicals.
Principal components
An alternative to dropping variables from the analysis is to consider linear combinations that make substantive scientific sense. One of the simplest and most commonly used summaries is to analyse the effect of total PCB. Principal components analysis offers another approach for identifying components of variation among the factors of interest, predicated on the idea that there are common unmeasured factors giving rise to an observed joint distribution of exposures. In order to understand better the nature of the effects for individual congeners, principal components analysis was used to create factors that were independent of each other. Using PROC PRINCOMP in SAS we estimated the eigenvectors, which provided loading scores that gave rise to new variables to be included in the linear logistic model.
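The construction of a component score from eigenvector loadings can be sketched as follows. This is an illustration under stated assumptions rather than a reproduction of PROC PRINCOMP: the exposure values in `DATA` are invented, only three congeners are used, and only the first component is extracted, by power iteration on the correlation matrix.

```python
import math

# Hypothetical exposure levels (ppb) for three congeners in six subjects.
DATA = [
    [120.0, 30.0, 80.0],
    [200.0, 45.0, 140.0],
    [90.0, 20.0, 70.0],
    [310.0, 60.0, 190.0],
    [150.0, 40.0, 95.0],
    [260.0, 50.0, 180.0],
]

def standardize(rows):
    """Centre each column at 0 and scale it to unit sample standard deviation."""
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
        scaled_cols.append([(v - mean) / sd for v in col])
    return [list(r) for r in zip(*scaled_cols)]

def correlation_matrix(z):
    """Correlation matrix computed from standardized rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(corr, iterations=200):
    """Leading eigenvector and eigenvalue of corr by power iteration."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue

z = standardize(DATA)
corr = correlation_matrix(z)
loadings, eigenvalue = first_component(corr)
explained = eigenvalue / len(corr)  # the trace of a correlation matrix is p
scores = [sum(l * x for l, x in zip(loadings, row)) for row in z]

print("loadings:", [round(l, 3) for l in loadings])
print("proportion of variance explained:", round(explained, 3))
```

When all congeners are positively correlated, as in this toy data set, the first component has loadings of the same sign and behaves much like a weighted total. The `scores` list is the new variable that would enter the logistic model in place of the individual exposures.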
Ridge regression
Yet another approach for dealing with instability of parameter estimates in the presence of collinearity is ridge regression.14,15 Extensions of this idea to binary outcomes and/or the generalized linear model are provided by Schaefer16,17 and Segerstedt.18 To employ this method we first normalized the exposure measures for each congener by subtracting the mean for the study population and dividing the result by the standard deviation for the congener. Maximum likelihood estimates of the logistic regression parameters for each congener were obtained using PROC GENMOD in SAS, first adjusting for age alone, and then the remaining covariates. These results were then used to obtain ridge regression estimators for a particular ridge coefficient, k (≥0), by implementing the formulae described in the Appendix using PROC IML of SAS.
A fundamental issue in ridge regression is the selection of the ridge coefficient, k. When k = 0, the result is the usual maximum likelihood estimator, and as k becomes large the ridge estimators eventually go to 0, although before reaching the limit they can change sign. Parameter estimates that are heavily influenced by collinearity tend to change rapidly for small values of k, and become more stable as k increases. This phenomenon can be observed by creating a ridge trace, which plots the ridge estimators against k. The reported ridge estimator of a regression parameter uses a small value of k in the range in which the ridge trace has been stabilized. While the resulting estimator is no longer a maximum likelihood estimate, and is thus biased, it will generally have reduced variance. Our objective is to find estimates that have a reduced mean squared error, which is the sum of the variance and the square of the bias. The results from this approach are not unique in the sense that a certain amount of judgement is necessary when interpreting the ridge trace, but the fact that the choice of k is in the range where the trace is stable means that some variation in this coefficient will not substantially change estimates of effect.
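A ridge trace of the kind described above can be sketched in a few lines. The example assumes a ridge logistic estimator of the form (X'VX + kI)^{-1} X'VX b, where b is the maximum likelihood estimate and X'VX the information matrix at b; this form is associated with the ridge logistic literature cited above and may differ in detail from the formulae in the Appendix. The data (`ROWS`) are invented, only two standardized exposures are used, and the intercept is omitted for brevity.

```python
import math

# Hypothetical grouped data: (x1, x2, cases, controls); x1 and x2 stand for
# two standardized, strongly correlated exposures. Values are made up.
ROWS = [
    (-1.5, -1.3, 1, 3), (-1.0, -1.2, 2, 3), (-0.5, -0.3, 2, 2),
    (0.0, 0.2, 2, 2), (0.5, 0.3, 3, 2), (1.0, 1.2, 2, 1), (1.5, 1.3, 3, 1),
]

def solve2(a, b):
    """Solve the 2x2 linear system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def score_and_information(beta):
    """Score vector and information matrix X'VX for a no-intercept model."""
    score = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for x1, x2, cases, controls in ROWS:
        n = cases + controls
        p = 1.0 / (1.0 + math.exp(-(beta[0] * x1 + beta[1] * x2)))
        x = (x1, x2)
        for i in range(2):
            score[i] += x[i] * (cases - n * p)
            for j in range(2):
                info[i][j] += n * p * (1.0 - p) * x[i] * x[j]
    return score, info

def fit_ml(iterations=50):
    """Maximum likelihood estimate by Newton-Raphson, starting from zero."""
    beta = [0.0, 0.0]
    for _ in range(iterations):
        score, info = score_and_information(beta)
        step = solve2(info, score)
        beta = [beta[0] + step[0], beta[1] + step[1]]
    return beta

def ridge(beta_ml, k):
    """Assumed ridge form: (X'VX + kI)^{-1} X'VX beta_ml."""
    _, info = score_and_information(beta_ml)
    rhs = [info[0][0] * beta_ml[0] + info[0][1] * beta_ml[1],
           info[1][0] * beta_ml[0] + info[1][1] * beta_ml[1]]
    penalized = [[info[0][0] + k, info[0][1]],
                 [info[1][0], info[1][1] + k]]
    return solve2(penalized, rhs)

beta_ml = fit_ml()
trace = [(k, ridge(beta_ml, k)) for k in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0)]
for k, b in trace:
    print(f"k = {k:5.1f}  b1 = {b[0]: .4f}  b2 = {b[1]: .4f}")
```

Plotting the two columns of coefficients against k gives the ridge trace. The value reported would be a small k beyond which the coefficients change slowly, which remains a matter of judgement.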
Results |
---|
Discussion |
---|
One difficulty in interpreting these results is the fact that levels of exposure to individual congeners are correlated, and thus collinear. While very highly correlated factors can cause the numerical accuracy of regression parameters to suffer, this was not the case for these data. In fact, the likelihood ratio statistics used to test the overall significance of the individual congeners and the differences in the congener-specific effects do not depend on a matrix calculation, which is the type of computation most affected by collinearity. Instead, our primary concern here is with interpretational difficulties of the effects, and the extent to which they can be attributed to specific congeners.
We have shown that our model is able to identify groups of women with very different levels of breast cancer risk. The trend in risk by the quintiles formed from the breast cancer risk score (Table 3) is apparent. In addition, the highest quintile has 3.57 times the risk of those in the first, which is a sizeable effect. Hence, it is clear that this approach is able to separate these women into categories with important differences in breast cancer risk.
The principal components analysis showed that 75% of the overall variation in the levels of exposure for congeners 153, 156, 180 and 183 was explained by the first component, which was similar to their total. Because this was not significant, our results are consistent with studies that have not found an effect due to total PCB exposure. However, we have found significant differences in the effects of the individual congeners.
Ridge regression allowed us to determine the individual congener effects that were most affected by collinearity. The effects of 180 and 183 remain positive at the value of the ridge coefficient that stabilizes the parameters (k = 20). In addition, the estimated protective effect of 156 is little changed by varying the ridge coefficient. The effect of 153, on the other hand, exhibits a profound change, from a strong protective effect to one that largely disappears when the ridge coefficient stabilizes the parameter estimates.
Wolff et al.4 have proposed a grouping of the congeners as a means of interpreting their possible effects on breast cancer risk. Congener 156 is in Group 2, whose members are anti-oestrogenic and dioxin-like, and thus could interfere with the harmful effect of oestrogen to provide a protective effect on breast cancer risk. On the other hand, congeners 180 and 183 fall into Group 3, the phenobarbital, CYP1A and CYP2B inducers, and the manner in which these might affect breast cancer risk is far from clear. Some have suggested that induction of CYP1A1 gene expression could lead to a protective effect.2 Yet two epidemiological studies have found that CYP1A1 polymorphism may increase risk of breast cancer, especially among some smokers,20,21 while a third failed to find such an association.22 Much remains to be learned about the aetiology of breast cancer, and at this point it is impossible to predict the manner in which a chemical might affect its risk. Our results are broadly consistent with what might be expected for both of these groups. Ridge regression suggested that the significant protective effect for 153 that was observed initially could largely be accounted for by the instability caused by collinearity. Because 153 belongs to Group 3 along with 180 and 183, one would not expect the opposite effects observed in the initial analysis, and the ridge analysis indicates that the apparent protective effect was largely an artifact of collinearity. Hence, the ridge estimators of the effects of the individual congeners are consistent with what one might expect from the congener groups proposed by Wolff et al.4
While our results are broadly consistent with the categories of PCB congeners proposed by Wolff et al.,4 there still appear to be some differences in the effects of congeners within these broad groups. The use of these categories assumes a knowledge of the mode of action with respect to breast cancer risk, but this is probably premature. It is also possible that while a category might suggest a direction for an effect, the magnitudes could still differ.
The PCB are readily absorbed from the gut in mammals and very little is excreted in their parent form, so that they tend to build up in adipose tissue because of their lipid solubility.23 The rate of biotransformation for PCB tends to decrease with the number of chlorines on the biphenyl nucleus; and it varies with the position of the chlorines, as well as the species. Dogs and rats, for instance, tend to metabolize PCB much more readily than monkeys. For example, 50% of congener 136 is eliminated by dogs and rats in one day, while monkeys require 6 days. On the other hand, congener 153, which like 136 has six chlorines, has a half-life of 8 days in dogs, but monkeys and rats are apparently not able to eliminate 50% of the administered dose over their remaining lifespan.24 Wolff et al. found a median half-life for total PCB of 37 years among cases in the New York University Women's Health Study, but it was indeterminate among controls.25 Clearly, some of these compounds remain in the body for a very long time, but much remains to be learned about their biotransformation in humans.
Unlike most of the earlier studies of the effect of PCB on breast cancer risk, we determined levels in breast adipose tissue instead of blood serum. Tessari and Archibeque-Engle26 have pointed out the analytical difficulties with the chemical measures of the extremely low levels of PCB found in sera. In addition, serum levels can be very sensitive to short-term changes in diet, such as fasting. For a subset of these cases, we also measured serum levels of individual congeners and, while we found that the two are correlated (data not shown here), sera offer an imperfect surrogate for the levels found in breast adipose tissue. It is well known that even purely random errors in exposure measurements can bias estimates of association with disease risk. This could account for the absence of congener-specific effects observed by Moysich et al.,8 who determined serum levels of PCB.
Our use of breast adipose tissue precluded the possibility of using general population controls in this study, requiring instead the recruitment of women without breast cancer who had undergone breast surgery. Many of these subjects had other benign breast diseases, some of which may also be related to hormonal factors. If the effect of PCB on the breast diseases in the controls is in the same direction as that of breast cancer, then our study would be biased in the direction of underestimating the effect of PCBs on breast cancer. Alternatively, if some of the benign conditions in the controls are really precursors of cancer, the direction of the bias would again be to attenuate estimates toward the null, especially if the effect of PCB occurs early in the disease process.
A limitation of this study is the fact that only 9 of the 209 PCB congeners were measured for each subject. While these are among the most common congeners found in humans, comprising 87% of the total mean concentration found in a Long Island, NY study,10 it is still possible that we have measured compounds that are actually surrogates for others that are the real causative agents for breast cancer. We have noted the collinearity among the congeners we did measure, so it is not a large stretch to consider the possibility that chemicals that have not been measured may also influence the associations presented here.
Further work is needed to explore the effect of individual congeners on breast cancer risk. The use of ridge estimators appeared to offer a useful contribution to our understanding of the effects for these congeners in terms of the classification provided by Wolff et al.4 However, determination of the ridge coefficient required a certain level of judgement on which different investigators could honestly disagree. It would be important to see whether other studies that measured exposure to individual congeners in adipose tissue arrived at similar conclusions regarding a potential protective effect of congener 156 and adverse effects of congeners 180 and 183.
Appendix |
---|
![]() |
The variance of the ridge estimator is given by
![]() |
In this analysis we were primarily concerned with the collinearity among the congeners, but it was also necessary to adjust for covariates, U. To accomplish this we used residuals from a prediction of the congener levels from the covariates as the regressors in the formula given above, $X^* = [I - U(U'U)^{-1}U']X$, along with a similar term for the response, $Z^* = [I - U(U'U)^{-1}U']Z$. When used in the expression for the ridge regression estimator given above, this provided values that were adjusted for covariates. The ridge estimators were obtained by using output files from PROC GENMOD in SAS to obtain the fitted values and the weights, which were in turn input into PROC IML for the final step in the calculations.
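The covariate adjustment by residuals can be illustrated with a minimal sketch. It assumes a single covariate plus an intercept, so that applying the residual-forming matrix I - U(U'U)^{-1}U' reduces to taking residuals from a simple least-squares line. The data and names (`U_COV`, `X`) are hypothetical and are not taken from the study.

```python
# Hypothetical data: one covariate (for example age in years) and the
# exposure level of one congener for six subjects. Values are made up.
U_COV = [35.0, 42.0, 50.0, 58.0, 63.0, 70.0]
X = [110.0, 150.0, 160.0, 230.0, 210.0, 290.0]

def residualize(values, covariate):
    """Residuals from regressing values on [1, covariate].

    With U = [1, covariate], this equals [I - U(U'U)^{-1}U'] applied to values.
    """
    n = len(values)
    mean_u = sum(covariate) / n
    mean_v = sum(values) / n
    s_uu = sum((u - mean_u) ** 2 for u in covariate)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(covariate, values))
    slope = s_uv / s_uu
    intercept = mean_v - slope * mean_u
    return [v - (intercept + slope * u) for u, v in zip(covariate, values)]

x_star = residualize(X, U_COV)
print([round(r, 2) for r in x_star])
```

The residuals sum to zero and are uncorrelated with the covariate, which is what allows them to stand in for the exposure after adjustment. With several covariates the same projection would be computed with matrix operations, as in PROC IML.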
Acknowledgments |
---|
References |
---|
2 Safe SH, Zacharewski T. Organochlorine exposure and risk for breast cancer. Etiology of breast and gynecological cancers: Proceedings of the Ninth International Conference on Carcinogenesis and Risk. Prog Clin Biol Res 1997;396:133–45.
3 Safe SH. Toxicology, structure-function relationship, and human and environmental health impacts of polychlorinated biphenyls: progress and problems. Environ Health Perspect 1992;100:259–68.
4 Wolff MS, Camann D, Gammon M, Stellman SD. Proposed PCB congener groupings for epidemiological studies. Environ Health Perspect 1997;105:13–14.
5 Dewailly É, Dodin S, Verreault R et al. High organochlorine body burden in women with estrogen receptor-positive breast cancer. J Natl Cancer Inst 1994;86:232–34.
6 Fisher B, Costantino JP, Wickerham DL et al. Tamoxifen for prevention of breast cancer: Report of the National Surgical Adjuvant Breast and Bowel Project P-1 Study. J Natl Cancer Inst 1998;90:1371–88.
7 Wolff MS, Toniolo PG. Environmental organochlorine exposure as a potential etiologic factor in breast cancer. Environ Health Perspect 1995;103(Suppl.7):141–45.
8 Moysich KB, Ambrosone CB, Vena JE et al. Environmental organochlorine exposure and postmenopausal breast cancer risk. Cancer Epidemiol Biomark 1998;7:181–88.
9 Zheng T, Holford TR, Tessari J et al. Breast cancer risk associated with PCBs by congener. Am J Epidemiol 2000;152:50–58.
10 Stellman SD, Djordjevic MV, Muscat JE et al. Relative abundance of organochlorine pesticides and polychlorinated biphenyls in adipose tissue and serum in women in Long Island, New York. Cancer Epidemiol Biomark 1998;7:489–96.
11 Archibeque-Engle SL, Tessari JD, Winn DT et al. Comparison of organochlorine pesticide and polychlorinated biphenyl residues in human breast adipose tissue and serum. J Toxicol Environ Health 1997;52:285–93.
12 SAS Institute. SAS/STAT User's Guide. Version 6. Cary, NC: SAS Institute, 1990.
13 Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman & Hall, 1993.
14 Weisberg S. Applied Linear Regression, 2nd Edn. New York: John Wiley & Sons, 1985, pp.253–59.
15 Neter J, Wasserman W, Kutner MH. Applied Linear Regression Models. Homewood, IL: Richard D Irwin Inc., 1983, pp.393–400.
16 Schaefer RL, Roi LD, Wolfe RA. A ridge logistic estimator. Commun Statist-Theory Meth 1984;13:99–113.
17 Schaefer RL. Alternative estimators in logistic regression when the data are collinear. J Statist Comp Sim 1986;25:75–91.
18 Segerstedt B. On ordinary ridge regression in generalized linear models. Commun Statist-Theory Meth 1992;21:2227–46.
19 Zheng T, Holford TR, Mayne ST et al. DDE and DDT in breast adipose tissue and risk of female breast cancer. Am J Epidemiol 1999;150:453–58.
20 Ambrosone CB, Freudenheim JL, Graham S et al. Cytochrome P4501A1 and glutathione S-transferase (M1) genetic polymorphisms and postmenopausal breast cancer risk. Cancer Res 1995;55:3483–85.
21 Ishibe N, Hankinson SE, Colditz GA et al. Cigarette smoking, cytochrome P450 1A1 polymorphisms, and breast cancer risk in the Nurses' Health Study. Cancer Res 1998;58:667–71.
22 Bailey LR, Roodi N, Verrier CS, Yee CJ, Dupont WD, Parl FF. Breast cancer and CYP1A1, GSTM1, and GSTT1 polymorphisms: evidence of a lack of association in Caucasians and African Americans. Cancer Res 1998;58:65–70.
23 Matthews HB. Metabolism of PCBs in mammals: routes of entry, storage, and excretion. In: D'Itri FM, Kamrin MA (eds). PCBs: Human and Environmental Hazards. Boston: Butterworth Publishers, 1983, pp.203–13.
24 Sipes IG, Schnellmann RG. Biotransformation of PCBs: metabolic pathways and mechanisms. In: Safe S, Hutzinger O (eds). Environmental Toxin Series 1. Berlin: Springer-Verlag, 1987, pp.97–110.
25 Wolff MS, Zeleniuch-Jacquotte A, Dubin N, Toniolo P. Risk of breast cancer and organochlorine exposure. Cancer Epidemiol Biomark 2000;9:271–77.
26 Tessari JD, Archibeque-Engle SL. Correspondence re: SD Stellman et al., Relative abundance of organochlorine pesticides and polychlorinated biphenyls in adipose tissue and serum of women in Long Island, New York. Cancer Epidemiol Biomark 1999;8:11114.
27 Nelder JA, Wedderburn RWM. Generalized linear models. J R Statist Soc B 1972;42:109–42.