Reproductive factors and familial predisposition for breast cancer by age 50 years. A case-control-family study for assessing main effects and possible gene–environment interaction

Heiko Becher1, Silke Schmidt2,3 and Jenny Chang-Claude2

1 University of Heidelberg, Department of Tropical Hygiene and Public Health, Im Neuenheimer Feld 324, 69120 Heidelberg, Germany. E-mail: heiko.becher{at}urz.uni-heidelberg.de
2 German Cancer Research Center, Department of Clinical Epidemiology, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany. E-mail: j.chang-claude{at}dkfz-heidelberg.de
3 Duke University Medical Center, Center for Human Genetics, Box 3468, Durham, NC 27710, USA. E-mail: sschmidt{at}chg.mc.duke.edu


    Abstract
 
Background The effect of environmental/lifestyle factors on breast cancer risk may be modified by genetic predisposition.

Methods In a population-based case-control-family study performed in Germany including 706 cases by age 50 years, 1381 population, and 252 sister controls, we investigated main effects for environmental/lifestyle factors and genetic susceptibility and gene–environment interaction (G x E). Different surrogate measures for genetic predisposition using pedigree information were used: first-degree family history of breast or ovarian cancer; and gene carrier probability using a genetic model based on rare dominant genes. Possible G x E interaction was studied by (1) logistic regression using cases and population controls including an interaction term; (2) comparing results using sister controls and population controls; (3) case-only analysis with logistic regression and (4) a mixture logistic model.

Results Familial predisposition showed the strongest main effect and the estimated gene carrier probability gave the best fit. High parity and longer duration of breastfeeding reduced breast cancer risk significantly, a history of abortions increased risk and age at menarche showed no significant effect. We found significant G x E interaction between parity and genetic susceptibility using different surrogate measures. In women most likely to have a high genetic susceptibility, high parity was less protective. Later age at menarche was protective in women with a positive family history. No evidence for G x E interaction was found for breastfeeding and abortion.

Conclusions These findings corroborate results from other studies and provide further evidence that the magnitude of protection from parity is reduced in women most likely to have a genetic risk in spite of the limitations of using surrogate genetic measures.


Keywords Gene carrier probability, mixture logistic model, case-only design, population and sibling controls

Accepted 20 May 2002

The assessment of gene–environment interaction (G x E) has become a major focus for scientists in the field of modern epidemiology. Ideally, for estimating G x E interaction from an epidemiological case-control study, one would have data on both exposure to environmental risk factors collected via questionnaire and genotypes for a disease susceptibility gene determined from genomic DNA. However, one often encounters situations in which the known susceptibility genes are very large and complex, making it rather time- and cost-prohibitive to obtain measured genotypes for a large study population. If the genes have not yet been identified, any investigation of G x E interaction must be based on surrogate genetic measures. Therefore, it may be of interest to explore interactive effects of genetic and environmental risk factors by using surrogate genetic measures that are based on more readily available phenotypic information.

In the case of breast cancer, two major susceptibility genes, BRCA11 and BRCA2,2 have been identified and further predisposing genes are likely to exist.3,4 Due to differences in age at diagnosis observed both within families and between families harbouring mutations in BRCA1 and BRCA2, we were interested to investigate whether the effects of reproductive factors (parity, breastfeeding, abortion, age at menarche) on breast cancer risk differ according to the level of an individual’s genetic risk. We considered several methods for assessing possible G x E interaction by estimating an unobserved genotype given disease status and age at onset or current age of affected and unaffected relatives of the study probands, respectively.5 The surrogate genetic measures range from a simple binary variable for presence/absence of affected first-degree relatives to more refined measures that take into account the entire family history information and make use of existing knowledge about a genetic model for the disease under study. We were interested in examining whether the type of surrogate genetic measure used affected the detection of possible G x E interaction. In addition, different analytical methods have been proposed for the detection of possible G x E interaction. One approach uses different controls to search for modification of risk associated with exposure when the underlying genetic factor is unknown. Relatives are easily identifiable and may be more willing to participate in a research study. Andrieu and Goldstein6 showed that G x E interaction may manifest itself in differences of odds ratios (OR) for environmental factors using relatives of cases as controls compared with population-based controls. The OR associated with the environmental exposure using sibling controls were always higher than those obtained with population controls when joint effects of genotype and environmental exposure on disease risk were more than multiplicative.
The difference in the estimates of the OR was dependent on the amount of interaction between genetic and environmental factors. However, few studies have been performed using two sets of control groups, related and unrelated controls. In the case-only approach analysis, cases with and without the susceptibility genotype are compared with respect to the prevalence of the environmental exposure. This approach has been claimed to have greater power than a case-control analysis which, however, holds only under the assumption that genotype and environmental exposure are uncorrelated.7 The different approaches have never been applied to data from a single study. Using data from a case-control-family study of premenopausal breast cancer conducted in Germany, we therefore employed different analytical methods to estimate and to compare the effect of G x E interaction derived from the surrogate genetic measures. The term G x E interaction has become common in the literature even if ‘E’ stands for factors which cannot be directly attributed to the environment. In this paper we mainly consider reproductive factors and use surrogate genetic measures; however, we use the term G x E interaction throughout the text. The meaning of the term ‘interaction’ has been a cause of confusion.8 Here, we use it in the most common statistical sense such that G x E interaction is present when the effect of genotype on disease risk depends on the level of exposure to an environmental factor, or vice versa.


    Study design
 
A population-based genetic epidemiological case-control-family study was carried out in two geographical areas in the state of Baden-Württemberg, Germany.9 Case subjects were diagnosed by 50 years of age with either in situ or invasive breast cancer between 1 January 1992 and 31 December 1995 and were living in the two study regions. They were identified through frequent monitoring of hospital admissions, surgery schedules, and pathology records of all 40 hospitals in the study regions, with periodic checks made against pathology institutes serving these hospitals. They were either approached in the hospital before their first discharge or, if already discharged, invited to participate in the study by a letter signed by the attending physician. We also approached hospitals in the vicinity of these two study regions to obtain an estimate of patients that we may have missed (about 2%). Clinical and pathological characteristics of the diagnosed breast cancers were abstracted from hospital records and pathology reports were requested.

Two sets of controls, population controls and sister controls, were recruited. The population controls, matched by age and study region to the respective case, were selected randomly from a list of 5000 female residents for each study region obtained from the population registries of the study regions. Germany has a strict registration of all inhabitants of every municipality. We were therefore able to select controls from women who were inhabitants of the defined geographical areas. Women were not eligible as controls if they could not speak German, had moved out of the study region, had a previous diagnosis of primary breast cancer, were mentally handicapped or deceased. Cases were informed that the participation of a sister control is part of the study protocol. Sister controls were identified through the questionnaire and the cases were requested to obtain permission from their sister(s), if any, for their participation. If more than one sister existed, the next elder or, if not available, the sister closest in age was selected.

All subjects were asked to complete a self-administered questionnaire. Detailed information was collected on demographic and anthropometric factors, menstrual, reproductive and breastfeeding history, use of contraceptives and exogenous hormones, medical and screening history, family history of cancer, selected occupational exposures, diet, smoking history, and alcohol consumption. Pedigree information and history of cancer were obtained for four generations (grandparents, parents, maternal and paternal aunts, brothers and sisters, sons and daughters), including year of birth, year of death, occurrence of cancer, cancer site and year of diagnosis. Information on cancer diagnoses in other family members was also requested. In addition, all subjects and parents of the cases were asked to give a blood sample. The median time between diagnosis and interview was 2 months for cases.

Data analysis is based upon 706 cases (70.2% of eligible cases), of which 6.8% had in situ carcinoma, 1381 population controls (61.2% of those contacted), and 252 sisters (58.2% of available sisters). All information on case and control exposures was truncated at the date of diagnosis for cases and date of completion of the risk factor questionnaire for controls.


    Statistical methods
 
Surrogate measures for genetic susceptibility
Two methods were used in the present paper to assess genetic susceptibility using pedigree information (Table 1).


Table 1 Variables used for assessing genetic susceptibility
 
Method 1 is a common measure of first degree family history, here including both breast and ovarian cancer (variable named BCOV), which was shown to have limitations.10 However, it has an appealing interpretation and has therefore often been used in the literature. In previous studies of breast cancer this variable has consistently been shown to be associated with a two- to threefold increased risk.11,12

Method 2: The estimated probability of carrying a predisposing gene is estimated from the complete pedigree information, in this case siblings, parents, parents’ siblings, and grandparents, their ages, and affection status with respect to breast or ovarian cancer (variable named PCARR). To compute carrier probabilities for the study subjects, we assumed the genetic model of Claus et al.,13 as modified by Easton et al.,14 including the ovarian cancer phenotype as in Ford et al.15 i.e. an autosomal-dominant mode of inheritance, a population frequency of mutant BRCAx alleles of 0.0033, and age-specific penetrances derived under the assumption of a normal age-at-onset distribution with genotype-specific means and a common variance. To implement this model in MLINK, we assigned family members to 21 liability classes defined by (1) affection status, (2) age at breast cancer diagnosis or at last observation, using the seven age groups <30, 30–39, 40–49, 50–59, 60–69, 70–79, >=80 years and (3) age at ovarian cancer diagnosis or at last observation, using the seven age groups. Unaffected males were assigned to the liability class of females unaffected up to age 29, approximately equivalent to being of unknown carrier status. For the case-control analysis, the phenotype of cases and controls (presence or absence of breast cancer) is assumed to be unknown when calculating carrier probabilities (see Appendix for details). For the case-only analysis (see below), the disease status of the cases is taken into account. The software program LINKAGE/MLINK16 was used for the computation of gene carrier probabilities. The children of cases and controls were not considered in the calculation since they were mostly too young to be at risk. 
The male relatives needed to connect female family members for the MLINK program were assumed to be censored at age 20, which is approximately equivalent to being of unknown disease status and thus being uninformative for the estimation of carrier probabilities. A principal limitation of using carrier probabilities in either a case-only or a case-control study is that they are based on the assumed genetic model for rare and highly penetrant autosomal dominant genes. Clearly, if G x E interaction for a particular environmental factor only exists for a particular common gene with low penetrance, any analysis using the above carrier probabilities is unlikely to identify such an effect.
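As a rough illustration of the quantities involved, the following sketch computes the prior carrier probability implied by the allele frequency of 0.0033 and a single Bayes update. It assumes Hardy-Weinberg proportions, and the two likelihood values in the usage line are hypothetical placeholders; the actual MLINK computation works over the whole pedigree and all 21 liability classes, which this sketch does not attempt to reproduce.

```python
# Illustrative sketch only: single-individual arithmetic, not the MLINK
# pedigree likelihood. Hardy-Weinberg proportions are an assumption made here.

def prior_carrier_probability(allele_freq: float) -> float:
    """Probability of carrying at least one copy of a dominant mutant allele."""
    return 1.0 - (1.0 - allele_freq) ** 2


def posterior_carrier_probability(prior: float,
                                  lik_carrier: float,
                                  lik_noncarrier: float) -> float:
    """Bayes update of a carrier probability.

    lik_carrier / lik_noncarrier are the likelihoods of the observed family
    phenotypes given carrier / non-carrier status (hypothetical inputs here).
    """
    numerator = prior * lik_carrier
    return numerator / (numerator + (1.0 - prior) * lik_noncarrier)


Q = 0.0033  # population frequency of mutant alleles, as stated in the text
prior = prior_carrier_probability(Q)
print(round(prior, 5))  # 0.00659

# Hypothetical likelihoods: phenotype data 10 times more likely for a carrier.
print(posterior_carrier_probability(prior, lik_carrier=0.10, lik_noncarrier=0.01))
```

The update shows why family members with early-onset disease raise a proband's carrier probability well above the prior, while unaffected older relatives pull it below the prior.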

Estimation of main effects due to reproductive risk factors
Main effects were assessed with conditional logistic regression analysis (cases versus population controls) using the statistical software package SAS (PROC PHREG). The conditioning was done on age in one-year intervals for an optimal control of a confounding effect of age.17 Model selection was performed based both on prior knowledge and statistical relevance of covariables. In the analysis with sister controls, we used a conditional logistic model in which the case and her sister formed a matched set. Age was included as an additional covariable. A formal variable selection procedure was not used. For some analyses, only parous women were included.

Estimation of main effects due to genetic susceptibility
The main effect of the different genetic susceptibility measures was assessed with a conditional logistic regression model adjusted for relevant reproductive factors as identified in the first step. The gene carrier probability was used both as continuous and categorical variables.

Assessment of gene-environment interaction
We employed several methods to assess G x E interaction with the rationale that a comparison of results may be useful for the interpretation of effect estimates.

Case-control analysis (cases versus population controls)
Using the model with the four environmental (reproductive/lifestyle) variables identified initially as relevant risk factors, first-order interaction terms with each of the two surrogate genetic measures for genotype and environmental/lifestyle factors were included in the logistic regression model. Here, the family history variable, BCOV, was used as a binary variable, whereas the gene carrier probability, PCARR, was incorporated as a continuous variable. To assess G x E interaction on a multiplicative scale of measurement, the interaction terms were calculated by multiplying the two surrogate genetic variables, BCOV (binary), PCARR (continuous), with the environmental variables. These were included as binary (abortion), continuous (duration of breastfeeding), or as categorical variables (number of full-term pregnancies, age at menarche; Table 4). The number of full-term pregnancies was considered both among all subjects as well as in parous women only. Thus, a total of 10 (2 x 5) logistic regression models were estimated.
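For orientation, a model of this kind can be written in generic form as follows. The symbols β0, βG, βE and βGE are notation assumed for this sketch (they do not appear in the text), other covariates and the conditioning on age are omitted, and G stands for either BCOV (binary) or PCARR (continuous).

```latex
\[
\operatorname{logit} P(D = 1 \mid G, E)
  = \beta_0 + \beta_G\, G + \beta_E\, E + \beta_{GE}\, (G \times E)
\]
\[
\mathrm{OR}_{E \mid G = 0} = \exp(\beta_E), \qquad
\mathrm{OR}_{E \mid G = 1} = \exp(\beta_E + \beta_{GE}), \qquad
\mathrm{OR}_{\text{interaction}} = \exp(\beta_{GE})
\]
```

On this multiplicative scale, an interaction OR above one means that the OR for the environmental factor is larger among genetically susceptible women than among non-susceptible women.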


Table 4 Odds ratios (OR) for breast cancer associated with reproductive factors in a case-control-family study in Germany
 
The traditional case-control approach has the advantage that it allows all covariates (genetic or environmental) to be categorical or continuous, allowing for a very flexible analysis and it permits the estimation of both main effects and interaction effects. However, studies may require including a very large number of cases and controls in order to have sufficient power to detect G x E interaction even when measured genotypes of study probands can be obtained, especially if the genetic factor is rare.18–20

Case-control analysis (cases versus sister controls)
We compare estimates of the main effects obtained by using the two different control groups (population versus sister controls). The rationale is that a difference in these estimates would give some indication of the direction of G x E interaction.6 It was shown by these authors that, with values of interaction greater than one (positive interaction), the matched OR with sister controls is always greater than the one with population controls. For values of interaction less than one, however, the direction of the relative difference between the two OR may differ. Although theoretically attractive, the practical limitation is that not all cases have one or more sisters. The sample size for this analysis is thus much smaller, yielding wider CI for the OR estimates.

Case-only analysis
In case-only analysis, a measure of association of G x E among cases is computed as a cross-product ratio.21 If gene and environmental factor are independent in the general population, this cross-product ratio estimates the population risk ratio for G x E. Schmidt and Schaid22 showed that it is important to distinguish this effect measure from the one estimated in a case-control study. If the factors are not independent, the case-only analysis is not appropriate. Therefore, we first checked independence by correlation analysis and/or χ²-tests among the controls. Keeping in mind that different effect measures are estimated and assuming that the independence assumption holds, the case-only study has been shown to provide more power to detect G x E and better precision for effect estimation due to elimination of control group variability.7,23 If Y denotes a binary variable for genetic susceptibility and X denotes a binary variable describing a reproductive factor, the model logit P(Y = 1|X) = α0 + α1x provides an estimate for the risk ratio for G x E through exp(α1) with a corresponding CI. Note that the reverse model logit P(X = 1|Y) = α0* + α1y would yield the same coefficient α1. As in the case-control analysis, we use this model based on the fixed effect assumption. For a case-only analysis, the dependent variable, which can be either the genetic or environmental factor, has to be binary in order to obtain a parameter that allows an interpretation as a relative risk. When both variables are continuous or categorical with more than two levels, one of the two must be dichotomized and then used as the dependent variable.
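For two binary factors, the case-only estimate reduces to the cross-product ratio of a 2 x 2 table among cases, which equals exp(α1) from the logistic model with a single binary covariate. The sketch below computes it with a Woolf-type (log-scale) confidence interval; the function name and the counts in the usage line are hypothetical and are not data from this study.

```python
import math


def case_only_interaction(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Cross-product ratio among cases only, with a log-scale (Woolf) CI.

    a: cases with G = 1 and E = 1    b: cases with G = 1 and E = 0
    c: cases with G = 0 and E = 1    d: cases with G = 0 and E = 0
    Valid as a G x E measure only if G and E are independent in the
    source population.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts, for illustration only.
ratio, lower, upper = case_only_interaction(20, 30, 40, 60)
print(ratio, lower, upper)
```

Swapping the roles of G and E transposes the table and leaves the cross-product ratio unchanged, which mirrors the remark that the reverse model yields the same coefficient.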

If, as in the present study, carrier probabilities are used to incorporate the uncertainty about the proband’s unmeasured genotype G, a case-only analysis can be carried out by using a mixture logistic model that includes measured environmental risk factors.24 Specifically, in a case-only analysis for i = 1,...,n cases (disease status D = 1), the response variable E corresponds to a binary environmental exposure and the carrier probability P(G = 1) defines the following mixture likelihood:

where


The logistic probabilities for the response variable (environmental factor) are weighted by the probabilities for each of the two possible outcomes (presence/absence) for the dependent variable (genetic factor). The parameter α1 estimates the risk ratio for G x E. Further details on maximizing this likelihood can be found in Schmidt.24
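A mixture likelihood consistent with this verbal description can be sketched as follows. This is a reconstruction under stated assumptions rather than the authors' displayed formula: the symbols π_i (carrier probability of case i), e_i (observed binary exposure of case i) and p_g are notation introduced here, and the exact form used in Schmidt24 may differ, for example by including further covariates.

```latex
\[
L(\alpha_0, \alpha_1)
  = \prod_{i=1}^{n}
    \Bigl[\, \pi_i \, p_1^{\,e_i} (1 - p_1)^{1 - e_i}
          + (1 - \pi_i)\, p_0^{\,e_i} (1 - p_0)^{1 - e_i} \Bigr],
\qquad \pi_i = P(G_i = 1)
\]
\[
p_g = P(E = 1 \mid G = g, D = 1)
    = \frac{\exp(\alpha_0 + \alpha_1 g)}{1 + \exp(\alpha_0 + \alpha_1 g)},
\qquad g \in \{0, 1\}
\]
```

When every π_i is exactly 0 or 1, each factor collapses to a single logistic term and the expression reduces to the ordinary logistic likelihood of the case-only model with a measured binary genotype.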


    Results
 
Table 2 shows the age distribution of the study population and the distribution of reproductive risk factors. The distribution of the surrogate genetic measures is clearly different in cases and population controls (Table 3). For the case-only analysis, the disease status of the cases was taken into account in the computation of gene carrier probabilities, which range from 0.009 to 0.978 with a median of 0.083. The estimated carrier probabilities were smaller for the case-control analysis, where the affection status of study subjects was assumed to be unknown. Here, the median was 0.004 in both cases and in controls.


Table 2 Distribution of age and of reproductive risk factors among cases, population controls, and sister controls
 

Table 3 Distribution of surrogate measures of genetic susceptibility among cases and population controls
 
Main effects of genetic and environmental factors
Age at menarche did not have a significant effect on breast cancer risk (Table 4). We nevertheless decided to keep this recognized risk factor for breast cancer25 for further G x E interaction analysis. The variable ‘number of full-term pregnancies’ was used with four levels (0,1,2,3+) in the analysis of all women and with three levels (1,2,3+) in the analysis restricted to parous women. There was a statistically significant decrease in risk with increasing number of full-term pregnancies among parous women and also when comparing 3+ versus 1 full-term pregnancy (P = 0.01). In all women, neither number of full-term pregnancies nor parity as a binary variable (0 versus 1+ full-term pregnancies) yielded a significant effect.

All the surrogate measures for genetic susceptibility are associated with highly significant increased OR for breast cancer (Table 5). Among the models in which one parameter for genetic susceptibility was estimated, we found the best fit for the model in which the scored gene carrier probabilities (0: P < 0.005, 1: 0.005 ≤ P < 0.010, 2: 0.010 ≤ P < 0.025, 3: 0.025 ≤ P < 0.05, 4: P ≥ 0.05) were used (χ²-value 37.07).
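The scoring of carrier probabilities into the five categories listed above can be sketched as a simple lookup. The cut-points are taken from the text; the function name and the example probabilities are illustrative assumptions.

```python
from bisect import bisect_right

# Lower bounds of score categories 1 to 4, as listed in the text.
CUTPOINTS = [0.005, 0.010, 0.025, 0.05]


def carrier_score(p: float) -> int:
    """Map a gene carrier probability to its score (0 to 4).

    bisect_right counts how many cut-points are <= p, so a probability
    exactly on a boundary falls into the higher category, matching the
    'lower bound inclusive' definition of the categories.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("carrier probability must lie in [0, 1]")
    return bisect_right(CUTPOINTS, p)


print([carrier_score(p) for p in (0.004, 0.005, 0.02, 0.05, 0.978)])
# [0, 1, 2, 4, 4]
```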


Table 5 Odds ratios (OR) for breast cancer associated with variables describing genetic susceptibility in a case-control-family study in Germany
 
Estimation of G x E interaction
In examining the validity of the independence assumption for the genetic variable and the different environmental risk factors among the controls, we found a negative correlation between the number of full-term pregnancies and the gene carrier probability (rSpearman = –0.15, P < 0.001). This correlation remained almost unchanged when adjusted for family size (partial rSpearman = –0.12, P < 0.001). Therefore, a necessary condition for performing a case-only analysis is violated and the corresponding analyses are not presented for this variable. For all other environmental factors, no significant correlation with the surrogate genetic measures was found and therefore the case-only analysis was considered to be appropriate. Table 6 shows the distribution of the risk factors by case-control status and by family history.
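For a pair of binary factors, an independence check among controls of the kind described in the methods can be sketched with a Pearson χ²-test on a 2 x 2 table. The sketch below uses no continuity correction, and the function name and counts are hypothetical; the Spearman correlations reported above are a separate analysis not reproduced here.

```python
import math


def chi2_independence_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (1 df, no continuity correction) for a 2 x 2 table.

    Rows: genetic factor present / absent; columns: exposure present / absent,
    counted among controls only. Returns (chi-square statistic, p-value).
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)), since X is a squared standard normal.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control counts, for illustration only.
print(chi2_independence_2x2(30, 10, 10, 30))
```

A small p-value among controls would argue against the independence assumption and hence against a case-only analysis for that factor.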


Table 6 Distribution of reproductive risk factors by case-control status and first-degree family history of breast or ovarian cancer
 
Parity/number of full-term pregnancies
The case-control analysis showed a positive interaction indicating that for genetically susceptible women, the protective effect of a higher number of full-term pregnancies is reduced (Table 7). This is consistent for the different methods used and is statistically significant for the analysis using gene carrier probability. The estimated interaction OR vary from 1.14 to 1.67 and the P-values from 0.01 to 0.31. As an example, consider a woman with 3+ full-term pregnancies whose estimated OR is 0.76 in comparison with nulliparous women (Table 4) when estimating the effect for each category separately. In the model on which the results of Table 7 are based, the estimated OR for a woman with 3+ full-term pregnancies and no first-degree relative with breast or ovarian cancer is 0.81, which is slightly higher than the estimate obtained in Table 4. For women with at least one first-degree relative with breast cancer, the estimated OR for 3+ full-term pregnancies is 0.81 x 1.24 = 1.0 when using the simplest surrogate measure (variable BCOV). Results are only slightly different when using the carrier probability as surrogate measure. Thus, the protective effect of a larger number of full-term pregnancies in genetically non-susceptible women in this study appears to be reduced or even absent in women with genetic susceptibility.
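The worked example above follows from the multiplicative scale of the model: the exposure OR among susceptible women is the exposure OR among non-susceptible women times the interaction OR. A minimal sketch of that arithmetic, using the two estimates quoted in the text (the function name is an assumption for illustration):

```python
import math


def exposure_or_in_susceptible(or_nonsusceptible: float, or_interaction: float) -> float:
    """Exposure OR among susceptible women on the multiplicative scale.

    Equivalent to exp(beta_E + beta_GE): the log odds ratios add,
    so the odds ratios multiply.
    """
    return math.exp(math.log(or_nonsusceptible) + math.log(or_interaction))


# 0.81: OR for 3+ full-term pregnancies without family history (from the text)
# 1.24: interaction OR with BCOV (from the text)
print(round(exposure_or_in_susceptible(0.81, 1.24), 2))  # 1.0
```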


Table 7 Estimates for gene–environment interaction in a case-control-family study of breast cancer in Germany
 
In the analysis with sister controls, ≥3 full-term pregnancies were associated with a non-significant, slightly increased risk (OR = 1.29, 95% CI: 0.56–2.98, P = 0.55) (Table 4). Compared with the protective effect found in population controls (OR = 0.76, 95% CI: 0.52–1.09, P = 0.13), this result provides further support for a positive interaction of this variable with genetic susceptibility.

Breastfeeding
The interaction estimates are close to one with large P-values except for the mixture logistic model (case-only analysis) where a negative interaction term was estimated (Table 6). However, the CI is wide and a further interpretation of this estimate is thus not warranted. In the case-only analyses using PCARR, breastfeeding was used as the dependent binary variable, whereas in the case-only analyses using BCOV, this surrogate genetic measure was the dependent binary variable and duration of breastfeeding (in months) was incorporated as a continuous variable. The comparison of results from population controls (OR = 0.77) and sister controls (OR = 0.58) suggests a stronger protective effect when comparing with sister controls.

Abortion
The interaction estimates for this factor are mostly positive but not statistically significant. A positive interaction would mean that a history of induced abortion is associated with a higher breast cancer risk in genetically susceptible women compared to non-susceptible women. The effects of abortion as estimated from population controls and from sister controls were almost identical (OR = 1.36 and 1.34 respectively) and therefore do not provide additional information.

Age at menarche
We found consistent negative interaction estimates in all models and analyses considered, indicating a stronger protective effect of later age at menarche for women with genetic susceptibility. However, except for the model with the simple binary indicator for genetic susceptibility (P-value = 0.02), the estimates are not significant (P-values 0.15 to 0.42). The CI from the mixture logistic model is much wider than that from the other models (see Discussion). There was no appreciable difference in the results obtained using population controls (OR = 0.93) compared with those using sister controls (OR = 1.03) (Table 4).


    Discussion
 
Number of full-term pregnancies, induced abortions, duration of breastfeeding, and age at menarche were considered as independent risk factors for breast cancer in this analysis and, except for age at menarche, were found to have significant effects on breast cancer risk. These main effects were observed in the first study of predominantly premenopausal women in Germany and confirm previous findings.25 Family history was a strong predictor of breast cancer risk regardless of the type of surrogate measure employed. The use of a surrogate variable based on a genetic model developed for rare dominant breast cancer genes14,15 yielded the best fit of the model. However, when the carrier probability was dichotomized into ‘high’ and ‘low’ after selecting an optimal cut-off point according to the goodness-of-fit statistics (difference of deviances of the models with and without the surrogate measures), it was only slightly superior to the simple binary indicator ‘presence/absence of breast or ovarian cancer in first-degree relatives’. The χ²-value for the fit of the model with gene carrier probability dichotomized at the cut-off point of 0.007 was 33.3 compared with 31.4 with the binary family history indicator. Thus, it appears that any binary surrogate measure for genetic susceptibility is unlikely to extract much more information than the simple measure. However, using the estimated carrier probability as a categorical or continuous variable leads to a better model fit and seems more suitable for assessing G x E interaction.

We found a consistent and, for some models, significant positive interaction of genetic susceptibility and number of full-term pregnancies. The fact that this effect was observed in different models and with several surrogate genetic measures supports a reduced protective effect of high parity among genetically susceptible women. Several other authors have considered a possible interaction between a family history of breast cancer and parity and observed a lack of protection from multiple births in women with a family history of breast cancer.26–28 Other studies, however, observed no differential effect of parity by family history of breast cancer29,30 or even a stronger protective effect of parity for those with a family history.31,32 Note that the previous studies rarely reported results by menopausal status. In a recent combined analysis of seven breast cancer case-control studies, Andrieu et al.33 [...] second-degree relative) increased slightly for women with high parity compared with women with low parity, but only in the older age group (>40 years). This finding is equivalent to our observation that high parity appears to be less protective for genetically susceptible women, although our sample size is too small to consider potentially different effects in older and younger women. The reduction in protection from high parity appears to be more consistent in studies with a more stringent definition of genetic susceptibility.
In a segregation analysis of breast cancer in 288 pedigrees selected through breast cancer probands from two French hospitals in which the mean age of diagnosis was 51 years, a significant interaction was found between parity and the dominant gene effect, and the decreased risk of breast cancer associated with high parity among non-susceptible women was not discernible among susceptible women.34 A recent study of 236 age-matched case-control pairs of women who carry deleterious mutations in the BRCA1 and BRCA2 genes and are therefore genetically predisposed to breast cancer also found an increased risk for breast cancer with increasing number of births.35

We did not observe an effect of age at first birth on breast cancer risk. However, the age at first live birth is typically highly correlated with number of live births (in our study, rSpearman = –0.21, P < 0.001). Gail and Greene36 noted that in the so-called Gail model37 which was developed from case-control data of the Breast Cancer Detection Demonstration Project (BCDDP), a negative interaction was observed between age at first live birth and number of affected first-degree relatives. This interaction was also found by Bondy et al.38 who investigated women participating in the American Cancer Society 1987 Texas Breast Screening Project. Because of the negative correlation between number of births and age at first birth, these findings are also in agreement with ours.

There was no indication of an interaction with breastfeeding in this analysis. In a previous analysis, in which a first-degree family history of disease included only breast cancer, the association between duration of breastfeeding and breast cancer risk was modified slightly by family history of breast cancer (P-value for interaction = 0.08).9 The protective effect of breastfeeding was stronger in those with a family history of breast cancer. Few studies have examined variation in the effect of breastfeeding by family history of breast cancer. Some evidence for a stronger protective effect of breastfeeding (ever/never) in the presence of family history was also previously reported by two case-control studies where, however, no relationship with duration of breastfeeding was found.32,39

Our data show a positive (albeit non-significant) interaction of genetic factors and induced abortions, potentially implying an additional increase in breast cancer risk for genetically susceptible women. Such an interaction has been previously reported in a study where the familial risk was found to be highest among women who had had an abortion before first childbirth.40 The interaction was modified by the time at which abortion occurred in relation to the first birth and the authors suggested that the observation may be due to an effect of abortion itself rather than predisposition to spontaneous abortion.

There has not been much evidence of variation by family history in the risk for breast cancer associated with age at menarche. Several previous studies which investigated such variation indicated that the risk associated with an early age at menarche tends to increase further rather than decrease for women with a family history of breast cancer.27,30,33 The slightly negative interaction with age at menarche observed in this study would also suggest a further decreased breast cancer risk for genetically susceptible women with a late age at menarche.

We chose to investigate G x E interaction using several different analytical methods, with the rationale that agreement of results should support the existence (or non-existence) of G x E interaction. The results from the different analytical approaches presented here cannot be combined to give a ‘best’ estimate. It appears that the comparison of OR derived from population controls with those derived from sister controls provides only limited additional information. Interaction with a rare genetic factor will produce only a small difference between the environmental factor OR and is thus more difficult to determine.6 Both the magnitude and the statistical variation of the effect estimates are strongly influenced by the underlying sample sizes for each comparison. Thus, in considering the use of different control groups, one has to bear in mind the practical limitations of a restricted pool of sisters as controls.

Given known main effects, the case-only approach is appealing. However, its limitations become apparent when both the surrogate genetic and the environmental variable are continuous or categorical with more than two levels. If both variables are continuous, it may be possible to use a linear model or to calculate a correlation coefficient between the two variables. Further research is warranted to investigate whether the corresponding regression or correlation coefficient can be given a meaningful interpretation with regard to G x E interaction. We calculated the Spearman correlation coefficient between duration of breastfeeding and gene carrier probability among cases (for parous women only) and did not find any correlation (r = 0.02). This is in agreement with the results on G x E interaction for this variable from the other analyses.
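For the simplest situation, in which both the surrogate genetic measure and the environmental factor are binary, the case-only estimate reduces to an odds ratio computed among cases alone. The following is a minimal sketch of that calculation, assuming independence of the two factors in the source population; the function name and the counts are hypothetical and are not data from this study.

```python
import math

def case_only_interaction_or(a, b, c, d):
    """Case-only odds ratio between a binary genetic marker G and a binary exposure E.

    Counts among cases only:
        a: G+ and E+,  b: G+ and E-,  c: G- and E+,  d: G- and E-
    Returns (estimate, lower, upper) with a 95% Wald interval on the log scale.
    Interpreting the estimate as a multiplicative interaction assumes that
    G and E are independent in the source population.
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper

# Hypothetical counts for illustration only (not study data)
estimate, lower, upper = case_only_interaction_or(30, 20, 300, 350)
print(f"case-only OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With more than two levels, or with continuous variables, no single 2 x 2 table exists, which is the limitation described above.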

The more common approaches for investigating G x E interaction with a fixed-effect model do not adjust for misclassification of the true underlying genotype. The mixture logistic model, on the other hand, takes into account that the true genotype can only be inferred probabilistically and therefore provides at least some adjustment for misclassification. The price to be paid for this adjustment is the wider CI for the effect estimates, which reflects the greater degree of uncertainty about the true level of genetic susceptibility. In addition, the use of carrier probabilities as a surrogate genetic measure, whether for a case-only or a case-control analysis, assumes that the underlying genetic model (disease allele frequency, mode of inheritance, and age-specific penetrance values) is correct, which may or may not be appropriate for the study population at hand. To take into account the uncertainty about the true genetic model parameters as an additional source of variability, the computation of carrier probabilities could be repeated with the upper and lower confidence limits of these model parameters, which would obviously lead to even wider confidence limits for the G x E interaction estimate.
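The idea of treating the genotype as known only probabilistically can be sketched as a weighted sum of two logistic risk functions, one for carriers and one for non-carriers, with the carrier probability as the weight. The parameterization below (names `alpha`, `beta_g`, `beta_e`, `beta_ge`) is an assumption made for illustration and is not necessarily the one used in the analysis; the coefficient values are hypothetical.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def mixture_risk(p_carrier, x, alpha, beta_g, beta_e, beta_ge):
    """Disease probability when the genotype is known only through p_carrier.

    Non-carriers: logit(risk) = alpha + beta_e * x
    Carriers:     logit(risk) = alpha + beta_g + (beta_e + beta_ge) * x
    The two risks are mixed with weights (1 - p_carrier) and p_carrier.
    """
    risk_noncarrier = expit(alpha + beta_e * x)
    risk_carrier = expit(alpha + beta_g + (beta_e + beta_ge) * x)
    return p_carrier * risk_carrier + (1.0 - p_carrier) * risk_noncarrier

# Hypothetical coefficients chosen only for illustration (not study estimates)
params = dict(alpha=-3.0, beta_g=1.5, beta_e=-0.2, beta_ge=0.15)
for p in (0.0, 0.1, 0.9, 1.0):
    print(p, round(mixture_risk(p, x=3, **params), 4))
```

A sensitivity analysis of the kind suggested above would recompute `p_carrier` under alternative genetic model parameters and refit the model each time.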

The heterogeneity of results for G x E interaction obtained by various studies must be attributed to (1) the limitations imposed by the surrogate measures of genetic susceptibility and (2) the need for larger sample sizes to assess interactions in general. If surrogate measures for genetic susceptibility are the only available information, any distinction between potentially different interaction effects that an environmental factor may have with a specific gene is impossible. It is conceivable that, if a factor has a positive interaction with one gene, and a negative interaction with another gene, the overall estimated effect would essentially be reduced to zero if no distinction between genes is made. On the other hand, G x E interactions detected using surrogate measures of genetic susceptibility point to environmental exposures for which further investigation may be warranted in studies with known genetic susceptibility. Studies of clearly defined sub-populations, such as a prospective cohort study among carriers of mutations in BRCA1 and BRCA2 or other breast cancer susceptibility genes, or case-control studies with measured genotypes, are thus required to confirm the findings of our analyses and previous studies assessing G x E interaction in breast cancer.
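The cancellation of opposite gene-specific interactions mentioned above can be illustrated with simple arithmetic. The sketch below uses a share-weighted average of log odds ratios as a crude approximation of the pooled effect; this averaging rule, the function name, and all numbers are assumptions for illustration, not results from any study.

```python
import math

def pooled_log_interaction(gene_shares, log_interaction_ors):
    """Share-weighted average of gene-specific interaction log odds ratios.

    gene_shares: proportion of susceptible women attributable to each gene
                 (must sum to 1).
    log_interaction_ors: log of the G x E interaction OR for each gene.
    """
    if abs(sum(gene_shares) - 1.0) > 1e-9:
        raise ValueError("gene_shares must sum to 1")
    return sum(w * b for w, b in zip(gene_shares, log_interaction_ors))

# Hypothetical: two genes equally common among susceptible women, with
# interaction ORs of 1.5 and 1/1.5 for the same environmental factor
log_ors = [math.log(1.5), -math.log(1.5)]
pooled = pooled_log_interaction([0.5, 0.5], log_ors)
pooled_or = math.exp(pooled)
print(f"pooled interaction OR = {pooled_or:.2f}")
```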


    Dedication
 
This article is dedicated to Harald zur Hausen on the occasion of his retirement as head of the German Cancer Research Centre (Deutsches Krebsforschungszentrum) in Heidelberg with gratitude and appreciation.


KEY MESSAGES

  • A population-based case-control-family study from Germany with 706 cases by age 50 years, 1381 population, and 252 sister controls to investigate main effects and possible gene–environment interaction (G x E) for environmental/lifestyle factors and genetic susceptibility.
  • First-degree family history of breast or ovarian cancer and gene carrier probability using a genetic model based on rare dominant genes were used as surrogate measures.
  • Different approaches to assess possible G x E interaction were used and compared.
  • The strongest main effect was found for familial predisposition taken as the estimated gene carrier probability.
  • High parity and longer duration of breastfeeding reduced breast cancer risk significantly.
  • In women most likely to have a genetic risk, parity does not appear to give the magnitude of protection it does in the general population.
  • We observed a stronger protective effect of later age at menarche for women with a positive family history based on both surrogate measures.
  • For breastfeeding and abortion, no evidence for G x E interaction was found.

 


    Appendix
 
The use of the gene carrier probability as a covariable for genetic susceptibility in case-control and case-only analysis

Assume one wants to estimate within the case-control study the odds ratio for breast cancer (Y) given genetic susceptibility (S). Let p0 and p* be two values of the gene carrier probability, which serves as the surrogate measure for genetic susceptibility S. Then it follows

as resulting from the logistic model
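
A sketch of the relationship described here, assuming a logistic model that is linear in S and includes the other covariables X; the coefficient symbols \alpha, \beta and \gamma are notation introduced for illustration and are not taken from the original:

```latex
\[
\operatorname{logit} P(Y=1 \mid S, X) = \alpha + \beta S + \gamma X
\]
\[
\mathrm{OR}(p^{*} \text{ vs. } p_{0})
  = \frac{P(Y=1 \mid S=p^{*}, X)\,/\,P(Y=0 \mid S=p^{*}, X)}
         {P(Y=1 \mid S=p_{0}, X)\,/\,P(Y=0 \mid S=p_{0}, X)}
  = \exp\{\beta\,(p^{*} - p_{0})\}
\]
```

Under this form, comparing a certain carrier (p* = 1) with a certain non-carrier (p0 = 0) gives an odds ratio of exp(\beta).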

The independent variables S and X are assumed to have an effect on the disease variable Y, but the outcome Y itself should not influence the values of S and X. Therefore, when using the gene carrier probability as a surrogate variable for genetic susceptibility, this probability must not take the case/control status into account, just as the other surrogate measures are based only on disease history of relatives other than the proband. The gene carrier probability was therefore calculated based on an unknown disease status of cases and controls given the attained age. For the case-only analysis we used the gene carrier probability for the cases given disease status. This is justified because we simply computed a measure of association within a homogeneous (with respect to disease status) group of subjects. Under the independence assumption for S and X, this measure of association can be interpreted as an interaction risk ratio, as shown in Schmidt and Schaid.22


    Acknowledgments
 
We are grateful to the many gynecologists and oncologists in the 40 clinics of the study regions ‘Rhein-Neckar-Odenwald’ and ‘Freiburg’ for allowing us to contact their patients; to Ursula Eilber for competent data co-ordination and management, Silke Schieber, Andrea Busche-Bässler, Regina Hübner, Ruth Schäuble, Heike Wiedensohler, and Renate Birr for data collection; and to the many women and their relatives who participated in this research project. This work was supported by the Deutsche Krebshilfe e.V.


    References
 
1 Miki Y, Swensen J, Shattuck-Eidens D et al. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 1994;266:66–71.

2 Wooster R, Bignell G, Lancaster J et al. Identification of the breast cancer susceptibility gene BRCA2. Nature 1995;378:789–92.

3 Chenevix-Trench G, Spurdle AB, Gatei M et al. Dominant negative ATM mutations in breast cancer families. J Natl Cancer Inst 2002;94:205–15.

4 Antoniou AC, Pharoah PD, McMullan G, Day NE, Ponder BA, Easton D. Evidence for further breast cancer susceptibility genes in addition to BRCA1 and BRCA2 in a population-based study. Genet Epidemiol 2001;21:1–18.

5 Andrieu N, Goldstein AM. Epidemiologic and genetic approaches in the study of gene-environment interaction: an overview of available methods. Epidemiol Rev 1998;20:137–47.

6 Andrieu N, Goldstein AM. Use of relatives of cases as controls to identify risk factors when an interaction between environmental and genetic factors exists. Int J Epidemiol 1996;25:649–57.

7 Piegorsch WW, Weinberg CR, Taylor JA. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Stat Med 1994;13:153–62.

8 Clayton D, McKeigue PM. Epidemiologic methods for studying genes and environmental factors in complex diseases. Lancet 2001;358:1356–60.

9 Chang-Claude J, Eby N, Kiechle M, Bastert G, Becher H. Breastfeeding and breast cancer risk by age 50 among women in Germany. Cancer Causes Control 2000;11:687–95.

10 Khoury MJ, Flanders WD. Bias in using family history as a risk factor in case-control studies of disease. Epidemiology 1995;6:511–19.

11 Eby N, Chang-Claude J, Bishop DT. Familial risk and genetic susceptibility for breast cancer. Cancer Causes Control 1994;5:458–70.

12 Pharoah PD, Day NE, Duffy S, Easton DF, Ponder BA. Family history and the risk of breast cancer: a systematic review and meta-analysis. Int J Cancer 1997;71:800–09.

13 Claus EB, Risch N, Thompson WD. Genetic analysis of breast cancer in the Cancer and Steroid Hormone Study. Am J Hum Genet 1991;48:232–42.

14 Easton DF, Bishop DT, Ford D, Crockford GP. Genetic linkage analysis in familial breast and ovarian cancer: Results from 214 families. Am J Hum Genet 1993;52:678–701.

15 Ford D, Easton DF, Stratton M et al. Genetic heterogeneity and penetrance analysis of the BRCA1 and BRCA2 genes in breast cancer families. The Breast Cancer Linkage Consortium. Am J Hum Genet 1998;62:676–89.

16 Lathrop GM, Lalouel JM, Julier C, Ott J. Strategies for multilocus linkage analysis in humans. Proc Natl Acad Sci USA 1984;81:3443–46.

17 Neuhäuser M, Becher H. Improved odds ratio estimation by post-hoc stratification of case-control data. Stat Med 1997;16:993–1004.

18 Hwang SJ, Beaty TH, Liang KY, Coresh J, Khoury MJ. Minimum sample size estimation to detect gene-environment interaction in case-control designs. Am J Epidemiol 1994;140:1029–37.

19 Goldstein AM, Falk RT, Korczak JF, Lubin JH. Detecting gene-environment interactions using a case-control design. Genet Epidemiol 1997;14:1085–89.

20 Foppa I, Spiegelman D. Power and sample size calculations for case-control studies of gene-environment interactions with a polytomous exposure variable. Am J Epidemiol 1997;146:596–604.

21 Khoury MJ, Flanders WD. Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: case-control studies with no controls. Am J Epidemiol 1996;144:207–13.

22 Schmidt S, Schaid DJ. Potential misinterpretation of the case-only study to assess gene-environment interaction. Am J Epidemiol 1999;150:878–85.

23 Yang Q, Khoury MJ. Evolving methods in genetic epidemiology. III. Gene-environment interaction in epidemiologic research. Epidemiol Rev 1997;19:33–43.

24 Schmidt S. Statistical Methods to Assess Gene-Environment Interactions with Surrogate Genetic Measures. PhD Thesis, Department of Statistics, University of Dortmund, Germany 1999.

25 Kelsey JL, Bernstein L. Epidemiology and prevention of breast cancer. Ann Rev Public Health 1996;17:47–67.

26 Negri E, La Vecchia C, Bruzzi P et al. Risk factors for breast cancer: pooled results from three Italian case-control studies. Am J Epidemiol 1988;128:1207–15.

27 Parazzini F, La Vecchia C, Negri E, Franceschi S, Bocciolone L. Menstrual and reproductive factors and breast cancer in women with family history of the disease. Int J Cancer 1992;51:677–81.

28 Colditz GA, Rosner BA, Speizer FE. Risk factors for breast cancer according to family history of breast cancer. J Natl Cancer Inst 1996;88:365–71.

29 Sellers TA, Kushi LH, Potter JD et al. Effect of family history, body-fat distribution, and reproductive factors on the risk of postmenopausal breast cancer. N Engl J Med 1992;326:1323–29.

30 Bain C, Speizer FE, Rosner B, Belanger C, Hennekens CH. Family history of breast cancer as a risk indicator for the disease. Am J Epidemiol 1980;111:301–08.

31 Colditz GA, Willett WC, Hunter DJ et al. Family history, age, and risk of breast cancer. JAMA 1993;270:338–43.

32 Egan KM, Stampfer MJ, Rosner BA et al. Risk factors for breast cancer in women with a breast cancer family history. Cancer Epidemiol Biomarkers Prev 1998;7:359–64.

33 Andrieu N, Prevost T, Rohan TE et al. Variation in the interaction between familial and reproductive factors on the risk of breast cancer according to age, menopausal status, and degree of familiality. Int J Epidemiol 2000;29:214–23.

34 Andrieu N, Demenais F. Interactions between genetic and reproductive factors in breast cancer risk in a French family sample. Am J Hum Genet 1997;61:678–90.

35 Jernström H, Lerman C, Ghadirian P et al. Pregnancy and risk of early breast cancer in carriers of BRCA1 and BRCA2. Lancet 1999;354:1846–50.

36 Gail MH, Greene MH. Gail model and breast cancer. Lancet 2000;355:1017.

37 Gail MH, Brinton LA, Byar DP et al. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 1989;81:1879–86.

38 Bondy ML, Lustbader ED, Halabi S, Ross E, Vogel VG. Validation of a breast cancer risk assessment model in women with a positive family history. J Natl Cancer Inst 1994;86:620–25.

39 Brinton LA, Potischman NA, Swanson CA et al. Breastfeeding and breast cancer risk. Cancer Causes Control 1995;6:199–208.

40 Andrieu N, Duffy SW, Rohan TE et al. Familial risk, abortion and interactive effect on the risk of breast cancer: a combined analysis of six case-control studies. Br J Cancer 1995;72:744–51.