The University of Melbourne, Centre for Genetic Epidemiology, 723 Swanston Street, Carlton, Victoria 3053, Australia.
The Human Genome Project and its associated technologies have opened up many new opportunities for epidemiology, one being the ability to test whether the effects of risk factors depend on the individual's genotype. As an early foray into this area, in this issue of the International Journal of Epidemiology Becher et al.1 report findings on potential differences in the effects of some breast cancer risk factors according to the likely genetic predisposition of the woman at risk, based on her family history. The findings are interesting in generating hypotheses, and are appropriately expressed with an appreciation of limitations in statistical power due to the indirect measures of genetic susceptibility, especially when using sister controls. There are a number of important, novel, and interesting features in this work, not the least being the use of a variation on the case-control-family design.
We first used this expression to describe a study in which the subjects consisted of population-based cases, population-based controls, and relatives of both cases and controls, all studied alike using the same questionnaires.2,3 The motivation behind extending case-control studies to relatives goes back a long way; see references in (2) to Woolf in 1955 and Clemmensen in 1965, and the Minnesota Breast Cancer Family Study of 1944–1952.4 These designs contrast with what might be termed case-control family designs, which rely solely on reports from the cases and controls on the disease status and other characteristics of their relatives, who are usually only first-degree relatives and on whom the information is rarely validated. To discriminate from our case-control-family design we have used two dashes to highlight the comparable involvement of cases, controls, and relatives, but have had a battle to maintain them both against the persistent attempts of sub-editors and typesetters to drop the second dash!
The current study starts with a classic case-control design. Although cases have been ascertained through hospitals, care has been taken to ensure this represents population-based sampling by using the local strict registration of all inhabitants of every municipality.5 These population registries also make it possible to approach a population-based sample of controls living in the same geographical region. In extending to family members the geographical restriction has been dropped, but sampling has been limited to a maximum of one sister per case only, being preferably the next elder or the sister closest in age. Living parents have been approached to donate a blood sample. Clearly more extensive sampling of relatives is possible, but there are major resource issues especially if biospecimen samples are to be sought.3
Although most analyses presented used risk factor data from only cases and controls, this was supplemented by the family breast cancer history data covering first- and second-degree relatives. By obtaining the latter information from multiple sources, the family history of the proband can be extended beyond first-degree relatives and hopefully has increased veracity.
Some analyses used sisters as controls, which attempts to overcome the potential confounding due to differing ethnic backgrounds that can arise when using unrelated controls, or population stratification as it is referred to in the genetics literature.6 This approach raises many interesting issues, such as deciding what is an appropriate control set.3 Becher et al.1 tried to match each case to a sister control. Others use only unaffected sisters who have lived at least as long as the case, and try to truncate exposure measures to the same reference age. This has consequences for statistical power because a substantial proportion of cases have no matched control sister. Another approach is to think of sisters as a group of controls frequency-matched for age, and select sisters in the same reference age range as cases. The influence of family clustering on statistical inference can be allowed for in the analyses using, for example, robust estimators of variance. Sensitivity analyses can be conducted by excluding case-control sibling pairs with large age differences. Other control relatives, such as cousins, can also be considered so as to increase efficiency, but much remains to be learnt about the best way to proceed in practice.3,6
These designs can be used to make inference about genetic variants if they are measured using the family biospecimen samples. There are a variety of methods,6,7 and selection of the more appropriate and efficient designs will depend on the allele frequencies and suspected modes of inheritance.
Becher et al.1 have made inference about gene effects solely on the basis of family history. This raises the question: just which gene(s) are being represented by the different surrogate measures of genetic susceptibility? It has become apparent that family history of female breast cancer is not a good predictor of genetic risk.8 For example, having an affected first-degree relative only increases the probability that a woman has inherited a mutation in BRCA1 or BRCA2 by about three- to fourfold. Given the rarity of these mutations, even among cases, their BCOV, a binary measure of the presence of a first-degree family history of breast or ovarian cancer, will for the most part not represent the effects of mutations in these genes. PCARR, an estimate of the possibility of carrying a dominantly inherited high risk of breast cancer based on the breast and ovarian cancer histories of the mother, sisters, aunts, and grandmothers, is based on a major gene model that assumes a single mode of inheritance and was derived from modelling of unvalidated nuclear family data. Nevertheless it is more likely to be detecting the effects of high-risk alleles, although this does not necessarily mean they are all mutations in the currently known genes BRCA1 and BRCA2.
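The arithmetic behind this point can be sketched in a few lines. The baseline carrier probability of 2% used here is a purely hypothetical figure chosen for illustration, not a value reported in the commentary or by Becher et al.; only the three- to fourfold increase comes from the text above. The function name is likewise an assumption.

```python
def carrier_probability(baseline: float, fold_increase: float) -> float:
    """Carrier probability after applying a fold increase, capped at 1."""
    return min(1.0, baseline * fold_increase)


# Hypothetical baseline probability of carrying a BRCA1 or BRCA2 mutation
# (an assumption for illustration only).
ASSUMED_BASELINE = 0.02

for fold in (3.0, 4.0):
    p_carrier = carrier_probability(ASSUMED_BASELINE, fold)
    print(
        f"fold increase {fold:.0f}: carrier probability {p_carrier:.2f}, "
        f"non-carrier probability {1 - p_carrier:.2f}"
    )
```

Under these assumed numbers, the large majority of women with a positive first-degree family history would still not carry a mutation in either gene, which is why a binary family history measure mostly reflects something other than BRCA1 and BRCA2.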
The complete genetic architecture of female breast cancer is likely to involve many variants in multiple genes, and the effects of these genetic variants may differ substantially from one another, and do so according to the woman's genotype and environment. I do not like the colloquial terms 'gene–gene interactions' or 'gene–environment interactions' that are open to multiple interpretations.9 Instead I prefer to think in terms of genetic and environmental modifiers of risk in women with a particular genetic or environmental risk factor. For example, for a woman who has a germline mutation in BRCA1 or BRCA2 what matters is knowing if there are specific lifestyle-related modifiers of her risk. It is irrelevant to her whether the effect sizes differ (statistically) from those of women who do not have a mutation. As genotyping becomes cheaper and more is learnt about the genes controlling breast cancer predisposition pathways, case-control-family studies will provide powerful means for determining both genetic and environmental modifiers of inherited risk.3,7 Having genotype information on parents adds substantially to the information that can be extracted from family designs.
In conclusion, the combined effects of the risk factors (correlated between relatives) that cause familial aggregation of a disease on a population basis must be an order of magnitude greater than the average increased risk to first-degree relatives of affected individuals.8 For breast cancer this means a 20-fold or greater risk in women in the upper quartile of familial (and possibly mostly genetic) risk compared with those in the lower quartile. The breast cancer susceptibility genes BRCA1 and BRCA2 appear to explain only a small proportion of this risk gradient, so uncovering the familial risks and understanding how they interact (in the biological sense) with the classic environmental risk factors should lead to a major increase in the understanding of the causes of this disease.
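One way to see how a modest familial relative risk can imply such a steep gradient is the following sketch. It rests on assumptions that are not stated in this commentary: that familial risk is log-normally distributed across women, that log-risks of first-degree relatives have correlation 0.5, and that the average increased risk to first-degree relatives is about twofold. Under a rare-disease approximation, the familial relative risk is then exp(r × σ²), where σ² is the variance of log-risk, and the mean risk within a quartile follows from the log-normal partial expectation. The function and parameter names are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist


def quartile_risk_ratio(familial_rr: float, relative_correlation: float = 0.5) -> float:
    """Mean risk in the top quartile divided by mean risk in the bottom
    quartile, under an ASSUMED log-normal distribution of familial risk."""
    std_normal = NormalDist()
    # Variance of log-risk implied by: familial_rr = exp(correlation * sigma^2)
    sigma = sqrt(log(familial_rr) / relative_correlation)
    z75 = std_normal.inv_cdf(0.75)  # standard normal upper-quartile cut-point
    # Log-normal partial expectations; the common factor exp(sigma^2 / 2)
    # and the quartile probability 0.25 cancel in the ratio.
    upper_mean = std_normal.cdf(sigma - z75)
    lower_mean = std_normal.cdf(-z75 - sigma)
    return upper_mean / lower_mean


# Assumed twofold risk to first-degree relatives of affected women.
ratio = quartile_risk_ratio(2.0)
print(f"upper vs lower quartile risk ratio: {ratio:.1f}")
```

With these assumed inputs the ratio comes out a little above 20, in line with the figure quoted above; a different risk distribution or correlation would change the number.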
However, this order of magnitude of extra information is unlikely to be obtained without conducting innovative genetic epidemiology studies using, for example, a case-control-family design, which involve an order of magnitude more subjects and resources. Furthermore, given that obtaining valid population-based samples of controls is becoming increasingly difficult, family designs may be the future of epidemiology, not just genetic epidemiology.
Efficient sampling and optimal analytical methods should now be an issue of major interest, using both theory and practical experience. I recommend that epidemiologists read carefully the article by Becher et al.1 in this issue, and the references here, and apprise themselves of some of the new ideas that will allow combining the emerging fields of molecular epidemiology and genetic epidemiology with traditional questionnaire epidemiology.
References
2 Hopper JL, Giles GG, McCredie MRE, Boyle P. Background, rationale and protocol for a case-control-family study of breast cancer. The Breast 1994;3:79–86.
3 Hopper JL, Chenevix-Trench G, Jolley DJ et al. Design and analysis issues in a population-based, case-control-family study of the genetic epidemiology of breast cancer and the Co-operative Family Registry for Breast Cancer Studies (CFRBCS). J Natl Cancer Inst Monogr 1999;26:95–100.
4 Anderson VE, Goodman HO, Reed SC. Variables related to human breast cancer. Minneapolis: University of Minnesota Press, 1958.
5 Chang-Claude J, Eby N, Kiechle M, Bastert G, Becher H. Breastfeeding and breast cancer risk by age 50 among women in Germany. Cancer Causes Control 2000;11:687–95.
6 Thomas DC, Witte JS. Point: population stratification: a problem for case-control studies of candidate-gene associations? Cancer Epidemiol Biomark Prev 2002;11:505–12.
7 Cui J, Spurdle AB, Southey MC et al. Regressive logistic and proportional hazards disease models for within-family analyses of measured genotypes, with application to a CYP17 polymorphism and breast cancer. Genet Epidemiol 2003;24: in press.
8 Hopper J. Genetic epidemiology of female breast cancer. Semin Cancer Biol 2001;11:367–74.
9 Clayton D, McKeigue PM. Epidemiologic methods for studying genes and environmental factors in complex diseases. Lancet 2001;358:1356–60.