Correspondence to: Colin B. Begg, Ph.D., Memorial Sloan-Kettering Cancer Center, 1275 York Ave., New York, NY 10021 (e-mail: beggc{at}mskcc.org).
ABSTRACT
INTRODUCTION
Almost all studies to date have used the tools of genetic epidemiology to estimate the penetrance of BRCA1 and BRCA2 mutations. Penetrance is derived from the degree of familial aggregation of cancer, that is, the extent to which cancers cluster within families. These data have been used to determine age-specific risk. Early studies involved families with multiple occurrences of breast cancer that had been used to establish the linkage with BRCA1 and BRCA2 in the first place. The Breast Cancer Linkage Consortium has assembled many such families, and these data have led to penetrance estimates ranging from 71% to 85% by 70 years of age (5–8). However, ascertainment of these families is subject to ill-defined selection effects, leading to serious doubts about the validity of the estimates (9,10).
Concerns about this issue have led other investigators to pursue the ascertainment of carrier families that are unselected on the basis of family history of cancer. An influential study was conducted in the Washington, DC, metropolitan area, in which the investigators advertised for volunteers who were self-identified as Jewish. A total of 5318 volunteers provided blood for genetic analysis, and 120 carriers were identified who had one (or more) of the three breast cancer-associated mutations common in Ashkenazi Jews (185delAG and 5382insC in BRCA1 and 6174delT in BRCA2). Examination of the occurrence of breast cancer in the relatives of these carriers led to a combined penetrance estimate of 56%, a result substantially lower than the earlier estimates from cancer families (3). Since then, several studies have been published in which data from population-based case-control studies or consecutive series of hospital-based case patients with incident cancers have been used to identify carriers unselected on the basis of family history of cancer. These studies have exclusively used probands who have been diagnosed with cancer and have led to penetrance estimates that are considerably lower than the earlier estimates from families with multiple cases of breast cancer.
Case probands are used in these population-based studies for pragmatic reasons. [Throughout this article, the term proband will be used to mean the individual used to identify the family for study. Thus, the proband need not have been affected with the disease, in contrast to conventional usage of this term (11): a case proband is a proband who has been diagnosed with breast cancer, and a control proband is a proband who is disease free.] Because mutations in the BRCA1 and BRCA2 genes occur in a small percentage of the population at risk for breast cancer, genotyping of population-based control subjects will identify carriers very infrequently. By contrast, carriers are more frequent among case patients with incident breast cancer. Thus, the strategy of using case patients to identify probands with and without mutations and then using the relatives of these probands to calculate penetrance is appealing.
However, little attention has been paid to the methodologic implications of using case probands rather than control probands in studies of penetrance, and advice to epidemiologists on this issue has been vague. Wacholder et al. (12), although stating the importance of the fact that the probands must be representative of the population at risk, nonetheless allow for the use of a "series of patients with tissue available for genotyping . . ." for convenience, acknowledging that this approach may lead to the overestimation of penetrance because of "overrepresentation of high-risk families." Hopper et al. (13), in an exposition of principles of genetic epidemiology study designs, advocate collecting family history data from population-based control subjects but do not articulate the role of control subjects in the estimation of penetrance.
Unfortunately, case probands are fundamentally different from control probands, regardless of carrier status, because of size-biased sampling. That is, all risk factors for breast cancer are over-represented in incident cases of breast cancer, and so a sample of case probands who are mutation carriers will also have a higher frequency of all other risk factors than control probands who are mutation carriers, unless the mutation is the sole determining cause of cancer in all mutation carriers. The purpose of this article is to draw attention to this problem and to review the literature on the penetrance of breast cancer in BRCA1 and BRCA2 carriers so that the potential for bias in published estimates can be evaluated.
METHODS
A fundamental assumption in both of these methods is that the risks are presumed to be no different for family members of a carrier who is a case patient than for family members of a carrier who is a control subject. Risk is thus assumed to depend solely on the presence of the mutation, an assumption also used by Parmigiani et al. (15) in their algorithm to predict mutational status on the basis of a family history of breast and ovarian cancer. Indeed, in their primary analysis of data from the Washington Ashkenazi Study, Wacholder et al. (12) combined the data from their probands regardless of whether they were patients with (or survivors of) breast and/or ovarian cancer (27 mutation carriers) or subjects at risk of breast cancer, that is, those who did not have the disease (62 mutation carriers). Subsequently, investigators studying the penetrance of BRCA1 or BRCA2 mutations have abandoned control probands entirely and used only data from case probands (16–21).
Penetrance estimates are meaningful only for individuals who are at risk, i.e., women who do not have breast cancer. Thus, the population at risk is the population of female carriers who do not (yet) have breast cancer. However, the following analysis shows that the distribution of risks in the population at risk is fundamentally different (i.e., lower risks in general) than the distribution of risks in incident cases. Let r denote the risk of a randomly selected carrier who is at risk in the chosen population, and consider the possibility that there is heterogeneity in the risks for different carriers. For simplicity, consider risk to be a single measure for each individual, even though risk actually changes as the individual ages. Let p(r) be the probability density of the risks in carriers, with the mean risk denoted by µ. Then, µ is the penetrance that is being estimated. In an ideal, though impractical, study design, one would identify a large number of carriers in the population and follow them over time to observe incident cancers and to estimate this (average) penetrance directly.
Consider now the distribution of risks in carriers identified from an incident series of cases. The risk distribution of these carriers would not be p(r) but, instead, would be size-biased, in the terminology of Patil and Rao (22). That is, individuals with higher risks are more likely to be sampled in proportion to the risks themselves. Specifically, if the probability density of risks among these carriers is q(r), then
$$q(r) = \frac{r\,p(r)}{\mu} \qquad [1]$$
Let us denote the mean risk of this distribution by $\mu_c$, where the subscript $c$ denotes the use of case probands. It can be shown that
$$\mu_c = \mu\left(1 + \frac{v^2}{\mu^2}\right) \qquad [2]$$
where $v^2$ is the variance of the risks in the population. Thus, the mean risk in carriers identified from case probands is greater than the mean risk in carriers identified from control probands, and equation 2 shows that the square of the coefficient of variation determines the degree by which the mean risk (penetrance) is inflated. That is, the greater the variation in risk among carriers in the population, the greater the bias.
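The inflation of the mean risk under size-biased sampling can be checked numerically. The following is a minimal sketch, not part of the original analysis: the two-point risk distribution, the function names, and the sample size are all illustrative assumptions. It compares the exact size-biased mean with the closed form, mean times (1 + variance/mean squared), and with a Monte Carlo draw of "case probands" sampled in proportion to their risk.

```python
import random

def population_mean_and_variance(risks):
    """Mean and (population) variance of the carriers' risks."""
    n = len(risks)
    mu = sum(risks) / n
    var = sum((r - mu) ** 2 for r in risks) / n
    return mu, var

def size_biased_mean(risks):
    """Exact mean risk when each carrier is sampled in proportion to risk."""
    return sum(r * r for r in risks) / sum(risks)

# Hypothetical carrier population: half at low risk, half at high risk.
risks = [0.2] * 500 + [0.8] * 500

mu, var = population_mean_and_variance(risks)
mu_c_exact = size_biased_mean(risks)
mu_c_formula = mu * (1 + var / mu ** 2)  # mean inflated by the squared CV

# Monte Carlo check: draw "case probands" with probability proportional to risk.
rng = random.Random(0)
cases = rng.choices(risks, weights=risks, k=100_000)
mu_c_simulated = sum(cases) / len(cases)

print(f"population mean risk      : {mu:.3f}")
print(f"size-biased mean (exact)  : {mu_c_exact:.3f}")
print(f"size-biased mean (formula): {mu_c_formula:.3f}")
print(f"size-biased mean (sampled): {mu_c_simulated:.3f}")
```

If all carriers in this sketch shared the same risk, the variance would be zero and the two means would coincide, which is the only situation in which case probands and control probands would be interchangeable.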
It is important to clarify that the risk distributions correspond to the inherent risks in the probands themselves. Because penetrance estimates are based on cancer incidence in relatives of probands, size bias occurs because of the impact of the additional risk factors that aggregate within families, e.g., genetic variations other than the gene under investigation (e.g., BRCA1 or BRCA2) or shared environmental risk factors. Furthermore, when we apply logic similar to that of Wacholder et al. (12), the upward bias in first-degree relatives of case probands who are carriers will be approximately half that of the probands themselves, as defined by equation 2.
The size-biased sampling paradigm characterized by equation 1 is fundamental to the case-control methodology that has formed the basis of cancer epidemiology research for approximately 50 years, since the work of Cornfield (23). That is, all cancer risk factors (e.g., smoking history, reproductive history, or others) are size-biased in a series of population-based incident case subjects in exactly the manner of equation 1. For example, let us consider any binary exposure with a prevalence of $p$ and relative risk of $\rho$. The exposed and nonexposed case subjects will be sampled in proportion to the product of the (relative) risks and the population prevalences, i.e., in the proportions $p\rho$ and $(1 - p)$, respectively. Thus, the prevalence of the exposure among case subjects is $q = p\rho/(1 - p + p\rho)$. The odds ratio, calculated from the cross-product of the case-control status and the exposure status, is thus $q(1 - p)/[p(1 - q)] = \rho$. This paradigm allows us to estimate the relative risk associated with any candidate risk factor but does not permit us to directly estimate absolute risks without the use of extraneous data on incidence to anchor the estimation (e.g., from cancer incidence registries). In other words, in identifying an incident case subject, we identify an individual who has been subject to a selection mechanism that affects all factors that influence risk. Thus, incident case subjects can be used for estimating absolute risks only if appropriate adjustments are made for the influences of all risk factors, both known and unknown. This concept has related implications for the interpretation of data on the incidence rates of second primary cancers, an issue that has been examined in detail for melanoma (24).
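The binary-exposure argument above can be illustrated with a few lines of arithmetic. This is a minimal sketch under assumed inputs: the prevalence of 0.25 and relative risk of 3 are hypothetical values chosen for easy checking, and the function names are not from the article. It shows that the exposure is over-represented among case subjects, yet the cross-product odds ratio returns the relative risk.

```python
def exposure_prevalence_in_cases(p, rr):
    """Prevalence of a binary exposure among incident cases.

    p  : prevalence of the exposure in the population at risk
    rr : relative risk associated with the exposure
    Exposed and unexposed cases arise in the proportions p*rr and (1 - p).
    """
    return p * rr / (1 - p + p * rr)

def cross_product_odds_ratio(q, p):
    """Odds ratio from exposure prevalence in cases (q) and in the population (p)."""
    return q * (1 - p) / (p * (1 - q))

# Hypothetical exposure: 25% prevalence, relative risk of 3.
p, rr = 0.25, 3.0
q = exposure_prevalence_in_cases(p, rr)
odds_ratio = cross_product_odds_ratio(q, p)

print(f"prevalence in population: {p:.2f}")
print(f"prevalence in cases     : {q:.2f}")   # exposure is over-represented
print(f"odds ratio              : {odds_ratio:.2f}")  # equals the relative risk
```

Note that nothing in this calculation yields an absolute risk: the same case-control proportions arise whatever the baseline incidence, which is why external incidence data are needed to anchor penetrance.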
RESULTS
DISCUSSION
Another indication of the likely degree of bias when using family data in case probands to estimate penetrance comes from the observation that there is far more familial aggregation of breast cancer than can be explained solely by mutations in BRCA1 and BRCA2. Family members of case probands routinely have much higher risks than the family members of control probands, supporting the basic thesis of equation 1, that all factors that influence risk are preferentially sampled in case patients in proportion to their effects on risk. For example, Claus et al. (26), using data from the Cancer and Steroid Hormone Study, identified those relatives of case probands who were very likely not to carry BRCA mutations (91.7% of case probands and 94.9% of control probands) and showed that the familial risk in such women was substantially higher than that in relatives of the control probands. In addition, in one of the most comprehensive studies using linked data from twin registries and cancer registries in Scandinavia, Lichtenstein et al. (27) showed dramatically increased risks for cancer in twins of case patients compared with that in twins of control subjects for every cancer site with sufficient data for the study. For female breast cancer, the relative risk for breast cancer in monozygotic twins was found to be 5.2 (95% confidence interval [CI] = 3.7 to 7.4), and the relative risk for breast cancer in dizygotic twins was 2.8 (95% CI = 2.1 to 3.8), results that may disguise a strong negative association with age at diagnosis (28).
In the studies by Lichtenstein et al. (27), relatives of case patients were compared with relatives of control subjects, without knowledge of individual carrier status or any other risk factors. The contributions of any risk factor to these estimated familial relative risks are greatly attenuated compared with the fundamental relative risks induced by the risk factors (29). Thus, these familial relative risks represent the presence of a substantial genetic component to risk variation. Hopper and Carlin (30) have studied this phenomenon in detail by use of a normally distributed risk factor that is classified into quartiles of risk, showing that a genetic factor that is associated with a relative risk of 20 when the upper and lower quartiles are compared will lead to a familial relative risk of only 3.5 in monozygotic twins and 1.9 in dizygotic twins. In fact, for a known binary risk factor, the contribution to the familial relative risk in monozygotic twins can be calculated by the expression $(1 - p + p\rho^2)/(1 - p + p\rho)^2$, where $p$ is the carrier prevalence and $\rho$ is the relative risk (28). For dizygotic twins or their first-degree relatives, the contribution of a binary risk factor to the relative risk should be approximately half that calculated by this expression. The carrier prevalence of BRCA1 and BRCA2 mutations is believed to be in the range of 0.1%–0.4% in outbred Western populations, such as the Scandinavian population studied by Lichtenstein et al. (27). Peto et al. (31) estimated the prevalence in Britain at just over 0.1%. Even if the relative risk were in the range of 10–20, the contribution of BRCA1 and BRCA2 mutations to the relative risk for monozygotic twins would be, at most, in the range of 1.30–2.24 (or in the range of 1.15–1.62 for dizygotic twins or their first-degree relatives). Thus, the bulk of the genetic variation in the risk of breast cancer remains to be explained, which is a conclusion also reached by other investigators (31,32). The presence of additional unknown genetic variants that substantially influence risk argues against the possibility that the risk of breast cancer shows little or no variation among carriers of BRCA1 and BRCA2 mutations. As demonstrated by equation 2, however, the presence of such variation inevitably leads to bias in the estimation of penetrance when case probands are used.
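The ranges quoted above for the contribution of BRCA1 and BRCA2 to the familial relative risk can be reproduced directly. This is a minimal sketch with assumed function names. It evaluates the monozygotic-twin expression at the upper carrier prevalence of 0.4% for relative risks of 10 and 20. The dizygotic figure is computed on the assumption that "approximately half" means half of the excess over 1, which is an interpretation rather than a formula stated in the text.

```python
def mz_familial_relative_risk(p, rr):
    """Contribution of a binary risk factor to the familial relative risk
    in monozygotic twins, with carrier prevalence p and relative risk rr."""
    return (1 - p + p * rr ** 2) / (1 - p + p * rr) ** 2

def dz_familial_relative_risk(p, rr):
    """Approximate contribution for dizygotic twins or first-degree relatives.

    Assumption: half of the monozygotic excess over 1 is retained.
    """
    return 1 + 0.5 * (mz_familial_relative_risk(p, rr) - 1)

prevalence = 0.004  # upper end of the 0.1%-0.4% carrier prevalence range
for rr in (10, 20):
    mz = mz_familial_relative_risk(prevalence, rr)
    dz = dz_familial_relative_risk(prevalence, rr)
    print(f"relative risk {rr}: MZ {mz:.2f}, DZ/first-degree {dz:.2f}")
```

Under these assumptions the values agree with the quoted ranges of 1.30 to 2.24 and 1.15 to 1.62, well below the twin relative risks of 5.2 and 2.8 reported by Lichtenstein et al. (27).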
The theory presented in this article indicates that control probands are an especially precious resource for studies of strong genetic risk factors, because all of the highly penetrant cancer gene mutations identified to date are rare in the population. In the absence of control probands who are mutation carriers, kin-cohort approaches that use case probands are a pragmatic strategy, but methodologic work on statistical techniques for correcting the bias caused by size-biased sampling is needed. Chatterjee and Wacholder (33) have developed an analytic method that recognizes and endeavors to estimate the residual intrafamily correlation that is the consequence of size-biased sampling. However, the method is designed to estimate residual correlation rather than to adjust the estimates of penetrance. Simulation studies by both Gail et al. (34) and Chatterjee and Wacholder (33) show that the presence of heterogeneity leads to a positive bias in the estimate of penetrance. In both of these investigations, the case patients were not generated in a size-biased manner with respect to the underlying risk, and so the biases observed must be underestimates of the true biases. Furthermore, the authors used case-control sampling fractions ranging from 10% (34) to 50% (33) when, as we have seen, most investigations of the penetrance of major cancer gene mutations use essentially 100% case probands. It may not be possible to devise a reliable statistical method for correcting the bias in studies that use only familial aggregation of cancer in relatives of incident case patients. The more appropriate use of these data may be to estimate the relative risks associated with the mutations, even while recognizing that this strategy is itself susceptible to bias caused by a decrease in the observed relative risks when unmeasured covariates that influence risk are involved (35,36). The use of a conditional likelihood approach may provide a robust analytic strategy in this setting (37). These relative risks can then be used to infer the penetrance of the mutation by using known age-specific incidence rates as a benchmark, such as in the approach of Satagopan et al. (25).
Finally, an important thesis of this article is that substantial heterogeneity exists in the risk for cancer among individuals in the population. The notion that the presence of a germline BRCA1 or BRCA2 mutation in a woman completely defines her risk for breast cancer is probably far from the truth and, in fact, numerous unknown genetic risk modifiers are likely to exist. Thus, a woman who is identified as a carrier and who also has a strong family history of breast cancer is likely to possess a much higher risk for breast cancer than a carrier with no known family history of the disease. These differences should be reflected in the tools used by genetic counselors to predict risk. However, refinement of risk prediction methods such as the Gail model (38) to encompass detailed family history of breast cancer and carrier status (if known) is a major methodologic challenge, in part because of the problems outlined in this article.
NOTES
1 Editor's note: SEER is a set of geographically defined, population-based, central cancer registries in the United States, operated by local nonprofit organizations under contract to the National Cancer Institute (NCI). Registry data are submitted electronically without personal identifiers to the NCI on a biannual basis, and the NCI makes the data available to the public for scientific research.
REFERENCES
1 Fodor FH, Weston A, Bleiweiss IJ, McCurdy LD, Walsh MM, Tartter PI, et al. Frequency and carrier risk associated with common BRCA1 and BRCA2 mutations in Ashkenazi Jewish breast cancer patients. Am J Hum Genet 1998;63:45–51.
2 Roa BB, Boyd AA, Volcik K, Richards CS. Ashkenazi Jewish population frequencies for common mutations in BRCA1 and BRCA2. Nat Genet 1996;14:185–7.
3 Struewing JP, Hartge P, Wacholder S, Baker SM, Berlin M, McAdams M, et al. The risk of cancer associated with specific mutations of BRCA1 and BRCA2 among Ashkenazi Jews. N Engl J Med 1997;336:1401–8.
4 Schrag D, Kuntz KM, Garber JE, Weeks JC. Decision analysis: effects of prophylactic mastectomy and oophorectomy on life expectancy among women with BRCA1 and BRCA2 mutations. N Engl J Med 1997;336:1465–71.
5 Easton DF, Ford D, Bishop DT. Breast and ovarian cancer incidence in BRCA1-mutation carriers. Breast Cancer Linkage Consortium. Am J Hum Genet 1995;56:265–71.
6 Ford D, Easton DF, Bishop DT, Narod SA, Goldgar DE. Risks of cancer in BRCA1 mutation carriers. Lancet 1994;343:692–5.
7 Ford D, Easton DF, Stratton M, Narod S, Goldgar D, Devilee P, et al. Genetic heterogeneity and penetrance analysis of the BRCA1 and BRCA2 genes in breast cancer families. Am J Hum Genet 1998;62:676–89.
8 Narod SA, Ford D, Devilee P, Barkardottir RB, Lynch HT, Smith SA, et al. An evaluation of genetic heterogeneity in 145 breast-ovarian cancer families. Am J Hum Genet 1995;56:254–64.
9 Burton PR, Palmer LJ, Jacobs K, Keen KJ, Olson JM, Elston RC. Ascertainment adjustment: where does it take us? Am J Hum Genet 2000;67:1505–14.
10 Elston RC. Twixt cup and lip: how intractable is the ascertainment problem? Am J Hum Genet 1995;56:15–7.
11 Khoury MJ, Beaty TH, Cohen BH. Fundamentals of genetic epidemiology. New York (NY): Oxford University Press; 1993. p. 67.
12 Wacholder S, Hartge P, Struewing JP, Pee D, McAdams M, Brody L, et al. The kin-cohort study for estimating penetrance. Am J Epidemiol 1998;148:623–30.
13 Hopper JL, Chenevix-Trench G, Jolley DJ, Dite GS, Jenkins MA, Venter DJ, et al. Design and analysis issues in a population-based, case-control-family study of the genetic epidemiology of breast cancer and the Co-operative Family Registry for Breast Cancer Studies (CFRBCS). J Natl Cancer Inst Monogr 1999;(26):95–100.
14 Gail MH, Pee D, Benichou J, Carroll R. Designing studies to estimate the penetrance of an identified autosomal dominant mutation: cohort, case-control, and genotyped-proband designs. Genet Epidemiol 1999;16:15–39.
15 Parmigiani G, Berry DA, Aguilar O. Determining carrier probabilities for breast cancer susceptibility genes BRCA1 and BRCA2. Am J Hum Genet 1998;62:145–58.
16 Prevalence and penetrance of BRCA1 and BRCA2 mutations in a population-based series of breast cancer cases. Anglian Breast Cancer Study Group. Br J Cancer 2000;83:1301–8.
17 Risch HA, McLaughlin JR, Cole DE, Rosen B, Bradley L, Kwan E, et al. Prevalence and penetrance of germline BRCA1 and BRCA2 mutations in a population series of 649 women with ovarian cancer. Am J Hum Genet 2001;68:700–10.
18 Antoniou AC, Gayther SA, Stratton JF, Ponder BA, Easton DF. Risk models for familial ovarian and breast cancer. Genet Epidemiol 2000;18:173–90.
19 Hopper JL, Southey MC, Dite GS, Jolley DJ, Giles GG, McCredie MR, et al. Population-based estimate of the average age-specific cumulative risk of breast cancer for a defined set of protein-truncating mutations in BRCA1 and BRCA2. Australian Breast Cancer Family Study. Cancer Epidemiol Biomarkers Prev 1999;8:741–7.
20 Thorlacius S, Struewing JP, Hartge P, Olafsdottir GH, Sigvaldason H, Tryggvadottir L, et al. Population-based study of risk of breast cancer in carriers of BRCA2 mutation. Lancet 1998;352:1337–9.
21 Warner E, Foulkes W, Goodwin P, Meschino W, Blondal J, Paterson C, et al. Prevalence and penetrance of BRCA1 and BRCA2 gene mutations in unselected Ashkenazi Jewish women with breast cancer. J Natl Cancer Inst 1999;91:1241–7.
22 Patil GP, Rao CR. Weighted distributions and size biased sampling with application to wildlife populations and human families. Biometrics 1978;34:179–89.
23 Cornfield J. A method of estimating comparative rates from clinical data. Applications to cancer of the lung, breast, and cervix. J Natl Cancer Inst 1951;11:1269–75.
24 Begg CB, Satagopan JM, Berwick M. A new strategy for evaluating the impact of epidemiologic risk factors for cancer with application to melanoma. J Am Stat Assoc 1998;93:415–26.
25 Satagopan JM, Offit K, Foulkes W, Robson ME, Wacholder S, Eng GM, et al. The lifetime risks of breast cancer in Ashkenazi Jewish carriers of BRCA1 and BRCA2 mutations. Cancer Epidemiol Biomarkers Prev 2001;10:467–73.
26 Claus EB, Schildkraut J, Iversen ES Jr, Berry D, Parmigiani G. Effect of BRCA1 and BRCA2 on the association between breast cancer risk and family history. J Natl Cancer Inst 1998;90:1824–9.
27 Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, et al. Environmental and heritable factors in the causation of cancer: analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med 2000;343:78–85.
28 Begg CB. The search for cancer risk factors: when can we stop looking? Am J Public Health 2001;91:360–4.
29 Peto J. Genetic predisposition to cancer. In: Cairns J, Lyon JL, Skolnick MH, editors. Banbury Report 4: Cancer incidence in defined populations. Cold Spring Harbor (NY): Cold Spring Harbor Laboratories; 1980. p. 203–13.
30 Hopper JL, Carlin JB. Familial aggregation of a disease consequent upon correlation between relatives in a risk factor measured on a continuous scale. Am J Epidemiol 1992;136:1138–47.
31 Peto J, Collins N, Barfoot R, Seal S, Warren W, Rahman N, et al. Prevalence of BRCA1 and BRCA2 gene mutations in patients with early-onset breast cancer. J Natl Cancer Inst 1999;91:943–9.
32 Hopper JL. Genetic epidemiology of female breast cancer. Semin Cancer Biol 2001;11:367–74.
33 Chatterjee N, Wacholder S. A marginal likelihood approach for estimating penetrance from kin-cohort designs. Biometrics 2001;57:245–52.
34 Gail MH, Pee D, Carroll R. Effects of violations of assumptions on likelihood methods for estimating the penetrance of an autosomal dominant mutation from kin-cohort studies. J Stat Plan Inf 2001;96:167–77.
35 Babiker A, Cuzick J. A simple frailty model for family studies with covariates. Stat Med 1994;13:1679–92.
36 Gail MH, Wieand S, Piantadosi S. Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika 1984;71:431–44.
37 Kraft P, Thomas DC. Bias and efficiency in family-based gene-characterization studies: conditional, prospective, retrospective, and joint likelihoods. Am J Hum Genet 2000;66:1119–31.
38 Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C, et al. Projecting individual probabilities of developing breast cancer for white females who are being evaluated annually. J Natl Cancer Inst 1989;81:1879–86.
Manuscript received December 11, 2001; revised June 11, 2002; accepted June 25, 2002.