Relative risk for genetic associations: the case-parent triad as a variant of case-cohort design

Habibul Ahsan (a,d), Susan E Hodge (b,c,e), Gary A Heiman (a), Melissa D Begg (b) and Ezra S Susser (a,c,f)

a Department of Epidemiology and
b Department of Biostatistics of Joseph L Mailman School of Public Health,
c Department of Psychiatry and
d Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, USA.
e Division of Clinical-Genetic Epidemiology and
f Epidemiology of Brain Disorders, New York State Psychiatric Institute, New York, NY, USA.

Dr Habibul Ahsan, Department of Epidemiology, 622 West 168th Street, Room PH-18–129, Columbia-Presbyterian Medical Center, New York, NY 10032, USA. E-mail: ha37@columbia.edu


    Abstract
 
The contribution of this paper is to conceptualize the case-parent triad within an epidemiological framework. We propose that the case-parent triad design is a variant of the case-cohort design. The affected offspring of case-parent triads come from a source cohort of all offspring of parents in a population. We first demonstrate that if the source cohort is restricted to offspring of a certain parental mating type, then the relative risk in relation to genetic exposure can be estimated simply from the ratio of the number of exposed to the number of unexposed affected offspring. We then extend the logic to studies including offspring of all parental mating types; provided that the allele frequencies and possible parental mating types are specified, a valid relative risk can still be estimated. Compared to prior descriptions of the case-parent triad design, the proposed approach is readily understandable and epidemiologically meaningful, and it provides a relatively simple perspective for estimating a valid measure of effect. Also, because it allows the potential sources of selection bias to be revealed more easily, the design becomes more accessible, both conceptually and practically, to epidemiologists.

Keywords Case-parent triad, case-cohort design, family-based design, genetic epidemiology, genetic association, relative risk, association study, TDT

Accepted 3 January 2002


    Introduction
 
Strategies for investigating genetic effects in populations have often originated from genetic disciplines, and have subsequently been reconceptualized for epidemiology.1,2 Inspired by the human genome project,3 studies seeking to establish associations between genes and diseases are proliferating and new design and analytical approaches for these studies are being developed.4 The ‘case-parent triad’ (also known as ‘case-parental control’ or ‘case-parent trio’) design, which includes a series of subjects affected with the disease of interest and their parents, has attracted considerable attention.

We present in this paper a basic epidemiological framework in which the case-parent triad approach can be clearly conceptualized. Previously most of the literature on the case-parent triad has focused on statistical approaches to testing and estimation. The fundamental nature of the design, and the attendant possibilities for bias, have rarely been discussed. In a pioneering effort, Khoury viewed this design as a matched case-control study with the imaginary sibs carrying parental non-transmitting alleles as controls.2,5 Khoury and colleagues also proposed analytical methods for this design,5,6 and on one occasion, alluded to the resemblance of their proposed approach to a case-cohort analytical approach,6 but they did not elaborate.

Our paper focuses on two aspects of the case-parent triad design. The primary goal is to present the logic and rationale of this popular genetic study design in the context of standard epidemiological designs. Specifically, we show that the case-parent triad may best be understood as a variant of an epidemiological case-cohort design (see below). The secondary goal is to show how the analytical approach for this epidemiological case-cohort design can also be derived from the perspective of statistical genetics, i.e. to bridge the two disciplines of genetics and epidemiology in understanding the case-parent triad approach. In doing so, we present a simple, intuitive approach for estimating relative risk and describe potential sources of selection bias. We show that although relative risk can be estimated most straightforwardly under restricted parental mating types, the approach can be generalized to broader situations if estimates of allelic frequencies are available. We also illustrate how the epidemiological perspective can be implemented using data from a case-parent triad study of panic disorder. Finally, we discuss how the proposed approach relates to previous literature on the case-parent triad design. An Appendix contains mathematical details.


    Genetic association studies
 
Studies that seek to identify disease genes can be categorized into two broad groups: linkage studies and association studies. Linkage studies,7 which essentially examine co-segregation of marker alleles and the disease of interest in multiplex families, are effective in identifying genes with strong effects. Association studies can potentially detect genes with more modest effects, but only if the genetic marker and the disease locus are in linkage disequilibrium.8 Thus, the two types of studies complement each other.

Association studies encompass several distinct designs. In the simplest design the association between a genetic variant and disease can be examined in a case-control design by comparing frequencies of exposure between diseased cases and non-diseased controls. People carrying the mutant/variant allele are considered as exposed, and people carrying the normal (i.e. the default) allele, as unexposed. In this context, however, ordinary case-control association studies can be susceptible to ‘population stratification bias’,5 a confounding bias arising from differences in genetic backgrounds of cases and controls. The distribution of alleles in a population is related to the ethnic and social background and geographical origin of their parents. These determinants of allelic distribution are also related to many of the diseases under study. Thus, without proper adjustment, the association between an allelic variant and disease may be confounded. Such confounding bias in examining allelic association is not readily controlled in the analysis, since accurate and sufficient information on ethnicity and other determinants of non-random matings is difficult to obtain. The problem is acute for genes with modest effects, since a weak association between a gene and disease may be easily distorted (in either direction) by residual confounding.

Family-based association studies, such as the case-parent triad, were originally proposed to minimize population stratification bias.5,9 Using controls that are family members of the cases ensures that controls and cases are matched with respect to genetic ancestry, since both groups come from the same families. Family-based sampling also provides excellent control for a host of important environmental factors. Among family-based designs, the best known is the case-parent triad design, which uses an ‘index case’ (an offspring affected with the disease of interest) and his/her parents. Genomic DNA for all triad members must be available in order to determine their allelic status. Other family-based designs compare cases with siblings, cousins, or other family members.

The case-parent triad has usually been considered as a case-control design.5 Family-based designs using siblings or cousins as controls can indeed be viewed as matched case-control studies, and a standard analytical approach for matched case-control data can be applied. We propose, however, that this particular family-based design (i.e. the case-parent triad design) is best understood as a variant of what epidemiologists have named the ‘case-cohort’ design.10,11 In a case-cohort design, cases are the individuals from a source population who develop disease over a given period and the controls represent a random sample of the entire source population from which the cases derive. Since controls are a random sub-sample of the source cohort, the exposure odds in the controls (i.e. the ratio of the number of exposed controls to the number of unexposed controls) in a case-cohort study provides an estimate of the exposure odds in the source population (i.e. the ratio of the numbers of the exposed to unexposed in the entire source population). A special feature of the case-cohort design is that the cross-product ratio (or the ratio of exposure odds for cases relative to controls) estimated from a case-cohort study equals the relative risk obtained in a cohort study of the same source population.10,11 This conceptualization of the case-parent triad as a case-cohort design not only makes the design considerably simpler to understand but also reveals potential sources of selection bias more readily (see below).


    Case-parent triad design as a variant of case-cohort study
 
We start with a candidate disease susceptibility gene and a disease of interest D. We have some prior knowledge about the allelic variation of the gene. Let ‘A’ be the variant disease susceptibility allele and ‘B’ be the wild-type allele for a di-allelic gene (or all other alleles for a multi-allelic gene). The goal is to estimate the relative risk for the disease D in relation to a variant allele(s) A.

Consider the total baseline birth cohort of all offspring in a defined population over some period. Let ‘N’ be the number of children comprising the total baseline cohort, with ‘N1’, the number of exposed, i.e. carrying at least one A allele, and ‘N0’, the number of unexposed, i.e. not carrying an A allele. Similarly let ‘n’ be the number of children who develop disease among the total cohort in a given time period, with ‘n1’, the number of diseased among exposed and ‘n0’, the number of diseased among unexposed.

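The absolute risks for D in relation to exposure A in the cohort are:

$$\frac{n_1}{N_1} \ \text{(exposed)} \qquad \text{and} \qquad \frac{n_0}{N_0} \ \text{(unexposed)}$$

and the relative risk (RR) for D in relation to exposure A, i.e. the parameter RR in the cohort, is given by:

$$RR = \frac{n_1/N_1}{n_0/N_0} = \frac{n_1/n_0}{N_1/N_0} \qquad (1)$$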
Note that the relative risk is equal to the ratio of odds of exposure in cases (n1/n0) to the odds of exposure in the total cohort (N1/N0). In an epidemiological case-cohort design, controls, selected as a random sample of the baseline cohort, are used to estimate the odds of exposure (N1/N0) in the source cohort.10,11 Although there is no explicit control group in our proposed approach, we demonstrate that the exposure odds among the total cohort can be inferred from the parents' allelic status (see below). In the simplest scenario, we demonstrate that the exposure odds in the total cohort is estimated to be ‘one’ (see below), and the exposure odds in the cases essentially yields the measure of effect, i.e. the relative risk, in the study.
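As a minimal numerical sketch of this identity, the following uses made-up cohort counts; the figures and function names are illustrative assumptions, not data from any study. It only checks that the ratio of risks and the ratio of exposure odds are the same quantity.

```python
def relative_risk(n1, n0, N1, N0):
    """Cohort relative risk: risk in exposed (n1/N1) over risk in unexposed (n0/N0)."""
    return (n1 / N1) / (n0 / N0)


def ratio_of_exposure_odds(n1, n0, N1, N0):
    """Same quantity written as exposure odds in cases over exposure odds in the cohort."""
    return (n1 / n0) / (N1 / N0)


# Hypothetical cohort: 2000 exposed and 8000 unexposed children,
# of whom 60 and 120 respectively develop disease D.
n1, n0, N1, N0 = 60, 120, 2000, 8000

rr = relative_risk(n1, n0, N1, N0)
rr_odds = ratio_of_exposure_odds(n1, n0, N1, N0)
print(rr, rr_odds)
```

In a case-cohort study the term N1/N0 would come from a random sample of the cohort; in the approach described here it is inferred from parental genotypes instead.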

Estimation of relative risk
In this section, we demonstrate how an estimate of relative risk can be derived from a sample of case-parent triads. We develop the logic within a source cohort. We show that parental genotypes of offspring who developed disease can be used in estimating the exposure odds in the entire source cohort (i.e. all offspring in the population).

With A and B alleles, the three possible genotypes for the parents of the birth cohort are: AA, AB or BB. All possible parental mating types can be summarized as AA x AB, AB x AB, AB x BB, AA x BB, AA x AA, and BB x BB. Of all mating types, the first three (AA x AB, AB x AB, and AB x BB), in which at least one parent is heterozygous AB, are generally informative for the case-parent triad design. (The last three mating types, in which both parents are homozygous, are uninformative because the event that the offspring does or does not receive a particular allele is non-stochastic; the probability equals 0% or 100% in these cases.) Because the frequency of A is often low, the AB x BB mating type is often the most common of the three informative mating types in the population. For example, if the population frequency of a disease-related variant allele is 0.1, then, assuming Hardy-Weinberg proportions and random mating, the AB x BB mating type will comprise over 89% of the three informative mating types (i.e. those in which at least one parent is heterozygous AB). To show the logic of our argument clearly we restrict our discussion primarily to case-parent triads where the parental mating types are AB x BB, but the argument can be readily extended to other mating types (Appendix).
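The share of each informative mating type can be checked with a short sketch. It assumes Hardy-Weinberg genotype proportions and random mating (assumptions of this illustration); the function name is hypothetical.

```python
def informative_mating_shares(p):
    """Shares of the three informative mating types among informative matings.

    Assumes Hardy-Weinberg genotype proportions and random mating,
    with p the population frequency of allele A.
    """
    q = 1.0 - p
    geno = {"AA": p * p, "AB": 2 * p * q, "BB": q * q}
    # Unordered parental pairs: two different genotypes can pair in two orders.
    freq = {
        "AA x AB": 2 * geno["AA"] * geno["AB"],
        "AB x AB": geno["AB"] ** 2,
        "AB x BB": 2 * geno["AB"] * geno["BB"],
    }
    total = sum(freq.values())
    return {mating: f / total for mating, f in freq.items()}


shares = informative_mating_shares(0.1)
for mating, share in shares.items():
    print(f"{mating}: {share:.3f}")
```

With p = 0.1 the AB x BB share works out to 0.81/0.91, roughly 0.89, in line with the figure quoted above.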

Now imagine a birth cohort consisting only of children from the parental mating type AB x BB. Keeping the same notation, assume that there are ‘N’ individuals in the cohort whose parental mating type is AB x BB, and assume that ‘n’ individuals among them developed the disease D over a given time period. The relative risk (RR) for D in relation to exposure A allele among children of AB x BB parents is given by:

$$RR = \frac{n_1/N_1}{n_0/N_0} = \frac{n_1/n_0}{N_1/N_0} \qquad (2)$$
Note that eq (2) is the same as eq (1) except that eq (2) applies to a specific subgroup. Thus, in eq (2), N1 and N0 denote the numbers of exposed and unexposed people, respectively, in the cohort whose parental mating type is AB x BB, and n1 and n0 denote the numbers of diseased individuals among the two corresponding groups.

We can generally assume N1/N0 = 1, since under Mendelian transmission, on average half of children of AB x BB parents carry the A allele and half do not. Therefore the numbers of the exposed (N1) and the unexposed (N0) in the birth cohort are expected to be equal.

In this situation, the relative risk can be expressed simply as (n1/n0), i.e. the ratio of exposed to unexposed cases. So we can estimate the relative risk in relation to the exposure (A) simply by taking the ratio of the number of exposed affected offspring to the number of unexposed affected offspring. Figure 1 shows the conceptual framework for estimating relative risk from case-parent triads. Note that although only allelic information among the affected offspring is needed to estimate the relative risk, allelic data on the full triad are needed in order to identify those triads whose parental mating type is AB x BB. Further note that in this proposed approach parents themselves do not serve directly as controls per se (as in a regular case-control study); rather, their allelic distribution is used to derive the odds of allele A in the source cohort.



Figure 1 Case-parent triad as a variant of case-cohort design: conceptual framework for estimating relative risk

 
Thus, provided that we ascertain all diseased offspring (cases) in the birth cohort whose parental mating type is AB x BB, we can estimate the relative risk, using cases only, simply by taking the ratio of exposed to unexposed cases (n1/n0). Often, the ascertainment of all cases is not possible and only a fraction (‘k’) of all cases are selected. Nevertheless, under certain conditions one can still estimate a valid relative risk. The key condition is that the exposed and unexposed cases have the same probability of being ascertained in the study sample, i.e. the sampling fractions are equal. This condition is often reasonable and its validity in a particular study can be examined especially if data on the total cohort, from which the study samples are selected, are available. Also, it is possible that more than one offspring may develop disease in a sibship; we address this possibility in the Discussion.
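The role of the sampling fraction can be illustrated with a small sketch. The case counts and sampling fractions below are invented for illustration, and the function name is hypothetical; the calculation uses expected counts rather than random sampling.

```python
def expected_case_ratio(n1, n0, k1, k0):
    """Expected ratio of exposed to unexposed cases in the study sample.

    n1, n0: all exposed / unexposed cases among offspring of AB x BB parents.
    k1, k0: probabilities that an exposed / unexposed case is ascertained.
    """
    return (k1 * n1) / (k0 * n0)


# Hypothetical full case counts; the true ratio n1/n0 is 2.0.
n1, n0 = 300, 150

equal = expected_case_ratio(n1, n0, k1=0.2, k0=0.2)    # same sampling fraction
unequal = expected_case_ratio(n1, n0, k1=0.3, k0=0.2)  # exposed cases over-sampled
print(equal, unequal)
```

When k1 equals k0 the fraction cancels and the sample ratio recovers n1/n0; when exposed cases are ascertained more readily, the ratio is inflated by the factor k1/k0.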


    Application of the proposed approach to data from a study of panic disorder
 
To illustrate the proposed method, we estimated the relative risk for alleles at two different loci examined in a recent family-based association study on panic disorder,12 catechol-O-methyltransferase (COMT) and D22S944, a nearby dinucleotide repeat marker. Panic disorder is somewhat familial,13,14 and at least some cases of panic disorder appear to have a genetic aetiology,15,16 although the mode of inheritance is not well understood.16–18

The triads were taken from an ongoing genetic study of panic disorder.14,19,20 In this study, families and triads were collected and diagnosed by senior clinical investigators, according to DSM III-R diagnostic criteria, and were genotyped for approximately 300 genetic markers. Eighty-three affected individuals and their parents were available to us. COMT is a biallelic gene; the higher-activity allele is the variant of interest (A), with an estimated population frequency of 46%. Twenty-one of these triads had the requisite AB x BB mating type at COMT; in 13, the offspring had received the A allele from the heterozygous parent, and in 8, the B allele. Hence the estimate of relative risk in relation to the COMT A allele is simply 13/8 = 1.62 (95% CI: 0.62–4.52).

D22S944 is a polymorphic CA-repeat marker near COMT, and we chose the ‘163’ allele (population frequency 28%) as the allele of interest for this illustration. Of the 83 triads, 27 had the requisite mating type; in 20 of these the offspring received the allele of interest, and in 7 the offspring did not. Thus the relative risk for this allele is estimated as 20/7 = 2.86 (95% CI: 1.16–8.01).
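The two point estimates can be reproduced from the transmission counts alone. The sketch below also attaches an interval, under the assumption (ours, not stated in this section) that an exact binomial (Clopper-Pearson) interval for the proportion of cases receiving the allele is mapped to the odds scale. How the published intervals were actually obtained is described in the Appendix, which remains the authority. Function names are hypothetical.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


def solve_decreasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection for f(p) = target, where f is decreasing in p on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def transmission_rr(b, c, alpha=0.05):
    """Ratio b/c with an exact binomial interval mapped to the odds scale.

    b: affected offspring who received the allele of interest;
    c: affected offspring who did not. Both must be positive.
    """
    if b <= 0 or c <= 0:
        raise ValueError("both counts must be positive")
    n = b + c
    # Lower limit solves P(X >= b) = alpha/2; upper limit solves P(X <= b) = alpha/2.
    p_lo = solve_decreasing(lambda p: binom_cdf(b - 1, n, p), 1 - alpha / 2)
    p_hi = solve_decreasing(lambda p: binom_cdf(b, n, p), alpha / 2)
    return b / c, p_lo / (1 - p_lo), p_hi / (1 - p_hi)


for locus, b, c in [("COMT", 13, 8), ("D22S944", 20, 7)]:
    rr, lower, upper = transmission_rr(b, c)
    print(f"{locus}: RR = {rr:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

The interval is built on the proportion b/(b + c) because, within AB x BB triads, each case either received the allele or did not; the ratio b/c is simply the odds form of that proportion.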

The Appendix gives more details (see Tables 2 and 3 there), and shows how the approach can be extended to include additional parental mating types. As noted earlier, so long as the allele frequencies are specified the relative risk can still be estimated. In our example, including all parental mating types yielded relative risks of 1.53 (95% CI: 0.88–2.78) for COMT and 1.45 (95% CI: 0.90–2.35) for D22S944 (sampling scheme 4; see Appendix).


Table 2 Numbers of panic disorder triads belonging to each mating type, for the ‘high-activity’ allele at the COMT locus and for the ‘163’ allele at the D22S944 locus
 

Table 3 Estimates (and 95% CI) for relative risk for the panic disorder data, under 4 sampling schemes
 

    Relationship of the proposed approach to prior approaches for the case-parent triad design
 
Falk and Rubinstein first proposed the ‘haplotype relative risk’ method to analyse case-parent triads.9,21 The four alleles of the two parents are divided into ‘transmitted’ (the two alleles that are transmitted from parents to the affected offspring), and ‘non-transmitted’ alleles (those two that are not transmitted). In each triad, the affected offspring carries two transmitted alleles and is considered a ‘case’. Then a hypothetical sibling with the two non-transmitted alleles is considered a ‘control’. The method estimates a standard unmatched odds ratio by comparing the cases with their imaginary sibling controls with respect to the presence/absence of the allele of interest A. The method assumes that if an allele is associated with the disease, then the frequency of that allele should be higher among cases than among controls.

Khoury, who named this approach the ‘case-parental control’ design, viewed it instead as a matched case-control study. He proposed a method of estimating the matched odds ratio from the case-fictitious control pairs who are discordant with respect to A allele.5 Viewing this approach as a case-control design was novel and ingenious. Nonetheless, as we have shown, the resemblance is specifically to a case-cohort design, not to a conventional case-control study. How the controls are selected conflicts with the rationale of control selection in a conventional epidemiological case-control design. From an incidence-density sampling perspective, controls should be selected at the time cases occur, so they represent the distribution of person-time in the source cohort.11 From this perspective, the fictitious controls, i.e. imaginary sibs, cannot be viewed as controls, as one cannot assume them to be unaffected at the time when the cases developed disease.

Viewing the imaginary sibs as fictitious controls is also incompatible with a case-cohort sampling perspective where the controls should represent the exposure distribution among the source cohort. By definition, the fictitious controls can only carry the alleles that the cases do not receive. If A allele is associated with the disease, the cases would be more likely to receive A allele from the parents and the frequency of A would be lower among the fictitious controls than in the source cohort. Thus, the odds ratio estimated from this approach would, in general, overestimate the true relative risk.
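To make the direction of this distortion concrete, consider only triads with AB x BB parents and the unmatched comparison of carrier status between cases and their fictitious siblings; both restrictions are simplifying assumptions of this sketch. A case who received A is paired with a fictitious sibling who did not, and vice versa, so with n1 exposed and n0 unexposed cases the 2 x 2 table and its odds ratio are:

```latex
% Cases:               n_1 carry A,  n_0 do not
% Fictitious controls: n_0 carry A,  n_1 do not
\[
\widehat{OR}_{\text{unmatched}}
  = \frac{n_1 / n_0}{n_0 / n_1}
  = \left(\frac{n_1}{n_0}\right)^{2},
\qquad\text{whereas}\qquad
\widehat{RR} = \frac{n_1}{n_0}.
\]
```

Under these assumptions the unmatched odds ratio is the square of the case ratio, so it lies further from 1 than n1/n0 whenever n1 differs from n0; for a risk-raising allele that is an overestimate.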

In a subsequent paper, Flanders and Khoury proposed a refined analytical approach for case-parent triads and noted that their proposed approach resembled the analysis of a case-cohort study.6 This was an early intimation of our approach (see Discussion). However, they have not been explicit in presenting the case-parent triad design as a case-cohort study.

Spielman and colleagues22 refined the analytical approach for the case-parent triad design by proposing a ‘transmission disequilibrium test’ (TDT). Here the transmission frequency of the A allele to the affected offspring from heterozygous parents is examined. That is, the number of times the A allele is transmitted from a heterozygous AB parent to an affected offspring (the ‘positive’ transmissions) is compared with the number of times the B allele is transmitted (the ‘negative’ transmissions). Designating the frequencies of positive and negative transmissions as ‘b’ and ‘c’, respectively, the TDT statistic is (b − c)²/(b + c), whose value follows a χ² distribution with 1 degree of freedom.22 The TDT and its variants22–25 avoid the problem of conceptualizing the fictitious controls as non-diseased, since these approaches use only the transmitted alleles. However, many of these TDT-type methods focus on hypothesis testing, rather than estimating the magnitude of effect, a key element in epidemiology for examining exposure-disease association. Nevertheless, we note similarities between TDT and our proposed method. Both methods can be applied to the same case-parent triads. In fact, the numbers of exposed and unexposed cases for our method are the same as the frequencies of positive and negative transmissions of the TDT statistic, respectively, i.e. ‘b’ and ‘c’ of the expression (b − c)²/(b + c).22 Our proposed method complements, rather than replaces, the TDT, because it provides a closed form measure of association, clarifies sources of bias, and is simple and easy to understand.
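The correspondence between the two methods can be sketched in a few lines. The counts are the transmission counts from the panic disorder illustration, restricted to AB x BB triads, which is an assumption of this sketch since a full TDT would normally use transmissions from all heterozygous parents. The function name is hypothetical.

```python
from math import erfc, sqrt


def tdt(b, c):
    """TDT statistic (b - c)^2 / (b + c) and its 1-df chi-square p-value."""
    stat = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# b = cases who received the allele of interest, c = cases who did not.
for locus, b, c in [("COMT", 13, 8), ("D22S944", 20, 7)]:
    stat, p_value = tdt(b, c)
    print(f"{locus}: TDT = {stat:.2f}, p = {p_value:.3f}, b/c = {b / c:.2f}")
```

The same two counts feed both quantities: the TDT answers whether b and c differ by more than chance, while b/c expresses how large the effect is.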

More recent analytical approaches towards the analysis of case-parent triad data address the issue of estimating allelic association in different ways. Like the method of Falk and Rubinstein,9 methods proposed by Flanders and Khoury6 and Schaid and Sommer26 focus on estimation of the odds ratio. A number of methods are also available for estimating the relative risk; these include the techniques described by Schaid and Sommer,27 Sun et al.,28 Wilcox et al.29 and Knapp et al.30 Methods published by Self et al.31 and Schaid32 use ‘rate ratio’ models for disease risk (one specifies the ‘instantaneous probability’, or hazard function, in the model statement rather than the probability of disease), but they might also be legitimately described as ‘relative risk’ models since no real time is incorporated in the estimation. It is important to note that one of the estimators proposed herein can be derived as a special case of the Self et al. method for relating the candidate gene to disease risk. Under sampling scheme 1 (where only AB x BB mating types are considered), our simple, closed form estimate of the relative risk turns out to be identical to that derived under the model proposed by Self and colleagues, restricting attention to the model with only a single binary covariate for transmission of the disease susceptibility gene. The correspondence between the two approaches confirms the value of our estimator; highlighting its simpler form in special cases should serve to expand its utility as well. More recently, Whittemore and colleagues devised likelihood-based methods in which families with arbitrary structure, i.e. a case with parents and/or siblings, can be included.33

While many of the methods cited above address the estimation of the relative risk, most are regression-type methods that require complicated, iterative computer algorithms to obtain parameter estimates. The methods proposed in this paper, however, all have closed form. This allows them to be conceptualized and computed more easily, and, therefore, to be used and understood by a greater number of data analysts and researchers. The trade-off between the two types of approaches consists of clarity and simplicity versus flexibility: our proposed methods are simpler and more transparent to a wider audience, but the regression methods can be applied to a wider array of data structures. Clearly there is need for both styles of analyses in practice.

Another way our paper differs from previous analytical approaches to the case-parent triad is that the analytical approach presented here (mathematical details in the Appendix) evolved as an extension of our primary goal, which is to conceptualize the case-parent triad design as a case-cohort study. The quantitative estimates based on genetic principles that are presented in the Appendix are mathematically similar to those presented by Knapp and colleagues;30 yet the rationale and the context of their mathematical derivations are very different from ours. Thus, our paper's focus is not the quantitative estimation per se but rather to establish the links between standard epidemiological principles and mathematical approaches in the statistical genetics literature (described above).


    Discussion
 
The case-parent triad is an elegant family-based design for examining disease-allele association. Our contribution is to conceptualize the approach in a way that is accessible to an epidemiologist and useful for understanding bias and validity in these studies. We propose that the case-parent triad be viewed as a variant of a case-cohort study. Using the fact that the exposure odds in the source cohort can be derived from knowledge of the parental genotypes of the cases, we describe a simple method for estimating relative risk.

To demonstrate the logic, we initially focused on the simplest case-parent triad design, with attention restricted to offspring of the AB x BB mating types. Under this restriction, in the cohort that is the source of the cases, 50% of offspring have A (exposed) and 50% do not (unexposed) according to Mendelian principles. Therefore the relative risk [(n1/n0)/(N1/N0)] can be estimated directly from the relative numbers of cases (n1/n0); the exposure odds in the source cohort, i.e. the ratio of the exposed and unexposed cohorts (N1/N0), is unity.

In practice, in case-parent triad studies, cases with different parental mating types (e.g. AB x BB, AB x AB, AB x AA) may also be included. Cases from a certain parental mating type may be considered as originating from a certain stratum in the source cohort. In some strata, the sizes of the exposed and unexposed cohorts are not equal (i.e. the exposure odds is not 1). Since their ratio is no longer unity it does not cancel out in the relative risk calculation. Nevertheless, a valid estimation of relative risk within each stratum is possible if the relative sizes of exposed and unexposed cohorts (or, their ratio) are known or can be estimated using genetic principles (Appendix contains details and examples).
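The stratum-specific exposure odds follow from Mendelian transmission alone, as the sketch below shows. It takes exposure to mean carrying at least one A allele, as defined earlier; the function names are hypothetical, and the Appendix may parameterize the problem differently (for example by treating heterozygotes and homozygotes separately).

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1, parent2):
    """Mendelian genotype probabilities for the offspring of two parents.

    Genotypes are two-letter strings such as "AB"; each parent passes on
    either of its two alleles with probability 1/2.
    """
    dist = {}
    for a1, a2 in product(parent1, parent2):
        genotype = "".join(sorted(a1 + a2))
        dist[genotype] = dist.get(genotype, Fraction(0)) + Fraction(1, 4)
    return dist


def exposure_odds(parent1, parent2):
    """Expected N1/N0 in the stratum, with exposure = at least one A allele.

    Returns None when no offspring can be unexposed (N0 = 0).
    """
    dist = offspring_distribution(parent1, parent2)
    exposed = sum(pr for genotype, pr in dist.items() if "A" in genotype)
    unexposed = 1 - exposed
    return None if unexposed == 0 else exposed / unexposed


for mating in [("AB", "BB"), ("AB", "AB"), ("AA", "AB")]:
    print(" x ".join(mating), exposure_odds(*mating))
```

Under this definition the AB x BB stratum has exposure odds of 1 and the AB x AB stratum has odds of 3, so the case ratio n1/n0 in an AB x AB stratum would need to be divided by 3. In the AA x AB stratum every child carries A, so a carrier versus non-carrier contrast cannot be formed there.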

To utilize these data, one may wish to combine data from cases in different strata, i.e. with different parental mating types. This can be done by estimating (N1/N0) to appropriately weight the ratio of exposed over the unexposed number of cases (n1/n0). This is mathematically demonstrated in the Appendix under different scenarios of parental mating types. Under these more general situations, in order to precisely estimate relative risk one needs allele frequency estimates for the source population. Otherwise derivation of relative risk under these scenarios becomes more complex and it may be preferable to restrict the analysis to one or a few of the mating types. Note that if the variant allele of interest is not very common in the population (as with most of the disease related genes), restriction of analysis under AB x BB mating types should utilize most (e.g. >89% with an allele frequency 0.1 and >67% with an allele frequency 0.2) of the usable data. One may also wish to compare results from the AB x BB mating type only with the combined results from multiple mating types (as shown in the Appendix) to examine whether the assumptions (e.g. knowledge of allele frequency and Hardy-Weinberg equilibrium) for expanded analytical schemes combining all parental mating types are reasonable.

By clarifying the nature of the design, the sources of potential selection bias can be more easily identified, described, and addressed. In epidemiological terms, the proposed design requires that exposed and unexposed cases have the same probability of being ascertained in the study sample, i.e. the sampling fractions are equal. An effective way to preserve this assumption is by including a series of consecutive incident cases and their parents and then selecting subjects with eligible parental mating types for analysis. When consecutive incident cases are not used, the validity of the assumption depends on whether the ascertainment of cases is independent of the allelic status. In such a situation, the validity would also depend on whether survival of the disease is related to the allelic status.

The proposed method requires that the A allele represents the ‘exposure’. Thus, the method is better suited to situations where disease risk is thought to be similar for heterozygous and homozygous carriers, i.e. under a dominant model. Under restriction to parental AB x BB mating types, the method estimates relative risk only for heterozygous offspring. Assessment of disease risk separately for heterozygotes and homozygotes is possible in more complex forms of the design which include all parental mating types (Appendix).

What should one do when there is more than one affected offspring in the same sibship? The most conservative course is to use only one child per sibship, e.g. chosen at random from among the affected children in that sibship. (There do exist circumstances under which one can validly use more than one affected child per sibship, but that topic is beyond the scope of this paper. The interested reader is referred to references 34 and 35 for further discussion of this issue.)

The case-parent triad design may also be viewed as a variant of other standard epidemiological designs. First, although strictly speaking the method is a variant of a case-cohort design, it can also be viewed simply as a variant of the cohort design itself, but one in which only those cohort members who developed disease are studied. By studying the cases arising from some source population and a sample of controls representing that source population, the study estimates the same relative risk that the full cohort study would have estimated studying the entire source population. However, since unlike in a regular cohort study we cannot derive an estimate of ‘risk difference’ from our proposed approach (because of the lack of data on the actual sizes of the exposed and unexposed cohorts), strictly speaking, the case-parent triad design is not a regular cohort design. Similarly, keeping a cohort design perspective, the proposed design may also be viewed as a ‘retrospective randomized trial’, where only those who developed disease are ascertained. This similarity with randomized trials arises from the fact that under the null hypothesis the probability for the offspring of receiving the A allele from their AB x BB parents is 50%. The analysis and relative risk estimation can be done the same way as proposed here. Cohort studies and randomized trials are familiar to a wide audience of epidemiologists, geneticists, clinicians and other researchers, whereas the prior approaches to case-parent triad design are familiar only to genetic epidemiological research methodologists, and case-cohort designs are familiar only to epidemiologists. Therefore, while these formulations are less precise they may be useful for some purposes.

The proposed method has some important practical limitations. First, as mentioned above, unlike regular cohort studies the proposed method does not allow estimation of risk difference —an important measure of association in epidemiological research. Second, as with all methods using case-parent triad data, application of the proposed approach has limitations for studying late-onset diseases, especially since the current version of the method requires genotyping of both parents. (But note that parents need only be genotyped for the candidate/marker locus, not diagnosed for presence or absence of the disease of interest. The method does not use parental diagnosis information. Thus, in future applications, if elderly parents were genotyped and their DNA stored, they could be used for this kind of method, even if they had never been diagnosed for presence or absence of disease.) Third, the confidence intervals for the relative risk may be unstable for situations where children with all parental mating types are included, especially under certain values of allele frequencies and effect sizes (Appendix).


    Conclusion
The currently available literature on family-based genetic association studies focuses on statistical approaches to hypothesis testing and estimation. By contrast, the current paper conceptualizes the case-parent triad design within an epidemiological framework and provides logical foundations for considering sources of bias. It also describes a simple method of estimating the relative risk, when examining genetic associations using case-parent triad data, which can be implemented without any special computer software.


    Appendix
Estimators and confidence intervals for relative risk under different sampling schemes, and application to panic disorder data
1 Notation and terminology
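Let A denote the allele of interest, and B denote all other allele(s) that are not A. We define gene frequencies

$$u = P(A), \qquad v = P(B) = 1 - u.$$

We also define ‘penetrances’

$$\alpha = P(\text{affected} \mid AA \text{ or } AB), \qquad \beta = P(\text{affected} \mid BB). \qquad (3)$$

Thus, the relative risk R, corresponding to RR = R1/R0 in the text (equation 1), is given by

$$R = \alpha / \beta.$$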
2 Probability model
Figure 2 shows the three mating types (MT) that have at least one heterozygous (AB) parent. The Figure also shows diagrammatically all possible ways for these mating types to produce an affected child (under the assumptions of this paper). There are seven resultant triad types. For example, triad type 1 consists of parents who are AB x BB and an affected child who is AB. The ni indicate how many triads of the indicated type are actually observed. The numbers written on the arrows in Figure 2 indicate probabilities. For example, for triad type 1, the population probability of the AB x BB MT is 4uv^3, based on the assumption of Hardy-Weinberg equilibrium; the probability of a child from this MT receiving the genotype AB is ½, based on Mendel's First Law; and the probability that that child is affected is α, according to (3). The total ‘joint’ probability of that triad is thus the product of these three probabilities:

$$4uv^3 \times \tfrac{1}{2} \times \alpha = 2uv^3\alpha.$$

Figure 2 Probability model for parental mating types (MT) with at least one heterozygous AB parent. Probabilities of these MT assume Hardy-Weinberg equilibrium

 
Similar joint probabilities can be derived for all seven possible triad types and are shown in Figure 2.
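The same product rule can be applied mechanically to every triad type. The sketch below is illustrative rather than a reproduction of Figure 2: it assumes Hardy-Weinberg mating-type frequencies, Mendelian transmission, and a penetrance of α for any child carrying at least one A allele and β for a BB child. Because the triad numbering of the figure is not reproduced here, entries are keyed by (mating type, child genotype); the function name and the example parameter values are hypothetical.

```python
def triad_joint_probabilities(u, alpha, beta):
    """Joint probability of each (mating type, affected child genotype) pair.

    Each value is P(mating type) * P(child genotype | mating type)
    * P(affected | child genotype), for the three mating types that
    have at least one heterozygous (AB) parent.
    """
    v = 1.0 - u
    mating_types = {
        "AB x BB": (4 * u * v**3, {"AB": 0.5, "BB": 0.5}),
        "AB x AB": (4 * u**2 * v**2, {"AA": 0.25, "AB": 0.5, "BB": 0.25}),
        "AB x AA": (4 * u**3 * v, {"AA": 0.5, "AB": 0.5}),
    }
    joint = {}
    for mt, (p_mt, children) in mating_types.items():
        for genotype, p_child in children.items():
            penetrance = beta if genotype == "BB" else alpha
            joint[(mt, genotype)] = p_mt * p_child * penetrance
    return joint


# Hypothetical parameter values, chosen only to exercise the function.
probs = triad_joint_probabilities(u=0.3, alpha=0.02, beta=0.01)
for key, value in probs.items():
    print(key, f"{value:.6f}")
```

The entry for ("AB x BB", "AB") equals 2uv^3α, matching the worked example for triad type 1.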

3 Estimators and confidence intervals for R under various sampling schemes
Using the joint probabilities shown in Figure 2, we now rigorously derive the maximum likelihood estimator (MLE), $\hat{R}$, and the confidence intervals, under four different sampling schemes. (The sampling schemes will be described below.) For each sampling scheme, we will proceed in three steps.

Step 1: Designate one set of observations as the ‘positive’ ones (i.e. observations that contribute to the numerator, α, of R), and another set of observations as ‘negative’ (contributing to the denominator, β, of R). The probability of the positive observations is designated p, and the probability of the negative observations, 1 – p.

Step 2: We have thus defined a binomial probability model, so it is straightforward to estimate p (using the maximum likelihood estimator, or MLE) and to find a desired confidence interval (CI) on p, using standard statistical methods.

Step 3: In each case, the relative risk R can be expressed as a monotonic increasing function of this binomial parameter p. Therefore, applying that function to the estimates and CI for p yields the MLE and CI for R.
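The three steps can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: it assumes that the ‘exact’ interval is the Clopper-Pearson interval (found here by bisection on the binomial tail probabilities), and it uses the sampling scheme 1 relationship R = p/(1 – p) with the COMT counts n1 = 13 and n2 = 8 from the worked example later in this Appendix. All function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_increasing(f, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection root of a function that rises from negative to positive."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, level=0.95):
    """Clopper-Pearson limits for a binomial proportion (assumed 'exact' CI)."""
    a = (1 - level) / 2
    # Lower limit: P(X >= k | p) = a; upper limit: P(X <= k | p) = a.
    lower = 0.0 if k == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - a)
    upper = 1.0 if k == n else _root_increasing(
        lambda p: a - binom_cdf(k, n, p))
    return lower, upper


def odds(p):
    """Step 3 for sampling scheme 1: R = p / (1 - p); undefined at p = 1."""
    return p / (1 - p)


# Step 1: positive and negative counts (AB and BB affected offspring).
n_pos, n_neg = 13, 8
# Step 2: MLE and confidence limits for the binomial parameter p.
p_hat = n_pos / (n_pos + n_neg)
p_lo, p_hi = exact_ci(n_pos, n_pos + n_neg)
# Step 3: apply the monotonic function to the estimate and both limits.
r_hat, r_lo, r_hi = odds(p_hat), odds(p_lo), odds(p_hi)
print(f"p = {p_hat:.3f} ({p_lo:.3f}, {p_hi:.3f})")
print(f"R = {r_hat:.2f} ({r_lo:.2f}, {r_hi:.2f})")
```

Because the transformation in Step 3 is monotonic increasing, the confidence limits for p map directly onto confidence limits for R without any further variance calculation.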

We now briefly describe each sampling scheme and its justification. The corresponding formulae are displayed in Table 1. Detailed derivations are not included here but are available from the authors upon request. We will then illustrate application of the formulae with some of the panic data described in the text (Tables 2 and 3).


Table 1 Formulae for estimators of p (the binomial parameter) and R (the relative risk), for four sampling schemes. See Appendix for details.
 
Sampling scheme 1: Using only AB x BB mating types
The simplest sampling scheme is to use only AB x BB mating types; this is the scheme discussed in the body of this paper (triad types 1 and 2 in Figure 2). There are only two types of affected offspring to observe: AB (the ‘positive’ ones, counted as n1) and BB (the ‘negative’ ones, n2). The probabilities of an observation in the n1 and n2 categories are denoted p and 1 – p, respectively, and are given in Table 1, along with the other necessary formulae. This sampling scheme has the advantage of not requiring knowledge of the allele frequency u.

Sampling scheme 2: Using all mating types that have at least one heterozygous parent
There are three mating types that have at least one heterozygous (AB) parent, as shown in Figure 2, so this sampling scheme uses more of the data than sampling scheme 1. Separate the triads into those involving α (‘positive’ observations), i.e. a child with at least one A allele in his/her genotype (triad types 1, 3, 4, 6 and 7); and those involving β (‘negative’ observations), i.e. a child with no A alleles (types 2 and 5). This leads to defining

$$z = n_1 + n_3 + n_4 + n_6 + n_7, \qquad w = n_2 + n_5.$$

Table 1 gives the relevant formulae. These formulae assume that the allele frequency u is known.
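To see where the allele-frequency factor in sampling scheme 2 comes from, the positive and negative probabilities can be summed directly over the three mating types. The following is a sketch under the Hardy-Weinberg and Mendelian assumptions stated above, not a reproduction of Table 1; the normalizing constant (the total probability of all seven triad types) cancels in the ratio, and v = 1 – u is used to simplify.

```latex
\begin{align*}
P(\text{positive}) &\propto \alpha\left(2uv^3 + 3u^2v^2 + 4u^3v\right)
                    = \alpha\,uv\left(3u^2 - u + 2\right),\\
P(\text{negative}) &\propto \beta\left(2uv^3 + u^2v^2\right)
                    = \beta\,uv\cdot v(1+v),\\[4pt]
\frac{p}{1-p} &= \frac{\alpha}{\beta}\cdot\frac{3u^2 - u + 2}{v(1+v)}
\quad\Longrightarrow\quad
R = \frac{\alpha}{\beta} = \frac{p}{1-p}\cdot\frac{v(1+v)}{3u^2 - u + 2}.
\end{align*}
```

This agrees with the factor v(1 + v)/(3u^2 – u + 2) quoted in the worked example for sampling scheme 2, and it shows why R is a monotonic increasing function of p for fixed u.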

Sampling scheme 3: Using all mating types that have at least one heterozygous parent—with transmission disequilibrium test (TDT)-type data
Data for the TDT (reference 22) tabulate affected offspring into two categories: those affected offspring who received an A allele from a heterozygous parent (the ‘positive’ observations) and those who received a B allele from a heterozygous parent (the ‘negative’ ones). Thus, if one receives data in this dichotomous form (e.g. from a published paper), one cannot use the formulae for sampling scheme 2. Rather, one can estimate the relative risk as follows. Offspring from MT2 (AB x AB) get counted twice, because each such offspring actually represents two independent meioses, one from each parent. (For example, an AA offspring from MT2 gets counted as receiving an A allele twice, whereas an AB offspring from this mating gets counted as receiving one A and one B.) Let x denote the total count of meioses receiving A, and y, the total count receiving B.

Table 1 gives the relevant formulae, again assuming u is known.

Sampling scheme 4: Using all mating types
The final sampling scheme we consider uses all mating types that produce an affected child, whether at least one parent is heterozygous or not. This requires expanding the diagram from Figure 2 to include six mating types rather than only three (Figure 3). As in sampling scheme 2, we separate the triads into those involving α (‘positive’) and those involving β (‘negative’). Referring to Figure 3, define

$$s = z + n_8 + n_9 = n_1 + n_3 + n_4 + n_6 + n_7 + n_8 + n_9, \qquad t = w + n_{10} = n_2 + n_5 + n_{10}.$$

(Thus, s is the same as z in sampling scheme 2, but with n8 and n9 added; t is the same as w, but with n10 added.) The corresponding probabilities, estimators, and CI are given in Table 1, again, under the assumption that u is known.



Figure 3 Probability model for additional parental mating types (MT). Probabilities of these MT assume Hardy-Weinberg equilibrium

 
Example: Estimating relative risk for the panic data
We apply the formulae in Table 1 to the panic data mentioned in the body of the paper. Table 2 shows the breakdown of the 83 triads into the 10 possible triad types, for both marker loci, COMT and D22S944. Table 3 gives the estimates and CI for both loci, for all four sampling schemes. Here we describe the procedure in more detail for two of the analyses: the relative risk for COMT, for sampling schemes 1 and 2.

Sampling scheme 1: n1 = 13 and n2 = 8, so according to the fifth line in Table 1, $\hat{p}$ = n1/(n1 + n2) = 13/21 = 0.619. Exact probability calculations yield 95% confidence limits on p of pL = 0.384 and pU = 0.819 (pL and pU denote the lower and upper confidence limits, respectively). The last line of Table 1 gives us the functional relationship R = p/(1 – p), so the MLE is $\hat{R}$ = 0.619/(1 – 0.619) = 1.62. To find the confidence limits on R, we apply the same function to pL and pU: RL = 0.384/(1 – 0.384) = 0.62; and RU = 0.819/(1 – 0.819) = 4.52. Thus the 95% CI on R is (0.62–4.52).

Sampling scheme 2: z = 47, w = 10, and again we use u = 0.46 (v = 0.54), so $\hat{p}$ = 47/57 = 0.825. Exact 95% confidence limits on p are pL = 0.701 and pU = 0.913. The relationship of R to p in this sampling scheme is given (last line of Table 1) as [p/(1 – p)]•v(1 + v)/(3u^2 – u + 2), which equals [p/(1 – p)]•(0.382). Thus, $\hat{R}$ = [(0.825)/(1 – 0.825)]•(0.382) = 1.80, with 95% confidence limits RL = [(0.701)/(1 – 0.701)]•(0.382) = 0.90 and RU = [(0.913)/(1 – 0.913)]•(0.382) = 4.01. Thus the 95% CI on R is (0.90–4.01).
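The arithmetic for sampling scheme 2 can be checked with a short script. This is a sketch only: the function names are hypothetical, the counts and confidence limits on p are taken as given in the worked example rather than recomputed, and the formula is the one quoted above for this scheme.

```python
def scheme2_factor(u):
    """Allele-frequency factor v(1 + v) / (3u^2 - u + 2) for sampling scheme 2."""
    v = 1.0 - u
    return v * (1 + v) / (3 * u**2 - u + 2)


def scheme2_relative_risk(p, u):
    """R as a function of the binomial parameter p and allele frequency u."""
    return p / (1 - p) * scheme2_factor(u)


u = 0.46                      # allele frequency used in the worked example
z, w = 47, 10                 # positive and negative triad counts (COMT)
p_hat = z / (z + w)
p_lo, p_hi = 0.701, 0.913     # exact 95% limits on p, as quoted in the text

print(f"factor = {scheme2_factor(u):.3f}")
print(f"R = {scheme2_relative_risk(p_hat, u):.2f} "
      f"({scheme2_relative_risk(p_lo, u):.2f}, "
      f"{scheme2_relative_risk(p_hi, u):.2f})")
```

The factor depends only on u, so any error in the assumed allele frequency rescales the point estimate and both confidence limits by the same proportion.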

4 Comments
Sampling scheme 1 is the simplest and has the advantage of not requiring knowledge of the allele frequency u. Sampling schemes 2 and 4 have the advantage of using more of the data—all mating types with at least one heterozygous parent in sampling scheme 2, and all mating types in sampling scheme 4. Sampling scheme 3 would be used only if one had dichotomous TDT-type data and could not recover the original breakdown of mating types; otherwise it would offer no advantage over sampling scheme 2.

As mentioned, sampling schemes 2, 3, and 4 require that one have a reliable estimate of the allele frequency u in the population being studied. (If the population being studied is stratified, i.e. if there is more than one subpopulation, with different gene frequencies in these subpopulations, then these different gene frequencies would need to be known.) Otherwise, one should use sampling scheme 1, where gene frequency is irrelevant. The other option, when u is unknown, would be to combine data from mating types 1 and 2 and multiply together the ‘conditional likelihoods’ from those two mating types. (The conditional likelihoods from the remaining mating types do not contribute additional information about the ratio of α to β.) This approach is significantly more complex than the other material presented here and is beyond the scope of this study.

Finally, we mention that occasionally the estimator and/or the upper confidence limit of R may be undefined. This can happen when the denominator of the formula (last row of Table 1) becomes zero or negative. Sampling scheme 3 (the TDT-data-based one) is particularly prone to this problem, so this is another reason not to use that sampling scheme unless necessary. (See Table 3 for examples of this phenomenon.)
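One way to see how an undefined estimate can arise with TDT-type data is sketched below. The expression used is derived here for illustration from the Hardy-Weinberg and Mendelian probabilities of the three mating types with a heterozygous parent, and it may differ in form from the one in Table 1. Summing expected transmissions gives x/y = α/(αu + βv), so that R = vx/(y – ux), which has no valid value once ux reaches y. The function name and the counts are hypothetical.

```python
def tdt_relative_risk(x, y, u):
    """Relative risk from TDT-type counts (illustrative derivation, see lead-in).

    x: meioses from heterozygous parents transmitting A
    y: meioses from heterozygous parents transmitting B
    u: assumed-known frequency of allele A
    Returns None when the denominator is zero or negative.
    """
    v = 1.0 - u
    denominator = y - u * x
    if denominator <= 0:
        return None  # estimator undefined for these counts
    return v * x / denominator


# Hypothetical counts, chosen only to show the three kinds of behaviour.
print(tdt_relative_risk(10, 10, 0.46))  # equal transmissions
print(tdt_relative_risk(12, 10, 0.46))  # modest excess of A transmissions
print(tdt_relative_risk(30, 10, 0.46))  # large excess: undefined
```

Under this expression the estimate is undefined whenever the proportion of A transmissions, x/(x + y), is at least 1/(1 + u). An upper confidence limit on that proportion can therefore cross the threshold even when the point estimate does not.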


    Acknowledgments
 
We are grateful to Drs Abby J Fyer, Steven P Hamilton, James A Knowles, and Myrna M Weissman, for providing us with the case-parent triad data from their study of panic disorder (grants MH 28274 and MH 37592). We also thank Drs Alfredo Morabia, Zena Stein, Sharon Schwartz, Duncan Thomas and Alice Whittemore for helpful comments. This work was supported in part by grants CA 69398 (HA), DAMD 170010213 (HA), MH 48858 and DK 31813 (SEH).


    References
1 Susser M, Susser E. Indicators and designs in genetic epidemiology: separating heredity and environment. Rev Epidemiol Sante Publique 1987;35:54–77.

2 Khoury MJ. Genetic epidemiology. In: Rothman KJ, Greenland S (eds). Modern Epidemiology. 2nd Edn. Philadelphia: Lippincott-Raven, 1998.

3 Collins FS, Patrinos A, Jordan E, Chakravarti A, Gesteland R, Walters L, the members of the DOE and NIH planning groups. New goals for the US Human Genome Project: 1998–2003. Science 1998;282:682–89.

4 Seminara D (ed.). Innovative study designs and analytic approaches to the genetic epidemiology of cancer. J Natl Cancer Inst Monograph 1999;26:1–105.

5 Khoury MJ. Case-parental control method in the search for disease-susceptibility genes. Am J Hum Genet 1994;55:414–15.

6 Flanders WD, Khoury MJ. Analysis of case-parental control studies: Method for the study of associations between disease and genetic markers. Am J Epidemiol 1996;144:696–703.

7 Ott J. Analysis of Human Genetic Linkage. Baltimore, MD: Johns Hopkins University Press, 1999.

8 Greenberg DA. Linkage analysis of ‘necessary’ disease loci versus ‘susceptibility’ loci. Am J Hum Genet 1993;52:135–43.

9 Falk CT, Rubinstein P. Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations. Ann Hum Genet 1987;51:227–33.

10 Rothman KJ, Greenland S. Case-control studies. In: Rothman KJ, Greenland S (eds). Modern Epidemiology. 2nd Edn. Philadelphia: Lippincott-Raven, 1998.

11 Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 1986;73:1–11.

12 Hamilton SP, Slager SL, Heiman GA et al. Evidence for a susceptibility locus for panic disorder near the catechol-O-methyltransferase gene on chromosome 22. Biol Psychiatry, in press.

13 Crowe RR, Noyes R, Pauls DL, Slymen D. A family study of panic disorder. Arch Gen Psychiatry 1983;40:1065–69.

14 Weissman MM, Wickramaratne P, Adams PB et al. The relationship between panic disorder and major depression. A new family study. Arch Gen Psychiatry 1993;50:767–80.

15 Torgersen S. Genetic factors in anxiety disorders. Arch Gen Psychiatry 1983;40:1085–89.

16 Kendler KS, Neale MC, Kessler RC, Heath AC, Eaves LJ. Panic disorder in women: a population-based twin study. Psychol Med 1993;23:397–406.

17 Vieland VJ, Hodge SE, Lish JD, Adams PB. Segregation analysis of panic disorder. Psychiatr Genet 1993;3:63–71.

18 Vieland VJ, Goodman DW, Chapman T, Fyer AJ. New segregation analysis of panic disorder. Am J Med Genet 1996;67:147–53.

19 Knowles JA, Fyer AJ, Vieland VJ et al. Results of a genome-wide genetic screen for panic disorder. Am J Med Genet 1998;81:139–47.

20 Fyer AJ, Weissman MM. Genetic linkage study of panic: clinical methodology and description of pedigrees. Am J Med Genet 1999;88:173–81.

21 Rubinstein P. HLA and IDDM: facts and speculations on the disease gene and its mode of inheritance. Hum Immunol 1991;30:270–77.

22 Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;52:506–16.

23 Sham PC, Curtis D. An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet 1995;59:323–36.

24 Sun F, Flanders D, Yang Q, Khoury MJ. Transmission Disequilibrium Test (TDT) when only one parent is available: The 1-TDT. Am J Epidemiol 1999;150:97–104.

25 Spielman RS, Ewens WJ. A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. Am J Hum Genet 1998;62:450–58.

26 Schaid DJ, Sommer SS. Comparison of statistics for candidate-gene association studies using cases and parents. Am J Hum Genet 1994;55:402–09.

27 Schaid DJ, Sommer SS. Genotype relative risks: methods for design and analysis of candidate gene association studies. Am J Hum Genet 1993;53:1114–26.

28 Sun F, Flanders D, Yang Q, Khoury MJ. A new method for estimating the risk ratio in studies using case-parental control design. Am J Epidemiol 1998;148:902–09.

29 Wilcox AJ, Weinberg CR, Lie RT. Distinguishing the effects of maternal and offspring genes through studies of ‘case-parent triads’. Am J Epidemiol 1998;148:893–901.

30 Knapp M, Wassmer G, Baur MP. The relative efficiency of the Hardy-Weinberg equilibrium-likelihood and the conditional on parental genotype-likelihood methods for candidate-gene association studies. Am J Hum Genet 1995;57:1476–85.

31 Self SG, Longton G, Kopecky KJ, Liang K-Y. On estimating HLA/disease association with application to a study of aplastic anemia. Biometrics 1991;47:53–61.

32 Schaid DJ. Relative-risk regression models using cases and their parents. Genet Epidemiol 1995;12:813–18.

33 Whittemore AS, Tu I-P. Detecting disease genes using family data: I. Likelihood-based theory. Am J Hum Genet 2000;66:1328–40.

34 Spielman RS, Ewens WJ. The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet 1996;59:983–89.

35 Martin ER, Kaplan NL, Weir BS. Tests for linkage and association in nuclear families. Am J Hum Genet 1997;61:439–48.