1 Center for Environmental and Genetic Medicine, Institute of Biosciences and Technology, Texas A&M University System Health Sciences Center, Houston, TX
2 National Institute of Environmental Health Sciences, Research Triangle Park, NC
Correspondence to Dr. Laura E. Mitchell, Texas A&M University System Health Science Center, Institute of Biosciences and Technology, 2121 West Holcombe Boulevard, Houston, TX 77030 (e-mail: lmitchell{at}ibt.tamhsc.edu).
Received for publication December 22, 2004. Accepted for publication April 28, 2005.
ABSTRACT
alleles; epidemiologic methods; genotype; linkage disequilibrium; linkage (genetics); models, genetic; models, statistical
INTRODUCTION
Case-control designs, in which individuals with a particular condition (cases) are compared with control individuals and their mothers (case mothers) are compared with control mothers, have been used to assess the relation between risk and both the offspring and maternal genotypes (1–3). As with any case-control study, a limitation of this approach is the potential for bias due to the use of inappropriate controls. In addition, when tests for offspring and maternal genotypic effects are performed independently and both tests reject the null hypothesis, it is difficult to differentiate among three alternative interpretations: both the offspring and maternal genotypes influence disease risk; only the offspring genotype influences risk; or only the maternal genotype influences risk (4). Because of these limitations, alternative approaches for assessing offspring and maternal genotypic effects are of interest.
Several family-based approaches have been proposed for assessing offspring and maternal genetic effects (5–9). The two-step transmission disequilibrium test (TDT) was the first such approach proposed for this purpose (5). Using this approach, one samples and genotypes affected individuals and their parents and maternal grandparents. The relation between the offspring genotype and risk is assessed by evaluating the transmission of alleles from parents who are heterozygous at the locus of interest to their affected offspring and testing for violations of the Mendelian expectation of 0.5. The relation between the maternal genotype and the risk of having an affected child is similarly assessed by evaluating the transmission of alleles from heterozygous maternal grandparents of affected individuals to the mothers of affected individuals. However, because the mothers (and fathers) in the study families have had an affected child, the sample is weighted in favor of families in which the mother (and father) inherited a high-risk variant. Hence, the test based on transmissions from maternal grandparents to mothers of affected individuals is actually a test of the global hypothesis of no maternal and no offspring genetic effects. If used as a test for maternal effects only, the rejection rate will be elevated (supranominal) under a null where the offspring genotype influences risk (8).
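For reference, the transmission test described above is usually computed as a McNemar-type statistic on transmissions from heterozygous parents. The sketch below uses only the Python standard library, and the counts in the usage line are hypothetical. Applied to transmissions from heterozygous maternal grandparents to mothers of cases, the same statistic tests the global null described above rather than a purely maternal effect.

```python
from statistics import NormalDist


def tdt(transmitted: int, not_transmitted: int) -> tuple[float, float]:
    """McNemar-type TDT from heterozygous parents: (chi-square, two-sided p-value).

    Under H0 each heterozygous parent transmits the variant with probability 0.5.
    """
    b, c = transmitted, not_transmitted
    chi2 = (b - c) ** 2 / (b + c)
    # A 1-df chi-square tail equals the two-sided standard normal tail at sqrt(chi2).
    p = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p


print(tdt(60, 40))  # chi-square 4.0, p about 0.046
```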
Alternative family-based approaches for evaluating maternal and offspring genetic effects have been proposed (7–9) but require data from different family constellations than those required by the two-step TDT. This paper describes two new methods for the evaluation of offspring and maternal genetic effects that, like the two-step TDT, require genotype data only from cases and their parents and maternal grandparents.
MATERIALS AND METHODS
Both methods require genotype data for "pents" comprising the maternal grandmother, maternal grandfather, mother, father, and affected child and consider a single susceptibility locus with two alleles or two categories of alleles (i.e., high- vs. low-risk alleles). To simplify, one allele is designated as the "variant." The specific allele designated as the variant does not affect the inference. Genotypes are designated as 0, 1, or 2 according to the number of copies of the variant allele.
Both methods assume Mendelian assortment and that, within strata (defined by the four grandparental genotypes for one approach and by the genotypes of maternal grandparents and father for the other), survival and sampling of the affected child are unrelated to the genotype under study. Neither method requires Hardy-Weinberg equilibrium or a rare-disease assumption.
Pent-1
With the first method, the family constellation of interest includes the maternal grandmother (MM), maternal grandfather (MF), paternal grandmother (FM), paternal grandfather (FF), mother (M), father (F), and affected child (C). This method is similar to a previously described approach (8), which requires genotype data from all individuals in this family constellation. However, with the proposed approach, genotype data for the paternal grandparents are not collected, and statistical methods for missing data are used to fit the full grandparental likelihood and to develop the associated likelihood ratio test statistics. The analysis uses the expectation-maximization algorithm (10) to maximize the likelihood. The required assumption that the data be missing at random is automatically fulfilled when genotypes for all paternal grandparents are missing by design.
The pent-1 method assumes that there is genetic symmetry across matings (grandparental and parental) at the locus of interest. Symmetry across maternal (i.e., MM x MF) and paternal (i.e., FM x FF) grandparental mating types is imposed by presuming equal frequencies in the population and equating reciprocal mating types (i.e., 0 x 1 = 1 x 0). Symmetry across parental mating types (i.e., M x F) is imposed by defining reflections of grandparental mating type combinations (i.e., MM x MF/FM x FF) to be equivalent (i.e., 0 x 1/1 x 2 = 1 x 2/0 x 1). These equivalencies give rise to 21 maternal-paternal grandparental mating type categories. The trio, consisting of the mother, father, and affected child, is treated as the outcome within these 21 grandparental mating type categories.
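The count of 21 categories can be checked by brute force. The sketch below is plain Python, and the function name `canonical` is ours, not the authors'. It maps every ordered set of grandparental genotypes (MM, MF, FM, FF) to a key that first equates reciprocal mating types and then equates reflections of the maternal and paternal mating types.

```python
from itertools import product


def canonical(mm: int, mf: int, fm: int, ff: int) -> tuple:
    """Grandparental mating-type category under the pent-1 symmetry rules.

    Genotypes are coded 0, 1, 2 copies of the variant allele.
    """
    maternal = tuple(sorted((mm, mf)))  # reciprocal types equated: 0 x 1 == 1 x 0
    paternal = tuple(sorted((fm, ff)))
    # Reflections equated: (0 x 1)/(1 x 2) == (1 x 2)/(0 x 1)
    return tuple(sorted((maternal, paternal)))


categories = {canonical(*g) for g in product((0, 1, 2), repeat=4)}
print(len(categories))  # 21 = 6 * 7 / 2 unordered pairs of the 6 mating types
```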
The number of family genotype combinations possible with a diallelic gene is 435, of which 148 are usefully distinct (table 1). The multinomial frequency distribution of families among these 148 possible outcomes provides the basis for analysis. With successive conditioning, the probability (Pr) for the family genotype combinations in a random sample can be expressed in the following way:
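The figure of 435 counts the genotype combinations for the seven family members that are compatible with Mendelian transmission. A brute-force check over all 3^7 = 2,187 assignments is sketched below. It assumes a single diallelic locus and no genotyping error, and the helper names are hypothetical.

```python
from itertools import product


def can_transmit(genotype: int, allele: int) -> bool:
    """A parent with `genotype` copies of the variant can pass the variant (allele=1)
    if it has at least one copy, and the other allele (allele=0) if it has at most one."""
    return genotype >= 1 if allele == 1 else genotype <= 1


def compatible(p1: int, p2: int, child: int) -> bool:
    """True if `child` can arise from parents p1 and p2 under Mendelian transmission."""
    return any(
        a1 + a2 == child and can_transmit(p1, a1) and can_transmit(p2, a2)
        for a1, a2 in product((0, 1), repeat=2)
    )


count = sum(
    compatible(mm, mf, m) and compatible(fm, ff, f) and compatible(m, f, c)
    for mm, mf, fm, ff, m, f, c in product((0, 1, 2), repeat=7)
)
print(count)  # 435
```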
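$$\Pr(MM, MF, FM, FF, M, F, C \mid D) = \Pr(MM, MF, FM, FF \mid D)\,\Pr(M, F, C \mid MM, MF, FM, FF, D),$$
where D denotes that the child is affected.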
When the offspring and maternal genotypic effects on risk combine multiplicatively, the relative probabilities of the family genotype categories correspond to the nonzero cells in table 1. The relative risk of disease when the offspring genotype includes one (or two) copies of the variant allele, compared with no copies, is denoted R1 (or R2), and the relative risk of disease when the maternal genotype includes one (or two) copies of the variant allele, compared with no copies, is denoted S1 (or S2).
A maximum likelihood approach, using any software package that can do Poisson regression, can be used to estimate the relative risks:
$$\ln E[n(MM,MF,FM,FF,M,F,C)] = \mu_i + \ln[\mathrm{OFF}(MM,MF,FM,FF,M,F,C)] + \beta_1 I\{C=1\} + \beta_2 I\{C=2\} + \alpha_1 I\{M=1\} + \alpha_2 I\{M=2\} \qquad (1)$$
Here E[n(·)] is the expected number of families with the indicated genotype combination. The μi (i = 1, ..., 21) are 21 stratification parameters that jointly stratify on the genotypes of the maternal and paternal grandparents, ln[OFF(MM,MF,FM,FF,M,F,C)] is a term (defined below) with its coefficient constrained to be one (called an "offset"), and I{C=1}, I{C=2} and I{M=1}, I{M=2} are indicator variables that are one (or zero) according to whether the bracketed equalities hold true (or false). The model can be modified to allow for dominant (α1 = α2, β1 = β2) or recessive (α1 = 0, β1 = 0) inheritance by suitably redefining the independent predictors, and interaction terms can be added to allow for the possibility that the combined effect of the offspring and maternal genotype is greater or less than multiplicative.
The term OFF(MM,MF,FM,FF,M,F,C) is the presumed known relative frequency in the general population (i.e., unconditioned on disease status of the offspring) of the triad genotype M, F, and C, conditional on MM, MF, FM, and FF. This probability is the product of the Mendelian probabilities of the mother's genotype conditional on the maternal grandparental genotypes (table 1, column 3), the father's genotype conditional on the paternal grandparental genotypes (table 1, column 4), and the child's genotype conditional on the parental genotypes (table 1, coefficients of the entries in columns 7–9). The offset is included in the model to allow for the fact that, assuming Mendelian transmission, matings involving two heterozygotes are twice as likely to produce a heterozygous child as they are to produce either type of homozygous child. Offsets are allowed, for example, in the GENMOD procedure of SAS software (11).
When genotype data are available for cases, their parents, and all four grandparents (i.e., complete data), standard Poisson regression software can be used to fit this model to the 148 observed cell counts, to derive estimates and confidence intervals for the relative risks (i.e., R1 = exp(β1), R2 = exp(β2), S1 = exp(α1), and S2 = exp(α2)), and to perform likelihood ratio tests of hypotheses regarding both offspring and maternal genotypic effects on the risk of disease. With the pent-1 approach, where data will be available for the maternal grandparents but not for the paternal grandparents, the expectation-maximization algorithm can be applied (10). When the expectation-maximization algorithm is used, the parameter estimates generated by the software package in the maximization stage converge to the maximum likelihood estimates for the observed-data likelihood. However, as is always the case when the expectation-maximization algorithm is applied, the standard errors output by the statistical package are downwardly biased, because the package treats the imputed data as if they had been observed. Valid standard errors can be based on the observed-data likelihood by taking second derivatives of the log-likelihood or by using standard numerical methods to approximate those derivatives. Confidence intervals for parameters of interest can also be computed using profile likelihoods to specify bounds for the intervals. Moreover, when the expectation-maximization algorithm is used, the likelihood ratio tests must be based on the maximized observed-data likelihood (i.e., the one using the actual data) and not the pseudo-data likelihood (i.e., the one using the expectation-maximization-based estimated data).
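For the pent-1 fit the observed-data log-likelihood is multivariate, but the numerical approach to standard errors can be shown in one dimension. As a toy illustration (not the pent likelihood), the sketch below takes the binomial log-likelihood for the log odds ψ of transmitting the variant, given b transmissions and c non-transmissions. It approximates the second derivative at the maximum likelihood estimate by a central difference and compares the resulting standard error with the closed form sqrt(1/b + 1/c). The counts are hypothetical.

```python
import math


def numerical_se(loglik, theta_hat: float, h: float = 1e-3) -> float:
    """Standard error from the observed information -l''(theta_hat), with l''
    approximated by a central second difference."""
    d2 = (loglik(theta_hat + h) - 2 * loglik(theta_hat) + loglik(theta_hat - h)) / h ** 2
    return 1 / math.sqrt(-d2)


b, c = 60, 40  # hypothetical transmitted / not-transmitted counts


def loglik(psi: float) -> float:
    """Binomial log-likelihood (up to a constant) for the log odds psi of transmission."""
    return b * psi - (b + c) * math.log1p(math.exp(psi))


psi_hat = math.log(b / c)  # closed-form MLE
print(numerical_se(loglik, psi_hat), math.sqrt(1 / b + 1 / c))  # both about 0.204
```

With several parameters, the same idea applies to the full Hessian matrix of the observed-data log-likelihood, whose inverse gives the variance-covariance matrix.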
Pent-2
With the second method, the family constellation of interest is again the "pent" composed of the MM, MF, M, F, and C. This method does not require the assumption of genetic symmetry for either maternal grandparental matings (i.e., MM x MF: 1 x 0 = 0 x 1) or parental mating types. However, for simplicity of analysis and without loss of generality, the data are jointly stratified according to the six maternal grandparental mating types (rather than the nine maternal grandparental genotype combinations) and the paternal genotype, and the genotypes of the mother and affected child are treated as the outcome within each of these 18 categories (table 1). This simplification is possible because the maternal grandmother and grandfather play identical roles in the assumed model (each donates one allele to the mother of the case), so it gives the same results as the model that stratifies on the 27 possible genotype combinations for the maternal grandparents and father.
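The collapse from 27 to 18 strata rests on the mother's genotype distribution being symmetric in the two maternal grandparents. The check below is a sketch with exact Mendelian probabilities, and the helper name `pr_mother` is ours.

```python
from fractions import Fraction
from itertools import product

G = (0, 1, 2)  # copies of the variant allele


def pr_mother(m: int, mm: int, mf: int) -> Fraction:
    """Mendelian Pr(M = m | MM = mm, MF = mf)."""
    t1, t2 = Fraction(mm, 2), Fraction(mf, 2)  # each grandparent's variant-transmission probability
    return {0: (1 - t1) * (1 - t2), 1: t1 * (1 - t2) + (1 - t1) * t2, 2: t1 * t2}[m]


# The mother's genotype distribution does not depend on which grandparent is which...
symmetric = all(pr_mother(m, a, b) == pr_mother(m, b, a) for m, a, b in product(G, repeat=3))

# ...so the 9 x 3 = 27 (MM, MF, F) combinations collapse to 6 x 3 = 18 strata.
full = list(product(G, repeat=3))
strata = {(tuple(sorted((mm, mf))), f) for mm, mf, f in full}
print(symmetric, len(full), len(strata))  # True 27 18
```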
A maximum likelihood approach, using any software package that can do Poisson regression, can be applied to a revision of equation 1 in order to estimate the relative risks associated with the variant allele:
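$$\ln E[n(MM,MF,F,M,C)] = \mu_j + \ln[\mathrm{OFF}(MM,MF,F,M,C)] + \beta_1 I\{C=1\} + \beta_2 I\{C=2\} + \alpha_1 I\{M=1\} + \alpha_2 I\{M=2\}$$
Here the μj (j = 1, ..., 18) stratify jointly on the maternal grandparental mating type and the father's genotype, and OFF(MM,MF,F,M,C) is the product of the Mendelian probabilities of the mother's genotype given the maternal grandparental genotypes and of the child's genotype given the parental genotypes.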
For both the pent-1 and the pent-2 approaches, the expectation-maximization algorithm can be used to include data from pents in which some individuals have not been genotyped, as described for case-parent trios (12).
Simulations
All simulations assumed an admixed population composed of two equally sized subpopulations. Within each subpopulation, Hardy-Weinberg equilibrium and random mating held at the locus of interest. However, the subpopulations differed with respect to the baseline disease risk (i.e., the risk of disease in individuals with zero copies of the variant allele; 0.01 vs. 0.03), and the frequency of the variant allele was twofold higher in the subpopulation with the higher baseline risk. In this way, we created an association between the variant allele and disease due to admixture, which would invalidate case-control approaches in this population. For all analyses performed using the pent-1 and pent-2 approaches, paternal grandparental genotypes were treated as missing, reducing the families to the pent structure.
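To see why this design invalidates a case-control comparison, consider the pooled population when the locus has no effect on risk within either subpopulation. The sketch below computes the allelic odds ratio comparing cases with non-cases. The variant frequencies of 0.10 and 0.20 are hypothetical values, chosen only to respect the twofold difference described above.

```python
def allelic_odds_ratio(subpops):
    """subpops: list of (weight, variant_freq, baseline_risk), with no genotype effect
    on risk within any subpopulation. Returns the pooled case vs non-case allelic OR."""
    cases = [(w * risk, p) for w, p, risk in subpops]
    noncases = [(w * (1 - risk), p) for w, p, risk in subpops]

    def pooled_freq(groups):
        total = sum(n for n, _ in groups)
        return sum(n * p for n, p in groups) / total

    q1, q0 = pooled_freq(cases), pooled_freq(noncases)
    return (q1 / (1 - q1)) / (q0 / (1 - q0))


# Two equally sized subpopulations; 0.10 / 0.20 allele frequencies are hypothetical.
or_ = allelic_odds_ratio([(0.5, 0.10, 0.01), (0.5, 0.20, 0.03)])
print(round(or_, 2))  # 1.21
```

An odds ratio of about 1.21 arises here purely from confounding by subpopulation. The pent analyses condition on ancestral genotypes within each family, so allele-frequency differences of this kind should be absorbed by the stratification parameters.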
Simulations were undertaken to assess the validity of parameter estimates obtained using the pent-1 and pent-2 approaches. For each scenario, 1,000 simulated studies of 500 independent pents each were conducted, and mean estimates for R1, R2, S1, and S2 were calculated from the 1,000 simulation-based estimates. The standard errors of these means were used to construct 95 percent confidence intervals, to assess possible bias in estimation.
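The bias assessment described above amounts to checking whether the true parameter value lies inside a 95 percent confidence interval for the mean of the simulation-based estimates. A minimal sketch follows; the function name and example inputs are hypothetical.

```python
import math
from statistics import mean, stdev


def bias_check(estimates, true_value, z=1.96):
    """Mean of simulation-based estimates, its 95% CI (mean +/- z * SD / sqrt(n)),
    and whether the CI covers the true value (i.e., no evidence of bias)."""
    m = mean(estimates)
    se = stdev(estimates) / math.sqrt(len(estimates))
    ci = (m - z * se, m + z * se)
    return m, ci, ci[0] <= true_value <= ci[1]


# Hypothetical estimates of S1 from 1,000 simulated studies; true S1 = 2.0
m, ci, covered = bias_check([1.9, 2.1] * 500, true_value=2.0)
print(round(m, 3), covered)  # 2.0 True
```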
Additional simulations were undertaken to assess the power of the pent-1 and pent-2 approaches for detecting maternal genetic effects. For comparison, the power of the log-linear approach that requires genotype data from the affected child, parents, and all four grandparents (8) was also assessed. For each scenario, 1,000 simulated studies of 400 independent families each were conducted. The simulated data were used to compare the complete-data, pent-1, and pent-2 approaches. In addition, the noncentrality parameters for the test statistics, calculated from the log-likelihood of the expected data under the assumed parameter values, enabled us to calculate the asymptotic relative efficiencies of the various approaches and approximate powers for various sample sizes. Expected counts were also used to calculate noncentrality parameters and to estimate the power of the pent-1 approach to detect maternal (offspring) genetic effects with and without offspring (maternal) genetic effects in the model.
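The noncentrality parameter of a likelihood ratio statistic grows linearly with the number of families. A per-family noncentrality value computed from expected data therefore gives both the asymptotic relative efficiency of two tests with the same degrees of freedom (the ratio of their per-family values) and approximate power at any sample size. The sketch below assumes a 2-df test (e.g., of S1 = S2 = 1) at α = 0.05. It evaluates the noncentral chi-square tail as a Poisson mixture of central chi-square tails, which have a closed form for even degrees of freedom. The per-family noncentrality values in the usage lines are hypothetical.

```python
import math


def chi2_sf_even(x: float, df: int) -> float:
    """Upper tail P(X > x) of a central chi-square with even df (closed form)."""
    half = x / 2
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def ncx2_sf_2df(x: float, nc: float, terms: int = 300) -> float:
    """Upper tail of a 2-df noncentral chi-square as a Poisson(nc/2) mixture of central
    chi-squares with 2 + 2j df (adequate for moderate nc, roughly below 400)."""
    lam = nc / 2
    weight = math.exp(-lam)
    total = 0.0
    for j in range(terms):
        if j > 0:
            weight *= lam / j
        total += weight * chi2_sf_even(x, 2 + 2 * j)
    return total


def power_2df(nc_per_family: float, n_families: int, alpha: float = 0.05) -> float:
    crit = -2 * math.log(alpha)  # exact 2-df critical value (5.99 for alpha = 0.05)
    return ncx2_sf_2df(crit, nc_per_family * n_families)


# Hypothetical per-family noncentrality values for two designs.
nc_a, nc_b = 0.024, 0.016
print("ARE:", nc_a / nc_b)  # design B needs this many times as many families
for n in (200, 400, 800):
    print(n, round(power_2df(nc_a, n), 3), round(power_2df(nc_b, n), 3))
```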
RESULTS
For each of the scenarios considered, the pent-1 approach outperformed the pent-2 approach (table 3). The likelihood ratio tests based on the pent-1 approach had asymptotic relative efficiencies that ranged from 1.1 to 1.6, compared with tests based on the pent-2 approach. Hence, for a given study size and genotyping requirement, the pent-1 analysis provided increased power to detect maternal genetic effects relative to the pent-2 approach (figures 1 and 2).
DISCUSSION
The two-step TDT (5) was the first family-based approach proposed to assess the dual contribution of the offspring and the maternal genotypes to disease risk. While this approach provides a valid test for the effects of the offspring genotype, the test for maternal genotypic effects is biased when the offspring genotype contributes to disease risk (8). The present analyses indicate that the pent-1 and pent-2 approaches can provide valid estimates of both the offspring and maternal genetic effect parameters. However, the investigator should check that the available numbers support the estimation of all specified parameters. If the variant allele is rare and there are fewer than 10 affected offspring (mothers) who are homozygous for the variant, the data will be too sparse to estimate R2 (and/or S2), and simplified models (e.g., dominant, recessive, or gene-dosage) should be used. Moreover, results obtained using either the pent-1 or pent-2 approach need to be interpreted with caution when effects that are not specified by the model (e.g., maternal-offspring genotype interactions, parent-of-origin effects) may influence risk. Both approaches can be modified to allow for maternal-offspring genotype interactions, and the pent-1 approach can be modified to allow for parent-of-origin effects.
The present analyses indicated that the power of the pent-1 approach to detect maternal genetic effects was only slightly reduced relative to that of the approach based on complete grandparental genotype data (8). The slight increase in sample size requirements for the pent-1, relative to the complete-data, approach is likely to be more than offset by the reduction in recruitment (i.e., two vs. four grandparents) and genotyping (i.e., five vs. seven genotypes/family) costs.
By use of family-based designs, information on maternal genetic effects can be obtained from data on the transmission of the variant alleles from heterozygous maternal grandparents to the mothers of affected individuals, relative to Mendelian expectation, and allele counts for mothers, relative to fathers, of affected individuals. The pent-1 approach, which assumes genetic symmetry across all mating types, uses both sources of information regarding maternal genetic effects. In contrast, the pent-2 approach, which does not require the symmetry assumption, assesses maternal genetic effects using only the information derived from the transmission of the variant allele to the mothers of affected individuals. Hence, the pent-1 approach, which uses all of the relevant data, outperforms the pent-2 approach.
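The two sources of information described above can be tallied directly from pent genotypes. The sketch below uses a hypothetical function name and assumes Mendelian-consistent genotypes. It counts variant (t) and non-variant (u) alleles passed to mothers by heterozygous maternal grandparents, together with the excess of variant alleles in mothers over fathers. It is a descriptive tally, not the pent-1 likelihood, which combines both sources in a single model.

```python
def maternal_information(pents):
    """pents: iterable of (MM, MF, M, F, C) genotypes (copies of the variant).

    Returns (t, u, mother_minus_father):
      t, u: variant / non-variant alleles transmitted to mothers by heterozygous
            maternal grandparents;
      mother_minus_father: variant-allele count in mothers minus that in fathers.
    """
    t = u = diff = 0
    for mm, mf, m, f, _c in pents:
        n_het = (mm == 1) + (mf == 1)
        # A homozygous-variant grandparent necessarily transmitted one variant allele;
        # the rest of the mother's variant alleles came from heterozygous grandparents.
        from_het = m - (mm == 2) - (mf == 2)
        t += from_het
        u += n_het - from_het
        diff += m - f
    return t, u, diff


print(maternal_information([(1, 0, 1, 1, 1), (1, 1, 2, 0, 1), (0, 2, 1, 2, 2)]))  # (3, 0, 1)
```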
Comparison of noncentrality parameters for the pent-1 approach (table 4) with a previously described case-parent trio approach for evaluating offspring and maternal genetic effects (table 2 of Weinberg (8)) indicated that the pent-1 approach also outperforms the trio-based approach for evaluating maternal genetic effects. For the scenarios evaluated, likelihood ratio tests based on the pent-1 approach had asymptotic relative efficiencies that ranged from 1.3 to 1.9, compared with tests based on case-parent trios. This difference in power is not surprising, since the trio-based approach uses only the asymmetries in parental allele counts to estimate the maternal genetic effect parameters.
The pent-1 approach gains its relative advantage over the pent-2 approach by exploiting the assumption of genetic mating symmetry, which can fail. For example, consider a population in which there are allele frequency differences between subpopulations and in which males and females from one subpopulation differ with respect to the frequency with which they select mates from another subpopulation. Such differences will produce genetic asymmetries in parental matings, whether or not the variant of interest is associated with disease in the offspring, and could result in biased estimates of the maternal genetic effect parameters when the pent-1 or other approaches (e.g., trios) that assume symmetry are used. Such departures are possible, since allele frequencies vary across populations (20), and the frequency of matings between populations can differ by sex. For example, in the United States in 1999, births to White mothers and Black fathers were more common than were births to Black mothers and White fathers (65,276 vs. 18,619) (21). However, considering all livebirths in the United States in 1999 for which parental race was available, only 5.1 percent had parents from different racial groups. Therefore, the potential for bias due to mating asymmetries is likely to be minimal, even when there are large differences in allelic frequencies across subpopulations. If study samples are drawn from areas where interpopulation matings are more common, families in which the mother and father are from different populations could be omitted from the analyses using the pent-1 approach, or the more robust pent-2 approach could be applied. Alternatively, as the trade-off between power and robustness may be hard to judge, both analyses could be conducted, and the more precise and powerful pent-1 approach could be selected when the estimates are similar for the two.
There are now several methods for assessing the effects of the offspring and maternal genotypes on disease risk. The choice in a given study is likely to be guided by practical considerations, including availability of affected families, genotyping costs, and the availability of grandparents. Birth defects and other disorders with early onset (22, 23), as well as pregnancy-related disorders (24), are obvious conditions that may be influenced by either offspring or maternal genetic effects or by both. In addition, there is an accumulating body of literature that suggests that some diseases of adulthood, including hypertension (25), may be influenced by conditions in utero.
Obvious candidates for genes that may act through either the inherited or maternal genotype include those that are involved in placental development and function, growth, DNA repair, apoptosis, biotransformation of potential teratogens, and absorption and metabolism of vitamins and other nutrients required for embryonic and fetal development. In addition, the so-called maternal-effect genes (e.g., Zar1), which control early embryonic events via maternal gene transcripts and proteins present in the oocyte (26, 27), may influence certain phenotypes (e.g., infertility) via the maternal genotype.
In summary, family-based approaches can be used to assess the role of both the maternal and the inherited genotype in the etiology of a disease with early onset. The pent-1 and pent-2 approaches described here offer valid tests for both maternal and offspring genetic effects, and they allow estimation of the relative risks associated with maternal and offspring genotypes. Moreover, the pent-1 approach offers increased power for detecting maternal genetic effects relative to the pent-2 and case-parent trio approaches.
ACKNOWLEDGMENTS
The authors would like to thank Drs. Emily O. Kistner and David M. Umbach for their helpful comments on an earlier draft of this manuscript.
Conflict of interest: none declared.
References