Performance of the Log-Linear Approach to Case-Parent Triad Data for Assessing Maternal Genetic Associations with Offspring Disease: Type I Error, Power, and Bias

Jacqueline R. Starr1,2,3, Li Hsu4,5, and Stephen M. Schwartz2,5

1 Department of Pediatrics, School of Medicine, University of Washington, Seattle, WA.
2 Department of Epidemiology, School of Public Health and Community Medicine, University of Washington, Seattle, WA.
3 Children’s Craniofacial Center, Children’s Hospital and Regional Medical Center, Seattle, WA.
4 Department of Biostatistics, School of Public Health and Community Medicine, University of Washington, Seattle, WA.
5 Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA.

Received for publication October 16, 2003; accepted for publication August 30, 2004.


    ABSTRACT
Maternal genetic variation may serve as a biomarker in studies aimed at clarifying fetal determinants of infant or adult disease. The log-linear approach to case-parent triad data (LCPT) can be used to investigate maternal genetic polymorphisms in relation to offspring disease risk, but LCPT operating characteristics have been reported for only a limited range of situations. The authors performed a simulation study to investigate the performance of the LCPT for assessing maternal associations with offspring disease risk over a wide range of scenarios with varying sample sizes (n), high-risk allele frequencies (f), and modes of inheritance, all of which greatly affect the expected number of triads in informative categories. For most f values less than 0.5, the LCPT approach with 200 triads allowed for approximately 80% power to detect valid, unbiased maternal relative risks of 2 when inheritance was log-additive or dominant. When inheritance was recessive, this was true for most f’s greater than 0.35. Outside of this range, however, power and bias depended greatly on the mode of inheritance, f, and n. On the basis of these findings, epidemiologists may consider the LCPT a useful approach for assessing maternal relative risks unless one expects a very rare or fairly common maternal allele to increase offspring disease risk.

epidemiologic methods; log-linear model; operating characteristic; polymorphism (genetics); risk


Abbreviations: LCPT, log-linear approach to case-parent triad data; MDRR, minimum detectable relative risk; RR, relative risk.


    INTRODUCTION
The occurrence of diseases such as fetal alcohol syndrome and diethylstilbestrol-related birth defects underscores the importance of maternal exogenous exposures in relation to offspring disease risk. The fetal environment, and concomitantly infant disease risk, may be similarly affected by maternal endogenous characteristics, such as maternal metabolizing enzyme activities or hormone levels (1), as illustrated by pregestational diabetes-associated teratogenicity (2). More recently, maternal factors have been posited to influence the risk of adult-onset disease as well (3–5); for example, maternal pregnancy estrogen levels are thought to affect offspring risk of testicular or breast cancer (6–10).

In designing studies of maternal factors, however, either the rarity of a given disease, long induction periods, or both hamper the prospective investigation of maternal pregnancy biomarkers in relation to subsequent offspring disease risk. An alternative approach employs maternal genotypes at polymorphic loci in the relevant pathway as surrogates of genetically determined maternal in-utero biomarker levels in a retrospective epidemiologic study (1, 11, 12). Such methods have been applied in studies of maternal cigarette smoking and low birth weight (13), maternal thrombophilia and intrauterine growth restriction (14), and maternal folate metabolism and neural tube defects (15, 16).

The log-linear approach to case-parent triad data (LCPT) has been proposed as a method for estimating associations between maternal genotypes and disease risk in offspring, independent of disease associations with the offspring’s own genotypes (1, 11). The LCPT performs well with 100 triads when the average high-risk variant allele frequency (f) is approximately 0.14 (1, 11), but, to our knowledge, no one has explored this approach with other sample sizes (n = number of triads) or values of f. Both n and f, however, greatly affect the expected distribution of triad genotypes and therefore should influence the operating characteristics of the LCPT, which might suffer when the expected number of triads in analytically informative categories is low. Thus, a more comprehensive investigation of the LCPT will aid researchers in planning studies and evaluating their results.

We performed a computer simulation study to evaluate the performance of the LCPT under various n’s and f’s. We assessed 1) the rate of false-positive detection of maternal genetic associations under the null hypothesis of no maternal genetic associations with disease (type I error); 2) statistical power to detect moderate maternal genetic associations; 3) minimum detectable maternal relative risks (MDRRs) with 80 percent power when n = 200; and 4) bias in maternal relative risk (RR) estimates.


    METHODS
Computer simulations: generating scenarios
We created simulated data sets comprising case-parent triads classified by the combination of maternal (M), paternal (F), and offspring (C) genotypes. Assuming that F is unassociated with offspring disease risk, one can show that the probability of observing a given triad genotype combination among triads ascertained through cases, $P(M,F,C \mid D)$, is proportional to $R_m R_c \, P(C \mid M,F) \, P(M,F)$, where $R_m$ and $R_c$ denote the relative risks associated with the maternal and offspring genotypes, respectively, and $D$ denotes that the offspring is a case (11). We assumed Hardy-Weinberg equilibrium to generate $P(M,F)$, ranging f from 0.05 to 0.95 in increments of 0.05, and we assumed Mendelian genetics to specify $P(C \mid M,F)$. We generated case-triads according to the relative risks associated with each triad class in a given scenario by using the above relation and simulating multinomial deviates.
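The data-generating step can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' Stata 8.2 code: genotypes are coded as variant-allele counts 0, 1, 2; `freq` is the variant allele frequency f; parental genotypes are drawn independently under Hardy-Weinberg equilibrium; and the helper names (`triad_probs`, `simulate_triads`) are hypothetical.

```python
import itertools

import numpy as np


def hwe(g, freq):
    """Hardy-Weinberg probability of carrying g copies of a variant with frequency freq."""
    return [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2][g]


def mendel(c, m, fa):
    """P(C = c | M = m, F = fa) under Mendelian transmission."""
    pm, pf = m / 2.0, fa / 2.0  # chance that each parent transmits the variant
    return [(1 - pm) * (1 - pf), pm * (1 - pf) + (1 - pm) * pf, pm * pf][c]


def triad_probs(freq, rr_m=(1.0, 1.0), rr_c=(1.0, 1.0)):
    """P(M, F, C | D) for the 15 Mendelian-compatible triad cells.

    rr_m and rr_c are (heterozygote, homozygote) relative risks for the mother
    and the offspring; the father's genotype is assumed not to affect risk.
    """
    r_m = (1.0,) + tuple(rr_m)
    r_c = (1.0,) + tuple(rr_c)
    weights = {}
    for m, fa, c in itertools.product(range(3), repeat=3):
        p_c = mendel(c, m, fa)
        if p_c > 0:
            # proportional to R_m * R_c * P(C | M, F) * P(M, F)
            weights[(m, fa, c)] = r_m[m] * r_c[c] * p_c * hwe(m, freq) * hwe(fa, freq)
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


def simulate_triads(n, freq, rr_m, rr_c, rng):
    """One simulated data set: multinomial counts of n case-parent triads per cell."""
    probs = triad_probs(freq, rr_m, rr_c)
    cells = list(probs)
    counts = rng.multinomial(n, [probs[cell] for cell in cells])
    return dict(zip(cells, counts))


# Example: 200 triads, f = 0.3, log-additive maternal RRs 1.4 and 2, null offspring RRs
rng = np.random.default_rng(2004)
data = simulate_triads(200, 0.3, (1.4, 2.0), (1.0, 1.0), rng)
```

Repeating `simulate_triads` 1,000 times per combination of n, f, and relative risks would reproduce the structure, though not the random draws, of the simulation grid described in the text.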

We simulated sample sizes of 100, 200, 500, 1,000, 2,000, or 5,000 triads. For each value of n and f, we simulated maternal hetero- and homozygote relative risks of 1.4 and 2 (log-additive), 2 and 2 (dominant), and 1 and 2 (recessive), respectively. We assessed whether C affected maternal relative risk estimates and tests by varying whether offspring relative risks were null or non-null (non-null being the same as maternal relative risks in a given scenario).

We randomly generated 1,000 data sets for each scenario, performing all simulations in Stata 8.2 (17).

Using the LCPT to estimate relative risks and perform hypothesis tests
The LCPT is one of a class of family-based association tests for estimating offspring relative risks conditional on parental genotypes (18). Additionally, it offers a means to estimate the relative risks associated with maternal genotypes independent of offspring relative risks by using a log-linear regression model stratified on mating type (the combination of parental genotypes). There are six such combinations if one assumes that there is no sex bias in the distribution of parental genotypes in the general population; indeed, the null hypothesis that maternal genotypes are unassociated with offspring disease risk exploits this assumption.
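The stratification by mating type, and the mother-father symmetry that the null hypothesis relies on, can be illustrated with a short Python sketch. The numbering of mating types 1 to 6 is our assumption, chosen so that the concordant types are 1, 4, and 6, consistent with the Discussion; the function names are hypothetical.

```python
import itertools

# Assumed numbering: unordered parental pair (larger count, smaller count) -> type.
MATING_TYPES = {(2, 2): 1, (2, 1): 2, (2, 0): 3, (1, 1): 4, (1, 0): 5, (0, 0): 6}


def mating_type(m, fa):
    """Mating type of a (mother, father) genotype pair; parental sex is ignored."""
    return MATING_TYPES[(max(m, fa), min(m, fa))]


def transmissible(g):
    """Alleles (0 = reference, 1 = variant) a parent with g variant copies can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[g]


def cells_by_mating_type():
    """Group the Mendelian-compatible (M, F, C) cells into the six strata."""
    strata = {}
    for m, fa, c in itertools.product(range(3), repeat=3):
        if c in {a + b for a in transmissible(m) for b in transmissible(fa)}:
            strata.setdefault(mating_type(m, fa), []).append((m, fa, c))
    return strata


strata = cells_by_mating_type()
for t in sorted(strata):
    print(t, strata[t])
```

Within a discordant stratum such as type 5, the null hypothesis implies that the cells (1, 0, c) and (0, 1, c) are equally likely; the maternal coefficients measure departures from that balance, and concordant strata contribute nothing to them.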

We specified the unrestricted regression model as follows (1):

$$\ln E(n_{M,F,C}) = \gamma_j + \alpha_1 I_{\{M=1\}} + \alpha_2 I_{\{M=2\}} + \beta_1 I_{\{C=1\}} + \beta_2 I_{\{C=2\}} + \ln(2)\, I_{\{M=F=C=1\}},$$

where each of M, F, and C equals 0, 1, or 2; j indexes the mating types; $I_{\{M=1\}}$ and $I_{\{M=2\}}$ are dummy variables indicating whether the mother has one (M = 1) or two (M = 2) copies of the variant; $I_{\{C=1\}}$ and $I_{\{C=2\}}$ are analogous terms for offspring genotypes; and the final term is a correction factor for the one triad (M = F = C = 1) having twice the probability of occurring relative to other triads of the same mating type. Thus, $\exp(\hat\alpha_1)$ (RR1) and $\exp(\hat\alpha_2)$ (RR2) represent the estimated relative risks associated with the mother’s having one or two copies of the variant, respectively. Elevated maternal relative risks indicate that among case-parents with discordant genotypes, mothers tend to have more copies of the variant than fathers. Mating types with concordant parental genotypes are uninformative regarding maternal relative risks.
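A minimal sketch of fitting the unrestricted model as a Poisson log-linear regression, using iteratively reweighted least squares (IRLS) in NumPy rather than the authors' Stata routine. It assumes a mating-type numbering in which the concordant types are 1, 4, and 6, and `design` and `fit_poisson` are hypothetical helper names. With sparse data, an empty informative cell can drive a coefficient toward infinity, which is the practical face of the nonestimable parameters reported in the Results.

```python
import itertools

import numpy as np

# Assumed mating-type numbering (0-based column index here); concordant types are 1, 4, 6.
MATING = {(2, 2): 0, (2, 1): 1, (2, 0): 2, (1, 1): 3, (1, 0): 4, (0, 0): 5}
TRANSMIT = {0: {0}, 1: {0, 1}, 2: {1}}  # alleles a parent with g copies can pass on
CELLS = [(m, fa, c) for m, fa, c in itertools.product(range(3), repeat=3)
         if c in {a + b for a in TRANSMIT[m] for b in TRANSMIT[fa]}]


def design(cells):
    """Columns: gamma_1..gamma_6, I{M=1}, I{M=2}, I{C=1}, I{C=2}; plus the ln(2) offset."""
    X = np.zeros((len(cells), 10))
    for i, (m, fa, c) in enumerate(cells):
        X[i, MATING[(max(m, fa), min(m, fa))]] = 1.0
        X[i, 6], X[i, 7] = float(m == 1), float(m == 2)
        X[i, 8], X[i, 9] = float(c == 1), float(c == 2)
    offset = np.array([np.log(2.0) if cell == (1, 1, 1) else 0.0 for cell in cells])
    return X, offset


def fit_poisson(y, X, offset, max_iter=100, tol=1e-10):
    """Poisson log-linear regression via IRLS; returns coefficients and fitted means."""
    y = np.asarray(y, dtype=float)
    mu = y + 0.5                          # start away from zero counts
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta - offset + (y - mu) / mu  # working response
        XtW = X.T * mu                    # X'W, with weights equal to the fitted means
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ new_beta + offset
        mu = np.exp(eta)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, mu


X, offset = design(CELLS)
```

With `y` holding one data set's counts in the order of `CELLS`, `np.exp(beta[6])` and `np.exp(beta[7])` would estimate RR1 and RR2.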

Because of uncertainty about the true mode of inheritance at a locus, primary analyses will often employ the "unrestricted model" (1), in which one estimates separately the relative risks (and 95 percent confidence intervals thereof) associated with the mother’s hetero- or homozygosity. One could also incorporate prior information about the mode of inheritance (11, 19). For example, one could parameterize the model log-additively, imposing the assumption that risk multiplies with each additional allele (we restricted offspring parameters similarly in respective restricted models):

$$\ln E(n_{M,F,C}) = \gamma_j + \tfrac{1}{2}\alpha_L I_{\{M=1\}} + \alpha_L I_{\{M=2\}} + \tfrac{1}{2}\beta_L I_{\{C=1\}} + \beta_L I_{\{C=2\}} + \ln(2)\, I_{\{M=F=C=1\}}.$$

To enforce a dominant or recessive genetic model, one could constrain the maternal heterozygote and homozygote relative risks to be equal (RR1 = RR2) or constrain the maternal heterozygote relative risk to 1 (RR1 = 1), respectively. We subsequently refer to {kwi021eq3} as RRL and to the dominant and recessive relative risks as RRD and RRR, respectively. We subtracted power in the unrestricted model (1) from power in each corresponding restricted model to compare their performances.
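The restricted models differ from the unrestricted one only in how the genotype count is coded. The Python sketch below, with hypothetical names, lists one coding per analytic model and the genotype relative risks each implies. In the log-additive coding the single coefficient is weighted 1/2 for heterozygotes and 1 for homozygotes, as in the log-additive model, so its exponent equals the homozygote relative risk and the heterozygote relative risk is its square root.

```python
import math

# Covariate coding of a variant count g (0, 1, 2) under each analytic model.
# The same codings can be applied to the offspring genotype.
CODINGS = {
    "unrestricted": lambda g: [float(g == 1), float(g == 2)],  # RR1 and RR2 free
    "log-additive": lambda g: [{0: 0.0, 1: 0.5, 2: 1.0}[g]],   # RR1 = sqrt(RR2)
    "dominant": lambda g: [float(g >= 1)],                     # RR1 = RR2
    "recessive": lambda g: [float(g == 2)],                    # RR1 = 1
}


def implied_rrs(model, coefs):
    """Relative risks for g = 0, 1, 2 implied by fitted log-scale coefficients."""
    return [math.exp(sum(x * b for x, b in zip(CODINGS[model](g), coefs)))
            for g in range(3)]


for model, coefs in [("unrestricted", [math.log(1.4), math.log(2.0)]),
                     ("log-additive", [math.log(2.0)]),
                     ("dominant", [math.log(2.0)]),
                     ("recessive", [math.log(2.0)])]:
    print(model, [round(r, 3) for r in implied_rrs(model, coefs)])
```

Because each restricted model uses one maternal column instead of two, its likelihood ratio test has 1 df rather than 2.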

In all regression analyses, we tested maternal relative risk parameters by performing likelihood ratio tests. For a given locus, twice the difference between the log-likelihoods of the model estimating maternal relative risks and a model in which maternal relative risks were constrained to 1 (i.e., maternal coefficients set to 0) was compared with a chi-squared distribution with 2 df (unrestricted model) or 1 df (restricted models). We rejected the null hypothesis on the basis of a two-sided test at the 0.05 level.
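The test statistic and its p-value can be computed from the two maximized log-likelihoods. For 1 and 2 df the chi-squared upper tail has closed forms (a complementary error function for 1 df, an exponential for 2 df), so this hypothetical helper needs only the Python standard library; the log-likelihood values in the example are made up for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate; closed forms for df = 1, 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def lr_test(loglik_maternal, loglik_null, df, level=0.05):
    """LRT of maternal RRs: statistic, p-value, and whether H0 is rejected.

    df is 2 for the unrestricted model and 1 for the restricted models.
    """
    stat = max(2.0 * (loglik_maternal - loglik_null), 0.0)  # guard tiny negatives
    p = chi2_sf(stat, df)
    return stat, p, p < level


# Illustrative (made-up) log-likelihoods for one data set
print(lr_test(-101.2, -104.5, df=2))
```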

In calculating bias, type I error, and power, we excluded data sets in which any maternal relative risk coefficient could not be estimated or its estimated standard error was 0 or >50.
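The exclusion rule and the resulting rejection proportion can be sketched as follows. How a nonestimable coefficient is represented is our assumption (here, a standard error of `nan`), and `usable` and `rejection_rate` are hypothetical names. The same proportion serves as the type I error estimate when maternal relative risks are null and as the power estimate otherwise.

```python
import math


def usable(maternal_ses, max_se=50.0):
    """Keep a data set only if every maternal coefficient has 0 < SE <= max_se.

    A nonestimable coefficient is assumed to be reported with SE = nan.
    """
    return all(math.isfinite(se) and 0.0 < se <= max_se for se in maternal_ses)


def rejection_rate(results):
    """results: (maternal_ses, rejected) per simulated data set.

    Returns the proportion of usable data sets in which H0 was rejected,
    together with the number of usable data sets.
    """
    kept = [rejected for ses, rejected in results if usable(ses)]
    if not kept:
        return float("nan"), 0
    return sum(kept) / len(kept), len(kept)


example = [([0.31, 0.45], True), ([0.28, float("nan")], True),
           ([0.0, 0.52], False), ([0.35, 61.0], True), ([0.40, 0.47], False)]
print(rejection_rate(example))
```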

Estimating type I error
To assess type I error in each scenario, we calculated the proportion of data sets in which we rejected the null hypothesis when, indeed, maternal genotypes were unassociated with offspring disease risk. We calculated exact binomial 95 percent confidence intervals for the estimated error rates.
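The exact binomial (Clopper-Pearson) interval for a rejection proportion can be computed with the standard library alone by bisecting the binomial tail probabilities. This is a sketch of the standard Clopper-Pearson construction with hypothetical function names, not a reproduction of the authors' software; in practice a beta-quantile routine from a statistics library would usually be used instead.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed on the log scale."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_nfact = math.lgamma(n + 1)
    total = sum(math.exp(log_nfact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                         + i * log_p + (n - i) * log_q) for i in range(k + 1))
    return min(total, 1.0)


def _bisect_decreasing(g, lo=0.0, hi=1.0, iters=100):
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clopper_pearson(x, n, level=0.95):
    """Exact binomial confidence interval for the proportion x / n."""
    a = (1.0 - level) / 2.0
    # lower bound solves P(X >= x | p) = a; upper bound solves P(X <= x | p) = a
    lower = 0.0 if x == 0 else _bisect_decreasing(lambda p: a - (1.0 - binom_cdf(x - 1, n, p)))
    upper = 1.0 if x == n else _bisect_decreasing(lambda p: binom_cdf(x, n, p) - a)
    return lower, upper


# Example: 50 false positives among 1,000 usable data sets
print(clopper_pearson(50, 1000))
```

A scenario would be flagged as anticonservative or conservative when the whole interval lies above or below 0.05, respectively.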

Estimating power
To estimate statistical power in each scenario, we calculated the proportion of data sets in which we rejected the null hypothesis when maternal genotypes were associated with offspring disease risk.

MDRR estimates with 80 percent power
With n = 200 triads, we estimated each MDRR as the relative risk at which the null hypothesis of no association with maternal genotypes was rejected in approximately 80 percent of 1,000 simulated data sets. For each scenario, we set an initial relative risk and generated 50 data sets. We performed the LCPT analysis in each data set and calculated power (the proportion of data sets in which the null hypothesis was rejected). If the 95 percent confidence interval for this proportion excluded 80 percent, we modified the relative risk and generated 50 new data sets. We iterated this process until the 95 percent confidence interval included 80 percent, then raised the number of simulated data sets to 200 and iterated in the same way. When the 95 percent confidence interval based on 200 simulated data sets included 80 percent, we raised the number of data sets to 1,000, modifying the relative risk and repeating sets of simulations until power estimates were between 77.5 percent and 82.5 percent. Across scenarios, we ranged f from 0.05 to 0.95 in the log-additive and recessive scenarios but only from 0.05 to 0.75 in the dominant scenarios, because of prohibitive increases in the MDRR and simulation time.
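The staged search can be outlined in Python. The text does not say how the relative risk was modified between iterations or which interval was used at the 50- and 200-data-set stages, so this sketch assumes bisection on log(RR) within a user-chosen bracket and a normal-approximation interval. `estimate_power(rr, n_sets)` is a hypothetical callback that would simulate `n_sets` data sets at relative risk `rr` and return the rejection proportion; the example substitutes a toy deterministic curve.

```python
import math


def wald_ci(p, n_sets, z=1.96):
    """Normal-approximation interval for an estimated power (an assumption)."""
    half = z * math.sqrt(p * (1.0 - p) / n_sets)
    return p - half, p + half


def find_mdrr(estimate_power, rr_lo=1.0, rr_hi=20.0, target=0.80,
              stages=(50, 200, 1000), band=(0.775, 0.825), max_iter=60):
    """Staged search for the relative risk giving ~80 percent power.

    estimate_power(rr, n_sets) is a hypothetical callback returning the rejection
    proportion over n_sets simulated data sets. Bisection on log(RR) is assumed.
    """
    lo, hi = math.log(rr_lo), math.log(rr_hi)
    rr = power = float("nan")
    for stage, n_sets in enumerate(stages):
        final_stage = stage == len(stages) - 1
        for _ in range(max_iter):
            rr = math.exp(0.5 * (lo + hi))
            power = estimate_power(rr, n_sets)
            if final_stage:
                accepted = band[0] <= power <= band[1]
            else:
                ci_lo, ci_hi = wald_ci(power, n_sets)
                accepted = ci_lo <= target <= ci_hi
            if accepted:
                break  # move on to the next, larger number of data sets
            if power < target:
                lo = math.log(rr)  # underpowered: try a larger RR
            else:
                hi = math.log(rr)  # overpowered: try a smaller RR
    return rr, power


# Toy deterministic power curve standing in for the simulations (illustration only)
rr, power = find_mdrr(lambda rr, n_sets: 1.0 - math.exp(-2.0 * (rr - 1.0)))
print(round(rr, 3), round(power, 3))
```

Because real power estimates are noisy, bisection can occasionally step past the true MDRR; widening the bracket and restarting is one simple remedy.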

Estimating bias
We evaluated bias graphically by plotting the exponentiated average of the maternal parameter regression coefficients against the high-risk variant allele frequency. We also calculated the ratio of this average to the true relative risk.
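The bias summary is a short computation. Exponentiating the average log relative risk coefficient gives the geometric mean of the per-data-set relative risk estimates, which is never larger than their arithmetic mean; the function name is hypothetical.

```python
import math


def bias_summary(coefs, true_rr):
    """Exponentiated average of fitted log-RR coefficients, and its ratio to the truth."""
    avg_rr = math.exp(sum(coefs) / len(coefs))
    return avg_rr, avg_rr / true_rr


# Example with made-up coefficients from three data sets, true RR = 2
coefs = [math.log(1.6), math.log(2.1), math.log(2.5)]
print(bias_summary(coefs, 2.0))
```

A ratio near 1 indicates little bias; ratios of 1.5 to 2.6 or of 0.15 correspond to the 50–160 percent overestimates and 85 percent underestimates reported in the Results.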


    RESULTS
Identifiability of maternal parameter estimates
RR1 and RR2 were often nonestimable at very low and very high f’s, while RRD was often nonestimable at very high f’s and RRR at very low f’s. When n = 100 and f = 0.05 or f = 0.95, as many as 96 percent of the data sets had a nonestimable maternal parameter estimate. Larger n’s and less extreme values of f mitigated this problem; almost all parameter estimates were estimable with 0.15 ≤ f ≤ 0.8 and n = 200, or with 0.1 ≤ f ≤ 0.9 and n = 1,000. RRL was estimable in all data sets.

Type I error
The LCPT was generally valid for most values of n and f (figure 1). The 95 percent confidence intervals for the type I error rates usually included 0.05.



FIGURE 1. Proportion of data sets exhibiting false-positive detection of maternal genetic associations with offspring disease risk using the log-linear approach to case-parent triad data, by sample size, high-risk variant allele frequency, mode of inheritance assumed in the analytic model (no assumptions, top row; assumed dominant, second row; assumed recessive, third row; assumed log-additive, bottom row), and offspring genotypic association with disease (null, left panels; non-null, right panels). Non-null offspring hetero- and homozygote relative risks (RRs) were 1.4 and 2, respectively.

 
With the unrestricted model, the test of maternal relative risks was occasionally conservative (figure 1), with 95 percent confidence intervals entirely below 0.05 in 23 of 228 scenarios. The 95 percent confidence interval lay entirely above 0.05 in 11 of 228 scenarios, which tended to have moderate values of f.

With the assumed-dominant models (figure 1), the 95 percent confidence intervals lay above 0.05 in 12 of 228 scenarios. When f was very high, the 95 percent confidence interval often lay below 0.05 (in 22 of 228 scenarios).

This pattern was reversed with respect to f when the analytic model was assumed to be recessive (figure 1), with similar numbers of scenarios having 95 percent confidence intervals that excluded 0.05. The test of maternal relative risks tended to be conservative at very low f’s with smaller n’s.

When the analytic model was assumed to be log-additive (figure 1), the 95 percent confidence intervals excluded 0.05 in only 13 scenarios. In 10 of the 13 scenarios, the test was conservative, with little pattern in relation to f.

Statistical power for detecting maternal associations
The statistical power of the LCPT depended greatly on f (figure 2). Power was maximized around f = 0.4, 0.25, and 0.7 under log-additive, dominant, and recessive inheritance, respectively. Power decreased asymmetrically around these values, and this asymmetry was much more pronounced under maternally dominant and recessive inheritance.



FIGURE 2. Statistical power of the log-linear approach to case-parent triad data for detecting maternal genetic associations with offspring disease risk, by sample size, high-risk variant allele frequency, underlying mode of inheritance (log-additive, top row; dominant, middle row; recessive, bottom row), and analytically assumed mode of inheritance (no assumptions, left column; assumed dominant, second column; assumed recessive, third column; assumed log-additive, fourth column). Underlying ("true") maternal hetero- and homozygote relative risks were, respectively, 1.4 and 2 in the log-additive scenarios, 2 and 2 in the dominant scenarios, and null and 2 in the recessive scenarios. In these simulations, the offspring’s own genotypes were unassociated with disease risk.

 
Under log-additive inheritance, even with moderate values of f, 500 triads were required to achieve ≥80 percent power to detect an RR1 of 1.4 (figure 2). Increasing n to 2,000 sufficed to achieve ≥80 percent power for all f’s. Analyzing the data with an assumed-log-additive model increased power slightly. Analyzing the data with an assumed-dominant model increased power slightly for very low f’s but decreased it greatly for f > 0.45 (figures 2 and 3). Analyzing the data with an assumed-recessive model had a similar effect on power, but in the opposite direction with respect to f.



FIGURE 3. Differences in the statistical power of the log-linear approach to case-parent triad data for detecting maternal genetic associations with offspring disease risk using a sample size of 200 triads, comparing analyses in which the mode of inheritance is assumed to be dominant, recessive, or log-additive with analyses in which no assumptions are made regarding the mode of inheritance (unrestricted). Assumptions about the mode of inheritance involved restricting the regression parameters. In the assumed-dominant models, the maternal hetero- and homozygote relative risks (RRs) were assumed to be equal; in the assumed-recessive model, the maternal heterozygote RR was assumed to be null; and in the assumed-log-additive models, the maternal homozygote RR was assumed to be the square of the maternal heterozygote RR. Differences were calculated as the statistical power of the restricted model minus the power of the unrestricted model for the same data sets (the solid, dashed, and dotted lines compare the assumed-dominant, -recessive, and -log-additive models, respectively, with the unrestricted model). The underlying (or "true") maternal hetero- and homozygote RRs were, respectively, 1.4 and 2 in the log-additive scenarios (left panel), 2 and 2 in the dominant scenarios (middle panel), and null and 2 in the recessive scenarios (right panel). In these simulations, the offspring’s own genotypes were unassociated with disease risk.

 
Under dominant inheritance, with the unrestricted analytic model, n = 200 afforded ≥80 percent power to detect maternal relative risks of 2 for 0.1 ≤ f ≤ 0.4 (figure 2), and increasing n to 500 raised power to ≥80 percent for all values of f less than 0.7. Power decreased sharply above this range, however, falling dramatically short of 80 percent even when n = 5,000 with f = 0.95.

Under recessive inheritance, the unrestricted model with n = 200 allowed for power of ≥80 percent to detect maternal homozygote relative risks of 2 when 0.45 ≤ f ≤ 0.8 (figure 2); power tended to be much lower at very low f’s.

Enforcing a log-additive model when the true model was also log-additive increased power by an average of 10 percent when n = 200 (figures 2 and 3), and never by more than 18 percent for any sample size studied. Making this assumption generally decreased power when the true genetic model was dominant or recessive, however, sometimes by as much as 60 percent. Similarly, correctly specifying the model as dominant or recessive increased power for most f’s, by as much as 20 percentage points (figures 2 and 3). Mistakenly assuming a dominant or recessive model when the opposite mode of inheritance was true, however, reduced power greatly for most f’s (figures 2 and 3), by as much as 97 percentage points.

Compared with null offspring relative risks, non-null offspring associations slightly increased power to detect maternal relative risks when f was lower and slightly decreased power when f was higher (figure 4). The differences, though small, were consistent across all scenarios considered.



FIGURE 4. Using a sample size of 200 triads, the statistical power of the log-linear approach to case-parent triad data for detecting maternal genetic associations with offspring disease risk depends on whether the offspring’s own genotypes also affect disease risk. The underlying (or "true") maternal hetero- and homozygote relative risks (RRs) were, respectively, 1.4 and 2 in the log-additive scenarios (left panel), 2 and 2 in the dominant scenarios (middle panel), and null and 2 in the recessive scenarios (right panel). When offspring RRs were non-null, they were equal to the maternal RRs for a given underlying mode of inheritance. In all analyses, hetero- and homozygote RRs were estimated separately, with no assumptions about the underlying genetic model.

 
MDRR estimates with 80 percent power and 200 triads
When the underlying genetic model was log-additive and 0.25 ≤ f ≤ 0.75, minimum detectable heterozygote relative risks were approximately 1.5–1.7 (figure 5). MDRRs were skewed slightly upward as f approached 1, relative to f near 0. MDRRs from the assumed-log-additive model were 6 percent lower than those from the unrestricted model, on average, and were never more than 14 percent lower.



FIGURE 5. Minimum detectable maternal relative risks (MDRRs) according to high-risk variant allele frequency when the underlying genetic model is log-additive (left panel), dominant (middle panel), or recessive (right panel), using a sample size of 200 case-parent triads. For all underlying modes of inheritance, hetero- and homozygote relative risks (RRs) were estimated separately, with no assumptions about the underlying genetic model (in the unrestricted model, the solid line indicates the homozygote RR and the dotted line indicates the heterozygote RR). Additionally, MDRRs were estimated in restricted analyses, in which the analytic model was assumed to be the same as the true mode of inheritance. For the underlying log-additive models, the maternal homozygote RR is indicated by the dashed line, and it was assumed to be the square of the maternal heterozygote RR; for the underlying dominant models, the true maternal RRs were equal; for the underlying recessive models, maternal heterozygote RRs were null, and the maternal homozygote RRs from the assumed-recessive analyses are indicated by the dashed lines. In all simulations, the offspring’s own genotypes were unassociated with disease risk.

 
We observed minimum MDRRs of approximately 2 when 0.15 ≤ f ≤ 0.35 under a dominant genetic model and when 0.4 ≤ f ≤ 0.8 under a recessive genetic model (figure 5). The MDRR increased greatly outside of these ranges. When f was greater than 0.65 under dominant inheritance, MDRRs began to increase sharply, becoming prohibitively high. Under recessive inheritance, the MDRRs varied asymmetrically with f in the direction opposite to that observed with dominant inheritance, but this skewing was less severe. When the assumed model was the same as the true one, MDRRs from the assumed-dominant and assumed-recessive models were generally 5–20 percent lower than those from the unrestricted model.

Non-null offspring relative risks reduced MDRRs (by ≤23 percent) at lower f’s and increased them at higher f’s. The increases depended on the underlying genetic model and were as high as 100 percent under dominant inheritance (data not shown).

Bias in maternal relative risk estimates
At f = 0.95, even at sample sizes of up to 1,000 triads, maternal RR1, RR2, and RRD estimates were sometimes as much as 50–160 percent higher or 85 percent lower than their true values, depending on the underlying inheritance (figure 6). At the lowest values of f, maternal RR1 and RRD estimates were unbiased, but the maternal RR2 and RRR estimates tended to exhibit large downward biases (figure 6). Analyzing the data with a dominant or recessive model, even when the assumption was correct, did not greatly reduce these biases (figure 6). The sample size needed to minimize the observed bias depended on the value of f and was sometimes as high as 5,000. Estimates of RRL were not generally biased when the true model was log-additive (figure 6).



FIGURE 6. Average maternal relative risk (RR) estimates from the log-linear approach to case-parent triad data, by sample size, high-risk variant allele frequency, underlying mode of inheritance (log-additive, top row; dominant, middle row; recessive, bottom row), and analytically assumed mode of inheritance (no assumptions, first two columns; assumed dominant, third column; assumed recessive, fourth column; assumed log-additive, fifth column). The underlying (or "true") maternal hetero- and homozygote RRs were, respectively, 1.4 and 2 in the log-additive scenarios, 2 and 2 in the dominant scenarios, and null and 2 in the recessive scenarios. In these simulations, the offspring’s own genotypes were unassociated with disease risk. Each average RR estimate was calculated by exponentiating the average of the coefficients estimated in the data sets for a given scenario.

 
Expected proportion of triads per genotype combination and empty cells
To explore the reason for extreme decreases in power and increases in bias as f approached 0 and, especially, 1, we calculated the expected proportion of triads per combination of M, F, and C when f = 0.05 or f = 0.95. Selecting triads into the study through cases caused asymmetry in the triad distribution with respect to f. Expected proportions were lower (and thus the probability of empty cells was greater) in informative mating type categories when f = 0.95 as compared with f = 0.05 (table 1). This skewing was particularly pronounced for dominant inheritance, because mating types 3 and 5, which contribute the most information regarding the magnitude of the maternal relative risks, would be expected to hold less than 1 percent of triads when f = 0.95. Very low f’s caused a similar problem with recessive inheritance.
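The claim about mating types 3 and 5 can be checked numerically. This Python sketch uses hypothetical names and the same Hardy-Weinberg and Mendelian assumptions as the simulations, numbers mating types so that 1, 4, and 6 are concordant, and computes the expected share of case-parent triads in each mating type under dominant maternal inheritance with both maternal relative risks equal to 2 and null offspring relative risks.

```python
import itertools

# Assumed numbering of unordered parental pairs; concordant types are 1, 4, 6.
MATING = {(2, 2): 1, (2, 1): 2, (2, 0): 3, (1, 1): 4, (1, 0): 5, (0, 0): 6}


def hwe(g, freq):
    """Hardy-Weinberg genotype probability."""
    return [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2][g]


def mendel(c, m, fa):
    """P(C = c | M = m, F = fa) under Mendelian transmission."""
    pm, pf = m / 2.0, fa / 2.0
    return [(1 - pm) * (1 - pf), pm * (1 - pf) + (1 - pm) * pf, pm * pf][c]


def mating_type_shares(freq, rr_m, rr_c=(1.0, 1.0)):
    """Expected share of case-parent triads in each mating type."""
    r_m, r_c = (1.0,) + tuple(rr_m), (1.0,) + tuple(rr_c)
    shares = dict.fromkeys(MATING.values(), 0.0)
    for m, fa, c in itertools.product(range(3), repeat=3):
        w = r_m[m] * r_c[c] * mendel(c, m, fa) * hwe(m, freq) * hwe(fa, freq)
        shares[MATING[(max(m, fa), min(m, fa))]] += w
    total = sum(shares.values())
    return {t: s / total for t, s in shares.items()}


for freq in (0.05, 0.95):
    s = mating_type_shares(freq, rr_m=(2.0, 2.0))  # dominant, maternal RR = 2
    print(freq, round(s[3] + s[5], 4))  # about 0.24 at f = 0.05, under 0.004 at f = 0.95
```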


TABLE 1. Expected proportion of case-parent triads with each combination of maternal (M), paternal (F), and offspring (C) genotypes, given no association or given that the underlying mode of inheritance is log-additive, dominant, or recessive, with a variant allele frequency of either 0.05 or 0.95
 

    DISCUSSION
The expression of maternal genes could strongly affect embryonic and fetal physiology, either directly or indirectly, by influencing the levels of nutrients, toxins, or other compounds that cross the placenta. Recognition of the potential importance of maternal factors in relation to offspring risk of fetal- and adult-onset diseases may be increasing, necessitating the characterization of methods for investigating maternal genetic associations with disease risk.

In this study, we found that for moderate values of f, the LCPT approach with 500 triads allowed for approximately 80 percent power to detect unbiased maternal heterozygote relative risks of 1.4 and 2 under log-additive and dominant inheritance, respectively, and an unbiased maternal homozygote relative risk of 2 under recessive inheritance. Operating characteristics depended greatly on the mode of inheritance, f, and n. Large decreases in power and increases in bias may make the LCPT impractical for studying putative high-risk maternal alleles that are relatively common when the underlying mode of inheritance is log-additive or, particularly, dominant. Conversely, with underlying recessive inheritance, power and estimation would both suffer if the high-risk allele were very rare, though to a lesser extent than with very common high-risk alleles under dominant inheritance.

We illustrated the cause of these dramatic shifts in the performance of the LCPT by calculating the expected proportion of triads in each triad genotype category under specific alternative hypotheses. Mating type categories consisting of concordant maternal and paternal genotypes (mating types 1, 4, and 6) are uninformative regarding maternal relative risks. When the true mode of inheritance is dominant or recessive, however, mating type 2 or mating type 5, respectively, also offers little information regarding the magnitude of the maternal relative risk. The expected frequency of triads in some informative triad categories drops sharply when f nears 0 or 1, and the number of categories with near-zero counts is higher for f’s approaching 1 than for f’s approaching 0. This asymmetry reflects the mechanics of the LCPT analysis, in which one measures skewing away from the expected distribution of triads in the absence of maternal associations with the cases’ disease. Even under the null-hypothesized distribution, extremely rare or common alleles may lead to many zero cell-counts. Yet, among triads randomly selected from the population, the expected distribution of empty cells when f = 0.05 is exactly the reverse of that when f = 0.95. However, under the alternative hypothesis, that is, when the maternal relative risk is greater than 1, the distribution of empty cells among triads ascertained through cases will not be symmetric around f = 0.5, regardless of the mode of inheritance. Thus, even under log-additive inheritance, the LCPT had poorer performance when f = 0.95 than when f = 0.05.
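The reflection argument can be verified directly: relabeling every genotype g as 2 - g and f as 1 - f leaves the population triad distribution unchanged, but not the case-triad distribution once a maternal relative risk departs from 1. A minimal Python sketch under the simulation's Hardy-Weinberg and Mendelian assumptions, with hypothetical names and null offspring relative risks:

```python
import itertools


def hwe(g, freq):
    """Hardy-Weinberg genotype probability."""
    return [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2][g]


def mendel(c, m, fa):
    """P(C = c | M = m, F = fa)."""
    pm, pf = m / 2.0, fa / 2.0
    return [(1 - pm) * (1 - pf), pm * (1 - pf) + (1 - pm) * pf, pm * pf][c]


def case_triad_probs(freq, rr_m):
    """P(M, F, C | D) with maternal (het, hom) RRs rr_m and null offspring RRs."""
    r_m = (1.0,) + tuple(rr_m)
    w = {(m, fa, c): r_m[m] * mendel(c, m, fa) * hwe(m, freq) * hwe(fa, freq)
         for m, fa, c in itertools.product(range(3), repeat=3)}
    total = sum(w.values())
    return {cell: v / total for cell, v in w.items()}


def mirror_symmetric(freq, rr_m, tol=1e-12):
    """Does P at freq equal P at 1 - freq after relabeling g -> 2 - g?"""
    p, q = case_triad_probs(freq, rr_m), case_triad_probs(1 - freq, rr_m)
    return all(abs(p[(m, fa, c)] - q[(2 - m, 2 - fa, 2 - c)]) < tol
               for (m, fa, c) in p)


print(mirror_symmetric(0.05, (1.0, 1.0)))  # null maternal RRs: True
print(mirror_symmetric(0.05, (1.4, 2.0)))  # log-additive maternal RRs: False
```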

The poor performance of the LCPT for studying very common high-risk alleles raises the question of whether this is a practical limitation or merely a theoretical limitation. The LCPT is potentially most applicable to the study of complex diseases—those with multiple genetic and environmental factors, each of which typically would be only slightly associated with increased disease risk. A causal maternal genetic variant that occurs in 95 percent of the population and increases offspring disease risk fivefold, for example, would contribute to the occurrence of approximately 90 percent of cases, which would be inconsistent with this paradigm of complex disease etiology (20, 21). However, this proportion increases or decreases with the magnitude of the relative risk (22), such that a causal maternal variant with a prevalence of 95 percent and a relative risk of only 1.5 would contribute to only about 30 percent of cases. Furthermore, if the very common allele acts as a modifier of risk increases caused by other genetic or environmental factors, its associated relative risk and population attributable risk percentage would be even lower in analyses that did not account for this heterogeneity (indeed, the situation just described drives much of the current, widespread research emphasis on "gene-environment" interactions). Thus, the high prevalence of a putative high-risk allele is not sufficient to exclude it as a candidate of potential etiologic or public health importance. Despite their potential scientific interest, however, very high prevalence might impede the investigation of such variants via the LCPT approach unless the true mode of inheritance were recessive.

In addition to f and n, any factor influencing the expected triad frequencies could affect the performance of the LCPT, particularly at small n’s. Although the tests of maternal and offspring relative risks are orthogonal (11), we showed that with smaller n’s, associations between the offspring’s own genotypes and disease can affect the power to detect maternal genetic associations with the offspring’s disease. Despite the weakness of these effects, they were consistent across analytic models and modes of inheritance, and the effects were stronger when the magnitude of the offspring relative risks was larger (data not shown). Power did not depend on offspring relative risks when n was ≥2,000.

Some case-parent triad studies may include a mix of complete and incomplete triads—for example, if only one parent is available for genotyping (11, 23). Although we did not examine this issue in our simulations, we would expect that missing parental genotypes would reduce the sample size and could thereby worsen LCPT performance. Even if a parent’s unavailability were related to his/her genotype of interest, missingness should not bias the maternal relative risk unless genotypes were differentially related to mothers’ probability of being unavailable as compared with fathers’.

In contrast to LCPT-derived offspring relative risk estimates, maternal relative risk estimates may be confounded by associations between maternal ethnicity and maternal alleles (24). Because mating partners are often chosen on the basis of ethnicity, and because the LCPT implicitly compares case mothers with case fathers rather than with control mothers, we suspect that potential confounding from population stratification may be reduced. Even within homogeneous groups, however, any underlying genetic basis for sex-specific choices of mates could confound observed associations with maternal variants of interest, because the null hypothesis for the test of maternal associations depends on the assumption of mating type symmetry. For nonrandom mating to inflate error rates, the nonrandomness would have to be correlated with variation in the genetic region of interest and also be sex-specific. For example, if choosing a mate who is tall relative to the general population matters more to women than to men, LCPT analysis of variants in genes related to height, such as the human growth hormone (GH1) gene (25), may yield spurious maternal associations with offspring disease risk. While it is difficult to assess how often such spurious results might arise, differences between mothers’ and fathers’ genotypes could be explored among randomly selected triads.
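The dependence of the maternal test on mating type symmetry can be illustrated with expected counts. In the sketch below (hypothetical function names; a 3 × 3 parental mating table is supplied directly), the expected ratio of case triads with (mother, father) genotypes (1, 0) to those with (0, 1), holding the child's genotype fixed, equals the maternal relative risk S1 when mating is symmetric, because the mating-type frequencies and offspring relative risks cancel. If the mating table is made asymmetric, here by an assumed 1.5-fold excess of heterozygous mothers paired with noncarrier fathers, the same ratio departs from 1 even though there is no maternal effect.

```python
from itertools import product


def random_mating(f):
    """3x3 table of (mother, father) genotype probabilities under HWE and random mating."""
    g = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
    return [[g[m] * g[fa] for fa in range(3)] for m in range(3)]


def case_triad_probs(mating, s=(1.0, 1.0, 1.0), r=(1.0, 1.0, 1.0)):
    """Case-triad cell probabilities from an arbitrary (possibly asymmetric) mating table.

    s and r are maternal and offspring genotype relative risks (index 0 = reference).
    """
    weights = {}
    for m, fa in product(range(3), repeat=2):
        tm, tf = m / 2, fa / 2  # Mendelian transmission probabilities
        child = [(1 - tm) * (1 - tf), tm * (1 - tf) + (1 - tm) * tf, tm * tf]
        for c in range(3):
            if child[c] > 0:
                weights[(m, fa, c)] = mating[m][fa] * child[c] * s[m] * r[c]
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


def maternal_ratio(probs, child=0):
    """Expected ratio of (M=1, F=0) to (M=0, F=1) case triads for a fixed child genotype."""
    return probs[(1, 0, child)] / probs[(0, 1, child)]


if __name__ == "__main__":
    symmetric = random_mating(0.2)
    probs = case_triad_probs(symmetric, s=(1.0, 2.0, 4.0), r=(1.0, 1.5, 2.25))
    print("symmetric mating, true S1 = 2:", round(maternal_ratio(probs), 3))

    # Hypothetical sex-specific asymmetry: heterozygous women pair with
    # noncarrier men 1.5 times as often as random mating predicts.
    asymmetric = [row[:] for row in symmetric]
    asymmetric[1][0] *= 1.5
    null_probs = case_triad_probs(asymmetric)  # no maternal or offspring effects
    print("asymmetric mating, true S1 = 1:", round(maternal_ratio(null_probs), 3))
```

The two printed ratios are approximately 2.0 and 1.5: the second is a spurious "maternal effect" produced entirely by the asymmetric mating table.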

It has been suggested that the power of the LCPT design would increase if prior information about the mode of inheritance were incorporated into the analysis (11). We demonstrated that the gain in power that occurs when the model is correctly specified may be outweighed by a potentially larger loss of power when model assumptions are false, particularly in assumed-dominant or assumed-recessive models. One might be tempted to fit a restricted model (e.g., a dominant one) if a primary, unrestricted analysis did not yield "statistically significant" results, perhaps in the hope of consolidating sparse analytic categories. However, basing the choice of analysis on the results of the first model fit would probably inflate the overall type I error rate because of multiple comparisons, even though we showed that, in a single analysis, falsely assuming either a dominant or a recessive model generally proved to be valid.
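The multiple-comparison concern can be illustrated with a stylized simulation. The sketch below does not reproduce the LCPT likelihood-ratio tests; it assumes instead that, under the null, the unrestricted and restricted test statistics are correlated standard normal variables with correlation rho, and that the analyst declares significance if the first test, or failing that the second, reaches two-sided significance at alpha = 0.05. The function name and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist


def two_stage_type1_error(rho, alpha=0.05, n_sim=100_000, seed=2004):
    """Monte Carlo overall type I error of 'fit a second model if the first is
    nonsignificant'.

    Stylized assumption: null test statistics are standard normal with
    correlation rho. Rejecting on the first test, or else on the second,
    is equivalent to rejecting if either test is significant.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_sim):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if abs(z1) > z_crit or abs(z2) > z_crit:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 1.0):
        print(f"rho={rho:.1f}: overall type I error ~ {two_stage_type1_error(rho):.4f}")
```

With rho = 0 the overall error approaches 1 − 0.95² ≈ 0.0975; it returns to the nominal 0.05 only when the two statistics are perfectly correlated.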

Analyzing data with a log-additive model led to only a moderate loss of efficiency at smaller sample sizes, even when the true model was dominant or recessive. This is consistent with what has been shown regarding the performance of the LCPT for assessing offspring relative risks (19). However, the loss of efficiency was much greater when n was larger, particularly at extreme values of f (data not shown). Also in its favor, the assumed-log-additive model yielded identifiable parameter estimates more often than did the other models considered, even when inheritance was dominant or recessive.
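One plausible reason the assumed-log-additive model yields identifiable estimates more often can be seen from how each model codes the maternal genotype. In the LCPT, the maternal term can be contrasted only within mating types in which the coded maternal covariate differs between mother and father. The sketch below (hypothetical names; population mating-type frequencies under Hardy-Weinberg equilibrium and random mating, ignoring any enrichment among cases) lists those mating types for each coding and their combined frequency.

```python
from itertools import combinations

# Maternal covariate as a function of the mother's number of high-risk alleles.
CODINGS = {
    "log-additive": lambda g: g,
    "dominant": lambda g: 1 if g >= 1 else 0,
    "recessive": lambda g: 1 if g == 2 else 0,
}


def informative_strata(coding):
    """Unordered mating types {M, F} in which the coded covariate differs between parents."""
    x = CODINGS[coding]
    return [pair for pair in combinations(range(3), 2) if x(pair[0]) != x(pair[1])]


def stratum_prob(pair, f):
    """Population probability of an unordered mating type with unequal genotypes (HWE, random mating)."""
    g = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
    a, b = pair
    return 2 * g[a] * g[b]  # both (a, b) and (b, a) orderings


if __name__ == "__main__":
    for coding in CODINGS:
        strata = informative_strata(coding)
        for f in (0.02, 0.5, 0.95):
            share = sum(stratum_prob(s, f) for s in strata)
            print(f"{coding:12s} f={f:.2f} strata={strata} share={share:.3f}")
```

Log-additive coding uses all three unequal mating types, {0, 1}, {0, 2}, and {1, 2}; dominant coding loses {1, 2}, which predominates among them when f is large, and recessive coding loses {0, 1}, which predominates when f is small.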

We have presented results of a simulation study in which we investigated the performance of the LCPT over a wide range of epidemiologic scenarios. On the basis of these findings, epidemiologists may consider the LCPT a useful approach for assessing maternal genetic associations, unless they expect a very rare or fairly common maternal allele to increase disease risk.


    ACKNOWLEDGMENTS
 
This research was supported by National Institutes of Health grant R01CA85914. Dr. Jacqueline Starr was supported in part by predoctoral training grant T32 DE07227 from the National Institute of Dental and Craniofacial Research.

The authors thank Dr. David M. Umbach for his thoughtful suggestions on this article.


    NOTES
 
Correspondence to Dr. Jacqueline R. Starr, Departments of Pediatrics and Epidemiology, University of Washington, Box 359300 (M2-8), Seattle, WA 98105-0371 (e-mail: jrstarr{at}u.washington.edu).


    REFERENCES

  1. Wilcox A, Weinberg CR, Lie RT. Distinguishing the effects of maternal and offspring genes through studies of "case-parent triads." Am J Epidemiol 1998;148:893–901.
  2. Kousseff BG. Diabetic embryopathy. Curr Opin Pediatr 1999;11:348–52.
  3. Barker DJ, Eriksson JG, Forsen T, et al. Fetal origins of adult disease: strength of effects and biological basis. Int J Epidemiol 2002;31:1235–9.
  4. Petry CJ, Hales CN. Long-term effects on offspring of intrauterine exposure to deficits in nutrition. Hum Reprod Update 2000;6:578–86.
  5. Alcolado JC, Laji K, Gill-Randall R. Maternal transmission of diabetes. Diabet Med 2002;19:89–98.
  6. McLachlan JA, Newbold RR, Li S, et al. Are estrogens carcinogenic during development of the testes? APMIS 1998;106:240–2.
  7. Schottenfeld D. Testicular cancer. In: Schottenfeld D, Fraumeni JF Jr, eds. Cancer epidemiology and prevention. New York, NY: Oxford University Press, 1996:1207–19.
  8. Rajpert-De Meyts E, Skakkebaek NE. The possible role of sex hormones in the development of testicular cancer. Eur Urol 1993;23:54–9.
  9. Schernhammer ES. In-utero exposures and breast cancer risk: joint effect of estrogens and insulin-like growth factor? Cancer Causes Control 2002;13:505–8.
  10. Okasha M, McCarron P, Gunnell D, et al. Exposures in childhood, adolescence and early adulthood and breast cancer risk: a systematic review of the literature. Breast Cancer Res Treat 2003;78:223–76.
  11. Weinberg CR, Wilcox AJ, Lie RT. A log-linear approach to case-parent-triad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting. Am J Hum Genet 1998;62:969–78.
  12. Mitchell LE. Differentiating between fetal and maternal genotypic effects, using the transmission test for linkage disequilibrium. Am J Hum Genet 1997;60:1006–7.
  13. Wang X, Zuckerman B, Pearson C, et al. Maternal cigarette smoking, metabolic gene polymorphism, and infant birth weight. JAMA 2002;287:195–202.
  14. Kupferminc MJ, Many A, Bar-Am A, et al. Mid-trimester severe intrauterine growth restriction is associated with a high prevalence of thrombophilia. BJOG 2002;109:1373–6.
  15. Brody LC, Conley M, Cox C, et al. A polymorphism, R653Q, in the trifunctional enzyme methylenetetrahydrofolate dehydrogenase/methenyltetrahydrofolate cyclohydrolase/formyltetrahydrofolate synthetase is a maternal genetic risk factor for neural tube defects: report of the Birth Defects Research Group. Am J Hum Genet 2002;71:1207–15.
  16. Doolin MT, Barbaux S, McDonnell M, et al. Maternal genetic effects, exerted by genes involved in homocysteine remethylation, influence the risk of spina bifida. Am J Hum Genet 2002;71:1222–6.
  17. StataCorp. Stata statistical software, release 8.0. College Station, TX: Stata Corporation, 2003.
  18. Schaid DJ, Sommer SS. Genotype relative risks: methods for design and analysis of candidate-gene association studies. Am J Hum Genet 1993;53:1114–26.
  19. Schaid DJ. General score tests for associations of genetic markers with disease using cases and their parents. Genet Epidemiol 1996;13:423–49.
  20. Lander ES, Schork NJ. Genetic dissection of complex traits. Science 1994;265:2037–48.
  21. Risch NJ. Searching for genetic determinants in the new millennium. Nature 2000;405:847–56.
  22. Benichou J. A review of adjusted estimators of attributable risk. Stat Methods Med Res 2001;10:195–216.
  23. Weinberg CR. Allowing for missing parents in genetic studies of case-parent triads. Am J Hum Genet 1999;64:1186–93.
  24. Sinsheimer JS, Palmer CG, Woodward JA. Detecting genotype combinations that increase risk for disease: maternal-fetal genotype incompatibility test. Genet Epidemiol 2003;24:1–13.
  25. Horan M, Millar DS, Hedderich J, et al. Human growth hormone 1 (GH1) gene expression: complex haplotype-dependent influence of polymorphic variation in the proximal promoter and locus control region. Hum Mutat 2003;21:408–23.