Sample size determination for studies of gene-environment interaction

JA Luan^a, MY Wong^b, NE Day^a and NJ Wareham^a

a Department of Public Health and Primary Care, Institute of Public Health, University of Cambridge, Cambridge CB2 2SR, UK.
b Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong.

Dr Nicholas J Wareham, Department of Public Health and Primary Care, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 2SR, UK. E-mail: njw1004{at}medschl.cam.ac.uk


    Abstract
 
Background The search for interaction effects is common in epidemiological studies, but the power of such studies is a major concern. This is a practical issue as many future studies will wish to investigate potential gene-gene and gene-environment interactions and therefore need to be planned on the basis of appropriate sample size calculations.

Methods The underlying model considered in this paper is a simple linear regression relating a continuous outcome to a continuously distributed exposure variable.

Results The slope of the regression line is taken to be dependent on genotype, and the ratio of the slopes for each genotype is considered as the interaction parameter. Sample size is affected by the allele frequency and whether the genetic model is dominant or recessive. It is also critically dependent upon the size of the association between exposure and outcome, and the strength of the interaction term. The link between these determinants is graphically displayed to allow sample size and power to be estimated. An example of the analysis of the association between physical activity and glucose intolerance demonstrates how information from previous studies can be used to determine the sample size required to examine gene-environment interactions.

Conclusions The formulae allowing the computation of the sample size required to study the interaction between a continuous environmental exposure and a genetic factor on a continuous outcome variable should have a practical utility in assisting the design of studies of appropriate power.

Keywords Genotype, environmental exposure, gene-environment interaction, sample size, quantitative trait

Accepted 7 February 2001


    Introduction
 
The study of interaction or effect modification is frequently undertaken in epidemiology, but the power of such studies to demonstrate these interactions, and therefore their sample size, is a matter of concern. Previous papers1–5 have presented power and required sample size calculations for case-control studies of gene-environment interaction where the environmental factor is categorical and the genetic factor is binary. Using published formulae,1 Hwang et al.2 presented sample size calculations for a binary environmental exposure and a binary genetic factor. These were extended by Foppa and Spiegelman3 to consider an environmental exposure that was categorized into multiple levels. Both of these methods were subsequently compared4 with an approach designed for the general multivariate regression model for the odds ratio.6 However, in all of these studies the outcome is a binary event variable, such as the occurrence of a disease of interest.

The alternative situation where the outcome variable is continuously distributed has received less attention but is likely to become important as researchers investigate the genetic basis of quantitative traits such as blood pressure and obesity. A method for calculating power in this situation was recently described but was limited to a number of specific situations in which some main and interaction effects were fixed to zero.7 In the approach presented here, we consider the situation of an effect of a categorical genetic factor on the association between a continuous environmental exposure and a continuously distributed outcome. We illustrate the utility of this approach with an example of the investigation of the interaction between genes and physical activity in the determination of glucose tolerance.


    Methods
 
Suppose that we consider a certain autosomal locus in which there are two different alleles, a and A, where a is the rare allele. There would be three possible genotypes, aa, aA, and AA. For the purposes of this analysis we have considered dominant and recessive models which allow the three genotypes to be reduced to two genetic groups, i.e. dominant (carriers of the rare allele versus homozygotes for the common allele) or recessive (homozygotes for the rare allele versus all others). The relationship between a continuous outcome variable $y$ and the genetic factor with a continuous environmental exposure $E$ can be expressed as two simple linear regressions, one for each genetic group:

$$y = \alpha_1 + \beta_1 E + \varepsilon \quad \text{(first genetic group)}$$

$$y = \alpha_2 + \beta_2 E + \varepsilon \quad \text{(second genetic group)}$$

The regression parameters $\alpha_i$ and $\beta_i$ are weights reflecting the contribution of the genetic factor and the environmental exposure to the continuously distributed outcome $y$. If there is no gene-environment interaction, then the regression parameters $\beta_1$ and $\beta_2$ are equal. $\varepsilon$ is a stochastic error term and is assumed to be normally distributed with mean zero and variance $\sigma_y^2$. We assume that the distributions of the residual of $y$ in each group are the same, and that the variance of the exposure $E$ in each group is $\tau^2$. In order to give the $\beta$ parameters a clear interpretation, we have standardized both the outcome and the environmental exposure by making $\sigma_y^2 = \tau^2 = 1$. $\sigma_y^2$ is the residual variance of $y$ after adjusting for $E$. In most situations $E$ would account for 20% or less of the total variation in $y$, and therefore the residual standard deviation $\sigma_y$ would be within about 10% of the population standard deviation. Thus the $\beta$ coefficients are interpretable as the approximate proportion of a standard deviation change in $y$ for a standard deviation change in $E$.
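The relationship between the share of variance explained by the exposure and the residual standard deviation can be checked with a line of arithmetic. The following is a minimal sketch in Python; the function name `residual_sd_fraction` is our own illustrative choice, and the sketch assumes the usual identity for a simple linear regression that the residual variance equals (1 - R²) times the total variance.

```python
import math

def residual_sd_fraction(r_squared):
    """Residual SD of y as a fraction of the total SD of y, when the
    exposure explains a share r_squared of the variance in y."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return math.sqrt(1.0 - r_squared)

# With 20% of the variance explained, the residual SD is about 0.894 of
# the population SD, i.e. within roughly 10% of it.
print(round(residual_sd_fraction(0.20), 3))
```

With smaller shares of explained variance the two standard deviations are closer still, which is why the standardized slopes can be read approximately on the scale of population standard deviations.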

We consider a general situation for a polymorphism where $p$ is the frequency of the rare allele. Assuming that the polymorphism is in Hardy-Weinberg equilibrium, the genotype frequencies of aa, aA and AA are $p^2$, $2p(1 - p)$ and $(1 - p)^2$, respectively. Accordingly, the proportions of individuals in the two genetic groups are $p^2$ and $1 - p^2$ for a recessive model, and $p(2 - p)$ and $(1 - p)^2$ for a dominant model. To study the effect of the environmental exposure on the association of the outcome variable with this genetic factor, we test the null hypothesis that the regression slopes in the genetic sub-groups are equal. If $n$ individuals are studied, then the test statistic (Appendix) follows an F-distribution with degrees of freedom 1 and $n - 4$ under the null hypothesis, and a non-central F-distribution with degrees of freedom 1 and $n - 4$ under the alternative hypothesis.8 The non-centrality parameter can be written as

$$\phi = n p_1 p_2 (\beta_1 - \beta_2)^2,$$

where $p_1$ and $p_2$ are the proportions of individuals in each group. Because this expression is symmetric in the two groups, we can always set the higher risk group as the first group without affecting the sample size and power calculations.
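The group proportions under Hardy-Weinberg equilibrium and the simplified non-centrality parameter $n p_1 p_2 (\beta_1 - \beta_2)^2$ can be sketched in a few lines of Python. The function names `group_proportions` and `noncentrality` are hypothetical helpers introduced only for illustration, and the standardization $\sigma_y^2 = \tau^2 = 1$ used in the text is assumed.

```python
def group_proportions(p, model):
    """Return (p1, p2): the proportion in the rare-allele group and in the
    remaining group, assuming Hardy-Weinberg equilibrium."""
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency p must lie in (0, 1)")
    if model == "dominant":
        p1 = p * (2 - p)   # carriers of the rare allele: aa or aA
    elif model == "recessive":
        p1 = p ** 2        # homozygotes for the rare allele: aa
    else:
        raise ValueError("model must be 'dominant' or 'recessive'")
    return p1, 1 - p1

def noncentrality(n, p, beta1, beta2, model="dominant"):
    """Simplified non-centrality parameter n * p1 * p2 * (beta1 - beta2)^2
    with standardized outcome and exposure."""
    p1, p2 = group_proportions(p, model)
    return n * p1 * p2 * (beta1 - beta2) ** 2

# Example: 5% rare allele, dominant model, slopes 1.0 and 0.5, n = 1000.
print(group_proportions(0.05, "dominant"))
print(noncentrality(1000, 0.05, 1.0, 0.5))
```

Swapping `beta1` and `beta2` leaves the result unchanged, which is the symmetry referred to in the text.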

In this paper we adopt the definition of the non-centrality parameter as given by Rencher,9 S-Plus10 and SAS.11 However, in some papers12,13 it is defined as $\sqrt{\phi/(k + 1)}$, where $\phi$ is the non-centrality parameter defined above, and $k$ is the numerator degrees of freedom of the test statistic.

Under the situation that the two slopes are equal, we can study the association of the outcome variable with the genetic factor where E is included as a confounding factor, i.e. to test whether the two intercepts are equal. If the slopes are not equal, then testing the equality of the intercepts is misleading. The test statistic (Appendix) follows an F-distribution with degrees of freedom 1 and n – 3 under the null hypothesis, and a non-central F-distribution with degrees of freedom 1 and n – 3 under the alternative hypothesis. The non-centrality parameter is

Using the distribution and the non-centrality parameter, we are then able to calculate power to detect an interaction effect or alternatively the sample size necessary to detect a given interaction with fixed power and significance. We have not adopted any specific parametric model for describing the interaction. Instead, in the results and figures we present power calculations over a range of values for $\beta_1$ and $\beta_2$.
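As a rough illustration of how power and sample size follow from the non-centrality parameter, the sketch below uses a large-sample normal approximation rather than the exact non-central F-distribution used in this paper. This substitution is our assumption: with one numerator degree of freedom and several hundred or more individuals, the F test behaves approximately like a two-sided z test with shift $\sqrt{\phi}$. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def approx_power(ncp, alpha=0.05):
    """Approximate power of a 1-df test with non-centrality parameter ncp
    (normal approximation, not the exact non-central F)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(ncp)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)

def approx_sample_size(p1, beta1, beta2, power=0.80, alpha=0.05):
    """Approximate n needed to detect a slope difference beta1 - beta2 when
    a proportion p1 of individuals is in the first genetic group."""
    z_sum = _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)
    per_subject = p1 * (1 - p1) * (beta1 - beta2) ** 2
    return math.ceil(z_sum ** 2 / per_subject)

# Dominant model with a 5% rare allele gives p1 = 0.05 * (2 - 0.05) = 0.0975.
print(approx_power(1000 * 0.0975 * 0.9025 * 0.5 ** 2))
print(approx_sample_size(0.0975, beta1=1.0, beta2=0.5, power=0.90))
```

For small studies the approximation will be slightly optimistic, so exact non-central F calculations remain preferable for a final design.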

The range of possible values for $\beta_1$ and $\beta_2$ is derived from the study of the relationship between physical activity and glucose intolerance. This association is typical of quantitative traits that may be influenced by genetic factors, as evidence from ecological and migration studies suggests the possibility of strong gene-environment interactions.14 In a study by Wareham et al.,15 the relationship between physical activity and a continuous measure of glucose intolerance was quantified using an objective measure of energy expenditure and a multivariate approach to correction for measurement error. The corrected regression coefficient relating habitual energy expenditure to the 2-h plasma glucose was –0.72 mmol/l per standard deviation of the physical activity level (the ratio of total energy expenditure to basal metabolic rate). The 95% CI for this coefficient was –0.35 to –1.15 mmol/l per standard deviation. As the population standard deviation for the 2-h plasma glucose was 2.2 mmol/l, we may also standardize this coefficient for the dependent variable, resulting in a central estimate of –0.33 with 95% CI of –0.16 to –0.52. In the analysis of plausible values for $\beta_2$ we have therefore taken 0.1 to 0.5 as the range of overall effect that would be of interest in the study of gene-environment interactions. We have simplified the reporting of associations by only considering positive associations, as the results would be symmetrical for associations in the opposite direction. This range of $\beta_2$ values is plausible and would include the central estimates from other studies that have examined the association between continuous outcomes and continuous exposures. For example, in the Intersalt study16 the pooled regression coefficient relating 24-h sodium excretion to systolic blood pressure was 0.0354 mm Hg/mmol sodium per day. As the standard deviation of the systolic blood pressure in the UK centres was approximately 15 mm Hg and the standard deviation of the sodium excretion was 50 mmol per day, this can be converted to a standardized $\beta_2$ value of 0.12, which is within the range we have selected to examine. Although it is possible that stronger effects would be of interest, there are at present few examples of such strong associations and we have limited our attention to those that are less than 0.5.
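The conversions to standardized coefficients quoted above amount to multiplying a raw slope by the standard deviation of the exposure and dividing by the standard deviation of the outcome. A minimal sketch in Python, using only the figures given in the text; the helper name `standardize_slope` is our own.

```python
def standardize_slope(slope, sd_exposure, sd_outcome):
    """Convert a raw regression slope (outcome units per exposure unit) to
    SDs of outcome per SD of exposure."""
    return slope * sd_exposure / sd_outcome

# Physical activity and 2-h glucose: the slope is already expressed per SD
# of exposure, so sd_exposure is 1; the outcome SD is 2.2 mmol/l.
print(round(standardize_slope(-0.72, 1.0, 2.2), 2))   # -0.33
print(round(standardize_slope(-0.35, 1.0, 2.2), 2))   # -0.16
print(round(standardize_slope(-1.15, 1.0, 2.2), 2))   # -0.52

# Intersalt: 0.0354 mm Hg per mmol sodium per day, exposure SD 50 mmol per
# day, outcome SD 15 mm Hg.
print(round(standardize_slope(0.0354, 50.0, 15.0), 2))  # 0.12
```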


    Results
 
Figure 1 shows how sample size and the power to detect an interaction for a given frequency of the rare allele (5%) vary according to the ratio of the standardized regression coefficients relating the environmental exposure to the outcome in the genetic sub-groups. Using the range of values for $\beta_2$ from the example of glucose intolerance and physical activity, the figure shows that when the effect in those with the common allele is large ($\beta_2 = 0.5$) and there is a moderately strong interaction such that the individuals with the rare allele have a slope that is twice as great, then a study of under 1000 people would be sufficient to detect this interaction with power of greater than 90%. However, if the effect size in those with the common allele were much smaller ($\beta_2 = 0.1$), then even a study of 8000 individuals would be underpowered to detect a doubling of this effect size in those with the rare allele. Figure 1 also shows how power is markedly increased if the interaction is very strong. When the effect size in those with the common allele is small ($\beta_2 = 0.1$), but the effect in those with the rare allele is five times stronger (rather than twice as strong in the previous example), then a study of 600 people would have a power of 80% to detect the interaction.
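The three scenarios described for Figure 1 can be reproduced approximately with a short script. This is a sketch under stated assumptions: it uses the simplified non-centrality parameter $n p_1 p_2 (\beta_1 - \beta_2)^2$ together with a large-sample normal approximation in place of the exact non-central F-distribution, and the function name `approx_power` is hypothetical.

```python
from statistics import NormalDist

def approx_power(n, p, beta1, beta2, model="dominant", alpha=0.05):
    """Approximate power of the test of equal slopes (normal approximation)."""
    p1 = p * (2 - p) if model == "dominant" else p ** 2
    ncp = n * p1 * (1 - p1) * (beta1 - beta2) ** 2
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

# (description, n, slope in rare-allele group, slope in common-allele group)
scenarios = [
    ("large effect doubled, n = 1000", 1000, 1.0, 0.5),
    ("small effect doubled, n = 8000", 8000, 0.2, 0.1),
    ("small effect five-fold, n = 600", 600, 0.5, 0.1),
]
for label, n, b1, b2 in scenarios:
    print(f"{label}: approximate power {approx_power(n, 0.05, b1, b2):.2f}")
```

Under this approximation the first scenario has power well above 90%, the second falls short of 80%, and the third lies close to 80%, in line with the pattern described for the figure.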



Figure 1 Power and sample size to detect a gene-environment interaction with 5% significance for different values of $\beta_1$ and $\beta_2$, assuming a rare allele frequency of 5% in a dominant model: $\beta$ coefficients are standardized

 
The example in Figure 1 was constrained by the frequency of the rare allele, which was fixed at 5%. Figure 2 shows how the power to detect an interaction effect of 2 with a moderate effect ($\beta_2 = 0.25$) is affected by alterations in the allele frequency. Two different genetic models are considered (recessive and dominant), but the same graph can be used to estimate power and sample size for both. Using the example of the association between physical activity and glucose intolerance, where the central estimate was approximately equal to that considered in this figure, a study of 2000 individuals would have more than 80% power to detect a doubling of this effect size in a sub-group of individuals with the rare allele which occurred with a frequency of more than 5% and was dominant. However, only very large studies (6000 individuals) would be powered sufficiently to detect an interaction of the same magnitude if the rare allele had its biological effect in a recessive manner, and even then the rare allele frequency would need to be high (15%).
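The contrast between the dominant and recessive models arises because the same allele frequency gives very different group sizes. The sketch below tabulates approximate power across allele frequencies for the interaction considered in Figure 2 ($\beta_1 = 0.5$, $\beta_2 = 0.25$). As an assumption of ours, it replaces the exact non-central F calculation with a large-sample normal approximation; the function name `approx_power` is illustrative.

```python
from statistics import NormalDist

def approx_power(n, p, model, beta1=0.5, beta2=0.25, alpha=0.05):
    """Approximate power to detect unequal slopes for rare allele frequency p."""
    if model == "dominant":
        p1 = p * (2 - p)   # carriers of the rare allele
    elif model == "recessive":
        p1 = p ** 2        # homozygotes for the rare allele
    else:
        raise ValueError("model must be 'dominant' or 'recessive'")
    ncp = n * p1 * (1 - p1) * (beta1 - beta2) ** 2
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

print("freq  dominant(n=2000)  recessive(n=6000)")
for p in (0.05, 0.10, 0.15, 0.30):
    dom = approx_power(2000, p, "dominant")
    rec = approx_power(6000, p, "recessive")
    print(f"{p:4.2f}  {dom:16.2f}  {rec:17.2f}")
```

At a 5% allele frequency the recessive group contains only 0.25% of individuals, which is why even 6000 participants give little power until the allele becomes much more common.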



Figure 2 Power and sample size to detect a given gene-environment interaction ($\beta_1 = 0.5$, $\beta_2 = 0.25$) with 5% significance for varying frequencies of the rare allele in both recessive and dominant genetic models: $\beta$ coefficients are standardized

 
In Figures 1 and 2 the magnitude of the gene-environment interaction was considered to be relatively strong, as the ratio of the two regression slopes was assumed to be at least 2. Although such strong gene-environment interactions may exist, in any given situation the strength of the interaction will not be known at the point at which power and sample size are being considered. Therefore, we have calculated sample size for possible interactions ranging from close to 1 up to 3. Figure 3 shows a series of sample size plots in which the ratio of $\beta_1 : \beta_2$ is varied for a range of plausible values of $\beta_2$, with power fixed at 80% and significance at 5%. Separate graphs are presented for alleles of differing frequencies. When the rare allele is common (30%) and dominant, then if the effect size is large ($\beta_2 = 0.5$) in those with the common allele, a study of 5000 would be powered to detect an effect in those with the rare allele which was only about 1.2 times greater. Such a small interaction may, however, be biologically important and of potential public health significance, as the polymorphism is common and has a large effect. Conversely, only an enormous study of almost 50 000 people would be sufficient to detect an interaction effect of 3 for an uncommon dominant allele (0.5%) if the effect in those with the common allele was small (i.e. $\beta_2 = 0.1$). In this situation, one would need to question whether such interactions were worth detecting.
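A related planning question is the smallest slope ratio that a study of fixed size could detect. The sketch below inverts the simplified non-centrality parameter $n p_1 p_2 (\beta_1 - \beta_2)^2$ for the common dominant allele example. It relies on a large-sample normal approximation, which is our assumption rather than the exact non-central F method of this paper, and `min_detectable_ratio` is a hypothetical helper name.

```python
from statistics import NormalDist

def min_detectable_ratio(n, p, beta2, model="dominant", power=0.80, alpha=0.05):
    """Approximate smallest ratio beta1 / beta2 (with beta1 > beta2 > 0)
    detectable with the given power in a study of n individuals."""
    p1 = p * (2 - p) if model == "dominant" else p ** 2
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    # Solve n * p1 * (1 - p1) * delta^2 = z_sum^2 for the slope difference.
    delta = z_sum / (n * p1 * (1 - p1)) ** 0.5
    return (beta2 + delta) / beta2

# Common (30%) dominant allele, large effect in the common-allele group.
print(round(min_detectable_ratio(5000, 0.30, beta2=0.5), 2))
```

For this scenario the approximation gives a minimum detectable ratio a little below 1.2, in line with the statement that a study of 5000 could detect an effect only about 1.2 times greater.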



 
Figure 3 Sample size required to achieve 80% power at 5% significance level to detect gene-environment interactions of varying magnitude by different frequencies of the rare allele

 

    Discussion
 
In this paper we have presented the formulae and graphs necessary to calculate the statistical power and sample size that is required to study the interaction between a genetic factor and a continuous environmental exposure on a continuously distributed outcome variable. The need for such sample size calculations is likely to increase as we attempt to design studies aimed at understanding the genetic basis of common diseases. The key parameters that determine the sample size are the frequency of the genetic factor and the manner in which it has its biological effect, i.e. whether it is dominant or recessive. In addition, power and sample size are critically determined by the absolute magnitude of the slope of the regression linking the environmental exposure and the outcome in people with the common allele, and the ratio of this slope to that in the sub-group of people with the rare allele. As in the case of the example of the association between physical activity and glucose intolerance, estimates of the overall effect size may already be available from previously published studies, and pilot work can relatively quickly establish the allele frequency for candidate polymorphisms. The parameter that is uncertain is the strength of the interaction. This is, of course, the outcome of the study, and in settling on a given value to calculate power one would need to be guided by consideration of what size of interaction would be of biological importance in a given situation.

The fact that power critically depends upon the magnitude of the association between the environmental exposure and the outcome is an argument for utilizing exposure measurement instruments that have small degrees of error, because less precise instruments will result in attenuated regression coefficients, making it harder to detect gene-environment interactions. Given that the cost of epidemiological studies is determined not only by the total sample size but also by the cost of measuring the main exposures, the balance between investing in large studies with imprecise but inexpensive exposure measurement compared to smaller studies with expensive but more precisely measured exposures becomes critical in planning future studies to detect possible gene-environment interactions.


    Appendix
 
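The test statistic for testing $H_0 : \beta_1 = \beta_2$ is equal to

$$F_\beta = \frac{Y'\left[X(X'X)^{-1}X' - X_\beta(X_\beta'X_\beta)^{-1}X_\beta'\right]Y}{Y'\left[I - X(X'X)^{-1}X'\right]Y/(n - 4)}$$

where $n$ is the sample size, $Y = (y_1, y_2, \ldots, y_n)'$, and $X$ is the design matrix17 accommodating the linear regression models in this paper,

$$X = \begin{pmatrix} 1 & 0 & x_1 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & x_k & 0 \\ 0 & 1 & 0 & x_{k+1} \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 1 & 0 & x_n \end{pmatrix}$$

$X_\beta$ is the design matrix when $\beta_1 = \beta_2$,

$$X_\beta = \begin{pmatrix} 1 & 0 & x_1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & x_k \\ 0 & 1 & x_{k+1} \\ \vdots & \vdots & \vdots \\ 0 & 1 & x_n \end{pmatrix}$$

where $x_i$ is the environmental variable value of individual $i$, and $k$ is the number of individuals in the first genetic group. Under the null hypothesis, the test statistic $F_\beta$ follows an F-distribution with degrees of freedom 1 and $n - 4$. Under the alternative hypothesis, $F_\beta$ has a non-central F-distribution with degrees of freedom 1 and $n - 4$ with non-centrality parameter

$$\phi = n p_1 p_2 (\beta_1 - \beta_2)^2$$

where $p_1$ and $p_2$ are the proportions of individuals in the first and second genetic groups, and $p_1 + p_2 = 1$. For a recessive model and a dominant model, $p_1$ is equal to $p^2$ and $p(2 - p)$, respectively.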

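The power with 5% significance level for fixed values of $n$, $p$, $\beta_1$ and $\beta_2$ can be obtained easily using any statistical software, e.g. in SAS, the power can be computed with an expression of the form 1 - PROBF(FINV(0.95, 1, n - 4), 1, n - 4, phi), where phi is the non-centrality parameter.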
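The same calculation can be sketched in Python. The version below uses the non-central F-distribution from SciPy (`scipy.stats.f` and `scipy.stats.ncf`) when that library is installed, and otherwise falls back to a large-sample normal approximation. Both the fallback and the function name `interaction_power` are our own assumptions for illustration, not part of the original analysis.

```python
from statistics import NormalDist

def interaction_power(n, p1, beta1, beta2, alpha=0.05):
    """Power of the F test of equal slopes with df (1, n - 4), where p1 is
    the proportion of individuals in the first genetic group."""
    ncp = n * p1 * (1 - p1) * (beta1 - beta2) ** 2
    try:
        from scipy.stats import f, ncf   # exact calculation if SciPy is available
    except ImportError:
        # Fallback: normal approximation, adequate for large n.
        nd = NormalDist()
        z_crit = nd.inv_cdf(1 - alpha / 2)
        shift = ncp ** 0.5
        return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)
    critical_value = f.ppf(1 - alpha, 1, n - 4)            # central F quantile
    return float(ncf.sf(critical_value, 1, n - 4, ncp))    # upper tail under H1

# Dominant model, rare allele frequency 5%: p1 = 0.05 * (2 - 0.05) = 0.0975.
print(interaction_power(1000, 0.0975, beta1=1.0, beta2=0.5))
```

Sample size for a target power can then be found by increasing `n` until the returned power reaches the desired level.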

Under the situation that $\beta_1 = \beta_2$, the test statistic for testing $H_0 : \alpha_1 = \alpha_2$ is equal to

$$F_\alpha = \frac{Y'\left[X_\beta(X_\beta'X_\beta)^{-1}X_\beta' - X_\alpha(X_\alpha'X_\alpha)^{-1}X_\alpha'\right]Y}{Y'\left[I - X_\beta(X_\beta'X_\beta)^{-1}X_\beta'\right]Y/(n - 3)}$$

where $X_\alpha$ is the design matrix when $\alpha_1 = \alpha_2$ and $\beta_1 = \beta_2$, that is, an $n \times 2$ matrix with all elements in the first column equal to one and $x_1, x_2, \ldots, x_n$ in the second column. Under the null hypothesis, the test statistic $F_\alpha$ follows an F-distribution with degrees of freedom 1 and $n - 3$. Under the alternative hypothesis, $F_\alpha$ has a non-central F-distribution with degrees of freedom 1 and $n - 3$ with non-centrality parameter


KEY MESSAGES
  • Power and sample size calculations already exist for examining interaction in case-control studies.
  • This paper presents power and sample size calculations for gene-environment interaction studies in which both the environmental exposure and the outcome are continuous.
  • Power is dependent upon:
    • the frequency of the genetic polymorphism and whether its biological effect is dominant or recessive,
    • the magnitude of the interaction effect, expressed as the ratio of the slopes of the genotype-specific regression coefficients between exposure and outcome,
    • the absolute slope of these regression coefficients.

 


    Acknowledgments
 
Dr Wareham is an MRC Clinician Scientist Fellow. The work of Dr Wong was supported by the British Council and the Royal Society. The idea for this paper was conceived by all authors, who each contributed to the analysis and writing of the paper. NJW will act as guarantor for the paper. There are no conflicts of interest.


    References
 
1 Smith PG, Day NE. The design of case-control studies: the influence of confounding and interaction effects. Int J Epidemiol 1984;13:356–65.

2 Hwang SJ, Beaty TH, Liang KY, Coresh J, Khoury MJ. Minimum sample-size estimation to detect gene environment interaction in case-control designs. Am J Epidemiol 1994;140:1029–37.

3 Foppa I, Spiegelman D. Power and sample size calculations for case-control studies of gene-environment interactions with a polytomous exposure variable. Am J Epidemiol 1997;146:596–604.

4 Garcia-Closas M, Lubin JH. Power and sample size calculations in case-control studies of gene-environment interactions: Comments on different approaches. Am J Epidemiol 1999;149:689–92.

5 Sturmer T, Brenner H. Potential gain in efficiency and power to detect gene-environment interactions by matching in case-control studies. Genet Epidemiol 2000;18:63–80.

6 Lubin JH, Gail MH. On power and sample-size for studying features of the relative odds of disease. Am J Epidemiol 1990;131:552–66.

7 van den Oord E. Method to detect genotype-environment interactions for quantitative trait loci in association studies. Am J Epidemiol 1999;150:1179–87.

8 Mood AM, Graybill FA, Boes DC. Introduction to the Theory of Statistics. Third Edn. New York: McGraw-Hill Book Company, 1974.

9 Rencher AC. Linear Models in Statistics. New York: Wiley, 2000.

10 MathSoft Inc. S-Plus 5 for Unix Guide to Statistics. Seattle, Washington: MathSoft Inc., 1998.

11 SAS Institute Inc. SAS/IML(R) Software: Usage and Reference, Version 6. Cary, NC, USA: SAS Institute Inc., 1990.

12 Pearson ES, Hartley HO. Charts of the power function for all analysis of variance tests, derived from the non-central F-distribution. Biometrika 1951;38:112–30.

13 Odeh RE, Fox M. Sample Size Choice: Chart for Experiments with Linear Models. Second Edn. New York: Marcel Dekker, 1991.

14 Hamman RF. Genetic and environmental determinants of non-insulin-dependent diabetes mellitus (NIDDM). Diabetes Metab Rev 1992;8:287–338.

15 Wareham NJ, Wong MY, Day NE. Glucose intolerance and physical inactivity: the relative importance of low habitual energy expenditure and cardiorespiratory fitness. Am J Epidemiol 2000;152:132–39.

16 Intersalt Cooperative Research Group. Intersalt: an international study of electrolyte excretion and blood pressure. Results for 24 hour urinary sodium and potassium excretion. Br Med J 1988;297:319–28.

17 Myers RH. Classical and Modern Regression with Applications. Second Edn. Boston: PWS-KENT, 1990.