1 Division of Clinical Epidemiology, Geneva University Hospitals, Geneva, Switzerland
2 Columbia Genome Center and Departments of Genetics and Development and of Psychiatry, Columbia University, New York, NY
Correspondence to Prof. Alfredo Morabia, Division of Clinical Epidemiology, Geneva University Hospitals, 25 rue Micheli-du-Crest, 1211 Geneva 14, Switzerland (e-mail: a.morabia{at}hcuge.ch).
Received for publication September 15, 2004. Accepted for publication December 9, 2004.
ABSTRACT
body mass index; environment; genes; genetics; hypercholesterolemia; lipids; metabolism; risk factors
INTRODUCTION
Assessing the relative importance of interactions in common disorders is currently hampered by the formidable number of possible interactions among even a limited set of determinants (4) and by the paucity of biologic knowledge about their mechanisms. The reverse transport of cholesterol from peripheral tissues to the liver may be an exception: it lends itself well to assessing the relative roles of genetic (G) and environmental (E) main and interaction effects, for two reasons. A rich biochemical and physiologic literature describes the architecture of this pathway (5–8), and its environmental determinants have also been studied extensively (2, 9–22).
Figure 1 depicts the overall study design strategy used to determine the relative contributions of G and E main and interaction effects to the population variance of blood lipid concentrations. A subset of subjects with extreme phenotypes was selected from the total random sample of 1,543 untreated subjects because their respective high density lipoprotein (HDL) cholesterol and low density lipoprotein (LDL) cholesterol concentrations were in the lowest (or highest) tertile (T1, 33.3rd percentile) and highest (or lowest) tertile (T3, 66.7th percentile), separately by gender. The tertile boundaries (mmol/liter) for HDL cholesterol and LDL cholesterol were, for men/women, as follows: HDL cholesterol, T1 = 1.06/1.33 and T3 = 1.32/1.63; LDL cholesterol, T1 = 3.54/3.22 and T3 = 4.30/4.04 (to convert to mg/dl, multiply by 38.6). For 186 subjects, HDL cholesterol was <T1 and LDL cholesterol was >T3 (low HDL, high LDL), considered an atherogenic phenotype ("cases"). At the other extreme were 185 subjects whose HDL cholesterol was >T3 and LDL cholesterol was <T1 (high HDL, low LDL), considered a nonatherogenic phenotype ("controls"). The case-control study subjects also tended to score in the corresponding extreme tertiles of triglycerides because HDL cholesterol, LDL cholesterol, and triglyceride levels are highly correlated in the extreme tails of their distributions. The entire coding DNA and the immediately adjacent noncoding DNA, including both 5' and 3' untranslated regions and a portion of each intron, were assayed first in a purposefully selected resequencing subsample of 95 of the case-control study subjects in whom the largest genetic variability was anticipated. Forty-eight subjects had a nonatherogenic lipid profile and were sedentary; hence, they were expected to have a "protective" genetic constitution. The 47 other subjects had an atherogenic lipid profile but were physically active; hence, they were expected to have a "deleterious" genetic constitution.
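The selection rule for the extreme-phenotype subjects can be illustrated with a minimal sketch. It uses only the gender-specific tertile boundaries quoted in the text; the function name, the dictionary layout, and the "men"/"women" keys are illustrative assumptions, not the authors' software.

```python
# Tertile boundaries (mmol/liter) quoted in the text, as (T1, T3) per gender.
TERTILES = {
    "men": {"hdl": (1.06, 1.32), "ldl": (3.54, 4.30)},
    "women": {"hdl": (1.33, 1.63), "ldl": (3.22, 4.04)},
}
MMOL_TO_MGDL = 38.6  # cholesterol conversion factor given in the text


def classify_phenotype(gender, hdl, ldl):
    """Return 'case', 'control', or None for one subject (values in mmol/liter)."""
    hdl_t1, hdl_t3 = TERTILES[gender]["hdl"]
    ldl_t1, ldl_t3 = TERTILES[gender]["ldl"]
    if hdl < hdl_t1 and ldl > ldl_t3:
        return "case"  # low HDL, high LDL: atherogenic phenotype
    if hdl > hdl_t3 and ldl < ldl_t1:
        return "control"  # high HDL, low LDL: nonatherogenic phenotype
    return None  # not part of the extreme-phenotype subset
```

For example, `classify_phenotype("men", 1.00, 4.50)` falls in the atherogenic group, whereas a man with HDL cholesterol of 1.20 mmol/liter is not selected regardless of his LDL cholesterol.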
Previous work established that a subset of nine genetic variants (mostly single nucleotide polymorphisms (SNPs), out of a total of 275) spanning seven candidate genes, together with five of the 10 environmental factors (body mass index (BMI), alcohol intake, current cigarette smoking, gender, and age), explained the largest share (65 percent) of the lipid variance among the subjects with extreme phenotypes (23). The present study extends this work by 1) applying the model developed for the phenotypically extreme subjects to the whole population-based sample and 2) estimating the absolute and relative contributions of the genetic, environmental, and interaction effects to blood lipid levels in a general European adult population.
MATERIALS AND METHODS
The intensive and standardized recruitment of a potential subject lasted from 2 weeks to 2 months. Eligible subjects were identified by using a standardized procedure from an annual list of residents established by the local government. All legal residents of the canton are registered. The only information from the list used in the survey (i.e., gender, age, and whether the person is of Swiss origin) is highly accurate. Stratified random sampling, based on the list by gender within 10-year age strata, was proportional to the corresponding population distributions. Selected subjects were mailed an invitation to participate; if they did not respond, as many as seven telephone attempts at different times on various days of the week were made. If telephone contact was unsuccessful, two more letters were mailed. Subjects not reached (15 percent of men and 19 percent of women) were replaced by using the same selection protocol. Routine survey quality monitoring had shown that these subjects no longer resided in the canton, so they were not eligible for the study. Subjects who refused to participate were not replaced, and participating subjects were not eligible for future surveys. The participation rate was 61 percent. The population-based sample used in the analyses comprised 1,543 subjects not under treatment for hypercholesterolemia.
Survey measurements
Each participant filled out several self-administered, standardized questionnaires covering risk factors for the major lifestyle chronic diseases, sociodemographic characteristics, educational and occupational histories, and reproductive history (women only). A semiquantitative food frequency questionnaire, previously developed and tested in the Geneva general adult population (24), asked about serving sizes and consumption frequencies of 80 food items organized by food groups during the 4 previous weeks, which were converted into daily energy, nutrient, and alcohol intakes in the analyses. A physical activity frequency questionnaire, also developed previously in the same target population and validated by using a heart rate monitor (25), measured total and activity-specific energy expenditures based on which of 70 physical activities grouped by general type (e.g., occupational, housework, leisure time, sports) were performed in the past 7 days.
During a scheduled appointment at a mobile epidemiology clinic (housed in a special bus), the questionnaires were checked for completion by trained interviewers, and a physical examination was performed. After participants removed their shoes and heavy outerwear, their weight was measured by using a medical scale (precision, 0.5 kg), and height was measured with a medical gauge (precision, 1 cm), from which BMI was calculated.
Laboratory measurements
Total plasma cholesterol, HDL cholesterol, and triglycerides were assayed (mmol/liter) in fasting blood (Bayer Technicon Diagnostics, Brussels, Belgium), with monthly quality control checks performed by the Swiss Center for Quality Control in Clinical Chemistry and Hematology (Geneva). Total genomic DNA was extracted from ethylenediaminetetraacetic acid (EDTA) blood (Gentra Puregene blood kit; BioConcept, Allschwil, Switzerland). LDL cholesterol (mmol/liter) was calculated as (total cholesterol − HDL cholesterol − triglycerides/2.2) (26).
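The LDL cholesterol calculation has the form commonly known as the Friedewald formula, with all concentrations in mmol/liter and the triglyceride term divided by 2.2. A minimal sketch follows; the function and argument names are illustrative assumptions.

```python
def friedewald_ldl(total_cholesterol, hdl_cholesterol, triglycerides):
    """Estimate LDL cholesterol (mmol/liter) from three fasting measurements.

    All inputs are in mmol/liter; the triglyceride term is divided by 2.2,
    as stated in the text.
    """
    return total_cholesterol - hdl_cholesterol - triglycerides / 2.2
```

With total cholesterol of 5.5, HDL cholesterol of 1.3, and triglycerides of 1.1 mmol/liter, the estimate is 5.5 − 1.3 − 0.5 = 3.7 mmol/liter.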
SNPs were assayed by using modified template-directed dye-terminator incorporation with fluorescence polarization (TDI-FP) detection (Acycloprime-FP SNP Detection Kit; PerkinElmer Life and Analytical Sciences, Inc., Shelton, Connecticut). Information on primers to amplify and detect SNPs is available from the authors on request.
Statistical analyses
Linear regression models were used to evaluate the joint contributions (measured by the squared multiple correlation coefficient, R²) of the nine genetic variants, five environmental factors, and their interaction effects to the variation in log(HDL cholesterol/LDL cholesterol ratio), HDL cholesterol, and LDL cholesterol (log data reported as geometric mean). The nine variants (across seven genes) comprised two ABCA1 SNPs (exons 32b. + 30 and 50b.3038), APOE2 (ε2/ε2 or ε2/ε3), two HL SNPs (exon 1b. 280 promoter and 3b. 279), LPL S447X (exon 9.b99), LDLR2 (rs2228671 on exon 2), PLTP exon 1b. + 26, and SR-BI A350A (exon 8b.41). Each genetic variant was coded 0 if the rare variant was absent and 1 if present (as heterozygote or homozygote). This coding assumes that the rare allele effect is dominant. Because homozygotes for the rare allele were extremely rare, this coding is essentially equivalent to an additive model, consistent with most evidence about the genetics of many quantitative traits.
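The 0/1 coding of each variant, and why it nearly coincides with an additive coding when rare-allele homozygotes are scarce, can be sketched as follows. Representing a genotype as a pair of allele labels is an assumption made for illustration only.

```python
def code_dominant(genotype, rare_allele):
    """Return 1 if the genotype carries at least one rare allele, else 0."""
    return int(rare_allele in genotype)


def code_additive(genotype, rare_allele):
    """Return the number of rare alleles carried (0, 1, or 2)."""
    return sum(1 for allele in genotype if allele == rare_allele)


# Hypothetical genotypes for one biallelic SNP whose rare allele is "T".
genotypes = [("C", "C"), ("C", "T"), ("T", "C"), ("T", "T")]
dominant = [code_dominant(g, "T") for g in genotypes]
additive = [code_additive(g, "T") for g in genotypes]
```

The two codings differ only for the rare-allele homozygote in the last position, so they give nearly the same design matrix when such subjects are very uncommon.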
The five environmental factors were gender (reference = women vs. men), cigarette smoking (reference = (never smoker (<100 lifetime cigarettes) + former smoker (quit ≥1 year preinterview)) vs. current smoker), alcohol intake (grams of alcohol/day) (reference = none vs. low intake (men/women: 1–40/1–20) vs. (medium + high intakes) (men/women: ≥41/≥21)), age (years), and BMI (weight (kg)/height (m)²). The first three covariates were expressed with four dummy variables, and the last two were continuous. In some analyses, BMI was recoded as normal weight (BMI <25) vs. overweight (only) (25 ≤ BMI <30) vs. obese (BMI ≥30), and age was recoded as 1) <55 vs. ≥55 years or 2) <50 vs. ≥50 years.
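The covariate coding described above can be sketched as follows. The four dummy variables correspond to gender (one), smoking (one), and alcohol intake (two). The alcohol cut-points of 40 g/day for men and 20 g/day for women are read from the low-intake and medium/high-intake thresholds in the text and should be treated as an assumption, as should all function and key names.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Three-level recoding used in some analyses."""
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


def code_environment(gender, current_smoker, alcohol_g_per_day):
    """Return the four dummy variables for gender, smoking, and alcohol intake."""
    low_max = 40 if gender == "men" else 20  # assumed gender-specific cut-point
    return {
        "male": int(gender == "men"),  # reference: women
        "current_smoker": int(current_smoker),  # reference: never + former
        "alcohol_low": int(0 < alcohol_g_per_day <= low_max),  # reference: none
        "alcohol_medium_high": int(alcohol_g_per_day > low_max),
    }
```

Age and BMI enter the models as continuous variables, so only the categorical factors need dummy coding.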
A systematic analytical strategy based on stepdown selections of G x G, E x E, and G x E interaction effects first within and then between each of the three types of interactions for each lipid outcome was used to obtain the final models. Details of these procedures are described in the Appendix.
All final p values, regression coefficients, and cumulative and total R² values were obtained by using Monte Carlo bootstrap procedures (resampling with replacement) (27). A final p value of ≤0.05 (≤0.10) was considered statistically (borderline) significant. Sampling variations in the R² estimates were also assessed with 95 percent bias-corrected and accelerated percentile confidence intervals (28). All bootstrapped estimates were based on 2,000 replicates each, twice the minimum number recommended for bootstrap confidence intervals (28).
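The resampling idea can be illustrated with a deliberately simplified sketch. It bootstraps R² for a regression with a single predictor and reports a plain percentile interval, not the bias-corrected and accelerated interval used in the study; the function names, the seed, and the single-predictor setting are all assumptions made to keep the example short.

```python
import random


def r_squared(x, y):
    """R-squared of the simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample: no variance to explain
    return sxy ** 2 / (sxx * syy)


def bootstrap_r2(x, y, replicates=2000, alpha=0.05, seed=0):
    """Return the sample R-squared and a percentile bootstrap interval."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(replicates):
        # Resample subjects with replacement, keeping (x, y) pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(r_squared([x[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(alpha / 2 * replicates)]
    upper = estimates[min(replicates - 1, int((1 - alpha / 2) * replicates))]
    return r_squared(x, y), (lower, upper)
```

Resampling whole subjects, not residuals, keeps each person's outcome and covariates together, which is what "resampling with replacement" of the study sample implies.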
The final models were further augmented with the other five environmental factors that had been removed in the modeling process performed for only the extreme-phenotype subjects (23) and were bootstrapped as above. These other five environmental factors were 1) educational level (reference group = primary (<9 years of schooling), secondary, university (≥13 years of schooling and a Swiss baccalaureate degree)); 2) country of birth (reference = (Switzerland + all other (about one third France, remainder (mostly northern Europe) <5 percent each)), Mediterranean (Italy, Spain, Portugal)); 3) total dietary fat (%); 4) dietary fiber (g/day); and 5) daily energy expenditure (reference = active (≥10 percent total energy expended in physical activities requiring four or more times the basal metabolic rate), sedentary (<10 percent expended in physical activities requiring four or more times the basal metabolic rate) (29)). The first two covariates were expressed with three dummy variables, and the last three were continuous (in the descriptive analyses, dietary fat and dietary fiber were coded as below the overall median (percentile (P)50) vs. ≥P50). These additional analyses were used to assess any residual contributions of the other environmental factors, which reasonably might have been expected to play a role in determining blood lipid concentrations in the general population even though they had not been retained when the analysis focused on only the extreme-phenotype subjects.
RESULTS
Associations between environmental factors and blood lipids
Table 1 describes in detail all 10 environmental characteristics of the population-based sample and their effects on blood lipid measurements. For the five environmental factors retained in the modeling process in the Morabia et al. (23) study, age and gender differences in lipid levels were in the expected directions; overweight or obesity and cigarette smoking were individually associated with higher serum total cholesterol, LDL cholesterol, and triglycerides but with a lower HDL cholesterol and HDL cholesterol/LDL cholesterol ratio, whereas alcohol intake was associated with higher serum total cholesterol, HDL cholesterol, and HDL cholesterol/LDL cholesterol ratio.
Clustering of genetic and environmental determinants of blood lipids
The symmetrical distributions of the numbers of SNPs, of (relevant) environmental exposures (counting one each for age ≥55 years, BMI ≥25 kg/m² (overweight), current smoking, or any alcohol intake, but excluding gender), and of both types of factors harbored by the study subjects are shown in Web Appendix figure A1. (This figure and four supplementary Appendix tables (each referred to as "Web Appendix table" in the text) are posted on the Journal's website (http://aje.oupjournals.org/).) There was practically no correlation (r = 0.004) between the numbers of SNPs and the numbers of environmental factors. The median numbers were three out of a maximum of nine SNPs, two out of four environmental exposures (apart from gender), and five out of 13 genetic and environmental factors combined. About 70 percent of the study subjects had three SNPs or fewer, 75 percent had two or fewer environmental exposures, and 69 percent had five or fewer of both types of factors.
Cumulative genetic, environmental, and interaction effects on blood lipids
Consistent with the model developed for the phenotypically extreme subjects, the nine SNPs and five environmental factors had significant effects (bootstrapped p < 0.05 either individually or through interaction) on all or most of the three lipid outcomes (table 2). For the HDL cholesterol/LDL cholesterol ratio, all but one (SR-BI A350A) of the nine SNPs and all five environmental factors had significant effects. For HDL cholesterol alone, all nine SNPs and all five environmental factors had significant effects. For LDL cholesterol alone, six of the nine SNPs and four environmental factors (excluding alcohol intake) had significant effects. Of the nine genetic variants, four (APOE2, the PLTP SNP, and the two ABCA1 SNPs) had significant effects on all three lipid outcomes; a fifth, SR-BI A350A, had significant effects on HDL cholesterol and LDL cholesterol and a borderline effect on the HDL cholesterol/LDL cholesterol ratio. Another three variants (LPL S447X and the two HL SNPs) had significant effects on the HDL cholesterol/LDL cholesterol ratio and on HDL cholesterol. The ninth variant, the LDLR SNP, had significant effects on HDL cholesterol and LDL cholesterol.
The total R² was 33.7 percent for the HDL cholesterol/LDL cholesterol ratio, 33.0 percent for HDL cholesterol, and 18.5 percent for LDL cholesterol. These three totals were decomposed, respectively, into the main effects of the nine genetic variants (6 percent, 4 percent, and 5 percent) and the five environmental factors (25 percent, 28 percent, and 11 percent), with the remainder (3 percent, 2 percent, and 3 percent) being jointly due to the G x G, G x E, and E x E interaction effects.
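The relative contributions implied by these figures can be worked out with a short calculation. The sketch below uses only the rounded percentages reported above, so the three components do not sum exactly to the reported totals; the dictionary layout and function name are illustrative assumptions.

```python
# Reported percentages of variance explained, by lipid outcome.
REPORTED = {
    "HDL/LDL ratio": {"total": 33.7, "genetic": 6, "environment": 25, "interaction": 3},
    "HDL": {"total": 33.0, "genetic": 4, "environment": 28, "interaction": 2},
    "LDL": {"total": 18.5, "genetic": 5, "environment": 11, "interaction": 3},
}


def relative_shares(outcome):
    """Share (percent) of the explained variance attributed to each component."""
    parts = {k: v for k, v in REPORTED[outcome].items() if k != "total"}
    explained = sum(parts.values())
    return {k: 100 * v / explained for k, v in parts.items()}


shares = {outcome: relative_shares(outcome) for outcome in REPORTED}
```

On these rounded figures, environmental main effects account for roughly three quarters of the explained variance for the ratio, about four fifths for HDL cholesterol, and somewhat under three fifths for LDL cholesterol.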
Detailed 2 x 2 and 2 x 2 x 2 tables specifically illustrating the additional two- and three-way interaction effects on the lipid outcome measurements are available in Web Appendix tables A1, A2, and A3.
Stability of the final model after augmentation by the other five environmental factors
When the final model with the interactions shown in table 2 was augmented with the other five environmental factors that had been removed during the modeling process in the Morabia et al. (23) study, the previous regression coefficients and p values remained essentially unchanged, as shown in Web Appendix table A4. Although sedentarity was significantly associated with the HDL cholesterol/LDL cholesterol ratio (p = 0.044) and HDL cholesterol alone (p = 0.017), and (log)fiber intake showed borderline significant associations with the HDL cholesterol/LDL cholesterol ratio (p = 0.051) and LDL cholesterol (p = 0.065), all of the additional contributions to the explained variances were marginal. Specifically, total R² increased by only 0.7 percentage points (to 34.4 percent) for the HDL cholesterol/LDL cholesterol ratio, by only 0.8 (to 33.8 percent) for HDL cholesterol, and by only 0.7 (to 19.2 percent) for LDL cholesterol.
DISCUSSION
In addition to either HDL cholesterol or LDL cholesterol alone, the combined phenotype based on the HDL cholesterol/LDL cholesterol ratio was used as an outcome measurement because biochemical, physiologic, and pharmacologic data all support the notion that the 11 genes originally targeted by Morabia et al. (23) converge to moderate both HDL cholesterol and LDL cholesterol levels. Moreover, although HDL cholesterol and LDL cholesterol are relatively independent of each other in the general population, they are highly (negatively) correlated in the tails of their distributions, where the working model for the present study was developed.
There is a growing body of evidence that combinations of multiple genes harboring predisposing alleles can be identified that, together, contribute significantly to the population variance of lipid levels (30). Mootha et al. (31) have shown that combining the expressions from 22,000 genes of previously defined pathways may increase signal relative to noise and improve statistical power. Yang et al. (32) used simulations and empirical data to demonstrate that the information from several genetic and environmental factors could be combined to improve prediction of a multifactorial disease. From an evolutionary perspective, adaptive pressures may select against any single gene variant (allele) explaining a large fraction of the population phenotypic variance. Instead, alleles explaining the variance of a population trait are more likely to be distributed among several, or numerous, genes within a pathway or larger network of gene products. In addition, substantial exposure heterogeneity in the present population also supported the idea that different combinations of determinants modulated blood lipid levels. Typically, a person was exposed to a median of only five of the 13 total factors (apart from gender) included in the models. The present work of estimating and separating out the relative roles of some pathway-related genetic and environmental factors and their epistatic effects in determining blood lipid levels demonstrates the success of the latter approaches when applied in a general population setting.
The more dominant role of the environmental factors in determining blood lipid concentrations makes good sense. Blood cholesterol levels have been fluctuating over the last decades in almost all populations of the world (33, 34), and this phenomenon can be explained only by changes in exposure to factors totally or in part "environmental," such as diet, physical activity, smoking, alcohol intake, or overweight/obesity. The very modest contributions of interactions are unexpected, however. One explanation could be that their effects were underestimated in the analysis. The particular genetic variants identified in the final model represent only a small fraction of all variants that operate in the population as a whole. Interactions may also involve factors that are outside of the studied model and that therefore could not have been accounted for in the analysis. Up to three-way interactions were feasible to analyze given the sample size, but there may be important higher-order interactions. Finally, the set of genetic and nongenetic factors analyzed were all predictors of lipid levels before interactions were investigated. Therefore, some determinants, which are related to lipid levels only when interacting with other factors, may have been missed.
However, there are also reasons why the results probably do reflect the relative contributions of genes, environment, and interactions to HDL cholesterol levels. The main analytical strategy in the population-based random sample began with nine variants across seven of 11 reverse transport of cholesterol pathway genes and five of 10 environmental factors, which are considered the main determinants of blood lipid levels (23). All polymorphisms in the exons and flanking regions of the 11 genes were surveyed, and state-of-the-art methods were used to assess environmental exposures such as diet (including alcohol intake) (24), physical activity (25), and smoking (35). The final model implicated BMI, gender, and APOE, which are the strongest individual determinants of HDL cholesterol. Remarkably, the cumulative effect of the other eight trait-related gene variants managed to approach that of only APOE alone. Lastly, the findings are consistent with the literature: even though work exists suggesting that some genetically defined subgroups of the population may benefit more than others from specific interventions or therapies (2, 3), on the whole no strong signal is emerging from the wealth of available studies that cholesterol-reducing preventive strategies should target subgroups of the population defined by specific G x G, G x E, or E x E interactions.
It is of note that sedentarity and fiber intake were related to lipid levels in the full population-based sample but not for the phenotypically extreme subjects alone. To check the possibility that these associations might have been confounded by the strong effect of BMI in the case-control study of Morabia et al. (23), the final logistic model for the phenotypically extreme subjects was augmented by sedentarity and log(fiber intake), but BMI was excluded. The results continued to indicate that neither sedentarity (p = 0.18) nor fiber intake (p = 0.60) was associated with case-control status in the absence of BMI. The absence of effects of lipid intake, country of birth, and educational level in the full sample can probably be explained by the extreme homogeneity of this entirely urban population, in which migrants do not maintain the ethnic characteristics of their population of origin.
The results of this systematic investigation of genes, environmental factors, and their interactions in explaining the variance of lipoprotein levels indicate that 1) BMI, smoking, and alcohol intake are strong predictors of blood lipid levels; and 2) their effects are only marginally modified by genetic background. These findings have tremendous clinical and public health implications. They imply that the documented trends of the growing prevalence of hypercholesterolemia in the population of origin of the study sample (34) are attributable to modifiable lifestyle behaviors affecting the population as a whole and that they are not driven mainly by genetic subgroups. Confirmation of these results by further studies in other parts of the world would provide evidence that, at least at present, prevention strategies should target every individual patient, or the whole population, that is, not specific genetic subgroups. These strategies should focus on reducing obesity, in particular through physical activity (36), elimination of cigarette smoking, and moderation of alcohol intake.
APPENDIX
First, each of the three types of interaction effects was analyzed separately. The following sections describe these effects.
G x G interactions
The nine SNPs were first divided into two subgroups, G2 and G7. G2 comprised the HL3b. 279 and SR-BI A350A SNPs that showed significant two-way interactions with gender (23); G7 comprised the remaining seven SNPs.
Step 1: For each SNP, the previously identified variables were augmented with all possible three-way G7 x G7 x gender and G7 x G2 x gender interactions + any two-way G7 x gender and G7 x G2 interactions required for a hierarchical model + all possible remaining two-way G7 x G2, G7 x G7, and G2 x G2 interactions.
Step 2: p = 0.10 stepdown selection was applied only to the step 1 model three-way interactions.
Step 3: p = 0.10 stepdown selection was applied only to the step 2 model two-way interactions.
Step 4: Interactions retained in all step 3 models were fitted in an overall model.
Step 5: p = 0.10 stepdown selection was applied only to the step 4 model interactions.
E x E interactions
Step 1: The previously identified variables were augmented with all possible two-way E x E interactions.
Step 2: p = 0.10 stepdown selection was applied only to the step 1 model interactions.
G x E interactions
Step 1: For each of the four E factors of age, BMI, smoking, and alcohol intake separately, the previously identified variables were augmented with all nine possible two-way G x E interaction terms (gender and the two corresponding three-way G2 x E x gender interactions were automatically included).
Step 2: p = 0.10 stepdown selection was applied only to the interactions for each step 1 model.
Step 3: Interactions retained in all step 2 models were fitted in an overall model.
Step 4: p = 0.10 stepdown selection was applied only to the step 3 model interactions.
Finally, the simultaneous effects of the G x G, E x E, and G x E interactions were assessed by augmenting the previously identified variables with all of the interaction terms retained as described above pooled together and applying p = 0.05 stepdown selection to only those interactions.
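The stepdown selections described in this Appendix all follow the same backward-elimination pattern, which can be sketched generically. The sketch below is an illustration under stated assumptions, not the authors' procedure: `fit_pvalues` is a hypothetical callable that refits the regression on the listed terms and returns a p value for each, and the previously identified variables (`fixed_terms`) are never candidates for removal.

```python
def stepdown(fixed_terms, candidate_terms, fit_pvalues, threshold=0.10):
    """Backward elimination restricted to the candidate interaction terms.

    fit_pvalues(terms) is assumed to refit the model on `terms` and return
    a dict mapping each term to its p value.
    """
    kept = list(candidate_terms)
    while kept:
        pvalues = fit_pvalues(list(fixed_terms) + kept)
        worst = max(kept, key=lambda term: pvalues[term])
        if pvalues[worst] <= threshold:
            break  # every remaining candidate meets the retention threshold
        kept.remove(worst)  # drop the least significant candidate and refit
    return kept


# Toy stand-in for a fitted model: fixed, made-up p values per term.
toy_p = {"G1 x G2": 0.03, "G1 x BMI": 0.40, "BMI x age": 0.08}


def toy_fit(terms):
    return {term: toy_p.get(term, 0.0) for term in terms}


retained_10 = stepdown(["G1", "G2", "BMI", "age"], list(toy_p), toy_fit, 0.10)
retained_05 = stepdown(["G1", "G2", "BMI", "age"], list(toy_p), toy_fit, 0.05)
```

In the staged strategy, a loop of this kind would be run at p = 0.10 within each interaction type and then once more at p = 0.05 on the pooled set of retained interactions.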