1 Functional Genomics Unit
2 G. N. Ramachandran Knowledge Center for Genome Informatics, Institute of Genomics and Integrative Biology, Delhi, India
3 Microarray Facility, Department of Biological Services
4 Department of Molecular Genetics and Crown Human Genome Center, The Weizmann Institute of Science, Rehovot, Israel
ABSTRACT
GeneChip; microarrays; twins; differential gene expression; housekeeping genes
INTRODUCTION
Natural variation in gene expression arises from a complex interplay of genetic polymorphisms (acting in cis or in trans), physiological variables (such as time of day and gender), and environmental factors (12). One way to address this complexity is to use model systems, including animals, insects, or lower eukaryotes, in which conditions can be chosen to minimize the contribution of nongenetic variables. Such studies in yeast (Saccharomyces cerevisiae), fruit fly (Drosophila melanogaster), and fish (genus Fundulus) have allowed inferences about global patterns of variation in gene expression that could be correlated with genetic differences (6,16,22). Although these data are very useful, natural variation in gene expression should, in parallel, also be estimated directly in humans.
Minimizing the contribution of nongenetic factors in humans is inherently difficult, so variation in gene expression due to genetic differences has to be approached from a different angle. Studies in monozygotic twins could allow us to estimate the relative contributions of genetic and environmental factors to natural variation in gene expression: because monozygotic twins share their entire genetic background, phenotypic differences within a twin pair are attributable to environmental effects alone (20). Genes differentially expressed between monozygotic twins can therefore be classified as "genes whose expression varies randomly due to environmental factors."
Identifying the genes differentially expressed between monozygotic twins could thus reveal the contribution of environmental factors, provided that both members of a twin pair are sampled at the same time. Unrelated individuals can be compared by taking into account factors such as gender, age, and time of day (27) and by examining the expression of housekeeping genes, which are expressed constitutively in all tissues to maintain basic cellular functions.
Here we report gene expression analysis of five pairs of monozygotic twins and three unrelated individuals using HG-U95Av2 microarrays. Our results expand the current understanding of natural variation in gene expression in humans and suggest the use of monozygotic twins for comparative analysis in such investigations.
MATERIALS AND METHODS
Three additional normal individuals (two females and one male), aged 23, 34, and 37 yr, respectively, were recruited. Informed consent was obtained from all participants. About 20 ml of blood was drawn by venipuncture and processed immediately for nucleic acid isolation. Three-quarters of the blood was used for total RNA isolation and the remainder for genomic DNA isolation. Twelve highly polymorphic microsatellite markers located on eight different chromosomes (Linkage panel set, version 2; Perkin Elmer Applied Biosystems, Foster City, CA) were used for haplotyping of the twins' genomic DNA to assess their monozygosity.
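The zygosity assessment can be illustrated as a marker-by-marker concordance check. This is a minimal sketch under assumptions the text does not state: each genotype is treated as an unordered pair of allele sizes, the marker names are hypothetical, and a pair is called monozygotic only if every marker typed in both twins matches.

```python
from typing import Dict, Tuple

Genotype = Tuple[int, int]  # two allele sizes (bp) at one microsatellite marker


def is_concordant(a: Genotype, b: Genotype) -> bool:
    # Genotypes are unordered, so compare the sorted allele pairs.
    return sorted(a) == sorted(b)


def call_monozygotic(twin1: Dict[str, Genotype], twin2: Dict[str, Genotype]) -> bool:
    # Assumption: every marker typed in both twins must be concordant.
    shared = twin1.keys() & twin2.keys()
    if not shared:
        raise ValueError("no markers typed in both twins")
    return all(is_concordant(twin1[m], twin2[m]) for m in shared)
```

A single discordant marker is enough to reject monozygosity under this rule, which is why many highly polymorphic markers are typed.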
Isolation of Total RNA and Genomic DNA from Blood Leukocytes
Total RNA was isolated from blood leukocytes after lysis of red blood cells (RBCs) in 1x RBC lysis buffer (150 mM NH4Cl, 10 mM NaHCO3, and 1 mM EDTA prepared in diethylpyrocarbonate-treated water). Leukocytes were recovered by centrifugation at 250 g, and total RNA was isolated with an EZ-RNA isolation kit (Biological Industries, Kibbutz Beth Haemek, Israel). RNA quality was examined by gel electrophoresis, and samples showing DNA contamination or degradation were discarded. Genomic DNA was isolated by the salting-out procedure (21).
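As a worked example of the lysis buffer composition, the amount of each component for a chosen volume follows from grams = concentration × volume × molar mass. This sketch assumes anhydrous NH4Cl (53.49 g/mol) and NaHCO3 (84.01 g/mol) weighed directly and EDTA added from a 0.5 M stock; the stock concentration and function name are illustrative assumptions, not the authors' recipe.

```python
MOLAR_MASS_G_PER_MOL = {"NH4Cl": 53.49, "NaHCO3": 84.01}
FINAL_MM = {"NH4Cl": 150.0, "NaHCO3": 10.0}  # final concentrations, mM
EDTA_FINAL_MM = 1.0
EDTA_STOCK_MM = 500.0  # assumption: 0.5 M EDTA stock


def lysis_buffer_recipe(volume_ml: float) -> dict:
    litres = volume_ml / 1000.0
    recipe = {}
    for salt, mm in FINAL_MM.items():
        # grams = (mmol/L / 1000) * L * g/mol
        recipe[f"{salt} (g)"] = mm / 1000.0 * litres * MOLAR_MASS_G_PER_MOL[salt]
    # EDTA by dilution of the stock: C1 * V1 = C2 * V2
    recipe["0.5 M EDTA stock (ml)"] = EDTA_FINAL_MM / EDTA_STOCK_MM * volume_ml
    return recipe
```

For 1 liter this gives about 8.02 g NH4Cl, 0.84 g NaHCO3, and 2 ml of 0.5 M EDTA.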
Preparation of cDNA and In Vitro Transcription and Labeling
Equal amounts of RNA, quantified by absorbance at 260 nm, were taken from each sample. Double-stranded cDNA was synthesized from 8 µg of total RNA by reverse transcription with a T7-(dT)24 primer and the SuperScript Choice cDNA synthesis system (Invitrogen). The cDNA was then transcribed in vitro with an Enzo Bioarray High Yield RNA transcript labeling kit (Affymetrix) to produce biotin-labeled cRNA, which was purified on RNeasy columns (Qiagen). The labeled target was fragmented, and a hybridization cocktail was prepared containing the fragmented cRNA, probe array controls, BSA, and herring sperm DNA.
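Equalizing input RNA by A260 amounts to converting absorbance to concentration and then to a loading volume. The sketch below uses the common approximation that an A260 of 1.0 (1 cm path length) corresponds to about 40 µg/ml of RNA; the dilution factor, example readings, and function names are hypothetical.

```python
UG_PER_ML_PER_A260 = 40.0  # approximate conversion for RNA, 1 cm path
TARGET_UG = 8.0            # total RNA per cDNA synthesis, from the text


def rna_conc_ug_per_ul(a260: float, dilution: float) -> float:
    # ug/ml of the undiluted sample, converted to ug/ul
    return a260 * UG_PER_ML_PER_A260 * dilution / 1000.0


def volume_for_target_ul(a260: float, dilution: float, target_ug: float = TARGET_UG) -> float:
    conc = rna_conc_ug_per_ul(a260, dilution)
    if conc <= 0:
        raise ValueError("non-positive RNA concentration")
    return target_ug / conc
```

For example, a 1:50 dilution reading A260 = 0.25 implies 0.5 µg/µl, so 16 µl supplies 8 µg.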
GeneChip Processing
All GeneChips (HG-U95Av2 arrays, Affymetrix) were processed under the same experimental conditions. Labeled products were first hybridized to Affymetrix GeneChip Test3 arrays; if the results were judged satisfactory, they were then hybridized to HG-U95Av2 arrays according to the manufacturer's instructions. Arrays were hybridized at 45°C for 16 h, washed on an automated GeneChip Fluidics Station 400, stained with streptavidin-phycoerythrin, and scanned with an HP GeneArray Scanner. Data were analyzed with Affymetrix Microarray Suite software (MAS 5.0). All GeneChip experiments were performed at the Weizmann Institute of Science (Rehovot, Israel).
Data Analysis
The HG-U95Av2 array contains 12,626 probe sets (including controls) representing approximately 10,000 genes. Global scaling was applied so that data from multiple arrays could be compared reliably. The raw data from the GeneChip experiments have been submitted to Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo) under the following accession numbers: GSM14477, GSM14478, GSM14479, GSM14480, GSM14481, GSM14482, GSM14483, GSM14485, GSM20645, GSM29053, GSM29054, GSM29055, GSM29056, GSM29057, and GSM29058.
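Global scaling in MAS 5.0 multiplies each array's signals by a factor chosen so that a trimmed mean of the signals matches a common target intensity. The following is a minimal sketch of that idea; the 2% trim at each end and the target of 500 are assumed settings, since the values used in this study are not stated.

```python
from statistics import mean
from typing import List

TARGET = 500.0  # assumption: target intensity
TRIM = 0.02     # assumption: fraction trimmed from each end


def trimmed_mean(values: List[float], trim: float = TRIM) -> float:
    data = sorted(values)
    k = int(len(data) * trim)  # number of values dropped at each end
    return mean(data[k:len(data) - k])


def global_scale(signals: List[float], target: float = TARGET, trim: float = TRIM) -> List[float]:
    # Scale factor that brings this array's trimmed mean to the target.
    factor = target / trimmed_mean(signals, trim)
    return [s * factor for s in signals]
```

Because every array is scaled to the same target, signals from different arrays become directly comparable, which is what signal log ratios between arrays rely on.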
Comparative analyses were restricted to probe sets with a "present" (P) call (detection P value of 0–0.04). Our goal was to identify genes differentially expressed above experimental noise. To set the noise threshold, we first compared duplicate experiments performed on the same RNA sample: none of the 10,000 genes had an absolute signal log ratio >1.585 in these duplicates (the signal log ratio is base 2, so 1.585 ≈ log2 3, about a threefold change). Differentially expressed genes in pairwise comparisons were therefore identified as probe sets with a change call of "I" (increase) or "D" (decrease) and an absolute signal log ratio >1.585.
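The selection rule can be expressed as a filter over per-probe-set comparison results. This sketch assumes a hypothetical record layout carrying the two detection calls, the MAS 5.0 change call (I, MI, NC, MD, or D), and the base-2 signal log ratio; requiring a P call on at least one of the two arrays is an assumption, because the text does not say whether both arrays must show a P call.

```python
from dataclasses import dataclass
from typing import Iterable, List

# 1.585 is log2(3) rounded, i.e. roughly a threefold change in signal.
SLR_CUTOFF = 1.585


@dataclass
class Comparison:
    # Hypothetical record for one probe set in one pairwise comparison.
    probe_set: str
    call_a: str              # detection call on array A: "P", "M", or "A"
    call_b: str              # detection call on array B
    change_call: str         # "I", "MI", "NC", "MD", or "D"
    signal_log_ratio: float  # base-2 signal log ratio, B vs A


def differentially_expressed(rows: Iterable[Comparison]) -> List[str]:
    hits = []
    for r in rows:
        # Assumption: a P call on at least one of the two arrays suffices.
        present = "P" in (r.call_a, r.call_b)
        changed = r.change_call in ("I", "D")  # marginal calls excluded
        if present and changed and abs(r.signal_log_ratio) > SLR_CUTOFF:
            hits.append(r.probe_set)
    return hits
```

Using the absolute value lets one cutoff cover both increases (positive ratios) and decreases (negative ratios).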
Functional Classification of Differentially Expressed Genes
To examine whether functional classification correlates with variability in expression, we first divided the differentially expressed genes into three categories according to their absolute signal log ratio: least variable (1.6–2.3), moderately variable (2.3–3), and most variable (>3). We then assigned them to six functional classes based on the scheme described by Adams et al. (1) and Hsiao et al. (14), grouping genes involved in replication, transcription, and translation into a single "information" class, as described by Andrade et al. (4). We show (see Figs. 3 and 5, vertical bars) the number of genes in each of the three variability categories and in each functional class: information (IN; includes replication, transcription, and translation), "signaling and communication" (SC), "immune and related functions" (IR), "metabolic processes" (MP), "cell cycle" (CC), and "structure and motility" (SM). Signaling and communication includes receptors, protein modification, hormone/growth factors, intracellular transducers, effectors/modulators, metabolism, cell adhesion, and channels/transport proteins. Information includes protein synthesis, translation factors, ribosomal proteins, posttranslational modification/targeting, protein degradation, tRNA synthesis/metabolism, RNA synthesis, transcription factors, RNA polymerase, RNA processing, RNA degradation, DNA synthesis/replication, and DNA repair. Metabolic processes includes amino acids, nucleotides, sugars, lipids, cofactors, protein modification, energy, and carrier proteins/membrane transport. Cell cycle/cell division includes cell cycle, apoptosis, chromosomal structure, and DNA repair. Structure and motility includes cytoskeletal proteins, microtubule-associated proteins/motors, and extracellular matrix. Immune and related functions includes immunology, homeostasis, carrier proteins/membrane transport, and stress response.
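Assigning genes to the three variability categories is a simple binning on the absolute signal log ratio. The boundary handling in this sketch is an assumption: upper bounds are treated as inclusive, and genes just above the 1.585 cutoff but below the nominal 1.6 are counted as least variable.

```python
from collections import Counter
from typing import Iterable

CUTOFF = 1.585  # differential-expression threshold from Data Analysis


def variability_category(slr: float) -> str:
    a = abs(slr)
    if a <= CUTOFF:
        raise ValueError("below the differential-expression cutoff")
    # Assumption: inclusive upper bounds at 2.3 and 3.
    if a <= 2.3:
        return "least"
    if a <= 3.0:
        return "moderate"
    return "most"


def tally(slrs: Iterable[float]) -> Counter:
    # Number of genes in each variability category.
    return Counter(variability_category(x) for x in slrs)
```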
NetAffx (version dated 23 June 2004, http://www.affymetrix.com) was the main resource used for annotation and functional classification of the differentially expressed genes (19). Supplementary information was obtained from GeneCards (http://bioinformatics.weizmann.ac.il/cards) (23) and LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink).
Housekeeping Genes
The reference data set of 575 housekeeping genes compiled by Eisenberg and Levanon (10) was used for comparative analysis (http://www.compugen.co.il/supp_info/Housekeeping_genes.html) (25). Of these, 475 housekeeping genes met the criterion of a P call in at least 9 of the 13 arrays (~70%). Expression patterns of housekeeping genes were examined by computing their mean expression and coefficient of variation (CV), as suggested previously (14). Mean expression was computed from the log10-transformed signal values across all 13 arrays. Probe sets without P calls were not considered. The CV was computed as standard deviation (SD)/mean.
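The housekeeping-gene statistics can be sketched per probe set as follows. The presence criterion (P call in at least 9 of 13 arrays) and the log10 transform come from the text; excluding non-P signals from the mean, computing the CV on the log10 values, and using the sample SD are assumptions, because the text does not specify them.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Tuple

MIN_PRESENT = 9  # P call required in at least 9 of the 13 arrays


def summarize(signals: List[float], calls: List[str]) -> Optional[Tuple[float, float]]:
    if len(signals) != len(calls):
        raise ValueError("signals and calls must align")
    # Assumption: only signals with a P call enter the statistics.
    logs = [math.log10(s) for s, c in zip(signals, calls) if c == "P"]
    if len(logs) < MIN_PRESENT:
        return None  # fails the presence criterion
    m = mean(logs)
    # Assumption: CV = sample SD / mean, both on the log10 scale.
    return m, stdev(logs) / m
```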
Statistical Analysis
Whether the differentially expressed genes were preferentially distributed among the functional classes within each category of variation (least, moderately, and most variable) was tested with the chi-square (χ²) test. Expected counts were obtained by dividing the total number of genes equally among the six functional classes, because equal representation is expected if variation in gene expression arises solely from random fluctuations. The test was applied only in cases where a substantially large number of genes varied in expression.
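The equal-representation test is a chi-square goodness-of-fit test with six classes and therefore five degrees of freedom. To keep this sketch dependency-free, it computes the p-value from the closed-form chi-square survival function for 5 degrees of freedom rather than calling a statistics library; any counts passed to it here are hypothetical.

```python
import math
from typing import Sequence, Tuple

N_CLASSES = 6  # IN, SC, IR, MP, CC, SM


def chi2_sf_df5(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with 5 df."""
    if x <= 0:
        return 1.0
    # Closed form valid for odd degrees of freedom (here 5).
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


def equal_representation_test(counts: Sequence[int]) -> Tuple[float, float]:
    """Chi-square goodness of fit against equal counts in all six classes."""
    if len(counts) != N_CLASSES:
        raise ValueError("need one count per functional class")
    total = sum(counts)
    if total == 0:
        raise ValueError("no genes to test")
    expected = total / N_CLASSES
    stat = sum((obs - expected) ** 2 / expected for obs in counts)
    return stat, chi2_sf_df5(stat)  # df = 6 classes - 1
```

If SciPy is available, `scipy.stats.chisquare(counts)` should return the same statistic and p-value, because its expected frequencies default to equal counts.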
RESULTS
Differentially Expressed Genes
Female twin pairs.
Scatter plots of gene expression levels, measured as "signal" values, for the female twin pairs are shown in Fig. 1, A–C. In all cases, gene expression was highly similar between monozygotic twins. No gene was differentially expressed in the pair F1:F2, whereas 19 genes were differentially expressed in the pair F5:F6 and 24 in the pair F7:F8. Most of these belonged to the least variable category: 15/19 in F5:F6 and 14/24 in F7:F8. Functional classification showed that IR was the most represented class (28%) among the differentially expressed genes in F5:F6, whereas SC was the most represented (39%) in F7:F8. The other classes were represented nearly equally in both pairs, the only exception being the low representation of SM in F7:F8 (5%).
Overall, the proportion of genes differentially expressed between monozygotic twins was low (0–1.76%), and most of them belonged to the least variable category in all pairs. There appears to be no clear preference for any of the functional classes, although SC and IR genes tend to top the list of differentially expressed genes. In total, 214 genes (nonredundant set) were differentially expressed across all pairwise comparisons of monozygotic twins.
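The extent of variation appears to be the number of differentially expressed genes as a percentage of the 10,000 genes on the array (consistent with 1,413 genes corresponding to 14.13% in the comparisons between unrelated individuals), and a nonredundant set is naturally the union of the per-pair gene lists. Both readings are assumptions in this sketch, and the pair labels and gene IDs in any usage are hypothetical.

```python
from typing import Dict, Set

GENES_ON_ARRAY = 10_000  # assumption: denominator for the extent of variation


def extent_of_variation(n_genes: int, total: int = GENES_ON_ARRAY) -> float:
    # Percentage of genes differentially expressed in one pairwise comparison.
    return 100.0 * n_genes / total


def nonredundant(per_pair: Dict[str, Set[str]]) -> Set[str]:
    # Union across pairs: a gene found in several pairs is counted once.
    merged: Set[str] = set()
    for genes in per_pair.values():
        merged |= genes
    return merged
```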
Housekeeping genes.
Against this background of differences in gene expression, analyzing the expression patterns of housekeeping genes is an important step in characterizing the differentially expressed genes. Very few housekeeping genes were differentially expressed between monozygotic twins. The results are displayed in Table 1. The differentially expressed housekeeping genes showed no clear preference for any of the functional classes, mirroring the global distribution of differentially expressed genes between monozygotic twins.
Comparisons Between Unrelated Individuals
Differentially expressed genes among unrelated individuals.
To further examine the influence of genetic and environmental factors on gene expression, we compared gene expression between unrelated individuals of the same gender and similar age, to minimize the contribution of other factors. Eighteen pairwise comparisons among seven unrelated female individuals of similar age met these criteria (Fig. 4). The number of differentially expressed genes per pair ranged from 37 to 1,413, corresponding to an extent of variation of 0.37–14.13%, a higher range than that observed between monozygotic twins. The total number of differentially expressed genes across all 18 pairs was 3,057, distributed as 46% least variable, 31% moderately variable, and 23% most variable. This distribution differs from that observed between monozygotic twins, in which most differentially expressed genes belonged to the least variable category. These observations indicate that variability in gene expression increases with genetic distance.
The distribution of the 3,057 differentially expressed genes from the 18 pairs among the six functional classes is shown in Fig. 5. SC ranked first (31%), followed by IN (24%), MP (20%), IR (12%), CC (7%), and SM (6%). The deviation from equal representation was statistically significant in all three categories of variation (most variable, P < 0.0001; moderately variable, P < 0.0001; least variable, P < 0.0001; χ² test).
Because some housekeeping genes vary in expression, an important goal is to identify the most highly expressed housekeeping genes. We ranked them according to their mean expression levels in our experiments, and the top 15 are listed in Table 2. Most of these highly expressed genes (9/15) encode ribosomal proteins, which carry out important cellular functions. Notably, the CV in expression across individuals differing in genetic background, age, gender, and environment is low among the highly expressed housekeeping genes.
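Ranking housekeeping genes by mean expression and reporting their CV can be sketched as a sort over per-gene summaries (mean log10 signal and CV, computed as described in Materials and Methods). The input mapping and gene labels are hypothetical, and alphabetical tie-breaking is an arbitrary choice that makes the ranking deterministic.

```python
from typing import Dict, List, Tuple


def top_expressed(stats: Dict[str, Tuple[float, float]], n: int = 15) -> List[Tuple[str, float, float]]:
    # stats maps gene -> (mean log10 signal, CV); highest mean first,
    # ties broken alphabetically by gene label.
    ranked = sorted(stats.items(), key=lambda kv: (-kv[1][0], kv[0]))
    return [(gene, m, cv) for gene, (m, cv) in ranked[:n]]
```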
DISCUSSION
Overall, we found very low variation in gene expression between monozygotic twins (0–1.76%). The comparatively high variation observed between the members of the monozygotic pair M1:M2 could be attributed to substantial differences in the environments to which they were exposed, including different climates, dietary habits, and professions. By contrast, the members of the other twin pairs lived close to each other or in similar geographical locations and generally had similar dietary habits and professions.
We also observed that most of the genes differentially expressed between monozygotic twins belonged to the least variable category, and that these genes showed no clear preference for any of the six functional classes, although SC and IR genes tended to top the list. This suggests that random variation in gene expression due to environmental factors is more likely to occur among SC and IR genes, perhaps because these genes characteristically function at the interface between the body and its environment.
Housekeeping genes also showed very low variation in expression between monozygotic twins. Because housekeeping genes carry out essential functions for the maintenance of cellular physiology, environmental differences appear to play only a minor role in their expression when the underlying genetic background is identical. None of the genes encoding the basal transcription machinery, ribosomal proteins, or DNA replication machinery was highly variable in expression between monozygotic twins, perhaps reflecting the high sequence conservation and ancient origin generally observed for these genes (24,25).
Compared with monozygotic twins, unrelated individuals of the same gender and similar age showed a higher range of variation in gene expression. Moreover, differentially expressed genes were substantially represented in all three categories of variation, which distinctly differs from the pattern between monozygotic twins. Our results agree with the independent observations of Cheung et al. (8), who found that gene expression was less variable among closely related individuals than among unrelated individuals. Taken together, it appears that differences in genetic background are the primary contributors to variation in gene expression in humans, whereas environmental effects may play a minor role. Because SC and IR genes tend to top the list between unrelated individuals, as they do between monozygotic twins, these genes appear to be highly sensitive to both genetic and environmental differences.
The number of housekeeping genes differing in expression between unrelated individuals was severalfold higher than between monozygotic twins, indicating that differences in genetic background contribute substantially to this variability. However, the highly expressed housekeeping genes showed very low variation, apparently independent of differences in genetic background, environment, gender, and age. These results are consistent with the observations of Hsiao et al. (14). In summary, within the limits of the signal-to-noise characteristics specific to GeneChip experiments, our study indicates that gene expression profiling in monozygotic twins could be very useful for identifying genes whose expression varies randomly with environmental factors, and that such data can be used to assess natural variation in gene expression.
A data set of these genes across different populations could be used as a sieve to identify genes whose expression varies primarily due to genetic differences in humans. Although our study is limited by its small sample size, we envisage that similar studies in other populations could define the extent and nature of normal variability in gene expression and provide insight into the genetic basis of differences between individuals in a population.
ACKNOWLEDGMENTS
A. Sharma and V. K. Sharma are recipients of a fellowship from the Council of Scientific and Industrial Research. We thank the Department of Biotechnology, Government of India, and the Ministry of Science, Israel, for a grant under Indo-Israel cooperation.
FOOTNOTES
Address for reprint requests and other correspondence: S. Ramachandran, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India (E-mail: ramu@igib.res.in; ramucbt@yahoo.com).
10.1152/physiolgenomics.00228.2003.
REFERENCES