1 Food Bioactives Group, RIKILT-Institute of Food Safety, 6700 AE Wageningen, The Netherlands
2 Division of Human Nutrition and Epidemiology, Wageningen University, 6700 EV Wageningen, The Netherlands
ABSTRACT
gene expression; data normalization method; between-person variation; within-person variation
INTRODUCTION
The technology was first used for assessment of cellular mRNA expression levels. The goal of this application is to determine the expression of the "whole" genome in any cell or tissue sample of interest (14). For this, isolated (m)RNA is labeled and hybridized to the DNA microarray. After the DNA microarray has been washed and scanned, a raw fluorescence image is obtained that should represent the overall gene expression levels of the original sample. Usually, not every gene is represented on the DNA microarray, and hybridization results are influenced by multiple factors, namely the labeling method, the hybridization conditions, the sequence of the gene, and target features. Since in most cases differences in expression between samples, rather than absolute expression levels, are determined, these problems can be minimized by directly comparing two samples labeled with different fluorophores (e.g., Cy3 and Cy5) on one array (7). To compare multiple samples (each separately labeled with, e.g., Cy5), a common reference sample (labeled with Cy3) can be used in each hybridization. Ideally, the reference represents every gene spotted on the DNA microarray, but in practice it is a pool representing all investigated samples.
For expression analysis to be efficient and reliable, reproducible laboratory protocols and validated procedures for data normalization are required (8). Labeling procedures are constantly being optimized and new procedures are being developed, primarily to reduce the amount of input material required (10, 21, 23). Critical to the success of these protocols is that they are reproducible. It is therefore striking that no simple, standard protocol is available to assess their reproducibility. The last step before the gene expression of different samples is compared is data normalization, which is performed on the raw fluorescence data. Data normalization makes use of the Cy3 reference sample images and allows direct comparison of the Cy5 sample values on different slides (1, 7). This is a widely used approach, but again, no simple standard method is available to assess whether data normalization is performed correctly. Different methods have been reported to evaluate how data normalization and labeling are performed (17, 22). These range from simple linear correlation coefficient (R2) analysis to sophisticated statistical methods (12, 13, 22). When evaluating different labeling methods, one is interested in the systematic deviation (i.e., the difference in the amount of labeled RNA between samples), which should preferably be as small as possible. However, because the R2 is insensitive to systematic deviation, it is unsuitable for assessing reproducibility in DNA microarray data analysis (2). Another limitation of the R2 is that it allows only two samples to be compared at a time. An attractive alternative is the intraclass correlation coefficient (ICC), a relatively simple statistical procedure used to determine the reproducibility of a measurement of a variable (3, 18, 20).
This correlation is based on variance components analysis and measures the homogeneity within groups relative to the total variation. The ICC is large when there is little variation within the groups compared with the variation among group means, where groups consist of replicate measurements. A small ICC occurs when within-group variation is large compared with between-group variation, indicating that some unknown variable has introduced nonrandom effects in the different groups. The maximum value of the ICC is 1, and the minimum value is theoretically 0 (3, 13, 18, 20). The ICC is routinely used in epidemiological studies to address test-retest reliability, validity of questionnaires, interlaboratory concordance, and correlation of plasma/tissue levels with disease status. Given how the ICC is used for reliability, reproducibility, and validation analysis (6, 15, 16), we investigated whether it can be used to assess technical variation in DNA microarray technology, more specifically, to assess the reproducibility of sample RNA labeling methods and to optimize data normalization. We also extended its use to the assessment of biological variation. To this end, the within- and between-person variation in gene expression in small biological samples was estimated. The analysis of gene expression in biological samples is used in clinical as well as epidemiological studies. Since human material is often scarce, it is necessary to determine how many biopsies should be taken to achieve sufficient accuracy in the assessment of the tissue analyzed.
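The variance-components idea can be made concrete with a one-way analysis of variance. The sketch below is illustrative only: it assumes the one-way random-effects model (often written ICC(1)), with rows as groups (e.g., spots or genes) and columns as replicate measurements (e.g., arrays). This section does not state which SPSS ICC model was used, and the function name `icc_oneway` is hypothetical. Because any offset between replicate columns falls into the within-group mean square, this form of the ICC is lowered by systematic deviation as well as by random variation.

```python
import numpy as np


def icc_oneway(data):
    """Return (single-measure, average-measure) one-way random-effects ICC.

    data: 2-D array-like; rows are groups (e.g. spots/genes),
    columns are the k replicate measurements (e.g. arrays).
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    group_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-group mean square: spread of the group means around the grand mean
    msb = k * np.sum((group_means - grand_mean) ** 2) / (n - 1)
    # Within-group mean square: spread of replicates around their own group mean
    msw = np.sum((x - group_means[:, None]) ** 2) / (n * (k - 1))
    icc_single = (msb - msw) / (msb + (k - 1) * msw)
    icc_average = (msb - msw) / msb
    return icc_single, icc_average


# Hypothetical example: 4 genes measured on 2 arrays
single, average = icc_oneway([[1.0, 1.2], [5.0, 4.8], [9.0, 9.1], [3.0, 3.2]])
print(f"single-measure ICC = {single:.4f}, average-measure ICC = {average:.4f}")
```

In this hypothetical example both values are close to 1 because the replicate spread is small relative to the spread between genes. The average-measure value is the single-measure value stepped up to k replicates.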
In this paper, we show that the ICC is a relatively simple statistical measure that can be used to estimate methodological and biological variation, as we illustrate by assessing the validity of our data normalization procedure, comparing the reproducibility of different labeling methods, and analyzing variation in gene expression in human rectal biopsies.
MATERIAL AND METHODS
Persons with intestinal complaints visiting the hospital for a colonoscopy were asked to participate in this study. If the colonoscopy showed no visible symptoms, a rectal biopsy was taken. In total, 27 persons were recruited at the colonoscopy visit: 5 subjects donated multiple biopsies (n = 46), and 22 individuals gave one biopsy each. The biopsies had an average size of 7 mm³ and an average weight of 1.2 mg, and gave an average yield of 13 µg of total RNA. The biopsies were lyophilized and ground before use. Total RNA from cultured cells and tissue was extracted using TRIzol according to the supplier's directions (Invitrogen). RNA that was not used for mRNA isolation was purified using the RNeasy protocol (QIAquick RNeasy kit; Qiagen, Leusden, The Netherlands). mRNA was isolated from total RNA by poly(A)+ selection using oligo(dT) Sephadex (mRNA purification kit; Pharmacia, Roosendaal, The Netherlands). Concentrations were determined spectrophotometrically at A260nm, and all samples were checked for absence of degradation on 1% TAE/agarose gels after 1 h of incubation at 37°C.
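The spectrophotometric step can be illustrated with a small calculation. This is a sketch under a stated assumption: it uses the commonly applied factor of about 40 µg/ml RNA per A260 unit (1-cm path length), which the text does not specify, and the absorbance reading, dilution, and eluate volume below are hypothetical values chosen to land near the reported 13-µg average biopsy yield. The comparison with 12.5 µg uses the biopsy input stated for indirect labeling.

```python
# Assumption: ~40 µg/ml RNA per A260 unit (1-cm path); not stated in the text.
UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate RNA concentration (µg/ml) from an A260 reading of a diluted sample."""
    return a260 * UG_PER_ML_PER_A260 * dilution_factor


def rna_amount_ug(a260, dilution_factor, volume_ul):
    """Total RNA (µg) in volume_ul microliters of the undiluted sample."""
    return rna_concentration_ug_per_ml(a260, dilution_factor) * volume_ul / 1000.0


# Hypothetical reading: A260 = 0.13 for a 1:50 dilution of a 50-µl biopsy RNA eluate
yield_ug = rna_amount_ug(0.13, 50, 50)
print(f"{yield_ug:.1f} µg total RNA")
print("enough for 12.5-µg indirect labeling:", yield_ug >= 12.5)
```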
Direct labeling.
The RNA labeling protocol is based on Schena et al. (19). Either total RNA or mRNA was labeled by incorporation of either Cy3-dCTP or Cy5-dCTP during reverse transcription. A combination of HT-29 and Caco-2 mRNA (1.0 µg) or total RNA (40.0 µg) was used for the labeling. In short, 1.0 µg of sample poly(A)+ RNA or 40.0 µg of total RNA was mixed with 1.0 ng of control luciferase mRNA (Promega, Leiden, The Netherlands), 2.0 µg of oligo(dT)21 primer (Isogen, Maarssen, The Netherlands), and/or 150.0 ng of random hexamers (Invitrogen) in a final volume of 13.5 µl, heated for 3 min at 65°C (RNA denaturation) and 10 min at 25°C (primer annealing), and then immediately put on ice. Next, a reverse transcription reaction was performed for 2 h at 37°C in a final volume of 25 µl. The reaction mixture contained the RNA template with the annealed primer, 1x first-strand buffer (Invitrogen), 10 mmol/l dithiothreitol, 0.5 mmol/l dATP, 0.5 mmol/l dGTP, 0.5 mmol/l dTTP, 0.04 mmol/l dCTP, 0.04 mmol/l Cy3-labeled dCTP (or Cy5-labeled dCTP; Amersham), 15 U of RNase OUT (Invitrogen), and 150 U of SuperScript II reverse transcriptase (Invitrogen). The labeled cDNA was purified by ethanol precipitation at room temperature. The pellet was dried and dissolved in 10 µl TE, pH 8.0 (10 mmol/l Tris·HCl and 1 mmol/l EDTA). After a 3-min boiling step, the cDNA was immediately put on ice, and 2.5 µl of 1 mol/l NaOH was added. The cDNA was then incubated for 10 min at 37°C to break down the remaining RNA. To neutralize the pH, 2.0 µl of 1 mol/l HCl and 2.5 µl of 1 mol/l Tris·HCl (pH 6.8) were added. Finally, an additional ethanol precipitation at room temperature was performed, and the resulting cDNA pellet was dissolved in 25 µl of hybridization buffer containing 5x SSC, 0.2% SDS, 5x Denhardt's solution, 50% (vol/vol) formamide, and 0.2 mg/ml denatured herring sperm DNA.
Prior to hybridization, the labeled cDNA was heated for 3 min at 65°C (to improve dissolution of the cDNA) and spun for 2 min at 12,000 g to remove undissolved debris.
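The reagent amounts in the direct labeling protocol can be checked with simple stoichiometry. The sketch uses only volumes and concentrations stated above (1 mol/l = 1 µmol/µl); the variable names are illustrative. The labeled share describes the supplied dCTP pool, not actual incorporation, which for bulky Cy-dCTP may well be lower.

```python
# Direct labeling: dCTP pool in the 25-µl reverse transcription reaction (mmol/l)
dctp_mM = 0.04
cy_dctp_mM = 0.04  # Cy3- or Cy5-labeled dCTP
labeled_share = cy_dctp_mM / (dctp_mM + cy_dctp_mM)  # fraction of the C pool carrying dye

# Alkaline RNA hydrolysis and neutralization (1 mol/l = 1 µmol/µl)
cdna_volume_ul = 10.0   # pellet dissolved in 10 µl TE
naoh_umol = 2.5 * 1.0   # 2.5 µl of 1 mol/l NaOH
hcl_umol = 2.0 * 1.0    # 2.0 µl of 1 mol/l HCl
tris_umol = 2.5 * 1.0   # 2.5 µl of 1 mol/l Tris·HCl, pH 6.8

naoh_during_hydrolysis_M = naoh_umol / (cdna_volume_ul + 2.5)  # µmol/µl equals mol/l
residual_base_umol = naoh_umol - hcl_umol                      # base left for the Tris buffer

print(f"labeled dCTP share of C pool: {labeled_share:.2f}")
print(f"NaOH during hydrolysis: {naoh_during_hydrolysis_M:.2f} mol/l")
print(f"base left after HCl: {residual_base_umol:.1f} µmol vs {tris_umol:.1f} µmol Tris")
```

Half of the supplied C pool carries dye, the hydrolysis runs at about 0.2 mol/l NaOH, and the 0.5 µmol of base left after the HCl addition is small compared with the 2.5 µmol of Tris·HCl added. This is consistent with the Tris serving as the buffer that brings the pH back toward neutral.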
Indirect labeling.
This RNA labeling protocol is based on the protocol from Henegariu et al. (9). In the reverse transcription step aminoallyl-dUTP was incorporated and afterward chemically coupled to Cy5 monofunctional dye. A combination of HT-29 and Caco-2 mRNA (1.0 µg) or total RNA (40.0 µg) was used for the labeling. For biopsy material 12.5 µg total RNA was used for the labeling. In short, 1.0 µg of sample poly(A)+ RNA or 12.5 µg or 40.0 µg total RNA was mixed with 1.0 ng control luciferase mRNA, 2.0 µg oligo(dT) primer (21-mer), and/or 150.0 ng random hexamers in a final volume of 12.75 µl, and heated for 3 min at 65°C (RNA denaturation) and 10 min at 25°C (primer annealing), then immediately put on ice. Then, a reverse transcription reaction was performed for 2 h at 37°C in a final volume of 25 µl. The reaction mixture contained the RNA template with the annealed primer, 1x first-strand buffer, 10 mmol/l dithiothreitol, 0.5 mmol/l dATP, 0.5 mmol/l dGTP, 0.5 mmol/l dCTP, 0.3 mmol/l dTTP, 0.2 mmol/l aminoallyl-dUTP (Sigma), 15 U of RNase OUT, and 150 U of SuperScript II reverse transcriptase. The obtained cDNA was purified by an ethanol precipitation performed at room temperature. The pellet was dried and dissolved in 10 µl TE, pH 8.0. After a 3-min boiling step, the cDNA was immediately put on ice, and 2.5 µl of 1 mol/l NaOH was added. The cDNA was then incubated for 10 min at 37°C to break down the remaining RNA. To neutralize the pH, 2.0 µl of 1 mol/l HCl and 2.5 µl of 1 mol/l Tris·HCl (pH 6.8) were added. An ethanol precipitation at room temperature was performed, and the resulting cDNA pellet was dissolved in 10 µl 0.1 mol/l sodium bicarbonate buffer (pH 9.3). The chemical coupling took place for 30 min at room temperature by adding 10 µl 5 mmol/l Fluorolink Cy5 monofunctional dye (Pharmacia) to the cDNA. An ethanol precipitation was performed at -20°C for at least 2 h, and the resulting cDNA pellet was dissolved in 100 µl Millipore-filtered water. 
All cDNAs were purified using the PCR purification protocol (QIAquick PCR purification kit, QIAgen). Finally, an additional ethanol precipitation at room temperature was performed, and the resulting cDNA pellet was dissolved in 25 µl hybridization buffer. Prior to hybridization, the labeled cDNA was heated for 3 min at 65°C and spun for 2 min at 12,000 g to remove undissolved debris.
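The amounts in the indirect labeling protocol can be checked in the same way. The sketch uses only the concentrations and volumes stated above (mmol/l × µl = nmol); variable names are illustrative. Because only part of the supplied aminoallyl-dUTP is incorporated into cDNA, the dye excess computed here is a lower bound.

```python
RT_VOLUME_UL = 25.0  # reverse transcription reaction volume (µl)

# Thymidine pool (mmol/l): aminoallyl-dUTP replaces part of the dTTP
dttp_mM = 0.3
aa_dutp_mM = 0.2
aa_share = aa_dutp_mM / (dttp_mM + aa_dutp_mM)  # fraction of the T pool supplied as aa-dUTP

# mmol/l × µl = nmol
aa_dutp_supplied_nmol = aa_dutp_mM * RT_VOLUME_UL  # upper bound on amines available for coupling
dye_nmol = 5.0 * 10.0                              # 10 µl of 5 mmol/l Cy5 monofunctional dye
dye_excess = dye_nmol / aa_dutp_supplied_nmol      # minimum molar excess of dye over amines

print(f"aminoallyl-dUTP share of T pool: {aa_share:.2f}")
print(f"dye: {dye_nmol:.0f} nmol vs at most {aa_dutp_supplied_nmol:.0f} nmol aminoallyl groups "
      f"(at least {dye_excess:.0f}-fold excess)")
```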
Microarray construction.
An in-house-produced subtracted cDNA library, enriched for genes that are expressed in differentiated and undifferentiated Caco-2 cells (A. Peijnenburg et al., unpublished results), was printed on silylated slides (CEL Associates, Houston, TX) using a PixSys 7500 arrayer (Cartesian Technologies, Durham, NC). Arrays were spotted by passive dispensing using quill pins (ChipMaker 3; TeleChem, Sunnyvale, CA), resulting in a spot diameter of 0.12 mm at a volume of about 0.5 nl. After printing, microarrays were allowed to dry at room temperature for at least 3 days. Free aldehyde groups were blocked with NaBH4 according to the method of Schena et al. (19). The microarrays used for the validation of the normalization method and the assessment of different labeling methods contained 1,152 genes spotted in duplicate. The microarrays used for the assessment of biological variation contained 2,304 single-spotted genes.
Microarray hybridization.
Prior to hybridization, microarrays were prehybridized in hybridization buffer at 42°C for several hours. After prehybridization, slides were rinsed twice in Millipore-filtered water and once in isopropanol, and dried by centrifugation (2 min, 470 g). The hybridization was performed in a Geneframe (1 × 1 cm, 25-µl hybridization volume; Westburg, Leusden, The Netherlands). A 1:1 (vol/vol) mixture of Cy3- and Cy5-labeled cDNAs was hybridized to each array. Arrays were hybridized overnight at 42°C in a humid hybridization chamber. After hybridization, slides were washed at room temperature, first in 1x SSC/0.1% SDS (5 min) and subsequently in 0.1x SSC/0.1% SDS (5 min) and 0.1x SSC (1 min), and then dried by centrifugation (2 min, 470 g).
Microarray scanning.
Microarrays were scanned using a confocal laser scanner (ScanArray 3000; General Scanning, Watertown, MA) containing a GHeNe 543-nm laser for Cy3 measurement and a RHeNe 633-nm laser for Cy5 measurement. Scans were made with a pixel resolution of 10 µm, a laser power of 90%, and a photomultiplier tube voltage of 80%. The software package ArrayVision (version 7.0; Imaging Research, Ontario, Canada) was used for image analysis of the TIFF files generated by the scanner. For each spot, the density multiplied by the spot area, together with the background (measured around the entire template), was collected and stored in Microsoft Excel for further data processing.
Experimental setup and data normalization.
All arrays were hybridized with a Cy5-labeled sample cDNA and a Cy3-labeled reference cDNA. For validation of the normalization method and the assessment of different labeling methods, the reference cDNA was a mixture of directly labeled mRNA of HT-29 and Caco-2 cells. For assessment of biological variation, a mixture of indirectly labeled total RNA from rectal biopsies and HT-29 and Caco-2 cells was used as reference cDNA. The reference cDNA was pooled after fluorescent labeling and subsequently subdivided and hybridized on all arrays. Under ideal circumstances, the reference hybridization signal of each spot would be identical on every slide. In practice, this signal differs because of 1) fluctuations in the amount of DNA spotted and 2) variations in the hybridization conditions within a slide, between slides, and between different experiments (random variation). For these reasons, corrected sample hybridization signals, $\mathrm{Cy5}^{\mathrm{corr1}}_{\mathrm{spot}(x),\,\mathrm{slide}(X)}$, were calculated for each spot x on slide X according to Eq. 1 (first correction) in Fig. 1, where N is the total number of hybridized slides, $\mathrm{Cy5}_{\mathrm{spot}(x),\,\mathrm{slide}(X)}$ and $\mathrm{Cy3}_{\mathrm{spot}(x),\,\mathrm{slide}(X)}$ are the measured Cy5 and Cy3 hybridization signals of spot x on slide X, respectively, and $\mathrm{Cy3}^{\mathrm{median}}_{\mathrm{spot}(x),\,\mathrm{slide}(1,\ldots,N)}$ is the median of the hybridization values of spot x on all slides hybridized in the experiment. To correct for differences in labeling efficiency between samples and for inaccuracies in the amount of sample mRNA used in the labeling reaction (systematic labeling deviation), sample hybridization values were additionally corrected for the median Cy5 signal according to Eq. 2 (second correction) in Fig. 1, where n is the total number of spots on the array, $\mathrm{Cy5}^{\mathrm{median,\,corr1}}_{\mathrm{spot}(1,\ldots,n),\,\mathrm{slide}(X)}$ is the median of the Cy5 signals of all spots on slide X after the first correction, and
In the text output of SPSS, a "single measure intraclass correlation" (ICC1) and an "average measure intraclass correlation" were obtained. The single measure ICC was used to calculate the ICCs for different numbers of repeats (i) with the following formula:
$$\mathrm{ICC}_i = \frac{i \cdot \mathrm{ICC}_1}{1 + (i - 1)\,\mathrm{ICC}_1}$$
The ICC has a maximum value of 1, and in practice its value is slightly smaller than 1. The closer the ICC is to 1, the more similar the samples are (3, 18, 20).
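The two normalization corrections described under Experimental setup and data normalization can be sketched in NumPy as follows. This is a reading of the text, not the authors' code. Eq. 1 is assumed to rescale each Cy5 value by the ratio of the spot's median Cy3 signal across all slides to its Cy3 signal on the current slide. Only the division by the per-slide median Cy5 signal of Eq. 2 is reproduced, because the definition of Eq. 2 is incomplete in this section. The function names and the five-spot, two-slide data are hypothetical.

```python
import numpy as np


def first_correction(cy5, cy3):
    """Eq. 1 as read from the text (sketch): scale each Cy5 spot value by the ratio
    of that spot's median Cy3 reference signal over all N slides to its Cy3 signal
    on the current slide. Arrays have shape (n_spots, n_slides)."""
    cy5 = np.asarray(cy5, dtype=float)
    cy3 = np.asarray(cy3, dtype=float)
    ref_median = np.median(cy3, axis=1, keepdims=True)  # median of spot x over slides 1..N
    return cy5 * ref_median / cy3


def second_correction(cy5_corr1):
    """Eq. 2 (partial sketch): divide every spot on slide X by the median of all
    spots on that slide after the first correction. Any further constant common to
    all slides is omitted; it would not change between-slide ratios."""
    slide_median = np.median(cy5_corr1, axis=0, keepdims=True)
    return cy5_corr1 / slide_median


# Hypothetical two-slide experiment with five spots
expr = np.array([100.0, 400.0, 50.0, 800.0, 200.0])    # true sample signal per spot
ref = np.array([200.0, 300.0, 150.0, 500.0, 250.0])    # true reference signal per spot
spotted = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, 1.0],  # DNA spotted, per spot and slide
                    [1.0, 0.8], [1.0, 1.25]])
label_eff = np.array([1.0, 1.5])                       # slide 2 sample labeled 1.5x more efficiently

cy3 = ref[:, None] * spotted
cy5 = expr[:, None] * spotted * label_eff

corr1 = first_correction(cy5, cy3)
corr2 = second_correction(corr1)
print("slide2/slide1 after Eq. 1:", np.round(corr1[:, 1] / corr1[:, 0], 3))
print("slide2/slide1 after Eq. 2:", np.round(corr2[:, 1] / corr2[:, 0], 3))
```

In this toy case the first correction removes the spot-to-spot differences in spotted DNA between slides but leaves a uniform 1.5-fold labeling difference, which the per-slide median division then removes.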
RESULTS
To gain insight into the characteristics of the ICC for the analysis of DNA microarrays, a data set was generated by labeling a sample twice independently with Cy5. Each labeled sample was then mixed with Cy3-labeled RNA from the same reference pool and hybridized to separate but identical slides. From this microarray data set, three scatter plots were made (Fig. 2). The first scatter plot was obtained from the raw microarray data (Cy5 values, Fig. 2A). Correcting these data for the random error (Fig. 1, Eq. 1) resulted in the second scatter plot (Fig. 2B). The third scatter plot (Fig. 2C) was obtained by further adjusting the data of the second scatter plot for the systematic error (Fig. 1, Eq. 2). The R2 and ICC were then calculated for each scatter plot. The uncorrected raw microarray data set yielded the smallest ICC and R2 of all three scatter plots (ICC = 0.910 and R2 = 0.977). After correction for the random error (Fig. 1, Eq. 1), both the ICC and the R2 improved (ICC = 0.936 and R2 = 0.989). In contrast, correction for the systematic error (Fig. 1, Eq. 2) improved only the ICC (from 0.936 to 0.989), whereas the R2, which is unaffected when all values on a slide are multiplied by a constant, remained unchanged. This shows that the ICC is sensitive to both random variation and systematic deviation, whereas the R2 is sensitive only to random variation. Thus the ICC can be used to assess systematic deviation, whereas the R2 cannot. Because the number of replicates is not limited in the calculation of the ICC, more than two samples can be compared at once, whereas the R2 can only be calculated for two samples at a time and thus only allows pairwise comparison within each scatter plot. Because the ICC is sensitive to both random variation and systematic deviation, it is a useful tool for gaining insight into both technical and biological variation.
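The different behavior of the two statistics under a systematic deviation can be reproduced with a small simulation. This sketch assumes the one-way random-effects, single-measure ICC (the exact SPSS model is not stated in this section) and uses hypothetical, deterministic replicate data; multiplying one replicate by 1.3 stands in for a systematic labeling deviation of the kind Eq. 2 is meant to remove.

```python
import numpy as np


def icc_single(data):
    """One-way random-effects, single-measure ICC (rows = spots, columns = replicates)."""
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    means = x.mean(axis=1)
    msb = k * np.sum((means - x.mean()) ** 2) / (n - 1)
    msw = np.sum((x - means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def r_squared(a, b):
    """Squared Pearson correlation between two replicate vectors."""
    return np.corrcoef(a, b)[0, 1] ** 2


# Hypothetical replicate pair: same underlying signal plus small deterministic noise
idx = np.arange(50)
true_signal = np.linspace(1.0, 10.0, 50)
rep1 = true_signal + 0.1 * np.sin(idx)
rep2 = true_signal + 0.1 * np.cos(idx)

# Systematic deviation: every spot on the second slide reads 30% higher
rep2_biased = 1.3 * rep2

r2_before, r2_after = r_squared(rep1, rep2), r_squared(rep1, rep2_biased)
icc_before = icc_single(np.column_stack([rep1, rep2]))
icc_after = icc_single(np.column_stack([rep1, rep2_biased]))

print(f"R2:  {r2_before:.4f} -> {r2_after:.4f}")   # unchanged by the scaling
print(f"ICC: {icc_before:.4f} -> {icc_after:.4f}")  # lowered by the scaling
```

Dividing each slide by its own median removes such a constant factor, which is why a per-slide correction of this kind can raise the ICC while leaving the R2 unchanged.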
Assessment of technical variation: labeling methods.
Various labeling methods have been described for microarray hybridization experiments (9, 19, 21, 23). These methods can be divided into direct and indirect labeling methods. In the direct labeling method the fluorescent dye is incorporated during cDNA synthesis, whereas in the indirect labeling method the fluorescent dye is coupled to the cDNA afterward. Because the fluorescent dyes are very bulky, indirect labeling may provide an advantage: less bulky nucleotide modifications can be used, which are likely to improve reverse transcriptase function and to result in higher incorporation of the modified nucleotides and longer cDNA strands. We compared the quality and reproducibility of oligo(dT)-primed direct and indirect labeling using either mRNA or total RNA as input material. In addition, for labeling of mRNA, a combination of oligo(dT) and random hexamer primers was also tested in both the indirect and the direct labeling method. This approach aims at a higher labeling efficiency and cannot be used for total RNA, since it would also result in labeling of ribosomal RNA. For an overview of the methods tested, see Table 1. In the indirect labeling method, aminoallyl nucleotides were incorporated during the cDNA synthesis reaction, followed by reactive dye coupling. We also tested biotin nucleotide incorporation followed by streptavidin-dye coupling. However, in our hands the biotin-streptavidin-based technique resulted in very poor signal-to-noise ratios because of high background signals. Therefore, this method was not included in further evaluation (data not shown).
In all three direct labeling methods, a smaller ICC was obtained than with the indirect labeling methods when both corrections were applied, whereas a higher ICC was obtained with the direct labeling methods when only the first correction was applied. Random labeling variation thus seems to play a greater role in the direct labeling methods, whereas systematic deviation seems to play a bigger role in the indirect labeling methods (Table 1).
mRNA labeled with a combination of oligo(dT) and random hexamer primers had better reproducibility and a better signal-to-noise ratio than mRNA labeled with oligo(dT) primers only, for both the direct and the indirect labeling method. The ICC for total RNA tended to be smaller than the ICC for mRNA. Based on these results, we decided to use the indirect labeling method in all subsequent experiments. The use of mRNA in combination with oligo(dT) and random hexamer primers gave the best results and is the preferred method if sufficient amounts of RNA are available. However, total RNA was used in subsequent experiments, since only limited amounts of RNA could be obtained from the rectal biopsies, and further mRNA isolation would have left too little input material.
Assessment of biological variation.
The ICC was also applied to optimize human rectal biopsy sampling by investigating the variation in gene expression in rectal biopsies between and within persons. Since this question is aimed at assessing random variation (differences in gene expression) and not systematic deviation, both data corrections (Fig. 1, Eqs. 1 and 2) were applied to the microarray data set. Multiple biopsies (n = 46) were taken from five different healthy persons for the within-person variation analysis. From each of 22 persons with intestinal complaints but without visible symptoms, one rectal biopsy was obtained to assess the variation in gene expression between biopsies from different subjects. Total RNA was labeled using the indirect labeling method and hybridized to separate but identical microarrays, with identical reference cDNA on all slides. To determine the within-person variation, all biopsies of one individual were entered separately in SPSS to obtain two ICCs, namely the single-measure and the average-measure value. From these, we calculated the ICCs for one to six biopsies for each of the five individuals separately (Fig. 3A). An average ICC of 0.870 over the five individuals was found when one biopsy per person was taken. Note that an ICC for a single biopsy can only be obtained by calculation from the gene expression data of multiple biopsies. If two biopsies are used, the ICC becomes 0.930. Additional biopsies lead to slightly higher ICCs, but the largest increase in ICC is found in the step from one to two biopsies.
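The step from the single-biopsy ICC to ICCs for more biopsies can be sketched with the Spearman-Brown formula, ICC_i = i·ICC_1 / (1 + (i − 1)·ICC_1). That this is the formula referred to in the Methods is an assumption, but it reproduces the reported increase from 0.870 to 0.930. The sketch also shows why the largest gain occurs between one and two biopsies: each additional repeat adds less than the previous one. The function name is hypothetical.

```python
def icc_for_repeats(icc_single, i):
    """Spearman-Brown: ICC of the mean of i repeats, given the single-measure ICC."""
    return i * icc_single / (1 + (i - 1) * icc_single)


single_biopsy_icc = 0.870  # average single-biopsy ICC of the five individuals
curve = [icc_for_repeats(single_biopsy_icc, i) for i in range(1, 7)]
gains = [later - earlier for earlier, later in zip(curve, curve[1:])]

for n_biopsies, icc in enumerate(curve, start=1):
    print(f"{n_biopsies} biopsies: ICC = {icc:.3f}")
print("gain per extra biopsy:", [round(g, 3) for g in gains])
```

Applying the formula to the averaged single-biopsy ICC is an approximation of averaging the per-person curves of Fig. 3A, because the formula is nonlinear.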
The ICC for two biopsies is above 0.9. It can therefore be concluded that at least two rectal biopsies per person should be used when working with DNA microarrays. To establish whether the ICC can be improved by pooling the biopsies of a person, the following analysis was performed. The ICC was obtained from two individuals with two biopsies per individual, with the biopsies used separately or with the values averaged per individual (N = 4 or N = 2; ICC4 vs. ICC2). The same was done for three individuals with two biopsies per individual (N = 6 or N = 3; ICC6 vs. ICC3). Using the mean of two biopsies improved the ICC compared with using single biopsies (mean improvement in ICC = 0.026, SD = 0.009). The reliability of the microarray data of biopsies from different persons also increases with the number of biopsies (Fig. 3B). When all subjects are combined, biopsies from four different persons give an ICC above 0.9. This cutoff is well within the resolution of the analysis, since biopsies from 21 different persons are necessary to obtain an ICC equal to the technical variation [0.981, Table 1 (indirect labeling of total RNA)]. Dividing the subjects into groups by age and gender creates more homogeneous groups. In three of the groups this resulted in improved ICCs; the exception was the group of men older than 60 yr, in which, for unknown reasons, biopsies from six different persons are needed to give an ICC above 0.9 (0.908). When all data are pooled, the diversity in this group (men over 60 yr) is concealed by the uniformity of the other groups.
DISCUSSION
Assessment of technical variation: data normalization.
Data normalization by cohybridization of a separately labeled reference sample is widely used and accepted in DNA microarray analysis (8, 17); in most cases mean values are used. We compared data normalization performed with mean or median values. Since the ICC is sensitive to both random and systematic deviation, it could be established that the median, rather than the mean, of the overall Cy5 values should be used to correct for the systematic error (Fig. 1, Eq. 2) in data normalization. The normalization step is based on the assumptions that there are only minor differences in overall gene expression between samples and that expression levels of up- and downregulated genes are symmetrical. The assumption that there are only minor differences in overall gene expression between samples does not hold in every experimental setup. First, the hybridized arrays should contain a large number of genes. Furthermore, the arrays should not contain a preselected group of genes that is expected to hybridize more strongly to one sample than to the other. Finally, the total number of different mRNA species expressed in the samples should be equal. When one sample expresses fewer distinct mRNA species than the other, the hybridization signals will be relatively high for that sample, assuming that equal amounts of RNA are used in the reverse transcription reaction. For these exceptions, another method should be used for the second data correction step, for example, normalization based on a set of "housekeeping" genes or on a set of spiked controls, each with its own disadvantages.
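One plausible reason why a median-based correction behaves better than a mean-based one is robustness to a small fraction of strongly changed genes. The sketch below illustrates this with hypothetical slide signals; it is not the comparison performed in this study, where the choice of the median was based on the ICC.

```python
import numpy as np

# Hypothetical slide signals: 1,000 spots spread evenly on a log scale
baseline = np.exp(np.linspace(4.0, 8.0, 1000))
sample_a = baseline.copy()
sample_b = baseline.copy()
sample_b[::50] *= 50.0  # 20 of 1,000 spots (2%) strongly upregulated in sample B only

# Normalization factor that would be applied to sample B relative to sample A
mean_factor = sample_b.mean() / sample_a.mean()
median_factor = np.median(sample_b) / np.median(sample_a)

print(f"mean-based factor:   {mean_factor:.2f}")   # pulled up by the few upregulated spots
print(f"median-based factor: {median_factor:.2f}")  # barely moves
```

A mean-based factor would scale down every unchanged gene in sample B and create apparent downregulation, whereas the median-based factor stays close to 1.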
Assessment of technical variation: labeling methods.
Indirectly labeled total RNA or mRNA gave better reproducibility and quality than direct labeling, although a larger systematic error was found. Reproducibility and quality were increased when mRNA was reverse transcribed with the combination of oligo(dT) and random hexamer primers, for both direct and indirect labeling. Theoretically, the reproducibility (ICC) has a maximum value of 1. In most papers, a reproducibility of 0.9 or higher is considered sufficient for labeling methods (10, 21, 23). By this criterion, good reproducibility was obtained for all labeling methods, particularly the indirect labeling methods. The low background of the indirect labeling method can be ascribed to removal of excess dye by column purification, in contrast to precipitation in the direct labeling method. Adjusting for the systematic error led to higher ICCs, especially for the indirect labeling methods. This may be due to differences in the efficiency of chemical coupling of the dye. Increasing the coupling time might overcome this problem.
Assessment of biological variation.
As expected, the between-person variation in human rectal biopsies was greater than the within-person variation. No difference was found between male and female rectal biopsies. The within-person variation in gene expression can be attributed to differences in the composition of the biopsies. Biopsies can vary in blood and muscle content, but also in the composition of the epithelial layer, which consists of enterocytes, goblet cells, endocrine cells, and gut-associated lymphoid tissue cells. These differences can also affect the between-person variation. Variation between biopsies can also arise because the rectum might be affected by disease despite the absence of visible symptoms in the colon or rectum.
To decrease variation, two biopsies per subject should be taken for analysis. This was determined for individuals separately (Fig. 3A), and also when multiple biopsies from different persons were combined into groups and analyzed: the mean of two biopsies gave an improved ICC compared with single biopsies (mean improvement in ICC = 0.026, SD = 0.009). From the results in Fig. 3B, where a single biopsy per person was used for gene expression analysis, it can be determined that four persons per group are required to obtain a sufficiently homogeneous sample, based on an ICC cutoff of 0.9. However, by separating persons into groups, we found that in some cases (men over 60 yr) biopsies from four persons seemed insufficient. This is in agreement with Hwang et al. (11), who found in DNA microarray analysis of bone marrow samples from different lymphoid leukemia subtypes that seven persons per group are required to separate distinct disease states or other physiological differences with statistically significant reliability.
We also used this data set to determine the minimum sample size for microarray experiments. Using the same threshold as Hwang et al. (0.95), we found that a minimum of seven subjects per subgroup should be used for this data set.
In view of all the data, we suggest that if the group that is sampled is diverse or not well characterized, as is generally the case in intervention trials or cohort studies, then the overall cutoff should be increased to 0.95. This implies that, if one biopsy per person is analyzed for gene expression, then at least eight different persons per group are needed. This number can be reduced to a minimum of six persons if two or more biopsies per person are sampled and averaged.
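The group sizes above can be cross-checked with a short calculation. Assuming the curves of Fig. 3B follow the Spearman-Brown formula (an assumption; the back-calculation only uses numbers stated in the text), the single-person ICC can be recovered from one reported point and then inverted to give the minimum number of persons for a chosen cutoff. The function names are hypothetical.

```python
import math


def single_from_average(icc_avg, i):
    """Invert Spearman-Brown: single-measure ICC implied by the ICC for i repeats."""
    return icc_avg / (i - (i - 1) * icc_avg)


def min_repeats(icc_single, target):
    """Smallest number of repeats whose averaged ICC reaches target (Spearman-Brown)."""
    needed = target * (1 - icc_single) / (icc_single * (1 - target))
    return math.ceil(needed - 1e-9)  # tolerance guards against floating-point round-up


# Single-person ICC implied by "21 persons give 0.981" (all subjects combined)
icc_person = single_from_average(0.981, 21)
# Same back-calculation for men over 60 yr: "six persons give 0.908"
icc_men_over_60 = single_from_average(0.908, 6)

print(f"implied single-person ICC: {icc_person:.3f}")
print("persons for ICC >= 0.90:", min_repeats(icc_person, 0.90))
print("persons for ICC >= 0.95:", min_repeats(icc_person, 0.95))
print("men over 60 yr, ICC >= 0.90:", min_repeats(icc_men_over_60, 0.90))
```

Under this assumption the calculation gives four persons for a cutoff of 0.9, eight persons for a cutoff of 0.95, and six persons for men over 60 yr at a cutoff of 0.9, in line with the numbers reported here.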
One should also take into account that six repeats are considered sufficient under our laboratory conditions, but this value must be determined by each experimenter, as it may depend on several local factors (array quality, probe labeling, hybridization conditions, scanning of the slides, etc.).
We showed that the ICC can be used to assess technical and biological variation in microarray experiments. After evaluating the technical variation, we recommend indirect labeling and, whenever possible, mRNA as input material with the combination of oligo(dT) and random hexamer primers. Assessment of the biological variation of human rectal biopsies revealed that two biopsies per person and at least six persons per group should be analyzed when studying gene expression, for example in human dietary intervention trials.
ACKNOWLEDGMENTS
This work was supported by MLDS Grant WS 99-72, the Dutch Digestive Diseases Foundation, and by Grant VCZ 980-10-020 from ZonMW, the Netherlands Organization for Health Research and Development.
FOOTNOTES
Address for reprint requests and other correspondence: J. Keijer, Food Bioactives Group, RIKILT-Institute of Food Safety, PO Box 230, 6700 AE Wageningen, The Netherlands (E-mail: jaap.keijer{at}wur.nl).
10.1152/physiolgenomics.00111.2003.
REFERENCES