Quantitative assessment of the importance of dye switching and biological replication in cDNA microarray studies
Mingyu Liang1,
Amy G. Briggs1,
Elizabeth Rute1,
Andrew S. Greene1,2 and
Allen W. Cowley, Jr.1
1 Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin 53226
2 Biotechnology and Bioengineering Center, Medical College of Wisconsin, Milwaukee, Wisconsin 53226
ABSTRACT
Dye switching and biological replication substantially increase the cost and the complexity of cDNA microarray studies. The objective of the present analysis was to quantitatively assess the importance of these procedures to provide a quantitative basis for decision-making in the design of microarray experiments. Taking advantage of the unique characteristics of a published data set, the impact of these procedures on the reliability of microarray results was calculated. Adding a second microarray with dye switching substantially increased the correlation coefficient between observed and predicted ln(ratio) values from 0.38 ± 0.06 to 0.62 ± 0.04 (n = 12) and the outlier concordance from 21 ± 3% to 43 ± 4%. It also increased the correlation with the entire set of microarrays from 0.60 ± 0.04 to 0.79 ± 0.04 and the outlier concordance from 31 ± 6% to 58 ± 5% and tended to improve the correlation with Northern blot results. Adding a second microarray to include biological replication also improved the performance of these indices but often to a lesser degree. Inclusion of both procedures in the second microarray substantially improved the consistency with the entire set of microarrays but had minimal effect on the consistency with predicted results. Analysis of another data set generated using a different cDNA labeling method also supported a significant impact of dye switching. In conclusion, both dye switching and biological replication substantially increased the reliability of microarray results, with dye switching likely having even greater benefits. Recommendations regarding the use of these procedures were proposed.
experimental design; Pearson correlation coefficient; outlier concordance; Northern blot; gene expression
INTRODUCTION
cDNA microarray has become an increasingly important technique for high-throughput measurement of mRNA expression. A cDNA microarray experiment typically involves labeling mRNAs from two samples being compared with different fluorescent dyes such as Cy3 and Cy5. The two samples are then hybridized together to a microarray containing cDNA probes for thousands of genes. The ratio between the fluorescent intensities of Cy3 and Cy5 at each spot provides a measure of the relative expression level of the corresponding gene between the samples examined. Although several variations of DNA microarray techniques have been introduced and their applications have diversified, cDNA microarrays utilizing such a two-color hybridization method, as described originally by Schena et al. (16), remain one of the most widely used methods.
The potentially enormous power of the cDNA microarray technique and its inherent complexity have motivated a large number of experiments and analyses studying various aspects of the technique, particularly the preparation of arrays and samples and the analysis of data (14). As cDNA microarray is being incorporated into more physiologically oriented studies involving multiple factors and naturally existing variability, experimental design also needs to be rigorously addressed (2, 19). Two particularly important issues in experimental design are the use of dye switching and biological replication. Due to the physicochemical differences between fluorescent dyes Cy3 and Cy5, it is suspected that they might cause systematic bias in the ratios generated. Random variations in the handling of the two samples or the scanning of the two fluorescent channels could also result in ratio bias. In addition to normalization between the two dyes (14, 15), a commonly used approach to correct any residual dye bias is to repeat the hybridization, with Cy3 and Cy5 switched between the two samples being compared. Biological replication, in which several independent individuals are analyzed in a study, is a standard practice in physiological experiments because of the well-known variability between individuals. Although both dye switching and biological replication are intuitively beneficial for cDNA microarray studies, one of the drawbacks is that these procedures substantially increase the costs of these already expensive experiments, further limiting the ability of a laboratory to use cDNA microarrays. These procedures also further increase the complexity of the experimental design and the data structure, posing even greater challenges for data analysis.
Therefore, the practical question becomes, to what extent a cDNA microarray experiment can benefit from dye switching and/or biological replication, i.e., whether the benefits are great enough to justify the additional costs and the increased complexity.
In the present analysis, we took advantage of the unique characteristics of a published microarray data set that was generated in a physiologically oriented context (9), and we developed several algorithms to quantitatively assess the importance of dye switching and biological replication. Guidelines for designing cDNA microarray experiments were proposed based on this analysis.
METHODS
Characteristics of the Data Set Used
A previously published microarray data set (9) was utilized for this analysis, in which cDNA microarrays were used to identify genes in the rat renal medulla associated with the development of Dahl salt-sensitive hypertension. A custom-made microarray containing cDNA probes for approximately 2,000 genes, representing approximately 80% of all currently known rat genes, was used. Microarray hybridization was carried out with the widely used direct, two-color, Cy3 and Cy5, labeling method. A custom-designed data analysis method was used to screen for reliable data points and to adjust signal intensity, correct background, and calculate and normalize natural log-transformed ratios. Details of these procedures were described previously (9). Renal medullary mRNA expression profiles were compared in four groups of rats, Dahl salt-sensitive rats on a low-salt (SSLS) or high-salt (SSHS) diet, and consomic, salt-insensitive SS.BN13 rats on a low-salt (13LS) or high-salt (13HS) diet, using a loop-like, four-way comparison experimental design with a total of 24 microarrays. As depicted in Fig. 1A, three pairs of individual rats were compared in each comparison between two groups of rats (i.e., biological replication), and each pair of rats was examined with both forward and reverse labeling (i.e., dye switching). This design enabled the evaluation of contributions of biological replication and dye switching separately or in combination. Moreover, 20 randomly selected genes were further analyzed with Northern blots, providing one of the largest sets of validation data in the microarray literature, although still limited from a data analysis point of view.

Fig. 1. A: experimental design of the data set utilized in the present study. Four groups, each containing three individual rats, were subjected to four comparisons. Each comparison involved three different pairs of rats, each examined by both forward and reverse labeling. SSLS, Dahl salt-sensitive rats on a low-salt diet; SSHS, Dahl salt-sensitive rats on a high-salt diet; 13LS, consomic SS.BN13 rats on a low-salt diet; 13HS, consomic SS.BN13 rats on a high-salt diet. B: subsets of microarrays with different combinations used in the present analysis. The three numbers in the abbreviation shown in parentheses represent number of arrays, number of pairs of rats, and number of labeling directions (forward or reverse), respectively. Data subsets generated from one of the four comparisons are shown as examples.
A second data set from a study by Yuan et al. (21) was used to test whether the conclusions drawn from the main data set described above could be applied to other cDNA labeling methods. The study by Yuan et al. (21) used arrays similar to those used in the study described above but utilized a different labeling method using the commercially available TSA Labeling and Detection Kit (MICROMAX; NEN Life Science Products, Boston, MA). The TSA method (21) involved labeling the reverse transcription products generated from total RNA with fluorescein or biotin and subsequent antibody-mediated deposition of Cy3 and Cy5. Data from eight arrays (designated A1 to A8) examining eight control samples (designated C1 to C8) and eight treated samples (designated T1 to T8), each extracted from an individual rat, were analyzed. The dye-labeling pattern was as follows: in A1 to A4, the control samples (i.e., C1 to C4) were labeled with Cy3, and the treated samples (i.e., T1 to T4) were labeled with Cy5; in A5 to A8, the labeling was reversed, i.e., the control samples were labeled with Cy5, and the treated samples were labeled with Cy3.
Identification of Outliers Using an Intensity-Dependent, Continuous Curve of Threshold
A criterion of two times the standard deviation of the entire set of ln(ratio) values was used as the threshold to identify differentially expressed genes (i.e., outliers) in the original study (9). This criterion assumed that expressions of the majority of genes remained unchanged under the experimental conditions examined. However, a large dispersion of ln(ratio) values has been noticed at the lower range of signal intensity, which gradually decreases as intensity increases. Similar dispersion patterns were seen when identical samples were hybridized against each other (1), indicating that it was a systematic technical artifact, rather than a biological phenomenon. With data dispersed in this manner, when a constant threshold such as two times the standard deviation of the entire set of ln(ratio) values is applied, genes with lower intensities have a higher probability of being identified as outliers. To avoid this bias, an algorithm was developed to generate an intensity-dependent, continuous threshold curve. Genes were ranked according to their intensities and divided into consecutive groups, each containing 50 genes. The average of normalized ln(ratio) values in each group was confirmed to be close to 0. The standard deviation of ln(ratio) values as well as the average of ln(intensity) values in each group was calculated. An equation was identified to describe the relationship between two times the ln(ratio) standard deviation of each 50-gene group with the corresponding average of ln(intensity). This equation was then used to calculate a ln(ratio) threshold at the ln(intensity) level of any given gene. If the actual ln(ratio) of a gene exceeded the calculated ln(ratio) threshold, then the gene was considered an outlier. This threshold curve was refitted for each subset of arrays as defined below since each data subset might contain a different number of microarrays.
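The binning-and-fitting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the fit is assumed to be an ordinary least-squares line in ln(intensity) (the form reported in RESULTS), the sample standard deviation is used, any leftover genes that do not fill a complete 50-gene group are dropped, and the threshold is applied symmetrically to positive and negative ln(ratio) values.

```python
import statistics


def fit_threshold_curve(ln_intensity, ln_ratio, bin_size=50):
    """Fit 2*SD of ln(ratio) per intensity-ranked bin against the bin's mean ln(intensity).

    Assumption: a straight line fitted by ordinary least squares.
    Returns (slope, intercept).
    """
    # Rank genes by intensity and cut the ranking into consecutive bins.
    order = sorted(range(len(ln_intensity)), key=lambda i: ln_intensity[i])
    bin_x, bin_y = [], []
    for start in range(0, len(order) - bin_size + 1, bin_size):
        idx = order[start:start + bin_size]
        bin_x.append(statistics.fmean([ln_intensity[i] for i in idx]))
        bin_y.append(2.0 * statistics.stdev([ln_ratio[i] for i in idx]))
    if len(bin_x) < 2:
        raise ValueError("need at least two complete bins to fit a line")
    mean_x, mean_y = statistics.fmean(bin_x), statistics.fmean(bin_y)
    sxx = sum((x - mean_x) ** 2 for x in bin_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bin_x, bin_y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def is_outlier(gene_ln_intensity, gene_ln_ratio, slope, intercept):
    """Flag a gene whose |ln(ratio)| exceeds the threshold at its own ln(intensity)."""
    return abs(gene_ln_ratio) > slope * gene_ln_intensity + intercept
```

Because the curve is refitted for each data subset, `fit_threshold_curve` would be called once per subset on that subset's averaged values.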
Generation of Data Subsets to Separate the Impact of Dye Switching and Biological Replication
To evaluate the impact of dye switching and/or biological replication, we divided data from each of the four comparisons into several subsets of data in six different combinations as shown in Fig. 1B. The combination of "1 array, 1 pair of rats, 1 way of labeling" (1-1-1) constituted a baseline condition where neither dye switching nor biological replication was utilized. The combination of "2 arrays, 1 pair of rats, both ways of labeling" (2-1-2) utilized dye switching when the second array was added, whereas the combination of "2 arrays, 2 pairs of rats, 1 way of labeling" (2-2-1) utilized biological replication. Any changes in the reliability of microarray results in combinations "2-1-2" and "2-2-1" compared with "1-1-1" would reflect the impact of dye switching and biological replication, respectively, in addition to the impact of adding a second array itself. The combination of "2 arrays, 2 pairs of rats, 2 ways of labeling" (2-2-2) would reflect the impact of simultaneous addition of dye switching and biological replication in the second array. The combination of "4 arrays, 2 pairs of rats, both ways of labeling" (4-2-2) reflected the impact of dye switching and biological replication when they were added sequentially, but also reflected the impact of increasing the number of arrays to four. The combination of "6 arrays, 3 pairs of rats, both ways of labeling" (6-3-2) added to the combination of "4-2-2" another biological replicate with both ways of labeling. The ln(ratio) values were averaged for each gene in each subset and used for subsequent analyses. Note that in some combinations such as "2-2-1" and "2-2-2," a pair of rats had to be used in more than one data subset to take advantage of a more complete coverage of the available data. As a result, not all individual subsets of data in these combinations were completely independent of each other. Accordingly, conventional statistical significance was not tested. 
Similar trends were seen when only independent subsets were examined.
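As an illustration of the naming scheme and the per-gene averaging, the following Python sketch labels a subset by its number of arrays, pairs of rats, and labeling directions, and averages ln(ratio) values across the arrays in the subset. The data layout, function names, and example values are hypothetical. The sketch also assumes that ln(ratio) values from reverse-labeled arrays have already been expressed in the same sample orientation as the forward-labeled ones, and that a gene missing from an array is averaged over the remaining arrays.

```python
from statistics import fmean


def subset_label(arrays):
    """Return the 'arrays-pairs-directions' label, e.g. '2-1-2'."""
    n_pairs = len({a["pair"] for a in arrays})
    n_directions = len({a["labeling"] for a in arrays})
    return f"{len(arrays)}-{n_pairs}-{n_directions}"


def average_ln_ratio(arrays):
    """Average ln(ratio) per gene over the arrays in which the gene was measured."""
    genes = set().union(*(a["ln_ratio"] for a in arrays))
    return {
        g: fmean([a["ln_ratio"][g] for a in arrays if g in a["ln_ratio"]])
        for g in genes
    }


# Hypothetical arrays from one comparison (values are made up).
pair1_fwd = {"pair": 1, "labeling": "forward", "ln_ratio": {"geneA": 0.40, "geneB": -0.10}}
pair1_rev = {"pair": 1, "labeling": "reverse", "ln_ratio": {"geneA": 0.20, "geneB": 0.10}}
pair2_fwd = {"pair": 2, "labeling": "forward", "ln_ratio": {"geneA": 0.30}}

dye_switched = [pair1_fwd, pair1_rev]   # labeled "2-1-2"
replicated = [pair1_fwd, pair2_fwd]     # labeled "2-2-1"
```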
Quantification of the Impact of Dye Switching and/or Biological Replication
Three indices were examined to assess the reliability of results obtained from each combination described in Fig. 1B and, thereby, to quantify the importance of dye switching and/or biological replication.
Index 1: Consistency between observed ln(ratio) values and ln(ratio) values predicted on the basis of the loop-like, four-way comparison design.
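With the loop-like, four-way comparison design, ln(ratio) values for any given comparison could be predicted from the ln(ratio) values of the other three comparisons: because the four comparisons form a closed loop, the ln(ratio) of one comparison equals the signed sum of the ln(ratio) values of the other three.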
For each combination of arrays shown in Fig. 1B, the Pearson correlation coefficient and the concordance of outliers were calculated as measures of the consistency between predicted and observed data. The Pearson correlation coefficient was calculated based on predicted ln(ratio) values and observed ln(ratio) values of all available genes. The outlier concordance, expressed as percentage, was calculated as [2 × M/(A + B)] × 100, in which A and B represented the numbers of outliers identified from two data subsets being compared (the predicted and the observed data in this case), and M represented the number of overlapping outliers. The number of outliers varied from one data subset to another but was generally within the range of 30 to 60. Ideally, predicted ln(ratio) values should be identical to observed ln(ratio) values. However, technical variance exists between any two microarrays. Since the predicted ln(ratio) values were essentially the sum of ln(ratio) values from three microarrays (or three sets of microarrays), the variance between predicted ln(ratio) values and observed ln(ratio) values would be greater than the variance that can be expected between any two sets of microarrays. The ability of dye switching and/or biological replication to reduce this composite variance, therefore, provided a sensitive measure of their benefits.
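The prediction from the other three comparisons and the two consistency measures can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, and the `signs` argument stands in for the orientation of each comparison within the loop, which depends on how each ratio was defined and is treated here as an input rather than taken from the original formulas.

```python
import math


def predict_from_loop(other_three, signs=(1, 1, 1)):
    """Predict per-gene ln(ratio) as the signed sum of the other three comparisons.

    `other_three` is a sequence of three dicts mapping gene -> ln(ratio).
    Only genes present in all three comparisons are predicted.
    """
    genes = set(other_three[0]) & set(other_three[1]) & set(other_three[2])
    return {g: sum(s * c[g] for s, c in zip(signs, other_three)) for g in genes}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def outlier_concordance(outliers_a, outliers_b):
    """Concordance in percent: [2 * M / (A + B)] * 100, with M the overlap."""
    a, b = set(outliers_a), set(outliers_b)
    if not a and not b:
        raise ValueError("no outliers in either set")
    return 200.0 * len(a & b) / (len(a) + len(b))
```

For example, two subsets with 4 outliers each, 2 of them shared, give a concordance of 50%.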
Index 2: Consistency between results from subsets of microarrays and the entire set of microarrays.
The Pearson correlation coefficient of ln(ratio) values and the concordance of outliers were calculated for each combination of arrays shown in Fig. 1B (except the combination of "6-3-2") compared with the entire set of arrays (i.e., the combination of "6-3-2"). The ability of dye switching and/or biological replication to increase this consistency was used as a measure of their benefits.
Index 3: Consistency between results from microarrays and Northern blots.
The Pearson correlation coefficient between microarray and Northern blot ln(ratio) values of 20 genes was calculated for each combination of arrays (Fig. 1B) as another index of the reliability of microarray results.
RESULTS
In the study by Liang et al. (9), microarray data were normalized by shifting the mean ln(ratio) of an array to 0. Other normalization methods, such as intensity-dependent normalization (LOWESS correction), pin-by-pin normalization, and scaling, have been described and shown to be beneficial (20). Therefore, the necessity of applying these methods to the data set used in this analysis was examined. The plots of ln(ratio) vs. averaged ln(intensity) were created for eight microarrays (two from each of the four comparisons). None of these arrays exhibited the typical, intensity-dependent deviation of ln(ratio) from 0 (the "Nike swoop" shape) that would constitute the basis for the LOWESS correction. The plots were generally symmetrical around the horizontal axis. An example of this plot can be found in Fig. 2A. Variations between the four printing pins also appeared to be small. The pin-to-pin coefficient of variation of the number of outliers (based on a threshold set for the entire array) was 0.19 ± 0.06 (n = 8), and that of the standard deviation of ln(ratio) [reflecting the range of ln(ratio) in each pin] was 0.10 ± 0.01 (n = 8). The array-to-array coefficient of variation of the standard deviation of ln(ratio) was 0.23. Therefore, it appeared that the benefit of applying additional normalization methods would be minimal in this particular data set, especially if the tradeoff of applying additional normalizations (i.e., the potential to compromise the validity of the assumptions underlying normalizations) was taken into consideration. These results support the notion that substantial differences exist among various array platforms and that normalization methods should be chosen based on the characteristics of specific data sets.

Fig. 2. A: an example of the intensity-dependent dispersion of ln(ratio). Yellow curves indicate the global, intensity-independent threshold of differential expression determined by two times the ln(ratio) standard deviation of the entire array. Purple curves indicate the intensity-dependent, continuous thresholds of differential expression determined by the linear regression equation obtained in B. B: two times the ln(ratio) standard deviation of each 50-gene bin was linearly correlated with the averaged ln(intensity) of each bin.
Figure 2A depicts a typical distribution of ln(ratio) values over ln(intensity) values in a cDNA microarray hybridization. The dispersion of ln(ratio) values decreased as the ln(intensity) increased. If two times the standard deviation of the entire set of ln(ratio) values, 0.588, was used as the threshold of differential expression (the yellow lines in Fig. 2A), then a disproportionately large number of genes at the lower intensity range would be identified as differentially expressed. When two times the standard deviation of ln(ratio) values in each 50-gene bin was plotted against the averaged ln(intensity) of the bin, a linear relationship was revealed (Fig. 2B). The linear regression equation, ln(ratio) = −0.10 × ln(intensity) + 0.36, with a Pearson correlation coefficient of −0.78, was then used to calculate a threshold ln(ratio) for each gene based on its ln(intensity). These threshold ln(ratio) values formed continuous lines shown in purple in Fig. 2A. Differentially expressed genes (i.e., outliers) identified in this way were used in the following calculation of outlier concordance.
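To make the contrast between the two thresholds concrete, the short Python sketch below applies the reported regression equation and the global threshold of 0.588 to two genes. The ln(intensity) and ln(ratio) values of these genes are hypothetical and chosen only for illustration, since the scale of the adjusted intensities is not restated here. The threshold is assumed to apply to the absolute value of ln(ratio).

```python
# Coefficients and global threshold as reported in the text for the example array.
SLOPE = -0.10
INTERCEPT = 0.36
GLOBAL_THRESHOLD = 0.588


def intensity_threshold(ln_intensity):
    """Threshold ln(ratio) at a given ln(intensity) from the reported regression line."""
    return SLOPE * ln_intensity + INTERCEPT


def classify(ln_intensity, ln_ratio):
    """Return (outlier by global threshold, outlier by intensity-dependent threshold)."""
    magnitude = abs(ln_ratio)
    return magnitude > GLOBAL_THRESHOLD, magnitude > intensity_threshold(ln_intensity)


# Hypothetical low-intensity gene: flagged by the global threshold only.
low_intensity_gene = classify(ln_intensity=-4.0, ln_ratio=0.65)

# Hypothetical high-intensity gene: flagged by the intensity-dependent threshold only.
high_intensity_gene = classify(ln_intensity=2.0, ln_ratio=0.30)
```

Under these assumed inputs, the low-intensity gene faces a threshold of about 0.76 and the high-intensity gene a threshold of about 0.16, which is how the continuous curve evens out the representation of outliers across the intensity range.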
The Pearson correlation coefficient (r) between observed ln(ratio) values and predicted ln(ratio) values based on subsets of microarrays, each containing a single microarray (the combination "1-1-1," Fig. 1) was 0.38 ± 0.06 (n = 12, Fig. 3A), and the outlier concordance was 21 ± 3% (n = 12, Fig. 3B). Adding a second array examining the same pair of rats, but with a reverse labeling (the combination "2-1-2"), substantially increased the correlation coefficient to 0.62 ± 0.04 (n = 12) and the outlier concordance to 43 ± 4% (n = 12). When a second array was added to examine a different pair of rats with the same way of labeling (the combination "2-2-1"), the correlation coefficient was similarly increased to 0.62 ± 0.03 (n = 12), while the outlier concordance increased to 35 ± 3% (n = 12). Adding a second array examining a different pair of rats with a reverse labeling (the combination "2-2-2") did not increase the correlation coefficient (0.38 ± 0.08, n = 12) and only slightly increased the outlier concordance to 26 ± 4% (n = 12). Increasing the number of arrays to four or six to include two or three pairs of rats, each examined with forward and reverse labeling (combinations "4-2-2" or "6-3-2," n = 4 each), resulted in greater increases in the correlation coefficient that reached 0.69 ± 0.04 or 0.79 ± 0.03. In addition, the outlier concordance was increased to 52 ± 4% or 56 ± 4%. An example of the correlation for each combination is shown in Fig. 3, C–H.

Fig. 3. Effects of dye switching and/or biological replication on the consistency between observed and predicted results. See METHODS for the calculation of predicted results. A: correlation coefficient (means ± SE of data subsets generated from the entire study; see the text for n numbers) between observed ln(ratio) values and predicted ln(ratio) values. B: concordance (means ± SE; see the text for n numbers) between the outliers identified from observed ln(ratio) values and predicted ln(ratio) values. C–H: representative examples of correlation between observed and predicted ln(ratio) values obtained from each combination of arrays; 1-1-1 = 1 array, 1 pair of rats, 1 way of labeling; 2-1-2 = 2 arrays, 1 pair of rats, forward and reverse labeling; 2-2-1 = 2 arrays, 2 pairs of rats, 1 way of labeling; 2-2-2 = 2 arrays, 2 pairs of rats, one with forward and one with reverse labeling; 4-2-2 = 4 arrays, 2 pairs of rats, forward and reverse labeling for each pair of rats; 6-3-2 = 6 arrays, 3 pairs of rats, forward and reverse labeling for each pair of rats.
As shown in Fig. 4, A–G, the correlation coefficient and outlier concordance between subsets of arrays and the entire set of six arrays followed the same trend of changes as the predictability described above. The exception to this was the combination of two arrays examining two pairs of rats with one forward and the other reverse labeling ("2-2-2"). The correlation coefficient and outlier concordance between the combination "2-2-2" and the entire set of six arrays reached a level similar to or slightly higher than that achieved when two arrays examining one pair of rats with both ways of labeling ("2-1-2") were evaluated (Fig. 4).

Fig. 4. Effects of dye switching and/or biological replication on the consistency between results from subsets of microarrays and the entire set of six microarrays. A: correlation coefficient (means ± SE; see the text for n numbers) between ln(ratio) values from subsets of microarrays and the entire set of microarrays. B: concordance (means ± SE; see the text for n numbers) between the outliers identified from subsets of microarrays and the entire set of microarrays. C–G: representative examples of correlation between ln(ratio) values obtained from each subset of microarrays and the entire set of microarrays.
The correlation coefficients between array ln(ratio) values and Northern blot ln(ratio) values of 20 randomly selected genes also followed a similar trend (Fig. 5).

Fig. 5. Effects of dye switching and/or biological replication on the correlation coefficient (mean ± SE; see the text for n numbers) between ln(ratio) values of 20 genes obtained from microarrays and Northern blots.
To test whether these results were only associated with the particular cDNA labeling method used in the study by Liang et al. (9), a second data set (21) generated using a different labeling method was analyzed (see METHODS). Only the second consistency index (i.e., the consistency between data subsets with the entire data set) was calculated for this analysis due to the lack of a loop-like design needed for the first index and the limited number of Northern blots. Two types of data subsets were created, 2-2-1 (2 arrays, 2 pairs of rats, 1 way of labeling) and 2-2-2 (2 arrays, 2 pairs of rats, one with forward and the other with reverse labeling). Four individual subsets were created for each type. An example of 2-2-1 would be the combination of arrays A1 and A2, whereas the combination of A1 and A5 would be an example of 2-2-2. So the only difference between 2-2-1 and 2-2-2 was that 2-2-2 contained dye switching, whereas 2-2-1 did not. Variations between different pairs of rats were random and therefore should, on average, have equal impact on "2-2-1" and "2-2-2". When these subsets were compared with the entire set of arrays, it was found that the correlation coefficient was 0.76 ± 0.05 for 2-2-1, and it was 0.89 ± 0.02 for 2-2-2. The outlier concordance was 31 ± 6% for 2-2-1 and 47 ± 2% for 2-2-2.
DISCUSSION
Indices of Benefits
To quantitatively assess the benefits of dye switching and biological replication, one would ideally want to compare microarray results (with or without dye switching and/or biological replication) to a "gold standard" mRNA measurement method. This is, however, practically difficult. None of the mRNA measurement techniques currently available has been accepted as a "gold standard" by all biologists. Moreover, techniques that have been used to validate microarray results, such as Northern blotting and real-time PCR, are difficult to use at a throughput level high enough to allow large-scale comparisons with microarray.
In the absence of a "gold standard," it is still possible to decompose sources of variation and assess the relative contribution of each source to the overall variation (7, 18). The purpose of the present study, however, was to assess the impact of dye switching and biological replication on the reliability of microarray results. Reliability may or may not be equivalent to reproducibility measured by variation, depending on how these terms are defined. In the setting of biological experiments such as the one analyzed in the present study, reliability can be further defined as precision (i.e., how precisely the data reflect the subjects being measured) and "generalizability" (i.e., how well the conclusions derived from the measurement of a limited number of subjects can be extrapolated to a larger population).
In the present analysis, we took advantage of the unique characteristics of a published data set (9) and used the combination of three indices to assess the impact of dye switching and biological replication on the precision and/or generalizability of microarray results. Each index has advantages and disadvantages. The predictability index was used to assess precision, because if each measurement were a precise representation of the subject, then the measured and the predicted data would be identical. One could argue that this index was in fact reflecting reproducibility in repeated measurements of a subject, which may or may not be equivalent to precision. The measured and the predicted data would be identical so long as the repeated measurements were reproducible, even though they might not be precise. However, in the absence of a "gold standard," reproducibility in repeated measurements does provide a reasonable indication of precision. An advantage of this index is that it is free of any assumptions regarding the benefits of dye switching or biological replication. The disadvantage is that it does not reflect generalizability. The use of the consistency with the entire set of arrays had the disadvantage of assuming qualitative benefits of dye switching and biological replication because both procedures were utilized in the entire set of arrays. However, so long as this assumption was acceptable, the relative ability of dye switching and/or biological replication in each subset of arrays to bring the results closer to the entire set of arrays would provide a straightforward measure of the quantitative benefits of dye switching and/or biological replication in extrapolating the results to the whole population, i.e., the generalizability of the results. The obvious advantage of the comparison with Northern blots was the use of an independent second technique, and it could reflect both precision and generalizability. 
The disadvantage was that the number of genes for which both microarray and Northern blot data were available was limited, reducing the power of this index. In addition, because of the lack of a "gold standard," one could always question the relative reliability of microarray vs. Northern blot. Therefore, despite the limitations of each index, the three indices appear to complement each other. Consistent trends observed in more than one of them would provide a strong indication of improvements in data reliability.
Relative Benefits of Dye Switching and Biological Replication
One of these consistent trends was the improvement of all three indices when a second array was added using the reverse labeling to examine the same pair of rats (i.e., dye switching). A 63% increase in correlation coefficient and a doubling of outlier concordance between observed and predicted data were obtained. Similar improvements were found when comparing between subsets of arrays and the entire set of arrays. The data set available did not allow quantitative distinction between the effect of dye switching and the effect of simply adding a second array. However, the improvement in consistency that was observed very likely involved the benefits of dye switching, because other combinations containing two arrays did not achieve the same level of improvement. In fact, the improvement achieved by adding a second array labeled in the same way but to examine a different pair of rats (i.e., biological replication) was often less than that obtained by dye switching. These results indicated that both dye switching and biological replication improved the reliability of microarray results, with dye switching likely having even greater benefits.
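The relative changes quoted for dye switching follow directly from the means reported for the "1-1-1" and "2-1-2" combinations. As a worked check, using rounded means and ignoring the standard errors:

```latex
% Index 1: observed vs. predicted, 1-1-1 -> 2-1-2
\[
\frac{0.62 - 0.38}{0.38} \approx 0.63
\quad \text{(a 63\% increase in } r\text{)},
\qquad
\frac{43\%}{21\%} \approx 2.0
\quad \text{(a doubling of outlier concordance)}
\]
% Index 2: subset vs. entire set of arrays, 1-1-1 -> 2-1-2
\[
\frac{0.79 - 0.60}{0.60} \approx 0.32,
\qquad
\frac{58\%}{31\%} \approx 1.9
\]
```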
The ln(ratio) data used in these analyses had been normalized by adjusting the mean ln(ratio) of each array to 0 (9). It therefore appears that normalization alone was not sufficient to remove the influence of the dye difference. This was consistent with the remarkably strong effect of the dye difference on microarray results, such as that reported by Jin et al. (5), supporting the notion that dye switching is required for obtaining reliable microarray results. The exact nature and the mechanism underlying dye biases are not clear at present. Further experiments and a deeper understanding of the physicochemical characteristics of the dyes and their binding kinetics are needed to address these questions.
The importance of replication in microarray experiments has been emphasized (6, 8, 12). The present analysis showed that biological replication, even when applied in the absence of dye switching, also appeared to have substantial benefits. The magnitude of the impact of biological replication depends highly on the level of naturally existing individual variability in each specific experimental setting. To determine exactly how many replicates are needed for a specific experiment, one would have to determine the variability level of each gene of interest, the magnitude of expression differences expected, and the statistical power desired. Several studies have examined the "normal" variability of gene expression levels (3, 4, 10, 13), providing a prototype of this kind of assessment.
When dye switching and biological replication were included simultaneously in the second array added (the combination of "2-2-2"), consistency with the entire set of arrays was substantially improved to a level similar to or slightly higher than that achieved by the combination of "2-1-2" (i.e., dye switching without biological replication). However, the predictability was only minimally increased compared with a single array. This was perhaps a result of the different nature of these two indices. The consistency with the entire set of arrays essentially reflects the generalizability of the results, that is, the ability to extrapolate the results to the whole population. The predictability, on the other hand, was an index of precision, that is, the accuracy in the measurements of the samples being examined. Compared with a single array, adding a second array with reverse labeling and examining a different pair of rats enhanced the resemblance of the combination structure with the entire set of arrays. It thereby increased the generalizability of the results. However, the second array was used to examine a second pair of rats and with reverse labeling. This, therefore, did not improve the precision of the measurement of mRNA levels in either pair of rats involved and did not substantially improve the predictability.
The improvement of index 2 by the inclusion of dye switching was observed in a second data set generated using a different cDNA labeling method (21). It is important to keep in mind that this index assumes that dye switching is qualitatively beneficial. Therefore, we cannot use this index alone to draw conclusions regarding the benefit of dye switching. However, the fact that this index performed differently for combinations 2-2-1 (without dye switching) and 2-2-2 (with dye switching) supported the notion that dye-labeling patterns had an effect on the results obtained using this labeling method, which is consistent with the conclusion drawn from the analysis of the data from Liang et al. (9).
Determining Thresholds of Differential Expression
Determining the threshold of differential expression is a major issue in microarray studies. A fixed fold change was widely used in earlier studies, often without a convincing rationale. A standard deviation-based threshold (9), predetermined P value threshold (5, 21), corrected P values (17) and "null distribution"-based approaches (4, 10) have also been applied. The intensity-dependent dispersion of ratios has been noted previously (11). The intensity-dependent, continuous threshold curve utilized in the present analysis was similar to that developed by Mutch et al. (11), except that a logarithmic function, instead of an inverse function, was used in the present analysis. Genes identified as differentially expressed using this equation contained a more consistent representation of genes across the entire range of ln(intensity) as shown in Fig. 2A.
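The idea of an intensity-dependent, continuous threshold can be illustrated with a short sketch. The functional form and the coefficients below are assumptions chosen only for illustration; they are not the equation or the fitted values used in the present analysis. The sketch assumes a cutoff on |ln(ratio)| that decreases logarithmically as ln(intensity) increases and is bounded below by a fixed floor.

```python
import math


def ln_ratio_threshold(ln_intensity, a=1.5, b=0.5, floor=0.4):
    """Hypothetical cutoff on |ln(ratio)| as a function of ln(intensity).

    The cutoff is a - b * ln(ln_intensity), never lower than `floor`.
    a, b and floor are placeholder values, not fitted coefficients.
    """
    if ln_intensity <= 0:
        raise ValueError("ln_intensity must be positive for this sketch")
    return max(floor, a - b * math.log(ln_intensity))


def is_differential(ln_ratio, ln_intensity, **coefficients):
    """True when |ln(ratio)| exceeds the intensity-dependent cutoff."""
    return abs(ln_ratio) > ln_ratio_threshold(ln_intensity, **coefficients)


if __name__ == "__main__":
    # The same ln(ratio) can fall below the cutoff for a dim spot
    # and above it for a bright spot.
    for x in (1.0, 4.0, 10.0):
        print(x, round(ln_ratio_threshold(x), 3), is_differential(0.9, x))
```

A curve of this kind demands a larger ratio from dim spots, where ratios are more widely dispersed, than from bright spots, which is why the selected genes are spread more evenly across the intensity range than with a fixed fold change.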
It is important to point out that although dye biases and intensity-dependent effects could be partially related, they are in essence two distinct problems. Furthermore, two types of intensity-dependent effects need to be distinguished. One is the intensity-dependent dispersion of log-transformed ratios (i.e., a wider dispersion of ratios at lower intensity levels), which we observed in our data set and addressed by using the threshold curve. The other is the intensity-dependent deviation of log-transformed ratios from 0, i.e., the "Nike swoop" shape (20), which we did not observe in our data set.
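The two intensity-dependent effects can be told apart with a simple binned summary. The sketch below is an illustrative diagnostic, not a procedure taken from the present analysis; the function name and the equal-count binning are assumptions. Within each intensity bin, a standard deviation of ln(ratio) that shrinks as intensity rises indicates intensity-dependent dispersion, whereas a mean ln(ratio) that drifts away from 0 indicates intensity-dependent deviation.

```python
from statistics import mean, stdev


def bin_summary(spots, n_bins=4):
    """Summarize ln(ratio) within equal-count bins of ln(intensity).

    spots : iterable of (ln_intensity, ln_ratio) tuples
    Returns a list of (lowest_ln_intensity, highest_ln_intensity,
    mean_ln_ratio, sd_ln_ratio), ordered from dim to bright spots.
    """
    ordered = sorted(spots)
    size = len(ordered) // n_bins
    if size < 2:
        raise ValueError("need at least two spots per bin")
    summary = []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any spots left over after even division.
        end = len(ordered) if i == n_bins - 1 else start + size
        chunk = ordered[start:end]
        ratios = [ln_ratio for _, ln_ratio in chunk]
        summary.append((chunk[0][0], chunk[-1][0], mean(ratios), stdev(ratios)))
    return summary


if __name__ == "__main__":
    # Synthetic spots: wide scatter at low intensity, narrow at high,
    # both centered on 0 (dispersion without deviation).
    dim = [(1 + 0.1 * i, 1.0 if i % 2 else -1.0) for i in range(10)]
    bright = [(5 + 0.1 * i, 0.1 if i % 2 else -0.1) for i in range(10)]
    for row in bin_summary(dim + bright, n_bins=2):
        print(row)
```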
Summary and Recommendations
The present analysis indicated that both dye switching and biological replication improved the reliability of microarray results. Dye switching appears to yield greater benefits. The selection of experimental design is governed by scientific logic but can also be influenced by practical issues such as the availability of materials or resources. The results of this analysis argue against sacrificing dye switching and biological replication for the sake of reducing costs or experimental complexity. Based on these analyses, we propose the following guidelines for designing cDNA microarray experiments when only a small, fixed number of microarrays is available for a particular study. If the main purpose of the experiment is to obtain estimates of the whole population, then each array should be used to examine a different pair of samples, with dyes reversed in half of the pairs. If obtaining accurate measurements for the samples examined is the main concern, then two arrays with dye switching should be used to examine each pair of samples. If both the generalizability and the precision are desired, then the second design is preferred because, compared with the first design, the gain of precision appears quantitatively much greater than the loss of generalizability. It is important to note that these guidelines are developed based on physiologically oriented experiments using cDNA microarray techniques described in the two studies analyzed (9, 21). Caution should be taken when applying these guidelines to experiments with drastically different characteristics or using other types of microarray techniques.
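The two designs recommended above can be written out as explicit array layouts. The following sketch is only an illustration of those guidelines; the function name, the goal labels, and the output fields are hypothetical. In the "population" layout every array examines a different pair of samples and dyes are reversed on every other array. In the "precision" layout each pair of samples is examined on two arrays, the second with dyes reversed.

```python
def design_arrays(n_arrays, goal):
    """Lay out a small, fixed number of arrays for one of two goals.

    goal = "population": one pair of samples per array, dyes reversed
                         in half of the pairs (every other array).
    goal = "precision" : each pair of samples on two arrays, the second
                         with dyes reversed; n_arrays must be even.
    Returns a list of dicts with the array number, the pair number and
    the dye orientation.
    """
    if n_arrays < 1:
        raise ValueError("n_arrays must be at least 1")
    if goal == "population":
        pair_of = lambda i: i + 1
    elif goal == "precision":
        if n_arrays % 2:
            raise ValueError("the precision design needs an even number of arrays")
        pair_of = lambda i: i // 2 + 1
    else:
        raise ValueError("goal must be 'population' or 'precision'")
    return [
        {
            "array": i + 1,
            "pair": pair_of(i),
            "dye": "forward" if i % 2 == 0 else "reversed",
        }
        for i in range(n_arrays)
    ]


if __name__ == "__main__":
    for goal in ("population", "precision"):
        print(goal)
        for row in design_arrays(4, goal):
            print("  ", row)
```

With four arrays, the first layout covers four pairs of samples once each, and the second covers two pairs twice each, which makes the trade-off between generalizability and precision explicit.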
DISCLOSURES
This study was supported by National Heart, Lung, and Blood Institute Grants HL-66579, HL-54998, and HL-29587.
Editor S. R. Gullans served as the review editor for this manuscript submitted by Editor A. W. Cowley, Jr.
ACKNOWLEDGMENTS
We gratefully acknowledge Meredith Skelton for critical review of the manuscript, and the Microarray Group in the Department of Physiology at the Medical College of Wisconsin for helpful discussion.
FOOTNOTES
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: M. Liang, Dept. of Physiology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226 (E-mail: mliang@mcw.edu).
10.1152/physiolgenomics.00143.2002.
REFERENCES
1. Amaral SL, Liang M, Rute E, Cowley AW Jr, and Greene AS. cDNA microarray analysis of gene expression in skeletal muscle angiogenesis after chromosomal substitution in Dahl S rats (Abstract). Hypertension 40: 396, 2002.
2. Churchill GA. Fundamentals of experimental design for cDNA microarrays. Nat Genet 32: 490-495, 2002.
3. Hsiao LL, Dangond F, Yoshida T, Hong R, Jensen RV, Misra J, Dillon W, Lee KF, Clark KE, Haverty P, Weng Z, Mutter GL, Frosch MP, Macdonald ME, Milford EL, Crum CP, Bueno R, Pratt RE, Mahadevappa M, Warrington JA, Stephanopoulos G, Stephanopoulos G, and Gullans SR. A compendium of gene expression in normal human tissues. Physiol Genomics 7: 97-104, 2001. First published October 2, 2001; 10.1152/physiolgenomics.00040.2001.
4. Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, and Friend SH. Functional discovery via a compendium of expression profiles. Cell 102: 109-126, 2000.
5. Jin W, Riley RM, Wolfinger RD, White KP, Passador-Gurgel G, and Gibson G. The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat Genet 29: 389-395, 2001.
6. Kerr MK and Churchill GA. Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments. Proc Natl Acad Sci USA 98: 8961-8965, 2001.
7. Kerr MK, Martin M, and Churchill GA. Analysis of variance for gene expression microarray data. J Comput Biol 7: 819-837, 2000.
8. Lee ML, Kuo FC, Whitmore GA, and Sklar J. Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci USA 97: 9834-9839, 2000.
9. Liang M, Yuan B, Rute E, Greene AS, Zou AP, Soares P, McQuestion GD, Slocum GR, Jacob HJ, and Cowley AW Jr. Renal medullary genes in salt-sensitive hypertension: a chromosomal substitution and cDNA microarray study. Physiol Genomics 8: 139-149, 2002. First published January 2, 2002; 10.1152/physiolgenomics.00083.2001.
10. Liang M, Yuan B, Rute E, Greene AS, Olivier M, and Cowley AW Jr. Insights into Dahl salt-sensitive hypertension revealed by temporal patterns of renal medullary gene expression. Physiol Genomics 12: 229-237, 2003. First published December 10, 2002; 10.1152/physiolgenomics.00089.2002.
11. Mutch DM, Berger A, Mansourian R, Rytz A, and Roberts MA. The limit fold change model: a practical approach for selecting differentially expressed genes from microarray data. BMC Bioinformatics 3: 17, 2002.
12. Oleksiak MF, Churchill GA, and Crawford DL. Variation in gene expression within and among natural populations. Nat Genet 32: 261-266, 2002.
13. Pritchard CC, Hsu L, Delrow J, and Nelson PS. Project normal: defining normal variance in mouse gene expression. Proc Natl Acad Sci USA 98: 13266-13271, 2001.
14. Quackenbush J. Computational analysis of microarray data. Nat Rev Genet 2: 418-427, 2001.
15. Quackenbush J. Microarray data normalization and transformation. Nat Genet 32: 496-501, 2002.
16. Schena M, Shalon D, Davis RW, and Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467-470, 1995.
17. Slonim DK. From patterns to pathways: gene expression data analysis comes of age. Nat Genet 32: 502-508, 2002.
18. Wang X, Ghosh S, and Guo SW. Quantitative quality control in microarray image processing and data acquisition. Nucleic Acids Res 29: E75, 2001.
19. Yang YH and Speed T. Design issues for cDNA microarray experiments. Nat Rev Genet 3: 579-588, 2002.
20. Yang YH, Dudoit S, Luu P, and Speed TP. Normalization for cDNA Microarray Data (Technical Report no. 589). Berkeley, CA: Dept. of Statistics, Univ. of California at Berkeley, 2000.
21. Yuan B, Liang M, Yang Z, Rute E, Taylor N, Olivier M, and Cowley AW Jr. Gene expression reveals vulnerability to oxidative stress and interstitial fibrosis of the renal outer medulla to non-hypertensive elevations of angiotensin II. Am J Physiol Regul Integr Comp Physiol 284: R1219-R1230, 2003. First published January 23, 2003; 10.1152/ajpregu.00257.2002.