A statistical method for flagging weak spots improves normalization and ratio estimates in microarrays
M. C. K. YANG1,
Q. G. RUAN2,
J. J. YANG1,
S. ECKENRODE2,
S. WU1,
R. A. MCINDOE2 and
J. X. SHE2
1 Department of Statistics, University of Florida
2 Department of Pathology, Immunology, and Laboratory Medicine, Center for Mammalian Genetics and Diabetes Center of Excellence, University of Florida, Gainesville, Florida, 32610-0275
 |
ABSTRACT
|
---|
Over the last few years, there has been a dramatic increase in the use of cDNA microarrays to monitor gene expression changes in biological systems. Data from these experiments are usually transformed into expression ratios between experimental samples and a common reference sample for subsequent data analysis. The accuracy of this critical transformation depends on two major parameters: the signal intensities and the normalization of the experiment vs. reference signal intensities. Here we describe and validate a new model for microarray signal intensity that has one multiplicative variation and one additive background variation. Using replicative experiments and simulated data, we found that the signal intensity is the most critical parameter that influences the performance of normalization, accuracy of ratio estimates, reproducibility, specificity, and sensitivity of microarray experiments. Therefore, we developed a statistical procedure to flag spots with weak signal intensity based on the standard deviation (
ij) of background differences between a spot and the neighboring spots, i.e., a spot is considered as too weak if the signal is weaker than c
ij. Our studies suggest that normalization and ratio estimates were unacceptable when this threshold (c) is small. We further showed that when a reasonable compromise of c (c = 6) is applied, normalization using trimmed mean of log ratios performed slightly better than global intensity and mean of ratios. These studies suggest that decreasing the background noise is critical to improve the quality of microarray experiments.
microarray; gene expression; statistics; normalization; functional genomics
 |
INTRODUCTION
|
---|
THE TREMENDOUS ADVANCE of the human genome project and development of new high-throughput technologies has created unparalleled opportunities to study the mechanism of disease, monitor disease progression, and evaluate effective therapies. As more and more genes are being identified, it has become extremely important to understand the function of these genes and their pathways. Global gene expression analysis is a critical component of this ambitious endeavor. Microarray technologies offer investigators an opportunity to simultaneously monitor the expression of a large number of genes in the context of their biological system.
For this study, we concentrated on microarray-based studies monitoring RNA expression levels using cDNA microarrays printed on glass microscope slides. Brown and coworkers (2, 3, 5, 11, 12) developed the protocols widely used to do this type of assay. The basic strategy for this type of analysis is to isolate RNA from two sources, a reference and an experimental sample. The RNA samples are converted to cDNA and labeled with a fluorophore, typically Cy3 or Cy5. The two probes are combined and hybridized to the microarray. Following the hybridization and washes, the array is scanned at both 532 nm (Cy3) and 635 nm (Cy5) to detect the labeled cDNA that has hybridized to the array. The two images produced from the scanner are combined, and the data for each spot (gene) is collected (along with background and error measurements). The data used for subsequent analysis is typically expressed in the form of a ratio of experimental expression to reference expression. The hybridizations are repeated multiple times to ensure reproducibility and confidence in the measurement. Once the data from several hybridizations are available, a variety of clustering and statistical methods can be used to determine which genes to study further (6, 7, 13).
One of the most critical steps in microarray experiments is to accurately assess the expression ratios between the sample and reference in a dual color experiment because most subsequent analyses of the data, such as cluster analysis (6), depend on the accuracy of these ratios. Even statistical decision rules such as shrinkage estimation (9) or hypothesis testing on yes/no types of expression differences depend on these original ratios. The ratio estimation consists of two major steps: 1) to eliminate elements with weak expression that are statistically too close to the background to avoid the detrimental effect in the ratio and 2) to adjust the expression for each gene by the overall expression. The ratio estimate follows by simply taking the ratio of the two adjusted expressions or the difference of the log transformation of the expressions. We used experimental replicates to determine the appropriate rules for each of the two steps.
 |
MATERIALS AND METHODS
|
---|
cDNA Clones and PCR Products
Each microarray used in this study contained 11,520 mouse cDNA clones. This set of clones was derived from a normalized nonobese diabetic (NOD) spleen cDNA library created in our laboratory by the subtraction of NOD spleen cDNA from B6 spleen cDNA using a published PCR strategy and a kit from Clontech (4). To produce microarrays, cDNA clone inserts were amplified directly from bacterial culture in 96-well plates using PCR primer pairs that anneal to the vector adjacent to the cloning site for these inserts. PCR products were purified by ethanol precipitation. The target DNA was then resuspended in 3x SSC and stored at -20°C until needed for printing.
Microarray Printing
The glass slides used in the production of the arrays were coated in-house with poly-L-lysine using a previously described protocol (10). The microarrays were printed using a GMS 417 arrayer (Affymetrix, Santa Clara, CA) and subsequently processed using previously described protocols (10). The details of these protocols can be found at http://www.genomics.ufl.edu/microarray. The microarrays can then be stored for several months in a dehumidified chamber.
RNA/Probe Preparation, Hybridization, and Data Capture
Total RNA was isolated from the spleen of NOD mice at 4 wk, 5 wk, and 20 wk of age using the Qiagen RNA extraction kits per the manufacturers instructions. The reference RNA was a pool of total RNA from 4-wk-old spleens from 10 NOD and 10 C57BL/6 animals. In our current assays, we can obtain excellent hybridization signals with 1015 µg of total RNA. The hybridization probes are produced using an amino-allyl labeling strategy. This labeling strategy incorporates an amino-allyl modified dUTP [5-(3-aminoallyl)-2'-deoxyuridine 5'-triphosphate (aadUTP), from Sigma] during cDNA synthesis. Following the synthesis, a monofunctional NHS-ester Cye dye (either Cy5 or Cy3) is coupled to the modified dUTP in a sodium bicarbonate buffer. This labeling procedure circumvents the problem of a low incorporation rate with either the Cy3- or Cy5-coupled dUTP. The aadUTP will incorporate into the growing cDNA strands at rates similar to the unmodified nucleotide. After coupling the Cye dye to the aadUTP, the unincorporated dyes are removed using a Qia-Quick PCR purification kit (Qiagen). Once the reference and experimental probes are created, the hybridization is accomplished under a coverslip in a specially designed hybridization chamber (Array-It). The hybridization of the probes to the array is allowed to proceed for 16 h. Once the hybridizations were complete, detection of the hybridized probes to the array is accomplished using the GMS 418 Scanner (Affymetrix, Santa Clara, CA). The image data was captured by scanning the slide twice, the first time at 532 nm (Cy3-labeled probes) and the second time at 635 nm (Cy5-labeled probes). This process produces two 16-bit TIFF files, one for each wavelength. These images were then merged and analyzed using Scanalyze by Michael Eisen (6).
Four experiments (exp 1, 2, 3, and 4) were carried out for this study. The first experiment consists of three independent reference/reference hybridizations (RR1, RR2, and RR3) using the reference RNA (a pool of total spleen RNA samples from ten 4-wk-old NOD and ten 4-wk-old B6 mice) labeled independently with both Cy3 and Cy5. Since the same RNA was labeled with both fluorophores, the expected ratio is one for every target gene on the slide. Experiments 2, 3, and 4 were hybridizations using the same reference sample against spleen RNA from an NOD mouse at 4, 5, and 20 wk (diabetic) of age, respectively. Each experiment consists of three replicates.
Statistical Model, Estimation Theory, and Justification
Data output from a dual color microarray experiment contains the average fluorescence intensities for the target and background intensities for both the reference and experimental samples. On each microarray, let
 | (1) |
represent the target and background fluorescence intensities for sample (color) i, (i = 1,2), at spot j (j = 1, 2,..., n), where n is the number of spots in the microarray.
The observed fluorescent intensities for each spot j is determined primarily by two parameters: the amount of probes available for hybridization and the amount of target DNA on the microarray (dj for spot j). In dual color microarray experiments, the amount of target DNA for a given spot is identical for the reference and experimental sample. The amount of probe is proportional to the mRNA expression levels (the fraction of mRNA for gene j among the total mRNA for all expressed genes) in the reference and experimental samples (
1j and
2j for gene j) and the total amount of mRNA used for labeling in the corresponding samples (a1 and a2, respectively). The amount of probe available for hybridization to spot j should be a1
1j and a2
2j, and the fluorescence intensities at spot j should be a1dj
1j and a2dj
2j after hybridization. Both a1dj
1j and a2dj
2j are subject to random error due to variation in measuring mRNA concentration (measurement error and purity), efficiency of probe preparation (labeling and purification), and hybridization efficiency. It is reasonable to assume that the variation (noise) is proportional to the true values, i.e., the fluorescence after hybridization can be represented as aidj
ij x exp(
ij) where
ij is a random variable with an ideal value of 0. A proportional constant may be added to the expression aidj
ij x exp(
ij); however, it is not necessary, because we are only interested in the ratio of expressions. A proportional constant would disappear in the ratio. We let
ij = aidj
ij as the ideal expression level for gene j in color i.
There is another type of noise from the fluorescence of the background due to the experimental conditions of the hybridization. The magnitude of this variation (noise) is variable across the glass slide. We assume that this noise is additive. Combined with the multiplicative variation (
ij), the observed target intensity for sample i at spot j is
 | (2) |
where b'ij represents the additive background intensity inside the spots. This inside background intensity cannot be measured, although the background noise bij surrounding the spots can be measured. The variables b'ij and bij are not identical, but we assume they have the same distribution.
The true sample to reference expression ratio (rj) and its log transformation (
j) are defined as
 | (3) |
Since the observed values (tij and bij) are not sufficient to estimate a1, dj,
ij,
ij, and b'ij, we will make the following assumptions on the distribution of b'ij - bij and
1j -
2j.
Assumption 1.
The background noise is additive as defined in Eq. 2. Moreover, the differences of b'ij and bij (b'ij - bij) are independent random variables with 0 mean and variance
ij2, where
ij2 is almost a constant among neighboring spots. In other words, the large variation in background noise can be stabilized by taking the difference of adjacent spots.
Assumption 2.
The differences
1j -
2j are independent and identically distributed normal random variables for all spots j.
We will adjust the observed intensities by subtracting the background measurement as
 | (4) |
A spot will be eliminated from further analysis if t'ij is small. The appropriate threshold for elimination depends on the background noise. When the background standard deviation (
ij) defined in assumption 1 is estimated as
ij, a spot is discarded if
 | (5) |
where c is a user-defined constant that will be determined later. It turns out, quite intuitively, that c cannot be too small. When c is large, the background noise contributes very little to the observed signal in Eq. 4, and then if we take the logarithm on Eq. 4, it becomes
 | (6) |
where
ij = ln[1 + (b'ij - bij)/(aidj
ije
ij)]
(b'ij - bij)/(aidj
ije
ij) is approximately a small zero mean random variable and becomes negligible if the signal is strong. Therefore, the observed log ratio (defined as yj) can be represented as
 | (7) |
where
= ln(a2/a1) and is the normalization constant,
j = ln(rj), and
j represents the last four random terms. The value
(the normalization constant) is used to compensate for the fact that each color is scanned independently under a different laser power and gain for each wavelength in a two-color microarray experiment. The true ratio rj (or log ratio
j) is not estimable without first determining this constant. However, to estimate the normalization constant
, we face the confounding problem of
and the mean value of
j. Actually the mean of
j is inseparable from
because an increase in a2 is equivalent to the increase in all rj. Using mathematical notation, the confounding effect can be expressed as yj = (
+ µ) + (
j - µ) +
j, i.e., we can claim the log ratio to be (
j - µ) for any µ without affecting the validity of the model. If we are interested in the relative expression level of genes within the same array, then the constant µ plays no role. But, if we are interested in the expression of the same gene in different arrays, then the values of µ affect the level of their expressions when compared. Normalization could be used to put the reference mRNA expression in two different arrays on an equal footing. For example, we may adjust the expressions so that the total expression of the target and reference are the same. This is reasonable for normalization; however, it is not compatible with the commonly used logarithmic transformation for the ratio. In addition, because the majority of the genes have weak expression, they affect the accuracy of the total estimation whether or not they are eliminated. When the logarithmic transformation is taken for the ratio, we may use the model in Eq. 7 and assume that 
j = 0 as in Kerr et al. (8). This assumption is convenient for the usual analysis of variance model setting, Eq. 7, but not so easy to interpret from biological viewpoint. Another assumption would be that "most" of the genes (so-called housekeeping genes) have the same expression. In other words, "most" of the
j are 0. We will define "most" later and will see that each of the assumptions will lead to a different method to estimate the
. We will compare them. All the assumptions are equivalent to setting µ = 0 in the expression yj = (
+ µ) + (
j - µ) +
j. Once
can be estimated, the estimate of the log ratio (
j) should be
 | (8) |
where
is the estimate of
.
We have not found a method that can estimate the variance of
j in a single experiment, because the variances of
j and
j are confounded. However, if the hybridization is repeated, then we can find the variance of
j and consequently a confidence bound of
j. Let the second experiment be represented in a similar notation to Eq. 7 as
 | (9) |
The difference between Eq. 7 and Eq. 9 produces
 | (10) |
Therefore, the variance of
j -
'j can be estimated by the sample variance of
yj. We will denote this as s2. If we let the variances of
j and
'j be 
2 and 
'2, then s2 estimates 
2 and 
'2. The estimate for the log ratio (
j) based on the two experiments should be
 | (11) |
where
'j is the
j in Eq. 8 for the second experiment. Moreover, a 1 -
confidence interval for the true log ratio
j is
 | (12) |
where z
/2 is the upper
/2 quantile from a standard normal distribution. It can be easily shown that Eq. 12 does not need to assume that the variances of
j and
'j are equal. Extension to more than two replicates can be worked out; however, we will omit the details here.
 |
RESULTS
|
---|
Validation of the Model
Validation 1a: The background noise is additive.
If the background noise is additive, then the distribution of the difference between signal and background (tij - bij) for very weak genes (tij is close to bij) should be the same as b'ij - bij, where j' is an adjacent spot. We define a target signal as very weak if tij - bij is smaller than 0.5 standard deviation of bij. Comparisons of the standard deviation and the quarter range of tij - bij for very weak spots and b'ij - bij revealed that the two distributions match very well (Table 1), except for a small increase (with respect to std) in the mean of S vs. B (Table 1). This is not unexpected, because some of the weak spots may still contain a small amount of true signal.
Validation 1b: The difference between two neighboring background values (
bij) is nearly normally distributed.
Figure 1 shows the histograms of the difference in background values between adjacent spots (
bij = b2j - b2(j+1)) from the replicates in experiment 1. These histograms illustrate that
bij tends to have a bell-shape histogram with a heavier concentration at the center than the normal distribution. Comparison of the standard deviations of bij and
bij revealed a great reduction of variation for
bij compared with the variation for bij (Table 2). The correlation between |
bij| and the average background noise (bij + bi(j+1))/2 (last column in Table 2) assesses whether the local variation of bij (measured by |
bij|) is affected by the magnitude of the background noise [measured by (bij + bi(j+1))/2]. The small correlation indicates that the variation of
bij is not related to the uneven spread of the background.

View larger version (14K):
[in this window]
[in a new window]
|
Fig. 1. A histogram of the difference in background noise levels between adjacent elements as described in assumption 1 ( bij = bij - b'ij). The data was from the three hybridizations in experiment 1. The distributions are nearly normal with heavier concentration in the center. RR1, RR2, and RR3 are three independent reference/reference hybridizations using the reference RNA (a pool of total spleen RNA samples from ten 4-wk-old NOD and ten 4-wk-old B6 mice) labeled independently with both Cy3 and Cy5.
|
|
The reduction of variation for
bij compared with the variation for bij is also illustrated by the distribution of bij (Fig. 2A) and
bij (Fig. 2B) associated with each spot. As can be seen, the distribution of
bij (Fig. 2B) is much more uniform than the original background (bij) (Fig. 2A). Although our data showed that assuming
bij as independent and identically distributed was acceptable, a few outliers of
bij still exist in microarrays containing a large number of spots (Fig. 2B). Therefore, we estimated the background noise standard deviation
ij for each spot locally. We tried several spot matrix to define "local" for the
ij estimation and found that using the eight nearest neighbors gives the best estimates
ij.

View larger version (92K):
[in this window]
[in a new window]
|
Fig. 2. Distribution of background in microarray experiments. Each cell represents the location of a spot on the microarray. The color represents the fluorescence intensity. The darker the color, the higher the value of the background noise (bij). A: typical background distribution. Intensity values are represented by different colors. B: distribution of the background difference ( bij) between neighboring spots. Because the difference of neighboring background values is used, the far left point is missing in each row. Apparently, B is less intense and more evenly distributed than A.
|
|
Validation 2: The multiplicative variation (
1j -
2j) is independent of the magnitude of the signal and normally distributed.
From assumption 2, the log ratio estimates
j in Eq. 8 should be normally distributed with 0 mean for two identical RNA samples, in which all log ratios (
j) are 0. This expectation is tested using the reference vs. reference hybridizations in experiment 1. As the assumption predicted, the histograms for the three replicates are all very close to a normal curve (Fig. 3). To evaluate that "(
1j -
2j)" is independent of spot j as predicted in assumption 2, we used a linear regression analysis to assess the relationship between the absolute mean value of three log ratio estimates and their variance. We expect the variance to be independent of the mean. Indeed, the R2 values from the regression analysis are close to zero (R2 = 0.136, 0.014, 0.00, and 0.013 for experiments 1, 2, 3, and 4, respectively). The slightly positive value for experiment 1 (Ref vs. Ref) is not surprising given that all true log ratios for this experiment are zero. Any large value in the three ratios will produce both a large mean and a large variance and consequently a positive correlation. However, the R2 is not large enough to warrant a modification of the identical distribution assumption.
Applications of the Model
With the validated assumptions, one can use the model for a number of purposes. We have developed a statistically based procedure to flag weak spots before further analyses and to estimate the log ratios and their accuracy.
Flagging of weak spots.
Intuitively, the quality of the spots on microarrays is an important parameter that determines the accuracy and reproducibility of microarray data. As discussed in validation 1, the quality of the spots can be evaluated by comparing the background adjusted signal intensity (t'ij) and the standard deviation (
ij) of the local background difference (
bij). If the background-adjusted intensity t'ij is smaller than a certain threshold (c) of
ij, the spot is then flagged. We have found that the threshold (c) is the most important parameter for normalization and accuracy of the microarray data.
Normalization.
As we explained, after Eq. 7 that there are various ways to do the normalization. The method using the two-color total expressions to adjust each gene expression is equivalent to
 | (13) |
where the sum can be all the spots or only the spots surviving an initial elimination based on background. Under the assumption that "most" genes have the same expression, if we define "most" of the genes as more than 50%, then we should estimate
using a 25% trimmed mean method, i.e., to trim the top and bottom 25% of yj before taking the average. If we do not want to assume 50%, then we may use the mode to estimate
. Obviously, under the assumption 
j = 0,
should be estimated by the average of all the yj or the 0% trimmed mean. Another often-used method which uses the mean of the unadjusted ratios is also considered. The estimation of
in this case is -ln[
(t'2j/t'1j)/n], where n is the sample size, and this is referred to as the mean of ratio method. The mode estimation was based on an Splus subroutine with bandwidth equal to 1.57 x (q0.75 - q0.25)/n, where q
is the
th quantile of data with size n. We have evaluated the consistency of five different normalization methods using the average standard deviation of the log ratio estimates (Eq. 8) from the three replicates (Table 3). The smaller the standard deviation, the more consistent the method. Since the estimates in general depend on how weaker signals are eliminated, three thresholds, c = 0, c = 3, and c = 6 in Eq. 5 were evaluated. As shown in Table 3, little difference was observed between the different normalization methods. However, Table 3 demonstrates a flagging threshold of six (c = 6) performs better than a threshold of three (c = 3), and c = 0 is much worse than the other two thresholds in terms of consistency of ratio estimates. Evaluation of different thresholds (c) suggested that the larger the c, the more consistent the results. This is because a stronger signal has less chance of being blurred by the background noise. However, a large c will leave one with a smaller number of genes for analysis. Thus there is a trade-off between consistency and possibly missing a positive. We recommend the threshold of six as a reasonable compromise between the number of remaining genes and the quality of the data.
Estimation of multiplicative variation
j in Eq. 7.
Note that the s2 mentioned in Eq. 10 estimates 
2 + 
'2. The two variances are inseparable. When there are three replicates, we have one s2 for each pair, and we can find
2
for each replica by solving three simultaneous equations. Table 4 shows 
obtained for all the replicas in each experiment. The data indicate that there is considerable variation between replicates of the experiments. Note the 
is the standard deviation of the noise
ij in the exponent in Eq. 4. To recover the multiplicative noise, we list e
and a 95% confidence interval for the ratio.
In our experiments, as in most microarray experiments, the true expressions are unknown. Simulation studies can fill this void to a certain extent, because the true expressions are known. Therefore, we performed a large number of simulations using the observed expression values in the reference sample and the normal and beta distributions for the log ratios ln(rj). These two distributions were chosen to mimic the previously observed distributions of the log ratios, i.e., a symmetric bell curve sometimes tilted toward the positive side like a beta density function (9). The background noise in the simulated data was taken from the background noise of real data. All the data presented are based on 1,000 simulations. Since the true ratios are known in the simulation study, we can evaluate the difference between the true and the estimated ratios. Table 5 gives an example of the simulation results. Due to the large simulation sample, the differences between most estimates are statistically significant. It is evident that normalization using spots with strong signal intensities (c = 6) is much more consistent than normalization with all genes (c = 0). Furthermore, both the 25% trimmed mean and mode of log ratios gave better estimates of the true expression levels than global intensity and mean of ratios.
Confidence intervals for ratio estimates.
Table 6 presents the confidence intervals (defined by Eq. 12) from the simulated data. A 95% confidence interval is suppose to cover the true
j 95% of the time. Table 6 shows that the coverage is very accurate for those spots with a strong signal (
1j > 10
1j). As expected, the confidence intervals for those spots with weaker signals (
1j < 6
1j) are not as good. This implies that genes with strong true expression level (
ij > 10
ij) are likely to be picked, and the expression ratios can be reliably estimated by the confidence interval formula in Eq. 12.
Specificity and sensitivity.
One critical decision for microarray analysis is to determine whether an observed difference between two samples is statistically significant beyond experimental variation. Many investigators consider a twofold change as significant, i.e., two samples are considered to have different expression levels if the observed ratio (
j) is greater than 2 or less that 0.5. Others have suggested a threefold threshold. To make this rule general, one could claim a difference in expression if
 | (14) |
where f is the suggested threshold measured in folds. To evaluate this decision, we convert it into the usual frame for hypothesis testing
 | (15) |
The conventional way to evaluate this decision rule is to use the specificity (probability of accepting H0 when H0 is true, or equivalently 1 -
) and the sensitivity or power (1 - ß, probability of accepting H1 when H1 is true). Determination of the specificity and sensitivity in Eq. 15 is more complicated than the usual hypothesis testing, because these depend on both the expression intensity
ij and rj. A reasonable requirement for easily interpretable microarray results would be: 1) if r < R0, then the specificity is at least 1 -
, or the false-positive rate is at most
; 2) if r > R1 and the expression level
ij >
0, then the sensitivity is at least 1 - ß, where R0, R1,
0,
, and ß are user-specified values. Based on 1,000 simulations, Table 7 presents the specificity and sensitivity data for a microarray experiment under various values for c and f when 
is set at 0.20 and 0.4, a reasonable error range from our experiments (see Table 4). Table 7 shows the trade-off between specificity and sensitivity for various thresholds of c and f as well as the multiplicative error 
. For example, if the 
is 0.2 and we want a specificity of 0.99 for r < 1.25 and a sensitivity > 0.85 when r > 3.5 for a strong signal (
ij > 10
ij), then we should let c = 6 and f = 2.5.
 |
DISCUSSION
|
---|
In this study, we have developed a model that fits the observed signal intensities for a two-color microarray experiment. The observed signal (tij) is composed of the true expression level with an additive noise due to background (bij) and a multiplicative noise (
ij) due to experimental variation from the probe preparation and hybridization efficiency. Similar models have been developed by other investigators. The assumption of multiplicative variation was first proposed by Chen et al. (1). In their model, the likelihood function was based on the null hypothesis that all genes have the same expression. Therefore, the application of their model is limited to known house keeping genes. Kerr et al. (8) proposed an analysis of variance model that is similar to our Eq. 6. However, they did not consider the background noise in their model. In addition, they did not extend Eq. 6 to Eq. 7, and consequently their model requires the independence of the multiplicative noise for sample (
1j) and reference (
2j) as well as equal variance between experiments. The former assumption is not easy to justify because both errors are from the same spot and the latter assumption was shown not to be a good approximation to the real situation (Table 4). We believe that our model has better assumptions reflecting the true experimental conditions.
Intuitively, the accuracy of microarray data depends on the signal/background ratio of the spots. It is a customized practice to eliminate weak spots on microarrays before subsequent data analyses. However, this practice was based on the users discretion, and a statistically sound procedure was lacking. Discovery of the distribution of additive variation allowed us to propose a model-based method to flag spots with weak expression levels, a critical procedure that can influence all subsequent analyses of microarray data such as normalization and ratio estimates. Since the background has a local distribution, the best way to flag weak spots is to identify those spots that have an adjusted signal (t'ij) less than a certain threshold of standard deviation of background intensity difference between the spot and its neighboring spots (c
ij). We found that the higher the threshold, the more reliable the results. However, increased threshold will decrease the number of genes that can be evaluated. A compromise would be c = 6. We have developed a computer program for this purpose, and it is available upon request. To provide flexibility to the users, we have used a 10-flag scheme with three different c values (c = 2, 4, or 6) for both the sample [channel 1 (CH1)] and reference (CH2): flag 0 (4
ij > CH1 > 2
ij and 4
ij > CH2 > 2
ij), flag 1 (CH1 < 2
ij and 4
ij > CH2 > 2
ij); flag 2 (4
ij > CH1 > 2
ij and CH2 < 2
ij); flag 3 (CH1 < 2
ij and CH2 < 2
ij); flag 4 (6
ij > CH1 > 4
ij and 6
ij > CH2 > 4
ij); flag 5 (CH1 < 4
ij and 6
ij > CH2 > 4
ij); flag 6 (6
ij > CH1 > 4
ij and CH2 < 4
ij); flag 7 (CH1 > 6
ij and CH2 > 6
ij); flag 8 (CH1 < 6
ij and CH2 > 6
ij); flag 9 (CH1 > 6
ij and CH2 < 6
ij).
Normalization is the next critical step for analyzing microarray data. Our experiments showed that the threshold for background noise (c) used for elimination of weak spots is the most critical parameter that influences the performance of normalization. Normalization with all weak spots (c = 0) gives high variation for ratio estimates, and it is not acceptable. When c = 6 is chosen, all normalization methods tested in this study performed well, although the trimmed mean of log ratios performed better than the global intensity and mean of ratio.
Since a large number of genes are simultaneously analyzed by microarray experiments, it is critical to assess the effects of multiple decisions. Using simulations based on parameters derived from experimental data, we have assessed the specificity and sensitivity of detecting expression differences depending on several variables including c, 
, f, and r (Table 7). However, the reproducibility is not solely controlled by these parameters. It also depends on the number of analyzed genes and the distributions of the expression ratios (rj) (or number of genes with different expression vs. number of genes with the same expression) and the true expression levels (
ij). For example, assuming that the ratio is 1 for 99% of the genes (r = 1) and the remaining 1% of genes with strong signal expression (
ij > 10
ij) have a ratio of 3.0, we will observe a number of genes with different expression levels using an elimination threshold of six (c = 6) and cutoff of 2.0 (f = 2.0). Using the numbers in row 8 of Table 7, the chance that one of the chosen genes really has a different expression is, by the usual Bayes formula
 |
where 0.01 and 0.99 are the prior probabilities for a gene with r = 0 and r = 3 (respectively), and 0.92 and 0.02 are the probabilities (from the table) that they will be claimed as genes with different expressions. This suggests that most of the claimed differences would be false alarms. This is because the number of false-positive genes is large in proportion to the true positive even though the false-positive rate is low (2%).
The false-positive rate can be dramatically reduced when the experiment is duplicated and only those genes that have different expressions in both hybridizations are considered. The p then becomes
This is clearly a tremendous gain in confidence. If an experiment is repeated three times and only the genes that agreed all three times are considered, then the chance of making a wrong decision is almost zero. Of course, selecting genes with consistent results in two or multiple experiments decreases the power of detecting a gene with true expression difference. But in most practical situations, it is preferable to select a smaller number of genes with high confidence than a larger number of genes with a high false-positive rate.
In summary, we have developed a statistical model that allows investigators to improve the normalization, accuracy of ratio estimates, and evaluation of the experimental variation, as well as sensitivity and specificity of their microarray experiment.
 |
ACKNOWLEDGMENTS
|
---|
This work was partly supported by National Institute of Diabetes, Digestive and Kidney Diseases Grant 1R24-DK-58778, National Institute of Child Health and Development Grant 1RO1-HD-37800 and Juvenile Diabetes Research Foundation, and National Institute of Allergy and Infectious Diseases Grant 1P01-AI-42288.
 |
FOOTNOTES
|
---|
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: J.-X. She, Dept. of Pathology, Immunology and Laboratory Medicine, Box 100275, College of Medicine, Univ. of Florida, Gainesville, FL 32610-0275 (E-mail: she{at}ufl.edu).
 |
REFERENCES
|
---|
-
Chen Y, Dougherty ER, and Bittner ML. Ratio-based decisions and quantitative analysis of cDNA microarray images. J Chem Optics 2: 364374, 1997.
-
DeRisi JL, Iyer VR, and Brown PO. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278: 680686, 1997.[Abstract/Free Full Text]
-
DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M, Chen Y, Su YA, and Trent JM. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 14: 457460, 1996.[ISI][Medline]
-
Diatchenko L, Lau YF, Campbell AP, Chenchik A, Moqadam F, Huang B, Lukyanov S, Lukyanov K, Gurskaya N, Sverdlov ED, and Siebert PD. Suppression subtractive hybridization: a method for generating differentially regulated or tissue-specific cDNA probes and libraries. Proc Natl Acad Sci USA 93: 60256030, 1996.[Abstract/Free Full Text]
-
Eisen MB and Brown PO. DNA arrays for analysis of gene expression. Methods Enzymol 303: 179205, 1999.[ISI][Medline]
-
Eisen MB, Spellman PT, Brown PO, and Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95: 1486314868, 1998.[Abstract/Free Full Text]
-
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, and Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286: 531537, 1999.[Abstract/Free Full Text]
-
Kerr MK, Martin M, and Churchill GA. Analysis of variance for gene expression microarray data. J Comput Biol 7: 819837, 2000.[ISI][Medline]
-
Newton MA, Kendziorski CM, Richmond CS, Blattner FR, and Tsui KW. On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J Comput Biol 8: 3752, 2001.[ISI][Medline]
-
Pat Brown Laboratory Protocols [Online]. Department of Biochemistry, Stanford University, 19992001 (http://cmgm.stanford.edu/pbrown/protocols/).
-
Schena M, Shalon D, Davis RW, and Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467470, 1995.[Abstract]
-
Shalon D, Smith SJ, and Brown PO. A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res 6: 639645, 1996.[Abstract]
-
Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, and Golub TR. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96: 29072912, 1999.[Abstract/Free Full Text]