Fidelity and enhanced sensitivity of differential transcription profiles following linear amplification of nanogram amounts of endothelial mRNA

Denise C. Polacek1,*, Anthony G. Passerini1,3,*, Congzhu Shi1, Nadeene M. Francesco1, Elisabetta Manduchi4, Gregory R. Grant4, Steven Powell6, Helen Bischof6, Hans Winkler6, Christian J. Stoeckert, Jr.4,5 and Peter F. Davies1,2,3

1 Institute for Medicine and Engineering
2 Department of Pathology and Laboratory Medicine
3 Department of Bioengineering
4 Center for Bioinformatics
5 Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104
6 AstraZeneca Pharmaceuticals, Mereside Alderley Park, Macclesfield, Cheshire SK10 4TG, United Kingdom


    ABSTRACT
 
Although mRNA amplification is necessary for microarray analyses from limited amounts of cells and tissues, the accuracy of transcription profiles following amplification has not been well characterized. We tested the fidelity of differential gene expression following linear amplification by T7-mediated transcription in a well-established in vitro model of cytokine [tumor necrosis factor-α (TNFα)]-stimulated human endothelial cells using filter arrays of 13,824 human cDNAs. Transcriptional profiles generated from amplified antisense RNA (aRNA) (from 100 ng total RNA, ~1 ng mRNA) were compared with profiles generated from unamplified RNA originating from the same homogeneous pool. Amplification accurately identified TNFα-induced differential expression in 94% of the genes detected using unamplified samples. Furthermore, an additional 1,150 genes were identified as putatively differentially expressed using amplified RNA that remained undetected using unamplified RNA. Of genes sampled from this set, 67% were validated by quantitative real-time PCR as truly differentially expressed. Thus, in addition to demonstrating fidelity in gene expression relative to unamplified samples, linear amplification results in improved sensitivity of detection and enhances the discovery potential of high-throughput screening by microarrays.

high-throughput screening; quantitative real-time polymerase chain reaction; tumor necrosis factor; false discovery rate


    INTRODUCTION
 
MICROARRAY ANALYSES for high-throughput transcription profiling typically require micrograms of total RNA. Such amounts are often unavailable from limited numbers of cells isolated from ex vivo tissue, needle biopsy, FACS enrichment, or by laser capture microdissection from frozen or archival sections (10, 13, 18). In these circumstances, RNA amplification is required prior to array hybridization. Two methods of mRNA amplification commonly in use are PCR-based and T7-driven linear amplification (22). Although PCR is well suited to detecting gene expression in very limited amounts of starting material, its usefulness for analyzing complex transcriptional profiles is compromised by systematic biases introduced by exponential amplification (20). Linear amplification minimizes bias (20); however, uncertainty about the accuracy of differential gene expression remains because there is no reference to unamplified samples for comparison. When sample size constraints make RNA amplification obligatory, little information is available to the investigator to assess the probability that the ranking of differentially expressed genes corresponds to reality. The fidelity of gene expression profiles generated from very small amounts of sample RNA representative of these procedures is of primary importance.

We tested the fidelity of differential gene expression following linear amplification of nanogram quantities of mRNA in a well-established in vitro model of cytokine stimulation of human endothelial cells. There is an extensive literature describing the induction of specific adhesion molecules, chemokines, and transcription factors by tumor necrosis factor-α (TNFα) (15, 23, 27), which provided a reference for expected changes in gene expression. We compared differential expression profiles obtained through statistical analysis of multiple replicate arrays using unamplified RNA to those generated following a 4,000- to 5,000-fold linear amplification of aliquots (100 ng total RNA) taken from the same pool of total RNA. The validity of differential expression in sets of genes identified from the analysis of unamplified and amplified RNA samples was subsequently assessed by quantitative real-time PCR (QRT-PCR). The objectives of this study were 1) to evaluate and compare endothelial responses to TNFα stimulation, 2) to assess the fidelity of differential expression by comparing amplified and unamplified profiles generated using standard microarray experiments, and 3) to evaluate the sensitivity of detection in amplified samples.


    MATERIALS AND METHODS
 
Cell culture.
Human aortic endothelial cells (HAEC) were purchased from Clonetics (San Diego, CA) at passage 3. Tissue culture media and reagents were obtained from BioWhittaker (Walkersville, MD). HAEC were routinely maintained in complete medium, consisting of endothelial cell basal medium (EBM-2) supplemented with 0.04% hydrocortisone, 0.4% human basic fibroblast growth factor, 2% fetal bovine serum, and 0.1% each of human recombinant epidermal growth factor, human recombinant vascular endothelial growth factor, ascorbic acid, heparin, gentamicin sulfate, amphotericin-B, and human recombinant insulin-like growth factor. Cells were maintained at 37°C in a 5% CO2 humidified environment with medium changes every 2 days.

TNFα stimulation.
Confluent HAEC were harvested at passage 6 by treatment with trypsin (0.05%)/EDTA (0.53 mM), and 5.1 × 10^6 cells were seeded into 150-mm diameter culture dishes (Becton-Dickinson Labware, Franklin Lakes, NJ). Cells were grown in complete medium for 48 h, to 80% confluence, with a change of medium after 24 h. One day prior to stimulation, cells were switched to a basal medium (EBM-2) supplemented with 2% calf serum and 0.1% gentamicin sulfate amphotericin-B (starvation medium) to suppress cell cycle-specific gene expression. Cells were then stimulated for 2 h at 37°C with 10 ng/ml recombinant human TNFα (+TNFα) (R & D Systems, Minneapolis, MN). Control HAEC received only fresh starvation medium (-TNFα). Cells of the same passage were pooled from several dishes prior to RNA isolation, resulting in a single large sample for each condition (+TNFα and -TNFα) on which all replicate experiments were performed.

RNA extraction.
Total RNA was extracted using the RNeasy Total RNA Isolation Kit (Qiagen, Valencia, CA), which avoids the use of phenol and chloroform that may interfere with subsequent enzymatic steps. Briefly, medium was removed and the cells were washed with PBS and lysed in a buffer containing guanidine isothiocyanate and β-mercaptoethanol (0.143 M). The lysate was homogenized and precipitated with 70% ethanol, transferred to a silica membrane column, and DNA and proteins were removed by a series of washes and centrifugations. Highly purified total RNA was then eluted from the column using RNase-free water. The integrity and quantity of the total RNA samples were evaluated by an Agilent 2100 Bioanalyzer using the RNA 6000 Nano Chips assay kit (Agilent Technologies, Waldbronn, Germany). The size range of the aRNA was evaluated against Ambion’s RNA 6000 Ladder. Additional quantitative assessment of the total RNA samples was performed using a Beckman Spectrophotometer (OD260/280). The RNA was divided into aliquots and frozen at -80°C until amplified.

RNA amplification.
mRNA (~1 ng) was amplified from 100 ng total RNA (equivalent to ~10^4 HAEC) using the MessageAmp aRNA Kit (Ambion, Austin, TX), which is based upon the aRNA amplification procedure first described by Van Gelder and colleagues (22). Poly(A) RNA was reverse transcribed using an oligo(dT) primer containing a T7 RNA polymerase promoter sequence. RNase H treatment cleaved the mRNA into small fragments that served as primers during second-strand synthesis, resulting in a double-stranded cDNA template for T7-mediated linear amplification by in vitro transcription. Typically 4–5 µg aRNA were produced from one round of amplification (a 4,000- to 5,000-fold amplification). The aRNA was quantified by Agilent Nano Chip technology and evaluated for size relative to pure polyadenylated RNA. Two micrograms aRNA was subsequently labeled by reverse transcription using hexanucleotide priming.

Microarray filter design and printing.
Microarray filters were designed and printed by AstraZeneca Pharmaceuticals (Alderley Park, UK). 3'-Biased, sequence-verified cDNA clones (1.5–2.0 kb) were identified from Incyte and GenBank databases using proprietary software. Approximately 4,700 of the cDNAs represented the cardiovascular gene expression database of the University of Toronto. The balance consisted of placental genes, G protein-coupled receptor related genes, housekeeping genes, and proprietary expressed sequence tags (ESTs) (Incyte, Palo Alto, CA). PCR products were prepared from overnight bacterial cultures and assessed by agarose gel electrophoresis and PicoGreen analysis. Clones were rearrayed, and 196 were selected at random and sequence-verified to confirm sample ID. Then, 13,824 cDNAs (at ~100 ng/µl) were spotted in duplicate onto 22x11-cm Nytran C membranes (Schleicher and Schuell) using a Genetix QBot fitted with a 384-pin (0.4 mm) print-head. The membranes were cross-linked and subsequently denatured, neutralized, and washed before hybridization by consecutive treatment for 5 min each in 1) 1.5 M NaOH, 3 M NaCl; 2) 0.75 M Tris, 1.5 M NaCl; 3) 0.5 M Tris, pH 8.0; and 4) 2x SSC. The treated filters were dried at room temperature between Whatman 3MM papers, and stored at -80°C.

Probe labeling.
33P-labeled DNA probes were synthesized using the SuperScript II reverse transcriptase kit (GIBCO BRL; Life Technologies, Rockville, MD) with minimal modifications. Ten micrograms HAEC total RNA (unamplified) was denatured at 65°C for 5 min and then incubated at 42°C for 1 h with first-strand buffer (50 mM Tris·HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2), 10 mM DTT, 50 µCi [33P]dATP (PerkinElmer Life Sciences, Boston, MA), 0.5 mM dCTP, 0.5 mM dGTP, 0.5 mM dTTP, 1 µg oligo(dT)15 primer (Promega, Madison, WI), and 200 U SuperScript II reverse transcriptase. Alternatively, 2 µg aRNA (amplified) was labeled using the same protocol, except that 100 ng random hexamer (Amersham Pharmacia Biotech, Piscataway, NJ) was used as primer. Free nucleotides, primers, and enzyme were removed from labeled cDNA probes by QIAquick PCR purification kit (Qiagen). Two microliters of purified probes from each 100-µl sample were used to assess the efficiency of the labeling reaction by liquid scintillation counting.

Hybridization to cDNA arrays.
Four replicate arrays for each condition (+TNFα and -TNFα) were hybridized with unamplified RNA samples and five replicates with amplified RNA samples derived from the same RNA pools. The filters were prehybridized for 6 h in 10 ml of hybridization solution (200 mM sodium phosphate, 10 mM EDTA, 1% BSA, 6.7% SDS, and 6.7% deionized formamide). Radiolabeled probes were denatured at 95°C for 5 min and chilled on ice. The probes were added into 6 ml of fresh hybridization solution along with 50 µl Human CotI DNA (GIBCO BRL, Life Technologies), and the filters were hybridized overnight. Hybridizations were carried out at 62°C in a hybridization oven with continuous rotation. Filters were then washed three times (40 mM sodium phosphate solution, 1 mM EDTA, 1% SDS), sealed in plastic wrap, and exposed to phosphor imaging screens (Eastman Kodak, Rochester, NY) at room temperature for 5 days. The screens were scanned on the Storm System (Molecular Dynamics, Sunnyvale, CA) at 50 µm resolution.

Array analysis.
Image files were quantified using ArrayVision V.6.3 (Imaging Research, St. Catharines, Ontario). Raw intensity values (computed via the "volume" principal measure in ArrayVision) were corrected on an individual basis using local background estimates (median intensity value of the pixels in four valley regions surrounding each spot, adjusted to the size of the spot), and duplicate background corrected intensities were averaged for each gene. These data were preprocessed and analyzed according to the methods of ArrayStat V.1.2 (Imaging Research) (16, 17). Briefly, the data were log-transformed and centered within conditions. A nonparametric spline curve fit was used as part of a pooled approach to estimating random error over intensities. Outliers were detected and removed according to an algorithm which fits standardized residuals to a normal Gaussian distribution. A minimum of three replicates per condition (outliers removed) was specified for a gene to be considered for further analysis. The data were normalized across conditions by adjusting the mean difference between conditions to zero. Although the normalization was computed across all genes (outliers removed), an iterative approach was applied in which 2% of the genes most differentially expressed were successively removed from the calculation of the mean until there appeared to be no further influence upon the normalization.
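
The iterative normalization described above can be illustrated with a minimal sketch. This is not ArrayStat's implementation: the function name, the convergence tolerance, and the reading that 2% of the genes still under consideration are dropped at each pass are all assumptions made for illustration.

```python
import numpy as np

def normalize_across_conditions(log_treat, log_ctrl, trim_frac=0.02,
                                tol=1e-3, max_iter=50):
    """Shift log_treat so the trimmed mean of (treat - ctrl) is zero.

    Hypothetical helper: at each pass the trim_frac of remaining genes
    farthest from the current offset are excluded from the mean, and the
    loop stops once the offset changes by less than tol.
    """
    log_treat = np.asarray(log_treat, dtype=float)
    diff = log_treat - np.asarray(log_ctrl, dtype=float)
    keep = np.ones(diff.size, dtype=bool)
    offset = diff.mean()
    for _ in range(max_iter):
        idx = np.flatnonzero(keep)
        n_drop = int(np.ceil(trim_frac * idx.size))
        if n_drop == 0 or idx.size - n_drop < 2:
            break
        # Exclude the genes that deviate most from the current offset.
        order = np.argsort(np.abs(diff[idx] - offset))
        keep[idx[order[-n_drop:]]] = False
        new_offset = diff[keep].mean()
        converged = abs(new_offset - offset) < tol
        offset = new_offset
        if converged:
            break
    return log_treat - offset, offset
```

The design point is that strongly regulated genes are kept in the data but excluded from the estimate of the between-condition offset, so they cannot pull the center away from zero.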

Putative sets of differentially expressed genes were identified by ArrayStat via application of a statistical test, which provides P values that are corrected for multiple testing using a false discovery rate (FDR) approach (3) with FDR = 5%. In this way 95% of the putative set is expected to be true positives. In an FDR approach, the expected proportion of false positives among the set of all predictions is controlled, whereas in the classic family-wise type I error approach, the probability that there is at least one false prediction is controlled (3). Therefore, an FDR as high as 50% or even higher might still be acceptable, whereas a P value as high as 50% would not be. For example, if there are thousands of genes represented on the array, of which only one percent are differentially expressed, then it would be beneficial to reveal a subset of ~100 genes, half of which are truly differentially expressed (i.e., with an FDR of 50%). Since it is generally not necessary to find a set with no false predictions, especially at great cost in terms of the power of the test, the FDR approach is widely considered more appropriate for microarray analysis.
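
For readers unfamiliar with FDR control, the sketch below shows the widely used Benjamini-Hochberg step-up procedure. That ArrayStat applies exactly this procedure is an assumption; the function name and the example P values are illustrative only.

```python
import numpy as np

def benjamini_hochberg(p_values, fdr=0.05):
    """Return a boolean mask of tests declared significant at the given FDR."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The i-th smallest P value is compared with fdr * i / m.
    thresholds = fdr * np.arange(1, m + 1) / m
    passed = np.flatnonzero(p[order] <= thresholds)
    rejected = np.zeros(m, dtype=bool)
    if passed.size:
        # Step-up rule: reject everything up to the largest rank that passes.
        rejected[order[: passed[-1] + 1]] = True
    return rejected

# Illustrative P values, not data from this study.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(example, fdr=0.05)
```

Raising `fdr` enlarges the putative set at the cost of a larger expected share of false positives, which is the trade-off discussed in the paragraph above.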

Plots of M vs. A were generated to evaluate the quality of the data. These are plots of the difference of log intensities M = log2(I1) - log2(I2) vs. the mean log intensity A = [log2(I1) + log2(I2)]/2, where I1 and I2 are intensities of treatment (+TNFα) and control (-TNFα) conditions, respectively, or of two replicate arrays within a condition. The plots identify potential intensity-dependent biases in the data and are visually more revealing than scatter plots of log2(I2) vs. log2(I1) (25).
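
The M and A coordinates defined above follow directly from the two intensity vectors. The short sketch below computes them; the function name and the example intensities are hypothetical.

```python
import numpy as np

def ma_values(i1, i2):
    """Return (M, A) for paired positive intensities i1 (treatment) and i2 (control)."""
    log1 = np.log2(np.asarray(i1, dtype=float))
    log2_ = np.log2(np.asarray(i2, dtype=float))
    m = log1 - log2_          # difference of log intensities
    a = (log1 + log2_) / 2.0  # mean log intensity
    return m, a

# Illustrative intensities: the first spot is fourfold induced, the second unchanged.
m, a = ma_values([8.0, 4.0], [2.0, 4.0])
```

Plotting `m` against `a` gives the M vs. A view; genes unaffected by treatment fall near M = 0.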

The data for the putative sets of differentially expressed genes computed in ArrayStat for the amplified and unamplified RNA samples were imported into GeneSpring (Silicon Genetics, Redwood City, CA) along with a flag indicating significant differential expression. The genes were annotated using information available in public databases and hierarchically classified according to a simple gene ontology constructed based on these annotations. Gene lists were filtered for significance and combined using Venn diagrams according to these biological classifications (see Table 1).


Table 1. Distribution of TNFα-regulated genes by biological classification

 
Quantitative real-time PCR.
To validate the results from microarray data, quantitative real-time PCR (QRT-PCR) was performed using the FastStart DNA Master SYBR Green I kit and the LightCycler system (Roche Applied Science, Indianapolis, IN). In total, 45 genes were chosen based on the array data, which included several TNFα-specific markers as well as randomly selected genes from the groups identified through the combination of putative sets of differentially expressed genes (see Fig. 4). These groups consisted of genes that were predicted as regulated by TNFα in the amplified RNA group only, genes that were predicted as regulated by TNFα in the unamplified RNA group only, and genes that were identified as regulated by TNFα in both the amplified and unamplified RNA groups (see RESULTS). Primer sets specific to these genes were designed using Oligo Primer Analysis software (Molecular Biology Insights, Cascade, CO). cDNAs were generated from 1 µg of (unamplified) HAEC total RNA using the SuperScript II reverse transcription reagents in a 20-µl reaction volume, and 0.1 µl of this cDNA reaction was used in a 20-µl QRT-PCR reaction. Mg2+ concentration, annealing temperature, and primer concentration were optimized for each gene. Triplicate measurements were performed for each sample. Ubiquitin was used for normalization of cDNA quantity, since it was shown to be unchanged by TNFα stimulation in the microarray data.



Fig. 4. Venn diagram showing the distribution of differentially expressed genes (+TNFα/-TNFα) identified from analysis of amplified (n = 5) and unamplified (n = 4) RNA samples derived from the same pools. Statistical analysis was performed by the methods of ArrayStat with false discovery rate (FDR) set to 5%. Of the entire population of genes on the array (13,824), 155 were identified as differentially regulated by TNFα from analysis of unamplified samples, whereas 1,296 were identified from analysis of amplified samples. 146 genes were identified as common to both sets.

 

    RESULTS
 
Integrity and size distribution of RNA.
The integrity and size distribution of 10–20 ng RNA was assessed immediately following total RNA isolation, after aRNA amplification, and again prior to each labeling/hybridization using Agilent Nano Chip technology. Representative profiles are shown in Fig. 1. The total RNA profile (Fig. 1A) shows intact 28S and 18S ribosomal RNA peaks present at a 2:1 ratio with a flat baseline and no tailing of the bands, demonstrating little if any RNA degradation in the samples. The amplified aRNA profile (Fig. 1B) shows a majority of transcripts in the range of 500–2,500 nucleotides in length, which is somewhat smaller on average than for purified polyadenylated RNA (Fig. 1C), suggesting early termination of reverse transcription and/or of T7-mediated transcription. Bias introduced by 5'-underrepresentation of transcripts has been previously discussed (2, 24) and, because we are using 3'-biased cDNA PCR products as probes on our arrays, this should not impact the results of the present study (2).



Fig. 1. Representative RNA profiles generated by Agilent Nano Chip technology and used to assess integrity and quantity. A: a typical profile for intact total RNA reveals prominent 28S and 18S ribosomal peaks present at approximately a 2:1 ratio with little tailing and a flat, smooth baseline. B: a typical profile for intact amplified aRNA reveals a broad range of transcript sizes (the majority 500–2,500 nucleotides in these samples) and no evidence of contaminating ribosomal peaks. C: a typical profile for unamplified polyadenylated RNA is shown for comparison.

 
Evaluation of systematic bias by M-A plots.
In M vs. A plots, data with systematic intensity-dependent effects removed are roughly symmetrical and centered vertically about the M = 0 line with random scatter if the majority of genes remain unchanged by the treatment effect or approximately equal amounts of up- and downregulation occur over the range of A values. Outliers on the plot represent differentially expressed genes with M > 0 for induced genes and M < 0 for suppressed genes. M vs. A plots demonstrated that the raw data spanned a wide range of signal intensities. Tight M vs. A plots about M = 0 were observed both for replicates (not shown) and for the effect of TNFα within amplified or within unamplified samples. The normalized data shown in Fig. 2 were centered about zero, symmetrical, and exhibited outliers representing likely differentially expressed genes. In contrast, the data were very widely distributed about M = 0 in plots illustrating the effect of TNFα between amplified and unamplified samples (Fig. 3). This observation suggests that although comparisons within sample groups are valid, the reliability of expression ratios derived from data in which one RNA sample has been amplified and the other has not is poor, and the user is cautioned against such comparisons.



Fig. 2. Representative M vs. A plots illustrating the reproducibility in background corrected and normalized data when comparing within unamplified (A) or within amplified samples (B). M = log2(I1) - log2(I2) and A = [log2(I1) + log2(I2)]/2, where I1 is the mean intensity for +TNFα and I2 is the mean intensity for -TNFα, respectively (n = 4 for unamplified, n = 5 amplified).

 


Fig. 3. Representative M vs. A plot illustrating the lack of concordance in background corrected and normalized data when comparing between amplified and unamplified samples. M = log2(I1) - log2(I2) and A = [log2(I1) + log2(I2)]/2, where I1 is the mean intensity for amplified +TNFα and I2 is the mean intensity for unamplified -TNFα, respectively (n = 4).

 
Differential expression by TNFα treatment (Venn diagrams).
Of the total population of genes on the array, 1,296 were identified as being regulated by TNFα treatment in amplified samples (designated "identified in amplified") and 155 genes in unamplified samples ("identified in unamplified"). Venn diagrams were used to display the differentially expressed genes derived from these analyses. Figure 4 shows that 146 genes were detected as differentially regulated (+TNFα/-TNFα) by both analyses. We designate these genes the "common identified genes." Only nine genes were unique to the unamplified set (designated "unique to unamplified"). These results indicate that linear amplification can identify the great majority of genes that are detected as differentially expressed in unamplified samples. However, an additional 1,150 significantly regulated genes were identified in the amplified set only (designated "unique to amplified"), suggesting that amplification may result in greater sensitivity for detecting differentially expressed genes. The accuracy of differential expression predictions for these additional genes is reported below. The complete gene lists defined above are presented as supplementary data at the following web site: http://www.cbil.upenn.edu/RAD3/fidelity_of_amplified_RNA.
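
The Venn partition reported above is plain set arithmetic. In the sketch below the gene identifiers are synthetic integers chosen only so that the set sizes match the reported counts (1,296 and 155, with 146 shared); they are not the study's gene lists, and the function name is hypothetical.

```python
def venn_counts(amplified, unamplified):
    """Partition two gene sets into common and unique groups and count them."""
    common = amplified & unamplified
    return {
        "common": len(common),
        "unique_to_amplified": len(amplified - unamplified),
        "unique_to_unamplified": len(unamplified - amplified),
        # Share of the unamplified set that amplification also identified.
        "fraction_recovered": len(common) / len(unamplified),
    }

# Synthetic placeholder IDs sized to match the reported counts.
identified_in_amplified = set(range(1296))
identified_in_unamplified = set(range(1150, 1305))
counts = venn_counts(identified_in_amplified, identified_in_unamplified)
```

With these sizes the recovered fraction is 146/155, which rounds to the 94% quoted in the text.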

The results of Venn diagrams generated for selected biological classifications of annotated genes of possible importance to the TNFα response are summarized in Table 1. An expanded version of Table 1 containing links to the complete annotated gene lists is available online at http://www.cbil.upenn.edu/RAD3/fidelity_of_amplified_RNA. Similar trends were observed for these classifications as were observed for the entire population of significantly regulated genes described above. Specifically, in each case amplification captured all of the genes that were identified as differentially regulated without amplification, in addition to identifying many additional genes. Some of the classifications presented in Table 1 include genes that are generally present with low abundance and difficult to detect without amplification, for example, the group of transcription factors. The results demonstrate greater sensitivity associated with amplification given adequate replication (n ≥ 4 in our study). The accuracy of these additional predicted changes is addressed below.

Characterization of the "common identified genes" group.
Ranking the group of common identified genes by expression ratio (+TNFα/-TNFα) revealed a similar rank order in amplified and unamplified samples. This is shown for a subset of common identified adhesion genes in Table 2. The full list of 146 differentially expressed genes is presented as a supplementary file at http://www.cbil.upenn.edu/RAD3/fidelity_of_amplified_RNA. The rank order of these genes is similar whether unamplified or amplified RNA was used and tends to vary only in those instances where neighboring gene expression ratios are very similar. Strongly upregulated or downregulated genes in the amplified group were similarly regulated in the unamplified group. There was a tendency for the magnitudes of differential expression to be greater in the amplified group, particularly when expression ratios exceeded threefold up or down. Despite such exaggerated expression differences, however, it is noteworthy that the amplified samples generally produced closer agreement with +TNFα/-TNFα expression ratios measured by QRT-PCR than did the unamplified material. Additionally, the list of "common identified genes" was associated mainly with the lowest P values (highest probability of differential expression).


Table 2. Common identified adhesion genes ranked by amplified ratio

 
Figure 5A shows the differential expression of several known TNFα-responsive genes validated by QRT-PCR. ELAM-1, VCAM-1, and MCP-1 were induced by TNFα, whereas PECAM-1 was suppressed, in agreement with other published studies (15, 23, 27). An additional 10 "common identified genes" were chosen at random from the entire list over the full range of this P value ranking. Nine of these 10 genes were found to be truly different by QRT-PCR (Fig. 5A). The criterion used to assess agreement for results presented in Fig. 5 was consistency in the direction of regulation without regard for the magnitude of the expression ratio.
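
The agreement criterion stated above (same direction of regulation, magnitude ignored) can be written as a simple sign comparison on log ratios. The function names and the example ratios below are hypothetical and are not measurements from this study.

```python
import math

def same_direction(array_ratio, pcr_ratio):
    """True when both +TNF/-TNF ratios indicate induction (>1) or both suppression (<1)."""
    return math.log2(array_ratio) * math.log2(pcr_ratio) > 0

def validation_rate(pairs):
    """Fraction of (array_ratio, pcr_ratio) pairs that agree in direction."""
    agree = sum(same_direction(a, p) for a, p in pairs)
    return agree / len(pairs)

# Made-up ratios: two concordant genes and one discordant gene.
illustrative_pairs = [(4.2, 11.0), (0.5, 0.8), (2.0, 0.7)]
rate = validation_rate(illustrative_pairs)
```

A ratio of exactly 1 on either platform counts as disagreement under this sketch, since it indicates no regulation.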



Fig. 5. Results of quantitative real-time PCR.

 
Two conclusions can be drawn from the analysis of the "common identified genes": 1) the 146 genes in this list have a high probability of being truly regulated by TNFα, and 2) 94% of the genes detected as differentially expressed using microgram amounts of unamplified total RNA were accurately identified following linear amplification of nanogram amounts of total RNA.

Characterization of the "unique to amplified" gene group.
The 1,150 genes found to be regulated in the "unique to amplified" group were ranked by P values, and 24 were selected over the entire range of P values (P < 10^-15 to P = 0.005) for validation by QRT-PCR (Fig. 5B). Of these genes, 67% (16 of 24) were truly regulated. Combining these results with the QRT-PCR results in the previous section produces an "empirically estimated FDR" of ~31% for detection of regulated genes in amplified samples. Although this is higher than the projected FDR of 5% from ArrayStat, it is a reasonably low FDR for typical experimental purposes. Approximately 63% of the genes with the smallest 10% of P values in the "identified in amplified" gene list belong to the "common identified genes" group.

Characterization of the "unique to unamplified" gene group.
Six genes were chosen for validation by QRT-PCR from the nine found to be "unique to unamplified" (sequence unavailable for the remaining three). Only three of the six genes evaluated were truly differentially expressed (Fig. 5C). Although 50% of the genes validated from this group represent false positives for unamplified samples, the "empirically estimated FDR" for the entire "identified in unamplified" gene set, which includes the "common identified genes", is ~12%.
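
The text does not spell out how the "empirically estimated FDR" values were computed. One reading that comes close to the reported figures is to weight each group's observed false-positive fraction by the group's share of the gene set. The sketch below follows that reading as an assumption; it uses the 9 of 10 randomly sampled common genes and leaves out the known TNFα markers, and the function name is hypothetical.

```python
def empirical_fdr(strata):
    """Size-weighted false-positive fraction.

    strata: iterable of (genes_in_group, genes_tested, genes_validated).
    """
    strata = list(strata)
    total = sum(n for n, _, _ in strata)
    return sum((n / total) * (1 - validated / tested)
               for n, tested, validated in strata)

# "Identified in amplified": unique to amplified (16 of 24) plus common (9 of 10).
fdr_amplified = empirical_fdr([(1150, 24, 16), (146, 10, 9)])
# "Identified in unamplified": unique to unamplified (3 of 6) plus common (9 of 10).
fdr_unamplified = empirical_fdr([(9, 6, 3), (146, 10, 9)])
```

Under this weighting the two estimates come out near 0.31 and 0.12, in line with the ~31% and ~12% quoted in the text; an unweighted pooling of all tested genes would give different numbers.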

We conclude that, in addition to identifying 94% of the genes found by analysis of unamplified samples, amplification identified with reasonable confidence a large number of truly regulated genes that were not detected in unamplified samples, thus enhancing the discovery potential of the microarray experiments.

Data submission to public repository.
According to recommendations of the Microarray Gene Expression Data (MGED) Society (1) and proposed standards for the publication of DNA microarray data ("minimal information about a microarray experiment," or MIAME) (5), array data from this study have been entered into the RNA Abundance Database (RAD) (14, 21), from where they have been deposited in the public repository ArrayExpress (http://www.ebi.ac.uk/arrayexpress).


    DISCUSSION
 
Since animal tissues are diverse in terms of the number and distribution of cell types, it is evident that cell-specific transcriptional profiling is necessary to obtain spatially precise information about complex programs of genetic regulation. This requires either the isolation of large amounts (5–10 µg total RNA) of RNA from homogeneous cells or cell type-enriched tissue preparations or, alternatively, the isolation of small amounts (50–100 ng total RNA) of homogeneous material followed by amplification of the mRNA prior to microarray analyses. Although there are several methods of amplifying mRNA in use at the present time, T7-driven linear amplification methods based upon the original work of Van Gelder and colleagues (22) offer distinct advantages by avoiding bias introduced by the inherent nonlinear amplification of PCR-based methods (20). This is now a routine component of the Affymetrix standard eukaryotic target labeling protocol (http://www.affymetrix.com).

Several recent studies have evaluated linear amplification for transcriptional profiling (2, 4, 9, 19, 20, 24, 26) and to a varying degree have demonstrated reproducibility in amplified data, greater sensitivity, and fidelity relative to unamplified samples. For example, Feldman et al. (9) in analyses of amplified RNA captured ~80% of the genes that were identified without amplification. Puskas et al. (20) reported similar results and, using QRT-PCR, also observed a larger number of false positives following amplification. Pabon et al. (19) demonstrated reproducible results using single-round amplification from as little as 1 ng mRNA, and indicated a lower limit of twofold for the detection of differentially expressed genes within or between amplified samples, but threefold when comparing amplified to unamplified data (greater variance). Zhao et al. (26) showed high reproducibility of amplified replicates for starting template amounts of 0.3–3 µg total RNA. They reported high fidelity of amplified data compared with unamplified data and low bias relative to a "gold standard" virtual array. Although generally supportive of our findings, these studies are limited in scope in that they either amplified microgram amounts of total RNA, utilized very limited replication, applied a heuristic approach to identify differentially regulated genes, and/or provided little validation of individual genes.

In the present study, we have demonstrated the fidelity and improved sensitivity for the detection of differential gene expression comparing 4,000- to 5,000-fold linear amplification from nanogram levels of total RNA to micrograms of unamplified material from the same source, thus demonstrating the utility of the approach for very limited, but experimentally realistic quantities of cells. In the case of a monolayer of vascular endothelial cells in vivo or in culture, an area of <1 cm^2 can be evaluated for gene expression. This is of particular value when comparing different vascular beds (11), regions within the same artery (7), or spatially sensitive cells in vitro (8). Furthermore, we have utilized sufficient replication to allow for statistical analysis in the identification of differentially expressed genes and used extensive independent validation of individual genes to estimate the concordance of our results with true biological expression differences.

Amplification provided enhanced sensitivity in detecting TNFα-regulated genes that might otherwise have been missed in unamplified samples. A significantly greater number of genes, 1,296 (9.4% of the array total), were identified as regulated by TNFα treatment in amplified samples than in unamplified samples (155 genes or 1.1%). This observation held when considering biological classifications of annotated genes (Table 1), some of which comprised genes that are generally present at low abundance and notably difficult to detect (e.g., transcription factors). However, the enhanced sensitivity did not appear to be confined to poorly expressed genes, as was evident when the 1,150 genes found in the "unique to amplified" group were distributed according to intensity values into bins established over the range of intensities (data not shown). This is also apparent in an M vs. A plot (Fig. 6), in which the data for these same 1,150 genes have similar spreads of dynamic range in intensities for both amplified and unamplified samples.
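
The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the array size (13,824 cDNAs) is taken from the abstract, the number of genes common to both lists is derived here as 1,296 - 1,150 rather than quoted from the text, and all variable names are hypothetical.

```python
# Counts quoted in the text and abstract; names are illustrative only.
ARRAY_SIZE = 13_824          # cDNAs on the filter array (from the abstract)
amplified_hits = 1_296       # genes identified in amplified samples
unamplified_hits = 155       # genes identified in unamplified samples
unique_to_amplified = 1_150  # genes identified only after amplification


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Derived, not quoted: genes identified by both protocols.
common = amplified_hits - unique_to_amplified

amplified_pct = percent(amplified_hits, ARRAY_SIZE)
unamplified_pct = percent(unamplified_hits, ARRAY_SIZE)
concordance_pct = percent(common, unamplified_hits)

print(f"amplified:   {amplified_pct:.1f}% of the array")
print(f"unamplified: {unamplified_pct:.1f}% of the array")
print(f"common genes: {common} ({concordance_pct:.0f}% of the unamplified list)")
```

Under these assumptions the derived overlap of 146 genes corresponds to the 94% agreement with unamplified samples reported in the abstract.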



Fig. 6. M vs. A plot for the 1,150 genes identified as the "unique to amplified" group. M = log2(I1) - log2(I2) and A = [log2(I1) + log2(I2)]/2, where I1 is the mean intensity for +TNFα and I2 is the mean intensity for -TNFα (n = 5 for amplified, n = 4 for unamplified). Light spots correspond to data from the amplified samples, whereas dark spots correspond to data from the unamplified samples. Note that there is a systematic shift of the amplified data toward lower intensities (A), due to an overall lower intensity associated with amplified arrays. Nevertheless, the amplified data span a similar dynamic range of intensities as the unamplified data without apparent intensity-dependent bias in the ability of amplification to identify genes missed by unamplified samples.
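
As a minimal sketch of the quantities plotted in Fig. 6, the following computes M and A for a single gene from its mean treated and control intensities. The function name and the example intensities are hypothetical and are not data from this study.

```python
import math


def ma_values(mean_treated: float, mean_control: float) -> tuple[float, float]:
    """Return (M, A) from mean intensities I1 (treated) and I2 (control)."""
    log_i1 = math.log2(mean_treated)
    log_i2 = math.log2(mean_control)
    m = log_i1 - log_i2          # M: log2 ratio of treated to control
    a = (log_i1 + log_i2) / 2.0  # A: mean log2 intensity
    return m, a


# Hypothetical mean intensities (I1, I2), for illustration only.
examples = {"gene_a": (800.0, 200.0), "gene_b": (150.0, 300.0)}
for name, (i1, i2) in examples.items():
    m, a = ma_values(i1, i2)
    print(f"{name}: M = {m:+.2f}, A = {a:.2f}")
```

Because A is an average on the log scale, a uniform drop in array intensity shifts points along the A axis without changing M, which is consistent with the systematic shift described in the legend.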

 
A practical question raised in the interpretation of transcriptional profiles generated from amplified RNA in the absence of unamplified reference arrays is whether some criteria could be applied that would allow the identification of the "common identified genes" found in this study. Validation of this group revealed a low number of false positives (≥90% validated by random sampling). Since ~63% of the genes with the smallest 10% of P values in the "identified in amplified" gene list belong to the "common identified genes" group, a higher probability of true biological differences exists in this lowest subset of P values. It should be noted that the largest differences in gene expression in response to TNFα were generally, but not always, associated with the smallest P values.
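
The kind of enrichment described above can be sketched as follows: rank the genes in a list by P value, take the lowest 10%, and ask what fraction of them also belongs to a reference set. The function name, gene identifiers, and P values are all hypothetical, and this is not necessarily how the ~63% figure was computed.

```python
def fraction_of_top_pvalues_in_set(
    pvalues: dict[str, float],
    reference: set[str],
    top_fraction: float = 0.10,
) -> float:
    """Fraction of the lowest-P-value genes that also appear in `reference`."""
    ranked = sorted(pvalues, key=pvalues.get)  # gene IDs, smallest P first
    n_top = max(1, round(len(ranked) * top_fraction))
    top = ranked[:n_top]
    return sum(gene in reference for gene in top) / n_top


# Toy input: 20 hypothetical genes with P values 0.01, 0.02, ..., 0.20.
toy_pvalues = {f"g{i}": (i + 1) / 100 for i in range(20)}
toy_common = {"g0", "g5"}  # stand-in for the "common identified genes"

share = fraction_of_top_pvalues_in_set(toy_pvalues, toy_common)
print(f"{share:.0%} of the lowest-decile genes are in the reference set")
```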

As is apparent from Table 1, the amplification protocol identified additional differential gene expression in several biological classifications of interest relative to known TNFα responses. The most prominent genes upregulated by TNFα (the adhesion molecules ICAM, VCAM, E-selectin; plus IL-8; MCP-1; fractalkine CX3C; and follistatin) were noted both with and without amplification, and their identities were in good agreement with recent reports by Murakami et al. (15) and Zhou et al. (27). An exception is squalene epoxidase, which was reported as upregulated by Murakami et al. (15), whereas significant downregulation was noted in our study. In addition, we noted a 10-fold increase in manganese superoxide dismutase (SOD2) and increases of collagen type II (20-fold) and wnt5a (3.2-fold), a member of the wingless family of signaling molecules involved in cell proliferation, differentiation, and organogenesis. We also identified >2-fold downregulation of genes encoding PECAM-1, bone morphogenetic proteins BMP2B and BMP4, endothelial nitric oxide synthase (eNOS), hepatocyte growth factor, and cytoskeletal organizing protein LIM.

A large number of potentially important genes were identified as differentially expressed only after RNA amplification. Gas-1, which encodes a mitochondrial electron transfer protein, was expressed 11-fold higher, and other examples included genes for various ras-related GTPase activating proteins, a protease inhibitor cystatin-C, MAP kinase 3, several G protein-coupled receptors and G proteins, VEGF, and the interleukins IL-1α, IL-1β, IL-6, and IL-15. Identified as downregulated were genes for serine/threonine kinases and a suppressor of c-fos. The majority of "unique to amplified" genes exhibited only modest differential expression ratios but with highly significant P values. There was a consistency associated with these smaller changes that suggests that they are real and potentially important. For example, all of the identified genes associated with known NFκB pathways were detected as upregulated, and in the annotated classification "extracellular matrix" we noted that metalloproteinases 1, 3, 8, and 10 were upregulated (range 1.6- to 2.4-fold) while matrix metalloproteinase 2 expression was suppressed. The detection of such changes facilitates a more comprehensive analysis of gene expression and its integration into the physiology of the cells.

Relaxing the FDR for the unamplified data to 31% (matching the "empirically estimated FDR" for amplification) resulted in the identification of an additional 99 genes, 69 of which were common to the amplified list at an FDR of 5%. However, this failed to capture the majority of the genes identified as "unique to amplified" (an additional 1,081), thus illustrating that the sensitivity of amplification is a real phenomenon and not related to the stringency of the statistical test applied. Although the "empirically estimated FDRs" of 31% for amplified and 12% for unamplified samples are both larger than the specified FDR of 5% in our analysis, they are reasonable for most experimental purposes. In fact, the FDR of 5% used in the present study is conservative and could be reasonably increased without greatly affecting the "empirically estimated FDR" for amplification (~31%). Clearly, in addition to accurately representing the majority of genes, amplification introduces some artifact for certain genes, incorrectly reporting differences reproducibly across replicates. The cause of this is not clear.
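
To illustrate what relaxing a specified FDR does to a gene list, the sketch below applies the Benjamini-Hochberg step-up rule (reference 3) at 5% and at 31% to a set of invented P values. This is an assumption-laden illustration: this section does not state which FDR procedure or software settings produced the reported lists, and the function name and P values are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float], fdr: float) -> list[bool]:
    """Return one reject flag per P value under the Benjamini-Hochberg rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Invented P values, not data from the study.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.09, 0.20, 0.35, 0.50, 0.70, 0.90]

strict = sum(benjamini_hochberg(toy_p, fdr=0.05))
relaxed = sum(benjamini_hochberg(toy_p, fdr=0.31))
print(f"identified at FDR 5%:  {strict}")
print(f"identified at FDR 31%: {relaxed}")
```

A looser threshold admits more genes, but only those whose P values already sit near the cutoff. That is in line with the observation above that relaxing the FDR for unamplified data added 99 genes yet still left 1,150 - 69 = 1,081 of the "unique to amplified" genes undetected.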

Systematic and random errors contribute to difficulties in interpreting microarray data, and extraneous factors can introduce bias which leads to statistically significant effects that do not reflect biology (17). Matching of control and experimental samples within animal, tissue sample, or culture is recommended to control for variation among individuals. In addition, running control and experimental samples on the same day may account for some of the many sources of technical variation. Bias introduced by differing amplification efficiency based on transcript size or secondary structure and differences in primers and labeling between amplified/unamplified protocols can be controlled for by amplifying both treatment and control samples.

Technical constraints led to the use of different primers for the labeling of amplified samples (random hexamer) and unamplified RNA samples (oligo dT), a strategy that has been employed previously (9, 20, 26). The use of random priming for amplified samples could have resulted in smaller probes and more nonspecific hybridization (4). However, since the same primers were used consistently within amplified or within unamplified samples, any systematic effects should cancel out when considering differential expression and thus are not likely to have a great impact on our results. This would not be true when comparing amplified to unamplified samples and may contribute to the lack of concordance observed when comparing between these samples (e.g., Fig. 3). Other studies have also reported greater variability when comparing amplified to unamplified samples (9, 19, 20, 26). Furthermore, based on our extensive validation by QRT-PCR (for which a reversal in the direction of regulation was used as a criterion in identifying false positives), it is unlikely that the difference in primers greatly impacts the results attributed to amplification in this study.

An additional advantage of amplification of individual small samples compared with the pooling of multiple unamplified samples is the improved statistical power arising from larger numbers of replicates (6, 12) and the avoidance of diluting local effects by pooling samples both within and between patients, animals, or cultures. Furthermore, by improving the detection rate of differential gene expression, amplification is not only a useful tool in overcoming small sample quantities, but an "enabling" methodology in studying gene regulatory networks.

In summary, single-round linear amplification of 100 ng total RNA (~1 ng mRNA) demonstrated fidelity of differentially expressed TNFα-responsive endothelial genes to an acceptably high level as evaluated by microarray analysis. Amplification not only allowed a significant degree of confidence to be attached to detection of differential expression in limiting amounts of starting material, but also identified significantly more gene expression changes than did analyses using unamplified RNA from the same source. The trade-off for this enhanced discovery potential is the probable addition of a number of false positives to the list of differentially expressed genes, which underscores the importance of validation of the microarray results. While unavoidable when there is no choice but to amplify RNA from small amounts of starting material, this may be preferred by an investigator who wishes to detect low-abundance genes or genes that are subtly differentially expressed. These results using single-round amplification of aRNA suggest that, even when adequate quantities of cells/tissues are available, routine amplification may be useful as a complementary tool in detecting additional differentially regulated genes.


    ACKNOWLEDGMENTS
 
The computational assistance of Gary Chang of the University of Pennsylvania is gratefully acknowledged. We thank Dr. Robert Nadon of McGill University and Dr. Dennis McCormac of Imaging Research for discussions of statistical analysis and the application of the ArrayStat software. We thank Drs. James Eberwine, Scott Diamond, and Don Baldwin of the University of Pennsylvania for critical reading of the manuscript.

These studies were supported by National Institutes of Health Grants HL-62250, HL-70128, K25-HG-02296, and K25-HG-00052, by National Space Biomedical Research Institute (NASA) Grant NSBRI-01-102, and by a Sponsored Research Award from AstraZeneca Pharmaceuticals.


    FOOTNOTES
 
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).

Address for reprint requests and other correspondence: P. F. Davies, Institute for Medicine and Engineering, Univ. of Pennsylvania, 1010 Vagelos Laboratories, 3340 Smith Walk, Philadelphia, PA 19104 (E-mail: pfd@pobox.upenn.edu).

10.1152/physiolgenomics.00173.2002.

* D. C. Polacek and A. G. Passerini contributed equally to this work.


    References

  1. Ball CA, Sherlock G, Parkinson H, Rocca-Sera P, Brooksbank C, Causton HC, Cavalieri D, Gaasterland T, Hingamp P, Holstege F, Ringwald M, Spellman P, Stoeckert CJ Jr, Stewart JE, Taylor R, Brazma A, and Quackenbush J. Standards for microarray data. Science 298: 539, 2002.
  2. Baugh LR, Hill AA, Brown EL, and Hunter CP. Quantitative analysis of mRNA amplification by in vitro transcription. Nucleic Acids Res 29: E29, 2001.
  3. Benjamini Y and Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statistical Soc Ser B-Methodological 57: 289–300, 1995.
  4. Bosch I, Melichar H, and Pardee AB. Identification of differentially expressed genes from limited amounts of RNA. Nucleic Acids Res 28: E27, 2000.
  5. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, and Vingron M. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29: 365–371, 2001.
  6. Cook SA and Rosenzweig A. DNA microarrays: implications for cardiovascular medicine. Circ Res 91: 559–564, 2002.
  7. Davies PF, Polacek DC, Handen JS, Helmke BP, and DePaola N. A spatial approach to transcriptional profiling: mechanotransduction and the focal origin of atherosclerosis. Trends Biotechnol 17: 347–351, 1999.
  8. DePaola N, Davies PF, Pritchard WF Jr, Florez L, Harbeck N, and Polacek DC. Spatial and temporal regulation of gap junction connexin43 in vascular endothelial cells exposed to controlled disturbed flows in vitro. Proc Natl Acad Sci USA 96: 3154–3159, 1999.
  9. Feldman AL, Costouros NG, Wang E, Qian M, Marincola FM, Alexander HR, and Libutti SK. Advantages of mRNA amplification for microarray analysis. Biotechniques 33: 906–914, 2002.
  10. Florell SR, Coffin CM, Holden JA, Zimmermann JW, Gerwels JW, Summers BK, Jones DA, and Leachman SA. Preservation of RNA for functional genomic studies: a multidisciplinary tumor bank protocol. Mod Pathol 14: 116–128, 2001.
  11. Garlanda C and Dejana E. Heterogeneity of endothelial cells. Specific markers. Arterioscler Thromb Vasc Biol 17: 1193–1202, 1997.
  12. Lee ML, Kuo FC, Whitmore GA, and Sklar J. Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci USA 97: 9834–9839, 2000.
  13. Luo L, Salunga RC, Guo H, Bittner A, Joy KC, Galindo JE, Xiao H, Rogers KE, Wan JS, Jackson MR, and Erlander MG. Gene expression profiles of laser-captured adjacent neuronal subtypes. Nat Med 5: 117–122, 1999.
  14. Manduchi E, Pizarro A, and Stoeckert CJ Jr. RAD (RNA Abundance Database): an infrastructure for array data analysis. Proc SPIE 4266: 68–78, 2001.
  15. Murakami T, Mataki C, Nagao C, Umetani M, Wada Y, Ishii M, Tsutsumi S, Kohro T, Saiura A, Aburatani H, Hamakubo T, and Kodama T. The gene expression profile of human umbilical vein endothelial cells stimulated by tumor necrosis factor alpha using DNA microarray analysis. J Atheroscler Thromb 7: 39–44, 2000.
  16. Nadon R, Shi P, Skandalis A, Woody E, Hubschle H, Susko E, Rghei N, and Ramm P. Statistical inference methods for gene expression arrays. In: Microarrays: Optical Technologies and Informatics, edited by Bittner M, Chen Y, Dorsel A, and Dougherty E. Bellingham, WA: SPIE, 2001, p. 46–55.
  17. Nadon R and Shoemaker J. Statistical issues with microarrays: processing and analysis. Trends Genet 18: 265–271, 2002.
  18. Nagle RB. New molecular approaches to tissue analysis. J Histochem Cytochem 49: 1063–1064, 2001.
  19. Pabon C, Modrusan Z, Ruvolo MV, Coleman IM, Daniel S, Yue H, Arnold LJ Jr, and Reynolds MA. Optimized T7 amplification system for microarray analysis. Biotechniques 31: 874–879, 2001.
  20. Puskas LG, Zvara A, Hackler L Jr, and Van Hummelen P. RNA amplification results in reproducible microarray data with slight ratio bias. Biotechniques 32: 1330–1340, 2002.
  21. Stoeckert CJ Jr, Pizarro A, Manduchi E, Gibson M, Brunk B, Crabtree J, Schug J, Shen-Orr S, and Overton GC. A relational schema for both array-based and SAGE gene expression experiments. Bioinformatics 17: 300–308, 2001.
  22. Van Gelder RN, von Zastrow ME, Yool A, Dement WC, Barchas JD, and Eberwine JH. Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proc Natl Acad Sci USA 87: 1663–1667, 1990.
  23. Van Rijen HV, van Kempen MJ, Postma S, and Jongsma HJ. Tumour necrosis factor α alters the expression of connexin43, connexin40, and connexin37 in human umbilical vein endothelial cells. Cytokine 10: 258–264, 1998.
  24. Wang E, Miller LD, Ohnmacht GA, Liu ET, and Marincola FM. High-fidelity mRNA amplification for gene profiling. Nat Biotechnol 18: 457–459, 2000.
  25. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, and Speed TP. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res 30: E15, 2002.
  26. Zhao H, Hastie T, Whitfield ML, Borresen-Dale AL, and Jeffrey SS. Optimization and evaluation of T7 based RNA linear amplification protocols for cDNA microarray analysis. BMC Genomics 3: 31, 2002.
  27. Zhou J, Jin Y, Gao Y, Wang H, Hu G, Huang Y, Chen Q, Feng M, and Wu C. Genomic-scale analysis of gene expression profiles in TNF-α treated human umbilical vein endothelial cells. Inflamm Res 51: 332–341, 2002.