1 Institute for Medicine and Engineering
2 Department of Pathology and Laboratory Medicine
3 Department of Bioengineering
4 Center for Bioinformatics
5 Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104
6 AstraZeneca Pharmaceuticals, Mereside Alderley Park, Macclesfield, Cheshire SK10 4TG, United Kingdom
ABSTRACT |
---|
high-throughput screening; quantitative real-time polymerase chain reaction; tumor necrosis factor; false discovery rate
INTRODUCTION |
---|
We tested the fidelity of differential gene expression following linear amplification of nanogram quantities of mRNA in a well-established in vitro model of cytokine stimulation of human endothelial cells. An extensive literature describes the induction of specific adhesion molecules, chemokines, and transcription factors by tumor necrosis factor-α (TNF-α) (15, 23, 27), which provided a reference for expected changes in gene expression. We compared differential expression profiles obtained through statistical analysis of multiple replicate arrays using unamplified RNA to those generated following a 4,000- to 5,000-fold linear amplification of aliquots (100 ng total RNA) taken from the same pool of total RNA. The validity of differential expression in sets of genes identified from the analysis of unamplified and amplified RNA samples was subsequently assessed by quantitative real-time PCR (QRT-PCR). The objectives of this study were 1) to evaluate and compare endothelial responses to TNF-α stimulation, 2) to assess the fidelity of differential expression by comparing amplified and unamplified profiles generated using standard microarray experiments, and 3) to evaluate the sensitivity of detection in amplified samples.
MATERIALS AND METHODS |
---|
TNF-α stimulation.
Confluent HAEC were harvested at passage 6 by treatment with trypsin (0.05%)/EDTA (0.53 mM), and 5.1 × 10⁶ cells were seeded into 150-mm diameter culture dishes (Becton-Dickinson Labware, Franklin Lakes, NJ). Cells were grown in complete medium for 48 h, to 80% confluence, with a change of medium after 24 h. One day prior to stimulation, cells were switched to a basal medium (EBM-2) supplemented with 2% calf serum and 0.1% gentamicin sulfate amphotericin-B (starvation medium) to suppress cell cycle-specific gene expression. Cells were then stimulated for 2 h at 37°C with 10 ng/ml recombinant human TNF-α (+TNF-α) (R & D Systems, Minneapolis, MN). Control HAEC received only fresh starvation medium (-TNF-α). Cells of the same passage were pooled from several dishes prior to RNA isolation, resulting in a single large sample for each condition (+TNF-α and -TNF-α) on which all replicate experiments were performed.
RNA extraction.
Total RNA was extracted using the RNeasy Total RNA Isolation Kit (Qiagen, Valencia, CA), which avoids the use of phenol and chloroform that may interfere with subsequent enzymatic steps. Briefly, medium was removed and the cells were washed with PBS and lysed in a buffer containing guanidine isothiocyanate and β-mercaptoethanol (0.143 M). The lysate was homogenized and precipitated with 70% ethanol, transferred to a silica membrane column, and DNA and proteins were removed by a series of washes and centrifugations. Highly purified total RNA was then eluted from the column using RNase-free water. The integrity and quantity of the total RNA samples were evaluated by an Agilent 2100 Bioanalyzer using the RNA 6000 Nano Chips assay kit (Agilent Technologies, Waldbronn, Germany). The size range of the aRNA was evaluated against Ambion's RNA 6000 Ladder. Additional quantitative assessment of the total RNA samples was performed using a Beckman Spectrophotometer (OD260/280). The RNA was divided into aliquots and frozen at -80°C until amplified.
RNA amplification.
mRNA (1 ng) was amplified from 100 ng total RNA (equivalent to approximately 10⁴ HAEC) using the MessageAmp aRNA Kit (Ambion, Austin, TX), which is based upon the aRNA amplification procedure first described by Van Gelder and colleagues (22). Poly(A) RNA was reverse transcribed using an oligo(dT) primer containing a T7 RNA polymerase promoter sequence. RNase H treatment cleaved the mRNA into small fragments that served as primers during second-strand synthesis, resulting in a double-stranded cDNA template for T7-mediated linear amplification by in vitro transcription. Typically 4 to 5 µg aRNA was produced from one round of amplification (a 4,000- to 5,000-fold amplification). The aRNA was quantified by Agilent Nano Chip technology and evaluated for size relative to pure polyadenylated RNA. Two micrograms aRNA was subsequently labeled by reverse transcription using hexanucleotide priming.
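The stated fold amplification follows directly from the input and yield: 1 ng of mRNA amplified 4,000- to 5,000-fold corresponds to roughly 4 to 5 µg of aRNA. A minimal sketch of that arithmetic is given below; the function name `fold_amplification` is hypothetical and introduced only for illustration.

```python
def fold_amplification(arna_yield_ug: float, mrna_input_ng: float) -> float:
    """Return the fold amplification for an aRNA yield (µg) and mRNA input (ng)."""
    if mrna_input_ng <= 0:
        raise ValueError("mRNA input must be positive")
    # Convert the yield from micrograms to nanograms before taking the ratio.
    return (arna_yield_ug * 1000.0) / mrna_input_ng


# Input quoted in the text: about 1 ng mRNA contained in 100 ng total RNA.
for yield_ug in (4.0, 5.0):
    fold = fold_amplification(yield_ug, mrna_input_ng=1.0)
    print(f"{yield_ug:.0f} µg aRNA from 1 ng mRNA: {fold:,.0f}-fold")
```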
Microarray filter design and printing.
Microarray filters were designed and printed by AstraZeneca Pharmaceuticals (Alderley Park, UK). 3'-Biased, sequence-verified cDNA clones (1.5 to 2.0 kb) were identified from Incyte and GenBank databases using proprietary software. Approximately 4,700 of the cDNAs represented the cardiovascular gene expression database of the University of Toronto. The balance consisted of placental genes, G protein-coupled receptor-related genes, housekeeping genes, and proprietary expressed sequence tags (ESTs) (Incyte, Palo Alto, CA). PCR products were prepared from overnight bacterial cultures and assessed by agarose gel electrophoresis and PicoGreen analysis. Clones were rearrayed, and 196 were selected at random and sequence-verified to confirm sample ID. Then, 13,824 cDNAs (at 100 ng/µl) were spotted in duplicate onto 22 × 11-cm Nytran C membranes (Schleicher and Schuell) using a Genetix QBot fitted with a 384-pin (0.4 mm) print-head. The membranes were cross-linked and subsequently denatured, neutralized, and washed before hybridization by consecutive treatment for 5 min each in 1) 1.5 M NaOH, 3 M NaCl; 2) 0.75 M Tris, 1.5 M NaCl; 3) 0.5 M Tris, pH 8.0; and 4) 2× SSC. The treated filters were dried at room temperature between Whatman 3MM papers and stored at -80°C.
Probe labeling.
33P-labeled DNA probes were synthesized using the SuperScript II reverse transcriptase kit (GIBCO BRL; Life Technologies, Rockville, MD) with minimal modifications. Ten micrograms HAEC total RNA (unamplified) was denatured at 65°C for 5 min and then incubated at 42°C for 1 h with first-strand buffer (50 mM Tris·HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2), 10 mM DTT, 50 µCi [33P]dATP (PerkinElmer Life Sciences, Boston, MA), 0.5 mM dCTP, 0.5 mM dGTP, 0.5 mM dTTP, 1 µg oligo(dT)15 primer (Promega, Madison, WI), and 200 U SuperScript II reverse transcriptase. Alternatively, 2 µg aRNA (amplified) was labeled using the same protocol, except that 100 ng random hexamer (Amersham Pharmacia Biotech, Piscataway, NJ) was used as primer. Free nucleotides, primers, and enzyme were removed from labeled cDNA probes by QIAquick PCR purification kit (Qiagen). Two microliters of purified probes from each 100-µl sample were used to assess the efficiency of the labeling reaction by liquid scintillation counting.
Hybridization to cDNA arrays.
Four replicate arrays for each condition (+TNF-α and -TNF-α) were hybridized with unamplified RNA samples and five replicates with amplified RNA samples derived from the same RNA pools. The filters were prehybridized for 6 h in 10 ml of hybridization solution (200 mM sodium phosphate, 10 mM EDTA, 1% BSA, 6.7% SDS, and 6.7% deionized formamide). Radiolabeled probes were denatured at 95°C for 5 min and chilled on ice. The probes were added into 6 ml of fresh hybridization solution along with 50 µl Human CotI DNA (GIBCO BRL, Life Technologies), and the filters were hybridized overnight. Hybridizations were carried out at 62°C in a hybridization oven with continuous rotation. Filters were then washed three times (40 mM sodium phosphate solution, 1 mM EDTA, 1% SDS), sealed in plastic wrap, and exposed to phosphor imaging screens (Eastman Kodak, Rochester, NY) at room temperature for 5 days. The screens were scanned on the Storm System (Molecular Dynamics, Sunnyvale, CA) at 50 µm resolution.
Array analysis.
Image files were quantified using ArrayVision V.6.3 (Imaging Research, St. Catharines, Ontario). Raw intensity values (computed via the "volume" principal measure in ArrayVision) were corrected on an individual basis using local background estimates (median intensity value of the pixels in four valley regions surrounding each spot, adjusted to the size of the spot), and duplicate background corrected intensities were averaged for each gene. These data were preprocessed and analyzed according to the methods of ArrayStat V.1.2 (Imaging Research) (16, 17). Briefly, the data were log-transformed and centered within conditions. A nonparametric spline curve fit was used as part of a pooled approach to estimating random error over intensities. Outliers were detected and removed according to an algorithm which fits standardized residuals to a normal Gaussian distribution. A minimum of three replicates per condition (outliers removed) was specified for a gene to be considered for further analysis. The data were normalized across conditions by adjusting the mean difference between conditions to zero. Although the normalization was computed across all genes (outliers removed), an iterative approach was applied in which 2% of the genes most differentially expressed were successively removed from the calculation of the mean until there appeared to be no further influence upon the normalization.
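The iterative normalization described above can be sketched as follows. This is not the ArrayStat implementation, whose internals are not given here. It assumes per-gene mean log intensities for each condition as input. At each pass it drops the 2% of remaining genes whose between-condition difference lies farthest from the current offset, and it stops when the offset changes by less than a tolerance. The function name, the tolerance, and the synthetic data are all hypothetical.

```python
from statistics import mean


def iterative_offset(log_treat, log_ctrl, trim_fraction=0.02, tol=1e-6, max_iter=50):
    """Estimate the between-condition offset from paired per-gene log intensities.

    Each pass removes the trim_fraction of remaining genes that deviate most
    from the current offset, then recomputes the mean difference.
    """
    diffs = [t - c for t, c in zip(log_treat, log_ctrl)]
    offset = mean(diffs)
    for _ in range(max_iter):
        n_drop = max(1, int(len(diffs) * trim_fraction))
        if len(diffs) - n_drop < 2:
            break  # too few genes left to trim further
        kept = sorted(diffs, key=lambda d: abs(d - offset))[: len(diffs) - n_drop]
        new_offset = mean(kept)
        converged = abs(new_offset - offset) < tol
        diffs, offset = kept, new_offset
        if converged:
            break
    return offset


# Synthetic data (hypothetical): a constant 0.5 log-unit offset between
# conditions, plus two strongly induced genes that would bias a plain mean.
ctrl = [8.0 + 0.05 * i for i in range(100)]
treat = [c + 0.5 for c in ctrl]
treat[0] += 5.0
treat[1] += 5.0

offset = iterative_offset(treat, ctrl)
normalized_treat = [t - offset for t in treat]
print(f"estimated offset: {offset:.3f}")
```

Subtracting the estimated offset from one condition sets the mean difference of the retained genes to zero, so strongly regulated genes no longer pull the normalization toward themselves.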
Putative sets of differentially expressed genes were identified by ArrayStat via application of a statistical test, which provides P values that are corrected for multiple testing using a false discovery rate (FDR) approach (3) with FDR = 5%. In this way 95% of the putative set is expected to be true positives. In an FDR approach, the expected proportion of false positives among the set of all predictions is controlled, whereas in the classic family-wise type I error approach, the probability that there is at least one false prediction is controlled (3). Therefore, an FDR as high as 50% or even higher might still be acceptable, whereas a P value as high as 50% would not be. For example, if there are thousands of genes represented on the array, of which only one percent are differentially expressed, then it would be beneficial to reveal a subset of 100 genes, half of which are truly differentially expressed (i.e., with an FDR of 50%). Since it is generally not necessary to find a set with no false predictions, especially at great cost in terms of the power of the test, the FDR approach is widely considered more appropriate for microarray analysis.
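For illustration, a minimal sketch of a step-up FDR procedure in the style of Benjamini and Hochberg is shown below. That ArrayStat applies exactly this rule is an assumption, and the P values are hypothetical, not data from this study.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return sorted indices of hypotheses rejected by the step-up FDR rule.

    The largest rank k with p_(k) <= k * fdr / m is found, and the k smallest
    P values are declared significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical P values for ten genes.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
for level in (0.05, 0.50):
    selected = benjamini_hochberg(p, fdr=level)
    print(f"FDR = {level:.0%}: {len(selected)} genes selected")
```

Raising the FDR level enlarges the selected set at the cost of a larger expected share of false positives, which is the trade-off discussed above.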
Plots of M vs. A were generated to evaluate the quality of the data. These are plots of the difference of log intensities $M = \log_2 I_1 - \log_2 I_2$ vs. the mean log intensity $A = (\log_2 I_1 + \log_2 I_2)/2$, where $I_1$ and $I_2$ are the intensities of the treatment (+TNF-α) and control (-TNF-α) conditions, respectively, or of two replicate arrays within a condition. The plots identify potential intensity-dependent biases in the data and are visually more revealing than scatter plots of $\log_2 I_2$ vs. $\log_2 I_1$ (25).
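The M and A coordinates defined above can be computed per gene as in the following sketch. The function name and the example intensities are hypothetical, and the input is assumed to be background-corrected and strictly positive.

```python
import math


def ma_values(i1: float, i2: float) -> tuple[float, float]:
    """Return (M, A) for paired intensities i1 (treatment) and i2 (control)."""
    if i1 <= 0 or i2 <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    m = math.log2(i1) - math.log2(i2)        # log ratio
    a = (math.log2(i1) + math.log2(i2)) / 2  # mean log intensity
    return m, a


# Hypothetical intensity pairs (treatment, control) for three genes.
pairs = [(800.0, 200.0), (150.0, 160.0), (40.0, 90.0)]
for i1, i2 in pairs:
    m, a = ma_values(i1, i2)
    print(f"I1={i1:7.1f}  I2={i2:7.1f}  M={m:+.2f}  A={a:.2f}")
```

Plotting M against A for all genes puts the unchanged genes along M = 0, so any intensity-dependent curvature away from that line is easy to see.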
The data for the putative sets of differentially expressed genes computed in ArrayStat for the amplified and unamplified RNA samples were imported into GeneSpring (Silicon Genetics, Redwood City, CA) along with a flag indicating significant differential expression. The genes were annotated using information available in public databases and hierarchically classified according to a simple gene ontology constructed based on these annotations. Gene lists were filtered for significance and combined using Venn diagrams according to these biological classifications (see Table 1).
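The Venn-diagram comparison of gene lists amounts to simple set operations. The sketch below uses placeholder gene identifiers rather than genes from this study, and the function names are hypothetical. It is not how GeneSpring performs the comparison internally.

```python
def partition_gene_lists(amplified, unamplified):
    """Split two lists of significant genes into the three Venn regions."""
    amplified, unamplified = set(amplified), set(unamplified)
    return {
        "common identified genes": amplified & unamplified,
        "unique to amplified": amplified - unamplified,
        "unique to unamplified": unamplified - amplified,
    }


def fraction_recovered(amplified, unamplified):
    """Fraction of the unamplified list that is also in the amplified list."""
    unamplified = set(unamplified)
    return len(set(amplified) & unamplified) / len(unamplified)


# Placeholder identifiers (hypothetical).
amplified_list = {"gene_a", "gene_b", "gene_c", "gene_d", "gene_e"}
unamplified_list = {"gene_b", "gene_c", "gene_f"}

for region, genes in partition_gene_lists(amplified_list, unamplified_list).items():
    print(f"{region}: {sorted(genes)}")
print(f"recovered by amplification: {fraction_recovered(amplified_list, unamplified_list):.0%}")
```

Applied to the counts reported in the Results, the same ratio is 146 common genes out of 155 identified without amplification, or 94%.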
RESULTS |
---|
The results of Venn diagrams generated for selected biological classifications of annotated genes of possible importance to the TNF-α response are summarized in Table 1. An expanded version of Table 1 containing links to the complete annotated gene lists is available online at http://www.cbil.upenn.edu/RAD3/fidelity_of_amplified_RNA. Similar trends were observed for these classifications as for the entire population of significantly regulated genes described above. Specifically, in each case amplification captured all of the genes that were identified as differentially regulated without amplification and identified many additional genes. Some of the classifications presented in Table 1 include genes that are generally present at low abundance and difficult to detect without amplification, for example, the group of transcription factors. The results demonstrate greater sensitivity associated with amplification given adequate replication (n ≥ 4 in our study). The accuracy of these additional predicted changes is addressed below.
Characterization of the "common identified genes" group.
Ranking the group of common identified genes by expression ratio (+TNF-α/-TNF-α) revealed a similar rank order in amplified and unamplified samples. This is shown for a subset of common identified adhesion genes in Table 2. The full list of 146 differentially expressed genes is presented as a supplementary file at http://www.cbil.upenn.edu/RAD3/fidelity_of_amplified_RNA. The rank order of these genes is similar whether unamplified or amplified RNA was used and tends to vary only in those instances where neighboring gene expression ratios are very similar. Strongly upregulated or downregulated genes in the amplified group were similarly regulated in the unamplified group. There was a tendency for the magnitudes of differential expression to be greater in the amplified group, particularly when expression ratios exceeded threefold up or down. Despite such exaggerated expression differences, however, it is noteworthy that the amplified samples generally produced closer agreement with +TNF-α/-TNF-α expression ratios measured by QRT-PCR than did the unamplified material. Additionally, the list of "common identified genes" was associated mainly with the lowest P values (highest probability of differential expression).
Characterization of the "unique to amplified" gene group.
The 1,150 genes found to be regulated in the "unique to amplified" group were ranked by P values, and 24 were selected over the entire range of P values (P < 10⁻¹⁵ to P = 0.005) for validation by QRT-PCR (Fig. 5B). Of these genes, 67% (16 of 24) were truly regulated. Combining these results with the QRT-PCR results in the previous section produces an "empirically estimated FDR" of 31% for detection of regulated genes in amplified samples. Although this is higher than the projected FDR of 5% from ArrayStat, it is a reasonably low FDR for typical experimental purposes. Approximately 63% of the genes with the smallest 10% of P values in the "identified in amplified" gene list belong to the "common identified genes" group.
Characterization of the "unique to unamplified" gene group.
Six genes were chosen for validation by QRT-PCR from the nine found to be "unique to unamplified" (sequence unavailable for the remaining three). Only three of the six genes evaluated were truly differentially expressed (Fig. 5C). Although 50% of the genes validated from this group represent false positives for unamplified samples, the "empirically estimated FDR" for the entire "identified in unamplified" gene set, which includes the "common identified genes", is 12%.
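The 31% and 12% estimates can be cross-checked against the counts quoted in the text. The sketch below assumes that each "empirically estimated FDR" is a gene-count-weighted average over the Venn groups; the weighting scheme is an assumption. The false-positive count for the "common identified genes" group is back-calculated from the reported 12% and is not a reported value.

```python
# Gene counts quoted in the text.
N_COMMON = 146
N_UNIQUE_AMPLIFIED = 1150
N_UNIQUE_UNAMPLIFIED = 9

# QRT-PCR validation results quoted in the text: (genes tested, genes confirmed).
VALIDATED_UNIQUE_AMPLIFIED = (24, 16)
VALIDATED_UNIQUE_UNAMPLIFIED = (6, 3)


def false_fraction(tested, confirmed):
    """Fraction of validated genes that were not confirmed."""
    return (tested - confirmed) / tested


def weighted_fdr(groups):
    """Gene-count-weighted FDR over (n_genes, expected_false_genes) pairs."""
    total_genes = sum(n for n, _ in groups)
    total_false = sum(f for _, f in groups)
    return total_false / total_genes


false_unique_unamp = N_UNIQUE_UNAMPLIFIED * false_fraction(*VALIDATED_UNIQUE_UNAMPLIFIED)
false_unique_amp = N_UNIQUE_AMPLIFIED * false_fraction(*VALIDATED_UNIQUE_AMPLIFIED)

# Assumption: the reported 12% applies to the 155 genes of the unamplified set,
# so the implied false positives in the common group follow by subtraction.
n_unamplified = N_COMMON + N_UNIQUE_UNAMPLIFIED
false_common = 0.12 * n_unamplified - false_unique_unamp

amplified_fdr = weighted_fdr([
    (N_COMMON, false_common),
    (N_UNIQUE_AMPLIFIED, false_unique_amp),
])
print(f"implied false positives in common group: {false_common:.1f}")
print(f"estimated FDR, amplified set: {amplified_fdr:.0%}")
```

Under these assumptions, the amplified-set estimate comes out at about 31%, in line with the value reported above.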
We conclude that, in addition to identifying 94% of the genes found by analysis of unamplified samples, amplification identified with reasonable confidence a large number of truly regulated genes that were not detected in unamplified samples, thus enhancing the discovery potential of the microarray experiments.
Data submission to public repository.
According to recommendations of the Microarray Gene Expression Data (MGED) Society (1) and proposed standards for the publication of DNA microarray data ("minimal information about a microarray experiment," or MIAME) (5), array data from this study have been entered into the RNA Abundance Database (RAD) (14, 21), from where they have been deposited in the public repository ArrayExpress (http://www.ebi.ac.uk/arrayexpress).
DISCUSSION |
---|
Several recent studies have evaluated linear amplification for transcriptional profiling (2, 4, 9, 19, 20, 24, 26) and to a varying degree have demonstrated reproducibility in amplified data, greater sensitivity, and fidelity relative to unamplified samples. For example, Feldman et al. (9) in analyses of amplified RNA captured 80% of the genes that were identified without amplification. Puskas et al. (20) reported similar results and using QRT-PCR also observed a larger number of false positives following amplification. Pabon et al. (19) demonstrated reproducible results using single-round amplification from as low as 1 ng mRNA, and indicated a lower limit of twofold for the detection of differentially expressed genes within or between amplified samples, but threefold when comparing amplified to unamplified data (greater variance). Zhao et al. (26) showed high reproducibility of amplified replicates for starting template amounts of 0.33 µg total RNA. They reported high fidelity of amplified data compared with unamplified data and low bias relative to a "gold standard" virtual array. Although generally supportive of our findings, these studies are limited in scope in that they either amplified microgram amounts of total RNA, utilized very limited replication, applied a heuristics approach to identify differentially regulated genes, and/or provided little validation of individual genes.
In the present study, we have demonstrated the fidelity and improved sensitivity for the detection of differential gene expression comparing 4,000- to 5,000-fold linear amplification from nanogram levels of total RNA to micrograms of unamplified material from the same source, thus demonstrating the utility of the approach for very limited, but experimentally realistic quantities of cells. In the case of a monolayer of vascular endothelial cells in vivo or in culture, an area of <1 cm² can be evaluated for gene expression. This is of particular value when comparing different vascular beds (11), regions within the same artery (7), or spatially sensitive cells in vitro (8). Furthermore, we have utilized sufficient replication to allow for statistical analysis in the identification of differentially expressed genes and used extensive independent validation of individual genes to estimate the concordance of our results with true biological expression differences.
Amplification provided enhanced sensitivity in detecting TNF-α-regulated genes that might have otherwise been missed in unamplified samples. A significantly greater number of genes, 1,296 (9.4% of the array total), were identified as regulated by TNF-α treatment in amplified samples than in unamplified samples (155 genes or 1.1%). This observation held when considering biological classifications of annotated genes (Table 1), some of which comprised genes that are generally present at low abundance and notably difficult to detect (e.g., transcription factors). However, the enhanced sensitivity did not appear to be confined to only poorly expressed genes, as was evident when the 1,150 genes found in the "unique to amplified" group were distributed according to intensity values into bins established over the range of intensities (data not shown). This is also apparent in an M vs. A plot (Fig. 6) in which the data for these same 1,150 genes have similar spreads of dynamic range in intensities for both amplified and unamplified samples.
As is apparent from Table 1, the amplification protocol identified additional differential gene expression in several biological classifications of interest relative to known TNF-α responses. The most prominent genes upregulated by TNF-α (the adhesion molecules ICAM, VCAM, E-selectin; plus IL-8; MCP-1; fractalkine CX3C; and follistatin) were noted both with and without amplification, and their identities were in good agreement with recent reports by Murakami et al. (15) and Zhou et al. (27). An exception is squalene epoxidase, which was reported upregulated by Murakami et al. (15), whereas significant downregulation was noted in our study. In addition, however, we noted a 10-fold increase in manganese superoxide dismutase (SOD2) and increases of collagen type II (20-fold) and wnt5a (3.2-fold), a member of the wingless family of signaling molecules involved in cell proliferation, differentiation, and organogenesis. We also identified >2-fold downregulation of genes encoding PECAM-1, bone morphogenetic proteins BMP2B and BMP4, endothelial nitric oxide synthase (eNOS), hepatocyte growth factor, and cytoskeletal organizing protein LIM.
A large number of potentially important genes were identified as differentially expressed only after RNA amplification. Gas-1, which encodes a mitochondrial electron transfer protein, was expressed 11-fold higher, and other examples included genes for various ras-related GTPase activating proteins, a protease inhibitor cystatin-C, MAP kinase 3, several G protein-coupled receptors and G proteins, VEGF, and the interleukins IL-1, IL-1β, IL-6, and IL-15. Identified as downregulated were genes for serine/threonine kinases and a suppressor of c-fos. The majority of "unique to amplified" genes exhibited only modest differential expression ratios but with highly significant P values. There was a consistency associated with these smaller changes that suggests that they are real and potentially important. For example, all of the identified genes associated with known NF-κB pathways were detected to be upregulated, and in the annotated classification "extracellular matrix" we noted that metalloproteinases 1, 3, 8, and 10 were upregulated (range 1.6- to 2.4-fold) while matrix metalloproteinase 2 expression was suppressed. The detection of such changes facilitates a more comprehensive analysis of gene expression and its integration into the physiology of the cells.
Relaxing the FDR for the unamplified data to 31% (matching the "empirically estimated FDR" for amplification) resulted in the identification of an additional 99 genes, 69 of which were common to the amplified list at an FDR of 5%. However, this failed to capture the majority of the genes identified as "unique to amplified" (an additional 1,081), thus illustrating that the sensitivity of amplification is a real phenomenon and not related to the stringency of the statistical test applied. Although the "empirically estimated FDRs" of 31% for amplified and 12% for unamplified samples are both larger than the specified FDR of 5% in our analysis, they are reasonable for most experimental purposes. In fact, the FDR of 5% used in the present study is conservative and could be reasonably increased without greatly affecting the "empirically estimated FDR" for amplification (31%). Clearly, in addition to accurately representing the majority of genes, amplification introduces some artifact for certain genes, incorrectly reporting differences reproducibly across replicates. The cause of this is not clear.
Systematic and random errors contribute to difficulties in interpreting microarray data, and extraneous factors can introduce bias which leads to statistically significant effects that do not reflect biology (17). Matching of control and experimental samples within animal, tissue sample, or culture is recommended to control for variation among individuals. In addition, running control and experimental samples on the same day may account for some of the many sources of technical variation. Bias introduced by differing amplification efficiency based on transcript size or secondary structure and differences in primers and labeling between amplified/unamplified protocols can be controlled for by amplifying both treatment and control samples.
Technical constraints led to the use of different primers for the labeling of amplified samples (random hexamer) and unamplified RNA samples (oligo dT), a strategy that has been previously employed (9, 20, 26). The use of random priming for amplified samples could have resulted in smaller probes and more nonspecific hybridization (4). However, since the same primers were used consistently within amplified or within unamplified samples, any systematic effects should cancel out when considering differential expression and thus are not likely to have a great impact on our results. This would not be true when comparing amplified to unamplified samples and may factor into the lack of concordance observed when comparing between these samples (e.g., Fig. 3). Other studies have also reported greater variability when comparing amplified to unamplified samples (9, 19, 20, 26). Furthermore, based on our extensive validation by QRT-PCR (for which a reversal in the direction of regulation was used as a criterion in identifying false positives), it is unlikely that the difference in primers greatly impacts the results attributed to amplification in this study.
An additional advantage of amplification of individual small samples compared with the pooling of multiple unamplified samples is the improved statistical power arising from larger numbers of replicates (6, 12) and the avoidance of diluting local effects by pooling samples both within and between patients, animals, or cultures. Furthermore, by improving the detection rate of differential gene expression, amplification is not only a useful tool in overcoming small sample quantities, but an "enabling" methodology in studying gene regulatory networks.
In summary, single-round linear amplification of 100 ng total RNA (1 ng mRNA) demonstrated fidelity of differentially expressed TNF-α-responsive endothelial genes to an acceptably high level as evaluated by microarray analysis. Amplification not only allowed a significant degree of confidence to be attached to detection of differential expression in limiting amounts of starting material, but also identified significantly more gene expression changes than did analyses using unamplified RNA from the same source. The trade-off for this enhanced discovery potential is the probable addition of a number of false positives to the list of differentially expressed genes, which underscores the importance of validation of the microarray results. While unavoidable when there is no choice but to amplify RNA from small amounts of starting material, this may be preferred by an investigator who wishes to detect low-abundance genes or genes that are subtly differentially expressed. These results using single-round amplification of aRNA suggest that, even when adequate quantities of cells/tissues are available, routine amplification may be useful as a complementary tool in detecting additional differentially regulated genes.
ACKNOWLEDGMENTS |
---|
These studies were supported by National Institutes of Health Grants HL-62250, HL-70128, K25-HG-02296, and K25-HG-00052, by National Space Biomedical Research Institute (NASA) Grant NSBRI-01-102, and by a Sponsored Research Award from AstraZeneca Pharmaceuticals.
FOOTNOTES |
---|
Address for reprint requests and other correspondence: P. F. Davies, Institute for Medicine and Engineering, Univ. of Pennsylvania, 1010 Vagelos Laboratories, 3340 Smith Walk, Philadelphia, PA 19104 (E-mail: pfd{at}pobox.upenn.edu).
10.1152/physiolgenomics.00173.2002.
* D. C. Polacek and A. G. Passerini contributed equally to this work.
References |
---|