Analysis of mucosal gene expression in inflammatory bowel disease by parallel oligonucleotide arrays
B. K. DIECKGRAEFE1,
W. F. STENSON1,
J. R. KORZENIK1,
P. E. SWANSON2 and
C. A. HARRINGTON3
1 Division of Gastroenterology
2 Department of Surgical Pathology, Washington University School of Medicine, St. Louis, Missouri 63110
3 Affymetrix, Inc., Santa Clara, California 95051
ABSTRACT
DNA arrays capable of simultaneously measuring expression of thousands of genes in clinical specimens from affected and normal individuals have the potential to provide information about disease pathogenesis not previously possible. Few studies have applied mRNA profiling to diseases involving complex tissues like the intestinal mucosa, reflecting the unique challenges inherent to this type of analysis. We report the analysis of mucosal gene expression in ulcerative colitis (UC) patients and inflamed and noninflamed control specimens. Genes can be used as markers for cell recruitment, activation, and mucosal synthesis of immunoregulatory molecules. Self-organizing maps were applied to cluster and analyze gene expression patterns and were paired with histopathological scores to identify genes associated with increased disease activity. Clustering was achieved on the basis of differences in expression levels across individual specimens. Several inflammatory mediators were identified as likely determinants of characteristic histological features of active UC. These results provide proof of principle for application of functional genomics to larger inflammatory bowel disease populations for gene discovery, to facilitate identification of disease subgroups on the basis of gene expression signatures, and for prediction of disease behavior or optimal therapeutic intervention.
oligonucleotide arrays; gene expression; ulcerative colitis; Crohn's disease; inflammatory bowel disease
INTRODUCTION
ULCERATIVE COLITIS (UC) and Crohn's disease (CD) are chronic relapsing inflammatory disorders of the gastrointestinal tract, collectively termed inflammatory bowel disease (IBD). The causes of these diseases, which afflict an estimated 1 million individuals in the United States, remain unknown. Prevailing explanations of the pathogenesis of IBD hold that the chronic intestinal inflammation results from an aberrant immune response generated, at least in part, against ubiquitous microbial antigens. Four major factors have substantially hampered progress in the understanding and treatment of IBD. First, only a small subset of genes involved in the initiation, amplification, and perpetuation of the mucosal inflammatory phenotype characteristic of IBD has likely been identified. Second, IBD results from complex interactions between multiple genes. Families with IBD typically lack the Mendelian segregation of the inflammatory phenotype that would be expected if the disease were caused by mutation of a single gene. Although genome-wide scans have identified multiple candidate causative or contributory loci, individual loci account for only a very small fraction of the overall risk for the development of IBD (19). Third, an important but as yet unidentified environmental factor is suggested by differences in concordance rates among monozygotic and dizygotic twins with UC or CD (33). Finally, the diverse group of monogenic murine knockout models (reviewed in Refs. 16, 27) that lead to similar CD- or UC-like phenotypes suggests that human IBD is also likely to be a genetically heterogeneous disorder.
Genome sequencing projects and the development of DNA array techniques have recently provided new tools that give a more comprehensive picture of the gene expression underlying disease states. For genome-wide gene expression analysis, serial analysis of gene expression (SAGE), differential display techniques, and both cDNA and oligonucleotide array-based technologies have recently been applied. Oligonucleotide- or cDNA-based arrays have proven to be useful for the analysis of multiple samples (5, 7, 8, 13, 14, 18, 23, 28, 30, 34). Two basic variations of high-density DNA arrays have been developed. The first consists of cDNA sequences arrayed by high-speed robotics to glass slide microarrays (30). The second consists of oligonucleotide arrays synthesized in situ by combining semiconductor-based photolithography and modified phosphoramidite-based DNA synthesis (reviewed in Ref. 28). Our studies utilized the Affymetrix GeneChip array containing sets of 25-mer oligonucleotides specifically designed for each target mRNA.
Genome-wide gene expression analysis of tissue samples from affected and normal individuals can illuminate important events involved in disease pathogenesis. In IBD, for example, individual mRNAs can serve as sensitive markers for recruitment and involvement of specific cell types, cellular activation, and mucosal expression of key immunoregulatory proteins. Disease heterogeneity, reflecting differences in underlying environmental and genetic factors leading to the inflammatory mucosal phenotype, may be reflected in different gene expression profiles. The ability to measure and analyze this type of gene expression data should therefore provide a basis for improved classification and diagnosis, as well as identification of new therapeutic targets, and provide important prognostic information. An important proof of principle for the application of gene expression profiling to identify previously unrecognized tumor subtypes (class prediction) has been recently reported for diffuse large B-cell lymphoma and acute leukemias (2, 17).
Most reported GeneChip or microarray studies have centered on cultured cell lines or purified single cell populations. The measurement and analysis of gene expression in diseases involving more complex tissues, such as IBD, pose several unique challenges. The inflammatory mucosa is composed of heterogeneous and changing cell populations. Furthermore, the interactions of immune cell populations with nonimmune cellular components of the intestinal mucosa, including epithelial, mesenchymal, and microvascular endothelial cells, are thought to be pivotal in the pathogenesis of IBD. Gene expression measurements will represent an average of these many different cell types. Gene expression by some cell populations (e.g., epithelial cells) may be decreased relative to the total mRNA pool, reflecting mucosal trafficking of inflammatory cell populations in IBD. Meaningful gene expression differences may also be hidden in genetic noise or complex patterns of mucosal gene expression unrelated to disease pathogenesis. We undertook studies to examine the utility of gene expression profiling combined with sophisticated gene clustering analyses to detect distinctive gene expression patterns that associate with histological score and clinical features of disease activity. We report our quantitative analysis of mucosal mRNA profiles in eight selected UC specimens and seven control specimens demonstrating the following: 1) populations of genes that are overexpressed or underexpressed in UC mucosa compared with control mucosa, 2) previously unsuspected genes with a likely role in the mucosal immune response, and 3) distinctive patterns of gene expression associated with specific histopathological features. We focus on a discussion of the potential role for functional genomics as applied to complex disease specimens and approaches for the analysis of these large data sets. 
Validation of the role for specific molecules identified by this analysis and exploration of methods that allow the use of smaller amounts of tissue, such as endoscopic biopsies, for the interrogation of microarrays will be presented elsewhere.
METHODS
Tissue specimens.
Representative samples from colectomy specimens were obtained from the pathologist in the operating room immediately following resection. The mucosa was dissected from underlying tissue and homogenized in Ultraspec RNA (Biotecx Laboratories, Houston, TX) for the isolation of total RNA. A portion of the specimen immediately adjacent to the region used for RNA isolation was fixed in 10% neutral buffered formalin and processed for routine light microscopy. Use of patient material was in accordance with Institutional Review Board guidelines and protocols.
Preparation of labeled cRNA.
Poly(A)+ mRNA was isolated by two rounds of selection using Oligotex (Qiagen, Santa Clarita, CA). Two micrograms of poly(A)+ mRNA was used as a template for the synthesis of double-stranded cDNA using a cDNA synthesis kit (Life Technologies, Gaithersburg, MD) with a modified oligo(dT) primer incorporating a T7 RNA polymerase promoter site. After second strand synthesis, cDNA was purified by phenol:chloroform:isoamyl alcohol extraction using Phase Lock Gels (5 Prime → 3 Prime, Inc., Boulder, CO) followed by ethanol precipitation. Biotin-labeled cRNA was synthesized by in vitro transcription using the T7 Megascript kit (Ambion, Austin, TX) in the presence of biotin-11-CTP and biotin-16-UTP (ENZO, Farmingdale, NY) using 1 µg of cDNA as a template. Labeled sample cRNA was separated from unincorporated NTPs by the RNeasy Mini kit (Qiagen). Products were analyzed by denaturing agarose gel electrophoresis and quantified by absorbance at 260 and 280 nm. To improve hybridization kinetics and reduce the effects of RNA secondary structure, RNAs were randomly fragmented to an average length of 50 bases by heating to 94°C in 40 mM Tris-acetate, pH 8.1, 100 mM potassium acetate, and 30 mM magnesium acetate for 35 min.
Hybridization, confocal scanning, and quantitative image analysis.
Oligonucleotide arrays are mounted in cartridges which serve as hybridization chambers. Arrays were prehybridized for 10–20 min at 40°C. Hybridization solutions were then prepared in a volume of 200 µl containing 1.0 M NaCl, 10 mM Tris (pH 7.6), 0.005% Triton X-100, 10 µg of fragmented RNA probe, 50 pM of a control biotinylated oligonucleotide (complementary to the corner grid, used for image alignment), 0.1 mg/ml degraded herring sperm DNA, and biotin-labeled bacterial and phage hybridization control cRNAs (1.5 pM bioB, 5 pM bioC, 25 pM bioD, and 100 pM Cre), used to assess chip performance and estimate transcript abundance. Prior to use, mixtures were heated to 95°C for 5 min, spun to remove any particulate materials, and equilibrated to 40°C. Prehybridization solutions were removed, and the chips were hybridized for 16 h at 40°C with continuous rotation. The microarray set used for these analyses contained four chips, each containing representation of ~1,700 human genes and expressed sequence tags (ESTs) (Hum 6000). After hybridization, the solutions were removed, and the arrays were washed with 6x SSPE-T (0.9 M NaCl, 60 mM NaH2PO4, 6 mM EDTA, and 0.005% Triton X-100, pH 7.6) twice rapidly at room temperature and once at 50°C for 60 min. A single high-stringency wash was then performed with 0.5x SSPE-T at 50°C for 15 min. Arrays were stained with a solution of 2 µg/ml streptavidin-phycoerythrin conjugate (Molecular Probes, Eugene, OR), 1 mg/ml acetylated BSA, and 6x SSPE-T at 40°C for 10 min, and were then washed extensively on an automated Fluidics Station. Arrays were read by scanning confocal microscope (GeneChip scanner 50, Molecular Dynamics) with argon laser excitation. After a quantitative scan was performed, a grid was aligned to the stored image, using the corner control regions and known array dimensions. Alignments were manually reviewed and adjusted if necessary. GeneChip analysis software (V2.3) was used to merge the intensity information with the identity of the oligonucleotide synthesized at that particular array position and analyze the hybridization data. The presence (detection) of a particular RNA in the hybridization solution was determined by integration of hybridization pattern [perfect match (PM) and mismatch (MM) hybridization intensity, and ratios] and abundance across all probe pairs for each individual gene using the quantitative hybridization intensity as previously described (24). Analysis parameters used by the software were set to values corresponding to moderate stringency (GeneChip software settings: SDT = 30, SRT = 1.5). These analysis parameters were chosen based on experiments that demonstrate reliable detection of gene transcripts and spiked RNAs present at a low abundance level using the assay protocols described above and oligonucleotide arrays with 50 x 50-µm probe features (L. Wodicka and D. Lockhart, personal communication).
Output from the GeneChip analysis was merged with the Unigene or GenBank descriptor and stored as an Excel data spreadsheet. Data was pruned by removal of all genes not scored by the Affymetrix software as "present" (detected) in at least 2 of the 15 specimens. For the purpose of fold increase calculations, genes with fluorescence (average difference measurement) of less than 10 were set to 10. Gene expression patterns were analyzed by self-organizing maps (SOMs) implemented in publicly available software (http://genome-www.stanford.edu/~sherlock) (Cluster 2.0), provided by Gavin Sherlock. Expression data was prefiltered and normalized as established by others (32).
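The pruning and flooring rules just described can be illustrated with a short Python sketch. It is not the spreadsheet procedure actually used; the function name `prune_and_floor`, the `"P"`/`"A"` call codes, and the dictionary layout are assumptions made only for illustration.

```python
def prune_and_floor(calls, intensities, min_present=2, floor=10.0):
    """Keep genes called present in at least `min_present` specimens and
    floor their average-difference values for fold-change calculations.

    calls:       dict gene -> list of detection calls, "P" (present) or "A" (absent)
    intensities: dict gene -> list of average-difference values, same specimen order
    (Both layouts are hypothetical; the source stored data in a spreadsheet.)
    """
    kept = {}
    for gene, values in intensities.items():
        n_present = sum(1 for call in calls[gene] if call == "P")
        if n_present >= min_present:
            # Values below the floor (including negatives) are set to the floor.
            kept[gene] = [max(v, floor) for v in values]
    return kept


if __name__ == "__main__":
    calls = {"geneA": ["P", "P", "A"], "geneB": ["P", "A", "A"]}
    values = {"geneA": [5.0, 200.0, -3.0], "geneB": [50.0, 1.0, 2.0]}
    print(prune_and_floor(calls, values))
```

Flooring before taking ratios keeps near-zero or negative average differences from producing arbitrarily large or undefined fold changes.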
Disease scoring.
Paraffin-embedded sections were stained with hematoxylin and eosin and were scored by a single histopathologist (P. E. Swanson) blinded to the results of the GeneChip analysis (see Table 1). Specimens were examined to verify consistency with the clinical diagnosis and were then graded by the following standard pathological criteria: activity (A: 0, no active crypt injury; 1, focal cryptitis or crypt abscesses without ulceration; 2, diffuse cryptitis with crypt abscesses without ulceration; and 3, ulceration); and chronicity (C: 0, no chronic mucosal injury; 1, quantitatively increased lymphoplasmacytic infiltrates; 2, mucosal fibrosis or crypt architecture distortion; 3, mucosal atrophy) and degree (1, mild chronic inflammation; 2, marked chronic inflammation; for example, "2C3" indicates marked chronic inflammation with mucosal atrophy). Additional note (see "Misc." in Table 1) was made of the presence of eosinophils (E: 1, eosinophil-rich infiltrates; 2, eosinophilic cryptitis or crypt abscesses); increased apoptosis of epithelial cells (Ap); Paneth cell metaplasia (P); lymphoid aggregates with germinal centers (L); nerve cell hypertrophy (N); and thickened muscularis mucosae (M).
Validation of regenerating gene family expression by RT-PCR.
To provide independent confirmation for expression of regenerating gene family (REG) members in the control tissues and UC mucosa, RT-PCR was performed using the Titan one tube system (Boehringer Mannheim). PCR primers specific for each individual human REG family member were used for cDNA synthesis and amplification: pancreatic stone protein (PSP), GGCCAAGAGGCCCAGACAGAGTTGCC and TGAGTTGAGTTGGAGAGATGGTCCG; pancreatitis-associated protein (PAP), GAAGAACCCCAGAGGGAACTGCCC and ACCAAACACAGGCTGCTGACTTCC; and regenerating gene homolog (REGH), GGCCAGGAGTCCCAGACAGAGC and TTTTTGAACTTGCAAACAAAGGAGAAC. PCR products were run on 2% agarose gel and visualized by ethidium bromide staining. Bands were excised, cloned into pGEM-T (Promega), and analyzed by DNA sequencing.
RESULTS
Patient clinical characteristics and histopathological findings.
Fifteen mucosal specimens were chosen for analysis as shown in Table 1. Eight operative specimens were obtained from patients undergoing surgery for UC. The indication for surgery was disease refractory to medical management or chronic glucocorticoid dependence; thus our specimens may reflect a bias toward more severe or refractory disease. One pair of specimens was taken from an area of macroscopically uninvolved mucosa (UC-A-) and 15 cm distally, from macroscopically involved (UC-A+) mucosa. Seven non-UC operative specimens were obtained from noninflamed (Nl-A, -B, and -C) and inflamed control mucosa (Nl-D, and CD-A, -B, and -C). The three noninflamed control specimens were obtained from macroscopically normal areas of surgical resections for adenocarcinoma (Nl-A), diverticular abscess (Nl-B), and diverticulosis (Nl-C). One additional specimen had moderate acute inflammation secondary to rectal prolapse (Nl-D). Several different types of colonic processes were consciously selected so that the effect of any gene expression change (e.g., increased expression of a theoretical adenocarcinoma predisposing gene) would be minimized when the three noninflamed control specimens were averaged. Control mucosal samples were obtained at least 10 cm away from pathology such as adenocarcinoma or diverticular abscess, and none of the colitis specimens had evidence of epithelial dysplasia. Three CD specimens were also included as inflamed controls. They were chosen to broadly represent the clinical variation seen in CD: 1) segmental CD with fistula formation (CD-A), 2) severe ileal disease with mild proximal colonic involvement (CD-B), and 3) Crohn's colitis, macroscopically with skip lesions, bear claw ulceration, and serosal thickening (CD-C). The results of the histopathological examination using a tissue sample isolated from the middle of the region of mucosa used for mRNA isolation are outlined in Table 1.
Expression sample preparation, hybridization, and analysis.
For mucosal gene expression analysis, total RNA was isolated, twice selected by oligo(dT) binding, and converted into double-stranded cDNA using an oligo(dT) primer incorporating a T7 RNA polymerase binding site. In vitro transcription, with biotin-labeled NTPs (biotin-11-CTP, biotin-16-UTP), was used to generate labeled target, using conditions previously shown to provide nonbiased linear amplification (34). Following hybridization, washing, and staining with streptavidin-phycoerythrin, arrays were read in a confocal fluorescence scanner. Figure 1A shows a magnified view of the probe pairs corresponding to immunoglobulin- 3 (an mRNA markedly increased in the mucosa of UC specimens) and actin (a housekeeping gene) from a UC and noninflamed control specimen. PM and MM hybridizations are evident by alternating rows of bright and darker hybridization signals.

Fig. 1. Probe array hybridization image and correlation of gene expression between patient specimens. A: magnified images of the probe pairs for actin and immunoglobulin- 3 are shown from an ulcerative colitis (UC, left) and noninflamed control mucosal specimen (right). Genes are represented on the chip by alternating rows of paired unique perfect match (PM) and mismatch (MM) oligonucleotide sequences. Mismatch pairs contain a centrally located single base pair mismatch but are otherwise identical to the PM sequence. Most individual genes are represented by 20 PM and 20 MM oligonucleotide containing features. Hybridization intensities used for subsequent analysis are the difference between the PM and MM intensities (after background subtraction) averaged across the 20 probe pairs. This tends to cancel nonspecific hybridization while allowing detection of low-abundance mRNAs. B: plot comparing the hybridization intensities from two A chips for individual genes expressed by two noninflamed control mucosal specimens. Of the genes detected on the array, 10.5% differ by greater than 3-fold and 1.4% differ by more than 10-fold. C: comparison of hybridization intensities for individual genes from a noninflamed control specimen and mucosal specimen isolated from a patient with UC; 17.3% showed an expression difference of greater than 3-fold and 3.3% showed an expression difference of more than 10-fold. Inner solid and outer dashed lines indicate expression differences of 3- and 10-fold, respectively.
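The per-gene intensity described in the legend, the PM minus MM difference averaged across probe pairs, can be sketched in Python as follows. This is a simplified illustration that assumes the intensities are already background-subtracted; the function name `average_difference` is hypothetical, and any additional processing performed by the GeneChip software is not reproduced here.

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) across the probe pairs for one gene.

    pm, mm: background-subtracted intensities for the perfect match and
    mismatch features, listed in the same probe-pair order (assumed layout).
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    differences = [p - m for p, m in zip(pm, mm)]
    return sum(differences) / len(differences)


if __name__ == "__main__":
    # Three probe pairs instead of the usual 20, to keep the example short.
    print(average_difference([100.0, 200.0, 300.0], [50.0, 100.0, 150.0]))
```

Subtracting the mismatch signal pair by pair is what tends to cancel nonspecific hybridization, since each MM feature differs from its PM partner only at the central base.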
To investigate the hybridization performance of individual mRNAs in the complex mixture of RNA species derived from the multicellular components of the colonic mucosa, we performed spiking experiments using synthetic bacterial and phage RNAs. Known quantities of individual control RNAs were added into hybridization mixtures to allow determination of hybridization intensities. Hybridization intensities were closely related to the concentration of spiked RNAs at lower concentrations (data not shown). A nonlinear increase in hybridization signal was noted at higher concentrations, presumably corresponding to saturation of the complementary oligonucleotide sites following 16-h hybridization. These conditions were chosen for subsequent studies to preserve sensitive detection of less abundant RNA species. Because of the curve saturation, fold changes in signal intensities will tend to underestimate the actual increase in RNA levels for more abundant messenger species. The lowest concentration of control RNA, spiked at 1.5 pM, had a mean fluorescence of 5 ± 3. This concentration, corresponding to ~1–3 mRNA copies/cell (22), was intermittently scored as present in the GeneChip hybridization assay (data not shown), suggesting that 1.5 pM mRNA is near the detection threshold. To assess variation due to sources other than hybridization and scanning, we examined the reproducibility of results obtained from identical DNA arrays synthesized at different times and the results obtained using two independent RNA target preparations. Wodicka et al. (34) have previously shown that <0.1% of genes surveyed on the GeneChip showed a difference of greater than or equal to threefold when independent preparations of the same sample were compared. Our analysis of the hybridization signal for individual genes hybridized to different probe arrays (different synthesis batch with the same biotinylated RNA target) revealed detection of 757 of the ~1,700 genes represented on this chip. Less than 1% (0.79%) differed by greater than threefold in intensity, and none varied by more than fourfold. To investigate the reproducibility of the steps involved in the synthesis of biotinylated RNA target, we independently prepared two biotinylated cRNA targets from the same starting mRNA. Again, less than 1% (0.65%) of genes differed by more than threefold, and the maximum observed difference was less than sixfold. To assess gene expression variability between two different specimens, we compared the expression of individual genes represented on a single probe array. Figure 1B shows a comparison between two noninflamed control mucosal specimens (Nl-A and Nl-B). Our results revealed that 10.5% of genes varied by more than 3-fold, and 1.4% varied by more than 10-fold. Figure 1C shows a similar comparison between a noninflamed control mucosal specimen (Nl-B) and an inflamed UC specimen (UC-F). Threefold or greater changes were identified in 17.3% of the genes represented on this particular chip, and greater than 10-fold changes were identified in 3.3% of the genes detected on this array.
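The pairwise comparisons above reduce to counting the fraction of genes whose intensities on two arrays differ by more than a chosen fold. The following Python sketch shows one way such a fraction could be computed; the function name `fraction_exceeding_fold` is hypothetical, and applying the floor of 10 from the Methods to both arrays before taking ratios is an assumption rather than a documented step.

```python
def fraction_exceeding_fold(array_a, array_b, fold=3.0, floor=10.0):
    """Fraction of genes whose intensities differ by more than `fold`
    between two arrays, in either direction.

    array_a, array_b: intensities for the same genes in the same order.
    floor: minimum intensity applied before taking ratios (assumed here,
           borrowed from the fold-increase rule in the Methods).
    """
    if len(array_a) != len(array_b) or not array_a:
        raise ValueError("arrays must be non-empty and of equal length")
    exceeding = 0
    for a, b in zip(array_a, array_b):
        a, b = max(a, floor), max(b, floor)
        ratio = a / b if a >= b else b / a  # symmetric fold difference, always >= 1
        if ratio > fold:
            exceeding += 1
    return exceeding / len(array_a)


if __name__ == "__main__":
    specimen_1 = [100.0, 100.0, 100.0, 100.0]
    specimen_2 = [100.0, 400.0, 20.0, 250.0]
    print(fraction_exceeding_fold(specimen_1, specimen_2, fold=3.0))
    print(fraction_exceeding_fold(specimen_1, specimen_2, fold=10.0))
```

Using the larger value as the numerator treats increases and decreases symmetrically, which matches the way the 3-fold and 10-fold boundaries are drawn on both sides of the diagonal in Fig. 1, B and C.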
Differential gene expression in UC mucosal specimens.
Poly(A)+ RNA was extracted from the mucosa of eight UC surgical resection specimens. Samples were selected to represent a range of disease activity (Table 1). To establish levels of baseline gene expression in noninflamed colonic mucosa, the hybridization intensity for each individual gene was averaged from three noninflamed control specimens (Nl-A, -B, and -C). Histogram analysis was performed to examine the distribution of fluorescent hybridization intensities (expression levels) for individual mRNAs profiled. The expression levels of individual genes in both normal and UC specimens were normally distributed (data not shown). Analysis of the raw data revealed that a considerable number of genes were differentially expressed in only one or a few specimens. To minimize the effect of individual patient variation, genes were selected for inclusion in Table 2 if expression was changed by more than threefold relative to the noninflamed controls in at least five of the eight UC specimens. This was chosen to be a conservative filter for the identification of genes whose expression is significantly changing in the majority of samples, biasing selection toward genes involved in a common pathway (as opposed to individual pathogenic mechanisms). Table 2 provides a summary of 74 mRNAs whose expression was reproducibly increased in UC. The absolute expression level for genes increased in UC, 3- to 5-fold, 5- to 10-fold, and >10-fold, in Table 2 average 561, 1,054, and 740, respectively. For reference, the median expression level (fluorescence) for genes expressed by both disease and noninflamed control mucosa was ~200 (~5 pM, corresponding to ~5–10 copies/cell). Thus there are a considerable number of genes expressed at low or undetectable levels in the control specimens that are expressed at moderate or high abundance in disease specimens. A subset (16/74, 22%) of the genes (indicated in Table 2) have either been previously reported to be elevated in the mucosa of patients with IBD, a related inflammatory disorder, or were directly confirmed by us. Please refer to the Supplementary Material1 for this article (published online at the Physiological Genomics web site) for a comprehensive list of genes increased (Table 3a) or decreased (Table 3b). The expression level for selected genes across each individual specimen has also been included in the Supplementary Material (Table 4) to demonstrate the ability of oligonucleotide-based arrays to provide discrimination between differentially regulated but highly related gene family members [e.g., matrix metalloproteinases, and the interleukin-8 (IL-8) C-X-C chemokine subfamily].
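The inclusion rule for Table 2 (more than threefold above the averaged noninflamed controls in at least five of the eight UC specimens) can be expressed as a small Python sketch. The function name `consistently_increased` is hypothetical, and flooring both the control mean and each UC value at 10 is an assumption about how the Methods rule was applied.

```python
def consistently_increased(uc_values, control_values,
                           fold=3.0, min_specimens=5, floor=10.0):
    """Return True if a gene exceeds `fold` times the control baseline
    in at least `min_specimens` UC specimens.

    uc_values:      intensities for one gene across the UC specimens
    control_values: intensities for the same gene in the noninflamed controls
    floor:          minimum value used in ratios (assumed placement of the
                    "set to 10" rule from the Methods)
    """
    baseline = max(sum(control_values) / len(control_values), floor)
    hits = sum(1 for v in uc_values if max(v, floor) / baseline > fold)
    return hits >= min_specimens


if __name__ == "__main__":
    controls = [20.0, 30.0, 40.0]  # three noninflamed specimens, mean 30
    uc = [100.0, 95.0, 200.0, 91.0, 500.0, 30.0, 30.0, 30.0]
    print(consistently_increased(uc, controls))
```

Requiring agreement across most specimens trades sensitivity for robustness: a gene strongly induced in only one or two patients is excluded, which is the intended bias toward shared pathways.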
All three members of the human REG family (PAP, PSP, and regenerating protein 1β) were found to be substantially elevated in UC, with minimal or no expression in the noninflamed control specimens. To provide independent confirmation of changes in the expression of these genes in UC, RT-PCR was performed using the Titan one tube system (Boehringer Mannheim). PCR primers specific for individual REG family members were used for cDNA synthesis and amplification. Initial amplification, carried out for 28 cycles, did not show any product in the noninflamed controls (not shown). A representative gel showing PCR products produced following a 35-cycle reaction using RNA from normal mucosa (Nl-B) or UC mucosa (UC-E) is shown in Fig. 2. The arrow in Fig. 2 (right) shows the location of the PCR product corresponding to three individual REG family members shown to be upregulated in UC. To further verify primer specificity for individual family members, PCR products were analyzed by restriction digestion and DNA sequencing. All were shown to perfectly match the expected sequences.

Fig. 2. Expression of members of the regenerating gene family (REG) determined by reverse transcriptase PCR. The human REG family consists of pancreatitis-associated protein (PAP), pancreatic stone protein (PSP), and regenerating gene homolog (REGH). To provide independent confirmation of changes in the expression of these genes in UC, RT-PCR was performed using the Titan one-tube system (Boehringer Mannheim). PCR primers specific for individual REG family members were used for cDNA synthesis and amplification. The PCR products following 35 cycles of amplification, using RNA from a normal (Nl-B) or UC (UC-E) patient, are shown. Since REG members have considerable sequence homology, bands were gel purified and cloned into pGEM-T (Promega). Resulting clones, analyzed by restriction digest and bidirectional sequencing, were found to match the expected sequences. This gel is representative of 3 experiments.
Interpreting gene expression patterns with self-organizing maps.
SOMs were applied to analyze gene expression patterns contained within our data set. The 1,087 genes that increased or decreased more than threefold relative to the mean noninflamed control expression in any of the specimens were included in the analysis. Hybridization intensities for individual genes were normalized to a mean of 0 and standard deviation of 1 to allow clustering to occur on the basis of the expression profile rather than by absolute level (32). To compare gene expression profiles and histopathological findings, pathological scoring criteria (Table 1) for activity, chronicity, and the presence of Paneth cell metaplasia were incorporated into the gene expression data set before clustering into a 4 x 5 node SOM matrix using Cluster (V2.0). Final SOM geometry was selected by visual inspection (32). A range of two-dimensional matrices was examined; fewer than 20 nodes provided indistinct patterns (and an associated wide variation between expression of individual genes within a cluster). Increasing beyond 20 nodes led to cluster duplication (nearly identical cluster patterns). Activity, chronicity, and Paneth metaplasia variables were found to have segregated into clusters 18, 19, and 14, respectively. Figure 3A is a plot of the histopathological score for activity and chronicity for all 15 specimens. Figure 3B shows the average normalized gene expression for the genes contained within clusters 17–19. The average gene expression in cluster 18 closely reflected the ordering of specimens by the disease activity score. Many of the genes contained within cluster 18 and closely related clusters 17 and 19 have been previously identified in the literature for their specific involvement in IBD or inflammation. 
Included in these clusters were the following: IL-1β; IL-6; IL-8; IL-11; GRO2; lymphotoxin-β; IL-1 receptor antagonist (IL-1 RA); cyclooxygenase-2 (COX-2); granulocyte-macrophage colony-stimulating factor (GM-CSF); monocyte chemotactic and activating factor; and intercellular adhesion molecule 1 (ICAM1). Also included were markers for the following specific cell lineages: CD83 and CD19 (dendritic cells); the early T cell activation antigen (CD69); T cell-specific transcription factor TCF7; T cell receptor-β cluster; endothelial activation markers (CD62e, CD62p, endothelial differentiation protein EDG-1); various stages of B cell differentiation (CD21, CD20 receptor, CD22, CD53, CD127); and neutrophil-specific proteins (e.g., neutrophil cytosolic factor 1) or genes known to be induced with neutrophil differentiation [e.g., protein tyrosine phosphatase receptor type C, selectin L (CD62l), pleckstrin, myeloid cell nuclear differentiation antigen (32)]. These results indicate that SOMs can be applied to identify genes involved in a particular process, such as acute inflammation/histopathological "activity" without prior knowledge of their identities. Plots of normalized gene expression for other individual clusters can be found in Fig. 4 of the Supplementary Material.
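For readers unfamiliar with the approach, the normalization and clustering steps can be sketched in plain Python. This is a generic, minimal self-organizing map written for illustration, not the algorithm as implemented in Cluster 2.0; the function names, the learning rate, the iteration count, the step-shaped neighborhood, and the use of the sample standard deviation are all assumptions.

```python
import math
import random
import statistics


def normalize_profile(values):
    """Scale one gene's profile across specimens to mean 0, SD 1
    (sample standard deviation; the variant used by Cluster is assumed)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def nearest_node(nodes, profile):
    """Index of the node whose weight vector is closest to the profile."""
    return min(range(len(nodes)),
               key=lambda i: sum((w - x) ** 2 for w, x in zip(nodes[i], profile)))


def train_som(profiles, rows=4, cols=5, iterations=2000, seed=0):
    """Train a rows x cols SOM; learning rate and radius decay linearly."""
    rng = random.Random(seed)
    dim = len(profiles[0])
    nodes = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(rows * cols)]
    for step in range(iterations):
        remaining = 1.0 - step / iterations
        rate = 0.1 * remaining                  # hypothetical starting rate
        radius = max(rows, cols) / 2 * remaining
        profile = rng.choice(profiles)
        best_row, best_col = divmod(nearest_node(nodes, profile), cols)
        for index, weights in enumerate(nodes):
            row, col = divmod(index, cols)
            # Move the winning node and its grid neighbors toward the profile.
            if math.hypot(row - best_row, col - best_col) <= radius:
                for d in range(dim):
                    weights[d] += rate * (profile[d] - weights[d])
    return nodes


if __name__ == "__main__":
    raw = [[1, 2, 9], [2, 4, 18], [9, 2, 1]]    # toy genes x specimens
    profiles = [normalize_profile(p) for p in raw]
    som = train_som(profiles)
    print([nearest_node(som, p) for p in profiles])
```

In this scheme a histopathological score vector (for example, the activity grade of each specimen) would be normalized like any gene and appended to `profiles` as an extra row, so that it is assigned to a node alongside the genes whose profiles it resembles.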

Fig. 3. Gene clustering and analysis by self-organizing maps (SOMs) and histopathological score; 1,087 genes that increased or decreased more than 3-fold relative to the mean noninflamed control baseline were included in this analysis and were grouped into 20 clusters using Cluster V2.0. A: plot of the histopathological score for activity (0, no active crypt injury; 1, focal cryptitis or crypt abscesses without ulceration; 2, diffuse cryptitis with crypt abscesses without ulceration; 3, ulceration) and chronicity (0, no chronic mucosal injury; 1, quantitatively increased lymphoplasmacytic infiltrates; 2, mucosal fibrosis or crypt architecture distortion; and 3, mucosal atrophy) for each of the 15 specimens. B: average normalized gene expression for genes contained within clusters 17–19. Hybridization intensities were normalized to a mean of 0 and a standard deviation of 1 across individual samples to allow clustering to occur on the basis of expression profile rather than absolute gene expression levels. Scores for histopathological variables were incorporated into the data set before application of the SOM and allowed to cluster with individual genes. Variables for activity segregated to cluster 18 and for chronicity segregated to cluster 19. Clustering by function was apparent, with many genes with a known role in acute and chronic inflammation contained within clusters 17–19. Genes in each cluster are represented by a centroid (average pattern) with error bars indicating the standard error. Individual samples are ordered across the x-axis: UC-C, UC-D, UC-F, UC-G, UC-E, UC-A+, UC-B, UC-A-, Nl-D (inflamed), Nl-B, Nl-A, Nl-C, CD-A, CD-B, and CD-C. Clusters 0–19 are shown in Fig. 4 of the Supplementary Material, at the Physiological Genomics web site (see footnote 1).
|
|
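The normalization described in the legend of Fig. 3 can be illustrated with a minimal sketch. It assumes a genes-by-samples matrix of hybridization intensities; the function name `normalize_profiles` and the two example genes are hypothetical and are not taken from the study data. The sketch only shows why scaling each gene to a mean of 0 and a standard deviation of 1 lets genes with the same expression profile but different absolute levels be treated alike.

```python
import numpy as np

def normalize_profiles(intensities):
    """Scale each row (gene) to mean 0 and standard deviation 1 across samples.

    Assumes no gene is constant across samples (a zero standard deviation
    would cause a division by zero).
    """
    x = np.asarray(intensities, dtype=float)
    centered = x - x.mean(axis=1, keepdims=True)
    return centered / x.std(axis=1, keepdims=True)

# Two hypothetical genes: same profile shape, very different absolute levels.
low = np.array([10.0, 20.0, 40.0, 20.0, 10.0])
high = 50.0 * low + 300.0

z = normalize_profiles(np.vstack([low, high]))
same_profile = np.allclose(z[0], z[1])  # normalized profiles coincide
```

After normalization the two rows are numerically indistinguishable, so a distance-based clustering method would place them together regardless of their absolute intensities.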
 |
DISCUSSION
|
---|
Gene array techniques have been used previously to compare gene expression between benign and malignant tissues (3, 12, 21, 31), cell lines undergoing hematopoietic differentiation (32), and to identify gene expression common to rheumatoid arthritis synovial tissue and CD mucosa (18), but this is the first application of this approach to the comparison of gene expression in diseased (UC) and normal human colonic mucosa. This approach has allowed us to identify genes that are either overexpressed or underexpressed in UC mucosa compared with histologically normal mucosa. Individual studies examining one or a few genes in IBD have led to a wealth of information about the molecular mechanisms underlying the persistent mucosal inflammation. Rapid accumulation of cDNA sequencing data in the public GenBank/dbEST sequence databases has driven the development of rapid and reliable methods for simultaneously monitoring the differential expression of thousands of genes, which promises to provide a more complete view of the biological events occurring in complex disease specimens. Description and classification of these changes are critical steps in the process of 1) dissecting the circuitry responsible for the disease process, 2) identification and validation of potential targets for therapeutic intervention, and 3) providing tools for improved diagnosis and classification. We report our findings examining mucosal gene expression in eight UC specimens paired with seven inflamed and noninflamed controls using high-density oligonucleotide arrays. A measure of the reliability of the methodology is the close agreement of hybridization results obtained from independent chips and independent biotinylated target synthesis reactions from the same mRNA sample. Our studies confirm increases in a number of immunoregulatory molecules known to be associated with IBD and implicate additional mediators as determinants of the inflammatory mucosal phenotype.
The Affymetrix GeneChip (Hum 6000) arrays used in this study are a set of four chips that contain 256,000 individual oligonucleotide features representing over 6,500 human genes and ESTs. These genes were selected to provide broad representation of known genes contained in GenBank and ESTs with similarity to entries in the SwissProt database (Affymetrix, unpublished material). Therefore, this technology is a powerful tool with which to implicate the involvement of known genes (with a previously unsuspected role) and unknown or poorly characterized genes in a pathological process. Since genes contained on the arrays were not selected specifically for the analysis of gene expression in UC, they do not reflect biases inherent to any particular model of UC pathogenesis. This approach enhances the likelihood of identifying genes that are important to the pathogenesis of UC but are not a part of our current disease understanding. Our results confirm increases in a number of genes (Table 2) that have previously been described in association with UC, including inducible NO synthase (iNOS; 35), IL-1, IL-1 RA (10), and IL-8 (4). We have also identified the increased expression of genes that are associated with chronic inflammation and tissue remodeling but have not previously been specifically associated with UC. These genes include a number of matrix metalloproteinases, pentraxin-related genes NPTX2 and PTX3, the cystic fibrosis antigen, and extracellular matrix constituents. A number of totally unexpected genes and ESTs were also found with increased expression. Our studies specifically implicate several potent neutrophil chemotaxins as potential mediators of acute neutrophilic crypt injury in UC, including psoriasin (S100 calcium-binding protein A7), multiple members of the C-X-C chemokine subfamily, and small inducible cytokine A3 (SCYA3).
Changes in mucosal cell populations, such as increased influx and activation of peripheral blood monocytes, were identifiable by expression of specific genes, including S100 calcium-binding protein A8, M130 antigen (CD163), and the 39-kDa human cartilage glycoprotein. Increased mucosal expression of molecules implicated in immunologic tolerance, including indoleamine 2,3-dioxygenase, was also identified. Significant increases in a related enzyme, tryptophan 2,3-dioxygenase, indicate that it may also serve a related immunoregulatory role. Identification and characterization of differentially expressed transcripts will allow the development of new and more comprehensive models for the mucosal events critical to the pathogenesis of UC. These results also provide an important proof of principle for the application of gene arrays to discovery efforts in IBD.
There are genes involved in UC that were not identified in this analysis. Among those genes known to be elevated in UC and not identified in Table 2 are 5-lipoxygenase (5-LO) and COX-2. These genes are represented on the arrays and were elevated in UC subjects but did not meet the applied cutoff criteria. COX-2 was expressed on average 3.2-fold higher than controls, but only three of the eight specimens had expression greater than threefold above the mean control. 5-LO was expressed on average approximately twofold higher than controls, but only one of the eight specimens had expression more than threefold above control. These results illustrate that fold change criteria need to carefully incorporate the specific goals of the analysis (e.g., balancing specificity and sensitivity). There are a number of other reasons why relevant genes may not be identified by this type of analysis: 1) genes may be expressed at low levels or only in cells that make up a small fraction of the mucosa and simply fall below the level of reliable detection for the assay; 2) the hybridization efficiency for a specific probe-target pair may have been low (however, this is less likely to impact our results since 20 different oligonucleotide probe sets were used to represent most genes on the GeneChip); and 3) to be detected, gene sequences must be adequately represented on the array. Probe sequences selected for inclusion on a gene chip could fail to represent some differentially spliced gene variants. Accordingly, one must be cognizant of these issues and avoid using this type of analysis to exclude involvement of a particular gene in a disease process.
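The tension between an average fold change and a per-specimen cutoff can be made concrete with a short sketch. The rule below (a gene passes only if a minimum number of specimens exceed the fold threshold) and its `min_specimens` parameter are hypothetical stand-ins, not the cutoff criteria applied in this study, and the eight fold-change values are invented solely so that their summary matches the COX-2 description above (mean 3.2-fold, three of eight specimens above threefold).

```python
from statistics import mean

def summarize_fold_changes(fold_changes, threshold=3.0):
    """Return (mean fold change, number of specimens above the threshold)."""
    n_above = sum(1 for fc in fold_changes if fc > threshold)
    return mean(fold_changes), n_above

def passes_cutoff(fold_changes, threshold=3.0, min_specimens=4):
    """Hypothetical rule: at least min_specimens must exceed the threshold."""
    _, n_above = summarize_fold_changes(fold_changes, threshold)
    return n_above >= min_specimens

# Invented values (not measured data): fold change vs. mean control
# for eight UC specimens, chosen to give a mean of 3.2 with three above 3.
example = [6.0, 5.5, 4.5, 2.0, 2.0, 2.0, 1.8, 1.8]
avg, n_above = summarize_fold_changes(example)

strict = passes_cutoff(example, min_specimens=4)   # more specific, less sensitive
lenient = passes_cutoff(example, min_specimens=3)  # more sensitive, less specific
```

A gene with this distribution is rejected under the stricter rule and accepted under the more lenient one, even though its mean fold change exceeds the threshold in both cases.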
SOMs, based on an unsupervised neural network algorithm, were applied to cluster and analyze gene expression patterns. This analysis assigns genes to the single group or "cluster" that most closely shares a related expression pattern across specimens. This approach has biological relevance, because coordinated regulation of groups of genes often signifies a role in a common process or pathway (9, 15, 20). However, there were also informative examples of inflammation-associated genes that did not cocluster with other known inflammation-related genes. There are several possible explanations aside from technical considerations discussed above. One cause relates to the multiple cell populations present in most biopsy specimens. CD9, a cell surface molecule expressed by activated and differentiating B and T cells, was unexpectedly found in cluster 1. However, CD9 is also expressed by multiple mucosal cell populations including immune, epithelial, endothelial, and smooth muscle cells. Genes concurrently expressed by multiple cell populations may provide a different expression profile than a gene exclusively expressed by an activated B or T cell. This was also the case for the other expressed CD markers that did not cosegregate into other inflammation-associated gene clusters (CD55, CD124, CD114, and CD31). In contrast, more specific markers for inflammation or immune cell populations (CD69, an early T cell activation antigen; CD19 and CD22, B cell markers; CD53, an exclusively leukocyte marker; CD62L, which mediates lymphocyte homing to high endothelial venules and leukocyte rolling on activated endothelium; Granzyme B, cytotoxic T cell-associated serine esterase 1; CD38, highly expressed on hemopoietic cells during early differentiation and activation; and CD83, a dendritic cell surface antigen) all segregated to related inflammatory gene clusters. 
Finally, inflammation-associated genes may segregate differently due to underlying patient variables (e.g., genetic, medications, concurrent disease) or disease heterogeneity (e.g., different pathogenic mechanisms). These findings indicate that while positive gene clustering data can be applied to identify genes involved in a biological process, negative clustering data should not be used to exclude involvement of a particular gene in a specific biological process when applied to complex tissues.
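For readers unfamiliar with SOMs, the assignment of each gene to a single best-matching cluster can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the software used in this study: the 4 x 5 grid (20 nodes), the learning-rate and neighborhood schedules, the random expression matrix, and the activity scores are all hypothetical. It also shows how a histopathological variable can be appended as an extra row so that it clusters alongside genes.

```python
import numpy as np

def zscore_rows(x):
    """Normalize each row to mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def train_som(data, rows=4, cols=5, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a small SOM; returns one centroid per grid node (rows*cols, n_samples)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize node centroids from randomly chosen profiles.
    weights = data[rng.integers(0, n, size=rows * cols)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # learning rate decays to 0
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # neighborhood shrinks to 0.5
        x = data[rng.integers(0, n)]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching node
        grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign_clusters(data, weights):
    """Assign each profile to the single node with the closest centroid."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Hypothetical inputs: 50 random "genes" across 15 specimens, plus made-up
# activity scores (0-3) and one gene that tracks those scores exactly.
rng = np.random.default_rng(1)
genes = rng.normal(size=(50, 15))
activity = np.array([3, 3, 2, 2, 3, 2, 1, 0, 1, 0, 0, 0, 1, 2, 1], dtype=float)
tracking_gene = 8.0 * activity  # same profile, higher absolute level

profiles = zscore_rows(np.vstack([genes, tracking_gene, activity]))
weights = train_som(profiles)
labels = assign_clusters(profiles, weights)
```

Here `labels[-1]` is the cluster holding the activity score and `labels[-2]` the cluster of the gene that tracks it; because their normalized profiles coincide, they fall in the same cluster. A gene expressed by several cell populations would have a blended profile and could land elsewhere, which is the caveat raised above about negative clustering data.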
Known genes provide insight into possible functions of novel or poorly characterized coclustered genes. Glia maturation factor-γ (GMFG), originally identified by homology to GMF-ß, a growth and differentiation factor for neurons and glia, clusters with other genes related to disease activity. GMFG mRNA levels were increased sevenfold in UC specimens. Although this molecule has not been functionally characterized, our clustering results would suggest its involvement in the immune response. This idea is supported by the recent identification of GMFG transcripts in hematopoietic stem/progenitor cells (25) and representation in multiple lymphoid tissues in the dbEST database. Clustering by function was also apparent in cluster 11, where many of the genes are involved in extracellular matrix synthesis (e.g., collagens, versican, and osteonectin) and remodeling (including matrilysin, MT-MMP, anti-elastase, maspin, protease inhibitor 3). Another interesting member of cluster 11 was pigment epithelium-derived factor (PEDF). Expression of PEDF, a potent angiogenesis inhibitor (11), was significantly increased in UC specimens. Inhibition of endothelial cell migration during mucosal repair and regeneration could play a key role in the genesis of bloody diarrhea characteristic of UC.
Gene products from a particular cell type tend to cluster together, providing clues to the cellular origin of novel gene products. Our data demonstrated marked expression of individual members of the homologous REG gene family (PSP, REGH, and PAP) in the setting of chronic mucosal injury and inflammation. We have confirmed these findings by RT-PCR (Fig. 2). GeneChip expression analysis showed minimal or absent expression in paired noninflamed specimens or in a specimen with acute inflammation from rectal prolapse. The cellular origin of this gene family in the inflamed colon was unknown. The pathological variable "Paneth cell metaplasia" was contained in cluster 14 with REG family members, suggesting the Paneth cell to be a likely cellular source for REG expression. Immunohistochemistry confirmed that a primary cellular source for the full-length PSP protein in diseased mucosa was the metaplastic Paneth cell population (unpublished data).
Disease heterogeneity may complicate the study of patients with IBD. Identification of molecular markers that distinguish disease subpopulations is a critical goal for future CD research. Clustering methods have been applied to discriminate between subtle tissue phenotypes on the basis of broadly distributed gene expression "signatures" (3, 21). Different epithelial phenotypes (malignant vs. benign) were separable based on distinct gene expression profiles (3). Recently, gene expression profiling has been applied to identify molecularly distinct tumor subtypes (class prediction) in patients with the diagnosis of diffuse large B cell lymphoma or acute leukemias (2, 17). This characterization was clinically significant, with one subtype demonstrating a significantly different therapeutic response (2). Although the data we present involve a relatively small number of IBD specimens, our results support the presence of heterogeneity within diagnostic groups. Cluster 3 contains 60 genes uniquely induced in association with fistulizing CD (CD-A). Members of this cluster include: mitochondrial stress-70 protein; 90-kDa heat shock protein; DNAJ protein homologs 1 and 2; 70-kDa heat shock proteins 1, 4, and 6; FK506-binding protein 4; ubiquitin; 27-kDa heat shock protein 1; and transformation-sensitive protein (IEF SSP 3521). This coherent functional profile, if confirmed in additional patient populations, may provide clues to events that lead to fistulizing behavior in a subset of patients with CD. Increased expression of a number of these genes has been described in association with cell stresses including intracellular pathogens or viral infection (1, 6, 36). Genes contained in a number of clusters (e.g., clusters 0, 4, 5, 8, and 9) also appear to be differentially expressed in subsets of UC specimens. These results support the feasibility of a larger study focused on identification of pathognomonic patterns of gene expression. Such a study might provide a basis for improved diagnosis and molecular classification of disease subgroups and serve to identify potential biological determinants of specific disease behaviors.
 |
ACKNOWLEDGMENTS
|
---|
We thank Sumathi Venkatapathy and Mamatha Mahadevappa for technical assistance, and we thank Jacques Retief for assistance with the data analysis.
This work was supported by National Institutes of Health Grants DK-02457, DK-33165, DK-55753, and P01-HG-01323.
 |
FOOTNOTES
|
---|
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: B. K. Dieckgraefe, Washington Univ. School of Medicine, 660 S. Euclid Ave., Campus Box 8124, St. Louis, MO 63110 (E-mail: dieck{at}im.wustl.edu).
1 Supplemental material to this article (Table 3, a and b, Table 4, and Fig. 4) is available online at http://physiolgenomics.physiology.org/cgi/content/full/4/1/1/DC1. 
 |
REFERENCES
|
---|
-
Adhuna A, Salotra P, Mukhopadhyay B, and Bhatnagar R. Modulation of macrophage heat shock proteins (HSPs) expression in response to intracellular infection by virulent and avirulent strains of Leishmania donovani. Biochem Mol Biol Int 43: 1265–1275, 1997.
-
Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J Jr, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, and Staudt LM. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403: 503–511, 2000.
-
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, and Levine AJ. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96: 6745–6750, 1999.
-
Arai F, Takahashi T, Furukawa K, Matsushima K, and Asakura H. Mucosal expression of interleukin-6 and interleukin-8 messenger RNA in ulcerative colitis and in Crohn's disease. Dig Dis Sci 43: 2071–2079, 1998.
-
Bowtell DD. Options available, from start to finish, for obtaining expression data by microarray. Nat Genet 21: 25–32, 1999.
-
Brenner BG and Wainberg MA. Heat shock protein-based therapeutic strategies against human immunodeficiency virus type 1 infection. Infect Dis Obst Gynecol 7: 80–90, 1999.
-
Brown PO and Botstein D. Exploring the new world of the genome with DNA microarrays. Nat Genet 21: 33–37, 1999.
-
Cheung VG, Morley M, Aguilar F, Massimi A, Kucherlapati R, and Childs G. Making and reading microarrays. Nat Genet 21: 15–19, 1999.
-
Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO, and Herskowitz I. The transcriptional program of sporulation in budding yeast. Science 282: 699–705, 1998.
-
Cominelli F and Pizarro TT. Interleukin-1 and interleukin-1 receptor antagonist in inflammatory bowel disease. Aliment Pharmacol Ther 10: 49–53, 54, 1996.
-
Dawson DW, Volpert OV, Gillis P, Crawford SE, Xu H, Benedict W, and Bouck NP. Pigment epithelium-derived factor: a potent inhibitor of angiogenesis. Science 285: 245–248, 1999.
-
DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M, Chen Y, Su YA, and Trent JM. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 14: 457–460, 1996.
-
Duggan DJ, Bittner M, Chen Y, Meltzer P, and Trent JM. Expression profiling using cDNA microarrays. Nat Genet 21: 10–14, 1999.
-
Eisen MB and Brown PO. DNA arrays for analysis of gene expression. Methods Enzymol 303: 179–205, 1999.
-
Eisen MB, Spellman PT, Brown PO, and Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95: 14863–14868, 1998.
-
Elson CO, Sartor RB, Tennyson GS, and Riddell RH. Experimental models of inflammatory bowel disease. Gastroenterology 109: 1344–1367, 1995.
-
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, and Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286: 531–537, 1999.
-
Heller RA, Schena M, Chai A, Shalon D, Bedilion T, Gilmore J, Woolley DE, and Davis RW. Discovery and analysis of inflammatory disease-related genes using cDNA microarrays. Proc Natl Acad Sci USA 94: 2150–2155, 1997.
-
Hugot JP, Laurent-Puig P, Gower-Rousseau C, Olson JM, Lee JC, Beaugerie L, Naom I, Dupas JL, Van Gossum A, Orholm M, Bonaiti-Pellie C, Weissenbach J, Mathew CG, Lennard-Jones JE, Cortot A, Colombel JF, and Thomas G. Mapping of a susceptibility locus for Crohn's disease on chromosome 16. Nature 379: 821–823, 1996.
-
Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, Trent JM, Staudt LM, Hudson J Jr, Boguski MS, Lashkari D, Shalon D, Botstein D, and Brown PO. The transcriptional program in the response of human fibroblasts to serum. Science 283: 83–87, 1999.
-
Khan J, Simon R, Bittner M, Chen Y, Leighton SB, Pohida T, Smith PD, Jiang Y, Gooden GC, Trent JM, and Meltzer PS. Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays. Cancer Res 58: 5009–5013, 1998.
-
Lewin B. Gene Expression. New York: Wiley-Interscience, 1980, vol. 2.
-
Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, and Brown EL. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14: 1675–1680, 1996.
-
Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, and Brown EL. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14: 1675–1680, 1996.
-
Mao M, Fu G, Wu JS, Zhang QH, Zhou J, Kan LX, Huang QH, He KL, Gu BW, Han ZG, Shen Y, Gu J, Yu YP, Xu SH, Wang YX, Chen SJ, and Chen Z. Identification of genes expressed in human CD34(+) hematopoietic stem/progenitor cells by expressed sequence tags and efficient full-length cDNA cloning. Proc Natl Acad Sci USA 95: 8175–8180, 1998.
-
Pena AS and Crusius JB. Genetics of inflammatory bowel disease: implications for the future. World J Surg 22: 390–393, 1998.
-
Podolsky DK. Lessons from genetic models of inflammatory bowel disease. Acta Gastroenterol Belg 60: 163–165, 1997.
-
Ramsay G. DNA chips: state-of-the-art. Nat Biotechnol 16: 40–44, 1998.
-
Satsangi J, Parkes M, Jewell DP, and Bell JI. Genetics of inflammatory bowel disease. Clin Sci (Colch) 94: 473–478, 1998.
-
Schena M, Shalon D, Heller R, Chai A, Brown PO, and Davis RW. Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci USA 93: 10614–10619, 1996.
-
Shim C, Zhang W, Rhee CH, and Lee JH. Profiling of differentially expressed genes in human primary cervical cancer by complementary DNA expression array. Clin Cancer Res 4: 3045–3050, 1998.
-
Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, and Golub TR. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96: 2907–2912, 1999.
-
Tysk C, Lindberg E, Jarnerot G, and Floderus-Myrhed B. Ulcerative colitis and Crohn's disease in an unselected population of monozygotic and dizygotic twins. A study of heritability and the influence of smoking. Gut 29: 990–996, 1988.
-
Wodicka L, Dong H, Mittmann M, Ho MH, and Lockhart DJ. Genome-wide expression monitoring in Saccharomyces cerevisiae. Nat Biotechnol 15: 1359–1367, 1997.
-
Zhang XJ, Thompson JH, Mannick EE, Correa P, and Miller MJ. Localization of inducible nitric oxide synthase mRNA in inflamed gastrointestinal mucosa by in situ reverse transcriptase-polymerase chain reaction. Nitric Oxide 2: 187–192, 1998.
-
Zhu H, Cong JP, Mamtora G, Gingeras T, and Shenk T. Cellular gene expression altered by human cytomegalovirus: global monitoring with oligonucleotide arrays. Proc Natl Acad Sci USA 95: 14470–14475, 1998.