1 Taisho Laboratory of Functional Genomics, Nara Institute of Science and Technology
2 Core Research for Evolutional Science and Technology, Japan Science and Technology Corporation, 8916-5 Takayama, Ikoma, Nara, 630-0101, Japan
ABSTRACT
adapter-tagged competitive polymerase chain reaction; cluster analysis; SwissProt database
INTRODUCTION
So far we have tried to understand such programs by analyzing individual genes. Genetic approaches based on mutant analysis have resulted in the isolation of key genes for specific processes, particularly in invertebrate model systems. However, such approaches are insufficient to completely address the genetic mechanism of any developmental process. To approach such issues, it is absolutely necessary to describe expression states of the entire gene population.
Large-scale analysis of gene expression is not conceptually new (14). The main factor preventing extensive application of this concept has been technological. For example, Northern hybridization (27) is not applicable for testing thousands of genes, whereas differential hybridization (27) and differential display (17) can point out only a fraction of the genes that are differentially expressed. cDNA microarrays have been successfully applied to monitor expression levels of active genes in budding yeast (7, 32) and some mammalian cell lines (13). So far, however, application to the mammalian nervous system has been limited to identification of differentially expressed genes (18, 40). High mRNA complexity results in higher background noise and consequently makes quantitation unreliable. Other approaches based on the frequency of clones, such as digital expression profiling (23) or serial analysis of gene expression (SAGE) (36), are labor intensive and not sensitive enough to detect modest changes.
These technical limitations can be overcome by introduction of adapter-tagged competitive PCR (ATAC-PCR) (15), an advanced form of quantitative PCR, which is characterized by addition of adapters with different spacer length to different cDNA samples. Because the technique is free from tedious steps inherent to conventional quantitative PCR, a large number of genes can be assayed. ATAC-PCR has high sensitivity, and can identify changes in gene expression as small as twofold without ambiguity. Combined with capillary sequencers, ATAC-PCR has the ability to process as many genes as DNA microarrays.
We analyzed gene expression profiles in postnatal mouse cerebellar development with ATAC-PCR. The cerebellum is one of the best-studied regions in the mammalian nervous system (2, 3). There are two major cell types: Purkinje cells and granule cells. Purkinje cells have already finished proliferation at birth; they grow in size and become functionally mature after birth. Granule cells proliferate after birth in the external germinal layer, the outermost layer of the cerebellar cortex. At birth, there are few granule cells, which then proliferate vigorously and soon exceed Purkinje cells both in number and volume. The granule cells then transform their shapes, start to migrate inward through the molecular layer, extend axons (parallel fibers), and settle in the granule cell layer. Cell proliferation reaches a peak during the first week after birth, whereas cell migration and elongation occur primarily during the second week after birth. This process is completed by the third week after birth. At that point, the cerebellar cortex enters its second phase of development, which involves gradual maturation of synapses without morphological changes, resulting in the full assumption of adult characteristics.
The cerebellar cortex has unique features that are favorable for gene expression analysis. Because the majority of the cellular mass is occupied by a single cell type (i.e., the granule cell) (3), RNA obtained from the whole structure is likely to represent that from granule cells. The postnatal developmental processes occur synchronously. Furthermore, naturally occurring mutants (30) and targeted gene disruptions blocking particular steps of its development are available (5, 21). A preliminary study supported this view (20) and has prompted us to survey the expression profiles of genes on a large scale.
We determined expression levels of 1,869 genes by ATAC-PCR at 6 time points during postnatal cerebellar development. The expression patterns classified by cluster analysis were compared with a new functional category table constructed using information obtained from the literature. The gene expression patterns and the inferred functions were in good agreement with anatomical as well as physiological observations made during the developmental process.
METHODS
Usually, six different cDNA samples attached to different adapters were used, three of which were assigned to different amounts of control cDNA samples. In the case of the cerebellar experiment, cDNA derived from the adult cerebrum was used as the control: 10, 3, and 1 portions of cDNA with different adapters were included in each PCR reaction. As for cerebellar samples, one portion each of three out of six cerebellar samples was included in the reaction. PCR amplification was performed with the carboxyfluorescein (FAM)-labeled adapter primer corresponding to the common region of adapters and with a gene-specific primer. Products were separated by polyacrylamide gel electrophoresis. With each PCR reaction, a calibration curve was made with three control samples. Thus accurate quantitation can be made with the three cerebellar samples. Sequences of primers and adapters are as follows: C1S-FAM, 5'-6FAM-GTACATATTGTCGTTAGAACGC-3'; MB-1, 5'-GTACATATTGTCGTTAGAACGCG-3' and 5'-GATCCGCGTTCTAACGACAATATGTAC-3'; MB-2, 5'-GTACATATTGTCGTTAGAACGCGACT-3' and 5'-GATCAGTCGCGTTCTAACGACAATATGTAC-3'; MB-3, 5'-GTACATATTGTCGTTAGAACGCGCATACT-3' and 5'-GATCAGTATGCGCGTTCTAACGACAATATGTAC-3'; MB-4, 5'-GTACATATTGTCGTTAGAACGCGATCCATACT-3' and 5'-GATCAGTATGGATCGCGTTCTAACGACAATATGTAC-3'; MB-5, 5'-GTACATATTGTCGTTAGAACGCGTCAATCCATACT-3' and 5'-GATCAGTATGGATTGACGCGTTCTAACGACAATATGTAC-3'; and MB-6, 5'-GTACATATTGTCGTTAGAACGCGTACTCAATCCATACT-3' and 5'-GATCAGTATGGATTGAGTACGCGTTCTAACGACAATATGTAC-3'.
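The calibration step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not specify how the calibration curve was fitted, so the sketch assumes a least-squares line through the three control points on a log-log scale. The function names and the peak-area values are hypothetical and are not data from this study.

```python
import math

def fit_calibration(amounts, signals):
    """Fit log10(signal) = slope * log10(amount) + intercept by least squares.

    Assumption: a log-log linear calibration; the paper does not state the fit used.
    """
    xs = [math.log10(a) for a in amounts]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_amount(signal, slope, intercept):
    """Invert the calibration line to get an amount in units of control portions."""
    return 10 ** ((math.log10(signal) - intercept) / slope)

# Three control adapters carry 10, 3, and 1 portions of adult cerebrum cDNA.
control_amounts = [10, 3, 1]
control_signals = [5000.0, 1500.0, 500.0]   # hypothetical peak areas

slope, intercept = fit_calibration(control_amounts, control_signals)

# The other three adapters carry one portion each of three cerebellar samples.
sample_signals = [1000.0, 2500.0, 500.0]    # hypothetical peak areas
relative_levels = [estimate_amount(s, slope, intercept) for s in sample_signals]
```

Each value in `relative_levels` is an expression level relative to one portion of the control cDNA, which is the quantity the three-point calibration is meant to deliver for the three cerebellar samples in the same reaction.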
An ABI model 3700 DNA analyzer was used for gel separation, with a current production rate of more than 1,000 assays per day. Genes subjected to ATAC-PCR analysis were selected in descending order of abundance, prioritizing known genes.
To obtain expression patterns at six time points, three combinations of RNA samples were used with each gene. cDNA derived from adult cerebrum was used as a standard for calibration. With each set, two assays with different combinations of calibrations were performed. The first consisted of 10 portions of the standard with adapter MB-1, 3 portions with MB-3, 1 portion with MB-6, and 1 portion each of cerebellar samples with the other adapters. The second consisted of 10 portions of the standard with adapter MB-6, 3 portions with MB-4, 1 portion with MB-1, and 3 portions each of cerebellar samples with the other adapters. With these two individual assays using different calibrations, most of the obtained data points were within the range of calibration. Data points that showed discrepancies between the two assays were discarded. The overall success rate of assays was about 70%.
Statistical analyses.
Cluster analysis was performed using ClustanGraphics3 developed by Wishart (41). The data matrix was first standardized to z-scores, and cluster analysis was performed using Ward's method (41). Optimal reordering of the cases was also performed using the software. Among the several hierarchical clustering procedures we tested, Ward's method gave consistently better results than the other methods. In the case of cerebellar development, clustering was truncated at the 12-cluster level. The proximity matrix was then reordered by a recently developed method. This method reorders the cases so that the rank correlation between the actual and target row-wise ranks is maximized.
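The two steps named above, z-score standardization followed by Ward's method, can be illustrated with a small self-contained Python sketch. It does not reproduce ClustanGraphics3 or its reordering procedure. It assumes that standardization is applied per gene (per row), which the text does not state explicitly, and it uses a naive agglomerative loop that merges, at each step, the pair of clusters whose fusion gives the smallest increase in the within-cluster sum of squares. The function names and the four expression profiles are hypothetical.

```python
import math

def zscore_rows(matrix):
    """Standardize each row (gene profile) to mean 0 and sample SD 1."""
    standardized = []
    for row in matrix:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((v - mean) ** 2 for v in row) / (len(row) - 1))
        if sd == 0:
            standardized.append([0.0] * len(row))  # flat profile: no pattern
        else:
            standardized.append([(v - mean) / sd for v in row])
    return standardized

def ward_clusters(points, k):
    """Naive Ward clustering: merge until k clusters remain.

    Merge cost (increase in error sum of squares) for clusters a and b is
    n_a * n_b / (n_a + n_b) * squared distance between their centroids.
    """
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na = len(clusters[a]["members"])
                nb = len(clusters[b]["members"])
                d2 = sum((x - y) ** 2 for x, y in
                         zip(clusters[a]["centroid"], clusters[b]["centroid"]))
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "members": ca["members"] + cb["members"],
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(ca["centroid"], cb["centroid"])],
        }
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return [sorted(c["members"]) for c in clusters]

# Hypothetical relative expression levels at six time points for four genes.
profiles = [
    [8, 7, 5, 2, 1, 1],   # gene 0: high early, then declining
    [9, 8, 4, 2, 1, 1],   # gene 1: high early, then declining
    [1, 1, 2, 5, 8, 9],   # gene 2: low early, then increasing
    [1, 2, 2, 6, 7, 9],   # gene 3: low early, then increasing
]
groups = sorted(ward_clusters(zscore_rows(profiles), k=2))
```

In this toy case the early-high and late-high genes fall into separate clusters. The naive loop is cubic in the number of genes, so for a matrix of 1,869 genes a dedicated clustering package would be the practical choice; truncating at the 12-cluster level corresponds to `k=12`.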
Functional categories were assigned to all known genes in our expressed sequence tag (EST) collections based on the SwissProt database and/or Medline abstracts. The functional categories are independent of one another, with several exceptions: "intracellular signal transduction" does not include serine-threonine kinases and tyrosine kinases; "cell surface molecule" includes all such molecules except adhesion molecules.
Statistical tests to select functional categories enriched in specific clusters or groups were based on the binomial distribution: the occurrence of a category in a cluster or group was compared with its occurrence in the entire population. The cutoff points were arbitrarily set at either 0.01 (Fig. 3) or 0.05 (Fig. 4).
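A binomial enrichment test of this kind can be sketched as follows. The sketch assumes a one-sided upper-tail test, in which the probability of drawing a category member is taken from its frequency in the entire population; the text states only that the test was based on the binomial distribution, so this exact form is an assumption. The function names and the counts in the example are hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def enrichment_p_value(hits_in_cluster, cluster_size,
                       hits_in_population, population_size):
    """Probability of seeing at least this many category members in a cluster
    if members were distributed at the population-wide frequency."""
    p = hits_in_population / population_size
    return binomial_upper_tail(hits_in_cluster, cluster_size, p)

# Hypothetical counts: a category with 15 members among 1,869 genes,
# 5 of which fall into a cluster of 100 genes.
p_value = enrichment_p_value(5, 100, 15, 1869)
is_enriched = p_value < 0.01   # cutoff used for Fig. 3
```

With these hypothetical counts the expected number of category members in the cluster is below one, so observing five gives a p-value under the 0.01 cutoff.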
Multidimensional scaling was performed using a software package (STATISTICA 97) with default settings.
RESULTS
We selected six time points throughout early postnatal cerebellar development (2 days, 4 days, 8 days, 12 days, 3 wk, and 6 wk after birth) and determined expression levels. Expression levels in the developing cerebellum relative to those in the adult cerebrum were assayed using ATAC-PCR (15, 20). An outline of ATAC-PCR is schematically represented in Fig. 1 and described in detail in METHODS. To ensure accuracy, we repeated the assay with the following three combinations of time points: 2 days, 4 days, and 8 days; 12 days, 3 wk, and 6 wk; and 4 days, 12 days, and 6 wk. The expression profile covering 1,869 genes was generated from these data sets. (Please refer to the Supplementary Material for this article, published online at the Physiological Genomics web site; see footnote 1.)
In a study using budding yeast, Munich Information Center for Protein Sequences (MIPS) classification of genes was used for correlating gene functions and their expression patterns (34). However, it is not applicable to mammalian systems, because the unicellular system lacks most of the functions that characterize multicellular organisms. Furthermore, it is not optimized for use with cluster analysis of gene expression.
In our EST collections, about 1,600 known genes were identified, including the 1,053 known genes assayed here; this covers more than one-quarter of the 5,841 known genes listed in the UniGene set (UniGene Build no. 75). All of the known genes were classified into 90 functional categories. Keywords representing functional categories are supplied as supplementary material. The categorization of functions was based mainly on descriptions in the SwissProt database and/or Medline abstracts. Biologically relevant activities were used to assign functional categories to the gene products. In addition, we intended the assignment criteria to be rather "loose" so that enough genes could be included in each category. For example, detailed classifications of metabolic pathways were avoided. The functional categories vary in nature. For example, "eye" and "testis" indicate sites of expression. The category "Ca2+-related" represents genes whose products require Ca2+ for their function, such as intracellular signal transducers, carrier proteins of unknown function, and calcium channels. Categories such as "brain" and "intracellular signal transduction" can cover a wide variety of genes. The functional category "brain" represents the site of expression in some cases but functions related to the nervous system in others. Up to four keywords were assigned to each gene.
Correlation of functional categories enriched in specific gene expression patterns.
We selected functional categories that appeared more than 10 times in the 1,869 genes and searched for statistically significant correlations between the categories and the clusters representing expression patterns. The results are shown in Fig. 3. There were five functional categories that were enriched in group A clusters, characterized by high expression during early stages of development followed by decline. They included "cancer-related," which was enriched in group A1, "ribosomal protein," in group A1 and group A3, "RNA processing," in group A1, "intracellular signal transduction," in group A4, and "transcription factor," in group A4. Eight functional categories were enriched in group B clusters, which showed initial low expression followed by augmented expression during late stages of development (12 days, 3 wk, and 6 wk after birth). They included "carbohydrate metabolism," which was enriched in group B2, and those related to brain functions such as "ion channel and transporter" in group B3, "neurotransmitter receptor" in group B3, and "synapse component" in group B4. Figure 3 also includes genes that are predominantly or specifically expressed in the cerebellum. These two annotations were defined by data obtained by ATAC-PCR: genes whose relative expression levels at 6 wk exceeded those in the adult cerebrum by more than 20-fold were defined as "cerebellum specific," and those with levels from 10- to 20-fold were defined as "cerebellum dominant." These two annotations were also enriched in group B clusters.
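The two cerebellum annotations defined above amount to a simple threshold rule on the 6-wk expression level relative to the adult cerebrum. A minimal sketch follows; the function name is hypothetical, and the handling of values exactly at 10- or 20-fold is an assumption, since the text does not say which side of each boundary is inclusive.

```python
def cerebellum_annotation(fold_vs_cerebrum):
    """Annotate a gene from its 6-wk level relative to the adult cerebrum.

    Assumption: exactly 20-fold counts as "dominant", exactly 10-fold as "dominant".
    """
    if fold_vs_cerebrum > 20:
        return "cerebellum specific"
    if fold_vs_cerebrum >= 10:
        return "cerebellum dominant"
    return None  # neither annotation applies

annotations = [cerebellum_annotation(f) for f in (25.0, 12.0, 3.0)]
```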
Expression patterns of individual genes.
Several interesting insights into individual genes can be obtained using the functional categories.
Ten of fifteen "cancer-related" genes belonged to the group A clusters (Fig. 5, "cancer-related"), strongly suggesting their involvement in cell proliferation. Genes encoding ribosomal proteins showed elevated expression during early stages of development, more than half of them being grouped in clusters A1 and A3, characterized by elevated expression around 4 days (Fig. 5, "ribosomal protein"). This peak represents the active phase of protein synthesis vs. the cell proliferation stage. It is interesting to note that genes whose products belong to the functional category "cell growth" did not necessarily exhibit elevated expression patterns during the early stage of cerebellar development (Fig. 5, "cell growth"). Further studies are needed, but the majority of the experimental evidence for these genes has been obtained not with intact nervous systems but with other in vitro systems. We suspect that functions in the nervous system in vivo might be different from those in in vitro systems.
The functional categories "brain," "intracellular signal transduction," and "cytoskeleton" each consist of a large number of gene products, whose characteristics are not entirely consistent. Only members of the "brain" category were found to be enriched in group B. Genes whose products have important functions in the mature brain were in general expressed to a lesser degree in the early stages and were progressively increased in expression over the course of development.
Transcription factors were enriched in cluster A4, but the biological meaning of this correlation is unclear. This category includes 12 transcription factors known to be involved in development or differentiation. Their expression patterns were not consistent with one another, and no overall tendency was observed.
Genes specific to oligodendroglia were found in group B with one exception: a gene encoding a brain-specific lipid-binding protein (Fig. 5, "oligodendroglia"). There are very few oligodendroglia at birth, but they increase in number over the course of development (8). Our finding is in complete agreement with this observation and suggests that their multiplication is most active at around 3 wk after birth.
Several dominant expression patterns were found for specific genes encoding adhesion molecules and matrix proteins, which are known to be functionally important during development of the nervous system (Fig. 5, "adhesion molecule," "cell matrix protein"). In particular, three gene products, reelin (9), tenascin (4), and matrix metalloproteinases (35) are known to play important roles during postnatal cerebellar development. The function of reelin was inferred from the Reeler mutant mouse to be involved in the formation of layer structures. In situ hybridization experiments revealed dense expression in the external germinal layer (29), and results with ATAC-PCR agreed well with these observations. Tenascin is likely to be expressed in astrocytes in the cerebellar cortex and is thought to take part in guidance of granule cell migration (4). Its expression was transiently elevated during the middle stage of development, which agrees well with the timing expected from its proposed physiological action.
Programmed cell death is an important mechanism of development. Recent studies demonstrated that the majority of programmed cell death occurred in the brain within the region of cell proliferation, although many cells were dying in the postmitotic regions (6). Five apoptosis-related gene products, Bax- (12), Requiem (11), TDAG51 (24), Nedd2 (16), and Siva (25), have been found to induce apoptosis. These activities were demonstrated only with blood cell lines and not with neuronal cell lines. Their expression patterns, except that of Siva, belonged to group A clusters, demonstrating elevated expression during early development (Fig. 5, "apoptosis"). This behavior is likely to be related to the known programmed cell death in the external germinal layer, the region where granule cells are proliferating. The role of late-onset apoptosis-related genes is awaiting further analyses.
Multidimensional scaling of developmental stages.
All of the above analyses have focused on characterizing how each gene product functions during postnatal cerebellar development. Alternatively, the relationships between developmental stages can be explored based on similarities between time points calculated using the values of the 1,869 genes as variables. We applied multidimensional scaling (28), which is a statistical procedure for fitting a set of points in a space such that the distances between points correspond as closely as possible to a given set of dissimilarities (or similarities) between a set of objects. Here, Pearson's product-moment correlation coefficient was calculated between each pair of developmental stages, and the stages were plotted in three-dimensional space. In this analysis, distances between time points represent dissimilarities deduced from the correlation coefficient: expression states of time points are similar when they are close to each other. Because of the characteristics of the correlation coefficient, the quantitative aspects of each transcript were ignored, and the expression status of each gene was treated equally. As shown in Fig. 6, day 2 and day 4 are located close to each other, and day 12, week 3, and week 6 are close to each other. They represent two distinct groups, corresponding to the cell proliferation stage and the maturation stage, respectively. Day 8 is far away from the others, indicating a distinctive physiological state.
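The input to the multidimensional scaling step, a matrix of dissimilarities between developmental stages derived from Pearson's correlation coefficient, can be sketched as follows. The scaling itself was done in STATISTICA and is not reproduced here. The conversion from correlation to dissimilarity as 1 - r is an assumption, since the text does not give the formula, and the stage names and expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def dissimilarity_matrix(stages):
    """Map each pair of stage names to 1 - r (assumed dissimilarity measure)."""
    names = list(stages)
    return {(a, b): 1.0 - pearson(stages[a], stages[b])
            for a in names for b in names}

# Hypothetical expression values of five genes at three stages.
stages = {
    "day 2": [3.0, 2.5, 1.0, 0.5, 2.0],
    "day 4": [2.8, 2.6, 1.1, 0.6, 2.1],
    "week 6": [0.5, 1.0, 3.0, 2.5, 2.0],
}
dissimilarity = dissimilarity_matrix(stages)
```

Because the correlation coefficient is insensitive to the overall scale of each stage's values, stages with similar relative patterns across genes receive a small dissimilarity regardless of absolute level, which is the property noted in the text.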
DISCUSSION
Information regarding gene product functions that have been determined thus far is stored mainly as articles in scientific journals. To couple the information with gene expression patterns obtained by cluster analysis, we added to each gene keywords representing functional categories which summarize literature information. Assignment of keywords allowed for statistical and categorical correlation of gene expression patterns and gene product functions.
We analyzed the expression of 1,869 genes at 6 time points, which covered only part of the entire gene population. Nevertheless, the data gave a number of interesting findings and clues for further analysis. The expression profiles of known genes were in complete agreement with the anatomy and physiology of the developing cerebellum. The strong correlation between gene product functions and expression patterns suggests that functions of novel genes might be deduced from their expression patterns. In most cases, however, such assertions cannot be made easily, because functions of known genes are not highly concentrated in particular clusters. When expression patterns are used in combination with other information such as primary structure, however, hypotheses can be made, leading to more focused experimental design.
Several molecules have been identified as being involved in controlling development of cerebellar granule cells. A helix-loop-helix transcription factor, Math1, is essential for embryogenesis of granule cells (5). The sonic hedgehog pathway has been shown to be involved in postnatal granule cell proliferation (38), supporting a model of local control of the proliferation by Purkinje cells. We also identified here at least 63 transcription factors, many of which are likely to control expression of other genes during development. It is a difficult but very important problem to link the expression patterns of these genes and other genes in their respective pathways.
The use of functional categories was successful for correlating gene expression patterns and product functions. As demonstrated by the analyses of adhesion molecules and apoptosis-related genes, it is also a good method to discriminate and characterize members within a functional category. However, several methodological problems have been observed. The results are subject to a large degree of experimenter bias. We do not claim that our keyword list is a definitive version, and keyword lists designed by other investigators may provide different interpretations. In addition, keywords are condensations of information and may often exclude important details. More importantly, the literature information is derived from research conducted and reported in the past and will therefore reflect any biases of that work without addressing any of the specific conditions under which that work was performed, which may or may not be relevant to the current situation. For example, genes belonging to the category "cell growth" were not necessarily highly expressed during the cell proliferation period. The cell growth-promoting effects of these gene products have been mainly studied using in vitro systems. Their main functions in the brain, on the other hand, could be distinct from previously observed in vitro activities. Thus the analysis of expression patterns may suggest other unknown functions of known gene products.
Interpretation of expression patterns requires several other caveats. It should be noted that the levels of proteins do not necessarily parallel mRNA levels and that protein levels are more likely to reflect the physiological states of samples (20). Repeated experiments exploring the limits of ATAC-PCR revealed that differences less than twofold were ambiguous. Expression patterns with marked changes and those with minor changes were indistinguishable after standardization, and data sets with small changes may yield incorrect expression patterns. Transcription of some genes might be loosely controlled, and patterns might be different among individual samples.
Both cluster analysis and multidimensional scaling can be used to demonstrate the relationships between developmental stages by means of a similarity matrix of expression states. For a small number of cases, multidimensional scaling is more appropriate because the relationship is demonstrated visually in a two- or three-dimensional space. It should be noted that the results shown in Fig. 6 are based on the assumption that the weight of each gene is equal. Although the results are in agreement with anatomical and physiological observations, it may be more appropriate to assume that genes of several functional categories weigh more than the others. For example, since the properties of neurons are mainly determined by their electrophysiological properties, higher weights for genes belonging to such groups might better reflect the states of tissues.
A tissue is composed of multiple cell types, and expression patterns obtained from RNA extracted from whole tissue are a weighted average of those of each cell type. It is therefore necessary to be careful in interpreting our data, especially for the middle stage of development; at this stage, there are at least three granule cell populations: those in the outer external germinal layer, the inner germinal layer, and the granule cell layer. In general, observed changes in expression in each cell type would be masked by opposite changes in other cell types, such that the real changes of gene expression in individual cells should be sharper than those observed. More accurate observation can be done with separate sampling of each cell layer using advanced techniques such as laser capture microdissection (31).
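The masking effect of whole-tissue sampling can be made concrete with a short numerical sketch. It treats the observed level as a sum of per-population levels weighted by each population's share of the RNA. The function name, the three shares, and the expression levels are all hypothetical and chosen only for illustration.

```python
def observed_level(fractions, levels):
    """Whole-tissue expression as a weighted average over cell populations."""
    return sum(f * x for f, x in zip(fractions, levels))

# Hypothetical RNA shares of three granule cell populations.
fractions = [0.3, 0.3, 0.4]

before = observed_level(fractions, [1.0, 1.0, 1.0])
# One population increases 4-fold, one is unchanged, one decreases 4-fold.
after = observed_level(fractions, [4.0, 1.0, 0.25])

observed_fold_change = after / before
```

In this example a fourfold increase in one population and a fourfold decrease in another combine into an observed change of about 1.6-fold, which is below the twofold difference that the assay can resolve without ambiguity.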
The work presented here is only the beginning of a long period of postgenomic research. The technology both for assays and statistical analysis may be widely applicable for the analysis of complicated systems including other parts of the nervous system.
ACKNOWLEDGMENTS
The expression data and the list of functional categories will be available from our web site (http://love2.aist-nara.ac.jp).
This work was partly supported by a Grant-in-Aid from the Ministry of Education, Science, Sports, and Culture.
FOOTNOTES
Address for reprint requests and other correspondence: K. Kato, Taisho Laboratory of Functional Genomics, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0101, Japan (E-mail address: kkato@bs.aist-nara.ac.jp).
1 Supplementary Material to this article is available online at http://physiolgenomics.physiology.org/cgi/content/full/4/2/155/DC1.
REFERENCES