Application of machine learning and visualization of heterogeneous datasets to uncover relationships between translation and developmental stage expression of C. elegans mRNAs
Marjan Trutschl1,3,
Tzvetanka D. Dinkova2 and
Robert E. Rhoads2
1 Department of Computer Science, Louisiana State University
2 Department of Biochemistry and Molecular Biology
3 Center for Bioinformatics and Computational Biology, Louisiana State University Health Sciences Center, Shreveport, Louisiana
ABSTRACT
The relationships between genes in neighboring clusters in a self-organizing map (SOM) and properties attributed to them are sometimes difficult to discern, especially when heterogeneous datasets are used. We report a novel approach to identify correlations between heterogeneous datasets. One dataset, derived from microarray analysis of polysomal distribution, contained changes in the translational efficiency of Caenorhabditis elegans mRNAs resulting from loss of a specific eIF4E isoform. The other dataset contained expression patterns of mRNAs across all developmental stages. Two algorithms were applied to these datasets: a classical scatter plot and an SOM. The outputs were linked using a two-dimensional color scale. This revealed that an mRNA's eIF4E-dependent translational efficiency is strongly dependent on its expression during development. This correlation was not detectable with a traditional one-dimensional color scale.
eIF4E; self-organizing map; color scale; mRNA-specific translational control; Caenorhabditis elegans
INTRODUCTION
DNA MICROARRAYS have been increasingly applied for quantification of global gene expression in both humans and model organisms to understand a wide range of cellular responses. Computational methods used for DNA microarray experiments include, among other methods, hierarchical clustering and self-organizing maps (SOMs) (9). For instance, coregulation of genes in yeast has been explored through application of hierarchical average-link clustering (3), fuzzy k-means clustering (4), and SOM clustering (5, 16). When groups of records are obtained from a clustering algorithm, the user may also be interested in descriptive information about a single group of records, differences that set individual records apart from the majority, or properties of the complete dataset.
The soil nematode Caenorhabditis elegans is a powerful metazoan model system that has been used for a variety of gene expression profiling studies. DNA microarrays have been used in this organism to explore changes in gene expression during development and cell differentiation and in response to stress conditions (18, 12, 17). These studies have measured mRNA abundance, which reflects regulation of both transcription and mRNA turnover. Recently, we extended microarray analysis in C. elegans to investigate gene expression at the level of translation (protein synthesis) (2). Although translational control of gene expression is widespread (13), relatively few studies have made use of DNA microarrays, and none in C. elegans before our work.
mRNA translational efficiency is modulated both through covalent changes affecting the activities of the canonical initiation factors and also through binding of mRNA-specific proteins (13). A different class of eukaryotic initiation factors (eIF) catalyzes each of the individual steps of protein synthesis initiation. Recruitment of mRNA is normally rate limiting for initiation of translation and requires the recognition of the 5'-terminal m7G-containing cap by eIF4E. The availability of eIF4E for interaction with mRNA and other initiation factors is regulated in response to a variety of hormonal, nutritional, and mitogenic signals.
Intriguingly, multiple isoforms of eIF4E have been found in plants, flies, mammals, frogs, nematodes, and fish (reviewed in Ref. 2). Five eIF4E isoforms, termed IFE-1 through IFE-5, are expressed in C. elegans (6, 8). Prior attempts to understand the physiological roles of multiple eIF4E isoforms within a single organism have met with only partial success. Our initial approach to understand the roles of individual eIF4E genes in C. elegans was to knock out the expression of a single eIF4E gene, encoding IFE-4, by either RNA interference or a null mutation and then examine the polysomal distribution of all mRNAs using DNA microarrays (2). We found that polysome shifts, in the absence of total mRNA changes, occurred for only a small subset of mRNAs. The selective decreases in translational efficiency were confirmed by decreases in the corresponding proteins in the knockout worms. However, our analysis did not reveal why this particular subset of mRNAs responded to loss of IFE-4, whereas the vast majority of mRNAs did not. There are six main developmental stages in the C. elegans life cycle: embryo, four larval stages (L1, L2, L3, L4), and adult (15). One possibility was that the mRNAs had to be expressed with a particular developmental stage pattern to be affected by loss of IFE-4.
In the current study, we tested the hypothesis that changes in mRNA distribution on polysomes caused by deletion of the ife-4 gene were correlated with the expression patterns of these mRNAs during development. In two separate approaches, we applied a combination of neural network-supported learning and visualization to discover relationships between two heterogeneous datasets describing individual C. elegans mRNAs: one dataset of changes in translational efficiency upon deletion of ife-4, and one dataset of expression during development. The results of both approaches yielded evidence for a strong correlation between polysomal behavior and a particular developmental pattern.
METHODS
Microarray data from C. elegans polysomes.
Polysomal distribution data were obtained as previously described (2). Briefly, heavy (H) and light (L) polysomes were sedimented from approximately equal proportions of each developmental stage of C. elegans strain N2 var. Bristol (wild type) and strain KX17[ife-4(ok320)], the null mutant for ife-4. Total RNA (T) was isolated from the same worm populations. The RNAs were hybridized to Affymetrix GeneChip microarrays containing probes to all C. elegans genes, and the signals were analyzed using Affymetrix software. Genes called "present" in at least one of the samples [H, L, or T from ife-4(ok320) or N2] were further analyzed. Signals for each gene in H or L arrays were normalized by dividing by the corresponding signal from the T array. This eliminated changes in polysomal distribution of mRNAs that were due merely to alterations in transcription or mRNA stability (19). Signal log ratios (SLRs) for normalized H and L values from ife-4(ok320) vs. N2 were calculated, averaged for three experiments, scored for significance using the thresholds of SLR < −1 or SLR > +1, and compared with an unpaired t-test.
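The normalization and SLR scoring described above can be illustrated with a minimal sketch. It assumes that an SLR is the base-2 logarithm of the T-normalized mutant signal over the T-normalized wild-type signal (consistent with the twofold thresholds at ±1); the function name and all signal values are hypothetical, and the Affymetrix software's own SLR computation is not reproduced here.

```python
import numpy as np

def signal_log_ratio(frac_mut, total_mut, frac_wt, total_wt):
    """log2 of (fraction / total) in the mutant over (fraction / total) in wild type."""
    norm_mut = np.asarray(frac_mut, dtype=float) / np.asarray(total_mut, dtype=float)
    norm_wt = np.asarray(frac_wt, dtype=float) / np.asarray(total_wt, dtype=float)
    return np.log2(norm_mut / norm_wt)

# Hypothetical heavy-polysome (H) and total (T) signals for one gene, three experiments.
h_mut = [50.0, 60.0, 45.0]
t_mut = [400.0, 480.0, 360.0]
h_wt = [200.0, 240.0, 180.0]
t_wt = [400.0, 480.0, 360.0]

slr = signal_log_ratio(h_mut, t_mut, h_wt, t_wt)
mean_slr = float(slr.mean())                 # average over the three experiments
is_changed = mean_slr < -1 or mean_slr > 1   # thresholds stated in the text
```

Dividing by T before taking the ratio means that a gene whose total mRNA level doubles, with H doubling in step, yields an SLR of 0 rather than a spurious polysomal change.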
SOM.
Expression of individual C. elegans mRNAs as a function of developmental stage has previously been published (7). The data provide the relative mRNA levels in a given developmental stage compared with the same mRNA in a mixed-stage population, reported as fold expression. We used these data to construct a SOM of the pattern of expression across all six developmental stages of C. elegans. The SOM tools used were based on those available in SOM Toolbox (http://www.cis.hut.fi/projects/somtoolbox) for Matlab (http://www.mathworks.com). The number of clusters was not predetermined, but we found that a 5 x 5 SOM with 25 output nodes ("clusters") provided the most informative topology for this study. Membership in a SOM output node was established for each gene.
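For readers unfamiliar with SOMs, the following is a minimal NumPy sketch of online SOM training on six-dimensional profiles and of assigning each record to an output node. It is not the SOM Toolbox implementation used in this study; the initialization, learning-rate and neighborhood schedules, function names, and the random input data are all assumptions chosen for illustration.

```python
import numpy as np

def train_som(data, rows=5, cols=5, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols SOM with a Gaussian neighborhood (illustrative schedule)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Initialize each node's weight vector from a randomly chosen record.
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float)
    # Grid coordinates (row, col) of every node in row-major order.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # radius shrinks toward 0.5
            # Best-matching unit: node whose weights are closest to the record.
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))
            grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights

def assign_nodes(data, weights):
    """Return the 1-based output node (row-major numbering) for each record."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1) + 1

# Random stand-in for fold-expression profiles across six developmental stages.
profiles = np.random.default_rng(1).normal(size=(200, 6))
weights = train_som(profiles)
nodes = assign_nodes(profiles, weights)
```

Because neighboring nodes are pulled toward the same records during training, nodes that are adjacent on the grid end up with similar profiles, which is the property the color scale described below relies on.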
Centroids.
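A centroid for each SOM output node was calculated as the average of the relative mRNA levels of its member genes for each of six dimensions (embryo, L1, L2, L3, L4, adult) using the equation
$$c_{k,d} = \frac{1}{n_k} \sum_{i=1}^{n_k} x_{i,d}$$
where $x_{i,d}$ is the relative mRNA level of gene $i$ in developmental stage $d$ and $n_k$ is the number of genes assigned to output node $k$.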
This information was displayed as a line plot for each output node, where the y-axis reflects relative mRNA levels and the x-axis reflects the six developmental stages. To facilitate the internodal comparison of average developmental profiles, a global scale was applied to the y-axis for all output nodes. The global scale used the minimum and maximum y values (min_y and max_y) identified over all six dimensions for all the data.
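A short sketch of the centroid and global-scale computation follows. It assumes the centroid is the per-stage mean over a node's member genes; the function name and the toy profiles are hypothetical.

```python
import numpy as np

STAGES = ["embryo", "L1", "L2", "L3", "L4", "adult"]

def node_centroids(profiles, node_ids, n_nodes):
    """Mean profile per output node; nodes with no members are left as NaN."""
    centroids = np.full((n_nodes, profiles.shape[1]), np.nan)
    for k in range(1, n_nodes + 1):
        members = profiles[node_ids == k]
        if len(members) > 0:
            centroids[k - 1] = members.mean(axis=0)
    return centroids

# Toy data: three genes, two of them assigned to node 1 and one to node 2.
profiles = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [3.0, 2.0, 1.0, 0.0, 1.0, 2.0],
                     [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]])
node_ids = np.array([1, 1, 2])
centroids = node_centroids(profiles, node_ids, n_nodes=2)

# Global y-axis limits taken over every stage of every record, shared by all plots.
y_min, y_max = float(profiles.min()), float(profiles.max())
```

Using one pair of limits for every line plot keeps a flat centroid visibly flat next to a strongly varying one, which per-plot autoscaling would hide.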
Two-dimensional color scale.
To achieve visual correlation between the SOM output nodes and a classical scatter plot, each output node was colored using a two-dimensional (2D) color scale where the rows in the SOM were mapped to a red (R) component and the columns to a green (G) component. One component (blue; B) was kept constant to achieve a 2D color scale. Each of the three color components was calculated using the equation
where v is value. The color scale was divided horizontally and vertically according to the number of output nodes in the rows and columns of the SOM so as to provide as many perceptually discriminative colors as there are categories.
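The following sketch shows one way a 2D color scale of this kind could be built. It assumes a linear mapping of row index to red and column index to green on a 0 to 1 scale, with blue held at an arbitrary constant of 0.5; the scaling, the orientation of the axes, and the function name are assumptions, so the resulting colors need not match those in the figures.

```python
def color_2d(row, col, n_rows=5, n_cols=5, blue=0.5):
    """Map a grid position to an (R, G, B) triple with components in [0, 1]."""
    red = row / (n_rows - 1) if n_rows > 1 else 0.0
    green = col / (n_cols - 1) if n_cols > 1 else 0.0
    return (red, green, blue)

# One color per output node of a 5 x 5 map, nodes numbered 1..25 in row-major order.
palette = {r * 5 + c + 1: color_2d(r, c) for r in range(5) for c in range(5)}
```

Each node receives a distinct color, and nodes that are neighbors on the grid differ in only one component by one step, so color similarity tracks grid proximity in both directions.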
Combining datasets.
The dataset resulting from microarray analysis of polysomal behavior comparing N2 with ife-4(ok320) (see above) and the dataset containing changes in mRNA level during development (7) were heterogeneous and could not be compared directly. They differed in both length and method of gene designation, one using Affymetrix Gene ID and the other WormBase ID (http://www.wormbase.org). We used a structured query language to join them into a cohesive entity by means of an Affymetrix-to-WormBase look-up table based on WormBase version WS100 (http://mcb.harvard.edu/hunter/Protocols/protocols.htm). Combining the two datasets resulted in 9,328 genes with a common gene designation.
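The join through a look-up table can be sketched with Python's built-in sqlite3 module. The table names, column names, identifiers, and values below are hypothetical placeholders and do not reflect the actual schema or data used in the study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE polysome (affy_id TEXT PRIMARY KEY, slr_h REAL, slr_l REAL);
CREATE TABLE lookup (affy_id TEXT, wormbase_id TEXT);
CREATE TABLE development (wormbase_id TEXT PRIMARY KEY,
                          embryo REAL, l1 REAL, l2 REAL, l3 REAL, l4 REAL, adult REAL);
""")
# Hypothetical records: three probe sets, two of which have a look-up entry.
cur.executemany("INSERT INTO polysome VALUES (?, ?, ?)",
                [("probe_A", -1.5, -1.2), ("probe_B", 0.1, 0.0), ("probe_C", 1.3, 0.2)])
cur.executemany("INSERT INTO lookup VALUES (?, ?)",
                [("probe_A", "gene_1"), ("probe_B", "gene_2")])
cur.executemany("INSERT INTO development VALUES (?, ?, ?, ?, ?, ?, ?)",
                [("gene_1", 1.0, 0.8, 1.2, 1.6, 2.1, 0.7),
                 ("gene_3", 0.9, 1.1, 1.0, 1.0, 0.9, 1.2)])

# Inner joins keep only genes present in both datasets and in the look-up table.
rows = cur.execute("""
SELECT p.affy_id, d.wormbase_id, p.slr_h, p.slr_l,
       d.embryo, d.l1, d.l2, d.l3, d.l4, d.adult
FROM polysome AS p
JOIN lookup AS k ON k.affy_id = p.affy_id
JOIN development AS d ON d.wormbase_id = k.wormbase_id
ORDER BY p.affy_id
""").fetchall()
conn.close()
```

An inner join discards records lacking a counterpart in the other dataset, which is why a combined dataset is smaller than either input.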
RESULTS
Dataset on polysomal behavior and linkage to developmental expression.
In a previous study (2), we investigated the role of IFE-4, one of the five eIF4E isoforms expressed in C. elegans, which has homologs in plants and mammals. This was done by comparing the distribution of individual mRNAs in polysomal fractions between wild type (N2) and an ife-4 null mutant [ife-4(ok320)]. mRNAs experiencing a loss of translational efficiency shift from heavier to lighter polysomes (11). We reasoned that mRNAs that required IFE-4 for their translation would be found on lighter polysomes in ife-4(ok320) compared with N2 worms. Whole cell lysates were loaded onto sucrose density gradients, subjected to ultracentrifugation, and fractionated with continuous monitoring of absorbance at 260 nm (Fig. 1A). RNAs isolated from polysomal fractions were pooled into two different samples: light polysomal RNA (L; 2–4 ribosomes bound per mRNA) and heavy polysomal RNA (H; >4 ribosomes bound per mRNA) (Fig. 1B). Total RNA was obtained from the same cell lysates. The H, L, and T samples were used to obtain labeled cRNA and were hybridized to C. elegans Affymetrix GeneChip arrays containing a total of 22,625 probe sets (18,967 genes). We found 14,764 sequences called present in at least one of the samples analyzed (H, L, or T) for either N2 or ife-4(ok320). The average hybridization signals were normalized by the mean expression signal on the array for each sample. These signals were used to compare the polysomal (H and L) and T mRNA levels between ife-4(ok320) and N2 (wild type). In our previous study (but not in the present one), we eliminated from consideration any mRNAs that changed more than twofold in T, since changes in H or L may have reflected overall intracellular amounts of mRNA rather than changes in translational efficiency. Surprisingly, significant polysomal changes were observed for only 33 of the mRNAs. These results indicated that translation of a specific subset of mRNAs is dependent on a unique isoform of eIF4E.

Fig. 1. Generation of a combined dataset of translational changes after ife-4 loss and mRNA expression as a function of developmental stage in Caenorhabditis elegans. The translational efficiencies of individual mRNAs were compared between mixed-stage wild-type C. elegans (N2) and a strain lacking the translational initiation factor IFE-4 [ife-4(ok320)]. A: polysomes were isolated from each strain and divided into heavy (H) and light (L) fractions. A typical polysomal profile (A260) is shown at left. B: RNA was isolated from H and L fractions as well as from the total extract (T). cRNAs from all 3 fractions were hybridized to C. elegans Affymetrix GeneChip microarrays. The individual mRNA signals in H or L fractions were normalized using T. C: Affymetrix data from B were linked with a published dataset (7) that gives expression of each mRNA as a function of developmental stage [embryo (E), larval stages 1–4 (L1–L4), adult (A)] by using an Affymetrix-to-WormBase lookup table to create a combined dataset. The data are provided at http://genome.cs.lsus.edu/mRNA/PG2005.
Unfortunately, manual inspection of these mRNAs did not identify any unifying feature that would explain their common sensitivity to IFE-4, e.g., nucleotide sequence, trans-splicing, nature of the cap, chromosomal location, and so forth. We therefore set out in the current study to test whether this was related to the developmental expression pattern of these mRNAs. Information on the latter was available as a quantitative dataset consisting of 17,871 records reflecting the degree to which the mRNA level for a gene is overrepresented or underrepresented in a particular developmental stage compared with its level in the mixture of all stages (7). Most of the 14,764 mRNAs found to be present in our polysomal analysis were included in the developmental profile dataset. We reasoned that some mRNAs may be changed in both total level (T) as well as in translational efficiency. To include such mRNAs in our analysis, we divided the values for H and L by T, similar to the treatment of data in the initial study describing translational profiling by microarrays (19). This normalizes the values of H and L by total mRNA and allows one to determine whether the translational efficiency of an mRNA changes, regardless of whether its total intracellular mRNA level changes. Normalization permitted us to include many mRNAs that had been excluded in our previous analysis (2). Finally, we combined these two datasets into a single dataset, resulting in 9,328 records, as described in METHODS (Fig. 1C).
First approach to correlating the two datasets.
We took two approaches to find relationships between polysomal distribution and developmental expression. First, we plotted in Fig. 2A the normalized difference in each mRNA level when ife-4(ok320) is compared with N2 for L vs. the same parameter for H. This scatter plot was then divided into nine classes of polysomal behavior, designated by different colors. Values in each class differ from the adjacent class by at least twofold. For instance, records in class 7 (Fig. 2A, bottom left, orange) represent 339 mRNAs that were decreased at least twofold in ife-4(ok320) compared with N2 in both H and L. Records in class 8 (bottom middle, red) represent 108 mRNAs that were decreased at least twofold in L but were unchanged in H. The majority of mRNAs (8,356; 90%) were changed less than twofold in both L and H (class 5; green). Of the mRNAs that changed distribution upon deletion of ife-4, more were decreased in H, L, or both without being increased in either H or L (classes 4, 7, and 8; n = 664; P value < 0.05 for 195 records) than the opposite, being increased in H, L, or both without being decreased in either H or L (classes 2, 3, and 6; n = 271; P value < 0.05 for 83 records). This bias is reasonable, given that IFE-4 is a translational initiation factor; its loss is more likely to cause a decrease in translational efficiency of mRNAs than an increase.
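The nine-class partition of the scatter plot can be expressed as a small function. This sketch assumes a 3 × 3 grid numbered in row-major order from the top left (x-axis: SLR for H; y-axis: SLR for L), which reproduces the positions stated for classes 4 through 8; the placement of the remaining classes and the treatment of values exactly at ±1 are assumptions, as is the function name.

```python
def polysomal_class(slr_h, slr_l, threshold=1.0):
    """Return the class (1-9) for a record from its H and L signal log ratios."""
    # Column from the H change: decreased, unchanged, increased (left to right).
    if slr_h <= -threshold:
        col = 0
    elif slr_h >= threshold:
        col = 2
    else:
        col = 1
    # Row from the L change: increased at the top, decreased at the bottom.
    if slr_l >= threshold:
        row = 0
    elif slr_l <= -threshold:
        row = 2
    else:
        row = 1
    return 3 * row + col + 1

# Hypothetical records as (SLR for H, SLR for L).
examples = [(-1.5, -1.5), (0.0, -1.5), (0.0, 0.0), (1.5, 0.0), (1.5, 1.5)]
classes = [polysomal_class(h, l) for h, l in examples]
```

Under this numbering, classes 4, 7, and 8 are the ones decreased in H, L, or both without an increase, and classes 2, 3, and 6 are the corresponding increases.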

Fig. 2. Changes in polysomal distribution of mRNAs upon deletion of the ife-4 gene correlate with characteristic developmental expression patterns. A: 9 classes of polysomal behavior were defined on the basis of differences in mRNA abundance in H or L polysomes (normalized for T) when the ife-4(ok320) mutant was compared with the wild-type N2. Classes are defined by signal log ratio (SLR) for ife-4(ok320) vs. N2 worms. Signals from H and L were divided by T, and then SLRs were calculated and averaged over 3 independent experiments. The y-axis shows the SLRs obtained from L polysomes and the x-axis the SLRs from H polysomes. Those in class 5 (green) have −1 < SLR < 1 for both L and H. Those in class 6 (yellow) have SLR > 1 for H but −1 < SLR < 1 for L, etc. Stated another way, mRNAs in class 7 are at least 2-fold less abundant in ife-4(ok320) worms than in N2 for both H and L polysomal fractions, etc. B: centroids representing developmental expression patterns for genes present in each of the 9 polysomal classes defined in A. Nos. in parentheses (bottom left corners) represent the no. of significantly changed genes. There are no significantly changed genes in classes 1 and 9. The genes in class 5 are not changed, by definition.
To discern any relationships between polysomal behavior and developmental expression, we averaged the expression of mRNAs in each of these classes at each of the six developmental stages of C. elegans (7). We then calculated a centroid representing the average developmental mRNA expression for each class (Fig. 2B). The results indicated that the mRNAs in each of the nine polysomal classes differed in developmental behavior. For instance, those in class 7 (decreased in both H and L polysomes) were expressed in embryo at a level similar to the average of all stages, decreased slightly in the L1 stage, progressively increased up to L4, and then decreased sharply in adult. Classes 4 and 8, which are similar to class 7 in polysomal behavior (Fig. 2A) also had developmental profiles that were similar to those of class 7 (Fig. 2B). Classes that were more different from each other in polysomal behavior appeared more different in developmental profile (e.g., class 7 vs. class 3).
Second approach to correlating the two datasets.
The results in Fig. 2B indicate that a broad correlation exists between classes of mRNAs defined by similar polysomal behavior and their developmental expression, but they say nothing about individual mRNAs. To obtain more definitive evidence for a correlation, and to allow this to be determined for individual mRNAs, we applied a 5 x 5 SOM algorithm to the combined developmental expression dataset (9,328 genes). Next, we devised a method to distinguish each of the 25 SOM output nodes on the scatter plot of polysomal behavior. Various visualization and data analysis packages provide a wide assortment of color scales (10). However, most of these color scales are of a one-dimensional (1D) nature. Linking a 1D color scale to a 2D topological map such as a SOM leads to misinterpretation of data. No matter how the output nodes are labeled, they cannot be unambiguously mapped to a 1D color scale. This is illustrated in the 1D color scale of Fig. 3A, where some adjacent output nodes of the 5 x 5 SOM (e.g., output nodes 5 and 10) have colors that are more different than those of output nodes that are farther apart (e.g., output nodes 5 and 6). The color scale used in Fig. 3A is continuous and gives the appearance of continuity of output nodes, yet the output nodes are categorical and not continuous. If we were to replace the color scale with a categorical scale, such as the so-called "rainbow" color scale, we would perceptually create only a few distinct output node areas, e.g., corresponding to blue, cyan, green, yellow, and red (Fig. 3C). Furthermore, equal steps in the rainbow scale do not correspond to equal steps in color, but look instead like fuzzy bands of color. Thus merely changing the color scale would not solve the problem of perceptual ambiguity of output nodes.
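The mismatch between a 1D ordering and the 2D grid can be made concrete with a short sketch. It assumes the output nodes are numbered 1 to 25 in row-major order, which is consistent with the examples of nodes 5, 6, and 10 given above; the function names are hypothetical.

```python
def grid_position(node, n_cols=5):
    """(row, col) of a 1-based node number on a row-major grid."""
    return divmod(node - 1, n_cols)

def grid_distance(a, b, n_cols=5):
    """City-block distance between two nodes on the SOM grid."""
    (ra, ca), (rb, cb) = grid_position(a, n_cols), grid_position(b, n_cols)
    return abs(ra - rb) + abs(ca - cb)

def index_distance(a, b):
    """Separation of two nodes along a 1D color scale indexed by node number."""
    return abs(a - b)

adjacent = (grid_distance(5, 10), index_distance(5, 10))  # neighbors on the grid
distant = (grid_distance(5, 6), index_distance(5, 6))     # far apart on the grid
```

Nodes 5 and 10 are one grid step apart but five steps apart on the 1D scale, whereas nodes 5 and 6 are five grid steps apart but adjacent on the 1D scale, so color difference and topological distance disagree.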

Fig. 3. One-dimensional (1D) vs. two-dimensional (2D) color visualization of a self-organizing map (SOM). A: 5 x 5 SOM colored using a 1D color scale. B: 5 x 5 SOM colored using a 2D color scale. A 3-dimensional (3D) color cube is used to create a 2D color scale as described in METHODS. R, red; G, green; B, blue. C: sample rainbow color scale with 5 perceptually distinct colors corresponding to blue, cyan, green, yellow, and red.
To overcome this problem we developed a 2D color scale. Figure 3B shows a 2D slice of a three-dimensional (3D) color cube with superimposed SOM output nodes. This approach achieves a high level of visual correlation between the neighboring output nodes while preserving the data-output node relationships. It was necessary to include a blue component in the 3D color scheme, even though its value was fixed, because each color consists of three components, R, G and B (Fig. 3B).
This color scale was next used to label the output nodes of the 5 x 5 SOM of developmental expression data (Fig. 4A). This produced a coloring scheme in which output nodes containing mRNAs with similar developmental expression profiles have similar colors, e.g., output nodes 4, 5, 9, and 10. We next displayed all the records contained in each output node as a polysomal scatter plot, thereby linking the two datasets (Fig. 4B). The results indicate that significant differences in polysomal behavior exist among the various output nodes. For instance, output nodes 3, 4, and 5 have a large number of records in class 7 compared with other output nodes (Fig. 4B). Interestingly, output nodes 3, 4 and 5 also have similar developmental expression profiles (Fig. 4A). These results provide more compelling evidence for a correlation between developmental expression and dependence on a particular translational initiation factor for efficient initiation.

Fig. 4. Polysomal behavior of mRNAs for ife-4(ok320) vs. N2 in subpopulations defined by developmental expression patterns. A: 5 x 5 SOM of developmental expression patterns was constructed from the data of Ref. 7. Output nodes are labeled according to the 2D color scale shown in Fig. 3B. Centroids are drawn for each SOM output node. B: scatter plots, as in Fig. 2A except with records colored as in Fig. 4A, representing polysomal behavior of mRNAs falling within each of the designated output nodes. No. of records mapping to a particular output node is shown at bottom right.
A polysomal scatter plot of records from all output nodes, each labeled according to the 2D developmental color scheme, is presented (see Fig. 5B). This reveals even more strikingly that mRNAs most strongly affected by IFE-4 loss have similar developmental expression patterns. For instance, the records with a strong magenta component (output nodes 4, 5, 9, and 10) are overrepresented in class 7 and rarely occur in other classes. Using a 1D color scheme yields far less information (Fig. 5A). The depiction in Fig. 5B allows one to quickly discern trends, whereas the depiction in Fig. 4B shows the global polysomal behavior of all mRNAs with a specific developmental pattern.

Fig. 5. Polysomal mRNA distribution in ife-4(ok320) vs. N2 colored according to developmental expression pattern. Scatter plot as in Fig. 2A but with each of the 9,328 mRNAs colored according to its location in the developmental SOM output grid of Fig. 4A. Dashed line shows mRNAs belonging to class 7 (decreased by at least 2-fold in both H and L fractions). A: 1D color scale. B: 2D color scale. (The latter is not the same color scheme used in Fig. 2A.)
IFE-4 expression during C. elegans development.
To further explore the relationship between IFE-4 and translation in specific developmental stages, we determined the expression of IFE-4 itself throughout C. elegans development using green fluorescent protein (GFP; Ref. 1). Expression of IFE-4::GFP was not observed in early embryos, was detected at low levels in late embryos and L1, increased at later larval stages, and then either remained constant in some tissues of adult (pharynx and tail neurons) or decreased (vulval muscle) (Fig. 6). We also quantitated the fluorescence from IFE-4::GFP-expressing worms (Fig. 7). The pattern of IFE-4 protein expression was similar to that seen for mRNAs strongly affected by loss of IFE-4 (Fig. 4, output nodes 3, 4, and 5). Thus the developmental expression profile of mRNAs most affected by IFE-4 loss (class 7) is similar to that of IFE-4 itself.

Fig. 6. Expression of the ife-4::GFP transgene as an extrachromosomal array during C. elegans development. ife-4::GFP(lsEx385) animals were hypochlorite treated to obtain embryos (E) and were grown to the desired developmental stages. Animals from each larval stage (L1, L2, L3, and L4) and adults (A) were imaged with Nomarski microscopy for anatomic observation (top) and with epifluorescence for green fluorescent protein (GFP; bottom).

Fig. 7. Quantification of ife-4::GFP expression during C. elegans development. Fluorescence measurements were made on different tissues expressing ife-4::GFP from E, L1–L4, and A worms, represented in Fig. 6. The maximum fluorescence was registered using IPLab 3.6 software. Mean ± SD maximum fluorescence from all tissues is graphically represented for each developmental stage.
Developmental stage does not solely determine polysomal behavior.
On the basis of the results of the first and second approaches to finding a correlation between polysomal behavior and developmental expression, one might be tempted to conclude that any mRNA with a developmental expression pattern similar to that of IFE-4 (e.g., output node 5) would shift to lighter polysomes upon deletion of the ife-4 gene (e.g., polysome class 7). Such a finding would mean that developmental expression pattern is both a necessary and sufficient criterion for polysomal shift.
To test whether developmental pattern is necessary, we computed the percentage of records belonging to each polysomal class that fall into each SOM output node of developmental expression pattern (Table 1). The results indicate that there is a significant enrichment of mRNAs from class 7 in SOM output nodes 3, 4, and 5 (50% of all records). These SOM output nodes are clustered together and display a characteristic developmental expression pattern in which mRNAs increase after L1, peak at L4, and drop in adults (Fig. 4A). Classes 4 and 8, representing polysomal behaviors similar to class 7 (Fig. 2A), were enriched in the same SOM output nodes as class 7 but not as much. By contrast, mRNAs belonging to class 3, the class most dissimilar to class 7 (Fig. 2A), were enriched in SOM output nodes 21, 22, and 23 (51% of all records), which represent a completely different developmental expression pattern (Fig. 4A). mRNAs in class 5, which represent the overwhelming majority of genes and were not changed in H or L fractions, were uniformly distributed in all SOM output nodes. These results indicate that developmental expression pattern is a necessary criterion for IFE-4-dependent polysomal behavior.
To determine whether developmental expression pattern is a sufficient criterion, we performed the opposite analysis, tabulating the percentage of records belonging to each developmental output node that fall into a particular polysomal class (Table 2). The results indicate that most records found in each SOM output node fall into class 5 (no change). Output nodes 3 and 4 contain the fewest class 5 records because of the enrichment of class 7 records, but class 5 was nonetheless predominant for these output nodes. Thus most mRNAs that are expressed with a developmental pattern similar to that of ife-4 (output nodes 3, 4, and 5) are not affected by loss of ife-4, indicating that developmental pattern is not a sufficient criterion for ife-4-dependent polysomal behavior. Presumably other features, such as sequences in the 5'- or 3'-untranslated regions, presence of specific binding proteins, tissue-specific expression, etc., are needed in addition to a specific developmental expression pattern.
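The two complementary tabulations (percentage of each polysomal class falling into each output node, and percentage of each output node falling into each class) can be sketched as follows. The function name and the toy assignments are hypothetical and are not taken from Tables 1 and 2.

```python
import numpy as np

def crosstab_percent(class_ids, node_ids, n_classes=9, n_nodes=25):
    """Return (percent of each class per node, percent of each node per class)."""
    counts = np.zeros((n_classes, n_nodes))
    for c, k in zip(class_ids, node_ids):
        counts[c - 1, k - 1] += 1
    row_tot = counts.sum(axis=1, keepdims=True)  # records per polysomal class
    col_tot = counts.sum(axis=0, keepdims=True)  # records per output node
    by_class = 100 * np.divide(counts, row_tot,
                               out=np.zeros_like(counts), where=row_tot > 0)
    by_node = 100 * np.divide(counts, col_tot,
                              out=np.zeros_like(counts), where=col_tot > 0)
    return by_class, by_node

# Toy assignment: two class 7 records and four class 5 records.
class_ids = [7, 7, 5, 5, 5, 5]
node_ids = [4, 5, 4, 5, 1, 2]
by_class, by_node = crosstab_percent(class_ids, node_ids)
```

Normalizing by class totals addresses the question of necessity (where do affected mRNAs fall?), whereas normalizing by node totals addresses sufficiency (what fraction of mRNAs with a given developmental pattern are affected?).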
DISCUSSION
Our observation that a small subset of mRNAs is shifted to lighter polysomes when the ife-4(ok320) mutant is compared with the wild-type N2 partially explains the observed pleiotropic phenotype of ife-4(ok320): an egg-laying defect, low brood size, and defective response to food and serotonin signals (2). Some of the genes that govern these phenotypic traits are included in the group affected by ife-4 loss. However, the biochemical or cell biological features common among ife-4-affected mRNAs are not apparent. In particular, these mRNAs encode proteins belonging to diverse functional categories, including transcriptional regulation, signal transduction, growth and development, and metabolism. However, they do not belong to any specific metabolic or signaling pathway. There is no strong preference for any chromosome. Nor are trans-spliced mRNAs particularly underrepresented, which might have been expected, since IFE-4 strongly discriminates against m37,2,2GpppN-containing caps (8, 14). Although many of the mRNAs altered by ife-4 knockout are expressed in muscle or neuron, this is not a sufficient explanation, since the overwhelming majority of mRNAs expressed in these tissues is not affected by ife-4 knockout. Thus a unifying feature for these mRNAs was not apparent from our initial examination.
One hypothesis to explain a restricted set of mRNAs affected in translational efficiency by the loss of IFE-4 is that they share the same expression pattern during development. To uncover any correlation with expression patterns, we linked our IFE-4 translational profiling database to a published database describing the developmental transcriptome in C. elegans. Of the several databases that are available, we chose the database from Jiang et al. (7) because it contained expression data averaged from 56 replicates for each developmental stage and for most of the C. elegans genes. Our study involved 9,328 genes that appeared in both Jiang et al. and our microarray datasets. Although this is not a gigabyte dataset, the traditional computational or visual methods are not appropriate for uncovering patterns, relationships, and interesting properties in these multivariate datasets. One limitation is that SOM-driven clustering does not provide a mechanism for linking results to classical visualizations such as scatter plots. Furthermore, a single static, source-centric visualization such as a scatter plot could not detect complex properties of the data. In the current study, we developed visualizations that provided coupled views of multivariate and complex data (projections from multidimensional to 2D or 3D space) to provide additional insight into the nontrivial relationships among the genes. The 2D color scale that we used to link the SOM and scatter plot was needed, since both SOMs and scatter plots are 2D entities, and since attempts to do this with a 1D color scale proved inadequate (Fig. 5A). The 2D linking mechanism not only permitted identification of signature expression patterns but also allowed us to focus on specific genes and variables. This approach could be extended to other classical visualizations.
This analysis uncovered a complex relationship between two seemingly unrelated biological parameters: expression of an mRNA during development and dependence on a specific initiation factor for translational efficiency. This relationship is made somewhat more comprehensible by our observation that the expression pattern of IFE-4::GFP protein is similar to that of affected mRNAs, increasing from L1 through the adult stage with a peak at L3–L4. These protein levels are consistent with published microarray data giving the developmental expression pattern of ife-4 mRNA (7), taking into account that elevated mRNA levels temporally precede elevated protein levels. This developmental pattern differs from those of ife-1, ife-2, and ife-3 mRNAs (unfortunately, the data for ife-5 mRNA were not reported). However, the expression pattern of IFE-4 does not constitute the entire explanation for eIF4E isoform-specific translation, since most mRNAs with this developmental expression pattern are not affected by ife-4 loss. The visualization approaches we have described here could be used to explore the contributions of other parameters such as tissue specificity of expression, RNA sequence elements, etc., perhaps with results combined in a 3D color scheme.
 |
GRANTS
|
---|
This work was supported by National Institute of General Medical Sciences Grant GM-20818.
 |
ACKNOWLEDGMENTS
|
---|
We acknowledge the staff of the Louisiana State University Health Sciences Center (Shreveport) Research Core Facility for assistance with data analysis and initial calculations relating to microarray data.
Present address of T. D. Dinkova: Departamento de Bioquimica, Facultad de Quimica, Universidad Nacional Autonoma de Mexico, Mexico DF 04510, Mexico.
 |
FOOTNOTES
|
---|
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: R. E. Rhoads, Dept. of Biochemistry and Molecular Biology, Louisiana State Univ. Health Sciences Center, Shreveport, LA 71130-3932 (E-mail: rrhoad{at}lsuhsc.edu).
doi:10.1152/physiolgenomics.00307.2004.
 |
REFERENCES
|
---|
- Chalfie M, Tu Y, Euskirchen G, Ward WW, and Prasher DC. Green fluorescent protein as a marker for gene expression. Science 263: 802–805, 1994.
- Dinkova TD, Keiper BD, Korneeva NL, Aamodt EJ, and Rhoads RE. Translation of a small subset of Caenorhabditis elegans mRNAs is dependent on a specific eukaryotic translation initiation factor 4E isoform. Mol Cell Biol 25: 100–113, 2005.
- Eisen MB, Spellman PT, Brown PO, and Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95: 14863–14868, 1998.
- Gasch AP and Eisen MB. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol 3: 59.5159.22, 2002.
- Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, and Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286: 531–537, 1999.
- Jankowska-Anyszka M, Lamphear BJ, Aamodt EJ, Harrington T, Darzynkiewicz E, Stolarski R, and Rhoads RE. Multiple isoforms of eukaryotic protein synthesis initiation factor 4E in C. elegans can distinguish between mono- and trimethylated mRNA cap structures. J Biol Chem 273: 10538–10542, 1998.
- Jiang M, Ryu J, Kiraly M, Duke K, Reinke V, and Kim SK. Genome-wide analysis of developmental and sex-regulated gene expression profiles in Caenorhabditis elegans. Proc Natl Acad Sci USA 98: 218–223, 2001.
- Keiper BD, Lamphear BJ, Deshpande AM, Jankowska-Anyszka M, Aamodt EJ, Blumenthal T, and Rhoads RE. Functional characterization of five eIF4E isoforms in Caenorhabditis elegans. J Biol Chem 275: 10590–10596, 2000.
- Kohonen T. Self-Organizing Maps. Berlin: Springer-Verlag, 1995.
- Levkowitz H. Color Theory and Modeling for Computer Graphics, Visualization, and Multimedia Applications. Boston, MA: Kluwer Academic, 1997.
- Lodish HF. Model for the regulation of mRNA translation applied to haemoglobin synthesis. Nature 251: 385–388, 1974.
- Roy PJ, Stuart JM, Lund J, and Kim SK. Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature 418: 975–979, 2002.
- Sonenberg N, Hershey JWB, and Mathews MB. Translational Control of Gene Expression. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory, 2000.
- Stachelska A, Wieczorek Z, Ruszczynska K, Stolarski R, Pietrzak M, Lamphear BJ, Rhoads RE, Darzynkiewicz E, and Jankowska-Anyszka M. Interaction of three Caenorhabditis elegans isoforms of translation initiation factor eIF4E with mono- and trimethylated mRNA 5' cap analogues. Acta Biochim Pol 49: 671–682, 2002.
- Sulston J and Horvitz HR. Post-embryonic cell lineages of the nematode Caenorhabditis elegans. Dev Biol 56: 110–156, 1977.
- Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, and Golub TR. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96: 2907–2912, 1999.
- Wang J and Kim SK. Global analysis of dauer gene expression in Caenorhabditis elegans. Development 130: 1621–1634, 2003.
- Zhang Y, Ma C, Delohery T, Nasipak B, Foat BC, Bounoutas A, Bussemaker HJ, Kim SK, and Chalfie M. Identification of genes expressed in C. elegans touch receptor neurons. Nature 418: 331–335, 2002.
- Zong Q, Schummer M, Hood L, and Morris DR. Messenger RNA translation state: the second dimension of high-throughput expression screening. Proc Natl Acad Sci USA 96: 10632–10636, 1999.