Detection of genes with tissue-specific expression patterns using Akaike’s information criterion procedure

Koji Kadota¹, Shin-Ichiro Nishimura¹, Hidemasa Bono², Shugo Nakamura³, Yoshihide Hayashizaki², Yasushi Okazaki² and Katsutoshi Takahashi¹

¹ Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan
² Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama 230-0045, Japan
³ Department of Biotechnology, University of Tokyo, Tokyo 113-8657, Japan


    ABSTRACT
 
We applied a method based on Akaike’s information criterion (AIC) to detect genes whose expression profile in some tissue(s) differs considerably from that in the others. Such observations are detected as outliers, and the method we used was originally developed for outlier detection. The main advantage of the method is that it allows objective decisions, because the procedure does not depend on a significance level. We applied the method to the 48 expression ratios, corresponding to various tissues, of each of 14,610 clones obtained from the RIKEN Expression Array Database (READ; http://read.gsc.riken.go.jp). As a result, for several tissues (e.g., muscle, heart, and tongue, which contain similar cell types) we objectively obtained specific clones without any "thresholding." Our study demonstrates the feasibility of the method for detecting tissue-specific gene expression patterns.

outlier detection; tissue-specific expression; DNA microarray; AIC; expression analysis


    INTRODUCTION
 
ONE OF THE IMPORTANT CHALLENGES of microarray analysis is the identification of genes with tissue-specific expression patterns, which could provide a large number of candidate markers. Gene expression levels have been estimated by counting the frequency of transcripts in unbiased cDNA libraries (20) and semi-quantitatively with high-density oligonucleotide arrays (18). In this research area, several theoretical and computational approaches for the identification of differential gene expression have been proposed (4, 7, 10, 21, 22, 24), and most of these investigations aimed to describe the transcriptional profile that distinguishes one tissue (or two tissues) from the others.

Bortoluzzi et al. (6), who analyzed 4,080 putative muscle genes, found that most genes were present in at least one additional tissue, possibly because all cells have a cytoskeleton, most cells exhibit some contractile properties, and most tissues share certain types of cells.

For the analysis of several tissue-specific expression patterns, a few methods, e.g., analysis of variance (ANOVA) and the so-called template-matching method, can be applied by assigning a confidence estimate such as a P value to the markedly contrasting genes (21). However, those methods are less useful in the following cases. First, in the two-color competitive hybridization assays on cDNA microarrays used in most published studies, the clones printed on the glass slides often come from strongly biased sources. For example, in a mouse 18,816-clone array made from 23 libraries, 3,110 clones were derived from tongue, whereas only 17 clones were from spleen (19). This bias can lead to the mistaken conclusion that a large number of clones are upregulated with high statistical significance when tongue tissue is used as the target, and it may therefore confound the confidence estimation. Furthermore, because hybridization experiments are typically noisy (8, 13), the combined expression matrix of tissues and genes often contains missing data after data processing (15). Second, when several particular tissues have expression levels that differ significantly from those of the other tissues, their levels are not necessarily similar to one another. For example, when an observed expression ratio profile for 48 tissues is (2, 1, 1, 0, ..., 0) and the template profile is (1, 1, 1, 0, ..., 0), the P value between these profiles is 5.2E-23; the P value increases to 1.49E-07 if the observed profile is (10, 1, 1, 0, ..., 0). In general, the more the levels differ among the particular tissues of a markedly contrasting clone, the lower the confidence assigned to that clone. However, such clones are just as important as clones whose particular tissues have similar expression levels.
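The two P values in the example above can be checked with a short calculation. The sketch below assumes that the template-matching P value is the two-sided test of Pearson's correlation r between profile and template, with t = r·sqrt((n - 2)/(1 - r²)) on n - 2 degrees of freedom; this is our reading, not a statement of how reference 21 implements it. The function names are hypothetical, and the regularized incomplete beta function is evaluated with a standard continued fraction so that only the Python standard library is needed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def _beta_cf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def template_match_pvalue(profile, template):
    """Return (r, two-sided P) for the correlation of a profile with a 0/1 template."""
    r = pearson_r(profile, template)
    df = len(profile) - 2
    # With t = r*sqrt(df/(1-r^2)): P(|T| > t) = I_{df/(df+t^2)}(df/2, 1/2) = I_{1-r^2}(df/2, 1/2).
    return r, reg_inc_beta(df / 2.0, 0.5, 1.0 - r * r)

template = [1, 1, 1] + [0] * 45          # 48 tissues, the first three "on"
for head in ([2, 1, 1], [10, 1, 1]):
    r, p = template_match_pvalue(head + [0] * 45, template)
    print(head, f"r={r:.3f}", f"P={p:.2e}")
```

Under these assumptions the sketch gives r of about 0.939 and 0.674 and P values of about 5.2E-23 and 1.49E-07, matching the figures quoted above: raising the first tissue from 2 to 10 lowers the correlation with the template, and therefore the confidence, even though the clone is no less contrasting.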

A large expression data matrix of 49 adult and embryonic mouse tissues and 18,816 mouse cDNAs, and a Web interface to it (called READ, for "RIKEN Expression Array Database"), have recently been constructed (5, 19). READ supports tissue-specific expression searches in which the user enters an arbitrary value for the expression ratios under the "search by tissue form" option. It also provides a search tool, RINGENE, that dynamically calculates expression neighbors (or anti-neighbors) of a hand-selected clone across the expression pattern of specified tissues based on an arbitrary threshold (5); thus it involves "thresholding."

Akaike’s information criterion (AIC), introduced almost 30 years ago by H. Akaike, is an information criterion for identifying an optimal model from a class of competing models (2). Kitagawa (17) subsequently used AIC to detect outliers, and Ueda (25) more recently simplified this approach. The most significant advantages of these methods are that 1) a relatively objective decision can be reached because the procedure does not require the selection of a significance level, and 2) various situations (e.g., single outlier, multiple lowest or highest outliers, two-sided and grouped cases) can be treated equally. We now report the application of the simplified method to the identification of markedly contrasting clones from mouse cDNA microarray data. We demonstrate the validity of this approach by the distribution of the data detected as outliers and by comparison with a conventional method.


    METHODS
 

Microarray data.
The gene expression data we used were obtained from the READ database (http://read.gsc.riken.go.jp) and originally consisted of 49 adult and embryonic mouse tissues and 14,610 clones. We excluded the expression data from the "sv40t" tissue because it was the only tissue not in a normal state. Because the data for each tissue are stored as a ratio to an identical reference [embryonic day 17.5 (E17.5), whole body] and low-quality data are filtered out by the PRIM ("preprocessing implementation for microarray") filtration program (15), expression differences across tissues can be compared (19).

Minimum AIC procedure.
In general, the problem of identifying tissue-specific expression patterns in multisource data can be viewed as an outlier identification problem (10). We applied a procedure based on AIC to detect outliers (17, 25). Unlike other conventional approaches (9, 11), this method has several favorable characteristics for dealing with ratio-type microarray data: 1) determination of the number of outliers and the "test" can be performed simultaneously, 2) various situations (e.g., single outlier, several lowest or highest outliers, two-sided and grouped cases) can be treated equally, and 3) objective decision-making is possible because the procedure does not require the selection of a significance level such as 1% or 5% (17).

According to Ueda (25), a statistic U to identify outliers is defined as

$$U = n \log \sigma + \frac{\sqrt{2}\, s \log(n!)}{n}$$

where (n + s) denotes the total number of observations across the 48 tissues, excluding missing data (n + s ≤ 48); s denotes the number of outlier candidates; and σ denotes the standard deviation of the scores of the n samples that remain after the outlier candidates are excluded.
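Read as code, the statistic is a two-term score. The sketch below evaluates Ueda's simplified statistic, U = n log σ + √2·s·log(n!)/n, for one choice of outlier candidates. It assumes natural logarithms and a standard deviation with denominator n, neither of which the text specifies; the function name and the toy profile are hypothetical, not READ data.

```python
import math
import statistics

def ueda_u(kept, s):
    """U for one combination: `kept` holds the n scores left after removing s candidates."""
    n = len(kept)
    sigma = statistics.pstdev(kept)   # assumption: SD with denominator n
    fit = n * math.log(sigma)         # first term: spread of the remaining scores
    penalty = math.sqrt(2) * s * math.lgamma(n + 1) / n   # second term; lgamma(n+1) = log(n!)
    return fit + penalty

raw = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 5.0]   # hypothetical profile, not READ data
mu, sd = statistics.fmean(raw), statistics.pstdev(raw)
z = sorted((x - mu) / sd for x in raw)               # normalized, sorted scores

u_none = ueda_u(z, 0)        # no outlier candidates
u_top = ueda_u(z[:-1], 1)    # largest score as the only candidate
print(f"{u_none:.3f} {u_top:.3f}")
```

Declaring the single extreme value an outlier lowers U from about 0 to about -13.1, so for this toy profile the one-outlier model is preferred over the no-outlier model. Because σ is scale dependent and n changes with s, the observations are normalized before U values are compared across combinations.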

The statistic U has a clear interpretation in outlier detection. A low value of the first term indicates that the combination of s outlying observations is likely to be bona fide, whereas a high value does not. The second term represents the increased unreliability due to an increased number of parameters (in this case, s). Therefore, a low value of the first term combined with a high value of the second term would indicate that non-outliers are incorrectly predicted as outliers while true outliers are correctly predicted (i.e., high sensitivity but low specificity). The best approximating combination is the one that achieves the lowest value of U and is termed the minimum AIC estimate (MAICE). The procedure for obtaining the MAICE is called the minimum AIC procedure (17).

Detecting tissue-specific expressions as outliers.
The minimum AIC procedure is executed for each clone, using the (n + s) observations of that clone (n + s ≤ 48; fewer when data are missing). Consider, for example, centrin2 (clone ID 1700007M18), which is known to be specifically upregulated in testis (12, 27). We expect the observation (expression ratio) in testis to be identified as an outlier on the high (upregulated) side, since the clone is derived from mouse testis ("17" in the clone ID indicates testis).

The (n + s) observations are normalized by subtracting the mean and dividing by the standard deviation, and then sorted in increasing order (for centrin2, -1.86, -1.08, ..., 1.36, 5.66). Using the resulting values, the MAICE is determined by considering various combinations of outlier candidates taken from both ends of the sorted values. We set the maximum number of outlier candidates to half of the (n + s) observations. Accordingly, the number of combinations considered is X(X + 1)/2, where X = 1 + ⌊(n + s)/2⌋ (i.e., the fractional part of (n + s)/2 is discarded). For example, the MAICE is determined from 25(25 + 1)/2 = 325 combinations for a clone with 48 observations and from 24(24 + 1)/2 = 300 combinations for a clone with 46 or 47 observations. A schematic illustration of this procedure is shown in Fig. 1. For centrin2, the MAICE is the case with two outliers: the observation in testis on the upregulated side and that in muscle on the downregulated side. The result for the upregulated side agrees with earlier reports (12, 27). We applied the procedure to each of the 14,610 clones and constructed a matrix (called the outlier matrix) that stores the outliers detected on the up- and downregulated sides. The program was written in C; computing all 14,610 clones across 48 tissues took about 10 s on a Pentium III 933-MHz machine (1 GB memory) running Red Hat Linux 7.1.
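The whole per-clone procedure can be sketched in a few functions: normalize, sort, enumerate every split of s candidates into the lowest i and highest s - i values with s no larger than half the observations, and keep the split with the smallest U. This is a minimal Python sketch, not the authors' C program. It assumes natural logarithms, a standard deviation with denominator n, missing ratios encoded as None, at least three observations per clone, and that ties in U are resolved in favor of fewer candidates; all function names are hypothetical.

```python
import math
import statistics

def ueda_u(kept, s):
    """U = n*log(sigma) + sqrt(2)*s*log(n!)/n for the n scores left after removing s candidates."""
    n = len(kept)
    sigma = statistics.pstdev(kept)          # assumption: denominator n
    if sigma == 0.0:
        return math.inf                      # degenerate split (e.g. n == 1); never selected
    return n * math.log(sigma) + math.sqrt(2) * s * math.lgamma(n + 1) / n

def candidate_splits(total):
    """Every (low, high) pair of candidate counts with low + high <= total // 2."""
    max_s = total // 2
    return [(low, s - low) for s in range(max_s + 1) for low in range(s + 1)]

def minimum_aic(ratios):
    """Return (low_side, high_side) input positions flagged as outliers by the MAICE."""
    obs = [(i, x) for i, x in enumerate(ratios) if x is not None]   # drop missing data
    total = len(obs)
    values = [x for _, x in obs]
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    if sd == 0.0:
        return [], []                        # flat profile: nothing can stand out
    z = sorted(((x - mu) / sd, i) for i, x in obs)   # normalize, then sort ascending
    best_u, best = math.inf, (0, 0)
    for low, high in candidate_splits(total):
        kept = [v for v, _ in z[low:total - high]]
        u = ueda_u(kept, low + high)
        if u < best_u:                       # strict "<": ties keep the split with fewer candidates
            best_u, best = u, (low, high)
    low, high = best
    return [i for _, i in z[:low]], [i for _, i in z[total - high:]]

print(minimum_aic([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 5.0]))
```

For this toy profile the sketch flags only position 7 on the high side. `len(candidate_splits(48))` is 325 and `len(candidate_splits(47))` is 300, the X(X + 1)/2 counts given above. `minimum_aic` returns positions in the input list, so a tissue keeps its index even when other tissues are missing.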



 
Fig. 1. The minimum AIC procedure. The expression profile of centrin2 (clone ID 1700007M18) serves as the example. First, the expression ratios [depicted as pseudo-color images; red, upregulated in the tissue indicated above compared with the reference tissue (whole E17.5 embryos); green, downregulated] are sorted and normalized (top). Then, the U statistics corresponding to many combinations of outlier candidates on both sides are calculated, and the minimum AIC estimate (MAICE) is determined. Outliers on the high (or low) side are regarded as tissue-specific upregulation (or downregulation) in the corresponding tissue. Some combinations with a high potential to be the MAICE are shown. AIC, Akaike’s information criterion.

 

    RESULTS
 
We studied tissue-specific gene expression patterns in a publicly available gene expression matrix consisting of 48 fetal and adult mouse tissues and 14,610 clones. To identify aberrant observations as outliers, we used a method based on AIC, which has been used for modeling in various fields of statistics, engineering, and numerical analysis (1, 3, 23), and constructed an "outlier matrix" that corresponds to the original expression matrix.

Table 1 shows the number of observations and outliers for each examined tissue. Of 669,214 observations, 16,389 (2.45%) were identified as outliers on the up- or downregulated side. Interestingly, testis had the largest number of outliers and "E16head" the smallest. The number of outliers across the examined tissues did not track the number of clones on the array derived from each tissue (correlation coefficient across tissues, 0.17). We attribute this to the biased collection of cDNA clones, which was designed to reduce the chance of recapturing clones already collected in a cDNA library.


 
Table 1. Numbers of outliers and clones with data in each tissue

 
There were 5,825 clones without aberrant observations on either the up- or downregulated side across the tissues; 4,252 clones had two or more aberrant observations (Table 2). Two clones had the maximum number, 10 aberrant observations. One of these, regenerating islet-derived 1 (reg gene; clone ID 1810029P16), showed marked upregulation in 9 tissues, most of which were digestive organs (liver, small intestine, cecum, spleen, colon, E10 whole body, stomach, N10 intestine, and pancreas), and downregulation in neonatal day 10 (N10) skin. This finding is not surprising, because the human reg gene is expressed in pancreas, gastric mucosa, and kidney and ectopically in colon and rectal tumors (26). The other clone, myelin-associated oligodendrocytic basic protein (MOBP gene; clone ID 1500011O05), a member of the central nervous system (CNS) myelin-constituting proteins (28), showed aberrant observations only on the upregulated side, in muscle, E10 whole body, N0 skin, eyeball, mammary gland (lac_10), cerebellum, cortex, brain, thymus, and E13 head; half of these tissues are CNS related.


 
Table 2. Populations of outlying observations detected in each clone

 
The distribution of the observations (expression ratios) detected as outliers makes it possible to assess the adequacy of the minimum AIC procedure: confidence in our results increases if most outliers have larger absolute values than the non-outliers. Table 3 shows the distribution of the expression ratios detected as outliers. As expected, the percentage of outliers was low for ratios between -1.0 and 1.0. The outlier closest to zero was -0.04689, in thymus (preg1) tissue for clone 2510028M24; it was detected on the upregulated side because most of the observations in that clone were negative. On the other hand, of the 1,711 observations with values >5.0, 1,682 (98.3%) were identified as outliers. One clone containing some of the remaining 29 observations was 2310028E01, which had three values >5.0 (5.05, 6.03, and 8.26 in spleen, stomach, and pancreas, respectively). Of these three, only 8.26 in pancreas was detected as an outlier, because the observations in this clone deviated widely across tissues.
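The check behind Table 3 amounts to binning observations by ratio and reporting, for each bin, the share flagged as outliers. The sketch below assumes half-open intervals (e[k-1], e[k]] and uses hypothetical interval edges and toy data; the function name is illustrative and the actual intervals of Table 3 are not reproduced here.

```python
import bisect

def outlier_rate_by_interval(ratios, is_outlier, edges):
    """Percentage of observations flagged as outliers in each ratio interval.

    `edges` must be sorted. Interval 0 holds ratios <= edges[0], interval k holds
    ratios in (edges[k-1], edges[k]], and the last interval holds ratios > edges[-1].
    """
    totals = [0] * (len(edges) + 1)
    flagged = [0] * (len(edges) + 1)
    for r, out in zip(ratios, is_outlier):
        k = bisect.bisect_left(edges, r)   # index of the interval containing r
        totals[k] += 1
        flagged[k] += int(out)
    return [100.0 * f / t if t else None for f, t in zip(flagged, totals)]

edges = [-1.0, 1.0, 5.0]                       # hypothetical interval edges
ratios = [-2.0, 0.2, 0.5, 3.0, 6.0, 7.0]       # toy data, not READ values
is_outlier = [True, False, False, False, True, True]
print(outlier_rate_by_interval(ratios, is_outlier, edges))
print(round(100 * 1682 / 1711, 1))             # share of ratios > 5.0 flagged, from the text
```

With the counts reported above, the share for ratios >5.0 is 100 × 1,682/1,711, which rounds to 98.3%.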


 
Table 3. Relationship between intervals of ratios and the percentages of outliers

 
The outlier matrix we constructed makes it possible to search for genes with particular tissue-specific expression patterns. Figure 2 shows three examples of muscle-related (heart, tongue, and muscle tissue; Fig. 2A), lung-related (lung, E16lung, and N0lung tissue; Fig. 2B), and brain-related patterns (brain and cerebellum; Fig. 2C). It is noteworthy that obvious distinctions between those tissues and others can be extracted; this is not possible with the conventional search in READ (5). Unlike the conventional ranking of genes by P values, our procedure cannot rank clones individually. Instead, the expression patterns are ranked in order of the number of missing data (represented by the color gray) in each of the examples, as a surrogate estimate of the confidence of tissue specificity.
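Once an outlier matrix exists, a tissue-specific query reduces to a set comparison per clone. The sketch below stores, for each clone, the tissues flagged on the high and low sides plus a count of missing tissues. It returns clones whose upregulated outliers all fall within the selected tissues, groups them by how many of those tissues are flagged, and then orders them by the number of missing data, in the spirit of Fig. 2. The record layout, the subset rule, and the clone names are assumptions for illustration, not the READ data model.

```python
from dataclasses import dataclass

@dataclass
class OutlierRecord:
    up: set          # tissues flagged as high-side (upregulated) outliers
    down: set        # tissues flagged as low-side (downregulated) outliers
    n_missing: int   # tissues with missing or excluded ratios

def tissue_specific(matrix, tissues):
    """Clones whose upregulated outliers form a nonempty subset of `tissues`."""
    hits = []
    for clone_id, rec in matrix.items():
        flagged = rec.up & tissues
        # Require at least one selected tissue flagged and no up outlier elsewhere;
        # low-side outliers are ignored here (an assumption of this sketch).
        if flagged and rec.up <= tissues:
            hits.append((clone_id, len(flagged), rec.n_missing))
    # More flagged selected tissues first, then fewer missing data, then clone ID.
    hits.sort(key=lambda h: (-h[1], h[2], h[0]))
    return hits

muscle_like = {"heart", "tongue", "muscle"}
toy = {  # hypothetical records, not READ clones
    "clone_A": OutlierRecord({"heart", "tongue", "muscle"}, set(), 2),
    "clone_B": OutlierRecord({"heart", "muscle"}, {"liver"}, 0),
    "clone_C": OutlierRecord({"heart", "testis"}, set(), 0),
    "clone_D": OutlierRecord({"heart", "tongue", "muscle"}, set(), 0),
}
for clone_id, n_flagged, n_missing in tissue_specific(toy, muscle_like):
    print(clone_id, n_flagged, n_missing)
```

Here clone_C is excluded because testis is also flagged, and clone_D precedes clone_A because it has fewer missing data; no P value is needed to produce the ordering.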



 
Fig. 2. Pseudo-color images of tissue-specific expression profiles by the minimum AIC procedure. Upregulation in muscle-related (A), lung-related (B), and brain-related (C) tissues is shown as examples. Each row corresponds to a clone whose clone ID is shown at the right; the columns correspond to expression ratios in 48 different tissues. Gene expression ratios are depicted according to the color scale shown at top right. Gray indicates missing or excluded data. Arrows point to values of tissues whose expression profiles are markedly different from the others. The expression profiles are categorized according to the number of values (excluding gray) marked with arrows.

 
Table 4 lists the descriptions of the clones shown in Fig. 2, together with the P values and ranks calculated by a template-matching method (21). A large proportion of the detected clones appear to be consistent with tissue specificities reported in the literature, although their number depends partly on which clones were printed on the array. Examples of the three patterns are titin for muscle-specific clones, advanced glycosylation end product-specific receptor for lung-specific clones, and calbindin for brain-specific clones (6, 14).


 
Table 4. List of clones with tissue-specific expression profiles

 
Figure 3 shows the top-ranked clones obtained with the template-matching method (the numbers correspond to those in Fig. 2), as a representative of conventional methods in which clones are ranked in order of a confidence level defined as a P value. The numbers of significant clones in muscle-, lung-, and brain-related tissues were 223, 75, and 104, respectively (P < 6.84E-05, i.e., 1/14,610 clones). Compared with the results of the template-matching method, we observed a good match among the top-ranked muscle-related clones but not among the lung- and brain-related clones (clones marked with an asterisk in Fig. 3 were also detected by the minimum AIC procedure). For example, among the lung-related clones, the second- and third-ranked clones in the template-matching method were not detected by the minimum AIC procedure: for the second clone, the values in "N0lung" and "lung" (but not "E16lung") were detected as outliers, whereas for the third clone, the value in "N10cerebellum" was detected as an outlier in addition to those in the three lung-related tissues. The disagreement between the two methods for the brain-related clones (upregulated only in "brain" and "cerebellum") arose because the minimum AIC procedure identified extra outliers (especially in "eyeball" and/or "cortex") in all of the top clones detected only by template matching. In the worst case, clone 1500031B22 (the lowest in Fig. 3) matched the brain-related template even though its three highest values were 4.197 in eyeball, 4.106 in brain, and 3.635 in cerebellum. We obtained results similar to those of template matching when we applied ANOVA (see Supplementary Material, which is published at the Physiological Genomics web site).1



 
Fig. 3. Pseudo-color images of tissue-specific expression profiles by the template-matching method. Top-ranked clones with the P values are shown. The numbers of the clones correspond to those identified in Fig. 2. *Clones also detected by the minimum AIC procedure.

 

    DISCUSSION
 
The main aim of this work was to develop an efficient method for the identification of clones with tissue-specific expression patterns from a gene-expression matrix. The method was originally developed by Kitagawa (17) for detecting outliers in observations. We directly applied the simplified version of Ueda (25) to 48 observations corresponding to the tissues of each of 14,610 clones.

The method is based on AIC, an information criterion that has been used for modeling in statistics, engineering, and numerical analysis and, recently, in gene expression analysis (1, 3, 23; and Kadota K, Tominaga D, Akiyama Y, and Takahashi K, unpublished observations). A detailed explanation of the method has been presented elsewhere (17, 25; and Kadota et al., unpublished observations). The most significant advantage of the method is that it allows an objective decision, because it does not require the selection of a significance level such as 1% or 5%.

The current procedure is quite different from previous procedures in which genes were ranked in order of a confidence level defined as a P value. Both the current and conventional strategies have pros and cons. Results obtained with the current procedure do not depend on a confidence level, because the procedure is applied to each clone independently of the other clones on an N-clone array. Therefore, if N increases, the clones detected with the current strategy remain detected. With conventional means, on the other hand, the significance threshold (e.g., 1/N) and hence the set of significant clones change with N, so some clones may disappear from the results.

Interpretation of the number of outliers in each examined tissue (Table 1) is difficult, because it depends strongly on the distribution of the observations (correlation coefficient between the number of outliers and the standard deviation across 48 tissues, 0.82). In contrast, the correlation coefficient between the number of outliers and the number of cDNA clones printed on the array (18 tissues) was relatively low at 0.17, possibly because the cDNA clones were collected from biased libraries. These considerations point to the importance of valid normalization, a topic discussed at the Microarray Gene Expression Data Society Meeting (MGED; http://www.mged.org). The data analyzed here had been normalized with the conventional global median normalization strategy, which treats two expression ratios such as 2,000/1,000 and 20/10 as equally reliable, although the former should be more robust than the latter. Applying a more sophisticated data processing method such as intensity-dependent normalization (29) should yield more reliable results.
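Global median normalization, as we understand it here, centers each array's log ratios on zero. The sketch below assumes the READ ratios are on a log2 scale and that the median is taken per array (per tissue column); both are assumptions, and the function name is hypothetical. The last lines show why a ratio-only method treats 2,000/1,000 and 20/10 alike: both give a log2 ratio of exactly 1. Intensity-dependent normalization (29) additionally needs spot intensities and is not sketched.

```python
import math
import statistics

def global_median_normalize(log_ratios):
    """Shift one array's log ratios so that their median is zero; None marks missing data."""
    present = [r for r in log_ratios if r is not None]
    med = statistics.median(present)
    return [None if r is None else r - med for r in log_ratios]

print(global_median_normalize([1.0, 2.0, None, 4.0]))

# A ratio-only view cannot tell a bright spot from a dim one:
bright = math.log2(2000 / 1000)
dim = math.log2(20 / 10)
print(bright, dim)
```

Both spots end up with the same normalized value, so any difference in their reliability has to come from information the ratio discards, such as the raw intensities.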

The detection of tissue-specific gene expression in the data set yielded important findings. Among clones with two or more outlying observations (Table 2), the outlying tissues tended to be histologically similar or anatomically close. This is reasonable because most such tissues share certain types of cells. Moreover, the strong tendency of outlying observations to fall on the upregulated side (see Table 3) can be explained by the degree of homogeneity of the target samples: the whole E17.5 embryos used as the reference are far more heterogeneous than the experimental target tissues (19). Therefore, we conclude that, overall, we observed target tissue-dominant upregulation (rather than downregulation specific to the whole E17.5 embryo).

Once an outlier matrix corresponding to the original gene expression matrix has been constructed, expression patterns specific to arbitrarily selected tissues, for example lung-specific patterns (see Fig. 2), can be extracted easily. The clustering technique frequently used in microarray analysis might yield a cluster most of whose members show a tissue-specific pattern. However, that procedure requires manual retrieval with an arbitrarily determined threshold, and there is no guarantee that such clusters will form.

The problem of "thresholding" persists in the conventional methods (e.g., template matching and ANOVA). Although we showed template matching here only as an example (see Fig. 3), we also observed that the results of the ANOVA method were similar to those obtained by template matching (see Supplementary Material at the Physiological Genomics web site). For a single template, one can set a threshold P value of <1/(total number of clones). However, the number of possible templates for a 48-tissue array (2^48) is enormous. If we apply the Bonferroni correction to address multiple comparisons, the threshold must be made correspondingly more stringent, which in turn changes the number of clones judged to be tissue specific.
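The scale of the multiple-comparison problem is easy to make concrete. Assuming, as one reading of the paragraph above, that each of the 2^48 binary templates over 48 tissues would be tested at the per-template threshold 1/14,610 and that the Bonferroni correction divides that threshold by the number of templates, the corrected threshold becomes vanishingly small:

```python
n_clones = 14_610
n_tissues = 48

per_template = 1 / n_clones            # threshold for one template, about 6.84e-05
n_templates = 2 ** n_tissues           # every on/off pattern over 48 tissues
bonferroni = per_template / n_templates

print(f"per-template threshold: {per_template:.3g}")
print(f"number of templates:    {n_templates:,}")
print(f"Bonferroni threshold:   {bonferroni:.3g}")
```

The corrected threshold is on the order of 1E-19, far below most of the P values a real clone could reach, which is why the number of clones judged tissue specific depends so strongly on how the correction is applied.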

We observed a remarkable difference between the brain-specific clones (highly expressed only in "brain" and "cerebellum") detected by the minimum AIC procedure and those detected by the template-matching method. The former seemed able to detect such clones, whereas the clones detected by the latter also showed high values in other tissues, especially "cortex" and "eyeball." We explain the unsatisfactory results of the template-matching method as follows. One reason may be that template matching based on the correlation coefficient considers all variants to be essentially equivalent, as already discussed by Pavlidis and Noble (21). Another reason may be the strong similarity among the four tissues (brain, cerebellum, cortex, and eyeball): in hierarchical clustering of the 49 tissues, these four formed a single cluster (15). Accordingly, we conclude that the minimum AIC procedure is particularly well suited to extracting expression patterns specific to arbitrarily selected tissues when similar tissues coexist in the data set.

The advantages of the method we proposed here are 1) the acquired answer is objective and 2) various situations (e.g., single outlier, multiple lowest or highest outliers, two-sided and grouped cases) can be treated equally. As these characteristics mirror those of the method currently in wide use, our method appears to be readily applicable to various expression data.


    ACKNOWLEDGMENTS
 
We thank K. Shimizu for helpful comments. We also thank M. Terauchi and M. Kadota for valuable technical assistance.

This work was supported by a Grant-in-Aid for Scientific Research on Priority Areas (C) "Genome Information Science" from the Ministry of Education, Culture, Sports, Science and Technology of Japan.


    FOOTNOTES
 
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).

Address for reprint requests and other correspondence: K. Takahashi, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-41-6 Aomi, Koto-ku, Tokyo 135-0064 Japan (E-mail: takahashi-k{at}aist.go.jp).

10.1152/physiolgenomics.00153.2002.

1 The Supplementary Material for this article is available online at http://physiolgenomics.physiology.org/cgi/content/full/12/3/251/DC1.


    References
 

  1. Akaike H. Statistical predictor identification. Ann Inst Statist Math 22: 203–217, 1970.
  2. Akaike H. Information theory and an extension of the maximum likelihood principle. In: Int Symp Information Theory 2nd, 1973, p. 267–281.
  3. Akaike H. A Bayesian analysis of the minimum AIC procedure. Ann Inst Statist Math 30: 9–14, 1978.
  4. Audic S and Claverie JM. The significance of digital gene expression profiles. Genome Res 7: 986–995, 1997.
  5. Bono H, Kasukawa T, Hayashizaki Y, and Okazaki Y. READ: RIKEN Expression Array Database. Nucleic Acids Res 30: 211–213, 2002.
  6. Bortoluzzi S, d’Alessi F, Romualdi C, and Danieli GA. The human adult skeletal muscle transcriptional profile reconstructed by a novel computational approach. Genome Res 10: 344–349, 2000.
  7. Bortoluzzi S, d’Alessi F, Romualdi C, and Danieli GA. Differential expression of genes coding for ribosomal proteins in different human tissues. Bioinformatics 17: 1152–1157, 2001.
  8. Claverie JM. Computational methods for the identification of differential and coordinated gene expression. Hum Mol Genet 8: 1821–1832, 1999.
  9. Dixon WJ. Processing data for outliers. Biometrics 9: 74–89, 1953.
  10. Greller LD and Tobin FL. Detecting selective expression of genes and proteins. Genome Res 9: 282–296, 1999.
  11. Grubbs FE. Procedures for detecting outlying observations in samples. Technometrics 11: 1–21, 1969.
  12. Hart PE, Glantz JN, Orth JD, Poynter GM, and Salisbury JL. Testis-specific murine centrin, Cetn1: genomic characterization and evidence for retroposition of a gene encoding a centrosome protein. Genomics 60: 111–120, 1999.
  13. Herwig R, Aanstad P, Clark M, and Lehrach H. Statistical evaluation of differential expression on cDNA nylon arrays with replicated experiments. Nucleic Acids Res 29: e117, 2001.
  14. Hyden H and Ronnback L. S100 on isolated neurons and glial cells from rat, rabbit and guinea pig during early postnatal development. Neurobiology 5: 291–302, 1975.
  15. Kadota K, Miki R, Bono H, Shimizu K, Okazaki Y, and Hayashizaki Y. Preprocessing implementation for microarray (PRIM): an efficient method for processing cDNA microarray data. Physiol Genomics 4: 183–188, 2001.
  17. Kitagawa G. On the use of AIC for the detection of outliers. Technometrics 21: 193–199, 1979.
  18. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, and Brown EL. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14: 1675–1680, 1996.
  19. Miki R, Kadota K, Bono H, Mizuno Y, Tomaru Y, Carninci P, Itoh M, Shibata K, Kawai J, Konno H, Watanabe S, Sato K, Tokusumi Y, Kikuchi N, Ishii Y, Hamaguchi Y, Nishizuka I, Goto H, Nitanda H, Satomi S, Yoshiki A, Kusakabe M, DeRisi JL, Eisen MB, Iyer VR, Brown PO, Muramatsu M, Shimada H, Okazaki Y, and Hayashizaki Y. Delineating developmental and metabolic pathways in vivo by expression profiling using the RIKEN set of 18,816 full-length enriched mouse cDNA arrays. Proc Natl Acad Sci USA 98: 2199–2204, 2001.
  20. Okubo K, Hori N, Matoba R, Niiyama T, Fukushima A, Kojima Y, and Matsubara K. Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression. Nat Genet 2: 173–179, 1992.
  21. Pavlidis P and Noble WS. Analysis of strain and regional variation in gene expression in mouse brain. Genome Biol 2: research0042, 2001.
  22. Romualdi C, Bortoluzzi S, and Danieli GA. Detecting differentially expressed genes in multiple tag sampling experiments: comparative evaluation of statistical tests. Hum Mol Genet 10: 2133–2141, 2001.
  23. Sakamoto Y and Akaike H. Analysis of cross classified data by AIC. Ann Inst Statist Math 30: 185–197, 1978.
  24. Stekel DJ, Git Y, and Falciani F. The comparison of gene expression from multiple cDNA libraries. Genome Res 10: 2055–2061, 2000.
  25. Ueda T. Simple method for the detection of outliers [in Japanese]. Japanese J Appl Stat 25: 17–26, 1996.
  26. Watanabe T, Yonekura H, Terazono K, Yamamoto H, and Okamoto H. Complete nucleotide sequence of human reg gene and its expression in normal and tumoral tissues. The reg protein, pancreatic stone protein, and pancreatic thread protein are one and the same product of the gene. J Biol Chem 265: 7432–7439, 1990.
  27. Wolfrum U and Salisbury JL. Expression of centrin isoforms in the mammalian retina. Exp Cell Res 242: 10–17, 1998.
  28. Yamamoto Y, Mizuno R, Nishimura T, Ogawa Y, Yoshikawa H, Fujimura H, Adachi E, Kishimoto T, Yanagihara T, and Sakoda S. Cloning and expression of myelin-associated oligodendrocytic basic protein. A novel basic protein constituting the central nervous system myelin. J Biol Chem 269: 31725–31730, 1994.
  29. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, and Speed TP. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res 30: e15, 2002.