Discovering New Hormones, Receptors, and Signaling Mediators in the Genomic Era

Sheau Yu Hsu and Aaron J. W. Hsueh

Division of Reproductive Biology, Department of Gynecology and Obstetrics, Stanford University School of Medicine, Stanford, California 94305-5317


    INTRODUCTION
 
The human genome has 3 billion base pairs with 3–5% as coding exons that account for about 100,000–140,000 genes. The Human Genome Project is scheduled to complete the sequencing of the entire genome in the next few years. At the present time, the Expressed Sequence Tags (ESTs) in the GenBank represent >60% of the human genes. Mammalian genes exist in families as reflected by the presence of multiple paralogs. With recent advances in bioinformatic tools, the massive resources in the GenBank are easily accessible to investigators through the Internet and allow opportunities for discoveries that could impact all fields of biomedicine. Using Internet-based tools to search the massive genome databases, it is possible to identify new endocrine genes based on evolutionary conservation, domain homology, and tissue expression patterns. Examples for the discovery of paralogous genes include: LGRs (leucine-rich repeat containing, G protein-coupled receptors) that are evolutionarily conserved as compared with gonadotropin and TSH receptors, and RIFs (relaxin-insulin-like factors) that show domain conservation with known growth factors. Using the GenBank searches, polymorphism of previously characterized genes can also be identified. The rapid pace of genomic revolution allows the discovery of new genes in the endocrine systems, identification of new drug targets, analysis of genetic susceptibility of individuals to endocrine diseases, and customized drug therapy based on gene polymorphism. It is anticipated that the future integration of gene sequence-based approaches with transcript expression profiles and protein functions will lead to a more complete understanding of the circuitry controlling the endocrine pathways.


    GENOMIC REVOLUTION
 
The recent genomic revolution officially started in 1992 with the launching of the Human Genome Project (1). It is anticipated that, within the next few years, sequencing of the entire 3 billion (3 × 10^9) base pairs of the human genome will be completed. At the present time, the complete genomic sequences of more than two dozen organisms, including the yeast Saccharomyces cerevisiae and the nematode Caenorhabditis elegans, are already available. This progress has been compared with the completion of the Periodic Table of the Elements at the end of the 19th century (2) and became possible through the largely automated sequencing of libraries of genomic DNA and expressed sequence tags (ESTs) that represent fragments of sequenced cDNAs from diverse tissues. At the beginning of April 2000, the GenBank contained more than 2.4 billion bases of human genome sequences and the complete sequencing of the entire human genome has just been announced (for an update of deposited sequences see http://www.ornl.gov/hgmis/project/progress.html). In addition to ESTs, these sequences are represented by single-pass genome survey sequences (GSS) encompassing the ends of large genomic fragments, unfinished and unordered high-throughput genomic sequences (HTGS), and finished nonredundant (NR) sequences. Among the various GenBank sequences, the EST database is particularly useful for identifying novel genes (3, 4). These short sequences represent transcribed genes and contain continuous protein coding regions. The 1.8 million human ESTs presently in the GenBank have been clustered into approximately 92,224 unique transcripts (UniGene Build 107, Jan. 27, 2000) based on their overlapping sequences, therefore accounting for more than 60% of human genes (http://www.ncbi.nlm.nih.gov/UniGene/Hs.stats.shtml). It is clear that the EST sampling approach has identified a majority of transcripts encoded by our genome. 
Indeed, more than 90% of known or nonredundant genes of human or mouse origin are represented by at least one EST (1). Thus, the EST and other gene sequence databases have become indispensable for gene discovery.


Table 1. Evolving Methods for Discovering Endocrine Genes

 

    TRADITIONAL VS. GENBANK APPROACHES FOR THE DISCOVERY OF ENDOCRINE GENES
 
Using function- and phenotype-driven approaches, traditional studies of endocrine signaling molecules are based upon purification of hormones and receptors in specific glands and targets, respectively. Intracellular signaling molecules were also isolated through protein purification and, more recently, through protein-protein interaction assays such as coprecipitation of proteins (5) (Table 1). After the advent of recombinant DNA technology, the discovery of endocrine signaling molecules has been expanded to include searches based on 1) sequence similarity to known genes using low-stringency hybridization, 2) PCR with degenerate primers, and 3) differential cloning (subtraction and differential display) (6, 7). The protein-protein interaction searches were also expanded to include expression cloning based on ligand or antibody binding (8, 9, 10) and the yeast two-hybrid interaction assay (11, 12, 13). In addition, some endocrine genes (e.g. MEN1) were isolated based on genetic analysis in diseased individuals and positional cloning after chromosome walking (14). Although the recombinant DNA approach has allowed the successful discovery of new hormones, receptors, and intracellular mediators, the genomic revolution exemplified by the rapid GenBank expansion has prompted another paradigm shift in the identification of novel endocrine factors with the ultimate goal of finding all endocrine regulatory genes. With major advances in bioinformatic tools available on the Internet (Table 2), the massive resources in the GenBank are easily accessible to investigators and allow opportunities for major discoveries that could impact endocrinology as well as all fields of biomedicine. The new approach takes advantage of the unprecedented power of modern computing tools and minimizes the use of time-consuming, bench-based wet science. 
Initial discoveries of novel genes can be performed in minutes, followed by further verification and characterization using the bench-top approaches. In addition, the tedious stringency control procedures in the traditional cloning methods based on sequence matching (e.g. degenerate PCR and low-stringency hybridization) are now replaced by changing the expectation or threshold values during computational searches. The economy of scale for this approach is obvious: computational searches take only minutes and can encompass sequences derived from diverse organisms and tissue or cell origins. A comparison of the computational search vs. PCR amplification for novel gene discovery is shown in Table 3.
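
As a minimal sketch of how an expectation threshold can play the role of hybridization stringency, the Python example below filters a list of search hits by E-value. It assumes the hits are available as tab-separated text with the 12 default columns of BLAST+ tabular output (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); the sequence names and numbers in `EXAMPLE_HITS` are hypothetical and serve only as illustration.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    evalue: float


def parse_tabular_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated hits (assumed: 12 default BLAST+ tabular columns)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # column 3 = percent identity, column 11 = E-value (1-based)
        hits.append(Hit(fields[0], fields[1], float(fields[2]), float(fields[10])))
    return hits


def filter_by_evalue(hits: List[Hit], max_evalue: float) -> List[Hit]:
    """Keep hits at or below the E-value cutoff; a lower cutoff is more stringent."""
    return [h for h in hits if h.evalue <= max_evalue]


# Hypothetical hits, for illustration only.
EXAMPLE_HITS = [
    "queryLGR\thypotheticalEST1\t62.5\t320\t120\t0\t1\t320\t5\t324\t2e-45\t180",
    "queryLGR\thypotheticalEST2\t31.0\t150\t100\t3\t10\t160\t1\t150\t1e-5\t48",
    "queryLGR\thypotheticalEST3\t24.0\t60\t45\t1\t40\t100\t7\t66\t0.5\t29",
]

hits = parse_tabular_hits(EXAMPLE_HITS)
strict = filter_by_evalue(hits, 1e-20)   # "high stringency": close relatives only
relaxed = filter_by_evalue(hits, 1e-3)   # "low stringency": also remote candidates
print([h.subject for h in strict])
print([h.subject for h in relaxed])
```

Relaxing the cutoff corresponds loosely to lowering the hybridization stringency at the bench: more remote candidates are retrieved, at the cost of more false positives that require verification.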


Table 2. Bioinformatic Tools and Useful Web Links for New Gene Discovery

 

Table 3. Sequence Alignment Search vs. PCR Identification of Novel Genes

 

    GENBANK SEARCHES FOR NOVEL PARALOGOUS ENDOCRINE GENES
 
Although previously predicted based on traditional approaches, the concept that mammalian genes are present in families is strengthened by the recent comparison of genome sequences in diverse organisms (15). In addition to the identification of orthologous human genes with conserved structure and function in lower organisms (16, 17), multiple paralogous genes evolve from the same ancestral gene during the evolution of simple organisms into complex ones. Genes evolved through domain rearrangement and gene fusion. New genes can change their molecular specificity but remain in a similar regulatory circuitry or may be recruited to serve completely new functions in a novel regulatory linkage. In addition, new domains can evolve through domain combination and shuffling. Furthermore, duplication of a chromosomal fragment or even the whole genome provides opportunities for the evolution of novel genes (18, 19, 20). By comparing genomes from different phylogenies, it has been hypothesized that during early chordate evolution, before the Cambrian explosion and during the early Devonian period, two entire genome duplications took place (21, 22), which led to the generation of multiple mammalian paralogs and the opportunity to develop complex regulatory mechanisms and functions.

To allow efficient comparison of DNA or protein sequences, a variety of paired sequence comparison programs using different scoring matrices have been developed for aligning individual sequences with a catalogued sequence database. The Basic Local Alignment Search Tool (BLAST) used by the National Center for Biotechnology Information (NCBI) and other related programs have become the essential tools for gene sequence analysis and the deduction of gene functions (23, 24, 25) (Table 2). To facilitate the identification of genes that are related but with limited primary sequence homology, a variety of sequence-based computation tools and databases such as eMOTIF and BLOCKS (Table 2) have been developed to further optimize the discovery of sequence relatedness among genes and for the development of phylogenetic and evolutionary models. Moreover, a search of gene relatedness based on the structure of translated products has also been explored (SCOP) to reveal potential functional relationships. To facilitate analysis of intron-exon arrangements of the unorganized HTGS and GSS genomic sequences, gene recognition algorithms have also been developed (Genie, GRAIL, etc.).

Among the large number of ESTs and genomic entries in the GenBank, sequences that are not identical to known ones but share similarity either in the entire structure or in selected domains are the most obvious candidates for further study. In general, the relatedness is obvious for an unknown gene that shares >40% identity with a known gene in its primary sequence. While proteins with this level of identity likely represent paralogs and could have similar functions, most novel sequences require further validation of their putative physiological roles.
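
The identity criterion can be made concrete with a short Python sketch. It assumes the two sequences have already been aligned (gaps written as `-`) and computes identity over columns where both sequences have a residue; other definitions of the denominator exist and give somewhat different values, so the 40% cutoff should be read as a rule of thumb. The function names and the toy sequences are illustrative assumptions, not part of any published tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def is_obvious_relative(aligned_a: str, aligned_b: str, cutoff: float = 40.0) -> bool:
    """Apply the >40% identity rule of thumb described in the text."""
    return percent_identity(aligned_a, aligned_b) > cutoff


# Toy aligned peptides (hypothetical): 5 identical residues in 6 ungapped columns.
identity = percent_identity("MKT-LLVA", "MRTALL-A")
print(round(identity, 1), is_obvious_relative("MKT-LLVA", "MRTALL-A"))
```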


    GENES WITH CLOSE EVOLUTIONARY ORIGIN
 
Taking advantage of the evolutionary conservation of the primary sequence of genes that diverge during evolution, comparison of novel genes with known family proteins can lead to identification of paralogous genes. The classical example of a gene search using this approach is the identification of multiple orphan receptors more than a decade ago, which was based mainly on low-stringency hybridization cloning. The term orphan receptor was coined to describe gene products that belong to the nuclear receptor superfamily on the basis of sequence identity but without known ligands. Orphan receptors have been identified in most metazoans and include about 40 human genes (26, 27). The cloning of the orphan receptors ushered in the exciting new concept of reverse endocrinology. The reversed endocrine discovery process led to the identification of the vitamin A metabolite 9-cis retinoic acid as a ligand for the retinoid X receptor (RXR) family of orphan receptors (28, 29). In addition, various hormones belonging to the transforming growth factor-β (TGF-β) superfamily, including different growth differentiation factors (GDFs) and bone morphogenetic proteins (BMPs), have been identified based on degenerate PCR amplification and low-stringency screening (30, 31). Likewise, a variety of tyrosine kinase receptors (32, 33, 34), serine/threonine kinase receptors (type I and II receptors for TGF-β superfamily proteins) (8, 35, 36, 37), and chemokines (38) were isolated using the same approach. More recently, up to 80 orphan G protein-coupled receptors have been identified largely through the cloning of genes sharing the unique seven-transmembrane domains by using PCR with degenerate primers (39, 40).

Using the computational approach, additional G protein-coupled receptors have been identified in the GenBank based on sequence conservation in the transmembrane region (40, 41). Recently, we have identified four novel leucine-rich repeat-containing, G protein-coupled receptors (LGRs) sharing significant homology with known glycoprotein receptors from vertebrates using the sequences of primitive homologs isolated from Drosophila and the snail Lymnaea stagnalis (42, 43) as queries for the GenBank search. Using the same evolutionary conservation approach, a number of genes homologous to mammalian glycoprotein hormone receptors were identified from nematode (44) and Drosophila (S. Y. Hsu, S. Nishi, and A. J. W. Hsueh, unpublished data). Of interest, the nematode LGR protein expressed in mammalian cells showed constitutive activity, resembling the point mutations found in the LH receptor gene in patients with familial male-limited precocious puberty (45, 46) and in the TSH receptor gene in patients with nonautoimmune hyperthyroidism (47). The finding of these novel leucine-rich repeat-containing, G protein-coupled receptors in diverse phylogenies has allowed 1) the determination of the consensus sequences and constituent motifs in these receptors, 2) the identification of additional LGRs, 3) the construction of a gene family tree and evolutionary models for these receptors, and 4) the improvement of the modeling of LGR protein structures (48). Although these novel receptors share an identical domain arrangement with glycoprotein hormone receptors, diversification in their primary sequences predicts their binding to novel ligands, thus providing opportunities for reverse endocrinology. Similar bioinformatic searches have resulted in the identification of multiple cytokines (http://cytokine.medic.kumamoto-u.ac.jp/CFC/CK/Chemo-kine.html) (49, 50, 51, 52) and a large group of human Toll-like receptors (53, 54, 55). 
Many novel cytokines identified by this approach have been shown to function as chemotaxis regulators and are important for inflammatory responses. Likewise, Toll-like receptors are essential for immune surveillance. The rapid virtual gene identification process facilitates the elucidation of the physiological functions of these paralogous genes.


    GENES WITH LIMITED SIMILARITY IN SELECTIVE MOTIFS
 
While paralogous genes sharing an overall similarity can be identified with ease, discovery of related genes with limited similarity in selective motifs requires an in-depth understanding of the structure-function relationship of the common motif. Traditionally, low-stringency hybridization and degenerate PCR based on shared sequence motifs are the main approaches for identifying this group of paralogous genes. However, this approach becomes inefficient and impractical when the consensus motif is short and the similarity among homologous genes is low. In contrast, computational approaches allow the scanning of similarity along the entire length of candidate genes, based on conservation in stretches of primary sequences or in the presumptive secondary structure (Table 3). The confidence in sequence matches among remote homologs can gradually be optimized by focusing on comparison in regions corresponding to active sites or functional motifs that are most likely to be shared among divergent members. Thus, the computational approach is especially fruitful when the homology among paralogous genes is low but the functional motif(s) of known family members are well characterized. For example, proteins in the insulin superfamily were grouped based on the conserved tertiary structure of their mature proteins. These proteins include insulin, insulin-like growth factors I and II (IGF-I, IGF-II), relaxin, Leydig cell relaxin, and EPIL (early placenta insulin-like factor) in vertebrates as well as molluscan insulin-like peptides and bombyxins in invertebrates (56). Six cysteine residues important for disulfide bond formation are completely conserved in these otherwise remotely related polypeptides. Using the conserved domain sequences as query, we have isolated two novel relaxin/insulin-like factors (RIFs) sharing close homology with relaxin in the putative mature portion of these polypeptides (56). 
Although RIFs share low overall sequence similarity (<30%) with proteins in the insulin/relaxin superfamily, the disulfide bond-forming cysteine residues are completely conserved. Indeed, it is difficult to identify these molecules using traditional sequence-matching methods. Among genes involved in apoptosis, the death effector domain represents a stretch of about 80 amino acids that is shared by adapters, regulators, and executors of the tumor necrosis factor-α (TNF-α)/Fas pathway. Based on the conservation of this domain, GenBank searches have also allowed us to isolate an intracellular apoptosis mediator DEFT (death effector in testis) that is abundantly expressed in the testis (57).
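
Motif-centered searching of this kind can be sketched in a few lines of Python. The example scans protein sequences for a fixed spacing of cysteine residues using a regular expression. The pattern `CC.{3}C.{8}C` is an illustrative assumption modeled on the cysteine spacing of the human insulin A chain; it is not the query used in the work cited above, and a realistic search would allow variable spacing and also cover the B chain cysteines.

```python
import re
from typing import List, Pattern, Tuple

# Illustrative pattern (assumption): two adjacent cysteines, three residues,
# a cysteine, eight residues, and a final cysteine.
A_CHAIN_LIKE: Pattern[str] = re.compile(r"CC.{3}C.{8}C")


def find_motif(sequence: str, pattern: Pattern[str]) -> List[Tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


# Test sequences: the human insulin A chain as commonly cited, and a variant
# in which one cysteine of the CC pair is replaced by serine.
insulin_a_chain = "GIVEQCCTSICSLYQLENYCN"
disrupted_variant = "GIVEQSCTSICSLYQLENYCN"

print(find_motif(insulin_a_chain, A_CHAIN_LIKE))
print(find_motif(disrupted_variant, A_CHAIN_LIKE))
```

Because the pattern constrains only the cysteine positions, a sequence can match even when every other residue differs, which is the situation described for remote family members with low overall similarity.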


    GENBANK SEARCHES FOR POLYMORPHISMS OF KNOWN GENES
 
In addition to the discovery of novel genes, the computational approach also allows the unique opportunity to identify polymorphisms of known genes encoding proteins with different functional characteristics. Single nucleotide polymorphisms (SNPs) represent single nucleotide changes in the same gene among individuals (58). These minor alterations occur with varying frequencies in different ethnic populations and are the central focus of pharmacogenomic studies. It has been estimated that there are 300,000 potential SNPs in the human genome (59, 60). Although SNPs were traditionally characterized by single-strand conformational polymorphism (SSCP) analysis, the accumulation of multiple sequences of a given gene in the GenBank allows future polymorphism discovery when sufficient sequences of the same gene are aligned. The importance of endocrine gene polymorphism is illustrated by polymorphisms of cytochrome P450 genes associated with abolished or enhanced drug metabolism in patients (61, 62) and a chemokine receptor CCR-5 polymorphism associated with human immunodeficiency virus (HIV) resistance (63). Other examples include polymorphisms in β-adrenergic receptors and sensitivity to β-agonists in asthmatics (64), angiotensin-converting enzymes (ACE) and sensitivity to ACE inhibitors (65, 66), angiotensin II type 1 receptors and vascular reactivity to phenylephrine (67), and hydroxytryptamine receptors and responsiveness to neuroleptics such as clozapine (68). The identification of these polymorphisms not only contributed to the study of pharmacology and epidemiology but also allowed the elucidation of functional mechanisms of some of the receptors. For hormone ligands, the human LHβ gene has two linked SNPs (Trp8Arg and Ile15Thr) that are known to encode proteins exhibiting altered immunoassay properties (69) and show distinct carrier frequencies among different racial groups. 
Furthermore, polymorphisms in the promoter of the chemokine RANTES gene are associated with changes in the secretion of this ligand for CCR-5, thus altering the chance of HIV infection (70). In the future, analysis of gene polymorphism using the informatic approach could allow one to identify the genetic susceptibility of individuals to some endocrine diseases and to customize hormonal treatment protocols for individual patients.
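
The idea of finding polymorphisms by aligning multiple database entries of the same gene can be sketched as follows. The Python example assumes the sequences are already aligned to equal length and reports columns where at least two different bases occur, requiring the second most common base to appear a minimum number of times so that isolated single-pass sequencing errors are less likely to be reported. The function name, the `min_minor_count` parameter, and the toy reads are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def candidate_snps(aligned: List[str], min_minor_count: int = 2) -> Dict[int, Dict[str, int]]:
    """Map 0-based alignment column -> base counts for candidate polymorphic sites."""
    if not aligned:
        return {}
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    sites: Dict[int, Dict[str, int]] = {}
    for pos in range(length):
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(seq[pos].upper() for seq in aligned if seq[pos].upper() in "ACGT")
        if len(counts) < 2:
            continue  # column is invariant
        second_most_common = counts.most_common(2)[1][1]
        if second_most_common >= min_minor_count:
            sites[pos] = dict(counts)
    return sites


# Toy aligned entries for one gene (hypothetical).
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",  # T->A at column 3, seen twice
    "ACGAACGT",
    "ACGTACCT",  # G->C at column 6, seen once (possible sequencing error)
]
print(candidate_snps(reads))
print(sorted(candidate_snps(reads, min_minor_count=1)))
```

Any site reported this way remains a candidate only; as discussed in the next section, sequencing errors and recently duplicated genes can mimic polymorphisms.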


    LIMITATIONS IN THE MINING OF GENE SEQUENCE DATABASES
 
Although the existing computational algorithms for gene sequence analysis are powerful, there are limitations. Sequences obtained through high-throughput DNA analysis are generated by automated single-pass analysis of DNA. As a result, sequences may contain mistakes, deletions, or insertions. In addition, the annotation of many sequence entries may have mistakes. Thus, despite the unprecedented opportunity to access the gigantic databases, conflicting information derived from such archives requires careful scrutiny to weed out false positives including sequencing errors, annotation mistakes, pseudogenes, multiple copies of recently duplicated and highly homologous genes (e.g. hCG-β), and a combination of these common artifacts. These pitfalls are particularly relevant when one is studying families of proteins with close sequence homology or is focusing on detailed features (such as polymorphism) of sequences.

The relatedness of diverse genes could theoretically be decided based on pairwise sequence comparison and motif search programs. Although sequences that share less than 15–20% identity could have sequence-function relatedness, most genes with this level of identity are likely to perform distinct functions. In addition, the majority of sequenced genes in the GenBank either have limited sequence similarity to known genes or have no sequence references. It is becoming clear that sequence-independent approaches are required to augment the powerful sequence-based gene discovery.


    INTEGRATION OF GENOMIC INFORMATION AND KNOWN LITERATURE ON GENE FUNCTION
 
With the anticipated sequencing of the human genome, future gene discovery will no longer involve the physical cloning of new genes but will instead include consolidation and integration of existing sequences and the identification of gene functions. To gain insight into the potential roles of individual genes in our genome, several new approaches have been developed to facilitate the organization and dissemination of data from GenBank and the existing literature, as well as to streamline the data extraction process for identifying gene functions.


    SEQUENCE-BASED CLUSTERING OF INDIVIDUAL DNA SEQUENCES
 
Multiple sequences derived from the same gene were submitted to the GenBank by different investigators, and a given gene may have different entries (e.g. contiguous and noncontiguous genomic sequences, mRNA sequences with alternative splicing, and ESTs). Because it is not yet routine to identify all genomic regions that are transcribed, the integration of EST sequences and genomic sequences is necessary to identify all coding regions. To consolidate all sequences derived from the same gene, a number of databases such as UniGene, STACK, and TIGR have been set up (Table 2) (71). The UniGene collection has grouped more than 1.8 million ESTs of human origin into clusters of unique sequences, each representing the transcription product of a distinct human gene. This collection provides nonredundant mapping candidates for generating gene expression profiles in diverse tissues. Moreover, the organization of gene sequences in this format has allowed investigators to perform virtual library subtraction experiments in a tissue- or cell-specific manner across the whole body through simple query (72).
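
The principle of grouping overlapping ESTs into one cluster per gene can be illustrated with a toy Python sketch. It joins any two sequences that share at least one exact substring of length `k` and merges the resulting links with a union-find structure. This is a deliberate simplification and an assumption for illustration, not the procedure used by UniGene, STACK, or TIGR: real pipelines use alignment-based overlap detection, handle reverse complements, and screen out repeats and contaminants. The EST names and sequences are hypothetical.

```python
from typing import Dict, List


def cluster_by_shared_kmers(seqs: Dict[str, str], k: int = 8) -> List[List[str]]:
    """Group sequence names so that sequences sharing an exact k-mer end up together."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Find the cluster representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner: Dict[str, str] = {}  # k-mer -> first sequence in which it was seen
    for name, seq in seqs.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_owner:
                union(first_owner[kmer], name)
            else:
                first_owner[kmer] = name

    clusters: Dict[str, List[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical ESTs: est1 and est2 overlap by 12 bases; est3 is unrelated.
ests = {
    "est1": "ATGGCCATTGTAATGGGCCGC",
    "est2": "GTAATGGGCCGCTGAAAGGGT",
    "est3": "TTTTCCCCAAAAGGGGTTTTCC",
}
print(cluster_by_shared_kmers(ests))
```

Because links are transitive, two ESTs that do not overlap each other directly still fall into the same cluster when a third sequence bridges them, which is how short fragments can be consolidated into a single transcript cluster.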

Due to the recent rapid expansion of HTG sequences, it is anticipated that the majority of DNA sequences in the GenBank will be in the form of unfinished genomic sequences in the near future. Because these sequences are derived from single or few reads of DNA clones, they are unordered and unoriented and contain gaps and errors. To analyze this type of sequence, sophisticated algorithms that integrate the analysis of different gene features have been developed. For example, the Genotator and NIX packages provide workbenches for automated sequence annotation and annotation browsing (Table 2). The programs integrate multiple gene-finding tools, homology searches, and algorithms for identifying promoters, splice sites, and open reading frames. The results are presented using a graphic browser to facilitate gene analysis. The tradeoff for such convenience is the lack of options to alter the stringency of analysis. Moreover, even the best tools rarely perform at more than 75% accuracy. Indeed, recent studies indicated that, in addition to the 6,200 predicted genes, 300 novel genes can be identified in yeast using genome-wide transposon disruption; these genes were not recognized by the most sophisticated algorithm tools (73).


    INTEGRATION OF GENE SEQUENCES WITH TEXT-BASED LITERATURE ON GENE FUNCTION
 
With the explosion of gene sequences, one of the major challenges facing genomic researchers today is the integration of sequence data with the vast and growing body of literature based on functional analyses of genes. The implementation of a user-friendly interface for easy query of diverse databases is necessary for both novice and experienced investigators who seek to sort out the expanding knowledge base of gene sequences and functions. The Online Mendelian Inheritance in Man (OMIM) database (http://www.ncbi.nlm.nih.gov/Omim/) provides a searchable web-based dynamic database of human genes and genetic disorders. Each gene entry contains textual information and hypertext links to the GenBank database and PubMed, allowing quick reference checks for most known genes. In addition, multiple online databases for specific gene groups or tissues have been developed. For example, several databases (GPCRDB, GCRDb, GRAP, ORDB, CORD, and Swiss Model 7TM Interface, accessible through the portal site http://www.opioid.umn.edu/links.html) organize information on the G protein-coupled receptors into searchable databases, allowing users to browse and analyze the molecular and physiological data of this largest protein family in eukaryotic organisms (74). Likewise, the Nuclear Receptor Resource (NRR) Project (http://nrr.georgetown.edu/NRR/NRR.html) is a collection of individual databases on members of the steroid and thyroid hormone receptor superfamily. In addition, we have developed a database on ovarian genes. The Ovarian Kaleidoscope Database (http://ovary.stanford.edu/) provides information regarding the biological function, expression pattern, and regulation of genes expressed in the ovary. It also serves as a gateway to other online information resources offering data about nucleotide and amino acid sequences, chromosomal localization, human and murine mutation phenotypes, and biomedical publications relevant to ovarian research. 
For all these databases, gene sequence and functional information are interlinked. Continuing development of databases and portal sites focusing on specific biomedical areas would allow easy access to updated and organized discipline-specific information and perhaps make the analysis of gene sequences part of routine literature searches.


    FUTURE IDENTIFICATION OF GENE FUNCTIONS
 
Among the human genes, fewer than 8% are currently considered to be known genes with different levels of functional characterization (75). The inadequacy of a sequence alignment-based approach for determining the functions of most sequenced genes has prompted the development of a variety of experimental procedures to utilize the genomic information more efficiently and to provide alternative ways of annotating novel genes. With the anticipated availability of the entire human genome sequence, the traditional positional cloning approach will be replaced by virtual chromosomal walking along the ordered genome sequence. Furthermore, the present single gene knockout approach for gene function identification will be replaced by random genome-wide gene deletion (http://www.lexgen.com/k) or targeted gene trapping (76, 77). The secretory trap method, which captures the N-terminal signal peptide sequence of an endogenous gene to generate an active reporter fusion protein, has allowed the generation of mice with deletions of genes encoding membrane and secreted proteins (77, 78, 79, 80). Among the several hundred mutant strains generated (http://socrates.berkeley.edu/~skarnes/resource.html), many are defective in ligand or receptor genes important for endocrine research. Integration of this collection of mutant mice with the expanding GenBank database will continue to allow characterization of new hormones and receptors.


    DNA ARRAY FOR EXPRESSION PROFILING AND POLYMORPHISM SEARCH
 
The development of high-density DNA microarrays has allowed the analysis of the expression of thousands of genes simultaneously (81). By combining this technology with the vast EST collection, investigators can now quickly group genes showing similar expression profiles. The DNA array method has allowed the genotyping of physiological processes and the understanding of regulatory mechanisms in a global manner without detailed functional analysis of individual genes. For example, one can detect changes in gene expression based on hybridization to cDNAs from different physiological states (before and after endocrine ablation), disease progression stages (normal vs. diseased), and treatment periods (pre- vs. posttreatment) (82). In addition, the sequencing-by-hybridization approach, based on array analysis of overlapping oligomers corresponding to the entire sequence of a known gene, allows detection of mutations or polymorphisms of individual genes (83).
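The grouping of genes with similar expression profiles can be illustrated with a minimal sketch. It assumes that each gene is summarized by a short list of expression values across the same set of conditions, and it uses Pearson correlation with a simple greedy grouping rule. The gene names, the values, and the 0.9 threshold are hypothetical, and published array analyses generally rely on more elaborate clustering methods.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # flat profile: correlation is undefined, treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def group_by_coexpression(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first group whose seed gene
    correlates with it at or above the threshold, else it seeds a new group."""
    groups = []  # list of (seed_gene, members)
    for gene, values in profiles.items():
        for seed, members in groups:
            if pearson(profiles[seed], values) >= threshold:
                members.append(gene)
                break
        else:
            groups.append((gene, [gene]))
    return [members for _, members in groups]


# Hypothetical expression values across four conditions (illustration only)
profiles = {
    "geneA": [0.1, 1.2, 2.3, 3.1],
    "geneB": [0.0, 1.0, 2.1, 3.0],
    "geneC": [3.0, 2.0, 1.1, 0.2],
    "geneD": [2.8, 2.1, 0.9, 0.0],
}
groups = group_by_coexpression(profiles)
print(groups)
```

In this toy data set the two rising profiles fall into one group and the two falling profiles into another. The greedy rule depends on the order of the input, which is acceptable for illustration but is one reason hierarchical or other clustering methods are usually preferred.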


    FACILITATION OF PROTEIN DISCOVERY USING GENBANK SEARCH
 
The classical protein purification approach for identifying endocrine proteins will be accelerated by matching EST databases with peptide sequences derived from two-dimensional gel electrophoresis analysis followed by peptide mass fingerprinting with MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass spectrometry. Modern mass spectrometers can provide sufficient information to allow unique recognition of protein fragments, as well as detection of secondary modifications such as phosphorylation and glycosylation. Alternatively, it may be possible to devise chip detectors for ligands, receptors, and intracellular signaling mediators. In addition, integration of a genome database with protein coprecipitation and global yeast two-hybrid interaction approaches should allow the assembly of comprehensive interaction maps of genomes (84, 85) and the routine identification of downstream components of cell surface receptors in mammalian cells (86).
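The matching of measured peptide masses against sequence databases can be sketched as follows. This is a simplified illustration, not the procedure used by any particular search engine: it assumes complete tryptic cleavage (after K or R, except before P), compares neutral monoisotopic masses rather than the protonated ions actually observed, and ignores modifications. The candidate names and sequences stand in for translated EST entries and are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N and C termini


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def score_candidates(observed_masses, database, tolerance=0.5):
    """For each candidate, count the observed masses that lie within
    the tolerance of at least one predicted tryptic peptide mass."""
    scores = {}
    for name, sequence in database.items():
        predicted = [peptide_mass(p) for p in tryptic_digest(sequence)]
        scores[name] = sum(
            any(abs(obs - mass) <= tolerance for mass in predicted)
            for obs in observed_masses
        )
    return scores


# Hypothetical translated EST sequences and simulated measurements
database = {"est_1": "AGKPRSTK", "est_2": "GGGKAAAR"}
observed = [peptide_mass("AGKPR"), peptide_mass("STK")]
scores = score_candidates(observed, database)
print(scores)
```

The candidate with the most matched masses is the most likely source of the gel spot. A practical search would also need to allow for missed cleavages, modified residues, and instrument-specific mass tolerance.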


    THREE-DIMENSIONAL STRUCTURE COMPARISON
 
Because the folding of a protein is the result of a collective interaction among its constituent amino acid residues, comparison of proteins based on their secondary or tertiary structures represents an alternative approach for predicting gene functions. Important information regarding the fold of a protein is embedded in the length and arrangement of the predicted helices, strands, and coils along the polypeptide chain. As the number of polypeptides with determined secondary or tertiary structures increases, this approach will become more useful for identifying the function of uncharacterized genes (87, 88).
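As a rough illustration of how the length and arrangement of predicted elements can be compared, the sketch below reduces a per-residue secondary structure prediction (H for helix, E for strand, C for coil) to its ordered list of elements and scores two proteins by the similarity of that order. It is not the method of the cited fold recognition tools. The prediction strings, the three-residue minimum element length, and the use of a generic string similarity ratio are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import groupby


def structural_elements(ss):
    """Run-length encode a prediction string into (state, length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def topology(ss, min_len=3):
    """Order of helices and strands, ignoring coils and very short elements."""
    return "".join(
        state
        for state, length in structural_elements(ss)
        if state in "HE" and length >= min_len
    )


def topology_similarity(ss_a, ss_b):
    """Similarity (0 to 1) between the element orders of two predictions."""
    return SequenceMatcher(None, topology(ss_a), topology(ss_b)).ratio()


# Hypothetical predicted secondary structures
query = "CCHHHHHHCCEEEECCCHHHHHCC"
known = "CHHHHHHHHCCCEEEEECCHHHHHHC"
other = "CCEEEECCEEEEECCEEEEC"

print(structural_elements(query))
print(topology(query), topology(known), topology(other))
print(topology_similarity(query, known), topology_similarity(query, other))
```

Here the query and the first reference share a helix-strand-helix arrangement despite differing element lengths, whereas the all-strand reference scores poorly. Element lengths are retained by `structural_elements` and could be folded into the score in a fuller treatment.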


    INTEGRATION OF EXPRESSION, PATHWAY, AND PHYLOGENETIC PROFILING
 
By combining phylogenetic comparison of sequence data with experimental data on correlated mRNA expression patterns, protein-protein interactions, or protein functions in a given species, researchers can infer protein functions through properties other than sequence similarity (89, 90, 91, 92). Studies on genomes of lower organisms have demonstrated that a great number of genes are created through multiple fusion events during evolution, and genes that have fused are likely to interact with each other in the same signaling pathway (93, 94, 95). In addition, previously unidentified genes associated with prostate cancer, steroid synthesis, insulin synthesis, and neurotransmitter processing can be discovered by using a guilt-by-association approach based on coordinated expression of genes in different cDNA libraries (96).
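The inference from gene fusion events can be sketched in a few lines. The example assumes that a prior similarity search has already assigned each protein to one or more sequence families. It then predicts a functional link between two separate genes in one genome whenever their families occur together in a single fused gene in another genome. All gene names and family labels are hypothetical.

```python
from itertools import combinations


def fusion_links(fused_genome, target_genome):
    """Return (gene_1, gene_2, composite) triples: two distinct target genes
    whose families are found together in one composite gene elsewhere."""
    # Index the target genes by the families they belong to
    family_to_genes = {}
    for gene, families in target_genome.items():
        for family in families:
            family_to_genes.setdefault(family, set()).add(gene)

    links = set()
    for composite, families in fused_genome.items():
        # Every pair of distinct families within one composite gene
        for fam_a, fam_b in combinations(sorted(set(families)), 2):
            for gene_a in family_to_genes.get(fam_a, ()):
                for gene_b in family_to_genes.get(fam_b, ()):
                    if gene_a != gene_b:
                        pair = tuple(sorted((gene_a, gene_b)))
                        links.add(pair + (composite,))
    return links


# Hypothetical family assignments for two genomes
fused_genome = {
    "yeastX": ["famKinase", "famAdaptor"],  # composite (fused) gene
    "yeastY": ["famProtease"],
}
target_genome = {
    "humanK": ["famKinase"],
    "humanA": ["famAdaptor"],
    "humanP": ["famProtease"],
}
links = fusion_links(fused_genome, target_genome)
print(links)
```

The predicted link is only a hypothesis that the two separate genes act in the same pathway. Promiscuous domains shared by many unrelated proteins would generate false links and would need to be filtered in practice.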


    CONCLUSIONS
 
Historically, most endocrine factors have been defined based on phenotypic changes. As the genome projects progress, the massive archives in GenBank represent a golden opportunity to discover new hormones, receptors, and signaling molecules. Through sequence analysis, novel paralogous endocrine genes can be isolated, and polymorphisms of known genes can be identified. Although gene sequence-based approaches cannot completely replace the traditional biochemical and physiological characterization of endocrine mediators, they have greatly facilitated the identification and analysis of new genes. The ongoing shift in investigation from one gene at a time to a global approach necessitates new methods to integrate the explosion of knowledge on gene sequences, transcript expression profiles, and protein functions and interactions. Although all human genes will be known in a few years, major challenges await endocrinologists in elucidating the physiological roles of all hormones, receptors, and signaling mediators. The greatest challenge will be to decipher the logical circuitry controlling all endocrine pathways.


    ACKNOWLEDGMENTS
 
We thank Ms. Caren Spencer for editorial assistance.


    FOOTNOTES
 
Address requests for reprints to: Dr. Aaron J. W. Hsueh, Department of Gynecology/Obstetrics, Stanford University School of Medicine, Division of Reproductive Biology, 300 Pasteur Drive, Room A-344, Stanford, California 94305-5317.

Work from our laboratory was supported by NIH Grants HD-23273 and HD-31398. The Ovarian Kaleidoscope Database is supported by the Specialized Cooperative Centers Program in Reproduction Research, NICHD, NIH.

Received for publication January 25, 2000. Revision received February 28, 2000. Accepted for publication March 1, 2000.


    REFERENCES
 

  1. Collins FS, Patrinos A, Jordan E, Chakravarti A, Gesteland R, Walters L 1998 New goals for the U.S. Human Genome Project: 1998–2003. Science 282:682–689
  2. Lander ES 1996 The new genomics: global views of biology. Science 274:536–539
  3. Boguski MS, Schuler GD 1995 ESTablishing a human transcript map. Nat Genet 10:369–371
  4. Schuler GD, Boguski MS, Stewart EA, Stein LD, Gyapay G, Rice K, White RE, Rodriguez-Tome P, Aggarwal A, Bajorek E, Bentolila S, Birren BB, Butler A, Castle AB, Chiannilkulchai N, Chu A, Clee C, Cowles S, Day PJ, Dibling T, Drouot N, Dunham I, Duprat S, East C, Hudson TJ, et al. 1996 A gene map of the human genome. Science 274:540–546
  5. Margolis B 1994 The GRB family of SH2 domain proteins. Prog Biophys Mol Biol 62:223–244
  6. Tsuchida K, Mathews LS, Vale WW 1993 Cloning and characterization of a transmembrane serine kinase that acts as an activin type I receptor. Proc Natl Acad Sci USA 90:11242–11246
  7. Koenig BB, Cook JS, Wolsing DH, Ting J, Tiesman JP, Correa PE, Olson CA, Pecquet AL, Ventura F, Grant RA 1994 Characterization and cloning of a receptor for BMP-2 and BMP-4 from NIH 3T3 cells. Mol Cell Biol 14:5961–5974
  8. Mathews LS, Vale WW 1991 Expression cloning of an activin receptor, a predicted transmembrane serine kinase. Cell 65:973–982
  9. Takemoto Y, Furuta M, Sato M, Hashimoto Y 1997 A simple improvement in expression cloning. DNA Cell Biol 16:797–799
  10. Franzen P, ten Dijke P, Ichijo H, Yamashita H, Schulz P, Heldin CH, Miyazono K 1993 Cloning of a TGFβ type I receptor that forms a heteromeric complex with the TGFβ type II receptor. Cell 75:681–692
  11. McLaughlin MM, Kumar S, McDonnell PC, Van Horn S, Lee JC, Livi GP, Young PR 1996 Identification of mitogen-activated protein (MAP) kinase-activated protein kinase-3, a novel substrate of CSBP p38 MAP kinase. J Biol Chem 271:8488–8492
  12. Witczak O, Skalhegg BS, Keryer G, Bornens M, Tasken K, Jahnsen T, Orstavik S 1999 Cloning and characterization of a cDNA encoding an A-kinase anchoring protein located in the centrosome, AKAP450. EMBO J 18:1858–1868
  13. Kawabata M, Chytil A, Moses HL 1995 Cloning of a novel type II serine/threonine kinase receptor through interaction with the type I transforming growth factor-β receptor. J Biol Chem 270:5625–5630
  14. Chandrasekharappa SC, Guru SC, Manickam P, Olufemi SE, Collins FS, Emmert-Buck MR, Debelenko LV, Zhuang Z, Lubensky IA, Liotta LA, Crabtree JS, Wang Y, Roe BA, Weisemann J, Boguski MS, Agarwal SK, Kester MB, Kim YS, Heppner C, Dong Q, Spiegel AM, Burns AL, Marx SJ 1997 Positional cloning of the gene for multiple endocrine neoplasia-type 1. Science 276:404–407
  15. Bruccoleri RE, Dougherty TJ, Davison DB 1998 Concordance analysis of microbial genomes. Nucleic Acids Res 26:4482–4486
  16. Chervitz SA, Aravind L, Sherlock G, Ball CA, Koonin EV, Dwight SS, Harris MA, Dolinski K, Mohr S, Smith T, Weng S, Cherry JM, Botstein D 1998 Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science 282:2022–2028
  17. Botstein D, Chervitz SA, Cherry JM 1997 Yeast as a model organism. Science 277:1259–1260
  18. Wolfe KH, Shields DC 1997 Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387:708–713
  19. Blaxter M 1998 Caenorhabditis elegans is a nematode. Science 282:2041–2046
  20. Ruvkun G, Hobert O 1998 The taxonomy of developmental control in Caenorhabditis elegans. Science 282:2033–2041
  21. Meyer A, Schartl M 1999 Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr Opin Cell Biol 11:699–704
  22. Pebusque MJ, Coulier F, Birnbaum D, Pontarotti P 1998 Ancient large-scale genome duplications: phylogenetic and linkage analyses shed light on chordate genome evolution. Mol Biol Evol 15:1145–1159
  23. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ 1990 Basic local alignment search tool. J Mol Biol 215:403–410
  24. Altschul SF, Lipman DJ 1990 Protein database searches for multiple alignments. Proc Natl Acad Sci USA 87:5509–5513
  25. Pearson WR, Lipman DJ 1988 Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 85:2444–2448
  26. Kliewer SA, Lehmann JM, Willson TM 1999 Orphan nuclear receptors: shifting endocrinology into reverse. Science 284:757–760
  27. Kliewer SA, Lehmann JM, Milburn MV, Willson TM 1999 The PPARs and PXRs: nuclear xenobiotic receptors that define novel hormone signaling pathways. Recent Prog Horm Res 54:345–367
  28. Willy PJ, Umesono K, Ong ES, Evans RM, Heyman RA, Mangelsdorf DJ 1995 LXR, a nuclear receptor that defines a distinct retinoid response pathway. Genes Dev 9:1033–1045
  29. Mangelsdorf DJ, Evans RM 1995 The RXR heterodimers and orphan receptors. Cell 83:841–850
  30. McPherron AC, Lee SJ 1993 GDF-3 and GDF-9: two new members of the transforming growth factor-β superfamily containing a novel pattern of cysteines. J Biol Chem 268:3444–3449
  31. Wozney JM 1992 The bone morphogenetic protein family and osteogenesis. Mol Reprod Dev 32:160–167
  32. Crosier PS, Lewis PM, Hall LR, Vitas MR, Morris CM, Beier DR, Wood CR, Crosier KE 1994 Isolation of a receptor tyrosine kinase (DTK) from embryonic stem cells: structure, genetic mapping and analysis of expression. Growth Factors 11:125–136
  33. Crosier PS, Hall LR, Vitas MR, Lewis PM, Crosier KE 1995 Identification of a novel receptor tyrosine kinase expressed in acute myeloid leukemic blasts. Leuk Lymphoma 18:443–449
  34. Iwama A, Okano K, Sudo T, Matsuda Y, Suda T 1994 Molecular cloning of a novel receptor tyrosine kinase gene, STK, derived from enriched hematopoietic stem cells. Blood 83:3160–3169
  35. Mathews LS 1994 Activin receptors and cellular signaling by the receptor serine kinase family. Endocr Rev 15:310–325
  36. Ryden M, Imamura T, Jornvall H, Belluardo N, Neveu I, Trupp M, Okadome T, ten Dijke P, Ibanez CF 1996 A novel type I receptor serine-threonine kinase predominantly expressed in the adult central nervous system. J Biol Chem 271:30603–30609
  37. Tsuchida K, Sawchenko PE, Nishikawa S, Vale WW 1996 Molecular cloning of a novel type I receptor serine/threonine kinase for the TGFβ superfamily from rat brain. Mol Cell Neurosci 7:467–478
  38. Parnet P, Garka KE, Bonnert TP, Dower SK, Sims JE 1996 IL-1Rrp is a novel receptor-like molecule similar to the type I interleukin-1 receptor and its homologues T1/ST2 and IL-1R AcP. J Biol Chem 271:3967–3970
  39. Marchese A, Docherty JM, Nguyen T, Heiber M, Cheng R, Heng HH, Tsui LC, Shi X, George SR, O’Dowd BF 1994 Cloning of human genes encoding novel G protein-coupled receptors. Genomics 23:609–618
  40. Marchese A, George SR, Kolakowski Jr LF, Lynch KR, O’Dowd BF 1999 Novel GPCRs and their endogenous ligands: expanding the boundaries of physiology and pharmacology. Trends Pharmacol Sci 20:370–375
  41. O’Dowd BF, Nguyen T, Jung BP, Marchese A, Cheng R, Heng HH, Kolakowski Jr LF, Lynch KR, George SR 1997 Cloning and chromosomal mapping of four putative novel human G-protein-coupled receptor genes. Gene 187:75–81
  42. Tensen CP, Van Kesteren ER, Planta RJ, Cox KJ, Burke JF, van Heerikhuizen H, Vreugdenhil E 1994 A G protein-coupled receptor with low density lipoprotein-binding motifs suggests a role for lipoproteins in G-linked signal transduction. Proc Natl Acad Sci USA 91:4816–4820
  43. Hauser F, Nothacker HP, Grimmelikhuijzen CJ 1997 Molecular cloning, genomic organization, and developmental regulation of a novel receptor from Drosophila melanogaster structurally related to members of the thyroid-stimulating hormone, follicle-stimulating hormone, luteinizing hormone/choriogonadotropin receptor family from mammals. J Biol Chem 272:1002–1010
  44. Kudo M, Chen T, Nakabayashi K, Hsu SY, Hsueh AJ 2000 The nematode leucine-rich repeat-containing, G protein-coupled receptor (LGR) protein homologous to vertebrate gonadotropin, thyrotropin receptors is constitutively active in mammalian cells. Mol Endocrinol 14:272–284
  45. Kudo M, Osuga Y, Kobilka BK, Hsueh AJW 1996 Transmembrane regions V and VI of the human luteinizing hormone receptor are required for constitutive activation by a mutation in the third intracellular loop. J Biol Chem 271:22470–22478
  46. Laue L, Chan WY, Hsueh AJ, Kudo M, Hsu SY, Wu SM, Blomberg L, Cutler Jr GB 1995 Genetic heterogeneity of constitutively activating mutations of the human luteinizing hormone receptor in familial male-limited precocious puberty. Proc Natl Acad Sci USA 92:1906–1910
  47. Kopp P, Jameson JL, Roe TF 1997 Congenital nonautoimmune hyperthyroidism in a nonidentical twin caused by a sporadic germline mutation in the thyrotropin receptor gene. Thyroid 7:765–770
  48. Hsu SY, Liang SG, Hsueh AJ 1998 Characterization of two LGR genes homologous to gonadotropin and thyrotropin receptors with extracellular leucine-rich repeats and a G protein-coupled, seven-transmembrane region. Mol Endocrinol 12:1830–1845
  49. Wells TN, Peitsch MC 1997 The chemokine information source: identification and characterization of novel chemokines using the WorldWideWeb and expressed sequence tag databases. J Leukoc Biol 61:545–550
  50. Yoshie O, Imai T, Nomiyama H 1997 Novel lymphocyte-specific CC chemokines and their receptors. J Leukoc Biol 62:634–644
  51. Hieshima K, Imai T, Opdenakker G, Van Damme J, Kusuda J, Tei H, Sakaki Y, Takatsuki K, Miura R, Yoshie O, Nomiyama H 1997 Molecular cloning of a novel human CC chemokine liver and activation-regulated chemokine (LARC) expressed in liver. Chemotactic activity for lymphocytes and gene localization on chromosome 2. J Biol Chem 272:5846–5853
  52. Yang JY, Spanaus KS, Widmer U 2000 Cloning, characterization, genomic organization of Lcc-1 (Scya16), a novel human Cc chemokine expressed in liver. Cytokine 12:101–109
  53. Rock FL, Hardiman G, Timans JC, Kastelein RA, Bazan JF 1998 A family of human receptors structurally related to Drosophila Toll. Proc Natl Acad Sci USA 95:588–593
  54. Chaudhary PM, Ferguson C, Nguyen V, Nguyen O, Massa HF, Eby M, Jasmin A, Trask BJ, Hood L, Nelson PS 1998 Cloning and characterization of two Toll/interleukin-1 receptor-like genes TIL3 and TIL4: evidence for a multi-gene receptor family in humans. Blood 91:4020–4027
  55. Yang RB, Mark MR, Gray A, Huang A, Xie MH, Zhang M, Goddard A, Wood WI, Gurney AL, Godowski PJ 1998 Toll-like receptor-2 mediates lipopolysaccharide-induced cellular signalling. Nature 395:284–288
  56. Hsu SY 1999 Cloning of two novel mammalian paralogs of relaxin/insulin family proteins and their expression in testis and kidney. Mol Endocrinol 13:2163–2174
  57. Leo CP, Hsu SY, McGee EA, Salanova M, Hsueh AJ 1998 DEFT, a novel death effector domain-containing molecule predominantly expressed in testicular germ cells. Endocrinology 139:4839–4848
  58. Wang DG, Fan JB, Siao CJ, Berno A, Young P, Sapolsky R, Ghandour G, Perkins N, Winchester E, Spencer J, Kruglyak L, Stein L, Hsie L, Topaloglou T, Hubbell E, Robinson E, Mittmann M, Morris MS, Shen N, Kilburn D, Rioux J, Nusbaum C, Rozen S, Hudson TJ, Lander ES 1998 Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280:1077–1082
  59. Smigielski EM, Sirotkin K, Ward M, Sherry ST 2000 dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res 28:352–355
  60. Evans WE, Relling MV 1999 Pharmacogenomics: translating functional genomics into rational therapeutics. Science 286:487–491
  61. Ingelman-Sundberg M, Oscarson M, McLellan RA 1999 Polymorphic human cytochrome P450 enzymes: an opportunity for individualized drug treatment. Trends Pharmacol Sci 20:342–349
  62. Marshall A 1997 Laying the foundations for personalized medicines. Nat Biotechnol 15:954–957
  63. Liu R, Paxton WA, Choe S, Ceradini D, Martin SR, Horuk R, MacDonald ME, Stuhlmann H, Koup RA, Landau NR 1996 Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection. Cell 86:367–377
  64. Buscher R, Herrmann V, Insel PA 1999 Human adrenoceptor polymorphisms: evolving recognition of clinical importance. Trends Pharmacol Sci 20:94–99
  65. Nakano Y, Oshima T, Watanabe M, Matsuura H, Kajiyama G, Kambe M 1997 Angiotensin I-converting enzyme gene polymorphism and acute response to captopril in essential hypertension. Am J Hypertens 10:1064–1068
  66. Haas M, Yilmaz N, Schmidt A, Neyer U, Arneitz K, Stummvoll HK, Wallner M, Auinger M, Arias I, Schneider B, Mayer G 1998 Angiotensin-converting enzyme gene polymorphism determines the antiproteinuric and systemic hemodynamic effect of enalapril in patients with proteinuric renal disease. Austrian Study Group of the Effects of Enalapril Treatment in Proteinuric Renal Disease. Kidney Blood Press Res 21:66–69
  67. Henrion D, Amant C, Benessiano J, Philip I, Plantefeve G, Chatel D, Hwas U, Desmont JM, Durand G, Amouyel P, Levy BI 1998 Angiotensin II type 1 receptor gene polymorphism is associated with an increased vascular reactivity in the human mammary artery in vitro. J Vasc Res 35:356–362
  68. Arranz MJ, Collier DA, Munro J, Sham P, Kirov G, Sodhi M, Roberts G, Price J, Kerwin RW 1996 Analysis of a structural polymorphism in the 5-HT2A receptor and clinical response to clozapine. Neurosci Lett 217:177–178
  69. Haavisto AM, Pettersson K, Bergendahl M, Virkamaki A, Huhtaniemi I 1995 Occurrence and biological properties of a common genetic variant of luteinizing hormone. J Clin Endocrinol Metab 80:1257–1263
  70. Liu H, Chao D, Nakayama EE, Taguchi H, Goto M, Xin X, Takamatsu JK, Saito H, Ishikawa Y, Akaza T, Juji T, Takebe Y, Ohishi T, Fukutake K, Maruyama Y, Yashiki S, Sonoda S, Nakamura T, Nagai Y, Iwamoto A, Shioda T 1999 Polymorphism in RANTES chemokine promoter affects HIV-1 disease progression. Proc Natl Acad Sci USA 96:4581–4585
  71. Miller RT, Christoffels AG, Gopalakrishnan C, Burke J, Ptitsyn AA, Broveak TR, Hide WA 1999 A comprehensive approach to clustering of expressed human gene sequence: the sequence tag alignment and consensus knowledge base. Genome Res 9:1143–1155
  72. Vasmatzis G, Essand M, Brinkmann U, Lee B, Pastan I 1998 Discovery of three genes specifically expressed in human prostate by expressed sequence tag database analysis. Proc Natl Acad Sci USA 95:300–304
  73. Ross-Macdonald P, Coelho PS, Roemer T, Agarwal S, Kumar A, Jansen R, Cheung KH, Sheehan A, Symoniatis D, Umansky L, Heidtman M, Nelson FK, Iwasaki H, Hager K, Gerstein M, Miller P, Roeder GS, Snyder M 1999 Large-scale analysis of the yeast genome by transposon tagging and gene disruption. Nature 402:413–418
  74. Nakata K, Takai T, Kaminuma T 1999 Development of the receptor database (RDB): application to the endocrine disruptor problem. Bioinformatics 15:544–552
  75. O’Brien SJ, Menotti-Raymond M, Murphy WJ, Nash WG, Wienberg J, Stanyon R, Copeland NG, Jenkins NA, Womack JE, Marshall Graves JA 1999 The promise of comparative genomics in mammals. Science 286:458–481
  76. Zambrowicz BP, Friedrich GA, Buxton EC, Lilleberg SL, Person C, Sands AT 1998 Disruption and sequence identification of 2,000 genes in mouse embryonic stem cells. Nature 392:608–611
  77. Skarnes WC, Moss JE, Hurtley SM, Beddington RS 1995 Capturing genes encoding membrane and secreted proteins important for mouse development. Proc Natl Acad Sci USA 92:6592–6596
  78. Durick K, Mendlein J, Xanthopoulos KG 1999 Hunting with traps: genome-wide strategies for gene discovery and functional analysis. Genome Res 9:1019–1025
  79. Chowdhury K, Bonaldo P, Torres M, Stoykova A, Gruss P 1997 Evidence for the stochastic integration of gene trap vectors into the mouse germline. Nucleic Acids Res 25:1531–1536
  80. Stoykova A, Chowdhury K, Bonaldo P, Torres M, Gruss P 1998 Gene trap expression and mutational analysis for genes involved in the development of the mammalian nervous system. Dev Dyn 212:198–213
  81. Schena M, Shalon D, Davis RW, Brown PO 1995 Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467–470
  82. Jennings EG, Young RA 1999 Genome expression on the World Wide Web. Trends Genet 15:202–204
  83. Ahrendt SA, Halachmi S, Chow JT, Wu L, Halachmi N, Yang SC, Wehage S, Jen J, Sidransky D 1999 Rapid p53 sequence analysis in primary lung cancer using an oligonucleotide probe array. Proc Natl Acad Sci USA 96:7382–7387
  84. Walhout AJ, Sordella R, Lu X, Hartley JL, Temple GF, Brasch MA, Thierry-Mieg N, Vidal M 2000 Protein interaction mapping in C. elegans using proteins involved in vulval development. Science 287:116–122
  85. Blackstock WP, Weir MP 1999 Proteomics: quantitative and physical mapping of cellular proteins. Trends Biotechnol 17:121–127
  86. Pandey A, Podtelejnikov AV, Blagoev B, Bustelo XR, Mann M, Lodish HF 2000 Analysis of receptor signaling pathways by mass spectrometry: identification of Vav-2 as a substrate of the epidermal, platelet-derived growth factor receptors. Proc Natl Acad Sci USA 97:179–184
  87. Di Francesco V, Munson PJ, Garnier J 1999 FORESST: fold recognition from secondary structure predictions of proteins. Bioinformatics 15:131–140
  88. Geetha V, Di Francesco V, Garnier J, Munson PJ 1999 Comparing protein sequence-based and predicted secondary structure-based methods for identification of remote homologs. Protein Eng 12:527–534
  89. Niehrs C, Pollet N 1999 Synexpression groups in eukaryotes. Nature 402:483–487
  90. Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D 1999 Detecting protein function and protein-protein interactions from genome sequences. Science 285:751–753
  91. Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA 1999 Protein interaction maps for complete genomes based on gene fusion events. Nature 402:86–90
  92. Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D 1999 A combined algorithm for genome-wide prediction of protein function. Nature 402:83–86
  93. Zlokarnik G, Negulescu PA, Knapp TE, Mere L, Burres N, Feng L, Whitney M, Roemer K, Tsien RY 1998 Quantitation of transcription and clonal selection of single living cells with β-lactamase as reporter. Science 279:84–88
  94. Whitney M, Rockenstein E, Cantin G, Knapp T, Zlokarnik G, Sanders P, Durick K, Craig FF, Negulescu PA 1998 A genome-wide functional assay of signal transduction in living mammalian cells. Nat Biotechnol 16:1329–1333
  95. Rao A 1998 Sampling the universe of gene expression. Nat Biotechnol 16:1311–1312
  96. Walker MG, Volkmuth W, Sprinzak E, Hodgson D, Klingler T 1999 Prediction of gene function by genome-scale expression analysis: prostate cancer-associated genes. Genome Res 9:1198–1203