Discovering New Hormones, Receptors, and Signaling Mediators in the Genomic Era
Sheau Yu Hsu and
Aaron J. W. Hsueh
Division of Reproductive Biology, Department of Gynecology and
Obstetrics, Stanford University School of Medicine, Stanford,
California 94305-5317
INTRODUCTION
The human genome has 3 billion base pairs, of which
3–5% are coding exons that account for about 100,000–140,000
genes. The Human Genome Project is scheduled to complete the sequencing
of the entire genome in the next few years. At the present time, the
Expressed Sequence Tags (ESTs) in the GenBank represent >60% of the
human genes. Mammalian genes exist in families as reflected by the
presence of multiple paralogs. With recent advances in bioinformatic
tools, the massive resources in the GenBank are easily accessible to
investigators through the Internet and allow opportunities for
discoveries that could impact all fields of biomedicine. Using
Internet-based tools to search the massive genome databases, it is
possible to identify new endocrine genes based on evolutionary
conservation, domain homology, and tissue expression patterns. Examples
of paralogous genes discovered in this way include LGRs (leucine-rich
repeat-containing, G protein-coupled receptors) that are
evolutionarily conserved as compared with gonadotropin and TSH
receptors, and RIFs (relaxin-insulin-like factors) that show domain
conservation with known growth factors. GenBank searches can also
identify polymorphisms of previously characterized genes.
The rapid pace of genomic revolution allows the discovery of new genes
in the endocrine systems, identification of new drug targets, analysis
of genetic susceptibility of individuals to endocrine diseases, and
customized drug therapy based on gene polymorphism. It is anticipated
that the future integration of gene sequence-based approaches with
transcript expression profiles and protein functions will lead to a
more complete understanding of the circuitry controlling the endocrine
pathways.
GENOMIC REVOLUTION
The recent genomic revolution officially started in 1992 with the
launching of the Human Genome Project (1). It is anticipated that,
within the next few years, sequencing of the entire 3 billion
(3 × 10^9) base pairs of the human genome will be
completed. At the present time, the complete genomic sequences of more
than two dozen organisms, including the yeast Saccharomyces
cerevisiae and the nematode Caenorhabditis elegans, are
already available. This progress has been compared with the completion
of the Periodic Table of the Elements at the end of the 19th century
(2) and became possible through the largely automated sequencing of
libraries of genomic DNA and expressed sequence tags (ESTs) that
represent fragments of sequenced cDNAs from diverse tissues. At the
beginning of April 2000, the GenBank contained more than 2.4 billion
bases of human genome sequences and the complete sequencing of the
entire human genome has just been announced (for an update of
deposited sequences see http://www.ornl.gov/hgmis/project/progress.html).
In addition to ESTs, these sequences are
represented by single-pass genome survey sequences (GSS) encompassing
the ends of large genomic fragments, unfinished and unordered
high-throughput genomic sequences (HTGS), and finished nonredundant
(NR) sequences. Among the various GenBank sequences, the EST database
is particularly useful for identifying novel genes (3, 4). These short
sequences represent transcribed genes and contain continuous protein
coding regions. The 1.8 million human ESTs presently in the GenBank
have been clustered into approximately 92,224 unique transcripts
(Unigene Build 107, Jan. 27, 2000) based on their overlapping
sequences, therefore accounting for more than 60% of human genes
(http://www.ncbi.nlm.nih.gov/UniGene/Hs.stats.shtml).
It is clear that the EST sampling approach has identified a majority of
transcripts encoded by our genome. Indeed, more than 90% of known or
nonredundant genes of human or mouse origin are represented by at least
one EST (1). Thus, the EST and other gene sequence databases have
become indispensable for gene discovery.
TRADITIONAL VS. GENBANK APPROACHES FOR THE DISCOVERY OF ENDOCRINE GENES
Using function- and phenotype-driven approaches, traditional
studies of endocrine signaling molecules are based upon purification of
hormones and receptors in specific glands and targets, respectively.
Intracellular signaling molecules were also isolated through protein
purification and, more recently, through protein-protein interaction
assays such as coprecipitation of proteins (5) (Table 1). After the
advent of recombinant DNA
technology, the discovery of endocrine signaling molecules has been
expanded to include searches based on 1) sequence similarity to known
genes using low-stringency hybridization, 2) PCR with degenerate
primers, and 3) differential cloning (subtraction and differential
display) (6, 7). The protein-protein interaction searches were also
expanded to include expression cloning based on ligand or antibody
binding (8, 9, 10) and yeast two-hybrid interaction assay (11, 12, 13). In
addition, some endocrine genes (e.g. MEN1) were isolated
based on genetic analysis in diseased individuals and positional
cloning after chromosome walking (14). Although the recombinant DNA
approach has allowed the successful discovery of new hormones,
receptors, and intracellular mediators, the genomic revolution
exemplified by the rapid GenBank expansion has prompted another
paradigm shift in the identification of novel endocrine factors with
the ultimate goal of finding all endocrine regulatory genes. With major
advances in bioinformatic tools available on the Internet (Table 2),
the massive resources in the GenBank
are easily accessible to investigators and allow opportunities for
major discoveries that could impact endocrinology as well as all fields
of biomedicine. The new approach takes advantage of the unprecedented
power of modern computing tools and minimizes the use of the
time-consuming laboratory bench-based wet science. Initial discoveries
of novel genes can be performed in minutes, followed by further
verification and characterization using the bench-top approaches. In
addition, the tedious stringency control procedures in the traditional
cloning methods based on sequence matching (e.g. degenerate
PCR and low-stringency hybridization) are now replaced by changing the
expectation or threshold values during computational searches. The
economy of scale of this approach is obvious: computational searches
take only minutes and can encompass sequences derived from diverse
organisms and tissue or cell origins. A comparison of the
computational search vs. PCR amplification for novel gene
discovery is shown in Table 3.
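The idea of replacing hybridization stringency with an expectation threshold can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, its field names, and the example identifiers and values are hypothetical and do not correspond to the output format of any particular search program.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One database match; all fields are hypothetical illustration values."""
    subject_id: str          # made-up database identifier
    percent_identity: float  # identity over the aligned region
    evalue: float            # expectation value reported by the search


def filter_hits(hits, max_evalue):
    """Keep hits whose E-value is at or below the threshold, best first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)


# Made-up search results for a single query sequence.
hits = [
    Hit("est_A", 92.0, 1e-60),
    Hit("est_B", 41.5, 1e-8),
    Hit("est_C", 27.0, 0.5),
]

# A strict threshold behaves like high-stringency hybridization ...
strict = filter_hits(hits, max_evalue=1e-20)
# ... while a relaxed threshold also admits remote candidates.
relaxed = filter_hits(hits, max_evalue=1.0)

print([h.subject_id for h in strict])
print([h.subject_id for h in relaxed])
```

Loosening the threshold costs nothing but a rerun of the filter, whereas the equivalent change at the bench means repeating the hybridization or PCR under new conditions.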
GENBANK SEARCHES FOR NOVEL PARALOGOUS ENDOCRINE GENES
Although previously predicted, based on traditional approaches,
the concept that mammalian genes are present in families is
strengthened by the recent comparison of genome sequences in diverse
organisms (15). In addition to the identification of orthologous human
genes with conserved structure and function in lower organisms (16, 17), multiple paralogous genes evolve from the same ancestral gene
during the evolution of simple organisms into complex ones. Genes
evolved through domain rearrangement and gene fusion. New genes can
change their molecular specificity but remain in a similar regulatory
circuitry or may be recruited to serve completely new functions in a
novel regulatory linkage. In addition, new domains can evolve through
domain combination and shuffling. Furthermore, duplication of a
chromosomal fragment or even the whole genome provides opportunities
for the evolution of novel genes (18, 19, 20). By comparing genomes from
different phylogenies, it has been hypothesized that during early
chordate evolution, before the Cambrian explosion and during the early
Devonian period, two entire genome duplications took place (21, 22),
which led to the generation of multiple mammalian paralogs and the
opportunity to develop complex regulatory mechanisms and functions.
To allow efficient comparison of DNA or protein sequences, a variety of
paired sequence comparison programs using different scoring matrices
have been developed for aligning individual sequences with a catalogued
sequence database. The Basic Local Alignment Search Tool (BLAST) used
by the National Center for Biotechnology Information (NCBI) and
other related programs have become the essential tools for gene
sequence analysis and the deduction of gene functions (23, 24, 25) (Table 2).
To facilitate the identification of genes that are related but with
limited primary sequence homology, a variety of sequence-based
computation tools and databases such as eMOTIF and BLOCKS (Table 2)
have been developed to further optimize the discovery of sequence
relatedness among genes and for the development of phylogenetic and
evolutionary models. Moreover, a search of gene relatedness based on
the structure of translated products has also been explored (SCOP) to
reveal potential functional relationships. To facilitate analysis of
intron-exon arrangements of the unorganized HTGS and GSS genomic
sequences, gene recognition algorithms have also been developed (Genie,
GRAIL, etc).
Among the large number of ESTs and genomic entries in the GenBank,
sequences not identical to known ones but that share similarity either
in the entire structure or in selected domains are the most obvious
candidates for further study. In general, the relatedness is obvious
for an unknown gene that shares >40% identity with a known gene in
its primary sequences. While proteins with this level of identity
likely represent paralogs and could have similar functions, most novel
sequences require further validation of their putative physiological
roles.
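As a rough illustration of how a percent identity figure such as the 40% guideline above might be computed, the following Python sketch scores two sequences that have already been aligned. The function name, the choice to ignore gapped columns, and the example peptides are assumptions made for illustration; alignment programs differ in how they define the denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumes both strings come from the same alignment (equal length) and
    that columns containing a gap in either sequence are not counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Made-up aligned peptides: 4 identical residues out of 6 ungapped columns.
identity = percent_identity("MKT-LLV", "MRTALLI")
print(f"{identity:.1f}% identity")
```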
GENES WITH CLOSE EVOLUTIONARY ORIGIN
Taking advantage of the evolutionary conservation of the
primary sequence of genes that diverge during evolution, comparison of
novel genes with known family proteins can lead to identification of
paralogous genes. The classical example of a gene search using this
approach is the identification of multiple orphan receptors more than a
decade ago, which was based mainly on low-stringency hybridization
cloning. The term orphan receptor was coined to describe gene products
that belong to the nuclear receptor superfamily on the basis of
sequence identity but without known ligands. Orphan receptors have been
identified in most metazoans and include about 40 human genes (26, 27).
The cloning of the orphan receptors ushered in the exciting new concept
of reverse endocrinology. The reversed endocrine discovery process led
to the identification of vitamin A metabolite 9-cis retinoic
acid as a ligand for the retinoid X receptor (RXR) family of orphan
receptors (28, 29). In addition, various hormones belonging to the
transforming growth factor-β (TGF-β) superfamily, including
different growth differentiation factors (GDFs) and bone morphogenetic
proteins (BMPs), have been identified based on degenerate PCR
amplification and low-stringency screening (30, 31). Likewise, a
variety of tyrosine kinase receptors (32, 33, 34), serine/threonine kinase
receptors (type I and type II receptors for TGF-β superfamily proteins) (8, 35, 36, 37), and chemokines (38) were isolated using the same approach.
More recently, up to 80 orphan G protein-coupled receptors have been
identified largely through the cloning of genes sharing the unique
seven-transmembrane domains by using PCR with degenerate primers (39, 40).
Using the computation approach, additional G protein-coupled receptors
have been identified in the GenBank based on sequence conservation in
the transmembrane region (40, 41). Recently, we have identified four
novel leucine-rich repeat-containing, G protein-coupled receptors (LGR)
sharing significant homology with known glycoprotein receptors from
vertebrates using the sequences of primitive homologs isolated from
Drosophila and the snail Lymnaea stagnalis (42, 43)
as queries for the GenBank search. Using the same evolutionary
conservation approach, a number of genes homologous to mammalian
glycoprotein hormone receptors were identified from nematode (44)
and Drosophila (S. Y. Hsu, S. Nishi, and A. J. W.
Hsueh, unpublished data). Of interest, the nematode LGR protein
expressed in mammalian cells showed constitutive activity, resembling
the point mutations found in the LH receptor gene in patients with
familial male-limited precocious puberty (45, 46) and in the TSH
receptor gene in patients with nonautoimmune hyperthyroidism (47). The
finding of these novel leucine-rich repeat-containing, G
protein-coupled receptors in diverse phylogenies has allowed 1) the
determination of the consensus sequences and constituent motifs in
these receptors, 2) the identification of additional LGRs, 3) the
construction of a gene family tree and evolutionary models for these
receptors, and 4) the improvement of the modeling of LGR protein
structures (48). Although these novel receptors share an identical domain
arrangement with glycoprotein hormone receptors, diversification in
their primary sequences suggests that they bind novel ligands,
thus providing opportunities for reverse endocrinology. Similar
bioinformatic searches have resulted in the identification of multiple
cytokines
(http://cytokine.medic.kumamoto-u.ac.jp/CFC/CK/Chemo-kine.html)
(49, 50, 51, 52) and a large group of human Toll-like receptors (53, 54, 55). Many
novel cytokines identified by this approach have been shown to function
as chemotaxis regulators and are important for inflammatory responses.
Likewise, Toll-like receptors are essential for immune surveillance.
The rapid virtual gene identification process facilitates the
elucidation of the physiological function of these paralogous
genes.
GENES WITH LIMITED SIMILARITY IN SELECTIVE MOTIFS
While paralogous genes sharing an overall similarity can be
identified with ease, discovery of related genes with limited
similarity in selective motifs requires an in-depth understanding of the
structure-function of the common motif. Traditionally, low-stringency
hybridization and degenerate PCR based on shared sequence motifs are
the main approaches for identifying this group of paralogous genes.
However, this approach becomes inefficient and impractical when the
consensus motif is short and the similarity among homologous genes is
low. In contrast, computation approaches allow the scanning of
similarity along the entire length of candidate genes, based on
conservation in stretches of primary sequences or in the presumptive
secondary structure (Table 3). The confidence in sequence matches among
remote homologs could gradually be optimized by focusing on comparison
in regions corresponding to active sites or functional motifs that are
most likely to be shared among divergent members. Thus, the
computational approach is especially fruitful when the homology among
paralogous genes is low but the functional motif(s) of known family
members are well characterized. For example, proteins in the insulin
superfamily were grouped based on their conserved tertiary structure of
mature proteins. These proteins include insulin, insulin-like growth
factors I and II (IGF-I, IGF-II), relaxin, Leydig cell relaxin, and
EPIL (early placenta insulin-like factor) in vertebrates as well as
molluscan insulin-like peptides and bombyxins in invertebrates (56).
Six cysteine residues important for interchain disulfide bond formation
are completely conserved in these otherwise remotely related
polypeptides. Using the conserved domain sequences as query, we have
isolated two novel relaxin/insulin-like factors (RIFs) sharing close
homology with relaxin in the putative mature portion of these
polypeptides (56). Although RIFs share low overall sequence similarity
(<30%) with proteins in the insulin/relaxin superfamily, the
disulfide bond-forming cysteine residues are completely conserved.
Indeed, it is difficult to identify these molecules using traditional
sequence-matching methods. Among genes involved in apoptosis, the death
effector domain represents a stretch of about 80 amino acids that is
shared by adapters, regulators, and executors of the tumor necrosis
factor-α (TNF-α)/Fas pathway. Based on the conservation of this
domain, GenBank searches have also allowed us to isolate an
intracellular apoptosis mediator DEFT (death effector in testis) that
is abundantly expressed in the testis (57).
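The motif-focused search described above for the insulin superfamily, in which overall identity is low but the six cysteines are conserved, can be illustrated with a small Python sketch. The two spacing patterns below are assumptions read off the mature human insulin B and A chains for illustration only; they are not the query used in the work cited, and real family members vary in spacing, so a profile-based tool would normally be preferred.

```python
import re

# Illustrative cysteine-spacing patterns (assumptions based on the mature
# human insulin chains, not a validated family signature).
B_CHAIN_PATTERN = re.compile(r"C.{11}C")        # two cysteines in the B chain
A_CHAIN_PATTERN = re.compile(r"CC.{3}C.{8}C")   # four cysteines in the A chain


def has_insulin_like_cysteines(b_chain, a_chain):
    """True if both chains show the six-cysteine spacing assumed above."""
    return bool(B_CHAIN_PATTERN.search(b_chain)) and bool(
        A_CHAIN_PATTERN.search(a_chain)
    )


# Mature human insulin chains, used here as a positive control.
HUMAN_INSULIN_B = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
HUMAN_INSULIN_A = "GIVEQCCTSICSLYQLENYCN"

print(has_insulin_like_cysteines(HUMAN_INSULIN_B, HUMAN_INSULIN_A))

# Replacing one cysteine with serine abolishes the match.
mutant_a = HUMAN_INSULIN_A.replace("CC", "SC")
print(has_insulin_like_cysteines(HUMAN_INSULIN_B, mutant_a))
```

Because the pattern ignores every residue except the cysteines, it can flag candidates whose overall similarity falls well below the level at which pairwise alignment is informative, at the cost of more false positives that must be checked by hand.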
GENBANK SEARCHES FOR POLYMORPHISMS OF KNOWN GENES
In addition to the discovery of novel genes, the computational
approach also allows the unique opportunity to identify polymorphisms
of known genes encoding proteins with different functional
characteristics. Single nucleotide polymorphisms (SNPs) are single
nucleotide differences in the same gene among individuals (58). These minor
alterations occur with varying frequencies in different ethnic
populations and are the central focus of pharmacogenomic studies. It
has been estimated that there are 300,000 potential SNPs in the human
genome (59, 60). Although SNPs were traditionally characterized by
single-strand conformational polymorphism (SSCP) analysis, the
accumulation of multiple sequences of a given gene in the GenBank
allows future polymorphism discovery when sufficient sequences of the
same gene are aligned. The importance of endocrine gene polymorphism is
illustrated by polymorphisms of cytochrome P450 genes associated with
abolished or enhanced drug metabolism in patients (61, 62) and a
chemokine receptor CCR-5 polymorphism associated with human
immunodeficiency virus (HIV) resistance (63). Other examples include
polymorphisms in β-adrenergic receptors and sensitivity to
β-agonists in asthmatics (64), angiotensin-converting enzymes (ACE)
and sensitivity to ACE inhibitors (65, 66), angiotensin II type 1
receptors and vascular reactivity to phenylephrine (67), and
5-hydroxytryptamine receptors and responsiveness to neuroleptics such as
clozapine (68). The identification of these polymorphisms not only
contributed to the study of pharmacology and epidemiology but also
allowed the elucidation of functional mechanisms of some of the
receptors. For hormone ligands, the human LHβ gene has two linked
SNPs (Trp8Arg and Ile15Thr) that are known to encode proteins
exhibiting altered immunoassay properties (69) and show distinct
carrier frequency among different racial groups. Furthermore,
polymorphisms in the promoter of the chemokine RANTES gene are
associated with changes in the secretion of this ligand for CCR-5, thus
altering the chance of HIV infection (70). In the future, analysis of
gene polymorphism using the informatic approach could allow one to
identify the genetic susceptibility of individuals to some endocrine
diseases and to customize hormonal treatment protocols for individual
patients.
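The alignment-based polymorphism discovery mentioned in this section can be sketched as follows. This is a minimal Python illustration, assuming the sequences of one gene have already been aligned to equal length; the function name, the `min_count` cutoff, and the example reads are hypothetical. Requiring each allele to be seen more than once is one simple guard against single-pass sequencing errors.

```python
from collections import Counter


def candidate_snps(aligned_seqs, min_count=2):
    """Return (1-based position, {base: count}) for polymorphic columns.

    A column is reported only if at least two different bases are each
    observed min_count times or more, so an isolated mismatch in a single
    read is not called a polymorphism.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    results = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned_seqs if s[pos] in "ACGT")
        supported = {base: n for base, n in counts.items() if n >= min_count}
        if len(supported) >= 2:
            results.append((pos + 1, supported))
    return results


# Made-up aligned reads of the same gene from different entries.
reads = [
    "ATGGCTTGA",
    "ATGGCTTGA",
    "ATGGCCTGA",
    "ATGGCCTGA",
    "ATGACTTGA",  # lone G-to-A difference at position 4, treated as an error
]
print(candidate_snps(reads))
```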
LIMITATIONS IN THE MINING OF GENE SEQUENCE DATABASES
Although existing computational algorithms for gene sequence
analysis are powerful, they have limitations. Sequences obtained through
the high-throughput DNA analysis are generated by automated single-pass
analysis of DNA. As a result, sequences may contain mistakes,
deletions, or insertions. In addition, the annotation of many sequence
entries may have mistakes. Thus, despite the unprecedented opportunity
to access the gigantic databases, conflicting information derived from
such archives requires careful scrutiny to weed out false positives
including sequencing errors, annotation mistakes, pseudogenes, multiple
copies of recently duplicated and highly homologous genes
(e.g. hCG-β), and a combination of these common artifacts.
These pitfalls are particularly relevant when one is studying families
of proteins with close sequence homology or is focusing on detailed
features (such as polymorphism) of sequences.
The relatedness of diverse genes could theoretically be decided based
on pairwise sequence comparison and motif search programs. Although
sequences that share less than 15–20% identity could have
sequence-function relatedness, most genes with this level of identity
are likely to perform distinct functions. In addition, the majority
of sequenced genes in the GenBank either have limited sequence
similarity to known genes or have no sequence references. It is
becoming clear that sequence-independent approaches are required to
augment the powerful sequence-based gene discovery.
INTEGRATION OF GENOMIC INFORMATION AND KNOWN LITERATURE ON GENE FUNCTION
With the anticipated sequencing of the human genome, future gene
discovery will no longer require the physical cloning of new genes but will
involve consolidation and integration of existing sequences and the
identification of gene functions. To gain insight into the potential
roles of individual genes in our genome, several new approaches have
been developed to facilitate the organization and dissemination of data
from GenBank and the existing literature, as well as to streamline the
data extraction process for identifying gene functions.
SEQUENCE-BASED CLUSTERING OF INDIVIDUAL DNA SEQUENCES
Multiple sequences derived from the same gene were submitted to
the GenBank by different investigators, and a given gene may have
different entries (e.g. contiguous and noncontiguous genomic
sequences, mRNA sequences with alternative splicing, and ESTs). Because
it is not yet routine to identify all genomic regions that are
transcribed, the integration of EST sequences and genomic sequences is
necessary to identify all coding regions. To consolidate all sequences
derived from the same gene, a number of databases such as UniGene,
STACK, and TIGR have been set up (Table 2) (71). The UniGene collection
has grouped more than 1.8 million ESTs of human origin into clusters of
unique sequences, each representing the transcription product of a
distinct human gene. This collection provides nonredundant mapping
candidates for generating gene expression profiles in diverse tissues.
Moreover, the organization of gene sequences in this format has allowed
investigators to perform virtual library subtraction experiments in a
tissue- or cell-specific manner across the whole body through simple
query (72).
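A virtual library subtraction of the kind described above can be pictured as a comparison of EST counts per cluster between two libraries. The Python sketch below is a hypothetical simplification: the cluster names, counts, and the `min_target` cutoff are invented for illustration, and it does not reproduce the query interface of UniGene or any other database.

```python
def virtual_subtraction(target_counts, background_counts, min_target=3):
    """List clusters represented in the target library but absent elsewhere.

    target_counts and background_counts map a cluster identifier to the
    number of ESTs from that cluster found in each library (assumed input).
    """
    return sorted(
        cluster
        for cluster, n in target_counts.items()
        if n >= min_target and background_counts.get(cluster, 0) == 0
    )


# Invented EST counts per cluster for two cDNA libraries.
ovary_library = {"cluster_1": 12, "cluster_2": 5, "cluster_3": 1, "cluster_4": 7}
pooled_other_tissues = {"cluster_1": 40, "cluster_3": 2}

ovary_enriched = virtual_subtraction(ovary_library, pooled_other_tissues)
print(ovary_enriched)
```

EST counts from non-normalized libraries are only a crude proxy for transcript abundance, so candidates found this way still need confirmation at the bench.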
Due to the recent rapid expansion of HTG sequences, it is anticipated
that the majority of DNA sequences in the GenBank will be in the form
of unfinished genomic sequences in the near future. Because these
sequences are derived from single or few reads of DNA clones, they are
unordered, unoriented, and contain gaps and errors. To analyze this
type of sequence, sophisticated algorithms that integrate the analysis
of different gene features have been developed. For example, the
Genotator and NIX packages provide workbenches for automated sequence
annotation and annotation browsing (Table 2). The programs integrate
multiple gene-finding tools, homology searches, and algorithms for
identifying promoters, splice sites, and open reading frames. The
results are presented using a graphic browser to facilitate gene
analysis. The tradeoff of such convenience is the lack of options to
alter the stringency of analysis. Moreover, even the best tools rarely
perform at more than 75% accuracy. Indeed, recent studies indicated
that, in addition to the 6,200 predicted genes, 300 novel genes can be
identified in yeast using genome-wide transposon disruption; these
genes were not recognized based on the most sophisticated algorithm
tools (73).
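One of the simplest gene features mentioned above, the open reading frame, can be located with a short scan. The Python sketch below is an illustration only and is not the method used by Genotator or NIX: it assumes an uninterrupted coding sequence, scans the forward strand alone, and uses a made-up example sequence and a hypothetical `min_codons` cutoff.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon; end is
    exclusive and includes the stop. min_codons counts codons before the
    stop. Introns and the reverse strand are ignored in this sketch.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


sequence = "CCATGAAATTTGGGTAACC"  # made-up example
orfs = find_orfs(sequence)
print(orfs, [sequence[s:e] for s, e in orfs])
```

The distance between such a scan and a real gene finder, which must also model splice sites and promoters, is one reason the prediction accuracy of even integrated tools remains limited.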
INTEGRATION OF GENE SEQUENCES WITH TEXT-BASED LITERATURE ON GENE FUNCTION
With the explosion of gene sequences, one of the major
challenges facing genomic researchers today is the integration of
sequence data with the vast and growing body of literature based on
functional analyses of genes. The implementation of a user-friendly
interface for easy query of diverse databases is necessary for both
novice and experienced investigators who seek to sort out the
expanding knowledge base of gene sequence and functions. The Online
Mendelian Inheritance in Man (OMIM) database
(http://www.ncbi.nlm.nih.gov/Omim/) provides a searchable web-based
dynamic database of human genes and genetic disorders. Each gene entry
contains textual information and hypertext links to the GenBank
database and PubMed, allowing quick reference checks for most known
genes. In addition, multiple online databases for specific gene groups
or tissues have been developed. For example, several databases (GPCRDB,
GCRDb, GRAP, ORDB, CORD, and Swiss Model 7TM Interface, accessible
through the portal site http://www.opioid.umn.edu/links.html) organize
information on the G protein-coupled receptors into searchable
databases, allowing users to browse and analyze the molecular and
physiological data of this largest protein family in eukaryotic
organisms (74). Likewise, the Nuclear Receptor Resource (NRR) Project
(http://nrr.georgetown.edu/NRR/NRR.html) is a collection of individual
databases on members of the steroid and thyroid hormone receptor
superfamily. In addition, we have developed a database on ovarian
genes. The Ovarian Kaleidoscope Database (http://ovary.stanford.edu/)
provides information regarding the biological function, expression
pattern, and regulation of genes expressed in the ovary. It also serves
as a gateway to other online information resources offering data about
nucleotide and amino acid sequences, chromosomal localization, human
and murine mutation phenotypes, and biomedical publications
relevant to ovarian research. For all these databases, gene sequence
and functional information are interlinked. Continuing development of
databases and portal sites focusing on specific biomedical areas would
allow easy access to updated and organized discipline-specific
information and perhaps make the analysis of gene sequences part of
routine literature searches.
FUTURE IDENTIFICATION OF GENE FUNCTIONS
Among the human genes, fewer than 8% are currently
considered to be known genes with different levels of functional
characterization (75). The inadequacy of a sequence alignment-based
approach to determine the functions of most sequenced genes has
prompted the development of a variety of experimental procedures to
utilize the genomic information more efficiently and to provide
alternative ways for the annotation of novel genes. With the
anticipated availability of entire human genome sequences, the
traditional positional cloning approach will be replaced by virtual
chromosomal walking of the ordered genome sequences. Furthermore, the
present single gene knockout approach for gene function identification
will be replaced by random genome-wide gene deletion
(http://www.lexgen.com/k) or targeted gene trapping (76, 77). The
secretory trap method based on capturing the N-terminal signal peptide
sequence of an endogenous gene to generate an active reporter fusion
protein has allowed the generation of mice with a deletion of genes
encoding membrane and secreted proteins (77, 78, 79, 80). Among the several
hundred mutant strains generated
(http://socrates.berkeley.edu/
skarnes/resource.html), many are
defective in ligand or receptor genes important for endocrine research.
Integration of this collection of mutant mice with the expanding
GenBank database will continue to allow characterization of new
hormones and receptors.
DNA ARRAY FOR EXPRESSION PROFILING AND POLYMORPHISM SEARCH
The development of high-density DNA microarrays has allowed the
analysis of the expression of thousands of genes simultaneously (81).
By combining this technology with the vast EST collection,
investigators now can quickly group genes showing similar expression
profiles. The DNA array method has allowed the genotyping of
physiological processes and the understanding of regulatory mechanisms
in a global manner without detailed functional analysis of individual
genes. For example, one can detect changes of gene expression based on
hybridization to cDNAs from different physiological states (before and
after endocrine ablation), disease progression stages (normal vs.
diseased), and treatment periods (pre- vs. post-treatment) (82).
In addition, the sequencing-by-hybridization approach
based on array analysis of overlapping oligomers corresponding to the
entire sequence of a known gene allows detection of mutations or
polymorphism of individual genes (83).
FACILITATION OF PROTEIN DISCOVERY USING GENBANK SEARCH
The classical protein purification approach for identifying
endocrine proteins will be accelerated by matching EST databases with
peptide sequences derived from two-dimensional gel electrophoresis
analysis followed by peptide mass fingerprinting with MALDI-TOF
(matrix-assisted laser-desorption-ionization-time-of-flight) mass
spectrometry. Modern mass spectrometers can provide sufficient
information to allow unique recognition of protein fragments, as well
as detection of secondary modifications such as phosphorylation and
glycosylation. Alternatively, it may be possible to devise chip
detectors for ligands, receptors, and intracellular signaling
mediators. In addition, integration of a genome database with protein
coprecipitation and global yeast two-hybrid interaction approaches
should allow the assembly of comprehensive interaction maps of genomes
(84, 85) and the routine identification of downstream components of
cell surface receptors in mammalian cells (86).
THREE-DIMENSIONAL STRUCTURE COMPARISON
Because the folding of a protein is the result of a collective
interaction among its constituent amino acid residues, comparison of
proteins based on their secondary or tertiary structures represents an
alternative approach for predicting gene functions. Important
information can be extracted regarding the fold of a protein embedded
in the length and arrangement of the predicted helices, strands, and
coils along the polypeptide chain. As the number of polypeptides with
determined secondary or tertiary structures increases, this approach
will become more useful for identifying the function of uncharacterized
genes (87, 88).
INTEGRATION OF EXPRESSION, PATHWAY, AND PHYLOGENETIC PROFILING
By phylogenetic comparison of sequence data with experimental data
on correlated mRNA expression patterns, protein-protein interactions,
or protein functions in a given species, the researcher is able to
infer protein functions through properties other than sequence
similarity (89, 90, 91, 92). Studies on genomes of lower organisms have
demonstrated that a great number of genes are created through multiple
fusion events during evolution, and genes that have fused are likely to
interact with each other in the same signaling pathway (93, 94, 95). In
addition, previously unidentified genes associated with prostate
cancer, steroid synthesis, insulin synthesis, and neurotransmitter
processing can be discovered by using a guilt-by-association approach
based on coordinated expression of genes in different cDNA libraries
(96).
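The guilt-by-association idea can be sketched as a correlation of expression profiles across cDNA libraries. The Python example below is a hedged illustration rather than the procedure of the cited study: the gene names, EST counts, and the `min_r` cutoff are invented, and Pearson correlation is simply one reasonable choice of similarity measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def associated_genes(profiles, known_gene, min_r=0.9):
    """Genes whose profile correlates with known_gene at or above min_r."""
    reference = profiles[known_gene]
    return sorted(
        gene
        for gene, profile in profiles.items()
        if gene != known_gene and pearson(reference, profile) >= min_r
    )


# Invented EST counts for three genes across five cDNA libraries.
profiles = {
    "known_marker": [0, 0, 12, 1, 0],
    "novel_gene_x": [0, 1, 9, 0, 0],
    "novel_gene_y": [5, 4, 0, 6, 5],
}
candidates = associated_genes(profiles, "known_marker")
print(candidates)
```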
CONCLUSIONS
Historically, most endocrine factors have been defined based on
phenotypic changes. As the genome projects progress, the massive
archives in the GenBank represent a golden opportunity to discover new
hormones, receptors, and signaling molecules. Through sequence
analysis, novel paralogous endocrine genes can be isolated and
polymorphisms of known genes can be identified. Although
gene sequence-based approaches cannot completely replace the
traditional biochemical and physiological characterization of endocrine
mediators, they have greatly facilitated the identification and
analysis of new genes. The ongoing shift from investigating one
gene at a time to a global approach necessitates new methods to
integrate the explosion of knowledge on gene sequences, transcript
expression profiles, and protein functions and interactions. Although
all human genes will be known in a few years, endocrinologists still
face the major challenge of elucidating the physiological roles of all
hormones, receptors, and signaling mediators. The greatest challenge
will be to decipher the logical circuitry controlling all endocrine
pathways.
ACKNOWLEDGMENTS
We thank Ms. Caren Spencer for editorial assistance.
FOOTNOTES
Address requests for reprints to: Dr. Aaron J. W. Hsueh, Department of Gynecology/Obstetrics, Stanford University School of Medicine, Division of Reproductive Biology, 300 Pasteur Drive, Room A-344, Stanford, California 94305-5317.
Work from our laboratory was supported by NIH Grants HD-23273 and
HD-31398. The Ovarian Kaleidoscope Database is supported by the
Specialized Cooperative Centers Program in Reproduction Research,
NICHD, NIH.
Received for publication January 25, 2000.
Revision received February 28, 2000.
Accepted for publication March 1, 2000.
REFERENCES
-
Collins FS, Patrinos A, Jordan E, Chakravarti A,
Gesteland R, Walters L 1998 New goals for the U.S. Human Genome
Project: 1998-2003. Science 282:682-689
-
Lander ES 1996 The new genomics: global views of biology.
Science 274:536-539
-
Boguski MS, Schuler GD 1995 ESTablishing a human transcript
map. Nat Genet 10:369-371
-
Schuler GD, Boguski MS, Stewart EA, Stein LD, Gyapay G, Rice
K, White RE, Rodriguez-Tome P, Aggarwal A, Bajorek E, Bentolila S,
Birren BB, Butler A, Castle AB, Chiannilkulchai N, Chu A, Clee C,
Cowles S, Day PJ, Dibling T, Drouot N, Dunham I, Duprat S, East C,
Hudson TJ, et al. 1996 A gene map of the human
genome. Science 274:540-546
-
Margolis B 1994 The GRB family of SH2 domain proteins. Prog
Biophys Mol Biol 62:223-244
-
Tsuchida K, Mathews LS, Vale WW 1993 Cloning and
characterization of a transmembrane serine kinase that acts as an
activin type I receptor. Proc Natl Acad Sci USA 90:11242-11246
-
Koenig BB, Cook JS, Wolsing DH, Ting J, Tiesman JP, Correa
PE, Olson CA, Pecquet AL, Ventura F, Grant RA 1994 Characterization and cloning of a receptor for BMP-2 and BMP-4 from NIH
3T3 cells. Mol Cell Biol 14:5961-5974
-
Mathews LS, Vale WW 1991 Expression cloning of an activin
receptor, a predicted transmembrane serine kinase. Cell 65:973-982
-
Takemoto Y, Furuta M, Sato M, Hashimoto Y 1997 A simple
improvement in expression cloning. DNA Cell Biol 16:797-799
-
Franzen P, ten Dijke P, Ichijo H, Yamashita H, Schulz P,
Heldin CH, Miyazono K 1993 Cloning of a TGFβ type I receptor that
forms a heteromeric complex with the TGFβ type II receptor. Cell 75:681-692
-
McLaughlin MM, Kumar S, McDonnell PC, Van Horn S, Lee JC, Livi
GP, Young PR 1996 Identification of mitogen-activated protein (MAP)
kinase-activated protein kinase-3, a novel substrate of CSBP p38 MAP
kinase. J Biol Chem 271:8488-8492
-
Witczak O, Skalhegg BS, Keryer G, Bornens M, Tasken K, Jahnsen
T, Orstavik S 1999 Cloning and characterization of a cDNA encoding an
A-kinase anchoring protein located in the centrosome, AKAP450. EMBO J 18:1858-1868
-
Kawabata M, Chytil A, Moses HL 1995 Cloning of a novel type II
serine/threonine kinase receptor through interaction with the type I
transforming growth factor-β receptor. J Biol Chem 270:5625-5630
-
Chandrasekharappa SC, Guru SC, Manickam P, Olufemi SE, Collins
FS, Emmert-Buck MR, Debelenko LV, Zhuang Z, Lubensky IA, Liotta LA,
Crabtree JS, Wang Y, Roe BA, Weisemann J, Boguski MS, Agarwal SK,
Kester MB, Kim YS, Heppner C, Dong Q, Spiegel AM, Burns AL, Marx SJ 1997 Positional cloning of the gene for multiple endocrine
neoplasia-type 1. Science 276:404-407
-
Bruccoleri RE, Dougherty TJ, Davison DB 1998 Concordance
analysis of microbial genomes. Nucleic Acids Res 26:4482-4486
-
Chervitz SA, Aravind L, Sherlock G, Ball CA, Koonin EV, Dwight
SS, Harris MA, Dolinski K, Mohr S, Smith T, Weng S, Cherry JM, Botstein
D 1998 Comparison of the complete protein sets of worm and yeast:
orthology and divergence. Science 282:2022-2028
-
Botstein D, Chervitz SA, Cherry JM 1997 Yeast as a model
organism. Science 277:1259-1260
-
Wolfe KH, Shields DC 1997 Molecular evidence for an ancient
duplication of the entire yeast genome. Nature 387:708-713
-
Blaxter M 1998 Caenorhabditis elegans is a nematode. Science 282:2041-2046
-
Ruvkun G, Hobert O 1998 The taxonomy of developmental control
in Caenorhabditis elegans. Science 282:2033-2041
-
Meyer A, Schartl M 1999 Gene and genome duplications in
vertebrates: the one-to-four (-to-eight in fish) rule and the evolution
of novel gene functions. Curr Opin Cell Biol 11:699-704
-
Pebusque MJ, Coulier F, Birnbaum D, Pontarotti P 1998 Ancient
large-scale genome duplications: phylogenetic and linkage analyses shed
light on chordate genome evolution. Mol Biol Evol 15:1145-1159
-
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ 1990 Basic
local alignment search tool. J Mol Biol 215:403-410
-
Altschul SF, Lipman DJ 1990 Protein database searches for
multiple alignments. Proc Natl Acad Sci USA 87:5509-5513
-
Pearson WR, Lipman DJ 1988 Improved tools for biological
sequence comparison. Proc Natl Acad Sci USA 85:2444-2448
-
Kliewer SA, Lehmann JM, Willson TM 1999 Orphan nuclear
receptors: shifting endocrinology into reverse. Science 284:757-760
-
Kliewer SA, Lehmann JM, Milburn MV, Willson TM 1999 The PPARs
and PXRs: nuclear xenobiotic receptors that define novel hormone
signaling pathways. Recent Prog Horm Res 54:345-367
-
Willy PJ, Umesono K, Ong ES, Evans RM, Heyman RA, Mangelsdorf
DJ 1995 LXR, a nuclear receptor that defines a distinct retinoid
response pathway. Genes Dev 9:1033-1045
-
Mangelsdorf DJ, Evans RM 1995 The RXR heterodimers and orphan
receptors. Cell 83:841-850
-
McPherron AC, Lee SJ 1993 GDF-3 and GDF-9: two new members of
the transforming growth factor-β superfamily containing a novel
pattern of cysteines. J Biol Chem 268:3444-3449
-
Wozney JM 1992 The bone morphogenetic protein family and
osteogenesis. Mol Reprod Dev 32:160-167
-
Crosier PS, Lewis PM, Hall LR, Vitas MR, Morris CM, Beier DR,
Wood CR, Crosier KE 1994 Isolation of a receptor tyrosine kinase (DTK)
from embryonic stem cells: structure, genetic mapping and analysis of
expression. Growth Factors 11:125-136
-
Crosier PS, Hall LR, Vitas MR, Lewis PM, Crosier KE 1995 Identification of a novel receptor tyrosine kinase expressed in acute
myeloid leukemic blasts. Leuk Lymphoma 18:443-449
-
Iwama A, Okano K, Sudo T, Matsuda Y, Suda T 1994 Molecular
cloning of a novel receptor tyrosine kinase gene, STK, derived from
enriched hematopoietic stem cells. Blood 83:3160-3169
-
Mathews LS 1994 Activin receptors and cellular signaling by
the receptor serine kinase family. Endocr Rev 15:310-325
-
Ryden M, Imamura T, Jornvall H, Belluardo N, Neveu I, Trupp M,
Okadome T, ten Dijke P, Ibanez CF 1996 A novel type I receptor
serine-threonine kinase predominantly expressed in the adult central
nervous system. J Biol Chem 271:30603-30609
-
Tsuchida K, Sawchenko PE, Nishikawa S, Vale WW 1996 Molecular
cloning of a novel type I receptor serine/threonine kinase for the
TGFβ superfamily from rat brain. Mol Cell Neurosci 7:467-478
-
Parnet P, Garka KE, Bonnert TP, Dower SK, Sims JE 1996 IL-1Rrp
is a novel receptor-like molecule similar to the type I interleukin-1
receptor and its homologues T1/ST2 and IL-1R AcP. J Biol Chem 271:3967-3970
-
Marchese A, Docherty JM, Nguyen T, Heiber M, Cheng R, Heng HH,
Tsui LC, Shi X, George SR, O'Dowd BF 1994 Cloning of human genes
encoding novel G protein-coupled receptors. Genomics 23:609-618
-
Marchese A, George SR, Kolakowski Jr LF, Lynch KR, O'Dowd BF 1999 Novel GPCRs and their endogenous ligands: expanding the boundaries
of physiology and pharmacology. Trends Pharmacol Sci 20:370-375
-
O'Dowd BF, Nguyen T, Jung BP, Marchese A, Cheng R, Heng HH,
Kolakowski Jr LF, Lynch KR, George SR 1997 Cloning and chromosomal
mapping of four putative novel human G-protein-coupled receptor genes.
Gene 187:75-81
-
Tensen CP, Van Kesteren ER, Planta RJ, Cox KJ, Burke JF, van
Heerikhuizen H, Vreugdenhil E 1994 A G protein-coupled receptor with
low density lipoprotein-binding motifs suggests a role for lipoproteins
in G-linked signal transduction. Proc Natl Acad Sci USA 91:4816-4820
-
Hauser F, Nothacker HP, Grimmelikhuijzen CJ 1997 Molecular cloning, genomic organization, and developmental
regulation of a novel receptor from Drosophila melanogaster
structurally related to members of the thyroid-stimulating hormone,
follicle-stimulating hormone, luteinizing hormone/choriogonadotropin
receptor family from mammals. J Biol Chem 272:1002-1010
-
Kudo M, Chen T, Nakabayashi K, Hsu SY, Hsueh AJ 2000 The
nematode leucine-rich repeat-containing, G protein-coupled receptor
(LGR) protein homologous to vertebrate gonadotropin, thyrotropin
receptors is constitutively active in mammalian cells. Mol Endocrinol 14:272-284
-
Kudo M, Osuga Y, Kobilka BK, Hsueh AJW 1996 Transmembrane
regions V and VI of the human luteinizing hormone receptor are required
for constitutive activation by a mutation in the third intracellular
loop. J Biol Chem 271:22470-22478
-
Laue L, Chan WY, Hsueh AJ, Kudo M, Hsu SY, Wu SM, Blomberg L,
Cutler Jr GB 1995 Genetic heterogeneity of constitutively activating
mutations of the human luteinizing hormone receptor in familial
male-limited precocious puberty. Proc Natl Acad Sci USA 92:1906-1910
-
Kopp P, Jameson JL, Roe TF 1997 Congenital nonautoimmune
hyperthyroidism in a nonidentical twin caused by a sporadic germline
mutation in the thyrotropin receptor gene. Thyroid 7:765-770
-
Hsu SY, Liang SG, Hsueh AJ 1998 Characterization of two LGR
genes homologous to gonadotropin and thyrotropin receptors with
extracellular leucine-rich repeats and a G protein-coupled,
seven-transmembrane region. Mol Endocrinol 12:1830-1845
-
Wells TN, Peitsch MC 1997 The chemokine information source:
identification and characterization of novel chemokines using the
WorldWideWeb and expressed sequence tag databases. J Leukoc Biol 61:545-550
-
Yoshie O, Imai T, Nomiyama H 1997 Novel lymphocyte-specific CC
chemokines and their receptors. J Leukoc Biol 62:634-644
-
Hieshima K, Imai T, Opdenakker G, Van Damme J, Kusuda J, Tei
H, Sakaki Y, Takatsuki K, Miura R, Yoshie O, Nomiyama H 1997 Molecular
cloning of a novel human CC chemokine liver and activation-regulated
chemokine (LARC) expressed in liver. Chemotactic activity for
lymphocytes and gene localization on chromosome 2. J Biol Chem 272:5846-5853
-
Yang JY, Spanaus KS, Widmer U 2000 Cloning, characterization,
genomic organization of Lcc-1 (Scya16), a novel human CC chemokine
expressed in liver. Cytokine 12:101-109
-
Rock FL, Hardiman G, Timans JC, Kastelein RA, Bazan JF 1998 A
family of human receptors structurally related to Drosophila Toll. Proc
Natl Acad Sci USA 95:588-593
-
Chaudhary PM, Ferguson C, Nguyen V, Nguyen O, Massa HF, Eby M,
Jasmin A, Trask BJ, Hood L, Nelson PS 1998 Cloning and characterization
of two Toll/interleukin-1 receptor-like genes TIL3 and TIL4: evidence
for a multi-gene receptor family in humans. Blood 91:4020-4027
-
Yang RB, Mark MR, Gray A, Huang A, Xie MH, Zhang M,
Goddard A, Wood WI, Gurney AL, Godowski PJ 1998 Toll-like receptor-2
mediates lipopolysaccharide-induced cellular signalling. Nature 395:284-288
-
Hsu SY 1999 Cloning of two novel mammalian paralogs of
relaxin/insulin family proteins and their expression in testis and
kidney. Mol Endocrinol 13:2163-2174
-
Leo CP, Hsu SY, McGee EA, Salanova M, Hsueh AJ 1998 DEFT, a
novel death effector domain-containing molecule predominantly expressed
in testicular germ cells. Endocrinology 139:4839-4848
-
Wang DG, Fan JB, Siao CJ, Berno A, Young P, Sapolsky R,
Ghandour G, Perkins N, Winchester E, Spencer J, Kruglyak L, Stein L,
Hsie L, Topaloglou T, Hubbell E, Robinson E, Mittmann M, Morris MS,
Shen N, Kilburn D, Rioux J, Nusbaum C, Rozen S, Hudson TJ, Lander ES 1998 Large-scale identification, mapping, and genotyping of
single-nucleotide polymorphisms in the human genome. Science 280:1077-1082
-
Smigielski EM, Sirotkin K, Ward M, Sherry ST 2000 dbSNP: a
database of single nucleotide polymorphisms. Nucleic Acids Res 28:352-355
-
Evans WE, Relling MV 1999 Pharmacogenomics: translating
functional genomics into rational therapeutics. Science 286:487-491
-
Ingelman-Sundberg M, Oscarson M, McLellan RA 1999 Polymorphic
human cytochrome P450 enzymes: an opportunity for individualized drug
treatment. Trends Pharmacol Sci 20:342-349
-
Marshall A 1997 Laying the foundations for personalized
medicines. Nat Biotechnol 15:954-957
-
Liu R, Paxton WA, Choe S, Ceradini D, Martin SR, Horuk R,
MacDonald ME, Stuhlmann H, Koup RA, Landau NR 1996 Homozygous defect in
HIV-1 coreceptor accounts for resistance of some multiply-exposed
individuals to HIV-1 infection. Cell 86:367-377
-
Buscher R, Herrmann V, Insel PA 1999 Human adrenoceptor
polymorphisms: evolving recognition of clinical importance. Trends
Pharmacol Sci 20:94-99
-
Nakano Y, Oshima T, Watanabe M, Matsuura H, Kajiyama G, Kambe
M 1997 Angiotensin I-converting enzyme gene polymorphism and acute
response to captopril in essential hypertension. Am J Hypertens 10:1064-1068
-
Haas M, Yilmaz N, Schmidt A, Neyer U, Arneitz K, Stummvoll HK,
Wallner M, Auinger M, Arias I, Schneider B, Mayer G 1998 Angiotensin-converting enzyme gene polymorphism determines the
antiproteinuric and systemic hemodynamic effect of enalapril in
patients with proteinuric renal disease. Austrian Study Group of the
Effects of Enalapril Treatment in Proteinuric Renal Disease. Kidney
Blood Press Res 21:66-69
-
Henrion D, Amant C, Benessiano J, Philip I, Plantefeve G,
Chatel D, Hwas U, Desmont JM, Durand G, Amouyel P, Levy BI 1998 Angiotensin II type 1 receptor gene polymorphism is associated with an
increased vascular reactivity in the human mammary artery in
vitro. J Vasc Res 35:356-362
-
Arranz MJ, Collier DA, Munro J, Sham P, Kirov G, Sodhi M,
Roberts G, Price J, Kerwin RW 1996 Analysis of a structural
polymorphism in the 5-HT2A receptor and clinical response to clozapine.
Neurosci Lett 217:177-178
-
Haavisto AM, Pettersson K, Bergendahl M, Virkamaki A,
Huhtaniemi I 1995 Occurrence and biological properties of a common
genetic variant of luteinizing hormone. J Clin Endocrinol Metab 80:1257-1263
-
Liu H, Chao D, Nakayama EE, Taguchi H, Goto M, Xin X,
Takamatsu JK, Saito H, Ishikawa Y, Akaza T, Juji T, Takebe Y, Ohishi T,
Fukutake K, Maruyama Y, Yashiki S, Sonoda S, Nakamura T, Nagai Y,
Iwamoto A, Shioda T 1999 Polymorphism in RANTES chemokine promoter
affects HIV-1 disease progression. Proc Natl Acad Sci USA 96:4581-4585
-
Miller RT, Christoffels AG, Gopalakrishnan C, Burke J, Ptitsyn
AA, Broveak TR, Hide WA 1999 A comprehensive approach to clustering of
expressed human gene sequence: the sequence tag alignment and consensus
knowledge base. Genome Res 9:1143-1155
-
Vasmatzis G, Essand M, Brinkmann U, Lee B, Pastan I 1998 Discovery of three genes specifically expressed in human prostate by
expressed sequence tag database analysis. Proc Natl Acad Sci USA 95:300-304
-
Ross-Macdonald P, Coelho PS, Roemer T, Agarwal S, Kumar A,
Jansen R, Cheung KH, Sheehan A, Symoniatis D, Umansky L, Heidtman M,
Nelson FK, Iwasaki H, Hager K, Gerstein M, Miller P, Roeder GS, Snyder
M 1999 Large-scale analysis of the yeast genome by transposon tagging
and gene disruption. Nature 402:413-418
-
Nakata K, Takai T, Kaminuma T 1999 Development of the receptor
database (RDB): application to the endocrine disruptor problem.
Bioinformatics 15:544-552
-
O'Brien SJ, Menotti-Raymond M, Murphy WJ, Nash WG, Wienberg
J, Stanyon R, Copeland NG, Jenkins NA, Womack JE, Marshall Graves JA 1999 The promise of comparative genomics in mammals. Science 286:458-481
-
Zambrowicz BP, Friedrich GA, Buxton EC, Lilleberg SL, Person
C, Sands AT 1998 Disruption and sequence identification of 2,000 genes
in mouse embryonic stem cells. Nature 392:608-611
-
Skarnes WC, Moss JE, Hurtley SM, Beddington RS 1995 Capturing
genes encoding membrane and secreted proteins important for mouse
development. Proc Natl Acad Sci USA 92:6592-6596
-
Durick K, Mendlein J, Xanthopoulos KG 1999 Hunting with traps:
genome-wide strategies for gene discovery and functional analysis.
Genome Res 9:1019-1025
-
Chowdhury K, Bonaldo P, Torres M, Stoykova A, Gruss P 1997 Evidence for the stochastic integration of gene trap vectors into the
mouse germline. Nucleic Acids Res 25:1531-1536
-
Stoykova A, Chowdhury K, Bonaldo P, Torres M, Gruss P 1998 Gene trap expression and mutational analysis for genes involved in the
development of the mammalian nervous system. Dev Dyn 212:198-213
-
Schena M, Shalon D, Davis RW, Brown PO 1995 Quantitative
monitoring of gene expression patterns with a complementary DNA
microarray. Science 270:467-470
-
Jennings EG, Young RA 1999 Genome expression on the World Wide
Web. Trends Genet 15:202-204
-
Ahrendt SA, Halachmi S, Chow JT, Wu L, Halachmi N, Yang SC,
Wehage S, Jen J, Sidransky D 1999 Rapid p53 sequence analysis in
primary lung cancer using an oligonucleotide probe array. Proc Natl
Acad Sci USA 96:7382-7387
-
Walhout AJ, Sordella R, Lu X, Hartley JL, Temple GF, Brasch
MA, Thierry-Mieg N, Vidal M 2000 Protein interaction mapping in C.
elegans using proteins involved in vulval development. Science 287:116-122
-
Blackstock WP, Weir MP 1999 Proteomics: quantitative and
physical mapping of cellular proteins. Trends Biotechnol 17:121-127
-
Pandey A, Podtelejnikov AV, Blagoev B, Bustelo XR, Mann M,
Lodish HF 2000 Analysis of receptor signaling pathways by mass
spectrometry: identification of Vav-2 as a substrate of the epidermal,
platelet-derived growth factor receptors. Proc Natl Acad Sci USA 97:179-184
-
Di Francesco V, Munson PJ, Garnier J 1999 FORESST: fold
recognition from secondary structure predictions of proteins.
Bioinformatics 15:131-140
-
Geetha V, Di Francesco V, Garnier J, Munson PJ 1999 Comparing
protein sequence-based and predicted secondary structure-based methods
for identification of remote homologs. Protein Eng 12:527-534
-
Niehrs C, Pollet N 1999 Synexpression groups in eukaryotes.
Nature 402:483-487
-
Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO,
Eisenberg D 1999 Detecting protein function and protein-protein
interactions from genome sequences. Science 285:751-753
-
Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA 1999 Protein
interaction maps for complete genomes based on gene fusion events.
Nature 402:86-90
-
Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D 1999 A combined algorithm for genome-wide prediction of protein
function. Nature 402:83-86
-
Zlokarnik G, Negulescu PA, Knapp TE, Mere L, Burres N,
Feng L, Whitney M, Roemer K, Tsien RY 1998 Quantitation of
transcription and clonal selection of single living cells with
β-lactamase as reporter. Science 279:84-88
-
Whitney M, Rockenstein E, Cantin G, Knapp T, Zlokarnik
G, Sanders P, Durick K, Craig FF, Negulescu PA 1998 A genome-wide
functional assay of signal transduction in living mammalian cells. Nat
Biotechnol 16:1329-1333
-
Rao A 1998 Sampling the universe of gene expression. Nat
Biotechnol 16:1311-1312
-
Walker MG, Volkmuth W, Sprinzak E, Hodgson D, Klingler T 1999 Prediction of gene function by genome-scale expression analysis:
prostate cancer-associated genes. Genome Res 9:1198-1203