Tools and strategies for physiological genomics: the Rat Genome Database
Simon N. Twigger1,2,
Dean Pasko1,
Jeff Nie1,
Mary Shimoyama1,
Susan Bromberg1,
Dan Campbell1,
Jiali Chen1,
Norberto dela Cruz1,
Chunyu Fan1,
Cindy Foote1,
Glenn Harris1,
Brian Hickmann1,
Yuan Ji1,
Weihong Jin1,
Dawei Li1,
Jedidiah Mathis1,
Nataliya Nenasheva1,
Rajni Nigam1,
Victoria Petri1,
Dorothy Reilly1,
Victor Ruotti1,
Eric Schauberger1,
Kathy Seiler1,
Ronit Slyper1,
Jennifer Smith1,
Weiye Wang1,
Wenhua Wu1,
Lan Zhao1,
Angela Zuniga-Meyer1,
Peter J. Tonellato1,2,
Anne E. Kwitek1,2 and
Howard J. Jacob1,2,3
1 Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, Wisconsin
2 Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin
3 Department of Pediatrics, Medical College of Wisconsin, Milwaukee, Wisconsin
ABSTRACT
The broad goal of physiological genomics research is to link genes to their functions using appropriate experimental and computational techniques. Modern genomics experiments enable the generation of vast quantities of data, and interpretation of this data requires the integration of information derived from many diverse sources. Computational biology and bioinformatics offer the ability to manage and channel this information torrent. The Rat Genome Database (RGD; http://rgd.mcw.edu) has developed computational tools and strategies specifically supporting the goal of linking genes to their functional roles in rat and, using comparative genomics, to human and mouse. We present an overview of the database with a focus on these unique computational tools and describe strategies for the use of these resources in the area of physiological genomics.
ontologies; comparative genomics; model organism; bioinformatics; physiological genomics
INTRODUCTION
RAT RESEARCH HAS BEEN revolutionized by the release of the draft genome sequence of the rat (8), completing the basic genome resources required to facilitate better understanding of disease in this animal model of critical clinical relevance. The rat is primarily known as a "physiological" model and is a dominant model in nutrition, neuroscience, pharmacology, toxicology, and physiology. Given the need to annotate the human genome with function, linking the rat into this process is a logical and necessary requirement if we are to accelerate improvements in health care. The Rat Genome Database (RGD) is the model organism database tasked with the curation and integration of genomic data with a focus on the use of the rat as a model system for the study of the genetic basis of complex phenotypes (6).
The rat provides a wide variety of experimental approaches for physiological genomics, greatly facilitated by the many genetically characterized strains of rat, each with unique phenotypic characteristics. A popular paradigm for the study of the genetic basis of complex phenotypes in the rat is shown in Fig. 1, illustrating the use of these specific inbred strains to identify genomic regions associated with a phenotype and the ultimate identification of the gene(s) responsible. Also highlighted in Fig. 1 is the translational nature of rat research, the need to relate results to human systems and the human genome. Devising effective ways to facilitate this translation between systems is one of the most significant biological and bioinformatic challenges being faced by the scientific community today. There are two approaches to this problem: translation of information from one system to the corresponding information in the other system via some form of translation table, or the adoption of a "lingua franca" such that both systems describe information in the same fashion, using agreed-upon terms and definitions. Comparative genomics relies on the first approach, translating genomic data from one system to another via homologous genomic regions. The development of biological ontologies epitomizes the second approach, the integration of data from multiple organisms via annotations from shared ontologies. Ontologies are structured vocabularies of terms with previously agreed-upon definitions, of which the Gene Ontology is perhaps the best-developed biological example (2). RGD employs both of these approaches in its curation and tool development, allowing the subsequent data integration.
Shared ontologies allow phenotype, disease, pathway, and gene functional data to be incorporated from multiple species; curated gene orthologs (genes evolved from a common ancestral gene that share significant sequence homology and usually identical function across 2 or more species) and comparative maps built using rat, mouse, and human genomes allow a direct biological connection between species. Together, these approaches underlie the RGD toolset, defining the primary use cases for the database and the data and tools presented.

Fig. 1. Overview of physiological genomics in the rat and the various translational technologies available to relate rat research to human systems. Left: traditional positional cloning techniques whereby 2 rats, 1 possessing the phenotype of interest and the other being a nonaffected control strain, are crossed, and the progeny are genotyped and phenotyped leading to the definition of quantitative trait loci (QTLs) linking the phenotype to specific regions of the genome. The regions are then examined for potential candidate genes, and comparative genomics is used to integrate evidence from other organisms or to translate results to the human genome. Translation of information to the human system is also becoming possible using informatic tools such as ontologies. Various ontologies are shown connecting elements of the experimental paradigm to related human data.
Like other databases, RGD curates and compiles data from published research but has gone further to provide researchers not only with the data but also with tools to accelerate the use of these data in their ongoing research. The close associations of active research groups with the RGD bioinformatics staff highlighted the needs of researchers to access and interpret these data in specific ways. This in turn led to the development of a variety of web-based applications targeted at the specific needs of researchers using the rat as a model system for physiological genomics. Like any toolbox, each tool has an appropriate use and is often best employed in concert with others to create the final result. Here we describe the bioinformatic resources available at RGD and present strategies for their application to physiological genomics.
Physiological genomics has been described as "research directed towards the understanding of the relationship of genes to complex physiological functions" (5). In this respect, much of the focus of RGD is on providing data and tools for physiological genomics research. To illustrate specific tools and their use, we have identified four prominent research "tracks" that comprise a significant cross-section of physiological genomics-related uses. These tracks are Positional Cloning, the use of genomic techniques to locate and identify gene(s) related to a phenotype of interest; Comparative Genomics, the analysis and comparison of genomes from different species; Expression Profiling, the use of techniques such as microarray analysis to detect gene expression differences related to a phenotype of interest; and Functional Genomics, experiments aimed at understanding the function of specific genes. These categories are somewhat artificial in that a research project will often utilize aspects of some or all of these tracks at the same time. However, they serve as a convenient framework for the following descriptions. Here we illustrate the interconnected nature of the tools and data and guide the reader to means of choosing the tools most likely to be applicable to his/her research.
MATERIALS AND METHODS
The RGD is an online resource available at http://rgd.mcw.edu. Little specific software is required to use the database beyond a standard web browser (e.g., Internet Explorer, Firefox, Safari, Mozilla). The Virtual Comparative Map (VCMap) tool does require that a Java virtual machine be present; however, this is almost certainly preinstalled on most desktop systems. For the server itself, the database uses Oracle 9i, and the website is driven by Oracle Application Server using an embedded Apache web server to serve standard web pages and Common Gateway Interface (CGI) scripts, and a Java container to host various Java Server Pages and servlets. The public website runs on a SunFire V480 server [four 900-MHz processors, 16 GB of random access memory (RAM) with 1.2 GB devoted to the database, 72 GB hard disk] running Solaris 2.9. Many of the RGD tools presented here are Perl CGI scripts. Several tools use Java, including VCMap, MetaGene, RGD Advanced Search, RGD Quick Search, and GViewer, along with the standard interfaces for quantitative trait loci (QTLs) and simple sequence length polymorphisms (SSLPs). Most of the Java applications use Java Server Pages (JSP)/servlet technologies. The VCMap and MetaGene tools use Java applets to provide a graphic user interface to the comparative mapping data and sequence data. RGD utilizes a variety of third-party tools, most notably the Gbrowse genome browser (v1.61) created by Lincoln Stein (17). In addition, RGD provides access to standard sequence analysis tools such as BLAST-Like Alignment Tool (BLAT) (11), Basic Local Alignment Search Tool (BLAST) (1), and RepeatMasker (http://www.repeatmasker.org). The database project may be cited by referring to the most recent article (6), and the best methods for citing specific resources are described in the guidelines posted online (http://rgd.mcw.edu/cite.shtml).
RGD data.
The major types of data curated by RGD are shown in Fig. 2. The focus of the database is on the curation of rat genes and their function, and to this end we also curate information about rat strains, SSLP (also known as microsatellite) markers that are often used in the mapping of genes to specific phenotypes, and the QTLs that result from these studies. For an excellent outline of the experimental techniques involved in the genetic analyses of complex phenotypes in the rat, see the review by John Rapp (15). RGD also maintains mouse and targeted human QTL data to facilitate comparative QTL mapping. All mouse QTL data are downloaded on a regular basis from the Mouse Genome Database (4) and linked to the mouse genome by a mixture of techniques. For mouse QTLs with two flanking markers, the locations of these markers on the genome are used directly to define the QTL's position. In the case of QTLs with a single peak marker, the location of the peak marker is determined on the genome directly or by using the nearest mapped marker. The span of such a QTL is artificially set to 36 Mb (the average mouse QTL size for those QTLs with mapped flanking markers), with flanking markers selected from the genome at 18 Mb on either side of the peak location. Human QTL data for phenotypes related to popular areas of rat research are curated from the literature by the RGD curation team; flanking markers defining the span of the QTL are extracted from the relevant references. Human phenotypes targeted to date include arthritis, blood pressure, asthma, chronic obstructive pulmonary disease, obesity, and diabetes, with regular literature searches being performed to stay current with ongoing research.
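The two placement rules for mouse QTLs described above can be sketched in a few lines of Python. This is an illustrative sketch only, not RGD's actual code: the function and class names are hypothetical, and clamping the interval to the chromosome ends is an added assumption that the text does not mention.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DEFAULT_SPAN_BP = 36_000_000         # average mouse QTL size quoted in the text
HALF_SPAN_BP = DEFAULT_SPAN_BP // 2  # 18 Mb on either side of the peak marker


@dataclass
class QtlInterval:
    chromosome: str
    start: int
    stop: int
    method: str


def place_qtl(chromosome: str,
              flank_positions: Optional[Tuple[int, int]] = None,
              peak_position: Optional[int] = None,
              chromosome_length: Optional[int] = None) -> QtlInterval:
    """Place a QTL on the genome from flanking markers or from a peak marker."""
    if flank_positions is not None:
        # Two mapped flanking markers define the interval directly.
        left, right = sorted(flank_positions)
        return QtlInterval(chromosome, left, right, "flanking markers")
    if peak_position is None:
        raise ValueError("need two flanking markers or a peak marker position")
    # Single peak marker: assign the default 36 Mb span centred on the peak.
    start = max(0, peak_position - HALF_SPAN_BP)   # assumption: clamp at 0
    stop = peak_position + HALF_SPAN_BP
    if chromosome_length is not None:              # assumption: clamp at chromosome end
        stop = min(stop, chromosome_length)
    return QtlInterval(chromosome, start, stop, "peak marker +/- 18 Mb")
```

Keeping the placement method on each interval makes it easy to tell measured spans from the artificial 36 Mb default in later comparisons.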
To aid in linking the genes and markers to the genome, various sets of mapping data are also maintained for published whole genome, genetic and radiation hybrid maps (12, 16) in addition to gene and marker sequences and the genomic locations of these sequences on the current build of the rat genome (release 3.1). The curation methods used at RGD are a mixture of manual and informatic processes. The informatic steps compare incoming data against existing curated data, flagging inconsistencies for manual review. The manual curation processes create much of the biological content of the database. Key facts are extracted from published articles and incorporated into the database as categorized notes or annotations from various controlled vocabularies such as Gene Ontology (2) and phenotype and disease and pathway ontologies in development at RGD.
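The informatic comparison step can be pictured with a minimal Python sketch. It assumes records are dictionaries keyed by a stable identifier. The field names and the shape of the returned flags are hypothetical, and the real RGD pipeline is not described in enough detail here to reproduce.

```python
def flag_inconsistencies(curated, incoming, fields=("symbol", "chromosome")):
    """Compare incoming records with curated ones and list items for manual review.

    Both arguments map a record identifier to a dict of field values.
    Returns tuples of (record_id, field_or_reason, curated_value, incoming_value).
    """
    flags = []
    for record_id, new in incoming.items():
        old = curated.get(record_id)
        if old is None:
            # Not yet in the curated set: a curator decides whether to add it.
            flags.append((record_id, "new record", None, None))
            continue
        for field in fields:
            if old.get(field) != new.get(field):
                flags.append((record_id, field, old.get(field), new.get(field)))
    return flags
```

Nothing is overwritten automatically in this sketch. Disagreements are only reported, which mirrors the division between informatic flagging and manual review described above.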

Fig. 2. List of curated data contained within the Rat Genome Database (RGD). Not shown are splice variants, which are curated as a subset of genes, and rat single nucleotide polymorphism (SNP) data, which are available via the RGD Gbrowse genome browser but are not currently part of the main RGD curated database.
RESULTS
RGD tools and strategies for physiological genomics.
Although not covered in detail here, RGD provides standard search interfaces for each of the types of data listed in Fig. 2. These provide core search functions specific to the data type, allowing varying degrees of search complexity and the ability to order and limit the returned results. Standard interfaces are useful for finding specific data objects (a particular gene or SSLP for example) or when the Quick Search returns too many hits and limiting parameters are needed.
The major research-oriented tools available on RGD are listed in Table 1 along with a brief description of each tool's primary function. Figure 3 provides a more extensive comparison between the tools, indicating their general functionality and their applicability to the four research tracks outlined below. Help documentation for all RGD data and tools is available on the RGD Help Pages (http://rgd.mcw.edu/tu/index.shtml). The documentation covers each tool discussed in this article, giving a definition of the tool, suggested uses, the required input data, and the available output formats.

Fig. 3. Comparison table of the tools: their functionality and applicability to the selected research tracks. A checkmark means that a tool provides the indicated functionality or is broadly applicable to projects on the indicated research track. ACP, Allele Characterization Project; BLAST, Basic Local Alignment Search Tool; BLAT, BLAST-Like Alignment Tool; VCMap, Virtual Comparative Map; RH, Radiation Hybrid.
Positional cloning.
An overview is shown in Fig. 4. The assumption is that a researcher wishes to positionally clone genes contributing to a phenotype exhibited by a particular inbred strain of rat. An initial search of RGD using the Quick Search could be used to find strains exhibiting an appropriate phenotype. Any existing QTLs for this phenotype would also suggest potential affected and control strains and would indicate known candidate regions and/or genes. Having selected two strains to cross, the researcher could use Genome Scanner to select polymorphic SSLP (or microsatellite) markers at regular intervals across the genome for use in the genetic mapping of the cross. This tool utilizes the extensive microsatellite allele data set in RGD derived from the Allele Characterization Project (ACP) undertaken at the Medical College of Wisconsin, in which >4,200 microsatellites were genotyped in 48 commonly used inbred rat strains (16). The SSLPs selected can be used to genotype the experimental progeny, resulting in the identification of a novel QTL for the phenotype being studied. Having identified a QTL region for further study, a variety of tools can be applied in the experimental analysis stage. Alternatively, the investigator may begin with the QTLs that have already been mapped and the intervals defined in RGD.
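The kind of selection Genome Scanner performs can be approximated by a simple greedy pass. The Python sketch below is a hedged illustration rather than the tool's algorithm, which is not described here. The record layout (`name`, `chromosome`, `position`, and `alleles` keyed by strain) is hypothetical, and the marker names and allele sizes in any test data are made up.

```python
def select_markers(markers, strain_a, strain_b, spacing_bp):
    """Pick markers polymorphic between two strains, roughly every spacing_bp.

    Each marker is a dict with 'name', 'chromosome', 'position' and 'alleles',
    where 'alleles' maps a strain name to its allele size in base pairs.
    """
    # Keep only markers typed in both strains and differing in allele size.
    informative = [
        m for m in markers
        if strain_a in m["alleles"] and strain_b in m["alleles"]
        and m["alleles"][strain_a] != m["alleles"][strain_b]
    ]
    informative.sort(key=lambda m: (m["chromosome"], m["position"]))

    chosen = []
    last_position = {}  # chromosome -> position of the last marker chosen
    for m in informative:
        previous = last_position.get(m["chromosome"])
        # Take the first informative marker on a chromosome, then the next one
        # that lies at least spacing_bp beyond the last marker taken.
        if previous is None or m["position"] - previous >= spacing_bp:
            chosen.append(m["name"])
            last_position[m["chromosome"]] = m["position"]
    return chosen
```

A greedy pass can leave gaps larger than the requested spacing where no polymorphic marker exists, so the resulting panel should be checked for coverage before use.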

Fig. 4. Strategies for using RGD in support of positional cloning experiments. GO, Gene Ontology; SSLP, simple sequence length polymorphism.
After defining a QTL, attention turns to the genes within that region to identify the positional candidate genes and to the genetic markers that may be needed for a higher-resolution mapping study. Three tools are particularly relevant at this stage: the Genome Browser, for graphic exploration of a region; the Genome Annotation Tool, to list and download raw sequence data for a region [accession nos. and basic information for expressed sequence tags (ESTs), mRNAs, genes, etc.]; and the Gene Annotation Tool, which lists functional annotation for the genes within the region.
The "Genome Browser" provides a convenient graphic method to identify genes, markers, single nucleotide polymorphisms (SNPs), other QTLs, congenic strains, and many other features present in the specified region. The Genome Browser has a novel functional annotation track that can depict ontology annotations, graphically showing such things as known phenotype, disease, and pathway associations useful for identifying potential candidate genes. The "Genome Annotation Tool" provides access to the individual sequence objects present in the region (sequence accession nos. for RefSeq and RGD gene records, SSLPs, ESTs, mRNAs), which can be downloaded and used in subsequent bioinformatic mining steps. For further analysis of the individual genes in the QTL region, the "Gene Annotation Tool" provides access to a broad spectrum of gene annotations drawn from RGD, EntrezGene (13), Universal Protein Resource (UniProt) (3), and Kyoto Encyclopedia of Genes and Genomes (KEGG) (10), including ontology annotations that may provide functional information about a gene. The specific steps needed to use these three tools for the annotation of a genomic region are shown in Box 1 (see APPENDIX). Readers are encouraged to try these examples to see the basic use of each tool and to use them as a starting point for the annotation of a region relevant to their own interests.
If similar phenotypes have been mapped to syntenic regions in mouse and human, this increases the confidence in the association in rat and facilitates translational studies; VCMap is the ideal tool for this role and is described in more detail in Comparative genomics, below. As an aid to guide further rat studies, an application unique to RGD, ACP Haplotyper, affords rat researchers the ability to compare the genomic haplotypes in the region among 48 inbred rat strains (18). On the basis of mapped SSLP allele data, ACP Haplotyper visually represents shared blocks of conservation among the 48 strains, illustrating regions where the genomes of these strains differ and which might potentially house genes responsible for the observed phenotypic differences.
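The idea of shared haplotype blocks can be illustrated with a short Python sketch that scans two strains' alleles along an ordered set of markers. It is a simplified pairwise version written under assumptions: ACP Haplotyper compares 48 strains, and its actual block definition is not given in this article. The function name and the `min_markers` threshold are hypothetical.

```python
def shared_blocks(alleles_a, alleles_b, min_markers=2):
    """Find runs of consecutive markers at which two strains share an allele.

    alleles_a and alleles_b are allele sizes for the same markers in map order;
    None marks a missing genotype. Returns (first_index, last_index) pairs.
    """
    n = min(len(alleles_a), len(alleles_b))
    blocks = []
    start = None  # index where the current shared run began
    for i in range(n):
        same = alleles_a[i] is not None and alleles_a[i] == alleles_b[i]
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_markers:
                blocks.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and n - start >= min_markers:
        blocks.append((start, n - 1))
    return blocks
```

The markers that fall between the returned blocks are where the two strains differ, which is where candidate genes for a phenotypic difference between them would be sought.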
Comparative genomics.
RGD has a wide variety of tools catering to comparative genomics and the use of both rat and mouse as models for human (Fig. 5). In addition to the curation of rat data, RGD curates mouse and human gene orthologs, mouse QTLs (from the Mouse Genome Database), and human QTL data and has an extensive integrated comparative mapping environment built around the VCMap tool (19). The incorporation of key data for all three organisms means that RGD is of great value to any researcher doing comparative genomics among any of the three species. This is especially powerful because of the annotations for phenotype, disease, and Gene Ontology (molecular function, biological process, cellular component) that are curated across all three species. Shared vocabularies provide the perfect framework for informatic comparisons between organisms and greatly enhance the potential data mining opportunities stemming from comparative studies.
Comparative genomics can be broken down into two phases: the initial identification of the primary region in organism 1, followed by identification and exploration of the homologous region(s) in organism 2. It is possible to enter RGD with gene, sequence, or QTL data from any of the three organisms and navigate to the homolog or homologous region in the other organism(s). VCMap provides the most comprehensive search and visualization features. A wide variety of maps (e.g., genetic, radiation hybrid, cytogenetic, genomic) can be loaded for all three organisms and integrated via the common comparative maps. Rat QTLs can be aligned with counterparts in human and mouse, corroborating the shared phenotypic association between the genomes that has been demonstrated in many previous cases (9). RGD QTL and genes are also linked to the Vista (7) and Evolutionary Conserved Region (ECR) Browsers (14), which provide two-way and multiple genome sequence comparisons, respectively. Traditional sequence comparison tools such as BLAST are also available to assist in homolog assignment for those genes currently without curated orthologs.
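Navigating from a position in one genome to the homologous region in another amounts to a lookup in a table of conserved blocks. The Python sketch below shows one minimal way to do this with linear interpolation inside a block. The block layout and field names are hypothetical, and VCMap's own comparative map construction is more involved than this.

```python
def translate_position(chromosome, position, blocks):
    """Map a position in genome 1 to an approximate position in genome 2.

    Each block is a dict with src_chr, src_start, src_end, dst_chr, dst_start,
    dst_end and strand ('+' or '-'). Returns (dst_chr, dst_position), or None
    when the position falls outside every conserved block.
    """
    for b in blocks:
        if b["src_chr"] != chromosome:
            continue
        if not (b["src_start"] <= position <= b["src_end"]):
            continue
        src_span = b["src_end"] - b["src_start"]
        dst_span = b["dst_end"] - b["dst_start"]
        # Fraction of the way through the source block (0 for a one-base block).
        fraction = (position - b["src_start"]) / src_span if src_span else 0.0
        offset = round(fraction * dst_span)
        if b["strand"] == "+":
            return b["dst_chr"], b["dst_start"] + offset
        # Inverted block: count back from the far end of the target block.
        return b["dst_chr"], b["dst_end"] - offset
    return None
```

Interpolation gives only an approximate coordinate. For a QTL, translating both flanking positions and taking the enclosed target interval is usually the more meaningful operation.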
Once the homologous region has been identified in the other organism(s), the tools are much the same as were described previously for QTL analysis after positional cloning. The significant addition is that the Gene Annotation Tool and VCMap contain data for rat, mouse, and human, providing a one-stop solution for data annotation across all three species.
Expression analysis.
The rat is a popular system for expression profiling studies, most often microarray analysis, particularly in areas such as toxicology where it is projected to become one of the primary test species for new compounds. RGD has a variety of tools that can be employed at various stages of an expression analysis project (Fig. 6). The various rat chips from the major manufacturers such as Affymetrix and Agilent are the most common platforms in use, so the role of RGD in these cases is to aid in the annotation of the probe sets. For researchers looking to design their own arrays or utilize alternative expression analysis techniques, the gene, EST, and sequence data contained in the database are of most interest. To create a targeted custom chip, one might use the Quick Search or Ontology Browser to locate genes with specific ontology annotations (e.g., genes found in a particular cellular component or involved in a specific pathway) or on a specific chromosome or chromosomal region.

Fig. 6. Strategies for using RGD in support of expression profiling experiments. EST, expressed sequence tag.
Once an experiment has been performed, annotation of differentially expressed sequences is the primary objective. For single genes or ESTs, the standard query interfaces provide the best route into the database, and tools such as VCMap and the Genome Browser will also work with individual identifiers. However, in most cases, a larger number of targets will have been identified, necessitating a bulk annotation of all the targets with key information for subsequent manual review. The Gene Annotation Tool (GA Tool) was designed with this role in mind and accepts a variety of inputs relevant to microarray analysis, including gene symbols, GenBank accession numbers, Affymetrix Probeset identifiers, EntrezGene identifiers, and SwissProt identifiers. The annotations returned for each input sequence can include data collated from RGD and the three other key databases, EntrezGene, UniProt, and KEGG. The GA Tool accepts input data from rat, mouse, and human, making it a very useful annotation application for expression studies in multiple organisms. The GA Tool output can be in the form of a flat file for subsequent analysis in Excel or other informatic tools or, alternatively, a HyperText Markup Language (HTML) table with hyperlinks to records in all the included databases. The HTML version can be saved to provide a convenient jumping-off point for further exploration of the results. Once a connection has been made to a gene record within RGD, the user can then follow many different paths for data mining as described in more detail below.
Functional genomics.
From a data analysis perspective, ascribing function to genes is required for one of two reasons: the function is broadly known but the gene that is responsible is unknown, or the gene is known but its function(s) is not. The former scenario is typical of candidate gene selection in a QTL region, and the latter is found when attempting to relate differentially expressed genes from a microarray experiment back to the original experimental condition. Approaches to solving these problems using RGD are shown in Fig. 7.
Much of the functional or "biological" content of the database is captured primarily in ontology annotations and also by text annotations describing features such as expression and regulation conditions curated from the scientific literature. The Ontology Browser can be used to query the database for genes associated with a particular phenotype, disease, pathway, or Gene Ontology term (covering molecular function, biological process, and cellular component). This will return a list of matching genes and QTLs with their positions on the genome visualized on the GViewer component. As an alternative, the Quick Search also utilizes ontology annotations and searches free text notes, which may return hits the Ontology Browser omits. Sequence searches using BLAST provide a further avenue for functional exploration; a BLAST search with an existing gene or domain possessing the function of interest may highlight related genes potentially possessing the same functional characteristics.
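Because ontologies are hierarchical, a query for a term is often most useful when it also gathers genes annotated to more specific child terms. The Python sketch below illustrates that traversal. Whether the Ontology Browser itself includes descendant terms is an assumption not confirmed by this article, and the term identifiers, gene symbols, and data structures are hypothetical.

```python
def genes_for_term(term, children, annotations):
    """Collect genes annotated to a term or to any of its descendant terms.

    children maps a term to its direct child terms.
    annotations maps a term to the gene symbols annotated directly to it.
    """
    genes = set()
    seen = set()      # guards against revisiting terms reachable by two paths
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        genes.update(annotations.get(current, ()))
        stack.extend(children.get(current, ()))
    return genes
```

Tracking visited terms matters because ontologies such as the Gene Ontology are directed acyclic graphs, so a term can have more than one parent.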
When presented with a gene and the need to acquire as much relevant information as possible about that gene and its function, many of the same tools are used but in the reverse direction. Ontology annotations available for that gene will give good indications about the known function; annotations available for homologs at other databases such as the Mouse Genome Database and EntrezGene may also be invaluable. VCMap may predict a homolog in mouse or human where one has not yet been curated by RGD staff. At this stage, it can be useful to utilize databases and tools outside of RGD for additional evidence. RGD genes are integrated and linked to many of the key biological data repositories, each of which contains a wealth of additional data worth investigating (Fig. 8).

Fig. 8. External databases and tools accessible from RGD gene records: 1) Univ. of California Santa Cruz (UCSC) Genome Browser, http://genome.ucsc.edu; 2) Mouse Genome Database, http://www.informatics.jax.org; 3) Gene Ontology (GO) Consortium, http://geneontology.org; 4) The Institute for Genomic Research (TIGR) Gene Indices, http://www.tigr.org/tdb/tgi/; 5) RatMap, http://ratmap.gen.gu.se; 6) Kyoto Encyclopedia of Genes and Genomes (KEGG), http://www.genome.jp/kegg/; 7) Universal Protein Resource (UniProt), http://www.uniprot.org/; 8) National Center for Biotechnology Information (NCBI), http://www.ncbi.nlm.nih.gov/; 9) VISTA Genome Browser, http://pipeline.lbl.gov/; 10) Ensembl, http://www.ensembl.org/; and 11) Evolutionary Conserved Region (ECR) Browser, http://ecrbrowser.dcode.org/.
DISCUSSION
The rat is a very important model for physiology and complex phenotype analyses and, with the recent release of the rat genome sequence, has become one of the premier systems for physiological genomics. The RGD curates and integrates published rat genomic data from the literature and other data repositories with the goal of providing a single unified data set and online environment for researchers utilizing the rat. The potential users and uses of the database have been expanded by comparative genomics as researchers seek to incorporate additional data from other species. As the needs of the users have grown and become more sophisticated, RGD has developed unique bioinformatic tools to better support these needs, making RGD a critical part of their research toolkit.
RGD contains much of the standard information available in most model organism databases: maps, genes, strains, markers, and sequences. However, a particular focus of the rat system is the mapping of complex phenotypes and the translation of these findings to human. In support of this, RGD is unique in incorporating QTL data from rat, mouse, and human and gene homolog data from mouse and human. The driving force behind much of the tool development has been to enable users to get at this data in ways that accelerate the research process. Tools such as Genome Scanner use the microsatellite data to directly aid in the selection of reagents for use in the lab; the Radiation Hybrid (RH) Mapserver takes PCR results from radiation hybrid mapping panels and returns genomic locations on the standard Medical College of Wisconsin RH map (12). Many of the other tools aim to assist in hypothesis generation by integrating and/or visualizing data in ways that help a researcher. The Genome Browser provides a way to comprehend the genomic environment of a gene and allows the investigator to upload their data onto the browser. ACP Haplotyper shows how the genomes of inbred strains compare with each other at the level of a haplotype, and VCMap provides comparisons with other species. Data annotation applications such as the Gene Annotation Tool package a wide variety of information for the interpretation of microarray results and, as with all data in RGD, provide links to external databases for further data exploration.
Rat research and rat data are part of the far bigger physiological genomics revolution, being driven by high-throughput experimental techniques and by large-scale bioinformatic data analysis. Comparative genomics provides the traditional "Rosetta Stone" for interspecies translations; however, the emerging ontologies and data standards are enabling "comparative informatics," and this promises to be an equally powerful tool in translational research. At the molecular level, tools like Gene Ontology are revolutionizing the ability of researchers to analyze gene function across multiple organisms. At a higher level, the mammalian phenotype ontology implemented in RGD is being codeveloped with the Mouse Genome Database and will result in standardized descriptions for phenotypes across the primary rodent model systems. As these are implemented in other databases such as the Medical College of Wisconsin PhysGen database of rat cardiovascular phenotypes (http://pga.mcw.edu) or in mouse mutagenesis databases, one can imagine being able to find not only genes, strains, and QTLs for a given phenotype but also available rat and mouse animals complete with phenotypic measurements and comparisons. As more of these standardized vocabularies are introduced and implemented across diverse data sources, more opportunities for physiological genomics tool development will be presented that should prove to be highly beneficial to the field.
APPENDIX
Box 1: Step-by-Step Example of Annotating a Genome Region
A common task for many researchers is the annotation of a genomic region that has been defined by QTL analysis or shared homology with another organism. The three main tools used for this analysis are the Genome Browser, the Genome Annotation Tool and the GA Tool. In this example, we use a region that we hypothesize has been identified by positional cloning and is flanked by the microsatellite markers D1Rat5 and D1Rat10. Each of these tools will also accept genomic coordinates to define the boundaries of the region of interest, in addition to the symbols of known genes or markers.
Genome annotation tool.
- Go to http://rgd.mcw.edu/sequenceresources/genome-annotation.shtml.
- Enter the flanking data for the region of interest: "D1Rat5D1Rat10."
- Select the sequence features you wish to retrieve for this region [one or more of EST, sequence tag site (STS), mRNA, Known Gene, and RefGene].
- Click "Submit."
An HTML page will be returned listing the sequence features in the region, ordered by their genomic location. In this example, using the data at time of writing (Genome Build v3.1), 243 STS markers and 48 Known Genes were found in this region. Along with the basic genomic data (symbol, chromosome, start and stop positions), known human and mouse homolog symbols are shown as are links to Online Mendelian Inheritance in Man (OMIM) for appropriate human genes.
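Conceptually, the region query above resolves the two flanking markers to coordinates and then lists every feature overlapping that interval in genomic order. The Python sketch below shows that logic on in-memory data. It does not call any RGD service, the data layout is hypothetical, and all marker names and coordinates in test data are placeholders rather than real map positions.

```python
def features_in_region(features, marker_positions, left_marker, right_marker):
    """List features overlapping the interval between two flanking markers.

    marker_positions maps a marker name to (chromosome, position).
    Each feature is a dict with 'name', 'chromosome', 'start' and 'stop'.
    """
    chrom_left, pos_left = marker_positions[left_marker]
    chrom_right, pos_right = marker_positions[right_marker]
    if chrom_left != chrom_right:
        raise ValueError("flanking markers must lie on the same chromosome")
    # Accept the markers in either order.
    start, stop = sorted((pos_left, pos_right))
    hits = [
        f for f in features
        if f["chromosome"] == chrom_left
        and f["start"] <= stop and f["stop"] >= start  # any overlap counts
    ]
    return sorted(hits, key=lambda f: f["start"])
```

Counting any overlap, rather than requiring a feature to sit wholly inside the interval, keeps genes that straddle a flanking marker in the candidate list.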
GA tool.
The GA Tool annotates a list of genes identified by their symbols or other accession numbers. It is unique in that it also integrates data from Entrez Gene, UniProt, and KEGG into the resulting annotation data. The GA Tool can also be used to annotate genes found within a particular region of the rat genome, identified by flanking markers or genomic coordinates (this option is not yet available for human and mouse regions).
- Go to http://rgd.mcw.edu/gatool/.
- Select "Chromosome Location" under the type of data being submitted.
- Select the sequence features (output types) that should be retrieved for the region of interest, e.g., "Ref.Seq."
- In the right-hand panel, enter the flanking marker data or chromosomal coordinates into the Chromosome Location box. In the example used here, enter "D1Rat5–D1Rat10" into the box.
- In the various database sections below, check off the information you would like annotated onto the sequence features selected above that are found in the requested region. A basic selection to get an overview of a gene and its function might include i) RGD Gene Symbol, ii) RGD Gene Description, iii) RGD Gene Ontology Terms, iv) RGD Phenotype Terms, v) KEGG Gene Symbol, and vi) KEGG Pathway Name.
- In the output section at the bottom of the page, you can select either HTML format for online viewing or various text formats for importing into other applications. Tab delimited is a good selection for import into Excel.
- Click "Submit."
An HTML page (or plain text page) will be returned listing the sequence features found within the selected region, along with the selected annotations of those features where such annotations exist. The annotations are hyperlinked back to their parent databases to facilitate further exploration. A saved copy of the HTML results of a GA Tool annotation run can provide a useful jumping-off point for exploring the annotated data set.
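For readers who prefer scripting to a spreadsheet, a tab-delimited export can also be summarized programmatically. The following Python sketch assumes a layout that is not documented here: a header row whose column names match the annotation labels chosen above, and one row per gene and annotation pair. The sample text, the gene symbols, and the `summarize_annotations` helper are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited export; the real column names and row layout
# of GA Tool output may differ.
SAMPLE_EXPORT = (
    "RGD Gene Symbol\tRGD Gene Ontology Terms\tKEGG Pathway Name\n"
    "GeneA\tion transport\tPathway 1\n"
    "GeneA\tmembrane\tPathway 1\n"
    "GeneB\tprotein binding\t\n"
)


def summarize_annotations(handle):
    """Collect the distinct GO terms and pathway names reported per gene."""
    reader = csv.DictReader(handle, delimiter="\t")
    summary = defaultdict(lambda: {"go_terms": set(), "pathways": set()})
    for row in reader:
        entry = summary[row["RGD Gene Symbol"]]
        go_term = (row.get("RGD Gene Ontology Terms") or "").strip()
        pathway = (row.get("KEGG Pathway Name") or "").strip()
        if go_term:
            entry["go_terms"].add(go_term)
        if pathway:
            entry["pathways"].add(pathway)
    return dict(summary)


summary = summarize_annotations(io.StringIO(SAMPLE_EXPORT))
for symbol, entry in sorted(summary.items()):
    print(symbol, sorted(entry["go_terms"]), sorted(entry["pathways"]), sep="\t")
```

To read a saved file instead of the inline sample, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object.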
Genome browser.
The Genome Browser allows visual inspection of a region, showing the features, annotations, and their physical positions relative to each other. This is a very different perspective from the tabular listings of data created by the tools above.
- Go to http://rgd.mcw.edu/sequenceresources/gbrowse.shtml.
- To focus on the same region as before, we have to enter the genomic coordinates of the markers. i) To obtain this information, enter "D1Rat5" in the "Landmark or Region" field and click "Search." The browser will refresh showing just D1Rat5 and listing its genomic coordinates at the top of the page ("Showing 128 bp from Chr1, positions 10,818,666 to 10,818,827"). For our purposes, we will use the start position, 10,818,666. ii) Repeat this process for D1Rat10 to obtain its end position, 25,274,122 bp.
- Enter the genomic coordinates for the region of interest preceded by the chromosome number: "Chr1:10818666..25274122."
- Clicking "Search" will return an image of the approximately 14.5 Mb flanked by D1Rat5 and D1Rat10.
- RGD provides a wide variety of annotation tracks for use on the Genome Browser, including genes and other sequence features such as ESTs, SSLPs and SNPs, QTLs, functional annotations, microarray expression levels, and more. A basic starting point for exploration of a region might include the following tracks: RGD Genes and RHMap3.4 Markers (SSLPs and ESTs mapped on the RH map v3.4). Select these tracks and click "Update Image" to redraw the image.
- The Genome Browser has many options for data display, including the ability to upload your own annotations to visualize against the RGD annotations. More information on customizing the display can be found by following the link to "Help" in the "Instructions" section of the Gbrowse page.
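The coordinate handling in the steps above can be sketched in a few lines of Python. The helper names (`parse_position`, `region_string`, `span_bp`) are hypothetical, and the length calculation assumes that both boundary positions are included in the region.

```python
def parse_position(text):
    """Convert a position as displayed by the browser ("10,818,666") to an int."""
    return int(text.replace(",", ""))


def region_string(chrom, start, stop):
    """Format a region as chromosome, start, and stop, e.g. "Chr1:100..200"."""
    if start > stop:
        start, stop = stop, start
    return f"Chr{chrom}:{start}..{stop}"


def span_bp(start, stop):
    """Length of the region in base pairs, counting both end positions."""
    return abs(stop - start) + 1


# Start position reported for D1Rat5 and end position reported for D1Rat10.
start = parse_position("10,818,666")
stop = parse_position("25,274,122")

region = region_string(1, start, stop)
length = span_bp(start, stop)

print(region)
print(f"{length:,} bp ({length / 1e6:.2f} Mb)")
```

Stripping the thousands separators before building the string matters because the region field in the example takes plain integers separated by two periods.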
GRANTS
This work was funded by National Heart, Lung, and Blood Institute Grant HL-64541 (H. J. Jacob) and National Human Genome Research Institute Grant HG-002273 (S. N. Twigger).
ACKNOWLEDGMENTS
We thank Carol Moreno-Quinn for critical review of this manuscript and Brian Halligan for invaluable help with Acrobat.
Present addresses: D. Campbell, Dept. of Computer Engineering, Univ. of Wisconsin, Madison, WI 53706; B. Hickmann, Dept. of Electrical and Computer Engineering, Univ. of Wisconsin, Madison, WI 53706; E. Schauberger, Genetics Program at Michigan State Univ., East Lansing, MI 48824; R. Slyper, Dept. of Computer Science, Univ. of Michigan, Ann Arbor, MI 48109; and P. Tonellato, Point One Systems, 10437 Innovation Drive, Wauwatosa, WI 53226.
FOOTNOTES
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: S. Twigger, Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226 (e-mail: simont{at}mcw.edu)
doi:10.1152/physiolgenomics.00040.2005.
REFERENCES
1. Altschul SF, Gish W, Miller W, Myers EW, and Lipman DJ. Basic local alignment search tool. J Mol Biol 215: 403–410, 1990.
2. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, and Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29, 2000.
3. Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, and Yeh LS. The Universal Protein Resource (UniProt). Nucleic Acids Res 33: D154–D159, 2005.
4. Bult CJ, Blake JA, Richardson JE, Kadin JA, Eppig JT, Baldarelli RM, Barsanti K, Baya M, Beal JS, Boddy WJ, Bradt DW, Burkart DL, Butler NE, Campbell J, Corey R, Corbani LE, Cousins S, Dene H, Drabkin HJ, Frazer K, Garippa DM, Glass LH, Goldsmith CW, Grant PL, King BL, Lennon-Pierce M, Lewis J, Lu I, Lutz CM, Maltais LJ, McKenzie LM, Miers D, Modrusan D, Ni L, Ormsby JE, Qi D, Ramachandran S, Reddy TB, Reed DJ, Sinclair R, Shaw DR, Smith CL, Szauter P, Taylor B, Vanden Borre P, Walker M, Washburn L, Witham I, Winslow J, and Zhu Y. The Mouse Genome Database (MGD): integrating biology with the genome. Nucleic Acids Res 32: D476–D481, 2004.
5. Cowley AW. Physiological genomics: tools and concepts. J Physiol 554: 3, 2004.
6. de la Cruz N, Bromberg S, Pasko D, Shimoyama M, Twigger S, Chen J, Chen CF, Fan C, Foote C, Gopinath GR, Harris G, Hughes A, Ji Y, Jin W, Li D, Mathis J, Nenasheva N, Nie J, Nigam R, Petri V, Reilly D, Wang W, Wu W, Zuniga-Meyer A, Zhao L, Kwitek A, Tonellato P, and Jacob H. The Rat Genome Database (RGD): developments towards a phenome database. Nucleic Acids Res 33: D485–D491, 2005.
7. Frazer KA, Pachter L, Poliakov A, Rubin EM, and Dubchak I. VISTA: computational tools for comparative genomics. Nucleic Acids Res 32: W273–W279, 2004.
8. Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, Glodek A, Gu Z, Jennings D, Kraft CL, Nguyen T, Pfannkoch CM, Sitter C, Sutton GG, Venter JC, Woodage T, Smith D, Lee HM, Gustafson E, Cahill P, Kana A, Doucette-Stamm L, Weinstock K, Fechtel K, Weiss RB, Dunn DM, Green ED, Blakesley RW, Bouffard GG, De Jong PJ, Osoegawa K, Zhu B, Marra M, Schein J, Bosdet I, Fjell C, Jones S, Krzywinski M, Mathewson C, Siddiqui A, Wye N, McPherson J, Zhao S, Fraser CM, Shetty J, Shatsman S, Geer K, Chen Y, Abramzon S, Nierman WC, Havlak PH, Chen R, Durbin KJ, Egan A, Ren Y, Song XZ, Li B, Liu Y, Qin X, Cawley S, Cooney AJ, D'Souza LM, Martin K, Wu JQ, Gonzalez-Garay ML, Jackson AR, Kalafus KJ, McLeod MP, Milosavljevic A, Virk D, Volkov A, Wheeler DA, Zhang Z, Bailey JA, Eichler EE, Tuzun E, et al. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428: 493–521, 2004.
9. Jacob HJ and Kwitek AE. Rat genetics: attaching physiology and pharmacology to the genome. Nat Rev Genet 3: 33–42, 2002.
10. Kanehisa M, Goto S, Kawashima S, Okuno Y, and Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res 32: D277–D280, 2004.
11. Kent WJ. BLAT: the BLAST-like alignment tool. Genome Res 12: 656–664, 2002.
12. Kwitek AE, Gullings-Handley J, Yu J, Carlos DC, Orlebeke K, Nie J, Eckert J, Lemke A, Andrae JW, Bromberg S, Pasko D, Chen D, Scheetz TE, Casavant TL, Soares MB, Sheffield VC, Tonellato PJ, and Jacob HJ. High-density rat radiation hybrid maps containing over 24,000 SSLPs, genes, and ESTs provide a direct link to the rat genome sequence. Genome Res 14: 750–757, 2004.
13. Maglott D, Ostell J, Pruitt KD, and Tatusova T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 33: D54–D58, 2005.
14. Ovcharenko I, Nobrega MA, Loots GG, and Stubbs L. ECR Browser: a tool for visualizing and accessing data from comparisons of multiple vertebrate genomes. Nucleic Acids Res 32: W280–W286, 2004.
15. Rapp JP. Genetic analysis of inherited hypertension in the rat. Physiol Rev 80: 135–172, 2000.
16. Steen RG, Kwitek-Black AE, Glenn C, Gullings-Handley J, Van Etten W, Atkinson OS, Appel D, Twigger S, Muir M, Mull T, Granados M, Kissebah M, Russo K, Crane R, Popp M, Peden M, Matise T, Brown DM, Lu J, Kingsmore S, Tonellato PJ, Rozen S, Slonim D, Young P, Jacob HJ, et al. A high-density integrated genetic linkage and radiation hybrid map of the laboratory rat. Genome Res 9: AP1–AP8, insert, 1999.
17. Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, and Lewis S. The generic genome browser: a building block for a model organism system database. Genome Res 12: 1599–1610, 2002.
18. Twigger SN, Gullings-Handley J, Kwitek-Black AE, Tonellato PJ, and Jacob HJ. Rat genome database ACP haplotyper and genome scanner: novel tools for rat genomics (Abstract). FASEB J 14: A329, 2000.
19. Twigger SN, Nie J, Ruotti V, Yu J, Chen D, Li D, Mathis J, Narayanasamy V, Gopinath GR, Pasko D, Shimoyama M, De La Cruz N, Bromberg S, Kwitek AE, Jacob HJ, and Tonellato PJ. Integrative genomics: in silico coupling of rat physiology and complex traits with mouse and human data. Genome Res 14: 651–660, 2004.