* Computational Proteomics Group, John Curtin School of Medical Research, Australian National University, Canberra, Australia; School of Biological Sciences and Biological Informatics & Technology Centre, University of Sydney, NSW, Australia;
Dipartimento di Patologia Animale, Igiene e Sanità Pubblica Veterinaria, Sezione di Biochimica e Fisiologia Veterinaria, Università di Milano, Milan, Italy; and
Comparative Genomics Group, Research School of Biological Sciences, Australian National University, Canberra, Australia
Correspondence: E-mail: Jill.Gready{at}anu.edu.au.
Abstract |
---|
Key Words: comparative genomics; conserved contiguity; phylogenetic analysis; repeats; regulatory sequences; phylogenetic footprinting
Introduction |
---|
The first gene distal to PRNP in eutherian mammals is PRND, which encodes the doppel (Dpl) protein (Moore et al. 1999) and is thought to have arisen by a duplication of PRNP (Mastrangelo and Westaway 2001). The next neighbor gene to PRND, detected so far only in humans and not present in mouse, is the related PRNT gene (Makrinou, Collinge, and Antoniou 2002), which seems to be a pseudogene that has arisen from duplication of PRND.
The complexity of the mammalian PRNP locus and the conundrum of PrP function provided the impetus for searching publicly available genomic sequence databases of human (Lander et al. 2001) and model vertebrates (Aparicio et al. 2002; Waterston et al. 2002) to discover PrP paralogs in silico. As these paralog genes may have functions related to those of PrP, their characterization may illuminate its functions and reveal the basis for PrP's disease associations. The paralogs themselves may also prove to be potential drug targets (Lander et al. 2001).
Using this comparative genomics approach, we recently manually annotated a new gene, SPRN, that is a paralog of PRNP (Premzl et al. 2003). SPRN was detected in mammals (human, mouse, rat) and fish (Fugu, Tetraodon, zebrafish). It encodes the Shadoo (Sho) protein, and is better conserved between mammals and fish than is PRNP (fig. 1).
Beyond the serious problem of prion disease transmission among mammals and to humans, highlighted by the recent bovine spongiform encephalopathy (BSE) crisis that arose from feeding contaminated mammal-derived proteins to cows, these new findings of PrP homologs in fish raise a new issue: the possible spread of prion disease from mammals to farmed fish (and vice versa?) through meat and bone meal feedstuff derived from farm animals. Hence, understanding the evolution of the fish proteins and defining their functions appear both critical and timely for defining the function, and dysfunction, of the mammalian PrPs, and thereby for understanding the transmissibility of infectious prion disease.
Genes in different species defined as orthologous by their derivation from a common ancestor generally lie in the same genomic context and perform the same function; this property is the basis for prediction of gene function in newly sequenced genomes (O'Brien et al. 1999; Waterston et al. 2002). Although this comparative genomics approach is a powerful tool for understanding biological function, the appropriate evolutionary distance of species in a comparison depends on the biological question addressed (Frazer et al. 2003). For example, a comparative study of custom-sequenced genomic data for four yeast species that diverged 5 to 20 MYA allowed a complete analysis for gene identification, determination of gene structures, estimate of rapid/slow evolutionary changes, detection of regulatory elements, detection of combinatorial control of expression, and identification of the key regulatory elements determining gene activity (Kellis et al. 2003). However, there are limitations in the depth and extent of possible mammalian comparative genomic analysis, which parallel availability of eutherian, marsupial, and monotreme lineage representatives (O'Brien, Eizirik, and Murphy 2001; Graves and Westerman 2002). The most comprehensive targeted comparative genomic analysis reported to date included 13 vertebrate species (one human, eight other mammals, one bird, and three fish) and allowed identification of conserved coding and noncoding sequences as well as insight into the genome dynamics (Thomas et al. 2003).
The aim of this work is to analyze the evolutionary relationships among mammalian PRNP and fish genes encoding proteins related to PrP, and also their relationships with SPRN from mammals and fish, by means of comparative genomic analysis. This analysis comprises both homology and nonhomology criteria for assessing gene orthology (Eisen and Wu 2002). Apart from identification of gene similarity (homology criteria), we tested whether homologous rearrangement events have occurred in the intergenic regions (nonhomology criteria). Conserved order and transcription orientation with respect to adjacent gene or genes, also dubbed "conserved contiguity," implies that there has been no rearrangement in the intergenic regions and that the genes share common evolutionary history. This is suggested to be a strong indication of orthology (Comparative Genome Organization Workshop 1996; Gilligan, Brenner, and Venkatesh 2002).
Our analysis is based on sequence data available in public databases. Apart from human, only two vertebrate genome sequences are available in draft form: those for mouse (Waterston et al. 2002) and tiger pufferfish Fugu rubripes (Aparicio et al. 2002). Substantial data for the rat genome is also available (http://www.ensembl.org/Rattus_norvegicus/). Rodents are separated from human by 75 Myr of independent evolution, and mouse and rat (separated by 30 Myr) are the major mammalian models for genetic and immunological research and physiological research, respectively. Data from the recently reported dog genome, which is at only 1.5x sequence coverage (Kirkness et al. 2003), is not yet available in searchable form. Fish are at the root of vertebrate diversification, and extant fish are separated from mammals by 450 Myr of independent evolution. The compact vertebrate genome of Fugu rubripes was proposed as a tool for discovery of vertebrate genes and their regulatory elements (Brenner et al. 1993), and, indeed, its draft genome sequence showed significant conservation of protein-coding genes, intron-exon structures, and gene arrangement with human (Aparicio et al. 2002). Partial genomic sequence information is also available for green pufferfish Tetraodon nigroviridis (http://www.genoscope.cns.fr/), which diverged from Fugu 20 to 30 MYA, and also for zebrafish Danio rerio (http://www.ensembl.org/), which diverged 100 to 150 MYA from pufferfish and is a popular experimental model in developmental studies. The availability of genome information has led to increased recent interest in zebrafish as a model system for human disease (Ward and Lieschke 2002).
Here, we compare genomic sequences, including adjacent genes, protein alignments, and phylogenetic analysis of mammalian PRNP, related fish genes, and SPRN genes from mammals and fish, together with their adjacent genes. We provide detailed descriptions of the mammalian PRNP, PRND, PRNT, and SPRN genes, and perform "phylogenetic footprinting" on the aligned human, mouse, and Fugu SPRN genes to define possible regulatory regions and transcription factor-binding sites. We present evidence for novel fish genes related to PrP that we identified in public databases: stPrP-2 from Tetraodon and stPrP-3 from zebrafish. We derived the SPRN sequence coding for Sho experimentally from Tetraodon, and for Fugu, Tetraodon, carp, and zebrafish, we found in silico a duplicated SPRN gene (SPRNB) encoding the related Shadoo2 protein (Sho2). We have assessed orthology of the mammalian PRNP and fish genes related to PrP and the mammalian SPRN and homologous fish genes, respectively, and have inferred evolutionary patterns for the mammalian PRNP and SPRN genes. We have developed an evolutionary model consistent with our various findings, as well as information available in the literature, and propose it as a framework for investigating gaps in knowledge in these genes, particularly in fish and tetrapod lineages.
Methods and Materials |
---|
Detection of PRNP Genes and Genomic Environment in Human, Mouse, and Rat
The PRNP genes in human (chr20p13: 4,614,996 to 4,630,236 bp), mouse (Chr2F3: 132,911,892 to 132,940,089 bp), and rat (Chr3Q36: 112,889,678 to 112,890,442 bp) were found by key word search of the Ensembl human (version 12.31.1), Ensembl mouse (version 12.3.1), and Ensembl rat (version 11.2.1) genome databases, respectively. The local genomic environment is also evident from the interactive web genome browser, as is annotation of the genomic sequence.
Detection of Fugu stPrP-1-Coding and stPrP-2-Coding Genes and Their Genomic Context
We identified the stPrP-1-coding and stPrP-2-coding gene sequences and their local genomic context in the Fugu interactive Ensembl genome browser (version 12.2.1) by using the sequences of their open reading frames (ORFs) as search queries (GenBank accession numbers AY141106 and AY188583, respectively) and the server's BlastN (Altschul et al. 1990) search tool. The local genomic environment and annotation of the genomic sequence are also evident from this Web service. The genes encoding stPrP-1 and stPrP-2 are located on scaffold_96 and scaffold_155, respectively.
Identification of Tetraodon Sequences Containing PrP-like and stPrP-2-Coding Genes
We identified genomic contig FS_CONTIG_4238_2 containing the Tetraodon PrP-like and stPrP-2-coding genes by using the sequence of the Tetraodon PrP-like ORF (Suzuki et al. 2002) as search query and the BlastN program provided in the Genoscope Web service. We identified overlapping genomic clone FS_CONTIG_4238_1 by using the terminal 200 bp of FS_CONTIG_4238_2 as search query. In addition, two more overlapping clones (FS_CONTIG_24895_1 and FS_CONTIG_31286_1) were identified by using the same strategy. Sequences of these contigs were merged into a virtual contig (Tetraodon virtual contig 1) of length 22,249 bp. We verified this assembly using the PiPMaker program (Schwartz et al. 2000) and alignment to the orthologous Fugu genomic sequence. The sequence was also annotated by using the NIX interactive Web tool (Williams, Woollard, and Hingamp 1998; http://www.hgmp.mrc.ac.uk/NIX/). The TestPrP-2 accession number is BN000527.
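The contig-walking strategy described above (use the terminal 200 bp of one contig as a query, find the next overlapping contig, then merge) can be sketched as follows. This is a minimal illustration, not the procedure actually run: it assumes exact, same-strand matches in place of a BlastN search, and the function names `merge_by_terminal_overlap` and `build_virtual_contig` are hypothetical.

```python
from typing import List


def merge_by_terminal_overlap(left: str, right: str, probe_len: int = 200) -> str:
    """Extend `left` with `right` if the last `probe_len` bases of `left` occur in `right`.

    Assumption: an exact substring match stands in for a BlastN hit, and both
    contigs are on the same strand.
    """
    probe = left[-probe_len:]
    pos = right.find(probe)
    if pos == -1:
        raise ValueError("terminal probe not found in candidate contig")
    # Keep `left` whole and append only the part of `right` beyond the probe.
    return left + right[pos + len(probe):]


def build_virtual_contig(contigs: List[str], probe_len: int = 200) -> str:
    """Chain contigs, given in walking order, into one virtual contig."""
    merged = contigs[0]
    for nxt in contigs[1:]:
        merged = merge_by_terminal_overlap(merged, nxt, probe_len)
    return merged
```

A real assembly would also need to handle reverse-complement hits and mismatches in low-quality reads, which is one reason the merged sequence was checked independently against the Fugu sequence.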
Identification of Zebrafish Coding Sequence for stPrP-1 (C-Terminal Only) and stPrP-3
To identify the zebrafish stPrP-1-coding gene, we used nucleotide sequence from the Fugu stPrP-1 ORF as search query and the BlastN Web service from the Ensembl zebrafish interactive genome database (version 14.2.1). We detected coding sequence for the C-terminal part of the protein only on chromosome fragment ctg10456. A later search (version 22.3b.1) detected the complete gene and genomic context on chr10 (11,350,000 to 11,390,000 bp). Genomic sequence containing the zebrafish stPrP-3 gene (assembly_234, NA3274.1) was identified by using the Fugu stPrP-2-coding sequence as search query and the BlastN Web service from the Ensembl zebrafish genome DB, as above. The local genomic environment and its annotation are evident from the interactive genome browser. The ZePrP-3 accession number is BN000526.
Detection of SPRN Genes and Genomic Environment in Human, Mouse, Rat, Fugu, and Zebrafish
We recently described the identification of the SPRN gene in human, mouse, rat, Fugu, and zebrafish (Premzl et al. 2003). Here we used an identical approach to extract the SPRN gene sequences and their local genomic environment from the Ensembl databases. The SPRN accession numbers are BN000518 (human), BN000519 (mouse), BN000520 (rat), and BN000521 (Fugu).
Identification of Tetraodon Genomic Sequence Containing the SPRN Gene and Cloning of Sho Coding Sequence
Genomic contig FS_CONTIG_4144_1 was identified in the Genoscope database as described in Premzl et al. (2003). By using its terminal 200-bp sequence as search query and the BlastN program on the Web service, we identified overlapping clone FS_CONTIG_31029_1. Using the same strategy, the next overlapping clone FS_CONTIG_37429_1 was identified. These sequences were assembled into a virtual contig of 19,029 bp (Tetraodon virtual contig 2; we used a 10-kb sequence containing the SPRN gene from this virtual contig in further analyses). We verified this assembly by using the PiPMaker program to align it to the orthologous Fugu genomic sequence and annotated it by using the NIX interactive Web tool. As the low quality of the FS_CONTIG_4144_1 sequence does not allow translation of the Sho ORF, we used its sequence to design primers (5'-CTAAGACATCCGCCATGACACG-3' and 5'-AGCATCTGCTGCACATCCACAC-3') flanking the Sho ORF, using the MacVector version 7.0 program (Oxford Molecular Group 2000). DNA was extracted from Tetraodon cells using standard procedures (Sambrook, Fritsch, and Maniatis 1989) and used as the template for PCR amplification using FastStart Taq polymerase (Roche) according to the PCR procedure using GC-RICH solution. The PCR product was cloned using the TOPO cloning kit (Invitrogen). The cloned fragment was sequenced by using universal primers M13 forward and M13 reverse and the BigDye Terminator version 3.1 Cycle Sequencing Kit (Applied Biosystems). Products of cycle sequencing reactions were run on an ABI3730 DNA sequencer (Applied Biosystems). The TeSPRN accession number is AJ717305.
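As a quick sanity check on the two primers quoted above, their length, GC content, and a rough melting temperature can be computed directly from the sequences. The sketch below is illustrative only: the primers were designed with MacVector, whose internal calculations are not described here, and the Wallace rule (2 °C per A/T, 4 °C per G/C) is used as a crude approximation that was derived for short oligonucleotides. All function names are hypothetical.

```python
FORWARD = "CTAAGACATCCGCCATGACACG"  # primer sequences as given in the text
REVERSE = "AGCATCTGCTGCACATCCACAC"


def gc_count(seq: str) -> int:
    """Number of G and C bases in the sequence."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    return 2 * at + 4 * gc_count(s)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    gc_pct = 100.0 * gc_count(primer) / len(primer)
    print(name, len(primer), f"{gc_pct:.1f}% GC", wallace_tm(primer))
```

Both primers are 22 nt long with 12 G/C bases each, so the pair is well balanced by this simple measure.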
Identification of Sho2 Coding Sequences in Zebrafish, Fugu, Tetraodon, and Carp
Zebrafish genomic sequence containing the Sho2 (ctg10456) coding sequence was identified using the zebrafish Sho ORF sequence as search query and the BlastN program available from the Ensembl (version 14.2.1) interactive Web service. Genomic sequence containing the Fugu gene encoding Sho2 (scaffold_96) was identified using the zebrafish Sho2 coding sequence as search query and the BlastN program available from the Ensembl (version 12.2.1) Web genome browser. The Tetraodon genomic contig (FS_CONTIG_41464_1) containing Sho2 coding sequence was identified using the Fugu Sho2 ORF sequence and the BlastN search tool from the Genoscope database. The expressed sequence tag (EST) (CA964511) containing the ORF for the carp Sho2 was detected by BlastN search of the NCBI est_others database, using the zebrafish Sho2 coding sequence as search query. Local genomic environment and its annotation are provided in the Ensembl genome browser for Fugu SPRNB and zebrafish sprnb. The SPRNB accession numbers are BN000522 (Fugu), BN000523 (zebrafish), BN000524 (carp), and BN000525 (Tetraodon).
Interspersed Elements Analysis
We determined the relative content of repetitive elements (interspersed repeats, small RNA, satellites, simple repeats, and low-complexity sequence) in the human PRNP, PRND, PRNT, and SPRN genes and in mouse Prnp, Prnd, and Sprn genes by using the RepeatMasker program on its Web server (http://ftp.genome.Washington.edu/RM/RepeatMasker.html) with the high sensitivity, slow option. The genomic coordinates and database source for the genes are given in Supplementary Material table S1 and in the Ensembl databases listed above. Content and distribution of interspersed repeats in the human, mouse, and rat genomic sequences were also determined using RepeatMasker. Only interspersed repeats were masked in the genomic sequences we used for the VISTA alignments.
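Relative repeat content of the kind reported here can be recomputed from a masked sequence. The sketch below is a minimal illustration, assuming the masked sequence marks repeats either as lowercase letters (soft masking) or as N or X characters (hard masking); the function name `masked_fraction` is hypothetical, and ambiguous bases already present as N in the unmasked sequence would be counted as masked.

```python
def masked_fraction(seq: str) -> float:
    """Fraction of positions that are masked (lowercase, N, or X)."""
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base.islower() or base in "NX")
    return masked / len(seq)


# Toy example: 4 unmasked bases, 4 soft-masked bases, 4 hard-masked bases.
example = "ACGT" + "acgt" + "NNNN"
print(f"{100 * masked_fraction(example):.1f}% masked")
```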
VISTA Global Alignments
We used the VISTA Web server (Mayor et al. 2000; http://www-gsd.lbl.gov/VISTA/) to align genomic sequences in two in silico experiments. In the first experiment, we aligned human (chr20: 4,558,866 to 4,938,939 bp), mouse (chr2: 132,862,228 to 133,103,751 bp), rat (chr3: 112,821,949 to 113,040,637 bp), Fugu (chr_scaffold_155: 247,572 to 271,811 bp), and Tetraodon (Tetraodon virtual contig 1 [see above]) genomic sequences containing either the PRNP or the stPrP-2 gene, its whole upstream intergenic sequence, and adjacent genes (gene for PrP-like protein in fishes, PRND in mammals, PRNT in human, and RASSF2 and SLC23A1 in both mammals and fishes). In the second experiment, we aligned human (chr10: 134,322,741 to 134,374,165 bp), mouse (chr7: 130,335,510 to 130,371,625 bp), rat (chr1: 199,669,758 to 199,704,156 bp), zebrafish (assembly_203: 4,494,411 to 4,519,077 bp), Fugu (chr_scaffold_28: 384,496 to 394,338 bp), and Tetraodon (Tetraodon virtual contig 2 [see above]) genomic sequences containing the SPRN gene, its whole upstream intergenic region, and adjacent genes (gene encoding GTP-binding protein in all species, amine oxidase (AO)-coding gene in mammals and pufferfish, and long-chain fatty-acyl elongase-coding gene in zebrafish). Before submission to VISTA, transposable elements in the mammalian sequences were masked by using RepeatMasker as above. Human sequence and its annotation were always used as the base sequence. Pairwise sequence comparisons were calculated with a threshold of 50% identity in a 50-bp window. The minimum identity shown in the VISTA plots is 30%.
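The conservation criterion stated above (50% identity in a 50-bp window) can be illustrated with a sliding-window calculation over a pairwise alignment. This is a simplified sketch and not VISTA's implementation: it assumes two already-aligned strings of equal length with `-` for gaps, counts a gap column as a mismatch, and uses hypothetical function names.

```python
from typing import List


def window_identity(aln_a: str, aln_b: str, window: int = 50) -> List[float]:
    """Percent identity for every window of `window` alignment columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(a == b and a != "-") for a, b in zip(aln_a.upper(), aln_b.upper())]
    scores = []
    running = sum(matches[:window])
    for start in range(0, len(matches) - window + 1):
        if start > 0:
            # Slide the window: add the new right column, drop the old left one.
            running += matches[start + window - 1] - matches[start - 1]
        scores.append(100.0 * running / window)
    return scores


def conserved_window_starts(aln_a: str, aln_b: str,
                            window: int = 50, threshold: float = 50.0) -> List[int]:
    """Start columns of windows whose identity reaches the threshold."""
    return [i for i, s in enumerate(window_identity(aln_a, aln_b, window)) if s >= threshold]
```

With the defaults, `conserved_window_starts` reports windows meeting the 50%/50-bp criterion; lowering `threshold` to 30.0 corresponds to the minimum identity displayed in the plots.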
PiPMaker Local Alignments
Pairwise alignments of the Fugu and Tetraodon genomic sequences containing the stPrP-2-coding gene and SPRNA (see above), and human (chr20: 4,657,104 to 4,708,670 bp) and mouse (chr2: 132,957,930 to 132,998,136 bp) genomic sequences bordered by PRND and RASSF2 genes, were performed by using default settings of the PiPMaker program available as a Web service (Schwartz et al. 2000; http://bio.cse.psu.edu/cgi-bin/pipmaker?basic). Results are presented as dot plots.
Phylogenetic Footprinting and Promoter Analysis
Phylogenetic footprinting of the human, mouse, and Fugu genomic sequences containing the SPRN/Sprn gene was performed by using the FootPrinter program as a Web service (Blanchette and Tompa 2002; http://abstract.cs.washington.edu/blanchem/FootPrinterWeb/FootPrinterInput.pl). We used sequence upstream to the Sho ORF and the whole intergenic region in this experiment. The orthologous rat, Tetraodon, and zebrafish sequence data were not used in this analysis because of gaps in the sequence (rat and zebrafish) or low quality (Tetraodon). We used a conservative approach in this analysis, accepting only motifs detected in all three species and setting low parsimony scores: score 0 for 6-bp and 7-bp motifs, score 1 for 8-bp and 9-bp motifs, and score 2 for 10-bp, 11-bp, and 12-bp motifs. A 13-bp motif (score 2) was identified after inspection of the 12-bp motif sequences. Potential transcription factor-binding sites in these same human and mouse genomic sequences were analyzed by using the MatInspector program available as a Web service (Quandt et al. 1995; http://www.genomatix.de/cgi-bin/eldorado/main.pl). Again, we used a conservative approach in this analysis: motifs were identified with core similarity 1 and matrix similarity score above the optimized matrix similarity score. Motifs predicting transcription factor-binding sites deposited in the TRANSFAC database (Wingender et al. 1996) were identified.
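The parsimony thresholds above can be made concrete with a small sketch. For three species, the minimum number of substitutions needed to explain one alignment column is the number of distinct bases minus one, so a motif's parsimony score is the sum of that quantity over its columns. The brute-force search below is only an illustration of the scoring criterion and is not FootPrinter's algorithm; it is cubic in sequence length and suitable for toy inputs only. All function names are hypothetical.

```python
from typing import List, Tuple


def max_parsimony_score(k: int) -> int:
    """Maximum allowed parsimony score for a motif of length k (thresholds from the text)."""
    if k in (6, 7):
        return 0
    if k in (8, 9):
        return 1
    if k in (10, 11, 12):
        return 2
    raise ValueError("motif length outside the 6 to 12 bp range used here")


def parsimony_score(a: str, b: str, c: str) -> int:
    """Minimum substitutions on a three-leaf tree: distinct bases minus one, per column."""
    return sum(len({x, y, z}) - 1 for x, y, z in zip(a, b, c))


def find_motifs(seq_a: str, seq_b: str, seq_c: str, k: int) -> List[Tuple[int, int, int, int]]:
    """Return (pos_a, pos_b, pos_c, score) for k-mer triples within the allowed score."""
    limit = max_parsimony_score(k)
    hits = []
    for i in range(len(seq_a) - k + 1):
        for j in range(len(seq_b) - k + 1):
            for m in range(len(seq_c) - k + 1):
                score = parsimony_score(seq_a[i:i + k], seq_b[j:j + k], seq_c[m:m + k])
                if score <= limit:
                    hits.append((i, j, m, score))
    return hits
```

For example, with the toy sequences `"TTACGTACGG"`, `"GGACGTACTT"`, and `"ACGTACAAAA"`, the only 6-bp motif shared with score 0 is `ACGTAC`.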
Alignment and Phylogenetic Analysis of PrP-Related and Sho Proteins
Amino acid sequences of 11 PrP and PrP-related proteins (details in figure S3 of Supplementary Material online) and 10 Sho/Sho2 proteins (details in figure S4 of Supplementary Material online) were obtained from GenBank, or as described in this study. The sequences of these two protein sets were aligned independently using ClustalW (Thompson, Higgins, and Gibson 1994), and the result was subsequently refined using the Genetic Data Environment (GDE) (Smith et al. 1994). From the PrP and Sho alignments, 118 and 149 sites, respectively, were chosen for phylogenetic analyses using the maximum-likelihood program package MOLPHY (Adachi and Hasegawa 1996). The two alignments were analyzed using ProtML (Adachi and Hasegawa 1996) with the JTT-F model of amino acid substitution. Initially, every possible tree was assessed to produce a set of 2,000 near-optimal trees (using the -jfen 2000 option). This set of trees was then analyzed more rigorously without, and with, nearest-neighbor interchanges (using the -jfu and -jfRu options). Local bootstrap probabilities (LBP) were estimated using the resampling of estimated log-likelihood (RELL) method for internal edges in the tree examined. Trees were compared statistically using the Kishino-Hasegawa test (Kishino and Hasegawa 1989), and topological model uncertainty (Wolf et al. 2000) was considered using model averaging (Jermiin et al. 1997); when considered, we used the class V weighting scheme with α = 0.05 to produce (1) a majority-rule consensus tree and (2) relative-likelihood scores (RLS) for internal edges in the inferred phylogeny.
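To give a sense of scale for assessing "every possible tree," the number of distinct unrooted, fully bifurcating topologies for n sequences is (2n − 5)!! = 3 × 5 × ... × (2n − 5). The sketch below evaluates this for the two data sets; it assumes that the exhaustive search was over unrooted bifurcating topologies, which is not stated explicitly in the text, and the function name is hypothetical.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Number of unrooted bifurcating tree topologies for n_taxa labeled leaves."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


print(unrooted_tree_count(11))  # 11 PrP and PrP-related sequences
print(unrooted_tree_count(10))  # 10 Sho/Sho2 sequences
```

These counts (about 34.5 million and about 2 million topologies, respectively) show why an approximate first pass that retains 2,000 near-optimal trees precedes the more rigorous likelihood evaluation.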
Analysis Software
The Lasergene (DNASTAR, Madison, Wis.) and Vector NTI (InforMax, Frederick, Md.) software packages were used for basic handling and analyses of the nucleotide sequence and protein data. Signal peptide cleavage sites were predicted by using the SignalP program (Nielsen et al. 1997; http://www.cbs.dtu.dk/services/SignalP/) and GPI-anchor sites were predicted by using the bigPI-predictor program (Eisenhaber, Bork, and Eisenhaber 1999; http://mendel.imp.univie.ac.at/gpi/gpi_server.html) available as Web services. Other figures were drawn using Microsoft PowerPoint and Adobe Illustrator.
Results and Discussion |
---|
Identification of Genomic Sequences Containing PRNP in Mammals and Related Genes in Fish
Using the Ensembl interactive genome browsers for human, mouse, rat, Fugu, and zebrafish (http://www.ensembl.org/) we found the genes and extracted their local contexts. Genomic sequence of Tetraodon containing stPrP-2 and PrP-like proteins was derived by detecting overlapping genomic clones in the Genoscope database (http://www.genoscope.cns.fr/) and assembling them into a virtual genomic contig (Tetraodon virtual contig 1 [see below]). In figure 2, we present gene order and relative orientation in the local genomic regions of mammalian PRNP genes and fish genes encoding stPrP-1, stPrP-2, stPrP-3, and PrP-like proteins. In mammals, genes adjacent to PRNP and PRND are RASSF2 and SLC23A1. The PRNT gene is present in humans but not in rodents (fig. 2A [see also below]). In pufferfish, genes encoding stPrP-2 and PrP-like proteins are also adjacent to Rassf2-coding and Slc23a1-coding genes (fig. 2C). The stPrP-2-coding gene is in the same position and relative orientation with respect to the adjacent genes as is mammalian PRNP, suggesting an evolutionary relationship.
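The "same position and relative orientation" comparison can be expressed as a simple check on ordered gene lists. The sketch below is a hypothetical illustration of a conserved-contiguity test: each region is a list of (ortholog label, strand) pairs, and two regions are considered contiguous if they match directly or after one region is read from the opposite strand. The labels and strands in the example are placeholders, not the actual orientations shown in figure 2.

```python
from typing import List, Tuple

Gene = Tuple[str, str]  # (ortholog label, strand "+" or "-")


def flip(region: List[Gene]) -> List[Gene]:
    """View the same region from the opposite strand: reverse order, invert strands."""
    return [(name, "-" if strand == "+" else "+") for name, strand in reversed(region)]


def conserved_contiguity(region_a: List[Gene], region_b: List[Gene]) -> bool:
    """True if gene order and relative orientation agree in either reading direction."""
    return region_a == region_b or region_a == flip(region_b)


# Placeholder regions (labels and strands are illustrative assumptions).
species_1 = [("flank_left", "+"), ("target", "+"), ("flank_right", "-")]
species_2 = [("flank_right", "+"), ("target", "-"), ("flank_left", "-")]  # same region, other strand
species_3 = [("flank_left", "+"), ("target", "-"), ("flank_right", "-")]  # target gene inverted

print(conserved_contiguity(species_1, species_2))
print(conserved_contiguity(species_1, species_3))
```

A practical version would first map each gene to an ortholog group by sequence similarity (the homology criterion) before applying this order-and-orientation test (the nonhomology criterion).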
We found another stPrP-like gene, which we here call stPrP-3, in zebrafish in the Ensembl genomic contig NA3274.1. Its proximal flanking gene is unknown, but its distal flanking genes code for green cone photoreceptor (gcp) and annexin a6 (anxa6).
We observe some major differences between the PrP-related genes in the mammalian and fish genomes. First, neither PRND nor PRNT genes were detected in fish. Although the similarity in position and orientation of PRNT in human and the PrP-like gene in pufferfish might suggest orthology, sequence comparison shows no similarity (i.e., they are different genes). Conversely, no PrP-like protein-coding genes were found in mammals. Suzuki et al. (2002) correctly noted that the genes encoding Rassf2 and Slc23a1 are in proximity to the PrP-like protein-coding gene in Fugu but reported them in a different gene order and relative gene orientation. Rivera-Milla, Stuermer, and Malaga-Trillo (2003) stated that the genomic environments around the Fugu genes encoding PrP461 and PrP-like proteins seemed to have undergone multiple chromosomal rearrangements, but they did not present supporting data. In the Fugu genomic data, Oidtmann et al. (2003) found the stPrP-1-coding gene on scaffold_96, and the stPrP-2-coding gene on scaffold_155, adjacent to the PrP-like protein-coding gene; they concluded it was unlikely that ancestors of the stPrP-1 and stPrP-2 genes evolved directly into present mammalian forms of PRNP and PRND. As supported by our later phylogenetic analysis, we suggest that stPrP-2 shares an ancestor with the fish gene encoding stPrP-1, but the gene duplication giving rise to stPrP-1 and stPrP-2 occurred after the evolutionary separation of fish and mammals. Although stPrP-2 in pufferfish and PRNP in mammals share the same proximity to the Rassf2-coding and Slc23a1-coding genes and are likely orthologous, their sequences have diverged greatly, and they may have evolved significantly different functions. The relaxation of conserved contiguity between the sites with different genes distal to PRNP and stPrP-2 (i.e., PRND and PrP-like genes) suggests independent duplications producing PRNP-PRND and stPrP-2 and PrP-like genes.
As shown in figure 1, the C-terminal regions of PrP and stPrP-2 proteins have similar features, but these are not present in the PrP-like protein sequences; hence, the function of the PrP-like protein is likely different from that of stPrP-2.
Identification of Genomic Sequences Containing SPRN in Mammals and Fish
We recently characterized the genomic environment of the SPRN gene in the human, mouse, rat, Fugu, and zebrafish genomes (Premzl et al. 2003); this is summarized in figure 3, together with that of Tetraodon. Interestingly, we have recently detected an apparent duplication of the SPRN gene in human, about 140 kb downstream, from genomic data (L. Sangiorgio, B. Strumbo, and T. Simonic, unpublished results). We obtained the Tetraodon SPRNA context by finding overlapping genomic sequence reads in the Genoscope database (http://www.genoscope.cns.fr/) and assembling them into a virtual genomic contig (Tetraodon virtual contig 2 [see below]). (The fish gene is here labeled SPRNA to differentiate it from the related gene in fish, SPRNB, reported in this paper.) Genes adjacent to SPRN in mammals and SPRNA in pufferfish are those encoding the GTP-binding protein (GTP) and AO; both gene order and relative orientation are conserved in the two vertebrate clades, implying orthology. However, in zebrafish, although we found the GTP-binding-protein gene adjacent to sprna in the same tail-to-tail orientation, the proximal adjacent gene is the long-chain fatty-acyl elongase (fae)-coding gene rather than the AO-homologous gene. Because the next proximal genes are not known from the assembly, it is not clear whether the AO-encoding gene is present.
Among the four genes in this genomic fragment, the exon-intron structure is known from comparison of the genomic and cDNA sequences only for the PrP-like protein-coding gene. The single-exon ORF of the stPrP-2-coding gene and GenScan predictions for Rassf2-coding gene and Slc23a1-coding gene exons are also shown. The dot plot indicates that the Tetraodon virtual contig 1 is assembled in an order consistent with the Fugu genomic sequence; we conclude the assembly is correct and valid for comparative analysis. Additionally, we deduced the Tetraodon stPrP-2 amino acid sequence from the genomic sequence information (see later figure S3 of Supplementary Material online).
Second, we identified 19,029 bp of Tetraodon genomic sequence (FS_CONTIG_4144_1, FS_CONTIG_31029_1, FS_CONTIG_37429_1; Genoscope) containing the SPRNA gene and adjacent genes, and merged it into the Tetraodon virtual contig 2. The dot plot of 10 kb of this sequence aligned to orthologous Fugu genomic sequence (chr_scaffold_28: 384,496 to 394,338 bp; Ensembl) is shown in figure S1B of Supplementary Material online; this shows the GenScan exon predictions for the single-exon Sho ORF and for genes encoding AO and GTP-binding protein. The dot plot again shows consistent patterns in the aligned sequences, suggesting correct assembly of the second virtual Tetraodon contig and its suitability for computational cross-species comparisons. Conservation of sequence proximal (4.5 kb) and distal (6 kb) to the GTP-binding protein-coding gene may denote exons not recognized by the GenScan prediction (Guigo et al. 2003) (see also figure S2 of Supplementary Material online).
Comparative Genomic Analysis of PRNP and SPRN Genomic Regions
Genomic sequence annotation comprises gene order and relative transcriptional orientation, gene structure, distribution of repeat elements, and distribution of GC islands. The information for the human (base [reference] sequences in our study) and mouse genomes is provided in the interactive Ensembl genome browsers (http://www.ensembl.org/). An extensive collection of transcripts is available only for these two species, with the rat, Fugu, and zebrafish genome annotations (http://www.ensembl.org/) being much less comprehensive. There are some major differences between the genomes of homeothermic and poikilothermic species; for example, the GC compositional heterogeneity in poikilothermic animals is less pronounced (Aparicio et al. 2002). In addition, the depth of fish transposable element analysis is less than that of primates and rodents (Aparicio et al. 2002). In the following section, we compare annotations of the human and mouse PRNP, PRND, PRNT, and SPRN genes and of their local genomic contexts.
Gene Structure, Gene Features, Gene Density, and CpG Islands
Genomic location, gene structure, gene size, GC content, and features of exons and introns of human and mouse PRNP, PRND, PRNT, and SPRN genes are summarized in table S1 of Supplementary Material online.
We found that gene density and GC content are much higher in the SPRN genomic environment than in the PRNP environment, in all mammals analyzed (figs. 2 and 3, and see also tables S2 and S3 of Supplementary Material online). There are five genes in 380,074 bp of the human PRNP local genomic environment, which has a GC content of 45.02%, compared with three genes in 51,425 bp of the human SPRN gene context, which has a GC content of 50.66%. In all four genes (PRNP, PRND, PRNT, and SPRN), the ORF is contained in a single coding exon. Their gene lengths correlate inversely with GC content, and GC content is also higher in the exons (comparing exon 1/intron 1 pairs and, where relevant, exon 2/intron 2 pairs). The GC content of the human and mouse SPRN genes (66% and 58%, respectively) is much higher than that of PRNP (42% and 45%), PRND (46% and 47%), and PRNT (43%, human). For reference, mammalian GC content is known to vary genome-wide at different scales; for example, the average GC content of the human genome is 41%, ranging from 36% to 47% on the large scale (<10 Mb) and from 33% to 59% on the small scale (<300 kb) (Lander et al. 2001), with similar values reported for the mouse genome (Waterston et al. 2002). An analysis of CpG islands in the human and mouse genomic contexts of PRNP, PRND, PRNT, and SPRN using the cpgplot program (Larsen et al. 1992) (results not shown) showed islands associated with the PRNP and SPRN promoter regions but not those of PRND (as shown by Comincini et al. [2001]) or PRNT.
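The gene-density comparison above follows directly from the region coordinates given in Methods and Materials. The sketch below recomputes the two region lengths from the human VISTA coordinates (treated as inclusive), converts the gene counts to genes per 100 kb, and includes a simple GC-content helper. The function names are hypothetical and the helper is a generic calculation, not the tool used for the published values.

```python
def region_length(start: int, end: int) -> int:
    """Length of a region given inclusive start and end coordinates."""
    return end - start + 1


def genes_per_100kb(n_genes: int, length_bp: int) -> float:
    """Gene density normalized to 100 kb."""
    return n_genes / length_bp * 100_000


def gc_percent(seq: str) -> float:
    """Percentage of G and C among the A, C, G, T bases of a sequence."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (s.count("G") + s.count("C")) / acgt


prnp_len = region_length(4_558_866, 4_938_939)      # human chr20 PRNP context
sprn_len = region_length(134_322_741, 134_374_165)  # human chr10 SPRN context

print(prnp_len, round(genes_per_100kb(5, prnp_len), 2))
print(sprn_len, round(genes_per_100kb(3, sprn_len), 2))
```

The computed lengths reproduce the 380,074 bp and 51,425 bp quoted in the text, giving roughly 1.3 versus 5.8 genes per 100 kb, a more than fourfold difference in gene density.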
Distribution of Transposable Elements in PRNP and SPRN Genes
The role of repeats in the genome is controversial. Although repeats are often considered just "junk," it is now recognized that active elements have reshaped genomes (e.g., creating new, modified, or reshuffled genes). Passive elements are excellent markers for mutation and selection analyses (Lander et al. 2001). Differences in the repeat content and distribution denote differences in the gene/genome dynamics (Thomas et al. 2003). Discernible interspersed repeats are reported to comprise approximately 45% of the human genome (Lander et al. 2001) and 37.5% of the mouse genome (Waterston et al. 2002). The lower content in mouse is thought to be caused by a higher nucleotide substitution rate in mouse (4.5 × 10⁻⁹ substitutions per site per year) than in human (2.2 × 10⁻⁹), which makes older elements difficult to identify. The depth of the repeat analysis using the RepeatMasker program (http://ftp.genome.Washington.edu/RM/RepeatMasker.html) is 150 to 200 and 100 to 120 MYA for human and mouse elements, respectively (Waterston et al. 2002).
Our analysis of the distribution of interspersed repeat elements in the local genomic environments of the human, mouse, and rat PRNP and SPRN genes shows more transposable repeats in human than in mouse and rat, for both regions (given in tables S2 and S3 of Supplementary Material online). There are more interspersed elements in the SPRN genomic environment than in that of PRNP of human and rat but fewer in mouse.
At the single-gene level, however, we observe some striking differences in the transposable-element content and distribution; these are summarized in table 2 for the human and mouse PRNP, PRND, PRNT, and SPRN genes. For a detailed analysis of repeat distribution in eutherian PRNP and PRND genes, see Lee et al. (1999) and Comincini et al. (2001). The repeat content of these four genes correlates with their length and inversely with their GC content; thus, PRNP has the most repeats. Lee et al. (1999) suggested that the gene had expanded independently in all lineages since the mammalian radiation by numerous insertions. In striking contrast, SPRN, the shortest gene with the highest GC content, is devoid of transposable elements.
Alignment of Genomic Sequences
Orthologous genes can usually be aligned and recognized by comparative genomic analysis in closely related species. Between evolutionarily more distant species, such as mammals and fish, it is the coding regions that are primarily recognizable (Frazer et al. 2003). However, where rapid divergence of nucleotide sequence, indels, and gene loss or acquisition has occurred, sequences cannot readily be aligned, so orthologs will not be recognized (Kellis et al. 2003). Indeed, the analysis of Thomas et al. (2003) showed that almost one third of human coding sequence did not align to fish in the corresponding genomic region.
To detect highly diverged but orthologous sequences in two long contiguous sequences, global alignments are an advantage (Frazer et al. 2003). We used the VISTA global alignment tool (Mayor et al. 2000) for this purpose to detect conserved, orthologous elements in mammalian and fish genomic sequences. Cross-species VISTA analysis of genomic fragments containing the mammalian PRNP and fish stPrP-2–coding and PrP-like protein–coding genes is shown in figure 4, with similar analysis for the mammalian and fish SPRN genes in figure S2 of Supplementary Material online. Description of the genomic context and coordinates is given in Materials and Methods.
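Conservation plots of this kind summarize a pairwise alignment as percent identity in a sliding window. The sketch below illustrates that idea only; it is not VISTA's implementation. The function name is hypothetical, and the default window of 100 aligned columns is an illustrative assumption rather than a parameter taken from the text.

```python
def window_identity(aln_a: str, aln_b: str, window: int = 100, step: int = 1):
    """Return a list of (start_column, percent_identity) for each window.

    aln_a and aln_b are two rows of a pairwise alignment ('-' marks a gap).
    A column counts as a match only if both rows carry the same base.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    a, b = aln_a.upper(), aln_b.upper()
    matches = [int(x == y and x != "-") for x, y in zip(a, b)]
    results = []
    for start in range(0, len(matches) - window + 1, step):
        identical = sum(matches[start:start + window])
        results.append((start, 100.0 * identical / window))
    return results


if __name__ == "__main__":
    # Toy alignment rows, not sequences from the paper.
    for start, pct in window_identity("ACGT-ACGT", "ACGTTACGA", window=4):
        print(start, pct)
```

Windows whose identity stays above a chosen threshold would appear as conservation peaks; regions that do not align at all, as reported for PRNP versus the fish genes, would produce no such peaks.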
We observe that conservation with rodents in the human PRNT gene region differs from that of other genes shown in the plot. There is almost no conservation between rat and human (none at all in exons), and conservation between human and mouse appears poor.
In contrast to the human-rodent comparison, the VISTA plot in figure 4 detects no conservation between mammalian PRNP and fish stPrP-2–coding and PrP-like protein–coding genes. Neither homology criteria nor nonhomology criteria for gene orthology are fulfilled. First, the coding exons of the genes do not align, indicating divergence of mammalian PRNP and fish stPrP-2–coding genes beyond detectable conservation. Second, there is clear evidence of deletion, translocation, or duplication events in the local genomic sequence since divergence of mammals and fish: the PRND gene (and PRNT gene in human) exists in mammals but not in fish, whereas the PrP-like protein–coding gene is present in fish only. The adjacent RASSF2 and SLC23A1 genes align with their fish orthologs in the exons.
PipMaker Analysis of PRNT
To test the VISTA observation for the PRNT gene further, we aligned human (chr20: 4,657,104 to 4,708,670 bp; Ensembl) and mouse (chr2: 132,957,930 to 132,998,136 bp; Ensembl) genomic sequence regions between the PRND and RASSF2 genes using the PipMaker program. The dot plot results presented in figure 5, like the VISTA results, show no conservation of PRNT gene exons between human and mouse.
It is likely that the PRNT gene appeared in the human lineage after the evolutionary split with rodents. This is consistent with the position of the PRNP genomic context on chromosome 20 in a highly recombinogenic region that may have fostered a recent duplication of PRND to create the human-specific PRNT gene. However, it is also possible that the PRND duplication is more ancient and that PRNT survives as a pseudogene in other mammalian lineages as well as human but has been deleted in rodents.
VISTA Plot for SPRN Genomic Region
Alignment of the mammalian and fish genomic regions containing the SPRN and adjacent genes shows conservation in all three genes (figure S2 in Supplementary Material online). Notably, the coding exon sequence of SPRN aligns in all five pairwise alignments. We note several conservation peaks distal to the SPRN promoter in mammals and fish (human sequence positions approximately 48 to 48.5 kb and approximately 49.5 to 50 kb); a detailed analysis of the regulatory region is given below.
The coding exons of the GTP-binding protein gene are mostly conserved between mammals and fish. The large gap in the alignment (approximately 24.5 to 39 kb in human sequence) is caused by the insertion of LINE elements in human sequence only (two complete elements in antisense orientation and two truncated human LINE/L1 elements). L1s are the "young" transposable elements actively amplified during the past 40 Myr of primate evolution; that is, after the evolutionary divergence of human and rodents. The distal end of the GTP-binding protein gene overlaps with the distal end of the SPRN gene in the human, but not the mouse, sequence; the functional significance of this, if any, is unclear. A few other examples of such antiparallel overlapping of untranslated exons of functional genes have been reported (Miyajima et al. 1989; Batshake and Sundelin 1996; Dan et al. 2002).
There are four polyadenylation signals in the human GTP-binding protein–coding gene, resulting in alternative transcription of the noncoding part of its 3' terminal exon. All four sites differ in one position from canonical consensus polyadenylation signals (AAUAAA and AUUAAA); the sequence of the first (41,227 to 41,232 bp; for AK095872, BC00409, and BC000920 transcripts), second (41,262 to 41,267 bp; for BC026725 transcript), and third (41,321 to 41,326 bp; for BC035721 transcript) signals is GTTAAA. The most distal fourth signal sequence, which overlaps with the 3' end of the SPRN gene, is AATCAA (42,068 to 42,072 bp; for cDNAs AK074976, NM_138384). Untranslated gene fragments may contain regulatory sequences that affect mRNA stability and translation efficiency, so the choice of alternate polyadenylation sites may strongly affect expression of the gene (Beaudoing and Gautheret 2001). The sequence of the polyadenylation signal site for SPRN is canonical consensus AATAAA (41,454 to 41,449 bp; for BC040198 transcript).
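The statement that each noncanonical signal differs in one position from a canonical hexamer can be checked with a Hamming-distance comparison. The following is a minimal sketch; the function names are hypothetical, and the canonical signals are written as DNA (AATAAA and ATTAAA) to match the genomic sequences quoted above.

```python
CANONICAL_SIGNALS = ("AATAAA", "ATTAAA")  # DNA forms of AAUAAA and AUUAAA


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_canonical(hexamer: str):
    """Return (canonical_signal, mismatches) for the nearest canonical signal."""
    h = hexamer.upper().replace("U", "T")
    return min(((sig, hamming(h, sig)) for sig in CANONICAL_SIGNALS),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    for signal in ("GTTAAA", "AATCAA", "AATAAA"):
        print(signal, closest_canonical(signal))
```

With these definitions, GTTAAA is one mismatch from ATTAAA, AATCAA is one mismatch from AATAAA, and the SPRN signal AATAAA matches the consensus exactly.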
The exons of the third gene encoding AO are conserved between mammals and pufferfish but not zebrafish. In the zebrafish sequence, the third gene is for the long-chain fatty-acyl synthetase (fig. 3). As the zebrafish contig does not contain the next proximal gene, it is not clear whether the synthetase gene is merely inserted or the AO-coding gene, which is found on another contig, is at another location.
Conservation of the SPRN-coding exon between mammals and fish satisfies homology criteria for gene orthology. Next, conserved gene order and relative transcription orientation of SPRN and its adjacent gene encoding GTP-binding protein (including also AO-coding gene in pufferfish) between mammals and fish indicates that no rearrangement occurred in this genomic fragment after the evolutionary divergence of fish and mammals 450 MYA. Thus, we conclude that the SPRN gene is likely to be orthologous between mammals and fish.
Phylogenetic Footprinting of the SPRN Gene
"Phylogenetic footprinting" is a method for identifying regulatory elements in a single gene by comparing orthologous sequences from several species (Blanchette and Tompa 2002). The basic assumption is that functional regions, including regulatory elements, are under greater selective pressure and will be more conserved between species than nonfunctional regions. However, as regulatory elements may be quite short (5 to 20 bp), it is difficult to recognize them in the nonfunctional background noise. We used the program Footprinter (Blanchette and Tompa 2002; http://abstract.cs.washington.edu/blanchem/FootPrinterWeb/FootPrinterInput.pl) to identify such elements conserved in the human, mouse, and Fugu SPRN genes. The program reports sets of conserved motifs, taking into account a phylogenetic tree relating the input species.
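The underlying idea, that short functional elements stand out because they are shared between orthologous sequences, can be illustrated with a deliberately naive search for exact k-mers present in every input sequence. This sketch is not the Footprinter algorithm, which weighs substitutions against a phylogenetic tree; here exact matching replaces that scoring, the function names are hypothetical, and the default motif length of 6 is an assumption.

```python
def kmer_positions(seq: str, k: int) -> dict:
    """Map each k-mer in seq to the list of its 0-based start positions."""
    s = seq.upper()
    positions = {}
    for i in range(len(s) - k + 1):
        positions.setdefault(s[i:i + k], []).append(i)
    return positions


def shared_kmers(seqs: dict, k: int = 6) -> dict:
    """Return {kmer: {species: [positions]}} for k-mers found in all species."""
    if not seqs:
        return {}
    tables = {name: kmer_positions(seq, k) for name, seq in seqs.items()}
    common = set.intersection(*(set(table) for table in tables.values()))
    return {kmer: {name: tables[name][kmer] for name in tables}
            for kmer in sorted(common)}


if __name__ == "__main__":
    # Toy sequences for illustration only, not SPRN upstream regions.
    toy = {"human": "TTGACGTCATT", "mouse": "CCGACGTCAGG", "fugu": "GACGTCA"}
    for kmer, hits in shared_kmers(toy).items():
        print(kmer, hits)
```

Exact matching of this kind would miss motifs with substitutions or indels, which is one reason tree-aware methods such as Footprinter are preferred in practice.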
As shown in table 3, we identified 16 conserved motifs upstream of the SPRN ORF, in the intron, exon 1, and upstream promoter. In human and mouse, five motifs were detected in the upstream promoter, one in exon 1, and 10 in the intron, as shown in figure 6. Some motifs are duplicated. Although this set of motifs contains candidates for regulatory regions of the SPRN gene, it may also contain false positives. Also, Footprinter may miss motifs present in a single species (i.e., false negatives), motifs shorter than 6 bp in multiple species, motifs containing indels, motifs that fail to meet statistical significance, and dimers with variable internal sequences (Blanchette and Tompa 2002). We next checked whether any known transcription factor–binding sites were among the detected motifs using the MatInspector program (Quandt et al. 1995; http://www.genomatix.de) for human and mouse sequences. This program predicts transcription factor–binding sites deposited in the TRANSFAC database (Wingender et al. 1996). We detected 155 and 159 likely transcription factor–binding sites in the human and mouse sequences, respectively (69 and 82 in the intron, 11 and 14 in exon 1, and 75 and 62 in the upstream promoter); full lists for human and mouse are given in tables S4 and S5, respectively, of the Supplementary Material online.
Motif 4 denotes the binding site for the nurr1 nuclear receptor, which is expressed only in brain and is known to play an important role in coordinate neuroendocrine regulation of activity of the hypothalamic/pituitary/adrenal axis (Murphy and Conneely 1997). It is also critical for dopaminergic neuron development by activating tyrosine hydroxylase transcription in a cell context–dependent manner (Kim et al. 2003). Aberrations in the dopaminergic system are associated with Parkinson disease and schizophrenia.
Motif 14 corresponds to the activating transcription factor 6 (ATF6) binding site. ATF6 is a member of the basic leucine-zipper family. It is involved in induction of the endoplasmic reticulum (ER) stress response, during which transcription of genes encoding molecular chaperones and folding enzymes located in the ER is upregulated. Some genes in this pathway are directly activated through ATF6 binding sites. Upstream of ATF6 in the ER stress response is IRE1 (Wang et al. 2000). Remarkably, the ER stress-response pathway is involved in familial Alzheimer disease (FAD) pathogenesis. FAD-linked PS1 mutants attenuate autophosphorylation of IRE1 and lead to impaired induction of the ER stress response. These mutants also attenuate the ATF6 signaling pathway (Kudo et al. 2002). We have already shown predominant expression of mammalian SPRN in brain (Premzl et al. 2003). These new findings of conserved and clustered nurr1 and ATF6 transcription factor–binding site motifs in the human and mouse introns further suggest a brain-specific function for Sho and identify targets for experimental analysis.
Finally, part of a third conserved motif, motif 16, binds the MYC-associated zinc-finger protein–related transcription factor (MAZR). MAZR interacts with Bach2, a B-cell and neuron-specific transcription repressor (Kobayashi et al. 2000).
Protein Alignment and Phylogenetic Analysis
As already discussed for the genomic sequences and summarized in figures 1–3 and table 1, we have expanded the data set of fish proteins related to PrP. First, we deduced a sequence of 395 amino acids for Tetraodon stPrP-2 protein from the assembled genomic sequence. Second, we translated the zebrafish stPrP-3–coding sequence in Ensembl genomic contig NA3274.1 into a protein of 561 amino acids. Third, based on the incomplete genomic sequence we had assembled, we cloned and sequenced the Tetraodon Sho ORF and deduced a 155-residue protein, thus adding a third fish Sho sequence to those for zebrafish and Fugu we had previously reported (Premzl et al. 2003). Last, as already discussed, we identified a new Sho-related class, Shadoo2 (Sho2), of sequences in public databases. We were able to deduce Sho2 protein sequences of 150, 150, and 135 amino acids, respectively, from genomic information for Fugu (scaffold_96 from Ensembl), Tetraodon (FS_CONTIG_41464_1 from Genoscope), and zebrafish (ctg10456 from Ensembl). Carp Sho2 was conceptually translated from EST data (CA964511 from NCBI) to give a protein of 145 amino acids.
As described in Materials and Methods, the amino acid sequences of 11 PrP-related proteins from human, chicken, turtle, Xenopus, Fugu, Tetraodon, and zebrafish and 10 Sho and Sho2 proteins from human, mouse, rat, Fugu, Tetraodon, zebrafish and carp, were aligned independently using ClustalW (Thompson, Higgins, and Gibson 1994). After refinement of the alignments using GDE (Smith et al. 1994), we assessed the alignments and chose 118 and 149 sites from the PrP-related and Sho/Sho2 proteins, respectively, for phylogenetic analyses using the maximum-likelihood program MOLPHY (Adachi and Hasegawa 1996). The alignments are shown in figures S3 and S4 of Supplementary Material online.
The analysis of the PrP-related sequence set identified a single most-likely tree, shown in figure 7, and 281 near-optimal trees, none of which differed significantly from the most-likely tree. The total tree length is 5.65, implying that every site in the alignment (figure S3 of Supplementary Material online) has changed on the average 5.65 times, which, in turn, implies that interpretation of the tree must be done with some caution; this is also reflected in several low LBP and RLS values (fig. 7). However, the fact that we found only 282 "good" trees (i.e., the most-likely tree and near-optimal trees) out of 34,459,425 possible trees places the result in a better light.
The analysis of the Shadoo proteins identified a single most-likely tree, shown in figure 8, and 48 near-optimal trees, none of which differed significantly from the most likely tree. The total tree length is 4.77, implying that every site in the alignment (figure S4 of Supplementary Material online) has changed on the average 4.77 times. Again, this implies caution must be applied in interpretation of the tree, this need being reflected also in some low LBP and RLS values (fig. 8). However, again we note we found only 49 "good" trees (i.e., the most-likely tree and the near-optimal trees) out of 2,027,025 possible trees, setting the result in a better light.
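The totals of possible trees quoted for the two analyses match the standard count of unrooted bifurcating trees for n taxa, (2n − 5)!! = 1 × 3 × 5 × ... × (2n − 5), with n = 11 for the PrP-related set and n = 10 for the Sho/Sho2 set. The short sketch below reproduces the arithmetic; the function name is hypothetical, and the assumption that the quoted totals were obtained from this formula is ours.

```python
def unrooted_bifurcating_trees(n_taxa: int) -> int:
    """Number of distinct unrooted bifurcating trees: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are required")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


if __name__ == "__main__":
    for n_taxa, good_trees in ((11, 282), (10, 49)):
        total = unrooted_bifurcating_trees(n_taxa)
        print(n_taxa, total, good_trees / total)
```

For 11 and 10 taxa this gives 34,459,425 and 2,027,025 trees, so the 282 and 49 "good" trees represent very small fractions of the respective tree spaces.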
It is established that gene duplication occurred in several fish lineages and that many duplicated fish genes have only one homolog in mammals (Taylor et al. 2003; Aparicio et al. 2002). Two fates of duplicated genes have been proposed. The classical model of neofunctionalization predicts that one of the duplicate loci retains its original function, whereas the other duplicate is fixed only if rare beneficial mutations occur (Ohno 1970). This model fits current knowledge of Shos and Sho2s, with the Sho2 duplicate either deleted in mammals (and other tetrapods?) or so highly diverged by neofunctionalization that its mammalian ortholog is not recognizable. The alternative model proposes that both duplicates are preserved because of subfunctionalization, where proteins encoded by the duplicates complement each other functionally (Force et al. 1999). This model fits the fish stPrPs, which have sequences similar to each other but dissimilar to those of tetrapod PrPs.
Protein Sequence Features
We introduced previously a model for the structures of mature Sho, PrP, and PrP-related proteins, which is a useful basis for discussing both conserved and highly variable regions in these sequences (Premzl et al. 2003). The model consists of four regions: a basic region 1, which shows a tendency for insertion of repeat or other sequence (region 2), a hydrophobic region 3, and a C-terminal region 4. Figure 1 and figures S3 and S4 of Supplementary Material online show the model represents a quite dynamic structural scaffold, with substantial differences in the presence, lengths, and sequence compositions of the regions and insertions among the complete Sho/PrP set, particularly taking account of the numerous fish proteins. As we have already compared the sequence features of all proteins in figure 1 except for the ones newly reported here (TestPrP-2, ZestPrP-3, TeSho, FuSho2, TeSho2, CaSho2, and ZeSho2 [see table 1]), we confine attention here largely to the new proteins.
Fish proteins from the known stPrP set (FustPrP-2, 424 residues; TestPrP-2, 395 residues; FustPrP-1, 461 residues; and ZestPrP-3, 561 residues) are much longer than tetrapod PrPs (frog, 216 residues; turtle, 270 residues; chicken, 273 residues; and human, 253 residues) or fish PrP-like proteins (170 to 190 residues) and show significant sequence heterogeneity among themselves, especially in the seemingly "free format" low-complexity repeats and large insertion in region 2 and for large insertions within region 4. Most of the difference in length between TestPrP-2 and FustPrP-2 is in the large insertion in region 2, whereas the very large ZestPrP-3 protein shows increases of approximately 90 and approximately 45 residues in the region 2 insertion and in region 4, respectively, compared with stPrP-2s. More useful alignment and analysis requires better knowledge of the number of stPrP proteins in individual fish and their variation among the major fish lineages (see also Protein Alignment and Phylogenetic Analysis above). As shown in figure 1, the new stPrPs are also predicted to have N-terminal and C-terminal signal sequences for extracellular export and GPI-anchor attachment, respectively (Premzl et al. 2003 and Materials and Methods), and also one disulfide bridge and three N-glycosylation sites.
The main significantly conserved feature among all PrP-related proteins is the hydrophobic region 3 (residues 112 to 131 in HuPrP) and also subsequent sequence in region 4 up to about residue 160 (HuPrP). Interestingly, this includes the beginning of sequence shown to be folded in the NMR structures (L125-R228 for HuPrP [Zahn et al. 2000]) and includes the first unusual (highly hydrophilic) α-helix (D144-M154, HuPrP) and the first part of the antiparallel β-sheet (Y128-G131, HuPrP). This segment also includes the region from residues 119 to 138 (HuPrP) implicated as a PrPC dimerization site and potential binding site for PrPC-PrPSc complex formation (Horiuchi and Caughey 1999; Zuegg and Gready 2000). We note a repetitive insertion in this region in fish PrP-like proteins.
Compared with PrP-related proteins, conservation of features, including length, is much higher in the Sho/Sho2 set. We previously noted very high conservation (identity 81% to 96%) among mammalian Shos (human, mouse, and rat), and good conservation, particularly for zebrafish, between fish and mammalian sequences to slightly beyond the end of the hydrophobic sequence (identity 41% to 53%, zebrafish 1 to 78) (Premzl et al. 2003). The fish proteins are all of similar length (FuSho, 146 residues; TeSho, 149 residues; ZeSho, 131 residues; FuSho2, 150 residues; TeSho2, 150 residues; ZeSho2, 136 residues; CaSho2, 145 residues). There is an insertion in the Fugu and Tetraodon Sho basic repeats that is not present in other Shos. Of note, however, is that Sho2 sequence in this region is different from that in Shos, with the Sho basic region N-terminal to the hydrophobic region being missing in Sho2s. Although there are some local regions of sequence conservation, including around the N-glycosylation site conserved in all Shos and Sho2s, the C-terminal regions of fish Shos and Sho2s are quite diverged; this is also the sequence region most diverged between fish and tetrapod Shos (see above).
Model for the Evolution of PRNP-Related and SPRN-Related Genes
In their paper reporting the finding of the fish PrP-like protein–coding genes, Suzuki et al. (2002) made the initial suggestion of an evolutionary link between the fish PrP-like protein–coding genes and tetrapod PRNP genes. This was based on both weak "homology" (both encode extracellular GPI-anchored proteins with repeats and an unusual internal hydrophobic region [see figure 1]) and context-based (conservation of genes encoding Rassf2 and Slc23a1 in both genomic contexts) criteria. The report by Oidtmann et al. (2003) of the stPrP-2–coding gene proximal to the PrP-like protein–coding gene in Fugu somewhat strengthened the homology argument for an evolutionary relationship between tetrapod PRNPs and this fish gene because the C-terminal regions of the proteins are similar, whereas those of PrP-like proteins are quite different (see figure 1). However, this finding weakened the context-based argument. Although the orientation of the stPrP-2 gene is now the same as that of mammalian PRNP, the two genes are separated from the Rassf2 coding gene by different genes (PrP-like gene and PRND) with different orientations (fig. 2). We have suggested previously that this could be accounted for by independent duplications. Oidtmann et al. (2003) also reported another fish gene, stPrP-1, in another genomic context. In our earlier paper (Premzl et al. 2003) reporting the finding of the SPRN gene in both fish and mammals, we provided clear evidence, both homology and context based, that suggested a direct evolutionary relationship. However, at that stage, we had no evidence for an evolutionary link between Sho and PrP proteins, although the overall structural features were intriguingly similar.
In this paper, we applied several methods of homology-based (VISTA genomic global alignment, promoter-region footprinting, and protein sequence alignment and phylogenetics) and context-based (genomic context and relative gene order and orientation) analysis to see whether more definitive statements can be made on the relationships between the fish and tetrapod genes. We also included in the analyses sequences of several relevant new genes we found in fish. It is now clear that fish contain a plethora of genes with similarities to PRNP and SPRN, with quite diverged sequences even among the few fish lineages available to us. Also, data on SPRN and PRND genes of tetrapods other than mammals are lacking. Remarkably, in all these genes, the ORF is contained within a single exon. Hence, although we can make some firm conclusions, others are more tentative and await the correction of the current draft data for Fugu, Tetraodon, and zebrafish genomes as well as the availability of more data for tetrapod lineages other than mammals (i.e., amphibians, reptiles, birds, marsupials, and monotremes).
Considering first the issues for PrP and related genes, our VISTA analysis for global alignment of genomic environment does not show conservation between mammalian PRNP and fish stPrP-2–coding and PrP-like protein–coding genes, although the adjacent Rassf2 and Slc23a1 coding genes align with their fish orthologs in the exons. There is clear evidence of rearrangements in this genomic region, as the PRND gene (and PRNT gene in human) exists in mammals but not in fish, whereas the PrP-like protein–coding gene is so far reported only in fish. It thus seems clear that the PRNP and stPrP-2–coding genes have diverged greatly since they last shared a common ancestor. There is no basis for suggesting functional homology; that is, the genes are likely to have evolved significantly different functions. Second, our phylogenetic analysis suggests that the gene duplication giving rise to stPrP-1/stPrP-3 and stPrP-2 is likely to have occurred after evolutionary separation of fish and mammals. The clustering of the fish stPrPs separately from the tetrapod PrPs in figure 7 is indicative of the divergence of the amino acid sequence between the fish and higher-vertebrate proteins, and of likely different functions.
Remarkably, we found a new SPRN-related gene, SPRNB, adjacent to the stPrP-1 coding gene in both Fugu and zebrafish. As current data indicate the genomic contexts of the genes (fig. 2) are different, caution in interpreting these results is necessary. However, it seems unlikely to be accidental that these two genes have been found together twice. Hence, these findings provide the first hint of an evolutionary link between the SPRN and PRNP families. The fact that these two genes code for the unique hydrophobic segment, as we reported initially (Premzl et al. 2003), is further evidence at the protein level of an evolutionary link.
Considering now the SPRN/SPRNB genes, a quite different picture emerges. All evidence suggests mammalian and fish SPRN/sprn genes are likely to be orthologous; that is, directly descended from a common ancestor and with preserved function. VISTA analysis shows the SPRN gene aligns in our cross-species analysis, together with its adjacent genes, whereas protein-sequence alignment and phylogenetic-tree analysis also indicate conservation between mammalian and fish Shos. Phylogenetic analysis shows Shos and Sho2s lying on two separate branches (fig. 8), implying the two genes were duplicated before the divergence of fish from tetrapods.
We present an evolutionary model consistent with these various findings in figure 9, as a basis for discussion and future investigation. The model proposes that the ancestral gene leading to all the PrP-related and Sho-related encoding genes was the SPRN-like gene. First, an ancient prevertebrate duplication produced the SPRNA and SPRNB genes within an environment that may have contained the AO-binding protein and GTP-binding protein encoding genes proximally, and the Rassf2 and Slc23a1 encoding genes distally. The model then proposes separation of the SPRN and SPRNB genes, by a translocation of half of the gene cluster. The subsequent history of the two branches suggests the genomic environment containing the SPRNB gene was highly recombinogenic, whereas that containing the SPRNA gene was stable, leading to the currently known fish and mammal Sho orthologs. The stability of this gene suggests that orthologous SPRN genes coding Sho protein will be found in the other tetrapod lineages. A further duplication of the SPRNB gene is then proposed, still before the divergence of fish from tetrapods, with the duplicate genes acquiring additional C-terminal coding sequence to produce SPRNB1 and SPRNB2 protogenes with a completed C-terminal domain, as in tetrapod PrP and fish stPrP-1/-2/-3 proteins (fig. 1).
After divergence of fish from tetrapods, the model proposes independent duplications of the SPRNB1 and SPRNB2 protogenes. In the tetrapod lineage, a gene duplication of SPRNB1 produced a gene cluster containing PRNP and PRND genes, together with Rassf2 and Slc23a1 coding genes. The timing of this duplication is unclear because known genes are limited to PRNPs in frogs, reptiles, birds, and mammals and PRND in mammals. As noted above and elsewhere in this paper, it is not known at what stage the SPRNB2 gene was deleted in tetrapod evolution or whether it has simply diverged beyond levels currently detectable in mammals. The model allows for a more ancient origin of PRNT than necessary simply to explain the existing minimal data (i.e., absent in rodents but present in human).
In the fish branch, initial duplications of the SPRNB1 protogene to produce stPrP-2–coding and PrP-like protein–coding genes and duplications of the SPRNB2 protogene to produce SPRNB (Sho2 coding gene, as observed) and stPrP-1–coding gene are proposed. As drawn in figure 9, these gene clusters are already separated. If the separation occurred after these duplications, translocation of the SPRNB and stPrP-1 fragment might more conveniently explain the different contexts observed in Fugu and zebrafish. The additional complexity and plethora of genes in fish is consistent with existing knowledge of the higher rate of gene duplication in fish and the fact that many duplicated fish genes have only one homolog in mammals (Taylor et al. 2003; Aparicio et al. 2002).
Dynamics of Mammalian PRNP and SPRN Genes
We observed some interesting differences in the mammalian SPRN and PRNP gene characteristics, which offer some insights into how these genes have been evolving. First, although in the SPRN and PRNP local genomic environments (covering several genes as in VISTA analyses in figure 4 and in figure S2 of Supplementary Material online), we found the percentage of repetitive elements is comparable (tables S2 and S3 of Supplementary Material online), at the single-gene level the picture is strikingly different. The SPRN gene has no transposable elements, whereas PRNP and PRND have up to 46% and 24%, respectively, in human, all in the introns (table 2). Consistent with this picture of gene dynamics, the brain-specific SPRN gene has evolved conservatively between fish and mammals, and Sho is predicted to have a conserved basic function in vertebrate brain.
By contrast, we have shown that the PRNP gene has undergone major and rapid changes at several levels of comparison: it does not align with sequences of fish stPrP-2–coding genes (fig. 4); the sequence, and subsequence composition (Strumbo et al. 2001), of the tetrapod PrPs (frog, reptile, bird, and mammal) show significant differences; and the number of exons and size of introns vary significantly in mammals (Lee et al. 1998; Makrinou, Collinge, and Antoniou 2002 [table 2]). These features are characteristic of genes evolving under relaxed evolutionary constraints, typical of proteins with rapidly evolving function (Harrison and Gerstein 2002; Kitami and Nadeau 2002). What these functions are has yet to be elucidated. Despite extensive study, the functions of mammalian PrP remain elusive (Aguzzi and Hardt 2003), and no functional studies, or tissue expression profiles, of the other tetrapod PrPs have been reported.
References
Adachi, J., and M. Hasegawa. 1996. MOLPHY Version 2.3: Programs for molecular phylogenetics based on maximum likelihood. The Institute of Statistical Mathematics, Tokyo, Japan.
Aguzzi, A., and W. D. Hardt. 2003. Dangerous liaisons between a microbe and the prion protein. J. Exp. Med. 198:1–4.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.
Aparicio, S., J. Chapman, E. Stupka et al. (41 co-authors). 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301–1310.
Batshake, B., and J. Sundelin. 1996. The mouse genes for the EP1 prostanoid receptor and the PKN protein kinase overlap. Biochem. Biophys. Res. Commun. 227:70–76.
Beaudoing, E., and D. Gautheret. 2001. Identification of alternate polyadenylation sites and analysis of their tissue distribution using EST data. Genome Res. 11:1520–1526.
Blanchette, M., and M. Tompa. 2002. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res. 12:739–748.
Brenner, S., G. Elgar, R. Sandford, A. Macrae, B. Venkatesh, and S. Aparicio. 1993. Characterization of the pufferfish (Fugu) genome as a compact model vertebrate genome. Nature 366:265–268.
Chiaromonte, F., S. Yang, L. Elnitski, V. B. Yap, W. Miller, and R. C. Hardison. 2001. Association between divergence and interspersed repeats in mammalian noncoding genomic DNA. Proc. Natl. Acad. Sci. USA 98:14503–14508.
Comincini, S., M. G. Foti, M. A. Tranulis, D. Hills, G. Di Guardo, G. Vaccari, J. L. Williams, I. Harbitz, and L. Ferretti. 2001. Genomic organization, comparative analysis, and genetic polymorphisms of the bovine and ovine prion Doppel genes (PRND). Mamm. Genome 12:729–733.
Comparative Genome Organization Workshop. 1996. Comparative genome organization of vertebrates. Mamm. Genome 7:717–734.
Dan, I., N. M. Watanabe, E. Kajikawa, T. Ishida, A. Pandey, and A. Kusumi. 2002. Overlapping of MINK and CHRNE gene loci in the course of mammalian evolution. Nucleic Acids Res. 30:2906–2910.
Eisen, J. A., and M. Wu. 2002. Phylogenetic analysis and gene functional predictions: phylogenomics in action. Theor. Popul. Biol. 61:481–487.
Eisenhaber, B., P. Bork, and F. Eisenhaber. 1999. Prediction of potential GPI-modification sites in proprotein sequences. J. Mol. Biol. 292:741–758.
Force, A., M. Lynch, F. B. Pickett, A. Amores, Y. L. Yan, and J. Postlethwait. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531–1545.
Frazer, K. A., L. Elnitski, D. M. Church, I. Dubchak, and R. C. Hardison. 2003. Cross-species sequence comparisons: a review of methods and available resources. Genome Res. 13:1–12.
Gilligan, P., S. Brenner, and B. Venkatesh. 2002. Fugu and human sequence comparison identifies novel human genes and conserved non-coding sequences. Gene 294:35–44.
Graves, J. A., and M. Westerman. 2002. Marsupial genetics and genomics. Trends Genet. 18:517–521.
Guigo, R., E. T. Dermitzakis, P. Agarwal et al. (12 co-authors). 2003. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc. Natl. Acad. Sci. USA 100:1140–1145.
Harris, D. A., D. L. Falls, F. A. Johnson, and G. D. Fischbach. 1991. A prion-like protein from chicken brain copurifies with an acetylcholine receptor-inducing activity. Proc. Natl. Acad. Sci. USA 88:7664–7668.
Harrison, P. M., and M. Gerstein. 2002. Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J. Mol. Biol. 318:1155–1174.
Hirotsune, S., N. Yoshida, A. Chen, L. Garrett, F. Sugiyama, S. Takahashi, K. Yagami, A. Wynshaw-Boris, and A. Yoshiki. 2003. An expressed pseudogene regulates the messenger-RNA stability of its homologous coding gene. Nature 423:91–96.
Hirsh, A. E., and H. B. Fraser. 2001. Protein dispensability and rate of evolution. Nature 411:1046–1049.
Horiuchi, M., and B. Caughey. 1999. Specific binding of normal prion protein to the scrapie form via a localized domain initiates its conversion to the protease-resistant state. EMBO J. 18:3193–3203.
Jermiin, L. S., G. J. Olsen, K. L. Mengersen, and S. Easteal. 1997. Majority-rule consensus of phylogenetic trees obtained by maximum-likelihood analysis. Mol. Biol. Evol. 14:1296–1302.
Kellis, M., N. Patterson, M. Endrizzi, B. Birren, and E. S. Lander. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241–254.
Kim, K. S., C. H. Kim, D. Y. Hwang, H. Seo, S. Chung, S. J. Hong, J. K. Lim, T. Anderson, and O. Isacson. 2003. Orphan nuclear receptor Nurr1 directly transactivates the promoter activity of the tyrosine hydroxylase gene in a cell-specific manner. J. Neurochem. 85:622–634.
Kirkness, E. F., V. Bafna, A. L. Halpern et al. (11 co-authors). 2003. The dog genome: survey sequencing and comparative analysis. Science 301:1898–1903.
Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170–179.
Kitami, T., and J. H. Nadeau. 2002. Biochemical networking contributes more to genetic buffering in human and mouse metabolic pathways than does gene duplication. Nat. Genet. 32:191–194.
Kobayashi, A., H. Yamagiwa, H. Hoshino, A. Muto, K. Sato, M. Morita, N. Hayashi, M. Yamamoto, and K. Igarashi. 2000. A combinatorial code for gene expression generated by transcription factor Bach2 and MAZR (MAZ-related factor) through the BTB/POZ domain. Mol. Cell. Biol. 20:1733–1746.
Kretzschmar, H. A., L. E. Stowring, D. Westaway, W. H. Stubblebine, S. B. Prusiner, and S. J. DeArmond. 1986. Molecular cloning of a human prion protein cDNA. DNA 5:315–324.
Kudo, T., T. Katayama, K. Imaizumi, Y. Yasuda, M. Yatera, M. Okochi, M. Tohyama, and M. Takeda. 2002. The unfolded protein response is involved in the pathology of Alzheimer's disease. Ann. NY Acad. Sci. 977:349–355.
Lander, E. S., L. M. Linton, B. Birren et al. (256 co-authors). 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.
Larsen, F., G. Gundersen, R. Lopez, and H. Prydz. 1992. CpG islands as gene markers in the human genome. Genomics 13:1095–1107.
Lee, I. Y., D. Westaway, A. F. Smit et al. (13 co-authors). 1998. Complete genomic sequence and analysis of the prion protein gene region from three mammalian species. Genome Res. 8:1022–1037.
Makrinou, E., J. Collinge, and M. Antoniou. 2002. Genomic characterization of the human prion protein (PrP) gene locus. Mamm. Genome 13:696–703.
Mastrangelo, P., and D. Westaway. 2001. The prion gene complex encoding PrP(C) and Doppel: insights from mutational analysis. Gene 275:1–18.
Mayor, C., M. Brudno, J. R. Schwartz, A. Poliakov, E. M. Rubin, K. A. Frazer, L. S. Pachter, and I. Dubchak. 2000. VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16:1046–1047.
Miyajima, N., R. Horiuchi, Y. Shibuya, S. Fukushige, K. Matsubara, K. Toyoshima, and T. Yamamoto. 1989. Two erbA homologs encoding proteins with different T3 binding capacities are transcribed from opposite DNA strands of the same genetic locus. Cell 57:31–39.
Moore, R. C., I. Y. Lee, G. L. Silverman et al. (21 co-authors). 1999. Ataxia in prion protein (PrP)-deficient mice is associated with upregulation of the novel PrP-like protein doppel. J. Mol. Biol. 292:797–817.
Murphy, E. P., and O. M. Conneely. 1997. Neuroendocrine regulation of the hypothalamic pituitary adrenal axis by the nurr1/nur77 subfamily of nuclear receptors. Mol. Endocrinol. 11:39–47.
Nielsen, H., J. Engelbrecht, S. Brunak, and G. von Heijne. 1997. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Prot. Eng. 10:1–6.
O'Brien, S. J., M. Menotti-Raymond, W. J. Murphy, W. G. Nash, J. Wienberg, R. Stanyon, N. G. Copeland, N. A. Jenkins, J. E. Womack, and J. A. Marshall Graves. 1999. The promise of comparative genomics in mammals. Science 286:458–462, 479–481.
O'Brien, S. J., E. Eizirik, and W. J. Murphy. 2001. Genomics. On choosing mammalian genomes for sequencing. Science 292:2264–2266.
Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Heidelberg, Germany.
Oidtmann, B., D. Simon, N. Holtkamp, R. Hoffmann, and M. Baier. 2003. Identification of cDNAs from Japanese pufferfish (Fugu rubripes) and Atlantic salmon (Salmo salar) coding for homologs to tetrapod prion proteins. FEBS Lett. 538:96–100.
Premzl, M., L. Sangiorgio, B. Strumbo, J. A. Marshall Graves, T. Simonic, and J. E. Gready. 2003. Shadoo, a new protein highly conserved from fish to mammals and with similarity to prion protein. Gene 314C:89–102.
Prusiner, S. B. 1998. Prions. Proc. Natl. Acad. Sci. USA 95:13363–13383.
Prusiner, S. B., and M. R. Scott. 1997. Genetics of prions. Annu. Rev. Genet. 31:139–175.
Quandt, K., K. Frech, H. Karas, E. Wingender, and T. Werner. 1995. MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23:4878–4884.
Radman, M., I. Matic, and F. Taddei. 1999. Evolution of evolvability. Ann. NY Acad. Sci. 870:146–155.
Rivera-Milla, E., C. A. Stuermer, and E. Malaga-Trillo. 2003. An evolutionary basis for scrapie disease: identification of a fish prion mRNA. Trends Genet. 19:72–75.
Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd edition. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.
Schwartz, S., Z. Zhang, K. A. Frazer, A. Smit, C. Riemer, J. Bouck, R. Gibbs, R. Hardison, and W. Miller. 2000. PipMaker: a Web server for aligning two genomic DNA sequences. Genome Res. 10:577–586.
Simonic, T., S. Duga, B. Strumbo, R. Asselta, F. Ceciliani, and S. Ronchi. 2000. cDNA cloning of turtle prion protein. FEBS Lett. 469:33–38.
Smith, S. W., R. Overbeek, C. R. Woese, W. Gilbert, and P. M. Gillevet. 1994. The genetic data environment: an expandable GUI for multiple sequence analysis. Comput. Appl. Biosci. 10:671–675.
Strumbo, B., S. Ronchi, L. C. Bolis, and T. Simonic. 2001. Molecular cloning of the cDNA coding for Xenopus laevis prion protein. FEBS Lett. 508:170–174.
Suzuki, T., T. Kurokawa, H. Hashimoto, and M. Sugiyama. 2002. cDNA sequence and tissue expression of Fugu rubripes prion protein-like: a candidate for the teleost orthologue of tetrapod PrPs. Biochem. Biophys. Res. Commun. 294:912–917.
Taylor, J. S., I. Braasch, T. Frickey, A. Meyer, and Y. van de Peer. 2003. Genome duplication, a trait shared by 22000 species of ray-finned fish. Genome Res. 13:382–390.
Thomas, J. W., J. W. Touchman, R. W. Blakesley et al. (68 co-authors). 2003. Comparative analyses of multi-species sequences from targeted genomic regions. Nature 424:788–793.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Wang, Y., J. Shen, N. Arenzana, W. Tirasophon, R. J. Kaufman, and R. Prywes. 2000. Activation of ATF6 and an ATF6 DNA binding site by the endoplasmic reticulum stress response. J. Biol. Chem. 275:27013–27020.
Ward, A. C., and G. J. Lieschke. 2002. The zebrafish as a model system for human disease. Front. Biosci. 7:d827–d833.
Waterston, R. H., K. Lindblad-Toh, E. Birney et al. (222 co-authors). 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.
Williams, G. W., P. M. Woollard, and P. Hingamp. 1998. NIX: a nucleotide identification system at the HGMP-RC. URL: http://www.hgmp.mrc.ac.uk/NIX/
Wingender, E., P. Dietze, H. Karas, and R. Knuppel. 1996. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 24:238–241.
Wolf, M. J., S. Easteal, M. Kahn, B. D. McKay, and L. S. Jermiin. 2000. TrExML: a maximum likelihood approach for extensive tree-space exploration. Bioinformatics 16:383–394.
Zahn, R., A. Liu, T. Luhrs, R. Riek, C. von Schroetter, F. Lopez Garcia, M. Billeter, L. Calzolai, G. Wider, and K. Wuthrich. 2000. NMR solution structure of the human prion protein. Proc. Natl. Acad. Sci. USA 97:145–150.
Zuegg, J., and J. E. Gready. 2000. Molecular dynamics simulation of human prion protein including both N-linked oligosaccharides and the GPI anchor. Glycobiology 10:959–974.