*Division of Cardiovascular Research, Hospital for Sick Children and Department of Biochemistry, University of Toronto, Toronto, Canada;
Department of Anatomy and Physiology, Atlantic Veterinary College, University of Prince Edward Island, Charlottetown, Canada;
and
Department of Zoology, University of Toronto, Toronto, Canada
Introduction |
---|
Two types of noncollagenous cartilage have been described in the lamprey, designated cranial or branchial according to the tissues in which they were first recognized (Robson et al. 1997). While little sequence information is available for the cartilaginous proteins found in branchial cartilage structures, several cDNAs of lamprin, the predominant matrix protein of cranial cartilage, have been cloned (Robson et al. 1993). While it is very different from collagen, lamprin is highly hydrophobic in nature and shares biochemical characteristics with vertebrate elastins (Wright, Keeley, and Youson 1983; Wright and Youson 1983). Furthermore, lamprin cDNAs revealed tandemly repeated sequences with similarities to repeat sequences found not only in mammalian and avian elastins, but also in structural proteins of some invertebrates, including insect chorion proteins and spider silks (Robson et al. 1993). For example, a tandem repeat region of a silkmoth chorion class B protein shares a 21/24 amino acid identity with lamprin (Robson et al. 1993). Even more striking is a 28/30 amino acid identity shared between oothecin, an eggshell protein of the cockroach, and the tandem repeat sequence of lamprin (Pau, Brunet, and Williams 1971; Pau 1984). While such sequence similarities might suggest descent from a common ancestral protein, there are several difficulties with arguments based on sequence conservation. For example, outside these regions of identity, the sequences of the proteins show essentially no other similarity. Although these regions of identity could reflect a common exon that has been shuffled between genes over time, a second explanation for the appearance of such isolated sequence identity is a mechanism dependent on sequence convergence.
Sequence convergence, defined as derived sequence similarity between proteins of unrelated origin, is thought to be a rare process, especially for soluble globular proteins (Doolittle 1994). Structural convergence apparently does occur, but similar three-dimensional structures can be achieved by sequences that do not share much linear identity (Russell et al. 1997). Thus, physical contacts in three dimensions generated by the sequence are more important than the precise linear array of amino acids in determining structure. For this reason, the probability of primary sequence convergence in soluble globular proteins is very low.
On the other hand, the likelihood of sequence convergence in simpler, nonglobular proteins may be somewhat greater. For example, a strong case for sequence convergence has been established for two antifreeze glycoproteins, where a tripeptide repeat important for inhibiting the growth of an ice crystal lattice has independently appeared in two unrelated groups, or taxa, of fishes, one from the Arctic and the other from the Antarctic (Chen, DeVries, and Cheng 1997). Lamprin and the proteins with which it shares sequence similarity also appear to be relatively simple structurally. All are predominantly hydrophobic in nature and have the ability to spontaneously self-aggregate into stable polymeric matrices (Hamodrakas et al. 1983; Bressan et al. 1986; Robson et al. 1993). Such self-aggregation has been suggested to be based on interdigitation of hydrophobic side chains in short β-sheet/β-turn structures (Marsh, Corey, and Pauling 1955; Hamodrakas et al. 1982, 1983; Robson et al. 1993).
To understand the origins of lamprin, a novel protein found only in the lamprey, and to address the question of sequence convergence in these proteins, we determined the structure and organization of the genes encoding lamprin in two distinct species of lamprey, Petromyzon marinus and Lampetra richardsoni. We identified multiple alternatively spliced genes which, in at least one of these species, appear to be located in tandem in a head-to-tail orientation in the genome. The 3' untranslated regions (UTRs) of lamprin transcripts are highly unusual in that they are generated from more than one exon. Characterization of the structures and organization of these lamprin genes provides further evidence that similarities between lamprin and insect structural proteins are a result of sequence convergence.
Materials and Methods |
---|
Southern Blotting For Determination of Gene Copy Number in P. marinus
After overnight digestion of P. marinus genomic DNA individually with HindIII, SalI, EcoRI, and EcoRV, restriction fragments were electrophoretically separated (0.6% agarose), transferred, and then UV cross-linked to Hybond-N nylon membrane (Amersham, Ontario, Canada). Membranes were incubated at 67°C for 9 h in prehybridization buffer (6 x SSPE, 5 x Denhardt's solution, 0.5% SDS, 100 µg/ml denatured herring sperm DNA). This buffer was then replaced by hybridization buffer (6 x SSPE, 0.5% SDS, 100 µg/ml denatured herring sperm DNA) with the appropriate [α-32P]dCTP random prime-labeled probe, and incubation continued overnight at 67°C. Membranes were washed in 2 x SSC, 0.5% SDS for 5 min at 25°C, followed by 2 x SSC, 0.1% SDS for 15 min at 25°C and 0.1 x SSC, 0.5% SDS for 30 min at 37°C, with a final stringent wash in 0.1 x SSC, 0.5% SDS for 30 min at 68°C. Hybridized DNA was detected by autoradiography.
Amplification of Introns of Lamprin Genes from P. marinus
Individual lamprin genes in P. marinus were amplified from gDNA using gene-specific primers in the first and last exons of each gene. The primers were L-1.8-specific forward (5'-CGACAGCGAAACGAAACAAAAAATCCC-3'), L-1.8-specific reverse (5'-TTTGGTGGGTGGTAGTTGGCGGAGG-3'), L-0.9/L-0.8 forward (5'-AGCCCTCCTCTCCACGTCGTC-3'), L-0.8-specific reverse (5'-GACATTGCACAGATGATGAGAATTTG-3'), and L-0.9-specific reverse (5'-CAGGAACTGAACGAACGGCGAAAATAAAC-3'). The primary amplification reactions consisted of Advantage Tth polymerase mix (Clontech, California), 1 x amplification buffer supplied with the enzyme, 1.1 mM Mg(OAc)2, 200 µM dNTPs, 200 nM of each primer, and 0.1 µg of gDNA (isolated from a single animal) in a total reaction volume of 100 µl. The cycling parameters for amplification were as follows: denaturation at 94°C for 30 s, annealing at 65°C for 30 s, and extension at 68°C for 8 min for 30 cycles. After electrophoresis, amplification products were purified from agarose gel slices (Geneclean, BIO 101, California) and used as templates for intron-specific amplification reactions.
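As a convenience for readers who wish to reuse these primers, the following Python sketch tabulates the length and GC content of the first-round, gene-specific primers listed above. The sequences are copied from the text; the dictionary keys and helper functions are illustrative names introduced here and are not part of the original protocol.

```python
# Gene-specific primers for the primary amplification, copied from the text.
# The dictionary keys and function names are illustrative only.
PRIMERS = {
    "L-1.8 forward": "CGACAGCGAAACGAAACAAAAAATCCC",
    "L-1.8 reverse": "TTTGGTGGGTGGTAGTTGGCGGAGG",
    "L-0.9/L-0.8 forward": "AGCCCTCCTCTCCACGTCGTC",
    "L-0.8 reverse": "GACATTGCACAGATGATGAGAATTTG",
    "L-0.9 reverse": "CAGGAACTGAACGAACGGCGAAAATAAAC",
}


def gc_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.0%}")
```

The primer lengths reported by this sketch can be compared with the 27-, 25-, and 21-nt lengths implied by the mismatch counts given in Results.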
To determine the location of specific exon/intron junctions and intron sizes, the full-length amplified genes were used as templates at a 1/2,000 dilution in combination with sets of primers designed for neighboring exons. The primers used in these reactions were designed to contain either BamHI or EcoRI restriction sites to facilitate cloning of the amplification products into pBluescript plasmid vector (Stratagene, California). Cloned products were sequenced at the HSC Biotechnology Service Centre (Toronto, Ontario, Canada).
Determination of Alternative Splicing of Lamprin Genes from P. marinus
Two primer sets were designed for each of the two P. marinus lamprin genes (L-1.8 and L-0.8). Both primer sets for L-1.8 used a reverse primer designed for exon 7 (5'-CGGAATTCGATCGCCTAGCAATTCTCCAGTCA-3'; italicized sequence represents restriction site and clamp sequence), an exon found only in this gene. This primer was used in combination with two different forward primers, one to exon 3 (5'-GCGGATCCTGGTGGTCTAGGCTATCCCG-3') and the other to exon 4 (5'-GCGGATCCACCCTTACGGTGGACTTGGATAC-3'). For L-0.8, the same two forward primers were used, but in combination with the L-0.8-specific reverse primer (described in Amplification of Introns, above), which anneals to exon 8. The reaction mixture contained 0.25 U of Taq polymerase (Life Technologies, Maryland), 1 x PCR buffer supplied with the enzyme, 3 mM MgCl2, 200 µM dNTPs, 0.5 µM of each primer, and 1 µl of an annular cartilage cDNA library (Robson et al. 1993) in a 50-µl reaction volume. The cycling parameters were as follows: denaturation at 94°C for 45 s, annealing at 58°C for 45 s, and extension at 72°C for 1 min for 34 cycles. Products were analyzed on a 1.5% agarose gel, subcloned into pBluescript, and sequenced.
Genome Walking for 5'-Upstream Sequence of Lamprin Genes from P. marinus
The Universal GenomeWalker kit (Clontech) was used to amplify 5'-upstream sequences. Briefly, P. marinus genomic DNA was digested to completion separately with DraI, EcoRV, PvuII, ScaI, and StuI. Adapters were ligated to the ends of the restriction fragments, and these "libraries" were used as templates to amplify the 5'-upstream sequences. For the primary amplification reactions, reverse primers specific to exon 1 of either L-1.8 (5'-GCTTGCATGGTGGCGGCCATTTTTATTT-3') or L-0.9/0.8 (5'-CAGAGCTTGGATAGCGGCGGCCATTTTTC-3') were used in combination with adapter primer 1 from the kit. After amplification, a fraction of this reaction was used as template in a second round of amplification using nested primers relative to the first set. Adapter primer 2 (supplied with the kit) was used as the forward primer in combination with a reverse primer that recognized all three lamprin genes (5'-GTGGAGAGGAGGGCTTTGAGTTGAGGCTG-3'). Amplification products were cloned into pBluescript plasmid vector with the use of the SalI site in the adapter and an EcoRI site in the second gene-specific primer. These clones were sequenced at the HSC Biotechnology Service Centre. The two forward primers used to distinguish between L-0.9 and L-0.8 GenomeWalker products were 5'-AGGCTTCAATGTGTAGCGTATTTTG-3' and 5'-AATACCTCCCATATTTTGACAG-3'.
Primer Extension for Transcription Start Site Determination in Lamprin Genes from P. marinus
Primer extension followed a previously described protocol (Ausubel et al. 1992, pp. 4.8.1-4.8.4). Two different primers complementary to the 5' UTR of L-0.9 (primer 1: 5'-TTAAGGGTTTTCTACTTTCGCTGTC-3'; primer 2: 5'-CAGAGCTTGGATAGCGGCGGCCATTTTTC-3') were hybridized to 10 µg of P. marinus juvenile adult annular cartilage or kidney total RNA. The primer extension products were separated by electrophoresis on a 9% acrylamide/7 M urea sequencing gel, along with a standard sequencing reaction as a size marker. Bands were visualized by exposure to X-ray film.
Lamprin Genomic Clone Isolation and Characterization from L. richardsoni
An L. richardsoni genomic library was constructed by partially digesting genomic DNA with Sau3AI and cloning these fragments into the lambda FIX II/XhoI partial fill-in vector (Stratagene) following the instructions supplied by the manufacturer. To confirm an adequate size for the genomic inserts, 10 random clones were picked. An average size of approximately 17 kb was determined for the inserts. For amplification, a total of 2.5 x 10^5 plaques containing approximately 4.25 x 10^6 kb of genomic DNA were used. Radiolabeled full-length clones of the P. marinus cDNAs, L-1.8-10 and L-0.9-10 (Robson et al. 1993), were used to screen 500,000 plaques from the amplified library by conventional filter hybridization methods. Three rounds of screening identified seven unique genomic clones. Lambda preps followed either the liquid lysate and CsCl prep protocols (Ausubel et al. 1987, pp. 6.5.1-6.5.2) for large-scale preps or a previously described small-scale prep protocol (Grossberger 1987).
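The amount of genomic DNA represented in the library follows directly from the plaque count and the mean insert size. The short Python sketch below reproduces that arithmetic; the variable names are illustrative.

```python
# Values taken from the text; variable names are illustrative.
plaques = 2.5e5        # recombinant plaques used for amplification
mean_insert_kb = 17    # approximate average insert size (kb), from 10 random clones

# Total genomic DNA carried by the library, in kb.
total_insert_kb = plaques * mean_insert_kb

print(f"Total cloned genomic DNA: {total_insert_kb:.3g} kb")
```

Because the 17-kb figure is an average over only 10 clones, the total should be read as an estimate.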
SalI restriction fragments of each genomic clone were subcloned into pBluescript for sequencing. To map the SalI restriction fragments within the full-length genomic clones, clones were fully digested with NotI, followed by partial digestion with SalI. Hybridization to radiolabeled T3 or T7 oligonucleotides indicated the order of these fragments within the genomic clones. Southern blot analysis identified lamprin-positive restriction fragments within the SalI-digested genomic clones.
Sequence Analysis
Computer-assisted analyses of all sequences were performed with the GCG set of molecular biology software programs.
Results |
---|
Because these lamprin cDNAs could cross-hybridize with one another, we used the less-conserved intronic regions (see Exon/Intron Structure, below) as probes in Southern blots to determine gene copy number. Sequence comparisons between intron 1 of L-1.8 and L-0.9 indicate 48.6% identity over the first 184 bp and 63.3% identity over the last 166 bp. Intron 1 of L-1.8 is approximately 2.1 kb, compared with 787 bp for intron 1 of L-0.9. This suggests that neither intron should hybridize with the other, particularly under stringent washing conditions. When intron 1 of L-1.8 was used as a probe, only a single band hybridized to P. marinus genomic DNA digested individually with four different restriction enzymes (fig. 1A ). This indicated that there was only a single copy of this gene in the P. marinus genome. Similar results were seen for a second Southern blot using a probe containing 3' UTR sequence unique to L-1.8 (data not shown).
To distinguish between the two or more copies of L-0.9, we focused on the normally less well conserved 3' UTR regions to identify differences in sequence between the respective mRNAs. The full-length 3' UTR of L-0.9 was amplified from genomic DNA using primers that would not recognize L-1.8. Two bands resulted, one of the size expected if there were no introns present between the primers, and a second, smaller, band. The smaller band was not the result of a spurious amplification product from an L-0.9 template, as an L-0.9 cDNA yielded only the larger product. Sequencing of the smaller band demonstrated that it was very similar to the 3' UTR of the L-0.9-12 transcript, except for a 184 bp deletion (from position 693 to position 877 in GenBank accession L05925). The similarity between these two products was much greater than that to the corresponding region of L-1.8 (97.2% vs. 81.9% identity). These results therefore indicated the presence of a third gene for lamprin, which was very similar, but not identical, to L-0.9, differing mainly in the length of its 3' UTR. This third gene was thereafter designated L-0.8. Sequence differences between L-0.9 and L-0.8 3' UTRs were utilized to design specific primers for each, which were then used to amplify intronic regions (see Exon/Intron Structure, below).
Sequence comparisons between intron 1 of L-0.9 and intron 1 of L-0.8 indicated 96.5% identity over the first 184 bp and 95.2% identity over the last 166 bp, much greater identities than to the corresponding regions of intron 1 of L-1.8 (see above). With this degree of sequence identity, cross-hybridization would be expected even under the stringent washing conditions used. Therefore, analysis of the Southern blot in figure 1B suggests that the darker band in the HindIII lane corresponded to the L-0.9 gene, and the less strongly hybridizing band represented the L-0.8 gene. The third band in the EcoRV lane is the result of an EcoRV site within intron 1 of L-0.8, 581 bp from the 3' end. A similar Southern blot using intron 6 of the L-0.8 gene as a probe showed two bands in both the HindIII and EcoRI lanes of the Southern blot, although with differing intensities (fig. 1C ). The same two bands were seen using intron 6 from L-0.9 as a probe, but with their respective intensities reversed (data not shown). There is 93% identity between introns 6 of L-0.8 and L-0.9, and there is 77.5% identity between introns 6 of L-0.8 and L-1.8. Neither L-0.8 intron 6 nor L-0.9 intron 6 contains internal HindIII or EcoRI sites. Therefore, the results of the Southern blots suggested that these probes were each recognizing both L-0.9 and L-0.8 genes, but each was hybridizing more strongly to itself than to the other gene. The lack of a third band using these probes argues against an additional L-0.9-like gene. Together with L-1.8, this means that lamprin is coded for by a total of three genes in P. marinus.
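The percent identities quoted for these intron comparisons were obtained with the GCG programs, whose exact scoring conventions are not described here. As a minimal sketch of how such a figure can be computed from a pairwise alignment, the Python function below counts identical positions over ungapped columns. The function name, the gap convention, and the toy sequences are assumptions made for illustration and are not lamprin sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns in which neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    Excluding gapped columns is one common convention, assumed here.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example (made-up sequences): 7 of 8 compared positions are identical.
print(percent_identity("ACGTACGT", "ACGTACGA"))
```

Other conventions, such as dividing by the length of the shorter sequence or counting gaps as mismatches, give somewhat different values, so identities from different programs are not always directly comparable.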
Exon/Intron Structure of Lamprin Genes from P. marinus
Although screening of a genomic library would have been the preferred method of cloning the three P. marinus lamprin genes, difficulty in the construction of this genomic library resulted in an alternative PCR approach. The first round of amplification used three different primer sets, each specific to one of the lamprin genes. Forward primers were designed for the 5' UTRs, and reverse primers were designed for the 3' UTRs. For L-0.8 and L-0.9, the same forward primer was used. This forward primer differed from the corresponding region in L-1.8 at 2/21 positions. The reverse primer for L-0.8 amplification spanned the point of deletion in the 3' UTR that distinguished L-0.8 from L-0.9. The sequence inserted at this site in L-0.9 was used to construct the L-0.9 reverse primer. For L-1.8, the forward primer differed from the corresponding region in the other two genes at 11/27 positions. The reverse primer had 21/25 and 20/25 sequence similarity to L-0.9 and L-0.8, respectively. The sequence differences between these primer sets, combined with the high annealing temperature used in the amplification reaction, were sufficient for gene-specific amplification. The three PCR products generated from these primer sets ranged from 8 to 9.5 kb in size. These products were used as templates for the identification of the individual exon/intron boundaries. The amplification products were subcloned and sequenced to determine precise exon/intron junctions. The exon boundaries are summarized in figure 2.
Details of intron/exon boundaries are given in figure 3. All introns conform to the GT-AG consensus rule for spliceosomal introns (Tarn and Steitz 1997). All exons within the protein-coding region can be defined as class 1,1. That is, each flanking intron interrupts a codon between the first and second bases. The smallest and largest exons (exons 7 and 8, of sizes 42 and 1,303 bp, respectively) both encode regions of the 3' UTR of L-1.8. The fully protein-encoding exons vary in size from 57 to 129 bp, which translates to 19 to 43 amino acids. The average length of the introns is 980 bp. The smallest intron is intron 4 of L-0.8 (96 bp). The largest intron is intron 1 of L-1.8 (approximately 2,100 bp).
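The relationship between exon class and exon length can be made explicit with a little arithmetic. In the Python sketch below, the phase of an intron is the number of bases of the interrupted codon that lie upstream of it (so a split between the first and second bases is phase 1). The function names are illustrative; the 57- and 129-bp exon sizes are taken from the text.

```python
def end_phase(start_phase: int, exon_length_bp: int) -> int:
    """Phase of the intron following an exon, given the phase of the intron before it."""
    return (start_phase + exon_length_bp) % 3


def residues_encoded(exon_length_bp: int) -> int:
    """Number of amino acid residues corresponding to an exon whose length is a multiple of 3."""
    if exon_length_bp % 3 != 0:
        raise ValueError("exon length is not a multiple of 3")
    return exon_length_bp // 3


# Smallest and largest fully protein-coding exons reported in the text.
for length in (57, 129):
    print(length, "bp:", residues_encoded(length), "residues; end phase",
          end_phase(1, length))
```

An exon bounded by phase-1 introns on both sides must have a length divisible by three, which is why the reported exon sizes convert to whole numbers of residues.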
Although the mRNA for L-1.8-10 lacked the sequence coded for by exon 4, characterization of the gene for L-1.8 clearly demonstrated the presence of this exon (fig. 3). Similarly, sequence for exon 4 was also present in L-0.8. In order to determine whether alternative splicing of exon 4 might also take place in transcripts of L-1.8 and L-0.8, primer sets specific for each of the two genes were designed. The forward primer corresponded to exon 3 and recognized both genes. The L-1.8-specific reverse primer corresponded to a sequence in exon 7, an exon present only in L-1.8. The L-0.8-specific reverse primer was the same as that used for amplification of this entire gene.
PCR amplification of the original annular cartilage cDNA library with these primer sets produced two bands per primer set (lane 1 in fig. 4A and B). These differed in size by approximately 60 bp, a size consistent with that of exon 4. The use of a forward primer corresponding to exon 4 rather than exon 3 in this amplification resulted in a single band in each (lane 3 in fig. 4A and B). This confirmed that the pair of bands produced from the first primer set was the result of alternative splicing of exon 4 and not due to splicing differences downstream of this exon. The presence of exon 4 was confirmed by sequencing appropriate amplification products. These results indicated that exon 4 was subject to alternative splicing in all three lamprin genes, resulting in a total of six distinct mRNA products.
To identify SalI fragments that were hybridization-positive for lamprin cDNAs, two Southern blots were done on the seven SalI-digested genomic clones. One probe was a 177-bp fragment from the 5' end of the L-0.9-12 cDNA. A second probe contained the remainder of the sequence from this cDNA. In five of the seven clones (A, B, D, F, and G in fig. 5A and B), the same two bands hybridized with both the 5' and the 3' probes. In clone C, a single (but different) fragment hybridized to each of the probes. In clone E, one fragment hybridized to the 5'-end probe. This fragment, as well as a second fragment, hybridized to the 3'-end probe in this clone. These data provided the first indication that the lamprin genes in L. richardsoni were arranged in tandem.
Identification of the Transcription Start Site in P. marinus
Since a genomic library was not available for P. marinus, a PCR-based genome-walking approach was used to isolate 5'-upstream sequence. This approach yielded 909, 350, and 370 bp of sequence from L-1.8, L-0.9, and L-0.8, respectively. Initially, L-0.9 and L-0.8 promoter regions were amplified together, since the reverse gene-specific primers could not distinguish between the two templates. Fortunately, sequence differences became apparent between the newly cloned upstream sequences from L-0.8 and L-0.9 (see fig. 7; positions -190 to -170). Forward primers designed for this region specific for either L-0.8 or L-0.9, combined with a reverse primer for exon 2, allowed us to take advantage of the size differences found in intron 1 between these two genes. The sizes of the amplified products from genomic DNA using these primer sets defined which 5'-upstream sequence corresponded to which gene.
The most divergent of the five upstream/promoter sequences is that of the P. marinus L-1.8 gene (fig. 6). This gene shows strong sequence identity up to position -196 in the alignment, but no identifiable similarities with the other four sequences upstream of this position. However, a region much farther upstream in this gene (-685 to -564) contains significant sequence similarity to the -298 to -174 region in the alignment of the other genes. Given the location-independent nature of enhancer elements, conservation of this sequence in L-1.8 at another site suggests that the sequence could be important for regulation of lamprin gene transcription. The remaining four genes maintain strong sequence similarity to the end of the known sequences for P. marinus L-0.9 and L-0.8, approximately 400 bp upstream of the transcription start site. Sequence similarity between the two L. richardsoni lamprin genes remains very high up to the point approximately 530 bp upstream from the start site, beyond which sequence similarity is lost.
Lamprin Protein Sequence Comparisons
We previously described the protein-coding sequence for three P. marinus lamprin cDNAs encoded by two genes (Robson et al. 1993). From the gene sequence information determined in this study, we added new protein sequence information from the L. richardsoni genes and further P. marinus sequences from L-0.8 and exon 4 of L-1.8. We aligned all available protein-coding sequence in the context of the genomic structure to identify sequence differences (fig. 8). The majority of positions are conserved. Where sequence differences are present, they usually involve a conservative substitution (e.g., the first four variable positions in exon 1). There are, however, some notable differences in protein sequence. For example, two 5-aa deletions are found in L. richardsoni clone D1: one in exon 3, which deletes the first pentapeptide repeat, and a second in exon 5. The most striking nonconservative amino acid substitution is the replacement of a small hydrophobic alanine for a large basic arginine in the middle of the pentapeptide repeat found in exon 3 of clone D1. This is the pentapeptide repeat which shares sequence similarity with elastin and the insect structural proteins and also occurs as singlets in exons 4 and 5 of lamprin and in spider silk proteins.
Discussion |
---|
As has been suggested for other multiple-copy genes (Long and Dawid 1980), including those for histones, rRNAs, and tRNAs, multiple lamprin genes may have evolved as a result of a requirement for synthesis of large amounts of the lamprin gene product. Lamprin is a highly abundant protein in lamprey cranial cartilages, accounting for at least 50% of the dry weight of the tissue (Wright, Keeley, and Youson 1983). Structural proteins of comparable abundance in their respective tissues are known to be encoded by single-copy genes (e.g., the 72-kDa elastin of aorta [Indik et al. 1987; Olliver et al. 1987] and the 96-kDa α1(II) collagen chain of mammalian cartilage [Cremer, Rosloniec, and Kang 1998]). However, because of the small size of the lamprin protein (10 to 12 kDa), comparable rates of gene expression for lamprin, elastin, and collagen would yield only a sixth as much lamprin as elastin or collagen. Therefore, multiple lamprin genes may have evolved to provide the necessary quantities of this abundant but relatively small protein.
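The "one sixth" estimate can be checked with simple arithmetic. The Python sketch below assumes, purely for illustration, that each gene produces the same number of polypeptide chains per unit time, so that the mass of protein deposited scales with chain mass. It takes lamprin as roughly 10 to 12 kDa and uses the 72-kDa and 96-kDa values given for elastin and the collagen chain; the function name is illustrative.

```python
def relative_mass_yield(small_kda: float, large_kda: float) -> float:
    """Mass of the smaller protein produced per unit mass of the larger one,
    assuming equal numbers of chains are synthesized (an illustrative assumption)."""
    return small_kda / large_kda


# Lamprin mass range and comparison proteins as given in the text.
for lamprin_kda in (10, 12):
    for name, kda in (("elastin", 72), ("collagen chain", 96)):
        ratio = relative_mass_yield(lamprin_kda, kda)
        print(f"lamprin {lamprin_kda} kDa vs {name} {kda} kDa: {ratio:.3f}")
```

The upper lamprin mass compared with elastin gives exactly one sixth. The other combinations give somewhat smaller fractions, so the argument for multiple gene copies holds across the range.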
Investigation of the genomic structure of lamprin genes from L. richardsoni provided clear evidence for organization of these genes in tandem in the genome of this species. Five of the seven unique genomic clones isolated from L. richardsoni contained two lamprin genes, and sequence data for clone D revealed that the two genes located in this clone were oriented in a head-to-tail fashion and separated by no more than 7 kb of intervening sequence. Although 7 kb is technically well within the size range for PCR amplification, attempts to amplify intervening sequence between P. marinus genes were unsuccessful. This suggests that if lamprin genes are situated in tandem in P. marinus, the intervening sequences may be much larger.
Alternative Splicing
The results presented here demonstrate that all three genes of P. marinus can be alternatively spliced in exon 4, yielding a total of six possible protein products. The functional significance of these alternatively spliced transcripts is unknown, but the preservation of alternative splicing in all three genes suggests some importance. The amino acid sequence encoded in the alternatively spliced exon is not unlike that of the remainder of the protein and contains one copy of the GGLGY pentapeptide, which we have speculated (Robson et al. 1993) is important for the extracellular assembly of the insoluble lamprin matrix. The difference in the size of the protein created by the additional exon, rather than the primary sequence of exon 4, may be the functionally significant feature of this alternative splicing, perhaps leading to different fiber-forming properties of lamprin. Other extracellular matrix proteins, including fibronectin (Gehris et al. 1996) and elastin (Rosenbloom et al. 1991), are known to undergo alternative splicing. At least in the case of fibronectin, this alternative splicing is both temporally and spatially regulated. Although we have shown that forms of lamprin both containing and lacking exon 4 are found in annular cartilage at the juvenile adult stage of development, differential expression of these alternatively spliced variants could potentially be important at other stages of development.
Structure of the 3' UTR of Lamprin Genes
An unusual feature of the exon structure of lamprin genes is the presence of multiple exons encoding the 3' UTR (fig. 2). In contrast to the case for all three lamprin genes described here, the translational stop codon in the majority of genes is located within the 3'-most exon. A recent exhaustive survey of stop codon position relative to exon structure found that only 7% of characterized genes have the translational termination signal within their penultimate exon (Nagy and Maquat 1998), as is the case for lamprin genes L-0.8 and L-0.9. Location of the stop codon in the third to the last exon, as for L-1.8, was seen in only 0.4% of all genes.
In the same study, the normal termination codon position was found to be <50 bases upstream of the 3'-most exon in 98% of the genes which had one or more 3' untranslated exons. This is thought to be related to a nonsense-mediated decay mechanism (Hentze and Kulozik 1999) whereby a termination codon located more than 50 to 55 nt upstream of the 3'-most exon-exon junction results in mRNA decay (Nagy and Maquat 1998), protecting the cell from prematurely terminated transcripts. The termination codon for all three lamprin genes falls within this range. For example, according to this termination codon position rule, exon 7 of L-1.8 could be a maximum of 45 nt in size. At 42 nt (the smallest exon found in the lamprin genes), exon 7 fulfills this requirement.
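The position rule can be expressed as a simple distance check. The Python sketch below is illustrative only: the 55-nt threshold is the upper end of the range quoted above, the 42-nt length of exon 7 is taken from the text, and the 10-nt distance from the stop codon to the end of its own exon is a hypothetical value chosen so that the 45-nt maximum for exon 7 is reproduced. The actual distance should be read from the gene sequence. Function and variable names are likewise assumptions.

```python
NMD_THRESHOLD_NT = 55  # upper end of the 50 to 55 nt range quoted in the text


def distance_to_last_junction(stop_to_exon_end_nt: int,
                              downstream_internal_exons_nt: list) -> int:
    """Distance (nt) from the stop codon to the 3'-most exon-exon junction.

    stop_to_exon_end_nt: nt between the stop codon and the end of its own exon.
    downstream_internal_exons_nt: lengths of exons lying between that exon
    and the final exon.
    """
    return stop_to_exon_end_nt + sum(downstream_internal_exons_nt)


def escapes_decay(distance_nt: int, threshold_nt: int = NMD_THRESHOLD_NT) -> bool:
    """True if the stop codon lies close enough to the last junction to avoid decay."""
    return distance_nt <= threshold_nt


HYPOTHETICAL_STOP_OFFSET_NT = 10  # assumption, not a measured value
EXON7_NT = 42                     # from the text

distance = distance_to_last_junction(HYPOTHETICAL_STOP_OFFSET_NT, [EXON7_NT])
print(distance, escapes_decay(distance))
```

Under these assumptions, the largest permissible exon 7 is the threshold minus the offset, which is how a 45-nt limit arises from a 55-nt threshold.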
Splicing of exons into a mature mRNA is a complex process (Berget 1995; Tarn and Steitz 1997), suggesting that the presence of this additional exon in the 3' UTR of L-1.8 may have some functional significance. Interestingly, the MFOLD program in GCG predicts a very strong hairpin structure in contiguous exon 6 and 8 sequences for all three genes. However, when exon 7 sequence is included (as it is in the naturally occurring L-1.8 cDNA), this structure is disrupted (data not shown). Band compressions seen during the sequencing of the L-0.9 cDNA also suggested secondary structure in this region. As elements of secondary structure in mRNAs have been reported to be involved in regulating translational efficiency and mRNA stability (Munro and Eisenstein 1989; Klaff, Riesner, and Steger 1996), this unusual organization of the 3' UTR in the lamprin genes may have some functional role in these processes.
Lamprin Promoter Sequences
Both oligonucleotides used in the primer extension studies of L-0.9 mapped to the same transcription start site, and a characteristic TATA box was found just upstream of this location, within the usual range of these elements (Bucher 1990). Although the transcriptional start site was experimentally determined only for the L-0.9 gene, the strong sequence conservation in this region suggests that this site is likely maintained in all three lamprin genes.
Regions of sequence conservation among the three P. marinus genes and the two L. richardsoni genes may be useful in suggesting the location of functionally important enhancer elements. However, no functionally significant (i.e., cartilage-specific) enhancer elements were recognized in these lamprey genes. The nature of enhancer elements, relatively short in length and somewhat variable in sequence, combined with the remote phylogenetic position of the lamprey, suggests that transcription factor databases may not be particularly useful in identifying potential sites in lamprey genes. Thus, other than the well-conserved TATA box and cap signal, the identification of specialized promoter/enhancer elements in lamprey genes will require further functional promoter studies.
Evidence for Sequence Convergence
Regions of sequence similarity, consisting of shared tandem pentapeptide repeats based on the GGLGY sequence of lamprin, had previously been identified among lamprey cartilage proteins, insect eggshell proteins, and mammalian and avian elastins (fig. 9) (Robson et al. 1993). Several considerations suggest that these similarities in sequence are more likely to be an example of sequence convergence than the result of divergence from a common ancestral protein existing more than 500 MYA.
Differences in gene structure between lamprin and chorion genes also support arguments for sequence convergence. We have shown that exon junctions of lamprin genes are all of class 1,1, with introns splitting codons between the first and second bases. This type of junction is common among extracellular matrix proteins (Patthy 1987, 1991), including elastin (Bashir et al. 1989; Yeh et al. 1989), and would allow for exon shuffling and duplication (Patthy 1987), mechanisms which appear to have been prevalent in the evolution of at least some extracellular matrix proteins (Patthy 1991). Although the genomic organization of oothecin is not known, the single intron of the large family of silkmoth chorion proteins always falls between complete codons (Spoerel, Nguyen, and Kafatos 1986; Bucher 1990; Hibner, Burke, and Eickbush 1991). Such a difference in exon classes between lamprin and chorion genes makes it less likely that they share exons from a common ancestral gene, since simple exon shuffling using exons of different classes would interfere with the reading frame and alter protein sequence. If this is the case, then sequence similarities between lamprin and oothecin, which share a 28/30 amino acid sequence identity, may represent one of the best examples of primary sequence convergence so far identified.
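The reading-frame argument can be made concrete with a small sketch. It assumes the usual convention that an intron of phase 0 lies between codons, while an intron of phase 1 lies after the first base of a codon, and that an exon is described by the phases of its two flanking introns. The function names and the 12-base toy exon are hypothetical and are not lamprin sequence.

```python
from typing import List, Tuple


def insertion_keeps_frame(intron_phase: int, exon_class: Tuple[int, int]) -> bool:
    """True if an exon can be shuffled into an intron without frame disruption.

    This holds only when both flanking phases of the exon equal the phase of
    the recipient intron.
    """
    return exon_class == (intron_phase, intron_phase)


def in_frame_codons(exon: str, upstream_phase: int) -> List[str]:
    """Complete codons read from an exon for a given upstream intron phase.

    With phase p, p bases of a split codon lie in the preceding exon, so the
    first (3 - p) % 3 bases of this exon only finish that codon.
    """
    offset = (3 - upstream_phase) % 3
    return [exon[i:i + 3] for i in range(offset, len(exon) - 2, 3)]


# Toy class 1,1 exon: 2 bases completing a codon, 3 whole codons, 1 base
# starting the next codon.
toy_exon = "GAGGTCTGGGTT"

native_codons = in_frame_codons(toy_exon, upstream_phase=1)   # phase 1 context
shifted_codons = in_frame_codons(toy_exon, upstream_phase=0)  # phase 0 context

fits_phase1_intron = insertion_keeps_frame(1, (1, 1))
fits_phase0_intron = insertion_keeps_frame(0, (1, 1))
```

In its native phase 1 context the toy exon is read as GGT CTG GGT, whereas the same bases placed after a phase 0 intron are read as GAG GTC TGG GTT, so the encoded peptide changes even though the exon length is a multiple of three.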
Sequence convergence is presumed to be driven by structural or functional properties imparted to the protein by the shared sequence. All of the proteins showing sequence similarities to lamprin, including elastins and the insect structural proteins, are components of the extracellular matrix, where they are assembled into extensive fibrillar structures. We have suggested elsewhere (Robson et al. 1993) that such sequences may be important for the ability of these predominantly hydrophobic proteins to self-organize into a polymeric, fibrillar matrix. Requirements for such hydrophobic self-aggregation may be an important factor limiting the selection of amino acids used in the tandem repeats in these various structural proteins. Therefore, such simple repetitive sequences may have been independently "reinvented" in unrelated proteins in order to provide such characteristics. Cloning and characterization of morphologically and biochemically similar proteins which form the extracellular matrix of lamprey branchial cartilage (Robson et al. 1997), hagfish cartilages (Robson, Wright, and Keeley 2000), and noncollagenous cartilages of several invertebrates (unpublished data) will shed additional light on the functional and evolutionary relationships among these unusual cartilage structural proteins.
Acknowledgements
Footnotes
1 Present address: Division of Cardiology, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania.
2 Keywords: convergent evolution, lamprey, cartilage, extracellular matrix
3 Address for correspondence and reprints: Fred W. Keeley, Division of Cardiovascular Research, Hospital for Sick Children, 555 University Avenue, Toronto, Ontario, Canada M5G 1X8. E-mail: fwk{at}sickkids.on.ca
Literature Cited
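Ausubel, F. M., R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, J. A. Smith, and K. Struhl. 1987. Current protocols in molecular biology. John Wiley and Sons, New York.
Ausubel, F. M., R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, J. A. Smith, and K. Struhl. 1992. Current protocols in molecular biology. John Wiley and Sons, New York.
Ausubel, F. M., R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, J. A. Smith, and K. Struhl. 1998. Current protocols in molecular biology. John Wiley and Sons, New York.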
Bashir, M. M., Z. Indik, H. Yeh, N. Ornstein-Goldstein, J. C. Rosenbloom, W. Abrams, M. Fazio, J. Uitto, and J. Rosenbloom. 1989. Characterization of the complete human elastin gene. Delineation of unusual features in the 5'-flanking region. J. Biol. Chem. 264:8887–8891.
Berget, S. M. 1995. Exon recognition in vertebrate splicing. J. Biol. Chem. 270:2411–2414.
Boyd, C. D., A. M. Christiano, R. A. Pierce, C. A. Stolle, and S. B. Deak. 1991. Mammalian tropoelastin: multiple domains of the protein define an evolutionarily divergent amino acid sequence. Matrix 11:235–241.
Bressan, G. M., I. Pasquali-Ronchetti, C. Fornieri, F. Mattioli, I. Castellani, and D. Volpin. 1986. Relevance of aggregation properties of tropoelastin to the assembly and structure of elastic fibers. J. Ultrastr. Mol. Struct. Res. 94:209–216.
Bucher, P. 1990. Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J. Mol. Biol. 212:563–578.
Chen, L., A. L. DeVries, and C. H. Cheng. 1997. Convergent evolution of antifreeze glycoproteins in Antarctic notothenioid fish and Arctic cod. Proc. Natl. Acad. Sci. USA 94:3817–3822.
Cremer, M. A., E. F. Rosloniec, and A. H. Kang. 1998. The cartilage collagens: a review of their structure, organization, and role in the pathogenesis of experimental arthritis in animals and in human rheumatic disease. J. Mol. Med. 76:275–288.
Doolittle, R. F. 1994. Convergent evolution: the need to be explicit. Trends Biochem. Sci. 19:15–18.
Gehris, A. L., S. A. Oberlender, K. J. Shepley, R. S. Tuan, and V. D. Bennett. 1996. Fibronectin mRNA alternative splicing is temporally and spatially regulated during chondrogenesis in vivo and in vitro. Dev. Dyn. 206:219–230.
Grossberger, D. 1987. Minipreps of DNA from bacteriophage lambda. Nucleic Acids Res. 15:6737.
Hamodrakas, S. J., S. A. Asher, G. D. Mazur, J. C. Regier, and F. C. Kafatos. 1982. Laser Raman studies of protein conformation in the silkmoth chorion. Biochim. Biophys. Acta 703:216–222.
Hamodrakas, S. J., J. R. Paulson, G. C. Rodakis, and F. C. Kafatos. 1983. X-ray diffraction studies of a silkmoth chorion. Int. J. Biol. Macromol. 5:149–153.
Hardisty, M. W. 1981. The skeleton. Pp. 333–376 in M. W. Hardisty and I. C. Potter, ed. The biology of lampreys. Vol. 3. Academic Press, London.
Hentze, M. W., and A. E. Kulozik. 1999. A perfect message: RNA surveillance and nonsense-mediated decay. Cell 96:307–310.
Hibner, B. L., W. D. Burke, and T. H. Eickbush. 1991. Sequence identity in an early chorion multigene family is the result of localized gene conversion. Genetics 128:595–606.
Indik, Z., K. Yoon, S. D. Morrow, G. Cicila, J. Rosenbloom, J. Rosenbloom, and N. Ornstein-Goldstein. 1987. Structure of the 3' region of the human elastin gene: great abundance of Alu repetitive sequences and few coding sequences. Connect. Tissue Res. 16:197–211.
Klaff, P., D. Riesner, and G. Steger. 1996. RNA structure and the regulation of gene expression. Plant Mol. Biol. 32:89–106.
Long, E. O., and I. B. Dawid. 1980. Repeated genes in eukaryotes. Annu. Rev. Biochem. 49:727–764.
Marsh, R. E., R. B. Corey, and L. Pauling. 1955. An investigation of the structure of silk fibroin. Biochim. Biophys. Acta 16:1–34.
Morris, S. C. 1997. Defusing the Cambrian explosion? Curr. Biol. 7:R71–R74.
Moss, M. L. 1977. Skeletal tissues in sharks. Am. Zool. 17:335–342.
Munro, H. N., and R. S. Eisenstein. 1989. Translational control: the ferritin story. Curr. Biol. 1:1154–1159.
Nagy, E., and L. E. Maquat. 1998. A rule for termination-codon position within intron-containing genes: when nonsense affects RNA abundance. Trends Biochem. Sci. 23:198–199.
Olliver, L., P. A. Luvalle, J. M. Davidson, J. Rosenbloom, C. G. Mathew, A. J. Bester, and C. D. Boyd. 1987. The gene coding for tropoelastin is represented as a single copy sequence in the haploid sheep genome. Coll. Rel. Res. 7:77–89.
Parker, W. K. 1883. On the skeleton of Marsipobranch fishes. Philos. Trans. R. Soc. Lond. B Biol. Sci. 174:411–458.
Patthy, L. 1987. Intron-dependent evolution: preferred types of exons and introns. FEBS Lett. 214:1–7.
Patthy, L. 1991. Modular exchange principles in proteins. Curr. Opin. Struct. Biol. 1:351–367.
Pau, R. N. 1984. Cloning of cDNA for a juvenile hormone-regulated oothecin mRNA. Biochim. Biophys. Acta 782:422–428.
Pau, R. N., P. C. J. Brunet, and M. J. Williams. 1971. The isolation and characterization of proteins from the left colleterial gland of the cockroach, Periplaneta americana (L.). Proc. R. Soc. Lond. B Biol. Sci. 177:565–579.
Robson, P., G. M. Wright, and F. W. Keeley. 2000. Distinct non-collagen based cartilages comprising the endoskeleton of the Atlantic hagfish, Myxine glutinosa. Anat. Embryol. (in press).
Robson, P., G. M. Wright, E. Sitarz, A. Maiti, M. Rawat, J. H. Youson, and F. W. Keeley. 1993. Characterization of lamprin, an unusual matrix protein from lamprey cartilage. Implications for evolution, structure, and assembly of elastin and other fibrillar proteins. J. Biol. Chem. 268:1440–1447.
Robson, P., G. M. Wright, J. H. Youson, and F. W. Keeley. 1997. A family of non-collagen-based cartilages in the skeleton of the sea lamprey, Petromyzon marinus. Comp. Biochem. Physiol. 118B:71–78.
Rosenbloom, J., M. Bashir, H. Yeh, J. Rosenbloom, N. Ornstein-Goldstein, M. Fazio, V. M. Kahari, and J. Uitto. 1991. Regulation of elastin gene expression. Ann. N.Y. Acad. Sci. 624:116–136.
Russell, R. B., M. A. Saqi, R. A. Sayle, P. A. Bates, and M. J. Sternberg. 1997. Recognition of analogous and homologous protein folds: analysis of sequence and structure conservation. J. Mol. Biol. 269:423–439.
Spoerel, N., H. T. Nguyen, and F. C. Kafatos. 1986. Gene regulation and evolution in the chorion locus of Bombyx mori. Structural and developmental characterization of four eggshell genes and their flanking DNA regions. J. Mol. Biol. 190:23–35.
Tarn, W.-Y., and J. A. Steitz. 1997. Pre-mRNA splicing: the discovery of a new spliceosome doubles the challenge. Trends Biochem. Sci. 22:132–137.
Wright, G. M., F. W. Keeley, and J. H. Youson. 1983. Lamprin: a new vertebrate protein comprising the major structural protein of adult lamprey cartilage. Experientia 39:495–497.
Wright, G. M., and J. H. Youson. 1983. Ultrastructure of cartilage from young adult sea lamprey, Petromyzon marinus L: a new type of vertebrate cartilage. Am. J. Anat. 167:59–70.
Yeh, H., N. Anderson, N. Ornstein-Goldstein et al. (11 co-authors). 1989. Structure of the bovine elastin gene and S1 nuclease analysis of alternative splicing of elastin mRNA in the bovine nuchal ligament. Biochemistry 28:2370–2374.