Reverse transcriptase (RTase), rather than being
unique to retroviruses as once appeared likely, is encoded in many DNA
genomes (1, 2). RTase genes occur within 1) the
retrovirus-like class I retrotransposons that have long terminal
repeats (e.g. Ty of yeast), 2) class II (or LINE-like)
retrotransposons that lack terminal repeats of any kind and typically
have a dA-rich stretch at the 3`-end of the sense DNA strand, and 3)
unusual retroelements of chromosomal and organellar origin. Telomerase
is also an RTase (3). Molecular phylogenetic analyses indicate
that the RTases encoded by the class II retrotransposons and
retroelements (called here type 2 RTases) are more like one another in
predicted amino acid sequence than they are like the RTases (type 1)
encoded by retroviruses and class I
retrotransposons (4, 5, 6, 7).
Soon after the discovery of the coding sequences predicting the type 2 RTases, many investigators recognized that these enzymes would be mechanistically distinct from the already well known type 1 enzymes (8, 9, 10, 11, 12); the complex series of events that ensure replication of long terminal repeats, including internal primer binding sites on the RNA template, priming by tRNAs, and template switching, is unnecessary. Another difference between type 1 and type 2 RTases is the apparent absence, in most type 2 coding sequences, of segments that predict an RNase H. As summarized in this review, a few type 2 enzymes and telomerase have now been studied and, remarkably, each of them has a distinctive priming mechanism. Moreover, none of them appears to utilize as primer a nucleotide covalently bound to the RTase protein, as does the type 1 hepatitis B virus RTase (13).
The RTases of LINE-like Retrotransposons
When dNTPs and in vitro synthesized R2Bm RNA are added to the integrase reaction mixtures, reverse transcription occurs and is dependent upon the presence of the rDNA target site. Analysis of the cDNA product demonstrated that the 3`-hydroxyl at the first nick in the DNA target is the primer for the RTase; the template is R2Bm RNA (17) (Fig. 1a). These findings indicated that target site cleavage and reverse transcription are coupled, as predicted by early models for LINE-like element insertion (9, 10, 11, 12).
Figure 1: Schematic representations of four RTase reactions described in the text: a, R2Bm; b, Mauriceville plasmid; c, bacterial retron; d, telomerase. Red is RNA, black is DNA, and green is newly synthesized DNA. Refer to the text for explanations and references.
The relative rates at which the reaction products accumulate are as expected if second strand cleavage follows reverse transcription. Thus, the initial product is a branched chain in which the cDNA copy of the R2Bm RNA is covalently linked to the 28 S rDNA and hydrogen bonded to the RNA. Significantly, although an RNA chain lacking the R2Bm sequences can act as cofactor for second strand cleavage, only RNAs containing the 3`-end of R2Bm RNA can serve as the template for the RTase; it appears that sequences within the 250-nt long 3`-untranslated region of R2Bm RNA are recognized by the enzyme and, together with 5-10 residues in the primer end of the 28 S rDNA, permit chain synthesis (18).
All known genomic R2Bm elements have 4 As at
the 3`-junction with 28 S rDNA. Experiments with in vitro synthesized R2Bm RNA containing varying numbers of terminal As
indicate that the efficiency of cDNA synthesis decreases as the
terminus changes from 4 to 1 to 8 As (17, 18).
Substitution of the four As with other bases can yield efficient
templates; many of the cDNA products initiate at the RNA terminal
residue. When the terminal A tract is reduced in length,
occasional internal initiations and frequent additions of extra
nontemplated residues occur. These observations may reflect constraints
imposed by the necessity of the enzyme to position itself accurately on
both the RNA template and the 28 S rDNA (18). Virtually nothing
is known about second strand DNA synthesis or the fate of the RNA
template except that the two overhanging bases left after cleavage of
the second 28 S rDNA strand are removed.
The Prokaryotic Group of RTases
A subset of type 2 RTases can be defined based on the similarity of predicted amino acid sequences (6, 7). This subset includes the enzymes encoded by mitochondrial plasmids found in some strains of Neurospora, bacterial retrons, and group II introns and is called the prokaryotic group, reflecting the prokaryote origin of mitochondria and chloroplasts (7, 25).
The 3`-end of the plasmid transcript has striking similarities, in sequence and secondary structure, to the tRNA-like 3` termini of plant RNA virus genomes, including the CCA 3` termini (Fig. 1b) (25, 29). As with synthesis by the RNA-dependent RNA polymerases encoded by RNA viruses, many of the (minus strand) cDNAs synthesized in the RNPs begin (5`-end) with a G corresponding to the penultimate C in the RNA (27). Another large group of the cDNAs utilize DNA primers unrelated to the 3`-end of the RNA template and begin copying from the 3`-terminal A in the template. Experiments with purified RTase clarified these findings.
The RTase can be released from the RNPs
with micrococcal nuclease and purified (30). When partially
purified enzyme was incubated with an RNA template synthesized in
vitro and containing 5`-truncated plasmid sequences, the cDNA
products were about 20 nucleotides longer than the template RNA. The
extra nucleotides were at the 5`-end of the cDNA and derived from
priming oligodeoxynucleotides that were bound to the RTase. The primers
were each different in sequence and length, and most appeared to be
short cDNAs copied from plasmid or mitochondrial RNA. Moreover, these
primers were all joined directly to the 5`-TGG sequence copied from the
3`-ACC end of the template RNA (as were some of the primed cDNAs
synthesized within RNPs and described above) and could be cleaved from
the RNA-DNA duplex with S1 nuclease. Assuming that the plasmid
RTase catalyzed synthesis of these primers, then it seems that the
enzyme is capable of switching templates, in analogy with the type 1
RTases.
When the RTase is freed of the bound primers by treatment
with polyethyleneimine (31), it can utilize either exogenously
supplied oligodeoxynucleotides or the 3`-end of the RNA template itself
as primer; that is, the primer can be either DNA or RNA. Most
interestingly, however, the RTase is also efficient in de novo cDNA synthesis (Fig. 1b) and is the first DNA
polymerase known to initiate DNA chains. A template containing only the
3`-terminal 26 residues of the plasmid RNA is sufficient to direct
specific, de novo initiation, primarily opposite the
penultimate C residue (as in the other major group of cDNAs synthesized
in RNPs). The 5`-terminal G residue can be supplied by free
deoxyguanosine, dGMP, or dGTP. If the 3`-terminal residues of the
template are missing or if extra nucleotides are added to the 3`-end,
copying occurs but not de novo synthesis; instead, the 3`-end
of the RNA serves as a primer. The similarity between this RTase and
the RNA-dependent RNA polymerases of Qβ and brome mosaic virus is
underscored by their common ability to recognize the 3`-terminal
tRNA-like structure of the template and initiate synthesis by copying
the penultimate residue.
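
The two starting points described above (copying from the 3`-terminal A when a primer is used, and de novo initiation opposite the penultimate C) can be illustrated with a minimal sketch. The function name `copy_template` and the six-residue template `GAUCCA` are hypothetical and chosen only to end in CCA; the sketch models base pairing alone and says nothing about how the enzyme recognizes the tRNA-like structure.

```python
# RNA template base -> DNA base incorporated opposite it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def copy_template(rna_5to3, skip_3prime=0):
    """Return the cDNA (written 5'->3') copied from an RNA template.

    skip_3prime is the number of 3'-terminal template residues left
    uncopied: 0 starts opposite the terminal residue, 1 starts opposite
    the penultimate residue.
    """
    if not 0 <= skip_3prime <= len(rna_5to3):
        raise ValueError("skip_3prime must lie within the template")
    usable = rna_5to3[: len(rna_5to3) - skip_3prime]
    # The polymerase reads the template 3'->5', so walk it in reverse.
    return "".join(RNA_TO_DNA[base] for base in reversed(usable))


# Hypothetical short template ending in the CCA 3' terminus.
template = "GAUCCA"
primed_copy = copy_template(template)        # begins opposite the 3'-terminal A
de_novo_copy = copy_template(template, 1)    # begins opposite the penultimate C
```

In this toy model the primed copy begins 5`-TGG and the de novo copy begins with G, matching the two groups of cDNAs described in the text.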
The msDNAs in various bacteria differ in length and sequence but share common features (Fig. 1c, bottom): 1) the 5`-end of the DNA (65-163 nt) is linked, through a 2`,5`-phosphodiester, to a guanosine within a short RNA (fewer than 100 nt), the msdRNA; 2) from 6 to 8 nt at the 3`-end of both the msDNA and msdRNA are complementary and form a duplex; and 3) both msDNA and msdRNA assume stable secondary structures.
Although the retron RTases vary in size from about 300 to 590 amino
acids (34, 35), the enzyme in cell extracts sediments
in association with msDNA as a large complex and has an apparent
molecular mass of 600-700 kDa on molecular sieves (33).
The folded RNA of the retron transcription unit provides both the
template and primer for the enzyme (Fig. 1c, top) and may also serve as mRNA for RTase translation although
this has not been proved. Among the important features of the folded
RNA is a duplex stem formed by the inverted repeats a2 and a1 that
bracket the msdRNA and msDNA region of the retron transcript and
several stem/loop structures. A G residue in the short single-stranded
msdRNA segment just 3` of the a2-a1 stem provides the 2`-hydroxyl
that primes reverse transcription. The residue in the msDNA region that
marks the start of the template segment also occurs just after the end
of the a2-a1 duplex. Thus, the a1-a2 stem brings the primer
and the template into close proximity. With the purified enzyme and the
folded RNA as template, msDNA is synthesized and its 5`-end is linked
to the 2`-hydroxyl of the expected guanosine residue in the msdRNA
region (36). Elongation of the DNA proceeds along the template,
and the efficiency of chain extension depends on the absence of any
stable secondary structures, but not on the particular sequence, within
the msDNA template region (37). In most instances, copying
stops and the chain terminates after addition of the 6-8
nucleotides that form the duplex between the 3`-ends of the msdRNA and
msDNA segments; the mechanism of this specific chain termination has
not been determined. In contrast to the lack of specificity for the RNA
sequence in the msDNA template region, the msdRNA region must be from
the same retron as the RTase; when the msdRNA regions of two different
retrons are exchanged, msDNA synthesis does not occur (38).
With one possible exception, the RTase coding regions of retrons do not predict RNase H segments. Nevertheless, the formation of msDNA by the proposed scheme requires the removal of the RNA in the msDNA region of the template. Experiments with E. coli cells carrying mutations in chromosomal RNase H genes indicate that cellular RNase H is likely to play a role in msDNA synthesis (39).
The formation of telomeric structures is associated with RTase in two known ways. Telomerase, which adds characteristic G-rich repeats to the ends of chromosomes in many eukaryotes, is an RTase (3), and D. melanogaster's distinctive telomeric structures are made up of multiple copies of two LINE-like elements, HeT-A (43, 44) and TART (45, 46), both of which can transpose onto Drosophila telomeres.
Telomerases are RNPs containing an uncapped RNA that acts as template (Fig. 1d). In the purified Tetrahymena thermophila enzyme, 80- and 95-kDa polypeptides bind the RNA and the telomeric primer DNA, respectively (47). There is limited homology between the amino acid sequences predicted from the cloned gene (and cDNA) for p95 and viral RNA-dependent RNA polymerases; otherwise, the telomerase proteins are different from other RTases and from each other (47). The telomerase RNAs vary in length (from 159 nt in Tetrahymena to 1300 in yeast (48)) and sequence, but each includes a sequence complementary to the species-specific, 3` single strand overhanging, G-rich telomeric repeat; a conserved secondary structure among the protozoan telomerase RNAs includes a generally single-stranded, although partly constrained, configuration for the template region (49, 50, 51). Active enzyme can be reconstituted after removal of the RNA by nuclease treatment (47, 52).
In the first step in telomere extension, the 3` G-rich overhang base pairs with a few nucleotides in the telomerase RNA (Fig. 1d). The RTase then copies the rest of the basic repeat unit (e.g. GGGTTG in Tetrahymena) and pauses and shifts relative to the telomere so that the new terminus can be repositioned for addition of the next repeat (3, 52). The mechanism whereby copying pauses and the primer shifts once the repeat segment is complete is unknown; as already mentioned, a similar question arises in the synthesis of retron msDNA. Other experiments indicate that the rate and processivity of polymerization in vitro are strongly dependent on the nucleotides just 5` of the terminal G-rich repeat on the primer (53, 54); it is likely that this represents a secondary binding site, the anchor site, for p95 and that such binding contributes to processivity.
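
The cycle of pairing, copying, and repositioning can be sketched as a toy model. Everything named here is an assumption made for illustration: the nine-residue template `CCCAACCCC` (written 3`->5`) is hypothetical, chosen only so that it specifies one and a half copies of the GGGTTG repeat quoted above, and the rule that the telomere pairs with the longest matching stretch at the 3` side of the template is a simplification, not a claim about how the enzyme aligns its primer.

```python
# RNA template base -> DNA base incorporated opposite it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def templated_product(template_rna_3to5):
    """DNA (5'->3') specified by a template written in the 3'->5' direction."""
    return "".join(RNA_TO_DNA[base] for base in template_rna_3to5)


def add_repeat(telomere, template_rna_3to5):
    """One round: pair the telomere 3' end with the template, copy to its end."""
    product = templated_product(template_rna_3to5)
    # Find the longest stretch at the start of the product that the
    # telomere 3' end already matches (that is, can base pair with).
    for paired in range(len(product) - 1, 0, -1):
        if telomere.endswith(product[:paired]):
            return telomere + product[paired:]
    raise ValueError("telomere 3' end cannot pair with the template")


def extend(telomere, template_rna_3to5, rounds):
    """Repeat the pair/copy step; each call to add_repeat models one shift."""
    for _ in range(rounds):
        telomere = add_repeat(telomere, template_rna_3to5)
    return telomere


TEMPLATE = "CCCAACCCC"      # hypothetical template, written 3'->5'
PRIMER = "GGGTTGGGGTTG"     # two copies of the repeat, written 5'->3'
extended = extend(PRIMER, TEMPLATE, rounds=2)
```

In this sketch the first round only completes the copy to the end of the template, and each later round adds six nucleotides after the new 3` end is repositioned. The pausing and anchor-site effects discussed in the text are not modeled.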
Recent discussions about the origin of life on Earth postulate an early "RNA world" and the conversion of that RNA world to one in which genetic information is stored in DNA through the mediation of a primitive RTase (55, 56, 57). Thus, special interest has focused on the widespread type 2 RTases. It has been suggested that they may 1) have evolved from a primitive RNA-dependent RNA polymerase such as those now associated with some RNA viruses, 2) represent the most primitive RTases yet observed and thus may be ancestral to type 1 RTases and DNA polymerases, and 3) be the closest relatives we know to the RTase required by the "RNA world" hypothesis (27, 31, 55, 56, 57).
Molecular phylogenies suggest that all the type 2 RTases may be more closely related than the organisms that harbor them and thus have, in part, an independent evolutionary history (4, 5, 6, 7). This could be the consequence of conservation of the RTase coding sequence within the separate genomes over long periods of evolutionary history or could reflect horizontal transfer of elements across species and even phyla. There is evidence for horizontal transfer of jockey among distantly related members of the Drosophila genus (58). Apart from the similar RTase coding sequences, certain other features recur among the type 2 RTases and even telomerase. These include the importance of template secondary structure, the recognition by the enzymes of the 3`-terminal structure of the RNA template, the initiating role of guanosine nucleotides, and the association of the enzymes with RNPs. Thus, while each of the enzymes has evolved to develop distinctive properties, their similarities hint at a common ancestor.