Department of Zoology & Genetics, Iowa State University
Abstract |
---|
Introduction |
---|
Retroviruses and long terminal direct repeat (LTR) retrotransposons share numerous similarities in genetic organization and mechanism of replication. Both are flanked by LTRs and encode Gag and Pol polyproteins that are processed by a pol-encoded aspartic protease (PR). Gag is a structural protein that forms virus or viruslike particles. pol also encodes the enzymes reverse transcriptase (RT) and integrase (IN), which synthesize retroelement cDNA and integrate it into the host genome. Retroviruses have a third open reading frame, env, enabling extracellular transmission.
Sequence heterogeneity is high among retroelements even though they encode proteins of the same or similar function. One explanation lies in the comparatively low fidelity of reverse transcription, which results in accelerated retroelement evolution (Gabriel and Mules 1999). Despite sequence heterogeneity, phylogenetic analyses based on RT amino acid sequences divide LTR retroelements into five distinct lineages found in diverse eukaryotes (Xiong and Eickbush 1990; Boeke et al. 2000a, 2000b; Malik, Henikoff, and Eickbush 2000). These include the vertebrate retroviruses (Retroviridae), two predominantly retrotransposon lineages (the Pseudoviridae and the Metaviridae), and the BEL and DIRS clades (Malik, Henikoff, and Eickbush 2000). The Pseudoviridae (also known as the Ty1/copia elements) are characterized by a distinctive pol enzymatic domain order, wherein coding sequences for IN precede those for RT. This domain order is reversed in the other LTR retroelement lineages. The Metaviridae and Pseudoviridae are each divided into two genera (Boeke et al. 2000a, 2000b). The Metaviridae consist of the Metavirus and Errantivirus; the latter has envlike genes similar to those of the Retroviridae. The Hemivirus and Pseudovirus constitute the two genera of Pseudoviridae, which are distinguished by the primer used to initiate reverse transcription. Pseudoviruses prime DNA synthesis with the terminal 3' residue of initiator methionine tRNA, whereas Hemiviruses utilize a half-tRNA generated by cleavage within the anticodon stem-loop.
Genome sequencing projects have enhanced our understanding of diversity and evolutionary trends among retroelements. For example, it has become apparent that infectious retroviruses evolved independently multiple times through the acquisition of env genes by retrotransposons (Malik, Henikoff, and Eickbush 2000). This has been well documented in the Metaviridae, and more recently, examples of elements with envlike genes have been reported in the Pseudoviridae (Laten, Majumdar, and Gaucher 1998; Kapitonov and Jurka 1999; Peterson-Burch et al. 2000). Additional important evolutionary trends will likely be revealed as genome sequences continue to unfold. As a framework for understanding such trends, we characterized the coding regions of Pseudoviridae found within GenBank as well as the complete genome sequences of five eukaryotes representing plants (Arabidopsis thaliana), animals (Caenorhabditis elegans, Drosophila melanogaster, and humans), and fungi (Saccharomyces cerevisiae). We sought both conserved and divergent features that may offer insight into the mechanisms of replication and how these elements have adapted to their host cell environments. Because the Pseudoviridae are particularly abundant and diverse in higher plants, analyses focused on the Pseudoviridae of A. thaliana.
Materials and Methods |
---|
Sequence Analysis
ClustalX version 1.81 was used to generate most multiple sequence alignments (Thompson et al. 1997). Default settings were used for all parameters other than substitution matrices, where the BLOSUM series was used for both pairwise and multiple sequence alignments. The delay divergent sequences option was set to 45 for the RT alignment. Alignments of IN, PR, and RT are deposited in the EMBL database and are posted on our website (http://www.public.iastate.edu/~voytas/publications/mbe02.html). Additional methods of multiple sequence alignment were employed to identify conserved amino acid sequences in Gag and Pol. These included ungapped blastx alignments and the methods incorporated by the Blocks software (Henikoff et al. 1995). Constrained regions and residues in Gag were revealed when the methods were not forced to align the sequences over their entirety. SeqLogos were generated from ungapped blastx multiple alignments of Gag amino acid sequences (Schneider and Stephens 1990) using the WebLogo server (http://www.bio.cam.ac.uk/cgi-bin/seqlogo/logo.cgi).
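The quantity displayed by a sequence logo can be illustrated with a minimal sketch. It computes the per-column information content, in bits, of an ungapped protein alignment. The function names are hypothetical, the toy sequences are invented, and the small-sample correction described by Schneider and Stephens is omitted, so this is an approximation of what a logo server reports rather than a reimplementation of it.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one ungapped alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies. No small-sample correction is applied.
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def logo_heights(sequences):
    """Per-column information for equal-length, ungapped sequences."""
    return [column_information(col) for col in zip(*sequences)]


# Invented toy alignment: column 0 is invariant, column 1 is split evenly.
heights = logo_heights(["CAC", "CKC", "CAC", "CKC"])
```

An invariant column reaches the maximum of log2(20) bits, and a column split evenly between two residues loses exactly one bit.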
The RT neighbor-joining tree (Poisson correction) was generated using MEGA 2.1 (Kumar et al. 2001) from ClustalX alignments. Other trees were created with PAUP 4.028 (Swofford 1999). The IN tree was generated from aligned sequences spanning from four residues upstream of the first histidine of the N-terminal zinc-binding motif (HHCC) to the end of the conserved region, which ends approximately 120 residues downstream of the glutamate at the catalytic site. The PR tree was generated from the alignment shown in figure 3A.
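For readers unfamiliar with the Poisson correction, the following sketch shows the standard formula, d = -ln(1 - p), where p is the proportion of aligned sites that differ. The function names are hypothetical, and skipping any site with a gap in either sequence (pairwise deletion) is an assumption; the gap-handling option used in the original analysis is not stated.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are ignored (an assumption).
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected amino acid distance, d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)
```

The correction grows faster than p itself, which is one reason highly divergent sequences produce long branches in such trees.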
Bootstrap replicates used full character replacement. Default settings were used for all other parameters. Specifics regarding the rooting of trees are given in the figure legends. Trees were visualized with MEGA or TreeExplorer (http://evolgen.biol.metro-u.ac.jp/TE/TE_man.html).
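Bootstrap resampling with full character replacement can be sketched as follows: each replicate draws as many alignment columns as the original alignment contains, sampling with replacement. The function name is hypothetical, and the sketch covers only the resampling step, not tree building or consensus calculation.

```python
import random


def bootstrap_alignment(sequences, rng):
    """Return one bootstrap replicate of an alignment.

    Columns are drawn with replacement until the replicate has the same
    number of columns as the input; every sequence uses the same draws.
    """
    n_cols = len(sequences[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in sequences]


# A seeded generator keeps the replicate reproducible.
replicate = bootstrap_alignment(["ACDE", "ACDF"], random.Random(0))
```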
Results |
---|
Seven conserved domains of RT (Xiong and Eickbush 1990) were aligned for all elements and used to construct a consensus neighbor-joining tree (fig. 1). The tree is characterized by long branch lengths, indicating considerable diversity among the Pseudoviridae. Most of this diversity is embodied by the A. thaliana elements; many of the other plant elements in the core data set have A. thaliana homologues. Nonplant elements emanate from the base of the tree. A previous study based on the partial A. thaliana genome sequence identified several of the major clades described here (Terol et al. 2001) (numbered clades in fig. 1). Representatives from these and additional lineages (numbered elements in fig. 1) were subjected to the analyses of coding sequences described in subsequent sections (data not shown). No unique coding sequence features were observed. We concluded that the core data set is representative of the Pseudoviridae.
gag
gag encodes proteins that form virus or viruslike particles and that package retroelement mRNAs (Vogt 1997). For the purposes of this study, Gag was defined as the span of amino acids extending from the first methionine to the three catalytic residues of PR (see below for description of PR). Gag proteins of retroviruses and Ty1 are cleaved 20-40 residues before the PR active site, so this definition likely overestimates this protein's length. Accordingly, observations pertaining to Gag features do not consider the C-termini.
In some retroelements, Gag and Pol are encoded by a single open reading frame, whereas in others they are separated by a shift in reading frame or a stop codon. For these latter elements, Pol is usually expressed as a consequence of translational recoding (e.g., frameshifting or stop codon suppression) (Gesteland and Atkins 1996). Most Pseudoviridae appear to encode Gag and Pol on a single ORF; 27 of the 32 elements in the core data set (table 1) do not have a break in reading frame between gag and pol. Ty1 is known to use ribosomal frameshifting for pol expression, and the sequences that mediate Ty1 frameshifting are conserved in Ty4 (Voytas and Boeke 2002). Two of the maize elements (Opie-2 and PREM-2) were too degenerate to assess their organization, and gag was entirely missing in an element from pea. The reference 1731 element has a frameshift located between gag and pol, suggesting that it undergoes frameshifting; however, a consensus element derived from 1731 insertions in the D. melanogaster genome encodes a single ORF (data not shown).
Gag is typically processed into multiple proteins, including a short nucleocapsid protein that coats the retroelement mRNA within the particle. The RNA-binding motif (RB) (Cx2Cx4Hx4C) is a characteristic feature of the nucleocapsid and is widespread among retroelements (Vogt 1997). This motif is also a common feature of the Pseudoviridae with the exception of Ty1, Ty4, and Tca2. The transpositionally active Ty5 element has a shortened motif variant (Cx2Cx3Hx4C). In most Pseudoviridae, as in other retroelement families, the region flanking the motif exhibits a basic nature and contains low complexity amino acid sequence repeats. Most of these repeats are composed of N(n), K(n), G(n), and variations on RG(n) (data not shown).
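The two RB signatures given above translate directly into a regular expression. The sketch below scans a protein sequence for either the canonical Cx2Cx4Hx4C spacing or the shortened Ty5-like Cx2Cx3Hx4C spacing. The function name and the toy sequence are invented for illustration; this is not the search procedure used in the study.

```python
import re

# Cx2Cx4Hx4C, also allowing the shortened Cx2Cx3Hx4C variant.
RB_PATTERN = re.compile(r"C.{2}C.{3,4}H.{4}C")


def find_rb_motifs(protein):
    """Return (start, matched_text) for each non-overlapping RB-like match."""
    return [(m.start(), m.group()) for m in RB_PATTERN.finditer(protein)]


# Invented toy sequence containing one canonical motif.
hits = find_rb_motifs("MKKCYNCGKKGHIAKDCRGG")
```

Because `finditer` reports non-overlapping matches, tandem motifs such as those described for some elements would appear as consecutive hits with increasing start positions.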
Striking differences in the organization and number of RBs were observed in Agroviruses (fig. 2). Among Agroviruses with identifiable envlike ORFs, SIRE-1 and ToRTL1 have two RBs, whereas Endovir1-1 has three. In Endovir1-1, two motifs are arranged in tandem near the center of Gag, and one is located at the Gag C-terminus. The only other member of the Pseudoviridae with two tandem RBs is Tpv2-6. In this case, the RBs are located at the Gag C-terminus. It should be noted that in the RT phylogeny, Tpv2 is among the elements most closely related to the Agroviruses (fig. 1). Agroviruses for which envlike ORFs have not been identified include Opie-2 and PREM-2, and these elements have a single, centrally located RB. In general, the central location of this motif is a unique feature of the Agroviruses. Agroviruses also stand out from the rest of the Pseudoviridae on the basis of Gag length. Gag averages 656 amino acids in length for the five Agroviruses (ranging from 541 to 808), whereas the remaining family members average 329 amino acids. An internal domain of Gag accounts for most of the size difference. This helix-forming region is located downstream of the central RB and lacks sequence similarity with other family members.
pol
pol encodes the enzymatic functions required for replication. Conserved amino acid sequence motifs for PR, IN, and RT were used as the starting point to delimit boundaries of these enzymes. Ty1 PR cleavage sites, which are known, were also used to set approximate boundaries for each enzyme (Garfinkel et al. 1991; Moore and Garfinkel 1994; Merkulov et al. 1996). A caveat that should be mentioned is that Ty1 is among the most divergent family members. This is evident in phylogenetic analyses that include RTs from outside Pseudoviridae (Eickbush 1994).
Protease
PR is the first enzyme encoded by pol, located at the N-terminus. It is required to release the other enzymes from the Pol precursor and is involved in processing of Gag (Gulnik, Erickson, and Xie 2000). Retroelement aspartic proteases are characterized by a D(S/T)G motif at the catalytic site. We defined the PR N-terminus as the nearly invariant tryptophan, three residues upstream of the catalytic aspartate (fig. 2A), although, as mentioned previously, this likely underestimates the length of the N-terminus by approximately 30 residues. The C-terminus of PR was defined as the last conserved region upstream of the Ty1 PR-IN cleavage site. Family members share no obvious conserved features or physicochemical properties at the location of the Ty1 PR cleavage sites. These criteria result in a PR that has 109-120 amino acid residues. PR sequences do not strongly differentiate relationships among Pseudoviridae (fig. 3B) but do group Agroviruses together.
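The N-terminal boundary rule described above can be sketched as a simple scan: find each D(S/T)G tripeptide and keep those with a tryptophan a fixed number of positions upstream of the aspartate. The function name is hypothetical, and interpreting "three residues upstream" as the position exactly three before the aspartate is an assumption, so the offset is left as a parameter. Because D(S/T)G is short, chance matches are expected in real sequences, and the toy sequence here is invented.

```python
import re

CATALYTIC = re.compile(r"D[ST]G")


def pr_n_terminus_candidates(protein, offset=3):
    """Return (tryptophan_index, aspartate_index) pairs, 0-based.

    A D(S/T)G match is kept only if a tryptophan sits `offset` positions
    upstream of the catalytic aspartate (offset value is an assumption).
    """
    hits = []
    for m in CATALYTIC.finditer(protein):
        w_pos = m.start() - offset
        if w_pos >= 0 and protein[w_pos] == "W":
            hits.append((w_pos, m.start()))
    return hits


# Invented toy sequence: the first DSG has the upstream W, the later DTG does not.
candidates = pr_n_terminus_candidates("AAWILDSGAADTG")
```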
Because Pseudoviridae PR sequences were divergent, we took advantage of the 3D structural information available for some species of Retroviridae (Gulnik, Erickson, and Xie 2000). Sequences were sufficiently similar to enable conserved motifs to be mapped onto the dimeric form of the HIV-1 enzyme (fig. 3B). The conserved glycine-rich region (shaded orange) is part of a flexible loop that works cooperatively to coordinate the substrate in the catalytic site (shaded red). When substrate is not present, PR undergoes a major conformational shift wherein the locking loops are separated from each other, allowing substrates access to the catalytic core. The remaining homology region (shaded blue) is largely hydrophobic and contributes to the protein core. A central, highly conserved proline residue is located at the point where this segment folds back sharply upon itself.
Integrase
IN binds and inserts the retroelement cDNA into host chromosomal DNA. IN is characterized by three domains: HHCC, a catalytic core (the DD35E motif), and a poorly conserved C-terminal domain (Haren, Ton-Hoang, and Chandler 1999). The N-terminus of Pseudoviridae IN was defined as beginning four amino acid residues upstream of the first histidine in the HHCC motif. This position is several amino acids downstream from the known Ty1 N-terminal cleavage site. The highly conserved zinc-binding motif and central catalytic core regions differ very little between Pseudoviridae and other LTR retroelement INs (Haren, Ton-Hoang, and Chandler 1999). Conservation in the catalytic domain terminates approximately 120 amino acids downstream of the glutamate in the DD35E motif (fig. 4A). The C-terminus is defined as beginning at this point and extending to RT. This likely overestimates the length of this region because the actual boundary between IN and RT is unclear. Phylogenetic analyses were performed using the catalytic core domain (fig. 4B), and element clusters largely correspond to those observed for RT (fig. 1).
Although the remainder of the IN C-terminus lacks high sequence conservation, there are some shared features. The region is enriched in proline (8.5%), asparagine (6.8%), serine-threonine (12%/7.4%), and aspartate-glutamate (7%/9.6%). Numerous short runs of charged residues, such as lysines and glutamates, are also evident. A tandem duplication is located in the C-terminus of copia, further suggesting that this region can tolerate considerable variability (fig. 4B). The C-termini of most Agroviruses share a short, 14-residue motif with four invariant residues, ILGD (fig. 4B). The functional significance, if any, of this motif is unknown. Several fungal elements (Ty1, Ty4, Ty5 of S. cerevisiae and Tca2 of Candida albicans) have comparatively large and extended C-termini. Ty1 and Ty5 are known to target their integration to specific locations (Voytas and Boeke 2002). The targeting function has been mapped to the C-terminus of Ty5 IN (Xie et al. 2001), and Ty1 has a nuclear localization signal in this region (Kenna et al. 1998; Moore, Rinckel, and Garfinkel 1998).
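Residue enrichment figures of the kind quoted above are simple composition percentages. The sketch below shows one way to compute them for a single region; the function name and the toy sequence are invented, and the published values presumably reflect pooled data across many elements rather than one sequence.

```python
from collections import Counter


def composition_percent(protein, residues):
    """Percentage of `protein` made up by each one-letter code in `residues`."""
    total = len(protein)
    if total == 0:
        raise ValueError("empty sequence")
    counts = Counter(protein)
    return {r: 100.0 * counts[r] / total for r in residues}


# Invented 10-residue toy region; P, N, S, T, D, and E are the residues
# highlighted in the text.
profile = composition_percent("PPNSTDEKKK", "PNSTDE")
```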
Reverse Transcriptase
The carboxy-terminal domain of pol encodes RT and its associated RNase H subdomain. These enzyme activities generate a cDNA copy of the retroelement from genomic mRNA (Telesnitsky and Goff 1997). Previous studies have shown that RT is the most conserved retroelement-coding region (Xiong and Eickbush 1988). RT immediately follows the IN C-terminus and continues through RNase H to the end of pol. Analysis of core data set RTs did not reveal any new observations regarding function or evolution beyond those described in previous sections (fig. 1) and other RT studies (Malik and Eickbush 2001).
envlike ORFs
Three Agroviruses have envlike ORFs downstream of pol (Endovir1-1, SIRE-1, ToRTL1) (Laten, Majumdar, and Gaucher 1998; Kapitonov and Jurka 1999; Peterson-Burch et al. 2000). The Endovir1-1 ORF shares significant amino acid similarity with the corresponding ORF of SIRE-1 (fig. 5). Significant similarity is also observed for ToRTL1 but over a much shorter span. Like retroviral env genes, the SIRE-1 and Endovir1-1 ORFs have central transmembrane domains. ToRTL1 and SIRE-1 have a transmembrane domain located at or near the N-terminus. The C-terminal halves of these proteins are generally rich in secondary structural features, particularly alpha helices. BLAST searches of the ORFs did not identify any related proteins in the GenBank databases. It should be noted that envlike ORFs could not be discerned in some members of the Agrovirus (e.g., PREM-2 and Opie-2). This may be attributed to mutations or deletions that obscure this coding region in these elements.
Discussion |
---|
Familywide Features of Coding Sequences
In the Pseudoviridae, Gag and Pol are typically encoded by a single ORF. Only the S. cerevisiae Ty1 and Ty4 elements clearly have gag and pol separated by a frameshift (+1). Opie-2 and PREM-2 of maize are the only other candidates for separate gag and pol genes, but sequence degeneracy makes it difficult to assign reading frames for these elements. The prevalence of a single gag-pol coding region differs from most (Meta/Retro)viridae, where these genes are usually separated by a stop codon or a -1 frameshift (Vogt 1997). Translational recoding regulates Gag and Pol stoichiometry for most (Meta/Retro)viridae, guaranteeing the high levels of Gag required for proper virion assembly (Gesteland and Atkins 1996). Of the single-ORF Pseudoviridae for which the stoichiometry of Gag and Pol has been examined, Gag preponderance is ensured by differential splicing (copia) and perhaps by increased Pol turnover (Ty5) (Brierley and Flavell 1990; Yoshioka et al. 1990; Irwin and Voytas 2001). Differential protein stability has been well documented as the mechanism regulating Gag and Pol levels for Tf1, a single-ORF Metavirus (Atwood, Lin, and Levin 1996). It is likely that the Pseudoviridae generally use mechanisms other than translational recoding to regulate gag-pol expression.
Gag
Gag proteins are typically highly divergent, yet Pseudoviridae Gag shows discernible conservation between family members at both the primary and secondary structural levels. We identified three sequence domains featuring several interspersed, highly conserved amino acid residues. In Ty1, mutations in domain C cause transposition phenotypes and disrupt VLP formation and morphology (Braiterman et al. 1994; Monokian, Braiterman, and Boeke 1994; Martin-Rendon et al. 1996). Like the (Meta/Retro)viridae, Pseudoviridae Gag proteins have an RB characteristic of nucleocapsid. The signature of this motif differs from the (Meta/Retro)viridae by an additional conserved glycine residue (Cx2Cx3GHx4C). Exceptions include Ty5, which is missing the glycine, and Tca2 and Ty1, which lack a recognizable RB. In the case of Ty1, however, the corresponding region of Gag has other properties of nucleocapsid proteins (Cristofari, Ficheux, and Darlix 2000). The major homology domain found in many (Meta/Retro)viridae Gag proteins was not observed in the Pseudoviridae (Orlinsky et al. 1996), although the position of the major homology region approximately corresponds to domains B and C. As discussed below, there has also been an expansion of Gag in the Agroviruses.
Pol
Pol showed a high degree of sequence conservation over most of its length. RT and its associated RH have been extensively studied and reviewed elsewhere (Telesnitsky and Goff 1997). We only add that sequence conservation in the Pseudoviridae extends approximately 30 residues upstream of the RT domain 1 (as defined by Xiong and Eickbush 1990). Pseudoviridae PR is the most divergent of the Pol enzymes. Only two amino acids are invariant in the data set, yet many chemically similar residues display regular spacing and are observed as vertical bands of color in the alignment (fig. 3A). Mapping conserved regions onto the structure of the HIV-1 PR identifies a hydrophobic core region that helps maintain and stabilize the enzyme, assuring proper positioning and coordination of interacting regions of the dimer (Gulnik, Erickson, and Xie 2000). The glycine-rich domain of PR forms much of the conformationally mobile, substrate clamping loop. The precisely spaced hydrophobic residues in the glycine-rich region maintain contacts with the hydrophobic backbone, likely placing some limitations on the range of movement of the clamping loops. PR size is very uniform within the family, and the precise spacing of conserved residues suggests that functional constraints limit sequence divergence.
Pseudoviridae IN has an N-terminal metal-binding domain and a central catalytic domain similar to those of the (Meta/Retro)viridae (Haren, Ton-Hoang, and Chandler 1999). These domains are the second most highly conserved regions of pol after RT. The only major structural variation seen in pol occurs in the carboxy-terminus of IN. Using the conserved glutamate of the DD35E motif as a reference point, conservation extends farther downstream in the Pseudoviridae than in other retroelement families. We refer to this conserved C-terminal domain as the GKGY motif after the four most highly conserved residues. In HIV-1 IN, the C-terminal domain is involved in nonspecific DNA binding and also plays a role in oligomerization coordinated by the zinc-binding motif (Brown 1997). Critical C-terminal residues in HIV-1 IN provide a basic charged platform involved in the nonspecific DNA interaction (Eijkelenboom et al. 1999; Chen et al. 2000). It is interesting that the GKGY motif shows six positions where a positive residue is preserved (asterisked columns in fig. 4A), suggesting that they may play a similar role in DNA binding. The location of the HIV-1 basic residues corresponds approximately to the GKGY motif.
Some (Meta/Retro)viridae have a conserved C-terminal motif (GPF/Y) not found in the Pseudoviridae (Malik and Eickbush 1999). This motif is located just after the GKGY domain and has also been speculated to play a role in DNA binding. The region C-terminal to the GKGY motif varies considerably in size among the Pseudoviridae. This is particularly evident among the fungal elements. Pseudoviridae IN C-termini are often proline rich, characteristic of some protein-protein interaction domains (Kay, Williamson, and Sudol 2000). Some (Meta/Retro)viridae C-termini have chromodomains, and recent work suggests that chromodomains interact with histones (Nielsen et al. 2001). Although we did not observe any chromodomains in the Pseudoviridae, the C-terminus of Ty5 IN interacts with chromatin, and this interaction is responsible for Ty5's target specificity (Xie et al. 2001). Perhaps the IN C-terminus generally interacts with chromatin to help direct integration.
The Agroviruses
All members of the Retroviridae encode Env, and more recently LTR retroelements with envlike genes have been described in the Metaviridae (Wright and Voytas 1998; Lerat and Capy 1999), the BEL group elements (Bowen and McDonald 1999; Frame, Cutfield, and Poulter 2001), and the Pseudoviridae (Laten, Majumdar, and Gaucher 1998; Kapitonov and Jurka 1999; Peterson-Burch et al. 2000). The D. melanogaster gypsy element envlike gene has been shown to mediate infection, indicating that it encodes a true Env protein (Kim et al. 1994; Song et al. 1994). The envlike genes of some insect Metaviridae and BEL group elements are evolutionarily related to viral env genes (Malik, Henikoff, and Eickbush 2000). This suggests that retroviruses have evolved from retrotransposons through transduction of viral env genes. Of course, this process can occur in reverse, and some endogenous retroelements (e.g., IAP and VL30 elements) have likely originated from viruses that lost env (Boeke and Stoye 1997).
Pseudoviridae with envlike genes include SIRE-1 of soybean, Endovir of A. thaliana, and ToRTL1 of tomato. The envlike ORFs of these elements vary in length from 476 to 668 amino acids and are separated from pol by distances ranging from 27 to over 1,000 nucleotides. Retroviridae env is expressed from a spliced mRNA, and this also appears to be the case for Metaviridae envlike genes (Avedisov and Ilyin 1994; Marsano et al. 2000; Wright and Voytas 2002). Alternative splicing is unlikely to play a role in expression of the Pseudoviridae envlike genes because no conserved splice acceptor sites are evident in the vicinity of this ORF. It is possible that the envlike ORF is expressed from an internal promoter; however, searches failed to identify likely promoter elements or transcription factor-binding sites (data not shown). Alternative strategies for expression include internal ribosome entry and translational bypassing (Gesteland and Atkins 1996; Jackson 2000). The mechanism of expression will ultimately need to be determined experimentally.
Like env of the retroviruses, the envlike genes of the Pseudoviridae have predicted transmembrane domains. The organization of these domains is not conserved among the three elements: ToRTL1 and SIRE-1 have N-terminal transmembrane domains, and SIRE-1 and Endovir have one or two transmembrane domains in the C-terminal halves of their proteins. Comparisons among the three elements showed that the N-terminal halves share the greatest sequence identity, whereas the C-terminal halves are enriched in secondary structural features. We recognize that the Agrovirus data set is limited, and conserved features will be revealed by the characterization of additional envlike ORF-containing elements. Although we imply its involvement in extracellular transmission, an alternative possibility is that the Envlike protein facilitates movement between plant cells. Many plant viruses encode movement proteins that carry out this role, including some with transmembrane domains (Brill et al. 2000; Melcher 2000).
A notable Agrovirus feature is that gag has undergone considerable expansion. Like other Pseudoviridae, Agrovirus Gag has the conserved A, B, C core domains followed by an RB (fig. 2). Agroviruses with envlike ORFs have a second RB. Sequences adjacent to these RBs are variable, preventing determination of which RB is orthologous to the single RB found in other Pseudoviridae. Multiple RBs are also observed in retroviruses, where tandem pairs help package the viral genome within the virion. Two members of the Pseudoviridae contain tandem RBs as seen in the Retroviridae. Endovir1-1 has a tandem pair located in the center of Gag as well as a third RB at the C-terminus. Interestingly, Tpv2-6, which also has a tandem pair of RBs, is a member of a Pseudoviridae clade that is closely related to the Agroviruses. This suggests that duplication of the RB may have occurred before the acquisition of the envlike gene. It should be noted that Agroviruses from monocots without envlike ORFs have only a single, centrally located RB.
The expansion of gag occurred in two regions: between domains A and B and after domain C. In the Agroviruses with envlike ORFs, the larger expansion is located between the two RBs in the C-terminus. This expansion is not an obvious tandem duplication of the first half of gag because we could not detect homology or secondary structural similarity with upstream regions. The role of the expansion region is not known, but its association with the lineage of elements with envlike genes suggests a functional relationship between Gag and the Envlike protein. The C-terminal expansion is highly alpha helical in character, and the coiled-coils found in this region are a common feature of proteins involved in oligomerization. If the envlike ORF mediates infectivity, Gag may facilitate docking and budding of the virion, as is observed in retroviruses (Swanstrom and Wills 1997). Another alternative is that this region acts as a movement protein enabling transport of virions between plant cells.
Evolutionary Trends
Although Pseudoviridae are found in diverse eukaryotes, clearly they have been very successful in colonizing plant genomes. In our survey, we identified 276 distinct A. thaliana Pseudoviridae RTs. Because all other plant elements cluster with A. thaliana clades, the diversity of A. thaliana elements reflects the diversity in other plant species. Some clades of A. thaliana elements do not have other plant homologues in this data set. As additional plant genomes are sequenced, it will be of interest to determine the full complement of plant Pseudoviridae lineages. Outside of plants, relatively few (e.g., S. cerevisiae, D. melanogaster) or no Pseudoviridae were identified (e.g., nematodes and humans). It is difficult to provide a parsimonious explanation for the punctate distribution of these retroelements in eukaryotes. It may be that they originated in plants, where they are ubiquitous, and then moved into other organisms by way of horizontal transfer. This latter hypothesis is supported by the apparent entry of copia into D. melanogaster by horizontal transfer and the relatively youthful appearance of D. melanogaster retrotransposons (Jordan, Matyunina, and McDonald 1999; Bowen and McDonald 2001). An alternative hypothesis is that Pseudoviridae were lost in some eukaryotic lineages. Clearly, species lacking Pseudoviridae have other successful retroelements (e.g., the Cer elements in nematodes and the LINE-1 elements in humans).
Most elements in the Pseudoviridae RT tree have very long branch lengths and are poorly separated near the tree base. Long branch lengths may be the result of accelerated sequence evolution brought about in part by the error-prone nature of RT (Gabriel and Mules 1999). Widely used amino acid substitution models do not consider this and other factors that may influence repetitive sequence evolution. For example, methylation of repetitive DNAs accelerates mutation rates (Colot and Rossignol 1999). Repetitive sequences in plants are highly methylated, perhaps increasing the observed diversity of the plant elements.
Although we invoke horizontal transfer as a possible explanation for the punctate distribution of elements in the eukaryotes, our analysis did not provide any strong evidence of horizontal transfer or domain swapping. We recognize that we excluded elements sharing greater than 75% amino acid identity from the data set; however, this process only eliminated elements from the same host organism. Some elements did not cluster with others from their host (e.g., Tca2 from C. albicans and Ty1 from S. cerevisiae). In all such cases, separation of the elements was poorly supported by bootstrap analyses, making it difficult to invoke horizontal transfer to explain their relationships. Agroviruses, which might have a means for host cell escape by infection, also cluster together and show no apparent signs of horizontal transfer. Trees generated from sequence domains other than RT generally had the same topology, providing little evidence for swapping of coding regions. Fine-scale organizational analysis of specific coding sequences such as IN also did not indicate swapping or small-scale rearrangements (data not shown). Evidence for horizontal transfer and domain swapping may surface with the identification of additional elements.
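The 75% identity cutoff mentioned above can be illustrated with a greedy filter over pre-aligned sequences. This is a sketch under stated assumptions: the function names are hypothetical, identity is computed over sites where neither sequence has a gap, and keeping the first-encountered member of each redundant group is an arbitrary choice. How representatives were actually chosen for the data set is not described in the text.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues over sites where neither sequence is gapped."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def filter_redundant(aligned, threshold=75.0):
    """Greedily keep sequences that share <= threshold % identity
    with every sequence already kept.

    `aligned` maps element names to aligned sequences of equal length.
    """
    kept = {}
    for name, seq in aligned.items():
        if all(percent_identity(seq, other) <= threshold for other in kept.values()):
            kept[name] = seq
    return kept


# Invented toy alignment: "b" is 87.5% identical to "a" and is dropped.
toy = {"a": "ACDEFGHK", "b": "ACDEFGHR", "c": "LMNPQRST"}
representatives = filter_redundant(toy)
```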
Proposal for a New Genus
The current taxonomic structure of the Pseudoviridae is not suggested by RT-coding sequence phylogenies. The Hemivirus and Pseudovirus are not monophyletic in either unrooted or rooted RT trees (fig. 1, and data not shown). These genera were originally classified on the basis of the primer used for reverse transcription: a cleaved half tRNA for the Hemiviruses or the 3' end of a full tRNA for the Pseudoviruses (Boeke et al. 2000a, 2000b). In the Metaviridae, genus distinctions are based on the presence (Errantivirus) or absence (Metavirus) of an env gene. The discovery of Pseudoviridae with envlike genes suggests a new classification scheme for this family. Throughout this article, we have referred to this lineage as the Agroviruses, and we propose that they represent a third genus of the Pseudoviridae. In addition to the envlike gene, it is clear that the Agroviruses are evolving independently from other members of the family: Agroviruses from both monocots and dicots are monophyletic, and all share an expanded gag gene. Ultimately, study of a functional Agrovirus will be required to understand the biological role for the dramatic coding sequence changes that characterize this distinct lineage of the Pseudoviridae.
Acknowledgements |
---|
Footnotes |
---|
Keywords: retrotransposon, Pseudoviridae, gag, pol, env
Address for correspondence and reprints: Daniel F. Voytas, Department of Zoology & Genetics, 2208 Molecular Biology Building, Iowa State University, Ames, Iowa 50011. voytas{at}iastate.edu
References |
---|
Altschul S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D. J. Lipman, 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Nucleic Acids Res 25:3389-3402
Atwood A., J. H. Lin, H. L. Levin, 1996 The retrotransposon Tf1 assembles virus-like particles that contain excess Gag relative to integrase because of a regulated degradation process Mol. Cell. Biol 16:338-346
Avedisov S. N., Y. V. Ilyin, 1994 Identification of spliced RNA species of Drosophila melanogaster gypsy retrotransposon. New evidence for retroviral nature of the gypsy element FEBS Lett 350:147-150
Bailey T. L., C. Elkan, 1994 Fitting a mixture model by expectation maximization to discover motifs in biopolymers Proc. Int. Conf. Intell. Syst. Mol. Biol 2:28-36
Boeke J. D., T. Eickbush, S. B. Sandmeyer, D. F. Voytas, 2000a. Metaviridae Pp. 359-367 in M. H. V. van Regenmortel, C. M. Fauquet, D. H. L. Bishop, E. B. Carsten, M. K. Estes, S. M. Lemon, J. Maniloff, M. A. Mayo, D. J. McGeoch, C. R. Pringle, and R. B. Wickner, eds. Virus taxonomy: seventh report of the international committee on taxonomy of viruses. Academic Press, New York
Boeke J. D., T. Eickbush, S. B. Sandmeyer, D. F. Voytas, 2000b. Pseudoviridae Pp. 349-357 in M. H. V. van Regenmortel, C. M. Fauquet, D. H. L. Bishop, E. B. Carsten, M. K. Estes, S. M. Lemon, J. Maniloff, M. A. Mayo, D. J. McGeoch, C. R. Pringle, and R. B. Wickner, eds. Virus taxonomy: seventh report of the international committee on taxonomy of viruses. Academic Press, New York
Boeke J. D., J. P. Stoye, 1997 Retrotransposons, endogenous retroviruses, and the evolution of retroelements Pp. 343-436 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
Bowen N. J., J. F. McDonald, 1999 Genomic analysis of Caenorhabditis elegans reveals ancient families of retroviral-like elements Genome Res 9:924-935
Bowen N. J., J. F. McDonald, 2001 Drosophila euchromatic LTR retrotransposons are much younger than the host species in which they reside Genome Res 11:1527-1540
Braiterman L. T., G. M. Monokian, D. J. Eichinger, S. L. Merbs, A. Gabriel, J. D. Boeke, 1994 In-frame linker insertion mutagenesis of yeast transposon Ty1: phenotypic analysis Gene 139:19-26
Brierley C., A. J. Flavell, 1990 The retrotransposon copia controls the relative levels of its gene products post-transcriptionally by differential expression from its two major mRNAs Nucleic Acids Res 18:2947-2951
Brill L. M., R. S. Nunn, T. W. Kahn, M. Yeager, R. N. Beachy, 2000 Recombinant tobacco mosaic virus movement protein is an RNA-binding, alpha-helical membrane protein Proc. Natl. Acad. Sci. USA 97:7112-7117
Brown P. O., 1997 Integration Pp. 161-204 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
Chen J. C., J. Krucinski, L. J. Miercke, J. S. Finer-Moore, A. H. Tang, A. D. Leavitt, R. M. Stroud, 2000 Crystal structure of the HIV-1 integrase catalytic core and C-terminal domains: a model for viral DNA binding Proc. Natl. Acad. Sci. USA 97:8233-8238
Colot V., J. L. Rossignol, 1999 Eukaryotic DNA methylation as an evolutionary device Bioessays 21:402-411
Cristofari G., D. Ficheux, J. L. Darlix, 2000 The GAG-like protein of the yeast Ty1 retrotransposon contains a nucleic acid chaperone domain analogous to retroviral nucleocapsid proteins J. Biol. Chem 275:19210-19217
Cuff J. A., G. J. Barton, 2000 Application of multiple sequence alignment profiles to improve protein secondary structure prediction Proteins 40:502-511
Eickbush T. H., 1994 Origin and evolutionary relationships of retroelements Pp. 121-157 in S. S. Morse, ed. The evolutionary biology of viruses. Raven Press, New York
Eijkelenboom A. P., R. Sprangers, K. Hard, R. A. Puras Lutzke, R. H. Plasterk, R. Boelens, R. Kaptein, 1999 Refined solution structure of the C-terminal DNA-binding domain of human immunovirus-1 integrase Proteins 36:556-564
Frame I. G., J. F. Cutfield, R. T. Poulter, 2001 New BEL-like LTR-retrotransposons in Fugu rubripes, Caenorhabditis elegans, and Drosophila melanogaster Gene 263:219-230
Gabriel A., E. H. Mules, 1999 Fidelity of retrotransposon replication Ann. NY Acad. Sci 870:108-118
Garfinkel D. J., A. M. Hedge, S. D. Youngren, T. D. Copeland, 1991 Proteolytic processing of pol-TYB proteins from the yeast retrotransposon Ty1 J. Virol 65:4573-4581
Gesteland R. F., J. F. Atkins, 1996 Recoding: dynamic reprogramming of translation Annu. Rev. Biochem 65:741-768
Gulnik S., J. W. Erickson, D. Xie, 2000 HIV protease: enzyme function and drug resistance Vitam. Horm 58:213-256
Haren L., B. Ton-Hoang, M. Chandler, 1999 Integrating DNA: transposases and retroviral integrases Annu. Rev. Microbiol 53:245-281
Henikoff S., J. G. Henikoff, 1992 Amino acid substitution matrices from protein blocks Proc. Natl. Acad. Sci. USA 89:10915-10919
Henikoff S., J. G. Henikoff, W. J. Alford, S. Pietrokovski, 1995 Automated construction and graphical presentation of protein blocks from unaligned sequences Gene 163:GC17-GC26
Hogue C. W., 1997 Cn3D: a new generation of three-dimensional molecular structure viewer Trends Biochem. Sci 22:314-316
Hunter E., J. Casey, B. Hahn, M. Hayami, B. Korber, R. Kurth, J. Neil, A. Rethwilm, P. Sonigo, J. Stoye, 2000 Retroviridae Pp. 369-387 in M. H. V. van Regenmortel, C. M. Fauquet, D. H. L. Bishop, E. B. Carsten, M. K. Estes, S. M. Lemon, J. Maniloff, M. A. Mayo, D. J. McGeoch, C. R. Pringle, and R. B. Wickner, eds. Virus taxonomy: seventh report of the international committee on taxonomy of viruses. Academic Press, New York
Irwin P. A., D. F. Voytas, 2001 Expression and processing of proteins encoded by the Saccharomyces retrotransposon Ty5 J. Virol 75:1790-1797
Jackson R. J., 2000 A comparative view of initiation site selection mechanisms Pp. 127-183 in N. Sonenberg, J. W. B. Hershey, and M. B. Mathews, eds. Translational control of gene expression. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
Jones D. T., 1999 Protein secondary structure prediction based on position-specific scoring matrices J. Mol. Biol 292:195-202
Jordan I. K., L. V. Matyunina, J. F. McDonald, 1999 Evidence for the recent horizontal transfer of long terminal repeat retrotransposon Proc. Natl. Acad. Sci. USA 96:12621-12625
Kapitonov V. V., J. Jurka, 1999 Molecular paleontology of transposable elements from Arabidopsis thaliana Genetica 107:27-37
Kay B. K., M. P. Williamson, M. Sudol, 2000 The importance of being proline: the interaction of proline-rich motifs in signaling proteins with their cognate domains FASEB J 14:231-241
Kenna M. A., C. B. Brachmann, S. E. Devine, J. D. Boeke, 1998 Invading the yeast nucleus: a nuclear localization signal at the C terminus of Ty1 integrase is required for transposition in vivo Mol. Cell. Biol 18:1115-1124
Kim A., C. Terzian, P. Santamaria, A. Pelisson, N. Prud'homme, A. Bucheton, 1994 Retroviruses in invertebrates: the gypsy retrotransposon is apparently an infectious retrovirus of Drosophila melanogaster Proc. Natl. Acad. Sci. USA 91:1285-1289
Kim J. M., S. Vanguri, J. D. Boeke, A. Gabriel, D. F. Voytas, 1998 Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence Genome Res 8:464-478
Kumar S., K. Tamura, I. B. Jakobsen, M. Nei, 2001 MEGA2: molecular evolutionary genetics analysis software, Arizona State University, Tempe, Ariz
Laten H. M., A. Majumdar, E. A. Gaucher, 1998 SIRE-1, a copia/Ty1-like retroelement from soybean, encodes a retroviral envelope-like protein Proc. Natl. Acad. Sci. USA 95:6897-6902
Lerat E., P. Capy, 1999 Retrotransposons and retroviruses: analysis of the envelope gene Mol. Biol. Evol 16:1198-1207
Malik H. S., T. H. Eickbush, 1999 Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons J. Virol 73:5186-5190
Malik H. S., T. H. Eickbush, 2001 Phylogenetic analysis of ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses Genome Res 11:1187-1197
Malik H., S. Henikoff, T. Eickbush, 2000 Poised for contagion: evolutionary origins of the infectious abilities of invertebrate retroviruses Genome Res 10:1307-1318
Manninen I., A. H. Schulman, 1993 BARE-1, a copia-like retroelement in barley (Hordeum vulgare L) Plant Mol. Biol 22:829-846
Marracci S., R. Batistoni, G. Pesole, L. Citti, I. Nardi, 1996 Gypsy/Ty3-like elements in the genome of the terrestrial salamander Hydromantes (Amphibia, Urodela) J. Mol. Evol 43:584-593
Marsano R. M., R. Moschetti, C. Caggese, C. Lanave, P. Barsanti, R. Caizzi, 2000 The complete Tirant transposable element in Drosophila melanogaster shows a structural relationship with retrovirus-like retrotransposons Gene 247:87-95
Martin-Rendon E., G. Marfany, S. Wilson, D. J. Ferguson, S. M. Kingsman, A. J. Kingsman, 1996 Structural determinants within the subunit protein of Ty1 virus-like particles Mol. Microbiol 22:667-679
McClure M. A., T. K. Vasi, W. M. Fitch, 1994 Comparative analysis of multiple protein-sequence alignment methods Mol. Biol. Evol 11:571-592
Melcher U., 2000 The 30K superfamily of viral movement proteins J. Gen. Virol. 81 Pt 1:257-266
Merkulov G. V., K. M. Swiderek, C. B. Brachmann, J. D. Boeke, 1996 A critical proteolytic cleavage site near the C-terminus of the yeast retrotransposon Ty1 Gag protein J. Virol 70:5548-5556
Monokian G. M., L. T. Braiterman, J. D. Boeke, 1994 In-frame linker insertion mutagenesis of yeast transposon Ty1: mutations, transposition and dominance Gene 139:9-18
Moore S. P., D. J. Garfinkel, 1994 Expression and partial purification of enzymatically active recombinant Ty1 integrase in Saccharomyces cerevisiae Proc. Natl. Acad. Sci. USA 91:1843-1847
Moore S. P., L. A. Rinckel, D. J. Garfinkel, 1998 A Ty1 integrase nuclear localization signal required for retrotransposition Mol. Cell. Biol 18:1105-1114
Nielsen A. L., M. Oulad-Abdelghani, J. A. Ortiz, E. Remboutsika, P. Chambon, R. Losson, 2001 Heterochromatin formation in mammalian cells: interaction between histones and HP1 proteins Mol. Cell 7:729-739
Orlinsky K. J., J. Gu, M. Hoyt, S. Sandmeyer, T. M. Menees, 1996 Mutations in the Ty3 major homology region affect multiple steps in Ty3 retrotransposition J. Virol 70:3440-3448
Peterson-Burch B. D., D. A. Wright, H. M. Laten, D. F. Voytas, 2000 Retroviruses in plants? Trends Genet 16:151-152
SanMiguel P., A. Tikhonov, Y. K. Jin, et al. (11 co-authors) 1996 Nested retrotransposons in the intergenic regions of the maize genome Science 274:765-768.
Schneider T. D., R. M. Stephens, 1990 Sequence logos: a new way to display consensus sequences Nucleic Acids Res 18:6097-6100
Song S. U., T. Gerasimova, M. Kurkulos, J. D. Boeke, V. G. Corces, 1994 An env-like protein encoded by a Drosophila retroelement: evidence that gypsy is an infectious retrovirus Genes Dev 8:2046-2057
Swanstrom R., J. W. Wills, 1997 Synthesis, assembly and processing of viral proteins Pp. 263-334 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
Swofford D., 1999 PAUP 4.0. 4.0 edition Smithsonian Institution, Washington D.C
Tatusova T. A., T. L. Madden, 1999 BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences FEMS Microbiol. Lett 174:247-250
Telesnitsky A., S. P. Goff, 1997 Reverse transcriptase and the generation of retroviral DNA Pp. 121-160 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
Terol J., M. C. Castillo, M. Bargues, M. Perez-Alonso, R. de Frutos, 2001 Structural and evolutionary analysis of the copia-like elements in the Arabidopsis thaliana genome Mol. Biol. Evol 18:882-892
Thompson J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, D. G. Higgins, 1997 The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools Nucleic Acids Res 25:4876-4882
Tusnady G. E., I. Simon, 1998 Principles governing amino acid composition of integral membrane proteins: application to topology prediction J. Mol. Biol 283:489-506
Vogt V. M., 1997 Retroviral virions and genomes Pp. 27-70 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
Voytas D. F., J. D. Boeke, 2002 Ty1 and Ty5 of Saccharomyces cerevisiae Pp. 631-662 in N. L. Craig, R. Craigie, M. Gellert, and A. M. Lambowitz, eds. Mobile DNA II. American Society for Microbiology, Washington, D.C.
Wolf E., P. S. Kim, B. Berger, 1997 MultiCoil: a program for predicting two- and three-stranded coiled coils Protein Sci 6:1179-1189
Wright D. A., D. F. Voytas, 1998 Potential retroviruses in plants: Tat1 is related to a group of Arabidopsis thaliana Ty3/gypsy retrotransposons that encode envelope-like proteins Genetics 149:703-715
Wright D. A., D. F. Voytas, 2002 Athila4 of Arabidopsis and Calypso of soybean define a lineage of endogenous plant retroviruses Genome Res 12:122-131
Xie W., X. Gai, Y. Zhu, D. C. Zappulla, R. Sternglanz, D. F. Voytas, 2001 Targeting of the yeast Ty5 retrotransposon to silent chromatin is mediated by interactions between integrase and Sir4p Mol. Cell. Biol 21:6606-6614
Xiong Y., T. H. Eickbush, 1988 Similarity of reverse transcriptase-like sequences of viruses, transposable elements, and mitochondrial introns Mol. Biol. Evol 5:675-690
Xiong Y., T. H. Eickbush, 1990 Origin and evolution of retroelements based upon their reverse transcriptase sequences EMBO J 9:3353-3362
Yoshioka K., H. Honma, M. Zushi, S. Kondo, S. Togashi, T. Miyake, T. Shiba, 1990 Virus-like particle formation of Drosophila copia through autocatalytic processing EMBO J 9:535-541