Adaptive Evolution in LINE-1 Retrotransposons

Stéphane Boissinot and Anthony V. Furano

Section on Genomic Structure and Function, Laboratory of Molecular and Cellular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland


    Abstract
We traced the sequence evolution of the active lineage of LINE-1 (L1) retrotransposons over the last ~25 Myr of human evolution. Five major families (L1PA5, L1PA4, L1PA3B, L1PA2, and L1PA1) of elements have succeeded each other as a single lineage. We found that part of the first open-reading frame (ORFI) had a higher rate of nonsynonymous (amino acid replacement) substitution than synonymous substitution during the evolution of the ancestral L1PA5 through the L1PA3B families. This segment encodes the coiled coil region of the protein-protein interaction domain of the ORFI protein (ORFIp). Statistical analysis of these changes indicates that positive selection had been acting on this region. In contrast, the coiled coil segment hardly changed during the evolution of the L1PA3B to the present L1PA1 family. Therefore, selective pressure on the coiled coil segment has changed over time. We suggest that the fast rate of amino acid replacement in the coiled coil segment reflects the adaptation of L1 either to a changing genomic environment or to host repression factors. In contrast, the second open-reading frame and the nucleic acid–binding domain of the first open-reading frame are extremely well conserved, attesting to the strong purifying selection acting on these regions.


    Introduction
L1 (LINE-1) non–long terminal repeat (LTR) retrotransposons (fig. 1A) constitute the major family of autonomously transposing elements in mammals (Smit 1999). The human genome contains ~500,000 L1 elements that account for 17% of its mass (Lander et al. 2001), attesting to the profound effect that L1 replication has had on mammalian genomes. L1 retrotransposition generates mostly defective copies, which remain in the genome and accumulate mutations at the pseudogene rate (Voliva et al. 1983; Hardies et al. 1986; Pascale et al. 1993). Novel replication-competent L1 variants are also produced, and can subsequently generate a family of several hundreds or thousands of copies that share the diagnostic features of their progenitor (or group of closely related progenitors) (reviewed in Furano 2000). This process is exemplified in humans, where five major families (L1PA5, L1PA4, L1PA3B, L1PA2, and L1PA1) have succeeded each other as a single lineage over the last 25 Myr in the primate ancestors of modern-day humans (fig. 1B) (Boissinot, Entezam, and Furano 2001).



Fig. 1. Differences among L1 families. (A) Structure of a human full-length element. A typical full-length element has a 5' untranslated region (5'UTR), two open-reading frames (ORFI and ORFII), and a 3'UTR (for a review, see Furano 2000). The 5'UTR has a regulatory function. ORFI encodes a 40-kDa protein with an N-terminal domain involved in protein-protein interaction (P-P) and a C-terminal domain with nucleic-acid binding and chaperone activity (N-P) (Martin, Li, and Weisz 2000; Martin and Bushman 2001). The N-terminal domain contains a potential coiled coil domain (C-C) and a leucine-zipper domain (Z). The C-terminal domain contains seven (1–7 in fig. 1) regions that are highly conserved among human, rat, mouse, and rabbit L1 (Hohjoh and Singer 1996). ORFII encodes endonuclease (EN) and reverse transcriptase (RT) domains and a Cys-rich region (C). The RT domain contains 10 (0–9) regions that are highly conserved among most non-LTR retrotransposons (Malik, Burke, and Eickbush 1999). In addition, the transcript of ORFII contains two regions (A and B) to which ORFIp is thought to bind (Hohjoh and Singer 1997). The 3'UTR contains a conserved G-rich motif (G) (Howell and Usdin 1997). (B) Maximum-likelihood tree built on the consensus sequences of the 3'UTR of six L1 families. The tree is rooted with L1PA6. The numbers above the nodes indicate the percentage of the 1,000 bootstrap replicates of the data in which the labeled node was present. (C) Nucleotide substitutions among L1 families, from L1PA5 to L1PA1. Each bar represents a substitution in the indicated pairwise comparison. Bars above the ORFs represent nonsynonymous substitutions; bars below the ORFs represent synonymous substitutions. Short bars represent substitutions located in noncoding regions.

 
About 70% of the full-length (potentially active) members of the ancestral L1PA5–L1PA2 families have apparently been cleared from the genome (Boissinot, Entezam, and Furano 2001). As this loss was more profound from recombining regions of the genome than from regions of low or no recombination (e.g., the Y chromosome), we suggested that the full-length elements have been deleterious enough to be subjected to purifying selection. This has not yet occurred for L1PA1, which is actively transposing in humans, generating new inserts that cause both neutral polymorphisms and genetic defects (reviewed in Kazazian and Moran 1998). Although these latter events would reduce the fitness of the host, we have proposed that a more global (dysgenic) effect of L1 activity accounted for the selection against the full-length L1 elements (Boissinot, Entezam, and Furano 2001).

Whatever the case, the fact that L1 activity is deleterious enough to be subject to purifying selection suggests that control of L1 transposition would be important for maintaining host fitness. Aside from evidence suggesting that the general transcriptional inhibition imposed by DNA methylation (Jones 1999) could also silence L1 transcription (Nur, Pascale, and Furano 1988; Thayer, Singer, and Fanning 1993; Hata and Sakaki 1997; Woodcock et al. 1997), neither the regulation of L1 replication nor any other possible role played by the host has been examined in detail. Host factors that specifically repress or reduce L1 activity would be highly advantageous. In turn, such factors would constitute a selective pressure on L1 to evade repression. Thus, L1 evolution may in part reflect interactions between the element and its host.

To identify regions of L1 that could be involved in host-L1 interactions, we examined the changes that occurred during the evolution of the active lineage of L1 elements from the ancestral L1PA5 family to the currently active L1PA1 family. In particular, we identified a region of the first open-reading frame (ORFI) that uniquely shows a high rate of nonsynonymous (amino acid replacement) substitutions, which is the typical signature of positive selection. The fact that this region of ORFI encodes a coiled coil domain that has been shown to mediate protein-protein interaction (Hohjoh and Singer 1996; Martin, Li, and Weisz 2000) suggests that the ORFI protein (ORFIp) could be involved in host-L1 interaction.


    Materials and Methods
Sequence Alignment
Full-length elements belonging to the five most recent human L1 families (L1PA5, L1PA4, L1PA3B, L1PA2, and L1PA1, following the nomenclature in Smit et al. 1995; Boissinot, Chevret, and Furano 2000; Boissinot, Entezam, and Furano 2001) were collected from the GenBank public database (table 1). These families encompass the last ~25 Myr of L1 evolution in humans. We limited our analysis to these five families because they are less likely to be interrupted by more recent internal insertions than older families. The selected full-length elements were part of our previous work in which their classification was confirmed by phylogenetic analysis (Boissinot, Entezam, and Furano 2001). L1 elements differ from each other by the family-specific mutations that they inherit from their progenitors, and by the mutations that they have accumulated at the neutral rate since insertion in the genome. As we were interested in the evolution of the active lineage, we focused only on the family-specific differences. We eliminated the mutations that arose after insertion by deriving a consensus sequence for each family. Mutations in the highly mutable CpG dinucleotides were eliminated from the consensus except when they corresponded to fixed differences among families. Alignment and consensus were created using the GCG program (Wisconsin Package, Version 10.0, Genetics Computer Group, Madison, Wis.). The alignment is available from the EMBL-ALIGN database (accession number ALIGN_000165).
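
The consensus step described above can be illustrated with a minimal sketch. It assumes a simple majority rule per alignment column and uses made-up toy sequences; it is not the GCG procedure used here, and it does not model the special handling of CpG dinucleotides. The function names `consensus` and `private_mutations` are hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Majority-rule consensus of equal-length, pre-aligned sequences.

    Assumption: the most frequent character in each column is taken as the
    family-specific (ancestral) state.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        out.append(counts.most_common(1)[0][0])
    return "".join(out)


def private_mutations(seq, cons):
    """List (1-based position, consensus base, copy base) where a copy
    departs from its family consensus, i.e. putative post-insertion changes."""
    return [
        (i + 1, c, s)
        for i, (c, s) in enumerate(zip(cons, seq.upper()))
        if c != s
    ]


# Toy family of four copies (hypothetical data).
copies = ["ACGTAC", "ACGTAC", "ATGTAC", "ACGTGC"]
family_consensus = consensus(copies)
```

Differences that remain between the consensus sequences of two families are then the family-specific differences; differences between a copy and its own consensus are treated as mutations acquired after insertion.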


Table 1. List of the full-length L1 elements. Coordinates of the elements were obtained using RepeatMasker, version 3.0 (A. F. A. Smit and P. Green, http://ftp.genome.washington.edu/RM/RepeatMasker.html), and correspond approximately to the beginning and the end of the elements.

 


 
Sequence Analysis
Phylogenetic reconstructions were performed using the maximum-likelihood (ML) method as implemented in the PAML program package (Yang 2000). Variations in the rate of evolution along a sequence can result from either selection or recombination. We used the program PLATO 2.0 (Grassly and Holmes 1997) to identify regions showing such variation. First, an ML tree based on the entire coding region of L1 was calculated with PAML (Yang 2000). Using the ML tree as the null hypothesis, PLATO employs a sliding window through which deviations from the ML-generated branch lengths are calculated, thereby identifying regions that differ in evolutionary rate from the complete coding sequence.
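
The sliding-window idea can be conveyed with a much cruder sketch than PLATO's likelihood-based statistic: here the window simply counts raw differences between two aligned sequences. The function name, window size, and toy sequences are assumptions made for illustration only.

```python
def sliding_window_differences(seq_a, seq_b, window, step=1):
    """Count mismatches between two aligned sequences in each window.

    Returns a list of (0-based window start, number of differences).
    This is a raw-count stand-in, not the likelihood calculation that
    PLATO performs against an ML tree.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if window < 1 or window > len(seq_a):
        raise ValueError("window must be between 1 and the alignment length")
    diffs = [a != b for a, b in zip(seq_a.upper(), seq_b.upper())]
    return [
        (start, sum(diffs[start:start + window]))
        for start in range(0, len(diffs) - window + 1, step)
    ]


# Toy alignment (hypothetical data) with a cluster of changes in the middle.
profile = sliding_window_differences("AAAAAAAAAA", "AAAATTTAAA", window=4)
hottest = max(profile, key=lambda item: item[1])
```

A window whose count stands out from the rest of the profile marks a candidate region of unusual rate; deciding whether it is statistically unusual requires a null model, which is what the ML tree provides in the analysis described above.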

ORFIp was analyzed for coiled coil domains using the program COILS (Lupas, Van Dyke, and Stock 1991) at http://www.ch.embnet.org/software/COILS_form.html. COILS compares a given sequence to a database of sequences that are known to form coiled coil structures and calculates the probability that the sequence of interest will adopt a coiled coil conformation.

Test for Selection
The effect of selection on a coding sequence can be estimated by comparing the synonymous (dS) and nonsynonymous (dN) substitution rates (for a review, see Yang and Bielawski 2000). The value of the ratio ω = dN/dS is an indicator of the type and strength of selection. If nonsynonymous mutations have no effect on fitness, they are fixed at the same rate as synonymous mutations and a value of ω = 1 is expected. If nonsynonymous mutations are deleterious, they are fixed at a lower rate than synonymous mutations (i.e., negative or purifying selection) and ω will be <1. If nonsynonymous mutations are advantageous, they are fixed faster than synonymous mutations (i.e., positive or adaptive selection) and ω will be >1. The parameter ω was estimated using the ML method of Goldman and Yang (1994). In this method, parameters of a model of codon substitution are estimated from the data by ML and are used to calculate dN and dS. To test if dN is significantly different from dS, ω was fixed at 1 in the null model (i.e., neutrality), whereas ω was estimated as a free parameter in the alternative model (Yang 1998). Twice the log-likelihood difference between the two models is compared with a χ² distribution with one degree of freedom to test whether ω is different from 1. All these calculations were performed using the codeml program of the PAML package (Yang 2000).
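
The likelihood ratio test just described can be written out as a short sketch. With log-likelihoods lnL0 (ω fixed at 1) and lnL1 (ω free), the statistic is 2(lnL1 − lnL0), compared with a χ² distribution with one degree of freedom. The log-likelihood values below are invented for illustration and are not taken from this study; in practice they would come from the two codeml runs. For one degree of freedom the χ² tail probability has the closed form erfc(sqrt(x/2)), so only the standard library is needed.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value), where the p-value is the upper tail of a
    chi-square distribution with 1 degree of freedom.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    # The alternative model nests the null, so a negative value can only be
    # numerical noise from optimisation; clamp it to zero.
    statistic = max(statistic, 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for the null (omega = 1) and alternative models.
stat, p = likelihood_ratio_test(lnl_null=-100.0, lnl_alt=-98.0795)
```

A small p-value only says that ω differs from 1; whether selection is positive or purifying is read from the estimated ω itself (>1 or <1).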


    Results
Consensus sequences for the L1PA5 (13 elements), L1PA4 (13), L1PA3B (14), L1PA2 (22), and L1PA1 (14) families were derived from the elements listed in table 1. We used the Ta-1d subset of L1PA1 to derive the L1PA1 consensus because it is the most recently evolved version of L1PA1 (Boissinot, Chevret, and Furano 2000). The alignment (see Materials and Methods) contains 282 nt substitutions and seven indels (table 2). Figure 1B shows that the five families have succeeded each other and that a single lineage of L1 families best describes the evolution of the active lineage of L1.


Table 2. List of the mutations differentiating the five L1 families. "C 123 A" means that an ancestral C at position 123 mutated to an A. Syn means a synonymous mutation; id means indel. The sequence coordinates are from the alignment of the consensus sequences of the L1PA5–L1PA1 families (see Materials and Methods).

 


 
Positive Selection on the Coiled Coil Domain of ORFI
The PLATO analysis identified a single region of ORFI that has undergone nucleotide substitutions at a statistically different rate than the complete coding sequence (Z = 9.65, P < 0.05, sliding window size = 5). Increasing the window size or modifying the parameters of the analysis identified the same approximate region and yielded similar Z values, all of which were statistically significant. This region starts 153 nt from the beginning of ORFI and is 217 nt long. It coincides almost perfectly with the coiled coil (C-C in fig. 1A) encoding region of the protein-protein interaction domain of ORFI. All but one of the 29 substitutions that differentiate L1PA5 from L1PA3B in this region are nonsynonymous (fig. 1C and table 2). This fast rate of amino acid replacement indicates that this region is evolving either under relaxed selection (i.e., neutrally) or under positive selection.
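
Tallying synonymous and nonsynonymous differences, as in the count above, amounts to translating each pair of aligned codons with the standard genetic code. The sketch below is a simplified illustration: it counts differing codons rather than individual substitutions, applies no correction for multiple hits, and is not the ML estimation of dN and dS used in this study. The function name and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons
    between two aligned, in-frame coding sequences without gaps."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Toy example: GTT->GTC keeps valine; ATG->AAG replaces methionine with lysine.
counts = classify_codon_differences("GTTATGAAA", "GTCAAGAAA")
```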

We distinguished between these possibilities by analyzing the nonsynonymous to synonymous rate ratio (ω, see Materials and Methods). The parameter ω was calculated independently for the coiled coil encoding region (from codon 51 to 148) and for the entire coding sequence (ORFI and ORFII), excluding the coiled coil domain (table 3). Because some of the pairwise comparisons in the coiled coil domain include very few if any synonymous substitutions, ML estimates of the parameter κ (the transition-transversion rate ratio for synonymous substitutions) were first obtained from the complete coding sequence. Values of κ for the entire L1 range from 1.8 to 3.1, and several values within this range were incorporated into the ML model to calculate ω. These analyses gave congruent results (only values with κ = 2.5 are shown in table 3). Pairwise comparisons among L1PA5, L1PA4, and L1PA3B give values for ω significantly higher than 1, indicating that nonsynonymous mutations have been fixed at a faster rate than synonymous mutations (i.e., faster than if they had been neutral). Thus, by these criteria, the coiled coil domain of ORFI has evolved under positive selection during the evolution of L1PA5 to L1PA3B. Although higher than 1, values of ω are lower when L1PA5 or L1PA4 are compared with L1PA2 or L1PA1 because of purifying selection acting during the evolution of L1PA3B to L1PA1 (see later). Purifying selection increases the relative number of synonymous substitutions between the older (L1PA5 or L1PA4) families and the younger (L1PA2 or L1PA1) families. As the number of nonsynonymous mutations between the older and younger families has hardly changed, the value of ω derived from these comparisons is lower.


Table 3. Maximum-likelihood estimates of ω, dN, and dS for the C-C domain (below diagonal) and for the non–C-C part of L1 (above diagonal).

 
In contrast to the fast rate of amino acid replacement from L1PA5 to L1PA3B, the coiled coil domain has remained almost unchanged from L1PA3B to L1PA1. Indeed, the coiled coil domains of L1PA2 and L1PA1 are identical although they differ in other parts of the sequence (fig. 1C and table 3). The ratio ω is lower than 1, although not significantly so. It is difficult to determine whether the conservation of recent L1 families is due to purifying selection, the absence of selection, or recombination that would have homogenized the sequence of different families. In any case, it seems that positive selection has not played a significant role in the evolution of the L1 families (i.e., L1PA2 and L1PA1) derived from the L1PA3B family. The alternation of a high (L1PA5 to L1PA3B) and a low (L1PA3B to L1PA1) amino acid replacement rate indicates that the nature of the selective pressure acting on the coiled coil domain has changed over time.

By comparison, the amino acid sequence outside the coiled coil domain has always been highly conserved (table 3, above diagonal); in all comparisons ω is significantly lower than 1. This low rate of amino acid replacement indicates that strong purifying selection has been acting on most regions of the L1 proteins. In ORFII, sequence conservation is not limited to the endonuclease (EN) and reverse transcriptase (RT) encoding domains (fig. 1C). The segments that separate EN, RT, and the 3' terminal region of ORFII are also very conserved, suggesting that these regions are functionally important, either because they encode some yet to be described function or because they play a role in the conformation of the ORFII protein.

Pattern of Amino Acid Replacements in the Coiled Coil Domain
Coiled coil structures are formed by the intertwining of two or more α-helical peptide chains that have a repeating arrangement of nonpolar side chains (reviewed in Lupas 1996). Typically, domains that can form coiled coil structures consist of seven-residue repeats (heptads), with nonpolar or hydrophobic residues in the first (a) and fourth (d) positions of the heptad (fig. 2). The coiled coil domain of ORFIp ranges from amino acid 52 to 131 and consists of a first group of four or five heptads (depending on the family) separated from a group of six heptads by three amino acids (fig. 2). The COILS program indicates a 90%–100% probability that these heptads will adopt a coiled coil conformation (Lupas, Van Dyke, and Stock 1991).
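
The heptad rule can be made concrete with a small sketch that assigns positions a to g along a segment and reports how many (a) and (d) positions carry a nonpolar residue. This is only a bookkeeping illustration, not the COILS scoring method. The set of nonpolar residues is a common textbook grouping adopted here as an assumption (it may not match the classification in Li 1997 exactly), and the peptide is invented.

```python
HEPTAD = "abcdefg"
# Assumed nonpolar set (one-letter codes); a common textbook grouping.
NONPOLAR = set("GAVLIMFWP")


def heptad_positions(segment, start_register="a"):
    """Pair each residue with its heptad position, given the register
    of the first residue."""
    offset = HEPTAD.index(start_register)
    return [
        (residue, HEPTAD[(offset + i) % 7])
        for i, residue in enumerate(segment.upper())
    ]


def core_nonpolar_fraction(segment, start_register="a"):
    """Fraction of residues at positions a and d that are nonpolar."""
    core = [
        residue
        for residue, position in heptad_positions(segment, start_register)
        if position in ("a", "d")
    ]
    if not core:
        return 0.0
    return sum(residue in NONPOLAR for residue in core) / len(core)


# Hypothetical two-heptad peptide with L/V at every a and d position.
toy_peptide = "LKELEDKVEELLSK"
fraction = core_nonpolar_fraction(toy_peptide)
```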



Fig. 2. Structure of the coiled coil domain of L1PA5 and L1PA1. Heptads are numbered I to XI and amino acid positions within each heptad are indicated a to g. Only amino acids at positions a and d (in bold) and amino acids that differ between L1PA5 and L1PA1 are shown. Conservative changes for polarity (see text) are represented by + and radical changes by -.

 
Because the probability that a sequence will form a coiled coil structure depends on the position of hydrophobic or nonpolar residues, a change in the polarity of an amino acid can affect the conformation of the protein. Figure 2 shows the distribution of the amino acid replacements that occurred between L1PA5 and L1PA1. Conservative changes (indicated by a + in fig. 2) are substitutions between two polar or two nonpolar amino acids (following the classification in Li 1997, pp. 13–17). The replacement of nonpolar amino acids by C or Y (usually considered polar) at position (a) or (d) of the heptad is also considered conservative because these two amino acids are found preferentially at positions (a) or (d) of known coiled coils (Lupas, Van Dyke, and Stock 1991). Of the 28 amino acid replacements that differentiate L1PA5 from L1PA1, 26 are conservative and therefore not likely to affect the potential coiled coil conformation of the protein (fig. 2). Only two amino acid substitutions, V83T and M96K, are not conservative with regard to polarity. Substitution M96K is at position (g) of heptad VI, which is not as critical to the coiled coil conformation as positions (a) and (d). On the other hand, V83T is a radical change at position (d) of heptad V, and this change does disrupt heptad V in L1PA1 (fig. 2). Indeed, T is very rarely found at position (d) of any known coiled coil (Lupas, Van Dyke, and Stock 1991), and substitution V83T could affect the coiled coil conformation of the protein. However, probabilities of forming a coiled coil (as calculated by COILS, with a window of 21 or 28) are not significantly affected by any of the 28 amino acid changes, including these two radical changes.
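
The conservative/radical rule used above can be summarized as a short sketch. The nonpolar set is an assumed textbook grouping standing in for the classification of Li (1997), and the function name is hypothetical; the C/Y exception at positions (a) and (d) follows the rule stated in the text.

```python
# Assumed nonpolar set (one-letter codes); all other residues count as polar.
NONPOLAR = set("GAVLIMFWP")
# Polar residues tolerated at core positions of known coiled coils (see text).
CORE_EXCEPTIONS = set("CY")


def classify_replacement(old, new, heptad_position):
    """Classify an amino acid replacement as 'conservative' or 'radical'
    with respect to polarity, given its heptad position (a to g)."""
    old_nonpolar = old in NONPOLAR
    new_nonpolar = new in NONPOLAR
    if old_nonpolar == new_nonpolar:
        return "conservative"
    # Nonpolar residue replaced by C or Y at a core position.
    if old_nonpolar and new in CORE_EXCEPTIONS and heptad_position in ("a", "d"):
        return "conservative"
    return "radical"


# The two radical replacements reported in the text.
v83t = classify_replacement("V", "T", "d")
m96k = classify_replacement("M", "K", "g")
```

Under these assumptions both V83T and M96K come out as radical, in line with the text, whereas a hypothetical L to Y change at position (a) would count as conservative.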


    Discussion
Our results demonstrate that the coiled coil domain of ORFI has been subjected to an episode of intense positive selection (L1PA5 → L1PA4 → L1PA3B) resulting in a high rate of amino acid replacement. Since then (L1PA3B → L1PA2 → L1PA1), the coiled coil domain has been highly conserved. Thus, either the strength or the nature of the selective pressure on this region has changed over time. As the major amplification of L1PA2 occurred only in the African apes (subfamily Homininae, i.e., gorillas, chimpanzees, and humans; unpublished data), the change in selective pressure on ORFI occurred after the divergence between Asian (e.g., orangutan) and African apes. Interestingly, the same region of ORFI is hypervariable in murine rodents and galagos (prosimians), as indicated by a high rate of amino acid replacement, duplications, and deletions (Kolosha and Martin 1995; Cabot et al. 1997; Mayorov, Rogozin, and Adkison 1999; Furano 2000). However, whether these changes are the result of positive selection is yet to be determined (Mayorov, Rogozin, and Adkison 1999; Furano 2000).

Because many aspects of L1 biology are still unknown, we can only speculate about the possible causes of positive selection. As the 5'UTR (untranslated region) of L1 evolves at a very high rate, including its wholesale replacement (Adey et al. 1994; Furano 2000), adaptive changes in ORFI could be a response to changes in the 5'UTR. Although the number of base changes between the 5'UTR of L1PA4 and L1PA3B and between L1PA2 and L1PA1 are roughly the same, ORFI has evolved under positive selection between L1PA4 and L1PA3B but remained very conserved between L1PA2 and L1PA1. Thus, positive selection in ORFI is not correlated with the global rate of base substitution in the 5'UTR. Although changes in the amino acid sequence of ORFI may be a response to particular sequence changes in the 5'UTR, the selective pressure on ORFI may also lie elsewhere.

Most of the genes for which positive selection has been documented are involved in interactions between the organism and its environment (see Yang and Bielawski 2000). By analogy, we propose that positive selection in L1 may reflect an interaction between the L1 element (the organism) and the host (its environment). For instance, the rapid evolution of the coiled coil domain could have been driven by L1 adaptation to a host factor required by L1 for replication. Rapid evolution of the putative host factor might have occurred for a number of reasons, including avoidance of recruitment by L1. Alternatively, rapid evolution of the coiled coil domain could have resulted from the evasion by L1 of a host-encoded repressor of L1 replication. This would be similar to positive selection in pathogen genes that evade a host's immune system (Zanotto et al. 1999; Haydon et al. 2001).

In both cases, the alternation between periods of positive and purifying selection on ORFI can be correlated with changes in L1 activity. In rodents (Pascale, Valle, and Furano 1990) and primates (unpublished data), L1 activity (amplification) is episodic, and therefore its deleterious effect on the host changes over time. Possibly, very active (deleterious) families would induce a strong response by the host, leading to intense positive selection for both the host and the element. Conversely, families that generate just enough copies to persist in the genome, but not enough to cause serious damage, would probably be ignored by the host, and the action of positive selection would be very limited.

Figure 2 shows that positive selection in ORFI resulted in substitutions among amino acids that share similar physicochemical properties. Therefore, the effects of positive selection on the coiled coil domain have been limited by structural constraints, i.e., the ability to form a coiled coil structure. This suggests that the potential to form a coiled coil structure is an important functional feature of ORFIp. This conclusion is supported by the fact that, although the N-terminal one-third of ORFI shows no sequence homology among murine rodents (Old World rats and mice), rabbits, galagos, and humans (Kolosha and Martin 1997), all possess the potential to form coiled coil structures (data not shown; Martin, Li, and Weisz 2000). The ability of ORFIp to form a coiled coil structure is also shared by nonmammalian L1-like elements, such as the Xenopus Tx1L (cited in Pont-Kingdon et al. 1997), the teleost Swimmer (Duvernell and Turner 1998), and the bird CR1 elements (Haas et al. 1997, unpublished data). Coiled coils often mediate protein-protein interactions with themselves or other proteins. In mouse and human L1, the coiled coil domain mediates ORFIp binding to itself (Hohjoh and Singer 1996; Martin, Li, and Weisz 2000), but the possibility of interactions with other proteins has not been explored. The ORFIps of two divergent mouse L1 families (Tf and L1MdA) readily interact (Martin, Li, and Weisz 2000), suggesting that conservative changes in the coiled coil domain would not significantly affect ORFIp interaction with itself. Thus, interactions between ORFIp and other proteins could well be responsible for positive selection on ORFI.


    Footnotes
 
Thomas Eickbush, Reviewing Editor

Keywords: L1/LINE-1 human retrotransposon positive selection

Address for correspondence and reprints: Anthony V. Furano, NIH, Building 8, Room 203, 8 Center Dr., MSC 0830, Bethesda, Maryland 20892-0830. avf@helix.nih.gov


    References

    Adey N. B., S. A. Schichman, D. K. Graham, S. N. Peterson, M. H. Edgell, C. A. Hutchison III, 1994. Rodent L1 evolution has been driven by a single dominant lineage that has repeatedly acquired new transcriptional regulatory sequences. Mol. Biol. Evol. 11:778-789

    Boissinot S., P. Chevret, A. V. Furano, 2000. L1 (LINE-1) retrotransposon evolution and amplification in recent human history. Mol. Biol. Evol. 17:915-928

    Boissinot S., A. Entezam, A. V. Furano, 2001. Selection against deleterious LINE-1-containing loci in the human lineage. Mol. Biol. Evol. 18:926-935

    Cabot E. L., B. Angeletti, K. Usdin, A. V. Furano, 1997. Rapid evolution of a young L1 (LINE-1) clade in recently speciated Rattus taxa. J. Mol. Evol. 45:412-423

    Duvernell D. D., B. J. Turner, 1998. Swimmer 1, a new low-copy-number LINE family in teleost genomes with sequence similarity to mammalian L1. Mol. Biol. Evol. 15:1791-1793

    Furano A. V., 2000. The biological properties and evolutionary dynamics of mammalian LINE-1 retrotransposons. Prog. Nucleic Acid Res. Mol. Biol. 64:255-294

    Goldman N., Z. Yang, 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736

    Grassly N. C., E. C. Holmes, 1997. A likelihood method for the detection of selection and recombination using sequence data. Mol. Biol. Evol. 14:239-247

    Haas N. B., J. M. Grabowski, A. B. Sivitz, J. B. Burch, 1997. Chicken repeat 1 (CR1) elements, which define an ancient family of vertebrate non-LTR retrotransposons, contain two closely spaced open reading frames. Gene 197:305-309

    Hardies S. C., S. L. Martin, C. F. Voliva, C. A. Hutchison III, M. H. Edgell, 1986. An analysis of replacement and synonymous changes in the rodent L1 repeat family. Mol. Biol. Evol. 3:109-125

    Hata K., Y. Sakaki, 1997. Identification of critical CpG sites for repression of L1 transcription by DNA methylation. Gene 189:227-234

    Haydon D. T., A. D. Bastos, N. J. Knowles, A. R. Samuel, 2001. Evidence for positive selection in foot-and-mouth disease virus capsid genes from field isolates. Genetics 157:7-15

    Hohjoh H., M. Singer, 1997. Sequence specific single-strand RNA-binding protein encoded by the human LINE-1 retrotransposon. EMBO J. 16:6034-6043

    Hohjoh H., M. F. Singer, 1996. Cytoplasmic ribonucleoprotein complexes containing human LINE-1 protein and RNA. EMBO J. 15:630-639

    Howell R., K. P. Usdin, 1997. The ability to form intrastrand tetraplexes is an evolutionarily conserved feature of the 3' end of L1 retrotransposons. Mol. Biol. Evol. 14:144-155

    Jones P. A., 1999. The DNA methylation paradox. Trends Genet. 15:34-37

    Kazazian H. H. Jr., J. V. Moran, 1998. The impact of L1 retrotransposons on the human genome. Nat. Genet. 19:19-24

    Kolosha V. O., S. L. Martin, 1995. Polymorphic sequences encoding the first open reading frame protein from LINE-1 ribonucleoprotein particles. J. Biol. Chem. 270:2868-2873

    Kolosha V. O., S. L. Martin, 1997. In vitro properties of the first ORF protein from mouse LINE-1 support its role in ribonucleoprotein particle formation during retrotransposition. Proc. Natl. Acad. Sci. USA 94:10155-10160

    Lander E. S., L. M. Linton, B. Birren, et al. (100 co-authors), 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921

    Li W.-H., 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.

    Lupas A., 1996. Coiled coils: new structures and new functions. Trends Biochem. Sci. 21:375-382

    Lupas A., M. Van Dyke, J. Stock, 1991. Predicting coiled coils from protein sequences. Science 252:1162-1164

    Malik H. S., W. D. Burke, T. H. Eickbush, 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16:793-805

    Martin S. L., F. D. Bushman, 2001. Nucleic acid chaperone activity of the ORF1 protein from the mouse LINE-1 retrotransposon. Mol. Cell. Biol. 21:467-475

    Martin S. L., J. Li, J. A. Weisz, 2000. Deletion analysis defines distinct functional domains for protein-protein and nucleic acid interactions in the ORF1 protein of mouse LINE-1. J. Mol. Biol. 304:11-20

    Mayorov V. I., I. B. Rogozin, L. R. Adkison, 1999. Characterization of several LINE-1 elements in Microtus kirgisorum. Mamm. Genome 10:724-729

    Nur I., E. Pascale, A. V. Furano, 1988. The left end of rat L1 (L1Rn, long interspersed repeated) DNA which is a CpG island can function as a promoter. Nucleic Acids Res. 16:9233-9251

    Pascale E., C. Liu, E. Valle, K. Usdin, A. V. Furano, 1993. The evolution of long interspersed repeated DNA (L1, LINE 1) as revealed by the analysis of an ancient rodent L1 DNA family. J. Mol. Evol. 36:9-20

    Pascale E., E. Valle, A. V. Furano, 1990. Amplification of an ancestral mammalian L1 family of long interspersed repeated DNA occurred just before the murine radiation. Proc. Natl. Acad. Sci. USA 87:9481-9485

    Pont-Kingdon G., E. Chi, S. Christensen, D. Carroll, 1997. Ribonucleoprotein formation by the ORF1 protein of the non-LTR retrotransposon Tx1L in Xenopus oocytes. Nucleic Acids Res. 25:3088-3094

    Smit A. F. A., 1999. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 9:657-663

    Smit A. F. A., G. Tóth, A. D. Riggs, J. Jurka, 1995. Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. J. Mol. Biol. 246:401-417

    Thayer R. E., M. F. Singer, T. G. Fanning, 1993. Undermethylation of specific LINE-1 sequences in human cells producing a LINE-1-encoded protein. Gene 133:273-277

    Voliva C. F., C. L. Jahn, M. B. Comer, C. A. Hutchison III, M. H. Edgell, 1983. The L1Md long interspersed repeat family in the mouse: almost all examples are truncated at one end. Nucleic Acids Res. 11:8847-8859

    Woodcock D. M., C. B. Lawler, M. E. Linsenmeyer, J. P. Doherty, W. D. Warren, 1997. Asymmetric methylation in the hypermethylated CpG promoter region of the human L1 retrotransposon. J. Biol. Chem. 272:7810-7816

    Yang Z., 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568-573

    Yang Z., 2000. PAML (phylogenetic analysis by maximum-likelihood), version 3.0. University College, London

    Yang Z., J. P. Bielawski, 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15:496-503

    Zanotto A. M. d. A., E. G. Kallas, R. F. de Souza, E. C. Holmes, 1999. Genealogical evidence for positive selection in the nef gene of HIV-1. Genetics 153:1077-1089

Accepted for publication August 3, 2001.