Department of Biochemistry, University of Texas Health Science Center at San Antonio
Abstract
Recently, a rapidly amplifying family of mouse LINE-1 (L1) has been identified and named TF. The evolutionary context surrounding the derivation of the TF family was examined through phylogenetic analysis of sequences in the 3' portion of the repeat. The Mus musculus domesticus TF family was found to be the terminal subfamily of the previously identified L1Md4 lineage. The L1Md4 lineage joins the other prototypical mouse LINE-1 lineage (the L1MdA2 lineage) approximately 1 MYA at about the time of the common ancestor of M. m. domesticus, Mus spicilegus, and Mus spretus. However, the TF family from M. m. domesticus was found to join to the previously reported M. spretus Ms475 and Ms7024 LINE-1 families at just 0.5 MYA, indicating horizontal transfer. The TF family from M. m. domesticus was then found to be even more recently related to LINE-1's from another species, M. spicilegus. A separate spretus A2 lineage was found through a directed search of a PCR library. This lineage, in contrast to the spretus TF lineage, does join domesticus at about 1 MYA, as would be expected in the absence of horizontal transfer. A third major family was also found that splits off from the L1Md4 lineage shortly after its departure from the L1MdA2 lineage. The new family, named the Z family, was found to contain the de novo LINE-1 inserts causing the beige and med mutations. Whether the split with the Z family was before or after the recombination that introduced the F-type promoters and defined the inception of TF as a lineage is unclear. In enumerating copies of the various LINE-1 families, we found that TF 3' ends were not much more numerous than the reported number of 5' ends, suggesting that TF may not be subjected to the 90% truncation pattern typical of LINE-1 as a whole.
Introduction
LINE-1 is a retrotransposon that has dispersed over 100,000 copies into the mammalian genome. Most of these copies are defective due to truncation or the accumulation of mutations over time (for review, see Edgell et al. 1987; Hutchison et al. 1989). New insertions sometimes disrupt genes and cause genetic disease, and recent progress in the study of LINE-1 has centered on using new inserts to track down currently active LINE-1 loci (for review, see Kazazian and Moran 1998).
In the mouse, a recently amplified LINE-1 family, named TF, has been characterized (DeBerardinis et al. 1998; Naas et al. 1998; Saxton and Martin 1998). The TF family is defined by its recombinant structure, in which the 5' portion is derived from some earlier family with an F-type promoter, and the 3' portion is more closely related to the prototypical A-type L1MdA2 sequence. There are about 4,800 copies per diploid mouse genome of this variety of F-type promoter in Mus domesticus, and there are also copies in Mus spretus and Mus spicilegus. Sequenced members of the M. domesticus TF family are diverged by only 0.2% from their consensus, indicating that the family began amplifying only about 100,000–200,000 years ago in this species. TF inserts are responsible for 2 of 160 mutant mouse alleles characterized at the molecular level (Kazazian and Moran 1998). A majority of the full-length TF family members that have been tested for transposition after introduction into cultured cells have been found to be active (DeBerardinis et al. 1998), and the majority of their promoters are active in in vitro transcription assays (DeBerardinis and Kazazian 1999).
LINE-1 sequences can be organized into a number of families, each, like TF, distinguished by several sequence variants shared by its members and commonly represented by a consensus sequence. We further organized the sequences into lineages using phylogenetic methods. This approach clarifies whether families are ancestral to one another or are both derived from a common ancestor, and it also allows estimation of the time of origin and the duration of the amplification leading to each family. Specifically, it has previously been reported that the youngest of the domesticus LINE-1's belong to two lineages that split about 1 MYA (Martin et al. 1985; Hardies et al. 1986; Rikke, Garvin, and Hardies 1991). These are referred to here as the L1MdA2 and the L1Md4 lineages after their prototypical members. Two LINE-1 lineages in the closely related mouse species M. spretus have been characterized and named the Ms475 and Ms7024 lineages after definitive sequence variants (Casavant and Hardies 1994b). Exactly how these spretus LINE-1 lineages join to the domesticus L1MdA2 and L1Md4 lineages was not resolved in previous studies (Rikke, Garvin, and Hardies 1991; Casavant and Hardies 1994a).
There were a number of issues that we wanted to clarify about how the TF family fits within the overall phylogenetic organization of mouse LINE-1. First, it became immediately apparent, and will be documented below, that L1Md4 is a member of the TF family. Because we were aware of other sequences in GenBank that were most closely related to L1Md4, we asked whether they also fit within the TF amplification 100,000–200,000 years ago or whether TF is a subset of a larger L1Md4 lineage. Second, DeBerardinis et al. (1998) reported the size of the TF family as 2,400 full-length copies per haploid genome. This number, adjusted for the 90% average truncation of LINE-1 (Hutchison et al. 1989), is much larger than we would have expected from the few copies of the L1Md4 lineage that we knew of in GenBank. We therefore made thorough searches of both GenBank and a clone library to identify the source of this discrepancy.
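The truncation adjustment is a simple proportion. Because LINE-1 inserts are truncated at their 5' ends, every insert keeps its 3' end, but only the untruncated fraction counts as a full-length copy. The sketch below is minimal and assumes that the 90% average truncation for LINE-1 also applies to TF, which is the very assumption this study goes on to test. The function name is hypothetical.

```python
def implied_three_prime_ends(full_length_copies: int, truncated_percent: int) -> int:
    """Number of 3' ends implied by a count of full-length copies.

    Every insert, truncated or not, retains its 3' end. If
    `truncated_percent` of inserts are 5'-truncated, the full-length
    copies are only (100 - truncated_percent)% of all inserts.
    Integer arithmetic avoids floating-point rounding.
    """
    if not 0 <= truncated_percent < 100:
        raise ValueError("truncated_percent must be in [0, 100)")
    return full_length_copies * 100 // (100 - truncated_percent)


# 2,400 full-length TF copies per haploid genome (DeBerardinis et al. 1998)
print(implied_three_prime_ends(2400, 90))  # 24000
```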
Third, we were aware of some evidence that the spretus Ms475 and Ms7024 families were more closely related to L1Md4 than to other domesticus LINE-1's. We sought to clarify two points: whether the known spretus families also had the recombinant TF promoter, and whether a separate family of young spretus LINE-1's existed with A-type promoters. We also examined whether the spretus Ms475 and Ms7024/domesticus TF relationship was close enough to indicate that the family had passed from one species to another by horizontal transfer. Finally, DeBerardinis et al. (1998) reported that TF promoter-bearing sequences were all highly similar to one another. This implies that the ancestral sequences leading to TF did not replicate extensively in domesticus until recently. We asked whether it was generally true of the L1Md4 lineage that ancestrally derived pseudogenes were lacking in the domesticus genome. Furthermore, we asked whether this phenomenon might be explained by recent introduction of the TF family progenitor into M. domesticus from another species of mouse.
Materials and Methods
DNA Sequences
Two new sequences including TF promoter regions from M. spicilegus have been deposited in GenBank with accession numbers AF192504 and AF192505. Table 1 lists information to identify the other LINE-1 DNA sequences named in this paper in GenBank. It also gives the corresponding coordinates in the standard numbering system of L1MdA2 (Loeb et al. 1986) and the associated GenBank entry M13002. All coordinates cited in the text correspond to the latter numbering system. The new L1EL2 sequence was obtained from strain SPRET/Ei.
The primary method of estimating the age of the domesticus/spretus TF split used substitutions specific to individual pseudogenes, as indicated in the text. Details of a second method, which uses shared variants of the TF lineage, are given below. The use of shared variants requires a number of corrections, as follows. Active LINE-1 lineages are affected by conservative selective pressure on the coding sequences. This pressure is roughly compensated by a source of extra substitutions thought to be reverse transcription errors (Hardies et al. 1986). Thus, for coding sequence, the 1%/Myr pseudogene rate constant can be used. In the 3' untranslated region (UTR), there is no selective pressure to compensate for the reverse transcription errors, so a higher rate constant prevails. The best available estimate is 2.6 times the neutral rate (Rikke 1992). A separate estimate of the age was therefore made for the coding sequence and for the 3' UTR. Neighbor-joining (fig. 1A) found a set of differences separating TF and Ms475. Of these, the 6 distinguishing TF1 and TF2 are attributed to a gene conversion and are therefore not included. The remaining 25 (10.25 + 14.75) are divided into 8 in the coding sequence (positions 5887–6700; last 814 bp of ORF2) and 17 in the 3' UTR (positions 6700–7362; complete 3' UTR).
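The bookkeeping described above sets aside differences attributed to gene conversion and bins the rest by region. It can be sketched as follows. The region boundaries come from the text. The substitution positions are invented for illustration, and the function names are hypothetical. The fractional totals quoted above (10.25 + 14.75) appear to come from the tree's branch lengths and are not modeled here.

```python
# Region boundaries in L1MdA2 (M13002) coordinates, inclusive.
# 6701 is used as the first 3' UTR position so that 6700 is not counted twice.
REGIONS = {"coding": (5887, 6700), "utr": (6701, 7362)}


def in_range(pos: int, span: tuple) -> bool:
    return span[0] <= pos <= span[1]


def partition_substitutions(positions, excluded_spans=()):
    """Count substitutions per region, skipping any inside an excluded span."""
    counts = {name: 0 for name in REGIONS}
    for pos in positions:
        if any(in_range(pos, span) for span in excluded_spans):
            continue  # attributed to gene conversion, not a point substitution
        for name, span in REGIONS.items():
            if in_range(pos, span):
                counts[name] += 1
                break
    return counts


# Hypothetical positions for illustration only (not the real TF/Ms475 differences)
example = [5900, 6120, 6845, 6913, 6955, 7050]
print(partition_substitutions(example, excluded_spans=[(6845, 6955)]))
# {'coding': 2, 'utr': 1}
```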
Estimation from the 3' UTR is complicated by two recognized gene conversions at positions 6845–6955 and 7228–7309. Omitting those subranges and all substitutions found within them reduces the total to 5 substitutions in 471 bp (1.1%). Taking the 2.6-fold acceleration into account yields a divergence of 0.42 Myr, to which the 0.2-Myr age of the TF consensus should be added. Dividing by 2 gives an estimate of 0.31 MYA for the domesticus/spretus TF split from the 3' untranslated sequence.
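The arithmetic of the 3' UTR estimate can be reproduced with a short function. Our reading of why 0.2 Myr is added before halving is as follows. The TF consensus stands for a node about 0.2 Myr old, while the Ms475 sequence is present-day. The measured path from the split is therefore (T - 0.2) + T = 2T - 0.2, so T = (path + 0.2) / 2. This interpretation, and the function name, are assumptions. The text does not spell them out.

```python
def split_age_mya(divergence_pct: float, rate_multiplier: float,
                  consensus_age_myr: float = 0.2,
                  base_rate_pct_per_myr: float = 1.0) -> float:
    """Estimate the time (MYA) of the split between two lineages.

    Assumes the divergence is measured between the TF consensus, which
    represents a node about `consensus_age_myr` old, and a present-day
    sequence, so the total path length is 2*T - consensus_age_myr.
    """
    # Convert percent divergence to Myr using the (accelerated) rate constant
    path_myr = divergence_pct / (base_rate_pct_per_myr * rate_multiplier)
    return (path_myr + consensus_age_myr) / 2.0


# 3' UTR: 5 substitutions in 471 bp, 2.6-fold accelerated rate
print(round(split_age_mya(1.1, 2.6), 2))            # 0.31 (1.1% as rounded in the text)
print(round(split_age_mya(100 * 5 / 471, 2.6), 2))  # 0.3 (unrounded divergence)
```

The small difference between the two printed values comes only from rounding 5/471 to 1.1% before dividing.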
LINE-1 Libraries
A library (SPLW31) was constructed in the vector pBluescript II SK+. SPRET/Ei genomic DNA was first digested with the restriction enzymes EcoRI and SacI, and the repetitive LINE-1 fragment was excised from an agarose gel. The inserts covered LINE-1 positions 6110–7152. Three of the 49 LINE-1-containing clones were 6704C+. Sequences SHA and SFA were from this source. Other experiments were conducted with libraries generated by PCR, as follows.
PCR amplification of genomic DNA was conducted with LINE-1 primers bracketing LINE-1 positions 6112–7111. The primer sequences were 5'-GGCTCAGAACTGAACAAAGA-3' and 5'-GCTCATAATGTTGTTCCACCT-3'. Amplification was conducted with 100 ng of mouse DNA (either SPRET/Ei or DMZ) for 12 cycles with an annealing temperature of 60°C. We chose this small number of cycles after preliminary experiments showed that the LINE-1 product was still increasing in a log-linear fashion during these cycles. The aim was to reduce the potential for heteroduplex formation prior to cloning.
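As a rough check of the primers against the 60°C annealing temperature, the sketch below computes GC content and a Wallace-rule melting temperature (2°C per A/T plus 4°C per G/C). The Wallace rule is only a crude rule of thumb for short oligonucleotides. It is not stated to be how these primers were designed. The amplicon length assumes that the coordinates 6112 and 7111 are inclusive of the primers.

```python
def gc_count(primer: str) -> int:
    """Number of G or C bases in a primer."""
    return sum(base in "GC" for base in primer.upper())


def wallace_tm(primer: str) -> int:
    """Wallace rule: 2 degC per A/T + 4 degC per G/C (rough, short oligos only)."""
    at = sum(base in "AT" for base in primer.upper())
    return 2 * at + 4 * gc_count(primer)


PRIMER_1 = "GGCTCAGAACTGAACAAAGA"
PRIMER_2 = "GCTCATAATGTTGTTCCACCT"
AMPLICON_BP = 7111 - 6112 + 1  # assumes inclusive coordinates

for name, seq in (("primer 1", PRIMER_1), ("primer 2", PRIMER_2)):
    gc_fraction = gc_count(seq) / len(seq)
    print(name, len(seq), f"{gc_fraction:.0%}", wallace_tm(seq))
print("amplicon", AMPLICON_BP)
```

Both primers come out at 9 G/C bases, with Wallace-rule estimates of 58°C and 60°C. This is consistent with the stated annealing temperature.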
The PCR products were cloned into the strain TOP10F' using the Invitrogen Topo TA cloning kit. The transformants were plated at low density and transferred to 96-well plates for amplification and storage in glycerol-based storage medium. Bacteria were stamped onto Hybond Nylon filters and subjected to amplification by a method adapted from Woo (1979): lysis and denaturation in 0.5 M NaOH/1 M NaCl, neutralization, baking, and a debris wash in 5× SSC/0.5% SDS/1 mM EDTA before hybridization. Probes were either end-labeled with 32P or, in the case of a generic LINE-1 probe, labeled by random hexamer priming. The first probe used was a fragment of LINE-1. It identified a few wells that either did not grow or did not contain LINE-1 inserts. All subsequent frequencies are expressed relative to the number of LINE-1-containing clones. A library (SPLW39) from M. spretus strain SPRET/Ei was produced that had 1,107 LINE-1-containing clones. A library (DMZLW40) from M. domesticus strain DMZ was produced that had 1,125 LINE-1-containing clones. A library (PANLW61) from M. spicilegus strain PANCEVO was produced with 1,092 LINE-1-containing clones. These were wild-derived inbred strains. SPRET/Ei and PANCEVO DNA was purchased from The Jackson Laboratory. DMZ tissue was a gift from Francois Bonhomme.
Hybridization
Oligonucleotide probes were generally constructed to contain 15 bases with the variant position in the middle (table 2). Except where names have been previously published, probes are named after the variant position and base, prefixed by an "o" (e.g., o6704C). A spot of L1MdA2 DNA was present on all filters. Depending on the probe, it served as either a positive or a negative control. Filters were washed at a temperature sufficient to remove faint positive signals, leaving a number of brightly positive clones (table 2). Selected clones were subsequently sequenced completely on both strands using the Amplicycle sequencing kit (Perkin Elmer). These sequences retroactively confirm the specificity of the probes.
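The probe-design convention described above places the variant base at the center of a 15-mer and names the probe "o" plus position plus base. It can be sketched as follows. The function, the reference fragment and its start coordinate are hypothetical. The sketch takes the probe from the reference strand, which is also an assumption.

```python
def design_probe(reference: str, ref_start: int, variant_pos: int,
                 variant_base: str, length: int = 15) -> tuple:
    """Return (name, sequence) for an oligonucleotide centered on a variant."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the variant sits in the middle")
    flank = length // 2
    i = variant_pos - ref_start  # 0-based index of the variant in `reference`
    if i - flank < 0 or i + flank >= len(reference):
        raise ValueError("variant too close to the end of the reference")
    window = list(reference[i - flank:i + flank + 1].upper())
    window[flank] = variant_base.upper()  # substitute the variant base at the center
    return f"o{variant_pos}{variant_base.upper()}", "".join(window)


# Hypothetical reference fragment assumed to start at coordinate 6697
name, probe = design_probe("ACGTACGTACGTACGTACGTACGT", 6697, 6704, "C")
print(name, probe)  # o6704C ACGTACGCACGTACG
```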
Results
The TF Family Belongs to the L1Md4 Lineage
Mus musculus domesticus TF family members exhibit only 0.2% average divergence from their consensus (DeBerardinis et al. 1998). This high degree of similarity was presented both as an operational definition of what should be included within the TF family and as an indicator of the sudden and recent amplification that occurred in this family. In the 3' portion of the TF sequences reported by DeBerardinis et al. (1998), we note that two subfamilies can be discerned. They are separated by a coordinated cluster of differences in the 3' UTR (positions 6845 [T/A], 6913 [A/T], 6920 [T/C], 6938 [C/T], 6954 [C/T], and 6955 [C/T]). We will refer to the subset having the former bases as TF1 and to the subset having the latter bases as TF2. Consequently, established TF sequences can differ by 6% in this local region. For high similarity to remain a defining property of short truncated TF sequences, each sequence had to be compared separately with the consensus of each of these two groups. L1Md4 differs from the consensus of TF2 at only 3 out of 2,046 bp (0.14%). Hence, L1Md4 is a typical member of the TF2 subgroup of the TF family, and the TF family can be considered a recent transposition burst at the end of the 1-Myr-old domesticus L1Md4 lineage.
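Assigning a short sequence to TF1 or TF2 from the six diagnostic positions listed above can be illustrated with a simple majority vote. The sketch ignores missing positions. This rule is only an illustration. In the study, sequences are placed by comparison with the two consensus sequences and by tree construction. The function name is hypothetical.

```python
# Diagnostic 3' UTR positions (L1MdA2 coordinates): (TF1 base, TF2 base)
TF_SUBGROUP_SITES = {
    6845: ("T", "A"),
    6913: ("A", "T"),
    6920: ("T", "C"),
    6938: ("C", "T"),
    6954: ("C", "T"),
    6955: ("C", "T"),
}


def assign_tf_subgroup(bases: dict) -> str:
    """Majority vote over the diagnostic sites; `bases` maps position -> base."""
    tf1 = tf2 = 0
    for pos, (b1, b2) in TF_SUBGROUP_SITES.items():
        observed = bases.get(pos, "").upper()  # missing positions are skipped
        if observed == b1:
            tf1 += 1
        elif observed == b2:
            tf2 += 1
    if tf1 == tf2:
        return "unresolved"
    return "TF1" if tf1 > tf2 else "TF2"


print(assign_tf_subgroup({6845: "A", 6913: "T", 6920: "C",
                          6938: "T", 6954: "T", 6955: "T"}))  # TF2
```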
The six differences mentioned above fall within a 111-bp region. Their divergence is an order of magnitude higher than the overall average divergence within TF. This is presumably a small patch altered by gene conversion. By default, the tree construction algorithms used here weight these as six independent substitutions and resolve TF into two sublineages. However, the six changes are likely to be the result of one mutational event, so they should more properly be downweighted to the status of a single informative position. When that is done, there are only a few informative positions that disagree on the subdivision of the sequences. We will not attempt to resolve here whether TF1 and TF2 are two separate sublineages or just two arbitrary subgroups within a single family of recombining sequences. Instead, we will analyze the TF1 and TF2 consensus sequences separately to be sure that they both support the same conclusions.
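The downweighting argument above counts every difference inside a suspected conversion tract as one mutational event. It can be sketched as follows. The function name and the extra positions in the usage example are hypothetical. The tract boundaries 6845–6955 are taken from the text.

```python
def count_mutational_events(diff_positions, conversion_tracts):
    """Count differences, collapsing all those inside one tract into one event."""
    events = 0
    tracts_seen = set()
    for pos in sorted(set(diff_positions)):
        tract = next((t for t in conversion_tracts if t[0] <= pos <= t[1]), None)
        if tract is None:
            events += 1  # an ordinary, independent substitution
        elif tract not in tracts_seen:
            tracts_seen.add(tract)
            events += 1  # first hit in this tract counts once
    return events


tf1_tf2 = [6845, 6913, 6920, 6938, 6954, 6955]
print(count_mutational_events(tf1_tf2, []))              # 6 independent substitutions
print(count_mutational_events(tf1_tf2, [(6845, 6955)]))  # 1 conversion event
```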
The domesticus TF Family is Closely Related to the spretus Ms475 and Ms7024 Families
We sought to clarify the relationship of the known spretus LINE-1 subfamilies to the domesticus LINE-1 families. Our previous efforts used only the single L1Md4 sequence to represent what turns out to be the domesticus TF family. The results were uncertain because some sequence positions favored grouping spretus with L1Md4, but others disagreed in a pattern suggesting interference by recombination. We thought that representing domesticus TF with a combination of the newly defined TF1 and TF2 consensus sequences might relieve this uncertainty. One of them should represent the nonrecombinant version of TF with respect to the identified conversion tract at positions 6845–6955.
The TF consensus sequences were compared with L1MdA2, as a representative of the other major young domesticus LINE-1 lineage, and with a spretus Ms475 family sequence. Of the spretus sequences that had been determined, a number are identical to one another and map at the very end of the Ms475 lineage (Casavant and Hardies 1994b). That common sequence is defined over coordinates 6621–7152 and represents a presently amplifying spretus family of about 2,000 copies. We extended this spretus Ms475 sequence by making a composite with a longer sequence named L1EL2. L1EL2 is also identical to it between coordinates 6621 and 7152 and covers coordinates 5887 to the 3' end (7362).
The TF sequences were twice as divergent from L1MdA2 as from the Ms475 family (3.9% vs. 1.9%; 57 vs. 28 differences). This relationship held when the coding sequence (end of ORF2; positions 5887–6700) and the 3' UTR (positions 6701–7362) were compared separately (2.4% vs. 0.9% in the coding sequence; 5.5% vs. 3.1% in the 3' UTR). The comparison was confined to A2 coordinate 5887 through the 3' end, because this is the only region defined for the spretus Ms475 family.
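The percentages above are uncorrected p-distances (differences divided by sites compared). The general computation skips gaps and missing bases. The check below assumes that the whole 1,476-bp window from 5887 to 7362 served as the denominator. That assumption reproduces the reported 3.9% and 1.9%, but the actual alignment may contain gaps. The function names are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str, missing: str = "-?N") -> tuple:
    """Return (differences, sites compared) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sites = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue  # gaps and unknown bases are not compared
        sites += 1
        diffs += a != b
    return diffs, sites


WINDOW_BP = 7362 - 5887 + 1  # 1,476 bp, assuming an ungapped window


def window_pct(differences: int) -> float:
    return 100.0 * differences / WINDOW_BP


print(f"TF vs L1MdA2: {window_pct(57):.1f}%")  # 3.9%
print(f"TF vs Ms475:  {window_pct(28):.1f}%")  # 1.9%
```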
These distance relationships are shown in figure 1A in the form of a neighbor-joining tree. The root sequence (AF37352a) bisects the A2 : TF distance, giving the impression of nearly equal evolutionary rates throughout this part of the tree. However, we do not consider it safe to rely on an equal-rates assumption. AF37352a was found by a GenBank search that organized the available full-length sequences higher up on the mouse LINE-1 tree, with the aim of finding a close-in full-length root. This solves a recurring problem: the shorter sequences that have been used to root LINE-1 trees lose root information past their truncation points. The reliability of these groupings was tested by bootstrap analysis, which gave support of 100 for each of the groupings in figure 1A (not shown). Thus, the domesticus TF family appears to be more recently related to the spretus Ms475 family than to the domesticus A2 family by roughly a factor of 2.
We then tested further to see how the LINE-1 inserts belonging to each family were distributed on the tree. We added a variety of other young LINE-1 sequences recovered from GenBank, noting that sequences joined at positions all along the A2 side. However, all sequences that joined the L1Md4/TF side were either at the bottom within about 0.2% divergence of one of the TF consensus sequences or at the top near the split with the A2 family (not shown). In order to document these impressions in a statistically testable fashion, we constructed the tree shown in figure 1B.
Representative additional sequences from GenBank were added to the tree, and bootstrap values were computed to test the validity of their distribution. We chose to add only a few sequences that were far enough apart that their order might be statistically supportable. In this case, the tree was constructed by the method of maximum parsimony, because this method is more tolerant of the large amount of missing data caused by using truncated sequences. The root sequence was supplemented with L1Md6, which is the closest available truncated LINE-1 sequence that we have found mapping above the L1Md4(TF)/A2 split. The spretus Ms475 sequence was also supplemented by adding the consensus of the closely related spretus Ms7024 family (Casavant and Hardies 1994b).
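One common way parsimony accommodates truncated sequences is to treat a missing base as compatible with every state, so that it never adds a step. Fitch's small-parsimony algorithm, scored on a fixed rooted binary tree, illustrates this below. This is a textbook sketch. It is not necessarily how the program used for figure 1B handles missing data. The taxa and alignment are hypothetical.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees
ALL_BASES = frozenset("ACGT")


def fitch_site(tree: Tree, states: dict) -> tuple:
    """Return (state set, step count) for one alignment column on `tree`."""
    if isinstance(tree, str):
        base = states.get(tree, "?").upper()
        # Missing or ambiguous data is compatible with every state
        return (frozenset(base) if base in ALL_BASES else ALL_BASES), 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def parsimony_length(tree: Tree, alignment: dict) -> int:
    """Total Fitch steps over all columns of an alignment (name -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


ALN = {"a": "AA", "b": "A?", "c": "GG", "d": "GG"}  # "b" is truncated at site 2
T1 = (("a", "b"), ("c", "d"))
T2 = (("a", "c"), ("b", "d"))
print(parsimony_length(T1, ALN), parsimony_length(T2, ALN))  # 2 3
```

The shorter tree, T1, is preferred, and the missing base in "b" neither helps nor penalizes either topology at that site.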
The results shown in figure 1B support and extend the general statements made above about the distribution of branches along these lineages. A2 lineage members are shown mapping at several different heights on the A2 lineage, with bootstrap support for these being distinguishable positions. In contrast, the sequences mapping near the top of the L1Md4 lineage joined to form a statistically supportable separate lineage. That set is represented by L1MdZ, L1bg, and MMCONREG2 in figure 1B and is called the Z family. The Z family will be discussed further below. With the other major young LINE-1 lineages thus accounted for, it remains true that the spretus Ms475 and Ms7024 families are more closely related to the domesticus TF family than to any other major domesticus LINE-1 lineage.
To see whether the spretus Ms475 and Ms7024 families also have the characteristic TF promoter sequences, we cloned and partially sequenced full-length LINE-1's from each of these subfamilies from SPRET/Ei. Both were found to have TF promoter arrays (not shown). Therefore, by a phylogenetic definition, these also belong to the TF family, although they do not fall within the 0.2% divergence interval defined for domesticus TF. The Ms475 and Ms7024 families are likely to be the TF sequences identified in M. spretus by hybridization (Saxton and Martin 1998).
Three Ways to Time the domesticus/spretus TF Split
We then asked whether the domesticus/spretus TF split was younger than the 1.1-MYA musculus/spretus speciation (She et al. 1990), which would indicate a horizontal transfer. Three different methods were applied to this question.
The spretus A2 Lineage
Hybridization studies previously showed that Mus spretus also has LINE-1's with A-type promoters (Jubier-Maurin et al. 1987). For the purpose indicated above, we wanted to create a probe with which to find members of any spretus version of the domesticus A2 lineage. The phylogenetic analysis conducted above provided a list of shared variants on which an oligonucleotide probe could be based. These variants map on the A2 side just below the A2/L1Md4 split (fig. 3). We have previously investigated extensively probes based on 6661 (G/A) and 6505 (C/T) (Rikke, Garvin, and Hardies 1991; Rikke and Hardies 1991), which are already positioned very near the A2/L1Md4 junction. These probes detected thousands of LINE-1 copies in M. domesticus but only a few hundred in M. spretus. Furthermore, unlike the domesticus A2 members, the LINE-1 copies detected in M. spretus were not doubly positive. Instead, these copies were interpreted as coincidental substitutions in the large family of older spretus LINE-1 pseudogenes, and their numbers were quantitatively consistent with that interpretation (Rikke and Hardies 1991).
An oligonucleotide probe centered on 6704C was used to probe libraries carrying a representation of mouse LINE-1 sequences. One small library (SPLW31) contained EcoRI/SacI-digested DNA from SPRET/Ei. Two additional libraries were made by PCR amplification of LINE-1 inserts between coordinates 6112 and 7111, carrying spretus DNA (SPLW39) or domesticus DNA (DMZLW40). In each case, the clones were transferred to 96-well plates, and filters representing over 1,000 gridded clones were prepared for hybridization to oligonucleotide o6704C. Probe o6704C detected 3 clones in spretus library SPLW31, 119 clones in spretus library SPLW39, and 423 clones in domesticus library DMZLW40.
Two of the spretus o6704C-positive clones were sequenced. They were found to share novel variants at 11 positions, plus one G-to-A substitution at position 6626 in parallel with the Ms475/7024 lineage. The novel lineage is shown in figure 3, ending in the two sequences SFA and SHA. The joining of SFA and SHA is supported by a bootstrap value of 100. To validate the generality of the sequence patterns in these two clones, we made oligonucleotide probes corresponding to 11 of these positions and used them to probe the filters from the larger spretus library. There were nine probes in all (see o6403A through o7065A in tables 2 and 4). Seven of these probes covered a single variant position, and two of them covered two variant positions each. Each of these probes detected between 89 and 98 clones, except for o6626A, which detected 434 clones. Ninety-nine clones hybridized to five or more of the probes, indicating a novel lineage of significant length and copy number.
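The "five or more probes" criterion amounts to inverting the per-probe lists of positive wells into per-clone hit counts. A minimal sketch follows. The probe labels p1 to p9 and the well IDs are hypothetical placeholders, not the real probe names or hybridization results.

```python
from collections import Counter


def clones_positive_for_at_least(positives_by_probe: dict, min_probes: int = 5) -> list:
    """Clones scored positive by at least `min_probes` different probes."""
    hits = Counter()
    for clones in positives_by_probe.values():
        hits.update(set(clones))  # each probe counts at most once per clone
    return sorted(clone for clone, n in hits.items() if n >= min_probes)


# Hypothetical filters: C3 lights up with all nine probes, A1 with six, B2 with four
positives = {
    f"p{i}": {"C3"} | ({"A1"} if i <= 6 else set()) | ({"B2"} if i <= 4 else set())
    for i in range(1, 10)
}
print(clones_positive_for_at_least(positives))  # ['A1', 'C3']
```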
To clarify what promoter type was on this novel family, we amplified full-length LINE-1 PCR products from individual clones in a spretus lambda library using either A-type or TF-type promoter sequence for one of the PCR primers. The probes for the new family hybridized to the products from full-length A-type PCR products and not from full-length TF-type PCR products (data not shown). Hence, this novel lineage is the spretus version of the A2 lineage, as intended.
The spretus A2 lineage joined the tree very near the A2/L1Md4(TF) split, consistent with placement of the speciation well above the domesticus/spretus TF split. We then looked for other spretus sequences joining farther down the domesticus A2 lineage. For this, we sequenced additional clones from the spretus library that were o6704C-positive but did not match any of the probes for the new spretus lineage. These all joined the tree at a slightly higher position than the spretus A2 lineage (data not shown). Probing the spretus library with o6505T, which is positioned just under 6704, produced only seven positive clones. This is consistent with the coincidental background in spretus previously noted for this domesticus-specific probe (Rikke, Garvin, and Hardies 1991; Rikke and Hardies 1991). Therefore, using the A2 family as an indicator of the speciation, it appears necessary to interpret the TF split as a transfer after speciation, even without relying strongly on a molecular clock.
Estimation of Subfamily Copy Numbers from GenBank
If TF is 90% truncated, as are LINE-1 inserts on average (Hutchison et al. 1989), then the number of 5' TF ends reported by DeBerardinis et al. (1998) would imply 24,000 3' TF ends in the domesticus genome. This seemed far more than would be expected from the few close relatives of L1Md4 that we had encountered in GenBank. Using the shared variants defined in figure 2, we conducted a thorough survey of the numbers of TF sequences currently in GenBank in comparison with the numbers of A2 and Z sequences. Sequences belonging to each of these LINE-1 families were found by a series of Blast searches with short search keys designed to focus on sequences 1 Myr or less in age. These included 6433A, 6704C, 6424A, 6820T, the TF-specific bases between positions 7228 and 7269, several variants positioned above the A2/L1Md4 split, and TF and A-type promoter sequences. For this sample to be an unbiased estimator of the relative copy numbers of each lineage in the mouse genome, we excluded certain sequences as follows. All sequences that were cloned using some selection for LINE-1 were excluded (including L1MdA2, MMU15647, and the L1Md-Tf5-30 series). Sequences that did not overlap the 3' region analyzed here were also excluded, as were several duplicate entries. Two others, L1spa and L1orl, are unbiased with respect to their length. However, they could not be used to estimate the usual genomic copy number, because they were selected based on their presence in particular mutant mice. The remaining sequences were aligned and assigned to an interval on the framework tree using maximum parsimony. The numbers of sequences falling in each interval are displayed in figure 4.
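The exclusion rules listed above can be expressed as a filter over GenBank records. This is a minimal sketch under several assumptions. The record fields and accessions are hypothetical. "The 3' region analyzed here" is taken to be coordinates 5887–7362. Duplicates are detected by identical sequence.

```python
from dataclasses import dataclass

ANALYZED_REGION = (5887, 7362)  # assumed L1MdA2 coordinates of the analyzed 3' region


@dataclass(frozen=True)
class Entry:
    accession: str
    start: int  # L1MdA2 coordinates covered by the entry
    end: int
    sequence: str
    selected_for_line1: bool = False  # cloned with some selection for LINE-1
    selected_by_mutant: bool = False  # found because it caused a mutation


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start <= b_end and b_start <= a_end


def unbiased_sample(entries):
    """Apply the exclusion rules and return the entries that remain."""
    seen_sequences = set()
    kept = []
    for e in entries:
        if e.selected_for_line1 or e.selected_by_mutant:
            continue
        if not overlaps(e.start, e.end, *ANALYZED_REGION):
            continue
        if e.sequence in seen_sequences:
            continue  # duplicate entry
        seen_sequences.add(e.sequence)
        kept.append(e)
    return kept


entries = [
    Entry("X1", 6112, 7111, "ACGT"),
    Entry("X2", 6112, 7111, "ACGT"),  # duplicate of X1
    Entry("X3", 1, 2000, "TTTT"),     # does not overlap the analyzed region
    Entry("X4", 6500, 7362, "GGGG", selected_for_line1=True),
    Entry("X5", 6000, 6500, "CCCC"),
]
print([e.accession for e in unbiased_sample(entries)])  # ['X1', 'X5']
```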
The M. spicilegus TF Family
The outcome of the above studies indicated that the domesticus/spretus TF split involved an exchange after speciation. As indicated in the Introduction, we had thought that domesticus might be the recipient of such an exchange, since it seems to lack pseudogenes from the ancestral TF lineage. However, the results above are not precisely consistent with a spretus-to-domesticus transfer. The period between the domesticus/spretus TF split at 0.4–0.5 MYA and the beginning of the domesticus TF amplification at 0.1–0.2 MYA is similarly lacking in pseudogenes. Therefore, we explored the possibility that another species might have been a more immediate donor of the TF family to M. domesticus. A PCR library was made from the M. spicilegus strain PANCEVO and probed with the TF-specific probe o6424A. This produced 103 positive clones, comparable to the content in the domesticus library (246). Probing with a comparably recent L1MdA2-specific probe, o6433A, detected only 18 clones from the PANCEVO library. Only 1 of these was also positive with a second L1MdA2-specific probe, o6505T (compared with 191 out of 246 for the domesticus library). The absence of young L1MdA2 members in the PANCEVO library rules out both PCR contamination and any possibility that PANCEVO is a recent spicilegus × domesticus hybrid.
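Methods states that frequencies are expressed relative to the number of LINE-1-containing clones. The counts above can be put on that footing as shown below. The library sizes come from Materials and Methods. The 191-of-246 domesticus figure is used exactly as given. The helper name is hypothetical.

```python
def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole


# LINE-1-containing clones per library (Materials and Methods)
LIBRARY_SIZE = {"PANLW61": 1092, "DMZLW40": 1125}

pancevo_tf = percent(103, LIBRARY_SIZE["PANLW61"])
dmz_tf = percent(246, LIBRARY_SIZE["DMZLW40"])
print(f"o6424A+ clones: PANCEVO {pancevo_tf:.1f}%, DMZ {dmz_tf:.1f}%")

# Fraction of o6433A+ clones also positive for o6505T
print(f"o6433A+/o6505T+: PANCEVO {percent(1, 18):.1f}%, DMZ {percent(191, 246):.1f}%")
```

Normalized this way, the TF-positive frequencies differ by a little over twofold. The doubly positive L1MdA2 fraction differs by more than an order of magnitude, which is the contrast the argument relies on.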
Four PANCEVO 6424A+ clones were sequenced (PA, PD, PK, and PL in fig. 2). They each joined the tree one substitution above the domesticus TF1/TF2 junction on the tree. Thus, they share all but one of the 15 diagnostic positions separating the domesticus TF family from the Z family. They also share two of three substitutions in the interval between the domesticus/spretus TF split and the amplification of the domesticus TF family. This indicates that an exchange of the TF family relating domesticus and M. spicilegus occurred even more recently than the exchange relating domesticus and spretus.
Since the original definition of TF revolved around the novel TF promoter, we also examined whether M. spicilegus contained TF promoter sequences very closely related to domesticus TF promoter sequences. We PCR-amplified a fragment from the M. spicilegus strain PANCEVO that contains 577 bp from the 5' end of TF, including 85 bp of the TF promoter itself. These products were cloned, and two were sequenced. The spicilegus sequences each contained only one (unshared) substitution compared with the domesticus TF consensus. Thus, M. spicilegus also contains TF promoter and 5' UTR sequence that is within 0.2% of the domesticus TF consensus and is essentially indistinguishable from domesticus TF.
Finally, the age of the domesticus/spicilegus split was estimated from the divergence of the four 3' pseudogenes (18/4,000 bp) and the two 5' pseudogenes (2/1,154 bp) in aggregate. The result is an estimated date of 0.4 MYA. This further supports placing these events well after the speciation of all three of these mouse species. From the molecular data alone, M. spicilegus would be a reasonable candidate to have carried the TF family between M. spretus and M. m. domesticus. However, the biogeography of these species suggests a more complicated scenario (see Discussion).
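The aggregate estimate pools differences and sites across all six pseudogenes rather than averaging per-sequence percentages. The text does not state which rate constant or corrections were applied. The sketch below assumes the uncorrected 1%/Myr pseudogene rate, which is one simple reading that reproduces the reported 0.4 MYA. The function name is hypothetical.

```python
def pooled_divergence_pct(pairs) -> float:
    """Pool (differences, sites) pairs into one percent divergence."""
    diffs = sum(d for d, _ in pairs)
    sites = sum(n for _, n in pairs)
    return 100.0 * diffs / sites


# Four 3' pseudogenes (18/4,000 bp) and two 5' pseudogenes (2/1,154 bp)
pooled = pooled_divergence_pct([(18, 4000), (2, 1154)])
age_myr = pooled / 1.0  # assumption: 1%/Myr rate, no 3' UTR acceleration applied
print(f"{pooled:.2f}% -> {age_myr:.1f} Myr")  # 0.39% -> 0.4 Myr
```

Pooling weights each site equally, so the longer 3' pseudogenes dominate the estimate.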
The Z Family
The novel lineage that splits off the top of the L1Md4 lineage is represented by L1MdZ, L1bg, and MMCONREG2 in figure 1B. At the present level of characterization, it is defined by only a few sequence variants. One of these, 6820T, has also been observed in a few spretus sequences (S10, S11, and S18 in Rikke, Garvin, and Hardies [1991]). To determine whether the sequences sharing 6820T amounted to a substantial new group, we prepared a probe specific for 6820T. This probe detected 126 and 141 clones in our spretus and domesticus libraries, respectively, yielding the copy numbers in figure 5. As expected of a separate lineage, these clones were essentially exclusive of the ones detected by A2-specific, TF-specific, or Ms475-specific probes. In spretus, only 5 of the 126 clones positive for 6820T were positive for o6704C, and none was positive for o6852C/5C. In domesticus, only 14 of 141 clones were positive for o6704C, and only one was positive for the TF-specific probe o6424A (see below). We refer to this group as the Z family after one of its members, L1MdZ (Kraft, Kadyk, and Leinwand 1992).
Discussion
Horizontal Transfer of the TF Family
One motivation for this study was to try to understand how a transposon lineage could be apparently quiescent for a long period and then suddenly become very active. Discontinuities in the intensity of LINE-1 subfamily amplification have been described for the mouse/rat Lx family (Furano et al. 1994) and for LINE-1 lineage 2 in Peromyscus (Casavant et al. 1998). One might also infer discontinuity in the evolution of vole LINE-1 by comparing the results of Vanlerberghe et al. (1993) with those of Modi (1996). Adey et al. (1994) discussed LINE-1 evolution as the cyclical generation and loss of lineages, with the loss perhaps being due to displacement by new lineages. In contrast, periods of relatively continuous and highly coalescent development of LINE-1 lineages are also observed, for example, in Peromyscus LINE-1 lineage 1 (Casavant et al. 1998) and in the M. spretus Ms475 family (Casavant and Hardies 1994b). Verneau, Catzeflis, and Furano (1997) made use of periods of rat LINE-1 evolution with continuous output to achieve high resolution in the phylogenetic analysis of rat species (see also Verneau, Catzeflis, and Furano 1998).
Edgell et al. (1987) have pointed out how apparently smoothly amplifying families can be an artifact of low resolution. It has also been pointed out that apparently low intensity of amplification in the early part of a lineage can be a sampling artifact (Casavant, Sherman, and Wichman 1996; Casavant et al. 1998). In the current study, we addressed sampling bias both by including the A2 lineage and by tabulating unbiased sequences in GenBank. Episodic LINE-1 amplification is reminiscent of SINE evolution. It has prompted the analogy of a single "master" locus that was quiescent (or maybe even absent) and then became activated, triggering the amplification (Deininger et al. 1992).
The discontinuity with TF is that it produced fewer than 500 detectable copies in domesticus or its M. musculus ancestors for approximately 0.9 Myr and then amplified to several thousand copies in M. domesticus in the last 0.1–0.2 Myr. Our results establish that M. m. domesticus TF LINE-1 shares a common ancestor with families in other species that postdates their speciation. A common ancestor after speciation necessarily implies a horizontal transfer, and the recipient species should be deficient in LINE-1 pseudogenes originating from the lineage before the transfer. Although our results do not directly establish the direction of transfer, the deficiency of earlier TF pseudogenes in M. m. domesticus suggests that it is a recipient.
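The inference in the preceding paragraph rests on a simple ordering rule: under strictly vertical inheritance, LINE-1 copies sampled from two species cannot coalesce more recently than the species themselves diverged. The sketch below encodes that rule, assuming approximate ages in Myr. The function name and the `dating_error_myr` margin are hypothetical illustrations, not part of the analysis reported here; the example ages (about 1 Myr for the species split, about 0.5 Myr for the TF/Ms475 junction, about 1 Myr for the A2 lineages) follow the values given in this paper.

```python
def implies_horizontal_transfer(element_coalescence_myr: float,
                                speciation_myr: float,
                                dating_error_myr: float = 0.0) -> bool:
    """Flag a LINE-1 family shared by two species as horizontally transferred.

    Under strictly vertical inheritance, copies sampled from two species
    cannot coalesce more recently than the species themselves diverged.
    A coalescence younger than the split, even after adding a (hypothetical)
    allowance for dating error, therefore requires transfer between species.
    """
    return element_coalescence_myr + dating_error_myr < speciation_myr


# Approximate ages (Myr) from the text; the 0.2 Myr error margin is hypothetical.
SPECIES_SPLIT = 1.0  # common ancestor of M. m. domesticus and M. spretus
print(implies_horizontal_transfer(0.5, SPECIES_SPLIT, dating_error_myr=0.2))  # TF vs. Ms475/Ms7024: True
print(implies_horizontal_transfer(1.0, SPECIES_SPLIT, dating_error_myr=0.2))  # A2 lineages: False
```

The rule only detects that a transfer occurred; as noted above, the direction of transfer has to be argued separately, here from the deficiency of older TF pseudogenes in the putative recipient.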
If only two species were involved, then finding a common ancestor to the TF family after speciation would indicate a direct horizontal transfer. However, when exchanges involve more than two species, one species may have carried the TF family between others that were never in direct contact. The biogeography of these species suggests that none of them were in direct contact. The geographical range of M. spicilegus is north of the Black Sea, where it is not in contact with either M. spretus or M. m. domesticus (Bonhomme 1986). Mus spretus and M. m. domesticus are not thought to have come into contact until recent times (Thaler 1986). Thus, there may well have been a fourth species that carried the TF family among these others. One excellent candidate would be Mus macedonicus, which resides in the Middle East, in contact with each of the other species (Bonhomme 1986). Alternatively, M. m. musculus is in contact with both M. m. domesticus and M. spicilegus.
It is unclear to what extent horizontal transfers will explain other discontinuities in the evolution of LINE-1, but it should be clear that interpreting the evolutionary history without comparative data from other species risks serious misinterpretation. These species are able to mate and produce fertile offspring in captivity (Bonhomme, Martin, and Thaler 1978; Bonhomme et al. 1979; Zechner et al. 1996); therefore, there is as yet no need to invoke a mechanism of transfer other than introgression.
Size of the domesticus TF Family
The current study also clarifies the intensity of the TF amplification. From the number of TF promoters detected by DeBerardinis et al. (1998) and the 90% average truncation rate (Hutchison et al. 1989), one might have expected a total of 24,000 TF 3' ends per haploid genome. However, the true number of TF 3' ends is in the range of 3,700–5,100. This implies a reduced degree of truncation in the TF family, with a corresponding impact on its replicative efficiency. This impression of lesser truncation is reinforced by the observation that, excluding the L1Md-Tf5-30 series, which was selected to be full-length, four of eight intact TF sequences found in GenBank are full-length. The GenBank search should have been sensitive to even extremely short LINE-1's, because we searched for a collection of TF-specific variants that fall within 150 bp of the 3' end.
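The copy-number argument above is simple arithmetic: if a fraction t of copies are 5'-truncated, the expected number of 3' ends is N5/(1 − t). The sketch below assumes, as an illustration only, that the 24,000 figure was obtained this way, which back-calculates to about 2,400 TF 5' ends; that count is an assumption, not a value taken from DeBerardinis et al. (1998). The function names are hypothetical.

```python
def expected_3prime_ends(n_5prime: float, truncation: float) -> float:
    """3' ends expected if a fraction `truncation` of copies lost their 5' end."""
    if not 0.0 <= truncation < 1.0:
        raise ValueError("truncation must lie in [0, 1)")
    return n_5prime / (1.0 - truncation)


def implied_truncation(n_5prime: float, n_3prime: float) -> float:
    """Fraction of 3' ends that lack a 5' end, given counts of both."""
    return 1.0 - n_5prime / n_3prime


# ASSUMPTION: about 2,400 TF 5' ends, back-calculated from the 24,000 figure.
N_5PRIME = 2_400
print(round(expected_3prime_ends(N_5PRIME, 0.90)))   # 24000
for n_3prime in (3_700, 5_100):                       # observed range of 3' ends
    print(n_3prime, round(implied_truncation(N_5PRIME, n_3prime), 2))
# 3700 0.35
# 5100 0.53
```

Under this assumption, the observed 3' end counts correspond to roughly 35–53% truncation rather than 90%. The GenBank sample (four of eight intact TF sequences full-length, i.e., 50% truncated) would fall within that range, although a sample of eight is too small to be more than suggestive.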
The reduced estimate for the domesticus TF copy number puts the TF amplification back within the range of average sustained LINE-1 production in the mouse. This issue has previously been analyzed in detail for the spretus Ms475/7024 lineages, which have a similarly high rate of recent amplification but differ in that amplification also occurred in earlier times, resulting in an even higher copy number than that of domesticus TF. Yet the Ms475/7024 amplification does not exceed the limits required to fit a model of continuously sustained output (Hardies and Rikke 1989; Casavant and Hardies 1994b). Thus, the most unusual facet of domesticus TF amplification is not so much its degree as the extensive quiescent period preceding it. Horizontal transfer explains this pattern without requiring a molecular activation event. Specifically, the sudden amplification of TF in M. domesticus had nothing to do with the recombination that created the TF structure in the first place. That recombination would have taken place on the TF lineage before the spretus/domesticus split, long before the activity observed in domesticus.
Z Family
An intriguing aspect of the Z family is that one of its members is L1bg, an insert of recent origin that causes the beige mutation (Perou et al. 1997). L1bg shares six informative positions with L1MdZ and is supported as a member of the Z family by a bootstrap value of 99. A fourth insert of recent origin, L1med, is a 72-bp LINE-1 insert in the sodium channel gene Scn8a that causes the med mutant allele (Kohrman, Harris, and Meisler 1996). L1med covers only coordinates 7299–7362 and therefore overlaps only a few informative positions. One of these excludes it from the A2 lineage, and one (7309) excludes it from TF. It shares a two-base insertion at position 7309 with L1bg. Therefore, L1med probably comes from the same active subfamily of Z as L1bg.
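The placement of L1med illustrates how a short fragment can only be tested against the informative positions it actually covers. The sketch below shows that logic, assuming a hypothetical table of lineage-diagnostic states: apart from position 7309 and the two-base insertion there, every position and state in `DIAGNOSTIC` is an illustrative placeholder, not data from this study, and the function name is hypothetical.

```python
from typing import Dict, List

# HYPOTHETICAL diagnostic states (position -> state) for each lineage.
# Only position 7309 and the two-base insertion there come from the text;
# every other position and state is an illustrative placeholder.
DIAGNOSTIC: Dict[str, Dict[int, str]] = {
    "A2": {7330: "C", 7400: "A"},
    "TF": {7250: "G", 7309: "no_ins"},
    "Z": {7309: "ins2", 7330: "T", 7450: "C"},
}


def compatible_lineages(fragment: Dict[int, str], start: int, end: int,
                        diagnostic: Dict[str, Dict[int, str]]) -> List[str]:
    """Return lineages with no conflicting state at any informative
    position that falls inside the fragment's coverage [start, end]."""
    compatible = []
    for lineage, states in diagnostic.items():
        # Positions outside the fragment carry no information about it.
        overlap = {p: s for p, s in states.items() if start <= p <= end}
        conflicts = [p for p, s in overlap.items()
                     if p in fragment and fragment[p] != s]
        if not conflicts:
            compatible.append(lineage)
    return compatible


# A short insert covering 7299-7362, with hypothetical observed states.
l1med_like = {7309: "ins2", 7330: "T"}
print(compatible_lineages(l1med_like, 7299, 7362, DIAGNOSTIC))  # ['Z']
```

A fragment that overlaps no informative position is compatible with every lineage, which is why an insert as short as L1med can be assigned only provisionally and the text says it "probably" belongs to the same Z subfamily as L1bg.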
The finding of de novo inserts mapping to the Z family indicates the existence of a second currently active LINE-1 subfamily in the mouse. Thus far, de novo inserts have always been derived from full-length, fully active LINE-1 source loci, although no full-length Z family members have yet been characterized. Unlike TF, this family seems to have no shortage of truncated members. The promoter type for the Z family is unknown.
Acknowledgements
We acknowledge Mandy Rolando for excellent technical assistance and Eric Rogers for assistance in constructing the SPLW31 library. We thank Dr. Francois Bonhomme for helpful discussions. This work was supported by NIH grant GM51847.
Footnotes
Thomas H. Eickbush, Reviewing Editor
1 Present address: Department of Agronomy, Iowa State University.
2 Keywords: mouse, LINE-1, transposon, horizontal gene transfer
3 Address for correspondence and reprints: Stephen C. Hardies, Department of Biochemistry, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, Texas 78284-7760. E-mail: hardies@uthscsa.edu
Literature Cited
Adey, N. B., S. A. Schichman, D. K. Graham, S. N. Peterson, M. H. Edgell, and C. A. Hutchison III. 1994. Rodent L1 evolution has been driven by a single dominant lineage that has repeatedly acquired new transcriptional regulatory sequences. Mol. Biol. Evol. 11:778–789.
Bonhomme, F. 1986. Evolutionary relationships in the genus Mus. Curr. Top. Microbiol. Immunol. 127:19–54.
Bonhomme, F., F. Benmehdi, J. Britton-Davidian, and S. Martin. 1979. Genetic analysis of interspecific crosses Mus musculus L. x Mus spretus Lataste: linkage of Adh-1 with Amy-1 on chromosome 3 and Es-14 with Mod-1 on chromosome 9. C. R. Seances Acad. Sci. 289:545–548.
Bonhomme, F., S. Martin, and L. Thaler. 1978. Hybridization between Mus musculus L. and Mus spretus Lataste under laboratory conditions. Experientia 34:1140–1141.
Casavant, N. C., and S. C. Hardies. 1994a. Shared sequence variants of Mus spretus LINE-1 elements tracing dispersal to within the last 1 Myr. Genetics 137:565–572.
Casavant, N. C., and S. C. Hardies. 1994b. The dynamics of murine LINE-1 subfamily amplification. J. Mol. Biol. 241:390–397.
Casavant, N. C., R. N. Lee, A. N. Sherman, and H. A. Wichman. 1998. Molecular evolution of two lineages of L1 (LINE-1) retrotransposons in the California mouse, Peromyscus californicus. Genetics 150:345–357.
Casavant, N. C., A. N. Sherman, and H. A. Wichman. 1996. Two persistent LINE-1 lineages in Peromyscus have unequal rates of evolution. Genetics 142:1289–1298.
DeBerardinis, R. J., J. L. Goodier, E. M. Ostertag, and H. H. Kazazian Jr. 1998. Rapid amplification of a retrotransposon subfamily is evolving the mouse genome. Nat. Genet. 20:288–290.
DeBerardinis, R. J., and H. H. Kazazian Jr. 1999. Analysis of the promoter from an expanding mouse retrotransposon subfamily. Genomics 56:317–323.
Deininger, P. L., M. A. Batzer, C. A. Hutchison III, and M. H. Edgell. 1992. Master genes in mammalian repetitive DNA amplification. Trends Genet. 8:307–311.
Edgell, M. H., S. C. Hardies, D. D. Loeb, W. R. Shehee, R. W. Padgett, F. H. Burton, M. B. Comer, N. C. Casavant, F. D. Funk, and C. A. Hutchison III. 1987. The L1 family in mice. Prog. Clin. Biol. Res. 251:104–129.
Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791.
Furano, A. V., B. E. Hayward, P. Chevret, F. Catzeflis, and K. Usdin. 1994. Amplification of the ancient murine Lx family of long interspersed repeated DNA occurred during the murine radiation. J. Mol. Evol. 38:18–27.
Hardies, S. C., S. L. Martin, C. F. Voliva, C. A. Hutchison III, and M. H. Edgell. 1986. An analysis of replacement and synonymous changes in the rodent L1 repeat family. Mol. Biol. Evol. 3:109–125.
Hardies, S. C., and B. A. Rikke. 1989. A selfish retrotransposition model for rodent LINE-1. Pp. 127–134 in M. Clegg and S. O'Brien, eds. Molecular evolution; UCLA symposia on molecular and cellular biology, new series. Vol. 122. Alan R. Liss, New York.
Hutchison, C. A. III, S. C. Hardies, D. D. Loeb, W. R. Shehee, and M. H. Edgell. 1989. LINES and related retroposons: long interspersed repeated sequences in the eukaryotic genome. Pp. 593–617 in D. E. Berg and M. M. Howe, eds. Mobile DNA. ASM, Washington, D.C.
Jubier-Maurin, V., P. Wincker, G. Cuny, and G. Roizes. 1987. The relationships between the 5' end repeats and the largest members of the L1 interspersed repeated family in the mouse genome. Nucleic Acids Res. 15:7395–7410.
Kazazian, H. H. Jr., and J. V. Moran. 1998. The impact of L1 retrotransposons on the human genome. Nat. Genet. 19:19–24.
Kohrman, D. C., J. B. Harris, and M. H. Meisler. 1996. Mutation detection in the med and medJ alleles of the sodium channel Scn8a. Unusual splicing due to a minor class AT-AC intron. J. Biol. Chem. 271:17567–17581.
Kraft, R., L. Kadyk, and L. A. Leinwand. 1992. Sequence organization of variant mouse 4.5 S RNA genes and pseudogenes. Genomics 12:555–566.
Loeb, D. D., R. W. Padgett, S. C. Hardies, W. R. Shehee, M. B. Comer, M. H. Edgell, and C. A. Hutchison III. 1986. The sequence of a large L1Md element reveals a tandemly repeated 5' end and several features found in retrotransposons. Mol. Cell. Biol. 6:168–182.
Martin, S. L., C. F. Voliva, S. C. Hardies, M. H. Edgell, and C. A. Hutchison III. 1985. Tempo and mode of concerted evolution in the L1 repeat family of mice. Mol. Biol. Evol. 2:127–140.
Modi, W. S. 1996. Phylogenetic history of LINE-1 among arvicolid rodents. Mol. Biol. Evol. 13:633–641.
Naas, T. P., R. J. DeBerardinis, J. V. Moran, E. M. Ostertag, S. F. Kingsmore, M. F. Seldin, Y. Hayashizaki, S. L. Martin, and H. H. Kazazian Jr. 1998. An actively retrotransposing novel subfamily of mouse L1 elements. EMBO J. 17:590–597.
Perou, C. M., R. J. Pryor, T. P. Naas, and J. Kaplan. 1997. The bg allele mutation is due to a LINE-1 element retrotransposition. Genomics 43:366–368.
Rikke, B. A. 1992. Development of Mus domesticus-specific and Mus spretus-specific LINE-1 DNA probes as a novel genetic tool. Ph.D. dissertation, University of Texas Health Science Center, San Antonio, TX.
Rikke, B. A., L. D. Garvin, and S. C. Hardies. 1991. Systematic identification of species specificity in LINE-1 repetitive sequences between Mus domesticus and Mus spretus. J. Mol. Biol. 219:635–643.
Rikke, B. A., and S. C. Hardies. 1991. LINE-1 repetitive DNA probes for species-specific cloning from Mus spretus and Mus domesticus genomes. Genomics 11:895–904.
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.
Saxton, J. A., and S. L. Martin. 1998. Recombination between subtypes creates a mosaic lineage of LINE-1 that is expressed and actively retrotransposing in the mouse genome. J. Mol. Biol. 280:611–622.
She, J. X., F. Bonhomme, P. Boursot, L. Thaler, and F. M. Catzeflis. 1990. Molecular phylogenies in the genus Mus: comparative analysis of electrophoretic, scnDNA hybridization and mtDNA RFLP data. Biol. J. Linn. Soc. 41:83–103.
Swofford, D. L. 1998. PAUP*, phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer, Sunderland, Mass.
Thaler, L. 1986. Origin and evolution of mice: an appraisal of fossil evidence and morphological traits. Curr. Top. Microbiol. Immunol. 127:3–11.
Vanlerberghe, F., F. Bonhomme, C. A. Hutchison III, and M. H. Edgell. 1993. A major difference between the divergence patterns within the LINE-1 families in mice and voles. Mol. Biol. Evol. 10:719–731.
Verneau, O., F. Catzeflis, and A. V. Furano. 1997. Determination of the evolutionary relationships in Rattus sensu lato (Rodentia: Muridae) using L1 (LINE-1) amplification events. J. Mol. Evol. 45:425–436.
Verneau, O., F. Catzeflis, and A. V. Furano. 1998. Determining and dating recent rodent speciation events by using L1 (LINE-1) retrotransposons. Proc. Natl. Acad. Sci. USA 95:11284–11289.
Woo, S. L. 1979. A sensitive and rapid method for recombinant phage screening. Pp. 389–395 in R. Wu, ed. Methods in enzymology. Vol. 68. Academic Press, New York.
Zechner, U., M. Reule, A. Orth, F. Bonhomme, B. Strack, J.-L. Guenet, H. Hameister, and R. Fundele. 1996. An X-chromosome linked locus contributes to abnormal placental development in mouse interspecific hybrids. Nat. Genet. 12:398–403.