(Received for publication, August 24, 1994; and in revised form, November 18, 1994)
The mouse LINE-1 (L1) retrotransposon contains two open reading frames (ORFs). Three classes of the protein encoded by the first open reading frame (ORF1) are expressed in the mouse embryonal carcinoma cell line, F9; the apparent molecular sizes of these proteins are 41.3, 43, and 43.5 kDa. Two of these three proteins (41.3 and 43 kDa) are translated in vitro from full-length, sense-strand L1 RNA isolated from ribonucleoprotein particles. A reverse transcription-polymerase chain reaction approach was used to clone the ORF1 region from RNA isolated from ribonucleoprotein particles, then the coding capacity of these clones was examined using in vitro transcription and translation. Multiple sequences that encode ORF1 were recovered by this approach, indicating that multiple loci of L1 in the mouse genome are expressed in F9 cells. In addition, L1 sequences with intact ORF1 regions appear to be selectively enriched in the ribonucleoprotein particles.
The LINE-1, or L1, family of interspersed repeats comprises at least 10% of the mammalian genome. Like other interspersed repeated DNA families in genomes of other organisms, L1 is dispersed and amplified throughout the genome by a series of duplicative transposition events. The majority of L1 elements in the mouse genome are 5'-truncated and rearranged, leaving only 10,000 full-length copies (1). Truncated copies originate from active, full-length versions of the element by retrotransposition (1, 2, 3). It has been estimated that only about 60 copies of L1 in the mouse genome are competent or active for transposition (4).
The biochemical details of the intermediate steps in L1 retrotransposition are unknown. Nevertheless, it is reasonable to hypothesize that L1 retrotransposition involves the expression of a full-length, sense-strand L1 transcript and the polypeptide products of ORF1 and ORF2 (1, 5). Full-length, sense-strand L1 transcripts and ORF1 protein have been detected in both human and mouse cell lines, primarily teratocarcinoma or embryonal carcinoma (6, 7, 8, 9), and also in mouse testis (10). In the mouse embryonal carcinoma cell line, F9, full-length, sense-strand L1 RNA is found in ribonucleoprotein particles (RNP), which also appear to include ORF1 protein (6). In addition, human L1 RNA appears to be present in a high molecular weight complex with reverse transcriptase activity in NTera2D cells (11).
Due to the high copy number of L1 sequences in the genome, L1 is abundantly represented in the RNA population of most cells. However, most of the transcripts that contain L1 are the result of fortuitous transcription and are not intermediates in L1 retrotransposition. This high background of L1-containing transcripts, many of which are truncated and rearranged, makes it difficult to distinguish the transcript encoded by an active L1 element(s). Two active human L1 elements revealed themselves by retrotransposing into the human factor VIII and dystrophin genes (2, 3, 12). Both of these truncated, defective insertions provided sequence tags that allowed isolation of their active progenitors (2, 3). Recently, the first potential example of a similar insertional mutagenesis event in the mouse genome has been reported (13). However, the amount of sequence data that was reported is insufficient to discern whether the inserted element was derived from an active version of mouse L1.
There are two major subfamilies of mouse L1, A and F. The A subfamily is evolutionarily the youngest and is transcriptionally active (14). Within the A subfamily, there are three length polymorphisms in the ORF1 coding region (14). The shortest of these length variations defines subgroup 1, which is represented by L1MdA2 (15). Subgroup 2 contains an additional 42 bp in the 5' end of the ORF1 coding sequence and is represented by L1MdA9 (16). Although L1 elements with the subgroup 2 polymorphism are at least as abundant in the genome as subgroup 1 elements, they are not thought to be transcribed in F9 cells (14).
In this study, the L1 sequences that are transcribed in F9 cells are characterized in more detail. In particular, we focused on RNA from L1-RNPs, which are enriched for 7.5-kb, full-length L1 transcripts. The ORF1 coding capacity of these transcripts was examined in order to identify the pool of L1 elements that express ORF1 protein in F9 cells and, therefore, are likely to represent active elements.
Figure 1:
Structure of mouse L1 and location of probes used for this study. The structure of L1MdA2 (15) is shown, including the location of relevant restriction endonuclease cleavage sites; A indicates the A-rich tail at the 3' end of the element; the gray lines on the extreme ends indicate sequence in which the A2 element was inserted (target sequence). The line labeled A2 represents the in vitro transcript from pVK15 linearized with Esp3I, used for in vitro translation of ORF1. Locations and strand specificity of the oligonucleotides used for PCR and DNA sequencing are indicated by the arrows marking their 3' end.
Oligonucleotides were purchased from DNA Express (Colorado State University) for use as PCR or sequencing primers. Oligonucleotides o21 (5'-GAAGAACAAGCTTTTAACAGTG-3') and o22 (5'-GAGTTGGAATTCTGTTCTTGTGG-3') were used for the PCR amplification. Purified plasmids were sequenced with the T7 and T3 promoter primers (Promega) and o2 (5'-TTAGTTCTAGTATGGTTT-3').
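The two PCR primers can be inspected for embedded restriction sites with a short script. The following is a minimal sketch, not part of the original methods: it scans the primer sequences given above for the HindIII (AAGCTT) and EcoRI (GAATTC) recognition sequences. The function name `find_sites` and the choice of these two enzymes are illustrative assumptions; whether either site was used for cloning is not stated in this section.

```python
# Recognition sequences of two common restriction enzymes.
# Both are palindromic, so scanning one strand is sufficient.
SITES = {"HindIII": "AAGCTT", "EcoRI": "GAATTC"}

# Primer sequences as given in the text, written 5' to 3'.
PRIMERS = {
    "o21": "GAAGAACAAGCTTTTAACAGTG",
    "o22": "GAGTTGGAATTCTGTTCTTGTGG",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site found in seq."""
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

Under these assumptions, o21 carries a HindIII recognition sequence and o22 an EcoRI recognition sequence, each embedded in the primer rather than at its 3' end.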
These transcripts were used for in vitro translation in rabbit reticulocyte lysates (Promega) in the presence of [35S]methionine (Amersham). Proteins were analyzed by SDS-PAGE following immunoprecipitation. Gels were treated for fluorography in EN3HANCE (DuPont), then dried and exposed to Hyperfilm-MP (Amersham).
The organization of a mouse L1 element and the location of probes used for these studies are shown in Fig. 1.
Figure 2:
Multiple forms of L1 ORF1 protein in F9 cells. Arrows indicate ORF1 proteins. A, immunoprecipitation using ORF1 antibody following in vitro transcription and translation of L1MdA2 (lane 1, A2 in Fig. 1) or from cytoplasmic extracts of F9 cells after metabolic labeling (lane 2). Preimmune serum was used for the immunoprecipitation of the F9 cell extract in lane 3. B, Western blot analysis of immunoprecipitated proteins detecting ORF1 antibody with alkaline phosphatase-conjugated goat anti-rabbit IgG. F9 cytoplasmic extract was immunoprecipitated with ORF1 antibody (lane 2) or a 5-fold excess of preimmune IgG (lane 1). The stronger upper band, particularly evident in lane 1, is the IgG used for immunoprecipitation. C, Western blot of F9 cytoplasmic extract using ORF1 antibody and 125I-protein A (lane 1). Lane 2 contains the 44.6-kDa ORF1 fusion protein expressed in bacteria (7). A larger, 53-kDa protein (lane 1) was variably observed. p43 and p43.5 are not resolved on this gel.
Quantitative phosphorimage analysis of the ORF1 proteins that are immunoprecipitated from F9 cells and detected by Western blotting with 125I-protein A (Fig. 2C, lane 1) reveals that the larger two forms (43 and 43.5 kDa) are approximately 5-fold more abundant than the smaller form. All three of these ORF1 polypeptides are rare in F9 cells; comparison of the signals obtained on Western blots for known amounts of F9 protein extract and bacterially expressed ORF1 fusion protein indicates that the three forms of ORF1 in F9 cells together account for less than 0.001% of the protein recovered in the cytoplasmic extract (Fig. 2C).
Initial attempts to detect L1 ORF1 protein following in vitro translation of poly(A) RNA from F9 cells and immunoprecipitation with ORF1 antibody were not successful, in spite of an efficient in vitro translation reaction. This is most likely due to a combination of the relatively low abundance of L1 RNA in the RNA population and its poor translation efficiency. The presence of a C at -3 relative to the start of translation results in a weak Kozak consensus sequence (19) for the subgroup of L1 that has been reported to be predominantly transcribed in F9 cells (subgroup 1 (14)).
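The position numbering used in this argument can be made concrete with a small helper. The sketch below is an illustration only: it reports the bases at -3 and +4 around a start codon, using the usual convention that the A of ATG is +1 and the base immediately upstream is -1. The function name `start_codon_context` and both example sequences are hypothetical; they are not taken from any L1 element.

```python
PURINES = {"A", "G"}


def start_codon_context(seq, atg_index):
    """Report the -3 and +4 bases around the ATG starting at atg_index (0-based).

    The A of ATG is position +1 and the base immediately upstream is -1,
    so -3 is three bases upstream of the A and +4 follows the G of ATG.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("need 3 nt upstream and 1 nt downstream of the ATG")
    minus3 = seq[atg_index - 3]
    plus4 = seq[atg_index + 3]
    return {
        "minus3": minus3,
        "plus4": plus4,
        "purine_at_minus3": minus3 in PURINES,
        "g_at_plus4": plus4 == "G",
    }


# Hypothetical contexts, invented for illustration only.
favorable = "GCCACCATGGCG"    # A at -3, G at +4
unfavorable = "GCCCCCATGTCG"  # C at -3, T at +4

if __name__ == "__main__":
    for label, seq in (("favorable", favorable), ("unfavorable", unfavorable)):
        print(label, start_codon_context(seq, seq.find("ATG")))
```

A pyrimidine at -3, as in the second example, is the feature that the text associates with a weak initiation context.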
To enrich the relative abundance of L1 sequences in the RNA for in vitro translation, poly(A) RNA from L1 RNP was used (Fig. 3). L1 RNP were prepared by sucrose gradient fractionation; this preparation is known to be enriched in full-length 7.5-kb L1 RNA, relative to truncated versions of L1 and the majority of other cellular mRNAs (e.g. actin (6)). A rabbit reticulocyte lysate programmed with this poly(A) RNA (isolated from about 10 cells) reproducibly yielded two proteins (p41 and p43) that could be immunoprecipitated from a more complex mixture of proteins with ORF1 antibody, but not with preimmune IgG (Fig. 4). Mixing experiments demonstrate that p41 comigrates with ORF1 protein translated in vitro from an L1MdA2 RNA template (data not shown). It seems unlikely that p41 is posttranslationally modified in the reticulocyte lysate after translation from L1 RNP RNA to give rise to p43, because there is no evidence for such a modification of the p41 translated from L1MdA2 RNA in a parallel reaction. Since p41 appears to be identical to ORF1 from L1MdA2, based on immunoreactivity and electrophoretic mobility, p41 and p43 must be encoded by distinct mRNAs.
Figure 3:
Northern blot of L1 RNA from F9 cells. Lanes contain poly(A) RNA isolated from sucrose gradient fractions enriched in L1-RNP (lane 1) or from total RNA from F9 cells (lane 2). 180 pg of 6.0-kb RNA transcribed in vitro (A2 in Fig. 1) was loaded in lane 3. The blot was hybridized to probe 1 (Fig. 1). Arrow indicates the 7.5-kb L1-RNA.
Figure 4:
SDS-PAGE of proteins translated in vitro. The translation was programmed with poly(A) RNA from L1 RNP (the same as shown in Fig. 3, lane 1) and immunoprecipitated with preimmune IgG (lane 1) or ORF1 antibody (lane 2). The total translation reaction is shown in lane 3. Arrows indicate the two proteins that reproducibly immunoprecipitated with ORF1 antibody, p41 and p43.
As was observed with proteins isolated from F9 cells, there is approximately 5-fold more 43-kDa ORF1 protein than 41-kDa ORF1 protein after in vitro translation (Fig. 2 and Fig. 4). Because of the different methods of protein detection, these quantitation results are tentative; nevertheless, the fact that there is the same ratio between p41 and p43 in two different types of experiment suggests that RNAs encoding both p41 and p43 are similarly enriched during RNP fractionation. This assumes that the relative efficiency of translation of the two messages is the same in reticulocyte lysates and in F9 cells.
Figure 5:
RT-PCR amplification of the ORF1 region of RNA isolated from L1 RNP. Ethidium bromide-stained agarose gel shows DNA digested with HindIII (lane 1), the RT-PCR reaction (lane 2), and the corresponding PCR reaction in the absence of reverse transcriptase (lane 3).
The PCR-amplified fragments were cloned into pBluescript. 32 clones, known to contain L1 ORF1 by hybridization to probe 1 (Fig. 1), were pooled into six groups (four to six clones in each group) and characterized by in vitro transcription followed by in vitro translation. Three of the six groups contained clones encoding a larger form of ORF1 than the one encoded by L1MdA2 (41.3 kDa). Further analysis of the coding capacity of individual clones from two of these groups revealed that seven (cD33, cD36, cD37, cD40, cD42, cD46, and cD47) out of 10 encode a protein that comigrates with ORF1 translated in vitro from L1MdA2 (Fig. 6A). Of the remaining three, two clones (cD35 and cD43) encode a 42.0-kDa protein, and the third, cD39, encodes a 42.5-kDa protein (Fig. 6A). To compare the mobility of this 42.5-kDa protein directly with p43, the two were mixed and examined by SDS-PAGE. The result of the mixing experiment demonstrates that these two proteins are distinct (Fig. 6B). Thus, the majority of ORF1-containing clones that were isolated by RT-PCR and studied by in vitro translation encode a protein whose mobility on SDS-PAGE is indistinguishable from the ORF1 encoded by L1MdA2, and none of them encode p43.
Figure 6:
In vitro translation of L1 ORF1 clones. A, SDS-PAGE of ORF1 proteins encoded by the individual, cloned PCR-amplified fragments (indicated by their cD number). Lanes marked A2 or FP1 contain the products of in vitro transcription and translation from pVK15 or the construct used to make the ORF1 antibody (7), respectively. B, SDS-PAGE of proteins translated in vitro from A2, cD39, and poly(A) RNA from RNP. All proteins were immunoprecipitated with ORF1 antibody prior to SDS-PAGE.
Figure 7:
DNA sequence alignment of L1 ORF1 clones in the NH2-terminal variable domain. Numbering starts with the A in the first ATG (bold). The cDNAs, which represent cloned RT-PCR amplified fragments, are numbered between 33 and 47. Only the positions that differ from the ORF1 consensus sequence (14) are shown. The dotted line indicates the region of the 42-bp insertion that defines subgroup 2. Slash (/) denotes the end of the determined sequence. The sequences of cD35 between +1 and +130, and cD47 5' to +81, were not determined. The HindIII site that was introduced in the sequence during PCR amplification is underlined.
The sequence of 198 bp from the 3' end of seven of these cDNA clones was determined using the T3 primer. No amino acid replacements were detected in the carboxyl-terminal region (153 bp) of ORF1 between L1MdA2 and cD clones 33, 37, 40, 46, and 47. The only silent substitution is found 18 bp upstream of the termination codon in cD40. The two clones encoding larger forms of ORF1, cD39 and cD43, contain 2- and 3-bp substitutions, respectively, in this region.
Sucrose gradient fractionation of F9 cell extracts leads to an enrichment of full-length L1 RNA in a high molecular weight nucleoprotein complex, L1 RNP (6). In vitro translation of poly(A) RNA isolated from these RNP yields two proteins that are recognized by ORF1 antibody and have identical mobilities by SDS-PAGE to the proteins from F9 cells (p41 and p43). A third form of ORF1 (p43.5) is detected in F9 cell extracts but not after in vitro translation of L1 RNP RNA and is likely to result from posttranslational modification of p43.
More detailed characterization of the L1 RNA present in RNP by in vitro translation, RT-PCR, and DNA sequence analysis demonstrates that it is a heterogeneous population of L1 sequences, rather than a single RNA species. Clearly, the RNP RNA contains distinct transcripts that specify translation of the 41- and 43-kDa forms of ORF1, yet clones that encode p43 were not isolated in this study. The most likely explanation for the absence of clones encoding p43 after RT-PCR is that those cDNA(s) were not amplified with one or both of the primers that were used for the PCR. p43 could be encoded by a retrotransposed, truncated L1 (retaining enough of ORF1 to be recognized by the antibody) that has acquired a new 5' sequence that includes a promoter for transcription in F9 cells. Alternatively, p43 could be encoded by an entirely new family of L1 that is represented by a small number of copies in the genome or by a rearranged element, or it could be an unrelated protein that fortuitously shares an epitope with L1. The latter possibility is unlikely because the mRNAs encoding p41 and p43 cofractionate as RNP through sucrose gradients in conditions where polysomes are known to be disrupted. Additional experiments to isolate and characterize the sequence(s) encoding p43 are in progress.
L1 RNP transcripts that encode a 41-kDa form of ORF1 protein are heterogeneous, since six out of seven of them contain at least one amino acid substitution relative to each other and to L1MdA2. This high frequency of replacement substitutions (6 replacement/2 silent) is expected from a coding sequence evolving in the absence of selective constraint (20) and is also consistent with the high variability observed between species in the NH2-terminal region of L1 ORF1 (21). Thus, it seems likely that this part of the protein has little impact on its functional role.
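The distinction between replacement and silent substitutions can be illustrated with a short script. This is a minimal sketch under stated assumptions, not the analysis used in this study: it compares two aligned, in-frame coding sequences codon by codon using the standard genetic code and counts differing codons (not individual nucleotides) as replacement or silent. The function name `classify_substitutions` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitutions(ref, alt):
    """Count differing codons between two aligned, in-frame coding sequences.

    Returns (replacement, silent): codons whose encoded amino acid changes,
    and codons that differ without changing the amino acid.
    """
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    replacement = silent = 0
    for i in range(0, len(ref), 3):
        r, a = ref[i:i + 3], alt[i:i + 3]
        if r == a:
            continue
        if CODON_TABLE[r] == CODON_TABLE[a]:
            silent += 1
        else:
            replacement += 1
    return replacement, silent


if __name__ == "__main__":
    # Hypothetical sequences: GCT -> GCC keeps Ala; AAA -> AGA changes Lys to Arg.
    print(classify_substitutions("ATGGCTAAA", "ATGGCCAGA"))
```

Counting per codon keeps the sketch simple; a codon that carries two nucleotide differences is counted once here, which is a simplification relative to per-site methods.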
ORF1 was intact in all 10 clones that were tested individually in this study using in vitro translation. This result contrasts with the results of a previous study, where 2 out of 10 ORF1 sequences contained a single nucleotide deletion that disrupted the coding sequence of ORF1 (14). Seven of the remaining sequences were open throughout the region studied, but this region was too small to conclude that ORF1 is intact, and coding capacity was not evaluated by translation (14). The relatively higher proportion of clones with an intact ORF1 in cDNAs derived from poly(A) RNA derived from L1 RNP compared to F9 suggests that there is an enrichment for transcripts that encode intact ORF1 in RNP over a more heterogeneous population of L1 transcripts present in F9 cells.
A similar story emerges from studies of human L1. A protein corresponding to ORF1 could not be translated in vitro from two of five cDNA clones that were derived from full-length RNA from NTera2D cells (8, 9). The ORF1 proteins encoded by the remaining three cDNA clones differed in mobility on SDS-PAGE from endogenous ORF1 protein isolated from the same cells, yet the ORF1 encoded by a known, active human L1 had the same mobility on SDS-PAGE as the endogenous ORF1 in NTera2D cells (22). This result again suggests some type of selection for a subset of the transcribed sequences leading to production of ORF1 protein and active transposition. The two elements of human L1 that are known to be active for transposition contain intact ORF1 (2, 3).
These observations, together with sequence analysis which supports evolutionary arguments that active elements dominate the retrotransposition process (1), suggest that retrotransposition is most likely to occur when L1 ORF proteins are supplied in cis from the L1 transcript. This leads to the hypothesis that L1 RNP, likely intermediates in L1 transposition, are assembled co-translationally, and makes the prediction that only L1 transcripts encoding functional ORF1 can be assembled into RNP complexes. This prediction is supported by the results of this study that demonstrate an enrichment for intact ORF1s in the full-length L1 transcripts found in RNPs. Thus, it appears likely that selection of full-length transcripts with intact ORF1 coding capacity from L1 RNP provides a useful handle for isolation of an active version of mouse L1.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank/EMBL Data Bank with accession number(s) U16662-U16672.