(Received for publication, May 18, 1995)
Replication of kinetoplast DNA minicircles of trypanosomatids initiates at a conserved 12-nucleotide sequence, termed the universal minicircle sequence (UMS, 5′-GGGGTTGGTGTA-3′). A single-stranded nucleic acid binding protein that binds specifically to this origin-associated sequence was purified to apparent homogeneity from Crithidia fasciculata cell extracts. This UMS-binding protein (UMSBP) is a 27.4-kDa dimer of 13.7-kDa protomers. UMSBP binds single-stranded DNA as well as single-stranded RNA, but not double-stranded or four-stranded DNA structures. Stoichiometry analysis indicates that UMSBP binds the UMS site as a protein dimer. The five CCHC-type zinc finger motifs of UMSBP, predicted from its cDNA sequence, are similar to the CCHC motifs found in retroviral Gag polyproteins. The remarkable conservation of this motif in a family of proteins found in eukaryotes ranging from yeast and protozoa to mammals is discussed.
Kinetoplast DNA (kDNA) is a unique extrachromosomal DNA network found in the single mitochondrion of parasitic flagellated protozoa of the family Trypanosomatidae. In Crithidia fasciculata, kDNA consists of about 5,000 DNA minicircles (2.5 kilobase pairs each) and about 50 DNA maxicircles (37 kilobase pairs each), topologically interlocked to form a huge DNA network (for review, see Refs. 1, 2, 3). Minicircles are heterogeneous in nucleotide sequence but contain two short sequences, 70–100 base pairs apart, that are conserved in all species studied so far: the dodecamer known as the universal minicircle sequence (UMS), 5′-GGGGTTGGTGTA-3′, and the hexamer 5′-ACGCCC-3′.
On the basis of in vivo observations, Englund and co-workers (4, 5, 6, 7, 8) have described kDNA minicircle replication as a process in which individual minicircles are detached from the central zone of the disc-shaped network, replicated, and reattached at the periphery of the disc. The network grows until it has doubled in size and then divides and segregates into two daughter networks. Extensive studies of minicircle replication intermediates (9, 10, 11, 12, 13, 14, 15, 16) have suggested that replication begins at the UMS site with synthesis of an RNA primer and proceeds by continuous elongation of the leading light strand (L-strand). A single gap of 6–10 nucleotides remains in the newly synthesized L-strand at the UMS site (13) and is repaired only after the minicircles have been replicated and reattached to the network (8). Discontinuous synthesis of the lagging heavy strand (H-strand) starts when its origin, containing the conserved hexamer sequence, is exposed by the advancing replication fork, generating highly gapped and nicked nascent H-strands.
We have previously reported on the recognition of UMS by a unique sequence-specific single-stranded DNA binding protein from C. fasciculata (17) and on the isolation and analysis of the UMSBP-encoding cDNA (18). The amino acid sequence of the polypeptide, predicted from the cDNA, is 116 residues long and contains five Cys-X₂-Cys-X₄-His-X₄-Cys (CCHC)-type zinc finger motifs. CCHC-type zinc finger motifs have been found in one or two copies in retroviral nucleocapsid proteins and their Gag precursors (19), and in proteins of plant viruses and of eukaryotic cells. It has been suggested that this type of zinc finger is involved in binding single-stranded nucleic acids (20). UMSBP belongs to a distinct group of cellular proteins that contain several (5–9) adjacent CCHC motifs, including cellular nucleic acid binding protein (CNBP) from human and mouse, which binds a G-rich single-stranded sequence of a sterol regulatory element (21); hexamer binding protein (HEXBP) from Leishmania major, which binds a G-rich single-stranded repeated sequence found in the 5′-untranslated region of the gene encoding GP63 (22); byr3 from Schizosaccharomyces pombe (23); and CnjB from Tetrahymena thermophila (24). The structure of the CCHC motifs of the HIV-1 Gag polyprotein was determined by NMR spectroscopy (25, 26, 27, 28). These studies revealed a very compact and well-defined structure, stabilized by coordination of the zinc ion by the three cysteines and the histidine and by extensive internal hydrogen bonding.
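As an illustration of the motif definition above, the Cys-X₂-Cys-X₄-His-X₄-Cys spacing can be written as a regular expression. The Python sketch below is a minimal scan for non-overlapping matches; the function name and the demonstration sequence are hypothetical and are not taken from UMSBP or from any of the proteins cited.

```python
import re

# CCHC zinc finger consensus: Cys-X2-Cys-X4-His-X4-Cys, where X is any residue
CCHC_PATTERN = re.compile(r"C.{2}C.{4}H.{4}C")


def find_cchc_motifs(protein_seq):
    """Return (0-based start, motif) pairs for non-overlapping CCHC matches."""
    seq = protein_seq.upper()
    return [(m.start(), m.group()) for m in CCHC_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical demonstration sequence, not a real protein
    demo = "MAKCYKCGEKGHFARECTPA"
    for start, motif in find_cchc_motifs(demo):
        print(start, motif)
```

A pattern match only flags the cysteine/histidine spacing; whether a match actually folds and coordinates zinc is a separate, experimental question.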
Here we describe the purification to apparent homogeneity of UMSBP from C. fasciculata cell extracts, together with the physical characteristics and specific nucleic acid binding properties of the protein. Finally, we discuss the sequence and structural conservation of the CCHC-type zinc finger motifs of UMSBP and other cellular proteins in relation to the structure and nucleic acid binding properties of the homologous retroviral motif.
The cleared cell lysate was fractionated with ammonium sulfate and then subjected to hydrophobic chromatography on phenyl-Sepharose and adsorption chromatography on hydroxyapatite. The subsequent chromatofocusing step separated two forms of UMSBP (Table 1, Fractions VIa and VIb) with estimated pI values of 7.25 and 6.25 and apparent polypeptide masses of 12.6 and 13.7 kDa, respectively (under denaturing and reducing conditions) (Fig. 1). Partial sequencing of the shorter polypeptide chain by Edman degradation revealed that it lacks 11 amino acid residues at the N terminus relative to the sequence of the cDNA ORF (18). The identity of the longer polypeptide was verified by peptide mapping of the two polypeptide chains (not shown). Only the longer polypeptide was observed, co-chromatographing with UMS-binding activity, at the purification steps prior to chromatofocusing (Fraction VI). We therefore presume that the shorter polypeptide, which has a significantly lower binding affinity for UMS DNA (not shown), is a degradation product of the full-length protein formed at this stage of the procedure. The final chromatography on hydroxyapatite was carried out separately for the two forms of the protein, recovering approximately 7% (Fraction VIIa) and 5% (Fraction VIIb) of the overall UMS-binding activity measured in the cell lysates. Nevertheless, a minor amount of the shorter polypeptide is still present in the final UMSBP preparation (Fraction VIIb). Apparently homogeneous UMSBP preparations were stable for at least one year at −75 °C in the presence of 2 mM Mg²⁺.
Figure 1: SDS-polyacrylamide gel electrophoresis of UMSBP. A sample of 120 ng of protein Fraction VIIb and molecular weight markers were electrophoresed in an SDS-polyacrylamide gel composed of a 16.5% T, 3% C separating gel, a 10% T, 3% C spacer gel, and a 4% T, 3% C stacking gel, following the procedure of Schägger and von Jagow (29) as described under "Experimental Procedures." Protein markers were prestained ovalbumin (46 kDa), carbonic anhydrase (30 kDa), trypsin inhibitor (21.5 kDa), and aprotinin (6.5 kDa) (Rainbow, Amersham Corp.), and 17-, 14.4-, 10.6-, and 8.2-kDa myoglobin fragments (SDS 17 S, Sigma).
Figure 2: Gel filtration of UMSBP. UMSBP and protein size markers were filtered through a G3000 SW HPLC column as described under "Experimental Procedures." Protein markers and their Stokes radii were bovine serum albumin (35.5 Å), bovine pancreas chymotrypsinogen (22.4 Å), bovine lactalbumin (20.1 Å), and horse cytochrome c (16.4 Å). The void volume, $V_0$, was determined using bovine thyroglobulin (669 kDa). UMSBP was detected by the standard mobility-shift assay and protein markers by absorbance. The Stokes radius of UMSBP was interpolated from the linear plot of $(-\log K_{av})^{1/2}$ versus the known Stokes radii of the protein markers, as described by Siegel and Monty (34).
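The calibration described in this legend can be sketched as follows, assuming $K_{av} = (V_e - V_0)/(V_t - V_0)$ and a least-squares straight line of marker Stokes radius against $(-\log K_{av})^{1/2}$. The elution volumes, the total column volume $V_t$, and the function names are hypothetical inputs. Base-10 logarithms are used; another base would only rescale the x axis by a constant and leave the interpolated radius unchanged.

```python
import math


def k_av(v_e, v_0, v_t):
    """Gel filtration partition coefficient K_av = (Ve - V0) / (Vt - V0)."""
    return (v_e - v_0) / (v_t - v_0)


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def stokes_radius(marker_radii, marker_ve, v_0, v_t, sample_ve):
    """Interpolate a Stokes radius from a (-log K_av)^1/2 calibration line."""
    def transform(v_e):
        return math.sqrt(-math.log10(k_av(v_e, v_0, v_t)))

    slope, intercept = linear_fit([transform(v) for v in marker_ve], marker_radii)
    return slope * transform(sample_ve) + intercept


if __name__ == "__main__":
    # Marker Stokes radii (angstroms) from the legend; all volumes (ml) are hypothetical
    radii = [35.5, 22.4, 20.1, 16.4]
    elution_ml = [8.4, 9.6, 9.9, 10.4]
    print(round(stokes_radius(radii, elution_ml, v_0=6.0, v_t=14.0, sample_ve=9.8), 1))
```

The sample elution volume must fall between $V_0$ and $V_t$ (so that $0 < K_{av} < 1$); otherwise the logarithm or square root is undefined.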
Figure 3: Glycerol gradient sedimentation of UMSBP. UMSBP and protein size markers were centrifuged in a 10–30% (v/v) glycerol gradient and assayed as described under "Experimental Procedures." Protein markers and their sedimentation coefficients (s) were E. coli DNA polymerase I (5.6 S), human hemoglobin (4.13 S), horseradish peroxidase (3.85 S), and horse cytochrome c (2.1 S). The sedimentation coefficient of UMSBP was interpolated from the linear plot of the s values of the markers against their positions in the gradient, as described by Martin and Ames (35).
These data suggest that the native C. fasciculata UMSBP is a homodimer with a protomer mass of 13.7 kDa. On the basis of the protein activity in cell extracts, the apparent molecular weight, and the specific activity of the pure protein, we estimate the presence of approximately 12,000 UMSBP molecules/Crithidia cell.
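The step from a Stokes radius a and a sedimentation coefficient s to a native mass is conventionally made with the Siegel–Monty relation $M = 6\pi\eta N a s / (1 - \bar{v}\rho)$. The sketch below applies it, together with the copy-number arithmetic behind the per-cell estimate. It assumes a partial specific volume of 0.73 cm³/g, a solvent density of 1.0 g/cm³ and a viscosity of 0.01 poise (typical values, not given in the text); all function names and example inputs are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def siegel_monty_mass(stokes_radius_angstrom, s_svedberg,
                      partial_specific_volume=0.73,  # cm^3/g (assumed typical protein value)
                      solvent_density=1.0,           # g/cm^3 (assumed)
                      viscosity_poise=0.01):         # g cm^-1 s^-1 (assumed, ~water at 20 degC)
    """Native mass in g/mol (Da): M = 6*pi*eta*N*a*s / (1 - vbar*rho)."""
    a_cm = stokes_radius_angstrom * 1e-8   # 1 angstrom = 1e-8 cm
    s_sec = s_svedberg * 1e-13             # 1 S = 1e-13 s
    buoyancy = 1.0 - partial_specific_volume * solvent_density
    return 6.0 * math.pi * viscosity_poise * AVOGADRO * a_cm * s_sec / buoyancy


def subunits(native_mass_da, protomer_mass_da):
    """Number of protomers implied by the native mass (nearest integer)."""
    return round(native_mass_da / protomer_mass_da)


def molecules_per_cell(total_units, units_per_mg, mass_da, n_cells):
    """Copies per cell from total lysate activity and the specific activity of pure protein."""
    grams = total_units / units_per_mg * 1e-3   # mg -> g
    return grams / mass_da * AVOGADRO / n_cells
```

With measured values, `subunits(siegel_monty_mass(a, s), 13700)` would give the oligomeric state. For the copy number, the mass passed in should match the unit (protomer or dimer) that the activity and specific activity refer to.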
Figure 4: Determination of the equilibrium binding constants of UMSBP·DNA complexes. Samples containing serial dilutions of both UMSBP and DNA ligand (at a constant molar ratio) were analyzed by the mobility-shift assay, following the procedure of Fried and Crothers (31) and Liu-Johnson et al. (32), as described under "Experimental Procedures." The DNA substrates were UMS-H12, a 12-mer oligonucleotide representing the H-strand of UMS; UMS-H40, a 40-mer oligonucleotide containing the H-strand of UMS and flanking sequences from a C. fasciculata kDNA minicircle; and TEL-12, a 12-mer oligonucleotide (5′-GGGGTTGGGGTT-3′) that contains two telomeric repeats from T. thermophila. Data were analyzed by plotting $(1 - r)(\alpha - r)/r$ versus $1/[\mathrm{DNA}_T]$ and adjusting $\alpha$ to obtain a y intercept of 0, where $r$ is the fraction of DNA radioactivity in the band representing the protein-DNA complexes, $[\mathrm{DNA}_T]$ is the total concentration of DNA in the reaction, and $\alpha$ is the unknown but constant molar ratio of active protein to total DNA. The reciprocals of the slopes yield $K = 2.5 \times 10^{\ldots}\ \mathrm{M}^{-1}$ for the interaction of UMSBP with UMS-H12, $K = 2.6 \times 10^{\ldots}\ \mathrm{M}^{-1}$ with UMS-H40, and $K = 4.1 \times 10^{\ldots}\ \mathrm{M}^{-1}$ with TEL-12. (TEL-12 concentrations account only for the fraction of monomeric molecules, since the G-quartets are not bound by UMSBP; see Fig. 5.)
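The plot in this legend follows from a 1:1 binding equilibrium. If the active protein concentration is $\alpha[\mathrm{DNA}_T]$, mass action gives $K = r / \big((1 - r)(\alpha - r)[\mathrm{DNA}_T]\big)$, which rearranges to $(1 - r)(\alpha - r)/r = 1/(K[\mathrm{DNA}_T])$. The line therefore passes through the origin only at the correct $\alpha$, and its slope is $1/K$. Because the y value is linear in $\alpha$, the zero-intercept $\alpha$ can be solved directly instead of by trial adjustment. The sketch below assumes a single protein binding unit per DNA. It checks itself on synthetic data generated with a hypothetical K = 1e9 M⁻¹ and α = 0.8, not the values measured here.

```python
import math


def fraction_bound(dna_total, alpha, k_assoc):
    """Exact fraction r of DNA in a 1:1 complex; active protein = alpha * dna_total (M)."""
    protein_total = alpha * dna_total
    b = k_assoc * (protein_total + dna_total) + 1.0
    disc = b * b - 4.0 * k_assoc ** 2 * protein_total * dna_total
    # Physical (smaller) root of K x^2 - b x + K P_T D_T = 0, written to avoid cancellation
    complex_conc = 2.0 * k_assoc * protein_total * dna_total / (b + math.sqrt(disc))
    return complex_conc / dna_total


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_binding_constant(dna_totals, r_values):
    """Return (K, alpha) such that (1-r)(alpha-r)/r vs 1/[DNA_T] has a zero intercept."""
    xs = [1.0 / d for d in dna_totals]
    # y = alpha*u - v, so the fitted slope and intercept of y are linear in alpha
    u = [(1.0 - r) / r for r in r_values]
    v = [1.0 - r for r in r_values]
    slope_u, icpt_u = linear_fit(xs, u)
    slope_v, icpt_v = linear_fit(xs, v)
    alpha = icpt_v / icpt_u               # makes alpha*icpt_u - icpt_v == 0
    slope = alpha * slope_u - slope_v     # slope of y, equal to 1/K
    return 1.0 / slope, alpha


if __name__ == "__main__":
    # Synthetic serial dilution at a constant ratio; K and alpha are hypothetical
    dilutions = [0.2e-9, 0.5e-9, 1e-9, 2e-9, 5e-9]   # total DNA, M
    r_obs = [fraction_bound(d, alpha=0.8, k_assoc=1e9) for d in dilutions]
    k_fit, alpha_fit = fit_binding_constant(dilutions, r_obs)
    print(f"K = {k_fit:.3g} M^-1, alpha = {alpha_fit:.3f}")
```

With real gel data, r carries measurement noise, so the recovered α and K are least-squares estimates rather than exact values.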
Figure 5: UMSBP binds a single-stranded but not a four-stranded DNA structure. 0.1 ng (23 fmol) of 5′-[³²P]-labeled TEL-12 (5′-GGGGTTGGGGTT-3′), which contains two telomeric repeats of T. thermophila, was incubated with 2.6 (lane a), 8.1 (lane b), and 24.2 (lane c) units of UMSBP (Fraction VIIb) and analyzed on a native polyacrylamide gel under the standard mobility-shift assay conditions. Lane d contains no UMSBP. Indicated are UMSBP·DNA complexes, free monomeric DNA molecules, and free DNA molecules that had dimerized by forming a G4 structure.
G-rich sequences similar to UMS, such as those of eukaryotic telomere termini, retroviral RNA genome dimerization sites, gene regulatory elements, and immunoglobulin switch regions, form special four-stranded (quadruplex) DNA structures in vitro. These structures, known as G-quartets or G4-DNA, are stabilized by Hoogsteen base pairing (38, 39, 40, 41, 42). Several recently discovered proteins bind specifically to these conformations (43, 44). Because UMSBP binds specifically to a G-rich ligand that could potentially form a four-stranded structure, we explored the possibility that such a conformation is recognized by the protein. Since we could not detect stable quadruplexes formed in vitro by the 12-mer UMS H-strand oligonucleotide, we used instead a similar oligonucleotide containing the repeated Tetrahymena telomeric sequence 5′-GGGGTTGGGGTT-3′. UMSBP binds tightly to this telomeric sequence (Fig. 4 and Ref. 17). This oligonucleotide adopts two DNA conformations that migrate as two distinct bands upon electrophoresis in a native polyacrylamide gel. The lower mobility band corresponds to the quadruplex structure, which is composed of two oligonucleotide molecules in a fold-back conformation (38, 45), whereas the higher mobility band represents the monomeric structure. Mobility-shift analyses (Fig. 5) clearly demonstrate that UMSBP binds only the higher mobility monomeric molecules and not the lower mobility four-stranded dimers.
No binding of a dimeric DNA ligand (such as G4-DNA) by UMSBP could be observed (Fig. 5). We nevertheless explored the possibility that a single UMSBP molecule may bind more than one UMS site simultaneously. To address this question, we used two DNA ligands that contain the 12-mer UMS sequence but differ in length. The oligonucleotide UMS-H12 contains only the 12-mer H-strand of UMS, whereas the 40-mer UMS-H40 contains the UMS 12-mer and its flanking sequence from the minicircle H-strand. Although both DNA ligands are tightly bound by UMSBP (the equilibrium binding constants measured for the two protein-DNA interactions were almost identical; Fig. 4), the two protein-DNA complexes differ in their electrophoretic mobility in native polyacrylamide gels. If UMSBP binds only one UMS site, two types of protein-DNA complexes would be expected: UMSBP·(UMS-H12) and UMSBP·(UMS-H40). If, however, the complex contains two UMS elements, three types of complexes would be expected: UMSBP·(UMS-H12)₂, UMSBP·(UMS-H40)₂, and UMSBP·(UMS-H12)(UMS-H40). Fig. 6 shows the results of such an experiment, in which the oligonucleotides UMS-H12 and UMS-H40 were mixed at the indicated molar ratios, heat-denatured to disrupt any pre-existing higher order structures, and used as radioactive probes in an electrophoretic mobility-shift experiment with UMSBP. Reciprocal titration of one UMS DNA species against the other at the various molar ratios yields only two types of protein-DNA complexes. No additional protein-DNA species could be detected, indicating that only one DNA molecule is present in the UMSBP·UMS complex.
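The logic of this mixing experiment can be made quantitative under two assumptions: the sites of a hypothetical two-site complex are filled independently, and both ligands bind with equal affinity (consistent with the nearly identical constants in Fig. 4). Complex composition then follows a binomial distribution in the UMS-H12 mole fraction. The sketch below (function name hypothetical) shows that a two-site complex would put a substantial share of complexes into the mixed species at every intermediate ratio used, peaking at one half at 1:1.

```python
from math import comb


def expected_complex_fractions(h12_parts, h40_parts, sites_per_complex):
    """Fraction of complexes carrying k UMS-H12 ligands, k = 0..n (n = sites per complex).

    Assumes each site is filled independently and both ligands bind with equal affinity,
    so the composition follows a binomial distribution in the UMS-H12 mole fraction.
    """
    p = h12_parts / (h12_parts + h40_parts)
    n = sites_per_complex
    return {k: comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(n + 1)}


if __name__ == "__main__":
    # Molar ratios (UMS-H12 : UMS-H40) from the Fig. 6 legend
    for h12, h40 in [(7, 1), (3, 1), (1.7, 1), (1, 1), (1, 1.7), (1, 3), (1, 7)]:
        one_site = expected_complex_fractions(h12, h40, 1)
        two_site = expected_complex_fractions(h12, h40, 2)
        print(f"{h12}:{h40}  species if 1 site: {len(one_site)}, "
              f"if 2 sites: {len(two_site)} (mixed fraction {two_site[1]:.2f})")
```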
Figure 6: Stoichiometry of UMS elements bound in the UMSBP·UMS complex. UMSBP (Fraction VIb, 26 units) was incubated under the standard mobility-shift assay conditions in a series of binding reaction mixtures containing a total of 46 fmol of 5′-³²P-labeled UMS-H12 and 5′-³²P-labeled UMS-H40 at the following UMS-H12/UMS-H40 molar ratios: UMS-H12 only, 7:1, 3:1, 1.7:1, 1:1, 1:1.7, 1:3, 1:7, and UMS-H40 only (lanes a–i, respectively). Reaction products were electrophoresed in a native 5% polyacrylamide gel at 4 °C and 16 V/cm for 2 h. Indicated are the UMSBP·(UMS-H40) and UMSBP·(UMS-H12) complexes and the free UMS-H12 and UMS-H40 oligonucleotides.
To determine the precise number of UMSBP monomers that bind a single UMS element in the complex, we conducted a mobility-shift electrophoresis analysis of the protein-DNA complexes using ³⁵S-labeled UMSBP and 5′-³²P-labeled UMS DNA. We measured a value of 2.1 UMSBP monomers per UMS site (Fig. 7), indicating that two UMSBP monomers bind each UMS site and suggesting that UMSBP binds DNA as a protein dimer.
Figure 7: Stoichiometry of the protein monomers in the UMSBP·UMS DNA complex. UMSBP was prepared by specific proteolytic cleavage of a UMSBP-glutathione S-transferase fusion protein expressed in E. coli (H. Abeliovich and J. Shlomai, manuscript in preparation) in the presence of [³⁵S]cysteine. 9.3 pmol (monomers) of ³⁵S-labeled UMSBP were incubated with 0, 0.55, 0.82, 1.24, 1.83, 2.75, and 4.12 pmol of either unlabeled or 5′-³²P-labeled UMS-H12 DNA under the standard mobility-shift assay conditions. Reaction mixtures were electrophoresed under the standard mobility-shift assay conditions (except that the TAE electrophoresis buffer was at pH 8.0). The amounts of protein and DNA molecules in the UMSBP·UMS complexes were determined using a PhosphorImager. The slope (over the range of 0–2.75 pmol of DNA) yields a ratio of 2.1 UMSBP monomers per UMS site.
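The stoichiometry read-out in this legend amounts to a least-squares slope of protein bound against DNA bound, restricted to the range where the plot is linear. The sketch below assumes that PhosphorImager signals have already been converted to pmol with a calibration factor (signal per pmol) for each isotope. The function names, the calibration factors, and the protein amounts in the usage example are hypothetical placeholders. The DNA inputs follow the legend and are treated as fully bound for illustration only.

```python
def to_pmol(signal, signal_per_pmol):
    """Convert a PhosphorImager signal to pmol using a calibration factor (signal per pmol)."""
    return signal / signal_per_pmol


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def monomers_per_site(dna_pmol, protein_pmol, max_dna=2.75):
    """Slope of protein vs DNA in the complex, using only points with DNA <= max_dna (pmol)."""
    pairs = [(d, p) for d, p in zip(dna_pmol, protein_pmol) if d <= max_dna]
    slope, _ = linear_fit([d for d, _ in pairs], [p for _, p in pairs])
    return slope


if __name__ == "__main__":
    # DNA amounts follow the legend; protein amounts are hypothetical placeholders
    dna = [0.0, 0.55, 0.82, 1.24, 1.83, 2.75, 4.12]
    protein = [0.0, 1.2, 1.6, 2.6, 3.9, 5.7, 6.3]
    print(round(monomers_per_site(dna, protein), 2))
```

Excluding the 4.12-pmol point mirrors the legend's use of the 0–2.75 pmol range, where the protein is still in excess and the plot stays linear.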
Figure 8: Conservation of CCHC-type zinc finger motifs. 35 CCHC motifs from 5 cellular proteins are compared. In a, alignment of the amino acid sequences of the 35 CCHC motifs of C. fasciculata UMSBP (18), T. thermophila CnjB (24), human and mouse CNBP (21), L. major HEXBP (22), and S. pombe Byr3 (23). Shaded background denotes conserved amino acids, and dark background indicates the cysteine and histidine residues of the CCHC motif. In b, a summary of the amino acid conservation found at each position of the CCHC motifs is shown.
Overall, we found a high degree of conservation at 13 of the 15 positions of the CCHC motifs of this family of eukaryotic cellular proteins. This remarkable conservation can be explained in light of the functions assigned by South and Summers to the same residues at the same positions of the HIV-1 Gag motif (25).
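A per-position conservation summary of the kind shown in Fig. 8b can be sketched as the fraction of aligned motifs sharing the most common residue in each column. This sketch scores identity only, and the 0.6 threshold is an arbitrary choice; it is not meant to reproduce how the figure scored conservation. The three toy motifs are hypothetical and are not among the 35 motifs compared.

```python
from collections import Counter


def column_conservation(aligned_motifs):
    """Fraction of motifs sharing the most common residue at each alignment column."""
    length = len(aligned_motifs[0])
    if any(len(m) != length for m in aligned_motifs):
        raise ValueError("motifs must be aligned to the same length")
    n = len(aligned_motifs)
    return [Counter(m[i] for m in aligned_motifs).most_common(1)[0][1] / n
            for i in range(length)]


def count_conserved(aligned_motifs, threshold=0.6):
    """Number of columns whose identity fraction reaches the (arbitrary) threshold."""
    return sum(1 for f in column_conservation(aligned_motifs) if f >= threshold)


if __name__ == "__main__":
    # Hypothetical toy alignment, not the motifs of Fig. 8
    toy = ["CYKCGEKGHFAREC", "CFNCGKPGHIAKDC", "CYRCGESGHLARDC"]
    print([round(f, 2) for f in column_conservation(toy)])
    print(count_conserved(toy))
```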
Figure 9: Binding of UMSBP to an RNA analog of the G-rich strand of UMS. 0.2 ng (46 fmol) of a 5′-³²P-labeled 12-mer deoxyoligonucleotide comprising the G-rich sequence of UMS (lanes d–f), or of its RNA analog 5′-GGGGUUGGUGUA-3′ (lanes a–c), was incubated with 20 (lanes a and d) or 60 (lanes b and e) units of UMSBP (Fraction VIIb) and analyzed on a native polyacrylamide gel under the standard mobility-shift assay conditions. Lanes c and f contain no UMSBP.
Figure 10: Binding specificity of the UMSBP-rUMS interaction. 40 units of UMSBP (Fraction VIIb) were incubated under the standard mobility-shift assay conditions with 0.09 ng (21 fmol) of the 5′-³²P-labeled UMS RNA analog (rUMS) and increasing concentrations of unlabeled rUMS or of a nonspecific RNA competitor (RNA transcripts of 0.3–7.4 kilobases; RNA molecular weight markers, Boehringer Mannheim). Reaction mixtures were analyzed in a native polyacrylamide gel and quantified as described under "Experimental Procedures." The inset shows the measurement of the equilibrium binding constant for the interaction of UMSBP and rUMS ($K = 2.0 \times 10^{\ldots}\ \mathrm{M}^{-1}$), calculated as described under "Experimental Procedures" and in the legend to Fig. 4. $[\mathrm{RNA}_T]$ is the total concentration of RNA in the reaction.
During the S-phase of the trypanosomatid cell cycle, two highly interlocked kDNA catenanes, one composed of minicircles and the other of maxicircles, replicate at the same time and at the same location. Replication and assembly of these two types of topologically linked kDNA circles therefore require strict coordination between their replication mechanisms. Recently, two copies of an 11-mer sequence identical to UMS (apart from its 3′-terminal residue) were found in the maxicircle variable region of Trypanosoma brucei (46, 47). The presence of this origin-associated sequence in both minicircles and maxicircles may provide a clue to how this coordination is achieved at the replication initiation step. A specific origin-binding protein that interacts with the origin-associated UMS is a likely candidate to function in replication initiation and may play a role in a mechanism that coordinates kDNA minicircle and maxicircle replication. It was in this context that we searched for and isolated a UMS-binding protein from C. fasciculata cell extracts. Since the 3′-terminal residue of UMS is insignificant for specific binding by UMSBP (17), we expect that both the 12-mer UMS of the minicircles and the homologous 11-mer sequence of the maxicircles would be bound equally well by the protein. The conservation of UMSBP binding sites in both maxicircles and minicircles supports a possible role for UMSBP in coordinating the replication of the two types of circles.
Since UMS resides within a duplex DNA molecule, binding by UMSBP requires local melting of this sequence. We have recently found that UMSBP binds to native DNA minicircles and that the origin-associated UMS element resides within an unwound or otherwise sharply distorted DNA structure. We have shown here that UMSBP can bind a UMS RNA analog, as predicted by the remarkable homology between the CCHC motifs of UMSBP and those of the retroviral Gag polyproteins. Whether UMS is indeed transcribed in the trypanosomatid cell, so that a UMS RNA ligand is actually available for binding by UMSBP, is not yet known. Further investigation is required to determine the in vivo binding target and the biological function of UMSBP.
G-rich sequences similar to the UMS, such as those of telomeres (38, 39), the HIV-1 RNA genome dimerization site (40, 41), the IgG switch region (43, 48), and others (44, 49, 50), form special four-stranded structures in vitro, known as quadruplexes or G-quartets. Several DNA-binding proteins were recently found to interact specifically with G-quartet structures (43, 44, 51, 52). Although UMSBP binds exclusively to the single-stranded nucleic acid conformation (Fig. 5 and Ref. 17), it may participate in regulating quadruplex formation through its high-affinity binding to the single-stranded conformation of quadruplex-forming sequences.
Local melting of the DNA double helix occurs during various cellular activities such as replication, recombination, and transcription. Single-stranded DNA and RNA binding proteins may play important roles in such cellular processes. UMSBP contains Cys-X₂-Cys-X₄-His-X₄-Cys-type zinc finger motifs, typical of proteins that bind exclusively to single-stranded G-rich nucleic acid ligands (20). It belongs to a distinct group of cellular proteins, including Leishmania HEXBP (22), human CNBP (21), yeast byr3 (23), and Tetrahymena CnjB (24), that contain several adjacent CCHC motifs. Comparison of the CCHC-type motifs of these proteins (Fig. 8) reveals a remarkably high degree of conservation at 13 of the 15 positions of this motif. Most of this conservation can be explained in light of the functions assigned by South and Summers (25) to the same residues at the same positions of the HIV-1 Gag motif. On the basis of these data, we suggest that the CCHC zinc finger motif is strictly conserved not only in its primary amino acid sequence and structure, but also in its mechanism of single-stranded nucleic acid binding. The observation that UMSBP is able to bind an RNA analog of the G-rich strand of UMS (Figs. 9 and 10) supports this notion. Whether the proteins of this well-defined group share biological functions other than binding to single-stranded nucleic acids remains to be explored.