(Received for publication, December 14, 1995; and in revised form, February 14, 1996)
From the
The promoter for the gene coding for human protein C has been characterized as to nucleotide sequences that regulate the synthesis of mRNA. The major transcription start site was found 65 nucleotides upstream from the first intron/exon boundary along with two minor sites. Functional characterization of 1528 base pairs at the 5'-end of the gene was then carried out by chloramphenicol acetyltransferase reporter assays, protection from DNase I digestion, and electrophoretic mobility shift assays employing HepG2 and HeLa cells. One of the upstream regions (nucleotides -25 to +9) contained binding sites for at least two different transcription factors, including a hepatic nuclear factor 1-binding site (-10 to +9) and two overlapping and oppositely oriented hepatic nuclear factor 3-binding sites (-25 to -11). A second major region (PCE1) (+12 to +30) appeared to be a unique, liver-specific regulatory sequence. An Sp1-binding site in exon I (+58 to +65) was also recognized by cotransfection experiments with an Sp1 expression plasmid. Specific mutations in these promoter elements reduced transcriptional activity and abolished the binding of hepatic nuclear proteins. Finally, a strong silencer element (PCS1) (between -162 and -82) and two possible liver-specific enhancer regions (PCE2 and PCE3), which interact coordinately with the promoter elements, were also found (between -1462 and -162).
Protein C is a vitamin K-dependent zymogen of a serine protease that is present in plasma (1, 2). The active form, called activated protein C, regulates the blood coagulation cascade by inactivating activated factors V and VIII through minor proteolysis (3). Protein C is synthesized in hepatocytes as a single chain precursor, which undergoes processing steps to give rise to a two-chain molecule held together by a disulfide bond. Additional post-translational modifications include carboxylation of 12 amino-terminal glutamic acid residues (4), hydroxylation of an aspartic acid residue (5, 6), and glycosylation of several amino acid residues (7). The two-chain form is converted to activated protein C by thrombin in the presence of thrombomodulin by the cleavage of a 12-residue peptide from the amino terminus of the heavy chain (2, 8). Protein C together with protein S, its cofactor, antithrombin III, and tissue factor pathway inhibitor represent major independent pathways for the regulation of blood coagulation. A deficiency of protein C constitutes a risk factor for venous thrombosis as well as other thrombotic disorders (9, 10).
A large number of mutations have been found in the genes from patients with protein C deficiency, including several in the 5'-flanking region of the gene. Recently, activated protein C resistance with a factor V Leyden mutation has been identified as a highly prevalent risk factor for thrombotic disease (11). However, individuals with a single genetic defect, such as protein C deficiency or activated protein C resistance, can be asymptomatic. Combined genetic defects often lead to a much higher thrombotic risk and support the concept that hereditary thrombophilia is often a multigenic disease (12, 13).
The gene for protein C is 11 kb in length and contains nine exons (14, 15). It is located on chromosome 2q13-q14. The gene shares significant organizational similarity with the genes coding for the other vitamin K-dependent proteins that circulate in blood. However, significant differences in the steady-state mRNA levels in liver and in the concentrations of these proteins in plasma occur.
A comparison of the 5'-flanking sequences of the protein C gene with those of the genes coding for the other vitamin K-dependent coagulation proteins indicates a significant DNA sequence divergence. Nevertheless, transcriptional regulation of these genes has certain common features. In this investigation, a number of positive elements as well as a negative element are identified that regulate the gene coding for human protein C. The data demonstrate that transcriptional activity of the TATA-less protein C promoter is largely dependent upon sequences surrounding the transcription initiation sites. Three liver-specific promoter regions are identified, including contiguous binding sites for HNF1 and HNF3 and a unique regulatory element (designated PCE1) present in exon I. Regions responsible for positive and negative regulation in the upstream enhancer region are also defined.
Mutations in the promoter-binding sites were generated in plasmids pPC-1482 and pPC-1528 using oligonucleotide-directed mutagenesis and the polymerase chain reaction technique, respectively (18). Overlapping oligonucleotides with mutations were used as primers, and pPC-1482 and pPC-1528 were used as templates. The oligonucleotides used for sequencing primers, PCR primers, and EMSAs were synthesized on an Applied Biosystems Model 380B DNA synthesizer.
Figure 1: Sequence of the 5'-end of the gene for human protein C. Bases are numbered relative to the major transcription start site (+1), marked with an asterisk. Two minor start sites are marked with double asterisks. Deletion constructs used in reporter gene assays were as follows. pPC-1482 contained protein C sequences from -1462 to +20; pPC-1528 contained sequences from -1462 to +66; and pPC-n-66 contained sequences from -n to +66. Exons are underlined. The translation start codon (ATG) is shown in boldface.
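The construct names appear to encode the insert length: -1462 to +20 gives 1482 bp and -1462 to +66 gives 1528 bp, which is consistent with a numbering scheme that has no position 0. The following is a minimal sketch of that arithmetic; the function name `insert_length` is hypothetical and the no-zero convention is an assumption inferred from the construct names, not a statement from the methods.

```python
def insert_length(start: int, end: int) -> int:
    """Length in bp of a fragment spanning start..end (inclusive).

    Positions are numbered relative to the major transcription start
    site (+1). Assumption: there is no position 0, so -1 is followed
    directly by +1.
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if start > end:
        raise ValueError("start must not exceed end")
    if start < 0 < end:
        # Fragment crosses the start site: upstream bases plus downstream bases.
        return -start + end
    # Both ends lie on the same side of the start site.
    return end - start + 1


print(insert_length(-1462, 20))  # pPC-1482
print(insert_length(-1462, 66))  # pPC-1528
```

Under the same assumption, the Sp1 site at +58 to +65 is 8 bp long and the -42 to +66 fragment carried by pPC-42-66 is 108 bp.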
A series of deletion constructs were then
generated from plasmid pPC-1528 (Fig. 2A) and tested
for activity in HepG2 and HeLa cells (Fig. 2B). A
deletion from -1462 to -723 resulted in a 30% reduction in
activity in HepG2 cells, and a further deletion to -162 caused
another 30% reduction in activity. This suggested the presence of at
least two enhancer regions (PCE3 and PCE2) between -1462 and
-162 in the promoter. These reductions in activity, resulting
from the deletions from -1462 to -162, were not observed in
the absence of the full exon I sequence (data not shown). This suggests
that the function of these enhancer elements depends upon the initial
assembly of the initiation complex on the protein C promoter. Further
stepwise deletions from -162 to -82 resulted in an increase
of 4-fold in reporter gene activity, indicating the presence of a
strong silencer element (PCS1) in this region. Deletion of the sequence
from -82 to -42 resulted in a small but reproducible
decrease in activity. Finally, a precipitous reduction in expression
occurred upon deletion of the sequence from -42 to +66
(PCE1). These experiments indicate that one or more promoter elements
are located from -42 to +66 and are functionally responsible
for high efficiency transcription. This region also contains an Sp1
consensus sequence (+58 to +65) that may play a role in this
activity (see below).
Figure 2: Transient expression of CAT activities by deletion constructs transfected into HepG2 and HeLa cells. A, a series of PC-CAT fusion constructs containing varying lengths of the protein C 5'-end sequences were transfected into HepG2 and HeLa cells. B, shown are CAT activities expressed by deletion constructs. CAT activity of pPC-1528 was arbitrarily defined as 100% in HepG2 cells and used as a reference to normalize the CAT activity data of other constructs.
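The normalization described in the legend (pPC-1528 in HepG2 cells set to 100%) can be sketched as follows. The function name `normalize_cat` and all raw values are hypothetical placeholders chosen only to illustrate the calculation; they are not measurements from this study.

```python
def normalize_cat(raw: dict[str, float], reference: str = "pPC-1528") -> dict[str, float]:
    """Express each construct's CAT activity as a percentage of the reference."""
    ref = raw[reference]
    if ref <= 0:
        raise ValueError("reference activity must be positive")
    return {name: 100.0 * value / ref for name, value in raw.items()}


# Hypothetical raw CAT signals (arbitrary units), for illustration only.
raw_hepg2 = {
    "pPC-1528": 2000.0,
    "pPC-162-66": 800.0,
    "pPC-82-66": 3200.0,
    "pCAT-0": 20.0,
}

normalized = normalize_cat(raw_hepg2)
for construct, percent in normalized.items():
    print(f"{construct}: {percent:.1f}%")

# Fold change between two constructs is the ratio of their normalized values.
fold = normalized["pPC-82-66"] / normalized["pPC-162-66"]
print(f"pPC-82-66 vs pPC-162-66: {fold:.1f}-fold")
```

Because every construct is divided by the same reference, fold changes between constructs are unaffected by the choice of reference.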
Deletion constructs were also transfected into HeLa cells to determine further which region(s) was responsible for directing the liver-specific expression of the protein C gene. Plasmid pPC-42-66, containing the region from -42 to +66, exhibited much higher CAT expression than the promoterless pCAT-0 plasmid in HepG2 cells (Fig. 2B). In contrast, little increase was observed in HeLa cells. These data indicate that the region from -42 to +66 contains strong liver-specific elements as well as other regulatory elements. Small increments with plasmid pPC-82-66 and a decrease in CAT activity with plasmid pPC-162-66 were observed in HepG2 and HeLa cells. Furthermore, the exon I-dependent enhancer activity in the region from -1462 to -162 was not observed in HeLa cells, suggesting that the enhancer elements are liver-specific regulatory sequences. Further investigation is needed to define the function of these enhancer elements.
Figure 3: DNase I footprint analyses. A, sense strand of the protein C promoter region. A DNA fragment containing the protein C promoter (from -42 to +58) was labeled at the 3'-end of the sense strand and was subjected to DNase I digestion in the absence (lane 2) and presence of HepG2 nuclear extracts (lanes 3 and 4) and HeLa nuclear extracts (lane 5). A purine-specific sequence marker (G + A; lane 1) was obtained by Maxam-Gilbert sequencing of the end-labeled fragment (55). B, antisense strand. Lane 1, G + A sequence marker; lane 2, without nuclear extracts; lane 3, with HepG2 nuclear extracts. Brackets indicate regions that are protected from DNase I digestion. C, sequence of the protected regions identified as FP-I and FP-II. Naturally occurring mutations are indicated, as are transcription start sites (*).
To evaluate the effect of the mutations on transcription of the gene in the regions identified by DNase I footprinting, the pPC-1528 construct was mutated and transfected into HepG2 cells. A mutation in the HNF1-binding site at +3 (C → T) or -2 (T → C) reduced the reporter gene activity by 90 and 85%, respectively, while a mutation in the HNF3 site at -20 (A → G) reduced the activity by 82%. Finally, mutations in the PCE1 site at +23 and +24 (GG → AA) reduced the activity by 84% compared with the wild-type promoter. These results demonstrate that specific mutations in these protein-binding sites for HNF1, HNF3, and PCE1 greatly impair the promoter for the protein C gene.
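As a cross-check of the coordinates, each mutated position can be looked up against the element boundaries given in the abstract (HNF3, -25 to -11; HNF1, -10 to +9; PCE1, +12 to +30; Sp1, +58 to +65). The sketch below is illustrative only; the names `ELEMENTS` and `elements_at` are hypothetical, and the percentage reductions are those quoted in the text.

```python
# Element boundaries (inclusive), relative to the major start site (+1),
# as stated in the abstract.
ELEMENTS = {
    "HNF3": (-25, -11),
    "HNF1": (-10, 9),
    "PCE1": (12, 30),
    "Sp1": (58, 65),
}


def elements_at(position: int) -> list[str]:
    """Return the names of all elements whose interval contains the position."""
    return [name for name, (lo, hi) in ELEMENTS.items() if lo <= position <= hi]


# Mutated positions and the reported reduction in reporter activity (%).
mutations = {3: 90, -2: 85, -20: 82, 23: 84}

for pos, reduction in mutations.items():
    residual = 100 - reduction  # activity remaining relative to wild type
    print(f"{pos:+d}: {elements_at(pos)} residual activity {residual}%")
```

Each mutated position falls inside the element it is assigned to in the text, and the residual activities range from 10 to 18% of wild type.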
Figure 4: EMSAs of protein binding at the protein C promoter region. End-labeled duplex oligonucleotides were each incubated with crude HepG2 nuclear extracts and analyzed by EMSAs. For competition, unlabeled duplex oligonucleotides in 20- or 200-fold molar excesses over labeled oligonucleotides were added to the reaction mixture 10 min before adding the labeled probe. F indicates the position of free oligonucleotide, and B indicates the position of the retarded DNA-protein complexes. A, EMSA of the ³²P-labeled PC(HNF1) oligonucleotide (lane 1) and competition with 20- and 200-fold molar excesses of unlabeled PC(HNF1) (lanes 2 and 3), PC(HNF1,m1) (lanes 4 and 5), PC(HNF1,m2) (lanes 6 and 7), and HNF1 (lanes 8 and 9) oligonucleotides; EMSAs of the ³²P-labeled PC(HNF1,m1) (lane 10) and PC(HNF1,m2) (lane 11) oligonucleotides; and EMSA of the ³²P-labeled HNF1 oligonucleotide (lane 12) and competition with a 200-fold excess of unlabeled HNF1 (lane 13) and PC(HNF1) (lane 14) oligonucleotides. B, EMSA of the ³²P-labeled PC(HNF3) oligonucleotide (lane 1) and competition with 20- and 200-fold excesses of unlabeled PC(HNF3) (lanes 2 and 3), PC(HNF3,m1) (lanes 4 and 5), and HNF3 (lanes 6 and 7) oligonucleotides and EMSA of the ³²P-labeled PC(HNF3,m1) oligonucleotide (lane 8). C, EMSA of the PCE1 oligonucleotide (lane 1) and competition with 20- and 200-fold excesses of oligonucleotides designed from binding sites for different liver-specific transcription factors: PCE1 (lanes 2 and 3), HNF1 (lanes 4 and 5), HNF3 (lanes 6 and 7), HNF4 (lanes 8 and 9), HNF5 (lanes 10 and 11), and C/EBP (lanes 12 and 13). D, EMSAs of the ³²P-labeled PCE1 oligonucleotide (lane 1) and competition with 20- and 200-fold excesses of unlabeled PCE1 (lanes 2 and 3) and PCE1,m1 (lanes 4 and 5) oligonucleotides and EMSA of the ³²P-labeled PCE1,m1 oligonucleotide (lane 6). E, EMSA of the ³²P-labeled PC(Sp1) oligonucleotide (lane 1) and competition with 20- and 200-fold excesses of unlabeled PC(Sp1) (lanes 2 and 3), PC(Sp1,m1) (lanes 4 and 5), and Sp1 (lanes 6 and 7) oligonucleotides and EMSA of the ³²P-labeled PC(Sp1,m1) oligonucleotide (lane 8).
As shown in Fig. 4A, DNA-protein complex formation by the PC(HNF1) oligonucleotide and HepG2 nuclear proteins was not influenced by the addition of mutated oligonucleotides (PC(HNF1,m1) and PC(HNF1,m2)) (lanes 4-7), but was competed and abolished by 20- and 200-fold molar excesses of unlabeled HNF1 consensus oligonucleotide (lanes 8 and 9). Furthermore, ³²P-labeled PC(HNF1) sequences that were mutated (PC(HNF1,m1) and PC(HNF1,m2)) were also unable to bind hepatic nuclear proteins (Fig. 4A, lanes 10 and 11). Finally, the retarded bands formed by the oligonucleotide containing the HNF1 consensus sequence and HepG2 nuclear proteins were competed and eliminated completely by a 200-fold molar excess of HNF1 and PC(HNF1) oligonucleotides (Fig. 4A, lanes 12-14). These results clearly indicate that a nuclear protein(s) binds to an HNF1 site in the promoter of protein C and that a single base mutation at +3 (C → T) or -2 (T → C) abolishes this binding.
The PC(HNF3) oligonucleotide was also bound to HepG2 nuclear protein, but with low affinity. However, this DNA-protein complex was competed and eliminated by a 20- or 200-fold molar excess of unlabeled PC(HNF3) oligonucleotide (Fig. 4B, lanes 6 and 7). The unlabeled mutated PCP2 oligonucleotide (PC(HNF3,m1)) was unable to compete and abolish DNA-protein complex formation (Fig. 4B, lanes 4 and 5). Also, the ³²P-labeled PC(HNF3,m1) oligonucleotide was unable to bind hepatic nuclear proteins (Fig. 4B, lane 8). These results demonstrate that this site is an HNF3-binding site and that a single base mutation at -20 (A → G) can abolish its binding to hepatic proteins.
As shown in Fig. 4C, two retarded bands were formed when the PCE1 oligonucleotide designed from the FP-I region was incubated with HepG2 nuclear extract. Oligonucleotides designed from consensus sequences for the most abundant known hepatic transcription factors, including HNF1 (23), HNF3 and HNF4 (28), HNF5 (29), and C/EBP (30), were unable to compete and eliminate the retarded complexes formed by the PCE1 oligonucleotide (Fig. 4C, lanes 4-13), indicating that this element is a unique and specific sequence. Also, the unlabeled mutated sequence (PCE1,m1) was unable to compete and eliminate the DNA-protein complexes (Fig. 4D, lanes 4 and 5). The ³²P-labeled PCE1,m1 oligonucleotide also failed to bind hepatic nuclear proteins (Fig. 4D, lane 6). These two base mutations were located at +23 and +24 (GG → AA).
This study has demonstrated that 1.5 kb of DNA from the 5'-flanking region and 66 bp from the noncoding exon I sequences of the protein C gene contain sufficient information for high level expression of the gene in HepG2 cells. The data also indicate that the protein C promoter consists of at least three liver-specific regulatory elements and one general element that together drive the high level, liver-specific expression of the gene. These elements include HNF1, HNF3, and PCE1 as well as a potential Sp1-binding site, all of which are located in the region surrounding the transcription initiation site.
HNF1, a homeodomain transcription factor, has been reported to be a major transactivator of numerous liver-specific genes and is also an activator of the protein C promoter (27). Cotransfection with HNF1 induced a 1.5-fold transactivation in the wild-type promoter and a 0.8-fold transactivation in a mutated promoter. These data are consistent with the present experiments showing that the HNF1-binding site is important for basal level transcription of the gene. Whether other factors of the HNF1 family participate in the transactivation of the protein C gene is not yet clear.
The binding affinity of hepatic nuclear protein(s) for the two HNF3 sites in the protein C promoter was quite low. Recently, it has been shown that cotransfection experiments with an HNF3 expression plasmid and the wild-type protein C promoter resulted in a 4-5-fold increase in promoter activity in HepG2 cells (34). HNF3-binding sites have been identified as essential cis-acting elements in the promoters and enhancers of several liver-specific genes. However, the transactivation by HNF3 of an HNF3-dependent minimal promoter was relatively low since it did not exceed 4-5-fold. In contrast, HNF1-dependent promoters show a >100-fold increase when excess HNF1 is present. Several laboratories have reported that an important role of HNF3 could be to cooperate with other factors bound to contiguous DNA elements, such as the glucocorticoid-responsive enhancer (29), the nuclear factor 1 element (24), the HNF4/ARP1/COUP-TF family-binding site (35), and the HNF1 element (36). The low binding of hepatic protein to the HNF3 site in the protein C gene may also be due to the absence of accessory proteins or sequences. Another proposed role for HNF3 involves the transition of chromatin from an inactive to an active conformation (37). In the case of the gene for protein C, binding of HNF3 to the PC(HNF3) sequence may contribute to opening the chromatin at or near the protein C promoter, therefore making it available for subsequent HNF1 binding and transcriptional activation.
Deletion analysis from the 3'-end revealed that the PCE1 site, a unique and liver-specific regulatory sequence, was the principal element for high efficiency transcription. Mutations at +23 and +24 (GG → AA) in this element abolished its binding to hepatic proteins and greatly decreased its transcriptional activity. Mutational analysis, cotransfection experiments, and EMSAs also showed a potential Sp1-binding site in exon I downstream from the PCE1 element. Sp1-binding sites have been shown to be important regulatory elements at transcription initiation sites for many TATA-less promoters, including the gene for factor VII (38). It is believed that the preinitiation complex is assembled around the multiple initiation sites directed by the tightly clustered regulatory elements in the proximal promoter region of the protein C gene. Any disruption in the promoter sequences surrounding the transcription initiation sites impairs the assembly of the preinitiation complex, causing a reduction in transcription efficiency. Furthermore, this promoter region is similar to the 80-bp enhancer region described for the prothrombin gene in which the HNF1-binding site is flanked on the 3'-side by Sp1 sequences (Fig. 5).
Figure 5: Schematic comparison of the known transcription regulatory sites present in the genes coding for the human vitamin K-dependent coagulation proteins. Red inverted triangles correspond to silencer or repressor elements. Upright triangles correspond to positive regulatory elements. Black triangles indicate elements with no known homologous sequences. Other elements are labeled according to their corresponding transcription factors. FVII, factor VII; FIX, factor IX; FX, factor X; PC, protein C; NF-1, nuclear factor 1.
cis-Acting elements located upstream from the promoter can also modulate the promoter activity. The upstream -162/-82 fragment decreases the activity of the strongly active pPC-82-66. This reduction, observed in HepG2 cells as well as in HeLa cells, may be due to a silencer element interacting with ubiquitous factors or to other effects, such as steric hindrances exerted on promoter elements. A possible HNF4-like element from -131 to -116 is also located in this region. Polymorphism in this region has been described to affect plasma protein C levels in the population (39). Further work is needed to elucidate the role of negative regulation in protein C gene expression.
One particular feature of the protein C gene among the vitamin K-dependent genes is that it contains an additional short noncoding exon I sequence upstream from the translation start codon (AUG), separated by a 1463-bp intron sequence. The gene coding for the only other vitamin K-dependent anticoagulant factor, protein S, has also been postulated to contain an additional noncoding exon I sequence since two transcripts have been observed (40). The participation of intron sequences in regulating protein C gene expression is currently under investigation.
It is common in many TATA-less promoters that transcription initiates from a cluster of sites surrounding +1. Each of the initiation sites found in the protein C gene was surrounded by pyrimidine-rich sequences, as is characteristic of most initiation sites of other genes. The +1 and -7 sites were present in the HNF1-binding site, whereas the +12 site was located in the PCE1 site (Fig. 3C). Several promoters, generally but not necessarily lacking the TATA-box, have an initiator element that can replace or reinforce the role of the TATA sequence in directing the location of a transcription start site. These initiator elements have recently been grouped into families based upon sequence homology (41). Sequences surrounding the multiple initiation sites of the protein C gene, however, were not homologous to any other known initiator sequence(s). It is unclear at this point whether an unidentified initiator element in the protein C gene or the clustered regulatory elements initiate the assembly of the transcription apparatus. This is very similar to the factor IX gene, where the promoter is characterized by a tight cluster of regulatory elements surrounding the transcription initiation site (42). Mutations found to date in the 5'-flanking region of the factor IX gene in patients with factor IX deficiency (hemophilia B Leyden phenotype) were all located in this tight cluster called the Leyden-specific region from -40 to +20 in the 5'-end sequence. This is also comparable to the protein C gene, in which naturally occurring mutations in the 5'-region were located from -20 to +3. Characterization of the protein C promoter led to the understanding of genetic disorders caused by known and possible additional mutations occurring in the 5'-region in patients with type I protein C deficiency.
In addition to protein C, there are six other vitamin K-dependent glycoproteins that circulate in blood, including factors VII, IX, and X, prothrombin, protein S, and protein Z. The genes for these vitamin K-dependent proteins share significant organizational similarity and have evolved from a common ancestral gene (43). It is also noted that five of these genes (not including the protein S and protein Z genes, the regulation of which has not been studied) are regulated by TATA-less promoters. Furthermore, transcriptional regulation of these genes shares certain common features (Fig. 5). The factor VII gene, which is located on chromosome 13q34-qter (44), is regulated by two promoter elements, the FVIIP1 site containing an HNF4-binding site and the FVIIP2-binding site present in a GC-rich sequence that binds hepatic specific factors as well as the ubiquitous transcription factor Sp1 (38). In addition, two silencer elements were located upstream of the promoter region. The factor IX gene, which is located on chromosome Xq26-27, is regulated by the presence of liver-specific cis-acting elements that interact with the liver-enriched transcription factors C/EBP and HNF4 (45, 46) and with the liver-specific transcription factors nuclear factor 1-like liver-specific protein and D-site-binding protein (DBP) (42). The factor IX gene may be hormonally regulated since the deficiency in hemophilia B Leyden, which is caused by mutations in the Leyden-specific regulatory region of the factor IX gene, can be partially overcome following puberty or by the administration of testosterone (47). In rat liver, DBP, which also recognizes some of the same cis-elements as C/EBP, was not expressed until puberty (48).
Reporter gene studies and DNA-protein binding assays with factor IX promoter sequences containing hemophilia B Leyden mutations of the C/EBP-binding site suggest that DBP may enhance C/EBP binding and transcription of the factor IX gene due to a synergistic interaction between C/EBP and DBP (49). Hence, the hormonal regulation of the factor IX gene is probably due to the induction of DBP expression during puberty rather than the presence of an androgen-responsive element in the factor IX gene. The factor X gene is located 2.8 kb downstream from the factor VII gene on chromosome 13q34-qter. The factor X gene is regulated by three positive regulatory regions (FXP1, FXP2, and FXP3 sites) and a negative element that blocks the transcriptional activity toward the upstream factor VII gene (17, 50). Transfection in HepG2 and human fibroblast cells suggests that the FXP1 and FXP3 sites interact with liver-specific trans-activating factors, while the FXP2 site interacts with ubiquitous transcription factors. Furthermore, the FXP1 site contains a 22-bp sequence similar to the consensus recognition site for the liver-specific transcription factor HNF4. This HNF4-binding element in the FXP1 site has a 6-bp core sequence (CTTTGC) that is also present in the HNF4-binding element present in the factor VII and factor IX promoters. The prothrombin gene is located on chromosome 11p11-q12 (51). It contains a weak promoter immediately before the transcription initiation site and a liver-specific enhancer sequence located 860-940 nucleotides from the transcription initiation site. The latter region apparently interacts with HNF1 and is flanked on the 3'-side by a GC-rich sequence that is similar to an Sp1-binding site and is essential for enhancer activity (52, 53, 54). The 10-base pair GC-rich sequence shares 90% sequence identity with the Sp1-binding site present in the factor VII promoter. As shown in Fig. 5, there are a number of common regulatory units shared by the vitamin K-dependent proteins as well as unique sequences that regulate the individual proteins.
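A shared core such as the 6-bp CTTTGC motif mentioned above can be located in a promoter sequence by scanning both strands. The following is a minimal sketch, not the method used in this study; the function names and the test sequences are hypothetical, and only the core motif itself is taken from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_core(seq: str, core: str = "CTTTGC") -> list[tuple[int, str]]:
    """Find all occurrences of the core on either strand.

    Returns (0-based index on the given strand, strand) pairs, where
    "-" means the reverse complement of the core matched at that index.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", core), ("-", reverse_complement(core))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)  # allow overlapping matches
    return sorted(hits)


# Hypothetical sequences, for illustration only.
print(find_core("AAACTTTGCAAA"))
print(find_core("TTGCAAAGTT"))
```

Scanning both strands matters here because binding sites can occur in either orientation, as noted for the oppositely oriented HNF3 sites in the protein C promoter.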
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) U47685.