(Received for publication, February 2, 1995; and in revised form, August 17, 1995)
Mitogen-activated protein kinases (MAPKs), or extracellular
signal-regulated kinases, are ubiquitous kinases conserved from fungi to
mammals. Their activity is regulated by phosphorylation on both
threonine and tyrosine, and they play a crucial role in the regulation
of proliferation and differentiation. We report here the cloning of the
murine p44 MAP kinase (extracellular signal-regulated kinase 1) gene,
the determination of its intron/exon boundaries, and the
characterization of its promoter. The gene spans approximately eight
kilobases (kb) and is divided into nine exons and eight introns, with
each coding exon containing one to three of the highly
conserved protein kinase domains. Primer extension analysis reveals
two major transcription start sites, located at
-183 and -186 base pairs (bp), as well as four minor
start sites, located at -178, -192,
-273, and -292 bp upstream of the initiation of translation.
The start site region lacks TATA-like sequences but does
contain initiator-like sequences proximal to the major start sites
identified by primer extension. We sequenced 1 kb of the promoter region.
It contains three putative TATA boxes far upstream of the
main start site region, one AP-1 box, one AP-2 box, one Malt box, one
GAGA box, one half serum-responsive element, and putative binding sites
for Sp1 (five), GC-rich binding factor (five), CTF-NF1 (one), Myb
(one), p53 (two), Ets-1 (one), NF-IL6 (two), MyoD (two), Zeste (one),
and hepatocyte nuclear factor-5 (one). To determine the sites critical
for the function of the p44 MAPK promoter, we constructed a series of
chimeric genes containing different regions of the 5′-flanking sequence
of the p44 MAPK gene and the coding region for luciferase. Activity of the
promoter, measured by its capacity to direct expression of a luciferase
reporter gene, is strong, being comparable with the activity of the
Rous sarcoma virus promoter. Progressive deletions of the 1 kb
(-1200/-78) promoter region allowed us to define a minimal
region of 186 bp (-284/-78) that has maximal promoter
activity. Within this context, deletion of the AP-2 binding site
reduces promoter activity by 30-40%. Surprisingly, further deletion
of this minimal promoter to remove the major start sites, leaving
the -167/-78 region, preserves promoter activity. This
result points to a major role for this region, which contains the Sp1
sites. Finally, removal of both the major start sites of transcription and
the Sp1 sites reveals additional promoter activity at the
upstream minor transcription start sites (-240/-167), an
activity that is enhanced by the upstream cis-acting elements. In
summary, our findings reveal a complex pattern of transcriptional
regulation of the mouse p44 MAPK promoter.
Mitogen-activated protein kinases (MAPKs) or
extracellular signal-regulated kinases were first described as two
proteins of 42 and 44 kDa that were phosphorylated on both tyrosine and
threonine residues following stimulation of 3T3-L1 adipocytes with
insulin (1, 2). These same phosphoproteins had been
visualized previously by two-dimensional gel
electrophoresis (3, 4, 5). MAPKs are
ubiquitously expressed, being found in all cell systems studied,
including yeast, worms, flies, frogs, plants, and mammals (6).
They are activated by a wide variety of extracellular signals, and
their activation requires phosphorylation of the highly conserved
TEY motif present in almost all described MAPKs. An increasing body of
data, particularly from yeast, suggests that MAPKs belong to a multigene
family. In yeast, each reported isoform has been implicated in a
different signaling pathway, leading to mating, cell wall synthesis, or
regulation of osmotic pressure. Studies of the mammalian homologues
of yeast MAPK suggest that they play equivalent roles in different
processes, including proliferation, differentiation, and response to
environmental stress (7, 8, 9, 10, 11).
As far as p42 and p44 MAPK are concerned, two approaches have demonstrated their role in controlling fibroblast cell growth. First, we showed that overexpression of either a dominant-negative p44 MAPK mutant or an antisense construct prevented growth factor-induced cell cycle entry. Second, we (12) and others (13, 14) demonstrated that expression of a constitutively active form of a MAPK activator (MEK1) led to constitutive activation of p42 and p44 MAPK, which was sufficient to promote cell cycle entry and oncogenicity in fibroblasts. Thus, at this stage, the two MAPK isoforms, which are coordinately regulated and capable of phosphorylating identical substrates in vitro, appear to be redundant. Alternatively, they might serve different functions through alternatively spliced isoforms that display distinct subcellular localizations, as recently reported (15). To resolve this issue, we isolated genomic MAPK clones in order to study their regulation and subsequently to disrupt each corresponding mouse gene. Here we describe the detailed structure of the murine p44 MAPK (extracellular signal-regulated kinase 1) gene. We also used deletion analysis of its promoter to identify cis elements important for driving transcription.
Given the existence of a large MAPK family and of various pseudogenes, it was crucial to identify with certainty the genomic clones that hybridize with the entire hamster p44 MAPK cDNA. Of the two phages, 14 and 15, that hybridized at high stringency to the p44 MAPK probe, only phage 15 was assigned to the mouse p44 MAPK (extracellular signal-regulated kinase 1) gene. This identification was confirmed by sequencing all exons. In contrast, phage 14 corresponds to a close member of the p44 MAPK family. From phage 15, we estimated that the transcription unit of the mouse p44 MAPK (extracellular signal-regulated kinase 1) gene spans approximately 8 kb. Fig. 1 shows the relationship of the gene to its corresponding mRNA/cDNA and protein. Sequencing of the plasmid subclones that hybridize to the hamster p44 MAPK probe allowed us to determine the positions of the intron/exon junctions. We found nine exons (Fig. 1 and Table 1), and all of the splice acceptor and donor sequences agree with the "GT-AG" rule (19). Each exon of the gene encodes one or more of the conserved subdomains previously identified in protein kinases (20). The first exon contains all of the 5′-untranslated region as well as the region coding for the GXGXXG motif required for ATP binding. The possibility of alternative splicing involving an additional intron in that region is discussed below.
Exon 2 contains lysine 72 of subdomain II, implicated in phosphate transfer, as well as subdomain III and subdomain IV, with conservation of glutamic acid 89 and hydrophobic residues; exon 3 encodes subdomains V and VI with the HRD motif; exon 4 encodes subdomain VII with the invariant DFG motif and subdomain VIII containing the APE triplet; exon 5 encodes subdomain IX, in which aspartic acid 228 is conserved; exon 6 encodes subdomain X; exon 7 encodes subdomain XI; exon 8 encodes a region of the protein apparently implicated in the specificity of substrate recognition in the MAP kinase family and contains 5.4% of the 3′-untranslated region; and exon 9 contains the remaining 3′-untranslated region.
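The "GT-AG" rule invoked above states that introns almost always begin with GT at the 5′ (donor) boundary and end with AG at the 3′ (acceptor) boundary. A minimal Python sketch of that check follows; the intron sequences are hypothetical placeholders, not sequences from the p44 MAPK gene, and a real splice-site analysis would also consider the wider donor and acceptor consensus.

```python
# Minimal sketch of the "GT-AG" rule check for intron/exon junctions.
# Assumption: each intron is given on the coding strand, from its first
# base (just after the upstream exon) to its last base (just before the
# downstream exon).

def follows_gt_ag(intron: str) -> bool:
    """Return True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical intron sequences for illustration only.
introns = {
    "intron_a": "GTAAGTCTTTCCTTTCAG",
    "intron_b": "gtgagtccctctgcag",
    "intron_c": "ATGCATGCAT",
}

for name, seq in introns.items():
    print(name, follows_gt_ag(seq))
```

Case is ignored so that sequences written in lowercase (as introns are in Fig. 3) can be checked directly.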
Figure 1: Organization of the mouse p44 MAPK gene in relation to its mRNA and predicted protein structure. Positions of exons (filled) and introns (open) are shown aligned with the common restriction enzyme sites and the position of the major transcriptional start sites. The locations of the introns are indicated by the nucleotide number on the cDNA (the exons are boxed, black for coding regions and hatched for 5′ and 3′ noncoding regions), where base 1 corresponds to the "A" of the ATG. Roman numerals correspond to the conserved kinase subdomains previously defined by Hanks et al. (20).
Figure 2: Identification of the transcription start sites on the mouse p44 MAP kinase gene by primer extension. Primer extension was performed with 30 µg of total ES cell RNA and the ERS 5 oligonucleotide. The sizes of the extended fragments (178, 183, 186, 192, 273, and 292 bp) are indicated by arrows. The double-stranded sequencing reaction shown on the left, used to determine the sizes of these fragments, was obtained with the RP oligonucleotide primer, used to sequence an internal PstI fragment of the p44 MAPK gene subcloned in the PTZ vector (Bio-Rad) (see "Experimental Procedures").
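The relation between extension product length and start-site position is simple arithmetic: the product runs from the transcription start site to the base paired with the primer's 5′ end. The Python sketch below assumes the numbering of Fig. 3 (A of ATG = +1, -1 = first base upstream, no position 0). The primer 5′-end position is a hypothetical parameter, not a value stated in the paper; setting it to -1 simply reproduces the numerical correspondence between the product sizes listed above and the start-site positions given in the text.

```python
# Sketch: convert primer extension product lengths to start-site coordinates.
# Numbering assumption: A of ATG = +1, first upstream base = -1, no position 0.

def to_linear(pos: int) -> int:
    """Map ATG-relative numbering (which skips 0) onto consecutive integers."""
    if pos == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return pos if pos < 0 else pos - 1


def from_linear(x: int) -> int:
    """Inverse of to_linear."""
    return x if x < 0 else x + 1


def start_site(product_length: int, primer_5prime_end: int) -> int:
    """Start-site coordinate implied by a primer extension product.

    The product covers every base from the start site up to the base that
    pairs with the primer's 5' end, inclusive.
    """
    return from_linear(to_linear(primer_5prime_end) - product_length + 1)


# Hypothetical primer 5'-end position (assumption, see lead-in).
sizes = [178, 183, 186, 192, 273, 292]
print([start_site(n, primer_5prime_end=-1) for n in sizes])
```

The helper functions exist only to handle the missing position 0, so the same arithmetic also works for a primer lying downstream of the ATG.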
Figure 3: Sequence of the 5′-flanking region upstream of the ATG of the mouse p44 MAPK gene. The sequence flanking the ATG start codon is shown from -1 (first base upstream of the A of ATG) to -1325. Consensus sequences for DNA-binding proteins (TATA box; AP-2; AP-1; Myb; p53; Ets-1; GAGA box; Malt box; NF-IL6; CTF-NF1; Sp1; GCF; serum-responsive element (Half SRE); MyoD; Zeste; hepatocyte nuclear factor 5 (HNF-5)), splice donor or splice acceptor sequences, oligonucleotides used in primer extension analysis (ERS 2, ERS 5, GP9), and major restriction sites are underlined. The restriction enzyme sites are BglII, PstI, NheI, StyI, BssHII, and SacII. The positions of the start sites of transcription are shown by R (178, 183, 186, 192, 273, and 292). The lowercase letters represent the position of the intron. For Sp1 or GCF, the number of each site in the underlined sequence is given.
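Putative binding sites such as those marked in Fig. 3 are located by comparing the sequence with consensus motifs on both strands. The Python sketch below illustrates the idea with an exact-match search for the Sp1 GC box core GGGCGG. It is a simplification (consensus databases use degenerate motifs or weight matrices), not the procedure used to annotate Fig. 3, and the test sequence is hypothetical.

```python
import re

# Complement table for uppercase and lowercase DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact match on either strand.

    Minus-strand hits are reported at the start of the reverse-complement
    match on the given (plus) strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif.upper()), ("-", revcomp(motif.upper()))):
        # A lookahead lets overlapping matches be reported as well.
        for m in re.finditer(f"(?={pattern})", seq):
            hits.append((m.start(), strand))
    return sorted(hits)


# Hypothetical test sequence, not the p44 MAPK promoter.
promoter = "ttGGGCGGaattCCGCCCgg"
print(find_motif(promoter, "GGGCGG"))
```

A palindromic motif would be reported once per strand; that is acceptable for a sketch but would need de-duplication in practice.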
Figure 4: Measurement of p44 MAP kinase promoter activity by transient expression of chimeric luciferase reporter gene-promoter constructs in CCL39 lung fibroblasts. Activities of the different p44 MAPK promoter constructs, measured in cells stimulated with 10% fetal calf serum, are plotted as a percentage of the luciferase activity of the BH construct (set to 100%). The names of the constructs and their coordinates relative to the initiation of translation (+1) are given (see also Fig. 3 and "Experimental Procedures"). These data are representative of five independent transient transfection experiments.
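Expressing each construct as a percentage of the BH construct, and the effect of a deletion as a percentage reduction (as in the 30-40% figure quoted for the AP-2 deletion), is straightforward arithmetic. A Python sketch follows; construct names other than BH and all luciferase readings are hypothetical placeholders, not data from Fig. 4.

```python
def percent_of_reference(activities: dict[str, float], reference: str = "BH") -> dict[str, float]:
    """Express each construct's luciferase activity as a percentage of the reference."""
    ref = activities[reference]
    if ref == 0:
        raise ValueError("reference activity must be non-zero")
    return {name: 100.0 * value / ref for name, value in activities.items()}


def percent_reduction(before: float, after: float) -> float:
    """Percentage decrease in activity caused by a deletion (before -> after)."""
    return 100.0 * (before - after) / before


# Hypothetical raw luciferase readings (arbitrary light units), not Fig. 4 data.
raw = {"BH": 40000.0, "construct_1": 30000.0, "construct_2": 19500.0}
relative = percent_of_reference(raw)
print(relative)  # BH is 100.0 by definition
print(percent_reduction(raw["construct_1"], raw["construct_2"]))
```

Normalizing to a reference construct within each experiment is what allows independent transfections to be compared on one scale.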
MAPK belongs to a multigene family, and previous reports have shown that expression of dominant-negative mutants or antisense constructs of p44 MAPK inhibits fibroblast proliferation (7). Because of their potential importance in growth control (7, 35) and differentiation (36, 37, 38), the genes for human p44 MAPK (extracellular signal-regulated kinase 1), p42 MAPK (extracellular signal-regulated kinase 2), and p63 MAPK (extracellular signal-regulated kinase 3) have been mapped (39). However, it has not been possible to attribute specific biological roles to the individual isoforms, even though some data describe differential activation of p42 MAPK versus p44 MAPK in platelets (40). As a first step in such an analysis, we have isolated and partially characterized several different mouse MAPK genomic clones and have characterized in detail the gene for mouse p44 MAPK (extracellular signal-regulated kinase 1) and a portion of its 5′-flanking regulatory region.
The p44 MAPK (extracellular signal-regulated kinase 1) gene spans approximately 8 kb and is divided into nine exons. An interesting aspect of the gene's structure is that one or more of the domains highly conserved among protein kinases are contained within individual exons. To our knowledge, this is the first example of such a distribution, and it is strikingly different from what is observed in related kinases such as mammalian cdc2 (41), which is divided into only four exons without a precise division of the protein kinase subdomains among them. This unusual subdivision could result from the evolution of an ancestral gene that progressively acquired specific characteristics. The first seven exons encode the protein kinase domains. An additional exon, exon 8, encodes the carboxyl terminus of MAPK. The C-terminal domain it encodes can be considered specific for p44 MAPK because it is one of the most divergent domains among the MAPK-related kinases, the other variable domain being subdomain X. The ninth exon contains 95% of the 3′-untranslated region of p44 MAPK mRNA.
We also describe two predominant start sites of transcription located at -183 and -186 bp upstream of the ATG (A of ATG = +1). Longer exposure of the gels also revealed additional discrete start sites. The presence of multiple sites of transcription initiation is open to several interpretations. First, the oligonucleotides used in primer extension analysis could have hybridized to mRNAs not yet described. This possibility must be considered because, during screening of the genomic library, five other clones, each apparently encoding a different gene, hybridized to the p44 MAPK probe at high stringency. Second, the phenomenon could be explained by the absence of a true consensus TATA box: for the SV40 and histone H2A genes, removal or mutation of the canonical TATA box results in the initiation of transcription at many sites within the promoter (42, 43, 44). A third interpretation of the minor transcripts is alternative splicing, suggested by the presence of one splice donor and three splice acceptor consensus sequences. Although the splice donor site at -293 might be used, it is unlikely to be used with the splice acceptor site at position -138 because (a) the major start sites of transcription would then be located at positions -338, -343, -346, and -352 in Fig. 2, too close to the ATG at position -338, and (b) the sequence upstream of this ATG does not match the Kozak consensus sequence. If splicing occurs between the donor site at position -293 and the splice acceptor sites at +92 or +100, the downstream ATG at position +166 could be used. However, the sequence upstream of this ATG also does not match the Kozak consensus sequence, and if translation did start at this ATG, the conserved GXGXXG ATP-binding motif would be deleted. If such alternative splicing did occur, shorter or longer mRNAs could be produced from the same gene.
Detection of a shorter mRNA has already been described for p42 MAPK/extracellular signal-regulated kinase 2, apparently resulting from alternative splicing of the gene (45, 46). For the reasons outlined above, and because we could not detect proteins of higher or lower molecular weight in fibroblast or ES cell extracts by high resolution polyacrylamide gel electrophoresis and Western blotting (data not shown), we consider it unlikely that such alternatively spliced transcripts exist in these cells.
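The Kozak comparison used above can be approximated by checking the two positions that matter most in the consensus GCCRCCATGG: a purine (A or G) at -3 and a G at +4, counting the A of the ATG as +1. The Python sketch below implements only this simplified two-position rule, not a full consensus comparison, and the example sequences are hypothetical.

```python
def kozak_context(seq: str, atg_index: int) -> dict[str, bool]:
    """Check the two key Kozak positions around an ATG.

    atg_index is the 0-based index of the A of ATG in seq. Position -3
    (three bases upstream of the A) should be a purine (A/G); position +4
    (the base right after ATG) should be G.
    """
    s = seq.upper()
    if atg_index < 3 or atg_index + 3 >= len(s):
        raise ValueError("not enough flanking sequence around the ATG")
    if s[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    purine_minus3 = s[atg_index - 3] in "AG"
    g_plus4 = s[atg_index + 3] == "G"
    return {
        "purine_at_-3": purine_minus3,
        "G_at_+4": g_plus4,
        "strong": purine_minus3 and g_plus4,
    }


# Hypothetical sequences: a consensus-like context and a poor one.
print(kozak_context("GCCACCATGG", 6))
print(kozak_context("TTTTCCATGT", 6))
```

Checking only -3 and +4 is a common shorthand because these two positions have the largest effect on initiation efficiency.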
We have shown that a 1128-bp BglII/SacII fragment is sufficient to drive transcription of the luciferase gene. The activity of this promoter is high, comparable with that of the RSV promoter, which is considered a strong promoter. We have also shown that transcription can be initiated from each start site determined by primer extension, and probably from initiator-like sequences (34) or through fibroblast-specific initiation of transcription, at least in vitro. First, transcription can be initiated from the minor start sites, as shown with the NP construct. The basal transcriptional activity detected with this construct is strongly enhanced in the P3′ construct, suggesting positive regulatory sequences within the -939/-367 region. However, deletion of the 3′ NheI/SacII region of the BH fragment (construct BN) results in complete loss of activity, demonstrating that the TATA box located upstream of the NheI site has no significant activity. Second, the +AP-2/Bs and -AP-2/Bs constructs, which contain the major start sites of transcription, can also drive transcription, but at a low level compared with the BH construct. The presence of a small amount of mRNA in lung (47) and CCL39 fibroblasts (16) suggests that these sites are predominantly used in CCL39 cells. Third, the -167/-78 region, in which no start sites of transcription are detected, is responsible for high basal promoter activity. Different interpretations could account for this result: initiator-like sequences (34) may be involved, or start sites undetectable in ES cells may be used when the major ones are deleted. These different modes of initiation may reflect what happens in vivo when tissue-specific cis-acting elements are used.
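Which mapped start sites survive in a given deletion fragment follows directly from the coordinates. The Python sketch below uses the six start sites from primer extension and three fragments named in the abstract (-1200/-78, -284/-78, and -167/-78); treating fragment boundaries as inclusive is an assumption.

```python
# Start sites from primer extension (bp relative to the A of ATG = +1).
START_SITES = {
    -178: "minor",
    -183: "major",
    -186: "major",
    -192: "minor",
    -273: "minor",
    -292: "minor",
}


def sites_in(upstream: int, downstream: int) -> list[int]:
    """Start sites inside a fragment spanning upstream..downstream (inclusive)."""
    lo, hi = sorted((upstream, downstream))
    # Listed from the most ATG-proximal site outward.
    return sorted((p for p in START_SITES if lo <= p <= hi), reverse=True)


fragments = [(-1200, -78), (-284, -78), (-167, -78)]
for up, down in fragments:
    kept = sites_in(up, down)
    print(f"{up}/{down}:", [(p, START_SITES[p]) for p in kept])
```

The -167/-78 fragment retains none of the mapped start sites, consistent with the statement above that no start sites are detected in this region.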
The MAPK promoter contains consensus binding sites for many transcription factors. Their presence, however, does not prove their involvement in the regulation of p44 MAPK promoter activity. In fact, it is difficult to attribute a role to the binding sites located upstream of the NheI restriction site. The fact that the luciferase activity of the P3′ construct is higher than that of the NP construct indicates that this region contains positive regulatory elements. The role of the AP-2, AP-1, and CTF-NF1 sequences is clearer, because their deletion decreases p44 MAPK promoter activity. The fact that Jun, a component of the AP-1 complex, is phosphorylated by Jun kinase, a member of the MAP kinase family, makes it tempting to speculate that MAPK could regulate its own transcription (8). However, cotransfection of the BH construct with expression vectors for Fos and Jun or for a constitutively active form of MAP kinase kinase (12) produces only a small increase in MAPK promoter activity (data not shown). The role of the Sp1 sites is unclear because they are all situated downstream of the major start sites of transcription. Sp1 sites have been shown to play a role in the transcription of housekeeping genes such as hprt (48, 49) and dhfr (50, 51), in the regulation of genes specific to or maximally expressed in the central nervous system, such as nicotinic acetylcholine receptors (52) and plasminogen activator (53), and in the regulation of transcription of growth control-regulated genes such as c-myc (54), epidermal growth factor receptor (55), and Ha-ras (56). As in the p44 MAPK gene, many start sites of transcription have been documented in each of these genes.
In the case of the epidermal growth factor receptor promoter, transcriptional activity can be detected in chloramphenicol acetyltransferase transient transfection assays even in the absence of the major start sites of transcription (57), and the proximal bases upstream of the initiation of translation can bind nuclear proteins in gel retardation assays, suggesting a major role for this region in the initiation of transcription. This region can function as a promoter and mediates inductive responses to epidermal growth factor, phorbol 12-myristate 13-acetate, and cAMP (58). The same holds for the BsH construct, which shows high transcriptional activity. This construct, which ends at the SacII site, contains three of the five Sp1 sites, and we suppose that these are implicated in this activity. However, it is difficult to say whether they play the same role in vivo. In fact, Sp1 belongs to a multigene family whose expression varies among tissues. Expression of Sp1 is high in lung and thymus (59), but p44 MAPK expression is low in lung and nearly undetectable in thymus (47). However, the very high amount of MAPK mRNA in brain suggests that previously described brain-specific Sp1-like factors (53, 60) could regulate MAPK transcription. The presence of binding sites for GCF overlapping the Sp1 sites suggests a balance between the activities of these two transcription factors under different physiological conditions. A recent report also describes phosphorylation of Sp1 by a DNA-dependent protein kinase (61). This phosphorylation could result from activation of the kinase pathway by UV light or by signaling pathways leading to apoptosis. Sp1-dependent activation of p44 MAPK transcription after such stress is therefore also conceivable.
The variation of p44 MAPK mRNA levels among organs could implicate tissue-specific elements of the promoter (47). Thus, the presence of a binding site for hepatocyte nuclear factor-5, which is involved in gene expression in the liver, of two binding sites for MyoD, which is implicated in myocyte differentiation, and of a GAGA box, which has been shown to be important in Drosophila development, would suggest developmental and tissue-specific regulation of the p44 MAPK gene. However, p44 MAPK mRNA and protein levels are not influenced by growth factors or by the position in the cell cycle in a given cell line, suggesting that the p44 MAPK gene, like many housekeeping genes, is not subject to acute regulation. Rather, the complexity or "plasticity" of the promoter region defined here might reflect the ubiquitous expression of the gene, from embryonic stem cells (ES cells, data not shown) to most differentiated tissues.
In summary, the work reported here on the cloning and characterization of the p44 MAPK gene is the first step toward inactivating the gene by homologous recombination in mouse embryonic stem cells. Parallel studies of the p42 MAPK gene will be necessary to determine whether each MAP kinase serves a specific function or whether the two are fully redundant and can substitute for each other.