(Received for publication, May 5, 1995; and in revised form, September 5, 1995)
Zp1 encodes one of the three major glycoproteins of the zona pellucida, an extracellular matrix that surrounds growing oocytes, ovulated eggs, and preimplantation embryos. The mouse gene is composed of 12 exons ranging in size from 82 to 364 base pairs and spans 6.5 kilobase pairs on chromosome 19 (2.13 ± 1.5 centimorgans distal to D19Bir1). The Zp1 exon map is similar to that of ZPB, its human orthologue, and an E-box (CANNTG), implicated in the oocyte-specific expression of mouse Zp2 and Zp3, is similarly located upstream of the transcription start site. The single-copy Zp1 gene encodes a 623-amino acid protein, the carboxyl-terminal half of which is significantly similar to a corresponding region of mouse ZP2. Conservation of this same region in a fish egg envelope protein suggests not only that this protein domain has been duplicated in mammals but also that it has been conserved and used as an egg envelope protein in species that diverged 650 million years ago.
Among vertebrates, different reproductive strategies have evolved based on mating behavior, gamete structures, and the specificity of recognition molecules on the surface of sperm and eggs. In all vertebrates, however, a prerequisite to successful fertilization is penetration of sperm through an acellular envelope surrounding ovulated eggs. In mammals, capacitated sperm bind in a seemingly non-site-directed manner to the zona pellucida. Following the induction of the acrosome reaction and release of lytic enzymes, sperm penetrate the zona and fuse with the egg's plasma membrane, triggering the postfertilization block to polyspermy(1) . In contrast, most fish sperm lack an acrosome and penetrate the vitelline envelope surrounding fish eggs via a discrete micropyle(2) . Most commonly, the micropylar channel is narrow enough to permit the passage of only a single sperm, and subsequent fusion with the plasma membrane induces the cortical granule reaction, resulting in a block to polyspermy(3) . It has become increasingly clear that the proteins of the zona pellucida are conserved among eutherian mammals and that the proteins of the vitelline envelope are conserved among teleostean fish. More recently, it has become apparent that the proteins of the mammalian egg envelope, although critical for speciation, are clearly related to those of the teleostean envelope.
The mouse zona pellucida contains three major glycoproteins: ZP1, ZP2, and ZP3. Genes encoding the latter two zona proteins have been characterized. Zp2 is composed of 18 exons(4) , of which six encode a 241-amino acid domain reported to be 28% identical with the wf♀ protein of the white flounder, a teleost(5) . Zp3 contains eight exons(6, 7) , of which the first six encode a 261-amino acid domain that is 33% identical with ZI-3, a major component of the inner layer of the egg envelope of a second teleost, Oryzias latipes(8) . Although similar structural domains are present in egg envelope proteins of teleosts and eutherian mammals, the site of synthesis is quite different in these two classes of vertebrates. In mice, the three zona genes (Zp1, Zp2, Zp3) are transcribed exclusively in growing oocytes(4, 9, 10) , and the resultant zona proteins are secreted to form the extracellular matrix. In contrast, there is growing evidence that proteins of the two teleost egg envelopes are produced in the liver after stimulation with estrogens and then transported to the egg, where they form the vitelline envelope(5, 8) .
We have previously reported the characterization of mouse Zp2 and Zp3 genes. We now report the characterization of mouse Zp1 and compare the encoded protein to other egg envelope proteins in mammals and fish.
The sizes of introns were determined by DNA sequencing or polymerase chain reaction in a Perkin Elmer GeneAmp PCR System 9600 using Zp1 exon-specific forward and reverse oligonucleotide primers. The reaction conditions were as described by the Taq polymerase protocol (Perkin Elmer): 25 cycles of 95 °C for 15 s, 50 °C for 45 s, and 72 °C for 1.5 min. The first cycle was preceded by 5 min at 99 °C, and the last cycle was followed by a 7-min extension at 72 °C. The polymerase chain reaction products were analyzed by agarose gel electrophoresis.
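Because the forward and reverse primers sit in the exons that flank an intron, the intron size follows from subtracting the exonic portion of the amplicon from the size of the PCR product. The sketch below shows that arithmetic under stated assumptions: the function and parameter names are hypothetical, and the numbers in the example are illustrative, not data from this study. Estimates made this way are only as precise as the agarose gel sizing of the product.

```python
def intron_length_bp(product_bp: int, upstream_exon_bp: int, downstream_exon_bp: int) -> int:
    """Estimate the size of one intron from a PCR product that spans it.

    product_bp         -- size of the PCR product (e.g. read off an agarose gel)
    upstream_exon_bp   -- exonic bases from the 5' end of the forward primer
                          through the last base of the upstream exon
    downstream_exon_bp -- exonic bases from the first base of the downstream
                          exon through the 5' end of the reverse primer
    """
    intron = product_bp - upstream_exon_bp - downstream_exon_bp
    if intron < 0:
        raise ValueError("product is shorter than the exonic sequence it must contain")
    return intron


# Hypothetical example only: an 850-bp product whose primers lie
# 120 bp and 80 bp into the two flanking exons.
print(intron_length_bp(850, 120, 80))  # 650
```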
Figure 1: Exon-intron Map of Mouse Zp1. Schematic representation of the exon map of mouse Zp1 and its human orthologue, ZPB(16) . The 12 vertical bars in each gene represent exons. The sizes of the mouse and human exons are nearly identical except for exons 3 and 12, both of which are larger in the mouse.
Overall, the exon maps of mouse Zp1 and its human orthologue, ZPB, are remarkably conserved (Fig. 1). Although the human gene (16) is more spread out (encompassing 11 kb), the sizes of most exons are nearly identical. Only exons 3 and 12 are larger in the mouse, and it is mostly the additional sequence in the third exon that accounts for the larger mouse protein (623 amino acids) compared with the human protein (540 amino acids). Excluding the region unique to the mouse, 510 amino acids of the two proteins align; 53% of these aligned residues are similar (42% identical).
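Identity and similarity percentages of this kind are fractions of aligned positions. The sketch below shows how such figures are tallied from a pairwise alignment, assuming that gap columns are excluded and that "similar" means either identical or in the same residue group. The grouping used here is a hypothetical simplification; alignment programs define similarity with their own scoring rules, so this does not reproduce the published values.

```python
# Hypothetical residue groups treated as "similar" (illustration only).
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def is_similar(a: str, b: str) -> bool:
    """True if two residues are identical or fall in the same group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Percent identity and percent similarity over gap-free aligned columns."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue ('-' marks a gap).
    pairs = [(a, b) for a, b in zip(aln1, aln2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    identical = sum(a == b for a, b in pairs)
    similar = sum(is_similar(a, b) for a, b in pairs)
    n = len(pairs)
    return 100 * identical / n, 100 * similar / n


print(identity_similarity("ACDK-", "ACEKL"))  # (75.0, 100.0)
```

Similarity always includes identity, which is why the similarity figure (53%) is the larger of the two.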
Figure 2: Chromosome localization. A, Southern blot analysis of genomic DNA (4 µg) probed with ³²P-labeled ZP1 cDNA after digestion with HindIII: parental controls C57BL/6J (lane 1) and M. spretus (lane 2); backcross progeny ([C57BL/6J × SPRET/Ei]F1 × SPRET/Ei) animals 89-94 (lanes 3-8). Numbers to the left indicate molecular weight markers (kb). B, haplotype analysis of the genomic DNA from 94 backcross progeny ([C57BL/6J × SPRET/Ei]F1 × SPRET/Ei) of the BSS panel from the Jackson Laboratory. Dark and light squares indicate the presence of the C57BL/6J and M. spretus alleles, respectively. Gray squares indicate animals untyped for Vidlr. Previously mapped loci that bracket Zp1 are indicated at the left. The distances between loci are indicated at the right in cM ± standard error (S.E.). The numbers at the bottom of each column of squares indicate the number of progeny with the particular haplotype. C, schematic representation of mouse chromosome 19. The centromere is a filled circle at the top of the vertical line. Relevant loci are listed on the right. Numbers on the left indicate the approximate map distance between selected loci in cM.
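Backcross map distances come from counting recombinant progeny. As a consistency check only: if 2 of the 94 BSS progeny were recombinant between D19Bir1 and Zp1 (a count inferred here, not stated in the text), the recombination fraction and its binomial standard error reproduce the reported 2.13 ± 1.5 cM. The sketch assumes that the distance is the uncorrected recombination fraction (reasonable for short intervals) and that the S.E. is the usual binomial one; the function name is hypothetical.

```python
import math


def map_distance_cm(recombinants: int, progeny: int) -> tuple[float, float]:
    """Backcross map distance and its standard error, in centimorgans.

    Uses r = recombinants / progeny as the distance (no mapping-function
    correction) and the binomial standard error sqrt(r * (1 - r) / n).
    """
    r = recombinants / progeny
    se = math.sqrt(r * (1 - r) / progeny)
    return 100 * r, 100 * se


# Assumed count: 2 recombinants among 94 backcross progeny.
distance, se = map_distance_cm(2, 94)
print(f"{distance:.2f} +/- {se:.1f} cM")  # 2.13 +/- 1.5 cM
```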
The single genetic locus and the simple digestion patterns seen in preliminary studies designed to detect polymorphisms at the Zp1 locus suggested that Zp1, like Zp2 and Zp3, is a low-copy-number gene. Six restriction enzymes recognizing hexanucleotide sites (SacI, BamHI, EcoRV, NcoI, KpnI, and HindIII) were used to digest both isogenic (129Sv) genomic DNA and the Zp1 genomic fragment subcloned into SuperCos I. After digestion, Southern blots were prepared and probed with ³²P-labeled ZP1 cDNA (Fig. 3). All of the restriction enzyme fragments detected in genomic DNA (three examples are shown in Fig. 3A) were present in the digests of the subcloned genomic fragment (Fig. 3B). These observations are consistent with there being only one copy of Zp1 in the mouse genome.
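The single-copy argument is a containment test: every band seen in genomic DNA should have a counterpart of the same size in the digest of the cloned locus, whereas an unmatched genomic band could indicate another copy or a related sequence elsewhere in the genome. A minimal sketch of that comparison follows; the function name, the 0.2-kb sizing tolerance, and the band sizes are hypothetical, not measurements from Fig. 3.

```python
def bands_accounted_for(genomic_kb, clone_kb, tolerance_kb=0.2):
    """True if every genomic band matches a clone band within the tolerance.

    Extra bands in the clone digest are allowed (the clone may contain
    flanking sequence); only unmatched genomic bands make this False.
    """
    return all(
        any(abs(g - c) <= tolerance_kb for c in clone_kb)
        for g in genomic_kb
    )


# Hypothetical band sizes (kb), for illustration only.
print(bands_accounted_for([4.1, 9.0], [2.5, 4.0, 9.1]))  # True
print(bands_accounted_for([4.1, 12.0], [4.0, 9.1]))      # False
```

A True result for every enzyme is consistent with, but does not by itself prove, a single copy.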
Figure 3: Restriction Enzyme Analysis of Zp1. A, Southern blot of 129Sv mouse genomic DNA (10 µg) probed with ³²P-labeled ZP1 cDNA after digestion with SacI (lane 1), BamHI (lane 2), and EcoRV (lane 3). Numbers to the left indicate molecular weight markers (kb). B, same as panel A except that each lane contains the 18.5-kb Zp1 fragment subcloned into SuperCos I (0.05 µg). None of the enzymes cleaves within the 7.6-kb vector.
Figure 4:
Conservation of the mouse ZP1 protein. A, primary structure of the 623-amino acid ZP1 protein shown
on the first line was deduced from the coding regions of the Zp1 gene isolated from a 129Sv genomic library. The amino acid
sequence, represented with single-letter code, is numbered on the right. For comparison, the amino acid sequence deduced from a
cDNA clone derived from NIH Swiss mice (10) is shown on the second line. Dashes represent identities between the
NIH Swiss and 129Sv mice; the three polymorphisms at positions 246 (Thr → Ala), 445 (Val → Leu), and 486 (Arg → Lys) are indicated by A, L, and K, respectively. B, schematic representation of the conserved protein domains
between mouse ZP1 (dark rectangle) and ZP2 (light gray
rectangle, lower) encoded by exons 5-12 and
11-18, respectively. The 348-amino acid sequence of mouse ZP1 is
47% similar (32% identical) to that of mouse ZP2. A smaller region of
the ZP1 polypeptide (275 residues, encoded by exons 5-9 plus exon
11) is 52% similar (36% identical) with a region encoded by exons
2-7 of white flounder wf♀ (light gray rectangle, upper), an egg envelope
protein(5) .
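The polymorphisms in Fig. 4A are positional differences between two ungapped sequences of the same protein. The sketch below lists such differences in the common "residue, position, residue" notation; the function name is hypothetical. Assuming both ZP1 sequences are 623 residues long and ungapped, applying it to the 129Sv and NIH Swiss sequences would report T246A, V445L, and R486K.

```python
def list_substitutions(ref: str, other: str) -> list[str]:
    """Differences between two equal-length, ungapped protein sequences,
    written as <ref residue><1-based position><other residue>, e.g. 'T246A'."""
    if len(ref) != len(other):
        raise ValueError("sequences must have the same length")
    return [
        f"{a}{i}{b}"
        for i, (a, b) in enumerate(zip(ref, other), start=1)
        if a != b
    ]


print(list_substitutions("MKTVR", "MKAVK"))  # ['T3A', 'R5K']
```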
In addition to being conserved among mammals, ZP1 contains a domain that is present in another zona pellucida protein of the same species as well as in an egg envelope protein from a very disparate species (Fig. 4B). Within the 348 amino acids of ZP1 (residues 268-623) that align with mouse ZP2 (residues 363-713), 47% of the amino acids are similar and 32% are identical. In Zp1, this region is encoded by exons 5-12 and, in Zp2, by exons 11-18. Although the two genes are located on different chromosomes, it appears that the domain encoded by these eight exons derives from a common ancestral gene. It further appears that much of this ancestral gene was present 650 million years ago: a similar (albeit slightly smaller) domain is present in a vitelline envelope protein of the white flounder(5) , where it is encoded by exons 2-7 of the wf♀ gene, which correspond to exons 5-9 plus exon 11 of mouse Zp1 (Fig. 4B).
To investigate the possibility that a common regulatory pathway controls the expression of the three zona genes, we determined the DNA sequence of 250 bp of the mouse Zp1 promoter. A TATAA box was identified approximately 30 bp upstream of the transcription start site, but no CAAT box was detected. Comparison of the Zp1 promoter sequence with a database of binding sites for known transcription factors identified a multitude of potential binding sites. However, none of these was also present at a comparable position in the Zp2 and Zp3 promoters except for a consensus E-box sequence (CANNTG) located at -218 bp relative to the transcription start site (Fig. 5). In Zp2 and Zp3, the E-box forms the core of element IV; clustered 6-bp mutations in it inhibit reporter gene activity and prevent the formation of the ZAP-1 complex(20) . The Zp1 E-box (CAgcTG) lies at virtually the same position as the E-box in the Zp2 promoter (-216 bp) and at a position similar to that of the critical E-box in the Zp3 promoter (-181 bp).
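Locating a degenerate motif such as CANNTG in a promoter is a pattern scan over the upstream sequence. A minimal sketch follows, assuming the sequence is written 5′ to 3′ and ends with the base immediately upstream of the transcription start site (so its last base is -1 and there is no position 0). It reports the position of the motif's first base; the text does not say whether its coordinates refer to the first base or the center of the motif, and the function name is hypothetical. Because CANNTG is its own reverse complement, scanning one strand finds sites on both.

```python
import re

# Lookahead so that overlapping matches are also reported.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(upstream: str) -> list[tuple[int, str]]:
    """Return (position, motif) for every CANNTG in an upstream sequence.

    Assumes `upstream` ends with the base at -1 (immediately 5' of +1);
    the position reported is that of the motif's first base (the C).
    """
    seq = upstream.upper()
    n = len(seq)
    return [(m.start() - n, m.group(1)) for m in E_BOX.finditer(seq)]


# Synthetic sequence for illustration, not the Zp1 promoter.
print(find_e_boxes("A" * 10 + "CAGCTG" + "A" * 20))  # [(-26, 'CAGCTG')]
```

A scan like this over 250 bp typically returns several hits, which is why the comparison across the three promoters, rather than the motif alone, singles out the conserved E-box.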
Figure 5: Conservation in the Zp1 promoter. The nucleotide sequences of the first 250 bp upstream of the transcription start sites of the three mouse zona genes were compared for common binding motifs of known transcription factors. Except for a TATAA box at approximately -30 bp, only an E-box (CANNTG) was identified in all three genes at comparable positions in the promoter: Zp1 (-218 bp), Zp2 (-216 bp), and Zp3 (-181 bp).
The zona pellucida is an extracellular matrix that surrounds growing oocytes, ovulated eggs, and preimplantation embryos. The zona is composed of three distinct glycoproteins, each of which is conserved among eutherian mammals (although differences in nomenclature complicate the correspondence among species). Several of the genes encoding the zona proteins have been characterized. The exon-intron maps and coding sequences of mouse Zp2, human ZP2, and the pig homologue(4, 21, 23) , and of mouse Zp3, human ZP3, and hamster ZP3(6, 7, 24, 25) are well conserved. A third human zona gene, ZPB, has recently been reported(16) . Because it is distinct from human ZP2 and ZP3, we reasoned that it is the orthologue of mouse Zp1. The recent cloning of mouse ZP1 cDNA from an expression library (10) has confirmed this hypothesis.
A near full-length ZP1 cDNA was used as a probe to isolate Zp1 from a 129/Sv murine genomic library. Mouse Zp1 is composed of 12 exons that span 6.5 kb of DNA. The sizes of the exons are similar to those reported for human ZPB(16) , except that exon 3 is considerably larger (364 versus 103 bp, Fig. 1). Alignment of the 623-amino acid polypeptide encoded by mouse Zp1 with the 540-amino acid human ZPB indicates that the additional 83 amino acids in the mouse protein are encoded by the elongated exon 3. Although the two proteins are less conserved than mouse and human ZP2 (61% identity) or ZP3 (67% identity)(21, 24) , the 42% identity (53% similarity) between the amino acid sequences of mouse ZP1 and human ZPB indicates homology. This conservation, coupled with the maintenance of 19 of 20 cysteine residues, suggests that the three-dimensional structures of the two proteins in their respective zona matrices may be conserved as well.
In addition to conservation among mammals within each class (e.g. ZP1, ZP2, or ZP3), there is evidence of common ancestry between classes. The mouse ZP1 protein contains a 348-amino acid domain that is 47% similar to mouse ZP2 and is encoded by eight exons in both mouse Zp1 and Zp2. A similar domain was first noted by comparing R55 (the rabbit orthologue of mouse ZP1) with mouse ZP2, although the genetic locus of the rabbit gene was not reported(18) . As is most common in cases of partial gene duplication and exon shuffling (26, 27, 28, 29) , the 5′ ends of this 348-amino acid domain encoded by mouse Zp1 and Zp2 are bounded by type 1 introns (i.e. the intron interrupts a codon after its first nucleotide), and the open reading frame is maintained across the junctions with the nonconserved exons. The sequence conservation, coupled with the alignment of 10 cysteine residues, suggests that the structural aspects of this domain are similar in the two zona proteins. These data indicate that the eight exons come from a common ancestral gene that has been duplicated in mammals and reutilized by exon shuffling. Although related, each of the mouse zona genes has been mapped to a distinct chromosome. In this manuscript we locate Zp1 to the proximal portion of chromosome 19, 2.1 ± 1.5 cM distal to D19Bir1 (an anonymous DNA fragment). We have previously located mouse Zp2 and mouse Zp3 on chromosome 7 (11.3 ± cM distal to Tyr) and chromosome 5 (9.2 ± 2.9 cM distal to Gus), respectively(30) .
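Intron phase is what links the type 1 introns to exon shuffling: an exon can be moved between genes without disrupting the downstream reading frame only if its length is compatible with the phases of its flanking introns. A minimal sketch of that bookkeeping follows, assuming the input lists only the coding lengths of successive exons starting at the initiation codon. Function names and exon lengths are hypothetical, not the Zp1 or Zp2 values.

```python
def intron_phases(coding_exon_lengths: list[int]) -> list[int]:
    """Phase of each intron, given the coding lengths (bp) of successive exons.

    Phase 0: the intron lies between codons; phase 1 (a 'type 1' intron):
    after the first nucleotide of a codon; phase 2: after the second.
    """
    phases = []
    running = 0
    for length in coding_exon_lengths[:-1]:  # no intron after the last exon
        running += length
        phases.append(running % 3)
    return phases


def keeps_frame(left_phase: int, exon_length: int, right_phase: int) -> bool:
    """True if an exon placed between introns of these phases leaves the
    downstream reading frame intact."""
    return (left_phase + exon_length) % 3 == right_phase


print(intron_phases([100, 99, 50]))  # [1, 1]
print(keeps_frame(1, 99, 1))         # True: a 99-bp exon between two phase-1 introns
print(keeps_frame(1, 100, 1))        # False: a 100-bp exon would shift the frame
```

An exon flanked by two introns of the same phase must have a length divisible by 3, which is the property that lets such exons be inserted or swapped without frameshifts.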
A slightly smaller portion of the ZP1/ZP2
domain encoded by Zp1 exons 5-9 plus exon 11 has been
identified in the wf♀ gene of white flounder, a
distantly related aquatic vertebrate(5) . Although the
wf♀ protein appears to be part of the fish egg envelope, the
expression of the wf♀ gene is restricted to the liver
where it is inducible with estrogens. Outside of this 275-amino acid
domain, the fish protein is quite dissimilar. It does not include a
furin proteolytic site important in the processing of the zona proteins and lacks the carboxyl-terminal transmembrane domain
characteristic of all zona pellucida proteins (encoded by the last exon
of each mouse zona gene). A recent report has identified a second,
different, protein domain present in mammal and fish egg envelope
proteins(8) . A 261-amino acid sequence present in mouse ZP3
(encoded by Zp3 exons 1-6) is 33% identical with LS-F, a
precursor of ZI-3, an egg envelope glycoprotein of O. latipes (medaka). As with the white flounder gene (from which it is distinct), LS-F transcripts are present only in the liver, where they are inducible with estrogen. Thus, although domains of vitelline
envelope and zona pellucida proteins have been conserved for at least
650 million years, the control mechanisms for their expression have
not. It appears that in fish, the two major glycoproteins are
synthesized in the liver and transported to the egg where they form the
inner egg envelope. The three mammalian zona proteins (one of which, ZP1, is ancestrally related to a second, ZP2) are synthesized exclusively in the oocyte(10) .
Conservation among the zona genes extends to their promoters and may account, in part, for their coordinate and oocyte-specific expression. The sequence of the promoter region of mouse Zp1 was determined and compared with the promoters of mouse Zp2 and Zp3(4, 7) . Approximately 200 nucleotides upstream of the transcription start site of each gene is a canonical E-box sequence (CANNTG) (20) that has been described as a binding site for a class of transcription factors known as basic helix-loop-helix proteins(31) . These factors commonly bind as heterodimers; one subunit is a ubiquitously expressed protein (E2A, HEB, E2-2), and the other is a tissue-specific protein. Using reporter gene constructs microinjected into growing oocytes, we find that cluster mutations of the E-box in either the Zp2 or the Zp3 promoter dramatically reduce reporter gene expression. Using gel mobility shift assays with synthetic oligonucleotides (40 bp) centered on the CANNTG binding site of either the Zp2 or the Zp3 promoter, we can detect the ZAP-1 complex in oocytes but not in granulosa cells(20) . The appearance of the ZAP-1 complex in oocytes coincides with the detection of ZP2 transcripts in the prenatal ovary(22) . It will be of interest to determine whether similar investigations detect functional ZAP-1 binding to the E-box in the mouse Zp1 promoter.
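The gel-shift probes described above are 40-bp oligonucleotides centered on the CANNTG site, which means choosing a window around the motif and synthesizing both strands for annealing. A minimal sketch follows; the function names, and the choice to center on the middle of the hexamer, are assumptions for illustration rather than the authors' design procedure. Non-ACGT characters are passed through unchanged by the complement step.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Bottom-strand sequence, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def centered_probe(seq: str, motif_start: int, motif_len: int = 6, probe_len: int = 40) -> str:
    """Top-strand window of probe_len bases centered on a motif (0-based start)."""
    center = motif_start + motif_len // 2
    start = center - probe_len // 2
    end = start + probe_len
    if start < 0 or end > len(seq):
        raise ValueError("sequence too short to center the probe on the motif")
    return seq[start:end].upper()


# Synthetic sequence for illustration, not a zona gene promoter.
top = centered_probe("A" * 30 + "CAGCTG" + "T" * 30, 30)
bottom = reverse_complement(top)
print(len(top), top[17:23])  # 40 CAGCTG
```

With a 6-bp motif and a 40-bp probe, 17 bases flank the motif on each side.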
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) U24227-U24230.