(Received for publication, July 19, 1995)
The two isoforms of human endothelin-converting enzyme-1 (ECE-1), which differ in their N-terminal regions, are encoded by a single gene. The gene is composed of 19 exons that span more than 68 kilobases and has been mapped to the 1p36 band of the human genome. The two isoform mRNAs display different tissue distributions. Their precursors are transcribed from two distinct start sites, upstream from exon 1 and exon 3, respectively. Sequence analysis of the two putative promoters revealed motifs characteristic of binding sites for several transcription factors. Comparison of the ECE-1 gene structure with those of other zinc metalloproteases, together with a phylogenetic study, confirms the existence of a metalloprotease subfamily composed of ECE-1, ECE-2, neutral endopeptidase, Kell blood group protein, and two bacterial enzymes.
Endothelin, first isolated from cultured porcine endothelial
cells (1), has recently opened a new field of research. Three
distinct genes encode three isopeptides, endothelin-1 (ET-1),
endothelin-2 (ET-2), and endothelin-3 (ET-3), which form the
endothelin family (2). ET-1 is regarded as the most potent
vasoconstrictor known at present. Measurements of endothelin
concentrations in biological fluids and studies using endothelin
receptor antagonists have pointed to an important role of endothelin
in a number of pathophysiological situations, including subarachnoid
hemorrhage (3, 4), chronic heart failure (5, 6), and
hypertension (7, 8). Furthermore, the recent targeted disruptions of
the ET-1 (9), ET-3 (10), and ET receptor (11) genes have demonstrated
the importance of the endothelin system during embryogenesis,
particularly with respect to the development of neural crest-derived
tissues.
To form the active endothelins, big endothelins must be cleaved at
the Trp21-Val/Ile22 bond. A putative endopeptidase specific for this
cleavage was postulated (1) and has been referred to as
endothelin-converting enzyme (ECE). Complementary DNAs coding for two
ECEs have recently been isolated (12, 13, 14, 15, 16, 17, 18),
and the corresponding proteins have been termed ECE-1 (13) and
ECE-2 (18). Both enzymes are membrane zinc-binding
metalloendopeptidases, with a single transmembrane domain separating a
short N-terminal cytoplasmic tail from a large C-terminal extracellular
domain, which contains the active site. ECE-1 has been shown to be
located mainly on the plasma membrane (19), whereas the acidic pH
optimum of ECE-2 suggests an intracellular localization. Moreover, the
wide and abundant tissue distribution of ECE-1 mRNA favors the
hypothesis that this enzyme is mainly responsible for the cleavage of
big endothelins.
We have previously reported the cloning of two human ECE-1 isoforms,
both of which displayed endothelin-converting activity. These two
enzymes have been tentatively termed ECE-1a and ECE-1b and differ in
their N-terminal regions. Sequences of the two corresponding cDNAs
have since been published separately (15, 16, 17). We present here the
structure of the ECE-1 gene and its chromosomal localization, and show
that ECE-1a and ECE-1b are encoded by the same gene through the use of
two promoters separated by approximately 11 kb. We also analyze the
evolutionary relationship between ECE-1 and several other zinc
metalloproteases by comparing their sequences and gene structures.
Figure 1: Organization of the human ECE-1 gene. A, schematic drawing of the ECE-1 gene structure. The 19 exons are boxed; restriction enzyme cleavage sites (B, BamHI; E, EcoRI; S, SalI) and positions of the translational start (atg) and stop (taa) codons are indicated. The genomic library phages used are shown, with arrows marking their ends. Introns 2, 4, 8, and 14 are not drawn to scale. Intron 2 is approximately 11 kb long; the lengths of introns 4, 8, and 14 could not be determined but are at least 6 kb. B, generation of ECE-1a and ECE-1b isoform RNAs. The first three exons of the ECE-1 gene are drawn. ECE-1a-specific sequences, ECE-1b-specific sequences, and sequences common to both isoforms are represented by open, hatched, and filled boxes, respectively. Exons 4-19, which are not included in this figure, are common to both isoform mRNAs. Open ellipses represent the putative promoters of the two isoforms.
Figure 4:
Nucleotide sequences of the 5′ regions (A) and exon-intron boundaries (B) of the human ECE-1
gene. A, nucleotides belonging to the gene coding regions and the
corresponding amino acid residues (above) are shown in capital letters; a-type numbers refer to the region
immediately upstream from exon 3, containing the ECE-1a transcription start
sites, while b-type numbers refer to the region surrounding
ECE-1b exons 1 and 2. Nucleotides a1 and b1 correspond to the A of the putative functional start codons of
ECE-1a and ECE-1b, respectively. Potential binding sites for known
transcription factors are double underlined, and ECE-1b
exon-intron boundaries are marked. Symbols placed under
the corresponding nucleotides indicate the 5′-ends of cloned human
ECE-1 cDNAs ((17) and O. Valdenaire, unpublished results) and the
start sites obtained by the SLIC methodology with
ECV304 cells and with HUVEC. B, nucleotide positions on the
ECE-1 cDNA correspond to the sequence given in (16), except for ECE-1b introns 1 and 2 (see Fig. 4A).
To isolate the gene encoding ECE-1, a human genomic library
was screened using ECE-1 cDNA fragments or oligonucleotides as
hybridization probes. A total of 59 clones were positive.
Six clones, encompassing the major part of the gene, were selected and
analyzed. The ECE-1 gene spans more than 68 kb and contains 19
exons (Fig. 1A). Almost all exon/intron boundaries
display the consensus splice donor and acceptor sequences (Fig. 4B). Two nucleotide differences, which could
reflect allelic variation, were detected between the genomic coding
sequence and the various published cDNA sequences: a silent substitution at
position 1119 (C to T) and a G-to-C substitution at position
b5 of the ECE-1b sequence (Fig. 4A), which replaces an
arginine with a proline.
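The statement that a G-to-C change at position b5 replaces an arginine with a proline can be checked against the standard genetic code. The short Python sketch below assumes, as the numbering in Fig. 4A implies, that b1-b3 is the ATG start codon and that no intron interrupts b1-b6; it locates b5 within its codon and lists which arginine codons would become proline codons under that substitution. The helper names are illustrative only.

```python
# Standard genetic code built from the usual TCAG ordering of bases.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codon_position(n: int) -> tuple:
    """(codon number, 0-based offset within codon) for position bn, with ATG at b1-b3."""
    return (n - 1) // 3 + 1, (n - 1) % 3

def arg_codons_giving_pro(offset: int) -> list:
    """Arginine codons that become proline when a G at `offset` is changed to C."""
    hits = []
    for codon, aa in CODE.items():
        if aa == "R" and codon[offset] == "G":
            mutant = codon[:offset] + "C" + codon[offset + 1:]
            if CODE[mutant] == "P":
                hits.append(codon)
    return sorted(hits)

if __name__ == "__main__":
    codon_no, offset = codon_position(5)  # position b5
    print(codon_no, offset, arg_codons_giving_pro(offset))
```

Only the four CGN arginine codons satisfy the constraint, and only when the change hits the second base of the codon, which is where b5 falls (codon 2, second position); a G-to-C change at the first or third position of any arginine codon never yields proline.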
Two isoforms of human ECE-1 have been
characterized. ECE-1a (758 residues) and ECE-1b (770
residues) differ only in their N termini and share the same C-terminal
726 residues. Sequences of the corresponding human mRNAs have been
published separately (15, 16, 17). Northern
analysis (Fig. 3A) shows that these two mRNAs are
widely expressed in human tissues but that their relative abundance
varies. ECE-1b signals were stronger than ECE-1a signals in
pancreas, peripheral blood leukocytes, prostate, testis, colon, and
ECV304 cells, whereas lung, spleen, placenta, small intestine, and
HUVEC exhibited stronger ECE-1a signals. The relative levels of the two
isoform mRNAs remain to be assessed by a more quantitative method, such as
RNase protection.
Figure 3:
Tissue distribution of human ECE-1 and its
two isoform mRNAs (A) and determination of ECE-1 transcription
start site by RNase protection (B). A, each sample
contains 2 µg of poly(A)+ RNA, except for HUVEC and
ECV304 cells (10 µg of total RNA). The blots were hybridized with a
probe common to both isoforms (ECE-1) and with probes specific for each
isoform (ECE-1a and ECE-1b), revealing a major signal of approximately
4.8 kb. A minor transcript of approximately 3.5 kb can also be detected
in ovary. Hybridization to a human β-actin probe is shown as a control
for sample integrity. Sk. Muscle, PBL, and S.
Intestine refer to skeletal muscle, peripheral blood leukocytes,
and small intestine, respectively. B, total RNAs (10 µg)
from HUVEC (lanes 2 and 4), ECV304 cells (lanes 1 and 5), and yeast (lanes 3 and 6) were
protected with 32P-labeled antisense RNAs transcribed from
two genomic fragments corresponding (Fig. 4A) to
nucleotides -a260 to -a83 (ECE-1a; lanes
1-3) and to nucleotides -b31 to b265 (ECE-1b; lanes 4-6). The approximate (±1 nucleotide)
positions of the signals, numbered as in Fig. 4A, are
indicated.
To check that the two isoforms were produced by
the same gene, a genomic Southern blot was hybridized with a short
ECE-1 cDNA fragment (nucleotides 1654-2086,
corresponding to exons 15-18). Two EcoRI fragments
(approximately 3 and 10 kb) and a 9-kb BamHI fragment were
detected, in agreement with the restriction map of the ECE-1
gene. The uniqueness of the ECE-1 gene was confirmed by chromosomal in situ hybridization (Fig. 2). Of the 212 silver grains associated with chromosomes in the 150
metaphase cells examined after hybridization with an ECE-1 cDNA probe,
91 were located on chromosome 1. The distribution of grains on this chromosome was not
random, since 75 (82.4%) of them mapped to the p36 band of the short arm of chromosome 1.
From these results, we conclude that the two isoform mRNAs
originate from a single gene located on the 1p36 band of the human
genome.
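The 82.4% figure and the argument for non-random clustering follow from simple counting. As a rough illustration, the Python sketch below recomputes the percentages and adds an exact binomial tail probability of finding at least 91 of the 212 grains on chromosome 1 by chance. The null proportion of 0.08 is an assumption (chromosome 1 is roughly 8% of the haploid genome length), not a value measured in this study, and this test is not part of the analysis reported here.

```python
from math import comb

TOTAL_GRAINS = 212  # silver grains associated with chromosomes (150 metaphases)
CHR1_GRAINS = 91    # grains located on chromosome 1
P36_GRAINS = 75     # grains mapping to band 1p36

# Assumption: chromosome 1 is ~8% of haploid genome length, used as the
# probability that a randomly placed grain lands on it.
P_CHR1_NULL = 0.08

def binom_sf(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    print(f"chromosome 1: {100 * CHR1_GRAINS / TOTAL_GRAINS:.1f}% of grains")
    print(f"1p36: {100 * P36_GRAINS / CHR1_GRAINS:.1f}% of chromosome 1 grains")
    print(f"expected on chromosome 1 under the null: {TOTAL_GRAINS * P_CHR1_NULL:.1f}")
    tail = binom_sf(CHR1_GRAINS, TOTAL_GRAINS, P_CHR1_NULL)
    print(f"P(X >= {CHR1_GRAINS}) = {tail:.2e}")
```

Under this assumed null, about 17 grains would be expected on chromosome 1, so 91 is far outside what random labeling would produce; a similar calculation for the p36 band would need an assumed band length, which is not given here.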
Figure 2:
Chromosomal localization of the human
ECE-1 gene. Diagram of the G-banded human chromosome 1 illustrating the
distribution of labeled sites after hybridization with an
ECE-1 cDNA probe.
RNase protection assays combined with the SLIC methodology (22), which allows primer extension products to be cloned, revealed two different transcription start sites (Figs. 3B and 4A) corresponding to the two ECE-1 isoforms. All the ECV304 SLIC clones started at the same site, located eight nucleotides upstream from the ECE-1b ATG (nucleotide -b8). This start point was also detected by RNase protection and is consistent with the first nucleotide (-b9) of a published ECE-1b cDNA (17). RNase protection also revealed other, more proximal signals (positions b8, b10, and b18). RNAs transcribed from these putative start sites would encode a shorter protein of 753 residues, initiated at a downstream ATG codon (position b258). The possible existence of such a third isoform remains to be investigated. A major ECE-1a transcription start point was detected by RNase protection at position -a213. HUVEC SLIC clones started at various positions, spread over a region of 94 base pairs (positions -a212 to -a119 in Fig. 4A). This could reflect minor transcription start points, which would agree with the additional faint signals in the protection assay.
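Positions in Fig. 4A are counted relative to each start codon with no position 0 (-b1 is immediately followed by b1), so converting clone spans or protected-fragment sizes into coordinates invites off-by-one errors. A minimal Python sketch of that arithmetic follows, assuming this no-zero convention and that the stretch being measured is not interrupted by an intron; the function names are hypothetical.

```python
def to_int(label: str) -> int:
    """Convert a Fig. 4A-style label such as '-a213' or 'b265' to a signed int.

    The series letter ('a' or 'b') is dropped, so callers must not mix series.
    """
    negative = label.startswith("-")
    value = int(label.lstrip("-")[1:])
    return -value if negative else value

def span_length(first: str, last: str) -> int:
    """Inclusive number of nucleotides from `first` to `last` (first upstream)."""
    start, end = to_int(first), to_int(last)
    if start > end:
        raise ValueError("first must not lie downstream of last")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the scale jumps from -1 straight to +1
    return length

if __name__ == "__main__":
    print(span_length("-a212", "-a119"))  # spread of the HUVEC SLIC clones
    print(span_length("-a213", "-a83"))   # major ECE-1a start to probe 3' boundary
    print(span_length("-b8", "-b1"))      # ECV304 start site to just before the ATG
```

With these conventions the 94-bp spread of HUVEC SLIC clones is the inclusive span from -a212 to -a119, -b8 lies eight nucleotides upstream of the ECE-1b ATG, and a transcript starting at -a213 would protect 131 nucleotides of the ECE-1a probe, whose 3′ boundary (sense orientation) is -a83.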
The organization of the ECE-1 gene and the location of its transcription start sites explain the N-terminal duality of ECE-1 (Fig. 1B). Exons 1 and 2 encode ECE-1b-specific sequences, and synthesis of ECE-1b mRNA involves splicing out an ECE-1a-specific part of exon 3 (up to nucleotide a102). Such an organization suggests the presence of two isoform-specific promoters, located upstream from exon 1 and upstream from exon 3, respectively. A similar arrangement, which also gives rise to two enzyme isoforms, exists in the glucokinase gene (24).
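The exon logic of Fig. 1B can be written down as a small data model: choosing a promoter fixes the first transcribed segment, and the ECE-1b transcript additionally loses the ECE-1a-specific 5′ part of exon 3. The Python sketch below is purely illustrative; the segment labels are hypothetical names, introns are not modeled, and no lengths are implied.

```python
from dataclasses import dataclass

# Hypothetical exon-segment labels in genomic order (5' to 3'); introns omitted.
GENE = ("exon1", "exon2", "exon3_1a_specific", "exon3_shared", "exons4_19")

@dataclass(frozen=True)
class Promoter:
    location: str            # descriptive only
    first_segment: str       # segment in which transcription starts
    spliced_out: tuple = ()  # exon segments removed from this transcript

PROMOTERS = {
    "ECE-1b": Promoter("upstream of exon 1", "exon1",
                       spliced_out=("exon3_1a_specific",)),
    "ECE-1a": Promoter("upstream of exon 3", "exon3_1a_specific"),
}

def mature_mrna(isoform: str) -> list:
    """Exon segments retained in the mature mRNA of the given isoform."""
    promoter = PROMOTERS[isoform]
    start = GENE.index(promoter.first_segment)
    return [seg for seg in GENE[start:] if seg not in promoter.spliced_out]

if __name__ == "__main__":
    for isoform in PROMOTERS:
        print(isoform, mature_mrna(isoform))
```

Both lists end with the same shared segments, mirroring the common C-terminal 726 residues, while their 5′ segments differ, which is the N-terminal duality described above.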
Regions located upstream from exon 1 and exon 3 (Fig. 4A), which should direct the transcription of ECE-1b and ECE-1a RNAs, respectively, display many putative binding sites for known transcription factors (25). The region surrounding the ECE-1b start point has features characteristic of a housekeeping gene promoter: it lacks both CAAT and TATA boxes, is very GC-rich, and contains many SP1 and AP2 consensus sites. Several of these motifs are located in the small first intron, which may therefore play a role in transcription. The putative ECE-1b promoter also contains three shear-stress response motifs, a putative binding site for the ISGF3 factor, and an inverted GATA box. Transcription factor GATA-2 plays an important role in preproendothelin-1 gene transcription (26), and shear-stress response elements are present in several endothelial genes (27, 28). The ECE-1a transcription start region does not display housekeeping gene features but contains a CAAT box and potential binding sites for glucocorticoid receptors, NF-κB, PU.1, AP1, AP2, and c-ets1 transcription factors, as well as one shear-stress and three acute-phase responsive elements (29). The proto-oncogene c-ets1 has been shown to be expressed in endothelial cells during angiogenesis and tumor vascularization (30). The biological relevance of these sites remains to be investigated, especially with respect to differential transcriptional regulation of the ECE-1a and ECE-1b isoforms.
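The promoter survey above amounts to scanning a sequence for consensus motifs on both strands (an "inverted" GATA box is simply a match on the reverse complement) and measuring GC content. The Python sketch below illustrates that idea only: the consensus strings are simplified, commonly cited core sequences chosen here as assumptions (GC box GGGCGG for SP1, WGATAR for GATA, CCAAT, TATAWAW, and GAGACC as a shear-stress response element core), not the database or search criteria of reference 25, and any test sequence is invented.

```python
import re

# IUPAC nucleotide codes -> regex character classes (subset used here).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")

# Assumed, simplified core consensus sequences (illustrative only).
MOTIFS = {
    "SP1 (GC box)": "GGGCGG",
    "GATA": "WGATAR",
    "CAAT box": "CCAAT",
    "TATA box": "TATAWAW",
    "SSRE core": "GAGACC",
}

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def to_regex(motif: str) -> str:
    return "".join(IUPAC[base] for base in motif)

def scan(seq: str) -> list:
    """Sorted (1-based start, strand, motif name) hits on both strands."""
    seq = seq.upper()
    hits = []
    for name, motif in MOTIFS.items():
        variants = [("+", motif)]
        rc = reverse_complement(motif)
        if rc != motif:  # a palindromic motif would otherwise be reported twice
            variants.append(("-", rc))
        for strand, pattern in variants:
            # Zero-width lookahead so that overlapping matches are all found.
            for m in re.finditer(f"(?={to_regex(pattern)})", seq):
                hits.append((m.start() + 1, strand, name))
    return sorted(hits)

def gc_content(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

if __name__ == "__main__":
    demo = "GGCTATCAGGAAGGGCGGTT"  # invented sequence
    print(scan(demo), round(gc_content(demo), 2))
```

Real promoter analyses use position weight matrices and curated factor databases; exact consensus matching like this over-reports short motifs and misses degenerate sites.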
Among all
known zinc metalloproteases, only two bacterial and three mammalian
enzymes display a significant sequence homology with ECE-1, clear
enough to suggest that the six enzymes originate from a common
precursor. The three mammalian enzymes are ECE-2(18) , neutral
endopeptidase (NEP) (31) and Kell blood group protein
(Kell) (32). The Lactococcus lactis PepO (33) and a Streptococcus gordonii metalloendopeptidase lack a transmembrane domain and are somewhat shorter
(by approximately 100 residues) than the other enzymes. A phylogenetic
tree (Fig. 5B), constructed from the peptide sequences,
depicts the evolutionary relationship within this enzyme family. A
second tree, deduced from the comparison of 120-residue regions
surrounding the zinc binding domain (Fig. 5C),
exchanges the positions of the bacterial enzymes and Kell. This
reflects a surprisingly high similarity within this 120-residue
domain between the bacterial endopeptidases and their mammalian
relatives (50% amino acid identity between ECE-1 and PepO, versus 25% over the whole sequence). Such high conservation
may indicate that this region is subject to strong
evolutionary constraints and that many of its residues are necessary to
keep the proper folding of the zinc metalloprotease active site. The
genes encoding NEP and Kell have been previously
characterized(34, 35) . The similar organizations of
the three genes (Fig. 5A) confirm that they belong to
the same family and moreover indicate that divergence between the
mammalian genes occurred after the divergence of eukaryotes and
prokaryotes. The NEP and Kell genes have been mapped to human
chromosomes 3q21-27 (36) and 7q33 (37), respectively, and the ECE-1 gene to
chromosome 1p36 (Fig. 2). To date, at least two other gene families link
these chromosomal regions: the
genes encoding carboxypeptidases A1 (38) and A3
are located on 7q32 and 3q21-25, respectively, and
the paired box homeotic genes PAX4 and PAX7 are located on 7q32 and
1p36, respectively (39, 40). These gene positions may
be the remnants of large DNA duplications that were at the origin of
the ECE-Kell-NEP family diversification.
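The identity figures above (50% versus 25%) and the trees in Fig. 5 rest on pairwise comparisons of aligned sequences. As a hedged illustration only, the Python sketch below computes identity over gap-free alignment columns (one common convention; the GCG definition may differ) and builds a tree topology by UPGMA on 1 - identity distances. UPGMA is a swapped-in example method, since the text does not state which GCG tree-building algorithm was used, and the short sequences in the usage example are invented toy data.

```python
from itertools import combinations

def identity(a: str, b: str) -> float:
    """Fraction of identical residues over aligned columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x == y for x, y in pairs) / len(pairs)

def upgma(aligned: dict) -> str:
    """UPGMA topology (no branch lengths) from distances 1 - identity."""
    dist = {frozenset(p): 1 - identity(aligned[p[0]], aligned[p[1]])
            for p in combinations(aligned, 2)}
    sizes = {name: 1 for name in aligned}  # cluster label -> number of leaves
    while len(sizes) > 1:
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = f"({a},{b})"
        for c in sizes:
            if c not in (a, b):
                # Leaf-count-weighted average distance from the new cluster to c.
                dist[frozenset((merged, c))] = (
                    sizes[a] * dist[frozenset((a, c))]
                    + sizes[b] * dist[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))

if __name__ == "__main__":
    toy = {"A": "MKTAYIAK", "B": "MKTAYIAR", "C": "MRSAYLGK", "D": "MRSAFLGK"}
    print(identity(toy["A"], toy["C"]), upgma(toy))
```

Restricting the input columns to a window, as was done for the 120-residue region around the HEXXH motif, changes the distances and can therefore swap branches, which is the effect seen between the two trees of Fig. 5.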
Figure 5:
Comparison of ECE-1, NEP, and Kell gene
structures (A) and phylogenetic analysis (B and C). A, the coding regions of the three cDNAs are shown
interrupted at the exon/intron boundaries. Two different symbols
indicate
the boundaries exactly conserved between all three genes and between
ECE-1 and Kell, respectively. Positions of the nucleotides encoding the
transmembrane domains (TM) and the zinc-binding motifs
(HEXXH) are indicated. B, phylogenetic tree of the
ECE-Kell-NEP family, constructed with the GCG (Genetics Computer Group)
package. h, human; b, bovine; r, rat; Strep, an uncharacterized metalloprotease of S. gordonii.
GenBank/EMBL accession numbers of the sequences used are as follows:
rNEP, P07861; hNEP, P08473; hECE-1, S47269; rECE-1, A53679; bECE-1,
S47268; bECE-2, U27341; hKell, P23276; PepO, A47098; and Strep, L11577.
Sequences were aligned, and only the peptide regions that could be
aligned with the PepO sequence were retained for the analysis; these
corresponded roughly to the extracellular parts of the mammalian
enzymes (e.g. from residue 77 to the C terminus for hNEP). C, phylogenetic tree of the ECE-Kell-NEP family, this
time using only the peptide stretches that could be aligned with residues 534-660 of hNEP, which surround the
HEXXH motif.
The nucleotide sequences reported in this paper have been submitted to the GenBank™/EMBL Data Bank with accession numbers X91922-X91939.