(Received for publication, October 2, 1995; and in revised form, January 4, 1996)
Through analysis of the porcine gene encoding the elastase inhibitor elafin, we demonstrated that there are at least three closely related members of the elafin family and that their genes have arisen by accelerated evolution. A porcine genomic DNA library was screened with a previously cloned human elafin cDNA probe, and several positive clones were obtained that can be distinguished by a combination of restriction enzymes. Sequence analysis of these clones revealed three homologous members whose genes, each consisting of three exons and two introns, are almost identical except for the exon 2 sequences encoding the inhibitor domain called the "WAP motif." The intron sequences are related to each other with sequence similarities of 93-98%, whereas the exon 2 sequences exhibit only 60-77% similarity among the three members. The extreme divergence in the exon 2 sequences, compared with the highly conserved intron sequences, may have been generated by accelerated mutations confined to a short stretch of the genes following recent duplications of a single ancestral gene. An RNase protection assay indicated that the messages of the elafin family members are abundantly expressed in the trachea and intestine, suggesting that the most likely selective forces for the accelerated evolution are extrinsic proteinases produced by invasive microorganisms.
Elafin is a unique elastase inhibitor (1) that has a
transglutaminase substrate domain, which serves as an anchoring sequence
to confine the inhibitor at its sites of action (2-4). Elafin, also
known as skin-derived antileukoproteinase (SKALP) (5, 6) or
elastase-specific inhibitor (ESI) (7), is synthesized as pre-elafin and
secreted into the extracellular matrix, where it is cross-linked to
certain structural proteins through its transglutaminase substrate
domain by the action of transglutaminase (for reviews of
transglutaminase, see Refs. 8-10). Although elafin was first isolated
as a 57-amino acid protein with a characteristic disulfide-linked
structure called the four-disulfide core or WAP motif (11), the
57-amino acid form is now considered to represent the inhibitor domain,
generated by proteolytic cleavage from the 95-amino acid native elafin.
Native elafin consists of two domains: 1) a transglutaminase substrate
domain, which we termed the cementoin moiety (3), and 2) the inhibitor
domain or WAP motif. Biochemical and molecular biological studies of
elafin have so far been performed with human materials. For more
detailed studies, however, such as identification of the acceptor
proteins to which elafin is anchored by covalent cross-linking through
its transglutaminase substrate domain, we considered it more appropriate
to use another mammalian species from which fresh materials are
relatively easy to obtain. We therefore initiated cloning and
characterization of the porcine elafin gene, which led to two
unexpected findings: 1) there are at least three elafin-related genes
that are very similar to each other over their entire length of about
4 kb, including the intron sequences, suggesting that they arose by
relatively recent gene duplications; and 2) in contrast to the
extremely high similarity (97-98%) of the intron sequences and of the
exon portions corresponding to noncoding regions, the sequences coding
for the inhibitor domains, or WAP motif regions, are surprisingly
variable (77-81%) among the newly found elafin family members. This
kind of dramatic divergence within a short stretch of a gene,
specifically affecting functionally important regions of the encoded
protein, is called accelerated evolution.
Accelerated evolution, an unusually high rate of mutation in a
particular segment of a gene, is a process postulated to occur in genes
following a duplication event (12, 13). It is considered an effective
mechanism for providing hosts with a defense system against unwanted
outsiders such as pathogens and parasites and, conversely, for providing
the intruders with an increased capacity for invasion. Currently,
however, only a few cases of accelerated evolution have been reported.
1) The first case is a family of serine proteinase inhibitors (serpins)
in which an unusually high degree of polymorphism was found in a narrow
region surrounding the reactive site. This original observation by Hill
et al. (14, 15) was later extended to other members of the serpin
family by Borriello and Krauter (16), and the divergence was confirmed
at the cDNA level; however, their genes, especially the introns, have
not been analyzed. Extrinsic proteinases used by parasites to facilitate
their spread throughout the host are considered the most likely
selective force. 2) An extremely high rate of nonsynonymous nucleotide
substitutions (those that cause amino acid changes) has been found by
cDNA cloning in the active domain-coding region of the wheat thionin
genes (17). Thionins are a family of cysteine-rich proteins that are
active against plant pathogens. 3) Nakashima et al. (18, 19) found that
the introns and noncoding regions of the Trimeresurus flavoviridis
(Habu snake) venom gland phospholipase A2 isozyme genes are unusually
conserved compared with the protein-coding regions, except for the
presequence-coding regions, indicating that these genes have evolved so
as to bring about accelerated amino acid substitutions in the mature
protein-coding regions. The additional example of accelerated evolution
reported here for the elafin family genes should be helpful for both
theoretical and experimental analyses of these unusual events in gene
evolution. It should also be emphasized that the genes analyzed in the
present study are the first mammalian gene family found to have introns
whose sequences are exceptionally highly conserved (93-98% sequence
identity) compared with their relatively divergent exons. These genes
may therefore be one of the most recently duplicated gene families.
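The contrast drawn above, conserved introns versus divergent WAP motif-coding exons, amounts to computing percent identity separately for each region of an alignment. The following is a minimal sketch, assuming the sequences are already aligned and the region boundaries are known. The gap convention (a column with gaps in both sequences is skipped; a gap opposite a base counts as a mismatch) and the toy sequences are illustrative assumptions, not the procedure or data of this study.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap ('-') are skipped; a gap
    opposite a base counts as a mismatch (one simple convention of several).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def region_identities(a: str, b: str, regions: dict) -> dict:
    """Percent identity for each named (start, end) slice of an alignment."""
    return {name: percent_identity(a[s:e], b[s:e]) for name, (s, e) in regions.items()}


# Hypothetical toy alignment (NOT sequence data from this study):
# an identical "intron" block followed by a divergent "exon" block.
seq1 = "GTAAGTCCTT" + "ATGCCAGGTA"
seq2 = "GTAAGTCCTT" + "ATGTCTGCTA"
regions = {"intron": (0, 10), "exon": (10, 20)}
print(region_identities(seq1, seq2, regions))
```

With real data the same per-region comparison, applied pairwise to the three porcine genes, would expose the pattern described in the text: high identity in the introns and much lower identity in the WAP motif-coding segment.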
Figure 1: Schematic representation of the gene structures of porcine elafin (pgWAP-1), its family members (pgWAP-2 and pgWAP-3), and human elafin. Gray boxes represent exon 1. Repetitive sequence regions in exon 2 are indicated by hatched boxes. Filled boxes indicate the regions coding for the WAP motif or inhibitor domain. Open boxes indicate the noncoding regions (exon 3). Restriction enzyme sites are also shown to illustrate the high degree of similarity among the three porcine genes, even in the intron sequences. Gaps are introduced in regions a and b to produce an optimal alignment; the gaps in region a are due to differences in the number of repeats, and the gap in region b of the human elafin gene reflects the presence of a SINE sequence (PRE-1, small striped boxes) (29, 31) in the three porcine genes.
Figure 2: Nucleotide sequence of the porcine WAP-1 (elafin) gene and the amino acid sequence deduced from its open reading frame. The nucleotide sequence is shown in upper case letters, and the deduced amino acid sequence is shown below it. Nucleotide and amino acid residue numbers are shown to the right of the sequence. The arrowhead indicates the putative cleavage site of the signal sequence; the arrow marks the beginning of the WAP motif or inhibitor domain. Intron sequences are bracketed [GT . . . AG]. The polyadenylation signal is indicated by a closed box. The eight cysteine residues defining the WAP motif or four-disulfide core are shown white on black. The open box indicates the 3′-untranslated region (exon 3). The gray box represents the SINE (PRE-1), flanked by a pair of direct repeats marked by double underlines.
Figure 3: Amino acid sequence comparison among elafin and its family members having the WAP motif. Amino acid sequences of human elafin (h-ela) and porcine WAP-1 (elafin), WAP-2 (also known as SPAI-2 (43-45)), and WAP-3 are aligned, and identical amino acid residues are shown white on black or in gray boxes. The eight cysteine residues defining the WAP motif or four-disulfide core structure of the inhibitor domain (marked by horizontal lines) are indicated by asterisks. Amino acids are shown in the single-letter code. The residue number of the last residue in each line is noted on the right. The sequence of human elafin is from Saheki et al. (22). Boxes 1 and 2 indicate highly variable regions and correspond to boxes 1 and 2 in Fig. 5, respectively. The box 2 region corresponds to the reactive site of the acid-stable proteinase inhibitor of human mucous secretions (HUSI-I or antileukoprotease), which consists of two WAP motifs (32, 46).
Figure 5: High similarity in overall structure and extreme sequence divergence in a limited region of the genes of the elafin family members. The complete nucleotide sequences of the porcine WAP-1 (elafin), WAP-2, and WAP-3 genes were determined, deposited in the GSDB/DDBJ/EMBL/NCBI Data Bank, and compared with each other and with that of the human elafin gene (h-ela) (22). In this figure, only the exon 2 sequences and the parts of the intron sequences flanking them are shown. Identical nucleotides are shown as dots. Gaps are designated by dashes. Stop codons are boxed. Note the high degree of sequence conservation in the intron sequences relative to hypervariable regions 1 and 2 (open boxes) in the WAP motif-coding regions (- - -). The amino acid sequences encoded by the hypervariable regions are shown in Fig. 3 (boxes 1 and 2).
Figure 4: Inhibition of pancreatic elastase by the WAP motif sequences of the porcine elafin family members. Three peptides (pWap-1, pWap-2, and pWap-3) corresponding to the WAP motif regions (indicated by the horizontal line in Fig. 3) of the porcine WAP-1, WAP-2, and WAP-3 proteins were chemically synthesized and assayed for inhibitory activity against elastase using a fluorogenic substrate. Assay mixtures (2 ml) contained 1 µg of elastase and either 10 µg of a synthetic peptide or 0.3 µg of human (h) elafin.
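In the 2-ml assay described above, the final concentrations are 0.5 µg/ml elastase and 5 µg/ml synthetic peptide (0.15 µg/ml for human elafin). The caption does not state how inhibition was quantified. A common approach with fluorogenic substrates, assumed here, is to take the initial rate as the slope of fluorescence versus time and express inhibition relative to an uninhibited control. The sketch below uses an ordinary least-squares slope; the function names and readings are hypothetical.

```python
def initial_rate(times, signal):
    """Least-squares slope of fluorescence versus time (the initial velocity).

    Assumes the readings lie in the linear, early phase of substrate hydrolysis.
    """
    n = len(times)
    if n < 2 or n != len(signal):
        raise ValueError("need at least two paired time points")
    mean_t = sum(times) / n
    mean_f = sum(signal) / n
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, signal))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


def percent_inhibition(v_control, v_inhibited):
    """Residual activity expressed as inhibition relative to the enzyme-only control."""
    return 100.0 * (1.0 - v_inhibited / v_control)


# Hypothetical fluorescence readings (arbitrary units), not data from Fig. 4.
t = [0, 1, 2, 3]
control = [0, 10, 20, 30]        # elastase alone
with_peptide = [0, 2.5, 5, 7.5]  # elastase plus a synthetic WAP peptide
print(percent_inhibition(initial_rate(t, control), initial_rate(t, with_peptide)))
```

Fitting a slope over several points, rather than taking a single endpoint reading, makes the estimate less sensitive to noise in any one measurement.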
These comparisons, especially the intraspecies and interspecies comparisons of the intron sequences, strongly suggest that duplication and diversification of the porcine elafin gene occurred after the porcine lineage had diverged from the human lineage. This conclusion is supported by the presence of a SINE in the porcine genes and its absence from the human elafin gene (Fig. 1; see below and "Discussion").
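The inference in the paragraph above rests on tree topology: if the three porcine genes cluster with one another before any of them joins human elafin, their duplications postdate the split between the two lineages. The paper does not say how its dendrogram (Fig. 6A) was built, so the sketch below substitutes UPGMA (average-linkage clustering), one standard way to build a tree from pairwise distances. The labels and distance values are placeholders for illustration only.

```python
from itertools import combinations


def upgma_merges(labels, pair_dist):
    """Return the sequence of UPGMA joins as (cluster_a, cluster_b, height).

    labels: leaf names; pair_dist: {frozenset({x, y}): distance} for every pair.
    Clusters are frozensets of leaf names; height is half the joining distance.
    """
    size = {frozenset([x]): 1 for x in labels}
    dist = {
        frozenset([frozenset([x]), frozenset([y])]): pair_dist[frozenset([x, y])]
        for x, y in combinations(labels, 2)
    }
    merges = []
    while len(size) > 1:
        closest = min(dist, key=dist.get)  # pair of clusters at minimum distance
        a, b = tuple(closest)
        merged = a | b
        merges.append((a, b, dist[closest] / 2))
        new_size = size[a] + size[b]
        # UPGMA update: size-weighted average of the old distances to a and b
        for c in [c for c in size if c not in (a, b)]:
            dist[frozenset([merged, c])] = (
                size[a] * dist[frozenset([a, c])] + size[b] * dist[frozenset([b, c])]
            ) / new_size
        # drop every distance that still refers to a or b individually
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        del size[a], size[b]
        size[merged] = new_size
    return merges


# Placeholder distances (NOT values from this study): paralogs p1-p3, ortholog h.
d = {
    frozenset({"p1", "p2"}): 0.05, frozenset({"p1", "p3"}): 0.06,
    frozenset({"p2", "p3"}): 0.04, frozenset({"p1", "h"}): 0.30,
    frozenset({"p2", "h"}): 0.30, frozenset({"p3", "h"}): 0.30,
}
for a, b, height in upgma_merges(["p1", "p2", "p3", "h"], d):
    print(sorted(a), sorted(b), round(height, 4))
```

With these placeholder distances, p2 and p3 join first, p1 joins them next, and h joins last. That topology is the one consistent with duplications after speciation.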
Figure 6: Phylogenetic tree of the porcine WAP family members and generation of divergence in the repetitive sequences. A, phylogenetic tree constructed based on the amino acid and nucleotide sequences of the family members and their genes. B, alignment of repetitive sequences of pgWAP-1, -2, and -3. Only the nucleotides different from the consensus sequence are shown. The nucleotides boxed in black represent those different from the consensus but conserved among the family members, which helped to define a patchwork pattern of resemblance (blocks A, B, and C) in the repetitive sequence of the three members.
Fig. 6B compares the repetitive sequences of pgWAP-1, -2, and -3 by highlighting the nucleotide positions that differ from the consensus sequence AAAGGTCAAGATCCAGTC. This comparison revealed an interesting pattern of similarity among the family members: the repetitive sequences can be subdivided into three blocks, each consisting of several 18-nucleotide repeating units (blocks A, B, and C in Fig. 6B). pgWAP-1 and pgWAP-2 share block A; pgWAP-1 and pgWAP-3 share part of block A; and pgWAP-2 and pgWAP-3 share block B and part of block C. To gain some insight into how these blocks of repeating sequences were generated, we constructed a phylogenetic tree (dendrogram) of the three porcine WAP family genes based on their nucleotide sequence similarities and tried to locate on the dendrogram the points at which the blocks were inserted or deleted (Fig. 6A). The following series of events may have occurred: 1) block A was inserted before the first duplication of the ancestral gene; 2) block B was inserted after the first duplication but before the second duplication, which produced pgWAP-2 and pgWAP-3; 3) block C was inserted following the second duplication; and 4) block D was deleted from the pgWAP-3 lineage after the second duplication. The insertion of block C is considered to be a relatively recent event, since its repeating units are the most highly conserved.
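The Fig. 6B display, which shows only the nucleotides that differ from the 18-nt consensus and marks differences shared between family members, can be sketched as follows. The consensus string comes from the text. The function names, the use of dots for matching positions, and the example repeat units are assumptions for illustration; the units are not the actual pgWAP repeats.

```python
CONSENSUS = "AAAGGTCAAGATCCAGTC"  # 18-nt consensus repeat unit given in the text


def split_units(region, unit_len=len(CONSENSUS)):
    """Cut a tandem-repeat region into consecutive units of unit_len nucleotides."""
    return [region[i:i + unit_len] for i in range(0, len(region), unit_len)]


def mask_consensus(unit, consensus=CONSENSUS, match_char="."):
    """Show only the nucleotides that differ from the consensus."""
    return "".join(
        match_char if i < len(consensus) and base == consensus[i] else base
        for i, base in enumerate(unit)
    )


def shared_variants(unit_a, unit_b, consensus=CONSENSUS):
    """Positions where two units carry the same non-consensus nucleotide."""
    return [
        i for i, (x, y, c) in enumerate(zip(unit_a, unit_b, consensus))
        if x == y and x != c
    ]


# Hypothetical repeat units (not the actual pgWAP sequences).
unit_1 = "AAAGGTTAAGATCCAGTC"
unit_2 = "AAAGGTTAAGATCCGGTC"
for u in (unit_1, unit_2):
    print(mask_consensus(u))
print(shared_variants(unit_1, unit_2))
```

Positions returned by `shared_variants` correspond to the black-boxed nucleotides in the figure: departures from the consensus that two members have in common, which is what defines the block-wise patchwork.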
Figure 7: Tissue distribution of porcine elafin family members determined by RNase protection analysis. Total RNA (5 µg) from the indicated porcine tissues was annealed with 32P-labeled cRNA probes specific for the porcine WAP-1, WAP-2, and WAP-3 messages and digested with RNases A and T1. The protected fragments were then analyzed by electrophoresis and fluorography. The autoradiogram represents an exposure of 5 days.
In the present paper, we report that the porcine elafin family genes
show an exceptionally high rate of nonsynonymous nucleotide
substitutions in a narrow region encoding the mature protein, whereas
the overall sequence similarity outside this region, including the
introns, is very high. A similar case has been reported for the genes
of snake venom phospholipase A2 isozymes (18, 19). Among the reports
describing accelerated evolution, these two are unique in demonstrating
unexpectedly high conservation of introns, which is remarkable because
introns are generally considered to evolve much faster than exons (35).
The other studies, concerning 1) α1-proteinase inhibitor or
α1-antitrypsin, a major plasma proteinase inhibitor (15, 16, 36), and
2) plant thionins (17), demonstrated greater substitution rates in the
reactive center region on the basis of protein or cDNA sequences. For
example, in the case of the plasma proteinase inhibitor α1-antitrypsin,
certain species such as mouse (16), guinea pig (37, 38), and
rabbit (36), unlike humans, who have a single gene (39), have been
suggested by cDNA cloning to have multiple genes that are very similar
in their overall exon sequences (97-98% identity) but show a strikingly
high level of sequence divergence in a short segment coding for a
9-amino acid stretch encompassing the reactive center.
Such an unusual kind of divergence has been interpreted as the result of
positive Darwinian selection, which creates new reactive site sequences
with varying specificities to cope with an increasing number of
attacking proteinases. The localization of the porcine elafin family
members in the trachea and intestine (Fig. 7) is consistent with this
view, since these tissues are exposed to a variety of infectious agents
such as bacteria and parasites. Another explanation for the divergence
in the narrow region is the neutral theory, which holds that whenever
an exceptionally high rate of substitution is encountered in molecular
evolution, we should suspect a loss of constraint that allows
previously harmful mutants to become selectively neutral (13). The
variable region is indeed located in a surface loop that is not crucial
for maintaining the tertiary structure of the protein, so mutations in
this region, unlike those in other parts of the protein, are not
selected against (40, 41). Under the neutralist interpretation,
however, the question remains why the intron sequences of the snake
venom phospholipase A2 isozyme (18, 19) and porcine elafin gene
families are conserved much more highly than the exons coding for the
hypervariable regions. The identification reported here, the first in
mammalian genomes, of a family of genes that are almost identical,
including the intron sequences, except for a short stretch of exons
where a mutational burst is observed, should contribute to a better
understanding of the molecular mechanisms of evolutionary
diversification.
Furthermore, the presence of short interspersed repetitive elements
(SINEs) in the conserved introns may make the genes suitable for such
analyses of diversification of duplicated genes as discussed below.
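The distinction between synonymous and nonsynonymous substitutions, central to the argument above, can be made concrete by classifying codon differences between two aligned coding sequences under the standard genetic code. The sketch below is deliberately simplified and is an assumption, not the method of this study. It counts raw differences rather than rates: proper estimates, such as Nei-Gojobori-type methods, also normalize by the numbers of synonymous and nonsynonymous sites. Codons differing at more than one position are set aside as "multiple" rather than assigned to a mutational pathway. The example sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_differences(seq1, seq2):
    """Count single-nucleotide codon differences as synonymous or nonsynonymous.

    Expects two aligned, gap-free coding sequences of equal length (a multiple
    of 3). Codons differing at 2-3 positions are only counted as 'multiple'.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("need aligned coding sequences of equal length, multiple of 3")
    counts = {"synonymous": 0, "nonsynonymous": 0, "multiple": 0}
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        n_diff = sum(x != y for x, y in zip(c1, c2))
        if n_diff == 0:
            continue
        if n_diff > 1:
            counts["multiple"] += 1
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Hypothetical coding sequences: CTG->CTA (Leu, synonymous), AAA->AGA
# (Lys->Arg, nonsynonymous), TGC->TGT (Cys, synonymous).
print(classify_codon_differences("ATGCTGAAATGC", "ATGCTAAGATGT"))
```

Applied to the WAP motif-coding segments of the three porcine genes, an excess of nonsynonymous over synonymous changes, after site normalization, is the signature the text describes.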
Eukaryotic genomes contain repeated sequence elements dispersed among thousands of locations. SINEs are the most abundant of these elements and constitute 1-5% of the total mass of DNA. Each species has its own SINEs; for example, the human SINE (the Alu family) and the porcine SINE (PRE-1) are quite different. This fact, together with their common properties, such as an internal RNA polymerase III promoter, an A-rich tract at the 3′ side, and direct terminal repeats at the 5′ and 3′ ends, strongly suggests that SINEs are retroposons dispersed through an RNA intermediate after speciation (34). SINEs are therefore useful markers for estimating the relative dates of gene duplications and for constructing gene phylogenies (42). In the present study, SINEs were detected in the second introns of the porcine elafin gene family, whereas no such sequence is present in the human elafin gene (Fig. 1) (22). This suggests that the duplications of the porcine elafin gene, and the subsequent diversification of the three genes cloned here, occurred after the divergence of the porcine and human lineages.
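The direct repeats that flank a SINE (compare the double-underlined repeats around PRE-1 in Fig. 2) arise as duplications of the target site at insertion. A candidate element can therefore be checked by comparing the sequence just upstream of it with the sequence just downstream. The sketch below is a hypothetical helper, not the method used in this study. It assumes the element boundaries are already known, accepts only exact matches (real flanking repeats can carry mismatches), and uses arbitrary length limits.

```python
def flanking_direct_repeat(seq, start, end, min_len=5, max_len=30):
    """Longest exact direct repeat flanking an element located at seq[start:end].

    For each length k, compares the k bases immediately upstream of the element
    with the k bases immediately downstream; returns the longest matching
    repeat, or '' if none of at least min_len is found.
    """
    best = ""
    for k in range(min_len, max_len + 1):
        if start - k < 0 or end + k > len(seq):
            break
        left = seq[start - k:start]
        right = seq[end:end + k]
        if left == right:
            best = left
    return best


# Hypothetical example: an A-rich element flanked by an 8-nt direct repeat.
repeat = "AGCTAGCT"
element = "GGCCGGCCGGAAAAAA"
seq = "TTTT" + repeat + element + repeat + "CCCC"
start = 4 + len(repeat)
print(flanking_direct_repeat(seq, start, start + len(element)))
```

Finding matching flanks at the same intronic position in all three porcine genes, and no element at the corresponding site in the human gene, is the kind of presence/absence evidence the dating argument above relies on.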
The nucleotide sequences reported in this paper have been submitted to the GenBank™/EMBL Data Bank with accession numbers D50319-D50323.