* Department of Biological Sciences, University of South Carolina
Department of Animal Science, University of Missouri-Columbia
Correspondence: E-mail: austin{at}biol.sc.edu.
Abstract |
---|
Key Words: aspartic proteinase, PAGs, pregnancy-associated glycoproteins
Introduction |
---|
Aspartic proteinases have been described in animals, plants, fungi, and retroviruses (Davies 1990). The eukaryotic aspartic proteinases are characterized by a bilobed structure. Within each lobe resides an aspartic acid residue in the context of an invariant D-T/S-G motif. The aspartic acids within each lobe are brought into close proximity with one another and, together, in conjunction with a water molecule, are directly involved in the catalytic mechanism (Blundell et al. 1998). All eukaryotic aspartic proteinases show evidence of sequence homology outside the active site, which suggests that they share a common ancestry. Aspartic proteinases from vertebrates include a number of enzymes whose three-dimensional structures are known, such as the digestive enzymes pepsin (Sielecki et al. 1990) and chymosin (Newman et al. 1991); the lysosomal enzyme cathepsin D (Baldwin et al. 1993); and renin, which is involved in control of blood pressure through conversion of angiotensinogen to angiotensin I (Dhanaraj et al. 1992). Among the most interesting members of the vertebrate aspartic proteinase family are the pregnancy-associated glycoproteins (PAGs) of mammals in the order Artiodactyla, the even-toed hoofed mammals. Pregnancy-associated glycoproteins have been cloned or purified from the placenta of sheep, goats, cattle, moose, and pigs (Szafranska et al. 1995; Xie et al. 1997; Huang et al. 1999; Garbayo et al. 2000; Green et al. 2000). The PAGs are encoded by a large multi-gene family (Xie et al. 1997; Hughes et al. 2000). Many members of this family have been predicted to lack protease function, although the ability to bind peptides remains unaltered (Xie et al. 1997). Furthermore, there is evidence that natural selection has acted to diversify the members of the PAG gene family at amino acid residues in solvent-accessible positions; such changes have been hypothesized to modulate peptide binding (Xie et al. 1997; Hughes et al. 2000).
Additional PAG-like molecules have been reported from other mammals outside the Artiodactyla (Chen et al. 2001). These PAG-like molecules (sometimes known as pepsin F) differ from PAGs in that the PAG-like molecules are encoded by only one or two genes, whereas the PAGs are encoded by a large multi-gene family. Furthermore, whereas PAG expression is restricted to the placenta, PAG-like molecules are expressed both in extra-embryonic membranes and in the neonatal stomach. In addition, PAG-like genes have been found in the genomes of mammals belonging to a number of different orders (Carnivora, Lagomorpha, Perissodactyla, and Rodentia), but not in the human genome.
Because both PAGs and PAG-like molecules are known only from eutherian (placental) mammals, it seems a plausible hypothesis that these molecules originated within the placental mammals. A preliminary phylogenetic analysis by Chen et al. (2001) placed PAGs close to a group of PAG-like molecules from non-artiodactyl mammals, including a mouse proteinase expressed in the yolk sac and neonatal stomach. However, because this phylogenetic analysis included only mammalian aspartic proteinases, it was not possible to test the hypothesis that PAGs originated within the mammals. The phylogenetic tree lacked any statistical estimation of the confidence of branches, and thus it was not possible to assess the reliability of clustering patterns. In the present study, a comprehensive phylogenetic analysis of eukaryotic aspartic proteinases is used to reconstruct the relationships of major mammalian classes of aspartic proteinases and the evolutionary origin of PAGs.
We used sequence comparisons of promoter regions to obtain additional information regarding the relationships of PAGs; and we used reconstruction of amino acid changes in the phylogeny to identify key features of primary structure unique to the PAGs. Coupled with information on the three-dimensional structure of other aspartic proteinases, these features highlighted structural changes that may be important to the function of the PAGs, which is still poorly understood.
Methods |
---|
In analyses of promoter regions, we used promoter sequences for porcine PAG2 (U39198), bovine PAG1 (L237833), bovine PAG2 (AY212886), and mouse pepsin F (AY212887).
Statistical Methods
Amino acid sequences and noncoding DNA sequences were aligned using the CLUSTAL W program (Thompson, Higgins, and Gibson 1994). In the case of coding DNA sequences, the amino acid sequence alignment was imposed on the DNA. In phylogenetic analyses and pairwise comparisons with a set of sequences, all sites at which the alignment postulated a gap in any sequence were excluded from the analysis. All alignments are available from the authors by request.
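The gap-exclusion rule described above (dropping every alignment column in which any sequence has a gap) can be illustrated with a minimal Python sketch. The function name `drop_gapped_columns` and the toy sequences are hypothetical and are not taken from the study's alignments; the sketch assumes the alignment is held as a dictionary of equal-length strings.

```python
def drop_gapped_columns(alignment, gap_chars="-"):
    """Return a copy of the alignment without any column that contains a gap.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must have equal length")
    names = list(alignment)
    # Transpose the alignment so that each tuple is one column.
    columns = zip(*(alignment[name] for name in names))
    # Keep only the columns in which no sequence has a gap character.
    kept = [col for col in columns if not any(c in gap_chars for c in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Toy alignment (hypothetical data): columns 3 and 5 contain gaps.
toy = {"seqA": "MK-LVT", "seqB": "MKALVT", "seqC": "MRAL-T"}
filtered = drop_gapped_columns(toy)
```

Removing whole columns keeps every pairwise comparison on the same set of sites, at the cost of discarding data whenever a single sequence has an insertion or deletion.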
Phylogenetic trees were reconstructed from amino acid sequence alignments by the maximum parsimony (MP) method (Swofford 2000); by the quartet maximum likelihood (QML) method (Strimmer and von Haeseler 1996) implemented in the program TreePuzzle 5.0; and by the minimum evolution (ME) method (Rzhetsky and Nei 1992). In the QML method, we used the JTT model (Jones, Taylor, and Thornton 1992) of amino acid evolution and the assumption that rate variation among sites followed a gamma distribution. In the ME analysis, amino acid sequence distances were estimated by the gamma model (Ota and Nei 1994). The parameter a of the gamma distribution was estimated by the method of Gu and Zhang (1997) and by the TreePuzzle 5.0 program; both methods yielded identical estimates (a = 1.3). All methods produced essentially identical results, and only the ME tree is shown here. The reliability of clustering patterns in the ME analysis was tested by the interior branch length test, using bootstrap estimation of the standard error of branch lengths as implemented in the MEGA2 program (Kumar et al. 2001).
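As an illustration of the gamma-model distance used in the ME analysis, the following Python sketch applies the commonly used gamma-corrected form d = a[(1 - p)^(-1/a) - 1], where p is the proportion of differing amino acid sites and a is the gamma shape parameter. It is assumed here, not confirmed by the text, that this is the exact form applied in the study; the function names are hypothetical, and the default a = 1.3 is the estimate reported above.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of sites that differ between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(x != y for x, y in zip(seq1, seq2))
    return diffs / len(seq1)


def gamma_distance(p, a=1.3):
    """Gamma-corrected amino acid distance: d = a * ((1 - p) ** (-1 / a) - 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in the interval [0, 1)")
    return a * ((1.0 - p) ** (-1.0 / a) - 1.0)


def poisson_distance(p):
    """Poisson-corrected distance, which assumes equal rates at all sites."""
    return -math.log(1.0 - p)


# Hypothetical example: 30% of sites differ between two sequences.
d_gamma = gamma_distance(0.3)
d_poisson = poisson_distance(0.3)
```

For the same p, the gamma distance exceeds the Poisson distance, because rate variation among sites implies more hidden multiple substitutions at the fast-evolving sites.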
Ancestral amino acid sequences and amino acid replacements were reconstructed by the MP method and by the maximum likelihood (ML) method of Yang, Kumar, and Nei (1995). Ancestral reconstruction was applied only to a sub-tree of closely related sequences. When sequences are closely related, both the MP and likelihood methods are predicted to give reliable inference of ancestral sequences (Zhang and Nei 1997). Ancestral DNA sequences for selected sites were reconstructed by the MP method. Because of long evolutionary distances, the ML method was not able to reconstruct DNA sequences in this case.
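To make the idea of parsimony-based ancestral reconstruction concrete, here is a minimal Python sketch of the bottom-up pass of Fitch's algorithm at a single site. Fitch's algorithm is used here only as a simple stand-in; the text states that the MP method was used but does not specify the algorithm. The tree shape, tip names, and states below are hypothetical and do not reproduce the study's data.

```python
def fitch_sets(tree, tip_states):
    """Bottom-up Fitch pass at one site.

    `tree` is either a tip name (str) or a pair (left, right) of subtrees.
    Returns (state_set, n_changes) for the root of `tree`.
    """
    if isinstance(tree, str):
        # A tip: its state set is the single observed residue.
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_sets(left, tip_states)
    right_set, right_changes = fitch_sets(right, tip_states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change is needed.
        return shared, left_changes + right_changes
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical four-taxon example for a single site (E in two tips, Q in two).
tree = ((("pag_a", "pag_b"), "pag_like"), "pepsin")
states = {"pag_a": "E", "pag_b": "E", "pag_like": "Q", "pepsin": "Q"}
root_set, n_changes = fitch_sets(tree, states)
```

In this toy case the root state set is {"Q"} with a single change, placed on the branch leading to the two E-bearing tips. A full reconstruction would add a top-down pass to assign one state to each internal node.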
To test the hypothesis of sequence conservation in promoter regions, we compared the pattern of nucleotide substitution in available PAG promoter sequences with that at fourfold degenerate sites in the coding region. The single-invariant test of Rzhetsky and Nei (1992) rejected the applicability of the Jukes-Cantor model of sequence evolution to the PAG promoter sequences, but it did not reject the Kimura two-parameter (K2P) model (Kimura 1980). Therefore we used the K2P model to estimate the numbers of nucleotide substitutions per site in promoter regions. So that a comparable model was applied to both coding and noncoding regions in these comparisons, we estimated the numbers of substitutions per site at fourfold degenerate sites in coding regions using the method of Li, Wu, and Luo (1985). This method is equivalent to the K2P method at fourfold degenerate sites.
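The K2P estimate of the number of substitutions per site can be sketched as follows, using the standard formula d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites showing transitional and transversional differences. The function name and example sequences are hypothetical, and skipping sites with gaps or ambiguous bases is an assumption of this sketch rather than a detail stated in the text.

```python
import math

BASES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in BASES or y not in BASES:
            continue  # skip gaps and ambiguous bases (an assumption of this sketch)
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    P = transitions / compared
    Q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturated).
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


# Hypothetical example: one transition (A/G) and one transversion (A/C) in 10 sites.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
```

Applying the same function to promoter alignments and to fourfold degenerate sites extracted from the coding region would put both estimates on the same model, which is the point of the comparison described above.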
Results |
---|
In addition to the cluster including pepsin A, PAG, and PAG-like molecules, three other clusters of vertebrate aspartic proteinases received strong support. Pepsin C from mammals, chicken, and the frog Xenopus clustered with brook trout gastricsin; and this cluster was supported by a highly significant internal branch (fig. 1). Similarly, cathepsin D from mammals, chicken, and zebrafish formed a cluster supported by a significant internal branch (fig. 1). Also, mammalian cathepsin E clustered with zebrafish nothepsin, and this cluster was supported by a significant internal branch (fig. 1). Thus, the phylogenetic analysis strongly supported the hypothesis that pepsin C/gastricsin, cathepsin D, and cathepsin E/nothepsin represent distinct subfamilies of aspartic proteinases that arose by gene duplications prior to the divergence of bony fishes from tetrapods. Nevertheless, there were no significantly supported clusters including molecules from both vertebrates and invertebrates (fig. 1). Thus, the phylogenetic analysis did not provide any evidence that vertebrate genomes include any aspartic proteinases that arose by gene duplication prior to the origin of the vertebrates.
To identify amino acid changes that might be crucial to the evolution of new function in the ancestor of PAGs, amino acid changes reconstructed by the MP and ML methods were examined in the sub-tree of PAGs and PAG-like molecules, rooted with the pepsin A subfamily (fig. 1). Both methods of reconstruction revealed three changes occurring at adjacent amino acid sites along the branch leading to PAGs (fig. 2). The sites involved correspond to residues 148–150 of the mature mammalian pepsin molecule (Sielecki et al. 1990) and to residues 150–152 of mature bovine PAG1 (accession number Q29432). These reconstructions indicated that the ancestor of PAG and PAG-like molecules had the amino acid sequence QNL at these sites, and that this was changed to EPV in the ancestor of PAGs. The posterior probabilities estimated by the ML method for the EPV motif in the PAG ancestor were all quite high, particularly in the case of the first two residues (fig. 2). All available artiodactyl PAG sequences retain EPV or EPI at these positions (fig. 2).
The promoter regions of artiodactyl PAG-encoding genes showed homology to the mouse gene encoding pepsin F (table 2). However, there was little evidence of homology between PAG promoters and published promoter sequences from mammalian pepsin A or renin genes (data not shown). In pairwise comparisons among PAG promoters, there was a region proximal to the start site that showed significantly lower sequence divergence than synonymous sites in the coding region (table 2). In comparisons between PAG and pepsin F, the same region likewise showed significantly lower sequence divergence than synonymous sites in the coding region (table 2). The reduced rate of nucleotide substitution in the proximal promoter region is evidence that this region is subject to purifying selection and thus to functional constraint. The fact that this pattern is seen both within the PAGs and in comparisons between PAGs and pepsin F is evidence that these two classes of genes are subject to similar functional constraints in this portion of their promoter regions.
Discussion |
---|
Reconstruction of amino acid changes in the phylogeny of aspartic proteinases revealed that, in the ancestral PAG, the motif QNL was replaced by EPV at the residues corresponding to residues 148–150 of pepsin. The fact that the EPV motif (or occasionally EPI) is conserved in all known PAGs suggests that this sequence change may have been important in the evolution of a function unique to PAGs. A preponderance of nonsynonymous nucleotide changes in the EPV codons on the branch leading to the PAGs supports the hypothesis that natural selection favored changes in these residues, which is consistent with their having played an important role in the origin of a functionally novel class of proteins.
The aspartic proteinases have an internally duplicated structure, with N-terminal and C-terminal domains that show evidence of homology to each other (Sielecki et al. 1990). This structure appears to have arisen by an ancient internal gene duplication (Tang et al. 1978). Each of these two domains includes a short conserved sequence containing an aspartate residue (residues 32 and 215, respectively, in pepsin) that contributes to the active site (Sielecki et al. 1990). In the three-dimensional structure of the molecule, a six-stranded anti-parallel β-sheet creates a hydrophobic core on the side of the protein opposite to the substrate-binding cleft that contains Asp 32 and Asp 215. In porcine pepsin A, residues 149 and 150 are located at the beginning of one of the strands of this β-sheet and are involved in hydrogen bonding with other strands (Sielecki et al. 1990) (fig. 3). The corresponding residues play an identical role in bovine chymosin (Newman et al. 1991). A similar structure is seen in renin (Dealwis et al. 1994), which is less closely related to PAGs than are pepsin A and chymosin (fig. 1). Because of the potential importance of residues 148–150 in maintaining the connection between the N-terminal and C-terminal domains of aspartic proteinases, changes in these residues in the ancestor of PAGs may have served to alter the three-dimensional structure of the molecule. Such a change in structure may explain the observation that PAGs lack aspartic proteinase activity in spite of the fact that the active site residues are conserved in most PAGs (Xie et al. 1997).
Literature Cited |
---|
Baldwin, E. T., T. N. Bhat, S. Gulnik, M. V. Hosur, R. C. Sowder, II, R. E. Cachau, J. Collins, A. M. Silva, and J. W. Erickson. 1993. Crystal structures of native and inhibited forms of human cathepsin D: implications for lysosomal targeting and drug design. Proc. Natl. Acad. Sci. USA 90:6796-6800.
Barrett, A. J. 1992. Cellular proteolysis: an overview. Ann. N.Y. Acad. Sci. 674:1-15.
Blundell, T. L., K. Guruprasad, A. Albert, M. Williams, B. L. Sibanda, and V. Dhanaraj. 1998. The aspartic proteinases: a historical overview. Pp. 1-13 in M. N. G. James, ed. Aspartic proteinases. Plenum Press, New York.
Chen, X., C. S. Rosenfeld, R. M. Roberts, and J. A. Green. 2001. An aspartic proteinase expressed in the yolk sac and neonatal stomach of the mouse. Biol. Reprod. 65:1092-1101.
Davies, D. R. 1990. The structure and function of the aspartic proteinases. Annu. Rev. Biophys. Chem. 19:189-215.
Dealwis, C. G., C. Frazao, and M. Badasso, et al. (12 co-authors). 1994. X-ray analysis at 2.0 Å resolution of mouse submaxillary renin complexed with a decapeptide inhibitor CH-66, based on the 4-16 fragment of rat angiotensinogen. J. Mol. Biol. 236:342-360.
Dhanaraj, V., C. G. Dealwis, and C. Frazao, et al. (18 co-authors). 1992. X-ray analyses of peptide-inhibitor complexes define the structural basis of specificity for human and mouse renins. Nature 357:466-472.
Garbayo, J. M., J. A. Green, M. Mannekin, J.-F. Beckers, D. O. Kiesling, A. D. Ealy, and R. M. Roberts. 2000. Caprine pregnancy-associated glycoproteins (PAG): their cloning, expression and evolutionary relationship to other PAG. Mol. Reprod. Dev. 57:311-322.
Green, J. A., S. Xie, X. Quan, B. Bao, X. Gan, N. Mathialagan, J.-F. Beckers, and R. M. Roberts. 2000. Pregnancy-associated glycoproteins exhibit spatially and temporally distinct expression patterns during pregnancy. Biol. Reprod. 62:1624-1631.
Gu, X., and J. Zhang. 1997. A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol. 14:1106-1113.
Huang, F., D. C. Cockrell, T. R. Stephenson, J. H. Noyes, and R. G. Sasser. 1999. Isolation, purification, and characterization of pregnancy-specific protein B from elk and moose placenta. Biol. Reprod. 61:1056-1061.
Hughes, A. L. 1999. Adaptive evolution of genes and genomes. Oxford University Press, New York.
Hughes, A. L., and M. Nei. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167-170.
Hughes, A. L., J. A. Green, J. M. Garbayo, and R. M. Roberts. 2000. Adaptive diversification within a large family of recently duplicated, placentally expressed genes. Proc. Natl. Acad. Sci. USA 97:3319-3323.
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275-282.
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.
Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.
Li, W.-H., C.-I. Wu, and C.-C. Luo. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-174.
Newman, M., M. Saffro, C. Frazao, G. Khan, A. Zdanov, I. J. Tickle, T. L. Blundell, and N. Andreeva. 1991. X-ray analyses of aspartic proteinases IV: structure and refinement at 2.2 Å resolution of bovine chymosin. J. Mol. Biol. 221:1295-1309.
Ota, T., and M. Nei. 1994. Estimation of the number of amino acid substitutions per site when the substitution rate varies among sites. J. Mol. Evol. 38:642-643.
Rzhetsky, A., and M. Nei. 1992. A simple method for estimating and testing minimum-evolution trees. Mol. Biol. Evol. 9:945-967.
Sielecki, A. R., A. A. Fedorov, A. Boodhoo, N. S. Andreeva, and M. N. G. James. 1990. Molecular and crystal structures of monoclinic porcine pepsin refined at 1.8 Å resolution. J. Mol. Biol. 214:143-170.
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964-969.
Swofford, D. L. 2000. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.
Szafranska, B., S. Xie, J. Green, and R. M. Roberts. 1995. Porcine pregnancy-associated glycoproteins: new members of the aspartic proteinase gene family expressed in trophectoderm. Biol. Reprod. 53:21-28.
Tang, J., M. N. James, I. N. Hsu, J. A. Jenkins, and T. L. Blundell. 1978. Structural evidence for gene duplication in the evolution of the acid proteases. Nature 271:618-621.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res. 22:4673-4680.
Xie, S., J. Green, J. B. Bixby, B. Szafranska, J. C. DeMartini, S. Hecht, and R. M. Roberts. 1997. The diversity and evolutionary relationships of the pregnancy-associated glycoproteins, an aspartic proteinase subfamily consisting of many trophoblast-expressed genes. Proc. Natl. Acad. Sci. USA 94:12809-12816.
Yan, R., M. J. Bienkowski, and M. E. Shuck, et al. (15 co-authors). 1999. Membrane-anchored aspartyl protease with Alzheimer's disease beta-secretase activity. Nature 402:533-537.
Yang, Z., S. Kumar, and M. Nei. 1995. A new method of inference of ancestral nucleotide and amino acid sequences. Genetics 141:1641-1650.
Zhang, J., and M. Nei. 1997. Accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods. J. Mol. Evol. 44(Suppl. 1):S139-S146.