The {alpha}-Mannosidases: Phylogeny and Adaptive Diversification

Daniel S. Gonzalez* and I. King Jordan{dagger}

*Department of Medical Microbiology, University of Georgia; and
{dagger}Department of Biological Sciences, University of Nevada at Las Vegas


    Abstract
 
{alpha}-Mannosidase enzymes comprise a class of glycoside hydrolases involved in the maturation and degradation of glycoprotein-linked oligosaccharides. Various {alpha}-mannosidase enzymatic activities are encoded by an ancient and ubiquitous gene superfamily. A comparative sequence analysis was employed here to characterize the evolutionary relationships and dynamics of the {alpha}-mannosidase superfamily. A series of lineage-specific BLAST searches recovered the first archaean and eubacterial {alpha}-mannosidase sequences ever recognized, in addition to numerous eukaryotic sequences. Motif-based alignment and subsequent phylogenetic analysis of the entire superfamily revealed the presence of three well-supported monophyletic clades that represent discrete {alpha}-mannosidase families. The comparative method was used to evaluate the phylogenetic distribution of {alpha}-mannosidase functional variants within families. Results of this analysis demonstrate a pattern of functional diversification of {alpha}-mannosidase paralogs followed by conservation of function among orthologs. Nucleotide polymorphism among the most closely related pair of duplicated genes was analyzed to evaluate the role of natural selection in the functional diversification of {alpha}-mannosidase paralogs. Ratios of nonsynonymous and synonymous variation show an increase in the rate of nonsynonymous change after duplication and a relative excess of fixed nonsynonymous changes between the two groups of paralogs. These data point to a possible role for positive Darwinian selection in the evolution of {alpha}-mannosidase functional diversification following gene duplication.


    Introduction
 
Comparisons of recently completed whole-genome sequences have confirmed the extent to which genic organization is characterized by a hierarchy of gene families, subfamilies, and superfamilies (Tatusov, Koonin, and Lipman 1997). Despite the rapid accumulation of gene sequences, the rate of new gene family discovery continues to drop (Henikoff et al. 1997). It is foreseeable that the entire spectrum of genic diversity will soon fall into a limited number of gene families. Gene families consist of homologous sequences that are designated either orthologs or paralogs. Orthologs are related to a common ancestor by speciation, while paralogs are related by duplication (Fitch 1970). The evolutionary significance of gene duplication has long been recognized (Haldane 1932; Muller 1935). While orthologs tend to encode the same function, paralogs often evolve novel activities. Ohno (1970) formulated the provocative hypothesis that gene duplication is a prerequisite for the evolution of any new gene function. He recognized that natural selection is inherently conservative and postulated that only the redundancy created by gene duplication could allow a gene copy to escape the pressure of negative selection and evolve a new function. Findings during the ensuing decades have revealed that while there are in fact other ways to evolve new genes and functions, gene duplication remains the most important mechanism for generating genic novelty (Li 1997).

The classification scheme of glycoside hydrolases provides an archetypal example of the hierarchical relationships among a widespread superfamily of evolutionarily and/or functionally related enzymes (Henrissat 1991, 1998; Henrissat and Bairoch 1993, 1996; Henrissat and Romeu 1995; Henrissat and Davies 1997). Glycoside hydrolases are enzymes that hydrolyze the glycosidic bond between carbohydrates or between a carbohydrate and a noncarbohydrate moiety. The innovative sequence-based classification system originally proposed by Henrissat (1991) currently consists of 66 families of glycoside hydrolases (see http://expasy.hcuge.ch/cgi-bin/lists?glycosid.txt). Structural and functional characteristics that indicate relationships between members of different families have resulted in the designation of clans that are composed of two or more families.

{alpha}-Mannosidases are glycoside hydrolases involved in both the maturation and the degradation of Asn-linked oligosaccharides (Dewald and Touster 1973; Tulsiani et al. 1982; Lal et al. 1994; Liao, Lal, and Moremen 1996). The glycoprotein maturation and degradation pathways are highly conserved, and {alpha}-mannosidase activities have been detected in all eukaryotes assayed. {alpha}-Mannosidase–encoding genes have been isolated and their products characterized from a diverse group of eukaryotes, including the protozoan Trypanosoma cruzi, the yeast Saccharomyces cerevisiae, and the metazoans Drosophila melanogaster and Homo sapiens (Camirand et al. 1991; Kerscher et al. 1995; Liao, Lal, and Moremen 1996; Vandersall-Nairn et al. 1998). Traditionally, {alpha}-mannosidases have been organized into two classes (I and II) based on both functional characteristics and sequence homology (Moremen, Trimble, and Herscovics 1994; Henrissat 1998). The cellular compartment where they catalyze mannose hydrolysis (e.g., endoplasmic reticulum, Golgi, or lysosome) further distinguishes different {alpha}-mannosidase enzymes.

Class I {alpha}-mannosidase enzymes thus far characterized are all involved in the maturation of Asn-linked oligosaccharides. These enzymes all catalyze the trimming of Man9GlcNAc2 to Man5GlcNAc2. While class I {alpha}-mannosidase enzymes hydrolyze only {alpha}-1,2 mannose bonds, they differ in their stereospecificities (Lal et al. 1994). Class I {alpha}-mannosidase enzymes are localized to either the endoplasmic reticulum or the Golgi complex. The majority of class II {alpha}-mannosidase enzymes that have been characterized catalyze the degradation of Asn-linked oligosaccharides. Class II {alpha}-mannosidase enzymes show less biochemical specificity, as they possess {alpha}-1,3, {alpha}-1,6, and {alpha}-1,2 hydrolytic activity. Enzymes of this class also have a wider range of cellular compartmentalization and can be localized to the cytosol and lysosomes in addition to the Golgi complex.

The rapid accumulation of genomic and cDNA nucleotide sequences in the various public databases has facilitated the in silico discovery of several putative {alpha}-mannosidase sequences with statistically significant similarities to classically cloned and functionally characterized {alpha}-mannosidase genes. Using a diverse representative set of {alpha}-mannosidase amino acid query sequences, an exhaustive search for and analysis of {alpha}-mannosidase homologous sequences was performed. Sequence retrieval, alignment, and phylogenetic analysis allowed a determination of the range and extent of {alpha}-mannosidase variation. The relationship among previously characterized and novel putative {alpha}-mannosidase sequences is defined and a classification consistent with the Henrissat scheme is proposed. The comparative method was used to assess the correlation between phylogenetic relationship and cellular localization of biochemically characterized {alpha}-mannosidase sequences. Finally, the existence of closely related orthologs and paralogs in the {alpha}-mannosidase IA-IB clade allowed a test for positive Darwinian selection for altered function following gene duplication.


    Materials and Methods
 
Data Retrieval
All of the sequences used in this study were retrieved from the National Center for Biotechnology Information (NCBI) GenBank database. An initial data set of previously published and functionally characterized {alpha}-mannosidase amino acid sequences was retrieved manually from Entrez (http://www.ncbi.nlm.nih.gov/Entrez). These representative sequences were isolated from a wide phylogenetic range of eukaryotes and possessed a variety of biochemical activities and intracellular localizations. An in-house program (Cluster) was used to group these sequences by amino acid identity into clusters. Divergent {alpha}-mannosidase amino acid sequences with representatives from each cluster were used as queries in a series of lineage-specific BLAST (http://www.ncbi.nlm.nih.gov/BLAST) searches. Lineage-specific searches with a diverse set of query sequences allowed for a maximum amount of sequence space to be covered. NCBI's gapped and ungapped tblastn searches were run using the default BLOSUM62 (gapped) and the PAM250 (ungapped) substitution matrices.

The sequences analyzed here correspond to the GenBank accession numbers listed below. The accessions include both nucleotide and amino acid sequences. The sequence abbreviations and their corresponding accession numbers follow the species names. Sequence abbreviations consist of two letters that represent the species binomial followed by a single-letter designation that indicates the cellular location of activity for functionally characterized sequences or a "p" to indicate putative {alpha}-mannosidase sequences that have not yet been biochemically characterized. Cellular compartmentalization abbreviations are as follows: E, endoplasmic reticulum; M, membrane-associated; G, Golgi apparatus; L, lysosome; X, extracellular; and V, vacuolar. Roman numerals that indicate the identity of the {alpha}-mannosidase family to which a sequence belongs make up the final component of the sequence abbreviations. The accession number list is as follows: Arabidopsis thaliana—AT-LII, Y11767; Aspergillus saitoi—AS-XI, D49827; Bos taurus—BT-LII, L31373; Caenorhabditis elegans—CE-pI.1, Z78012; CE-pI.2, Z73906; CE-pI.5, Z81497; CE-pI.7, Z68882; CE-pII.1, U40948; CE-pI.3, Z68270; CE-pI.6, Z47073; CE-pI.4, U41272; CE-pII.3, U97015; CE-pII.2, Z75954; Dictyostelium discoideum—DD-LII, M82822; Drosophila melanogaster—DM-GI, X82641; DM-pI, AL021086; DM-GII.1, X77652; DM-GII.2, AB018079; Escherichia coli—EC-pIII, AE000176; Emericella nidulans—EN-pIII, AF016850; Felis catus—FC-LII, AF010191; Homo sapiens—HS-EIII, AF044414; HS-LII, U68567; HS-GII.1, U31520; HS-GII.2, D55649; HS-pI, D86967; HS-GIA, X74837; HS-GIB, AF027156; Mus musculus—MM-MII, AB006458; MM-GIA, U04299; MM-LII, U87240; MM-GIB, AF078095; MM-GII, X61172; Mycobacterium tuberculosis—MT-pIII, Z92772; Oryctolagus cuniculus—OC-GIA, U04301; Penicillium citrinum—PC-XI, D45839; Pyrococcus horikoshii—PH-pIII, AP000003; Rattus norvegicus—RN-EIII, M57547; RN-GII, M24353; Saccharomyces cerevisiae—SC-EI, Z49631; SC-VIII, M29146; SC-pI.1, U00030; SC-pI.2, Z73229; Schizosaccharomyces pombe—SP-pI, AL021813; Spodoptera frugiperda—SF-GI, AF005035; SF-GII, AF005034; Sus scrofa—SS-MII, D28521; SS-GIA, Y12503; Synechocystis—SY-pIII, D63999; Trypanosoma cruzi—TC-LII, AF077741.
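The abbreviation scheme described above can be decoded mechanically. The following is a minimal sketch, not part of the original analysis; the function name `parse_abbreviation`, the `Abbreviation` tuple, and the treatment of trailing characters (such as "A", "B", or ".1") as a free-form suffix are assumptions made for illustration.

```python
import re
from typing import NamedTuple

# Compartment codes as listed in the text; "p" marks putative sequences.
COMPARTMENTS = {
    "E": "endoplasmic reticulum",
    "M": "membrane-associated",
    "G": "Golgi apparatus",
    "L": "lysosome",
    "X": "extracellular",
    "V": "vacuolar",
    "p": "putative (not yet characterized)",
}

# Two-letter species code, hyphen, one location letter, family numeral
# (longest alternative first), then an optional suffix such as "A" or ".1".
_PATTERN = re.compile(r"^([A-Z]{2})-([EMGLXVp])(III|II|I)(.*)$")


class Abbreviation(NamedTuple):
    species: str
    compartment: str
    family: str
    suffix: str


def parse_abbreviation(label: str) -> Abbreviation:
    """Split a label such as 'HS-GIA' into its components."""
    match = _PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized abbreviation: {label!r}")
    species, location, family, suffix = match.groups()
    return Abbreviation(species, COMPARTMENTS[location], family, suffix)


if __name__ == "__main__":
    for label in ("HS-GIA", "SC-VIII", "CE-pI.1"):
        print(label, parse_abbreviation(label))
```

Note that the location letter is read before the numeral, so "SC-VIII" is interpreted as vacuolar (V) family III, as the scheme in the text implies.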

Sequence Alignment
The CLUSTAL W (Thompson, Higgins, and Gibson 1994) and PROBE (Neuwald et al. 1997) programs were used to align amino acid sequences. Initial alignments of the total data set performed with CLUSTAL W revealed a highly divergent group of sequences, and therefore CLUSTAL W was not able to obtain an optimal global alignment. To effectively align the total amino acid data set, the PROBE program was used to identify a common ordered series of motifs (OSM) among all sequences. An alignment of diagnostic sites was extracted from the total OSM alignment. Diagnostic sites were chosen from sites that a PAUP* 4.0b parsimony reconstruction classified as apomorphies with a consistency index of 1 and that supported internal branches leading to the three main clades (families). Higher levels of amino acid sequence identity within families allowed the use of CLUSTAL W for within-family multiple alignments. CLUSTAL W was run with the PAM250 substitution matrix and default gap penalty options. The relatively high sequence identity and the use of CLUSTAL W for within-family multiple alignment allowed for the inclusion of motif intervening regions (MIRs). MIRs contain additional information necessary to obtain accurate within-family phylogenetic reconstructions (McClure and Kowalski 1999).

Phylogenetic Analysis
Within-family and among-families amino acid alignments were used with the PAUP* 4.0b package (Swofford 1998) to reconstruct the phylogenies reported here. Both parsimony and the neighbor-joining (Saitou and Nei 1987) distance method were used in phylogenetic reconstruction. Parsimony heuristic searches were conducted with 10 replicates of random stepwise addition and tree bisection reconnection. Distance-based and parsimony methods gave virtually identical results. All topologies reported here are based on the neighbor-joining method. Trees were rooted with midpoint rooting along the longest branch. One hundred bootstrap replicates were performed using the full heuristic bootstrap option.
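The resampling step that underlies bootstrap support values can be illustrated briefly. This is a generic sketch of nonparametric bootstrapping of alignment columns, not a reproduction of the PAUP* implementation; the function names and the toy alignment in the usage example are hypothetical. Each pseudoreplicate would then be passed to the tree-building method, and the support for a branch is the fraction of replicate trees that contain it.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement, keeping the same length."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # The same sampled column indices are applied to every sequence.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n_replicates: int = 100,
                         seed: int = 0) -> List[Dict[str, str]]:
    """Generate pseudoreplicate alignments (100 by default, as in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy = {"seq1": "ACDEFG", "seq2": "ACDEYG", "seq3": "WCDEFG"}  # hypothetical data
    replicates = bootstrap_replicates(toy, n_replicates=3, seed=42)
    for replicate in replicates:
        print(replicate)
```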

Amino Acid Sequence Diversity
Average percentages of amino acid identity and standard deviations within and between families were calculated using a subset of 12 sequences (four representatives from each family). The PAUP* 4.0b program was used to calculate the mean character difference distance matrix. Mean character differences were converted to percentages of identity and averaged within and between the three families.
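The conversion from mean character difference to percentage of identity, and the averaging within and between families, can be sketched as follows. This is an illustrative sketch only: it assumes that positions with a gap in either sequence are skipped, and the function names and toy sequences are hypothetical rather than taken from the study.

```python
from itertools import combinations
from statistics import mean, stdev
from typing import Dict, List, Tuple


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """100 * (1 - mean character difference) over positions without gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable positions")
    difference = sum(x != y for x, y in pairs) / len(pairs)
    return 100.0 * (1.0 - difference)


def _summarize(values: List[float]) -> Tuple[float, float]:
    # The sample standard deviation needs at least two values.
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


def identity_summary(sequences: Dict[str, str], family_of: Dict[str, str]):
    """Return (within, between) dictionaries of (mean, SD) percent identity."""
    within: Dict[str, List[float]] = {}
    between: Dict[Tuple[str, str], List[float]] = {}
    for name1, name2 in combinations(sorted(sequences), 2):
        value = percent_identity(sequences[name1], sequences[name2])
        fam1, fam2 = family_of[name1], family_of[name2]
        if fam1 == fam2:
            within.setdefault(fam1, []).append(value)
        else:
            between.setdefault(tuple(sorted((fam1, fam2))), []).append(value)
    return ({k: _summarize(v) for k, v in within.items()},
            {k: _summarize(v) for k, v in between.items()})


if __name__ == "__main__":
    seqs = {"a1": "ACDE", "a2": "ACDF", "b1": "WCDE"}   # hypothetical data
    fams = {"a1": "I", "a2": "I", "b1": "II"}
    print(identity_summary(seqs, fams))
```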

Nucleotide Sequence Diversity
Closely related amino acid sequences for the Golgi {alpha}-mannosidase IA-IB clade were aligned using CLUSTAL W with the default options. The Golgi {alpha}-mannosidase IA-IB phylogeny was reconstructed as described above. Nucleotide sequences of the same taxa were aligned to correspond to the amino acid alignment using the DNA Stacks program (Eernisse 1992). Ancestral nucleotide sequences were inferred with parsimony using PAUP* 4.0b. The DnaSP program (Rozas and Rozas 1997) was used to calculate Ka and Ks values according to the method of Nei and Gojobori (1986) and to perform the McDonald-Kreitman test of neutrality (McDonald and Kreitman 1991). The time elapsed since IA-IB duplication (TD) was calculated with Ks values using the method of Li (1997). This calculation was calibrated using the time elapsed since human-mouse speciation (TS) (Kumar and Hedges 1998).
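Two pieces of arithmetic in this procedure can be sketched in code. The first is the Jukes-Cantor correction that the Nei and Gojobori (1986) method applies to the proportions of synonymous and nonsynonymous differences; the counting of sites and differences is assumed to have been done already. The second is a clock-based calibration of TD from TS, which assumes a constant synonymous rate along the lineages and is only one plausible reading of the calibration described above. All function names and the numbers in the usage example are hypothetical.

```python
import math
from typing import Tuple


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differences p for multiple hits: -3/4 ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> Tuple[float, float]:
    """Return (Ka, Ks) from precomputed counts of differences and sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    return ka, ks


def duplication_time(ks_paralogs: float, ks_orthologs: float,
                     speciation_time: float) -> float:
    """Estimate TD assuming a constant synonymous rate calibrated with TS."""
    # Rate per lineage: r = Ks / (2 * T), calibrated on the ortholog comparison.
    rate = ks_orthologs / (2.0 * speciation_time)
    return ks_paralogs / (2.0 * rate)


if __name__ == "__main__":
    # Hypothetical counts and times, chosen only to exercise the functions.
    ka, ks = ka_ks(nonsyn_diffs=30, nonsyn_sites=600, syn_diffs=60, syn_sites=200)
    print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={ka / ks:.3f}")
    print(duplication_time(ks_paralogs=0.6, ks_orthologs=0.3, speciation_time=100.0))
```

Under the clock assumption the calibration reduces to TD = TS x (Ks between paralogs) / (Ks between orthologs).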


    Results and Discussion
 
Motif Detection and Alignment
A total of 51 amino acid sequences were retrieved from the GenBank database. These sequences included 30 previously functionally characterized {alpha}-mannosidase enzymes, as well as 21 novel putative {alpha}-mannosidase sequences. Because of the sensitivity of the lineage-specific BLAST approach (see Materials and Methods), this data set included many distantly related sequences. Such weakly conserved, distantly related sequences are characteristic of many protein families and superfamilies. Sequence conservation among all members of a protein family is often limited to small islands of relatively high similarity, referred to as motifs (Dayhoff, Schwartz, and Orcutt 1978). When these regions of high similarity occur in conserved order among a related set of proteins, they are referred to as an OSM (McClure 1991). The OSM makes up a unique signature that characterizes a protein family (Hudak and McClure 1999). Identification of the OSM facilitates multiple alignment of distantly related amino acid data sets such as those analyzed here (McClure, Vasi, and Fitch 1994). A recent comparative analysis of a number of motif detection methods reported that the PROBE program performed best (Hudak and McClure 1999). The PROBE program was used to identify and align the conserved OSMs common to all the {alpha}-mannosidase sequences. PROBE identified 10 motifs that range from 14 to 36 amino acid residues in length, with an average length of ~23 residues. The OSM alignment consists of 276 total sites, 231 of which are phylogenetically informative. An alignment of the total data set was also generated using CLUSTAL W. This suboptimal (see Materials and Methods) alignment consisted of 1,571 total and 1,164 informative sites.
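The count of phylogenetically informative sites can be illustrated with the usual parsimony criterion: a site is informative if at least two character states each occur in at least two sequences. The sketch below assumes that criterion and assumes that gaps and ambiguous residues are ignored; the function names and the toy alignment are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def is_informative(column: Iterable[str], ignore: str = "-?X") -> bool:
    """True if at least two states each appear in at least two sequences."""
    counts = Counter(state for state in column if state not in ignore)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative(sequences: Sequence[str]) -> int:
    """Count parsimony-informative columns in an alignment of equal-length rows."""
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    return sum(is_informative(column) for column in zip(*sequences))


if __name__ == "__main__":
    toy = ["AAGT", "AAGC", "ACTT", "ACTG"]  # hypothetical alignment
    print(count_informative(toy), "of", len(toy[0]), "sites are informative")
```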

Phylogeny and Classification
The OSM alignment was used to reconstruct a global {alpha}-mannosidase phylogeny (fig. 1). The resulting neighbor-joining tree reveals three robust clades of {alpha}-mannosidase sequences. A parsimony phylogeny reconstructed using the OSM alignment showed an identical topology with the exception of a few weakly supported branches within the three main clades. Distance-based and parsimony phylogenetic reconstructions based on the suboptimal CLUSTAL W alignment also gave qualitatively similar results, with the same three major clades and topological differences in weakly supported terminal nodes. Figure 2 shows an alignment of diagnostic sites that clearly distinguish the three major {alpha}-mannosidase clades. Average percentages of amino acid identity within and between families (table 1) are also consistent with the existence of three distinct {alpha}-mannosidase clades.



 
Fig. 1.—Neighbor-joining phylogeny of the {alpha}-mannosidase superfamily. The motif-based PROBE alignment was used to reconstruct this phylogeny as described in Materials and Methods. Vertical bars and roman numerals delineate the three {alpha}-mannosidase families. Taxon name abbreviations are described in Materials and Methods. Bootstrap values are shown above the branches. A scale bar indicating the relative amount of change ({Delta}) along branches is shown.

 


 
Fig. 2.—Alignment of representative sequences that shows diagnostic sites from the ordered series of motifs (OSM) that distinguish the three {alpha}-mannosidase families. The diagnostic sites were chosen as described in Materials and Methods. Bold type for residues within families indicates that at least four out of five residues at a site are identical or similar (i.e., conservative changes). Numbers above the alignment indicate the positions in the OSM alignment at which the diagnostic sites occur.

 

 
Table 1 Average Percentages of Amino Acid Identity (±SD) Within and Between {alpha}-Mannosidase Families

 
Clade I consists of eukaryotic {alpha}-mannosidase sequences, including fungal and metazoan representatives. Within this clade, there are two well-supported groups. One group consists mainly of functionally characterized sequences, and the other group is made up of novel sequences characterized in various genome-sequencing projects. Among the functionally characterized group, there are sequences with endoplasmic reticulum (ER), Golgi apparatus, and extracellular enzymatic activity. Sequences of this clade have previously been classified as belonging to glycoside hydrolase family 47 (Henrissat 1991; Henrissat and Bairoch 1993).

Clade II is made up of a more diverse set of eukaryotic sequences. In addition to fungal and metazoan isolated sequences, there are also plant, slime mold, and protozoan representatives. Clade II also has the lowest average level of sequence identity (table 1). The taxonomic and sequence diversity that characterizes this clade is consistent with the varied biochemical specificities of its taxa. Among the functionally characterized members of this clade, there are sequences with lysosomal, membrane-associated, and Golgi activity. Clade II Golgi {alpha}-mannosidase enzymes show {alpha}-1,3 and {alpha}-1,6 mannose bond cleavage distinct from the {alpha}-1,2 mannosidase activity of clade I Golgi sequences (Moremen, Trimble, and Herscovics 1994). Clade II sequences have previously been placed into glycoside hydrolase family 38 (Henrissat 1991; Henrissat and Bairoch 1993).

In the unrooted global {alpha}-mannosidase tree, clade III falls approximately at the midpoint between clades I and II (fig. 3). This group represents the most diverse taxonomic assemblage of {alpha}-mannosidase homologous sequences. Seven of the nine clade III members form a well-supported monophyletic group (fig. 1). Among these seven sequences, there are metazoan, fungal, and archaean representatives. The two most basal members of this clade are sequences from gram-negative (E. coli) and gram-positive (M. tuberculosis) eubacteria. However, there is no significant bootstrap support for the branches that group these sequences with the other clade III members (fig. 1). These sequences branch off the most internal node in the global phylogeny (fig. 3). The phylogenetic location and the eubacterial origin of these sequences suggest that they may be ancestral proto-{alpha}-mannosidase enzymes. The fact that these putative ancestral sequences group most closely with clade III is consistent with the diversity of taxa in this clade and suggests that clade III represents the most ancestral {alpha}-mannosidase family.



 
Fig. 3.—Schematic of the unrooted version of the global {alpha}-mannosidase phylogeny shown in figure 1. The relationship among the {alpha}-mannosidase families and the putative ancestral proto-{alpha}-mannosidase sequences is shown.

 
Considered together with the glycoside hydrolase superfamily organization (Henrissat 1991), the topology of the global {alpha}-mannosidase phylogeny provides a heuristic for a coherent {alpha}-mannosidase classification scheme. Previous work based largely on biochemical characterization and, to a lesser extent, on sequence homology has suggested the existence of two distinct classes of {alpha}-mannosidase enzymes (Moremen, Trimble, and Herscovics 1994). However, a more recent study reporting the AN-III sequence revealed a new distinct group of closely related class II {alpha}-mannosidase sequences (AN-III, RN-EIII, and SC-VIII) that appeared to be distantly related to previously reported class II sequences (Eades et al. 1998). The present global {alpha}-mannosidase phylogenetic analysis incorporated six more sequences of this new group and compared them with sequences previously designated class I and II. The results of this comprehensive analysis are consistent with the Eades et al. (1998) study and clearly indicate that there are three distinct well-supported groups of {alpha}-mannosidase sequences (figs. 1 and 2 and table 1). Thus, in accordance with the Henrissat scheme, clade III {alpha}-mannosidase sequences are proposed to belong to a new family of glycoside hydrolase sequences.

Above the level of family, the Henrissat scheme includes the designation of clan to cover separate but related families (see http://afmb.cnrs-mrs.fr/~pedro/CAZY/ghf_intro.html). Levels of average percentages of amino acid identity within families (table 1) indicate that each family represents a distinct homologous group of sequences. Average percentages of amino acid identity between families (table 1), on the other hand, are very low. Such low identity values suggest that sequences from different families cannot be considered homologous with statistical confidence (Dayhoff, Schwartz, and Orcutt 1978). However, several other criteria suggest the possibility that the three {alpha}-mannosidase families studied here share a common ancestor. For example, lineage-specific BLAST searches that use {alpha}-mannosidase sequences from one family as a query can detect {alpha}-mannosidase sequences from different families. In addition, the presence of the OSM signature is suggestive of homology between families. Thus, although the BLAST results and the OSM signature are suggestive rather than definitive evidence of common ancestry, the three families of {alpha}-mannosidase sequences reported here are proposed to make up an {alpha}-mannosidase clan, or superfamily, based on the combination of this sequence evidence and the functional analogy of the {alpha}-mannosidase enzymes.

Gene Duplication and Functional Diversification
Gene duplication is a critical step in generating the functional diversification necessary for the evolution of complex organisms (Ohta 1991). According to the generally accepted view of gene duplication and evolution, the redundancy created by duplication allows paralogous gene copies to evolve new functions (Ohno 1970). However, once a paralog acquires a newly evolved function that enhances the fitness of its host, this function is likely to be constrained by negative selection (Goodman, Moore, and Matsuda 1975). One prediction of this hypothesis is that among the members of a gene family, orthologous copies are likely to encode the same functions, while paralogs will encode diverse functions. The {alpha}-mannosidase superfamily, with its abundance of functionally characterized sequences, provides an ideal system to test this prediction. Paralogous {alpha}-mannosidase sequences are known to encode slightly different biochemical activities. For example, among clade I, some sequences are ER-specific and some are Golgi-specific. If these discrete compartmentalized activities have evolved subsequent to duplication in the manner proposed above, then they should appear monophyletic when mapped onto a phylogeny of the sequences that encode them.

Within-family (clade I and II) {alpha}-mannosidase phylogenies were used to test this prediction. Relatively high levels of amino acid sequence homology within families allowed the alignment of MIRs in addition to the OSM. The inclusion of MIRs increased the phylogenetic resolution within families. Within-family phylogenies based on both OSM and MIR sequences showed topologies that were virtually identical (fig. 4) to the global {alpha}-mannosidase tree (fig. 1) based solely on the OSM alignment. The placement of only one sequence within each tree differed between the within-family and the among-families phylogenies. These results indicate that the OSM likely records an accurate phylogenetic history of the {alpha}-mannosidase superfamily despite the fact that it includes only a subset of the total sequence data. The increased resolution afforded by the inclusion of the MIR sequences manifested itself in a general increase in bootstrap support for the within-family trees. In both family I and family II, {alpha}-mannosidase sequences that encode enzymes with the same cellular compartmentalization group together in well-supported clades (fig. 4). These data support the hypothesis of gene duplication followed by functional diversification of paralogs and subsequent canalization of activity among orthologs. The presence of putative sequences in these clades suggests that these as yet uncharacterized sequences will prove to have the same cellular compartmentalization patterns as their close relatives.



 
Fig. 4.—Within-family (I and II) phylogenetic reconstructions. Phylogenies were reconstructed using CLUSTAL W alignments as described in Materials and Methods. Taxon name abbreviations are the same as in figure 1. Monophyletic groups of sequences with similar cellular compartmentalizations are boxed, and the identities of the compartments are indicated. A solid line surrounds groups that include all characterized sequences, and a dashed line surrounds groups that include some putative sequences. Bootstrap values are shown above the branches. Scale bars indicating the relative amounts of change ({Delta}) along branches are shown.

 
Positive Darwinian Selection After {alpha}-Mannosidase IA-IB Duplication
It is clear from the phylogenetic distribution of {alpha}-mannosidase cellular compartmentalization variants that gene duplication followed by functional diversification has been a pivotal mode of {alpha}-mannosidase superfamily evolution. In order to analyze in detail the molecular evolutionary dynamics of gene duplication and subsequent functional diversification, it is advantageous to investigate cases of relatively recent gene duplications. Recent duplications provide tractable levels of nucleotide polymorphism between paralogs and facilitate the accurate calculation of nonsynonymous and synonymous rates of substitution (i.e., avoid saturation of synonymous substitutions). The relatively shallow branch length between the {alpha}-mannosidase Golgi IA and IB clades (figs. 1 and 4) indicates that this is the most recent gene duplication yet detected in the {alpha}-mannosidase superfamily. {alpha}-Mannosidase IA- and IB-encoded enzymes have similar functions but differ in aspects of their substrate specificities and the structures of the hydrolysis products produced by their activity (Lal et al. 1998; Moremen, Trimble, and Herscovics 1994). The two genes also differ in their expression patterns. {alpha}-Mannosidase IA shows ubiquitous expression, while IB is primarily expressed in the placenta (unpublished data). The patterns of nucleotide polymorphism among the human and mouse IA and IB genes were analyzed in order to evaluate the role natural selection has played in IA-IB enzyme functional diversification subsequent to gene duplication.

Relative rates of Ka and Ks can yield important clues as to the nature of selection acting to shape nucleotide variation. A higher rate of Ka than Ks (Ka/Ks > 1) is generally considered unequivocal evidence of positive Darwinian selection (Kimura 1983; Hughes and Nei 1988; Sharp 1997). Levels of Ka and Ks for the IA-IB clade were analyzed to evaluate the role of selection following gene duplication (table 2). Comparison of extant IA-IB sequences shows an average Ka/Ks < 1. Such a pattern of variation demonstrates the prevalence of negative selection due to functional constraint on IA and IB amino acid sequences. This result is consistent with the expectations of the neutral theory (Kimura 1983) and is not surprising when the overall conservation and functional importance of {alpha}-mannosidase activity is considered.


 
Table 2 {alpha}-Mannosidase Clade IA-IB Nonsynonymous (Ka) and Synonymous (Ks) Substitution Rates (×100)

 
Evaluation of extant sequences does not take into account the full historical context of the nucleotide substitution process. Nonsynonymous substitutions have been shown to accelerate during a period of functional differentiation following gene duplication (Li and Gojobori 1983; Ohta 1994). During this phase of evolution, positive selection may predominate. Once a new function has evolved, the changes involved in the emergence of the novel activity will be constrained by negative selection (Goodman, Moore, and Matsuda 1975). To evaluate the nucleotide changes that occurred just after duplication separately from the substitutions that followed the human-mouse speciation, ancestral IA and IB nucleotide sequences were inferred (fig. 5). Ancestral sequences allowed Ka/Ks to be partitioned among individual branches of the IA-IB tree. Previously, this approach revealed evidence of ancient episodes of adaptive evolution (Messier and Stewart 1997; Zhang, Rosenberg, and Nei 1998). In the present case, for every branch on the IA-IB tree, Ka/Ks < 1 (fig. 5). Thus, there is no unequivocal evidence for positive selection. However, the Ka/Ks ratio for the internal branch connecting the ancestral IA-IB nodes is higher than the ratios for the terminal branches within the IA and IB clades (fig. 5). This result indicates a possible postduplication increase in Ka due to positive selection. A number of studies have interpreted such a relative increase in Ka as evidence for positive selection following gene duplication (Long and Langley 1993; Ohta 1994; Schmidt et al. 1997). However, it is also possible that the increase in Ka is due to a period of decreased functional constraint (i.e., relaxed negative selection) after duplication.



Fig. 5.—A, Distribution of nucleotide variation on the Golgi {alpha}-mannosidase IA-IB phylogeny. The phylogeny was reconstructed as described in Materials and Methods. Taxon names are the same as in figure 1, and ancestral nodes (A and B) are indicated. Time since duplication (TD) and time since speciation (TS) were determined as described in Materials and Methods. Numbers next to the branches correspond to Ka/Ks for the branch (table 2). The dark gray shading represents the lineage since duplication and before speciation. Functional diversification of IA-IB paralogs occurred along this partition of the tree. The light gray shading represents the orthologous lineages subsequent to speciation. Evolution along this partition of the tree has been dominated by functional constraint. B, Results of the modified McDonald-Kreitman test comparing the relative amounts of nonsynonymous (N) and synonymous (S) change between the two phylogenetic partitions. Fixed substitutions (dark gray) occurred between paralogous lineages, and polymorphic substitutions (light gray) occurred among orthologous lineages.

 
To further evaluate the role of selection following gene duplication, a statistical analysis of the phylogenetic distribution of {alpha}-mannosidase IA-IB nucleotide changes, based on the idea of the McDonald-Kreitman test (McDonald and Kreitman 1991), was used (as in Cirera and Aguade 1998). Ka/Ks > 1 is an extremely stringent criterion for evidence of positive selection (Wolfe and Sharp 1993). Under a scenario of accelerated adaptive evolution followed by negative selection, positive selection may predominate for only a fraction of the evolutionary history of a given lineage. Furthermore, the substitutions that are favored by positive selection likely represent a minority of the total sites. Therefore, discontinuous episodes of positive selection likely occur against a constant backdrop of negative selection that can easily obscure evidence of their existence. The McDonald-Kreitman test is sensitive in that it can provide evidence of adaptive evolution when Ka/Ks < 1 (Sharp 1997). In the analysis performed here based on the McDonald-Kreitman test, changes are partitioned on the IA-IB tree (fig. 5) into those that occurred before (fixed) and after (polymorphic) speciation. In other words, sites that showed any variation within one or both of the paralogous groups (i.e., the HS-GIA, MM-GIA and/or the HS-GIB, MM-GIB group in fig. 5) are considered polymorphic, while sites that show no variation within groups but differ between groups are considered fixed. According to the neutral theory, the ratio of nonsynonymous (N) to synonymous (S) changes should be the same for both classes of change (polymorphic and fixed). A G-test with Williams correction was used to compare polymorphic and fixed N/S (fig. 5). This allowed a test for evidence of positive selection following gene duplication. There is a significant departure from the expectations of neutrality due to an excess of fixed N changes. These data are consistent with the relative increase in Ka between ancestral IA and IB nodes and may be due to the fixation of N changes after duplication by positive Darwinian selection for functional diversification.
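The partitioning and test described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function names `classify_site` and `g_test_williams` are hypothetical, the example counts are placeholders rather than the values shown in fig. 5, and the assignment of each change to the nonsynonymous or synonymous class (which requires codon context) is assumed to have been done beforehand. The Williams correction is written in its usual form for a 2 x 2 table, and the P value is taken from the chi-square distribution with one degree of freedom.

```python
import math


def classify_site(group_ia, group_ib):
    """Classify one alignment column given the bases in each paralog group."""
    if len(set(group_ia)) > 1 or len(set(group_ib)) > 1:
        return "polymorphic"  # variation within at least one group
    if set(group_ia) != set(group_ib):
        return "fixed"        # no variation within groups, differs between them
    return "invariant"


def g_test_williams(fixed_n, fixed_s, poly_n, poly_s):
    """G-test of independence on a 2 x 2 table with Williams' correction.

    Returns (adjusted G, P value with 1 degree of freedom).
    """
    table = [[fixed_n, fixed_s], [poly_n, poly_s]]
    rows = [fixed_n + fixed_s, poly_n + poly_s]
    cols = [fixed_n + poly_n, fixed_s + poly_s]
    n = rows[0] + rows[1]
    if min(rows) == 0 or min(cols) == 0:
        raise ValueError("every row and column total must be positive")

    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs > 0:  # a zero cell contributes nothing to G
                expected = rows[i] * cols[j] / n
                g += 2.0 * obs * math.log(obs / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error

    # Williams' correction factor for a 2 x 2 table.
    q = 1.0 + ((n / rows[0] + n / rows[1] - 1.0)
               * (n / cols[0] + n / cols[1] - 1.0)) / (6.0 * n)
    g_adj = g / q
    p_value = math.erfc(math.sqrt(g_adj / 2.0))  # chi-square tail, df = 1
    return g_adj, p_value


# Placeholder counts for illustration only.
g_adj, p_value = g_test_williams(fixed_n=40, fixed_s=10, poly_n=10, poly_s=40)
print(f"G_adj = {g_adj:.2f}, P = {p_value:.3g}")
```

Under neutrality the N/S ratio is expected to be the same in both rows of the table, so an adjusted G that is large because of an excess of fixed N changes corresponds to the departure reported here.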


    Conclusions
 
The tremendous accumulation of genomic sequence data provides the opportunity for increased resolution and power in molecular evolutionary studies. This potential has been exploited in the present analysis of the {alpha}-mannosidase enzyme superfamily. The combination of sensitive lineage-specific searches and a motif-based alignment approach has enabled a substantial expansion of the known {alpha}-mannosidase sequence space and revealed the presence of a new family of proteins. Consideration of functional specificity of characterized {alpha}-mannosidase sequences in a phylogenetic context indicated that functional diversification subsequent to gene duplication is a hallmark of {alpha}-mannosidase superfamily evolution. In at least one case, there is evidence that positive Darwinian selection may have acted following gene duplication. Currently, the distance between nearest paralogs prevents analysis of any other groups for a similar pattern of adaptive diversification. However, it is certainly possible that positive selection has played a role in the functional diversification of other {alpha}-mannosidase paralogs. A more robust sampling of sequences within closely related paralogous lineages would facilitate tests of this hypothesis. This is also true for the IA-IB clade characterized here. The time elapsed since the IA-IB duplication (fig. 5) suggests that virtually all mammals should have copies of each of these paralogs (barring loss). A denser IA-IB tree could facilitate a precise determination of the timing and nature of molecular adaptive events.


    Footnotes
 
Pierre Capy,

1 Abbreviations: ER, endoplasmic reticulum; MIR, motif intervening region; OSM, ordered series of motifs.

2 Keywords: {alpha}-mannosidase, glycoside hydrolase, gene duplication, positive selection, adaptation

1 Present address: Department of Microbiology and Center for Computational Biology, Montana State University.

4 Address for correspondence and reprints: I. King Jordan, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38A, Bethesda, MD 20894. E-mail: ikingjordan{at}hotmail.com


    literature cited
 

    Camirand, A., A. Heysen, B. Grondin, and A. Herscovics. 1991. Glycoprotein biosynthesis in Saccharomyces cerevisiae. Isolation and characterization of the gene encoding a specific processing alpha-mannosidase. J. Biol. Chem. 266:15120–15127.

    Cirera, S., and M. Aguade. 1998. Molecular evolution of a duplication: the sex-peptide (Acp70A) gene region of Drosophila subobscura and Drosophila madeirensis. Mol. Biol. Evol. 15:988–996.

    Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary change in proteins. Pp. 345–352 in M. O. Dayhoff, ed. Atlas of protein sequence and structure. National Biomedical Research Foundation, Washington, D.C.

    Dewald, B., and O. Touster. 1973. A new {alpha}-D-mannosidase occurring in Golgi membranes. J. Biol. Chem. 248:7223–7233.

    Eades, C. J., A. M. Gilbert, C. D. Goodman, and W. E. Hintz. 1998. Identification and analysis of a class 2 alpha-mannosidase from Aspergillus nidulans. Glycobiology 8:17–33.

    Eernisse, D. J. 1992. DNA translator and aligner: HyperCard utilities to aid phylogenetic analysis of molecules. Comput. Appl. Biosci. 8:177–184.

    Fitch, W. M. 1970. Distinguishing homologous from analogous proteins. Syst. Zool. 19:99–113.

    Goodman, M., G. W. Moore, and G. Matsuda. 1975. Darwinian evolution in the genealogy of haemoglobin. Nature 253:603–608.

    Haldane, J. B. S. 1932. The causes of evolution. Longmans and Green, London.

    Henikoff, S., E. A. Greene, S. Pietrokovski, P. Bork, T. K. Attwood, and L. Hood. 1997. Gene families: the taxonomy of protein paralogs and chimeras. Science 278:609–614.

    Henrissat, B. 1991. A classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem. J. 280:309–316.

    ———. 1998. Glycosidase families. Biochem. Soc. Trans. 26:153–156.

    Henrissat, B., and A. Bairoch. 1993. New families in the classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem. J. 293:781–788.

    ———. 1996. Updating the sequence-based classification of glycosyl hydrolases. Biochem. J. 316:695–696.

    Henrissat, B., and G. Davies. 1997. Structural and sequence-based classification of glycoside hydrolases. Curr. Opin. Struct. Biol. 7:637–644.

    Henrissat, B., and A. Romeu. 1995. Families, superfamilies and subfamilies of glycosyl hydrolases. Biochem. J. 311:350–351.

    Hudak, J., and M. A. McClure. 1999. A comparative analysis of computational motif-detection methods. Pp. 138–149 in R. B. Altman, A. K. Dunker, L. Hunter, T. E. Klein, and K. Lauderdale, eds. Pacific Symposium on Biocomputing '99. World Scientific, Singapore.

    Hughes, A. L., and M. Nei. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167–170.

    Kerscher, S., S. Albert, D. Wucherpfennig, M. Heisenberg, and S. Schneuwly. 1995. Molecular and genetic analysis of the Drosophila mas-1 (mannosidase-1) gene which encodes a glycoprotein processing alpha 1,2-mannosidase. Dev. Biol. 168:613–626.

    Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, New York.

    Kumar, S., and S. B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917–920.

    Lal, A., P. Pang, S. Kalelkar, P. A. Romero, A. Herscovics, and K. W. Moremen. 1998. Substrate specificities of recombinant murine Golgi alpha1,2-mannosidases IA and IB and comparison with endoplasmic reticulum and Golgi processing alpha1,2-mannosidases. Glycobiology 8:981–995.

    Lal, A., J. S. Schutzbach, W. T. Forsee, P. J. Neame, and K. W. Moremen. 1994. Isolation and expression of murine and rabbit cDNAs encoding an alpha 1,2-mannosidase involved in the processing of asparagine-linked oligosaccharides. J. Biol. Chem. 269:9872–9881.

    Li, W. H. 1997. Molecular evolution. Sinauer, Sunderland, Mass.

    Li, W. H., and T. Gojobori. 1983. Rapid evolution of goat and sheep globin genes following gene duplication. Mol. Biol. Evol. 1:94–108.

    Liao, Y. F., A. Lal, and K. W. Moremen. 1996. Cloning, expression, purification, and characterization of the human broad specificity lysosomal acid alpha-mannosidase. J. Biol. Chem. 271:28348–28358.

    Long, M., and C. H. Langley. 1993. Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila. Science 260:91–95.

    McClure, M. A. 1991. Evolution of retroposons by acquisition or deletion of retrovirus-like genes. Mol. Biol. Evol. 8:835–856.

    McClure, M. A., and J. Kowalski. 1999. The effects of ordered-series-of-motifs anchoring and sub-class modeling on the generation of HMMs representing highly divergent protein sequences. Pp. 162–170 in R. B. Altman, A. K. Dunker, L. Hunter, T. E. Klein, and K. Lauderdale, eds. Pacific Symposium on Biocomputing '99. World Scientific, Singapore.

    McClure, M. A., T. K. Vasi, and W. M. Fitch. 1994. Comparative analysis of multiple protein-sequence alignment methods. Mol. Biol. Evol. 11:571–592.

    McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

    Messier, W., and C. B. Stewart. 1997. Episodic adaptive evolution of primate lysozymes. Nature 385:151–154.

    Moremen, K. W., R. B. Trimble, and A. Herscovics. 1994. Glycosidases of the asparagine-linked oligosaccharide processing pathway. Glycobiology 4:113–125.

    Muller, H. J. 1935. The origination of chromatin deficiencies as minute deletions subject to insertion elsewhere. Genetics 17:237–252.

    Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.

    Neuwald, A. F., J. S. Liu, D. J. Lipman, and C. E. Lawrence. 1997. Extracting protein alignment models from the sequence database. Nucleic Acids Res. 25:1665–1677.

    Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, New York.

    Ohta, T. 1991. Multigene families and the evolution of complexity. J. Mol. Evol. 33:34–41.

    ———. 1994. Further examples of evolution by gene duplication revealed through DNA sequence comparisons. Genetics 138:1331–1337.

    Rozas, J., and R. Rozas. 1997. DnaSP version 2.0: a novel software package for extensive molecular population genetics analysis. Comput. Appl. Biosci. 13:307–311.

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

    Schmidt, T. R., S. A. Jaradat, M. Goodman, M. I. Lomax, and L. I. Grossman. 1997. Molecular evolution of cytochrome c oxidase: rate variation among subunit VIa isoforms. Mol. Biol. Evol. 14:595–601.

    Sharp, P. M. 1997. In search of molecular Darwinism. Nature 385:111–112.

    Swofford, D. L. 1998. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Sinauer, Sunderland, Mass.

    Tatusov, R. L., E. V. Koonin, and D. J. Lipman. 1997. A genomic perspective on protein families. Science 278:631–637.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

    Tulsiani, D. R. P., S. C. Hubbard, P. W. Robbins, and O. Touster. 1982. {alpha}-D-mannosidases of rat liver Golgi membranes. J. Biol. Chem. 257:3660–3668.

    Vandersall-Nairn, A. S., R. K. Merkle, K. O'Brien, T. N. Oeltmann, and K. W. Moremen. 1998. Cloning, expression, purification, and characterization of the acid alpha-mannosidase from Trypanosoma cruzi. Glycobiology 8:1183–1194.

    Wolfe, K. H., and P. M. Sharp. 1993. Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J. Mol. Evol. 37:441–456.

    Zhang, J., H. F. Rosenberg, and M. Nei. 1998. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:3708–3713.

Accepted for publication November 1, 1999.