Positive Darwinian Selection at the Pantophysin (Pan I) Locus in Marine Gadid Fishes

Grant H. Pogson and Kathryn A. Mesa

Department of Ecology and Evolutionary Biology, University of California, Santa Cruz

Correspondence: E-mail: pogson@darwin.ucsc.edu.


    Abstract
 
Maximum-likelihood models of codon substitution were used to test for positive Darwinian selection at the vesicle protein pantophysin in two allelic lineages segregating in the Atlantic cod Gadus morhua and in 18 related species of marine gadid fishes. Positive selection was detected in the two intravesicular loops of the integral membrane protein but not in four membrane-spanning regions or the 3' cytoplasmic tail. The proportion of positively selected sites (24.9%) and the mean nonsynonymous/synonymous rate ratio (ω = dN/dS = 5.35) were both greater in the first intravesicular (IV1) domain compared with the second intravesicular (IV2) domain (11.0% positively selected sites with mean ω = 3.76). Likelihood ratio tests comparing models that assume identical ω ratios along all branches of the phylogeny to those that allow ω ratios to vary among lineages were not significant for either the IV1 or IV2 domains, indicating that the selective pressures favoring amino acid replacements have operated consistently in both regions during the diversification of the group. Positive selection was observed in the IV1 domain in both G. morhua allelic lineages, and, although three of the four codons that differ between alleles were targets of positive selection in the broader group, no similar polymorphisms were detected in other taxa. The two G. morhua Pan I alleles appeared to have evolved before the speciation event separating it from its sister taxon, Theragra chalcogramma, and on the basis of a standard mtDNA clock are estimated to be at least 2 Myr old. Although the function of pantophysin remains unknown, the strong signal of positive selection at specific sites in the IV1 and IV2 domains may help clarify its role in cellular trafficking pathways.

Key Words: balancing selection • gadid fishes • maximum likelihood • pantophysin • positive selection • S2 ribosomal protein


    Introduction
 
A growing number of statistical approaches have been developed to detect natural selection at the DNA sequence level (reviews in Kreitman and Akashi 1995; Hughes 1999; Yang and Bielawski 2000; Nielsen 2001). One of the most powerful approaches is the test for positive Darwinian selection in which a protein's rate of nonsynonymous substitution (dN) is compared with its rate of synonymous substitution (dS). Provided that codon bias has not acted to constrain dS, a dN/dS ratio (i.e., ω ratio) exceeding unity is strong evidence for the continued operation of natural selection favoring amino acid replacements. Recently, maximum-likelihood models of codon substitution have been developed that allow for variable ω ratios among sites and the identification of individual residues in a protein experiencing positive selection (Nielsen and Yang 1998; Yang et al. 2000). These methods appear to offer a number of advantages over earlier pairwise comparisons of dN and dS among taxa that average over all sites and lineages (Akashi 1999; Crandall et al. 1999 [see, however, Suzuki and Nei 2001]). However, tests for positive selection based on ω ratios greater than 1 are extremely stringent and will likely fail to identify adaptive evolution when selection is weak and/or episodic or when power is reduced because of limited taxon sampling (Anisimova, Bielawski, and Yang 2001).

In a wide range of species, elevated dN/dS ratios have commonly been reported at two broad classes of genes—those involved in host-pathogen interactions (e.g., Hughes and Nei 1988; Hughes 1992; Smith, Maynard Smith, and Spratt 1995; Bishop, Dean, and Mitchell-Olds 2000; Ford 2001) and those functioning in reproduction (e.g., Lee, Ota, and Vacquier 1995; Metz and Palumbi 1996; Tsaur and Wu 1997; Wyckoff, Wang, and Wu 2000; Swanson et al. 2001; Swanson, Nielsen, and Yang 2003 [see review by Ford 2002]). Despite this emerging generality, the diversity of genes that might experience positive selection in the genome is unclear. Positive selection has been described at proteins as diverse as digestive enzymes (e.g., Messier and Stewart 1997), cytochromes (e.g., Wu et al. 1999), toxins (e.g., Nakashima et al. 1995), cytokines (Shields, Harmon, and Whitehead 1996), hormones (e.g., Wallis 1996), and antifreeze proteins (e.g., Swanson and Aquadro 2002). The growing list of positively selected genes suggests that diversifying selection may be more common than previously estimated (e.g., Endo, Ikeo, and Gojobori 1996), although details of the selective process in many cases remain unknown.

In a previous study (Pogson 2001), a signature of positive selection was detected at the vesicle trafficking protein pantophysin in the Atlantic cod Gadus morhua. Pantophysin is an integral membrane protein found in small (<100 nm) cytoplasmic microvesicles that are thought to function in a variety of intracellular shuttling pathways (Haass, Kartenbeck, and Leube 1996; Windoffer et al. 1999; Brooks et al. 2000). Although pantophysin's role in trafficking pathways remains unknown, it shares a characteristic structure of two cytoplasmic tails, four membrane-spanning domains, and two intravesicular loops in common with other physins characterized to date (Johnston, Jahn, and Südhof 1989; Haass, Kartenbeck, and Leube 1996; Fernandez-Chacon and Südhof 1999). Examination of the nucleotide sequences of 124 pantophysin (Pan I) alleles currently segregating in northwest Atlantic populations of G. morhua by Pogson (2001) suggested a prolonged period of diversifying selection: the two common alleles differed by six fixed nonsynonymous substitutions (and no synonymous changes) that clustered in the 56-residue first intravesicular loop (IV1 domain) of the protein. A seventh polymorphic replacement mutation was also detected in the IV1 domain that exhibited clinal variation throughout the north Atlantic, suggestive of an ongoing selective sweep. It is unclear whether the elevated dN/dS ratio observed at the pantophysin locus in G. morhua is related to the unusual polymorphism present in the species or whether positive selection is a more general phenomenon occurring in related species irrespective of the presence of polymorphism.

The objective of the present study was to test for positive selection at the pantophysin locus among 18 species of marine fishes belonging to the family Gadidae. Because the transmembrane structure of pantophysin and signal of selection in the IV1 domain were known before the study, maximum-likelihood-based models of codon substitution (Yang et al. 2000) were implemented on different domains separately in addition to the entire protein. Models were also implemented that allow ω ratios to vary among lineages to assess the occurrence of positive selection throughout the phylogenetic tree. The 40S ribosomal S2 protein was sequenced from the same taxa to serve as a control and to provide additional phylogenetic information.


    Materials and Methods
 
Samples
Table 1 lists the 18 species collected for the study and their sample locations. All individuals were captured by otter trawls, gillnets, or handlines with the exception of four E. gracilis that were bought at a fish market in Kurisho, Hokkaido. Samples were obtained from 16 of the 22 described species (11 of 12 genera) in the subfamily Gadinae, one of three recognized in the family Gadidae (Cohen et al. 1990). Two outgroup taxa (M. molva and B. brosme) were included from the subfamily Lotinae. For G. morhua, one Pan IA and one Pan IB allele were randomly sampled from the Grand Bank sample described in Pogson et al. (2001). DNA was extracted from two to five individuals of each species from EtOH-preserved gill, blood, or fin clips as described in Pogson, Mesa, and Boutilier (1995).


 
Table 1 Species and Sampling Details.

 
PCR and DNA Sequencing
PCR conditions for amplifying the pantophysin gene in G. morhua have been described in Pogson (2001). Primers used for amplifying and sequencing the pantophysin and S2 genes are provided as Supplementary Material online (http://www.molbiolevol.org). Flanking PCR primers 4 and 7 were used to amplify the Pan I gene region from all taxa, except the two outgroups (B. brosme and M. molva), for which forward primer 10 was used. An 896-bp fragment of the 40S ribosomal S2 protein gene (originally cDNA clone GM842) was amplified using cycling conditions identical to those for pantophysin, except that extension times were reduced to 1 min. For both genes, complete sequences of both strands were obtained from two to five individuals using Applied Biosystems Automated Sequencers (Models 373 and 3100). Sequences were edited with Sequence Navigator (ABI) and compiled into consensus sequences with AutoAssembler (ABI). Alignments were performed using the default settings of the ClustalW program (Thompson, Higgins, and Gibson 1994), followed by minor adjustments by eye. Sequences have been deposited in GenBank under accession numbers AY292467 to AY292500.

Phylogenetic Analyses
For both genes, maximum-parsimony (MP), neighbor-joining (NJ), and maximum-likelihood (ML) analyses were performed using PAUP* 4.0b8 (Swofford 1998). Before the ML analyses, the hierarchical likelihood ratio tests (LRTs) implemented by the Modeltest 3.06 program of Posada and Crandall (1998) identified the General Time Reversible (GTR) model of Rodriguez et al. (1990) as the best substitution model for both genes. ML analyses were performed using empirical base frequencies, a proportion of invariable sites estimated from the data, and among-site rate heterogeneity with gamma shape parameters of 1.12 and 1.13 for the pantophysin and S2 genes, respectively. For the MP analyses, 100 replicates of random taxon addition were followed by heuristic searches with tree-bisection-reconnection (TBR) branch swapping. NJ trees were reconstructed using Kimura (1980) two-parameter (K2P) distances and allowing for rate heterogeneity as per the ML analyses. Bootstrap support values for the MP and NJ trees were obtained from 1,000 replicates. The method of Shimodaira and Hasegawa (1999) was used to test alternative tree topologies in PAUP* using the fully optimized model, 1,000 bootstrap replicates, and among-site rate variation as described above.
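As an illustration of the distance measure underlying the NJ trees, the following minimal Python sketch computes a Kimura (1980) two-parameter distance between two aligned sequences from the proportions of transitions (P) and transversions (Q). It omits the gamma correction for among-site rate heterogeneity that was applied in the actual analyses, and the function name `k2p_distance` and the toy sequences are assumptions made for illustration only; they are not part of the PAUP* workflow.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites with a gap or ambiguous base in either sequence are skipped.
    No gamma correction is applied (a simplification of the method in the text).
    math.log raises ValueError if the sequences are too divergent (saturation).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # purine<->purine or pyrimidine<->pyrimidine changes are transitions
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Toy example: one transition (A<->G) among ten compared sites.
d_example = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

Weighting transitions and transversions separately is what distinguishes this distance from a simple proportion of differing sites.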

Tests for Positive Selection
The CODEML program of the PAML package (Yang 1997) was used to test for the presence of positively selected sites (i.e., codons) in the S2 and Pan I genes. LRTs were performed comparing the scores obtained from models M7 and M8. Model M7 (β) assumes that ω ratios follow a β distribution constrained in the interval (0, 1). Under model M8 (β and ω), an additional class of sites is added to model M7 that allows a proportion of sites to have an ω ratio (estimated from the data) exceeding unity. Although other models in the PAML package were also implemented (i.e., M0, M1, M2, and M3), the results presented here are restricted to the comparisons of M7 and M8 because these represent the most stringent tests of positive selection (Anisimova, Bielawski, and Yang 2001). Because of a priori knowledge of pantophysin's structure (Haass, Kartenbeck, and Leube 1996) and the signal of selection in the IV1 domain of G. morhua (Pogson 2001), analyses were also performed on the four domains of the protein separately (cytoplasmic, transmembrane, IV1, and IV2). To evaluate the presence of local optima on the likelihood surface (cf. Suzuki and Nei 2001), model M8 was rerun with starting ω values of 0.5, 1.0, 1.5, and 3.0. If sites with ω ratios greater than 1 were identified, the Bayesian method of Nielsen and Yang (1998) was used to calculate posterior probabilities for each site.
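The Bayesian step can be illustrated with a deliberately simplified sketch. Under a site-class model, the posterior probability that a codon belongs to a given class is the class proportion multiplied by the likelihood of that codon's data under the class, normalized over all classes. The two-class setup, the function name, and the per-class site likelihoods below are hypothetical values chosen for illustration; CODEML computes such likelihoods on the tree and, under M8, uses a discretized β distribution plus one additional ω class rather than two classes.

```python
def site_class_posteriors(class_proportions, site_likelihoods):
    """Posterior probability of each site class for one codon (Bayes' rule).

    class_proportions: prior proportion of sites in each class (sums to 1).
    site_likelihoods:  likelihood of the codon's data under each class.
    """
    if len(class_proportions) != len(site_likelihoods):
        raise ValueError("need one likelihood per site class")
    joint = [p * f for p, f in zip(class_proportions, site_likelihoods)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("site has zero likelihood under every class")
    return [j / total for j in joint]


# Hypothetical two-class example: a conserved class and a class with omega > 1.
# Both the proportions and the likelihood values are invented for illustration.
proportions = [0.916, 0.084]
likelihoods = [1.0e-6, 4.0e-4]
posteriors = site_class_posteriors(proportions, likelihoods)
```

In this toy case the second class has a small prior proportion, but the data at the site are far more probable under it, so its posterior probability exceeds 0.95.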

For genes/domains in which positive selection was inferred by LRTs, estimates of ω ratios along branches of the phylogeny were obtained by "free-ratios" models, which assume a different ratio for each branch. To evaluate the heterogeneity of substitution patterns among pantophysin domains, as well as between the Pan I and S2 genes, the fixed-sites models described in Yang and Swanson (2002) were also implemented on the combined data sets. These models allow for the possibility of different substitution rates (rs), base frequencies (πs), transition/transversion rate ratios (κ), and ω ratios among the partitioned data (i.e., the four Pan I domains and the S2 gene). Nested LRTs were then applied to evaluate the extent of heterogeneity among domains/genes. Fixed-sites models were also implemented on a subset of 12 species that overlapped with Carr et al.'s (1999) study by adding 896 bp of mtDNA (401 bp of cytochrome b and 495 bp of COI) to the Pan I and S2 sequences as a sixth partition. ML estimates of synonymous and nonsynonymous substitution rates between pantophysin and S2 sequences were obtained by the method of Yang and Nielsen (2000) using the yn00 program in the PAML package.


    Results
 
Divergence of the Pantophysin and S2 Genes
A 1,828-bp fragment of the pantophysin gene was analyzed from 18 taxa (including gaps). The gene region consisted of four exons (627 bp) and four introns (1,201 bp). At the nucleotide level, pairwise distances ranged from 0.1% to a maximum of 19.3% (overall mean = 9.6% [table 2]). At the protein level, gamma distances varied from zero (between G. macrocephalus and G. ogac) to 25.1% and exhibited an overall mean of 13.1%. For the S2 ribosomal protein, an 896-bp fragment was sequenced (representing 549 bp of coding DNA and 347 bp from two introns). Pairwise distances between S2 sequences ranged from 0.7% to 9.2% with an overall mean of 3.2%. The highly conserved S2 protein exhibited 97.8% to 98.3% amino acid identity with mammalian homologs (see Suzuki, Olvera, and Wool 1991). Only three positions in the protein varied among the 18 taxa, and the mean pairwise gamma distance was 0.4%. Although the S2 protein showed dramatically lower rates of nonsynonymous substitution (mean pairwise dN = 0.002) compared with pantophysin (mean pairwise dN = 0.064), the mean rates of synonymous substitution were similar in the two proteins (mean pairwise dS = 0.176 and 0.123, respectively). There was no evidence of codon bias in either gene; Wright's (1990) effective number of codons ranged from 54.5 to 60.8 for pantophysin and from 55.4 to 60.9 for S2 among the 18 taxa.
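The gene-wide averages above can be turned into rough ω ratios with a few lines of Python. This is only a sketch: a ratio of mean pairwise rates is not the same as the ML estimate of ω, and the pairing of the two dS values with the two genes assumes that "respectively" follows the S2-then-pantophysin order of the sentence. The function and variable names are illustrative.

```python
def omega_ratio(dn, ds):
    """Ratio of nonsynonymous to synonymous substitution rates (dN/dS)."""
    if ds <= 0:
        raise ValueError("dS must be positive for the ratio to be defined")
    return dn / ds


# Mean pairwise (dN, dS) values quoted in the text.
# ASSUMPTION: dS = 0.176 belongs to S2 and dS = 0.123 to pantophysin.
mean_rates = {
    "S2": (0.002, 0.176),
    "pantophysin": (0.064, 0.123),
}

gene_wide_omega = {gene: omega_ratio(dn, ds) for gene, (dn, ds) in mean_rates.items()}
```

Under these assumptions both gene-wide ratios fall below 1 (about 0.01 for S2 and about 0.5 for pantophysin), which illustrates why averaging over all sites can mask positive selection confined to particular domains.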


 
Table 2 Pairwise Kimura Two-Parameter Distances Among Taxa.

 
Phylogenetic Analyses
For the pantophysin gene, the MP, NJ, and ML analyses produced identical topologies, and bootstrap support throughout the tree was high (fig. 1A). All tree-building methods grouped the Pan IA allele of the Atlantic cod G. morhua with the Pacific taxon T. chalcogramma rather than with the Pan IB allele. When either of the two G. morhua alleles was removed from the analyses, T. chalcogramma and G. morhua always grouped as sister taxa with bootstrap support values exceeding 95%. The pantophysin phylogeny differs markedly from that based on morphological and life-history characters (e.g., Svetovidov 1948; Dunn 1989) and resolves all ambiguities in the molecular phylogenies of Carr et al. (1999), who sequenced 896 bp of mtDNA from 14 gadid species (12 overlapping with the present study). The mtDNA sequences analyzed by Carr et al. (1999) were unable to determine the relationships among Gadus, Boreogadus, and Theragra and also could not reliably place P. virens on the tree. The pantophysin gene provided clear resolution of all taxa and clearly placed the two Pollachius species ancestral to M. aeglefinus/M. merlangus. Similar to that reported by Carr et al. (1999), no fixed differences were observed between G. ogac and G. macrocephalus, but a C to A mutation at position 294 in the second intron was nearly fixed in the five Greenland cod individuals sequenced (i.e., one CA heterozygote was detected). In addition to Gadus, the pantophysin phylogeny suggested that the genus Microgadus is polyphyletic: M. proximus was more closely related to E. gracilis than it was to M. tomcod (see table 2). Carr et al. (1999) came to a similar conclusion for the genus Microgadus but using a different Eleginus species, E. navaga. Shimodaira-Hasegawa (1999) tests confirmed that topologies constraining the monophyly of either Gadus or Microgadus were both significantly worse than the ML tree (P = 0.032 and 0.021, respectively).



 
FIG. 1. Neighbor-Joining (NJ) trees for the (A) pantophysin and (B) S2 genes based on Kimura two-parameter distances. Bootstrap support values from 1,000 replicates are presented for the maximum-parsimony tree (first value) and the NJ tree (second value)

 
Phylogenetic relationships inferred from the S2 sequences failed to provide the same clear resolution as pantophysin (fig. 1B). For the MP analyses, five equally parsimonious trees of 253 steps were obtained (CI = 0.822, RI = 0.846) in which the relationships among Gadus spp., Theragra, Boreogadus, and Arctogadus could not be resolved. Both the strict consensus MP tree and the ML tree failed to identify T. chalcogramma and G. morhua as sister taxa, but all deeper branches were congruent with the pantophysin phylogeny. G. macrocephalus and G. ogac exhibited apparently fixed differences at two sites in the S2 gene (positions 355 and 618) and G. ogac had a 3-bp deletion in an intron at position 834 that was not present in G. macrocephalus.

Positive Selection
Table 3 presents the results of tests for positive selection involving the comparison of models M7 and M8. For all tests, G. ogac and G. macrocephalus were collapsed into a single lineage because they were indistinguishable at the protein level. Considering the entire pantophysin sequence, model M8 fit the data significantly better than model M7 and 8.4% of the sites were identified as experiencing positive selection with a mean ω ratio of 4.74. Model M8 suggested the presence of 17 positively selected sites (12 in the IV1 domain and five in the IV2 domain), but only seven sites had posterior probabilities exceeding 0.95 (fig. 2). When models M7 and M8 were implemented on the IV1 domain separately, the same 12 positively selected sites were identified, but the mean ω ratio increased to 5.35 and all had posterior probabilities greater than 0.95. An additional two sites were identified (positions 24 and 67), but posterior probabilities for both were low (<0.64). Models fit to the IV2 domain alone performed more poorly than those on the entire sequence; only three positively selected sites were identified, with only one having P > 0.95. Four sites experiencing diversifying selection were also identified in the transmembrane domains, but, because the LRT was not significant, these likely represent false positives. Different starting values of ω were found to have a negligible effect on model M8 (not shown). No positively selected sites were detected in the S2 gene.


 
Table 3 Log-Likelihood Scores, Parameter Estimates, and Positively Selected Sites Identified in the Pantophysin and S2 Genes.

 


 
FIG. 2. Aligned amino acid sequences of pantophysins from the 18 taxa. The locations of structural domains are presented at the top of the figure. Positively selected sites in the IV1 and IV2 domains identified by model M8 are indicated in bold italics. Amino acid insertions shown in the figure were not included in the tests for positive selection

 
To identify where positive selection in the IV1 and IV2 domains occurred in the phylogeny, and to assess the heterogeneity of ω ratios among lineages, "free-ratios" models were compared with "one-ratio" models that assume identical ω ratios over the entire tree. With 32 df, neither the LRT for the IV1 domain (2Δℓ = 27.48, P = 0.695) nor that for the IV2 domain (2Δℓ = 19.25, P = 0.963) was significant, indicating that ω ratios were not heterogeneous among lineages. Branches experiencing positive selection in the IV1 or IV2 domains are shown in figure 3. In IV1, dN was observed to exceed dS on 19 out of 33 branches. Ratios of dN/dS were undefined on 13 branches (because of the absence of synonymous substitutions), and the observed numbers of nonsynonymous substitutions were generally low. Positive selection was detected in the IV2 domain along 14 branches, although ω ratios on nine branches were undefined. Estimates of dN exceeded dS in both intravesicular domains on seven branches. The signature of positive selection was only slightly higher for external branches (15 of 18 exhibiting ω ratios > 1) versus interior branches (11 of 15 with ω ratios > 1).
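The two P values can be checked against a chi-square distribution with 32 df (33 branch-specific ratios in the free-ratios model minus the single ratio of the one-ratio model). The sketch below uses only the Python standard library and the closed-form upper tail that holds for even degrees of freedom; the function name is an illustrative assumption, and a statistics package would normally be used instead.

```python
import math


def chi2_sf_even_df(stat, df):
    """Upper-tail probability P(X >= stat) for a chi-square variable with even df.

    For df = 2k the tail has the closed form
        exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    if stat < 0:
        raise ValueError("test statistic must be non-negative")
    half = stat / 2.0
    term = 1.0   # the i = 0 term
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


# Test statistics reported in the text for the free-ratios vs. one-ratio LRTs.
p_iv1 = chi2_sf_even_df(27.48, 32)
p_iv2 = chi2_sf_even_df(19.25, 32)
```

Rounded to three decimals, the two values agree with the P = 0.695 and P = 0.963 reported for the IV1 and IV2 domains.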



 
FIG. 3. Cladogram of pantophysin sequences showing rectangular branches where the "free-ratios" models identified ω ratios greater than 1. Numbers above the branches provide ML estimates of the numbers of nonsynonymous/synonymous substitutions for the IV1 domain (filled rectangles) and the IV2 domain (open rectangles). Branches in which positive selection was detected in both domains have both patterns

 
The substitutions observed in both intravesicular loops were nonrandom in both their locations and likely effects on protein structure/function. In the IV1 domain, nearly 75% of the inferred substitutions (61 of 82) occurred at the 14 positively selected sites identified in table 3 and 50% of these changes (41) involved charged residues. In the IV2 domain, 50% of the inferred substitutions (24 of 48) occurred at the five positively selected sites identified by model M8, and a similar high proportion (19) involved charged amino acids. Insertions of amino acids were also detected in both intravesicular loops (two in IV1 and three in IV2), and all occurred directly adjacent to sites experiencing positive selection. Interestingly, three of the four sites that differ between the two G. morhua Pan I alleles in the IV1 domain (30, 48, and 51) were identified as targets of positive selection (see figure 2). The radical polymorphic replacement mutation identified in the IV1 domain by Pogson (2001) (called the Pan IA' allele) also occurred at a positively selected site (position 58).

Results from the Fixed-Sites Models
Table 4 presents the results of the fixed-sites models implemented on the combined pantophysin and S2 data sets that had been divided into five partitions (the four Pan I domains and the S2 sequences). The transmembrane topology of pantophysin confirmed by Haass, Kartenbeck, and Leube (1996) allowed the a priori division of the protein into four domains for these analyses. Allowing the substitution rate to vary among partitions (model B) fit the data significantly better than the simplest model (A) that assumes no rate heterogeneity. Rates of substitution in the IV1 and IV2 domains were both three times higher than observed in the transmembrane regions of the same protein and four times higher than observed at the S2 gene. Overall, pantophysin was evolving about 2.5 times faster than S2. Likelihood ratio tests comparing more complex models to simpler nested models were all significant. Model F (equivalent to performing separate analyses on each partition) provided a significantly better fit than model E despite using 33 × 5 branch lengths for the five partitions. Similar results were obtained for the reduced data set of 12 species (overlapping with Carr et al. [1999]) in which 896 bp of mtDNA sequence was added as a sixth partition (not shown). Here, the LRT comparing models A and B was highly significant (2Δℓ = 326.69, P < 0.0001) and indicated that the pantophysin IV1 and IV2 domains were evolving at about 80% of the rates exhibited by the two mitochondrial genes. Overall, the two mitochondrial genes were evolving at about twice the rate of the entire pantophysin sequence.


 
Table 4 Results from the Fixed-Sites Models.

 

    Discussion
 
Positive Selection
The maximum-likelihood analyses implemented in this study have confirmed that positive Darwinian selection at the pantophysin locus is widespread among marine gadid fishes. Similar to that observed for the two G. morhua Pan I alleles by Pogson (2001), strong diversifying selection was observed in the first intravesicular loop (IV1 domain) of the protein and 50% of the inferred amino acid substitutions involved charged residues. One quarter of sites in the IV1 domain were targets of positive selection, with a mean ω ratio (5.35) greater than previously described for abalone lysin (e.g., Yang, Swanson, and Vacquier 2000) but comparable to human class I MHC alleles (e.g., Yang and Swanson 2002). Diversifying selection was also detected in the smaller second intravesicular loop (IV2 domain) but was weaker (11% positively selected sites with mean ω ratio of 3.76) and occurred on fewer branches of the phylogeny than in the IV1 domain. The action of positive selection in the two intravesicular loops accelerated the rates of evolution of both domains relative to other regions of pantophysin and the S2 ribosomal protein. Ratios of dN/dS were not heterogeneous among lineages in either intravesicular loop, suggesting that the selective pressures favoring such substitutions have been consistent throughout the diversification of the group. Because the role of pantophysin in cell trafficking pathways remains unknown, the functional significance of these mutations and the selective pressures favoring them remain unclear.

Prior knowledge of diversifying selection in the IV1 domain of G. morhua allowed random-sites models of codon substitution to be implemented on four domains of pantophysin separately. This approach was expected to provide an advantage in identifying positively selected sites because it is known that estimating ω ratios averaged over all amino acid positions in a protein can reduce the ability to detect positive selection (Crandall et al. 1999; Anisimova, Bielawski, and Yang 2001). However, the results obtained from the models fit to the four domains tended not to outperform those using all sites. Fitting model M8 to the IV1 domain did result in a dramatic improvement in the posterior probabilities of positively selected sites, but the same model fit to the IV2 domain failed to identify two sites suggested by the analysis on the entire sequence (table 3). Because the number of sequences analyzed was sufficient to detect positive selection with reasonable power, the small number of sites in the IV2 domain and/or low sequence divergence might explain the discrepancies between models fit to separate domains versus the entire sequence.

Implementation of the fixed-sites models that used prior knowledge of differential selection among pantophysin domains also fit the data more poorly than the random-sites models. This is likely caused by the presence of sites experiencing positive and purifying selection in the IV1 and IV2 domains, as several highly conserved features are evident in both intravesicular loops. For example, two cysteine residues are present in both regions (C23 and C52 in IV1 and C150 and C160 in IV2) similar to pantophysin and the related synaptophysin of both human and mouse (see Haass, Kartenbeck, and Leube 1996). Furthermore, in the IV1 domain, a block of five amino acids (YPFRL) is completely conserved among the 18 gadid species examined in the present study and in both human and mouse physins (Haass, Kartenbeck, and Leube 1996). The interspersion of sites experiencing positive and purifying selection in the partitioned data compromises the performance of fixed-sites models and highlights the advantages provided by the random-sites models of Nielsen and Yang (1998) and Yang et al. (2000). Yang and Swanson (2002) reached similar conclusions in their fixed-sites analyses of abalone lysins and human class I MHC alleles.

The detection of positive selection in the two intravesicular loops of pantophysin (and at specific residues within both loops) should direct researchers to explore the functional significance of these domains (and sites). Diversifying selection in the internalized loops of pantophysin could be driven by variable selection pressures related to changes in the physicochemical properties of the vesicle's cargo or to its interactions with other trafficking proteins. However, it is also possible that the positive selection observed is unrelated to pantophysin's normal function. For example, after microvesicle fusion, the intravesicular loops will be externalized on the target membrane, and if this occurs with the plasma membrane, the loops will be exposed on the cell surface. The intravesicular regions of all pantophysins characterized to date contain potential N-glycosylation sites, and Brooks et al. (2000) have recently shown that adipocyte pantophysin is glycosylated in vivo. Because many pathogens are known to target surface glycoproteins as receptors for cell invasion (Karlsson 1995), pathogen evasion might be responsible for the positive selection detected in the IV1 and IV2 domains as suggested for other cell surface proteins by Baum, Ward, and Conway (2002). The pathogen evasion hypothesis fails to explain the high divergence between the two Pan I alleles of G. morhua in a 30-bp region of the second intron in which a stem-loop structure has been disrupted by several deletions in the Pan IB allelic lineage (Pogson 2001). In this species, epistatic selection may be operating to maintain specific associations of intron and amino acid mutations in the IV1 domain that may be unrelated to pathogen evasion per se.

The diversifying selection observed in the present study was not associated with balanced polymorphism in any gadid species other than the Atlantic cod G. morhua. The divergent allelic lineages observed in the Atlantic cod could be an example of what Hughes (1999) has termed "transient polymorphism" that evolves in a single species because of a unique association of selection and opportunity. The absence of polymorphism in other gadids is somewhat surprising in that four of the five polymorphic IV1 sites observed in G. morhua were targets of positive selection in the broader group. However, our ability to detect polymorphism in other taxa was limited by the small numbers of individuals sequenced (usually three or fewer) and the sampling of only a single population. It is noteworthy that the Pan I polymorphism in G. morhua would have gone undetected with a similar sampling strategy in several north Atlantic populations (i.e., Nova Scotia or the Barents Sea) because of highly skewed allele frequencies (see Pogson 2001).

Systematics and Biogeography
Monophyly of the subfamily Gadinae was strongly supported by both nuclear genes analyzed in the present study. The branching order of taxa differs sharply from Svetovidov (1948) and Dunn (1989), notably in the position of Micromesistius, which both authors concluded was the most-derived taxon. The pantophysin and S2 genes clearly place M. poutassou as a sister taxon to Trisopterus as the most primitive genera in the group. Unlike the mtDNA sequences analyzed by Carr et al. (1999), the pantophysin tree provided clear separation of Boreogadus/Arctogadus, Gadus spp. and Theragra, and Shimodaira-Hasegawa (1999) tests provided clear rejections of the assumption of monophyly for two genera (Gadus and Microgadus). The increased resolution provided by pantophysin does not appear to have resulted from a faster rate of evolution driven by positive selection. Removing positively selected sites identified by model M8 from the data was found to have no effect on the phylogeny, apart from collapsing the two G. morhua alleles and T. chalcogramma into an unresolved polytomy (not shown). Furthermore, fixed-sites models fit to the combined pantophysin, S2, cytochrome b, and COI sequences of 12 taxa in common with Carr et al.'s (1999) study showed that the two IV domains were evolving 80% as rapidly as the two mitochondrial genes. The inability of the mtDNA to resolve species relationships thus appears to result primarily from a higher level of homoplasy in the data.

All phylogenetic analyses implemented in the study grouped the Pan IA allele of Atlantic cod G. morhua with the Pacific Alaska pollock T. chalcogramma, rather than with the conspecific Pan IB allele. This suggests that the polymorphism characterized by Pogson (2001) may have evolved before the speciation event separating G. morhua and T. chalcogramma. This interpretation should be viewed with caution because the sequence divergences between Theragra and the two G. morhua alleles are very similar (see table 2), and only a single phylogenetically informative mutation at a positively selected site in the IV1 domain (position 51) clusters the G. morhua Pan IA allele with Theragra. Because parallel evolution is evident at many of the positively selected IV1 sites (see figure 2), it is possible that the N→T mutation shared by G. morhua and T. chalcogramma is homoplasious. An extensive survey of T. chalcogramma populations throughout the Pacific has failed to uncover the pollock equivalent of the G. morhua Pan IB allele (M. Canino, personal communication). However, if a transpolymorphism did exist after the separation of G. morhua and T. chalcogramma, it is also possible that the Pan IB allele has been lost in the latter by selection or even drift (Clark 1997). Irrespective of this uncertainty, the mtDNA divergence between G. morhua and T. chalcogramma of 4.3% observed by Carr et al. (1999) can be used to provide a rough estimate of the age of the Pan I polymorphism in the Atlantic cod: assuming a standard molecular clock of 2% per Myr, the two G. morhua alleles appear to be at least 2 Myr old.
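The clock arithmetic behind this estimate can be written out as a short sketch. It assumes that the 2% per Myr figure is a pairwise sequence divergence rate, so that age is simply divergence divided by rate. The function name is hypothetical, and the calculation ignores rate variation and any correction for multiple substitutions.

```python
def divergence_time_myr(percent_divergence: float, rate_percent_per_myr: float) -> float:
    """Rough divergence time (Myr) from pairwise divergence and a pairwise clock rate."""
    if rate_percent_per_myr <= 0:
        raise ValueError("rate_percent_per_myr must be positive")
    return percent_divergence / rate_percent_per_myr


if __name__ == "__main__":
    # 4.3% mtDNA divergence between G. morhua and T. chalcogramma (Carr et al. 1999),
    # with the standard clock of 2% per Myr assumed in the text.
    age = divergence_time_myr(4.3, 2.0)
    print(f"Estimated minimum age of the Pan I polymorphism: {age:.2f} Myr")
```

Because the alleles are inferred to predate the species split, this figure is best read as a lower bound, consistent with the "at least 2 Myr" wording above.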

The pantophysin gene tree also raises questions about the accepted biogeographic origins of several gadid species in the north temperate oceans. The family Gadidae originated in the Arctic–North Atlantic Basin in the early Tertiary (Svetovidov 1948) and could not have invaded the north Pacific until the opening of the Bering Strait in the mid-Pliocene about 3.0 to 3.5 MYA (Herman and Hopkins 1980). Earlier work based on allozymes (Grant 1987; Grant and Ståhl 1988) and mtDNA (Carr et al. 1999) had concluded that two Pacific species, G. macrocephalus and T. chalcogramma, represented independent invasions of the north Pacific from a presumed G. morhua ancestor. Grant and Ståhl (1988) also observed that G. macrocephalus possessed greatly reduced levels of allozyme polymorphism (and a highly skewed allele frequency spectrum) compared with its presumed Atlantic ancestor and suggested that this was caused by a bottleneck associated with speciation. However, the Pan I phylogeny strongly suggests that G. macrocephalus first colonized the north Pacific from an ancestor related to Boreogadus/Arctogadus and that G. morhua subsequently reinvaded the north Atlantic. The Greenland cod G. ogac represents another extremely recent recolonization of the Atlantic from a G. macrocephalus origin. A second independent colonization of the Pacific apparently occurred by the ancestor of E. gracilis/M. tomcod. These results suggest that major movements between the north temperate oceans can occur more commonly in this group than previously believed, and the low allozyme variation in G. macrocephalus cannot be attributed to a speciation bottleneck.

In summary, strong positive selection was observed at two intravesicular loops of the vesicle trafficking protein pantophysin in marine gadid fishes. The selection pressures favoring substitutions in pantophysin's IV domains appear to have been operating throughout the diversification of the subfamily Gadinae as well as in the polymorphic allelic lineages detected in G. morhua. Similar to the Lyb-2 gene in the mouse studied by Hughes (1993), pantophysin may represent another example of a gene known to experience positive Darwinian selection before its function is fully understood. It is hoped that future work can take advantage of this signal of positive selection to elucidate pantophysin's role in cellular trafficking pathways.


    Acknowledgements
We would like to thank Paul Bentzen, Giacomo Bernardi, Mike Canino, Simon Courtenay, Karen Crow, Anna Kristin Danielsdottir, Jim Estes, Svein-Erik Fevolden, Jens Jacob, Wendy Staub, and Takashi Yanagimoto for their assistance in collecting the gadid species analyzed in this report. We also thank Ziheng Yang for providing assistance in running various models in the PAML package and three anonymous reviewers for providing helpful comments on improving the paper.


    Footnotes
 
Stephen Palumbi, Associate Editor


    Literature Cited

    Akashi, H. 1999. Within- and between-species DNA sequence variation and the "footprint" of natural selection. Gene 238:39-51.

    Anisimova, M., J. P. Bielawski, and Z. Yang. 2001. Accuracy and power of the likelihood ratio tests in detecting adaptive molecular evolution. Mol. Biol. Evol. 18:1585-1592.

    Baum, J., R. H. Ward, and D. J. Conway. 2002. Natural selection on the erythrocyte surface. Mol. Biol. Evol. 19:223-229.

    Bishop, J. G., A. M. Dean, and T. Mitchell-Olds. 2000. Rapid evolution of plant chitinases: molecular targets of selection in plant-pathogen coevolution. Proc. Natl. Acad. Sci. USA 97:5322-5327.

    Brooks, C. C., P. E. Scherer, K. Cleveland, J. L. Whittemore, H. F. Lodish, and B. Cheatham. 2000. Pantophysin is a phosphoprotein component of adipocyte transport vesicles and associates with GLUT4-containing vesicles. J. Biol. Chem. 275:2029-2036.

    Carr, S. M., D. S. Kivlichan, P. Pepin, and D. C. Crutcher. 1999. Molecular systematics of gadid fishes: implications for the biogeographic origins of Pacific species. Can. J. Zool. 77:19-26.

    Clark, A. G. 1997. Neutral behavior of shared polymorphism. Proc. Natl. Acad. Sci. USA 94:7730-7734.

    Cohen, D. M., T. Inada, T. Iwamoto, and N. Scialabba. 1990. Gadiform fishes of the world (Order Gadiformes). Rome, Italy: FAO species catalogue. Vol. 10.

    Crandall, K. A., C. R. Kelsey, H. Imamichi, H. C. Lane, and N. P. Salzman. 1999. Parallel evolution of drug resistance in HIV: failure of nonsynonymous/synonymous substitution rate ratio to detect selection. Mol. Biol. Evol. 16:372-382.

    Dunn, J. R. 1989. A provisional phylogeny of gadid fishes based on adult and early life-history characters. Pp. 209–235 in D. M. Cohen, ed. Papers on the systematics of Gadiform fishes. Natural History Museum of Los Angeles County, Los Angeles, Calif.

    Endo, T., K. Ikeo, and T. Gojobori. 1996. Large-scale search for genes on which positive selection may operate. Mol. Biol. Evol. 13:685-690.

    Fernandez-Chacon, R., and T. C. Südhof. 1999. Genetics of synaptic vesicle function: toward the complete functional anatomy of an organelle. Annu. Rev. Physiol. 61:753-776.

    Ford, M. J. 2001. Molecular evolution of transferrin: evidence for positive selection in salmonids. Mol. Biol. Evol. 18:639-647.

    Ford, M. J. 2002. Applications of selective neutrality tests to molecular ecology. Mol. Ecol. 11:1245-1262.

    Grant, W. S. 1987. Genetic divergence between congeneric Atlantic and Pacific Ocean fishes. Pp. 225–246 in N. Ryman and F. Utter, eds. Population genetics and fisheries management. University of Washington Press, Seattle, Wash.

    Grant, W. S., and G. Ståhl. 1988. Evolution of Atlantic and Pacific cod: loss of genetic variation and gene expression in Pacific cod. Evolution 42:138-146.

    Haass, N. K., J. Kartenbeck, and R. E. Leube. 1996. Pantophysin is a ubiquitously expressed synaptophysin homologue and defines constitutive transport vesicles. J. Cell Biol. 134:731-746.

    Herman, Y., and D. M. Hopkins. 1980. Arctic oceanic climate in late Cenozoic time. Science 209:557-562.

    Hughes, A. L. 1992. Positive selection and interallelic recombination at the merozoite surface antigen-1 (MSA-1) locus of Plasmodium falciparum. Mol. Biol. Evol. 9:381-393.

    Hughes, A. L. 1993. Evidence of positive selection at the Lyb-2 locus of the mouse. Immunogenetics 38:54-56.

    Hughes, A. L. 1999. Adaptive evolution of genes and genomes. Oxford University Press, New York.

    Hughes, A. L., and M. Nei. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167-170.

    Johnston, P. A., R. Jahn, and T. C. Südhof. 1989. Transmembrane topography and evolutionary conservation of synaptophysin. J. Biol. Chem. 264:1268-1273.

    Karlsson, K. A. 1995. Microbial recognition of target-cell glycoconjugates. Curr. Opin. Struct. Biol. 5:622-635.

    Kimura, M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.

    Kreitman, M., and H. Akashi. 1995. Molecular evidence for natural selection. Annu. Rev. Ecol. Syst. 26:403-422.

    Lee, Y.-H., T. Ota, and V. D. Vacquier. 1995. Positive selection is a general phenomenon in the evolution of abalone sperm lysin. Mol. Biol. Evol. 12:231-238.

    Messier, W., and C.-B. Stewart. 1997. Episodic adaptive evolution of primate lysozymes. Nature 385:151-154.

    Metz, E. C., and S. R. Palumbi. 1996. Positive selection and sequence rearrangements generate extensive polymorphism in the gamete recognition protein bindin. Mol. Biol. Evol. 13:397-406.

    Nakashima, K., I. Nobuhisa, and M. Deshimaru, et al. (11 co-authors). 1995. Accelerated evolution in the protein-coding regions is universal in crotalinae snake venom gland phospholipase A2 isozyme genes. Proc. Natl. Acad. Sci. USA 92:5605-5609.

    Nielsen, R. 2001. Statistical tests of neutrality in the age of genomics. Heredity 86:641-647.

    Nielsen, R., and Z. Yang. 1998. Likelihood methods for detecting positively selected sites and applications to the HIV-1 envelope gene. Genetics 148:929-936.

    Pogson, G. H. 2001. Nucleotide polymorphism and natural selection at the pantophysin (Pan I) locus in the Atlantic cod, Gadus morhua (L.). Genetics 157:317-330.

    Pogson, G. H., K. A. Mesa, and R. G. Boutilier. 1995. Genetic population structure and gene flow in the Atlantic cod, Gadus morhua: a comparison of allozyme and nuclear RFLP loci. Genetics 139:375-385.

    Pogson, G. H., C. T. Taggart, K. A. Mesa, and R. G. Boutilier. 2001. Isolation by distance in the Atlantic cod, Gadus morhua, at large and small geographic scales. Evolution 55:131-146.

    Posada, D., and K. A. Crandall. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817-818.

    Rodriguez, F., J. F. Oliver, A. Marin, and J. R. Medina. 1990. The general stochastic model of nucleotide substitution. J. Theor. Biol. 142:485-501.

    Shields, D. C., D. L. Harmon, and A. S. Whitehead. 1996. Evolution of hemopoietic ligands and their receptors: Influence of positive selection on correlated replacements throughout ligand and receptor proteins. J. Immunol. 156:1062-1070.

    Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114-1116.

    Smith, N. H., J. Maynard Smith, and B. G. Spratt. 1995. Sequence evolution of the porB gene of Neisseria gonorrhoeae and Neisseria meningitidis: evidence of positive Darwinian selection. Mol. Biol. Evol. 12:363-370.

    Suzuki, K., J. Olvera, and I. G. Wool. 1991. Primary structure of rat ribosomal protein S2. A ribosomal protein with arginine-glycine tandem repeats and RGGF motifs that are associated with nucleolar localization and binding to ribonucleic acids. J. Biol. Chem. 266:20007-20010.

    Suzuki, Y., and M. Nei. 2001. Reliabilities of parsimony-based and likelihood-based methods for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 18:2179-2185.

    Svetovidov, A. N. 1948. Gadiformes. Fauna of the U.S.S.R. 9:1-222.

    Swanson, W. J., and C. F. Aquadro. 2002. Positive Darwinian selection promotes heterogeneity among members of the antifreeze protein multigene family. J. Mol. Evol. 54:403-410.

    Swanson, W. J., R. Nielsen, and Q. Yang. 2003. Pervasive adaptive evolution in mammalian fertilization proteins. Mol. Biol. Evol. 20:18-20.

    Swanson, W. J., Z. Yang, M. F. Wolfner, and C. F. Aquadro. 2001. Positive Darwinian selection drives the evolution of several female reproductive proteins in mammals. Proc. Natl. Acad. Sci. USA 98:2509-2514.

    Swofford, D. L. 1998. PAUP*: phylogenetic analysis using parsimony (*and other methods). Sinauer Associates, Sunderland, Mass.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

    Tsaur, S. C., and C.-I. Wu. 1997. Positive selection and the molecular evolution of a gene of male reproduction, Acp26Aa of Drosophila. Mol. Biol. Evol. 14:544-549.

    Wallis, M. 1996. The molecular evolution of vertebrate growth hormones: a pattern of near-stasis interrupted by sustained bursts of rapid changes. J. Mol. Evol. 43:93-100.

    Windoffer, R., M. Borchert-Stuhltrager, N. K. Haass, S. Thomas, M. Hergt, C. J. Bulitta, and R. E. Leube. 1999. Tissue expression of the vesicle protein pantophysin. Cell Tissue Res. 296:499-510.

    Wright, F. 1990. The "effective number of codons" used in a gene. Gene 87:23-29.

    Wu, W., M. Goodman, M. I. Lomax, and L. I. Grossman. 1997. Molecular evolution of cytochrome c oxidase subunit IV: evidence for positive selection in simian primates. J. Mol. Evol. 44:477-491.

    Wyckoff, G. J., W. Wang, and C.-I. Wu. 2000. Rapid evolution of male reproductive proteins in the descent of man. Nature 403:304-309.

    Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.

    Yang, Z., and J. P. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15:496-503.

    Yang, Z., and R. Nielsen. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17:32-43.

    Yang, Z., R. Nielsen, N. Goldman, and A.-M. K. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressures at amino acid sites. Genetics 155:431-449.

    Yang, Z., and W. J. Swanson. 2002. Codon-substitution models to detect adaptive evolution that account for heterogeneous selection pressures among site classes. Mol. Biol. Evol. 19:49-57.

    Yang, Z., W. J. Swanson, and V. D. Vacquier. 2000. Maximum-likelihood analysis of molecular adaptation in abalone sperm lysin reveals variable selective pressures among lineages and sites. Mol. Biol. Evol. 17:1446-1455.

Accepted for publication August 5, 2003.




