Power Analysis of Tests for Loss of Selective Constraint in Cave Crayfish and Nonphotosynthetic Plant Lineages

Jim Leebens-Mack*{dagger} and Claude dePamphilis{dagger}

*Department of Biology, Colgate University, New York;
{dagger}Department of Biology and Institute of Molecular Evolutionary Genetics, Penn State University


    Abstract
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Conclusions
 Acknowledgements
 References
 
Loss of selective constraint on a gene may be expected following changes in the environment or life history that render its function unnecessary. The long-term persistence of protein-coding genes after the loss of known functional necessity can occur by chance or because of selective maintenance of an unknown gene function. The selective maintenance of an alternative gene function is not demonstrated by the failure of statistical tests to reject the hypothesis that there has been no change in the degree of constraint on the evolution of coding genes. Maintenance may be inferred, however, when power analyses of such tests demonstrate that there has been a sufficient number of nucleotide substitutions to detect the loss of selective constraint. Here, we describe a power analysis for tests of loss of constraint on protein-coding genes. The power analysis was applied to loss-of-constraint tests for opsin gene evolution in cave-dwelling crayfish and rbcL evolution in nonphotosynthetic parasitic plants. The power of previously applied tests for loss of constraint on cave crayfish opsin genes was insufficient to distinguish between chance retention and selective maintenance of opsin genes. However, the power of codon-based likelihood ratio tests for change in dN/dS (={omega}) (nonsynonymous to synonymous change) did have sufficient power to detect a loss of constraint on rbcL associated with a loss of photosynthesis in most examples but failed to detect such a change in three independent lineages. We conclude that rbcL has been selectively maintained in these holoparasitic plant lineages. This conclusion suggests that either these taxa are photosynthetic for at least a part of their life or rbcL may have an unknown function in these plants unrelated to photosynthesis.


    Introduction
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Conclusions
 Acknowledgements
 References
 
The long-term impact of natural selection acting over many generations is evident in the nucleotide substitution patterns of protein-coding genes. Purifying selection acts to reduce the frequency of amino acid replacements caused by nonsynonymous nucleotide substitutions. As a result, the observed rate of nonsynonymous substitutions per nonsynonymous site (dN) is typically much lower than the per site rate of synonymous nucleotide substitution (dS) (e.g., Li, Wu, and Luo 1985Citation ; Yang and Nielsen 1998Citation ). The long-term strength of purifying selection is measured as the ratio of nonsynonymous to synonymous change (dN/dS = {omega}). For example, in an analysis of 48 mammalian nuclear genes, Yang and Nielsen (1998)Citation estimated rate ratios ranging from 0.017, for a highly conserved ATP synthase gene, to 0.838, for a weakly constrained interleukin gene. In contrast, genes exposed to diversifying selection are expected to have dN/dS > 1.0 (e.g., Swanson and Vacquier 1995Citation ; Messier and Stewart 1997Citation ; Yang et al. 2000Citation ; Barrier, Robichaux, and Purugganan 2001Citation ). These expectations allow one to predict how changes in the selective regime will be reflected in patterns of nucleotide substitutions in phylogenetic analyses of protein-coding genes. Such predictions form the basis of statistical tests for hypothesized variation in selection among lineages in a phylogeny or among coding domains in a gene (reviewed in Yang and Bielawski 2000Citation ).

For protein-coding genes that have evolved under purifying selection, the rate of nonsynonymous substitutions and dN/dS are expected to increase when a gene is converted to a pseudogene or when changes in the environment, life history, or developmental program render the gene's function obsolete. Obsolete genes may persist as open reading frames for long periods of time because of chance alone (Marshall, Raff EC, and Raff RA 1994Citation ). However, before the full-length open reading frame is interrupted by a stop codon, the cessation of purifying selection is expected to become evident in the pattern of nucleotide substitutions. Genes or pseudogenes that are not under the influence of purifying selection should on an average have dN/dS rate ratios (={omega}) equal to unity. This a priori expectation provides a basis to test for the loss of selectively maintained gene function (Crandall and Hillis 1997Citation ; Nei, Zhang, and Yokoyama 1997Citation ).

The null hypothesis for such comparative tests is that {omega} is the same in hypothetically constrained and unconstrained lineages. As in any formal hypothesis test, simply failing to reject the null hypothesis does not prove its veracity. The test may not have sufficient power to detect a change in {omega}. Given the narrowly defined alternative hypothesis in tests for the loss of selective constraint ({omega} = dN/dS = 1), it is relatively easy to simulate sequence evolution according to the alternative hypothesis and estimate the type-II error rate, ß. The type-II error rate is the probability of failing to reject the null hypothesis when the alternative is true. In cases where ß is very small, and yet the null hypothesis is not rejected, one must seriously consider the possibility that the null hypothesis is correct. In the context presented here, when a change in the environment, life history, or developmental program that has rendered known gene function obsolete is not associated with a change in {omega}, selective maintenance of some unknown gene function may be implicated.

Here, we provide a strategy for performing such a power analysis and apply it to the analyses of a photosynthetic gene, rbcL, in nonphotosynthetic parasitic plants (Wolfe and dePamphilis 1997Citation , 1998Citation ) and a photoreceptor gene coding for an opsin in cave-dwelling crayfish (Crandall and Hillis 1997Citation ). In both examples the failure to detect a change in the selective constraint has raised the question of an alternative function for these genes.


    Materials and Methods
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Conclusions
 Acknowledgements
 References
 
Phylogenetically based tests for change in tempo or mode of molecular evolution most typically involve comparisons between related lineages. At a minimum, three taxa are required in order to estimate branch lengths. In cases where more than three taxa are included in the analysis, it is best to infer topological relationships using data other than the gene sequences under consideration. In the analyses described here, we chose this approach to circumscribe subsets of taxa that include (1) a clade or lineage where the focal gene is hypothetically obsolete, (2) a sister lineage, and (3) an out-group. For each subset, we obtained maximum likelihood (ML) estimates of the parameters of Goldman and Yang's codon-based model of gene sequence evolution (Goldman and Yang 1994Citation ; Yang and Bielawski 2000Citation ) and ancestral nucleotide sequences (Yang, Kumar, and Nei 1995Citation ). We then tested the null hypothesis that there was no variation in {omega} among lineages. In cases where we were unable to reject the null hypothesis, we simulated sequence evolution under the loss of functional constraint hypothesis using estimated ancestral sequences, branch lengths, and rate parameters (fig. 1 ). The simulated data sets were used to perform formal power tests for each analysis.



View larger version (27K):
[in this window]
[in a new window]
 
Fig. 1.—Flow diagram for generating test statistic distributions according to the null model for hypothesis testing and according to the alternative hypothesis for power analysis (after Huelsenbeck and Ranala 1997Citation )

 
rbcL Analysis
Parasitic plants have served as model systems for understanding how genes and genomes evolve in new adaptive zones (dePamphilis and Palmer 1990Citation ; Wolfe, Morden, and Palmer 1992Citation ; dePamphilis 1995Citation ). Recent phylogenetic investigations have shown a single origin of parasitism among the parasitic Orobanchaceae-Scrophulariaceae, followed by multiple independent losses of photosynthesis associated with holoparasitism within this group (dePamphilis, Young, and Wolfe 1997Citation ; Wolfe and dePamphilis 1998Citation ; Young, Steiner, and dePamphilis 1999Citation ; Olmstead et al. 2001Citation ). This situation offers an ideal opportunity to compare the molecular evolution of plastid genes in distinct nonphotosynthetic lineages (dePamphilis 1995Citation ; dePamphilis, Young, and Wolfe 1997Citation ; Young and dePamphilis 2000Citation ). We performed the following analysis in order to interpret the interesting observation that rbcL, the plastid gene coding for the large subunit of RUBISCO, has been retained in many of these distinct nonphotosynthetic lineages.

Sequence Data
Three plastid genes from eight photosynthetic and seven nonphotosynthetic plant species were included in the rbcL analysis. The rbcL genes sampled from the nonphotosynthetic species were intact, with open reading frames and no apparent amino acid substitutions, relative to photosynthetic seed plants in the region coding for the active site (Kellogg and Juliano 1997Citation ; Wolfe and dePamphilis 1998Citation ). A combined phylogenetic analysis of plastid-encoded intron maturase (matK) and ribosomal protein (rps2) was performed to establish the topological relationships of the species included in the rbcL analysis. Neither of these genes is involved in photosynthesis. Most of the matK (Young, Steiner, and dePamphilis 1999Citation ), rps2 (dePamphilis, Young, and Wolfe 1997Citation ), and rbcL (Wolfe and dePamphilis 1997Citation , 1998Citation ) sequences were collected for previous studies (table 1 ). Additional matK sequences were collected using a Beckman CEQ2000 automated sequencer following the manufacture's protocols as modified by Barkman et al. (2000)Citation . One-quarter volume sequencing reactions and postsequencing cleanup were performed in 96-well microtiter plates. The methodological details for PCR amplification were as reported by Young, Steiner, and dePamphilis (1999)Citation . PCR products were sequenced directly following Qiagen PCR column cleanup. All sequences are available on GenBank (table 1 ), and the sequence alignments for all three genes have been placed in TreeBASE (http://www.treebase.org/treebase/). The lengths of aligned sequences were 1,320 bp (440 codons) for rbcL, 1,632 bp for matK, and 616 bp for rps2.


View this table:
[in this window]
[in a new window]
 
Table 1 Source of Sequences for Photosynthetic and Nonphotosynthetic Plant Species

 
Phylogenetic Analysis
An ML analysis of the combined rps2 and matK data set was performed in PAUP* (Swofford 2000Citation ) using the Hasegawa-Kashino-Yano model (HKY) of nucleotide evolution (Hasegawa, Kashino, and Yano 1985Citation ) with variation in the rates of change across sites modeled as an approximated gamma distribution (Yang 1994Citation ). Bootstrap probabilities for each node on the ML tree were estimated after resampling the combined data matrix 100 times (Felsenstein 1985Citation ). Bootstrap support values for each clade were also estimated in Neighbor-Joining and maximum parsimony analyses using 500 bootstrap replicates. The HKY plus gamma model described for the ML analysis was also used to estimate the Neighbor-Joining bootstrap support values.

The resulting ML phylogeny was used to identify five pairs of photosynthetic and nonphotosynthetic lineages and an out-group for each pair (figs. 2 and 3 ). Separate tests for change in constraint on rbcL genes in the nonphotosynthetic lineage were performed for each of the five subsets.



View larger version (34K):
[in this window]
[in a new window]
 
Fig. 2.—The ML phylogram estimated from a combined data set of matK and rps2 sequences shows five independent losses of photosynthesis. Bootstrap values (ML-NJ-MP) are shown above each node. One value is shown for the cases where all analyses gave the same result

 


View larger version (22K):
[in this window]
[in a new window]
 
Fig. 3.—The ML rbcL-based phylograms for the five separate tests for loss of constraint with loss of photosynthesis. The inferred number of substitutions (nucleotide–amino acid) assuming a single-rate ratio, {omega}, is shown above each branch

 
Estimation of rbcL Evolution
The evolution of rbcL was modeled using using the codon-based model of sequence evolution developed by Goldman and Yang (1994)Citation and implemented in Yang's CODEML program (Yang 1999Citation ). The probability of nucleotide substitution along a branch of length t is estimated as P(t) = eQt, where the elements of the matrix Q are qij, the instantaneous substitution rate from codon i to codon j (i != j):

0 if j is a stop codon

0 if i and j differ at more than a single position

{pi}j for synonymous transversions

{kappa}{pi}j for synonymous transitions

{omega}{pi}j for nonsynonymous transversions

{omega}{kappa}{pi}j for nonsynonymous transitions.

For each of the five comparisons, the estimated parameters included the transition-transversion rate ratio ({kappa}), the per site nonsynonymous-synonymous rate ratio, {omega}, and the expected equilibrium frequencies for all 61 amino acid–coding codons, {pi}j. Codon frequencies were estimated from the observed nucleotide frequencies at each of the three codon positions. Ancestral sequences were also estimated in CODEML following the procedure described by Yang, Kumar, and Nei (1995)Citation . The probability of stop codons arising, pstop, in the absence of selective maintenance was approximated by allowing qij != 0 when j is a stop codon and then summing pij(t) across the inferred ancestral sequence for all three stop codons. Note that this approximation assumes that ancestral sequences were estimated without error. The probability of stop codons arising in nonphotosynthetic clades, pstop(clade), for Harveya and Orobanche was approximated as:


A likelihood ratio test was used to assess change in {omega} in nonphotosynthetic plant lineages (Yang 1998Citation ; Yang and Nielsen 1998Citation ). A simple model, where the rate ratio {omega} did not vary among branches, was compared with a slightly more complicated model that estimated one-rate ratio for branches leading to photosynthetic taxa ({omega}p) and another rate ratio for branches leading to nonphotosynthetic taxa ({omega}n). The likelihood ratio was calculated as the difference in log likelihoods: {delta} = lnL(two-rate ratios) - lnL(one-rate ratio).

For each of the five analyses, a null distribution for the likelihood ratio was generated using separate Monte-Carlo simulations. The sequence-generating simulation program is available upon request. For each analysis, 100 data sets were simulated using the basal ancestral sequence, model parameters, and branch lengths estimated from the original data sets under the single-rate ratio model (fig. 1 ). Following the procedures of Goldman (1993Citation , Huelsenbeck and Ranala 1997)Citation , the likelihood ratio was calculated for each simulated data set. P-values were estimated from the simulated null distributions and compared with P-values obtained by assuming that two times the nested likelihood ratio approximates a chi-square distribution for long sequences evolving under the null model (Yang and Bielawski 2000Citation ).

In order to assess the power of each test, 100 additional data sets were simulated according to the two-rate ratio model (fig. 1 ). The likelihood ratio statistic, {delta}, was calculated for each simulated data set, and the distribution of the resulting {delta}'s was compared with the null distribution. The probability of failing to reject a false null hypothesis, ß, was estimated as the proportion of the power distribution with {delta} less than the upper 5% of the null distribution.

Opsin Analysis
Others have taken alternative approaches for assessing variation among evolutionary lineages in the strength of purifying selection acting on protein-coding genes (Crandall and Hillis 1997Citation ; Nei, Zhang, and Yokoyama 1997Citation ). Here, we use one example to demonstrate how the Monte-Carlo simulation approach described for the likelihood ratio test can be used for any loss-of-constraint test performed on protein-coding genes.

Opsins are photosensitive pigments that play a central role in color vision for vertebrates (Yokoyama S and Yokoyama R 1996Citation ) and invertebrates (Briscoe and Chittka 2001Citation ). Opsins have also been implicated as playing a role in the entrainment of circadian rhythms in vertebrates, including subterranean blind mole rats (Spalax ehrenbergi) (Argamaso et al. 1995Citation ; David-Gray et al. 1999Citation ; Janssen et al. 2000Citation ). Crandall and Hillis (1997)Citation tested for loss of selective constraint on opsin genes sampled from three independent cave-dwelling crayfish species: Cambarus hubrichti, Orconectes australis, and Procambarus orcinus. Separate comparative analyses were performed to assess rhodopsin gene evolution in each cave-dwelling lineage relative to a surface-dwelling sister species. Analogous to our rbcL investigation, three-taxon data sets included a cave-dwelling species, a surface-dwelling congener, and an out-group taxon (Crandall and Hillis 1997Citation ). Ancestral nucleotide sequences were estimated using CODEML (Yang 1999Citation ), and nonsynonymous and synonymous substitutions were inferred for each pair of related cave- and surface-dwelling crayfish lineages. A 2 x 2 contingency table analysis was then used to test for an association between habitat and the type of inferred nucleotide substitution (nonsynonymous vs. synonymous). When the contingency table analyses and Kolmogorov-Smirnov tests for differences in nucleotide and amino acid substitution matrices all failed to detect a change in the cave-dwelling lineages, Crandall and Hillis (1997)Citation hypothesized that rhodopsin may have a function other than photoreception in lightless cave environments.

In order to assess the power of the contingency table analysis performed by Crandall and Hillis, we simulated opsin gene evolution taking the same approach described for the rbcL power analysis. For each three-taxon analysis specified in the original study, sequences were simulated with the per site nonsynonymous-synonymous rate ratio set at {omega}cave = 1.0 for the cave-dwelling species. Branch lengths, rate ratios for branches leading to the surface-dwelling crayfish, {omega}surface, and ancestral sequence were estimated in CODEML (Yang 1999Citation ).

Following the procedure of Crandall and Hillis, the number of nonsynonymous and synonymous substitutions along branches leading to the cave- and surface-dwelling species were inferred and used to test for independence between habitat and substitution type for all the 100 simulated data sets. The distribution of P-values obtained for Fisher's exact test was calculated for each analysis. Whereas the original paper reported the probability of the inferred contingency table values for nonsynonymouns and synonymous substitutions in cave- and surface-dwelling crayfish under the null hypothesis, we calculated P-values for the more appropriate one-tailed test. The P-values for the one-tailed test are influenced not only by the probability of the observed contingency table values but also by the probabilities of all other contingency tables that would suggest a stronger association while maintaining equal row and column totals. The power of each analysis, ß, was estimated in two ways. First, ß was estimated as the frequency of simulated data sets, where Pone-tailed > 0.05 (fig. 1 ). As a comparison, ß was also estimated using a more standard power analysis for Fisher's exact test (Cassagrande, Pike, and Smith 1978Citation ) implemented in DSTPLAN (Brown et al. 2000Citation ).


    Results and Discussion
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Conclusions
 Acknowledgements
 References
 
Holoparasite rbcL Analysis
The history of plastid diversification among the sampled taxa was well resolved in the rps2-matK phylogeny (fig. 2 ). The generally high bootstrap values allowed for unambiguous circumscription of five subsets that included a nonphotosynthetic lineage, closely related photosynthetic lineage, and an out-group lineage (fig. 3 ). The nonphotosynthetic lineage in each subset represents an independent origin of holoparasitism (fig. 2 , dePamphilis, Young, and Wolfe 1997Citation ; Wolfe and dePamphilis 1998Citation ; Young, Steiner, and dePamphilis 1999Citation ). Independent tests for loss of selective constraint on rbcL were performed on each subset as described previously.

For two of the five subsets, the likelihood ratio test resulted in the rejection of the null hypothesis that {omega} was equal in the nonphotosynthetic and photosynthetic lineages (table 2 ). Note that for all the likelihood ratio tests, the P-values derived from our simulation–based null distribution were very similar to those based on the chi-square null distribution (table 2 ). The rate ratios differed significantly between Alectra orobanchoides, a holoparasite, and A. sessiliflora, a photosynthetic hemiparasite (table 2 ). In this example, the power of the likelihood ratio test was suggested by the significant difference detected in the dN/dS rate ratio along a branch with only five inferred nucleotide substitutions (fig. 3 ). However, the probability of failing to reject the null hypothesis when {omega}n = 1.0 on a phylogeny with similar ancestral sequence, branch lengths, and transition bias was quite high (ß = 0.77, fig. 4A ).


View this table:
[in this window]
[in a new window]
 
Table 2 Likelihoods, Nonsynonymous-Synonymous Rate Ratios ({omega}), and Test Statistics for Assessing Loss of Constraint on rbcL Genes in Nonphotosynthetic Holoparasites

 


View larger version (19K):
[in this window]
[in a new window]
 
Fig. 4.—The cumulative probability distributions of the test statistic {delta} estimated from data sets {diamond} simulated under the null model (one {omega}) and {circ} simulated under the alternative model ({omega}p != {omega}n = 1.0). The dashed line represents the upper 5% of the null distribution

 
The test for loss of constraint was substantially more powerful in the other four analyses (table 2 , figs. 4 and 5 ), and a second significant change in {omega} was found on the branch leading to Lathraea clandestine (table 2 ). In contrast, no change in constraint on rbcL sequence evolution was detected in the lineages leading to Striga gesnerioides, Harveya or Orobanche (table 2 ). The results of power analyses suggested that we should have detected a change in {omega}, assuming that these lineages lost photosynthetic function immediately after each had diverged from their respective photosynthetic sister clades (fig. 3 ).



View larger version (26K):
[in this window]
[in a new window]
 
Fig. 5.—Analyses of loss constraint on rbcL in Harveya and Orobanche were performed assuming (A and B) early and (C and D) late loss of photosynthesis in either clade. The cumulative probability distributions of the test statistic {delta} estimated from data sets {diamond} simulated under the null model (one {omega}) and {circ} simulated under the alternative model ({omega}P != {omega}n = 1.0). The dashed line represents the upper 5% of the null distribution

 
Given these results, we might hypothesize that RUBISCO plays an important role in a function other than photosynthesis in some but not all nonphotosynthetic plants. An alternative explanation, however, is that photosynthesis has been lost only recently in these lineages, and our power estimates are inflated by the assumption that all of the observed substitutions occurred after the loss of photosynthesis. Although we have no way to infer the time at which photosynthesis was lost along the branches ending in single holoparasitic species (A. orobanchoides, S. gesnerioides, and L. clandestina), in the Harveya and Orobanche analyses we can invoke a parsimony argument to assert that photosynthesis had been lost before the holoparasitic sister species diverged from each other (fig. 3D and E ). In both these cases, two power analyses were performed: (1) assuming loss of photosynthesis occurred with the origin of the holoparasitic clade (early), and (2) assuming loss of photosynthesis occurred at the time the sister species diverged (late).

In the Harveya analysis, the tests for change in {omega} with loss of photosynthesis was more powerful when photosynthesis was assumed to be lost early in the history of the holoparasitic clades (table 2 , fig. 5 ). This makes sense because there is more time for change in constraint to affect the number of observed synonymous and nonsynonymous substitutions. The branches leading to both Orobanche species were quite long (fig. 3E ), and there was no overlap observed between {delta} distributions estimated from data simulated under the null hypothesis ({omega}p = {omega}n) and the alternate hypothesis ({omega}n = 1) (fig. 5 ).

Despite the power of the likelihood ratio tests in the Harveya and Orobanche analyses, no change in {omega} was detected when the rbcL genes from holoparasitic sister species were assumed to be evolving under the same degree of constraint. Even under the null model, however, the inferred ratio of amino acid to nucleotide substitutions was much higher for H. capensis relative to H. purpurea, and for O. corymbosa relative to O. fasciculata (fig. 3 ). On the basis of this observation, we estimated the likelihood ratios for change in {omega} along the branches leading to H. capensis and O. corymbosa (table 2 ). For each of these analyses, the P-values reported in table 2 would lead us to reject the null hypothesis if we had proposed a priori that the rbcL genes in H. capensis and O. corymbosa were evolving differently from the rbcL genes in their respective sister species. However, because we posed this hypothesis only after inspecting the inferred substitution patterns, the reported P-values are inflated, and should be interpreted with caution.

With this caveat in mind, the combined results of the likelihood ratio tests and power analyses clearly demonstrate that rbcL evolution has been constrained in the lineages leading to the holoparasitic species S. gesnerioides, H. purpurea, and O. fasciculata. These species are achlorophyllous and thought to be totally dependent on their hosts for carbohydrates. Although weak rbcL gene expression has been detected in L. clandestina (Bricaud, Thalouarn, and Renaudin 1986Citation ; Delavault, Sakanyan, and Thalouarn 1995Citation ; Lusson, Delavault, and Thalouarn 1998Citation ), there are no rbcL gene expression data for S. gesnerioides, H. purpurea, and O. fasciculata. Intact rbcL transcription promoters have been sequenced from both O. fasciculata and O. corymbosa (Wolfe and dePamphilis 1997Citation ). Interestingly, the stem-loop structure deduced for the 3' untranscribed portion of the rbcL gene in photosynthetic plants, thought to be important in termination, is conserved in O. corymbosa but degenerate in O. fasciculate (Wolfe and dePamphilis 1997Citation ).

Although the rbcL genes in some of the holoparasitic lineages included in this investigation are apparently evolving under relaxed selective constraint, it is not surprising that the rbcL reading frames are still intact. Given the inferred ancestral sequences and the relatively short branches along which we are assuming loss of gene function, the probability of stop codons arising in the holoparasitic lineages was inferred to be quite low in most cases (table 2 ). As others have pointed out, unexpressed or obsolete genes may persist as open reading frames for some time even in the absence of selective maintenance (Marshall, Raff EC, and Raff RA 1994Citation ). It is also worth noting that the probability of the origin of a stop codon is not perfectly proportional to branch length. The probability is strongly influenced by the frequency of codons in the ancestral gene that are within one nucleotide substitution of a stop codon. As a consequence, the stop codon probabilities reported in table 2 may be inaccurate if the inferred ancestral sequences differ from the true ancestral sequences.

Because length mutations were not considered, the probability that an open reading frame would be truncated in the absence of selective constraint is even greater than the reported stop codon probabilities (table 2 ). Analyses of the noncoding regions of the plastid genome suggest that length mutations, especially those involving nucleotide repeats, may be as common as base changes (e.g., Golenberg et al. 1993Citation ). With this in mind, intact functional rbcL sequences can be taken as additional circumstantial evidence that the large subunit of RUBISCO has been selectively maintained in some holoparasitic lineages.

Crayfish Opsin Analysis
Although the contingency test for change in dN/dS used by Crandall and Hillis in their analysis of opsin genes in cave-dwelling crayfish (Crandall and Hillis 1997Citation ) differed from the likelihood ratio test described previously, the power of the test was also assessed with data sets simulated under the alternative hypothesis that hypothetically unconstrained sequences were evolving with dN equal to dS. For all the three contingency tests performed in the original article, ß was found to be greater than 65% (fig. 6 ). The more standard power analysis of Cassagrande, Pike, and Smith gave very similar estimates of ß (fig. 6 ). The results of the contingency analyses, therefore, are not suggestive of selectively maintained function for opsin in cave-dwelling crayfish.



View larger version (29K):
[in this window]
[in a new window]
 
Fig. 6.—The distribution of P-values for Fisher's Exact Test performed on 100 data sets simulated according to the alternative hypothesis ({omega}surface != {omega}cave = 1.0). The dark bars represent replicates where the null hypothesis was rejected. Type-II error rates were estimated as proportion of simulated data sets with P > 0.05 (ßs) and using the method of Cassagrande et al. (ßc). Arrows represent P-values obtained for one-tailed Fisher's Exact Test applied to the original data (Crandall and Hillis 1997Citation )

 
Detecting Change in Function Versus Loss of Constraint
An adaptive change in gene function has been inferred as positive selection for increase in the ratio of nonsynonymous to synonymous nucleotide substitutions (e.g., Messier and Stewart 1997Citation ; Barrier, Robichaux, and Purugganan 2001Citation ). Although the null hypothesis in tests for positive selection is identical to the null hypothesis for the loss of function tests described previously, the alternative hypotheses are different. The alternative hypothesis for change of function test is {omega} > 1.0. However, the inferred rate ratios may be much less than 1.0 when adaptive change has occurred over a short period of time or adaptations have involved just a few key amino acid substitutions. Depending on the context of a particular test, an estimated rate ratio less than unity may be interpreted as adaptive evolution (Yang 1998Citation ) or loss of function (table 2 ). The context is set by the ecological or biochemical information that motivated the a priori test for change in the ratio of nonsynonymous to synonymous change. When adaptive evolution is suspected, the power analyses described here to test for the loss of function could easily be modified for tests for positive selection on entire genes or specific domains.

The analyses of protein-coding genes described here are obviously simplistic. A single-rate ratio parameter, {omega}, is estimated and modeled for the entire gene. Studies of rbcL evolution have clearly demonstrated variation in amino acid substitution rates among structural domains (Kellogg and Juliano 1997Citation ; Wolfe and dePamphilis 1998Citation ). Recent advances in the modeling of amino acid substitution patterns among the structural domains exhibited in globular proteins (Goldman, Thorne, and Jones 1998Citation ) and transmembrane proteins (Liò and Goldman 1999Citation ) suggest that the estimation of variation in evolutionary rates among gene sequences and phylogenies would be enhanced by including information on the secondary structure. Even in instances when protein structure is not known, analyses allowing for among site variation in {omega} could help elucidate the function of particular sequence domains (Yang et al. 2000Citation ; Yang and Bielawski 2000). The likelihood ratio tests available in PAML for assessing variation in {omega} across a protein-coding gene (Yang et al. 2000Citation ) have been shown to be quite powerful (Anisimova, Bielawski, and Yang 2001Citation ). The number of parameters estimated in such an analysis, however, could be quite large in tests of variation in {omega} across a coding gene and among branches on a phylogeny. Failure to detect lineage-specific adaptive or diversifying selection in such analyses would be difficult to interpret without an accompanying power analysis.

Tests for variation in {omega} among domains and branches may be especially weak when branch lengths are short, the variation in selection among domains is subtle, or domain size is limited to a few codons. For example, most of the variation in the spectral sensitivity of vertebrate opsins can be explained by the amino acid composition at just five sites (Yokoyama and Radlwimmer 2001Citation ). Moreover, adaptive substitutions at one or two of these sites have resulted in spectral tuning that has maximized light sensitivity in extreme environments (Yokoyama 1997Citation ). The two functional opsin genes known from the Comoran coelacanth (Latimeria chalumnae) have evolved to produce pigments with maximal sensitivity ({lambda}max) in the very narrow range of the light spectrum that reaches its deep-sea habitat (Yokoyama et al. 1999Citation ). Functional cone (David-Gray et al. 1999Citation ) and rod (Janssen et al. 2000Citation ) opsin genes have been identified for the subterranean blind mole rat S. ehrenbergi. As with the coelacanth opsins, {lambda}max for the cone opsin shows an adaptive shift to the red portion of the spectrum relative to {lambda}max for homologous opsins in other mammals. Although experiments and anatomical studies have shown that S. ehrenbergi is indeed blind, multiple lines of evidence suggest that the reduced, subcutaneous eye and its photosensitive pigments have been coopted to aid in light-mediated entrainment of circadian rhythms (Argamaso et al. 1995Citation ; David-Gray et al. 1999Citation ; Janssen et al. 2000Citation ). In these examples, adaptive evolution has been demonstrated by linking change in amino acid sequence with change in protein function (Yokoyama S and Yokoyama R 1996Citation ; David-Gray et al. 1999Citation ; Yokoyama et al. 1999Citation ). A change in the ratio of nonsynonymous to synonymous nucleotide substitutions has not been demonstrated in these cases.


    Conclusions
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Conclusions
 Acknowledgements
 References
 
Comparative approaches in phylogenetics offer a solid framework for testing hypotheses concerning the evolutionary processes responsible for the observed patterns of diversity (e.g., dePamphilis 1995Citation ). However, a recent change in the evolutionary process may not have detectable influence on patterns of diversity for some time. Power analysis provides a means of assessing whether the failure to detect a change in the evolutionary process is caused by the absence of change or insufficient data. Given the complexity of the processes being inferred in studies of molecular evolution, the power of tests for change in constraint or rate are likely to depend not only on the degree of variation among sequences but also on the phylogenetic relationships among the genes being analyzed (Bromham et al. 2000Citation ). The Monte-Carlo simulation offers a flexible means to generate distributions for hypothesis testing and power analyses that are tailored to each data set. The generality of the likelihood ratio test extends to the assessment of its power for specific applications (fig. 1 ). For example, the same approach described here for analyzing changes in evolution at the molecular level has been used to assess the power of a likelihood ratio test for simultaneous Pleistocene speciation events in the grasshopper genus Melanoplus (Knowles 2000Citation ).

Testing the fit of gene sequence data to explicit evolutionary models can certainly increase our understanding of gene function. The results of such tests, however, must be interpreted with caution. In some instances power analysis can guide the interpretation of statistical tests that have failed to detect change or reject a null hypothesis. In earlier articles, Wolfe and dePamphilis (1997Citation , 1998Citation ) found no evidence of reduced evolutionary constraint on rbcL genes of the nonphotosynthetic plants included in their analysis. Three hypotheses were offered to explain their surprising result: (1) the plants are photosynthetic for at least a portion of their life, (2) the carboxylase or oxygenase activity of the peptide encoded by rbcL serves a critical function in some nonphotosynthetic plants, or (3) the open reading frame for rbcL is simply persisting by chance, and it will eventually be interrupted by a stop codon or length mutation. The power analyses performed here suggest that we can reject the third hypothesis for the persistence of rbcL in S. gesnerioides, Harveya purpurea, and Orobanche fasciculata. Physiological and gene expression studies are necessary to test the remaining two hypotheses.


    Acknowledgements
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Conclusions
 Acknowledgements
 References
 
We thank Hiroshi Akashi, Todd Barkman, Olle Pellmyr, Allan Strand, Andi Wolfe, Ziheng Yang, Ned Young, and two anonymous reviewers for their helpful comments on our manuscript. We thank Lena Landherr Sheaffer for sequencing additional matK genes. We especially thank Ziheng Yang for helping us use routines from his CODEML program for use in our sequence generating simulation program. Keith Crandall and David Hillis kindly provided us with their crayfish opsin sequences. This work was supported through NSF grants to J.L.-M. and C.deP.


    Footnotes
 
Elizabeth Kellogg, Reviewing Editor

Keywords: molecular evolution statistical power likelihood ratio test rbcL opsin rhodopsin crayfish Orobanchaceae Scrophulariaceae Back

Address for correspondence and reprints: Jim Leebens-Mack, Department of Biology and Life Sciences Consortium, 518 Mueller Laboratory, Penn State University, University Park, Pennsylvania 16802. jleebensmack{at}psu.edu Back


    References
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Conclusions
 Acknowledgements
 References
 

    Anisimova M., J. P. Bielawski, Z Yang, 2001 Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution Mol. Biol. Evol 18:1585-1592[Abstract/Free Full Text]

    Argamaso S. S., A. C. Froehloch, M. A. McCall, E. Nevo, I. Provencio, R. G. Foster, 1995 Photopigments and circadian systems of vertebrates Biophys. Chem 56:3-11[ISI][Medline]

    Barkman T. J., G. Chenery, J. R. McNeal, J. Lyons-Weiler, W. J. Elisens, G. Moore, A. D. Wolfe, C. W. dePamphilis, 2000 Independent and combined analyses of sequences from all three genomic compartments converge on the root of flowering plant phylogeny Proc. Natl. Acad. Sci. USA 97:13166-13171[Abstract/Free Full Text]

    Barrier M., R. H. Robichaux, M. D. Purugganan, 2001 Accelerated regulatory gene evolution in an adaptive radiation Proc. Natl. Acad. Sci. USA 98:10208-10213[Abstract/Free Full Text]

    Bricaud C. H., P. A. Thalouarn, S. Renaudin, 1986 Ribulose 1,5-bisphosphate carboxylase activity in the holoparasite Lathraea clandestina L J. Plant Physiol 125:367-370[ISI]

    Briscoe A. D., L. Chittka, 2001 The evolution of color vision in insects Annu. Rev. Entomol 46:471-510[ISI][Medline]

    Bromham L., D. Penny, A. Rambaut, M. D. Hendy, 2000 The power of relative rates tests depends on the data J. Mol. Evol 50:296-301[ISI][Medline]

    Brown B. W., C. Brauner, A. Chan, D. Gutierrez, J. Herson, J. Lovato, J. Polsley, K. Russell, J. Venier, 2000 DSTPLAN: calculations for sample sizes and related problems. Version 4.2 (http://odin.mdacc.tmc.edu/anonftp/page_2.html). University of Texas, M. D. Anderson Cancer Center

    Cassagrande J. T., M. C. Pike, P. G. Smith, 1978 An improved approximate formula for calculating sample sizes for comparing two binomial distributions Biometrics 34:483-486[ISI][Medline]

    Crandall K. A., D. M. Hillis, 1997 Rhodopsin evolution in the dark Nature 387:667-668[ISI][Medline]

    David-Gray Z. K., H. M. Cooper, J. W. H. Janssen, E. Nevo, R. G. Foster, 1999 Spectral tuning of a circadian photopigment in a subterranean ‘blind’ mammal (Spalax ehrenbergi) FEBS Lett 461:343-347[ISI][Medline]

    Delavault P. M., V. Sakanyan, P. A. Thalouarn, 1995 Divergent evolution of two plastid genes, rbcL and atpB, in a non-photosynthetic parasitic plant Plant Mol. Biol 29:1071-1079[ISI][Medline]

    dePamphilis C. W., 1995 Genes and genomes Pp. 177–205 in Malcolm C. Press and Jonathan D. Graves, eds. Parasitic plants. Chapman and Hall, London

    dePamphilis C. W., J. D. Palmer, 1990 Loss of photosynthetic and chlororespiratory genes from the plastid genome of a parasitic flowering plant Nature 348:337-339[ISI][Medline]

    dePamphilis C. W., N. D. Young, A. D. Wolfe, 1997 Evolution of plastid gene rps2 in a lineage of hemiparasitic and holoparasitic plants: many losses of photosynthesis and complex patterns of rate variation Proc. Natl. Acad. Sci. USA 94:7367-7372[Abstract/Free Full Text]

    Felsenstein J., 1985 Confidence limits on phylogenies: an approach using the bootstrap Evolution 39:783-791[ISI]

    Goldman N., 1993 Statistical tests of models of DNA substitution J. Mol. Evol 36:182-198[ISI][Medline]

    Goldman N., J. L. Thorne, D. T. Jones, 1998 Using evolutionary trees in protein secondary structure prediction and other comparative sequence analyses Genetics 149:445-458[Abstract/Free Full Text]

    Goldman N., Z. Yang, 1994 A codon-based model of nucleotide substitution for protein-coding DNA sequences Mol. Biol. Evol 11:725-736[Abstract/Free Full Text]

    Golenberg E. M., M. T. Clegg, M. L. Durbin, J. Doebley, D. P. Ma, 1993 Evolution of a noncoding region of the chloroplast genome Mol. Phylogenet. Evol 2:52-64[Medline]

    Hasegawa M., H. Kashino, T. Yano, 1985 Dating the human-ape split by a molecular clock of mitochondrial DNA J. Mol. Evol 22:160-174[ISI][Medline]

    Huelsenbeck J. P., B. Ranala, 1997 Phylogenetic methods come of age: testing hypotheses in an evolutionary context Science 276:227-232[Abstract/Free Full Text]

    Janssen J. W. H., P. H. M. Bovee-Geurts, Z. P. A. Peeters, J. K. Bowmaker, H. M. Cooper, Z. K. David-Gray, E. Nevo, W. J. Degrip, 2000 A fully functional rod visual pigment in a blind mammal: a case for adaptive functional reorganization? J. Biol. Chem 275:38674-38679[Abstract/Free Full Text]

    Kellogg E. A., N. D. Juliano, 1997 The structure and function of RuBisCo and their implications for systematic studies Am. J. Bot 84:413-428[Abstract]

    Knowles L. L., 2000 Tests of Pleistocene speciation in montane grasshoppers (genus Melanoplus) from the sky islands of western North America Evolution 54:1337-1348[ISI][Medline]

    Li W.-H., C.-I. Wu, C.-C. Luo, 1985 A new method for estimating synonymous and nonsynonymous rates of nucleotide substitutions considering the relative likelihood of nucleotide and codon changes Mol. Biol. Evol 2:150-174[Abstract]

    Liò P., N. Goldman, 1999 Using protein structural information in evolutionary inference: transmembrane proteins Mol. Biol. Evol 16:1679-1710

    Lusson N. A., P. M. Delavault, P. A. Thalouarn, 1998 The rbcL gene from the non-photosynthetic parasite Lathraea clandestina is not transcribed by plastid-encoded RNA polymerase Curr. Genet 34:212-215[ISI][Medline]

    Marshall C. R., E. C. Raff, R. A. Raff, 1994 Dollo's law and the death and resurrection of genes Proc. Natl. Acad. Sci. USA 91:12283-12287[Abstract/Free Full Text]

    Messier W., C.-B. Stewart, 1997 Episodic adaptive evolution of primate lysozymes Nature 385:151-154[ISI][Medline]

    Nei M., J. Zhang, S. Yokoyama, 1997 Color vision of ancestral organisms of higher primates Mol. Biol. Evol 14:611-618[Abstract]

    Olmstead R. G., C. W. dePamphilis, A. D. Wolfe, N. D. Young, W. Elisens, P. Reeves, 2001 Disintegration of the Scrophulariaceae Am. J. Bot 88:348-361[Abstract/Free Full Text]

    Shinozaki K., M. Ohme, M. Tanaka, et al. (23 co-authors) 1986 The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expressions Embo. J 5:2043-2049[ISI]

    Swanson W. J., V. D. Vacquier, 1995 Extraordinary divergence and positive Darwinian selection in a fusagenic protein coating the acrosomal process of abalone spermatozoa Proc. Natl. Acad. Sci 92:4957-4961[Abstract]

    Swofford D. L., 2000 PAUP*: phylogenetic analysis using parsimony (* and other methods). Version 4 Sinauer Associates, Sunderland, Mass

    Wolfe A. D., C. W. dePamphilis, 1997 Alternative paths of evolution for the photosynthetic gene rbcL in four nonphotosynthetic species of Orobanche Plant Mol. Biol 33:965-977[ISI][Medline]

    ———. 1998 The effect of relaxed functional constraints on the photosynthetic gene rbcL in photosynthetic and nonphotosynthetic parasitic plants Mol. Biol. Evol 15:1243-1258[Abstract/Free Full Text]

    Wolfe K. H., C. W. Morden, J. D. Palmer, 1992 Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant Proc. Natl. Acad. Sci. USA 89:10648-10652[Abstract]

    Yang Z., 1994 Maximum Likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods J. Mol. Evol 39:306-314[ISI][Medline]

    ———. 1998 Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution Mol. Biol. Evol 15:568-573[Abstract]

    ———. 1999 PAML: phylogenetic analysis by maximum likelihood (http://abacus.gene.ucl.ac.uk/software/paml.html). University College London, London

    Yang Z., J. P. Bielawski, 2000 Statistical methods for detecting molecular adaptation Trends Ecol. Evol 15:496-503[ISI][Medline]

    Yang Z., S. Kumar, M. Nei, 1995 A new method of inference of ancestral nucleotide and amino acid sequences Genetics 141:1641-1650[Abstract/Free Full Text]

    Yang Z., R. Nielsen, 1998 Synonymous and nonsynonymous rate variation in nuclear genes of mammals J. Mol. Evol 46:409-418[ISI][Medline]

    Yang Z., R. Nielsen, N. Goldman, A. Krabbe-Pedersen, 2000 Codon-substitution models for heterogeneous selection pressure at amino acid sites Genetics 155:431-449[Abstract/Free Full Text]

    Yokoyama S., 1997 Molecular genetic basis of adaptive selection: examples from color vision in vertebrates Annu. Rev. Genet 31:311-332

    Yokoyama S., F. B. Radlwimmer, 2001 The molecular genetics of red and green vision in vertebrates Genetics 158:1697-1710[Abstract/Free Full Text]

    Yokoyama S., R. Yokoyama, 1996 Adaptive evolution of photoreceptors and visual pigments in vertebrates Annu. Rev. Ecol. Syst 27:543-567[ISI]

    Yokoyama S., H. Zhang, F. B. Radlwimmer, N. S. Blow, 1999 Adaptive evolution of color vision of the Comoran coelacanth (Latimeria chalumnae) Proc. Natl. Acad. Sci. USA 96:6279-6284[Abstract/Free Full Text]

    Young N. D., C. W. dePamphilis, 2000 Purifying selection detected in the plastid matK and flanking ribozyme regions within a group II intron of non-photosynthetic plants Mol. Biol. Evol 17:1933-1941[Abstract/Free Full Text]

    Young N. D., K. D. Steiner, C. W. dePamphilis, 1999 The evolution of parasitism in Scrophulariaceae/Orobanchaceae: plastid gene sequences refute an evolutionary transition series Ann. Mo Bot. Gard 86:876-893

Accepted for publication April 2, 2002.