*Commonwealth Scientific and Industrial Research Organisation, Canberra, Australia;
Division of Botany and Zoology, Australian National University, Canberra, Australia;
and
Center for Population Biology, University of California at Davis
Abstract
Previous analyses of the α-esterase cluster of Drosophila melanogaster revealed 10 active genes and the DmαE4a-Ψ pseudogene. Here, we reconstruct the evolution of the pseudogene from the sequences of 12 alleles from widely scattered D. melanogaster populations and single alleles from Drosophila simulans and Drosophila yakuba. All of the DmαE4a-Ψ alleles contain numerous inactivating mutations, suggesting that pseudogene alleles are fixed in natural populations. Several lines of evidence also suggest that DmαE4a is now evolving without selective constraint in the D. melanogaster lineage. There are three polymorphic indels which result in frameshifts; a key nucleotide of the intron splice acceptor is polymorphic; the neutral mutation parameter is the same for replacement and silent sites; one of the nonsilent polymorphisms results in a stop codon; only 1 of the 13 replacement polymorphisms is biochemically conservative; residues that are conserved among active esterases have different states in DmαE4a-Ψ; and there are about half as many transitional polymorphisms as transversional ones. In contrast, the D. simulans and D. yakuba orthologs DsαE4a and DyαE4a do not have the inactivating mutations of DmαE4a-Ψ and appear to be evolving under the purifying selection typical of protein-encoding genes. For instance, there have been more substitutions in the introns than in the exons, and more in silent sites than in replacement sites. Furthermore, most of the amino acid substitutions that have occurred between DyαE4a and DsαE4a are located in sites that typically vary among active α-esterases rather than those that are usually conserved. We argue that the original αE4a gene had a function which it has lost since the divergence of the D. melanogaster and D. simulans lineages.
Introduction
Pseudogenes provide a baseline from which to measure various components of molecular evolution. As inactive copies of functional genes, they are thought to evolve without selective constraint, and they can therefore reflect the patterns and the rates of underlying mutational and stochastic processes. Pseudogenes are thought to be a frequent outcome of gene duplication, so they also elucidate this important mechanism of gene origination (Walsh 1995). Measurements of substitution rates in pseudogenes and of the frequencies at which pseudogenes themselves are generated, fixed, and removed within a population are perhaps most interesting in the context of how these values vary between taxa. For instance, Graur, Shuali, and Li (1989) noticed that the rate of sequence loss among pseudogenes due to deletions is seven times as fast in rodents as in humans, whereas analyses of "dead on arrival" copies of Helena elements (Petrov, Lozovskaya, and Hartl 1996; Petrov and Hartl 1998) and the swallow pseudogene (swwΨ; Petrov et al. 1998) suggest that deletions in Drosophila pseudogenes are on average seven times as large as those in mammalian pseudogenes and that the rate of deletions is about 2.6-fold greater.
Pseudogenes are, in fact, relatively rare in Drosophila genomes. Moreover, inter- and intraspecific sequence comparisons of some of those cases originally claimed to be pseudogenes have revealed anomalous results more consistent with the selective constraint expected of functional genes. For example, divergence or polymorphism at nonsynonymous sites is substantially less than that at synonymous sites for Alcohol dehydrogenase-like (Adh-like) sequences in the melanogaster (Long and Langley 1993) and repleta (Sullivan et al. 1994; Begun 1997) species groups, Cecropin pseudogene 1 (CecΨ1) in the melanogaster group (Ramos-Onsins and Aguade 1998), and alleles of Esterase P (EstP) in the β-esterase cluster of Drosophila melanogaster (Balakirev and Ayala 1996). Only in the cases of the Larval cuticle protein pseudogene (LcpΨ; Pritchard and Schaeffer 1997) and the CecΨ2 genes from Drosophila simulans, Drosophila mauritiana, and Drosophila sechellia (Ramos-Onsins and Aguade 1998) do the patterns of divergence accord with neutral expectations. As it turns out, the Adh-like sequence from the melanogaster group is thought to encode a functional gene (renamed jingwei; Long and Langley 1993), and so do the Adh-like sequences from seven of the eight species examined in the repleta group (now called Finnegan; Begun 1997). Both jingwei and Finnegan have exons that were not detected in early reports. The two genes from the Cecropin cluster produce transcripts (Ramos-Onsins and Aguade 1998), as does EstP, now renamed Est7, some alleles of which also produce a catalytically active esterase (Dumancic et al. 1997). In fact, the only evidence to suggest that Est7, CecΨ1, and CecΨ2 are pseudogenes is that a relatively high frequency of alleles have been found with disrupted open reading frames (Balakirev and Ayala 1996; Ramos-Onsins and Aguade 1998).
In this paper, we examine polymorphism and divergence of the DmαE4a pseudogene from the α-esterase cluster of D. melanogaster (Russell et al. 1995; Robin et al. 1996). The α-esterase cluster comprises 10 active esterase genes plus the pseudogene, dispersed over 60 kb. The esterases encoded by the cluster show 37%–66% amino acid identity, and no evidence for gene conversion or intergenic recombination has been detected. Orthologs for several of the genes, albeit not DmαE4a-Ψ, have been characterized in Drosophila buzzatii, the sheep blowfly Lucilia cuprina, and the housefly Musca domestica, and phylogenetic analyses and physical mapping of the clusters in these species suggest that the organization of the cluster has been fairly stable since the divergence of the Calliphoridae and the Drosophilidae (Newcomb et al. 1996; Claudianos, Russell, and Oakeshott 1999; Oakeshott et al. 1999). All of the α-esterases characterized to date except DmαE4a-Ψ have conserved motifs indicative of hydrolytic function, and the low divergences between some orthologs also suggest that they have conserved functions. In general, the functions of α-esterases are poorly understood, but mutant alleles of the αE7 gene have been shown to confer organophosphate insecticide resistance in L. cuprina and M. domestica (Newcomb et al. 1997; Campbell et al. 1997). While the expression of some D. melanogaster α-esterases (e.g., EST23 and EST9) in digestive tissues (Spackman et al. 1994) concurs with the idea that some may have a role in digestion or detoxification of xenobiotics, there is such diversity among paralogs in sequence, tissue and ontogenic expression pattern, substrate preferences, and inhibitor sensitivities (Oakeshott et al. 1999) that it is too early to conclude that such a role is a general feature of the α-esterase cluster.
The DmαE4a-Ψ gene is located within an intron of another α-esterase gene, DmαE6. A partial cDNA clone of DmαE6 (accession number AI389293) has the DmαE4a-Ψ-containing intron correctly spliced out and an intact open reading frame. Phylogenetic analyses suggest that DmαE4a-Ψ stems from the most recent gene duplication in the D. melanogaster α-esterase cluster. However, the silent-site divergence between DmαE4a-Ψ and its most closely related paralog (DmαE4) is close to saturation, so the duplication event probably happened before the divergence of the melanogaster and willistoni species groups (30–40 MYA; Powell and DeSalle 1995). The DmαE4a-Ψ allele sequenced has three indels that disrupt and prematurely truncate the open reading frame, plus a noncanonical splice acceptor at intron site II, so it appears to be nonfunctional. The lack of a detectable transcript for DmαE4a-Ψ (unpublished data), the replacement of the catalytic histidine residue with a tyrosine, and a lower G+C content at third-position sites than other α-esterases are also consistent with this interpretation. However, the distribution of amino acid differences between DmαE4a-Ψ and DmαE4 along the primary sequence is nonrandom and similar to the distribution observed among functional esterases, which suggests that the forces of purifying selection may have been acting in both the DmαE4 and the DmαE4a-Ψ lineages. Furthermore, relative-rate tests using other paralogs as outgroups suggest that DmαE4a-Ψ is not evolving significantly faster than DmαE4. Neither of these observations would be expected if DmαE4a-Ψ has been a pseudogene for most of the time since the αE4/αE4a gene duplication event. To address the hypothesis that DmαE4a-Ψ was a functional esterase which only recently became a pseudogene, we examine the sequences of the orthologous αE4a from D. simulans and D. yakuba and compare the sequences of 12 DmαE4a-Ψ alleles collected from around the world.
Materials and Methods
Drosophila DNA and Strains
Drosophila simulans AR1 genomic DNA was obtained from Dr. Jill Karotam (CSIRO Division of Entomology). Drosophila yakuba flies (14021-0261.0) were obtained from the U.S. National Drosophila Species Resource Center. Nine D. melanogaster lines that were homozygous for the third chromosome and derived from populations in Maryland (Md7B), Zimbabwe (c53, c86, c88, c171), Ecuador (Ec32, Ec100), and Beijing, China (Bei23, Bei65), were obtained from Dr. Charles Aquadro (Cornell University). Two others from Rollingstone (Rs4) and Coffs Harbour in Australia were obtained from Dr. Wendy Odgers (CSIRO Division of Entomology). The twelfth D. melanogaster sequence was the original one from an Oregon R library described in Russell et al. (1995; GenBank accession number U51049).
DNA Preparation
Single D. yakuba or D. melanogaster flies were prepared for PCR as described by Gloor and Engels (1992), except that debris was precipitated after homogenization by a brief pulse in a microcentrifuge and the supernatant was then diluted 1:4 in homogenization buffer (10 mM Tris-HCl [pH 8.2], 1 mM EDTA, 25 mM NaCl, 200 µg/ml proteinase K).
PCR Amplifications
Reactions were performed in 50 µl containing 10 mM Tris-HCl (pH 8.3), 1.5 mM MgCl2, 50 mM KCl, 200 µM of each dNTP, 1 µM of each primer, and 5 U Taq polymerase (Gibco BRL) under approximately 60 µl of mineral oil. The reactions were performed in a Corbett Research Thermal Sequencer FTS-1. A series of PCRs using four primer sets (GHS: 5'-ATIACIATITTYGGNCAYAGYTCNGG-3' and WSN: 5'-CCIARIATIATIGGDATNCGRTTRCTCCA-3'; Ds1: 5'-CCCGTGGTGCAGACCACNCAYGG-3' and Ds2: 5'-GCTGAGTGTTAACCCCCATCG-3'; Ds3: 5'-CCCAAGGAATTGCTGCGGAACAGT-3' and Ds4: 5'-AAGTTCCTCGGCGATGTTNAGNAC-3'; Ds8: 5'-GTAGATTGGAAGCCAGTAACCTCGGG-3' and Md1: 5'-YTGRTCYTTIARICCIGCRTTNCCNGGNAC-3') were used to amplify αE4a sequence from D. simulans genomic DNA (fig. 1). Approximately 0.02 µg of D. simulans DNA was used as a template, and the cycling regime was 1 cycle of 97°C for 3 min, 55°C for 1 min, and 72°C for 1 min and 40 cycles of 95°C for 30 s, 55°C for 30 s, and 72°C for 90 s. For D. yakuba, a nested PCR approach was used to amplify αE4a. Ds1 and Ds4 primers were used in the first PCR (fig. 1). The first PCR conditions were 1 cycle of 97°C for 3 min, 50°C for 2 min, and 72°C for 1 min; 35 cycles of 95°C for 30 s, 55°C for 30 s, and 72°C for 1 min; and 72°C for 10 min. The conditions for the second PCR were the same as those for the first except that in the second round, 1 µl of the product of the first PCR was used as a template, the Ds6 and Ds7 primers were used, the initial annealing temperature was 60°C, and the subsequent annealing temperature was 62°C.
Cloning and Sequencing
The amplified DNA was purified using a QIAQUICK spin column (QIAGEN) following the manufacturer's instructions. PCR products from the D. simulans, D. yakuba, and D. melanogaster Coffs Harbour allele were cloned into pGEM-T or pGEM-T Easy (Promega). Single clones were sequenced using TaqFS chemistry (ABI) as recommended by the suppliers.
The other 10 D. melanogaster alleles were sequenced directly. Three PCRs were performed from each of the 10 strains, the DNA was purified using QIAQUICK spin columns, and the yields from these were determined spectrophotometrically. For each strain, equimolar amounts from each PCR were pooled and then sequenced as a precaution against errors that may have occurred during the PCR or sequencing. Ninety nanomoles of amplified DNA was sequenced using 20 pmol of either the PCR primers or specifically designed primers (not shown) and TaqFS chemistry in a total volume of 10 µl.
Analyses
The program DnaSP, version 3.14 (Rozas and Rozas 1999), was used to calculate Tajima's (1989) D, Fu and Li's (1993) D, Hudson's (1987) 4Nc, and the HKA test (Hudson, Kreitman, and Aguade 1987).
Results
αE4a Appears to Be Active in D. simulans and D. yakuba
To test the hypothesis that αE4a has recently been inactivated in the D. melanogaster lineage, a PCR strategy was used to amplify, clone, and sequence αE4a from D. simulans (hereafter called DsαE4a). The strategy involved three rounds of primer design and amplification (fig. 1). One of the second-round PCRs (using the Ds1-Ds2 primer pair) yielded products of two sizes. Both products were cloned and sequenced; one contained 925 nt from the DsαE4a target sequence, and the other contained 1,192 nt from a paralog, tentatively called DsαE4, that had been amplified by the spurious binding of the Ds1 primer. The 1,192 nt obtained from DsαE4 (GenBank accession number AF159418) corresponds to codons 18–376 of DmαE4 and includes two introns of 58 and 54 nt. In the final round, a specific DsαE4a primer (Ds8) was used in conjunction with a degenerate primer (Md1) designed to bind 3' of intron site II of α-esterases. The success of this PCR demonstrated that DsαE4a, like its ortholog in D. melanogaster, is located within the second intron of αE6 (fig. 1).
In total, 1,572 of the estimated 1,624 coding nucleotides of DsαE4a were sequenced (GenBank accession number AF159419), representing all but 17 of the 541 codons expected in the DsαE4a open reading frame (fig. 2). The sequences homologous to introns II and III of DmαE4a-Ψ (84 and 178 nt, respectively) were also obtained, as were 282 nt 3' of the stop codon and 154 nt of exon II from DsαE6. None of the inactivating mutations observed in DmαE4a-Ψ are present in DsαE4a. Instead, with the exclusion of the two introns, DsαE4a has an intact open reading frame.
The distribution of variation across sites that are conserved or otherwise among the 10 functional α-esterases in D. melanogaster can also be used to assess the level of selective constraint. Sites in the alignment of the 10 active D. melanogaster α-esterases have been divided into two categories: the 82 sites that are conserved in all 10 sequences and the 520 sites that are variable. Three sites (4%) in the "conserved" category differ between DsαE4a and DyαE4a, whereas there are 69 "variable" sites (13%) that differ between DsαE4a and DyαE4a. A G-test on these values is significant (Gadj = 5.4, df = 1, P < 0.05), indicating greater constraint acting on residues usually conserved among other α-esterases. Therefore, the variation between DsαE4a and DyαE4a amino acid sequences is consistent with the type of purifying selection observed for a functional α-esterase.
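The kind of comparison described above can be sketched as a G-test of independence on a 2 × 2 table (sites that differ or not, in the conserved or variable category). This is only an illustrative sketch: the function name is hypothetical, and the table layout and Williams' correction are assumptions, since the text does not say exactly how the authors built the table or which correction produced Gadj. The value it returns therefore need not match the reported figure.

```python
import math

def g_test_2x2(a, b, c, d):
    """G-test of independence for the table [[a, b], [c, d]].

    Returns (G, G adjusted by Williams' correction, P for df = 1).
    Assumes every row and column total is greater than zero.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                g += 2.0 * o * math.log(o / expected)
    # Williams' correction factor for a 2 x 2 table
    q = 1.0 + ((n / rows[0] + n / rows[1] - 1.0)
               * (n / cols[0] + n / cols[1] - 1.0)) / (6.0 * n)
    g_adj = g / q
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p = math.erfc(math.sqrt(max(g_adj, 0.0) / 2.0))
    return g, g_adj, p

# Assumed layout: conserved sites (3 differ, 79 do not),
# variable sites (69 differ, 451 do not).
g, g_adj, p = g_test_2x2(3, 79, 69, 451)
print(f"G = {g:.2f}, Gadj = {g_adj:.2f}, P = {p:.4f}")
```

Under this layout the test is significant at the 5% level, which agrees in direction with the result reported in the text.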
Change in Sequence Evolution in the D. melanogaster Lineage
Comparisons of DsαE4a or DyαE4a with DmαE4a-Ψ must reflect the relaxation of constraint on the putative pseudogene, as well as the constraint on the active gene. Consistent with this, R/S and E/I values are greater in comparisons involving DmαE4a-Ψ than in those just comparing DsαE4a and DyαE4a (table 1). Comparisons involving DmαE4a-Ψ also have lower transition-to-transversion ratios (table 1), which is to be expected for a pseudogene given the findings of Moriyama and Powell (1996) that transversions made up 54% of noncoding polymorphisms but only 32% of coding polymorphisms among the 24 functional loci they surveyed.
Of the 96 amino acid sites that vary among the sequences of the αE4a lineages, there are 23 that have the same state in DsαE4a and DyαE4a but differ in DmαE4a-Ψ, and there are only 8 that are shared between DmαE4a-Ψ and DyαE4a but differ in DsαE4a. These values are significantly different using Tajima's (1993) 1D test (χ² = 7.3, df = 1, P < 0.01), demonstrating that the rate of amino acid substitution is greater in the DmαE4a-Ψ lineage, as would be expected if αE4a became a pseudogene in the D. melanogaster lineage. Furthermore, only 2 of the 23 substitutions that have occurred in the DmαE4a-Ψ branch align to the conserved sites defined above by classifying the sites according to an independent set of active α-esterases. This distribution (2 to 21) is not significantly different from the ratio of conserved to variable sites (82 to 520), so there is no apparent functional constraint among the DmαE4a-Ψ amino acid changes (Gadj = 0.14, df = 1, P > 0.05).
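The relative-rate comparison above can be checked with a few lines of arithmetic. Tajima's 1D statistic compares the number of sites unique to each of two lineages, with the third sequence as outgroup, as χ² = (m1 − m2)² / (m1 + m2) with 1 degree of freedom. The sketch below applies it to the counts quoted in the text (23 and 8); the function name is hypothetical.

```python
import math

def tajima_1d(m1, m2):
    """Tajima's (1993) 1D relative-rate statistic.

    m1, m2: numbers of sites at which lineage 1 (or lineage 2) alone
    differs from the other two sequences. Returns (chi-square, P, df = 1).
    """
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, df = 1
    return chi2, p

chi2, p = tajima_1d(23, 8)
print(f"chi2 = {chi2:.1f}, P < 0.01: {p < 0.01}")  # chi2 = 7.3, P < 0.01: True
```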
Is DmαE4a-Ψ Evolving Neutrally?
Although the above comparisons indicate a relaxation of selection on αE4a in the D. melanogaster lineage, the only rigorous test of whether DmαE4a-Ψ is indeed evolving neutrally involves comparisons among D. melanogaster alleles (given that there are no species closer to D. melanogaster than D. simulans). To this end, a survey of 12 alleles sampled from various locations around the world was conducted. One of these was the original Oregon R allele (Robin et al. 1996), and another was from a fly caught near Coffs Harbour, Australia, for which a 317-bp amplicon (using primers WSN and GHS) was cloned and sequenced. In the latter case, the region amplified spanned two of the three frameshift mutations observed in the original Oregon R allele (i.e., the 17-nt deletion and the 1-nt insertion; see fig. 2). The Coffs Harbour allele has both of these inactivating mutations and an additional 13-nt deletion (of sites 171–183; see fig. 4). For the other 10 flies (Md7B, Zc53, Zc86, Zc88, Zc171, Ec32, Ec100, Bei23, Bei65, and Rs4), amplicons of approximately 1.1 kb (using primers 4a.0 and 4a.5) were obtained. These were sequenced directly in both strands (fig. 1). All 10 alleles have the three frameshift mutations originally observed in the Oregon R allele. The inactivating mutation in the splice acceptor is polymorphic, as are three further frameshifting deletions of 13, 28, and 13 nt (171–183, 675–702, and 992–1004, respectively; fig. 4; GenBank accession numbers AF159406–AF159417).
Also consistent with this proposition, there are about half as many transitional (ts) polymorphisms as transversional (tv) polymorphisms in DmαE4a-Ψ (6:14). Thus, the ts/tv ratio is substantially lower (although not significantly so; Gadj = 2.2, df = 1, P > 0.05) than that observed in the divergence data (fig. 4), and it is significantly different from that normally seen segregating among Drosophila coding regions (G = 11.1, df = 1, P < 0.01; Moriyama and Powell 1996).
One of the 14 nonsilent polymorphisms creates a stop codon, and only one of the others is biochemically conservative (where conservative substitutions are within the subsets GA, VIL, FYW, STC, DE, NQ, or KR; Genetics Computer Group 1994). Their distribution also does not follow the pattern of amino acid conservation observed in active esterases. Five occur among the 82 sites classified as conserved, and nine occur among the 520 variable sites. These two proportions do not differ significantly (Gadj = 3.0, df = 1, P > 0.05), which is consistent with DmαE4a-Ψ being a pseudogene. By way of comparison, equivalent polymorphism data from a similar sample for the functional Esterase 6 enzyme of D. melanogaster show a highly significant difference, indicative of selective constraint (1 polymorphism in the conserved category and 22 in the variable category; Gadj = 28, df = 1, P < 0.001; data from Dr. W. A. Odgers, CSIRO Division of Entomology, personal communication).
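The definition of a biochemically conservative substitution used above is simple enough to state as code. The following sketch encodes the seven subsets quoted in the text and reports whether a replacement stays within one of them; the function and constant names are illustrative and not taken from any package.

```python
# Conservative subsets as listed in the text (one-letter amino acid codes).
CONSERVATIVE_SUBSETS = ("GA", "VIL", "FYW", "STC", "DE", "NQ", "KR")

def is_conservative(residue_a, residue_b):
    """Return True if a replacement stays within one conservative subset."""
    a, b = residue_a.upper(), residue_b.upper()
    if a == b:
        raise ValueError("identical residues are not a replacement")
    return any(a in subset and b in subset for subset in CONSERVATIVE_SUBSETS)

print(is_conservative("V", "I"))  # True: both in VIL
print(is_conservative("H", "Y"))  # False: histidine is in none of the subsets
```

By this definition, the replacement of the catalytic histidine with a tyrosine mentioned in the Introduction counts as nonconservative.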
The McDonald and Kreitman (1991) G-test compares the ratio of replacement to silent-site polymorphisms within species with the ratio of replacement to silent-site divergences between species. A significant test result is often interpreted as a change in the selective forces acting in current populations relative to those acting during divergence. When this test is applied to the DmαE4a-Ψ polymorphism data using DsαE4a for the interspecific comparison, the test is not significant (G = 1.74, df = 1, P > 0.05). Similarly, the HKA test (Hudson, Kreitman, and Aguade 1987), which compares the intraspecific and interspecific sequence variation between two loci, does not reject neutrality when DmαE4a-Ψ and DsαE4a are compared with the Adh 5' regions of D. melanogaster and D. simulans (χ² = 0.24, df = 1, P > 0.05; Kreitman and Hudson 1991). However, if the McDonald-Kreitman test is applied using DyαE4a in the interspecific comparison, the neutral model is rejected (G = 6.8, df = 1, P < 0.01), as it is when DmαE4a-Ψ polymorphism is compared with the divergence between the apparently functional DsαE4a and DyαE4a (G = 9.2; P < 0.01). These results not only support the proposition that αE4a has become a pseudogene in the D. melanogaster lineage, but also suggest that it has been a pseudogene long enough to substantially influence the divergence of DmαE4a-Ψ and DsαE4a.
The DmαE4a-Ψ Allelic Network
Since all of the sampled DmαE4a-Ψ alleles share three out of the six frameshift mutations, they coalesce (at these indel sites, at least) to an inactive ancestor that was descended from the original inactive DmαE4a allele. The allelic network in figure 5 describes the relationship among DmαE4a-Ψ alleles based on the segregating sites, including indel data. In the absence of parallel or backward mutation (which are unlikely among alleles, since they have diverged so little) or reticulate evolution (i.e., gene conversion or recombination, which may be expected), this network would represent the allele phylogeny. There is only one character change (that of indel 3) that occurs twice on the network, and this is most probably indicative of a reticulate event between alleles. Thus, there is a high level of linkage disequilibrium between the polymorphic sites scored in this sample. This is consistent with the position of DmαE4a-Ψ within cytological divisions 84DE (Russell et al. 1995), which have been described as regions of low recombination (Aquadro, Begun, and Kindahl 1994). However, a more formal indication of recombination rate, Hudson's (1987) estimator of 4Nc (where N is the population size and c is the recombination rate), is estimated to be 13.3 for the sequenced region (i.e., 0.0123 per adjacent site), indicating a fairly typical recombination rate (Aquadro, Begun, and Kindahl 1994).
Powell and DeSalle (1995) used biogeographical evidence and the assumption of a molecular clock to estimate that the divergence of D. melanogaster and D. simulans occurred 2.5 MYA. If we use this in our maximum-likelihood model, we estimate that the silent-site mutation rate of αE4a is 2.5 × 10⁻⁸ mutations per site per year (CI = 1.9 × 10⁻⁸, 3.7 × 10⁻⁸), the replacement mutation rate is 6.1 × 10⁻⁹ mutations per site per year (4.6 × 10⁻⁹, 9.4 × 10⁻⁹), the pseudogene rate is 1.6 × 10⁻⁸ mutations per site per year (1.1 × 10⁻⁸, 1.0 × 10⁻⁶), the divergence time between D. simulans and D. yakuba is 7.3 Myr (5.1, 8.9), and the inactivation time is 1.6 Myr (0.006, 2.5).
Thus, these calculations give the unsatisfying result that the 95% confidence limits for the inactivation time encompass almost all of the time from the divergence of the D. simulans and D. melanogaster lineages until the present time. We can slightly improve on the lower (i.e., younger) bound by estimating a minimum coalescence time of the D. melanogaster αE4a-Ψ alleles. Thus, if we take the upper bound of the estimate of the pseudogene rate and the lower bound on our earlier estimate of (0.003), then we calculate that the alleles coalesced to a pseudogene at least 10,000 years ago.
The magnitudes of the estimates for silent and replacement site mutation rates seem biologically reasonable, although the upper bound on the pseudogene rate seems relatively high, given that Sharp and Li (1989) estimated a rate of 1.6 × 10⁻⁸ mutations per site per year for the silent sites of four quickly diverging Drosophila genes, and Pritchard and Schaeffer (1997) calculated the rate of substitution in LcpΨ to be 6.8 × 10⁻⁸ mutations per site per year.
Note that an analysis of the relative rates of deletions and substitutions also suggests that the upper bound of pseudogene inactivation is more recent than the speciation of D. melanogaster and D. simulans. Among D. melanogaster alleles, there are 3 polymorphic deletions, 14 "nonsynonymous" substitutions, and 4 "synonymous" substitutions. Thus, while αE4a was evolving "neutrally" as a pseudogene, there were 4.7 nonsynonymous changes per deletion and 1.3 synonymous changes per deletion. There is only one fixed deletion in the melanogaster lineage, and since it causes a frameshift, it must have occurred at or after the point at which DmαE4a became a pseudogene. Extrapolating from the polymorphism data, we would therefore expect (very roughly) about 4.7 nonsynonymous-site fixations and 1.3 synonymous-site fixations to have occurred since αE4a became a pseudogene. However, we calculate that there were actually about 30 replacement changes and 14 synonymous changes in the melanogaster lineage. This would suggest that approximately 25 nonsynonymous fixations and 13 synonymous-site fixations occurred before the inactivation of the pseudogene. These calculations are limited by the large inaccuracies introduced by working with such a small number of deletions, but nevertheless they support the proposition that many substitutions occurred in the melanogaster lineage before αE4a was inactivated.
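The extrapolation in the preceding paragraph is a short chain of ratios, reproduced here as a sketch so that each step can be checked. All input counts are those quoted in the text; the function name is hypothetical.

```python
def split_lineage_changes(poly_substitutions, poly_deletions,
                          fixed_deletions, lineage_changes):
    """Split lineage changes into those expected after and before inactivation.

    The polymorphism data give substitutions per deletion for the pseudogene;
    scaling by the number of fixed deletions gives the substitutions expected
    since inactivation, and the remainder is attributed to the period before.
    """
    per_deletion = poly_substitutions / poly_deletions
    expected_since = per_deletion * fixed_deletions
    implied_before = lineage_changes - expected_since
    return expected_since, implied_before

# Nonsynonymous: 14 polymorphisms, 3 polymorphic deletions, 1 fixed deletion,
# about 30 replacement changes in the melanogaster lineage.
nonsyn_since, nonsyn_before = split_lineage_changes(14, 3, 1, 30)
# Synonymous: 4 polymorphisms and about 14 changes in the lineage.
syn_since, syn_before = split_lineage_changes(4, 3, 1, 14)

print(f"nonsynonymous: {nonsyn_since:.1f} since, {nonsyn_before:.1f} before")
print(f"synonymous:    {syn_since:.1f} since, {syn_before:.1f} before")
```

The results (4.7 and 25.3 nonsynonymous, 1.3 and 12.7 synonymous) agree with the rounded figures given in the text.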
Discussion
Silent-site divergences suggest that an ancestral αE4 gene duplicated to form αE4 and αE4a before the divergence of the D. melanogaster and D. willistoni lineages (Robin et al. 1996). Since that time, both the αE4 gene and the αE4a gene appear to have evolved, for a period at least, under the purifying selection that is typical of protein-encoding and, more specifically, esterase-encoding genes. Replacement site divergence is less than silent-site divergence, exons have been more constrained than introns, and amino acid substitutions have occurred more frequently in positions aligning to the variable sites of other α-esterases. However, at some stage after the D. melanogaster lineage diverged from the D. simulans lineage, the DmαE4a gene was inactivated, and since then, it has evolved in a neutral fashion. Consequently, interspecific comparisons involving DmαE4a-Ψ yield higher R/S and E/I ratios than do comparisons between the active DsαE4a and DyαE4a genes. Furthermore, at least seven inactivating mutations have been acquired among DmαE4a-Ψ alleles.
Does the apparently nonneutral evolution that preceded DmαE4a inactivation mean that αE4a had a function? Potentially neutral molecular events such as gene conversion and unequal recombination could cause a functionless gene to evolve in such a way that replacement substitutions appeared less frequent than silent substitutions, and, indeed, there are a few cases outside the Drosophila literature in which pseudogenes are purported to have been reactivated by such reticulate events (Nei 1987; Trabesinger-Ruef et al. 1996). However, there are no signs of reticulate evolution in DsαE4a or DyαE4a sequences (e.g., patchworks of regions more similar to paralogs than orthologs) or, more generally, between αE4 and αE4a (Robin et al. 1996). Therefore, it seems likely that selection on a functional esterase explains why there are relatively fewer replacement substitutions than silent substitutions in αE4a.
This does not necessarily mean that αE4a had its own function distinct from αE4. Perhaps active copies of both αE4 and αE4a were required to produce enough esterase for a particular function. Another possibility is that the products of the two duplicate genes form heterodimers and that selection against defective heterodimers has kept both genes active. Gottlieb and Ford (1997) proposed such a mechanism to explain the multiple independent "silencings" of a duplicate PGI gene in Clarkia wildflower species. However, these processes do not easily explain why the R/S value is twice as great in DsαE4a-versus-DyαE4a comparisons as it is in DmαE4-versus-DsαE4 comparisons. It seems that even before the αE4a gene was inactivated, it evolved with relaxed constraint relative to its paralog αE4.
Not much is known that sheds light on the nature of the functions of the active αE4/αE4a genes. Alleles of αE7, another gene in the α-esterase cluster, confer organophosphate insecticide resistance on L. cuprina and M. domestica (Newcomb et al. 1997; Claudianos, Russell, and Oakeshott 1999). Isozymes that are known to be encoded by the cluster (EST23 and EST9, which was once called EST C, in D. melanogaster and EST2 in D. buzzatii) exhibit high levels of allozyme polymorphism in a number of species (David 1982; Barker 1994), although nulls are rare (Langley et al. 1981). These isozymes are expressed in high concentrations in digestive tissues of the feeding life stages, and the D. melanogaster isozymes are also abundant in adult heads (Healy, Dumancic, and Oakeshott 1991). Therefore, it is tempting to suggest that the αE4 genes may also have roles in the digestion of dietary esters or xenobiotics. However, it is worth noting that all of the αE4 esterases have the same highly unusual residues around their catalytic sites (e.g., a histidine preceding the nucleophilic serine) that almost certainly make them functionally distinct from other esterases encoded by the α-esterase cluster (unpublished data).
A study similar to the one described here presents the sequences of 10 alleles of a larval cuticle protein pseudogene (LcpΨ) and a single allele of its ortholog in D. simulans (Pritchard and Schaeffer 1997). Unlike DsαE4a, the D. simulans ortholog of LcpΨ is also a pseudogene and has at least three inactivating mutations. The evolution of this gene appears neutral in that divergence does not differ between nonsynonymous and synonymous sites. However, the divergence among LcpΨ alleles is actually one of the lowest described for D. melanogaster genes ( = 0.001). If this observation that pseudogenes had a lower heterozygosity than functional genes were general, it could be argued that the extra variation observed among alleles of functional genes could be due to diversifying or balancing selection. However, LcpΨ differs in this respect from DmαE4a-Ψ, which has a relatively high level of heterozygosity ( = 0.006). Obviously, no general conclusions can be drawn from such a small sample of pseudogenes, especially in the absence of more detailed information about the effect of selection on neighboring sites.
Comparison of DmαE4a-Ψ alleles reveals a high frequency of polymorphic deletions. This is consistent with the studies of Petrov, Lozovskaya, and Hartl (1996), Petrov et al. (1998), and Petrov and Hartl (1998), who compared defective copies of the transposable element Helena within the virilis group and within the melanogaster subgroup and examined a swallow pseudogene. In the first of these studies, 18 copies of Helena from 8 different species were examined, and 11 copies had unique deletions in the size range of 1–75 bp, with an average size of 24.3 bp. This translates to 0.16 deletions per nucleotide substitution. These authors estimated that it would take 11.8 Myr for a pseudogene to lose half of its DNA. In DmαE4a-Ψ, there are 3 polymorphic deletions and 20 nucleotide polymorphisms, giving a deletion/nucleotide substitution ratio of 0.15, and the average size of a deletion is 18 bp. If we assume that the pseudogene substitution rate is not slower than the lower bound estimated for the silent substitution rate in DmαE4a (i.e., 1.9 × 10⁻⁸ per site per year), then the average number of nucleotides lost per site per year would be 0.15 × 1.9 × 10⁻⁸ × 18 = 5.1 × 10⁻⁸. Thus, at a constant rate of loss, it would take about 10 Myr (0.5/(5.1 × 10⁻⁸) ≈ 9.8 × 10⁶ yr) to lose half a gene.
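The arithmetic behind this estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the text does not say whether the time to lose half a gene was obtained from a constant (linear) loss or from an exponential decay, so both are shown. The linear figure is the one that matches "about 10 Myr".

```python
import math


def dna_loss_rate(n_deletions, n_substitutions, mean_deletion_bp, subs_rate):
    """Nucleotides lost per site per year.

    subs_rate is the assumed substitution rate per site per year.
    """
    deletions_per_substitution = n_deletions / n_substitutions
    return deletions_per_substitution * subs_rate * mean_deletion_bp


def years_to_lose(loss_rate, fraction=0.5, model="linear"):
    """Years needed to lose the given fraction of a sequence.

    model="linear" assumes a constant absolute loss per year;
    model="exponential" assumes loss proportional to remaining length.
    Both models are assumptions made for this sketch.
    """
    if model == "linear":
        return fraction / loss_rate
    if model == "exponential":
        return -math.log(1.0 - fraction) / loss_rate
    raise ValueError("model must be 'linear' or 'exponential'")


# Values quoted in the text: 3 deletions, 20 nucleotide polymorphisms,
# mean deletion size 18 bp, substitution rate 1.9e-8 per site per year.
rate = dna_loss_rate(3, 20, 18, 1.9e-8)
linear_years = years_to_lose(rate, model="linear")
exponential_years = years_to_lose(rate, model="exponential")
print(f"loss rate: {rate:.2e} per site per year")
print(f"half lost (linear): {linear_years / 1e6:.1f} Myr")
print(f"half lost (exponential): {exponential_years / 1e6:.1f} Myr")
```

Under the exponential assumption the estimate is somewhat longer (roughly 13 to 14 Myr), which is still of the same order as the 11.8 Myr reported for Helena elements.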
Petrov et al. (1998) noticed that approximately half of the deletions observed in their data were larger than 10 bp and that many had short (2–7 bp) flanking direct duplications. They speculated that the direct duplications may be footprints of some kind of homology-based mechanism by which the deletions form. In DmαE4a-Ψ, all deletions are larger than 10 bp, but none show signs of short direct duplications (although the 7-bp insertion that is fixed among all alleles is, in its entirety, a duplication of flanking sequence).
The overall similarity between the results presented here on DmαE4a-Ψ and those presented by Petrov, Lozovskaya, and Hartl (1996) and Petrov and Hartl (1998) for Helena elements and for the swallow pseudogene (Petrov et al. 1998) supports the argument that there is a general tendency among unconstrained sequences in Drosophila to have a high rate of DNA loss, regardless of where they are located or whether they are derived from transposable elements. This rate of DNA loss for Drosophila is much greater than that for mammals. Graur, Shuali, and Li (1989) used 52 human and rodent processed pseudogenes to calculate that it would take 400 Myr to lose half of their DNA. The difference is partly due to the higher frequency of deletions in Drosophila (Petrov and Hartl [1998] estimate it is 2.6 times that of mammals) and partly due to the larger size of the deletions (approximately 7 times as large). The extent of the difference is such that Petrov, Lozovskaya, and Hartl (1996) have argued that the inherent deletion rate contributes to genome size evolution in Drosophila, whereas Ophir and Graur (1997) show that there is no significant correlation between the age of 156 murid and human pseudogenes and their decrease in size and conclude that the inherent deletion rate makes an insignificant contribution to genome size evolution in those organisms.
Could the relatively high rate at which unconstrained DNA is lost in Drosophila be due to selection for less DNA? Akashi (1995) has shown that selection coefficients of one in a million can influence codon bias and, given the estimates of relatively large population size for many Drosophila species, Charlesworth (1996) suggests that a selective advantage for slightly smaller genomes could be effective in these species. In the case of Helena elements, Petrov and Hartl (1998) have argued against this proposal by pointing out that there is no positive correlation between the age of the element and the lengths of the deletions. It is conceivable, however, that it is not the size of the deletions that is selected but rather that they occur at all, and also possibly where they occur. For instance, it may be that deletions are subject to positive selection because they prevent ectopic protein expression which would interfere with normal cellular biochemistry (Hughes and Hughes 1993). Alternatively, it is possible that the function of αE6 is compromised by an active or full-length αE4a in its second intron. With a larger and more random sample of alleles, the methods of Tajima (1989) and Fu and Li (1993) could be employed to help establish whether there is actually positive selection acting on deletions. In fact, both of these tests were applied to the nucleotide variation currently available for DmαE4a-Ψ alleles, but neither found any evidence for selection acting on the alleles (Tajima's D = 0.23, P > 0.1; Fu and Li's D = 0.35, P > 0.1).
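For readers unfamiliar with the first of these tests, the following is a minimal Python sketch of Tajima's (1989) D statistic, which contrasts the mean number of pairwise differences with the number of segregating sites. It is not the software used in this study; the function name `tajimas_d` and the toy sequences are hypothetical, and the sketch assumes equal-length aligned sequences with no gaps or missing data. It returns only the statistic, not a P value.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, gap-free sequences (n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")

    # S: number of segregating (polymorphic) sites.
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s_sites == 0:
        raise ValueError("no segregating sites")

    # k: mean number of pairwise nucleotide differences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * s_sites + e2 * s_sites * (s_sites - 1)
    return (k - s_sites / a1) / math.sqrt(variance)


# Toy data only (not alleles from this study).
intermediate = ["AAAA", "AAAA", "TTAA", "TTAA"]  # variants at 50% frequency
singletons = ["AAAA", "AAAA", "AAAA", "TTAA"]    # variants in one sequence
print(tajimas_d(intermediate))  # positive: excess of intermediate-frequency variants
print(tajimas_d(singletons))    # negative: excess of rare variants
```

Values of D near zero, such as the one reported above, are what is expected when variation is neutral and the population is at equilibrium.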
DmαE4a-Ψ is one of the rare bona fide pseudogenes so far described for Drosophila. All of the evidence suggests that it is evolving according to neutral expectations. It is, however, in a class of pseudogenes that is different from many of those described in the literature. Unlike the bulk of pseudogenes that litter the mammalian genome, there is no evidence that it was generated by reverse transcription. It is also not the typical pseudogene envisaged under Ohno's (1970) model of gene duplication. In this model (termed the "Mutation During Non-functionality" model by Hughes [1994]), a pseudogene is generated when a duplicate gene fails to happen upon a function that selection maintains. Instead, it appears that αE4a actually had a function after the gene duplication event and apparently still has one in at least two species of the melanogaster group. It is, however, no longer required in D. melanogaster. It seems possible that it is not alone in this category and that there may be a suite of previously functional genes in Drosophila for which the fixation of inactivated alleles is a very recent or contemporary event. Candidates for this category include Est7 (Balakirev and Ayala 1996), CecΨ1, CecΨ2 (Ramos-Onsins and Aguade 1998), GstD22 (Toung, Hsieh, and Chen-Pei 1993), GstD26 (Toung, Hsieh, and Chen-Pei 1993), PGLYM (Currie and Sullivan 1994), and Adh-Ψ from D. mercatorum (Sullivan et al. 1994).
We thank Charles Langley for the discussions motivating this research, as well as subsequent ones, Chip Aquadro for providing the fly strains, and Charles Claudianos, Wendy Odgers, and Dave Rowell for their stimulating discussions. We also thank an anonymous reviewer for constructive suggestions.
Footnotes
Shozo Yokoyama, Reviewing Editor
1 Keywords: pseudogene, esterase, mutation rate, Drosophila.
2 Address for correspondence and reprints: G. Charles de Q. Robin, 3347 Storer Hall, Center for Population Biology, University of California at Davis, One Shields Avenue, Davis, California 95616-8554. E-mail: gcrobin@ucdavis.edu
Literature Cited
Akashi, H. 1995. Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139:1067–1076.
Aquadro, C. F., D. J. Begun, and E. C. Kindahl. 1994. Selection, recombination and DNA polymorphism in Drosophila. Pp. 46–66 in B. Golding, ed. Non-neutral evolution, theories and molecular data. Chapman and Hall, London.
Balakirev, E. S., and F. J. Ayala. 1996. Is esterase-P encoded by a cryptic pseudogene in Drosophila melanogaster? Genetics 144:1511–1518.
Barker, J. S. F. 1994. Sequential gel electrophoretic analysis of esterase-2 in two populations of Drosophila buzzatii. Genetica 92:165–175.
Begun, D. J. 1997. Origin and evolution of a new gene descended from alcohol dehydrogenase in Drosophila. Genetics 145:375–382.
Campbell, P. M., J. Trott, C. Claudianos, K.-A. Smyth, R. J. Russell, and J. G. Oakeshott. 1997. Biochemistry of esterases associated with organophosphate resistance in Lucilia cuprina with comparisons to putative orthologues in other Diptera. Biochem. Genet. 35:17–40.
Charlesworth, B. 1996. The changing size of genes. Nature 384:315–316.
Claudianos, C., R. J. Russell, and J. G. Oakeshott. 1999. The same amino acid substitution in orthologous esterases confers organophosphate resistance on the house fly and a blowfly. Insect Biochem. Mol. Biol. 29:675–686.
Currie, P. D., and D. T. Sullivan. 1994. Structure, expression and duplication of genes which encode phosphoglyceromutase of Drosophila melanogaster. Genetics 138:352–363.
David, J. R. 1982. Latitudinal variability of Drosophila melanogaster: allozyme frequency divergence between European and Afrotropical populations. Biochem. Genet. 20:747–761.
Dumancic, M. M., J. G. Oakeshott, R. J. Russell, and M. J. Healy. 1997. Functional conservation of the Drosophila melanogaster ESTP protein in drosophilids. Biochem. Genet. 35:251–271.
Fu, Y.-X., and W.-H. Li. 1993. Statistical tests of neutrality of mutations. Genetics 133:693–709.
Genetics Computer Group. 1994. GCG. Version 8.0. GCG, Madison, Wis.
Gloor, G. B., and W. R. Engels. 1992. Single-fly preps for PCR. Drosophila Inform. Serv. 71:148.
Gottlieb, L. D., and V. S. Ford. 1997. A recently silenced, duplicate PgiC locus in Clarkia. Mol. Biol. Evol. 14:125–132.
Graur, D., Y. Shuali, and W.-H. Li. 1989. Deletions in processed pseudogenes accumulate faster in rodents than in humans. J. Mol. Evol. 28:279–285.
Healy, M. J., M. M. Dumancic, and J. G. Oakeshott. 1991. Biochemical and physiological studies of soluble esterases from Drosophila melanogaster. Biochem. Genet. 29:365–388.
Hudson, R. R. 1987. Estimating the recombination parameter of a finite population model without selection. Genet. Res. 50:245–250.
Hudson, R. R., M. Kreitman, and M. Aguade. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.
Hughes, A. L. 1994. The evolution of functionally novel proteins after gene duplication. Proc. R. Soc. Lond. B Biol. Sci. 256:119–124.
Hughes, M. K., and A. L. Hughes. 1993. Evolution of duplicate genes in a tetraploid animal, Xenopus laevis. Mol. Biol. Evol. 10:1360–1369.
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, New York.
Kreitman, M., and R. R. Hudson. 1991. Inferring the evolutionary histories of the Adh and Adh-dup loci in Drosophila melanogaster from patterns of polymorphism and divergence. Genetics 127:565–582.
Langley, C. H., R. A. Voelker, A. J. Leigh Brown, S. Ohnishi, B. Dickinson, and E. Montgomery. 1981. Null allele frequencies at allozyme loci in natural populations of Drosophila melanogaster. Genetics 99:151–156.
Li, W.-H., T. Gojobori, and M. Nei. 1981. Pseudogenes as a paradigm of neutral evolution. Nature 292:237–239.
Long, M., and C. H. Langley. 1993. Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila. Science 260:91–95.
McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.
Miyata, T., and T. Yasunaga. 1981. Rapidly evolving mouse alpha-globin related pseudogene and its evolutionary history. Proc. Natl. Acad. Sci. USA 78:450–453.
Moriyama, E. N., and J. R. Powell. 1996. Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13:261–277.
Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
Newcomb, R. D., P. M. Campbell, R. J. Russell, and J. G. Oakeshott. 1997. A single amino acid substitution converts a carboxylesterase to an organophosphate hydrolase and confers insecticide resistance on a blowfly. Proc. Natl. Acad. Sci. USA 94:7464–7468.
Newcomb, R. D., P. D. East, R. J. Russell, and J. G. Oakeshott. 1996. Isolation of the esterase genes associated with organophosphate resistance in Lucilia cuprina. Insect Biochem. Mol. Biol. 5:211–216.
Oakeshott, J. G., T. M. Boyce, R. J. Russell, and M. J. Healy. 1995. Molecular insights into the evolution of an enzyme: esterase 6 in Drosophila. Trends Ecol. Evol. 10:103–110.
Oakeshott, J. G., C. Claudianos, R. J. Russell, and G. C. Robin. 1999. Carboxyl/cholinesterases: a case study of the evolution of a successful multigene family. BioEssays 21:1031–1042.
Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Berlin.
Ophir, R., and D. Graur. 1997. Patterns and rates of indel evolution in processed pseudogenes from humans and murids. Gene 205:191–202.
Petrov, D. A., Y.-C. Chao, E. C. Stephenson, and D. L. Hartl. 1998. Pseudogene evolution in Drosophila suggests a high rate of DNA loss. Mol. Biol. Evol. 15:1562–1567.
Petrov, D. A., and D. L. Hartl. 1998. High rate of DNA loss in the Drosophila melanogaster and Drosophila virilis species groups. Mol. Biol. Evol. 15:293–302.
Petrov, D. A., E. R. Lozovskaya, and D. L. Hartl. 1996. High intrinsic rate of DNA loss in Drosophila. Nature 384:346–349.
Powell, J. R., and R. DeSalle. 1995. Drosophila molecular phylogenies and their uses. Pp. 88–137 in M. K. Hecht, ed. Evolutionary biology. Vol. 28. Plenum Press, New York.
Press, W. H. 1988. Numerical recipes in C: the art of scientific computing. Cambridge University Press, New York and Cambridge, England.
Pritchard, J. K., and S. W. Schaeffer. 1997. Polymorphism and divergence at a Drosophila pseudogene. Genetics 147:199–208.
Ramos-Onsins, S., and M. Aguade. 1998. Molecular evolution of the cecropin multigene family in Drosophila: functional genes vs. pseudogenes. Genetics 150:157–171.
Robin, G. C. de Q., K. M. Medveczky, R. J. Russell, and J. G. Oakeshott. 1996. Duplication and divergence of the genes of the alpha-esterase cluster of Drosophila melanogaster. J. Mol. Evol. 43:241–252.
Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular analysis. Bioinformatics 15:174–175.
Russell, R. J., G. C. Robin, P. Kostakos, R. D. Newcomb, T. M. Boyce, K. M. Medveczky, and J. G. Oakeshott. 1995. Molecular cloning of an esterase gene cluster on chromosome 3R of Drosophila melanogaster. Insect Biochem. Mol. Biol. 26:235–247.
Sharp, P. M., and W.-H. Li. 1989. On the rate of DNA sequence evolution in Drosophila. J. Mol. Evol. 28:398–402.
Spackman, M. E., J. G. Oakeshott, K.-A. Smyth, K. M. Medveczky, and R. J. Russell. 1994. A cluster of esterase genes on chromosome 3R of Drosophila melanogaster includes homologues of esterase genes conferring insecticide resistance in Lucilia cuprina. Biochem. Genet. 32:39–62.
Sullivan, D. T., W. T. Starmer, S. W. Curtiss, M. Menotti-Raymond, and J. Yum. 1994. Unusual molecular evolution of an Adh pseudogene in Drosophila. Mol. Biol. Evol. 11:443–458.
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
Tajima, F. 1993. Simple methods of testing the molecular evolutionary clock hypothesis. Genetics 135:599–607.
Toung, Y.-P. S., T. S. Hsieh, and D. T. Chen-Pei. 1993. The glutathione S-transferase D genes: a divergently organized, intronless gene family in Drosophila melanogaster. J. Biol. Chem. 268:9737–9746.
Trabesinger-Ruef, N., T. Jermann, T. Zankel, B. Durrant, G. Frank, and S. A. Benner. 1996. Pseudogenes in ribonuclease evolution: a source of new biomacromolecular function? FEBS Lett. 382:319–322.
Walsh, J. B. 1995. How often do duplicated genes evolve new functions? Genetics 139:421–428.
Watterson, G. A. 1975. On the number of segregating sites in genetic models without recombination. Theor. Popul. Biol. 7:256–276.