Rapidly Evolving Genes in Human. I. The Glycophorins and Their Possible Role in Evading Malaria Parasites

Hurng-Yi Wang*,†,‡, Hua Tang‡, C.-K. James Shen† and Chung-I Wu‡

* Department of Biology, National Taiwan Normal University, Taipei, Taiwan
† Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
‡ Department of Ecology and Evolution, University of Chicago

Correspondence: E-mail: ciwu@uchicago.edu.


    Abstract
 
In an attempt to identify all fast-evolving genes between human and other primates, we found three glycophorins, GPA, GPB, and GPE, to have the highest rate of nonsynonymous substitutions among the 280 genes surveyed. The Ka/Ks ratios are generally greater than 3 for GPA, GPB, and GPE in human, chimpanzee, and gorilla, indicating positive selection. The uniformly high substitution rate across loci can be explained by the frequent sequence exchanges among genes. GPA is the receptor for the binding ligand EBA-175 of the malaria parasite, Plasmodium falciparum. The levels of nonsynonymous divergence and polymorphism of EBA-175 are also the highest in the genome of P. falciparum. We hypothesize that GPA has been evolving rapidly to evade malaria parasites. Both the high rate of nonsynonymous substitutions and the frequent interlocus conversions may be means of evasion. The support for the evasion hypothesis is still indirect, but, unlike other hypotheses, it can be tested specifically and systematically.

Key Words: positive selection • glycophorin • malaria • gene conversion • rapid evolution


    Introduction
 
The influx of DNA sequences has permitted a systematic analysis of rapidly evolving genes between closely related species. Such genes are enriched with information about the action of positive Darwinian selection and may shed light on the process of genic adaptation (Kitano and Saitou 1999; Wyckoff, Wang, and Wu 2000; Fay, Wyckoff, and Wu 2001; Johnson et al. 2001; Enard et al. 2002; Fay, Wyckoff, and Wu 2002; Smith and Eyre-Walker 2002; Bamshad and Wooding 2003). We have thus initiated an attempt at characterizing each rapidly evolving gene among higher primates. Previous studies of rapidly evolving genes have revealed a trend among genes of male reproduction (Civetta and Singh 1998; Ting et al. 1998; Wyckoff, Wang, and Wu 2000; Swanson et al. 2001) and defense against pathogens (Yang and Bielawski 2000). A systematic survey can confirm the generality of those observations and add interesting exceptional cases.

There are currently more than 500 published coding sequences from the Old World monkey (OWM) that can be inferred to be orthologous to a human gene. Among them, 280 meet a set of criteria (length, completeness, etc.; see Materials and Methods) to become the basis of our analysis. Of these 280 genes, 17 have been determined to be fast evolving. For each of these fast-evolving genes, we ask (1) if the gene has evolved especially rapidly in the human lineage; (2) if there are specific sites under positive selection; and (3) what may be the forces driving the rapid evolution.

In this data set, the fastest evolving genes in the human lineage are the glycophorins. There are three glycophorin loci in human, chimpanzee, and gorilla but only one in other primate species (Rearden et al. 1993; Blumenfeld et al. 1997). In human, glycophorin A (GPA) and B (GPB) code for antigens underlying the very common MN and Ss blood type polymorphisms, respectively. At least 40 other blood types are caused by the glycophorin variation (Blumenfeld and Huang 1995). Rearrangements by unequal recombination and/or gene conversion between glycophorin genes appear to be very common, and hot spots of recombination exist in a region of 4 kb encompassing the three extracellular exons (II, III, and IV) (Blumenfeld and Huang 1997).

Although GPA is the most abundant glycoprotein on the erythrocyte surface, an exceptional individual homozygous for a deletion of both GPA and GPB is known to have reached normal adulthood (Schenkel-Brunner 2000). In human, GPA has been shown to be the receptor of a binding ligand, the 175-kD erythrocyte-binding antigen (EBA-175) of Plasmodium falciparum (Pasvol, Wainscoat, and Weatherall 1982; Sim et al. 1994). A recent study analyzed the evolution of GPA among higher primates (Baum, Ward, and Conway 2002) and suggested a "decoy" hypothesis for its rapid evolution. To further evaluate the possible forces driving the evolution of glycophorins, we sequenced the extracellular domain (exons II, III, and IV) of all three glycophorin genes from human, chimpanzee, and gorilla and the single gene from gibbon. We also analyzed published DNA sequences of P. falciparum in conjunction with the glycophorin sequences. An alternative "evasion" hypothesis is proposed to account for the overall patterns that are not easily discernible in the GPA sequences alone.


    Materials and Methods
 
Data Collection
All DNA sequences of the OWM (mostly Macaca and Papio) were downloaded from GenBank. Coding region sequences (CDS) were trimmed from the GenBank records. For each OWM sequence, we chose as the putative ortholog the human RefSeq entry that has the highest score and lowest E value in the BLAST search. Among more than 500 pairs of sequences so identified, we retained those with matching annotations between human and OWM and rejected those with many ambiguous codons. Finally, we calculated the Ka and Ks values for each putative pair of orthologs by the method of Li (1993), where Ka is the number of nonsynonymous substitutions per nonsynonymous site and Ks is the number of synonymous substitutions per synonymous site. We finalized the data set using the 280 genes with Ks < 0.15 (most with Ks < 0.1; the average being 0.066). Because the objective is to compare Ka and Ks, the occasional misidentification of orthology should not affect the main goal of searching for genes with high Ka/Ks ratios. Figure 1 plots these values.



FIG. 1. The Ka values of 280 orthologous genes between human and macaque are plotted against Ks. GPA is indicated (GPB and GPE are not shown)

 
DNA Sequencing
The extracellular domain of the glycophorin genes, which spans exons II, III, and IV (fig. 2), was targeted for polymerase chain reaction (PCR). Two primers, 5'-GTT CTT AAT CCC TTT CTC AAC TTC-3' and 5'-GCA TTT GAA ACA AGC AAT GGA TAG-3', were used for amplification of DNAs from 10 humans (two African Americans, two Caucasians, four Han Chinese, and two Ami aborigines in Taiwan), three chimpanzees, two gorillas, and one gibbon. The amplification used the following cycling parameters: initial denaturation at 94°C for 2 min, followed by 35 cycles of denaturation at 94°C for 1 min, primer annealing at 58°C for 1 min, and extension at 72°C for 1.5 min. The last cycle employed an extension time of 10 min. PCR products were purified from agarose gels with the Gene Clean III elution kit (BIO 101, USA). The PCR products, which may be GPA, GPB, or GPE, were cloned into the pCR2.1-TOPO vector.



FIG. 2. The canonical genomic structure of human GPA, GPB, and GPE. These genes are tandemly arranged, and exons 2 to 4 account for 226 bp of the 2-kb region sequenced. The LSTTE/SSTTG stretches of amino acids represent the N/M antigenic determinants. The M/T amino acids in exon IV determine the S/s antigens. Unexpressed pseudoexons (φ) are represented by filled boxes followed by the splice site mutations

 
In total, 29 human, nine chimpanzee, and six gorilla sequences were obtained. Sequencing reactions were carried out on both strands, often by using internal primers as well. The sequences were determined with dye terminator cycle sequencing reactions on an Applied Biosystems 377A sequencer. Because no singletons were found in the coding regions, PCR errors should not be a major concern. We also verified the cases of gene conversion by new rounds of PCR and sequencing. (New sequences described in this study are human AY297541 to AY297569, chimpanzee AY297570 to AY297578, gorilla AY297580 to AY297585, and gibbon AY297579.)

Data Analysis
To calculate the sequence divergence in introns, Ki (Ki = number of nucleotide changes per intronic site), Kimura's two-parameter model was used (Kimura 1980). For coding regions, we used Li's (1993) method to calculate Ka and Ks as stated above. Ki in general may be a better representation of the neutral rate than Ks. In addition, there are far more intronic sites than the synonymous ones in this study. Because of the closeness between human and macaque, the results are essentially the same if other methods are used (Yang 1997; Yang and Bielawski 2000). For the phylogenetic tree of figure 3, we used the neighbor-joining method (Saitou and Nei 1987) based on Kimura's two-parameter model. Figure 4 shows evidence of gene conversion, which distorts the phylogenetic relationship.
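
As an illustration of the intron divergence calculation, the following is a minimal sketch of Kimura's two-parameter distance for one pair of aligned sequences. It is not the authors' code: the function name `k2p_distance` and the choice to skip gaps and ambiguous bases are assumptions made for this example. The formula used is the standard one, K = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; sites with gaps or ambiguous
    bases are skipped (an assumption, not a detail given in the text).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1

    if sites == 0:
        raise ValueError("no comparable sites")

    p = transitions / sites    # proportion of transitional differences
    q = transversions / sites  # proportion of transversional differences
    # math.log raises ValueError if the sequences are saturated
    # (argument <= 0), which should not happen at primate-level divergence.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For closely related sequences such as those compared here, the corrected distance is only slightly larger than the raw proportion of differences.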



FIG. 3. The phylogeny of the glycophorin sequences. Human sequences that conform to the canonical structure of figure 2 are labeled hGPA, hGPB, and hGPE, respectively. c and g denote chimpanzee and gorilla sequences, respectively. We designate the nonhuman sequences only as clusters 1 to 3 because the A-B-E designation is appropriate only in human (see text). Numbers at the nodes are the percentages of bootstrap support from 500 replications. Nodes with a bootstrapping value lower than 60% are collapsed into a single one. In the inset, the expected phylogeny among these genes in the absence of gene conversion is shown (Rearden et al. 1993). Note that the observed star-like phylogeny among the major clusters contrasts sharply with the expected tree of orderly bifurcation

 


FIG. 4. Three examples of segmental gene conversion among human glycophorin sequences. (a) hGPA-var1 (variant 1 of hGPA) was converted by hGPE. The first base shown here is position 701 from the beginning of exon 2. (b) hGPA-var2 and hGPA-var3 were converted by hGPE. The first base shown is position 1201. These two variant sequences are from Baum, Ward, and Conway (2002). Dot (.) and dash (-) denote identical nucleotide and gap, respectively

 
For the divergence of figure 5a, every sequence of human, chimpanzee, and gorilla (h/c/g) was compared with that of gibbon. For figure 5b, we compared sequences among h/c/g (but not within species). The total number of interspecific comparisons is therefore 489 (29 × 9 + 29 × 6 + 9 × 6), based on 29 human, nine chimpanzee, and six gorilla sequences. Intraspecific comparisons are not included here because of the presence of many nearly identical alleles. The average Ka/Ks and Ka/Ki values were calculated by averaging Ks/Ka and Ki/Ka over all comparisons and taking the reciprocals of those averages. We present this harmonic mean because Ks and Ki are sometimes 0, in which case the individual Ka/Ks or Ka/Ki ratio is undefined.
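
The counting and averaging described above can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and the Ka and Ks values in the usage note are invented to show how a zero Ks is handled.

```python
from itertools import combinations


def interspecific_pairs(sample_sizes: dict) -> int:
    """Number of sequence pairs drawn from two different species."""
    return sum(n1 * n2 for n1, n2 in combinations(sample_sizes.values(), 2))


def reciprocal_mean_ratio(ka_values, kx_values) -> float:
    """Average Kx/Ka over all comparisons, then take the reciprocal.

    Kx stands for Ks or Ki. A comparison with Kx = 0 contributes 0 to the
    average instead of producing an undefined Ka/Kx ratio. Ka must be
    nonzero in every comparison, and at least one Kx must be positive.
    """
    inverse_ratios = [kx / ka for ka, kx in zip(ka_values, kx_values)]
    return len(inverse_ratios) / sum(inverse_ratios)


samples = {"human": 29, "chimpanzee": 9, "gorilla": 6}
print(interspecific_pairs(samples))  # 489
```

With the invented values Ka = [0.10, 0.12] and Ks = [0.0, 0.04], `reciprocal_mean_ratio` returns approximately 6, whereas a direct average of Ka/Ks could not be computed because the first ratio is undefined.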



FIG. 5. (a) Distributions of Ka, Ks, and Ki between the gibbon glycophorin and the 44 sequences from human, chimpanzee, and gorilla (h/c/g for short). Ka, Ks, and Ki are, respectively, the substitution numbers for nonsynonymous, synonymous, and intron sites. The results are similar when the macaque sequence is used as the outgroup. (b) Distributions of the Ka/Ks and Ka/Ki ratios in the interspecific comparisons between human, chimpanzee, and gorilla. Note that the number of comparisons in (b) is much larger than that in (a)

 
To identify the putative amino acid residues under positive selection and to estimate the strength of selection, Model 8 of codeml in the PAML package was used (Yang 1997; Yang et al. 2000). We chose one representative gene from each genealogical cluster for the analysis; that is, one GPA and one GPB sequence from each of human, chimpanzee, and gorilla, as well as the single gene from orangutan, gibbon, and OWM. The human GPE-like sequences (fig. 3) were not chosen because of the presence of two pseudoexons out of the three exons (Blumenfeld et al. 1997).

For the malaria parasite, the nine genes used for the comparison with EBA-175 are STARP, CSP, AMA-1, Pfs25, RAP-1, sporozoite antigen, MSP3, Pfg27/25, and Pfs48/45. For genes with two major alleles in P. falciparum, such as EBA-175 and MSP3, the one that is closer to the P. reichenowi sequence was selected for comparison.


    Results
 
Survey of Rapidly Evolving Genes Between Human and the Macaque Monkey
The goal is to identify fast-evolving genes between human and OWM. The analysis will then be extended to other primates and human populations. We use the standard procedure to measure the rate of coding sequence evolution (Ka and Ks; see Materials and Methods). In general, if Ka is significantly greater than Ks, putative positive selection is suggested.

In figure 1, Ka is plotted against Ks for the 280 genes from human and OWM. The data set likely has an overrepresentation of fast-evolving genes, perhaps due to a greater interest and effort at finding and publishing such genes by investigators. (The extrapolation from any data set to the whole genome will be plagued by possible biases in representation until the two respective genomes are nearly entirely sequenced and annotated.) In this data set, the average Ka is 2.67% and average Ks is 6.60%, their ratio being 0.405. The Ka/Ks value averaged across all 280 genes is 0.462. There is a slight and positive correlation (r = 0.262 and slope = 0.242) between Ka and Ks, as noticed by many authors in diverse taxonomic groups (Comeron and Kreitman 1998; Makalowski and Boguski 1998).
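
The two summary values quoted above differ because one is a ratio of averages and the other is an average of per-gene ratios. The sketch below illustrates the distinction. The three-gene data set is invented for illustration and is not taken from the survey; only the reported averages (2.67% and 6.60%) come from the text.

```python
def ratio_of_means(ka, ks):
    """Average Ka divided by average Ks (the 0.405 type of summary)."""
    return (sum(ka) / len(ka)) / (sum(ks) / len(ks))


def mean_of_ratios(ka, ks):
    """Per-gene Ka/Ks averaged across genes (the 0.462 type of summary)."""
    return sum(a / s for a, s in zip(ka, ks)) / len(ka)


# Reported averages from the text: Ka = 2.67%, Ks = 6.60%.
print(round(0.0267 / 0.0660, 3))  # 0.405

# Hypothetical toy data: genes with small Ks inflate the mean of ratios.
toy_ka = [0.01, 0.03, 0.04]
toy_ks = [0.02, 0.10, 0.08]
print(ratio_of_means(toy_ka, toy_ks), mean_of_ratios(toy_ka, toy_ks))
```

In the toy data the mean of ratios exceeds the ratio of means, the same direction as the difference between 0.462 and 0.405 in the survey.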

Genes above the diagonal have a Ka/Ks ratio greater than 1, and there are 26 of them. Many of these genes have an unusually small Ks, rather than a large Ka. Because smaller genes tend to experience wider fluctuations in Ks, the Ka/Ks criterion may result in the overinclusion of small genes among the fast-evolving ones. We therefore suggest a different measure, δ = (Ka − Ks)/σ_Ks, where σ_Ks is the standard deviation of Ks. When Ka/Ks > 1, δ > 0. In table 1, 13 genes with δ > 1 are listed in descending order of δ, plus four other genes that have δ < 1 but Ka > 0.08. The δ statistic may have an advantage over the standard Ka/Ks presentation when portraying the rate of nonsynonymous substitution among small genes.
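
A minimal sketch of the δ statistic follows, reading it as the difference between Ka and Ks scaled by the standard deviation of Ks. The function name and the two example genes are hypothetical, and how the standard deviation of Ks is obtained for each gene is left outside this sketch.

```python
def delta(ka: float, ks: float, sigma_ks: float) -> float:
    """delta = (Ka - Ks) / sigma_Ks for a single gene."""
    if sigma_ks <= 0:
        raise ValueError("standard deviation of Ks must be positive")
    return (ka - ks) / sigma_ks


# Hypothetical short gene: Ka/Ks = 3, but Ks is poorly estimated.
short_gene = delta(ka=0.03, ks=0.01, sigma_ks=0.03)

# Hypothetical longer gene: Ka/Ks = 2, with a tighter Ks estimate.
long_gene = delta(ka=0.14, ks=0.07, sigma_ks=0.02)

print(short_gene, long_gene)
```

In this invented example the short gene has the higher Ka/Ks ratio but δ below 1, while the longer gene has δ above 1, which is the kind of reordering the measure is meant to produce.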


Table 1 A List of 17 Most Rapidly Evolving Genes Between Human and Old World Monkey (OWM).

 
One of the major categories in table 1 is the genes involved in immune response, including CD3G, CD59, CD3E, MSMB, interleukin 3, and interleukin 4. In addition, one of the three blood group–related genes, GYPA (= GPA), has also been shown to be involved in a pathogen interaction. Therefore, more than one-third of the genes listed in table 1 are involved in defense. Two mitochondrial respiratory complex enzymes, COX8 and NDUFC2, are fast evolving. These nuclear genes may have coevolved with mitochondria-encoded genes. The rest include two reproductive genes, PRM1 and SPINK2, one glycoprotein hormone, CGA, and one digestive enzyme, LYZ. The functions of MGC2217 and GW128 are yet to be identified. Among the top 13 entries, almost all of them have a Ks value below the average of 6.6%, with the exception of GPA (glycophorin A).

In our study, we chose genes for further analysis on the basis of both a high Ka/Ks ratio and a high Ka value. GPA has the highest Ka among all the genes surveyed. Another gene with a comparably high Ka value is the protamine, which has been analyzed in detail already (Rooney, Zhang, and Nei 2000; Wyckoff, Wang, and Wu 2000).

Evolution of the Glycophorins
Figure 2 shows the canonical genomic structure of GPA, GPB, and GPE in human. These sequences are defined by sites previously recognized in serological analyses and by the exon-intron splicing patterns. However, the delineation of the A, B, and E loci applies only to human. In both chimpanzee and gorilla, their GPBs (like human GPA) do not skip exon III. In gorilla, GPA has a common allele that skips this exon (Huang et al. 1995; Xie et al. 1997). The reason that the locus designation does not agree among species is explored below.

Gene Conversion
The 29 sequences from humans consist of 10 GPA, nine GPB, and 10 GPE alleles. The nine sequences from chimpanzee and six sequences from gorilla cannot be categorized according to the canonical structure of figure 2. Figure 3 presents the phylogeny of the glycophorin sequences from human, chimpanzee, gorilla (h/c/g for short) and the outgroup, gibbon, based on the 2-kb region shown in figure 2. Since the gene triplication occurred before the speciation among the three species (Rearden et al. 1993; Blumenfeld et al. 1997), we had expected three clusters representing the A, B, and E locus, respectively (see the inset in figure 3). Instead, nine clusters were observed. Although each cluster likely represents alleles of a locus of one species, there is no clear phylogenetic relationship among the clusters. (The appearance of an E-locus cluster is deceiving as the distances between species are too high for orthologous genes.)

A simple way to see the phylogenetic incongruence with the expectation is given in table 2. The divergence between orthologous loci of human and chimpanzee, and between these two species and gorilla, should be around 1.2% and 1.5%, respectively (Chen and Li 2001). Instead, no two clusters are less than 2.4% apart, suggesting the absence of true orthology among these clusters. Because the distances from all nine clusters to gibbon are close to the genomic average (Sibley and Ahlquist 1987; Li 1997), mutation rate is not the source of incongruence.


Table 2 Average Differences Between Sequences of the Nine Phylogenetic Clusters of Figure 3 and the Sequence of Gibbon.

 
Gene conversion is a plausible explanation for the divergence patterns of figure 3 and table 2. For example, conversion of locus B by locus A in any species would destroy the former's orthology with those of the other two species. Among our human sequences, three hybrid molecules have been identified, including two MiIII variants (Huang and Blumenfeld 1991), which are the GPB allele partially converted by GPA, and one GPA allele converted by GPE in intron 3. This latter case is shown in figure 4a. We also noticed small segments shared by hGPA and hGPE in Baum, Ward, and Conway's (2002) sequences (figure 4b). In addition, more than 40 glycophorin rearrangements have been discovered (Blumenfeld and Huang 1995), and hot spots of recombination have been identified in this region (Blumenfeld and Huang 1997). Overall, sequence exchanges among loci appear quite active in these species, but the conversion has been segmental. (Otherwise, sequences of all three loci from the same species would have been more closely related phylogenetically.) Partial conversions resulted in sequences that are recombinants between loci and, hence, are of highly mixed ancestry. This mixed ancestry may account for the lack of clear-cut phylogenetic pattern in figure 3.

In general, the long-term consequence of gene conversion is homogenization among loci, but that is not the effect of our concern. The comparison we consider here is the level of genetic variation across multiple loci that undergo occasional gene conversion vis-à-vis the level of single-locus variation. The former should generally be higher than the latter. This elevated level of variation may be characterized as the "storage and retrieval" effect, which plays a central role in the "malaria evasion" hypothesis to be elaborated later.

Rate of Evolution
A result of frequent partial gene conversions is that all three loci would have evolved at a comparable rate. In figure 5a, the divergence between the single glycophorin of gibbon and those of all three loci of h/c/g is shown. Indeed, all h/c/g glycophorin sequences have evolved at an extraordinarily high rate. The average Ka is 0.143, three times as high as the average Ks (0.045). (In human, GPB and GPE appear to have slightly smaller Ka/Ks ratios than GPA, an observation that may be accounted for by the presence of pseudoexons in these two genes in human.) It is striking that Ka, ranging from 0.101 to 0.196, does not overlap with Ks (0.031–0.073) in a total of 44 comparisons. The intron regions have evolved at an even slower rate than Ks, with Ki ranging from 0.038 to 0.049 (mean = 0.042).

The high rate of Ka does not depend on the choice of outgroup. Between the h/c/g sequences and that of the macaque, the average Ka/Ks ratio is 2.0, which increases to 3.56 if the outgroup is orangutan. It appears that the rate of amino acid substitution has accelerated since the apes separated from the OWMs. To address the possibility of recent acceleration in nonsynonymous substitutions, we compare the glycophorins among human, chimpanzee, and gorilla. Because each pairwise comparison represents a different genealogical depth, the Ka/Ks or Ka/Ki ratio was calculated for each of the 489 interspecific comparisons. In figure 5b, most of the Ka/Ks and Ka/Ki values are greater than 1, with an average of 4.0 and 2.61, respectively. These high ratios indicate that the selective pressure driving amino acid substitutions in glycophorins may have intensified since the time of African apes' common ancestor. It is significant that the increased selective pressure appears to be on all three loci.

Amino Acid Sites Under Selection
Given the large number of glycophorin coding sequences, we attempted to identify the putative amino acid residues under positive selection and estimate the strength of selection by the maximum likelihood method of Yang et al. (2000). Using Model 8, we estimated that 78% of the residues have evolved under near neutrality with Ka/Ks ≈ 1, and 22% have been driven by positive selection. According to this model, the average Ka/Ks ratio for these selectively driven sites is as high as 7.7. These fast-evolving sites are distinct from the glycosylation sites, which are relatively conserved among primate species (Baum, Ward, and Conway 2002). These sites are listed in Supplementary Material online.


    Discussion
 
What drives the rapid evolution of the glycophorins? Baum, Ward, and Conway (2002) previously suggested a "decoy hypothesis" based on the rapid evolution of the GPA sequences. In this report, we show that all three glycophorin loci have comparably high Ka/Ks ratios in human, chimpanzee, and gorilla. These observations have led to an alternative suggestion, which will be referred to as the "evasion hypothesis."

The Decoy Hypothesis
In this hypothesis, GPA serves as a "decoy" to distract viruses and bacteria away from other vital organs (Baum, Ward, and Conway 2002). In general, one might not expect a decoy to evolve rapidly, as it should be made easy to find. To explain the rapid evolution of GPA, the hypothesis posits that a decoy behaves like the immunoglobulins, which diversify rapidly to cope with a wide array of antigens. Glycophorins, however, have rather different evolutionary dynamics from the immunoglobulins. Although they have been evolving rapidly, the heterozygosity in GPA in any individual is quite unremarkable, other than the common MN polymorphism. The low abundance of GPB, which underlies the Ss polymorphism, and the undetectable expression of GPE also seem incompatible with the postulate of diversity enhancement. In addition, since there are many sialic proteins specifically expressed on the erythrocyte surface (e.g., Kell, GPC, and Duffy [Schenkel-Brunner 2000]), they might be adequate decoys as well. Why then have they not been evolving rapidly like the glycophorins? Under the decoy hypothesis, pathogenic antigens must themselves be changing rapidly, so the decoy has to keep up the pace. Without identifying the candidate pathogens that the decoy serves to distract, the hypothesis at the moment is not testable.

Alternatively, rapid evolution between loci and low diversity within each locus may suggest evasion. The many instances of interlocus conversion would also mean frequent and abrupt changes in the receptor structure. The evasion hypothesis below is very specific about the candidate pathogen and can be falsified with proper experimental setups.

The Evasion (from P. falciparum) Hypothesis
In human, both GPA and GPB have been shown to be the receptors of the malaria parasite, P. falciparum (Pasvol, Wainscoat, and Weatherall 1982; Dolan et al. 1994). The malaria ligand binding to human GPA has been identified to be the 175-kD erythrocyte-binding antigen (EBA-175). Its binding to GPA has been shown to be the primary pathway by which P. falciparum invades human erythrocytes (Sim et al. 1994). The proposed hypothesis is that GPA has been evolving rapidly to evade the malaria parasite. Both mutations and interlocus conversions are means of evasion; hence, the GPB and GPE sequences have been impacted by malaria as well. It should be noted that, under the evasion hypothesis, binding to EBA-175 is considered a negative pleiotropic effect of GPA. The normal function(s) of the glycophorins currently remain(s) unknown.

In parallel, EBA-175 may have been tracking the evolution of the glycophorins and, if true, should be evolving just as rapidly as the glycophorins. All these fast-evolving molecules should bear a strong signature of positive selection. In this section, we shall outline the evidence to show that the hypothesis is a viable one and deserves to be seriously tested. (The scope of the actual testing is, however, beyond the goals of the present study.)

Interspecific Divergence in EBA-175 and Other Loci
We first examined the nucleotide substitution rate of EBA-175 vis-à-vis those of other genes between P. falciparum and P. reichenowi, the latter infecting chimpanzee (Ozwara et al. 2001). The entire EBA-175 (4,359 bp) has a Ka value of 0.095 (SE = 0.009) and a Ks value of 0.074 (SE = 0.015). The difference is significant (P < 0.01), as determined by the simulation method of Wyckoff, Wang, and Wu (2000). EBA-175 has apparently been under positive selection since the speciation between the two Plasmodia (and, presumably, their hosts, human and chimpanzee). We also computed the Ka and Ks values for nine other antigen-coding loci (table 3). The values range from 0.014 to 0.058 for Ka and from 0.027 to 0.152 for Ks. Many of these antigens, including AMA-1, CSP, MSP-3, and Pfs48/45, are themselves under positive selection (Hughes and Hughes 1995; Escalante, Lal, and Ayala 1998), but none has a higher Ka value than EBA-175.


Table 3 The Divergence of 10 Genes Between P. falciparum and P. reichenowi.

 
Polymorphism in EBA-175
The erythrocyte-binding domain of EBA-175 has been identified to be the 5' cysteine-rich region, a 616–amino acid stretch called "region II" (Sim et al. 1994). DNA sequences from region II are available from 20 worldwide strains of P. falciparum (Liang and Sim 1997). In this sample, there are 20 nonsynonymous variants and one synonymous variant. The observed low level of synonymous polymorphism corroborates the interpretation of a recent loss of genetic variation in P. falciparum (Rich and Ayala 1998; Volkman et al. 2001).

In comparison, the level of nonsynonymous variation is too high to be attributed to demographic influences. More revealing are the population frequencies of these nonsynonymous changes. The frequency spectrum of the derived mutations is shown in figure 6. Against the neutral equilibrium (blank bar), there is an excess of high-frequency mutations by Fay and Wu's H statistic (Fay and Wu 2000) (P < 0.05), a sign of positive selection (Ewens 1979). Because P. falciparum is believed to have experienced a recent loss in neutral variation and should have an excess of rare mutations over the neutral equilibrium (Tajima 1989; Fu 1994), the excess in high-frequency mutations in figure 6 is even more noteworthy. Such an excess in region II can best be accounted for by either global positive selection (Ewens 1972; Fay and Wu 2000) or local selection leading to population differentiation (Fu 1994; Slatkin and Wiehe 1998). (Again, the lack of differentiation at the synonymous sites rules out the possibility of neutral population subdivision.)



FIG. 6. The frequency spectrum of nonsynonymous polymorphism in region II of EBA-175 of P. falciparum. At 17 of the 20 sites, the derived nucleotide can be inferred by reference to the P. reichenowi sequence; the remaining three polymorphisms are ambiguous and are not included. The frequency of occurrences of these mutations in a sample of 20 genes is given on the X-axis while the Y-axis shows the number of sites. The frequency spectrum at the neutral equilibrium is given by θ/i, where i is the number of occurrences (Fu 1994). θ is the population parameter (4Nu) and is estimated by θ(1 + 1/2 + ··· + 1/19) = 17 (Watterson 1975). The lone synonymous mutation occurs only once in the sample
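
The neutral expectation described in the caption can be reproduced with a short sketch. The function names are hypothetical; the inputs (17 polarized sites, a sample of 20 genes) are taken from the caption. Solving the caption's equation gives an estimate of the population parameter of about 4.79.

```python
def harmonic_number(n: int) -> float:
    """1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def watterson_theta(segregating_sites: int, sample_size: int) -> float:
    """Watterson's estimator: S divided by the sum 1 + 1/2 + ... + 1/(n-1)."""
    return segregating_sites / harmonic_number(sample_size - 1)


def neutral_spectrum(theta: float, sample_size: int) -> dict:
    """Expected number of sites whose derived allele occurs i times.

    Under the neutral equilibrium the expectation is theta / i for
    i = 1 .. n-1.
    """
    return {i: theta / i for i in range(1, sample_size)}


theta = watterson_theta(segregating_sites=17, sample_size=20)
expected = neutral_spectrum(theta, sample_size=20)

print(round(theta, 2))                   # 4.79
print(round(sum(expected.values()), 6))  # 17.0
```

The expected counts fall off as 1/i, so under neutrality few sites should carry the derived allele at high frequency. An observed spectrum with many such sites is what the H statistic is designed to detect.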

 
Correlation Between Glycophorin Variations and the Prevalence of Malaria
The removal of GPA reduces the invasion efficiency by 50% to 95%, depending on the P. falciparum strain (Okoyeh, Pillai, and Chitnis 1999). The full-length extracellular domain of GPA (exons II, III, and IV) has been shown to be necessary for the binding of EBA-175 in vitro. We thus expect many naturally occurring glycophorin variants to have different binding affinity to EBA-175. An example is the En(a-) variant, which is a recombinant between GPA and GPB, and has been shown to be a poor parasite receptor in the invasion assay (Pasvol, Wainscoat, and Weatherall 1982). In addition, the specificity of Plasmodium invasion has been known to be very high across primate species (Escalante, Barrio, and Ayala 1995).

From structural considerations, many glycophorin variants common in malaria-endemic regions may be poor receptors for EBA-175. The He variant is a GPB epitope converted in part by GPA but with several additional mutations. The He epitope, which may make GPA GPB-like or vice versa, occurs very rarely in Caucasians and Asians but is prevalent among Africans (2% to 10%) from malaria-endemic regions (Race and Sanger 1975; Mourant, Domaniewska-Sobczak, and Kopeć 1976). In contrast, the variant Sta can reach 5% to 10% in some East Asian populations but is extremely rare among Africans and Caucasians. Sta is mostly a GPA allele that skips exon III and thus resembles GPB in the extracellular domain (Huang, Chen, and Blumenfeld 2000). A most interesting case may be the Mi-III variant, a GPB allele partially converted by GPA. The conversion restores the expression of the pseudoexon III (fig. 2), making GPB more like a variant GPA (Huang and Blumenfeld 1991). Whereas Mi-III accounts for less than 1% among Caucasians and 3% among ethnic Han Chinese, it represents 30% to 90% of GPB in several large aboriginal groups. These groups, especially the Ami tribe, occupied the lower elevations in Taiwan, where malaria was common in the past (Broadberry and Lin 1996).

In addition to such structural variants, many populations in malaria-endemic regions harbor unusual glycophorin variants, often at unusual frequencies. The Hunter variant can reach 22% in West Africa but is rare among Caucasians (0.5%) (Blumenfeld et al. 1997). In New Guinea, the frequency of the N antigen is higher than 90%, whereas it is generally about 50% (30% to 70%) elsewhere in the world (Mourant, Domaniewska-Sobczak, and Kopeć 1976).

Conclusions
Unlike other genetic alterations that confer resistance to malaria, such as sickle cell anemia or G6PD deficiency (Tishkoff et al. 2001), mutations in the glycophorins would not have been debilitating even in homozygotes (Schenkel-Brunner 2000). Moreover, a reservoir of GPB and GPE variants retrievable by gene conversion or unequal exchange may produce novel GPA variants and provide humans and African apes a means to evade the pursuit of pathogens. In this scenario, the advantage of gene duplication may be the ability to "store and retrieve" genetic variations. Whether (and how) glycophorins and Plasmodium genes interact and coevolve will have implications for public health and evolutionary theory. If this hypothesis turns out to be correct, human and ape ancestors must have been battling malaria for over 10 million years.

Pooling the evidence, we consider it plausible that the evolution of human glycophorins is at least partially driven by P. falciparum. It may be fruitful to systematically document the invasion efficiency of P. falciparum strains that carry different EBA-175 alleles. Such efficiency should be assayed against human erythrocytes carrying different glycophorin variants.


    Supplementary Material
 
The amino acid residues that may have been driven by positive selection are listed on the journal's Web site.


    Acknowledgements
 
We wish to thank C.-H. Huang for both the DNA samples and extensive discussions. We thank Li Jin and C. Toomajian for sharing DNA samples. We are grateful to I. Boussy, N. Osada, X. Lu, G. Morris, H. T. Yu, and S. C. Lee for general help. H.Y.W. was supported by a predoctoral fellowship from the King Car Company of Taiwan to visit Chicago for two years. This work was also supported by NIH and NSF grants to C.-I.W. and Academia Sinica grants to J.C.S.


    Footnotes
 
Naruya Saitou, Associate Editor


    Literature Cited
 

    Bamshad, M., and S. P. Wooding. 2003. Signatures of natural selection in the human genome. Nat. Rev. Genet. 4:99-111.

    Baum, J., R. H. Ward, and D. J. Conway. 2002. Natural selection on the erythrocyte surface. Mol. Biol. Evol. 19:223-229.

    Blumenfeld, O. O., and C. H. Huang. 1995. Molecular genetics of glycophorin MNS variants. Transfus. Clin. Biol. 4:357-365.

    Blumenfeld, O. O., and C. H. Huang. 1997. Molecular genetics of the glycophorin gene family, the antigens for MNSs blood groups: multiple gene rearrangements and modulation of splice site usage result in extensive diversification. Hum. Mutat. 6:199-209.

    Blumenfeld, O. O., C. H. Huang, S. S. Xie, and A. Blancher. 1997. Molecular biology of glycophorins in humans and nonhuman primates. Pp. 113-146 in A. Blancher, J. Klein, and W. W. Socha, eds. Molecular biology and evolution of blood group and MHC antigens in primates. Springer, New York.

    Broadberry, R. E., and M. Lin. 1996. The distribution of the MiIII (Gp.Mur) phenotype among the population of Taiwan. Transfus. Med. 6:145-148.

    Chen, F. C., and W. H. Li. 2001. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet. 68:444-456.

    Civetta, A., and R. S. Singh. 1998. Sex-related genes, directional sexual selection, and speciation. Mol. Biol. Evol. 15:901-909.

    Comeron, J. M., and M. Kreitman. 1998. The correlation between synonymous and nonsynonymous substitutions in Drosophila: mutation, selection or relaxed constraints? Genetics 150:767-775.

    Dolan, S. A., J. L. Proctor, D. W. Alling, Y. Okubo, T. E. Wellems, and L. H. Miller. 1994. Glycophorin B as an EBA-175 independent Plasmodium falciparum receptor of human erythrocytes. Mol. Biochem. Parasitol. 64:55-63.

    Enard, W., M. Przeworski, S. E. Fisher, C. S. Lai, V. Wiebe, T. Kitano, A. P. Monaco, and S. Paabo. 2002. Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418:869-872.

    Escalante, A. A., E. Barrio, and F. J. Ayala. 1995. Evolutionary origin of human and primate malarias: evidence from the circumsporozoite protein gene. Mol. Biol. Evol. 12:616-626.

    Escalante, A. A., A. A. Lal, and F. J. Ayala. 1998. Genetic polymorphism and natural selection in the malaria parasite Plasmodium falciparum. Genetics 149:189-202.

    Ewens, W. J. 1972. The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3:87-112.

    Ewens, W. J. 1979. Mathematical population genetics. Springer-Verlag, Berlin.

    Fay, J. C., and C. I. Wu. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405-1413.

    Fay, J. C., G. J. Wyckoff, and C. I. Wu. 2001. Positive and negative selection on the human genome. Genetics 158:1227-1234.

    Fay, J. C., G. J. Wyckoff, and C. I. Wu. 2002. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415:1024-1026.

    Fu, Y. X. 1994. Estimating effective population size or mutation rate using the frequencies of mutations of various classes in a sample of DNA sequences. Genetics 138:1375-1386.

    Huang, C. H., and O. O. Blumenfeld. 1991. Molecular genetics of human erythrocyte MiIII and MiVI glycophorins: use of a pseudoexon in construction of two delta-alpha-delta hybrid genes resulting in antigenic diversification. J. Biol. Chem. 266:7248-7255.

    Huang, C. H., Y. Chen, and O. O. Blumenfeld. 2000. A novel St(a) glycophorin produced via gene conversion of pseudoexon III from glycophorin E to glycophorin A gene. Hum. Mutat. 15:533-540.

    Huang, C. H., S. S. Xie, W. Socha, and O. O. Blumenfeld. 1995. Sequence diversification and exon inactivation in the glycophorin A gene family from chimpanzee to human. J. Mol. Evol. 41:478-486.

    Hughes, M. K., and A. L. Hughes. 1995. Natural selection on Plasmodium surface proteins. Mol. Biochem. Parasitol. 71:99-113.

    Johnson, M. E., L. Viggiano, J. A. Bailey, M. Abdul-Rauf, G. Goodwin, M. Rocchi, and E. E. Eichler. 2001. Positive selection of a gene family during the emergence of humans and African apes. Nature 413:514-519.

    Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.

    Kitano, T., and N. Saitou. 1999. Evolution of Rh blood group genes have experienced gene conversions and positive selection. J. Mol. Evol. 49:615-626.

    Li, W. H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99.

    Li, W. H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.

    Liang, H., and B. K. Sim. 1997. Conservation of structure and function of the erythrocyte-binding domain of Plasmodium falciparum EBA-175. Mol. Biochem. Parasitol. 84:241-245.

    Makalowski, W., and M. S. Boguski. 1998. Synonymous and nonsynonymous substitution distances are correlated in mouse and rat genes. J. Mol. Evol. 47:119-121.

    Mourant, A. E., K. Domaniewska-Sobczak, and A. C. Kopeć. 1976. The distribution of the human blood groups and other polymorphisms. Oxford University Press, London, New York.

    Okoyeh, J. N., C. R. Pillai, and C. E. Chitnis. 1999. Plasmodium falciparum field isolates commonly use erythrocyte invasion pathways that are independent of sialic acid residues of glycophorin A. Infect. Immun. 67:5784-5791.

    Ozwara, H., C. H. Kocken, D. J. Conway, J. M. Mwenda, and A. W. Thomas. 2001. Comparative analysis of Plasmodium reichenowi and P. falciparum erythrocyte-binding proteins reveals selection to maintain polymorphism in the erythrocyte-binding region of EBA-175. Mol. Biochem. Parasitol. 116:81-84.

    Pasvol, G., J. S. Wainscoat, and D. J. Weatherall. 1982. Erythrocytes deficient in glycophorin resist invasion by the malarial parasite Plasmodium falciparum. Nature 297:64-66.

    Race, R. R., and R. Sanger. 1975. Blood groups in man. Lippincott, Philadelphia.

    Rearden, A., A. Magnet, S. Kudo, and M. Fukuda. 1993. Glycophorin B and glycophorin E genes arose from the glycophorin A ancestral gene via two duplications during primate evolution. J. Biol. Chem. 268:2260-2267.

    Rich, S. M., and F. J. Ayala. 1998. The recent origin of allelic variation in antigenic determinants of Plasmodium falciparum. Genetics 150:515-517.

    Rooney, A. P., J. Zhang, and M. Nei. 2000. An unusual form of purifying selection in a sperm protein. Mol. Biol. Evol. 17:278-283.

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

    Schenkel-Brunner, H. 2000. Human blood groups. Springer-Verlag, New York.

    Sibley, C. G., and J. E. Ahlquist. 1987. DNA hybridization evidence of hominoid phylogeny: results from an expanded data set. J. Mol. Evol. 26:99-121.

    Sim, B. K., C. E. Chitnis, K. Wasniowska, T. J. Hadley, and L. H. Miller. 1994. Receptor and ligand domains for invasion of erythrocytes by Plasmodium falciparum. Science 264:1941-1944.

    Slatkin, M., and T. Wiehe. 1998. Genetic hitch-hiking in a subdivided population. Genet. Res. 71:155-160.

    Smith, N. G., and A. Eyre-Walker. 2002. The compositional evolution of the murid genome. J. Mol. Evol. 55:197-201.

    Swanson, W. J., A. G. Clark, H. M. Waldrip-Dail, M. F. Wolfner, and C. F. Aquadro. 2001. Evolutionary EST analysis identifies rapidly evolving male reproductive proteins in Drosophila. Proc. Natl. Acad. Sci. USA 98:7375-7379.

    Tajima, F. 1989. The effect of change in population size on DNA polymorphism. Genetics 123:597-601.

    Ting, C. T., S. C. Tsaur, M. L. Wu, and C. I. Wu. 1998. A rapidly evolving homeobox at the site of a hybrid sterility gene. Science 282:1501-1504.

    Tishkoff, S. A., R. Varkonyi, and N. Cahinhinan, et al. (17 co-authors). 2001. Haplotype diversity and linkage disequilibrium at human G6PD: recent origin of alleles that confer malarial resistance. Science 293:455-462.

    Volkman, S. K., A. E. Barry, E. J. Lyons, K. M. Nielsen, S. M. Thomas, M. Choi, S. S. Thakore, K. P. Day, D. F. Wirth, and D. L. Hartl. 2001. Recent origin of Plasmodium falciparum from a single progenitor. Science 293:482-484.

    Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.

    Wyckoff, G. J., W. Wang, and C. I. Wu. 2000. Rapid evolution of male reproductive genes in the descent of man. Nature 403:304-309.

    Xie, S. S., C. H. Huang, M. E. Reid, A. Blancher, and O. O. Blumenfeld. 1997. The glycophorin A gene family in gorillas: structure, expression, and comparison with the human and chimpanzee homologues. Biochem. Genet. 35:59-76.

    Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.

    Yang, Z., and J. P. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15:496-503.

    Yang, Z., R. Nielsen, N. Goldman, and A. M. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-449.

Accepted for publication May 29, 2003.