*Department of Parasitology, Faculty of Medicine, Chulalongkorn University, Bangkok;
Department of Biological Sciences, University of South Carolina
Abstract
Introduction
Merozoite surface protein 4 (MSP4) and MSP5 are both surface proteins of the merozoite, the stage of the parasite that infects red blood cells (Marshall, Tieqiao, and Coppel 1998; Wu et al. 1999). In addition, there is evidence that MSP4 is expressed on the surface of both the sporozoite, the stage of the parasite transmitted from mosquito to vertebrate hosts, and the liver stage (Bottius et al. 1996). The genes encoding these molecules lie within an approximately 3,000-bp region on chromosome 2, and each includes a single short intron (Marshall, Tieqiao, and Coppel 1998) (fig. 1a). We sequenced the coding region and intron of each gene, along with the intergenic region and a portion of the region downstream of the MSP5 gene, from 21 isolates of Plasmodium falciparum, which were mostly derived from field populations in Thailand.
Materials and Methods
After digestion with the respective enzymes, the MSP4 PCR-amplified product from each isolate was purified using the GFX PCR DNA and Gel Band Purification kit (Pharmacia) and ligated into the pUC119 vector. Transformation was performed by electroporation using Escherichia coli strain JM109 as a host. Plasmid DNA was purified using the Wizard DNA Miniprep kit (Promega, Madison, Wis.). DNA sequencing was performed using either the dRhodamine Dye Terminator Cycle Sequencing Ready Reaction kit or the BigDye™ Terminator V3.0 Cycle Sequencing kit on an ABI PRISM™ 310 DNA sequencer. Sequences of both MSP4 and MSP5, including their 5' and 3' UTRs, were determined from 3.56 kb of each isolate using primers MSP4-F, MSP4-INTF: 5'-CAATTTATCTGACGCAGCAG-3' (nucleotides 1877–1896); MSP4-INTR: 5'-ACGATGGGGTATGCAATAGG-3' (nucleotides 2244–2263); MSP4-F2: 5'-GAAGGTATTGAATGTGTTG-3' (nucleotides 2526–2544); MSP4-R: 5'-TAAAATATATTATTATGGTAAGATTTAAAC-3' (nucleotides 2691–2720); MSP4-F3: 5'-TACAATATTATATTGTATTG-3' (nucleotides 3001–3020); MSP4-R2: 5'-CCCTTTAAGTTTTCGAACAT-3' (nucleotides 3039–3058); MSP4-R3: 5'-CCAATCAGATGCATGTTAC-3' (nucleotides 3336–3354); MSP5-F, MSP5-5R: 5'-TCATTAATCTTCTGACAACC-3' (nucleotides 3738–3757); MSP5-INTF: 5'-GAACCTCCAAATAGATTACA-3' (nucleotides 4141–4160); MSP5-INTR: 5'-GCACCTCATCATCTACATTG-3' (nucleotides 4301–4320); MSP5-3F: 5'-AATTCTTATTCTTGCCATCC-3' (nucleotides 4533–4552); and MSP5-R. Both gene sequences of each isolate were verified using five subclones for MSP4 and the reamplified PCR product for MSP5, with the respective sequencing primers. Sequences have been deposited in GenBank under accession numbers AF447553–AF447573.
Statistical Analysis
Sequences were aligned using the CLUSTALX program (Thompson et al. 1997). In pairwise comparisons among sequences, all sites at which the alignment postulated a gap in any sequence were excluded so that a comparable data set was used for each comparison. A phylogenetic tree was constructed by the minimum evolution method (Rzhetsky and Nei 1992) using the Tamura and Nei (1993) nucleotide distance. The reliability of clustering patterns in the tree was tested by bootstrapping (Felsenstein 1985); 1,000 bootstrap pseudosamples were used.
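The gap-exclusion step described above can be sketched as follows. This is a minimal illustration in Python, not the procedure actually run in CLUSTALX or MEGA2; the function names `drop_gapped_columns` and `p_distance` and the toy alignment are hypothetical. It removes every alignment column in which any sequence has a gap, so that all pairwise comparisons use the same set of sites, and then computes a simple proportion of differing sites.

```python
def drop_gapped_columns(alignment, gap_chars="-"):
    """Remove every column that has a gap in at least one sequence.

    `alignment` maps a sequence name to its aligned sequence string.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if not any(c in gap_chars for c in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def p_distance(seq_a, seq_b):
    """Proportion of sites that differ between two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


# Toy alignment (made-up data): columns 4 and 8 contain a gap and are dropped.
toy = {"iso1": "ATG-CATTA", "iso2": "ATGACATTA", "iso3": "ATGACGT-A"}
clean = drop_gapped_columns(toy)
```

Because the same columns are removed from every sequence, each pairwise distance is computed over an identical set of sites, which is the property the text asks for.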
The numbers of synonymous substitutions per synonymous site (dS) and nonsynonymous substitutions per nonsynonymous site (dN) were estimated by the method of Nei and Gojobori (1986); the number of nucleotide substitutions per site (d) was estimated by the method of Jukes and Cantor (1969). Standard errors of means dS, dN, and d for all pairwise comparisons (designated as $\bar{d}_S$, $\bar{d}_N$, and $\bar{d}$, respectively) were estimated by the bootstrap method implemented in the MEGA2 program (Kumar et al. 2001). The average age of the nearest common ancestor for pairs of haplotypes ($\bar{T}$) was estimated from the mean nucleotide diversity ($\bar{d}$) at both synonymous sites and noncoding sites. It is expected that $\bar{d} = 2r\bar{T}$, where r is the rate of nucleotide substitution per site (Hughes and Verra 2001). We used the estimate of r from Hughes and Verra (2001).
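As a rough illustration of the two calculations in this paragraph, the sketch below applies the standard Jukes and Cantor correction to a proportion of differing sites and then solves the relation "mean diversity = 2 × r × mean age" for the age. The function names are hypothetical, and the numerical inputs are placeholders chosen only for illustration; they are not the values used in this study, and the substitution rate is assumed here to be expressed per site per year.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected substitutions per site from a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def mean_ancestor_age(mean_diversity, rate_per_site_per_year):
    """Solve mean_diversity = 2 * r * T for the average age T (in years)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return mean_diversity / (2.0 * rate_per_site_per_year)


# Placeholder inputs (assumptions, not data from the paper).
d = jukes_cantor_distance(0.01)       # slightly larger than the raw proportion 0.01
age = mean_ancestor_age(0.003, 1e-8)  # years, for an assumed rate of 1e-8 per site per year
```

The factor of 2 reflects that differences accumulate along both lineages descending from the common ancestor, so the age scales inversely with whatever rate estimate is supplied.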
TA skew in nucleotide content was measured by the quantity (T - A)/(T + A), where T and A are the numbers of thymine and adenine residues (Lobry 1996). When the correlation between nucleotide diversity and TA skew was computed in sliding windows along the sequence, the significance of the correlation was assessed by a randomization test because windows are not independent. In this test the Pearson correlation coefficient (r) between a set of n paired observations (Xi and Yi for i = 1, ..., n) is computed and then compared with the r values obtained for 10,000 random data sets. The random data sets are obtained from the actual data by randomly pairing the Xi and Yi values.
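The TA skew measure and the randomization test can be sketched in Python as below. This is an illustrative reading of the description above, not the authors' code: the function names, the window and step parameters, and the fixed random seed are assumptions, and the test is written as two-sided (comparing absolute values of r), which the text does not specify.

```python
import math
import random


def ta_skew(seq):
    """TA skew (T - A)/(T + A) of a nucleotide sequence."""
    seq = seq.upper()
    t, a = seq.count("T"), seq.count("A")
    if t + a == 0:
        raise ValueError("sequence contains no A or T")
    return (t - a) / (t + a)


def sliding_ta_skew(seq, window, step):
    """TA skew in successive windows of length `window`, advancing by `step`."""
    return [ta_skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def randomization_test(xs, ys, n_random=10_000, seed=0):
    """Return (observed r, fraction of random pairings with |r| >= |observed r|)."""
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_random):
        rng.shuffle(shuffled)  # randomly re-pair the Y values with the X values
        if abs(pearson_r(xs, shuffled)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_random
```

For example, `ta_skew("TTTA")` gives 0.5 and `sliding_ta_skew("TTTTAAAA", 4, 4)` gives `[1.0, -1.0]`. If no random pairing is as extreme as the observed one, the returned fraction is 0, which is better reported as P < 1/n_random than as P = 0.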
Peptides bound by class-I and class-II major histocompatibility complex (MHC) molecules were predicted by the methods of Parker, Bednarek, and Coligan (1994) and Rammensee et al. (1999).
Results
The genomes of members of the genus Plasmodium are known to be extraordinarily AT-rich (Weber 1988). In the present data the percent AT ranged from 69%–74% in the exons to 79%–91% in the introns and other noncoding regions (table 1). Even more striking was the difference among regions with respect to TA skew (Lobry 1996), which ranged from a value of -0.285 in exon 1 of MSP4 to 0.198 in the intergenic region (table 1). When TA skew was plotted in a sliding window across the region, peaks were found corresponding to the introns and the intergenic region, whereas troughs corresponded to the exons and portions of the downstream region (fig. 1a). For all windows, nucleotide diversity was negatively correlated with TA skew (r = -0.197; P < 0.0001; randomization test).
There was a striking contrast between the intergenic and downstream regions; the former had a high positive TA skew and low polymorphism, whereas the latter had an overall negative TA skew and high polymorphism (table 1). Note that, although there were two sharp peaks in TA skew at the beginning and end of the downstream region (fig. 1a), the very low troughs in TA skew in the same region were sufficient to create a net negative TA skew for the entire region (table 1). The introns, the intergenic region, and the downstream region all had high percent AT, in spite of striking differences among them with respect to nucleotide diversity (table 1).
When mean dS and dN were computed in a sliding window of 15 codons across the coding region of MSP4, a region in which dN greatly exceeded dS was observed in the 5' portion of the gene (fig. 1b). A pattern of dN exceeding dS is indicative of positive selection favoring amino acid replacements (Hughes and Nei 1988). Computer prediction (Parker, Bednarek, and Coligan 1994; Rammensee et al. 1999) of peptides bound by human class-I and class-II MHC molecules indicated a number of potential T-cell epitopes in this region (table 2). Additionally, several amino acid replacements observed in the population of sequences analyzed were predicted to decrease peptide binding by MHC molecules (table 2). Mean dN (0.020 ± 0.008) was significantly greater than mean dS (0.004 ± 0.004) in the codons (codons 30–79) encoding these putative epitopes (P < 0.05). In contrast, in the remainder of MSP5 there was no significant difference between mean dS (0.002 ± 0.002) and mean dN (0.002 ± 0.001).
Discussion
On the basis of silent nucleotide divergence across the region, we estimated the average pairwise divergence time at about 200,000 years and the age of the ancestor of the two most divergent haplotypes at nearly 500,000 years. Note that these are both conservative estimates. As discussed subsequently, our results suggested that there is a reduced rate of nucleotide substitution in regions with high TA skew, such as the introns and the intergenic region. By including these regions in the analysis, we obtained minimum estimates of divergence times. The results, thus, are not consistent with the hypothesis of a recent extreme population bottleneck in P. falciparum (Rich et al. 1998; Volkman et al. 2001). Rather, they are consistent with previous results indicating that the effective population size of this species has been large for at least the past few hundred thousand years (Hughes and Verra 2001; Mu et al. 2002).
Comparison of the numbers of synonymous and nonsynonymous nucleotide substitutions across coding regions suggested that the MSP4 gene is subject to a form of positive selection, favoring amino acid replacements in an approximately 40-residue region of the MSP4 protein (table 2 and fig. 1b). Consistent with the experimental evidence of T-cell recognition of MSP4 (Bottius et al. 1996), these results suggest that polymorphism in this region of MSP4 is selectively maintained and that the selection is driven by host T-cell recognition. Because the sequence divergence among MSP4 alleles (table 1) is quite low in comparison with that among alleles at certain other polymorphic loci in P. falciparum (Hughes 1991, 1992; Hughes M. K. and Hughes A. L. 1995; Verra and Hughes 2000), it appears that the balancing selection acting at MSP4 is of comparatively recent origin. The existence of a relatively recent balanced polymorphism at the MSP4 locus is consistent with our estimates of the age of haplotypes in the MSP4-MSP5 region.
The absence of nucleotide polymorphism in the intron, even of a locus apparently subject to balancing selection, suggests the existence of a mechanism suppressing nucleotide polymorphism in this intron. If the pattern of nucleotide substitution in introns of P. falciparum differs from that in exons, then such a difference might explain the fact that Volkman et al. (2001) observed little polymorphism in introns, whereas extensive polymorphism was reported from exons (Hughes and Verra 2001; Mu et al. 2002). Like the introns in MSP4 and MSP5, the introns examined by Volkman et al. (2001) were also very short and AT-rich. But our results suggest that AT richness itself is not sufficient to account for a reduction in the substitution rate in P. falciparum genomic regions. For example, the downstream region showed extensive nucleotide polymorphism in spite of a very high AT content (90.1%; table 1). In contrast, the two extensive noncoding regions that we sequenced (the intergenic region and the downstream region) differed strikingly with respect to both the TA skew and the extent of nucleotide diversity (table 1). This result, along with a negative correlation between nucleotide diversity and TA skew in sliding windows across the entire MSP4-MSP5 region, suggests the hypothesis that TA skew is a major factor affecting the rate of nucleotide substitution in this species.
If an inverse relationship between TA skew and nucleotide polymorphism is found throughout the genome of P. falciparum, knowledge of TA skew will be useful in the search for polymorphic markers, which in turn can be used to search for loci contributing to traits such as drug resistance. In addition, because introns and exons differed strikingly with respect to TA skew (fig. 1a), TA skew provides a potential additional factor that can be incorporated into gene-finding algorithms for Plasmodium species, in which prediction of short exons has been problematic (Van Lin et al. 2001).
Both the class-I and class-II MHC loci of humans are characterized by substantially reduced polymorphism in introns in comparison with exons (Cereb, Hughes, and Yang 1997; Hughes 2000). In this case, statistical analyses suggest that the introns are homogenized relative to exons by recombination (perhaps involving a gene conversion-like mechanism) and by subsequent genetic drift (Cereb, Hughes, and Yang 1997; Hughes 2000). In the case of the MHC, it is well established that there is a balancing selection maintaining polymorphism in the exons encoding the peptide-binding regions of the molecule (Hughes and Nei 1988, 1989). The absence of such selection on introns and the occurrence of recombination assure that, over long evolutionary time, polymorphisms in introns do not hitchhike along with the selected polymorphisms in exons.
It might be suggested that a similar phenomenon can explain the absence of SNPs in the introns of MSP4 and MSP5, but there are a number of reasons for doubting such an interpretation. First, the contrast between introns and exons with regard to polymorphism is seen in the case of MSP5, even though there is no evidence of balancing selection at that locus. Furthermore, the introns of both MSP4 and MSP5 show no evidence of homogenization by gene conversion-like mechanisms. These introns reveal numerous interallelic differences, but these differences involve different numbers of repeat arrays and not point mutations. Moreover, the introns show low levels of nucleotide polymorphism even in comparison with coding regions not under balancing selection, including portions of MSP4 outside the predicted epitope region and the entire MSP5 coding region. It is also worth mentioning that, given our estimate of an average allelic divergence time of about 200,000 years, the polymorphism at the MSP4 locus is very recent in comparison with balanced polymorphisms at MHC loci (Hughes and Yeager 1998) or even at such P. falciparum loci as those encoding the circumsporozoite protein (Hughes and Verra 1998) and apical membrane antigen-1 (Verra and Hughes 2000).
Acknowledgements
Footnotes
Keywords: malaria parasite, nucleotide content, polymorphism, Plasmodium falciparum, positive Darwinian selection
Address for correspondence and reprints: Austin L. Hughes, Department of Biological Sciences, University of South Carolina, Coker Life Sciences Building, 700 Sumter Street, Columbia, South Carolina 29208. E-mail: austin@biol.sc.edu.
References
Bottius E., L. BenMohamed, K. Brahimi, et al. (12 co-authors) 1996 A novel Plasmodium falciparum sporozoite and liver stage antigen (SALSA) defines major B, T helper, and CTL epitopes J. Immunol 156:2874-2884
Cereb N., A. L. Hughes, S. Y. Yang, 1997 Locus-specific conservation of the HLA class I introns by intra-locus homogenization Immunogenetics 47:30-36
Felsenstein J., 1985 Confidence limits on phylogenies: an approach using the bootstrap Evolution 39:783-791
Hughes A. L., 1991 Circumsporozoite proteins of malaria parasites (Plasmodium spp.): evidence for positive selection on immunogenic regions Genetics 137:345-353
Hughes A. L., 1992 Positive selection and interallelic recombination at the merozoite surface antigen-1 (MSA-1) locus of Plasmodium falciparum Mol. Biol. Evol 9:381-393
Hughes A. L., 2000 Evolution of introns and exons of class II MHC genes of vertebrates Immunogenetics 51:473-486
Hughes M. K., A. L. Hughes, 1995 Natural selection on Plasmodium surface proteins Mol. Biochem. Parasitol 71:99-113
Hughes A. L., M. Nei, 1988 Pattern of nucleotide substitution at MHC class I loci reveals overdominant selection Nature 335:167-170
Hughes A. L., M. Nei, 1989 Nucleotide substitution at major histocompatibility complex class II loci: evidence for overdominant selection Proc. Natl. Acad. Sci. USA 86:958-962
Hughes A. L., F. Verra, 1998 Ancient polymorphism and the hypothesis of a recent bottleneck in the malaria parasite Plasmodium falciparum Genetics 150:511-513
Hughes A. L., F. Verra, 2001 Very large long-term effective population size in the virulent human malaria parasite Plasmodium falciparum Proc. R. Soc. Lond. B 268:1855-1860
Hughes A. L., M. Yeager, 1998 Natural selection at major histocompatibility complex loci of vertebrates Annu. Rev. Genet 32:415-435
Jongwutiwes S., K. Tanabe, M. K. Hughes, H. Kanbara, A. L. Hughes, 1994 Allelic variation in the circumsporozoite protein of Plasmodium falciparum from Thai field isolates Am. J. Trop. Med. Hyg 51:659-667
Jukes T. H., C. R. Cantor, 1969 Evolution of protein molecules Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York
Kumar S., K. Tamura, I. B. Jakobsen, M. Nei, 2001 MEGA2: molecular evolutionary genetics analysis software Bioinformatics 17:1244-1245
Lobry J. R., 1996 Asymmetric substitution patterns in two DNA strands of bacteria Mol. Biol. Evol 13:660-665
Marshall W. M., W. Tieqiao, R. W. Coppel, 1998 Close linkage of three merozoite surface protein genes on chromosome 2 of Plasmodium falciparum Mol. Biochem. Parasitol 94:13-25
Mu J., J. Duan, K. Makova, D. A. Joy, C. Q. Huynh, O. H. Branch, W.-H. Li, X. Su, 2002 Chromosome-wide SNPs reveal an ancient origin for Plasmodium falciparum Nature (in press)
Nei M., T. Gojobori, 1986 Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions Mol. Biol. Evol 3:418-426
Parker K. C., M. A. Bednarek, J. E. Coligan, 1994 Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side chains J. Immunol 152:163-175
Putaporntip C., S. Jongwutiwes, T. Tia, M. U. Ferreira, H. Kanbara, K. Tanabe, 2001 Diversity in the thrombospondin-related adhesive protein gene (TRAP) of Plasmodium vivax Gene 268:97-104
Rammensee H.-G., J. Bachmann, N. N. Emmerich, O. A. Bachor, S. Stevanovic, 1999 SYFPEITHI: database for MHC ligands and peptide motifs Immunogenetics 50:213-219
Rich S. M., M. C. Licht, R. R. Hudson, F. J. Ayala, 1998 Malaria's eve: evidence of a recent population bottleneck throughout the world population of Plasmodium falciparum Proc. Natl. Acad. Sci. USA 95:4425-4430
Rzhetsky A., M. Nei, 1992 A simple method for estimating and testing minimum-evolution trees Mol. Biol. Evol 9:945-967
Tamura K., M. Nei, 1993 Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees Mol. Biol. Evol 10:512-526
Thompson J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, D. G. Higgins, 1997 The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools Nucleic Acids Res 25:4876-4882
Van Lin L. H. M., T. Pace, C. J. Janse, C. Birago, J. Ramesar, L. Picci, M. Ponzi, A. P. Waters, 2001 Interspecies conservation of gene order and intron-exon structure in a genomic locus of high gene density and complexity in Plasmodium Nucleic Acids Res 29:2059-2068
Verra F., A. L. Hughes, 2000 Evidence for ancient balanced polymorphism at the apical membrane antigen-1 (AMA-1) locus of Plasmodium falciparum Mol. Biochem. Parasitol 105:149-153
Volkman S. K., A. E. Barry, E. J. Lyons, K. M. Nielsen, S. M. Thomas, M. Chol, S. S. Thakore, K. P. Day, D. F. Wirth, D. L. Hartl, 2001 Recent origin of Plasmodium falciparum from a single progenitor Science 293:482-484
Weber J. L., 1988 Molecular biology of malaria parasites Exp. Parasitol 66:143-170
Wu T., C. G. Black, L. Wang, A. R. Hibbs, R. L. Coppel, 1999 Lack of sequence diversity in the gene encoding merozoite surface protein 5 of Plasmodium falciparum Mol. Biochem. Parasitol 103:243-250