An Unusual Form of Purifying Selection in a Sperm Protein

Alejandro P. Rooney, Jianzhi Zhang1, and Masatoshi Nei

Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University


    Abstract
 
Protamines are small, highly basic DNA-binding proteins found in the sperm of animals. Interestingly, the proportion of arginine residues in one type of protamine, protamine P1, is about 50% in mammals. Upon closer examination, it was found that both the total number of amino acids and the positions of arginine residues have changed considerably during the course of mammalian evolution. This evolutionary pattern suggests that protamine P1 is under an unusual form of purifying selection, in which the high proportion of arginine residues is maintained but the positions may vary. In this case, we would expect that the rate of nonsynonymous substitution is not particularly low compared with that of synonymous substitution, despite purifying selection. We would also expect that the selection for a high arginine content results in a high frequency of the nucleotide G in the coding region of this gene, because all six arginine codons contain at least one G. These expectations were confirmed in our study of mammalian protamine genes. Analysis of nonmammalian vertebrate genes also showed essentially the same patterns of evolutionary changes, suggesting that this unusual form of purifying selection has been active since the origin of bony vertebrates. The protamine gene of an insect species shows similar patterns, although its purifying selection is less intense. These observations suggest that arginine-rich selection is a general feature of protamine evolution. The driving force for arginine-rich selection appears to be the DNA-binding function of protamine P1 and an interaction with a protein kinase in the fertilized egg.


    Introduction
 
During the process of spermatogenesis in animal species, protamines replace histones and bind sperm DNA (Wouters-Tyrou et al. 1998). In mammals, there are three types of protamines: protamine P1, which forms the major component of sperm DNA-binding proteins; protamine P2, which is expressed only in certain species (e.g., rodents, horses, and primates; Queralt et al. 1995); and transition proteins, which are intermediates in the process of histone replacement (Oliva and Dixon 1991). The primary function of protamine P1 is to bind and condense sperm DNA during the course of sperm nucleus condensation in spermatogenesis (Ballhorn 1982; Oliva and Dixon 1991; Ward and Coffey 1991). Upon fertilization, protamine P1 releases the bound sperm DNA so that subsequent zygotic development can proceed (Wouters-Tyrou et al. 1998). Thus, the proper functioning of protamine P1 is essential for the successful fertilization of eggs by sperm.

Protamine P1 is a small protein but contains a large proportion of positively charged arginine residues that allow it to tightly bind and condense sperm DNA. Interestingly, the total number of amino acids in protamine P1 and the positions of the arginine residues vary considerably with taxonomic group, yet the proportion of arginine residues remains nearly the same (fig. 1). This indicates that arginine positions in protamine P1 are subject to evolutionary change and that the high proportion of arginines is conserved at the level of the whole protein rather than at the level of individual amino acid sites. This pattern of conservation differs from that of most proteins, in which a particular set of amino acids is conserved by purifying selection at a given set of positions that are usually functionally important (Kimura 1983; Nei 1987). This suggests that protamine P1 is subject to an unusual form of selection. The purpose of this paper is to investigate the pattern of selection involved in this unusually arginine-rich protein.
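The distinction between conservation of composition and conservation of sites can be illustrated with a minimal Python sketch. The function name `arginine_profile` and the two toy sequences are hypothetical and are not taken from the protamine P1 alignment; they only show how two sequences can share the same arginine proportion while placing the arginines at different positions.

```python
def arginine_profile(protein):
    """Return (arginine fraction, 1-based arginine positions) for a protein.

    Alignment gaps ('-') and a terminal stop marker ('*') are ignored.
    """
    seq = protein.upper().replace("-", "").replace("*", "")
    if not seq:
        raise ValueError("empty sequence")
    positions = [i + 1 for i, aa in enumerate(seq) if aa == "R"]
    return len(positions) / len(seq), positions


# Hypothetical toy sequences (NOT real protamine P1 data).
toy_a = "MARRCRRSRR"
toy_b = "MRRRSRCRAR"

frac_a, pos_a = arginine_profile(toy_a)
frac_b, pos_b = arginine_profile(toy_b)
print(frac_a, pos_a)
print(frac_b, pos_b)
```

In this toy case the two fractions are equal while the position lists differ, which is the kind of pattern described above for protamine P1 across taxonomic groups.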



 
Fig. 1.—Protamine P1 amino acid sequences for monotremes and representative marsupial carnivore, primate, and ruminant species. A dot indicates an identical residue with respect to the echidna sequence, and a dash indicates a gap in the sequence alignment. A stop codon is indicated by an asterisk. The number of arginine residues over the total number of residues per sequence is given in parentheses under the first column.

 

    Materials and Methods
 
We chose to focus our attention primarily on protamine P1 evolution in mammals, because there are a large number of nucleotide sequences available for this group. We analyzed the protamine P1 sequences of monotremes (Ornithorhynchus anatinus and Tachyglossus aculeatus), two eutherian (placental) groups (primates and ruminants), and the marsupial carnivore family Dasyuridae, which comprises three subfamilies: Sminthopsinae, Dasyurinae, and Phascogalinae (Krajewski et al. 1997). The sminthopsine species used in this study were Ningaui ridei, Planigale (P. gilesi, P. ingrami, P. maculata maculata, P. maculata sinualis, P. tenuirostris, and an undescribed species), and Sminthopsis murina. The phascogaline species used were Antechinus (A. bellus, A. flavipes, A. godmani, A. habbema, A. leo, A. melanurus, A. minimus, A. naso, A. stuartii, A. swainsonii, and an undescribed species), Murexia (M. longicaudata and M. rothschildsi), and Phascogale (P. calura and P. tapoatafa). The dasyurine species used were Dasycercus cristicaudata, Dasykaluta rosamondae, Dasyuroides byrnei, Dasyurus (D. albopunctatus, D. geoffroii, D. hallucatus, D. maculatus, D. spartacus, and D. viverrinus), Myoictis (M. melas and M. wallacei), Neophascogale lorentzii, Parantechinus (P. apicalis and P. bilarni), Phascolosorex dorsalis, Pseudantechinus (P. macdonnellensis, P. ningbing, and P. woolleyae), and Sarcophilus harrisii. The primate taxa analyzed were Homo sapiens (human), Pan troglodytes (common chimpanzee), Pan paniscus (bonobo), Gorilla gorilla (gorilla), Pongo pygmaeus (orangutan), Hylobates lar (common gibbon), and Erythrocebus patas (red guenon). The ruminant taxa analyzed were Alces alces (moose), Odocoileus virginianus (white-tailed deer), Cervus elaphus (elk), Bos taurus (cow), and Gazella dorcas (gazelle). Nucleotide sequence data for the species used in this study are found in Retief, Winkfein, and Dixon (1993), Queralt et al. (1995), Retief et al. (1995a, 1995b), and Krajewski et al. (1997). These sequences were used to compute the nucleotide frequencies for the mammalian groups shown in figure 2.



 
Fig. 2.—Nucleotide composition of protamine P1 genes in monotremes, marsupial carnivores, primates, and ruminants. For comparisons involving marsupial carnivores, only the results for the Dasyurinae are presented here, since all three subfamilies show highly similar patterns. The species considered in this analysis are listed in the Materials and Methods section.

 
Nucleotide sequences were downloaded from GenBank and subsequently aligned by taking into consideration the deduced amino acid sequences in order to minimize potential uncertainties in alignment. All analyses were completed using the computer program MEGA, version 1.03 (Kumar, Tamura, and Nei 1993). The extent of purifying selection was studied by computing the numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site. These values were computed by the methods of Nei and Gojobori (1986) and Zhang, Rosenberg, and Nei (1998), but since the results are very similar, we present only the results obtained with the former method. The standard errors of the average values of dS and dN (denoted S and N, respectively) were computed by using the method of Nei and Jin (1989).
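For readers unfamiliar with this kind of calculation, the following is a minimal Python sketch of a Nei-Gojobori-style estimate of dS and dN for one pair of aligned coding sequences. It is not the MEGA implementation used in the study. The function names are hypothetical, and several conventions are assumptions of this sketch: the standard genetic code is used, mutations to stop codons are ignored when counting sites, pathways through stop codons are excluded when counting differences, codon pairs containing gaps or ambiguous bases are skipped, and the Jukes-Cantor correction is applied to the proportions of differences. The modified method of Zhang, Rosenberg, and Nei (1998) and the standard errors are not covered.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one sense codon (0 to 3)."""
    s = 0.0
    for pos in range(3):
        syn = total = 0
        for b in BASES:
            if b == codon[pos]:
                continue
            mutant = codon[:pos] + b + codon[pos + 1:]
            if CODE[mutant] == "*":
                continue  # assumption: nonsense mutations are ignored
            total += 1
            syn += CODE[mutant] == CODE[codon]
        if total:
            s += syn / total
    return s


def codon_diffs(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over pathways."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    syn = non = paths = 0
    for order in permutations(diff_pos):
        cur, s, n, valid = c1, 0, 0, True
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                valid = False  # exclude pathways through stop codons
                break
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        if valid:
            syn, non, paths = syn + s, non + n, paths + 1
    if not diff_pos:
        return 0.0, 0.0
    if paths == 0:  # assumption: count all changes as nonsynonymous
        return 0.0, float(len(diff_pos))
    return syn / paths, non / paths


def jukes_cantor(p):
    """Jukes-Cantor distance; NaN when the proportion is too large."""
    return float("nan") if p >= 0.75 else -0.75 * log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Return (dS, dN) for two aligned, in-frame coding sequences."""
    site_s = 0.0
    diff_s = diff_n = 0.0
    used = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        a, b = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if a not in CODE or b not in CODE or "*" in (CODE[a], CODE[b]):
            continue  # skip gaps, ambiguous bases, and stop codons
        used += 1
        site_s += (syn_sites(a) + syn_sites(b)) / 2
        sd, nd = codon_diffs(a, b)
        diff_s, diff_n = diff_s + sd, diff_n + nd
    site_n = 3 * used - site_s
    p_s = diff_s / site_s if site_s else 0.0
    p_n = diff_n / site_n if site_n else 0.0
    return jukes_cantor(p_s), jukes_cantor(p_n)
```

As a usage example, `nei_gojobori("ATGCGTAAA", "ATGCGTAAC")` compares two toy sequences that differ by a single lysine-to-asparagine change, so dS is zero and dN is positive. Other implementations may treat stop codons differently, which changes the site counts slightly.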


    Results
 
Nucleotide Composition Bias
The genomic frequencies of the four nucleotides, adenine (A), thymine (T), cytosine (C), and guanine (G), have long been known to differ among species (Sueoka 1961, 1962). However, the cause of this variation is still a matter of controversy (Bernardi et al. 1985; Filipski 1987; Sueoka 1988; Wolfe, Sharp, and Li 1989; Bernardi 1993; Francino and Ochman 1999). There have been recent reports that the amino acid composition of a protein is influenced by the nucleotide composition of the genome (Foster, Jermiin, and Hickey 1997; Gu, Hewett-Emmett, and Li 1998; Nishizawa and Nishizawa 1998). In the case of protamine P1 sequences, however, the nucleotide frequencies are likely to be affected by selection for a high arginine content. To test this hypothesis, we compared the nucleotide frequencies in the intron and coding regions of mammalian protamine P1 sequences.

It is apparent that nucleotide compositions are different between the intron and the coding regions (fig. 2). For example, in marsupials, the overall frequency of G in the coding regions is 37% considering all three codon positions combined, but it is 25% in the intron (fig. 2). Arginine residues are encoded by the six codons AGA, AGG, CGA, CGC, CGG, and CGT, which are used in the proportion of 0.45:0.27:0.01:0.08:0.11:0.08 in marsupial protamine P1 sequences. Because these codons contain G, selection for arginines in protamine P1 is expected to increase the frequency of G in the coding regions overall. To study this problem in more detail, we computed the nucleotide frequencies at each codon position separately. An increase in G due to selection for arginine should be most apparent at second codon positions, since all six arginine codons have G at the second position, whereas none has G at the first position and only two (AGG and CGG) have G at the third. We found that the differences in nucleotide frequency among the three codon positions of the protamine P1 gene are enormous (fig. 2). For example, the frequencies of G are only 7% and 26% at the first and third codon positions, respectively, in marsupial protamine P1, while the frequency of G at second positions is 77%. Essentially the same results are obtained for monotremes and placentals. These observations suggest that the selection pressure to maintain a large proportion of arginines in protamine P1 has changed the nucleotide frequency of the coding regions.
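The position-specific comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure run in MEGA: the function names are hypothetical, the input is assumed to be an in-frame coding sequence without gaps, and the only data used are the six arginine codon proportions quoted above for marsupial protamine P1.

```python
from collections import Counter


def position_frequencies(cds):
    """Nucleotide frequencies at codon positions 1, 2, and 3.

    Returns a list of three dicts keyed by 'A', 'C', 'G', 'T'.
    Trailing bases that do not complete a codon are dropped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    freqs = []
    for pos in range(3):
        counts = Counter(cds[pos:usable:3])
        total = sum(counts.values())
        freqs.append({nt: counts.get(nt, 0) / total for nt in "ACGT"})
    return freqs


# Arginine codon usage in marsupial protamine P1, as quoted in the text.
ARG_USAGE = {
    "AGA": 0.45, "AGG": 0.27, "CGA": 0.01,
    "CGC": 0.08, "CGG": 0.11, "CGT": 0.08,
}


def g_share_by_position(usage):
    """Fraction of G at each codon position implied by a codon usage table."""
    total = sum(usage.values())
    return [
        sum(w for codon, w in usage.items() if codon[pos] == "G") / total
        for pos in range(3)
    ]


print(g_share_by_position(ARG_USAGE))
```

With the quoted usage, the arginine codons by themselves contribute no G at first positions, G at every second position, and G at about 38% of third positions. This is in the same direction as the 7%, 77%, and 26% reported above for the whole coding region, which also contains non-arginine codons.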

Arginine-rich Selection
Although arginine content is nearly the same (about 50%) for all taxonomic groups, the arginine positions in the sequence vary among them (fig. 1). This suggests that the level of nonsynonymous substitution would not be very low compared with that of synonymous substitution despite the purifying selection. To examine this problem, we computed dS and dN for different pairs of sequences. However, we computed these values only for the sequences within each of the monotreme, marsupial, and placental species groups, because the sequence lengths of the three species groups were quite different, and the sequence alignments among them were not very reliable. In this computation, we eliminated all alignment gaps, and the average values (S and N) of dS and dN for each group of pairwise comparisons were computed.

The results obtained (table 1) show that N/S is less than 1.0 in all mammalian groups, but the ratio is not as low as that for many other genes. For example, Zhang (2000) computed S and N for 47 genes evolving at moderate rates in primates and artiodactyls. The values were S = 0.325 and N = 0.090, so N/S is 0.28. Of the 47 genes examined, the interleukin genes 6 and 7 showed N/S ratios of 0.90 and 0.66, respectively, but in all other genes, N/S was usually about 0.3 or less. These comparisons indicate that the N/S in protamine P1 is quite high despite the action of a special type of purifying selection, as predicted earlier.
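The within-group averaging and the N/S ratio can be sketched in Python as below. The function `group_averages` and its `pairwise` argument are hypothetical names introduced for illustration; `pairwise` stands for any function that returns a (dS, dN) pair for two aligned sequences. Standard errors are not computed here. The only numbers used are the S and N values quoted above from Zhang (2000).

```python
from itertools import combinations
from statistics import mean


def group_averages(seqs, pairwise):
    """Average dS and dN over all pairwise comparisons within one group.

    `pairwise(a, b)` must return a (dS, dN) tuple for two sequences.
    Returns (S, N), the averages of dS and dN.
    """
    estimates = [pairwise(a, b) for a, b in combinations(seqs, 2)]
    if not estimates:
        raise ValueError("need at least two sequences")
    s_bar = mean(ds for ds, _ in estimates)
    n_bar = mean(dn for _, dn in estimates)
    return s_bar, n_bar


# Ratio for the 47 reference genes, using the values quoted in the text.
S_REF, N_REF = 0.325, 0.090
print(round(N_REF / S_REF, 2))
```

The printed ratio reproduces the value of 0.28 given above, against which the N/S ratios of protamine P1 are compared.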


 
Table 1 Tests of Purifying Selection in Mammalian Protamine P1

 

    Discussion
 
Under the usual form of purifying selection, two general features are observed: (1) site-specific amino acid conservation is maintained, and (2) dN is much lower than dS. It is in this respect that the purifying selection operating on protamine P1 is unusual, as neither of these features was observed for this protein. In other words, a relatively high rate of nonsynonymous substitution may occur in mammalian protamine P1 as long as the proportion of arginine residues is conserved.

One of the driving forces for arginine-rich selection on mammalian protamine P1 is probably the sperm DNA–binding function of this protein. However, if selection is driven solely by the need to maintain the basic charge of the protein, then perhaps lysine should also have a high frequency in protamine P1, since lysine is another basic amino acid. Yet, this pattern is not observed (fig. 1). Why is lysine not used? The answer to this question appears to be the special function arginine has at the time of sperm-egg fertilization. Ohtsuki et al. (1996) found that protamine P1, by way of polyarginine clusters, is able to activate casein kinase II (CK-II) in fertilized eggs, while oligopolylysine clusters cannot. CK-II is a serine/threonine protein kinase that is responsible for cellular metabolic alteration through phosphorylation of more than 50 different cellular polypeptides. Specifically, arginine residues were found to interact with an acidic amino acid motif of the regulatory β subunit of CK-II (Ohtsuki et al. 1996). Therefore, the high frequency of arginines in protamine P1 is apparently caused by the requirement of the amino acid for binding of sperm DNA and for activating an important regulatory protein in the fertilized egg.

So far, we have discussed only mammalian protamine P1. However, protamine genes also exist in other vertebrate classes, although the structure of these genes is somewhat different from those of mammals. The most notable difference is the fact that only mammals have an intron in their protamine P1 gene. Among the nonmammalian vertebrates, birds are most similar to mammals in terms of protamine gene organization (Oliva and Dixon 1989). However, unlike mammals and birds, a multigene family of about 15–20 protamine genes exists in rainbow trout, a teleost fish species (Dixon et al. 1986). Unfortunately, the genomic organization of the protamine genes of amphibians and reptiles has not been studied. The only invertebrate animal so far studied is an insect, the boll weevil (Anthonomus grandis).

The protamine sequences from nonmammalian animals cannot be aligned reliably because the sequence length varies with species group and the sequence divergence is high. However, arginine content is very high in all animals examined, although the content for the boll weevil is somewhat lower (table 2). The relative nucleotide frequencies in the first, second, and third positions are also similar to those of mammalian protamine P1 genes (fig. 3). Therefore, the same type of purifying selection for maintaining a high arginine content appears to be operating in all of the animal species examined here. Of course, the animal species groups that have been studied so far are quite limited. It is therefore unclear whether this unusual type of purifying selection is operating on all or most animal protamine genes.


 
Table 2 Arginine Contents in the Protamine Genes of Nonmammalian Vertebrate Species and an Insect

 


 
Fig. 3.—Nucleotide composition of the protamine genes of birds, an amphibian, fishes, and an insect. The species considered in this analysis are listed in table 2. GenBank accession numbers for these sequences are D63796, D85426, M28100, M30275, X01204, X07511, and X52058.

 


    Acknowledgements
 
C. Su and W. S. Ward provided helpful comments and discussion. We thank S. Yokoyama and two anonymous reviewers for constructive comments. This study was supported by grants from the NIH and the NSF to M.N.


    Footnotes
 
Shozo Yokoyama,

1 Present address: Laboratory of Host Defenses, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland.

2 Keywords: protamine arginine selection nucleotide composition bias

3 Address for correspondence and reprints: Alejandro P. Rooney, Institute of Molecular Evolutionary Genetics and Department of Biology, 328 Mueller Laboratory, Pennsylvania State University, University Park, Pennsylvania 16802. E-mail: apr3@psu.edu


    literature cited
 

    Ballhorn, R. 1982. A model for the structure of chromatin in mammalian sperm. J. Cell Biol. 93:298–305.

    Bernardi, G. 1993. The vertebrate genome: isochores and evolution. Mol. Biol. Evol. 10:186–204.

    Bernardi, G., B. Olofsson, J. Filipski, M. Zerial, J. Salinas, G. Cuny, M. Meunier-Rotival, and F. Rodier. 1985. The mosaic genome of warm-blooded vertebrates. Science 228:953–958.

    Dixon, G. H., J. M. Aiken, J. M. Jankowski, D. I. McKenzie, R. Moir, and J. C. States. 1986. Organization and evolution of the protamine genes of salmonid fishes. Pp. 287–412 in G. R. Reeck, G. A. Goodwin, and P. Puigdomenech, eds. Chromosomal proteins and gene expression. Plenum, New York.

    Filipski, J. 1987. Correlation between molecular clock ticking, codon usage, fidelity of DNA repair, chromosome banding and chromatin compactness in germline cells. FEBS Lett. 217:184–186.

    Foster, P. G., L. S. Jermiin, and D. A. Hickey. 1997. Nucleotide composition bias affects amino acid content in proteins coded by animal mitochondria. J. Mol. Evol. 44:282–288.

    Francino, M. P., and H. Ochman. 1999. Isochores result from mutation not selection. Nature 400:30–31.

    Gu, X., D. Hewett-Emmett, and W.-H. Li. 1998. Directional mutational pressure affects amino acid composition and hydrophobicity of proteins in bacteria. Genetica 102/103:383–391.

    Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, England.

    Krajewski, C., M. Blacket, L. Buckley, and M. Westerman. 1997. A multigene assessment of phylogenetic relationships within the dasyurid marsupial subfamily Sminthopsinae. Mol. Phylogenet. Evol. 8:236–248.

    Kumar, S., K. Tamura, and M. Nei. 1993. MEGA: molecular evolutionary genetic analysis. Version 1.03. Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park.

    Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.

    Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.

    Nei, M., and L. Jin. 1989. Variances of the average numbers of nucleotide substitutions within and between populations. Mol. Biol. Evol. 6:290–300.

    Nishizawa, M., and K. Nishizawa. 1998. Biased usage of arginines and lysines in protein are correlated with local-scale fluctuations of the G + C content of DNA sequences. J. Mol. Evol. 47:385–393.

    Ohtsuki, K., Y. Nishikawa, H. Saito, H. Munakata, and T. Kato. 1996. DNA-binding sperm proteins with oligo-arginine clusters function as potent activators for egg CK-II. FEBS Lett. 378:115–120.

    Oliva, R., and G. H. Dixon. 1989. Chicken protamine genes are intronless. The complete genomic sequence and organization of the two loci. J. Biol. Chem. 264:12472–12481.

    Oliva, R., and G. H. Dixon. 1991. Vertebrate protamine genes and the histone-to-protamine replacement reaction. Prog. Nucleic Acid Res. Mol. Biol. 40:25–94.

    Queralt, R., R. Adroer, R. Oliva, R. J. Winkfein, J. D. Retief, and G. H. Dixon. 1995. Evolution of protamine P1 genes in mammals. J. Mol. Evol. 40:601–607.

    Retief, J. D., C. Krajewski, M. Westerman, and G. H. Dixon. 1995a. The evolution of protamine P1 genes in dasyurid marsupials. J. Mol. Evol. 41:549–555.

    Retief, J. D., C. Krajewski, M. Westerman, R. J. Winkfein, and G. H. Dixon. 1995b. Molecular phylogeny and evolution of marsupial protamine P1 genes. Proc. R. Soc. Lond. B Biol. Sci. 259:7–14.

    Retief, J. D., R. J. Winkfein, and G. H. Dixon. 1993. Evolution of the monotremes. The sequences of the protamine P1 genes of platypus and echidna. Eur. J. Biochem. 218:457–461.

    Sueoka, N. 1961. Correlation between base composition of deoxyribonucleic acid and amino acid composition of protein. Proc. Natl. Acad. Sci. USA 47:1141–1149.

    ———. 1962. On the basis of variation and heterogeneity of DNA base composition. Proc. Natl. Acad. Sci. USA 48:582–592.

    ———. 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. USA 85:2653–2657.

    Wang, Y.-C., S. Kumar, and S. B. Hedges. 1999. Divergence time estimates for the early history of animal phyla and the origin of plants, animals and fungi. Proc. R. Soc. Lond. B Biol. Sci. 266:163–171.

    Ward, W. S., and D. S. Coffey. 1991. DNA packaging and organization in mammalian spermatozoa: comparison with somatic cells. Biol. Reprod. 44:569–574.

    Wolfe, K. H., P. M. Sharp, and W. H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283–285.

    Wouters-Tyrou, D., A. Martinage, P. Chevaillier, and P. Sautiere. 1998. Nuclear basic proteins in spermiogenesis. Biochimie 80:117–128.

    Zhang, J. 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J. Mol. Evol. (in press).

    Zhang, J., H. F. Rosenberg, and M. Nei. 1998. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:3708–3713.

Accepted for publication November 1, 1999.