Contribution of Homoplasy and of Ancestral Polymorphism to the Evolution of Genes in Anthropoid Primates

Colm O'hUigin*,1, Yoko Satta†, Naoyuki Takahata† and Jan Klein*

*Max-Planck-Institut für Biologie, Abteilung Immungenetik, Corrensstrasse 42, Tübingen, Germany;
†Department of Biosystems Science, The Graduate University for Advanced Studies, Hayama, Kanagawa, Japan


    Abstract
Molecular phylogenies of lineages that split from one another in short succession are often difficult to resolve because different loci and different sites within the same locus yield incongruent relationships. The incongruity is commonly attributed to two causes: differential assortment of ancestral polymorphisms and homoplasy. To assess the relative contribution of these two causes, sequences of 57 segments from 51 loci in six primate lineages (human, chimpanzee, gorilla, orangutan, macaque, and tamarin, abbreviated as H, C, G, O, M, and T, respectively) were subjected to "partitioning" analysis, in which phylogenetically informative sites were identified in all 15 pairwise comparisons of each of the 57 segments and tallied for their support or lack thereof for each of the theoretically possible phylogenies. The six lineages include one of the best known cases of a difficult-to-resolve phylogeny: the trichotomy (H, C, G), in which the three lineages may have diverged from each other within a short period of time. In this period many of the ancestral polymorphisms apparently persisted and yielded phylogenetically incongruent signals. By contrast, no ancestral polymorphism is expected to have survived during the interval separating the divergences of the O, M, and T lineages from the ancestor of the (H, C, G) group. Any phylogenetic incompatibilities at sites in the O, M, and T lineages relative to the (H, C, G) group are therefore presumably the result of homoplasy. The frequency of homoplasy estimated in this manner is unexpectedly high: 12% for the (H, C, G) clade and 19% for the (H, C, G, O) clade. At least three-quarters of the 48% incompatibility observed in the (H, C) clade is attributable to the sorting out of ancestral polymorphisms coupled with intragenic recombination. Possible reasons for this high level of homoplasy in the O, M, and T lineages are discussed, and a computer simulation has been carried out to produce a model explaining the observed data.


    Introduction
Attempts to determine phylogenetic relationships among closely allied taxa often yield discordant results depending on the gene or even part of the gene used in the reconstruction. As a consequence, phylogenies of many groups of taxa remain unresolved. Perhaps the best known example of a prolonged controversy now tentatively resolved by a consensus based on sequences of a large set of genes is the one involving the human species, specifically the question of its closest living relative (Miyamoto et al. 1988; Bailey 1993; Rogers 1993; Ruvolo 1997; Satta, Klein, and Takahata 2000). After elimination of the orangutan (O), favored by some anthropologists and paleontologists (Schwartz 1987), the issue came to be referred to as "the trichotomy problem," the question of the relationship among three species, human (H), chimpanzee (C), and gorilla (G). The consensus approach identifies the chimpanzee as the nearest living relative of humans, but the evidence supporting this conclusion is not overwhelming. In the most recent and largest study encompassing 45 loci and 47 kb of sequence (Satta, Klein, and Takahata 2000), the consistency of the inferred relationship is rather poor. Of the 174 sites that are informative regarding the relationship among H, C, G, and O, only 91 (52%) support the (H, C) clade. Almost half the sites support alternative phylogenetic relationships among the species: either the (H, G) or the (C, G) clade. Inconsistency in the inferred patterns of shared-derived substitutions (incompatibility) is apparent both between and within the loci of the three species comprising the trichotomy.

The two main causes commonly invoked to explain why different portions of the genome provide different answers regarding the phylogenetic relationships within a group of taxa are assortment of ancestral polymorphism and homoplasy. In the former case, an ancestral population of species H, C, and G may contain two alleles, a and b, at locus 1 and two other alleles, x and y, at locus 2. If, for example, at locus 1 the a allele is subsequently fixed in species G, whereas allele b is fixed in species C and H, C will be judged as the closest relative of H by the analysis of this locus. If, on the other hand, allele x at locus 2 is fixed in species C, whereas the y allele is fixed in species G and H, G rather than C will appear to be the closest relative of H. Similarly, at the nucleotide level, two sites within a single gene may yield contradictory phylogenetic information if recombination takes place between them and their polymorphism is differentially resolved among the species. The second major cause of phylogenetic ambiguity, homoplasy (i.e., independently attained similarity at a site), is commonly differentiated into parallel evolution (similarity acquired from the same ancestral condition) and evolutionary convergence (similarity attained from different ancestral conditions). Thus, for example, if a changes to b independently in G and H, whereas it remains unaltered in C, species G will appear to be more closely related to H than C, although in reality it may have diverged earlier than C from the lineage leading to H.

The extent to which ancestral polymorphism and homoplasy contribute to the obfuscation of a phylogenetic relationship is not known. In most molecular phylogenetic reconstructions, attempts are made to take homoplasy into account by correcting the observed sequence for presumed hidden substitutions with the help of one of the correction formulas available (Nei and Kumar 2000, pp. 33–50). The underlying assumptions of all these formulas are stochasticity of the evolutionary process at the molecular level and neutrality of the substitutions. The formulas differ in the extent to which they take into account various factors that may influence the stochasticity of the process, such as the ratio of transitions to transversions or the four-nucleotide content of the sequence.

Here we attempt to actually measure the extent to which ancestral polymorphism and homoplasy influence phylogenetic reconstruction. To this end, we use a large collection of primate sequences, one half of which we obtained in our Tübingen laboratory and the other half from databases. The collection was assembled for a variety of purposes, the estimate of the relative influence of ancestral polymorphism and homoplasy on phylogenetic reconstruction being one of them. The data set includes sequences of human, chimpanzee, gorilla, and orangutan, as well as representative species of Old World monkeys (OWM) and New World monkeys (NWM). It thus covers a range of divergence times extending from 5 MYA (the human-chimpanzee split; White, Suwa, and Asfaw 1998) to nearly 50 MYA (the Platyrrhini-Catarrhini split dated by Kumar and Hedges [1998] to 47.6 ± 8.3 MYA). It is this wide span of evolutionary time that allows us to use the data set for the present purpose. The expectation is that the degree to which ancestral polymorphism and homoplasy obscure phylogenetic relationships depends on the particular time frame of the evolutionary process. To understand the reason for this dependence, consider two time intervals, one encompassing the period during which three closely related species lineages diverged from one another (e.g., G from [H, C], followed by the divergence of H from C) and the second covering the period from the first divergence (i.e., G from [H, C]) to the present time. In the first case, we must take into account that the resolution of ancestral neutral polymorphisms in a population consisting of 10^5 breeding individuals may take up to 3 Myr (Takahata 1993; Takahata and Satta 1997).
Hence, if the interval between the first and the second divergence was <3 Myr (as it probably was in the case of the H, C, and G lineages), then the resolution of the ancestral polymorphism can be expected to have confounded the phylogeny of the three lineages. On the other hand, if the interval between the first and the second divergences was >3 Myr (as it was in the case of the divergences of OWM, NWM, and ape lineages), the resolution of the ancestral polymorphism should not have had any confounding effect. As for the interval from the first divergence to the present, the length of the divergence time determines how much homoplasy can be expected. Because homoplasy at the molecular level generally involves more than one substitution at a site and because in stochastic processes two hits at a site are more probable in a long time interval than in a short one, in the short interval, the frequency of homoplasy can be expected to be negligibly low, unless the substitution rate is very high (Takahata 1995). Therefore, homoplasy may have confounded the phylogenetic relationship among some of the NWM, OWM, and ape genes, but it might not have influenced the phylogeny of the human, chimpanzee, and ape genes. The objectives of the present study were to estimate the degree of phylogenetic incompatibility for clades in which only homoplasy could be the cause, to infer the mechanisms by which homoplasy arises, and to determine the importance of homoplasy in phylogenetic reconstruction.


    Materials and Methods
The Data Set
The collection of sequences used in the present study comprised orthologous genes at 51 loci in species representing the major groups of anthropoid primates: human, African and Asian great apes, OWM and NWM (table 1). The apes were represented by the common chimpanzee (Pan troglodytes), the pygmy chimpanzee (P. paniscus), lowland gorilla (Gorilla gorilla), and orangutan (Pongo pygmaeus); the OWM by the bear macaque (Macaca arctoides), the rhesus macaque (M. mulatta), the crab-eating macaque (M. fascicularis), the Japanese macaque (M. fuscata), the gelada baboon (Theropithecus gelada), the yellow baboon (Papio cynocephalus), the patas monkey (Erythrocebus patas), and the green monkey (Cercopithecus aethiops); and the NWM by the cotton-top tamarin (Saguinus oedipus), the golden-mantled tamarin (S. tripartitus), the common marmoset (Callithrix jacchus), the black tufted-ear marmoset (C. penicillata), the black-capped capuchin (Cebus apella), the common squirrel monkey (Saimiri sciureus), the Bolivian squirrel monkey (S. boliviensis), the northern night monkey (Aotus trivirgatus), the southern night monkey (A. azarae), the red howler monkey (Alouatta seniculus), and the spider monkey (Ateles sp.). The human loci came from different chromosomes, and they represented different functional categories, from ubiquitously expressed housekeeping genes to genes restricted in their expression to specific tissues. The orthology of the loci was checked by phylogenetic analysis, which led to the exclusion of four of the original 55 loci: HLA-G, RNR1 (ribosomal RNA1, 28S), and MICA, on grounds of possible paralogy within multigene families, and IVL (involucrin) because of a complicated mode of evolution (Teumer and Green 1989). The extent of sequence variability caused by either polymorphism or polymerase chain reaction (PCR) and sequencing errors was estimated by independently amplifying and resequencing segments of nine of the 51 genes.
Human sequences were checked for accuracy by comparing them with the corresponding segments of the human genome database entries. If conflicts occurred, sequences that differed from other primate sequences by the fewest substitutions were used. When more than one sequence from a particular primate lineage was available in the databases, sequences from cotton-top tamarin, bear macaque, and common chimpanzee were chosen for consistency. In other cases, the sequence of the nearest available relative from the same lineage was used. For simplicity, we refer to the representatives of the individual lineages as operational taxonomic units (OTUs). Throughout the text, the human, chimpanzee, gorilla, orangutan, macaque (OWM), and tamarin (NWM) lineages (OTUs) are abbreviated as H, C, G, O, M, and T, respectively. Some genes (UOX, FPR1, HBG1) are functional in certain lineages but have become inactivated in others. In such cases, the gene is treated according to its functional state in the majority of the OTUs.


Table 1 List of 57 Segments of 51 Loci Examined in the Present Study

 


 
Partitioning Analysis
Sequences were aligned by eye; only in the case of the globin genes and ApoB segment 1 was an alignment obtained from the databases. All variable sites were then identified and classified as two-base, three-base, or four-base sites according to the number of nucleotide types found at each of them in the different species. Following Satta, Klein, and Takahata (2000), each variable site was classified as consisting of singletons, doubletons, and tripletons depending on whether a variant nucleotide occurred in one, two, or three of the six OTUs, respectively. This classification is sufficient to adequately and unambiguously describe the configuration of a site of six OTUs; it is unnecessary to count quatratons (nucleotide shared by four OTUs), pentatons (sharing by five OTUs), or hexatons (an invariant nucleotide) at a site because these are already incorporated in the description of the site in terms of singletons, doubletons, and tripletons. For example, the sharing of a nucleotide by five sequences at a site (a pentaton) is already inferred by the observation that such a site consists of one singleton, no doubletons, and no tripletons. A quatraton is inferred either from the presence of two singletons, no doubletons, and no tripletons or from the presence of no singletons, one doubleton, and no tripletons. Consequently, groupings above the level of tripleton are redundant in the classification of a site consisting of six OTUs. The total number of different singletons, doubletons, and tripletons that the six species could theoretically yield is 6, 15, and 20, respectively. The partitioning pattern of a given site provides information about OTUs because a shared character is more likely to be derived from a single mutation at the stem of two branches than from two independent mutational events.

To explain the system of partitioning, consider a site occupied by nucleotides g, c, c, a, a, and a in the OTUs H, C, G, O, M, and T, respectively. (To avoid confusion between nucleotide and OTU designations, here and subsequently we use italicized lowercase letters for the former and roman type uppercase letters for the latter.) The site contains one singleton because the nucleotide g occurs only in H and in no other OTU. The site also contains a doubleton because it is occupied by a c in both C and G but in no other OTU. Finally, the site also contains one tripleton because it is occupied by an a in O, M, and T but by a different nucleotide in the remaining three OTUs. If we did not know the root of the tree, this partitioning pattern would suggest that C and G share a common ancestor, as do O, M, and T (so that C, G, and H would share a common ancestor as well). But because we know the root, we can explain the observed pattern by assuming a single mutation in the stem of the C, G, and H branches. The partitioning of sites is then taken one step further. We notice that the chosen site has a g in H but that it is occupied by two different nucleotides in the other OTUs, c in C and G but a in O, M, and T. Altogether, the site is occupied by three different nucleotides in the six OTUs, and so it is classified as a three-base site. The sole singleton is classified as a three-base singleton. Similarly, the site contains a three-base doubleton and a three-base tripleton. If the site were occupied by g, a, a, a, a, and a in H, C, G, O, M, and T, respectively, it would consist of a two-base singleton, no doubletons, and no tripletons.
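The classification of a single site can be sketched in a few lines of Python. This is only an illustration of the rules described above, not the procedure actually used in the study; the function name `partition_site` and the string encoding of a site (six lowercase nucleotides in the order H, C, G, O, M, T) are assumptions introduced here.

```python
from collections import defaultdict
from math import comb

OTUS = ("H", "C", "G", "O", "M", "T")

def partition_site(site):
    """Classify one aligned site given as six nucleotides in OTUS order.

    Returns the number of nucleotide types at the site (2-, 3-, or 4-base)
    and the singletons, doubletons, and tripletons found there.
    """
    groups = defaultdict(list)
    for otu, base in zip(OTUS, site):
        groups[base].append(otu)
    names = {1: "singleton", 2: "doubleton", 3: "tripleton"}
    # Groups of four or more OTUs are implied by the smaller groups,
    # so they are not reported separately.
    parts = sorted(
        (names[len(otus)], "".join(otus))
        for otus in groups.values()
        if len(otus) in names
    )
    return len(groups), parts

# The two example sites from the text.
print(partition_site("gccaaa"))
print(partition_site("gaaaaa"))
# Number of distinct singletons, doubletons, and tripletons for six OTUs.
print([comb(6, k) for k in (1, 2, 3)])
```

For the first example the sketch reports a three-base site with one singleton (H), one doubleton (C, G), and one tripleton (O, M, T); for the second it reports a two-base site with a single singleton (H).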

The classification was used to provide support for or against the arrangements of the OTUs into specific clades. Singletons are not phylogenetically informative and do not provide support for or against particular clades. Doubletons and tripletons that support the separation of OTUs into clades consistent with the consensus phylogeny (the one depicted in fig. 1) are called compatible. Doubletons and tripletons that support separation into clades inconsistent with the consensus phylogeny are called incompatible. A site can contain more than one doubleton or tripleton and so can be informative about more than one clade in the phylogeny. Because the number of variable sites in a data set is a function of the evolutionary rate and the total branch length of the phylogeny, the number of homoplasies is expected to increase with an increase in these two parameters. The estimated degree of homoplasy can therefore be expected to vary according to the species chosen as an out-group and the interval from the first divergence to the present time.
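A hedged sketch of the compatible/incompatible distinction follows. It assumes the consensus phylogeny ((((H, C), G), O), M), T) and, as a simplifying assumption made here rather than a rule stated in the text, treats a doubleton or tripleton as compatible when the OTUs sharing the nucleotide, or the remaining OTUs, form one of the consensus clades. The name `classify_group` is hypothetical.

```python
ALL_OTUS = frozenset("HCGOMT")

# Clades of the assumed consensus phylogeny ((((H,C),G),O),M),T).
CONSENSUS_CLADES = {
    frozenset("HC"),
    frozenset("HCG"),
    frozenset("HCGO"),
    frozenset("HCGOM"),
}

def classify_group(group):
    """Label a set of OTUs that share a nucleotide at one site.

    Assumption of this sketch: the group is compatible if it, or its
    complement, coincides with a clade of the consensus phylogeny.
    """
    group = frozenset(group)
    rest = ALL_OTUS - group
    if len(group) < 2 or len(rest) < 2:
        return "uninformative"      # singletons carry no grouping signal
    if group in CONSENSUS_CLADES or rest in CONSENSUS_CLADES:
        return "compatible"
    return "incompatible"

for g in ("H", "HC", "CG", "HG", "OMT"):
    print(g, classify_group(g))
```

Under these assumptions the doubleton (C, G) of the example site g, c, c, a, a, a is incompatible with the consensus, whereas the tripleton (O, M, T) is compatible because its complement is the (H, C, G) clade.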



Fig. 1.—The simulated phylogeny of six species. The values t5 to t1 correspond to the divergence times of branches leading to tip sequences of T, M, O, G, C, and H, respectively. A0 is an ancestral sequence, and N1 to N4 are node sequences. The values d1 to d9 are the per-site nucleotide divergences of the individual branches

 
Computer Simulation
To examine the effects of mutation rate and nucleotide composition on the frequency of homoplasy, computer simulations were undertaken. The divergence times of the H, C, G, O, M, and T lineages were designated as t1 to t5 (fig. 1). Under a given mutation rate µ (per site per unit time), the expected number of nucleotide substitutions per site on branches leading to the different OTUs was based on actual sequence data (see Results). The values used were d5 = t5µ = 0.05, d4 = t4µ = 0.027, d3 = t3µ = 0.0125, d2 = t2µ = 0.005, and d1 = t1µ = 0.0045 for the T, M, O, G, and C or H lineages, respectively. For each OTU, a single sequence was generated in each replication. The simulation began with an ancestral sequence, A0, composed of 1,000 identical nucleotides (all a's). To generate a representative sequence of OTU T, for each of the 1,000 sites, a single uniform random variable v was generated. If v was smaller than d5, a substitution was introduced at the site. The identity of the introduced nucleotide was determined by using the four-state model, in which a nucleotide has an equal probability of changing to any of the three remaining nucleotides (an assumption underlying the Jukes-Cantor model; see Jukes and Cantor 1969), or the two-state model, in which an a can change only to t (or vice versa) and g can only change to c (or vice versa); such a situation can occur in extremely at- or gc-rich regions of the genome. If, however, the simulation is started with a's only, the g↔c change can never happen. The two-state model also covers the case in which transitions and transversions have very different rates. The simulation assumes only two bases, a and t or g and c. If instead of these two bases, two different bases, such as a and g, are considered, the simulation does not change in principle and the a and g model corresponds to the extreme case of a strong substitutional bias.
To simulate the divergence of lineages, four node sequences (N1 to N4) and six tip sequences (T, M, O, G, C, and H) were generated (fig. 1). For example, N1 was generated from A0 with probability of substitutions of d6 = d5 - d4 at each site in the same way as above. From N1, the N2 node sequence and the M tip sequence were generated by nucleotide substitutions with probabilities of d7 = d4 - d3 and d4, respectively. The extent of incompatibility of the generated tip sequences was then examined by the same method as that applied to the actual data, and the extent of compatibility of the (H, C), (H, C, G), and (H, C, G, O) clades was estimated from the simulated sequences. To obtain the distribution of the extent of incompatibility, 10,000 incompatibility values for each clade were generated, and the proportion of incompatibility values that fell in a particular range was calculated. The range was defined as one of 20 divisions of an interval extending from 0 to 1.0. To examine the effect of an increased mutation rate, a 10 times higher mutation rate was obtained by increasing the di (for i = 1 to 5) values 10-fold. An "incompatibility value" was defined as the proportion of sites incompatible with the proposed phylogeny relative to all sites informative with respect to this phylogeny. For example, consider the (H, C, G) phylogeny: if there are x, y, and z sites supporting (H, C) G, (H, G) C, and (C, G) H phylogenies, respectively, then the incompatibility value for the (H, C, G) phylogeny is given by (y + z)/(x + y + z). Incompatibility values for (H, C, G, O) and (H, C, G, O, M) phylogenies can be calculated in a similar way.
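The simulation scheme can be approximated by the following minimal Python sketch. It is not the authors' program. The function names are hypothetical, and each tip branch is given the per-site divergence listed for its lineage (so the M tip uses d4 = 0.027). The counting rule in `hcg_counts` is a four-taxon simplification assumed here: it looks only at H, C, G, and the out-group O rather than at the full six-OTU partition.

```python
import random

# Per-site divergences of the tip branches (d5, d4, d3, d2, d1, d1).
BRANCH = {"T": 0.05, "M": 0.027, "O": 0.0125, "G": 0.005, "C": 0.0045, "H": 0.0045}
TWO_STATE = {"a": "t", "t": "a", "g": "c", "c": "g"}

def evolve(seq, p, rng, two_state=False):
    """Copy seq, substituting each site independently with probability p."""
    out = []
    for base in seq:
        if rng.random() < p:
            if two_state:
                base = TWO_STATE[base]            # a<->t or g<->c only
            else:
                base = rng.choice([b for b in "acgt" if b != base])
        out.append(base)
    return out

def simulate_tips(rng, n_sites=1000, scale=1.0, two_state=False):
    """Generate the six tip sequences from an all-a ancestor A0."""
    d = {k: v * scale for k, v in BRANCH.items()}
    a0 = ["a"] * n_sites
    tips = {"T": evolve(a0, d["T"], rng, two_state)}
    node = evolve(a0, d["T"] - d["M"], rng, two_state)           # N1
    for tip, nxt in (("M", "O"), ("O", "G"), ("G", "C")):
        tips[tip] = evolve(node, d[tip], rng, two_state)
        node = evolve(node, d[tip] - d[nxt], rng, two_state)     # N2, N3, N4
    tips["C"] = evolve(node, d["C"], rng, two_state)
    tips["H"] = evolve(node, d["H"], rng, two_state)
    return tips

def hcg_counts(tips):
    """Count sites supporting (H,C)G, (H,G)C, and (C,G)H with O as out-group."""
    x = y = z = 0
    for h, c, g, o in zip(tips["H"], tips["C"], tips["G"], tips["O"]):
        if h == c and g == o and h != g:
            x += 1
        elif h == g and c == o and h != c:
            y += 1
        elif c == g and h == o and c != h:
            z += 1
    return x, y, z

def incompatibility(x, y, z):
    """(y + z) / (x + y + z); None when no site is informative."""
    total = x + y + z
    return (y + z) / total if total else None

rng = random.Random(1)
values = [incompatibility(*hcg_counts(simulate_tips(rng))) for _ in range(100)]
defined = [v for v in values if v is not None]
print(len(defined), "of", len(values), "replicates had informative sites")
```

Passing `scale=10` corresponds to the 10-fold higher mutation rate, and `two_state=True` to the two-state model. Increasing the number of replicates to 10,000 and binning the values into 20 intervals between 0 and 1.0 would follow the design described in the text.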


    Results and Discussion
Characteristics of the Data Set
The data set used in the final analysis comprised 57 segments of 51 genes in six OTUs and hence 306 sequences altogether. The length of the sequences varied depending on the gene. The total length of the concatenated sequences, after the removal of gaps and initiation and stop codons, was 62,533 bp. Of the total number of sites, 54,073 (86.3%) were invariant, and 1,402 (15.2%) of the 8,457 (13.7%) variable sites were phylogenetically informative. At the variable sites, singletons were present in similar numbers in H (287), C (256), and G (308) sequences. In the O, M, and T sequences, the number of singletons increased to 684, 1,705, and 4,189, respectively, corresponding to their increasingly greater phylogenetic distance from the other OTUs.

To estimate the extent to which the interspecies comparisons might be influenced by either intraspecies polymorphism or by errors in sequence determination (either during PCR amplification or during sequencing), nine randomly chosen gene segments were reamplified and resequenced from all or nearly all the nonhuman OTUs. Comparison of the "old" and the "new" sequences (a total of about 30 kb in length) revealed 44 differences. The number of differences varied from gene to gene, being highest in APOA1 (12 differences in a total of 4.8 kb of sequence from four OTUs) and lowest in POMC (two differences in a total of 2.4 kb of sequence from four OTUs). The mean was 1.5 differences per kilobasepair of sequence, all the differences being singletons (i.e., they were not shared with any other sequence in the alignment). Significantly, all the incompatible sites included in the resequenced set could be confirmed.

Partitioning Analysis to Identify the Nearest Living Relative of the Human Species
In a partitioning analysis, the sites at which differences occur between the studied OTUs are considered individually in terms of their support or the lack thereof for a particular phylogeny. Initially, the differential sites are classified into singletons, doubletons, or tripletons for each of the six OTUs separately or for the various combinations of the OTUs, as described in Materials and Methods (table 2). In the next step, the 41 partitions that are theoretically possible with a set of six OTUs are further classified into two-, three-, or four-base categories (see Materials and Methods). Phylogenetically informative partitions are then identified, their disposition regarding individual phylogenies is noted, and the sites supporting a particular phylogeny are tallied. Because a larger number of phylogenies are possible for clades with a greater number of OTUs, only those partitions that informatively group one OTU of the (H, C, G), (H, C, G, O), or (H, C, G, O, M) phylogenies with the appropriate out-group (O, M, or T, respectively) are considered in estimating the extent of incompatibility for each clade.


Table 2 Partition of Variable Sites in the 51 Loci from Six OTUs

 
Table 2 displays the number of sites that fall into the individual partitions into which the differential sites in the alignment of the sequences from the six OTUs can be classified. The partitions differ in their phylogenetic informativeness. Singletons are uninformative about phylogenetic relationships among the species; only sites that contain at least two types of nucleotides, each type being shared by at least two OTUs, are considered to be phylogenetically informative. All the 15 possible doubletons, regardless of the number of nucleotide types found at the site, can be phylogenetically informative. By contrast, only the two- and three-base categories of the tripletons are phylogenetically informative because a four-base tripleton in reality consists of one tripleton and three singletons.

Altogether, 1,402 sites were found to be phylogenetically informative, and of these, approximately 90% provided information regarding the grouping of the OTUs under consideration here. The remaining 10% gave information on groupings not relevant to the study; for example, the grouping of O with M or of C, G, and M. Of the 89 sites that are informative about the H, C, and G relationship, 46 sites (52%) were found to support the (H, C) clade excluding all the other OTUs, a value similar to that found by Satta, Klein, and Takahata (2000). Of these, 43 sites were of the two-base type, and 3 sites were of the three-base type. Fourteen sites (11 of the two-base type and 3 of the three-base type) supported the (H, G) clade, and 27 sites (24 of the two-base and 3 of the three-base type) supported the (C, G) clade. Thus, the results of the partitioning analysis uphold the conclusion reached in an earlier study with a different data set (Satta, Klein, and Takahata 2000), namely, that the chimpanzee is the nearest living relative of Homo sapiens. At the same time, however, the high proportion of incompatibility between the phylogenetically informative sites (with 48% of the sites supporting alternative phylogenies) indicates that sorting out of ancestral polymorphisms, homoplasy, or both have blurred the phylogenetic signals that might have otherwise indicated clearly the disjunction of the H, C, and G lineages.

Dissociation of Ancestral Polymorphism from Homoplasy
The results described in the preceding section indicate that the gorilla lineage diverged from the lineage leading to the common ancestor of human and chimpanzee before these last two species (lineages) diverged from each other. The interval between these two divergences was apparently relatively short, probably not more than 1–3 Myr, well within the range of persistence of ancestral polymorphism in a large population (Takahata 1995). To estimate to what degree homoplasy might have contributed to the blurring of phylogenetic signals during this interval, it is necessary to extend the partitioning analysis by including more distantly related lineages (O, M, T) in it. Both the paleontological (Martin 1993) and molecular (Sarich and Wilson 1967; Sibley, Comstock, and Ahlquist 1990; Horai et al. 1992) data indicate that the orangutan lineage diverged from the lineage leading to the common ancestor of human, chimpanzee, and gorilla 12–15 MYA. Because the human and chimpanzee lineages diverged from each other 5 MYA, and the gorilla lineage diverged from the lineage of the (H, C) ancestor not more than 8 MYA (Horai et al. 1992), an interval of >4 Myr separated the divergence of the orangutan lineage from that of the (H, C, G) lineage. This interval is too long for any ancestral polymorphism (except that maintained by balancing selection; see Klein et al. 1998) to survive, and so any incompatibilities between phylogenetically informative sites in the analysis of the (H, C, G, O) phylogeny should be attributable to homoplasy.

The inclusion of the O OTU in the partitioning analysis revealed the existence of 332 sites that support the (H, C, G) clade excluding O with M as the out-group (table 3). Forty-five sites are inconsistent with this grouping in that they include O in the clade and exclude H (10 sites), C (14 sites), or G (21 sites). Thus, the (H, C, G) clade is supported by 88% of the informative sites, with the remaining 12% of sites supporting alternative phylogenies. The incompatibilities of the latter sites are presumably the result of homoplasy in the lineages leading to M on the one hand and to the (H, C, G, O) lineage on the other. The increase in the proportion of sites that support the standard phylogeny from 52% for the (H, C) clade to 88% for the (H, C, G) clade is presumably a reflection of the corresponding decrease in the contribution of ancestral polymorphism to the evolution of the two clades.
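The percentages quoted in this paragraph follow directly from the site counts. The short check below uses only the numbers given in the text; the variable names are illustrative.

```python
supporting = 332                               # sites supporting (H, C, G) excluding O
conflicting = {"H": 10, "C": 14, "G": 21}      # sites grouping O with the clade and excluding this OTU

n_conflicting = sum(conflicting.values())
informative = supporting + n_conflicting
incompat = n_conflicting / informative         # incompatibility value for the (H, C, G) clade

print(informative, round(100 * (1 - incompat)), round(100 * incompat))
# prints: 377 88 12
```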


Table 3 Partitioning of Informative Sites in the 51 Loci Divided into Coding and Noncoding Regions

 
Similarly, the (H, C, G, O) grouping excluding M and T is supported by 638 (81%) of the 789 informative sites, the remaining 19% of sites supporting alternative groupings: (H, C, G, M), 10%; (H, C, M, O), 3.6%; (H, M, G, O), 1.6%; or (M, C, G, O), 3.8%. Here, homoplasy can be assumed to have affected 19% of the informative sites. The increase in homoplasy from 12% for the (H, C, G, O) lineage to 19% for the (H, C, G, O, M) lineage is as expected, taking into account the increased divergence time of the out-group T to the latter group (>45 Myr; Martin 1993) in comparison with that of the out-group M to the former group (~30 Myr; Martin 1993) and assuming that the probability of substitution is a function of time. By assuming that maximally 12% of the informative sites have also been influenced by homoplasy during the evolution of the (H, C, G) group, we estimate that in maximally one-quarter of the 48% incompatible sites found in this group, the incongruence with the consensus phylogeny is the result of homoplasy. The incongruence of the remaining three-quarters of incompatible sites is presumably caused by the segregation of ancestral polymorphisms.
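The apportioning of the 48% incompatibility between homoplasy and ancestral polymorphism is a simple ratio. The sketch below restates it using the figures from the text, with the 12% homoplasy estimate treated, as in the text, as an upper bound. Variable names are illustrative only.

```python
# Homoplasy estimate for the (H, C, G, O) grouping, from the site counts.
hcgo_support, hcgo_informative = 638, 789
homoplasy_hcgo = 1 - hcgo_support / hcgo_informative

# Share of the (H, C) incompatibility attributable to homoplasy at most.
incompat_hc = 0.48           # incompatible sites in the H, C, G comparison
homoplasy_ceiling = 0.12     # upper bound taken from the (H, C, G) clade estimate
share_homoplasy = homoplasy_ceiling / incompat_hc
share_polymorphism = 1 - share_homoplasy

print(round(100 * homoplasy_hcgo))        # about 19 (%)
print(round(share_homoplasy, 2))          # at most one-quarter
print(round(share_polymorphism, 2))       # at least three-quarters
```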

Possible Reasons for the High Level of Homoplasy
The observation that 19% and 12% of the phylogenetically informative sites have undergone homoplasious substitutions during the time interval between the divergence of the Platyrrhini from the Catarrhini lineage and of the OWM from the ape lineage, respectively, is surprising and unexpected. The common perception, supported by computer simulations based on the standard models of molecular evolution (see below), is that homoplasies in intervals of these lengths are rare, on the order of a few percent at most. Even in cases in which intense selection is known or suspected to drive the substitution process (Takahata 1995) or in which functional convergences at the molecular level are postulated (Swanson, Irwin, and Wilson 1991; Irwin, White, and Wilson 1993; Lawn, Schwartz, and Patthy 1997), homoplasy is believed to be an exception rather than a rule. The following question therefore arises: What might be the reason for the high homoplasy found in the primate lineages? In what follows, we consider three possible answers to this question.

The first possibility is that the observed homoplasy is a manifestation of selection pressure for convergence in function. Many of the studied genes (ABO, RHAG, PRM2, RNASE3, SRY) are known or postulated to be under moderate-to-strong selection pressure (O'hUigin, Sato, and Klein 1997; Zhang, Rosenberg, and Nei 1998; Wyckoff, Wang, and Wu 2000). Could this pressure be responsible for the high homoplasy? To test this possibility, we divided the data set into coding and noncoding subsets and carried out the partitioning analysis separately on the two subsets (table 3, parts B and C). Of the 29,451 coding sites of the 51 loci, 3,018 (10%) were found to be variable, and of these, 545 (18%) are phylogenetically informative. Of the 545 sites, 27, 148, and 312 are informative about the (H, C, G), (H, C, G, O), and (H, C, G, O, M) phylogenies, respectively. Fifteen of the 27 (56%) relevant informative sites support the (H, C) clade, 129 of 148 (87%) support the (H, C, G) clade, and 255 of 312 (82%) are compatible with the (H, C, G, O) clade. Similarly, of the 32,921 noncoding sites, 5,424 (16%) were found to be variable, and of these, 855 (16%) are phylogenetically informative. Of these, 62, 229, and 476 sites are informative about the (H, C, G), (H, C, G, O), and (H, C, G, O, M) phylogenies, respectively. Thirty-one of the 62 (50%) relevant informative sites support the (H, C) clade, 203 of 229 (89%) the (H, C, G) clade, and 382 of 476 (80%) the (H, C, G, O) clade. Thus, the differences in the proportion of incompatibilities between the coding and noncoding regions are small, and no strong tendency for homoplasy to arise preferentially in the coding regions is apparent. Selection therefore does not appear to play a dominant part in the generation of homoplasy.
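
One way to make "the differences are small" concrete is to compare the coding and noncoding proportions clade by clade. The sketch below uses the site counts quoted above and a Pearson chi-square statistic for a 2 x 2 table. The choice of test is an assumption made here for illustration and is not the analysis reported in the text; all names are hypothetical.

```python
# Critical value of chi-square with 1 degree of freedom at the 5% level.
CRITICAL_5PCT_1DF = 3.841

# (supporting sites, informative sites) per clade, as quoted in the text.
COUNTS = {
    "(H, C)": {"coding": (15, 27), "noncoding": (31, 62)},
    "(H, C, G)": {"coding": (129, 148), "noncoding": (203, 229)},
    "(H, C, G, O)": {"coding": (255, 312), "noncoding": (382, 476)},
}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def compare(clade):
    """Chi-square for compatible vs. incompatible sites, coding vs. noncoding."""
    (cs, ct), (ns, nt) = COUNTS[clade]["coding"], COUNTS[clade]["noncoding"]
    return chi_square_2x2(cs, ct - cs, ns, nt - ns)


for clade in COUNTS:
    print(clade, round(compare(clade), 3))
```

With these counts, none of the three statistics exceeds the 5% critical value of 3.841, which is in line with the statement that coding and noncoding regions do not differ appreciably in their proportion of incompatible sites.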

The second possibility is that the observed high level of homoplasy is caused by a bias in nucleotide composition. The mechanisms producing bias in equilibrium nucleotide frequencies in different genomic regions are unclear (Wolfe, Sharp, and Li 1989). Because maintenance of compositional bias increases the probability of like substitutions, such bias could be expected to indirectly influence the extent of homoplasy at certain sites in a gene and in certain regions of a genome. This effect should be most pronounced in the third positions of codons and in noncoding regions, where mutational bias is the primary determinant of nucleotide composition. The absence of a significant increase in homoplasy in noncoding regions noted above therefore provides an argument against this explanation. Further evidence emerges from an examination of the nucleotide composition of the individual genes (table 4). Compositional bias was measured by the method of Kornegay et al. (1993). It was then related to the number of variable sites, the number of phylogenetically informative sites, and the number of sites compatible or incompatible with the consensus phylogeny (table 4). The measurement of incompatibility was limited to the (H, C, G, O) and (H, C, G, O, M) groups, where no contribution of ancestral polymorphism should occur. Incompatibility ranged from 0% to 72%, and most of the genes in the data set (45 of 57 segments) showed at least some. The relatively high percentages of incompatibility found in some of the short genes (ZFX, ACAT2, DRD4, PROC) may be caused by stochastic effects associated with a low number of informative sites. The longer genes, which contain more informative sites, probably provide a more reliable estimate of incompatibility. The gist of the comparison is that compositional bias in either coding or noncoding regions does not appear to have a strong effect on the degree of homoplasy found in the individual genes. Genes showing the highest levels of compositional bias often show below-average homoplasy (e.g., ADBR3, MSH2) or none (e.g., ZNFN1A1, APOB). By contrast, genes with high levels of homoplasy may have a low (e.g., RNASE3, PROC) or moderate (e.g., PRM2, LCAT) degree of compositional bias.


Table 4 Gene-by-Gene Analysis of Incompatibility

 
The third theoretically possible explanation for the observed high average level of homoplasy in the primate genes is a high mutation rate with an associated increased probability of multiple hits. An increased mutation rate can have a variety of causes, one of which is the presence of CpG dinucleotides prone to methylation and thus to a high frequency of C→T transitions (Green et al. 1990). To test the effect of an increased mutation rate, substitution rates were estimated for the individual genes. Assuming some correspondence between mutation and substitution rates and using Kimura's two-parameter method (Kimura 1980), we calculated the per-site substitution rate (K) for each gene in all 15 pairwise comparisons of the six OTUs, summed the values, and expressed the sum as a percentage, a measure we refer to as "ΣK%" in table 4. The rates were calculated for all sites of a given gene and for synonymous sites separately. Except for a few genes (generally the slowly evolving ones), the two rate estimates correlated reasonably well with each other. The general tendency revealed by the estimates is that genes with higher mutation rates show higher degrees of homoplasy. Thus, the 14 most rapidly evolving gene segments in the (H, C, G, O) clade have 23% incompatible sites on average (100 of 437 sites), whereas the most slowly evolving gene segments have a mean incompatibility of 15% (9 of 60 sites), the overall mean incompatibility for this clade being 17%. Eleven of the 12 gene segments that show no incompatibilities caused by homoplasy are in the slowly evolving set, and there is no segment without incompatibility among the 25 most rapidly evolving gene segments. Nonetheless, although incompatibility does tend to be associated with higher mutation (substitution) rates, the association is not very strong, and it is not clear to what degree the higher incompatibility levels can be attributed to the increased rates. At least some of the incompatibility may be related to special evolutionary characteristics of some of the genes. For example, the high incompatibility level of the RNASE3 gene found in the (H, C, G, O) clade (13 incompatibilities at 15 sites) may be attributed to its retention of the character of the RNASE2 gene, from which it arose by duplication following the divergence of the Catarrhini from the Platyrrhini (Zhang, Rosenberg, and Nei 1998).
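
The ΣK% measure described above can be sketched as follows. Kimura's two-parameter distance is K = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The function names, the handling of gaps and ambiguous bases, and the toy alignment are assumptions made for illustration; the text does not specify these details.

```python
import math
from itertools import combinations

# Unordered base pairs that differ by a transition (purine<->purine, pyrimidine<->pyrimidine).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    # Assumption: columns with gaps or ambiguous bases are simply skipped.
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    p = sum(1 for a, b in pairs if frozenset((a, b)) in TRANSITIONS) / n
    q = sum(1 for a, b in pairs
            if a != b and frozenset((a, b)) not in TRANSITIONS) / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def sigma_k_percent(sequences):
    """Sum K over all pairwise comparisons and express the sum as a percentage."""
    total = sum(k2p_distance(a, b) for a, b in combinations(sequences.values(), 2))
    return 100.0 * total


# Toy alignment of six OTUs (made-up sequences, for illustration only).
toy = {
    "H": "ACGTACGTAC", "C": "ACGTACGTAC", "G": "ACGTACGTAT",
    "O": "ACGTACGCAT", "M": "ACATACGCAT", "T": "ACATTCGCAT",
}
print(round(sigma_k_percent(toy), 1))
```

For six OTUs the sum runs over 15 pairs, matching the 15 pairwise comparisons mentioned in the text.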

Search for the Cause of High Homoplasy by Computer Simulation
Partitioning analysis excluded selection pressure, but not nucleotide composition, as the cause of the high homoplasy of the primate genes, and it provided an indication that variation in the mutation rate of different sites might be a factor. To test whether a combination of the two factors, nucleotide composition bias and variability of mutation rates, might explain the data, a computer simulation was carried out. The influence of nucleotide composition was simulated by using either the four-state or the two-state model of molecular evolution. (As mentioned earlier, the strong transition bias model is covered by the two-state model.) The effect of the mutation rate was assessed by letting the genes evolve with a rate of d5 = 0.05 and then again with a 10-fold higher rate of d5 = 0.5. The first of these two values was obtained by taking the average divergence at synonymous sites of the 51 studied loci calculated from the comparison of the T sequences with the sequences of the other five OTUs (i.e., d5 = 0.09862/2 ≈ 0.05). Thus, the combination of the two variants of each of the two factors yielded four different conditions under which the genes evolved. The simulated evolutionary process was aimed at producing six OTUs related to one another in the manner depicted in figure 1 (for further details see Materials and Methods). The simulation was repeated 10,000 times for each of the four sets of conditions, and the results were summarized in graphic form separately for the (H, C), (H, C, G), and (H, C, G, O) clades (fig. 2; panels A, B, and C, respectively). Plotted on the abscissa of the graph are the incompatibility values, that is, the proportions of informative sites incompatible with the particular clade in the set of artificially generated sequences. The ordinate indicates the frequency with which each particular proportion occurred in the 10,000 replicates; it can also be interpreted as the probability of obtaining a particular proportion of incompatible sites in one simulation experiment or at one locus.
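
A minimal sketch of this kind of simulation is given below. It is not the program used for figure 2. The tree is assumed to be the ladder (((((H,C),G),O),M),T), the node ages are placeholders (only the ~30 and >45 Myr values echo figures quoted earlier), the substitution process is a symmetric k-state model (k = 4 corresponds to Jukes-Cantor, k = 2 to the two-state model), and the rule for calling a site informative is a simplified reading of the partitioning analysis. All function names are hypothetical.

```python
import math
import random

TIPS = ["H", "C", "G", "O", "M", "T"]
# Hypothetical node ages (Myr): NODE_AGES[i] joins TIPS[0..i] with TIPS[i+1].
NODE_AGES = [6.0, 8.0, 14.0, 30.0, 45.0]


def evolve(state, d, k, rng):
    """State after a branch of d expected substitutions (symmetric k-state model)."""
    p_diff = (k - 1) / k * (1.0 - math.exp(-k * d / (k - 1)))
    if rng.random() < p_diff:
        return rng.choice([s for s in range(k) if s != state])
    return state


def simulate_site(d_root, k, rng):
    """Evolve one site down the ladder tree; d_root is the root-to-tip distance."""
    scale = d_root / NODE_AGES[-1]
    node, age = rng.randrange(k), NODE_AGES[-1]
    tips = {}
    for i in range(len(NODE_AGES) - 1, -1, -1):
        if NODE_AGES[i] < age:  # step down to the next internal node
            node = evolve(node, (age - NODE_AGES[i]) * scale, k, rng)
            age = NODE_AGES[i]
        tips[TIPS[i + 1]] = evolve(node, age * scale, k, rng)
    tips[TIPS[0]] = evolve(node, age * scale, k, rng)
    return tips


def classify(tips, n):
    """Classify a site for the clade of the first n tips; TIPS[n + 1] is the out-group."""
    out_state = tips[TIPS[n + 1]]
    ingroup = TIPS[:n + 1]
    odd = [t for t in ingroup if tips[t] == out_state]
    derived = {tips[t] for t in ingroup if tips[t] != out_state}
    if len(odd) != 1 or len(derived) != 1:
        return None  # not informative for this clade
    return "compatible" if odd[0] == TIPS[n] else "incompatible"


def incompatible_fraction(n_sites, d_root, k, n, rng):
    """Proportion of informative sites that are incompatible with the clade."""
    counts = {"compatible": 0, "incompatible": 0}
    for _ in range(n_sites):
        label = classify(simulate_site(d_root, k, rng), n)
        if label:
            counts[label] += 1
    informative = sum(counts.values())
    return counts["incompatible"] / informative if informative else None


rng = random.Random(1)
for k in (4, 2):
    for d_root in (0.05, 0.5):
        # n = 4 tests the (H, C, G, O) clade with T as out-group.
        print(k, d_root, incompatible_fraction(5000, d_root, k, 4, rng))
```

Because the branch lengths are placeholders, the printed proportions should not be expected to reproduce the values in figure 2; the sketch only shows how the four conditions (two models, two rates) can be compared on one tree.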



Fig. 2. Simulation results of the effects of the mutation rate and nucleotide composition on the observed extent of homoplasy. The abscissa shows the proportion of informative sites that are incompatible with the (H, C) clade (A), the (H, C, G) clade (B), or the (H, C, G, O) clade (C) in a set of generated sequences. The ordinate shows the proportion of replicates among 10,000 supporting the indicated incompatibility. µ, mutation rate; L, low; H, high (i.e., 10 times higher than L); 2S, two-state model; 4S, four-state (Jukes-Cantor) model.

 
The simulation reveals that under the low mutation rate, with either the two- or the four-state model, more than 80% of the replicates show no incompatibility in either the (H, C) or the (H, C, G) clade (fig. 2A and B). In the remaining <20% of replicates, incompatibilities do occur, but because they are rare, they are widely scattered among the replicates. Moreover, because mutations are generally rare, when an even rarer incompatible mutation does occur, it has a large effect on the incompatibility value. Consequently, the simulation is subject to considerable "noise," reflected in the scattering of incompatibilities. Because the (H, C, G, O) clade is deeper than either the (H, C) or the (H, C, G) clade, the larger number of substitutions found in the deeper phylogeny results in less noise in the simulation results than is found in the shallower clades (fig. 2C). Although more incompatibilities occur, their proportion in the total number of substitutions varies far less than when only a few informative sites are present. The increased number of incompatibilities is reflected in the observation that only 55% of replicates produce no incompatibilities under the four-state model and that the value drops to 19% under the two-state model.

Higher mutation rates increase the proportion of incompatibility in each of the three clades in both the four- and two-state models. In the case of the (H, C) clade, the proportion of replicates without incompatibility falls below 10% under the four-state model and to 0% under the two-state model. Incompatibilities are distributed in varying proportions through most replicates, the variation again being the result of sampling effects in the shallow phylogeny. In the (H, C, G) clade, no replicate under either the four- or two-state model is without incompatibility. The distribution of incompatibility shows a peak at ~25% under the four-state model and a broad-shouldered peak at ~35% under the two-state model. Finally, in the (H, C, G, O) clade, the peak under the four-state model moves to ~30% and that under the two-state model to an equilibrium value of 80%. In the two-state model, four incompatibilities arise for every compatibility generated.

From these observations the following conclusions can be drawn. First, the level of homoplasy is insignificant when the mutation rate is uniformly low at all the sites and when the nucleotide composition is unbiased. Second, the two-state model does not markedly affect the extent of homoplasy in comparison with the four-state model when the range of sequence divergence is low (<10%). Third, a high mutation rate greatly increases the extent of homoplasy: even for a pair of OTUs with a short divergence time, the proportion of loci with compatible sites only is reduced to <10%. The reduction takes place under both models, but it is more pronounced under the two-state than under the four-state model.

Simulation Based on a Mixed Rate Model
Taking these results into consideration, together with the possibility that mutation rates may vary from site to site and between different regions of a gene or genome, we constructed a mixed rate model and used it in another set of computer simulations. To simulate the situation encountered with the actual data set more realistically, the number of replicates was reduced from 10,000 to 51, corresponding to the number of genes in the set. To provide for the heterogeneity of the mutation rate, we allowed 900 of the 1,000 sites at each gene to mutate at the low (1µ) rate and the remaining 100 sites at the 10 times higher (10µ) rate. In all other respects the simulation was carried out as in the first experiment. The compatibility values obtained, 62%, 85%, and 85% for the (H, C), (H, C, G), and (H, C, G, O) clades, respectively, were in reasonably good agreement with the actual data.
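
A back-of-the-envelope calculation illustrates why a small class of fast sites can matter. It assumes, as a simplification that is not part of the simulation itself, that the expected number of mutations at a site is proportional to its rate. With 900 sites at rate µ and 100 sites at rate 10µ, the mean rate and the share of mutations falling on the fast sites are:

```latex
\bar{\mu} = \frac{900\,\mu + 100 \cdot 10\,\mu}{1000} = 1.9\,\mu,
\qquad
\frac{100 \cdot 10\,\mu}{900\,\mu + 100 \cdot 10\,\mu} = \frac{1000}{1900} \approx 0.53
```

Under this assumption, the 10% of sites in the fast class would receive roughly half of all mutations, so they would be expected to contribute disproportionately to the informative sites.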

We have thus far considered compatible or incompatible sites for a particular clade irrespective of their occurrence within or between loci. However, a locus can be incompatible with a particular clade in two different ways: in one, all the informative sites at the locus are incompatible with the clade (interlocus incompatibility), and in the other, the locus contains some sites that are incompatible with each other (intralocus incompatibility). In the experimental data, of the 57 sequence segments (table 4), 34 were informative for the (H, C) clade. Of these, six segments (18%) showed intralocus incompatibility; together they contained 21 incompatible and 20 compatible sites. The relative extent of intralocus incompatibility for the (H, C) clade (21 vs. 20) was much larger than that for the (H, C, G) clade (44 vs. 244) and for the (H, C, G, O) clade (150 vs. 562). The remaining 28 segments unambiguously supported either the (H, C) grouping (18 segments, 53%) or the (H, G) or (C, G) grouping (10 segments, 29%, representing interlocus incompatibility). Hence, in total, 82% of segments supported a single phylogeny. This proportion fell to 28/53 = 53% for the (H, C, G) clade and 15/56 = 27% for the (H, C, G, O) clade, with only one segment being incompatible with each of these clades. Similarly, in terms of the numbers of interlocus incompatible versus compatible sites, there were 22 versus 26 for the (H, C) clade but 1 versus 88 for the (H, C, G) clade and 1 versus 76 for the (H, C, G, O) clade. Thus, the actual data showed that, compared with the (H, C, G) and (H, C, G, O) clades, both intra- and interlocus incompatibilities were notably high for the (H, C) clade.
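
The distinction between the two kinds of incompatibility can be expressed as a small classification rule. The sketch below is illustrative only: the function name and category labels are hypothetical, and the tally reuses the segment counts quoted above for the (H, C) clade.

```python
def classify_locus(compatible_sites, incompatible_sites):
    """Classify a locus from its counts of compatible and incompatible informative sites."""
    if compatible_sites == 0 and incompatible_sites == 0:
        return "uninformative"
    if incompatible_sites == 0:
        return "compatible"
    if compatible_sites == 0:
        return "interlocus incompatible"  # every informative site disagrees with the clade
    return "intralocus incompatible"      # sites within the locus disagree with each other


# Segment tallies for the (H, C) clade, as quoted in the text.
hc_segments = {
    "compatible": 18,
    "interlocus incompatible": 10,
    "intralocus incompatible": 6,
}
informative = sum(hc_segments.values())
single_phylogeny = hc_segments["compatible"] + hc_segments["interlocus incompatible"]

for label, count in hc_segments.items():
    print(label, count, round(100 * count / informative))
print("single phylogeny", round(100 * single_phylogeny / informative))
```

Segments in the first two categories each support a single phylogeny, which is how the 82% figure for the (H, C) clade arises.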

Although the simulation result was in good agreement with the observed low extent of interlocus incompatibility for the (H, C, G) and (H, C, G, O) clades, it failed to account for the observed high extent of interlocus incompatibility for the (H, C) clade. With the mixed rate model it was possible to generate a high degree of intralocus incompatibility, but the extent of intralocus incompatibility tended to increase as the clade came to include more distantly related OTUs. This simulation result was again inconsistent with the observed high extent of intralocus incompatibility for the (H, C) clade and the low extent for the (H, C, G) and (H, C, G, O) clades. Thus, the simulation suggested that the relatively high extent of intra- and interlocus incompatibility observed in the (H, C) clade cannot be accounted for by high mutation rates at particular sites or by homoplasy. It must therefore have a different cause, namely, ancestral polymorphism.


    Conclusions
 
To sum up, comparative analysis of 57 sequences obtained from 51 genes in six primate OTUs representing the human, chimpanzee, gorilla, orangutan, OWM, and NWM lineages reveals the occurrence of homoplasy (parallelism or convergence) at much higher frequencies than is generally expected (19% in the six-OTU lineage and 12% in the OWM and ape lineages). Together with ancestral polymorphism, homoplasy is therefore a major source of incongruence in phylogenetic reconstructions. Whereas ancestral polymorphism may obscure only phylogenies of lineages that diverged from one another within a short interval on the evolutionary time scale, no such restriction applies to homoplasy. Of the three major factors considered here as potential causes of the observed high level of homoplasy, no compelling evidence could be mustered for the effect of selection. In particular, the observation that homoplasy is distributed equally between coding and noncoding parts of the genome argues against this explanation. Selection, however, is responsible for an increased tendency toward parallel substitutions in certain genes, such as those of the major histocompatibility complex (O'hUigin 1995; Kriener et al. 2000), which are not included in the data set used in the present study. Both the analysis of this data set and the computer simulations implicate the two other factors: variation in nucleotide composition and, in particular, variation in mutation (substitution) rates. Taken in isolation, however, these two factors do not fully account for the observed high level of homoplasy. The simulation study indicates that even in extreme cases of biased nucleotide composition, 10% incompatibility is reached very rarely when standard substitution models are applied. On the other hand, high mutation rates do appear to account for some but not all of the observed incompatibility, especially that observed in the (H, C, G) phylogeny. But because we observed a high extent of interlocus incompatibility, whereas simulation led to a high degree of intralocus incompatibility, ancestral polymorphism must be an important factor in determining the (H, C, G) phylogeny. Compared with the (H, C, G) and (H, C, G, O) clades, the large proportion of intralocus-incompatible sites relative to that of intralocus-compatible sites in the (H, C) clade cannot be accounted for by homoplasy. Rather, it is most likely the result of the combined effect of intragenic recombination and ancestral polymorphism in the stem lineage of humans, chimpanzees, and gorillas. Therefore, the contribution of homoplasy caused by mutations may be insignificant with respect to the (H, C, G) phylogeny. On the other hand, in the case of the (H, C, G, O) and (H, C, G, O, M) phylogenies, the simulation indicates that if a small percentage of nucleotide sites are allowed to undergo mutation (substitution) at a rate 5- to 10-fold higher than the normal rate, homoplasy (and phylogenetic incompatibility) will occur at levels similar to those observed at the 51 loci under study. The number of interlocus-incompatible loci as well as interlocus-incompatible sites for the (H, C) clade is much larger than that for the (H, C, G) or (H, C, G, O) clades. If the latter incompatibility is attributed to homoplasy, then the former must be attributed to ancestral polymorphism.

The inference that a category of rapidly evolving sites must exist in primate DNA has implications for phylogenetic studies. Such sites might be expected to contribute inordinately to phylogenetic information obtained on lineages diverging in rapid succession because few other sites will have undergone substitution within the short time interval. Examples of rapidly evolving sites in specific genes are known (Green et al. 1990), but the extent of their occurrence in the genome has not been determined. In cases in which rapidly evolving sites are the major source of information about phylogenetic relationships, they may have undergone several substitutions before more slowly evolving sites could contribute to phylogenetic resolution. In such cases, a phylogeny may be built primarily on sites that show a high degree of incompatibility and is likely to be incorrect.


    Acknowledgements
 
We thank Dr. Naoko Takezaki for critical reading of the manuscript, Hana Jandova and Solveig Hirschle for technical assistance, and Jane Kraushaar for editorial assistance.


    Footnotes
 
Naruya Saitou, Reviewing Editor

1 Present address: Department of Genetics, Trinity College, Dublin, Ireland.

Abbreviations: C, chimpanzee; G, gorilla; H, human; M, macaque; NCBI, National Center for Biotechnology Information; NWM, New World monkeys; O, orangutan; OTU, operational taxonomic unit; OWM, Old World monkeys; PCR, polymerase chain reaction; T, tamarin.

Keywords: homoplasy, parallelism, convergent evolution, trichotomy, primates, polymorphism

Address for correspondence and reprints: Colm O'hUigin, Max-Planck-Institut für Biologie, Abteilung Immungenetik, Corrensstrasse 42, D-72076 Tübingen, Germany. E-mail: ohuiganc@tcd.ie.


    References
 

    Bailey W. J., 1993 Hominoid trichotomy: a molecular overview. Evol. Anthropol. 2:100-108

    Green P. M., A. J. Montandon, D. R. Bentley, R. Ljung, I. M. Nilsson, F. Giannelli, 1990 The incidence and distribution of CpG→TpG transitions in the coagulation factor IX gene. Nucleic Acids Res. 18:3227-3231

    Horai S., Y. Satta, K. Hayasaka, R. Kondo, T. Inoue, T. Ishida, S. Hayashi, N. Takahata, 1992 Man's place in hominoidea revealed by mitochondrial DNA genealogy. J. Mol. Evol. 35:32-43

    Irwin D. M., R. T. White, A. C. Wilson, 1993 Characterization of the cow stomach lysozyme genes: repetitive DNA and concerted evolution. J. Mol. Evol. 37:355-366

    Jukes T. H., C. R. Cantor, 1969 Evolution of protein molecules. Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism III. Academic Press, New York

    Kimura M., 1980 A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120

    Klein J., A. Sato, S. Nagl, C. O'hUigin, 1998 Molecular trans-species polymorphism. Annu. Rev. Ecol. Syst. 29:1-21

    Kominato Y., P. D. McNeill, M. Yamamoto, M. Russel, S.-I. Hakomori, F. Yamamoto, 1992 Animal histo-blood group ABO genes. Biochem. Biophys. Res. Commun. 189:154-165

    Kornegay J. R., T. D. Kocher, L. A. Williams, A. C. Wilson, 1993 Pathways of lysozyme evolution inferred from the sequences of cytochrome b in birds. J. Mol. Evol. 37:367-379

    Kriener K., C. O'hUigin, H. Tichy, J. Klein, 2000 Convergent evolution of major histocompatibility complex molecules in humans and New World monkeys. Immunogenetics 51:169-178

    Kumar S., S. B. Hedges, 1998 A molecular timescale for vertebrate evolution. Nature 392:917-920

    Lawn R. M., K. Schwartz, L. Patthy, 1997 Convergent evolution of apolipoprotein (a) in primates and hedgehog. Proc. Natl. Acad. Sci. USA 94:11992-11997

    Martin R. D., 1993 Primate origins: plugging the gaps. Nature 363:223-234

    Miyamoto M., B. F. Koop, J. L. Slightom, M. Goodman, M. Tennant, 1988 Molecular systematics of higher primates: genealogical relations and classification. Proc. Natl. Acad. Sci. USA 85:7627-7631

    Nei M., S. Kumar, 2000 Molecular evolution and phylogenetics. Oxford University Press, Oxford

    O'hUigin C., 1995 Quantifying the degree of convergence in primate Mhc-DRB genes. Immunol. Rev. 143:123-140

    O'hUigin C., A. Sato, J. Klein, 1997 Evidence for convergent evolution of A and B blood group antigens in primates. Hum. Genet. 101:141-148

    Rogers J., 1993 The phylogenetic relationships among Homo, Pan and Gorilla: a population genetics perspective. J. Hum. Evol. 25:201-215

    Ruvolo M., 1997 Molecular phylogeny of the hominoids: inferences from multiple independent DNA sequence data sets. Mol. Biol. Evol. 14:248-265

    Sarich V. M., A. C. Wilson, 1967 Rates of albumin evolution in primates. Proc. Natl. Acad. Sci. USA 58:142-148

    Satta Y., J. Klein, N. Takahata, 2000 DNA archives and our nearest relative: the trichotomy problem revisited. Mol. Phylogenet. Evol. 14:259-275

    Schwartz J. H., 1987 The red ape: orang-utans & human origins. Houghton Mifflin Company, Boston

    Sibley C. G., J. A. Comstock, J. E. Ahlquist, 1990 DNA hybridization evidence of hominoid phylogeny: a reanalysis of the data. J. Mol. Evol. 30:202-236

    Swanson K. W., D. M. Irwin, A. C. Wilson, 1991 Stomach lysozyme gene of the langur monkey: tests for convergence and positive selection. J. Mol. Evol. 33:418-425

    Takahata N., 1993 Allelic genealogy and human evolution. Mol. Biol. Evol. 10:2-22

    Takahata N., 1995 Mhc diversity and selection. Immunol. Rev. 143:225-247

    Takahata N., Y. Satta, 1997 Evolution of the primate lineage leading to modern humans: phylogenetic and demographic inferences from DNA sequences. Proc. Natl. Acad. Sci. USA 94:4811-4815

    Teumer J., H. Green, 1989 Divergent evolution of part of the involucrin gene in the hominoids: unique intragenic duplications in the gorilla and human. Proc. Natl. Acad. Sci. USA 86:1283-1286

    White T. D., G. Suwa, B. Asfaw, 1998 Australopithecus ramidus, a new species of early hominid from Aramis, Ethiopia. Nature 371:306-312

    Wolfe K. H., P. M. Sharp, W. H. Li, 1989 Mutation rates differ among regions of the mammalian genome. Nature 337:283-285

    Woodward E. R., A. Buchberger, S. C. Clifford, L. D. Hurst, N. A. Affara, E. R. Maher, 2000 Comparative sequence analysis of the VHL tumor suppressor gene. Genomics 65:253-265

    Wyckoff G. J., W. Wang, C. I. Wu, 2000 Rapid evolution of male reproductive genes in the descent of man. Nature 403:304-309

    Zhang J., H. Rosenberg, M. Nei, 1998 Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:3708-3713

Accepted for publication May 2, 2002.