Department of Biological Sciences, University of South Carolina, Columbia
Correspondence: E-mail: austin{at}biol.sc.edu.
Abstract |
---|
Key Words: chromosome evolution, gene duplication, genome evolution, nucleotide substitution, segmental rearrangement
Introduction |
---|
Analysis of complete genome sequences from eukaryotes suggests that gene duplication, giving rise to multigene families, occurs continually over evolution (Lynch and Conery 2000). Whereas most duplicate genes are eventually lost, certain duplicate genes are retained, and a duplicate gene that assumes a new function selectively advantageous to the organism is more likely to be retained than one of neutral effect (Hughes 1994; Lynch et al. 2001). The genomes of mammals include genes that have been duplicated throughout evolutionary history. Multigene families now present in mammalian genomes have been shown by phylogenetic analysis to include duplicates that originated before the origin of vertebrates, duplicates that originated early in vertebrate history, duplicates that originated in the mammalian lineage before the radiation of the eutherian (placental) orders, and duplicates that originated after the radiation of the eutherian orders (Friedman and Hughes 2001, 2003; Gu, Wang, and Gu 2002). Analyses of the human genome have shown that more recently duplicated genes are more likely to be physically linked than are more anciently duplicated genes, apparently reflecting the predominance of tandem duplication as a mechanism of gene duplication (Friedman and Hughes 2003).
Comparison of genetic maps of different species of mammals has suggested that, over the evolution of mammals, genomes have evolved by rearrangement of genomic segments, including syntenic groups of genes (O'Brien et al. 1999; Band et al. 2000). The recent completion of a draft sequence of the human and mouse genomes provided further support for this view (Mouse Genome Sequencing Consortium 2002). Although it has so far not been possible to reconstruct with certainty the ancestral arrangement of syntenic groups in mammals (Mouse Genome Sequencing Consortium 2002), the availability of genomic sequence has improved reconstruction of the breakpoints involved in the rearrangements that took place since the last common ancestor of human and mouse (Pevzner and Tesler 2003). The rearrangement of genomic segments provides a mechanism whereby genes originally duplicated in tandem can be spread to different chromosomes over the course of evolution (Friedman and Hughes 2003).
In the present paper, we address the question of how the rearrangement among chromosomes over mammalian evolutionary history has affected the distribution of duplicate genes. Examining a set of gene families present in both human and mouse genomes, we compare their chromosomal distributions in the two species. Our results show that these two mammalian species have very different overall patterns of distribution of gene families among chromosomes, which suggests, in turn, that the chromosomal redistribution of multigene family members may be a factor in the evolutionary differentiation of different lineages of eukaryotes.
Methods |
---|
Protein families were identified by homology and a single-linkage method employed by the BlastClust software available in the Blast software package (Altschul et al. 1997). Sequence homology was established by identifying matches using a conservative E-value of 10⁻⁶ with a minimum of 30% sequence identity across at least 50% of the length of the two sequences. Preliminary analyses with this data set and others showed that this set of criteria is conservative for establishing gene family membership (Hughes and Friedman 2004; unpublished data). The single-linkage method assembles larger families by linking shared genes among families, thus ensuring that a given gene will be assigned to only one family. Parsing and processing of sequence and gene family data were performed by software written in Perl.
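The single-linkage step can be sketched as follows. This is a minimal illustration in Python, not BlastClust itself and not the Perl software used in the study: the function name `single_linkage_families`, the layout of the hit tuples, and the gene identifiers are assumptions made for the example, and only the three thresholds are taken from the text above.

```python
from collections import defaultdict


def single_linkage_families(genes, hits, max_evalue=1e-6,
                            min_identity=30.0, min_coverage=0.5):
    """Group genes into families by single linkage over qualifying hits.

    hits is an iterable of (gene_a, gene_b, evalue, percent_identity,
    coverage) tuples; this tuple layout is hypothetical.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent pointers to the family representative,
        # shortening the path along the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, evalue, identity, coverage in hits:
        # A pair is linked only if it passes all three thresholds.
        if (evalue <= max_evalue and identity >= min_identity
                and coverage >= min_coverage):
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    families = defaultdict(set)
    for g in genes:
        families[find(g)].add(g)
    # Largest families first, then alphabetical, for a stable result.
    return sorted((sorted(f) for f in families.values()),
                  key=lambda f: (-len(f), f))
```

Because families are merged whenever any two members share a qualifying match, each gene ends up in exactly one family, which is the property noted above.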
Evolutionary Analyses
Given family membership and chromosomal location information, we compared the distribution of families across chromosomes in the two species. In these analyses, we excluded the Y chromosome, which has only a small number of genes. For a set of families having at least two members in each of the two species, we compared the distribution of families on the autosomes and the X chromosome with the random expectation by a randomization test. This test was conducted by creating simulated genomes in which the chromosomal locations of the genes analyzed were randomly reassigned. In each simulated genome, both the set of genes and the set of gene locations on chromosomes were the same as in the real genome, but all genes were randomly assigned to chromosomal locations. The distribution of genes in the real genome was compared with that in 1,000 simulated genomes.
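A randomization test of this kind might be sketched as below. The text does not specify the test statistic or how P was computed, so the choices here are assumptions: the statistic is the number of families confined to a single chromosome, locations are represented only by chromosome labels, and the reported proportion is one-sided. The function names are hypothetical.

```python
import random
from collections import defaultdict


def families_on_one_chromosome(family_of, chrom_of):
    """Count families whose members all lie on a single chromosome."""
    chroms = defaultdict(set)
    for gene, family in family_of.items():
        chroms[family].add(chrom_of[gene])
    return sum(1 for c in chroms.values() if len(c) == 1)


def randomization_test(family_of, chrom_of, n_sim=1000, seed=0):
    """Compare the observed count with counts from simulated genomes.

    Each simulated genome keeps the same genes and the same multiset of
    chromosomal locations, but reassigns genes to locations at random.
    Returns the observed count and the proportion of simulations with a
    count at least as large (an assumed one-sided summary).
    """
    rng = random.Random(seed)
    genes = list(family_of)
    locations = [chrom_of[g] for g in genes]
    observed = families_on_one_chromosome(family_of, chrom_of)
    at_least_as_large = 0
    for _ in range(n_sim):
        shuffled = locations[:]
        rng.shuffle(shuffled)
        simulated = dict(zip(genes, shuffled))
        if families_on_one_chromosome(family_of, simulated) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_sim
```

Shuffling the list of locations, rather than drawing chromosomes independently, preserves the number of genes per chromosome in every simulated genome.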
Homologous sequences were aligned at the amino acid level using the ClustalW program (Thompson, Higgins, and Gibson 1994), and this alignment was imposed on the DNA sequences. The number of synonymous nucleotide substitutions per synonymous site (dS) and the number of nonsynonymous nucleotide substitutions per nonsynonymous site (dN) were estimated by a maximum-likelihood method (Yang and Nielsen 2000) using the software package PAML (Yang 1997).
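Imposing an amino acid alignment on the DNA sequences amounts to threading each coding sequence through its gapped protein row, codon by codon. The following is a minimal sketch of that idea, assuming the coding sequence starts in frame at the first aligned residue; the function name `impose_protein_alignment` is hypothetical and this is not the software used in the study.

```python
def impose_protein_alignment(aligned_protein, cds):
    """Thread a coding DNA sequence through a gapped protein alignment row.

    Each residue consumes the next codon of cds; each gap character '-'
    becomes a three-nucleotide gap '---'. Trailing nucleotides, such as
    a stop codon, are ignored.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein row")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# The protein row M K - L with coding sequence ATG AAA CTG
print(impose_protein_alignment("MK-L", "ATGAAACTG"))  # ATGAAA---CTG
```

Keeping gaps in multiples of three preserves the reading frame, which codon-based estimation of dS and dN requires.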
Results |
---|
Nucleotide Substitution
To obtain evidence regarding the relative time of duplication of duplicate genes in human and mouse, we estimated the number of synonymous nucleotide substitutions per synonymous site (dS) and the number of nonsynonymous nucleotide substitutions per nonsynonymous site (dN) in comparisons between the members of two-member families within each species. A total of 2,349 comparisons between such gene pairs were made (1,156 in human and 1,193 in mouse). For both species, mean dS for gene pairs on different chromosomes was higher than that for gene pairs on the same chromosome (fig. 3). This difference was highly significant (P < 0.001 in a factorial analysis of variance using species and location on the same or different chromosomes as main effects). Mean dS for comparisons between gene pairs on the same chromosome was higher in mouse than in human, but mean dS for comparisons between gene pairs on different chromosomes was higher in human than in mouse (fig. 3). This effect was supported by a highly significant interaction (P = 0.001) between species and location on the same or different chromosomes in the analysis of variance. By contrast, the analysis showed no significant difference between species. When a similar analysis was applied to dN, no significant effects were observed (not shown).
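The interaction described here concerns four cell means: species (human or mouse) crossed with location (same or different chromosomes). The sketch below computes those means and the difference of differences that the interaction term addresses. It is not the analysis of variance itself and gives no P value; the function names, the record layout, and the labels "same" and "different" are assumptions, and the numbers in the usage lines are invented toy values, not data from the study.

```python
from collections import defaultdict
from statistics import mean


def cell_means(records):
    """Mean dS for each (species, location) cell.

    records is an iterable of (species, location, dS) tuples.
    """
    cells = defaultdict(list)
    for species, location, ds in records:
        cells[(species, location)].append(ds)
    return {cell: mean(values) for cell, values in cells.items()}


def interaction_contrast(means):
    """Difference between species in the different-minus-same gap."""
    human_gap = means[("human", "different")] - means[("human", "same")]
    mouse_gap = means[("mouse", "different")] - means[("mouse", "same")]
    return human_gap - mouse_gap


# Invented toy values for illustration only (not from the study).
toy = [
    ("human", "same", 0.5), ("human", "same", 1.5),
    ("human", "different", 3.0), ("human", "different", 4.0),
    ("mouse", "same", 1.0), ("mouse", "same", 2.0),
    ("mouse", "different", 2.5), ("mouse", "different", 3.5),
]
print(interaction_contrast(cell_means(toy)))  # 1.0
```

A positive contrast corresponds to the pattern reported above, in which the gap between different-chromosome and same-chromosome pairs is wider in human than in mouse.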
The relative magnitude of the three modes of the dS distribution (fig. 4) differed strikingly, depending on the species and on whether the genes compared were on the same or different chromosomes. For human gene pairs on the same chromosome, the first mode was by far the most prominent, with the third mode being barely detectable (fig. 4A). On the other hand, for mouse genes on the same chromosome, the first and second modes were about equally prominent, and the third mode was more clearly visible than in the human case (fig. 4B). The relative prominence of the second and third modes in the case of mouse gene pairs on the same chromosome explains why mean dS for within-chromosome comparisons was higher in mouse than in human (fig. 3).
Discussion |
---|
In both species, gene families tended to be confined to a single chromosome to a greater extent than expected by chance (table 2). This pattern of gene distribution probably reflects, at least in part, the fact that many gene families arise by tandem duplication and remain linked in tandem arrays. This same biological fact also apparently explains why, in both genomes, fewer families than expected by chance are found on multiple (≥3) chromosomes (table 2).
On the other hand, the distribution of families across chromosomes was strikingly different between the two species. The average number of families shared between chromosomes was nearly 60% higher in mouse than in human (fig. 1). Note that this difference is too large to be a consequence simply of the slightly (11%) larger average family size in mouse than in human (table 1). Human chromosomes rarely shared large numbers of gene families with more than one or two other chromosomes, whereas mouse chromosomes frequently did so (fig. 2).
Comparison of the patterns of sharing of large numbers (≥100) of gene families in the two species highlighted the unique nature of human chromosome 1. This chromosome shared 100 or more gene families with eight other chromosomes, whereas no other human chromosome shared 100 or more gene families with more than three other chromosomes (fig. 2A). There is evidence that human chromosome 1 represents an ancestral chromosome in eutherian (placental) mammals that has been independently rearranged in different eutherian lineages, although retained in primates (Murphy et al. 2003). If so, the role of this chromosome as a kind of superchromosome sharing large numbers of gene families with numerous others presumably also reflects a genomic pattern ancestral to eutherian mammals.
The maximum-likelihood (ML) method (Yang and Nielsen 2000) provides estimates of the number of synonymous substitutions per site (dS) even when it would be impossible to estimate that quantity by a simpler method. This method yields estimates even when multiple changes are hypothesized to have occurred at each site; thus, the higher values of dS estimated by this method should probably be treated with some caution. However, some authors (e.g., Blanc, Hokamp, and Wolfe 2003) have argued that ML estimates of dS are meaningful even when dS is substantially greater than one substitution per site.
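The reason for caution can be illustrated with a model much simpler than the codon-based ML method used here. Under the one-parameter Jukes-Cantor model, which is an assumption made only for this illustration, the estimated number of substitutions per site d is obtained from the observed proportion of differing sites p as follows:

```latex
\[
  d = -\tfrac{3}{4}\,\ln\!\left(1 - \tfrac{4}{3}\,p\right),
  \qquad
  \frac{\mathrm{d}d}{\mathrm{d}p} = \frac{1}{1 - \tfrac{4}{3}\,p}.
\]
```

As p approaches 3/4, the derivative grows without bound, so a small sampling error in p produces a large error in d. For example, p = 0.6 gives d ≈ 1.21, and at that point a change of 0.01 in p changes d by about 0.05.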
In the present data set, it seems likely that the relative magnitude of dS estimates contains biological meaning, although the sharp separation between the three observed modes (fig. 4) may partly be an artifact of the estimation process. Nonetheless, the trimodal distribution of dS in comparisons between duplicated genes in human and mouse (fig. 4) is of interest because the three modes appear to correspond to three periods of gene duplication in the mammalian lineage that have been identified by phylogenetic methods; namely, duplications after the mammalian radiation, duplications within the tetrapod lineage before the mammalian radiation, and duplications in early vertebrates before the origin of tetrapods (Friedman and Hughes 2001, 2003; Gu, Wang, and Gu 2002).
Our analyses revealed differences between human and mouse regarding the chromosomal distribution of duplicate gene pairs that appear to have originated during these three periods of gene duplication. In human, pairs on the same chromosome were found to correspond mainly to recent duplications, whereas those on separate chromosomes corresponded to more ancient duplications (fig. 4). This pattern is consistent with phylogenetic analyses showing that, in the human genome, within-chromosome duplicates disproportionately result from recent duplications, whereas between-chromosome duplications disproportionately result from ancient duplications (Friedman and Hughes 2003). This, in turn, is consistent with a model whereby most gene duplication occurs by tandem duplication, and over long periods of evolutionary time, tandem duplicates are eventually separated by chromosomal rearrangements (Friedman and Hughes 2003).
In comparison with human, the mouse genome showed a much higher proportion of ancient duplicates on the same chromosome and of recent duplicates on different chromosomes. One possible explanation for this difference is that the human chromosomal arrangement is closer to that of the eutherian ancestor than is that of the mouse and that segmental rearrangements have occurred to a greater extent in the mouse lineage than in the human lineage. Segmental rearrangements may have dispersed recent duplicates throughout the genome to a greater extent in mouse than in human. By the same token, they may have brought back onto the same chromosome certain ancient duplicates that had been separated in the eutherian ancestor. A greater rate of interchromosomal segmental exchange in the rodent than in the primate lineage may further explain the fact that, in the mouse, a much higher proportion of chromosomes share large numbers of gene families (figs. 1 and 2), because this latter situation might also be a result of a more extensive mixing of ancestral mammalian chromosomal segments in the rodent than in the primate lineage.
Acknowledgements |
---|
Footnotes |
---|
Literature Cited |
---|
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
Band, M. R., J. H. Larson, and M. Rebeiz, et al. (11 co-authors). 2000. An ordered comparative map of the cattle and human genomes. Genome Res. 10:1359-1368.
Blanc, G., K. Hokamp, and K. H. Wolfe. 2003. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. 13:137-144.
Clamp, M., D. Andrews, and D. Barker, et al. (36 co-authors). 2003. Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res. 31:38-42.
Eichler, E. E., and D. Sankoff. 2003. Structural dynamics of eukaryotic chromosome evolution. Science 301:793-797.
Friedman, R., and A. L. Hughes. 2001. Pattern and timing of gene duplication in animal genomes. Genome Res. 11:1842-1847.
Friedman, R., and A. L. Hughes. 2003. The temporal distribution of gene duplication events in a set of highly conserved human gene families. Mol. Biol. Evol. 20:154-161.
Gu, X., Y. Wang, and J. Gu. 2002. Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate genomes. Nat. Genet. 31:205-209.
Hughes, A. L. 1994. The evolution of functionally novel proteins after gene duplication. Proc. R. Soc. Lond. B Biol. Sci. 256:119-124.
Hughes, A. L. 1999. Adaptive evolution of genes and genomes. Oxford University Press, New York.
Hughes, A. L., and R. Friedman. 2004. Differential loss of ancestral gene families as a source of genomic divergence in animals. Proc. R. Soc. Lond. B Biol. Sci. 27: (Suppl.): S107-S109.
Kumar, S., and S. B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917-920.
Li, W.-H. 1982. Evolutionary change of duplicate genes. Isozymes 6:55-92.
Lynch, M., and J. S. Conery. 2000. The evolutionary fate and consequences of duplicate genes. Science 290:1151-1155.
Lynch, M., M. O'Hely, B. Walsh, and A. Force. 2001. The probability of preservation of a newly arisen gene duplicate. Genetics 159:1789-1804.
Mouse Genome Sequencing Consortium. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520-562.
Murphy, W. J., L. Frönicke, S. J. O'Brien, and R. Stanyon. 2003. The origins of human chromosome 1 and its homologs in placental mammals. Genome Res. 13:1880-1888.
O'Brien, S. J., M. Menotti-Raymond, W. J. Murphy, W. G. Nash, J. Wienberg, R. Stanyon, N. G. Copeland, N. A. Jenkins, J. Womack, and J. A. M. Graves. 1999. The promise of comparative genomics in mammals. Science 286:458-481.
Pevzner, P., and G. Tesler. 2003. Human and mouse genomic sequences reveal extensive breakpoint reuse in mammalian evolution. Proc. Natl. Acad. Sci. USA 100:7672-7677.
Thompson, J. D., D. G. Higgins, and T. Gibson. 1994. CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.
Yang, Z., and R. Nielsen. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17:32-43.