Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University
Introduction
The purpose of this paper was to study this problem by constructing phylogenetic trees of TCR VB genes from humans and mice and by examining the evolutionary change of gene arrangements of the TCR VB region. The possibility of positive selection operating on the CDR regions was also investigated.
Materials and Methods
All phylogenetic analyses in this paper were conducted using the MEGA computer program (Kumar, Tamura, and Nei 1993), except when parsimony trees were constructed. We first constructed a phylogenetic tree for the functional VB genes by using the neighbor-joining (NJ) method (Saitou and Nei 1987) with the uncorrected nucleotide differences (p-distances). We chose the p-distance because this distance is known to give better results when the number of sequences is large and the number of nucleotides used is relatively small (Nei and Kumar 2000). Sites with alignment gaps were eliminated from the analyses (complete deletion option in the MEGA program). There were two human sequences that were identical to each other after complete deletion of the sites with gaps, so only one of them was used in the analysis. Although the human VB25.1 gene contains an intact open reading frame in some cell lines, it is potentially a pseudogene (Rowen, Koop, and Hood 1996). Therefore, it was excluded from the analysis. For these reasons, the total numbers of sequences used were 44 for humans and 21 for mice. We also included two human TCR variable region (VA) genes of alpha chains (GenBank accession numbers AE000658 and X04939) in the analysis to root the phylogenetic tree. The total number of nucleotides per sequence after removal of sites with alignment gaps was 289. In this analysis, we used the sequences of the entire V domain rather than the framework regions, which were used by Su et al. (1999). However, essentially the same results were obtained when the CDR regions were eliminated from the analysis. The aligned sequence data used in this paper are available from the website, http://mep.bio.psu.edu/databases.
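As an illustration of the distance measure described above, the following is a minimal Python sketch of the p-distance computed after complete deletion of gapped sites. It is not the MEGA implementation; the function names and the choice of `-` and `?` as gap characters are assumptions made for this example.

```python
from itertools import combinations


def complete_deletion(alignment, gap_chars="-?"):
    """Drop every alignment column in which any sequence has a gap character."""
    n_sites = len(alignment[0])
    keep = [i for i in range(n_sites)
            if all(seq[i] not in gap_chars for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def p_distance(a, b):
    """Uncorrected proportion of sites at which two aligned sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def p_distance_matrix(alignment):
    """Pairwise p-distances (keyed by index pair) after complete deletion."""
    cleaned = complete_deletion(alignment)
    return {(i, j): p_distance(cleaned[i], cleaned[j])
            for i, j in combinations(range(len(cleaned)), 2)}
```

Because a column is removed if any one sequence has a gap there, adding a gappy sequence shortens the alignment for every pair, which is why the number of usable sites depends on which sequences are included.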
To examine the reliability of NJ trees, we also constructed parsimony consensus trees using PAUP* (Swofford 1998). In this case, the full heuristic search (standard stepwise addition + tree bisection-reconnection [TBR]) method was implemented for 500 bootstrap replications, and for each replication the TBR search was repeated 100 times. The resultant bootstrap 50% majority-rule consensus tree was compared with the NJ tree.
The purpose of the above study was to understand the long-term evolution of VB genes. However, to relate the evolutionary relationships of individual genes to their genomic locations, it is important to study the intraspecific gene phylogeny (Ota and Nei 1994). For this purpose, we constructed NJ trees for humans and mice separately using both functional genes and pseudogenes. Inclusion of pseudogenes introduced additional alignment gaps. Therefore, we excluded some truncated or unalignable pseudogenes. We also excluded some functional genes with relatively long gaps (>6 bp). As a result, the total number of sequences used in the analysis of human VB genes was 57, and the total number of nucleotide sites per sequence after removal of alignment gaps was 263. To root the human VB tree, we used the human VA20 gene (accession number M17663), which is evolutionarily related to VB genes (Arden et al. 1995a). In the analysis of the mouse VB genes, the total number of taxa used was 32, and the number of sites was 189. The number of sites used in the mouse VB genes was smaller than that in the human VB genes, because mouse VB genes contained more alignment gaps. The mouse gene VA5 (accession number X02967) was used to root the mouse VB tree (Arden et al. 1995b). All other methods used in this analysis were the same as those for the analysis of functional VB genes.
Results
The same classification of VB genes has previously been proposed by Su et al. (1999) for a study in which the functional VB genes from humans, mice, rabbits, sheep, cattle, and chicken were analyzed. In Su et al.'s (1999) analysis, amino acid sequences rather than nucleotide sequences were used. Probably for this reason, the phylogenetic tree for human and mouse genes they obtained was not the same as that in figure 2, especially with respect to poorly supported branching patterns. Nevertheless, the two studies identified the same gene groups, with the same human and mouse member genes and similar values of bootstrap support. This indicates that the classification of these gene groups is quite reliable and suggests that all member genes belonging to a gene group evolved from the same common ancestor.
Since chickens and mammals share some of the gene groups (Su et al. 1999), these divergent VB groups in the human and mouse genomes must have been maintained over hundreds of millions of years. In fact, preliminary results from another study indicated that genes from group F are shared by cartilaginous fishes (sharks and skates) and genes from groups D and E are shared by Xenopus and the axolotl (amphibian), suggesting that the divergence among these groups occurred 350-500 MYA (unpublished data). This remarkably long maintenance of different VB gene groups is not expected if these VB genes are subject to concerted evolution. Rather, it is consistent with the model of birth-and-death evolution, where divergent gene groups are usually maintained for a long evolutionary time (see Ota and Nei 1994; Nei, Gu, and Sitnikova 1997).
Times of Divergence of Different Gene Groups
Since the VB gene groups have apparently been maintained for a long time, it is interesting to estimate the times of divergence of these gene groups under the assumption of a molecular clock. For this purpose, we used the linearized-tree method of Takezaki, Rzhetsky, and Nei (1995). To examine the assumption of the molecular clock, we used Takezaki, Rzhetsky, and Nei's (1995) two-cluster test. For this purpose, we first estimated the gamma parameter a to measure the extent of variance of evolutionary rate among different nucleotide sites by using Gu and Zhang's (1997) method (computer program gz-dna, available at the website http://mep.bio.psu.edu) and obtained a = 2.6. Since this value was quite large, we ignored the rate variation in the subsequent analysis. We used the Jukes-Cantor, Kimura two-parameter, Tamura-Nei, and Tajima-Nei distances to test the assumption of a molecular clock (Nei and Kumar 2000). Only when the Tajima-Nei distance was used was the molecular clock hypothesis acceptable for all of the gene groups. We therefore used this distance to construct a linearized tree.
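To make the idea of a corrected distance concrete, the sketch below computes the two simplest of the distances named above, Jukes-Cantor and Kimura two-parameter, from a pair of aligned, gap-free sequences using their standard closed-form expressions. The Tajima-Nei distance actually adopted in this study additionally uses nucleotide frequencies and is not reproduced here; the function names are illustrative only.

```python
import math

# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance: d = -(3/4) ln(1 - 4p/3)."""
    p = sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance from transition (P) and
    transversion (Q) proportions:
    d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q)."""
    n = len(seq1)
    pairs = list(zip(seq1, seq2))
    ts = sum(pair in TRANSITIONS for pair in pairs)
    tv = sum(x != y and (x, y) not in TRANSITIONS for x, y in pairs)
    P, Q = ts / n, tv / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

Both corrections exceed the raw p-distance because they account for multiple substitutions at the same site.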
For calibration of evolutionary time, we used the time of divergence between humans and mice. Recently discovered fossils suggest that humans and mice diverged at least 85 MYA (Archibald 1996), whereas molecular data suggest that they diverged 100-112 MYA (Li et al. 1990; Kumar and Hedges 1998). In this study, we assumed that humans and mice diverged 100 MYA. The tree in figure 2 shows pairs of human and mouse VB genes or gene clusters that are putatively orthologous. These pairs are indicated with black dots at the branching nodes. Among them, there are five pairs of human and mouse VB genes or gene clusters that are the most closely related in each of the five VB gene groups (marked with asterisks at the interior nodes; fig. 2). We tentatively assumed that they were orthologous pairs of genes. We excluded group C from this analysis, because there is only one pair of human and mouse genes in group C, and these two sequences showed a higher level of divergence than all other putatively orthologous pairs of genes (fig. 2). We then computed the average Tajima-Nei distance (d̄) for the five pairs of orthologous genes from humans and mice and obtained d̄ = 0.281 ± 0.045. Assuming that humans and mice diverged 100 MYA, the rate of nucleotide substitutions in VB genes was estimated to be (1.41 ± 0.23) × 10⁻⁹ per site per year. Note that this rate could be an overestimate because the five pairs of genes or gene clusters used might have diverged earlier than 100 MYA. However, it is interesting that the estimate was close to that of Ig VH (variable region gene for the heavy chain) genes (1.43 × 10⁻⁹ [Gojobori and Nei 1984] and 1.4 × 10⁻⁹ [Su and Nei 1999]).
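The rate estimate follows from dividing the average pairwise distance by twice the divergence time, since substitutions accumulate along both the human and the mouse lineage. A short Python sketch of that arithmetic, using the values reported above (the function name is illustrative):

```python
def substitution_rate(mean_distance, divergence_time_years):
    """Rate per site per year per lineage: r = d / (2T)."""
    return mean_distance / (2.0 * divergence_time_years)


d_bar, d_se = 0.281, 0.045   # average Tajima-Nei distance and its standard error
T = 100e6                    # assumed human-mouse divergence time, in years

rate = substitution_rate(d_bar, T)
rate_se = substitution_rate(d_se, T)  # the error scales by the same factor
print(f"rate = {rate:.3e} +/- {rate_se:.2e} per site per year")
```

This gives a rate of about 1.41 × 10⁻⁹ with a standard error of about 0.23 × 10⁻⁹ (0.225 × 10⁻⁹ before rounding), in agreement with the reported values. If the true divergence time were older than 100 MYA, the same distance would imply a proportionally lower rate.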
The times of divergence among different VB groups were then calculated by using the linearized tree and the estimated substitution rate. The results, which are presented in figure 3, show that (1) group D genes diverged from other group genes about 423 MYA, (2) group F genes diverged from the remaining groups about 365 MYA, (3) group E genes diverged from groups A + B + C about 305 MYA, (4) group C genes diverged from the ancestor gene of groups A and B about 295 MYA, and (5) the divergence between group A and B genes occurred about 274 MYA. These estimates suggest that all five VB groups have been maintained in the human and mouse lineages for a long evolutionary time.
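In a linearized tree, every tip is equidistant from a given interior node, so a node's age is its height (the distance from the node to the tips, in substitutions per site) divided by the substitution rate. The sketch below shows this conversion. The node heights of figure 3 are not given in the text, so the height used in the usage comment is purely hypothetical, and the function names are assumptions.

```python
RATE = 1.405e-9  # substitutions per site per year, as estimated in the text


def node_age_mya(node_height, rate=RATE):
    """Age in millions of years of a node at the given height
    (substitutions per site from node to tips) on a linearized tree."""
    return node_height / rate / 1e6


def node_height_for_age(age_mya, rate=RATE):
    """Inverse conversion: the height implied by a given age."""
    return age_mya * 1e6 * rate


# Hypothetical usage: a node at height 0.1405 would be dated to about 100 MYA.
example_age = node_age_mya(0.1405)
```

Under this conversion, the reported age of 423 MYA for the divergence of group D would correspond to a node height of roughly 0.59 substitutions per site.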
Evolutionary Dynamics of VB Genes Within Species
If the number of VB genes increases mainly by tandem gene duplication, we would expect that closely related genes are physically clustered in the genome, and therefore a group of physically closely located genes should cluster in a phylogenetic tree. To examine this prediction, we constructed the phylogenetic trees of VB genes for humans and mice separately (figs. 4 and 5). In these trees, the number given in parentheses following the gene notation is the location number in the genome, with the gene at the 5'-most end being the first. The human tree in figure 4 shows that six group B genes are all located at genomic positions 1-11. The genes located at positions 59-63 also form a cluster in group E. In the mouse genome (fig. 5), the genes located at positions 25-30 form a monophyletic cluster. These relationships indicate that tandem gene duplication is responsible for some clusters of duplicate genes. For other genes, however, the genomic and phylogenetic locations of the genes are not necessarily correlated.
Block DNA Duplications in the Human VB Gene Region
One reason why the physical clusters in the genome do not always correspond to phylogenetic clusters is that gene duplication often occurs as a block including many genes. Theoretically, when several genes duplicate as a block, the duplicate genes will be located at a distance of about the length of the DNA block that is duplicated. In fact, examining the genomic map of genes, we discovered that a 20-kb DNA segment is tandemly repeated five times in the human VB gene region (units a-e in fig. 6). These five repeat units span a DNA region of >100 kb and account for about 15% of the total human VB gene region. These repeats are remarkable in terms of their sequence length and the high similarities among them. Understanding the evolutionary relationships among the five repeat units would help us to understand the evolutionary dynamics of the VB family. For this purpose, we conducted a phylogenetic analysis of these repeat units.
To estimate the approximate times of duplication of these repeat units, we constructed a linearized tree for the units (fig. 7B). In this analysis, we excluded the intron sequences from the alignment because we did not have a good estimate of the substitution rate of the VB introns. Using the rate of nucleotide substitution estimated above, this tree shows that (1) unit e diverged from the ancestor of units a-d about 32 MYA, (2) the ancestor of units a and b diverged from the ancestor of c and d about 31 MYA, (3) units a and b diverged about 29 MYA, and (4) units c and d diverged about 24 MYA (fig. 7B). It appears that these five units have been maintained in the genome for a long period of evolutionary time.
Positive Darwinian Selection in Human and Mouse VB Genes
Previously, working with Ig genes, Tanaka and Nei (1989) showed that positive Darwinian selection operates for the CDRs but not for the FRs. They showed this by comparing the numbers of synonymous (dS) and nonsynonymous (dN) nucleotide substitutions per site (Nei and Gojobori 1986). In Tanaka and Nei's (1989) study, positive selection (dN > dS) was identified only when closely related sequences were compared or when dS was relatively small. This happened apparently because dN reaches a saturation level rather quickly, since only a special set of amino acids are used in the CDRs.
It is likely that the CDRs of TCR genes are also subject to positive selection, because they have essentially the same function as that of Ig CDRs. We therefore estimated the dS and dN values for the CDR sequences for each VB gene group using Nei and Gojobori's (1986) method. This method is known to be more conservative for detecting positive selection than some of the recent methods, such as that of Zhang, Rosenberg, and Nei (1998). However, it is better to use a conservative method, because many assumptions made in estimating dS and dN may not be satisfied with real sequence data.
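Two ingredients of this kind of estimate can be sketched briefly: counting the potential synonymous sites of a codon, and converting an observed proportion of differences into a per-site distance with the Jukes-Cantor formula. The code below is a simplified illustration, not the full Nei-Gojobori procedure; it omits the counting of synonymous and nonsynonymous differences between codon pairs, and it treats changes to stop codons as nonsynonymous, which is an assumption of this sketch. All names are illustrative.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Potential synonymous sites of one codon: for each position, the
    fraction of the three possible base changes that leave the amino acid
    unchanged, summed over the three positions."""
    aa = CODON_TABLE[codon]
    s = 0.0
    for pos in range(3):
        mutants = (codon[:pos] + b + codon[pos + 1:]
                   for b in BASES if b != codon[pos])
        s += sum(CODON_TABLE[m] == aa for m in mutants) / 3
    return s


def nonsynonymous_sites(codon):
    """The remaining sites of the codon: n = 3 - s."""
    return 3.0 - synonymous_sites(codon)


def jc_correct(p):
    """Jukes-Cantor correction of a proportion of differences p,
    used to turn pS or pN into dS or dN."""
    return -0.75 * math.log(1 - 4 * p / 3)
```

For example, the leucine codon TTA has 2/3 of a synonymous site, because only one of the three changes at the first position and one of the three at the third position preserve leucine.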
The relationships between dS and dN obtained for the CDRs and the FRs are presented in figure 8. It is clear that in the CDRs, dN is generally much higher than dS, as long as dS < 0.3 (fig. 8A). This suggests that there is positive selection operating in the CDRs. In fact, if we consider the region for dS < 0.3 in figure 8A, the number of points with dN > dS is 51, and the number of points with dN < dS is 7. A nonparametric binomial test indicates that the probability of this outcome occurring by chance is exceedingly small (P < 10⁻⁸). When dS > 0.4, however, dN tends to be smaller than dS, as in the case of Ig CDRs. This relationship is probably caused by the saturation effect that is often observed for the number of nonsynonymous substitutions (Tanaka and Nei 1989; Lee, Ota, and Vacquier 1995). Figure 8B shows the relationship between dN and dS for the FRs. Here, dN is almost always smaller than dS for the entire region of dS values. Therefore, purifying selection is predominant in the FRs.
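The binomial test on the 51 versus 7 split can be checked with a few lines of Python. The sketch assumes the null hypothesis that dN > dS and dN < dS are equally likely for each point (probability 0.5) and treats the 58 points as independent, which pairwise comparisons within a gene group may not strictly be; the function name is illustrative.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


n_points = 51 + 7                       # points with dN > dS plus points with dN < dS
p_one_sided = binomial_upper_tail(51, n_points)
p_two_sided = min(1.0, 2 * p_one_sided)  # symmetric null, so double the tail
```

Both the one-sided and the two-sided probabilities are on the order of 10⁻⁹, consistent with the reported P < 10⁻⁸.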
Discussion
We have shown that divergent VB gene groups in the human and mouse lineages have been maintained in the genome over hundreds of millions of years. Interlocus gene conversion does not appear to have been important in the evolution of VB genes. Apparently, ancient gene duplication followed by subsequent diversification is the major mode of evolution in the VB gene family.
In both humans and mice, approximately 40% of the total VB genes have lost their function due to deleterious mutations (Rowen, Koop, and Hood 1996; Rowen et al. 1997). In the DNA repeat region mentioned above, four VB genes appear to have become pseudogenes during the last 32 Myr because of deleterious mutations (fig. 7B), and the total number of pseudogenes increased to six through the subsequent DNA block duplications (V5.7, V6.8, V6.9, V13.4, V13.7, and V13.8; fig. 6). In addition, this region contains 10 relic VB genes (R1's and R2's), which apparently existed in the ancestral DNA block. These relic genes appear to have been generated either by insertion of a transposable element and deletion of an exon (e.g., R1 in fig. 6) or by frameshift mutations (e.g., R2). Actually, many relic genes in the human VB gene region are associated with transposable elements (unpublished data).
These findings suggest that the TCR VB gene family has evolved following the model of birth-and-death evolution rather than concerted evolution. A similar conclusion has been reached for many other immune system genes, including the MHC (e.g., Nei, Gu, and Sitnikova 1997; Gu and Nei 1999) and the Ig V (e.g., Ota and Nei 1994; Sitnikova and Nei 1998; Sitnikova and Su 1998) gene families in vertebrates and the pathogen resistance genes in animals (Zhang, Dyer, and Rosenberg 2000) and plants (reviewed in Michelmore and Meyers 1998). It appears that the general mode of evolution of immune system multigene families is birth-and-death evolution.
Footnotes
1 Present address: Lilly Research Laboratories, Indianapolis, Indiana
2 Keywords: T-cell receptor, variable region, birth-and-death evolution, block duplication, positive selection
3 Address for correspondence and reprints: Chen Su, Lilly Research Laboratories, Eli Lilly and Company, Lilly Corporate Center, Indianapolis, Indiana 46285. su_chen{at}lilly.com
Literature Cited
Archibald, J. D. 1996. Fossil evidence for a late Cretaceous origin of "hoofed" mammals. Science 272:1150-1153
Arden, B., S. P. Clark, D. Kabelitz, and T. W. Mak. 1995a. Human T-cell receptor variable gene segment families. Immunogenetics 42:455-500
Arden, B., S. P. Clark, D. Kabelitz, and T. W. Mak. 1995b. Mouse T-cell receptor variable gene segment families. Immunogenetics 42:501-530
Benton, M. J. 1993. The fossil record 2. Chapman and Hall, London
Clark, S. P., B. Arden, D. Kabelitz, and T. W. Mak. 1995. Comparison of human and mouse T-cell receptor variable gene segment subfamilies. Immunogenetics 42:531-540
Crews, S., J. Griffin, H. Huang, K. Calame, and L. Hood. 1981. A single VH gene segment encodes the immune response to phosphorylcholine: somatic mutation is correlated with the class of the antibody. Cell 25:59-66
Dayhoff, M. O. 1972. Atlas of protein sequence and structure. National Biomedical Research Foundation, Silver Springs, Md
Davis, M. M. 1990. T cell receptor gene diversity and selection. Annu. Rev. Biochem. 59:475-496
Deininger, P. L., and M. A. Batzer. 1993. Evolution of retroposons. Pp. 157-196 in M. K. Hecht, ed. Evolutionary biology. Vol. 27. Plenum Press, New York
Funkhouser, W., B. F. Koop, P. Charmley, D. Martindale, J. Slightom, and L. Hood. 1997. Evolution and selection of primate T cell antigen receptor BV8 gene subfamily. Mol. Phylogenet. Evol. 8:51-64
Gojobori, T., and M. Nei. 1984. Concerted evolution of the immunoglobulin VH gene family. Mol. Biol. Evol. 1:195-212
Gu, X., and M. Nei. 1999. Locus specificity of polymorphic alleles and evolution by a birth-and-death process in mammalian MHC genes. Mol. Biol. Evol. 16:147-156
Gu, X., and J. Zhang. 1997. A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol. 14:1106-1113
Hood, L., J. H. Campbell, and S. C. Elgin. 1975. The organization, expression, and evolution of antibody genes and other multigene families. Annu. Rev. Genet. 9:305-353
Jakobsen, I. B., and S. Easteal. 1996. A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. Comput. Appl. Biosci. 12:291-295
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York
Kapitonov, V., and J. Jurka. 1996. The age of Alu subfamilies. J. Mol. Evol. 42:59-65
Klein, J., and V. Hořejší. 1997. Immunology. Blackwell, Oxford, England
Kumar, S., and S. B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917-920
Kumar, S., K. Tamura, and M. Nei. 1993. MEGA: molecular evolutionary genetics analysis. Pennsylvania State University, University Park
Lee, Y. H., T. Ota, and V. D. Vacquier. 1995. Positive selection is a general phenomenon in the evolution of abalone sperm lysin. Mol. Biol. Evol. 12:231-238
Li, W.-H., M. Gouy, P. M. Sharp, C. O'hUigin, and Y. W. Yang. 1990. Molecular phylogeny of Rodentia, Lagomorpha, Primates, Artiodactyla, and Carnivora and molecular clocks. Proc. Natl. Acad. Sci. USA 87:6703-6707
Michelmore, R. W., and B. C. Meyers. 1998. Clusters of resistance genes in plants evolve by divergent selection and a birth-and-death process. Genome Res. 8:1113-1130
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426
Nei, M., X. Gu, and T. Sitnikova. 1997. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. USA 94:7799-7809
Nei, M., and A. L. Hughes. 1992. Balanced polymorphism and evolution by the birth-and-death process in the MHC loci. Pp. 27-38 in K. Tsuji, M. Aizawa, and T. Sasazuki, eds. HLA 1991. Proceedings of the 11th Histocompatibility Workshop and Conference. Oxford University Press, Oxford, England
Nei, M., and S. Kumar. 2000. Molecular evolution and phylogenetics. Oxford University Press, Oxford, England
Ohta, T. 1983. On the evolution of multigene families. Theor. Popul. Biol. 23:216-240
Ota, T., and M. Nei. 1994. Divergent evolution and evolution by the birth-and-death process in the immunoglobulin VH gene family. Mol. Biol. Evol. 11:469-482
Rast, J. P., M. K. Anderson, T. Ota, R. T. Litman, M. Margittai, M. J. Shamblott, and G. W. Litman. 1994. Immunoglobulin light chain class multiplicity and alternative organizational forms in early vertebrate phylogeny. Immunogenetics 40:83-99
Rowen, L., B. F. Koop, C. Boysen et al. (21 co-authors). 1997. Mus musculus TCR β locus of the complete sequence. GenBank accession numbers AE000663, AE000664, and AE000665
Rowen, L., B. F. Koop, and L. Hood. 1996. The complete 685-kilobase DNA sequence of the human β T cell receptor locus. Science 272:1755-1762
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425
Sitnikova, T. 1996. Bootstrap method of interior-branch test for phylogenetic trees. Mol. Biol. Evol. 13:605-611
Sitnikova, T., and M. Nei. 1998. Evolution of immunoglobulin κ chain variable region genes in vertebrates. Mol. Biol. Evol. 15:50-60
Sitnikova, T., and C. Su. 1998. Coevolution of immunoglobulin heavy and light chain variable region gene families. Mol. Biol. Evol. 15:617-625
Smit, A. F. 1993. Identification of a new, abundant superfamily of mammalian LTR-transposons. Nucleic Acids Res. 21:1863-1872
Smith, G. P. 1974. Unequal crossover and the evolution of multigene families. Cold Spring Harb. Symp. Quant. Biol. 38:507-513
Smith, G. P., L. Hood, and W. M. Fitch. 1971. Antibody diversity. Annu. Rev. Biochem. 40:969-1012
Su, C., I. Jakobsen, X. Gu, and M. Nei. 1999. Diversity and evolution of T-cell receptor variable region genes in mammals and birds. Immunogenetics 50:301-308
Su, C., and M. Nei. 1999. Fifty-million-year-old polymorphism at an immunoglobulin variable region gene locus in the rabbit evolutionary lineage. Proc. Natl. Acad. Sci. USA 96:9710-9715
Swofford, D. L. 1998. PAUP*. Phylogenetic analysis using parsimony. Sinauer, Sunderland, Mass
Tajima, F., and M. Nei. 1984. Estimation of evolutionary distance between nucleotide sequences. Mol. Biol. Evol. 1:269-285
Takezaki, N., A. Rzhetsky, and M. Nei. 1995. Phylogenetic test of the molecular clock and linearized trees. Mol. Biol. Evol. 12:823-833
Tanaka, T., and M. Nei. 1989. Positive Darwinian selection observed at the variable-region genes of immunoglobulins. Mol. Biol. Evol. 6:447-459
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680
Zhang, J., K. D. Dyer, and H. F. Rosenberg. 2000. Evolution of the rodent eosinophil-associated RNase gene family by rapid gene sorting and positive selection. Proc. Natl. Acad. Sci. USA 97:4701-4706
Zhang, J., H. F. Rosenberg, and M. Nei. 1998. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:3708-3713
Zimmer, E. A., S. L. Martin, S. M. Beverley, Y. W. Kan, and A. C. Wilson. 1980. Rapid duplication and loss of gene coding for the α chains of hemoglobin. Proc. Natl. Acad. Sci. USA 77:2158-2162