Evolutionary Dynamics of the T-Cell Receptor VB Gene Family as Inferred from the Human and Mouse Genomic Sequences

Chen Su1,3, and Masatoshi Nei

Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University


    Abstract
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 literature cited
 
The diversity of T-cell receptors is generated primarily by the variable-region gene families, each of which is composed of a large number of member genes. The entire genomic sequence of the variable region (VB) of the T- cell receptor ß chain from humans and mice has become available. To understand the evolutionary dynamics of the VB gene family, we conducted a phylogenetic analysis of all VB genes from humans and mice, as well as a detailed analysis of internal DNA duplications in the human genomic VB region. The phylogenetic tree obtained shows that human and mouse VB genes intermingle extensively rather than forming two separate clusters and that many gene duplications occurred both before and after the divergence between primates and rodents. Analyzing the genomic maps of transposable elements (e.g., LINEs and SINEs) and relic VB genes in the VB gene region, we present evidence that a 20-kb VB region duplicated tandemly four times in the human lineage during the last 32 Myr, and 6 out of the 15 VB genes in this region have become nonfunctional during this period. Our results show that the VB gene family is subject to evolution by a birth-and-death process rather than to concerted evolution.


    Introduction
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 literature cited
 
The {alpha}ß heterodimeric T-cell receptor (TCR) recognizes antigenic peptides in association with the major histocompatibility complex (MHC) molecules, and it plays an important role in the adaptive immune response in vertebrates. The sequence diversity of the {alpha}ß TCRs is primarily generated by somatic recombination of variable (V), diversity (D in ß chains), and joining segment (J) genes, among which V genes encode the TCR variable region that interacts with antigens. Each V gene can be further divided into five regions (fig. 1 ): two complementarity-determining regions (CDRs) and three framework regions (FRs). The CDRs of TCRs form {alpha} loops and come into direct contact with antigens, while the FRs form ß sheets and provide structural support for the CDR loops. While there are only a few copies of D and J genes in the genomes of higher vertebrates, the number of V genes is much larger, and the sequence diversity among these gene copies is largely responsible for the generation of diversity of TCRs. These V genes are usually clustered in a certain region of a chromosome and form a multigene family (Davis 1990Citation ; Klein and Horejsi 1997Citation ).



View larger version (68K):
[in this window]
[in a new window]
 
Fig. 1.—Amino acid sequences of the germ line TCR VB genes from humans and mice. These are representative sequences from six major groups of TCR VB genes (see text). Two human TCR VA genes (AE000658 and X04939) are also included, and they are used as outgroups. A dash indicates an indel, and a dot indicates an amino acid identical to that of the first sequence. The FRs and CDRs (Arden et al. 1995a, 1995b) are marked at the top

 
An important question regarding the evolution of multigene families has been their mode of long-term evolution. A number of authors (e.g., Hood, Campbell, and Elgin 1975Citation ; Ohta 1983Citation ) have suggested that multigene families are generally subject to concerted evolution, which homogenizes the DNA sequences of the member genes by mechanisms such as interlocus recombination or gene conversion (see Smith, Hood, and Fitch 1971Citation ; Smith 1974Citation ; Zimmer et al. 1980Citation ). However, Nei and Hughes (1992)Citation showed that the pattern of evolution of MHC gene families is very different from the typical concerted evolution and follows the model of birth-and- death evolution. In this model of evolution, new genes are assumed to be created by repeated gene duplication, and some genes are maintained in the genome for a long time, but others are deleted or become nonfunctional through deleterious mutations. Later, it was shown that this model of evolution applies to the multigene families for immunoglobulin (Ig) heavy chains and light chains (e.g., Ota and Nei 1994Citation ; Rast et al. 1994Citation ; Nei, Gu, and Sitnikova 1997Citation ; Sitnikova and Su 1998Citation ). A similar model of evolution apparently applies to the TCR V gene families as well (Clark et al. 1995Citation ; Su et al. 1999Citation ), but no detailed study of this problem has been done with these gene families. Fortunately, the complete genomic sequences of the TCR ß chain variable region (VB) genes from humans (Rowen, Koop, and Hood 1996Citation ) and mice (Rowen et al. 1997Citation ) have now been published. This allows us to investigate the evolutionary dynamics of the TCR VB gene family in vertebrates.

The purpose of this paper was to study this problem by constructing phylogenetic trees of TCR VB genes from humans and mice and by examining the evolutionary change of gene arrangements of the TCR VB region. The possibility of positive selection operating on the CDR regions was also investigated.


    Materials and Methods
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 literature cited
 
Nucleotide Sequences Used
In humans, the TCR VB gene region, which is composed of ~685 kb, is located on chromosome 7 and contains 65 VB genes. Out of these 65 VB genes, 46 appear to be functional and 19 are pseudogenes. Pseudogenes were defined as the genes with frameshift mutations or defects in the recombination signals that nevertheless showed relatively high homology to the functional genes (Rowen, Koop, and Hood 1996Citation ). In addition, there are 26 "relic" genes, which are either truncated sequences or the sequences that show low similarity to functional VB genes (Rowen, Koop, and Hood 1996Citation ). If we use 75% sequence similarity as a criterion (Crews et al. 1981Citation ), the 65 VB genes can be divided into 30 VB subfamilies, each of which contains one to nine member genes. In mice, the TCR VB gene region (~701 kb) is located on chromosome 6 and contains 35 VB genes. Twenty-one of these genes contain functional open reading frames, and 14 are pseudogenes. Relic genes have not been clearly defined in mice (Rowen et al. 1997Citation ). There are 31 subfamilies of VB genes in mice, and each of these subfamilies has one to three member genes. The genomic sequences of the entire VB gene regions from humans (accession numbers U66059– U66061) and mice (accession numbers AE000663– AE000665) were retrieved from GenBank. In this paper, we used the conventional gene notation for VB genes rather than the nomenclature proposed by Arden et al. (1995a)Citation so that our results can be compared with those obtained in previous studies (e.g., Funkhouser et al. 1997Citation ; Su et al. 1999Citation ). However, our nomenclature can easily be converted into Arden et al.'s system by using the international ImMunoGeneTics (IMGT) database (http://imgt.cnusc.fr:8104). In the nomenclature we used, each gene is denoted by two numbers. The first number represents the subfamily to which the gene belongs, and the second designates the order of discovery of the genes in each subfamily (see fig. 2 ).



View larger version (29K):
[in this window]
[in a new window]
 
Fig. 2.—Neighbor-joining (NJ) tree of 65 functional TCR VB genes from humans and mice. Six VB gene groups are indicated with brackets. A total of 289 nt per sequence was used. The percentage bootstrap value (PB) based on 1,000 replications is shown for each interior branch wherever PB >= 50%. For the branches that identify the VB gene groups, the PB values obtained for the maximum-parsimony (MP) tree are also presented in boldface type after the NJ PB values. Although not every part of the topology of the NJ tree was the same as that of the MP tree, most of the VB gene groups are supported by both NJ and MP methods. Putatively orthologous human and mouse genes were indicated by black circles at the interior nodes, and those used for the computation of substitution rate (see text) are indicated by asterisks. The branch lengths are measured by the number of (uncorrected) nucleotide substitutions per site, with the scale given below the tree. Two human TCR VA sequences (AE000658 and X04939) are used as outgroups. H = human; M = mouse

 
Phylogenetic Analyses
In this study, we used only the coding regions of functional genes and pseudogenes. To study the phylogenetic relationships of the VB genes from both humans and mice, we used only coding regions of the functional VB genes to maximize the sequence length that can be used. The sequence alignment was done by using the CLUSTAL W computer program (Thompson, Higgins, and Gibson 1994Citation ) with additional minor modifications by visual inspection.

All phylogenetic analyses in this paper were conducted using the MEGA computer program (Kumar, Tamura, and Nei 1993Citation ), except when parsimony trees were constructed. We first constructed a phylogenetic tree for the functional VB genes by using the neighbor-joining (NJ) method (Saitou and Nei 1987Citation ) with the uncorrected nucleotide differences (p-distances). We chose the p-distance because this distance is known to give better results when the number of sequences is large and the number of nucleotides used is relatively small (Nei and Kumar 2000Citation ). Sites with alignment gaps were eliminated from the analyses (complete deletion option in the MEGA program). There were two human sequences that were identical to each other after complete deletion of the sites with gaps, so only one of them was used in the analysis. Although the human VB25.1 gene contains an intact open reading frame in some cell lines, it is potentially a pseudogene (Rowen, Koop, and Hood 1996Citation ). Therefore, it was excluded from the analysis. For these reasons, the total numbers of sequences used were 44 for humans and 21 for mice. We also included two human TCR variable region (VA) genes of alpha chains (GenBank accession numbers AE000658 and X04939) in the analysis to root the phylogenetic tree. The total number of nucleotides per sequence after removal of sites with alignment gaps was 289. In this analysis, we used the sequences of the entire V domain rather than the framework regions, which were used by Su et al. (1999)Citation . However, essentially the same results were obtained when the CDR regions were eliminated from the analysis. The aligned sequence data used in this paper are available from the website, http://mep.bio.psu.edu/databases.

To examine the reliability of NJ trees, we also constructed parsimony consensus trees using PAUP* (Swofford 1998Citation ). In this case, the full heuristic search (standard stepwise addition + tree bisection-reconnection [TBR]) method was implemented for 500 bootstrap replications, and for each replication the TBR search was repeated 100 times. The resultant bootstrap 50% majority-rule consensus tree was compared with the NJ tree.

The purpose of the above study was to understand the long-term evolution of VB genes. However, to relate the evolutionary relationships of individual genes to their genomic locations, it is important to study the intraspecific gene phylogeny (Ota and Nei 1994Citation ). For this purpose, we constructed NJ trees for humans and mice separately using both functional genes and pseudogenes. Inclusion of pseudogenes introduced additional alignment gaps. Therefore, we excluded some truncated or unalignable pseudogenes. We also excluded some functional genes with relatively long gaps (>6 bp). As a result, the total number of sequences used in the analysis of human VB genes was 57, and the total number of nucleotide sites per sequence after removal of alignment gaps was 263. To root the human VB tree, we used the human VA20 gene (accession number M17663), which is evolutionarily related to VB genes (Arden et al. 1995aCitation ). In the analysis of the mouse VB genes, the total number of taxa used was 32, and the number of sites was 189. The number of sites used in the mouse VB genes was smaller than that in the human VB genes, because mouse VB genes contained more alignment gaps. The mouse gene VA5 (accession number X02967) was used to root the mouse VB tree (Arden et al. 1995bCitation ). All other methods used in this analysis were the same as those for the analysis of functional VB genes.


    Results
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 literature cited
 
Evolutionary Relationships of Functional VB Genes from Humans and Mice
The NJ tree for the human and mouse VB functional sequences is presented in figure 2 . This tree shows that human and mouse genes do not form two separate clusters. Rather, they intermingle extensively. This suggests that many gene duplications occurred both before and after primates and rodents diverged and that duplicate VB genes have not been subject to any significant interlocus homogenization of sequences within either of the two species. Our tree shows that these VB genes can be classified into six gene groups, A–F, with the exception of the mouse V2 gene that did not cluster reliably with any other groups (Su et al. 1999Citation ). With the exception of group A, all sequences belonging to a gene group formed a monophyletic cluster with a high bootstrap value (PB = 100%; fig. 2 ). The MP tree showed a similar pattern of clustering, and all MP bootstrap values that supported the clustering of sequences in each gene group were quite high (PB values ranged from 83% to 100%) except for group A (fig. 2 ). However, the bootstrap probability for group A as computed by the interior branch test (Sitnikova 1996Citation ) was reasonably high (81%).

The same classification of VB genes has previously been proposed by Su et al. (1999)Citation for a study in which the functional VB genes from humans, mice, rabbits, sheep, cattle, and chicken were analyzed. In Su et al.'s (1999)Citation analysis, amino acid sequences rather than nucleotide sequences were used. Probably for this reason, the phylogenetic tree for human and mouse genes they obtained was not the same as that in figure 2 , especially with respect to poorly supported branching patterns. Nevertheless, the two studies identified the same gene groups, with the same human and mouse member genes and similar values of bootstrap support. This indicates that the classification of these gene groups is quite reliable and suggests that all member genes belonging to a gene group evolved from the same common ancestor.

Since chickens and mammals share some of the gene groups (Su et al. 1999Citation ), these divergent VB groups in the human and mouse genomes must have been maintained over hundreds of millions of years. In fact, preliminary results from another study indicated that genes from group F are shared by cartilaginous fishes (sharks and skates) and genes from groups D and E are shared by Xenopus and the axolotl (amphibian), suggesting that the divergence among these groups occurred 350–500 MYA (unpublished data). This remarkably long maintenance of different VB gene groups is not expected if these VB genes are subject to concerted evolution. Rather, it is consistent with the model of birth-and-death evolution, where divergent gene groups are usually maintained for a long evolutionary time (see Ota and Nei 1994Citation ; Nei, Gu, and Sitnikova 1997Citation ).

Times of Divergence of Different Gene Groups
Since the VB gene groups have apparently been maintained for a long time, it is interesting to estimate the times of divergence of these gene groups under the assumption of a molecular clock. For this purpose, we used the linearized-tree method of Takezaki, Rzhetsky, and Nei (1995)Citation . To examine the assumption of the molecular clock, we used Takezaki, Rzhetsky, and Nei's (1995)Citation two-cluster test. For this purpose, we first estimated the gamma parameter a to measure the extent of variance of evolutionary rate among different nucleotide sites by using Gu and Zhang's (1997)Citation method (computer program gz-dna, available at the website http://mep.bio.psu.edu) and obtained a = 2.6. Since this value was quite large, we ignored the rate variation in the subsequent analysis. We used the Jukes-Cantor, Kimura two-parameter, Tamura-Nei, and Tajima-Nei distances to test the assumption of a molecular clock (Nei and Kumar 2000Citation ). Only when the Tajima-Nei distance was used was the molecular clock hypothesis acceptable for all of the gene groups. We therefore used this distance to construct a linearized tree.

For calibration of evolutionary time, we used the time of divergence between humans and mice. Recently discovered fossils suggest that humans and mice diverged at least 85 MYA (Archibald 1996Citation ), whereas molecular data suggest that they diverged 100–112 MYA (Li et al. 1990Citation ; Kumar and Hedges 1998Citation ). In this study, we assumed that humans and mice diverged 100 MYA. The tree in figure 2 shows pairs of human and mouse VB genes or gene clusters that are putatively orthologous. These pairs are indicated with black dots at the branching nodes. Among them, there are five pairs of human and mouse VB genes or gene clusters that are the most closely related in each of the five VB gene groups (marked with asterisks at the interior nodes; fig. 2 ). We tentatively assumed that they were orthologous pairs of genes. We excluded group C from this analysis, because there is only one pair of human and mouse genes in group C, and these two sequences showed a higher level of divergence than all other putatively orthologous pairs of genes (fig. 2 ). We then computed the average Tajima-Nei distance () for the five pairs of orthologous genes from humans and mice and obtained = 0.281 ± 0.045. Assuming that humans and mice diverged 100 MYA, the rate of nucleotide substitutions in VB genes was estimated to be (1.41 ± 0.23) x 10-9 per site per year. Note that this rate could be an overestimate because the five pairs of genes or gene clusters used might have diverged earlier than 100 MYA. However, it is interesting that the estimate was close to that of Ig VH (variable region gene for the heavy chain) genes (1.43 x 10-9 [Gojobori and Nei 1984Citation ] and 1.4 x 10-9 [Su and Nei 1999Citation ]).

The times of divergence among different VB groups were then calculated by using the linearized tree and the estimated substitution rate. The results, which are presented in figure 3 , show that (1) group D genes diverged from other group genes about 423 MYA, (2) group F genes diverged from the remaining groups about 365 MYA, (3) group E genes diverged from groups A + B + C about 305 MYA, (4) group C genes diverged from the ancestor gene of groups A and B about 295 MYA, and (5) the divergence between group A and B genes occurred about 274 MYA. These estimates suggest that all five VB groups have been maintained in the human and mouse lineages for a long evolutionary time.



View larger version (17K):
[in this window]
[in a new window]
 
Fig. 3.—Linearized tree for six VB gene groups. This tree was constructed by using Tajima and Nei (1984) distance. The times of divergence between VB gene groups are marked on the scale below the tree, whereas the nucleotide distances are marked above the tree. The shaded bar represents the standard deviation. MY = million years

 
As mentioned earlier, chickens have VB genes closely related to groups D and E (Su et al. 1999Citation ). Therefore, these two groups must have persisted for at least 300 Myr, because chickens and mammals diverged about 300 MYA (Dayhoff 1972Citation ; Benton 1993Citation ). In addition, the cartilaginous fish (sharks and skates) genomes contain group F genes. Therefore, these genes must have persisted for at least about 450 Myr. The axolotl and Xenopus genomes also contain group D and E genes, so these group genes must have been maintained for at least 350 Myr. Our estimates of divergence times from the linearized-tree analysis are more or less consistent with these paleontological inferences, although they showed a more recent time of divergence for groups E and F. This may have happened because our estimate of the rate of nucleotide substitution used could be an overestimate, as mentioned earlier.

Evolutionary Dynamics of VB Genes Within Species
If the number of VB genes increases mainly by tandem gene duplication, we would expect that closely related genes are physically clustered in the genome, and therefore a group of physically closely located genes should cluster in a phylogenetic tree. To examine this prediction, we constructed the phylogenetic trees of VB genes for humans and mice separately (figs. 4 and 5 ). In these trees, the number given in parentheses following the gene notation is the location number in the genome, with the gene at the 5'-most end being the first. The human tree in figure 4 shows that six group B genes are all located at genomic positions 1–11. The genes located at positions 59–63 also form a cluster in group E. In the mouse genome (fig. 5 ), the genes located at positions 25–30 form a monophyletic cluster. These relationships indicate that tandem gene duplication is responsible for some clusters of duplicate genes. For other genes, however, the genomic and phylogenetic locations of the genes are not necessarily correlated.



View larger version (29K):
[in this window]
[in a new window]
 
Fig. 4.—Phylogenetic tree for 57 VB genes from humans. A total of 263 nt per sequence was used. Pseudogenes are marked indicated with {psi}. The PB values estimated by the neighbor-joining method are indicated in the same way as that explained in the legend to figure 2 . The genomic gene order, starting from the 5'-end gene, is shown in parentheses following each gene. A human VA20 gene was used as the outgroup. Groups D and F are absent from this tree because we excluded three functional human genes with long gaps. The maximum-parsimony method also supports the clusters of VB gene groups

 


View larger version (24K):
[in this window]
[in a new window]
 
Fig. 5.—Phylogenetic tree for 32 VB genes from mice. A total of 189 nt per sequence was used. A mouse VA5 gene was used as the outgroup. The PB values and the genomic gene order are shown in the same way as in figure 4 . Group F is absent from this tree because we excluded a functional mouse gene with long gaps. The maximum-parsimony method also supports the clusters of the groups

 
In our intraspecific gene trees, pseudogenes do not evolve much faster than functional genes. However, this does not mean that VB pseudogenes and functional genes evolve at the same rate, because our statistical test showed that the average root-to-tip branch length of pseudogenes (P) is significantly longer than that of functional genes (F) in both humans and mice (P < 0.001 in humans and P < 0.05 in mice). The reason why the difference between P and F is apparently smaller than that observed for human Ig VH genes (Ota and Nei 1994Citation ) and Ig kappa chain variable region (V{kappa}) genes (Sitnikova and Nei 1998Citation ) is unclear. One possible explanation is that the numbers of nucleotide substitutions that occurred in these VB genes are near the saturation level. In fact, the Jukes-Cantor distance (Jukes and Cantor 1969Citation ) between the two most closely related gene groups, groups A and B, is already quite high (0.73), and the distance between groups B and D is as high as 1.22. At this level of divergence, it may not be always easy to detect the differences in substitution rate between two sets of genes that intermingle in the tree. Another possible explanation is that the pseudogenes used in the tree construction might have lost their function only recently. (Note that highly diverged nonfunctional genes were defined as "relics" instead of pseudogenes in the VB family [Rowen, Koop, and Hood 1996Citation ]). Since we excluded the relic genes from our tree, only pseudogenes with relatively high sequence similarity to the functional VB gene were used. In fact, when we included relic genes that were alignable with the functional ones, they showed considerably longer branch lengths compared with the functional genes, as in the case of the Ig VH and V{kappa} genes (data not shown). A third possible explanation (hypothesis) is that VB genes might have been subject to frequent gene conversion, such that pseudogenes and functional gene sequences are partially mixed and their evolutionary distances have become similar. However, using a randomization test as implemented in the computer program Reticulate (Jakobsen and Easteal 1996Citation ), we failed to detect gene conversion and recombination events, even for closely related sequences (P ranged from 0.06 to 0.36 in humans and from 0.11 to 0.15 in mice). Therefore, this explanation seems unlikely to be true. This result also raises a serious question about the importance of concerted evolution in the VB family.

Block DNA Duplications in the Human VB Gene Region
One reason why the physical clusters in the genome do not always correspond to phylogenetic clusters is that gene duplication often occurs as a block including many genes. Theoretically, when several genes duplicate as a block, the duplicate genes will be located at a distance of about the length of the DNA block that is duplicated. In fact, examining the genomic map of genes, we discovered that a 20-kb DNA segment is tandemly repeated five times in the human VB gene region (units a–e in fig. 6 ). These five repeat units span a DNA region of >100 kb and account for about 15% of the total human VB gene region. These repeats are remarkable in terms of their sequence length and the high similarities among them. Understanding the evolutionary relationships among the five repeat units would help us to understand the evolutionary dynamics of the VB family. For this purpose, we conducted a phylogenetic analysis of these repeat units.



View larger version (31K):
[in this window]
[in a new window]
 
Fig. 6.—A ~100-kb genomic contig in the human VB gene region contains five DNA repeat units (a–e), each having a length of ~20 kb. The VB genes (including pseudogenes) are indicated by white boxes. Two types of relic VB genes, one characterized by a length of ~400 bp, the other by a length of ~600 bp, are designated as R1 and R2, respectively (indicated by black boxes). Two subfamilies of the mammalian apparent long terminal repeat retrotransposons (Smit 1993Citation ), MER63 and MSTc, are marked by gray boxes and dotted boxes, respectively. LINE insertion sequences, including L1M and L1ME2, are indicated by vertical striped boxes. Two different families of SINE insertion sequences are indicated by two different patterns. In particular, horizontal striped boxes represent the Alu family, and forward-slashed boxes represent medium interspersed repeats (MIRs). The genetic components with forward transcription orientation are presented above the line, whereas those with reverse orientation are shown below the line. The names of genetic components shared by all five units are italicized, while the names of all others are in roman type. The map is drawn to scale with nucleotide position numbers (GenBank accession number U66060)

 
Because of the insertions/deletions involved, we could not obtain a reliable alignment for the entire repeat region. Our alignment was therefore based on the V13, V6, and V5 genes only (including exons and introns), and we used three V genes (13.5, 6.4, and 5.1) outside of the repeat regions as the outgroup. The total number of nucleotide sites per repeat unit after removal of alignment gaps was 1,654 bp, and an NJ tree for the five repeat units was constructed using the Tajima-Nei distance (fig. 7A ).



View larger version (16K):
[in this window]
[in a new window]
 
Fig. 7.—Phylogenetic trees for the five units. A, tree based on an alignment of concatenated V13, V6, and V5 sequences (including exons and introns), with a total length of 1,654 bp after removal of alignment gaps. Genes V13.5, V6.4, and V5.1 were used as the outgroup, as they diverged before duplication of the five units (fig. 4 ). The PB values obtained with the neighbor-joining method are shown wherever PB >= 50%. B, linearized tree based on the exons of V13, V6, and V5 sequences. The timescale shown at the bottom was obtained by assuming that the rate of nucleotide substitution was 1.41 x 10-9 per site per year (see text). The nucleotide distances measured by the Tajima-Nei model are indicated above the tree

 
This tree shows that repeat units a and b form a cluster, as do units c and d, and that unit e diverged before the emergence of units a–d. Although the bootstrap percentage values for this tree are not very high, this topology is supported by the following two observations. First, the V6 gene is a pseudogene in units a and b, but it is functional in units c–e (fig. 6 ), suggesting that the V6 gene lost its function in the ancestral sequence of units a and b. Second, units c and d lost the transposable element L1M, which is still present in all other units at the same position (fig. 6 ). This suggests that units c and d evolved from a common ancestor without L1M.

To estimate the approximate times of duplication of these repeat units, we constructed a linearized tree for the units (fig. 7B ). In this analysis, we excluded the intron sequences from the alignment because we did not have a good estimate of the substitution rate of the VB introns. Using the rate of nucleotide substitution estimated above, this tree shows that (1) unit e diverged from the ancestor of units a–d about 32 MYA, (2) the ancestor of units a and b diverged from the ancestor of c and d about 31 MYA, (3) units a and b diverged about 29 MYA, and (4) units c and d diverged about 24 MYA (fig. 7B ). It appears that these five units have been maintained in the genome for a long period of evolutionary time.

Positive Darwinian Selection in Human and Mouse VB Genes
Previously, working with Ig genes, Tanaka and Nei (1989)Citation showed that positive Darwinian selection operates for the CDRs but not for the FRs. They showed this by comparing the numbers of synonymous (dS) and nonsynonymous (dN) nucleotide substitutions per site (Nei and Gojobori 1986Citation ). In Tanaka and Nei's (1989)Citation study, positive selection (dN > dS) was identified only when closely related sequences were compared or when dS was relatively small. This happened apparently because dN reaches a saturation level rather quickly, since only a special set of amino acids are used in the CDRs.

It is likely that the CDRs of TCR genes are also subject to positive selection, because they have essentially the same function as that of Ig CDRs. We therefore estimated the dS and dN values for the CDR sequences for each VB gene group using Nei and Gojobori's (1986)Citation method. This method is known to be more conservative for detecting positive selection than some of the recent methods, such as that of Zhang, Rosenberg, and Nei (1998)Citation . However, it is better to use a conservative method, because many assumptions made in estimating dS and dN may not be satisfied with real sequence data.

The relationships between dS and dN obtained for the CDRs and the FRs are presented in figure 8 . It is clear that in the CDRs, dN is generally much higher than dS, as long as dS < 0.3 (fig. 8A ). This suggests that there is positive selection operating in the CDRs. In fact, if we consider the region for dS <= 0.3 in figure 8A, the number of points with dN > dS is 51, and the number of points with dN < dS is 7. A nonparametric binomial test indicates that the occurrence of this event by chance is exceedingly small (P < 10-8). When dS > 0.4, however, dN tends to be smaller than dS, as in the case of Ig CDRs. This relationship is probably caused by the saturation effect that is often observed for the number of nonsynonymous substitutions (Tanaka and Nei 1989Citation ; Lee, Ota, and Vacquier 1995Citation ). Figure 8B shows the relationship between dN and dS for the FRs. Here, dN is almost always smaller than dS for the entire region of dS values. Therefore, purifying selection is predominant in the FRs.



View larger version (17K):
[in this window]
[in a new window]
 
Fig. 8.—Relationships between the number of synonymous substitutions per synonymous site (dS) and the number of nonsynonymous substitutions per nonsynonymous site (dN) for the group E functional genes from both humans and mice. Each point represent a pairwise comparison of two group E genes. The straight line represents the expected relationship in the case of strict neutrality. A, CDRs. An excess of dN over dS is most obvious when sequence divergence is low, indicating positive selection operating in CDRs. B, FRs. Most points are below the straight line, indicating strong purifying selection

 

    Discussion
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 literature cited
 
On the basis of the reconstructed phylogeny and the assumption of a molecular clock, we estimated the time of occurrence of the five DNA block duplications in the human VB gene region. These time estimates are certainly very rough, but they are consistent with those obtainable from data on the evolution of Alu elements (Deininger and Batzer 1993Citation ). The Alu retroposon family consists of many different subfamilies, each of which can be traced back to a specific ancestral element, and the time of origin of each subfamily has been estimated. For example, the AluSc subfamily is found in both Old World and New World monkeys and originated about 35 MYA. By contrast, the AluY subfamily is found only in hominoids and gibbons and originated about 19 MYA (Kapitonov and Jurka 1996Citation ). This information helps us to infer the upper and lower bounds of the time of origin of the DNA repeat units. Repeat units a and e have unique insertions of the AluY elements near R2 and R1, respectively (fig. 6 ). Because these AluY insertions are not shared by the other repeat units, they must have occurred after the origins of units a and e, respectively. Therefore, units a and e must have persisted for at least 19 Myr. Two other AluY elements inserted into LIME2 in units b and d at two different positions (fig. 6 ), indicating that units b and d must also be at least 19 Myr old. Four of the five repeat units contain an AluSc element at the same position (fig. 6 ), indicating that all four rounds of DNA block duplications occurred within less than 35 MYA. (Repeat a does not have AluSc, but it is possible that this element was later lost in this lineage or that the topology given in fig. 7 is incorrect and repeat a is basal.) Therefore, information on the evolution of Alu subfamilies is consistent with the results obtained from the linearized NJ tree.

We have shown that divergent VB gene groups in the human and mouse lineages have been maintained in the genome over hundreds of millions of years. Interlocus gene conversion does not appear to have been important in the evolution of VB genes. Apparently, ancient gene duplication followed by subsequent diversification is the major mode of evolution in the VB gene family.

In both humans and mice, approximately 40% of the total VB genes have lost their function due to deleterious mutations (Rowen, Koop, and Hood 1996Citation ; Rowen et al. 1997Citation ). In the DNA repeat region mentioned above, four VB genes appear to have become pseudogenes during the last 32 Myr because of deleterious mutations (fig. 7B ), and the total number of pseudogenes had increased to six by the subsequent DNA block duplications ({psi}V5.7, {psi}V6.8, {psi}V6.9, {psi}V13.4, {psi}V13.7, and {psi}V13.8; fig. 6 ). In addition, this region contains 10 relic VB genes (R1's and R2's), which apparently existed in the ancestral DNA block. These relic genes appear to have been generated either by insertion of a transposable element and deletion of an exon (e.g., R1 in fig. 6 ) or by frameshift mutations (e.g., R2). Actually, many relic genes in the human VB gene region are associated with transposable elements (unpublished data).

These findings suggest that the TCR VB gene family has evolved following the model of birth-and-death evolution rather than concerted evolution. A similar conclusion has been reached for many other immune system genes, including the MHC (e.g., Nei, Gu, and Sitnikova 1997Citation ; Gu and Nei 1999Citation ) and the Ig V (e.g., Ota and Nei 1994Citation ; Sitnikova and Nei 1998Citation ; Sitnikova and Su 1998Citation ) gene families in vertebrates and the pathogen resistance genes in animals (Zhang, Dyer, and Rosenberg 2000Citation ) and plants (reviewed in Michelmore and Meyers 1998Citation ). It appears that the general mode of evolution of immune system multigene families is birth-and-death evolution.


    Acknowledgements
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 literature cited
 
This work was supported by research grants from NIH to M.N. We thank Alex Rooney and Igor Rogozin for helpful discussions.


    Footnotes
 
Naruya Saitou, Reviewing Editor

1 Present address: Lilly Research Laboratories, Indianapolis, Indiana Back

2 Keywords: T-cell receptor variable region birth-and-death evolution block duplication positive selection Back

3 Address for correspondence and reprints: Chen Su, Lilly Research Laboratories, Eli Lilly and Company, Lilly Corporate Center, Indianapolis, Indiana 46285. su_chen{at}lilly.com Back


    literature cited
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 literature cited
 

    Archibald, J. D. 1996. Fossil evidence for a late Cretaceous origin of "hoofed" mammals. Science 272:1150–1153

    Arden, B., S. P. Clark, D. Kabelitz, and T. W. Mak. 1995a. Human T-cell receptor variable gene segment families. Immunogenetics 42:455–500

    ———. 1995b. Mouse T-cell receptor variable gene segment families. Immunogenetics 42:501–530

    Benton, M. J. 1993. The fossil record 2. Chapman and Hall, London

    Clark, S. P., B. Arden, D. Kabelitz, and T. W. Mak. 1995. Comparison of human and mouse T-cell receptor variable gene segment subfamilies. Immunogenetics 42:531–540

    Crews, S., J. Griffin, H. Huang, K. Calame, and L. Hood. 1981. A single VH gene segment encodes the immune response to phosphorylcholine: somatic mutation is correlated with the class of the antibody. Cell 25:59–66

    Dayhoff, M. O. 1972. Atlas of protein sequence and structure. National Biomedical Research Foundation, Silver Springs, Md

    Davis, M. M. 1990. T cell receptor gene diversity and selection. Annu. Rev. Biochem. 59:475–496[ISI][Medline]

    Deininger, P. L., and M. A. Batzer. 1993. Evolution of retroposons. Pp. 157–196 in M. K. Hecht, ed. Evolutionary biology. Vol. 27. Plenum Press, New York

    Funkhouser, W., B. F. Koop, P. Charmley, D. Martindale, J. Slightom, and L. Hood. 1997. Evolution and selection of primate T cell antigen receptor BV8 gene subfamily. Mol. Phylogenet. Evol. 8:51–64[ISI][Medline]

    Gojobori, T., and M. Nei. 1984. Concerted evolution of the immunoglobulin VH gene family. Mol. Biol. Evol. 1:195– 212[Abstract]

    Gu, X., and M. Nei. 1999. Locus specificity of polymorphic alleles and evolution by a birth-and-death process in mammalian MHC genes. Mol. Biol. Evol. 16:147–156[Abstract]

    Gu, X., and J. Zhang. 1997. A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol. 14:1106–1113[Abstract]

    Hood, L., J. H. Campbell, and S. C. Elgin. 1975. The organization, expression, and evolution of antibody genes and other multigene families. Annu. Rev. Genet. 9:305–353[ISI][Medline]

    Jakobsen, I. B., and S. Easteal. 1996. A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. Comput. Appl. Biosci. 12:291–295[Abstract]

    Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York

    Kapitonov, V., and J. Jurka. 1996. The age of Alu subfamilies. J. Mol. Evol. 42:59–65[ISI][Medline]

    Klein, J., and V. HorejsI. 1997. Immunology. Blackwell, Oxford, England

    Kumar, S., and S. B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917–920

    Kumar, S., K. Tamura, and M. Nei. 1993. MEGA: molecular evolutionary genetics analysis. Pennsylvania State University, University Park

    Lee, Y. H., T. Ota, and V. D. Vacquier. 1995. Positive selection is a general phenomenon in the evolution of abalone sperm lysin. Mol. Biol. Evol. 12:231–238[Abstract]

    Li, W.-H., M. Gouy, P. M. Sharp, C. O'hUigin, and Y. W. Yang. 1990. Molecular phylogeny of Rodentia, Lagomorpha, Primates, Artiodactyla, and Carnivora and molecular clocks. Proc. Natl. Acad. Sci. USA 87:6703–6707

    Michelmore, R. W., and B. C. Meyers. 1998. Clusters of resistance genes in plants evolve by divergent selection and a birth-and-death process. Genome Res. 8:1113–1130[Abstract/Free Full Text]

    Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426[Abstract]

    Nei, M., X. Gu, and T. Sitnikova. 1997. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. USA 94:7799– 7809

    Nei, M., and A. L. Hughes. 1992. Balanced polymorphism and evolution by the birth-and-death process in the MHC loci. Pp. 27–38 in K. Tsuji, M. Aizawa, and T. Sasazuki, eds. HLA 1991. Proceedings of the 11th Histocompatibility Workshop and Conference. Oxford University Press, Oxford, England

    Nei, M., and S. Kumar. 2000. Molecular evolution and phylogenetics. Oxford University Press, Oxford, England

    Ohta, T. 1983. On the evolution of multigene families. Theor. Popul. Biol. 23:216–240[ISI][Medline]

    Ota, T., and M. Nei. 1994. Divergent evolution and evolution by the birth-and-death process in the immunoglobulin VH gene family. Mol. Biol. Evol. 11:469–482[Abstract]

    Rast, J. P., M. K. Anderson, T. Ota, R. T. Litman, M. Margittai, M. J. Shamblott, and G. W. Litman. 1994. Immunoglobulin light chain class multiplicity and alternative organizational forms in early vertebrate phylogeny. Immunogenetics 40:83–99

    Rowen, L., B. F. Koop, C. Boysen et al. (21 co-authors). 1997. Mus musculus TCR ß locus of the complete sequence. GenBank accession numbers AE000663, AE000664, and AE000665

    Rowen, L., B. F. Koop, and L. Hood. 1996. The complete 685-kilobase DNA sequence of the human ß T cell receptor locus. Science 272:1755–1762

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425[Abstract]

    Sitnikova, T. 1996. Bootstrap method of interior-branch test for phylogenetic trees. Mol. Biol. Evol. 13:605–611[Abstract]

    Sitnikova, T., and M. Nei. 1998. Evolution of immunoglobulin {kappa} chain variable region genes in vertebrates. Mol. Biol. Evol. 15:50–60[Abstract]

    Sitnikova, T., and C. Su. 1998. Coevolution of immunoglobulin heavy and light chain variable region gene families. Mol. Biol. Evol. 15:617–625[Abstract]

    Smit, A. F. 1993. Identification of a new, abundant superfamily of mammalian LTR-transposons. Nucleic Acids Res. 21: 1863–1872

    Smith, G. P. 1974. Unequal crossover and the evolution of multigene families. Cold Spring Harb. Symp. Quant. Biol. 38:507–513[ISI][Medline]

    Smith, G. P., L. Hood, and W. M. Fitch. 1971. Antibody diversity. Annu. Rev. Biochem. 40:969–1012[ISI]

    Su, C., I. Jakobsen, X. Gu, and M. Nei. 1999. Diversity and evolution of T-cell receptor variable region genes in mammals and birds. Immunogenetics 50:301–308

    Su, C., and M. Nei. 1999. Fifty-million-year-old polymorphism at an immunoglobulin variable region gene locus in the rabbit evolutionary lineage. Proc. Natl. Acad. Sci. USA 96: 9710–9715

    Swofford, D. L. 1998. PAUP*. Phylogenetic analysis using parsimony. Sinauer, Sunderland, Mass

    Tajima, F., and M. Nei. 1984. Estimation of evolutionary distance between nucleotide sequences. Mol. Biol. Evol. 1: 269–285

    Takezaki, N., A. Rzhetsky, and M. Nei. 1995. Phylogenetic test of the molecular clock and linearized trees. Mol. Biol. Evol. 12:823–833[Abstract]

    Tanaka, T., and M. Nei. 1989. Positive Darwinian selection observed at the variable-region genes of immunoglobulins. Mol. Biol. Evol. 6:447–459[Abstract]

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680[Abstract]

    Zhang, J., K. D. Dyer, and H. F. Rosenberg. 2000. Evolution of the rodent eosinophil-associated RNase gene family by rapid gene sorting and positive selection. Proc. Natl. Acad. Sci. USA 97:4701–4706

    Zhang, J., H. F. Rosenberg, and M. Nei. 1998. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:3708–3713

    Zimmer, E. A., S. L. Martin, S. M. Beverley, Y. W. Kan, and A. C. Wilson. 1980. Rapid duplication and loss of gene coding for the a chains of hemoglobin. Proc. Natl. Acad. Sci. USA 77:2158–2162

Accepted for publication November 20, 2000.