Adaptive Evolution of Variable Region Genes Encoding an Unusual Type of Immunoglobulin in Camelids

Chen Su, Viet Khong Nguyen1 and Masatoshi Nei

Institute of Molecular Evolutionary Genetics, Department of Biology, The Pennsylvania State University


    Abstract
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 References
 
A typical immunoglobulin (Ig) molecule is composed of four polypeptide chains: two identical heavy (H) chains and two identical light (L) chains. This tetrameric structure is conserved in almost all jawed vertebrate species. However, it has been discovered that camels and llamas (family: Camelidae) possess a type of dimeric Ig that consists of two H chains only. These H chains do not associate with L chains, and they do not have the first constant region (CH1), which is present in the conventional Ig. In spite of these changes, the dimeric Ig maintains the normal immune function. To understand the evolution of the dimeric Ig, we studied the phylogenetic relationships of the variable region (VHH) genes of the dimeric Ig from Camelidae and those (VH) of the conventional Ig from mammals. The results showed that the VHH genes form a monophyletic cluster within one of the mammalian VH groups, group C. We examined the type of selective force in complementarity-determining regions (CDRs) and framework regions (FRs) by comparing the rate of synonymous (dS) and nonsynonymous (dN) substitutions. We found that the results obtained from VHH genes were similar to those from VH genes in that CDRs showed an excess of dN over dS (indicating positive selection), whereas the reverse was true for FRs (purifying selection). However, when the extent of positive selection or purifying selection was investigated at each codon site, three major differences between VHH and VH genes were found. That is, very different types of selective force were observed between VHH and VH genes (1) at the sites that contact the L chain in the conventional Ig, (2) at the sites that interact with the CH1 region in the conventional Ig, and (3) in the H1 loop. Our findings suggest that adaptive evolution has occurred in the functionally important sites of the VHH genes to maintain the normal immune function in the dimeric Ig.


    Introduction
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 References
 
In all jawed vertebrates, immunoglobulins (Ig's) play a central role in the adaptive immune system by recognizing the specific molecular patterns of foreign antigens and mediating the interactions that eliminate pathogens from the host organism. A typical Ig molecule consists of four polypeptide chains: two identical heavy (H) chains and two identical light (L) chains. Each H chain is composed of four regions: a variable (VH) region and three constant (CH1, CH2, and CH3) regions, whereas each L chain is composed of two regions: a variable (VL) region and a constant (CL) region. This H2L2 tetrameric structure is conserved in the Ig's of all vertebrate species previously studied, from sharks to humans. It was therefore rather surprising when the serum of camels and llamas (family: Camelidae) was found to contain a novel type of Ig with neither L chains nor CH1 regions (Hamers-Casterman et al. 1993Citation ). Because this type of Ig contains two H chains only, it has been called the heavy-chain Ig. The antipathogenic function of the heavy-chain Ig has been demonstrated in several cases (Hamers-Casterman et al. 1993Citation ; Hamers and Muyldermans 1998Citation ), and it accounts for approximately 50% of the total Ig population in the serum of the camelid (Muyldermans and Lauwereys 1999Citation ).

An interesting question regarding the evolution of Ig's is as to how and when the heavy-chain Ig emerged. A number of studies showed that Ruminantia (sheep and cattle) and Suidae (pigs), two groups of mammals closely related to Camelidae, possess only the conventional Ig (e.g., Sun et al.1994Citation ; Dufour, Malinge, and Nau 1996Citation ; Sinclair, Gilchrist, and Aitken 1997Citation ). Therefore, the emergence of the heavy-chain Ig must have occurred only relatively recently. However, a detailed phylogenetic analysis is necessary to answer this question. Previous studies of phylogenetic relationships among Ig's from different species have concentrated primarily on the variable region genes (e.g., Ota and Nei 1994Citation ; Andersson and Matsunaga 1996Citation ; Sitnikova and Nei 1998Citation ) because these genes encode antigen-binding regions and are largely responsible for the generation of diversity of antibody repertoires. We therefore conducted a phylogenetic analysis of the variable region (VHH) genes of the heavy-chain Ig from camels and llamas, together with the VH genes of the conventional Ig from seven mammalian species, namely humans, mice, rabbits, sheep, cattle, swine, and camels.

The loss of the L chain and the CH1 region in this novel type of Ig raises an important question as to how its antipathogenic function has been maintained. Without the VL regions, the antipathogenic function of the heavy-chain Ig is now carried out by the VHH regions. VHH genes are apparently recent duplicates of VH genes, but their gene products have different characteristics with respect to solubility, efficiency of antigen binding, and the antigens recognized (Sheriff and Constantine 1996Citation ; Nguyen et al. 2000Citation ). Therefore, it seems that adaptive evolution after duplication of VH genes has led to the dichotomy of these two types of Ig's. One way of studying adaptive evolution in Ig's is to examine the extent of positive Darwinian selection in the antigen-binding regions because the need to bind a diverse spectrum of antigens may be aided by natural selection (e.g., Tanaka and Nei 1989Citation ; Sitnikova and Nei 1998Citation ). We have therefore examined the effect of positive selection in the antigen-binding regions of VHH genes. However, a more direct way of studying this problem would be to compare the extent of positive selection or purifying selection at each codon site of VHH genes in comparison with that of VH genes. This is because the structural change of the heavy-chain Ig is likely to have shifted the antigen-binding regions of VHH genes (reviewed in Muyldermans, Cambillau, and Wyns 2001Citation ), and a difference in selective force at a given amino acid site between VHH and VH genes may be viewed as an indication of adaptive change that has occurred in VHH genes. Here, a site-by-site examination is important because adaptive evolution can theoretically occur at any site, and examination of the average selective force over a region may obscure the effect of selection at single sites or at a subset of sites.


    Materials and Methods
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 References
 
Nucleotide Sequences Used
The family of Camelidae is composed of three genera: Camelus, Lama, and Vicugna, among which Camelus and Lama have been found to possess the heavy-chain Ig, whereas very little is known about Vicugna. From one species of Camelus (Camelus dromedarius), Nguyen et al. (2000)Citation have sequenced a large number of germline sequences of VHH and VH genes, among which 42 are functional VHH genes, and 50 are functional VH genes. These genes represent most of the VHH and VH gene repertoires in the camel (Nguyen et al. 2000Citation ). For Lama, there are no germline sequences currently available, but 152 VHH cDNA sequences from the common llama Lama glama have been published in GenBank. We used these cDNA sequences from this species, though cDNA sequences are known to be subject to somatic mutation. However, the relative contribution of somatic mutation to total sequence variation is usually very small, except in the complementarity-determining regions (CDRs) (see later) of VH genes (Gojobori and Nei 1986Citation ; Nguyen et al. 2000Citation ). We therefore used these llama genes in the analysis. All the camel and llama genes were retrieved from GenBank. The accession numbers (in descending order) are as follows—(1) llama VHH: AJ238060–45, AJ237393–276, AJ236108–094, AJ131931–29; (2) camel VHH: AF000604, AJ245148–07; and (3) camel VH: AF000603, AJ245198–49.

For the purpose of the phylogenetic tree construction, we used the same VH sequences from humans, mice, rabbits, sheep, cattle, and pigs as those used by Sitnikova and Su (1998)Citation because they are representatives of the mammalian VH repertoire including three major VH gene groups (A, B, and C) (Ota and Nei 1994Citation ). The VH data set from Sitnikova and Su (1998)Citation included primarily germline genes. Only for the swine species Sus scrofa did they use cDNA sequences. However, we recently found seven pig germline genes published (accession numbers AF064692–86), and included them in the present study.

Phylogenetic Analysis
The VHH and VH nucleotide sequences were aligned using the computer program Clustal W (Thompson, Higgins, and Gibson 1994Citation ), with additional minor modifications by visual inspection. Phylogenetic analyses were conducted using the MEGA computer program (Kumar, Tamura, and Nei 1993Citation ), except when parsimonious trees were constructed (see later). We estimated the evolutionary distances between nucleotide sequences using the uncorrected p-distance. The p-distance was used because it is known to give better results when the number of sequences is large and the number of nucleotides used is relatively small (Nei and Kumar 2000Citation ; Takahashi and Nei 2000Citation ). The phylogenetic tree was constructed using the neighbor-joining (NJ) method (Saitou and Nei 1987Citation ). Sites with alignment gaps (indels) in each pair of sequences under comparison were omitted (the pairwise deletion option in MEGA). However, we obtained essentially the same result when the option of complete deletion (i.e., omitting all sites with indels) was used.

To examine the reliability of the tree topology, we constructed a parsimony consensus tree using PAUP* (Swofford 1998Citation ). In this case, the full heuristic search (standard stepwise addition + tree bisection–reconnection [TBR]) method was done for 500 bootstrap replications, and for each replication the TBR search was repeated 100 times. The resultant bootstrap 50% majority-rule consensus tree was compared with the NJ tree.

Selection on the Antigen-binding Regions of Camelid VHH Genes
In general, the Ig variable region genes can be divided into five regions: two CDRs, CDR1 and CDR2, and three framework regions (FRs), FR1, FR2, and FR3. The CDRs were antigen-binding sites and highly variable (Davies, Padlan, and Sheriff 1990Citation ). A number of previous studies showed that positive selection is operating on the CDRs of VH genes, as indicated by the fact that the average nonsynonymous (amino acid altering) substitutions per site (N) is higher than the average synonymous (silent) substitutions per site (S) in the CDRs, and that the reverse is true for the FRs (e.g., Tanaka and Nei 1989Citation ; Rothenfluh et al. 1994Citation ; Ota and Nei 1995Citation ; Sitnikova and Nei 1998Citation ). We first studied adaptive evolution in the heavy-chain Ig by examining if there is positive selection operating in the CDRs of VHH genes. For this purpose, we computed N and S for CDRs and FRs separately. Here we used the definition of CDRs and FRs presented by Hamers-Casterman et al. (1993)Citation . It should be noted that this definition for the VHH genes may not be appropriate because it was based on the 3D structure of the conventional Ig and on the alignment of VHH genes to VH genes. However, a number of crystal-structural studies of the heavy-chain Ig's (Desmyter et al. 1996Citation ; Spinelli et al. 2000Citation ) showed that this definition is generally acceptable. We therefore used it in our study. We deleted 21 llama VHH genes, 9 camel VHH genes, and 5 camel VH genes from the aligned data set because they contained long alignment gaps. As a result, a total of 211 sequences were used in this analysis. In addition, we used the original Nei and Gojobori method (Nei and Gojobori 1986Citation ) rather than the modified Nei and Gojobori method (Zhang, Rosenberg, and Nei 1998Citation ) to obtain the values of S and N because our estimate showed that the transition-transversion bias in the VHH genes (R = 0.99) was low, and because the original method is more conservative than the modified method. The variance of N and S was computed using the method of Nei and Jin (1989)Citation .

Selection on Individual Codon Sites of Camelid VHH Genes
To study adaptive evolution in the heavy-chain Ig in more detail, we compared the selective force operating at each codon site of VHH and VH genes. In this analysis, we chose 32 representative VHH genes from camels and 38 representative genes from llamas. The 70 VHH sequences used were chosen from 180 camel and llama VHH sequences by first constructing an NJ tree of the 180 sequences and then choosing randomly one sequence from each tight cluster found in the tree. In this way, we excluded the closely related sequences to speed up the computational process (see later).

In this analysis we used the method of Suzuki and Gojobori (1999)Citation , which can be briefly summarized as follows. First, the ancestral nucleotide sequence of each interior node was inferred by using a parsimony criterion for a given tree topology, which was generated by the NJ method. The numbers of synonymous (st) and nonsynonymous (nt) sites were then computed for each codon site by taking into account the extant sequences as well as the ancestral sequences. Next, for each codon site the observed numbers of synonymous and nonsynonymous changes for all branches were summed up to obtain the total numbers of synonymous (sc) and nonsynonymous (nc) changes, respectively. The probability of occurrence of nc nonsynonymous changes when the total number of changes was tc (=sc + nc) and all the changes were assumed to be neutral is then given by


Therefore, the probabilities of observing nc or more nonsynonymous substitutions is


Positive selection is detected if P is less than the significant level (e.g., 5%) because an excess of nonsynonymous substitutions over synonymous substitutions is regarded as a result of positive selection (Hughes and Nei 1988Citation ). A similar procedure can also be used to detect purifying selection. In this case we consider the probability of observing sc or more synonymous substitutions.

For the purpose of comparison, we also applied the same procedures to a combined data set of human VH3 genes and camel VH genes. Here we used genes from the human VH3 subfamily because these genes are known to be closely related to the camelid VHH genes (Vu et al. 1997Citation ; Nguyen et al. 2000Citation ). We also included camel VH genes for comparison because they appear to share the last common ancestor with VHH genes (see Results). Another important reason for using both human and camel VH genes is that the statistical power of the method of Suzuki and Gojobori increases as the number of sequences used or the total number of nucleotide changes observed for a certain site increases. In general, at least 50–60 fairly divergent sequences should be used (T. Gojobori, personal communication). Therefore, the use of either human VH3 genes or camel VH genes alone is unlikely to be sufficient. Lastly, our preliminary study indicated that if examined separately, the human VH3 and the camel VH genes showed quite similar profiles of selective forces, so that it seemed reasonable to combine these two homologous data sets. In this analysis, the number of human VH3 genes used was 58, and the number of camel VH genes used was 38. All human VH3 genes were retrieved from the international ImMunoGeneTics (IMGT) database (http://imgt.cines.fr:8104; Lefranc et al. 1999Citation ), whereas camel VH genes were retrieved from GenBank, as mentioned earlier. The 58 human sequences were chosen from a total of 113 human VH3 genes and the 38 camel sequences from 50 camel VH genes, using the same criterion as that used for VHH genes. The aligned sequences for these genes are available from the World Wide Web at http://mep.bio.psu.edu/databases.

The above computation was done by using the computer program SGI written by C.S., except for the reconstruction of ancestral sequences, which was done by using the computer program pamp in the PAML package (Yang 2000Citation ). The SGI program is available at http://mep.bio.psu.edu/.


    Results
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 References
 
Phylogenetic Relationships Among VHH and VH Genes
The CDRs are highly variable and difficult to align when sequences from different species are considered (Ota and Nei 1994Citation ). Therefore, they were omitted from this analysis. The phylogenetic tree of the FRs of VHH and VH genes constructed by the NJ method is presented in figure 1 . The MP tree showed a topology almost identical to that of the NJ tree (data not shown). The seven germline pig genes that were included in the analysis formed a tight cluster (PB = 100%) with the pig cDNA genes. For simplicity, we showed only two pig germline genes (AF064689 and AF064690) in the tree. By the same token, we did not show all the camelid VHH and VH genes in the tree because the llama VHH, camel VHH, and camel VH genes formed three separate clusters, and we presented only representative sequences for each of the three clusters in the tree.



View larger version (30K):
[in this window]
[in a new window]
 
Fig. 1.—Phylogenetic (NJ) tree of 93 VH and VHH genes from different species of mammals. The gene notations are the same as those used by Sitnikova and Su (1998)Citation , except for the camelid genes and the two pig germline sequences. The llama genes are denoted by the last three digits of the GenBank accession numbers. The camel genes are identified by the gene names given in the GenBank, except for VHH 604 and VH 603, which were from accession numbers AF000604 and AF000603, respectively. The two pig germline sequences are designated by their accession numbers (AF064690 and AF064689). A total of 225 nucleotides per sequence were used. The tree was constructed by using the uncorrected proportion of nucleotide differences (p-distance). Three VH gene groups (A, B, and C) are indicated by brackets. Bootstrap (PB) values are shown only for the interior branches for which PB >= 50%. For the interior branches that identify the three major gene groups, the PB values are shown in boldface. Two VH genes from the cartilaginous fish (sharks and skates) are used as the outgroup

 
Figure 1 shows that all VH genes are classified into three gene groups, A, B, and C, which constitute the mammalian VH repertoire (Ota and Nei 1994Citation ). All three gene groups are supported by highly significant bootstrap values (PB = 98%, 100%, and 99%, respectively). All the VHH genes form a single monophyletic cluster with a bootstrap support of 72%. This cluster belongs to group C, and its closest relatives appear to be camel VH genes and human VH3 genes. Within the cluster of VHH genes, the llama genes form a monophyletic cluster, which is separate from the cluster of camel genes. However, neither of the PB values for these two clusters is statistically significant (PB = 47% and 30%, respectively). Therefore, the camel and llama VHH genes may not have been well differentiated. VH genes from the three species closely related to Camelidae, namely sheep, cattle, and pigs, form quite different clusters, and the cattle and sheep genes belong to group B of mammalian genes. These results indicate that the VHH genes diverged from a lineage of group C VH genes, which is distinct from the VH lineages of pigs and ruminants. The divergence between VHH and VH genes occurred before the divergence between camels and llamas but after camelids diverged from ruminants, presumably about 11–40 MYA, according to the fossil records (Webb 1974Citation ; Harrison 1979Citation ). A detailed study of the divergence time between the VHH genes and the VH genes is beyond the scope of this paper and will be presented elsewhere.

The llama VHH sequences show longer root-to-tip branch lengths than the camel VHH genes (fig. 1 ). However, this does not necessarily mean that positive selection has occurred in the FRs of the llama VHH genes. As the llama genes used were obtained from cDNA libraries, the high diversity among the llama VHH sequences could be the result of the effect of somatic mutations in the FRs, the relaxation of functional constraint, or random chance. However, to understand the cause of the longer branches of llama VHH genes, it is necessary to use germline VHH sequences from L. glama. Note that some human and mouse germline genes appear as long root-to-tip branch lengths like those of llama cDNA.

Adaptive Evolution: Selection in the CDRs and FRs of Camelid VHH and VH Genes
In this analysis, we examined whether or not positive selection operates on the CDRs of camelid VHH genes. We also included the camel VH genes in the analysis because no such study has been done before. We divided all camel and llama genes into three groups: camel VHH, llama VHH, and camel VH genes (Table 1 ). The values of S and N were calculated separately for three different comparisons: within each group, between different groups, and for the entire data set. The results obtained are presented in Table 1 and may be summarized as follows: (1) When CDRs are considered, N is greater than S in all within- and between-group comparisons, although the difference is statistically significant in only one comparison. It is interesting that N is greater than S even in the comparison between VH and VHH genes. This suggests that the CDRs of the VH and VHH genes are located in the same DNA regions, as discussed earlier. Table 1 shows that if we consider the entire data set of all sequences, the difference between N and S is highly significant. This result is consistent with the conclusion obtained by Tanaka and Nei (1989)Citation and suggests that the CDRs are subject to positive Darwinian selection in both VHH and VH genes. (2) In contrast, in FRs N is almost always smaller than S, and if we consider the entire set of sequences, the former is significantly smaller than the latter. (3) Comparison of S values between the CDRs and the FRs in each within- and between-group study shows that S is more or less the same for all CDRs and FRs, and none of the differences is statistically significant. In contrast, N in the CDRs is always greater than N in the FRs, and the difference is always significant. This result also suggests that positive selection operates primarily in the CDRs. This conclusion appears to apply to both VH and VHH genes.


View this table:
[in this window]
[in a new window]
 
Table 1 Average Synonymous ((S) and Nonsynonymous ((N) Substitutionsa in the CDRs and FRs of the Camelid VH and VHH Genes

 
Adaptive Evolution: Selection at Individual Codon Sites in Camelid VHH Genes
The above results showed that the general pattern of selection in the CDRs and FRs of camelid VHH genes is similar to that of VH genes. To examine adaptive evolution in the heavy-chain Ig in more detail, we compared the type and extent of selective force at each codon site of VHH and VH genes. The results from the study of positive selection in VHH and VH genes are presented in figure 2A and B, respectively, and the results regarding purifying selection are presented in figure 3A and B, respectively.



View larger version (62K):
[in this window]
[in a new window]
 
Fig. 2.—Extent of positive selection at each codon site. A, Camel and llama VHH genes. The extent of positive selection was computed by using 70 representative VHH sequences, of which 32 came from camels and 38 from llamas. The regions of three FRs and two CDRs are marked on the top. The ordinate indicates the value of 1 - P for each amino acid site (see text). A dashed line indicates the 5% significance level (P = 0.05). A total of 95 codons are used after the removal of sites where one or more alignment gaps exist. B, Human VH3 and camel VH genes. The extent of selection was computed by using 58 human VH3 genes and 38 camel VH genes. All these genes belong to the VH group C and are most closely related to the VHH genes (fig. 1 ). The notations used are the same as those in (A)

 


View larger version (92K):
[in this window]
[in a new window]
 
Fig. 3.—Extent of purifying selection at each codon site. A, Camel and llama VHH genes. The extent of selection was computed by using the same 70 representative VHH sequences as those used in figure 2A. The 1 - P value above the dashed line indicates that the excess of the number of synonymous changes over the number of nonsynonymous changes is statistically significant at the 5% level. B, Human VH3 and camel VH genes. The extent of selection was computed by using the same genes as those used in figure 2B.

 
Figures 2A and 3A suggest that, out of the 25 codon sites in the CDRs of VHH genes, five codons were subject to positive selection, and two codons were subject to purifying (negative) selection. Of the remaining 70 FR sites, positive selection apparently occurred only at one codon site, and 20 sites were subject to purifying selection. Fisher's exact test showed that a significantly larger percentage of sites were positively selected in the CDRs than in the FRs (see Table 2 ). For VH genes, 5 out of 25 codons in the CDRs were positively selected, and seven were negatively selected. Of the remaining 70 FR sites, one was inferred as positively selected, and 26 were negatively selected. Again, Fisher's exact test showed that a significantly larger percentage of sites were positively selected in the CDRs than in the FRs.


View this table:
[in this window]
[in a new window]
 
Table 2 Numbers of Codon Sites at Which Positive or Purifying Selection Was Detected in the CDRs and FRs of VHH and VH Genes

 
However, the evolutionary change of VHH genes is not exactly the same as that of VH genes. There are three major differences between VHH and VH genes with respect to positive selection (fig. 2 ). First, the FRs of VHH genes appear to have experienced relatively higher proportions of nonsynonymous changes than the FRs of VH genes. Among the three FR regions, FR2 showed the highest differences between VHH and VH genes. In the FR2 of VH genes, most nucleotide substitutions were synonymous, and very few were nonsynonymous (fig. 2B; see also Rothenfluh et al. 1994Citation ; Rothenfluh, Blanden, and Steele 1995Citation ). In contrast, the FR2 of VHH genes has considerably more nonsynonymous changes, especially with regard to sites 40, 41, 44, 47, and 48. This result may be explained by the following. (1) The FR2 of VH genes contains functionally important sites that determine the contact of VH with VL (Chothia et al. 1985Citation ), and the nonsynonymous changes at these sites are deleterious. (2) In the heavy-chain Ig, H chains do not associate with L chains, so the functional constraint does not exist, and nonsynonymous substitutions may accumulate in the FR2 of VHH genes.

A closer examination of the substitutions occurring in the FR2 of VHH genes reveals that many nonsynonymous substitutions were from hydrophobic amino acids to hydrophilic ones. For example, the ancestral amino acid at site 44 was presumably hydrophobic because a great majority of the camel VH and human VH3 genes have the amino acid glycine (G) at this site. However, this glycine has changed to hydrophilic amino acids (E, K, and D) in 65 out of the 70 sites studied in the camel and llama VHH genes. To explain this observation, we propose that mutations from hydrophobic amino acids to hydrophilics might have been fixed by directional selection at some sites in the FR2 region for the following two reasons. First, in the case of the conventional Ig, it is known that before the Ig is assembled in the endoplasmic reticulum, some hydrophobic amino acids in FR2 are bound by the chaperon protein BiP (Knarr et al. 1995Citation ), and that during the antibody assembly process, this BiP protein has to be replaced by an L chain before the Ig can be secreted to the circulatory system. In the heavy-chain Ig, once the hydrophobic amino acids mutate to hydrophilic ones, it would be impossible for the BiP to bind to them. Therefore, the heavy-chain Ig can be secreted from the cell without the L chain (Muyldermans and Lauwereys 1999Citation ). Second, these mutations would increase the solubility of the heavy-chain Ig and would probably compensate for the loss of solubility without the VL region (Ghahroudi et al. 1997Citation ). With regard to these two aspects, the high proportions of nonsynonymous changes in FR2 of VHH genes might be an adaptation to the loss of the L chain.

Another difference between VHH and VH genes was observed in the region immediately preceeding (including the beginning of) CDR1 (positions 24–32; fig. 2 ). Sites of this region showed much higher proportions of nonsynonymous changes in VHH genes than in VH genes (see fig. 2A ). Interestingly, this region corresponds to the first hypervariable region (the H1 loop; Chothia et al. 1992Citation ) and is where the main-chain atoms of the first structural loop locate. Crystallographic studies of the VHH-antigen complexes showed that the amino acids located in this region interact with the antigens (Desmyter et al. 1996Citation ; Decanniere et al. 1999Citation ; Spinelli et al. 2000Citation ). The nonsynonymous changes in this region of the VHH genes are likely to reflect an additional expansion of the antigen-binding repertoire of the heavy-chain Ig to compensate for the loss of the antigen-binding specificity without the VL regions.

A third difference is observed at position 14, which is well conserved in VH genes and yet subject to strong positive selection in the VHH genes (fig. 2 ). We found that most nonsynonymous substitutions occurring at this site were conservative rather than radical because (1) the ancestral amino acid at this site was probably proline (P), if we consider the fact that a majority of the camel VH and human VH3 genes have a proline at this site; and (2) in VHH genes almost all amino acids (A, S, and T) that replaced proline at this site are small and neutral, just like proline, and in only one sequence was a relatively large and charged amino acid (H) found. It is puzzling why different small and neutral amino acids are positively selected at this site. This site is unlikely to be involved in antigen recognition because crystal-structure study indicates that this amino acid site resides at the region opposite the antigen-binding pocket (Desmyter et al. 1996Citation ). Examining the crystal structure, we found that this site is located extremely close to a small hydrophobic socket (formed by site 11 in FR1 and sites 110 and 112 in FR4 encoded by the gene JH; Lesk and Chothia 1988Citation ), which is well conserved at the sequence level and contacts the CH1 region in the conventional Ig. In the heavy-chain Ig, however, this function of the socket must have changed because of the loss of the CH1 domain. This functional change may require conformational changes in the flanking regions. For example, as the socket is hydrophobic, it may have to be buried inside the molecule to minimize exposure to the solvent. Examining the indexes of solvent accessibility (the tendency to have access to the solvent) of these amino acids (Bordo and Argos 1991Citation ; Karplus 1997Citation ), we found that all new amino acids (A, S, and T) at position 14 have considerably lesser solvent accessibility and smaller nonpolar surface area compared with the ancestral amino acid (P). This suggests that these mutations are directional and may have led to a conformational change of the flanking region of the socket. If this is the case, the high frequency of nonsynonymous substitutions at site 14 might be an adaptation to the loss of the CH1 region in the heavy-chain Ig.

The profile of the extent of purifying selection (fig. 3 ) is negatively correlated to that of positive selection (fig. 2 ). In particular, (1) the FR2 region of VHH genes shows a lesser extent of purifying selection than the FR2 of VH genes, (2) in VHH genes, codon sites 24–32 show considerably lower proportions of synonymous changes than those in VH genes, and (3) in VH genes, purifying selection operates at site 14, whereas positive selection occurs at this site in the VHH genes, as mentioned earlier.


    Discussion
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 References
 
We have shown that the camelid VHH genes evolved from a group C lineage of VH genes and have gone through a number of adaptive changes. Previous authors have noted some of these changes. For example, Vu et al. (1997)Citation and Nguyen et al. (2000)Citation showed that the H1 loop (site 26–32) in camelid VHH genes was characterized by a much higher sequence variability when compared with that of the human VH genes. In their study, the variability of a codon site was calculated by dividing the number of different amino acids by the frequency of the most common amino acid at the site (Kabat et al. 1991Citation ). Synonymous nucleotide substitutions were ignored. In addition, Nguyen et al. (2000)Citation showed that the sites that form the H1 loop undergo more somatic mutations in the camel VHH cDNA sequences than in the VH cDNA sequences. They suggested that this is a strategy used by the heavy-chain Ig to increase the antigen-binding repertoire. In this paper, we showed that the sites forming the H1 loop have experienced higher proportions of nonsynonymous substitutions in VHH genes than in VH genes.

In the present paper, we examined the adaptive changes of codons at individual amino acid sites using the method of Suzuki and Gojobori (1999)Citation . This method appears to give results that are more reliable than the previous ones (e.g., Fitch et al. 1997Citation ; Nielsen and Yang 1998Citation ) because it is based on more realistic assumptions and models (Suzuki and Nei 2001Citation ). For example, in the method of Fitch et al., the computation of pn in equation (1) is done under the assumption that st/(st + nt) is the same for all variable amino acid sites. However, as a majority of amino acid sites are subject to purifying selection rather than to positive selection with st > nt, this method tends to underestimate P for many sites, and consequently it tends to overestimate the number of sites with positive selection. In contrast, Nielsen and Yang assumed that the ratio of nonsynonymous to synonymous substitution rates is the same for all positively selected codon sites. In reality, this ratio is expected to vary from codon to codon.

To see whether different statistical methods give different results, we applied the codeml program of the PAML package (Yang 2000Citation ) to identify positively selected amino acid sites by the maximum likelihood method of Nielsen and Yang (1998)Citation . The results obtained by this method were different from those obtained by the Suzuki-Gojobori method, except for the VHH CDR1 region where both methods identified the same three sites. Moreover, the results of the codeml program depend on the initial value of {omega}, the ratio of nonsynonymous changes to synonymous changes. Different initial {omega} values will generate different results, suggesting that the method often fails to find the global maximum likelihood value. It is also known that the Nielsen-Yang method often gives false positive results (Suzuki and Nei 2001Citation ). In fact, the Nielsen-Yang method suggested more positively selected sites, especially in the FR regions, where it was difficult to offer a biological explanation. However, the identification of selective sites by the current statistical methods is subject to sampling errors (Suzuki and Gojobori 1999Citation ). Therefore, the validity of the results obtained by these methods should eventually be examined experimentally by using site-directed mutagenesis or some other molecular techniques (e.g., Jermann et al. 1995Citation ).


    Acknowledgements
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 References
 
We thank Y. Suzuki for helpful discussions. This study was supported by research grants from the National Institute of Health to M.N.


    Footnotes
 
Dan Graur, Reviewing Editor

Keywords: adaptive evolution positive selection phylogenetic tree Camelidae VHH genes immunoglobulin Back

Address for correspondence and reprints: Chen Su, Lilly Research Laboratories, Eli Lilly and Company, Lilly Corporate Center, Indianapolis, Indiana 46285. su_chen{at}lilly.com . Back

Present address: Department of Ultrastructure, Vlaams Interuniversitair Instituut voor Biotechnologie, Vrije Universiteit Brussel, Paardenstraat 65, B-1640 Sint Genesius Rode, Belgium Back


    References
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Acknowledgements
 References
 

    Andersson E., T. Matsunaga, 1996 Jaw, adaptive immunity and phylogeny of vertebrate antibody VH gene family Res. Immunol 147:233-240[ISI][Medline]

    Bordo D., P. Argos, 1991 Suggestions for "safe" residue substitutions in site-directed mutagenesis J. Mol. Biol 217:721-729[ISI][Medline]

    Chothia C., A. M. Lesk, E. Gherardi, I. M. Tomlinson, G. Walter, J. D. Marks, M. B. Llewelyn, G. Winter, 1992 Structural repertoire of the human VH segments J. Mol. Biol 227:799-817[ISI][Medline]

    Chothia C., J. Novotny, R. Bruccoleri, M. Karplus, 1985 Domain association in immunoglobulin molecules. The packing of variable domains J. Mol. Biol 186:651-663[ISI][Medline]

    Davies D. R., E. A. Padlan, S. Sheriff, 1990 Antibody–antigen complexes Annu. Rev. Biochem 59:439-473[ISI][Medline]

    Decanniere K., A. Desmyter, M. Lauwereys, M. A. Ghahroudi, S. Muyldermans, L. Wyns, 1999 A single-domain antibody fragment in complex with RNase A: non-canonical loop structures and nanomolar affinity using two CDR loops Struct. Fold. Des 7:361-370[Medline]

    Desmyter A., T. R. Transue, M. A. Ghahroudi, M. H. Thi, F. Poortmans, R. Hamers, S. Muyldermans, L. Wyns, 1996 Crystal structure of a camel single-domain VH antibody fragment in complex with lysozyme Nat. Struct. Biol 3:803-811[ISI][Medline]

    Dufour V., S. Malinge, F. Nau, 1996 The sheep Ig variable region repertoire consists of a single VH family J. Immunol 156:2163-2170[Abstract]

    Fitch W. M., R. M. Bush, C. A. Bender, N. J. Cox, 1997 Long term trends in the evolution of H(3) HA1 human influenza type A Proc. Natl. Acad. Sci. USA 94:7712-7718[Abstract/Free Full Text]

    Ghahroudi M. A., A. Desmyter, L. Wyns, R. Hamers, S. Muyldermans, 1997 Selection and identification of single domain antibody fragments from camel heavy-chain antibodies FEBS Lett 414:521-526[ISI][Medline]

    Gojobori T., M. Nei, 1986 Relative contributions of germline gene variation and somatic mutation to immunoglobulin diversity in the mouse Mol. Biol. Evol 3:156-167[Abstract]

    Hamers R., S. Muyldermans, 1998 Immunology of the camels and llamas Pp. 421–438 in P. Pastoret, H. Bazin, P. Griebel, and A. Govaerts, eds. Handbook of vertebrate immunology. Academic Press, London

    Hamers-Casterman C., T. Atarhouch, S. Muyldermans, G. Robinson, C. Hamers, E. B. Songa, N. Bendahman, R. Hamers, 1993 Naturally occurring antibodies devoid of light chains Nature 363:446-448[ISI][Medline]

    Harrison J. A., 1979 Revision of the Camelidae (Artiodactyla, Tylopoda) and description of the new genus Alforjas Palaeontol. Ctrl 95:1-20

    Hughes A. L., M. Nei, 1988 Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection Nature 335:167-170[ISI][Medline]

    Jermann T. M., J. G. Opitz, J. Stackhouse, S. A. Benner, 1995 Reconstructing the evolutionary history of the artiodactyl ribonuclease superfamily Nature 374:57-59[ISI][Medline]

    Kabat E. A., T. T. Wu, H. M. Perry, K. S. Gottesman, C. Foeller, 1991 Sequences of proteins of immunological interest US Public Health Services, NIH, Bethesda, Md

    Karplus P. A., 1997 Hydrophobicity regained Protein Sci 6:1302-1307[Abstract/Free Full Text]

    Knarr G., M. J. Gething, S. Modrow, J. Buchner, 1995 BiP binding sequences in antibodies J. Biol. Chem 270:27589-27594[Abstract/Free Full Text]

    Kumar S., K. Tamura, M. Nei, 1993 MEGA: molecular evolutionary genetics analysis The Pennsylvania State University, University Park, Pa

    Lefranc M. P., V. Giudicelli, C. Ginestoux, J. Bodmer, W. Muller, R. Bontrop, M. Lemaitre, A. Malik, V. Barbi, D. Chaume, 1999 IMGT, the international ImMunoGeneTics database Nucleic Acids Res 27:209-212[Abstract/Free Full Text]

    Lesk A. M., C. Chothia, 1988 Elbow motion in the immunoglobulins involves a molecular ball-and-socket joint Nature 335:188-190[ISI][Medline]

    Muyldermans S., C. Cambillau, L. Wyns, 2001 Recognition of antigens by single-domain antibody fragments: the superfluous luxury of paired domains Trends Biochem. Sci 26:230-235[ISI][Medline]

    Muyldermans S., M. Lauwereys, 1999 Unique single-domain antigen binding fragments derived from naturally occurring camel heavy-chain antibodies J. Mol. Recognit 12:131-140[ISI][Medline]

    Nei M., T. Gojobori, 1986 Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions Mol. Biol. Evol 3:418-426[Abstract]

    Nei M., L. Jin, 1989 Variances of the average numbers of nucleotide substitutions within and between populations Mol. Biol. Evol 6:290-300[Abstract]

    Nei M., S. Kumar, 2000 Molecular evolution and phylogenetics Oxford University Press, Oxford

    Nguyen V. K., R. Hamers, L. Wyns, S. Muyldermans, 2000 Camel heavy-chain antibodies: diverse germline VHH and specific mechanisms enlarge the antigen-binding repertoire EMBO J 19:921-930[Abstract/Free Full Text]

    Nielsen R., Z. Yang, 1998 Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene Genetics 148:929-936[Abstract/Free Full Text]

    Ota T., M. Nei, 1994 Divergent evolution and evolution by the birth-and-death process in the immunoglobulin VH gene family Mol. Biol. Evol 11:469-482[Abstract]

    ———. 1995 Evolution of immunoglobulin VH pseudogenes in chickens Mol. Biol. Evol 12:94-102[Abstract]

    Rothenfluh H. S., R. V. Blanden, E. J. Steele, 1995 Evolution of V genes: DNA sequence structure of functional germline genes and pseudogenes Immunogenetics 42: ::159-171[ISI][Medline]

    Rothenfluh H. S., A. J. Gibbs, J. Blangero, E. J. Steele, 1994 Analysis of patterns of DNA sequence variation in flanking and coding regions of murine germ-line immunoglobulin heavy-chain variable genes: evolutionary implications Proc. Natl. Acad. Sci. USA 91:12163-12167[Abstract/Free Full Text]

    Saitou N., M. Nei, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees Mol. Biol. Evol 4:406-425[Abstract]

    Sheriff S., K. L. Constantine, 1996 Redefining the minimal antigen-binding fragment Nat. Struct. Biol 3:733-736[ISI][Medline]

    Sinclair M. C., J. Gilchrist, R. Aitken, 1997 Bovine IgG repertoire is dominated by a single diversified VH gene family J. Immunol 159:3883-3889[Abstract]

    Sitnikova T., M. Nei, 1998 Evolution of immunoglobulin kappa chain variable region genes in vertebrates Mol. Biol. Evol 15:50-60[Abstract]

    Sitnikova T., C. Su, 1998 Coevolution of immunoglobulin heavy and light chain variable region gene families Mol. Biol. Evol 15:617-625[Abstract]

    Spinelli S., L. Frenken, D. Bourgeois, L. de Ron, W. Bos, T. Verrips, C. Anguille, C. Cambillau, M. Tegoni, 1996 The crystal structure of a llama heavy chain variable domain Nat. Struct. Biol 3:752-757[ISI][Medline]

    Spinelli S., L. G. Frenken, P. Hermans, T. Verrips, K. Brown, M. Tegoni, C. Cambillau, 2000 Camelid heavy-chain variable domains provide efficient combining sites to haptens Biochemistry 39:1217-1222[ISI][Medline]

    Sun J., I. Kacskovics, W. R. Brown, J. E. Butler, 1994 Expressed swine VH genes belong to a small VH gene family homologous to human VHIII J. Immunol 153:5618-5627[Abstract/Free Full Text]

    Suzuki Y., T. Gojobori, 1999 A method for detecting positive selection at single amino acid sites Mol. Biol. Evol 16:1315-1328[Abstract]

    Suzuki Y., M. Nei, 2002 Efficiencies of parsimony-based and likelihood-based methods for detecting positive selection at single amino acid sites Mol. Biol. Evol. (in press)

    Swofford D. L., 1998 PAUP*: phylogenetic analysis using parsimony Sinauer Associates, Sunderland, Mass

    Takahashi K., M. Nei, 2000 Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used Mol. Biol. Evol 17:1251-1258[Abstract/Free Full Text]

    Tanaka T., M. Nei, 1989 Positive Darwinian selection observed at the variable-region genes of immunoglobulins Mol. Biol. Evol 6:447-459[Abstract]

    Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680[Abstract]

    Vu K. B., M. A. Ghahroudi, L. Wyns, S. Muyldermans, 1997 Comparison of llama VH sequences from conventional and heavy chain antibodies Mol. Immunol 34:1121-1131[ISI][Medline]

    Webb S. D., 1974 Pleistocene llamas of Florida, with a brief review of the Lamini Pp. 170–214 in S. D. Webb, ed. Pleistocene mammals of Florida. University Press of Florida, Gainsville, Fla

    Yang Z., 2000 PAML: phylogenetic analysis by maximum likelihood University College London, London

    Zhang J., H. F. Rosenberg, M. Nei, 1998 Positive Darwinian selection after gene duplication in primate ribonuclease genes Proc. Natl. Acad. Sci. USA 95:3708-3713[Abstract/Free Full Text]

Accepted for publication September 24, 2001.