*Département d'Informatique Fondamentale et Applications, LIRMM, 161 rue Ada, 34392 Montpellier, France;
Laboratoire d'Immunogénétique Moléculaire, LIGM, Université Montpellier II, UPR CNRS 1142, IGH, 141 rue de la Cardonille, 34396 Montpellier Cedex 5, France
Abstract |
---|
Introduction |
---|
The three main distinct mechanisms which generate tandem duplication of DNA stretches are slipped-strand mispairing, gene conversion, and unequal recombination. The latter (also known as unequal crossover) is widely viewed as the predominant biological mechanism responsible for the production of medium to large tandemly repeated sequences. Various examples have been described (Ohno 1970; Smith 1976; Jeffreys and Harris 1981; Collins and Weissman 1984; Gumucio et al. 1988; Ruddle et al. 1994; Honjo and Alt 1995, p. 269). Recombination (Alberts et al. 1995, p. 863) arises during meiosis, just after chromosome duplication, when chromosomes line up in tetrad configuration. At this time they can exchange segments of DNA. In most cases, recombination does not produce repeated segments because chromosomes are well aligned. However, because of the presence of short repeated sequences, unequal pairing of the chromosomes may sometimes occur, and the shift between both chromatids duplicates a fragment of DNA. Because a DNA fragment from one chromosome is transported to another chromosome, unequal recombination also deletes a fragment from one of the two chromosomes. This duplication mechanism is illustrated in figure 1, step 1 (sequences are shortened for the purpose of illustration). Tandemly repeated sequences in turn increase the likelihood of additional tandem duplications (fig. 1, step 2) because they increase the possibilities of mispairing. Block duplication, or simultaneous duplication of several genes in tandem (as shown in fig. 1, step 3), was also found to have occurred in several loci (Lefranc et al. 1986; Corbett et al. 1997; Hordvik et al. 1999).
The first manual reconstruction of the duplication history for a complete locus containing tandemly repeated genes was apparently reported in Shen, Slightom, and Smithies (1981) for the human fetal globin. Algorithms for the reconstruction of the ancestral predoubling genome, from a set of chromosomes divided into segments, are presented in El-Mabrouk (2000) and applied to the genome duplication that may have occurred in Saccharomyces cerevisiae. The emerging field of genome rearrangement describes edit distances (Sankoff and Blanchette 1999) between species as the minimum number of inversions, translocations, duplications, and deletions necessary to transform one genome into another and uses these distances for phylogenetic inference. Closer to our problem, algorithms for phylogenetic analysis of minisatellites were previously presented in Benson and Dong (1999). However, large repeats produced by unequal recombination have not received much attention to date, and we could not find any program or description of an algorithm for automated reconstruction of duplication histories.
In this paper, we deal with the problem of reconstructing the duplication history of a set of large tandemly repeated genes. We suppose that the main biological mechanism responsible for the generation of tandem repeats is unequal recombination, and we adopt a single-locus approach, i.e., we do not use sequence data from other species or from other loci. We also suppose that our loci have continually expanded via duplications and did not undergo any deletions. Indeed, comparisons between distinct species (Vijverberg and Bachmann 1999) seem to show that positive selection tends to make loci expand, probably in order to generate diversity. However, this hypothesis will be discussed at the end of this article. Finally, we assume that our sequences were not affected by gene conversion events. These assumptions form the basis of the methods and algorithms we present in this paper, and we will see that they are in good agreement with the sequences we study.
In the following sections, we describe the mathematical model of evolution by tandem duplication, as induced by the above assumptions. We then present a simple, exhaustive procedure that searches for the best duplication trees, according to the maximum parsimony criterion, when given a set of ordered and aligned DNA sequences. Finally, we analyze two data sets of tandemly repeated sequences, the TRGV and the IGLC loci, obtained from the Immunogenetics DataBase IMGT.
Models and Algorithms |
---|
In the initial duplication history, the adjacency relationships between ancestral copies (denoted i < j, see above) can be clearly identified throughout the evolution of the locus. In a partially ordered duplication history, these relationships are no longer represented. However, not all the adjacency relationships between ancestral copies are possible. For example, in figure 3b, e < 3 < f < g < h is possible, whereas e < 3 < f only or e < 3 < 4 < 5 < b would never occur. In fact, it can be shown that the possible ancestral combinations of adjacent copies are given by the maximal antichains (Atallah 1999, p. 13) of the partial order on the duplication events.
Duplication Trees
Another consequence of the absence of a molecular clock is that the position of the root cannot generally be recovered from the sequences. Unrooting a partially ordered duplication history creates what we call a duplication tree. As mentioned previously, a more precise denomination would be tandem duplication tree. A duplication tree is an unrooted phylogeny with ordered leaves whose topology is compatible with at least one phylogeny induced from a duplication history (a more formal definition is given below). Figure 3c shows a hypothetical duplication tree that is compatible with the duplication histories shown in figures 3a and b.
In turn, rooting a duplication tree may or may not produce a valid partially ordered duplication history because the mathematical model of duplication allows the root of a duplication history to be located somewhere on the path linking the most distant genes, but not everywhere. For example, in figure 4 there are three potential root locations (a, b, and c), and only two of them (b and c) lead to valid partially ordered duplication histories.
The PDH algorithm also allows us to determine whether a phylogeny can be a duplication tree; we simply run it on each root location along the path linking the leaves associated with the two most distant sequences on the locus. If at least one of these roots yields a valid partially ordered duplication history, the unrooted phylogeny is a possible duplication tree. The PDH algorithm thus provides us with a mathematical characterization of both the partially ordered duplication history and the duplication tree objects.
We define the following notation: (1) T is a rooted phylogeny with ordered leaves and (2) a cherry (i, u, j) is a pair of leaves (i and j) separated by a single node u in T; we call C(T) the set of cherries of T. The PDH algorithm is a recursive procedure that progressively reduces T by agglomerating the cherries that belong to recognized duplication events. It merges cherries until either T has been reduced to its root, meaning that it constitutes a valid partially ordered duplication history, or it cannot go further, in which case T cannot be a partially ordered duplication history. It must be noted that the order in which cherries are agglomerated is not important because a cherry belongs to at most one duplication event. The PDH algorithm is given in figure 6.
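The reduction described above can be sketched in a few lines. This is a minimal illustration, not the procedure of figure 6: it assumes that a rooted phylogeny is encoded as nested 2-tuples whose integer leaves 1..n give the order of the copies on the locus, and that a k-duplication is recognized as k cherries pairing position p + j with position p + k + j. The function names `is_podh`, `rootings_between_ends`, and `is_duplication_tree` are hypothetical. The last function applies the test to every root location on the path linking the first and last leaves.

```python
from itertools import count

def is_podh(tree):
    """True if the rooted tree can be reduced to its root by agglomerating
    the cherries of recognized duplication events."""
    parent, leaves = {}, []

    def walk(node):
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                walk(child)
        else:
            leaves.append(node)

    walk(tree)
    current = sorted(leaves)  # copies in locus order
    while len(current) > 1:
        m = len(current)
        # Look for k cherries pairing position p+j with position p+k+j.
        event = next(((p, k)
                      for k in range(1, m // 2 + 1)
                      for p in range(m - 2 * k + 1)
                      if all(parent[current[p + j]] is parent[current[p + k + j]]
                             for j in range(k))), None)
        if event is None:
            return False  # no recognizable event: not a valid history
        p, k = event
        current[p:p + 2 * k] = [parent[current[p + j]] for j in range(k)]
    return True

def rootings_between_ends(tree):
    """Yield each rooting of the unrooted version of `tree` located on the
    path linking the first and last leaves."""
    adj, ids = {}, count()

    def link(u, v):
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    def walk(node):
        if not isinstance(node, tuple):
            return node
        me = ("node", next(ids))
        for child in node:
            link(me, walk(child))
        return me

    link(walk(tree[0]), walk(tree[1]))  # suppress the root

    def subtree(node, origin):
        if isinstance(node, int):
            return node
        return tuple(subtree(nb, node) for nb in adj[node] if nb != origin)

    def path(node, origin, goal):
        if node == goal:
            return [node]
        for nb in adj[node]:
            if nb != origin:
                rest = path(nb, node, goal)
                if rest:
                    return [node] + rest
        return None

    ends = sorted(x for x in adj if isinstance(x, int))
    route = path(ends[0], None, ends[-1])
    for u, v in zip(route, route[1:]):
        yield (subtree(u, v), subtree(v, u))

def is_duplication_tree(tree):
    """True if at least one rooting on the path between the end leaves is valid."""
    return any(is_podh(r) for r in rootings_between_ends(tree))
```

For instance, with four copies, `((1, 3), (2, 4))` is accepted as the result of a 2-duplication, whereas only the middle one of the three root locations of the corresponding unrooted tree yields a valid history.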
A locus containing n repeats can be obtained from any of (n - 1) 1-duplications from a locus containing (n - 1) repeats or from any of (n - 3) 2-duplications from a locus containing (n - 2) repeats, etc. Therefore, the number DH(n) of possible duplication histories for n repeats is given by the following recursive formula:
$$DH(n) = \sum_{k=1}^{\lfloor n/2 \rfloor} (n - 2k + 1)\, DH(n - k), \qquad DH(1) = 1.$$
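The counting argument above can be checked numerically. The sketch below assumes that the pattern stated in the text generalizes to (n - 2k + 1) possible k-duplications leading from (n - k) to n repeats, for k up to n/2, and that a single copy has exactly one (empty) history; the function name `dh` is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def dh(n):
    """Number of duplication histories for n repeats under the assumed recurrence."""
    if n == 1:
        return 1  # assumed base case: a single copy, no duplication yet
    # A k-duplication copies k adjacent repeats of a locus with n - k repeats,
    # which can be done at (n - k) - k + 1 = n - 2k + 1 positions.
    return sum((n - 2 * k + 1) * dh(n - k) for k in range(1, n // 2 + 1))

print([dh(n) for n in range(1, 6)])  # [1, 1, 2, 7, 32]
```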
To count the number of partially ordered duplication histories PODH(n) for n repeats, we used the PDH algorithm. For relatively small n (i.e., for n ≤ 10), we generated every possible rooted phylogeny and applied the PDH algorithm to obtain the total number of partially ordered duplication histories. For n > 10, this procedure takes too much time, and the number of partially ordered duplication histories has to be estimated. This estimate is given by $\hat{P}_{PODH}(n) \times N_{rooted}(n)$, where $\hat{P}_{PODH}(n)$ is an estimate of $P_{PODH}(n)$, the proportion of partially ordered duplication histories among the set of all possible rooted phylogenies, and $N_{rooted}(n)$ is the total number of rooted phylogenies, that is, $(2n - 3)\prod_{i=3}^{n}(2i - 5)$ (Cavalli-Sforza and Edwards 1967). To estimate $P_{PODH}(n)$, we constructed with the uniform distribution a large number of rooted phylogenies and fed them into our PDH algorithm.
We adopted the same approach to obtain the number of duplication trees DT(n). For n ≤ 10, we generated every possible unrooted tree and obtained DT(n) using the PDH algorithm. For n > 10, we estimated the number of duplication trees using $\hat{P}_{DT}(n) \times N_{unrooted}(n)$, where $\hat{P}_{DT}(n)$ is an estimate of $P_{DT}(n)$, the proportion of duplication trees among the set of all possible unrooted phylogenies, and $N_{unrooted}(n)$ is the total number of unrooted phylogenies, i.e., $\prod_{i=3}^{n}(2i - 5)$.
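The two tree counts used as scaling factors are easy to compute exactly. The following sketch simply evaluates the product formulas quoted above; the function names are hypothetical.

```python
from math import prod

def n_unrooted(n):
    """Number of unrooted binary phylogenies on n labeled leaves."""
    return prod(2 * i - 5 for i in range(3, n + 1))

def n_rooted(n):
    """Number of rooted binary phylogenies: one root per edge of an unrooted tree."""
    return (2 * n - 3) * n_unrooted(n)

print(n_unrooted(9), n_rooted(9))  # 135135 2027025
```

The factor (2n - 3) is the number of edges of an unrooted binary tree with n leaves, each of which is a possible root location.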
To construct rooted and unrooted trees with the uniform distribution, we used the classical addition scheme described in Gascuel (2000). Estimations were computed for 11 ≤ n ≤ 14 because duplication histories and duplication trees become too scarce for higher values of n (e.g., only 77 rooted phylogenies out of $5 \times 10^7$ were found to be duplication histories when n = 15). This procedure provided us with good estimates for n ≤ 14. For example, in the n = 10 experiments, where the exact numbers can be calculated, the relative deviation between PODH(n) and its estimate is equal to 0.8%, whereas between DT(n) and its estimate it is equal to 1.1%. Moreover, the standard deviations of the estimators (approximated using the Gaussian distribution assumption) are relatively low (i.e., less than 10% of the estimates) for 11 ≤ n ≤ 14 and for all estimated quantities.
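The estimator and its standard deviation can be illustrated as follows. This sketch assumes that the Gaussian approximation mentioned above is the usual normal approximation of a binomial proportion, with standard error sqrt(p(1 - p)/m) for m sampled trees; the function name `estimate_count` and its parameters are hypothetical.

```python
from math import sqrt

def estimate_count(hits, samples, total):
    """Estimate how many of `total` trees have a property, given that `hits`
    out of `samples` uniformly drawn trees have it.
    Returns (estimate, standard deviation of the estimate)."""
    p = hits / samples
    se = sqrt(p * (1 - p) / samples)  # assumed normal approximation
    return p * total, se * total

# With the n = 15 figures quoted in the text (77 hits out of 5 x 10^7 samples),
# the relative standard deviation does not depend on `total`.
estimate, sd = estimate_count(77, 5 * 10**7, 1.0)
print(round(sd / estimate, 3))  # 0.114
```

Under this approximation the relative standard deviation is close to one over the square root of the number of hits, so 77 hits give about 11%, which is consistent with stopping the estimation at n = 14.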
Table 1 provides the results of this study. It clearly shows that PODH(n) and DT(n) are much smaller than Nrooted(n) and Nunrooted(n), respectively. Linear regression on the log values suggests that the ratio between Nrooted(n) and PODH(n), the ratio between Nunrooted(n) and DT(n), and the ratio between DH(n) and PODH(n) are at least of exponential order in n. Finally, it appears that the ratio between PODH(n) and DT(n) is strictly equal to two for exact values and remains close to two with estimated values. The constancy of this ratio is surprising and indicates that better analysis of these combinatorial objects represents an interesting direction for further research.
There are two ways to reconstruct duplication trees under the parsimony criterion. The most straightforward method is to apply a phylogenetic reconstruction program, such as DNAPENNY from the PHYLIP package, to our aligned sequences and check using PDH whether each output tree is a duplication tree. This method can be efficient in practice, but it is not guaranteed to find a duplication tree because DNAPENNY does not restrict its search to duplication trees. The other method uses an algorithm that only searches the space of duplication trees. Although we cannot explore the space of all possible phylogenies for n sequences in reasonable time, it becomes possible to explore the more restricted space of all duplication trees for relatively large n. A simple approach is then to generate all possible duplication trees with the desired number of leaves and select the best ones. For this purpose, we devised the DTEXPLORE algorithm, which performs a depth-first exploration of the solution space through a simulation of the unequal recombination process, as represented in figure 7. This algorithm can be summarized as follows: we start with a tree consisting of two leaves and one root, and at each step one or several adjacent leaves are duplicated, as implied by our duplication model (fig. 2). When the desired number of leaves is reached, leaves are labeled according to their order so as to associate them with gene sequences, and the parsimony value of the resulting tree is computed. The search algorithm then backtracks, and alternative duplications are tried until the search space has been completely explored. Finally, the most parsimonious trees are output. The DTEXPLORE algorithm is shown in figure 8.
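The depth-first simulation summarized above can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of figure 8: node identifiers, the nested-tuple output, and the function name `duplication_topologies` are hypothetical, and the parsimony evaluation is left out, so the function only collects the distinct rooted topologies reached by the search.

```python
from itertools import count

def duplication_topologies(n):
    """Return the set of rooted topologies with n ordered leaves reachable by
    simulating tandem duplications, starting from a root with two leaves."""
    children = {0: (1, 2)}  # internal node id -> (left copy, right copy)
    fresh = count(3)        # ids 0, 1, and 2 are used by the initial tree
    found = set()

    def build(node, label):
        # Convert the current history into nested tuples over leaf labels.
        if node not in children:
            return label[node]
        left, right = children[node]
        return (build(left, label), build(right, label))

    def search(locus):
        if len(locus) == n:
            # Label leaves according to their order on the locus.
            label = {leaf: i + 1 for i, leaf in enumerate(locus)}
            found.add(build(0, label))
            return
        room = n - len(locus)
        for k in range(1, min(room, len(locus)) + 1):   # size of the duplicated block
            for p in range(len(locus) - k + 1):         # position of the block
                block = locus[p:p + k]
                lefts = [next(fresh) for _ in block]
                rights = [next(fresh) for _ in block]
                for node, l, r in zip(block, lefts, rights):
                    children[node] = (l, r)
                search(locus[:p] + lefts + rights + locus[p + k:])
                for node in block:
                    del children[node]                  # backtrack

    search([1, 2])
    return found

print(len(duplication_topologies(4)))  # 6
```

In a full search, the call to `found.add` would be replaced by the computation of the parsimony value of the labeled tree, keeping only the best trees seen so far.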
However, the same duplication tree can still be generated several times (the number of duplication trees for nine repeats is 5,202) because several partially ordered duplication histories (corresponding to every possible root position) are sometimes compatible with a single duplication tree. Designing a more sophisticated memorization system to eliminate redundant partially ordered duplication histories represents an interesting direction for further research. However, according to our previous results, the ratio between the number of partially ordered duplication histories and the number of duplication trees seems to remain close (or be identical) to two, and so this possible refinement would at the most divide the CPU times by a factor of two.
Results |
---|
We then applied DTEXPLORE to the multiple alignments so as to obtain the most parsimonious duplication trees explaining these sequences. With the same input, DNAPENNY, from the PHYLIP software package, was then used to compute the most parsimonious trees among all possible phylogenies, without the duplication tree restriction, using a branch-and-bound strategy (Hendy and Penny 1982). Because DNAPENNY implements Fitch parsimony and shares the traditional phylogenetic assumptions we used (such as independent evolution of lineages), the most parsimonious duplication trees and the most parsimonious phylogenies should be identical, provided our preliminary hypotheses are respected. Thus, identity (when it occurs) between the trees produced using both methods supports the hypothesis that the repeats were generated by an unequal recombination process, provided the probability for a phylogeny to be a duplication tree is small enough (which depends on the number of repeats, as we saw previously).
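For readers unfamiliar with Fitch parsimony, the following sketch computes the parsimony value of a given tree. It is a generic textbook version, not the PHYLIP code: it assumes a rooted binary tree encoded as nested 2-tuples, gap-free aligned sequences of equal length, and unit cost for every substitution. The function name `fitch_score` is hypothetical. Under these assumptions, the score does not depend on where the tree is rooted.

```python
def fitch_score(tree, seqs):
    """Minimum number of substitutions needed on `tree` to explain `seqs`.
    `tree` is a nested 2-tuple over leaf names; `seqs` maps each leaf name
    to its aligned sequence."""
    length = len(next(iter(seqs.values())))
    total = 0
    for site in range(length):
        def states(node):
            nonlocal total
            if not isinstance(node, tuple):
                return {seqs[node][site]}
            left, right = states(node[0]), states(node[1])
            if left & right:
                return left & right  # children agree: no change needed here
            total += 1               # children disagree: one substitution
            return left | right
        states(tree)
    return total

seqs = {1: "AA", 2: "AA", 3: "GC", 4: "GC"}
print(fitch_score(((1, 2), (3, 4)), seqs))  # 2
print(fitch_score(((1, 3), (2, 4)), seqs))  # 4
```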
We used the bootstrap procedure (Felsenstein 1985) to assess the reliability of our duplication trees and the robustness of the tendency of DNAPENNY to find the same trees as DTEXPLORE. As in traditional phylogenetic analysis, this involved creating pseudosamples by randomly sampling with replacement characters from the initial multiple alignment. Each time we generated a new pseudosample, we searched for the optimal duplication trees using DTEXPLORE and for the most parsimonious phylogenies using DNAPENNY. Once every pseudosample had been analyzed, we computed the bootstrap proportion of every branch in the initial duplication tree (or phylogeny). We also computed the proportion of pseudosamples where the duplication trees found by DTEXPLORE were identical, or close, to the most parsimonious phylogenies. Both these indicators gave measures of the repeatability and tolerance of the results with respect to the sampling noise. Pseudosamples for the bootstrap procedure were generated using SEQBOOT from the PHYLIP package (Felsenstein 1989), and 1,000 pseudosamples were generated for each data set. Bootstrap computations were distributed among several computers using the Parallel Virtual Machine (PVM) library (Sunderam 1990).
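The resampling step can be illustrated with a short sketch. It is not SEQBOOT; it only shows the principle of drawing alignment columns with replacement, assuming the alignment is stored as a dictionary mapping sequence names to equal-length strings. The function name `bootstrap_alignment` is hypothetical.

```python
import random

def bootstrap_alignment(seqs, rng):
    """Return a pseudosample built by drawing alignment columns with replacement.
    The same columns are drawn for every sequence, so sites stay aligned."""
    length = len(next(iter(seqs.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in seqs.items()}

alignment = {"V1": "ACGTAC", "V2": "ACGTTC", "V3": "ACCTAC"}
rng = random.Random(0)  # fixed seed so that the example is reproducible
pseudosamples = [bootstrap_alignment(alignment, rng) for _ in range(1000)]
```

Each pseudosample would then be analyzed exactly like the original alignment, and the bootstrap proportion of a branch is the fraction of pseudosamples whose optimal tree contains it.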
As a complementary analysis, we used a Bayesian approach, implemented in BAMBE (Larget and Simon 1999), to get a quantitative assessment of the support of the unequal recombination hypothesis. Given a set of nucleotide sequences and a model of substitution, BAMBE computes the posterior probabilities of a large sample of phylogenies. We then applied the PDH algorithm to find duplication trees among these phylogenies and summed their posterior probabilities to obtain the posterior probability of our duplication model. F84 (Felsenstein and Churchill 1996) was used as a substitution model. For each data set, we ran BAMBE five times with a different random initialization.
Once our duplication trees were constructed, we rooted them using both the outgroup method and the molecular clock hypothesis on functional genes. In the first method, we selected appropriate outgroup sequences, i.e., homologous sequences from other species, with the constraint that the minimum distance between our initial sequences and the outgroup sequences had to be larger than the maximum distance within our initial sequences. We then constructed a global tree containing both our sequences and the outgroup sequences. Because our initial sequences and the outgroup sequences were relatively divergent, we computed a distance matrix from our data using the F84 model of substitution and used BIONJ (Gascuel 1997) to construct the global tree. Using a distance approach allows us to limit the long-branch attraction phenomenon, to which parsimony methods are very sensitive (Felsenstein 1978). Also, because the duplicated genes share the same environment, it is reasonable to think that at least the functional genes follow a form of molecular clock in which the total mutation rate from root to leaves is relatively constant. To root a duplication tree using this hypothesis, we constructed a tree from the initial data using BIONJ and used the branch lengths between functional genes to locate the minimum variance point.
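The outgroup selection constraint described above amounts to a simple check on the distance matrix. The sketch below assumes that pairwise distances (for example, F84 distances) are already available as a dictionary of dictionaries; the function name `outgroup_is_admissible` and the toy values are hypothetical.

```python
def outgroup_is_admissible(dist, ingroup, outgroup):
    """True if every ingroup-to-outgroup distance exceeds every distance
    observed within the ingroup."""
    within = max(dist[a][b] for a in ingroup for b in ingroup if a != b)
    between = min(dist[a][o] for a in ingroup for o in outgroup)
    return between > within

# Toy distances, for illustration only.
dist = {
    "V1": {"V2": 0.10, "V3": 0.15, "out": 0.40},
    "V2": {"V1": 0.10, "V3": 0.12, "out": 0.35},
    "V3": {"V1": 0.15, "V2": 0.12, "out": 0.38},
}
print(outgroup_is_admissible(dist, ["V1", "V2", "V3"], ["out"]))  # True
```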
The TRGV Locus
Our first data set stems from the human TRGV locus (Lefranc, Forster, and Rabbitts 1986; Lefranc et al. 1986), which corresponds to the variable region of the gamma T-cell receptor. It contains nine repeated genes; eight of them are 4.55 kb long, and the last one is slightly shorter (3 kb). These nine genes are named V1, V2, V3, V4, V5, V5P, V6, V7, and V8. Three of them are pseudogenes (V5P, V6, and V7), and one is an ORF (V1). The whole TRG locus was recently fully sequenced (accession number AF057177). We kept the DNA stretch starting 500 nucleotides upstream and finishing 500 nucleotides downstream of the coding sequences. After multiple alignment and gap removal, our sequences were 1,318 bp long.
The TRGV locus was shown to be polymorphic in French, Lebanese, Tunisian, Black-African, and Chinese populations (Ghanem et al. 1989). The most striking polymorphism is the simultaneous absence of the V4 and V5 sequences in some individuals from these populations. Another polymorphism stems from the insertion of an additional copy called V3P between V3 and V4. Unfortunately, V3P has not been sequenced so far.
DTEXPLORE evaluated all possible duplication trees with nine repeats and finally came up with the single most parsimonious one shown in figure 9a. DNAPENNY came up with the same tree topology as DTEXPLORE. Considering that only 3.5% of phylogenies are also duplication trees for nine leaves, the identity between the most parsimonious phylogeny and the most parsimonious duplication tree strongly supports our assumptions concerning the biological model by which repeats are generated. The bootstrap analysis for this data set showed that most of the internal branches of the duplication tree are strongly supported. Similar bootstrap proportions (using the same pseudosamples) were found to support the branches of the most parsimonious phylogeny. The bootstrap analysis also showed that DNAPENNY came up with the same trees as those found by DTEXPLORE for 86.8% of the pseudosamples and with nearly identical trees (at most one branch of difference) for 92.7% of them. In addition, the mean BAMBE results on this data set indicated that our duplication model is supported by a posterior probability of 0.977.
This partially ordered duplication history clearly indicates that segments V2-V3 and V4-V5 arose from a recent 2-duplication event and therefore respects the polymorphism that occurs in the TRGV locus. Although we cannot rule out that the missing segments could be explained by a deletion event, this strongly suggests that the 2-duplication simply did not occur in some populations. This agreement between our duplication tree and the polymorphism data provides further support for both our assumptions concerning the biological mechanism that produces the repeats and our reconstruction method.
The IGLC Locus
Our second data set stems from the IGLC locus (Hieter et al. 1981; Dariavach, Lefranc, and Lefranc 1987; Vasicek and Leder 1990), which corresponds to the constant region of the light chain of the Ig structure. It contains seven tandemly repeated genes (C1, C2, C3, C4, C5, C6, and C7), whose accession numbers are, respectively, J00252, J00253, J00254, J03009, J03010, J03011, and X51175, and of which three are pseudogenes (C4, C5, and C6). Because the whole locus has not yet been entirely sequenced, we used the V-REGIONS (in the IMGT standardized notation) of these sequences to construct our multiple alignment. After alignment and removal of positions with gaps, each DNA sequence was 285 bp long.
DTEXPLORE evaluated all possible duplication trees with seven repeats and finally returned the single most parsimonious duplication tree shown in figure 10a. With the same data, DNAPENNY came up with four equally parsimonious phylogenies. One of them is identical to the duplication tree we found during the exhaustive search, whereas the remaining ones (figs. 10b-d) are not duplication trees. Because the IGLC locus only contains seven genes and DNAPENNY found four equally parsimonious phylogenies, the probability of one of these phylogenies also being a duplication tree is relatively high (approximately equal to 0.6). Therefore, these results are in agreement with our assumptions, but they cannot be considered as strong support for our approach. The bootstrap analysis indicated that duplications C2-C3 and C4-C5 are strongly supported, whereas other internal branches are less supported (fig. 10a). Similar bootstrap proportions were found for the most parsimonious phylogenies. The bootstrap procedure also showed that for 60.3% of the pseudosamples, the phylogenies produced by DNAPENNY include the duplication trees found during the exhaustive search, and for 85.7% of the pseudosamples, every inferred duplication tree has at most one branch of difference with one of the most parsimonious phylogenies. Finally, the mean BAMBE results indicated that our duplication model is supported by a posterior probability of 0.958 on this data set.
Discussion |
---|
A duplication history which only contains 1-duplications is analogous to a binary search tree, which is a classical object in computer science. It is easily shown that if we delete a rooted subtree from a binary search tree, the resulting tree is still a binary search tree. Therefore, if a deletion occurs during the evolution of a set of tandemly repeated genes which only undergo 1-duplications, the duplication history model is still valid. This means that our duplication model is (relatively) robust to deletions, provided the duplication history only (mostly) contains 1-duplication events. However, this important property does not always hold when the duplication history contains n-duplications with n > 1, and more work would be needed to characterize the effects of deletions on our current duplication model.
Currently, we have two different ways to reconstruct duplication trees. The first one combines existing programs, such as DNAPENNY or DNAPARS, with our PDH algorithm to select the possible duplication trees among the optimal phylogenies. Its drawback is that it is not guaranteed to find a duplication tree because there may be some cases where none of the optimal phylogenies are duplication trees. In the second one, we use DTEXPLORE to perform an exhaustive search of the duplication tree space; this is a guaranteed but inefficient way to find the optimal duplication trees. We need to increase the speed of this reconstruction procedure, so as to be able to tackle larger loci containing higher numbers of repeats. A refined solution would be to include the duplication tree constraint into the search procedure and to use optimal (e.g., branch-and-bound) or heuristic techniques to explore the restricted solution space.
We also tried to reconstruct the duplication history of the 11 repeats of the UbiA polyubiquitin locus (Graham, Jones, and Candido 1989) in Caenorhabditis elegans. Unfortunately, the five most parsimonious duplication trees (184 parsimony steps) found by DTEXPLORE were different from the unique most parsimonious phylogeny found by DNAPENNY (178 parsimony steps). This indicates that our model of evolution by tandem duplication needs to be refined in some cases by introducing other mechanisms, such as deletions or gene conversions.
Acknowledgements |
---|
Footnotes |
---|
Address for correspondence and reprints: Olivier Gascuel, Département d'Informatique Fondamentale et Applications, LIRMM, 161 rue Ada, 34392 Montpellier, France. gascuel@lirmm.fr.
References |
---|
Aho A., J. Hopcroft, J. Ullman, 1974 The design and analysis of computer algorithms Addison Wesley, Reading, Mass
Alberts B., D. Bray, J. Lewis, M. Raff, K. Roberts, J. Watson, 1995 Molecular biology of the cell. 3rd edition Garland Publishing Inc., New York
Atallah M., ed 1999 Algorithms and theory of computation handbook CRC Press LLC, Boca Raton, Fla
Benson G., L. Dong, 1999 Reconstructing the duplication history of a tandem repeat Pp. 44-53 in Proceedings of the Intelligent Systems in Molecular Biology ISMB'99
Cavalli-Sforza L., A. Edwards, 1967 Phylogenetic analysis: models and estimation procedure Evolution 21:550-570
Collins F., S. Weissman, 1984 The molecular genetics of human hemoglobin Prog. Nucleic Acids Res. Mol. Biol 31:315-462
Corbett S., I. Tomlinson, E. Sonnhammer, D. Buck, G. Winter, 1997 Sequence of the human immunoglobulin diversity (D) segment locus: a systematic analysis provides no evidence for the use of DIR segments, inverted D segments, "minor" D segments or D-D recombination J. Mol. Biol 270:587-597
Dariavach P., G. Lefranc, M. Lefranc, 1987 Human immunoglobulin C lambda 6 gene encodes the KernOz-lambda chain and C lambda 4 and C lambda 5 are pseudogenes Proc. Natl. Acad. Sci. USA 84:9074-9078
El-Mabrouk N., 2000 Duplication, rearrangement and reconciliation Pp. 537-550 in D. Sankoff and J. H. Nadeau, eds. Comparative genomics. Kluwer Academic Publishers, Dordrecht
Felsenstein J., 1978 Cases in which parsimony or compatibility methods will be positively misleading Syst. Zool 27:401-410
Felsenstein J., 1985 Confidence limits on phylogenies: an approach using the bootstrap Evolution 39:783-791
Felsenstein J., 1989 PHYLIP - Phylogeny Inference Package Cladistics 5:164-166
Felsenstein J., G. Churchill, 1996 A hidden Markov model approach to variation among sites in rate of evolution Mol. Biol. Evol 13:93-104
Fitch W., 1971 Toward defining the course of evolution: minimum change for a specified tree topology Syst. Zool 20:406-416
Gascuel O., 1997 BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data Mol. Biol. Evol 14:685-695
Gascuel O., 2000 Evidence of a relationship between algorithmic scheme and shape of inferred trees Pp. 157-168 in W. Gaul, O. Opitz, and M. Schader, eds. Data analysis. Scientific modeling and practical applications, Springer-Verlag, Berlin
Ghanem N., C. Buresi, J. Moisan, M. Bensmana, P. Chuchana, S. Huck, G. Lefranc, M. Lefranc, 1989 Deletion, insertion, and restriction site polymorphism of the T-cell receptor gamma variable locus in French, Lebanese, Tunisian, and Black African populations Immunogenetics 30:350-360
Graham R., D. Jones, E. Candido, 1989 UbiA, the major polyubiquitin locus in Caenorhabditis elegans, has unusual structural features and is constitutively expressed Mol. Cell. Biol 9:268-277
Gumucio D., K. Wiebauer, R. Caldwell, L. Samuelson, M. Meisler, 1988 Concerted evolution of human amylase genes Mol. Cell. Biol 8:1197-1205
Hendy M., D. Penny, 1982 Branch and bound algorithms to determine minimal evolutionary trees Math. Biosci 59:277-290
Hieter P., G. Hollis, S. Korsmeyer, T. Waldmann, P. Leder, 1981 Clustered arrangement of immunoglobulin lambda constant region genes in man Nature 294:536-540
Honjo T., F. Alt, eds 1995 Immunoglobulin genes Academic Press, London
Hordvik I., J. Thevarajan, I. Samdal, N. Bastani, B. Krossoy, 1999 Molecular cloning and phylogenetic analysis of the Atlantic salmon immunoglobulin D gene Scand. J. Immunol 2:202-210
Jeffreys A., S. Harris, 1981 Processes of gene duplication Nature 296:9-10
Larget B., D. Simon, 1999 Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees Mol. Biol. Evol 16:750-759
Lefranc M., A. Forster, R. Baer, M. Stinson, T. Rabbitts, 1986 Diversity and rearrangement of the human T-cell rearranging genes: nine germ-line variable genes belonging to two subgroups Cell 45:237-246
Lefranc M., A. Forster, T. Rabbitts, 1986 Rearrangement of two distinct T-cell gamma-chain-variable-region genes in human DNA Nature 319:420-422
Li W., 1997 Molecular evolution Sinauer Inc., Sunderland, Mass
Ohno S., 1970 Evolution by gene duplication Springer-Verlag, New York
Ruddle F., J. Bartels, K. Bentley, C. Kappen, M. Murtha, J. Pendleton, 1994 Evolution of Hox genes Annu. Rev. Genet 28:423-442
Ruiz M., V. Giudicelli, C. Ginestoux, et al. (12 co-authors) 2000 IMGT, the international ImMunoGeneTics database Nucleic Acids Res 28:219-221
Sankoff D., M. Blanchette, 1999 Phylogenetic invariants for genome rearrangements J. Comput. Biol 3:431-445
Shen S., J. Slightom, O. Smithies, 1981 A history of the human fetal globin gene duplication Cell 26:191-203
Smith G., 1976 Evolution of repeated DNA sequences by unequal crossover Science 191:528-535
Sunderam V., 1990 PVM: a framework for parallel distributed computing Concurrency: Pract. Experience 2:315-339
Swofford D., G. Olsen, P. Waddell, D. Hillis, 1996 Phylogenetic inference Pp. 407-514 in D. M. Hillis, C. Moritz, and B. K. Mable, eds. Molecular systematics. Sinauer Associates, Sunderland, Mass
Thompson J., D. Higgins, T. Gibson, 1994 CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680
Vasicek T., P. Leder, 1990 Structure and expression of the human immunoglobulin lambda genes J. Exp. Med 172:609-620
Vijverberg K., K. Bachmann, 1999 Molecular evolution of a tandemly repeated trnF(GAA) gene in the chloroplast genomes of Microseris (Asteraceae) and the use of structural mutations in phylogenetic analyses Mol. Biol. Evol 16:1329-1340