Study of Intrachromosomal Duplications Among the Eukaryote Genomes

Guillaume Achaz, Pierre Netter and Eric Coissac

Structure et Dynamique des Génomes, Institut Jacques Monod, Paris, France


    Abstract
 
Complete eukaryote chromosomes were investigated for intrachromosomal duplications of nucleotide sequences. The analysis was performed by looking for nonexact repeats in two complete genomes, Saccharomyces cerevisiae and Caenorhabditis elegans, and four partial ones, Drosophila melanogaster, Plasmodium falciparum, Arabidopsis thaliana, and Homo sapiens. Through this analysis, we show that all eukaryote chromosomes exhibit similar characteristics for their intrachromosomal repeats, suggesting similar dynamics: many direct repeats have their two copies physically close together, and these close direct repeats are more similar and shorter than the other repeats. In contrast, there are almost no close inverted repeats. These results support a model for the dynamics of duplication. This model is based on a continuous genesis of tandem repeats and implies that most of the distant and inverted repeats originate from these tandem repeats by further chromosomal rearrangements (insertions, inversions, and deletions). Remnants of these predicted rearrangements were identified through fine analysis of the chromosome sequences. Despite these dynamics, shared by all eukaryotes, each genome exhibits its own style of intrachromosomal duplication: the density of repeated elements is similar in all chromosomes of the same genome but differs between species. This density was further related to the relative rates of duplication, deletion, and mutation specific to each species. Notably, the density of repeats in the X chromosome of C. elegans is much lower than in the autosomes of that organism, suggesting that exchange between homologous chromosomes is important in the duplication process.


    Introduction
 
All eukaryote genomes exhibit similar physical structures and constraints (e.g., linear chromosomes, scaffold attachment, nucleosome organization). However, many characteristics highlight important differences between them: (1) coding sequences represent 72% of the Saccharomyces cerevisiae genome (Goffeau et al. 1996) and only 2%–5% of the Homo sapiens genome (Dunham et al. 1999), (2) a centimorgan corresponds to kilobases in S. cerevisiae (Baudat and Nicolas 1997) and to megabases in humans (Dunham et al. 1999), (3) the number of introns per gene and the density of transposons increase with genome size, and (4) isochore organization has been ascribed mostly to vertebrate genomes (Bernardi 2000). Despite these differences, one would expect to find remnants of a similar nuclear organization in genome sequences. Events of DNA duplication have been described in many eukaryote genomes, but are the duplication dynamics similar in all eukaryotes?

Within duplication events, four main subprocesses have been documented: abnormal segregation during cell division (leading to the duplication of entire chromosomes, i.e., hyperploidization, and sometimes to whole-genome doubling, i.e., polyploidization), transposition (duplication of transposable elements), expansion of low-complexity sequences (microsatellites and minisatellites), and finally generic duplications of nonspecific DNA regions within the same chromosome or between two chromosomes. We shall henceforth refer to this last subprocess as the iteration process. Polyploidization events were proposed to explain the large-scale duplications at the origin of vertebrates (Ohno 1970), in many angiosperms (Masterson 1994), including Arabidopsis thaliana (Blanc et al. 2000), in the fish lineage (Amores et al. 1998), and in the yeast S. cerevisiae (Wolfe and Shields 1997). However, it is not clear whether these large-scale duplications are always the result of polyploidization, successive hyperploidizations, or bursts of large iterations (Holland 1999; Llorente et al. 2000; Vision, Brown, and Tanksley 2000; Hughes, Da Silva, and Friedman 2001; Robinson-Rechavi et al. 2001).

In order to investigate the iteration process, we focused our attention on intrachromosomal repeats in the chromosome sequences. Two complete genomes, S. cerevisiae and Caenorhabditis elegans, and four partial ones, H. sapiens, Drosophila melanogaster, A. thaliana, and Plasmodium falciparum, were analyzed. The genome of S. cerevisiae was already investigated for its repeats in a previous study (Achaz et al. 2000), in which we proposed a model for the dynamics of the iteration process based on a continuous genesis of close direct repeats (CDRs). A CDR is defined here as a repeat with its copies in the same orientation and with a physical distance between them (the spacer) smaller than 1 kb. The model supposes that most of the intrachromosomal repeats originate from these CDRs, the others being the result of further chromosomal rearrangements. In the present study, the model established in yeast was tested on new eukaryote chromosomes. We focused on the differences between genomes and tried to connect them to the genome context. Because our model supposes that most of the intrachromosomal repeats originate from tandem repeats, the chromosome sequences were also searched for remnants of the chromosomal rearrangements. Hence, we view repeats as markers of genome dynamics.


    Materials and Methods
 
Data
We analyzed the complete eukaryote genomes of S. cerevisiae (16 chromosomes; Goffeau et al. 1996) and C. elegans (six chromosomes; Consortium 1998), and four partial genomes: H. sapiens, chromosomes 21 (Hattori et al. 2000) and 22 (Dunham et al. 1999); P. falciparum, chromosomes 2 (Gardner et al. 1998) and 3 (Bowman et al. 1999); A. thaliana, chromosomes 2 (Lin et al. 1999) and 4 (Mayer et al. 1999); and D. melanogaster, six chromosomal arms (X, 2L, 2R, 3L, 3R, 4; Adams et al. 2000).

Sequences of H. sapiens, C. elegans, P. falciparum, and A. thaliana were extracted from GenBank (ftp://ncbi.nlm.nih.gov/genbank/genomes). The S. cerevisiae chromosomes were extracted from the Saccharomyces Genome Database (http://genome-www.stanford.edu/Saccharomyces). Sequences of D. melanogaster were downloaded from the Celera database (http://www.celera.com).

Most sequences contain many gaps (stretches of N). For example, in chromosome 1 of C. elegans, 8.8% of the base pairs are N, and 29 gaps are longer than 10 kb. These stretches were not taken into account during the construction of the repeat database.

Construction of the Repeats Database
Our approach to repeat detection, like most of the heuristics already proposed (Leung et al. 1991; Vincens et al. 1998), first looks for seeds (exact repeats) and then extends them with a local alignment program. The detailed methodology is described below in three main steps: searching, filtering, and extending.

First Step: Searching for Seeds
In this step, exact repeats (seeds) were detected by using the REPuter software (Kurtz and Schleiermacher 1999). This software detects all seeds (direct and inverted) in a given sequence, whatever the distance between the two copies on the chromosome. As we are interested in unusually large seeds, the minimum length of seeds (Lmin) was calculated using the statistics developed by Karlin and Ost (1985). For each chromosome, we chose Lmin such that the probability of finding a two-copy word of at least this length in a random sequence of the same size and the same nucleotide composition is 0.001. Typically, Lmin ranges from 21 for the smallest chromosome (chromosome 1 of S. cerevisiae) to 28 for the largest ones (chromosomes 21 and 22 of H. sapiens).
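The choice of Lmin can be illustrated with a rough calculation. The sketch below is not the Karlin and Ost (1985) statistic itself. It assumes a simpler first-order approximation: in an independent random sequence of length n, the expected number of position pairs sharing an exact word of at least L bases is about n(n-1)/2 × p^L, where p is the probability that two random bases are identical. It doubles that count to allow for inverted seeds (which assumes a strand-symmetric composition) and converts it to a probability with a Poisson approximation. All function names and the example base frequencies are hypothetical, and the values it returns need not match those quoted above.

```python
import math

def match_probability(freqs):
    """Probability that two independently drawn bases are identical."""
    return sum(f * f for f in freqs.values())

def expected_repeat_pairs(n, p, length, both_orientations=True):
    """Approximate expected number of position pairs sharing an exact word
    of at least `length` bases in a random sequence of n bases."""
    pairs = n * (n - 1) / 2
    factor = 2 if both_orientations else 1  # direct + inverted (assumption)
    return factor * pairs * p ** length

def minimal_seed_length(n, freqs, alpha=0.001):
    """Smallest length whose chance-repeat probability is <= alpha
    (Poisson approximation, not the Karlin-Ost formula)."""
    p = match_probability(freqs)
    if not 0 < p < 1:
        raise ValueError("base frequencies must describe at least two bases")
    length = 1
    while 1 - math.exp(-expected_repeat_pairs(n, p, length)) > alpha:
        length += 1
    return length

# Hypothetical AT-rich composition, for illustration only.
example_freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
lmin_small = minimal_seed_length(230_000, example_freqs)
lmin_large = minimal_seed_length(35_000_000, example_freqs)
```

As expected from the text, the threshold grows with chromosome size: a longer sequence offers more pairs of positions, so a longer word is needed before a chance repeat becomes improbable.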

Second Step: Filtering the Seeds
First, to remove all low-complexity seeds (e.g., microsatellites or poly-A stretches), we used an entropy filter based on dinucleotide composition (Achaz et al. 2000). Second, all multicopy seeds were removed. A chromosome map in which each position is linked to its n-plication degree (duplication, triplication, etc.) was established. To build this map, we counted for each chromosome position the number of times this position is found in seeds (direct and inverted seeds were pooled together). This map is used to estimate the degree of redundancy of chromosomes (i.e., the number of duplications, triplications, etc.). Table 1 presents, for each species, the mean size of the chromosomes, the percentage of chromosomes included in two-copy seeds, and the percentage of chromosomes represented by all the seeds. As we are only interested here in two-copy seeds, we used the map to remove all seeds in which one of the positions is included in a multicopy repeat.
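The two filters can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the entropy function uses the Shannon entropy of overlapping dinucleotides (the exact formula and cutoff of the published filter are not given here), the seed representation `(start1, start2, length)` is hypothetical, and the n-plication map is approximated by counting how many seed copies cover each position, so that any position covered more than once is treated as belonging to a multicopy repeat.

```python
import math
from collections import Counter

def dinucleotide_entropy(seq):
    """Shannon entropy (bits) of the overlapping dinucleotide composition."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return 0.0
    total = len(pairs)
    counts = Counter(pairs)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def coverage_map(chrom_length, seeds):
    """Count, for every position, how many seed copies cover it.
    Each seed is (start1, start2, length) with 0-based starts."""
    cover = [0] * chrom_length
    for start1, start2, length in seeds:
        for start in (start1, start2):
            for pos in range(start, start + length):
                cover[pos] += 1
    return cover

def keep_two_copy_seeds(chrom_length, seeds):
    """Keep only seeds whose positions are all covered exactly once."""
    cover = coverage_map(chrom_length, seeds)
    kept = []
    for start1, start2, length in seeds:
        positions = list(range(start1, start1 + length))
        positions += list(range(start2, start2 + length))
        if all(cover[p] == 1 for p in positions):
            kept.append((start1, start2, length))
    return kept

# A triplicated word yields three pairwise seeds; a true duplication yields one.
toy_seeds = [(0, 100, 20), (0, 200, 20), (100, 200, 20), (300, 400, 20)]
two_copy = keep_two_copy_seeds(500, toy_seeds)
```

In the toy example, the three seeds produced by the triplicated word cover each of their positions twice and are discarded, whereas the isolated duplication is retained.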


Table 1 Estimation of the Intrachromosomal Redundancy of Each Genome

 
Third Step: Extending the Seeds
Seeds were extended into larger nonstrict repeats by using a local alignment program (Smith and Waterman 1981) available at http://www-hto.usc.edu/software/seqaln. Many seeds may give rise to the same extended repeat; therefore, when two or more repeats occurred at the same location, we kept only the first one. Before the alignment was performed, 100 bp were added on both sides of each seed copy. Thus, for a given seed of size N, the first alignment was computed with two sequences of 2 × 100 + N bp. The following scoring scheme, which was built empirically, was retained for local alignments: match (A/T/C/G) = 4, mismatch = -4, gap opening = -16, and gap extension = -4. When the best local alignment ended less than 10 bp from a terminus, 200 bp were added at the termini, and a new run was done. Because aligning large sequences requires too many computer resources, we devised the following heuristic: if the alignment size exceeded 1 kb, the partial alignment was stored, and only the rest of the alignment was computed in a new run. The process continued until both sides of the complete alignment ended more than 10 bp from the termini. This procedure yields a nonoptimal alignment but allows very large repeats to be extended. Finally, we removed all repeats in which the copies overlap because they generally correspond, at this stage, to three-copy repeats.
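To make the scoring scheme concrete, the sketch below computes the best local alignment score of two sequences with affine gap penalties, using the parameters quoted above. It is not the seqaln program used in the study; it is a plain dynamic-programming illustration, and it assumes that the first base of a gap costs the gap-opening penalty and each further base the gap-extension penalty (the convention of the original program is not stated). The function name and the toy sequences are hypothetical.

```python
def local_align_score(a, b, match=4, mismatch=-4, gap_open=-16, gap_ext=-4):
    """Best local alignment score with affine gaps.

    Returns (score, end_in_a, end_in_b), where the ends are 1-based
    positions of the last aligned bases. Assumes a gap of k bases
    costs gap_open + (k - 1) * gap_ext.
    """
    neg_inf = float("-inf")
    h_prev = [0] * (len(b) + 1)        # best score ending at (i-1, j)
    f_prev = [neg_inf] * (len(b) + 1)  # best score ending with a gap in b
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        h_cur = [0] * (len(b) + 1)
        f_cur = [neg_inf] * (len(b) + 1)
        e = neg_inf                    # best score ending with a gap in a
        for j in range(1, len(b) + 1):
            e = max(h_cur[j - 1] + gap_open, e + gap_ext)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_ext)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h = max(0, h_prev[j - 1] + s, e, f_cur[j])
            h_cur[j] = h
            if h > best[0]:
                best = (h, i, j)
        h_prev, f_prev = h_cur, f_cur
    return best

left, right = "ACGTTGCAAC", "GGATCCATGG"
copy1 = left + right          # 20 bp
copy2 = left + "T" + right    # same sequence with a 1-bp insertion
result = local_align_score(copy1, copy2)
```

With these two toy copies, the 20 matching bases contribute 80 and the single-base gap costs 16. The returned end coordinates are what the heuristic described above would compare with the sequence termini to decide whether more flanking sequence is needed.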

The methodology was similar to the one previously used in the S. cerevisiae analysis (Achaz et al. 2000) but was modified in order to analyze in the same way the chromosomes of yeast (<1.5 Mb) and man (35 Mb). The major modifications were applied to reduce the number of seeds and to keep only sensu stricto duplicated seeds (present in only two copies) for the alignment process.


    Results and Discussion
 
The application of the methodology described above yields for direct and inverted repeats, respectively: 110 and 75 for S. cerevisiae, 136 and 48 for P. falciparum, 2,407 and 1,068 for A. thaliana, 6,885 and 1,845 for C. elegans, 1,479 and 691 for D. melanogaster, and 3,457 and 2,406 for H. sapiens.

Genome Style and History of Chromosomes
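In order to analyze the relationship between chromosome size and redundancy level, we measured two parameters, DN (density in number) and DL (density in length), defined for each chromosome as follows:

$$D_N = \frac{\text{number of repeats}}{\text{chromosome length}}, \qquad D_L = \frac{\text{sum of repeat lengths}}{\text{chromosome length}}$$
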
As predicted by the estimated redundancy of the genome in Table 1, if we exclude the Drosophila chromosomal arms (which are clearly underrepeated for their size), DN and DL are positively correlated with the chromosome size, using a Kendall tau-rank test (τ = 0.30, P < 0.05 for DN and τ = 0.40, P < 0.01 for DL). These observations are in agreement with an analysis of gene redundancy undertaken on partial genome sequences (Coissac, Maillier, and Netter 1997).
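The two densities and the rank correlation can be computed as in the sketch below. It follows the definitions given in the caption of figure 1 (number of repeats, or summed repeat length, divided by chromosome length). The correlation is the simple tau-a form of Kendall's coefficient, in which tied pairs count as neither concordant nor discordant; which variant was used in the study is not stated, so this choice is an assumption, as are the function names and the toy values.

```python
from itertools import combinations

def densities(repeat_lengths, chrom_length):
    """Return (DN, DL): repeat count and summed repeat length per base pair."""
    dn = len(repeat_lengths) / chrom_length
    dl = sum(repeat_lengths) / chrom_length
    return dn, dl

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal size >= 2")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        score += (product > 0) - (product < 0)
    n = len(x)
    return score / (n * (n - 1) / 2)

# Toy values, for illustration only (not data from the study).
toy_sizes = [1, 2, 3, 4]
toy_dn = [10, 20, 30, 40]
tau_up = kendall_tau(toy_sizes, toy_dn)
tau_down = kendall_tau(toy_sizes, toy_dn[::-1])
```

A coefficient of 1 means that every pair of chromosomes is ranked the same way by size and by density, and -1 means the ranking is fully reversed; the values reported above (0.30 and 0.40) indicate a moderate positive association.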

Two hypotheses can be proposed to explain the low densities of the chromosomal arms of D. melanogaster. The first one is a data bias: the analyzed sequences consist exclusively of euchromatin (only around two-thirds of the complete genome), and it is known that repeats are concentrated inside heterochromatin (Henikoff 2000). Moreover, assembly errors could lead to artificially deleted tandem repeats. The second hypothesis rests on biological grounds. One can imagine that the Drosophila genome has a special status in the duplication process (because there is no meiotic crossing-over in the male, the duplication process may be less active). The completion of the D. melanogaster sequence should settle this question.

In order to investigate each chromosome more precisely, we analyzed DN and DL for direct and inverted repeats (fig. 1). It appears that DN is similar for all chromosomes within the same species, whereas DL is not. Thus, DN could define the style of redundancy of the genome. We assume that DN results from the iteration events combined with the loss of duplicated sequences, and we therefore propose that DN is connected to the biological machinery of each species. Because the machinery is clearly different for each species, but similar for all chromosomes within the same genome, DN should be the consequence of each genome's dynamics. Furthermore, the differences between species come essentially from direct repeats, and less from inverted repeats. This suggests that the biological machinery is more connected to the creation and the loss of direct repeats than to the dynamics of inverted repeats.



Fig. 1.—Occurrence and density of direct and inverted repeats in each chromosome. For each chromosome of each species, inverted repeats were compared to direct repeats. (a) Plot of DN (density in number: number divided by chromosome length) of inverted repeats as a function of DN of direct repeats. Chromosomes of the same species are grouped in gray areas, with the exception of the X chromosome of C. elegans and the fourth chromosomal arm of D. melanogaster, both indicated by black arrows. (b) Plot of DL (density in length: sum of repeats length divided by chromosome length) of inverted repeats as a function of DL of direct repeats. Each species is represented by a different symbol given just below the plot

 
The only two exceptions are the fourth chromosomal arm of D. melanogaster and the X chromosome of C. elegans. The high density of the small fourth chromosomal arm of D. melanogaster could be the result of its particular structure (if there is no data bias): it consists mostly of heterochromatin but, contrary to centromeric chromatin (or the Y chromosome), it is partially visible in polytene chromosomes. In contrast, the X chromosome of C. elegans exhibits a lower DN than the other chromosomes of the worm. This observation is in good agreement with the unequal distribution of repetitive elements, such as CeRep23 (Barnes et al. 1995), Cele1, Cele2, and Cele42 (Surzycki and Belknap 2000), between the autosomes and the X chromosome in C. elegans. Exchanges between homologous X chromosomes are only possible in XX hermaphrodites (males are X0), which could explain this lower DN. If this is true for C. elegans, one may expect it to be true for all heterochromosomes. The X chromosomal arms of D. melanogaster seem similar to the other chromosomal arms; however, none of the Drosophila male chromosomes is subject to meiotic crossing-over.

Contrary to DN, DL may reflect the history of the chromosome rather than the effects of the cellular machinery: a single iteration event can lead to a high DL for direct or inverted repeats. For example, direct repeats of chromosome 1 of C. elegans exhibit a high DL and a normal DN (when compared with the values of the other C. elegans chromosomes). This particularity is mainly caused by two large duplicated sequences, one 250 kb long (with an identity of 98.7%) and the other 600 kb long (split into several segments of high identity, often more than 99%). Furthermore, the inverted repeats of chromosome 1 of S. cerevisiae show a high DL and a normal DN, as a consequence of two internal regions inversely repeated in subtelomeres (Britten 1998).

A Model of Dynamics of Iteration
Our model of intrachromosomal iteration (Achaz et al. 2000) is based on a permanent genesis of CDRs. The CDRs are then subject to a high level of exchange (conversion and deletion). This high exchange rate tends both to keep the two copies identical (conversion) and to eliminate them (deletion). At each round of exchange, both events are possible, but whereas conversion may still be followed by deletion, a deletion event cannot be followed by conversion.

Therefore, on a long timescale, a bias in favor of deletion should be observed. A CDR has to disappear sooner or later (depending on the relative rates of conversion and deletion). However, there are two situations where a repeat would be maintained: when it is protected from deletions by functional pressures (i.e., located inside a gene) or when the copies are spaced by further chromosomal rearrangements. This model was mainly based on three observations for CDR: (1) they are overrepresented, (2) they are mostly located inside the same gene, and (3) their length is positively correlated with the spacer (the physical distance between copies), and their identity is negatively correlated with it.
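The long-term bias toward deletion can be made explicit with a deliberately simplified calculation. Assume, purely for illustration, that each round of exchange ends in deletion with a constant probability d > 0 and in conversion otherwise, independently of earlier rounds. The symbols d and k and the constant-rate assumption are introduced here and are not part of the original model description.

```latex
P(\text{CDR survives } k \text{ rounds}) = (1 - d)^{k},
\qquad
\lim_{k \to \infty} (1 - d)^{k} = 0,
\qquad
E[\text{number of rounds until deletion}] = \frac{1}{d}.
```

Under these assumptions, an unprotected CDR is eventually deleted however small d is, and the relative rates of conversion and deletion only set how long it persists.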

Through the present analysis, the model was tested with other eukaryotes. A model of tandem creation and further dispersion was already invoked for two gene families (Hox and NBG) in C. elegans (Ruvkun and Hobert 1998). Because the annotations of the eukaryote chromosomes are still partial, they were not taken into account; thus, we did not analyze the relation between the positions of repeats and the locations of genes.

CDRs Are Overrepresented
The distributions of spacer size for direct and inverted repeats (fig. 2) reveal that CDRs are overrepresented as compared with close inverted repeats. Moreover, in the previous study (Achaz et al. 2000), the repeats of S. cerevisiae were compared with the repeats obtained from random chromosomes. From this comparison, we showed that such close repeats (inverted or direct) are absent from random chromosomes. This strongly suggests that these CDRs are not the result of chance. The presence of many CDRs in all chromosomes is in good agreement with the model.
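The classification of repeats and the binning used for the spacer distributions can be sketched as follows. The 1-kb threshold and the log10 bins of width 0.5 come from the text and from the caption of figure 2; the coordinate convention (0-based start of each copy), the function names, and the handling of perfectly tandem copies (a spacer of 0 bp is placed in the first bin) are assumptions made for this illustration.

```python
import math
from collections import Counter

CDR_MAX_SPACER = 1000  # bp, the 1-kb threshold used to define a CDR

def spacer(start1, length1, start2):
    """Distance in bp between the end of copy 1 and the start of copy 2."""
    return start2 - (start1 + length1)

def is_cdr(orientation, spacer_bp):
    """A close direct repeat: same orientation and spacer below 1 kb."""
    return orientation == "direct" and spacer_bp < CDR_MAX_SPACER

def spacer_histogram(spacers, step=0.5):
    """Count spacers per log10 bin; keys are the lower edges of the bins."""
    bins = Counter()
    for s in spacers:
        value = math.log10(max(s, 1))  # assumption: 0-bp spacers go in bin 0
        bins[math.floor(value / step) * step] += 1
    return dict(bins)

# Toy spacers in bp, for illustration only.
toy_histogram = spacer_histogram([0, 1, 5, 500, 20000])
```

Counting direct and inverted repeats separately with `spacer_histogram` gives the two series compared in each panel of figure 2.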



Fig. 2.—Distribution of spacers for each orientation (direct and inverted). Each histogram shows the distribution of spacers (the distance between the two copies) for direct and inverted repeats. Black boxes represent the direct repeats, and gray boxes represent the inverted ones. The distribution is established by the log 10 of the spacer size (in steps of 0.5). These histograms show clearly that CDRs (direct repeats with a spacer smaller than 1 kb) are overrepresented in all species

 
However, the distribution of spacer lengths indicates the existence of many direct repeats with a spacer between 1 and 10 kb in A. thaliana chromosomes (they represent more than one-third of all direct repeats). We looked for a plausible explanation for this overrepresentation in A. thaliana (as compared with other species), with particular attention to the sequence located between the two copies (the spacer). Several hypotheses can be considered and rejected: (1) the repeats are not the edges of transposons because the spacers are not paralogous, (2) the hypothesis of Campbell-like insertions of exogenous DNA, as proposed for B. subtilis (Rocha, Danchin, and Viari 1999a), can be eliminated because there is no difference in nucleotide composition between spacers and chromosomes, and (3) there is no clear difference between these repeats and the others: no high identity level, no special length, no special physical location. Similar observations can be made for the genomes of C. elegans and D. melanogaster, where direct repeats with a spacer between 1 and 10 kb are also overrepresented.

In conclusion, we have not yet found any plausible hypothesis explaining why these repeats are overrepresented.

CDRs Are Identical and Short
We started by characterizing CDRs in terms of the distribution of their identity (fig. 3). Except for S. cerevisiae, CDRs have their two copies more identical than distant direct repeats (P < 10^-4, Mann-Whitney rank test).
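The comparison between CDRs and distant direct repeats relies on a rank test, which can be sketched as below. This is a minimal pure-Python version of the Mann-Whitney U statistic with midranks for ties and a two-sided P value from the normal approximation without tie or continuity correction; the exact variant used in the study is not specified, so these choices, the function names, and the toy identity values are assumptions.

```python
import math

def midranks(values):
    """1-based ranks of values, tied values receiving their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided P from the normal approximation).
    Both samples must be non-empty."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Toy identity percentages, for illustration only (not data from the study).
distant_identity = [81.0, 84.5, 88.0]
cdr_identity = [93.0, 96.5, 99.0]
u_distant, p_value = mann_whitney_u(distant_identity, cdr_identity)
```

A U statistic of 0 for the distant repeats means that every distant repeat is less identical than every CDR; with real data the statistic lies between this extreme and the product of the two sample sizes.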



Fig. 3.—Distribution of the identity percentage for direct repeats. The histograms show the distribution of the identity percentage of a given species for distant direct repeats and CDRs. The hatched black boxes represent only the CDRs, and the plain black boxes represent the distant direct repeats. It can be shown using a Mann-Whitney rank test that, except for yeast, CDRs are more identical than distant direct repeats

 
In order to explain the S. cerevisiae exception, one should take into consideration that the distinction between close and distant repeats has been arbitrarily fixed at the same spacer size (1 kb) for each organism. The biological difference between close and distant repeats is connected to the recombination machinery. As this machinery varies from yeast to human, the limit between close and distant repeats should not be identical for all species. Indeed, for S. cerevisiae, direct repeats with a spacer smaller than 500 bp are more identical than other direct repeats (P < 0.05, Mann-Whitney rank test).

This greater similarity could be explained, on the one hand, by the recent origin of these repeats and, on the other, by a high conversion rate between the two copies when they are close together. As previously discussed, CDRs could also be subject to a high deletion rate. It has been reported that the recombination rate is positively correlated with repeat length in yeast (Jinks, Michelitch, and Ramcharan 1993) and in mammalian cells (Rubnitz and Subramani 1984). Thus, CDRs with long copies would be too unstable to persist, and only small CDRs would be conserved. In order to test this hypothesis, the length distributions of close and distant direct repeats were compared: CDRs are indeed shorter than distant direct repeats (P < 10^-4, Mann-Whitney rank test).

CDRs Exhibit an Exchange Rate Negatively Correlated with the Spacer Size
We previously observed a positive rank correlation between length and spacer and a negative rank correlation between identity and spacer for CDRs in yeast: the closer the repeats, the more identical and shorter they are. Except for the P. falciparum chromosomes, correlations between identity, length, and spacer were found in all eukaryotes (Table 2). This is in good agreement with an observation reported in C. elegans that the similarity between paralogous genes is negatively correlated with the physical distance between them (Semple and Wolfe 1999).


Table 2 Computed Kendall Rank Correlations for CDRs of Each Species

 
To explain this result, we propose that, as in bacteria (Peeters et al. 1988), the exchange rate between the two copies is negatively correlated with the spacer size. A higher conversion rate will increase the identity percentage, and a higher deletion rate will tend to remove large repeats.

In conclusion, the properties which supported the model of iteration dynamics established in S. cerevisiae are shared by other eukaryotes. This suggests that the model could be extended to all eukaryotes.

The Case of P. falciparum: How Parasitism Influences the Genome Style
P. falciparum chromosomes exhibit a high level of redundancy as compared with similar-sized chromosomes of S. cerevisiae (fig. 1), and their CDRs are extremely overrepresented: 74% of the direct repeats have a spacer smaller than 1 kb (fig. 2). Their copies are nearly identical (fig. 3) and very short (data not shown). However, no correlation between spacer, identity, and length can be detected (Table 2).

Two-thirds of the inverted repeats are located near the telomeres (one copy in each subtelomere), suggesting a peculiar history and a high exchange rate for these repeats. It was suggested that all subtelomeres exhibit very plastic dynamics in S. cerevisiae (Pryde, Gorham, and Louis 1997) and in H. sapiens (Coleman, Baird, and Royle 1999). Their importance in the interchromosomal iteration process was demonstrated in S. cerevisiae (Coissac, Maillier, and Netter 1997).

All these observations are consistent with what was described previously: the highly repeated gene families and the special status of subtelomeres in P. falciparum (Gardner et al. 1998; Bowman et al. 1999).

Do These Observations Mean that This Parasite Does Not Follow the Same Dynamics as the Other Eukaryotes?
P. falciparum is a human pathogenic parasite, the main agent of malaria. It has been reported that many bacterial pathogens exhibit a high redundancy level (Rocha, Danchin, and Viari 1999b), which has been related to high selective pressures for sequence variation. A significant number of repeats allows many recombination events, leading to a high plasticity of the genome and thus to a high evolution rate. As for these bacteria, the high redundancy level of P. falciparum could be a consequence of its parasitism.

The near absence of distant repeats and the absence of correlation indicate that almost all repeats are young. The absence of correlation would thus be caused not by the absence of the underlying mechanism but by too short a time of evolution. Population studies suggest that P. falciparum spread worldwide from a limited area (Rich and Ayala 2000). The absence of old repeats could be a consequence of the recent change in the ecological conditions of P. falciparum, associated with a burst of evolution. In conclusion, P. falciparum follows the same iteration dynamics as the other eukaryotes. However, because it is a recent parasite, its chromosomes are more repeated than those of the other eukaryotes (as a result of parasitism), and there are almost no ancient repeats (because of its recent emergence).

How Tandem Repeats Can Be Turned into Spaced Repeats
Intrachromosomal repeats, in our model, are mostly created in tandem (by recombination between sister chromatids or by replication slippage) and are turned into distant repeats by chromosomal rearrangements. Analyzing all the final states after several rearrangements is difficult. However, it is interesting to examine all the theoretical states obtained after only one rearrangement event. Three kinds of rearrangement have been taken into account (fig. 4): deletion of a part of the tandem, insertion of a sequence inside the tandem repeat, and inversion of a piece of the tandem. The insertion process can be the result of either the insertion of a transposable element or the repair of a double-strand break by sequence conversion (Voelkel and Roeder 1990). Small inversions have been suggested to explain the evolution of gene order between C. albicans and S. cerevisiae (Seoighe et al. 2000), highlighting their role in genome dynamics.



Fig. 4.—How tandem repeats can be turned into spaced repeats. This figure presents the three main ways to create a spacer between two tandem repeats: (1) a deletion event could occur inside the tandem repeats, leading to the creation of a spaced direct repeat (a) and two spaced direct repeats (b), (2) an insertion event could arise inside the tandem repeat, leading to one direct repeat with a spacer similar to another region in the genome (a) and two direct repeats (b), and (3) an inversion event could lead to one inverted repeat (a) or one direct repeat in which the spacer is an inverted repeat (b)

 
If the model is valid, one should find the vestiges of tandem rearrangement in the chromosome sequences. Thus, we used the wublastn software (http://blast.wustl.edu) to look for paralogs of the spacers in the complete chromosomes. Only spacers between 50 bp and 10 kb in size and flanked by direct repeats were taken into account. The queried databases were constructed for each species with complete chromosomes only (the same ones that we used for the detection of the repeats). A sequence was arbitrarily considered a paralog of the spacer if its length was at least 80% of the spacer length and if the two sequences were more than 80% identical. Using this approach, large insertions (fig. 4.2a) or some inversions (fig. 4.3b) can be unambiguously identified, but small internal deletions and small internal insertions (fig. 4.1b and 4.2b) cannot be clearly differentiated. Note that the deletion of an edge of a copy (fig. 4.1a) or the complete inversion of a copy (fig. 4.3a) cannot be detected by this method.
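The selection criteria described above amount to a few simple predicates, sketched below. The thresholds are those given in the text; whether the 50-bp and 10-kb bounds are inclusive is an assumption, and the function names and the labels returned by `paralog_class` are hypothetical. The sketch takes hit lengths and identity percentages as inputs and does not run or parse wublastn.

```python
MIN_SPACER = 50       # bp
MAX_SPACER = 10_000   # bp (10 kb)

def spacer_is_eligible(spacer_length, flanked_by_direct_repeat):
    """Only spacers of 50 bp to 10 kb flanked by direct repeats are queried."""
    return flanked_by_direct_repeat and MIN_SPACER <= spacer_length <= MAX_SPACER

def is_paralog(spacer_length, hit_length, identity_percent):
    """Hit covers at least 80% of the spacer and is more than 80% identical."""
    return hit_length >= 0.8 * spacer_length and identity_percent > 80.0

def paralog_class(spacer_length, hits):
    """Sort a spacer by its number of paralogs.

    hits is a list of (hit_length, identity_percent) pairs.
    """
    count = sum(is_paralog(spacer_length, length, ident) for length, ident in hits)
    if count == 0:
        return "none"
    return "single" if count == 1 else "multiparalog"

# Toy hits for a 1,000-bp spacer, for illustration only.
toy_class = paralog_class(1000, [(900, 92.0), (400, 99.0), (850, 78.0)])
```

In the toy example only the first hit satisfies both criteria: the second is too short and the third is not identical enough.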

Results were sorted as a function of the number of paralogs detected in the chromosomes. For most spacers, no paralog was found. There are several possible reasons for this: (1) our criteria were very stringent, (2) the search was performed against the whole genome only for S. cerevisiae and C. elegans, and (3) we only detected paralogs for spacers resulting from a recent, single rearrangement event. Multiparalog families (spacers with at least two paralogs) were set apart because they give an idea of the relative transposition rate. All cases where the spacer had only one paralog were analyzed in more detail according to the categories of figure 4.

As shown in Table 3, all possible remnants of tandem rearrangement were detected in the chromosome sequences. These observations indicate that the theoretical rearrangements have occurred during genome history, reinforcing the model of the iteration dynamics.


Table 3 Detected Paralogs for Spacers of Direct Repeats

 
A striking result was the overrepresentation of intrachromosomal direct paralogs in C. elegans. A detailed analysis of these paralogs revealed that they are mostly part of larger, old tandem repeats. This observation has to be connected to the presence of large tandem repeats in the chromosomes of this species (e.g., a 600-kb repeat in the first chromosome), also recently described by Friedman and Hughes (2000). It seems probable that the worm genome has undergone an active process of intrachromosomal iteration.

All generic duplications of nonspecific DNA regions within the same chromosome or between two chromosomes were referred to in this study as iteration. However, this iteration process should be divided into at least two distinct mechanisms. The first is the creation of tandem repeats (by sister chromatid exchange or replication slippage), which creates (under our model) most of the intrachromosomal repeats. The second is the genesis of repeats (inter- or intrachromosomal) by double-strand break repair. Indeed, this repair can lead to duplication when it is associated with a conversion mechanism. This implies that the duplication process can be divided into at least four mechanisms: abnormal chromosome segregation (hyperploidization); transposition (transposable elements); sister chromatid exchange, replication slippage, or both (tandem repeats and satellites); and double-strand break repair (iteration by conversion).


    Conclusions
 
Through this study of intrachromosomal repeats in eukaryotes, several biological results were highlighted. We extended our model, initially proposed for S. cerevisiae, to other eukaryote chromosomes (C. elegans, P. falciparum, A. thaliana, D. melanogaster, and H. sapiens). This suggests that, despite differences in chromosomal properties, the iteration process follows broadly the same dynamics in all eukaryotes and thus has to be connected to structures and mechanisms shared by all eukaryote chromosomes.

The density of repeats defines a genome style, in which the evolution rate results from iteration, deletion, rearrangement, and mutation. This rate is similar for all chromosomes within the same genome and is specific to each species. The main exception is the X chromosome of C. elegans, which suggests that exchanges between homologous chromosomes are important in the genesis of repeats. Thus, we propose that the genesis of tandem repeats is, at least in part, a consequence of exchange between homologous chromosomes.

Finally, we brought out the remnants of the rearrangements that turned tandem repeats into spaced repeats. This suggests that tandem repeats, which can be easily created, are subjected to rounds of chromosomal rearrangements leading to the pattern of repeats observed today. Hence, repeats can be used to follow chromosome rearrangements and are markers of genome dynamics.


    Acknowledgements
 
We thank I. Gonçalves, E. Rocha, D. Higuet, E. Maillier, J. Pothier, and A. Viari for their scientific help and their friendly support. This work was supported by grants from Association pour la Recherche sur le Cancer. E.C. and P.N. are members of Université Pierre et Marie Curie, Paris.


    Footnotes
 
Manolo Gouy, Reviewing Editor

Keywords: genome dynamics, evolution, duplication, eukaryotes

Address for correspondence and reprints: Guillaume Achaz, Structure et Dynamique des Génomes, Institut Jacques Monod, Tour 43–44, 1° Étage, 4, Place Jussieu, 75251 Paris Cedex 05, France. E-mail: achaz@ijm.jussieu.fr.


    References
 

    Achaz G., E. Coissac, A. Viari, P. Netter. 2000. Analysis of intrachromosomal duplications in yeast Saccharomyces cerevisiae: a possible model for their origin. Mol. Biol. Evol. 17:1268-1275.

    Adams M. D., S. E. Celniker, R. A. Holt, et al. (195 co-authors). 2000. The genome of Drosophila melanogaster. Science 287:2185-2195.

    Amores A., A. Force, Y. L. Yan, et al. (13 co-authors). 1998. Zebrafish hox clusters and vertebrate genome evolution. Science 282:1711-1714.

    Barnes T. M., Y. Kohara, A. Coulson, S. Hekimi. 1995. Meiotic recombination, noncoding DNA and genomic organization in Caenorhabditis elegans. Genetics 141:159-179.

    Baudat F., A. Nicolas. 1997. Clustering of meiotic double-strand breaks on yeast chromosome III. Proc. Natl. Acad. Sci. USA 94:5213-5218.

    Bernardi G. 2000. Isochores and the evolutionary genomics of vertebrates. Gene 241:3-17.

    Blanc G., A. Barakat, R. Guyot, R. Cooke, M. Delseny. 2000. Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12:1093-1101.

    Bowman S., D. Lawson, D. Basham, et al. (36 co-authors). 1999. The complete nucleotide sequence of chromosome 3 of Plasmodium falciparum. Nature 400:532-538.

    Britten R. J. 1998. Precise sequence complementarity between yeast chromosome ends and two classes of just-subtelomeric sequences. Proc. Natl. Acad. Sci. USA 95:5906-5912.

    Coissac E., E. Maillier, P. Netter. 1997. A comparative study of duplications in bacteria and eukaryotes: the importance of telomeres. Mol. Biol. Evol. 14:1062-1074.

    Coleman J., D. M. Baird, N. J. Royle. 1999. The plasticity of human telomeres demonstrated by hypervariable telomeres repeat array that is located on some copies of 16p and 16q. Hum. Mol. Genet. 8:1637-1646.

    Consortium. 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282:2012-2018.

    Dunham I., N. Shimizu, B. A. Roe, et al. (239 co-authors). 1999. The DNA sequence of human chromosome 22. Nature 402:489-495.

    Friedman R., A. L. Hughes. 2000. Gene duplication and the structure of eukaryotic genomes. Genome Res. 11:373-381.

    Gardner M. J., H. Tettelin, D. J. Carucci, et al. (27 co-authors). 1998. Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum. Science 282:1126-1132.

    Goffeau A., B. G. Barrell, H. Bussey, et al. (16 co-authors). 1996. Life with 6000 genes. Science 274:546.

    Hattori M., A. Fujiyama, T. D. Taylor, et al. (62 co-authors). 2000. The DNA sequence of human chromosome 21. The chromosome 21 mapping and sequencing consortium. Nature 405:311-319.

    Henikoff S. 2000. Heterochromatin function in complex genomes. Biochim. Biophys. Acta 1470:O1-O8.

    Holland P. W. 1999. Gene duplication: past, present and future. Semin. Cell Dev. Biol. 10:541-547.

    Hughes A. L., J. Da Silva, R. Friedman. 2001. Ancient duplication did not structure the human Hox-bearing chromosomes. Genome Res. 11:771-780.

    Jinks R. S., M. Michelitch, S. Ramcharan. 1993. Substrate length requirements for efficient mitotic recombination in Saccharomyces cerevisiae. Mol. Cell. Biol. 13:3937-3950.

    Karlin S., F. Ost. 1985. Maximal segmental match length among random sequences from a finite alphabet. Pp. 225–243 in L. M. L. Cam and R. A. Olshen, eds. Proceedings of the Berkeley Conference in honor of Jerzy Neyman and Jack Kiefer, Vol. 1. Association for Computing Machinery, New York.

    Kurtz S., C. Schleiermacher. 1999. REPuter: fast computation of maximal repeats in complete genomes. Bioinformatics 15:426-427.

    Leung M. Y., B. E. Blaisdell, C. Burge, S. Karlin. 1991. An efficient algorithm for identifying matches with errors in multiple long molecular sequences. J. Mol. Biol. 221:1367-1378.

    Lin X., S. Kaul, S. Rounsley, et al. (39 co-authors). 1999. Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana. Nature 402:761-768.

    Llorente B., A. Malpertuy, C. Neuveglise, et al. (24 co-authors). 2000. Genomic exploration of the hemiascomycetous yeasts: 18. Comparative analysis of chromosome maps and synteny with Saccharomyces cerevisiae. FEBS Lett. 487:101-112.

    Masterson J. 1994. Stomatal size in fossil plants: evidence for polyploidy in majority of angiosperms. Science 264:421-424.

    Mayer K., C. Schuller, R. Wambutt, et al. (234 co-authors). 1999. Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana. Nature 402:769-777.

    Ohno S. 1970. Evolution by gene duplication. Springer-Verlag, Heidelberg, Germany.

    Peeters B. P. H., J. H. De Boer, S. Bron, G. Venema. 1988. Structural plasmid instability in Bacillus subtilis: effect of direct and inverted repeats. Mol. Gen. Genet. 212:450-458.

    Pryde F. E., H. C. Gorham, E. J. Louis. 1997. Chromosome ends: all the same under their caps. Curr. Opin. Genet. Dev. 7:822-828.

    Rich S. M., F. J. Ayala. 2000. Population structure and recent evolution of Plasmodium falciparum. Proc. Natl. Acad. Sci. USA 97:6994-7001.

    Robinson-Rechavi M., O. Marchand, H. Escriva, P. L. Bardet, D. Zelus, S. Hughes, V. Laudet. 2001. Euteleost fish genomes are characterized by expansion of gene families. Genome Res. 11:781-788.

    Rocha E. P., A. Danchin, A. Viari. 1999a. Analysis of long repeats in bacterial genomes reveals alternative evolutionary mechanisms in Bacillus subtilis and other competent prokaryotes. Mol. Biol. Evol. 16:1219-1230.

    Rocha E. P., A. Danchin, A. Viari. 1999b. Functional and evolutionary roles of long repeats in prokaryotes. Res. Microbiol. 150:725-733.

    Rubnitz J., S. Subramani. 1984. The minimum amount of homology required for homologous recombination in mammalian cells. Mol. Cell. Biol. 4:2253-2258.

    Ruvkun G., O. Hobert. 1998. The taxonomy of developmental control in Caenorhabditis elegans. Science 282:2033-2041.

    Semple C., K. H. Wolfe. 1999. Gene duplication and gene conversion in the Caenorhabditis elegans genome. J. Mol. Evol. 48:555-564.

    Seoighe C., N. Federspiel, T. Jones, et al. (20 co-authors). 2000. Prevalence of small inversions in yeast gene order evolution. Proc. Natl. Acad. Sci. USA 97:14433-14437.

    Smith T. F., M. S. Waterman. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147:195-197.

    Surzycki S. A., W. R. Belknap. 2000. Repetitive-DNA elements are similarly distributed on Caenorhabditis elegans autosomes. Proc. Natl. Acad. Sci. USA 97:245-249.

    Vincens P., L. Buffat, C. Andre, J. P. Chevrolat, J. F. Boisvieux, S. Hazout. 1998. A strategy for finding regions of similarity in complete genome sequences. Bioinformatics 14:715-725.

    Vision T. J., D. G. Brown, S. D. Tanksley. 2000. The origins of genomic duplications in Arabidopsis. Science 290:2114-2117.

    Voelkel K., G. S. Roeder. 1990. Gene conversion tracts stimulated by HOT1-promoted transcription are long and continuous. Genetics 126:851-867.

    Wolfe K. H., D. C. Shields. 1997. Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387:708-713.

Accepted for publication August 27, 2001.