Structure et Dynamique des Génomes, Institut Jacques Monod, Paris, France
Abstract |
---|
Introduction |
---|
Within duplication events, four main subprocesses have been documented: abnormal segregation during cell division (leading to the duplication of entire chromosomes, viz., hyperploidization, and sometimes to whole-genome doubling, viz., polyploidization), transposition (duplication of transposable elements), expansion of low-complexity sequences (microsatellites and minisatellites), and finally generic duplications of nonspecific DNA regions within the same chromosome or between two chromosomes. We shall henceforth refer to this last subprocess as the iteration process. Polyploidization events have been proposed to explain the large-scale duplications at the origin of vertebrates (Ohno 1970), in many angiosperms (Masterson 1994), even in Arabidopsis thaliana (Blanc et al. 2000), in the fish lineage (Amores et al. 1998), and in the yeast S. cerevisiae (Wolfe and Shields 1997). However, it is not clear whether these large-scale duplications are always the result of polyploidization, successive hyperploidizations, or bursts of large iterations (Holland 1999; Llorente et al. 2000; Vision, Brown, and Tanksley 2000; Hughes, Da Silva, and Friedman 2001; Robinson-Rechavi et al. 2001).
In order to investigate the iteration process, we focused our attention on intrachromosomal repeats in the chromosome sequences. Two complete genomes, S. cerevisiae and Caenorhabditis elegans, and four partial ones, H. sapiens, Drosophila melanogaster, A. thaliana, and Plasmodium falciparum, were analyzed. The repeats of the S. cerevisiae genome were already investigated in a previous study (Achaz et al. 2000), in which we proposed a model for the dynamics of the iteration process based on a continuous genesis of close direct repeats (CDR). A CDR is defined here as a repeat whose copies are in the same orientation and separated by a physical distance (the spacer) smaller than 1 kb. The model supposes that most of the intrachromosomal repeats originate from these CDRs, the others being the result of further chromosomal rearrangements. In the present study, the model established in yeast was tested on new eukaryote chromosomes. We focused on the differences between genomes and tried to connect them to the genome context. Because our model supposes that most of the intrachromosomal repeats originate from tandem repeats, the chromosome sequences were also investigated to find the remnants of the chromosomal rearrangements. Hence, we view repeats as markers of genome dynamics.
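The CDR definition above can be illustrated with a minimal sketch. The record layout (`Repeat`, `start1`, `start2`) is hypothetical, and the spacer is assumed here to be measured from the end of the first copy to the start of the second; the text only states that it is the physical distance between the copies.

```python
from dataclasses import dataclass

CDR_MAX_SPACER = 1000  # 1 kb, the threshold stated in the text


@dataclass(frozen=True)
class Repeat:
    """A two-copy repeat on one chromosome (hypothetical record layout)."""
    start1: int   # 0-based start of the first copy
    start2: int   # 0-based start of the second copy (start2 > start1)
    length: int   # length of each copy, in bp
    direct: bool  # True if both copies have the same orientation

    @property
    def spacer(self) -> int:
        """Assumed definition: bases between the end of copy 1 and the start of copy 2."""
        return max(0, self.start2 - (self.start1 + self.length))


def is_cdr(repeat: Repeat, max_spacer: int = CDR_MAX_SPACER) -> bool:
    """Close direct repeat: same orientation and spacer smaller than 1 kb."""
    return repeat.direct and repeat.spacer < max_spacer
```

Under these assumptions, a direct repeat with copies starting at 100 and 400 and a length of 50 bp has a spacer of 250 bp and counts as a CDR, whereas the same coordinates in inverted orientation do not.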
Materials and Methods |
---|
Sequences of H. sapiens, C. elegans, P. falciparum, and A. thaliana were extracted from GenBank (ftp://ncbi.nlm.nih.gov/genbank/genomes). The S. cerevisiae chromosomes were extracted from the Saccharomyces Genome Database (http://genome-www.stanford.edu/Saccharomyces). Sequences of D. melanogaster were downloaded from the Celera database (http://www.celera.com).
It should be pointed out that most sequences contain many gaps (stretches of N). For example, in chromosome 1 of C. elegans, 8.8% of the base pairs are N, and 29 gaps are longer than 10 kb. These stretches were not taken into account during the construction of the repeats database.
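As an illustration of how such gap statistics can be obtained, the following sketch summarizes the stretches of N in a sequence. The function name `gap_summary` and its return layout are assumptions made for this example, not part of the original pipeline.

```python
import re


def gap_summary(sequence: str, long_gap: int = 10_000) -> dict:
    """Summarize stretches of N in a chromosome sequence.

    Returns the fraction of positions that are N, the number of gaps,
    and the number of gaps strictly longer than `long_gap` bases.
    """
    seq = sequence.upper()
    gap_lengths = [m.end() - m.start() for m in re.finditer(r"N+", seq)]
    return {
        "n_fraction": sum(gap_lengths) / len(seq) if seq else 0.0,
        "gap_count": len(gap_lengths),
        "long_gap_count": sum(1 for g in gap_lengths if g > long_gap),
    }
```

With the default threshold of 10 kb, `long_gap_count` corresponds to the kind of figure quoted above for chromosome 1 of C. elegans.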
Construction of the Repeats Database
Repeat detection, as in most of the heuristics already proposed (Leung et al. 1991; Vincens et al. 1998), is based on first looking for seeds (exact repeats) and then extending them with a local alignment program. The detailed methodology is described below through three main steps: searching, filtering, and extending.
First Step: Searching for Seeds
In this step, exact repeats (seeds) were detected by using the REPuter software (Kurtz and Schleiermacher 1999). This software detects all seeds (direct and inverted) in a given sequence, whatever the distance between the copies on the chromosome. As we are interested in unusually large seeds, the minimum length of seeds (Lmin) was calculated using the statistics developed by Karlin and Ost (1985). For each chromosome, we chose Lmin such that the probability of finding a two-copy word of at least this length in a random sequence of the same size and nucleotide composition is 0.001. Typically, Lmin ranges from 21 for the smallest chromosome (chromosome 1 of S. cerevisiae) to 28 for the largest ones (chromosomes 21 and 22 of H. sapiens).
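To give an intuition for how Lmin grows with chromosome size, here is a rough first-order sketch. It is not the Karlin and Ost (1985) statistic itself: it assumes independent positions, considers direct matches only, and bounds the probability of a two-copy word by the expected number of matching pairs. The function name `approx_lmin` is hypothetical, and the values it returns may differ from those reported above.

```python
import math


def approx_lmin(length: int, base_freqs: dict, alpha: float = 0.001) -> int:
    """Smallest word length L with pairs * lam**L <= alpha.

    `lam` is the per-position match probability between two random
    positions (sum of squared base frequencies) and `pairs` is the
    number of position pairs, so the product approximates the expected
    number of exact two-copy words of length L in a random sequence.
    """
    lam = sum(p * p for p in base_freqs.values())
    pairs = length * (length - 1) / 2
    return max(1, math.ceil(math.log(pairs / alpha) / -math.log(lam)))
```

Because the bound depends on the logarithm of the number of position pairs, a hundredfold increase in chromosome size adds only a few bases to the threshold, which is consistent with the narrow range of Lmin values quoted above.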
Second Step: Filtering the Seeds
First, to remove all low-complexity seeds (i.e., microsatellites or poly-A stretches), we used an entropy filter based on dinucleotide composition (Achaz et al. 2000). Second, all multicopy seeds were removed. A chromosome map in which each position is linked to its n-plication degree (duplication, triplication, etc.) was established. To build this map, we counted for each chromosome position the number of times this position is found in seeds (direct and inverted seeds were pooled together). This map is used to estimate the degree of redundancy of chromosomes (i.e., the number of duplications, triplications, etc.). Table 1 presents, for each species, the mean size of the chromosomes, the percentage of chromosomes included in two-copy seeds, and the percentage of chromosomes represented by all the seeds. As we are only interested here in two-copy seeds, we used the map to remove all seeds in which one of the positions is included in a multicopy repeat.
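The n-plication map and the removal of multicopy seeds can be sketched as follows. The `Seed` layout and function names are hypothetical, and the sketch assumes that a position covered by exactly one seed is duplicated, whereas a position covered by several seeds belongs to a multicopy repeat (a word present three times is reported as three pairwise seeds, so each of its positions is covered twice).

```python
from itertools import accumulate
from typing import List, NamedTuple


class Seed(NamedTuple):
    """An exact two-occurrence match (hypothetical record layout)."""
    start1: int
    start2: int
    length: int


def coverage_map(seeds: List[Seed], chrom_length: int) -> List[int]:
    """For each position, count how many seed copies cover it."""
    diff = [0] * (chrom_length + 1)  # difference array, one extra slot
    for seed in seeds:
        for start in (seed.start1, seed.start2):
            diff[start] += 1
            diff[start + seed.length] -= 1
    return list(accumulate(diff[:-1]))


def keep_two_copy_seeds(seeds: List[Seed], chrom_length: int) -> List[Seed]:
    """Drop every seed in which any position is covered more than once."""
    cov = coverage_map(seeds, chrom_length)

    def strictly_duplicated(start: int, length: int) -> bool:
        return all(c == 1 for c in cov[start:start + length])

    return [s for s in seeds
            if strictly_duplicated(s.start1, s.length)
            and strictly_duplicated(s.start2, s.length)]
```

The difference array keeps the construction of the map linear in the chromosome length plus the number of seeds, which matters when the same procedure is applied to chromosomes ranging from yeast to human sizes.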
It should be mentioned that the methodology was similar to the one previously used in the S. cerevisiae analysis (Achaz et al. 2000), but was modified so that the chromosomes of yeast (<1.5 Mb) and human (35 Mb) could be analyzed in the same way. The major modifications were applied to reduce the number of seeds and to keep only sensu stricto duplicated seeds (present in only two copies) for the alignment process.
Results and Discussion |
---|
Genome Style and History of Chromosomes
In order to analyze the relationship between chromosome size and redundancy level, we measured two parameters DN and DL, defined as follows:
Two hypotheses can be proposed to explain the low densities of the chromosomal arms of D. melanogaster. The first one is a data bias: the analyzed sequences consist exclusively of euchromatin (only around two-thirds of the complete genome), and it is known that repeats are concentrated inside heterochromatin (Henikoff 2000). Moreover, assembly errors could lead to artificially deleted tandem repeats. The second hypothesis rests on biological grounds. One can imagine that the genome of Drosophila has a special status in the duplication process (because there is no meiotic crossing-over in the male, the duplication process may be less active). Completion of the D. melanogaster sequence should settle this question.
In order to investigate each chromosome more precisely, we analyzed DN and DL for direct and inverted repeats (fig. 1). It appears that DN is similar for all chromosomes within the same species, whereas DL is not. Thus, DN could define the style of redundancy of the genome. We assume that DN results from iteration events combined with the loss of duplicated sequences, and we therefore propose that DN is connected to the biological machinery of each species. Because this machinery differs between species but is similar for all chromosomes within the same genome, DN should reflect the dynamics of each genome. Furthermore, the differences between species come essentially from direct repeats and less from inverted repeats. This suggests that the biological machinery is more closely connected to the creation and loss of direct repeats than to the dynamics of inverted repeats.
Contrary to DN, DL could reflect the history of each chromosome rather than the effects of the cellular machinery: a single iteration event can lead to a high DL for direct or inverted repeats. For example, the direct repeats of chromosome 1 of C. elegans exhibit a high DL and a normal DN (when compared with the values of the other C. elegans chromosomes). This particularity is mainly caused by two large duplicated sequences, one 250 kb long (with an identity of 98.7%) and the other 600 kb long (split into several segments of high identity, often more than 99%). Furthermore, the inverted repeats of chromosome 1 of S. cerevisiae show a high DL and a normal DN, as a consequence of two internal regions inversely repeated in subtelomeres (Britten 1998).
A Model of Dynamics of Iteration
Our model of intrachromosomal iteration (Achaz et al. 2000) is based on a permanent genesis of CDRs. The CDRs are then subjected to a high level of exchange (conversion and deletion). This high exchange rate tends both to keep the two copies identical (conversion) and to eliminate them (deletion). At each round of exchange, both events are possible, but whereas conversion may still be followed by deletion, a deletion event cannot be followed by conversion.
Therefore, on a long timescale, a bias in favor of deletion should be observed. A CDR has to disappear sooner or later (depending on the relative rates of conversion and deletion). However, there are two situations where a repeat would be maintained: when it is protected from deletions by functional pressures (i.e., located inside a gene) or when the copies are spaced by further chromosomal rearrangements. This model was mainly based on three observations for CDR: (1) they are overrepresented, (2) they are mostly located inside the same gene, and (3) their length is positively correlated with the spacer (the physical distance between copies), and their identity is negatively correlated with it.
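The asymmetry between conversion and deletion can be illustrated with a toy simulation. It is a sketch under strong assumptions that do not come from the source: the per-round probabilities are constant, at most one event occurs per round, and the values used are arbitrary. Deletion is an absorbing state, so the expected surviving fraction after t rounds is (1 - p_deletion)^t, whatever the conversion rate.

```python
import random
from typing import Optional


def deletion_round(rounds: int, p_conversion: float, p_deletion: float,
                   rng: random.Random) -> Optional[int]:
    """Return the round at which the CDR is deleted, or None if it survives.

    Each round, one uniform draw decides between deletion, conversion,
    or no event. Conversion leaves the repeat in place (its copies are
    simply homogenized), so only deletion ends the loop.
    """
    for t in range(1, rounds + 1):
        u = rng.random()
        if u < p_deletion:
            return t  # absorbing state: no conversion can follow
        # u < p_deletion + p_conversion would be a conversion event;
        # it does not change whether the repeat is still present.
    return None


def survival_fraction(n_repeats: int, rounds: int, p_conversion: float,
                      p_deletion: float, seed: int = 0) -> float:
    """Fraction of simulated CDRs still present after `rounds` rounds."""
    rng = random.Random(seed)
    survivors = sum(
        deletion_round(rounds, p_conversion, p_deletion, rng) is None
        for _ in range(n_repeats)
    )
    return survivors / n_repeats
```

In this toy setting the conversion rate affects only how similar the surviving copies remain, not how long they persist, which is the sense in which a long timescale favors deletion.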
In the present analysis, the model was tested on other eukaryotes. It should be mentioned that a model of tandem creation followed by dispersion has already been invoked for two gene families (Hox and NBG) in C. elegans (Ruvkun and Hobert 1998). Because the annotations of the eukaryote chromosomes are only partial, they were not taken into account; we therefore did not analyze the position of repeats relative to gene locations.
CDRs Are Overrepresented
The distributions of spacer size for direct and inverted repeats (fig. 2) reveal that CDRs are overrepresented as compared with close inverted repeats. Moreover, in the previous study (Achaz et al. 2000), the repeats of S. cerevisiae were compared with the repeats found in random chromosomes. From this comparison, we showed that such close repeats (inverted or direct) are absent from random chromosomes. This strongly suggests that these CDRs are not the result of chance. The presence of many CDRs in all chromosomes is in good agreement with the model.
In conclusion, we have not yet found any plausible hypothesis to explain why these repeats are overrepresented.
CDRs Are Identical and Short
We started by characterizing CDRs in terms of the distribution of their identity (fig. 3). Except for S. cerevisiae, the two copies of CDRs are more similar to each other than those of distant direct repeats (P < 10⁻⁴, Mann-Whitney rank test).
This greater similarity could be explained, on the one hand, by the recent origin of these repeats and, on the other, by a high conversion rate between the two copies when they are close together. As previously discussed, CDRs could also be subjected to a high deletion rate. It has been reported that the recombination rate is positively correlated with repeat length in yeast (Jinks, Michelitch, and Ramcharan 1993) and in mammalian cells (Rubnitz and Subramani 1984). Thus, CDRs with long copies may be too unstable to persist, so that only small CDRs are conserved. In order to test this hypothesis, the length distributions of close and distant direct repeats were compared: it appeared that CDRs are smaller than the distant ones (P < 10⁻⁴, Mann-Whitney rank test).
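For readers unfamiliar with the test used here, the following is a minimal standard-library sketch of the Mann-Whitney rank test. It uses the normal approximation without tie or continuity correction, which is an assumption of this example; the options used in the original analysis are not stated in the text.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (U for sample x, z score, two-sided P from the normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p_two_sided
```

Here `x` and `y` would hold, for instance, the copy lengths of close and distant direct repeats; a strongly negative z score would indicate that the values in `x` tend to be smaller.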
CDRs Exhibit an Exchange Rate Negatively Correlated with the Spacer Size
We previously observed a positive rank correlation between length and spacer and a negative rank correlation between identity and spacer for CDRs in yeast: the closer the repeats, the more identical and shorter they are. Except for the P. falciparum chromosomes, correlations between identity, length, and spacer were found in all eukaryotes (Table 2). This is in good agreement with an observation reported in C. elegans that the similarity between paralogous genes is negatively correlated with the physical distance between them (Semple and Wolfe 1999).
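The rank correlations discussed above can be computed as in the sketch below, which implements the Spearman coefficient as the Pearson correlation of average ranks. Whether the original analysis used Spearman or another rank statistic is not stated in the text, so this choice is an assumption, and the small data set is invented purely for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sample has no rank correlation")
    return cov / math.sqrt(var_x * var_y)


# Invented toy values (not from the paper): spacer (bp), length (bp), identity (%).
spacer = [50, 120, 300, 640, 900]
length = [22, 25, 31, 40, 38]
identity = [100.0, 99.5, 99.0, 97.0, 98.0]
rho_length = spearman_rho(spacer, length)      # positive in this toy data
rho_identity = spearman_rho(spacer, identity)  # negative in this toy data
```

The signs obtained on the toy data mirror the pattern described in the text: length increases and identity decreases with the spacer.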
In conclusion, the properties which supported the model of iteration dynamics established in S. cerevisiae are shared by other eukaryotes. This suggests that the model could be extended to all eukaryotes.
The Case of P. falciparum: How Parasitism Influences the Genome Style
P. falciparum chromosomes exhibit a high level of redundancy as compared with similar-sized chromosomes of S. cerevisiae (fig. 1), and their CDRs are extremely overrepresented: 74% have a spacer smaller than 1 kb (fig. 2). Their copies are highly identical (fig. 3) and very small (data not shown). However, no correlation between spacer, identity, and length was detected (Table 2).
Two-thirds of the inverted repeats are located near the telomeres (one copy in each subtelomere), suggesting a peculiar history and a high exchange rate for these repeats. It has been suggested that all subtelomeres exhibit highly plastic dynamics in S. cerevisiae (Pryde, Gorham, and Louis 1997) and in H. sapiens (Coleman, Baird, and Royle 1999). Their importance in the interchromosomal iteration process was demonstrated in S. cerevisiae (Coissac, Maillier, and Netter 1997).
All these observations are consistent with what was described previously: the highly repeated gene families and the special status of subtelomeres in P. falciparum (Gardner et al. 1998; Bowman et al. 1999).
Do These Observations Mean that This Parasite Does Not Follow the Same Dynamics as the Other Eukaryotes?
P. falciparum is a human pathogenic parasite, the main agent of malaria. It has been reported that many bacterial pathogens exhibit a high redundancy level (Rocha, Danchin, and Viari 1999b), which has been related to high selective pressures for sequence variation. A large number of repeats allows many recombination events, leading to a high plasticity of the genome and hence to a high evolution rate. As in these bacteria, the high redundancy level of P. falciparum could be a consequence of its parasitism.
The quasi-absence of distant repeats and the absence of correlation indicate that almost all repeats are young. Thus, the absence of correlation would be due not to the absence of the underlying mechanism but to too short an evolutionary time. Population studies suggest that P. falciparum spread worldwide from a limited area (Rich and Ayala 2000). The absence of old repeats could be a consequence of the recent change in the ecological conditions of P. falciparum, associated with a burst of evolution. In conclusion, P. falciparum appears to follow the same iteration dynamics as the other eukaryotes. However, because it is a recent parasite, its chromosomes are more repeated than those of the other eukaryotes (as a result of parasitism), and there are almost no ancient repeats (because of its recent emergence).
How Tandem Repeats Can Be Turned into Spaced Repeats
Intrachromosomal repeats, in our model, are mostly created in tandem (by recombination between sister chromatids or by replication slippage) and are turned into distant repeats by chromosomal rearrangements. Analyzing all the final states after several rearrangements is difficult. However, it is interesting to examine all the theoretical states obtained after only one rearrangement event. Three kinds of rearrangement have been taken into account (fig. 4): deletion of a part of the tandem, insertion of a sequence inside the tandem repeat, and inversion taking away a piece of the tandem. The insertion process can result either from the insertion of a transposable element or from the repair of a double-strand break by sequence conversion (Voelkel and Roeder 1990). Small inversions have been suggested to explain the evolution of gene order between C. albicans and S. cerevisiae (Seoighe et al. 2000), highlighting their role in genome dynamics.
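The three single-event rearrangements can be made concrete with a toy sketch on short strings. It does not reproduce the cases of figure 4; the sequences, coordinates, and function names are invented for illustration, and the inversion is modeled as a reverse complement of the inverted segment.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def delete(seq: str, start: int, end: int) -> str:
    """Remove the segment seq[start:end]."""
    return seq[:start] + seq[end:]


def insert(seq: str, pos: int, fragment: str) -> str:
    """Insert `fragment` before position `pos`."""
    return seq[:pos] + fragment + seq[pos:]


def invert(seq: str, start: int, end: int) -> str:
    """Replace seq[start:end] by its reverse complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


# Toy tandem repeat: two adjacent copies of `unit` between two flanks.
left, unit, right = "TTTT", "ACGGA", "CCCC"
tandem = left + unit + unit + right

# Insertion between the copies yields a spaced direct repeat.
spaced_direct = insert(tandem, len(left) + len(unit), "GGGTTT")

# Inversion of the second copy plus the right flank yields an inverted
# repeat whose copies are separated by the inverted flank.
spaced_inverted = invert(tandem, len(left) + len(unit), len(tandem))

# Deletion of exactly one period, even when out of register with the
# copies, collapses the tandem back to a single copy.
collapsed = delete(tandem, 6, 11)
```

In this toy example, a single insertion or inversion is enough to turn a tandem repeat into a spaced direct or inverted repeat, which is the kind of remnant the analysis looks for.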
Results were sorted as a function of the number of paralogs detected in the chromosomes. For most spacers, no paralog was found. There are several possible reasons for this: (1) our criteria were very stringent, (2) the search was performed against the whole genome only for S. cerevisiae and C. elegans, and (3) we only detected paralogs for spacers resulting from a single recent rearrangement event. Multiparalog families (spacers with at least two paralogs) were treated separately because they give an idea of the relative transposition rate. All cases in which the spacer had only one paralog were analyzed more precisely with respect to the cases shown in figure 4.
As shown in Table 3, all possible remnants of the tandem rearrangement were detected in the sequence of chromosomes. These observations indicate that the theoretical rearrangements do occur in genome history, reinforcing the model of the iteration dynamics.
All generic duplications of nonspecific DNA regions within the same chromosome or between two chromosomes were referred to in this study as iteration. However, this iteration process should be divided into at least two distinct mechanisms. The first is the creation of tandem repeats (by sister chromatid exchange or replication slippage), which creates (under our model) most of the intrachromosomal repeats. The second is the genesis of repeats (inter- or intrachromosomal) by double-strand break repair. Indeed, this repair can lead to duplication when it is associated with a conversion mechanism. This implies that the duplication process can be divided into at least four mechanisms: abnormal chromosome segregation (hyperploidization); transposition (transposable elements); sister chromatid exchange, replication slippage, or both (tandem repeats and satellites); and double-strand break repair (iteration by conversion).
Conclusions |
---|
The density of repeats in number defines a genome style in which the evolution rate results from iteration, deletion, rearrangement, and mutation. This rate is similar for all chromosomes within the same genome and is specific to each species. The main exception is the X chromosome of C. elegans, which suggests that exchanges between homologous chromosomes are important in the genesis of repeats. Thus, we propose that the genesis of tandem repeats is, at least in part, a consequence of exchange between homologous chromosomes.
Finally, we brought out the remnants of rearrangements of tandem repeats into spaced repeats. This suggests that tandem repeats, which can be easily created, are subjected to rounds of chromosomal rearrangements leading to the pattern of repeats observed today. Hence, repeats can be used to follow chromosome rearrangements and are markers of genome dynamics.
Acknowledgements |
---|
Footnotes |
---|
Keywords: genome dynamics, evolution, duplication, eukaryotes
Address for correspondence and reprints: Guillaume Achaz, Structure et Dynamique des Génomes, Institut Jacques Monod, Tour 4344, 1° Étage, 4, Place Jussieu, 75251 Paris Cedex 05, France. achaz{at}ijm.jussieu.fr.
References |
---|
Achaz G., E. Coissac, A. Viari, P. Netter, 2000 Analysis of intrachromosomal duplications in yeast Saccharomyces cerevisiae: a possible model for their origin Mol. Biol. Evol 17:1268-1275
Adams M. D., S. E. Celniker, R. A. Holt, et al. (195 co-authors) 2000 The genome of Drosophila melanogaster Science 287:2185-2195
Amores A., A. Force, Y. L. Yan, et al. (13 co-authors) 1998 Zebrafish hox clusters and vertebrate genome evolution Science 282:1711-1714
Barnes T. M., Y. Kohara, A. Coulson, S. Hekimi, 1995 Meiotic recombination, noncoding DNA and genomic organization in Caenorhabditis elegans Genetics 141:159-179
Baudat F., A. Nicolas, 1997 Clustering of meiotic double-strand breaks on yeast chromosome III Proc. Natl. Acad. Sci. USA 94:5213-5218
Bernardi G., 2000 Isochores and the evolutionary genomics of vertebrates Gene 241:3-17
Blanc G., A. Barakat, R. Guyot, R. Cooke, M. Delseny, 2000 Extensive duplication and reshuffling in the Arabidopsis genome Plant Cell 12:1093-1101
Bowman S., D. Lawson, D. Basham, et al. (36 co-authors) 1999 The complete nucleotide sequence of chromosome 3 of Plasmodium falciparum Nature 400:532-538
Britten R. J., 1998 Precise sequence complementarity between yeast chromosome ends and two classes of just-subtelomeric sequences Proc. Natl. Acad. Sci. USA 95:5906-5912
Coissac E., E. Maillier, P. Netter, 1997 A comparative study of duplications in bacteria and eukaryotes: the importance of telomeres Mol. Biol. Evol 14:1062-1074
Coleman J., D. M. Baird, N. J. Royle, 1999 The plasticity of human telomeres demonstrated by hypervariable telomeres repeat array that is located on some copies of 16p and 16q Hum. Mol. Genet 8:1637-1646
Consortium. 1998 Genome sequence of the nematode C. elegans: a platform for investigating biology Science 282:2012-2018
Dunham I., N. Shimizu, B. A. Roe, et al. (239 co-authors) 1999 The DNA sequence of human chromosome 22 Nature 402:489-495
Friedman R., A. L. Hughes, 2000 Gene duplication and the structure of eukaryotic genomes Genome Res 11:373-381
Gardner M. J., H. Tettelin, D. J. Carucci, et al. (27 co-authors) 1998 Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum Science 282:1126-1132
Goffeau A., B. G. Barrell, H. Bussey, et al. (16 co-authors) 1996 Life with 6000 genes Science 274:546
Hattori M., A. Fujiyama, T. D. Taylor, et al. (62 co-authors) 2000 The DNA sequence of human chromosome 21 The chromosome 21 mapping and sequencing consortium. Nature 405:311-319
Henikoff S., 2000 Heterochromatin function in complex genomes Biochim. Biophys. Acta 1470:O1-O8
Holland P. W., 1999 Gene duplication: past, present and future Semin. Cell Dev. Biol 10:541-547
Hughes A. L., J. Da Silva, R. Friedman, 2001 Ancient duplication did not structure the human Hox-bearing chromosomes Genome Res 11:771-780
Jinks R. S., M. Michelitch, S. Ramcharan, 1993 Substrate length requirements for efficient mitotic recombination in Saccharomyces cerevisiae Mol. Cell. Biol 13:3937-3950
Karlin S., F. Ost, 1985 Maximal segmental match length among random sequences from a finite alphabet Pp. 225-243 in L. M. Le Cam and R. A. Olshen, eds. Proceedings of the Berkeley Conference in honor of Jerzy Neyman and Jack Kiefer, Vol. 1. Association for Computing Machinery, New York
Kurtz S., C. Schleiermacher, 1999 REPuter: fast computation of maximal repeats in complete genomes Bioinformatics 15:426-427
Leung M. Y., B. E. Blaisdell, C. Burge, S. Karlin, 1991 An efficient algorithm for identifying matches with errors in multiple long molecular sequences J. Mol. Biol 221:1367-1378
Lin X., S. Kaul, S. Rounsley, et al. (39 co-authors) 1999 Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana Nature 402:761-768
Llorente B., A. Malpertuy, C. Neuveglise, et al. (24 co-authors) 2000 Genomic exploration of the hemiascomycetous yeasts: 18. Comparative analysis of chromosome maps and synteny with Saccharomyces cerevisiae FEBS Lett 487:101-112
Masterson J., 1994 Stomatal size in fossil plants: evidence for polyploidy in majority of angiosperms Science 264:421-424
Mayer K., C. Schuller, R. Wambutt, et al. (234 co-authors) 1999 Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana Nature 402:769-777
Ohno S., 1970 Evolution by gene duplication Springer-Verlag, Heidelberg, Germany
Peeters B. P. H., J. H. De Boer, S. Bron, G. Venema, 1988 Structural plasmid instability in Bacillus subtilis: effect of direct and inverted repeats Mol. Gen. Genet 212:450-458
Pryde F. E., H. C. Gorham, E. J. Louis, 1997 Chromosome ends: all the same under their caps Curr. Opin. Genet. Dev 7:822-828
Rich S. M., F. J. Ayala, 2000 Population structure and recent evolution of Plasmodium falciparum Proc. Natl. Acad. Sci. USA 97:6994-7001
Robinson-Rechavi M., O. Marchand, H. Escriva, P. L. Bardet, D. Zelus, S. Hughes, V. Laudet, 2001 Euteleost fish genomes are characterized by expansion of gene families Genome Res 11:781-788
Rocha E. P., A. Danchin, A. Viari, 1999a Analysis of long repeats in bacterial genomes reveals alternative evolutionary mechanisms in Bacillus subtilis and other competent prokaryotes Mol. Biol. Evol 16:1219-1230
Rocha E. P., A. Danchin, A. Viari, 1999b Functional and evolutionary roles of long repeats in prokaryotes Res. Microbiol 150:725-733
Rubnitz J., S. Subramani, 1984 The minimum amount of homology required for homologous recombination in mammalian cells Mol. Cell Biol 4:2253-2258
Ruvkun G., O. Hobert, 1998 The taxonomy of developmental control in Caenorhabditis elegans Science 282:2033-2041
Semple C., K. H. Wolfe, 1999 Gene duplication and gene conversion in the Caenorhabditis elegans genome J. Mol. Evol 48:555-564
Seoighe C., N. Federspiel, T. Jones, et al. (20 co-authors) 2000 Prevalence of small inversions in yeast gene order evolution Proc. Natl. Acad. Sci. USA 97:14433-14437
Smith T. F., M. S. Waterman, 1981 Identification of common molecular subsequences J. Mol. Biol 147:195-197
Surzycki S. A., W. R. Belknap, 2000 Repetitive-DNA elements are similarly distributed on Caenorhabditis elegans autosomes Proc. Natl. Acad. Sci. USA 97:245-249
Vincens P., L. Buffat, C. Andre, J. P. Chevrolat, J. F. Boisvieux, S. Hazout, 1998 A strategy for finding regions of similarity in complete genome sequences Bioinformatics 14:715-725
Vision T. J., D. G. Brown, S. D. Tanksley, 2000 The origins of genomic duplications in Arabidopsis Science 290:2114-2117
Voelkel K., G. S. Roeder, 1990 Gene conversion tracts stimulated by HOT1-promoted transcription are long and continuous Genetics 126:851-867
Wolfe K. H., D. C. Shields, 1997 Molecular evidence for an ancient duplication of the entire yeast genome Nature 387:708-713