Abteilung für Pflanzenzüchtung und Ertragsphysiologie;
Zentrum zur Identifikation von Genfunktionen durch Insertionsmutagenese bei Arabidopsis thaliana (ZIGIA), Max-Planck-Institut für Züchtungsforschung, Köln, Germany
Abstract
Introduction
Comparative mapping in grasses and Solanaceae suggests that reorganization of NBS-LRR genes can occur rapidly (Leister et al. 1998; Pan et al. 2000). In extreme cases, copy number can vary widely among varieties of a particular species. Moreover, in different grass species, syntenic loci are frequently lost (Leister et al. 1998), although plant genomes may harbor clusters of highly dissimilar NBS-LRR genes, the so-called mixed clusters (Leister et al. 1998, 1999; Pan et al. 2000). Both interlocus recombination and divergent selection acting on duplicated genes have been suggested as major factors in the generation of gene diversity within existing clusters (Leister et al. 1998; Michelmore and Meyers 1998; Parniske and Jones 1999). Unlike these mechanisms, unequal crossing-over and gene conversion should cause sequence homogenization, and thus lead to concerted evolution of R genes within clusters (Michelmore and Meyers 1998; Parniske and Jones 1999; Young 2000).
The genome of the ecotype Col-0 of Arabidopsis thaliana (the first genome of a flowering plant to be completely sequenced) contains more than 150 NBS-LRR genes organized as isolated single genes and in tandem arrays (The Arabidopsis Genome Initiative 2000). We have reconstructed the mode of relatively recent R gene evolution in this plant by combining data on the genomic organization of the entire set of NBS-LRR genes with their phylogenetic analysis.
Materials and Methods
Sequence Mapping and Physical Clustering
Complete A. thaliana BAC clone sequences and the BAC sequence status tables were retrieved from MATDB and used to assemble pseudochromosomes containing the contiguous genomic sequence of the ecotype Col-0. The MIPS annotations of NBS-LRR proteins were used to extract the corresponding genomic DNA sequences, which were then BLASTed against all five pseudochromosomes for physical mapping. Linked NBS-LRR genes were grouped into clusters when they were not interrupted by more than eight other open reading frames (ORFs) encoding non-NBS-LRR proteins.
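The clustering rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `group_into_clusters`, the `(gene_id, is_nbs_lrr)` input format, and the decision to report only groups of two or more genes as clusters are assumptions made for illustration.

```python
from typing import List, Tuple


def group_into_clusters(
    genes: List[Tuple[str, bool]], max_gap: int = 8
) -> List[List[str]]:
    """Group NBS-LRR genes on one chromosome into physical clusters.

    genes   -- (gene_id, is_nbs_lrr) tuples in chromosomal order (assumed format)
    max_gap -- maximum number of intervening non-NBS-LRR ORFs allowed
               between two consecutive members of the same cluster
    """
    groups: List[List[str]] = []
    current: List[str] = []
    gap = 0  # non-NBS-LRR ORFs seen since the last NBS-LRR gene

    for gene_id, is_nbs_lrr in genes:
        if not is_nbs_lrr:
            gap += 1
            continue
        # Start a new group when too many other ORFs intervene.
        if current and gap > max_gap:
            groups.append(current)
            current = []
        current.append(gene_id)
        gap = 0

    if current:
        groups.append(current)

    # Assumption: isolated genes are single-gene loci, not clusters.
    return [group for group in groups if len(group) >= 2]
```

Under these assumptions, two NBS-LRR genes separated by exactly eight other ORFs fall into one cluster, whereas nine intervening ORFs split them into separate loci.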
Phylogeny and Sequence Analyses
Cluster-based Approach
Protein BLAST searches were carried out among all members of individual clusters, and the highest e-value found was assigned to this cluster and used as a threshold similarity level for a BLAST search against all other NBS-LRR protein sequences. The resulting BLAST matrix was then used to identify NBS-LRR protein sequences homologous to cluster members. Lists of related NBS-LRR genes for each of the 40 clusters were compared and grouped into clades. Sequence similarities on the amino acid and nucleotide levels were determined within and among gene clades using the Genetics Computer Group (GCG) (Devereux, Haeberli, and Smithies 1984) package.
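The thresholding step of the cluster-based approach can be sketched as follows. This is a simplified illustration, not the original pipeline: the names `cluster_threshold` and `related_sequences` are hypothetical, and the pairwise e-values are assumed to be symmetric and already stored in a dictionary keyed by unordered pairs, which real BLAST output only approximates.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Sequence, Set

# Assumed input: one e-value per unordered pair of protein identifiers.
EValues = Dict[FrozenSet[str], float]


def cluster_threshold(members: Sequence[str], evalues: EValues) -> float:
    """Return the highest (least significant) e-value among cluster members."""
    return max(evalues[frozenset(pair)] for pair in combinations(members, 2))


def related_sequences(
    members: Sequence[str], all_ids: Iterable[str], evalues: EValues
) -> Set[str]:
    """Find sequences outside the cluster that match any member at least as
    well as the two most dissimilar cluster members match each other."""
    threshold = cluster_threshold(members, evalues)
    member_set = set(members)
    related: Set[str] = set()
    for other in all_ids:
        if other in member_set:
            continue
        for member in members:
            # Pairs without a reported hit are treated as unrelated.
            if evalues.get(frozenset((member, other)), float("inf")) <= threshold:
                related.add(other)
                break
    return related
```

The lists returned for each cluster could then be compared with one another to group clusters and their relatives into clades, as described above.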
Universal Phylogenetic Tree
All 166 NBS-LRR protein sequences were aligned by CLUSTALW (Thompson, Higgins, and Gibson 1994), bootstrapped, and then subjected to parsimony and distance-matrix (observed differences and neighbor-joining) analyses (PAUP, V4b5 for Unix; Swofford 2000).
Phylogenetic Analyses of NBS-LRR Protein Sequences
Protein sequences present in the clades, identified by the two approaches, were aligned. Trees were inferred using protein maximum-likelihood with PROTML (MOLPHY; Adachi and Hasegawa 1996), using the JTT-F matrix with the neighbor-joining tree of ML distances as the starting topology and RELL bootstrapping (10^4). Members of different clades with similar N-termini (TIR or CC) were selected as outgroups.
Modeling of Random Sequence Sampling
The expectation values µ for the generation of heterogeneous clusters by random sequence sampling were determined according to the following equation:
[Equation not recovered from the source.]
Results
Clusters Can Sample Genes from Different Clades
The data obtained from the mapping of NBS-LRR genes and from phylogenetic analyses were combined, visually summarizing all information on relative gene location in the physical map, gene orientation, gene clustering, and assignment of cluster members to clades (fig. 3). Most gene clusters were found to contain members of only one clade, but 10 clusters were made up of genes from up to three different clades (heterogeneous clusters). Twenty-six clusters contained only two gene copies in either head-to-head or head-to-tail orientation. Clusters containing members of the same clade exhibited modules of common origin, such as the head-to-tail orientation of genes within the pink clusters 6, 7, 8, and 32, and head-to-head orientation within the dark-green/light-green clusters 12, 18, 24, and 33. Furthermore, large clusters, such as the pink cluster 1 with four genes and the dark-green/light-green cluster 34 with six genes, were easily interpreted as amplification products of two-gene modules. The three clusters 11, 21, and 37 had up to 10 gene copies, mostly organized in a head-to-tail orientation. Seven heterogeneous clusters contained a module comprising a dark-green/light-green gene pair oriented head-to-head. The heterogeneous cluster 28 contained a yellow head-to-head module and a red gene, whereas cluster 38 consisted of one orphan gene next to two genes from the pink clade.
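The orientation terms used for two-gene modules can be made concrete with a short sketch. It assumes that each gene is annotated with a strand ("+" or "-") and that "head" denotes the 5' end of a gene; the function name `pair_orientation` is hypothetical, and the tail-to-tail case is included only for completeness.

```python
def pair_orientation(left_strand: str, right_strand: str) -> str:
    """Classify two adjacent genes by their relative orientation.

    left_strand, right_strand -- "+" or "-" for the gene at the lower and
    higher chromosomal coordinate, respectively (assumed annotation format).
    """
    for strand in (left_strand, right_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {strand!r}")

    if left_strand == right_strand:
        # Both genes point the same way: the tail of one faces the head of the other.
        return "head-to-tail"
    if left_strand == "-" and right_strand == "+":
        # Divergent pair: the two 5' ends are adjacent.
        return "head-to-head"
    # Convergent pair: the two 3' ends are adjacent.
    return "tail-to-tail"
```

For example, `pair_orientation("-", "+")` classifies a divergent pair as head-to-head, the arrangement described for the dark-green/light-green modules.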
Expectation values for the number of heterogeneous clusters derived from random sequence sampling were calculated as described in Materials and Methods. Three different cluster sizes having decreasing numbers of total genes in the cluster (i.e., 10, 5, and 2) were tested. The prediction was that 9.5 heterogeneous clusters should exist with 10 genes, or 4.3 with a total of 5 genes, or 1.1 with 2 genes. The actual numbers in the Col-0 genome were 6, 5, and 3, respectively, implying that most heterogeneous clusters containing solely NBS-LRR genes were not derived from random sampling. On the basis of these results, we conclude either that a mechanism selectively sampling NBS-LRR genes of different clades is active or that positive selection acts on heterogeneous clusters generated by random sequence sampling.
Phylogeny Meets Genomics
To reconstruct the evolution of NBS-LRR genes, we considered all recent gene duplication events revealed by terminal branchings and, in addition, some recent duplications indicated by subterminal branchings in the trees of figure 2b–j. These events were also included in the combined map of figure 3, as indicated by black highlighting and designation of the duplicated copy in parentheses or by a line joining the genes concerned. Presumably recent intra- or interchromosomal rearrangements that increased gene number were indicated in 50 cases. Sixteen were recent rearrangements of single-gene loci involving duplication coupled to remobilization of the gene to another locus. Twenty-six duplication events solely concerned gene clusters, indicating recent enlargement of these loci. In some cases, duplication and remobilization of entire clusters was evident. Thus, the yellow cluster 19 was formed by duplication of a part of cluster 11 or vice versa (fig. 2j). Duplication and remobilization is also strongly supported by the fact that the relative orientation of the three genes involved is the same in both clusters ([tail-head gene 1][head-tail gene 2][tail-head gene 3]; fig. 3). Further examples of duplication and remobilization of entire clusters include the yellow clusters 2 and 28, and the heterogeneous clusters 33 and 34. Precise reconstruction of the phylogeny of clusters 12, 18, 22, and 34 is hampered because they are closely related. The heterogeneous clusters 22 and 25 contain genes most closely related to members of other adjacent clusters (fig. 3), suggesting that they originated from recent intrachromosomal recombination events.
NBS-LRR Gene Rearrangements Break Gene Order
The positions of NBS-LRR gene rearrangements were correlated with the duplicated chromosomal segments present in the Col-0 genome, proposed to be the result of an ancestral duplication of the progenitor genome that took place about 112 Myr ago (Ku et al. 2000), which was followed by subsequent chromosomal rearrangements (The Arabidopsis Genome Initiative 2000). The 24 large duplicated segments of ≥100 kbp make up 65.6 Mbp (58%) of the genome (Lin et al. 1999; Mayer et al. 1999; Blanc et al. 2000; The Arabidopsis Genome Initiative 2000) and harbor 126 of the 166 NBS-LRR genes detected. Recent NBS-LRR gene rearrangements could not be accounted for based on the positions and extents of segmental duplications. However, such large segmental duplications may have given rise to as many as 30 NBS-LRR rearrangements (data not shown). To identify additional, smaller, segmental duplications, we performed pairwise sequence similarity searches for the next 10 genes (five on each side) flanking each rearranged NBS-LRR gene locus. Only four recent events on chromosomes 1, 3, and 4 (as indicated by gray highlighting or by a dotted line joining the genes concerned in fig. 3), involving duplications of at least one gene tightly linked to NBS-LRR loci, were detected (cluster 8 and the single-gene locus 1g62630, single-gene loci 1g10920 and 1g53350, single-gene loci 1g52660 and 3g15700, one gene copy each from clusters 20 and 22).
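The flanking-gene comparison can be sketched in Python as follows. The sketch rests on several assumptions: genes are given as an ordered list of identifiers per chromosome, a locus is addressed by the list indices of its first and last gene, and the similarity test is supplied by the caller (for example, a wrapper around precomputed BLAST results). The names `flanking_genes` and `shared_flanking_pairs` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def flanking_genes(
    gene_order: Sequence[str],
    locus_start: int,
    locus_end: int,
    n_each_side: int = 5,
) -> List[str]:
    """Return up to n_each_side genes on either side of a locus.

    locus_start and locus_end are inclusive indices into gene_order,
    so a single-gene locus has locus_start == locus_end.
    """
    left = list(gene_order[max(0, locus_start - n_each_side):locus_start])
    right = list(gene_order[locus_end + 1:locus_end + 1 + n_each_side])
    return left + right


def shared_flanking_pairs(
    flank_a: Sequence[str],
    flank_b: Sequence[str],
    is_similar: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """List pairs of flanking genes from two loci that pass the similarity test."""
    return [
        (a, b)
        for a in flank_a
        for b in flank_b
        if a != b and is_similar(a, b)
    ]
```

A non-empty result for two rearranged loci would point to a small segmental duplication that carried neighboring genes along with the NBS-LRR gene.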
Discussion
Gene amplification followed by unequal crossing-over was previously postulated to be the major mechanism involved in the generation of tandem and dispersed R gene families (Shepherd and Mayo 1972; Hammond-Kosack and Jones 1997; Holub 1997; Hulbert et al. 1997; Michelmore and Meyers 1998; Ellis, Dodds, and Pryor 2000), but no comprehensive treatment of this large gene family as the product of cycles of repeated gene rearrangements (The Arabidopsis Genome Initiative 2000) has yet been attempted. We provide such an analysis. The combination of physical and phylogenetic analyses of the NBS-LRR genes of the Col-0 ecotype of A. thaliana makes it possible to detect, besides ancient events, relatively recent gene rearrangements. The analysis confirms that NBS-LRR genes are organized in single-gene loci, clusters, and superclusters, and that, as described previously (Meyers et al. 1999; Pan, Wendel, and Fluhr 2000), mixed clusters containing both TIR- and CC-type genes do not occur. Nine NBS-LRR gene clades and a few phylogenetic orphans can be recognized.
Their phylogeny is reflected in the physical organization of clusters: about three-quarters of the 40 clusters contain only genes from the same phylogenetic lineage; clusters made up of similar genes almost always have identical structures, and large gene clusters most probably originated from simpler modules (figs. 1–3). These events have previously been interpreted in terms of exchange of sequence blocks as a result of intralocus recombination (McDowell et al. 1998; Noel et al. 1999; Ellis, Dodds, and Pryor 2000), a mechanism that has the capacity to alter the numbers of genes in clusters and to generate paralogous chromosomal loci with a highly variable number of LRR units (Noel et al. 1999).
The pattern of R gene organization, summarized in figures 1–3, suggests that after an ancient event which generated CC- and TIR-type classes, a few ancestral genes underwent local amplification, leading to tandem gene pairs, which could have been broken up by chromosomal translocations or by other types of gene relocation (see discussion that follows). Tandem gene pairs appear to have been amplified to form larger clusters or novel cluster loci, but only nine lineages expanded significantly, thus generating the nine contemporary gene clades. The other lineages had a more limited expansion capacity: in four cases, expansion stopped after the first duplication, leading to phylogenetically isolated tandem gene pairs (orphan clusters 3, 14, 23, 35, and 40), whereas the remaining genes developed into, or were maintained as, single-gene loci. Concomitantly, a contraction phase might have reduced the sizes of specific clades. Some orphan genes are expressed, and one orphan single-gene locus is functional (Rpm1; Grant et al. 1995), demonstrating that presumably ancient genes can still be active. Because the 20 orphans can be assigned to three TIR- and five CC-type lineages, and the nine ancient progenitors of the large contemporary NBS-LRR gene clades comprise seven TIR- and two CC-type lineages, the entire set of NBS-LRR genes in Col-0 may derive from as few as 10 TIR- and 7 CC-type progenitors.
Although the large segmental duplications in the Arabidopsis genome increased the number of NBS-LRR genes by about 30, the 50 rearrangements of NBS-LRR gene loci considered in our analyses are by definition recent, and thus are not associated with the duplications of large chromosomal segments. Those recent gene rearrangements increased the number of NBS-LRR genes by about 50, with about 20 genes generated by intralocus rearrangements and 30 by duplication followed by translocation. Indeed, such recent NBS-LRR rearrangements interrupt the colinearity of gene order in duplicated chromosomal fragments, and this is compatible with the lack of synteny of NBS-LRR gene loci that is also observed when different species are compared (Leister et al. 1998). However, previous analyses of R gene clusters revealed that genes separated by speciation but occupying allelic positions within clusters can be more similar than duplicated sequences within a cluster (Michelmore and Meyers 1998). This was interpreted as evidence for a birth-and-death model (Michelmore and Meyers 1998), claiming that (1) divergent selection acting on duplicated genes is the major mechanism underlying the generation of R gene variation and (2) in the generation of R gene variation, intergenic unequal crossing-over and gene conversions are not the primary mechanisms. In contrast, analyses of the genomic organization of cereal NBS-LRR genes (Leister et al. 1998, 1999) and of the Hcr9 gene cluster in tomato (Parniske and Jones 1999) suggested that ectopic recombination (Leister et al. 1998) (facilitated by molecular mechanisms poorly understood as yet) and interlocus recombination events (Parniske and Jones 1999) have been important contributors to R gene cluster heterogeneity. The latter types of rearrangements are difficult to accommodate within the context of conventional evolutionary concepts, but our analyses demonstrate that such mechanisms, which allow sequence sampling between different R gene loci, must exist.
Besides homogeneous clusters, we have found 10 heterogeneous ones, and the sequences within these clusters belong to the major gene clades. The possibility that heterogeneous clusters all derive from diversification within homogeneous clusters must be rejected because it would imply that heterogeneous progenitor clusters gave rise to the present clades. This cannot be the case, simply because orphans appear phylogenetically older than genes grouped in clusters (fig. 2a). Our data rather show that heterogeneous clusters probably derive from chromosomal translocation or gene-cluster remobilization events that brought together sequences from different clades. This type of gene reorganization is difficult to explain other than in terms of a positive selection for cluster complexity, operative in the presence of pathogens. Some of these events may have taken place early during higher plant evolution, but they nevertheless generated modules that have been duplicated and remobilized more recently. This process resulted in the contemporary situation in which cluster heterogeneity is a major component of NBS-LRR gene complexity, possibly providing the starting material for the generation of new resistance specificities by recombination of diverse NBS-LRR genes. For large clusters, a role for divergent selection acting on duplicated genes in the generation of cluster heterogeneity remains possible, but our data demonstrate that it cannot be considered as the only relevant mechanism in this process.
Is the scenario of amplification and rearrangement described earlier unique for NBS-LRR genes, or does it reflect common principles in genomic evolution in A. thaliana? Because other gene families within the A. thaliana genome exhibit a similarly complex organization (examples include the genes encoding receptor-like kinases, the genes for cytochrome P450-like proteins, and the other classes of R-like genes [The Arabidopsis Genome Initiative 2000]), it will be interesting to test whether similar mechanisms underlie their evolution.
Acknowledgements
Footnotes
Keywords: Arabidopsis thaliana, evolution, gene cluster, genome, resistance gene
Address for correspondence and reprints: Dario Leister, Max-Planck-Institut für Züchtungsforschung, Carl-von-Linné Weg 10, 50829 Köln, Germany. leister{at}mpiz-koeln.mpg.de.
References
Adachi J., M. Hasegawa, 1996 MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood Comput. Sci. Monogr 28:1-150
Altschul S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D. J. Lipman, 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Nucleic Acids Res 25:3389-3402
Baker B., P. Zambryski, B. Staskawicz, S. P. Dinesh-Kumar, 1997 Signaling in plant-microbe interactions Science 276:726-733
Blanc G., A. Barakat, R. Guyot, R. Cooke, M. Delseny, 2000 Extensive duplication and reshuffling in the Arabidopsis genome Plant Cell 12:1093-1101
Devereux J., P. Haeberli, O. A. Smithies, 1984 A comprehensive set of sequence analysis programs for the VAX Nucleic Acids Res 12:387-395
Ellis J., P. Dodds, T. Pryor, 2000 Structure, function and evolution of plant disease resistance genes Curr. Opin. Plant Biol 3:278-284
Grant M. R., L. Godiard, E. Straube, T. Ashfield, J. Lewald, A. Sattler, R. W. Innes, J. L. Dangl, 1995 Structure of the Arabidopsis RPM1 gene enabling dual specificity disease resistance Science 269:843-846
Hammond-Kosack K. E., J. D. G. Jones, 1997 Plant disease resistance genes Annu. Rev. Plant Physiol. Plant Mol. Biol 48:575-607
Holub E. B., 1997 Organization of resistance genes in Arabidopsis Pp. 5-26 in I. R. Crute, E. B. Holub, J. J. Burdon, eds. The gene-for-gene relationships in plant-parasite interactions. British Society for Plant Pathology, CAB International, Oxon, U.K
Hulbert S., T. Pryor, G. Hu, T. Richter, J. Drake, 1997 Genetic fine structure of resistance loci Pp. 27-43 in I. R. Crute, E. B. Holub, J. J. Burdon, eds. The gene-for-gene relationships in plant-parasite interactions. British Society for Plant Pathology, CAB International, Oxon, U.K
Kanazin V., L. F. Marek, R. C. Shoemaker, 1996 Resistance gene analogs are conserved and clustered in soybean Proc. Natl. Acad. Sci. USA 93:11746-11750
Kobe B., J. Deisenhofer, 1993 Crystal structure of porcine ribonuclease inhibitor, a protein with leucine-rich repeats Nature 366:751-756
Ku H. M., T. Vision, J. Liu, S. D. Tanksley, 2000 Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny Proc. Natl. Acad. Sci. USA 97:9121-9126
Lagudah E. S., O. Moullet, R. Appels, 1997 Map-based cloning of a gene sequence encoding a nucleotide-binding domain and a leucine-rich region at the Cre3 nematode resistance locus of wheat Genome 40:659-665
Leister D., A. Ballvora, F. Salamini, C. Gebhardt, 1996 A PCR-based approach for isolating pathogen resistance genes from potato with potential for wide application in plants Nat. Genet 14:421-429
Leister D., J. Kurth, D. A. Laurie, M. Yano, T. Sasaki, K. Devos, A. Graner, P. Schulze-Lefert, 1998 Rapid reorganization of resistance gene homologues in cereal genomes Proc. Natl. Acad. Sci. USA 95:370-375
Leister D., J. Kurth, D. A. Laurie, M. Yano, T. Sasaki, A. Graner, P. Schulze-Lefert, 1999 RFLP- and physical mapping of resistance gene homologues in rice (O. sativa) and barley (H. vulgare) Theor. Appl. Genet 98:509-520
Lin X., S. Kaul, S. Rounsley, et al. (37 co-authors) 1999 Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana Nature 402:761-768
Mayer K., C. Schuller, R. Wambutt, et al. (229 co-authors) 1999 Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana Nature 402:769-777
McDowell J. M., M. Dhandaydham, T. A. Long, M. G. Aarts, S. Goff, E. B. Holub, J. L. Dangl, 1998 Intragenic recombination and diversifying selection contribute to the evolution of downy mildew resistance at the RPP8 locus of Arabidopsis Plant Cell 10:1861-1874
Meyers B. C., A. W. Dickerman, R. W. Michelmore, S. Sivaramakrishnan, B. W. Sobral, N. D. Young, 1999 Plant disease resistance genes encode members of an ancient and diverse protein family within the nucleotide-binding superfamily Plant J 20:317-332
Michelmore R. W., B. C. Meyers, 1998 Clusters of resistance genes in plants evolve by divergent selection and a birth-and-death process Genome Res 8:1113-1130
Morel J. B., J. L. Dangl, 1997 The hypersensitive response and the induction of cell death in plants Cell Death Differ 4:671-683
Noel L., T. L. Moores, E. A. van Der Biezen, M. Parniske, M. J. Daniels, J. E. Parker, J. D. Jones, 1999 Pronounced intraspecific haplotype divergence at the RPP5 complex disease resistance locus of Arabidopsis Plant Cell 11:2099-2112
Pan Q., Y. S. Liu, O. Budai-Hadrian, M. Sela, L. Carmel-Goren, D. Zamir, R. Fluhr, 2000 Comparative genetics of nucleotide binding site-leucine rich repeat resistance gene homologues in the genomes of two dicotyledons: tomato and Arabidopsis Genetics 155:309-322
Pan Q., J. Wendel, R. Fluhr, 2000 Divergent evolution of plant NBS-LRR resistance gene homologues in dicot and cereal genomes J. Mol. Evol 50:203-213
Parniske M., J. D. Jones, 1999 Recombination between diverged clusters of the tomato Cf-9 plant disease resistance gene family Proc. Natl. Acad. Sci. USA 96:5850-5855
Ronald P. C., 1998 Resistance gene evolution Curr. Opin. Plant Biol 1:294-298
Shepherd K. W., G. M. E. Mayo, 1972 Genes conferring specific plant disease resistance Science 175:375-380
Speulman E., D. Bouchez, E. B. Holub, J. L. Beynon, 1998 Disease resistance gene homologs correlate with disease resistance loci of Arabidopsis thaliana Plant J 14:467-474
Spielmeyer W., M. Robertson, N. Collins, D. Leister, P. Schulze-Lefert, S. Seah, O. Moullet, E. S. Lagudah, 1998 A superfamily of disease resistance gene analogs is located on all homologous chromosome groups of wheat (Triticum aestivum) Genome 41:782-788
Staskawicz B. J., F. M. Ausubel, B. J. Baker, J. G. Ellis, J. D. Jones, 1995 Molecular genetics of plant disease resistance Science 268:661-667
Swofford D. L., 2000 Phylogenetic analysis using parsimony (*and other methods) Version 4. Sinauer Associates, Sunderland, Mass
The Arabidopsis Genome Initiative. 2000 Analysis of the genome sequence of the flowering plant Arabidopsis thaliana Nature 408:796-815
Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680
Traut T. W., 1994 The functions and consensus motifs of nine types of peptide segments that form different types of nucleotide-binding sites Eur. J. Biochem 222:9-19
Whitham S., S. P. Dinesh-Kumar, D. Choi, R. Hehl, C. Corr, B. Baker, 1994 The product of the tobacco mosaic virus resistance gene N: similarity to toll and the interleukin-1 receptor Cell 78:1101-1115
Young N. D., 2000 The genetic architecture of resistance Curr. Opin. Plant Biol 3:285-290
Yu Y. G., G. R. Buss, M. A. Maroof, 1996 Isolation of a superfamily of candidate disease-resistance genes in soybean based on a conserved nucleotide-binding site Proc. Natl. Acad. Sci. USA 93:11751-11756