* Institute of Cell, Animal and Population Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, United Kingdom
Department of Ecology and Genetics, University of Aarhus, Ny Munkegade, Denmark
Correspondence: E-mail: deborah.charlesworth{at}ed.ac.uk.
Abstract
Key Words: self-incompatibility, Arabidopsis lyrata, linkage disequilibrium, polymorphism, kinase domain, gene conversion
Introduction
The very high diversity in the S-domain makes it difficult to evaluate the relative effects of recombination, gene conversion, selection, and genetic drift on its sequence evolution. Saturation hinders detection of genetic exchange, and the fact that even the most similar alleles differ at more than 30% of sites in the S-domain (Schierup et al. 2001) makes it impossible to estimate the proportion of sites determining the specificity. Here we approach these problems by (1) studying the intracellular kinase domain and intervening introns to get much longer sequences and (2) sequencing several copies of each specificity.
Although the kinase domain is unlikely to be involved in determining specificity, polymorphism in this domain is also of great interest. The kinase domain probably functions in initiating the cascade of events triggered by the recognition of incompatible pollen, so its sequence evolution may be constrained, and the level of nonsynonymous polymorphism should be lower than in the S-domain. Lower diversity and longer sequences should make diversity patterns clearer, in particular allowing us to test whether synonymous variability declines with distance from the S-domain HV regions. If only the S-domain functions in specificity, neutral diversity should decline with distance from it if recombination occurs. Studies of a more conserved region such as the kinase domain will also allow better tests for recombination in the S-locus using patterns of linkage disequilibrium, since such tests are difficult if the sequence alignment is uncertain (Awadalla and Charlesworth 1999).
A study of exons 2 to 5 of eight different Brassica oleracea SRK alleles suggests lower polymorphism in intron 4 than in introns 2 and 3 (Nishio et al. 1997). These data do not show whether variability declines consistently across the kinase domain, as no quantitative analysis of diversity has been attempted, and the apparent decrease could be due solely to the high variability of intron 2. This intron includes many insertions and deletions (indels), making alignment uncertain, and it may have unusual mutational properties (it is adjacent to a transmembrane domain that has high AT content and may be under different selective constraints from other parts of the protein).
Theoretical Expectations for Intraallelic and Interallelic Haplotype Structure
To understand the evolution of S-allele sequences, we must understand the factors that affect substitutions within and between these alleles. Polymorphisms at a locus under balancing selection, such as the S-locus, can be divided into variants that cause the functional differences and are themselves under balancing selection (these variants define different allelic classes/specificities) and neutral or weakly selected variants that are associated with the different allelic classes. Such associations are expected to persist for fairly long evolutionary times, because variants arising in a member of a given allelic class may increase in frequency and be fixed within the given class by genetic drift, but only recombination and/or gene conversion between allelic classes allow their movement to a different allele class. Like population subdivision, this can lead to sequence divergence between alleles of different functional classes and thus to linkage disequilibrium for neutral variants in a linked region (e.g., Strobeck 1983; Hudson and Kaplan 1988; Charlesworth, Nordborg, and Charlesworth 1997; McVean 2001). The degree of sequence differentiation, and the extent of the region of linkage disequilibrium, between functional allelic classes depends on the recombination frequency relative to the rate of mutational input of variants that do not affect the functional differences between the alleles. If recombination is absent or very infrequent, extreme haplotype structure may evolve, and neutral variants will often be found at intermediate frequencies in the set of alleles taken as a whole. Thus, only a minority of the variants in S-alleles, and the nearby genomic regions, may be maintained by balancing selection, and much of the diversity is probably a consequence of linkage to these few selected sites. 
The nonselected variants, including synonymous and intron polymorphisms, will nevertheless betray the operation of balancing selection, first through their high diversity and second because tests such as Tajima's test (Tajima 1993) will detect an excess of variants at intermediate frequencies.
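As a minimal sketch of the kind of frequency-based test mentioned above, the Python function below computes the standard Tajima's D statistic from aligned, gap-free sequences. It is an illustration only, not the software used in this study; the function name and the toy sequences are hypothetical. A positive D indicates more intermediate-frequency variants than expected under neutrality (as predicted near targets of balancing selection), and a negative D indicates an excess of rare variants.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length, gap-free sequences (a sketch)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    # Number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    # Mean number of pairwise differences per sequence pair (not per site).
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard normalising constants for the variance of (pi - S/a1).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Two divergent haplotype classes at intermediate frequency: D > 0.
print(tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"]))
# Only singleton variants: D < 0.
print(tajimas_d(["AAAAAA", "TAAAAA", "ATAAAA", "AATAAA", "AAATAA", "AAAATA"]))
```

The first toy data set mimics the deep haplotype structure expected under balancing selection; the second mimics an excess of rare variants.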
Because there are many different S-alleles, several polymorphic amino acids must be involved in recognition functions (perhaps at least as many sites as the number of alleles, or fewer if different specificities depend on combinations of amino acids; two alternative amino acids at each of six positions could potentially yield $2^6 = 64$ alleles). If the specificity-determining amino acids occur throughout the S-domain, the region of increased diversity within the locus could thus be extensive (Navarro and Barton 2002; Nordborg and Innan 2003), rather than forming the sharp peaks expected when only a few alleles are maintained (for example, at an allozyme locus where a single amino acid may distinguish the variants [see Hudson and Kaplan 1988]).
Divergence of S-allele sequences will not be restricted to synonymous differences. Even nonsynonymous mutations may drift to high frequency or fixation within classes if alleles do not recombine. Because of the strong selection maintaining a given specificity, an allelic class might accumulate deleterious substitutions without being lost from the population, especially because homozygotes are rarely formed, except for the most recessive S-alleles; recessive deleterious mutations are therefore not selectively eliminated. Thus, numerous amino acid differences in addition to those involved in determining the incompatibility types could accumulate, provided they are not strongly detrimental to the protein's function. The presence of such variants would then lead to minor fitness differences between allelic classes, which may be detectable from deviations from expected equilibrium frequencies in population surveys.
These processes are accentuated because the maintenance of many different incompatibility types in a species implies low effective population sizes for individual allelic classes. In a gametophytic incompatibility system, the effective size of an allelic class in a panmictic population is approximately $fN$, where $N$ is the species' effective population size and $f$ is the harmonic mean of the frequency of the allelic class (Vekemans and Slatkin 1994). This effective size is always smaller than $N/n_e$, where $n_e$ is the effective number of alleles. In sporophytic systems, more recessive alleles have larger than average effective population sizes (Uyenoyama 2000), but these are still much smaller than the species' effective population size.
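The $fN$ approximation above can be illustrated with a short Python sketch. The frequency trajectory and species effective size below are hypothetical numbers chosen only for illustration; the point is that the harmonic mean is dominated by the generations in which the allelic class is rare, so it never exceeds the arithmetic mean frequency. The effective number of alleles is computed with the usual definition $n_e = 1/\sum_i p_i^2$, which is an assumption about how $n_e$ is defined here.

```python
from statistics import harmonic_mean, mean


def allelic_class_ne(freq_trajectory, n_species):
    """Approximate effective size f*N of one allelic class, where f is the
    harmonic mean of the class's frequency over the sampled generations."""
    return harmonic_mean(freq_trajectory) * n_species


def effective_number_of_alleles(freqs):
    """Effective number of alleles, n_e = 1 / sum(p_i^2) (assumed definition)."""
    return 1 / sum(p * p for p in freqs)


# Hypothetical frequencies of one S-allele class in four generations.
trajectory = [0.10, 0.05, 0.08, 0.12]
N = 10_000
print(allelic_class_ne(trajectory, N))  # about 787
print(mean(trajectory) * N)             # about 875 with the arithmetic mean
# Ten equally frequent alleles give n_e of (approximately) 10.
print(effective_number_of_alleles([0.1] * 10))
```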
The allelic classes' low effective sizes also imply low diversity among different members within each allelic class. Self-incompatible plant populations are, however, usually subdivided into more or less isolated demes, which will allow differentiation to build up within allelic classes between populations. Although individual alleles may often be lost from a given deme, the strong advantage of rare S-alleles means that, unless the deme is completely cut off from migration, any lost allele that is still present in other populations will usually be restored. Differentiation between populations is thus predicted to be lower at loci under balancing selection, such as S-loci, than at other loci (Schierup, Vekemans, and Charlesworth 2000). On the other hand, extinction of allelic lineages in subpopulations and recolonization from other populations will further reduce the effective population sizes of allelic classes relative to those of neutral reference genes, similar to the effects in a metapopulation (Wade and McCauley 1988; Pannell and Charlesworth 1999; Schierup, Vekemans, and Charlesworth 2000). Thus, whereas allelic lineages themselves are ancient and overall variability at these loci is high, the alleles within a given lineage may have a recent common ancestor, so little polymorphism is expected within allelic classes.
In Brassica, there have been a few comparisons between sequences of the same functional S-allele from independent individuals. SRK S-domain sequences have been compared between pairs of haplotypes with the same incompatibility types. A pair of recessive (class II) B. oleracea S2 alleles differ by 31/856 amino acids (97.3% amino acid identity), and a pair of the more dominant S13 alleles by 11/856 amino acids (99.8% amino acid identity) plus a 2-bp indel (Kusaba et al. 2000). In a larger sample of B. oleracea SLG and SRK sequences from kale, broccoli, and Brussels sprout cultivars, alleles with the S2 specificity formed two haplotypes. These have different linked SLG locus sequences (SLGa or SLGb, differing at up to 12% of amino acid sites) as well as smaller differences in their SRK sequences, all of them outside the hypervariable regions (Miege et al. 2001), suggesting that the different S2 alleles have existed for prolonged evolutionary times. In wild species, it is difficult to collect sets of alleles with the same specificity, and the only such data for gametophytic S-alleles come from a study of one allele in Papaver rhoeas (Lawrence et al. 1993).
Here, we describe a study of SRK allele sequences from natural populations of A. lyrata. We provide sequences of a greater length of the kinase domain than has previously been available and analyze the sequence results from a set of 18 alleles, including within-allelic class variation for nine of them, to test the three questions outlined above: whether variability declines with distance from the S-domain HV regions, whether this is due to recombination in the S-locus, and whether patterns of diversity within and between allelic classes suggest recombination.
Materials and Methods
Cloning and Sequencing
All Aly13 sequence subtypes were cloned from PCR products using TOPO XL (Invitrogen, San Diego, Calif.). DNA sequencing was performed on an ABI 377 automatic sequencing machine using either Big Dye (Applied Biosystems, Foster City, Calif.) or Dyenamic (Amersham Biosciences). Sequences were checked manually for accurate base calling using Sequencher (Gene Codes Corporation). Details of the combinations of sequencing primers used for each allele can be obtained from the authors.
To compare sequences of the same allele from different individuals with known Aly13 subtypes, we studied two sequences of the same Aly13 subtype from the same population of origin and one from a different natural population. As will be seen, sequences of a given subtype scored in this way were almost identical in different individual plants. To ensure that the few differences between sequences of the same Aly13 subtype were not caused by errors in the sequencing procedure, at least three clones were sequenced from the PCR product of each individual for a given Aly13 subtype. This gives an upper estimate of the number of differences between sequences of the same subtype (counting all observed variants) and a lower estimate (discounting variants seen in only one clone).
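The two counts described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the input is assumed to be all clone sequences for one subtype, already aligned to equal length, with "-" and "N" treated as missing; the function name and the toy clones are hypothetical.

```python
from collections import Counter


def subtype_differences(clones, missing="-N"):
    """Return (upper, lower) counts of variable columns among aligned clones.

    upper: columns where more than one base is observed in any clone.
    lower: the same, after discarding bases seen in only one clone
           (singletons are treated as possible PCR or sequencing errors).
    """
    upper = lower = 0
    for column in zip(*clones):
        counts = Counter(base for base in column if base not in missing)
        if len(counts) > 1:
            upper += 1
        if sum(1 for c in counts.values() if c >= 2) > 1:
            lower += 1
    return upper, lower


# Three clones from each of three plants carrying the same (hypothetical) subtype.
clones = [
    "ACGTACGT", "ACGTACGT", "ACGTACCT",  # plant 1: one clone has a singleton G->C
    "ACGTACGT", "ACGTACGT", "ACGTACGT",  # plant 2
    "ACGAACGT", "ACGAACGT", "ACGAACGT",  # plant 3: T->A seen in all its clones
]
print(subtype_differences(clones))  # (2, 1)
```

The singleton in plant 1 counts only toward the upper estimate, whereas the variant shared by all clones of plant 3 counts toward both.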
Sequence Analysis
The nucleotide sequences were aligned using ClustalX version 1.81 (Jeanmougin et al. 1998), and manual adjustments were performed using the SeAl 1.0 sequence editor (http://evolve.zoo.ox.ac.uk/software/Se-Al/main.html). Intron-exon boundaries were determined by aligning our sequences with the published SRKa and SRKb cDNA sequences (Kusaba et al. 2001 [GenBank accession numbers AB052755 and AB052756, respectively]). This was verified using the program Splice Site Prediction by Neural Network (http://www.fruitfly.org/seq_tools/splice.html).
Neighbor-joining and minimum-evolution trees (both using Jukes-Cantor correction) were generated with MEGA version 2.1 (Kumar et al. 2000), using the Nei-Gojobori (1986) method. Trees were estimated for both synonymous and nonsynonymous sites, using the joint alignment of the Aly13 sequences from A. lyrata and several A. thaliana S-domain gene sequences. The A. thaliana genes ARK1, ARK2, and ARK3 encode S-domain proteins that are thought not to be involved in self-incompatibility (GenBank accession numbers M80238, AY045777, and AL031187, respectively) and were included to root the trees; T6K22.100 is the putative ortholog of the Aly13 gene (in A. thaliana, T6K22.100 is a pseudogene [Kusaba et al. 2001]).
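The Jukes-Cantor correction converts an observed proportion $p$ of differing sites into an estimated number of substitutions per site, $d = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$. The Python sketch below shows only this correction step; the Nei-Gojobori partitioning into synonymous and nonsynonymous sites is not reproduced, and the function name is hypothetical. For an illustrative $p$ of 0.45 the corrected distance is about 0.69, which shows how strongly saturated such comparisons are.

```python
from math import log


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * log(1 - 4 * p / 3)


for p in (0.05, 0.10, 0.45):
    print(p, round(jukes_cantor(p), 3))
```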
The nucleotide diversity among the sequences and divergence between Aly13 subtypes from different populations were estimated using the program MEGA (Kumar et al. 2000). Nucleotide positions with indels were removed from the analysis, and most of our analyses were restricted to the exons because the introns were largely unalignable between allelic classes (see below).
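Removing indel-containing positions before computing nucleotide diversity (often called complete deletion) can be sketched as below. This is a hypothetical illustration, not MEGA's implementation; it assumes equal-length aligned sequences with "-" marking gaps and computes $\pi$ as the mean proportion of differences over all pairs of sequences.

```python
from itertools import combinations


def nucleotide_diversity(seqs, gap="-"):
    """Mean pairwise proportion of differing sites, after dropping every
    alignment column that contains a gap in any sequence."""
    columns = [col for col in zip(*seqs) if gap not in col]
    if not columns:
        raise ValueError("no gap-free columns remain")
    kept = ["".join(col[i] for col in columns) for i in range(len(seqs))]
    pairs = list(combinations(kept, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * len(columns))


# Column 5 contains a gap and is removed, leaving 5 sites and 3 pairs.
print(nucleotide_diversity(["ACGT-A", "ACGTTA", "ACCTTA"]))  # 2 / 15
```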
Analysis of Recombination
We tested for recombination in the Aly13 sequences using two methods. The first uses the relationship between linkage disequilibrium (LD) and the distance between polymorphic sites (Awadalla and Charlesworth 1999). Significance of the correlation of each of two LD measures ($r^2$ or D') with distance was determined from 5,000 random permutations of the variable sites using the R2 program (http://www.daimi.au.dk/compbio/r2). The second analysis is based on pairs of sites (Hudson 2000), using the composite-likelihood finite-sites extension of this method (McVean, Awadalla, and Fearnhead 2002) implemented in the LDhat program (http://www.stats.ox.ac.uk/mcvean). It is not clear how this approach is affected by balancing selection, but simulations show that LDhat does not falsely infer recombination when balancing selection acts in the absence of recombination (M. H. Schierup, unpublished data). With our data, the results were similar to those of the first approach. Because the sequences include intron regions that differ in length, distances between sites were calculated assuming that each intron has its median length. Sites with gaps (or missing sequence) in more than two of the sequences were excluded from the analysis. Because of the uncertainty about the evolutionary history of a subset of the sequences, analyses were done on four different subsets of the sequences (see below).
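A minimal Python sketch of the idea behind the first test follows: compute $r^2$ for each pair of usable biallelic sites, correlate it with the distance between the sites, and assess a decline of LD with distance by randomly permuting site positions. It is not the R2 program's code. The function names, the restriction to biallelic sites, the one-sided p-value convention, and the toy haplotypes are assumptions; D' is omitted; and the caller is assumed to supply a coordinate for every alignment column (for example, with each intron set to its median length). The site filter follows the rule stated above of excluding sites with gaps or missing data in more than two sequences.

```python
import random
from itertools import combinations

MISSING = set("-N?")


def usable_sites(seqs, max_missing=2):
    """Biallelic columns with gaps or missing data in at most max_missing sequences."""
    keep = []
    for j, column in enumerate(zip(*seqs)):
        n_missing = sum(base in MISSING for base in column)
        alleles = {base for base in column if base not in MISSING}
        if n_missing <= max_missing and len(alleles) == 2:
            keep.append(j)
    return keep


def r_squared(seqs, i, j):
    """r^2 between columns i and j, using only sequences with data at both."""
    pairs = [(s[i], s[j]) for s in seqs if s[i] not in MISSING and s[j] not in MISSING]
    if len(pairs) < 2:
        return None
    a = min(x for x, _ in pairs)  # arbitrary reference allele at site i
    b = min(y for _, y in pairs)  # arbitrary reference allele at site j
    n = len(pairs)
    pa = sum(x == a for x, _ in pairs) / n
    pb = sum(y == b for _, y in pairs) / n
    pab = sum(x == a and y == b for x, y in pairs) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:  # a site is monomorphic in this subset of sequences
        return None
    return (pab - pa * pb) ** 2 / denom


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ld_distance_test(seqs, positions, n_perm=5000, seed=1):
    """Correlation of r^2 with distance, and a one-sided permutation p-value
    for a decline of LD with distance (positions shuffled among sites)."""
    sites = usable_sites(seqs)
    pos = [positions[s] for s in sites]
    scored = []
    for i, j in combinations(range(len(sites)), 2):
        r2 = r_squared(seqs, sites[i], sites[j])
        if r2 is not None:
            scored.append((i, j, r2))
    r2_values = [r2 for _, _, r2 in scored]

    def corr(p):
        return pearson([abs(p[i] - p[j]) for i, j, _ in scored], r2_values)

    observed = corr(pos)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = pos[:]
        rng.shuffle(shuffled)
        if corr(shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: sites 0-1 and sites 2-3 are tightly associated; distant pairs are not.
haplotypes = ["AAAA", "AACC", "AAAA", "TTCC", "TTAA", "TTCC"]
positions = [0, 1, 100, 101]  # hypothetical coordinates along the gene
print(ld_distance_test(haplotypes, positions, n_perm=1000))
```

With only four sites the permutation test has little power; the toy data are meant only to show that closely spaced sites in strong LD give a negative correlation.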
Results
The segregation results of the putative S-alleles in the families shown in table 3 are consistent with the plants' incompatibility types. The set A sequences, and the B sequences that are linked to the S-locus, behave indistinguishably in this respect and are therefore probably functional SRK alleles. We cannot exclude the possibility that the linked group B sequences might come from a paralogous locus with no function in SI but located in the S-locus genome region. However, the 13-6 sequence is expressed in flower buds but not in leaves, consistent with being an S-allele. In contrast, the unlinked 13-7 sequence has the opposite expression pattern (N. Prigoda and B. K. Mable, unpublished data). Interestingly, the two set B subtypes that have been tested (Aly13-6 and Aly13-14) are both recessive (L. Nielsen and M. H. Schierup, unpublished data), whereas the set A alleles so far tested are dominant or partially dominant (13-9, 13-12, 13-13, 13-15, and 13-22) or of intermediate dominance (alleles 13-16 and 13-25) (Mable, Schierup, and Charlesworth 2003; L. Nielsen and M. H. Schierup, unpublished data).
Functional Domains
The two alleles studied by Kusaba et al. (2001) were identified from cDNA. Among our sequences, Aly13-13's S-domain is almost identical to that of the SRKa allele, and Aly13-20 matches allele SRKb of Kusaba et al. (2001). Thus, at least some of our set A sequences probably encode functional SRK proteins. Moreover, several putative functional domains are recognizable in our sequences. A sequence similar to the phosphorylation site and ATP-binding region of other serine-threonine kinase domains starts near the end of exon 3 and extends into exon 4. In exon 5, amino acids 35 to 47 resemble the serine/threonine kinase active site, including the motif DLKASN (residues 39 to 44). Finally, the last two amino acids of exon 5 and the first seven in exon 6 form a motif GTYGYMAPE. These motifs are also present in Brassica S-linked loci (Suzuki et al. 1999). All set A sequences and the A. thaliana sequence T6K22.100 have the DLKASN motif, but the set B sequences all have the A replaced by T, as do the kinase regions of the paralogous genes Aly8, Aly101, and 10.2 (see Schierup et al. 2001). The GTSGYMAPE motif is found in set B, whereas the A sequences (and T6K22.100) have S instead of A in the second position.
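Scoring the DLKASN/DLKTSN difference between set A and set B sequences can be automated with a regular expression. The Python sketch below is hypothetical: the function name and the example peptide strings are invented for illustration, and only the two motif variants named above are recognized.

```python
import re

# Active-site motif, capturing the residue that differs between the two sets.
ACTIVE_SITE = re.compile(r"DLK([AT])SN")


def active_site_type(protein):
    """Classify a protein by its active-site motif variant, or return None."""
    match = ACTIVE_SITE.search(protein)
    if match is None:
        return None
    return "set A-like (DLKASN)" if match.group(1) == "A" else "set B-like (DLKTSN)"


for peptide in ["HRDLKASNILLD", "HRDLKTSNVLLD", "HRDLKPENLLLD"]:
    print(peptide, active_site_type(peptide))
```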
The region for which we have sequence data includes 14 amino acids that are conserved in all known protein kinases, including the Brassica SRK sequences (Kusaba et al. 2000). All of these are invariant in our sequences, in both sets A and B. In the 13 amino acids of the putative kinase active site, there is only one polymorphism in the set A sequences, but those of set B differ in three of these residues. In exons 3 and 4, the sequences deviate more from other kinase sequences. Of the 35 amino acids spanning the putative phosphorylation site and ATP-binding region, only 18 are invariant in our sequences; 14 of the variable sites are polymorphic in the 12 sequences of set A (there were also three fixed differences from the set B sequences, and four polymorphisms within set B, three of which are also polymorphic sites in set A).
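Counts of invariant sites, polymorphisms within each set, and fixed differences between sets, like those given above, can be tallied from an amino acid alignment as sketched below. The Python function and the five-residue toy alignment are hypothetical, gaps are not handled, and "polymorphic in both" counts any column that varies within each set, regardless of which residues segregate.

```python
from collections import Counter


def classify_columns(set_a, set_b):
    """Tally alignment columns by how they vary within and between two sets."""
    tally = Counter()
    for j in range(len(set_a[0])):
        a = {s[j] for s in set_a}
        b = {s[j] for s in set_b}
        if len(a) > 1 and len(b) > 1:
            tally["polymorphic in both"] += 1
        elif len(a) > 1:
            tally["polymorphic in A only"] += 1
        elif len(b) > 1:
            tally["polymorphic in B only"] += 1
        elif a != b:
            tally["fixed difference"] += 1
        else:
            tally["invariant"] += 1
    return tally


set_a = ["MKLVS", "MRLVT", "MKLVS"]
set_b = ["MKIAS", "MKIGT"]
print(classify_columns(set_a, set_b))  # one column of each kind
```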
Diversity of the Allele Sequences
Within both sets of sequences, there is considerable diversity (figs. 1 and 2 and tables 4 and 5). Set A sequence diversity at both synonymous and nonsynonymous sites in exon 1 exceeds even the very high level in Brassica SRK sequences; for synonymous sites, S-domain diversity is about 18% in Brassica (Sato et al. 2002), compared with about 45% in the A. lyrata set A sequences (table 4). Extremely high polymorphism extends into the kinase domain, and none of the introns can be aligned between the set A putative alleles. Within both the A and B sequence sets, the introns contain indel variants; intron 1 lengths range from 341 to 1246 bp among the set A sequences and also vary among the set B sequences (table 2). As in the Brassica SRK kinase (Hinata et al. 1995; Nishio et al. 1997), indel variants are found in the S-domains (Kusaba et al. 2001; Schierup et al. 2001) and also in exons 2 and 3.
Polymorphism Within Putative Allelic Classes
If the Aly13 kinase sequences are indeed S-alleles, the allelic diversity should be present within populations rather than between different populations. To compare sequences within allelic types, we therefore obtained the following data set. For each subtype (see Materials and Methods), two individuals were sampled from the same population of origin and one from a different natural population, yielding sets of three sequences for nine different putative alleles, five from set A and four from set B (table 6). The results show that the high Aly13 diversity cannot be attributed to divergence between sequences from different populations. The diversity within each subtype is very low, even in the introns, contrasting strongly with the extreme differences between the sequences of the subtypes described above, even those from the same population. Even the least conservative estimate of the number of differences between sequences of the same subtype is at most eight differences in 3.19 kb of sequence (allele Aly13-22), and less than half this for all other alleles (five of them showed no differences in at least 2.52 kb). Four further subtypes (Aly13-5, Aly13-13, Aly13-19, and Aly13-20) provided pairs of sequences from two plants from one population; among these, there was a single nonsynonymous difference, in Aly13-13.
Discussion
In Brassica, the different S-allele haplotypes have different lengths and gene arrangements and even differences in gene content (reviewed by Nasrallah 2000), so recombination would certainly be surprising, and it has long been believed that the S-loci are in a nonrecombining region of the genome. The existence of separate genes for the pollen and pistil recognition functions, recently documented in both Brassica (Schopfer, Nasrallah, and Nasrallah 1999; Suzuki et al. 1999) and A. lyrata (Kusaba et al. 2001), makes it plausible that recombination is suppressed to maintain coadapted sets of the two kinds of loci (or else that the S-locus lies in a region where recombination rarely occurs, such as a centromeric region; however, the orthologous region in A. thaliana, on chromosome 4, is in the middle of the long arm, distant from the centromere).
The question of recombination is also important for the analysis of S-allele sequence data. If recombination occurs, the gene tree cannot be used to estimate the phylogeny of these sequences. If recombination is ignored, the times of origin of alleles are overestimated, whereas including recombination can potentially explain the observed long external branches of S-allele phylogenies (Schierup, Mikkelsen, and Hein 2001) without the need to appeal to selection at linked sites (Uyenoyama 1997; Charlesworth and Awadalla 1998).
Our observation of very high diversity, even in the kinase domain, is generally consistent with the belief that SRK alleles do not recombine with one another. This domain includes many amino acid residues that probably must be conserved for the protein to function and that are indeed conserved between widely different receptor kinases. The decreased nonsynonymous diversity in the A. lyrata SRK set A sequences (from 20% or higher in exons 1 and 2 to below 10% from exon 3 onwards) may be due to such selective constraints. However, constraints predict the lowest diversity in the most functionally important regions and cannot explain the steady decline in diversity with distance from the S-domain, whereas recombination could explain this pattern.
Synonymous site diversity, however, remains extremely high, and the introns are unalignable. In Brassica oleracea, too, there is considerable diversity in exon 5 (Nishio et al. 1997). If recombination occurs, sites as far from the S-domain as exons 5 to 7 of the SRK kinase domain might be expected to have much lower synonymous diversity, judging from other data on genes under balancing selection. In MHC genes, diversity is specifically elevated near the codons for antigen recognition sites (e.g., Bergström et al. 1998). Theoretical analyses also suggest that peaks of elevated diversity will be restricted to sites very close to the targets of balancing selection (Hudson and Kaplan 1988; Nordborg, Charlesworth, and Charlesworth 1996; Andolfatto and Nordborg 1998; Takahata and Satta 1998). On the other hand, we cannot currently predict how rapidly diversity falls off in the S-locus region, whose local recombination rate is unknown in both Brassica and A. lyrata. It is possible that these genes are in a region of low, but not zero, recombination, so that diversity and linkage disequilibrium fall off slowly with distance from sites under balancing selection.
The observed high diversity in the kinase domain is, of course, only indirect evidence for linkage to a region experiencing balancing selection. A different explanation could be compensatory evolution: changes in the S-domain creating a new specificity might lead to sequence evolution in the kinase domain. Polymorphism in the kinase domain would then be related to the balancing selection in the S-domain (each allele's kinase domain would coadapt with its respective S-domain). Compensatory change in the kinase domain seems unlikely, given that it is intracellular and presumably functions independently of the extracellular S-domain (see Nasrallah 2000). The sequence polymorphism data also argue against such balancing selection. We observe a fairly steady fall-off in nonsynonymous diversity in the kinase domain (see fig. 2), whereas coadaptation should produce peaks of diversity close to the sites within this domain that are under selection. Coadaptation should also cause linkage disequilibrium between sites that define functional allelic classes, which is not observed. LD is low between the kinase and S-domains, although, as discussed next, this may be due to unusual evolutionary properties of the S-domain.
It is therefore clearly important to test for recombination. Our explicit tests do not show convincing evidence for reciprocal recombination between different SRK alleles. Results based on the entire sequence are not convincing on their own, because tests on the kinase domain alone are at most weakly significant; this suggests that LD differences between the two domains (see above) may explain the decline in LD with distance between polymorphic sites across the entire sequence. Of the various potential reasons for LD differences between the two domains, a mutational hotspot (see Innan and Nordborg 2002) in the S-domain seems unlikely, because synonymous diversity is high throughout the gene (see fig. 2). Other possibilities are different levels of functional constraint at nonsynonymous sites and/or occasional gene conversion of small segments in the S-domain that has destroyed linkage disequilibrium. Given the extremely long divergence times between allelic sequences (necessary to account for the high variability between alleles), these processes could create homoplasies in the S-domain. Gene conversion in the S-domain involving different SRK alleles would act like reciprocal recombination to reduce differences between alleles, except close to sites under balancing selection, perhaps explaining why set B alleles are less clearly distinct from set A alleles in the S-domain trees (fig. 1).
Diversity Within and Between Allelic Classes
The very low diversity observed within allelic classes is also consistent with a lack of recombination. As outlined above, if recombination is very infrequent, individual alleles within a given allelic class will have a recent common ancestor, so alleles with the same specificity are expected to have similar sequences. As predicted, the high overall A. lyrata SRK variability is due to differences between allelic classes and the concomitant extensive haplotype structure. This has also been found for fungal incompatibility alleles (May et al. 1999). Our results, with very similar sequences in different individuals, even from different populations, but extreme differences between subtypes, further support the other evidence that these sequences represent S-alleles with different specificities.
These observations contrast with those for the human HLA locus DRB1. In DRB1, serologically defined allelic lineages include considerable diversity in exon 2, which includes the antigen recognition site codons, whereas the adjacent exons and the introns are very homogeneous within the allelic classes, particularly when compared with divergence between them (mean pairwise divergence [$k$] values were 0.0007 for exons versus 0.086 for introns [Bergström et al. 1998]). The high exon 2 diversity within these allelic lineages is due to high nonsynonymous polymorphism in the antigen recognition site codons. Within classes, the mean $k_a$ for these codons is 0.065, based on five lineages with multiple sequences analyzed. The ratio $k_a/k_s$ for these sequences averages 0.75, although in no allelic class did $k_a/k_s$ exceed 1. In contrast, for divergence between allelic classes, the mean $K_a$ was 0.38, almost three times the synonymous divergence. Nonrecognition-site codons in exon 2 had a $k_a/k_s$ averaging 0.08 within allelic classes and 0.47 between classes (Bergström et al. 1998). This suggests that recognition site codons experience much faster replacement substitution than other nearby codons. This could be due either to diversifying selection generating new specificities within allelic classes or to rapid amino acid replacement caused by the low effective size within lineages at sites that do not recombine with the selected codons. These two possibilities may be distinguishable by comparisons between human populations. If selection is leading to rapid diversification of sequences within serological lineages, isolated populations might be expected to show excess differences, specifically at nonsynonymous sites in these codons. This possibility represents an interesting difference between HLA genes and S-alleles, as incompatibility loci presumably retain the same specificities over long evolutionary times (Shiba et al. 2002).
However, other possibilities for the DRB1 data cannot yet be ruled out, including recombination directed to highly specific regions of exon 2 (Bergström et al. 1998).
Gene Conversion (Entire Sections Revised)
An intriguing result of our study is the finding of unlinked sequences much more similar to the SRK sequences than to other paralogous loci and extremely similar to a subset ("B") of SRK sequences. This suggests exchange of sequence information between different S-domain loci, making it essential to test every putative SRK allele for linkage to the S-locus. Without linkage information, one cannot tell whether a new sequence is an allele of the SRK locus. The two unlinked set B sequences (Aly13-2 and Aly13-7) could represent a locus (or perhaps even two loci) that originated through duplication of an allele from the SRK locus to a different genomic location. This, however, must presumably have been a single event, which cannot account for the rather high diversity among the set B sequences (around 10% for both intron sites and silent sites in exons other than the S-domain [see table 5]; intron 1 cannot be readily aligned). Nucleotide diversity estimates for other A. lyrata S-domain loci (paralogs; see Charlesworth et al. 2003) and other loci (Wright, Lauga, and Charlesworth 2003) are only about 1% to 2%.
A more likely alternative is gene conversion involving SRK and another locus. Gene conversion between alleles in the S-locus region has been suggested several times in Brassica (Miege et al. 2001), for instance to explain the great similarity between the SLG8 and SLG46 alleles of Brassica campestris, which differ in their specificities but are identical in the three hypervariable regions and are 97.5% similar in their overall amino acid sequences (Kusaba et al. 1997). Conversion involving tandemly duplicated loci is well known to cause different evolutionary histories for different linked regions (e.g., Wang, Magoulas, and Hickey 1999; Araki, Inomata, and Yamazaki 2001; Drouin 2002), and there is evidence for gene conversion between the linked SRK and SLG genes in Brassica, whose gene trees show discordant evolutionary histories (Sato et al. 2002). Gene conversion events between paralogous sequences have been detected in tetraploid fish (Angers, Gharbi, and Estoup 2002) and in plants between paralogous resistance genes (Sun et al. 2001), actin genes (Moniz de Sá and Drouin 1996; Drouin et al. 1999), and ribosomal DNA genes (Buckler, Ippolito, and Holtsford 1997). Gene conversion is also strongly suggested between different loci in the MHC cluster of genes (Martinsohn et al. 1999).
As with the gene conversion between SRK and SLG just mentioned, sequences might be transferred between the S-locus and a paralogous S-domain gene. The possibility of gene conversion involving the SRK locus does not necessarily imply that recombination must occur in this region. It is increasingly clear that, in Drosophila, genome regions with little or no recombination nevertheless undergo gene conversion (Langley et al. 2000; Jensen, Charlesworth, and Kreitman 2002). Our results in A. lyrata might represent another case.
It seems unlikely that the transfer introduced S-locus variants into the other gene. As argued above for a single duplication event, one such conversion cannot account for the differences between the two set B sequences that are unlinked to SRK, assuming that these represent a single locus; whether these sequences come from one locus or more is, however, not yet known. Transfer in the other direction seems more likely: it would introduce variants into the S-locus from a less diverse locus and thus reduce the differences between certain SRK alleles. Whether there was a single transfer or repeated transfers is unknown. Sequencing the SCR alleles of the haplotypes whose set B alleles are linked to the S-locus (Aly13-3, Aly13-6, Aly13-8, and Aly13-14) may help define the size and nature of the genomic region in which these haplotypes have similar sequences. With a single transfer, the linked set B sequences must have diversified after the event, which could explain why the S-domains of these four sequences form a cluster (fig. 1). The much higher diversity of their S-domains compared with their kinase domains would then imply that substitutions accumulated more rapidly in the S-domain after the transfer, suggesting that the two domains differ in mutational properties or have very different selective constraints, or that these S-domains have undergone gene conversion with the set A S-domains.
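The inference in the last two sentences can be made explicit with a simple molecular-clock argument. This is a sketch under assumptions not stated in the text: each region accumulates changes at a constant per-site rate, multiple hits are ignored, and the symbols are introduced here for illustration only. Let $t$ be the time since the single transfer, $\lambda_S$ and $\lambda_K$ the per-site rates in the S-domain and kinase domain, and $d_S$ and $d_K$ the pairwise divergences between two of the linked set B sequences in each domain.

```latex
% Both domains of the same pair of haplotypes share the same time t since the transfer
E[d_S] \approx 2\lambda_S t, \qquad E[d_K] \approx 2\lambda_K t
\quad\Longrightarrow\quad
\frac{E[d_S]}{E[d_K]} \approx \frac{\lambda_S}{\lambda_K}
```

Because $t$ cancels, an S-domain to kinase-domain divergence ratio well above 1 points to a higher effective rate in the S-domain. Each of the three explanations offered above raises $\lambda_S$; under the gene conversion explanation, $\lambda_S$ is an effective rate that includes sequence imported from set A rather than point substitutions alone.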
Acknowledgements |
---|
Footnotes |
---|
Adam Eyre-Walker, Associate Editor
Literature Cited |
---|
Andolfatto, P., and M. Nordborg. 1998. The effect of gene conversion on intralocus associations. Genetics 148:1397-1399.
Angers, B., K. Gharbi, and A. Estoup. 2002. Evidence of gene conversion events between paralogous sequences produced by tetraploidization in Salmoninae fish. J. Mol. Evol. 54:501-510.
Araki, H., N. Inomata, and T. Yamazaki. 2001. Molecular evolution of duplicated amylase gene regions in Drosophila melanogaster: evidence of positive selection in the coding regions and selective constraints in the cis-regulatory regions. Genetics 157:667-677.
Awadalla, P., and D. Charlesworth. 1999. Recombination and selection at Brassica self-incompatibility loci. Genetics 152:413-425.
Bergström, T. F., A. Josefsson, H. Erlich, and U. Gyllensten. 1998. Recent origin of HLA-DPB1 alleles and implications for human evolution. Nat. Genet. 18:237-242.
Buckler, E. S., A. Ippolito, and T. P. Holtsford. 1997. The evolution of ribosomal DNA: divergent paralogues and phylogenetic implications. Genetics 145:821-832.
Charlesworth, B., M. Nordborg, and D. Charlesworth. 1997. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided inbreeding and outcrossing populations. Genet. Res. 70:155-174.
Charlesworth, D., B. K. Mable, M. H. Schierup, C. Bartolomé, and P. Awadalla. 2003. Diversity and linkage of genes in the self-incompatibility gene family in Arabidopsis lyrata. Genetics (in press).
Drouin, G. 2002. Testing claims of gene conversion between multigene family members: examples from echinoderm actin genes. J. Mol. Evol. 54:138-139.
Drouin, G., F. Prat, M. Ell, and G. D. Clarke. 1999. Detecting and characterizing gene conversions between multigene family members. Mol. Biol. Evol. 16:1369-1390.
Frey, B., and B. Suppmann. 1995. Demonstration of the expand PCR system's greater fidelity and higher yields with a Lac1-based PCR fidelity assay. Biochemica 2:8-9.
Hinata, K., M. Watanabe, S. Yamakawa, Y. Satta, and A. Isogai. 1995. Evolutionary aspects of the S-related genes of the Brassica self-incompatibility system: synonymous and nonsynonymous base substitutions. Genetics 140:1099-1104.
Hudson, R. R. 2000. A new statistic for detecting genetic differentiation. Genetics 155:2011-2014.
Hudson, R. R., and N. L. Kaplan. 1988. The coalescent process in models with selection and recombination. Genetics 120:831-840.
Innan, H., and M. Nordborg. 2002. Recombination or mutational hot spots in human mtDNA? Mol. Biol. Evol. 19:1122-1125.
Jeanmougin, F., J. D. Thompson, M. Gouy, D. G. Higgins, and T. J. Gibson. 1998. Multiple sequence alignment with ClustalX. Trends Biochem. Sci. 23:403-405.
Jensen, M. A., B. Charlesworth, and M. Kreitman. 2002. Patterns of genetic variation at a chromosome 4 locus of Drosophila melanogaster and D. simulans. Genetics 160:493-507.
Kumar, S., K. Tamura, I. Jacobsen, and M. Nei. 2000. MEGA2: molecular evolutionary genetics analysis. Version 2.0. Distributed by the authors, Pennsylvania State University, University Park, Pennsylvania and Arizona State University, Tempe, Arizona.
Kusaba, M., K. Dwyer, J. Hendershot, J. Vrebalov, J. B. Nasrallah, and M. E. Nasrallah. 2001. Self-incompatibility in the genus Arabidopsis: characterization of the S locus in the outcrossing A. lyrata and its autogamous relative, A. thaliana. Plant Cell 13:627-643.
Kusaba, M., M. Matsushita, K. Okazaki, Y. Satta, and T. Nishio. 2000. Sequence and structural diversity of the S-locus genes from different lines with the same self-recognition specificities in Brassica oleracea. Genetics 154:413-420.
Kusaba, M., T. Nishio, Y. Satta, K. Hinata, and D. Ockendon. 1997. Striking sequence similarity in inter- and intra-specific comparisons of class I SLG alleles from Brassica oleracea and Brassica campestris: implications for the evolution and recognition mechanism. Proc. Natl. Acad. Sci. USA 94:7673-7678.
Langley, C. H., B. P. Lazzaro, W. Phillips, E. Heikkinen, and J. M. Braverman. 2000. Linkage disequilibria and the site frequency spectra in the su(s) and su(wa) regions of the Drosophila melanogaster X chromosome. Genetics 156:1837-1852.
Lawrence, M. J., M. D. Lane, S. O'Donnell, and V. E. Franklin-Tong. 1993. The population genetics of the self-incompatibility polymorphism in Papaver rhoeas. V. Cross-classification of the S-alleles from three natural populations. Heredity 71:581-590.
Mable, B. K., M. H. Schierup, and D. Charlesworth. 2003. Estimating the number of S-alleles in a natural population of Arabidopsis lyrata (Brassicaceae) with sporophytic control of self-incompatibility. Heredity 90:422-431.
Martinsohn, J. T., A. B. Sousa, L. A. Guethlein, and J. C. Howard. 1999. The gene conversion hypothesis of MHC evolution: a review. Immunogenetics 50:168-200.
May, G., F. Shaw, H. Badrane, and X. Vekemans. 1999. The signature of balancing selection: fungal mating compatibility gene evolution. Proc. Natl. Acad. Sci. USA 96:9172-9177.
McVean, G. A. T. 2001. What do patterns of genetic variability reveal about mitochondrial recombination? Heredity 87:613-620.
McVean, G. A. T., P. Awadalla, and P. Fearnhead. 2002. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160:1231-1241.
Miege, C., V. Ruffio-Chable, M. H. Schierup, D. Cabrillac, T. Gaude, and J. M. Cock. 2001. Intra-haplotype polymorphism at the Brassica S locus. Genetics 159:811-822.
Moniz de Sá, M., and G. Drouin. 1996. Phylogeny and substitution rates of angiosperm actin genes. Mol. Biol. Evol. 13:1198-1212.
Nasrallah, J. B. 2000. Cell-cell signaling in the self-incompatibility response. Curr. Opin. Plant Biol. 3:368-373.
Navarro, A., and N. H. Barton. 2002. The effects of multilocus balancing selection on neutral variability. Genetics 161:849-863.
Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.
Nishio, T., M. Kusaba, K. Sakamoto, and D. Ockendon. 1997. Polymorphism of the kinase domain of the S-locus receptor kinase gene (SRK) in Brassica oleracea L. Theor. Appl. Genet. 95:335-342.
Nordborg, M., B. Charlesworth, and D. Charlesworth. 1996. Increased levels of polymorphism surrounding selectively maintained sites in highly selfing species. Proc. R. Soc. Lond. B 263:1033-1039.
Nordborg, M., and H. Innan. 2003. The genealogy of sequences containing multiple sites subject to strong selection in a subdivided population. Genetics 163:1201-1213.
Pannell, J. R., and B. Charlesworth. 1999. Neutral genetic diversity in a metapopulation with recurrent local extinction and recolonization. Evolution 53:664-676.
Sato, T., T. Nishio, R. Kimura, M. Kusaba, G. Suzuki, K. Hatakeyama, D. Ockendon, and Y. Satta. 2002. Coevolution of the S-locus genes SRK, SLG and SP11/SCR in Brassica oleracea and B. rapa. Genetics 162:931-940.
Schierup, M. H., B. K. Mable, P. Awadalla, and D. Charlesworth. 2001. Identification and characterization of a polymorphic receptor kinase gene linked to the self-incompatibility locus of Arabidopsis lyrata. Genetics 158:387-399.
Schierup, M. H., A. M. Mikkelsen, and J. Hein. 2001. Recombination, balancing selection and phylogenies in MHC and self-incompatibility genes. Genetics 159:1833-1844.
Schierup, M. H., X. Vekemans, and D. Charlesworth. 2000. The effect of subdivision on variation at multi-allelic loci under balancing selection. Genet. Res. (Cambridge) 76:51-62.
Schopfer, C. R., M. E. Nasrallah, and J. B. Nasrallah. 1999. The male determinant of self-incompatibility in Brassica. Science 286:1697-1700.
Shiba, H., M. Iwano, T. Entani, et al. (11 co-authors). 2002. The dominance of alleles controlling self-incompatibility in Brassica pollen is regulated at the RNA level. Plant Cell 14:491-504.
Strobeck, C. 1983. Expected linkage disequilibrium for a neutral locus linked to a chromosomal arrangement. Genetics 103:545-555.
Sun, Q., N. C. Collins, M. Ayliffe, S. M. Smith, J. Drake, T. Pryor, and S. H. Hulbert. 2001. Recombination between paralogues at the rp1 rust resistance locus in maize. Genetics 158:423-438.
Suzuki, G., N. Kai, T. Hirose, K. Fukui, T. Nishio, S. Takayama, A. Isogai, M. Watanabe, and K. Hinata. 1999. Genomic organization of the S locus: identification and characterization of genes in SLG/SRK region of S9 haplotype of Brassica campestris (syn. rapa). Genetics 153:391-400.
Tajima, F. 1993. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135:599-607.
Takahata, N., and Y. Satta. 1998. Footprints of intragenic recombination at HLA loci. Immunogenetics 47:430-441.
Uyenoyama, M. K. 1997. Genealogical structure among alleles regulating self-incompatibility in natural populations of flowering plants. Genetics 147:1389-1400.
Uyenoyama, M. K. 2000. Evolutionary dynamics of self-incompatibility. Genetics 156:351-359.
Vekemans, X., and M. Slatkin. 1994. Gene and allelic genealogies at a gametophytic self-incompatibility locus. Genetics 137:1157-1165.
Wade, M. J., and D. E. McCauley. 1988. Extinction and recolonization: their effects on the genetic differentiation of local populations. Evolution 42:995-1005.
Wang, S. J., C. Magoulas, and D. Hickey. 1999. Concerted evolution within a trypsin gene cluster in Drosophila. Mol. Biol. Evol. 16:1117-1124.
Wright, S. I., B. Lauga, and D. Charlesworth. 2003. Subdivision and haplotype structure in natural populations of Arabidopsis lyrata. Mol. Ecol. 12:1247-1263.