Phylogenetics in Caenorhabditis elegans: An Analysis of Divergence and Outcrossing

Dee R. Denver, Krystalynne Morris and W. Kelley Thomas

Division of Molecular Biology and Biochemistry, School of Biological Sciences, University of Missouri-Kansas City, Kansas City, Missouri


    Abstract
This study establishes a phylogenetic framework for the natural geographic isolates of the widely studied nematode species Caenorhabditis elegans. Virtually complete mitochondrial genomes are sequenced from 27 C. elegans natural isolates to characterize mitochondrial divergence patterns and to investigate the evolutionary history of the C. elegans hermaphrodite lineages. Phylogenetic analysis of mitochondrial sequences reveals the presence of two major C. elegans hermaphrodite clades (designated clade I and clade II). Fifty-six nuclear loci, widely distributed across the five autosomes and the X chromosome, are also analyzed in a subset of the C. elegans isolates to evaluate nuclear divergence patterns and the extent of mating between different strains. A comparison of the phylogenetic tree derived from mitochondrial data with the phylogenetic tree derived from nuclear data reveals only one inconsistency in the distribution of isolates into clades I and II, suggesting that mating between divergent C. elegans strains is an infrequent event in the wild.

Key Words: Caenorhabditis elegans • mitochondrial genome • nuclear genome • phylogenetic analysis


    Introduction
Caenorhabditis elegans is a free-living nematode species that serves as an important model for studying virtually all aspects of biology (Riddle et al. 1997; The C. elegans Sequencing Consortium 1998). However, a large disparity exists between the extensive knowledge of C. elegans as a model experimental system and the minimal information available about its natural environment and ecology. Almost all C. elegans studies use the classic N2 strain from Bristol, England, yet many other geographically diverse strains have been isolated and characterized (Hodgkin and Doniach 1997). The C. elegans natural isolates vary in feeding behavior (de Bono and Bargmann 1998), sperm size (LaMunyon and Ward 2002), body length, fecundity, and other phenotypic characters (Hodgkin and Doniach 1997).

Molecular variation in the C. elegans natural lines has been studied using single mitochondrial and nuclear genes (Thomas and Wilson 1991), as well as transposons such as Tc1 (Egilmez, Ebert, and Shmookler Reis 1995; Hodgkin and Doniach 1997). A recent large-scale analysis of nuclear single-nucleotide polymorphism (SNP) patterns shows extensive variation in SNP density among different lines and that many of the lines share SNPs (Koch et al. 2000). The Hawaiian strain (CB4856) harbors many unique SNPs and has the highest SNP density among the lines surveyed in Koch et al. 2000, making it the standard for C. elegans gene mapping (Wicks et al. 2001). Although these previous studies provide insights into C. elegans SNP patterns and densities in the natural isolates, a large-scale phylogenetic analysis of C. elegans evolution is lacking. An evolutionary framework for C. elegans is critical for understanding the fundamental divergence patterns of the natural isolates in addition to the phylogenetic contributions to phenotypic differences (such as feeding behavior).

C. elegans hermaphrodites are capable of matrilineal reproduction by self-fertilization; alternatively, males can spontaneously arise by X chromosome loss during hermaphrodite gametogenesis and subsequently mate with hermaphrodites (Riddle et al. 1997). C. elegans hermaphrodites cannot cross-fertilize other hermaphrodites. Although males are known to spontaneously arise at a rate of approximately 0.2% in laboratory culture (Hodgkin and Doniach 1997), very little is known about the frequency of males in the wild or how much they contribute to gene flow between genetically distinct C. elegans populations. It has recently been postulated that males are maintained in C. elegans as a mere consequence of the particular genetic system inherited from its dioecious ancestor and the nonadaptive spontaneous nondisjunction of sex chromosomes in hermaphrodites (Chasnov and Chow 2002). The extent to which wild C. elegans populations are able to maintain sufficient levels of genetic diversity and avoid the long-term consequences of mutation accumulation associated with inbreeding depends greatly on the extent of male-driven outcrossing between strains with divergent genomes. Understanding the relative contributions of hermaphrodites and males to the evolution of this species requires the establishment of a solid hermaphrodite phylogeny (reflected in mitochondrial sequences) as a foundation upon which to analyze nuclear variation patterns.

We investigate the molecular evolutionary histories of the C. elegans natural isolates through a large-scale analysis of variation in both the mitochondrial and nuclear genomes. Using nearly complete mitochondrial genome sequences from C. elegans natural lines, which are inherited exclusively through the hermaphrodite lineage (unpublished data), we construct a phylogeny of the hermaphrodite lineages. Furthermore, we collect and analyze more than 44 kb of nuclear DNA sequence from a subset of the natural isolates to examine the evolutionary history of the C. elegans nuclear genome and to evaluate the extent to which male-driven mating between divergent strains has occurred throughout C. elegans evolution. We also assay the incidence of high-copy Tc1 transposons in the nuclear genome, social feeding behavior, and copulatory plug formation in the natural isolates with respect to the phylogenetic framework. The large-scale analysis of mitochondrial and nuclear genome divergence patterns presented here provides a unique opportunity to better understand phylogenetic diversity in C. elegans, a species capable of both matrilineal reproduction through hermaphrodites and sexual reproduction through male-hermaphrodite matings.


    Materials and Methods
PCR Amplification and DNA Sequencing
Twenty-seven C. elegans natural geographic isolates were analyzed (table 1). All lines were obtained from the Caenorhabditis Genetics Center at the University of Minnesota. All nematodes were cultured on standard NGM agarose plates, seeded with the OP50 strain of Escherichia coli as a food source (Sulston and Hodgkin 1988). Worm populations were allowed to expand until food sources were exhausted, after which the worms were harvested for DNA extraction.


Table 1 Caenorhabditis elegans Natural Geographic Isolates.

 
Polymerase chain reaction (PCR) amplifications for mtDNA sequences were performed in 50-µL reactions containing 67 mM Tris-HCl (pH 8.8); 6.7 mM MgCl2; 16.6 mM (NH4)2SO4; 10 mM β-mercaptoethanol; 1 mM each of dGTP, dATP, dTTP, and dCTP; 0.5 µM of each primer; 10 to 100 ng genomic DNA; and 2.5 U of Thermus aquaticus DNA polymerase (Applied Biosystems). PCR amplifications for nuclear sequences were performed with ABI XLII PCR buffer; 2.5 mM MgCl2; 0.2 mM each of dATP, dCTP, dGTP, and dTTP; 0.5 µM of each primer; 10 to 100 ng genomic DNA; and 2.5 U of T. aquaticus DNA polymerase (Applied Biosystems). Amplification of mtDNA was carried out by 35 cycles of denaturation at 94°C for 1 min, annealing at 52°C to 55°C for 1 min, and extension at 72°C for 2 min. Amplification of nuclear loci was carried out by 35 cycles of denaturation at 94°C for 1 min, annealing at 59°C to 67°C for 1 min, and extension at 72°C for 2 min. mtDNA-specific primers (see Denver et al. 2000 for information on primer positions and sequences) were designed to the published sequence of the N2 C. elegans mitochondrial genome (Okimoto et al. 1992). Primers specific to nuclear sequences were designed to cosmid and YAC C. elegans sequences from WormBase (Stein et al. 2001) to generate PCR products that range in size from 326 to 1,393 bp (see Supplementary Material online). A subset (28 of 56 total) of the nuclear loci was targeted to homopolymeric nucleotide runs and other repetitive elements, whereas the remaining loci were randomly distributed across the nuclear genome (a computational random number generator produced positions within the size boundaries of the C. elegans chromosomes). mtDNA PCR products were purified for sequencing by gel isolation techniques. PCR products were separated in a 2% SeaPlaque GTG agarose gel (FMC Bioproducts). Bands were excised and purified with a QIAquick gel extraction kit (Qiagen).
Nuclear PCR products were purified through solid phase reversible immobilization (SPRI) techniques (Elkin et al. 2001). The purified PCR product was used as a template for cycle sequencing with dRhodamine dye terminators (Applied Biosystems). PCR primers were used for sequencing reactions along with internal primers where necessary. Cycle sequencing was carried out by 25 cycles of denaturation at 96°C for 30 s, annealing at 50°C for 15 s, and extension at 60°C for 4 min. Unincorporated dye terminators were removed with Micro Bio-Spin Chromatography Columns (BioRad) or through ethanol precipitation. Sequences were determined with an ABI Prism 377 automated DNA sequencer (Applied Biosystems) and two CEQ capillary sequencing systems (Beckman). New C. briggsae mtDNA sequences were deposited in GenBank with the following accession numbers: AY171101 to AY171106. C. elegans mtDNA sequences described in this study were assigned GenBank accession numbers AY171133 to AY171222. C. elegans nuclear DNA sequences described here were assigned GenBank accession numbers AY171107 to AY171132 and AY171223 to AY171228.

DNA Sequence Alignment and Phylogenetic Analysis
Sequences were aligned to the published N2 sequences using the Eyeball Sequence Editor (ESEE) Version 3.01 computer alignment program or by subjecting FASTA text files to batch alignment with CLUSTALW (Higgins, Thompson, and Gibson 1996). All substitutions were confirmed by visually evaluating electropherogram sequence data. Stationarity of base composition was confirmed using Tree-Puzzle 5.0 (Schmidt et al. 2002). Phylogenetic analyses were performed using the PAUP* 4.0b10 software package (Swofford 1991). Both maximum parsimony (MP) and maximum likelihood (ML) analyses were performed for mitochondrial and nuclear sequence alignments. ML analyses used the simple Kimura model for nucleotide substitution. Analyses using the Jukes-Cantor model were also performed and yielded ML trees with identical topologies and similar bootstrap values to those done with the Kimura model. All bootstrap analyses were done with 1,000 replicates. In addition to directly deriving a tree from MP and ML analyses of nuclear sequences, nuclear substitutions were also forced onto the mtDNA phylogeny (using PAUP* 4.0b10) to investigate differences between the mitochondrial and nuclear phylogenies. We searched for substitutions whose homoplasy indices were higher in the substitution matrix derived from the "forced" tree (nuclear variable sites placed on the hermaphrodite tree) than in the matrix derived from direct MP analysis of nuclear sequences. The distribution of the variable sites that supported the inclusion of AB1 in clade I in phylogenetic analyses of nuclear sequences was then analyzed with respect to the physical map of C. elegans chromosomes.
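
The two substitution models named above differ in how they treat transitions and transversions. As a minimal illustration only, the Python sketch below computes pairwise distances under the Jukes-Cantor model and under the Kimura two-parameter model. It assumes that the "simple Kimura model" refers to the two-parameter model, and it is not a substitute for the ML analyses run in PAUP*. The function names and the toy sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (transitions, transversions, compared_sites) for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        # A change within purines or within pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions, sites


def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor distance: all substitution types are equally likely."""
    ts, tv, n = count_differences(seq1, seq2)
    p = (ts + tv) / n
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p_distance(seq1, seq2):
    """Kimura two-parameter distance: separate transition and transversion rates."""
    ts, tv, n = count_differences(seq1, seq2)
    P, Q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


# Hypothetical aligned fragments, not taken from the data set.
toy_a = "ATTTAGCTTTAGGATTACTT"
toy_b = "ATTCAGCTTTAGAATTACTA"
print(jukes_cantor_distance(toy_a, toy_b), kimura_2p_distance(toy_a, toy_b))
```

When sequences differ at only a few sites, as among the isolates here, the two corrections give nearly the same distance, which is consistent with the two models yielding identical ML topologies.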


    Results
Patterns of Variation
We analyzed nearly complete mitochondrial genome sequences (11,443 bp of 13,794 bp total) from all of the C. elegans natural isolates listed in table 1. The only major gap was an approximately 2 kb segment that encompasses the AT region, tRNA(Ala), and part of the NADH dehydrogenase 5 (ND5) gene. The vast majority of mitochondrial variable sites observed were base substitutions rather than insertion-deletion (indel) changes (table 2). Among the 237 base substitutions observed in mtDNA sequences, 205 were detected in protein-coding sequences, 19 were in rRNA genes, and 13 were found in tRNA genes. No base substitutions were observed in the intergenic sequence examined. Among the 205 base substitutions observed in protein-coding sequences, 172 were silent and 33 were replacement substitutions. Transitions dominated over transversions among base substitutions (table 2). Only one instance of length change variation was observed (excluding variation in repetitive sequence) that involved a ±A in an intergenic region between the NADH dehydrogenase 4 (ND4) and cytochrome oxidase I (COI) genes (Okimoto et al. 1992). mtDNA base composition was highly stable in all the C. elegans natural isolates: 44.9% to 45.1% T, 9.0% to 9.1% C, 30.7% to 30.9% A, and 15.1% to 15.2% G. Stationarity of base composition, necessary for the phylogenetic analyses presented below, was confirmed using Tree-Puzzle (Schmidt et al. 2002). The pattern of mtDNA variation observed here in the C. elegans natural isolates was consistent with what has previously been observed in C. elegans and other animal species (Thomas and Wilson 1991; Vigilant et al. 1991; Ballard and Kreitman 1994; Xia, Hafner, and Sudman 1996).


Table 2 Patterns of Variation in Mitochondrial and Nuclear DNA.

 
We explored the nuclear substitution patterns of the C. elegans natural isolates by sequencing 44,111 bp of nuclear DNA from 18 of the isolates (table 2). The nine C. elegans lines excluded from nuclear analyses had mtDNA sequences identical to N2. The nuclear loci analyzed were spread across all six chromosomes and in both arm and core regions of autosomes (see Supplementary Material online) (Barnes et al. 1995). A subset of the analyzed loci was targeted to repetitive elements such as homopolymers; the remaining nuclear sequences analyzed were a result of loci randomly distributed across the nuclear genome (see Materials and Methods for details). As variation patterns in homopolymers are the focus of another study (Denver et al., in preparation), changes detected in homopolymer runs 8 bp or more in length were excluded in this study.

Nuclear variable sites were, on average, lower in abundance compared with mtDNA variable sites in the C. elegans natural isolates, as has been previously reported (table 2) (Thomas and Wilson 1991). However, two nuclear loci harbored an extremely large number of variable sites (table 2); these are considered in depth below. With the exception of the two divergent alleles, all nuclear loci assayed, ranging in size from 326 to 1,393 bp (mean = 788 bp), contained few variable sites (see Supplementary Material online). In fact, the majority of nuclear loci assayed (30 of 56 total) were identical across all isolates assayed. Nuclear base substitutions dominated over indels, although the bias was far less dramatic than what was observed in the mitochondrial genome (table 2). Likewise, the predominance of transitions over transversions was reduced compared with mitochondrial sequences (table 2). Variation in the density of variable sites among different isolates was observed. For instance, an average density of 2.7 variable sites/kb was observed in CB4856 (excluding the divergent B0019 and F35E8 loci), whereas TR403 displayed a much lower density of 0.07 variable sites/kb. As with mtDNA, the nuclear DNA patterns of variation observed here were consistent with what has previously been reported for C. elegans and other animal species (Petrov and Hartl 1999; Koch et al. 2000; Nachman and Crowell 2000).

The distribution of nuclear variable sites, again excluding the two divergent alleles, in different functional sequence classes (exon, intron, and intergenic) was also considered. A total of 8,303 bp of exon sequence, 13,385 bp of intron sequence, 355 bp of RNA gene sequence, and 22,068 bp of intergenic sequence was assayed among the natural isolates (table 1) (see Supplementary Material online). Variable sites were underrepresented in exon sequences (five observed, 10.1 expected at random) and intergenic sequences (20 observed, 27.0 expected at random) (table 2). Introns, however, contained a significantly higher number of variable sites compared with the random expectation (29 observed, 16.4 expected at random; P < 0.005 using the chi-square test). No variation was observed in the limited amount of RNA gene sequence assayed here.
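
The random expectations quoted above can be approximated by distributing the 54 observed variable sites across sequence classes in proportion to the number of base pairs assayed in each class. The Python sketch below does this and then applies a one-degree-of-freedom chi-square test of introns against all other sequence combined. The exact design of the test is not stated in the text, so the two-category layout is an assumption, as are the function and variable names.

```python
import math

# Base pairs assayed and variable sites observed per class (values from the text).
lengths_bp = {"exon": 8303, "intron": 13385, "RNA gene": 355, "intergenic": 22068}
observed = {"exon": 5, "intron": 29, "RNA gene": 0, "intergenic": 20}


def expected_counts(observed, lengths_bp):
    """Spread the observed total across classes in proportion to sequence length."""
    total_sites = sum(observed.values())
    total_bp = sum(lengths_bp.values())
    return {name: total_sites * bp / total_bp for name, bp in lengths_bp.items()}


def chi_square_one_class(observed, expected, target):
    """Chi-square test (1 df) of one class against all other classes pooled."""
    obs_in, exp_in = observed[target], expected[target]
    obs_out = sum(observed.values()) - obs_in
    exp_out = sum(expected.values()) - exp_in
    stat = (obs_in - exp_in) ** 2 / exp_in + (obs_out - exp_out) ** 2 / exp_out
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


expected = expected_counts(observed, lengths_bp)
stat, p_value = chi_square_one_class(observed, expected, "intron")
for name in observed:
    print(f"{name}: observed {observed[name]}, expected {expected[name]:.1f}")
print(f"intron vs. rest: chi-square = {stat:.2f}, P = {p_value:.4f}")
```

Under this assumption the intron and intergenic expectations round to the reported 16.4 and 27.0, the exon expectation falls within about 0.1 of the reported 10.1, and the intron excess remains significant at P < 0.005.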

Two highly divergent loci (B0019 and F35E8) were detected exclusively in CB4856 in the nuclear loci assayed (table 2). The possibility that the divergent loci detected in CB4856 were paralogous rather than orthologous to the targeted loci was tested by BLAST searching the sequences against the C. elegans genome (Altschul et al. 1990). In both cases, BLAST only found significant matches with the targeted orthologous sequence in the N2 genome. The B0019 locus on chromosome I encompassed intronic sequence in the B0019.1 gene (contains similarity to flavin-containing amine oxidases) (The C. elegans Sequencing Consortium 1998). Although the overall density of variable sites was much greater than what is observed among the majority of nuclear loci assayed, the general patterns of variation were consistent with what is observed in the other loci (table 2). In contrast to B0019, the divergent F35E8 locus on chromosome V spanned the entire F35E8.8 gene (glutathione-S-transferase homologue), composed of two exons and one intron (The C. elegans Sequencing Consortium 1998). All of the variable sites detected at this locus were base substitutions (table 2). Transitions were dominant in a fashion similar to what was observed at the other nuclear loci (table 2). The majority of substitutions (30 of 33 total) detected at this locus were in exon sequences. Among the 30 base substitutions observed in F35E8.8 exon sequence, eight were replacements and 22 were silent. No base substitutions were observed that resulted in premature stop codons for the F35E8.8 gene.

Phylogenetic Analysis of C. elegans Mitochondrial DNA
The hermaphrodite evolutionary history of the C. elegans natural isolates was investigated through phylogenetic analysis of 11,443 bp (out of 13,794 total) of DNA from the mitochondrial genomes of 27 C. elegans natural isolates (table 1). From these sequences, we identified 15 unique C. elegans mtDNA sequences that were used for phylogenetic analyses of the C. elegans hermaphrodite lineages. Ten of the 27 C. elegans lines (CB3191, CB4507, CB4555, CB4851, CB4932, DH424, LSJ1, N2, TR388, and TR389) shared identical analyzed mtDNA sequences. PB303 and RC301 had the same mtDNA sequences; AB2, AB3, and AB4 had identical mtDNA sequences as well. The remaining isolates analyzed had unique mtDNA sequences.
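
Collapsing isolates that share identical sequences into unique haplotypes, as described above, can be sketched in a few lines of Python. The isolate names come from the text, but the short sequences are hypothetical placeholders and the function name is an assumption.

```python
from collections import defaultdict


def collapse_haplotypes(sequences):
    """Group isolate names by identical aligned sequence, largest group first."""
    groups = defaultdict(list)
    for isolate, seq in sequences.items():
        groups[seq.upper()].append(isolate)
    return sorted(
        (sorted(names) for names in groups.values()),
        key=lambda names: (-len(names), names),
    )


# Hypothetical toy sequences; only the pattern of sharing follows the text.
toy_sequences = {
    "N2": "ACGTAC",
    "LSJ1": "ACGTAC",
    "PB303": "ACGTAT",
    "RC301": "ACGTAT",
    "CB4856": "TCGTAT",
}
haplotypes = collapse_haplotypes(toy_sequences)
for members in haplotypes:
    print(len(members), members)
```

Applied to the full data set, the same grouping accounts for the count given above: one group of ten isolates, one of two, one of three, and twelve singletons make 15 unique sequences from 27 isolates.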

To establish an outgroup-based root for the intraspecific phylogeny, phylogenetic analysis of the C. elegans mitochondrial sequences was first done using sequence data from one of its congeners, C. briggsae. Six C. elegans mtDNA protein-coding gene sequences (cytochrome oxidase II [COII], cytochrome b [Cyt B], NADH dehydrogenase 1 [ND1], NADH dehydrogenase 2 [ND2], NADH dehydrogenase 4L [ND4L], and NADH dehydrogenase 6 [ND6]), for which homologous C. briggsae sequence data were also collected, were used in these initial rooted analyses (4,336 bp analyzed). Stationarity between C. elegans and C. briggsae mtDNA sequences was confirmed using Tree-Puzzle as before (Schmidt et al. 2002). The MP and ML approaches yielded trees with identical topologies and revealed the presence of two major C. elegans clades (fig. 1A). To better resolve intraspecific relationships, unrooted MP and ML analyses were performed using full-length (11,443 bp) C. elegans mtDNA sequences and again yielded trees with identical topologies (fig. 1B). Resolution within clade I was greatly improved using the full-length sequences; evolution within clade II, however, was less clear. Within clade I, a subclade (designated subclade Ia) that contains 12 isolates (including N2) was supported in 100% of 1,000 bootstrap replicates in both MP and ML analyses.



FIG. 1. Phylogenetic analysis of C. elegans hermaphrodite lineages using mtDNA. (A) Bootstrap consensus tree for C. elegans natural isolates rooted with C. briggsae sequences, using sequences from six protein-coding genes. MP (top) and ML (bottom) bootstrap values (1,000 replicates performed for both MP and ML) are indicated to the right of the appropriate node. See text for nine isolates with mtDNA sequences identical to N2. (B) Unrooted phylogram of the C. elegans natural isolates using all 11,443 bp of mtDNA sequence collected. Bootstrap values are labeled as in (A).

 
Phylogenetic Analysis of Nuclear DNA
The utility of males and extent of outcrossing between divergent strains in C. elegans was investigated by comparing the evolutionary history of the hermaphrodites, as reflected in the mtDNA-based phylogeny, with that for the nuclear loci. If it is assumed that C. elegans evolves in a strictly matrilineal fashion, then the nuclear and mitochondrial sequences will necessarily be predicted to reveal identical phylogenies. Alternatively, inconsistencies between nuclear and mitochondrial phylogenies will arise if mating occurs between divergent strains.

The 12 C. elegans natural isolates for which nuclear variation is reported both here and in Koch et al. 2000 were used in phylogenetic analyses in order to maximize the number of shared informative nuclear markers (62 variable sites are shared between at least two of the 12 isolates). Trees derived from MP and ML analyses of C. elegans nuclear variable sites had identical topologies (fig. 2A). All mtDNA clade I lines remained in the same clade in nuclear analyses and all mtDNA clade II lines, with the exception of AB1, remained together in the same clade. AB1 was a member of mtDNA clade II. However, in nuclear analyses, AB1 was placed with the clade I worms (fig. 2), providing the only instance of evidence for crossing between mtDNA clade I and clade II worms.
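
The comparison described above amounts to checking, isolate by isolate, whether clade membership agrees between the two trees. A minimal Python sketch follows. The N2 and AB1 assignments follow the text, whereas "isolate_x" and "isolate_y" are hypothetical placeholders, and the function name is an assumption.

```python
def discordant_isolates(mt_clade, nuclear_clade):
    """Return isolates whose clade differs between the mtDNA and nuclear trees."""
    shared = mt_clade.keys() & nuclear_clade.keys()
    return sorted(name for name in shared if mt_clade[name] != nuclear_clade[name])


# Clade membership in the mtDNA (hermaphrodite) tree.
mt_clade = {"N2": "I", "isolate_x": "I", "isolate_y": "II", "AB1": "II"}
# Clade membership in the nuclear tree; only AB1 changes clade.
nuclear_clade = {"N2": "I", "isolate_x": "I", "isolate_y": "II", "AB1": "I"}

print(discordant_isolates(mt_clade, nuclear_clade))
```

Under strictly matrilineal evolution the returned list would be empty; each isolate it contains is a candidate for descent from a mating between divergent strains.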



FIG. 2. Phylogenetic analysis of C. elegans nuclear sequences. (A) Phylogenetic analysis (MP) of 70 C. elegans nuclear variable sites that are found in more than one isolate. Data used are derived from both analyses presented here, as well as from variable sites reported in Koch et al. 2000. AB1 (dashed box) is the only isolate whose placement in clade I or clade II is inconsistent between the mitochondrial and nuclear phylogenetic analyses. (B) mtDNA hermaphrodite phylogeny (MP) of C. elegans isolates included in nuclear analyses.

 
The distribution patterns of AB1 variable sites across the chromosomes were further characterized to investigate the evolutionary history of this isolate. Nuclear variable sites were forced onto the hermaphrodite phylogeny to look for AB1-specific nuclear homoplasies on the hermaphrodite tree that did not appear on the nuclear tree. These homoplasies represented variable sites that supported the inclusion of AB1 with mtDNA clade I worms in phylogenetic analyses of nuclear DNA. This analysis yielded 19 AB1 variable sites that, in nuclear phylogenetic analyses, supported the presence of AB1 in clade I. All of these sites were detected on autosomes; none were detected on the X chromosome (fig. 3). The AB1 "clade I–like" nuclear variable sites were also clustered together across the physical map of C. elegans chromosomes (fig. 3), suggesting that the observed pattern of SNP distribution in AB1 may have been the result of only a few recombination events between mtDNA clade I and clade II nuclear genomes.



FIG. 3. Distribution of variable sites in AB1 chromosomes. Open squares on the right side of chromosomes indicate nuclear variable sites that AB1 shares with all clade I isolates assayed and support the inclusion of AB1 with clade I worms in nuclear phylogenetic analyses. Open circles on the left side of chromosomes represent variable sites that are not specifically associated with all clade I isolates.

 

    Discussion
Patterns of Variation
This analysis provides a large data set for investigating both mitochondrial and nuclear substitution patterns and for phylogenetic analyses. One notable finding is the extraordinarily high ratio of base substitutions to indels observed in the mtDNA sequences of the C. elegans natural isolates. In fact, only one indel in complex mtDNA sequence is observed. This observation is likely influenced by the fact that the vast majority (~92%) of the mitochondrial genome has well-characterized coding function, whereas only approximately 25% of the nuclear genome is known to have classical coding function (Okimoto et al. 1992; The C. elegans Sequencing Consortium 1998). Furthermore, the major intergenic, or AT, region of the C. elegans mitochondrial genome is not surveyed here. This region contains numerous homopolymeric nucleotide runs in addition to an (AT)18 dinucleotide microsatellite that likely experience a high incidence of length variation. Hence, the base substitution/indel ratio listed in table 2 for mtDNA sequences is most likely an overestimate for the entire mitochondrial genome.

Substitution matrices from MP phylogenetic analyses can be used to indirectly investigate patterns of base substitution. The substitution matrix from MP analysis of the C. elegans mtDNA sequences yields 285 inferred substitutions: 72 T→C, 47 C→T, 41 A→G, 35 G→A, 20 T→A, 12 A→T, 23 T→G, 19 G→T, 9 A→C, 2 C→A, 2 C→G, and 3 G→C. Although an apparent bias toward T→C over C→T substitutions is observed in mtDNA sequences, these indirect data alone cannot discriminate between the forces of natural selection and underlying mutation biases in shaping substitution pattern biases observed in natural populations (Denver et al. 2000).
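
The twelve counts listed above can be tallied into transitions and transversions directly. The Python sketch below reads each count as a directional change from the first base to the second, which is an interpretation of the listing, and the variable names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Inferred mtDNA substitutions from the MP substitution matrix, keyed (from, to).
inferred = {
    ("T", "C"): 72, ("C", "T"): 47, ("A", "G"): 41, ("G", "A"): 35,
    ("T", "A"): 20, ("A", "T"): 12, ("T", "G"): 23, ("G", "T"): 19,
    ("A", "C"): 9, ("C", "A"): 2, ("C", "G"): 2, ("G", "C"): 3,
}


def is_transition(change):
    """A transition stays within purines (A, G) or within pyrimidines (C, T)."""
    pair = set(change)
    return pair <= PURINES or pair <= PYRIMIDINES


transitions = sum(n for change, n in inferred.items() if is_transition(change))
transversions = sum(n for change, n in inferred.items() if not is_transition(change))
total = transitions + transversions

print(f"total = {total}, transitions = {transitions}, transversions = {transversions}")
print(f"transition/transversion ratio = {transitions / transversions:.2f}")
print(f"T->C relative to C->T = {inferred[('T', 'C')] / inferred[('C', 'T')]:.2f}")
```

The tally recovers the 285 inferred substitutions, of which 195 are transitions and 90 are transversions.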

Common biases are observed in mitochondrial and nuclear variation patterns (base substitutions over indels, transitions over transversions, silent over replacement changes), although in a consistently less dramatic fashion in nuclear sequences (table 2). We also observe variation in the density of variable sites among natural isolates that is consistent with a previous independent study (Koch et al. 2000). An expected reduction in variable site density is observed in exon sequences (also previously reported in Koch et al. 2000); however, a surprising reduction in intergenic variable sites is also detected. This observation may be a consequence of the presence of regulatory elements and noncanonical functional sequences that are not yet fully understood. The recent and ongoing discovery of abundant small RNA genes and other unusual types of coding sequence in the C. elegans genome makes this a plausible and exciting possibility (Lee and Ambros 2001; Lau et al. 2001; Lagos-Quintana et al. 2001; Grosshans and Slack 2002).

The occurrence of highly divergent nuclear alleles in CB4856 has been previously noted. However, the origin(s) of these divergent alleles has not yet been investigated in depth (Koch et al. 2000). The overall abundance of unique nuclear variable sites in CB4856 may be a consequence of reproductive isolation (this is the only strain isolated from an island). However, the heterogeneity in variable site density across different genomic segments requires a different explanation. One hypothesis is that these loci constitute general mutational hotspots. However, a general hotspot hypothesis would predict that these loci are hypervariable across all lines assayed; CB4856 is the only isolate in which these divergent alleles are found. A second hypothesis is that these alleles represent mutational bursts that uniquely occurred along the CB4856 lineage. As before, all lines would be equally expected to experience such mutational bursts, but the only two highly divergent alleles observed across 56 total nuclear loci assayed occur in CB4856. Additional sequence data are required to completely rule out this hypothesis. A third, and more likely, scenario involves the action of natural selection. These divergent alleles may be maintained through positive natural selection at the loci assayed (direct selection) or through genetic hitchhiking (Andolfatto 2001). Understanding the origins and maintenance of divergent alleles in the C. elegans isolates will require the identification and characterization of additional divergent alleles. Additional natural isolates from new geographic origins would also aid in understanding the nature of these highly variable loci.

C. elegans Phylogenetic Relationships
A phylogenetic framework for the hermaphrodite lineages of the C. elegans natural isolates is established using aligned mtDNA sequences. MP and ML analyses reveal the presence of two major C. elegans clades, designated clade I (contains N2) and clade II (fig. 1). Shared nuclear variable sites are aligned and subjected to MP and ML phylogenetic analyses to investigate the extent of outcrossing between divergent strains of C. elegans in the wild (fig. 2). These sets of analyses show that among 12 isolates analyzed, AB1 is the only line whose genome harbors evidence for recombination between divergent C. elegans lines. Furthermore, the clustering of clade I–like regions in AB1 autosomes and their complete absence on the X chromosome suggest that the observed distribution of variable sites in AB1 may be the result of only a few recombination events between mtDNA clade I and clade II nuclear genomes. The absence of clade I–like variable sites on the X chromosome may support a simple scenario where a single mating occurred between a clade I male and a clade II hermaphrodite that gave rise to the AB1 lineage.

The comparative phylogenetic analyses between C. elegans mitochondrial and nuclear sequences presented here suggest that mating between clade I and clade II is an infrequent event in the wild. Additional evidence for the low frequency of mating between divergent C. elegans strains is obtained from analyzing the distribution of other genotypic and phenotypic characters with respect to the mtDNA-derived hermaphrodite phylogeny (figs. 1, 2). All lines that display high-copy Tc1 (a nuclear genome-specific transposon) genotypes are members of subclade Ia (fig. 1B). Furthermore, the only C. elegans strains that are not "clumpers" (a social feeding phenotypic trait determined by one nuclear locus [de Bono and Bargmann 1998]) are located in this same subclade (fig. 1B). However, the inclusion of CB4932, a clumper, in subclade Ia may indicate that this line results from crossing between clade I and clade II worms (table 1). A similar scenario is observed for copulatory plug formation, another phenotypic character assayed in Hodgkin and Doniach 1997. All subclade Ia lines are negative for copulatory plug formation (fig. 1B and table 1). All other isolates, with the exception of two clade II lines (AB1 and CB4852) are positive for copulatory plug formation (fig. 1B and table 1). The distribution of the above genotypic and phenotypic characters across the hermaphrodite phylogeny supports a generally low level of mating between clade I and clade II strains of C. elegans. In addition to AB1, two additional candidates for lines derived from clade I–clade II matings are CB4852 and CB4932. CB4932 is the only subclade Ia isolate that exhibits clumping behavior. AB1 and CB4852 are the only isolates outside of subclade Ia that display a negative copulatory plug phenotype.

Geographic isolation between clade I and clade II C. elegans strains may be invoked as a possible reason for the lack of mating between divergent strains in the wild. However, there is no clear phylogeographic structure evident among the strains (fig. 1 and table 1). Clade I and clade II isolates are found across both North America and Europe (fig. 1 and table 1); in fact, CB4854 (clade I) and CB4853 (clade II) are both from Altadena, California. Given the broad distribution of both clade I and clade II isolates, it is very unlikely that the observed lack of mating between divergent strains is due exclusively to geographic isolation.

Is the congruence of C. elegans mitochondrial and nuclear phylogenies expected based on coalescence theory? The "three-times rule" of coalescence theory (Palumbi, Cipriano, and Hare 2001) states that species showing mtDNA branches much more than three times longer than the average mtDNA diversity within that species are likely to show nuclear alleles that are monophyletic. By this criterion, coalescence of nuclear clade I or clade II alleles is not expected, as the clade I–clade II branch in MP analyses is not even two times longer than the mtDNA diversity within clade I or clade II (fig. 1B). However, one major assumption of the "three-times rule" is that male and female effective population sizes are equal (Palumbi, Cipriano, and Hare 2001). Although the relative effective population sizes of C. elegans males and hermaphrodites in the wild are unknown, observations in laboratory culture suggest that hermaphrodites drastically outnumber males (Hodgkin and Doniach 1997), indicating that the "three-times rule" is inappropriate for evaluating coalescence of nuclear alleles in C. elegans.
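The arithmetic behind the "three-times rule" comparison can be sketched as follows. The function name `three_times_rule`, its parameter names, and the numerical inputs are hypothetical and chosen only for illustration; they are not measurements from fig. 1B. The sketch computes the ratio of the between-clade mtDNA branch length to the within-clade mtDNA diversity and reports whether that ratio exceeds 3.

```python
def three_times_rule(between_clade_branch, within_clade_diversity, threshold=3.0):
    """Return (ratio, exceeds_threshold) for the three-times rule.

    between_clade_branch: length of the mtDNA branch separating two clades.
    within_clade_diversity: average mtDNA diversity within a clade,
        expressed in the same units as the branch length.
    Note: the rule assumes equal male and female effective population sizes,
    an assumption the text argues is unlikely to hold for C. elegans.
    """
    if within_clade_diversity <= 0:
        raise ValueError("within_clade_diversity must be positive")
    ratio = between_clade_branch / within_clade_diversity
    return ratio, ratio > threshold


# Hypothetical inputs: a branch less than twice the within-clade diversity.
ratio, monophyly_expected = three_times_rule(
    between_clade_branch=18.0,
    within_clade_diversity=10.0,
)
print(f"ratio = {ratio:.2f}; nuclear monophyly expected: {monophyly_expected}")
```

With any inputs whose ratio falls below 2, as described for the clade I–clade II branch, the rule would not predict monophyly of nuclear alleles.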

We find only one clear instance of evidence for mating between clade I and clade II strains of C. elegans (isolate AB1). Analysis of phenotypic traits with respect to the hermaphrodite phylogeny reveals two additional candidates: CB4852 and CB4932. Hence, out of 27 natural isolates, we find that only three harbor direct or indirect evidence for mating between clade I and clade II strains in the wild. Geographic isolation appears to be an unlikely explanation, as no phylogeographic structure is evident in the tree. Furthermore, clade I and clade II strains have been found at the same collecting sites (table 1). The apparent infrequency of mating between divergent C. elegans strains may be a consequence of an overall low frequency of males in the wild and/or clade-specific mating preferences. Unfortunately, virtually no information is available about the natural environment and ecology of C. elegans. Information about variation in mating behavior among the natural isolates would also aid in understanding the causes of the apparent lack of crossing between clade I and clade II. Experiments that compare the fitness of clade I–clade II hybrids generated in the lab with the fitness of the parental strains may provide insights into a possible role for epistatic selection against clade I–clade II hybrids in the wild. Finally, a broader set of C. elegans natural isolates distributed over a wider geographic range (no isolates are available from Asia, Africa, or South America) may provide insights into both the origins and maintenance of divergent alleles and the infrequency of mating between clade I and clade II C. elegans strains.


    Acknowledgements
 
Funding for this work was provided by NIH R01 GM-36827 and the University of Missouri Research Board.


    Footnotes
 
1 Present address: Department of Biology, Indiana University, Bloomington, Indiana.

2 Present address: Hubbard Center for Genome Studies, University of New Hampshire, Durham, New Hampshire.

E-mail: ddenver{at}bio.indiana.edu.


    Literature Cited
 

    Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.

    Andolfatto, P. 2001. Adaptive hitchhiking effects on genome variability. Curr. Opin. Genet. Dev. 11:635-641.

    Ballard, J. W. O., and M. Kreitman. 1994. Unraveling selection in the mitochondrial genome of Drosophila. Genetics 138:757-772.

    Barnes, T. M., Y. Kohara, A. Coulson, and S. Hekimi. 1995. Meiotic recombination, noncoding DNA and genomic organization in Caenorhabditis elegans. Genetics 141:159-179.

    Chasnov, J. R., and K. L. Chow. 2002. Why are there males in the hermaphroditic species Caenorhabditis elegans? Genetics 160:993-994.

    de Bono, M., and C. I. Bargmann. 1998. Natural variation in a neuropeptide Y receptor homolog modifies social behavior and food response in C. elegans. Cell 94:679-689.

    Denver, D. R., K. Morris, M. Lynch, L. L. Vassilieva, and W. K. Thomas. 2000. High direct estimate of the mutation rate in the mitochondrial genome of Caenorhabditis elegans. Science 289:2342-2344.

    Egilmez, N. K., R. H. Ebert, and R. J. Shmookler Reis. 1995. Strain evolution in Caenorhabditis elegans: transposable elements as markers of interstrain evolutionary history. J. Mol. Evol. 40:372-381.

    Elkin, C. J., P. M. Richardson, H. M. Fourcade, N. M. Hammon, M. J. Pollard, P. F. Predki, T. Glavina, and T. L. Hawkins. 2001. High-throughput plasmid purification for capillary sequencing. Genome Res. 11:1269-1274.

    Grosshans, H., and F. J. Slack. 2002. Micro-RNAs: small is plentiful. J. Cell Biol. 156:17-21.

    Higgins, D. G., J. D. Thompson, and T. J. Gibson. 1994. Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266:383-402.

    Hodgkin, J., and T. Doniach. 1997. Natural variation and copulatory plug formation in Caenorhabditis elegans. Genetics 146:149-164.

    Koch, R., H. G. A. M. van Luenen, M. van der Horst, K. Thijssen, and R. A. Plasterk. 2000. Single nucleotide polymorphisms in wild isolates of Caenorhabditis elegans. Genome Res. 10:1690-1696.

    Lagos-Quintana, M., R. Rauhut, W. Lendeckel, and T. Tuschl. 2001. Identification of novel genes coding for small expressed RNAs. Science 294:853-858.

    LaMunyon, C. W., and S. Ward. 2002. Evolution of larger sperm in response to experimentally increased sperm competition in Caenorhabditis elegans. Proc. R. Soc. Lond. B Biol. Sci. 269:1125-1128.

    Lau, N. C., L. P. Lim, E. G. Weinstein, and D. P. Bartel. 2001. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294:858-862.

    Lee, R. C., and V. Ambros. 2001. An extensive class of small RNAs in Caenorhabditis elegans. Science 294:862-864.

    Nachman, M. W., and S. L. Crowell. 2000. Estimate of the mutation rate per nucleotide in humans. Genetics 156:297-304.

    Okimoto, R., J. L. Macfarlane, D. O. Clary, and D. R. Wolstenholme. 1992. The mitochondrial genomes of two nematodes, Caenorhabditis elegans and Ascaris suum. Genetics 130:471-498.

    Palumbi, S. R., F. Cipriano, and M. P. Hare. 2001. Predicting nuclear gene coalescence from mitochondrial data: the three-times rule. Evolution Int. J. Org. Evolution 55:859-868.

    Petrov, D. A., and D. L. Hartl. 1999. Patterns of nucleotide substitution in Drosophila and mammalian genomes. Proc. Natl. Acad. Sci. USA 96:1475-1479.

    Riddle, D. L., T. Blumenthal, B. J. Meyer, and J. R. Priess. 1997. C. elegans II. Cold Spring Harbor Laboratory Press, Plainview, NY.

    Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. Tree-Puzzle: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502-504.

    Stein, L., P. Sternberg, R. Durbin, J. Thierry-Mieg, and J. Spieth. 2001. WormBase: network access to the genome and biology of Caenorhabditis elegans. Nucleic Acids Res. 29:82-86.

    Sulston, J., and J. Hodgkin. 1988. Methods. Pp. 587-606 in W. B. Wood, ed. The Nematode Caenorhabditis elegans. Cold Spring Harbor Laboratory Press, Plainview, NY.

    Swofford, D. 1991. PAUP: phylogenetic analysis using parsimony. Version 4.06b10. Illinois Natural History Survey, Champaign, Ill.

    The C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282:2012-2018.

    Thomas, W. K., and A. C. Wilson. 1991. Mode and tempo of molecular evolution in the nematode Caenorhabditis: cytochrome oxidase I and calmodulin sequences. Genetics 128:269-279.

    Vigilant, L., M. Stoneking, H. Harpending, K. Hawkes, and A. C. Wilson. 1991. African populations and the evolution of human mitochondrial DNA. Science 253:1503-1507.

    Wicks, S. R., R. T. Yeh, W. R. Gish, R. H. Waterston, and R. H. Plasterk. 2001. Rapid gene mapping in Caenorhabditis elegans using a high density polymorphism map. Nat. Genet. 28:160-164.

    Xia, X., M. S. Hafner, and P. D. Sudman. 1996. On transition bias in mitochondrial genes of pocket gophers. J. Mol. Evol. 43:32-40.

Accepted for publication November 6, 2002.