Intron Size and Genome Size in Plants

Jonathan F. Wendel*, Richard C. Cronn*†, Ines Alvarez*, Bao Liu*‡, Randall L. Small*§ and David S. Senchina*

*Department of Botany, Iowa State University;
†Pacific Northwest Research Station, USDA Forest Service, Corvallis;
‡Institute of Genetics and Cytology, Northeast Normal University, Changchun;
§Department of Botany, University of Tennessee, Knoxville

It has long been known that genomes vary over a remarkable range of sizes in both plants (Bennett, Cox, and Leitch 1997) and animals (Gregory 2001). It has also become evident that, across broad phylogenetic scales, genome size may be correlated with intron size (Deutsch and Long 1999; Vinogradov 1999; McLysaght et al. 2000), suggesting that some component of genome size evolution takes place within genes. One example involves humans and pufferfish (Fugu): a comparison of 199 introns in 22 orthologous genes showed that Fugu introns were on average about one-eighth the length of their human counterparts, consistent with the ratio of the two genome sizes (McLysaght et al. 2000). Similarly, Deutsch and Long (1999) tabulated intron sizes across a broad phylogenetic spectrum of eukaryotes and noted a general but weak correlation with genome size, with humans having the most numerous and longest introns (mean of 3.4 kbp) among the 10 taxa studied. Intron size is also correlated with genome size in Drosophila (Moriyama, Petrov, and Hartl 1998), suggesting that the correlation may extend to more recent divergences.

At present there is little information on the correlation between genome and intron sizes in plants, although there are suggestions that plants with small genomes have smaller introns (Deutsch and Long 1999; Vinogradov 1999). Although broad comparisons across widely divergent taxa are now possible given the completed draft sequences of the rice (Goff et al. 2002; Yu et al. 2002) and Arabidopsis (Arabidopsis Genome Initiative 2000) genomes, the divergence between Poaceae and Brassicaceae is so ancient that any influence of genome size on intron size may be confounded by numerous other, unstudied covariables. More informative studies are likely to involve closely related taxa that differ substantially in genome size but share recent evolutionary history and a broad suite of life-history features. An additional advantage of comparing close relatives is that orthology among genes, and hence among introns, may be more readily established. This point may be especially important given the lability of copy number in many gene families (Small and Wendel 2000).

To exemplify this approach, we studied the relationship between intron size and genome size for orthologous genes from diploid and allopolyploid species of Gossypium (cotton) and from taxa representing its phylogenetic outgroup, Gossypioides kirkii and Kokia kauaiensis (Seelanan, Schnabel, and Wendel 1997; Wendel et al. 2002). The allopolyploid Gossypium species included the commercially important cottons G. hirsutum (Upland cotton) and G. barbadense (Pima and Sea Island cotton). Allopolyploid cotton contains two largely colinear genomes ("A" and "D"; Brubaker, Paterson, and Wendel 1999) that were reunited in a common nucleus as a consequence of a remarkable interspecific hybridization event during the Pleistocene (Wendel 1989; Wendel and Cronn 2002). This event involved two diploids (A genome and D genome) that had evolved in isolation in different hemispheres for perhaps 5–10 Myr (Cronn et al. 2002a; Wendel and Cronn 2002). Included in the present study were the closest living models of the diploid progenitors, namely Gossypium herbaceum and Gossypium arboreum (A genome) and Gossypium raimondii (D genome). The A- and D-genome diploids differ nearly twofold in genome size (2C = 3.8 pg and 2.0 pg, respectively; Endrizzi, Turcotte, and Kohel 1985), and these differences are maintained in the derivative allopolyploid (2C = 5.8 pg), which, for this and other reasons, exhibits near-exclusive bivalent pairing at meiosis (Endrizzi, Turcotte, and Kohel 1985; Wendel and Cronn 2002). The phylogenetic outgroups have still smaller genomes, with 2C = 1.2 pg, or about 60% of the smallest cotton genome (Wendel et al. 2002). DNA was isolated from young leaves using published methods (Paterson, Brubaker, and Wendel 1993; Tel-zur 1999) or the Qiagen DNeasy Plant kit following the manufacturer's protocol.
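
The genome-size relationships quoted above reduce to simple arithmetic. The sketch below uses only the 2C values given in the preceding paragraph to compute the fold difference between genomes and to check whether the allopolyploid value equals the sum of its two diploid progenitors' 2C values. That additivity model (no net gain or loss of DNA in either subgenome since polyploid formation) is our illustrative assumption, and the function and dictionary names are hypothetical.

```python
"""Arithmetic on the 2C DNA contents (pg) quoted in the text."""
import math

# 2C values in picograms, as given in the text.
GENOME_2C_PG = {
    "A": 3.8,         # A-genome diploids (G. herbaceum, G. arboreum)
    "D": 2.0,         # D-genome diploid (G. raimondii)
    "AD": 5.8,        # allopolyploid (G. hirsutum, G. barbadense)
    "outgroup": 1.2,  # Gossypioides kirkii / Kokia kauaiensis
}


def fold_difference(larger_pg: float, smaller_pg: float) -> float:
    """How many times larger the first genome is than the second."""
    return larger_pg / smaller_pg


def additive_allopolyploid_2c(*diploid_2c_pg: float) -> float:
    """2C expected for an allopolyploid that carries each diploid genome unchanged.

    An AD allotetraploid holds two copies of each monoploid (1C) genome, so its
    2C equals the sum of the diploid progenitors' 2C values.
    """
    return sum(diploid_2c_pg)


if __name__ == "__main__":
    a, d = GENOME_2C_PG["A"], GENOME_2C_PG["D"]
    expected_ad = additive_allopolyploid_2c(a, d)
    print(f"A vs D: {fold_difference(a, d):.2f}-fold")
    print(f"D vs outgroup: {fold_difference(d, GENOME_2C_PG['outgroup']):.2f}-fold")
    print(f"AD expected {expected_ad:.1f} pg, observed {GENOME_2C_PG['AD']:.1f} pg; "
          f"additive: {math.isclose(expected_ad, GENOME_2C_PG['AD'])}")
```

Run as a script, this reports a 1.90-fold A/D difference, a 1.67-fold difference between the D genome and the outgroup, and an observed allopolyploid value matching the additive expectation.
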
In selecting genes, we focused on those for which robust evidence of orthology could be obtained. The precise nature of this evidence varied among genes but included Southern hybridization against genomic digests under high-stringency wash conditions to verify single-copy status (data not shown); phylogenetic analysis of sequence data to evaluate whether putative orthologs showed the concordance with the established organismal history expected of true orthologs (Wendel and Albert 1992; Wendel 1995; Wendel and Cronn 2002); and comparative mapping to confirm that the isolated genes mapped to equivalent positions in the colinear genomes (Reinisch et al. 1994; Brubaker, Paterson, and Wendel 1999). An additional important criterion was that the gene could be readily amplified by polymerase chain reaction (PCR) from all species studied. The 28 sets of orthologs selected represent a diversity of genes, including transcription factors, enzymes such as alcohol dehydrogenase and cellulose synthase, and a number of others whose putative functions were identified through database searches (table 1). Some of the genes are described more fully elsewhere (Cronn, Small, and Wendel 1999; Small and Wendel 2000; Cedroni et al. 2002).
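
One of the orthology criteria above is concordance between the gene genealogy and the organismal history. The authors used phylogenetic analysis for this; the sketch below substitutes a much cruder, distance-based proxy, assuming pre-aligned sequences: each allopolyploid homoeolog should be most similar (lowest uncorrected p-distance) to the diploid representing its own progenitor genome. The function names, the labels "At" and "Dt" for the A- and D-subgenome homoeologs, and the toy sequences are hypothetical.

```python
"""Crude distance-based check that homoeologs group with their expected diploid."""


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or ambiguous base are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def nearest_diploid(homoeolog: str, diploids: dict[str, str]):
    """Name of the uniquely closest diploid sequence, or None if there is a tie."""
    distances = {name: p_distance(homoeolog, seq) for name, seq in diploids.items()}
    best = min(distances.values())
    winners = [name for name, dist in distances.items() if dist == best]
    return winners[0] if len(winners) == 1 else None


def concordant(homoeologs: dict[str, str], diploids: dict[str, str],
               expected: dict[str, str]) -> bool:
    """True if every homoeolog is uniquely closest to its expected progenitor."""
    return all(nearest_diploid(seq, diploids) == expected[name]
               for name, seq in homoeologs.items())


if __name__ == "__main__":
    diploids = {"A": "ACGTACGTAC", "D": "ACGTTCGAAC"}  # toy sequences
    homoeologs = {"At": "ACGTACGTAC", "Dt": "ACGTTCGAAT"}
    print(concordant(homoeologs, diploids, {"At": "A", "Dt": "D"}))
```

A tie is treated as failure rather than broken arbitrarily, since an ambiguous placement is not evidence of orthology.
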


Table 1 Intron Sizes in Gossypium and Its Phylogenetic Outgroup

Primers for PCR amplification and sequencing were designed as described previously (Cronn, Small, and Wendel 1999; Small and Wendel 2000; Cedroni et al. 2002) or from cotton EST sequences in GenBank. Amplification and sequencing primers are available at J. Wendel's web site (http://www.botany.iastate.edu/~jfw/HomePage/jfwdata_sets.html). In general, two amplification protocols were used on MJ Research thermocyclers. The first was a "touchdown PCR" method: 94°C for 3 min; 10 cycles of 94°C for 1 min, 56°C for 1 min, and 72°C for 2.5 min, with the annealing temperature decreased by 0.6°C each cycle; 25 cycles of 94°C for 1 min, 50°C for 1 min, and 72°C for 2.5 min; and a final extension at 72°C for 7 min. Other genes were amplified using an initial hot start of 94°C for 3 min, followed by 30 cycles of 94°C for 30 s, 54°C for 30 s, and 72°C for 1 min 15 s, and a final extension at 72°C for 6 min. Because annealing temperatures ranged from 48°C to 66°C among genes, these general conditions were adjusted on a gene-by-gene basis when necessary. Products from genes that amplified with difficulty were cloned using standard TA cloning protocols and sequenced from plasmid vectors. Automated sequencing was conducted using ABI BigDye v. 2.0 fluorescent dye-terminator chemistry on ABI Prism 377 and 3700 systems at the Iowa State DNA Sequencing and Synthesis Facility.
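
The touchdown schedule is easy to misread, so the sketch below spells out the annealing temperature of every cycle and the total hold time of both programs. It assumes (our reading, not stated explicitly above) that the first touchdown cycle anneals at 56°C and each later one is 0.6°C cooler, so the tenth anneals at 50.6°C before the 25 cycles at 50°C; ramp times between steps are ignored. The function names are hypothetical, and this is not instrument software.

```python
"""Sketch of the two thermocycling programs described in the text."""


def touchdown_anneal_temps(start_c=56.0, decrement_c=0.6, touchdown_cycles=10,
                           plateau_c=50.0, plateau_cycles=25):
    """Annealing temperature (°C) for every cycle of the touchdown program.

    Assumes the first touchdown cycle anneals at start_c and each later
    touchdown cycle is decrement_c cooler; remaining cycles use plateau_c.
    """
    temps = [round(start_c - decrement_c * i, 1) for i in range(touchdown_cycles)]
    return temps + [plateau_c] * plateau_cycles


def hold_time_s(cycles, denature_s, anneal_s, extend_s, initial_s, final_s):
    """Total seconds spent at set temperatures (ramping is ignored)."""
    return initial_s + cycles * (denature_s + anneal_s + extend_s) + final_s


if __name__ == "__main__":
    temps = touchdown_anneal_temps()
    print("touchdown annealing temps:", temps[:10])
    print("total cycles:", len(temps))
    touchdown = hold_time_s(35, 60, 60, 150, 180, 420)  # 3-min start, 7-min final
    hot_start = hold_time_s(30, 30, 30, 75, 180, 360)   # 3-min start, 6-min final
    print(f"touchdown hold time: {touchdown / 60:.1f} min")
    print(f"hot-start hold time: {hot_start / 60:.1f} min")
```

Under these assumptions the touchdown program holds for 167.5 min and the hot-start program for 76.5 min, excluding ramping. Gene-by-gene adjustments of the annealing temperature would change only the anneal values, not these totals.
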

For each gene studied, the allopolyploid species contained two homoeologous sequences, descended from those contributed by the A- and D-genome donors at the time of polyploid formation. To isolate both homoeologs, we cloned amplification products and distinguished the two duplicates by restriction site analysis, used homoeolog-specific amplification primers, or recovered both copies by screening bacterial artificial chromosome (BAC) libraries from G. hirsutum cv. Acala Maxxa (Tomkins et al. 2001) and G. barbadense cv. Pima S6 (A. Paterson, personal communication). Because each BAC contained only one of the two homoeologs, this last strategy proved particularly effective against the nagging problem of in vitro PCR recombination (Cronn et al. 2002b).
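
Restriction site analysis can sort homoeologous clones when a site is present in one homoeolog and absent from the other. The sketch below shows the idea on plain sequence strings. The enzyme (EcoRI, recognition site GAATTC, cutting after the first base), the assumption that the site marks the A-genome homoeolog, and the toy clone sequences are all hypothetical; the text does not name the enzymes used. Only the given strand is scanned, which suffices for palindromic sites such as EcoRI's.

```python
"""Hypothetical in silico digest for sorting homoeologous clones."""


def digest_fragments(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the cut position within the recognition site
    (1 for EcoRI, G^AATTC). Only the given strand is scanned, which is
    adequate for palindromic recognition sites.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


def assign_homoeolog(seq: str, site: str, cut_offset: int,
                     cut_genome: str = "A", uncut_genome: str = "D") -> str:
    """Assign a clone to the subgenome whose homoeolog carries the diagnostic site."""
    cut = len(digest_fragments(seq, site, cut_offset)) > 1
    return cut_genome if cut else uncut_genome


if __name__ == "__main__":
    ECORI, ECORI_CUT = "GAATTC", 1
    clones = {"clone1": "AAAAGAATTCTTTT", "clone2": "AAAAGAATACTTTT"}  # toy clones
    for name, seq in clones.items():
        print(name, digest_fragments(seq, ECORI, ECORI_CUT),
              assign_homoeolog(seq, ECORI, ECORI_CUT))
```

In practice the diagnostic site would be chosen from an alignment of the two homoeologs, and the fragment sizes would be read from a gel rather than computed.
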

Sequences were aligned using BioEdit v. 5.0.9 (Hall 1999) and analyzed for substitutions using DnaSP v. 3.53 (Rozas and Rozas 1999). Alignment of orthologs was straightforward owing to the low levels of sequence divergence among the taxa studied. Divergence between orthologous exons of A- and D-genome species averaged 3.8% at synonymous sites and 0.8% at nonsynonymous sites across the genes studied, with values approximately twice as high in comparisons with the outgroup. Intron divergence was slightly lower than synonymous divergence in exons, averaging 3% across orthologous introns of A- and D-genome cottons and twice this amount in comparisons of either diploid with the outgroup. This low level of sequence divergence also facilitated inference of orthologous exon-intron boundaries among the genomes studied. Splice sites were inferred primarily by direct comparison of genomic sequences with the orthologous cotton cDNAs from which the original PCR amplification primers were designed. For some genes, splice sites were inferred from BLAST searches against other EST databases, as described previously (Cronn, Small, and Wendel 1999; Small and Wendel 2000; Cedroni et al. 2002).
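
The comparison of genomic sequence with cDNA described above can be illustrated with a deliberately simple, hypothetical greedy procedure (not the authors' method). It walks along both sequences; at the first mismatch it assumes an intron begins, then searches at least `min_intron` bases downstream for the next `anchor` bases of cDNA to locate where the following exon resumes. It assumes both sequences are on the same strand with no sequencing differences, and that an intron's first base differs from the first base of the next exon. Where a boundary is ambiguous the greedy walk can shift it, which the canonical GT...AG check is likely to flag. The toy sequences are invented.

```python
"""Hypothetical greedy intron finder: genomic DNA versus its cDNA."""


def infer_introns(genomic: str, cdna: str, anchor: int = 12,
                  min_intron: int = 20) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intron coordinates in genomic.

    Assumes same-strand input and no sequence differences between the two.
    """
    g, c = genomic.upper(), cdna.upper()
    introns = []
    gi = ci = 0
    while ci < len(c):
        if gi < len(g) and g[gi] == c[ci]:
            gi += 1
            ci += 1
            continue
        # Mismatch: assume an intron starts at gi and find where the cDNA resumes.
        probe = c[ci:ci + anchor]
        resume = g.find(probe, gi + min_intron)
        if resume == -1:
            raise ValueError(f"cDNA position {ci} could not be placed in the genomic sequence")
        introns.append((gi, resume))
        gi = resume
    return introns


def has_canonical_splice_sites(genomic: str, intron: tuple[int, int]) -> bool:
    """True if the intron starts with GT (5' donor) and ends with AG (3' acceptor)."""
    start, end = intron
    seq = genomic.upper()[start:end]
    return seq.startswith("GT") and seq.endswith("AG")


if __name__ == "__main__":
    exon1, exon2 = "ATGGCTTCAAAC", "ATCCGAGGTTAACTGA"  # toy exons
    intron = "GTAAGTATTTTTCTTTTTGCAG"                  # toy 22-bp intron
    genomic, cdna = exon1 + intron + exon2, exon1 + exon2
    for span in infer_introns(genomic, cdna):
        print(span, span[1] - span[0], "bp", has_canonical_splice_sites(genomic, span))
```

On the toy input this recovers a single 22-bp intron at genomic positions 12 to 34 with canonical GT...AG ends. Real data with sequence differences would call for a proper spliced alignment rather than exact matching.
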

Intron sizes were inferred for partial or full-length genes for 28 sets of orthologs. As shown in table 1, intron number varied widely among the genes analyzed, ranging from 1 (14 genes) to 11 (CesA1). Totaled across the 28 genes, 76 introns were both unambiguously inferred and sequenced from the 5 Gossypium genomes (3 diploid genomes and the 2 homoeologous genomes of the allopolyploid), although only 56 of these were obtained from the outgroup. Gossypium introns ranged in size from 71 bp (AdhC) to more than 918 bp (a partial intron from A1550, a putative aldehyde dehydrogenase), with a mean length of 149.5 ± 151.4 bp and a median length of 94 bp. These estimates compare closely with the mean (168 bp; Arabidopsis Genome Initiative 2000) and median (100 bp; Yu et al. 2002) intron sizes from a near-exhaustive sampling of Arabidopsis genes, whereas rice introns are apparently larger (mean and median of 356 and 138 bp, respectively; Yu et al. 2002). Notably, both of these model organisms have much smaller genomes than the Gossypium species studied here, yet their mean intron sizes are larger.
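
The summary statistics above (a mean, a ± dispersion term that we assume is a sample standard deviation, and a median) can be computed with Python's standard `statistics` module, as sketched below. The intron lengths in the example are invented. The point is that a mean well above the median, as in the Gossypium figures (149.5 versus 94 bp), indicates a right-skewed distribution in which a few long introns pull the mean upward.

```python
"""Summary statistics for a set of intron lengths (bp)."""
from statistics import mean, median, stdev


def summarize_intron_lengths(lengths):
    """Return n, mean, sample standard deviation, and median of intron lengths."""
    if len(lengths) < 2:
        raise ValueError("need at least two introns to estimate a standard deviation")
    return {
        "n": len(lengths),
        "mean": mean(lengths),
        "sd": stdev(lengths),  # sample (n - 1) standard deviation
        "median": median(lengths),
    }


def is_right_skewed(summary):
    """Crude skew check: a mean exceeding the median suggests a long right tail."""
    return summary["mean"] > summary["median"]


if __name__ == "__main__":
    toy_lengths = [71, 80, 85, 94, 102, 120, 450]  # hypothetical values, not data
    s = summarize_intron_lengths(toy_lengths)
    print(f"n={s['n']}, mean={s['mean']:.1f} ± {s['sd']:.1f} bp, median={s['median']} bp")
    print("right-skewed:", is_right_skewed(s))
```

With a skewed distribution like this, the median is the more representative single value, which is presumably why both are reported.
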

With respect to the primary question of whether genome and intron sizes are correlated within Gossypium, the data in table 1 show unequivocally that these two genomic features are uncoupled. For homologous and complete introns, cumulative intron length differed between the A-genome (11,357 bp) and D-genome (11,368 bp) diploids by only 11 nucleotides, with the smaller genome having the negligibly higher (0.1%) total. Moreover, in no case among the 76 introns scored did intron sizes differ appreciably between the diploid cottons; all but two introns (introns 5 and 6 of C4 kinase) differed by 8 bp or less. Similarly, total intron length for any given gene did not differ among the genomes studied. These results extend to the polyploid level: intron sizes for homoeologous genes in allopolyploid cotton do not differ appreciably from each other or from those of their diploid progenitors. This latter finding is novel, though not unexpected given earlier, related results (Cronn, Small, and Wendel 1999; Small and Wendel 2000). When data are tabulated for the subset of 56 homologous introns sequenced from either outgroup genus, Gossypioides or Kokia, both of which have much smaller genomes than Gossypium, the same general conclusions are reached: mean intron sizes in Gossypium and its outgroup differ by an average of only two nucleotides (means of 161.2 and 159.2 bp, respectively). Thus, the rate of indel accumulation in introns was relatively low, with no evident differences among taxa in this respect.
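
The comparisons in this paragraph amount to summing homologous intron lengths per genome, taking the relative difference of the totals, and flagging any individual intron pair whose lengths differ by more than a chosen threshold (8 bp in the text). A minimal sketch follows; the function names and the toy per-intron values are hypothetical, while the cumulative totals are those reported above.

```python
"""Paired comparison of homologous intron lengths between two genomes."""


def percent_difference(reference_total: int, other_total: int) -> float:
    """Difference of other_total from reference_total, as a percentage of the reference."""
    return 100.0 * (other_total - reference_total) / reference_total


def compare_paired_introns(lengths_a, lengths_d, threshold_bp=8):
    """Compare one-to-one lists of homologous intron lengths (bp) from genomes A and D."""
    if len(lengths_a) != len(lengths_d):
        raise ValueError("intron lists must be paired one-to-one")
    diffs = [d - a for a, d in zip(lengths_a, lengths_d)]
    return {
        "total_a": sum(lengths_a),
        "total_d": sum(lengths_d),
        "percent_difference": percent_difference(sum(lengths_a), sum(lengths_d)),
        # indices of intron pairs whose lengths differ by more than the threshold
        "flagged": [i for i, diff in enumerate(diffs) if abs(diff) > threshold_bp],
    }


if __name__ == "__main__":
    # Cumulative totals reported in the text for complete homologous introns.
    print(f"A vs D totals: {percent_difference(11357, 11368):.2f}%")
    # Hypothetical per-intron lengths, for illustration only.
    print(compare_paired_introns([90, 150, 312, 75], [92, 150, 300, 74]))
```

Applied to the reported totals, the relative difference is about 0.097%, which the text rounds to 0.1%.
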

Although we sampled only a tiny fraction of the introns in the Gossypium genome, the near-identity of intron sizes across taxa that vary twofold in genome size, and the uniformity of this observation across genes, suggest that our primary conclusion is robust: intron and genome size evolution are uncoupled in Gossypium. This may well prove common in plants, as the comparison of intron sizes in Arabidopsis, Oryza, and Gossypium again suggests. Most studies in animals have focused on broader evolutionary scales than that encompassed here, with the notable exception of Moriyama, Petrov, and Hartl (1998), who compared the sizes of 115 orthologous introns in two Drosophila species that differ twofold in genome size, much as in the present study. They reported that D. virilis, with a genome size of 0.34–0.38 pg, had significantly larger introns (mean of 394 bp) than D. melanogaster (mean of 283 bp), which has the smaller genome (0.18–0.21 pg). Additional studies are needed to evaluate the generality of this contrast between insects and plants with respect to the correlation between intron and genome size.

One explanation for intron size differences among organisms is that they differ in the inherent mutational processes that generate insertions and deletions (Ogata, Fujibuchi, and Kanehisa 1996; Moriyama, Petrov, and Hartl 1998; Petrov et al. 2000; Petrov 2001). In the present study, either divergence levels were too low to detect subtle differences in deletional bias, or such differences do not exist in the lineages examined. In humans, shorter introns have been shown to have a stronger mutational bias toward deletions than longer introns (Vinogradov 2002), suggesting a causal connection between intron size and relative rates of indel accumulation. Carvalho and Clark (1999), noting that the strength of natural selection should be related to recombination rate, showed that longer introns in D. melanogaster occur disproportionately in regions of low recombination, consistent with the notion that larger introns are slightly deleterious. Comeron and Kreitman (2000), however, proposed that insertions that lengthen introns are selectively advantageous in regions of low recombination precisely because they enhance recombination, thereby counterbalancing the mutational bias toward deletions. More recently, it has been suggested that the association between intron size and recombination rate is a passive response to differences in effective population size, without the need to invoke natural selection at the level of the gene (Lynch 2002). These studies underscore the complexity of the issue, with intron size reflecting a balance of evolutionary forces potentially operating at the population, whole-genome (Petrov 2001), and genic levels.
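
The idea that intron length reflects a balance between insertion and deletion can be made concrete with a toy model. In the sketch below, which is our illustration rather than a model used in this study or in the works cited, each time step carries a fixed probability of an insertion and of a deletion, with event lengths drawn uniformly from 1 to a maximum. The expected change per step is then p_ins × mean insertion length minus p_del × mean deletion length. The minimum-length floor (a stand-in for splicing constraints) and all parameter values are hypothetical.

```python
"""Toy model of intron length under insertion and deletion mutations."""
import random


def expected_change_per_step(p_ins, max_ins, p_del, max_del):
    """Expected net length change per step, ignoring the minimum-length floor.

    Event lengths are uniform on 1..max, so their mean is (max + 1) / 2.
    """
    return p_ins * (max_ins + 1) / 2 - p_del * (max_del + 1) / 2


def simulate_intron_length(length, steps, p_ins, max_ins, p_del, max_del,
                           min_length=60, seed=0):
    """Simulate one intron's length; deletions never shrink it below min_length."""
    if length < min_length:
        raise ValueError("starting length is below the minimum-length floor")
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < p_ins:
            length += rng.randint(1, max_ins)
        if rng.random() < p_del:
            length = max(min_length, length - rng.randint(1, max_del))
    return length


if __name__ == "__main__":
    balanced = dict(p_ins=0.01, max_ins=5, p_del=0.01, max_del=5)
    deletion_biased = dict(p_ins=0.01, max_ins=5, p_del=0.02, max_del=5)
    for label, params in [("balanced", balanced), ("deletion-biased", deletion_biased)]:
        drift = expected_change_per_step(**params)
        finals = [simulate_intron_length(150, 5000, seed=s, **params) for s in range(20)]
        print(f"{label}: expected change {drift:+.3f} bp/step, "
              f"mean final length {sum(finals) / len(finals):.1f} bp")
```

Even a small per-event bias compounds over many steps. Conversely, over the short divergence times implied by roughly 3% intron divergence, few indels accumulate, so subtle differences in bias would be hard to detect.
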

As noted earlier, the correlations between genome and intron sizes observed at broad phylogenetic scales (e.g., human vs. avian) are relatively weak (Deutsch and Long 1999), and "other factors are likely to be involved in the evolution of intron size" (loc. cit., p. 3226). Moreover, whether a correlation is observed clearly depends on the taxa studied as well as on the phylogenetic scale; maize and humans, for example, have rather similar genome sizes, but human introns are on average an order of magnitude larger than those of maize. As noted by others (Wong et al. 2000; Yu et al. 2002), this difference in gene organization reflects one of the most obvious contrasts between plant and mammalian genomes: most transposable element (TE) insertions occur between genes in the former (SanMiguel et al. 1996; Bennetzen 1998, 2000) but within genes (introns) in the latter (Wong, Passey, and Yu 2001). Thus, differences in TE activity and insertional preference likely explain much of the correlation between genome and intron sizes observed in broader phylogenetic surveys (e.g., Hughes and Hughes 1995; Deutsch and Long 1999; Vinogradov 1999; McLysaght et al. 2000).

For comparisons among more narrowly circumscribed groups, the proximate and ultimate causes of intron size evolution are likely to be more subtle and may reflect the balance of several underlying mechanisms as well as external and internal evolutionary forces (Petrov 2001). Moriyama, Petrov, and Hartl (1998) interpreted the longer introns of D. virilis relative to those of D. melanogaster as suggesting that mechanisms governing genome size change "operate more or less uniformly" throughout the genome. The present study demonstrates that this need not be the case: intron sizes in plants may remain remarkably static even when other genomic components are massively expanded (or contracted; Wendel et al. 2002). An important corollary, of general significance to the question of C-value evolution, is that genome size expansion and contraction likely reflect heterogeneous forces and mechanisms that need not affect all noncoding genomic constituents uniformly.

Acknowledgements

We are grateful to R. Noyes, A. Paterson, and J. Rong for picking BAC clones, T. Wilkins and K. Shockey for GhCLK1 sequences, C. Grover for CesA sequence data, R. Percifield for technical assistance, and K. Adams for discussion. Financial support was provided by the National Science Foundation, the US-Israel Binational Science Foundation, the Plant Science Institute of Iowa State University, and the Spanish Ministry of Education, Culture, and Sports.

Footnotes

Kenneth Wolfe, Reviewing Editor

Keywords: molecular evolution, Gossypium, C value, DNA content variation

Address for correspondence and reprints: Jonathan F. Wendel, Department of Botany, Iowa State University, Ames, Iowa 50011. E-mail: jfw@iastate.edu.

References

    Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796-815.

    Bennett, M. D., A. V. Cox, and I. J. Leitch. 1997. Angiosperm DNA C-values database. http://www.rbgkew.org.uk/cval/database1.html

    Bennetzen, J. L. 1998. The structure and evolution of angiosperm nuclear genomes. Curr. Opin. Plant Biol. 1:103-108.

    Bennetzen, J. L. 2000. Transposable element contributions to plant gene and genome evolution. Plant Mol. Biol. 42:251-269.

    Brubaker, C. L., A. H. Paterson, and J. F. Wendel. 1999. Comparative genetic mapping of allotetraploid cotton and its diploid progenitors. Genome 42:184-203.

    Carvalho, A. B., and A. G. Clark. 1999. Intron size and natural selection. Nature 401:344.

    Cedroni, M. L., R. C. Cronn, K. L. Adams, T. A. Wilkins, and J. F. Wendel. 2002. Evolution and expression of MYB genes in diploid and polyploid cotton. Plant Mol. Biol. (in press).

    Comeron, J. M., and M. Kreitman. 2000. The correlation between intron length and recombination in Drosophila: dynamic equilibrium between mutational and selective forces. Genetics 156:1175-1190.

    Cronn, R. C., M. Cedroni, T. Haselkorn, C. Grover, and J. F. Wendel. 2002b. PCR-mediated recombination in amplification products derived from polyploid cotton. Theor. Appl. Genet. 104:482-489.

    Cronn, R. C., R. L. Small, T. Haselkorn, and J. F. Wendel. 2002a. Rapid diversification of the cotton genus (Gossypium: Malvaceae) revealed by analysis of sixteen nuclear and chloroplast genes. Am. J. Bot. 89:707-725.

    Cronn, R., R. L. Small, and J. F. Wendel. 1999. Duplicated genes evolve independently following polyploid formation in cotton. Proc. Natl. Acad. Sci. USA 96:14406-14411.

    Deutsch, M., and M. Long. 1999. Intron-exon structure of eukaryotic model organisms. Nucleic Acids Res. 27:3219-3228.

    Endrizzi, J. E., E. L. Turcotte, and R. J. Kohel. 1985. Genetics, cytogenetics, and evolution of Gossypium. Adv. Genet. 23:271-375.

    Goff, S. A., D. Ricke, T.-H. Lan, et al. (52 co-authors). 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296:92-100.

    Gregory, T. R. 2001. Animal genome size database. http://www.genomesize.com

    Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41:95-98.

    Hughes, A. L., and M. K. Hughes. 1995. Small genomes for better flyers. Nature 377:391.

    Lynch, M. 2002. Intron evolution as a population-genetic process. Proc. Natl. Acad. Sci. USA 99:6118-6123.

    McLysaght, A., A. J. Enright, L. Skrabanek, and K. H. Wolfe. 2000. Estimation of synteny conservation and genome compaction between pufferfish (Fugu) and human. Yeast 17:22-36.

    Moriyama, E. N., D. A. Petrov, and D. L. Hartl. 1998. Genome size and intron size in Drosophila. Mol. Biol. Evol. 15:770-773.

    Ogata, H., W. Fujibuchi, and M. Kanehisa. 1996. The size differences among mammalian introns are due to the accumulation of small deletions. FEBS Lett. 390:99-103.

    Paterson, A. H., C. L. Brubaker, and J. F. Wendel. 1993. A rapid method for extraction of cotton (Gossypium spp.) genomic DNA suitable for RFLP or PCR analysis. Plant Mol. Biol. Rep. 11:122-127.

    Petrov, D. A. 2001. Evolution of genome size: new approaches to an old problem. Trends Genet. 17:23-28.

    Petrov, D. A., T. A. Sangster, J. S. Johnston, D. L. Hartl, and K. L. Shaw. 2000. Evidence for DNA loss as a determinant of genome size. Science 287:1060-1062.

    Reinisch, A. J., J. Dong, C. L. Brubaker, D. M. Stelly, J. F. Wendel, and A. H. Paterson. 1994. A detailed RFLP map of cotton, Gossypium hirsutum × G. barbadense: chromosome organization and evolution in a disomic polyploid genome. Genetics 138:829-847.

    Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.

    SanMiguel, P., A. Tikhonov, Y. K. Jin, et al. (11 co-authors). 1996. Nested retrotransposons in the intergenic regions of the maize genome. Science 274:765-768.

    Seelanan, T., A. Schnabel, and J. F. Wendel. 1997. Congruence and consensus in the cotton tribe. Syst. Bot. 22:259-290.

    Small, R. L., and J. F. Wendel. 2000. Copy number lability and evolutionary dynamics of the Adh gene family in diploid and tetraploid cotton (Gossypium). Genetics 155:1913-1926.

    Tel-zur, N. 1999. Modified CTAB procedure for DNA isolation from epiphytic cacti of the genera Hylocereus and Selenicereus (Cactaceae). Plant Mol. Biol. Rep. 17:249-254.

    Tomkins, J. P., D. G. Peterson, T. J. Yang, D. Main, T. A. Wilkins, A. H. Paterson, and R. A. Wing. 2001. Development of genomic resources for cotton (Gossypium hirsutum L.): BAC library construction, preliminary STC analysis, and identification of clones associated with fiber development. Mol. Breed. 8:255-261.

    Vinogradov, A. E. 1999. Intron-genome size relationship on a large evolutionary scale. J. Mol. Evol. 49:376-384.

    Vinogradov, A. E. 2002. Growth and decline of introns. Trends Genet. 18:232-236.

    Wendel, J. F. 1989. New World tetraploid cottons contain Old World cytoplasm. Proc. Natl. Acad. Sci. USA 86:4132-4136.

    Wendel, J. F. 1995. Cotton. Pp. 358-366 in N. Simmonds and J. Smartt, eds. Evolution of crop plants. Longman, London.

    Wendel, J. F., and V. A. Albert. 1992. Phylogenetics of the cotton genus (Gossypium L.): character-state weighted parsimony analysis of chloroplast DNA restriction site data and its systematic and biogeographic implications. Syst. Bot. 17:115-143.

    Wendel, J. F., and R. C. Cronn. 2002. Polyploidy and the evolutionary history of cotton. Adv. Agron. (in press).

    Wendel, J. F., R. C. Cronn, J. S. Johnston, and H. J. Price. 2002. Feast and famine in plant genomes. Genetica 115:37-47.

    Wong, G. K.-S., D. A. Passey, Y.-Z. Huang, Z. Yang, and J. Yu. 2000. Is "junk" DNA mostly intron DNA? Genome Res. 10:1672-1678.

    Wong, G. K.-S., D. A. Passey, and J. Yu. 2001. Most of the human genome is transcribed. Genome Res. 11:1975-1977.

    Yu, J., S. Hu, J. Wang, et al. (97 co-authors). 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296:79-91.

Accepted for publication August 27, 2002.