Bendable Genes of Warm-blooded Vertebrates

Alexander E. Vinogradov

Institute of Cytology, Russian Academy of Sciences


    Abstract
It is shown that in the genomes of warm-blooded vertebrates the elevation of genic GC-content is associated with an increase in the bendability of the DNA helix, which is both absolute and relative as compared with random sequences. This trend takes place both in exons and introns, being more pronounced in the latter. At the same time, the free energy of melting (delta G) of exons and introns increases only absolutely with elevation of GC-content, whereas it decreases as compared with random sequences (again, this trend is stronger in the introns). In genes of cold-blooded animals, plants, and unicellular organisms, these regularities are weaker and often not consistent. Generally, there is a negative correlation between bendability and melting energy at any fixed GC-content value. This effect is stronger in the introns. These findings suggest that GC-enrichment of genes in the homeotherm vertebrates can be caused by selection for increased bendability of DNA.


    Introduction
Genomes of higher eukaryotes consist of regions that differ in GC-percent (isochores) (reviewed by D'Onofrio et al. 1999; Bernardi 2000). This heterogeneity reaches its highest degree in mammals and birds. It is the GC-rich regions that seem to be the evolved trait (Bernardi, Hughes, and Mouchiroud 1997). There are two alternative groups of views on the emergence of these regions: neutralist (e.g., mutation bias) (Wolfe, Sharp, and Li 1989; Wolfe and Sharp 1993; Ellsworth, Hewett-Emmett, and Li 1994; Eyre-Walker 1994; Francino and Ochman 1999) and selectionist. The proposed selectionist explanations involve either a physical property of DNA (higher thermal stability) (Bernardi and Bernardi 1990) or the informational content of the coding sequences (codon usage bias for better translation performance, or even a shift in amino acid composition) (D'Onofrio et al. 1999; Bernardi 2000). The latter hypotheses, however, cannot explain why the noncoding DNA (introns) of GC-rich genes also shows an increase in GC-content. Here, the dependences of DNA melting energy and bendability on GC-content are studied for the coding and noncoding parts of genes in different genomes and compared with those of random sequences.


    Materials and Methods
The sequences of nuclear genes were extracted from GenBank (release 123). For the better-represented genomes, namely, human, mouse, nematode, fruit fly, thale cress, rice, and fission yeast, only genes with complete coding sequences (CDS) were taken; for the others, all genes with at least two complete exons and the introns between them were selected. Genes were checked for duplicates on the basis of CDS similarity (>99%). All coding sequences, including partial ones, were checked for the absence of internal stop codons. The intron-exon boundaries were taken from the annotations. In total, 59,856 genes with 346,409 exons and 286,764 introns (total length of 204.3 Mb) were analyzed.
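The check for internal stop codons can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name `has_internal_stop` is hypothetical, the standard nuclear genetic code is assumed, and the sequence is assumed to start in frame (a partial CDS with a frame offset would first have to be trimmed according to its annotation).

```python
# Stop codons of the standard nuclear genetic code (an assumption; some
# organisms use variant codes).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_internal_stop(cds: str) -> bool:
    """Return True if an in-frame stop codon occurs before the last complete codon.

    The sequence is assumed to begin at codon position 1. The final complete
    codon is skipped because it may be the legitimate terminator.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return True
    return False


if __name__ == "__main__":
    print(has_internal_stop("ATGAAATAA"))      # terminator only
    print(has_internal_stop("ATGTAAAAATAA"))   # premature TAA in frame
```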

The random DNA sequences of 10-kb length were generated with 0.1% increments in the 1%–100% range of GC-content using the Perl function rand. (Each base pair was drawn in two steps, one choosing between the GC and AT pairs, and the other choosing between the purine and pyrimidine bases on a given strand. The actual percentages of GC-pairs and of purine bases on a given strand were determined after the sequence was generated.) Generally, the average content of purine bases on the coding strand is about 48% in exons and about 52% in introns. Therefore, complete sets of random sequences were generated for purine contents in the range of 45%–55% (with 2.5% increments); their bendability and melting energy were not found to vary significantly. Here, the results for the 50% purine content are presented.
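The two-step draw described above can be sketched in Python as follows. The original work used Perl's rand; this translation, the function names `random_strand` and `realized_composition`, and the use of a seedable `random.Random` instance are illustrative assumptions.

```python
import random


def random_strand(length, gc_fraction, purine_fraction=0.5, rng=None):
    """Generate one strand of a random duplex, one base pair at a time.

    Each position uses two independent draws: the first chooses between a
    GC and an AT pair, the second between the purine and the pyrimidine of
    that pair on the given strand.
    """
    rng = rng or random.Random()
    bases = []
    for _ in range(length):
        is_gc = rng.random() < gc_fraction
        is_purine = rng.random() < purine_fraction
        if is_gc:
            bases.append("G" if is_purine else "C")
        else:
            bases.append("A" if is_purine else "T")
    return "".join(bases)


def realized_composition(seq):
    """Return the actual (GC fraction, purine fraction) of a generated strand."""
    n = len(seq)
    gc = (seq.count("G") + seq.count("C")) / n
    purine = (seq.count("A") + seq.count("G")) / n
    return gc, purine


if __name__ == "__main__":
    strand = random_strand(10_000, gc_fraction=0.60, rng=random.Random(1))
    print(realized_composition(strand))
```

Because the draws are stochastic, the realized composition differs slightly from the requested one, which is why it is measured after generation rather than taken from the input parameters.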

The parameters under study were determined from two published tables: the trinucleotide table for bendability, based on consensus values obtained from DNase I digestion and nucleosome positioning studies (Gabrielian, Simoncsits, and Pongor 1996), and the dinucleotide table for the free energy of melting (delta G), obtained from UV absorbance versus temperature profiles (SantaLucia, Allawi, and Seneviratne 1996). The values were read in a sliding tri- or dinucleotide frame, respectively (with a 1-nt step), and averaged over each sequence.
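The sliding-frame averaging can be sketched as below. This is an illustration under stated assumptions, not the original implementation: the function `sliding_mean` is hypothetical, the parameter table is passed in by the caller, and the table is assumed to possibly list only one member of each reverse-complement pair, so a missing k-mer is looked up by its reverse complement. The numbers in the usage example are placeholders, not the published bendability or delta G values.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def sliding_mean(seq, table, k):
    """Average a k-mer parameter over a sequence with a 1-nt step.

    k = 3 corresponds to the trinucleotide (bendability) frame and k = 2 to
    the dinucleotide (delta G) frame. Windows not found in the table, e.g.
    those containing ambiguous bases, are skipped.
    """
    seq = seq.upper()
    total = 0.0
    count = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        value = table.get(kmer)
        if value is None:
            value = table.get(reverse_complement(kmer))
        if value is None:
            continue
        total += value
        count += 1
    if count == 0:
        raise ValueError("no window of the sequence is covered by the table")
    return total / count


if __name__ == "__main__":
    # Placeholder values for illustration only (NOT published parameters).
    toy_table = {"AA": 1.0, "AC": 2.0}
    print(sliding_mean("AAC", toy_table, k=2))
```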

The statistical analyses were done with the Statgraphics (Statistical Graphics Co.) and Statistica (StatSoft, Inc.) software.


    Results
Both the bendability and the free energy of melting (which reflects the thermal stability) of various DNA sequences increase with elevation of the GC-content (fig. 1). However, the bendability of human exons and introns appears to increase faster than that of the random sequences, whereas their melting energy rises more slowly than its random-sequence counterpart (fig. 1). To check the statistical significance of this effect, the slopes of the linear regressions for genic and random sequences can be compared (fig. 1, see legend). The slope of bendability is significantly higher in the human genes (exons, 0.0366 ± 0.0002; introns, 0.0368 ± 0.0002) than in the random sequences (0.0302 ± 0.0004), whereas the reverse is true for the slope of melting energy (exons, 0.0108 ± 0.0000; introns, 0.0102 ± 0.0000; random sequences, 0.0114 ± 0.0000).
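One simple way to compare two independent regression slopes from their reported values and standard errors is a large-sample z-test, sketched below. The text does not state which test was actually used, so this choice and the function name `slope_difference_z` are assumptions made for illustration.

```python
import math


def slope_difference_z(b1, se1, b2, se2):
    """Compare two independent regression slopes.

    Returns the z statistic and the two-sided P value under a normal
    approximation, which is reasonable for samples as large as those here.
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Bendability slopes quoted in the text: human exons versus random sequences.
    z, p = slope_difference_z(0.0366, 0.0002, 0.0302, 0.0004)
    print(f"z = {z:.1f}, P = {p:.2g}")  # z is about 14.3
```

The melting-energy slopes cannot be tested this way from the printed figures alone, because their standard errors are rounded to 0.0000.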



Fig. 1.—Plots of (A, B) bendability and (C, D) melting energy versus GC-percent for human (A, C) exons and (B, D) introns, and for random DNA sequences (rand. seq.). Linear regression equations for bendability: random sequences (over the GC-range of human genes), Y = 3.501 (±0.011) + 0.0302 (±0.0004) X; exons, Y = 3.382 (±0.016) + 0.0366 (±0.0002) X; introns, Y = 3.291 (±0.016) + 0.0368 (±0.0002) X. The corresponding equations for melting energy: random sequences, Y = 0.818 (±0.001) + 0.0114 (±0.0000) X; exons, Y = 0.837 (±0.001) + 0.0108 (±0.0000) X; introns, Y = 0.859 (±0.001) + 0.0102 (±0.0000) X. (The polynomial regression line and its confidence limits are drawn on each plot but cannot be discerned because they lie within the band of points representing the random sequences.)

 
However, these relationships are not quite linear (especially at the margins of the GC-content range). Therefore, for a correct comparison, the slopes for random sequences should be determined separately for each range of exonic or intronic GC-percent. Moreover, linear regression does not give a perfect approximation (r² of bendability is only 95.4% for random sequences). The dependences of bendability and melting energy on GC-content in random sequences were therefore approximated by polynomial regression. The third-order polynomial for bendability and the second-order polynomial for melting energy fit very closely, accounting for 99.96% and 99.998% of the variance, respectively. These approximated values were subtracted from the exon and intron values of bendability and melting energy, and the relationships of the resulting residuals with the exonic and intronic GC-content were analyzed (table 1, figs. 2 and 3). In the genes of homeotherms, the residuals of bendability correlate positively with GC-content in both exons and introns, whereas the residuals of DNA melting energy show the opposite trend. It is noteworthy that both correlations are stronger in the introns than in the exons (the difference between the corresponding correlation coefficients is significant for all the homeotherm cases, except bendability in the rabbit, which is represented by the smallest sample).
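The residual analysis can be sketched as follows: fit a polynomial to the random-sequence values, subtract its prediction from each exon or intron value, and correlate the residuals with GC-content. The original analysis was run in Statgraphics and Statistica; this pure-Python version, its function names, and the choice to express GC-content as a fraction (0 to 1, for numerical stability of the normal equations) are assumptions for illustration.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    m = degree + 1
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]


def polyval(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def relative_values(gc, values, coeffs):
    """Subtract the random-sequence expectation from each genic value."""
    return [v - polyval(coeffs, g) for g, v in zip(gc, values)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

With `coeffs = polyfit(random_gc, random_bendability, 3)`, the correlation reported for a set of exons would correspond to `pearson_r(exon_gc, relative_values(exon_gc, exon_bendability, coeffs))`; the variable names in this usage line are likewise hypothetical.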


Table 1. Coefficients of Correlation Between GC-percentage and Relative DNA Bendability or Relative Melting Energy (delta G), and Partial Correlation Between Bendability and Melting Energy at Fixed GC-percentage for Genes of Different Organisms

 


Fig. 2.—The regression of (A, B) relative bendability and (C, D) relative melting energy on GC-percent for (A, C) human exons and (B, D) introns. (The relative values were obtained by subtraction of polynomial regression for random sequences. The Y-zero line corresponds to this regression, i.e., to the random sequences.) Dotted lines, confidence limits of regression (P = 0.95)

 
The trend of bendability over melting energy is weaker in murids than in other mammals (table 1). This trend can also be seen in some lower animals (cold-blooded vertebrates and invertebrates), although the correlations are weaker and may not be consistent. In unicellular organisms and plants, the correlations are much weaker and usually not consistent (table 1). Although the signs of the correlations can be similar in the homeotherms and some other organisms, the distributions of the residuals differ: a greater part of the bendability residuals is positive in the homeotherms than in the lower organisms (cf. figs. 2 and 3). For the melting energy, the opposite trend is observed (figs. 2 and 3).

With only a few exceptions (exons of the rat, introns and exons of a green alga), there is a negative partial correlation between bendability and melting energy at fixed GC-percent (table 1). This correlation is always stronger in the introns than in the exons. (The coefficients of partial correlation between the polynomial-subtracted residuals of bendability and melting energy at fixed GC-percent were very similar and are not shown.)
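The partial correlation at fixed GC-percent follows from the three pairwise Pearson coefficients by the standard first-order formula, r_xy.z = (r_xy − r_xz r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). The sketch below is illustrative; the function names are hypothetical and the data in the example are synthetic, chosen only to show that two variables can correlate positively overall yet negatively once the shared dependence on a third is removed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y with z held fixed."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Synthetic data: both x and y rise with z, but their deviations
    # from that shared trend are exactly opposite.
    z = [1, 2, 3, 4, 5, 6]
    e = [1, -1, 1, -1, 1, -1]
    x = [zi + ei for zi, ei in zip(z, e)]
    y = [zi - ei for zi, ei in zip(z, e)]
    print(pearson_r(x, y))        # positive raw correlation
    print(partial_corr(x, y, z))  # negative partial correlation
```

In the setting of the text, x and y would be the bendability and melting energy of the exons or introns, and z their GC-percent.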


    Discussion
The accelerated growth of bendability and the lag of melting energy with elevation of GC-percent in both exons and introns, as compared with random sequences, seem to contradict the hypothesis that mutation pressure is the cause of compositional heterogeneity in the genomes of warm-blooded vertebrates: under mutation pressure alone, the bendability and melting energy of the introns, at least, should not differ from the corresponding values for random sequences with the same GC-content. These results suggest that the need for bendability, rather than thermal stability, may be the leading force behind the elevation of GC-content in the genes of mammals and birds. The melting energy and bendability were found to correlate negatively at fixed GC-percent (table 1). This is probably because the thermostability of a DNA duplex at a given GC-content is determined by base stacking energy (Doktycz et al. 1992), which is inversely related to bendability (Anselmi et al. 2000).

Genome compositional heterogeneity is known to be lower in the murids than in other mammals (Robinson, Gautier, and Mouchiroud 1997; Douady et al. 2000). Compositional heterogeneity has been found not only in the homeotherms but also, to a lesser degree, in the lower animals (D'Onofrio et al. 1999; Jabbari and Bernardi 2000; Nekrutenko and Li 2000) and in plants. Among the latter, it is most pronounced in the cereals but can also be discerned in the thale cress (Carels and Bernardi 2000; Nekrutenko and Li 2000). The present results suggest that in all these cases, except in the cereals, this heterogeneity may be caused by the increase in bendability of GC-rich genome regions. The cereals differ from the homeotherms by a much higher exon-intron contrast in GC-content (Carels et al. 1998; Vinogradov 2001) and may present a special case. If these physical DNA properties are somehow involved in the GC-enrichment of cereal genomes, either the melting energy is the leading cause or there is some subtle balance between the two forces.

Although the GC-rich regions constitute only a minor part of the genome (10%–15%), they harbor a large share of the genes because of their very high gene concentration (Bernardi 2000). They are located in the early replicating and highly transcribed chromatin (Saccone et al. 1993, 1999; Federico, Saccone, and Bernardi 1998). Therefore, the DNA helix of these genes should often be bent and unbent in its transition from the packaged to the extended state to comply with the operation of the transcription machinery. These requirements should extend both to the exons and introns and probably to the intergenic sequences as well (which are short and also GC-rich in the heavy isochores). The average molecular properties were suggested to dominate over local features in sequence-dependent nucleosome formation (Anselmi et al. 2000). It has been supposed that introns can be necessary for correct chromatin structure (Zuckerkandl 1981; Trifonov 1993). In several cases, the involvement of introns in nucleosome ordering was demonstrated experimentally (Lauderdale and Stein 1992; Liu et al. 1995). It is therefore interesting that the trend of bendability over melting energy is more pronounced in the introns of the homeotherms than in their exons (table 1). The notion that intronic bendability may be significant for chromatin structure seems to be contradicted by the fact that introns in the heavy isochores are GC-poorer than exons (e.g., Bernardi 2000; Vinogradov 2001). However, this can be explained by the impact of transposable elements, which decrease the GC-content of introns even when these elements are no longer recognizable (Duret and Hurst 2001). The increase in the bendability of the highly expressed genes of mammals and birds may be associated with the higher organizational level of these animals, which requires fast and smoothly operating transcription.



Fig. 3.—The regression of (A, B) relative bendability and (C, D) relative melting energy on GC-percent for nematode (A, C) exons and (B, D) introns. (The relative values were obtained by subtraction of polynomial regression for random sequences. The Y-zero line corresponds to this regression, i.e., to the random sequences.) Dotted lines, confidence limits of regression (P = 0.95)

 

    Acknowledgements
The helpful comments of three anonymous reviewers are greatly appreciated. This work was supported by a grant from the Russian Foundation for Basic Research (RFBR).


    Footnotes
 
Kenneth Wolfe, Reviewing Editor

Keywords: isochore, mutation bias, GC-percent, bendability, thermal stability, introns

Address for correspondence and reprints: Alexander E. Vinogradov, Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Ave. 4, St. Petersburg 194064, Russian Federation. E-mail: aevin@mail.cytspb.rssi.ru.


    References

    Anselmi C., G. Bocchinfuso, P. De Santis, M. Savino, A. Scipioni. 2000. A theoretical model for the prediction of sequence-dependent nucleosome thermodynamic stability. Biophys. J. 79:601-613.

    Bernardi G. 2000. Isochores and the evolutionary genomics of vertebrates. Gene 241:3-17.

    Bernardi G., G. Bernardi. 1990. Compositional patterns in the nuclear genome of cold-blooded vertebrates. J. Mol. Evol. 31:265-281.

    Bernardi G., S. Hughes, D. Mouchiroud. 1997. The major compositional transitions in the vertebrate genome. J. Mol. Evol. 44(Suppl. 1):S44-S51.

    Carels N., G. Bernardi. 2000. Two classes of genes in plants. Genetics 154:1819-1825.

    Carels N., P. Hatey, K. Jabbari, G. Bernardi. 1998. Compositional properties of homologous coding sequences from plants. J. Mol. Evol. 46:45-53.

    D'Onofrio G., K. Jabbari, H. Musto, F. Alvarez-Valin, S. Cruveiller, G. Bernardi. 1999. Evolutionary genomics of vertebrates and its implications. Ann. N. Y. Acad. Sci. 870:81-94.

    Doktycz M. J., R. F. Goldstein, T. M. Paner, F. J. Gallo, A. S. Benight. 1992. Studies of DNA dumbbells. I. Melting curves of 17 DNA dumbbells with different duplex stem sequences linked by T4 endloops: evaluation of the nearest-neighbor stacking interactions in DNA. Biopolymers 32:849-864.

    Douady C., N. Carels, O. Clay, F. Catzeflis, G. Bernardi. 2000. Diversity and phylogenetic implications of CsCl profiles from rodent DNAs. Mol. Phylogenet. Evol. 17:219-230.

    Duret L., L. D. Hurst. 2001. The elevated G and C content at exonic third sites is not evidence against neutralist models of isochore evolution. Mol. Biol. Evol. 18:757-762.

    Ellsworth D. L., D. Hewett-Emmett, W. H. Li. 1994. Evolution of base composition in the insulin and insulin-like growth factor genes. Mol. Biol. Evol. 11:875-885.

    Eyre-Walker A. 1994. DNA mismatch repair and synonymous codon evolution in mammals. Mol. Biol. Evol. 11:88-98.

    Federico C., S. Saccone, G. Bernardi. 1998. The gene-richest bands of human chromosomes replicate at the onset of the S-phase. Cytogenet. Cell Genet. 80:83-88.

    Francino M. P., H. Ochman. 1999. Isochores result from mutation not selection. Nature 400:30-31.

    Gabrielian A., A. Simoncsits, S. Pongor. 1996. Distribution of bending propensity in DNA sequences. FEBS Lett. 393:124-130.

    Jabbari K., G. Bernardi. 2000. The distribution of genes in the Drosophila genome. Gene 247:287-292.

    Lauderdale J. D., A. Stein. 1992. Introns of the chicken ovalbumin gene promote nucleosome alignment in vitro. Nucleic Acids Res. 20:6589-6596.

    Liu K., E. P. Sandgren, R. D. Palmiter, A. Stein. 1995. Rat growth hormone gene introns stimulate nucleosome alignment in vitro and in transgenic mice. Proc. Natl. Acad. Sci. USA 92:7724-7728.

    Nekrutenko A., W. H. Li. 2000. Assessment of compositional heterogeneity within and between eukaryotic genomes. Genome Res. 10:1986-1995.

    Robinson M., C. Gautier, D. Mouchiroud. 1997. Evolution of isochores in rodents. Mol. Biol. Evol. 14:823-828.

    Saccone S., A. De Sario, J. Wiegant, A. K. Raap, G. Della Valle, G. Bernardi. 1993. Correlations between isochores and chromosomal bands in the human genome. Proc. Natl. Acad. Sci. USA 90:11929-11933.

    Saccone S., C. Federico, I. Solovei, M. F. Croquette, G. Della Valle, G. Bernardi. 1999. Identification of the gene-richest bands in human prometaphase chromosomes. Chromosome Res. 7:379-386.

    SantaLucia J., H. Allawi, P. A. Seneviratne. 1996. Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry 35:3555-3562.

    Trifonov E. M. 1993. Spatial separation of overlapping messages. Comput. Chem. 117:27-31.

    Vinogradov A. E. 2001. Within-intron correlation with base composition of adjacent exons in different genomes. Gene 276:143-151.

    Wolfe K. H., P. M. Sharp. 1993. Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J. Mol. Evol. 37:441-456.

    Wolfe K. H., P. M. Sharp, W. H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.

    Zuckerkandl E. 1981. A general function of noncoding polynucleotide sequences. Mass binding of transconformational proteins. Mol. Biol. Rep. 7:149-158.

Accepted for publication August 13, 2001.