In silico p53 mutation hotspots in lung cancer

P. D. Lewis1,2,4 and J. M. Parry3

1 School of Biosciences, Cardiff University, Cardiff CF10 3US, UK, 2 Biostatistics and Bioinformatics Unit, University of Wales College of Medicine, Cardiff CF14 4XN, UK and 3 Centre for Molecular and Genetic Toxicology, School of Biological Sciences, University of Wales, Swansea SA2 8PP, UK

4 To whom correspondence should be addressed Email: lewispd{at}cf.ac.uk


    Abstract
 
For cancer, one of the primary aims of molecular epidemiology is to identify the endogenous or exogenous cause of mutations within a gene. For exogenous mutagens, a large body of mutation data has been generated by in vitro and in vivo mutation assays and made publicly available through mutation databases such as the Mammalian Gene Mutation Database (http://lisntweb.swan.ac.uk/cmgt/index.htm). One particular mutation assay incorporates the bacterial supF tRNA gene, which allows selection of mutations at virtually all nucleotides. We have developed an algorithm called LwPy53 that uses supF mutation data to predict chemically induced hot-spots along the p53 gene. The prediction is based on four parameters: the mutability of supF dinucleotides after treatment with a mutagen of interest; DNA curvature along the p53 gene; the selectability of a mutation along the gene; the likelihood of a site being within a nucleosome. We applied LwPy53 to exons 5, 7 and 8 of p53 using benzo[a]pyrene diol epoxide (BPDE)-induced mutation data for supF to obtain a predicted G->T transversion spectrum after hypothetical treatment with BPDE. The resulting predicted mutation distribution reveals strong mutation hot-spots at codons 157, 248 and 273 that correlate with known BPDE adduct hot-spots within p53. The predicted BPDE spectrum strongly resembles the G->T mutation spectrum compiled from known lung cancer mutation data from smokers and further supports evidence that BPDE contributes to the overall smoking-related mutation distribution in lung cancer. The algorithm shows how BPDE target sequence specificity and DNA curvature both shape the overall mutation distribution.

Abbreviations: BPDE, (+/–)-anti-7β,8α-dihydroxy-9α,10α-epoxy-7,8,9,10-tetrahydrobenzo[a]pyrene; MGMD, Mammalian Gene Mutation Database; PAH, polycyclic aromatic hydrocarbon


    Introduction
 
There is little doubt that most carcinogens are mutagens forming DNA adducts that have the potential to lead to a mutational event. A mutation that activates a proto-oncogene or inactivates a tumour suppressor gene can result in a selective advantage for cell proliferation that can ultimately lead to malignant transformation. One principal and vital aim in cancer research is to identify exogenous and endogenous mutagens that cause tumours and understand the underlying mechanisms that lead from adduct formation to mutagenesis. Once a mutagen is identified as a carcinogen a number of important factors need to be determined, such as: identification of the genes targeted in a given cell type; environmental (cellular) factors (structural and physical) that affect adduct formation; tissue specificity (exposure); the ability of the DNA repair machinery of a particular cell type to remove adducts in different sequence contexts.

In order to gain insight into the underlying molecular events of mutagenesis many in vivo and in vitro model systems have been developed to detect mutations. The enormous accumulation of mutant data for environmental mutagens has inevitably led to the creation of publicly available mutation databases, the most comprehensive for data from mammalian cell lines being the Mammalian Gene Mutation Database (MGMD) (http://lisntweb.swan.ac.uk/cmgt/index.htm) (1). The complex pattern of mutations induced by a mutagen is generally referred to as a mutation spectrum. Qualitatively, a mutation spectrum describes the types of mutations that are observed at various frequencies, e.g. GC->TA transversions, deletions, etc., whereas quantitatively, a mutation spectrum reveals both the types and distribution of mutations along a given DNA sequence that includes both mutational hot-spots and cold-spots. Such information could give some insight into the involvement of a particular carcinogen in a certain cancer if the tumour-specific mutation spectrum is known. Many somatic mutation data have also become available for a number of cancer-related genes, in particular the p53 tumour suppressor gene, where mutations are observed in ~50% of human tumours. The extensive IARC TP53 mutation database (http://www.iarc.fr/p53/index.html) (2) contains over 15 000 p53 somatic mutations recorded for many different cancer types.

Previously, mutation information has been extracted from in vitro studies of mutagens and, combined with other factors, correlations made between mutagen, cancer type and tumour-specific mutations. A good example is the exposure of people in parts of Asia, Africa and North America to aflatoxin B1 and increased risk of hepatocellular carcinoma (3), where a G->T transversion mutation hot-spot is observed at codon 249 of the p53 gene in both tumour and human liver cells exposed to aflatoxin B1 in vitro. The association between skin cancer, characterized by tandem CC->TT transitions, and sunlight exposure is also strengthened by in vitro studies demonstrating the induction of CC->TT mutations after exposure to ultraviolet light (3). On the other hand, determination of the mutagens causing lung cancer is much more difficult due to the multitude of mutagens in tobacco smoke. It is estimated that ~90% of lung cancer deaths in the USA are caused by smoking (4) and ~60% of lung cancer cases contain p53 mutations (5). In an effort to pinpoint the mutagen(s) responsible for the lung cancer p53 mutation spectrum, Denissenko and co-workers (6) showed that adduct hot-spots of the polycyclic aromatic hydrocarbon (PAH) (+/–)-anti-7β,8α-dihydroxy-9α,10α-epoxy-7,8,9,10-tetrahydrobenzo[a]pyrene (BPDE), found in cigarette smoke, correlate with the position of major lung cancer p53 mutation hot-spots. The mutagenic signature of BPDE is also the G->T transversion, observed at high frequency in p53 in lung cancer of smokers but not non-smokers (7), in particular at methylated CpG sites within codons 157, 158, 245, 248 and 273. The potential of PAH adducts to induce point mutations is heavily dependent on the proximal and distal sequence context surrounding the adduct-bound nucleotide (8,9). It has also been established that BPDE binds preferentially to methylated CpG sites within the p53 gene (10) as well as genes in mammalian cell lines (11). Others have argued against the idea of tobacco smoke directly causing p53 mutations, primarily Rodin and Rodin (12), who suggested that p53 mutations observed in lung cancer result from selection of endogenous mutations caused by cellular environmental stress caused by smoking.

Ideally, determining the association between a specific carcinogen and cancer type would involve prediction of not just the types of mutation that would arise but also the actual carcinogen-specific target sequences in a gene such as p53, using mutation data generated in vitro, i.e. examining mutable sequences in the test gene for a given carcinogen and extrapolating this sequence information to the cancer gene. Such a predicted carcinogen-specific mutation distribution along a cancer gene, in combination with knowledge of adduct distribution and exposure, would allow for greater certainty when predicting the cause of tumourigenesis. However, the majority of genes in mutagen test systems are protein coding, displaying selection bias for certain nucleotides and thus preventing confident evaluation of the mutability of short nucleotide sequences. To bypass this problem the Escherichia coli gene supF, which encodes a tyrosine amber suppressor tRNA molecule, has been adapted for the study of mutagenesis in several shuttle vector plasmids (reviewed in 13). One of the main advantages of supF in shuttle vector plasmids is the extreme sensitivity of the gene to mutagenic inactivation. The suppressor tRNA region that is monitored for mutations is only 85 nt in length and all three possible base substitution mutations can be detected at most nucleotides. Therefore, the selection bias associated with protein coding mutagenesis marker genes is largely avoided. We have recently demonstrated the usefulness of supF mutation data for analysing and revealing the underlying patterns of spontaneous (14) and UV-induced (15) mutations in tissue- and species-specific cell types.

We describe here an algorithm, LwPy53, to predict the BPDE G->T mutation distribution along exons 5, 7 and 8 of the p53 gene using BPDE-induced mutation data at the level of the dinucleotide from the supF gene (10). In addition to target sequence specificity, the mutation distribution of a gene also depends upon the regional accessibility of DNA to the mutagen and the regional rate at which DNA repair systems remove adducts. Therefore, the LwPy53 algorithm utilizes additional parameters to represent these factors, derived from information concerning predicted p53 chromatin structure (16) and deviation from the predicted average p53 DNA curvature. The p53 BPDE mutation distribution predicted by the algorithm is remarkably similar to the p53 mutation distribution observed in lung cancer of smokers.


    Materials and methods
 
The LwPy53 algorithm essentially predicts a p53 G->T mutation distribution using the following four parameters that govern whether a G->T mutation may occur at any given guanine: (i) probability that a mutation will occur at a guanine [MUT], derived from the 5' or 3' supF dinucleotide relative mutability (drm) value; (ii) selection of mutation [SEL]; (iii) absolute deviation from the mean curvature value [CUR]; (iv) presence of DNase I-protected sites [NUC]. These parameters represent, for any given guanine: (i) the mutability of the guanine dependent on the adjacent nucleotides; (ii) allowance of a mutation to occur at a nucleotide only if the mutation is non-synonymous (can be observed); (iii) the difference between the local DNA curvature at the position of the guanine and the overall mean curvature for exons 5, 7 and 8 (this parameter is somewhat general and crude but represents the local deviation in 3-dimensional DNA structure due to increased or decreased flexibility that could reflect the potential rate of DNA repair and also possible positioning within a nucleosome); (iv) the assumption that a nucleotide falling within a nucleosome has a higher degree of protection from adduct formation. The LwPy53 algorithm is shown in Figure 1.



Fig. 1. The LwPy53 algorithm (the steps of the algorithm are explained in detail in Materials and methods).

 
BPDE-induced, singlet oxygen and hydroxyl radical supF mutation data
BPDE-induced GC->TA transversion data for the supF gene, previously published by Yoon et al. (10), were retrieved from MGMD (1). Briefly, for the supF mutation assay mutants were generated using the pSP189 plasmid containing the supF gene, either methylated or unmethylated. The plasmid was treated with 2 µM BPDE for 30 min at room temperature and transfected into the human XP-A cell line XP12BE; mutants were then retrieved and scored after electroporation into MB7070 bacteria. Singlet oxygen- and hydroxyl radical-induced GC->TA mutation data were also retrieved from previously published data (17) contained within MGMD. Unmethylated pSP189 plasmid was exposed to hydroxyl radical and singlet oxygen prior to transfection into human embryonic kidney Ad293 cells and transformation into MBL50 bacteria for mutant detection. A total of 49 GC->TA mutations were available for methylated supF, 37 for unmethylated supF, 37 for hydroxyl radical and 73 for singlet oxygen (Figure 2).



Fig. 2. G->T mutation spectra for the supF gene generated using data from MGMD (1). (A) BPDE-induced G->T spectrum for the supF gene with methylated CpG sites; (B) BPDE-induced G->T spectrum for unmethylated supF; (C) hydroxyl radical-induced G->T spectrum for unmethylated supF; (D) singlet oxygen-induced G->T spectrum for unmethylated supF. The spectra span the region of the supF gene from nucleotide 99 to 183 as shown on the horizontal axis. The vertical axis for each spectrum shows the actual numbers of G->T mutations observed in each experiment.

 
[MUT]: Dinucleotide relative mutability (drm) for GC->TA transversions
The likelihood of GC->TA mutations within specific dinucleotides was calculated for all three agents (BPDE, singlet oxygen and hydroxyl radical) as described in detail by Cooper and Krawczak (18). Sixteen dinucleotides are possible in a DNA sequence and the frequency of each can be calculated for a given nucleotide length. For each dinucleotide the expected number of substitutions is equal to: (dinucleotide frequency × total number of GC->TA substitutions in all dinucleotides). One can then compare the observed versus the expected mutations for each dinucleotide. Furthermore, the mutability of each dinucleotide can be calculated relative to the least mutable dinucleotide (i.e. that presenting the least number of substitutions) by the equation:

$$\mathrm{drm}(d) = \frac{O(d)/E(d)}{\min_{d'}\,[O(d')/E(d')]}$$

where drm(d) is the dinucleotide relative mutability, O(d) is the observed number of substitutions in dinucleotide d and E(d) is the expected number of substitutions in that dinucleotide. Dinucleotide relative mutabilities can then be ordered according to rank from the most mutable to the least mutable, for example: CG > TC > GG > ··· AA > TA.
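The observed-versus-expected calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the caller supplies the observed substitution counts per dinucleotide, and dinucleotides with no observed substitutions are left out of the ranking so that the least mutable ratio is never zero (the text does not say how such cases were handled).

```python
from collections import Counter

def dinucleotide_frequencies(seq):
    """Relative frequency of each overlapping dinucleotide in seq."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    return {d: n / len(pairs) for d, n in counts.items()}

def relative_mutability(seq, observed):
    """Return drm per dinucleotide.

    observed maps a dinucleotide to the number of GC->TA substitutions
    scored in it. Expected = dinucleotide frequency * total substitutions.
    Assumption: dinucleotides with zero observed substitutions are skipped.
    """
    freqs = dinucleotide_frequencies(seq)
    total_subs = sum(observed.values())
    ratios = {}
    for d, freq in freqs.items():
        o = observed.get(d, 0)
        if o > 0:
            ratios[d] = o / (freq * total_subs)
    least = min(ratios.values())
    # Express every O/E ratio relative to the least mutable dinucleotide.
    return {d: r / least for d, r in ratios.items()}

# Toy input (not supF data): CG occurs twice and carries 6 of 8 substitutions.
drm = relative_mutability("ACGGTCGA", {"CG": 6, "GG": 1, "GA": 1})
ranked = sorted(drm, key=drm.get, reverse=True)
```

Sorting the result by value gives the rank order of the kind reported in Table I.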

For the algorithm, each drm value was transformed to a value representing the probability that a mutation will occur [MUT] by summing all the drm values and dividing each one by the sum. When a guanine is selected to test for the assignment of a mutation, the 5' or 3' nucleotide is randomly selected and the [MUT] value of the resulting dinucleotide is used as the probability that a mutation occurs. The drm and corresponding [MUT] values (Table I) were calculated for BPDE-induced G->T mutations in methylated supF and, as control data, BPDE-induced G->T mutations in unmethylated supF and hydroxyl radical- and singlet oxygen-induced G->T mutations in unmethylated supF.
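A minimal Python sketch of this step is given below, assuming drm values are held in a dictionary keyed by dinucleotide. The function names and the toy values are hypothetical, and the treatment of a guanine at the very end of a sequence (where only one neighbour exists) is an assumption not covered in the text.

```python
import random

def drm_to_mut(drm):
    """Normalise drm values so that they sum to 1 ([MUT])."""
    total = sum(drm.values())
    return {d: v / total for d, v in drm.items()}

def mut_for_guanine(seq, i, mut, rng):
    """Pick the 5' or 3' neighbour of the guanine at index i at random
    and return [MUT] for the resulting dinucleotide (0.0 if unlisted)."""
    if seq[i] != "G":
        raise ValueError("position is not a guanine")
    flanks = []
    if i > 0:
        flanks.append(seq[i - 1:i + 1])   # 5' neighbour + G
    if i < len(seq) - 1:
        flanks.append(seq[i:i + 2])       # G + 3' neighbour
    return mut.get(rng.choice(flanks), 0.0)

# Toy drm values, not those of Table I.
mut = drm_to_mut({"CG": 3.0, "GG": 1.0, "GA": 1.0})
p = mut_for_guanine("CGA", 1, mut, random.Random(0))
```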


Table I. Rank order of dinucleotides (from left to right) according to mutability after treatment of methylated and unmethylated supF with BPDE, unmethylated supF with hydroxyl radical and unmethylated supF with singlet oxygen

 
[SEL]: p53 mutable guanines
For the predicted spectrum a guanine was declared mutable if a substitution to a thymine was non-synonymous, i.e. the mutation could be detected by the resulting codon change. For the algorithm, the value of [SEL] for each guanine along the p53 sequence was assigned a 1 (i.e. a mutation is allowed) if the site was mutable and a 0 if non-mutable.
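The [SEL] assignment can be illustrated with the standard genetic code. The sketch below is a simplification under explicit assumptions: it takes an in-frame coding-strand sequence, considers only guanines on that strand (how guanines on the opposite strand were treated is not described here), and counts a change to a stop codon as non-synonymous. The function name `sel_flags` is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def sel_flags(coding_seq):
    """[SEL] per position: 1 if a G->T change alters the encoded residue,
    otherwise 0 (non-guanine positions are always 0)."""
    flags = [0] * len(coding_seq)
    for start in range(0, len(coding_seq) - 2, 3):
        codon = coding_seq[start:start + 3]
        for offset, base in enumerate(codon):
            if base != "G":
                continue
            mutant = codon[:offset] + "T" + codon[offset + 1:]
            if CODON_TABLE[mutant] != CODON_TABLE[codon]:
                flags[start + offset] = 1
    return flags

# CGG (Arg): the middle G is selectable (CTG, Leu), the third is not (CGT, Arg).
flags = sel_flags("CGGGGG")
```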

[CUR]: predicted p53 DNA curvature
For each spectrum the magnitude of DNA curvature was calculated using the BEND algorithm (19) by submission of DNA sequence to the publicly accessible Bend.It server (http://www3.icgeb.trieste.it/~dna/bend_it.html) using the consensus bendability scale (20). Briefly, predicting DNA curvature is based on the geometry of the individual dinucleotide steps along a sequence (roll, twist and tilt angles) used to calculate a vector for each base pair. A curvature indicator for each nucleotide is then calculated as the angle between vectors for nucleotides 31 nt apart (~3 helical turns). The curvature value for each nucleotide is then given as the deflection angle per 10.5 residue helical turn (1°/bp = 10.5°/helical turn). The mean curvature was calculated for p53 exons 5, 7 and 8 and the absolute difference between the curvature value for each nucleotide and the mean was recorded as [CUR], to show the deviation from the mean curvature at each position along the spectrum. Finally, each [CUR] value was transformed to a value between 0 and 1 where, in the algorithm, a value closer to 1 increases the probability of a mutation occurring at a guanine.
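The deviation-from-mean step can be sketched as follows. The exact transformation to the 0 to 1 range is not specified above, so dividing by the largest deviation is an assumption made purely for illustration; the function name and the input values are hypothetical and do not come from the Bend.It server.

```python
def cur_values(curvature):
    """Absolute deviation from mean curvature, rescaled to 0..1.

    Assumption: rescaling divides by the largest deviation; the original
    transformation is not stated.
    """
    mean = sum(curvature) / len(curvature)
    deviation = [abs(c - mean) for c in curvature]
    largest = max(deviation)
    if largest == 0:
        return [0.0] * len(curvature)
    return [d / largest for d in deviation]

# Toy per-nucleotide curvature values (degrees per helical turn).
cur = cur_values([2.0, 4.0, 6.0, 4.0])
```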

[NUC]: predicted p53 nucleosome positioning—DNase I protected sites
It has previously been shown that the binding of BPDE to DNA is influenced by nucleosome structure and that the formation of BPDE adducts is suppressed within nucleosomes (21,22). Tornaletti et al. (16) mapped the positions of DNase I protection along exons 5–8 of p53 where protected sites are indicative of histone binding within a nucleosome. Using these data in the algorithm, the value of [NUC] for each guanine was assigned a 1 (i.e. a mutation is allowed) if the site was not DNase I protected and 0 if it was.
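As a small illustration, the [NUC] flags could be derived from a list of protected intervals as below. The interval representation (0-based start, exclusive end) and the function name are assumptions; the actual DNase I footprint coordinates come from Tornaletti et al. (16) and are not reproduced here.

```python
def nuc_flags(length, protected_intervals):
    """[NUC] per position: 1 if the site is not DNase I protected
    (mutation allowed), 0 if it lies in a protected interval."""
    flags = [1] * length
    for start, end in protected_intervals:
        for i in range(max(start, 0), min(end, length)):
            flags[i] = 0
    return flags

# Hypothetical footprint covering positions 1 and 2 of a 6-nt stretch.
nuc = nuc_flags(6, [(1, 3)])
```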

The LwPy53 algorithm
A computer program was written using Microsoft Visual C++ 6.0 to implement the algorithm, which is represented as a flowchart in Figure 1. The program begins by randomly selecting a guanine (N) along exons 5, 7 or 8 of the p53 sequence. If a G->T transversion at that guanine is a synonymous change ([SEL]N = 0) then no mutation is recorded and the program loops to randomly select a new guanine. If the change is selectable ([SEL]N = 1) then the program checks whether that guanine is a DNase I-protected site; if it is ([NUC]N = 0) then, again, no mutation is recorded and the program loops to select a new guanine. If the guanine is not protected ([NUC]N = 1) the program moves to the next step of testing the probability of a G->T mutation occurring at this site. Firstly, a dinucleotide is formed by randomly selecting either the 5' or 3' adjacent nucleotide in order to obtain the associated probability of mutation [MUT]. Then, a random value between 0 and 1 is created [RAND1] and compared to [MUT]: if [MUT] is less than [RAND1] then no mutation is recorded and the program loops. However, if [MUT] is greater than [RAND1] then a second random number between 0 and 1 is created [RAND2] and compared with the deviation from mean curvature value [CUR]N for that guanine. If the random number is less than the curvature value a mutation is recorded for that guanine and added to the running total, else the program loops with no mutation recorded. To ensure comprehensive coverage of all guanines within the sequence the number of loops was set to 1 × 10^6. The likelihood of a mutation being recorded at a mutable guanine not assumed to be associated with histone binding is therefore simply dependent on adjacent nucleotides (short sequence context) and the relative local curvature of the DNA molecule (long sequence context).
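The loop described in this section can be sketched in Python as follows. This is not the original Visual C++ program; it is a minimal re-expression of the steps in the text, with hypothetical names. Protection is passed as a boolean list (`protected[i]` is true for a DNase I-protected site) to keep the sketch independent of how [NUC] is numerically coded, and the tie case where [MUT] equals [RAND1] is treated as no mutation, which is an assumption.

```python
import random

def simulate_spectrum(seq, sel, protected, mut, cur,
                      n_loops=1_000_000, seed=None):
    """Return the number of G->T mutations recorded at each position."""
    rng = random.Random(seed)
    guanines = [i for i, base in enumerate(seq) if base == "G"]
    counts = [0] * len(seq)
    for _ in range(n_loops):
        i = rng.choice(guanines)          # randomly select a guanine N
        if not sel[i]:                    # synonymous change: loop
            continue
        if protected[i]:                  # DNase I-protected site: loop
            continue
        flanks = []
        if i > 0:
            flanks.append(seq[i - 1:i + 1])
        if i < len(seq) - 1:
            flanks.append(seq[i:i + 2])
        p_mut = mut.get(rng.choice(flanks), 0.0)
        if p_mut <= rng.random():         # [MUT] not greater than [RAND1]
            continue
        if rng.random() < cur[i]:         # [RAND2] below [CUR]: record
            counts[i] += 1
    return counts

# Toy run: the second guanine is protected, so only index 2 can be hit.
toy = simulate_spectrum(
    "ACGGT",
    sel=[0, 0, 1, 1, 0],
    protected=[False, False, False, True, False],
    mut={"CG": 1.0, "GG": 1.0, "GT": 1.0},
    cur=[0.0, 0.0, 1.0, 0.0, 0.0],
    n_loops=2000,
    seed=1,
)
```

Fixing `seed` makes a run repeatable; the default of one million loops matches the figure given in the text but takes noticeably longer in Python than the toy run shown.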


    Results and discussion
 
The accumulation of data for chemically induced mutations from mutagen test systems has increased rapidly and the collection of such data within dedicated databases such as MGMD has already allowed for methods to be developed for the rapid analysis and underlying pattern exploration of mutation spectra (15). The results presented here are an attempt to take mutation information from a mutagen test system, the supF mutation assay, and, using parameters representing other contributory factors, predict a chemically induced mutation spectrum for the p53 gene to compare with actual p53 mutation spectra. BPDE was chosen as a mutagen as mutation data were available for methylated supF (10), p53 being methylated in a variety of tissues (23). Importantly, BPDE is implicated as a major contributor to p53 G->T transversions in lung cancer of smokers where the mutation spectrum is different to that of non-smokers, with G->T mutations predominating, mainly at CpG sites (7). The large set of p53 G->T transversion data available within the IARC TP53 database for lung cancer would allow us to go one step further and compare the predicted BPDE-induced p53 G->T mutation spectrum, generated using the LwPy53 algorithm, with that generated for lung cancer.

The BPDE-induced mutation spectrum shown in Figure 2A reveals that BPDE specifically targets methylated CpG at a higher frequency than any other dinucleotide within the supF gene. Page et al. (8) and Ponten et al. (9) have previously demonstrated, using 16 nt constructs, that PAH adducts induce mutations at different frequencies depending on the construct sequence and type of enantiomeric isomer. Although it is clear that sequence more distant than the adjacent nucleotides may affect the likelihood of mutation at a given nucleotide, Krawczak et al. (24) demonstrated that it is mainly the flanking mononucleotides that influence the substitution rate, with significant effect rapidly decreasing up to 2 bp from the substitution site. After taking into account the different frequencies of each dinucleotide within the gene and calculating the relative mutability of each dinucleotide (drm) we were able to establish the mutabilities of other dinucleotides targeted by BPDE (Table I). Other dinucleotides with a 10% or greater chance of a G->T mutation occurring are GT, GA and GG. As a control experiment, Yoon et al. (10) also treated unmethylated supF with BPDE and obtained a different distribution of mutations (Figure 2B). The most mutable dinucleotide in unmethylated supF after treatment with BPDE was GG, followed by TG, GA and CG, demonstrating an enhancement in the likelihood of BPDE-induced mutations at methylated CpG sites. Similar dinucleotide mutabilities were calculated for hydroxyl radical and singlet oxygen for unmethylated supF for comparison against BPDE.

With the knowledge that adjacent nucleotides have the greatest influence on mutation at a site we used the BPDE-related dinucleotide mutabilities from the supF gene to predict mutable sequences along exons 5, 7 and 8 of the p53 gene after hypothetical treatment with BPDE using the LwPy53 algorithm (Figure 3). It is clear by looking at the G->T mutation spectrum for exons 5, 7 and 8 of p53 in smoke-inaccessible tissue (Figure 4A) and lung cancer of smokers (Figure 4B) that not all CpG sites are highly mutable and that other factors must contribute to the overall mutation spectrum. An obvious contributory factor would be selection of mutation [SEL] and this fixed (i.e. non-variable) parameter was introduced into the algorithm. Two other fixed parameters were also introduced to represent environmental factors that contribute to the overall shape of a mutation spectrum: DNA curvature (or more precisely absolute deviation from mean curvature [CUR]) and nucleosome positioning [NUC]. Both factors can be assumed to contribute to the accessibility of a mutagen to a DNA sequence and rate of DNA repair of adducts. The [CUR] values for each nucleotide along exons 5, 7 and 8 of p53 are shown as a spectrum in Figure 3A. Curvature differing more markedly from average DNA curvature within this region of p53 is observed for all exons shown. The three most prominent peaks along the spectrum are each in a different exon and also in the most mutable regions of p53 as a whole. The mutable (detectable) sites [SEL] are shown in Figure 3B(i), along with DNase I-protected sites [NUC] [Figure 3B(ii)], assumed to be positioned within nucleosomes.



Fig. 3. LwPy53 parameter data and predicted p53 G->T mutation spectra for exons 5, 7 and 8. (A) [CUR]: line graph representing the predicted absolute deviation from mean curvature for this region of p53 using the Bend.It server (http://icgeb.trieste.it/dna/curve_it.html); (B) (i) [SEL]: mutable guanines within the exons where a G->T mutation is detectable; (ii) [NUC]: DNase I-protected sites in the exons representing assumed nucleosome positions; (C) the predicted G->T spectrum for the exons using the [SEL] and [NUC] parameters but with equal dinucleotide mutability [equal MUT], i.e. the likelihood of a G->T mutation is equal for all dinucleotides along the gene; (D) the same predicted G->T spectrum as (C) but with addition of the [CUR] parameter in the algorithm and referred to as the ‘standard’; (E) the predicted G->T spectrum using the [SEL], [NUC] and supF-derived, BPDE-dependent [MUT] parameters; (F) the same predicted spectrum as (E) but with the [CUR] parameter included; (G) the predicted BPDE-induced G->T mutation spectrum derived by subtracting the number of mutations at each nucleotide in spectrum (D) (standard) from spectrum (F) to give the excess of G->T mutations at particular nucleotides relative to the number of mutations at each site if all dinucleotides had equal mutability. The vertical axes within (C) to (G) show within each spectrum the percentage of G->T (n = 1 × 10^6) mutations for each site relative to the overall number of G->T mutations for that particular spectrum.

 


Fig. 4. p53 exons 5, 7 and 8 G->T mutation spectra drawn from available data within the IARC TP53 Mutation Database, version 6 (2) and predicted reduced G->T mutation spectra for BPDE, hydroxyl radical and singlet oxygen. (A) The IARC G->T mutation spectrum for tissues assumed to be least accessible to smoke (n = 339), i.e. G->T mutations not thought to be caused by BPDE; (B) IARC G->T mutation spectrum for all lung cancer data with the exception of non-smokers and mutation data of individuals known to have undergone exposure to other mutagens, such as radon (n = 367); (C) predicted G->T mutation spectrum after exposure to BPDE as described for Figure 3F, from data generated using methylated supF (n = 1 × 10^6); (D) predicted hydroxyl radical-induced G->T mutation spectrum from data generated using unmethylated supF; (E) predicted singlet oxygen-induced G->T mutation spectrum from data generated using unmethylated supF (n = 1 × 10^6); (F) predicted BPDE-induced G->T mutation spectrum from data generated using unmethylated supF (n = 1 × 10^6). The vertical axes for all spectra show the percentage of G->T mutations for each site relative to the overall number of G->T mutations for that particular spectrum.

 
Within the LwPy53 algorithm, the fixed [SEL] and [NUC] parameters initially determine where mutations can occur. Taking [SEL] and [NUC] into account, Figure 3C shows a predicted distribution of p53 G->T transversions if all dinucleotides had the same mutability but without using the [CUR] parameter. The subtle variation in frequency of G->T transversions at different codons within this spectrum is due to the random factors used by LwPy53 when testing for the likelihood of mutation during iteration. Figure 3D shows the effect of introducing the [CUR] parameter and, predictably, with all dinucleotides having equal mutability, three distinct mutable regions appear reflecting the [CUR] spectrum of Figure 3A. Predicted mutable regions are generally situated between codons 154 and 161, 180 and 186, at 225, between 238 and 249, at 258 and between 269 and 283. The spectrum of Figure 3D is predicted independently of any mutagen (all mutable guanines having equal mutability) so this spectrum was referred to as the standard and may be compared with any mutagen-induced p53 mutation spectrum.

A predicted BPDE-induced G->T mutation spectrum was built using the same procedure as that for the standard to determine the effects of DNA curvature on the overall mutation distribution. Firstly, the LwPy53 algorithm generated a predicted BPDE spectrum without the [CUR] parameter, as shown in Figure 3E. Notable mutation hot-spots can be observed at codons 157, 225, 238, 245, 248, 258, 273 and 282 at positions of the most mutable dinucleotides, mainly CpG. The introduction of the [CUR] parameter sees the retention of mutable regions around codons 157, 245, 248 and 273 and loss of other predicted mutable regions, particularly in the region between codons 180 and 186 and codons 225 and 258. The resulting predicted BPDE-induced G->T mutation spectrum for exons 5, 7 and 8 of p53 is therefore influenced to a large extent not just by BPDE-specific target sequence but also DNA curvature. Even without implementation of the [CUR] parameter a similar spectral shape is observed between Figure 3A, D and E, suggesting that the DNA sequences that deviate from mean curvature are also, coincidentally, predicted to be targeted by BPDE. The predicted BPDE-induced spectrum was then compared with the predicted standard spectrum (having all dinucleotides of equal mutability) to see which codons had a relative excess of mutations (Figure 3G). Subtraction of the standard spectrum from the predicted BPDE-induced spectrum (reduced BPDE-induced spectrum) shows an excess of G->T mutations at codons 157, 248 and 273. Remarkably, these predicted BPDE-induced G->T mutation hot-spots occur at the same positions as the strong BPDE adduct binding sites reported by Denissenko et al. (6). Interestingly, the peak positioned within the region of codon 157 arises due to a region of DNA with curvature of approximately zero, suggesting that this particular region is not situated within a nucleosome, in agreement with the conclusions drawn by Tornaletti et al. (16) after subjecting exons 5–8 of p53 to DNase I and micrococcal nuclease in order to determine nucleosome-binding sites.
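The subtraction that yields the reduced spectrum can be sketched as below. One assumption is made explicit: each spectrum is first converted to percentages of its own total, as on the vertical axes of Figure 3, before the standard is subtracted; whether the original subtraction used raw counts or percentages is not stated. The function names and the toy counts are hypothetical.

```python
def to_percent(counts):
    """Express per-site counts as a percentage of the spectrum total."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]

def reduced_spectrum(mutagen_counts, standard_counts):
    """Per-site excess (positive) or deficit (negative) of mutations
    relative to the equal-mutability standard, in percentage points."""
    mutagen = to_percent(mutagen_counts)
    standard = to_percent(standard_counts)
    return [m - s for m, s in zip(mutagen, standard)]

# Toy three-site example: the first site shows an excess of 35 points.
excess = reduced_spectrum([6, 2, 2], [1, 1, 2])
```

Sites with a positive value are those where the mutagen-specific dinucleotide mutabilities add mutations beyond what curvature, selection and nucleosome protection alone would predict.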

We then made a comparison between the reduced predicted BPDE-induced G->T mutation spectrum and G->T mutation spectra of tissue least accessible to smoke (Figure 4A) and lung cancer of smokers (Figure 4B) from data available from the IARC TP53 Mutation Database (2). The G->T mutation spectrum of lung cancer in smokers differs from that of tissue least accessible to smoke by a reduction in mutation hot-spots primarily at codons 173 and 176 and a particular increase in the frequency of mutations at codons 157, 158, 248 and 273. The reduced G->T mutation spectrum predicted by LwPy53 (Figure 4C) is remarkably similar to the G->T mutation spectrum of lung cancer of smokers and also shows an increase in G->T mutations at codons 157, 248 and 273 relative to the G->T spectrum of tissue least accessible to smoke. Unlike the lung cancer spectrum for smokers, the algorithm does not predict a G->T mutation hot-spot at codon 249 and predicts only reduced hot-spots at codons 158 and 245. However, guanines at codons 158 and 245 have not been empirically shown to be BPDE adduct hot-spots (6). We have also included the predicted G->T p53 mutation spectra for hydroxyl radical (Figure 4D) and singlet oxygen (Figure 4E), endogenous oxidative mutagens. These oxidative spectra are included for comparative purposes only, as the mutation data were derived from unmethylated supF, whereas treatment of the methylated form of supF with these agents could produce a higher mutation frequency at CpG sites that may ultimately change the predicted p53 spectrum. We have also included for comparison the predicted G->T p53 spectrum for BPDE derived from unmethylated supF (Figure 4F). All three of these predicted p53 spectra lack the mutation hot-spots of not just lung cancer but many other cancer types, lending support to the idea that methylated CpG sites are a preferential target for carcinogens (25).

The accuracy of the predictions made by the LwPy53 algorithm is dependent on the data supplied for the fixed and variable parameters but independent of available p53 mutation data. The data supplied for the algorithm are only as good as the methodologies (and algorithm in the case of DNA curvature) applied to obtain them. The overall shape of the distribution of predicted p53 G->T mutations that are independent of dinucleotide mutability, shown in Figure 3D, is heavily dependent upon DNA curvature, which warrants further investigation as the pattern strongly reflects the overall pattern of p53 mutations in cancer as a whole. This likely reflects the accessibility of regions of the gene to mutagens and the rate of DNA repair. It is interesting that the three major BPDE mutation hot-spots predicted by LwPy53 at codons 157, 248 and 273 not only correlate with the known major BPDE adduct hot-spots but have also been shown to reside within regions of slow DNA repair of bulky adducts (26).

The first conclusion to be drawn from the predicted p53 G->T mutation spectrum for exons 5, 7 and 8 is that, for a mutagen with a similar or identical mutagenic specificity to BPDE, structural parameters that influence accessibility and DNA repair primarily determine the regions targeted by the agent, ultimately allowing preferential adduct formation at sequence-specific target sites. Whether BPDE itself is the main mutagen causing mutations at lung cancer hot-spots in smokers cannot yet be definitively proved, but the reduced G->T mutation spectrum predicted by LwPy53 (Figure 3G) strongly supports the idea that either BPDE or a chemical with identical mutagenic specificity causes mutation hot-spots at codons 157, 248 and 273. This leads to our second conclusion, that BPDE is not a major contributor to mutation hot-spots at codons 158 and 245 in lung cancer. Further LwPy53 predictions based on mutation data from other mutagens may reveal likely candidates that cause G->T mutation hot-spots at codons 158 and 245. Finally, the application of the LwPy53 algorithm to predicting p53 mutation spectra represents a first attempt at such prediction using mutational data from an in vitro mutational assay. There is, unfortunately, a lack of mutation data for known mutagens generated using methylated supF, a gap that needs to be addressed so that further predictions can be made as to the cause(s) of p53 mutation spectra in cancer.


    Acknowledgments
 
We would like to thank Professor G.P.Holmquist for his helpful suggestions and E.Waters for database support.


    References

  1. Lewis,P.D., Harvey,J.S., Waters,E.M. and Parry,J.M. (2000) The Mammalian Gene Mutation Database. Mutagenesis, 15, 411–414.
  2. Olivier,M., Eeles,R., Hollstein,M., Khan,M.A., Harris,C.C. and Hainaut,P. (2004) The IARC TP53 Database: new online mutation analysis and recommendations to users. Hum. Mutat., in press.
  3. Perwez Hussain,S. and Harris,C.C. (1999) p53 mutation spectrum and load: the generation of hypotheses linking the exposure of endogenous or exogenous carcinogens to human cancer. Mutat. Res., 428, 23–32.
  4. Shopland,D.R. (1995) Tobacco use and its contribution to early cancer mortality with a special emphasis on cigarette smoking. Environ. Health Perspect., 103 (suppl. 8), 131–142.
  5. Hollstein,M., Sidransky,D., Vogelstein,B. and Harris,C.C. (1994) p53 mutations in human cancers. Cancer Res., 54, 4855.
  6. Denissenko,M.F., Pao,A., Tang,M. and Pfeifer,G.P. (1996) Preferential formation of benzo[a]pyrene adducts at lung cancer mutational hotspots in p53. Science, 274, 430–432.
  7. Hainaut,P. and Pfeifer,G.P. (2001) Patterns of p53 G->T transversions in lung cancers reflect the primary mutagenic signature of DNA-damage by tobacco smoke. Carcinogenesis, 22, 367–374.
  8. Page,J.E., Zajc,B., Oh-hara,T., Lakshman,M.K., Sayer,J.M., Jerina,D.M. and Dipple,A. (1998) Sequence context profoundly influences the mutagenic potency of trans-opened benzo[a]pyrene 7,8-diol 9,10-epoxide-purine nucleoside adducts in site-specific mutation studies. Biochemistry, 37, 9127–9137.
  9. Ponten,I., Sayer,J.M., Pilcher,A.S., Yagi,H., Kumar,S., Jerina,D.M. and Dipple,A. (1999) Sequence context effects on mutational properties of cis-opened benzo[c]phenanthrene diol epoxide-deoxyadenosine adducts in site-specific mutation studies. Biochemistry, 38, 1144–1152.
  10. Yoon,J.-H., Smith,L.E., Feng,Z., Tang,M.-S., Lee,C.-S. and Pfeifer,G.P. (2001) Methylated CpG dinucleotides are the preferential targets for G-to-T transversion mutations induced by benzo[a]pyrene diol epoxide in mammalian cells: similarities with the p53 mutation spectrum in smoking-associated lung cancers. Cancer Res., 61, 7110–7117.
  11. Chen,J.X., Yi,Z., West,M. and Tang,M.S. (1998) Carcinogens preferentially bind at methylated CpG in the p53 mutational hot spots. Cancer Res., 58, 2070–2075.
  12. Rodin,S.N. and Rodin,A.S. (2000) Human lung cancer and p53: the interplay between mutagenesis and selection. Proc. Natl Acad. Sci. USA, 97, 12244–12249.
  13. Kraemer,K.H. and Seidman,M.M. (1989) Use of supF, an Escherichia coli tyrosine suppressor transfer-RNA gene as a mutagenic target in shuttle-vector plasmids. Mutat. Res., 220, 61–72.
  14. Lewis,P.D., Harvey,J.S., Waters,E.M., Skibinski,D.O.F. and Parry,J.M. (2001) Spontaneous mutation spectra in supF: comparative analysis of mammalian cell line base substitution spectra. Mutagenesis, 16, 503–515.
  15. Lewis,P.D. and Parry,J.M. (2002) Exploratory analysis of multiple mutation spectra. Mutat. Res., 518, 163–180.
  16. Tornaletti,S., Bates,S. and Pfeifer,G.P. (1996) A high-resolution analysis of chromatin structure along p53 sequences. Mol. Carcinog., 17, 192–201.
  17. Jeong,J.K., Juedes,M.J. and Wogan,G.N. (1998) Mutations induced in the supF gene of pSP189 by hydroxyl radical and singlet oxygen: relevance to peroxynitrite mutagenesis. Chem. Res. Toxicol., 11, 550–556.
  18. Cooper,D.N. and Krawczak,M. (1990) The mutational spectrum of single base-pair substitutions causing human genetic disease: patterns and predictions. Hum. Genet., 85, 55–74.
  19. Goodsell,D.S. and Dickerson,R.E. (1994) Bending and curvature calculations in B-DNA. Nucleic Acids Res., 22, 5497–5503.
  20. Gabrielian,A. and Pongor,S. (1996) Correlation of intrinsic DNA curvature with DNA property periodicity. FEBS Lett., 393, 65–68.
  21. Smith,B.L. and MacLeod,M.C. (1993) Covalent binding of the carcinogen benzo(a)pyrene diol epoxide to Xenopus laevis 5S DNA reconstituted into nucleosomes. J. Biol. Chem., 268, 20620–20629.
  22. Thrall,B.D., Mann,D.B., Smerdon,M.J. and Springer,D.L. (1994) Nucleosome structure modulates benzo(a)pyrene diol epoxide adduct formation. Biochemistry, 33, 2210–2216.
  23. Tornaletti,S. and Pfeifer,G.P. (1995) Complete and tissue-independent methylation of CpG sites in p53 gene: implications for mutations in human cancers. Oncogene, 10, 1493–1499.
  24. Krawczak,M., Ball,E.V. and Cooper,D.N. (1998) Neighbouring-nucleotide effects on the rates of germ-line single-base-pair substitution in human genes. Am. J. Hum. Genet., 63, 474–488.
  25. Denissenko,M.F., Chen,J.X., Tang,M.S. and Pfeifer,G.P. (1997) Cytosine methylation determines hot spots of DNA damage in the human p53 gene. Proc. Natl Acad. Sci. USA, 94, 3893–3898.
  26. Denissenko,M.F., Pao,A., Pfeifer,G.P. and Tang,M.S. (1998) Slow repair of bulky adducts along the nontranscribed strand of the human p53 gene may explain the strand bias of transversion mutations in cancers. Oncogene, 16, 1241–1247.
Received August 7, 2003; revised November 14, 2003; accepted December 16, 2003.