Polymorphism and Divergence in the β-Globin Replication Origin Initiation Region

Stephanie M. Fullerton*, Jacquelyn Bond†, Julie A. Schneider†, Bruce Hamilton†, Rosalind M. Harding†, Anthony J. Boyce‡ and John B. Clegg†

*Institute of Molecular Evolutionary Genetics, Department of Biology, Pennsylvania State University;
†Medical Research Council Molecular Haematology Unit, Institute of Molecular Medicine, University of Oxford; and
‡Institute of Biological Anthropology, University of Oxford


    Abstract
 
DNA sequence polymorphism and divergence were examined in the vicinity of the human β-globin gene cluster origin of replication initiation region (IR), a 1.3-kb genomic region located immediately 5' of the adult-expressed β-globin gene. DNA sequence variation in the replication origin IR and 5 kb of flanking DNA was surveyed in samples drawn from two populations, one African (from the Gambia, West Africa) and the other European (from Oxford, England). In these samples, levels of nucleotide and length polymorphism in the IR were found to be more than twice as high as in adjacent non-IR-associated regions (estimates of per-nucleotide heterozygosity were 0.30% and 0.12%, respectively). Most polymorphic positions identified in the origin IR fall within or just adjacent to a 52-bp alternating purine-pyrimidine ((RY)n) sequence repeat. Within- and between-population divergence is highest in this portion of the IR, and interspecific divergence in the same region, determined by comparison with an orthologous sequence from the chimpanzee, is also pronounced. Higher levels of diversity in this subregion are not, however, primarily attributable to slippage-mediated repeat unit changes, as nucleotide substitution contributes disproportionately to allelic heterogeneity. An estimate of helical stability in the sequenced region suggests that the hypervariable (RY)n constitutes the major DNA unwinding element (DUE) of the replication origin IR, the location at which the DNA duplex first unwinds and new strand synthesis begins. These findings suggest that the β-globin IR experiences a higher underlying rate of neutral mutation than do adjacent genomic regions and that enzyme fidelity associated with the initiation of DNA replication at this origin may be compromised. The significance of these findings for our understanding of eukaryotic replication origin biology is discussed.


    Introduction
 
It is well known that the mutation of DNA is nonuniform and that rates of misincorporation and insertion/deletion are determined by endogenous (i.e., DNA replication, proofreading, and repair) and exogenous mechanisms (Boulikas 1992). The role of DNA polymerase fidelity in replication has been extensively documented in vitro in both prokaryotic and eukaryotic systems and shown to rely on the polymerase used for synthesis, the composition of the sequence replicated (nearest-neighbor effects and repetitive structure), and the sequence’s orientation with respect to the origin of replication, or ori (determining leading- or lagging-strand synthesis) (Roberts and Kunkel 1996). Observed patterns of spontaneous mutation are roughly consistent with the experimental findings, although the absolute rate of observed misincorporation is lower due to in vivo mismatch repair (Umar and Kunkel 1996), which is itself context-dependent (e.g., Ye, Holmquist, and O’Connor 1998). Local DNA sequence environment has been shown to be an important determinant of rates of human germ line single-nucleotide substitution (Krawczak, Ball, and Cooper 1998).

The mutational consequences of proximity to eukaryotic replication origins have thus far not been investigated, despite known differences in the replication fidelity of polymerases involved in replication initiation, strand elongation, and the gap filling which follows removal of RNA primers (Kunkel 1992). The extent to which DNA replication fidelity is enhanced or compromised in proximity to a replication origin can, in theory, be investigated by analysis of polymorphism and divergence at sequences known to act as replication origins. This is because levels of polymorphism and divergence are both expected to be higher in regions experiencing a higher relative rate of neutral mutation. Prokaryotic ori’s, for example, show extensive sequence conservation between species (Sharp et al. 1989; Kornberg and Baker 1991, p. 534) and low interstrain diversity (Spangenberg, Montie, and Tummler 1998), suggesting either high-fidelity replication/repair of these regions or selective constraint consistent with their essential role in DNA metabolism.

Characterization of replication origins in eukaryotes has advanced in recent years such that a significant number of candidate regions are now recognized. The best characterized eukaryotic ori’s are found in yeast and have been shown to comprise a number of modular domains (Marahrens and Stillman 1996). In all cases, the characterized replication origins contain a conserved (but degenerate) short sequence domain, known to bind the origin recognition protein complex involved in controlling the timing and location of replication initiation, and a less well-conserved domain, often with a low inherent helical stability, believed to function as a so-called DNA unwinding element (DUE) (DePamphilis 1999). Initiation of DNA replication begins when the DNA comprising the DUE unwinds and the α polymerase-primase complex synthesizes an RNA primer; DNA synthesis then proceeds bidirectionally from the DUE core (Kornberg and Baker 1991).

Human replication origins have been shown to contain similar modular elements (Dobbs, Shaiu, and Benbow 1994; Aladjem et al. 1998) and DNA-protein binding analyses at one origin have been reported (Dimitrova et al. 1996; Abdurashidova et al. 1998). To date, the only genetically defined human replication origin is one found 5' of the human β-globin locus. Preliminary analyses of this origin using a replication direction assay (a biochemical method that identifies the initiation region [IR] of an ori by detecting a switch in leading-strand synthesis) showed that replication proceeded bidirectionally from a genomic region approximately 2.0 kb in length, regardless of cell type (Kitsberg et al. 1993). Deletion of this region in a Hemoglobin Lepore patient resulted in a reversal of strand synthesis polarity, suggesting that the β-globin domain was being replicated by a different ori, located upstream of the β-globin gene cluster. Subsequent investigation narrowed the defined IR to a 1.3-kb segment (which overlapped the same region identified by Kitsberg et al. 1993) and showed that interaction of the IR with the upstream locus control region was required for origin activation (Aladjem et al. 1995). Most recently, Aladjem et al. (1998) have demonstrated that specific sequences in the vicinity of the initiation region (adjacent to but outside of the IR itself) are required for origin function. No analysis of replication-associated protein binding in this region has thus far been reported.

The β-globin gene cluster is one of the most extensively investigated gene clusters in higher eukaryotes and represents an excellent candidate region in which to investigate patterns of polymorphism and divergence associated with proximity to a constitutive ori. At the time the replication origin was first identified, we were engaged in an investigation of polymorphism in the nearby human β-globin locus (Fullerton et al. 1994; Harding et al. 1997), an analysis that overlapped the defined IR. We therefore extended our investigation, for a subset of the original sample, to include the whole of the IR and a significant length of 5'-flanking DNA. Our results suggest that a subregion of the replication origin initiation zone, composed of an AT-rich alternating purine-pyrimidine ((RY)n) repeat with DNA-unwinding capability, is subject to marked intraspecific and interspecific sequence divergence. Unexpectedly, the pattern of polymorphism at this complex repeat sequence differs substantially from variation observed at other human complex microsatellites, suggesting that nucleotide composition alone is unlikely to explain the observed hypervariability. Instead, a higher underlying rate of neutral nucleotide substitution, associated with the role of the (RY)n in replication initiation, is implicated.


    Materials and Methods
 
Samples Investigated
Six samples (12 chromosomes) from individuals living in the Gambia, West Africa, and 12 samples (24 chromosomes) from individuals living in Oxfordshire, England, were sequenced. The Gambian DNA samples derive from immunological studies of host resistance to malaria and were collected and extracted by A. V. S. Hill of the Institute of Molecular Medicine (IMM), Oxford. Five of the six African samples were heterozygous for the βS mutation. The English samples represent a subset of DNAs collected in the course of a study of blood pressure and hypertension by P. J. Ratcliffe, also of the IMM.

DNA Sequence Analysis
Bases 58298–64374 of the HUMHBB GenBank sequence (accession number U01317) were amplified from genomic DNA and sequenced manually with Sequenase T7 DNA polymerase (United States Biochemical). The 6-kb region was amplified as two 3-kb fragments (sites 1–3009 and 3010–6076, respectively). DNA amplification and direct sequencing in both regions were conducted in the same manner, following methods for the analysis of the 3' fragment published previously (Fullerton et al. 1994). Sequence haplotype linkage relationships were determined for the 3' fragment only. Primers for the amplification of the 5' fragment corresponded to positions 58298–58322 and 61307–61331 of the reference sequence. The sequences of primers used in the analysis of the entire 6-kb region are available on request. GenBank accession numbers for the IR sequence haplotypes are AF186606–AF186620. The chimpanzee β-globin sequence used for the interspecific comparison was that reported by Savatier et al. (1985) (GenBank accession number X02345).

Estimating Genetic Diversity and Sequence Divergence
Genetic diversity in subregions of the 6-kb segment investigated was described in terms of per-nucleotide expected heterozygosity, or nucleotide diversity. Two estimates of nucleotide diversity were derived using the statistical analysis package DnaSP, version 3 (Rozas and Rozas 1999). The first, θ, is based on the observed number of polymorphic nucleotide sites or nucleotide and length (indel) changes observed in a sample of DNA sequences (Watterson 1975). The second, π, measures the average pairwise sequence difference between any two random alleles in the sample investigated (Nei 1987, p. 256). Standard errors for each estimate were calculated assuming no recombination between sites, which results in a more conservative value than assuming free recombination (Nei 1987, pp. 255–257). The two estimates were compared via Tajima’s (1989) D statistic. Values of D significantly different from 0 suggest that the two estimates of nucleotide diversity are discordant and hence that the observed variation is not consistent with the null hypothesis of neutrality. Significant differences in estimates of nucleotide diversity for particular sequence subregions were identified using the distribution of estimates generated from 10,000 independent coalescent simulations, assuming a neutral infinite-sites model without recombination and large constant population size (Hudson 1990).
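The three quantities described above can be illustrated with a minimal Python sketch. It is not the DnaSP implementation: it assumes an ungapped alignment of equal-length sequences, counts only substitutional variation, and uses function names that are hypothetical. Values are per sequence; dividing by the alignment length gives the per-nucleotide estimates.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Count alignment columns that carry more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimate: S divided by a1 = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a1


def pairwise_pi(seqs):
    """Average number of differences between all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def tajima_d(seqs):
    """Tajima's D: (pi - S/a1) scaled by its estimated standard deviation."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return 0.0  # no variation, so the two estimates cannot differ
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pairwise_pi(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment (hypothetical data, four chromosomes, four sites).
toy = ["AAAA", "AAAT", "AATT", "AATT"]
```

In this toy alignment the pairwise estimate exceeds the estimate based on segregating sites, so D is positive; assessing significance would still require the coalescent simulations described in the text.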

Comparison of levels of intraspecific polymorphism and interspecific divergence was performed according to Hudson, Kreitman, and Aguadé (1987) (the HKA test), using DnaSP’s Direct Mode feature. Nucleotide sequence differences between humans and chimpanzees were estimated from a human-chimpanzee sequence alignment generated using the SIM sequence alignment program (Huang, Hardison, and Miller 1990), with match, mismatch, gap-open, and gap-extension penalties of 1, -0.5, 1, and 0.5, respectively. This alignment was identical to one originally reported by Savatier et al. (1985). Comparisons were performed using nucleotide site differences only and using nucleotide and length changes considered together.

Sliding-window plots of nucleotide diversity, π, for the total sample and each of the population samples, as well as plots of the average number of nucleotide substitutions per site between populations, Dxy, and nucleotide divergence between humans and chimpanzees (calculated as the average proportion of nucleotide differences between species, K [Nei 1987, p. 276]), were generated using DnaSP. In all cases, parameter values were calculated for 100-bp windows placed at 5-bp intervals. Length polymorphism/divergence was disregarded in each of these analyses.
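The windowing scheme described above (100-bp windows at 5-bp intervals) can be sketched in a few lines of Python. This is an illustration only, not the DnaSP procedure: it assumes ungapped, equal-length aligned sequences, reports uncorrected per-site differences, and the function names are hypothetical.

```python
from itertools import combinations, product


def mean_diffs_per_site(pairs, start, width):
    """Average per-site difference over sequence pairs within one window."""
    pairs = list(pairs)
    total = sum(
        sum(a != b for a, b in zip(s[start:start + width], t[start:start + width]))
        for s, t in pairs
    )
    return total / (len(pairs) * width)


def sliding_pi(seqs, width=100, step=5):
    """Within-sample diversity for each window, keyed by window midpoint."""
    last = len(seqs[0]) - width
    return [
        (start + width // 2, mean_diffs_per_site(combinations(seqs, 2), start, width))
        for start in range(0, last + 1, step)
    ]


def sliding_dxy(pop_a, pop_b, width=100, step=5):
    """Between-population divergence: every pair takes one sequence from each sample."""
    last = len(pop_a[0]) - width
    return [
        (start + width // 2, mean_diffs_per_site(product(pop_a, pop_b), start, width))
        for start in range(0, last + 1, step)
    ]
```

The same helper serves both statistics; only the set of sequence pairs changes, which is the sense in which Dxy is the between-population analogue of π.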

Pairwise sequence differences for the (RY)n repeat sequence and seven additional human complex microsatellites were calculated using alignments of unique sequence haplotypes found at each locus. Sequence data for five loci were obtained from GenBank: RNU2 (Liao and Weiner 1995; accession numbers U57504–U57614), Mfd 59 (Garza and Freimer 1996; U48313–U48320), DQCAR (Macaubas et al. 1997; U96944–U96962), and DQCARII and G51152 (Lin et al. 1998; AF042291–AF042316). Sequences for the other two loci, MIB (Grimaldi and Crouau-Roy 1997) and DRB1 (Bergström et al. 1999), were entered manually using data presented in the published reports. Alignments were reconstructed to reflect those provided by the original authors, where available. All other sequences, including the (RY)n region, were manually aligned.
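One way to tabulate pairwise comparisons of aligned haplotypes is sketched below in Python. It assumes that length differences are represented by a gap character ("-") in the alignment and that every gapped column counts as a length change; the category labels and function names are hypothetical and are not taken from the original analysis.

```python
from collections import Counter
from itertools import combinations


def classify_pair(hap_a, hap_b, gap="-"):
    """Label one pair of aligned haplotypes by the kinds of difference seen."""
    has_substitution = False
    has_length_change = False
    for x, y in zip(hap_a, hap_b):
        if x == y:
            continue
        if x == gap or y == gap:
            has_length_change = True  # one haplotype lacks this position
        else:
            has_substitution = True   # both present, different nucleotides
    if has_substitution and has_length_change:
        return "both"
    if has_substitution:
        return "substitution only"
    if has_length_change:
        return "length only"
    return "identical"


def tally_pairs(haplotypes):
    """Count each difference category over all pairs of unique haplotypes."""
    return Counter(classify_pair(a, b) for a, b in combinations(haplotypes, 2))


# Hypothetical aligned haplotypes for illustration.
example = ["ATAT--GC", "ATATATGC", "ATGT--GC"]
```

With the three example haplotypes, each of the three pairs falls into a different category, which makes the distinction between the classes easy to see.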

Calculation of Helical Stability
Helical stability in the 6-kb region surveyed for polymorphism was determined using the computer program THERMODYN (courtesy of D. Kowalski). The program uses experimentally determined thermodynamic parameters to calculate the free energy difference (ΔG) between the duplex and single-stranded states for multiple overlapping segments or "windows" of a particular DNA sequence (Natale, Schubert, and Kowalski 1992). Values of ΔG (in kcal/mol) were calculated for 100-bp windows placed at 5-bp intervals along the 6-kb consensus (GenBank) sequence, assuming a temperature of 37°C and an ionic strength of 10 mM.
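The general shape of such a windowed calculation can be sketched in Python as follows. This is not THERMODYN and does not use its thermodynamic parameters: the per-step energy function is a deliberately crude, hypothetical stand-in (arbitrary units, with no temperature or ionic-strength term) that would need to be replaced by an experimentally determined nearest-neighbor table to give meaningful ΔG values.

```python
def toy_step_energy(dinucleotide):
    """Hypothetical opening cost of one base-pair step, in arbitrary units.

    Placeholder only: 1.0 plus 1.0 for each G or C in the step.
    """
    return 1.0 + sum(base in "GC" for base in dinucleotide)


def window_stability(seq, width=100, step=5, step_energy=toy_step_energy):
    """Sum step energies over each window; returns (midpoint, total) pairs."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - width + 1, step):
        window = seq[start:start + width]
        total = sum(step_energy(window[i:i + 2]) for i in range(len(window) - 1))
        profile.append((start + width // 2, total))
    return profile


def least_stable_window(profile):
    """Return the window with the lowest summed energy (a candidate DUE)."""
    return min(profile, key=lambda item: item[1])


# Hypothetical 60-bp sequence with an AT-rich core flanked by GC-rich DNA.
toy_seq = "GC" * 10 + "AT" * 10 + "GC" * 10
toy_profile = window_stability(toy_seq, width=20, step=10)
```

Passing the energy function as a parameter keeps the windowing logic separate from the choice of thermodynamic model, so a real parameter set can be substituted without changing the rest of the sketch.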


    Results
 
Polymorphism and Divergence in the β-Globin Origin IR Versus Flanking Regions
A 6-kb-long region encompassing the replication origin initiation region and 5'- and 3'-flanking regions was investigated for the presence of nucleotide and length polymorphism in 12 African and 24 European chromosomes. DNA sequence analysis identified 32 nucleotide polymorphisms and 9 insertion and/or deletion variants in the 6-kb region (fig. 1). Per-site nucleotide diversity in the replication origin IR, defined by the boundaries established by Aladjem et al. (1995) and estimated from the observed number of polymorphic sites, was twice that observed in the combined flanking DNA and nearly three times as high when indel polymorphism was included in the diversity estimates (table 1). Coalescent simulations suggest that the observed differences are significant (two-tailed P < 0.05 for estimates of θ, whether substitutions only or substitutions and length polymorphism are considered). The same relative differences in levels of nucleotide diversity were also observed when African and European chromosomes were considered independently (table 1). The difference was significant only for the African sample (and then only when nucleotide and length variation were jointly considered), however, due to a lower overall level of variation in the European sample. No significant difference between the two estimates of nucleotide diversity, θ and π, was observed for any sequence subregion, although estimates of D (Tajima 1989) did vary between the African and European samples (table 1). These differences are likely to reflect distinct demographic and/or selective pressures acting on the two population samples.



 
Fig. 1. The genomic distribution of sequence polymorphism in the β-globin replication origin initiation region (IR) and flanking DNA. The 6-kb region investigated for sequence variation is shown in the top panel, illustrating the location of the β-globin locus. Below, vertical lines mark the positions of the 41 polymorphic sites identified: downward-facing triangles indicate sites involved in insertion events, upward-facing triangles indicate deletions, and dots identify site polymorphisms which involve transition mutations. The boxed segment corresponds to the (RY)n repeat region shown in figure 2. The larger bracketed region below this represents the extent of the originally identified 2-kb IR (Kitsberg et al. 1993). The smaller bracketed region marks the boundary of the subsequently mapped 1.3-kb IR (Aladjem et al. 1995) and represents the consensus IR region used in this study.

 

 
Table 1. Nucleotide Diversity by Sequence Subregion for the Total Sample and Separate Population Samples

 
The types of polymorphic sequence changes observed in the origin IR and flanking regions are summarized in table 2. The observed transition-to-transversion ratio in the IR (1:1) is higher than that found in the combined flanking DNA (1:2). This difference is not statistically significant.
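For readers unfamiliar with the classification behind this ratio, the following Python sketch shows how substitutions are sorted into transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse). The function names and the example allele pairs are hypothetical and are not data from this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(allele_a, allele_b):
    """Classify a single-nucleotide difference as a transition or transversion."""
    a, b = allele_a.upper(), allele_b.upper()
    if a == b:
        raise ValueError("alleles are identical; no substitution to classify")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_counts(allele_pairs):
    """Return (number of transitions, number of transversions)."""
    kinds = [substitution_type(a, b) for a, b in allele_pairs]
    return kinds.count("transition"), kinds.count("transversion")


# Hypothetical polymorphic sites, each given as its two observed alleles.
example_sites = [("A", "G"), ("A", "C"), ("G", "T")]
```

Applied to the example sites, the counts are one transition and two transversions, i.e., a 1:2 ratio of the kind reported for the flanking DNA.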


 
Table 2. Types of Polymorphic Nucleotide Substitutions Observed in the Replication Origin Initiation Region (IR) and Flanking DNA

 
The extent to which the higher level of polymorphism in the replication origin IR derives from a higher underlying rate of neutral mutation, rather than from the effects of balancing selection within the initiation region or selective constraint on flanking regions, can be investigated with a comparison of interspecific sequence divergence in the same regions using the HKA test (Hudson, Kreitman, and Aguadé 1987). The number of nucleotide substitutions between humans and chimpanzees in the 6-kb region was estimated from an alignment of the published human and chimpanzee reference sequences (as described in Materials and Methods) and compared with the number of nucleotide polymorphisms observed in the same region in humans. As shown in table 3, although there is a tendency for polymorphism in the IR to be higher relative to the 5'-flanking region than suggested by the interspecific divergence, this trend is not statistically significant when only site differences are considered (a significant result is only obtained when indel polymorphism/divergence is included). If selection were acting to enhance polymorphism in the IR (or alternatively constrain variation in the 5'-flanking region), we would expect to observe a significant deviation irrespective of whether indel variation was included. Instead, the latter finding is more likely to reflect underestimation of the number of length changes which have occurred since humans and chimpanzees diverged. Taken together with the known presence of high rates of interallelic recombination in the vicinity of the IR (Fullerton et al. 1994; Harding et al. 1997; Smith et al. 1998), these results suggest that the higher level of polymorphism in the origin IR is unlikely to be a consequence of balancing selection acting on linked variation somewhere in that region.


 
Table 3. HKA Test Results for Different Pairwise Regional Comparisons

 
Polymorphism and Divergence Within the β-Globin Origin Initiation Region
Analysis of sequence haplotype diversity within the replication origin IR demonstrates the nonrandom distribution of nucleotide and length variation found there. Nearly half of the nucleotide polymorphisms (5 out of 12) and all of the simple indel differences observed in the IR occur in a short segment that includes a 52-bp alternating purine-pyrimidine repeat sequence ((RY)n) (fig. 2). Sliding-window calculation of nucleotide diversity within the origin IR (estimated on the basis of substitutional variation only) indicates the extent to which the (RY)n is hypervariable compared with immediately adjacent sequences (figs. 3–5). The level of per-nucleotide expected heterozygosity is an order of magnitude higher than that observed elsewhere in the IR or in the 5'- or 3'-flanking regions, a highly significant difference (P < 0.001; table 1). The same region that varies excessively within humans also shows the greatest interspecific divergence when the similarity between human and chimpanzee sequences is considered (fig. 3). HKA test comparison within the IR indicated no significant heterogeneity in patterns of polymorphism and divergence for the (RY)n and adjoining regions (P > 0.05 with or without indels; data not shown).



 
Fig. 2. Sequence polymorphism observed in the (RY)n repeat region of the β-globin replication origin initiation region. Twelve distinct sequence haplotypes were observed in the (RY)n region: nine in the African (Gambia) sample (n = 12) and three in the European (Oxford) sample (n = 24). Sites in each of these haplotypes that differ from the GenBank consensus sequence (given at the top of the figure) are shown, along with the number of occurrences of each haplotype in the respective population samples. Sequence numbering is with respect to the total (6 kb) region analyzed. The sequence of the orthologous region in the chimpanzee is also indicated.

 


 
Fig. 3. Sliding-window depiction of intraspecific polymorphism and interspecific divergence in the consensus replication origin initiation region (IR). Estimates of nucleotide diversity (π) for the total data set and the average proportion of nucleotide differences between humans and chimpanzees (K) are shown for 100-bp windows calculated at 5-bp intervals along the 1.3-kb sequence. Length polymorphism and divergence were disregarded in the calculation of both estimates. The location of the (RY)n repeat region within the consensus IR is indicated.

 
Similar sliding-window calculations of nucleotide diversity for the African and European subsamples confirm that the polymorphism pattern observed in the combined sample is neither population-specific nor an artifact of pooling geographically subdivided populations (fig. 4a). Interestingly, the absolute level of polymorphism is higher in the African sample than in the European sample, despite the fact that the African sample is smaller and nonrepresentative (i.e., multiple genetically identical βS chromosomes were present in our sample at a frequency significantly greater than the frequency of the sickle cell anemia allele in the Gambia; Hill et al. 1991). Sliding-window calculation of sequence haplotype divergence between the two samples shows that the (RY)n region also varies most between populations, consistent with this region experiencing a higher underlying rate of nucleotide substitution (fig. 4b).



 
Fig. 4. Sliding-window depiction of within- versus between-population divergence in the consensus replication origin initiation region. a, Estimates of nucleotide diversity (π), disregarding length polymorphism, for each of the two population samples are shown. Diversity was calculated for 100-bp windows placed at 5-bp intervals along the 1.3-kb sequence. Note that while absolute levels of diversity differ in the two samples, there is higher diversity in the vicinity of the (RY)n repeat region (centered on position 400) in both cases. b, Estimates of the average numbers of nucleotide substitutions per site between the two populations, Dxy, for 100-bp windows placed at 5-bp intervals are shown. Again, the highest level of sequence divergence between populations is found in the vicinity of the (RY)n repeat region.

 
The sequence polymorphisms that occur in the hypervariable region of the IR fall mostly at the 3' end of the (RY)n repeat or in an adjacent homopyrimidine run (fig. 2). While the repetitive structure of the (RY)n region is likely to contribute in some way to the accumulation of sequence divergence in this portion of the IR, it is clear that the repeat region itself is not behaving as a "typical" microsatellite of complex sequence composition. Comparison of the pattern of sequence diversity found at the β-globin (RY)n with that observed at seven other complex human microsatellites (Liao and Weiner 1995; Garza and Freimer 1996; Grimaldi and Crouau-Roy 1997; Macaubas et al. 1997; Lin et al. 1998; Bergström et al. 1999) indicates that the (RY)n has an unusually high proportion of sequence haplotypes which differ by nucleotide substitution alone, as opposed to differing only by length or by a combination of nucleotide substitutions and length changes (table 4). The loss or gain of one or more repeat units, which arises via the effects of replication enzyme slippage (Levinson and Gutman 1987), is a predominant feature of most microsatellites, even those complex repetitive sequences in which nucleotide substitution also occurs. Although this type of length variation is apparent within the short homogeneous (AT)n array at the 3' end of the (RY)n repeat, only a small portion of the total observed variation may be attributed to such length changes. Rather, the majority of changes which are found within the (RY)n repeat occur as substitution events that change the nucleotide composition of the (RY)n but do not alter its overall length (fig. 2). This suggests that some process other than enzyme slippage is contributing to the high level of diversity in this region.


 
Table 4. Types of Differences Observed in Pairwise Comparisons of Complex Microsatellite Sequences

 
Despite the unexpectedly high level of nucleotide substitution, none of the observed substitutions disrupt the alternating structure of the (RY)n repeat. An even larger survey of DNA sequence diversity in the same region, which assayed variation in 350 chromosomes from 9 populations (Harding et al. 1997), also identified no sequence variant that disrupts the (RY)n repeat. Similar repeat motif conservation was observed at only two of the other seven loci examined: the MIB microsatellite, for which no nucleotide substitution polymorphism was observed (Grimaldi and Crouau-Roy 1997), and the Mfd 59 locus, whose sequence diversity was determined for only a selected subset of alleles (Garza and Freimer 1996).
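Whether a variant preserves the alternating purine-pyrimidine structure can be checked mechanically. The Python sketch below is an illustration with hypothetical function names and made-up sequences, not the sequences in figure 2: each base is reduced to R (A or G) or Y (C or T), and the repeat is intact if no two adjacent positions share a class.

```python
def ry_class(base):
    """Map a nucleotide to its purine (R) or pyrimidine (Y) class."""
    base = base.upper()
    if base in "AG":
        return "R"
    if base in "CT":
        return "Y"
    raise ValueError(f"unexpected nucleotide: {base!r}")


def is_alternating_ry(seq):
    """True if purines and pyrimidines strictly alternate along the sequence."""
    classes = [ry_class(base) for base in seq]
    return all(x != y for x, y in zip(classes, classes[1:]))


# Made-up repeat and two single-base variants of it.
reference = "ATATACGTAT"
transition_variant = "ATGTACGTAT"    # A -> G at the third base, class unchanged
transversion_variant = "ATTTACGTAT"  # A -> T at the third base, class flipped
```

A substitution within a class leaves the alternation intact, whereas a substitution between classes creates two adjacent bases of the same class, which is the kind of disruption the surveys described above did not find.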

DNA Helical Stability in the β-Globin Replication Initiation Zone
The striking preservation of the (RY)n repeat sequence structure in the face of high levels of substitutional polymorphism and divergence suggests that its structure may have a functional significance, possibly in the context of replication initiation. While little is currently known about specific regulatory sequences associated with human replication origins, it is possible to assess the helical stability of duplex DNA by a consideration of base-stacking interactions among nucleotides in a given sequence (Natale, Schubert, and Kowalski 1992). Using a computer program designed for this purpose (see Materials and Methods), we were able to investigate the nature of DNA-unwinding capability in the β-globin replication initiation region.

As shown in figure 5, a region of pronounced helical instability is present in the defined IR, which coincides with the hypervariable (RY)n sequence. This suggests that the (RY)n sequence constitutes the DUE of the replication origin initiation region, i.e., the place at which DNA unwinding begins in preparation for initial strand synthesis. Similar investigation of the DNA-unwinding profile of African and European sequence haplotypes suggests that the observed polymorphisms, which retain the alternating (RY)n structure, have no detectable effect on DNA helical stability (data not shown). This (indirect) evidence supports the assumption that the observed polymorphisms persist in human populations because they have no appreciable deleterious effect on the function of the β-globin ori. As the primary unwinding domain of an origin’s IR is likely to lie in close proximity to the start site of replicative synthesis, the hypervariability of the β-globin DUE also suggests that only enzymatic interactions involved in the earliest stages of replication initiation are error-prone.



 
Fig. 5. DNA helical stability within the consensus replication origin initiation region (IR). The sliding-window plot of nucleotide diversity (π) for the total sample (calculated with respect to substitution variation only) is compared with a similar depiction of the negative free energy (ΔG) required to unwind 100-bp windows of duplex DNA in the β-globin IR. Low values of ΔG represent areas of inherently low helical stability. As shown, the local minimum for the IR coincides with the hypervariable (RY)n region.

 

    Discussion
 
This investigation of DNA sequence polymorphism and divergence of the human β-globin origin of replication suggests that replication fidelity in the immediate vicinity of the replication origin IR is compromised. Three major observations support this conclusion: within-population DNA sequence variation is highest near the alternating (RY)n sequence which constitutes the likely DNA-unwinding element of the initiation region, between-population sequence haplotype divergence is highest in the same small genomic region, and, finally, interspecific divergence (estimated from a comparison of orthologous regions in humans and chimpanzees) is also highest in the same genomic location. All three findings are consistent with a rate of neutral mutation in and immediately 3' of the β-globin DUE that is at least an order of magnitude higher than that in adjacent flanking regions. These observations suggest that local rates of sequence mutation, and, ultimately, patterns and levels of sequence diversity, are influenced by proximity to the initiation region of a known human replication origin.

The polymorphisms and substitutions observed in the β-globin replication origin IR represent mutations that have eluded both mismatch repair and the effects of drift and purifying selection. The presence of allelic variants at significant frequencies in human populations (data reported here and in Harding et al. [1997]) suggests that the observed polymorphisms are not seriously deleterious, although possible functional effects of the variation clearly merit direct investigation. The fact that none of the observed variants disrupt the helical stability profile of the DUE is consistent both with the apparent neutrality of the variation and with the influence of selective constraint in determining the position and type of polymorphism found in the DUE. If the effects of constraint are significant, the observed sequence diversity may not reflect the full spectrum of mutational events that arise at the β-globin replication origin. Investigation of de novo germ line or somatic mutation in the same region will provide relevant information regarding the mutational basis of the observed variation and should clarify the extent to which mutation rates vary within and around the replication origin initiation region.

The accumulation of polymorphism in the replication origin DUE, which is likely to lie at or near the start site of replication (Bielinsky and Gerbi 1998), suggests that DNA polymerase fidelity and/or DNA repair associated with replication initiation at the β-globin ori may be compromised. These findings are consistent with observed differences in in vitro DNA polymerase error rates, which have suggested that different phases of replicative DNA synthesis (e.g., initiation vs. strand elongation) may be subject to different rates of mutation (Kunkel 1992). β polymerase, for example, is known to play a role in small gap repair and hence may be involved in the excision of the RNA primer (and the first 25–50 nt of DNA synthesized by α polymerase) at the replication initiation start site (Linn 1991). However, the enzyme’s putative role in DNA synthesis at replication origins is less compelling in this context than is its known involvement in a form of DNA replication error known as dislocation mutagenesis (Kunkel and Alexander 1986). Dislocation mutagenesis results from transient template-primer slippage, in which the template slips with respect to the enzyme, a correct nucleotide is incorporated, and the template then realigns prior to continued synthesis, resulting in an apparent misincorporation event. This form of mutagenesis, which results in base substitutions and 1-nt frameshifts at distinct hot spots in vitro (Kunkel and Soni 1988), is consistent with many (if not all) of the polymorphisms observed in the β-globin (RY)n repeat region (fig. 2). Moreover, dislocation mutagenesis is a unique feature of β polymerase; no other eukaryotic DNA polymerases generate such errors in vitro (Roberts and Kunkel 1996).
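
The dislocation mechanism described above can be mimicked with a short toy model in Python. The sketch is purely illustrative and is not drawn from the cited studies: the function name copy_template, its parameters, and the six-base template are hypothetical. The branch in which the template fails to realign, leaving a 1-nt deletion, is an assumption about how the same slippage could produce a frameshift.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def copy_template(template, slip_at=None, realign=True):
    """Copy `template` one base at a time and return the new strand,
    written in template order (not reverse-complemented) so that it can be
    compared position by position with an error-free copy.

    slip_at : index of a template base that transiently slips out of
              register (None means error-free synthesis).
    realign : if True, the template realigns after one incorporation,
              giving an apparent misincorporation (base substitution);
              if False (an assumption of this sketch), the misalignment
              persists and one nucleotide is lost.
    """
    product = []
    i = 0
    while i < len(template):
        if i == slip_at and i + 1 < len(template):
            # Base i is skipped; the polymerase correctly copies base i + 1.
            product.append(COMPLEMENT[template[i + 1]])
            # After realignment the new nucleotide sits opposite base i and
            # synthesis resumes at base i + 1; otherwise base i stays
            # unpaired and synthesis resumes at base i + 2.
            i += 1 if realign else 2
        else:
            product.append(COMPLEMENT[template[i]])
            i += 1
    return "".join(product)


template = "ATGCAC"  # hypothetical template, not a beta-globin sequence
faithful = copy_template(template)
substitution = copy_template(template, slip_at=2)
deletion = copy_template(template, slip_at=2, realign=False)
```

In this model a dislocation produces a visible substitution only when the skipped template base differs from its downstream neighbour. That condition holds at every position of a strictly alternating purine-pyrimidine tract.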

This unusual feature of the β polymerase enzyme, along with its possible role in the replication process, suggests that errors associated either with small gap filling following RNA primer excision or base excision repair are responsible for the observed hypervariability of the β-globin replication origin IR. If so, these observations imply that eukaryotic replication origins may in general be subject to higher-than-average rates of within- and between-species sequence divergence. More specifically, our findings predict that sequences involved in DNA unwinding at eukaryotic replication origins, which lie in close proximity to the RNA primer synthesis site, may be particularly susceptible to nucleotide substitution and small length changes.

Although no other investigation of origin-associated polymorphic variation has thus far been reported, available evidence from other eukaryotic replication origins is consistent with these predictions. There is, for example, a well-known lack of sequence conservation among replication origins in the genome of the yeast Saccharomyces cerevisiae (Broach et al. 1983), which contrasts markedly with the extensive sequence similarity observed among prokaryotic ori’s (Kornberg and Baker 1991, p. 534). Of the six replication origins that have been well characterized in this species (Theis et al. 1999), only one of the two main modular elements commonly present, domain A, is conserved, and within this domain, conservation is restricted principally to a single essential 11-bp motif (Marahrens and Stillman 1996). Extensive differences among ori’s are observed in the B domain, which encompasses an easily unwound sequence suggested to function as a DUE (Natale, Umek, and Kowalski 1993). Conservation of domain A and element B1 of the B domain (but no other B-domain elements) has also been observed among homologous origins found in distinct Saccharomyces species (Theis et al. 1999). However, no sequence conservation exists between replication origins found in budding and fission yeast (Clyne and Kelly 1995).

One study has reported that in humans, the intergenic spacer of the rRNA repeat accumulates variation at high rates (Gonzalez and Sylvester 1995), a feature of rDNA found in a wide variety of organisms. The human intergenic spacer has also been shown in independent investigations to contain one or more initiation zones for DNA replication (Little, Platt, and Schildkraut 1993; Gencheva, Anachkova, and Russev 1996). The extent to which the reported variation coincides with the mapped initiation zone(s) has not been investigated. A pilot survey of variation in the vicinity of two other human replication origins suggests that these too may experience higher-than-average levels of nucleotide and length polymorphism (unpublished data).

The effects of selective constraint on regions subject to high rates of sequence turnover are reflected in the composition and genomic location of eukaryotic replication origins. If constraint acts primarily on DNA secondary structure (e.g., unwinding capacity), higher underlying rates of polymorphism and divergence in replication origin IRs are expected to favor conservation of general modular features rather than specific sequence motifs. This feature of replication origins is well known (Dobbs, Shaiu, and Benbow 1994; Marahrens and Stillman 1996). Similarly, to minimize the likelihood of deleterious mutations disrupting origin function, important regulatory sequences in the ori (which rely on specific base pair interactions) are more likely to lie at a distance from the start site of replication initiation. The best characterized replication origins in budding yeast occupy genomic regions of 100–200 bp in length, consistent with moderately broad positioning of regulatory elements; origins in higher eukaryotes are less well defined (Gilbert 1998). Indeed, the involvement of the far upstream (located 50 kb 5') locus control region in stimulating replication at the β-globin origin suggests that cis-acting elements may be found at considerable distances from the start site of initiation (Aladjem et al. 1995). Finally, the hypervariability of replication origin initiation zones suggests that they should be confined to noncoding flanking and intervening sequences so that the integrity of adjacent coding and regulatory regions is preserved. Consistent with this prediction, the majority of known eukaryotic replication origins are, like the β-globin ori, located in intergenic DNA (Brewer 1994).

A predisposition to higher-than-average rates of nucleotide substitution would, at first glance, appear to be a significant disadvantage for genomic regions that have an important role to play in DNA replication. The modular nature of eukaryotic replication origins, and at least partial reliance on elements whose function is determined by secondary structure rather than by nucleotide composition, must contribute to the tolerance of these regions to high rates of sequence turnover. Unexpectedly, however, high levels of polymorphism and divergence relative to adjacent noncoding regions at the β-globin ori suggest that replication origins are subject to higher rates of mutation than nonfunctional DNA. Why relaxed replication fidelity is associated with replication initiation and why it persists in the face of selective pressures which should, on balance, favor high-fidelity synthesis is unclear. The finding of significant sequence diversity in close proximity to an important (and, until relatively recently, unrecognized) functional domain in the β-globin gene cluster illustrates the complexity of forces with the potential to influence rates of spontaneous mutation in eukaryotic genomes. Investigation of population variation and interspecific divergence in this and other origin regions promises to further illuminate our understanding of both basic mutational mechanisms and replication origin biology.


    Acknowledgements
 
We thank A. V. S. Hill and P. J. Ratcliffe for DNA samples, D. Kowalski for the program THERMODYN, and A. G. Clark, T. M. Pollard, S. A. Tishkoff, and two anonymous reviewers for helpful comments on the manuscript. We are also grateful to I. B. Rogozin for first drawing our attention to the phenomenon of dislocation mutagenesis. S.M.F.’s laboratory research was supported by grants from the Rhodes and Wellcome Trusts, as well as the Department of Anthropology, University of Durham, U.K.


    Footnotes
 
Wolfgang Stephan, Reviewing Editor

1 Abbreviations: DUE, DNA unwinding element; IR, initiation region; ori, origin of replication.

2 Keywords: globin, DNA replication, polymerase fidelity, mutation, polymorphism, human.

3 Address for correspondence and reprints: Stephanie M. Fullerton, Institute of Molecular Evolutionary Genetics, Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802. E-mail: smf15@psu.edu


    References
 

    Abdurashidova, G., S. Riva, G. Biamonti, M. Giacca, and A. Falaschi. 1998. Cell cycle modulation of protein-DNA interactions at a human replication origin. EMBO J. 17:2961–2969.

    Aladjem, M. I., M. Groudine, L. L. Brody, E. S. Dieken, R. E. K. Fournier, G. M. Wahl, and E. M. Epner. 1995. Participation of the human β-globin locus control region in initiation of DNA replication. Science 270:815–819.

    Aladjem, M. I., L. W. Rodewald, J. L. Kolman, and G. M. Wahl. 1998. Genetic dissection of a mammalian replicator in the human β-globin locus. Science 281:1005–1009.

    Bergström, T. F., H. Engkvist, R. Erlandsson, A. Josefsson, S. J. Mack, H. A. Erlich, and U. Gyllensten. 1999. Tracing the origin of HLA-DRB1 alleles by microsatellite polymorphism. Am. J. Hum. Genet. 64:1709–1718.

    Bielinsky, A.-K., and S. A. Gerbi. 1998. Discrete start sites for DNA synthesis in the yeast ARS1 origin. Science 279:95–98.

    Boulikas, T. 1992. Evolutionary consequences of nonrandom damage and repair of chromatin domains. J. Mol. Evol. 35:156–180.

    Brewer, B. J. 1994. Intergenic DNA and the sequence requirements for replication initiation in eukaryotes. Curr. Opin. Genet. Dev. 4:196–202.

    Broach, J. R., Y. Y. Li, J. Feldman, M. Jayaram, J. Abraham, K. A. Nasmyth, and J. B. Hicks. 1983. Localization and sequence analysis of yeast origins of DNA replication. Cold Spring Harb. Symp. Quant. Biol. 47:1165–1173.

    Clyne, R. K., and T. J. Kelly. 1995. Genetic analysis of an ARS element from the fission yeast Schizosaccharomyces pombe. EMBO J. 14:6348–6357.

    DePamphilis, M. L. 1999. Replication origins in metazoan chromosomes: fact or fiction? Bioessays 21:5–16.

    Dimitrova, D., M. Giacca, F. Demarchi, G. Biamonti, S. Riva, and A. Falaschi. 1996. In vivo protein-DNA interactions at human DNA replication origin. Proc. Natl. Acad. Sci. USA 93:1498–1503.

    Dobbs, D. L., W.-L. Shaiu, and R. M. Benbow. 1994. Modular sequence elements associated with origin regions in eukaryotic chromosomal DNA. Nucleic Acids Res. 22:2479–2489.

    Fullerton, S. M., R. M. Harding, A. J. Boyce, and J. B. Clegg. 1994. Molecular and population genetic analysis of allelic sequence diversity at the human β-globin locus. Proc. Natl. Acad. Sci. USA 91:1805–1809.

    Garza, J. C., and N. B. Freimer. 1996. Homoplasy for size at microsatellite loci in humans and chimpanzee. Genome Res. 6:211–217.

    Gencheva, M., B. Anachkova, and G. Russev. 1996. Mapping the sites of initiation of DNA replication in rat and human rRNA genes. J. Biol. Chem. 271:2608–2614.

    Gilbert, D. M. 1998. Replication origins in yeast versus metazoa: separation of the haves and have nots. Curr. Opin. Genet. Dev. 8:194–199.

    Gonzalez, I. L., and J. E. Sylvester. 1995. Complete sequence of the 43-kb human ribosomal DNA repeat: analysis of the intergenic spacer. Genomics 27:320–328.

    Grimaldi, M.-C., and B. Crouau-Roy. 1997. Microsatellite allelic homoplasy due to variable flanking sequences. J. Mol. Evol. 44:336–340.

    Harding, R. M., S. M. Fullerton, R. C. Griffiths, J. Bond, M. J. Cox, J. A. Schneider, D. S. Moulin, and J. B. Clegg. 1997. Archaic African and Asian lineages in the genetic ancestry of modern humans. Am. J. Hum. Genet. 60:772–789.

    Hill, A. V. S., C. E. M. Allsopp, D. Kwiatkowski, N. M. Anstey, P. Twumasi, P. A. Rowe, S. Bennett, D. Brewster, A. J. McMichael, and B. M. Greenwood. 1991. Common West African HLA antigens are associated with protection from severe malaria. Nature 352:595–600.

    Huang, X. Q., R. C. Hardison, and W. Miller. 1990. A space-efficient algorithm for local similarities. Comput. Appl. Biosci. 6:373–381.

    Hudson, R. R. 1990. Gene genealogies and the coalescent process. Pp. 1–44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. Oxford University Press, Oxford, England.

    Hudson, R. R., M. Kreitman, and M. Aguadé. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.

    Kitsberg, D., S. Selig, I. Keshet, and H. Cedar. 1993. Replication structure of the human β-globin gene domain. Nature 366:588–590.

    Kornberg, A., and T. Baker. 1991. DNA replication. W. H. Freeman and Co., New York.

    Krawczak, M., E. V. Ball, and D. N. Cooper. 1998. Neighboring-nucleotide effects on the rates of germ-line single-base-pair substitution in human genes. Am. J. Hum. Genet. 63:474–488.

    Kunkel, T. A. 1992. DNA replication fidelity. J. Biol. Chem. 267:18251–18254.

    Kunkel, T. A., and P. S. Alexander. 1986. The base substitution fidelity of eukaryotic DNA polymerases. Mispairing frequencies, site preferences, insertion preferences, and base substitution by dislocation. J. Biol. Chem. 261:160–166.

    Kunkel, T. A., and A. Soni. 1988. Mutagenesis by transient misalignment. J. Biol. Chem. 263:14784–14789.

    Levinson, G., and G. Gutman. 1987. High frequency of short frameshifts in poly-CA/GT tandem borne by bacteriophage M13 in Escherichia coli K-12. Nucleic Acids Res. 15:5323–5338.

    Liao, D., and A. M. Weiner. 1995. Concerted evolution of the tandemly repeated genes encoding primate U2 small nuclear RNA (the RNU2 locus) does not prevent rapid diversification of the (CT)n·(GA)n microsatellite embedded within the U2 repeat unit. Genomics 30:583–593.

    Lin, L., L. Jin, X. Lin, A. Voros, P. Underhill, and E. Mignot. 1998. Microsatellite single nucleotide polymorphisms in the HLA-DQ region. Tissue Antigens 52:9–18.

    Linn, S. 1991. How many pols does it take to replicate nuclear DNA? Cell 66:185–187.

    Little, R. D., T. H. Platt, and C. L. Schildkraut. 1993. Initiation and termination of DNA replication in human rRNA genes. Mol. Cell. Biol. 13:6600–6613.

    Macaubas, C., L. Jin, J. Hallmayer, A. Kimura, and E. Mignot. 1997. The complex mutation pattern of a microsatellite. Genome Res. 7:635–641.

    Marahrens, Y., and B. Stillman. 1996. The initiation of DNA replication in the yeast Saccharomyces cerevisiae. Pp. 66–95 in J. J. Blow, ed. Eukaryotic DNA replication. Oxford University Press, Oxford, England.

    Natale, D. A., A. E. Schubert, and D. Kowalski. 1992. DNA helical stability accounts for mutational defects in a yeast replication origin. Proc. Natl. Acad. Sci. USA 89:2654–2658.

    Natale, D. A., R. M. Umek, and D. Kowalski. 1993. Ease of DNA unwinding is a conserved property of yeast replication origins. Nucleic Acids Res. 21:555–560.

    Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.

    Roberts, J. D., and T. A. Kunkel. 1996. Fidelity of DNA replication. Pp. 217–247 in M. L. DePamphilis, ed. DNA replication in eukaryotic cells. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.

    Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174–175.

    Savatier, P., G. Trabuchet, C. Faure, Y. Chebloune, M. Gouy, G. Verdier, and V. M. Nigon. 1985. Evolution of the primate β-globin gene region: high rate of variation in CpG dinucleotides and in short repeated sequences between man and chimpanzee. J. Mol. Biol. 182:21–29.

    Sharp, P. M., D. C. Shields, K. H. Wolfe, and W. H. Li. 1989. Chromosomal location and evolutionary rate variation in enterobacterial genes. Science 246:808–810.

    Smith, R. A., P. J. Ho, J. B. Clegg, J. R. Kidd, and S. L. Thein. 1998. Recombination breakpoints in the human β-globin gene cluster. Blood 92:4415–4421.

    Spangenberg, C., T. C. Montie, and B. Tümmler. 1998. Structural and functional implications of sequence diversity of Pseudomonas aeruginosa genes oriC, ampC, and fliC. Electrophoresis 19:545–550.

    Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.

    Theis, J. F., C. Yang, C. B. Schaefer, and C. S. Newlon. 1999. DNA sequence and functional analysis of homologous ARS elements of Saccharomyces cerevisiae and S. carlsbergensis. Genetics 152:943–952.

    Umar, A., and T. A. Kunkel. 1996. DNA-replication fidelity, mismatch repair and genome instability in cancer cells. Eur. J. Biochem. 238:297–307.

    Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256–276.

    Ye, N., G. P. Holmquist, and T. R. O’Connor. 1998. Heterogeneous repair of N-methylpurines at the nucleotide level in normal human cells. J. Mol. Biol. 284:269–285.

Accepted for publication October 11, 1999.