mtDNA Tandem Repeats in Domestic Dogs and Wolves: Mutation Mechanism Studied by Analysis of the Sequence of Imperfect Repeats

Peter Savolainen*, Lars Arvestad† and Joakim Lundeberg*

*Department of Biotechnology and
†Department of Numerical Analysis and Computing Science, Royal Institute of Technology (KTH), Stockholm, Sweden

Abstract

The mitochondrial (mt) DNA control region (CR) of dogs and wolves contains an array of imperfect 10 bp tandem repeats. This region was studied for 14 domestic dogs representing the four major phylogenetic groups of nonrepetitive CR and for 5 wolves. Three repeat types were found among these individuals, distributed so that different sequences of the repeat types were formed in different molecules. This enabled a detailed study of the arrays and of the mutation events that they undergo. Extensive heteroplasmy was observed in all individuals; 85 different array types were found in one individual, and the total number of types was estimated at 384. Among unrelated individuals, no identical molecules were found, indicating a high rate of evolution of the region. By performing a pedigree analysis, array types which had been inherited from mother to offspring and array types which were the result of somatic mutations, respectively, could be identified, showing that about 20% of the molecules within an individual had somatic mutations. By direct pairwise comparison of the mutated and the original array types, the physiognomy of the inserted or deleted elements (indels) and the approximate positions of the mutations could be determined. All mutations could be explained by replication slippage or point mutations. The majority of the indels were 1–5 repeats long, but deletions of up to 17 repeats were found. Mutations were found in all parts of the arrays, but at a higher frequency in the 5' end. Furthermore, the inherited array types within the mother-offspring pair were aligned and compared so that germ line mutations could be studied. The pattern of the germ line mutations was approximately the same as that of the somatic mutations.

Introduction

Extensive size variation of the mtDNA control region (CR), caused by variation in the number of tandemly repeated sequences, has been described for a number of species. In vertebrates, arrays of repeats are found in five positions, referred to as repetitive sequence (RS) 1–5 (Hoelzel et al. 1994), all situated in regions where H-strand replication is regulated (fig. 1). RS1 and RS2 are found in the 5' end of the CR where the H-strand replication pauses, forming the three-stranded D-loop. RS3, RS4, and RS5 are situated in the 3' end of the CR, upstream of the site for H-strand origin of replication, and, consequently, near the end of the same replication (Clayton 1991; Hoelzel et al. 1994). A high degree of internal complementarity and the capability to fold into complex secondary structures have been described for repeat arrays at all five positions. The arrays at RS1, RS2, RS4, and RS5 have repeats of 40–160 bp showing variation in the number of tandem repeats and, often, length heteroplasmy. The arrays at RS3 are distinctive compared with those at RS1, RS2, RS4, and RS5 in that the repeats are shorter (6–22 bp), the numbers of repeats are higher (up to >40), and the level of heteroplasmy is higher. Tandem repeat arrays at RS3 were found in all 18 carnivores studied (Hoelzel et al. 1994), including domestic dog, and in 14 shrew species (Fumagalli et al. 1996) and have been found in several other mammals, such as pigs (Ghivizzani et al. 1993) and rabbits (Mignotte et al. 1990). In the carnivores, the arrays were composed of imperfect repeats forming a secondary sequence of repeat types which differed between molecules within individuals, but in pigs and rabbits, the arrays were composed of perfect repeats.



 
Fig. 1.—Schematic diagram of the organization of the mitochondrial DNA control region in mammals. The conserved sequence blocks (CSBs), the H-strand replication origin (OH), the H- and L-strand promoters (HSP and LSP, respectively), and the locations of tandemly repeated sequences in different species (RS1–RS5) are shown (Walberg and Clayton 1981; Clayton 1991; Saccone, Pesole, and Sbisa 1991; Hoelzel et al. 1994)

 
Although many studies have been performed on the repeat arrays of the CR, little is known about the mechanisms governing the length mutations, the maintenance of the number of repeats in the arrays, and, in the case of imperfect repeats, the apparent homogenization of the sequence of the repeats within species. Several mechanisms have been proposed to be the cause of the variation in copy number of the repeats. The most commonly suggested mechanism is slipped-strand mispairing during replication, also known as replication slippage (Levinson and Gutman 1987; Buroker et al. 1990; Fumagalli et al. 1996; Broughton and Dowling 1997; Wilkinson et al. 1997), but recombination and transposition (Rand and Harrison 1989) and unequal crossing-over or gene conversion (Hoelzel, Hancock, and Dover 1993; Hoelzel et al. 1994) have also been proposed to fit with various data. However, the methods used in these studies have not been able to directly record the mutational events.

In this study, the tandem-repeat region of domestic dog and wolf CR mtDNA was studied. As the repeat arrays of this region are composed of imperfect repeats, it is possible to establish the physiognomy of inserted or deleted elements and their approximate positions. By studying the inheritance of mtDNA molecules from mother to offspring, somatic mutations were recorded by direct comparison between the original and the mutated molecules. In combination with an analysis of the heteroplasmic variation in the domestic dog population, a comprehensive picture of the mechanisms involved in the evolution of the tandem-repeat arrays was obtained.

Materials and Methods

Samples
Fourteen dogs and 5 wolves were sampled. The dogs were chosen so that the four major phylogenetic groups of dogs according to normal CR sequence were represented (Savolainen et al. 1997; Vila et al. 1997). Three sequence variants were represented by three (variant D5), three (variant D6), and two (variant D8) unrelated individuals, respectively, and five variants were represented by a single individual. Furthermore, two three-generation maternal pedigrees were studied, one consisting of dogs and one consisting of wolves. Both pedigrees were of sequence variant D6 (also called W6 when found in wolves). Studied individuals were as follows: H9 (German shepherd, sequence variant D5); H54 (Jämthund, D8); H83 (German shepherd, D5); PD1, PD2, and PD3 (maternal pedigree, Labrador retriever, D6); Ny34 (border collie, D1); Ny36 (Samoyed, D7); Ny38 (Norrbottenspets, D8); Ny39 (Irish setter, D9); Ny41 (Siberian husky, D18); Ny66 (German shepherd, D10); Ny71 (German shepherd, D5); Sch3 (Schipperke, D6); Wolf1, Wolf2a, Wolf2b, Wolf2c, and Wolf3 (maternal pedigree, wolf of Russian origin, W6/D6). Blood was sampled for H54, PD1, PD2, PD3, Sch3, and Wolf3, muscle was sampled for Wolf1, liver was sampled for Wolf2a, Wolf2b, and Wolf2c, and hair was sampled for H9, H83, Ny34, Ny36, Ny38, Ny39, Ny41, Ny66, and Ny71.

DNA Preparation
DNA was obtained from blood samples using the Chelex procedure (Walsh, Metzger, and Higuchi 1991). DNA from muscle and liver samples was obtained using the following method: 1-cm³ pieces were cut into slices, washed in 1× SSC, and put in 400 µl of 150 mM NaAc (pH 7.0), 1.25 mg/ml proteinase K, 50 mM DTT, and 2% NP40 detergent. The samples were incubated at 37°C overnight and extracted twice with phenol/chloroform, and DNA was recovered by two rounds of ethanol precipitation. DNA was obtained from hairs using the following method: hairs were pulled from the individuals so that the bulbs were recovered and placed in 200 µl of 10 mM Tris-HCl (pH 8.5), 0.9% polyoxyethylene 10 lauryl ether, 35 mM DTT, 50 µg/ml proteinase K, and 5% w/w Chelex 100 (Bio-Rad, Richmond, Calif.). The mixture was incubated at 56°C overnight and at 96°C for 10 min and was finally subjected to vortex mixing and centrifugation. The supernatant was used directly in the PCR amplification.

Direct DNA Amplification
The RS3 tandem-repeat region was amplified by PCR using the primers WD3 (5'-CAA GGT GCT ATT CAG TCA ATG G-3') and WD6 (5'-TAT AAT AGA TGA CAT GAG TTT ACG-3'). For control experiments and nested PCR, two more primers, situated internally of WD3 and WD6, were used: WD4 (5'-GGT TTG TAT AAG TTA ACT TAA TGT C-3') and WD5 (5'-TTT CAG GAC ATA TAG TTT TAG GG-3'). For fragment analysis, WD5 was fluorescently labeled with 6-FAM dye label (PE Applied Biosystems, Foster City, Calif.). The PCR mixture consisted of 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, 2 µg/ml BSA, 0.2 mM of each dNTP, 0.1 µM of each primer, and 2 U of AmpliTaq DNA polymerase (Perkin Elmer, Norwalk, Conn.) in a volume of 50 µl. The amplification program consisted of predenaturation (94°C for 2 min) followed by 40 cycles of denaturation (94°C for 15 s), primer annealing (62°C for 30 s) and extension (72°C for 1 min), and a final extension (72°C for 10 min).

DNA Cloning and DNA Amplification of Cloned DNA
Amplified DNA was ligated into the pGEM-T vector (Promega, Madison, Wis.), transformed into Epicurian Coli competent cells (Stratagene, La Jolla, Calif.), and spread according to the manufacturers' directions. Individual bacterial clones were picked and taken directly to PCR amplification. Plasmid-specific primers RIT27 (5'-GCT TCC GGC TCG TAT GTT GTG TG-3') and RIT28 (5'-AAA GGG GGA TGT GCT GCA AGG CG-3') were used. The PCR mixture and amplification program were identical to those used for direct PCR, except that 35 cycles were used and primer annealing was performed at 69°C.

DNA Fragment Analysis and Sequence Analysis
For fragment analysis, 0.2 µl of amplification product was mixed with deionized formamide and fluorescently labeled size standard (Tamra 500, PE Applied Biosystems). For sequence analysis, 1 µl of amplification product was mixed with 3.2 pmol sequencing primer USP (5'-CGT TGT AAA ACG ACG GCC AG-3'), and BigDye cycle sequencing was performed according to the manufacturer's directions (PE Applied Biosystems), except that primer annealing was performed at 55°C. The samples were analyzed on an ABI PRISM 377XL (PE Applied Biosystems) using 4% denaturing polyacrylamide gel according to the manufacturer's directions. Genescan software (PE Applied Biosystems) was used for size calling and quantification of the DNA fragments, and DNA sequences were studied for the presence of polymorphic positions using the SeqEd software (PE Applied Biosystems).

DNA Sequence Alignment and Pairwise Comparison
Sequence comparisons of array types were made by translating the DNA sequence to trinary code and aligning the codes using in-house–developed software. For pairwise sequence comparisons in the mutation analysis, computer programs tailored for comparing two array types were used. The program assumes that the first array type is ancestral to the second and that the evolutionary events are point mutations, deletions, or insertions where the insert is adjacent to its template. The program gives an upper bound for the number of evolutionary events needed to explain the evolution of the first sequence into the second. The software tools for sequence comparisons on a Unix system are available on request.
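The pairwise comparison described above can be illustrated for the simplest case, a single evolutionary event. The following is a minimal sketch and not the authors' program (which also handles multi-event explanations and reports an upper bound). It assumes array types are given as strings over the repeat-type codes and that an insertion counts as slippage-compatible only when it is an exact copy of the repeats immediately flanking it. The function name `one_step_explanations` is hypothetical.

```python
def one_step_explanations(ancestor: str, derived: str):
    """List single events that turn `ancestor` into `derived`.

    Both arguments are strings of repeat-type codes (e.g. "1110101").
    Each event is (kind, position, length_in_repeats), with positions
    counted from 0 at the start of the string.
    """
    events = []
    la, ld = len(ancestor), len(derived)
    if la == ld:
        # Point mutation: same length, exactly one differing repeat.
        diffs = [i for i in range(la) if ancestor[i] != derived[i]]
        if len(diffs) == 1:
            events.append(("point", diffs[0], 1))
    elif ld < la:
        # Deletion: removing k contiguous repeats yields the derived array.
        k = la - ld
        for i in range(ld + 1):
            if ancestor[:i] + ancestor[i + k:] == derived:
                events.append(("deletion", i, k))
    else:
        # Insertion: the k inserted repeats must copy the adjacent template
        # (assumption used here to model replication slippage).
        k = ld - la
        for i in range(la + 1):
            if derived[:i] + derived[i + k:] != ancestor:
                continue
            insert = derived[i:i + k]
            copies_left = i >= k and ancestor[i - k:i] == insert
            copies_right = ancestor[i:i + k] == insert
            if copies_left or copies_right:
                events.append(("insertion", i, k))
    return events
```

Several positions are often returned for one indel, because an event inside a run of identical repeats cannot be placed exactly; this is the positional ambiguity that the pairwise analysis has to resolve by convention.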

PCR and Cloning Artifacts
In order to ascertain that the heteroplasmic variation found in the analyzed individuals was not caused by PCR or cloning, several experiments were carried out. First, from one individual, multiple blood samples were taken, DNA was extracted, and PCR amplifications and fragment analysis were performed (fig. 2A). The different blood samples gave identical fragment analysis profiles (data not shown). Second, PCRs performed with alternative primers and with nested primers yielded the same profiles (data not shown). Third, when the PCR products were cloned and the clones were PCR amplified, only one peak per clone was revealed by fragment analysis (fig. 2B). Fourth, when the cloned fragments of different lengths were counted, the distribution of the number of fragments of different lengths agreed with the fragment analysis profiles (fig. 2A). These experiments show that the majority of the different length types found were actually present in the original DNA extract and that even though the existence of length artifacts created by PCR errors is not completely ruled out, they are rare. Furthermore, these experiments show that the frequency distribution of the length types was faithfully transferred through the cloning step.



 
Fig. 2.—A, Fragment analysis of the tandem repeat region; PCR product from total DNA from the individual PD3. Lengths of fragments in base pairs and relative intensity of fluorescence are shown on the X and Y axes, respectively. Inset picture shows the distribution of the number of cloned PCR fragments of different lengths from PD3. B, Fragment analysis of the PCR product from one clone. Fragment length in base pairs and relative intensity of fluorescence are shown on the X and Y axes, respectively

 
In order to estimate the maximum frequency of any possible PCR length artifacts not detected in the previous experiments, dilutions down to a few molecules were made on the DNA extracts and PCR amplifications were performed. Thereby, only a few different array types were start molecules in the PCR. From an extract from PD3 containing approximately 5–10 mtDNA molecules, 46 clones were sequenced and 5 different array types were found. When compared with the array types found in the dog pedigree, three of the five array types were identical to 1 of the 21 inherited array types, and two were unique array types. Only the two unique array types are potential artifacts, as an artifact mutation resulting in an array type previously found in more than one individual is unlikely. Similar dilutions were performed on extracts from PD1 and PD2. From the three samples, a total of 129 clones were analyzed, and only four unique array types were found. This indicates that if length artifacts do occur, the maximum error rate is approximately 3.1 artifact array types per 100 analyzed clones.

Frequency of Point Mutation Artifacts
To investigate the frequency of point mutation artifacts, the PCR product from one clone from PD3 was cloned once more in a fashion identical to that used before. Thirty-three clones were analyzed by PCR and sequence analysis. Six point mutations, two of which were in the informative position, were found in the 9,570 bp analyzed, yielding a point mutation artifact frequency of 6.3 × 10⁻⁴. Assuming that the artifact frequencies are equal among sites, a frequency of 1.7 point mutation artifacts in the informative position per 100 clones is obtained.
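The artifact frequencies quoted in the two preceding sections follow from simple ratios, which the short check below reproduces using only figures stated in the text. The last line applies the stated rate of 1.7 informative-position artifacts per 100 clones to the 436 clones sequenced from PD2 and PD3 (218 each); the variable names are illustrative.

```python
# Length artifacts: 4 unique array types among 129 clones from diluted extracts.
length_artifacts_per_100 = 100 * 4 / 129

# Point mutation artifacts: 6 substitutions in 9,570 bp of recloned product.
point_rate_per_bp = 6 / 9570

# Expected informative-position artifacts among the 436 PD2 + PD3 clones,
# using the stated rate of 1.7 per 100 clones.
expected_point_artifacts = 436 * 1.7 / 100

print(round(length_artifacts_per_100, 1))   # 3.1
print(f"{point_rate_per_bp:.1e}")           # 6.3e-04
print(round(expected_point_artifacts, 1))   # 7.4
```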

Results

Intra- and Interindividual Fragment Length and DNA Sequence Variation
The RS3 tandem-repeat region was studied by fragment length analysis and DNA sequence analysis for 14 dogs and 5 wolves. All individuals showed length heteroplasmy caused by different numbers of 10-bp repeats in the tandem-repeat arrays (fig. 2A). All individuals analyzed, including maternally related ones, had different fragment analysis profiles (data not shown). A second level of variation was caused by the fact that the repeated 10-bp sequences were not perfect repeats. The dogs had two different 10-bp repeat types differing in one position (position 7, G versus A): ACACGTGCGT and ACACGTACGT. In the wolf pedigree, there was a second variable position (position 5), and a third repeat type, ACACATACGT, was present in these individuals. The three repeat types were designated 0, 1, and 2, respectively. The repeat types were distributed in different patterns, resulting in a secondary DNA sequence (fig. 3). Only the informative positions, i.e., those defining the two (for dogs) or three (for wolves) repeat types, were included in the analyses. Point mutations found in other positions were not taken into account. Within the individuals, several different arrays having different repeat type sequences (array types) were found, differing from each other by potential insertions or deletions (indels) of one or several repeats or point mutations in the informative position. The smallest array type found had 8 repeats, and the largest had 46 repeats. However, a very narrow distribution of the number of repeats was found in all individuals; 92% of the molecules had 25–35 repeats, and 97% had 20–40 repeats. Of 869 molecules studied, only 3 showed insertions or deletions of truncated repeats. These molecules were omitted from the analyses. It can be noted that in the wolves, there were two informative positions but only three repeat types instead of the possible four. This shows that positions 5 and 7 in the repeats are kept together as one unit in the length mutations.
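The translation of an array into its repeat-type sequence can be sketched as follows. This is an illustration and not the authors' in-house software. It assumes the input is an array of whole 10-bp repeats read in the orientation given above, and it classifies each repeat only by positions 5 and 7, so that substitutions elsewhere are ignored as in the analysis. The names `INFORMATIVE` and `encode_array` are hypothetical.

```python
# Bases at positions 5 and 7 (1-based) of each 10-bp repeat define its type:
# type 0 = ACACGTGCGT, type 1 = ACACGTACGT, type 2 = ACACATACGT.
INFORMATIVE = {("G", "G"): "0", ("G", "A"): "1", ("A", "A"): "2"}

def encode_array(dna: str, unit: int = 10) -> str:
    """Translate a tandem-repeat array into a string of repeat-type codes."""
    dna = dna.upper()
    if len(dna) % unit != 0:
        # Arrays with truncated repeats were omitted from the analyses.
        raise ValueError("array is not a whole number of repeats")
    codes = []
    for start in range(0, len(dna), unit):
        repeat = dna[start:start + unit]
        key = (repeat[4], repeat[6])
        if key not in INFORMATIVE:
            raise ValueError(f"unexpected repeat at offset {start}: {repeat}")
        codes.append(INFORMATIVE[key])
    return "".join(codes)

print(encode_array("ACACGTGCGT" "ACACGTACGT" "ACACATACGT"))  # 012
```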



 
Fig. 3.—Array types found in the different individuals. Individual (sequence variant), array type name, number of clones, and repeat type sequence are given. Array types that are found in more than one individual in a pedigree have the prefix "i" in the array type name

 



 
All individuals from which several clones were analyzed had large numbers of different array types. Two individuals from the maternal pedigree of dogs were more thoroughly studied. Two hundred eighteen clones were analyzed for each of PD2 and PD3, and 85 and 61 array types were found in the two individuals, respectively (fig. 3). A sample coverage analysis (Chao and Lee 1992) indicated total numbers of 384 and 334 array types in the two individuals, respectively.
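A sample coverage estimate of this kind can be sketched as below. The text does not state which of the estimators in Chao and Lee (1992) was applied, so the code shows one coverage-based form as an assumption: coverage is estimated as C = 1 − f1/n (f1 being the number of array types seen exactly once among n clones), and the richness estimate D/C is inflated by a term based on the estimated squared coefficient of variation of the type frequencies. The function name and the clone counts in the example are hypothetical; the per-type counts behind the reported 384 and 334 are in fig. 3 and are not reproduced here.

```python
from collections import Counter

def coverage_richness(clone_counts):
    """Coverage-based estimate of the total number of array types.

    `clone_counts` holds, for each observed array type, the number of
    clones in which it was found.
    """
    n = sum(clone_counts)                 # clones analyzed
    d = len(clone_counts)                 # distinct array types observed
    freq = Counter(clone_counts)          # freq[i] = types seen exactly i times
    coverage = 1 - freq.get(1, 0) / n     # estimated sample coverage
    if coverage == 0:
        raise ValueError("every type is a singleton; coverage estimate is zero")
    base = d / coverage
    # Estimated squared coefficient of variation of type frequencies.
    pair_sum = sum(i * (i - 1) * f for i, f in freq.items())
    gamma_sq = max(base * pair_sum / (n * (n - 1)) - 1, 0.0)
    return base + n * (1 - coverage) / coverage * gamma_sq

# Hypothetical counts: 5 types observed among 11 clones.
print(coverage_richness([5, 3, 1, 1, 1]))
```

With many singletons, as in PD2 and PD3, the estimated coverage is low and the estimated total is several times the observed number of types, which is the pattern reported above.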

The distribution of the frequencies of different array types within the individuals varied considerably. Some individuals had one array type at a frequency of over 50% and all other array types at much lower frequencies, while other individuals had several array types at intermediate and more equal frequencies (fig. 3). For example, in three of the members of the wolf pedigree and in H9, a single array type was predominant and constituted 50%–60% of the molecules, while the other array types were present in frequencies under 10%. In PD2, PD3, and Ny71, on the other hand, there were three array types present in frequencies between 10% and 20%, and in H54, all 18 molecules analyzed were unique, indicating that all array types had frequencies under 15%.

A pairwise repeat type sequence comparison performed on all possible pairs of array types among the unrelated individuals showed no array type matches, not even between individuals sharing the same nonrepetitive CR sequence variant. Three German shepherds sharing the same sequence variant (D5) were also divergent. It can therefore be concluded that the interindividual variation of the sequence of repeat types is larger than that of the nonrepetitive CR sequence. Within individuals, a common pattern in the repeat type sequence was found in the different array types, and usually each array type differed by a single indel or point mutation from one or several other array types. In most cases, more similarity between the array types was found in the 3' end than in the 5' end.

Very different structures in the patterns of the repeat type sequences were found when some individuals were compared. For example, in molecules found in PD1, PD2, and PD3, the 5' end consisted mostly of a consecutive row of repeats of type 1, and the 3' end had a pattern of alternating type 0 and type 1 repeats, while among the wolves, the 5' end consisted mostly of alternating repeats of types 1 and 2 (GenBank accession number AF202894), and the 3' end consisted of a consecutive row of repeats of type 0. In Sch3, an alternating pattern of repeat types 0 and 1 was found all through the arrays.

Inheritance of Array Types in the Maternal Pedigrees and Identification of Mutated Array Types
For the dog pedigree, 43, 218, and 218 clones, respectively, were sequenced from the three animals PD1, PD2, and PD3 (fig. 3 and table 1). In all individuals, a few array types were more common, with frequencies of up to 18%, while the majority of the array types were found only once, indicating that the frequencies of most array types were lower than 1%. The most frequent array types were inherited from mother to offspring, but the frequencies varied from generation to generation. The 11 array types that were common to PD1, PD2, and PD3 constituted 65%, 56%, and 73% of the molecules found in these individuals, respectively. The majority of the array types were found in only one individual. For example, 43 of the 61 array types found in PD3 were unique. The array types that were unique to one individual constituted 19%–33% of the molecules in the three individuals. In the wolf pedigree, only one array type was common to all individuals (table 2). This type constituted 26%–59% of the molecules found in the individuals. All other array types were found in only one to three individuals, and at low frequencies. Also in this pedigree, the proportion of the inherited array type fluctuated considerably between generations. This fluctuation indicates a narrow genetic bottleneck in the germ line. The number of segregating units in the germ line in mice has been calculated to be about 200 molecules using the variance in the frequency of mtDNA types in the offspring of female mice, and similar values were found for human pedigrees (Jenuth et al. 1996). When the same analysis was performed on Wolf1 and its offspring Wolf2a, Wolf2b, and Wolf2c (the dog pedigree could not be analyzed, as there was only one offspring per generation), the size of the germ line bottleneck was calculated to be 59.6 molecules, indicating that the number of segregating units in dogs and wolves is similar to those found in mice and humans.


 
Table 1 A Three-Generation Maternal Pedigree of Dogs Showing Array Types Found in More than One Individual

 

 
Table 2 A Three-Generation Maternal Pedigree of Wolves Showing Array Types Found in More than One Individual

 
The large number of array types in PD2 and PD3 and the presence of a germ line genetic bottleneck indicate that most of the unique array types were not inherited from PD2 to PD3 but were created by mutations. Furthermore, the fact that the frequencies of the most common array types, though fluctuating, remain at the same level over three generations in the pedigrees shows that the mutations are somatic, as a high frequency of germ line mutations would result in a decrease in the frequency of the common array types. In order to establish that practically all of the analyzed molecules which had been inherited from PD2 to PD3 had actually been identified and that the other molecules could be assumed to be somatically mutated, two analyses were performed. First, the number of array types shared between mother and offspring was plotted against the increasing number of analyzed clones (fig. 4). The experimental curve shows that a large proportion of the inherited array types had been found by the analysis of 218 clones in PD2 and PD3 and, importantly, that all of the most frequent inherited array types had been found. A very conservative extrapolation of the curve, under the assumption of a linear increase using the tangent of the last few clone pairs, shows that to find five more inherited array types, a total of 500 clones must be analyzed. A number of inherited array types considerably larger than the 17 found to be common for PD2 and PD3 is therefore not consistent with a germ line bottleneck size of about 200 molecules. Second, a sample coverage analysis (Chao et al. 2000) indicated that 35 array types were shared between PD2 and PD3, to be compared with the total number of 334 array types estimated to be present in PD3. In both analyses, an overestimate of the number of shared array types is probable, as array types identified as inherited but found only once in PD3 may have arisen independently by somatic mutations.
The analyses show that only a minority of the array types in PD3 had been inherited from PD2 and that, among the array types in PD3 found in this study, practically all of those inherited from PD2 to PD3 had been identified. This implies that the other array types, which were found in PD3 but not in PD2, were mutated forms of the inherited array types. Therefore, it is indicated that as much as 20%–30% of the molecules in the individuals are somatically mutated.



 
Fig. 4.—The number of array types found in both PD2 and PD3, plotted against the number of analyzed clones. The increase in the number of shared variants is shown for every 10 clones analyzed in PD2 and PD3

 
Somatic Mutations: Test of Replication Slippage Mutation Model
In order to examine the mutation mechanism responsible for the length mutations, and to study the nature of the mutations, a comprehensive study was performed on the mutated array types found in PD3. To be able to study a mutation, it is necessary to identify the original and the mutated molecule types. Here, this was possible, as it could be presumed that all the original array types, transferred through the germ line, had been found and that all other array types were the results of somatic mutations in the original ones. As it could not be known beforehand which of the inherited array types was the origin of which mutated array types, all inherited array types had to be compared with all mutated array types.

Two different types of mechanisms can be expected to be involved in the length mutations in the repeat arrays: replication slippage and recombination. In the case of recombination, the insertions and deletions can occur anywhere in the arrays, and the inserted elements can be expected to have the sequence of any part of the molecules found in the population of inherited molecules. In the case of replication slippage, the outcome of the mutation is more restricted, as the insertions are always situated directly adjacent to (in tandem with) their template sequences. As the replication slippage model is the most conservative of the models, and any result consistent with replication slippage can also be explained by recombination, only the replication slippage model was tested.

The test of the replication slippage model was performed by aligning the inherited and the mutated array types in pairs and for each pair investigating whether the existence of the mutated array type could be explained by one or more mutations (point mutations, deletions or insertions caused by replication slippage) in the inherited array type. The explanation requiring the smallest number of mutations was chosen for each mutated array type. It was presumed that all array types in PD3 that were also found in PD2 were inherited, except those that were found only once in PD3, as these could have been created by independent somatic mutations (table 1). All molecules having a unique array type and the inherited array types i7, i10, i13, i14, i17, i18, and i21 were thus regarded as mutated molecules, giving 50 array types regarded as mutated. As these array types presumably are somatic-mutation variants of the inherited array types, the frequency of somatic mutation was set at 50/218 = 0.229 (the number of mutated array types per 218 analyzed clones). At this frequency of mutation, there is a high probability that some molecules are mutated more than once. For example, under the assumption of a population of constant size where all molecules are equally likely to become mutated, the number of mutations in the different molecules would be Poisson distributed. However, in this case, the population has experienced an expansion from a few copies in the single germ cell, and a distribution skewed toward more molecules having several mutations may be expected.

When the mutated array types in PD3 were compared with the inherited array types, the existence of all 50 mutated array types could be explained by one to four slippage-induced indels or point mutations (fig. 5). Thirty-seven of the mutated array types (43.7 expected according to the Poisson distribution model) could be explained by a single mutation, nine could be explained by two mutations (5.7 expected), three could be explained by three mutations (0.49 expected), and one could be explained by four mutations (0.03 expected). Thus, the results are in accordance with a model in which mutations are caused exclusively by replication slippage or point mutations.
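The expected values quoted for the Poisson model are not derived in the text. One reading that reproduces them to within rounding is sketched below, as an assumption: the observed mutated fraction (50/218) is taken as the probability of at least one mutation, so the Poisson mean is λ = −ln(1 − 50/218), and the expected number of molecules with exactly k mutations is 218 times the Poisson probability of k. The function name `expected_hit_counts` is hypothetical.

```python
import math

def expected_hit_counts(n_clones: int, n_mutated: int, max_hits: int = 4):
    """Expected numbers of molecules carrying exactly k = 1..max_hits mutations."""
    p_mutated = n_mutated / n_clones
    # Poisson mean chosen so that P(at least one mutation) equals p_mutated.
    lam = -math.log(1 - p_mutated)
    return {
        k: n_clones * math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(1, max_hits + 1)
    }

for k, value in expected_hit_counts(218, 50).items():
    print(k, round(value, 2))
```

For 50 mutated array types among 218 clones this gives approximately 43.8, 5.7, 0.49, and 0.03 for one to four mutations, against the 43.7, 5.7, 0.49, and 0.03 stated above. The observed excess of multiply mutated molecules over these values is in the direction expected for a population expanding from a few germ line copies.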



 
Fig. 5.—A, The possible explanations for the 37 one-step mutations found in PD3. The unique array type (Uni.), the inherited array type (Inh.) which is the possible origin of the unique array type, the frequency of the inherited array type among 218 clones, the alignment showing the mutation, the type of mutation event (insertion, deletion, or point mutation), and the position for the mutation are given. The 5' end, the 3' end, and the middle of an array are defined by dividing the molecule into three equally long parts. When several positions are possible for the indel, the most central one is chosen. When several inherited array types can be the origin of a unique array type, the type and position of the mutation are given for the most common inherited array type only. Note that the inherited molecule is on the top and the mutated molecule is on the bottom in the pairwise alignments. B, The possible explanations for the nine two-step mutations, the three three-step mutations, and the single four-step mutation found in PD3. For the three- and four-step mutations, only one of several possible explanations is shown. The unique array type (Uni.), the inherited array type (Inh.) which is the possible origin of the unique array type, the frequency of the inherited array type among 218 clones, the alignment showing the mutations, the type of mutation event (insertion, deletion, or point mutation), and the number of mutation events (e) are shown. Note that the inherited molecule is on the top and the mutated molecule is on the bottom in the pairwise alignments

 



 
The analysis was also performed on the array types in PD2. In this case, only 43 clones had been analyzed in the mother, PD1, so it is unlikely that all array types inherited from PD1 to PD2 had been identified. Instead, it was presumed that all array types that either were unique to PD2 or were found only once were mutated array types and that the remaining array types were inherited from PD1. This gave 16 inherited array types and 69 mutated types and a frequency of somatic mutation of 69/218 = 31.7%. When the presumably mutated array types were aligned and compared with the inherited ones, 47 array types could be explained by one mutation (56.8 expected according to the Poisson distribution model), 18 could be explained by two mutations (10.8 expected), 4 could be explained by three mutations (1.4 expected), and 0 had to be explained by four or more mutations (0.13 expected) (data not shown).

Mutation Characteristics
The possible explanations for the mutations in all unique array types in PD3 are shown in figure 5. In some cases, it is impossible to determine from which of two or more inherited array types a unique array type stems, as mutations in two different inherited array types may result in the same new type. This is especially the case for deletions and for array types that have been altered in the 5' end, with its row of up to 14 consecutive type 1 repeats. However, for 19 of the 37 array types that had been mutated once, there was only one most-parsimonious explanation. The individual pairwise comparisons between the inherited and the mutated array types in PD3 allow the insertion and deletion events to be studied in detail: the length and the repeat type sequence of the indel elements and the approximate position of the mutation event can be determined. The characteristics of the mutations found in the molecules mutated once in PD2 and PD3 are given in tables 3 and 4. Mutations were found in all parts of the arrays, but at a higher frequency in the 5' end. Almost half of the insertions and deletions were one or two repeats long. There was a small predominance of deletions over insertions and a 2.5-fold excess in the total number of deleted repeats compared with inserted repeats. The excess of deleted repeats could be attributed to 10 array types created by large deletions 9–17 repeats long. Fifteen point mutations were found in the informative position in PD2 and PD3. Given the expected PCR mutation frequency (7.4 point mutations per 436 clones, see Materials and Methods) and the fact that most of the point mutations were found in the 5' end, this suggests that some of the point mutations are authentic and that point mutations, as well as length mutations, are more frequent in the 5' end.


 
Table 3 Summary of Mutation Characteristics

 
To determine whether the pattern of mutation found in PD2 and PD3 is general for the region or is influenced by the repeat type sequence of the molecules, e.g., by long stretches of identical repeats, the mutation analysis was performed on individuals having other array types: Wolf1, Wolf2a, and Wolf3 in the wolf pedigree, and H9 and H83, two unrelated German shepherds of sequence variant D5. These individuals had large proportions (44–59%) of one array type. It was presumed that this array type was inherited and that the array types that were unique to the individuals and were found only once were mutated molecules. The unique array types were aligned and compared with the majority type, and the unique types that could be explained by one mutation in the majority type were identified (data not shown). The pattern of mutations in these individuals was similar to that found in PD2 and PD3 (tables 3 and 4).
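The comparison of a unique array type with a presumed inherited one can be sketched in code. The following is a minimal illustration, not the procedure actually used in the study. It assumes that an array is written 5' to 3' as a string of repeat-type digits (a hypothetical encoding), that a one-step mutation is either a single repeat-type substitution or one contiguous indel, and it does not check that an inserted element duplicates neighboring repeats as replication slippage would require. The rule of choosing the most central of several possible indel positions and the division of the array into three equal parts follow the legend of figure 5; all function names are hypothetical.

```python
def one_step_explanations(inherited, mutated):
    """List every single event that turns `inherited` into `mutated`.

    Each event is (kind, position, element), with a 0-based position
    counted in repeats from the 5' end of the array.
    """
    n, m = len(inherited), len(mutated)
    events = []
    if n == m:
        # Same length: only a single substitution counts as one step
        diffs = [i for i in range(n) if inherited[i] != mutated[i]]
        if len(diffs) == 1:
            events.append(("point", diffs[0], mutated[diffs[0]]))
    elif m > n:
        k = m - n  # number of inserted repeats
        for i in range(n + 1):
            if mutated[:i] == inherited[:i] and mutated[i + k:] == inherited[i:]:
                events.append(("insertion", i, mutated[i:i + k]))
    else:
        k = n - m  # number of deleted repeats
        for i in range(m + 1):
            if inherited[:i] == mutated[:i] and inherited[i + k:] == mutated[i:]:
                events.append(("deletion", i, inherited[i:i + k]))
    return events

def most_central(events, array_length):
    """Pick the event whose midpoint lies closest to the array centre."""
    centre = array_length / 2
    return min(events, key=lambda e: abs(e[1] + len(e[2]) / 2 - centre))

def region(position, array_length):
    """Assign a position to the 5' end, middle, or 3' end (equal thirds)."""
    third = array_length / 3
    if position < third:
        return "5' end"
    if position < 2 * third:
        return "middle"
    return "3' end"

if __name__ == "__main__":
    inherited, mutated = "1111123", "111123"  # made-up arrays
    candidates = one_step_explanations(inherited, mutated)
    chosen = most_central(candidates, len(inherited))
    print(candidates)
    print(chosen, region(chosen[1], len(inherited)))
```

A run of identical repeats yields several equivalent candidate positions for the same indel, which illustrates why positions within the long stretch of type 1 repeats in the 5' end can only be given approximately.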

Germ Line Mutations
The array types in PD3 that have been inherited from PD2 have been transferred in the germ line. The differences found between these array types are therefore the result of previous germ line mutations. An alignment of the 11 array types that were presumed to be inherited is shown in figure 6. The array types fall into three groups, each composed of array types that differ only in the 5' end. The differences between the array types can be explained exclusively by indels. Approximately the same pattern of mutations as in the study of the somatic mutations is found: most of the mutations take place in the 5' end, and the mutations that can be easily identified, i.e., those that are not in the 5' end, are one or two repeats long.



 
Fig. 6.—An alignment of the 11 array types presumed to be inherited from PD2 to PD3. Group according to alignment, repeat type, frequency of array type among PD3 clones, and repeat type sequence are given

 
Discussion

A large amount of genetic variation was found in the region. When the array types were compared among the unrelated individuals, no array type matches were found. Furthermore, the frequency of the majority array type varied considerably among the individuals, and the array types had very different repeat type sequences in different individuals. Taken together, these observations indicate a high rate of germ line genetic evolution.

The proportion of molecules in PD3 with somatic mutations was calculated to be 21.2% (compensated for PCR point mutation artifacts), which is very high compared with other estimates of the level of somatic mutation in mitotic tissue. In tandem repeat arrays in position RS5 in lagomorphs, the frequency of somatically mutated molecules was estimated to be between 1% and 5% (Casane et al. 1994). In the human brain, a nonmitotic tissue, 35%–45% of the mtDNA molecules had point mutations in the CR (Jazin et al. 1996). However, in blood, the frequency of point mutations could not be distinguished from the PCR artifact background.

The high proportion of somatically mutated molecules offered an opportunity to study the mechanisms causing the somatic mutations and their nature and position. The mutation analysis performed showed that all length mutations could be explained by replication slippage. Although recombination cannot be excluded by this study, direct evidence for recombination in vertebrate mtDNA has not been found. We therefore believe that, in light of the data presented here, replication slippage must be considered the most plausible explanation for the length polymorphism observed in these tandem arrays.

The study of the characteristics of the mutations showed a higher frequency of both length and point mutations in the 5' ends of the arrays than in the 3' ends. This observation agrees with the fact that, when compared within individuals, the array types were generally more easily aligned in the 3' ends than in the 5' ends. The numbers of insertions and deletions were approximately the same, but the total number of deleted repeats was more than twice as large as the number of inserted repeats. This difference could be attributed to 11 large deletions between 9 and 17 repeats long. The cause of this is unclear. If the mechanisms for mutations are identical in somatic cells and in germ line cells, this would result in a gradual loss of repeats in the arrays over generations. However, in a study of tandem repeats in the RS5 position in lagomorphs, it was shown that the arrays tend to lose repeats in somatic cells, but not in the gonads (Casane et al. 1997), and this could also be the case in dogs. The fact that the number of repeats in the arrays was narrowly distributed around 30 may be explained by an equilibrium between the rates of insertions and deletions in the germ line (Kruglyak et al. 1998). In this study, the mutations were studied for three groups of dogs or wolves having different repeat type sequences in their array types. The same general pattern of mutation was found in all three groups, which indicates that the mutation characteristics found are general for this region and not strongly dependent on the repeat type sequence of the arrays.

The position of the repeat locus, just at the end of the H-strand replication, could be one explanation for the high mutation rate. It is plausible that the opportunity for slippage is greater as the replication nears completion. Furthermore, the RS3 array in pigs has been shown to be transcribed at the initiation of H-strand replication (Ghivizzani et al. 1993). DNA polymerase errors near the RNA-DNA transition may therefore be a cause of mutations. In addition, the arrays have self-complementary sequences capable of forming secondary structures, which could favor slippage by increasing the stability of the slipped strand or by blocking the polymerase. The fact that most of the mutations were found at one end of the array could be explained by the asymmetry in the replication, as only the H-strand replication originates in this region.

The differences among the array types identified as inherited from PD2 to PD3, which are caused by germ line mutations, could all be explained by insertions or deletions, and approximately the same types of length mutations as in the somatically mutated molecules were found. The higher frequency of germ line mutations in the 5' end is mirrored by the fact that when array types are compared among individuals (when alignments are possible), the array types are more similar in the 3' end than in the 5' end. The mitochondrial replication mechanisms are believed to be the same in somatic cells and in germ cells, and it is probable that the findings obtained in the studies of the somatic mutations can be applied to the germ line mutations to a great extent. Therefore, replication slippage seems probable as the main source of length mutations in the germ line cells.

In the pedigrees, a considerable amount of fluctuation in the frequencies of the inherited array types was found between the generations. Furthermore, very different proportions of the majority array type were found among unrelated individuals. These facts indicate that in dogs, as in other mammals, there is a relatively narrow bottleneck in the transfer of mtDNA molecules from mother to offspring in the female germ line, which causes genetic drift in the mtDNA population. This could explain the fact that the array types have very similar repeat type sequences within individuals, while the difference between individuals is sometimes large. If a single array type becomes predominant within an individual because of random drift, the majority of new array types will be mutated forms of this array type. Array types which do not originate from the predominant one will gradually be lost, and a relatively homogeneous population of molecules is thus formed.
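The effect of such a bottleneck on array type frequencies can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model fitted to the data: each generation is founded by a fixed number of molecules sampled at random, with replacement, from the mother's array type frequencies, no new mutations arise, and the bottleneck size and starting frequencies are arbitrary hypothetical values.

```python
import random
from collections import Counter

def transmit(freqs, bottleneck, rng):
    """Sample `bottleneck` molecules from the mother and return the
    array type frequencies among them (types not sampled are lost)."""
    types = list(freqs)
    weights = [freqs[t] for t in types]
    sample = rng.choices(types, weights=weights, k=bottleneck)
    counts = Counter(sample)
    return {t: c / bottleneck for t, c in counts.items()}

def simulate(freqs, bottleneck, generations, seed=0):
    """Return the array type frequencies for each generation in turn."""
    rng = random.Random(seed)  # seeded so that a run is repeatable
    history = [dict(freqs)]
    for _ in range(generations):
        history.append(transmit(history[-1], bottleneck, rng))
    return history

if __name__ == "__main__":
    # Hypothetical starting frequencies of three inherited array types
    start = {"type_a": 0.5, "type_b": 0.3, "type_c": 0.2}
    for gen, freqs in enumerate(simulate(start, bottleneck=10, generations=8)):
        print(gen, freqs)
```

Because lost types cannot reappear without mutation, the number of array types in this sketch can only stay the same or fall from one generation to the next, which mirrors the homogenization described above.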

In most cases, it was not possible to align the array types from unrelated individuals without introducing a large number of indels (data not shown). Evidently, the evolution of the arrays is so rapid that comparison of the array types is difficult even for individuals having the same nonrepetitive CR sequence variant. The individuals showing the greatest similarity were H9, H83, and Ny71, which are all German shepherds sharing the same sequence variant, D5. This similarity probably reflects a recent last common ancestor for these individuals, possibly in the last few hundred years. Preliminary analyses of genetic distance between individuals based on the mutation patterns found in this study showed that for individuals of different breeds, the array types were too dissimilar to give a useful result, but for the German shepherds, meaningful distance values were obtained (data not shown). This indicates that the analysis of these arrays may be used for phylogenetic studies on a timescale down to a few hundred years or less. However, more must be known about how rapid and how homogeneous among lineages the repeat type sequence evolution is before this possibility can be evaluated.

In conclusion, we propose that the following mechanisms are involved in the evolution of the tandem repeat arrays. Insertions and deletions are generated by replication slippage at a high rate, and point mutations occur at a lower rate, resulting in a heteroplasmic population of molecules. Most indels are one or two repeats long, but longer ones are not rare. The mutations are formed during the initiation or termination of the H-strand replication, and mutations are therefore most frequent in the 5' ends of the arrays. One reason for the high rate of length mutation is the capability of the arrays to form secondary structures, which may stabilize slipped strands or block the polymerase during replication. Because of a germ line bottleneck, the different array types are subject to genetic drift, resulting in fluctuations in the proportions of the array types between generations. This leads to the extinction of some array types, while others become more frequent, resulting in homogenization of the population of molecules and fast evolution of the repeat type sequence.


 
Table 4 The Numbers of Insertions and Deletions of Different Sizes (Numbers of Repeats)

 
Acknowledgements

This work was supported by grants from the National Laboratory of Forensic Science (SKL), the National Board for Technical and Industrial Development (NUTEK), and the Swedish Research Council for Engineering Sciences (TFR). We would also like to thank Frode Lingaas for providing pedigree material.

Footnotes

Ross Crozier, Reviewing Editor

1 Keywords: mitochondrial DNA, tandem repeats, mutation, Canis.

2 Address for correspondence and reprints: Joakim Lundeberg, Department of Biotechnology, Royal Institute of Technology (KTH), S-100 44 Stockholm, Sweden. E-mail: joakim.lundeberg@biochem.kth.se

Literature Cited

    Broughton, R. E., and T. E. Dowling. 1997. Evolutionary dynamics of tandem repeats in the mitochondrial DNA control region of the minnow Cyprinella spiloptera. Mol. Biol. Evol. 14:1187–1196.

    Buroker, N. E., J. R. Brown, T. A. Gilbert, P. J. O'Hara, A. T. Beckenbach, W. K. Thomas, and M. J. Smith. 1990. Length heteroplasmy of sturgeon mitochondrial DNA: an illegitimate elongation model. Genetics 124:157–163.

    Casane, D., N. Dennebouy, H. de Rochambeau, J. C. Mounolou, and M. Monnerot. 1994. Genetic analysis of systematic mitochondrial heteroplasmy in rabbits. Genetics 138:471–480.

    Casane, D., N. Dennebouy, H. de Rochambeau, J. C. Mounolou, and M. Monnerot. 1997. Nonneutral evolution of tandem repeats in the mitochondrial DNA control region of lagomorphs. Mol. Biol. Evol. 14:779–789.

    Chao, A., and S.-M. Lee. 1992. Estimating the number of classes via sample coverage. J. Am. Stat. Assoc. 87:210–217.

    Chao, A., W.-H. Wang, Y.-C. Chen, and C.-Y. Kuo. 2000. Estimating the number of shared species in two communities. Statistica Sinica (in press).

    Clayton, D. A. 1991. Nuclear gadgets in mitochondrial DNA replication and transcription. Trends Biochem. Sci. 16:107–111.

    Fumagalli, L., P. Taberlet, L. Favre, and J. Hausser. 1996. Origin and evolution of homologous repeated sequences in the mitochondrial DNA control region of shrews. Mol. Biol. Evol. 13:31–46.

    Ghivizzani, S. C., S. L. Mackay, C. S. Madsen, P. J. Laipis, and W. W. Hauswirth. 1993. Transcribed heteroplasmic repeated sequences in the porcine mitochondrial DNA D-loop region. J. Mol. Evol. 37:36–37.

    Hoelzel, A. R., J. M. Hancock, and G. A. Dover. 1993. Generation of VNTRs and heteroplasmy by sequence turnover in the mitochondrial control region of two elephant seal species. J. Mol. Evol. 37:190–197.

    Hoelzel, A. R., J. V. Lopez, G. A. Dover, and S. J. O'Brien. 1994. Rapid evolution of a heteroplasmic repetitive sequence in the mitochondrial DNA control region of carnivores. J. Mol. Evol. 39:191–199.

    Jazin, E. E., L. Cavelier, I. Eriksson, L. Oreland, and U. Gyllensten. 1996. Human brain contains high levels of heteroplasmy in the noncoding regions of mitochondrial DNA. Proc. Natl. Acad. Sci. USA 93:12382–12387.

    Jenuth, J. P., A. C. Peterson, K. Fu, and E. A. Shoubridge. 1996. Random genetic drift in the female germline explains the rapid segregation of mammalian mitochondrial DNA. Nat. Genet. 14:146–151.

    Kruglyak, S., R. T. Durrett, M. D. Schug, and C. F. Aquadro. 1998. Equilibrium distributions of microsatellite repeat length resulting from a balance between slippage events and point mutations. Proc. Natl. Acad. Sci. USA 95:10774–10778.

    Levinson, G., and G. A. Gutman. 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4:203–221.

    Mignotte, F., M. Gueride, A. M. Champagne, and J. C. Mounolou. 1990. Direct repeats in the non-coding region of rabbit mitochondrial DNA. Involvement in the generation of intra- and inter-individual heterogeneity. Eur. J. Biochem. 194:561–571.

    Rand, D. M., and R. G. Harrison. 1989. Molecular population genetics of mtDNA size variation in crickets. Genetics 121:551–569.

    Saccone, C., G. Pesole, and E. Sbisa. 1991. The main regulatory region of mammalian mitochondrial DNA: structure-function model and evolutionary pattern. J. Mol. Evol. 33:83–91.

    Savolainen, P., B. Rosen, A. Holmberg, T. Leitner, M. Uhlen, and J. Lundeberg. 1997. Sequence analysis of domestic dog mitochondrial DNA for forensic use. J. Forensic Sci. 42:593–600.

    Vila, C., P. Savolainen, J. E. Maldonado, I. R. Amorim, J. E. Rice, R. L. Honeycutt, K. A. Crandall, J. Lundeberg, and R. K. Wayne. 1997. Multiple and ancient origins of the domestic dog. Science 276:1687–1689.

    Walberg, M. W., and D. A. Clayton. 1981. Sequence and properties of the human KB cell and mouse L cell D-loop regions of mitochondrial DNA. Nucleic Acids Res. 9:5411–5421.

    Walsh, P. S., D. A. Metzger, and R. Higuchi. 1991. Chelex 100 as a medium for simple extraction of DNA for PCR-based typing from forensic material. Biotechniques 10:506–513.

    Wilkinson, G. S., F. Mayer, G. Kerth, and B. Petri. 1997. Evolution of repeated sequence arrays in the D-loop region of bat mitochondrial DNA. Genetics 146:1035–1048.

Accepted for publication November 11, 1999.