1 Department of Zoology, University of Guelph, Guelph, Ontario, Canada N1G 2W1
2 Mammal Research Institute, Department of Zoology and Entomology, University of Pretoria, Pretoria 0002, South Africa
3 Department of Genetics, North Carolina State University, Raleigh, NC 27695-7614, USA
Correspondence
Daniel T. Haydon
D.Haydon{at}bio.gla.ac.uk
ABSTRACT
Present address: Division of Environmental and Evolutionary Biology, University of Glasgow, Glasgow G12 8QQ, UK.
MAIN TEXT
One frequently applied test for the presence of recombination is to compare the level of linkage disequilibrium (LD) between tightly linked sites with that between sites spaced further apart within the sequence (Schaeffer & Miller, 1993; Conway et al., 1999; Awadalla & Charlesworth, 1999; Hudson, 2001; McVean et al., 2002). LD is a measure of the correlation between the occurrence of genetic markers (e.g. nucleotides, restriction sites or alleles) at different sites in the genome measured across multiple genomes. Recombination occurring between two sites will usually reduce the LD between them. Since the recombination rate is likely to be higher between more physically distant pairs of sites, the result will be a negative relationship between estimated LD associated with pairs of bi-polymorphic sites (those at which just two nucleotides are present) and the number of nucleotide sites separating them (Fig. 1).
Foot-and-mouth disease virus (FMDV), in the family Picornaviridae, causes a widely distributed disease of cloven-hoofed animals and occurs as seven serotypes: A, O, C, Asia 1 and SAT (South African Territories) types 1, 2 and 3. Acute FMDV infections of domestic animals are usually of only 2–3 weeks' duration, which limits the opportunity for both accumulation of de novo genetic variation and multiple infections by different genotypes. However, more persistent subclinical infections may become established in cattle and, particularly for SAT-type viruses, in African buffalo (Syncerus caffer), from which virus may be recovered for 2–5 years after first infection.
Evidence from several genera within the family Picornaviridae suggests that occasional recombination between distantly related genomes has been important in the genetic history of this group (e.g. Brown et al., 2003; Liu et al., 2003; Yang et al., 2003). In FMDV, strong evidence exists for historical between-serotype recombination (Krebs & Marquardt, 1992; van Rensburg et al., 2002) and a within-serotype recombinant has been identified by Tosh et al. (2002). However, it is not clear from these observations whether recombination is persistently high but only occasionally detectable, or whether it is actually a rare event. High rates of intragenic recombination will lower the resolution of phylogenetic inference and serve to generate antigenic novelty in areas where multiple strains co-circulate. Laboratory studies of FMDV and other picornaviruses suggest that recombination could be very common during infection (King et al., 1985; King, 1988). If it is, the epidemiology of FMDV dictates that most recombination is likely to be between virus genes of high sequence identity and hence only detectable at the virus population level through a general lowering of LD.
For each dataset considered, we calculated the correlation between the occurrence of different nucleotides at different sites using the Hill & Robertson (1968) measure $r^2 = D^2/[p_A(1-p_A)\,p_B(1-p_B)]$, calculated for all pairs of sites segregating for two nucleotides (where $D = p_{AB} - p_A p_B$; the notation is conventional: $p_{AB}$ represents the frequency of alleles with nucleotide A present at the first site and B present at the second, $p_A$ represents the frequency of nucleotide A at the first site, etc.). We also calculated $D'$, a measure of degree of association between nucleotide variants of different polymorphic sites, where $D' = D/D_{\max}$ and $D_{\max}$ is the largest possible value of $D$ given the nucleotide frequencies (Brown, 1975; Lewontin, 1988). The correlation between both pairwise measures of LD and distance, $d_{ij}$, for pairs of polymorphic sites was evaluated using the standard Pearson correlation coefficient and significance was determined using a Mantel test (randomizing the position of sites; Manly, 1986). The value from the actual sequence data was compared with the distribution of coefficients from randomized sets of data and was considered significant at a given level if its absolute magnitude exceeded the 95th percentile. Mean values of $r^2$ and $D'$ were computed over all bi-polymorphic pairs of sites within each dataset and denoted $\overline{r^2}$ and $\overline{D'}$, respectively.
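The LD calculation and randomization test described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names (`pair_ld`, `bipolymorphic_sites`, `ld_distance_test`) are hypothetical, the number of permutations is an arbitrary choice, and the sketch reports the absolute value of $D'$ because its sign depends on which nucleotide is arbitrarily labelled A or B.

```python
import itertools
import math
import random


def pair_ld(col_i, col_j):
    """Return (r2, |D'|) for two alignment columns that each carry two nucleotides."""
    n = len(col_i)
    a, b = col_i[0], col_j[0]  # arbitrary reference nucleotides (assumption)
    p_a = sum(x == a for x in col_i) / n
    p_b = sum(y == b for y in col_j) / n
    p_ab = sum(x == a and y == b for x, y in zip(col_i, col_j)) / n
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


def bipolymorphic_sites(seqs):
    """Return [(position, column)] for columns with exactly two nucleotides."""
    return [(pos, col) for pos, col in enumerate(zip(*seqs)) if len(set(col)) == 2]


def pearson(xs, ys):
    """Standard Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ld_distance_test(seqs, n_perm=999, seed=0):
    """Correlate r2 with inter-site distance; test by shuffling site positions."""
    sites = bipolymorphic_sites(seqs)
    pairs = list(itertools.combinations(range(len(sites)), 2))
    r2 = [pair_ld(sites[i][1], sites[j][1])[0] for i, j in pairs]
    positions = [pos for pos, _ in sites]

    def corr(pos):
        return pearson(r2, [abs(pos[i] - pos[j]) for i, j in pairs])

    observed = corr(positions)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = positions[:]
        rng.shuffle(shuffled)  # randomize which position each site occupies
        if abs(corr(shuffled)) >= abs(observed):
            hits += 1
    # Proportion of randomized coefficients at least as extreme as the observed one.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `ld_distance_test(alignment)` on a list of equal-length aligned sequences returns the observed correlation and a permutation p-value; the same loop could be run on $D'$ by taking index 1 of the `pair_ld` result instead of index 0.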
The methodology that follows is conceptually similar to the informative-sites test of Worobey (2001), except that we used LD rather than numbers of informative sites as a test statistic. Phylogenetic tree topology was estimated for each dataset using DNADIST and FITCH in the PHYLIP package (Felsenstein, 1993). This topology was then used to make maximum-likelihood estimates of branch lengths, rate heterogeneity (α) and transition–transversion ratio (κ/2) using the HKY85 model of base substitution (Hasegawa et al., 1985) as implemented in BASEML in the PAML package (Yang, 1997). The analysis was restricted to 3rd codon base positions to remove as far as possible the influence of selection. Having arrived at final estimates of α, κ and branch lengths for each dataset, we used these parameters to simulate 500 equivalent datasets using the EVOLVER program (again using the HKY85 model) from the PAML package. We compared observed values of LD statistics from each real dataset with corresponding distributions of LD statistics obtained from each set of 500 simulated datasets and thereby inferred which observed values of LD differed significantly from expectation under the hypothesis of no recombination.
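The final comparison step amounts to locating the observed statistic within the distribution obtained from the simulated, recombination-free datasets. A minimal sketch follows, assuming the LD statistic has already been computed for the real alignment and for each simulated one; the function name `lower_tail_p` is hypothetical, and the use of a lower-tail test (observed LD lower than expected without recombination) is an assumption made for illustration.

```python
def lower_tail_p(observed, simulated):
    """Proportion of simulated statistics that are <= the observed value.

    `simulated` holds one LD statistic per dataset simulated under the
    no-recombination hypothesis (500 values in the setting described above).
    The +1 terms count the observed value as one member of the null
    distribution, so the returned p-value is never exactly zero.
    """
    n_low = sum(1 for s in simulated if s <= observed)
    return (n_low + 1) / (len(simulated) + 1)
```

A small value indicates that the real data show less LD than almost all of the datasets simulated without recombination.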
We examined six sets of sequences of FMDV VP1 genes from four different serotypes, SAT-1, -2 and -3 (where most of the isolates are from, or closely related to, isolates from African buffalo) and serotype O (all recovered from infections of domestic livestock). The data and their origins are described in Table 1. Prior analysis (using PLATO; Grassly & Holmes, 1997) revealed that there was no large-scale heterogeneity in these alignments and thus no obvious evidence for genetically distinctive recombinants. Because population structure tends to increase LD, the largest dataset was broken down into those arising from smaller geographic regions. As an indication of the effectiveness of these methods for detecting recombination, we subjected four additional datasets to identical analyses (Table 1). We analysed two human immunodeficiency virus (HIV) datasets (dataset G, HIV env gene sequences; Kuiken et al., 2000; and dataset H, HIV nef sequences isolated from a single patient 41 weeks post-infection; Plikat et al., 1997), dataset I, a mitochondrial DNA (mtDNA) dataset for the COII gene from Pan troglodytes verus (Wise et al., 1998) and dataset J, rabies virus N gene sequences isolated from bats in the USA (Smith, 2002). The HIV datasets were purportedly recombining, whereas the mtDNA and the negative-strand rabies virus were considered less likely to be recombining.
Analyses were performed using all bi-polymorphic sites and then with low-frequency variants (singletons, where the polymorphism is maintained in just one sequence; doubletons, maintained in two sequences; and tripletons, maintained in three sequences) progressively removed. Table 2 shows LD statistics for all datasets. Two-thirds (4/6) of the FMDV datasets indicated at least one significantly negative correlation (at the 5 % level) between the Hill & Robertson measure of LD ($r^2$) and inter-site distance, and one half (3/6) of the datasets indicated significantly negative relationships (at the 5 % level) between $D'$ (a differently scaled measure of LD) and inter-site distance, both suggestive of recombination.
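The progressive removal of low-frequency variants can be expressed as a filter on the minor-nucleotide count at each bi-polymorphic site. The sketch below is illustrative only; the function name `filter_sites` and its `min_minor_count` parameter are hypothetical and not taken from the original analysis.

```python
from collections import Counter


def filter_sites(seqs, min_minor_count=1):
    """Return positions of bi-polymorphic sites whose rarer nucleotide
    occurs in at least `min_minor_count` sequences.

    min_minor_count=1 keeps all bi-polymorphic sites,
    2 removes singletons, 3 also removes doubletons,
    4 also removes tripletons.
    """
    kept = []
    for pos, col in enumerate(zip(*seqs)):
        counts = Counter(col)
        if len(counts) == 2 and min(counts.values()) >= min_minor_count:
            kept.append(pos)
    return kept
```

Re-running the LD statistics on the sites returned for each threshold gives one set of results per filtering level.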
There are a number of reasons to suppose this form of analysis may be robust. While it requires parameter estimates of the mutation model, Worobey (2001) concluded that his simulations, conducted in an almost identical way, were robust to probable levels of uncertainty in parameter estimation. Patterns of virus demography may affect tree shape (Schierup & Hein, 2000) but direct use of recovered phylogenies insulates our conclusions from effects of population demographics on the genealogical process. Overestimating rate heterogeneity (because of recombination events unaccounted for in the estimation of phylogeny) will result in less, not more, LD in simulated data, rendering our inference process conservative. Finally, it is not easy to envisage ways in which positive or purifying selection might result in a reduction in LD at 3rd base positions.
Our proposed methodology falls short of quantifying the extent of recombination in FMDV responsible for the identified linkage deficit. However, while quantitative estimates of recombination rates would be extremely valuable, currently the only way to estimate them from nucleotide data requires specifying a coalescent model. For example, the method described by McVean et al. (2002) assumes a Fisher–Wright population genetic model (constant population size, no selection, no migration, non-overlapping generations) and as a result two sources of uncertainty are incurred: (i) a known additional variance in estimates of recombination rate arising from genealogical variability introduced by this model; and (ii) a largely unknown sensitivity to the inevitable violations of the assumptions made by this particular model when applied to FMDV.
Infections with SAT serotypes, particularly those from or closely related to isolates from African buffalo, which may remain infected for years, may present the virus with greater opportunities for observable recombination than infections with other serotypes, which are usually shorter and more acute. Results from these analyses suggest that frequent recombination between genetically closely related genotypes may be a plausible explanation for the low levels of LD characteristic of the FMDV alignments examined here.
ACKNOWLEDGEMENTS
REFERENCES
Awadalla, P. (2003). The evolutionary genomics of pathogen recombination. Nat Rev Genet 4, 50–60.
Awadalla, P. & Charlesworth, D. (1999). Recombination and selection at Brassica self-incompatibility loci. Genetics 152, 413–425.
Bastos, A. D. S. (2001). Molecular epidemiology and diagnosis of SAT-type foot-and-mouth disease in southern Africa. PhD thesis, University of Pretoria.
Bastos, A. D. S., Bertschinger, H. J., Cordel, C., van Vuuren, C. D., Keet, D., Bengis, R. G., Grobler, D. G. & Thomson, G. R. (1999). Possibility of sexual transmission of foot-and-mouth disease from African buffalo to cattle. Vet Rec 145, 77–79.
Bastos, A. D. S., Haydon, D. T., Forsberg, R., Knowles, N. J., Anderson, E. C., Bengis, R. G., Nel, L. H. & Thomson, G. R. (2001). Genetic heterogeneity of SAT-1 type foot-and-mouth disease viruses in southern Africa. Arch Virol 146, 1537–1551.
Brown, A. H. (1975). Sample sizes required to detect linkage disequilibrium between two or three loci. Theor Popul Biol 8, 184–201.
Brown, B., Oberste, M. S., Maher, K. & Pallansch, M. A. (2003). Complete genomic sequencing shows that polioviruses and members of human enterovirus species C are closely related in the noncapsid coding region. J Virol 77, 8973–8984.
Conway, D. J., Roper, C., Oduola, A. M., Arnot, D. E., Kremsner, P. G., Grobusch, M. P., Curtis, C. F. & Greenwood, B. M. (1999). High recombination rate in natural populations of Plasmodium falciparum. Proc Natl Acad Sci U S A 96, 4506–4511.
Felsenstein, J. (1993). PHYLIP: Phylogeny Inference Package, version 3.5c. Department of Genetics, University of Washington, Seattle, WA, USA.
Grassly, N. C. & Holmes, E. C. (1997). A likelihood method for the detection of selection and recombination using sequence data. Mol Biol Evol 14, 239–247.
Hasegawa, M., Kishino, H. & Yano, T. (1985). Dating of the human–ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 21, 160–174.
Hey, J. & Wakeley, J. (1997). A coalescent estimator of the population recombination rate. Genetics 145, 833–846.
Hill, W. G. & Robertson, A. (1968). Linkage disequilibrium in finite populations. Theor Appl Genet 38, 226–231.
Holmes, E. C., Worobey, M. & Rambaut, A. (1999). Phylogenetic evidence for recombination in dengue virus. Mol Biol Evol 16, 405–409.
Hudson, R. R. (1987). Estimating the recombination parameter of a finite population model without selection. Genet Res 50, 245–250.
Hudson, R. R. (2001). Two-locus sampling distributions and their application. Genetics 159, 1805–1817.
King, A. M. Q. (1988). Preferred sites of recombination in poliovirus RNA: analysis of 40 intertypic cross-over sequences. Nucleic Acids Res 6, 11705–11723.
King, A. M. Q., McCahon, D., Saunders, K., Newman, J. W. & Slade, W. R. (1985). Multiple sites of recombination within the RNA genome of foot-and-mouth disease virus. Virus Res 3, 373–384.
Krebs, O. & Marquardt, O. (1992). Identification and characterization of foot-and-mouth disease virus O1 Burgwedel/1987 as an intertypic recombinant. J Gen Virol 73, 613–619.
Kuiken, C., Thakallapalli, R., Eskild, A. & de Ronde, A. (2000). Genetic analysis reveals epidemiologic patterns in the spread of human immunodeficiency virus. Am J Epidemiol 152, 814–822.
Lewontin, R. C. (1988). On measures of gametic disequilibrium. Genetics 120, 841–847.
Liu, H. M., Zheng, D. P., Zhang, L. B., Oberste, M. S., Kew, O. M. & Pallansch, M. A. (2003). Serial recombination during circulation of type 1 wild-vaccine recombinant polioviruses in China. J Virol 77, 10994–11005.
Manly, B. F. J. (1986). Randomization and regression methods for testing for associations with geographical, environmental and biological distances between populations. Res Popul Ecol 28, 201–218.
Maynard Smith, J. (1992). Analysing the mosaic structure of genes. J Mol Evol 34, 126–129.
McVean, G. A. T., Awadalla, P. & Fearnhead, P. (2002). A coalescent based method for detecting and estimating recombination from gene sequences. Genetics 160, 1231–1241.
Plikat, U., Nieselt-Struwe, K. & Meyerhans, A. (1997). Genetic drift can dominate short-term human immunodeficiency virus type 1 nef quasispecies evolution in vivo. J Virol 71, 4233–4240.
Posada, D. & Crandall, K. A. (2001). Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci U S A 98, 13757–13762.
Samuel, A. R. & Knowles, N. J. (2001). Foot-and-mouth disease type O viruses exhibit genetically and geographically distinct lineages (topotypes). J Gen Virol 82, 609–621.
Schaeffer, S. W. & Miller, E. L. (1993). Estimates of linkage disequilibrium and the recombination parameter determined from segregating nucleotide sites in the alcohol dehydrogenase region of Drosophila pseudoobscura. Genetics 135, 541–552.
Schierup, M. H. & Hein, J. (2000). Consequences of recombination on traditional phylogenetic analysis. Genetics 156, 879–891.
Smith, J. S. (2002). Molecular epidemiology. In Rabies, pp. 79–111. Edited by A. C. Jackson & W. H. Wunner. New York: Academic Press.
Tosh, C., Hemadri, D. & Sanyal, A. (2002). Evidence of recombination in the capsid-coding region of type A foot-and-mouth disease virus. J Gen Virol 83, 2455–2460.
van Rensburg, H., Haydon, D., Fourie Joubert, F., Bastos, A. D. S., Heath, L. & Nel, L. (2002). Genetic heterogeneity in the foot-and-mouth disease virus leader and 3C proteinase genes. Gene 289, 19–29.
Wall, J. D. (2000). A comparison of estimators of the population recombination rate. Mol Biol Evol 17, 156–163.
Wise, C. A., Sraml, M. & Easteal, S. (1998). Departure from neutrality at the mitochondrial NADH dehydrogenase subunit 2 gene in humans, but not in chimpanzees. Genetics 148, 409–421.
Worobey, M. (2001). A novel approach to detecting and measuring recombination: new insights into evolution in viruses, bacteria, and mitochondria. Mol Biol Evol 18, 1425–1434.
Yang, C. F., Naguib, T., Yang, S. J. & 10 other authors (2003). Circulation of endemic type 2 vaccine-derived poliovirus in Egypt from 1983 to 1993. J Virol 77, 8366–8377.
Yang, Z. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 15, 555–556.
Received 19 August 2003; accepted 7 January 2004.