* Department of Biological Sciences, Imperial College London, Silwood Park Campus, Ascot, United Kingdom; and Institute of Molecular Genetics, Academy of Sciences, Prague, Czech Republic
Correspondence: E-mail: r.belshaw{at}imperial.ac.uk.
Abstract
There are at least 31 families of human endogenous retroviruses (HERVs), each derived from an independent infection by an exogenous virus. Using evidence of purifying selection on HERV genes, we have shown previously that reinfection by replication-competent elements was the predominant mechanism of copying in some families. Here we analyze the evolution of 17 HERV families using dN/dS ratios and find a positive relationship between copy number and the use of additional copying mechanisms. All families with more than 200 elements have also used one or more of the following mechanisms: (1) complementation in trans (elements copied by other elements of the same family; HERV-H and ERV-9), (2) retrotransposition in cis (elements copying themselves) within germ-line cells (HERV-K(HML3)), and (3) being copied by non-HERV machinery (HERV-W). We discuss why these other mechanisms are rare in most families and suggest why complementation in trans is significant only in the larger families.
Key Words: human endogenous retrovirus infection retrotransposition complementation
Introduction
Endogenous retroviruses (ERVs) are the proviral form of exogenous retroviruses that have become integrated into the germ line of the host (Boeke and Stoye 1997). The human genome contains 98,000 such ERVs (J. Pačes, Pavlíček, and V. Pačes 2002), and together with the 158,000 mammalian apparent long terminal repeat (LTR) retrotransposons (MaLRs), they make up 8% of our genome (IHGSC 2001).
Typically a human endogenous retrovirus (HERV) element consists of an internal region of three genes (gag, pol, and env) flanked by two sequences known as LTRs, which are identical at the time of integration and are essential for replication. Katzourakis and Tristem (2005) defined 31 HERV families, each of which is considered to be a clade derived from a single infection of the human germ line (Tristem 2000). Most HERV elements integrated into the genome tens of millions of years ago and have accumulated numerous stop codons and frameshift mutations or have undergone recombination between their LTRs, leading to the loss of the entire internal region and leaving only a solo LTR (Stoye 2001). All families except HERV-K(HML2) have long ceased proliferation (IHGSC 2001), and no active HERVs are known; thus, the copying mechanisms by which they proliferated can only be inferred indirectly.
We have examined previously the evolution of HERV-K(HML2) and several other families and found strong evidence of past purifying selection acting on the env gene (Belshaw et al. 2004), which is necessary only for movement between host cells. From this we inferred that most copying was via the reinfection of germ-line cells by replication-competent elements. To what extent this involved infectious transfer between host individuals or was simply the movement between cells of the same individual has yet to be determined. Acquisition of novel endogenous elements via reinfection has been demonstrated experimentally in mice, where endogenous elements can copy themselves into the germ line of offspring derived from transplanted and virus-free ovaries via infection from the host mother (Boeke and Stoye 1997). Here, we test the predominant role of reinfection by analyzing 17 HERV families, using a significantly reduced rate of nonsynonymous (dN) compared to synonymous (dS) nucleotide substitution as evidence of past purifying selection (Li 1997).
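For reference, the statistic used throughout can be written compactly. The symbol ω for the ratio follows common usage in codon-model software such as PAML and is not defined in this paper; d_N is the number of nonsynonymous substitutions per nonsynonymous site and d_S the number of synonymous substitutions per synonymous site. The block is a reminder of the standard interpretation, not a formal part of the method.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying selection (most amino acid changes removed)}\\
\omega = 1 & \text{neutral evolution}\\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```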
We find that all 13 families with a copy number below 200 have a low dN/dS ratio in the env gene (table 1). In each case, the ratio is significantly below 1 except for HERV-R, where P = 0.053. Reinfection of germ-line cells by replication-competent elements thus appears to be the predominant copying method in HERVs. However, the four families with a copy number above 200 (which are phylogenetically unrelated; Katzourakis and Tristem 2005) all show evidence of other mechanisms (see below). A binomial simulation (repeatedly taking the first four items from a shuffled list representing these four families and the 13 reinfecting families) shows that this is extremely unlikely to have occurred by chance (P < 0.001). There is also a significant positive correlation between copy number and the env dN/dS ratio (fig. 1), which we use as an estimate of the relative importance of reinfection by replication-competent elements (Spearman rank correlation; ρ = 0.57; P = 0.01).
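The shuffling test described above can be sketched in Python. This is a minimal sketch under our own reading of the test: it asks how often the four families occupying the top four copy-number ranks are exactly the four that used additional mechanisms. The function names, number of repetitions, and seed are illustrative choices, not the authors' code. Under this reading the probability also has a closed form, 1/C(17, 4) = 1/2380, which the simulation should approach.

```python
import math
import random


def exact_probability(n_total: int, n_special: int) -> float:
    """Chance that a random ordering puts all n_special 'special' families
    in the first n_special positions: 1 / C(n_total, n_special)."""
    return 1.0 / math.comb(n_total, n_special)


def simulate(n_total: int = 17, n_special: int = 4,
             n_reps: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability by repeated shuffling."""
    rng = random.Random(seed)
    # Labels 0..n_special-1 stand for the large, non-reinfecting families;
    # the remaining labels stand for the reinfecting families.
    families = list(range(n_total))
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(families)
        # The first n_special slots play the role of the highest copy numbers.
        if all(f < n_special for f in families[:n_special]):
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    print(f"exact:     {exact_probability(17, 4):.5f}")
    print(f"simulated: {simulate():.5f}")
```

The exact value prints as 0.00042, comfortably below the P < 0.001 threshold reported; the simulated estimate fluctuates around it depending on the seed and the number of repetitions.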
A third additional mechanism is known from HERV-W, two-thirds of whose elements have been copied by another type of retrotransposon called a LINE (long interspersed nuclear element; Pavlíček et al. 2002). Such LINE-copied elements lack promoter sequences (and so cannot be transcribed further) and are dispersed throughout the phylogeny of the family (Costas 2002), showing that they have been derived from many different elements. This mechanism has been rare outside the HERV-W family, but a plausible explanation for this is lacking. The remaining members of the family have a low env dN/dS ratio (0.36) that is significantly below 1 (P < 0.05), and the family therefore contained a core of reinfecting elements.
We have not analyzed some large groups of LTR elements in the human genome. The second largest HERV family is HERV-L, which may be over 70 Myr old (Bénit et al. 1999); the family lacks env and is thus assumed to have proliferated by copying within germ-line cells. Also, the abundant MaLRs are thought to be nonautonomous (Smit 1996; although it is not known if they are a natural group in the sense of the HERV families). Thus, it appears that reinfection by replication-competent elements has driven the evolution of most endogenous retrovirus lineages in our genome but was directly responsible only for a minority of the individual integrations that became fixed.
Methods
Our mining of HERVs is described in J. Pačes, Pavlíček, and V. Pačes (2002). For each family, we constructed a representative amino acid sequence for each gene by (1) finding open reading frames using getorf (Rice, Longden, and Bleasby 2000), (2) selecting the most representative ones using BlastAlign (Belshaw and Katzourakis 2005) modified to use BlastP, and (3) confirming them by blasting against GenBank. Nucleotide sequences were then aligned to the amino acid sequence using BlastAlignP (Belshaw and Katzourakis 2005), and a Neighbor-Joining tree (HKY85 model) was built using PAUP* (Swofford 1998). We calculated the dN/dS ratios on the internal branches of the tree using the "two-ratio" model in PAML (Yang 1997). This model allowed us to ignore the largely neutral evolution represented by the terminal branches, when elements have become fixed and defective (Belshaw et al. 2004). We took values significantly below 1 (P < 0.05) as showing past purifying selection. Significance was assessed by computing the likelihood with the internal ratio fixed at 1 and comparing twice the difference in log-likelihood to the χ² distribution with one degree of freedom (Yang 1998). To improve accuracy, we ignored both old and small HERV families and excluded solo LTRs from the copy number (hence, we assume that rates of recombinational deletion do not vary markedly between families). Further details are in the Supplementary Material online.
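The likelihood ratio test in the Methods reduces to a short calculation once the two PAML runs (internal ratio estimated, and internal ratio fixed at 1) have produced their log-likelihoods. The sketch below is illustrative only: `lrt_pvalue` and `significantly_below_one` are hypothetical helpers, the log-likelihoods in the example are made up, and the code relies on the identity that a χ² variable with one degree of freedom is the square of a standard normal, so its upper tail is erfc(√(x/2)). Because the χ² test alone does not indicate direction, the second helper also checks that the estimated ratio is below 1.

```python
import math


def lrt_pvalue(lnl_free: float, lnl_fixed: float) -> float:
    """P value for a 1-df likelihood ratio test.

    lnl_free:  log-likelihood with the internal-branch dN/dS ratio estimated
    lnl_fixed: log-likelihood with that ratio fixed at 1
    """
    stat = 2.0 * (lnl_free - lnl_fixed)
    if stat <= 0.0:
        # The nested fixed model cannot truly fit better; a non-positive
        # statistic (e.g. optimizer noise) gives no evidence against omega = 1.
        return 1.0
    # Upper tail of chi-square with 1 df: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2.0))


def significantly_below_one(omega_internal: float, lnl_free: float,
                            lnl_fixed: float, alpha: float = 0.05) -> bool:
    """True if the estimated internal ratio is < 1 and the LRT rejects omega = 1."""
    return omega_internal < 1.0 and lrt_pvalue(lnl_free, lnl_fixed) < alpha


if __name__ == "__main__":
    # Hypothetical numbers for illustration only (not results from this study).
    p = lrt_pvalue(lnl_free=-2345.6, lnl_fixed=-2350.1)
    print(f"P = {p:.4f}")
    print(significantly_below_one(0.4, -2345.6, -2350.1))
```

With these made-up values the test statistic is 9.0, giving P of roughly 0.0027 (the probability that a standard normal exceeds 3 in absolute value).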
Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online www.molbiolevol.org.
Acknowledgements
This work was funded by the Wellcome Trust. A.K. was in receipt of a Natural Environment Research Council Studentship and subsequently a Medical Research Council Fellowship, and J.P. was funded by the Centre for Integrated Genomics Grant LN00A079. We also thank Vini Pereira for assistance with the programming and for discussion of ideas.
Footnotes
1 Present address: Department of Zoology, University of Oxford, Oxford, United Kingdom
Lauren McIntyre, Associate Editor
References
Belshaw, R., and A. Katzourakis. 2005. BlastAlign: a program that uses blast to align problematic nucleotide sequences. Bioinformatics 21:122–123.
Belshaw, R., V. Pereira, A. Katzourakis, G. Talbot, J. Pačes, A. Burt, and M. Tristem. 2004. Long-term re-infection of the human genome by endogenous retroviruses. Proc. Natl. Acad. Sci. USA 101:4894–4899.
Bénit, L., J. B. Lallemand, J. F. Casella, H. Philippe, and T. Heidmann. 1999. ERV-L elements: a family of endogenous retrovirus-like elements active throughout the evolution of mammals. J. Virol. 73:3301–3308.
Boeke, J. D., and J. P. Stoye. 1997. Retrotransposons, endogenous retroviruses, and the evolution of retroelements. Pp. 343–435 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, Plainview, N.Y.
Costas, J. 2002. Characterization of the intragenomic spread of the human endogenous retrovirus family HERV-W. Mol. Biol. Evol. 19:526–533.
Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704.
[IHGSC] International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.
Katzourakis, A., and M. Tristem. 2005. Phylogeny of human endogenous and exogenous retroviruses. Pp. 186–203 in E. D. Sverdlov, ed. Retroviruses and primate genome evolution. Landes Bioscience, Georgetown, Tex.
Li, W.-H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.
Löwer, R. 1999. The pathogenic potential of endogenous retroviruses: facts and fantasies. Trends Microbiol. 7:350–356.
Mager, D. L., and J. D. Freeman. 1995. HERV-H endogenous retroviruses: presence in the New World branch but amplification in the Old World primate lineage. Virology 213:395–404.
Mayer, J., and E. U. Meese. 2002. The human endogenous retrovirus family HERV-K(HML-3). Genomics 80:331–343.
Pačes, J., A. Pavlíček, and V. Pačes. 2002. HERVd: database of human endogenous retroviruses. Nucleic Acids Res. 30:205–206.
Pavlíček, A., J. Pačes, D. Elleder, and J. Hejnar. 2002. Processed pseudogenes of human endogenous retroviruses generated by LINEs: their integration, stability and distribution. Genome Res. 12:391–399.
Rice, P., I. S. Longden, and A. Bleasby. 2000. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16:276–277.
Smit, A. F. A. 1996. The origin of interspersed repeats in the human genome. Curr. Opin. Genet. Dev. 6:743–748.
Stoye, J. P. 2001. Endogenous retroviruses: still active after all these years? Curr. Biol. 11:R914–R916.
Swanstrom, R., and J. W. Wills. 1997. Synthesis, assembly and processing of viral proteins. Pp. 263–334 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, Plainview, N.Y.
Swofford, D. L. 1998. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.
Tristem, M. 2000. Identification and characterization of novel human endogenous retrovirus families by phylogenetic screening of the human genome mapping project database. J. Virol. 74:3715–3730.
Yang, Z. H. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556.
Yang, Z. H. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568–573.