Structure and Expression of the TREX1 and TREX2 3'→5' Exonuclease Genes*

Dan J. Mazur and Fred W. Perrino†

Wake Forest University School of Medicine, Department of Biochemistry, Winston-Salem, North Carolina 27157

Received for publication, November 3, 2000, and in revised form, January 23, 2001

    ABSTRACT

The TREX1 and TREX2 genes encode mammalian 3'→5' exonucleases. Expression of the TREX genes in human cells was investigated using a reverse transcription-polymerase chain reaction strategy. Our results show that TREX1 and TREX2 are expressed in all tissues tested, providing direct evidence for the expression of these genes in human cells. Potential transcription start sites are identified for the TREX genes using rapid amplification of cDNA ends to recover the 5'-flanking regions of the TREX transcripts. The 5'-flanking sequences indicate transcription initiation from consensus putative promoters identified -140 and -650 base pairs upstream of the TREX1 open reading frame (ORF) and -623 and -753 base pairs upstream of the TREX2 ORF. Novel TREX1 and TREX2 cDNAs are identified that contain protein-coding sequences generated from exons positioned in genomic DNA up to 18 kilobases 5' to the TREX1 ORF and up to 25 kilobases 5' to the TREX2 ORF. These novel cDNAs and sequences in the GenBank™ data base indicate that transcripts containing the TREX1 and TREX2 ORFs are produced using a variety of mechanisms that include alternate promoter usage, alternative splicing, and varied sites for 3' cleavage and polyadenylation. These initial studies have revealed previously unrecognized complexities in the structure and expression of the TREX1 and TREX2 genes.

    INTRODUCTION

The multistep processes of DNA replication, repair, and genetic recombination often require the excision of 3' nucleotides to generate DNA 3' termini suitable for subsequent metabolic steps. The apparent diversity of proteins containing 3'→5' exonuclease activity likely reflects the different requirements for these enzymes in the maintenance of the human genome. In some cases these exonucleases reside in large proteins that have multiple catalytic and functional properties. The 3'→5' proofreading exonucleases are functional domains in the mammalian DNA polymerases δ (1), ε (2), and γ (3). These proofreading enzymes remove incorrectly polymerized nucleotides during DNA synthesis and minimize the incorporation of mismatches into the genome. The Werner syndrome protein (WRN) contains a 3'→5' exonuclease activity in one functional domain and a 3'→5' DNA helicase activity in another (4, 5). Deficiencies in the WRN protein increase genomic instability (6). The multifunctional p53 protein contains a 3'→5' exonuclease localized to the central core domain (7). This core region in p53 also contains the sequence-specific DNA binding domain that functions in cell-cycle checkpoint control in mammalian cells (8). hRAD1 (Ustilago maydis REC1) and hRAD9 are human homologues of yeast DNA damage checkpoint response proteins (9). These proteins also contain 3'→5' exonuclease activities (10-12). The yeast mre11 mutant is defective in recombinational DNA repair (13). The purified MRE11 protein (14, 15) and a protein complex containing MRE11 contain 3'→5' exonuclease activities (16).
The TREX1 and TREX2 proteins are relatively small dimeric proteins that contain potent 3'→5' exonuclease activities (17).¹ The presence of 3' excision activities in this apparently diverse collection of proteins, and likely others, probably reflects the multiple pathways present in human cells requiring the modification of DNA 3' termini. However, insufficient information is currently available to understand the molecular pathways in which the different 3'→5' exonucleases function.

Some insights into the catalytic requirements for 3'→5' exonucleases have been gleaned from protein structure and mutagenesis studies and from protein sequence analysis. The proofreading exonuclease domains of the Escherichia coli DNA polymerase I large fragment and the bacteriophage T4 DNA polymerase have nearly identical folding patterns despite minimal overall sequence identity (18-20). Mutagenesis studies identify critical amino acids in three conserved motifs, Exo I,² Exo II, and Exo III, that are positioned to coordinate two metal ions at the active site (21-23). The proofreading exonucleases of the mammalian DNA polymerases also contain these three Exo motifs (24). Statistical modeling strategies have been developed to identify additional proteins that might contain 3' excision activity (25, 26). This methodology revealed the conserved exonuclease motifs in the WRN protein (27, 28), and biochemical analysis confirmed the 3'→5' exonuclease activity in this protein (4, 29). The TREX sequences contain the Exo I and Exo II motifs and a variant of the Exo III motif, renamed Exo IIIε. The Exo IIIε motif is characterized by the sequence HXAXXD rather than YXXXD (30-32) and is found in the RNase T subfamily of exonucleases (25, 28, 33). The Exo IIIε motif in the TREX proteins suggests that these mammalian exonucleases are most closely related to the bacterial ε subunit of DNA polymerase III, exonuclease I, and the recently described exonuclease X (34).

The increasing number of 3'→5' exonuclease activities detected in proteins encoded by human genes suggests that these activities likely reside in a variety of structural folds distinct from that of the proofreading exonucleases. The multifunctional Escherichia coli exonuclease III has a potent 3' excision activity, and the structure of this protein is similar to that of APEX, the major human apurinic/apyrimidinic endonuclease (35, 36). Conserved residues in these enzymes indicate a common catalytic mechanism involving a single metal ion. The 3' excision activity of the APEX protein is relatively weak and appears to be influenced by substrate and reaction conditions (37) as well as by the structure of the 3'-terminal nucleotide (38, 39). The structure of the p53 protein is similar to those of the exonuclease III and APEX proteins (40), and the p53 protein is reported to contain 3'→5' exonuclease activity (7). The hRAD1(REC1) and hRAD9 recombinant proteins contain 3'→5' exonuclease activities, but extensive sequence and modeling analyses have not provided insights into the catalytic mechanisms of these proteins (41). Additional studies will be necessary to identify the complete repertoire of human genes encoding 3'→5' exonucleases.

The gene for TREX1 encodes the major 3' exonuclease activity measured in extracts prepared from mammalian cells. A 3' exonuclease activity was first detected in biochemical assays and named DNase III by T. Lindahl et al. (42). Recently, the human and mouse cDNAs encoding 3'→5' exonucleases were identified by sequencing peptides generated from the purified bovine (17) and rabbit (43) enzymes. A second, closely related mouse cDNA, named Trex2,³ was discovered in data base searches using the TREX1 cDNA as a query sequence (17). We have measured expression from both TREX genes using an RT-PCR strategy and investigated in detail the 5'-flanking regions of these genes. The human and mouse TREX1 proteins are 314 amino acids in length, not 304 as previously reported (17, 43). Our analysis confirms expression of TREX1 and provides the first evidence for the expression of TREX2 in human cells. Novel cDNAs containing the TREX1 and TREX2 ORFs have been identified that contain exons spanning 18 kb for TREX1 and 25 kb for TREX2. The salient features of the TREX genes are presented in this report.

    MATERIALS AND METHODS

DNAs-- Oligonucleotide primers were synthesized in the DNA laboratory of the Wake Forest University Comprehensive Cancer Center and are listed in Table I. The bovine genomic DNA was from Sigma-Aldrich Co. (D1501). The mouse genomic DNA (strain 129 SV) was a generous gift from P. Dawson (Wake Forest University School of Medicine). The human genomic DNA was the BAC clone RP11-24C3 (no. AC021328) purchased from Research Genetics.

PCR of the TREX1 ORF from Genomic DNA-- For amplification of TREX1 from genomic DNA, the PCRs (100 µl) contained 10 mM Tris-HCl, pH 9.0, 50 mM KCl, 0.1% Triton X-100, 200 µM dNTPs, 1.5 mM MgCl2, 50 ng of genomic DNA, and 1 µM each of the forward and reverse primers (Table I). The TREX2 PCRs also contained 5% Me2SO. Reactions were heated to 95 °C for 5 min prior to addition of Taq DNA polymerase (2.5 units, Promega Corp.) at 80 °C. The reactions were performed for 35 cycles at 95 °C for 1 min, 60 °C for 1 min, and 72 °C for 2 min. The products were resolved by agarose gel electrophoresis, recovered from the ethidium bromide-stained gels using spin columns (Qiagen), and sequenced using a PerkinElmer Life Sciences ABI Prism 377 automated DNA sequencer.
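
The final concentrations listed above can be converted into a pipetting plan with the dilution relation C1V1 = C2V2. The following Python sketch is illustrative only: every stock concentration in it (a 10× buffer, 10 mM of each dNTP, 25 mM MgCl2, 10 µM primers, genomic DNA at 50 ng/µl) is a hypothetical value, not one reported in this work, and Taq DNA polymerase is deliberately left out because it is added separately at 80 °C.

```python
"""Pipetting plan for one 100-ul genomic PCR, using C1 * V1 = C2 * V2.

Final concentrations follow the Methods text; every stock concentration
below is a hypothetical, illustrative value.
"""

REACTION_UL = 100.0

# name: (stock concentration, final concentration), same units within a row
COMPONENTS = {
    "10x buffer": (10.0, 1.0),        # fold (assumed Tris-HCl/KCl/Triton X-100 buffer)
    "dNTPs": (10_000.0, 200.0),       # uM of each dNTP (assumed 10 mM stock)
    "MgCl2": (25.0, 1.5),             # mM (assumed 25 mM stock)
    "forward primer": (10.0, 1.0),    # uM (assumed 10 uM stock)
    "reverse primer": (10.0, 1.0),    # uM (assumed 10 uM stock)
    "genomic DNA": (50.0, 0.5),       # ng/ul: 50 ng in 100 ul (assumed 50 ng/ul stock)
}


def stock_volume(stock, final, total_ul=REACTION_UL):
    """Volume of stock (ul) that gives `final` in `total_ul`: V1 = C2 * V2 / C1."""
    return final * total_ul / stock


def plan(dmso_percent=0.0, total_ul=REACTION_UL):
    """Return ul of each component plus water; Taq is added later at 80 C."""
    components = dict(COMPONENTS)
    if dmso_percent:
        # % v/v from neat Me2SO, as in the TREX2 reactions
        components["Me2SO"] = (100.0, dmso_percent)
    volumes = {name: stock_volume(c1, c2, total_ul)
               for name, (c1, c2) in components.items()}
    used = sum(volumes.values())
    if used > total_ul:
        raise ValueError("stocks are too dilute for this reaction volume")
    volumes["water"] = total_ul - used
    return volumes


if __name__ == "__main__":
    # A TREX2-style reaction with 5% Me2SO
    for name, ul in plan(dmso_percent=5.0).items():
        print(f"{name:15s} {ul:5.1f} ul")
```

With these assumed stocks, the TREX1 reaction needs 39 µl of components and 61 µl of water; adding 5% Me2SO for TREX2 takes another 5 µl from the water.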

RACE Analysis of the 5'-Flanking Regions of TREX1 and TREX2-- Marathon ready cDNA from spleen (CLONTECH Laboratories, Inc.) was used to recover the 5'-flanking regions of TREX1 and TREX2 cDNAs. The two-round PCRs were performed according to the manufacturer's specifications using the nested Marathon Adapter primer pair and the TREX1- and TREX2-specific primer pairs indicated in Figs. 2, 4, 6, and 7. The TREX2 PCRs also contained 5% Me2SO. The first-round PCR products were fractionated on agarose gels and recovered from the gels using spin columns (Qiagen) in three separate size-selected pools. Samples of the size-selected products were used as templates in the second-round PCR. Distinct product bands were recovered from gels, cloned into the pGEM®-T Easy vector (Promega Corp.), and sequenced.

Expression Analysis by PCR of the TREX1 and TREX2 Transcripts-- Total RNA was recovered from blast cells of a patient diagnosed with acute myeloblastic leukemia (AML) by guanidine isothiocyanate extraction and cesium centrifugation (44). The AML RNA was treated with RNase-free DNase I (Promega Corp.) and further purified using an RNeasy column (Qiagen). The AML RNA or tissue-specific RNA (CLONTECH Laboratories, Inc.) (5 µg) was hybridized to 0.5 µg of oligo(dT)15 primer (Promega Corp.) for 5 min at 95 °C and then 10 min at 70 °C. The RNA was reverse transcribed with SuperScript II (Life Technologies, Inc.) for 2 h at 42 °C to generate cDNA. The PCR conditions using AML cDNA and the tissue-specific cDNA were as described above for genomic DNA. The nested primer pairs for the two-round PCR of AML cDNA are described in Figs. 3 and 5. The template for the first-round PCR was 200 ng of reverse transcribed AML RNA, and the template for the second round was a sample (1 µl) of the first-round PCR products. The products from the second round were resolved by agarose gel electrophoresis, recovered from the gel, and sequenced. A single-round PCR was performed for TREX1 and TREX2 expression analysis using the 13 human tissue-specific cDNAs and the primers indicated in the text.

Identification of Novel TREX1 and TREX2 Transcripts-- Potential exons encoded in the genomic DNA of the 5'-flanking regions of the TREX1 and TREX2 ORFs were identified using the gene-finding algorithm GENSCAN (47). The PCR conditions for amplification of novel TREX1 and TREX2 cDNAs were as described above for genomic DNA. The specific primer pairs used in the two-round amplification reactions are indicated in the text and in Figs. 6 and 7. The products from the second round were resolved by agarose gel electrophoresis, recovered from the gel, and sequenced.

    RESULTS AND DISCUSSION

The peptide sequences generated from a purified mammalian 3'→5' exonuclease identified the human TREX1 cDNA from EST W24304 in the GenBank™ data base in two independent studies (17, 43). More recently, we used the TREX1 sequence (no. AF151105) in a BLAST search of the GenBank™ data base to identify additional TREX1 ESTs (e.g. no. BE616406, AV764291, R23917, AA279657) and the human BAC clone RP11-24C3 (no. AC021328). Sequence alignments of these TREX1 ESTs indicated variations in the 5'-flanking regions (data not shown). These sequence variations prompted the systematic analysis, presented in this work, of the TREX1 cDNAs and the TREX1 genomic sequence in the human BAC clone RP11-24C3.

The TREX1 ORF-- The genomic DNA sequences flanking the mouse, bovine, and human TREX1 genes were examined to confirm the single ORF structure of this gene. Initial studies of human and mouse TREX1 cDNA sequences identified a common ATG codon positioned near the 5' end of the TREX1 ORFs (17). The recombinant proteins produced from the human and mouse TREX1 cDNAs using this ATG as a start codon generated active 3'→5' exonucleases. However, mouse TREX1 ESTs (e.g. no. AI182180, BF577448, AA197643) contain a second in-frame ATG codon 30 nucleotides upstream, raising the possibility that the initiating methionine in the TREX1 ORF had not been identified. To identify the initiating Met for TREX1, genomic DNAs positioned at the 5' end of the mouse, bovine, and human TREX1 ORFs were recovered using PCR, and the nucleotide sequences of these PCR products were determined. Primer pairs used in these reactions were designed from the TREX1 ESTs indicated in Fig. 1 and the available sequences in the GenBank™ dbEST data base (Fig. 1, Table I). The lengths of the genomic DNA fragments recovered from the PCRs were 1114 bp (mouse), 507 bp (bovine), and 532 bp (human). Alignments of the ESTs with the recovered genomic sequences identified consensus intron donor and acceptor sequences and indicated that an RNA splicing process modified the 5'-flanking regions of the TREX1 transcripts (data not shown). The deduced amino acid sequences were determined from the genomic DNA sequences and aligned using ClustalW to determine the relative identity at the 5' ends of the TREX1 ORFs (Fig. 1). The alignment shows that the Met labeled 1 is the only Met conserved in all three mammalian sequences, indicating that this is the initiating Met of the TREX1 ORF. No sequence identity is detected upstream of the proposed initiating Met.
Additionally, the translated mouse and bovine genomic sequences contain in-frame stop codons at positions -28 and -40, providing further support for the assignment of the initiating Met for TREX1 (Fig. 1). The translated human genomic sequence contains two additional in-frame Met codons, at positions -38 and -55. The significance of these potential Met codons is currently unknown. The human and bovine TREX1 sequences at the initiating ATG are identical to the Kozak consensus sequence (45), and the mouse sequence differs at a single position (Fig. 1). Additional PCRs of genomic DNA have confirmed the single ORF structures of TREX1 in the mouse, bovine, and human genomes (data not shown). The human and mouse TREX1 ORFs indicate a coding region of 314 amino acids, and the bovine TREX1 ORF is 315 amino acids in length. A Drosophila TREX homolog (no. AE003581) encodes a protein of 351 amino acids and contains two exons. The homologous relationship between the mammalian and Drosophila genes is apparent by computational analysis using the COGNITOR program (46). The products of these genes fall into the same cluster of orthologous groups of proteins as the E. coli DNA polymerase III ε subunit. Although these proteins very likely share the biochemical function of removing nucleotides from DNA 3' termini, the evolutionary relationships between these genes, their cellular functions, and their three-dimensional structures are not known.
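
The start-codon argument combines conservation with a reading-frame check: an upstream ATG could only extend the ORF if no in-frame stop codon lies between it and Met 1. A minimal Python sketch of that frame check follows, assuming a plain genomic string and the 0-based index of the proposed start ATG. Codons are numbered -1, -2, ... moving 5' from Met 1, assuming -1 denotes the codon immediately 5' of Met 1 in the Fig. 1 convention. The function name and the toy sequences are hypothetical, not TREX1 data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_upstream(seq, start):
    """Scan 5' from the ATG at 0-based index `start`, staying in frame.

    Codons are numbered -1, -2, ... moving away from Met 1. Returns
    (upstream_met_positions, first_stop_position); the stop position is
    None if the sequence runs out first. Only ATGs between Met 1 and that
    stop could extend the ORF, so the scan ends at the first in-frame stop.
    """
    seq = seq.upper()
    if seq[start:start + 3] != "ATG":
        raise ValueError("no ATG at the given start index")
    mets = []
    position, i = -1, start - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return mets, position
        if codon == "ATG":
            mets.append(position)
        position -= 1
        i -= 3
    return mets, None


# Toy sequence: TGA | ATG | AAA | ATG(start) | CCC
print(scan_upstream("TGAATGAAAATGCCC", 9))  # ([-2], -3)
```

A stop codon rules out only the ATGs lying 5' of it; an in-frame ATG between the stop and Met 1 is not excluded by this check, which is why the cross-species conservation of Met 1 carries the rest of the argument.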


Fig. 1.   Identification of the TREX1 initiating Met. The indicated mouse (Ms), bovine (Bv), and human (Hu) TREX1 ESTs are aligned with the corresponding TREX1-containing genomic DNA sequences recovered by PCR. The TREX1 ORFs and the 5'-flanking exons (filled boxes) are connected by the intron sequences (solid lines). The alignments identify sequences present in the ESTs and genomic DNA (dotted lines) and sequences removed from ESTs by RNA splicing (solid, bent lines). The arrows indicate the positions of the PCR primers used to recover genomic DNA. The deduced amino acid sequences from mouse, bovine, and human TREX1 ORFs in the genomic sequences were aligned using ClustalW. The positions of identity in all three sequences (*) and two of three sequences (:) are indicated. The TREX1 protein sequences are boxed, and the deduced amino acid residues at positions prior to the proposed initiating methionine 1 are assigned negative values. The positions of in-frame stop codons (X) and residues prior to the stop codons (-) are indicated. The putative mouse, bovine, and human Kozak and consensus Kozak sequences are shown.

                              
Table I
Oligonucleotide primers used for various amplification experiments

The 5'-Flanking Region of TREX1 Transcripts-- The 5'-flanking region of TREX1 cDNAs was examined using a 5'-RACE procedure. A two-round PCR was designed using spleen cDNA with the nested TREX1-specific reverse primers (T1rv1 and T1rv2) and the cDNA adapter primers. Seven independent clones ranging from 133 to 612 bp in length were recovered, and these sequences were aligned with the TREX1 genomic sequence (Fig. 2). To identify genomic sequences in the 5'-flanking region of TREX1 that might serve as transcription initiation sites, the Neural Network Promoter Prediction (NNPP) sequence analysis program was used. Two potential promoters positioned -140 and -650 bp from the TREX1 ORF were identified (Fig. 2). The 5' ends of four TREX1 cDNAs (Fig. 2, labeled 3-6) align with the genomic sequence at positions indicating transcription initiation at the -650 consensus putative promoter sequence. Two of the cDNAs (Fig. 2, labeled 1 and 2) align at positions indicating transcription initiation at either the -140 or the -650 putative promoter sequence. In addition, the 5' end of another cDNA (Fig. 2, labeled 7) was positioned 5' to both predicted promoter sequences, indicating that additional or alternative promoters are present in the 5'-flanking region of TREX1. Two of the cDNAs (Fig. 2, labeled 6 and 7) were spliced at consensus intron donor and acceptor sequences that had been previously identified in human ESTs, providing further support for an RNA splicing modification of the 5'-flanking region of TREX1 transcripts.
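
The rule used to assign RACE clones to promoters can be stated compactly. The true start of a transcript lies at or 5' of its recovered cDNA 5' end, because RACE products may be 5'-truncated. A 5' end is therefore compatible with every predicted promoter at or upstream of it, and an end lying 5' of all predictions points to another start site. The following Python sketch assumes coordinates are genomic positions in bp relative to the ORF (negative = upstream), so spliced clones must be mapped onto the genome rather than measured by cDNA length. The `tolerance` parameter, the function names, and the example 5' end positions are hypothetical; the individual clone coordinates appear only in Fig. 2.

```python
TREX1_PROMOTERS = (-650, -140)  # NNPP predictions, bp relative to the TREX1 ORF


def compatible_promoters(five_prime_end, promoters=TREX1_PROMOTERS, tolerance=0):
    """Predicted promoters that could account for a cDNA 5' end.

    Coordinates are genomic, negative = upstream of the ORF. The true
    start site lies at or 5' of the recovered end (RACE products may be
    5'-truncated), so every promoter at or upstream of that end, within
    `tolerance` bp, is compatible.
    """
    return [p for p in sorted(promoters) if p <= five_prime_end + tolerance]


def classify(five_prime_end, promoters=TREX1_PROMOTERS, tolerance=0):
    hits = compatible_promoters(five_prime_end, promoters, tolerance)
    if not hits:
        return "5' of all predicted promoters: suggests another start site"
    return "compatible with " + " or ".join(str(p) for p in hits)


# Hypothetical 5' end positions, one per category described in the text
for end in (-100, -300, -900):
    print(end, classify(end))
```

Passing `promoters=(-753, -623)` applies the same rule to the TREX2 predictions.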


Fig. 2.   A 5'-RACE analysis of the TREX1 ORF. The TREX1 cDNAs (1-7) were recovered by 5'-RACE, and the sequences were aligned with the TREX1 genomic sequence (filled box and solid line). The alignment identifies sequences present in the cDNAs and genomic DNA (dotted lines) and sequences removed from cDNAs by RNA splicing (solid, bent lines). The 5' end positions (arrows), the NNPP-predicted promoters (-650 and -140), and the TREX1-specific primers (T1rv1 and T1rv2) are indicated. Hu, human.

Splicing of the TREX1 Transcripts and Expression in Human Tissues-- The TREX1 cDNA sequences recovered in the 5'-RACE analysis were compared with the 5'-flanking regions of TREX1 ESTs available in the GenBank™ data base. These sequences indicated the presence of one intron donor sequence and two acceptor sequences (Fig. 3). Thus, in addition to the unprocessed TREX1 transcript, two splicing pathways were possible for processing of the TREX1 transcripts. It was predicted that splicing from the donor site to acceptor site A would generate a transcript encoding the complete TREX1 ORF, whereas splicing to acceptor site B would generate a transcript lacking TREX1 sequence necessary to encode an active TREX1 protein (Fig. 3A). These alternatively spliced TREX1 transcripts may reflect a pathway for regulating TREX1 through changes in mRNA stability or translation efficiency. A two-round PCR was designed to estimate the relative abundance of the three possible TREX1 transcripts using reverse transcribed RNA from AML cells. The TREX1 cDNAs were amplified using the nested TREX1 ORF-specific primers (T1rv1 and T1rv2) and the nested 5'-flanking region primers (T1fr1 and T1fr2). The three possible TREX1 transcripts were detected by agarose gel electrophoresis of the PCR products (Fig. 3B). Sequencing of the cloned products confirmed the identity of the least abundant 532-bp band as the product of the unspliced TREX1 transcript. The 212- and 102-bp bands resulted from amplification of the two spliced TREX1 transcripts. The most abundant band is the 212-bp product generated by splicing from the donor site to acceptor site A, positioned 26 base pairs 5' to the predicted initiating methionine codon. These data indicate that the most abundant TREX1 transcripts in AML cells, initiating at the -650 putative promoter, are processed by an RNA splicing mechanism that removes a 320-bp intron from the 5'-flanking region of the TREX1 transcripts.
Analysis of mouse and bovine TREX1 ESTs in the data base reveals a similar RNA splicing pathway to conserved acceptor sites positioned at -7 bp in mouse and -21 bp in bovine relative to the initiating ATG codons, suggesting that this mechanism is conserved between mammalian species.
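
The intron sizes follow directly from the band sizes. All three products are amplified with the same nested primers, so the size difference between the unspliced product and a spliced product equals the intron removed. A minimal Python sketch, using the TREX1 product sizes from Fig. 3B (the function name is hypothetical):

```python
def intron_length(unspliced_bp, spliced_bp):
    """Intron size implied by two RT-PCR products made with the same primers.

    The primers anneal to the same sites in both templates, so the only
    difference between the two products is the spliced-out intron.
    """
    if not 0 < spliced_bp < unspliced_bp:
        raise ValueError("spliced product must be shorter than the unspliced one")
    return unspliced_bp - spliced_bp


# TREX1 second-round products from AML cDNA (Fig. 3B)
TREX1_UNSPLICED = 532
for acceptor, product in (("A", 212), ("B", 102)):
    print(f"acceptor {acceptor}: {intron_length(TREX1_UNSPLICED, product)}-bp intron")
```

This prints a 320-bp intron for acceptor site A, matching the text, and a 430-bp intron for acceptor site B, which is simply what the band sizes imply. Applied to the TREX2 products reported below (643, 130, and 99 bp), the same subtraction gives 643 - 130 = 513 bp and 643 - 99 = 544 bp.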


Fig. 3.   Splicing of TREX1 transcripts and expression in human tissues. The RNA splicing pathways (solid, bent lines) from donor site to acceptor site A or acceptor site B and the three possible TREX1 transcripts (1-3) are indicated (A). The PCR products (532, 212, and 102 bp) are predicted using the indicated nested primer pairs (T1fr1, T1fr2 and T1rv1, T1rv2). Agarose gel electrophoresis of the PCR products generated using reverse transcribed AML RNA (B, lane 3) indicates the presence of all three TREX1 transcripts. Lane 1 contains DNA size standards, and lane 2 contains a PCR of non-reverse transcribed AML RNA. Total RNA from various human tissues was subjected to RT-PCR using the TREX1-specific T1fr2 and T1rv1 primers. Agarose gel electrophoresis of the PCR products (C) indicates the presence of the three TREX1 transcripts in all tissues. Hu, human.

The donor site to acceptor site A pathway is the predominant pathway for processing of TREX1 transcripts in various human tissues. To determine the pattern of TREX1 gene expression in human cells, a single-round RT-PCR experiment was performed using the T1fr2 and T1rv1 primers and reverse transcribed total RNA from 13 different tissues (Fig. 3). The three PCR products generated from TREX1 transcripts are detected in all tissues tested (Fig. 3C). In addition, the relative ratios of the three PCR products are similar in each tissue, with the 212-bp product being the most abundant in all tissues. Some quantitative differences in the staining intensities of the 212-bp products are apparent, suggesting variability in expression between tissue samples. These data suggest that TREX1 expression in spleen, prostate, thymus, and AML cells is higher than that in heart, skeletal muscle, and bone marrow. A more quantitative analysis will be necessary to substantiate these variations in expression, but ubiquitous expression of TREX1 is clear from these results.

The 5'-Flanking Region of TREX2 Transcripts-- In previous work from this laboratory, an active 3'→5' exonuclease was generated from a single mouse Trex2 EST identified in the GenBank™ data base (17). To date, only mouse Trex2 ESTs have been deposited in the GenBank™ data base. A PCR strategy was developed to identify human TREX2 cDNAs and to investigate the 5'-flanking region of these cDNAs using a 5'-RACE procedure. A two-round PCR was designed using spleen cDNA with nested TREX2-specific reverse primers (T2rv1 and T2rv2) and the cDNA adapter primers. Six independent clones ranging in length from 95 to 959 bp were recovered, sequenced, and aligned with the TREX2 genomic sequence (Fig. 4). The genomic sequence in the 5'-flanking region of TREX2 was examined using the NNPP program to identify potential transcription initiation sites. Two potential promoters positioned -623 and -753 bp from the TREX2 ORF were identified (Fig. 4). The 5' ends of five TREX2 cDNAs (Fig. 4, labeled 1-5) align with the genomic sequence at positions indicating transcription initiation at one of these consensus putative promoter sequences. The 5' end of another cDNA (Fig. 4, labeled 6) was positioned 5' to both predicted promoters, indicating that additional or alternative promoters are present upstream in the 5'-flanking region of TREX2.


Fig. 4.   A 5'-RACE analysis of the TREX2 ORF. The TREX2 cDNAs (1-6) were recovered by 5'-RACE, and the sequences were aligned with the TREX2 genomic sequence (filled box and solid line). The alignment indicates that all of the recovered cDNA sequences are identical to the genomic DNA sequence (dotted lines). The 5' end positions (arrows), the NNPP-predicted promoters (-753 and -623), and the TREX2-specific primers (T2rv1 and T2rv2) are indicated. Hu, human.

Splicing of the TREX2 Transcripts and Expression in Human Tissues-- The genomic DNA sequence in the 5'-flanking region of human TREX2 was examined for possible splice donor and acceptor sites. One potential intron donor sequence and two potential acceptor sequences were identified, suggesting a possible processing pathway for TREX2 transcripts similar to that for TREX1 transcripts (Fig. 5A). Thus, as for TREX1, two splicing pathways are possible for processing of the TREX2 transcripts. However, unlike the splicing of a TREX1 transcript, splicing from the donor site to either acceptor site A or acceptor site B generates a TREX2 transcript encoding the complete ORF. A two-round PCR was performed using the nested TREX2 ORF-specific primers (T2rv1 and T2rv2) and the nested 5'-flanking region primers (T2fr1 and T2fr2) with AML cDNA to recover TREX2 cDNAs. The three possible TREX2 transcripts are detected upon agarose gel electrophoresis of the PCR products (Fig. 5B). Sequencing of the cloned products confirmed the identity of the most abundant 643-bp band as the product of the unspliced TREX2 transcript. The 130- and 99-bp bands resulted from amplification of the two spliced TREX2 transcripts. The relative intensities of the bands likely reflect the relative abundance of the TREX2 transcripts in AML cells, with the unspliced TREX2 transcript being the predominant form. These results provide the first evidence for expression of the TREX2 gene in human cells and identify an RNA splicing process that removes a 513- or a 544-bp intron from the 5'-flanking region of the TREX2 transcripts. The sequence of the mouse Trex2 EST (no. AA060540) indicates splicing in the 5'-flanking region from the donor site to acceptor site B to remove a 623-bp intron, indicating conservation of this RNA splicing mechanism between mammalian species (data not shown).


Fig. 5.   Splicing of TREX2 transcripts and expression in human tissues. The RNA splicing pathways (solid, bent lines) from donor site to acceptor site A or acceptor site B and the three possible TREX2 transcripts (1-3) are indicated (A). The PCR products (643, 130, and 99 bp) are predicted using the indicated nested primer pairs (T2fr1, T2fr2 and T2rv1, T2rv2). Agarose gel electrophoresis of the PCR products generated using reverse transcribed AML RNA (B, lane 2) indicates the presence of all three TREX2 transcripts. Lane 1 contains DNA size standards. Total RNA from various human tissues was subjected to RT-PCR using the TREX2-specific T2fr2 and T2rv2 primers. Agarose gel electrophoresis of the PCR products (C) indicates the presence of only the 643-bp unspliced TREX2 transcript in all tissues. Hu, human.

To determine the pattern of TREX2 gene expression in human cells, a series of RT-PCRs was performed using the T2fr2 and T2rv2 primers and reverse transcribed total RNA from 13 different tissues (Fig. 5). The PCR product generated from the unspliced TREX2 transcript was detected in all tissues tested (Fig. 5C). The spliced TREX2 transcripts were not detected in this single-round PCR (data not shown), indicating that the unspliced form of the TREX2 transcript is the most abundant in all tissues tested. The staining intensities of the 643-bp products generated from thymus and spleen cDNA were greater than those from heart, skeletal muscle, and testis, suggesting some variation in TREX2 expression levels among human tissues. The recovery of TREX1 and TREX2 transcripts from all human tissues tested implicates these proteins in housekeeping functions, such as the DNA repair processes necessary to maintain the integrity of the human genome.

Identification of Novel TREX1 Transcripts-- Previous mapping experiments located the human TREX1 ORF to chromosome 3p21.3-21.2 (43). However, limited genomic sequence in this region precluded a detailed analysis of the sequence surrounding the TREX1 ORF. More recently, we used the TREX1 cDNA (no. AF151105) as the query sequence in a BLAST search of the GenBank™ high-throughput genomic sequences data base to identify the human BAC clone RP11-24C3 (no. AC021328). The presence of TREX1 on this genomic DNA fragment confirms the mapped position of TREX1 to 3p21.3-21.2 on the current Human Genome Map. The RP11-24C3 DNA clone is a "working draft" sequence consisting of 20 unordered contigs with gaps of ~100 bp. The correct order of the five contigs containing the TREX1 ORF was determined using human, mouse, and pig ESTs in the GenBank™ dbEST data base and a genomic sequence in the GenBank™ GSS data base (Fig. 6). The pig EST AW346748 identifies the two contigs labeled II and III immediately 5' to the TREX1 ORF-containing contig labeled I. The translated protein sequence of the first 66 bp of the pig EST identifies the human BAC clone AQ430752 in the GSS data base. A BLAST search of the dbEST data base using the AQ430752 clone as the query sequence identifies a mouse EST, AI035823, that positions the next 5'-adjacent contig, labeled IV. Finally, a BLAST search of the dbEST data base using EST AI035823 as a query sequence identifies several ESTs (e.g. no. AI031903) that indicate the position of the contig labeled V. Thus, five DNA fragments containing 25 kb of genomic DNA on RP11-24C3 have been ordered to reveal the genomic sequence positioned 5' to the TREX1 ORF. We cannot exclude the possibility that additional contigs lie between these but were not detected in our analysis.
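
Ordering the contigs amounts to chaining pairwise adjacency evidence: each EST or GSS read that spans two contigs indicates that one lies immediately 5' of the other. A minimal Python sketch follows, assuming each link is an (upstream, downstream) pair and ignoring contig orientation. The links in the example are inferred from the text and the Fig. 6 legend (V, IV, III, II, I from 5' to 3') and are an assumption. Like the analysis above, the sketch cannot reveal a contig that no linking read touches.

```python
def order_contigs(links):
    """Chain contigs 5'->3' from (upstream, downstream) adjacency links.

    Each link stands for one read (EST or GSS) spanning two contigs.
    Orientation is ignored. Raises ValueError if the links branch or do
    not form a single linear chain.
    """
    next_of, prev_of = {}, {}
    for up, down in links:
        if next_of.get(up, down) != down:
            raise ValueError(f"{up} has two different downstream neighbours")
        if prev_of.get(down, up) != up:
            raise ValueError(f"{down} has two different upstream neighbours")
        next_of[up] = down
        prev_of[down] = up
    contigs = set(next_of) | set(prev_of)
    # The 5'-most contig is the only one with no upstream neighbour
    heads = [c for c in contigs if c not in prev_of]
    if len(heads) != 1:
        raise ValueError("links do not form a single chain")
    path = [heads[0]]
    while path[-1] in next_of:
        path.append(next_of[path[-1]])
    if len(path) != len(contigs):
        raise ValueError("some contigs are not on the chain")
    return path


# Links inferred from the text and Fig. 6 (assumed), given in arbitrary order
links = [("II", "I"), ("IV", "III"), ("III", "II"), ("V", "IV")]
print(order_contigs(links))  # ['V', 'IV', 'III', 'II', 'I']
```

Requiring each contig to have at most one neighbour on each side makes conflicting evidence fail loudly rather than silently produce an arbitrary order.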


Fig. 6.   Novel TREX1-containing transcripts. The genomic sequence (solid line) of a 25-kb region of human chromosome 3p21.2-21.3 (no. AC021328) containing the TREX1 ORF (exon 13) is present in the DNA contigs labeled I-V (boldface squiggles indicate the contig boundaries). The genomic sequence between contig III and contig IV is the human BAC AQ430752 identified in the GenBank™ GSS data base. The human (Hu), mouse (Ms), and pig (Pg) ESTs and human BAC sequence used to identify the correct order of the contigs are shown, and the positions of the exons in these ESTs (filled boxes) are indicated. The GENSCAN-predicted exons (1-13) in the genomic sequence and in the predicted transcript sequence are shown as filled boxes. Three human ESTs from the GenBank™ dbEST data base are aligned with the GENSCAN-predicted exons. The introns are shown as solid, bent lines. The novel TREX1 cDNAs (1-4) were recovered in this study by PCR using the indicated primers as described under "Results and Discussion." The locations of NNPP-predicted transcription initiation sites (-18 kb, -650 bp, and -140 bp), poly(A) sites (#1 and #2), and stop codons are indicated.

To search for TREX1 transcripts initiated as far as 18 kb 5' to the TREX1 ORF, potential exons within this 18-kb region were first identified using the gene-finding algorithm GENSCAN (47). GENSCAN predicts a hypothetical 13-exon transcript, with the TREX1 ORF as the most 3' exon, that encodes a protein of 887 amino acids (Fig. 6). Transcripts with this exact sequence have not been identified. However, human ESTs containing exons 1-4 (no. BE871415), exons 2-11 (no. AK022405), and exons 12 and 13 (no. BE615019) have been identified (Fig. 6). A PCR strategy was therefore developed to identify novel TREX1 cDNAs that contain the exon sequences encoded in the 18-kb 5'-flanking region (Fig. 6). First, a two-round PCR was performed with AML cDNA using the nested TREX1 ORF-specific primers (T1rv2 and T1rv1) and the nested 5'-flanking region primers (T1fr3 and T1fr4) to amplify cDNAs containing exons 10-13. Two products were recovered from agarose gels and sequenced. The sequences confirm the presence of exons 10-13 in these TREX1 cDNAs (Fig. 6, labeled 1 and 2). Furthermore, the sequences indicate that both PCR products were derived from transcripts spliced from exon 10 to exon 11 precisely as predicted by GENSCAN, but neither cDNA was spliced from exon 11 to exon 12. Additional PCRs were performed using the nested TREX1 ORF-specific primers (T1rv2 and T1rv1) either with the nested 5'-flanking region primers T1fr5 and T1fr6, to amplify cDNAs containing exons 4-13, or with the nested 5'-flanking region primers T1fr7 and T1fr5, to amplify cDNAs containing exons 1-13. Two additional TREX1 ORF-containing cDNAs were identified from these reactions (Fig. 6, labeled 3 and 4). The cDNA labeled 3 contains the GENSCAN-predicted exons 4-13 and two additional exons, labeled 5A and 6AB. The cDNA labeled 4 contains the predicted exons 1-13 and four additional exons: 5A, 6A, 6B, and 6C.
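
Deciding which predicted exons a sequenced cDNA contains, and which of them are directly joined by splicing, can be illustrated with exact string matching. The sketch below is a simplification under stated assumptions: the function `exon_content` is hypothetical, the exon and cDNA sequences are short toy strings rather than TREX1 sequence, and exact matching stands in for the sequence alignment that real cDNA-to-genome comparisons require. In the toy example the cDNA is spliced from exon 10 to exon 11 but retains intervening sequence between exons 11 and 12, mirroring the pattern reported for cDNAs 1 and 2.

```python
def exon_content(cdna: str, exons: dict[str, str]) -> tuple[list[str], list[tuple[str, str]]]:
    """Report which exons occur in a cDNA and which pairs are directly joined.

    `exons` maps an exon label to its sequence. Exact matching is used, so each
    exon is assumed to occur at most once and to be free of sequencing errors.
    """
    hits = []
    for name, seq in exons.items():
        pos = cdna.find(seq)  # -1 when the exon is absent from the cDNA
        if pos != -1:
            hits.append((pos, name, seq))
    hits.sort()  # order the exons as they appear along the cDNA
    present = [name for _, name, _ in hits]
    junctions = []
    for (pos1, name1, seq1), (pos2, name2, _) in zip(hits, hits[1:]):
        # A splice junction joins the end of one exon directly to the next exon.
        if pos1 + len(seq1) == pos2:
            junctions.append((name1, name2))
    return present, junctions


# Toy exons (hypothetical sequences) and a cDNA spliced 10->11 and 12->13
# that keeps an unspliced stretch between exons 11 and 12.
exons = {"10": "ACGTAC", "11": "GGATCC", "12": "TTGCAA", "13": "CAGCTG"}
cdna = "ACGTAC" + "GGATCC" + "GTAAGTATTTAG" + "TTGCAA" + "CAGCTG"
present, junctions = exon_content(cdna, exons)
print(present)    # ['10', '11', '12', '13']
print(junctions)  # [('10', '11'), ('12', '13')]
```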
The TREX1 ORF-containing cDNAs identified in these PCRs span the polyadenylation signal located between exons 11 and 12 (Fig. 6, labeled poly(A) #1). To generate these transcripts, RNA synthesis must proceed past the nonconsensus poly(A) #1 signal (AUUAAA) and through the TREX1 ORF to the consensus poly(A) #2 signal (AAUAAA) (Fig. 6). These data indicate a complex pattern of transcription initiation and RNA processing for the mammalian TREX1 gene and suggest an association between the TREX1 ORF and exons identified in the 5'-flanking region.
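
Locating candidate polyadenylation signals is, at its simplest, a hexamer search. The Python sketch below scans an RNA or DNA sequence for the consensus AAUAAA hexamer and the two nonconsensus variants named in this section (AUUAAA for TREX1 and AAGAAA for TREX2), reporting overlapping hits in DNA letters. It is an illustration only: the function name `find_polya_signals` and the example sequence are hypothetical, and the sketch ignores the downstream sequence elements and cleavage-site context that also influence 3' processing.

```python
import re

# Hexamers named in the text, in DNA letters: AATAAA is the consensus
# signal; ATTAAA and AAGAAA are the nonconsensus poly(A) #1 signals
# reported for TREX1 and TREX2, respectively.
POLYA_HEXAMERS = {"AATAAA": "consensus", "ATTAAA": "nonconsensus", "AAGAAA": "nonconsensus"}


def find_polya_signals(seq: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, hexamer, class) for each signal hexamer, overlaps included."""
    dna = seq.upper().replace("U", "T")  # accept RNA (U) or DNA (T) input
    hits = []
    for hexamer, kind in POLYA_HEXAMERS.items():
        # A zero-width lookahead lets finditer report overlapping occurrences.
        for match in re.finditer(f"(?={hexamer})", dna):
            hits.append((match.start(), hexamer, kind))
    return sorted(hits)


# Hypothetical RNA fragment carrying one nonconsensus and one consensus hexamer.
print(find_polya_signals("GCAUUAAAGCGCAAUAAAUC"))
# [(2, 'ATTAAA', 'nonconsensus'), (12, 'AATAAA', 'consensus')]
```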

A 5'-RACE analysis supports transcription initiation 5' to the GENSCAN-predicted exon 1. A two-round PCR was performed with spleen cDNA using the nested primers T1rv5 and T1rv4, designed from exon 1 and exon 3 sequences, and the cDNA adapter primers. Three independent clones were recovered, and their sequences were aligned with the TREX1 genomic sequence (data not shown). The 5' ends of these cDNAs lie within 150 bp of the predicted initiation ATG in exon 1. Analysis of this region using the NNPP program identifies a potential transcription initiation site 185 bp 5' to the predicted initiation ATG in exon 1, approximately 18 kb 5' to the TREX1 ORF (Fig. 6).

Identification of Novel TREX2 Transcripts-- A previous report suggested that the TREX2 ORF is part of a larger GENSCAN-predicted ORF located in a genomic clone (no. AF002998) from chromosome Xq28 (43). This hypothetical 16-exon transcript, with the TREX2 ORF as the most 3' exon, encodes a protein of 840 amino acids (Fig. 7). No ESTs correspond precisely to this predicted transcript. However, a human EST (no. AF267739) containing several of the exons in the GENSCAN-predicted ORF has been identified (Fig. 7). A PCR strategy was designed to identify novel TREX2 cDNAs that contain exons encoded in the 5'-flanking region of the TREX2 ORF. A two-round PCR was performed with AML cDNA using the nested TREX2 ORF-specific primers (T2rv1 and T2rv2) and the nested 5'-flanking region primers (T2fr3 and T2fr4) to amplify cDNAs containing exons 1-16 from across the 25-kb region. Four products were recovered from agarose gels and sequenced. The resulting cDNAs confirm the presence of transcripts containing exons that span the complete 25-kb 5'-flanking region of TREX2 (Fig. 7, labeled 1-4). Many of the 16 GENSCAN-predicted exons, as well as additional exons, are present in the TREX2 cDNAs. Predicted exons 3, 6, 7, 8, and 14 are not present in any of the recovered cDNAs, whereas two exons not predicted by GENSCAN (Fig. 7, labeled 12A and X) are detected. Exon X contains multiple stop codons in all three reading frames that disrupt the potential continuous ORF in these TREX2 cDNAs, and TREX2 cDNA 3 contains an additional stop codon in exon 12 (Fig. 7). Generation of these TREX2 transcripts requires RNA synthesis past the nonconsensus poly(A) #1 signal (AAGAAA) and through the TREX2 ORF to the consensus poly(A) #2 signal (AAUAAA).
Identification of these novel TREX2 cDNAs supports the concept that transcription initiation in the 5'-flanking region might generate transcripts that could be processed by RNA splicing to contain a single ORF including the 236-amino acid TREX2 sequence as the most 3' exon.
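
The statement that exon X carries stop codons in all three reading frames can be checked mechanically by scanning each forward frame for TAA, TAG, and TGA. The sketch below is illustrative: the helper names are hypothetical, the example sequences are toy strings rather than the exon X sequence, and only the sense strand is scanned, since that is the strand carried by the transcript.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stops_by_frame(seq: str) -> dict[int, list[int]]:
    """Map each forward reading frame (0, 1, 2) to 0-based positions of its stop codons."""
    dna = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    return {
        frame: [i for i in range(frame, len(dna) - 2, 3) if dna[i:i + 3] in STOP_CODONS]
        for frame in range(3)
    }


def blocked_in_all_frames(seq: str) -> bool:
    """True when every forward frame contains at least one stop codon."""
    return all(stops_by_frame(seq).values())


# Hypothetical toy sequences, not exon X itself.
print(stops_by_frame("TAAATAAATAA"))         # {0: [0], 1: [4], 2: [8]}
print(blocked_in_all_frames("TAAATAAATAA"))  # True
print(blocked_in_all_frames("ATGTAAGTGAC"))  # False
```

An exon that is blocked in every frame cannot lie inside a single continuous ORF, whichever frame the upstream exons impose.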


Fig. 7.   Novel TREX2-containing transcripts. The genomic sequence (solid line) of a 25-kb region of human chromosome Xq28 (no. AF002998) containing the TREX2 ORF (exon 16) is shown. The positions of the GENSCAN-predicted exons (1-16) in the genomic sequence and in the predicted transcript sequence are shown as filled boxes. A human (Hu) EST from the GenBank™ dbEST data base is aligned with the GENSCAN-predicted exons. Introns are shown as solid, bent lines. The novel TREX2 cDNAs (1-4) were recovered in this study by PCR using the indicated primers as described under "Results and Discussion." The locations of NNPP-predicted transcription initiation sites (-25 kb, -753 bp, and -623 bp), poly(A) sites (#1 and #2), and stop codons are indicated.

A 5'-RACE analysis also supports transcription initiation 5' to the GENSCAN-predicted exon 1 of TREX2. A two-round PCR was performed with spleen cDNA using the nested primers T2rv3 and T2rv4, designed from exon 10 sequences, and the cDNA adapter primers. Two clones were recovered, and their sequences were aligned with the TREX2 genomic sequence (data not shown). The 5' ends of these cDNAs are located near the predicted initiation ATG in exon 1. Analysis of this region using the NNPP program identifies a potential transcription initiation site upstream of the predicted initiation ATG in exon 1 (Fig. 7). This transcription initiation site is more than 25 kb 5' to the TREX2 ORF. The detection of novel TREX1 and TREX2 transcripts with a dicistronic structure, in which an upstream ORF precedes the TREX ORF, indicates a complex pattern of expression for these genes. The potential relationships between the ORFs encoded in the 5'-flanking regions and the TREX1 and TREX2 ORFs are not apparent. Furthermore, whether the TREX ORFs can be translated from these dicistronic transcripts has not been tested.
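
A dicistronic transcript, as the term is used here, carries two separate ORFs. One simple way to flag such a structure in a cDNA sequence is to scan the forward frames for ATG-initiated ORFs that end at the first in-frame stop codon. The sketch below is a deliberately simplified ORF finder under stated assumptions: `find_orfs` and the toy transcript are hypothetical, only the sense strand is scanned, each ORF starts at the first ATG after the previous stop, and the sequence context around each ATG, which influences initiation efficiency, is ignored. It therefore says nothing about whether the TREX ORFs are actually translated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return half-open 0-based (start, end) coordinates of sense-strand ORFs.

    Each ORF begins at the first ATG after the previous in-frame stop and ends
    with (and includes) the next in-frame stop codon; min_codons counts both.
    ORFs that run off the 3' end without a stop codon are not reported.
    """
    dna = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for a new ATG after this stop
    return sorted(orfs)


# Hypothetical toy transcript with two ORFs in different frames.
transcript = "CC" + "ATGAAATAA" + "GG" + "ATGTTTGGGTAG" + "C"
print(find_orfs(transcript))  # [(2, 11), (13, 25)]
```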

In conclusion, we have demonstrated that the TREX1 and TREX2 genes, which encode mammalian 3'→5' exonucleases, are expressed in all human tissues examined. The structures and expression patterns of the two TREX genes are similar in several respects. The 314-amino acid TREX1 protein is encoded in a single ORF, as is the 236-amino acid TREX2 protein. For both TREX1 and TREX2, transcripts are initiated within 1 kb of the exonuclease ORFs, and intronic sequences are removed from the 5'-untranslated region by two alternative RNA splicing pathways. Additional transcription initiation sites are identified 18 kb 5' to the TREX1 ORF and 25 kb 5' to the TREX2 ORF; these generate transcripts that contain the TREX ORF and a second, upstream ORF of unknown function. The detection of TREX1 and TREX2 transcripts in every tissue tested indicates that these genes are ubiquitously expressed and supports a requirement for these 3' exonucleases in DNA repair pathways in human cells.

    ACKNOWLEDGEMENTS

We thank Scott Harvey for excellent technical assistance throughout this work and Sallyanne Fossey and Don Bowden for hybrid scan analysis.

    FOOTNOTES

* This work was supported by National Institutes of Health Grants CA75350 and CA12197. The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

‡ To whom correspondence should be addressed. Tel.: 336-716-4349; Fax: 336-716-7200; E-mail: fperrino@wfubmc.edu.

Published, JBC Papers in Press, January 29, 2001, DOI 10.1074/jbc.M010051200

1 Mazur, D. J., and Perrino, F. W. (2001) J. Biol. Chem., Papers in Press, DOI 10.1074/jbc.M100623200.

3 The discovery of a second gene closely related to the gene for TREX1/DNase III necessitated the renaming of DNase III (gene DRN) to TREX1. The TREX designation provides a unique symbol for the two closely related 3' exonuclease genes and is used in other species. The murine orthologs of these genes have the approved symbols of Trex1 and Trex2. The TREX name reflects the biochemical activity and likely biological role as three prime repair exonucleases.

    ABBREVIATIONS

The abbreviations used are: Exo, exonuclease; no., GenBank™ accession no.; RT, reverse transcription; PCR, polymerase chain reaction; ORF, open reading frame; EST, expressed sequence tag; bp, base pair(s); kb, kilobase(s); contig, group of overlapping clones; AML, acute myeloblastic leukemia; NNPP, Neural Network Promoter Prediction; GSS, genome survey sequence.

    REFERENCES

1. Chung, D. W., Zhang, J., Tan, C.-K., Davie, E. W., So, A. G., and Downey, K. M. (1991) Proc. Natl. Acad. Sci. U. S. A. 88, 11197-11201
2. Kesti, T., Frantti, H., and Syväoja, J. E. (1993) J. Biol. Chem. 268, 10238-10245
3. Ropp, P. A., and Copeland, W. C. (1996) Genomics 36, 449-458
4. Shen, J. C., Gray, M. D., Oshima, J., Ashwini, S. K., Fry, M., and Loeb, L. A. (1998) J. Biol. Chem. 273, 34139-34144
5. Yu, C. E., Oshima, J., Fu, Y. H., Wijsman, E. M., Hisama, F., Alisch, R., Matthews, S., Nakura, J., Miki, T., Ouais, S., Martin, G. M., Mulligan, J., and Schellenberg, G. D. (1996) Science 272, 258-262
6. Shen, J. C., and Loeb, L. A. (2000) Trends Genet. 16, 213-220
7. Mummenbrauer, T., Janus, F., Müller, B., Wiesmüller, L., Deppert, W., and Grosse, F. (1996) Cell 85, 1089-1099
8. Albrechtsen, N., Dornreiter, I., Grosse, F., Kim, E., Wiesmüller, L., and Deppert, W. (1999) Oncogene 18, 7706-7717
9. Udell, C. M., Lee, S. K., and Davey, S. (1998) Nucleic Acids Res. 26, 3971-3976
10. Thelen, M. P., Onel, K., and Holloman, W. K. (1994) J. Biol. Chem. 269, 747-754
11. Parker, A. E., Van de Weyer, I., Laus, M. C., Oostveen, I., Yon, J., Verhasselt, P., and Luyten, W. H. (1998) J. Biol. Chem. 273, 18332-18339
12. Bessho, T., and Sancar, A. (2000) J. Biol. Chem. 275, 7451-7454
13. Ajimura, M., Leem, S. H., and Ogawa, H. (1993) Genetics 133, 51-66
14. Petrini, J. H., Walsh, M. E., DiMare, C., Chen, X. N., Korenberg, J. R., and Weaver, D. T. (1995) Genomics 29, 80-86
15. Paull, T. T., and Gellert, M. (1998) Mol. Cell 1, 969-979
16. Trujillo, K. M., Yuan, S. S., Lee, E. Y., and Sung, P. (1998) J. Biol. Chem. 273, 21447-21450
17. Mazur, D. J., and Perrino, F. W. (1999) J. Biol. Chem. 274, 19655-19660
18. Freemont, P. S., Friedman, J. M., Beese, L. S., Sanderson, M. R., and Steitz, T. A. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 8924-8928
19. Beese, L. S., and Steitz, T. A. (1991) EMBO J. 10, 25-33
20. Wang, J., Yu, P., Lin, T. C., Konigsberg, W. H., and Steitz, T. A. (1996) Biochemistry 35, 8110-8119
21. Derbyshire, V., Grindley, N. D. F., and Joyce, C. M. (1991) EMBO J. 10, 17-24
22. Reha-Krantz, L. J., Stocki, S., Nonay, R. L., Dimayuga, E., Goodrich, L. D., Konigsberg, W. H., and Spicer, E. K. (1991) Proc. Natl. Acad. Sci. U. S. A. 88, 2417-2421
23. Reha-Krantz, L. J., and Nonay, R. L. (1993) J. Biol. Chem. 268, 27100-27108
24. Bernad, A., Blanco, L., Lazaro, J. M., Martin, G., and Salas, M. (1989) Cell 59, 219-228
25. Koonin, E. V., and Deutscher, M. P. (1993) Nucleic Acids Res. 21, 2521-2522
26. Mian, I. S. (1997) Nucleic Acids Res. 25, 3187-3195
27. Mushegian, A. R., Bassett, D. E., Jr., Boguski, M. S., Bork, P., and Koonin, E. V. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 5831-5836
28. Moser, M. J., Holley, W. R., Chatterjee, A., and Mian, I. S. (1997) Nucleic Acids Res. 25, 5110-5118
29. Huang, S., Li, B., Gray, M. D., Oshima, J., Mian, I. S., and Campisi, J. (1998) Nat. Genet. 20, 114-116
30. Barnes, M. H., Spacciapoli, P., Li, D. H., and Brown, N. C. (1995) Gene (Amst.) 165, 45-50
31. Strauss, B. S., Sagher, D., and Acharya, S. (1997) Nucleic Acids Res. 25, 806-813
32. Taft-Benz, S. A., and Schaaper, R. M. (1998) Nucleic Acids Res. 26, 4005-4011
33. Ito, J., and Braithwaite, D. K. (1998) Mol. Microbiol. 27, 235-236
34. Viswanathan, M., and Lovett, S. T. (1999) J. Biol. Chem. 274, 30094-30100
35. Mol, C. D., Kuo, C. F., Thayer, M. M., Cunningham, R. P., and Tainer, J. A. (1995) Nature 374, 381-386
36. Gorman, M. A., Morera, S., Rothwell, D. G., de La Fortelle, E., Mol, C. D., Tainer, J. A., Hickson, I. D., and Freemont, P. S. (1997) EMBO J. 16, 6548-6558
37. Wilson, D. M., III, Takeshita, M., Grollman, A. P., and Demple, B. (1995) J. Biol. Chem. 270, 16002-16007
38. Chaudhry, M. A., Dedon, P. C., Wilson, D. M., III, Demple, B., and Weinfeld, M. (1999) Biochem. Pharmacol. 57, 531-538
39. Chou, K. M., Kukhanova, M., and Cheng, Y. C. (2000) J. Biol. Chem. 275, 31009-31015
40. Cho, Y., Gorina, S., Jeffrey, P. D., and Pavletich, N. P. (1994) Science 265, 346-355
41. Venclovas, C., and Thelen, M. P. (2000) Nucleic Acids Res. 28, 2481-2493
42. Lindahl, T., Gally, J. A., and Edelman, G. M. (1969) J. Biol. Chem. 244, 5014-5019
43. Hoss, M., Robins, P., Naven, T. J., Pappin, D. J., Sgouros, J., and Lindahl, T. (1999) EMBO J. 18, 3868-3875
44. Lizardi, P. M. (1983) Methods Enzymol. 96, 24-38
45. Kozak, M. (1987) Nucleic Acids Res. 15, 8125-8148
46. Tatusov, R. L., Galperin, M. Y., Natale, D. A., and Koonin, E. V. (2000) Nucleic Acids Res. 28, 33-36
47. Burge, C., and Karlin, S. (1997) J. Mol. Biol. 268, 78-94


Copyright © 2001 by The American Society for Biochemistry and Molecular Biology, Inc.