Rapid Protein Sequencing by Tandem Mass Spectrometry and cDNA Cloning of p20-CGGBP
A NOVEL PROTEIN THAT BINDS TO THE UNSTABLE TRIPLET REPEAT 5'-d(CGG)n-3' IN THE HUMAN FMR1 GENE*

(Received for publication, February 19, 1997, and in revised form, April 24, 1997)

Heidrun Deissler‡, Matthias Wilm§, Bülent Genç‡, Birgit Schmitz‡, Thomas Ternes‡, Frauke Naumann‡, Matthias Mann§ and Walter Doerfler‡

From the ‡Institut für Genetik, Universität zu Köln, D-50931 Köln and the §Protein & Peptide Group, European Molecular Biology Laboratory, D-69117 Heidelberg, Federal Republic of Germany



ABSTRACT

The autonomous expansion of the unstable 5'-d(CGG)n-3' repeat in the 5'-untranslated region of the human FMR1 gene leads to the fragile X syndrome, one of the most frequent causes of mental retardation in human males. We have recently described the isolation of a protein p20-CGGBP that binds sequence-specifically to the double-stranded trinucleotide repeat 5'-d(CGG)-3' (Deissler, H., Behn-Krappa, A., and Doerfler, W. (1996) J. Biol. Chem. 271, 4327-4334). We demonstrate now that the p20-CGGBP can also bind to an interrupted repeat sequence. Peptide sequence tags of p20-CGGBP obtained by nanoelectrospray mass spectrometry were screened against an expressed sequence tag data base, retrieving a clone that contained the full-length coding sequence for p20-CGGBP. A bacterially expressed fusion protein p20-CGGBP-6xHis exhibits a binding pattern to the double-stranded 5'-d(CGG)n-3' repeat similar to that of the authentic p20-CGGBP. This novel protein lacks any overall homology to other known proteins but carries a putative nuclear localization signal. The p20-CGGBP gene is conserved among mammals but shows no homology to non-vertebrate species. The gene encoding the sequence for the new protein has been mapped to human chromosome 3.


INTRODUCTION

DNA sequences containing trinucleotide repeats of the general sequence (CXG)n or (GAA)n appear to be genetically unstable. Amplifications of such repeats were found to be associated with several human diseases, e.g. fragile X (FraX)1 syndrome, Huntington's disease, myotonic dystrophy, or Friedreich's ataxia (for reviews see Refs. 1 and 2). The number of repeats usually correlates with the severity of the disease.

The FraX syndrome is associated with an amplification of the trinucleotide sequence 5'-d(CGG)n-3' in the 5'-UTR of the FMR1 (fragile X mental retardation) gene and extensive 5'-d(CG)-3' methylation in the repeat as well as in adjacent regions (3-7). As a consequence, the expression of the encoded protein FMRP is reduced or abolished. Fragile sites similar to this locus have been identified on several of the human chromosomes, and all sites characterized so far have shown an expansion of 5'-d(CGG)n-3' repeats suggesting an important structural function of this DNA sequence. Several studies have demonstrated that single-stranded oligodeoxyribonucleotides of the general sequence 5'-d(CXG)n-3' with X = A or G can form stable secondary structures in vitro (8-14). The mechanism of the amplification of unstable DNA sequences is not understood. Expansion might involve DNA slippage or unequal crossing over, but these models do not explain the observed phenomenon sufficiently (for review see Ref. 15). The function of the repeat itself is also unknown. A 5'-d(CGG)4-3' fragment in the rRNA gene promoter (16) and a 5'-d(CTG)25-3' element located in the promoter of the mouse growth inhibitory factor gene (17) are thought to function as regulatory elements.

Genetic instability of long 5'-d(CXG)n-3' tracts with deletion products of triplet repeat sequences has also been observed in Escherichia coli (18). The instability of long 5'-d(CGG)n-3' repeats in E. coli has been shown to depend on host cell genotype, length, polymorphism, and on the orientation of the triplet repeat relative to the replication origin (19). Expansion products of triplet repeat sequences have also been detected (20).

We are interested in the characterization of human cellular proteins binding to 5'-d(CGG)n-3' repeats in a sequence-specific manner. The study of these proteins may improve our understanding of triplet repeat amplifications and their function. Double-stranded as well as single-stranded simple trinucleotide repeat sequences are target sequences for sequence-specific DNA binding proteins (21-23). We have recently isolated the 20-kDa protein p20-CGGBP (5'-d(CGG)n-3' binding protein) from HeLa nuclear extracts by DNA affinity chromatography. This protein binds sequence-specifically to the unstable, double-stranded 5'-d(CGG)n-3' trinucleotide repeat (24). The protein requires more than eight repeats for proper binding. Base pair exchanges in the 1st base pair of every second triplet repeat abolish binding. None of the other known unstable trinucleotide repeats or either single strand of the trinucleotide repeat 5'-d(CGG)n-3' can serve as a target sequence for this protein. The binding of p20-CGGBP is also severely inhibited by complete or partial cytosine-specific DNA methylation of the binding motif. Interestingly, a p20-CGGBP activity has been found in several human and mammalian cell lines and in human primary lymphocytes.

Here we describe the rapid cloning of the full-length cDNA for p20-CGGBP by a novel strategy. Peptide sequence tags obtained by nanoelectrospray mass spectrometry (25) of the purified protein have been used to screen an EST data base. The encoded protein has been expressed in bacteria, and its binding specificity has been investigated in detail.


EXPERIMENTAL PROCEDURES

Cell Lines

Human HeLa cells were purchased from the Gesellschaft für Biotechnologische Forschung, Braunschweig, Germany.

Purification of p20-CGGBP from HeLa Nuclear Extracts

The 5'-d(CGG)n-3' binding protein p20-CGGBP was purified from HeLa nuclear extracts by anion exchange chromatography and DNA affinity chromatography as described (24) with slight modifications. A mixture of double-stranded (ds) and single-stranded calf thymus DNA-cellulose (Pharmacia Biotech Inc.) was used as an unspecific DNA matrix. As a specific DNA matrix 5'-d(CGG)17-3'ds-Sepharose was used. Each fraction was tested for its ability to bind sequence-specifically to the oligodeoxyribonucleotide 5'-d(CGG)17-3'ds as described. The sequences of the oligodeoxyribonucleotides used in this study are summarized in Table I. Protein fractions prepared for peptide sequencing were eluted from 5'-d(CGG)17-3'ds-Sepharose at 65 °C for 10 min in 5 mM Tris-HCl, 1% SDS, 2 mM dithiothreitol, 10 µg/ml aprotinin, pH 6.9, separated by SDS-polyacrylamide gel electrophoresis, and transferred to a polyvinylidene difluoride (PVDF) membrane (Problot, Applied Biosystems) in 10 mM CAPS, 10% methanol, pH 11. After staining of the membrane in Coomassie Blue R-250, the appropriate band was excised and stored at -20 °C (26).

Table I. Repetitive oligodeoxyribonucleotides used in competition experiments

Composition of ds     Sequence

(CGG)17ds             5'-(CGG CGG)8 CGG-3'
                      3'-(GCC GCC)8 GCC-5'
(CAG)17ds             5'-(CAG CAG)8 CAG-3'
                      3'-(GTC GTC)8 GTC-5'
(CGA)17ds             5'-(CGA CGA)8 CGA-3'
                      3'-(GCT GCT)8 GCT-5'
(CAA)17ds             5'-(CAA CAA)8 CAA-3'
                      3'-(GTT GTT)8 GTT-5'
(TGG)17ds             5'-(TGG TGG)8 TGG-3'
                      3'-(ACC ACC)8 ACC-5'
CGG8Tds               5'-(CGG TGG)8 CGG-3'
                      3'-(GCC ACC)8 GCC-5'
CGG8Ads               5'-(CGG AGG)8 CGG-3'
                      3'-(GCC TCC)8 GCC-5'
CGG8Gds               5'-(CGG GGG)8 CGG-3'
                      3'-(GCC CCC)8 GCC-5'
CGG10AGGds            5'-(CGG)3 AGG (CGG)9 AGG (CGG)3-3'
                      3'-(GCC)3 TCC (GCC)9 TCC (GCC)3-5'
CGG10AGG/(CCG)17      5'-(CGG)3 AGG (CGG)9 AGG (CGG)3-3'
                      3'-(GCC)3 GCC (GCC)9 GCC (GCC)3-5'
(CGG)17/CCG10CCT      5'-(CGG)3 CGG (CGG)9 CGG (CGG)3-3'
                      3'-(GCC)3 TCC (GCC)9 TCC (GCC)3-5'
FraxFds               5'-GTCCCCCGCTGCCGTCGCCGTCGCCGTCGCCGCCGCCGCCGCCGCCGCCGCC-3'
                      3'-CAGGGGGCGACGGCAGCGGCAGCGGCAGCGGCGGCGGCGGCGGCGGCGGCGG-5'
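
The shorthand in Table I can be expanded mechanically, which makes it easy to see which duplexes are fully paired and which carry mismatches. The following is a minimal sketch, not part of the original study; the function names `expand` and `mismatches` are hypothetical, and positions are counted from 0 along the upper (5'->3') strand.

```python
# Watson-Crick partner for each base.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def expand(units):
    """Expand [(triplet, count), ...] into a full strand string."""
    return "".join(triplet * count for triplet, count in units)


def mismatches(top, bottom):
    """List (position, top_base, bottom_base) where aligned strands do not pair.

    `top` is written 5'->3' and `bottom` 3'->5', as in Table I.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    return [(i, t, b) for i, (t, b) in enumerate(zip(top, bottom)) if PAIR[t] != b]


# CGG10AGGds: both strands carry the interruption, so the duplex is fully paired.
top_agg = expand([("CGG", 3), ("AGG", 1), ("CGG", 9), ("AGG", 1), ("CGG", 3)])
bottom_tcc = expand([("GCC", 3), ("TCC", 1), ("GCC", 9), ("TCC", 1), ("GCC", 3)])

# CGG10AGG/(CCG)17: interrupted upper strand annealed to an uninterrupted lower strand.
bottom_plain = expand([("GCC", 17)])

# (CGG)17/CCG10CCT: uninterrupted upper strand annealed to the interrupted lower strand.
top_plain = expand([("CGG", 17)])

print(len(top_agg))                       # number of nucleotides per strand
print(mismatches(top_agg, bottom_tcc))    # fully paired duplex
print(mismatches(top_agg, bottom_plain))  # A opposite G at two positions
print(mismatches(top_plain, bottom_tcc))  # C opposite T at two positions
```

Each heteroduplex in the table therefore carries two mismatched positions, one per interrupting triplet.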

Southwestern Blotting

Protein fractions isolated from HeLa nuclear extracts and enriched for the 5'-d(CGG)17-3'ds binding activity as well as fractions without such activity were separated by SDS-polyacrylamide gel electrophoresis. After transfer to a PVDF membrane (Fluorotrans, Pall, Dreieich, Germany) in 195 mM glycine, 25 mM Tris, pH 8.3, the membrane was washed in buffer SW (10 mM K-HEPES, 100 mM KCl, 0.5 mM dithiothreitol, 0.1 mM EDTA, 0.2 mM spermine, 10% glycerol, protease inhibitors, pH 7.9). All subsequent steps were carried out at 4 °C unless stated otherwise. Proteins were denatured by washing the membrane twice for 10 min in buffer SWA (10 mM K-HEPES, 100 mM KCl, 0.5 mM dithiothreitol, 0.1 mM EDTA, 10% glycerol, M guanidinium hydrochloride, pH 7.9) and were renatured by removing half of the volume of buffer SWA, replacing it with buffer SW, and incubating the membrane for 10 min. This step was repeated four times and followed by a 5-min incubation in buffer SW. The membrane was blocked in buffer SW containing 5% low fat milk powder, 10 µg/ml salmon sperm DNA, and 1 µg/ml poly(dA·dT) for 60 min. After washing the membrane in buffer SW with 0.5% low fat milk powder, the membrane was incubated for 2 h at 18 °C in buffer SW with 60 fmol of 32P-labeled oligodeoxyribonucleotide (1.7 × 10^6 cpm) and 4 µg/ml poly(dA·dT). The membrane was finally washed in buffer SW for 10 min, dried, and exposed for 2-4 days on Kodak XAR films.

Determination of the Amino Acid Sequence of p20-CGGBP and Identification of an EST Clone Coding for p20-CGGBP

The PVDF membrane carrying the blotted protein was destained in water. The protein was reduced, alkylated, and digested overnight with trypsin (12.5 ng/µl; Boehringer Mannheim, sequencing grade) at 37 °C in a 50 mM ammonium bicarbonate, 5 mM CaCl2 buffer (25, 27). Peptides were extracted from the membrane in 10 µl of aqueous 5% formic acid (Merck, Darmstadt, Germany) and subsequently in 10 µl of 1:1 acetonitrile:water, 5% formic acid (two changes). The resulting peptide mixture was dried and stored at -20 °C. The solution was reconstituted in 1 µl of 70% formic acid, rapidly diluted with 9 µl of water to avoid formylation, and was concentrated and desalted using 50 nl of PorosTM R2 material (Perspective Biosystems, Framingham, MA) prepared in a glass capillary as described previously (25, 27). The peptide mixture was eluted directly into a gold-coated glass capillary by passing 2 volumes of 0.4 µl of 50% methanol, 5% formic acid over the PorosTM column. The gold-coated glass capillary was mounted in the nanoelectrospray ion source (28, 29) on the mass spectrometer for peptide tandem mass spectrometry (MS) sequencing.

Tandem MS investigations were performed on an API III triple quadrupole mass spectrometer (PE Sciex, Ontario, Canada) equipped with an updated collision cell (30) and a nanoelectrospray ion source (28, 29). To detect peptides that were below the chemical noise level, the parent ion scan technique, a filtering method for triple quadrupole mass spectrometers, was used on the isoleucine/leucine immonium ion (31) (see Fig. 3). Only peptides that generated a predefined fragment ion upon fragmentation, in this instance the 86-Da immonium ion of isoleucine/leucine, were detected. By applying this filtering procedure, even peptides generating signals below the noise level in the spectrum could still be recognized.
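
The origin of the 86-Da reporter ion, and the reason why only some peptides appear in the parent ion scan, can be illustrated with a short calculation. This is a sketch under stated assumptions: the mass constants are standard monoisotopic values that are not given in this paper, and the helper names `immonium_mz` and `seen_in_parent_scan` are hypothetical. The peptide sequences are those reported under "Results".

```python
# Standard monoisotopic masses in Da (assumed reference values, not from this paper).
LEU_ILE_RESIDUE = 113.08406  # C6H11NO, identical for leucine and isoleucine
CO = 27.99491                # carbon monoxide lost when the immonium ion forms
PROTON = 1.00728


def immonium_mz(residue_mass):
    """m/z of the singly charged immonium ion: residue mass - CO + H+."""
    return residue_mass - CO + PROTON


def seen_in_parent_scan(peptide, reporter_residues="IL"):
    """True if the peptide contains a residue that can yield the reporter ion."""
    return any(residue in reporter_residues for residue in peptide)


print(round(immonium_mz(LEU_ILE_RESIDUE), 2))
for name, sequence in [("a", "FVVTAPPAR"), ("b", "VSVIQDFVK"), ("c", "TALYVPLD")]:
    print(name, seen_in_parent_scan(sequence))
```

Peptide a contains neither isoleucine nor leucine and therefore cannot produce the reporter ion, whereas peptides b and c can.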


Fig. 3. Determination of the sequence of tryptic peptides from p20-CGGBP with nanoelectrospray tandem mass spectrometry. After extraction from the PVDF membrane, desalting, and concentration over Poros R2 material, one peptide (peptide a) could be detected in the mass spectrum of the sample. Two additional peptides (peptides b and c) were identified by using the parent ion scan on 86, the immonium ion of isoleucine or leucine (2nd panel from top). This scanning technique was used to detect peptides below the chemical noise level in the original spectrum. All peptides, which fragment to produce the mass 86 Da, could be detected in this scan and thus be distinguished from chemical background ions. Peptide a did not yield a signal in the parent ion scan because it did not contain isoleucine or leucine (see below). A complete y ion series could be retrieved from the tandem MS spectrum of peptide a (39) resulting in the amino acid sequence FVVTAPPAR. The amino acid sequences of peptides a, b, and c (printed in bold) obtained by tandem mass spectrometry and their locations in the total protein are shown in the bottom part of this figure. A nuclear localization signal (aa 69-84) was detected by computer-aided sequence analyses.

Data base searches were performed using the sequence tag approach and the PeptideSearch program (32). Sequence tags present information on part of a peptide sequence and typically comprise an internal sequence stretch of about three to four amino acids and the mass location in a peptide of known total mass. Frequently, this information can be gained from a tandem mass spectrum and facilitates the highly specific identification of the underlying complete peptide sequence during a data base search. A non-redundant data base currently containing more than 200,000 protein sequences or open reading frames was searched. Another data base was prepared from the available EST data base dbEST (33, 34) which currently contains about 500,000 human sequence entries. The EST clone ID269133 (Genbank accession numbers [GenBank] and [GenBank]) was obtained via the I.M.A.G.E. consortium (35), distributed by Research Genetics, Huntsville, AL, and resequenced on automated DNA sequencers.
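
How a stretch of sequence is read from a tandem mass spectrum can be sketched with an idealized y-ion ladder: the spacing between consecutive y ions equals the mass of one residue. The example below is an illustration only. It uses standard monoisotopic residue masses (assumed reference values), computes noise-free y ions for peptide a, and reads the sequence back; the tolerance of 0.01 Da and the function names are hypothetical and do not reflect the accuracy of the instrument used in this study.

```python
# Standard monoisotopic residue masses in Da (assumed reference values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def y_ions(peptide):
    """Singly charged y-ion m/z values, starting with y1 (the C-terminal residue)."""
    values = []
    running = WATER + PROTON
    for residue in reversed(peptide):
        running += MONO[residue]
        values.append(running)
    return values


def read_ladder(mz_values, tol=0.01):
    """Convert consecutive y-ion spacings back into residues, N terminus first.

    Isobaric residues are reported together (for example "I/L").
    """
    residues = []
    previous = WATER + PROTON
    for mz in mz_values:
        delta = mz - previous
        names = sorted(aa for aa, mass in MONO.items() if abs(mass - delta) <= tol)
        residues.append("/".join(names) if names else "?")
        previous = mz
    return residues[::-1]


ladder = y_ions("FVVTAPPAR")
print([round(mz, 2) for mz in ladder])
print("".join(read_ladder(ladder)))
print(read_ladder(y_ions("VSVIQDFVK")))
```

In the second peptide the fourth position is returned as "I/L", which mirrors the ambiguity noted for the sequence tags in this study.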

Expression of the Fusion Protein p20-CGGBP-6xHis in E. coli and Purification by Ni-Chelate Chromatography

A fragment p20-518 was amplified from the plasmid 269133 by using primers p20Bam5' (5'-dCGCGGATCCGAGCGATTTGTAGTAACAGCA-3') and p20Kpn3' (5'-dGGGGTACCTCAACAATCTTGTGAGTTGAG-3') using Pfu DNA polymerase (Stratagene). The fragment p20-518 contained a BamHI recognition site at the 5'-end, a site for KpnI at the 3'-end, and coded for amino acids (aa) 2-166 of p20-CGGBP. This fragment was cloned into the vector pQE40 (Qiagen, Hilden, Germany) and cut with BamHI and KpnI. Thereby, a histidine hexamer was introduced at the N terminus of the fusion protein (see below). The correct sequence of the fusion construct pQE40-518 was ascertained by DNA sequencing of 10 independent clones after transformation into E. coli M15[pREP4] (Qiagen). The vector pQE40 encoded the fusion protein 6xHis-dihydrofolate reductase (DHFR-6xHis). Its coding sequence was removed by cleavage with the above-mentioned enzymes.
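
The two primers can be checked in silico for the restriction sites named above and for the reading frame they impose. This is a minimal sketch, not the authors' procedure; the helper names are hypothetical, the codon table is the standard genetic code, and reading the 3' primer as a reverse complement in frame from its first base after the KpnI site is an assumption.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna[::-1].translate(str.maketrans("ACGT", "TGCA"))


P20_BAM_5 = "CGCGGATCCGAGCGATTTGTAGTAACAGCA"   # primer p20Bam5'
P20_KPN_3 = "GGGGTACCTCAACAATCTTGTGAGTTGAG"    # primer p20Kpn3'
BAMHI_SITE = "GGATCC"
KPNI_SITE = "GGTACC"

bam_at = P20_BAM_5.find(BAMHI_SITE)
kpn_at = P20_KPN_3.find(KPNI_SITE)

# Gene-specific parts of the primers, downstream of each restriction site.
coding_5 = P20_BAM_5[bam_at + len(BAMHI_SITE):]
coding_3 = reverse_complement(P20_KPN_3[kpn_at + len(KPNI_SITE):])

print(bam_at, translate(coding_5))
print(kpn_at, translate(coding_3))
```

Under these assumptions the gene-specific part of the 5' primer translates to ERFVVTA, which overlaps the start of peptide a (FVVTAPPAR) and is consistent with a fragment that begins at the second codon.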

The E. coli strains containing the appropriate constructs were grown in liquid culture at 37 °C to an A600 nm of 0.7. Synthesis of the fusion protein was induced by adjusting the medium to 1.5 mM isopropyl-1-thio-β-D-galactopyranoside (MBI Fermentas). After incubation for another 2 h, bacteria were pelleted and lysed for 30 min on ice in buffer SB (50 mM sodium phosphate, 10 mM Tris-HCl, 300 mM NaCl, 1 mM MgCl2, 0.1% Tween 20, 20% glycerol, 8 µg/ml aprotinin, pH 7.85) in the presence of 1 mg/ml lysozyme (Calbiochem). The lysate was sonicated on ice for 1 min at 40 watts (Sonifier B12, Branson Sonic Power Company, Danbury, CT). The cytoplasmic extract was collected after centrifugation (10,000 × g, 4 °C, 10 min), frozen in liquid nitrogen, and stored at -80 °C.

The fusion protein p20-CGGBP-6xHis and the control protein DHFR-6xHis were purified by Ni2+-chelate affinity chromatography from bacteria expressing the constructs pQE40-518 and pQE40, respectively. Proteins from cytoplasmic extracts were bound to Ni2+-nitrilotriacetic-agarose (Qiagen) at 4 °C for 3 h and, subsequently, at room temperature for 30 min. All following steps were carried out at 4 °C. The material was washed with buffer SB and SB100 (same as SB but containing 100 mM imidazole, pH 6.5), and the fusion proteins were eluted with buffer SB500 (same as SB, but containing 500 mM imidazole, pH 6.0). Since imidazole inhibited the DNA-binding activity, the eluted fusion proteins were equilibrated in 20 mM K-HEPES, 100 mM NaCl, 1 mM MgCl2, 0.15 mM spermine, 0.1 mM EDTA, 0.5 mM dithiothreitol, 20% glycerol, 0.01% Tween 20, and protease inhibitors, pH 7.9. Even in the presence of protective carrier proteins, the purified proteins were stable at -80 °C for only 2 weeks. The purity of the fusion proteins was followed by electrophoresis on SDS-polyacrylamide gels. The DNA-binding specificities of the purified fusion proteins and of the bacterial lysates were tested by electrophoretic mobility shift assays (EMSA).

Electrophoretic Mobility Shift Assay

EMSAs were carried out essentially as described (24). Oligodeoxyribonucleotides (nomenclature and sequences see Table I) were hybridized in a thermal cycler and 5'-end-labeled to a specific activity of 15,000 cpm/fmol by T4 polynucleotide kinase. When the binding activity of the bacterially expressed fusion protein p20-CGGBP-6xHis was determined, 10 µg of bovine serum albumin and/or 0.15 mM spermine were added per assay to increase the stability of the protein.

Isolation of RNA, Northern Blot, and Southern Blot Hybridizations

Total cellular RNA was isolated from cell lines according to standard procedures. Electrophoresis was carried out under denaturing conditions (36). The RNA was transferred to a nylon membrane (Qiabrane, Qiagen) and hybridized (37) with 32P-labeled probe p20-701. A dot blot membrane carrying standardized amounts of mRNA isolated from a number of human tissues (human master mRNA blot, CLONTECH, Heidelberg, Germany) was also hybridized with the p20-701 probe.

A somatic cell hybrid panel (Oncor, Inc., Heidelberg, Germany) containing 15 µg of PstI-cleaved genomic DNA from 23 different rodent cell lines each carrying one human chromosome was hybridized with probe p20-779. The DNA probe p20-779 contained the complete 779-bp insert and was isolated from the EST clone ID269133 after EcoRI and NotI cleavage. The DNA fragment p20-701 lacked the poly(A)-tail of the insert and was isolated after BfaI cleavage of probe p20-779. Hybridization probes were 32P-labeled by randomly primed tagging (38).


RESULTS

p20-CGGBP Binds Directly to Its Target Sequence

Protein fractions isolated from HeLa nuclear extracts and highly enriched for the 5'-d(CGG)17-3'-ds-binding activity were analyzed by Southwestern blotting for their binding activity to the target sequence of the p20-CGGBP protein (Fig. 1). Separated and blotted proteins were incubated with a 32P-labeled oligodeoxyribonucleotide that carried either 17 5'-d(CGG)-3' repeats [(CGG)17ds] or with the control oligodeoxyribonucleotide (CAG)17ds that contained 17 5'-d(CAG)-3' repeats. A band of about 20 kDa was detected in all fractions highly enriched for p20-CGGBP with 32P-labeled (CGG)17ds as a binding probe (Fig. 1, lanes designated fractions III and IV). This band was not present when (CAG)17ds was used as a probe. It also failed to be detected in a fraction deficient of 5'-d(CGG)17-3'-ds-binding activity. An additional band of 100 kDa was observed in some fractions as well as in crude nuclear extracts (Fig. 1, and data not shown) when either binding probe was used. This activity was likely due to unspecific DNA-protein interactions since this binding was also detected with HeLa nuclear extracts when a variety of single- or double-stranded oligodeoxyribonucleotides were used as binding probes (data not shown). These results confirmed that p20-CGGBP could bind directly to its target sequence without the involvement of additional cellular proteins.


Fig. 1. Southwestern blot analysis of p20-CGGBP. Protein fractions eluted from 5'-d(CGG)17-3'-ds-Sepharose (fractions III and IV) were separated by SDS-polyacrylamide gel electrophoresis (left panel, designated gel), blotted, and incubated (right panels, designated blot) with either (CGG)17ds or (CAG)17ds as 32P-labeled binding probes. Fraction III was isolated after the first step of specific DNA-affinity chromatography, and fraction IV was isolated after the second step. Both fractions showed high binding activity to the oligodeoxyribonucleotide (CGG)17ds in EMSAs (24). A specific band at 20 kDa was detected in fractions III and IV with the oligodeoxyribonucleotide (CGG)17ds as binding probe. The signal at 100 kDa was unspecific.

Determination of the Amino Acid Sequence of the p20-CGGB-Protein: EST Clone ID269133 Contains the Complete Coding Sequence for p20-CGGBP

For the determination of the amino acid sequence of several internal peptides of p20-CGGBP, about 400 ng of p20-CGGBP were isolated from 5 × 10^9 HeLa cells, separated from accompanying proteins by SDS-polyacrylamide gel electrophoresis, and transferred to a PVDF membrane. The experimental design for the determination of the amino acid sequence is outlined in Fig. 2. Extraction of the peptides from the membrane resulted in limited recovery, as seen by the poor signal-to-noise ratio in Fig. 3, upper panel. Only one peptide was apparent whose fragmentation spectrum is shown in Fig. 3, lower panel. To detect peptide ion signals below the level of chemical noise, a parent ion scan for the immonium ion of isoleucine/leucine was performed. This scan revealed two more peptides that were subsequently fragmented (Fig. 3, 2nd panel from top). Tandem MS spectra of all three peptides were generated. Interpretation of these spectra resulted in one complete and two partial sequences. The sequence of peptide "a" was determined to be FVVTAPPAR. For peptide "b," interpretation of the high m/z range resulted in the N-terminal sequence VSV(I/L) or a peptide sequence tag (17 Da) VSV(I/L) (634.4 Da). A peptide sequence tag could also be assigned to peptide "c": (172.2 Da) (I/L)YV (601.80 Da). A search of a large non-redundant protein data base containing more than 200,000 entries revealed no match to the sequence FVVTAPPAR or the sequence tags, indicating that the protein was unknown. A search in the expressed sequence tag data base (dbEST) with PeptideSearch, however, did retrieve a matching EST (clone ID269133). This clone had been sequenced from the 3'- (Genbank accession no. [GenBank]) and 5'-ends (Genbank accession no. [GenBank]). Peptide a was found in the former and peptide b in the latter. The retrieved sequence for peptide b was VSVIQDFVK, which matched the obtained tandem mass spectrum for this peptide.
Peptide c did not match directly, but regions 1 and 2 of the peptide sequence tag were found to match, indicating a sequence error in the DNA sequencing coding for the C-terminal part of the peptide (32). Resequencing of the clone indeed revealed a different DNA sequence. After its correction the peptide sequence was TALYVPLD which also led to complete agreement with the tandem mass spectrum of peptide c. The derived amino acid sequence of the 501-bp open reading frame, obtained after double-stranded resequencing of the EST clone, thus contained all three peptides covering 28 amino acids (Fig. 3, bottom).
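
The idea of an error-tolerant tag match, in which two of the three tag regions agree while the third is disturbed by a sequencing error, can be sketched as follows. This is a simplified illustration and not the PeptideSearch algorithm: a tag is represented here as (mass of residues before the stretch, sequence stretch, mass of residues after the stretch), which differs from the fragment-mass notation used above. The function names, the 0.5-Da tolerance, and the erroneous data base entry `TAIYVPRH` are hypothetical; the uncorrected EST sequence is not reproduced here.

```python
# Standard monoisotopic residue masses in Da (assumed reference values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
I_TO_L = str.maketrans("I", "L")  # leucine and isoleucine are isobaric


def residue_mass(sequence):
    """Sum of residue masses for a peptide fragment (empty string gives 0)."""
    return sum(MONO[aa] for aa in sequence)


def make_tag(peptide, start, length):
    """Build a simplified (m1, stretch, m3) tag from a known peptide."""
    stretch = peptide[start:start + length]
    return (residue_mass(peptide[:start]), stretch,
            residue_mass(peptide[start + length:]))


def match_regions(candidate, tag, tol=0.5):
    """Return (region1_ok, region2_ok, region3_ok) for the best tag placement."""
    m1, stretch, m3 = tag
    cand = candidate.translate(I_TO_L)
    wanted = stretch.translate(I_TO_L)
    best = (False, False, False)
    pos = cand.find(wanted)
    while pos != -1:
        r1 = abs(residue_mass(candidate[:pos]) - m1) <= tol
        r3 = abs(residue_mass(candidate[pos + len(stretch):]) - m3) <= tol
        result = (r1, True, r3)
        if sum(result) > sum(best):
            best = result
        pos = cand.find(wanted, pos + 1)
    return best


def error_tolerant_hit(regions):
    """Accept a candidate when the stretch and at least one flanking mass agree."""
    r1, r2, r3 = regions
    return r2 and (r1 or r3)


tag_c = make_tag("TALYVPLD", start=2, length=3)   # stretch LYV
print(match_regions("TALYVPLD", tag_c))           # corrected sequence
print(match_regions("TAIYVPRH", tag_c))           # hypothetical entry with a C-terminal error
print(error_tolerant_hit(match_regions("TAIYVPRH", tag_c)))
```

A candidate whose C-terminal part is wrong still scores on regions 1 and 2, which is the pattern that prompted resequencing of the clone in this study.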


Fig. 2. Strategy for the determination of the amino acid sequence of p20-CGGBP. After gel separation and digestion of the protein, the peptides in the unseparated mixture were sequenced by nanoelectrospray tandem mass spectrometry (25). Peptide sequence tags consisting of short amino acid sequences with their mass locations in the peptides were used to search EST data bases in an error-tolerant way to compensate for possible sequence mistakes in the EST entries. When a clone was identified, it was obtained from a generally available repository (35), and the full insert length was sequenced. The higher sequencing accuracy and the longer sequence then allowed more of the peptide data to be matched against the DNA sequence. This approach further verified the identity of the clone. When the clone contained the full-length coding sequence, it could be immediately subcloned for expression. Alternatively, different EST sequences and clones could be assembled for a full-length clone, or the matching EST itself could be used as a probe for further library screening.

The analysis of the cDNA sequence revealed that the nucleotide sequence context of a putative start codon ATG (AGGATGG) at nucleotide position 197 of the 779-bp insert was in accordance with the Kozak rules (40) for functional translational start sites in eucaryotes. In addition, several very short ORFs were detected 5' of the putative start codon. The stop codon at nucleotide 698 was followed by a polyadenylation signal at nucleotide 729 and a poly(A)-tail starting at nucleotide 759. The 501-bp ORF encoded a 166-aa long protein with a molecular mass of about 19 kDa which is in accordance with the apparent molecular mass for p20-CGGBP as determined by SDS-polyacrylamide gel electrophoresis. The results indicated that the EST clone ID269133 contained the complete coding sequence of p20-CGGBP.
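
The relationship between ORF length, protein length, and molecular mass can be checked with simple arithmetic. The sketch below assumes that the 501-bp ORF includes the stop codon (which is what the stated 166 residues imply) and uses an average residue mass of 110 Da, a common rule of thumb that is not taken from this paper; the function name `orf_summary` is hypothetical.

```python
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def orf_summary(orf_bp, avg_residue_da=AVG_RESIDUE_DA):
    """Return (total codons, coding codons, estimated mass in kDa).

    Assumes the ORF length includes one terminal stop codon.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    coding_codons = codons - 1  # the last codon is the stop codon
    estimated_kda = coding_codons * avg_residue_da / 1000.0
    return codons, coding_codons, estimated_kda


codons, residues, kda = orf_summary(501)
print(codons, residues, round(kda, 1))
```

The composition-independent estimate falls slightly below the value of about 19 kDa given above, which is derived from the actual sequence; both are close to the apparent mass of 20 kDa.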

Properties of the p20-CGGBP cDNA

Northern blot analyses of RNA isolated from human HeLa cells detected only one transcript with an apparent size of 1.2 kilobases with p20-CGGBP cDNA (probe p20-701) as hybridization probe (data not shown). This finding suggested the presence of a large 5'-UTR of p20-CGGBP-specific RNA. Low level expression of this RNA was found in a number of human tissues by dot blot RNA hybridization. In contrast to very low level expression in whole fetal brain, expression was high in adult cerebellum and cerebral cortex but low in other regions of the adult brain. RNA isolated from human placenta, thymus, and lymph nodes also contained relatively high levels of a p20-CGGBP-specific RNA. These results agree with the observation that several EST clones with extensive homologies to the clone ID269133 (35) had been isolated from different human tissues as well as from mouse and rat. Moreover, DNA from several mammalian sources yielded specific signals with the p20-CGGBP cDNA probe, whereas DNA from non-mammalian species, except chicken DNA, did not (Fig. 4). This high degree of conservation of the p20-CGGBP sequence among mammals and its expression in a variety of human tissues are consistent with our observation that an activity binding to the double-stranded 5'-d(CGG)n-3' trinucleotide repeat was detected in a large number of human and mammalian cell lines (24). However, a yeast sequence homologous to the EST clone ID269133 was not found. Analyses of the derived amino acid sequence did not reveal any overall homology to known proteins. Computer-aided sequence analyses detected a putative nuclear localization signal between aa 69 and 84 (Fig. 3, bottom panel).


Fig. 4. The p20-CGGBP is conserved among mammals. Genomic DNA, 15 µg each, from sources as indicated was cleaved with HindIII, analyzed by Southern blotting and hybridization at 68 °C to the 32P-labeled cDNA of the p20-CGGBP gene as a probe with subsequent stringent washing in 0.1 × SSC, 0.1% SDS. Genomic DNA was prepared by standard procedures either from freshly frozen tissues (mouse, rat, pig, chicken, frog (Xenopus laevis), and Drosophila melanogaster) or from cell lines HeLa (human), BHK21 (hamster), MDCK (dog), FHM (fish, fat head minnow), or SF21 (insect, Spodoptera frugiperda).

The results presented are in accordance with the finding that protein-DNA complex formation between p20-CGGBP and its target sequence cannot be competed by a set of oligodeoxyribonucleotides carrying consensus binding sequences for known DNA-binding proteins. We, therefore, conclude that p20-CGGBP is a novel DNA-binding protein.

Chromosomal Localization of the Gene Encoding p20-CGGBP

A somatic cell hybrid panel with genomic DNA from mouse or hamster hybrid cell lines, each carrying one specific human chromosome, was hybridized to the complete insert of EST clone ID269133 (probe p20-779). A strong human DNA-specific signal was detected with DNA from human chromosome 3 (arrowhead in Fig. 5). The intense cross-hybridization to the corresponding rodent genes (mouse and hamster) of p20-CGGBP was consistent with its high degree of conservation in mammals.


Fig. 5. Chromosomal localization of the gene for p20-CGGBP. DNA isolated from rodent cell lines carrying DNA of the indicated human chromosomes was hybridized to the 32P-labeled p20-CGGBP cDNA (probe, p20-779), and a human-specific signal was detected on chromosome 3 (see arrowhead). Also note the hybridization to genomic mouse and hamster DNAs.

The Fusion Protein p20-CGGBP-6xHis Binds Sequence-specifically to the Double-stranded Trinucleotide Repeat 5'-d(CGG)n-3'

The binding specificity of the protein encoded by the 501-bp ORF of EST clone ID269133 was further characterized. The protein was expressed in E. coli which was chosen as a host because an activity similar to p20-CGGBP did not occur in this procaryote (see also Fig. 6C). A histidine-hexamer was introduced at the N terminus of the coding sequence of p20-CGGBP starting with the second amino acid to avoid internal translational start sites. The expressed fusion protein p20-CGGBP-6xHis showed an apparent molecular mass of 20 kDa upon electrophoresis in SDS-polyacrylamide gels as expected (Fig. 6A). Purification of the recombinant protein by Ni2+-chelate affinity chromatography was feasible only under native conditions and resulted in pure p20-CGGBP-6xHis (Fig. 6A). As a control, the mammalian enzyme dihydrofolate reductase (DHFR-6xHis) was also expressed and purified from bacteria under similar experimental conditions (Fig. 6A).


Fig. 6. Recombinant p20-CGGBP-6xHis binds specifically to the double-stranded oligodeoxyribonucleotide 5'-d(CGG)17-3'-ds. A, p20-CGGBP-6xHis and the control protein DHFR-6xHis were purified by Ni2+-chelate chromatography from bacteria expressing the appropriate constructs. B, purified recombinant p20-CGGBP-6xHis or the authentic p20-CGGBP were bound to the oligodeoxyribonucleotide (CGG)17ds in the presence or absence of several oligodeoxyribonucleotides as competitors (see Table I). A 100-fold molar excess of the competitor DNAs in comparison to the binding probe was used in these experiments. The specific p20-CGGBP-5'-d(CGG)17-3'-ds-complex cI was competed only by the homologous oligodeoxyribonucleotide. Complex cI showed identical electrophoretic mobilities with the recombinant or the authentic protein. C, lysates from a strain expressing DHFR-6xHis as well as purified DHFR-6xHis lacked any 5'-d(CGG)17-3'-ds binding activity.

Bacterial lysates were prepared from bacteria induced for the expression of p20-CGGBP-6xHis or for DHFR-6xHis as well as from uninduced bacteria. The lysates were tested for the presence of proteins capable of binding to the oligodeoxyribonucleotide (CGG)17ds. Formation of the specific complex cI between (CGG)17ds and bacterial proteins was detected only in lysates prepared from bacteria expressing the fusion protein p20-CGGBP-6xHis (Fig. 6B) and not from those expressing DHFR-6xHis (Fig. 6C).

The specificity of the complex cI between the fusion protein p20-CGGBP-6xHis and the double-stranded target sequence was assessed with purified recombinant p20-CGGBP-6xHis that was bound to (CGG)17ds in the presence of oligodeoxyribonucleotides containing other trinucleotide or related sequence repeats (see Table I). The formation of the complex cI, which exhibited a similar mobility as the one formed with p20-CGGBP isolated from HeLa nuclei, was competed only by the homologous oligodeoxyribonucleotide (CGG)17ds (Fig. 6B) and not by oligodeoxyribonucleotides carrying different trinucleotide repeats, such as (CAG)17ds, (CGA)17ds, (TGG)17ds, and (CAA)17ds (Fig. 6B). Furthermore, no competition was detected with oligodeoxyribonucleotides containing base pair exchanges at the first base of every other triplet repeat. In addition, the oligodeoxyribonucleotide FraxFds, which was isolated from the FraxF locus (41) and contained only eight 5'-d(CGG)-3' repeats adjacent to other triplets, failed to compete for binding. Thus, more than eight 5'-d(CGG)-3' repeats were apparently required for proper binding. These competition patterns were identical to those described previously for HeLa nuclear extracts or for p20-CGGBP purified from them (24). However, the binding affinity of purified recombinant p20-CGGBP-6xHis was lower than that of the authentic p20-CGGBP. Possibly the recombinant protein could have been partly inactivated during or after purification by metal chelate chromatography, since the binding activity of the raw bacterial lysate was quite high (data not shown). The addition of carrier protein (e.g. bovine serum albumin) or of polyamines (e.g. spermine) increased the stability of the purified protein slightly without influencing its binding specificity.

These DNA binding studies confirmed that the EST clone ID269133 contained the full-length coding sequence for p20-CGGBP. Furthermore, p20-CGGBP was found to be sufficient for the formation of complex cI. Thus, involvement of other cellular proteins in the formation of complex cI was unlikely.

p20-CGGBP Also Binds to Interrupted Repeats and Tolerates an A/G Mismatch in Its Target Sequence

FMR1 promoter sequences carrying more than 40 repeats of the 5'-d(CGG)-3' stretch appeared to be stable upon female transmission when these repeats were interrupted by 5'-d(AGG)-3' triplets at every 7th to 10th position (42-44). The loss of these interrupting 5'-d(AGG)-3' triplets at the 3'-end of a longer 5'-d(CGG)-3' trinucleotide repeat was implicated as a possible first step in the expansion.

Binding of purified p20-CGGBP or crude nuclear extract to the target oligodeoxyribonucleotide (CGG)17ds was competed in the presence of the oligodeoxyribonucleotide CGG10AGGds (see Table I). This oligodeoxyribonucleotide contained two 5'-d(AGG)-3' interruptions and an uninterrupted stretch of nine 5'-d(CGG)-3' repeats (Fig. 7). This result suggested that p20-CGGBP could indeed bind to an interrupted repeat.


Fig. 7. p20-CGGBP binds to the interrupted repeat 5'-d(CGG)3 AGG (CGG)9 AGG (CGG)3-3'-ds. HeLa nuclear extracts (ne) were bound to the oligodeoxyribonucleotide (CGG)17ds in the presence of one of the oligodeoxyribonucleotides (CGG)17ds, CGG10AGGds, CGG10AGG/(CCG)17, (CGG)17/CCG10CCT, CGG8Ads, or CGG8A/(CCG)17. The specific complex cI was competed by all competitors used, except CGG8Ads. Similar results were obtained with purified p20-CGGBP. See also Table I.
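
The triplet composition of the interrupted competitor can be written out explicitly. The following is a minimal Python sketch for illustration only, assuming the top-strand sequence given in the legend to Fig. 7 and a reading frame that starts at the first base; the function names are hypothetical and are not part of the original analysis.

```python
def build_repeat(blocks):
    """Concatenate (triplet, count) blocks into one top-strand sequence."""
    return "".join(triplet * count for triplet, count in blocks)


def longest_cgg_run(seq):
    """Return the largest number of consecutive in-frame CGG triplets."""
    # Split into triplets, assuming the frame starts at position 0.
    usable = len(seq) - len(seq) % 3
    triplets = [seq[i:i + 3] for i in range(0, usable, 3)]
    best = run = 0
    for triplet in triplets:
        run = run + 1 if triplet == "CGG" else 0
        best = max(best, run)
    return best


# Top strand of CGG10AGGds as written in the legend to Fig. 7.
interrupted = build_repeat(
    [("CGG", 3), ("AGG", 1), ("CGG", 9), ("AGG", 1), ("CGG", 3)]
)
# Top strand of the uninterrupted target (CGG)17ds.
uninterrupted = build_repeat([("CGG", 17)])

print(len(interrupted) // 3, longest_cgg_run(interrupted))
print(len(uninterrupted) // 3, longest_cgg_run(uninterrupted))
```

Both oligodeoxyribonucleotides comprise 17 triplets, but the longest uninterrupted 5'-d(CGG)-3' stretch in the interrupted sequence is nine repeats, one more than the eight repeats in FraxFds that failed to compete.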

Similarly, the ds oligodeoxyribonucleotides (CGG)17/CCG10CCT and CGG10AGG/(CCG)17 (see Table I) also competed for the binding to (CGG)17ds (Fig. 7). These results indicate either that nine 5'-d(CGG)-3' repeats are sufficient for binding or that one mismatch (A/G or C/T) in the binding sequence can be tolerated. Competition experiments with oligodeoxyribonucleotides carrying a mismatch in the first position of every second triplet (for details and sequences see Table II) showed that the presence of several A/G mismatches did not inhibit binding of p20-CGGBP to its target sequence.

Table II. Oligodeoxyribonucleotides carrying mismatch base pairs used in EMSA

General sequence:
5'-(CGGNGG)8CGG-3' = CGG8N
3'-(GCCPCC)8GCC-5' = GCC8P

| Composition of ds | N | P | Mismatch | Competition for (CGG)17ds binding |
|---|---|---|---|---|
| (CGG)17ds | C | G | No | Yes |
| CGG8Tds | T | A | No | Partly |
| CGG8Ads | A | T | No | No |
| CGG8Gds | G | C | No | No |
| CGG8A/CCG8A | A | A | A/A | No |
| CGG8A/CCG8C | A | C | A/C | No |
| (CGG)17/CCG8A | C | A | C/A | No |
| CGG8A/(CCG)17 | A | G | A/G | Yes |
| CGG8G/CCG8A | G | A | G/A | No |
| (CGG)17/CCG8C | C | C | C/C | No |
| (CGG)17/CCG8T | C | T | C/T | No |
| CGG8T/CCG8C | T | C | T/C | No |
| CGG8G/(CCG)17 | G | G | G/G | No |
| CGG8T/(CCG)17 | T | G | T/G | No |
| CGG8G/CCG8T | G | T | G/T | No |
| CGG8T/CCG8T | T | T | T/T | No |
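
The "Mismatch" column of Table II follows directly from the general sequence. As a minimal Python sketch (illustrative only; the function names are hypothetical), the two strands can be built from the bases N and P and compared position by position, with the bottom strand written 3' to 5' so that it aligns with the top strand.

```python
# Watson-Crick partner of each base.
WATSON_CRICK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def top_strand(n):
    """Top strand 5'-(CGGNGG)8CGG-3', read 5' to 3'."""
    return ("CGG" + n + "GG") * 8 + "CGG"


def bottom_strand(p):
    """Bottom strand 3'-(GCCPCC)8GCC-5', written 3' to 5'."""
    return ("GCC" + p + "CC") * 8 + "GCC"


def mismatches(n, p):
    """List (position, 'top/bottom') for every non-Watson-Crick pair."""
    pairs = zip(top_strand(n), bottom_strand(p))
    return [
        (i, top + "/" + bottom)
        for i, (top, bottom) in enumerate(pairs)
        if WATSON_CRICK[top] != bottom
    ]


# (CGG)17ds: N = C, P = G gives a fully paired duplex.
print(mismatches("C", "G"))
# CGG8A/(CCG)17: N = A, P = G gives eight A/G mismatches.
print(mismatches("A", "G"))
```

With N = A and P = G, the mismatches fall at the first position of every second triplet, eight in total, which is the only mismatched duplex in Table II that competed for binding.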


DISCUSSION

Mechanisms underlying instability and the physiological function of trinucleotide repeats in the human genome are not understood. We have, therefore, investigated proteins that interact sequence-specifically with the unstable triplet repeat 5'-d(CGG)n-3' in the human FMR1 gene and have recently purified the 20-kDa protein p20-CGGBP from nuclear extracts of HeLa cells. This protein binds exclusively to 5'-d(CGG)n-3' repeats and not to any other unstable triplet repeat sequence (24). The cloning of the cDNA encoding this protein has now become possible with a strategy involving amino acid sequence determination by tandem mass spectrometry and screening of EST data bases with the obtained sequence tags.

From partial mass spectrometric sequence data of three peptides of the protein, a cDNA fragment has been identified in the EST data base using special software algorithms. Since the coding sequence of this protein is short, it is completely represented by the identified clone, obviating the need for any library screening and subcloning. To our knowledge this is the first reported example in which a combination of mass spectrometric sequencing, data base searching, and a generally available clone collection has made additional cloning efforts redundant. The rapid cloning of the cDNA of p20-CGGBP thus demonstrates the potential of mass spectrometric sequencing in conjunction with the screening of EST data bases. Once the protein had been purified on an SDS gel and the sequencing had started, it took only 2 weeks to have its clone available to express the protein and test for its putative function. The EST data base is thought to contain already more than half of all human genes (34). Since the data base is still growing rapidly and efforts are under way to obtain longer stretches of coding sequence, it is reasonable to expect that many human proteins may be amenable to the type of analysis described here.

Mass spectrometric sequencing has intrinsically favorable characteristics that make it the technique of choice for EST data base identifications. Much lower amounts of protein are sufficient for analysis as compared with conventional amino acid sequencing techniques such as Edman degradation (25). In one experiment several peptides can be fragmented. The chances that an analyzed peptide is represented in the EST data base are thus increased. The generation of sequence tags, short amino acid sequences together with their precise mass location in the peptide, from tandem mass spectra is relatively simple and can often be done automatically. Sequence tags have a high statistical search specificity making them a very powerful probe to locate the corresponding EST, even in the presence of DNA sequencing errors, as was the case here. We anticipate that mass spectrometric sequencing in conjunction with EST data bases will play an important role in cloning new proteins from organisms for which large EST data bases are available or from closely related organisms (45).
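
The sequence tag concept can be illustrated with a minimal Python sketch. It assumes a simplified tag consisting of the summed residue masses before and after a short stretch of sequence, whereas the actual approach (32) works on fragment ion masses read from the tandem mass spectrum. The peptide, the function names, and the 0.5-Da tolerance are hypothetical and are not taken from the p20-CGGBP data.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_sum(seq):
    """Sum of residue masses for a stretch of sequence."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


def tag_matches(peptide, prefix_mass, tag, suffix_mass, tol=0.5):
    """Return start positions where the tag fits in sequence and in mass.

    Leucine and isoleucine have the same mass, so both are treated as L.
    """
    pep = peptide.replace("I", "L")
    short = tag.replace("I", "L")
    hits = []
    start = pep.find(short)
    while start != -1:
        before = residue_sum(pep[:start])
        after = residue_sum(pep[start + len(short):])
        if abs(before - prefix_mass) <= tol and abs(after - suffix_mass) <= tol:
            hits.append(start)
        start = pep.find(short, start + 1)
    return hits


# Hypothetical candidate peptide, e.g. from a translated EST entry.
candidate = "SAMPLETAGK"
print(tag_matches(candidate, residue_sum("SAMP"), "LET", residue_sum("AGK")))
```

Because a match must satisfy both the short sequence and the two flanking masses, even a tag of three residues is considerably more selective than the sequence alone.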

The authenticity of the isolated cDNA is supported by the finding that the recombinant protein p20-CGGBP-6xHis exhibits the same, although weaker, DNA-binding pattern as the purified protein. Most likely, p20-CGGBP is the exclusive protein partner in the p20-CGGBP-(CGG)17ds complex cI. It contains at least the DNA-binding domain of the involved proteins as shown by Southwestern blot analyses. Complex cI probably consists of more than one molecule of p20-CGGBP because it is highly sensitive to deoxycholate treatment (24), suggesting the involvement of protein-protein interactions (46).

The physiological function of the novel DNA-binding protein p20-CGGBP in triplet repeat function and instability cannot be derived solely from its cDNA sequence due to the lack of homology to known proteins. However, high conservation of p20-CGGBP among mammals, as confirmed by Southern blot analyses (Fig. 4) and computer-aided sequence analyses, as well as its expression in a variety of human tissues point to an important, if not essential, function in mammalian cells. It is attractive to speculate that the 5'-d(CGG)-3' repeat itself has regulatory functions in the expression of the adjacent FMR1 gene. It has been described that short trinucleotide repeats 5'-d(CGG)-3' or 5'-d(CTG)-3' could function as regulatory elements in at least two different mammalian promoters (16, 17). In this context, a report about two individuals carrying an expanded, but unmethylated, 5'-d(CGG)-3' repeat in the 5'-UTR of the FMR1 gene without the fragile X phenotype but with normal expression of FMRP is very interesting (47). It is likely that the silencing of the FMRP expression in FraX individuals is in part due to cytosine-specific DNA methylation of the amplified repeat and adjacent sequences. Interestingly, p20-CGGBP can bind to an interrupted or uninterrupted FMR1 triplet repeat but not to a highly methylated 5'-d(CGG)-3' repeat in vitro (24). The elucidation of the cDNA sequence of p20-CGGBP now provides a basis for the study of its role in triplet repeat function and expansion in mammalian and non-mammalian cells.


FOOTNOTES

*   This research was supported by a grant (to W. D.) from the Federal Ministry for Science, Education, Research and Technology, Bonn, Germany, through Zentrum für Molekulare Medizin, Köln, TP13. The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
   To whom correspondence should be addressed: Institut für Genetik, Universität zu Köln, Weyertal 121, D-50931 Köln, Germany. Tel.: 49-221-470-2386; Fax: 49-221-470-5163.
1   The abbreviations used are: FraX, fragile X; bp, base pair(s); CGGBP, 5'-d(CGG)-3'ds binding protein; ds, double-stranded; EMSA, electrophoretic mobility shift assay; EST, expressed sequence tag; FMRP, fragile X mental retardation (protein); MS, mass spectrometry; ORF, open reading frame; UTR, untranslated region; PVDF, polyvinylidene difluoride; DHFR, dihydrofolate reductase; CAPS, 3-(cyclohexylamino)-1-propanesulfonic acid.

ACKNOWLEDGEMENTS

We thank Helmut Deissler, Institute of Cell Biology, University of Essen Medical School, for help with the sequence analyses and for valuable comments on the manuscript, and Sandra Kühn for providing D. melanogaster DNA. We are grateful to Petra Böhm for expert editorial work.


REFERENCES

  1. Ashley Jr, C. T., and Warren, S. T. (1995) Annu. Rev. Genet. 29, 703-728
  2. Campuzano, V., Montermini, L., Moltò, M. D., Pianese, L., Cossée, M., Cavalcanti, F., Monros, E., Rodius, F., Duclos, F., Monticelli, A., Zara, F., Cañizares, J., Koutnikova, H., Bidichandani, S. I., Gellera, C., Brice, A., Trouillas, P., De Michele, G., Filla, A., De Frutos, R., Palau, F., Patel, P. I., Di Donato, S., Mandel, J.-L., Cocozza, S., Koenig, M., and Pandolfo, M. (1996) Science 271, 1423-1427
  3. Fu, Y.-H., Kuhl, D. P. A., Pizzuti, A., Pieretti, M., Sutcliffe, J. S., Richards, S., Verkerk, A. J. M. H., Holden, J. J. A., Fenwick, R. G., Jr., Warren, S. T., Oostra, B. A., Nelson, D. L., and Caskey, C. T. (1991) Cell 67, 1047-1058
  4. Pieretti, M., Zhang, F., Fu, Y.-H., Warren, S. T., Oostra, B. A., Caskey, C. T., and Nelson, D. L. (1991) Cell 66, 817-822
  5. Hansen, R. S., Gartler, S. M., Scott, C. R., Chen, S.-H., and Laird, C. D. (1992) Hum. Mol. Genet. 1, 571-578
  6. Hornstra, I. K., Nelson, D. L., Warren, S. T., and Yang, T. P. (1993) Hum. Mol. Genet. 2, 1659-1665
  7. Verkerk, A. J. M. H., Pieretti, M., Sutcliffe, J. S., Fu, Y.-H., Kuhl, D. P. A., Pizzuti, A., Reiner, O., Richards, S., Victoria, M. F., Zhang, F., Eussen, B. E., van Ommen, G.-J. B., Blonden, L. A. J., Riggins, G. J., Chastain, J. L., Kunst, C. B., Galjaard, H., Caskey, C. T., Nelson, D. L., Oostra, B. A., and Warren, S. T. (1991) Cell 65, 905-914
  8. Fry, M., and Loeb, L. A. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 4950-4954
  9. Chen, F.-M. (1995) J. Biol. Chem. 270, 23090-23096
  10. Gacy, A. M., Goellner, G., Juranic, N., Macura, S., and McMurray, C. T. (1995) Cell 81, 533-540
  11. Mitas, M., Yu, A., Dill, J., and Haworth, I. S. (1995) Biochemistry 34, 12803-12811
  12. Mitchell, J. E., Newbury, S. F., and McClellan, J. A. (1995) Nucleic Acids Res. 23, 1876-1881
  13. Usdin, K., and Woodford, K. J. (1995) Nucleic Acids Res. 23, 4202-4209
  14. Yu, A., Dill, J., Wirth, S. S., Huang, G., Lee, V. H., Haworth, I. S., and Mitas, M. (1995) Nucleic Acids Res. 23, 2706-2714
  15. McMurray, C. T. (1995) Chromosoma (Berl.) 104, 2-13
  16. Perevozchikov, A. P., Orlov, S. V., and Kuteikin, K. B. (1994) Dokl. Akad. Nauk. SSSR 338, 411-414
  17. Imagawa, M., Ishikawa, Y., Shimano, H., Osada, S., and Nishihara, T. (1995) J. Biol. Chem. 270, 20898-20900
  18. Gastier, J. M., Pulido, J. C., Sunden, S., Brody, T., Buetow, K. H., Murray, J. C., Weber, J. L., Hudson, T. J., Sheffield, V. C., and Duyk, G. M. (1995) Hum. Mol. Genet. 4, 1829-1836
  19. Shimizu, M., Gellibolian, R., Oostra, B. A., and Wells, R. D. (1996) J. Mol. Biol. 258, 614-626
  20. Ohshima, K., Kang, S., and Wells, R. D. (1996) J. Biol. Chem. 271, 1853-1856
  21. Richards, R. I., Holman, K., Yu, S., and Sutherland, G. R. (1993) Hum. Mol. Genet. 2, 1429-1435
  22. Yano-Yanagisawa, H., Li, Y., Wang, H., and Kohwi, Y. (1995) Nucleic Acids Res. 23, 2654-2660
  23. Timchenko, L. T., Timchenko, N. A., Caskey, C. T., and Roberts, R. (1996) Hum. Mol. Genet. 5, 115-121
  24. Deissler, H., Behn-Krappa, A., and Doerfler, W. (1996) J. Biol. Chem. 271, 4327-4334
  25. Wilm, M., Shevchenko, A., Houthaeve, T., Breit, S., Schweigerer, L., Fotsis, T., and Mann, M. (1996) Nature 379, 466-469
  26. LeGendre, N., Mansfield, M., Weiss, A., and Matsudaira, P. T. (1993) in A Practical Guide to Protein and Peptide Purification for Microsequencing (Matsudaira, P. T., ed), 2nd Ed., pp. 71-101, Academic Press, London
  27. Shevchenko, A., Wilm, M., Vorm, O., and Mann, M. (1996) Anal. Chem. 68, 850-858
  28. Wilm, M. S., and Mann, M. (1994) Int. J. Mass Spectrom. Ion Proc. 136, 167-180
  29. Wilm, M., and Mann, M. (1996) Anal. Chem. 68, 1-8
  30. Thomson, B. A., Douglas, D. J., Corr, J. J., Hager, J. W., and Joliffe, C. L. (1995) Anal. Chem. 67, 1696-1704
  31. Wilm, M., Neubauer, G., and Mann, M. (1996) Anal. Chem. 68, 527-533
  32. Mann, M., and Wilm, M. S. (1994) Anal. Chem. 66, 4390-4399
  33. Boguski, M. S., Lowe, T. M. J., and Tolshtoshev, C. M. (1993) Nat. Genet. 4, 332-333
  34. Schuler, G. D., Boguski, M. S., Stewart, E. A., Stein, L. D., Gyapay, G., Rice, K., White, R. E., Rodriguez-Tomé, P., Aggarwal, A., Bajorek, E., Bentolila, S., Birren, B. B., Butler, A., Castle, A. B., Chiannilkulchai, N., Chu, A., Clee, C., Cowles, S., Day, P. J. R., Dibling, T., Drouot, N., Dunham, I., Duprat, S., East, C., Edwards, C., Fan, J.-B., Fang, N., Fizames, C., Garrett, C., Green, L., Hadley, D., Harris, M., Harrison, P., Brady, S., Hicks, A., Holloway, E., Hui, L., Hussain, S., Louis-Dit-Sully, C., Ma, J., MacGilvery, A., Mader, C., Maratukulam, A., Matise, T. C., McKusick, K. B., Morissette, J., Mungall, A., Muselet, D., Nusbaum, H. C., Page, D. C., Peck, A., Perkins, S., Piercy, M., Qin, F., Quackenbush, J., Ranby, S., Reif, T., Rozen, S., Sanders, C., She, X., Silva, J., Slonim, D. K., Soderlund, C., Sun, W.-L., Tabar, P., Thangarajah, T., Vega-Czarny, N., Vollrath, D., Voyticky, S., Wilmer, T., Wu, X., Adams, M. D., Auffray, C., Walter, N. A. R., Brandon, R., Dehejia, A., Goodfellow, P. N., Houlgatte, R., Hudson, J. R., Jr., Ide, S. E., Iorio, K. R., Lee, W. Y., Seki, N., Nagase, T., Ishikawa, K., Nomura, N., Phillips, C., Polymeropoulos, M. H., Sandusky, M., Schmitt, K., Berry, R., Swanson, K., Torres, R., Venter, J. C., Sikela, J. M., Beckmann, J. S., Weissenbach, J., Myers, R. M., Cox, D. R., James, M. R., Bentley, D., Deloukas, P., Lander, E. S., and Hudson, T. J. (1996) Science 274, 540-546
  35. Lennon, G. G., Auffray, C., Polymeropoulos, M., and Soares, M. B. (1996) Genomics 33, 151-152
  36. Lehrach, H., Diamond, D., Wozney, J. M., and Boedtker, H. (1977) Biochemistry 16, 4743-4751
  37. Koetsier, P., Schorr, J., and Doerfler, W. (1993) BioTechniques 15, 260-262
  38. Feinberg, A. P., and Vogelstein, B. (1983) Anal. Biochem. 132, 6-13
  39. Roepstorff, P., and Fohlmann, J. (1984) Biomed. Mass Spectrom. 11, 601
  40. Kozak, M. (1987) Nucleic Acids Res. 15, 8125-8148
  41. Parrish, J. E., Oostra, B. A., Verkerk, A. J. M. H., Richards, C. S., Reynolds, J., Spikes, A. S., Shaffer, L. G., and Nelson, D. L. (1994) Nat. Genet. 8, 229-235
  42. Eichler, E. E., Holden, J. J. A., Popovich, B. W., Reiss, A. L., Snow, K., Thibodeau, S. N., Richards, C. S., Ward, P. A., and Nelson, D. L. (1994) Nat. Genet. 8, 88-94
  43. Hirst, M. C., Grewal, P. K., and Davies, K. E. (1994) Hum. Mol. Genet. 3, 1553-1560
  44. Kunst, C. B., and Warren, S. T. (1994) Cell 77, 853-861
  45. Mann, M. (1996) Trends Biochem. Sci. 21, 494-495
  46. Baeuerle, P. A., and Baltimore, D. (1988) Cell 53, 211-217
  47. Smeets, H. J. M., Smits, A. P. T., Verheij, C. E., Theelen, J. P. G., Willemsen, R., van de Burgt, I., Hoogeveen, A. T., Oosterwijk, J. C., and Oostra, B. A. (1995) Hum. Mol. Genet. 4, 2103-2108

©1997 by The American Society for Biochemistry and Molecular Biology, Inc.