A Hyaluronan Binding Link Protein Gene Family Whose Members Are Physically Linked Adjacent to Chondroitin Sulfate Proteoglycan Core Protein Genes

THE MISSING LINKS*

Andrew P. Spicer‡, Adriane Joo and Rodney A. Bowling, Jr.

From the Center for Extracellular Matrix Biology, Texas A&M University System Health Science Center, Institute of Biosciences and Technology, Houston, Texas 77030

Received for publication, December 23, 2002, and in revised form, March 11, 2003.
    ABSTRACT
 
We describe a vertebrate hyaluronan and proteoglycan binding link protein gene family (HAPLN), consisting of four members including cartilage link protein. The encoded proteins share 45–52% overall amino acid identity. In contrast to this moderate sequence identity between family members, the sequence conservation between vertebrate species was very high: human and mouse link proteins share 81–96% amino acid sequence identity. Two of the four link protein genes (HAPLN2 and HAPLN4) were restricted in expression to the brain/central nervous system, while one of the four genes (HAPLN3) was widely expressed. Genomic structures revealed that all four HAPLN genes were similar in exon-intron organization and were also similar in genomic organization to the 5' exons of the CSPG core protein genes. Strikingly, all four HAPLN genes were located immediately adjacent to the four CSPG core protein genes, creating four pairs of CSPG-HAPLN genes within the mammalian genome. Furthermore, the two brain-specific HAPLN genes (HAPLN2 and HAPLN4) were physically linked to the brain-specific CSPG genes encoding brevican and neurocan, respectively. The tight physical association of the HAPLN and CSPG genes supports a hypothesis that the first HAPLN gene arose through a partial duplication of an ancestral CSPG gene. There is some degree of coordinated expression of each gene pair. Collectively, the four HAPLN genes are expressed by most tissue types, reflecting the fundamental importance of the hyaluronan-dependent extracellular matrix to tissue architecture and function in vertebrate species. Comparison of the genomic structures of the HAPLN and CSPG genes and other members of the link module superfamily provides strong support for a common evolutionary origin from an ancestral gene containing one link module-encoding exon.


    INTRODUCTION
 
Hyaluronan (HA)1 functions at multiple levels within the vertebrate extracellular matrix (ECM) and pericellular matrix (PCM) (1). HA may be found in variable amounts in many connective tissues, where it is usually bound by large aggregating chondroitin sulfate proteoglycans (CSPGs) (1), such as versican and aggrecan, or within the ECMs of the brain and central nervous system (CNS), where it is most often bound by the smaller aggregating CSPGs, brevican or neurocan.

Engineered and spontaneous loss-of-function mutations in the versican (2) and aggrecan (3) genes result in mid-gestational and early postnatal lethality, respectively. This illustrates the fundamental importance of these two CSPGs and the HA-CSPG aggregates to normal development and tissue structure and function. In contrast, mice deficient in neurocan develop in an apparently normal fashion (4), although they do display disturbances in long-term potentiation (LTP), which suggests a role for neurocan in the normal electrophysiological function of the brain/CNS.

The cartilage proteoglycan aggregate, consisting of HA and aggrecan, is dependent upon cartilage link protein for its assembly and stability, with cartilage link protein and aggrecan binding along the HA chain in a 1:1 stoichiometry (5), i.e. one link protein monomer for each aggrecan monomer. In vitro reassociation studies and in vivo loss-of-function studies have demonstrated the critical role of cartilage link protein in the structure and function of the cartilage proteoglycan aggregate (6, 7). Mice deficient in cartilage link protein had a phenotype that was essentially a phenocopy of the spontaneous mouse mutant, cmd, which results from a 7-bp deletion within the aggrecan gene (7, 3). The name cartilage link protein may be somewhat of a misnomer, as this protein has been reported to be expressed or present in many other locations within both the developing mammalian and vertebrate embryo and the adult (8, 9). Despite this, the phenotype associated with loss-of-function of cartilage link protein was restricted to the skeleton (7). This might suggest that, while cartilage link protein is expressed elsewhere, it is essential only to the stability and function of the cartilage proteoglycan aggregate and is not necessary to stabilize the HA-CSPG aggregates of other tissues, such as the brain and CNS. The recently described brain link protein, encoded by the brain link protein 1 (BRAL1) gene (10), is clearly related to but distinct from the cartilage link protein in amino acid sequence, predicted structure, and gene organization. This suggests that a second link protein may play a role in organizing and stabilizing the HA-CSPG aggregates of the brain and CNS.

HA-CSPG aggregates do not act in isolation within the ECM and PCM. These aggregates can interact with numerous other ECM and cell surface components through binding interactions of the CSPG core protein, the chondroitin sulfate and/or the HA chain. Binding partners that have been identified to date include HA receptors such as CD44 (11), EGF receptors (12), sulfated glycolipids (13), tenascins (14, 15), fibulins (16), and neural cell adhesion molecule (15, 17). Therefore, the HA-CSPG aggregate, although present at relatively low levels in many tissues, may provide an important nucleus within the ECM and PCM around which an extensive matrix can be organized. Accordingly, the three components that make up known HA-CSPG aggregates (HA, a large aggregating CSPG protein, and a link protein) would be expected to be critical to the organization and function of the ECM and PCM of many tissues. Lastly, by providing a relatively high local concentration of negative charge, the HA-CSPG aggregate may act locally to concentrate or sequester cations such as sodium, in addition to maintaining the hydrated state of many tissues.

While attempting to identify novel genes encoding HA-binding proteins, we identified multiple expressed sequence tag (EST) clones encoding predicted link module-containing sequences. Further investigation of these partial sequences indicated that they represent four different genes and are predicted to encode link proteins, homologous to cartilage link protein. We present, herein, the first description of the vertebrate link protein gene family and suggest a probable evolutionary history for the link module superfamily.


    EXPERIMENTAL PROCEDURES
 
Data Base Searching and Sequence Analyses—We searched the human, mouse, and "other" EST databases using the complete human cartilage link protein amino acid sequence as our search tool and the TBLASTN algorithm. Default search parameters were selected, except that low complexity sequences were permitted. Positive sequences were grouped, according to amino acid identities. The nucleotide sequences from positive EST clones were used to rescreen the EST databases, using BLASTN, in an effort to extend the cDNA sequences at the 5'- and 3'-ends.

Derived cDNA sequences were used to screen the public domain human genome sequence (University of California, Santa Cruz) in an attempt to identify the chromosomal localization for each gene, in addition to determination of the probable gene structures. Genomic structures for all additional members of the link module superfamily were also obtained from the public domain human genome databases (University of California Santa Cruz and the Ensembl Web site, Ref. 18). Amino acid sequence analyses were performed using the OMIGA program with gaps introduced to maximize alignments. BLASTN and Pairwise BLAST (BLAST 2 sequences) were used to confirm nucleotide sequences flanking and spanning exon-intron boundaries from genomic DNA sequences.

Molecular Cloning—EST sequences were used as a starting point to obtain full-length cDNAs for each of the link protein sequences from both human and mouse. The sequences of EST clones were determined by automated DNA sequencing. Oligonucleotide primers were synthesized to extend the 5'-termini and to confirm the 3'-ends using 5'- and 3'-RACE reactions with PCR-ready cDNA pools derived from human placenta and brain poly(A)+ mRNAs (Clontech, Palo Alto, CA) using standard PCR conditions defined by the calculated annealing temperatures of the gene-specific primers. Amplified fragments were directly ligated using TOPO-cloning (Invitrogen Corp, Carlsbad, CA), and sequences of resultant plasmids were determined using automated DNA sequencing. Synthetic oligonucleotide primers were also designed to the predicted translation start and stop codons to facilitate the generation of open-reading frames for each link protein gene.

Our 5'-RACE attempts to identify the 5'-end of the human and mouse HAPLN4 genes were unsuccessful, presumably due to the high relative GC content of these mRNAs. For this reason, we screened an arrayed cDNA library constructed from human brain thalamus poly(A)+ RNA (www.rzpd.de; RZPD library number 595). Partial cDNAs for human HAPLN4 were labeled using random priming with [α-32P]dCTP, as previously described (19), and hybridized to the membrane array at 60 °C overnight in ExpressHyb solution (Clontech) supplemented with 150 µg/ml sheared, denatured salmon sperm DNA. Membranes were washed three times for 30 min each, at 65 °C in 0.1× SSC, 0.1% SDS, then exposed to BioMaxMR film (Kodak) at -80 °C with two intensifier screens. Positive clones were identified and ordered from the RZPD. The insert sizes were determined by restriction endonuclease digestion of mini-prep quality plasmid DNAs. The predicted full-length cDNA for HAPLN4 was 3.5 kilobase pairs. Those clones that possessed inserts in this size range were selected for automated DNA sequencing. To derive sequences for each HAPLN gene, either multiple overlapping ESTs were sequenced, or single cDNAs were sequenced on both strands.

Northern and Multiple Tissue Array Analyses—EST clones corresponding to human versican, brevican, aggrecan, neurocan, MFGE8, and TM6SF2 were obtained from Research Genetics (www.res-gen.com) or the American Type Culture Collection (www.atcc.org) as frozen glycerol stocks or stab-cultures. Plasmids were propagated in Escherichia coli and the identity of each clone confirmed using automated DNA sequencing. Inserts were isolated from each plasmid clone, and each cDNA probe was radioactively labeled as described above and sequentially hybridized to either Northern blots (human Poly(A)+ RNA blot, Origene, Rockville, MD) or Multiple Tissue Expression (MTE) arrays (Clontech) at 60 °C in ExpressHyb solution supplemented with 150 µg/ml sheared, denatured salmon sperm DNA under conditions recommended by the manufacturer. Membranes were hybridized overnight and washed at high stringency as described above. After membranes had been subjected to autoradiography, they were stripped by incubation for 2 × 15 min in boiling 0.5% SDS solution, then checked by autoradiography, prior to prehybridization and hybridization with the next probe. The same membrane was hybridized sequentially, to avoid differences in sample loading. Human β-actin was used as a loading control on the MTE arrays and Northerns. The majority of MTEs were exposed for 16–20 h at -80 °C to BioMaxMR film, whereas Northerns were typically exposed for 2–3 days. Two independent MTE array membranes were sequentially hybridized for each probe.


    RESULTS
 
Identification of a Vertebrate Link Protein Gene Family—EST data base searches led to the identification of multiple cDNA sequences encoding predicted polypeptides related to cartilage link protein. Numerous EST sequences were identified in human (>300 real hits), mouse (>200 real hits), rat (75 real hits), and zebrafish (21 real hits). In human and mouse, ESTs representing brevican, versican, CD44, TSG-6, LYVE1, and the two recently described stabilins, stabilin1 and stabilin2 (20), were identified in addition to numerous ESTs encoding polypeptides most closely related to cartilage link protein (CRTL1). These particular sequences could be grouped into four classes, corresponding to CRTL1 and three potential new link proteins. In human, only three classes were identified, and were designated HAPLN (hyaluronan and proteoglycan link protein) 1–3, where CRTL1 was redesignated HAPLN1. One of the new HAPLN genes, HAPLN2, has subsequently been reported elsewhere as Brain Link Protein 1, BRAL1 (10). In addition, a partial sequence for HAPLN3 has also been previously reported as OE-HABP (21). In the mouse, four classes of link protein sequence were identified, which included Crtl1, Bral1, Hapln3, and a fourth member, designated Hapln4. All four members were also identified in the rat, and partial sequences representing four putative hapln genes were identified in the zebrafish, Danio rerio (GenBank accession numbers AL723066, AL723003, AL729072, AL725723, AL727046 (putative zhapln1), AW281205 (putative zhapln2), AI616686, and AW454110 (putative zhapln3), AW422963, and BM889913 (putative zhapln4)). This would suggest that the entire HAPLN gene family is present in all vertebrates, at least as far back as the bony fishes. We obtained our first human HAPLN4 sequences from a search of the human genome (see also below) against mouse Hapln4 sequences. Subsequent data base searching with the predicted human HAPLN4 cDNA sequence yielded multiple ESTs that corresponded to the 3'-untranslated region of human HAPLN4 only. The sequence of each clone was confirmed and a cDNA library was screened in order to obtain full-length cDNA clones for human HAPLN4. A recent search of the Ensembl genomic data base confirmed the existence of only four bona fide link proteins in human and mouse.

The four link proteins share between 45 and 52% amino acid identity, with overall similarities of 52–62% (Fig. 1A). The percentage identities between human and mouse HAPLN proteins are: 96% (HAPLN1), 91.5% (HAPLN2), 81% (HAPLN3), and 91% (HAPLN4). This would suggest that there may be important functional distinctions between the four link proteins, perhaps in their relative specificity for interaction with one or more CSPGs and with HA. All four link proteins are organized in the same way, with an N-terminal signal sequence followed by an immunoglobulin (Ig) domain and two consecutive link modules, or proteoglycan tandem repeats (PTR). The N termini and the Ig domains share the lowest overall sequence identity. Cartilage link protein binds to aggrecan via the Ig domain (22). Thus, low levels of sequence identity between the Ig domains would suggest that there may be differential specificity of interaction of the individual HAPLN proteins with the CSPGs (Fig. 1B). All ten cysteine residues are conserved across the family, as are the majority of those residues that have previously been shown or predicted to be important for HA binding in other link module superfamily members (Fig. 1A) (23). HAPLN4 differs from the other three members of the family; it possesses a 10-amino acid glycine-rich insertion within the first PTR and has an extended C terminus, which is rich in the amino acids glycine, tryptophan, alanine, proline, and aspartic acid (20/60 amino acids are Gly/Trp; 43/60 are Gly/Trp/Ala/Pro/Asp). Based upon the primary amino acid sequence, all four HAPLN proteins are predicted HA-binding proteins. Indeed, HAPLN1 (CRTL1) and HAPLN2 (BRAL1) are known to bind to HA (24). The functional significance of the extended HAPLN4 C terminus is not known at this time. HAPLN4 possesses one potential site for N-linked glycosylation, while HAPLN2 and HAPLN3 do not contain any consensus N-glycosylation sites. Predicted molecular masses range from 38 to 43 kDa for the four HAPLN proteins.
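
The percentage identities quoted above are the kind of figure that can be read directly off a gapped pairwise alignment. The following is a minimal Python sketch of one common convention (identical residues divided by the number of columns in which neither sequence has a gap). It is not the OMIGA procedure used in this study, whose exact denominator is not stated here, and the two aligned fragments are invented toy strings, not HAPLN sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gaps are written as '-' and both strings come from the same
    pairwise alignment, so they have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment (made-up residues, for illustration only).
toy_a = "MKSLLLLV-LISIC"
toy_b = "MKGLLLLVGLI-IC"
toy_identity = percent_identity(toy_a, toy_b)
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which lowers the value when gaps are present, so identities are only comparable when computed the same way.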



 
FIG. 1.
Amino acid sequence alignment of the four human HA-binding proteoglycan link (HAPLN) proteins. HAPLN1 is cartilage link protein (CRTL1). HAPLN2 is BRAL1. Conserved cysteine residues are boxed in black, while other conserved/identical residues are boxed in gray or in specific colors. Residues highlighted in pink indicate those residues that are conserved with residues in CD44 and TSG-6 that have been shown to be critical for binding to HA. Residues highlighted in blue indicate CD44-like residues, while those highlighted in red are TSG-6-like. Green residues are putative HA binding residues identified by molecular modeling studies (24), whereas yellow boxes highlight phenylalanine residues that may be involved in aromatic stacking interactions. The human and mouse HAPLN gene sequences have been deposited in GenBank™ and are available under accession numbers AY262759 (human HAPLN3); AY262757 (mouse Hapln3); AY262756 (human HAPLN4); AY262758 (mouse Hapln4).

 

Tissue expression was investigated by Northern analyses with a range of samples from major organs (Fig. 2). While this selection lacked many tissues rich in ECM, it was useful in providing a guide to the range of expression of each HAPLN. Our results indicated that HAPLN3 is the most widely expressed of all the HAPLN genes, with a transcript size of 2.1 kb being found in most tissues, including the brain. The highest levels of expression were observed in spleen and placenta. HAPLN1 was expressed in a restricted manner, being expressed primarily by the small intestine and placenta. In contrast, HAPLN2 and HAPLN4 were essentially restricted to the brain and CNS. One HAPLN2 transcript of ~1.9 kb was observed in human brain. One HAPLN4 transcript of 3.5 kb was also observed only in human brain. Our Northern analyses are supported by EST data base searches, which indicated that all of the HAPLN2 and HAPLN4 ESTs were derived from brain/CNS (data not shown). Overall, therefore, the HAPLN gene family includes one member that is widely expressed (HAPLN3), two members that are essentially brain/CNS specific (HAPLN2 and HAPLN4) and a fourth member (HAPLN1) that is expressed in a restricted subset of adult tissues. Collectively, HAPLN genes are expressed by most tissues.



 
FIG. 2.
Northern analyses of the human HAPLN gene family. Human poly(A)+ Northern blots were sequentially hybridized and stripped with radiolabeled probes for each of the human HAPLN genes. The relative position of the molecular weight markers is indicated on the left of each panel. HAPLN3 is the most widely expressed, whereas HAPLN2 and HAPLN4 are brain-specific.

 

Genomic Analyses—We used the first draft of the human genome sequence (25, 26) in an effort to determine the genomic structures for each of the HAPLN genes. We identified all exon-intron boundaries for each HAPLN gene and compared the gene structures for HAPLN2, HAPLN3, and HAPLN4 with the previously described structures for HAPLN1 (CRTL1) and the four large aggregating chondroitin sulfate proteoglycan genes, versican (CSPG2), brevican (BCAN), aggrecan (AGC1), and neurocan (CSPG3) (Fig. 3A). HAPLN2, HAPLN3, and HAPLN4 comprise 7, 5, and 5 exons, respectively. All members of the HAPLN gene family are organized in a similar fashion, with the signal peptide, Ig domain, and two PTRs encoded on separate exons. The first PTR of HAPLN2 is split into two exons by a unique intron (Fig. 3A). Exon-intron boundaries between the signal peptide and the Ig loop, between the Ig loop and PTR1, and between PTR1 and PTR2 were conserved in all HAPLN genes. Furthermore, these exon-intron boundaries were also shared and conserved with the four CSPG genes (Fig. 3A). This conservation of gene structure strongly supports the evolution of the HAPLN and the large aggregating CSPG core protein genes from a common ancestral gene.



 
FIG. 3.
Gene structure analyses of the link module superfamily. A, comparison of the genomic structures for the human HAPLN genes and the human CSPG genes. Exons are indicated by the individual boxes, with the black areas indicative of the open-reading frame and the open, unfilled boxes indicative of untranslated sequences. 5'-UTR, 5'-untranslated region; SIG, signal peptide; Ig, immunoglobulin domain; PTR1, proteoglycan tandem repeat 1; PTR2, proteoglycan tandem repeat 2; 3'-UTR, 3'-untranslated region. The exon-intron boundaries between the signal peptide and the Ig domain, the Ig domain and PTR1 and between PTR1 and PTR2 are conserved in all HAPLN and CSPG genes, unequivocally supporting their evolution from a common ancestor. The position of the exon-intron splice site is indicated by the shape of each exonic block and is described in the box below the figure. B, comparison of the genomic structures flanking the link module containing sequences for HAPLN1, CD44, LYVE-1, stabilin-1 (STAB1), stabilin-2 (STAB2), TSG-6, and KIAA0527. Amino acid sequence alignment is shown, with the respective positions for the exon-intron boundaries indicated by the arrowheads (indicative of a boundary at the +1 position of that codon) or a star (indicative of a boundary at the +2 position of that codon). Conserved cysteines are highlighted in black. Other residues that are 100% conserved are highlighted in dark gray, while residues shared by at least 4 of 7 family members are highlighted in light gray.

 

We extended our comparative gene structure analyses of the HAPLN and CSPG genes to include all the remaining members of the link module superfamily, including CD44, LYVE1, TSG-6, stabilin1, stabilin2, and KIAA0527 (27). These members each possess a single HA-binding PTR. We identified the exon-intron boundaries flanking the single PTR of each gene and compared the relative position of the boundary within the open-reading frame for each gene (Fig. 3B). The exon-intron boundary immediately preceding PTR1 is relatively close in all members of the link module superfamily. Furthermore, this exon-intron boundary is of the same type (+1 position) in each family member. The exon-intron boundary immediately following the PTR is also of this type and the position of this boundary is identical for CD44, LYVE1, STAB1, STAB2, TSG-6, and KIAA0527, whereas this boundary is shifted by 2 codons in the HAPLN/CSPG family members. Overall, therefore, this PTR encoding cassette can be spliced in or out of each link module superfamily transcript while maintaining the reading frame. As such, this cassette could be recruited by genes and spliced into an existing framework, creating a new HA-binding protein.
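
The frame-preservation argument above follows from simple intron phase arithmetic: an exon flanked by two boundaries of the same type has a length divisible by three, so it can be added to or removed from a transcript without shifting the downstream reading frame. The Python sketch below illustrates this. It assumes that the "+1" and "+2" boundary positions described in the text correspond to the conventional intron phases 1 and 2 (the number of codon nucleotides lying upstream of the intron); that mapping is an interpretation made for this example, and the function names are hypothetical.

```python
def exon_length_mod3(upstream_phase: int, downstream_phase: int) -> int:
    """Remainder (mod 3) of an exon's length implied by its flanking intron phases.

    An exon that starts after an intron of phase p_up and ends before an intron
    of phase p_down satisfies (p_up + length) % 3 == p_down.
    """
    for phase in (upstream_phase, downstream_phase):
        if phase not in (0, 1, 2):
            raise ValueError("intron phase must be 0, 1, or 2")
    return (downstream_phase - upstream_phase) % 3


def is_frame_preserving_cassette(upstream_phase: int, downstream_phase: int) -> bool:
    """True if the exon can be spliced in or out without a frameshift."""
    return exon_length_mod3(upstream_phase, downstream_phase) == 0


# A cassette flanked by two phase-1 boundaries (the case described in the text).
same_type = is_frame_preserving_cassette(1, 1)
# A hypothetical exon flanked by a phase-1 and a phase-2 boundary, for contrast.
mixed_type = is_frame_preserving_cassette(1, 2)
```

Under this assumption, the first case is frame preserving and the second is not, which is why a shared boundary type on both sides of the PTR exon is the relevant observation for exon recruitment.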

Our results strongly support the evolution of the 1PTR subfamily of the link module superfamily from a common ancestor (see also Fig. 5) and also suggest that the entire link module superfamily shares one common cassette, corresponding to the first PTR of the HAPLN and CSPG genes and the single PTR of the remaining family members. This would support the hypothesis that all members of the link module superfamily were derived from a single ancestral gene containing a link module sequence through a process of gene duplication or exon recruitment and subsequent divergence. Gene structure data and amino acid sequence analyses permit a grouping of TSG-6 with the two stabilin proteins/genes, suggesting that these three genes share a more recent ancestor (Fig. 5). The potential evolutionary relationship between KIAA0527 and other members of the 1PTR subfamily could not be established with current data.



 
FIG. 5.
Proposed evolutionary scheme responsible for the generation of the link module superfamily. We propose that an ancestral link module-containing gene existed in an unknown invertebrate (possibly something like an echinoderm). This gene contained a common exon encoding a single link module or PTR. Gene duplication events led to the diversity of the link module-containing superfamily members we know today. The dashed line delineates the evolutionary division between invertebrates and vertebrates.

 

While attempting to determine the gene structures for the new HAPLN genes, we noticed that additional exons were predicted immediately downstream of the human HAPLN3 gene. Closer inspection revealed that these exons corresponded to the 3' exons for human aggrecan (AGC1). The polyadenylation signals for the human HAPLN3 and the AGC1 genes are only 2.4 kilobases apart. Inspection of the human genome revealed that each HAPLN gene was colocalized with one of the four large aggregating chondroitin sulfate proteoglycan (CSPG) genes. The head-to-tail orientation of Brevican and BRAL1 has been recently described (28). The four paralogous gene pairs were: HAPLN1-versican (CSPG2); HAPLN2-brevican (BCAN); HAPLN3-aggrecan (AGC1); and HAPLN4-neurocan (CSPG3) (Fig. 4). Three of the four pairs were organized in the same manner, with the two genes facing each other in a tail-to-tail orientation. The HAPLN2-BCAN gene pair was organized in a tail-to-head orientation, with HAPLN2 lying ~20 kilobase pairs upstream of brevican. While the HAPLN1-CSPG2 gene pair was separated by 56 kilobase pairs, this intergenic interval did not contain any additional genes, and it appears that this region of the human genome is characterized by greater intergenic distances and interexonic distances than most others.
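
The orientation terms used above (tail-to-tail, tail-to-head) can be derived mechanically from gene coordinates and strands. The Python sketch below shows one way to do so. The `Gene` record, the function names, and the coordinates are hypothetical placeholders chosen for illustration; they are not the actual positions of any HAPLN or CSPG gene, and 1-based inclusive coordinates are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost base on the reference (1-based, inclusive)
    end: int     # rightmost base on the reference (inclusive)
    strand: str  # "+" or "-"


def _ordered(a: Gene, b: Gene):
    """Return the two genes sorted by their leftmost coordinate."""
    for gene in (a, b):
        if gene.strand not in ("+", "-"):
            raise ValueError(f"unknown strand for {gene.name}: {gene.strand!r}")
    return sorted((a, b), key=lambda g: g.start)


def pair_orientation(a: Gene, b: Gene) -> str:
    """Classify two neighbouring genes by relative transcription direction."""
    first, second = _ordered(a, b)
    if first.strand == "+" and second.strand == "-":
        return "tail-to-tail"   # convergent: the 3' ends face each other
    if first.strand == "-" and second.strand == "+":
        return "head-to-head"   # divergent: the 5' ends face each other
    return "tandem"             # same strand: head-to-tail / tail-to-head


def intergenic_distance(a: Gene, b: Gene) -> int:
    """Number of bases between the two genes (0 if they abut or overlap)."""
    first, second = _ordered(a, b)
    return max(0, second.start - first.end - 1)


# Hypothetical coordinates for illustration only.
gene_a = Gene("gene_a", 1_000, 5_000, "+")
gene_b = Gene("gene_b", 7_400, 20_000, "-")
orientation = pair_orientation(gene_a, gene_b)
gap = intergenic_distance(gene_a, gene_b)
```

In this scheme, three of the four pairs described above would be classified as tail-to-tail and the HAPLN2-BCAN pair as tandem, given real annotations in place of the placeholder values.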



 
FIG. 4.
Chromosomal localization and organization of the paralogous gene clusters identified on human chromosomes 5, 1, 15, and 19. Arrowheads indicate the direction of transcription for each gene. Each HAPLN gene is indicated by a black arrow, while the four CSPG genes are indicated by white arrows. The approximate genomic size for each gene is indicated in kilobase pairs as is the intergenic distance. In some instances Mb, megabase pairs, is used. Only the major known genes have been included within each paralogous cluster.

 

Examination of the genes flanking the four HAPLN-CSPG gene pairs indicated that the gene pairs are part of a larger cluster of paralogous genes (Fig. 4). This suggests that the four HAPLN-CSPG gene pairs arose through a larger, possibly genome-wide duplication event. For instance, one of the four MEF2 genes was associated with each HAPLN-CSPG pair, although the relative position and orientation of each MEF2 gene was different in relation to the HAPLN-CSPG gene pairs. Importantly, two other paralogous genes, EDIL3 and MFGE8 were located in the same relative position to HAPLN1 and HAPLN3, with no paralogous genes being present in the same position in relation to HAPLN2 and HAPLN4. This would suggest that the HAPLN1 and HAPLN3 gene clusters are more closely related and arose through a more recent duplication. The NTRK genes were also linked to the HAPLN-CSPG gene pairs; NTRK1 with HAPLN2-BCAN and NTRK3 with HAPLN3-AGC1. In the mouse, Ntrk2 is physically linked with the Hapln1-Cspg2 gene pair, with an apparent translocation event separating NTRK2 from HAPLN1-CSPG2 in humans. The overall organization of each gene cluster is consistent with two large duplication events followed by multiple chromosomal rearrangements (Fig. 5). This is consistent with prevailing theories regarding vertebrate genome evolution (29, 30). Those theories hypothesize that either two genome-wide duplication events occurred early in the evolution of the vertebrate lineage and were followed by multiple chromosomal rearrangements and deletions, or that multiple smaller duplications have occurred, followed by rearrangements and deletions. Based upon the organization of the gene clusters, the most recent duplication events are proposed to have generated the HAPLN2/HAPLN4 and the HAPLN1/HAPLN3 pairs (Fig. 5).

Expression Analyses—We investigated the expression of each HAPLN-CSPG gene pair in multiple human tissues using sequential hybridization to a Multiple Tissue Expression array, in order to determine whether the linked gene pairs exhibited any co-expression at the level of transcription (Fig. 6). In particular, we focused upon investigation of the HAPLN3-aggrecan and HAPLN4-neurocan gene pairs, as these two gene pairs are separated by similar amounts of genomic DNA. Again, while the range of tissue samples included most major organs, it lacked many ECM-rich organs/tissues. Thus, the results presented herein were used as a general screen to determine broad patterns of co-expression or lack thereof, and are not definitive. HAPLN1 and versican were not co-expressed in any significant manner in adult human tissues (data not shown). HAPLN2, brevican, HAPLN4, and neurocan were restricted to the brain/CNS and subregions and were co-expressed, although HAPLN4 was not expressed as highly as HAPLN2 (Fig. 6 and data not shown). HAPLN3 and aggrecan were not co-expressed in a spatial fashion; aggrecan was restricted to the aorta and the trachea while HAPLN3 was widely expressed. Highest levels of expression for HAPLN3 were observed in all regions of the heart, the mammary gland, ovary, lymph node, spleen, thymus, and fetal heart and lung. Genes immediately flanking the HAPLN-CSPG gene pairs were not co-expressed with the HAPLN and/or CSPG genes (data not shown).



 
FIG. 6.
Multiple Tissue Expression Array analyses of the expression of the two linked CSPG-HAPLN pairs in humans. MTE arrays were sequentially hybridized with radiolabeled probes specific for each HAPLN and CSPG gene. The results shown are derived from sequential hybridization and stripping of the same MTE membrane. Samples were as follows: A1, whole brain; A2, left cerebellum; A3, substantia nigra; A4, heart; A5, esophagus; A6, transverse colon; A7, kidney; A8, lung; A9, liver; A10, leukemia HL-60; A11, fetal brain; A12, yeast total RNA; B1, cerebral cortex; B2, right cerebellum; B3, nucleus accumbens; B4, aorta; B5, stomach; B6, descending colon; B7, skeletal muscle; B8, placenta; B9, pancreas; B10, HeLa S3 cell line; B11, fetal heart; B12, yeast tRNA; C1, frontal lobe; C2, corpus callosum; C3, thalamus; C4, left atrium; C5, duodenum; C6, rectum; C7, spleen; C8, bladder; C9, adrenal gland; C10, leukemia K562 cells; C11, fetal kidney; C12, E. coli rRNA; D1, parietal lobe; D2, amygdala; D3, pituitary gland; D4, right atrium; D5, jejunum; D6, blank; D7, thymus; D8, uterus; D9, thyroid gland; D10, leukemia MOLT-4 cells; D11, fetal liver; D12, E. coli DNA; E1, occipital lobe; E2, caudate nucleus; E3, spinal cord; E4, left ventricle; E5, ileum; E6, blank; E7, peripheral blood leukocyte; E8, prostate; E9, salivary gland; E10, Burkitt's lymphoma, Raji cell line; E11, fetal spleen; E12, poly r(A); F1, temporal lobe; F2, hippocampus; F3, blank; F4, right ventricle; F5, ileocecum; F6, blank; F7, lymph node; F8, testis; F9, mammary gland; F10, Burkitt's lymphoma, Daudi cell line; F11, fetal thymus; F12, human Cot-1 DNA; G1, paracentral gyrus of the cerebral cortex; G2, medulla oblongata; G3, blank; G4, interventricular septum; G5, appendix; G6, blank; G7, bone marrow; G8, ovary; G9, blank; G10, colorectal adenocarcinoma SW480 cell line; G11, fetal lung; G12, human genomic DNA, 100 ng; H1, pons; H2, putamen; H3, blank; H4, apex of the heart; H5, ascending colon; H6, blank; H7, trachea; H8, blank; H9, blank; H10, lung carcinoma A549 cell line; H11, blank; H12, human genomic DNA, 500 ng. The relatively intense signal obtained for E. coli DNA was observed for most probes that had been generated from plasmids propagated in E. coli, presumably due to low levels of E. coli chromosomal DNA in the plasmid preparations and subsequent probes.

 


    DISCUSSION
 
The diverse functions of extracellular matrices are dependent upon multiple interactions between their various components. One of the more important interactions required to maintain the integrity and support the functions of the ECM is the binding of the large chondroitin sulfate proteoglycans to HA to form a proteoglycan-HA aggregate. This aggregate is dependent upon an additional protein, aptly named link protein. It is the binding of link protein to both the proteoglycan and the HA polymer that strengthens and stabilizes the proteoglycan-HA aggregate. While the cartilage proteoglycan aggregate, composed of HA, aggrecan, and cartilage link protein, is the best known HA-proteoglycan aggregate, HA and related proteoglycans are found in most tissues. This suggests that similar aggregates are important to the ECM of many tissues. In particular, the brain ECM is built largely around a HA matrix, composed of HA plus two brain/CNS-specific proteoglycans, in addition to aggrecan and versican (for reviews see Refs. 31 and 32).

Most of the proteins that bind to HA at the cell surface or within the ECM do so through a common domain, the so-called link domain or link module (33). This module is defined by two disulfide bonds and is composed of two α-helices and two antiparallel β-sheets arranged around a large hydrophobic core, similar in structure to the C-type lectin domain, a Ca2+-dependent carbohydrate-binding domain (33, 34). The link module-containing proteins constitute the link module superfamily (33). To date, members of this superfamily have only been identified within the vertebrate lineage. It has also been assumed that the link module superfamily evolved from a single common ancestral protein with a C-type lectin-like domain (34), presumably contained within a single exon. The link module domain(s) within each superfamily member is most often contained within one or two separate exon(s) (Fig. 3).

We have extended the link module superfamily through our molecular identification of a link protein gene family composed of four members, including cartilage link protein (CRTL1). We have named this gene family HAPLN (HA and proteoglycan link protein family), in which CRTL1 becomes HAPLN1. Collectively, these four link proteins are expressed by most tissue types and are particularly abundant in the brain/CNS. Indeed, two of the four link proteins (HAPLN2 and HAPLN4) are brain/CNS specific. HAPLN3 is the most widely expressed link protein and is, therefore, expected to play an important role in the organization and stability of the HA-dependent ECM of many tissues. We hypothesize that overlap in the spatial and temporal distribution of HAPLN1 (CRTL1) and HAPLN3 acts to restrict the Crtl1 knockout mouse phenotype to the developing skeleton.

The level of sequence identity within the HAPLN family and across species strongly suggests that there may be specificity or selectivity in the interaction of each HAPLN with one or more of the CSPGs. The area of lowest sequence identity within the family corresponds to the signal peptide and the Ig domain. It is the Ig region of cartilage link protein that binds to the proteoglycan (22). We predict, therefore, that each link protein may preferentially stabilize HA-proteoglycan aggregates composed of specific proteoglycans. Thus, where one or more CSPGs or HAPLNs are co-localized, specific complexes may be assembled through the selectivity in interaction between HAPLN proteins and CSPGs. Alternatively, all four HAPLN proteins may be able to interact with all four CSPGs, yielding 16 possible interactions. Furthermore, mixed complexes are entirely possible. The ability of particular HAPLN proteins to assemble with particular CSPGs and the biophysical properties of these complexes will be an area of intense interest and may provide us with an unimagined wealth of information regarding the biological properties and functions of the ECM.

Genes encoding separate components of functional protein complexes are rarely co-localized in vertebrates. We have shown that the four HAPLN genes and the four CSPG genes are organized into four paralogous gene pairs within the vertebrate genome, with each HAPLN gene located adjacent to a CSPG gene. This is one of the first examples of a gene family of this type where physically linked genes encode separate components of a functional protein complex. We initially hoped that this co-localization might be reflected by co-expression of each gene pair, defining the most probable interacting HAPLN and CSPG proteins. This is partly true; the two brain/CNS-restricted HAPLN genes are physically linked to the two brain/CNS-restricted CSPG genes, brevican and neurocan. While HAPLN2, brevican, HAPLN4, and neurocan are co-expressed in a broad sense in the adult brain/CNS, HAPLN1 and versican, and HAPLN3 and aggrecan, are generally not co-expressed in the adult (Fig. 6). It will be important to establish the spatial and temporal distribution of each HAPLN and CSPG protein within the embryo and adult tissues at the cellular level using, for instance, in situ hybridization and immunohistochemistry. In this regard, HAPLN2 (also called BRAL1) has recently been shown to co-localize with a splice variant of versican within the brain (24).
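The combinatorics discussed above can be made concrete with a short sketch. It enumerates the 16 possible HAPLN-CSPG pairings and flags the four that correspond to physically adjacent genes, as described in the text. The variable and function names are hypothetical and chosen for illustration only. The flag records gene adjacency, not a demonstrated protein-protein interaction.

```python
from itertools import product

# Family members named in the text.
HAPLNS = ["HAPLN1", "HAPLN2", "HAPLN3", "HAPLN4"]
CSPGS = ["aggrecan", "versican", "neurocan", "brevican"]

# Physically linked gene pairs as stated in the text
# (HAPLN1-versican, HAPLN2-brevican, HAPLN3-aggrecan, HAPLN4-neurocan).
# Adjacency of the genes is NOT evidence that the proteins interact.
LINKED_GENE_PAIRS = {
    "HAPLN1": "versican",
    "HAPLN2": "brevican",
    "HAPLN3": "aggrecan",
    "HAPLN4": "neurocan",
}


def candidate_interactions():
    """Return every (HAPLN, CSPG, genes_adjacent) combination."""
    return [
        (hapln, cspg, LINKED_GENE_PAIRS[hapln] == cspg)
        for hapln, cspg in product(HAPLNS, CSPGS)
    ]


pairs = candidate_interactions()
for hapln, cspg, adjacent in pairs:
    note = "adjacent genes" if adjacent else ""
    print(f"{hapln} x {cspg:9s} {note}")
print(len(pairs), "possible pairwise combinations")
```

A table of this kind could serve as a checklist for binding or co-localization experiments, with the four adjacent pairs as one possible starting point.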

The organization of the HAPLN-CSPG genes into four paralogous gene pairs provides important clues regarding the origin of the HAPLN genes and the other members of the link module superfamily (Fig. 5). We hypothesize that the first HAPLN gene arose from an ancestral CSPG-like gene via a gene duplication and inversion event. At this time, however, we cannot rule out the possibility that the ancestral gene was more similar to the HAPLN gene than to the CSPG. We further hypothesize that the first link module-containing gene existed in an invertebrate, possibly before the chordates. A common PTR cassette consisting of one or two exons is found in all the link module superfamily members (Fig. 3). We propose that the ancestral invertebrate link module gene contained a single PTR within a single exon. This ancestral gene initially duplicated to give two lineages. The two lineages ultimately gave rise to link module superfamily members containing a single link module or proteoglycan tandem repeat (PTR), and members containing two tandem PTRs. We will refer to the two subfamilies as the 1PTR and 2PTR subfamilies from here onwards (Fig. 5). After the initial gene duplication and divergence event, the 2PTR ancestor went through an internal gene duplication event, which resulted in the presence of two tandem link modules. Subsequently, an additional duplication and inversion event yielded the first HAPLN-CSPG gene pair. We propose that these events predate the vertebrate lineage. In this regard, we have recently identified a single CSPG gene in the invertebrate cephalochordate, Amphioxus (Branchiostoma floridae).2 This gene shares exon-intron boundaries with its vertebrate equivalents and shares the highest amino acid sequence identity with aggrecan. Amphioxus lacks most of the cell and tissue types that we associate with a HA matrix in vertebrates, and it will, therefore, be extremely interesting to localize the various components of the HA-dependent ECM in this organism.

It has been hypothesized that two large, genome-wide duplication events occurred relatively early in vertebrate evolutionary history (29). The alternate hypothesis states that multiple smaller scale duplications have occurred (30), which have had the collective effect of generating many gene families in the higher vertebrates. Our model fits well with either hypothesis. In the origin of the 2PTR subfamily, the first duplication event appears to have been followed by specialization of one of the gene pairs, such that its expression became restricted to neuronal cells. We propose that this event happened in an early vertebrate. The second gene duplication event generated the four paralogous HAPLN-CSPG gene clusters as we know them. It is highly likely that a more recent rearrangement event moved HAPLN2 relative to brevican. This gene organization is observed in both human and mouse. We propose that two similar genome-wide duplication events resulted in the higher vertebrate 1PTR subfamily, although it has not been possible to establish the paralogous relationships between CD44, LYVE1, and KIAA0527 at this point. We propose that the first genome-wide duplication event yielded the CD44 and stabilin branches. The second event generated CD44 and LYVE1 or KIAA0527 and TSG-6 and a STAB gene. More recent duplication events are proposed to have resulted in the two STAB genes and the CD44 and KIAA0527 genes. KIAA0527 may encode a novel cell-surface HA receptor, although a full-length cDNA has not been reported to date. Overall, the 1PTR subfamily is dominated by HA receptors, while the 2PTR subfamily is dominated by secreted ECM proteins.

The existence of a HAPLN gene family and the expression profile of its members suggest that HA-proteoglycan aggregates are present in most vertebrate tissues. The composition of each aggregate is not known at this time, but it is tempting to speculate that each HAPLN protein may have selectivity in its ability to bind to specific CSPGs and to stabilize the resultant aggregates. We assume that each HAPLN protein functions in a manner analogous to cartilage link protein, but we cannot rule out additional novel functions. In particular, the significance of the extended C terminus of HAPLN4 is not clear. By investigating the expression patterns and the in vitro and in vivo functions of each HAPLN protein, we expect to obtain novel insights regarding the function of the HA-dependent ECM and its importance to normal development and physiology.


    FOOTNOTES
 
The nucleotide sequences reported in this paper have been submitted to the GenBank™/EBI Data Bank with accession numbers AY262759 (human HAPLN3), AY262757 (mouse Hapln3), AY262756 (human HAPLN4), and AY262758 (mouse Hapln4).

* This research was supported in part by a Scientist Development Award (to A. P. S.) from the American Heart Association National Office, along with startup funds from the Texas A&M University System Health Science Center Research Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

‡ To whom correspondence should be addressed. Tel.: 713-677-7575; Fax: 713-677-7576; E-mail: aspicer@ibt.tamu.edu.

1 The abbreviations used are: HA, hyaluronan; ECM, extracellular matrix; PCM, pericellular matrix; CSPG, chondroitin sulfate proteoglycan; CNS, central nervous system; HAPLN, hyaluronan and proteoglycan link; LTP, long term potentiation; EST, expressed sequence tag; PTR, proteoglycan tandem repeat; RACE, rapid amplification of cDNA ends; CRTL1, cartilage link protein 1; MTE, multiple tissue expression.

2 R. J. Taft and A. P. Spicer, manuscript in preparation.


    ACKNOWLEDGMENTS
 
We thank Tony Day at the University of Oxford for facilitating the sequence analyses of our link proteins, in addition to critiquing this manuscript.



    REFERENCES
 

  1. Fraser, J. R., Laurent, T. C., and Laurent, U. B. (1997) J. Intern. Med. 242, 27-33
  2. Mjaatvedt, C. H., Yamamura, H., Capehart, A. A., Turner, D., and Markwald, R. R. (1998) Dev. Biol. 202, 56-66
  3. Watanabe, H., Kimata, K., Line, S., Strong, D., Gao, L. Y., Kozak, C. A., and Yamada, Y. (1994) Nat. Genet. 7, 154-157
  4. Zhou, X. H., Brakebusch, C., Matthies, H., Oohashi, T., Hirsch, E., Moser, M., Krug, M., Seidenbecher, C. I., Boeckers, T. M., Rauch, U., Buettner, R., Gundelfinger, E. D., and Fassler, R. (2001) Mol. Cell. Biol. 21, 5970-5978
  5. Faltz, L. L., Caputo, C. B., Kimura, J. H., Schrode, J., and Hascall, V. C. (1979) J. Biol. Chem.
  6. Morgelin, M., Heinegard, D., Engel, J., and Paulsson, M. (1994) Biophys. Chem. 50, 113-128
  7. Watanabe, H., and Yamada, Y. (1999) Nat. Genet. 21, 225-229
  8. Binette, F., Cravens, J., Kahoussi, B., Haudenschild, D. R., and Goetinck, P. F. (1994) J. Biol. Chem. 269, 19116-19122
  9. Kobayashi, H., Sun, G. W., Hirashima, Y., and Terao, T. (1999) Endocrinology 140, 3835-3842
  10. Hirakawa, S., Oohashi, T., Su, W. D., Yoshioka, H., Murakami, T., Arata, J., and Ninomiya, Y. (2000) Biochem. Biophys. Res. Commun. 276, 982-989
  11. Lesley, J., and Hyman, R. (1998) Front. Biosci. 3, D616-630
  12. Zhang, Y., Cao, L., Yang, B. L., and Yang, B. B. (1998) J. Biol. Chem. 273, 21342-21351
  13. Miura, R., Aspberg, A., Ethell, I. M., Hagihara, K., Schnaar, R. L., Ruoslahti, E., and Yamaguchi, Y. (1999) J. Biol. Chem. 274, 11431-11438
  14. Aspberg, A., Miura, R., Bourdoulous, S., Shimonaka, M., Heinegard, D., Schachner, M., Ruoslahti, E., and Yamaguchi, Y. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 10116-10121
  15. Rauch, U., Feng, K., and Zhou, X. H. (2001) Cell. Mol. Life Sci. 58, 1842-1856
  16. Olin, A. I., Morgelin, M., Sasaki, T., Timpl, R., Heinegard, D., and Aspberg, A. (2001) J. Biol. Chem. 276, 1253-1261
  17. Friedlander, D. R., Milev, P., Karthikeyan, L., Margolis, R. K., Margolis, R. U., and Grumet, M. (1994) J. Cell Biol. 125, 669-680
  18. Hubbard, T., Barker, D., Birney, E., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., Durbin, R., Eyras, E., Gilbert, J., Hammond, M., Huminiecki, L., Kasprzyk, A., Lehvaslaiho, H., Lijnzaad, P., Melsopp, C., Mongin, E., Pettett, R., Pocock, M., Potter, S., Rust, A., Schmidt, E., Searle, S., Slater, G., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Stupka, E., Ureta-Vidal, A., Vastrik, I., and Clamp, M. (2002) Nucleic Acids Res. 30, 38-41
  19. Feinberg, A. P., and Vogelstein, B. (1984) Anal. Biochem. 137, 266-267
  20. Politz, O., Gratchev, A., McCourt, P. A., Schledzewski, K., Guillot, P., Johansson, S., Svineng, G., Franke, P., Kannicht, C., Kzhyshkowska, J., Longati, P., Velten, F. W., Johansson, S., and Goerdt, S. (2002) Biochem. J. 362, 155-164
  21. Tsifrina, E., Ananyeva, N. M., Hastings, G., and Liau, G. (1999) Am. J. Pathol. 155, 1625-1633
  22. Heinegard, D., and Hascall, V. C. (1974) J. Biol. Chem. 249, 4250-4256
  23. Mahoney, D. J., Blundell, C. D., and Day, A. J. (2001) J. Biol. Chem. 276, 22764-22771
  24. Oohashi, T., Hirakawa, S., Bekku, Y., Rauch, U., Zimmermann, D. R., Su, W. D., Ohtsuka, A., Murakami, T., and Ninomiya, Y. (2002) Mol. Cell. Neurosci. 19, 43-57
  25. International Human Genome Sequencing Consortium (2001) Nature 409, 860-921
  26. Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., Gocayne, J. D., Amanatides, P., Ballew, R. M., Huson, D. H., Wortman, J. R., Zhang, Q., Kodira, C. D., Zheng, X. H., Chen, L., Skupski, M., Subramanian, G., Thomas, P. D., Zhang, J., Gabor Miklos, G. L., Nelson, C., Broder, S., Clark, A. G., Nadeau, J., McKusick, V. A., Zinder, N., Levine, A. J., Roberts, R. J., Simon, M., Slayman, C., Hunkapiller, M., Bolanos, R., Delcher, A., Dew, I., Fasulo, D., Flanigan, M., Florea, L., Halpern, A., Hannenhalli, S., Kravitz, S., Levy, S., Mobarry, C., Reinert, K., Remington, K., Abu-Threideh, J., Beasley, E., Biddick, K., Bonazzi, V., Brandon, R., Cargill, M., Chandramouliswaran, I., Charlab, R., Chaturvedi, K., Deng, Z., Di Francesco, V., Dunn, P., Eilbeck, K., Evangelista, C., Gabrielian, A. E., Gan, W., Ge, W., Gong, F., Gu, Z., Guan, P., Heiman, T. J., Higgins, M. E., Ji, R. R., Ke, Z., Ketchum, K. A., Lai, Z., Lei, Y., Li, Z., Li, J., Liang, Y., Lin, X., Lu, F., Merkulov, G. V., Milshina, N., Moore, H. M., Naik, A. K., Narayan, V. A., Neelam, B., Nusskern, D., Rusch, D. B., Salzberg, S., Shao, W., Shue, B., Sun, J., Wang, Z., Wang, A., Wang, X., Wang, J., Wei, M., Wides, R., Xiao, C., Yan, C., Yao, A., Ye, J., Zhan, M., Zhang, W., Zhang, H., Zhao, Q., Zheng, L., Zhong, F., Zhong, W., Zhu, S., Zhao, S., Gilbert, D., Baumhueter, S., Spier, G., Carter, C., Cravchik, A., Woodage, T., Ali, F., An, H., Awe, A., Baldwin, D., Baden, H., Barnstead, M., Barrow, I., Beeson, K., Busam, D., Carver, A., Center, A., Cheng, M. L., Curry, L., Danaher, S., Davenport, L., Desilets, R., Dietz, S., Dodson, K., Doup, L., Ferriera, S., Garg, N., Gluecksmann, A., Hart, B., Haynes, J., Haynes, C., Heiner, C., Hladun, S., Hostin, D., Houck, J., Howland, T., Ibegwam, C., Johnson, J., Kalush, F., Kline, L., Koduru, S., Love, A., Mann, F., May, D., McCawley, S., McIntosh, T., McMullen, I., Moy, M., Moy, L., Murphy, B., Nelson, K., Pfannkoch, C., Pratts, E., Puri, V., Qureshi, H., Reardon, M., Rodriguez, R., Rogers, Y. H., Romblad, D., Ruhfel, B., Scott, R., Sitter, C., Smallwood, M., Stewart, E., Strong, R., Suh, E., Thomas, R., Tint, N. N., Tse, S., Vech, C., Wang, G., Wetter, J., Williams, S., Williams, M., Windsor, S., Winn-Deen, E., Wolfe, K., Zaveri, J., Zaveri, K., Abril, J. F., Guigo, R., Campbell, M. J., Sjolander, K. V., Karlak, B., Kejariwal, A., Mi, H., Lazareva, B., Hatton, T., Narechania, A., Diemer, K., Muruganujan, A., Guo, N., Sato, S., Bafna, V., Istrail, S., Lippert, R., Schwartz, R., Walenz, B., Yooseph, S., Allen, D., Basu, A., Baxendale, J., Blick, L., Caminha, M., Carnes-Stine, J., Caulk, P., Chiang, Y. H., Coyne, M., Dahlke, C., Mays, A., Dombroski, M., Donnelly, M., Ely, D., Esparham, S., Fosler, C., Gire, H., Glanowski, S., Glasser, K., Glodek, A., Gorokhov, M., Graham, K., Gropman, B., Harris, M., Heil, J., Henderson, S., Hoover, J., Jennings, D., Jordan, C., Jordan, J., Kasha, J., Kagan, L., Kraft, C., Levitsky, A., Lewis, M., Liu, X., Lopez, J., Ma, D., Majoros, W., McDaniel, J., Murphy, S., Newman, M., Nguyen, T., Nguyen, N., Nodell, M., Pan, S., Peck, J., Peterson, M., Rowe, W., Sanders, R., Scott, J., Simpson, M., Smith, T., Sprague, A., Stockwell, T., Turner, R., Venter, E., Wang, M., Wen, M., Wu, D., Wu, M., Xia, A., Zandieh, A., and Zhu, X. (2001) Science 291, 1304-1351
  27. Nagase, T., Ishikawa, K., Miyajima, N., Tanaka, A., Kotani, H., Nomura, N., and Ohara, O. (1998) DNA Res. 5, 31-39
  28. Nomoto, H., Oohashi, T., Hirakawa, S., Ueki, Y., Ohtsuki, H., and Ninomiya, Y. (2002) Acta Med. Okayama 56, 25-29
  29. Ohno, S. (1993) Curr. Opin. Genet. Dev. 911-914
  30. Friedman, R., and Hughes, A. L. (2001) Genome Res. 11, 1842-1847
  31. Yamaguchi, Y. (2002) Cell. Mol. Life Sci. 57, 276-289
  32. Oohira, A., Matsui, F., Tokita, Y., Yamauchi, S., and Aono, S. (2000) Arch. Biochem. Biophys. 374, 24-34
  33. Day, A. J., and Prestwich, G. D. (2002) J. Biol. Chem. 277, 4585-4588
  34. Kohda, D., Morton, C. J., Parkar, A. A., Hatanaka, H., Inagaki, F. M., Campbell, I. D., and Day, A. J. (1996) Cell 86, 767-775