©1995 by The American Society for Biochemistry and Molecular Biology, Inc.
Identification of Three N-terminal Ends of Type XVIII Collagen Chains and Tissue-specific Differences in the Expression of the Corresponding Transcripts
THE LONGEST FORM CONTAINS A NOVEL MOTIF HOMOLOGOUS TO RAT AND DROSOPHILA FRIZZLED PROTEINS (*)

(Received for publication, August 3, 1994; and in revised form, October 28, 1994)

Marko Rehn, Taina Pihlajaniemi (§)

From the Collagen Research Unit, Biocenter and Department of Medical Biochemistry, University of Oulu, FIN-90220, Finland


ABSTRACT

Transcripts for the alpha1 chain of mouse type XVIII collagen were found to be heterogeneous at their 5`-ends and to encode three variant N-terminal sequences of the ensuing 1315-, 1527-, or 1774-residue collagen chains. The variant mRNAs appeared to originate from the use of two alternate promoters of the alpha1(XVIII) chain gene, resulting in the synthesis of either short or long N-terminal non-collagenous NC1 domains, the latter being further subject to modification due to alternative splicing of the transcripts. As a result, the 1527- and 1774-residue polypeptides share the same signal peptide, and the lengths of their NC1 domains are 517 or 764 amino acid residues, respectively, while the 1315-residue polypeptide has a different signal peptide and a 301-residue NC1 domain. The longest NC1 domain was strikingly characterized by a 110-residue sequence with 10 cysteines, which was found to be homologous with the previously identified frizzled proteins belonging to the family of G-protein-coupled membrane receptors. Thus, it is proposed that the cysteine-rich motif, termed fz, represents a new sequence motif that can be found in otherwise unrelated proteins. Tissues containing mainly one or two NC1 domain mRNA variants or all three NC1 domains were identified, indicating that there is tissue-specific utilization of two alternate promoters and alternative splicing of alpha1(XVIII) transcripts.


INTRODUCTION

The collagen superfamily includes 19 collagen types(1, 2, 3) . A feature common to all collagenous proteins is the presence of at least one triple helical sequence of a repeated Gly-X-Y motif. All collagenous proteins contain non-collagenous sequences at their N and C termini and often also within the collagenous sequences. Portions of the non-collagenous domains of several collagens can be aligned with sequences found in non-collagenous mosaic proteins such as fibronectin, von Willebrand factor, and thrombospondin, while other regions are exclusively found in the collagens(4) .

The 19 types of collagen can be divided into two groups in terms of their primary structure and supramolecular aggregates, fibrillar and non-fibrillar collagens(1, 2, 3) . Characteristic of the structurally homologous fibril-forming collagens, i.e. types I-III, V, and XI, is that the repeated Gly-X-Y sequence is long and uninterrupted, while the members of the heterogeneous group of non-fibrillar collagens, types IV, VI-X, and XII-XIX, are all characterized by the presence of one or more interruptions in the collagenous sequence. Several kinds of molecular assembly have been found in the non-fibrillar collagens, and members of this group can be divided into distinct subgroups in terms of their structural homology.

A recent addition to the non-fibrillar collagens is type XVIII(5, 6, 7, 8) . Elucidation of the complete primary structure of the alpha1 chain of mouse type XVIII collagen has revealed a 1315-residue polypeptide that includes a 25-residue signal peptide, a 301-residue N-terminal non-collagenous domain (NC1), a 674-residue collagenous sequence with nine interruptions of 10-24 residues, and a 315-residue non-collagenous domain (NC11)(8) . It is likely that heterogeneity occurs at the N-terminal end, since the overlapping cDNA clones reported (6, 7) differ with respect to the first 27 amino acid residues of the alpha1(XVIII) chain.

Interestingly, seven of the collagenous domains of the alpha1(XVIII) chain and both flanking domains share homology with the recently described alpha1 chain of type XV collagen(6, 9, 10, 11) , and it has been suggested that they may form a new subgroup within the collagen family (5, 6, 7) . Their N-terminal non-collagenous domains share sequence homology in that both contain an approximate 200-residue sequence corresponding to the N-terminal end of thrombospondin(7, 10) . The C-terminal non-collagenous domains of the alpha1(XVIII) and alpha1(XV) chains are unique to these two chains and are highly homologous throughout their sequences(6, 8) . Despite the homology, the two chains are sufficiently different to make it unlikely that they reside in the same collagen molecule, implying that they are alpha chains of separate collagen types. The human genes encoding these homologous proteins are located on separate chromosomes, that for the alpha1(XVIII) chain on 21q22.3 (12) and that for the alpha1(XV) chain on 9q21-22(13) .

The purpose of this work was to elucidate the nature of the sequence variability affecting the N-terminal ends of the alpha1(XVIII) chains. Three variant N-terminal sequences were identified, and one of these was found to contain a cysteine-rich domain postulated here to represent a new sequence motif. Furthermore, marked differences were observed in the tissue distribution of the variant mRNAs corresponding to the three N-terminal ends.


MATERIALS AND METHODS

Preparation of a Mouse Embryo cDNA Library and cDNA Pool

The total RNA from an 18.5-day-old mouse embryo was a generous gift from Dr. Marjo Metsäranta (University of Turku, Finland). Poly(A) RNA was isolated from this total RNA using oligo(dT)-coated DynaBeads (Dynal, Norway), and a primer-extended cDNA pool was prepared from about 1 µg of the poly(A) RNA using an oligonucleotide MIXX-20 (see below) complementary to mouse alpha1(XVIII) sequences as a primer and the Time-Saver-cDNA synthesis kit (Pharmacia Biotech Inc.). The cDNAs were ligated into the λgt10 vector (Stratagene) and packaged into bacteriophage particles using the in vitro packaging extract (Amersham Corp.). An aliquot of the blunt-ended cDNA pool was also characterized by the PCR (^1) method as described below.

Isolation of cDNA Clones and Nucleotide Sequencing

A 562-bp EcoRI fragment (nt 1-562 in the previously described clone ME-1, see (7) ) was used as a probe to screen the mouse 18.5-day-old embryo library(14) . The positive recombinant phages were isolated, and the insert DNAs were subcloned to the NotI site of Bluescript SK (Stratagene). The nucleotide sequences were determined for both strands of the cDNAs by the dideoxynucleotide sequencing method (15) using T7 polymerase (Pharmacia Biotech Inc.).

To isolate the extreme 5`-end of the short alpha1(XVIII) variant, cDNA fragments were obtained without the conventional library screening by the PCR method developed for isolation of end fragments from yeast artificial chromosome clones(16) . A 3-µl aliquot of the pool of blunt-ended cDNA fragments was ligated at 0.1 µM final concentration of a linker solution (for linkers, see (16) ) in a 25-µl ligation reaction at room temperature for 4 h. The incubation was stopped by adding 75 µl of water, and the strands were denatured by heating for 10 min at 95 °C. 2 µl of the diluted cDNA pool was used as the template in a 10-µl PCR reaction containing Dynazyme polymerase buffer (Finnzymes, Finland), 0.2 mM of each deoxynucleotide, 10 pmol of MIXX-23 (see below), 1 pmol of a 25-mer linker primer (see (16) ), and 0.5 units of Dynazyme polymerase (Finnzymes, Finland). The amplification conditions were 1 min at 94 °C, 45 s at 65 °C, and 1 min at 72 °C for 35 cycles. After the first round of PCR, the reaction mixture was diluted with 250 µl of water, and 2 µl of the dilution was used for the second round of PCR with the conditions as above except that 5 pmol of MIXX-24 (see below) primer and 5 pmol of the 25-mer linker primer were used. The products synthesized were EcoRI digested and subcloned to the EcoRI site of Bluescript SK. After transformation, colony hybridization was used to screen for recombinants containing alpha1(XVIII) sequences.
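The volumes and primer amounts quoted for the two-round PCR can be turned into dilution factors and working concentrations with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, and it assumes that the first-round reaction still had its nominal 10-µl volume when it was diluted.

```python
# Worked arithmetic for the nested PCR set-up described above.
# Volumes (µl) and amounts (pmol) come from the text; helper names are
# illustrative, and the 10-µl volume at the dilution step is an assumption.

def micromolar(pmol: float, volume_ul: float) -> float:
    """Return concentration in µM (1 pmol per µl equals 1 µM)."""
    return pmol / volume_ul


def dilution_factor(sample_ul: float, added_ul: float) -> float:
    """Return the fold dilution after adding diluent to a sample."""
    return (sample_ul + added_ul) / sample_ul


# 25-µl ligation stopped with 75 µl of water
ligation_dilution = dilution_factor(25, 75)

# 10-µl first-round PCR diluted with 250 µl of water
round1_dilution = dilution_factor(10, 250)

# Primer concentrations in each 10-µl reaction
first_round = {
    "MIXX-23": micromolar(10, 10),
    "linker primer": micromolar(1, 10),
}
second_round = {
    "MIXX-24": micromolar(5, 10),
    "linker primer": micromolar(5, 10),
}

if __name__ == "__main__":
    print(f"ligation diluted {ligation_dilution:g}-fold")
    print(f"first-round product diluted {round1_dilution:g}-fold")
    print("first round (µM):", first_round)
    print("second round (µM):", second_round)
```

Under these assumptions the gene-specific primer is in 10-fold excess over the linker primer in the first round, whereas the two primers are equimolar in the second round.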

To isolate cDNA clones covering the translation initiation codon of the long variant, the mouse embryo cDNA pool described above was used for PCR as described above. A 10-µl PCR reaction contained 2 µl of the cDNA pool, Dynazyme polymerase buffer, 0.2 mM of each deoxynucleotide, 5 pmol of the primer MIXX-39 (see below), 5 pmol of a 25-mer linker primer, and 0.5 units of Dynazyme polymerase, and the amplification was performed as above. The PCR products were characterized as above except that the probe used in the hybridization was the insert of the clone PE17.24 described above.

The oligonucleotide primers used were (bases added to generate a complete EcoRI restriction site as well as two-base extensions are underlined): MIXX-20, 5`-GATGGCAAATAGCACCC-3` (nt 379-395 and 1672-1688 in Fig. 2, A and B, respectively); MIXX-23, 5`-GTGGCTGGCCGGACATGAAACAG-3` (nt 345-367 and 1638-1660 in Fig. 2, A and B, respectively); MIXX-24, 5`-ATGAATTCTGGTCCAAAGATGTAGGCCGG-3` (nt 258-280 and 1551-1573 in Fig. 2, A and B, respectively); and MIXX-39, 5`-ATTCAGGGGACTCAGGGAATTC-3` (nt 86-107 in Fig. 2B).
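The primer coordinates can be cross-checked against the primer lengths, and a span in Fig. 2A can be converted to its Fig. 2B equivalent with a constant offset. The following is a minimal sketch: the sequences and the Fig. 2A spans are those listed above, the offset is derived from the two sets of MIXX-20 coordinates, and all function and variable names are illustrative.

```python
# Consistency check for primer lengths and Fig. 2A -> Fig. 2B coordinates.
# Sequences and spans are taken from the primer list; names are illustrative.

PRIMERS = {
    "MIXX-20": "GATGGCAAATAGCACCC",
    "MIXX-23": "GTGGCTGGCCGGACATGAAACAG",
    "MIXX-24": "ATGAATTCTGGTCCAAAGATGTAGGCCGG",
    "MIXX-39": "ATTCAGGGGACTCAGGGAATTC",
}

# Nucleotide spans (1-based, inclusive) in Fig. 2A
FIG2A_SPANS = {
    "MIXX-20": (379, 395),
    "MIXX-23": (345, 367),
    "MIXX-24": (258, 280),
}

# MIXX-20 lies at nt 379-395 in Fig. 2A and nt 1672-1688 in Fig. 2B
OFFSET_A_TO_B = 1672 - 379

ECORI_SITE = "GAATTC"


def span_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive nucleotide span."""
    return end - start + 1


def to_fig2b(span: tuple[int, int]) -> tuple[int, int]:
    """Shift a Fig. 2A span into Fig. 2B numbering."""
    return (span[0] + OFFSET_A_TO_B, span[1] + OFFSET_A_TO_B)


if __name__ == "__main__":
    for name, span in FIG2A_SPANS.items():
        extra = len(PRIMERS[name]) - span_length(*span)
        print(f"{name}: primer {len(PRIMERS[name])} nt, "
              f"Fig. 2A span {span_length(*span)} nt, "
              f"non-templated bases {extra}, Fig. 2B span {to_fig2b(span)}")
```

MIXX-20 and MIXX-23 match their spans exactly, while MIXX-24 is 6 nt longer than its 23-nt span, which is consistent with the added bases that complete the EcoRI site and form the two-base extension.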


Figure 2: Nucleotide and deduced amino acid sequences of cDNA clones for the three NC1 domains of the alpha1 chain of mouse type XVIII collagen. A, nucleotide and deduced amino acid sequences of the short NC1 domain, NC1-301, the sequences shown encoding the signal peptide and the first 78 residues of this domain. The asterisk indicates the extreme 5`-end nucleotide of the previously reported clone SXT-5(7) . B, nucleotide and deduced amino acid sequences for the long NC1 domains, NC1-764 and NC1-517, the sequences shown encoding the signal peptide and the first 541 or 273 residues, respectively, of these domains. Sequences lacking in clones PE8.1, PE19, and PE15.2 due to alternative splicing are shown in brackets. In A and B, the last 17 nt represent oligonucleotide MIXX-20 (see Fig. 1A). A 299-residue sequence is common to the three NC1 domains, and the first 76 residues of the common sequences are shown as shaded in A and B. The remaining 223 residues are not shown in the figure, as they are encoded by sequences downstream of oligonucleotide MIXX-20 used in this study and can be found in a previous study(7) . Potential N-linked glycosylation sites are boxed, cysteine residues are circled, and the arrows indicate the most likely signal peptide cleavage sites. The numbering of the nucleotide and amino acid residues begins from the 5`-end.




Figure 1: cDNA clones encoding three N termini of the mouse alpha1 chain of type XVIII collagen and hydropathy plot of the longest variant. A, the overlapping cDNA clones are shown with respect to the schematic structure of the short (upper panel) and long (lower panel) NC1 domains. The clones covering three different length NC1 variants, NC1-301, NC1-517, and NC1-764, are shown. SS1 and SS2 indicate signal peptides for the NC1-301 and NC1-764/517 variants, respectively. The lengths of the signal peptides, the domain common to NC1-764 and NC1-517, the alternatively spliced sequence present only in NC1-764, and the sequence common to all NC1 variants are shown in amino acids; (2) indicates residues unique to the NC1-301. The locations of the EcoRI (E) and SacI (S) restriction sites are indicated. The EcoRI sites shown in parentheses represent linker sites introduced during cloning. cDNA sequences specific for the long and short NC1 domains are shown with gray and white boxes, respectively. The locations of oligonucleotide primers used in cloning are shown with arrows. The scale is in kilobases. B, in the hydropathy plot, the numbering of the amino acid residues begins from the first residue of the longest NC1 domain. The hydrophobic regions are positive, and the hydrophilic ones are negative. The alternatively spliced sequences are shown in brackets. The locations of all cysteine residues are shown with vertical bars. The variable sequences of the three N termini and their 299-residue common region are indicated. The arrow indicates the most likely signal peptide cleavage site.



Northern Blot Analysis of Mouse Tissues Using mRNA Variant-specific Probes

A Mouse Multi-Tissue Northern blot (Clontech) prepared by gel electrophoresis of 2 µg/lane of poly(A) RNA isolated from various adult mouse tissues was hybridized under stringent conditions with ³²P-labeled probes. The probes were (a) a mixture of the previously described 2.3- and 1.6-kb cDNA clones SXT-1 (7) and ME-103(8) , respectively, which recognize all mRNAs encoding the mouse alpha1(XVIII) collagen chain; (b) a 458-bp EcoRI/SacI fragment of the clone PE19, which recognizes both alternatively spliced variants of the long NC1 variant of the mouse alpha1(XVIII) chain; (c) a 540-bp XmnI/StuI fragment of the clone PE17.24, specific for the longest variant of the mouse alpha1(XVIII) chain; and (d) a 147-bp EcoRI/MluI fragment of the clone PX2.25, specific for the shortest alpha1(XVIII) chain variant.

Sequence Analysis

DNASIS and PROSIS (Pharmacia Biotech Inc.) were used to analyze the nucleotide and amino acid sequence data. The hydropathy plot was constructed according to Kyte and Doolittle (17) with a window size of 9 residues. The signal sequences were predicted by the Antheprot software(18) . Nucleotide and amino acid homology comparisons were made against the GenBank, EMBL, PIR, and SWISS-PROT data bases at the National Center for Biotechnology Information (National Institutes of Health) using the BLAST network service(19) . Multiple sequence alignment was constructed using the Pileup program of the GCG package software (20) with the parameters of PAM-250 matrix(21) .
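A Kyte and Doolittle hydropathy profile with a 9-residue window can be sketched as a sliding-window mean over per-residue hydropathy indices. The code below is an illustration rather than the software used in this work: the scale values are the published Kyte-Doolittle indices, the function name is hypothetical, and the test sequence is an artificial one chosen only to show the two extremes of the scale.

```python
# Sliding-window hydropathy profile (Kyte and Doolittle scale).
# Illustrative sketch; not the DNASIS/PROSIS implementation.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[tuple[int, float]]:
    """Return (center position, mean hydropathy) for every full window.

    Positions are 1-based. Positive means are hydrophobic and negative
    means are hydrophilic, matching the convention of the hydropathy plot.
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    half = window // 2
    profile = []
    for start in range(len(sequence) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


if __name__ == "__main__":
    # Artificial sequence: a leucine stretch followed by an acidic stretch
    toy = "LLLLLLLLLDDDDDDDDD"
    for position, value in hydropathy_profile(toy, window=9):
        print(f"{position:3d} {value:6.2f}")
```

With an odd window each mean is assigned to the central residue, so the first and last four residues of a sequence receive no value when the window is 9.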


RESULTS

Isolation of cDNA Clones Encoding the 5`-End of the Mouse alpha1(XVIII) Chain

To evaluate the nature of the heterogeneity at the N terminus of the alpha1(XVIII) collagen chain, a primer-extension cDNA library and a cDNA pool were constructed from an 18.5-day-old mouse embryo using as a primer an oligonucleotide complementary to mouse alpha1(XVIII) sequences coding for a portion of the known N terminus (MIXX-20 in Fig. 1A). Screening of the library (see "Materials and Methods") resulted in the identification of five recombinant phages: PE8.1, PE15.2, PE17.24, PE19, and PE21 (Fig. 1A). The 0.36-kb cDNA clone PE21 appeared to represent the same 5`-end sequences as our previously reported clone SXT-5 (7) but extending 27 nt further in the 5` direction (Fig. 2A). Additional 5`-sequences were obtained by using an aliquot of the cDNA pool prepared by primer extension of mouse embryo RNA for amplification of the 5`-ends of the template cDNAs by PCR. For this purpose, the blunt-ended cDNA pool was ligated to a linker, and the 5`-ends of the cDNAs were specifically amplified with a sense primer corresponding to the linker and an antisense oligonucleotide (MIXX-24 in Fig. 1A) corresponding to 5`-sequences of type XVIII collagen. This resulted in the identification of the 0.25-kb clone PX2.25 (Fig. 1A), which extended 66 nt further in the 5` direction (Fig. 2A) than the previously reported clone SXT-5(7) .

The cDNA clones PE8.1, PE19, and PE15.2 varied in length between 833-929 nt but contained overlapping sequences at their extreme 5`- and 3`-ends to the 1.7-kb clone PE17.24 (Fig. 2B). The cDNA-derived open reading frame of the clone PE17.24 contained a stretch of leucine residues at its beginning, which suggested that the translation initiation codon might be located in the near 5` direction. Primer extension in combination with PCR of the 5`-ends using MIXX-39 as an antisense oligonucleotide resulted in isolation of the 94-bp clone PX4.3 (Fig. 1A), which extended furthest in the 5` direction and covered the putative initiation codon.

Nucleotide and Amino Acid Sequences for Three Variant N-terminal Ends

Where the previously reported clones encoded a full-length polypeptide of 1315 amino acid residues(7, 8) , characterization of the clones PE21 and PX2.25 resulted in the identification of 66 nt of 5`-untranslated sequences beyond the 20 nt previously reported on the basis of clone SXT-5(7) , thus extending the 5`-untranslated portion of the corresponding mRNA to 86 nt (Fig. 2A). As previously described, the polypeptide encoded by this variant mRNA contains a 25-residue putative signal peptide (signal peptide 1) and a 301-residue N-terminal non-collagenous domain termed NC1-301(7) . This domain is characterized by the presence of a thrombospondin sequence motif with one potential N-linked glycosylation site and 2 cysteines residing within it.

The 1461 extreme 5` nucleotides derived from the overlapping clones PE17.24 and PX4.3 differed from the first 168 nt derived from clones PX2.25 and PE21, but nt 1462-1688 of PE17.24/PX4.3 and nt 169-395 of PX2.25/PE21 were identical. Thus, PE17.24/PX4.3 must encode a variant N-terminal domain in which the signal peptide and the first 2 amino acid residues of the NC1-301 domain are replaced by a markedly longer sequence. The sequences encoding the remaining 299 residues of the NC1 domain are identical for the two variants. Nucleotides 1338-1467 of PE17.24/PX4.3 were identical to the 123-nt clone TA5 reported by Oh et al.(6) as differing from the 5`-end of the mRNA encoding the NC1-301. The sequences described in this paper therefore represent the same 5`-end as reported by Oh et al.(6) but extend 1.4 kb further in the 5` direction.

The overlapping clones PE17.24 and PX4.3 correspond to a non-collagenous domain of 785 residues (Fig. 2B). This sequence begins with a methionine, and only 2 nt of 5`-untranslated sequences are included in the clones. The N-terminal sequence of the predicted polypeptide is highly hydrophobic and clearly fulfills the criteria for a signal peptide. The cysteine at position 19 and leucine at position 21 best suit the rules for residues occupying the -3 and -1 positions in a signal peptide, but the valine at position 22 and the alanine at position 24 satisfy the rules almost as well(22) . Thus, the signal peptide is predicted to be either 21 or 24 residues in length. Assuming that the signal peptide identified here is 21 residues long (signal peptide 2), the NC1 domain encoded by the mRNA corresponding to PE17.24 will be 764 residues (NC1-764). A striking feature of the NC1-764 domain is the presence of 10 cysteine residues within a stretch of 110 residues located immediately upstream of the portion of the NC1-764 domain identical to the NC1-301. In addition, putative N-linked glycosylation sites are located at residues 354 and 361 of NC1-764.

Clones PE8.1, PE15.2, and PE19 were lacking nt 721-1461, which encode residues 241-486 of NC1-764 (Fig. 2B). Thus, these clones covered the same signal peptide 2 as clones PE17.24 and PX4.3, and the NC1 domain is 517 residues (NC1-517). A stretch of 247 residues located at the center of NC1-764 is lacking from NC1-517, and most strikingly, the region lacking encompasses the cysteine-rich domain and the two putative N-linked glycosylation sites (see above).
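The relationship between the spliced-out nucleotides and the lengths of the two long NC1 domains can be verified with simple arithmetic. The sketch below uses only figures quoted in the text (the nt 721-1461 gap, the 764-residue domain, and the 21-residue signal peptide assumed above); the function name is illustrative.

```python
# Arithmetic check relating the alternatively spliced region to the
# lengths of the NC1-764 and NC1-517 domains.

def codons_in_span(nt_start: int, nt_end: int) -> int:
    """Number of whole codons in a 1-based inclusive nucleotide span."""
    length = nt_end - nt_start + 1
    if length % 3 != 0:
        raise ValueError("span length is not a multiple of three")
    return length // 3


SIGNAL_PEPTIDE_2 = 21   # residues, assuming cleavage after position 21
NC1_LONGEST = 764       # residues in NC1-764

skipped_nt = 1461 - 721 + 1
skipped_residues = codons_in_span(721, 1461)
nc1_spliced = NC1_LONGEST - skipped_residues
precursor_noncollagenous = SIGNAL_PEPTIDE_2 + NC1_LONGEST

if __name__ == "__main__":
    print(f"nucleotides removed: {skipped_nt}")
    print(f"residues removed: {skipped_residues}")
    print(f"NC1 domain after splicing: {nc1_spliced} residues")
    print(f"signal peptide + NC1-764: {precursor_noncollagenous} residues")
```

Because the removed span is a multiple of three nucleotides, the reading frame downstream of the splice is preserved, which is consistent with the 299-residue common sequence being identical in both long variants.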

Hydropathy analysis indicated that an approximately 70-residue stretch adjacent to the putative signal peptide of NC1-764 and NC1-517 represents the most hydrophilic region of the NC1 domain sequences, this stretch being particularly rich in acidic amino acid residues (Fig. 1B and Fig. 2B). In contrast, the region subject to alternative splicing and the beginning of the common NC1 portion are the most hydrophobic parts (Fig. 1B).

A Novel Sequence Motif in the NC1-764 Domain of alpha1(XVIII) Chains

Homology searches against protein data banks resulted in the finding of an amino acid homology between the cysteine-rich domain of NC1-764 and the rat proteins frizzled-1 and frizzled-2 (fz-1 and fz-2, respectively) (Fig. 3)(23) . The BLAST search predicted that the probability of coincidental homology between the cysteine-rich domain of NC1-764 and rat fz-1 protein is 0.017%. The two rat proteins are homologous to a Drosophila melanogaster frizzled protein encoded by the locus frizzled(24) .


Figure 3: Comparison of the cysteine-rich sequences identified in the NC1-764 domain of the mouse alpha1(XVIII) collagen chain with rat frizzled-1 and frizzled-2 proteins and the Drosophila frizzled protein. The numbering of the amino acid residues begins from the N termini of each protein (23) . Note that the last alpha1(XVIII) chain amino acid residue in the aligned sequence represents the extreme C-terminal residue of the alternatively spliced region in the NC1-764 domain. The amino acid residues that are identical between the mouse alpha1(XVIII) collagen and one or more of the frizzled proteins are shown by black boxes, and similar residues are shown in shaded boxes. The additional identities and homologies that exist only between the frizzled proteins are not indicated. The similarly located cysteine residues are numbered from the N-direction. A consensus motif for the homologous sequence present in all four polypeptides is indicated as follows: h, hydrophobic; p, polar; -, acidic residues; and +, basic residues.



The 10 cysteine residues located in the cysteine-rich region of the mouse alpha1(XVIII) collagen chain NC1 domain can be aligned with 10 almost identically spaced cysteine residues in the rat and Drosophila frizzled proteins (Fig. 3). Other residues around the cysteines are also found to be identical or similar, the identity between a stretch of 126 amino acids in NC1-764 and 127 amino acids in the rat frizzled-1 protein being 24% and the similarity 47%. Within this stretch, the degree of identity is 57-86%, and the degree of similarity is 81-95% between the three frizzled proteins. The numbers of amino acid residues separating cysteines 2 and 3, cysteines 3 and 4, cysteines 4 and 5, and cysteines 6 and 7 are identical in the three frizzled proteins and the alpha1(XVIII) chain, while differences of 1-4 residues in length can be observed between the other cysteine pairs.
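Percent identity and the spacing between consecutive cysteines are both straightforward to compute once two sequences have been aligned. The sketch below is illustrative only: the two aligned fragments are invented placeholders and not the collagen or frizzled sequences, the function names are hypothetical, and identity is calculated over columns without gaps, which is one of several conventions in use.

```python
# Percent identity and cysteine spacing for a pair of aligned sequences.
# The example fragments are hypothetical, not real protein sequences.

GAP = "-"


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity (%) over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != GAP and b != GAP]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def cysteine_spacings(aligned_seq: str) -> list[int]:
    """Number of residues between each pair of consecutive cysteines."""
    ungapped = aligned_seq.replace(GAP, "")
    positions = [i for i, residue in enumerate(ungapped) if residue == "C"]
    return [later - earlier - 1
            for earlier, later in zip(positions, positions[1:])]


if __name__ == "__main__":
    # Hypothetical aligned fragments with one gap in the first sequence
    fragment_a = "CPAKCLG-C"
    fragment_b = "CPSRCLGEC"
    print(f"identity: {percent_identity(fragment_a, fragment_b):.1f}%")
    print("spacings a:", cysteine_spacings(fragment_a))
    print("spacings b:", cysteine_spacings(fragment_b))
```

In this toy pair the first cysteine spacing is the same in both fragments and the second differs by one residue, which mirrors the kind of comparison made between the cysteine pairs of the four proteins.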

Tissue Distribution of Variant 5`-Ends of Type XVIII Collagen Transcripts

A Northern blot containing poly(A) RNA isolated from adult mouse tissues was used to investigate the distribution of mouse type XVIII collagen transcripts with variant 5`-ends (Fig. 4). The probe identifying all variant mouse alpha1(XVIII) transcripts showed strong hybridization signals with liver, kidney, lung, skeletal muscle, and testis RNA and faint signals with the other samples (Fig. 4A). Interestingly, the tissues with strong signals contained multiple transcripts varying in size between 4.5-7.0 kb.


Figure 4: Northern blot analysis of variant mRNAs of mouse alpha1(XVIII) collagen in mouse tissues. 2 µg of poly(A) RNA from the adult tissues indicated were fractionated by gel electrophoresis. muscle, skeletal muscle. A, blot hybridized with a mixture of cDNA probes recognizing all of the alpha1(XVIII) collagen mRNAs. To obtain a representative hybridization pattern for all of the alpha1(XVIII) collagen mRNAs, a shorter exposure (short exp.) of the skeletal muscle sample is also shown. B, blot hybridized with a probe identifying mRNAs encoding both long NC1 domains. C, blot hybridized with a probe recognizing mRNAs encoding the NC1-764 variant. D, blot hybridized with a probe identifying mRNAs encoding the NC1-301 domain. The positions of the probes with respect to the variant NC1 structures are given schematically below the autoradiography blots. The sizes of marker RNAs and the alpha1(XVIII) mRNA signals are indicated in kilobases.



The same Northern blot was also hybridized with variant-specific cDNA probes. Two different-sized mRNAs were found to occur for both the NC1-764 and NC1-301 variants and probably also for the NC1-517 variant, which reflected the utilization of different poly(A) signals(5, 6) . Probe B (see Fig. 4) identifying all mRNAs encoding NC1-764 and NC1-517 domains resulted in clear signals of 5.7, 6.1, and 7.0 kb in the lung, liver, skeletal muscle, and kidney and extremely faint signals of 6.1 and 7.0 kb in the other tissues (Fig. 4B), while probe C (see Fig. 4) detecting only those mRNAs encoding the NC1-764 variant resulted in the detection of two transcripts of 6.1 and 7.0 kb in the lung, liver, skeletal muscle, and kidney (Fig. 4C) and extremely faint signals in the other tissues (not visible in Fig. 4C). Comparison of the signals obtained using the B and C probes indicated that the 7.0-kb band is specific for NC1-764 mRNA, and the 5.7-kb band is specific for NC1-517 mRNA, while the 6.1-kb band contains both the shorter NC1-764 mRNA and the longer NC1-517 mRNA. Furthermore, comparison of the signal intensities obtained with probes B and C suggests that mRNAs encoding the NC1-517 variant are in the majority in the liver. A probe specific for NC1-301 resulted in detection of strongly hybridizing mRNAs of 4.5 and 5.7 kb in the kidney and testis, the same transcripts also being faintly visible in the other tissues (Fig. 4D). Since the NC1-301-specific probe did not recognize the mRNAs for the long variants, the results suggest that separate transcription initiation sites exist for the two types of transcripts. The relative expression levels of the three mRNA variants are indicated in Table 1.




DISCUSSION

The three NC1 domain variants of the type XVIII collagen chains are likely to be due to the use of two alternate promoters and to the primary transcripts for one of these promoters also being subject to alternative splicing. An overview of the three alpha1(XVIII) polypeptide variants, which consist of 1774, 1527, or 1315 amino acid residues with sequence-derived molecular masses of 182.2, 156.0, or 134.3 kDa, respectively, is presented in Fig. 5. The first two variants have the same signal peptide and NC1 domains that are either 764 or 517 residues in length, depending on the alternative splicing, while the third variant has its own signal peptide, and its NC1 domain is 301 residues. All three polypeptides are thought to be identical with respect to a 299-residue portion of their NC1 domains, their collagenous domains, and their C-terminal non-collagenous domains.
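The three chain lengths follow directly from the domain lengths given in this paper. The sketch below adds them up, using the 674-residue collagenous sequence and the 315-residue C-terminal domain quoted in the Introduction, and a 21-residue signal peptide 2 as assumed in the Results; the variable names are illustrative.

```python
# Full-length chain sizes assembled from the domain lengths in the text.
# Signal peptide 2 is taken as 21 residues, as assumed in the Results.

COLLAGENOUS = 674        # residues, collagenous sequence with interruptions
C_TERMINAL = 315         # residues, C-terminal non-collagenous domain
COMMON_NC1 = 299         # residues shared by all three NC1 domains

# variant name -> (signal peptide length, NC1 domain length)
VARIANTS = {
    "NC1-301": (25, 301),
    "NC1-517": (21, 517),
    "NC1-764": (21, 764),
}

chain_lengths = {
    name: signal + nc1 + COLLAGENOUS + C_TERMINAL
    for name, (signal, nc1) in VARIANTS.items()
}

# Residues of each NC1 domain lying upstream of the common 299 residues
variant_specific = {
    name: nc1 - COMMON_NC1 for name, (_, nc1) in VARIANTS.items()
}

if __name__ == "__main__":
    for name in VARIANTS:
        print(f"{name}: {chain_lengths[name]} residues in total, "
              f"{variant_specific[name]} NC1 residues upstream of the common region")
```

The totals reproduce the 1315-, 1527-, and 1774-residue chains, and the 247-residue difference between the two long variants equals the alternatively spliced sequence.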


Figure 5: Schematic structures of the full-length variant polypeptides of the mouse alpha1(XVIII) collagen chains. Collagenous sequences are shown in white, non-collagenous domains common to all variants are shown in black, non-collagenous sequences common to both long variant NC1 domain portions are shown in gray, and non-collagenous sequence unique to the NC1-764 variant is shown by cross-hatching. The putative signal peptide 1 is indicated with left hatching, and the putative signal peptides 2 are indicated with right hatching. The lengths of the amino acid sequences (aa) specific for each variant are given, as well as the lengths of the common regions. C, cysteine residue; 10C, cluster of 10 cysteine residues; N, potential N-glycosylation site; 2N, two adjacent N-glycosylation sites; O, potential O-linked glycosylation site; fz and tsp, frizzled and thrombospondin sequence motifs, respectively; ac, acidic domain.



Several of the other collagens are known to be modified by the use of alternative promoters and alternative splicing, although the significance of these modifications is not fully understood at present. An alternate, cartilage-specific transcription start site has been found within intron 2 of the gene encoding the chick alpha2 chain of type I collagen(25) , and an alternative transcript of the chick alpha1(III) gene was identified in which exons 1-23 are replaced by the initiation of transcription at intron 23(26) . These type I and III variant transcripts probably direct synthesis of non-collagenous chains. Two transcription start sites have also been found for the gene encoding the alpha1 chain of type IX collagen(27) . The two promoters are used in a tissue-specific manner, resulting in the synthesis of alpha1(IX) polypeptides possessing either long or short N-terminal non-collagenous domains similar to our observations regarding type XVIII collagen. The first collagen found to undergo alternative splicing was type XIII collagen, and it is still the only one in which this affects both collagenous and non-collagenous sequences(28, 29, 30) . Subsequently, alternative splicing has been found to affect the modular N-terminal non-collagenous domains of the alpha3 chain of type VI collagen(31, 32, 33) and the alpha1 chains of the homologous collagen types XII and XIV (34, 35, 36) . The C-terminal non-collagenous domain of the alpha3 chain of type IV collagen (37) and the alpha2 chain of type VI collagen(38, 39) are also modified by alternative splicing. Furthermore, the primary transcripts for the alpha1(XIV) chain are also subject to alternative splicing affecting the 5`-untranslated sequences, which are hypothesized to modulate translational control(35) . The mode of alternative splicing of primary transcripts is in most cases exon skipping, but use of an internal splice acceptor site has also been observed in alpha2(VI) transcripts(38) .
The 5`-end of the gene encoding the alpha1(XVIII) chains has not been characterized, and therefore we do not yet know how the proposed two promoters of this gene are arranged with respect to each other or what the mode of the observed alternative splicing is. Significant differences occur at the N-terminal ends of the type XVIII collagen chain variants, and it may therefore be presumed that the variant molecules possess different functional properties.

Both long NC1 domains have a markedly more acidic N terminus than NC1-301, which only consists of sequences present in all three NC1 variants. The most striking difference between the NC1 variants is the occurrence of a stretch of 110 amino acid residues with 10 cysteines within the 247-residue sequence only present in the NC1-764 domain. Interestingly, the alpha1(XVIII) chain cysteine-rich domain showed homology to a cysteine-rich domain found in the Drosophila frizzled protein and the rat frizzled-1 and frizzled-2 proteins. These proteins vary in size between 570-641 residues and all contain a domain characterized by 10 cysteines within their N-terminal one-third portion and seven putative membrane domains within their C-terminal two-thirds portion(23, 40) . With respect to the seven-transmembrane-domain profile, the frizzled proteins resemble the G-protein-coupled receptors. Mutations in the Drosophila frizzled locus encoding the frizzled protein cause abnormal orientation of the wing hairs, suggesting that this protein is needed for establishment of cell polarity in the epidermis(24, 41) . Moreover, genetic mosaic studies suggest that the product of the frizzled locus functions in a dual fashion as it appears to serve both in reception of a polarity signal and in its intercellular transmission to the adjacent cells. The cysteine-rich portions of the frizzled proteins encompass most of their extracellular portion, and these domains are thus likely to be involved in ligand binding and intercellular transmission of polarity information. The ligand(s) participating in this event is not known, however. Collagens are known to be mosaic proteins with a number of shuffled domains also present in non-collagenous proteins(4) . The homology identified here leads us to suggest that the cysteine-rich sequence, termed here the fz motif, represents another sequence motif that can be found in both non-collagenous and collagenous proteins.
Elucidation of the possible function of the fz motif in type XVIII collagen will require recombinant expression experiments, however.

The tissue distribution of the variant alpha1(XVIII) collagen transcripts is unusual. Of the eight mouse tissues studied, markedly high levels of mRNAs for type XVIII collagen were found in the liver and kidney, the next highest levels being found in the lung, skeletal muscle, and testis, while the brain, heart, and spleen contained markedly lower levels. We know of no other collagen mRNAs with similarly prominent expression in liver. mRNAs for the NC1-301 variant appeared to be constitutively expressed in low amounts in all tissues except the kidney and testis, where they were more abundant, while the two other mRNA variants, likely to be derived from the second promoter, were found in the kidney, liver, lung, and skeletal muscle and were thus more restricted in their tissue distribution. mRNAs encoding the NC1-517 variant were mainly responsible for the strong signals in the liver, but liver tissue was also found to contain the mRNA variant for the NC1-764 domain characterized by the cysteine-rich sequence. Lung tissue also contained mRNAs encoding the two long NC1 domains as its major variants but with a more even distribution of the two forms. The kidney contained all three mRNA variants, namely mRNAs for the two long NC1 domains and for the NC1-301 domain, whereas the testis contained appreciable amounts only of mRNAs for the NC1-301 domain. Northern analysis has revealed wide expression in rat tissues of mRNAs for the frizzled-1 and frizzled-2 proteins, with the highest mRNA levels in the kidney, liver, heart, uterus, and ovary (23). Thus, kidney and liver appear to be tissues that contain both mRNAs for the type XVIII collagen variant with the fz motif and mRNAs for the two frizzled proteins. In conclusion, the tissue distribution of alpha1(XVIII) mRNAs is unlike that of any of the other collagens, and there are distinct differences between the variants in this respect. The latter observation suggests a possible functional significance of the utilization of two presumed promoters and alternative splicing of alpha1(XVIII) transcripts.


FOOTNOTES

*
This work was supported by grants from the Research Council for Medicine within the Academy of Finland and the Sigrid Juselius Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) U11636 and U11637.

§
To whom correspondence should be addressed: Dept. of Medical Biochemistry, University of Oulu, Kajaanintie 52 A, FIN-90220 Oulu, Finland. Tel.: 358-81-5375800; Fax: 358-81-5375810; E-mail: pihlajan@phoenix.oulu.fi.

1
The abbreviations used are: PCR, polymerase chain reaction; bp, base pair(s); kb, kilobase pair(s); nt, nucleotide(s).


ACKNOWLEDGEMENTS

We thank Jaana Väisänen for expert technical assistance, Dr. Marjo Metsäranta (University of Turku, Finland) for providing the mouse embryo total RNA sample, and Kari I. Kivirikko (University of Oulu, Finland) for critical reading of the manuscript.


REFERENCES

  1. van der Rest, M., and Garrone, R. (1991) FASEB J. 5, 2814-2823
  2. Kivirikko, K. I. (1993) Ann. Med. 25, 113-126
  3. Mayne, R., and Brewton, R. G. (1993) Curr. Opin. Cell Biol. 5, 883-890
  4. Bork, P. (1992) FEBS Lett. 307, 49-54
  5. Abe, N., Muragaki, Y., Yoshioka, H., Inoue, H., and Ninomiya, Y. (1993) Biochem. Biophys. Res. Commun. 196, 576-582
  6. Oh, S. P., Muragaki, Y., Kamagata, Y., Timmons, S., Ooshima, A., and Olsen, B. R. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 4229-4233
  7. Rehn, M., and Pihlajaniemi, T. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 4234-4238
  8. Rehn, M., Hintikka, E., and Pihlajaniemi, T. (1994) J. Biol. Chem. 269, 13929-13935
  9. Myers, J. C., Kivirikko, S., Gordon, M. K., Dion, A. S., and Pihlajaniemi, T. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 10144-10148
  10. Kivirikko, S., Heinämäki, P., Rehn, M., Honkanen, N., Myers, J. C., and Pihlajaniemi, T. (1994) J. Biol. Chem. 269, 4773-4779
  11. Muragaki, Y., Abe, N., Ninomiya, Y., Olsen, B. R., and Ooshima, A. (1994) J. Biol. Chem. 269, 4042-4046
  12. Oh, S. P., Warman, M. L., Seldin, M. F., Cheng, S.-D., Knoll, J. H. M., Timmons, S., and Olsen, B. R. (1994) Genomics 19, 494-499
  13. Huebner, K., Cannizzaro, L. A., Jabs, E. W., Kivirikko, S., Manzone, H., Pihlajaniemi, T., and Myers, J. C. (1992) Genomics 14, 220-224
  14. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
  15. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
  16. Kere, J., Nagaraja, R., Mumm, S., Ciccodicola, A., D'Urso, M., and Schlessinger, D. (1992) Genomics 14, 241-248
  17. Kyte, J., and Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132
  18. Deléage, G., Clerc, F. F., and Roux, B. (1989) Comput. Appl. Biosci. 5, 159-160
  19. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410
  20. Devereux, J., Haeberli, P., and Smithies, O. (1984) Nucleic Acids Res. 12, 387-395
  21. Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1979) in Atlas of Protein Sequences and Structure (Dayhoff, M. O., ed) pp. 345-352, National Biomedical Research Foundation, Washington, D. C.
  22. von Heijne, G. (1986) Nucleic Acids Res. 14, 4683-4690
  23. Chan, S. D. H., Karp, D. B., Fowlkes, M. E., Hooks, M., Bradley, M. S., Vuong, V., Bambino, T., Liu, M. Y. C., Arnaud, C. D., Strewler, G. J., and Nissenson, R. A. (1992) J. Biol. Chem. 267, 25202-25207
  24. Adler, P. N., Charlton, J., and Vinson, C. (1987) Dev. Genet. 8, 99-119
  25. Bennett, V. D., and Adams, S. L. (1990) J. Biol. Chem. 265, 2223-2230
  26. Nah, H.-D., Niu, Z., and Adams, S. L. (1994) J. Biol. Chem. 269, 16443-16448
  27. Nishimura, I., Muragaki, Y., and Olsen, B. R. (1989) J. Biol. Chem. 264, 20033-20041
  28. Juvonen, M., and Pihlajaniemi, T. (1992) J. Biol. Chem. 267, 24693-24699
  29. Juvonen, M., Sandberg, M., and Pihlajaniemi, T. (1992) J. Biol. Chem. 267, 24700-24707
  30. Juvonen, M., Pihlajaniemi, T., and Autio-Harmainen, H. (1993) Lab. Invest. 69, 541-551
  31. Doliana, R., Bonaldo, P., and Colombatti, A. (1990) J. Cell Biol. 111, 2197-2205
  32. Stokes, D. G., Saitta, B., Timpl, R., and Chu, M.-L. (1991) J. Biol. Chem. 266, 8626-8633
  33. Zanussi, S., Doliana, R., Segat, D., Bonaldo, P., and Colombatti, A. (1992) J. Biol. Chem. 267, 24082-24089
  34. Trueb, J., and Trueb, B. (1992) Biochim. Biophys. Acta 1171, 97-98
  35. Gerecke, D. R., Foley, J. W., Castagnola, P., Gennari, M., Dublet, B., Cancedda, R., Linsenmayer, T. F., van der Rest, M., Olsen, B. R., and Gordon, M. K. (1993) J. Biol. Chem. 268, 12177-12184
  36. Wälchli, C., Trueb, J., Kessler, B., Winterhalter, K. H., and Trueb, B. (1993) Eur. J. Biochem. 212, 483-490
  37. Feng, L., Xia, Y., and Wilson, C. B. (1994) J. Biol. Chem. 269, 2342-2348
  38. Saitta, B., Stokes, D. G., Vissing, H., Timpl, R., and Chu, M.-L. (1990) J. Biol. Chem. 265, 6473-6480
  39. Saitta, B., Timpl, R., and Chu, M.-L. (1992) J. Biol. Chem. 267, 6188-6196
  40. Vinson, C. R., Conover, S., and Adler, P. N. (1989) Nature 338, 263-264
  41. Krasnow, R. E., and Adler, P. N. (1994) Development 120, 1883-1893
