(Received for publication, July 25, 1994; and in revised form, October 3, 1994)
Previous studies have reported >10 kilobases of human fibrillin-1 cDNA sequence, but a consensus regarding the 5` end of the transcript remains to be worked out. One approach to developing a clear consensus would be to search for regions of evolutionary conservation in transcripts from a related species such as mouse. As reported here, the mouse fibrillin-1 transcript encodes a highly conserved polypeptide of 2,871 amino acids. The upstream sequence that flanks the ATG is considerably less well conserved, however. Indeed, the ATG codon (which occurs in the context of a Kozak consensus sequence and is located just upstream of a consensus signal peptide) marks the point upstream of which human and mouse fibrillin-1 sequences cease to be nearly identical. Together, these results are consistent with previous efforts by Pereira et al. (Pereira, L., D'Alessio, M., Ramirez, F., Lynch, J. R., Sykes, B., Pangilinan, T., and Bonadio, J. (1993) Human Mol. Genet. 2, 961-968) to identify the human fibrillin-1 translational start site. Sequences immediately upstream of the ATG are GC-rich and devoid of TATA and CCAAT boxes, which suggests that the mouse fibrillin-1 gene will be broadly expressed. A survey of expression in mouse embryo tissues is consistent with this hypothesis and suggests two novel functions for fibrillin-associated microfibrils in non-elastic connective tissues.
Ten-nm microfibrils are found in a wide variety of extracellular matrices. In elastic tissues such as aorta and fetal bovine nuchal ligament, microfibrils have been shown to facilitate elastic fiber formation (Mecham and Davis, 1994). Microfibrils are also found in non-elastic tissues where they may anchor epithelial cells to the interstitial matrix and may also participate in wound healing (Vracko and Thorning, 1990). Microfibril-associated glycoprotein (Gibson et al., 1991; Chen et al., 1993) and fibrillin (Sakai et al., 1986) are considered to be integral microfibril constituents. Based on biochemical and immunochemical criteria, several other candidate proteins have been identified including: 32- and 250-kDa proteins isolated from bovine zonular fibrils (Streeten and Gibson, 1988), a 35-kDa protein isolated from bovine ligamentum nuchae (Serafini-Fracassini et al., 1981), proteins of 78, 70, and 25 kDa (Gibson et al., 1989), and a 58-kDa protein that is post-translationally modified into a 32-kDa fragment called the associated microfibril protein (Horrigan et al., 1992). Extracellular proteins that may be associated with microfibrils include emilin (Bressan et al., 1993), GP128/thrombospondin (Arabeille et al., 1991), 36-kDa microfibril-associated protein (Kobayashi et al., 1989), proteoglycan (Cleary and Gibson, 1983), fibronectin (Goldfischer et al., 1985), amyloid P component (Inoue and Leblond, 1986), vitronectin (Dahlback et al., 1989), and, perhaps, lysyl oxidase (Kagan et al., 1986; Baccarini-Contri et al., 1989).
Fibrillin is now recognized as a gene family with two well-characterized members. The human fibrillin-1 gene (FBN-1) has been mapped to chromosome band 15q21, while the human fibrillin-2 gene has been localized to chromosome band 5q23 (Lee et al., 1991). The corresponding mouse fibrillin genes have been assigned to mouse chromosome 2, band F (Fbn-1) and mouse chromosome 18, band D-E1 (Fbn-2) (Li et al., 1993). The primary structure of both molecules is similar, and the fibrillin-1 and fibrillin-2 polypeptides have been immunolocalized to 10-nm microfibrils in developing elastic tissues (Zhang et al., 1994). A unique fibrillin-like protein has recently been identified (Mecham and Davis, 1994), and it has been suggested that another fibrillin gene may exist on human chromosome 17 (Lee et al., 1991). The fact that naturally occurring mutations in FBN-1 cause Marfan syndrome and dominantly inherited ectopia lentis emphasizes the importance of this gene family (Dietz et al., 1991). Moreover, FBN-2 has been genetically linked to congenital contractural arachnodactyly, a rare disorder that shares some of the skeletal manifestations of Marfan syndrome (Lee et al., 1991).
Previous studies have reported >10 kb of human fibrillin-1 cDNA sequence, but a consensus regarding the structure of the 5` end of the transcript remains to be worked out. Pereira et al.(1993) used primer extension mapping and S1 nuclease analysis to show that the 5`-untranslated region of the FBN-1 transcript produced by cultured osteosarcoma cells (MG-63) was 134 nt in length. A limited analysis also suggested that sequences upstream of the cap site were GC-rich and lacked TATA and CCAAT boxes. This relatively simple structure predicted that the human fibrillin-1 promoter would have a broad temporal and spatial expression pattern. On the other hand, Corson et al.(1993) found evidence of a longer and more complicated 5`-untranslated region. DNA sequencing plus Northern analysis identified a 1.8-kb CpG island that contained three partial exons (designated B, A, and C). The CpG island also included a fourth exon (designated M) containing the initiator Met residue. In vitro and in vivo studies revealed a strong bias favoring the expression of transcripts that contained exon A. By analogy with other extracellular matrix genes, therefore, the CpG island may contain the transcription initiation site, while exons B, A, and C may be alternatively spliced first exons under the control of separate promoters. The formal possibility also exists that the CpG island does not contain the transcription start site and that there may be upstream exons (i.e. yet to be discovered) that may or may not be associated with CpG islands.
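The promoter features discussed above (GC-richness and the absence of TATA and CCAAT boxes) can be illustrated with a minimal Python sketch. The motif patterns are simplified assumptions (real promoter elements are degenerate and position-dependent), the function names are hypothetical, and the test strings are invented rather than taken from any fibrillin-1 sequence.

```python
import re

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

# Simplified core motifs (an assumption made for illustration only).
MOTIFS = {"TATA box": "TATA[AT]A", "CCAAT box": "CCAAT"}

def find_motifs(seq: str) -> dict:
    """Return 0-based start positions of each motif on the given strand."""
    seq = seq.upper()
    return {name: [m.start() for m in re.finditer(pattern, seq)]
            for name, pattern in MOTIFS.items()}

# Invented toy sequences, not Fbn-1 data.
gc_rich = "GCGGCCGCGGGAGCCGCCGCTGCCGCGCC"
with_boxes = "GGTATAAAGGCCAATCC"

print(round(gc_fraction(gc_rich), 3))
print(find_motifs(gc_rich))
print(find_motifs(with_boxes))
```

A fuller analysis would also scan the reverse complement and restrict the search to a window upstream of the mapped cap site; both are omitted here for brevity.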
The study by Corson et al. (1993) has raised important questions about the structure of the FBN-1 transcript that bear on normal connective tissue function as well as the pathogenesis of Marfan syndrome and related disorders. One approach to answering these questions is to search for regions of evolutionary conservation, and thus functional significance, in the fibrillin-1 transcript of a related species such as the mouse. As reported here, the mouse embryo and human fibrillin-1 transcripts share a remarkable >95% DNA and amino acid sequence identity, extending 8,613 nt downstream of the initiator Met codon identified by Pereira et al. (1993). More than 2.0 kb of 5`-flanking sequence upstream of the mouse initiator Met codon has also been obtained and shown to be considerably less well conserved, which argues against the possibility that upstream exons B, A, and C code for additional fibrillin-1 amino acid sequence. The available human and mouse data therefore suggest that the initiator Met codon identified by Pereira et al. (1993) is the one and only fibrillin-1 translation start site. As predicted, the mouse fibrillin-1 gene is broadly expressed in developing mouse tissues. A survey of the expression pattern of the fibrillin-1 transcript suggests a novel role for fibrillin-associated microfibrils during cardiac morphogenesis and during connective tissue remodeling of uterine stroma after embryo implantation.
Figure 1: Mouse Fbn-1 cDNA clones. A partial representation of restriction sites is shown. N, NcoI; Ss, SstII; Ha, HaeII; S, SacI; E, EcoRI; B, BamHI; H, HindIII; Hp, HpaI; and M, MstII.
The conceptual mouse prefibrillin-1 molecule is a polypeptide of 2,871 amino acids, with a calculated pI of 4.60 and a molecular mass of 312 kDa (Fig. 2). The predicted domain structure of mouse and human fibrillin-1 is identical (see Pereira et al., 1993 for additional details). Thus, the mouse molecule consists of a signal peptide and five structurally distinct regions (A-E). The signal peptide is located at the extreme NH2 terminus and consists of an estimated 17 amino acids. Region A, which is contiguous with the signal peptide, consists of 42 amino acids and has a net basic charge (estimated pI, 9.9). Region B consists of 324 amino acids that are organized into a total of 8 cysteine-rich repeats. Region C is a 57-amino acid, proline-rich domain (25/57 residues, or 43.9% proline). Region D, the largest domain, consists of 2,247 amino acids which are organized into 49 cysteine-rich repeats. Region E consists of 184 amino acids and represents the carboxyl terminus of the molecule. The position and type of cysteine-rich repeats, all potential sites of N-linked glycosylation, and a single RGD sequence were all conserved in the mouse coding sequence.
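As a quick consistency check on the figures quoted above, the region lengths can be summed and compared with the stated polypeptide length and with the 8,613 nt of coding sequence mentioned earlier. The Python sketch below uses only numbers given in the text; the variable names are illustrative, and the signal peptide length is the estimated value.

```python
# Region lengths in amino acids, as stated in the text.
regions = {
    "signal peptide": 17,   # estimated length
    "A": 42,
    "B": 324,
    "C": 57,
    "D": 2247,
    "E": 184,
}

total_residues = sum(regions.values())

# Three nucleotides per codon, not counting the stop codon.
coding_nt = total_residues * 3

# Proline content of region C: 25 of 57 residues.
proline_pct_region_c = round(100 * 25 / 57, 1)

# Cysteine-rich repeats in regions B and D combined.
total_repeats = 8 + 49

print(total_residues)        # 2871
print(coding_nt)             # 8613
print(proline_pct_region_c)  # 43.9
print(total_repeats)         # 57
```

The sum reproduces the 2,871-residue polypeptide, and 2,871 codons correspond to the 8,613 nt over which the mouse and human transcripts were compared.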
Figure 2: Sequence comparison of mouse and human fibrillin-1 amino acid sequences. A, presents a schematic diagram of the mouse fibrillin-1 polypeptide. Regions A-E are depicted below. Cysteine-rich repeats are numbered in groups of five at the top. Symbols are as follows: open rectangles, EGF-CB repeats; open ovals, TGF-bp repeats; open circle, Fib motif; patterned rectangles, cysteine-rich repeats of varying pattern (see Pereira et al., 1993); patterned oval, TGF-bp-like repeat; patterned circle, Fib-like module. The amino-terminal signal peptide has been deleted from the schematic for the sake of simplicity. B presents the amino acid sequence of mouse fibrillin-1. Amino acid numbers are shown at the right. Human amino acids that differ from mouse are shown below individual lines of mouse sequence (in bold).
As noted, cysteine-rich repeats were found in regions B and D of the mouse and human fibrillin-1 sequence. Thirty of these showed the calcium binding consensus sequence: D/N-I/V-D/N-E/D-C. This consensus was derived from an analysis of 154 EGF-CB repeats in 23 different proteins and from structural analyses of the EGF-CB repeat, both bound and unbound to calcium ion (Selander-Sunnerhagen et al., 1992). Variations on the consensus have been noted previously, and some of these are present in mouse fibrillin (repeat numbers correspond to the numbering system used for human fibrillin-1, see Pereira et al., 1993): D-L-N/D-E-C (repeats 29, 30, 52, and 53), D-I-D-Q-C (repeat 26), D-N-D/N-E-C (repeat 54), and D-T-D/N-E-C (repeat 46). The following potential calcium binding sequences, which have not previously been reported, were also found: D-E-N/D-E-C (repeats 48, 49, and 56), D-M-N/D-E-C (repeats 12 and 45), and D-R-D/N-E-C (repeat 39). All EGF-CB repeats contained a second consensus sequence, C-X-D/N-X-X-X-X-Y/F-X-C, which is a recognition sequence for an Asp/Asn hydroxylase that co- and post-translationally modifies D/N residues (Stenflo et al., 1987; Gronke et al., 1989). Hydroxyaspartic acid and hydroxyasparagine residues have been found in direct analyses of the EGF-CB repeats of coagulation factors, the anticoagulant protein C, and the latent TGF-β-binding protein (Ohlin et al., 1988; Persson et al., 1989; Handford et al., 1990; Colosetti et al., 1993).
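The consensus notation used above maps directly onto regular expressions, which gives one simple way to scan a protein sequence for candidate sites. The following Python sketch is illustrative only: the helper names `consensus_to_regex` and `find_sites` are hypothetical, X is taken to mean any residue, and the test peptide is invented rather than drawn from fibrillin-1.

```python
import re

def consensus_to_regex(consensus: str) -> str:
    """Convert notation such as 'D/N-I/V-D/N-E/D-C' into a regex string.

    'A/B' means either residue; 'X' means any residue (an assumption).
    """
    parts = []
    for token in consensus.split("-"):
        if token == "X":
            parts.append("[A-Z]")
        elif "/" in token:
            parts.append("[" + token.replace("/", "") + "]")
        else:
            parts.append(token)
    return "".join(parts)

def find_sites(consensus: str, protein: str) -> list:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A zero-width lookahead lets overlapping sites be reported.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(protein.upper())]

CALCIUM_BINDING = "D/N-I/V-D/N-E/D-C"
HYDROXYLASE = "C-X-D/N-X-X-X-X-Y/F-X-C"

# Invented toy peptide, not fibrillin-1 sequence.
toy = "DVDECQSPDICGNGRCVNTPGSFECKCDEGYESGC"

print(consensus_to_regex(CALCIUM_BINDING))
print(find_sites(CALCIUM_BINDING, toy))
print(find_sites(HYDROXYLASE, toy))
```

The variant motifs listed in the text (for example D-L-N/D-E-C) can be passed to `find_sites` in the same notation. A pattern match only flags a candidate site; it says nothing about whether calcium is actually bound or the residue is hydroxylated.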
Initial sequence comparisons using GAP and BESTFIT algorithms quickly demonstrated that the 5` mouse and human fibrillin-1 upstream sequences shared few regions of identity. Partially characterized human exon B, A, and C sequences, as defined by Corson et al.(1993), were therefore aligned with the entire mouse 5` upstream sequence using both programs. The BESTFIT algorithm generally gave higher sequence identity values, and these values have been reported here. To maximize alignment quality, putative splice donor sequences were included at the 3` end of the exon B, A, and C sequence files, whereas putative splice acceptor sequences were included at the 5` end of the exon M sequence file.
As shown in Fig. 3, mouse and human exon B showed 74% sequence identity with 1 gap (gap weight, 5.0; length weight, 0.05). Three potential open reading frames were found in the mouse sequence, but only one potentially conserved fragment of four amino acids was identified. The splice donor at the 3` end of mouse exon B did not appear to be conserved. Mouse and human exon A showed 94% sequence identity with 7 gaps (gap weight, 2.0; length weight, 0.05). One potential open reading frame was found in the mouse sequence, but fragments of conserved amino acid sequence were identified in all three reading frames. The splice donor at the 3` end of mouse exon A appeared to be conserved. Mouse and human exon C showed 85% sequence identity with 12 gaps (gap weight, 2.0; length weight, 0.05). One potential open reading frame was identified that contained several fragments of conserved amino acid sequence. The splice donor at the 3` end of mouse exon C appeared to be conserved. Mouse and human exon M upstream of the initiator Met codon showed 82% sequence identity with nine gaps (gap weight, 2.0; length weight, 0.05). A coding sequence in frame with the initiator Met codon was found, and this sequence contained several fragments of conserved amino acid sequence. The splice acceptor sequence at the predicted 5` boundary of mouse exon M appeared to be conserved.
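To make the identity, gap, and weight figures above more concrete, the Python sketch below computes summary statistics from a pair of already aligned sequences. It rests on several assumptions: gaps are written as "-", percent identity is taken over columns in which both sequences have a base, and the gap penalty follows the affine form (gap weight per gap plus length weight per gap position), which is how the gap weight and length weight parameters are understood here. It does not perform the alignment itself and is not the GCG implementation; the input strings are invented.

```python
def alignment_stats(a: str, b: str, gap_weight: float = 2.0,
                    length_weight: float = 0.05) -> dict:
    """Summarize a pairwise alignment given as two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0      # identical, ungapped columns
    aligned = 0      # columns with a base in both sequences
    gap_runs = 0     # number of separate gaps
    gap_columns = 0  # total gap length
    prev_gap = None  # sequence that carried the gap in the previous column

    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            which = "a" if x == "-" else "b"
            if which != prev_gap:
                gap_runs += 1
            gap_columns += 1
            prev_gap = which
        else:
            aligned += 1
            if x == y:
                matches += 1
            prev_gap = None

    identity = 100.0 * matches / aligned if aligned else 0.0
    penalty = gap_runs * gap_weight + gap_columns * length_weight
    return {"identity_pct": identity, "gaps": gap_runs,
            "gap_columns": gap_columns, "gap_penalty": penalty}

# Invented toy alignment, not Fbn-1 sequence.
stats = alignment_stats("ACGTTGCA-GT", "ACG-TGCATGA")
print(stats)
```

Lower gap weights make gaps cheaper, so alignments computed with a weight of 2.0 can be expected to contain more gaps than those computed with 5.0; this should be kept in mind when comparing identity values obtained with different settings.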
Figure 3: Sequence comparison of 5` mouse and human Fbn-1 upstream exon sequences. Mouse and human nucleotide sequence alignments are shown on the left. Nucleotide sequence alignment and identity are presented using the BESTFIT sequence analysis program from the Genetics Computer Group, but essentially the same alignments were obtained using the GAP algorithm. Human sequence is presented atop the mouse sequence, and nucleotide numbers are shown on both the right and the left. Predicted splice donor/splice acceptor sites (a total of 5 nt) are shown on a separate line and have been underlined. Deduced murine amino acid sequences are shown on the right. Stop codons are indicated by an *. Fragments of conserved amino acid sequence are underlined.
The temporal and spatial expression patterns of Fbn-1 transcripts were determined using a specific 3`-untranslated region probe (see "Experimental Procedures"). As shown in Fig. 4, dramatic Fbn-1 expression was observed in uterine tissues surrounding day 8.5-9.0 embryos. Particularly striking was the fact that gene expression was limited to the uterine stromal tissue (decidua) beneath and lateral to the placental implantation site, i.e. the mesometrial side of the gravid uterus. In contrast, expression was not significant in embryo tissues, placenta, placental membranes, or the ectoplacental cone.
Figure 4: Overview of Fbn-1 gene expression during murine development as determined by tissue in situ hybridization. Day 8.5-9.0 sections contain embryos surrounded by intact membranes, uterine tissues, and the placental disk, cut in random planes. Day 13.5 and 16.5 sections contain isolated whole embryos sectioned in the sagittal plane near or about the mid-line. Identical conditions were maintained throughout autoradiography and photography, thereby allowing a comparison of the overall strength of hybridization in all tissue sections. The Fbn-1 transcript is intensely expressed in the mesometrial decidua at day 8.5-9.0 of development. The anti-mesometrial decidua, placenta, placental membranes, and the embryo do not show significant hybridization. At day 13.5 and day 16.5 of development, the Fbn-1 transcript appears to be the widespread product of connective tissue cells. Significant hybridization is not observed in the brain, spinal cord, heart, and liver of day 13.5 and day 16.5 tissue sections (arrowheads).
Histological examination of the day 8.5-9.0 gravid uterus confirmed that the Fbn-1 gene was highly expressed by mesometrial decidual cells. Expression was not above background in the anti-mesometrial decidua, trophoblastic cells of the ectoplacental cone, placenta, or placental membranes. Fbn-1 gene expression could not be detected in non-gravid uterine tissues derived from age-matched CD-1 females. Fbn-1 gene expression in day 8.5-9.0 mouse embryo tissues was generally negative, with a single exception: evaluation of 24 embryos demonstrated that the endocardium alone expressed the Fbn-1 gene at levels above background (Fig. 5).
Figure 5: Fbn-1 gene expression in the day 8.5-9.0 embryo. At day 8.5-9.0 of development, the hybridization signal appears to be concentrated within endocardial tissue. Heart muscle, the neural epithelium, and the mesenchyme/connective tissues did not show significant hybridization (not shown). Bar, 20 µm.
The remarkable degree of evolutionary conservation between mouse and human fibrillin-1 gene products implies that strict structural requirements govern the assembly of extracellular 10-nm microfibrils. To facilitate assembly, regions B and D (which consist of consecutive EGF-like repeats) may function as rigid arms capable of projecting unique globular domains that interact with other microfibril constituents and/or with extracellular matrix molecules near the microfibril surface. A similar model has been proposed for the structure of other extracellular matrix molecules that have multiple EGF-like repeats, including laminin, tenascin, and thrombospondin-1 (Engel, 1989).
The fibrillin-1 transcriptional start site is more problematic. Corson et al. (1993) have used DNA sequencing plus Northern analysis to identify a 1.8-kb CpG island that contained three putative exons plus a fourth downstream exon containing a putative initiator Met residue. These findings suggested that the CpG island contained the transcription initiation site and that exons B, A, and C may be alternatively spliced first exons under the control of separate promoters. Alternatively, the CpG island may not represent the transcription start site and there may be additional exons, as yet undiscovered, within the upstream flanking sequence. Our preliminary studies have yielded conflicting results regarding these possibilities. On one hand, none of the upstream human fibrillin-1 sequences described by Corson et al. (1993) were identified as candidate exons when analyzed by the GRAIL program (Oak Ridge National Laboratory), a result that argues against the possibility of more than one transcript initiation site. On the other hand, Northern analysis of mouse mRNA identified fibrillin-1 transcripts that hybridize with exon A and C probes and therefore appear to possess heterogeneous 5` ends.
As expected from previous work (Corson et al., 1993), transcripts with exon A sequences were most readily identified, although they were of extremely low abundance relative to fibrillin-1 transcripts that hybridize with a 3`-untranslated region probe (10-100-fold less, as determined by scanning gel densitometry). Fibrillin-1 transcripts with heterogeneous 5` ends therefore appear to be rare or restricted to a subset of developing mouse tissues and cell lines.
While the fibrillin-1 translational start site appears well defined, more information is clearly necessary before the existence and function of alternative upstream exons can be determined and the sites of transcription initiation can be localized. Perhaps these issues will be resolved through the analysis of fibrillin-1 genes from species even more closely related to human than the mouse, e.g. the pig fibrillin-1 gene.
The endocardium was the only tissue in our study of day 8.5-9.0 embryos that expressed the Fbn-1 gene at levels above the background of the experiment. At this time the heart beats regularly and powerfully, and it is plausible that fibrillin-1 molecules are assembled into a microfibril-rich, subendocardial connective tissue that organizes muscle cells and helps them resist the mechanical forces of cardiac contraction. Fibrillin may well be expressed at earlier time points, however. For example, Gallagher et al. (1993) have shown that fibrillin-1 polypeptides are expressed along the primary axis of the avian embryo, including Hensen's node. These results further emphasize that fibrillin-associated microfibrils may play a critical role during connective tissue assembly.
In sharp contrast to the pattern of Fbn-1 expression, Magp is widely expressed in the mesenchyme of day 8.5-9.0 embryos, but is not expressed by decidual cells in any region of the gravid uterus or in endocardial tissue (Chen et al., 1993). By day 13.5 of mouse development, the overall pattern of Fbn-1 expression was similar to that of Magp in that both genes appeared to be the widespread product of connective tissue cells in the interstitium of many organs. The results of these tissue in situ hybridization studies indicate that microfibril genes are not always coordinately regulated in the mouse, which, in turn, may be an indication of tissue-specific differences in microfibril composition, structure, and function.
The nucleotide sequence reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number L29454.