Characterization of an Evolutionarily Conserved Far-upstream Enhancer in the Human α2(I) Collagen (COL1A2) Gene

Taras T. Antoniv‡§, Sarah De Val§, Dominic Wells||, Christopher P. Denton**, Christian Rabe**, Benoit de Crombrugghe**, Francesco Ramirez‡, and George Bou-Gharios

From the ‡Brookdale Center in the Department of Biochemistry and Molecular Biology, Mount Sinai School of Medicine-New York University, New York, New York 10029, the Medical Research Council, Clinical Sciences Center, Imperial College School of Medicine, Hammersmith Campus, London W12 0NN, United Kingdom, the ||Gene Targeting Group, Department of Neuromuscular Diseases, Imperial College School of Medicine, Charing Cross Campus, London W6 4RF, United Kingdom, and the **Department of Molecular Genetics, The University of Texas, M. D. Anderson Cancer Center, Houston, Texas 77020

Received for publication, February 13, 2001


    ABSTRACT

We have examined the chromatin structure around and upstream of the transcriptional start site of the human α2(I) collagen (COL1A2) gene. Four strong DNase I-hypersensitive sites (HS2-5) were detected only in fibroblasts, and a weaker one (HS1) was identified in type I collagen-negative cells. Another hypersensitive site potentially involved in COL1A2 silencing was found in intron 1 (HS(In)). HS1 and HS2 were mapped within conserved promoter sequences and at locations comparable to the mouse gene. HS3, HS4, and HS5 were likewise mapped ~20 kilobases upstream of COL1A2 at about the same position as the mouse far-upstream enhancer and within a remarkably homologous genomic segment. DNase I footprinting identified twelve areas of nuclease protection in the far-upstream region (FU1-12) and within stretches nearly identical to the mouse sequence. The region containing HS3-5 was found to confer high and tissue-specific expression in transgenic mice to the otherwise minimally active COL1A2 promoter. Characterization of the human element documented functional differences with the mouse counterpart. Enhancer activity substantially decreased without the segment containing FU1-7 and HS5, and inclusion of AluI repeats located 3' of HS3 augmented position-independent expression of the transgene. Hence, subtle differences may characterize the regulation of mammalian α2(I) collagen genes by evolutionarily conserved sequences.


    INTRODUCTION

The extracellular matrix plays a critical role in morphogenesis and growth, as well as in tissue homeostasis and repair (1). Type I collagen is the most abundant extracellular component of the vertebrate connective tissue and consists of two α1 chains and one α2 chain (2). Type I collagen is produced at different levels by a large number of tissues and organs, and in a tightly controlled spatio-temporal manner (3). Transgenic studies, predominantly carried out on the rodent α1(I) and α2(I) collagen genes, have identified DNA sequences that drive transcription in distinct mesenchyme lineages (4-12). Although coordinated expression has been thought to reflect similar transcriptional mechanisms, the transgenic studies failed to identify comparable organization of regulatory sequences or common cis-acting elements in the two collagen genes (4-12). In contrast to the number of tissue-specific elements found within the 3.5 kb upstream sequence of the rat and mouse α1(I) genes, regulated transcription of mouse α2(I) collagen (Col1a2) involves yet to be identified DNA sequences located in both the proximal promoter and a far-upstream enhancer (6, 8-11).

The -350 promoter of the Col1a2 gene is capable of directing transcription in transgenic models, but expression is limited to a few tissues, such as skin fascia, tail tendon, and calvaria osteoblasts (6). Similar results were obtained when multiple copies of the -315 to -284 region were linked to the basal -40 Col1a2 promoter (6). Chromatin analysis of the upstream region of Col1a2 subsequently identified a cluster of three DNase I-hypersensitive sites (HS3-5), located ~17 kb from the start site of transcription (11). Inclusion of the 6-kb region containing the three HS sites upstream of the -350 proximal promoter resulted in a high level of β-galactosidase expression in a larger number of type I collagen-producing cells; this element was therefore termed the far-upstream enhancer of the Col1a2 gene (11). There may be functional redundancy among putative cis-acting elements in the far-upstream enhancer, because deletion of the HS5-containing segment had virtually no effect on the intensity or distribution of β-galactosidase activity (11).

The regulatory network of Col1a2 seems to adhere to the functional domain model of gene expression, where functional domain defines the genomic region that contains cis-acting sequences regulating a particular locus (14). Such DNA elements include promoter, enhancer, and insulator sequences, as well as the DNase I-hypersensitive site (HS) and the locus control region (LCR) (14). The model is probably best exemplified by the organization of the β-globin cluster (13-15). In this evolutionarily conserved gene cluster, the upstream LCR potentiates expression of downstream genes by competitively interacting with the respective promoters at distinct stages of development (14, 15). Another important attribute of the LCR is the ability to organize chromatin and thus spatially insulate the functional interactions between upstream enhancer and proximal promoter (14, 15). Phylogenetic conservation of chromatin organization and of related cis-acting elements is therefore a strong, albeit indirect, indication of a functional gene domain (13). In the case of α2(I) collagen, current knowledge is limited to the mouse gene; indeed, transcriptional analysis of the human counterpart (COL1A2) has only focused on the proximal promoter.

Cell transfection and DNA binding assays have located the transforming growth factor-β (TGFβ) and tumor necrosis factor-α (TNFα) responsive elements of COL1A2 to the sequence lying between nucleotides -330 and -250, the human counterpart of the mouse -315 to -284 segment (16, 17). Recent studies have shown that the antagonistic stimuli of TGFβ and TNFα on COL1A2 transcription are integrated by the binding of Sp1 in synergy with Smad and C/EBP proteins, respectively (18-20). On the other hand, TGFβ stimulation of Col1a2 transcription has been reported to be mediated by the same sequence but through binding of CTF/NF-I (21). This last finding has raised the intriguing possibility that species-specific differences in seemingly identical cis-acting elements may result in distinct mechanisms to regulate transcription of the mammalian α2(I) collagen genes (16).

The present study was undertaken to delineate the overall organization of the putative functional domain of COL1A2 and to begin defining its structural-functional relationship to the mouse unit. Toward this end, we analyzed the chromatin structure around and upstream of the start site of transcription; identified previously unknown sites of DNA-nuclear protein interaction; and assessed the activity of relevant genomic regions in transgenic mouse embryos. Altogether, the results indicate that the human and mouse genes share similar chromatin organizations and structurally homologous regulatory sequences. They also suggest that distinct enhancer/promoter interactions may underlie species-specific differences in the tissue-specific expression of the two genes.

    EXPERIMENTAL PROCEDURES

Cells and DNA Constructs-- Human embryonic lung fibroblasts (WI-38, ATCC CCL-75), Jurkat T cells (ATCC TIB-152), and umbilical vascular endothelial cells (HUVEC, C-003-5C, Cascade Biologics, Inc., Portland, OR) were grown in Dulbecco's modified Eagle's medium, RPMI 1640, and M200 medium, respectively. Dulbecco's modified Eagle's medium and RPMI 1640 were supplemented with 10% fetal bovine serum, 100 µg/ml streptomycin, and 100 units/ml penicillin; M200 was supplemented with low serum growth supplement (LSGS; Cascade Biologics, Inc.) in the absence of antibiotics. Construct -378LAC in the pβgal-Basic vector (CLONTECH, Palo Alto, CA) was derived from the -378 COL1A2/LUC plasmid (17). Constructs were engineered by subcloning various upstream sequences 5' of the -378 promoter. The 5.2-kb fragment that extends from -22.8 to -17.5 kb was derived by double digestion with SpeI and XbaI of a larger SpeI subclone of BAC clone GS056H18 (Genome Systems, St. Louis, MO; GenBank accession number AC002074) to yield construct 22.8/17.5pLAC. The 2.0- and 3.0-kb fragments that extend from -22.8 to -20.8 kb and from -20.8 to -17.8 kb of COL1A2 were derived by internal HindIII deletion of the above clone to yield constructs 22.8/20.8pLAC and 20.8/17.5pLAC, respectively. The 2.3-kb fragment that extends from -21.1 to -18.8 kb was generated with proofreading Pfu DNA polymerase (Stratagene, La Jolla, CA) on DNA template of the original BAC clone and using primers 5'-TTACCCCCAATTTACAGATGAAAG-3' and 5'-GCCTCAGCAAGCAACGTGG-3' to yield construct 21.1/18.8pLAC. The 1.4-kb fragment that extends from -20.2 to -18.8 kb was generated by SwaI digestion of the above 2.3-kb fragment to yield construct 20.2/18.8pLAC. The sequence of the mouse proximal promoter is available in GenBank under accession number S48747, whereas the sequence of the far-upstream enhancer has been deposited under accession number AF345994. Sequences were compared using the MacVector package of programs.
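As a quick consistency check on the oligonucleotides and fragment coordinates listed above, the following minimal Python sketch computes primer length, GC content, and the nominal size of selected upstream fragments. The helper names (gc_content, fragment_kb) and the small coordinate table are illustrative assumptions and are not part of the original protocol; the coordinates are simply read off the construct names.

```python
# Illustrative sketch only; helper names are hypothetical.

# PCR primers quoted in the text (written 5' to 3').
PRIMERS = {
    "forward": "TTACCCCCAATTTACAGATGAAAG",
    "reverse": "GCCTCAGCAAGCAACGTGG",
}

# Fragment boundaries in kb relative to the transcription start site,
# taken from the construct names.
FRAGMENTS_KB = {
    "22.8/20.8pLAC": (-22.8, -20.8),
    "21.1/18.8pLAC": (-21.1, -18.8),
    "20.2/18.8pLAC": (-20.2, -18.8),
}


def gc_content(seq):
    """Fraction of G and C residues in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def fragment_kb(start_kb, end_kb):
    """Nominal fragment size in kb, rounded to one decimal place."""
    return round(abs(end_kb - start_kb), 1)


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_content(seq):.1%}")

for name, (start, end) in FRAGMENTS_KB.items():
    print(f"{name}: {fragment_kb(start, end)} kb")
```

Sizes computed from rounded coordinates are only nominal and may differ slightly from the fragment sizes measured on gels.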

Chromatin Analysis and DNase I Footprinting Assay-- Cells were washed with ice-cold phosphate-buffered saline, scraped, pelleted, and resuspended in buffer A (15 mM Tris-HCl (pH 7.6), 15 mM NaCl, 60 mM KCl, 1 mM EDTA, 0.5 mM EGTA, 0.3 mM sucrose, 0.1% Triton X-100, 0.15 mM spermine, 0.5 mM spermidine, 1 mM phenylmethylsulfonyl fluoride, 1 mM dithiothreitol). Cells were mechanically disrupted, the resulting homogenate was diluted with an equal volume of buffer B (buffer A without Triton X-100), and the nuclei were sedimented by centrifugation. Nuclear pellets were resuspended in 5 volumes of buffer C (buffer A without Triton X-100, EDTA and EGTA) and DNA concentrations were estimated by UV absorption at 260 nm; 15 OD units were used for each DNase I digestion. These reactions were performed in 40 mM Tris-HCl (pH 7.6) and 6 mM MgCl2 using 10 µl of DNase I (Amersham Pharmacia Biotech, Piscataway, NJ) at concentrations between 0 and 10 units/µl. After 15-min incubation at room temperature, the reaction was halted by adding 2 volumes of 50 mM Tris-HCl (pH 8.0), 100 mM NaCl, 100 mM EDTA, 1% SDS, and 40 µl of proteinase K (20 mg/ml). DNA purification, restriction nuclease digestion, and Southern analysis were performed according to the standard protocol (22). For DNase I footprinting, plasmid DNA was digested with the appropriate restriction enzyme and end-labeled by filling-in 3'-recessed ends with the Klenow enzyme (22). Gel purification of labeled DNA fragments, preparation of nuclear extracts from WI-38 cells, nuclear protein binding, and DNase I footprinting reactions were performed as previously described (16).

Generation and Analysis of Transgenic Embryos-- Transgenic embryos were produced by the standard pronuclear injection of DNA into fertilized C57Bl/10 × CBA/J F1 eggs (23). Plasmid DNA was digested with appropriate enzymes, purified from agarose gel, and microinjected at a concentration of 2-4 ng/ml in 10 mM Tris (pH 7.4) and 0.1 mM EDTA. Injected eggs were transferred into pseudo-pregnant CD1 females. Embryos were collected from the recipient females at 15.5 days post coitum (E15.5) for whole-mount fixation and staining. Aside from corresponding to a time of strong Col1a2 expression (24), stage E15.5 was chosen to avoid the problem of diminished permeability due to skin keratinization. Integration of the transgenes was assessed by Southern blot hybridization to a LacZ (the gene coding for β-galactosidase) probe of DNA purified from embryonic placentas. After cutting open thorax and abdomen to facilitate substrate infiltration, embryos were placed in cold phosphate-buffered saline and then fixed for 45-60 min in 0.2% glutaraldehyde, 2% formalin in 0.1 M phosphate buffer, pH 7.3, containing 2 mM MgCl2 and EGTA. They were washed three times for 1 h each in the same buffer with 0.1% sodium deoxycholate and 0.2% Nonidet P-40, and stained overnight at room temperature in 1 mg/ml 5-bromo-4-chloro-3-indolyl-β-D-galactoside solution (X-gal) containing 5 mM potassium ferrocyanide and 5 mM ferricyanide. For histological analysis, LacZ-expressing embryos were dehydrated and wax-embedded, and 6-µm sections were prepared, dewaxed, and counterstained with eosin.

    RESULTS

COL1A2 Chromatin Structure-- DNase I-hypersensitive sites in chromatin are structural landmarks indicative of control regions involved in constitutive and tissue and/or stage-specific transcription (25). The mouse Col1a2 gene has been shown to contain five major hypersensitive sites at discrete locations within the genomic region that spans from -20 kb to +1.5 kb, relative to the start site of transcription; they have been termed HS1 to HS5, in a 3' to 5' direction (Fig. 1A) (11). HS1 and HS2 are located in the proximal promoter at about -100 bp and -2.1 kb, respectively, whereas HS3-5 map at around -17 kb in the far-upstream enhancer (Fig. 1A). We decided to search for DNase I-hypersensitive sites within the human COL1A2 genomic segment that extends from nucleotides -23 kb to +10 kb to assess whether its chromatin organization resembles that of the mouse counterpart. To this end, three genomic fragments (probes Pa, Pb, and Pc in Fig. 1A) were used to probe Southern blots containing DNA from fibroblasts, which produce high levels of type I collagen, and from HUVEC and Jurkat cells, which are type I collagen-negative.


Fig. 1.   DNase I-hypersensitive sites in the human COL1A2 gene. A, partial restriction map of the human gene with the locations of the Southern probes (Pa, Pb, and Pc) and hypersensitive sites (HS) indicated in comparison to those of the mouse. B-D, nuclei of the indicated cell lines were digested with increasing amounts of DNase I and hybridized to probes Pa (B), Pb (C), and Pc (D); the size of the DNase I-resistant bands is indicated in each autoradiograph along with the correlation of the bands and hypersensitive sites. The asterisk identifies an unspecific digest.

Probe Pa is a 0.2-kb fragment located in intron 1, which hybridizes to a 1.75-kb EcoRI genomic fragment (Fig. 1, A and B). A new 1.3-kb-long fragment was detected in DNA derived from Jurkat nuclei treated with DNase I, in addition to a significantly fainter band of similar size in DNase-treated WI-38 nuclei (Fig. 1B). Probe Pb is a 0.2-kb fragment located 3.6 kb upstream of the start site of transcription, which recognizes a 3.9-kb EcoRI genomic fragment (Fig. 1, A and C). Southern analysis of DNA from DNase-treated WI-38 nuclei revealed that probe Pb hybridizes faintly to a 3.5-kb species and more strongly to a 1.3-kb band; of these new hybridizing bands, the 1.3-kb species was not observed in DNase-treated Jurkat nuclei (Fig. 1C). Probe Pc is located 19 kb upstream of probe Pb and hybridizes to a 6.2-kb BamHI genomic fragment (Fig. 1, A and D). Unlike DNase-treated nuclei from HUVEC and Jurkat cells, probe Pc recognized three additional bands in DNase-treated nuclei from WI-38 fibroblasts (Fig. 1D). The relative positions of the six hypersensitive sites (designated as HS(In) and HS1-5 in Fig. 1) were validated by Southern analysis using probes corresponding to the opposite ends of the Pa, Pb, and Pc genomic fragments (data not shown). Accordingly, HS(In) was located at about +730 bp; HS1 at -130 bp; HS2 at -2.3 kb; and HS3-5 at -19.1, -19.5, and -20.5 kb, respectively (Fig. 1A). Scanning of the remaining genomic region comprised between HS2 and HS3 revealed no additional DNase I-hypersensitive sites (data not shown). Altogether, the analysis demonstrated that COL1A2 and Col1a2 share comparable chromatin organizations that are characterized by cell type-specific hypersensitive sites clustered immediately 5' of the start site of transcription and far-upstream of it.
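The mapping logic for probe Pb can be sketched as simple arithmetic. The sketch assumes, as in a standard indirect end-labeling experiment, that the probe abuts the 5' end of the 3.9-kb EcoRI fragment at about -3.6 kb, so that each DNase I-generated sub-band extends from that end to the cut site. Both the anchoring assumption and the function name cut_site_kb are illustrative and are not stated explicitly in the text.

```python
# Illustrative sketch of indirect end-labeling arithmetic; names are hypothetical.

# Assumed 5' end (in kb relative to the transcription start site) of the
# EcoRI fragment detected by probe Pb.
PB_ANCHOR_KB = -3.6


def cut_site_kb(anchor_kb, band_kb):
    """Approximate position of a DNase I cut, given the anchored end of the
    parental fragment and the size of the sub-band seen on the blot."""
    return round(anchor_kb + band_kb, 2)


# Sub-bands detected with probe Pb in DNase I-treated WI-38 nuclei.
for band_kb in (3.5, 1.3):
    position = cut_site_kb(PB_ANCHOR_KB, band_kb)
    print(f"{band_kb}-kb band: cut near {position} kb")
```

Under this assumption the 3.5-kb and 1.3-kb bands place cuts near -0.1 kb and -2.3 kb, in line with the positions reported for HS1 (-130 bp) and HS2 (-2.3 kb).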

Comparison of COL1A2 and Col1a2 Sequences-- To relate the above data at the DNA level, we sequenced ~3 kb encompassing mouse HS3-5 and compared it to the sequence generated by the Human Genome Project. We also compared selected areas of the proximal promoters and first introns where the other hypersensitive sites had been mapped. Dot-matrix analysis of the relevant genomic regions revealed that the cell type-specific hypersensitive sites common to COL1A2 and Col1a2 (HS1-5) reside within stretches of highly homologous sequences; by contrast, the unique HS in the first intron of COL1A2 [HS(In)] lies within a divergent sequence (Fig. 2A). A comparison of the sequences around human HS(In), HS1, and HS2 is shown in Fig. 2B.


Fig. 2.   Conservation of sequences around hypersensitive sites. A, dot-matrix analysis of the proximal promoter (top) and far-upstream (bottom) sequences of the mouse and human α2(I) collagen genes. B, alignment of the human (top) and mouse (bottom) sequences around HS(In), HS1, and HS2.

Computer-aided alignment of the sequences that contain human and mouse HS3-5 revealed remarkably high homology (62%) between human nucleotides -21.1 to -18.8 kb and mouse nucleotides -17.7 to -15.4 kb (Fig. 3); this highly conserved genomic segment is herein referred to as the core homology region. The alignment also identified five individual islands (IS) of sequence with average identity of 80% or greater (Fig. 3). They include the sequences around HS3 (-19.1 to -18.8 kb, IS5); HS4 (-19.6 to -19.4 kb, IS4); HS5 (-20.7 to -20.5 kb, IS2); between -20.1 and -19.8 kb (IS3); and between -21.1 and -20.9 kb (IS1) (Fig. 3). IS5 displays the lowest degree of sequence identity due to the presence of three divergent segments within it (Fig. 3). The above observations thus correlated comparable arrangements of open chromatin with phylogenetically conserved DNA sequences.
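The notion of identity islands within a longer alignment can be illustrated with a sliding-window calculation. The sketch below is not the MacVector procedure used in this study; it is a generic approach, and the function names, window size, and toy sequences are assumptions chosen only for illustration. The 80% cutoff mirrors the threshold quoted above.

```python
# Illustrative sketch only; not the algorithm used in the study.

def percent_identity(a, b):
    """Percent of aligned columns carrying identical, non-gap residues."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def identity_islands(a, b, window=10, threshold=80.0):
    """Return merged half-open (start, end) alignment intervals covered by
    windows whose identity reaches the threshold."""
    islands = []
    for i in range(0, len(a) - window + 1):
        if percent_identity(a[i:i + window], b[i:i + window]) >= threshold:
            if islands and i <= islands[-1][1]:
                # Overlaps or touches the previous island: extend it.
                islands[-1] = (islands[-1][0], i + window)
            else:
                islands.append((i, i + window))
    return islands


# Toy pre-aligned sequences (hypothetical, not real COL1A2/Col1a2 data):
# two conserved blocks separated by a divergent stretch.
toy_human = "ACGTACGTAC" + "AAAAAAAAAA" + "GGCCGGCCGG"
toy_mouse = "ACGTACGTAC" + "TTTTTTTTTT" + "GGCCGGCCGG"

print(round(percent_identity(toy_human, toy_mouse), 1))
print(identity_islands(toy_human, toy_mouse))
```

A real comparison would start from a gapped alignment of the two genomic sequences and would use a window closer to the size of the islands being sought.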


Fig. 3.   Comparison of the human and mouse far-upstream sequences. The mouse sequence (-17.7 to -15.4 kb) is below the human sequence (-21.1 to -18.8 kb). Also shown are the positions of the footprinted (FU) areas, identity islands (IS), and hypersensitive sites (HS). The MAR consensus sequence is highlighted by the shadowed box, whereas the arrow indicates the 5'-end of construct 20.2/18.8pLAC.

Nuclear Protein Binding Sites in the Core Homology Region-- Previous characterization of the mouse far-upstream region has not included identification of binding sites for nuclear proteins, a prerequisite to deciphering enhancer function (11). The DNase I footprinting assay was therefore employed in the present study to map sites of nuclear protein interaction in the core homology region of COL1A2. The analysis identified twelve distinct areas of nuclear protein protection within identity islands IS1, IS2, IS3, and IS5 (Fig. 4). By contrast, no recognizable footprint was identified in IS4 where HS4 is located (Fig. 3). The twelve footprinted areas were designated FU1-12, in a 5' to 3' direction (Fig. 3). The footprints are broadly distributed into three separate clusters that reside within the upstream third of the core homology region (FU1-7), and in the middle (FU8-9) and at the 3'-end of it (FU10-12) (Fig. 5). Although the identity of the cognate trans-acting factors remains to be determined, the analysis nevertheless yielded the first indication of the number and distribution of putative cis-acting elements within the core homology region.


Fig. 4.   DNase I footprinting analysis of the COL1A2 far-upstream region. The genomic region was analyzed using the following probes: from -21 kb to -20.8 kb (labeled from 5'-end in A, and 3'-end in B); from -20 kb to -20.4 kb (labeled from 5'-end in C, and 3'-end in D); from -20.1 kb to -19.9 kb (labeled from 5'-end in E, and 3'-end in F); and from -19.6 kb to -18.76 kb (labeled from 5'-end in G, and 3'-end in H). In each test, DNA was incubated with increasing amounts of DNase I in the presence (+) or absence (-) of WI-38 nuclear extracts. The numbers on the side of the protected areas correspond to the footprinted sequences of Fig. 3, whereas G/A indicates the Maxam and Gilbert reaction used as a marker.


Fig. 5.   Diagrams of the human transgenes. The core homology region is shown in gray with the arrowheads and black bars indicating the positions of the hypersensitive sites and footprints, respectively. On the right side are the statistics of the transgene analysis with level of expression arbitrarily expressed as high and low based on the uniformity of the whole-mount staining procedure (see Fig. 6). The word none signifies undetectable expression under the same experimental conditions. Indicated in parentheses are the total number of transgenics found positive by Southern analysis. The copy number of integrated transgenes ranged from 2 to 5.

Functional Analyses-- Six LacZ reporter gene constructs were engineered to determine the transcriptional contribution of the COL1A2 far-upstream region in transgenic mouse embryos. The first of them (-378LAC) consists of the -378 to +54 proximal promoter fused to the reporter gene, whereas the others contain additional combinations of far-upstream sequences subcloned 5' of the proximal promoter (Fig. 5). To be precise, they include the core homology region flanked by divergent 5' and 3' sequences (22.8/17.5pLAC); only the core homology region (21.1/18.8pLAC); the 5' divergent sequence and the segment of the core homology region harboring FU1-4 (22.8/20.8pLAC); the core homology region and 3' divergent sequence (20.8/17.5pLAC); and the core homology region without the 5' third segment that contains HS5 and FU1-7 (20.2/18.8pLAC) (Fig. 5).
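The relationship between the far-upstream constructs and the hypersensitive sites they retain can be tabulated with a short interval check. In the sketch below, the HS positions are the approximate values reported in the chromatin analysis, and the construct boundaries are read from the construct names; the dictionary and function names are hypothetical and serve only to illustrate the bookkeeping.

```python
# Illustrative sketch only; names are hypothetical.

# Approximate positions (kb relative to the transcription start site)
# of the far-upstream hypersensitive sites.
HS_SITES_KB = {"HS3": -19.1, "HS4": -19.5, "HS5": -20.5}

# 5' and 3' boundaries (kb) of the far-upstream fragment in each construct.
CONSTRUCTS_KB = {
    "22.8/17.5pLAC": (-22.8, -17.5),
    "21.1/18.8pLAC": (-21.1, -18.8),
    "22.8/20.8pLAC": (-22.8, -20.8),
    "20.8/17.5pLAC": (-20.8, -17.5),
    "20.2/18.8pLAC": (-20.2, -18.8),
}


def sites_in_construct(name):
    """Return the hypersensitive sites that fall within a construct's fragment."""
    five_prime, three_prime = CONSTRUCTS_KB[name]
    return sorted(
        site for site, pos in HS_SITES_KB.items()
        if five_prime <= pos <= three_prime
    )


for construct in CONSTRUCTS_KB:
    print(construct, sites_in_construct(construct) or "none")
```

With these approximate coordinates, 20.2/18.8pLAC retains HS3 and HS4 but not HS5, and 22.8/20.8pLAC retains none of the three sites, which agrees with the construct descriptions given above.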

The activity of each construct was examined according to the overall intensity of β-galactosidase staining and to the percentage of transgenic embryos expressing the reporter gene. The former parameter was used to evaluate the transcriptional contribution of distinct genomic fragments, and the latter was used to identify element(s) that may contribute to position-independent expression of the transgenes. All transgenic embryos were derived at the same developmental stage and processed under the same experimental conditions to allow reliable comparisons between embryos expressing different constructs. Within the limits of the experimental approach, the comparisons revealed distinct features of the core homology region, which are based on the relative number of integrated transgenes expressing β-galactosidase, as well as on the intensity and tissue distribution of X-gal staining. It should be noted that the last two features were fairly reproducible among different transgenic embryos that harbor the same construct; moreover, the number of integrated copies averaged between 2 and 5 without much variation in transgene expression. These observations enabled us to segregate with some confidence β-galactosidase-positive embryos into high and low expressors (Fig. 5). Visual documentation of our estimates is provided in the whole-mount X-gal staining of illustrative transgenic embryos belonging to the groups of low (-378LAC and 20.2/18.8pLAC) and high (21.1/18.8pLAC) expressors (Fig. 6). These data were subsequently confirmed at the tissue level by histological examination of β-galactosidase-positive transgenic embryos (see below). Similarly, the percentage of β-galactosidase-positive embryos separated the constructs into low (33-40%) and high (71-88%) transgenic expressors (Fig. 5).


Fig. 6.   Expression of human COL1A2 transgenes. A, photomicrograph of β-galactosidase staining of the whole E15.5 embryo harboring the -378LAC construct. B, photomicrograph of β-galactosidase staining of the whole E15.5 embryo harboring the 21.1/18.8pLAC construct. C, photomicrograph of β-galactosidase staining of the whole E15.5 embryo harboring the 20.2/18.8pLAC construct. Transgenic embryos were processed under the same experimental conditions, and stained for the same length of time.

The -378LAC construct directed low transcription of the reporter gene (Fig. 6A); expression was limited to a subset of type I collagen-producing cells, often with a mosaic pattern (Fig. 7). Positive sites include the interstitial tissue of the submandibular gland, skin fascia, and, occasionally, tendons and periosteum of developing ribs (Fig. 7). Notably, none of the four -378LAC transgenic embryos displayed ectopic β-galactosidase activity in type I collagen-negative tissues. Furthermore, the percentage of integrated -378LAC construct expressing β-galactosidase was low, 4 embryos out of 10 (Fig. 5). The staining intensity and tissue distribution, as well as the percentage of -378LAC transgenic expressors, are almost identical to the values previously obtained with eight transgenic embryos carrying the mouse -350 Col1a2 promoter (11).


Fig. 7.   Tissue distribution of human -378LAC transgene expression. X-gal staining can be seen in a few cells of type I collagen-producing tissues. In A, the characteristic blue staining can be seen in the interstitial cells of the submandibular gland (arrow) but is not apparent in skeletal muscle (m). Staining can also be seen in the outer lining of the stomach, which is made up of fibroblasts and smooth muscle cells (B, g). No staining is seen in liver Ito cells or in the capsule (B, l). Little staining can be seen in the fascia of the skin (C, f) along the back of the embryo, whereas keratinocytes of the epidermis (arrow) are negative. In D, mesenchymal cells show blue staining in the ear region (ch). Some, but not all, tendons express the transgene, as shown in the footpad of the hind limb (E). In F, osteoblasts of the clavicle bone are intensely blue. The periosteum of ribs also shows blue staining prior to ossification, as illustrated in G. Osteoblasts surrounded by the osteum of the mandible (H) and frontal bone (I) appear blue (arrows). The retina of the eye (e) is not stained. The strong staining of the nose (J) is localized in the cells surrounding the nasal passage (arrows), as well as the fascia of the skin.

Addition of the core homology region with (22.8/17.5pLAC) or without (21.1/18.8pLAC) the divergent flanking sequences dramatically increased LacZ staining and significantly broadened the expression profile to closely resemble that of the endogenous gene (24). These points are visually illustrated in the whole-mount and histological images of a positive 21.1/18.8pLAC transgenic embryo (Figs. 6B and 8). Among other mesenchymal tissues, LacZ is transcriptionally active in smooth and skeletal muscles, intramembranous bones, meninges, skin, liver, lung, and kidney (Fig. 8). Taken at face value, these data equated the core homology region of COL1A2 functionally to the far-upstream enhancer of Col1a2 (11). An important difference was, however, noted between these two constructs.


Fig. 8.   Tissue distribution of human 21.1/18.8pLAC. X-gal staining can be seen in almost all type I collagen-producing cells. In A, section of the neck region shows staining in the clavicle (arrow), arterial wall of the carotid artery (arrowhead), mesenchymal cells of the submandibular gland and skeletal muscle, as well as the osteoblasts of the mandible (star). In B, staining is present in osteoblasts of the frontal bone, but neither in brain (b) nor eye (e). In C, a cross section of the heart and surrounding region shows staining in the aorta (asterisk). Staining can also be seen in the heart valves (arrow) and pericardium (arrowhead); C1 is a higher magnification of this area. In D, osteoblasts in the mandibular bone are intensely blue, as well as the ribs (F, arrow). E1 shows a higher magnification of the diaphragm (d) and liver capsule being stained. E2 is also a higher magnification of the two adjacent ribs, nearest to the arrow in E. The section of the head shows staining of the meninges (F, m) but not the brain tissue (F, b). In G, the section of the abdomen shows staining in several organs, such as the layers of the stomach wall (g), splenic primordium (s), and pancreas (p). Interstitial cells of the lung were stained, as well as the pulmonary artery (H, arrow). In I, the osteoblasts of the femoral head show blue staining (arrow). Section through layers of the skin along the back of the embryo shows blue staining in the fascia and various skeletal muscle layers (J) but not in chondrocytes of the ribs (c) or in the epidermis. In K, kidney capsule (arrowhead) and interstitial cells of the mesangium and tubules are also stained.

Whereas five out of seven 22.8/17.5pLAC transgenes displayed strong LacZ staining, a significantly lower percentage of 21.1/18.8pLAC (33%) were β-galactosidase-positive (Fig. 5). This finding raised the possibility that insulator-like element(s) may be present in the divergent sequences that flank the core homology region. Conveniently located restriction sites were therefore used to generate constructs containing either the 5'-flanking sequence and first four footprints of the core homology region (22.8/20.8pLAC) or the remainder of the core homology region and 3'-flanking sequence (20.8/17.5pLAC) (Fig. 5). Only the latter construct yielded high β-galactosidase levels and in a large percent (88%) of transgenic embryos (Fig. 5). Inspection of the divergent 3'-flanking sequence apparently responsible for enhanced position-independent expression of the transgene revealed that it consists of two AluI repeats organized in a head-to-head orientation (26). Interestingly, AluI repeats are also interspersed throughout the β-globin locus (13-15). The lack of expressors with the 22.8/20.8pLAC may be due to strong silencer(s) within the 5'-flanking sequence or to the relatively low number of transgenics examined.
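The expressor percentages quoted in the text follow directly from the embryo counts. The minimal sketch below reproduces the two cases for which both counts are stated explicitly (4 of 10 for -378LAC and 5 of 7 for 22.8/17.5pLAC) and sorts them into the low (33-40%) and high (71-88%) groups described for Fig. 5. The function names and the treatment of intermediate values as unclassified are assumptions made for illustration.

```python
# Illustrative sketch only; names and the "unclassified" category are hypothetical.

# (expressing embryos, Southern-positive transgenic embryos) as stated in the text.
COUNTS = {
    "-378LAC": (4, 10),
    "22.8/17.5pLAC": (5, 7),
}


def percent_expressing(positive, total):
    """Percentage of transgenic embryos expressing the reporter, to the nearest integer."""
    return round(100 * positive / total)


def classify(percent):
    """Group a construct using the ranges quoted for low and high expressors."""
    if 33 <= percent <= 40:
        return "low"
    if 71 <= percent <= 88:
        return "high"
    return "unclassified"


for construct, (positive, total) in COUNTS.items():
    pct = percent_expressing(positive, total)
    print(f"{construct}: {positive}/{total} = {pct}% ({classify(pct)})")
```

With so few embryos per construct, these percentages are descriptive only; the sketch makes no claim about statistical significance.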

The high percentage of 22.8/17.5pLAC and 20.8/17.5pLAC transgenes expressing β-galactosidase is similar to the values reported for mouse constructs with different lengths of the far-upstream enhancer (11). Importantly, however, the mouse analysis was carried out with sequences that extend well beyond the 3'-boundary of the human core homology region (11). Accordingly, it remains to be determined whether or not DNA element(s) contributing to enhanced position independence are also present immediately 3' of the Col1a2 core homology region. Indeed, the shortest far-upstream construct of Col1a2 used previously includes only the region encompassing HS3-4 and ~2 kb of the divergent 3'-flanking sequence (11). Similarly to transgenes with the whole core homology region, this shorter mouse construct was reported to direct high levels of β-galactosidase and in 80% of transgenic embryos (11). Lacking information about the distribution of nuclear protein binding sites in the core homology region of Col1a2, we tested the activity of a comparable human construct (20.2/18.8pLAC) without the HS5-containing region and 3'-flanking sequence (Fig. 5). Consistent with the results of 21.1/18.8pLAC, deletion of the 3'-flanking sequence correlated with decreased percent of transgenics expressing β-galactosidase (Fig. 5). At variance with the mouse data, however, loss of the HS5-containing region (and, consequently, loss of FU1-7) resulted in significantly reduced X-gal staining (Fig. 6C).

Histological examination of tissues from 20.2/18.8pLAC transgenic embryos documented an expression profile broader than that of -378LAC, but less intense and uniform than that of constructs with the entire core homology region (Fig. 9). Positive sites include muscles, bones, and kidney; however, expression in skin fascia was abrogated in all five transgenic embryos (Fig. 9). This last result contrasts with the consistent, albeit low and mosaic, expression in skin of -378LAC (Fig. 7). Sequence inspection revealed the presence of a nuclear matrix attachment region (MAR) consensus sequence in the 5'-third of the core homology region (Fig. 3) (27). It remains to be determined whether or not the MAR sequence participates together with the AluI repeats in conferring position independence to the transgene. Pending additional data, we interpret the transgenic results to suggest that full enhancer activity depends on the integrity of the whole core homology region and on the presence of the immediately 3'-flanking sequence.


Fig. 9.   Tissue distribution of human 20.2/18.8pLAC. X-gal staining can be seen in a large subset of type I collagen-producing cells. In A, osteoblasts of the calvarium (arrows) are stained blue but not the brain (b). Other osteoblasts are also stained, as seen in the clavicle bone (C), the mandibular bone (D, arrow), and the cortical bone of a lumbar vertebra (F). In B, cells in the tongue are stained. In D, the muscular wall of the abdominal aorta (a) is stained. The fascia of the skin (G, f) along the back of the embryo and keratinocytes of the epidermis (arrow) are both negative. The intercostal muscles (m) between the ribs appear blue (G), as do the shoulder girdle muscles (J, m). In the abdomen, staining is seen only in the mesonephric tissue (H, ms) and not in the interstitial cells of the testis (t). The other organ that shows strong staining is the kidney (I), mainly in the interstitium.


    DISCUSSION

The type I collagen genes are an instructive example of coordinated regulation during embryogenesis in a variety of mesenchyme cells and with tightly controlled spatio-temporal patterns (2). Combinatorial interactions among restricted and ubiquitous transcription factors are generally believed to direct gene expression in individual cell types and at distinct developmental stages (28). Identification of cell and/or stage-specific cis-acting elements in type I collagen genes may therefore provide new insights into the mechanisms that underlie the diversification of mesenchyme cell specification. Finding evolutionarily retained DNA sequences in different organisms provides strong indication of critically important cis-acting elements and indirect evidence for the identification of transcriptional regulatory networks (13). Using the transgenic model and guided by the prior analysis of chromatin structure, we have identified DNA elements that direct high and tissue-specific expression of the human COL1A2 gene. The results have revealed remarkably conserved chromatin structure and sequence composition in the far-upstream region of the human and mouse alpha 2(I) collagen genes. They have also suggested differences in functional organization of the cis-acting elements that control tissue-specificity in the two genes.

Six distinct DNase I-hypersensitive sites were mapped around the start site of transcription of COL1A2 and within 20 kb upstream of it; their spatial arrangement, cell-type specificity and relative availability to DNase I digestion are comparable to those of the mouse gene. There is only one exception, namely a hypersensitive site in the first intron (HS(In)) of COL1A2, which is absent in Col1a2. HS(In) appears to be inaccessible to DNase I digestion in type I collagen-producing fibroblasts, a finding consistent with a putative silencing role of this unique hypersensitive site. Indeed, a previous study has correlated the first intron of COL1A2 with transcriptional repression of promoter constructs (4). Consistent with the uniqueness of HS(In), the intronic sequences of the mouse and human genes are divergent.

Our chromatin survey identified three strong hypersensitive sites (HS3-5) at about the same location as those of the mouse far-upstream enhancer and within nearly identical sequences. Although limited to three cell lines, the analysis nevertheless suggests that HS3-5 may represent sites of open chromatin in type I collagen-producing cells. DNase I footprinting assays correlated areas of sequence identity with twelve distinct areas of nuclear protein interaction, which are distributed in three different clusters within the core homology region. In agreement with previous transgenic work on the mouse gene, we demonstrated that the far-upstream region of COL1A2 acts as a strong tissue-specific enhancer in concert with the proximal promoter (11). Our study has also extended the work on the mouse enhancer by revealing two additional features of the far-upstream sequence of COL1A2.

The first feature pertains to the role of the 3'-flanking sequence in augmenting position-independent expression of the transgene. A variety of elements of different composition have been described that protect expression of integrated transgenes from the surrounding chromatin environment (13-15, 29-30). The 1-kb segment that flanks the 3'-boundary of HS3-5 appears to fulfill such a function, because deletion of this sequence leads to a substantial decrease in the percentage of beta-galactosidase-positive transgenic embryos. We also noted the presence of an MAR sequence within IS2. Sequences of this kind are commonly found at the boundaries of transcription units and/or near enhancers (13, 27); however, they do not by themselves confer position independence (13, 31). Accordingly, we propose that the AluI repeats may protect the transgene from position effect by organizing chromatin in concert with core homology region element(s), such as the MAR sequence. Similar data in transgenic models have suggested the existence of comparable elements in the alpha 1(I) collagen gene (12).

The second feature pertains to the loss of skin expression in the construct harboring the enhancer without putative cis-acting elements FU1-7. This result contrasts with the low but reproducible X-gal staining of skin fascia in -378LAC transgenic embryos. We believe that these observations are indirect evidence that the far-upstream enhancer works in concert with the proximal promoter. We rest our conclusion on the widely held view that high and tissue-specific gene expression depends on local concentrations of protein complexes that are bound to enhancer and promoter sequences and that, in turn, favor interaction between these physically distinct DNA entities (32). A case in point is the mutually exclusive interactions of the upstream LCR with the proximal promoters of the beta-globin gene cluster during development (13-16). It follows that eliminating a substantial number of cis-acting elements (i.e. FU1-7) could dramatically subvert the natural architecture of the enhancer/promoter interaction and, for example, lead to loss of expression in one or more tissues. Such a strict enhancer/promoter interdependence would explain the more severe effect of deleting a portion as opposed to the entirety of the core homology region. Tissue specificity is therefore more likely to depend on synergism among multiple cis-acting elements that reside in both enhancer and promoter, rather than only on individual sites in each of them.

Interestingly, reduced enhancer activity and loss of skin expression were not observed in comparable mouse transgenes (11). The discrepancy may be reconciled by arguing that these genomic segments, albeit highly homologous, are bound by different transcriptional complexes. There is a precedent in support of our hypothesis. The TGFbeta-responsive element was localized to nearly identical promoter sequences of Col1a2 and COL1A2 but was associated with CTF/NF-I binding in the former and with the Sp1·Smad2-Smad4 complex in the latter (16, 19-21). Although the location of the nuclear protein binding sites in the mouse far-upstream enhancer is currently unknown, the differential expression of constructs with similar deletions of the far-upstream enhancer may indicate subtle differences in the regulation of COL1A2 and Col1a2.

In conclusion, the human -378 COL1A2 promoter contains elements that control tissue specificity in subsets of mesenchymal cells, and this activity is significantly augmented when the promoter is linked to the far-upstream enhancer. We therefore propose that the predominant function of the far-upstream element is to broaden and intensify tissue-specific transcription from the proximal promoter. Indeed, promiscuous expression of the -378LAC transgene was never observed in tissues that do not express the endogenous collagen gene. Our study provides the first indication of the evolutionary conservation of the functional domain of the mammalian alpha 2(I) collagen genes and the first characterization of critical DNA elements in the far-upstream enhancer. Work in progress is focusing on the characterization of trans-acting factors binding to the core homology region of COL1A2, as well as on deciphering the functional relationship between the far-upstream enhancer and proximal promoter.

    ACKNOWLEDGEMENTS

We thank Cindy Else for excellent technical assistance and Karen Johnson for typing the manuscript.

    FOOTNOTES

* This work was supported by National Institutes of Health Grants AR386481, AR44888, and HL41262 and by the Medical Research Council and Arthritis Research Campaign (UK). The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) AF345994.

§ Both authors contributed equally to the work.

Published, JBC Papers in Press, March 28, 2001, DOI 10.1074/jbc.M101397200

Dagger Dagger To whom correspondence should be addressed: Dept. of Biochemistry and Molecular Biology, Mount Sinai School of Medicine-New York University, One Gustave L. Levy Place, Box 1020, New York, NY 10029. Tel.: 212-241-1757; Fax: 212-722-5999; E-mail: ramirf01@doc.mssm.edu.

    ABBREVIATIONS

The abbreviations used are: kb, kilobase(s); COL1A2, human alpha 2(I) collagen gene; FU, footprint; HS, DNase I-hypersensitive sites; IS, identity island; LCR, locus control region; TGFbeta, transforming growth factor-beta; TNFalpha, tumor necrosis factor-alpha; HUVEC, human umbilical vascular endothelial cells; X-gal, 5-bromo-4-chloro-3-indolyl-beta-D-galactoside solution; bp, base pair(s); MAR, matrix attachment region.

    REFERENCES

1. Hay, E. D. (1991) in Cell Biology of Extracellular Matrix (Hay, E. D., ed), 2nd Ed., pp. 419-462, Plenum Press, New York
2. Vuorio, E., and de Crombrugghe, B. (1990) Annu. Rev. Biochem. 59, 837-872
3. Bornstein, P., and Sage, H. (1989) Prog. Nucleic Acids Res. 37, 67-106
4. Sherwood, A. L., Bottenus, R. E., Matzen, M. R., and Bornstein, P. (1990) Gene 89, 239-244
5. Pavlin, D. A., Lichtler, A. C., Bedalov, A., Kream, B. E., Harrison, J. R., Thomas, H. F., Gronowicz, G. A., Clark, S. H., Woody, C. O., and Rowe, D. W. (1992) J. Cell Biol. 116, 227-236
6. Niederreither, K., D'Souza, R. N., and de Crombrugghe, B. (1992) J. Cell Biol. 119, 1361-1370
7. Liska, D. A. J., Reed, M. J., Sage, E. H., and Bornstein, P. (1994) J. Cell Biol. 125, 695-704
8. Bedalov, A., Breault, D. T., Sokolov, B. P., Lichtler, A. C., Bedalov, I., Clark, S. H., Mack, K., Khillan, J. S., Woody, C. O., Kream, B. E., and Rowe, D. W. (1994) J. Biol. Chem. 269, 4903-4909
9. Rossert, J., Eberspaecher, H., and de Crombrugghe, B. (1995) J. Cell Biol. 129, 1421-1432
10. Rossert, J. A., Chen, S. S., Eberspaecher, H., Smith, C. N., and de Crombrugghe, B. (1996) Proc. Natl. Acad. Sci. U. S. A. 93, 1027-1031
11. Bou-Gharios, G., Garrett, L. A., Rossert, J., Niederreither, K., Eberspaecher, H., Smith, C., Black, C., and de Crombrugghe, B. (1996) J. Cell Biol. 134, 1333-1344
12. Krempen, K., Grotkopp, D., Hall, K., Bache, A., Gillan, A., Rippe, R. A., Brenner, D. A., and Breindl, M. (1999) Gene Expr. 8, 151-163
13. Dillon, N., and Sabbatini, P. (2000) Bioessays 22, 657-663
14. Engel, J. D., and Takimoto, K. (2000) Cell 100, 499-502
15. Festenstein, R., and Kioussis, D. (2000) Curr. Opin. Genet. Dev. 10, 199-203
16. Inagaki, Y., Truter, S., and Ramirez, F. (1994) J. Biol. Chem. 269, 14828-14834
17. Inagaki, Y., Truter, S., Tanaka, S., Di Liberto, M., and Ramirez, F. (1995) J. Biol. Chem. 270, 3353-3358
18. Greenwel, P., Tanaka, S., Penkov, D., Zhang, W., Olive, M., Moll, J., Vinson, C., Di Liberto, M., and Ramirez, F. (2000) Mol. Cell. Biol. 20, 912-918
19. Zhang, W., Ou, J., Inagaki, Y., Greenwel, P., and Ramirez, F. (2000) J. Biol. Chem. 275, 39237-39245
20. Poncelet, A. C., and Schnaper, W. H. (2001) J. Biol. Chem. 276, 6983-6992
21. Rossi, P., Karsenty, G., Roberts, A. B., Roche, N. S., Sporn, M. B., and de Crombrugghe, B. (1988) Cell 52, 405-414
22. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
23. Hogan, B., Beddington, R., Costantini, F., and Lacy, E. (1994) Manipulating the Mouse Embryo, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
24. Niederreither, K., D'Souza, R., Metsaranta, M., Eberspaecher, H., Toman, P. D., Vuorio, E., and de Crombrugghe, B. (1995) Matrix Biol. 14, 705-713
25. Gross, D. S., and Garrard, W. T. (1988) Annu. Rev. Biochem. 57, 159-197
26. Howard, B. H., and Sakamoto, K. (1990) New Biol. 2, 759-770
27. Bode, J., Benham, C., Knopp, A., and Mielke, C. (2000) Crit. Rev. Eukaryot. Gene Expr. 10, 73-90
28. Kadonaga, J. T. (1998) Cell 92, 307-313
29. Gross, D. S., and Garrard, W. T. (1987) Trends Biochem. Sci. 12, 293-297
30. Bonifer, C., Huber, M. C., Jägle, U., Faust, N., and Sippel, A. E. (1996) J. Mol. Med. 74, 665-671
31. Poljak, L., Seum, C., Mattioni, T., and Laemmli, U. K. (1994) Nucleic Acids Res. 22, 4386-4394
32. Dröge, P., and Müller-Hill, B. (2001) Bioessays 23, 179-183


Copyright © 2001 by The American Society for Biochemistry and Molecular Biology, Inc.