Molecular Physiology Laboratory, University of Edinburgh, Edinburgh EH8 9AG, Scotland, United Kingdom
A TYPICAL MAMMALIAN GENE is regulated by multiple control regions. Transcription control at the promoter occurs through the binding of multiple regulatory factors to specific sites near the transcription start site. However, additional protein-DNA interactions, occurring at considerable distances from the promoter, can exert an equally important effect on gene expression. Thus enhancers, which are 100 to 200 base pairs (bp) in length, can lie many tens of kilobases from the promoter, yet are potentially capable of stimulating transcription. Whether the stimulation is effected by increasing the transcription rate, or by increasing the probability of gene activation within a given cell, is not entirely clear (3). It is likely, given the DNase I hypersensitivity of active enhancer sequences, that chromatin structure is altered, perhaps by decondensation or histone modification, allowing functional communication with the promoter. The inclusion of an enhancer sequence with a transgene generally leads to increased expression and may also confer tissue specificity. However, the site of integration of the transgene can have profound overriding effects on its overall expression. Thus integration into silent or repressive chromatin can lead to gene silencing, despite the presence of the enhancer.
The use of larger transgenes, P1 artificial chromosomes (PACs), bacterial artificial chromosomes (BACs), and yeast artificial chromosomes (YACs), as sources of insert represents an attempt to incorporate sufficient flanking sequences to buffer any extraneous effects from sequences outlying the integration site. In fact, this strategy has led to the identification of further classes of DNA control that prevent the integration site from influencing gene transcription. Boundary elements appear to provide a barrier to the spread of repressive chromatin conformation. Locus control regions (LCRs), on the other hand, act like specialised enhancers, supporting high levels of expression from a transgene, independent of the site of insertion, and roughly proportional to the transgene copy number (4).
Enhancers, boundary elements, and LCRs have been studied for only a handful of gene loci, and their exact form and functions remain unclear. The globin loci have been studied in most detail, shedding light on these DNA control elements. For example, mice carrying lacZ driven by the α-globin promoter were found to express β-galactosidase in only a very small proportion of embryonic erythrocytes (18). When the HS40 enhancer (which controls expression of α-globin, and has been shown not to act as an LCR; Ref. 14) was included in the transgene construct, expression was seen in a high proportion of embryonic erythrocytes. The actual level of β-galactosidase expression in erythrocytes was the same, regardless of whether the cells were isolated from transgenics containing the α-globin promoter alone or in conjunction with the enhancer. This would appear to lend credence to the concept of an enhancer increasing the likelihood of transcription establishment or maintenance during development, rather than increasing the level of transcription. Variation between transgenic lines was attributed to differences in integration site.
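The distinction between the two modes of enhancer action can be made concrete with a small sketch. It treats the reporter signal of a cell population as the product of the fraction of cells expressing the transgene and the level per expressing cell. The function name and all numbers are hypothetical and chosen only for illustration; they are not data from Ref. 18.

```python
def population_expression(fraction_expressing, per_cell_level):
    """Mean reporter signal per cell, averaged over the whole population.

    fraction_expressing: proportion of cells with an active transgene (0 to 1)
    per_cell_level: reporter level in each expressing cell (arbitrary units)
    """
    if not 0.0 <= fraction_expressing <= 1.0:
        raise ValueError("fraction_expressing must lie between 0 and 1")
    if per_cell_level < 0:
        raise ValueError("per_cell_level must be non-negative")
    return fraction_expressing * per_cell_level


# Hypothetical illustration: the enhancer raises the fraction of
# expressing cells but leaves the per-cell level unchanged.
promoter_only = population_expression(0.05, 100.0)
with_enhancer = population_expression(0.60, 100.0)
```

Under these assumed values the population signal rises twelvefold although no individual expressing cell is brighter, which is why a bulk measurement alone cannot separate the two models and a per-cell readout is needed.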
It has been suggested that a constitutive DNase I hypersensitive site 5' of the chicken β-globin locus, called 5'HS4, acts like a boundary or insulator element. Chromatin upstream of the site is condensed, whereas downstream the chromatin is open, histone H4 is more abundant in the acetylated state, and genes are transcriptionally active. In human erythroleukemia cells, 5'HS4 was able to inhibit the stimulatory effect of the mouse β-globin 5'HS2 enhancer when it was placed between the enhancer and a promoter. Under conditions where the construct integration site was controlled, however, the ability to block enhancement varied from site to site, and interestingly, 5'HS4 inhibited the enhancer effect even when placed upstream of the enhancer. Although these constructs may not truly reflect the action of endogenous 5'HS4, the results suggest that the function of a boundary element may be more complicated than first thought.
The human β-globin LCR is a key regulatory element, physically characterized by six DNase I hypersensitive sites designated HS1 to HS6. Most of the activity of the LCR resides in the core elements of the DNase I hypersensitive sites, which range in size from 200 to 400 bp. One of the hypersensitive sites, HS3, contains seven GT motifs (important cis-regulatory elements required for correct expression of many housekeeping and tissue-specific genes). Analysis of transgenic mice carrying a human β-globin YAC, in which the sixth GT motif of the HS3 core was mutated, showed that expression of the ε- and γ-globin genes was significantly reduced during development but that β-globin gene expression was unaffected (9). These results suggest that mutation of a single transcriptional motif of a distant regulatory element can have profound effects on gene expression. The relative contributions of HS2 and HS3, in isolation or combination, were investigated using constructs that linked the elements to a fluorescent reporter, driven by the β-globin promoter. Both HS2 and HS3 showed similar levels of enhancement singly and marked synergy in regulating spatial and temporal expression together. HS2 was found to be important for tissue-specific expression of β-globin during development (5).
Recently, new methods have been developed to look at higher-order chromosome organization (2). Long-range looping interactions have been shown to occur upon activation of the mouse β-globin locus, shedding light on just how gene regulatory elements are able to act at large genomic distances from their target genes. Spatial clustering of cis-regulatory elements and active β-globin genes produces an active chromatin hub (ACH) at different stages of development, in erythroid progenitors (19). The ACH core, which is conserved in mouse and humans, consists of the hypersensitive sites of the LCR, as well as additional hypersensitive sites upstream (5'HS60/62) and downstream (3'HS1) of the β-globin locus. In erythroid progenitors, which are committed but not yet expressing β-globin, stable interactions between 5'HS60/62, 3'HS1, and hypersensitive sites at the 5' end of the LCR were demonstrated. After induction of differentiation, the rest of the LCR is incorporated into the cluster, creating an erythroid-specific, developmentally stable nuclear compartment dedicated to RNA polymerase II transcription of the activated gene (12).
To identify distant control regions affecting the expression of Th2 cytokine genes IL4, IL5, and IL13, which are clustered and expressed in a cell lineage-specific manner, both DNase I hypersensitivity assays and rigorous phylogenetic analyses were applied. Conserved regions of DNA of at least 100 bp lying in the vicinity of the Th2 cytokine cluster were identified by comparison of DNA sequences from different mammalian species (7). Of the potential control sequences identified, several were found to show DNase I hypersensitivity. Deletion of a conserved noncoding sequence between IL4 and IL13 indicated that the sequence is required for optimal expression of the Th2 cytokines (8). Deletion of DNase I hypersensitive sites downstream of IL4 selectively compromised IL4 gene transcription by differentiated Th2 cells and mast cells (17), again demonstrating the dramatic effect of mutations or deletions at a distance from the affected gene. BAC transgenesis recently confirmed the presence of an LCR in the locus (6, 16). The expression of IL4 and IL13, but not IL5, was found to be copy number dependent, suggesting that IL4 and IL13 are regulated independently of the tightly linked IL5. Deletion analysis located the LCR to a 25-kb fragment, between IL5 and IL13, lying within a fourth gene, RAD50. It has not yet been determined whether expression of RAD50, a DNA repair enzyme, is regulated by the LCR. The Th2 cytokine cluster locus clearly demonstrates that genes lying on either side of an LCR can be subject to very different expression control.
The chicken lysozyme locus has been shown to contain all the cis-elements necessary for position-independent and tissue-specific expression, entirely within a 24-kb DNase I hypersensitive site, and flanked by matrix attachment regions. It is considered to be a functional chromatin domain, structurally and functionally isolated from neighboring chromatin. Surprisingly, a locus encompassing a glioma-amplified sequence (cGas41) was found to lie entirely within the lysozyme chromatin domain, located 207 bp downstream from cLys. The cGas41 gene was preceded by a CpG island, which is commonly associated with housekeeping genes. Importantly, the cGas41 transcript, which encodes a transcription factor, is widely and differentially expressed, unlike lysozyme, which is expressed in the oviduct and myeloid cells (1). Here we see a clear example that two completely unrelated genes can be differentially expressed and yet coexist in the same functional chromatin domain.
Taken together, these studies demonstrate the potential complexities of mammalian gene regulation and the caveats involved in using large transgenes in which the extent of 5' and 3' sequences is often arbitrarily selected, depending on the availability of appropriate clones. The chosen clone may or may not contain all the elements necessary to recapitulate endogenous wild-type expression.
In this release of Physiological Genomics, a further example of long-range gene control is described. Nistala et al. (10) have looked at the differential expression of genes closely linked to the renin (REN) locus, encompassed within a 160-kb PAC clone. Previously, it had been shown that the PAC160 transgenic mouse represented a faithful model of human renin gene expression (15). It encompassed a strong enhancer, 12 kb upstream of REN. The enhancer was identified by homology to a similar sequence 2.6 kb upstream of the mouse renin gene, which directs renin expression to the kidney (13). REN transcription was largely restricted to the juxtaglomerular cells of the kidney, and its expression was highly regulated in response to physiological cues. Importantly, expression was shown to be proportional to transgene copy number and immune from position effects, suggesting the presence of an LCR or, at least, a strong enhancer element.
Nistala et al. wanted to know whether the other genes included in the PAC were expressed in transgenic mice and how the expression patterns of the genes compared with that of renin. A complete copy of KISS1, a metastasis suppressor gene, was identified upstream of the kidney enhancer, whilst FLJ10761, a gene that is highly conserved and possibly encodes an ethanolamine kinase, was identified immediately downstream of the REN locus. Expression of FLJ10761 in PAC160 mice closely emulated its normal expression in humans, being highly expressed in the kidney, liver, and testis. Since the gene is only 3.7 kb downstream of REN, it was of interest to identify elements that might drive the high-level liver expression, given that renin is not expressed in that tissue. Computer analysis of the sequences 1 kb upstream of FLJ10761 revealed binding sites for a number of liver-specific transcription factors (indicating a possible "natural expression boundary" between the genes), which now require detailed study.
Importantly, expression of both REN and FLJ10761 genes was proportional to copy number. Expression of KISS1, on the other hand, was not found to be proportional to copy number, nor was normal tissue specificity (abundant expression in the placenta and lower level expression in the brain) retained. In PAC160 mice, KISS1 was expressed in the lung, brain, and kidney. Nistala et al. speculate that the KISS1 promoter may have been influenced by sequences such as the kidney-specific enhancer and a second enhancer, 5 kb upstream of REN, that directs chorionic expression. (Clearly this is not the case for the endogenous human KISS1 gene, even though it is positioned in relation to REN exactly as it is in PAC160.) Taken together, the evidence suggests that the kidney-specific enhancer, 12 kb upstream of the human REN gene, may also act as an LCR. It will be interesting to see how expression patterns of the three genes are affected by homologous targeting or deletion of the putative LCR and enhancer sequences.
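One simple way to ask whether expression tracks transgene copy number is to compare expression per copy across independent transgenic lines. The sketch below is a minimal illustration under stated assumptions: the function name, the 25% tolerance, and the example values are all hypothetical, and it is not the analysis used by Nistala et al.

```python
def is_copy_number_dependent(lines, tolerance=0.25):
    """Check whether expression per transgene copy is roughly constant.

    lines: list of (copy_number, expression) pairs, one per independent
           transgenic line (expression in arbitrary units)
    tolerance: allowed fractional deviation of each line's per-copy
               expression from the mean (hypothetical cutoff)
    """
    if not lines:
        raise ValueError("at least one transgenic line is required")
    if any(copies <= 0 for copies, _ in lines):
        raise ValueError("copy numbers must be positive")
    per_copy = [expression / copies for copies, expression in lines]
    mean = sum(per_copy) / len(per_copy)
    return all(abs(value - mean) <= tolerance * mean for value in per_copy)


# Hypothetical lines: (copy number, expression level)
proportional_gene = [(1, 10.0), (2, 21.0), (4, 38.0)]
position_dependent_gene = [(1, 30.0), (2, 8.0), (4, 12.0)]

proportional_result = is_copy_number_dependent(proportional_gene)
position_result = is_copy_number_dependent(position_dependent_gene)
```

With these assumed values the first gene passes the check and the second does not, mirroring the contrast drawn above between REN or FLJ10761 and KISS1. A real analysis would also need replicate measurements and a statistical test rather than a fixed cutoff.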
There are a number of lessons to be learned. If sufficiently large, the transgene construct may include all cis-elements necessary to emulate the endogenous expression patterns of your gene of interest, independent of site of insertion. However, it is clearly imperative to identify all genes present on the transgene, not only because their expression may have a bearing on the phenotype of the resultant transgenic animal, but also because assessment of their differential expression patterns gives valuable information about the location of putative LCRs, the range of influence of enhancers, and the possible location of expression boundaries between closely linked genes. With the rapidly improving techniques for homologous targeting of BAC and PAC sequences, together with phylogenetic analyses, it is expected that long-range control elements will become better defined, both at the sequence and functional levels. Given the long distances over which control elements act, apparent gene deserts (gene-poor regions greater than 500 kb), which make up 25% of the human genome, may prove to contain important elements for gene regulation (11) and would need to be considered in the development of transgene constructs of the future.
FOOTNOTES
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: J. J. Mullins, Molecular Physiology Laboratory, Univ. of Edinburgh, Edinburgh EH8 9AG, Scotland, UK (E-mail: J.Mullins{at}ed.ac.uk).
REFERENCES