Departments of Obstetrics and Gynecology and Molecular Genetics, University of Illinois at Chicago, Chicago, Illinois 60612
Address all correspondence and requests for reprints to: Serdar E. Bulun, M.D., Department of Obstetrics and Gynecology, University of Illinois at Chicago, 820 South Wood Street M/C 808, Chicago, Illinois 60612. E-mail: sbulun{at}uic.edu
Aromatase, the key enzyme for E biosynthesis, is encoded by a single copy of the CYP19 gene, localized at chromosome 15q21.21 (1, 2). In most vertebrates, aromatase is expressed only in gonads and the brain, whereas primates express this gene in additional extragonadal sites (1). Aromatase expression and E production increase progressively along the evolutionary tree and reach their maximum in humans. This is achieved by the more efficient use of existing promoters and the recruitment of additional novel tissue-specific promoters in fat, skin, placenta, and bone (1). E is essential in females for the development of reproductive organs, and in both sexes for bone mineralization and gonadal function (3, 4). Moreover, in E-dependent pathologic tissues such as breast cancer and endometriosis, aromatase is up-regulated via inappropriate activation of aberrant promoters (5, 6). Alternative use of multiple promoters, which regulate mature aromatase mRNA levels by splicing of each first exon or 5'-untranslated region (UTR) onto a common splice junction immediately upstream of the coding region, is the key molecular mechanism conferring tissue-specific expression of the CYP19 gene. For instance, the proximal ovary-specific promoter (PII) gives rise to a 5'-UTR contiguous with the first coding exon (exon II), whereas a constitutively active distal promoter (I.1) in placenta is the basis of strikingly elevated levels of circulating E (100-1000 times normal) in pregnant women (7, 8). Because all mRNA species contain the identical open reading frame (exons II to X), the encoded protein is the same regardless of the promoter used. As a further twist, some of these promoters do not contain canonical TATA and CAAT elements, and each promoter is regulated in response to a distinct set of hormones or cytokines (1). Simpson et al. (1) estimated the size of the CYP19 (aromatase) gene as larger than 80 kb. Characterization of the entire genomic organization and the accurate size of this large gene, however, had not been possible to date using conventional phage or cosmid genomic libraries (1). Recently published Human Genome Project data (http://www.ncbi.nlm.nih.gov/genome/guide/human/) allowed us, for the first time, to precisely locate all known promoters and elucidate the extraordinarily complex organization of the entire human CYP19 gene.
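The alternative first-exon mechanism described above can be illustrated with a minimal sketch. The class and function names below are hypothetical, and the model is deliberately simplified: it records only which first exon is joined to the shared coding exons II to X, and treats promoter II as having no separately spliced first exon because its 5'-UTR is contiguous with exon II.

```python
from dataclasses import dataclass

# Coding exons shared by every CYP19 transcript, as stated in the text.
CODING_EXONS = ("II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X")


@dataclass(frozen=True)
class Transcript:
    """Toy model of a mature aromatase mRNA (hypothetical structure)."""
    promoter: str
    first_exon: str                 # "" when the 5'-UTR is contiguous with exon II
    coding_exons: tuple = CODING_EXONS


def transcribe(promoter: str) -> Transcript:
    # Promoter II yields a 5'-UTR contiguous with exon II; every other
    # promoter contributes its own untranslated first exon, labeled here
    # with the promoter name for illustration only.
    first_exon = "" if promoter == "II" else promoter
    return Transcript(promoter, first_exon)


transcripts = [transcribe(p) for p in ("II", "I.3", "I.4", "I.1")]

# The 5' ends differ, but the open reading frame is the same in all cases.
same_orf = len({t.coding_exons for t in transcripts}) == 1
distinct_first_exons = {t.first_exon for t in transcripts}
```

In this sketch, `same_orf` is true while `distinct_first_exons` holds one entry per promoter, mirroring the point that the encoded protein does not depend on the promoter used.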
To achieve this, the nucleotide sequences of the various 5'-UTRs and the coding region of the CYP19 gene were subjected to a BLAST (Basic Local Alignment Search Tool) homology search against the High Throughput Genomic Sequence (HTGS) database of the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/HTGS) (9). GenBank accession numbers of the previously characterized unique 5'-UTRs and corresponding promoters of the human CYP19 gene used for pairwise local sequence alignments were: M22246 (human full-length aromatase cDNA), S52794 (ovary-specific promoter PII), D21241 (adipose/breast cancer, I.3), S96437 (placenta-minor, I.2), D29757 (brain, I.f), S71536 (fetal tissues, I.5), L21982 (skin and adipose, I.4), D14473 (placenta-minor, 2a), and X55983 (placenta-major, I.1). The sequence of the bone-specific 5'-UTR and its promoter, I.6, was obtained from the original publication (10). The BLAST search permitted us to align the CYP19 gene coding region (exons II to X) and promoters II, I.3, I.6, and I.2 to a 178,762-bp bacterial artificial chromosome (BAC) clone (RP11-522G20) mapped to chromosome 15q21.2 (GenBank accession no. AC012169). Similarly, the remaining promoters were precisely aligned within another 144,714-bp BAC clone (RP11-108K3) mapped to chromosome 15q21.2 (GenBank accession no. AC020891). Next, using the BLAST 2 search program (http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html), we examined the alignment between these two clones; the results showed a 6442-bp overlap, demonstrating continuity of the gene sequence across these BAC clones.
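As a worked illustration of what the overlap implies, the following sketch computes the length of the contiguous sequence covered by the two BAC clones. It assumes a simple end-to-end overlap, in which the 6442-bp region is shared once between the end of one clone and the start of the other; the text does not state the orientation, and the function and variable names are illustrative only.

```python
# Clone lengths and overlap as reported in the text.
BAC_CLONES = {
    "RP11-522G20": {"accession": "AC012169", "length_bp": 178_762},
    "RP11-108K3": {"accession": "AC020891", "length_bp": 144_714},
}
OVERLAP_BP = 6_442


def merged_length(length_a: int, length_b: int, overlap_bp: int) -> int:
    """Length of two sequences joined end to end that share one overlap.

    Assumes the overlap is counted once (a simple tiling of two clones).
    """
    if overlap_bp < 0 or overlap_bp > min(length_a, length_b):
        raise ValueError("overlap must lie between 0 and the shorter clone length")
    return length_a + length_b - overlap_bp


contig_bp = merged_length(
    BAC_CLONES["RP11-522G20"]["length_bp"],
    BAC_CLONES["RP11-108K3"]["length_bp"],
    OVERLAP_BP,
)
print(f"Combined span of the two clones: {contig_bp:,} bp")
```

Under this assumption the two clones together cover 317,034 bp, which is ample to contain a gene of roughly 123 kb.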
An analysis of these two BAC clones revealed that the entire gene spans approximately 123 kb of DNA. Additionally, the CYP19 gene is found between markers stSG12786 and stSG47530, with the 3'-end of the gene telomeric to the 5'-end, indicating that transcription proceeds from centromere to telomere. Only the 30-kb 3'-region encodes aromatase, whereas a large 93-kb 5'-flanking region serves as the regulatory unit of the gene. The most proximal ovarian-specific promoter II and two other proximal promoters, I.3 (expressed in adipose tissue and breast cancer) and I.6 (expressed in bone), are located within the 1-kb region upstream of the ATG translation start site in exon II, as expected (Fig. 1). Promoter I.2, the minor placenta-specific promoter, is located approximately 13 kb upstream of the ATG site in exon II. The promoters specific for the brain (I.f), fetal tissues (I.5), adipose (I.4), and placenta (2a and I.1) are localized in tandem order at approximately 33, 43, 73, 78, and 93 kb, respectively, upstream of the first coding exon, exon II. Intriguingly, placental promoter I.1, located approximately 93,000 bp upstream of the coding region, is the most distal promoter; it gives rise to splicing of a 103-bp first exon onto the common splice junction immediately (38 bp) upstream of the ATG translational start site. The activity of promoter I.1 is the basis for the 100- to 1000-fold elevation in circulating E levels in pregnant women (11, 12). Thus, recruitment of this most distal promoter may have had an evolutionary impact, because, of all species, humans are unique in acquiring and maintaining extraordinarily high levels of aromatase expression in placenta.
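The promoter layout described above can be summarized in a small table-like structure. This is a minimal sketch: the distances are the approximate values given in the text, the value of 1 kb for promoters II, I.3, and I.6 is only an upper bound (the text places them within the 1-kb region upstream of the ATG), and all names in the code are illustrative.

```python
# (promoter, tissue, approximate distance in kb upstream of the ATG in exon II)
# For II, I.3, and I.6 the value 1 is an upper bound, not a measured distance.
PROMOTERS = [
    ("II", "ovary", 1),
    ("I.3", "adipose tissue/breast cancer", 1),
    ("I.6", "bone", 1),
    ("I.2", "placenta (minor)", 13),
    ("I.f", "brain", 33),
    ("I.5", "fetal tissues", 43),
    ("I.4", "skin and adipose", 73),
    ("2a", "placenta (minor)", 78),
    ("I.1", "placenta (major)", 93),
]
CODING_REGION_KB = 30  # 3'-region that encodes aromatase

# The most distal promoter defines the extent of the 5'-regulatory region.
most_distal = max(PROMOTERS, key=lambda p: p[2])
regulatory_region_kb = most_distal[2]
gene_span_kb = regulatory_region_kb + CODING_REGION_KB

for name, tissue, distance_kb in sorted(PROMOTERS, key=lambda p: p[2]):
    print(f"{name:<4} {tissue:<30} ~{distance_kb} kb upstream")
print(f"Regulatory region ~{regulatory_region_kb} kb + coding region "
      f"~{CODING_REGION_KB} kb = ~{gene_span_kb} kb")
```

The sum of the 93-kb regulatory region and the 30-kb coding region reproduces the approximately 123-kb gene span reported above.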
Elucidation of the complete structure and organization of the human CYP19 gene in fine detail will facilitate further characterization of the molecular mechanisms that regulate the tissue-specific and temporal expression of this gene, both in normal tissues and in pathological conditions such as breast cancer and endometriosis.
Additionally, this report highlights a potential use of the Human Genome Project data and powerful bioinformatics tools (especially the freely available web-based programs) to disseminate information about the structure and organization of very large and complex genes. In the absence of this freely available database and other resources, this type of work would not only be technically challenging for many laboratories, but would also require a significant amount of time and effort devoted to traditional molecular biology techniques such as cloning, library screening, and sequencing.
Acknowledgments
Footnotes
This work was supported in part by NIH Grants HD38691 and CA67167 (to S.E.B.).
Abbreviations: BAC, Bacterial artificial chromosome; BLAST, Basic Local Alignment Search Tool; UTR, untranslated region.
Received April 16, 2001.
Accepted May 24, 2001.
References