From the ᵃInstitut für Immungenetik, Universitätsklinikum Charité, Humboldt-Universität zu Berlin, Spandauer Damm 130, 14050 Berlin, Germany, the ᵈWellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom, the ᶠDepartment of Pathology, Division of Immunology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, United Kingdom, and the ʰUrologische Klinik, Universitätsklinikum Charité, Humboldt-Universität zu Berlin, Schumannstrasse 20/21, 10117 Berlin, Germany
Received for publication, December 6, 2002, and in revised form, March 3, 2003.
INTRODUCTION |
---|
OR genes are expressed not only in the olfactory epithelium but also in numerous other organs (12, 13, 14, 15), suggesting additional functions. In the testes of several mammalian species including man, at least 50 OR genes are transcribed that could be involved in sperm development, sperm competition, chemotaxis, or interaction between sperm and oocyte (16, 17, 18). The involvement of OR in path finding has already been demonstrated for axon guidance of olfactory sensory neurons (OSN) (19, 20). For testicularly expressed OR, an involvement in self/non-self discrimination has been suggested that could have developed to favor fertilization of female gametes by genetically different male germ cells (21).
However, next to nothing is known about the transcriptional control of OR genes. Only a single OR is expressed by a given OSN in a monoallelic fashion (22), probably to avoid the necessity of signal integration or scoring. It is enigmatic how this monoallelic expression mode is achieved for the hundreds of OR genes present in a single OSN and whether it is also implemented in nonolfactory tissues. Interestingly, the analysis of promoter regions of the odorant transduction pathway components and other marker proteins of mature OSN has so far failed to yield a general understanding of transcriptional control mechanisms. A consensus sequence, the Olf-1 site, has nevertheless been identified; it binds a transcription factor (early B-cell factor) expressed solely in OSN and early B-cells (23). The importance of this site for OR expression has been questioned, because mice with an early B-cell factor null mutation displayed a profound B-cell deficit but exhibited a morphologically normal olfactory epithelium and expressed the olfactory marker protein and the OR-specific G-protein G(olf) (24). Recently, two additional transcription factors (O/E-2 and O/E-3) were identified that interact with the same DNA-binding site. They were found to be transcribed only in OSN, but their relevance for OR expression awaits confirmation (25). Experimental evidence is also lacking for Olf-1 sites within the putative promoter region of some OR genes belonging to the chromosome 17p13.3 cluster (7). Likewise, the comparison of paralogous OR genes in man and mouse (6, 26, 27, 28, 29) has not yet provided any evidence for common regulatory features shared among OR genes from different species.
Our study addresses these problems by carrying out a detailed analysis of the transcriptional status of HLA-linked OR genes. We have employed testicular tissue for these studies to explore the function of these receptors in reproduction. We describe here that testicular OR gene transcripts are generated by a highly unorthodox combination of complex co- and post-transcriptional events, including long distance and intracoding exon splicing, exon sharing, and premature polyadenylation.
EXPERIMENTAL PROCEDURES |
---|
Rapid Amplification of cDNA Ends (RACE). Gene-specific primers were located directly downstream of the start codon or in the third transmembrane domain for the 5' rapid amplification of cDNA ends (RACE). Initially, a "pool" PCR was conducted containing the anchor primer 1 (AP1, 10 pmol), the gene-specific primers of all analyzed OR (gene-specific primer 1, 10 pmol each; Table I), 0.5 ng of anchored cDNA (Marathon-Ready™ cDNA; Clontech), 2.5 units of Taq (Takara), and dATP, dCTP, dGTP, and dTTP (0.2 mM each) in a 50-µl reaction. The following parameters were used: 94 °C for 30 s, 63 °C for 30 s, and 72 °C for 40 s for 35 cycles and finally 72 °C for 4 min. Subsequently, nested gene-specific PCRs were conducted using 0.5 µl of PCR product from the pool PCR, the respective nested gene-specific primer (gene-specific primer 2; Table II) and the nested anchor primer AP2 with the following parameters: 94 °C for 30 s, X °C (Table II, Temperature column) for 30 s, and 72 °C for 40 s for 35 cycles and finally 72 °C for 4 min. This nested PCR strategy eliminated much of the nonspecific background. In experiments with cDNA from freshly prepared testicular RNA (see below), the anchor primers AP1 and AP2 were replaced by SMART-TAG-1 and SMART-TAG-2, respectively (Tables III and IV).
To analyze the coding region and 3'-UTR, primers located directly upstream of the start codon or in the last third of the coding region were used (Tables III, IV, V). A first pool PCR was conducted containing AP1 or SMART-TAG-1 (10 pmol) and the gene-specific primers (gene-specific primer 1; Table III; 10 pmol each). The following parameters were used: 94 °C for 30 s, 58 °C for 30 s, and 72 °C for 40 s for 35 cycles and finally 72 °C for 4 min. Subsequently, nested gene-specific PCRs were carried out using 0.5 µl of PCR product from the pool PCR, the respective nested gene-specific primer (gene-specific primer 2; Table IV), and the nested anchor primer AP2 or SMART-TAG-2 with the following parameters: 94 °C for 30 s, X °C (Table IV, Temperature column) for 30 s, and 72 °C for 40 s for 35 cycles and finally 72 °C for 4 min. For PCRs with primers located in the last third of the coding region, the following parameters were used: 94 °C for 30 s, X °C (Table V, Temperature column) for 30 s, and 72 °C for 40 s for 35 cycles and finally 72 °C for 4 min. Using the RACE results, gene-specific primers located in the 5'- and 3'-UTR were designed to facilitate the amplification of full-length cDNA (Table VI). PCR was conducted with the following parameters: 94 °C for 30 s, X °C (Table VI, Temperature column) for 30 s, and 72 °C for 40 s for 35 cycles and finally 72 °C for 4 min. PCR products were analyzed by gel electrophoresis (10% acrylamide) and visualized with UV light after staining with ethidium bromide.
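The cycling programs described above can be written down as plain data, which makes it easy to compare the 5' and 3' pool PCRs and to estimate their run time. The following is only a sketch: the names `Step`, `total_hold_seconds`, `POOL_5RACE`, and `POOL_3RACE` are illustrative and not part of the original protocol, and ramping time between temperatures is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One thermocycler hold: temperature in degrees Celsius and duration in seconds."""
    temp_c: float
    seconds: int


def total_hold_seconds(cycle_steps, cycles, final_extension):
    """Sum of programmed hold times only; ramping between steps is not modeled."""
    return cycles * sum(step.seconds for step in cycle_steps) + final_extension.seconds


# Pool PCRs as stated in the text: 35 cycles, then 72 °C for 4 min.
POOL_5RACE = [Step(94, 30), Step(63, 30), Step(72, 40)]  # 5' RACE, annealing at 63 °C
POOL_3RACE = [Step(94, 30), Step(58, 30), Step(72, 40)]  # 3' RACE, annealing at 58 °C
FINAL_EXTENSION = Step(72, 240)

if __name__ == "__main__":
    for name, program in (("5' pool", POOL_5RACE), ("3' pool", POOL_3RACE)):
        seconds = total_hold_seconds(program, 35, FINAL_EXTENSION)
        print(f"{name}: {seconds} s of programmed hold time")
```

The two pool programs differ only in annealing temperature, so their programmed hold times are identical; the nested PCRs would need the per-primer annealing temperatures from the tables, which are not reproduced here.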
To verify the identity of the RACE products, Southern analysis was conducted with probes specific for the OR genes of interest. The probes used to confirm the 5' RACE products covered the first 100-300 bp of the coding region and 100-200 bp of the 5'-UTR.
Isolation of Total RNA and cDNA Synthesis. To preserve the RNA, human testis obtained from the Urologische Klinik, Universitätsklinikum Charité, Humboldt-Universität zu Berlin, was immersed in RNALater (Ambion) directly after surgery. Total RNA was purified by RNAgents® Total RNA Isolation System (Promega) according to the manufacturer's recommendations. RNA quality was assessed by gel electrophoresis. First-strand cDNA was synthesized from 5 µg of total RNA with an oligo(dT) primer (Oligo-dT-SMART) using SuperScript™ II reverse transcriptase (Invitrogen) according to the manufacturer's recommendations. cDNA quality was assessed by the amplification of housekeeping genes (Table VII; primer and conditions were from Ref. 31).
Cloning and Sequencing. RACE reactions containing OR-specific products were purified with CHROMA SPIN-TE 100 columns and cloned into pCR®II using the TOPO TA cloning kit (Invitrogen). Before sequencing, recombinant clones were identified by blue/white screening and confirmed by PCR with the respective RACE primers. Inserts of positive clones were amplified with vector-specific primers (M13APfor and M13Aprev; Table VII; 95 °C for 20 s, 60 °C for 30 s, and 72 °C for 1 min for 25 cycles and finally 72 °C for 7 min) and, after ethanol precipitation, utilized directly for automatic sequencing (Licor) with labeled M13 primers (M13for and M13rev; Table VII; 95 °C for 4 min, 95 °C for 15 s, 56 °C for 15 s, and 70 °C for 15 s for 27 cycles and finally 72 °C for 7 min; with 5% Me₂SO).
Promoter Search. An analysis of putative promoter regions was undertaken using the TRANSFAC data base, in conjunction with the MatInspector program (32), and an in silico promoter search was performed using two pieces of software, Promoter Inspector (33) and Eponine (34).
Functional Promoter Assay. The region of interest was cloned into the pGL3 basic vector (Promega) in front of the firefly luciferase reporter gene using restriction sites BglII and XhoI. All of the resulting constructs were verified by sequencing, and plasmid DNA was purified prior to transfection. Human embryonic kidney cells (HEK293) (35) and Odora cells (36) were cultured in Dulbecco's modified Eagle's medium (Invitrogen) with 10% fetal calf serum, 100 units/ml penicillin, and 100 µg/ml streptomycin. 2 × 10⁵ cells grown to 40-80% confluence were transfected using the manufacturer's protocol for SuperFect reagent (Qiagen). Luciferase activity was determined on 100 µl of the cells in medium with 100 µl of Bright-Glo reagent (Promega). After a 2-min incubation to allow complete cell lysis, the samples were placed in a luminometer (Berthold Sirius), and readings were taken. Activity values were normalized to the average activity of the pGL3 control vector (which contains SV40 promoter and enhancer sequences) after subtracting the background luminescence activity (medium without cells plus Bright-Glo reagent).
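The normalization step can be illustrated with a short sketch. It assumes that the background reading is subtracted from both the sample and the control readings before the ratio is taken; the text does not spell this out, so this is one plausible reading rather than the authors' exact calculation. The function name and all numbers are hypothetical.

```python
from statistics import mean


def normalized_activity(sample_rlu, control_rlus, background_rlu):
    """Background-corrected sample signal relative to the mean pGL3 control signal.

    Assumption: background (medium without cells plus reagent) is subtracted
    from the sample and from the averaged control readings alike.
    """
    control = mean(control_rlus) - background_rlu
    if control <= 0:
        raise ValueError("control signal does not exceed background")
    return (sample_rlu - background_rlu) / control


if __name__ == "__main__":
    # Hypothetical relative light units, for illustration only.
    value = normalized_activity(150.0, [1000.0, 1200.0], 100.0)
    print(f"normalized activity: {value:.3f}")
```

With this convention a construct without promoter activity gives a value near zero, and a reading below background gives a negative value.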
RESULTS |
---|
Analysis of 5'-Untranslated Regions. Initially, gene-specific 5' RACE primers were located directly downstream of the start codon. This proximal position was chosen to facilitate access to the very 5' end of the transcripts. Because some G-protein-coupled receptors have been shown to be functional without the first two TM domains (37), additional gene-specific primers located in the third TM domain were designed to also take account of a transcriptional start site further downstream. After amplification of the 5'-UTR, the RACE reactions were examined by Southern analysis with OR-specific probes (Fig. 1), before cloning and sequencing of individual amplicons. OR-specific sequences were analyzed with the BLAST algorithm (38), revealing the genomic organization of the respective OR genes. Subsequently, splice donor and acceptor sites were verified. Despite two nested RACE reactions, a high background of false positive clones was obtained, especially for weakly expressed genes. The precise data are summarized in www.charite.de/immungenetik/ORexpression/exon-table.xls.
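Verifying splice donor and acceptor sites amounts to checking the intron boundaries inferred from the cDNA-to-genome alignment. The sketch below assumes the canonical GT-AG rule (intron begins with GT and ends with AG on the transcribed strand); the text does not state which criteria were applied, and the function name, coordinates, and toy sequence are hypothetical.

```python
def has_canonical_splice_sites(genomic, intron_start, intron_end):
    """Return True if genomic[intron_start:intron_end] starts with GT and ends with AG.

    Coordinates are 0-based and end-exclusive, on the transcribed strand.
    """
    intron = genomic[intron_start:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


if __name__ == "__main__":
    # Toy sequence: exon (6 bp) + intron (12 bp) + exon (6 bp).
    toy = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
    print(has_canonical_splice_sites(toy, 6, 18))
```

A boundary that is off by even one base fails the check, which is why such a test is useful for separating genuine splice junctions from alignment or cloning artifacts.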
5' RACE was successful for almost all of the analyzed HLA-linked OR. No positives were detected for hs6M1-4P, -13P, -14P, and -19P, which are, at least in some haplotypes, likely to be pseudogenes. 5'-UTRs of varying sizes but without additional exons were found for hs6M1-1, -3, -6, -10, -15, -17, and -20. In contrast, additional 5' exons were uncovered for hs6M1-12, -16, -18, -21, and -27.³ To compare HLA-linked and unlinked OR genes, 5' RACE was also conducted for three OR genes located on other chromosomes. hs19M1-4 was not expressed in testis, hs17M1-20 was expressed but revealed no 5' exon, and hs7M1-2 was expressed and showed two 5' exons.
For hs6M1-12, the most centromeric gene of the HLA-linked OR cluster, one additional 5' exon (A) was identified 1.5 kb upstream of the start codon in three unrelated clones, and no additional 5' exon was found in four unrelated clones.³
The genes hs6M1-21, -27, -18, and -16 are located in a genomic region of about 110 kb (Fig. 2) and show a very unusual genomic organization. hs6M1-21, the most telomeric gene, exhibited several 5'-untranslated exons and shared some of these exons (exon L, exon M, and exon S; Fig. 2)³ with hs6M1-27 and hs6M1-18. The first exon of hs6M1-21, observed in all three splice variants, was situated more than 100 kb upstream of the start codon. Variant 1 comprised only exon L, whereas variant 2 showed in addition to exon L another exon (M) 98 kb upstream of the initiator ATG codon. Variant 3 was found to contain exon S, situated ~50 kb upstream of the start codon, and exon L (Fig. 2).³ The coding region of hs6M1-27 was located ~20 kb centromeric to the coding region of hs6M1-21. For this gene, two alternatively spliced transcripts were found: variant 1 contained three 5'-untranslated exons: exon L (~80 kb upstream of the hs6M1-27 start codon), exon M, and exon S. All three exons were also identified for hs6M1-21. The second variant of hs6M1-27 contained only exon N (Fig. 2),³ which resides ~75 kb upstream of the start codon and was not found in any other OR transcript. The coding region of hs6M1-18 (Fig. 2) resides ~75 kb centromeric of hs6M1-21, in the same transcriptional orientation as hs6M1-21 and -27. Three alternatively spliced variants of this gene were found. The first exhibited four 5'-untranslated exons, again starting with exon L, which in this case was ~30 kb upstream of the coding region. The other three exons (exons O, P, and Q) were situated within the first 4 kb upstream of the start codon. The second variant contained only exon L, and the third variant contained only exon R, 600 bp distal to the start codon. Within the introns of hs6M1-18, -21, and -27, the three OR genes hs6M1-17, -19P, and -20 were located in opposite transcriptional orientation (Fig. 2). RACE products were found for hs6M1-17 and -20, but they contained no 5'-untranslated exons.
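The exon sharing described in this paragraph can be summarized as a small data structure. The sketch below encodes only the 5'-untranslated exons named in the text for each splice variant and derives which exons are used by more than one gene; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# 5'-untranslated exons per splice variant, as listed in the text.
VARIANTS = {
    "hs6M1-21": [{"L"}, {"L", "M"}, {"L", "S"}],
    "hs6M1-27": [{"L", "M", "S"}, {"N"}],
    "hs6M1-18": [{"L", "O", "P", "Q"}, {"L"}, {"R"}],
}


def exons_by_gene(variants):
    """Union of the 5' exons seen in any variant of each gene."""
    return {gene: set().union(*exon_sets) for gene, exon_sets in variants.items()}


def shared_exons(variants):
    """Map each exon used by more than one gene to the sorted list of those genes."""
    usage = {}
    for gene, exons in exons_by_gene(variants).items():
        for exon in exons:
            usage.setdefault(exon, set()).add(gene)
    return {exon: sorted(genes) for exon, genes in usage.items() if len(genes) > 1}


if __name__ == "__main__":
    for exon, genes in sorted(shared_exons(VARIANTS).items()):
        print(exon, "->", ", ".join(genes))
```

Run on the variants above, the summary reports exon L for all three genes and exons M and S for hs6M1-21 and hs6M1-27, while exons N, O, P, Q, and R remain gene-specific.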
Premature Polyadenylation. To analyze the coding region and the 3'-UTR of the HLA-linked OR genes, RACE was conducted with gene-specific primers located directly upstream of the start codon. Only OR genes that could be successfully amplified in the previous 5' RACE experiments were considered (Table III). After pool and nested PCR, the products were cloned and sequenced. Using testis cDNA, specific products could be obtained for only three of the 12 analyzed genes (hs6M1-16, -18, and -21). Additionally, all of the RACE products with the exception of one transcript from hs6M1-16 and -18 showed premature polyadenylation. To exclude the possibility that these findings were artifacts of the anchored cDNA library (Marathon; Clontech), mRNA was also freshly isolated from human testis and reverse transcribed using a tagged oligo(dT) primer.
After pool and nested PCR with this cDNA and the respective tag primers (SMART-TAG-1 and -2), products were obtained for hs6M1-16 and -21 with premature polyadenylation at the same positions as with the anchored cDNA library. However, no transcripts were observed for hs6M1-18 with the SMART-tagged cDNA. For hs6M1-21, eight different RACE products from the Marathon library and three from the second cDNA source were found. They exhibited premature polyadenylation within the coding region at three defined positions (Fig. 4), although no polyadenylation consensus signals could be detected. Interestingly, the hs6M1-16 transcript with the most downstream premature polyadenylation site (position 596) had lost 234 bp of the coding region, including the premature polyadenylation sites of the other analyzed transcripts and the genuine start codon. Because splice donor and acceptor sites were found at the edges of the missing material, this clone is best explained by an additional splicing event. Additionally, one transcript from hs6M1-16 (Fig. 5, variant 11) was found without premature polyadenylation, but again, the other premature polyadenylation sites were lost because of a splicing event removing the first 221 bp of the coding region.
Analysis of the 3'-Untranslated Regions. Further oligonucleotides located in the last third of the coding region were designed to analyze the 3'-UTR of hs6M1-21, -27, -18, and -16. Three alternatively spliced variants were observed for hs6M1-16. The first transcript (variant 8) contained exon K, the second (variant 9) exon J, and the third (variant 10) exon I as well as exon J (Fig. 5).³ These exons were located within 1.6 kb of the stop codon. A 3' unspliced variant (11) was also found. Only the most downstream exon (K) contained a typical polyadenylation signal (40) (Fig. 5).
Two unspliced 3' variants differing only in length were identified for hs6M1-27 (variants 3 and 4; Fig. 6).³ Variant 4 contained the polyadenylation consensus signal 24 bp upstream of the poly(A) tail. The 3'-UTR of the only hs6M1-18 3' variant was 845 bp in size and unspliced,³ and a polyadenylation signal was found 17 bp upstream of the poly(A) tail. However, no specific 3' RACE products from hs6M1-21 could be obtained.
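Locating a polyadenylation signal relative to the poly(A) tail can be sketched as a simple search for the consensus hexamer (AAUAAA, written AATAAA in cDNA) in a window upstream of the tail. The window size of 40 bp and the distance convention (bases between the end of the hexamer and the first base of the tail) are assumptions made for this illustration; the text does not specify how the reported distances were measured. The function name and toy sequences are hypothetical.

```python
def find_polya_signal(cdna, polya_start, window=40, signal="AATAAA"):
    """Distance in bp from the end of the nearest upstream signal to the poly(A) tail.

    cdna        -- transcript sequence (DNA or RNA alphabet, any case)
    polya_start -- 0-based index of the first base of the poly(A) tail
    Returns None if no signal lies completely within the upstream window.
    """
    seq = cdna.upper().replace("U", "T")
    window_start = max(0, polya_start - window)
    index = seq.rfind(signal, window_start, polya_start)
    if index == -1:
        return None
    return polya_start - (index + len(signal))


if __name__ == "__main__":
    toy = "CCCC" + "AATAAA" + "G" * 17 + "A" * 20
    print(find_polya_signal(toy, polya_start=27))
```

Transcripts that end in a poly(A) tail but return `None` here would correspond to the situation described for the prematurely polyadenylated variants, where no consensus signal could be detected.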
Analysis of the Coding Regions. To amplify transcripts that contained the complete coding region, gene-specific primers located in the 5'- and 3'-UTR were designed. Five variants of hs6M1-16 were detected (Fig. 5, variants 11-15). Variant 14 contained the complete coding region and a spliced 3'-UTR (exon J), whereas variant 15 revealed an unspliced 3'-UTR. The 3' ends of variants 12 and 13 resembled those of 14 and 15 (Fig. 5) but lacked the first 234 bp of the coding region and 19 bp of the 5'-UTR, removing the first methionine of the presumed intact OR and suggesting that this transcript produced a protein of only 238 amino acids as opposed to the 316 amino acids of the full-length OR. Variant 11 had lost 221 bp of the coding region and showed an unspliced 3'-UTR. This could lead to the same truncated protein as described for variants 12 and 13. For all of the transcripts mentioned, consensus splice sites were found. hs6M1-27 yielded two different variants: variant 6 contained the complete coding region (Fig. 6), whereas variant 5 lacked the sequence between bp 412 and 683 of the coding region. These 272 bp must have been removed by splicing to give rise to a frameshift that resulted in a premature termination codon 20 codons after the splice site. In contrast, all of the products obtained for hs6M1-18 were found to be unspliced.
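The arithmetic behind these statements is easy to check. A deletion within the coding region preserves the reading frame only if its length is a multiple of three; the sketch below applies this to the figures given in the text. The function names are illustrative, and the calculation of the truncated protein length assumes that translation starts at the first in-frame codon after the lost 234 bp.

```python
def deleted_length(first_bp, last_bp):
    """Length of a deletion given 1-based, inclusive coding-region coordinates."""
    return last_bp - first_bp + 1


def preserves_frame(length_bp):
    """A deletion keeps the reading frame only if it removes whole codons."""
    return length_bp % 3 == 0


def residues_after_n_terminal_loss(full_length_aa, lost_bp):
    """Protein length when an in-frame stretch at the 5' end of the coding region is lost."""
    if not preserves_frame(lost_bp):
        raise ValueError("lost stretch is not a whole number of codons")
    return full_length_aa - lost_bp // 3


if __name__ == "__main__":
    # hs6M1-27 variant 5: sequence between bp 412 and 683 removed.
    length = deleted_length(412, 683)
    print(length, "bp removed; frame preserved:", preserves_frame(length))
    # hs6M1-16 variants 12 and 13: first 234 bp of a 316-codon coding region lost.
    print(residues_after_n_terminal_loss(316, 234), "amino acids remain")
```

The 272-bp deletion leaves a remainder of 2 when divided by 3, consistent with the reported frameshift, whereas the loss of 234 bp (78 codons) from a 316-residue receptor gives the 238 amino acids stated above.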
Search for Regulatory Regions. The TRANSFAC data base, accessed by MatInspector, was used to search for transcription factor-binding sites within the 500-bp region containing exons B and C of hs6M1-16 and exon L of hs6M1-18, -21, and -27. A number of matches were observed (Fig. 2), but the Olf-1 binding site was not included. The detailed analysis of the region between exons B and L revealed significant matches for transcription factors belonging to three different groups: fork head-related activators, SRY-related factors, and AP1 transcription factors (Table VIII). Their binding motifs are all common within the genome. Therefore, the FastM (41) program was used, which allows a model of a putative promoter region to be developed through predicting two binding sites, their strand orientation, their sequential order, and the allowed distance between binding sites. However, nothing distinctive about this collection of transcription factor-binding sites could be discerned.
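The idea of a two-site promoter model (two binding sites with a defined order, relative strand orientation, and allowed spacing) can be illustrated with a minimal sketch. This is not the FastM algorithm or its interface; it is a simplified stand-in that filters a list of precomputed binding-site hits. The class `Hit`, the function `match_model`, the factor labels, and all positions are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One predicted binding site: factor label, start position, and strand."""
    factor: str
    pos: int
    strand: str  # "+" or "-"


def match_model(hits, first, second, min_gap, max_gap, same_strand=True):
    """Return (first, second) hit pairs that satisfy order, spacing, and orientation.

    The gap is measured between start positions; a positive min_gap enforces
    that the `first` site lies upstream of the `second` site.
    """
    pairs = []
    for a in hits:
        if a.factor != first:
            continue
        for b in hits:
            if b.factor != second:
                continue
            gap = b.pos - a.pos
            if min_gap <= gap <= max_gap and (a.strand == b.strand) == same_strand:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    hits = [Hit("FKH", 10, "+"), Hit("SRY", 40, "+"),
            Hit("SRY", 300, "+"), Hit("SRY", 50, "-")]
    for a, b in match_model(hits, "FKH", "SRY", min_gap=10, max_gap=100):
        print(a, b)
```

Because individual motifs of this kind occur frequently in the genome, requiring a defined pair arrangement reduces the number of candidate regions considerably, which is the motivation for using such a model rather than single-site matches.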
The sequence between exon C of hs6M1-16 and exon L of hs6M1-18, -21, and -27 (Fig. 2) was then compared against other regions of the human genome, using the BLAST program, to see whether this is a unique sequence or whether it exists in other OR clusters. The analysis revealed that the first half (positions 126-247; Fig. 2) of this sequence was unique. However, the second half (positions 248-346) of the sequence was found to be similar to several other regions within the genome. One of these similar sequences (with a 69% shared base pair identity) was located in an OR cluster on chromosome 11q12.2, but because this region of the genome is currently unfinished, further work is needed to confirm whether this shared sequence is located in a similar regulatory region in the chromosome 11q12.2 cluster.
To further investigate the putative promoter region between hs6M1-16 and hs6M1-18, -21, and -27, a functional analysis of the region was carried out. This involved cloning the candidate promoter region (positions 126-346; Fig. 2) into the luciferase reporter vector pGL3 in both the forward and reverse orientation and transfecting the resulting constructs and the respective control vectors into two cell types. Because a suitable cell line derived from a seminoma was unavailable, two others were employed. Firstly, Odora cells, derived from rat OSN where OR could be expected to be expressed, were transfected. Because hs6M1-27 was also found to be expressed in the kidney (results not shown), human embryonic kidney cells (HEK293) were also used. After transfection, both sets of cells were assayed for luminescence. In both cases, positive controls revealed strong signals, but no promoter activity was found for the putative promoter region. In the case of OLFOP(F) (the region of interest in the forward orientation), cell luminescence was in several independent experiments below that observed for cells without constructs, and OLFOP(R) (the region of interest in the reverse orientation) revealed almost no activity (Table IX).
In addition, two promoter prediction programs were employed to analyze the major and the minor HLA-linked OR clusters, but no regulatory regions were predicted for any of the OR genes (Fig. 7), although promoters of several other genes were predicted accurately.
DISCUSSION |
---|
Testicular Expression of HLA-linked OR Genes. Over the last 10 years, more and more evidence has accumulated suggesting that OR are not exclusively expressed in OSNs (44). This has led to the hypothesis that OR also receive and transduce signals within numerous nonolfactory tissues (15). OR could be involved in various aspects of sperm biology (12, 13, 16, 17, 18, 21, 45) or in detecting or creating area codes that are important in embryogenesis (15, 46). The huge number of different OR species appears exceptionally well suited to meet these requirements, and the detection of OR transcripts in embryonic (47) as well as many adult tissues (12, 48) provides further supporting evidence. In this study, we demonstrate for the first time that HLA-linked OR genes are also expressed in testis.
Only a relatively small fraction (5%) of the entire repertoire of non-MHC-linked mammalian OR is known to be transcribed in testis (12, 16, 17, 45). Interestingly, nearly all of these testis-expressed OR are potentially functional, amounting to ~15% of the human OR with open reading frames (2). In contrast, more than 85% of the analyzed HLA-linked OR genes with an open reading frame are expressed in this tissue. This discrepancy might in part be explained by sensitivity differences of the methods used. Additionally, interindividual differences in the expression of single OR genes could be responsible as well, because the respective studies relied on independent tissue donors. Therefore, it remains to be seen whether the observed expression differences between HLA-linked and non-HLA-linked OR are functionally relevant. The functional significance of testicular OR expression has been questioned repeatedly (49), in particular in connection with the seemingly "promiscuous" expression of various genes that serve no obvious function within the testis. However, several lines of evidence argue in favor of the functionality of OR in testis: (i) promiscuous gene expression is also a feature of certain thymic medullary cells, where it serves a purpose within negative T-cell selection (50); (ii) certain OR have been detected on spermatozoa (13), proving testicular translation of OR transcripts; (iii) the expression levels of OR genes within the testis appear to be comparable with those in the main olfactory epithelium (7)⁴; and (iv) the G-protein used by OR for signal transduction is expressed in testis (45), and spermatozoa seem to be endowed also with proteins involved in olfactory desensitization (51), implying that testicular OR expression is a meaningful event.
Specific ligands are known only for very few OR (52, 53, 54, 55), and these odorants are without exception volatile, favoring their interaction with OR residing on cells of the main olfactory epithelium. However, because OR also play a major role in axonal targeting of olfactory neurons to specific glomeruli in the olfactory bulb (19, 56, 57, 58), it may well be that OR interact with nonvolatile molecules as well, e.g. soluble molecules. This might be an essential requirement for the directed movement of spermatozoa (12, 13, 16, 18, 21, 45, 59). Therefore, OR may be considered as likely further examples for multi-functional proteins, such as MHC class I molecules that fulfill completely different roles within the immune system and particular areas of the brain (60).
Transcripts and Genomic Organization of OR Genes. In most of the analyzed cases, olfactory neurons of the main olfactory epithelium express only one OR gene/cell in a monoallelic fashion (22, 61). In other tissues, the mode of expression is unknown. To analyze the underlying regulatory mechanism, exact knowledge of the transcriptional start is a minimal prerequisite. In testis, the transcripts of six genes (the HLA-linked hs6M1-12, -16, -18, -21, and -27 and the non-HLA-linked hs7M1-2) contain 5'-untranslated exons, whereas seven genes (hs6M1-1, -3, -6, -10, -17, and -20 and the HLA-unlinked hs17M1-20) are characterized by transcripts without 5'-UTR exons.³
Interestingly, all but one (hs6M1-12) of the analyzed HLA-linked OR genes with 5'-untranslated exons are clustered within a region of 110 kb (Figs. 2 and 7). The exception, hs6M1-12, shows a pronounced similarity to hs6M1-16, an OR gene within the 110-kb region cluster (62). This similarity is most prominent in the coding region (>90%) but extends ~5 kb upstream, suggesting a common ancestor for both genes. Nevertheless, no homologous 5' exons are found for hs6M1-12 and -16. The only 5' exon found for hs6M1-12 resides in a region that has probably been inserted after duplication of an ancestral OR gene. An analogous situation can be observed for the seven upstream exons of hs6M1-16, which are located mostly in regions with, at best, low similarity to hs6M1-12. This implies that both genes are characterized by unique regulatory elements.
Although the analysis of the human expressed sequence tag data base TIGR has shown that one-third of all genes exhibit alternative splicing and 80% are spliced in the 5'-UTR (63), the complexity found for the 5'-UTR of hs6M1-16 is remarkable and, to our knowledge, unprecedented for OR loci (Fig. 3). Five of the seven upstream exons and the coding exon itself are used as transcriptional starts, suggesting the existence of six different promoters. Taking into account that OR could be expressed in different tissues, it does not appear surprising that different promoters are needed for flexible transcriptional regulation. This is exemplified by FGF1 (64) and other genes.
The OR genes hs6M1-18, -27, and -21 give rise to very unorthodox transcripts. Most of these share a common first 5' exon, suggesting a conjoint regulation by the same promoter. This seems to resemble the situation of the two human zinc finger genes PEG3 and ZIM2 (65); seven of the 11 exons of ZIM2 are located upstream of and shared with PEG3. The distances between the respective zinc finger and OR genes are remarkably similar (30 kb), but there are two major differences: In addition to this first shared exon, transcription of hs6M1-18 and -27 starts at further sites (exons N and R; Fig. 2),³ and the analysis of additional tissues and transcripts may uncover even more exons and transcription start sites. Furthermore, the introns of hs6M1-18, -27, and -21 contain three OR genes (hs6M1-17, -19P, and -20) in opposite transcriptional orientation, of which hs6M1-17 and -20 were demonstrated to be transcribed in testis. As a consequence, the mRNA for the most telomeric gene of this transcriptional unit (hs6M1-21) is expected to have lost numerous untranslated as well as five complete OR coding exons (those of hs6M1-17, -18, -19P, -20, and -27), in total more than 100 kb (Fig. 2). The large intron sizes are also highly unusual, because other OR genes exhibit distances from 1.3 to 11.1 kb between the first 5' noncoding exon and the coding region (7, 11, 26, 27, 28). hs6M1-32, a member of the telomeric subcluster of HLA-linked OR genes (Fig. 7), is another notable exception to the rule; its 5'-UTR is ~64 kb in length and, like hs6M1-18, -27, and -21, it also splices around other OR genes (hs6M1-10 and -33P) (6).
Numerous hs6M1-16, -18, and -21 transcripts exhibit premature polyadenylation at defined sites within the coding region without detectable poly(A) signals. They match genomic sequence without any frameshifts or nonsense mutations. Because the same premature polyadenylation sites are present within freshly prepared testis cDNA and A/T-rich sequence stretches are missing at the respective places, cDNA library artifacts can be excluded as explanation. Most other genes that use more than one polyadenylation site possess the common polyadenylation signal (AAUAAA) only at the most 3' site (66), because this strong signal would otherwise repress the production of full-length transcripts. This configuration is also exhibited by hs6M1-16 and -27, where only the most distal exon contains the consensus polyadenylation signal. In general, splicing in the 3'-flanking region seems to be a rare phenomenon, which is in line with an unspliced 3'-UTR of hs6M1-18 and -27. On the other hand, the 3' exons of hs6M1-16 (Fig. 5) demonstrate for the first time that OR genes or G-protein-coupled receptors exhibit a spliced 3'-UTR.
Only 81 bp upstream of the shared first exon (L) of hs6M1-18, -27, and -21, transcription of some variants of hs6M1-16 initiates in the opposite direction (Fig. 2), suggesting that this small region contains a bidirectional promoter. Examples for bidirectional promoters in the human genome are common (67), but no bidirectional promoter controlling more than two genes has been described so far. This configuration could also explain why more than one OR gene has rarely been found to be expressed in a given OSN (68). However, neither an extensive in silico analysis, including the search for transcription factor-binding sites (32) and sequence similarities between OR clusters, nor experimental efforts using reporter gene constructs prove the existence of such a shared promoter region (Fig. 2 and Table IX). The analysis of the major and the minor HLA-linked OR clusters using two promoter prediction programs, Eponine and Promoter Inspector, also provides no hint for the existence of "classical" promoters (Fig. 7). Nevertheless, a shared promoter for hs6M1-16, -18, -21, and -27 may exist, but its presence remains elusive.
This lack of convincing promoter motifs has also been observed in studies of other OR gene clusters, both in human and mouse (7, 26, 27, 28). The central question of how an individual cell "decides" which OR gene to transcribe and how to suppress transcription of all others remains thus unsolved. Three mechanisms have been invoked for some years to account for this remarkable selectivity (reviewed by Ref. 69): (i) genomic rearrangements similar to those employed by B- and T-cells to generate immunoglobulins and T-cell receptors, respectively; (ii) gene-specific assemblies of transcription factors acting on gene-specific regulatory motifs; and (iii) involvement of "locus control regions" regulating the transcription of OR genes within clusters. There is either no evidence for such mechanisms (in particular, gene rearrangements; see also Ref. 28), or they present considerable conceptual difficulties together with lack of evidence (transcription factor assemblies and locus control regions). However, the concept of only a single OR transcription complex that stably associates with a single OR control region (Ref. 27; reviewed in Ref. 70) within a given OSN would provide a simple solution for the complex problem of gene- and allele-specific transcription.
The published OR coding regions of higher eukaryotic species such as lamprey, teleosts, and mammals are all intronless, whereas introns within the coding region have been described for Caenorhabditis elegans and Drosophila melanogaster (71). In this respect, HLA-linked OR seem to be no exception, if the coding region is analyzed only at the genomic level. However, our experiments reveal for hs6M1-16 and -27 that part of the coding region may be removed by splicing (Figs. 5 and 6). In the case of hs6M1-27, this intracoding region splicing leads to premature termination because of a frameshift. The resulting protein would maximally contain the first three TM domains and is probably not functional on its own, because the pocket thought to accommodate a ligand is not present (3). In the case of hs6M1-16, two very similar splicing products (Fig. 5) are observed that could result in an OR without the first two TM domains. This N-terminally truncated OR would start, like intact OR, with an extracellular domain. For chemokine receptors, which are also G-protein-coupled 7-TM receptors, it has been demonstrated that the five distal TM regions may be sufficient for signal transduction (37). Because 10 of the HLA-linked OR genes comprise the respective splice site as well as the alternative start codon at position 79 (6), one may expect a biological function also for the truncated proteins. Hypothetically, differently truncated OR such as hs6M1-27 and -16 might even complement each other functionally. Provided that cells existed that transcribe two or more OR genes simultaneously, the resulting combinatorial potential between different truncated OR fragments could increase the number of receptor specificities considerably, and the presence of numerous OR pseudogenes within the human genome (72) could gain new relevance.
hs6M1-14P, for example, was considered a pseudogene solely because of the missing conventional initiator ATG (6), but again, the alternative start codon at position 79 could give rise to a 5-TM fragment. In addition, OR gene polymorphism is expected to increase the complexity even further (5, 44). Therefore, the accumulation of mutations during primate evolution does not necessarily reduce the number of OR specificities as postulated (73) but might have led to new functional properties of OR. Differential splicing of OR transcripts creating functional species from seemingly nonfunctional genes could be another example of the economical use of resources that has been suggested by the finding of only 30,000 to 40,000 genes instead of some earlier predictions of up to 100,000 genes within the human genome (39, 74).
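The frameshift argument made above for hs6M1-27 can be illustrated with a minimal sketch. It is not part of the analysis reported here: the sequence `TOY_CDS`, the splice coordinates, and the helper names `first_stop_index` and `splice_out` are hypothetical and chosen only for illustration. The sketch shows that removing an intracoding segment whose length is not a multiple of three shifts the reading frame and can expose an early stop codon, whereas removing a multiple of three leaves the downstream frame intact.

```python
# Toy illustration only: the sequence and coordinates are invented,
# not taken from any hs6M1 transcript.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical coding sequence: ATG GCT GAA CTG ATA GCC AAA TAA
TOY_CDS = "ATGGCTGAACTGATAGCCAAATAA"


def first_stop_index(cds):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    for i in range(0, usable, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def splice_out(cds, start, end):
    """Remove cds[start:end] (0-based, end exclusive), mimicking an intracoding splice."""
    return cds[:start] + cds[end:]


if __name__ == "__main__":
    # Unspliced: the only stop codon is the terminal one.
    print(first_stop_index(TOY_CDS))
    # Removing 4 nt (not a multiple of 3) shifts the frame; an early stop appears.
    print(first_stop_index(splice_out(TOY_CDS, 3, 7)))
    # Removing 3 nt keeps the frame; the terminal stop is retained.
    print(first_stop_index(splice_out(TOY_CDS, 3, 6)))
```

In this toy case, the 4-nt deletion moves the first stop from the last codon to the third codon, which corresponds to the kind of premature termination proposed for the spliced hs6M1-27 transcript.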
![]() |
FOOTNOTES |
---|
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
b Supported by Volkswagen-Stiftung Grant I/72 740 (to A. Z. and J. T.).
c These authors contributed equally to this work.
e Supported by a studentship from the Medical Research Council.
g Supported by a Wellcome Trust program grant.
i Supported by the Wellcome Trust.
j Recipients of a Wellcome Trust travel grant.
k To whom correspondence should be addressed. Tel.: 49-30-4505-53502; Fax: 49-30-4505-53953; E-mail: andreas.ziegler{at}charite.de.
1 The abbreviations used are: OR, odorant receptor; HLA, human leucocyte antigen; MHC, major histocompatibility complex; OSN, olfactory sensory neuron; RACE, rapid amplification of cDNA ends; UTR, untranslated region; TM, transmembrane.
2 S. Povey, personal communication.
3 Website address: www.charite.de/immungenetik/ORexpression/exon-table.xls.
4 A. Ziegler, A. Volz, and K. Zatloukal, unpublished results.
![]() |
ACKNOWLEDGMENTS |
---|
![]() |
REFERENCES |
---|