From Stanford Medical Informatics, Stanford University School of Medicine, Stanford, California 94305-5479
ABSTRACT
Karavanich and Anholt (2) first compared the olfactomedin-related proteins from rat, mouse, human, and frog and identified several conserved motifs. The pairwise identities among these sequences ranged from 22 to 98%. Upon identification of the human TIGR protein's homology with olfactomedin, Nguyen et al. (4) confirmed that TIGR possessed several of the motifs identified by Karavanich and Anholt (2). Rozsa et al. (5) extended these evaluations to a slightly more divergent set of proteins, including a rat latrophilin sequence and a human olfactomedin-related sequence.
Subsequently, Kulkarni et al. (6) identified a human olfactomedin gene family of which TIGR/myocilin is one member. The members of this family, designated HOLFA, HOLFB, HOLFC, HOLFD, and HTIGR, are distributed in various tissue types. For example, HOLFB is expressed in pancreatic and prostate tissues, and HOLFC is expressed in cerebellum. In their study, the researchers concentrated their analysis on the olfactomedin domain of the sequences because of extreme differences in sequence lengths among the olfactomedin-related proteins. When the evolutionary relationships between these domains are evaluated, HOLFA and HOLFC appear to be related phylogenetically, as do HOLFB and HOLFD. TIGR/myocilin appears to belong to a separate subgroup. In addition to the human gene family, Kulkarni et al. (6) examined 13 other human and non-human sequences, including latrophilins and several ESTs (Footnote 1).
In this study, we have extended the analysis of the human olfactomedin-related gene family. Despite continued interest in the olfactomedin domain-containing (ODC) proteins, the native function has not been determined for any of these proteins. Sequence elements conserved across long evolutionary times are likely to be essential to the structural or functional success of a protein family (7). To elucidate potential sequence epitopes important to the native functions of the TIGR sequence, we have attempted to identify a larger, more diverse family of related sequences (<30% identical). Based on this larger set, we have identified conserved sequence motifs that may be key to the function or structure of the family. Using these motifs, we have also identified a family of proteins that may be distantly related to the TIGR and olfactomedin proteins.
EXPERIMENTAL PROCEDURES
Secondary Structure Prediction
Several programs are currently available for predicting secondary structure from sequence. Because these algorithms employ different prediction methods, they can produce contradictory results. To maximize the accuracy of our predictions, we used Jpred (9) (jura.ebi.ac.uk:8888/), a server that incorporates information from several programs, including PHD, PREDATOR, and ZPRED. Jpred can predict secondary structure for single sequences or for multiple alignments, thus incorporating evolutionary data in addition to the primary sequence. The server calculates a consensus secondary structure from the individual predictions, thereby increasing the accuracy.
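The idea of a consensus prediction can be illustrated with a minimal sketch. It assumes a simple per-residue majority vote over equal-length prediction strings (H for helix, E for strand, - for coil) and resolves ties to coil; both choices are assumptions made for illustration, and the function name, tie rule, and toy inputs are hypothetical rather than a description of how the Jpred server actually combines its component programs.

```python
from collections import Counter


def consensus_secondary_structure(predictions):
    """Majority vote per position across equal-length prediction strings.

    Each string uses H (helix), E (strand), and - (coil). Ties fall back to
    coil, which is an illustrative choice and not necessarily Jpred's rule.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for states in zip(*predictions):
        counts = Counter(states).most_common()
        top_state, top_count = counts[0]
        # A tie exists when the runner-up has the same count as the leader.
        tied = len(counts) > 1 and counts[1][1] == top_count
        consensus.append("-" if tied else top_state)
    return "".join(consensus)


# Hypothetical outputs from three predictors for a 10-residue stretch.
toy_predictions = [
    "HHHHH--EEE",
    "HHHH---EEE",
    "-HHHH-EEE-",
]
toy_consensus = consensus_secondary_structure(toy_predictions)
print(toy_consensus)
```

A vote of this kind smooths out disagreements at the ends of helices and strands, which is where the individual methods tend to differ most.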
The Jpred server can perform a multiple sequence alignment on a set of unaligned sequences and provide predictions based on that alignment. Alternatively, a set of aligned sequences may be submitted for secondary structure prediction. Multiple alignments are sensitive to the composition of the sequences included in the alignment (7). To evaluate our hypothesis that the ODC sequences are related in the N-terminal region and, hence, have similar structures, several separate predictions were made. All of the sequences containing a domain N-terminal to the olfactomedin domain were aligned using ClustalW (10). The olfactomedin, TIGR, and latrophilin groups were each submitted separately to Jpred. The remaining sequences were submitted individually. The similarity among the sequences in the C-terminal region of TIGR (i.e. the olfactomedin domain) is significantly greater than that of the rest of the molecule. Thus, the entire set of C-terminal sequences was submitted together for prediction of the secondary structure of this region.
Conservations in the C-terminal Domain
The sequences identified by Kulkarni et al. (6) and the new sequences identified in our analysis have olfactomedin domains ranging in length from 29 residues for the shortest EST sequence to 272 residues for the longest protein. To identify conserved residues within the olfactomedin domain, we partitioned the domain into nine segments, delineated primarily by multiple-residue insertions in one or more sequences that result in a column of gaps in the aligned sequences. For each segment, redundant members (>75% identity) were removed prior to identification of conserved residues using the JalView sequence-editing program (circinus.ebi.ac.uk:6543/jalview/). The human TIGR protein was added back to each segment for comparison after the analysis was complete.
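The redundancy filter described above can be sketched as follows. This is only an illustration under stated assumptions: identity is computed over alignment columns in which both sequences have a residue, and sequences are filtered greedily in input order. The actual filtering was done in JalView, whose exact procedure is not described here; the function names and the sequence SEQ_C are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def remove_redundant(aligned, threshold=75.0):
    """Greedy filter: drop a sequence if it is > threshold identical to a kept one."""
    kept = {}
    for name, seq in aligned.items():
        if all(percent_identity(seq, other) <= threshold for other in kept.values()):
            kept[name] = seq
    return kept


# Toy segment; the names and SEQ_C are made up for illustration.
toy_segment = {
    "SEQ_A": "HQSGAWCKDPL",
    "SEQ_B": "HQSGAWCKDPL",  # identical to SEQ_A, so it is dropped
    "SEQ_C": "HRTG-WMRDPL",
}
nonredundant = remove_redundant(toy_segment)
print(list(nonredundant))
```

Because the filter is greedy, the outcome depends on input order; a reference sequence that must always be present (as HTIGR is here) is more simply added back after filtering.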
RESULTS
Consistent with the conclusions of Karavanich and Anholt (2) and Kulkarni et al. (6), the C-terminal region is well conserved and appears to have evolved slowly, predominantly via point mutations; the N-terminal region, on the other hand, seems to have evolved more quickly. Because of the different apparent rates of evolution between the ends of the molecule, we have analyzed these two regions separately.
N-terminal Region
Fig. 2 shows a ClustalW alignment of a representative set of the N-terminal regions of the sequences of Kulkarni et al. (6), olfactomedins, TIGR orthologs, and all of the sequences we have identified in this work containing an N-terminal region. In this region, the similarity among the JOLHN, MKIDN, HOLFE, HOLFF, and HOLFC sequences and the remaining olfactomedin and TIGR sequences is evident. The GXCXXT consensus motif is clear in these sequences, as well as in the TIGR and olfactomedin sequences. Also, several of the sequences possess a residual leucine-zipper (LZ) or a leucine-rich (LR) region in their N terminus. Fig. 2 also shows the obvious disparity between the latrophilins (represented by BLAT1), DOLFA, and WOLF2 and the remaining sequences. These proteins lack the GXCXXT motif, as well as a residual LR/LZ region.
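A consensus such as GXCXXT, where X stands for any residue, maps directly onto a regular expression. The sketch below shows one way to scan an unaligned sequence for it; the helper name and the toy sequence are hypothetical and are not taken from the alignment in Fig. 2.

```python
import re

# X in the consensus stands for any residue, so GXCXXT becomes G.C..T.
GXCXXT = re.compile(r"G.C..T")


def find_motif(sequence, pattern=GXCXXT):
    """Return (1-based start position, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]


# Made-up sequence used only to exercise the pattern.
toy_sequence = "MKLAGSCQWTLLE"
hits = find_motif(toy_sequence)
print(hits)
```

A sequence lacking the motif simply returns an empty list, which is the behavior one would expect for the latrophilin-like N-terminal regions described above.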
In our pursuit of distantly related sequences that might be part of a TIGR/olfactomedin superfamily, we searched for homology to the N-terminal region of the sequences shown in Fig. 2. Our congruence analysis using Shotgun did not identify any distantly related sequences with significant similarity to the set other than their LR/LZ regions. In fact, the list of Shotgun hits consisted primarily of sequences with LR or LZ regions. The hits included proteins such as myosin, plectin, kinesin, tropomyosin, and several other coiled-coil sequences. Previous work has shown that this similarity does not appear to be a sufficient basis to infer an evolutionary relationship (1).
In distantly related proteins, structure is often more highly conserved than sequence (11). Although proteins sharing 30% sequence identity are expected to exhibit the same fold structure, many proteins with statistically insignificant sequence similarity may also have similar folds (12, 13). Because the coiled-coil structure and LR/LZ domain appeared to be the basis for the similarity between our query sequences and their Shotgun hits, we decided to evaluate our sequences based on their structures. Three-dimensional structural information is not available for any of these sequences; hence we compared the predicted secondary structures for the proteins.
Because the composition of a set of sequences can significantly affect a multiple alignment and, hence, the resultant secondary structure prediction, several separate predictions were made for the N-terminal region. The TIGR-like sequences, the olfactomedin-like sequences, and the latrophilin sequences were each submitted separately for secondary structure predictions. Fig. 2 shows a representative sequence from each prediction with the secondary structure elements, helices and β-strands, indicated for each group. The sequences that did not clearly fall into one of these groups were also submitted individually to confirm their predictions were consistent with the larger group.
The predicted secondary structures confirm the results of the sequence alignments; DOLFA, WOLF2, and the latrophilins appear to be part of groups separate from that of the olfactomedin- and TIGR-related sequences. All three sequences (DOLFA, WOLF2, and BLAT1) are primarily sheet-like in the N-terminal region. The TIGR- and olfactomedin-related sequences are all primarily helical. Although there are differences in the initiation and termination of the helices and sheets in the separate predictions, the predictions are quite similar for the TIGR- and olfactomedin-related sequences. JOLHN, HOLFE, and HOLFF are also predicted to be predominantly helical in the N-terminal region.
C-terminal Region
Secondary Structure of the C-terminal Region
We also used Jpred to predict the secondary structure of the C-terminal region. The entire family of ODC sequences aligned by ClustalW was submitted to the Jpred server. Each program predicted that the majority of this domain is composed of β-sheets, with only a few very short helical regions as shown in Fig. 3 for the HTIGR sequence. The figure also includes the N-terminal region of TIGR to highlight the explicit difference in the predicted structures of the two domains.
As discussed previously, we divided the olfactomedin domain into nine segments for our analysis and then removed redundant sequences to <75% identity. The conserved motifs within each of the nine segments are shown in Fig. 4. Segments 2 and 4 are not shown, because they did not contain any residues strongly conserved by identity or by type. Some of the motifs shown in Fig. 4 may appear to include sequences with higher than 75% identity. The <75% sequence identity comparison and the identification of conserved residues were performed on the entire segment, rather than just the conserved portions shown in the figure. For example, in region 1, BLAT1 and BLAT3 are 100% identical (HQSGAWCKDPL); however, the surrounding sequences in segment 1 are more divergent (<75% identical). The exception to this rule is the HTIGR sequence. To maintain a consistent reference sequence throughout the comparison, and because we were interested primarily in the relationship of the other family members to the HTIGR sequence, it was added back to the alignments after the conservation analyses were completed.
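The distinction between residues conserved by identity and residues conserved by type can be sketched column by column. The residue classes below are an assumption made for illustration (the grouping used in the analysis is not specified in the text), and the function names and toy alignment are hypothetical.

```python
# Hypothetical residue classes; the grouping used in the paper is not specified.
RESIDUE_CLASSES = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYGPH"),
    "acidic": set("DE"),
    "basic": set("KR"),
}


def classify_column(column, gap="-"):
    """Label an alignment column 'identical', 'type', or 'variable'."""
    residues = set(column)
    if gap in residues:
        return "variable"
    if len(residues) == 1:
        return "identical"
    # Conserved by type: every residue falls in a single class.
    for members in RESIDUE_CLASSES.values():
        if residues <= members:
            return "type"
    return "variable"


def conservation_profile(aligned_sequences):
    """Classify every column of a list of equal-length aligned sequences."""
    return [classify_column(col) for col in zip(*aligned_sequences)]


# Toy three-sequence alignment, five columns wide.
toy_alignment = ["DKLGW", "DRIGW", "DKV-W"]
profile = conservation_profile(toy_alignment)
print(profile)
```

Under this scheme a single divergent sequence is enough to demote a column from "identical", so one aberrant entry can remove full conservation from an otherwise well-conserved motif.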
One of the most extensive conserved motifs does not contain any fully conserved residues. The area containing the cysteine shown to be involved in the formation of oligomers in olfactomedin and believed to be involved in dimerization in TIGR (Cys-433; region 7) has no fully conserved residues. This surprising result is primarily because of the presence of two sequences, HEST4 and ZEST2. The ZEST2 sequence is consistent with the conserved motif, with the exception of an arginine in place of the conserved cysteine. Although the HEST4 sequence possesses a cysteine in the conserved position, the surrounding residues do not match the motif. Both HEST4 and ZEST2 are EST sequences and contain nonsense codons upstream of this motif. Future studies should investigate the accuracy of these sequences in this region.
Comparison of HTIGR Mutations and Conserved Residues
Presumably, sequence elements essential to the function or structure of a family of evolutionarily related proteins should be more highly conserved than less essential elements. When functional or structural information is available for a family of proteins, differences among the proteins can be useful for determining which conserved residues are key elements in its function or structure (3). Partially conserved or divergent residues, which impart differences in function or structure, may also be identified.
Because the normal function and three-dimensional structure of the TIGR and olfactomedin protein families have not been elucidated, another source of information must be found. Rozsa et al. (5) evaluated the effects of mutations on the predicted secondary structure in several TIGR/olfactomedin-related proteins (HTIGR, MTIGR, FOLFA, ROLFA, HOLFA, and RCL2B). The predicted effects of these mutations on the secondary structure of human TIGR provide some insight into the pathogenesis of glaucoma. Likewise, our previous work examined mutations in light of the evolutionary relationship between the TIGR (HTIGR, MTIGR, RTIGR, and BTIGR) and olfactomedin (COLFA, ROLFA, FOLFA, HOLFC, and MOLFA) sequences. Evaluating HTIGR mutations in terms of the conserved sequence elements across a more diverse set of sequences should provide additional understanding of the structural and functional characteristics of the ODC family. The comparison of disease-associated mutations versus non-disease-associated polymorphisms in terms of their degree of conservation as the family has evolved may help to elucidate those residues/motifs that are functionally or structurally essential.
We compared 11 polymorphisms and 27 mutations catalogued by several researchers (5, 14–23) to the fully and partially conserved residues in our alignment of the ODC C-terminal domains. Table II provides a summary of each segment of the alignment, including the fully conserved residues within the region, each variant in the region, whether the variant has been shown to be associated with glaucoma, and whether the wild-type residue is conserved (partially or fully) within our family.
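A comparison of this kind amounts to parsing each variant label into its position and looking that position up in a conservation table. The sketch below assumes variants are written in the usual one-letter form (wild-type residue, position, substituted residue); the function names are hypothetical, and the conservation labels in the toy table are placeholders for illustration, not the contents of Table II.

```python
import re

# One-letter wild-type residue, position, one-letter substituted residue.
VARIANT = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variant(label):
    """Split a label such as 'C433R' into ('C', 433, 'R')."""
    match = VARIANT.match(label)
    if match is None:
        raise ValueError(f"unrecognized variant label: {label!r}")
    wild_type, position, substituted = match.groups()
    return wild_type, int(position), substituted


def annotate_variants(variants, conservation):
    """Pair each variant with the conservation label of its position."""
    rows = []
    for label in variants:
        _, position, _ = parse_variant(label)
        rows.append((label, position, conservation.get(position, "not conserved")))
    return rows


# Placeholder conservation labels keyed by HTIGR residue number.
toy_conservation = {433: "partial", 286: "partial"}
rows = annotate_variants(["C433R", "W286R", "T293K"], toy_conservation)
for row in rows:
    print(row)
```

Keeping disease association as a separate column alongside the conservation label would allow mutations and polymorphisms to be tallied by conservation class.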
DISCUSSION
Analysis of the N-terminal Region
The addition of the new sequences to our analysis has provided additional support that the olfactomedin and TIGR proteins are related throughout the length of the molecule rather than only in the olfactomedin domain in the C terminus. In Fig. 2, the GXCXXT motif and the leucine-rich/leucine-zipper regions of the sequences are highlighted. All of the olfactomedin- and TIGR-related sequences possess a residual leucine zipper aligned with the leucine zipper of TIGR. The additional sequences have also helped to adjust our previous alignments (1). When we align only the TIGR and olfactomedin sequences, the N-terminal region includes an (A/V)LEE(E/Y)K motif spanning residues 151 through 156 in HTIGR and 135 through 140 in MOLFA. With the addition of the more divergent sequences, that region appears to be only partially conserved among the HTIGR, COLFA, and HOLFC proteins and is significantly more divergent among the other sequences.
In addition to the support from our sequence analysis, structural analysis of the N-terminal regions further indicates that these sequences are related. As discussed above, our Shotgun analysis revealed mostly leucine-rich/leucine-zipper sequence hits such as myosin, kinesin, and other coiled-coil proteins. Coupled with the identification of a residual leucine zipper in the N-terminal region of the olfactomedins, the analysis suggested that conserved elements might be evident in the secondary structure rather than the primary structure. Because the evolutionary relationship between the TIGR and olfactomedin-like sequences appears to be distant, we completed predictions for each set separately, as well as in combination with the other sequences that do not clearly fit into either category. Our secondary structure predictions confirm that the helical nature of the N-terminal region is conserved in the olfactomedin-like and TIGR-like sequences. As shown in Fig. 2, the predicted structure for both groups is predominantly helical with a limited strand region at the very beginning of the sequence.
The latrophilins, on the other hand, are mostly strand-like in the N-terminal region. These proteins are transmembrane G protein-coupled receptors with large extracellular N-terminal regions containing the olfactomedin domain. The family is also characterized by several splice variants. It is likely that the olfactomedin domain in these proteins is the only domain related to the olfactomedins. The drastically different β-strand nature of the region preceding the olfactomedin domain would appear to support this conclusion.
Rozsa et al. (5) used the Chou-Fasman (24) and Garnier-Osguthorpe-Robson (25) algorithms to predict the secondary structure of the wild-type and allelic variants of TIGR with glaucoma-associated mutations. Although both the Chou-Fasman and Garnier-Osguthorpe-Robson algorithms predicted the N-terminal region of TIGR to be a mixture of helical and strand regions, it is predominantly helical. Likewise, although their C-terminal prediction contains both strands and helices, the composition is primarily strands. Several of the programs used by the Jpred server for the prediction of secondary structure have improved significantly on the accuracy of predictions over those obtained with the Chou-Fasman or Garnier-Osguthorpe-Robson algorithms. The Jpred prediction, a consensus of multiple secondary structure predictions, improves the average prediction accuracy to 72.9% (9). Based on these improvements, the differences in the prediction of the secondary structure of TIGR between the work of Rozsa et al. (5) and our present work are not surprising.
In our previous work (1), we hypothesized that the exon 2 region of TIGR was the result of an insertion event into a distant olfactomedin relative. Our current analysis has revealed two additional pieces of evidence supporting this hypothesis. The JOLHN sequence is the first non-olfactomedin sequence we have identified that is clearly related, yet significantly divergent from TIGR in the N-terminal region (32% identity). JOLHN aligns with the exon 1 region of TIGR through the end of the leucine zipper. After a large gap (27 amino acids), the alignment resumes in the middle of exon 2. Additionally, alignment with the divergent set of olfactomedins (COLFA, HEST3, HOLFC, FOLFA, and HOLFF) also results in a large gap (30 amino acids) after the end of the leucine zipper. The alignment resumes for a short region in the middle of exon 2 suggesting the presence of large insertions in the exon 2 region.
Analysis of the C-terminal Domain
Our analysis of the C-terminal region of the sequences shown in Fig. 1 reveals several significant findings. The predicted secondary structure of the C-terminal regions of the family appears to be completely different from that of the N-terminal region, evidence of the multidomain structure of this family. Multiple alignments of these sequences revealed several conserved motifs that might be used to confirm additional members of this family. In particular, the motifs in segments 1, 5, and 7 and the short (Y/N)N(P/A/S) motif in segment 9 are highly conserved within the family.
Because neither functional nor structural information is available for any of the sequences included in our analysis, we used HTIGR mutation data to evaluate the significance of conserved and non-conserved residues within our collected family of sequences. From our comparison, it is possible to identify residues that may be important to some functional or structural characteristic specific to the TIGR family of proteins, as well as residues that might be essential in the overall structural or functional characteristics of the entire family of sequences. For example, as shown in Table II, Cys-433 is almost completely conserved in segment 7. The C433R mutation has been shown to be associated with glaucoma in seven of 25 unrelated Brazilian patients with juvenile-onset open-angle glaucoma (22) suggesting that some key functional or structural role has been disrupted by this substitution. Interestingly, one of the zebrafish EST sequences (ZEST2) contains an arginine aligning with Cys-433 of HTIGR. The W286R mutation, on the other hand, occurs in a residue conserved only within the TIGR and latrophilin sequences, perhaps indicating an important role for this residue in these two families. Similarly, T293K occurs in the region of an apparent insertion in the TIGR proteins that is absent from the remaining sequences. Table III provides a summary of other conserved and non-conserved HTIGR mutations that may highlight regions that are important to the overall function of the protein.
Another interesting result of our analysis is the identification of two large regions where disease-causing mutations in conserved residues appear to be concentrated. The clustering of mutations in residues 360 to 380 and 477 to 481 may indicate regions in which to focus efforts to further elucidate the function/structure of the ODC proteins.
The Chitinase Family
In addition to the family of ODC sequences, we have identified another family of proteins that may be distantly related. The catalytic TIM-barrel domain of chitinase A shares specific regions of sequence similarity with the ODC family (Fig. 7). The chitinase catalytic motif occurs in the region of a large insertion in the alignment. This likely indicates that the similarity between chitinase and the ODC proteins does not extend to the chitinase enzymatic activity; however, other structural or overall functional characteristics may still be maintained. Among distantly related proteins (<30% identical), conserved residues are likely to be structurally or functionally important to the protein (7). For example, the DXnDXXGXW motif present in the olfactomedin-related family and the chitinase sequences (Fig. 7D) might represent some key functional or structural element within the two families. Additional conserved residues and motifs, also highlighted in Fig. 7, probably represent equally essential residues. Sufficient evidence to specify the significance of the relationship between these families has not been obtained; however, given the limited structural and functional information available for the ODC family, further investigation is warranted.
Although several questions regarding the ODC family remain unanswered, our study has provided important insights into the diversity within this group of related sequences. Although several of the sequences appear to contain only an olfactomedin domain, as is the case for the latrophilins and several of the EST sequences, the relationship extends outside this domain in sequences such as TIGR, the neuronal olfactomedin-like sequences, HOLFE, and HOLFF. Within the olfactomedin domain, we have identified several conserved motifs that will be useful for identifying and confirming future additions to this group of sequences and for investigating the elements necessary to maintain the structural and/or functional success of these molecules. Two regions with an increased frequency of mutations in conserved residues point to potential areas of investigation into the structural/functional operation of the ODC proteins. All of these elements should lead to a clearer understanding of the involvement of human TIGR in the pathogenesis of glaucoma and the normal function(s) of this family of sequences.
ACKNOWLEDGMENTS
FOOTNOTES
Published, MCP Papers in Press, May 21, 2002, DOI 10.1074/mcp.M200023-MCP200
1 The abbreviations used are: EST, expressed sequence tag; ODC, olfactomedin domain-containing; LR, leucine-rich; LZ, leucine-zipper.
2 The following sequences were used to search for distant homologs: HTIGR, NCBI accession number NP_000252; BTIGR, NCBI accession number BAA82152; MTIGR, NCBI accession number NP_034995; RTIGR, NCBI accession number BAA34199; COLFA, NCBI accession number AAF40413; FOLFA, NCBI accession number Q07081; MOLFA, NCBI accession number AAB84058; HOLFA, NCBI accession number Q99784; ROLFA, NCBI accession number AAC04320; HOLFB, NCBI accession number AA447264; HOLFC, NCBI accession number AAD20056; HOLFD, NCBI accession number W53028; MPAN3, NCBI accession number BAA28767; HKIAA, NCBI accession number BAA74844; HLAT2, NCBI accession number CAA10458; BLAT1, NCBI accession number AAD09191; BLAT3, NCBI accession number AAD05324; PLAT1, NCBI accession number AAC98700; RCL_3, NCBI accession number AAC62664; RCL2B, NCBI accession number AAC62657; WOLF1, NCBI accession number AAB52933; WOLF2, NCBI accession number CAB04088.
3 The following sequences were identified by our searches: DOLFA, NCBI accession number AAF48788; JOLHN, NCBI accession number AV670132 and NCBI accession number AV670616; MKIDN, NCBI accession number AI987584; HOLFE, NCBI accession number XP_001313; HOLFF, NCBI accession number CAC17635; HEST4, NCBI accession number AL562289; HEST3, NCBI accession number BF530912; HL-P4, NCBI accession number T23140; ZEST1, NCBI accession number AW115690; ZEST2, NCBI accession number AW154600. Since these searches were completed, the HOLFE sequence has been deleted from GenPept at the request of the submitter; however, a mouse sequence 86% identical to HOLFE is still present with NCBI accession number AAH05485.
4 Chitinase A, NCBI accession number AAC79665.
* This work was supported in part by NLM, National Institutes of Health Grant NLM 07033 (to M. L. G.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
To whom correspondence should be addressed: Stanford Medical Informatics, Stanford University School of Medicine, 251 Campus B X-215, Stanford, CA 94305-5479. Tel.: 650-736-0156; Fax: 650-725-7944; Email: mgreen{at}smi.stanford.edu.
REFERENCES