*Department of Biology, University of Utah;
and
Departamento de Genética, Universidad de Sevilla, Seville, Spain
Abstract |
---|
Introduction |
---|
The amino acid sequences of lipocalins are quite divergent, and low levels of sequence identity, even below 20%, are found when comparing the overall sequence among some members of the family. In spite of the low level of sequence similarity, the tertiary structures of lipocalins are strongly preserved. The lipocalin folding motif (fig. 1) (Cowan, Newcomer, and Jones 1990; Flower 1995) is an eight-stranded antiparallel β-barrel with an N-terminal 3₁₀-helix and a C-terminal α-helix (A1 and A2 in fig. 1, respectively). The barrel is open at one side and encloses a binding pocket. Another characteristic of the lipocalins is their ability to form oligomers, which range from the dimeric state of many lipocalins, such as odorant-binding proteins (Tegoni et al. 1996), to the complex octamers of crustacyanins (Keen et al. 1991). There are three conserved sequence motifs called structurally conserved regions (SCRs) that have been proposed as a prerequisite for a protein to be considered a lipocalin (Flower, North, and Attwood 1993). Flower, North, and Attwood (1993) propose a separation of kernel versus outlier lipocalins based on the conservation of the SCRs, as well as on the existence of disulfide bridges. The SCRs represent a structural element composed of three loops that are close to each other in the three-dimensional structure and constitute the bottom of the β-barrel. A role for the SCRs as a receptor-binding site has been suggested based on their exposure to the solvent. This proposition, however, remains to be demonstrated. With regard to ligands, a broad set of hydrophobic molecules have been shown to bind to different lipocalins. Some lipocalins have an exquisite specificity for a given ligand, such as the epididymal secretory proteins that bind retinoic acid, but not other retinoids (Newcomer and Ong 1990). Other lipocalins, like β-lactoglobulins, apolipoproteins D (ApoDs), and some chemoreceptor lipocalins, bind a variety of ligands of very different natures (reviewed by Flower 1995).
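As a minimal sketch of what a figure such as "below 20% identity" refers to, the following Python function computes percent identity over the columns of a pairwise alignment in which both sequences carry a residue. The function name, the choice of denominator, and the toy sequences are assumptions made for illustration; the text does not state how identity was computed.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs are rows of the same alignment (equal length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = same = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        same += (a == b)
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * same / compared


# Hypothetical toy alignment, not real lipocalin sequences.
identity = percent_identity("ACDE-FG", "ACQEKF-")
```

Other conventions divide by the full alignment length or by the shorter sequence, so reported identities are only comparable when the denominator is stated.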
In this paper, we report the detection of two novel lipocalins in different organisms and the analysis of the phylogenetic relationships of all the proteins so far inscribed in the lipocalin family.
Materials and Methods |
---|
Only the mature protein sequences were used, either deduced from the known mature N-terminal amino acid or predicted by von Heijne's (1990) method. Similarly, the C-terminal peptide cleaved after the attachment of glycosyl-phosphatidylinositol (GPI) to a particular lipocalin was not considered for the alignment. Partial sequence entries were excluded. The most recent entry was considered when several sequences were available for a given protein. Proteins were named using an abbreviated species name followed by a functional label (see table 1).
Phylogenetic Analyses
Phylogenetic analyses based on protein sequences were carried out using the maximum-likelihood method with the MOLPHY software, version 2.3 (Adachi and Hasegawa 1996). Parsimony analyses were performed with the PROTPARS program of the PHYLIP software package, version 3.5 (Felsenstein 1993).
The number of sequences is the most important limiting factor when an exhaustive phylogenetic analysis is attempted under the maximum-likelihood principle. In our study, the 113 sequences of our sample make an exhaustive tree topology search prohibitive. Therefore, we devised a different strategy. First, we reconstructed a global tree based on the protein sequences given in table 1. The steps for obtaining the global tree were: (1) calculation of a maximum-likelihood distance matrix with the PROTML program under the JTT model (Jones, Taylor, and Thornton 1992) normalized to the amino acid composition of the data set (program options D and jf), (2) a neighbor-joining tree reconstruction (Saitou and Nei 1987) with the program NJDIST using the previous distance matrix as an input, and (3) the use of the resulting tree topology as a seed to search for a topology with a higher likelihood value using the same amino acid substitution model. We used the method of topology search by local rearrangement (option R of the PROTML program) to try to improve the neighbor-joining tree and to calculate a local bootstrap probability (LBP) under 1,000 replicates (Hasegawa and Kishino 1994).
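Step 2 of this procedure can be illustrated with a compact neighbor-joining sketch. This is a plain textbook version of the Saitou and Nei (1987) agglomeration that returns only the topology as nested tuples; it is not the NJDIST implementation, and the function names, data layout, and four-taxon distances are hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return an unrooted topology as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Branch lengths are not computed in this sketch.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence: sum of distances from each node to all others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
             for i in nodes}
        # Join the pair that minimises the neighbor-joining Q criterion.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        new = (a, b)
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))]
                    - d[frozenset((a, b))])
        nodes = [k for k in nodes if k not in (a, b)] + [new]
    return tuple(nodes)


def pairwise(table):
    """Convert {(a, b): distance} into the frozenset-keyed layout used above."""
    return {frozenset(pair): value for pair, value in table.items()}


# Hypothetical additive distances for the tree ((A,B),(C,D)).
toy = pairwise({("A", "B"): 3, ("A", "C"): 9, ("A", "D"): 10,
                ("B", "C"): 10, ("B", "D"): 11, ("C", "D"): 7})
tree = neighbor_joining(["A", "B", "C", "D"], toy)
```

In the strategy described here, a topology of this kind serves only as the seed for the likelihood-based local rearrangement search, so its role is to provide a reasonable starting point and not a final estimate.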
We then divided the global tree into subtrees following two criteria: an LBP greater than 75%, and literature information on different functional groups of lipocalins. The sequences composing each group were aligned de novo using the lipocalin structural gap penalty mask (see Alignments). We also selected for each group the sequence in the most related subtree with the shortest distance to the node joining both groups. This sequence was used as an outgroup for the purpose of tree representation. Two different methods were used to analyze the defined groups, depending on the number of sequences. Exhaustive topology searches were carried out for groups with eight or fewer sequences (including the outgroup). This approach yields the maximum-likelihood tree under the JTT model with data frequencies. The LBP under 1,000 replicates was also calculated for each node. In groups of more than eight sequences, we followed the same strategy as that used in estimating the global tree. A majority-rule threshold of 50% LBP was established for each node in every subtree; unsupported nodes were excluded, and their branches were forced to yield polytomies.
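The last step, forcing unsupported nodes into polytomies, can be sketched as follows. The tree encoding (a leaf is a string, an internal node is a `(support, children)` pair), the function name, and the treatment of a support of exactly 50% as supported are all assumptions of this sketch, since the text gives only the 50% figure.

```python
def collapse_unsupported(node, threshold=50.0):
    """Collapse internal nodes with support below `threshold` into their parent.

    A leaf is a string; an internal node is (support, [children]).
    The children of a collapsed node are attached directly to its parent,
    which turns the parent into a polytomy.
    """
    if isinstance(node, str):
        return node
    support, children = node
    kept = []
    for child in children:
        child = collapse_unsupported(child, threshold)
        if not isinstance(child, str) and child[0] < threshold:
            kept.extend(child[1])  # splice grandchildren into this node
        else:
            kept.append(child)
    return (support, kept)


# Hypothetical subtree; the root carries a nominal support of 100.
subtree = (100.0, [(80.0, ["A", "B"]),
                   (30.0, ["C", (90.0, ["D", "E"])])])
collapsed = collapse_unsupported(subtree)
```

In this toy case the node with 30% support disappears, leaving `C` and the well-supported `(D, E)` pair attached directly to the root alongside `(A, B)`.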
Results and Discussion |
---|
In our searches, we detected a conceptual protein deduced from a cDNA clone obtained at the slug stage of the cellular slime mold Dictyostelium discoideum. The DNA sequence came from the Dictyostelium cDNA project in Japan, but no information is currently available concerning the properties and role of the protein. This putative lipocalin displays significant sequence similarity in the regions aligned with SCRs 1 and 3 of the kernel lipocalins and shows high overall sequence similarity with the prokaryotic lipocalins (fig. 2).
Although not included in this study, we also detected a putative open reading frame (ORF) with a conceptual protein that strongly resembles the α2u-globulins in the rabbit. Interestingly, this ORF is located in the 3' untranslated region of the cytochrome P-450 gene (accession number RABIIA11).
Finally, recent work reports on plant enzymes with sequence similarity to lipocalins (Bugos, Hieber, and Yamamoto 1998). These enzymes, involved in photoprotection of the photosynthetic apparatus, consist of a transit peptide needed for translocation to the thylakoid space of chloroplasts, a cysteine-rich domain, a lipocalin-like polypeptide, and a C-terminal charged region. It is plausible that a fusion of a plant lipocalin to other proteins has occurred during evolution to create the xanthophyll cycle enzymes. The lipocalin α1-microglobulin is in fact an example of a mosaic protein that is initially synthesized fused to a proteinase inhibitor (Kaumeyer, Polazzi, and Kotick 1986) and is cleaved thereafter. However, we would rather be cautious with family ascription until more information is gathered that unequivocally establishes the evolutionary origin of these plant enzymes, either from a lipocalin gene fused to other plant genes or as an unrelated protein converging toward the lipocalin fold. Thus, the xanthophyll cycle enzymes were not included in our study.
Alignments
When aligning the lipocalin amino acid sequences, we used structural criteria to guide the inclusion of alignment gaps. A penalty mask was constructed from the known folding pattern of eight lipocalins whose crystal structures had been resolved. The gap penalty mask, included because of the conserved nature of the lipocalin fold, is essential for the proper alignment of some highly divergent lipocalin sequences.
These structural constraints allow gaps in the loop regions of the lipocalin fold (see fig. 1). There are four main gaps in the lipocalin alignment that are placed at loops L1, L4, L5, and L7. Interestingly, L1, L5, and L7 are located at the open end of the barrel, and they form a loop scaffold strongly implicated in ligand binding and in protein-protein interactions (Flower 1995). The gap at L1 is mainly due to insertions in α1-microglobulins (A1mgs) and α1-acid glycoproteins (a1GPs). Loop L1 is of particular importance for A1mgs, because they covalently bind to IgA and to a still-unknown chromophore via an intermolecular disulfide bridge located in this loop (Calero et al. 1994). A unique insertion present in the retinol-binding proteins (RBPs) accounts for the gap located in L5, a loop involved in the specific interaction of RBP with the protein transthyretin (Sivaprasadarao, Boudjelal, and Findlay 1993). Similarly, the gap in L7 is generated by an insertion in RBPs. Finally, the gap in L4, a loop located at the closed ends of the calyxes of lipocalins, is formed by insertions in β-lactoglobulins (BLs), the prostaglandin D synthases (PGDSs), and the neutrophil lipocalins (NGALs).
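The idea of a structure-based gap penalty mask can be sketched in a few lines: alignment columns that fall in β-strands or helices receive a high gap penalty, and columns in loops receive a low one, so that gaps are steered toward loops such as L1, L4, L5, and L7. The encoding (`S` for structured, `L` for loop), the penalty values, and the function names are hypothetical; the actual mask and alignment program settings are not given in the text.

```python
def gap_penalty_mask(structure, structured_penalty=10.0, loop_penalty=1.0):
    """Per-column gap penalties from a string of 'S' (strand/helix) and 'L' (loop)."""
    return [structured_penalty if s == "S" else loop_penalty
            for s in structure]


def gap_cost(aligned_seq, mask, gap="-"):
    """Sum the mask penalties over every gapped column of one aligned sequence."""
    if len(aligned_seq) != len(mask):
        raise ValueError("sequence and mask must cover the same columns")
    return sum(p for ch, p in zip(aligned_seq, mask) if ch == gap)


# Hypothetical six-column fold: strand, strand, loop, loop, strand, strand.
mask = gap_penalty_mask("SSLLSS")
cost_in_loop = gap_cost("AC--GT", mask)    # both gaps fall in the loop
cost_in_strand = gap_cost("A--CGT", mask)  # one gap falls in a strand
```

An aligner that minimises such a cost prefers the first placement, which mirrors how the mask keeps the conserved barrel strands intact in highly divergent sequences.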
Other alignment regions that deserve comment are the N and C termini of lipocalins. After cleavage of the signal peptide, the N-terminal region is variable in size, ranging from 3 to 20 amino acids up to the beginning of the structurally conserved 3₁₀ helix (A1 in fig. 1). The C termini are more variable than the N termini, with substantial differences even between closely related lipocalins. Moreover, there is a very specific C-terminal region in the grasshopper lipocalin Lazarillo that is responsible for the GPI membrane linkage uniquely exhibited by this lipocalin (Ganfornina, Sánchez, and Bastiani 1995). All of this information agrees with a less conserved nature of the C terminus at the structural level.
Phylogenetic Analysis
Global Tree
The resulting global tree is represented unrooted in figure 3. The seed tree showed a log-likelihood value of -27,015.63 ± 965.09. The final tree had a value of -26,918.6 ± 961.58. This difference reflects the improvement obtained with the local branch rearrangement method. In order to test the reliability of this tree, we repeated the reconstruction method entering the alignment sequences in different orders. No changes in topology or in LBP were obtained. Thirteen lipocalin clades (labeled with Roman numerals) were identified in the global tree. The sequences used in our analysis are listed in table 1 and assigned to each particular clade.
Overall, the global tree segregates the lipocalins into the same family groups previously described in the literature (Flower 1996). However, there are clades, such as clades II and V, that group together proteins lacking a common expression pattern or function. Moreover, some nodes in the tree receive strong support and suggest the existence of monophyletic lipocalin superclades. These superclades are: (1) the group of clades I, II, and III; (2) the group including clades IV and three ungrouped sequences; (3) the node clustering clades VI and VII; and (4) the node that groups clade VIII, clade IX, and the ungrouped horse sequence Ecab.C1p. These superclades are also identified and supported by 200 bootstrap replicates of a maximum-parsimony analysis. A monophyletic group composed of clades VIII–XII is supported with an 83% LBP but is not supported by maximum-parsimony analysis.
Analysis of Subtrees
In analyzing the phylogeny of lipocalins, we want to resolve both the relationships among different lipocalin groups and the relationships among the sequences within each group. For this reason, we ran separate phylogenetic analyses on each clade identified in the global tree. The resulting subtrees are shown in figure 4.
The phylogeny of A1mg (clade VI) clearly reflects orthologous relationships. Both its topology and the branch distances are in good agreement with the species tree of vertebrates. In contrast, other subtrees, such as that of the BLs (clade IV), illustrate a complex history of gene duplications. In some cases, those duplications preceded speciation, as in the case of equine BLs. In other cases, recent duplications appear in a given species (e.g., the dog BLA and BLC, and the cat BLB and BLC). The existence of recurrent duplications in this subtree is supported by the presence of some BL pseudogenes in the goat and cow (Passey and Mackinlay 1995), which have not been included in our analysis. The BL tree reveals an interesting location for the human PP14, a protein secreted by the endometrial and decidual epithelia and abundantly present in uterine and amniotic fluids. PP14 is grouped with the two equine BLBs and does not appear as an ancestral form in the BL tree, as is the case in maximum-parsimony trees, both in the present study (not shown) and in the work by Piotte et al. (1998). The derived position of Hsap.PP14 in our BL subtree suggests a strong divergence from the BL sequence signature. This agrees with the absence of BLs in primate milk and suggests that BLs were recruited for other reproductive roles early in primate evolution. Other lipocalin clades such as RBPs (clade III) or a1GPs (clade XII) mostly show orthologous sequences, although some recent duplications appear in specific species. An interesting duplication is recorded in the RBP subtree involving the retinal protein Purpurin, so far only identified in the chicken. The basal position of Purpurin in the tree suggests two evolutionary possibilities. Either an early lineage-specific duplication of RBPs along with divergence of Purpurin occurred in birds, while the rest of vertebrates lack the retinal protein, or, alternatively, Purpurin is indeed present in other lineages but has not been identified yet.
Two clades (clades II and V) group proteins that are apparently unrelated. Clade V combines the PGDSs, the NGALs, and the QSPs. The amphibian proteins Xlae.cpl1 and Bmar.lip are most probably orthologous PGDSs, given their expression pattern and retinoic acid binding (Achen et al. 1992; Lepperdinger et al. 1996). The ancestral position of the chicken QSPs in the tree, along with the unreported presence of PGDS and NGAL in birds, suggests that QSPs are ancestral forms of PGDS and that they may bear other features common to PGDSs besides stabilizing growth in cell populations (reviewed by Flower 1996). The well-supported monophyletic relationship of the mammalian NGALs and PGDSs and the presence of both protein types in the same species suggest a case of gene cooption after duplication and divergence of an ancestral PGDS-like gene, specifically in the mammalian lineage. In clade II, the arthropodan lipocalins cluster together with the vertebrate ApoDs. The crustacean lipocalins are particularly closely related to ApoD, while the insect lipocalins stand apart as two polytomic clusters separately relating the fat body and epidermis lipocalins on one side and the nervous system lipocalins on the other. The duplications found in the crustacean Homarus gammarus and in the lepidopteran Manduca sexta seem to be recent events in the evolution of these lineages. The unresolved node associating insect lipocalins and the lengthy branches of the arthropodan tree indicate a long and divergent history for these lipocalins. Further analysis of the expression pattern and functional role of the proteins associated in this clade is needed in order to formulate hypotheses about the ancestral role of lipocalins in the metazoan lineage.
Finally, the monophyletic superclade grouping clades VIII–XIII contains proteins found so far only in mammals. These lipocalins have been considered outliers in the family (Flower, North, and Attwood 1993) because of their low amino acid sequence similarity and lack of some of the SCRs. Accordingly, these subtrees show long branch lengths and poorly supported nodes. These groups contain lipocalins that are particularly prone to gene duplications (e.g., MUPs, a2gs). However, attempts to derive further evolutionary implications within each clade are limited by the high sequence divergence and should take into account other characters for phylogenetic reconstruction.
Rooting the Lipocalin Tree
In the previous section, we analyzed the relationships among different lipocalins and lipocalin groups using an unrooted tree. Rooting our global lipocalin tree would help us to assign polarity to character changes and to suggest plausible scenarios for the evolution of specific lipocalin properties. The existence of a single cluster grouping the arthropodan lipocalins should allow an unambiguous rooting for the metazoan lipocalins. However, the discovery of bacterial lipocalins (Bishop et al. 1995; Bishop and Weiner 1996) opens up the possibility of rooting the tree using the clade of prokaryotic lipocalins.
The bacterial lipocalins have been found in a restricted number of species, raising the possibility that these lipocalins originated through horizontal transfer. In order to determine whether the bacterial lipocalins are the result of horizontal transfer, we estimated the G+C contents in the first and third codon positions of gene samples of the bacterial species under study. (The samples were retrieved from the GenBank database.) We then calculated the mean of these indices and the first and third quartiles. A biased G+C in the first and third codon positions would be suggestive of horizontal transfer (Lawrence and Ochman 1997). The results of these calculations (see table 2) show that none of the computed G+C contents of the bacterial lipocalin genes are outside of the expected limits (between the first and third quartiles). Given the predicted time sensitivity of this test (Lawrence and Ochman 1997), our data provide no support for a hypothesis whereby bacterial lipocalins were recently acquired through horizontal transfer.
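The codon-position test described above can be sketched as two small functions: one computing the G+C fraction at a given codon position of an in-frame coding sequence, and one checking whether a gene's value lies between the first and third quartiles of a genomic sample. The function names, the toy sequence, and the sample values are hypothetical, and the quartile convention (the default of Python's `statistics.quantiles`) is an assumption, since the text does not say how the quartiles were computed.

```python
import statistics


def gc_at_position(cds, pos):
    """G+C fraction at codon position `pos` (1, 2, or 3) of an in-frame CDS."""
    if pos not in (1, 2, 3):
        raise ValueError("codon position must be 1, 2, or 3")
    bases = cds.upper()[pos - 1::3]  # every third base, starting at pos
    if not bases:
        raise ValueError("sequence too short")
    return sum(b in "GC" for b in bases) / len(bases)


def within_interquartile(value, sample):
    """True if `value` lies between the first and third quartiles of `sample`."""
    q1, _median, q3 = statistics.quantiles(sample, n=4)
    return q1 <= value <= q3


# Hypothetical three-codon sequence: ATG GCC TAA.
gc1 = gc_at_position("ATGGCCTAA", 1)  # bases A, G, T
gc3 = gc_at_position("ATGGCCTAA", 3)  # bases G, C, A

# Hypothetical third-position G+C values for a sample of genes from one genome.
genome_gc3 = [0.1, 0.3, 0.5, 0.7, 0.9]
typical = within_interquartile(0.5, genome_gc3)
atypical = within_interquartile(0.95, genome_gc3)
```

A gene whose values fall inside the interquartile range at both positions, as reported in table 2 for the bacterial lipocalins, gives no compositional signal of recent acquisition under this criterion.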
The results presented here suggest that the prokaryotic lipocalins have been vertically transmitted and that the common ancestor of these lipocalins with the metazoan lipocalins is very ancient indeed. Therefore, we rooted our global lipocalin tree using the clade containing all of the prokaryotic lipocalins as an outgroup.
A Rooted Tree for Lipocalins
The result of rooting the lipocalin tree with the bacterial lipocalins is shown in figure 6. It is puzzling that the D. discoideum lipocalin appears to be located within the bacterial clade (clade I). The dictyostelid lipocalin might be the result of a process of convergent evolution under bacterial-like selective forces. Alternatively, this could be a technical artifact of phylogenetic reconstruction which could be resolved by further sampling of protoctist lipocalins.
Clades II and III, along with the outgroup lipocalins (clade I), are separated from the rest of the family members by a long branch that indicates a large sequence divergence. The rest of the tree seems to reflect the entire genome duplications and/or individual gene duplication events that are thought to have occurred during vertebrate evolution (Holland et al. 1994). Clades IV and V represent the result of a duplication from an ancestral gene that was probably closer in sequence to the PGDSs. One of the duplicated genes gave rise to the exclusively mammalian BL clade, even more derived and rich in recent gene duplications (see above and fig. 4). A similar evolutionary pathway can be suggested for the related A1mgs and the C8GCs (clades VI and VII), given the existence of A1mgs in primitive chordates.
The chemoreceptor lipocalins, urinary proteins, and a1GPs (clades VIII–XIII) appear to be a monophyletic group. The origin of these outlier lipocalins poses a number of interesting evolutionary questions. This superclade is highly derived (based on the strong amino acid sequence divergence) and rich in individual gene duplications (e.g., MUPs). It is composed of lipocalins so far found only in marsupials and placental mammals, which suggests a shared common ancestor that was present before the split of these two mammalian groups. The marsupial late-lactation proteins (LLPs) appear to be related to a group of chemoreception lipocalins (in clade XIII), and the possum whey protein Trichosurin (Tvul.Lip) groups with an endometrial equine protein and a salivary dog lipocalin in clade XI. Similar phylogenetic relationships have been reported in a parsimony analysis (Piotte et al. 1998). The sequence relationships revealed by our trees suggest that the outlier lipocalins evolved from the ancient A1mgs (clade VI). This is consistent with the close relationship between a putative odorant-binding protein found in amphibians (Rpip.OP) and the A1mgs (see fig. 3). It is worth noting that there are also common characteristics between the superclade of outlier lipocalins and the BLs (clade IV). They all have similar sequence divergence values (see below), and both the BLs and the marsupial outlier lipocalins are expressed in the mammary glands. These characteristics, however, could be the result of convergence and of independent cooption events for a role in the mammary glands if it holds true that the A1mgs are the ancestors of the outlier lipocalins. An alternative evolutionary hypothesis, although not supported by our protein-sequence-derived trees, is that the outlier lipocalins originated by divergence from an ancestrally duplicated BL. Some of these proteins could have maintained the ancestral role in the mammary glands of marsupials, while the eutherian counterparts were coopted for other functions as ligand-binding proteins secreted to various body fluids. More sampling of lipocalins in these groups is necessary to resolve this issue.
Phylogenetic Partitioning of Lipocalin Properties
An important advantage of rooting the lipocalin family tree is that we can assign polarities to the evolution of particular lipocalin features. We studied the tissue pattern of protein/mRNA expression and the proposed function for each lipocalin clade. No straightforward correspondence is obvious between the tree topology and the physiological roles of different lipocalins. Although there are some clades with distinctive functions (e.g., retinol metabolism in RBPs or cryptic coloration in lepidopteran and crustacean lipocalins), some other functions are carried out by members of different clades. Binding of odorants, for example, has been reported for chemoreception lipocalins (clades VIII–XIII) and for human ApoD from clade II (Zeng et al. 1996), and a broad spectrum of lipocalins are thought to play roles in immune modulation and cell regulation (Flower 1994, 1996). Similarly, the extent of gene duplications within each clade does not show a trend according to the tree relationships. We also studied some biochemical properties of lipocalins, such as oligomerization and glycosylation. None of these characteristics showed a clear phylogenetic pattern.
Anchoring to the membrane surface has been proposed to be an ancestral character for lipocalins (Bishop and Weiner 1996) based on the membrane localization of bacterial lipocalins, the grasshopper protein Lazarillo, and ApoD. These authors suggest a plausible evolutionary scenario that would account for the different modes of membrane association observed in lipocalins. According to this hypothesis, the ancestral N-terminal lipid binding of bacterial lipocalins was followed by the appearance of a hydrophobic loop in eukaryotic lipocalins (L7, between β-strands 7 and 8) that associates them with membrane proteins. This was followed by the acquisition of the C-terminal GPI anchor, a more specialized type of membrane association. The last two types of membrane binding were subsequently lost in the rest of the family. Our phylogenetic study nevertheless suggests a radically different hypothesis. Primitive eukaryotic lipocalins, such as Ddis.Lip, apparently lack hydrophobic loops that could be used for membrane anchoring. The same applies to the lipocalins found in crustaceans (Hgam.CRC1 and CRC2) and in several orders of insects, both primitive (Schistocerca americana) and highly derived (D. melanogaster). Moreover, the generality of the function for the hydrophobic surface loop of ApoD still remains to be proved, since the only demonstrated interaction of loop L7 (Yang et al. 1994) requires a cysteine residue found so far only in human ApoD. This points to a protein-protein interaction of recent invention in the evolution of the ApoD clade and therefore categorizes the ApoD membrane localization as an autapomorphic character for phylogeny reconstruction. Thus, we find no reasons to propose the hydrophobicity of the ApoD loop L7 to be a character present in the ancestor of both chordate and arthropod lineages. In addition, we so far have no evidence for the ancestrality of the GPI membrane anchor within the arthropodan lipocalins, since it has only been found in the grasshopper Lazarillo protein. Currently, the most plausible hypothesis is that the GPI anchor is also an autapomorphic character.
We also explored the sequence divergence in the lipocalin family by calculating the maximum distance (amino acid substitutions/100 residues) observed in each individual clade. We only compared the clades within the subtrees that relate mammalian representatives because of possible biases in the sampling of sequences in different organisms. The values obtained (amino acid substitutions/100 residues) were as follows: clade II, 30; clade III, 17; clade IV, 103; clade V, 43; clade VI, 37; clade X, 189; clade XI, 113; clade XII, 104; clade XIII, 145. These results suggest an increasing sequence divergence in the more derived lipocalins. We did not include clades VII, VIII, and IX because of their scarce organismal representation, but the trend is still clear when the 50 amino acid substitutions calculated for the MUPs (clade VIII) are considered.
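One way to put a number on the trend suggested by these values is a Spearman rank correlation between clade order and maximum distance. The sketch below uses the nine values listed above; treating the clade numeral as a rough proxy for how derived a clade is, the choice of Spearman's coefficient, and the function names are assumptions of this example and not part of the original analysis.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman_rho(x, y):
    """Spearman rank correlation for two equal-length, tie-free sequences."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Clades II, III, IV, V, VI, X, XI, XII, XIII and their maximum distances
# (amino acid substitutions/100 residues), as listed in the text.
clade_number = [2, 3, 4, 5, 6, 10, 11, 12, 13]
max_distance = [30, 17, 103, 43, 37, 189, 113, 104, 145]
rho = spearman_rho(clade_number, max_distance)
```

For these nine values the coefficient comes out at approximately 0.8, which is in line with the stated trend, although nine clades ordered by numeral are too coarse a basis for a formal test.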
Likewise, there seems to be a phylogenetic trend in the degree of contact between the ligands and the internal cavities of lipocalins, which is evidenced when mapping onto the lipocalin tree several parameters that govern the ligand-binding interactions. The binding areas of the internal cavities of Msex.Icy and Pbra.Bbp (clade II) are approximately 1,255 Å², while those of Hsap.Rbp (clade III) and Mmus.MUP6 (clade VIII) decrease to 860 and 500 Å², respectively (Flower 1995). The surface area of the cavity of Btau.OBP (clade X) is estimated to be 390 Å² (Tegoni et al. 1996). These data are consistent with the fact that relatively large ligands bind to the more ancestral clades, while derived lipocalins bind to smaller ligands. The lipocalins of the ancestral clade II bind large ligands, such as the bilins that bind to the lepidopteran lipocalins and ApoD (biliverdin IX shows 920.7 Å² of solvent-accessible surface [SAS]) and the hemin that binds to a bacterial lipocalin (Barker and Manning 1997). The lipocalins of clade III bind ligands of intermediate size, since retinol has an SAS of 588.9 Å². Finally, small ligands are bound by the odorant-binding proteins and MUPs of clades X and VIII (with endogenous ligands showing 407.1 Å² and 326.5 Å² of SAS, respectively). In addition, we studied the ligand-protein contacts using the complementarity factors (Sobolev et al. 1996) in five representative lipocalins with known tertiary structure: Pbra.Bbp (code 1BBP in the Brookhaven data bank), Hsap.Rbp (1RBP), Btau.BLG (1BSO), Btau.OBP (1PBO), and Mmus.MUP6 (1MUP). The complementarity factors calculated for each ligand-protein pair are 0.4 (1BBP), 0.54 (1RBP), 0.34 (1BSO), 0.82 (1PBO), and 0.86 (1MUP). This parameter reflects that the contacts between, for instance, Pbra.Bbp and its bilin ligand are looser than those established between Mmus.MUP6 and the ligand 2-(s-butyl) thiazoline. When all of these results are mapped on the global rooted tree of lipocalins, they suggest a decrease in binding surface area during the evolution of lipocalins. Also, in close relation to this pattern, the number of intramolecular disulfide bridges ranges from the two to three bridges present in the ancestral lipocalins (clades II–IV) to the single bridges, and even absence of disulfide bonds (Tegoni et al. 1996), of more derived lipocalins, which will tend to confer more flexible protein structures.
In summary, a view of the lipocalin family is emerging from our phylogenetic analyses: The more derived lipocalins seem to have evolved a more flexible protein structure and a ligand-binding pocket that binds smaller hydrophobic ligands with more efficiency than the ancestral lipocalins. In addition, this trend is accompanied by a greater rate of sequence divergence within the more derived clades.
Phylogenetic Relationships Within the Calycin Superfamily
Flower, North, and Attwood (1993) and Flower (1993) proposed a protein superfamily called calycins that includes the lipocalins, the avidins, and a complex group of mostly intracellular lipid transporters globally called the fatty-acid-binding proteins (FABPs). The hypothetical phylogenetic relationship of these three protein families was based on their remarkable structural resemblance: each had an antiparallel β-barrel with an internal ligand-binding site. Nonetheless, their low and local sequence similarity makes an overall alignment of these protein families very unreliable. However, lipocalins and FABPs have been aligned including their structural similarity as an important criterion (e.g., Flower, North, and Attwood 1993). We used this alignment to add several members of the FABP family to our data set in order to construct trees that would test whether the FABPs constitute a sister group of the lipocalins. The tree obtained from a maximum-likelihood analysis is shown in figure 7. The FABP representatives form a monophyletic group with a high LBP value. However, the FABP clade does not group with the prokaryotic lipocalins, which would be expected if a common ancestor of the two protein families ever existed. Instead, the FABPs group with the more derived lipocalins (clades IV–XIII). These results provide no good evidence for a sister group relationship and suggest two mutually exclusive hypotheses: (1) that the FABPs evolved from already-existing lipocalins or (2) that the structural resemblance of FABPs and lipocalins reflects a process of evolutionary convergence to optimize a similar ligand-binding function.
Conclusions |
---|
Acknowledgements |
---|
Footnotes |
---|
1 Abbreviations: GPI, glycosyl-phosphatidylinositol; LBP, local bootstrap probability; ORF, open reading frame; RT-PCR, reverse transcriptase polymerase chain reaction; SCR, structurally conserved region.
2 Keywords: lipocalin, calycin, molecular evolution, protein phylogeny.
3 Address for correspondence and reprints: Diego Sánchez, Department of Biology, University of Utah, 257 South 1400 East, Salt Lake City, Utah 84112-0840. E-mail: sanchez{at}bioscience.utah.edu
literature cited |
---|
Achen, M. G., P. J. Harms, T. Thomas, S. J. Richardson, R. E. Wettenhall, and G. Schreiber. 1992. Protein synthesis at the blood-brain barrier. The major protein secreted by amphibian choroid plexus is a lipocalin. J. Biol. Chem. 267:23170–23174.
Adachi, J., and M. Hasegawa. 1996. MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood. Comput. Sci. Monogr. 28:1–150.
Akerstrom, B., and L. Logdberg. 1990. An intriguing member of the lipocalin protein family: alpha-1-microglobulin. Trends Biochem. Sci. 15:240–243.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.
Barker, A., and P. A. Manning. 1997. VlpA of Vibrio cholerae O1: the first bacterial member of the α2-microglobulin lipocalin superfamily. Microbiology 143:1805–1813.
Bishop, R. E., S. S. Penfold, L. S. Frost, J. V. Höltje, and J. H. Weiner. 1995. Stationary phase expression of a novel Escherichia coli outer membrane lipoprotein and its relationship with mammalian apolipoprotein D. J. Biol. Chem. 270:23097–23103.
Bishop, R. E., and J. H. Weiner. 1996. Outlier lipocalins more than peripheral. Trends Biochem. Sci. 21:127.
Bugos, R. C., A. D. Hieber, and H. Y. Yamamoto. 1998. Xanthophyll cycle enzymes are members of the lipocalin family, the first identified from plants. J. Biol. Chem. 273:15321–15324.
Calero, M., J. Escribano, A. Grubb, and E. Méndez. 1994. Location of a novel type of interpolypeptide chain linkage in the human protein HC-IgA complex (HC-IgA) and identification of heterogeneous chromophore associated with the complex. J. Biol. Chem. 269:384–389.
Cavaggioni, A., J. B. C. Findlay, and R. Tirindelli. 1990. Ligand binding characteristics of homologous rat and mouse urinary proteins and pyrazine-binding protein of calf. Comp. Biochem. Physiol. 96B:513–520.
Cowan, S. W., M. E. Newcomer, and T. A. Jones. 1990. Crystallographic refinement of human serum retinol binding protein at 2 Å resolution. Proteins Struct. Funct. Genet. 8:44–61.
Felsenstein, J. 1993. PHYLIP (phylogeny inference package). Distributed by the author, Department of Genetics, University of Washington, Seattle.
Flower, D. R. 1993. Structural relationship of streptavidin to the calycin protein superfamily. FEBS Lett. 333:99–102.
Flower, D. R. 1994. The lipocalin protein family: a role in cell regulation. FEBS Lett. 354:7–11.
Flower, D. R. 1995. Multiple molecular recognition properties of the lipocalin protein family. J. Mol. Recogn. 8:185–195.
Flower, D. R. 1996. The lipocalin protein family: structure and function. Biochem. J. 318:1–14.
Flower, D. R., A. C. T. North, and T. K. Atkwood. 1993. Structure and sequence relationships in the lipocalins and related proteins. Protein Sci. 2:753–761.
Ganfornina, M. D., and D. Sánchez. 1999. Generation of evolutionary novelties by functional shift. Bioessays 21:432–439.
Ganfornina, M. D., D. Sánchez, and M. J. Bastiani. 1995. Lazarillo, a new GPI-linked surface lipocalin, is restricted to a subset of neurons in the grasshopper embryo. Development 121:123–134.
Haefliger, J. A., M. C. Peitsch, D. E. Jenne, and J. Tschopp. 1988. Structural and functional characterisation of C8, a member of the lipocalin protein family. Mol. Immunol. 28:123–131.
Hasegawa, M., and H. Kishino. 1994. Accuracies of the simple methods for estimating the bootstrap probability of a maximum likelihood tree. Mol. Biol. Evol. 11:142–145.
Holden, H. M., W. R. Rypniewski, J. H. Law, and I. Rayment. 1987. The molecular structure of insecticyanin from the tobacco hornworm Manduca sexta L. at 2.6 Å resolution. EMBO J. 6:1565–1570.
Holland, P. W. H., and J. García-Fernández. 1996. Hox genes and chordate evolution. Dev. Biol. 173:382–395.
Holland, P. W. H., J. García-Fernández, N. A. Williams, and A. Sidow. 1994. Gene duplications and the origins of vertebrate development. Dev. Suppl. pp. 125–133.
Igarashi, M., A. Nagata, H. Toh, Y. Urade, and O. Hayaishi. 1992. Structural organization of the gene for prostaglandin D synthase in the rat brain. Proc. Natl. Acad. Sci. USA 89:5376–5380.
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275–282.
Kaumeyer, J. F., J. O. Polazzi, and M. P. Kotick. 1986. The mRNA for a proteinase inhibitor related to the HI-30 domain of inter-alpha-trypsin inhibitor also encodes alpha-1-microglobulin (protein HC). Nucleic Acids Res. 14:7839–7850.
Keen, J. N., I. Caceres, E. E. Eliopoulos, P. F. Zagalsky, and J. B. C. Findlay. 1991. Complete sequence and model for the A2 subunit of the carotenoid pigment complex, crustacyanin. Eur. J. Biochem. 197:407–417.
Kjeldsen, L., A. H. Johnsen, H. Sengelov, and N. Borregaard. 1993. Isolation and primary structure of NGAL, a novel protein associated with human neutrophil gelatinase. J. Biol. Chem. 268:10425–10432.
Kremer, J. M. H., J. Wilting, and L. H. M. Janssen. 1988. Drug binding to human alpha-1-acid glycoprotein in health and disease. Pharmacol. Rev. 40:1–47.
Lawrence, J. G., and H. Ochman. 1997. Amelioration of bacterial genomes: rates of change and exchange. J. Mol. Evol. 44:383–397.
Lepperdinger, G., B. Strobl, A. Jilek, A. Weber, J. Thalhamer, H. Flöckner, and C. Mollay. 1996. The lipocalin Xlcpl1 expressed in the neural plate of Xenopus laevis embryos is a secreted retinaldehyde binding protein. Protein Sci. 5:1250–1260.
Nagata, A., Y. Suzuki, M. Igarashi, N. Eguchi, H. Toh, Y. Urade, and O. Hayaishi. 1991. Human brain prostaglandin D synthase has been evolutionarily differentiated from lipophilic-ligand carrier proteins. Proc. Natl. Acad. Sci. USA 88:4020–4024.
Newcomer, M. E., and D. E. Ong. 1990. Purification and crystallization of a retinoic acid-binding protein from rat epididymis. J. Biol. Chem. 265:12876–12879.
Passey, R. J., and A. G. Mackinlay. 1995. Characterisation of a second, apparently inactive, copy of the bovine beta-lactoglobulin gene. Eur. J. Biochem. 233:736–743.
Piotte, C. P., A. K. Hunter, C. J. Marshall, and M. R. Grigor. 1998. Phylogenetic analysis of three lipocalin-like proteins present in the milk of Trichosurus vulpecula (Phalangeridae, Marsupialia). J. Mol. Evol. 46:361–369.
Ross, A. C. 1993. Cellular metabolism and activation of retinoids: roles of cellular retinoid-binding proteins. FASEB J. 7:317–327.
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.
Sánchez, D., M. D. Ganfornina, and M. J. Bastiani. 1995. Developmental expression of the lipocalin Lazarillo and its role in axonal pathfinding in the grasshopper embryo. Development 121:135–147.
Schubert, D., M. LaCorbiere, and F. Esch. 1986. A chick neural retina adhesion and survival molecule is a retinol-binding protein. J. Cell Biol. 102:2295–2301.
Sivaprasadarao, A., M. Boudjelal, and J. B. C. Findlay. 1993. Lipocalin structure and function. Biochem. Soc. Trans. 21:619–622.
Sobolev, V., R. C. Wade, G. Vriend, and M. Edelman. 1996. Molecular docking using surface complementarity. Proteins 25:120–129.
Tanaka, T., Y. Urade, H. Kimura, N. Eguchi, A. Nishikawa, and O. Hayaishi. 1997. Lipocalin-type prostaglandin D synthase (beta-trace) is a newly recognized type of retinoid transporter. J. Biol. Chem. 272:15789–15795.
Tegoni, M., R. Ramoni, E. Bignetti, S. Spinelli, and C. Cambillau. 1996. Domain swapping creates a third putative combining site in bovine odorant binding protein dimer. Nat. Struct. Biol. 3:863–867.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
von Heijne, G. 1990. The signal peptide. J. Membr. Biol. 115:195–201.
Yang, C. Y., Z. W. Gu, F. Blanco-Vaca, S. J. Gaskell, M. Yang, J. B. Massey, A. M. Gotto, and H. J. Pownall. 1994. Structure of human apolipoprotein D: locations of the intermolecular and intramolecular disulfide links. Biochemistry 33:12451–12455.
Zeng, C., A. I. Spielman, B. R. Vowels, J. J. Leyden, K. Biemann, and G. Preti. 1996. A human axillary odorant is carried by apolipoprotein D. Proc. Natl. Acad. Sci. USA 93:6626–6630.