* Institut Charles Darwin International, Romainville, France; and UMR7138 Systématique, Adaptation, Evolution, Département Systématique et Evolution, Muséum National d'Histoire Naturelle, Paris, France
Correspondence: E-mail: lecointr{at}mnhn.fr.
Abstract
Key Words: metabolism, metabolic pathways, biochemical evolution
Introduction
Using these new concepts, the historical development of aliphatic amino acid catabolism was first inferred in relation to the development of the Krebs cycle, and then aliphatic amino acid anabolism was incorporated into the matrix. Such a methodological framework now has to incorporate the pathways most widely shared across life, called here "universal metabolism"; that is, anabolism and catabolism of the three fundamental kinds of biomolecules: amino acids, fatty acids, and saccharides. This is of interest because the framework offers new and transparent means to answer questions about the evolution of metabolism. One of them is the relative timing of the rise of glycolysis, the Krebs cycle, and amino acid biosynthesis. Meléndez-Hevia, Waddell, and Cascante (1996) considered glycolysis to be earlier than both amino acid biosynthesis and the Krebs cycle because glucose is implicitly considered more profitable as the earliest nutrient than amino acids are. Amino acid catabolism was, therefore, thought by these authors to be secondary. Conversely, Cunchillos and Lecointre (2003) found the Krebs cycle to be the product of the metabolism of amino acids of groups I and II (in the sense of Cordón [1990], namely, Asp, Asn, Glu, Gln, Arg, and Pro). However, their matrix did not include glycolysis. For this reason, glycolysis and gluconeogenesis have been included here. Is the metabolism of monosaccharides (also including the Calvin and pentose-phosphate cycles) the first? In the same framework, when did fatty acid metabolism arise, compared with other pathways? A number of other metabolic pathways have also been included in the present work to submit a greater diversity of widely shared reactions to a transparent procedure for inferring a more complete view of their temporal development.
For instance, the development of the urea cycle is traditionally thought to be linked to the metabolism of arginine and might have been possible as early as arginine metabolism, and the metabolism of aromatic amino acids is classically thought to have arisen later than aliphatic amino acid metabolism.
Earlier authors (e.g., Horowitz 1945; Cordón 1990) have speculated on the evolutionary timing of the development of pathways from a theoretical point of view. For example, because free aliphatic amino acids were considered one of the very first sources of molecules in abiotic environments, upstream reactions of amino acid catabolic pathways must have occurred before downstream reactions, whereas downstream reactions of amino acid anabolic pathways must have occurred before upstream reactions (see Cunchillos and Lecointre [2000]). The present methods have the power to test such hypotheses. Using those methods, the hypotheses were corroborated for some aliphatic amino acids only (Cunchillos and Lecointre 2003). The evolutionary timing of the development of other pathways can now be tested in the same way. In the present work, new data increased the complexity of the characters, which raised new methodological challenges.
Materials and Methods
Enzymes acting on complex molecules have been excluded from that universal core of enzymatic activities. Complex molecules are assemblages of those elementary molecules whose metabolic evolution is being studied. Complex molecules include, for instance, coenzymes, triglycerides, phospholipids, nucleosides, and polymers (e.g., glycogen, starch, proteins, DNA, and RNA). They are all secondary products of the core metabolism studied here. Purines and pyrimidines are themselves considered complex molecules because they are never synthesized as such in vivo. Indeed, their precursors are already attached to other compounds (riboses or ribose-phosphates), so their recognition as isolated compounds has no biological basis. This delineation leads us to sample the following pathways for phylogenetic analysis.
As in Cunchillos and Lecointre (2000), taxa are defined from the tip of the pathway to its point of contact with the Krebs cycle. To name pathways, the prefixes "d" and "s" are used to refer to degradation and synthesis, respectively. For example, dGLN is the set of enzymatic activities involved in converting glutamine to oxoglutarate, whereas sGLN is the synthetic pathway from oxoglutarate to glutamine. When degradation or synthesis of a compound can occur in different ways, the alternatives are numbered in the name of the pathway. For instance, cysteine can be degraded via mercaptopyruvate (dCYS2) or directly through pyruvate and acetyl-CoA (dCYS1). Taxa are listed in tables 1 and 2. Fatty acid catabolic and anabolic pathways stop at acetyl-CoA. Monosaccharide anabolic pathways (gluconeogenesis and pentose-phosphate cycle) stop at oxaloacetate, and monosaccharide catabolic pathways (glycolysis and Calvin cycle) stop at acetyl-CoA. Amino acid anabolism and catabolism stop at different points of the Krebs cycle or at acetyl-CoA, depending on the amino acid (oxoglutarate, succinyl-CoA, oxaloacetate, or acetyl-CoA). The Krebs cycle is divided into two parts, designated using the two main points of entrance: KC1 from oxaloacetate to oxoglutarate and KC2 from oxoglutarate to oxaloacetate. The urea cycle is not delineated in reference to the Krebs cycle, but independently, including the reactions of its own cycle.
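The naming convention described above can be illustrated with a minimal Python sketch. It assumes that amino acid pathway names consist of a "d" or "s" prefix, a three-letter compound code, and an optional variant number; the function and class names are hypothetical, and names that follow a different pattern (such as KC1 and KC2) are deliberately rejected rather than guessed at.

```python
import re
from typing import NamedTuple, Optional


class Pathway(NamedTuple):
    direction: str          # "degradation" or "synthesis"
    compound: str           # three-letter code, e.g. "GLN"
    variant: Optional[int]  # alternative route number, if any


# Assumed pattern: prefix d/s, three capital letters, optional digits.
_NAME = re.compile(r"^([ds])([A-Z]{3})(\d*)$")


def parse_pathway(name: str) -> Pathway:
    """Split a pathway name such as 'dCYS2' into its parts."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a d/s amino acid pathway name: {name!r}")
    prefix, compound, number = match.groups()
    direction = "degradation" if prefix == "d" else "synthesis"
    return Pathway(direction, compound, int(number) if number else None)


print(parse_pathway("dGLN"))
print(parse_pathway("dCYS2"))
```

Under these assumptions, dGLN and sGLN differ only in the direction field, and dCYS1 and dCYS2 differ only in the variant number.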
A new kind of homology of type II was used in the present work. Pathways can be similar in the recurrence of a set of reactions carried out by the same enzymes on different substrates. This homology is used in the metabolism of fatty acids (character 18) and is called "IId." All characters are named in table 1, with the number from enzyme nomenclature and the type of homology involved. In figures 1 and 2, characters involving homologies of type II are numbered from 1 to 32. It must be stressed that, whereas homologies of type I are named strictly following the international enzymatic nomenclature, homologies of type II are not. For example, hydratase (character 29) is used here in a wider sense than in the international nomenclature. A reaction involving pyridoxal-phosphate is coded as involving a hydratase because a molecule of water is involved, whereas this is not the case in the international nomenclature. Conversely, our delineation of homologies of type II is more precise for dehydrogenases (or reductases) involving NAD. Alcohol-NAD-dehydrogenases, aldehyde-NAD-dehydrogenases, deaminase-NAD-dehydrogenases, and FAD-dehydrogenases have been separated into different characters.
Phylogenetic Reconstruction
Tree Search
The matrix contains 75 taxa and 202 characters (table 2). Characters are treated as unordered and unweighted. Heuristic searches were conducted with NONA (Goloboff 1998) as implemented in WINCLADA (Nixon 1999a), using TBR branch swapping. For a better exploration of trees, the Parsimony Ratchet (Island Hopper [Nixon 1999b]) was used. The proportion of data to be reweighted was set between 25% and 50%, and the number of iterations was progressively increased from 25,000 to 150,000 (option amb- poly=). This increase in iterations was used to check that the number of supplementary MP trees gained each time was decreasing or null. Each time, the number of trees was recorded after collapsing all unsupported nodes in all trees ("hard collapse").
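The stopping check described above (supplementary MP trees gained at each increase in iterations must be decreasing or null) can be sketched in Python as follows. This is only an illustration of the bookkeeping, not of NONA or WINCLADA themselves; the function names and the tree counts in the example are hypothetical.

```python
from typing import List


def supplementary_gains(tree_counts: List[int]) -> List[int]:
    """Number of additional MP trees found at each increase in iterations."""
    return [later - earlier
            for earlier, later in zip(tree_counts, tree_counts[1:])]


def gains_decreasing_or_null(tree_counts: List[int]) -> bool:
    """True if each successive gain is zero or smaller than the previous one."""
    gains = supplementary_gains(tree_counts)
    return all(later == 0 or later < earlier
               for earlier, later in zip(gains, gains[1:]))


# Hypothetical numbers of MP trees recorded after hard collapse,
# one value per ratchet run with an increased number of iterations.
counts = [40, 52, 58, 60, 60]
print(supplementary_gains(counts))
print(gains_decreasing_or_null(counts))
```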
Rooting
The tree was rooted using an all-zero hypothetical ancestor (HYPANC). This is justified by the fact that, in the coding of character states, zero was given to the absence of enzymes, to the absence of performance of particular functions (even in the presence of a putative suitable substrate), or to the absence of use of a cofactor. Such a rooting option will automatically put the simplest pathways closer to the root. However, this does not make any assumption about the nature of the corresponding enzymatic reactions. The aim was to produce explicit hypotheses so that the character coding and trees produced by the present study will still be useful even if another way of rooting is used. They will just have to be considered again in the light of the new root. In other words, the topology (the way branches are connected to each other) could still be the same but just rooted differently. Because a different rooting yields different phylogenetic interpretations, the present work would still be useful for such a new interpretation.
Defining Time Spans
Criteria
From the root to the tip of branches, phylogenetic trees provide a relative order of transformations (here enzymatic innovations) through time. We call an "upstream node" a deep, more inclusive internal branch or clade and a "downstream node" a more terminal, less inclusive internal branch or clade. Time spans ("periods" or "phases") in metabolism are defined as the time along the tree separating two character changes of type II, taking into account the following criteria as in Cunchillos and Lecointre (2003):
Classically, homologies of type II involve several enzymes with different specificities. However, for 14 cases, single enzymes with the same high specificity innovate a new enzymatic mechanism. They are recorded as homologies of type I, but because they also correspond to enzymatic changes used to define homologies of type II, they are taken into account for defining periods. In figure 1, they are shown with asterisks.
Polytomies
It must be stressed that both the tree topology and the new nonhomoplastic changes in homologies of type II are involved in the definition of periods. How can this be used when parts of the tree are unresolved? A polytomy is just an absence of resolution. A polytomy normally contains branches of different periods, simply because different numbers of new homologies of type II occur on them. For example, within the clade B (fig. 1), a polytomy involves branches of the same period as the origin of the clade B (period 1, pink) and branches of later periods 2 (orange) and 3 (yellow). Some of these branches are of period 3 because they exhibit two additional homologies of type II (13 and 14). Homologies of type II are, therefore, cumulative in defining periods.
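The cumulative character of period assignment can be sketched as follows in Python. This is a simplified reading, not the full set of criteria: it assumes that the root is in period 0 and that each new nonhomoplastic homology of type II on a branch advances the period by one. The tree, the branch labels b1 to b3, and the character label on the branch leading to B are hypothetical; only characters 13 and 14 come from the example in the text.

```python
from typing import Dict, List, Optional


def assign_periods(parent: Dict[str, Optional[str]],
                   gains: Dict[str, List[str]]) -> Dict[str, int]:
    """Period of a branch = period of its parent + its new type II gains."""
    periods: Dict[str, int] = {}

    def period_of(node: str) -> int:
        if node not in periods:
            up = parent[node]
            base = 0 if up is None else period_of(up)
            periods[node] = base + len(gains.get(node, []))
        return periods[node]

    for node in parent:
        period_of(node)
    return periods


# Toy polytomy inside a clade B: three branches attached to the same node.
parent = {"root": None, "B": "root", "b1": "B", "b2": "B", "b3": "B"}
gains = {
    "B": ["gain_on_B"],   # placeholder label for the gain defining period 1
    "b2": ["13"],         # one additional type II homology
    "b3": ["13", "14"],   # two additional type II homologies
}
print(assign_periods(parent, gains))
```

In this toy case the unresolved node holds branches of periods 1, 2, and 3 at once, which is the situation described for the polytomy within clade B.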
The Problem of Question Marks
Question marks bring complications in the definition of time spans. As a consequence of using the parsimony criterion, some homologies of type II are optimized onto a node deeper than the ones where the derived state is really observed, because of question marks in the remaining terminals of the clade. For example, character 8 (carboxylations using biotin) is optimized to occur on the node K. Among the 12 pathways included within K, only five (dLEU, dILE, dVAL, dMET, and dTHR1) really exhibit this function; all others are coded "?" for that function. Thus, the early position of the gain of character 8 results from the parsimony criterion used to manage question marks in character optimization, whereas real observation of this function would optimize two gains of character 8 onto nodes T and F only. This causes no problem, except for the ordering of the time spans. K is of period 7 because of the gain of this homology of type II. However, seven downstream pathways (dALA, dCYS2, dCYS3, dSER, dCYS1, dGLY, and dTHR2) do not really have to wait for that period to be achieved, because they do not need to use this function, for which they are coded "?". Therefore, assigning their complete development to period 7 is an artifact of optimization affecting time-span assignment. Because these pathways are actually possible as early as the previous period (period 6), they have been assigned to that period. This is the reason why there can be an apparent contradiction in observing a node or a branch that is assigned a period earlier than that of an upstream (more inclusive) branch (dCYS1 is possible in period 6 while node K is in period 7). Such situations are met four times in the analysis and are always caused by question marks.
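The effect of question marks on optimization can be reproduced with a small Fitch-style parsimony count for one unordered binary character. The sketch below is an illustration under simplifying assumptions, not the NONA implementation: the tree is a toy with four hypothetical terminals a to d inside a clade and one outside terminal, and "?" is treated as compatible with either state.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]
State = Union[int, str]


def fitch(tree: Tree, states: Dict[str, State]) -> Tuple[Set[int], int]:
    """Return (possible states at this node, number of steps below it)."""
    if isinstance(tree, str):
        s = states[tree]
        # A question mark is compatible with both states.
        return ({0, 1} if s == "?" else {int(s)}), 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


clade: Tree = (("a", "b"), ("c", "d"))
tree: Tree = ("out", clade)

with_missing = {"out": 0, "a": 1, "b": "?", "c": "?", "d": 1}
with_absence = {"out": 0, "a": 1, "b": 0, "c": 0, "d": 1}

print(fitch(clade, with_missing)[0])  # state set at the base of the clade
print(fitch(tree, with_missing)[1])   # steps needed with question marks
print(fitch(tree, with_absence)[1])   # steps needed with observed absences
```

With question marks, the base of the toy clade is reconstructed with the derived state and a single gain suffices; when the same terminals are coded as absent, two separate gains are required. This mirrors the contrast drawn above between one gain on node K and two gains on nodes T and F.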
Results
Cordón (1990) defined four groups of amino acids, according to common reactions in their metabolism and the point of entry into the Krebs cycle: group I enters by oxaloacetate, group II by ketoglutarate, group III by pyruvate and acetyl-CoA, and group IV by succinyl-CoA. In our tree (fig. 1), amino acids of Cordón's groups I and II develop first. At period 6 (light blue), it is possible to synthesize and degrade all these amino acids (Asp, Asn, Arg, Pro, Gln, and Glu). Based on the full development of the synthesis of arginine (sARG3 and sARG4), the urea cycle is fully developed by period 4 (light green). The complete metabolism of Cordón's groups III (Ser, Gly, Cys, and Ala) and IV (Thr, Val, Ile, and Met) starts at period 5 (deep green) and mostly develops during periods 5 (for syntheses) and 6 to 7 (for degradations). Aromatic amino acids are degraded as early as period 4 for histidine and period 6 or later for the others, whereas their synthesis is so sophisticated that it is only possible very late, in periods 8 and 9.
The two parts of the Krebs cycle develop within period 6. Period 5 (deep green) is the period of termination of almost all aliphatic amino acid syntheses (except for lysine and methionine, whose syntheses are achieved in period 7). Period 6 (light blue) is very rich; that is, it contains many events that cannot be ordered: the closing of the Krebs cycle, the termination of the degradation of all aliphatic amino acids of group III (except dCYS2), and the full development of glycolysis and gluconeogenesis. Later, in period 7 (deep blue), almost all aliphatic amino acid metabolism (except dCYS2), degradation of fatty acids, and one of the two syntheses of fatty acids (the "intramitochondrial pathway," here numbered "2") are all possible. It is noticeable that the relative order of glycolysis and the closure of the Krebs cycle cannot be clarified; both of them appear to be the consequence of amino acid metabolic pathways. Period 8 (purple) closes the complete development of all aliphatic amino acid metabolisms and develops the Calvin cycle. Period 10 is arbitrarily defined as containing all events later than period 9. This includes syntheses of aromatic amino acids and fatty acids along the "extramitochondrial pathway," here numbered "1".
Some branches are in a period earlier than one of their upstream branches because of question mark optimization (see above). This is the case for optimizations of characters 4, 8, 18, and 19. Question marks in character 8 (carboxylations using biotin) imply that terminal branches dALA, dCYS3, dSER, dCYS1, dGLY, dTHR2, and branches Q and S are in period 6 while their upstream branches K, O, and R are in period 7. Question marks in character 18 (β-oxidation sequence repeat) in KC1 and sLYS3 place terminals KC1 and sLYS3 in period 6 while their upstream nodes L and U are in period 7. Question marks in character 19 (β-oxidations) for pathways of clade H (except KC2, dLYS, dILE, dVAL, and synthesis and degradation of fatty acids) place the gain of β-oxidation onto the branch H. That branch is assigned to period 6 because of the new gain of β-oxidation (character 5, already available from period 2, is not new). All pathways that exhibit a question mark for character 19 are assigned one period earlier. Branch M, all members of clade W, branch N, and all members of clade X are, therefore, assigned to period 5, which is earlier than period 6 of H. The same reasoning is followed for character 4, which is coded "1" in all pathways of asparagine and glutamine and coded "?" in all others. All others are assigned one period earlier. Question marks explain why one can find a new homology of type II on a branch without a change of period. The assignment of the period involves one period forward because of the new type II homology and one period back because of a question mark for another character. For example, within the clade B, the node on which character "5" occurs remains in period 1 (pink) because it moves one step forward for the new homology "5" and one step back because of question marks on character "4" for dARG, sPRO1, and dPRO. Another example is the node E, which remains in period 3 despite the new homology "11".
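The "one period forward, one period back" reasoning can be written as a simple counting rule, sketched here in Python. It is a simplified reading of the adjustment, assuming that a pathway's period equals the number of type II gains on its path from the root, not counting the gains for which the pathway itself is coded "?". The function name is hypothetical, and the path used in the example (gains of characters 4 and then 5 leading to dARG) is an assumption made for illustration.

```python
from typing import Dict, List, Union

State = Union[int, str]


def effective_period(path_gains: List[int],
                     coding: Dict[int, State]) -> int:
    """Count type II gains on the root-to-tip path, skipping those the
    pathway is coded '?' for (it does not need to wait for them)."""
    return sum(1 for character in path_gains if coding.get(character) != "?")


# Assumed path for dARG: character 4 gained first, then character 5.
path_to_dARG = [4, 5]
dARG_coding = {4: "?", 5: 1}

# A hypothetical pathway on the same path that really uses both functions.
both_used = {4: 1, 5: 1}

print(effective_period(path_to_dARG, dARG_coding))
print(effective_period(path_to_dARG, both_used))
```

Under this rule, dARG stays in period 1 although character 5 is new on its path, because the gain of character 4 is not counted for a pathway coded "?" for it.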
Discussion
Amino acid metabolism is basal in the phylogenetic tree, and all other metabolisms develop from that background. The priority of amino acid degradation over synthesis depends on the amino acid. Full degradations are earlier than full syntheses for amino acids of Cordón's groups I (Asp and Asn) and II (Glu, Gln, Arg, and Pro, excepting proline, for which full synthesis is possible in period 2 while degradation appears in period 3) and for aromatic amino acids (His, Tyr, and Phe), contradicting Cordón's views on the priority of aromatic amino acid syntheses over their degradations. For amino acids of Cordón's groups III (Ala, Cys, Gly, and Ser) and IV (Leu, Val, Ile, Lys, Thr, and Met), full syntheses are possible before full degradations (except for Trp), contradicting Cordón's views. In groups I and II, degradation and synthesis tend to be reverse processes, whereas in groups III and IV, they are different pathways. This observation is also true for aromatic amino acids, as the syntheses are different and very late in comparison to degradations. Our results do not fully corroborate the predictions of Horowitz (1945) and Cordón (1990), according to which amino acid catabolic pathways develop forwards and anabolic pathways develop backwards. Indeed, dPRO, dHIS, dALA, dTRP, and dMET develop backwards, whereas none of the amino acid anabolisms are found to develop backwards, except sLYS. Furthermore, in some cases, opportunistic late connections of pathways lead to complex developments that are neither forwards nor backwards, as in sLEU, sVAL, dILE, sGLY, sTHR, sMET, dLEU, and dGLY.
Horowitz's (1945) and Cordón's (1990) predictions were not based on a data matrix, but (1) on the hypothesis that amino acids of abiotic origin might have been one of the very first sources of energy and structural components available to primitive forms of life (which has been highly corroborated since), (2) on a theoretical model of selective pressure acting on protocells during the course of biochemical evolution, and (3) on consistent comparative reasoning from structures of metabolic pathways and their compounds (see Cunchillos and Lecointre [2000]). No computerized parsimony was used, and it is not surprising that a formalized reconstruction algorithm using parsimony can deal with complex issues of homoplastic late opportunistic pathway connections better than a human mind can.
The metabolisms of fatty acids and saccharides develop after the full development of the metabolism of amino acids of groups I and II, and they are associated with the anabolism of amino acids of groups III and IV. Sugar metabolisms are within the clade N, with anabolic pathways of the amino acids of group III, as predicted by Cordón (1990). Syntheses of aromatic amino acids are branched within sugar metabolism. From the present data, it is possible to draw conclusions neither about forward or backward development of fatty acid and sugar metabolisms nor about the priority of either their anabolism or their catabolism. Note that the extramitochondrial synthesis of fatty acids precedes (period 7) the intramitochondrial synthesis (period 9).
Period 6 is very rich: events within it are difficult to order, and the two portions of the Krebs cycle are set up after the establishment of the metabolisms of amino acids of groups I and II. Interestingly, one portion of the Krebs cycle has a catabolic origin and the other an anabolic origin: KC2 is associated with catabolism of amino acids of groups III and IV, and KC1 is associated with anabolism of the same groups. Despite the late branching of glycolysis in the tree, it is difficult to order the rise of full glycolysis and the rise of the full Krebs cycle: both are embedded within period 6. So the views of Meléndez-Hevia et al. (1996), that glycolysis must have preceded the Krebs cycle because it provided energy, are neither confirmed nor contradicted. However, the relative position of a catabolic pathway must be assessed not only in terms of energy but also in the light of the compounds provided. Availability of glucose in early abiotic environments is much more speculative than availability of amino acids (Cunchillos and Lecointre 2000, 2002). For that reason, because the Krebs cycle is derived from amino acid metabolisms of groups III and IV, the hypothesis that the Krebs cycle arose earlier than glycolysis is much more likely.
From the present data, the ordering of gluconeogenesis compared with glycolysis is not possible: both are in period 6. Pentose-phosphate and Calvin cycles arose later (periods 7 and 8, respectively) than glycolysis and gluconeogenesis, as predicted by Cordón (1990). The urea cycle arises as early as the synthesis of arginine in period 4.
The present inferences partially corroborate Cordón's scenario, which, after all, is not surprising, because they are only based on the comparative anatomy of metabolic pathways, whereas Cordón incorporated much biological input. It is very interesting that his scenario was embedded into a consistent and well-documented DNA-free conception of natural selection of enzymatic specificity, leading to the evolution of metabolism through natural selection of proteins without any DNA information storage. This is one of the reasons why the present work has nothing to do with theories of early information storage: to test Cordón's scenario, such storage was not necessary. Concerning information storage in nucleic acids, it should be stressed that the compounds involved in the pathways studied here are basic elements for the construction of nucleic acids of metabolic origin. Therefore, those nucleic acids arose later than the pathways. Biological purines and pyrimidines are the products of several already complex molecules; they are never synthesized as such in cells, because their parts are already linked to chemically different compounds such as riboses. From the evolutionary point of view, they are later chemical compounds than all the metabolic pathways compared here. The question of whether there might have been DNAs or RNAs of nonmetabolic origin encoding "information" for the enzymatic activities studied cannot be answered here. Cordón's scenario for the rise of protein complexity and enzymatic specificity by natural selection without invoking DNA/RNA "information" storage is more parsimonious than explanations that require such storage and is well worth further consideration as positive input to the increasing criticisms of the current uses of notions of genetic control and information (Kupiec and Sonigo 2000; Segal 2003).
References
Cordón, F. 1990. Tratado evolucionista de biologia. Aguilar, Madrid.
Cunchillos, C. and G. Lecointre. 2000. L'histoire du catabolisme des acides aminés aliphatiques inférée par l'analyse cladistique de deux nouveaux types de caractères : l'enzyme et la réaction enzymatique. Pp. 87-106 in V. Barriel and T. Bourgoin, eds. Caractères. Biosystema 18, Société Française de Systématique, Paris.
Cunchillos, C. and G. Lecointre. 2002. Early steps of metabolism evolution inferred by cladistic analysis of the structure of amino acid catabolic pathways. C. R. Biologies 325:119-129.
Cunchillos, C. and G. Lecointre. 2003. Evolution of amino acid metabolism inferred through cladistic analysis. J. Biol. Chem. 278:47960-47970.
de Pinna, M. C. C. 1991. Concepts and tests of homology in the cladistic paradigm. Cladistics 7:367-394.
Enzyme Nomenclature. 1973. Recommendations (1972) of the International Union of Pure and Applied Chemistry and the International Union of Biochemistry. Elsevier Scientific Publishing, Amsterdam.
Goloboff, P. 1998. Nona: computer program and software. Published by the author, Tucumán, Argentina.
Horowitz, N. H. 1945. On the evolution of biochemical syntheses. Proc. Natl. Acad. Sci. USA 31:153-157.
Kupiec, J. J. and P. Sonigo. 2000. Ni Dieu ni gène. Seuil, Paris.
Meléndez-Hevia, E., T. G. Waddell, and M. Cascante. 1996. The puzzle of the Krebs citric acid cycle: assembling the pieces of chemically feasible reactions, and opportunism in the design of metabolic pathways during evolution. J. Mol. Evol. 43:293-303.
Nixon, K. C. 1999a. Winclada (BETA). Version 0.9.9. Published by the author, Ithaca, New York.
Nixon, K. C. 1999b. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15:407-414.
Segal, J. 2003. Le zéro et le un, histoire de la notion scientifique d'information au 20ème siècle. Syllepse, Paris.