Herbarium and Department of Biology, University of Michigan
Abstract |
---|
Introduction |
---|
One of the earliest analyses to use probabilistic arguments to test explicit hypotheses of monophyly was reported by Templeton (1983). He used compatibility to construct phylogenetic trees based on endonuclease restriction sites and used a nonparametric test to compare competing hypotheses of relationships among four hominid taxa for their distinctness from one another. For larger studies, it became more common to use phylogenetic reconstructions without statistical arguments to compare hypotheses of monophyly. Although indirect, parsimony reconstructions became more reasonable for this purpose with the bootstrap method of Felsenstein (1985), which uses a probability concept to measure the strength of branches in a parsimony tree. Bremer (1988) described a novel nonprobabilistic measure of these branches.
Faith (1991) argued the importance of distinguishing a priori hypotheses from a posteriori hypotheses. A priori hypotheses are formulated before examining the data to be used to evaluate them. A posteriori hypotheses arise from an examination of the data, for example, as a result of phylogenetic reconstruction based on the data. A posteriori hypotheses can sometimes be tested using the data on which they were erected, but this requires special care to avoid inferential circularity. Branch support measures such as bootstrap and Bremer support, when used to evaluate monophyletic groups revealed by parsimony reconstruction, are in support of a posteriori hypotheses, because the same data that gave rise to the hypotheses are used to evaluate them.
Faith (1991) defined random variables to test explicitly a priori hypotheses of monophyly for groups of evolutionary units (EUs) among those under study. Faith and Trueman (1996) and Huelsenbeck, Hillis, and Neilson (1996) have proposed tests of monophyly as well. Although these methods do attach probabilistic or statistical statements to a priori hypothesized monophyletic groups, they all involve in some way the construction of estimates of phylogenetic trees.
Alroy (1994) proposed direct tests of monophyly, based on compatibility concepts, that do not require the estimation of phylogenetic trees, and do provide a measure of statistical significance estimated by simulated data with permutations. The criterion he proposed depends on previously hypothesized orderings of character states into character state trees in which the directions of evolutionary change have been specified.
Wilkinson (1998) proposed a novel concept of support for a hypothesis by a character, a concept that is distinct from compatibility and parsimony. His concept can use aligned sequence data without the need to construct phylogenetic trees. Wilkinson support is defined as follows: a character supports a hypothesized monophyletic group if the character could have changed states on the branch leading to that group in any tree that contains the group, without requiring any extra state changes.
Wilkinson (1998) described a simple test to see if a character supports a hypothesized monophyletic group (HMG). For a given character, compare all pairs of EUs with one chosen from the HMG and one chosen from the residual paraphyletic group; if, for any such pair, one EU is in the same character state as the other EU, then the character does not support the hypothesis, but if for no such pair is one EU in the same character state as the other EU, then the character does support the hypothesis. This test can also be described in several equivalent ways: (1) Support is provided if no character state overlaps both the HMG and the remaining taxa; if any states are shared, then support is not provided. (2) If any of the EUs in the HMG are in the same state as an EU outside the HMG, then there is no support. If all of the character states of EUs in the HMG are different from states of EUs in the remnant group, then there is support. Where confusion is unlikely, we will use "support" in the sense of Wilkinson (1998), and when necessary we will use "Wilkinson support" to mean the same thing.
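The test just described can be sketched in a few lines. This is only an illustration, not the MEAWILK implementation; the function name `supports`, the dictionary representation of a character, and the example states are assumptions introduced here.

```python
def supports(states, hmg):
    """Return True if a character supports the hypothesized monophyletic group.

    states: dict mapping each EU name to its character state (assumed layout).
    hmg:    set of EU names forming the hypothesized monophyletic group.
    """
    inside = {state for eu, state in states.items() if eu in hmg}
    outside = {state for eu, state in states.items() if eu not in hmg}
    # Support requires that no state occurs both inside and outside the HMG.
    return inside.isdisjoint(outside)


# Hypothetical character, invented only to exercise the test.
character = {"a": "K", "b": "K", "c": "R", "d": "N", "e": "N", "f": "T"}

print(supports(character, {"a", "b", "c"}))  # True
print(supports(character, {"a", "c", "d"}))  # False: K and N occur on both sides
```

Because the test compares only the two sets of states, it returns the same answer when the HMG and the remnant group are exchanged.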
Although this definition explicitly labels one group as monophyletic and the other as a remnant group that may well be paraphyletic, the definition does not assume either direction of evolutionary change; thus, support would be the same if the remnant group were hypothesized to be monophyletic and the previously hypothesized monophyletic group became the remnant group.
In the work reported here, we first present a small artificial data set to show how Wilkinson support differs from concepts based on parsimony. We describe how to determine the exact probability that a character would support a random assemblage of EUs the same size as the hypothesized monophyletic group, termed a "same-sized hypothesis." We use the negative natural logarithm of this probability as a character weight. Total support for an HMG by a data set is the sum of these character weights over all the characters that support the HMG. Next, we describe two different random models that we use to simulate statistical significance for total support. Then, we use Floricaula/LEAFY amino acid sequence data to compare three hypotheses involving gene gain/loss in seed plants (Frohlich and Parker 2000). Finally, we discuss the potential utility of these support concepts in phylogenetic studies.
How Wilkinson Support Differs from Parsimony Concepts |
---|
Wilkinson support is different from common parsimony-based concepts and is not affected by hypotheses of phylogenetic relations that are unrelated to the hypothesis of monophyly to be evaluated. Some characters might not support any hypothesis of a given size. Other characters might support only a few hypotheses of a given size, or they might support many hypotheses or even all hypotheses. A character that supports many hypotheses would support a specific hypothesis less convincingly than a character that supports only a few. For this reason, for a given hypothesis size, we calculate the probability that a position would support a random (i.e., equiprobably chosen) hypothesis by dividing the number of possible hypotheses it supports by the total number of hypotheses. For hypotheses of this size, we then weight the character with the negative natural logarithm of this probability.
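The weighting described above can be written compactly. The symbols are notation introduced here for illustration only: n is the number of EUs, k is the size of the hypothesis H, and N_k(c) is the number of size-k hypotheses that character c supports.

```latex
p_k(c) = \frac{N_k(c)}{\binom{n}{k}}, \qquad
w_k(c) = -\ln p_k(c), \qquad
T(H) = \sum_{c \,:\, c \text{ supports } H} w_{k}(c), \quad k = |H|
```

The weight w_k(c) is undefined when N_k(c) = 0, and it is 0 when c supports every hypothesis of size k.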
The hypothesis of monophyly (a, b, c) divides the EUs of table 1 into two groups, one containing EUs a, b, and c, and the other containing EUs d, e, and f. There are 20 ways to choose from six EUs a monophyletic group containing three EUs. Individual characters can support from 0 to 20 of these 20 possible hypotheses. Characters I, IV, and VI cannot support any hypothesized monophyletic group with three EUs, so for hypotheses of size 3, their weights are irrelevant, undefined, and left blank in table 1. The other characters can support a hypothesis of this size. Character II, for example, supports four hypotheses of size 3: (a, c, d), (b, c, d), (a, e, f), and (b, e, f); thus, its weight is -ln(4/20) = 1.61. Weights are listed for each character in the right-hand column of table 1. Note that characters V and IX are parsimony-uninformative (if parsimony state changes are weighted equally), but these characters could support some hypotheses of size 3. Character X supports all possible hypotheses of this size, so its weight is -ln(20/20), which is 0. In fact, character X supports all hypotheses of any size; we call such characters "Wilkinson-uninformative." The total support for the hypothesis (a, b, c) from all characters is 7.13.
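The weight calculation for a single character can be reproduced by brute-force enumeration of all same-sized hypotheses. The sketch below is an illustration under stated assumptions: table 1 is not reproduced in this text, so the state assignment for `char_ii` is hypothetical, chosen only because it yields exactly the four supported hypotheses listed for character II. All function names are introduced here.

```python
from itertools import combinations
from math import comb, log


def supports(states, hmg):
    """True if no character state is shared between the HMG and the remaining EUs."""
    inside = {s for eu, s in states.items() if eu in hmg}
    outside = {s for eu, s in states.items() if eu not in hmg}
    return inside.isdisjoint(outside)


def supported_hypotheses(states, size):
    """Every hypothesis of the given size (as a sorted tuple) that the character supports."""
    return [h for h in combinations(sorted(states), size) if supports(states, set(h))]


def weight(states, size):
    """-ln of the probability of supporting a random same-sized hypothesis.

    Returns None when the character can support no hypothesis of this size.
    """
    n_supported = len(supported_hypotheses(states, size))
    if n_supported == 0:
        return None
    return log(comb(len(states), size) / n_supported)


# Hypothetical states consistent with the hypotheses listed for character II.
char_ii = {"a": 0, "b": 1, "c": 2, "d": 2, "e": 3, "f": 3}

print(supported_hypotheses(char_ii, 3))
# [('a', 'c', 'd'), ('a', 'e', 'f'), ('b', 'c', 'd'), ('b', 'e', 'f')]
print(round(weight(char_ii, 3), 2))  # 1.61
```

Enumeration is practical only for small examples such as this one, because the number of hypotheses grows combinatorially with the number of EUs.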
The Calculation of Weight |
---|
Recall that the states of a character correspond to the distinct amino acids in an aligned position. This position may lack an amino acid for some EUs. There are many ways to construe these gaps, including (1) as missing data; (2) as one additional character state containing all of the EUs with gaps at that position; and (3) as if each gap represented a distinct, unique state. When some data are missing for a given character, we evaluate the Wilkinson test in such a way that the character fails to provide support only if two nongap character states are shared by the pair of EUs. Thus, our calculation of Wilkinson support will not distinguish between (1) and (3) but will exclude (2).
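The treatment of gaps can be illustrated by contrasting two versions of the test. This is a sketch under assumptions: the gap symbol `"-"`, the function names, and the example position are all introduced here, and the text does not state how MEAWILK handles a group whose members are all gaps.

```python
GAP = "-"  # assumed gap symbol


def supports_with_gaps(states, hmg, gap=GAP):
    """Gaps as missing data: only shared nongap states can block support."""
    inside = {s for eu, s in states.items() if eu in hmg and s != gap}
    outside = {s for eu, s in states.items() if eu not in hmg and s != gap}
    return inside.isdisjoint(outside)


def supports_gap_as_state(states, hmg):
    """Interpretation (2), shown only for contrast: the gap is one ordinary state."""
    inside = {s for eu, s in states.items() if eu in hmg}
    outside = {s for eu, s in states.items() if eu not in hmg}
    return inside.isdisjoint(outside)


# Hypothetical aligned position with gaps in EUs b and d.
position = {"a": "K", "b": "-", "c": "K", "d": "-", "e": "R", "f": "R"}
hmg = {"a", "b", "c"}

print(supports_with_gaps(position, hmg))     # True: K and R are not shared
print(supports_gap_as_state(position, hmg))  # False: the gap "state" is shared
```

In this sketch, a group made up entirely of gapped EUs is supported vacuously, which is a consequence of the set comparison and not a behavior documented in the text.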
A computer program, MEAWILK, has been written to calculate support provided by a data set for hypothesized monophyletic groups. MEAWILK can be downloaded from the website http://www-Personal.umich.edu/gfred/. For each hypothesis, MEAWILK reports the following measures: size = numbers of EUs in the hypothesized monophyletic group and in the residual group; NPC = number of positions that could support some hypothesis of this size; NPD = number of positions that do support this hypothesis; SWC = sum of weights of positions that could support a hypothesis of this size; SWD = sum of weights of positions that do support this hypothesis (=total support); %SWD = 100 × SWD/SWC.
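The reported measures follow directly from their definitions, as the sketch below shows for a toy alignment. It is not MEAWILK's code: the function `summarize`, the helper `column`, and the four positions are assumptions made for illustration, gaps are ignored, and supported hypotheses are counted by brute force, which suits only small data sets.

```python
from itertools import combinations
from math import comb, log


def supports(states, hmg):
    inside = {s for eu, s in states.items() if eu in hmg}
    outside = {s for eu, s in states.items() if eu not in hmg}
    return inside.isdisjoint(outside)


def summarize(alignment, hmg):
    """Compute size, NPC, NPD, SWC, SWD and %SWD for one hypothesis."""
    eus = sorted(alignment[0])
    k = len(hmg)
    n_hypotheses = comb(len(eus), k)
    npc = npd = 0
    swc = swd = 0.0
    for states in alignment:
        n_supported = sum(supports(states, set(h)) for h in combinations(eus, k))
        if n_supported == 0:
            continue  # position cannot support any hypothesis of this size
        w = log(n_hypotheses / n_supported)
        npc += 1
        swc += w
        if supports(states, hmg):
            npd += 1
            swd += w
    pct = 100.0 * swd / swc if swc > 0 else 0.0
    return {"size": (k, len(eus) - k), "NPC": npc, "NPD": npd,
            "SWC": swc, "SWD": swd, "%SWD": pct}


def column(residues):
    """Hypothetical helper: one aligned position for EUs a..f."""
    return dict(zip("abcdef", residues))


alignment = [column("ABCCDD"), column("AAABBB"), column("AAAAAA"), column("ABCDEF")]
result = summarize(alignment, {"a", "b", "c"})
print(result["NPC"], result["NPD"], round(result["SWD"], 2), round(result["%SWD"], 1))
# 3 2 2.3 58.9
```

The invariant position contributes to neither count, and the position with six distinct residues is counted in NPC and NPD but adds a weight of 0.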
Purposes for Support Measures |
---|
Purpose 1
Purpose 1, to determine the significance of a measure of support, is important when there is some question that the HMG might have no more support than could be expected for a group chosen at random. To determine such a significance, some model of randomness must be defined. Significance can then be defined as the probability that a support measure would be as large as that actually observed if the model were sampled "at random." Low probabilities indicate that the level of support is too high to be expected as a result of the random process defined in the model. Significance clearly depends on the model used to define it.
We have used two different models to define "significant." Model 1, hypothesis randomization, chooses, equiprobably, a same-sized hypothesis and evaluates support for it using the real data. Model 2, data set randomization, simulates a new data set by independently replacing each character with one equiprobably chosen from those with the same numbers of EUs in the same numbers of states and the same numbers of EUs with missing data, and then evaluates support for the real hypothesis using this simulated data set; in other words, the state names within each column are permuted. This is done independently for each column.
Model 1 preserves all of the dependencies among the actual characters and seeks the probability that a randomly chosen hypothesis of the same size would have a given level of support from these actual characters. In model 2, the characters are sampled independently of one another. Thus, the observed data set is presumed to be one possible sample of this independent character replacement process. SWD is the negative natural log of the model 2 probability that the characters that actually do support the HMG would all do so at random. Notice that this says nothing regarding the characters that do not actually support the HMG. The probability that the whole data set would offer a given level of support is estimated by the simulated significances. MEAWILK can estimate significances by sampling either of these two models a large number of times. In some cases, these two models can give very different significances. For example, the significance of total support (=7.13) for (a, b, c) (table 1) is 0.1033 under model 1 and 0.0114 under model 2 based on simulations of 10,000 samples.
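Both randomization models can be sketched as Monte Carlo estimates. The code below is an illustration under assumptions, not MEAWILK's implementation: function names are introduced here, gaps are not handled, weights are obtained by brute-force enumeration, and significance is taken as the fraction of samples whose total support is at least the observed value.

```python
import random
from itertools import combinations
from math import comb, log


def supports(states, hmg):
    inside = {s for eu, s in states.items() if eu in hmg}
    outside = {s for eu, s in states.items() if eu not in hmg}
    return inside.isdisjoint(outside)


def size_weights(alignment, k):
    """Weight of each position for hypotheses of size k (None if it supports none)."""
    eus = sorted(alignment[0])
    total = comb(len(eus), k)
    weights = []
    for states in alignment:
        n = sum(supports(states, set(h)) for h in combinations(eus, k))
        weights.append(log(total / n) if n else None)
    return weights


def total_support(alignment, hmg, weights):
    return sum(w for states, w in zip(alignment, weights)
               if w is not None and supports(states, hmg))


def model1(alignment, hmg, n_samples, rng):
    """Hypothesis randomization: random same-sized groups, real data."""
    eus = sorted(alignment[0])
    weights = size_weights(alignment, len(hmg))
    observed = total_support(alignment, hmg, weights)
    hits = 0
    for _ in range(n_samples):
        random_hmg = set(rng.sample(eus, len(hmg)))
        hits += total_support(alignment, random_hmg, weights) >= observed - 1e-9
    return hits / n_samples


def model2(alignment, hmg, n_samples, rng):
    """Data set randomization: states permuted independently within each column."""
    eus = sorted(alignment[0])
    # Permuting a column keeps its state counts, so its weight is unchanged.
    weights = size_weights(alignment, len(hmg))
    observed = total_support(alignment, hmg, weights)
    hits = 0
    for _ in range(n_samples):
        simulated = []
        for states in alignment:
            shuffled = [states[eu] for eu in eus]
            rng.shuffle(shuffled)
            simulated.append(dict(zip(eus, shuffled)))
        hits += total_support(simulated, hmg, weights) >= observed - 1e-9
    return hits / n_samples


# Hypothetical toy alignment for EUs a..f.
toy = [dict(zip("abcdef", col)) for col in ("AAABBB", "ABCCDD", "AABBCC")]
rng = random.Random(0)
print(model1(toy, {"a", "b", "c"}, 2000, rng))
print(model2(toy, {"a", "b", "c"}, 2000, rng))
```

The small tolerance in the comparison guards against floating-point differences between sums that are mathematically equal.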
Purpose 2
Each HMG in a phylogenetic tree may be evaluated with Wilkinson support to identify those HMGs with very little support. Wilkinson support for an HMG, defined by a branch in a phylogenetic tree, depends only on the EUs it contains, not on how they are arranged in the tree. The total support measure does depend very much on the number of EUs in the HMG, so total support values for different-sized HMGs are not easily comparable. Because monophyletic groups in an estimated tree are all post hoc hypotheses, relative to the data used to estimate that tree, evaluating the significance of their support from these data as if they were a priori hypotheses will result in artificially high significance (i.e., low probabilities). Therefore, high significances are suspect as supporting evidence for an HMG, but low significances derived from model 1 or model 2 may be meaningful. Thus, any monophyletic groups in the tree with low significances may deserve to be collapsed by removal of the branches leading to them, whether or not their support values otherwise appear high.
Purpose 3
Purpose 3 is the comparison of competing (incompatible) a priori hypotheses of monophyly. Hypotheses that cannot be distinguished from random under model 1 are not shown to be individually credible in the context of the data of interest. Thus, they cannot credibly compete with other hypotheses. Only highly significant competing hypotheses need to be considered. Often, competing a priori hypotheses are long-standing and have each been found credible in the context of other data, so it is not unexpected that they would each appear nonrandom with respect to relevant new data. Consider again the example of an 8-EU hypothesis from a data set with 25 EUs. Over a thousand of the more than one million hypotheses of this size would be significant under model 1 at P < 0.001, and most pairs of these hypotheses would be incompatible. Thus, it is not surprising that two incompatible hypotheses would both be significantly nonrandom. Significance tests of this sensitivity cannot be used to choose among extremely well supported hypotheses. It is impractical to use simulation to calculate P values many orders of magnitude smaller than this, but total support may be used to compare same-sized hypotheses. For two hypotheses of the same size, the weight of a character for one is the same as the weight of that character for the other. Thus, in the comparison of same-sized hypotheses, total support for one is strictly comparable with total support for the other. The larger the difference in total support, the stronger the inference favoring the hypothesis with the higher total support. Because an individual character has different weights for different-sized hypotheses, the total support for different-sized hypotheses may not be strictly comparable. 
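The counts quoted for the 25-EU example can be checked with a short calculation. The variable names are introduced here for illustration.

```python
from math import comb

n_hypotheses = comb(25, 8)   # ways to choose an 8-EU group from 25 EUs
print(n_hypotheses)          # 1081575
print(n_hypotheses // 1000)  # 1081, roughly the top 0.1% of hypotheses
```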
A better measure might be the percentage of total support received by each hypothesis; this is the total support of a hypothesis divided by the sum of all character weights that could support a hypothesis of this size, expressed as a percentage. The larger the difference in percentages of total support, the stronger the inference favoring the hypothesis with the higher percentage of total support.
MEAWILK calculates total support and also estimates the probability that random processes would provide that amount of total support under the two models of randomness. Of course, high levels of total support or low probabilities, i.e., high significances, are not direct measures of historical truth, which is inherently unknowable in most cases. Like other measures, such as Bremer support or bootstrap support, Wilkinson support constitutes evidence based on data. Homoplasy, that is, distantly related taxa in a same state that is different from the states of other EUs in their respective clades, could result in support for false hypotheses of monophyly, just as it can mislead parsimony tree estimates. Even without homoplasy, a shared primitive character state can result in support for a formerly monophyletic group made paraphyletic by the rapid evolution on some lineage(s) derived from the group. A special case of this is long-branch attraction, which can also allow distinct, rapidly evolving lineages to appear monophyletic; this is well known to be a hazard in parsimony estimation as well (Felsenstein 1978). In these ways, Wilkinson support shares with other measures the limitation of quantifying in a plausible way the extent to which observed data support hypotheses of monophyly.
Comparison of Hypotheses Using a Real Data Set |
---|
The clear importance of LFY gene phylogeny led Frohlich and Parker (2000) to sequence these genes from other gymnosperms, lower plants, and additional flowering plants and to construct gene trees based on parsimony and on maximum likelihood using the aligned amino acid sequences. These studies showed that representatives of all living gymnosperm groups have both paralog subfamilies of LFY, named "Needle" and "Leaf." In the parsimony analysis, the flowering plant LFYs and the gymnosperm Leaf paralogs form a monophyletic group with bootstrap support of 91% or 94% and with Bremer support of 6. The optimal maximum-likelihood trees also show this monophyletic grouping. If this is truly a monophyletic group, then the gene duplication must antedate the divergence between the flowering plant and the (extant) gymnosperm lineages. Flowering-plant ancestors would initially have had both LFY paralogs, but would have lost the Needle paralog.
The Leaf and Needle paralogs apparently function to help specify, respectively, the male and female reproductive units of gymnosperms, which have long been borne separately. If the male reproductive unit of the gymnosperm ancestor also acquired female structures, eventually becoming the flower, then the purely female structure would no longer be needed and could be lost, along with the Needle paralog that specified it. This is the scenario proposed in the mostly-male theory of flower evolutionary origins (Frohlich and Parker 2000). Support for hypothesis 1, that flowering plant LFYs and gymnosperm Leaf paralogs form a monophyletic group, supports the mostly-male theory. Parsimony and maximum-likelihood tree construction provide an indirect indication of support for hypothesis 1. We wish to compare the three hypotheses of monophyly using the direct measure of Wilkinson support.
Wilkinson Analysis of LFY Genes |
---|
Table 2 lists the taxa and LFY genes included in the data set. Frohlich and Parker (2000) used only complete LFY sequences to avoid questions arising from bootstrap treatment of missing data. Here, we add three partial LFY sequences from mosses and from the liverwort Marchantia, for a total of 30 EUs. The allotetraploid Nicotiana has two nearly identical LFY genes, one from each chromosome set. The moss Atrichum angustatum (Brid.) BSG has two LFY genes (AtranFlo1, GenBank accession number AF286054; and AtranFlo2, AF286055) which are so similar to each other that they must result from an independent gene duplication. This species is haploid (i.e., normal for a moss; Crum and Anderson 1981), so the duplication is not the result of polyploidy. The sample was collected in Schenectady, N.Y., on the Union College grounds. Marchantia polymorpha L. (MarpoFlo; GenBank accession number AF286056) was purchased from the Carolina Biological Supply company. Vouchers for both are at MICH and are M. W. Frohlich s.n. collections. Identifications have been confirmed by H. Crum. All other genes are the same as in Frohlich and Parker (2000). The full alignment is available at the website mentioned above.
The results of our analyses of these data by MEAWILK are presented in tables 3–6. Table 3 presents results for all 30 EUs. Table 4 shows results using only 19 EUs, including only the five exemplar angiosperms. The significance under both random models was very high (P < 0.001) for all four hypotheses. Thus, all competing hypotheses and the monophyly of angiosperms are individually credible in the context of these data, which means they cannot be explained as results of random processes specified by either of the two simulation models. Thus, we seek to assess their relative credibility (purpose 3).
Hypothesis 2, that angiosperms lost the Leaf paralog, is the same size as hypotheses 1 and 3 for the 19-EU data set, where it has a total support value 161 below that of hypothesis 1, indicating that hypothesis 2 is the least credible of the three hypotheses. Although the slight difference in hypothesis sizes with 30 EUs could make them noncomparable, the great difference in total support leaves little doubt that hypothesis 2 should be rejected in favor of hypothesis 1.
Hypothesis 4, monophyly of the angiosperm LFY genes, is consistent with all three other hypotheses. It remains the most strongly supported, with total support exceeding the other hypotheses by ca. 500 and 300 for all angiosperms and for the five exemplars, respectively. Because hypothesis 4 is consistent with all of the other hypotheses tested, its strong support does not require rejection of any of them but does illustrate the level of support achieved by a well-accepted hypothesis of monophyly. By contrast, we evaluated a hypothesis of monophyly created by choosing approximately every third EU from the 30-EU data set to generate a hypothesis the same size as hypotheses 1 and 3. Its total support was 14.38, with significances of 0.407 for model 1 and 0.444 for model 2. For 19 EUs we chose approximately every second EU to get a hypothesis the same size as hypotheses 1–3. Its total support was 14.99, with significances of 0.4590 for model 1 and 0.5783 for model 2 (data set randomization).
Examination of Characters |
---|
Table 6 shows the start of the second exon of LFY, which begins at amino acid 203 except in the Atrichum sequences, where it begins at amino acid 201. The Atrichum sequences contain large gaps in this region. The left portion of the illustrated alignment is anchored by the splice site and confirmed by sequence similarity, but the right-hand portion of the alignment is quite uncertain. In the right-hand portion, many characters that provide support do so with very little weight, and many support more than one of the first three incompatible hypotheses. Characters that support both hypothesis 1 and hypothesis 3 contribute the exact same weight to each, so they have no effect on the difference in total support. Weight values for hypothesis 2 are close to those for hypotheses 1 and 3, so there is almost no contribution to the difference in total support.
Discussion |
---|
Parsimony is achieved by trying to determine exactly which characters change along the branches leading to the groups hypothesized to be monophyletic in the estimated trees. This is a more specific task than that needed to determine Wilkinson support. One may think of characters that could have changed on any one of several branches (which is the Wilkinson support criterion) as offering partial support to each of the branches on which the characters could have changed. Instead of trying to reconstruct precisely where the changes happened, as in tree construction methods, one accepts characters that could have changed as evidence, although perhaps weak or circumstantial evidence, for a monophyletic group. Even in legal proceedings, a sufficiency of circumstantial evidence can lead to conviction. Here, one compares the circumstantial evidence for conflicting monophyletic-group hypotheses. If one hypothesis is, in fact, historically true and the other(s) are false, then it is reasonable to expect more circumstantial evidence to support the true hypothesis, allowing it to be identified.
From a parsimony point of view, parsimony-uninformative characters may seem inherently valueless, but such characters can have an impact. Even within some parsimony analyses, some character state changes may be accorded weights different from others. As a result, alternative tree topologies may acquire different lengths, even for a character that would be parsimony-uninformative if evaluated with equally weighted character state changes ("unordered character states," as in PAUP). Parsimony-uninformative characters certainly can affect analyses based on maximum likelihood, because the probability of character state changes may differ on different branches. An informative character may become parsimony-uninformative when the cost is exactly the same for every state change on every branch on which it could occur.
Some characters that would have no impact on a parsimony analysis (if parsimony state changes are weighted equally) can provide Wilkinson support for hypotheses of monophyly. Consider characters V and IX in table 1, which are parsimony-uninformative because their state changes could be placed on any tree without requiring any extra changes, i.e., without requiring more than 3 for character V and 4 for character IX. For a hypothesis of size 3, character IX has a Wilkinson support weight of 0.92 and character V has a weight of 2.3. The latter is the maximum weight that any character can have for hypotheses of this size, which means that character V does support some hypotheses of this size but fails to support most of them, including (a, b, c). Thus, parsimony-uninformative characters can have a substantial effect on the total Wilkinson support for a hypothesis. As in this example, characters that exhibit numerous states (e.g., amino acid data) and have uncertain transformation series (so they are analyzed with equal-weights parsimony) may well be parsimony-uninformative yet show high weights for Wilkinson support. Wilkinson support analysis may be especially useful with amino acid data.
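When there are no gaps, a hypothesis is supported exactly when it is a union of whole state classes, so the number of supported hypotheses depends only on how many EUs share each state. The sketch below counts such unions with a subset-sum recurrence. The class sizes used for characters V and IX are assumptions: table 1 is not reproduced here, and these sizes were inferred from the stated numbers of state changes and are consistent with the reported weights.

```python
from math import comb, log


def count_supported(class_sizes, k):
    """Number of ways to choose whole state classes whose sizes sum to k."""
    ways = [0] * (k + 1)
    ways[0] = 1
    for size in class_sizes:
        # Iterate totals downward so each class is used at most once.
        for total in range(k, size - 1, -1):
            ways[total] += ways[total - size]
    return ways[k]


def weight(class_sizes, k):
    """Weight for size-k hypotheses, or None if no such hypothesis is supported."""
    n_supported = count_supported(class_sizes, k)
    if n_supported == 0:
        return None
    return log(comb(sum(class_sizes), k) / n_supported)


char_v = [3, 1, 1, 1]      # assumed: four states, one shared by three EUs
char_ix = [2, 1, 1, 1, 1]  # assumed: five states, one shared by two EUs

print(count_supported(char_v, 3), round(weight(char_v, 3), 2))    # 2 2.3
print(count_supported(char_ix, 3), round(weight(char_ix, 3), 2))  # 8 0.92
```

This avoids enumerating every hypothesis, which matters once the number of EUs makes brute force impractical.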
Some characters that provide evidence for a monophyletic group with parsimony do not provide Wilkinson support. For example, parsimony may unambiguously reconstruct a character state change at the base of a monophyletic group, even though that character reverts to its original state high within the monophyletic group, requiring a second, homoplasious, change in the character. In a Wilkinson analysis, any occurrence of the same character state both inside and outside of the HMG results in no support.
The tendency of one amino acid to change to another during evolution varies greatly among pairs of amino acids. For example, character IX in table 1 has valine (V) and isoleucine (I), which are chemically similar and transform relatively easily from one to the other (Henikoff and Henikoff 1993). In our example, we treated these as distinct states, so this character does provide support for the hypothesis (a, b, c). Transformation to other amino acids of that character (serine [S], threonine [T], and glutamine [Q]) are less likely, as are transformations among the latter three. If amino acids V and I were not distinguished, then this character would no longer support the hypothesis. In our weighting of Wilkinson support we do not address this issue (although it could be considered in a meta-analysis). Amino acids could be grouped into classes within which transformation is more likely and between which transformation is less likely. These classes could then be used as character states in the calculation of support.
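Recoding amino acids into classes before applying the test can be sketched as follows. The assignment of residues to EUs is hypothetical, since table 1 is not reproduced here; it was chosen so that the character supports (a, b, c) when V and I are distinct and stops doing so when they are merged, as the text describes. The names `recode` and `merge_vi` are introduced for illustration.

```python
def supports(states, hmg):
    inside = {s for eu, s in states.items() if eu in hmg}
    outside = {s for eu, s in states.items() if eu not in hmg}
    return inside.isdisjoint(outside)


def recode(states, classes):
    """Replace each residue by its class label; unlisted residues keep their own label."""
    return {eu: classes.get(aa, aa) for eu, aa in states.items()}


# Hypothetical residues for character IX on EUs a..f.
char_ix = {"a": "V", "b": "V", "c": "S", "d": "I", "e": "T", "f": "Q"}
merge_vi = {"V": "V/I", "I": "V/I"}  # assumed class grouping valine with isoleucine

hmg = {"a", "b", "c"}
print(supports(char_ix, hmg))                    # True
print(supports(recode(char_ix, merge_vi), hmg))  # False
```

A fuller class map would assign every amino acid to a class; only the one merge needed for this example is shown.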
A strength of Wilkinson support is that characters are evaluated individually to reveal how much each character supports each hypothesis. Such an examination of individual characters can be done in parsimony by sequential character removal (Davis, Frohlich, and Soreng 1993), but that method is severely limited by computational difficulty and, in practice, is seldom done. Sequential character removal shows that some characters critical for the existence of a particular clade do not change state at the base of that clade. Hence, it is not easy to recognize the critically important characters supporting a clade without using that method. Furthermore, in the example of figures 1–4, we illustrate that the presence of state changes at the base of a monophyletic group can be affected by within-HMG arrangements of EUs.
In our LFY analyses, some characters support all three of the major competing hypotheses and thus do not contribute to the differences among them. This occurs most often in characters with many gap states and/or singleton amino acids. These factors allow a character to support many different hypotheses of monophyly, which results in low weights for such characters. Thus, the number of (unweighted) characters that support a hypothesis may not be a very good measure of hypothesis support. Here, for example, a character showing the same amino acid for two closely related taxa, with the others showing gaps, would support all four hypotheses, but with a weight of only 0.57 for the first three hypotheses (e.g., character 250 in table 6). A character with so many gaps that its weight is small can make only minor contributions to total support, so the effect of its missing data is minimal.
In an analysis of LFY genes by unordered-states parsimony, characters 123 and 129 in table 5 would not support any of our four monophyly hypotheses (although they would be parsimony-informative within the major gene clades). Character 123 in table 5 shows the same state (lysine [K]) for all the free-sporing plants and also for the Needle gymnosperm paralogs, but the Leaf paralogs all show state arginine (R), while flowering plants show three different states (alanine [A], asparagine [N], and T). It seems reasonable that the shared states between free-sporing plants and Needle paralogs suggest that they should group together. Five free-sporing plants and four Needle paralogs all share this state, which suggests that it may be unchangeable due to some functional constraints. By contrast, the greater mutability of the amino acid at this position is a feature that unites the Leaf paralogs and flowering plants.
Character 129 shows all but one of the angiosperms with tryptophan (W) at this position; the exception is Peperomia, which has cysteine (C). The free-sporing plants and the two gymnosperm paralogs each show a variety of amino acids, although methionine (M) occurs in all three, and V occurs in two. Here, the amino acid seems to have become nearly immutable within flowering plants, whereas it could change quite easily in nonflowering plants. In unordered-states parsimony this provides no evidence of monophyly, but if such parsimony-uninformative characters provide no evidence of monophyly, we might expect such characters to give Wilkinson support to true and false monophyletic groups in roughly equal proportions. At the lowest level of support, signified by a period in figure 6, all four hypotheses of the LFY example do receive about the same amount of support; there are totals of 37, 41, 37, and 36 such characters for hypotheses 1–4, respectively. Characters providing the strongest support have very unequal distributions, with six characters ranked "3" for hypothesis 1, none for hypothesis 2 or 3, and 15 for hypothesis 4. Characters ranking "2" for the four hypotheses are 3, 1, 1, and 12, respectively.
Wilkinson support will allow researchers to focus attention on the individual characters and to judge the specific data elements that generate the conclusions. Wilkinson support makes possible a higher-level analysis (a meta-analysis), to be conducted on the initial analysis. One may use knowledge from any other source(s) to judge individual characters, including reliability of particular region(s) of the alignment, chemical properties of the amino acids, their likely structural effects or functions within the protein, etc. With enough other information, individual characters might be judged from "probably reliable" to "highly questionable." Such a meta-analysis could be used to question the accuracy of particular data or of the alignment. Especially when well-accepted hypotheses of monophyly are evaluated for Wilkinson support, this may be an important application not directly related to testing and comparing hypotheses of monophyly. This refocusing of attention onto the character data may be an important impact of Wilkinson support.
Generally, there are many different ways to define random processes to simulate "significance." We have chosen two simple models to illustrate this concept. Model 1 chooses random groups of EUs of the same size as the group under study and determines how often the randomly chosen group shows as much support as the actual group under study. This method preserves all of the interrelations among the data. If the number of EUs is small, as in our artificial example (and in some real data sets), then there will be only a few ways to choose a random group of EUs, which sets a lower limit to the realized significance. In our example of six EUs, there are 20 ways to choose an HMG containing three EUs. In this simple example, the HMG contains half of the EUs, so, by symmetry, support is the same for HMG (a, b, c) and for HMG (d, e, f), i.e., for 2 out of the 20. If this particular split is always more strongly supported than any other (e.g., has overwhelming support), then the realized significance under model 1 will be 2/20, or 10%. It cannot be lower than 10%, so it cannot reach the traditional significance cutoff of 5%. One must be aware of these inherent lower limits when interpreting significances under model 1 with small numbers of EUs. In our case, the HMGs (a, b, c) and (d, e, f) are more strongly supported than any other split of size 3, so this HMG does receive the maximal support under model 1.
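The lower limit on model 1 significance can be seen by enumerating every same-sized hypothesis exactly, which is feasible only for small data sets. The sketch below uses a hypothetical alignment (not table 1) in which the split (a, b, c) versus (d, e, f) has by far the most support; the function names are introduced here.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, log


def supports(states, hmg):
    inside = {s for eu, s in states.items() if eu in hmg}
    outside = {s for eu, s in states.items() if eu not in hmg}
    return inside.isdisjoint(outside)


def exact_model1(alignment, hmg):
    """Exact fraction of same-sized hypotheses with at least the observed support."""
    eus = sorted(alignment[0])
    k = len(hmg)
    n_hypotheses = comb(len(eus), k)
    candidates = [set(h) for h in combinations(eus, k)]
    weights = []
    for states in alignment:
        n = sum(supports(states, h) for h in candidates)
        # A position that supports nothing never contributes, so 0.0 is a safe filler.
        weights.append(log(n_hypotheses / n) if n else 0.0)

    def total(h):
        return sum(w for states, w in zip(alignment, weights) if supports(states, h))

    observed = total(hmg)
    as_good = sum(total(h) >= observed - 1e-9 for h in candidates)
    return Fraction(as_good, n_hypotheses)


# Hypothetical data: three positions split abc|def, one position favors other groups.
alignment = [dict(zip("abcdef", col)) for col in ("AAABBB", "KKKRRR", "SSSTTT", "ABCCDD")]
print(exact_model1(alignment, {"a", "b", "c"}))  # 1/10
```

Even with overwhelming support, the result cannot fall below 2/20 here, because (d, e, f) always ties with (a, b, c).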
Model 2 explicitly assumes character independence, so the states of each character are rearranged among the EUs independently of the other characters. Here, the characters, not the EUs, impose a lower limit on the realized significance. Independence of all characters is a very strong assumption, which may not be appropriate in all cases.
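A comparable sketch for model 2 shuffles the states of each character among the EUs independently and asks how often the shuffled data support the HMG at least as strongly as the real data. As before, this is an illustration under stated assumptions and not the authors' implementation: the toy data, the function names, the equal weights, the replicate count, and the support rule (no state shared between the HMG and the residual group) are all hypothetical.

```python
import random

# Hypothetical toy data: each character maps EU name -> state.
TOY_CHARACTERS = [
    {"a": "A", "b": "A", "c": "A", "d": "B", "e": "B", "f": "B"},
    {"a": "A", "b": "A", "c": "C", "d": "B", "e": "B", "f": "B"},
    {"a": "A", "b": "B", "c": "B", "d": "A", "e": "B", "f": "B"},
]


def supports(character, group):
    """Assumed rule: no state is shared between the group and the rest."""
    inside = {state for eu, state in character.items() if eu in group}
    outside = {state for eu, state in character.items() if eu not in group}
    return inside.isdisjoint(outside)


def model2_significance(characters, hmg, n_reps=1000, seed=0):
    """Fraction of replicates in which independently shuffled characters
    give the HMG at least as much support as the observed data."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    group = set(hmg)
    observed = sum(supports(ch, group) for ch in characters)
    hits = 0
    for _ in range(n_reps):
        replicate_support = 0
        for ch in characters:
            eus = list(ch)
            states = list(ch.values())
            rng.shuffle(states)  # rearrange this character's states only
            replicate_support += supports(dict(zip(eus, states)), group)
        if replicate_support >= observed:
            hits += 1
    return hits / n_reps


print(model2_significance(TOY_CHARACTERS, "abc"))
```

Each character is shuffled separately, so any correlation among characters is destroyed in every replicate; that is the independence assumption at work.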
Characters that are uninformative for equal-weights parsimony may provide Wilkinson support; hence, Wilkinson support may arise from characters different from those important for parsimony. The Wilkinson criterion differs substantially from branch support measures such as Bremer support and bootstrap percentages based on parsimony. This makes Wilkinson support a good adjunct to parsimony analyses to judge group monophyly. If both sorts of analysis support monophyly of an assemblage of EUs, then the agreement between these methods gives higher total credibility to the monophyly of the group.
Acknowledgements |
---|
Footnotes |
---|
1 Abbreviations: EU, evolutionary unit; HMG, hypothesized monophyletic group; LFY, Floricaula/LEAFY gene; NPC, number of positions (in the amino acid alignment) that could support some hypothesis of the size under consideration; NPD, number of positions that do support the hypothesis; size, numbers of EUs in the smaller and the larger groups (one group is the HMG, and the other is the residual group); SWC, sum of weights of positions that could support a hypothesis of the size under consideration; SWD, sum of weights of positions that do support this hypothesis (= total support); %SWD, 100 × SWD/SWC.
2 Keywords: Wilkinson support, tests of monophyly, LEAFY gene, Floricaula gene, mostly-male theory, origin of angiosperms
3 Address for correspondence and reprints: Michael W. Frohlich, Herbarium, University of Michigan, NUBS building, 1205 North University, Ann Arbor, Michigan 48109-1057. E-mail: mfroh{at}umich.edu
Literature Cited |
---|
Alroy, J. 1994. Four permutation tests for the presence of phylogenetic structure. Syst. Biol. 43:430-437
Bremer, K. 1988. The limits of amino acid sequence data in angiosperm phylogenetic reconstruction. Evolution 42:795-803
Crum, H. A., and L. E. Anderson. 1981. Mosses of eastern North America, Vol. 2. Columbia University Press, New York
Davis, J. I., M. W. Frohlich, and R. J. Soreng. 1993. Cladistic characters and cladogram stability. Syst. Bot. 18:188-196
Faith, D. P. 1991. Cladistic permutation tests for monophyly and nonmonophyly. Syst. Zool. 40:366-476
Faith, D. P., and J. W. H. Trueman. 1996. When the topology dependent permutation test (T-PTP) for monophyly returns significant support for monophyly should that be equated with (a) rejecting the null hypothesis of nonmonophyly, (b) rejecting a null hypothesis of "no structure", (c) failing to falsify a hypothesis of monophyly, or (d) none of the above? Syst. Biol. 45:580-585
Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:401-410
Felsenstein, J. 1985. Confidence limits on phylogenies; an approach utilizing the bootstrap. Evolution 39:783-791
Frohlich, M. W., and E. M. Meyerowitz. 1997. The search for homeotic gene homologs in basal angiosperms and Gnetales: a potential new source of data on the evolutionary origin of flowers. Int. J. Plant Sci. 158:S131-S142
Frohlich, M. W., and D. S. Parker. 2000. The mostly male theory of flower evolutionary origins. Syst. Bot. 25:155-170
Henikoff, S., and J. G. Henikoff. 1993. Performance evaluation of amino-acid substitution matrices. Proteins Struct. Funct. Genet. 17:49-61
Huelsenbeck, J. P., D. M. Hillis, and R. Neilson. 1996. A likelihood ratio test for monophyly. Syst. Biol. 45:546-558
Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37:221-244
Wilkinson, M. 1998. Split support and split conflict randomization tests in phylogenetic inference. Syst. Biol. 47:673-695