Maximum parsimony (phylogenetics)

In phylogenetics, maximum parsimony is an optimality criterion under which the phylogenetic tree that minimizes the total number of character-state changes is to be preferred. Under the maximum-parsimony criterion, the optimal tree will minimize the amount of homoplasy (i.e., convergent evolution, parallel evolution, and evolutionary reversals). In other words, under this criterion, the shortest possible tree that explains the data is considered best. Some of the basic ideas behind maximum parsimony were presented by James S. Farris [1] in 1970 and Walter M. Fitch in 1971.[2]

Maximum parsimony is an intuitive and simple criterion, and it is popular for this reason. However, although it is easy to score a phylogenetic tree (by counting the number of character-state changes), there is no algorithm to quickly generate the most-parsimonious tree. Instead, the most-parsimonious tree must be found in "tree space" (i.e., amongst all possible trees). For a small number of taxa (i.e., fewer than nine) it is possible to do an exhaustive search, in which every possible tree is scored, and the best one is selected. For nine to twenty taxa, it will generally be preferable to use branch-and-bound, which is also guaranteed to return the best tree. For greater numbers of taxa, a heuristic search must be performed.

Because the most-parsimonious tree is always the shortest possible tree, this means that—in comparison to the "true" tree that actually describes the evolutionary history of the organisms under study—the "best" tree according to the maximum-parsimony criterion will often underestimate the actual evolutionary change that has occurred. In addition, maximum parsimony is not statistically consistent. That is, it is not guaranteed to produce the true tree with high probability, given sufficient data. As demonstrated in 1978 by Joe Felsenstein,[3] maximum parsimony can be inconsistent under certain conditions, such as long-branch attraction. Of course, any phylogenetic algorithm could also be statistically inconsistent if the model it employs to estimate the preferred tree does not accurately match the way that evolution occurred in that clade. This is unknowable. Therefore, while statistical consistency is an interesting theoretical property, it lies outside the realm of testability, and is irrelevant to empirical phylogenetic studies.[4]