Darwinian Phylogenetic Tools Are Mathematically Flawed
Many evolutionists use software tools to construct evolutionary trees from genetic data. Two mathematicians have just reported in Science¹ that several popular “tree-building” algorithms can give misleading results:
Markov chain Monte Carlo (MCMC) algorithms play a critical role in the Bayesian approach to phylogenetic inference. We present a theoretical analysis of the rate of convergence of many of the widely used Markov chains. For N characters generated from a uniform mixture of two trees, we prove that the Markov chains take an exponentially long (in N) number of iterations to converge to the posterior distribution. Nevertheless, the likelihood plots for sample runs of the Markov chains deceivingly suggest that the chains converge rapidly to a unique tree. Our results rely on novel mathematical understanding of the log-likelihood function on the space of phylogenetic trees. The practical implications of our work are that Bayesian MCMC methods can be misleading when the data are generated from a mixture of trees. Thus, in cases of data containing potentially conflicting phylogenetic signals, phylogenetic reconstruction should be performed separately on each signal. (Emphasis added in all quotes.)
Will this workaround cure all problems, though? Perhaps, but only for small data sets. The more data there are, the more intractable the task becomes:
For small trees one can hope to overcome the slow convergence by using multiple starting states. However, mixtures coming from large trees may contain multiple species subsets where one tree has T1 as an induced subtree and the other has T2. If there are k such subsets, then about 15^k random starting points will be needed. Thus if there are 10 disagreement subsets, then 15^10 random starting points will be needed in order to sample from the posterior distribution.
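The quoted estimate is 15 raised to the power k, so it grows exponentially with the number of disagreement subsets. A minimal sketch of the arithmetic follows; the function name `starting_points_needed` is only illustrative and does not come from the paper, and the base of 15 is taken directly from the quote.

```python
# Base of 15 starting points per disagreement subset, as stated in the quote.
STARTS_PER_SUBSET = 15


def starting_points_needed(k: int) -> int:
    """Approximate number of random starting points for k disagreement subsets."""
    return STARTS_PER_SUBSET ** k


# Show how quickly the count grows as k increases.
for k in (1, 2, 5, 10):
    print(f"k = {k:2d}: {starting_points_needed(k):,}")
```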
That’s over 576 billion. Most tree-building programs try to take shortcuts around the computational hurdles, but these mathematicians are not sure that the heuristic algorithms used are successful in avoiding assumptions that could bias the results. Their paper has proven one way the results can be misleading. Are there others?
In our setting, BMCMC [Bayesian Markov-Chain Monte Carlo] methods fail in a clearly demonstrable manner. We expect that there is a more general class of mixtures where BMCMC methods fail in more subtle ways. These subtle failures may occur for many real-world examples where the Markov chains quickly converge to some distribution other than the desired posterior distribution. Users of BMCMC methods should ideally avoid mixture distributions that are known to produce degenerate behavior in various phylogenetic settings. A good practice is to decompose the data into nonconflicting signals and perform phylogenetic reconstruction separately on each signal. Our work highlights important unresolved questions: how to verify homogeneity of genomic data and what phylogenetic methods can efficiently deal with mixtures.
Thus, they leave some potential gaping loopholes unexplored.
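The failure the quotes describe is a chain that looks settled while it explores only one of two competing peaks. It can be illustrated with a toy sampler. This is not the authors' construction and does not operate on phylogenetic trees. It is a hypothetical one-dimensional target with two equally tall peaks (at 5 and 15), where the assumed parameter `n_chars` stands in for the amount of data and deepens the valley between the peaks.

```python
import math
import random


def log_density(x: int, n_chars: int) -> float:
    """Toy log-density on integers 0..20 with equal peaks at x=5 and x=15."""
    return -n_chars * min(abs(x - 5), abs(x - 15))


def metropolis(start: int, steps: int, n_chars: int, seed: int = 0) -> list[int]:
    """Plain Metropolis sampler with +/-1 proposals on the toy target."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(steps):
        proposal = x + rng.choice((-1, 1))
        if 0 <= proposal <= 20:  # proposals outside the range are rejected
            log_ratio = log_density(proposal, n_chars) - log_density(x, n_chars)
            # Accept uphill moves always, downhill moves with prob exp(log_ratio).
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                x = proposal
        samples.append(x)
    return samples


samples = metropolis(start=3, steps=10_000, n_chars=50)
tail = samples[1000:]  # discard early steps as burn-in
print("log-density at final state:", log_density(samples[-1], 50))
print("share of samples near peak at 5: ", sum(s < 10 for s in tail) / len(tail))
print("share of samples near peak at 15:", sum(s >= 10 for s in tail) / len(tail))
```

The target is symmetric, so a correct sample would split its time about evenly between the two peaks. With a valley this deep, the chain should climb to the nearest peak within a few steps and stay there. Its log-density trace then looks flat and "converged" even though half of the distribution is never visited.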
¹ Mossel and Vigoda, “Phylogenetic MCMC Algorithms Are Misleading on Mixtures of Trees,” Science, Vol. 309, Issue 5744, pp. 2207-2209, 30 September 2005 (DOI: 10.1126/science.1115493).
What this seems to say is that the method might work on closely related organisms, like species of snails, but the more different types of organisms you mix into a tree of common ancestry, the more misleading the results of these popular methods become. Even with trees of closely related organisms, though, how can one be sure that the answers will not “fail in more subtle ways”? And how do we know that once the smaller trees are assembled, the algorithms won’t mislead horrendously in the final mix?
Most creationists would probably not have qualms about trees of closely-related “kinds” of animals, like cats for one, or dogs for another. It is the Darwinian assumption that everything is phylogenetically related – cats, pine trees, bacteria, sharks, petunias, turtles, mushrooms, senators – that causes the controversies.
Evolutionists often showcase the printouts from these programs in their scientific papers to lend an air of computational legitimacy to their theories (see the fallacy of statistics). Well, we warned you that evolutionists are bad at math (08/19/2005, 07/25/2002). The only illustration in Darwin’s Origin of Species was a phylogenetic tree. Since then, tree-building has become a favorite pastime around the Darwin Temple gamerooms (10/22/2001, 06/13/2003). Impressive as the charts look to the uninformed, they hawk symbolism over substance. This fits Hawkins’ Theory of Scientific Progress.
After reading this article, and the links to previous ones, how do you feel about that NSF Tree of Life project costing $12 million in tax dollars (10/30/2002)? If you want a better Tree of Life, try God’s: it’s free, it’s honest, and you don’t have to play Monte Carlo to find it.