Propping Up Darwin's Tree of Lie
A valiant effort to construct Darwin’s tree icon in an open-source way may only serve to perpetuate a myth.
What do evolutionists do to look busy like scientists? They stitch together leaves on branches. That is, they assume Darwin’s view of universal common ancestry (the “tree of life” image), then try to find ancestor-descendant relationships between “leaves” (observable species) on the tips of branches here or there. One group may try to unite the marsupials. Another group may try to unite the slime molds. A third may try to unify bats by common ancestry. This “tree building” exercise, called phylogenetics, assumes that all the branches connect to a common, single root.
Yet nobody has tried to connect all the branches into a single tree—till now. The new project, however, depends on the veracity of earlier published trees, which are usually controversial. In fact, published phylogenies often point out severe conflicts, alternative possible trees, or confounding influences like horizontal gene transfer (HGT) or “convergent evolution.” When assembling them together, the overall tree cannot be better than the branches. So is the result a tree of life or a tree of lie?
In PNAS this week, 22 evolutionists from across America published the first draft of their “Open Tree Taxonomy” (OTT) graph, which Astrobiology Magazine likens to a “Wikipedia” for evolutionary history (not that Wikipedia makes for a good analogy; see ENV). Their goal is to begin the process of constructing the full Darwinian tree out of the isolated phylogenies already published, and then to provide a digital framework that can be updated as research into the tree of life continues:
Scientists have used gene sequences and morphological data to construct tens of thousands of evolutionary trees that describe the evolutionary history of animals, plants, and microbes. This study is the first, to our knowledge, to apply an efficient and automated process for assembling published trees into a complete tree of life. This tree and the underlying data are available to browse and download from the Internet, facilitating subsequent analyses that require evolutionary trees. The tree can be easily updated with newly published data.
Astrobiology Magazine tries to justify this activity as having practical value beyond just giving Charles Darwin a good name.
Evolutionary trees, branching diagrams that often look like a cross between a candelabra and a subway map, aren’t just for figuring out whether aardvarks are more closely related to moles or manatees, or pinpointing a slime mold’s closest cousins. Understanding how the millions of species on Earth are related to one another helps scientists discover new drugs, increase crop and livestock yields, and trace the origins and spread of infectious diseases such as HIV, Ebola and influenza.
The statement ignores the question of whether phylogeny is the only way to reap these rewards; certainly many of them were practiced before Darwin’s day (and since) using other methods and assumptions. A skeptical reading of the paper, however, uncovers many points where error can creep in. Here are some excerpts:
- Starting ignorance: “Despite decades of effort and thousands of phylogenetic studies on diverse clades, we lack a comprehensive tree of life, or even a summary of our current knowledge.”
- Questionable gap filling: “When source phylogenies are absent or sparsely sampled, taxonomic hierarchies provide structure and completeness.” Again, “Taxonomies contribute to the structure only where we do not have phylogenetic trees.” What if both are reliant on the assumption of universal common ancestry?
- First step, wrong direction? “Although a massive undertaking in its own right, this draft tree of life represents only a first step.”
- Data destruction: “585,081 of the names are classified as nonphylogenetic units (e.g., incertae sedis [of uncertain placement]) and were therefore not included in the synthesis pipeline.” Sometimes the treasure is in the junk.
- Selective inclusion: “The complete database contains 6,810 trees from 3,062 studies. At the time of publication, 484 studies in our database are incorporated into the draft tree of life.” That’s about 16% of the studies, and roughly 7% of the trees if each study contributed a single tree. The database averages more than two trees per study, implying that many of the studies presented two or more conflicting phylogenies from the same data. In addition, they couldn’t use analog phylogenies that were presented, for instance, as printed diagrams in papers.
- Prejudice: “Our goal is to generate a best estimate of phylogenetic knowledge; based on our tests, we give several reasons not to use all available trees for synthesis. First, including trees that are incorrect does not improve the synthetic estimate.” Who decides what is best? Who decides which published trees are incorrect?
- Consensus is not science: “In each major clade, expert curators selected and ranked input trees for inclusion based on date of publication, underlying data, and methods of inference (see Materials and Methods for details). These rankings generally reflect community consensus about phylogenetic hypotheses.” Is date a criterion of correctness? Who decides if the data are good? Who decides what methods are best? The ranking is guaranteed to reinforce the current consensus, which assumes universal common descent. Skeptics and mavericks are eliminated at the starting line by the “experts”—who ranked them?
- Rigging the game: “Not all trees are sufficiently well-curated; at this point, we have focused curation efforts on trees that will most improve the synthetic tree.” The data are not speaking objectively; they’ve guaranteed what they set out to find.
- Let others do the rigor: “The full set of trees in the database is important for other questions such as estimating conflict or studying the history of inference in a clade, highlighting the importance of continued deposition and curation of trees into public data repositories.” It’s good they keep all the data public, but who will really check it?
- Heavily biased inputs: “Most tips in the synthetic tree (98%) come from taxonomy only, reflecting both the need to incorporate more species into phylogenies and the need to make published phylogenies available.” This means only 2% of the tips are placed by the highly advertised phylogenetic methods of inferring ancestry.
- Heavily biased outputs: “We obtained trees from digital repositories and also by contacting authors directly, but our overall success rate was only 16%.” In Materials and Methods, they say: “The data retrieved are by no means a complete representation of phylogenetic knowledge because we obtained digital phylogeny files for only 16% of recently published trees.”
- [It goes without saying that no input skeptical of universal common ancestry was included.]
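The proportions behind the “Selective inclusion” bullet can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted from the paper; the variable names are made up for illustration, and treating the 484 rooted phylogenies loaded into the graph as one tree per study is an assumption, not something the excerpts state.

```python
# Figures quoted in the excerpts from the paper.
TREES_IN_DATABASE = 6810
STUDIES_IN_DATABASE = 3062
STUDIES_IN_DRAFT = 484

# Assumption: one tree per incorporated study, matching the
# "484 rooted phylogenies" the paper says were loaded into the graph.
TREES_IN_DRAFT = 484


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


studies_share = share(STUDIES_IN_DRAFT, STUDIES_IN_DATABASE)
trees_share = share(TREES_IN_DRAFT, TREES_IN_DATABASE)
trees_per_study = round(TREES_IN_DATABASE / STUDIES_IN_DATABASE, 2)

print(f"Studies used in the draft: {studies_share}%")
print(f"Trees used in the draft:   {trees_share}%")
print(f"Average trees per study:   {trees_per_study}")
```

On these numbers, well under a fifth of the curated studies, and a smaller fraction of the curated trees, made it into the draft.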
So after chucking the “bad” trees (as determined by the “experts”), and eliminating data not considered helpful to phylogenetic synthesis, they came up with a draft tree based on a tiny minority of possible inputs. Fortunately, they strived for transparency about conflicts:
We constructed a tree alignment graph, the graph of life, by loading the Open Tree Taxonomy and the 484 rooted phylogenies into a neo4j database. The graph of life contains 2,339,460 leaf nodes (after excluding nonphylogenetic units from OTT), plus 229,801 internal nodes. It preserves conflict among phylogenies and between phylogenies and the taxonomy. To create the synthetic tree, we traversed the graph, resolving conflict based on the rank of inputs, and labeled accepted branches that trace a synthetic tree summarizing the source information. This method allows for clear communication of how conflicts are resolved through ranking, and of the source trees and/or taxonomies that support a particular resolution.
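To see what “resolving conflict based on the rank of inputs” can mean in practice, here is a toy sketch. It is not the authors’ pipeline or their neo4j graph traversal; it is a simple greedy stand-in that assumes each source is reduced to a list of clades (sets of tip names), and every name in it (`synthesize`, `study_A`, `archaeon_1`, and so on) is hypothetical.

```python
def compatible(a, b):
    """Two clades fit in one tree if they are nested or share no tips."""
    return a <= b or b <= a or a.isdisjoint(b)


def synthesize(ranked_sources):
    """Greedy rank-based synthesis (illustrative only).

    ranked_sources: list of (source_name, clades), highest rank first.
    Returns (accepted, rejected): accepted maps each kept clade to the
    source that supplied it; rejected records each losing clade with
    its own source and the higher-ranked source it clashed with.
    """
    accepted = {}
    rejected = []
    for name, clades in ranked_sources:
        for clade in clades:
            if clade in accepted:
                continue  # already supplied by a higher-ranked source
            clash = next((c for c in accepted if not compatible(clade, c)), None)
            if clash is None:
                accepted[clade] = name
            else:
                rejected.append((clade, name, accepted[clash]))
    return accepted, rejected


# Hypothetical inputs: two studies disagree, and taxonomy ranks last.
ranked = [
    ("study_A", [frozenset({"archaeon_1", "archaeon_2"})]),
    ("study_B", [frozenset({"archaeon_2", "eukaryote"})]),
    ("taxonomy", [frozenset({"archaeon_1", "archaeon_2"}),
                  frozenset({"bacterium_1", "bacterium_2"})]),
]

accepted, rejected = synthesize(ranked)
for clade, source in accepted.items():
    print(sorted(clade), "accepted from", source)
for clade, source, winner in rejected:
    print(sorted(clade), "from", source, "rejected; outranked by", winner)
```

In this toy run the lower-ranked study loses wherever it disagrees with the higher-ranked one, and taxonomy only fills in where no study has spoken. Whoever assigns the ranks decides the outcome.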
Transparency is good, but the official result is likely to be the one everybody focuses on (like Astrobiology Magazine, which showcased the draft tree below the headline). As such, a published diagram can take on a life of its own, becoming the emblem of consensus (visualization). Changes to the draft are likely to be slight, made by biologists working on one small branch here or there. Even if the authors invite editing and admit their limitations, who will have the courage to scrap the tree entirely and start over?
This tree is comprehensive in terms of named species, but it is far from complete in terms of biodiversity or phylogenetic knowledge. It does not aim to infer novel phylogenetic relationships, but instead is a summary of published and digitally available phylogenetic knowledge.
That knowledge, however, was highly selective, even by their own admission. It was rigged to support a consensus from the outset.
Even so, the resulting tree was plagued with conflicts. A look at Figure 3 in their paper shows multiple possible linkages between groups, such that one could call it a Network of Life as much as a Tree of Life. That’s what “reticulation” means:
The Open Tree of Life contains areas with conflict (Fig. 3). For example, the monophyly of Archaea is contentious—some data-store trees indicate that eukaryotes are embedded within Archaea rather than a separate clade. Similarly, multiple resolutions of early diverging animal and Eukaryotic lineages have been proposed. Reticulations help visualize competing hypotheses, gene tree/species tree conflicts, and underlying processes, such as horizontal gene transfer (HGT), recombination, and hybridization, which have had major impacts throughout the tree of life [e.g., hybridization in diverse clades of green plants and animal lineages, including our own, and HGT in bacteria and archaea]. The graphical synthesis approach used here naturally allows for storage of conflict and non–treelike structure, enabling downstream visualization, analysis, and annotation of conflict (Fig. 3) and highlighting the need for additional work in this area.
What the positivist hand gave, in other words, the empirical hand took away. There isn’t a tree. There is a summary draft of one, criss-crossed by numerous points of conflict, resulting in a network diagram, not a tree diagram. That’s why, at the end of the paper, one wonders if Darwin’s tree icon is even real. A vocal minority of evolutionary biologists has severe doubts about it (2/01/07, 4/01/10); not long ago, the staunchly pro-evolutionary magazine New Scientist declared that “Darwin was wrong” about the tree of life (1/22/09).
In a final part of the Discussion section called “Dark parts of the tree” the authors admit to major gaps:
Hyperdiverse, poorly understood groups, including Fungi, microbial eukaryotes, Bacteria, and Archaea, are not yet well-represented in input taxonomies. Our effort also highlights where major research is needed to achieve a better understanding of existing biodiversity.
The Materials and Methods section recounts much of the subjective pruning applied by the “expert” curators who decided which data to include to get their deficient, conflicted, biased “expert” tree.
Update 9/22/15: Elizabeth Pennisi went overboard in her praise of this draft tree. Writing for Science Magazine, she presented their diagram as gospel truth.
Want to know how related you are to a wombat? Or an amoeba? Now you can, thanks to the newly released Open Tree of Life, which knits together more than 500 family trees of various groups of organisms to create a supertree with 2.3 million species.
Pennisi presents none of the empirical or philosophical problems with this “cool visualization” but does offer a couple of quizzical statements that cast doubt on its value: “We hope the tree looks much different a year from now,” the project coordinator said. Another one said that if you gather your own data, “you can make your own tree of life.” Does that mean this tree is wrong? Is anyone allowed to make a bush of life, a web of life, or a network of life? Why does it have to be a tree? Jonathan Wells thinks the history of life more closely resembles a lawn.
Care to guess what was citation #1 in the references? Here’s a hint from the first sentence in the paper: “The realization that all organisms on Earth are related by common descent (1) was one of the most profound insights in scientific history.” You got it; (1) refers to Darwin’s Origin of Species. This whole effort was a paean of homage to King Charles. An offering acceptable to the Bearded Buddha was guaranteed from the first sentence; these devotees took the coins of the offerers and melted them down into a crown to put on their idol’s head.
Most of the phylogenetic trees included in this futile exercise use software that is programmed to get a tree whether one exists or not. Few readers will remember the important paper from 7/25/02 that proved it’s mathematically impossible to get a reliable tree of life from the available data. They said, “one is forced to admit that no future civilization will ever build a computer capable of solving the problem while guaranteeing that the optimal solution has been found.” The only workaround is to program shortcuts: algorithms that assume evolution!
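The quoted remark is presumably about the size of the search space. As a rough illustration, and not the cited paper’s own calculation, the standard combinatorial count of distinct rooted, fully bifurcating trees on n labeled tips is the double factorial (2n − 3)!!. The function name below is made up for this sketch.

```python
def rooted_tree_count(n_tips):
    """Number of rooted binary trees on n labeled tips: (2n - 3)!!."""
    if n_tips < 2:
        raise ValueError("need at least two tips")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n_tips - 2, 2):
        count *= k
    return count


for n in (3, 4, 5, 10, 20):
    print(f"{n:>2} tips: {rooted_tree_count(n):,} possible rooted trees")

# Number of decimal digits in the count for a mere 100 tips.
print("digits for 100 tips:", len(str(rooted_tree_count(100))))
```

Ten tips already allow more than 34 million candidate trees, which is why tree-building programs fall back on heuristic searches rather than checking every possibility.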
In short, this incestuous practice of Darwinian tree-building begins with Darwinian assumptions, proceeds with Darwinian assumptions, and (not surprisingly) ends up with Darwinian conclusions. Let’s recall our acronyms from the Darwin Dictionary. They were all illustrated by this paper.
GIGO: Garbage in, garbage out.
DIDO: Darwin in, Darwin out.
GIDO: Garbage in, Darwin out.
DIGO: Darwin in, garbage out.
DODO: Darwin only, Darwin only.
Want access to a genuine Tree of Life? Read this.