Human Genome Infinitely More Complex Than Expected
Ten years after the Human Genome Project was completed, now we know: biology is “orders of magnitude” more complicated than scientists expected. So wrote Erika Check Hayden in Nature News March 31 and in the April 1 issue of Nature.1
An air of daunting complexity haunts the article. The Human Genome Project was one of the great scientific investigations of the end of the 20th century. Some compared it to the Manhattan Project or the Apollo program. It used to be tedious, painstaking work to read the sequence of DNA letters. Now, deciphering genomes is a matter of course. But with the rush of data coming from genomes of everything from yeast to Neanderthals, one thing has become clear: “as sequencing and other new technologies spew forth data, the complexity of biology has seemed to grow by orders of magnitude,” Hayden wrote.
A few things were surprisingly simple. Geneticists expected to find 100,000 genes in the human genome; the count is more like 21,000. But with them came a huge surprise in the accessory molecules – transcription factors, small RNAs, regulators – all arranged in dynamic interacting networks that boggle the mind. Hayden compared them to the Mandelbrot set in fractal geometry that unveils deeper levels of complexity the closer you look.
“When we started out, the idea was that signalling pathways were fairly simple and linear,” says Tony Pawson, a cell biologist at the University of Toronto in Ontario. “Now, we appreciate that the signalling information in cells is organized through networks of information rather than simple discrete pathways. It’s infinitely more complex.”
Hayden acknowledged that the “junk DNA” paradigm has been blown to smithereens. “Just one decade of post-genome biology has exploded that view,” she said, speaking of the notion that gene regulation was a straightforward, linear process – genes coding for regulator proteins that control transcription. “Biology’s new glimpse at a universe of non-coding DNA – what used to be called ‘junk’ DNA – has been fascinating and befuddling.” If it’s junk, why would the human body decode 74% to 93% of it? The plethora of small RNAs produced by these non-coding regions, and how they interact with each other and with DNA, was completely unexpected when the project began.
These realizations are dissipating some of the early naïveté of the Human Genome Project. Planners predicted we would “unravel the mysteries behind everything from evolution to disease origins.” Cures for cancer were envisioned. We would trace the path of evolution through the genetic code. That was so 1990s. Joshua Plotkin, a mathematical biologist at the University of Pennsylvania in Philadelphia, said, “Just the sheer existence of these exotic regulators suggests that our understanding about the most basic things – such as how a cell turns on and off – is incredibly naïve.” Leonid Kruglyak, a geneticist at Princeton University in New Jersey, commented on the premature feeling that the data would speak for itself: “There is a certain amount of naivety to the idea that for any process – be it biology or weather prediction or anything else – you can simply take very large amounts of data and run a data-mining program and understand what is going on in a generic way.”
Some are still looking for simple patterns in the complexity. Top-down approaches try to build models where the data points fall into place:
A new discipline – systems biology – was supposed to help scientists make sense of the complexity. The hope was that by cataloguing all the interactions in the p53 network, or in a cell, or between a group of cells, then plugging them into a computational model, biologists would glean insights about how biological systems behaved.
In the heady post-genome years, systems biologists started a long list of projects built on this strategy, attempting to model pieces of biology such as the yeast cell, E. coli, the liver and even the ‘virtual human’. So far, all these attempts have run up against the same roadblock: there is no way to gather all the relevant data about each interaction included in the model.
The p53 network she spoke of is a good example of unexpected complexity. Discovered in 1979, the p53 protein was first thought to be a cancer promoter, then a cancer suppressor. “Few proteins have been studied more than p53,” she said. “…Yet the p53 story has turned out to be immensely more complex than it seemed at first.” She gave some details:
Researchers now know that p53 binds to thousands of sites in DNA, and some of these sites are thousands of base pairs away from any genes. It influences cell growth, death and structure and DNA repair. It also binds to numerous other proteins, which can modify its activity, and these protein–protein interactions can be tuned by the addition of chemical modifiers, such as phosphates and methyl groups. Through a process known as alternative splicing, p53 can take nine different forms, each of which has its own activities and chemical modifiers. Biologists are now realizing that p53 is also involved in processes beyond cancer, such as fertility and very early embryonic development. In fact, it seems wilfully [sic] ignorant to try to understand p53 on its own. Instead, biologists have shifted to studying the p53 network, as depicted in cartoons containing boxes, circles and arrows meant to symbolize its maze of interactions.
Network theory is now a new paradigm that has replaced the one-way linear diagram of gene to RNA to protein. That used to be called the “Central Dogma” of genetics. Now, everything is seen to be dynamic, with promoters and blockers and interactomes, feedback loops, feed-forward processes, and “bafflingly complex signal-transduction pathways.” “The p53 story is just one example of how biologists’ understanding has been reshaped, thanks to genomic-era technologies,” Hayden said. “….That has expanded the universe of known protein interactions – and has dismantled old ideas about signalling ‘pathways’, in which proteins such as p53 would trigger a defined set of downstream consequences.”
Biologists made the common mistake of assuming that more data would bring more understanding. Some continue to work from the bottom up, believing that there is an underlying simplicity that will come to light eventually. “It’s people who complicate things,” remarked one Berkeley researcher. But one scientist who predicted the yeast genome and its interactions would be solved by 2007 has had to put off his target date for a few decades. It’s clear that our understanding remains very rudimentary. Hayden said in conclusion, “the beautiful patterns of biology’s Mandelbrot-like intricacy show few signs of resolving.”
There’s a bright side to the unfolding complexity. Mina Bissell, a cancer researcher at the Lawrence Berkeley National Laboratory in California, confesses she was “driven to despair by predictions that all the mysteries would be solved” by the Human Genome Project. “Famous people would get up and say, ‘We will understand everything after this’,” Hayden quoted her as saying. But it turned out for the good, in a way: “Biology is complex, and that is part of its beauty.”
1. Erika Check Hayden, “Human genome at ten: Life is complicated,” Nature 464, 664-667 (April 1, 2010) | doi:10.1038/464664a.
Who predicted the complexity: the Darwinians or the intelligent design proponents? You already know the answer. The Darwinians have been wrong on this matter time and time again. The origin of life would be simple (the Warm Little Pond of Darwin’s dreams). Protoplasm would be simple. Proteins would be simple. Genetics would be simple (remember Darwin’s pangenes?). The carrier of genetic information would be simple. DNA transcription would be simple (the Central Dogma). The origin of the genetic code would be simple (the RNA World, or Crick’s “frozen accident”). Comparative genomics would be simple, and we would be able to trace the evolution of life in the genes. Life would be littered with the trash of mutations and natural selection (vestigial organs, junk DNA). Simple, simple, simple.
Simple-minded.