Breakthrough: Second Genetic Code Revealed
It’s sometimes difficult to assess the impact of a scientific paper when it is first published, but one that came out on the cover of Nature today has the potential to rival the discovery of the genetic code. The leading science journal reported the discovery of a second genetic code – the “code within the code” – that has just been cracked by molecular biologists and computer scientists. Moreover, they used information technology – not evolutionary theory – to figure it out.
The new code is called the Splicing Code. It lives embedded within the DNA. It tells the primary genetic code, in very complex but now predictable ways, how and when to assemble genes and regulatory elements. Cracking this code-within-a-code is helping elucidate several long-standing mysteries about genetics that emerged from the Human Genome Project: Why are there only 20,000 genes for an organism as complex as a human being? (Scientists had expected far more.) Why are genes broken up into segments (called exons), separated by non-coding elements (called introns), and then spliced together after transcription? And why are genes turned on in some cells and tissues, but not in others? For two decades molecular biologists have been trying to figure out the mechanisms of genetic regulation. This important paper represents a milestone in understanding what goes on. It doesn’t answer all the questions, but it shows that an inner code exists – a communication system that can be deciphered so clearly that the scientists could predict what the genome would do in certain situations with uncanny accuracy.
Imagine hearing an orchestra in an adjacent room. You open the door and look inside, and find just three or four musicians producing all that sound. That’s what co-discoverer Brendan Frey said the human genome is like. We could only find 20,000 genes, but we knew that a vast array of protein products and regulatory elements were being produced. How? One method is alternative splicing. Different exons (gene elements) can be assembled together in different ways. “For example, three neurexin genes can generate over 3,000 genetic messages that help control the wiring of the brain,” Frey said. The paper explains right off the bat that 95% of our genes are known to have alternative splicing, and in most cases, the transcripts are expressed differently in different cell and tissue types. Something must control how those thousands of combinations are assembled and expressed. That’s the job of the Splicing Code.
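To get a feel for how a few genes can yield many messages, here is a minimal sketch in Python. It is a toy model, not anything from the paper: it assumes each optional ("cassette") exon is included or skipped independently, which real splicing does not guarantee, and the exon names and the `splice_variants` function are made up for illustration.

```python
from itertools import product


def splice_variants(first_exon, cassette_exons, last_exon):
    """Enumerate every transcript obtained by keeping or skipping each cassette exon.

    Toy assumption: every cassette exon is an independent include/skip choice,
    and the first and last exons are always present.
    """
    variants = []
    for choices in product([False, True], repeat=len(cassette_exons)):
        kept = [exon for exon, keep in zip(cassette_exons, choices) if keep]
        variants.append([first_exon] + kept + [last_exon])
    return variants


# Hypothetical gene with three optional exons between two fixed ones.
small_gene = splice_variants("E1", ["E2", "E3", "E4"], "E5")
print(len(small_gene))  # 2**3 = 8 possible transcripts

# Twelve optional exons already allow thousands of distinct messages.
optional = [f"X{i}" for i in range(1, 13)]
large_gene = splice_variants("E1", optional, "E_last")
print(len(large_gene))  # 2**12 = 4096
```

The count doubles with every optional exon, so the number of possible messages grows far faster than the number of genes. Deciding which of those combinations actually gets made in a given tissue is the part the splicing code addresses.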
Readers wanting a quick overview can read the Science Daily article, “Researchers Crack ‘Splicing Code,’ Solve a Mystery Underlying Biological Complexity.” It says, “Researchers at the University of Toronto have discovered a fundamentally new view of how living cells use a limited number of genes to generate enormously complex organs such as the brain.” In Nature itself, Heidi Ledford led off with an article called “The code within the code.”1 Tejedor and Valcárcel followed with “Gene regulation: Breaking the second genetic code.”2 Then the main dish was the paper by the University of Toronto team led by Benjamin J. Blencowe and Brendan J. Frey, “Deciphering the splicing code.”3
The paper is a triumph of information science that sounds reminiscent of the days of the World War II codebreakers. Their methods included algebra, geometry, probability theory, vector calculus, information theory, code optimization, and other advanced methods. One thing they had no need of was evolutionary theory, which was never mentioned in the paper.4 Their abstract reverberates with the dramatic tension of a rousing overture:
Here we describe the assembly of a ‘splicing code’, which uses combinations of hundreds of RNA features to predict tissue-dependent changes in alternative splicing for thousands of exons. The code determines new classes of splicing patterns, identifies distinct regulatory programs in different tissues, and identifies mutation-verified regulatory sequences. Widespread regulatory strategies are revealed, including the use of unexpectedly large combinations of features, the establishment of low exon inclusion levels that are overcome by features in specific tissues, the appearance of features deeper into introns than previously appreciated, and the modulation of splice variant levels by transcript structure characteristics. The code detected a class of exons whose inclusion silences expression in adult tissues by activating nonsense-mediated messenger RNA decay, but whose exclusion promotes expression during embryogenesis. The code facilitates the discovery and detailed characterization of regulated alternative splicing events on a genome-wide scale.
The interdisciplinary team that cracked the code consists of specialists from the Department of Electrical and Computer Engineering as well as the Department of Molecular Genetics – and Frey works for Microsoft Research. Like the codebreakers of old, Frey and Barash developed “a new computer-assisted biological analysis method that finds ‘codewords’ hidden within the genome.” Taking vast amounts of data generated by the molecular geneticists, the group “reverse-engineered” the splicing code until they could predict how it would act. Once they got a handle on it, they tested it with mutations, and watched exons get inserted or deleted as they predicted. They found that the code can even cause tissue-specific changes, or act differently when the mouse is an embryo or an adult. One gene, Xpo4, is implicated in cancer; they noted that “These findings support the conclusion that Xpo4 expression must be tightly controlled such that it is active during embryogenesis but downregulated in adult tissues, to avoid possible deleterious consequences including oncogenesis” (cancer). It appears they were quite astonished at the level of control they were witnessing. Intentionally or not, Frey used the language of intelligent design – not that of random variation and selection – as the key to their approach: “Understanding a complex biological system is like understanding a complex electronic circuit.”
Heidi Ledford said that the apparent simplicity of the Watson-Crick genetic code – with its four bases, triplet codons, 20 amino acids and 64 DNA “words” – conceals a universe of complexity beneath the surface.1 The Splicing Code-within-the-code is much more complex:
But between DNA and proteins comes RNA, and an expanding realm of complexity. RNA is a shape-shifter, sometimes carrying genetic messages and sometimes regulating them, adopting a multitude of structures that can affect its function. In a paper published in this issue (see page 53), a team of researchers led by Benjamin Blencowe and Brendan Frey of the University of Toronto in Ontario, Canada, reports the first attempt to define a second genetic code: one that predicts how segments of messenger RNA transcribed from a given gene can be mixed and matched to yield multiple products in different tissues, a process called alternative splicing. This time there is no simple table – in its place are algorithms that combine more than 200 different features of DNA with predictions of RNA structure.
The work highlights the rapid progress that computational methods have made in modelling the RNA landscape. In addition to understanding alternative splicing, informatics is helping researchers to predict RNA structures, and to identify the targets of small regulatory snippets of RNA that do not encode protein. “It’s an exciting time,” says Christopher Burge, a computational biologist at the Massachusetts Institute of Technology in Cambridge. “There’s going to be a lot of progress in the next few years.”
Informatics – computational biology – algorithms and codes – such concepts were never a part of Darwin’s vocabulary as he developed his theory. Mendel had a vastly oversimplified computational model of how traits could be sorted out during inheritance, but even then, the idea that traits were encoded awaited discovery till 1953. Now we see that the original genetic code is itself subject to an even more complex embedded code. These are revolutionary ideas. And there are indications of even further levels of control. For instance, RNA and proteins have a three-dimensional structure, Ledford reminds us. The functions of the molecules can change when the shape changes. Something must control the folding so that the 3-D structure performs as required for function. And then the access to genes appears to be regulated by another code, the histone code, that is encoded by molecular markers or “tails” on the histone proteins that serve as nuclei for DNA coiling and supercoiling. Ledford spoke of an “ongoing renaissance in RNA informatics” characterizing our time.
Tejedor and Valcárcel agreed with the complexity concealed by the simplicity.2 “At face value, it all sounds simple: DNA makes RNA, which then makes protein,” they began. “But the reality is much more complex.” We learned in the 1950s that the basic genetic code is shared by all living organisms from bacteria to humans. But it soon became apparent that there was a bizarre, counter-intuitive feature in complex organisms (eukaryotes): their genomes were interrupted by introns that had to be snipped out so that the exons could be spliced together. Why? Now the fog is lifting: “An advantage of this mechanism is that it allows different cells to choose alternative means of pre-mRNA splicing and thus generates diverse messages from a single gene,” they explained. “The variant mRNAs can then encode different proteins with distinct functions.” You get more information out of less code – provided you have a code-within-the-code that knows how to do it.
What makes cracking the splicing code so difficult is that the choice of which exons get assembled is determined by multiple factors: sequences adjacent to the exon boundaries, sequences in the exons, sequences in the introns, and regulatory factors that either assist or inhibit the splicing machinery. Not only that, “the effects of a particular sequence or factor can vary depending on its location relative to the intron–exon boundaries or other regulatory motifs,” Tejedor and Valcárcel explained. “The challenge, therefore, is to compute the algebra of a myriad of sequence motifs, and the mutual relationships between the regulatory factors that recognize them, to predict tissue-specific splicing.”
To solve the puzzle, the team fed the computer huge amounts of data on RNA sequences and the conditions under which they formed. “The computer was then asked to identify the combination of features that could best explain the experimentally determined tissue-specific selection of exons.” In other words, they reverse-engineered the code. Like WWII codebreakers, once they knew the algorithm, they could make predictions: “It correctly identified alternative exons, and predicted their differential regulation between pairs of tissue types with considerable accuracy.” And like a good scientific theory, the discovery led to new insights: “This allows reinterpretation of the function of previously defined regulatory motifs and suggests previously unknown properties of known regulators as well as unexpected functional links between them,” they said. “For instance, the code inferred that the inclusion of exons that lead to truncated proteins is a common mechanism of gene-expression control during the transition between embryonic and adult tissues.”
Tejedor and Valcárcel see the publication of the paper as an important first step: “revealing the first piece of a much larger Rosetta Stone required to interpret the alternative messages of our genomes.” Future work will undoubtedly improve our knowledge of this new code, they said. In their ending, they referred to evolution briefly in a curious way: not to say that evolution produced these codes, but that progress will require understanding how codes interact. Another surprise is that the limited conservation seen so far raises the possibility of “species-specific codes”:
The code is likely to work in a cell-autonomous manner and, consequently, may need to account for more than 200 cell types in mammals. It will also have to deal with the extensive diversity of alternative-splicing patterns beyond simple decisions of single exon inclusion or skipping. The limited evolutionary conservation of alternative-splicing regulation (estimated to be around 20% between humans and mice) opens up the question of species-specific codes. Moreover, coupling between RNA processing and gene transcription influences alternative splicing, and recent data implicate the packing of DNA with histone proteins and histone covalent modifications – the epigenetic code – in the regulation of splicing. The interplay between the histone and the splicing codes will therefore need to be accurately formulated in future approaches. The same applies to the still poorly understood influence of complex RNA structures on alternative splicing.
Codes, codes, and more codes. The near silence about Darwinism in any of these papers suggests that old-school evolutionary theorists will have a lot to ponder after reading these papers. Meanwhile, those excited about the biology of codes will be on the cutting edge. They can play with a cool web tool the codebreakers created to stimulate further research. It can be found at the University of Toronto site, called WASP – “Website for Alternative Splicing Prediction.” Visitors will look in vain for anything about evolution here, despite the old maxim that nothing in biology makes sense without it. A new version for the 2010s might read, “Nothing in biology makes sense except in the light of informatics.”
1. Heidi Ledford, “The code within the code,” Nature 465, 16-17 (06 May 2010) | doi:10.1038/465016a.
2. J. Ramón Tejedor and Juan Valcárcel, “Gene regulation: Breaking the second genetic code,” Nature 465, 44-46 (06 May 2010) | doi:10.1038/465045a.
3. Yoseph Barash, John A. Calarco, Weijun Gao, Qun Pan, Xinchen Wang, Ofer Shai, Benjamin J. Blencowe and Brendan J. Frey, “Deciphering the splicing code,” Nature 465, 53-59 (06 May 2010) | doi:10.1038/nature09000.
4. “Conservation” information is mentioned several times, but refers only to a measure of sequence similarity between species, e.g., between mice and human genomes. Conservation does not have evolutionary significance without begging the question of evolution.
We are happy to bring you this story on the day of its release. It may be one of the really big science papers of the year, or decade. It could be Nobel Prize material. (Any great discovery is, of course, surrounded by the work of many other teams, as was the work of Watson and Crick.) What more can we add as commentary other than, “Wow”? This is amazing confirmation of design, and a huge challenge to the Darwin Empire. It will be interesting to see how they try to recast their simplistic 19th-century story of random mutation and natural selection in light of this. Did you catch what Tejedor and Valcárcel said? Species may have their own “species-specific” codes. “The interplay between the histone [epigenetic] and splicing codes will therefore need to be accurately formulated in future approaches,” they said. Being translated, that means: “Darwinists need not apply. You don’t have the skills to handle this.” If the plain-old Watson-Crick genetic code was a challenge to Darwinism, how now with the Splicing Code generating thousands of transcripts out of the same genes? On top of that, the Epigenetic Code controls the context of gene expression. It may well be that the arrangement of chromosomes inside the nucleus plays a kind of coding role in the regulation of gene expression, too. Who knows what other codes are involved in this incredible “interplay” we have only begun to read, like a Rosetta Stone just beginning to poke above the sand?
Now that we are thinking codes and informatics, all kinds of new research paradigms come to mind. What if the genome acts partly like a storage area network? What if there is cryptography going on, or compression algorithms? We should be thinking advanced information systems and storage technologies. Maybe we will find some steganography even. Undoubtedly there are additional robustness mechanisms, like backups and retrieval – perhaps that helps explain pseudogenes. Whole-genome duplications may be responses to stress; other anomalies may be due to antivirus activity. Some of these trails may prove useful markers for historical events that have nothing to do with universal common ancestry, but open up comparative genomics in terms of informatics and design for robustness, and the understanding of disease.
The end of it all is an organism functioning in the world. Think of a tiger: strong, sleek, unified, with a coat of fur marked with stripes, eyes and ears pointed forward, stalking through the jungle, equipped with all the muscles and bones and senses and behaviors it needs to live through the days and nights of a planet orbiting a star. Above it are birds flying through the canopy. Below are snails and small reptiles and ants. Fish are darting in the river. Hundreds of species of plants each know how to send their roots down and their leaves up and when to produce their flowers and fruits. A team of human scientists carries their video cameras into the jungle hoping to remotely trigger them and capture footage of the elusive tiger. In the soil below and within and without all these other organisms, trillions of microbes are functioning in their microcosmos. All this is happening because of molecular codes translating and regulating chemistry into directed function. What philosophers in ancient times could possibly have known that this level of computational, information-based complexity undergirded the stuff of life? We are the generation blessed to discover these realities.
Darwinists are in for rough going ahead. The discoverers tried mutating the Splicing Code and got cancers and mistakes. How are you going to navigate a fitness landscape now, when it is a minefield of catastrophes waiting to happen when one starts mucking with all these intertwined codes? We know there is some built-in robustness and tolerance, but the picture emerging is a highly-complex, engineered, optimized informatics system – not a random arrangement of parts that can be endlessly tinkered with. The whole idea of code is an intelligent design concept. A. E. Wilder-Smith used to emphasize this. A code implies a convention between two parties. Convention – coming together – is an agreement in advance. It implies planning and purpose. The symbol SOS, he would say, we use by convention as a sign of distress. SOS does not look like distress. It does not smell like distress. It does not feel like distress. Nobody would know it means distress unless they understand the convention. In the same way, the codon for alanine, GCC, does not look, smell or feel like alanine. It would have nothing to do with alanine unless there were a pre-planned convention between two coding systems – the protein code and the DNA code – that “GCC shall mean alanine.” To convey that convention, a family of translators, the aminoacyl-tRNA-synthetases, are employed to translate one code into the other.
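The codon-to-amino-acid mapping described above behaves like a lookup table, which a few lines of Python can illustrate. This is only a sketch: the table holds just the four alanine codons from the standard genetic code (written as DNA), not the full code, and the names `ALANINE_CODONS` and `translate_codon` are invented for the example.

```python
from itertools import product

BASES = "ACGT"

# All three-letter "words" that can be spelled with four bases: 4**3 = 64.
ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Excerpt of the standard genetic code: the four DNA codons for alanine.
# A complete table would assign all 64 codons to amino acids or stop signals.
ALANINE_CODONS = {
    "GCT": "alanine",
    "GCC": "alanine",
    "GCA": "alanine",
    "GCG": "alanine",
}


def translate_codon(codon):
    """Look up a codon in the excerpted table; return None if it is not listed."""
    return ALANINE_CODONS.get(codon.upper())


print(len(ALL_CODONS))         # 64
print(translate_codon("GCC"))  # alanine
print(translate_codon("ATG"))  # None, because this excerpt omits that codon
```

Nothing in the string "GCC" resembles alanine; the association lives entirely in the table, and in the cell that role is played by the aminoacyl-tRNA synthetases mentioned above.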
That should have nailed the case for design in the 1950s, and many creationists preached it effectively. But the evolutionists are like fast-talking salesmen. They wove their just-so stories about Tinker Bell zapping the code and creating new species by mutation and selection and convinced many that miracles can still happen. OK, well now it’s 2010 and we have the Epigenetic Code and the Splicing Code, two codes much more complex and dynamic than the simple DNA code. We have codes within codes, codes above and below codes – a hierarchy of codes. They can’t just stick their finger in the pistol this time and bluff their way out of it with smooth talking now, not with cannons to the left of them and cannons to the right of them, a whole arsenal aimed at their vital parts. This is a game changer. The informatics age has grown around them and they are has-beens, like pike-thrusting Greeks facing modern tanks and helicopters.
Sad to say, they don’t realize it; or if they do, they have no intention of conceding. In fact, some of the worst and most vicious, intolerant and hateful anti-creationist, anti-design rhetoric in recent memory has been pouring forth from the Darwin-controlled journals and newspapers this week, right when the Splicing Code paper was being published. Some examples will be forthcoming. And as long as they have the microphones and control the institutions, many people are going to be misled into thinking they still have the high ground in science. We bring you this material for you to read, study, understand, and arm yourselves with the information you need to combat bigoted bluffing blather with truth. Now go do it.