Celebrate ENCODE III and the demise of ‘Junk DNA’
An international genome survey finds that vast quantities of non-coding DNA
are transcribed and probably functional, not ‘junk’ as evolutionists had thought
by Margaret Helder, PhD
It is no secret among biologists that the ENCODE II pronouncements of 2012 were controversial. Scientists in the international ENCODE consortium stated that 80% of the human genetic material was “functional.”1 But for the past couple of decades the term functional had become associated with evolutionary concepts. The prevailing idea was that the DNA sequences in the cell were mostly not functional but rather made up of “junk DNA,” which testified to the evolutionary origin of the cell’s genetic material. So the ENCODE II team found themselves very unpopular with many fellow biologists for suggesting a high level of functionality in the genome. Now, however, in 2020, a third ENCODE project has been published. Would it build on the conclusions of the ENCODE II team, or would it revise the estimates of functionality drastically downward?
It all began in the year 2000, when President Bill Clinton and British Prime Minister Tony Blair jointly announced that an initial stage of the Human Genome Project was complete. Since 1990 an international team of scientists had been working to determine the precise order of nucleotides in the chromosomes of human cells. The guiding idea was that one gene provides the correct order of nucleotides to code for one protein. Proteins are elaborate molecules that make up most components of living cells, and trillions of cells make up our bodies. The expectation was that the Human Genome Project would provide information on about 100,000 different genes.
Imagine everyone’s surprise when scientists announced that there were only about 23,000 (or fewer) genes in the more than 3 billion nucleotides of the human genome. This meant that less than 2% of the human genome coded for proteins! What was the rest of that huge collection of noncoding nucleotides doing? The finding seemed to indicate that the vast majority of human DNA had no apparent function. According to Ewan Birney, head of the European Bioinformatics Institute in Cambridge, England, people always knew that there must be some regulation of the genes, but even if the amount of useful information were doubled relative to what is in the genes, it would still amount to at most 8-9% of the 3.2 billion nucleotides.
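The percentages above are simple ratios of coding nucleotides to genome size. The sketch below makes that arithmetic explicit. It uses the text's figures of 23,000 genes and 3.2 billion nucleotides, plus a hypothetical average coding length of 1,500 nucleotides per gene. That last figure is an illustrative assumption, not a number from the sources cited here. The second function asks how long the average coding sequence could be before the coding share reached 2%.

```python
GENOME_SIZE = 3_200_000_000   # nucleotides in the human genome (figure from the text)
GENE_COUNT = 23_000           # approximate protein-coding gene count (figure from the text)

# Hypothetical, for illustration only: average length of the protein-coding
# portion of one gene, in nucleotides.
ASSUMED_AVG_CODING_NT = 1_500


def coding_fraction(gene_count: int, avg_coding_nt: int, genome_size: int) -> float:
    """Fraction of the genome that codes for protein under the given averages."""
    return gene_count * avg_coding_nt / genome_size


def max_avg_coding_length(fraction: float, gene_count: int, genome_size: int) -> float:
    """Largest average coding length per gene compatible with a given genome fraction."""
    return fraction * genome_size / gene_count


frac = coding_fraction(GENE_COUNT, ASSUMED_AVG_CODING_NT, GENOME_SIZE)
print(f"coding fraction: {frac:.2%}")  # 1.08%

limit = max_avg_coding_length(0.02, GENE_COUNT, GENOME_SIZE)
print(f"2% ceiling: about {limit:,.0f} coding nucleotides per gene")  # about 2,783
```

Under these assumptions, the "less than 2%" statement holds as long as the average gene contributes fewer than roughly 2,800 coding nucleotides.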
The Leftover Junk Evolutionary View
Two responses to this astonishing discovery were possible. One was to declare that the human genome was full of “junk DNA.” According to this point of view, the genome was obviously the product of long periods during which DNA that had originally been useful became broken, so that other sequences had to take over control. Huge regions of repeated sequences seemed to confirm the idea of useless DNA. The human genome was declared to be stark confirmation of the reality of evolution and a rejection of creation. What creator would leave so much junk lying around? This is how junk DNA with no function came to be equated with arguments for evolution. The other response, taken by another group, was to ask what the rest of the genome was doing. Were these lengthy sequences functional, and if so, what were they actually doing?
The Search for Function Approach
Driven by curiosity about the nature of the genome, an international consortium of scientists from 10 countries began a systematic survey of 1% of the human genome in 2003. Whatever lay in that 1% stretch of DNA was studied, whether it was a gene (coding for a protein) or a noncoding sequence of nucleotides. Such a study should give a representative indication of what was happening in the rest of the genome as well. In 2007 the ENCODE (ENCyclopedia of DNA Elements) consortium released its results. To the surprise of many, most of the DNA studied seemed to have a function, or at least it was copied (transcribed) into RNA molecules in the cell. The scientists thus concluded: “the simple view of the genome as having a defined set of isolated loci [genes] transcribed independently does not seem to be accurate.”2
The initial ENCODE report was interesting enough to encourage the U.S.-based National Human Genome Research Institute to fund a study of the whole human genome. As a result, in 2012 a new, larger ENCODE consortium published its results. In summary, they found that:
The vast desert regions have now been populated with hundreds of thousands of features that contribute to gene regulation. And every cell type uses different combinations and permutations of these features to generate its unique biology.3
This was the major bombshell of the ENCODE II study, one already hinted at in the first report: a repudiation of the concept of “junk DNA.” Many biologists were deeply incensed at the claim that: “These data enabled us to assign biochemical functions for 80% of the genome, in particular outside the well-studied protein-coding regions.”4 Commentary in the same issue reinforced the point: “Why evolution would maintain large amounts of ‘useless’ DNA had remained a mystery, and seemed wasteful. It turns out, however, that there are good reasons to keep this DNA.”5
A large number of other scientists in the field of DNA studies then went on record objecting to every aspect of the ENCODE study. For a start, they did not like the systematic nature of the research. In a systematic study, investigators approach the research without expectations as to what they will find. As a result, every observation is equally welcome, since they have no preconceptions. Many scientists, however, prefer hypothesis-driven projects, in which the investigator asks a specific question. Details outside the scope of that question will not necessarily be observed. Some prominent scientists declared that the ENCODE consortium erred in not asking an evolution-based question.
Among the chorus of negative comments on ENCODE was that of Sean Eddy (presently at Harvard), who wrote in a post titled “ENCODE Says What?” on his blog Cryptogenomicon:
Personally, I don’t think we can understand genomes unless we try to recognize all the different, noisy, neutral evolutionary processes that work in them.6
But the person who became best known for his attacks on ENCODE was Dan Graur. Concerning the 80% functional claim of the consortium for the human genome, Dan Graur and colleagues declared:
Progress in understanding the functional significance of DNA sequences can only be achieved by not ignoring evolutionary principles.7
As an alternative, Graur and his colleagues argued that about 10% of the genome is functional.8 Their article bore the provocative title “On the immortality of television sets: ‘Function’ in the human genome according to the evolution-free gospel of ENCODE.”9
Counter-Arguments and More Clues
Some members of the beleaguered ENCODE consortium replied to the attacks in an article in Proceedings of the National Academy of Sciences. They reviewed “the strengths and limitations of biochemical, evolutionary and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and biological implications of these discrepancies.”10 But they did not retreat from their previous conclusions. Instead, they referred to “pervasive activity over an unexpectedly large fraction of the genome, including noncoding and nonconserved regions and repeat elements. Such results greatly increase upper bound estimates of functional sequences.”11
First, however, the ENCODE II consortium’s reply pointed to another bombshell connected to their report: since 2005, important genetic markers connected with specific diseases had increasingly been identified in the noncoding part of the genome (that is, the part not associated with genes, supposedly representing junk DNA). Many of these identifications came from genome-wide association studies (GWAS). As the ability to obtain sequences of entire genomes from more and more people improved, specialists began to look for markers that were more common in people with a specific disease (or condition) than in others who lacked this trait. And surprise, surprise:
More recently, genome-wide association studies have indicated that a majority of trait-associated loci [locations], including ones that contribute to human diseases and susceptibility, also lie outside the protein-coding regions. These findings suggest that the noncoding regions of the human genome harbor a rich array of functionally significant elements with diverse regulatory and other functions.12
The scientists had discovered that a great deal of important activity was going on in noncoding regions of the genome. If disease markers are found away from genes, something important must be going on in these noncoding regions. If those regions were unimportant, a change in their order of nucleotides would not matter one way or the other. But it obviously did matter.
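Comparing markers in people with a condition against people without it is, at heart, a statistical test. The sketch below shows a simple allelic chi-square test on a 2×2 table of allele counts. The counts are invented for illustration and come from no real study. Actual GWAS pipelines typically use regression models with covariates, so treat this as a simplified stand-in. The 5×10⁻⁸ cutoff is the conventional genome-wide significance threshold, which allows for the very large number of markers tested.

```python
import math


def allelic_chi_square(case_risk, case_other, ctrl_risk, ctrl_other):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele-count table."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    numerator = (a * d - b * c) ** 2 * n
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def chi_square_1df_p_value(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    # If Z is standard normal, Z**2 follows chi-square(1), so P(Z**2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds of carrying the marker allele in cases relative to controls."""
    return (case_risk * ctrl_other) / (case_other * ctrl_risk)


# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles per person.
# Order: (case marker, case other, control marker, control other)
counts = (1200, 800, 1000, 1000)
GENOME_WIDE_THRESHOLD = 5e-8  # conventional GWAS significance cutoff

stat = allelic_chi_square(*counts)
p = chi_square_1df_p_value(stat)
print(f"chi2 = {stat:.2f}, OR = {odds_ratio(*counts):.2f}")  # chi2 = 40.40, OR = 1.50
print(f"p = {p:.1e}, genome-wide significant: {p < GENOME_WIDE_THRESHOLD}")
```

The test only says a marker is statistically associated with the trait. It does not say whether the marker lies inside or outside a gene. That is a separate question answered by comparing the marker’s position with genome annotations.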
Sources of Genetic Disease in Mutated Non-Coding DNA
When scientists first began to study the human genome in detail, the expectation had been that many genetically determined diseases would be explained by studying the relevant protein and the gene that codes for it. This is certainly true for some diseases. For example, sickle cell anemia is caused by a change in a single nucleotide (the 17th, in the traditional numbering that skips the start codon) among the 441 nucleotides that code for the beta globin chain of hemoglobin. This one mistake alters the shape of the whole molecule, and the results can be catastrophic for people who possess the mutation. The hope was that the causes of most or all genetically caused diseases or developmental abnormalities would be found through such studies. However, the results were discouraging until scientists began to carry out GWAS on large populations. These studies allowed them to move beyond examining specific genes and to search the whole genome for where the relevant mutations actually occur.
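The nucleotide arithmetic in the sickle cell example can be checked in a few lines. The sketch assumes traditional numbering, in which codon 1 is the first codon after the start codon. It relies only on the well-established fact that the normal codon 6 is GAG (glutamic acid) and the sickle form is GTG (valine). The codon table is deliberately partial, and the helper names are hypothetical.

```python
# Partial codon table: only the two codons needed for this example.
CODON_TABLE = {"GAG": "Glu", "GTG": "Val"}


def locate(nucleotide_position):
    """Map a 1-based nucleotide position in a coding sequence to
    (codon number, position within that codon), both 1-based."""
    index = nucleotide_position - 1
    return index // 3 + 1, index % 3 + 1


def mutate_codon(codon, position_in_codon, new_base):
    """Return the codon with the base at a 1-based position replaced."""
    i = position_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_number, pos = locate(17)
normal = "GAG"
sickle = mutate_codon(normal, pos, "T")
print(codon_number, pos)                                # 6 2
print(normal, "->", sickle)                             # GAG -> GTG
print(CODON_TABLE[normal], "->", CODON_TABLE[sickle])   # Glu -> Val
print(441 // 3, "codons in 441 nucleotides")            # 147 codons in 441 nucleotides
```

If the start codon is counted as codon 1, the same change falls at nucleotide 20 and codon 7 (`locate(20)` returns `(7, 2)`), which is how modern variant nomenclature writes it.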
ENCODE III Sheds More Light
These two bombshells, a high proportion of functionality throughout the genome and the location of most disease-associated markers in noncoding parts of the genome, are major themes vigorously promulgated in ENCODE III, eight years after ENCODE II.13 On the issue of disease markers, an ENCODE III article declares that “GWAS and cancer genomics studies continue to deposit disease-related sequence variations into public databases, and most of these variants fall into non-coding regions.”14 Reflecting on this fact, one team from the consortium points out that we are now perceiving “a fundamental feature of the genetic architecture of disease that has heretofore, to our knowledge, escaped notice.”15 What we now observe, they say, “has important theoretical and practical implications for understanding both genetic architecture of disease and the problem of connecting genetic signals with their target genes, which is critical for therapeutic translation.”16
These insights demonstrate that there remains much to learn about the genetic causes of many diseases. Apparently, the majority of variants linked to disease lie outside protein-coding sequences. Accordingly, an online overview of the ENCODE Encyclopedia Version 3 declares: “over 80% of the variants reported by the GWAS are in noncoding regions of the genome and the mechanism of how they contribute to disease onset is unknown.”
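A figure such as “over 80% of the variants … are in noncoding regions” comes from a simple overlap calculation. Each reported variant position is checked against the annotated protein-coding intervals. The sketch below uses invented coordinates on a single chromosome, chosen for illustration only. Real analyses draw on full genome annotations and must handle multiple chromosomes, strands and splice sites.

```python
from bisect import bisect_right


def in_coding(position, starts, ends):
    """True if position lies inside one of the sorted, non-overlapping
    inclusive [start, end] intervals described by parallel lists."""
    i = bisect_right(starts, position) - 1  # last interval starting at or before position
    return i >= 0 and position <= ends[i]


def noncoding_fraction(variants, coding_intervals):
    """Share of variant positions that fall outside every coding interval."""
    intervals = sorted(coding_intervals)
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]
    noncoding = [v for v in variants if not in_coding(v, starts, ends)]
    return len(noncoding) / len(variants)


# Hypothetical coding intervals and variant positions (1-based, inclusive).
CODING = [(1_000, 1_150), (5_200, 5_420), (9_800, 9_990)]
VARIANTS = [1_100, 2_500, 3_700, 5_300, 5_400, 8_000, 9_000, 12_000, 15_500, 20_000]

print(f"{noncoding_fraction(VARIANTS, CODING):.0%} of variants are noncoding")  # 70%
```

Binary search over the sorted interval starts keeps each lookup fast even when there are many thousands of coding intervals. The result is only as good as the annotation it is checked against.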
Implications for Function
Although these disease related studies point us to the regulatory (noncoding) sections of the genome, the really critical question is how much of the genome is functional. For a start, it would help if everyone agreed on what the word “functional” means. The ENCODE II people assumed that any DNA sequence that was copied into another molecule must be functional. Dan Graur objected to this assumption, declaring that “Transcription does not equal function.”17 His point was that the mere existence of molecular activity does not necessarily imply that the activity benefits the cell or contributes to its evolutionary success. The molecular product may have been accidentally produced. The ENCODE III team did not back down from the previous positions of the consortium. They continued to defend the view that any nucleotide sequence that leads to a molecular product or to any chemical activity is functional.18
While the ENCODE II team made the tactical error of quoting a high value for functionality in the genome, the new team made no such mistake. In numerous ways, however, they indicated that they were actually extending the ENCODE II conclusions. They declared, “Importantly, although very large numbers of noncoding elements have been defined, the functional annotation of ENCODE-identified elements is still in its infancy.”19 There will be more such revelations to come. They also insist: “we do not claim that the current cCRE [candidate cis-regulatory element] classification scheme reflects the full biological spectrum of regulatory activities encoded in the genome.”20
ENCODE III added to the evidence for function in non-coding DNA with newly discovered sequences, such as a “class of functional sequence elements not previously recognized by ENCODE.”21 The result of this discovery alone is: “These data expand the catalogue of functional elements encoded in the human genome by the addition of a large set of elements that function at the RNA level by interacting with RNA binding proteins.”22 Similarly, they declare: “ENCODE III data increased the number of annotated cis-regulatory elements by nearly 22% compared with ENCODE II.”23 They made similar statements throughout the report, for example under the heading “Dense encoding of regulatory information.”24
Obviously the ENCODE III team was extending the identification of functional elements much further than ENCODE II had. Their summary statement affirms belief that the case for genomic function in non-coding DNA will grow stronger:
It has become apparent that, by virtually any metric, elements that govern transcription, chromatin organization, splicing, and other key aspects of genome control and function are densely encoded in many parts of the human genome sequence.25
The ENCODE III teams did not retreat in the face of pressure from doctrinaire evolutionists. They continued to make their observations and to let the evidence speak for itself. In general, they took a pragmatic approach to the whole issue in the hope that “Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.”26 They felt no obligation to trot out irrelevant evolutionary theories on junk DNA, especially since these have already been falsified by the ENCODE results.
- ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489 #7414: 57-74.
- ENCODE Project Consortium. 2007. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447 #7146: pp. 799- 816. See p. 812.
- Brendan Maher. 2012. The Human Encyclopaedia. Nature 489 #7414 pp. 46-48. See p.
- ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489 p. 57.
- Ines Barroso. 2012. Non-coding but functional. Nature 489 p. 54.
- Sean Eddy. 2012. ENCODE Says What? Cryptogenomicon (blog). September 8, 2012.
- Dan Graur et al. Genome Biol. Evol. 5 (3): p. 587 italics mine.
- Graur et al. 578.
- Genome Biol. Evol. 5 (3): 578-590.
- Manolis Kellis et al. 2014. Defining functional DNA elements in the human genome. PNAS 111 #17: pp. 6131-6138. See p. 6131.
- Kellis et al. 6134.
- Kellis et al. 6131 italics mine.
- ENCODE Project Consortium. 2020. Perspectives on ENCODE. Nature 583 #7818: pp. 693-698. And flagship article: Expanded encyclopaedias of DNA elements in the human and mouse genomes. pp. 699-710.
- Fabian Grubert et al. 2020. Landscape of cohesin-mediated chromatin loops in the human genome. Nature 583 #7818: pp. 737-743. See p. 743.
- Wouter Meuleman et al. 2020. Index and biological spectrum of human DNase I hypersensitive sites. Nature 584 #7820: pp. 244-251. See p. 250.
- Meuleman et al. 251.
- Graur et al. 581.
- ENCODE Project Consortium. 2020. Flagship article p. 699: nucleotide sequences “that specify molecular products (for example, protein coding genes or noncoding RNAs) or biochemical activities with mechanistic roles in gene or genome regulation (for example, transcription promoters or enhancers)” are treated as functional. A similar definition appears in ENCODE II, p. 57.
- ENCODE Project Consortium. 2020. Perspectives on ENCODE p. 697.
- ENCODE Project Consortium. 2020. Flagship article p. 706.
- ENCODE Project Consortium. 2020. Flagship article p. 702.
- Eric L. Van Nostrand et al. 2020. A large-scale binding and functional map of human RNA-binding proteins. Nature 583 #7818: pp. 711-719. See p. 711.
- ENCODE Project Consortium. 2020. Flagship article p. 706.
- Meuleman et al. 247.
- ENCODE Project Consortium. 2020. Flagship article p. 709.
- ENCODE Project Consortium. 2020. Flagship article p. 699.
Margaret Helder completed her education with a Ph.D. in Botany from Western University in London, Ontario (Canada). She was hired as Assistant Professor in Biosciences at Brock University in St. Catharines, Ontario. She moved to Alberta in 1977, and in December 1981 she served as an expert witness for the State of Arkansas during the creation/evolution ‘balanced treatment’ trial. She served as a member of the editorial board of Occasional Papers of the Baraminology Study Group in 2001. She also lectured once or twice a year (upon invitation) in scheduled classes at the University of Alberta (St. Joseph’s College) from 1998 to 2012. Her technical publications include articles in the Canadian Journal of Botany and chapter 19 in Recent Advances in Aquatic Mycology (E. B. Gareth Jones, editor, 1976). Most recently she authored No Christian Silence on Science (2016), which promotes critical evaluation of scientific claims. She is married to John Helder and they have six adult children.