Uncovering the Secrets of the Dark Genome

There were high hopes when the 13-year project to sequence the entire "book of life" encoded within the human genome was deemed "complete" in April 2003. At a cost of roughly $3 billion (£2.5 billion), it was hoped that the Human Genome Project would help cure chronic diseases and shed light on everything that is genetically predetermined about our lives. But even as press conferences to announce the success of this new era of biological understanding were being held, this guide to human life had already produced an unanticipated surprise.




The general consensus at the time was that the majority of the human genome would consist of instructions for making proteins, the molecular workhorses of all living things, which perform an astonishingly wide variety of functions both within and between our cells. It seemed logical that each of the more than 200 different cell types in the human body would require its own genes to perform its essential tasks. It was also believed that the emergence of distinct protein-protein interactions was essential to the development of our species and our mental faculties. (We are the only species that can sequence its own genome, after all.) Instead, it turned out that protein-coding sequences make up less than 2% of the three billion letters of the human genome. The long rows of base pairs in our DNA were found to contain only about 20,000 protein-coding genes. To make sense of the rest, the "Encyclopedia of DNA Elements" project, known as ENCODE, was launched.

“Humans may have as few as 20,000 protein-coding genes, while in comparison worms have around 20,000 and a fruit fly has approximately 13,000.”


The ENCODE project

The goal was to catalogue every nucleotide in the genome that is "doing something." It was the first significant attempt to conduct an in-depth, high-resolution study of the genome's dark matter.

Over five years, it involved more than 440 scientists from 32 different organizations around the world, who investigated 147 different cell types through 1,648 experiments. More than 30 scientific papers reported the findings. Some of the results of the ENCODE project were unexpected, while others followed expectations. To sum up:

  1. A large number of non-protein coding regions were found to control protein-coding genes. These regions also showed a strong association with disease outcomes.
  2. Based on experimental evidence, at least 10,000 highly conserved elements were believed to be involved in regulating protein synthesis.
  3. More than 1,000 new families of functionally distinct RNA secondary structures were reported.
  4. Two million new potential targets for transcription factors were identified.
  5. Pseudogenes, which have been historically mapped to the fossil hotbeds of the genome, seemed to make a lot of non-coding RNAs.
  6. Regulatory controls of gene expression that were carefully crafted millions of years ago still seemed to be active in human cells.


The dark genome, the remaining 98% of our DNA, initially looked like a mysterious jumble of letters with no discernible function or meaning. At first, some geneticists hypothesized that the dark genome was nothing more than junk DNA, or the remains of damaged genes that had long since lost their significance.

Twenty years later, we are beginning to understand the dark genome's function. Its main job seems to be controlling the expression, or decoding, of the genes that make proteins. This regulation is closely tied to epigenetics, the study of how our genes respond to the environmental factors our bodies experience throughout our lives, including diet, stress, pollution, exercise, and sleep patterns. Additionally, the amount of dark DNA tends to increase with organism complexity, and non-coding RNAs (ncRNAs), rather than protein-coding genes, appear to control the genome's regulation.


Non-coding RNAs

Non-coding RNAs (ncRNAs) are a heterogeneous group of RNA transcripts that are not translated into proteins. ncRNAs have been shown to play key regulatory roles in the expression of protein-coding genes, at both the post-transcriptional and translational levels. These RNA molecules can bind to complementary target messenger RNA (mRNA) transcripts and/or DNA nucleotide sequences to control gene expression.
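To make the idea of complementary binding concrete, here is a minimal Python sketch. It is a toy model, not a real target-prediction method: it assumes perfect Watson-Crick pairing (A-U, G-C) across the whole ncRNA, whereas real ncRNA targeting typically tolerates mismatches and depends on much more than sequence. The function names and the example sequences are hypothetical, chosen only for illustration.

```python
from typing import List

# Watson-Crick pairing rules for RNA: A pairs with U, G pairs with C.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the sequence that would pair with `rna` (read 5' to 3')."""
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]


def find_binding_sites(ncrna: str, mrna: str) -> List[int]:
    """Return 0-based start positions in `mrna` that are perfectly
    complementary to `ncrna`. Toy assumption: full-length perfect pairing."""
    target = reverse_complement(ncrna)
    mrna = mrna.upper()
    sites = []
    start = mrna.find(target)
    while start != -1:
        sites.append(start)
        # Continue one position later so overlapping sites are also found.
        start = mrna.find(target, start + 1)
    return sites


# Hypothetical sequences, for illustration only.
sites = find_binding_sites("UAGC", "AAGCUAAAGCUA")
print(sites)
```

In this toy example the ncRNA "UAGC" pairs with the stretch "GCUA", which occurs twice in the made-up mRNA, so two candidate sites are reported.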



ncRNAs are arbitrarily classified by size into small non-coding RNAs (sncRNAs, shorter than 200 nucleotides) and long non-coding RNAs (lncRNAs, longer than 200 nucleotides). Depending on their mechanisms of action and biogenesis, sncRNAs can be further classified into different types, including, among others, endogenous small interfering RNAs (endo-siRNAs), microRNAs (miRNAs), and piwi-interacting RNAs (piRNAs).
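The size-based split can be written as a one-line rule. The short Python sketch below is only an illustration: the function name is hypothetical, and because the text leaves a transcript of exactly 200 nucleotides unassigned, the code assumes it counts as long.

```python
# Size cutoff from the text: sncRNAs are shorter than 200 nucleotides.
SIZE_CUTOFF_NT = 200


def classify_ncrna(length_nt: int) -> str:
    """Classify a non-coding RNA by its length in nucleotides.

    Assumption: a transcript of exactly 200 nt is treated as an lncRNA,
    since the size convention does not specify the boundary case.
    """
    if length_nt <= 0:
        raise ValueError("length_nt must be a positive number of nucleotides")
    return "sncRNA" if length_nt < SIZE_CUTOFF_NT else "lncRNA"


# Illustrative lengths: a miRNA-sized transcript and a much longer one.
print(classify_ncrna(22))
print(classify_ncrna(2000))
```

Note that this rule says nothing about subtype: telling a miRNA from a piRNA depends on biogenesis and mechanism of action, not on length alone.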

Although the functions of all these transcripts were initially unknown, RNA has since been shown to have a very wide range of biological functions. Specifically, RNA transcripts are used as a means of gene regulation in higher eukaryotes, through both cis and trans mechanisms. Research has indicated that up to 20% of these dark DNA regions play a vital role in controlling gene expression by regulating when and where a gene is activated or deactivated. ncRNAs appear to use a number of different mechanisms to regulate gene expression. The expression of ncRNAs itself appears to be regulated throughout the lifetime of an organism; levels of the microRNA miR-18a, for example, decline from the embryonic stage to adulthood. Finally, given that the non-coding part of the genome is far larger than the protein-coding part, the genetic cause of numerous diseases, including neurological and psychiatric disorders, may be related to mutations within ncRNAs.

If you want to learn more, you can watch this interesting video by Henrik Ellinghaus

Translational science, from bench to bedside:

In the field of cancer vaccines, where companies sequence the DNA of a patient's tumor sample to try to identify a suitable target for the immune system to attack, the majority of approaches have focused only on the protein-coding regions of the genome. However, Germany-based biotech CureVac is pioneering an approach that also analyzes the non-protein-coding regions, in the hope of finding a target that can disrupt cancer at its source.

Haya Therapeutics is currently pursuing a drug development program targeting a series of non-coding RNAs that drive scar tissue formation, or fibrosis, in the heart, a process that can lead to heart failure. One of the hopes is that this approach could minimize the side effects that come with many common medicines. The company has raised more than USD 25 million from investors such as Humboldt Fund, Apollo Health Ventures, and Viva BioInnovator.

Nucleome Therapeutics secured £37.5 million, backed by Pfizer, Merck Group KGaA, British Patient Capital, and founding investors Oxford Science Enterprises and Johnson & Johnson. Nucleome’s platform harnesses the capabilities of 3D genome technology and machine learning. The technology reportedly enables the direct linking of genes to diseases and allows for the precise mapping of pathways that can boost drug discovery capabilities.

Recent studies from Rachel Raybould et al. have begun to explore the dark genome and reported large structural variants within known AD risk genes CR1 and ABCA7. There are also regions within INPP5D, IQCK, and HLA, as well as valid AD candidate genes, which contain dark areas not assayed by genome-wide genotyping or short-read sequencing technologies.

The dark genome is exquisitely specific in its activity. There are non-coding RNAs which regulate fibrosis only in the heart, so by drugging them, you have a potentially very safe medicine.

Just to start the conversation, I'd like to leave two questions for you: "Is our genome still evolving over time?" and "Why are there far more switches than protein-coding elements?"



Welcome to the BioBusinesss newsletter. Your source for Biotech and Business news. Feel free to reach out for consulting or sponsorship opportunities.

Are you enjoying the newsletter? Help us make it better by sharing it with your colleagues and friends.

See you in two weeks — Adrian



Adrian, your dedication to the biobusiness field is commendable! It's an exciting and ever-evolving industry with immense potential for innovation and impact. Keep pushing the boundaries of science and medicine through your work, and your contributions will continue to make a significant difference. Your passion is inspiring and your journey is just beginning.

Vasula Premawardhana

Managing Director @ Long Term Alpha Management | Pioneering REIT Fund Management | Capital Market Strategist

1 year ago

Insightful article (and video). "Dark" is the absence of light or rather lack of our capacity to observe and understand its nature. Though we are shedding and acquiring new code as we move along, hard to imagine inefficient natural designs burdened and carrying over 98% "extra luggage" in genetic coding. Energy is efficiently distributed and there's no room for stowaways. Rules of causality says if something's there it's serving a purpose.

Aliza M.Zafar

Biotechnologist | Research Assistant at ICCBS | Genomic Researcher

1 year ago

A much needed informative article with such mind-blowing questions. Thank you for sharing.

Gregory Muhs

Molecular Technologist, ACSP certified with a master's in Bioinformatics. Experienced Molecular Biologist, Bioinformaticist, and Molecular MLT with a strong passion for research. Looking to make connections.

1 year ago

Very well-done summary of the Encode Project!

Pedro Moreno

PhD | Biomedical Research & Innovation | Life Sciences Consultant | Bridging R&D & Business Strategy for impact

1 year ago

Two really thought provoking questions Adrian Rubstein. My take on the second question... one aspect of it (in simplistic terms) is that it could be a way of increasing diversity without expanding the more complex and longer sequences associated to each coding gene. Also faster regulation by smaller sequence sets acting on RNA translation and an added level of diversity/complexity when acting at the epigenetic level. For the first question, and if referring to one persons genome throughout life, I guess if considering again epigenetic changes those can occur and it could be considered an evolution of the genome.
