Finding the needle in the haystack: determining genetic variations associated with complex diseases

Dr. Maria C. Dunford

CEO & Founder at Lifebit

Published: October 16, 2019

There are 3,000,000,000 base pairs in the human genome, which define our individual code. About 99% of these are identical in all humans; only the remaining 1% or so can help explain our differences. Finding these differences, or single nucleotide polymorphisms (SNPs), is the equivalent of looking for a needle in a haystack. SNPs arise at random and can be harmless, or they can be correlated with specific diseases or traits.

Genome-wide association studies (GWAS) have gained traction since the mid-2000s: over 7,800 studies have been performed, facilitating the detection of over 159,200 unique ‘needles’ - SNP-trait associations - all archived in EMBL-EBI’s GWAS Catalog.

The basic premise behind GWAS is to compare the genetic variations between two cohorts: the first is a set of individuals with a specific disease or trait being studied, and the second is the control group. Essentially, researchers are able to identify genetic variations associated with a particular disease.
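The case/control comparison described above can be illustrated for a single SNP. The sketch below is a minimal example rather than the method used by any particular GWAS pipeline: it assumes a simple allelic chi-square test on a 2x2 table of allele counts, and the function name and the counts are hypothetical. A real GWAS repeats a test like this across millions of variants and must also correct for multiple testing and population structure.

```python
import math


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi-square statistic, p-value, odds ratio).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_cases = case_alt + case_ref
    row_ctrls = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    denominator = row_cases * row_ctrls * col_alt * col_ref
    if denominator == 0:
        raise ValueError("every row and column of the table must be non-empty")

    # Shortcut form of the Pearson statistic for a 2x2 table.
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / denominator
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    cross = case_ref * ctrl_alt
    odds_ratio = (case_alt * ctrl_ref) / cross if cross else float("inf")
    return chi2, p_value, odds_ratio


# Hypothetical counts: 1,000 cases and 1,000 controls, two alleles each.
chi2, p_value, odds_ratio = allelic_test(600, 1400, 500, 1500)
print(f"chi2={chi2:.2f}  p={p_value:.2e}  OR={odds_ratio:.2f}")
```

An odds ratio above 1 means the alternative allele is more common in cases than in controls; the p-value indicates how surprising that difference would be if the SNP had no association with the trait.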

Phenome-wide association studies (PheWAS) are the complementary ‘inverse’ approach to GWAS: they examine many different phenotypes to see which are associated with a given genetic variant. The fundamental difference between the two methods is the direction of inference - in GWAS it is from outcome to exposure, while in PheWAS it is from exposure to outcome.

GWAS enabled a major shift in the way we link genotype to phenotype & map disease traits

GWAS and PheWAS have proven to be the most powerful ways of revealing associations between genetic variations and phenotypic outcomes. Before such methods were implemented, researchers relied on relatively low-resolution approaches (e.g. linkage mapping) or on more targeted approaches in which candidate genes were resequenced in cohorts of interest. Post-GWAS is an entirely different story - we now have a high-resolution and unbiased approach to unearthing regions or genes that may not previously have been on our radar.

GWAS and PheWAS analyses are performed for the following three main purposes:

Providing potential targets for therapy by highlighting underlying molecular pathways,
Identifying markers used to predict individual disease risk or phenotypic trait (Polygenic Risk Score estimations), and
Cohort analysis for better patient stratification and clinical trials.

GWAS have successfully contributed to our understanding of disease mechanisms, with the GWAS poster child being age-related macular degeneration (AMD), the leading cause of irreversible vision loss in individuals over the age of 60. GWAS unambiguously revealed that the complement pathway, which was not previously known to play a role in the disease, is involved. There are many other examples of GWAS successes, including type 2 diabetes, schizophrenia and autoimmune diseases.

GWAS results are highly reproducible and extremely powerful as they enable valid predictions, or polygenic risk scores (PRS), in new unexplored datasets. PRS allow the stratification of a population based on the sum of trait-associated SNPs weighted by their effect sizes. Essentially, such risk scores provide an overall measure of an individual’s genetic liability to develop disease - regarded by many as the holy grail. Large-scale initiatives, such as the UK 100,000 Genomes Project, are ideal databases for GWAS/PheWAS analysis and determining PRS - the bigger the dataset, the more variant-trait associations can be identified.
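To make "the sum of trait-associated SNPs weighted by their effect sizes" concrete, here is a minimal sketch of a polygenic risk score calculation. The SNP identifiers, effect sizes and individuals are hypothetical, and treating a missing genotype as a dosage of 0 is a simplifying assumption; production tools handle missing data, strand alignment and linkage disequilibrium far more carefully.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies) for one individual.

    Assumption for this sketch: a SNP missing from `dosages` counts as 0 copies.
    """
    return sum(beta * dosages.get(snp, 0) for snp, beta in effect_sizes.items())


def rank_by_score(cohort, effect_sizes):
    """Return (person, score) pairs sorted from highest to lowest score."""
    scores = {
        person: polygenic_risk_score(dosages, effect_sizes)
        for person, dosages in cohort.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-SNP effect sizes, as would be taken from GWAS summary statistics.
EFFECT_SIZES = {"snp_a": 0.20, "snp_b": -0.10, "snp_c": 0.05}

# Hypothetical genotypes: number of copies of the effect allele per SNP.
COHORT = {
    "person_1": {"snp_a": 2, "snp_b": 0, "snp_c": 1},
    "person_2": {"snp_a": 0, "snp_b": 2, "snp_c": 0},
    "person_3": {"snp_a": 1, "snp_b": 1, "snp_c": 2},
}

for person, score in rank_by_score(COHORT, EFFECT_SIZES):
    print(f"{person}: {score:.2f}")
```

Sorting individuals by score is the simplest form of the stratification mentioned above; in practice, scores are usually standardised against a reference population before they are interpreted.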

GWAS & PheWAS analysis challenges - the criticism & controversy

GWAS and PheWAS have allowed the broad characterisation of the genetic basis of traits and diseases; however, several challenges remain. Let’s address the elephant(s) in the room.

1. Extrapolating findings to other populations

Although GWAS and PheWAS have been revelatory in many respects, they are still predominantly focused on European populations (88% of studies in 2017), with 72% of discoveries coming from participants recruited in three countries (US, UK, Iceland).

This is especially concerning for the translatability of polygenic predictions from one population to another. If researchers were to develop a polygenic risk score for having a heart attack from such data, for instance, the scores would be far less reliable when applied to any population other than white Europeans. If GWAS is to be a truly equitable and useful tool to predict disease risk, studies need to be repeated in more diverse populations.

Recently, there has been a significant push to include other ethnic groups and admixed populations as there is a pressing need to extrapolate findings to non-European populations and to increase the statistical power of these studies.

Furthermore, the lack of diversity should raise red flags, especially when performing GWAS and PheWAS analyses to improve patient stratification for clinical trials. Since these analyses can be performed in cohorts as small as 250 individuals (depending on the number of SNPs tested and effect size), findings may not be easily translatable to other patient groups, as ancestry admixture plays a significant role.

2. Scaling analysis to accommodate growing cohort sizes

Besides genetic diversity and careful cohort selection being recognised challenges in the field, the importance of cohort sizes has become a critical factor in assuring the statistical power of findings.

Early GWAS and PheWAS studies did not reveal many correlations mainly due to the small cohort sizes. Smaller sample sizes do not generate enough statistical power to find associations linking markers and phenotypes (this also depends a lot on the genetic architecture of the trait(s) you are focusing on - oligogenic vs polygenic).
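The link between cohort size and statistical power can be sketched with a rough calculation. The example below is an approximation under stated assumptions, not a replacement for a dedicated power calculator: it assumes a two-sided two-proportion z-test on allele frequencies, two alleles per individual, and the conventional genome-wide significance threshold of 5e-8. The function name and the allele frequencies are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(freq_cases, freq_controls, n_cases, n_controls, alpha=5e-8):
    """Approximate power to detect an allele-frequency difference between groups.

    Uses a normal approximation to a two-sided two-proportion z-test,
    counting two alleles per individual.
    """
    normal = NormalDist()
    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls
    standard_error = math.sqrt(
        freq_cases * (1 - freq_cases) / alleles_cases
        + freq_controls * (1 - freq_controls) / alleles_controls
    )
    # Critical value for a two-sided test at significance level alpha.
    z_critical = -normal.inv_cdf(alpha / 2)
    z_effect = abs(freq_cases - freq_controls) / standard_error
    return normal.cdf(z_effect - z_critical) + normal.cdf(-z_effect - z_critical)


# Hypothetical variant: 32% allele frequency in cases versus 30% in controls.
for n in (500, 5_000, 50_000):
    print(f"{n:>6} cases + {n:>6} controls -> power ~ {approx_power(0.32, 0.30, n, n):.4f}")
```

With a small frequency difference like this one, a few hundred cases give almost no chance of reaching genome-wide significance, while tens of thousands make detection very likely, which is why early studies found so few associations.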

But as public and private datasets grow in size and complexity (e.g. UK Biobank, or even commercial databases such as 23andMe), researchers now have access to treasure troves of genetic data, allowing them to enhance the quality of their cohorts and, by extension, the significance of their results. Case in point - a recent GWAS published in Nature Genetics used data from 1.1 million individuals to assess their adventurousness and willingness to take risks.

However, as GWAS and PheWAS continue to evolve in complexity and more data is being analysed, we now face the issue of scaling analysis. Specifically, when we’re now talking about hundreds of thousands to millions of individuals in a single study, we need to take into account how computationally challenging and expensive that can turn out to be.

3. Uniting disconnected data

Often, high-quality GWAS and PheWAS grow beyond the capacity of a single institution - which is surely even more the case now, considering the 1M+ sample sizes. This requires researchers to unite disconnected data pulled from different public and/or private data repositories.

Uniting sensitive and disconnected data, however, can prove challenging and a major security risk if you have to physically transfer or share it. Researchers need a practical and efficient way to do so, without having to put their data at risk… or their sanity.

Overcoming GWAS & PheWAS challenges by getting analysis on-demand

In an ideal world of GWAS and PheWAS analyses, cohorts would include a large number of individuals, datasets would be diverse in terms of ethnicity and the genetic and phenotypic data would be centralised in one big public database. Effectively, this would eliminate computational issues and researchers could easily run and scale the analysis on-demand in the fastest, most scalable and cost-effective way.

One can always dream, right?

Well, there are specific ways for researchers to overcome challenges associated with GWAS and PheWAS analyses.

To accommodate large cohorts, researchers should turn to the cloud for infinitely scalable compute resources. At Lifebit, we have developed cloud-native optimised implementations of both GWAS and PheWAS workflows, which are freely accessible on demand in the Lifebit CloudOS platform. These state-of-the-art standardised and reproducible pipelines harness the power of the cloud for elastic resource provisioning while keeping costs low (> 80% cloud cost reductions) through cost-saving instances and deployment efficiency.

Besides improving scalability and reining in costs, researchers should be able to perform GWAS and PheWAS analyses on distributed data from private and public datasets. At Lifebit, we have created the only federated data analysis platform - the Lifebit CloudOS platform - which allows researchers to access disconnected data without having to deal with the inefficiencies of transferring or copying large volumes of data. This provides an ideal foundation for a globally fragmented and distributed GWAS community which stores data in countless isolated databases around the world.
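One simple way to picture analysis without moving individual-level data is for each site to run the association test locally and share only summary statistics. The sketch below is an illustration of that general idea and is not a description of how Lifebit CloudOS works: it assumes a fixed-effect, inverse-variance-weighted meta-analysis, and the function name and the per-site effect estimates are hypothetical.

```python
import math


def fixed_effect_meta(site_results):
    """Combine per-site (beta, standard_error) pairs by inverse-variance weighting.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    if not site_results:
        raise ValueError("at least one site result is required")
    # Each site is weighted by the inverse of its variance.
    weights = [1 / se ** 2 for _, se in site_results]
    total_weight = sum(weights)
    pooled_beta = (
        sum(w * beta for w, (beta, _) in zip(weights, site_results)) / total_weight
    )
    pooled_se = math.sqrt(1 / total_weight)
    z_score = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value


# Hypothetical effect estimates for one SNP, computed separately at two sites.
sites = [(0.10, 0.05), (0.14, 0.05)]
beta, se, p = fixed_effect_meta(sites)
print(f"pooled beta={beta:.3f}  se={se:.4f}  p={p:.2e}")
```

Only two numbers per SNP leave each site, so no genotypes or phenotypes are copied between repositories, and the pooled standard error is smaller than that of either site on its own.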

By using Lifebit CloudOS technology for your GWAS and PheWAS analyses, you will be able to:

Infinitely scale your GWAS & PheWAS analysis to study ever-increasing cohort sizes
Access public data and combine it with your private data for federated analysis, allowing you to cover more ground without having to transfer massive datasets
Run 100% reproducible, compliant and FAIR GWAS and PheWAS pipelines - optimised for cloud-native usage that ensures maximum scale, speed and cost-minimisation
Minimise and monitor costs and runtimes, ensuring you stay within your research budget, and
Provide intuitive visualisation of your GWAS results

By robustly developing end-to-end GWAS and PheWAS workflows, researchers will be able to avoid ad-hoc analyses, and embrace scalable and reproducible bioinformatics workflows in production with the CloudOS platform.

Try it!

Try Lifebit's cloud-native GWAS & PheWAS workflows today

If you are curious to see what our GWAS and PheWAS workflows are capable of achieving, check out the ones we have already run through our public jobs links: GWAS & PheWAS.

Have your own data to test drive? Run our scalable GWAS and PheWAS workflows on the CloudOS platform.

This post was originally published on Lifebit's Blog on October 16th, 2019 by Dr. Maria Chatzou Dunford.
