How growing consumer demand for ancestry DNA testing is creating new challenges for DTC testing companies

Dr. Maria C. Dunford

CEO & Founder at Lifebit

Published: October 2, 2019

All humans are roughly 99.9% genetically identical. Yet the 0.1% difference hidden within our genomes is enough to inform us about our origins. Ancestry analysis has undeniably become the #1 genetic analysis performed throughout the world. If consumer ancestry testing continues at today’s pace, over 100M individuals will have been tested within the next 24 months.

Let’s cut to the chase and understand why there is so much demand for Ancestry testing and, at the same time, discover some of its applications.

1. Innate human curiosity about our origins

It’s human nature to want to know about our family history and where our ancestors came from. Before DNA sequencing technologies were within consumers’ reach, individuals had to trace their origins through genealogy, which involves scouring many public databases and manually piecing data together. And unless you are part of a royal family, which meticulously archives hundreds of years’ worth of private records, tracking down public documents is tricky because consistent and standardised record-taking was implemented relatively recently.

As such, genetic ancestry testing came as a relief to those intrigued by their genealogy: it’s as easy as purchasing a kit, spitting into a tube and delving into your genetic history!

Besides providing an overview of our origins, Ancestry testing has also led to countless family reunions, from long-lost siblings finding each other to adopted individuals discovering their direct family through Ancestry databases.

Interestingly, Ancestry testing is no longer limited to a report in exchange for your spit: companies have been rolling out eye-catching offers, such as the partnership announced by 23andMe and Airbnb with their Heritage Travel feature, in which 23andMe results now include travel recommendations based on your ancestry.

2. Solving cold cases - Genetic forensics

Ancestry testing has also had a more controversial effect on society, as it has reopened many criminal cases that went cold decades ago - most notoriously, the Golden State Killer was identified with the help of a consumer genetic genealogy database.

Ever since, ‘Family Tree Forensics’ has been used to identify suspects in over 50 murder cases. The most outspoken critics warn, however, that this could be the end of genetic privacy.

3. Drug discovery

The wealth of information gathered from consumer ancestry testing greatly surpasses any current population study efforts undertaken by public consortia. The sheer number of individuals (over 26M) included in direct-to-consumer (DTC) genetic databases significantly increases the power of genealogical algorithms to infer matches between individuals, a phenomenon commonly referred to as the network effect. The power of these massive data networks generated by DTC genetic companies has also extended into groundbreaking research.
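One rough way to see why database size matters so much for matching: the number of pairs of individuals that could in principle be compared grows quadratically with the number of people tested. The sketch below is plain arithmetic, not a description of how any DTC company actually searches for relatives, and the function name is made up for illustration.

```python
def pairwise_comparisons(n):
    """Number of distinct pairs among n individuals: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Database sizes chosen for illustration; 26M is the figure quoted in the text.
for size in (1_000, 1_000_000, 26_000_000):
    print(f"{size:>12,} individuals -> {pairwise_comparisons(size):,} possible pairs")
```

The same quadratic growth is also a hint of why naive all-against-all matching becomes expensive at scale.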

Case in point: the partnership announced by 23andMe and GlaxoSmithKline (GSK) is expected to leverage the 6 billion DNA base pairs amassed by the leading DTC company to power drug discovery research.

4. Patient stratification and cohort selection for better clinical trial outcomes and research

Besides drug discovery, such treasure troves of genetic information also contribute to improved patient stratification - the data-driven matching of patients to the appropriate clinical trials in early drug development initiatives. However, as a community, we must be wary of Eurocentric biases in genome-wide association studies (GWAS), as not all populations will systematically respond in the same way to clinical biomarkers and novel drugs in clinical trials.

The remaining challenges of performing Ancestry analysis

Ancestry testing has revolutionised the way we approach genealogy and our individual ‘stories’, while at the same time, it has made DNA sequencing and genetics popular with the masses. There are still major challenges, however, that impede the seamless integration of such technology at production scale.

1. Population bias - the lack of inclusive reference populations

The heavy skew of population cohorts towards individuals of European ancestry limits how well findings translate to other populations. To date, about 78% of individuals included in GWAS are of European descent, 10% Asian, 2% African and 1% Hispanic, with all other ethnicities representing less than 1%. Individuals from underrepresented ethnicities are more likely to receive inaccurate reports from genetic tests, which can have serious negative repercussions.

The community is well aware of this disparity in population genetic studies, and it is being addressed with GWAS focusing on underrepresented ethnicities. These collaborative efforts are necessary to ensure that genomic research does not perpetuate historical inequalities or diminish the contribution that genomics could make to humanity as a whole, effectively bringing down racial barriers.

2. Complexity in establishing the right analysis workflow to deliver accurate results

The complex nature of Ancestry workflows represents another stumbling block for DTC companies. In order to implement this kind of workflow, providers need to oversee the processing of raw data to yield ethnicity and matching predictions, while at the same time, delivering stellar performance - who wants to hand out inaccurate ancestry reports? Just last year, it was estimated that 40% of variants identified by DTC genetic testing companies were false positives.

Best-in-class Ancestry workflows should implement the most up-to-date reference populations and statistical algorithms for sample imputation (brush up on imputation methods by reading this blog post).

Currently, this is not the case as most commercial applications are a bit outdated and rely on algorithms from a few years ago (remember, this space moves extremely fast!). To be competitive in this space, it is essential to keep up and implement the latest statistical and scientific methods.

3. Scaling Ancestry analysis over millions of individuals’ data

Due to the scale of human genetic data, the ultimate challenge to deliver state-of-the-art Ancestry testing services is figuring out how to scale analysis to meet customer demand, which can sometimes be in the hundreds, thousands or even millions of individuals.

Of course this is a good problem to have, as it means that you’re in business! However, this might hinder the success of your DTC business offering or severely delay your research. Therefore it is essential to carefully plan out the scalability of the analysis. You need to take into account all of the engineering work that will go into your backend to scale in the cloud or on HPC or both, data management, analysis progress and cost monitoring, auditing, and support (both technical and scientific - it’s essential to keep your Ancestry algorithms up-to-date, as previously mentioned).

After having gone through the main challenges associated with offering Ancestry testing, I hope that I haven’t scared you too much! But it is always good to know what you will be facing before rolling up your sleeves.

The best piece of advice I can offer on this subject is to avoid reinventing the wheel at all costs! Doing so will cost you time and money, and your offering will most probably be equivalent to, or worse than, what is already available out there.

Lifebit’s high-performance Ancestry pipeline powers DTC companies & research

This is why at Lifebit, we have developed a new computationally efficient method that infers ancestry by making effective use of existing information about allele frequencies associated with different human populations, and of course by using the most relevant reference genomes to date.
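To give a feel for how allele frequencies can inform ancestry at all, here is a minimal textbook-style sketch. It is not Lifebit’s method, which the post does not describe in detail. It assumes Hardy-Weinberg equilibrium, independent SNPs and a uniform prior over populations; the population labels and frequencies are invented for illustration.

```python
import math

# Hypothetical alternate-allele frequencies per population for three SNPs.
# These values are made up and must lie strictly between 0 and 1.
REFERENCE_FREQS = {
    "POP_A": [0.10, 0.80, 0.30],
    "POP_B": [0.60, 0.20, 0.70],
}

def genotype_log_likelihood(genotypes, freqs):
    """Log P(genotypes | population), assuming Hardy-Weinberg and independent SNPs."""
    total = 0.0
    for g, p in zip(genotypes, freqs):
        # g is the number of alternate alleles (0, 1 or 2): binomial with n = 2.
        total += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
    return total

def population_posteriors(genotypes, reference=REFERENCE_FREQS):
    """Posterior probability of each population under a uniform prior."""
    logs = {pop: genotype_log_likelihood(genotypes, f) for pop, f in reference.items()}
    peak = max(logs.values())  # subtract the maximum for numerical stability
    weights = {pop: math.exp(value - peak) for pop, value in logs.items()}
    norm = sum(weights.values())
    return {pop: w / norm for pop, w in weights.items()}

posteriors = population_posteriors([0, 2, 0])
print(posteriors)
```

Real pipelines work with hundreds of thousands of markers and model admixture rather than assigning a single population, so this should be read only as an illustration of the underlying idea.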

Lifebit’s Ancestry pipeline, used by a number of DTC and biotech companies, utilises reference population data from 27 different sub-populations. Finding good reference populations with many sub-populations can be very difficult (as established above), which is why we introduced a new Ancestry pipeline in the CloudOS Marketplace.

Discover our Ancestry pipeline

Our highly accurate Ancestry pipeline extracts haplogroups, aggregates ancestry estimates and uses the Python Basemap module to create an output PNG map file annotated with the countries and reference files. Furthermore, it outputs a pie chart of ancestry populations and the raw data in JSON format for further analysis and visualisation.
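As a rough illustration of the aggregation and JSON steps, the sketch below rolls sub-population estimates up into broader groups and serialises the result. The labels, the mapping and the output layout are all hypothetical; the actual file format produced by the pipeline is not specified in this post.

```python
import json
from collections import defaultdict

# Hypothetical mapping from sub-population labels to broader groups.
GROUPS = {"SUB_1": "GROUP_X", "SUB_2": "GROUP_X", "SUB_3": "GROUP_Y"}

def aggregate_estimates(sub_estimates, groups=GROUPS):
    """Sum sub-population fractions per group and renormalise so they add up to 1."""
    totals = defaultdict(float)
    for sub, fraction in sub_estimates.items():
        totals[groups[sub]] += fraction
    norm = sum(totals.values())
    return {group: round(value / norm, 4) for group, value in totals.items()}

report = aggregate_estimates({"SUB_1": 0.5, "SUB_2": 0.25, "SUB_3": 0.25})
report_json = json.dumps({"ancestry": report}, indent=2, sort_keys=True)
print(report_json)
```

A structure like this can feed a pie chart or map directly, since each group already carries its share of the total.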

Besides offering the best-in-class pipeline through the Lifebit CloudOS Marketplace, we also deliver the necessary tools in order to make sure you can sustain your customer base in the most time- and cost-efficient way. For this, we provide:

Try out our Ancestry Pipeline!

If you are already using Lifebit’s Ancestry pipeline on CloudOS, we would love to know what you think! If you’re interested in running our Ancestry pipeline, contact our Customer Success team; they would love to help you out! Contact us: [email protected]

This post was originally published on Lifebit's Blog on October 2nd, 2019 by Dr. Maria Chatzou.
