Bracing for the Petabyte Era in Genomics
Image from istock

Bracing for the Petabyte Era in Genomics

Divya Anantsri


A quick look at the latest genome sequencing costs tracked by the National Human Genome Research Institute (NHGRI) shows that the cost of DNA sequencing has plummeted between 2001 and 2022.

Graph showing the decreasing cost of sequencing over time, compared to what would be predicted by Moore’s law.

Further, Illumina has recently claimed that whole genome sequencing on the NovaSeqX can be completed for as little as $200.

While this consistent downward trend offers great promise for the genomics industry, it has uncovered a new challenge:? this vast amount of data needs to be processed and stored in cost-effective ways before interpretation can begin.

Moreover, given that sequencing projects (especially those pertaining to population health and clinical trials) increasingly include large sample sizes, the data generated is often in the order of terabytes. These volumes are slowly exceeding the capacity of on-premise servers, necessitating the use of robust and scalable cloud solutions.?

For bioinformatics labs aiming to optimize turnaround time (TAT) by migrating to the cloud, popular options include Amazon AWS, Microsoft Azure and Google Cloud Platform.?

However, the problem isn’t completely resolved until the infrastructure is set up to optimize performance and minimize storage costs.

On a related note, I was recently at a talk at Bio-IT World 2024,?where Grigoriy Sterin, Senior Principal Engineer from Tessera Therapeutics, a gene editing company, elaborated on how they built their data platform by continuously overcoming challenges in their cloud infrastructure.?

Grigoriy emphasized the importance of the following measures for scaling up while moving away from manual and cumbersome setups -?

  • Storing results in a shareable way?
  • Tracking and automating workflows?
  • Using workflow engines that support parallelization

Considering these nuances are indeed effective ways to optimise cloud storage.?

Echoing Sterin’s insights on overcoming cloud infrastructure hurdles, I’d like to highlight our poster at Bio-IT - Cloud Storage and Data Management Strategy for NovaSeq X+ Data - where we addressed scaling issues for our in-house genomics data using cloud storage.

This poster outlines our operations team's strategy for optimizing the storage and processing of NovaSeq X Plus (NSX+) data, aimed at efficiently managing more than 16 petabytes of data over the next decade.?

In essence, the NSX+ generates 1.5 TB of data per run using the 10B flow cell.

Our data operations team - Srikant Sridharan , Aman Saxena , Priyanshu Agarwal - has developed a streamlined pipeline, incurring AWS costs of $300/run to process 100 samples. The total AWS cost includes compute and transfer at $220 and $80/run, respectively.

We streamlined this pipeline by adopting a few strategies -?

  1. Centralizing data processing
  2. Eliminating S3 dependency and adopting an Amazon EC2 instance for computing and temporary storage
  3. Utilizing Azure for long-term storage

Our optimized data flow architecture yields a cost reduction of $650/run, saving 60% on cloud expenses.
RK in conversation about this poster at Bio-IT world 2024

You can download a PDF version of this poster from our website!

We'd be eager to discuss any challenges you may be facing with managing and storing your NGS data on the cloud.

Feel free to get in touch with our Business Development team - Radhakrishna Bettadapura , Jaya Singh, PhD Ernie Hobbs Bernie Tebbe for more details!


Giridharan Appaswamy

Genomics, Immunology, cell and gene therapy, Retro & Lentiviral gene delivery, Stem cells, Apoptosis, Autophagy, Cell cycle, Oncology, Rare disorders, MolecularDx, PGT, NGS, Flow cytometry, Molecular cloning

6 个月

Eventually there will be solutions to slim down/compress and decompress data in good old fashioned ways. This one works well enough. https://www.petagene.com/ The need for more able platforms persists though

回复

Thanks for sharing. You may also check our report on 'Genomics Market - Global Forecasts to 2029' at: https://www.globalmarketestimates.com/market-report/genomics-market-4315

回复

要查看或添加评论,请登录

社区洞察

其他会员也浏览了