Bracing for the Petabyte Era in Genomics
Divya Anantsri
A quick look at the genome sequencing cost data tracked by the National Human Genome Research Institute (NHGRI) shows that the cost of DNA sequencing plummeted between 2001 and 2022.
Further, Illumina has recently claimed that whole genome sequencing on the NovaSeq X can be completed for as little as $200.
While this consistent downward trend offers great promise for the genomics industry, it has also uncovered a new challenge: this vast amount of data must be processed and stored cost-effectively before interpretation can even begin.
Moreover, given that sequencing projects (especially those pertaining to population health and clinical trials) increasingly include large sample sizes, the data generated is often on the order of terabytes. These volumes are steadily exceeding the capacity of on-premises servers, necessitating robust and scalable cloud solutions.
For bioinformatics labs aiming to optimize turnaround time (TAT) by migrating to the cloud, popular options include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform.
However, the problem isn’t completely resolved until the infrastructure is set up to optimize performance and minimize storage costs.
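To make this concrete, the snippet below is a minimal sketch, assuming AWS S3 via boto3 and a hypothetical bucket ("ngs-raw-data") and prefix, of one common cost lever: a lifecycle policy that tiers aged run folders to colder storage classes once primary analysis is done.

```python
# Minimal sketch: tier aged sequencing data to cheaper S3 storage classes.
# Assumptions (not from this article): an existing S3 bucket named
# "ngs-raw-data" and AWS credentials already configured for boto3.
import boto3

s3 = boto3.client("s3")

lifecycle_rules = {
    "Rules": [
        {
            "ID": "tier-raw-runs",
            "Filter": {"Prefix": "novaseq-runs/"},  # hypothetical prefix for per-run folders
            "Status": "Enabled",
            "Transitions": [
                # Move run folders to infrequent-access storage after 30 days...
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                # ...and to Glacier Deep Archive after 180 days.
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}

s3.put_bucket_lifecycle_configuration(
    Bucket="ngs-raw-data",
    LifecycleConfiguration=lifecycle_rules,
)
print("Lifecycle policy applied to ngs-raw-data")
```

The cut-off days and storage classes here are placeholders; the right values depend on how often a lab actually revisits its raw run data.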
On a related note, I recently attended a talk at Bio-IT World 2024 where Grigoriy Sterin, Senior Principal Engineer at Tessera Therapeutics, a gene editing company, elaborated on how they built their data platform by continuously overcoming challenges in their cloud infrastructure.
Grigoriy emphasized the importance of several measures for scaling up while moving away from manual and cumbersome setups.
Considering these nuances is indeed an effective way to optimize cloud storage.
Echoing Sterin’s insights on overcoming cloud infrastructure hurdles, I’d like to highlight our poster at Bio-IT - Cloud Storage and Data Management Strategy for NovaSeq X+ Data - where we addressed scaling issues for our in-house genomics data using cloud storage.
This poster outlines our operations team's strategy for optimizing the storage and processing of NovaSeq X Plus (NSX+) data, aimed at efficiently managing more than 16 petabytes of data over the next decade.
For context, the NSX+ generates 1.5 TB of data per run using the 10B flow cell.
Our data operations team - Srikant Sridharan, Aman Saxena, and Priyanshu Agarwal - has developed a streamlined pipeline that incurs AWS costs of $300/run to process 100 samples; the total comprises compute at $220/run and data transfer at $80/run.
By adopting a few key strategies, we streamlined this pipeline into an optimized data flow architecture that yields a cost reduction of $650/run, saving 60% on cloud expenses.
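For a rough sense of scale, the figures above can be combined into a back-of-envelope projection. The sketch below is my own illustration, not from the poster, and assumes decimal petabytes, a constant 1.5 TB and $300 per run, and flat pricing over the decade:

```python
# Back-of-envelope projection from the per-run figures quoted above.
# Assumptions (not from the poster): 1 PB = 1,000 TB, and run size and
# per-run pricing stay constant over the decade.
TB_PER_RUN = 1.5          # NovaSeq X Plus, 10B flow cell
COST_PER_RUN = 300.0      # USD: $220 compute + $80 transfer
SAMPLES_PER_RUN = 100
TARGET_PB = 16            # data volume to manage over the next decade

runs_needed = (TARGET_PB * 1_000) / TB_PER_RUN          # ~10,700 runs
compute_transfer_cost = runs_needed * COST_PER_RUN      # ~$3.2M over the decade
cost_per_sample = COST_PER_RUN / SAMPLES_PER_RUN        # $3 per sample

print(f"Runs to reach {TARGET_PB} PB: {runs_needed:,.0f}")
print(f"Compute + transfer over the decade: ${compute_transfer_cost:,.0f}")
print(f"Cost per sample: ${cost_per_sample:.2f}")
```

Note that the $300/run figure covers compute and transfer only; long-term storage of the accumulated petabytes is priced separately, which is exactly where lifecycle and tiering decisions like the one sketched earlier come into play.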
You can download a PDF version of this poster from our website!
We'd be eager to discuss any challenges you may be facing with managing and storing your NGS data on the cloud.
Feel free to get in touch with our Business Development team - Radhakrishna Bettadapura, Jaya Singh, PhD, Ernie Hobbs, and Bernie Tebbe - for more details!