Bracing for the Petabyte Era in Genomics
Divya Anantsri
A quick look at the genome sequencing cost data tracked by the National Human Genome Research Institute (NHGRI) shows that the cost of DNA sequencing plummeted between 2001 and 2022.
Further, Illumina has recently claimed that whole genome sequencing on the NovaSeq X can be completed for as little as $200.
While this consistent downward trend offers great promise for the genomics industry, it has also uncovered a new challenge: this vast amount of data must be processed and stored cost-effectively before interpretation can even begin.
Moreover, given that sequencing projects (especially those pertaining to population health and clinical trials) increasingly include large sample sizes, the data generated is often on the order of terabytes. These volumes are steadily exceeding the capacity of on-premises servers, necessitating robust and scalable cloud solutions.
For bioinformatics labs aiming to optimize turnaround time (TAT) by migrating to the cloud, popular options include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform.
However, the problem isn’t completely resolved until the infrastructure is set up to optimize performance and minimize storage costs.
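To make this concrete, the snippet below is a minimal sketch, assuming AWS S3 via boto3 and a hypothetical bucket ("ngs-raw-data") and prefix, of one common cost lever: a lifecycle policy that tiers aged run folders to colder storage classes once primary analysis is done.

```python
# Minimal sketch: tier aged sequencing data to cheaper S3 storage classes.
# Assumptions (not from this article): an existing S3 bucket named
# "ngs-raw-data" and AWS credentials already configured for boto3.
import boto3

s3 = boto3.client("s3")

lifecycle_rules = {
    "Rules": [
        {
            "ID": "tier-raw-runs",
            "Filter": {"Prefix": "novaseq-runs/"},  # hypothetical prefix for per-run folders
            "Status": "Enabled",
            "Transitions": [
                # Move run folders to infrequent-access storage after 30 days...
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                # ...and to Glacier Deep Archive after 180 days.
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}

s3.put_bucket_lifecycle_configuration(
    Bucket="ngs-raw-data",
    LifecycleConfiguration=lifecycle_rules,
)
print("Lifecycle policy applied to ngs-raw-data")
```

The cut-off days and storage classes here are placeholders; the right values depend on how often a lab actually revisits its raw run data.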
On a related note, I recently attended a talk at Bio-IT World 2024 where Grigoriy Sterin, Senior Principal Engineer at Tessera Therapeutics, a gene editing company, elaborated on how they built their data platform by continuously overcoming challenges in their cloud infrastructure.
Grigoriy emphasized the importance of several measures for scaling up while moving away from manual and cumbersome setups.
Considering these nuances is indeed an effective way to optimize cloud storage.
Echoing Sterin’s insights on overcoming cloud infrastructure hurdles, I’d like to highlight our poster at Bio-IT - Cloud Storage and Data Management Strategy for NovaSeq X+ Data - where we addressed scaling issues for our in-house genomics data using cloud storage.
This poster outlines our operations team's strategy for optimizing the storage and processing of NovaSeq X Plus (NSX+) data, aimed at efficiently managing more than 16 petabytes of data over the next decade.
For context, the NSX+ generates 1.5 TB of data per run using the 10B flow cell.
Our data operations team - Srikant Sridharan, Aman Saxena, and Priyanshu Agarwal - has developed a streamlined pipeline that incurs AWS costs of $300/run to process 100 samples; the total comprises compute at $220/run and data transfer at $80/run.
By adopting a few key strategies, we streamlined this pipeline into an optimized data flow architecture that yields a cost reduction of $650/run, saving 60% on cloud expenses.
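For a rough sense of scale, the figures above can be combined into a back-of-envelope projection. The sketch below is my own illustration, not from the poster, and assumes decimal petabytes, a constant 1.5 TB and $300 per run, and flat pricing over the decade:

```python
# Back-of-envelope projection from the per-run figures quoted above.
# Assumptions (not from the poster): 1 PB = 1,000 TB, and run size and
# per-run pricing stay constant over the decade.
TB_PER_RUN = 1.5          # NovaSeq X Plus, 10B flow cell
COST_PER_RUN = 300.0      # USD: $220 compute + $80 transfer
SAMPLES_PER_RUN = 100
TARGET_PB = 16            # data volume to manage over the next decade

runs_needed = (TARGET_PB * 1_000) / TB_PER_RUN          # ~10,700 runs
compute_transfer_cost = runs_needed * COST_PER_RUN      # ~$3.2M over the decade
cost_per_sample = COST_PER_RUN / SAMPLES_PER_RUN        # $3 per sample

print(f"Runs to reach {TARGET_PB} PB: {runs_needed:,.0f}")
print(f"Compute + transfer over the decade: ${compute_transfer_cost:,.0f}")
print(f"Cost per sample: ${cost_per_sample:.2f}")
```

Note that the $300/run figure covers compute and transfer only; long-term storage of the accumulated petabytes is priced separately, which is exactly where lifecycle and tiering decisions like the one sketched earlier come into play.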
You can download a PDF version of this poster from our website!
We'd be eager to discuss any challenges you may be facing with managing and storing your NGS data on the cloud.
Feel free to get in touch with our Business Development team - Radhakrishna Bettadapura, Jaya Singh, PhD, Ernie Hobbs, and Bernie Tebbe - for more details!