Data repository: sharing transcriptomics data and promoting open science

(written by Nicolas Casadei)

The abundance of publicly available RNA-sequencing (RNA-seq) data now gives biologists and clinician scientists ample opportunities to contextualize their research, generate data-driven hypotheses, and identify trends across studies. For newcomers, however, the sheer volume of data can be overwhelming.

To manage this wealth of information, data repositories that provide open access to research results, including transcriptomics data, are crucial. They not only simplify navigation through the deposited data and gather all required metadata in one place but, more broadly, also play an essential role in promoting open science, with its emphasis on transparency and collaboration.

Benefits of Data Repositories:

  • Reduced Redundancy: When data are deposited, other researchers can avoid repeating well-described experiments, saving valuable time and resources.
  • Data Verification: Uploading raw data allows others to verify the analysis and potentially uncover new insights.
  • Accessibility: Analyzed datasets are readily usable by researchers with varying levels of expertise, accelerating their progress.

Treasure Troves of Gene Expression Data

Gene expression research relies heavily on specialized databases for storing and sharing data. Three prominent organizations manage these invaluable resources:

  1. National Center for Biotechnology Information (NCBI) (US): This powerhouse, part of the National Library of Medicine, offers a comprehensive suite of resources. The Gene Expression Omnibus (GEO) focuses on gene expression datasets, while the Sequence Read Archive (SRA) stores sequencing data.
  2. European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) (Europe): EMBL-EBI provides ArrayExpress for functional genomics experiments, including microarray data, and the European Nucleotide Archive (ENA) for raw sequencing data such as RNA-seq reads.
  3. DNA Data Bank of Japan (DDBJ) (Japan): DDBJ offers the DDBJ Sequence Read Archive (DRA) for raw sequencing data, including RNA-seq data.

These databases are not isolated entities. NCBI, EMBL-EBI, and DDBJ are the three partners of the International Nucleotide Sequence Database Collaboration (INSDC), whose sequence archives (including SRA, ENA, and DRA) exchange data continuously, so researchers can access data deposited at any participating data bank.
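Because the INSDC archives exchange records, the prefix of a read accession shows which archive originally issued it, while the data themselves are typically retrievable from all three. As a minimal sketch, the Python function below maps common prefixes (SRP/SRS/SRX/SRR for NCBI SRA, ERP/ERS/ERX/ERR for ENA, DRP/DRS/DRX/DRR for DRA, plus the BioProject prefixes PRJNA, PRJEB, and PRJDB) to their archive. The names `ARCHIVE_BY_PREFIX` and `originating_archive` are hypothetical, and the mapping is intentionally partial; GEO series (GSE) accessions, for instance, fall outside INSDC and are reported as unknown.

```python
# Minimal sketch (hypothetical helper): infer which INSDC archive originally
# issued an accession from its prefix. Only common SRA-style and BioProject
# prefixes are covered; anything else is reported as "unknown".

ARCHIVE_BY_PREFIX = {
    # SRA-style accessions: study (P), sample (S), experiment (X), run (R)
    "SRP": "NCBI SRA", "SRS": "NCBI SRA", "SRX": "NCBI SRA", "SRR": "NCBI SRA",
    "ERP": "EMBL-EBI ENA", "ERS": "EMBL-EBI ENA", "ERX": "EMBL-EBI ENA", "ERR": "EMBL-EBI ENA",
    "DRP": "DDBJ DRA", "DRS": "DDBJ DRA", "DRX": "DDBJ DRA", "DRR": "DDBJ DRA",
    # BioProject accessions
    "PRJNA": "NCBI", "PRJEB": "EMBL-EBI", "PRJDB": "DDBJ",
}


def originating_archive(accession: str) -> str:
    """Return the archive that issued `accession`, judged by its prefix."""
    acc = accession.strip().upper()
    # Check longer prefixes first so 5-letter BioProject codes win over 3-letter ones.
    for prefix in sorted(ARCHIVE_BY_PREFIX, key=len, reverse=True):
        if acc.startswith(prefix):
            return ARCHIVE_BY_PREFIX[prefix]
    return "unknown"
```

The prefix only records where a submission entered the INSDC system; it does not limit where the data can be downloaded.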

Unlocking the Power of Gene Expression Data

Although these databases are user-friendly, a few key strategies will make your searches more efficient:

  • Start with a Clear Goal: Formulate a precise hypothesis before diving into the search. This will guide your keyword selection.
  • Refine Your Search: Expect a vast number of results. Use filters such as organism and study type to narrow your focus and find the most relevant datasets.
  • Beware of Duplicates: Be mindful that identical datasets might have different project names. Consulting the linked research paper can clarify any confusion.
  • Embrace Diverse Results: Resist the urge to focus solely on datasets with seemingly perfect results. Datasets with conflicting findings deserve consideration too, as they can give a more complete picture.
  • Practice Responsible Sharing: When publishing findings based on deposited data, always cite the database and the original research for proper attribution and to allow others to easily locate the resources.
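The filtering and duplicate-checking advice above can be sketched in Python against NCBI's E-utilities. This assumes the `esearch` endpoint, the GEO DataSets database (`db=gds`), and the `[Organism]`, `[DataSet Type]`, and `[Entry Type]` search fields, so verify these against the current GEO search documentation before relying on them. The helper names (`build_search_term`, `esearch_url`, `group_by_publication`) and the record layout with `accession` and `pmid` keys are hypothetical. The last helper applies the "consult the linked paper" tip mechanically: records that cite the same PubMed ID are grouped as likely views of one study.

```python
# Minimal sketch of filtered searching and duplicate grouping, assuming NCBI's
# public E-utilities esearch endpoint and GEO DataSets ("gds") field tags.
# Only the query URL is built here; fetching it is left to the caller.
from collections import defaultdict
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_term(keywords, organism=None, study_type=None):
    """Combine free-text keywords with optional organism and study-type filters."""
    parts = [f"({keywords})"]
    if organism:
        parts.append(f'"{organism}"[Organism]')
    if study_type:
        parts.append(f'"{study_type}"[DataSet Type]')
    # Restrict hits to GEO Series (GSE) records rather than samples or platforms.
    parts.append("gse[Entry Type]")
    return " AND ".join(parts)


def esearch_url(term, retmax=100):
    """Return an esearch URL that requests JSON output from the gds database."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def group_by_publication(records):
    """Group dataset accessions that cite the same PubMed ID.

    `records` is a list of dicts with hypothetical keys "accession" and
    "pmid"; records without a PMID are collected under the key None.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get("pmid")].append(rec["accession"])
    return dict(groups)
```

Building the URL without fetching it keeps the sketch testable offline; actually retrieving results (for example with `urllib.request.urlopen`) should respect NCBI's published request-rate limits.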

Additional Resources:

For an in-depth exploration of each database, refer to the following research articles:
