Embracing AI Transformation: Let’s Start with Data
Midjourney prompt: 'data strategy for an organization'

The integration of artificial intelligence (AI) into early-stage drug discovery promises to accelerate hit discovery, optimization, and the overall drug development process. But how do you onboard AI in your organization in a meaningful way? It starts with taking care of your data and articulating your goals first.

A recent article published in Nature Communications by Kristina Edfeldt et al. highlights the pivotal role of data management, dissemination, and AI integration within the Structural Genomics Consortium (SGC).

Drawing insights from this comprehensive roadmap, I highlighted the key points to consider when building or updating your organization’s data storage and processing principles:

  1. Adhere to FAIR Principles: Ensure all data is Findable, Accessible, Interoperable, and Reusable. Implement standardized metadata schemas and persistent identifiers (e.g., DOIs) for all datasets to enhance findability and accessibility. Use interoperable data formats like XML or JSON to facilitate data exchange and integration.
  2. Establish Precise Ontologies and Standardized Vocabulary: Define clear ontologies for data categorization, such as the BioAssay Ontology (BAO) for biological screening assays. Use standardized vocabularies like Medical Subject Headings (MeSH) to ensure consistency and improve machine readability.
  3. Implement Centralized Database Architecture: Develop a unified data architecture using relational databases (e.g., PostgreSQL) or graph databases (e.g., Neo4j) to store and manage data. Ensure schema compatibility with established repositories like ChEMBL and PubChem to facilitate seamless data integration and dissemination.
  4. Leverage Lab Automation and Integrated ELN/LIMS Systems: Utilize automation tools such as liquid handlers and robotic workstations to record detailed experimental metadata (e.g., reagent purity, ambient temperature). Integrate ELNs (e.g., LabArchives) with LIMS (e.g., LabWare) through APIs to streamline data capture and protocol linkage.
  5. Promote Transparent and Reproducible Data Processing: Develop and publish open-source data processing pipelines using languages like Python and R. Document all preprocessing steps, including quality control measures, normalization techniques, and data transformation methods, in code repositories such as GitHub.
  6. Create and Manage Multimodal Data Objects: Combine diverse data types (e.g., proteomics, genomics, chemical screening) into comprehensive data objects using data integration platforms like KNIME or Galaxy. Utilize BioCompute objects for tracking data processing pipelines and ensuring reproducibility.
  7. Versioning and Archiving: Implement version control systems (e.g., Git) to track changes in datasets and maintain detailed change logs. Use data nutrition labels to summarize key characteristics, updates, and quality metrics for each dataset version.
  8. Utilize Cloud-Based Data Hosting and Analysis: Leverage cloud platforms (e.g., AWS, Google Cloud, Microsoft Azure) for scalable data storage and computational resources. Employ the Model2Data approach by bringing analysis code to cloud-based data storage to minimize data transfer costs and enhance processing efficiency.
  9. Engage in Active Learning and DMTA Cycles: Design data-driven feedback loops within the Design-Make-Test-Analyze (DMTA) cycles, using predictive models to guide experimental design. Implement active learning strategies to prioritize experiments that reduce prediction uncertainty and maximize data informativeness.
  10. Foster Collaboration Between Experimentalists and Data Scientists: Promote a collaborative environment where experimentalists and data scientists work together from the onset of data generation. Incorporate data science into experimental design to enhance the impact and efficiency of research efforts, utilizing platforms like Jupyter Notebooks for shared analysis.
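To make point 1 concrete, here is a minimal sketch of a FAIR-style dataset metadata record serialized as JSON. The field names and values are illustrative assumptions, not a formal schema; a real project would adopt a community standard such as schema.org/Dataset or the DataCite metadata scheme.

```python
import json

# Illustrative FAIR-style metadata record. Field names are assumptions,
# not a formal standard -- they map loosely onto the four FAIR pillars.
record = {
    "identifier": "doi:10.0000/example-dataset",   # persistent ID (Findable)
    "title": "Kinase inhibitor screening panel",
    "access_url": "https://example.org/data/123",  # retrieval endpoint (Accessible)
    "format": "application/json",                  # machine-readable format (Interoperable)
    "license": "CC-BY-4.0",                        # explicit reuse terms (Reusable)
    "keywords": ["kinase", "screening", "SGC"],
}

serialized = json.dumps(record, indent=2)
print(serialized)
```

Because the record is plain JSON, it can be indexed by a data catalog and exchanged between systems without custom parsers.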
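For point 3, a relational schema might separate compounds, assays, and measured activities, loosely mirroring the split used by public repositories like ChEMBL. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL; table and column names are illustrative assumptions.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (
    compound_id INTEGER PRIMARY KEY,
    smiles      TEXT NOT NULL           -- chemical structure
);
CREATE TABLE assay (
    assay_id    INTEGER PRIMARY KEY,
    description TEXT,
    bao_term    TEXT                    -- BioAssay Ontology annotation (see point 2)
);
CREATE TABLE activity (
    compound_id INTEGER REFERENCES compound(compound_id),
    assay_id    INTEGER REFERENCES assay(assay_id),
    ic50_nm     REAL                    -- measured potency, nanomolar
);
""")
conn.execute("INSERT INTO compound VALUES (1, 'CCO')")
conn.execute("INSERT INTO assay VALUES (1, 'Kinase panel', 'BAO_0000190')")
conn.execute("INSERT INTO activity VALUES (1, 1, 250.0)")

# Join the three tables to reconstruct a full screening result.
row = conn.execute("""
    SELECT c.smiles, a.description, act.ic50_nm
    FROM activity act
    JOIN compound c ON c.compound_id = act.compound_id
    JOIN assay a    ON a.assay_id    = act.assay_id
""").fetchone()
print(row)
```

Keeping the schema close to an established repository's layout is what makes later export to ChEMBL or PubChem a mapping exercise rather than a migration project.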
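Point 5 is easiest to honor when every preprocessing step is a named, documented function that can be published as-is. A minimal sketch, with an illustrative normalization formula and QC gate that are assumptions rather than any specific consortium's pipeline:

```python
def normalize_percent_inhibition(raw, neg_ctrl, pos_ctrl):
    """Normalize a raw assay signal to percent inhibition using the
    plate's negative and positive control means (illustrative formula)."""
    return 100.0 * (neg_ctrl - raw) / (neg_ctrl - pos_ctrl)

def qc_pass(zprime, threshold=0.5):
    """Simple quality-control gate: accept plates whose Z'-factor
    meets the chosen threshold (0.5 is a common rule of thumb)."""
    return zprime >= threshold

# Worked example: a raw signal of 40 between controls at 100 and 0
# corresponds to 60% inhibition.
pct = normalize_percent_inhibition(40.0, neg_ctrl=100.0, pos_ctrl=0.0)
print(pct, qc_pass(0.72))
```

Publishing functions like these in a GitHub repository, alongside the raw data, lets any reader re-run the exact transformation that produced a reported value.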
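For point 7, a "data nutrition label" can be as simple as a small summary record attached to each dataset version: size, a content checksum, and change notes. The field names below are illustrative assumptions, not a formal labeling standard.

```python
import hashlib
import json
from datetime import date

def nutrition_label(rows, version, notes):
    """Summarize one dataset version: record count, content fingerprint,
    and human-readable change notes (illustrative label fields)."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return {
        "version": version,
        "date": date.today().isoformat(),
        "n_records": len(rows),
        "sha256": hashlib.sha256(payload).hexdigest()[:12],  # content fingerprint
        "notes": notes,
    }

rows = [{"compound": "CCO", "ic50_nm": 250.0}]
label = nutrition_label(rows, "v1.1", "Re-ran QC; dropped 3 outlier wells")
print(label["version"], label["n_records"], label["sha256"])
```

The checksum makes silent data drift detectable: if two copies of "v1.1" produce different fingerprints, they are not the same dataset.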
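The active-learning idea in point 9 can be sketched in a few lines: where an ensemble of models disagrees most about a candidate, a new experiment is most informative. The compounds, the hard-coded predictions, and the variance-based criterion below are all illustrative assumptions for a toy DMTA step.

```python
import statistics

# Toy predictions: each compound gets a pIC50 estimate from 3 models
# (numbers are made up for illustration).
ensemble_predictions = {
    "cmpd_A": [6.1, 6.2, 6.0],  # models agree -> little to learn
    "cmpd_B": [5.0, 7.5, 6.2],  # models disagree -> informative to test
    "cmpd_C": [7.0, 7.1, 6.9],
}

def next_experiment(preds):
    """Uncertainty sampling: pick the candidate whose ensemble
    predictions have the largest variance."""
    return max(preds, key=lambda c: statistics.variance(preds[c]))

print(next_experiment(ensemble_predictions))  # cmpd_B
```

Feeding the chosen compound's measured result back into model retraining closes the Design-Make-Test-Analyze loop.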


This article is from yesterday's newsletter, "Weekly Tech+Bio Highlights #7."

Make sure to check it for a roundup of technology news, company announcements, and scientific breakthroughs relevant to pharma and biotech research.

---

Welcome to my newsletter, "Where Technology Meets Biology"!

Here, I am sharing noteworthy news, trends, biotech startup picks, industry analyses, and interviews with pharma KOLs. Contact me for consulting or sponsorship opportunities here or at www.BiopharmaTrend.com .

Enjoying the newsletter? Subscribe to become part of the 15K+ readers here on LinkedIn. Please help us spread the word by sharing it with your colleagues and friends.

Also, consider joining my Substack community, where we are exploring a lot more (5.5K+ industry professionals are reading it via weekly email).

-- Andrii

John Harman

Get drug discovery and development done | Turn ideas into cures

4 months

Andrii Buvailo, Ph.D., another great article. While all 10 points are well stated and accurate for today's lab informatics culture, I want to present some out-of-the-box opinions which I believe are necessary to break out of data mediocrity:

#'s 5 & 7: Git is not an acceptable history in the scientific world. Git is a developer tool. The data analysis descriptors, observability artifacts (log files), protocols, metadata, and test cases & outcomes all need to be captured as primary elements of experimentation.

#'s 3, 6 & 10: I declare multi-system data replication, data meshes, and data layers to be obsolete. It is possible to define the ontology/structure of your conceptual objects and store them all in a single platform suitable for both real-time transactional needs (e.g., running experiments) and data analytics needs (e.g., genomic analysis pipelines). It is hard... but we must stop adding more cards to the "house of cards" and fix the foundation of our laboratories. Dig deeper into the digital science development consortium to see how we can support the future of our industry.

Joseph Pareti

AI Consultant @ Joseph Pareti's AI Consulting Services | AI in CAE, HPC, Health Science

4 months

I would add 'data observability', but perhaps they already take that into account: https://docs.google.com/presentation/d/1fB-UEL1zc5UEgB9Eo4URdxR9IfYQA9UUkXdNizaSjaA/edit?usp=sharing
