WDL 1.2.0: Latest Release, CellBin: Spatial Transcriptomics Pipeline, ADMET-AI: Evaluating Large-scale Chemical Libraries using ML
Bioinformer Weekly Roundup
Stay Updated with the Latest in Bioinformatics!
Issue: 43 | Date: 28 June 2024
Welcome to the Bioinformer Weekly Roundup!
In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you're a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we've got you covered. Subscribe now to stay ahead in the exciting realm of bioinformatics!
Featured Research
The study examines the renin-angiotensin system’s role in lung disorders using RNA-sequencing datasets. It identifies distinct localization patterns for two angiotensin receptors, AGTR1 and AGTR2, in the human lung. The study also finds an association between AGTR2 and lung phenotype, providing insights into this pathway’s role in lung homeostasis.
Pineoblastoma (PB) is a rare childhood brain tumour. Using a single-nucleus RNA-sequencing cohort of primary PB tumours, the study maps the origins of PB subgroups to the mouse pineal gland’s developmental stages. It identifies CRX, OTX2, and NEUROD1 as master regulators of an oncogenic photoreceptor program across PB subgroups. Interestingly, this program is also prevalent in retinoblastoma and Group 3 medulloblastoma, suggesting a shared oncogenic dependency among these distinct CNS tumours.
The study presents an automated method for predicting Alzheimer’s disease (AD) progression within 6 years in individuals with mild cognitive impairment (MCI). It uses natural language processing and machine learning techniques on speech data, along with age, sex, and education level. The models achieved an accuracy of 78.5% and a sensitivity of 81.1%. The study highlights the potential of AI-powered pipelines in facilitating remote and cost-effective screening and prognosis for Alzheimer’s disease.
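For readers less familiar with the two metrics quoted above, here is a minimal sketch of how accuracy and sensitivity are computed from a confusion matrix. The function names and the counts in the usage lines are hypothetical and purely illustrative; they are not data from the study.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true positives (e.g. MCI cases that progress to AD) that are detected."""
    return tp / (tp + fn)


# Hypothetical counts, chosen only to illustrate the arithmetic.
tp, fn, tn, fp = 8, 2, 7, 3
print(f"accuracy    = {accuracy(tp, tn, fp, fn):.2f}")   # 15 correct out of 20
print(f"sensitivity = {sensitivity(tp, fn):.2f}")        # 8 detected out of 10 positives
```

Sensitivity only looks at the positive class, which is why a screening tool can report a sensitivity that is higher than its overall accuracy.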
The research explores the use of Large Language Models (LLMs) like GPT and LLaMA in cheminformatics, specifically in understanding SMILES, a method for representing chemical structures. The study observes that SMILES embeddings generated using LLaMA achieve better results than those from GPT in molecular property and drug-drug interaction prediction tasks. The findings suggest potential for further exploration of LLMs in molecular embedding.
Source code is available here.
The study evaluates three deep learning techniques (StarDist, CellPose, and BCFind-v2) for 3D reconstruction of human brain volumes. It focuses on Broca’s area and compares the methods on predicted density, localization, computational efficiency, and human annotation effort. The results suggest that these techniques are effective in providing each cell’s 3D location and offer results comparable to the adopted stereological design.
Latest Tools
The Workflow Description Language (WDL) team has released WDL 1.2.0, a major update enhancing the flexibility and usability of workflow descriptions in bioinformatics. This version introduces features and enhancements to streamline workflow management and execution, simplifying the implementation and management of complex bioinformatics workflows for developers and researchers.
CellBin is a one-stop pipeline for high-resolution spatial transcriptomic data from Stereo-seq. It offers a comprehensive platform for generating high-confidence single-cell spatial gene expression profiles, covering image stitching, image registration, tissue segmentation, nuclei segmentation, and molecule labelling. The study highlights that CellBin is user-friendly and improves the signal-to-noise ratio of single-cell gene expression data. It has been shown to obtain accurate single-cell spatial data using mouse brain tissue.
DDN3.0 is a tool for differential network analysis, which is crucial for understanding complex diseases. DDN3.0 enhances the framework with three efficient algorithms for unbiased model estimation, multiple acceleration strategies, and data-driven determination of hyperparameters. The tool is designed to jointly learn common and rewired network structures under different conditions, and it can help identify a network of significantly rewired molecular players potentially responsible for phenotypic transitions.
Pruning-enabled Gene-Cell Net (PredGCN) is a tool designed to address limitations in automatic cell type annotation from single-cell transcriptomics. PredGCN incorporates a Coupled Gene-Cell Net (CGCN) and integrates a Gene-Splicing Net (GSN) and a Cell Stratification Net (CSN) with a pruning operation. It leverages multiple feature extraction methods and region demarcation principles for precise cell identification.
Source code is available here.
Entourage is a tool designed to address the challenges in pan-virus detection and virome investigation. Entourage enables short-read sequence assembly, viral sequence search, and intrasample sequence variation quantification. It offers end-to-end virus sequence detection analysis through a single command line. The tool’s utility is demonstrated through its application on HeLa cell culture samples and a preassembled Tara Oceans dataset.
PxBLAT is a Python-based framework designed to enhance the BLAT sequence alignment tool. PxBLAT offers significant improvements in execution speed and data handling, as demonstrated by benchmarks conducted across various sample groups. It also introduces user-friendly features such as improved server management, data conversion utilities, and shell completion.
Source code is available here.
ADMET-AI, a machine learning platform, has been developed to provide quick and accurate predictions of Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties. It is available both as a web-based ADMET predictor and as a Python package for local execution, capable of predicting for one million molecules in just 3.1 hours.
Source code is available here.
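To put the quoted ADMET-AI throughput in perspective, the short sketch below converts "one million molecules in 3.1 hours" into molecules per second and milliseconds per molecule. It is plain arithmetic on the two figures quoted above; the variable names are our own, and the calculation says nothing about the hardware or settings used.

```python
# Figures quoted in the summary above.
molecules = 1_000_000
hours = 3.1

seconds = hours * 3600                      # total wall-clock time in seconds
rate = molecules / seconds                  # molecules processed per second
ms_per_molecule = 1000 * seconds / molecules

print(f"{rate:.1f} molecules/s")            # about 89.6 molecules per second
print(f"{ms_per_molecule:.2f} ms/molecule") # about 11.16 ms per molecule
```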
GENTANGLE is a pipeline developed for the computational design of two overlapping genes in different reading frames of a microbial genome. This technique enhances the reliability of control mechanisms in engineered organisms. The software can be used to design and test gene entanglements for microbial engineering projects.
Source code is available here.
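The idea of two genes sharing one stretch of DNA in different reading frames can be illustrated with a small sketch. This is not GENTANGLE's method or API; it simply translates the same sequence from two frame offsets using the standard genetic code. The `translate` helper and the example sequence are hypothetical and for illustration only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA string starting at offset `frame` (0, 1, or 2).

    Trailing bases that do not form a full codon are ignored;
    stop codons are rendered as '*'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


seq = "ATGGCCTAA"          # hypothetical example sequence
print(translate(seq, 0))   # MA*  (ATG GCC TAA)
print(translate(seq, 1))   # WP   (TGG CCT)
```

Because a single base change alters the codons in both frames at once, designing a sequence that encodes two functional proteins is a constrained search problem, which is the task the pipeline described above is built to address.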
Protein Interaction Explorer (PIE) is a tool integrated with the iPPI-DB database, designed to support structure-based drug discovery initiatives focused on protein-protein interactions. It provides a comprehensive suite of tools to aid in decision-making in PPI drug discovery, including identifying and characterizing crucial factors such as binding pockets and predicting hot spots.
Source code is available here.
hictk, a new toolkit, has been developed to operate on .hic and .cool files used in Hi-C data processing. It outperforms existing tools and provides the flexibility of working natively with both file formats. The toolkit includes a C++ library with Python and R bindings and CLI tools for common operations, including format conversion.
Source code is available here.
Community News
A recent study in Nature Medicine explored the role of microbial features in type 2 diabetes. Scientists analysed over 8,000 shotgun metagenomic sequences from individuals with varying glycaemic statuses to understand how specific subspecies and strains contribute to the disease’s pathological mechanisms. The study found that gut microbiome dysbiosis plays a role in mechanisms such as glucose metabolism and butyrate fermentation, along with other findings that provide insights into the gut microbiome's association with type 2 diabetes.
A new computational tool, Surrogate Quantitative Interpretability for Deepnets (SQUID), has been developed by the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. As its name indicates, the tool uses surrogate models to interpret how deep neural networks analyse genomic data, bringing us closer to understanding the inner workings of AI in genomics.
Educational Corner
The blog post highlights the ggside R package as a powerful tool for visualizing data, particularly in the context of single-cell RNA sequencing (scRNA-seq) datasets. By leveraging the flexibility of ggplot2, ggside enables users to create side-by-side plots that simultaneously display multiple variables such as gene expression, cell types, and experimental conditions.
Connect with Us
Stay connected and engage with us on social media for daily updates, discussions, and more!
Subscribe
Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.
We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!
Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.
Contact: [email protected]
Copyright © 2024, Bioinformer Weekly Roundup. All rights reserved.