nf-core’s fetchngs in Spotlight, MultiQC v1.21 Release, AI Breakthroughs in Predictive Medicine, tRigon Illuminates (Path-)Omics Analysis!
Bioinformer Weekly Roundup
Stay Updated with the Latest in Bioinformatics!
Issue: 27 | Date: 8 March 2024
Welcome to the Bioinformer Weekly Roundup!
In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you're a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we've got you covered. Subscribe now to stay ahead in the exciting realm of bioinformatics!
Featured Research
This study assesses 59 methods for marker gene selection in single-cell RNA sequencing (scRNA-seq) data, using real and simulated datasets. Evaluation criteria include marker gene recovery, predictive accuracy, memory usage, speed, and implementation quality. Simple techniques like the Wilcoxon rank-sum test, Student’s t-test, and logistic regression show notable effectiveness. Case studies delve into the performance of commonly utilized methods.
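For readers who want to see what one of these "simple techniques" looks like, here is a minimal sketch of ranking candidate marker genes with a Wilcoxon rank-sum test. It is not the study's benchmark code: the gene names and expression values are made up, the p-value uses the normal approximation without a tie correction, and in practice you would likely rely on an established scRNA-seq toolkit instead.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(in_cluster, out_cluster):
    """Two-sided rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(in_cluster), len(out_cluster)
    ranks = average_ranks(list(in_cluster) + list(out_cluster))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical data: gene -> (expression in the cluster, expression elsewhere).
expression = {
    "GENE_A": ([5.1, 4.8, 6.0, 5.5, 5.9], [0.2, 0.0, 1.1, 0.4, 0.3]),
    "GENE_B": ([1.0, 2.1, 0.9, 1.7, 1.3], [1.2, 1.9, 0.8, 1.6, 1.4]),
}

results = {gene: rank_sum_test(a, b) for gene, (a, b) in expression.items()}
ranked = sorted(results, key=lambda gene: results[gene][1])  # smallest p first

for gene in ranked:
    z, p_value = results[gene]
    print(f"{gene}: z={z:.2f}, p={p_value:.4f}")
```

Genes whose in-cluster values separate cleanly from the rest rise to the top of the list; with real data you would also correct for multiple testing.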
The study explores drug-target interactions (DTIs) using natural language processing (NLP) and pre-trained language models. By combining gene descriptions from Entrez Gene with chemical descriptions from the Comparative Toxicogenomics Database (CTD), optimal performance is achieved, with an F1 score of 80.6 on the DrugProt test set. A comparative analysis assesses the effectiveness of gene descriptions from Entrez Gene and UniProt databases in enhancing DTI extraction tasks.
Exploring single-cell transcriptomic integration, this study unveils challenges posed by cell type imbalances across datasets. Through the Iniquitate pipeline, integration robustness is evaluated under diverse imbalances. Benchmarking five methods highlights significant downstream analysis impacts, prompting the introduction of new metrics and guidelines for addressing imbalance in integration strategies.
This study compares normalization methods for microbiome data analysis, indicating that simpler relative abundance-based transformations perform slightly better than complex compositionally aware methods. This suggests that minimizing complexity while correcting for read depth may be preferable in data preparation for machine learning.
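To make the two families of transformations concrete, here is a small sketch contrasting a relative abundance transformation (total-sum scaling) with a centred log-ratio (CLR) transform, a common compositionally aware choice. The sample counts and the pseudocount of 0.5 are assumptions for illustration, not values taken from the study.

```python
from math import log


def relative_abundance(counts):
    """Total-sum scaling: divide each taxon count by the sample's read depth."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centred log-ratio: log counts minus the mean log count of the sample.

    The pseudocount (an assumed value) avoids taking log(0) for absent taxa.
    """
    logs = [log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical samples with the same composition but a 10x difference in depth.
samples = {
    "sample_1": [120, 30, 0, 50],
    "sample_2": [1200, 300, 0, 500],
}

for name, counts in samples.items():
    print(name, "relative:", [round(v, 3) for v in relative_abundance(counts)])
    print(name, "clr:     ", [round(v, 3) for v in clr(counts)])
```

Both samples end up with identical relative abundances, which is the read depth correction the summary refers to; the CLR values differ slightly between them because of the pseudocount.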
This study introduces “gbdmr”, a DMR detection algorithm, and compares it with dmrff and traditional EWAS. It shows better performance in strong CpG site correlation scenarios, while dmrff excels with weak correlation. Applied to multiple DNA methylation datasets, gbdmr identifies more DMR CpGs linked to phenotypes, highlighting its reliability in DMR detection.
Cell-cell communication disruptions in disease are studied using scRNA-seq in larger cohorts. Community, an R-based tool, analyses communication in scRNA-seq data between case-control cohorts, integrating cell type abundance. Tested on ulcerative colitis and acute myeloid leukaemia datasets, Community excels over other pipelines in speed and robustness for assessing differential cell-cell communication.
This article evaluates rejection strategies (full, partial, none) in cell type annotation methods, implying that hierarchical classifiers perform better with partial rejection, preserving label information. Optimal rejection requires careful threshold selection. Without rejection, flat and hierarchical annotation perform similarly when transcriptomic relationships are accurately captured.
Code is available here.
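As a rough illustration of what "partial rejection" means for a hierarchical classifier, the sketch below walks a predicted root-to-leaf path and keeps the deepest label whose confidence clears a threshold. The function name, labels, probabilities, and the 0.7 threshold are all hypothetical; the article's own classifiers and threshold selection are not reproduced here.

```python
def annotate_with_partial_rejection(path_probabilities, threshold=0.7):
    """Return the deepest confidently predicted label along a hierarchy path.

    path_probabilities is a root-to-leaf list of (label, probability) pairs.
    The walk stops at the first node below the threshold, so a cell keeps its
    coarser label (partial rejection). None means even the top level was
    rejected (full rejection).
    """
    accepted = None
    for label, probability in path_probabilities:
        if probability < threshold:
            break
        accepted = label
    return accepted


# Hypothetical prediction for one cell, from coarse to fine cell type.
cell_path = [("Immune", 0.98), ("T cell", 0.91), ("CD8+ T cell", 0.55)]

print(annotate_with_partial_rejection(cell_path))                  # coarser label kept
print(annotate_with_partial_rejection(cell_path, threshold=0.5))   # full path accepted
print(annotate_with_partial_rejection(cell_path, threshold=0.99))  # fully rejected
```

The threshold decides how much label detail survives, which is why the summary stresses careful threshold selection.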
This review delves into how deep learning, particularly convolutional neural networks (CNNs), transforms predictive modelling in omics analysis. By converting data into image-like formats, CNNs boost prediction accuracy, although challenges like model interpretability persist. Interdisciplinary collaborations are vital for tackling these obstacles and harnessing the full potential of CNNs in omics research.
Latest Tools
The latest release of MultiQC introduces a box plot feature and an "Export to CSV" button for tables, enhancing data visualization and export capabilities. Other improvements include the replacement of setup.py with pyproject.toml, enhanced heatmap functionality, and better handling of non-existent modules and non-hashable values.
nf-core/fetchngs is a bioinformatics pipeline to fetch metadata and raw FastQ files from public databases. At present, the pipeline supports SRA / ENA / DDBJ / GEO ids. See usage.
Holomics, an R shiny app, offers user-friendly tools for multi-omics analysis. It simplifies data upload, filtering, single-omics analysis, and multi-omics integration. A case study on sugar beet storability showcases its versatility and consistency.
SpatialView, an open-source web application, addresses the need for interactive visualization tools in spatial transcriptomics (ST) experiments. It enables users to visualize data and results from multiple 10x Genomics Visium ST experiments, facilitating investigations into cellular heterogeneity and tissue organization within diseases.
SpatialView is available here.
FCS-GX swiftly identifies and removes contaminant sequences from assembled genomes, demonstrating high sensitivity and specificity. Testing on 1.6M GenBank assemblies detected 36.8 Gbp contamination, prompting updates in NCBI RefSeq assemblies.
FCS-GX is available here.
tRigon, a Shiny application, facilitates fast, comprehensive, and reproducible analysis of high-dimensional pathomics datasets, addressing challenges like outlier variability and data missingness. Available on CRAN and GitLab, tRigon offers local installation or online access, demonstrating rapid computation across datasets of varying sizes and hardware settings.
tRigon is available via the CRAN repository with its source code available on GitLab.
KaMRaT, implemented in C++, processes large k-mer count tables from multi-sample RNA-seq data to identify condition-specific or differentially expressed sequences, independent of gene or transcript annotation. It scores k-mers using count statistics, merges overlapping k-mers into contigs, and selects k-mers based on their occurrence across specific samples.
Source code is available here.
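The step of merging overlapping k-mers into contigs can be pictured with a toy example. The sketch below joins k-mers that overlap by k-1 bases whenever the extension is unambiguous (exactly one successor, which in turn has exactly one predecessor). It is an assumption-laden simplification written for this newsletter: it ignores counts, reverse complements, and whatever additional criteria KaMRaT applies, and the function name is hypothetical.

```python
def merge_kmers(kmers):
    """Merge k-mers overlapping by k-1 bases into contigs (unambiguous paths only)."""
    kmers = sorted(set(kmers))
    if not kmers:
        return []
    k = len(kmers[0])

    by_prefix, by_suffix = {}, {}
    for kmer in kmers:
        by_prefix.setdefault(kmer[:k - 1], []).append(kmer)
        by_suffix.setdefault(kmer[1:], []).append(kmer)

    def unique_next(kmer):
        # One successor, and that successor has this k-mer as its only predecessor.
        successors = by_prefix.get(kmer[1:], [])
        if len(successors) == 1 and len(by_suffix.get(successors[0][:k - 1], [])) == 1:
            return successors[0]
        return None

    def unique_prev(kmer):
        # One predecessor, and that predecessor has this k-mer as its only successor.
        predecessors = by_suffix.get(kmer[:k - 1], [])
        if len(predecessors) == 1 and len(by_prefix.get(predecessors[0][1:], [])) == 1:
            return predecessors[0]
        return None

    used, contigs = set(), []
    for kmer in kmers:
        if kmer in used:
            continue
        # Rewind to the start of the unambiguous chain containing this k-mer.
        start, seen = kmer, {kmer}
        while True:
            previous = unique_prev(start)
            if previous is None or previous in seen or previous in used:
                break
            start = previous
            seen.add(previous)
        # Extend to the right one base at a time.
        contig, current = start, start
        used.add(start)
        while True:
            following = unique_next(current)
            if following is None or following in used:
                break
            contig += following[-1]
            used.add(following)
            current = following
        contigs.append(contig)
    return contigs


print(merge_kmers(["CGTA", "ACGT", "GTAC", "GGCA"]))
print(merge_kmers(["ACGT", "CGTA", "CGTC"]))  # branch point: nothing is merged
```

Stopping at branch points keeps each contig tied to a single, unambiguous stretch of sequence, at the cost of shorter contigs.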
BERMAD is an approach that tackles batch effects in scRNA-seq data by balancing under- and over-correction. Its multi-layer adaptation and dual-channel framework improve accuracy and retain heterogeneous information, with the authors reporting gains over existing techniques in extensive experiments.
NPSV-deep is a deep learning-based approach for genotyping structural variants from short-read genome sequencing data. It significantly enhances accuracy, reducing errors by 25% for high-confidence SVs, and improves overall genotyping concordance by 1.5 percentage points for GIAB SVs.
Source code and pre-trained models are available here.
HormoNet, utilizing deep learning, predicts hormone-drug interactions (HDI) and their risk levels by integrating hormone and drug target protein features. It attempts to address data imbalance and has demonstrated high performance on a few hormone-drug benchmark datasets, offering insights into HDI relationships for improved therapy design.
Source code available here.
UniProt API functionality facilitates easy access to protein information like names and taxonomy details using accession numbers. This feature enriches the platform by providing users with comprehensive data on proteins of interest, enhancing their research capabilities and data analysis efficiency.
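The item above does not say which client or platform it refers to, so the following is only a sketch of how an accession lookup might work against the public UniProt REST service. It assumes the entry endpoint at rest.uniprot.org and the JSON field layout shown in the hand-written sample; the helper names are hypothetical, and the sample dictionary is an abbreviated illustration rather than a real API response.

```python
import json
from urllib.request import urlopen

# Assumed endpoint pattern for a single UniProtKB entry in JSON form.
UNIPROT_ENTRY_URL = "https://rest.uniprot.org/uniprotkb/{accession}.json"


def entry_url(accession):
    """Build the entry URL for one accession number."""
    return UNIPROT_ENTRY_URL.format(accession=accession)


def summarize_entry(entry):
    """Pull the protein name and taxonomy details out of an entry dictionary.

    Missing fields come back as None instead of raising, since the field
    layout used here is an assumption.
    """
    description = entry.get("proteinDescription", {})
    name = description.get("recommendedName", {}).get("fullName", {}).get("value")
    organism = entry.get("organism", {})
    return {
        "accession": entry.get("primaryAccession"),
        "protein_name": name,
        "organism": organism.get("scientificName"),
        "taxon_id": organism.get("taxonId"),
    }


def fetch_summary(accession, timeout=30):
    """Download one entry and summarize it (requires network access)."""
    with urlopen(entry_url(accession), timeout=timeout) as response:
        return summarize_entry(json.load(response))


# Hand-written, abbreviated sample so the parsing can be tried offline.
sample_entry = {
    "primaryAccession": "P69905",
    "organism": {"scientificName": "Homo sapiens", "taxonId": 9606},
    "proteinDescription": {
        "recommendedName": {"fullName": {"value": "Hemoglobin subunit alpha"}}
    },
}

print(entry_url("P69905"))
print(summarize_entry(sample_entry))
```

Calling fetch_summary("P69905") would perform the live lookup; keeping the parsing separate from the download makes it easy to check offline.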
Community News
BV-BRC Beta, funded by NIAID, offers a beta website to support research on bacterial and viral infectious diseases. Integrating data and tools from PATRIC and IRD/ViPR, it facilitates biomedical research with essential pathogen information and advanced analysis capabilities.
DECIPHER version 11.24 introduces a new ACMG/AMP pathogenicity interface for sequence variant predictions and additional features.
Upcoming Events
Explore strain-level analysis in human microbiome research through computational tools and large-scale scenario investigations in this webinar. Ideal for those interested in microbial-omics studies, no prior metagenomic-specific knowledge is required. Delve into contemporary concepts in microbial ecosystem analysis.
Explore how long-read sequencing is impacting human genetics in a webinar by Radboudumc. Learn about the efficacy of HiFi sequencing in identifying mutations, its potential in understanding undiagnosed diseases and rare disease cohorts, and discover the relevance of HiFi genomes in clinical research.
Educational Corner
This blog post discusses the use of dimensionality reduction methods like PCA, t-SNE, and UMAP for visualizing high-dimensional data, highlighting concerns about potential distortions in data interpretation. It suggests adopting a probabilistic framework for interpreting dimensionality reduction plots, acknowledging the possibility of inaccuracies. The author proposes empirical user studies to evaluate the practical utility of these methods.
This tutorial delves into AlphaFold2, an AI system for predicting protein structures, catering to researchers with a basic grasp of protein structure. It covers prediction methods, validation, and integration into research projects, offering insights into its significance and into how to access predicted structures.
Connect with Us
Stay connected and engage with us on social media for daily updates, discussions, and more!
Subscribe
Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.
We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!
Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.
Contact: [email protected]
Copyright © 2024, Bioinformer Weekly Roundup. All rights reserved.