nf-core’s fetchngs in Spotlight, MultiQC v1.21 Release, AI Breakthroughs in Predictive Medicine, tRigon illuminates (Path-)Omics Analysis!

Bioinformer Weekly Roundup

Stay Updated with the Latest in Bioinformatics!

Issue: 27 | Date: 8 March 2024

Welcome to the Bioinformer Weekly Roundup!

In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you're a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we've got you covered. Subscribe now to stay ahead in the exciting realm of bioinformatics!

Featured Research

A comparison of marker gene selection methods for single-cell RNA sequencing data | Genome Biology

This study assesses 59 methods for marker gene selection in single-cell RNA sequencing (scRNA-seq) data, using real and simulated datasets. Evaluation criteria include marker gene recovery, predictive accuracy, memory usage, speed, and implementation quality. Simple techniques such as the Wilcoxon rank-sum test, Student’s t-test, and logistic regression perform notably well. Case studies examine the performance of commonly used methods in more detail.
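To illustrate why a simple test can work well here, the sketch below ranks genes for one cluster by the Wilcoxon rank-sum (Mann-Whitney U) statistic, divided by the number of cell pairs so it reads as an AUC between 0 and 1. It is a minimal pure-Python sketch, not code from the study: the gene-by-cell dictionary, the toy gene names and values, and the function names are hypothetical, and p-values and multiple-testing correction are omitted.

```python
from typing import Dict, List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_auc(in_group: List[float], rest: List[float]) -> float:
    """Mann-Whitney U for in_group vs rest, divided by n1*n2 (an AUC in [0, 1])."""
    ranks = average_ranks(in_group + rest)
    n1, n2 = len(in_group), len(rest)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return u1 / (n1 * n2)


def rank_markers(expr: Dict[str, List[float]], labels: List[str],
                 cluster: str) -> List[Tuple[str, float]]:
    """Score every gene for `cluster` vs all other cells; highest AUC first."""
    inside = [i for i, lab in enumerate(labels) if lab == cluster]
    outside = [i for i, lab in enumerate(labels) if lab != cluster]
    scores = []
    for gene, values in expr.items():
        auc = rank_sum_auc([values[i] for i in inside], [values[i] for i in outside])
        scores.append((gene, auc))
    return sorted(scores, key=lambda gs: gs[1], reverse=True)


# Hypothetical toy data: expression of three genes in six cells.
labels = ["T", "T", "T", "B", "B", "B"]
expr = {
    "CD3E": [5, 6, 7, 0, 1, 0],
    "MS4A1": [0, 0, 1, 8, 9, 7],
    "ACTB": [5, 5, 5, 5, 5, 5],
}
print(rank_markers(expr, labels, "T"))
# [('CD3E', 1.0), ('ACTB', 0.5), ('MS4A1', 0.0)]
```

An AUC of 1.0 means every cell in the cluster expresses the gene more highly than every other cell; a uniformly expressed housekeeping-like gene lands at 0.5.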

Mining drug-target interactions from biomedical literature using chemical and gene descriptions-based ensemble transformer model | bioRxiv

The study explores the extraction of drug-target interactions (DTIs) from biomedical literature using natural language processing (NLP) and pre-trained language models. Combining gene descriptions from Entrez Gene with chemical descriptions from the Comparative Toxicogenomics Database (CTD) gave the best performance, an F1 score of 80.6 on the DrugProt test set. A comparative analysis also assesses how well gene descriptions from the Entrez Gene and UniProt databases improve DTI extraction.

Characterizing the impacts of dataset imbalance on single-cell data integration | Nature Biotechnology

This study examines how cell type imbalances across datasets affect single-cell transcriptomic integration. Using the Iniquitate pipeline, it evaluates integration robustness under diverse imbalance scenarios. Benchmarking five integration methods reveals significant impacts on downstream analysis, prompting new metrics and guidelines for handling imbalance in integration strategies.

Proportion-based normalizations outperform compositional data transformations in machine learning applications | Microbiome

This study compares normalization methods for microbiome data analysis, indicating that simpler relative abundance-based transformations perform slightly better than complex compositionally aware methods. This suggests that minimizing complexity while correcting for read depth may be preferable in data preparation for machine learning.
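The difference between the two families of normalization can be sketched in a few lines. Relative abundance (a proportion-based normalization) divides each count by the sample's read depth; the centered log-ratio (CLR) is used here as one representative compositionally aware transform, which is an assumption, since the study compares several. The counts and the pseudocount of 1 are hypothetical.

```python
import math
from typing import List


def relative_abundance(counts: List[float]) -> List[float]:
    """Proportion-based normalization: each count divided by the sample's read depth."""
    depth = sum(counts)
    if depth == 0:
        raise ValueError("sample has zero reads")
    return [c / depth for c in counts]


def clr(counts: List[float], pseudocount: float = 1.0) -> List[float]:
    """Centered log-ratio: log(count + pseudocount) minus the sample's mean log.

    The pseudocount avoids log(0); 1.0 is a common but arbitrary choice.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical counts for four taxa in one sample, and the same
# composition sequenced ten times deeper.
sample = [120, 30, 0, 50]
deeper = [c * 10 for c in sample]

print(relative_abundance(sample))                                # [0.6, 0.15, 0.0, 0.25]
print(relative_abundance(sample) == relative_abundance(deeper))  # True
print(clr(sample) == clr(deeper))                                # False
```

Relative abundance gives identical values at both depths, whereas CLR with a fixed pseudocount does not, because the pseudocount carries less weight at higher depth. This illustrates the extra choices that compositional transforms introduce; it does not reproduce the study's analysis.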

Gbdmr: identifying differentially methylated CpG regions in the human genome via generalized beta regressions | BMC Bioinformatics

This study presents gbdmr, a DMR detection algorithm based on generalized beta regression, and compares it with dmrff and traditional EWAS. gbdmr performs better when CpG sites are strongly correlated, while dmrff excels when correlation is weak. Applied to multiple DNA methylation datasets, gbdmr identifies more phenotype-associated DMR CpGs, supporting its reliability for DMR detection.

Community assesses differential cell communication using large multi-sample case-control scRNAseq datasets | bioRxiv

As scRNA-seq is applied to larger cohorts, it becomes possible to study how cell-cell communication is disrupted in disease. Community, an R-based tool, analyses differential communication between case and control cohorts in scRNA-seq data while integrating cell type abundance. Tested on ulcerative colitis and acute myeloid leukaemia datasets, Community outperforms other pipelines in speed and robustness for assessing differential cell-cell communication.

Uncertainty-aware single-cell annotation with a hierarchical reject option | Bioinformatics Oxford Academic

This article evaluates rejection strategies (full, partial, none) in cell type annotation methods, finding that hierarchical classifiers perform better with partial rejection, which preserves label information. Optimal rejection requires careful threshold selection. Without rejection, flat and hierarchical annotation perform similarly when transcriptomic relationships are accurately captured.

Code is available here.
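Partial rejection in a hierarchy can be pictured as backing off from an uncertain leaf label to a broader ancestor label. The sketch below is a hypothetical illustration, not the article's method: it assumes a classifier that outputs leaf-level probabilities, a child-to-parent dictionary for the cell type hierarchy, and a single threshold of 0.7. It returns the most specific label whose summed probability clears that threshold, or None (full rejection) if no label does.

```python
from typing import Dict, List, Optional


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Labels above `node`, nearest first (parent, grandparent, ...)."""
    path = []
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mass_under(node: str, leaf_probs: Dict[str, float], parent: Dict[str, str]) -> float:
    """Total predicted probability of the leaves that descend from `node`."""
    return sum(p for leaf, p in leaf_probs.items() if node in ancestors(leaf, parent))


def predict_with_partial_reject(leaf_probs: Dict[str, float], parent: Dict[str, str],
                                threshold: float = 0.7) -> Optional[str]:
    best = max(leaf_probs, key=leaf_probs.get)
    if leaf_probs[best] >= threshold:
        return best                      # confident leaf-level call
    for node in ancestors(best, parent):
        if mass_under(node, leaf_probs, parent) >= threshold:
            return node                  # partial rejection: coarser label
    return None                          # full rejection: no label is confident


# Hypothetical two-level hierarchy of immune cell types.
parent = {"CD4 T": "T cell", "CD8 T": "T cell", "Naive B": "B cell", "Memory B": "B cell"}

confident = {"CD4 T": 0.90, "CD8 T": 0.05, "Naive B": 0.03, "Memory B": 0.02}
ambiguous_t = {"CD4 T": 0.45, "CD8 T": 0.40, "Naive B": 0.10, "Memory B": 0.05}
unclear = {"CD4 T": 0.30, "CD8 T": 0.20, "Naive B": 0.25, "Memory B": 0.25}

for probs in (confident, ambiguous_t, unclear):
    print(predict_with_partial_reject(probs, parent))
# CD4 T
# T cell
# None
```

The middle case shows why partial rejection preserves information: the cell cannot be resolved to a T cell subtype, but the coarser "T cell" label is still returned instead of discarding it.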

Advances in AI and machine learning for predictive medicine | Journal of Human Genetics

This review examines how deep learning, particularly convolutional neural networks (CNNs), is transforming predictive modelling in omics analysis. By converting omics data into image-like formats, CNNs can improve prediction accuracy, although challenges such as model interpretability remain. Interdisciplinary collaboration is vital for tackling these obstacles and realizing the full potential of CNNs in omics research.

Latest Tools

Just out - MultiQC v1.21

The latest release of MultiQC introduces a box plot feature and an "Export to CSV" button for tables, enhancing data visualization and export capabilities. Other improvements include replacing setup.py with pyproject.toml, enhanced heatmap functionality, and better handling of non-existent modules and non-hashable values.

Seqera’s latest “Pipeline in the Spotlight”

nf-core/fetchngs is a bioinformatics pipeline to fetch metadata and raw FastQ files from public databases. At present, the pipeline supports SRA, ENA, DDBJ, and GEO IDs. See the usage documentation.
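Because the pipeline takes a plain list of database identifiers as input, a small pre-flight check can catch typos before a run. This is a hypothetical helper, not part of nf-core/fetchngs: the prefix-to-database table reflects common SRA/ENA/DDBJ/GEO accession conventions and is an assumption, so the pipeline's usage documentation remains the authority on which ID types are accepted. The helper writes one ID per line to ids.csv, the file passed to the pipeline with `--input`.

```python
import re
from typing import Iterable

# Assumed accession prefixes per database (common INSDC/GEO conventions).
# The nf-core/fetchngs usage docs are the authority on what the pipeline accepts.
PREFIX_TO_DB = {
    "SRR": "SRA", "SRX": "SRA", "SRS": "SRA", "SRP": "SRA", "PRJNA": "SRA",
    "ERR": "ENA", "ERX": "ENA", "ERS": "ENA", "ERP": "ENA", "PRJEB": "ENA",
    "DRR": "DDBJ", "DRX": "DDBJ", "DRS": "DDBJ", "DRP": "DDBJ", "PRJDB": "DDBJ",
    "GSE": "GEO", "GSM": "GEO",
}
ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def source_database(accession: str) -> str:
    """Return the database an accession appears to come from, or raise ValueError."""
    match = ACCESSION.match(accession.strip())
    if match is None or match.group(1) not in PREFIX_TO_DB:
        raise ValueError(f"unrecognised accession: {accession!r}")
    return PREFIX_TO_DB[match.group(1)]


def write_ids_csv(accessions: Iterable[str], path: str = "ids.csv") -> None:
    """Validate every ID first, then write one ID per line."""
    ids = [acc.strip() for acc in accessions]
    for acc in ids:
        source_database(acc)  # raises before anything is written
    with open(path, "w") as handle:
        handle.write("\n".join(ids) + "\n")


if __name__ == "__main__":
    write_ids_csv(["SRR9984183", "ERR1160846", "DRR028935"])
    # Then, for example:
    #   nextflow run nf-core/fetchngs -profile docker --input ids.csv --outdir results
```

Validating all IDs before opening the file means a single malformed accession stops the script without leaving a half-written ids.csv behind.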

Holomics - a user-friendly R shiny application for multi-omics data integration and analysis | BMC Bioinformatics

Holomics, an R shiny app, offers user-friendly tools for multi-omics analysis. It simplifies data upload, filtering, single-omics analysis, and multi-omics integration. A case study on sugar beet storability showcases its versatility and consistency.

SpatialView: an interactive web application for visualization of multiple samples in spatial transcriptomics experiments | Bioinformatics Oxford Academic

SpatialView, an open-source web application, addresses the need for interactive visualization tools in spatial transcriptomics (ST) experiments. It enables users to visualize data and results from multiple 10x Genomics Visium ST experiments, facilitating investigations into cellular heterogeneity and tissue organization within diseases.

SpatialView is available here.

Rapid and sensitive detection of genome contamination at scale with FCS-GX | Genome Biology

FCS-GX rapidly identifies and removes contaminant sequences from assembled genomes with high sensitivity and specificity. Applied to 1.6 million GenBank assemblies, it detected 36.8 Gbp of contamination, prompting updates to NCBI RefSeq assemblies.

FCS-GX is available here.

tRigon: an R package and Shiny App for integrative (path-)omics data analysis | BMC Bioinformatics

tRigon, a Shiny application, facilitates fast, comprehensive, and reproducible analysis of high-dimensional pathomics datasets, addressing challenges like outlier variability and data missingness. Available on CRAN and GitLab, tRigon offers local installation or online access, demonstrating rapid computation across datasets of varying sizes and hardware settings.

tRigon is available via the CRAN repository with its source code available on GitLab.

KaMRaT: a C++ toolkit for k-mer count matrix dimension reduction | Bioinformatics Oxford Academic

KaMRaT, implemented in C++, processes large k-mer count tables from multi-sample RNA-seq data to identify condition-specific or differentially expressed sequences, independent of gene or transcript annotation. It scores k-mers using count statistics, merges overlapping k-mers into contigs, and selects k-mers based on their occurrence across specific samples.

Source code is available here.
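The three steps in the KaMRaT summary (count k-mers, select condition-specific ones, merge overlapping k-mers into contigs) can be illustrated on toy data. This pure-Python sketch is not KaMRaT's implementation or interface: pooling reads per condition, the "seen at least twice in case, never in control" filter, and the greedy (k-1)-overlap merge are simplifying assumptions standing in for KaMRaT's count statistics and merging rules.

```python
from collections import Counter
from typing import List


def count_kmers(reads: List[str], k: int) -> Counter:
    """Count every length-k substring across a set of reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def condition_specific(case: Counter, control: Counter, min_count: int = 2) -> List[str]:
    """Hypothetical filter: seen at least min_count times in case, never in control."""
    return sorted(km for km, c in case.items() if c >= min_count and control[km] == 0)


def merge_overlapping(kmers: List[str]) -> List[str]:
    """Greedily join k-mers that overlap by k-1 bases into contigs."""
    remaining = set(kmers)
    contigs = []
    while remaining:
        contig = remaining.pop()
        extended = True
        while extended:
            extended = False
            for km in list(remaining):
                if contig.endswith(km[:-1]):        # km extends the contig to the right
                    contig += km[-1]
                elif contig.startswith(km[1:]):     # km extends the contig to the left
                    contig = km[0] + contig
                else:
                    continue
                remaining.remove(km)
                extended = True
        contigs.append(contig)
    return sorted(contigs)


# Hypothetical reads: the case carries a variant absent from the control.
case = count_kmers(["ACGTTGCA", "ACGTTGCA"], k=4)
control = count_kmers(["ACGTAAAA"], k=4)
specific = condition_specific(case, control)
print(specific)                     # ['CGTT', 'GTTG', 'TGCA', 'TTGC']
print(merge_overlapping(specific))  # ['CGTTGCA']
```

The shared k-mer ACGT is filtered out because it also occurs in the control, and the four remaining k-mers collapse into one contig. With branching k-mers the greedy merge would depend on iteration order, which a real tool has to handle more carefully.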

BERMAD: batch effect removal for single-cell RNA-seq data using a multi-layer adaptation autoencoder with dual-channel framework | Bioinformatics Oxford Academic

BERMAD tackles batch effects in scRNA-seq data by balancing under- and over-correction. Its multi-layer adaptation and dual-channel framework improve accuracy and retain heterogeneous information, as demonstrated in extensive experiments.

NPSV-deep: a deep learning method for genotyping structural variants in short read genome sequencing data | Bioinformatics Oxford Academic

NPSV-deep is a deep learning-based approach for genotyping structural variants from short-read genome sequencing data. It significantly enhances accuracy, reducing errors by 25% for high-confidence SVs, and improves overall genotyping concordance by 1.5 percentage points for GIAB SVs.

Source code and pre-trained models are available here.

HormoNet: a deep learning approach for hormone-drug interaction prediction | BMC Bioinformatics

HormoNet uses deep learning to predict hormone-drug interactions (HDIs) and their risk levels by integrating features of hormones and drug target proteins. It attempts to address data imbalance and has demonstrated high performance on a few hormone-drug benchmark datasets, offering insights into HDI relationships for improved therapy design.

Source code is available here.

UniprotR: Retrieving Information of Proteins from Uniprot | CRAN

UniprotR uses the UniProt API to provide easy access to protein information, such as names and taxonomy details, from accession numbers. This gives users comprehensive data on proteins of interest, supporting their research and making data analysis more efficient.

Community News

Bacterial and Viral Bioinformatics Resource Center Website Now Available | Global Biodefense

BV-BRC Beta, funded by NIAID, offers a beta website to support research on bacterial and viral infectious diseases. Integrating data and tools from PATRIC and IRD/ViPR, it facilitates biomedical research with essential pathogen information and advanced analysis capabilities.

DECIPHER v11.24 Released | EMBL EBI

DECIPHER version 11.24 introduces a new ACMG/AMP pathogenicity interface for sequence variant predictions and additional features.

Upcoming Events

Strain-resolved approaches for human microbiome studies | EMBL EBI

This webinar explores strain-level analysis in human microbiome research using computational tools and large-scale investigations. It is ideal for anyone interested in microbial omics studies, and no prior metagenomics-specific knowledge is required. It also covers contemporary concepts in microbial ecosystem analysis.

Assessing HiFi genomes as first-tier analysis in rare disease genetic research | PacBio

Explore how long-read sequencing is impacting human genetics in a webinar by Radboudumc. Learn about the efficacy of HiFi sequencing in identifying mutations, its potential in understanding undiagnosed diseases and rare disease cohorts, and discover the relevance of HiFi genomes in clinical research.

Educational Corner

Assessing the utility of data visualizations based on dimensionality reduction | Matthew N. Bernstein

This blog post discusses the use of dimensionality reduction methods like PCA, t-SNE, and UMAP for visualizing high-dimensional data, highlighting concerns about potential distortions in data interpretation. It suggests adopting a probabilistic framework for interpreting dimensionality reduction plots, acknowledging the possibility of inaccuracies. The author proposes empirical user studies to evaluate the practical utility of these methods.

AlphaFold: A practical guide | EMBL EBI

This tutorial introduces AlphaFold2, an AI system for predicting protein structures, and is aimed at researchers with a basic grasp of protein structure. It covers prediction methods, validation, and integration into research projects, explains AlphaFold2's significance, and shows how to access predicted structures.

Connect with Us

Stay connected and engage with us on social media for daily updates, discussions, and more!

Subscribe

Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.

Subscribe Now

We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!

Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.

Contact: [email protected]

Zifo Bioinformatics