vSNP for WGS Data, RNA-seq with MultiRNAflow, sc-RNA and Spatial Transcriptomics with ctQC, KRAGEN: KG Enhanced RAG Framework

Bioinformer Weekly Roundup

Stay Updated with the Latest in Bioinformatics!

Issue: 40 | Date: 7 June 2024

Welcome to the Bioinformer Weekly Roundup!

In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you're a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we've got you covered. Subscribe now to stay ahead in the exciting realm of bioinformatics!

Featured Research

Single-nucleus analysis reveals microenvironment-specific neuron and glial cell enrichment in Alzheimer’s disease | BMC Genomics

This article applies bioinformatic methods to single-nucleus RNA sequencing data from Alzheimer’s disease (AD) patients. It identifies six cell types and a subcluster of neurons and glial cells co-expressing lncRNA-SNHG14, MRTFA, and MRTFB, which may contribute to microenvironment remodelling in AD.

Identification of 17 novel epigenetic biomarkers associated with anxiety disorders using differential methylation analysis followed by machine learning-based validation | bioRxiv

The study identifies 17 novel methylation biomarkers associated with anxiety disorders using targeted bisulfite sequencing. These biomarkers, linked to cell apoptosis, mitochondrial dysfunction, and neurosignaling regulation, were used to develop a diagnostic risk prediction system. The system, validated by machine learning, could enhance the clinical utility of anxiety disorder diagnostics.

Interpretable prediction of mRNA abundance from promoter sequence using contextual regression models | NAR Genomics & Bioinformatics

In this research, interpretable neural network models are designed to predict mRNA expression levels from DNA sequences. The Contextual Regression framework is applied to extract weighted features and cluster samples. Motif analysis reveals activating or repressive regulation of gene expression and uncovers multiple grammars of motif combinations, giving new insight into the regulatory architecture of promoter sequences.

The source code is available here.

Computational approaches and challenges in the analysis of circRNA data | BMC Genomics

Advancements in sequencing methods and bioinformatics tools have enabled the characterization of circular RNAs (circRNAs), a class of non-coding RNA. This review discusses computational strategies for circRNA identification, their role within the competing endogenous RNA network, interactions with RNA-binding proteins, and available databases for circRNA annotation.

From RNA sequence to its three-dimensional structure: geometrical structure, stability, and dynamics of selected fragments of SARS-CoV-2 RNA | NAR Genomics & Bioinformatics

The computational study investigates the folding of a specific SARS-CoV-2 RNA fragment into various structures, including a G-quadruplex and five different hairpins. The impact of two types of counterions and flanking nucleotides on these structures is examined. The G-quadruplex structure is found to be the most stable, and the seven-nucleotide loop is the most flexible part of the RNA fragment.

HiFi long-read amplicon sequencing for full-spectrum variants of human mtDNA | BMC Genomics

The study uses LR-PCR and PacBio HiFi sequencing to analyse mitochondrial diseases (MDs). It finds that long-read sequencing (LRS) is more effective than next-generation sequencing (NGS) in detecting low-frequency single nucleotide variants (SNVs) and structural variants (SVs) in mitochondrial DNA (mtDNA). The research provides insights into the genetics of mitochondrial diseases.

Data is available here.

Benchmarking of bioinformatics tools for the hybrid de novo assembly of human whole-genome sequencing data | bioRxiv

The research benchmarks 11 pipelines for de novo genome assembly of a human reference material. It evaluates software performance using QUAST, BUSCO, and Merqury metrics, and assesses computational costs. The study finds that Flye, especially with Ratatosk error-corrected long reads, combined with two rounds of Racon and Pilon polishing, yields the best results.

Latest Tools

Improving the Reliability and Quality of Nextflow Pipelines with nf-test | bioRxiv

The paper presents nf-test, a unified testing framework for Nextflow pipelines. It enables developers to test process blocks, workflow patterns, and entire pipelines. nf-test features snapshot testing and smart testing, which shorten development and cut test execution time by up to 80%.

The source code is available here.

vSNP: a SNP pipeline for the generation of transparent SNP matrices and phylogenetic trees from whole genome sequencing data sets | BMC Genomics

The vSNP pipeline, developed for diagnostic laboratories, enables easy verification and validation of sequence accuracy across various pathogens. It is used for real-time phylogenetic analysis of disease outbreaks and produces easy-to-read SNP matrices and phylogenetic trees.

The source code is available here.

KRAGEN: a knowledge Graph-Enhanced RAG framework for biomedical problem solving using large language models | Oxford Academic Bioinformatics

KRAGEN, a new tool, combines knowledge graphs, Retrieval Augmented Generation, and advanced prompting techniques. It converts knowledge graphs into a vector database and retrieves relevant facts. KRAGEN breaks down complex problems into smaller subproblems, solves each using relevant knowledge, and consolidates the solutions.

The source code is available here.
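
The retrieval step described above can be illustrated with a minimal sketch. This is not KRAGEN's code: the helper names (`triple_to_text`, `embed`, `build_index`, `retrieve`) are hypothetical, the three triples are illustrative, and a toy bag-of-words vector stands in for whatever embedding model and vector database the tool actually uses.

```python
import math
import re
from collections import Counter


def triple_to_text(triple):
    """Flatten a (head, relation, tail) triple into a plain sentence."""
    head, relation, tail = triple
    return f"{head} {relation.replace('_', ' ')} {tail}"


def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(triples):
    """Stand-in for a vector database: a list of (text, vector) pairs."""
    texts = [triple_to_text(t) for t in triples]
    return [(text, embed(text)) for text in texts]


def retrieve(index, question, top_k=2):
    """Return the top_k fact sentences most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Illustrative triples, not taken from the KRAGEN knowledge graph.
triples = [
    ("metformin", "treats", "type 2 diabetes"),
    ("BRCA1", "associated_with", "breast cancer"),
    ("aspirin", "inhibits", "COX-1"),
]
index = build_index(triples)
facts = retrieve(index, "Which drug treats type 2 diabetes?", top_k=1)
```

In a full pipeline of this kind, the retrieved sentences would be placed into the prompt for each subproblem before the language model is asked to answer it.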

ctQC improves biological inferences from single cell and spatial transcriptomics data | bioRxiv

This research introduces ctQC, a data-driven quality control approach tailored for single-cell RNA-seq data. It adapts to cell type variation by separating cell types, mitigating cell stress signatures, and minimizing ambient RNA artifacts. ctQC also maintains the spatial coherence of cell clusters in spatial RNA profiling data.

MultiRNAflow: integrated analysis of temporal RNA-seq data with multiple biological conditions | Oxford Academic Bioinformatics

The MultiRNAflow suite is introduced for analyzing raw sequencing data. It combines several packages into a unified framework, enabling both exploratory and supervised statistical analyses of temporal data across multiple biological conditions. This makes it suitable for complex experimental designs.

The source code is available here.

Readon: a novel algorithm to identify read-through transcripts with long-read sequencing data | Oxford Academic Bioinformatics

The research introduces Readon, a minimizer sketch algorithm developed to identify read-through transcripts in the human genome. It splits the reference sequence into active regions, calculates minimizers, and constructs arrays for query indexing. Readon is evaluated through comparative assessments and includes tools for predicting transcript outcomes and visualizing splicing patterns.

The source code is available here.
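
For readers unfamiliar with minimizers, here is a minimal sketch of the general idea. It is not Readon's implementation: the function name `minimizers`, the parameters `k` and `w`, and the use of plain lexicographic ordering are assumptions made for illustration. From every window of `w` consecutive k-mers, the smallest k-mer is kept, so similar sequences tend to share the same sampled k-mers.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs: the lexicographically smallest
    k-mer in each window of w consecutive k-mers (leftmost on ties)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first index among equal k-mers, i.e. the leftmost.
        offset = min(range(w), key=lambda j: window[j])
        hit = (start + offset, window[offset])
        # Adjacent windows often share a minimizer; record it only once.
        if not picked or picked[-1] != hit:
            picked.append(hit)
    return picked


sketch = minimizers("ACGTTGCA", k=3, w=3)
```

A sketch like this can then be stored in a lookup table keyed by k-mer, so that a query read is compared only against reference regions sharing at least one minimizer.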

ReactomeGSA: new features to simplify public data reuse | Oxford Academic Bioinformatics

ReactomeGSA, a multi-omics pathway analysis platform, has been updated to simplify the reuse and integration of public data. The update includes a new Python application, grein_loader, which fetches experiments from GREIN (GEO RNA-seq Experiments Interactive Navigator); the platform now supports both EMBL-EBI’s Expression Atlas and GREIN. A search function lets users find public datasets across both resources.

The source code is available here.

Predicted mechanistic impacts of human protein missense variants | bioRxiv

The study combines sequence and structure-based methods to predict the impact of protein missense variants on protein stability, protein-protein interactions, and small-molecule binding pockets. Using AlphaFold2, it predicts structures for nearly 500,000 protein complexes and about 100,000 small-molecule binding pockets. The study highlights the value of mechanism-aware variant effect predictions and characterizes the distribution of mechanistic impacts of protein variants found in patients.

Community News

STew: A new method for unveiling the spatial fingerprint of diseases | News Medical Life Sciences

Zhang and her team have developed a new computational machine learning method, Spatial Transcriptomic multi-viEW (STew), which allows for the joint analysis of spatial variation and gene expression changes. The method scales to large numbers of cells and effectively combines location and gene expression information.

Multiomics Couples Data Generation and Clinical Science | GEN Genetic Engineering & Biotechnology

The multiomics approach, which combines genomics, transcriptomics, proteomics, and digital pathology, is becoming a reality in research and clinical medicine. Despite some omics technologies lagging and the challenge of integrating massive amounts of data, this approach is enriching our understanding of health and disease. With the aid of artificial intelligence, it is expected to improve efficiency and outcomes in treating cancer and other serious illnesses and is just beginning to advance drug discovery and precision medicine.

Discovery of antibiotic lolamicin that targets deadly bacteria without harming gut microbiome | News Medical Life Sciences

Researchers have designed lolamicin, a selective antibiotic that targets the lipoprotein transport system in Gram-negative bacteria. It’s effective against multidrug-resistant pathogens, spares the gut microbiome, and prevents secondary infections. The development of lolamicin addresses the need for new Gram-negative antibacterial agents, as most antibiotics disrupt the gut microbiome, and no new class has been approved by the FDA in over 50 years.

Educational Corner

Transitioning from gnomAD V2 to V4: Impact on genetic variant classification | Zifo Clinical Genomics

The gnomAD database, crucial for determining the pathogenicity of observed variants, has been updated to version 4 (V4). V4, based on reference genome GRCh38, includes variants from exomes and genomes. This blog discusses important considerations to keep in mind as most labs transition from gnomAD V2 to V4.

CBIOPORTAL – HOW TO MAKE IT WORK FOR YOU | Zifo Bioinformatics

This blog post discusses cBioPortal, an open-source platform that visualizes, analyses, and serves large-scale cancer genomics data sets. It covers the data, computational requirements, installation steps, and various challenges involved in working with cBioPortal.

Simple custom colour palettes with R ggplot graphs | R Bloggers

This blog explores a method for defining a custom colour palette in R with the ggplot library, walking through the code snippets used to set the palette.

Connect with Us

Stay connected and engage with us on social media for daily updates, discussions, and more!

Subscribe

Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.

Subscribe Now

We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!


Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.

Contact: [email protected]

Copyright © 2024, Bioinformer Weekly Roundup. All rights reserved.


