vSNP for WGS Data, RNA-seq with MultiRNAflow, sc-RNA and Spatial Transcriptomics with ctQC, KRAGEN: KG Enhanced RAG Framework
Bioinformer Weekly Roundup
Stay Updated with the Latest in Bioinformatics!
Issue: 40 | Date: 7 June 2024
Welcome to the Bioinformer Weekly Roundup!
In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you're a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we've got you covered. Subscribe now to stay ahead in the exciting realm of bioinformatics!
Featured Research
The article applies bioinformatic methods to single-nucleus RNA sequencing data from Alzheimer’s disease (AD) patients. It identifies six cell types and a subcluster of neurons and glial cells co-expressing lncRNA-SNHG14, MRTFA, and MRTFB. This subcluster may contribute to microenvironment remodelling in AD.
The study identifies 17 novel methylation biomarkers associated with anxiety disorders using targeted bisulfite sequencing. These biomarkers, linked to cell apoptosis, mitochondrial dysfunction, and neurosignaling regulation, were used to develop a diagnostic risk prediction system. The system, validated by machine learning, could enhance the clinical utility of anxiety disorder diagnostics.
In this research, interpretable neural network models are designed to predict mRNA expression levels from DNA sequences. The Contextual Regression framework is applied to extract weighted features and cluster samples. Motif analysis reveals both activating and repressive regulation of gene expression and uncovers multiple grammars of motif combinations. This provides new insights into the regulatory architecture of promoter sequences.
The source code is available here.
Advancements in sequencing methods and bioinformatics tools have enabled the characterization of circular RNAs (circRNAs), a class of non-coding RNA. This review discusses computational strategies for circRNA identification, their role within the competing endogenous RNA network, interactions with RNA-binding proteins, and available databases for circRNA annotation.
The computational study investigates the folding of a specific SARS-CoV-2 RNA fragment into various structures, including a G-quadruplex and five different hairpins. The impact of two types of counterions and flanking nucleotides on these structures is examined. The G-quadruplex structure is found to be the most stable, and the seven-nucleotide loop is the most flexible part of the RNA fragment.
The study uses LR-PCR and PacBio HiFi sequencing to analyse mitochondrial diseases (MDs). It finds that long-read sequencing (LRS) is more effective than next-generation sequencing (NGS) in detecting low-frequency single nucleotide variants (SNVs) and structural variants (SVs) in mitochondrial DNA (mtDNA). The research provides insights into the genetics of mitochondrial diseases.
Data is available here.
The research benchmarks 11 different pipelines for de novo genome assembly of a human reference material. It evaluates software performance using QUAST, BUSCO, and Merqury metrics, and assesses computational costs. The study finds that Flye, especially when using Ratatosk error-corrected long reads, combined with two rounds of Racon and Pilon polishing, yields the best results.
Latest Tools
The paper discusses nf-test, a unified testing framework for Nextflow pipelines. It enables developers to test process blocks, workflow patterns, and entire pipelines. nf-test features snapshot testing and smart testing, significantly reducing development time and test execution time by up to 80%.
The source code is available here.
The vSNP pipeline, developed for diagnostic laboratories, enables easy verification and validation of sequence accuracy across various pathogens. It is used for real-time phylogenetic analysis of disease outbreaks and produces easy-to-read SNP matrices and phylogenetic trees.
The source code is available here.
KRAGEN, a new tool, combines knowledge graphs, Retrieval Augmented Generation, and advanced prompting techniques. It converts knowledge graphs into a vector database and retrieves relevant facts. KRAGEN breaks down complex problems into smaller subproblems, solves each using relevant knowledge, and consolidates the solutions.
The source code is available here.
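To make the retrieval step in the KRAGEN summary above concrete, here is a minimal sketch of turning knowledge-graph triples into vectors and fetching the most relevant fact for a question. It is not KRAGEN's code: the function names (`embed`, `cosine`, `retrieve`), the bag-of-words vectors standing in for a real embedding model and vector database, and the three toy triples are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy knowledge graph as (subject, relation, object) triples.
# Illustrative only; not taken from KRAGEN or its data.
TRIPLES = [
    ("aspirin", "inhibits", "COX-1"),
    ("metformin", "treats", "type 2 diabetes"),
    ("BRCA1", "associated_with", "breast cancer"),
]

def embed(text):
    """Bag-of-words vector; a stand-in for a learned embedding."""
    return Counter(text.lower().replace("_", " ").split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, triples, top_k=1):
    """Return the top_k triples most similar to the question."""
    query = embed(question)
    ranked = sorted(
        triples,
        key=lambda t: cosine(query, embed(" ".join(t))),
        reverse=True,
    )
    return ranked[:top_k]

if __name__ == "__main__":
    print(retrieve("which drug treats type 2 diabetes", TRIPLES))
```

In a fuller pipeline, the retrieved facts would be passed to a language model along with each subproblem; that step is omitted here.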
This research introduces ctQC, a data-driven quality control approach tailored for single-cell RNA-seq data. It adapts to cell type variations by separating cell types, mitigating cell stress signatures, and minimizing ambient RNA artifacts. ctQC also maintains spatial coherence of cell clusters in spatial RNA profiling data.
The MultiRNAflow suite is introduced for analyzing raw sequencing data. It combines several packages into a unified framework, enabling both exploratory and supervised statistical analyses of temporal data across multiple biological conditions. This makes it suitable for complex experimental designs.
The source code is available here.
The research introduces Readon, a minimizer sketch algorithm developed to identify read-through transcripts in the human genome. It splits the reference sequence into active regions, calculates minimizers, and constructs arrays for query indexing. The authors report comparative assessments, and Readon includes tools for predicting transcript outcomes and visualizing splicing patterns.
The source code is available here.
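For readers new to minimizers, the sketching idea mentioned in the Readon summary above can be illustrated with a short, generic example. This is the textbook (w, k)-minimizer scheme, not Readon's implementation: the function name `minimizers`, the lexicographic ordering of k-mers, and the parameters `k` and `w` are assumptions chosen for clarity.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs chosen as minimizers.

    For every window of w consecutive k-mers, keep the
    lexicographically smallest k-mer (leftmost on ties).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))
        picked.add((best, kmers[best]))
    return sorted(picked)

if __name__ == "__main__":
    # Adjacent windows often share a minimizer, so the sketch is
    # smaller than the full list of k-mers.
    for pos, kmer in minimizers("ACGTACGT", k=3, w=2):
        print(pos, kmer)
```

Because two sequences that share a long enough substring also share its minimizers, indexing only these k-mers keeps lookups cheap while still finding matching regions.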
ReactomeGSA, a multi-omics pathway analysis platform, has been updated to simplify the reuse and integration of public data. The update adds a new Python application, grein_loader, which fetches experiments from the GEO RNA-seq Experiments Interactive Navigator (GREIN). ReactomeGSA now supports both EMBL-EBI’s Expression Atlas and GREIN, and a search function lets users find public datasets across both resources.
The source code is available here.
The study combines sequence and structure-based methods to predict the impact of protein missense variants on protein stability, protein-protein interactions, and small-molecule binding pockets. Using AlphaFold2, it predicts structures for nearly 500,000 protein complexes and about 100,000 small-molecule binding pockets. The study highlights the value of mechanism-aware variant effect predictions and characterizes the distribution of mechanistic impacts of protein variants found in patients.
Community News
Zhang and her team have developed a new machine learning method, Spatial Transcriptomic multi-viEW (STew), which allows for the joint analysis of spatial variation and gene expression changes. This method can handle large numbers of cells and effectively combines location and genetic information.
The multiomics approach, which combines genomics, transcriptomics, proteomics, and digital pathology, is becoming a reality in research and clinical medicine. Despite some omics technologies lagging and the challenge of integrating massive amounts of data, this approach is enriching our understanding of health and disease. With the aid of artificial intelligence, it is expected to improve efficiency and outcomes in treating cancer and other serious illnesses and is just beginning to advance drug discovery and precision medicine.
Researchers have designed lolamicin, a selective antibiotic that targets the lipoprotein transport system in Gram-negative bacteria. It’s effective against multidrug-resistant pathogens, spares the gut microbiome, and prevents secondary infections. The development of lolamicin addresses the need for new Gram-negative antibacterial agents, as most antibiotics disrupt the gut microbiome, and no new class has been approved by the FDA in over 50 years.
Educational Corner
The gnomAD database, crucial for determining the pathogenicity of observed variants, has been updated to version 4 (V4). V4, based on reference genome GRCh38, includes variants from exomes and genomes. This blog post discusses important considerations to keep in mind as most labs transition from gnomAD V2 to V4.
This blog post discusses cBioPortal, an open-source platform that visualizes, analyses, and serves large-scale cancer genomics data sets. It covers the data, computational requirements, installation steps, and various challenges involved in working with cBioPortal.
This blog post explores how to set up a color palette in R with the ggplot library, walking through code snippets for palette setting.
Connect with Us
Stay connected and engage with us on social media for daily updates, discussions, and more!
Subscribe
Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.
We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!
Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.
Contact: [email protected]
Copyright © 2024, Bioinformer Weekly Roundup. All rights reserved.