CREDO for Docker File, AI Boosts Patent Data in SureChEMBL, Unravelling Transcription Regulator Patterns in Single-Cell Transcriptomics!
Bioinformer Weekly Roundup
Stay Updated with the Latest in Bioinformatics!
Issue: 28 | Date: 15 March 2024
Welcome to the Bioinformer Weekly Roundup!
In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you're a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we've got you covered. Subscribe now to stay ahead in the exciting realm of bioinformatics!
Featured Research
This research investigates the influence of various nf-core/rnaseq pipeline settings on data comparability, employing different workflows on multiple datasets with spike-ins. Comparative analysis of fold change ratios and differential gene expression reveals an 85% overlap across pipelines. Genes at lower concentrations and single-isoform genes are notably impacted. Recommendations emphasize maintaining pipeline consistency or gradual transition to ensure consistent data analysis.
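As a rough illustration of how an overlap figure like the 85% above could be computed, here is a minimal Python sketch. The summary does not say how the study defines overlap, so the definition used here (shared genes divided by the size of the smaller list) is an assumption, and the gene lists and function name are hypothetical.

```python
def overlap_fraction(genes_a, genes_b):
    """Return the share of the smaller gene set that also appears in the other set.

    This is one possible overlap definition (an assumption); it is not
    necessarily the one used in the study.
    """
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical differentially expressed gene lists from two pipeline settings.
pipeline_a = ["GENE1", "GENE2", "GENE3", "GENE4"]
pipeline_b = ["GENE2", "GENE3", "GENE4", "GENE5", "GENE6"]

shared = overlap_fraction(pipeline_a, pipeline_b)
print(f"Overlap between pipelines: {shared:.0%}")
```

Dividing by the smaller set keeps the figure from being penalized when one pipeline simply reports more genes; a Jaccard index (intersection over union) would be a stricter alternative.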
The study employs multivariate analysis to identify additional SNPs linked to kidney function markers in chronic kidney disease (CKD). By applying canonical correlation analysis (CCA) and metaCCA to genotype datasets, the authors identify previously unreported lead SNPs for kidney function markers. These SNPs show eQTL colocalization with CKD-related genes, suggesting potential pathways for further investigation. Additionally, correlation analysis suggests that the SNPs are jointly associated with both kidney function markers, which may inform future CKD analyses.
Crescendo is introduced as an integration algorithm for spatial transcriptomics data, aiming to visualize genes and their spatial patterns across samples by correcting gene expression counts for technical confounders. Demonstrations on various datasets showcase Crescendo's scalability and cross-technology integration capabilities, facilitating the detection of gene-gene colocalization, including ligand-receptor interactions.
The study investigates challenges in telomere-to-telomere de novo genome assembly, focusing on gaps introduced by string graph simplification heuristics. Mathematical analysis highlights a higher gap frequency in Oxford Nanopore reads due to their read-length distribution. The RAFT algorithm addresses this issue by fragmenting reads, resulting in improved contiguity and haplotype resolution in real datasets.
This research examines how genes work together with transcription regulators (TRs) across many single-cell RNA-seq datasets in humans and mice. The authors focus on consistent patterns and create rankings for TRs, as other data-driven methods do. This helps identify possible regulatory relationships between genes, including ones that hold across species, such as those involving ASCL1 in brain cells. The authors suggest that the approach could be helpful for the research community.
The study explores targeting Powassan virus (POWV) proteins for vaccine development using bioinformatics. Predicted epitopes undergo rigorous screening for vaccine candidacy, with results suggesting potential efficacy in inducing immune responses against POWV. However, further research is needed to validate these findings and advance efforts against POWV and related arbovirus infections.
This report introduces a divide-and-conquer strategy for constructing multiple sequence alignments (MSAs) to predict protein-protein interactions (PPIs). By generating separate alignments within different clades of the evolutionary tree, coevolutionary signals are identified and integrated using machine learning techniques, aiming to improve prediction accuracy and alignment quality for genome-wide interaction scans in bacterial genomes.
This article investigates drought resistance in doum palms (Hyphaene compressa). Transcriptome analysis under water stress reveals significant differentially expressed genes (DEGs) related to cellular processes, metabolism, and transcription factors such as MYB and WRKY. The findings provide foundational insights into drought stress responses in doum palms, aiding future genetic modification efforts for enhanced resilience.
Latest Tools
The study introduces Oncosplice, a tool for assessing mutations' impact on alternative splicing. It identifies 8K deleterious variants among 12M somatic mutations and enhances patient survival estimation by pinpointing new cancer-involved genes. With a 94% positive predictive value for pathogenic variants, Oncosplice aims to accelerate insight into silent mutations' consequences and facilitate variant dataset filtering, with immediate experimental and clinical applications.
Source code for Oncosplice is available here.
CREDO is a tool designed to address reproducibility challenges in bioinformatics research by simplifying the creation of Docker images with embedded bioinformatics tools. It facilitates the incorporation of heterogeneous packages and environments, enhancing Docker image reproducibility and promoting open science practices.
All data files are available at: CREDOengine; CREDOgui.
The paper introduces RNA3DB, a dataset derived from the Protein Data Bank (PDB) for training and benchmarking deep learning models in RNA structure prediction. RNA3DB divides RNA 3D chains into non-redundant groups (Components), ensuring distinct training, validation, and testing sets by sequence and structure. The dataset, along with methodology and source code, aims to facilitate reproducible and customizable dataset splits for structural RNA research.
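To make the idea of non-overlapping splits concrete, here is a minimal Python sketch of a group-aware split, in which every chain of a given Component lands in the same subset. It is not RNA3DB's actual code or API: the function name, the 80/10/10 fractions, and the toy chain-to-Component mapping are all assumptions for illustration.

```python
import random


def split_by_component(chain_to_component, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole components to train/val/test so no component spans two splits.

    chain_to_component: dict mapping a chain ID to its component ID.
    fractions: approximate train/val/test shares of the components (assumed values).
    """
    components = sorted(set(chain_to_component.values()))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(components)

    n_train = int(len(components) * fractions[0])
    n_val = int(len(components) * fractions[1])

    # Decide the split once per component, never per chain.
    assignment = {}
    for i, component in enumerate(components):
        if i < n_train:
            assignment[component] = "train"
        elif i < n_train + n_val:
            assignment[component] = "val"
        else:
            assignment[component] = "test"

    splits = {"train": [], "val": [], "test": []}
    for chain, component in sorted(chain_to_component.items()):
        splits[assignment[component]].append(chain)
    return splits


# Hypothetical example: 20 chains grouped into 10 components.
toy_mapping = {f"chain_{i}": f"component_{i % 10}" for i in range(20)}
toy_splits = split_by_component(toy_mapping)
print({name: len(chains) for name, chains in toy_splits.items()})
```

Splitting at the group level rather than the chain level is what prevents near-identical sequences or structures from appearing in both training and test data.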
The Disease-Related Variant Annotation (DVA) method predicts the effect of missense variants using a comprehensive set of variant features, including allele frequency and protein-protein interaction network features based on graph embedding. Benchmarked against existing methods, DVA demonstrates improved predictive performance, suggesting that it offers a better approach for assessing the functional impact of single-nucleotide missense variants.
The source code for DVA is available here.
The lit-OTAR framework, a collaboration between Europe PMC and Open Targets, uses deep learning to extract evidence from scientific literature for drug target identification. It integrates Named Entity Recognition (NER) to identify genes, diseases, organisms, and chemicals, linking them to databases. With millions of articles processed, it reveals numerous associations, aiding drug discovery and research.
GPDMiner is a text mining platform for the biomedical domain, addressing challenges posed by the growing volume of academic papers. It extracts and analyses biomedical information from PubMed, highlighting connections between genes, proteins, and diseases. With features for saving analytical outcomes, GPDMiner aids researchers in navigating and managing biomedical literature.
The newly developed R package, ursaPGx, facilitates pharmacogenetic annotation using multi-sample, phased whole-genome sequencing data in VCF format as input. This user-friendly tool outputs star allele annotations for pharmacogenes annotated in PharmVar, supporting the advantages of long-read sequencing technologies for robust pharmacogenetic analysis.
Code scripts used for analysis are available here.
Community News
AI-generated annotations incorporated into SureChEMBL have expanded the scope of information accessible to users. Developed in collaboration with EMBL-EBI’s Europe PMC team, the platform's integration with AI improves its comprehension of intricate patent data. Further enhancements, such as an open API and increased patent coverage, are anticipated to boost the platform's accessibility and functionalities.
Researchers from UC San Diego's Skaggs School of Pharmacy and Pharmaceutical Sciences have uncovered numerous previously unidentified bile acids, vital molecules used by the gut microbiome to communicate with the body. This discovery holds potential for enhancing our understanding of the gut microbiome and its implications for conditions such as type 2 diabetes and inflammatory bowel disease.
All contemporary Plasmodium falciparum parasites, the most lethal malaria parasites in humans, stem from a single original infection, which makes them highly related, with minimal genetic variation. Within the P. falciparum genome, certain regions have long puzzled scientists due to significant mutation spikes. Researchers have now pinpointed two genes where these spikes occur, revealing that they arise from DNA transfer between genes.
Upcoming Events
This workshop provides a comprehensive introduction to analysing single-cell RNA sequencing data. Participants learn processing, analysis, and integration using standard tools, covering topics like sequencing technologies, quality control, preprocessing, clustering, and differential expression analysis. Suitable for beginners and those updating their skills.
The National Academies is hosting a webinar to unveil a new report on RNA modifications. This report outlines a forward-looking trajectory for sequencing RNA and its modifications, aiming to advance biology and medicine. Members of the study committee will present key insights and recommendations, with an interactive Q&A session for audience engagement.
Educational Corner
Batch effects, non-experimental factors that impact experimental data, are critical to identify and mitigate in research. This blog post delves into distinguishing significant differences from those that can be disregarded. The Functional Equivalence standard is useful but assumes errors are randomly distributed across the genome, which may not always be the case.
This blog post compares qPCR, microarrays, and next-generation sequencing (NGS) and highlights key factors influencing their selection for genomic analysis. Factors such as sample size, depth of analysis, cost, and research objectives play significant roles in determining the most suitable method.
Next-generation sequencing (NGS) has transformed genetic research yet faces challenges due to its error rate of 1 per 1000 base pairs. This blog post discusses a recent study that investigates high-fidelity sequencing methods, identifying three key error contributors and proposing twelve strategies to address them. These strategies span from chemistry improvements to algorithm optimization, offering potential solutions to enhance sequencing accuracy.
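To put the quoted error rate in perspective, here is a small Python sketch of what 1 error per 1000 base pairs implies for a single read. It assumes errors are independent and uniformly distributed along the read, which is a simplification, and the 150-base read length is a hypothetical value chosen for illustration rather than a figure from the blog post.

```python
def expected_errors(read_length, error_rate=1e-3):
    """Expected number of erroneous bases in one read."""
    return read_length * error_rate


def prob_error_free(read_length, error_rate=1e-3):
    """Probability that a read contains no errors, assuming independent errors."""
    return (1.0 - error_rate) ** read_length


read_length = 150  # hypothetical read length, in bases
print(f"Expected errors per read: {expected_errors(read_length):.2f}")
print(f"Chance a read is error-free: {prob_error_free(read_length):.1%}")
```

Under these assumptions, a noticeable fraction of reads carries at least one error, which is why a per-base rate that sounds small still matters when calling rare variants.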
Connect with Us
Stay connected and engage with us on social media for daily updates, discussions, and more!
Subscribe
Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.
We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!
Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.
Contact: [email protected]
Copyright © 2024, Bioinformer Weekly Roundup. All rights reserved.