MGnify: Protein Database, PAMLj module for jamovi, OpenDock for Protein-Ligand Docking, BIMSA: Sequence Alignment
Bioinformer Weekly Roundup
Stay Updated with the Latest in Bioinformatics!
Issue: 60 | Date: 01 November 2024
Welcome to the Bioinformer Weekly Roundup!
In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you're a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we've got you covered. Subscribe now to stay ahead in the exciting realm of bioinformatics!
Featured Research
Gene interaction networks map functional interactions between genes, often analyzed through co-expression networks on RNA sequencing data. Single-cell RNA sequencing (scRNAseq) provides detailed cellular insights but poses challenges due to high sparsity and dimensionality. A new framework using sparse inverse covariance matrix estimation captures direct interactions in scRNAseq data, improving performance with data transformation and zero-inflated modeling. This framework is implemented in Snakemake and the R package ZINBStein, enhancing scRNAseq analysis with flexibility and efficiency.
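To see why an inverse covariance (precision) matrix separates direct from indirect interactions, here is a minimal sketch on a toy three-gene chain. It is not the ZINBStein method: the chain model, its covariance values, and the helper names `invert` and `partial_correlations` are assumptions made for illustration, and no sparsity penalty or zero-inflated modeling is applied.

```python
from fractions import Fraction
from math import sqrt


def invert(matrix):
    """Invert a small square matrix exactly with Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Swap a row with a non-zero pivot into place (fails if singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(precision):
    """Convert a precision matrix into partial correlations."""
    n = len(precision)
    return [
        [
            1.0 if i == j else
            float(-precision[i][j]) / sqrt(float(precision[i][i] * precision[j][j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical chain A -> B -> C with unit-variance noise at each step:
# A = e1, B = A + e2, C = B + e3. A and C covary only through B.
covariance = [
    [1, 1, 1],
    [1, 2, 2],
    [1, 2, 3],
]
precision = invert(covariance)
pcor = partial_correlations(precision)

print("cov(A, C) =", covariance[0][2])       # non-zero: indirect association
print("precision(A, C) =", precision[0][2])  # zero: no direct interaction
print("partial correlation A-B =", pcor[0][1])
```

In this toy case the A-C covariance is non-zero while the A-C precision entry is exactly zero, which is the property that sparse inverse covariance estimators exploit when they recover direct edges from noisy expression data.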
E. coli chemotactic motion can be studied using lab experiments or macroscale PDEs, but bridging the two requires understanding all underlying fields and conditions. Machine learning, using Whitney and Takens embedding theorems, addresses these challenges by identifying PDEs from simulation and experimental data. These models can infer chemonutrient concentrations from bacterial density history under reasonable conditions. Data-driven PDEs can predict bacterial density profiles and estimate unmeasured chemonutrient field evolution.
Zebrafish collective behaviour, influenced by environmental factors, is used to assess physical and mental states. Genetic modification can control this behaviour. This study examines zebrafish lacking the col11a2 gene, linked to early-onset osteoarthritis. These mutants exhibited more orderly collective motion compared to wildtype fish. Using an active matter model, this behaviour was interpreted by modelling mutants with higher orientational noise, indicating potential for tuning biological systems through genetic modifications.
Next-generation sequencing (NGS) technologies allow for studying cellular processes over time but often require data interpolation due to limited time points. The NGS chess problem compares sequencing data analysis to observing multiple independent chess games simultaneously. The analysis of the spatiotemporal kinetics advocates for a new methodology that considers DNA-particle interactions in each cell independently even for a homogeneous cell population.
To address the lack of intronic reads in secondary structure probing data for human MYC pre-mRNA, the SIRP-seq method was developed. This method combines spliceosomal inhibition with RNA probing and sequencing, using dimethyl sulfate and the spliceosome inhibitor pladienolide B to retain intronic sequences in HeLa cells. This increased read coverage over MYC intronic regions, allowing for complete reactivity profiles via DMS-MaPseq. The analysis, performed with the DRACO program, revealed distinct reactivity profiles and predicted multiple secondary structural conformations. This method offers insights into MYC RNA splicing regulation and potential therapeutic targeting.
This study integrates single-cell RNA sequencing (scRNA-seq) data with known genetic disorder genes and phenotypic information to predict specific cell types disrupted by pathogenic mutations for 482 disease phenotypes. The analysis revealed significant phenotype-cell type associations, highlighting the role of cellular context in disease manifestation and the potential of single-cell data for developing targeted therapies.
Researchers mined scientific literature to extract causal relationships between diseases, mapping them to ICD identifiers for clinical application. These validated causal associations were used to create a directed acyclic graph for causal inference frameworks, improving polygenic risk scores and disentangling pleiotropic effects of variants.
This study combined metabolomic and transcriptome analyses to identify crucial pathways and differentially expressed genes (DEGs) and metabolites (DEMs) involved in the response to water stress in Ophiocordyceps sinensis. Key findings include the upregulation of genes related to carbohydrate metabolism and the β-oxidation pathway, which work together to provide energy under water stress.
By leveraging GWAS associations with temporal and spatial brain expression data, this study elucidates the dynamics of gene expression during infancy. It highlights the dominant influence of these genes on the neuronal system, providing insights into developmental and neurological disorders.
Latest Tools
EuDockScore: Euclidean graph neural networks for scoring protein-protein interfaces (Oxford Academic). The study introduces EuDockScore, a Euclidean graph neural network-based model for evaluating protein-protein interactions. It includes EuDockScore-Ab for antibody-antigen interactions and EuDockScore-AFM for re-ranking AlphaFold-Multimer docking predictions. This tool offers an efficient and precise computational alternative to experimental scoring of protein interactions.
The code for these models is available here.
The article introduces BIMSA (Bidirectional In-Memory Sequence Alignment), a tool that speeds up sequence alignment using Processing-In-Memory (PIM) technology. By applying the BiWFA (Bidirectional Wavefront Alignment) algorithm, BIMSA handles sequences up to 100,000 bases and outperforms existing methods. It also scales with the number of compute units, promising further improvements with newer PIM architectures.
Code and documentation are available here.
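For readers new to wavefront alignment, the sketch below computes plain edit distance with a single forward wavefront. It is a simplified illustration, not BIMSA or BiWFA: it runs in one direction only, uses unit costs instead of gap-affine penalties, returns the score without a traceback, and the function name `wavefront_edit_distance` is an assumption made here.

```python
def wavefront_edit_distance(a, b):
    """Edit distance between a and b using furthest-reaching wavefronts."""
    n, m = len(a), len(b)
    target_diag = m - n  # diagonal k = h - v that contains the end cell

    def extend(h, k):
        # Slide along diagonal k for free while characters match.
        v = h - k
        while v < n and h < m and a[v] == b[h]:
            v += 1
            h += 1
        return h

    # front maps diagonal k to the furthest offset h (index into b)
    # reachable with the current score.
    front = {0: extend(0, 0)}
    score = 0
    while front.get(target_diag) != m:
        score += 1
        nxt = {}
        for k in range(min(front) - 1, max(front) + 2):
            candidates = []
            if k - 1 in front:
                candidates.append(front[k - 1] + 1)  # gap: consume a char of b
            if k in front:
                candidates.append(front[k] + 1)      # mismatch: consume both
            if k + 1 in front:
                candidates.append(front[k + 1])      # gap: consume a char of a
            # Keep only offsets that stay inside the alignment matrix.
            valid = [h for h in candidates if h <= m and 0 <= h - k <= n]
            if valid:
                nxt[k] = extend(max(valid), k)
        front = nxt
    return score


print(wavefront_edit_distance("kitten", "sitting"))
```

Each iteration stores only one offset per diagonal, so the work grows with the alignment score rather than with the full matrix. That small working set is part of what makes wavefront methods a natural fit for memory-constrained compute units.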
OpenDock is an open-source framework for protein-ligand docking and modeling, built using Python and PyTorch. It was developed to address limitations in traditional molecular docking tools and allows integration of various scoring functions for both docking and post-processing. It supports sampling techniques like simulated annealing and Monte Carlo optimization, with potential extensions for genetic algorithms and particle swarm optimization. Additional features include distance constraints, facilitating covalent and restricted docking.
Code and documentation are available here.
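As a rough illustration of the sampling idea mentioned above, here is a minimal simulated annealing loop with a Metropolis acceptance rule. It does not use OpenDock's API: the two-parameter "pose", the quadratic `toy_score` function, and every parameter name are hypothetical stand-ins for a real ligand pose and scoring function.

```python
import math
import random


def toy_score(pose):
    """Hypothetical scoring function with its minimum at pose (1.0, -2.0)."""
    x, y = pose
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def simulated_annealing(score_fn, start, steps=5000, t_start=1.0,
                        t_end=1e-3, step_size=0.5, seed=0):
    """Minimise score_fn by random perturbation with a cooling temperature."""
    rng = random.Random(seed)
    current, current_score = start, score_fn(start)
    best, best_score = current, current_score
    for i in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (i / max(steps - 1, 1))
        # Propose a new pose by adding Gaussian noise to each coordinate.
        candidate = tuple(c + rng.gauss(0.0, step_size) for c in current)
        candidate_score = score_fn(candidate)
        delta = candidate_score - current_score
        # Metropolis rule: always accept improvements, and accept worse
        # poses with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


start_pose = (8.0, 8.0)
best_pose, best_score = simulated_annealing(toy_score, start_pose)
print(best_pose, best_score)
```

Early on, the high temperature lets the search accept worse poses and escape local minima; as the temperature drops, the walk settles into the best region it has found. A docking framework would swap in a physics-based or learned scoring function and perturb translations, rotations, and torsions instead of two numbers.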
iSeq facilitates easy retrieval of metadata and NGS data from multiple databases (GSA, SRA, ENA, DDBJ) via the command-line interface. It supports over 25 accession formats, Aspera downloads, parallel and multi-threaded processes, FASTQ file merging, and integrity verification, simplifying data acquisition and enhancing NGS data reanalysis.
The tool is freely available on bioconda and the code and documentation can be found here.
Antibody diversity, resulting from V(D)J recombination and mutations, often exhibits a bias towards germline residues in natural sequences. This bias makes it challenging for language models to suggest essential non-germline mutations for effective binding. AbLang-2, developed through a comprehensive study of germline bias effects, is optimized to predict non-germline residues and efficiently proposes a diverse array of valid mutations with high probability.
Usage instructions and documentation for AbLang-2 are available here.
The Collapsible Tree is an interactive web app and JavaScript framework designed to visualize hierarchical tree structures with expandable and collapsible features. Unlike t-SNE and UMAP, which need separate figures, it combines cellular states and gene expression data from single-cell transcriptomics into one plot. This integration enables detailed comparisons of gene expression across lineages and uncovers subtle patterns between sub-lineages.
This study introduces KiNext, a Nextflow pipeline designed to identify and classify protein kinases from predicted proteomes. The pipeline adheres to FAIR principles (Findable, Accessible, Interoperable, and Reusable) and uses Hidden Markov Models (HMMs) to detect both conventional eukaryotic protein kinases (ePKs) and atypical protein kinases (aPKs). KiNext categorizes ePKs into eight groups based on their catalytic domains. This tool ensures reproducibility and traceability of the identified kinases.
The tool is available here.
This study presents CMAGN, a novel computational model designed to predict circRNA–miRNA associations (CMAs). The model integrates several advanced computational techniques, including graph attention autoencoder and network consistency projection. The model reconstructs the similarity networks and applies network consistency projection to predict latent CMAs. CMAGN achieves an area under the ROC and PR curves exceeding 0.96 in five-fold cross-validation on two widely used CMA datasets.
The tool is available here.
Community News
In this recent study, researchers discovered a genetic signature in newborns that can predict neonatal sepsis before symptoms appear. Conducted by UBC and SFU researchers in collaboration with the MRC Unit in The Gambia, this study has potential implications for early diagnosis in lower- and middle-income countries. The findings used machine learning to identify a four-gene signature that predicts sepsis with high accuracy. This advancement could improve early intervention and outcomes for infants worldwide.
Researchers have developed a model that goes beyond traditional genome-wide association studies (GWAS) to understand how specific mutations influence diseases. By integrating functional genomics data with GWAS findings, this approach delves into gene expression and protein interactions to reveal how mutations impact biological processes. This provides a broad picture of how genetic variations contribute to disease development.
This study published in Cellular and Molecular Life Sciences explores how mutations in the IER3IP1 gene cause MEDS1, a rare disease characterized by microcephaly, epilepsy, and diabetes. The research shows that these mutations disrupt protein transport between the endoplasmic reticulum and the Golgi apparatus, affecting nerve cell development and survival. This study provides insights into the disease mechanism and suggests potential areas for future therapeutic strategies.
Researchers at the University of California, Davis, and the University of California, Berkeley, have developed low-toxicity lipid nanoparticles (LNPs) that can deliver gene editing mRNA to the fetal mouse brain. Published in ACS Nano, the study shows that these LNPs degrade quickly, reducing inflammation risk while effectively transfecting brain cells. The technology showed potential for treating neurodevelopmental conditions like Angelman syndrome.
Scientists have developed a novel polymerase enzyme that significantly reduces stutter during forensic DNA analysis. This breakthrough enhances the accuracy and reliability of DNA profiling, especially in cases with low-quality or degraded samples. The new polymerase minimizes the formation of stutter artifacts, which are common issues in traditional methods. This advancement promises to improve the efficiency of forensic investigations and the interpretation of complex DNA mixtures.
Researchers at Technische Universität Berlin have combined a Bayesian algorithm, a digital twin of monoclonal antibody (mAb) cultivation, and 24 mini-bioreactor perfusion systems to optimize mAb development. This system allows for rapid experiments to identify conditions that maximize viable cell volume. The study highlights the valuable capabilities of integrating machine learning in bioprocess development, emphasizing the need for increased autonomy in biopharma innovation.
Upcoming Events
The Genome Informatics 2024 conference, hosted at the Wellcome Genome Campus, focuses on large-scale approaches to understanding genome structure and biology. The conference aims to bring together computational biologists, human geneticists, and others working on comparative and evolutionary genomics. It showcases methods for analyzing large genomic datasets, sequence algorithms, and approaches for genome assembly, covering variant discovery, functional genomics, pangenomes, single-cell and spatial omics, as well as microbial and metagenomics methodologies. Registration to attend the conference virtually is still open and can be found at the link above.
This webinar from EMBL will showcase the utility of the newly released MGnify protein database. The MGnify protein database website provides information on the proteins identified through MGnify's analysis pipeline. The webinar may be helpful for people interested in proteins predicted from metagenomic assemblies and their functional annotations.
This webinar from EMBL showcases the newest updates to the Job Dispatcher, an EMBL-EBI hosted website that provides access to a variety of bioinformatics tools and biological datasets via web and programmatic interfaces. The webinar aims to give a comprehensive guide to navigating various sequence analysis tools (EMBOSS, Clustal Omega, NCBI BLAST+, etc.) and to demonstrate how to use the tools programmatically. Anyone interested in sequence analysis, with or without prior bioinformatics experience, may attend this course virtually by registering at the link above on a first-come, first-served basis.
Educational Corner
This blog post discusses methods for comparing spatial patterns in continuous raster data across arbitrary regions using R. It highlights various techniques and packages that facilitate the analysis of spatial data, focusing on tools that help identify and visualize patterns and differences in raster datasets.
This article explores the transition of Nextflow workflows from high-performance computing (HPC) environments to cloud platforms. It covers the benefits and challenges of migrating workflows to the cloud, detailing the steps and considerations necessary to ensure efficient and scalable execution of bioinformatics pipelines.
This blog provides a beginner-friendly guide to mastering the clear and history commands in the Linux terminal. It explains how to use these commands to maintain a clean workspace and track command-line activities, offering practical examples and best practices.
This blog introduces PAMLj, a new power analysis module for jamovi, designed to enhance research planning. It supports a wide range of statistical tests, including ANOVA, regression, and SEM, allowing users to calculate necessary sample sizes, expected power, and minimal detectable effect sizes. The module also features sensitivity analysis with interactive graphs and tables, making it a comprehensive tool for researchers.
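To make the three quantities above concrete (required sample size, expected power, and minimal detectable effect), here is a small sketch for a two-group comparison of means. It is not PAMLj's implementation, which runs inside jamovi: it assumes a two-sided test, equal group sizes, Cohen's d as the effect size, and a normal approximation, and all function names are made up for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group needed to detect a standardized effect."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power, ignoring the negligible opposite-tail rejection."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)


def minimal_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized effect detectable with the given n and power."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


n = sample_size_per_group(0.5)
print("n per group:", n)
print("power at that n:", round(approx_power(0.5, n), 3))
print("minimal detectable d:", round(minimal_detectable_effect(n), 3))
```

Because the normal approximation ignores the extra uncertainty of estimating the variance, exact t-based calculations typically ask for a participant or two more per group. Treat these numbers as a planning estimate rather than a replacement for a dedicated power analysis tool.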
Connect with Us
Stay connected and engage with us on social media for daily updates, discussions, and more!
Subscribe
Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.
We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!
Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.
Contact: [email protected]
Copyright © 2024, Bioinformer Weekly Roundup. All rights reserved.