SARS-CoV-2 Variant Splicing | CDBProm: Bacterial Promoter Directory | scGPT: Single-Cell Multi-Omics | Personalized Immunotherapy
Bioinformer Weekly Roundup
Stay Updated with the Latest in Bioinformatics!
Issue: 26 | Date: 1 March 2024
Welcome to the Bioinformer Weekly Roundup!
In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you're a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we've got you covered. Subscribe now to stay ahead in the exciting realm of bioinformatics!
Featured Research
The study presents Adversarial Information Factorization, a new method for correcting batch effects in single-cell RNA sequencing data. Unlike existing methods, it does not require prior knowledge of cell types or a specific normalization strategy. The authors report that it outperforms current techniques in certain scenarios, preserving gene expression relationships between cell types and improving clustering metrics in a leukaemia cohort dataset.
The study develops a Boolean model of breast cancer signalling pathways tailored to five cell lines, reproducing protein activities and drug-response behaviours. Protein synergy scores (PSSs) assess the effects of drug combinations, revealing synergistic effects across cancer cell lines. Validation against experimental data demonstrates high correlation. Clustering analysis identifies distinct patterns in protein activities corresponding to different modes of synergy.
The study links alternative polyadenylation with genome-wide association studies (GWAS) across 22 cancer types, revealing its impact on cancer susceptibility genes. Co-localized genetic variants affect 28.57% of cancer loci, influencing cancer heritability. Identification of 642 cancer susceptibility genes underscores the role of alternative polyadenylation in cancer etiology, with validation showing its impact on CRLS1 gene expression in breast cancer cells.
The research suggests using computer-assisted metabolic profiling to improve metabolomics study design. Through constraint-based modelling and flux simulation, it predicts metabolite profiles under perturbations like genetic diseases. SAMBA (SAMpling Biomarker Analysis) ranks potential biomarkers by comparing simulated flux distributions between control and disease conditions, yielding insights into metabolic perturbations.
The study investigates host-viral interactions during the COVID-19 pandemic, focusing on alternative splicing post-infection. Analysing RNA splicing patterns across SARS-CoV-2 variants and vaccination status, it reveals significant deactivation of alternative splicing in infected individuals. Vaccinated Omicron-infected patients display distinct splicing patterns, highlighting the complex interplay between viral variants, immunity, and splicing.
Protein Embedding based Alignments (PEbA) aligns protein sequences using embeddings from protein language models. It is reported to outperform traditional methods on thousands of benchmark alignments, particularly for highly divergent sequences. Its results against recent methods such as DEDAL and vcMSA support the use of protein language model embeddings for accurate alignment.
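To make the idea of embedding-based alignment concrete, here is a minimal sketch, not PEbA's actual algorithm or API. It assumes each residue already has an embedding vector (here supplied as plain lists), scores residue pairs by cosine similarity instead of a substitution matrix, and runs a simple global alignment with a linear gap penalty. The function names, the gap value, and the choice of global alignment are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align_embeddings(seq_a, emb_a, seq_b, emb_b, gap=-0.5):
    """Global alignment where the match score is the cosine similarity of
    per-residue embeddings. The linear gap penalty is an assumption made
    for this sketch."""
    n, m = len(seq_a), len(seq_b)
    # Precompute the similarity matrix once so fill and traceback agree.
    sim = [[cosine(emb_a[i], emb_b[j]) for j in range(m)] for i in range(n)]

    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim[i - 1][j - 1],  # align two residues
                score[i - 1][j] + gap,                    # gap in seq_b
                score[i][j - 1] + gap,                    # gap in seq_a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and math.isclose(
            score[i][j], score[i - 1][j - 1] + sim[i - 1][j - 1], abs_tol=1e-9
        ):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and math.isclose(score[i][j], score[i - 1][j] + gap, abs_tol=1e-9):
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

In practice the embeddings would come from a protein language model; the toy version only shows how a similarity matrix built from embeddings can replace a fixed substitution matrix in dynamic programming.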
This paper introduces a new strategy utilizing RNA nanotechnology and nanopores to study transcription and RNA polymerase behaviour, enabling insights into alternative transcription termination and processivity at the single-molecule level. This approach offers promising avenues for accurate RNA structural mapping and quantitative analysis of RNA transcripts.
The study introduces modified FDR control procedures (M1, M2, M3) for Multiple Comparison Procedures (MCPs), considering correlation structures based on information theory. Comparisons with BH and BY procedures using simulation and real colorectal cancer gene expression data highlight the efficiency of the proposed methods. Results show improved feature screening under different correlation levels, with M1 and M2 outperforming BH and BY in predictive model fitting.
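For readers unfamiliar with the two baselines named above, the following is a minimal sketch of the standard Benjamini-Hochberg (BH) and Benjamini-Yekutieli (BY) adjustments. It does not implement the study's M1, M2, or M3 procedures, whose details are not given here; the function names are illustrative.

```python
def fdr_adjust(pvalues, dependent=False):
    """Return adjusted p-values in the original input order.

    dependent=False gives the Benjamini-Hochberg adjustment;
    dependent=True gives Benjamini-Yekutieli, which additionally
    multiplies by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m.
    """
    m = len(pvalues)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1)) if dependent else 1.0
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        value = pvalues[idx] * m * c_m / rank
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted


def fdr_reject(pvalues, alpha=0.05, dependent=False):
    """Boolean rejection decisions at FDR level alpha."""
    return [q <= alpha for q in fdr_adjust(pvalues, dependent)]
```

BY is valid under arbitrary dependence between tests but is more conservative than BH, which is the trade-off that motivates correlation-aware alternatives.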
Latest Tools
Breast cancer (BC) diagnosis may benefit from microRNA (miRNA) biomarkers. BSig, an evolutionary learning-based method, identifies a diagnostic miRNA signature from serum profiles, achieving high prediction accuracy. Twelve miRNAs, including hsa-miR-3185 and hsa-miR-3648, significantly contribute to BC diagnosis. Bioinformatics analysis reveals 65 miRNA-target genes specific to BC, shedding light on underlying mechanisms.
BSig, a tool capable of BC detection and facilitating therapeutic selection, is publicly available here.
PRFect is a newly developed machine-learning tool for detecting and predicting programmed ribosomal frameshifts (PRFs) within coding genes. By integrating diverse cellular attributes, PRFect achieves sensitivity, specificity, and accuracy exceeding 90%. The code is open-source and can be installed via a terminal command. PRFect offers researchers a practical tool for exploring ribosomal frameshifting mechanisms.
Introducing scCASE, a method for enhancing single-cell chromatin accessibility sequencing (scCAS) data using non-negative matrix factorization and an iteratively updated cell-to-cell similarity matrix. scCASE is reported to outperform existing methods, supports interpretable identification of cell type-specific peaks, and offers biological insights. An extension, scCASER, incorporates external reference data for enhanced performance.
rMATS-turbo is a revamped version of the rMATS tool for analyzing alternative splicing from RNA-seq data. This protocol outlines the updated software, which keeps the original statistical framework and user interface while improving speed and data storage efficiency. rMATS-turbo accommodates large RNA-seq datasets and enables efficient parallel processing on compute clusters, facilitating diverse studies on alternative splicing mechanisms.
This study introduces scGPT, a foundation model for single-cell biology built on a generative pretrained transformer and trained on a repository of over 33 million cells. The study explores the application of foundation models in advancing cellular biology and genetic research. scGPT effectively distils critical biological insights concerning genes and cells, with potential for further optimization through transfer learning across diverse downstream applications.
CANAL is a universal cell-type annotation tool that continuously fine-tunes a pre-trained language model on emerging scRNA-seq data. CANAL employs techniques such as experience replay and representation knowledge distillation to maintain model performance and adapt to new data without forgetting previous information. Comprehensive experiments demonstrate CANAL's effectiveness and versatility in various biological scenarios.
An implementation of CANAL is available here.
GPAD is a tool using Natural Language Processing to extract gene-disease association data from OMIM. GPAD analyses gene-phenotype associations and validation methods, offering real-time insights into discovery trends. Trends reveal increasing GDA discoveries post-exome sequencing, followed by a recent decline due to larger cohort requirements.
Introducing MaskGraphene, a method for integrating spatial transcriptomics (ST) data from multiple slices using self-supervised and contrastive learning. It efficiently aligns and integrates ST slices, facilitating spatial-aware data integration and identification of shared and unique cell/domain types. Applied to various ST datasets, MaskGraphene optimizes joint embedding, performs batch correction, and tracks spatiotemporal changes during embryonic development.
kalis is an R package facilitating rapid computation of the Li & Stephens (LS) model for inferring recent ancestry at variants across genomes. Leveraging multi-core parallelism and CPU vector instruction sets, kalis enables scaling to large genomic datasets, supporting local ancestry, selection, and association studies.
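As a rough illustration of what the Li & Stephens model computes, here is a minimal pure-Python sketch of its forward recursion. It is not kalis's API (kalis is an R package) and omits the parallelism, vectorisation, and numerical scaling a real implementation needs. The constant per-site recombination probability `rho` and mismatch probability `mu` are simplifying assumptions made for this example.

```python
def ls_forward(target, panel, rho=0.1, mu=0.01):
    """Forward pass of a simplified Li & Stephens copying model.

    target: list of 0/1 alleles for the haplotype being explained.
    panel:  list of reference haplotypes, each a list of 0/1 alleles
            of the same length as target.
    rho:    probability of switching copying source between adjacent
            sites (assumed constant here).
    mu:     probability that the copied allele is observed incorrectly.

    Returns (likelihood, posterior), where posterior[k] is the probability
    that the target copies from panel[k] at the last site.
    """
    n = len(panel)

    def emit(k, site):
        # Probability of the observed allele given that panel[k] is copied.
        return 1.0 - mu if panel[k][site] == target[site] else mu

    # Uniform prior over which reference haplotype is copied at the first site.
    alpha = [emit(k, 0) / n for k in range(n)]
    for site in range(1, len(target)):
        total = sum(alpha)
        # Either stay on the same haplotype (1 - rho) or jump uniformly (rho).
        alpha = [
            emit(k, site) * ((1.0 - rho) * alpha[k] + rho * total / n)
            for k in range(n)
        ]
    likelihood = sum(alpha)
    posterior = [a / likelihood for a in alpha]
    return likelihood, posterior
```

Because every hidden state shares the same "jump" term, each site costs time linear in the number of reference haplotypes rather than quadratic. Without rescaling, the values underflow on long sequences, so this version is suitable only for toy inputs.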
Introducing BOLT-seq, a method for cost-effective and scalable transcriptome profiling that skips RNA purification and allows for quick library construction. This approach clusters small molecule drugs based on mechanisms of action and intended targets, offering an alternative for transcriptome profiling.
Comprehensive Directory of Bacterial Promoters (CDBProm) is a resource for in-silico predicted bacterial promoter sequences. Using an Extreme Gradient Boosting (XGBoost) algorithm, CDBProm identifies promoters with 87% accuracy. Trained on over 55 million upstream regions from 6000 bacterial genomes, it maps potential promoters to genomic data, facilitating efficient quantitative analysis of bacterial promoters.
The collection of over 24 million promoters is publicly available here.
Community News
Researchers from the Barcelona Institute of Science and Technology Centre for Genomic Regulation (CRG) discovered that the Snhg11 gene, which is less active in brains with Down syndrome (DS), may contribute to memory deficits. This noncoding RNA affects neuron function in the hippocampus, suggesting new therapeutic strategies. By analysing mouse models and human tissue with single-nucleus RNA sequencing, they gained insights into DS-related gene expression.
Recent research suggests that cancers classified as immunologically "cold" may produce cancer-fighting T cells. Scientists from LJI, UC San Diego, and UC San Diego Moores Cancer Centre observed pre-existing immunity against tumour neoantigens in these tumours, potentially indicating a path for personalized immunotherapy. Published in Science Translational Medicine, the study represents progress in identifying immune cell targets for individual patient tumours.
Upcoming Events
This webinar covers metagenomic methods for human microbiome research and statistical analysis techniques for understanding its composition and function. Suitable for all interested in microbiome studies, it requires no prior bioinformatics knowledge but benefits from basic biology understanding. Participants will gain insights into the human microbiome's significance, available methods, and future challenges.
Educational Corner
This course covers the latest software and best practices in long-read data analysis. It introduces the principles of long-read analysis, focusing on advances in sequencing technologies such as Oxford Nanopore and Pacific Biosciences. Attendees need basic knowledge of molecular biology and proficiency with the Linux Bash command line.
Connect with Us
Stay connected and engage with us on social media for daily updates, discussions, and more!
Subscribe
Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.
We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!
Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.
Contact: [email protected]
Copyright © 2024, Bioinformer Weekly Roundup. All rights reserved.