ST-CellSeg: Cell Segmentation for Spatial Transcriptomics, SPOT for Proteomic Data, CViewer for Shotgun Metagenomics, AI for Cancer Treatment
Bioinformer Weekly Roundup
Stay Updated with the Latest in Bioinformatics!
Issue: 44 | Date: 05 July 2024
Welcome to the Bioinformer Weekly Roundup!
In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you're a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we've got you covered. Subscribe now to stay ahead in the exciting realm of bioinformatics!
Featured Research
The research explores the role of introns in protein production and diversification. It reveals that over 40% of protein-coding genes have their longest intron in the second or third tertile of all introns. The study also finds a correlation between the position of the longest intron and the gene’s involvement in different KEGG pathways. The findings suggest a link between intron length and the function of the spliceosome and ribosomes.
Source code is available here.
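As a rough illustration of the intron-position idea above, the sketch below reports which positional tertile of a gene's introns contains the longest one. It is not the study's code: the function name `longest_intron_tertile`, the input format (intron lengths listed in transcript order), and the way tertiles are assigned by position are all assumptions made for this example.

```python
def longest_intron_tertile(intron_lengths):
    """Return 1, 2 or 3: the positional tertile holding the longest intron.

    intron_lengths: intron lengths in 5'-to-3' order (hypothetical input format).
    Tertiles are assigned by ordinal position, which is an assumption here.
    Ties are resolved in favour of the first longest intron.
    """
    if not intron_lengths:
        raise ValueError("gene has no introns")
    n = len(intron_lengths)
    # 0-based position of the (first) longest intron
    index = intron_lengths.index(max(intron_lengths))
    # Map position 0..n-1 onto tertiles 1..3
    return index * 3 // n + 1


if __name__ == "__main__":
    print(longest_intron_tertile([120, 5400, 300]))
```

Applying such a function across all protein-coding genes and tabulating the results would give the kind of proportion the summary reports, under the stated assumptions.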
The research investigates the use of dinucleotide arrangements in ultra-conserved non-coding elements (UCNEs) for their computational identification within the human genome. Nine features of dinucleotide arrangements were identified that differentiate UCNEs from the rest of the genome. Machine Learning techniques were used to classify UCNEs with an accuracy rate of 82-84%. The study also explored the representation of UCNE-associated SNPs in the ClinVar database, finding it aligns with a random distribution.
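The summary does not list the nine dinucleotide features, so the sketch below shows only the most basic related quantity: overlapping dinucleotide frequencies in a sequence. The function name `dinucleotide_frequencies` is hypothetical, and this is not claimed to be one of the study's features.

```python
from collections import Counter


def dinucleotide_frequencies(sequence):
    """Return the relative frequency of each overlapping dinucleotide.

    Pairs containing anything other than A, C, G or T (for example N)
    are skipped. Illustrative only; not the study's feature set.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    # Slide a window of width 2 along the sequence, one base at a time
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(p for p in pairs if set(p) <= valid)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in counts.items()}


if __name__ == "__main__":
    print(dinucleotide_frequencies("ACGTACGTTTGA"))
```

Frequency vectors like this could serve as input to a classifier, although the study's actual features describe dinucleotide arrangements and may be defined quite differently.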
This study investigates the use of gene expression data to identify diagnostic biomarkers for tuberculosis. Using two human whole blood microarray datasets, the study identifies 62 common differentially expressed genes (DEGs) linked to immune response and type-II interferon signaling. The expression levels of three specific genes, GBP5, IFITM3, and EPSTI1, were found to be potential diagnostic markers.
The study introduces a new method for analyzing time series transcriptomics data from clinical trials. This method uses multi-commodity flow algorithms for trajectory inference in large scale clinical studies, considering individual-based timing restrictions and integrating data from multiple patients. The approach showed improved performance and identified novel disease subtypes when tested on multiple drug datasets.
Source code is available here.
The study introduces a method to identify Alternative Splicing (AS) events, especially for low-abundance genes, using reverse transcription (RT) PCR with gene-specific primers (GSPs) followed by nested PCR. This approach addresses the shortcomings of conventional methods such as expressed sequence tags (EST), microarrays, and RNA-seq.
Latest Tools
SPatial Omnibus Test (SPOT) is a method for analysing spatial proteomics data. SPOT evaluates the association between a spatial summary and an outcome across a range of radii, adjusting for confounders. It aggregates the per-radius results using the Cauchy combination test, providing an omnibus p-value for the overall degree of association. The method was tested on simulations and applied to ovarian and lung cancer studies, showing improved power over alternatives.
Source code is available here.
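To make the aggregation step concrete, here is a minimal sketch of the standard Cauchy combination formula: each p-value is mapped to a Cauchy-distributed statistic, the statistics are averaged with weights, and the average is mapped back to a p-value. This is not SPOT's implementation. The function name `cauchy_combination`, the default of equal weights, and the clipping constant are assumptions made for illustration.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with weights summing to 1,
    and the combined p-value is 0.5 - atan(T) / pi.
    Equal weights are used when none are given (an assumption here).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise so weights sum to 1
    eps = 1e-15  # keep p-values away from exactly 0 or 1
    clipped = [min(max(p, eps), 1.0 - eps) for p in p_values]
    statistic = sum(w * math.tan((0.5 - p) * math.pi)
                    for w, p in zip(weights, clipped))
    return 0.5 - math.atan(statistic) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values, e.g. one per radius
    print(cauchy_combination([0.01, 0.20, 0.35, 0.90]))
```

One reason this test suits per-radius results is that its null distribution stays approximately valid even when the combined p-values are correlated, as results at neighbouring radii would be.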
DeepGSEA, a deep learning approach for gene set enrichment (GSE) analysis in single-cell RNA sequencing data, uses prototype-based neural networks to capture GSE information and perform significance tests on each gene set. It allows for visualization of the underlying distribution of a gene set. The method was tested for sensitivity and specificity with simulation studies and applied to real scRNA-seq datasets.
Source code is available here.
Self-Learning Gene Clustering Pipeline (SGCP) is a spectral method for detecting modules in gene co-expression networks. It incorporates a step that uses gene ontology (GO) information in a self-learning process. The method was applied to 12 real gene expression datasets, and it was observed that SGCP produces modules with a high degree of GO enrichment. SGCP assigns a high statistical importance to GO terms that are mostly different from those reported by other methods.
Source code is available here.
CViewer is a Java-based statistical framework for exploring shotgun metagenomics data. This framework integrates with conventional pipelines and offers both exploratory and hypothesis-driven analyses. It provides a highly interactive toolkit that simplifies the analysis of multiomics datasets. The framework uses algorithms based on numerical ecology and machine learning principles to find correlations among datasets and provide discrimination based on case-control relationships.
Source code is available here.
scType is a deconvolution-free marker-based cell annotation method for spatial transcriptomics (ST) assays. It does not require computationally intensive deconvolution or large single-cell reference atlases. The method enables fast and accurate identification of abundant cell types from ST data, particularly when a large panel of genes is detected. The method was applied to Visium and Slide-seq assays, which balance high resolution and the number of genes detected for cell type annotation.
Source code is available here.
Variant Impact Predictor database (VIPdb) version 2 is a collection of Variant Impact Predictors (VIPs) developed over the past 25 years. It summarizes the characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns of 403 VIPs. The database includes VIPs capable of predicting the impacts of various types of genetic variants.
Database is available here.
HyDRA is a pipeline that combines short- and long-read RNA sequencing data to enhance the accuracy of custom transcriptome assemblies. It was developed and validated using data from ovarian and fallopian tube samples, identifying over 50,000 high-confidence long noncoding RNAs. The pipeline is relevant because short-read RNA-seq data is so widely available, and it could help discover high-confidence transcripts within specific cell types and tissues.
Source code is available here.
BioCoder is a benchmark developed to evaluate the performance of large language models (LLMs) in generating bioinformatics-specific code. The benchmark was used to evaluate various models, including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT-4. The results highlight the importance of accommodating a long prompt with full context and containing domain-specific knowledge of bioinformatics for successful models.
Source code is available here.
ST-CellSeg is an image-based machine learning method for spatial transcriptomics. It uses a manifold for cell segmentation and considers multi-scale information. The method constructs a fully connected graph as a spatial transcriptomic manifold and determines the low-dimensional spatial probability distribution representation for cell segmentation.
Source code is available here.
GraphCompass is a set of graph analysis methods for evaluating and comparing the spatial arrangement of cells in samples from diverse biological conditions. It builds on the Squidpy spatial omics toolbox and includes statistical approaches for cross-condition analyses. It can be applied to various omics techniques and is showcased through its application to three different studies.
Source code is available here.
Community News
The Australian National University (ANU) has developed an AI tool, DeepPT, which predicts a patient’s mRNA profile to aid in personalized cancer treatment. In conjunction with another tool, ENLIGHT, it can predict a patient’s response to various cancer therapies.
Scientists at the Max Delbrück Center have developed an open-source spatial transcriptomics platform, Open-ST, which creates 3D molecular maps from tissue samples with subcellular precision. This allows for the reconstruction of gene expression in cells within a tissue in three dimensions.
Upcoming Events
This course offers insights into next-generation sequencing data analysis, covering topics like sequence informatics, variant calling, and pipeline creation. It’s aimed at researchers starting to use high-throughput sequencing technologies in their work.
Educational Corner
This blog post discusses the value of analyzing Copy Number Variations (CNVs) in gene panels. It highlights how CNVs, which constitute a significant type of variation, can enhance the diagnostic yield of routine gene panel tests.
This blog provides insights into how to use the power of regular expressions for the analysis and manipulation of DNA, RNA, and protein sequences. Regular expressions, a powerful tool in text processing, can be effectively used in bioinformatics tasks such as pattern matching and sequence analysis.
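As a small taste of the technique, the sketch below uses Python's built-in `re` module to locate a restriction site in DNA and a glycosylation-style motif in a protein. It is not taken from the blog post: the helper name `find_motif` and the example sequences are made up for illustration.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]


# EcoRI recognition site in DNA
ECORI_SITE = r"GAATTC"
# N-glycosylation-style motif: N, any residue except P, then S or T,
# then any residue except P
N_GLYCOSYLATION = r"N[^P][ST][^P]"

if __name__ == "__main__":
    dna = "AAGAATTCGGGAATTC"   # made-up DNA sequence
    protein = "MKNASAPNPTG"    # made-up protein sequence
    print(find_motif(dna, ECORI_SITE))           # [(2, 'GAATTC'), (10, 'GAATTC')]
    print(find_motif(protein, N_GLYCOSYLATION))  # [(2, 'NASA')]
```

Note that `re.finditer` reports non-overlapping matches only, so motifs that can overlap themselves need a different approach, such as a lookahead pattern.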
This tutorial offers participants an opportunity to learn about biological data analysis using R and Unix/Linux tools. It starts with an introduction to bioinformatics and omics data analysis. The tutorial concludes with a practical walkthrough of a simple bioinformatics workflow, specifically focusing on aligning transcriptomic sequences with genomic data.
Connect with Us
Stay connected and engage with us on social media for daily updates, discussions, and more!
Subscribe
Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.
We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!
Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.
Contact: [email protected]
Copyright © 2024, Bioinformer Weekly Roundup. All rights reserved.