SequenceCraft: for RNA-cleaving deoxyribozymes, RiceSNP-ABST: identification of SNPs, PIXANT: a multi-phenotype imputation tool
Bioinformer Weekly Roundup
Stay Updated with the Latest in Bioinformatics!
Issue: 68 | Date: 10 January 2025
Welcome to the Bioinformer Weekly Roundup!
In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you are a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we have got you covered. Subscribe now to stay ahead in the exciting realm of Bioinformatics!
Featured Research
The study explores the presence of conserved tissue-specific cryptic poison exons within human introns. Researchers identified a group of cryptic exons that are evolutionarily conserved and exhibit RNA-seq read coverage similar to annotated exons. These exons often generate nonsense-mediated decay (NMD) isoforms and show tissue-specific and cancer-specific expression. The findings suggest that many cryptic exons remain unannotated due to their low abundance in RNA-seq libraries.
The study examines the regional flow of antimicrobial resistance genes (ARGs) among the One Health sectors in Dengfeng, Henan Province, China. Researchers identified 40 ARG types and 743 ARG subtypes across human, food, and environmental samples, with a higher ARG load in food and environmental samples. The study highlights the role of dietary habits, occupational exposure, and horizontal gene transfer in ARG dissemination. Machine learning models based on microbiome profiles were effective in predicting the presence of carbapenem-resistant strains.
The study introduces a 1D convolutional neural network (CNN) for predicting the positions of capture baits in whole-exome sequencing (WES) kits, aimed at normalizing GC bias and enhancing copy number variation (CNV) detection. The research found that incorporating experimental coverage, on-target information, and sequence data improved bait prediction accuracy. This approach aims to reduce systemic biases in genomic studies and improve CNV detection sensitivity and specificity.
The study introduces TransConvNet, a hybrid model combining transformer and convolutional neural networks, for classifying prostate cancer metabolomics data. It features a 1D convolution layer and a gating mechanism to enhance feature extraction and attention weights.
A genome-scale metabolic model for the denitrifying bacterium Thauera sp. MZ1T accurately predicts degradation of pollutants and production of polymers | PLOS Computational Biology
The study presents a genome-scale metabolic model for Thauera sp. MZ1T, a denitrifying bacterium, to predict pollutant degradation and polymer production. The model, consisting of 1,744 metabolites, 2,384 reactions, and 861 genes, was validated using over 70 different carbon and nitrogen sources. It achieved a prediction accuracy of 95% for growth on various sources and 85% for aromatic compound assimilation under denitrifying conditions. The model helps reveal metabolic processes influencing wastewater treatment systems and natural environments.
The study investigates the intraneuronal binding of amyloid beta (Aβ) with reelin and its implications for Alzheimer's disease onset. It was found that in neurons of the entorhinal cortex with high reelin expression, Aβ and reelin bind directly, potentially making Aβ inert and protecting neurons. However, these neurons are among the first to be affected in Alzheimer's, suggesting a paradox. The study proposes that these neurons might have a higher capacity to produce Aβ, offering a potential explanation for this contradiction.
The study explores the use of deep sequencing of immune repertoires to predict the human leukocyte antigen (HLA) type from T-cell receptor (TCR) sequences. By analyzing large cohorts of HLA-typed donors, the researchers developed a computational model to infer associations between TCRs and HLA alleles. The model was refined using an iterative procedure, improving its predictive performance even for rare HLA alleles. This approach could aid in diagnostic and precision medicine tools, as well as immunotherapy design.
Latest Tools
The article presents kMetaShot, a tool for fast and reliable taxonomic classification of metagenome-assembled genomes (MAGs). kMetaShot, which uses k-mer/minimizer counting for taxonomic classification, has been benchmarked against CAMITAX and GTDB-Tk on in silico and mock communities.
The tool documentation is available here.
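To make the k-mer/minimizer idea concrete, here is a minimal Python sketch of window-based minimizer selection and counting. It is not kMetaShot's implementation: the function names, the lexicographic ordering, and the parameters `k` and `w` are assumptions chosen for illustration only.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def minimizers(seq, k, w):
    """Return the set of (position, k-mer) minimizers of seq.

    For each window of w consecutive k-mers, the lexicographically
    smallest k-mer is kept; ties go to the leftmost occurrence.
    (Illustrative ordering; real tools often use a hash instead.)
    """
    all_kmers = list(kmers(seq, k))
    picked = set()
    for start in range(len(all_kmers) - w + 1):
        window = all_kmers[start:start + w]
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return picked


def minimizer_counts(seq, k=4, w=3):
    """Count how many distinct positions select each minimizer k-mer."""
    return Counter(kmer for _, kmer in minimizers(seq, k, w))


if __name__ == "__main__":
    print(minimizer_counts("ACGTACGT", k=3, w=2))
```

Because neighbouring windows usually share their smallest k-mer, a sequence is represented by far fewer minimizers than k-mers, which is what makes this kind of counting cheap to store and compare.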
The research paper puts forth RiceSNP-ABST, a deep learning model for the precise and rapid identification of single nucleotide polymorphisms (SNPs) associated with abiotic stress traits (ABST-SNPs) in rice. The model was trained on six different datasets. RiceSNP-ABST is also available as a web-based tool.
It can be accessed here.
The study developed a deep learning algorithm called DeepYY1 to identify YY1-mediated chromatin loops, achieving high prediction performance across various datasets. The research highlights the importance of sequences in forming these loops and briefly discusses the distribution of replication origin sites within the loops.
The web server can be accessed here.
In this study, researchers present CapHLA, a tool built on a convolution and attention-based model that predicts peptide presentation probability (PB) and binding affinity (BA) for HLA-I and HLA-II. Unlike genetics-based models, CapHLA was trained on eluted ligand and binding affinity mass spectrometry data. The researchers then combined it with antigen expression level (EP) from transcriptomic data to develop a neoantigen quality model for predicting immunotherapy response.
The activity of master transcription factors (MTFs) is strongly correlated with the initiation and progression of cancer. The researchers developed MTFAP, a web resource for master transcription factor prediction and analysis, to predict and analyze MTFs from different data types. MTFAP also supports further analysis and data visualization of MTFs identified by Coltron and CRCmapper.
The platform is freely available here.
PIXANT, a multi-phenotype imputation method based on mixed fast random forest, is a new arrival among the many phenotypic imputation methods. It uses efficient machine learning (ML)-based algorithms to improve the statistical power of genome-wide association studies (GWAS).
The tool is available here.
This study introduces a deep learning approach called AEGAN, which merges the functionalities of AutoEncoder and GAN to create synthetic samples for the minority class in imbalanced gene expression datasets. This data balancing method has proven effective in cancer classification and enhances the performance of classifier models.
The source code is available here.
This article discusses a Python library called rnalib, which is designed for developing custom bioinformatics analysis methods. Rnalib provides a fast, readable, reproducible, and robust framework for creating innovative tools and methods for transcriptomics data analysis.
Source code and documentation are available here.
This article introduces gretl, an efficient, comprehensive, and integrated tool for analyzing genome graphs. gretl offers a wide range of statistics to gain insights into the structure and composition of these graphs. It can be used to evaluate different graphs, compare the outputs of graph construction pipelines with varying parameters, and conduct in-depth analyses of individual graphs, including sample-specific analysis.
The tool's code is available here.
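As a rough idea of what "graph statistics" means for a genome graph, the sketch below computes a few basic numbers from GFA v1 text. It is an illustration only and does not reflect gretl's code or its full set of statistics; the function name `graph_stats` and the choice of metrics are assumptions, and segment sequences are assumed to be stored inline on the `S` lines.

```python
from collections import defaultdict


def graph_stats(gfa_text):
    """Compute basic statistics from GFA v1 text (S and L lines only)."""
    seq_len = {}                 # segment name -> sequence length
    degree = defaultdict(int)    # segment name -> number of link endpoints
    edges = 0
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            # S <name> <sequence> [tags...]
            seq_len[fields[1]] = len(fields[2])
        elif fields[0] == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap>
            edges += 1
            degree[fields[1]] += 1
            degree[fields[3]] += 1
    nodes = len(seq_len)
    return {
        "nodes": nodes,
        "edges": edges,
        "total_bp": sum(seq_len.values()),
        "mean_degree": sum(degree.values()) / nodes if nodes else 0.0,
    }


# A tiny hand-written graph: three segments joined in a triangle.
EXAMPLE_GFA = (
    "H\tVN:Z:1.0\n"
    "S\t1\tACGT\n"
    "S\t2\tGG\n"
    "S\t3\tTTT\n"
    "L\t1\t+\t2\t+\t0M\n"
    "L\t1\t+\t3\t+\t0M\n"
    "L\t2\t+\t3\t+\t0M\n"
)

if __name__ == "__main__":
    print(graph_stats(EXAMPLE_GFA))
```

Running the same function on graphs built with different pipeline parameters gives a quick, if crude, way to compare them side by side.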
This article describes SampleExplorer, a tool that enables researchers to search for relevant data using both text and gene set queries. SampleExplorer embeds sample metadata and employs a transformer-based language model (LM) to retrieve similar datasets. It offers an efficient method for discovering relevant gene expression datasets in large public repositories.
The model is accessible here.
This article introduces QuICSeedR, an R package offering a comprehensive toolkit for the automated processing, analysis, and visualization of F-SAA data. Notably, QuICSeedR also lays the groundwork for establishing an F-SAA data management and analysis framework, promoting more consistent and comparable results across various research groups.
QuICSeedR is available here.
The article introduces IDclust, a framework designed for unsupervised identification of cell types using single-cell transcriptomics and epigenomics. IDclust employs biologically meaningful thresholds and iterative clustering to identify clusters with significant biological features at multiple resolutions. The framework ensures all clusters have distinct features and stops clustering when no more interpretable clusters are found.
The R package is available here.
The study introduces SequenceCraft, a machine learning-based platform for analyzing RNA-cleaving deoxyribozymes, featuring a curated database of over 350 catalytic cores and optimized algorithms for predicting observed rate constants. The platform allows for quantitative analysis and clustering of data, aiming to streamline the discovery of effective RNA-cleaving DNAzymes.
The tool is available on GitHub here.
Community News
The article discusses how GPT-4 shows potential in analyzing continuous glucose monitoring data, providing accurate metrics and summaries. However, it also highlights the need for further refinement to enhance its application in diabetes care.
The study examines how socioeconomic development influences the global cancer burden, highlighting disparities in cancer incidence and mortality between high and low HDI countries. It emphasizes the need for targeted strategies to address these disparities and improve cancer outcomes worldwide.
Stanford researchers have developed an AI model that integrates visual and text data to improve cancer prognosis predictions. This model, trained on extensive medical images and texts, outperforms standard methods and aims to transform patient care.
Researchers have developed cell-type specific epigenetic clocks to measure biological age at the resolution of individual cell types. This advancement provides a more detailed understanding of aging processes and could improve the diagnosis and treatment of age-related diseases.
Upcoming Events
This webinar by EMBL-EBI focuses on the automated annotation processes in UniProt. It will cover the methodologies and tools used to annotate protein sequences automatically, ensuring high-quality and consistent data. Participants will learn how these automated systems work and how to leverage them for their research.
This webinar explores the latest advancements in in situ structural biology. It aims to expand the toolbox available for structural cell biology, discussing new techniques and technologies that allow for detailed structural analysis within the cellular context. Attendees will gain insights into how these tools can be applied to their research.
This webinar provides a comprehensive overview of microbial metagenomics. It covers the entire workflow from sample collection to data analysis, offering a 360° approach to studying microbial communities. Participants will learn about the latest methods and tools used in metagenomics and how to apply them to their research projects.
This webinar discusses the future of microbial genomics, highlighting new technologies and methodologies that are shaping the field. It focuses on innovations that improve the accuracy, efficiency, and scalability of microbial genomic studies. Attendees will learn about the latest trends and how these advancements can be integrated into their research.
Educational Corner
Integrating functional programming with modern AI tools boosts efficiency and creativity. This post examines how AI can generate individual functions that can be combined into complete programs, emphasizing the key benefits of functional programming and how AI enhances this process.
Researchers analyze Single Nucleotide Polymorphism (SNP) datasets to determine disease relevance by mapping SNP locations to identifiers, referencing databases, and interpreting literature. This blog suggests a workflow to map SNPs to rsIDs, find associated diseases or traits, analyze results, and interpret them.
Tidyverse, a collection of R packages, improves data usability, exploration, and sharing. This article emphasizes the significance of Tidyverse functions, presenting six examples with the mtcars dataset to illustrate how chaining functions can achieve optimal results.
Automating version control is crucial for preventing disorganized code and avoiding code loss. This post outlines the steps required to connect RStudio to GitHub, ensuring that coding projects remain reproducible and collaboration-friendly.
Transforming data is essential in statistical analysis and preprocessing. In R, proper transformations help meet statistical assumptions, normalize distributions, and improve analysis accuracy. This guide explains how to implement and visualize common transformations (logarithmic, square root, and cube root) using base R functions.
Connect with Us
Stay connected and engage with us on social media for daily updates, discussions, and more!
Subscribe
Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.
We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!
Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.
Contact: [email protected]
Copyright © 2024, Bioinformer Weekly Roundup. All rights reserved.