SequenceCraft: for RNA-cleaving deoxyribozymes, RiceSNP-ABST: identification of SNPs, PIXANT, a multi-phenotype imputation tool

Bioinformer Weekly Roundup

Stay Updated with the Latest in Bioinformatics!

Issue: 68 | Date: 10 January 2025

Welcome to the Bioinformer Weekly Roundup!

In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you are a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we have got you covered. Subscribe now to stay ahead in the exciting realm of Bioinformatics!

Featured Research

Human introns contain conserved tissue-specific cryptic poison exons | NAR Genomics and Bioinformatics

The study explores the presence of conserved tissue-specific cryptic poison exons within human introns. Researchers identified a group of cryptic exons that are evolutionarily conserved and exhibit RNA-seq read coverage similar to annotated exons. These exons often generate nonsense-mediated decay (NMD) isoforms and show tissue-specific and cancer-specific expression. The findings suggest that many cryptic exons remain unannotated due to their low abundance in RNA-seq libraries.

Regional antimicrobial resistance gene flow among the One Health sectors in China | Microbiome

The study examines the regional flow of antimicrobial resistance genes (ARGs) among the One Health sectors in Dengfeng, Henan Province, China. Researchers identified 40 ARG types and 743 ARG subtypes across human, food, and environmental samples, with a higher ARG load in food and environmental samples. The study highlights the role of dietary habits, occupational exposure, and horizontal gene transfer in ARG dissemination. Machine learning models based on microbiome profiles were effective in predicting the presence of carbapenem-resistant strains.

Optimizing sequence data analysis using convolution neural network for the prediction of CNV bait positions | BMC Bioinformatics

The study introduces a 1D convolutional neural network (CNN) for predicting the positions of capture baits in whole-exome sequencing (WES) kits, aimed at normalizing GC bias and enhancing copy number variation (CNV) detection. The research found that incorporating experimental coverage, on-target information, and sequence data improved bait prediction accuracy. This approach aims to reduce systemic biases in genomic studies and improve CNV detection sensitivity and specificity.

Deep learning-based metabolomics data study of prostate cancer | BMC Bioinformatics

The study introduces TransConvNet, a hybrid model combining transformer and convolutional neural networks, for classifying prostate cancer metabolomics data. It features a 1D convolution layer and a gating mechanism to enhance feature extraction and attention weights.

A genome-scale metabolic model for the denitrifying bacterium Thauera sp. MZ1T accurately predicts degradation of pollutants and production of polymers | PLOS Computational Biology

The study presents a genome-scale metabolic model for Thauera sp. MZ1T, a denitrifying bacterium, to predict pollutant degradation and polymer production. The model, consisting of 1,744 metabolites, 2,384 reactions, and 861 genes, was validated using over 70 different carbon and nitrogen sources. It achieved a prediction accuracy of 95% for growth on various sources and 85% for aromatic compound assimilation under denitrifying conditions. The model helps reveal metabolic processes influencing wastewater treatment systems and natural environments.

Intraneuronal binding of amyloid beta with reelin—Implications for the onset of Alzheimer’s disease | PLOS Computational Biology

The study investigates the intraneuronal binding of amyloid beta (Aβ) with reelin and its implications for Alzheimer's disease onset. It was found that in neurons of the entorhinal cortex with high reelin expression, Aβ and reelin bind directly, potentially making Aβ inert and protecting neurons. However, these neurons are among the first to be affected in Alzheimer's, suggesting a paradox. The study proposes that these neurons might have a higher capacity to produce Aβ, offering a potential explanation for this contradiction.

Learning predictive signatures of HLA type from T-cell repertoires | PLOS Computational Biology

The study explores the use of deep sequencing of immune repertoires to predict the human leukocyte antigen (HLA) type from T-cell receptor (TCR) sequences. By analyzing large cohorts of HLA-typed donors, the researchers developed a computational model to infer associations between TCRs and HLA alleles. The model was refined using an iterative procedure, improving its predictive performance even for rare HLA alleles. This approach could aid in diagnostic and precision medicine tools, as well as immunotherapy design.

Latest Tools

kMetaShot: a fast and reliable taxonomy classifier for metagenome-assembled genomes | Briefings in Bioinformatics

The article presents kMetaShot, a tool for fast and reliable taxonomic classification of metagenome-assembled genomes (MAGs). kMetaShot, which uses k-mer/minimizer counting for taxonomy classification, has been benchmarked against CAMITAX and GTDB-Tk on in silico and mock communities.

The tool documentation is available here.
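For readers unfamiliar with k-mer/minimizer counting, here is a minimal Python sketch of the general idea. It is not kMetaShot's implementation: the function names, the lexicographic ordering, and the window scheme are illustrative assumptions, and real tools typically use hashed orderings and canonical k-mers.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimizers(seq, k, w):
    """Return the set of (position, k-mer) minimizers of seq.

    For every window of w consecutive k-mers, keep the lexicographically
    smallest k-mer (the leftmost one on ties). This ordering is an
    assumption made for illustration only.
    """
    km = kmers(seq, k)
    picked = set()
    for start in range(len(km) - w + 1):
        window = km[start:start + w]
        # min() returns the first minimal index, i.e. the leftmost on ties
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return picked


def minimizer_counts(seq, k, w):
    """Count how often each k-mer is selected as a minimizer."""
    return Counter(kmer for _, kmer in minimizers(seq, k, w))


if __name__ == "__main__":
    # Toy sequence; k and w are arbitrary illustrative values
    print(minimizer_counts("ACGTACGT", k=3, w=2))
```

Because adjacent windows often share the same minimizer, far fewer k-mers need to be stored and compared than with plain k-mer counting.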

RiceSNP-ABST: a deep learning approach to identify abiotic stress-associated single nucleotide polymorphisms in rice | Briefings in Bioinformatics

The paper proposes a deep learning model, RiceSNP-ABST, for the precise and rapid identification of single nucleotide polymorphisms (SNPs) associated with abiotic stress traits (ABST-SNPs) in rice. The model was trained on six different datasets. RiceSNP-ABST is also available as a web-based tool.

It can be accessed here.

DeepYY1: a deep learning approach to identify YY1-mediated chromatin loops | Briefings in Bioinformatics

The study developed a deep learning algorithm called DeepYY1 to identify YY1-mediated chromatin loops, achieving high prediction performance across various datasets. The research highlights the importance of sequences in forming these loops and briefly discusses the distribution of replication origin sites within the loops.

The web server can be accessed here.

CapHLA: a comprehensive tool to predict peptide presentation and binding to HLA class I and class II | Briefings in Bioinformatics

In this study, researchers present CapHLA, a tool built on a convolution- and attention-based model that predicts peptide presentation probability (PB) and binding affinity (BA) for HLA-I and HLA-II. Unlike genetics-based models, the model, which was trained on eluted ligand and binding affinity mass spectrometry data, was used to develop a neoantigen quality model for predicting immunotherapy response in combination with antigen expression level (EP) from transcriptomic data.

The tool is accessible here, and the code is available in the GitHub repository here.

MTFAP: a comprehensive platform for predicting and analyzing master transcription factors | Scientific Reports

The activity of master transcription factors (MTFs) is strongly correlated with the initiation and progression of cancer. The researchers developed a web resource for master transcription factor prediction and analysis (MTFAP) that predicts and analyzes MTFs from different data types. MTFAP also supports further analysis and data visualization for the MTFs identified by Coltron and CRCmapper.

The platform is freely available here.

Rapid and accurate multi-phenotype imputation for millions of individuals | Nature Communications

PIXANT, a multi-phenotype imputation method based on mixed fast random forest, is a new arrival among the many phenotypic imputation methods. It uses efficient machine learning (ML)-based algorithms to improve the statistical power of genome-wide association studies (GWAS).

The tool is available here.

AEGAN-Pathifier: a data augmentation method to improve cancer classification for imbalanced gene expression data | BMC Bioinformatics

This study introduces a deep learning approach called AEGAN, which merges the functionalities of AutoEncoder and GAN to create synthetic samples for the minority class in imbalanced gene expression datasets. This data balancing method has proven effective in cancer classification and enhances the performance of classifier models.

The source code is available here.

Rnalib: a Python library for custom transcriptomics analyses | Oxford Academic

This article discusses a Python library called rnalib, which is designed for developing custom bioinformatics analysis methods. Rnalib provides a fast, readable, reproducible, and robust framework for creating innovative tools and methods for transcriptomics data analysis.

Source code and documentation are available here.

Gretl—Variation Graph Evaluation TooLkit | Oxford Academic

This article introduces gretl, an efficient, comprehensive, and integrated tool for analyzing genome graphs. gretl offers a wide range of statistics to gain insights into the structure and composition of these graphs. It can be used to evaluate different graphs, compare the outputs of graph construction pipelines with varying parameters, and conduct in-depth analyses of individual graphs, including sample-specific analysis.

The tool's code is available here.

SampleExplorer: Using language models to discover relevant transcriptome data | Oxford Academic

This article describes SampleExplorer, a tool that enables researchers to search for relevant data using both text and gene set queries. SampleExplorer embeds sample metadata and employs a transformer-based language model (LM) to retrieve similar datasets. It offers an efficient method for discovering relevant gene expression datasets in large public repositories.

The model is accessible here.

QuICSeedR: An R package for analyzing fluorophore-assisted seed amplification assay data | Oxford Academic

This article introduces QuICSeedR, an R package offering a comprehensive toolkit for the automated processing, analysis, and visualization of F-SAA data. Notably, QuICSeedR also lays the groundwork for establishing an F-SAA data management and analysis framework, promoting more consistent and comparable results across various research groups.

QuICSeedR is available here.

IDclust: Iterative clustering for unsupervised identification of cell types with single cell transcriptomics and epigenomics | NAR Genomics and Bioinformatics

The article introduces IDclust, a framework designed for unsupervised identification of cell types using single-cell transcriptomics and epigenomics. IDclust employs biologically meaningful thresholds and iterative clustering to identify clusters with significant biological features at multiple resolutions. The framework ensures all clusters have distinct features and stops clustering when no more interpretable clusters are found.

The R package is available here.

SequenceCraft: machine learning-based resource for exploratory analysis of RNA-cleaving deoxyribozymes | BMC Bioinformatics

The study introduces SequenceCraft, a machine learning-based platform for analyzing RNA-cleaving deoxyribozymes, featuring a curated database of over 350 catalytic cores and optimized algorithms for predicting observed rate constants. The platform allows for quantitative analysis and clustering of data, aiming to streamline the discovery of effective RNA-cleaving DNAzymes.

The tool is available on GitHub here.

Community News

AI-powered glucose analysis: GPT-4 offers promise with room for refinement | News Medical Life Sciences

The article discusses how GPT-4 shows potential in analyzing continuous glucose monitoring data, providing accurate metrics and summaries. However, it also highlights the need for further refinement to enhance its application in diabetes care.

Study shows the impact of socioeconomic development on global cancer burden | News Medical Life Sciences

The study examines how socioeconomic development influences the global cancer burden, highlighting disparities in cancer incidence and mortality between high and low HDI countries. It emphasizes the need for targeted strategies to address these disparities and improve cancer outcomes worldwide.

Stanford researchers develop AI model to enhance cancer prognosis predictions | News Medical Life Sciences

Stanford researchers have developed an AI model that integrates visual and text data to improve cancer prognosis predictions. This model, trained on extensive medical images and texts, outperforms standard methods and aims to transform patient care.

Epigenetic Clocks Show Biological Age at Cell-Type Resolution | Genetic Engineering and Biotechnology News

Researchers have developed cell-type specific epigenetic clocks to measure biological age at the resolution of individual cell types. This advancement provides a more detailed understanding of aging processes and could improve the diagnosis and treatment of age-related diseases.

Upcoming Events

Automated annotation in UniProt | EMBL-EBI Training

This webinar by EMBL-EBI focuses on the automated annotation processes in UniProt. It will cover the methodologies and tools used to annotate protein sequences automatically, ensuring high-quality and consistent data. Participants will learn how these automated systems work and how to leverage them for their research.

In situ structural biology: expanding the toolbox for structural cell biology | EMBL

This webinar explores the latest advancements in in situ structural biology. It aims to expand the toolbox available for structural cell biology, discussing new techniques and technologies that allow for detailed structural analysis within the cellular context. Attendees will gain insights into how these tools can be applied to their research.

Microbial metagenomics: a 360° approach | EMBL

This webinar provides a comprehensive overview of microbial metagenomics. It covers the entire workflow from sample collection to data analysis, offering a 360° approach to studying microbial communities. Participants will learn about the latest methods and tools used in metagenomics and how to apply them to their research projects.

Delivering the Future of Microbial Genomics With Direct and Real-Time Sequencing | Technology Networks

This webinar discusses the future of microbial genomics, highlighting new technologies and methodologies that are shaping the field. It focuses on innovations that improve the accuracy, efficiency, and scalability of microbial genomic studies. Attendees will learn about the latest trends and how these advancements can be integrated into their research.

Educational Corner

Leveraging AI to Enhance Functional Programming in 2025 | R-bloggers

Integrating functional programming with modern AI tools boosts efficiency and creativity. This post examines how AI can generate individual functions that can be combined into complete programs, emphasizing the key benefits of functional programming and how AI enhances this process.

Cracking the Code: Deciphering SNP Relevance from Chromosomal Locations | Medium

Researchers analyze Single Nucleotide Polymorphism (SNP) datasets to determine disease relevance by mapping SNP locations to identifiers, referencing databases, and interpreting literature. This blog suggests a workflow to map SNPs to rsIDs, find associated diseases or traits, analyze results, and interpret them.

Some of the more useful Tidyverse functions | R-bloggers

Tidyverse, a collection of R packages, improves data usability, exploration, and sharing. This article emphasizes the significance of Tidyverse functions, presenting six examples with the mtcars dataset to illustrate how chaining functions can achieve optimal results.

Connecting RStudio to GitHub | R|Py notes

Automating version control is crucial for preventing disorganized code and avoiding code loss. This post outlines the steps required to connect RStudio to GitHub, ensuring that coding projects remain reproducible and collaboration-friendly.

How to Transform Data in R (Log, Square Root, Cube Root) | R-bloggers

Transforming data is essential in statistical analysis and preprocessing. In R, proper transformations help meet statistical assumptions, normalize distributions, and improve analysis accuracy. This guide explains how to implement and visualize common transformations (logarithmic, square root, and cube root) using base R functions.
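The linked guide works in base R. As a rough companion, the three transformations it names can be sketched in Python as follows. The function names and the input checks are illustrative assumptions and are not taken from the guide.

```python
import math


def log_transform(values):
    """Natural log; only defined for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


def sqrt_transform(values):
    """Square root; only defined for non-negative values."""
    if any(v < 0 for v in values):
        raise ValueError("square root transform requires non-negative values")
    return [math.sqrt(v) for v in values]


def cube_root_transform(values):
    """Cube root that preserves the sign, so negative values are allowed."""
    return [math.copysign(abs(v) ** (1.0 / 3.0), v) for v in values]


if __name__ == "__main__":
    # Made-up, right-skewed toy data for demonstration only
    data = [1.0, 4.0, 9.0, 100.0, 1000.0]
    print(log_transform(data))
    print(sqrt_transform(data))
    print(cube_root_transform(data))
```

The log transform compresses large values most strongly, the square root less so, and the cube root is the only one of the three that also handles negative values.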

Connect with Us

Stay connected and engage with us on social media for daily updates, discussions, and more!

Subscribe

Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.

Subscribe Now

We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!


Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.

Contact: [email protected]

Copyright © 2024, Bioinformer Weekly Roundup. All rights reserved.


