Article 3. Microbial Taxonomy in the Age of Genomics

1. Microbial Taxonomy and Nomenclature:

Microbial taxonomy encompasses two primary tasks:

  • The identification of an isolate as a known species or as a novel taxon: The rapid growth of Massively Parallel Sequencing Technology (MPST) has fundamentally changed how we study microbes. A plethora of web-based and standalone tools can assist researchers in analyzing sequences, assembling genomes, and comparing them, leading to deeper insights into the microbial world. Modern microbial taxonomy relies heavily on Overall Genome Relatedness Index (OGRI) methods. These methods quantify the similarity or distance between two microbial genomes and aid in the crucial tasks of

> identification, in which an isolate is assigned to a known species or is recognized as belonging to a previously unknown group (taxon)

AND

> classification, whereby the identified isolate is placed in the established taxonomic framework.

  • The nomenclature (naming) of a species if the isolate has been identified as a new species: Rules governing the nomenclature of scientific names for Bacteria and Archaea are laid down in the International Code of Nomenclature of Prokaryotes (ICNP) (1), formerly the International Code of Nomenclature of Bacteria (ICNB) or Bacteriological Code (BC) (2). ICNP has since been discussed and revised (2-4). The Great Automatic Nomenclature (GAN) tool has been developed to generate a large number of new linguistically correct Latin and Greek genera and species names for naming Bacteria and Archaea (5).

2. OGRI methods can be used to evaluate the delineation of taxonomic ranks

Overall Genome Relatedness Indices (OGRI) are computed between a pair of genomes. The term OGRI was coined by Chun and Rainey (6), and at the time of writing a large number of OGRI methods have been developed. In all OGRI methods, the alignment fraction (AF) used to measure similarity or distance between two genomes, their genes or their proteins depends on the purity of the sequences generated from each genome (that is, the sequence data must not come from a mixture of genomes).

A list of OGRI methods is given below. Different OGRI methods produce different similarity values, which can then be used to assess the taxonomic rank of the isolate under study. Of the OGRI methods tested so far, ANI methods provide robust species demarcation, but there is no consensus yet on the most useful and versatile OGRI method for delineating genera, although there have been some reports on the use of POCP and AAI for this purpose.

  • Average Nucleotide Identity (ANI) (7)
  • Percentage of Conserved DNA (cDNA) (7)
  • EzAAI (8)
  • OrthoANI (9)
  • CDS-based ANI (cANI) (10)
  • gANI (12)
  • Alignment Fraction (AF) (13)
  • One way Average Amino-acid Identity (AAI) (11)
  • Percentage Of Conserved Proteins (POCP) (12A, 12B)
  • Proteome Coverage (ProCov) (13)
  • Reciprocal AAI (rAAI) (14)
  • tANI (15)
  • Microbial Genomes Atlas (MiGA) (16)
  • PHylogenomic ANalyses for the TAxonomy and Systematics of Microbes (PHANTASM) (17)
  • pyani (18)
  • aniclustermap (19)
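
To illustrate how a similarity value is applied to rank delineation, the sketch below compares an ANI value (in percent) with a species cut-off. The default of 95 % is an assumption taken from the range of roughly 95-96 % ANI that is frequently cited in the literature as a species boundary; it is not a value stated in this article, and the function name classify_ani is hypothetical. The appropriate cut-off should be checked for the taxon under study.

```shell
#!/usr/bin/env bash
# Illustrative only: compare an ANI value (percent) with a species cut-off.
# The 95 % default is an assumed, commonly cited guideline, not an OGRI_B setting.
classify_ani() {
  local ani="$1" cutoff="${2:-95}"
  awk -v a="$ani" -v c="$cutoff" 'BEGIN {
    if (a + 0 >= c + 0) print "at or above the species cut-off"
    else                print "below the species cut-off"
  }'
}

classify_ani 97.3      # at or above the species cut-off
classify_ani 88.1      # below the species cut-off
classify_ani 95.5 96   # a stricter, user-supplied cut-off: below
```

The cut-off is left as a parameter because different OGRI methods, and different ranks, call for different thresholds.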

References:

1. https://www.the-icsp.org/index.php/executive-board-ics-p

2. Parker CT, Tindall BJ, Garrity GM (2019). International Code of Nomenclature of Prokaryotes (2008 revision). Int J Syst Evol Microbiol 69:S1–S111.

3. Arahal DR, Bull CT, Busse H-J, Christensen H, Chuvochina M, Dedysh SN, Fournier P-E, Konstantinidis KT, Parker CT, Rossello-Mora R, Ventosa A, Göker M (2023). Guidelines for interpreting the International Code of Nomenclature of Prokaryotes and for preparing a Request for an Opinion. Int J Syst Evol Microbiol 73:005782.

4. Oren A, Arahal DR, Rosselló-Móra R, Sutcliffe IR, Moore ERB (2021). Preparing a revision of the International Code of Nomenclature of Prokaryotes. Int J Syst Evol Microbiol 71:004598.

5. Pallen MJ, Telatin A, Oren A (2021). The Next Million Names for Archaea and Bacteria. Trends Microbiol 29:289-298. doi:10.1016/j.tim.2020.10.009

6. Chun J, Rainey FA (2014) Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea. Int J Syst Evol Microbiol 64:316-324. doi:10.1099/ijs.0.054171-0

7. Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM (2007). DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol, 57:81-91. doi:10.1099/ijs.0.64483-0

8. Kim D, Park S, Chun J (2021). Introducing EzAAI: a pipeline for high throughput calculations of prokaryotic average amino acid identity. J Microbiol, 59:476-480. doi:10.1007/s12275-021-1154-0

9. Lee I, Ouk Kim Y, Park SC, Chun J. (2016). OrthoANI: An improved algorithm and software for calculating average nucleotide identity. Int J Syst Evol Microbiol, 66:1100-1103. doi: 10.1099/ijsem.0.000760.

10. Konstantinidis KT, Tiedje JM (2005). Genomic insights that advance the species definition for prokaryotes. PNAS 102:2567-2572. doi:10.1073/pnas.0409727102

11. Konstantinidis KT, Tiedje JM (2005). Towards a Genome-Based Taxonomy for Prokaryotes. J Bacteriol, 187:6258-6264. doi:10.1128/JB.187.18.6258-6264.2005

12A. Nicholson AC, Gulvik CA, Whitney AM, Humrighouse BW, Bell ME, Holmes B, Steigerwalt AG, Villarma A, Sheth M, Batra D, Rowe LA, Burroughs M, Pryor JC, Bernardet J-F, Hugo C, Kämpfer P, Newman JD, McQuiston JR (2020). Division of the genus Chryseobacterium: Observation of discontinuities in amino acid identity values, a possible consequence of major extinction events, guides transfer of nine species to the genus Epilithonimonas, eleven species to the genus Kaistella, and three species to the genus Halpernia gen. nov., with description of Kaistella daneshvariae sp. nov. and Epilithonimonas vandammei sp. nov. derived from clinical specimens. Int J Syst Evol Microbiol, 70:4432-4450. doi:10.1099/ijsem.0.003935

12B. https://github.com/hoelzer/pocp

13. Qin Q-L, Xie B-B, Zhang X-Y, Chen X-L, Zhou B-C, Zhou J, Oren A, Zhang Y-Z (2014). A Proposed Genus Boundary for the Prokaryotes Based on Genomic Insights. J Bacteriol, 196:2210-2215. doi:10.1128/JB.01688-14

14. Varghese NJ, Mukherjee S, Ivanova N, Konstantinidis KT, Mavrommatis K, Kyrpides NC, Pati A (2015) Microbial species delineation using whole genome sequences. NAR, 43:6761-6771. doi:10.1093/nar/gkv657

15. Gosselin S, Fullmer MS, Feng Y, Gogarten JP (2022). Improving Phylogenies Based on Average Nucleotide Identity, Incorporating Saturation Correction and Nonparametric Bootstrap Support. Syst Biol, 71:396–409. doi:10.1093/sysbio/syab060

16. Rodriguez-R LM, Gunturu S, Harvey WT, Rosselló-Mora R, Tiedje JM, Cole JR, Konstantinidis KT. (2018). The Microbial Genomes Atlas (MiGA) webserver: taxonomic and gene diversity analysis of Archaea and Bacteria at the whole genome level. NAR 46(W1):W282-W288. doi: 10.1093/nar/gky467.

17. Wirth JS, Bush EC (2023). Automating microbial taxonomy workflows with PHANTASM: PHylogenomic ANalyses for the TAxonomy and Systematics of Microbes. NAR 51:3067–3077. doi:10.1093/nar/gkad196

18. Pritchard L, Glover RH, Humphris S, Elphinstone JG, Toth IK (2016). Genomics and taxonomy in diagnostics for food security: soft-rotting enterobacterial plant pathogens. Anal Methods 8:12-24. doi:10.1039/C5AY02550H

19. https://github.com/moshi4/ANIclustermap


3. OGRI_B (ver 1.2): A Versatile Tool for OGRI Analysis

OGRI_B is a basic command-line Bash tool that implements a number of different OGRI methods (ANI, AAI, POCP, …). It can be installed on UNIX, Linux and most OS X operating systems from https://gitlab.pasteur.fr/GIPhy/OGRI


OGRI_B offers several key advantages for studying and understanding different OGRI methods. It combines various OGRI methods, allowing users to explore how different approaches yield diverse similarity values for taxonomic rank delineation of isolates. This single tool eliminates the need to install and learn each OGRI method individually, making it an accessible and efficient learning resource.


However, OGRI_B has a few limitations. It is currently restricted to calculating pairwise similarity values, meaning it can only compare a single genome to another or a single genome to a group of others, but not analyze relationships between multiple genomes simultaneously. Additionally, although the use of multiple threads can improve speed, analysis times can still be significant. For example, generating a single similarity value between two 5 Mbp genomes using 12 threads can take up to one minute.
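
To get a feel for what the pairwise restriction means in practice, the sketch below estimates an upper bound on the total run time for an all-against-all comparison. It assumes that each unordered pair is computed once, that runs are sequential, and that every pair takes the "up to one minute" quoted above; the function names are hypothetical and real timings will vary with genome size, method and hardware.

```shell
#!/usr/bin/env bash
# Number of unordered genome pairs among n genomes: n * (n - 1) / 2
pair_count() {
  local n="$1"
  echo $(( n * (n - 1) / 2 ))
}

# Upper-bound run time in hours for all pairs, given the seconds per pair
# (default 60 s, the "up to one minute" figure for two 5 Mbp genomes).
estimate_hours() {
  local n="$1" secs_per_pair="${2:-60}"
  awk -v p="$(pair_count "$n")" -v s="$secs_per_pair" \
    'BEGIN { printf "%.1f\n", p * s / 3600 }'
}

pair_count 50       # 1225
estimate_hours 50   # 20.4
```

Under these assumptions, 50 genomes give 1225 pairs and roughly 20 hours of computation, which shows why the per-pair time matters for larger data sets.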


In conclusion, while OGRI_B has limitations in terms of functionality and speed, it remains a valuable tool for researchers seeking to explore and learn about various OGRI methods in a streamlined and user-friendly manner.


OGRI_B is a command-line program written in Bash that computes pairwise similarity measures between whole-genome sequences. Every computed similarity is based on local sequence alignments:

  • Average Nucleotide Identity (ANI)
  • Percentage of Conserved DNA (cDNA)
  • OrthoANI (oANI)
  • Percentage Of Conserved Proteins (POCP)
  • CDS-based ANI (cANI; gANI)
  • Alignment Fraction (AF)
  • (one-way) Average Amino-acid Identity (AAI)
  • Proteome Coverage (ProCov)
  • Reciprocal AAI (rAAI)
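
To make "based on local sequence alignments" more concrete, the sketch below derives an ANI-like value from BLAST tabular output (-outfmt 6, where column 1 is the query, column 3 the percent identity and column 4 the alignment length). This is not OGRI_B's own code. The function name mean_identity is hypothetical, and the default filters (identity of at least 30 %, alignment length of at least 714 nt, i.e. 70 % of a 1020 nt fragment) are assumptions modelled on the cut-offs described for ANI in (7).

```shell
#!/usr/bin/env bash
# mean_identity FILE [MIN_IDENT] [MIN_ALN_LEN]
# Averages the percent identity (column 3) of the first hit listed for each
# query fragment (column 1), keeping only hits that pass both filters.
mean_identity() {
  local file="$1" min_ident="${2:-30}" min_len="${3:-714}"
  awk -F'\t' -v mi="$min_ident" -v ml="$min_len" '
    seen[$1]++ { next }                       # skip all but the first hit per query
    $3 >= mi && $4 >= ml { sum += $3; n++ }   # keep hits passing both filters
    END { if (n) printf "%.2f\n", sum / n; else print "NA" }
  ' "$file"
}

# Toy input with only the first four columns (real -outfmt 6 output has twelve).
demo=$(mktemp)
printf 'frag1\tref\t98.00\t1000\n'  > "$demo"
printf 'frag1\tref\t80.00\t900\n'  >> "$demo"   # second hit for frag1, ignored
printf 'frag2\tref\t96.00\t1020\n' >> "$demo"
printf 'frag3\tref\t99.00\t300\n'  >> "$demo"   # alignment too short, dropped
mean_identity "$demo"                           # 97.00
rm -f "$demo"
```

Taking only the first hit per query relies on BLAST listing the best hit first; the other indices in the list differ mainly in what is aligned (genes or proteins rather than genome fragments) and in how the hits are filtered and summarised.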

Installation:

#Clone this repository in an appropriate dir (e.g /home/user/opt/)

$git clone https://gitlab.pasteur.fr/GIPhy/OGRI.git        

#Go to directory /home/user/opt/OGRI and change the permission to make OGRI_B.sh executable:

$cd OGRI/
$chmod +x OGRI_B.sh        

#To call OGRI_B.sh from any location, its directory must be listed in $PATH. I keep all my scripts in a dedicated directory called scripts (e.g. /home/usr/opt/scripts, where usr is your login name) and add it to the path in ~/.bashrc with: export PATH=$PATH:/home/usr/opt/scripts/


Required Dependencies:

#The following dependencies (programs) are required to run OGRI_B.

Program     Version     Source

gawk        > 4.0.0     ftp.gnu.org/gnu/gawk
prodigal    ≥ 2.6.3     github.com/hyattpd/Prodigal
blast+      ≥ 2.12.0    ftp.ncbi.nlm.nih.gov/blast/executables/blast+

#Check if the dependencies are already installed. If installed, the path to the program will be shown on your screen:

$which gawk

$which prodigal

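#BLAST+ is a suite of programs rather than a single executable, so check for one of its programs, e.g. blastn:

$which blastn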
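
The individual checks can also be combined into one loop. The sketch below is a minimal example, assuming that the BLAST+ suite is represented by its makeblastdb and blastn executables (BLAST+ installs several programs rather than one called blast+); which of them OGRI_B actually calls is not confirmed here, and the function name check_deps is hypothetical.

```shell
#!/usr/bin/env bash
# Print the location of each named program, or NOT FOUND if it is not in $PATH.
# Returns 1 if at least one program is missing, 0 otherwise.
check_deps() {
  local prog missing=0
  for prog in "$@"; do
    if command -v "$prog" >/dev/null 2>&1; then
      printf '%-12s %s\n' "$prog" "$(command -v "$prog")"
    else
      printf '%-12s NOT FOUND\n' "$prog"
      missing=1
    fi
  done
  return "$missing"
}

check_deps gawk prodigal makeblastdb blastn \
  || echo "Install the missing programs before running OGRI_B.sh"
```

The command -v builtin is used instead of which because it is part of the shell itself and behaves the same way across systems.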

#Install any dependencies that are missing. On Ubuntu, all three are available as packages. You will need admin rights for this step.

a) First, refresh the package lists and upgrade the currently installed packages to their most recent versions

$sudo apt-get update && sudo apt-get upgrade

b) Next, install the packages; all packages will be installed system wide

$sudo apt-get install gawk

$sudo apt-get install prodigal

$sudo apt-get install ncbi-blast+


Usage:

#getting help:

$OGRI_B.sh -h        

NOTE: If at least one of the dependencies (programs) is not available in your $PATH variable or has a different name, then OGRI_B.sh will exit with an error message.


#Running examples:

There are implementations of many different OGRI methods with example genome files. Please refer to https://gensoft.pasteur.fr/docs/OGRI/1.2/

