Python code to "Download Genome Table (All Chromosomal/Assembly Data) from NCBI"

#pythonprogramming #bioinformatics #genome #computationalbiology

Searching NCBI Genome by organism name opens a detailed table covering the whole genome of that organism. The table lists the chromosome ID, NCBI RefSeq ID, INSDC (International Nucleotide Sequence Database Collaboration) ID, chromosome/assembly size, GC %, number of proteins detected, total RNA counts (rRNA, tRNA, and other RNAs), number of genes, and so on. If your project includes a phase where you need to download the entire genome, chromosome by chromosome, in .fasta format, I have written a Python script that makes the task easier. Done by hand, the job is tedious: you download each chromosome's sequence data, name the individual sequence files, organize them into named directories, and cross-check their sizes against the GenBank data. With the script, you only provide the NCBI Genome "URL" for the organism of interest and a custom "Name" of your choice for organizing the data. My Python code uses the following libraries:

  1. Requests: to request the webpage and get its source code.
  2. BeautifulSoup (bs4): to scrape the web data using the "lxml" parser.
  3. Biopython: to parse .fasta files and use sequence objects to work with multiple attributes of the sequence data.
  4. Pandas: to organize the table content in a human-readable format.
  5. Openpyxl: to write the data in .xlsx format.
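To give a feel for the table-scraping step, here is a minimal sketch of turning an HTML table into one record per row. It is not the code from my repository: it uses only Python's standard-library `html.parser` so that it runs without installing anything, whereas the actual script relies on Requests and BeautifulSoup. The names `TableRowParser`, `table_to_records`, and `SAMPLE_HTML` are made up for illustration, and the sample table (column names, IDs, and values) is invented, not real NCBI output.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Invented sample table; a real page would be fetched over HTTP first.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>RefSeq</th><th>Size (Mb)</th><th>GC%</th></tr>
  <tr><td>chr1</td><td>XX_000001.1</td><td>1.50</td><td>40.0</td></tr>
  <tr><td>chr2</td><td>XX_000002.1</td><td>0.75</td><td>42.5</td></tr>
</table>
"""

records = table_to_records(SAMPLE_HTML)
for record in records:
    print(record)
```

A list of dictionaries like `records` can be passed straight to `pandas.DataFrame(records)`, which is roughly where Pandas and Openpyxl come in for the human-readable table and the .xlsx export.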

The Python code for the above-mentioned task is provided on GitHub and can be accessed at: Entry_Form/NCBI_Whole_Genome_downloadfile at master · Vijithkumar2020/Entry_Form (github.com)
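For the size cross-check mentioned earlier, the idea can be sketched as follows: read each record from a .fasta file, measure its length and GC %, and compare the length with the size listed in the genome table. This is an illustration under stated assumptions, not the code in the repository. It parses FASTA with plain Python so it runs without extra installs; with Biopython you would typically iterate over `SeqIO.parse(handle, "fasta")` instead. The function names, the `EXPECTED` mapping, and the sample sequences are all made up.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_percent(sequence):
    """Return the percentage of G and C bases (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def check_sizes(handle, expected_sizes):
    """Compare each record's length with the expected size for its ID."""
    report = []
    for header, sequence in read_fasta(handle):
        seq_id = header.split()[0]  # the ID is the first word of the header
        report.append({
            "id": seq_id,
            "length": len(sequence),
            "gc_percent": round(gc_percent(sequence), 2),
            "size_matches": expected_sizes.get(seq_id) == len(sequence),
        })
    return report


# Invented two-record FASTA text and invented expected sizes (in bases).
SAMPLE_FASTA = (
    ">XX_000001.1 made-up chromosome 1\n"
    "ATGCATGC\n"
    "GGCC\n"
    ">XX_000002.1 made-up chromosome 2\n"
    "ATATATAT\n"
)
EXPECTED = {"XX_000001.1": 12, "XX_000002.1": 10}

report = check_sizes(io.StringIO(SAMPLE_FASTA), EXPECTED)
for entry in report:
    print(entry)
```

In this made-up example the second record is deliberately two bases short of its expected size, so its `size_matches` value is `False`; that is the kind of mismatch the cross-check is meant to catch.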
