Python code to "Download Genome Table (All Chromosomal/Assembly Data) from NCBI"
Vijithkumar Vijayan
DST-INSPIRE fellow | Ph.D. Scholar in Bioinformatics | Blogger | YouTuber
Searching NCBI Genome by organism name opens a detailed table describing the whole genome of that organism. The table lists the chromosome ID, NCBI RefSeq ID, INSDC (International Nucleotide Sequence Database Collaboration) ID, chromosome/assembly size, GC %, number of proteins detected, total RNA counts (rRNA, tRNA, and other RNAs), number of genes, and so on. If your project has a phase where you need to download the entire genome, chromosome by chromosome, in .fasta format, I have come up with a solution in the form of Python code that can ease this task. Done by hand, the job is tedious: you have to download each chromosomal sequence, name each sequence file, organize the files into named directories, and cross-check their sizes against the GenBank data. With the code, you only need to provide the NCBI genome "URL" for the organism of interest and a custom "Name" of your choice for organizing the data. My Python code leverages the following libraries:
The Python code for the above-mentioned task is provided on GitHub and can be accessed at: Entry_Form/NCBI_Whole_Genome_downloadfile at master · Vijithkumar2020/Entry_Form (github.com)
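To give a feel for the steps described above (download each chromosome as .fasta, name the file, place it in a named directory, and cross-check its size), here is a minimal sketch using only the Python standard library. It is not the code from the repository: it assumes you have already copied the RefSeq or INSDC accessions and sizes from the genome table, and it fetches sequences through the NCBI E-utilities `efetch` endpoint. The function names (`efetch_fasta_url`, `fasta_length`, `download_chromosome`) and the file-naming scheme are hypothetical choices made for illustration.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for fetching records by accession.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build the efetch URL that returns one nucleotide record as FASTA text."""
    query = urlencode(
        {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def fasta_length(fasta_text):
    """Count sequence characters in FASTA text, skipping '>' header lines."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    )


def download_chromosome(accession, label, out_dir, expected_size=None):
    """Download one sequence, save it as <label>_<accession>.fasta in out_dir,
    and optionally compare its length with the size listed in the genome table.
    The naming scheme here is an assumption, not the author's convention."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with urlopen(efetch_fasta_url(accession)) as response:
        fasta_text = response.read().decode("utf-8")

    target = out_dir / f"{label}_{accession}.fasta"
    target.write_text(fasta_text)

    if expected_size is not None:
        actual = fasta_length(fasta_text)
        if actual != expected_size:
            raise ValueError(
                f"{accession}: expected {expected_size} bases, got {actual}"
            )
    return target
```

For example, calling download_chromosome("NC_000913.3", "chromosome", "Escherichia_coli", expected_size=4641652) would save the Escherichia coli K-12 MG1655 chromosome into a directory named Escherichia_coli and raise an error if the downloaded length did not match the size given. For many chromosomes you would loop over the accessions from the table, ideally with a short pause between requests to stay within NCBI's usage guidelines.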