Python to access UniProt and ClustalO
We have summer interns at Diadem Biotherapeutics, so I'm teaching them some handy bioinformatics and thought I would share. We're using Google Colab to execute the Python code.
1) Let's import libraries
import requests, io
import pandas as pd
2) Let's define the UniProt API
Uniprot_API = 'https://rest.uniprot.org/uniprotkb/search?'
3) Let's write a "do_request" function to access the UniProt API
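def do_request(Uniprot_API, entry='', **kwargs):
    # Keyword arguments (query, format, fields, size, ...) become URL query parameters.
    req = requests.get(f'{Uniprot_API}{entry}', params=kwargs)
    if not req.ok:
        # Show the server's error message, then raise an HTTPError.
        print(req.text)
        req.raise_for_status()
    return req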
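To see how the keyword arguments end up in the request, it can help to build the URL by hand. The sketch below uses only the standard library's urllib.parse.urlencode to approximate the query string that requests assembles from params. Note that build_url is a hypothetical helper for illustration, not part of requests or the UniProt API, and the query shown is just an example.

```python
from urllib.parse import urlencode

# Same base URL as defined in step 2.
Uniprot_API = 'https://rest.uniprot.org/uniprotkb/search?'

def build_url(base, **kwargs):
    """Hypothetical helper: append URL-encoded keyword arguments to base."""
    return base + urlencode(kwargs)

url = build_url(Uniprot_API,
                query='gene:p53 AND reviewed:true',
                format='tsv',
                size='50')
print(url)
```

Pasting the printed URL into a browser is a quick way to check a query before running it from Python.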
4) Let's use our "do_request" function to query UniProt, using p53 as an example. To customize the query, see the documentation on query fields and formats: https://www.uniprot.org/help/api
req = do_request(Uniprot_API, query='gene:p53 AND reviewed:true',
format='tsv',
fields='accession,id,length,organism_name,organism_id,xref_pdb,xref_hgnc',
size='50')
print(req.text)
This should return the following:
Entry Entry Name Length Organism Organism (ID) PDB HGNC
P04637 P53_HUMAN 393 Homo sapiens (Human) 9606 1A1U;1AIE;1C26;1DT7;1GZH;1H26;1HS5;1JSP;1KZY;1MA3;1OLG;1OLH;1PES;1PET;1SAE;1SAF;1SAK;1SAL;1TSR;1TUP;1UOL;1XQH;1YC5;1YCQ;1YCR;1YCS;2AC0;2ADY;2AHI;2ATA;2B3G;2BIM;2BIN;2BIO;2BIP;2BIQ;2F1X;2FEJ;2FOJ;2FOO;2GS0;2H1L;2H2D;2H2F;2H4F;2H4H;2H4J;2H59;2J0Z;2J10;2J11;2J1W;2J1X;2J1Y;2J1Z;2J20;2J21;2K8F;2L14;2LY4;2MEJ;2MWO;2MWP;2MWY;2MZD;2OCJ;2PCX;2RUK;2VUK;2WGX;2X0U;2X0V;2X0W;2XWR;2YBG;2YDR;2Z5S;2Z5T;3D05;3D06;3D07;3D08;3D09;3D0A;3DAB;3DAC;3IGK;3IGL;3KMD;3KZ8;3LW1;3OQ5;3PDH;3Q01;3Q05;3Q06;3SAK;3TG5;3TS8;3ZME;4AGL;4AGM;4AGN;4AGO;4AGP;4AGQ;4BUZ;4BV2;4HFZ;4HJE;4IBQ;4IBS;4IBT;4IBU;4IBV;4IBW;4IBY;4IBZ;4IJT;4KVP;4LO9;4LOE;4LOF;4MZI;4MZR;4QO1;4RP6;4RP7;4X34;4XR8;4ZZJ;5A7B;5AB9;5ABA;5AOI;5AOJ;5AOK;5AOL;5AOM;5BUA;5ECG;5G4M;5G4N;5G4O;5HOU;5HP0;5HPD;5LAP;5LGY;5MCT;5MCU;5MCV;5MCW;5MF7;5MG7;5MHC;5MOC;5O1A;5O1B;5O1C;5O1D;5O1E;5O1F;5O1G;5O1H;5O1I;5OL0;5UN8;5XZC;6FF9;6FJ5;6GGA;6GGB;6GGC;6GGD;6GGE;6GGF;6LHD;6R5L;6RJZ;6RK8;6RKI;6RKK;6RKM;6RL3;6RL4;6RL6;6RM5;6RM7;6RWH;6RWI;6RWS;6RWU;6RX2;6RZ3;6S39;6S3C;6S40;6S9Q;6SHZ;6SI0;6SI1;6SI2;6SI3;6SI4;6SIN;6SIO;6SIP;6SIQ;6SL6;6SLV;6T58;6V4F;6V4H;6VQO;6VR1;6VR5;6VRM;6VRN;6W51;6XRE;6ZNC;7B46;7B47;7B48;7B49;7B4A;7B4B;7B4C;7B4D;7B4E;7B4F;7B4G;7B4H;7B4N;7BWN;7DHY;7DHZ;7DVD;7EAX;7EDS;7EEU;7EL4;7NMI;7RM4;7V97;7XZX;7XZZ;7YGI;8A31;8A32;8A92;8DC4;8DC6;8DC7;8DC8;8F2H;8F2I; HGNC:11998;
P10361 P53_RAT 391 Rattus norvegicus (Rat) 10116
P02340 P53_MOUSE 390 Mus musculus (Mouse) 10090 1HU8;2GEQ;2IOI;2IOM;2IOO;2P52;3EXJ;3EXL;
Q42578 PER53_ARATH 335 Arabidopsis thaliana (Mouse-ear cress) 3702 1PA2;1QO4;
O09185 P53_CRIGR 393 Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus) 10029
Q8SPZ3 P53_DELLE 387 Delphinapterus leucas (Beluga whale) 9749
Q9TTA1 P53_TUPBE 393 Tupaia belangeri (Common tree shrew) (Tupaia glis belangeri) 37347
P61260 P53_MACFU 393 Macaca fuscata fuscata (Japanese macaque) 9543
P56424 P53_MACMU 393 Macaca mulatta (Rhesus macaque) 9544
P79892 P53_HORSE 280 Equus caballus (Horse) 9796
Q29537 P53_CANLF 381 Canis lupus familiaris (Dog) (Canis familiaris) 9615
P56423 P53_MACFA 393 Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey) 9541
Q9TUB2 P53_PIG 386 Sus scrofa (Pig) 9823
Q9W678 P53_BARBU 369 Barbus barbus (Barbel) (Cyprinus barbus) 40830
P25035 P53_ONCMY 396 Oncorhynchus mykiss (Rainbow trout) (Salmo gairdneri) 8022
O12946 P53_PLAFE 366 Platichthys flesus (European flounder) (Pleuronectes flesus) 8260
O57538 P53_XIPHE 342 Xiphophorus hellerii (Green swordtail) 8084
O93379 P53_ICTPU 376 Ictalurus punctatus (Channel catfish) (Silurus punctatus) 7998
P79820 P53_ORYLA 352 Oryzias latipes (Japanese rice fish) (Japanese killifish) 8090
Q92143 P53_XIPMA 342 Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus) 8083
Q9W679 P53_TETMU 367 Tetraodon miurus (Congo puffer) 94908
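Note that the gene:p53 search also matched Q42578 (PER53_ARATH), whose entry name does not follow the P53_ pattern of the other rows. One way to drop such hits is to filter on the entry name. Here is a minimal sketch using the standard csv module on three rows copied from the output above (trimmed to five columns); the P53_ prefix filter is an assumption that suits this result set, not a general rule.

```python
import csv
import io

# Three rows copied from the output above, trimmed to the first five columns.
sample_tsv = (
    "Entry\tEntry Name\tLength\tOrganism\tOrganism (ID)\n"
    "P04637\tP53_HUMAN\t393\tHomo sapiens (Human)\t9606\n"
    "Q42578\tPER53_ARATH\t335\tArabidopsis thaliana (Mouse-ear cress)\t3702\n"
    "P02340\tP53_MOUSE\t390\tMus musculus (Mouse)\t10090\n"
)

# Read the tab-separated text and keep rows whose entry name starts with P53_.
reader = csv.DictReader(io.StringIO(sample_tsv), delimiter="\t")
p53_rows = [row for row in reader if row["Entry Name"].startswith("P53_")]

for row in p53_rows:
    print(row["Entry"], row["Entry Name"], row["Length"])
```

With the real response, io.StringIO(req.text) could take the place of the sample string.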
4) Let's make it pretty with Pandas. Pandas is a great way to work with structured data. More here: https://pandas.pydata.org/docs/
uniprot_list = pd.read_table(io.StringIO(req.text), sep='\t')
uniprot_list.head()
This should display the first five rows as a formatted table.
5) Let's use our do_request function to get the protein sequences in FASTA format
req = do_request(Uniprot_API, query='gene:p53 AND reviewed:true',
format='fasta')
fasta = req.text
print(fasta)
This should return the following:
>sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606 GN=TP53 PE=1 SV=4
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP
PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPG
GSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD
>sp|P10361|P53_RAT Cellular tumor antigen p53 OS=Rattus norvegicus OX=10116 GN=Tp53 PE=1 SV=1
MEDSQSDMSIELPLSQETFSCLWKLLPPDDILPTTATGSPNSMEDLFLPQDVAELLEGPE
EALQVSAPAAQEPGTEAPAPVAPASATPWPLSSSVPSQKTYQGNYGFHLGFLQSGTAKSV
MCTYSISLNKLFCQLAKTCPVQLWVTSTPPPGTRVRAMAIYKKSQHMTEVVRRCPHHERC
SDGDGLAPPQHLIRVEGNPYAEYLDDRQTFRHSVVVPYEPPEVGSDYTTIHYKYMCNSSC
MGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKEEHCPELPPG
SAKRALPTSTSSSPQQKKKPLDGEYFTLKIRGRERFEMFRELNEALELKDARAAEESGDS
RAHSSYPKTKKGQSTSRHKKPMIKKVGPDSD
>sp|P02340|P53_MOUSE Cellular tumor antigen p53 OS=Mus musculus OX=10090 GN=Tp53 PE=1 SV=4
MTAMEESQSDISLELPLSQETFSGLWKLLPPEDILPSPHCMDDLLLPQDVEEFFEGPSEA
LRVSGAPAAQDPVTETPGPVAPAPATPWPLSSFVPSQKTYQGNYGFHLGFLQSGTAKSVM
CTYSPPLNKLFCQLAKTCPVQLWVSATPPAGSRVRAMAIYKKSQHMTEVVRRCPHHERCS
DGDGLAPPQHLIRVEGNLYPEYLEDRQTFRHSVVVPYEPPEAGSEYTTIHYKYMCNSSCM
GGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKEVLCPELPPGS
AKRALPTCTSASPPQKKKPLDGEYFTLKIRGRKRFEMFRELNEALELKDAHATEESGDSR
AHSSYLKTKKGQSTSRHKKTMVKKVGPDSD
>sp|Q42578|PER53_ARATH Peroxidase 53 OS=Arabidopsis thaliana OX=3702 GN=PER53 PE=1 SV=1
MAVTNLPTCDGLFIISLIVIVSSIFGTSSAQLNATFYSGTCPNASAIVRSTIQQALQSDT
RIGASLIRLHFHDCFVNGCDASILLDDTGSIQSEKNAGPNVNSARGFNVVDNIKTALENA
CPGVVSCSDVLALASEASVSLAGGPSWTVLLGRRDSLTANLAGANSSIPSPIESLSNITF
KFSAVGLNTNDLVALSGAHTFGRARCGVFNNRLFNFSGTGNPDPTLNSTLLSTLQQLCPQ
NGSASTITNLDLSTPDAFDNNYFANLQSNDGLLQSDQELFSTTGSSTIAIVTSFASNQTL
FFQAFAQSMINMGNISPLTGSNGEIRLDCKKVNGS
>sp|O09185|P53_CRIGR Cellular tumor antigen p53 OS=Cricetulus griseus OX=10029 GN=TP53 PE=2 SV=1
MEEPQSDLSIELPLSQETFSDLWKLLPPNNVLSTLPSSDSIEELFLSENVTGWLEDSGGA
LQGVAAAAASTAEDPVTETPAPVASAPATPWPLSSSVPSYKTYQGDYGFRLGFLHSGTAK
SVTCTYSPSLNKLFCQLAKTCPVQLWVNSTPPPGTRVRAMAIYKKLQYMTEVVRRCPHHE
RSSEGDSLAPPQHLIRVEGNLHAEYLDDKQTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
SCMGGMNRRPILTIITLEDPSGNLLGRNSFEVRICACPGRDRRTEEKNFQKKGEPCPELP
PKSAKRALPTNTSSSPPPKKKTLDGEYFTLKIRGHERFKMFQELNEALELKDAQASKGSE
DNGAHSSYLKSKKGQSASRLKKLMIKREGPDSD
>sp|Q8SPZ3|P53_DELLE Cellular tumor antigen p53 OS=Delphinapterus leucas OX=9749 GN=TP53 PE=2 SV=1
MEESQAELGVEPPLSQETFSDLWKLLPENNLLSSELSPAVDDLLLSPEDVANWLDERPDE
APQMPEPPAPAAPTPAAPAPATSWPLSSFVPSQKTYPGSYGFHLGFLHSGTAKSVTCTYS
PALNKLFCQLAKTCPVQLWVSSPPPPGTRVRAMAIYKKSEYMTEVVRRCPHHERCSDYSD
GLAPPQHLIRVEGNLRAEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNFMCNSSCMGGM
NRRPILTIITLEDSNGNLLGRNSFEVRVCACPGRDRRTEEENFHKKGQSCPELPTGSAKR
ALPTGTSSSPPQKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPGESRAHS
SHLKSKKGQSPSRHKKLMFKREGPDSD
>sp|Q9TTA1|P53_TUPBE Cellular tumor antigen p53 OS=Tupaia belangeri OX=37347 GN=TP53 PE=2 SV=1
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
SVTCTYSPDLNKLFCQLAKTCPVQLWVDSAPPPGTRVRAMAIYKQSQYVTEVVRRCPHHE
RCSDSDGLAPPQHLIRVEGNLHAEYSDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
SCMGGMNRRPILTIITLEDSSGKLLGRNSFEVRICACPGRDRRTEEENFRKKGESCPKLP
TGSIKRALPTGSSSSPQPKKKPLDEEYFTLQIRGRERFEMLREINEALELKDAMAGKESA
GSRAHSSHLKSKKGQSTSRHRKLMFKTEGPDSD
>sp|P61260|P53_MACFU Cellular tumor antigen p53 OS=Macaca fuscata fuscata OX=9543 GN=TP53 PE=2 SV=1
MEEPQSDPSIEPPLSQETFSDLWKLLPENNVLSPLPSQAVDDLMLSPDDLAQWLTEDPGP
DEAPRMSEAAPPMAPTPAAPTPAAPAPAPSWPLSSSVPSQKTYHGSYGFRLGFLHSGTAK
SVTCTYSPDLNKMFCQLAKTCPVQLWVDSTPPPGSRVRAMAIYKQSQHMTEVVRRCPHHE
RCSDSDGLAPPQHLIRVEGNLRVEYSDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKGEPCHQLP
PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPA
GSRAHSSHLKSKKGQSTSRHKKFMFKTEGPDSD
>sp|P56424|P53_MACMU Cellular tumor antigen p53 OS=Macaca mulatta OX=9544 GN=TP53 PE=2 SV=1
MEEPQSDPSIEPPLSQETFSDLWKLLPENNVLSPLPSQAVDDLMLSPDDLAQWLTEDPGP
DEAPRMSEAAPPMAPTPAAPTPAAPAPAPSWPLSSSVPSQKTYHGSYGFRLGFLHSGTAK
SVTCTYSPDLNKMFCQLAKTCPVQLWVDSTPPPGSRVRAMAIYKQSQHMTEVVRRCPHHE
RCSDSDGLAPPQHLIRVEGNLRVEYSDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKGEPCHQLP
PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPA
GSRAHSSHLKSKKGQSTSRHKKFMFKTEGPDSD
>sp|P79892|P53_HORSE Cellular tumor antigen p53 (Fragment) OS=Equus caballus OX=9796 GN=TP53 PE=2 SV=2
PAVNNLLLSPDVVNWLDEGPDEAPRMPAAPAPLAPAPATSWPLSSFVPSQKTYPGCYGFR
LGFLNSGTAKSVTCTYSPTLNKLFCQLAKTCPVQLLVSSPPPPGTRVRAMAIYKKSEFMT
EVVRRCPHHERCSDSSDGLAPPQHLIRVEGNLRAEYLDDRNTFRHSVVVPYEPPEVGSDC
TTIHYNFMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENF
RKKEEPCPEPPPRSTKRVLSSNTSSSPPQKKKPLDGEYFT
>sp|Q29537|P53_CANLF Cellular tumor antigen p53 OS=Canis lupus familiaris OX=9615 GN=TP53 PE=2 SV=2
MEESQSELNIDPPLSQETFSELWNLLPENNVLSSELCPAVDELLLPESVVNWLDEDSDDA
PRMPATSAPTAPGPAPSWPLSSSVPSPKTYPGTYGFRLGFLHSGTAKSVTWTYSPLLNKL
FCQLAKTCPVQLWVSSPPPPNTCVRAMAIYKKSEFVTEVVRRCPHHERCSDSSDGLAPPQ
HLIRVEGNLRAKYLDDRNTFRHSVVVPYEPPEVGSDYTTIHYNYMCNSSCMGGMNRRPIL
TIITLEDSSGNVLGRNSFEVRVCACPGRDRRTEEENFHKKGEPCPEPPPGSTKRALPPST
SSSPPQKKKPLDGEYFTLQIRGRERYEMFRNLNEALELKDAQSGKEPGGSRAHSSHLKAK
KGQSTSRHKKLMFKREGLDSD
>sp|P56423|P53_MACFA Cellular tumor antigen p53 OS=Macaca fascicularis OX=9541 GN=TP53 PE=2 SV=2
MEEPQSDPSIEPPLSQETFSDLWKLLPENNVLSPLPSQAVDDLMLSPDDLAQWLTEDPGP
DEAPRMSEAAPPMAPTPAAPTPAAPAPAPSWPLSSSVPSQKTYHGSYGFRLGFLHSGTAK
SVTCTYSPDLNKMFCQLAKTCPVQLWVDSTPPPGSRVRAMAIYKQSQHMTEVVRRCPHHE
RCSDSDGLAPPQHLIRVEGNLRVEYSDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKGEPCHQLP
PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPA
GSRAHSSHLKSKKGQSTSRHKKFMFKTEGPDSD
>sp|Q9TUB2|P53_PIG Cellular tumor antigen p53 OS=Sus scrofa OX=9823 GN=TP53 PE=2 SV=1
MEESQSELGVEPPLSQETFSDLWKLLPENNLLSSELSLAAVNDLLLSPVTNWLDENPDDA
SRVPAPPAATAPAPAAPAPATSWPLSSFVPSQKTYPGSYDFRLGFLHSGTAKSVTCTYSP
ALNKLFCQLAKTCPVQLWVSSPPPPGTRVRAMAIYKKSEYMTEVVRRCPHHERSSDYSDG
LAPPQHLIRVEGNLRAEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNFMCNSSCMGGMN
RRPILTIITLEDASGNLLGRNSFEVRVCACPGRDRRTEEENFLKKGQSCPEPPPGSTKRA
LPTSTSSSPVQKKKPLDGEYFTLQIRGRERFEMFRELNDALELKDAQTARESGENRAHSS
HLKSKKGQSPSRHKKPMFKREGPDSD
>sp|Q9W678|P53_BARBU Cellular tumor antigen p53 OS=Barbus barbus OX=40830 GN=tp53 PE=2 SV=1
MAESQEFAELWERNLISTQEAGTCWELINDEYLPSSFDPNIFDNVLTEQPQPSTSPPTAS
VPVATDYPGEHGFKLGFPQSGTAKSVTCTYSSDLNKLFCQLAKTCPVQMVVNVAPPQGSV
IRATAIYKKSEHVAEVVRRCPHHERTPDGDGLAPAAHLIRVEGNSRALYREDDVNSRHSV
VVPYEVPQLGSEFTTVLYNFMCNSSCMGGMNRRPILTIISLETHDGQLLGRRSFEVRVCA
CPGRDRKTEESNFRKDQETKTLDKIPSANKRSLTKDSTSSVPRPEGSKKAKLSGSSDEEI
YTLQVRGKERYEMLKKINDSLELSDVVPPSEMDRYRQKLLTKGKKKDGQTPEPKRGKKLM
VKDEKSDSD
>sp|P25035|P53_ONCMY Cellular tumor antigen p53 OS=Oncorhynchus mykiss OX=8022 GN=tp53 PE=2 SV=1
MADLAENVSLPLSQESFEDLWKMNLNLVAVQPPETESWVGYDNFMMEAPLQVEFDPSLFE
VSATEPAPQPSISTLDTGSPPTSTVPTTSDYPGALGFQLRFLQSSTAKSVTCTYSPDLNK
LFCQLAKTCPVQIVVDHPPPPGAVVRALAIYKKLSDVADVVRRCPHHQSTSENNEGPAPR
GHLVRVEGNQRSEYMEDGNTLRHSVLVPYEPPQVGSECTTVLYNFMCNSSCMGGMNRRPI
LTIITLETQEGQLLGRRSFEVRVCACPGRDRKTEEINLKKQQETTLETKTKPAQGIKRAM
KEASLPAPQPGASKKTKSSPAVSDDEIYTLQIRGKEKYEMLKKFNDSLELSELVPVADAD
KYRQKCLTKRVAKRDFGVGPKKRKKLLVKEEKSDSD
>sp|O12946|P53_PLAFE Cellular tumor antigen p53 OS=Platichthys flesus OX=8260 GN=tp53 PE=2 SV=1
MMDEQGLDGMQILPGSQDSFSELWASVQTPSIATIAEEFDDHLGNLLQNGFDMNLFELPP
EMVAKDSVTPPSSTVPVVTDYPGEYGFQLRFQKSGTAKSVTSTFSELLKKLYCQLAKTSP
VEVLLSKEPPQGAVLRATAVYKKTEHVADVVRRCPHHQTEDTAEHRSHLIRLEGSQRALY
FEDPHTKRQSVTVPYEPPQLGSETTAILLSFMCNSSCMGGMNRRQILTILTLETPDGLVL
GRRCFEVRVCACPGRDRKTDEESSTKTPNGPKQTKKRKQAPSNSAPHTTTVMKSKSSSSA
EEEDKEVFTVLVKGRERYEIIKKINEAFEGAAEKEKAKNKVAVKQELPVPSSGKRLVQRG
ERSDSD
>sp|O57538|P53_XIPHE Cellular tumor antigen p53 OS=Xiphophorus hellerii OX=8084 GN=tp53 PE=2 SV=1
MEEADLTLPLSQDTFHDLWNNVFLSTENESLAPPEGLLSQNMDFWEDPETMQETKNVPTA
PTVPAISNYAGEHGFNLEFNDSGTAKSVTSTYSVKLGKLFCQLAKTTPIGVLVKEEPPQG
AVIRATSVYKKTEHVGEVVKRCPHHQSEDLSDNKSHLIRVEGSQLAQYFEDPNTRRHSVT
VPYERPQLGSEMTTILLSFMCNSSCMGGMNRRPILTILTLETTEGEVLGRRCFEVRVCAC
PGRDRKTEEGNLEKSGTKQTKKRKSAPAPDTSTAKKSKSASSGEDEDKEIYTLSIRGRNR
YLWFKSLNDGLELMDKTGPKIKQEIPAPSSGKRLLKGGSDSD
>sp|O93379|P53_ICTPU Cellular tumor antigen p53 OS=Ictalurus punctatus OX=7998 GN=tp53 PE=2 SV=1
MEGNGERDTMMVEPPDSQEFAELWLRNLIVRDNSLWGKEEEIPDDLQEVPCDVLLSDMLQ
PQSSSSPPTSTVPVTSDYPGLLNFTLHFQESSGTKSVTCTYSPDLNKLFCQLAKTCPVLM
AVSSSPPPGSVLRATAVYKRSEHVAEVVRRCPHHERSNDSSDGPAPPGHLLRVEGNSRAV
YQEDGNTQAHSVVVPYEPPQVGSQSTTVLYNYMCNSSCMGGMNRRPILTIITLETQDGHL
LGRRTFEVRVCACPGRDRKTEESNFKKQQEPKTSGKTLTKRSMKDPPSHPEASKKSKNSS
SDDEIYTLQVRGKERYEFLKKINDGLELSDVVPPADQEKYRQKLLSKTCRKERDGAAGEP
KRGKKRLVKEEKCDSD
>sp|P79820|P53_ORYLA Cellular tumor antigen p53 OS=Oryzias latipes OX=8090 GN=tp53 PE=2 SV=2
MDPVPDLPESQGSFQELWETVSYPPLETLSLPTVNEPTGSWVATGDMFLLDQDLSGTFDD
KIFDIPIEPVPTNEVNPPPTTVPVTTDYPGSYELELRFQKSGTAKSVTSTYSETLNKLYC
QLAKTSPIEVRVSKEPPKGAILRATAVYKKTEHVADVVRRCPHHQNEDSVEHRSHLIRVE
GSQLAQYFEDPYTKRQSVTVPYEPPQPGSEMTTILLSYMCNSSCMGGMNRRPILTILTLE
TEGLVLGRRCFEVRICACPGRDRKTEEESRQKTQPKKRKVTPNTSSSKRKKSHSSGEEED
NREVFHFEVYGRERYEFLKKINDGLELLEKESKSKNKDSGMVPSSGKKLKSN
>sp|Q92143|P53_XIPMA Cellular tumor antigen p53 OS=Xiphophorus maculatus OX=8083 GN=tp53 PE=2 SV=2
MEEADLTLPLSQDTFHDLWNNVFLSTENESLPPPEGLLSQNMDFWEDPETMQETKNVPTA
PTVPAISNYAGEHGFNLEFNDSGTAKSVTSTYSVKLGKLFCQLAKTTPIGVLVKEEPPQG
AVIRATAVYKKTEHVGEVVKRCPHHQSEDLSDNKSHLIRVEGSQLAQYFEDPNTRRHSVT
VPYERPQLGSEMTTILLSFMCNSSCMGGMNRRPILTILTLETTEGEVLGRRCFEVRVCAC
PGRDRKTEEGNLEKSGTKQTKKRKSAPAPDTSTAKKSKSASSGEDEDKEIYTLSIRGRNR
YLWFKSLNDGLELMDKTGPKIKQEIPAPSSGKRLLKGGSDSD
>sp|Q9W679|P53_TETMU Cellular tumor antigen p53 OS=Tetraodon miurus OX=94908 GN=tp53 PE=2 SV=1
MEEENISLPLSQDTFQDLWDNVSAPPISTIQTAALENEAWPAERQMNMMCNFMDSTFNEA
LFNLLPEPPSRDGANSSSPTVPVTTDYPGEYGFKLRFQKSGTAKSVTSTYSEILNKLYCQ
LAKTSLVEVLLGKDPPMGAVLRATAIYKKTEHVAEVVRRCPHHQNEDSAEHRSHLIRMEG
SERAQYFEHPHTKRQSVTVPYEPPQLGSEFTTILLSFMCNSSCMGGMNRRPILTILTLET
QEGIVLGRRCFEVRVCACPGRDRKTEETNSTKMQNDAKDAKKRKSVPTPDSTTIKKSKTA
SSAEEDNNEVYTLQIRGRKRYEMLKKINDGLDLLENKPKSKATHRPDGPIPPSGKRLLHR
GEKSDSD
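If you want to work with individual sequences (for example, to drop an entry before aligning), the FASTA text can be split into records. This is a minimal sketch using plain string handling; parse_fasta is a hypothetical helper, and it assumes UniProt-style headers of the form >sp|ACCESSION|NAME. The sample is a truncated excerpt of the output above.

```python
def parse_fasta(text):
    """Hypothetical helper: map each UniProt accession to its sequence."""
    records = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # UniProt headers look like ">sp|P10361|P53_RAT ..."
            accession = line.split("|")[1]
            records[accession] = ""
        elif accession is not None:
            # Sequence lines are joined into one string per record.
            records[accession] += line
    return records

# Truncated excerpt of the output above (not the full sequences).
sample_fasta = """>sp|P10361|P53_RAT Cellular tumor antigen p53 OS=Rattus norvegicus OX=10116 GN=Tp53 PE=1 SV=1
MEDSQSDMSIELPLSQETFSCLWKLLPPDDILPTTATGSPNSMEDLFLPQDVAELLEGPE
EALQVSAPAAQEPGTEAPAPVAPASATPWPLSSSVPSQKTYQGNYGFHLGFLQSGTAKSV
>sp|P02340|P53_MOUSE Cellular tumor antigen p53 OS=Mus musculus OX=10090 GN=Tp53 PE=1 SV=4
MTAMEESQSDISLELPLSQETFSGLWKLLPPEDILPSPHCMDDLLLPQDVEEFFEGPSEA
"""

records = parse_fasta(sample_fasta)
for accession, sequence in records.items():
    print(accession, len(sequence))
```

In the notebook, parse_fasta(fasta) would give one entry per sequence returned by the query.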
6) Let's define a helper function to access another URL (for ClustalO)
def get_url(url, **kwargs):
    response = requests.get(url, **kwargs)
    if not response.ok:
        # Show the server's error message, then raise an HTTPError.
        print(response.text)
        response.raise_for_status()
    return response
7) Let's submit our FASTA sequences to ClustalO for sequence alignment with requests.post. The response text is the job ID, which we then pass to our get_url function to check the job status.
req = requests.post("https://www.ebi.ac.uk/Tools/services/rest/clustalo/run", data={
"email": "[email protected]",
"iterations": 0,
"outfmt": "clustal_num",
"order": "input",
"sequence": fasta
})
job_id = req.text
print(job_id)
req = get_url(f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/status/{job_id}")
print(req.text)
This should return something that looks like:
clustalo-R20230708-222339-0868-90023705-p1m
QUEUED
8) You can re-run the following to check whether the job is finished
req = get_url(f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/status/{job_id}")
print(req.text)
When ready, you should get:
FINISHED
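Instead of re-running the cell by hand, the status check can be wrapped in a small polling loop. The sketch below is one way to do it; wait_for_job is a hypothetical helper, and the interval and retry limit are arbitrary choices. It takes any function that returns the status text, so it can be demonstrated offline with a stand-in.

```python
import time

def wait_for_job(get_status, interval=5, max_checks=60):
    """Hypothetical helper: call get_status() until the job is no longer pending.

    get_status is any zero-argument callable returning the status text.
    """
    # QUEUED is shown above; treating RUNNING as a second "still working"
    # state is an assumption about the service.
    pending = {"QUEUED", "RUNNING"}
    for _ in range(max_checks):
        status = get_status().strip()
        if status not in pending:
            return status
        time.sleep(interval)
    raise TimeoutError(f"job still pending after {max_checks} checks")

# Offline demonstration with a stand-in for the real status request.
fake_statuses = iter(["QUEUED", "RUNNING", "FINISHED"])
final_status = wait_for_job(lambda: next(fake_statuses), interval=0)
print(final_status)
```

In the notebook, the stand-in would be replaced by a lambda that calls get_url on the status URL for job_id and returns its .text. Any status other than FINISHED that comes back should be treated as a problem with the job.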
9) Now let's get the results
req = get_url(f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/result/{job_id}/aln-clustal_num")
print(req.text)
You should get a nice multiple sequence alignment:
CLUSTAL O(1.2.4) multiple sequence alignment
sp|P04637|P53_HUMAN ---MEEPQSDPSVEPPLSQETFSDLWKLLPENNV------LSPL-P--SQAMDDLMLSPD 48
sp|P10361|P53_RAT ---MEDSQSDMSIELPLSQETFSCLWKLLPPDDI------LPTTATGSPNSM-EDLFLPQ 50
sp|P02340|P53_MOUSE MTAMEESQSDISLELPLSQETFSGLWKLLPPEDI------LPS-----PHCM-DDLLLPQ 48
sp|Q42578|PER53_ARATH ------------------------------------------------------------ 0
sp|O09185|P53_CRIGR ---MEEPQSDLSIELPLSQETFSDLWKLLPPNNV------LSTLP--SSDSI-EELFLSE 48
sp|Q8SPZ3|P53_DELLE ---MEESQAELGVEPPLSQETFSDLWKLLPENNL------LSSELS--PA-VDDLLLSPE 48
sp|Q9TTA1|P53_TUPBE ---MEEPQSDPSVEPPLSQETFSDLWKLLPENNV------LSPL-P--SQAMDDLMLSPD 48
sp|P61260|P53_MACFU ---MEEPQSDPSIEPPLSQETFSDLWKLLPENNV------LSPL-P--SQAVDDLMLSPD 48
sp|P56424|P53_MACMU ---MEEPQSDPSIEPPLSQETFSDLWKLLPENNV------LSPL-P--SQAVDDLMLSPD 48
sp|P79892|P53_HORSE -------------------------------------------------PAVNNLLLSP- 10
sp|Q29537|P53_CANLF ---MEESQSELNIDPPLSQETFSELWNLLPENNV------LSSELC--PAV--DELLLPE 47
sp|P56423|P53_MACFA ---MEEPQSDPSIEPPLSQETFSDLWKLLPENNV------LSPL-P--SQAVDDLMLSPD 48
sp|Q9TUB2|P53_PIG ---MEESQSELGVEPPLSQETFSDLWKLLPENNL------LSSELS--LAAVNDLLLSP- 48
sp|Q9W678|P53_BARBU ---------------MAESQEFAELWERNLISTQ---------EAGTCWELI-ND----E 31
sp|P25035|P53_ONCMY --MA---DLAENVSLPLSQESFEDLWKMNLNLVA------VQPPETESWVGY-DNFMMEA 48
sp|O12946|P53_PLAFE --MMDEQGLDGMQILPGSQDSFSELWASVQTPSIATIAEEF-------------DDHLGN 45
sp|O57538|P53_XIPHE ---ME----EADLTLPLSQDTFHDLWNNVFLSTENESLA----PPEG---------LLSQ 40
sp|O93379|P53_ICTPU --MEGNGERDTMMVEPPDSQEFAELWLRNLIVRD---------N--SLWGKE-------E 40
sp|P79820|P53_ORYLA --------MDPVPDLPESQGSFQELWETVSYPPLETLSLPTVNEPTGSWVATGDMFLLDQ 52
sp|Q92143|P53_XIPMA ---ME----EADLTLPLSQDTFHDLWNNVFLSTENESLP----PPEG---------LLSQ 40
sp|Q9W679|P53_TETMU ---ME----EENISLPLSQDTFQDLWDNVSAPPISTIQTAAL--ENEAWPAERQMNMMCN 51
sp|P04637|P53_HUMAN DIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYG 108
sp|P10361|P53_RAT DVAELLEGPEEALQV---S-APAAQEPGTEAPAPVAPASATPWPLSSSVPSQKTYQGNYG 106
sp|P02340|P53_MOUSE DVEEFFEGPSEALRV---SGAPAAQDPVTETPGPVAPAPATPWPLSSFVPSQKTYQGNYG 105
sp|Q42578|PER53_ARATH ----------------------------------------------MAVTNLPTCDGLFI 14
sp|O09185|P53_CRIGR NVTGWLEDSGGALQGVAAAAASTAEDPVTETPAPVASAPATPWPLSSSVPSYKTYQGDYG 108
sp|Q8SPZ3|P53_DELLE DVANWLDER--PDEAPQMPEP-----PAPAAPTPAAPAPATSWPLSSFVPSQKTYPGSYG 101
sp|Q9TTA1|P53_TUPBE DIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYG 108
sp|P61260|P53_MACFU DLAQWLTEDPGPDEAPRMSEAAPPMAPTPAAPTPAAPAPAPSWPLSSSVPSQKTYHGSYG 108
sp|P56424|P53_MACMU DLAQWLTEDPGPDEAPRMSEAAPPMAPTPAAPTPAAPAPAPSWPLSSSVPSQKTYHGSYG 108
sp|P79892|P53_HORSE DVVNWLDEG--PDEAPRMPAA-----P-----APLAPAPATSWPLSSFVPSQKTYPGCYG 58
sp|Q29537|P53_CANLF SVVNWLDED--SDDAPRMPAT-----SA-----PTAPGPAPSWPLSSSVPSPKTYPGTYG 95
sp|P56423|P53_MACFA DLAQWLTEDPGPDEAPRMSEAAPPMAPTPAAPTPAAPAPAPSWPLSSSVPSQKTYHGSYG 108
sp|Q9TUB2|P53_PIG -VTNWLDEN--PDDASRVPAP-----PAATAPAPAAPAPATSWPLSSFVPSQKTYPGSYD 100
sp|Q9W678|P53_BARBU YLPSSFDPN---------IFDNVL----------TEQPQPSTSPPTASVPVATDYPGEHG 72
sp|P25035|P53_ONCMY PLQVEFDPS---------LFEVSA---TEPAPQPSISTLDTGSPPTSTVPTTSDYPGALG 96
sp|O12946|P53_PLAFE LLQNGFDMN---------LFELPP----------EMVAKDSVTPPSSTVPVVTDYPGEYG 86
sp|O57538|P53_XIPHE ------NMD---------FWE-DP----------ETMQETKNVPTAPTVPAISNYAGEHG 74
sp|O93379|P53_ICTPU EIPDDLQEV---------PCDVLL---SD-----MLQPQSSSSPPTSTVPVTSDYPGLLN 83
sp|P79820|P53_ORYLA DLSGTFDDK---------IFDIPI----------EPVPTNEVNPPPTTVPVTTDYPGSYE 93
sp|Q92143|P53_XIPMA ------NMD---------FWE-DP----------ETMQETKNVPTAPTVPAISNYAGEHG 74
sp|Q9W679|P53_TETMU FMDSTFNEA---------LFNLLP----------EPPSRDGANSSSPTVPVTTDYPGEYG 92
* *
sp|P04637|P53_HUMAN FRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPV------------------------- 143
sp|P10361|P53_RAT FHLGFLQSGTAKSVMCTYSISLNKLFCQLAKTCPV------------------------- 141
sp|P02340|P53_MOUSE FHLGFLQSGTAKSVMCTYSPPLNKLFCQLAKTCPV------------------------- 140
sp|Q42578|PER53_ARATH ISLIVIV----SSIFGTSSAQLNATFY--SGTCPNASAIVRSTIQQALQSDTRIGASLIR 68
sp|O09185|P53_CRIGR FRLGFLHSGTAKSVTCTYSPSLNKLFCQLAKTCPV------------------------- 143
sp|Q8SPZ3|P53_DELLE FHLGFLHSGTAKSVTCTYSPALNKLFCQLAKTCPV------------------------- 136
sp|Q9TTA1|P53_TUPBE FRLGFLHSGTAKSVTCTYSPDLNKLFCQLAKTCPV------------------------- 143
sp|P61260|P53_MACFU FRLGFLHSGTAKSVTCTYSPDLNKMFCQLAKTCPV------------------------- 143
sp|P56424|P53_MACMU FRLGFLHSGTAKSVTCTYSPDLNKMFCQLAKTCPV------------------------- 143
sp|P79892|P53_HORSE FRLGFLNSGTAKSVTCTYSPTLNKLFCQLAKTCPV------------------------- 93
sp|Q29537|P53_CANLF FRLGFLHSGTAKSVTWTYSPLLNKLFCQLAKTCPV------------------------- 130
sp|P56423|P53_MACFA FRLGFLHSGTAKSVTCTYSPDLNKMFCQLAKTCPV------------------------- 143
sp|Q9TUB2|P53_PIG FRLGFLHSGTAKSVTCTYSPALNKLFCQLAKTCPV------------------------- 135
sp|Q9W678|P53_BARBU FKLGFPQSGTAKSVTCTYSSDLNKLFCQLAKTCPV------------------------- 107
sp|P25035|P53_ONCMY FQLRFLQSSTAKSVTCTYSPDLNKLFCQLAKTCPV------------------------- 131
sp|O12946|P53_PLAFE FQLRFQKSGTAKSVTSTFSELLKKLYCQLAKTSPV------------------------- 121
sp|O57538|P53_XIPHE FNLEFNDSGTAKSVTSTYSVKLGKLFCQLAKTTPI------------------------- 109
sp|O93379|P53_ICTPU FTLHFQESSGTKSVTCTYSPDLNKLFCQLAKTCPV------------------------- 118
sp|P79820|P53_ORYLA LELRFQKSGTAKSVTSTYSETLNKLYCQLAKTSPI------------------------- 128
sp|Q92143|P53_XIPMA FNLEFNDSGTAKSVTSTYSVKLGKLFCQLAKTTPI------------------------- 109
sp|Q9W679|P53_TETMU FKLRFQKSGTAKSVTSTYSEILNKLYCQLAKTSLV------------------------- 127
: * . .*: * * * : : *
sp|P04637|P53_HUMAN -------------QLWVDST------PPPGTRVRAMAIYKQ-SQHMTEVVRRCPHHERCS 183
sp|P10361|P53_RAT -------------QLWVTST------PPPGTRVRAMAIYKK-SQHMTEVVRRCPHHERCS 181
sp|P02340|P53_MOUSE -------------QLWVSAT------PPAGSRVRAMAIYKK-SQHMTEVVRRCPHHERCS 180
sp|Q42578|PER53_ARATH LHFHDCFVNGCDASILLDDTGSIQSEKNAGPNVNSARGFNVVDNIKTALENACPGVVSCS 128
sp|O09185|P53_CRIGR -------------QLWVNST------PPPGTRVRAMAIYKK-LQYMTEVVRRCPHHERSS 183
sp|Q8SPZ3|P53_DELLE -------------QLWVSSP------PPPGTRVRAMAIYKK-SEYMTEVVRRCPHHERCS 176
sp|Q9TTA1|P53_TUPBE -------------QLWVDSA------PPPGTRVRAMAIYKQ-SQYVTEVVRRCPHHERCS 183
sp|P61260|P53_MACFU -------------QLWVDST------PPPGSRVRAMAIYKQ-SQHMTEVVRRCPHHERCS 183
sp|P56424|P53_MACMU -------------QLWVDST------PPPGSRVRAMAIYKQ-SQHMTEVVRRCPHHERCS 183
sp|P79892|P53_HORSE -------------QLLVSSP------PPPGTRVRAMAIYKK-SEFMTEVVRRCPHHERCS 133
sp|Q29537|P53_CANLF -------------QLWVSSP------PPPNTCVRAMAIYKK-SEFVTEVVRRCPHHERCS 170
sp|P56423|P53_MACFA -------------QLWVDST------PPPGSRVRAMAIYKQ-SQHMTEVVRRCPHHERCS 183
sp|Q9TUB2|P53_PIG -------------QLWVSSP------PPPGTRVRAMAIYKK-SEYMTEVVRRCPHHERSS 175
sp|Q9W678|P53_BARBU -------------QMVVNVA------PPQGSVIRATAIYKK-SEHVAEVVRRCPHHERTP 147
sp|P25035|P53_ONCMY -------------QIVVDHP------PPPGAVVRALAIYKK-LSDVADVVRRCPHHQSTS 171
sp|O12946|P53_PLAFE -------------EVLLSKE------PPQGAVLRATAVYKK-TEHVADVVRRCPHHQT-- 159
sp|O57538|P53_XIPHE -------------GVLVKEE------PPQGAVIRATSVYKK-TEHVGEVVKRCPHHQS-- 147
sp|O93379|P53_ICTPU -------------LMAVSSS------PPPGSVLRATAVYKR-SEHVAEVVRRCPHHERSN 158
sp|P79820|P53_ORYLA -------------EVRVSKE------PPKGAILRATAVYKK-TEHVADVVRRCPHHQN-- 166
sp|Q92143|P53_XIPMA -------------GVLVKEE------PPQGAVIRATAVYKK-TEHVGEVVKRCPHHQS-- 147
sp|Q9W679|P53_TETMU -------------EVLLGKD------PPMGAVLRATAIYKK-TEHVAEVVRRCPHHQN-- 165
: : . :.: :: . : . **
sp|P04637|P53_HUMAN D-SDGLAPPQHLIRVEGNLRVEYLDDRNTFR-------HSVVVPYEPPEVGSDCTTIHYN 235
sp|P10361|P53_RAT D-GDGLAPPQHLIRVEGNPYAEYLDDRQTFR-------HSVVVPYEPPEVGSDYTTIHYK 233
sp|P02340|P53_MOUSE D-GDGLAPPQHLIRVEGNLYPEYLEDRQTFR-------HSVVVPYEPPEAGSEYTTIHYK 232
sp|Q42578|PER53_ARATH DVLA-LASEASVSLAGGPSWTVLLGRRDSLTANLAGANSSIPSPIE------SLSNITFK 181
sp|O09185|P53_CRIGR E-GDSLAPPQHLIRVEGNLHAEYLDDKQTFR-------HSVVVPYEPPEVGSDCTTIHYN 235
sp|Q8SPZ3|P53_DELLE DYSDGLAPPQHLIRVEGNLRAEYLDDRNTFR-------HSVVVPYEPPEVGSDCTTIHYN 229
sp|Q9TTA1|P53_TUPBE D-SDGLAPPQHLIRVEGNLHAEYSDDRNTFR-------HSVVVPYEPPEVGSDCTTIHYN 235
sp|P61260|P53_MACFU D-SDGLAPPQHLIRVEGNLRVEYSDDRNTFR-------HSVVVPYEPPEVGSDCTTIHYN 235
sp|P56424|P53_MACMU D-SDGLAPPQHLIRVEGNLRVEYSDDRNTFR-------HSVVVPYEPPEVGSDCTTIHYN 235
sp|P79892|P53_HORSE DSSDGLAPPQHLIRVEGNLRAEYLDDRNTFR-------HSVVVPYEPPEVGSDCTTIHYN 186
sp|Q29537|P53_CANLF DSSDGLAPPQHLIRVEGNLRAKYLDDRNTFR-------HSVVVPYEPPEVGSDYTTIHYN 223
sp|P56423|P53_MACFA D-SDGLAPPQHLIRVEGNLRVEYSDDRNTFR-------HSVVVPYEPPEVGSDCTTIHYN 235
sp|Q9TUB2|P53_PIG DYSDGLAPPQHLIRVEGNLRAEYLDDRNTFR-------HSVVVPYEPPEVGSDCTTIHYN 228
sp|Q9W678|P53_BARBU D-GDGLAPAAHLIRVEGNSRALYREDDVNSR-------HSVVVPYEVPQLGSEFTTVLYN 199
sp|P25035|P53_ONCMY ENNEGPAPRGHLVRVEGNQRSEYMEDGNTLR-------HSVLVPYEPPQVGSECTTVLYN 224
sp|O12946|P53_PLAFE --EDTAEHRSHLIRLEGSQRALYFEDPHTKR-------QSVTVPYEPPQLGSETTAILLS 210
sp|O57538|P53_XIPHE --EDLSDNKSHLIRVEGSQLAQYFEDPNTRR-------HSVTVPYERPQLGSEMTTILLS 198
sp|O93379|P53_ICTPU DSSDGPAPPGHLLRVEGNSRAVYQEDGNTQA-------HSVVVPYEPPQVGSQSTTVLYN 211
sp|P79820|P53_ORYLA --EDSVEHRSHLIRVEGSQLAQYFEDPYTKR-------QSVTVPYEPPQPGSEMTTILLS 217
sp|Q92143|P53_XIPMA --EDLSDNKSHLIRVEGSQLAQYFEDPNTRR-------HSVTVPYERPQLGSEMTTILLS 198
sp|Q9W679|P53_TETMU --EDSAEHRSHLIRMEGSERAQYFEHPHTKR-------QSVTVPYEPPQLGSEFTTILLS 216
: * . *: * * . : : .
sp|P04637|P53_HUMAN YMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEP 295
sp|P10361|P53_RAT YMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKEEH 293
sp|P02340|P53_MOUSE YMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKEVL 292
sp|Q42578|PER53_ARATH FSA-----VGLNTNDLVA----------LSGAHTFGRARCGVFN----NRLFNFSGTGNP 222
sp|O09185|P53_CRIGR YMCNSSCMGGMNRRPILTIITLEDPSGNLLGRNSFEVRICACPGRDRRTEEKNFQKKGEP 295
sp|Q8SPZ3|P53_DELLE FMCNSSCMGGMNRRPILTIITLEDSNGNLLGRNSFEVRVCACPGRDRRTEEENFHKKGQS 289
sp|Q9TTA1|P53_TUPBE YMCNSSCMGGMNRRPILTIITLEDSSGKLLGRNSFEVRICACPGRDRRTEEENFRKKGES 295
sp|P61260|P53_MACFU YMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKGEP 295
sp|P56424|P53_MACMU YMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKGEP 295
sp|P79892|P53_HORSE FMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKEEP 246
sp|Q29537|P53_CANLF YMCNSSCMGGMNRRPILTIITLEDSSGNVLGRNSFEVRVCACPGRDRRTEEENFHKKGEP 283
sp|P56423|P53_MACFA YMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKGEP 295
sp|Q9TUB2|P53_PIG FMCNSSCMGGMNRRPILTIITLEDASGNLLGRNSFEVRVCACPGRDRRTEEENFLKKGQS 288
sp|Q9W678|P53_BARBU FMCNSSCMGGMNRRPILTIISLETHDGQLLGRRSFEVRVCACPGRDRKTEESNFRKDQET 259
sp|P25035|P53_ONCMY FMCNSSCMGGMNRRPILTIITLETQEGQLLGRRSFEVRVCACPGRDRKTEEINLKKQQET 284
sp|O12946|P53_PLAFE FMCNSSCMGGMNRRQILTILTLETPDGLVLGRRCFEVRVCACPGRDRKTDEESSTKTPNG 270
sp|O57538|P53_XIPHE FMCNSSCMGGMNRRPILTILTLETTEGEVLGRRCFEVRVCACPGRDRKTEEGNLEK--SG 256
sp|O93379|P53_ICTPU YMCNSSCMGGMNRRPILTIITLETQDGHLLGRRTFEVRVCACPGRDRKTEESNFKKQQEP 271
sp|P79820|P53_ORYLA YMCNSSCMGGMNRRPILTILTLET-EGLVLGRRCFEVRICACPGRDRKTEEESRQKTQP- 275
sp|Q92143|P53_XIPMA FMCNSSCMGGMNRRPILTILTLETTEGEVLGRRCFEVRVCACPGRDRKTEEGNLEK--SG 256
sp|Q9W679|P53_TETMU FMCNSSCMGGMNRRPILTILTLETQEGIVLGRRCFEVRVCACPGRDRKTEETNSTKMQND 276
: . *:* . ::: : * * *. . . .
sp|P04637|P53_HUMAN HHELPP---GSTKRALPNNTSSSPQP----------KKKPLDGEYFTLQI---------- 332
sp|P10361|P53_RAT CPELPP---GSAKRALPTSTSSSPQQ----------KKKPLDGEYFTLKI---------- 330
sp|P02340|P53_MOUSE CPELPP---GSAKRALPTCTSASPPQ----------KKKPLDGEYFTLKI---------- 329
sp|Q42578|PER53_ARATH DPTLNSTLLSTLQQLCPQNGSAST-----ITNLDLSTPDAFDNNYFANLQSNDGLLQSDQ 277
sp|O09185|P53_CRIGR CPELPP---KSAKRALPTNTSSSPPP----------KKKTLDGEYFTLKI---------- 332
sp|Q8SPZ3|P53_DELLE CPELPT---GSAKRALPTGTSSSPPQ----------KKKPLDGEYFTLQI---------- 326
sp|Q9TTA1|P53_TUPBE CPKLPT---GSIKRALPTGSSSSPQP----------KKKPLDEEYFTLQI---------- 332
sp|P61260|P53_MACFU CHQLPP---GSTKRALPNNTSSSPQP----------KKKPLDGEYFTLQI---------- 332
sp|P56424|P53_MACMU CHQLPP---GSTKRALPNNTSSSPQP----------KKKPLDGEYFTLQI---------- 332
sp|P79892|P53_HORSE CPEPPP---RSTKRVLSSNTSSSPPQ----------KKKPLDGEYFT------------- 280
sp|Q29537|P53_CANLF CPEPPP---GSTKRALPPSTSSSPPQ----------KKKPLDGEYFTLQI---------- 320
sp|P56423|P53_MACFA CHQLPP---GSTKRALPNNTSSSPQP----------KKKPLDGEYFTLQI---------- 332
sp|Q9TUB2|P53_PIG CPEPPP---GSTKRALPTSTSSSPVQ----------KKKPLDGEYFTLQI---------- 325
sp|Q9W678|P53_BARBU KTLDKIPSANK--RSLTKDSTSSVPRPEGSK--KAKLSGSSDEEIYTLQV---------- 305
sp|P25035|P53_ONCMY TLETKTKPAQGIKRAMKEASL--PAPQPGASKKTKSSPAVSDDEIYTLQI---------- 332
sp|O12946|P53_PLAFE PKQTKK-------RKQAPS-NSAPHTTTVMKSKSSSSAEEEDKEVFTVLV---------- 312
sp|O57538|P53_XIPHE TKQTKK-------RKSAP----APDTSTAKKSKSASSGEDEDKEIYTLSI---------- 295
sp|O93379|P53_ICTPU KTSGKT---------LTKRSMKDPPSHPEAS--KKSKNSSSDDEIYTLQV---------- 310
sp|P79820|P53_ORYLA ----KK-------RKVTPN-TS----SSKRKKSHSSGEEEDNREVFHFEV---------- 309
sp|Q92143|P53_XIPMA TKQTKK-------RKSAP----APDTSTAKKSKSASSGEDEDKEIYTLSI---------- 295
sp|Q9W679|P53_TETMU AKDAKK-------RKSVP----TPDSTTIKKSKTASSAEEDNNEVYTLQI---------- 315
: : :
sp|P04637|P53_HUMAN -----RGR-----------ERFEMFRELNEALELKDAQAG-K-EPGGSRAHSS---HLKS 371
sp|P10361|P53_RAT -----RGR-----------ERFEMFRELNEALELKDARAA-E-ESGDSRAHSS---YPKT 369
sp|P02340|P53_MOUSE -----RGR-----------KRFEMFRELNEALELKDAHAT-E-ESGDSRAHSS---YLKT 368
sp|Q42578|PER53_ARATH ELFSTTGSSTIAIVTSFASNQTLFFQAFAQSMINMGNISPLTGSNGEIRLDC------KK 331
sp|O09185|P53_CRIGR -----RGH-----------ERFKMFQELNEALELKDAQAS-K-GSEDNGAHSS---YLKS 371
sp|Q8SPZ3|P53_DELLE -----RGR-----------ERFEMFRELNEALELKDAQAG-K-EPGESRAHSS---HLKS 365
sp|Q9TTA1|P53_TUPBE -----RGR-----------ERFEMLREINEALELKDAMAG-K-ESAGSRAHSS---HLKS 371
sp|P61260|P53_MACFU -----RGR-----------ERFEMFRELNEALELKDAQAG-K-EPAGSRAHSS---HLKS 371
sp|P56424|P53_MACMU -----RGR-----------ERFEMFRELNEALELKDAQAG-K-EPAGSRAHSS---HLKS 371
sp|P79892|P53_HORSE ------------------------------------------------------------ 280
sp|Q29537|P53_CANLF -----RGR-----------ERYEMFRNLNEALELKDAQSG-K-EPGGSRAHSS---HLKA 359
sp|P56423|P53_MACFA -----RGR-----------ERFEMFRELNEALELKDAQAG-K-EPAGSRAHSS---HLKS 371
sp|Q9TUB2|P53_PIG -----RGR-----------ERFEMFRELNDALELKDAQTA-R-ESGENRAHSS---HLKS 364
sp|Q9W678|P53_BARBU -----RGK-----------ERYEMLKKINDSLELSDVVPP-S-EMDRYRQKLLTKG--KK 345
sp|P25035|P53_ONCMY -----RGK-----------EKYEMLKKFNDSLELSELVPV-A-DADKYRQKCLTKRVAKR 374
sp|O12946|P53_PLAFE -----KGR-----------ERYEIIKKINEAFEGAAEKEK-A-KNK----------VAVK 344
sp|O57538|P53_XIPHE -----RGR-----------NRYLWFKSLNDGLELMDKTG-----------------PKIK 322
sp|O93379|P53_ICTPU -----RGK-----------ERYEFLKKINDGLELSDVVPP-A-DQEKYRQKLLSKTCRKE 352
sp|P79820|P53_ORYLA -----YGR-----------ERYEFLKKINDGLELLEKESK-S-KN--------------K 337
sp|Q92143|P53_XIPMA -----RGR-----------NRYLWFKSLNDGLELMDKTG-----------------PKIK 322
sp|Q9W679|P53_TETMU -----RGR-----------KRYEMLKKINDGLDLLENKP----KSK----------ATHR 345
sp|P04637|P53_HUMAN KKG--QSTSRHKKLMFKTEGPDSD 393
sp|P10361|P53_RAT KKG--QSTSRHKKPMIKKVGPDSD 391
sp|P02340|P53_MOUSE KKG--QSTSRHKKTMVKKVGPDSD 390
sp|Q42578|PER53_ARATH VNG-----------------S--- 335
sp|O09185|P53_CRIGR KKG--QSASRLKKLMIKREGPDSD 393
sp|Q8SPZ3|P53_DELLE KKG--QSPSRHKKLMFKREGPDSD 387
sp|Q9TTA1|P53_TUPBE KKG--QSTSRHRKLMFKTEGPDSD 393
sp|P61260|P53_MACFU KKG--QSTSRHKKFMFKTEGPDSD 393
sp|P56424|P53_MACMU KKG--QSTSRHKKFMFKTEGPDSD 393
sp|P79892|P53_HORSE ------------------------ 280
sp|Q29537|P53_CANLF KKG--QSTSRHKKLMFKREGLDSD 381
sp|P56423|P53_MACFA KKG--QSTSRHKKFMFKTEGPDSD 393
sp|Q9TUB2|P53_PIG KKG--QSPSRHKKPMFKREGPDSD 386
sp|Q9W678|P53_BARBU KDGQTPEPKRGKKLMVKDEKSDSD 369
sp|P25035|P53_ONCMY DFG--VGPKKRKKLLVKEEKSDSD 396
sp|O12946|P53_PLAFE QEL--PVPSSGKRLVQRGERSDSD 366
sp|O57538|P53_XIPHE QEI--PAPSSGKRLLKGGSDSD-- 342
sp|O93379|P53_ICTPU RDGAAGEPKRGKKRLVKEEKCDSD 376
sp|P79820|P53_ORYLA DSG--MVPSSGKKLKSN------- 352
sp|Q92143|P53_XIPMA QEI--PAPSSGKRLLKGGSDSD-- 342
sp|Q9W679|P53_TETMU PDG--PIPPSGKRLLHRGEKSDSD 367
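The alignment comes back in blocks of 60 columns, so each sequence is spread over several lines. If you want one aligned string per sequence, the blocks can be stitched back together. This is a hand-rolled sketch, not an official parser; parse_clustal is a hypothetical helper, and it assumes that sequence lines start with the identifier while consensus lines start with whitespace. The sample is the first two blocks for two of the sequences above.

```python
def parse_clustal(text):
    """Hypothetical helper: map each sequence ID to its full aligned string."""
    alignment = {}
    for line in text.splitlines():
        # Skip blank lines, the CLUSTAL header, and consensus lines
        # (assumed to begin with whitespace).
        if not line.strip() or line[0].isspace() or line.startswith("CLUSTAL"):
            continue
        parts = line.split()
        name, chunk = parts[0], parts[1]  # any trailing residue count is ignored
        alignment[name] = alignment.get(name, "") + chunk
    return alignment

# First two blocks for two sequences, copied from the output above.
sample_alignment = """CLUSTAL O(1.2.4) multiple sequence alignment

sp|P04637|P53_HUMAN ---MEEPQSDPSVEPPLSQETFSDLWKLLPENNV------LSPL-P--SQAMDDLMLSPD 48
sp|P10361|P53_RAT ---MEDSQSDMSIELPLSQETFSCLWKLLPPDDI------LPTTATGSPNSM-EDLFLPQ 50

sp|P04637|P53_HUMAN DIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYG 108
sp|P10361|P53_RAT DVAELLEGPEEALQV---S-APAAQEPGTEAPAPVAPASATPWPLSSSVPSQKTYQGNYG 106
"""

alignment = parse_clustal(sample_alignment)
for name, aligned in alignment.items():
    # Aligned length includes gaps; the second number counts residues only.
    print(name, len(aligned), len(aligned.replace("-", "")))
```

In the notebook, parse_clustal(req.text) would return one entry per aligned sequence, all of the same length.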
Final notes: you can modify p53 to any protein of interest. EMBL-EBI has some great resources. I recommend this video to start: https://www.youtube.com/watch?v=-2g3nFhZkzo
Hope you are having as much summer fun as we are!