COVID-19 Protein Comparison Analysis

Computational Analysis

THE WORLD OF MICROBES

We are living in a world that was jolted in January by the onset of a brand new enemy, one that cannot be seen but only felt, through nausea, headaches, shortness of breath and, in some unfortunate cases, death. This is the novel coronavirus, nCoV-19, a close cousin of the SARS virus and a member of the same family. The virus appears to be virulent and to spread by touch and through the surfaces it settles on. Within weeks it infected inhabitants of Wuhan, China, where it started, and the number of people affected quickly reached the thousands, passing 100,000 by the time of writing.

Primary care communities across the world went into overdrive as patients kept coming in and the virus proved virulent enough to cause thousands of deaths. However, the virus shows a very unusual pattern: it seems to spare children aged 0-9 years and to affect people aged 65 years and older. This challenge prompted us at Makers Lab, the R&D division of Tech Mahindra, to go into overdrive ourselves and see what could be done about it purely from a computational modelling perspective, and whether we could find something.

The coronavirus disease 2019 (COVID-19) pandemic is rapidly evolving; it has spread to more than 150 countries. As of March 20th, 2020, there are more than 250,000 confirmed cases and at least 10,000 people have died. More than two-thirds of them are outside mainland China, where the virus originated. There are no vaccines available and there is little evidence on the effectiveness of potential therapeutic agents. In addition, there is presumably no pre-existing immunity in the population against COVID-19, and everyone is assumed to be susceptible.

OUR DESIGN PRINCIPLE

Our design principle is based on a very common software principle called YAGNI (You Ain't Gonna Need It). The principle keeps the act of writing code simple by building only what is needed right now, so that effort goes where it matters most without reworking the entire underlying premise.

We were aware that the structure and shape of proteins affect the way they behave, but as a starting point we just wanted to look at the peptide chains in Corona and compare them with peptide chains in other viruses of similar types. The basic idea was that if a chemical compound is known to act on one, it might give scientists a head start in acting on this virus.

OUR PRE-STUDY

We are computational scientists with little or no knowledge of biology or the functioning of the body, let alone the world of viruses and bacteria, so we had to resort to online classes to understand the virus, its structure and how it functions. We studied for 48 hours straight, and the classes I found brilliant, and which gave me an understanding, were the Virology Lectures by Vincent Racaniello: https://www.youtube.com/watch?v=svlKm4S1M3Y&list=PLGhmZX2NKiNlwig68CGPHQI_Pcxri4LDx

I would highly recommend these lectures to anyone trying to understand viruses.

One thing that became clear to us at the lab was how a virus behaves compared with a bacterium, and why mechanisms used to fight bacterial infections will not work against viruses.

Bacteria self-replicate, whereas viruses need a host. There has been an endless debate on whether a virus is living or not, a debate I could settle in my head this way: while the virus is outside the host and not replicating, it is non-living. The virus only starts living when it enters the host cell and uses the cell's machinery to make copies of itself. Given below is the table of differences between viruses and bacteria.

[Image: table of differences between viruses and bacteria]

A virus is defined as ‘an infectious, obligate intracellular parasite, comprising genetic material (RNA or DNA) surrounded by a protein coat, sometimes a membrane’ - Vincent Racaniello

Virologists the world over divide the viral infectious cycle into two phases, even though no such boundaries exist in reality. With that caveat, an infectious cycle is shown below:

[Figure 1: Viral infectious cycle. Source: Vincent Racaniello course]

What we see is the way the virus uses the host cell. After the viral spike protein attaches to the cell membrane and the virus gets in, the process essentially splits into two parts. One part is genome replication, and the other is translation, which creates more viral proteins. These components are then assembled into new viruses, which are released in greater numbers.

EXPERIMENTAL RESEARCH

Armed with this knowledge, and knowing that the protein coat of the virus plays a major part in attacking a host cell within the human body before the virus replicates, we set out the following studies, framed as questions we wanted answered:

a)     Can we get the viral protein structure and build a 3D model from the placement of its atoms?

b)     Can we compare the protein amino acid chains of two or more viruses?

To do that, we used Biopython to fetch PDB files from the RCSB database and see whether we could reconstruct the viral structures and atom placement. On digging further, we found that Biopython has a PDB parser that lets us walk through models, chains, residues and atoms, and also gives the x, y and z coordinates of each atom. Our job was to work out the structure and plot it. This first task was easy, and we managed to create a good structure not just of the Corona virus protein but also of the other protein structures available to us.
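As a rough illustration of what such a parser hands back, the sketch below reads ATOM records from PDB-format text using only the Python standard library and the fixed column layout of the PDB format. It is a simplified stand-in for Biopython's PDB parser and not necessarily the code used in this study; the names `Atom`, `read_atoms` and `atom_line`, and the two-atom sample, are invented for this example.

```python
from typing import List, NamedTuple

class Atom(NamedTuple):
    name: str       # atom name, e.g. "CA"
    residue: str    # residue name, e.g. "LYS"
    chain: str      # chain identifier
    res_seq: int    # residue sequence number
    x: float
    y: float
    z: float

def read_atoms(pdb_text: str) -> List[Atom]:
    """Parse ATOM records using the fixed column layout of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip every record type other than ATOM
        atoms.append(Atom(
            name=line[12:16].strip(),
            residue=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms

def atom_line(serial, name, residue, chain, res_seq, x, y, z):
    """Build one ATOM record in PDB column layout (helper for the invented sample)."""
    return (f"{'ATOM':<6}{serial:>5} {name:<4} {residue:>3} {chain}{res_seq:>4}{'':4}"
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Invented two-atom sample, not real structure data
sample = "\n".join([
    atom_line(1, "N", "LYS", "A", 1, 11.104, 6.134, -6.504),
    atom_line(2, "CA", "LYS", "A", 1, 11.639, 6.071, -5.147),
])

for atom in read_atoms(sample):
    print(atom.name, atom.residue, atom.chain, atom.res_seq, atom.x, atom.y, atom.z)
```

The list of `(x, y, z)` values obtained this way is what a 3D scatter plot of the atom placement would be drawn from.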

Given below is an image generated by our system showing the atomic placement of the Corona virus protein, compared with the structure available on the RCSB database.

 

[Image: atomic placement from our system compared with the structure on RCSB]


This close match gave us a lot of confidence to proceed to the next task: comparing the protein peptide chains.

There are two mechanisms by which we could have compared the protein peptide chains: either by converting them into vectors, or by using sequence matching techniques to see which peptide chains match. A PDB structure has to be converted into chains first. We converted each residue into a chain of atom names such as 'N-CA-C-CB-O-CG-CD-CE-NZ', which lists the atoms in one single chain, and wrote code to compare only these chains between two virus proteins. The idea was to check how many chains match and how much of the protein sequence matches in total. Even though we knew that proteins fold and that it is a protein’s shape that gives it its characteristic behavior, we wanted to limit our search to chain values only, because we started with the assumption that if chains are found to be similar, then similar antidotes might work on them.
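The two mechanisms mentioned above can be sketched with the Python standard library. This is an illustration under stated assumptions and not necessarily the code used in this study: the vector route is shown here as a count vector of chain strings compared by cosine similarity, the sequence route uses `difflib.SequenceMatcher`, and the function names and chain lists are invented.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

def cosine_similarity(chains_a, chains_b):
    """Vector route: compare how often each chain string occurs in each protein."""
    counts_a, counts_b = Counter(chains_a), Counter(chains_b)
    dot = sum(counts_a[c] * counts_b[c] for c in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def sequence_match_percent(chains_a, chains_b):
    """Sequence route: order-sensitive similarity of the two chain lists, in percent."""
    matcher = SequenceMatcher(None, chains_a, chains_b, autojunk=False)
    return 100.0 * matcher.ratio()

# Invented chain lists, one string of atom names per residue
protein_a = ["N-CA-C-O", "N-CA-C-O-CB", "N-CA-C-O-CB-OG"]
protein_b = ["N-CA-C-O-CB", "N-CA-C-O", "N-CA-C-O-CB-SG"]

print(round(cosine_similarity(protein_a, protein_b), 3))
print(round(sequence_match_percent(protein_a, protein_b), 1))
```

The vector route ignores the order of residues, while the sequence route is sensitive to it, so the two can give quite different numbers for the same pair of proteins.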

MINI DISCOVERY

We first started by breaking open the PDB structure obtained from the Biopython PDB parser, as shown below:

import os
from Bio.PDB import PDBParser

parser = PDBParser(QUIET=True)
residue_atoms = []  # one list of atom names per residue

# extension, main_virus_needed (folder) and filename are set earlier in the script
if extension == 'pdb':
    data = parser.get_structure("pdb", os.path.join(main_virus_needed, filename))
    # Walk the hierarchy: structure -> models -> chains -> residues -> atoms
    for model in data.get_models():
        for chain in model.get_chains():
            for residue in chain.get_residues():
                # Getting atom names here to append
                atoms_li = [atom.get_name() for atom in residue.get_atoms()]
                residue_atoms.append(atoms_li)

 

PDB structures are stored as sequences of residues, each containing atoms at specific positions on the x, y and z axes, so we combined the atoms in one residue to form a chain like the one explained above:

str_atoms_c.append('-'.join(map(str,i)))
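For readers without a PDB file at hand, the joining step can be tried on plain lists. The sketch below assumes `residue_atoms` holds one list of atom names per residue, as collected by the parser loop; the sample values are invented, while `str_atoms_c` follows the name used in the line above.

```python
# Invented sample: one list of atom names per residue
residue_atoms = [
    ["N", "CA", "C", "O"],
    ["N", "CA", "C", "O", "CB"],
    ["N", "CA", "C", "CB", "O", "CG", "CD", "CE", "NZ"],
]

# Join the atom names of each residue into one chain string
str_atoms_c = []
for i in residue_atoms:
    str_atoms_c.append('-'.join(map(str, i)))

print(str_atoms_c)
```

Each residue ends up as a single string, so two proteins can be compared as two lists of strings.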

Once these chains were obtained, we started comparing the peptide chains of one protein with those of another. Given below are some results from the system, which completely astonished us.

[Figure 2: Comparison between Corona and SARS virus proteins. Source: Makers Lab, Tech Mahindra]

The comparison of amino peptide chains between Corona and SARS was astonishing: even though the novel coronavirus is considered to be a form of SARS, the amino peptide chains hardly matched. The sequence similarity was 3.68 percent, and none of the chains actually matched within a set.

Similar comparisons were drawn with Measles and Influenza virus proteins, and the results are shown below.

[Figure 3: Comparison between Corona and Measles virus proteins. Source: Makers Lab, Tech Mahindra]

84.89% of the peptide chains in Measles were found in Corona, and about 78.5% of chains matched with Influenza. It was only by chance that we also measured this against Malaria…

[Figure 4: Comparison between Malaria and Corona virus. Source: Makers Lab, Tech Mahindra]

The comparison of amino peptide chains between Corona and Malaria gives the result shown in the figure above. The following things become clear:

a)     The sequence match between the two proteins is only 6.94%. This is not a surprise, as there are structural dissimilarities.

b)     The red box shows which sets of protein chains match each other and by what percentage. The ones shown give the maximum match within a given set in the comparison.

c)     The astonishment came with the green box: 99% of the Malarial protein sequence is found in the Corona virus as well!
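One way to read the gap between the figures in (a) and (c) is that they may measure different things. The toy sketch below assumes, purely for illustration, that the figure in (c) is the share of one protein's chains that also occur somewhere in the other protein, regardless of order. The function name `percent_found_in` and the chain lists are invented, and this is not necessarily the code behind the figures.

```python
def percent_found_in(chains_a, chains_b):
    """Percentage of chains in chains_a that also occur anywhere in chains_b."""
    if not chains_a:
        return 0.0
    known = set(chains_b)
    hits = sum(1 for chain in chains_a if chain in known)
    return 100.0 * hits / len(chains_a)

# Invented chain lists, one string of atom names per residue
small = ["N-CA-C-O", "N-CA-C-O-CB", "N-CA-C-O"]
large = ["N-CA-C-O-CB", "N-CA-C-O-CB-OG", "N-CA-C-O", "N-CA-C-O-CB-SG",
         "N-CA-C-O-CB-CG-CD", "N-CA-C-O-CB-OG"]

print(percent_found_in(small, large))  # share of `small` found in `large`
print(percent_found_in(large, small))  # share of `large` found in `small`
```

A measure like this is not symmetric, so it matters which protein is taken as the reference.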

A similar run was made against HIV, and the following results were obtained:

[Figure 5: Comparison between HIV and Corona virus. Source: Makers Lab, Tech Mahindra]

CONCLUSIONS

Disclaimer: These are experimental conclusions drawn numerically. They need to be validated by labs and other practitioners to see if they hold.

1)     99.2% of malarial peptide chains are similar to Corona peptide chains. This is in line with some studies around the world reporting that hydroxychloroquine, a less toxic derivative of chloroquine, works against the virus.

2)     The paper and experiments are intended to give lab scientists and practitioners an easy reference guide to kick-start treatment whenever an epidemic arises due to microbes and viruses.

3)     Protein shapes do account for the way a protein functions, but to kick-start a vaccination trial for a new epidemic viral protein, just comparing peptide chains could be a starting point. We plan to release this research to the scientific community as a web page.

4)     The similarity of Corona peptide chains to those of Measles and Influenza also suggests that treatments used for measles and influenza can be tested against the virus.

5)     A theoretical direction for why kids are not badly affected by the virus is that kids aged 0-9 get healthy doses of MMR and Malaria vaccines. Kids are administered the malaria vaccine from 5 to 17 months as per WHO recommendation, which might produce antibodies that work against the virus protein. [Facts to be ascertained by clinical trial]

