Predicting Protein Structures using Quantum Computers - Part 2

Predicting Protein Structures using Quantum Computers - Part 2

This is the second part of the two-part blog on predicting protein structures using quantum computers. In the first part we gave a brief introduction to protein structures and how they are studied. We also discussed ways to predict protein structure. The first part can be found in the link given below.

https://www.dhirubhai.net/pulse/predicting-protein-structures-using-quantum-computers-kalyan-dasgupta-65wpc

In this part we will discuss how quantum computers could help in predicting protein structures.

Role Quantum Computers could play

Quantum Computers have two superpowers. One is superposition and the other is entanglement. Quantum computing algorithms try to exploit these powers to compute efficiently. While a majority of the algorithms are able to use the principle of superposition, the use of entanglement is not so straightforward. A class of hybrid algorithms, called variational algorithms, that uses a mix of quantum as well as classical compute has become very popular for solving a wide range of problems. One can read ref. 1 to know more about this class of algorithms. In variational algorithms, parameterized quantum circuits are created to generate states that are expected to be close to the desired lowest (or the highest, depending upon the problem) energy state. The quantum circuit essentially creates a superposition of these low energy states and the expectation value (average value) of the measurement of these states is considered to be the lowest energy value. The classical part of the algorithm involves an optimizer that updates the parameters of the circuit based on the movement of this expectation value of the energy measurements. The circuit that creates this superposition of low energy states is called an ansatz, a German word for an educated guess (Wiki Ansatz- https://en.wikipedia.org/wiki/Ansatz).

As discussed in part 1 of the blog, the protein structure prediction problem suffers due to the large feasible conformational space. Choosing the correct conformation from such a vast space is naturally a big challenge. With quantum hardwares, one can create 2^n (2 raised to power n) states from n qubits. This is a large number of states for systems having more than or close to 100 qubits. If our encoding scheme (a scheme that maps a conformation to a quantum state) is good, and the objective/energy function (Hamiltonian in quantum computing/mechanics parlance) captures all the force fields properly, the job is then left to the ansatz to take us closer to the lowest energy state. If the structure of the ansatz is inspired from effects that are quantum mechanical in nature, we will be able to properly exploit both the powers of superposition and entanglement. Ansatzes used in computational chemistry to model electron behaviour in molecules is one example where quantum mechanical effects are modelled. However, in the realm of protein folding, we do not (at least currently) have the resources or the algorithms to model at the level of quantum particles for the entire protein chain. The ansatzes that are used, instead, model at a coarser level and are more heuristic and hardware friendly in nature. There are some guidelines one could follow in the selection of an ansatz (ref. 2). Qubits are entangled in the hope that they would somehow lead us to a combination of very useful low energy states. In essence, the approach using variational algorithms uses ideas from physics-based models in how the Hamiltonian or the energy function is formulated and also uses ideas from machine learning or knowledge-based models in how the ansatz circuit is created. One thing to note while preparing an ansatz is the corresponding depth of the circuit. Current generation of quantum hardware are noisy and large circuit depths often result in qubits losing their fidelity.

Encoding approaches

In this article we will be discussing modelling approaches keeping in mind gate-based quantum hardware. Going through the literature on quantum computing in protein structure prediction, one can find mostly coarse grained lattice based representations. Fig. 1 shows some of the lattice representations used in practice in classical algorithms that can be adapted for qubit encoding.

Lattice structures: (a) cubic, (b) diamond, (c) cubic with planar diagonals, (d) hexagonal, (e) triangular and (f) face-centred-cubic. Courtesy: Reference 12

The first natural way to think about encoding 3D protein structures is to map the coordinate space to quantum states. It is to be noted that qubits can take up states which correspond to superpositions of binary logical states 0 and 1. However, on measurement, they collapse to either state 0 or 1. Now, because of this superposition effect, n qubits will result in 2^n states. If we are to map these states to the coordinate space, we could possibly have 2^n coordinate points at our disposal. Let us say, we are interested in a protein molecule having N nodes (one node could either represent the backbone nitrogen and carbon atoms or the entire amino acid itself). Every node has to be associated with a coordinate point. In any one dimension we would need log N (base 2) qubits to represent the coordinates of a node. To represent all the nodes, we will need N log N qubits. With 3 dimensions (x, y and z), we will need 3N log N qubits. Fig. 2(a) illustrates the idea and also gives the qubit requirement in a 2D system. We will need 16 qubits for just 4 nodes in a 2D system. Every node needs 4 qubits to represent its position in space.

Figure 2: (a) Coordinate-based encoding. Courtesy: Reference 3, (b) One hot turn-based encoding. Courtesy: Reference 5.

In ref. 3, the HP (hydrophobic-polar) interaction model was considered, where every node represents an amino acid monomer. In HP interaction models, hydrophobic amino acids attract each other and they collapse to form a core at the centre of the folded protein structure. Polar (hydrophilic) molecules stay on the outer periphery shielding the hydrophobic molecules from the aqueous medium. This phenomenon is known as hydrophobic collapse.

Another encoding approach is the turn-based encoding. This method is an improvement over the coordinate-based approach in terms of resource requirements. The turn-based encoding approach maps the turns taken, while going from one node to the next, to quantum states. A 3D dimensional lattice could have 6 turns (above, below, right, left, front, back) from any given point in the lattice. A 3-qubit encoding will suffice to represent all the movements. An N bead (molecule) system will have N ? 1 turns, and the total number of qubits required will be 3(N ? 1). One can read ref. 4 to know more on different turn encoding approaches. These approaches, however, require some extra ancilla qubits to enforce some of the constraints in the energy function. Fingerhuth et. al. in ref. 5, implemented one hot encoding method, where every node-node bond is represented by 6 qubits - each corresponding to a direction in a 3D lattice. Fig. 2(b) illustrates the turn encoding scheme. In ref. 6 the authors propose a turn-based encoding in a cubic lattice with larger degrees of freedom (denser packing). Apart from the 6 directions mentioned above, this formulation takes into account movement along the diagonals as well. The qubit requirement in this formulation is 6(N ? 1). Even with dense packing, one has to take care of the angles (in a simplified backbone representation) between successive Cα atoms, as they vary between 80 degrees to 155 degrees in reality (ref. 7). Fig. 3 gives the predicted structure for a 20-node system using an HP (H - blue, P - red) interaction model with denser packing in a cubic lattice. The experiments were performed in an IBM quantum machine named ibm_cusco. In the figure, every node represents a Cα atom. One can see the phenomenon of hydrophobic collapse here.

Figure 3: Predicted structure for a 20 node system - PHPHPHHPPHHHPPHPHHHP. Hydrophobic nodes are represented by colour blue, while polar (hydrophilic) nodes are represented by colour red (ref. 6).

In ref. 8, the authors presented a turn-based, resource efficient quantum model for a coarse-grained protein sequence mapped onto a tetrahedral lattice structure. There were two encoding approaches proposed in this paper. One was the sparse encoding approach using one hot encoding, requiring 4(N ? 3) qubits. The second, more dense encoding approach mapped turns to quantum states and requires 2(N ? 3) qubits.

Formulating the energy function or the Hamiltonian

The energy function that we seek to minimize should take into account all necessary forces at play. It should also take into consideration all the constraints that a feasible state has to satisfy. The energy function or the Hamiltonian is formulated by first fixing up the interaction model we wish to emulate. For example, in an HP interaction model, only interactions between hydrophobic amino acids are modelled. In a Miyazawa-Jernigan (MJ) interaction model, the relative interaction strengths between all the naturally occurring amino acids are given beforehand in the form of a table. There could be several other interaction models. The relative interaction strengths provided could either be from experimental results or from classical knowledge-based models. The formulation incentivizes amino acids (or nodes in the encoded form) with higher relative interaction strengths to come closer as compared to those with lesser strengths.

The constraints could be many. The primary ones are those that ensure the following.

  1. Continuity constraint - There should be a continuity in the chain. In other words, all adjacent amino acid molecules should remain connected.
  2. Geometric constraints - No two amino acid molecules should occupy the same lattice position and no amino acid molecule should occupy more than one lattice position.
  3. Physical constraints - In encoding schemes where bonds along diagonals are possible, there should be no crossing of bonds. Chirality constraints pertaining to the position of the side chains could also be added in (ref. 8).

Once, the interaction model and the constraints are formulated, the final Hamiltonian (H) is written as follows.

The second term on the right can be further broken down into the following terms.

The last term on the right pertains to the constraints that are specific to the encoding scheme.

Measurement and accessing low-energy states

The encoded states are all represented in terms of 0s and 1s corresponding to the qubit states. If the formulation results in quadratic terms (terms containing at most two variables), the problem would be akin to solving a quadratic unconstrained binary optimization (QUBO) problem. However, complex formulations often result in terms containing more than two variables (higher order terms). These are classically very hard problems to solve.

With the Hamiltonian prepared, the quantum circuit corresponding to the ansatz can be run. The circuit creates a superposition of low energy states. On measurement, they collapse to one particular state. The energy expectation value estimated (after several rounds of measurements), is dependent upon the Hamiltonian that we have prepared. It is possible to extract bitstrings (strings of 0s and 1s) corresponding to the low energy states. Qiskit, which is a popular Python software development kit for working with quantum computers, has a feature called Runtime Sampler, that can give us these bitstrings (ref. 9). This ability to sample a humongous number of bitstring states is what gives quantum hardware its real power. The obtained bitstring can be decoded to obtain the low energy conformations.

While working with a hardware one has to keep in mind that errors will creep in and qubit fidelity will be lost to some extent. There are error mitigation techniques available that a user can make use of while running the experiments. One can read ref. 10 to learn more about error mitigation and suppression techniques while working with a quantum hardware.

Final thoughts

In this two-part article we discussed the protein structure prediction problem and the different computing tools available to us in tackling this problem. The benefits of solving this problem are enormous, in terms of drug discovery, devising novel therapeutic strategies to prevent or rectify protein misfolding, unraveling various biological processes, etc. We discussed how, quantum computers could help in solving this 60 year-old puzzle. Variational quantum algorithms are widely being used in taking a shot at this problem. At the moment, the protein chains whose structures are being analyzed in quantum hardware are small in size. It is expected that as quantum computers grow in their capabilities and more efficient quantum algorithms come into the picture, we will be able to handle larger-sized protein chains. Building good ansatzes that can effectively navigate the solution space and take us to the lowest energy states is an area of research. Ref. 11 is a good perspective paper on protein structure prediction using quantum computers. Although variational algorithms have been, by far, the go-to method of choice, there could be other ways where quantum and classical computers could work together to solve this problem. Quantum computers could create the initial set of low energy conformation states at a coarse-grained level. Classical computers could then be used to fine-tune at a much granular level using physics-based models. This is one among many other ideas researchers are exploring. However, the holy grail would be to execute MD simulations of protein chains in a quantum computer with quantum effects factored in. With technology progressing at a rapid pace, it is hoped that this would be a reality by the end of this decade.

Acknowledgement

The author would like to thank Prof. Sanjib Senapati and his PhD. students Avinash, Vasavi and Ashwini from the Biotechnology Department of IIT Madras, Chennai, and Soham Bopardikar from the College of Engineering, Pune. I would also like to thank Anuradha Gupta and Praveen Jayachandran for their valuable suggestions in improving the writeup.

References

  1. https://www.dhirubhai.net/pulse/algorithms-near-term-quantum-computers-kalyan-dasgupta,
  2. S. Sim, P. D. Johnson, and A. Aspuru-Guzik, Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms, arXiv preprint arXiv:1905.10876v1, 2019.
  3. Perdomo, et. al., Construction of model Hamiltonians for adiabatic quantum computation and its application to finding low-energy conformations of lattice protein models, Phys. Rev. A, vol. 78, Jul 2008.
  4. Babbush, A., et. al., Construction of energy functions for lattice heteropolymer models: Efficient encodings for constraint satisfaction programming and quantum annealing, Advances in Chemical Physics. Wiley, Apr 2014.
  5. M. Fingerhuth, T. Babej, and C. Ing, A quantum alternating operator ansatz with hard and soft constraints for lattice protein folding, arXiv preprint arXiv:1810.13411, 2018.
  6. Jaya Vasavi P, et. al., An approach to solve the coarse-grained Protein folding problem in a Quantum Computer, arXiv:2311.14141v1, 2023.
  7. T. J. Oldfield and R. E. Hubbard, Analysis of Cα geometry in protein structures, Proteins: Structure, Function, and Genetics, vol. 18, no. 4, pages 324–337, Apr. 1994. https://onlinelibrary.wiley.com/doi/10.1002/prot.340180404.
  8. A Robert, et. al., Resource-efficient quantum algorithm for protein folding, npj Quantum Information, vol. 7:38, 2021.
  9. Qiskit - open source quantum development, https://qiskit.org/.
  10. https://www.dhirubhai.net/pulse/guide-me-through-errors-quantum-computer-ritajit-majumdar/?trackingId=vg0NwSKkSQqE1NLUXH1hNA%3D%3D
  11. Hakan Doga, et. al., A perspective on protein structure prediction using quantum computers, arXiv preprint arXiv:2312.00875v1 , 2023.
  12. https://dimacs.rutgers.edu/~alantha/papers2/alantha-bill-bc.pdf


Aditya Anand

Student at Indian Institute of Technology (Banaras Hindu University), Varanasi

1 年

I am 3rd year materials science student of IDD program from IIT BHU. I am learning and using Qiskit, and hope you could tell me about skills I need to get an internship and later placement at IBM related to materials science.

回复

要查看或添加评论,请登录

Kalyan Dasgupta的更多文章

社区洞察

其他会员也浏览了