Parallelizing Quantum-Classical Workloads: Profiling the Impact of Splitting Techniques
Anupama Ray
Research Scientist | IBM Quantum Technical Ambassador and Qiskit Advocate | Adjunct Professor at IIT Jodhpur
The largest fleet of quantum computers is available today! Since 2016, IBM has been putting its quantum devices on the cloud for anyone around the globe to access, and so far more than 60 IBM Quantum computers have been deployed on the cloud. 433-qubit devices are already available and, given the IBM Quantum Roadmap, we are hoping to have a 1000+ qubit device this year, in 2023. But are we already utilising so many qubits? Are we utilising these devices to their full potential yet?
At the recently concluded IEEE International Conference on Quantum Computing and Engineering (QCE2023), we witnessed 150+ research papers, and most quantum algorithms are hybrid in nature, with some quantum and some classical workloads. In this blog we present a profiling study of two techniques to parallelize quantum-classical workloads across multiple quantum devices, namely circuit parallelization and data parallelization. These parallelization strategies can not only improve the running time of hybrid quantum algorithms, but also improve the utilisation of resources and lower the impact of noise on the overall application.
Noise is the primary hindrance to the scalability and applicability of current quantum devices. Noise enters the system through faulty gates, faulty measurement, faulty state preparation and the natural tendency of a qubit to spontaneously release energy and settle into its minimum energy state. Efforts are consistently being made to improve the hardware and to come up with software methods that lower the effect of noise on the system. See our previous blog, which shares some methods to lower the effect of noise on a quantum device. One approach to potentially mitigate noise is to leverage the diverse noise profiles across multiple machines at a given point in time, and parallelization could be useful to exploit this. As a side effect, it could lower the per-machine workload and result in faster execution of a job. In this blog, we discuss a profiling study of two parallelization methods to (i) lower the noise on the system, and (ii) reduce the workload on each system, resulting in faster execution.
Circuit parallelization
The first parallelization method that we look into is termed circuit cutting, or circuit parallelization [1, 2]. In this method, a circuit is partitioned into multiple subcircuits, each of which has fewer qubits and/or gates. Due to the reduced number of gates, measurements, etc., the noise on each subcircuit is expected to be lower than on the original circuit [3, 4, 5]. Moreover, the subcircuits can now be executed simultaneously on different hardware devices.
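To make the idea of a cut concrete, here is a minimal sketch of the single-qubit identity that wire cutting in the style of [1] builds on: a state passing through the cut wire can be rebuilt from its four Pauli expectation values. This is a pure-Python toy on one qubit, not the partitioning used in our experiments; the helper names and the test angles are our own illustrative choices.

```python
import cmath
import math

# Single-qubit Pauli basis {I, X, Y, Z} as 2x2 complex matrices.
PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def density_matrix(theta, phi):
    """Density matrix of cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    amp = [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]
    return [[amp[i] * amp[j].conjugate() for j in range(2)] for i in range(2)]


def cut_and_recombine(rho):
    """Rebuild rho as (1/2) * sum_P Tr(P rho) P over the four Paulis."""
    rebuilt = [[0j, 0j], [0j, 0j]]
    for pauli in PAULIS.values():
        # Upstream fragment: expectation value of P on the cut wire.
        expectation = trace(matmul(pauli, rho))
        # Classical recombination: weight P by that expectation value.
        for i in range(2):
            for j in range(2):
                rebuilt[i][j] += 0.5 * expectation * pauli[i][j]
    return rebuilt


rho = density_matrix(theta=1.1, phi=0.7)  # arbitrary illustrative state
rebuilt = cut_and_recombine(rho)
max_error = max(abs(rebuilt[i][j] - rho[i][j])
                for i in range(2) for j in range(2))
print(f"max reconstruction error: {max_error:.2e}")
```

On hardware, each term would correspond to measuring the upstream subcircuit in a Pauli basis and preparing matching eigenstates on the downstream subcircuit; the sketch only checks the algebra that makes the classical recombination possible.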
We apply the method of circuit parallelization to a VQE problem. The Variational Quantum Eigensolver (VQE) is a hybrid quantum-classical algorithm that focuses on finding the ground state of a given Hamiltonian. It has applications primarily in quantum chemistry, condensed matter physics, etc. Fig. 2 shows an example of a hybrid quantum-classical algorithm. In these algorithms, the overall workload is divided between a quantum processing unit (QPU) and a classical processing unit (CPU). The quantum circuit executed on the QPU is usually a parameterized circuit. The expectation value of a Hamiltonian obtained from this circuit is fed to the CPU, which determines a better set of parameters to obtain an improved expectation value in the next iteration.
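The QPU-CPU loop described above can be sketched in a few lines. This is only a toy under stated assumptions: the "QPU" is a classical stand-in that evaluates a nearest-neighbour ZZ Hamiltonian on a product state of RY rotations (a hypothetical ansatz chosen for simplicity, not the circuit from our experiments), and the "CPU" step is a Rotosolve-style coordinate update rather than the optimizer we used.

```python
import math


def qpu_expectation(thetas):
    """Stand-in for the QPU: <H> for H = sum_i Z_i Z_{i+1} on the product
    state RY(theta_0)|0> ... RY(theta_{n-1})|0>, where <Z_i> = cos(theta_i)."""
    z = [math.cos(t) for t in thetas]
    return sum(z[i] * z[i + 1] for i in range(len(z) - 1))


def cpu_update(thetas, index):
    """Stand-in for the CPU: Rotosolve-style closed-form update of one
    parameter, using three expectation-value evaluations."""
    def energy_at(value):
        trial = list(thetas)
        trial[index] = value
        return qpu_expectation(trial)

    phi = thetas[index]
    e0 = energy_at(phi)
    e_plus = energy_at(phi + math.pi / 2)
    e_minus = energy_at(phi - math.pi / 2)
    return phi - math.pi / 2 - math.atan2(2 * e0 - e_plus - e_minus,
                                          e_plus - e_minus)


def run_vqe(thetas, sweeps=3):
    """Alternate QPU evaluations and CPU updates for a few sweeps."""
    thetas = list(thetas)
    history = [qpu_expectation(thetas)]
    for _ in range(sweeps):
        for index in range(len(thetas)):
            thetas[index] = cpu_update(thetas, index)
        history.append(qpu_expectation(thetas))
    return thetas, history


initial = [0.3, 2.5, 0.4, 2.8, 0.2, 2.9]  # arbitrary illustrative start
final_thetas, history = run_vqe(initial)
print([round(e, 4) for e in history])
```

For this particular starting point the energy drops to about -5 after the first sweep. Other starting points can stall on a plateau, which is one reason real VQE runs rely on more expressive circuits and more careful optimizers.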
For our problem, we consider a simple n-qubit Quantum Heisenberg Spin Model (QHSM) where the interaction is between neighbouring qubits along the Z-direction, with coupling strength 1. The ground state of such a Hamiltonian is a chain of qubits in which the spins of any two neighbouring qubits are opposite. Using circuit parallelization, we partition the parameterized circuit into two subcircuits and execute each subcircuit independently on a different QPU. Finally, we classically recombine the outcomes from each QPU to find the expectation value.
In our experiment, we use a 6-qubit Hamiltonian. The ideal expectation value of this Hamiltonian, computed via classical numerical methods, is -5. In Fig. 2 we show the expectation values obtained with and without circuit parallelization. For both cases, we show the results with and without measurement error mitigation (MEM) which is used to mitigate the effect of faulty measurements.
We observe that the expectation value obtained without circuit parallelization is greater than -3, both with and without MEM. The effect of noise on the system is clearly observable. On the other hand, when circuit parallelization is used, the obtained expectation value is nearly -5 with MEM. In other words, MEM alone is not sufficient to produce the expected outcome of VQE in a noisy scenario. However, when combined with circuit parallelization, MEM nearly leads to the ideal expectation value. Across all our experiments, we observed an average improvement of 39% in the quality of the expectation value obtained for the VQE problem when circuit parallelization was used.
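As a sanity check on the ideal value of -5 quoted above, the Hamiltonian can be brute-forced classically. The sketch below assumes an open chain, i.e. H = sum of Z_i Z_{i+1} over the 5 neighbouring pairs of the 6 qubits with no wrap-around term; that boundary condition is our assumption, chosen because it reproduces the value of -5. Since H is diagonal in the computational basis, enumerating all spin configurations is exact.

```python
from itertools import product


def zz_chain_energy(spins):
    """Energy of H = sum_i Z_i Z_{i+1} (coupling strength 1) for a
    configuration of +1/-1 spins on an open chain."""
    return sum(spins[i] * spins[i + 1] for i in range(len(spins) - 1))


def ground_states(n_qubits):
    """Return the lowest energy and every configuration that attains it."""
    configs = list(product((+1, -1), repeat=n_qubits))
    lowest = min(zz_chain_energy(c) for c in configs)
    return lowest, [c for c in configs if zz_chain_energy(c) == lowest]


energy, states = ground_states(6)
print(energy)  # -5
print(states)  # the two alternating spin configurations
```

The two minimizing configurations are the alternating ones, matching the description of the ground state as neighbouring spins pointing in opposite directions.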
Data parallelization
Not all circuits are suitable for cutting. This is because the classical recombination needed to determine the uncut probability distribution scales exponentially with the number of cuts. Therefore, if the number of cuts is large, the classical recombination time overwhelms the advantages of circuit parallelization. The VQE problem conforms to a circuit which can be effectively partitioned using a single cut. However, the same is not true for a Quantum SVM (QSVM) circuit, which is used for quantum machine learning (QML). In such cases, data parallelization could be useful.
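To get a feel for this scaling, here is a rough sketch. It assumes that each cut contributes a factor of four to the number of terms in the classical recombination, one per Pauli basis element, as in the reconstruction described in [2]; the exact constant depends on the cutting scheme, so treat the numbers as indicative only.

```python
def recombination_terms(num_cuts, terms_per_cut=4):
    """Number of terms summed during classical recombination, assuming
    each cut multiplies the count by terms_per_cut (4 is an assumption)."""
    return terms_per_cut ** num_cuts


for cuts in (1, 2, 5, 10):
    print(cuts, recombination_terms(cuts))
```

A single cut, as in our VQE circuit, stays cheap, while ten cuts already require about a million terms.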
We illustrate data parallelization by batching the samples of the training and test sets used to compute the kernel for a QSVM classifier. We combine data parallelization with a reduced feature set and error mitigation, and study the effect on speed-up, resource footprint and prediction accuracy. We conducted our experiments on 100 samples (70:30 train-test split), with feature sets of 6 features and 2 features. We explored circuit batch sizes of 1000 and 500 circuits per job, and tried both MEM and zero-noise extrapolation (ZNE) for error mitigation. Our profiling experiments for execution time (purely the time spent executing the workload on the machine) and resource footprint (qubit-minutes) indicate that using QSVM with both data parallelization and a reduced feature set (reducing features from 6 to 2) yields up to 3× faster quantum workload execution and reduces quantum resource usage (qubit-seconds) by 3×, while providing an accuracy comparable to the baseline without these optimizations.
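The batching idea can be sketched as follows. Everything here is illustrative: a classical RBF function stands in for executing a kernel circuit, a thread pool with three workers stands in for three devices, and the way circuits are counted (one per unordered training pair plus one per test-training pair) is our assumption for the sketch, not a description of the experimental setup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def kernel_pairs(n_train, n_test):
    """One kernel 'circuit' per unordered training pair, plus one per
    (test sample, training sample) pair."""
    train_pairs = [("train", i, j)
                   for i in range(n_train) for j in range(i + 1, n_train)]
    test_pairs = [("test", i, j)
                  for i in range(n_test) for j in range(n_train)]
    return train_pairs + test_pairs


def make_batches(items, batch_size):
    """Split the list of circuits into jobs of at most batch_size."""
    return [items[k:k + batch_size] for k in range(0, len(items), batch_size)]


def stand_in_kernel(x, y):
    """Classical RBF value used in place of running a kernel circuit."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))


def run_batch(batch, train, test):
    """Hypothetical stand-in for submitting one job to one device."""
    results = {}
    for kind, i, j in batch:
        left = train[i] if kind == "train" else test[i]
        results[(kind, i, j)] = stand_in_kernel(left, train[j])
    return results


def parallel_kernel(train, test, batch_size, n_devices):
    """Dispatch batches to n_devices workers and merge the kernel entries."""
    batches = make_batches(kernel_pairs(len(train), len(test)), batch_size)
    entries = {}
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        for partial in pool.map(lambda b: run_batch(b, train, test), batches):
            entries.update(partial)
    return batches, entries


# Synthetic 2-feature data with the 70:30 split used in the text.
train = [(0.01 * i, 0.02 * i) for i in range(70)]
test = [(0.015 * i, 0.01 * i) for i in range(30)]
batches, entries = parallel_kernel(train, test, batch_size=500, n_devices=3)
print(len(batches), len(entries))
```

Under this counting, the 70:30 split gives 4515 kernel entries, i.e. 10 jobs at 500 circuits per job or 5 jobs at 1000. Smaller batches give the scheduler more freedom to spread work across devices, at the cost of more job submissions.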
Additional Insights
Besides exposing trade-offs when using these strategies, this also opens up future opportunities for designing workflow orchestration platforms that leverage them intuitively. Parallel execution of job batches helps leverage quantum processors of diverse capacity on the cloud, depending on their load. But the user workload itself might not always be amenable to exploiting this opportunity, e.g., a 6-qubit job will not run on a 5-qubit machine even if it is available. Circuit cutting, circuit batching and feature sub-selection can help adapt the workload and also leverage parallelization. At the same time, without dynamic orchestration, parallel jobs of the same application running on different machines can be delayed by the slowest of them, based on changing quantum machine load and availability. Our study demonstrates improvements in the quality of the result for both VQE and QSVM when error mitigation is used, but at the cost of higher quantum resource usage. Under budgetary constraints, the orchestration platform could selectively utilize error mitigation while being aware of resource costs.
Authors: Ritajit Majumdar, PhD , Padmanabha Venkatagiri S , Yogesh Simmhan , Anupama Ray , Tuhin Khare , Rajiv Sangle
Tags: #ibm, #ibmresearch, #IBMResearchIndia, #quantumcomputing, #ibmquantum #QCE2023 #QuantumSystemSoftware (QSYS)
This blog is based on a paper recently accepted at the IEEE International Conference on Quantum Computing and Engineering (QCE) 2023.
The preprint is available at https://arxiv.org/abs/2305.06585.
References
[1] T. Peng, A. W. Harrow, M. Ozols, and X. Wu, “Simulating large quantum circuits on a small quantum computer,” Physical Review Letters, vol. 125, no. 15, p. 150504, 2020.
[2] W. Tang, T. Tomesh, M. Suchara, J. Larson, and M. Martonosi, “CutQC: using small quantum computers for large quantum circuit evaluations,” in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021, pp. 473–486.
[3] S. Basu, A. Saha, A. Chakrabarti, and S. Sur-Kolay, “i-QER: An intelligent approach towards quantum error reduction,” ACM Transactions on Quantum Computing, 2021.
[4] T. Ayral, F.-M. Le Régent, Z. Saleem, Y. Alexeev, and M. Suchara, “Quantum divide and compute: exploring the effect of different noise sources,” SN Computer Science, vol. 2, no. 3, pp. 1–14, 2021.
[5] R. Majumdar and C. J. Wood, “Error mitigated quantum circuit cutting,” arXiv:2211.13431, 2022.