Matrix Multiplication Using MPI
Semanto Mondal
Question: Does an increase in the number of processes always reduce the computation time?
PROBLEM STATEMENT:
Parallel computing has become a powerful way to handle large amounts of data. In general, multiple processes run in parallel to accomplish a particular task, which raises a natural question: does an increase in the number of processes always reduce the computation time?
This work multiplies two square matrices, stores the result in a third square matrix, and uses that computation to check whether increasing the number of processes always reduces the computation time. The task is divided among multiple processes: if n is the number of processes, we take values of n from 1 to 16 and compare the computation time for each. We use two large matrices of size N = 700 so that the problem is complex enough to be distributed meaningfully among the processes. If two small matrices were used, the problem would be too simple to benefit from parallel computing and could be handled by a traditional single-processor architecture alone. A serial baseline for this computation is sketched below.
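To make the workload concrete, here is a minimal serial baseline for the task (a sketch with arbitrary fill values, not the code from the GitHub repo): the classic triple-loop product C = A x B for N = 700.

```c
#include <stdio.h>
#include <stdlib.h>

#define N 700  /* matrix dimension used in this experiment */

/* Serial baseline: C = A * B with the classic triple loop.
 * This is the workload that is later split across MPI processes. */
int main(void) {
    /* Heap allocation: three 700 x 700 double matrices (~11.8 MB total)
     * are too large for the stack. */
    double *A = malloc(N * N * sizeof(double));
    double *B = malloc(N * N * sizeof(double));
    double *C = malloc(N * N * sizeof(double));
    if (!A || !B || !C) return 1;

    for (int i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; }

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }

    printf("C[0][0] = %f\n", C[0][0]);  /* expect 2.0 * N = 1400.0 */
    free(A); free(B); free(C);
    return 0;
}
```

At N = 700 this is about 2 x 700^3 ≈ 6.9 x 10^8 floating-point operations, enough work for distribution across processes to pay off.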
What is Parallel Computing?
In the past, all computation was performed on a single processor, which had to carry out every necessary calculation, such as arithmetic operations. This is time-consuming. To overcome this problem and make use of the powerful resources now available, parallel computing was introduced. It is a computing technique in which multiple processors or cores perform a computation by working together: a large, complex task is subdivided into smaller tasks that are distributed and load-balanced across the available processors, reducing the pressure on any single processor. This significantly reduces computation time and enables large-scale computation.
What is HPC?
HPC stands for High-Performance Computing. It combines high-performance hardware with parallel computing, using all available resources to perform calculations simultaneously. As a result, computation time is reduced, and tasks that would have been unthinkable a few decades ago can be completed in a fraction of the time. Today this technology is used in a variety of fields such as weather forecasting, computer vision, astrophysics, financial modeling, time series forecasting, and image processing. In general, it is suited to complex, computationally intensive applications that cannot be handled by conventional computing systems.
IBISCO CLUSTER:
Different architectures built from multiple nodes/servers are used for HPC, and the cluster is one of the most common. In a cluster design, N devices such as computers or servers are connected to form a single computing system. These servers, also known as nodes, work together to perform computational tasks.
The IBISCO cluster, which follows this architecture, is used for the task.
What is MPI?
The Message Passing Interface (MPI) is an application-programming interface commonly used for parallel computing. It allows multiple nodes not only to communicate but also to synchronize with each other through message passing. It is used primarily in distributed memory systems, where each node has its own local memory unit.
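A minimal example makes this concrete. The sketch below is illustrative only (not the code from the GitHub repo linked later): every non-root rank sends one integer to rank 0 through explicit message passing, which is the basic primitive everything else in MPI builds on.

```c
#include <stdio.h>
#include <mpi.h>

/* Each process learns its rank; rank 0 collects one message from every
 * other rank, illustrating explicit message passing between processes
 * that do not share memory. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        printf("Rank 0 of %d waiting for messages...\n", size);
        for (int src = 1; src < size; src++) {
            int payload;
            MPI_Recv(&payload, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("received %d from rank %d\n", payload, src);
        }
    } else {
        int payload = rank * rank;  /* arbitrary local result */
        MPI_Send(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Run with, for example, mpirun -np 4 ./hello_mpi: ranks 1 to 3 each send their payload to rank 0.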
The working of MPI can be described in the following steps; a combined sketch follows the list:
a. Initialization: The MPI library is initialized on each node of the HPC cluster.
b. Data distribution: The input data is subdivided into small units and distributed to the different processing nodes.
c. Computation: Each node performs the task assigned to it.
d. Communication: Through message passing, the nodes communicate with each other and exchange intermediate data.
e. Finalization: After the job is completed, all resources are released and made available for the next job.
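The sketch below puts the five steps together for the matrix multiplication task. It is a simplified illustration, not the repo's implementation, and it assumes N is divisible by the number of processes (real code would use MPI_Scatterv/MPI_Gatherv for uneven splits): the rows of A are scattered, B is broadcast whole, each rank multiplies its row block locally, and the result rows are gathered on rank 0.

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 700  /* must be divisible by the number of processes in this sketch */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                      /* a. Initialization */

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    int rows = N / nprocs;                       /* rows of A per process */

    double *A = NULL, *C = NULL;                 /* full matrices on rank 0 only */
    double *B    = malloc(N * N * sizeof(double));
    double *Aloc = malloc(rows * N * sizeof(double));
    double *Cloc = malloc(rows * N * sizeof(double));

    if (rank == 0) {                             /* root fills the inputs */
        A = malloc(N * N * sizeof(double));
        C = malloc(N * N * sizeof(double));
        for (int i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; }
    }

    /* b. Data distribution: each rank gets a block of A's rows; B goes to all. */
    MPI_Scatter(A, rows * N, MPI_DOUBLE, Aloc, rows * N, MPI_DOUBLE,
                0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* c. Computation: each rank multiplies its rows of A by B. */
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += Aloc[i * N + k] * B[k * N + j];
            Cloc[i * N + j] = sum;
        }

    /* d. Communication: partial results are gathered back on rank 0. */
    MPI_Gather(Cloc, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    if (rank == 0) printf("C[0][0] = %f\n", C[0][0]);  /* expect 1400.0 */

    free(B); free(Aloc); free(Cloc);
    if (rank == 0) { free(A); free(C); }
    MPI_Finalize();                              /* e. Finalization */
    return 0;
}
```

Compile and run with, for example, mpicc mpi_matmul.c -o mpi_matmul && mpirun -np 14 ./mpi_matmul. Note that 14 divides 700 but 16 does not, which is exactly the case where MPI_Scatterv would be needed.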
IMPLEMENTATION:
The full implementation with the explanation of the code is posted in this GitHub repo: https://github.com/semanto-mondal/MPI
COMPUTATION TIME COMPARISON:
The plot above answers the question raised earlier. As the number of processes increases, the computation time at first decreases significantly: up to nprocs = 14, adding processes reduces the computation time, but beyond 14 the computation time starts increasing again. This phenomenon is called scaling inefficiency, and it is a common problem in parallel computing: past a threshold, increasing nprocs increases the computation time instead of decreasing it. One cause is communication overhead. As the number of processes grows, the amount of communication required between them can also grow: sending and receiving data, synchronizing the computation across processes, and managing the distribution of work. Intuitively, the per-process computation shrinks roughly as 1/nprocs while the communication cost grows with nprocs, so beyond some point the added communication outweighs the saved computation.
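For reference, timings in an experiment like this are typically collected with MPI_Wtime. The pattern below is a sketch of one common approach (the repo may measure differently): synchronize all ranks, time the parallel section, and report the slowest rank's time, since the slowest process determines the wall-clock cost of the step.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);      /* start all ranks together */
    double t0 = MPI_Wtime();

    /* ... scatter, local multiply, gather (as in the sketch above) ... */

    double local = MPI_Wtime() - t0;  /* this rank's elapsed time */
    double elapsed;
    /* MPI_MAX: the slowest rank defines the parallel step's duration */
    MPI_Reduce(&local, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("elapsed = %f s\n", elapsed);

    MPI_Finalize();
    return 0;
}
```

Repeating this for nprocs = 1 through 16 produces the kind of timing curve discussed above.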
CONCLUSION:
In general, MPI uses only CPUs to perform the computations, and CPUs are comparatively slower than GPUs for this kind of workload. To make MPI use GPUs, we need CUDA-aware MPI libraries, which exploit both CPUs and GPUs. Alternatively, for more complex and computationally expensive problems it can be better to use CUDA directly, the parallel computing platform and programming model developed by NVIDIA. These approaches also apply to image processing and other machine-learning workloads that involve many tasks and large amounts of data. HPC plays an important role in fields such as quantum chemistry, training deep learning models on GPUs, climate modeling, and satellite image processing.
"Master's in Data Science at Federico II | Analytical Thinker | Future-Ready Data Scientist"
10 个月Interesting!
Climbing the Maslow's Pyramid | Roche | Machine Learning Developer | Productionizing RAG-LLM
10 个月very interesting proof!