Introduction to High Performance Computing (HPC)

Definition

HPC refers to the use of supercomputers and parallel processing to perform complex calculations at lightning speeds. Unlike traditional computers, HPC systems integrate thousands (or even millions) of processing cores, working collaboratively to handle massive datasets and execute intricate simulations.

Use cases

The most important use case for HPC is simulation, because it enables researchers and industries to replicate complex real-world processes in a virtual environment. This helps scientists reduce risk and control costs. Examples of simulations that rely on HPC include:

  1. Predicting weather and climate
  2. Understanding space
  3. Optimization of algorithms

Examples and Implementations

There are two primary implementation models: shared memory and message passing. Both have their unique strengths and applications, depending on the architecture and scale of the computing system.

  1. Shared memory

Shared memory is a model in which multiple threads (or processes) communicate and cooperate by reading and writing the same memory space.


This is one of the most efficient methods for single-node machines and has little overhead. However, shared memory is limited by the physical memory of the machine, and it does not scale easily to systems with multiple nodes. In addition, developers must implement proper synchronization to avoid race conditions and ensure data consistency, which adds complexity to any code that works with shared variables.

The best-known shared memory implementation library is OpenMP.

Example: Implementation of parallel summation on an array

#include <stdio.h>
#include <omp.h>

/* Sums nums[begin-1 .. end-1] (1-based, inclusive bounds) using OpenMP tasks. */
int recursive_sum(int nums[], int begin, int end);

int main(int argc, char const *argv[])
{
    int A[] = {
        1, 2, 3, 4, 5, 6, 7, 8
    };

    int ans = 0;
    #pragma omp parallel
    {
        /* One thread builds the task tree; the others execute tasks. */
        #pragma omp single
        {
            ans = recursive_sum(A, 1, 8);
        }
    }
    printf("Sum of the numbers in vector is: %d\n", ans);
    return 0;
}

int recursive_sum(int nums[], int begin, int end) {
    /* Base case: three or fewer elements are summed sequentially. */
    if (end - begin <= 2)
    {
        int sum = 0;
        for (int i = begin; i <= end; i++)
        {
            int threadNum = omp_get_thread_num();
            int threads = omp_get_num_threads();
            sum += nums[i - 1];
            printf("Threads: %d, Thread num: %d, Calculated sum:%d\n",
                   threads, threadNum, sum);
        }
        return sum;
    }

    /* Split the range in half and sum each half in its own task. */
    int middle = (end - begin) / 2 + begin;
    int left = 0, right = 0;

    #pragma omp task shared(left)
    left = recursive_sum(nums, begin, middle);

    #pragma omp task shared(right)
    right = recursive_sum(nums, middle + 1, end);

    /* Wait for both child tasks before combining their partial sums. */
    #pragma omp taskwait
    return left + right;
}

Output:

Threads: 12, Thread num: 10, Calculated sum:7
Threads: 12, Thread num: 9, Calculated sum:3
Threads: 12, Thread num: 9, Calculated sum:7
Threads: 12, Thread num: 0, Calculated sum:5
Threads: 12, Thread num: 0, Calculated sum:11
Threads: 12, Thread num: 8, Calculated sum:1
Threads: 12, Thread num: 8, Calculated sum:3
Threads: 12, Thread num: 10, Calculated sum:15
Sum of the numbers in vector is: 36


  2. Message passing

The message passing model is used in environments where processes are distributed across multiple machines or nodes. Unlike shared memory, message passing requires processes to explicitly send and receive data via messages, even when those processes are running on the same physical machine.


Message passing is ideal for distributed systems and clusters where processes are running on different machines. It is more scalable than shared memory and works well for heterogeneous systems, where nodes may have different memory and processing capabilities.

However, since data must be physically transferred between processes, message passing typically involves more overhead than shared memory, especially over high-latency networks.

The most widely used tool for message passing is MPI (Message Passing Interface).

Example: Calculating pi using the area of a circle with a radius of 1

#include <mpi.h>
#include <math.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int n = 100, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* Rank 0 broadcasts the number of intervals to every process. */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Each process evaluates every numprocs-th interval of the
       midpoint rule for the integral of 4*sqrt(1 - x^2) on [0, 1]. */
    h = 1.0 / (double) n;
    sum = 0.0;
    for (i = myid + 1; i <= n; i += numprocs) {
        x = h * (((double) i) - 0.5);
        sum += 4.0 * sqrt(1.0 - x * x);
    }
    mypi = h * sum;
    printf("Calculated piece of pi: %f, on process: %d\n", mypi, myid);

    /* Sum the partial results onto rank 0. */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) {
        printf("Calculated pi: %.16f, Error: %.16f\n", pi, fabs(pi - PI25DT));
    }

    MPI_Finalize();
    return 0;
}

Output:

Calculated piece of pi: 0.411435, on process: 0
Calculated piece of pi: 0.401444, on process: 2
Calculated piece of pi: 0.387584, on process: 4
Calculated piece of pi: 0.379954, on process: 6
Calculated piece of pi: 0.406696, on process: 1
Calculated piece of pi: 0.395133, on process: 3
Calculated piece of pi: 0.383863, on process: 5
Calculated piece of pi: 0.375827, on process: 7
Calculated pi: 3.1419368579000082, Error: 0.0003442043102151        