High-performance Computing in C++: Open Multi-Processing (OpenMP)


Open Multi-Processing:

Let's consider the parallelization approaches. Broadly, we can think of imperative parallelization and declarative parallelization.

  1. Imperative Approach: In this case, everything is done by hand. We first separate the data set into several independent parts, then create threads (using std::thread or pthreads) to process each part separately, run the computation on each thread, and consolidate the partial outcomes into the final result. The main challenge in this approach is dividing the data and sharing it across the threads for computation. Alternatively, we can use libraries such as Microsoft PPL or Intel TBB.
  2. Declarative Approach: Here we leave our sequential code largely as it is and decorate it with compiler hints. The compiler then takes care of splitting the marked parts of the code (e.g. a for loop) across several threads.

The declarative approach is where OpenMP comes into the picture. OpenMP is a standard API for decorating code for shared-memory multiprocessing. It supports:

  • Data Parallelism (e.g. loops)
  • Task Parallelism (e.g. running a block of code in a separate thread)

Essentially, OpenMP is a compiler + library solution: we rely on the compiler to turn the directives into multithreaded code, and we use the OpenMP runtime library for operations such as querying the current thread number or the number of threads.

In C++, we use #pragma omp directives to indicate what should be parallelized. Plenty of clauses on those directives support data sharing, synchronization, scheduling, etc.

The following picture gives some details on how OpenMP works.


  • Work Sharing:

Let's see how the work-sharing constructs distribute work. omp for (omp do in Fortran) divides the iterations of a loop among the threads of the team to improve performance, and sections/section assign different code blocks to different threads. Similarly, single assigns a block to one thread of the team, which executes it once; it implies a barrier at the end of the block. One more construct is master: the block is always executed by the master thread (thread 0), and there is no barrier at the end.


  • Synchronization:

OpenMP provides the following constructs for synchronizing code execution:

  • Critical: block executed by one thread at a time
  • Atomic: the memory update in the next statement is performed atomically
  • Ordered: block executed in the same order as if the loop were sequential
  • Barrier: all threads wait until each one has reached this point
  • Nowait: removes the implied barrier at the end of a work-sharing construct, so threads can proceed without waiting for the others

Let's consider the following code for synchronization.


  • Data sharing:
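  • Shared: item is accessible by all threads simultaneously

      Variables declared outside a parallel region are shared by default, except the loop counter of a work-sharing loop, which is private

  • Private: item is thread-local; it is not initialized on entry, and its value is not available outside the parallel region
  • Firstprivate: like private, but each thread's copy is initialized with the original value
  • Lastprivate: like private, except the original variable is updated at exit with the value from the sequentially last iteration
  • Default: defines whether variables are shared by default (default(shared)) or must each be scoped explicitly (default(none))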

Refer to the following example to see the implementation.


Author: Shrikant Badiger

  • NVMe Over TCP

    NVMe Over TCP

    NVMe over TCP is enhanced feature of NVMe over Fabrics. It used the standard network stack(Ethernet) without any…

    1 条评论
  • Bazel Build for C++ Software Application

    Bazel Build for C++ Software Application

    Bazel Tool is developed by google to automate the build process. Now It's an open source and it can be used by anyone.

  • C++ Class Layout

    C++ Class Layout

    Class Layout: Only non-static data members will contribute to the size of the class object. If we have static and…

    1 条评论
  • High-performance Computing in C++

    High-performance Computing in C++

    Single Instruction Multiple Data (SIMD) Multiple core CPUs and Multithreading: Declarative(OpenMP), imperative…

  • vSocket Interface - Guest to Host Communication

    vSocket Interface - Guest to Host Communication

    vSocket: VMware vSocket provides a very similar API to the Unix Socker interface for communication. vSocket library is…

  • Custom Memory Management in C++

    Custom Memory Management in C++

    Memory Management: Process in which memory allocation and de-allocation to the variable in running program and handle…

  • Pointers in C

    Pointers in C

    Pointers in C: Pointers are fundamental parts of C Programming. Pointers provide the lots of power and flexibility in C…

  • CMake and Useful Info

    CMake and Useful Info

    CMake is an open-source tool to build, test, and package software applications. CMake provides control over the…

    1 条评论
  • Interrupt !!

    Interrupt !!

    Processors need to detect hardware activities. There are multiple solutions to detect hardware activities.

  • PXE: Preboot Execution Environment

    PXE: Preboot Execution Environment

    PXE: Preboot Execution Environment. Technology helps computers to boot up remotely through a network interface.

社区洞察

其他会员也浏览了