Rust and C++: Between Black Holes and Fractals
Reading Tim Palmer's inspiring, innovative efforts to explain the properties of our universe through chaotic physics, I came up with a toy example: rendering a black hole graphically with the renowned Mandelbrot fractal algorithm, using an admittedly childish computation. Supposing that the structure of a black hole really can be approximated by some fractal structure, we will model the toy example in both C++ and Rust and compare how the two languages deal with edge computing.
Given the real-time data coming from the imaginary black hole, let's try to be as performant as we can, with a focus on overcoming the bottlenecks, particularly those related to mutex locking and data sharing. We can employ a strategy where each thread works on a separate buffer; after processing, these buffers are merged into the final image. This approach lets us get rid of the nasty mutex locks during processing, reducing the overhead.
CODE IN C++
#include <vector>
#include <memory>
#include <thread>
#include <QImage>
#include <QPainter>
class MandelCompute {
// bla bla bla
};
void processPart(MandelCompute& task) {
task(); // Invoking the task
}
int main() {
int imgX = 4096, imgY = 2160; // High-resolution image
int Xparts = 10, Yparts = 10;
// View window in the complex plane (illustrative values; "upper" is the top-left corner)
double upperCornerX = -2.0, upperCornerY = 1.25;
double lowerCornerX = 0.5, lowerCornerY = -1.25;
double zoomLevel = 1.0; // assumed by the MandelCompute constructor
// Divide the image into a grid for parallel processing
std::vector<std::vector<std::unique_ptr<MandelCompute>>> tasks(Xparts, std::vector<std::unique_ptr<MandelCompute>>(Yparts));
std::vector<std::thread> threads;
// Create separate images for each part to avoid locking
for (int i = 0; i < Xparts; i++) {
for (int j = 0; j < Yparts; j++) {
double partXSpan = (lowerCornerX - upperCornerX) / Xparts;
double partYSpan = (upperCornerY - lowerCornerY) / Yparts;
double x1 = upperCornerX + i * partXSpan;
double y1 = upperCornerY - j * partYSpan;
double x2 = x1 + partXSpan;
double y2 = y1 - partYSpan;
int pxlX = (i == Xparts - 1) ? imgX - i * (imgX / Xparts) : imgX / Xparts;
int pxlY = (j == Yparts - 1) ? imgY - j * (imgY / Yparts) : imgY / Yparts;
// Each task works on a separate QImage object
tasks[i][j] = std::make_unique<MandelCompute>(x1, y1, x2, y2, std::make_shared<QImage>(pxlX, pxlY, QImage::Format_RGB32), 0, 0, pxlX, pxlY, zoomLevel);
threads.emplace_back(processPart, std::ref(*tasks[i][j]));
}
}
// Wait for all threads to complete
for (auto& thread : threads) {
thread.join();
}
// Combine the parts into the final image
// Each thread worked on an independent QImage buffer (tasks[i][j]->img)
QImage finalImage(imgX, imgY, QImage::Format_RGB32);
QPainter painter(&finalImage);
for (int i = 0; i < Xparts; i++) {
for (int j = 0; j < Yparts; j++) {
painter.drawImage(i * (imgX / Xparts), j * (imgY / Yparts), *tasks[i][j]->img);
}
}
// Save the final high-resolution image
finalImage.save("black_hole_fractal.png", "PNG", 100);
return 0;
}
What we have done here:
1. Separate Buffers: Each thread works on an independent QImage buffer (`tasks[i][j]->img`). This avoids the need for locking mechanisms during the processing phase.
2. Thread Management: std::thread is used directly for parallel processing; each thread is responsible for one part of the image.
3. Combining Image Parts: After all threads have completed their processing, the main thread combines the parts into a single final image.
4. Elimination of Mutex Locking: By using independent buffers and combining them at the end, mutex locking overhead is eliminated, which should improve the performance compared to the original Rust implementation that required mutexes for shared data access.
This reformulation leverages C++'s flexibility in managing memory and threads for high-performance applications, reducing overhead and potentially improving the execution speed for the given task.
Now it's the turn for Rust...
I have commented directly in the code where I found the bottlenecks, compared with the cleaner and more direct C++ implementation.
use std::sync::{Arc, Mutex};
use image::ImageBuffer;
use rayon::prelude::*;
fn mandel_compute(...) {
// Function body
}
fn main() {
let img_x = 4096;
let img_y = 2160;
let x_parts = 10;
let y_parts = 10;
let img = Arc::new(Mutex::new(ImageBuffer::new(img_x, img_y)));
/*
Bottleneck 1: Mutex locking overhead
Every thread acquires a lock before modifying the image, which can be a source of contention
and reduce parallel efficiency, especially if the locking is fine-grained
*/
(0..x_parts).into_par_iter().for_each(|i| {
(0..y_parts).into_par_iter().for_each(|j| {
let img_clone = Arc::clone(&img);
// Calculate partXSpan, partYSpan, etc.
/*
Bottleneck 2: Lock management overhead
The overhead of lock management (locking and unlocking) might impact performance compared to direct access in C++
*/
let mut img = img_clone.lock().unwrap();
mandel_compute(...); // Work on a part of the image
});
});
/*
Bottleneck 3: Arc overhead
Using Arc for shared ownership adds slight overhead for reference counting.
In high-performance scenarios, even this small overhead can be significant
*/
let img = Arc::try_unwrap(img).expect("other Arc references still alive").into_inner().unwrap();
img.save("black_hole_fractal.png").unwrap();
}
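The mandel_compute body is elided in the listing above. For reference, here is a self-contained sketch of the standard escape-time kernel such a function would implement, writing into a plain Vec<u32> instead of the image crate's buffer (the viewport bounds, buffer layout, and iteration cap below are assumptions of this sketch, not the original code):

```rust
// Standard Mandelbrot escape-time iteration for a single pixel.
// Returns the iteration count at which |z| exceeded 2, or max_iter
// if the point never escaped (i.e. it is taken to be inside the set).
fn escape_time(cx: f64, cy: f64, max_iter: u32) -> u32 {
    let (mut zx, mut zy) = (0.0_f64, 0.0_f64);
    let mut i = 0;
    while i < max_iter && zx * zx + zy * zy <= 4.0 {
        let tmp = zx * zx - zy * zy + cx;
        zy = 2.0 * zx * zy + cy;
        zx = tmp;
        i += 1;
    }
    i
}

// Fill a width x height buffer for the region [x1, x2] x [y1, y2].
fn mandel_compute(buf: &mut [u32], width: usize, height: usize,
                  x1: f64, y1: f64, x2: f64, y2: f64, max_iter: u32) {
    for py in 0..height {
        let cy = y1 + (y2 - y1) * py as f64 / height as f64;
        for px in 0..width {
            let cx = x1 + (x2 - x1) * px as f64 / width as f64;
            buf[py * width + px] = escape_time(cx, cy, max_iter);
        }
    }
}

fn main() {
    let (w, h) = (64, 64);
    let mut buf = vec![0u32; w * h];
    mandel_compute(&mut buf, w, h, -2.0, -1.25, 0.5, 1.25, 256);
    // The center pixel maps to c = (-0.75, 0), which is in the set,
    // so this prints 256.
    println!("center iterations: {}", buf[(h / 2) * w + w / 2]);
}
```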
So, what do we have here after coming back from the black hole with the data from the two space probes, one in C++ and the other in Rust? Yes, it's a kind of benchmarking test on steroids, probing the edge limits of Rust versus modern C++. Unfortunately, the Rust probe, due to latency problems, was swallowed by the black hole:
Rust's Programmatic Resources for High-Performance Computing
1. Thread Management:
- Rust uses std::thread for spawning threads, similar to C++.
- Libraries like rayon are used for data-parallel operations and can simplify the implementation of parallel algorithms. However, while rayon is efficient, it abstracts away many low-level details, which can rule out optimizations that remain possible in C++.
2. Memory and Data Sharing:
- Rust employs Arc (Atomic Reference Counted) for sharing data safely between threads. This introduces overhead due to atomic reference counting.
- Mutexes (`std::sync::Mutex`) are used for protecting shared data. This safe approach ensures data integrity, but at the cost of performance under lock contention.
- For lock-free programming, Rust provides atomic primitives and channels, but these are more complex to use correctly compared to C++'s flexibility with raw pointers and direct memory management.
3. Unsafe Code:
- Rust allows the use of unsafe code blocks to perform certain low-level operations. While this can potentially match C++'s performance, it requires a deep understanding of Rust's safety model and careful programming to avoid undefined behavior (cognitive overload for the techbros).
- The need to resort to unsafe code for certain optimizations negates some of Rust's safety advantages.
4. Image Processing Libraries:
- Rust's image crate is commonly used for image manipulation. While it is quite powerful, it might not be as optimized for performance as specialized C++ libraries, such as those used in conjunction with QImage.
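To make the channel-based alternative from point 2 concrete, here is a minimal std-only sketch (no rayon, no image crate; the function and its tile layout are this sketch's own inventions, with a stand-in computation instead of the elided mandel_compute). Each worker fills a private tile buffer and sends it back over an mpsc channel, so no Mutex is held while pixels are being computed:

```rust
use std::sync::mpsc;
use std::thread;

// Spawn one worker per tile; each fills a private buffer and sends it
// back over a channel. Returns the assembled
// (tiles_x * tile) x (tiles_y * tile) image as a flat row-major Vec.
fn render_tiled(tiles_x: usize, tiles_y: usize, tile: usize) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    for i in 0..tiles_x {
        for j in 0..tiles_y {
            let tx = tx.clone();
            thread::spawn(move || {
                // Stand-in for the elided mandel_compute: tag every pixel
                // of this tile with the tile's index.
                let buf = vec![(i * tiles_y + j) as u32; tile * tile];
                tx.send((i, j, buf)).unwrap();
            });
        }
    }
    drop(tx); // close the channel so the receive loop terminates

    let width = tiles_x * tile;
    let mut image = vec![0u32; width * tiles_y * tile];
    for (i, j, buf) in rx {
        // Copy the tile row by row into its place in the final image.
        for row in 0..tile {
            let dst = (j * tile + row) * width + i * tile;
            image[dst..dst + tile].copy_from_slice(&buf[row * tile..(row + 1) * tile]);
        }
    }
    image
}

fn main() {
    let image = render_tiled(4, 4, 16);
    println!("assembled {} pixels", image.len());
}
```

The receive loop doubles as the merge step, mirroring the QPainter pass in the C++ version.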
Why These Resources Still Fall Short Compared to C++
Even with these resources, Rust cannot achieve the same level of performance as the C++ implementation for edge scenarios:
1. Fine-Grained Control: C++ allows more direct control over hardware and memory. This level of control is crucial even in scenarios like our toy example, where manipulating image data at a very low level can lead to significant performance gains.
2. Overhead of Safety Features: While Rust's safety features are valuable for many applications, they introduce certain overheads. In extremely performance-critical applications, even small overheads can be significant.
3. Complexity of Unsafe Optimizations: While Rust's unsafe block can be used to bypass some safety checks for optimization, it increases the complexity of the code and the risk of introducing subtle bugs. This complexity deters developers from using unsafe to achieve the necessary optimizations.
4. Abstraction Layers: Rust's libraries, though efficient, add layers of abstraction that can obscure potential optimizations. These abstractions make the language more ergonomic and safe, but they limit the ability to fine-tune performance.
So, summing up, my techbros:
In high-performance computing, especially in scenarios demanding low latency and high throughput, the overhead imposed by Rust's safety and abstraction layers can be significant. While Rust is undoubtedly powerful and capable of high performance, C++'s less restrictive nature allows for more direct interaction with hardware and memory, providing optimizations that are challenging to replicate in Rust without compromising on the language's core safety principles.
Rust accelerationist
8 months ago: Why did you set up multiple buffers in the C++ variant and join them after the computation finished, while using a shared buffer behind a Mutex in Rust?
Senior Blockchain Developer at DLabs.hu
8 months ago: Rust really shines when you cannot implement the whole system yourself alone. It suggests clear distinctions between business logic and synchronization mechanisms. It is much easier to do code reviews if the unsafe synchronization code is separated into its own crate and the type system encodes the assumptions made by that crate. But yeah, if you are clever and work alone, you might feel limited by having to explain your assumptions to the compiler, which feels similar to working together with someone else.
Python & OpenSource | The DM button is always near ^^_
8 months ago: Arc in Rust is funny. Just mentioning it evokes overhead. Arc is conceptually interesting but feels like a bore to use.