Comparing copying performance in C++, Golang and Rust
When implementing functions, algorithms, or entire applications, programmers care about factors such as code size, memory usage, and execution speed. Other criteria, such as maintainability, extensibility, and readability, also play a role. It's important to note that a trade-off often exists among these factors.
While C++ provides numerous features that allow programmers to express their ideas effectively, there are cases where this expressive code doesn't compile down to the most efficient result. I recently compared various methods of copying data from one buffer to another in C++, Go, and Rust.
I found this comparison intriguing for two reasons: 1) I've been advised that using raw pointers in C++ is generally discouraged in favour of smart pointers and standard algorithms; 2) given that many high-load, low-latency projects are developed in Go and Rust, I'm curious whether those projects suffer any performance penalty. In short, I want to know whether there is still a valid case for raw pointers in C++, and how suitable Go and Rust are for high-load projects.
I've chosen a very straightforward use case: copying data from one buffer to another. I used the default optimisation settings of the g++, Go, and Rust compilers. While this setup is certainly insufficient for drawing a final conclusion, it serves as a foundational starting point.
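For reference, explicitly optimised builds would look roughly like this (the source file names here are placeholders; the numbers below were measured with the defaults, not with these flags):

g++ -O2 -std=c++17 copy_bench.cpp -o copy_bench
go build copy_bench.go        (Go applies its standard optimisations by default)
rustc -O copy_bench.rs        (or, with Cargo: cargo build --release)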
Here's the C++ code that measures the performance of copying a buffer using the STL's std::copy and memcpy. std::copy can be used both with the std::vector container and with raw pointers; I examined both scenarios out of sheer curiosity.
#include <vector>
#include <iostream>
#include <algorithm>   // std::copy
#include <iterator>    // std::back_inserter, std::begin, std::end
#include <cstring>     // std::memcpy
#include <chrono>

namespace m = std::chrono;
double copy_stl() {
    std::vector<int> v;
    std::vector<int> v2(v);
    for (int i = 0; i < 10000; ++i)
        v.push_back(i);
    auto start = m::high_resolution_clock::now();
    // back_inserter grows v2 during the timed copy
    std::copy(v.begin(), v.end(), std::back_inserter(v2));
    auto stop = m::high_resolution_clock::now();
    auto d = m::duration_cast<m::microseconds>(stop - start);
    return d.count();
}
double copy_stl_2() {
    int m1[10000];
    int m2[10000];
    for (int i = 0; i < 10000; ++i)
        m1[i] = i;
    auto start = m::high_resolution_clock::now();
    std::copy(std::begin(m1), std::end(m1), std::begin(m2));
    auto stop = m::high_resolution_clock::now();
    auto d = m::duration_cast<m::microseconds>(stop - start);
    return d.count();
}
double copy_raw() {
    int m1[10000];
    int m2[10000];
    for (int i = 0; i < 10000; ++i)
        m1[i] = i;
    auto start = m::high_resolution_clock::now();
    std::memcpy(m2, m1, sizeof(m1)); // size in bytes, i.e. 10000 * sizeof(int)
    auto stop = m::high_resolution_clock::now();
    auto d = m::duration_cast<m::microseconds>(stop - start);
    return d.count();
}
int main()
{
    int numIterations = 1000;
    double totalDuration {0.0};
    for (int i = 0; i < numIterations; i++) {
        totalDuration += copy_stl();
    }
    std::cout << "Time taken by stl copying: "
              << totalDuration / numIterations
              << " microseconds" << std::endl;

    totalDuration = 0.0;
    for (int i = 0; i < numIterations; i++) {
        totalDuration += copy_stl_2();
    }
    std::cout << "Time taken by stl 2 copying: "
              << totalDuration / numIterations
              << " microseconds" << std::endl;

    totalDuration = 0.0;
    for (int i = 0; i < numIterations; i++) {
        totalDuration += copy_raw();
    }
    std::cout << "Time taken by raw copying: "
              << totalDuration / numIterations
              << " microseconds" << std::endl;
}
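One caveat with this kind of micro-benchmark: none of the destination buffers are read after the timed region, so an optimising build is free to discard the copies entirely. A minimal sketch of one way to guard against that, using a helper of my own (consume and g_sink are my additions, not part of the benchmark above):

#include <cstdint>

// Volatile sink: writes to it cannot be removed by the optimiser.
volatile std::int64_t g_sink = 0;

// Reads every element of the copied range and publishes the sum through the
// volatile sink, so the preceding copy stays observable to the compiler.
template <typename It>
void consume(It first, It last) {
    std::int64_t sum = 0;
    for (; first != last; ++first)
        sum += *first;
    g_sink = g_sink + sum;
}

// Usage inside the benchmark functions, after the timed region:
//     consume(std::begin(m2), std::end(m2));
//     consume(v2.begin(), v2.end());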
As we can observe, using memcpy with raw pointers offers the most efficient copying performance. However, it's important to note that improper usage can lead to memory corruption.
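A common way to get this wrong is to forget that memcpy's third argument counts bytes, not elements. A minimal standalone illustration (not part of the benchmark above):

#include <cstring>

int main() {
    int src[10000];
    int dst[10000] = {};
    for (int i = 0; i < 10000; ++i)
        src[i] = i;

    // Wrong: copies 10000 bytes, i.e. only the first 2500 ints when int is 4 bytes.
    std::memcpy(dst, src, 10000);

    // Correct: pass the size of the whole array in bytes.
    std::memcpy(dst, src, sizeof(src));

    // Overstating the size (e.g. sizeof(src) * 2) would read and write past the
    // ends of the arrays: undefined behaviour and possible memory corruption.
    return 0;
}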
package main

import (
	"fmt"
	"time"
)

func main() {
	var arr [10000]int
	for i := 0; i < len(arr); i++ {
		arr[i] = i
	}
	var arr2 [10000]int

	numIterations := 1000
	totalDuration := time.Duration(0)
	for i := 0; i < numIterations; i++ {
		startTime := time.Now()
		copy(arr2[:], arr[:])
		endTime := time.Now()
		duration := endTime.Sub(startTime)
		totalDuration += duration
	}

	avg := totalDuration / time.Duration(numIterations)
	fmt.Printf("Average time taken for copying: %v\n", avg)
}
Moving on to the Go language: it's not unexpected that copying in Go is slower than memcpy, yet it's faster than the STL copy algorithm.
use std::time::{Duration, Instant};

fn main() {
    let mut arr: [i32; 10000] = [0; 10000];
    for (i, item) in arr.iter_mut().enumerate() {
        *item = i as i32;
    }
    let mut arr2: [i32; 10000] = [0; 10000];

    let num_iterations = 1000;
    let mut total_duration = Duration::new(0, 0);
    for _ in 0..num_iterations {
        let start_time = Instant::now();
        arr2.copy_from_slice(&arr);
        let end_time = Instant::now();
        let duration = end_time.duration_since(start_time);
        total_duration += duration;
    }

    let avg = total_duration / num_iterations as u32;
    println!("Average time taken for copying: {:?}", avg);
}
The last experiment involves Rust. Its performance is in the same ballpark as Go's, though roughly twice as fast.
As a summary, here's a table displaying the measured microseconds needed to copy a buffer of 10,000 integers across various setups:
I refrain from drawing definitive conclusions. However, it's evident that we have an array of diverse tools at our disposal for implementing solutions, allowing us to select the most suitable option for each unique scenario.