Strategies for reserving memory for std::vector
I haven't written here lately, but a post by Amir Kirsh caused me to think of a few things - so I decided to share them here as well.
??????: consider strategies for reserving memory in vectors
So the basic example that I'm shamelessly stealing from Amir is this piece of code:
??????::????????????<??????::????????????> ????????
??????(std::????????_?? ??=0; ??<??000; ++??) {
??????????.????????_????????(??????::????_????????????(??));
};
Here we generate a vector that holds the strings "0" to "999", and while in terms of functionality it is fine - adding the elements one by one causes the vector to keep reallocating memory and copy\move the elements from the old memory location to the new one. But since we know how much memory we are going to need, we can tell vector to allocate all of it in advance:
??????::????????????<??????::????????????> ????????
????????.??????????????(??000);
??????(std::????????_?? ??=0; ??<??000; ++??) {
??????????.????????_????????(??????::????_????????????(??));
};
Great, much better! If we know how much memory we are going to need we can reserve it (the amount of memory the vector has is called the capacity) in advance and save all the reallocations and copy\move operations. But what if we don't know what size the vector is going to be? Well, there are still a couple of options to consider.
Before going any further, I'm going to make some assumptions mostly so it will be easy for me to do the calculations :
So I'm going to consider three options for when we don't know exactly how much memory we are going to need.
Take you best guess
This particularly effective when you are building your vector out of another source of information with a known size. Since the cost of guessing a number which is a little too large is a small memory waste, and the cost of guessing a number which is a little too small is a large memory waste (since the vector will reallocate and double the amount of memory it uses) and a large computation overhead (to copy\move almost all the elements) - it is better to overestimate in this case.
For example, let us assume we want to create a vector of names of all the students that have a birthday this month. While we know how many students they, we don't know in advance how many were born this month - but it probably about 1/12 of them, so a slightly higher number might be a good guess, for example:
birthday_students_vec.reserve(total_number_of_students / 10);
This will skip all the unnecessary reallocations and copy\move operations. You might end up with a little memory consumption overhead, but this is likely to happen anyway if you let the vector do the reallocations on its own.
So if you can, you should take a guess as to how much memory to reserve - even if it is inaccurate. But what if you simply don't know or can't even guess? Well, there are still more options.
Reserve 32 elements
The idea here is that we can save quite a lot of reallocations in exchange for the risk that we might waste some (relatively small) amount of memory. This is based on the observation that much of the allocations happen when the vector is still rather small - and slow done as it grows in size.
领英推荐
If we look at the original example - we've had 11 memory allocations in it: form 2^0 to 2^10. If instead we were to do something like this:
??????::????????????<??????::????????????> ??????;
????????.??????????????(32);
??????(std::????????_?? ??=0; ??<??000; ++??) {
??????????.????????_????????(??????::????_????????????(??));
};
Then now we only have 6 memory allocations: from 2^5 to 2^10. We've saved nearly half of the memory allocations, and a few of the copy\move operations. The "cost" here is that if instead of 1000 elements the vector would have turned out to have something like 9 elements (or any number blow 17, really) in it, than we have wasted more memory than we otherwise would have. But since 32 is still relatively small number, that extra waste would probably also be relatively small.
Reserve a lot more than you need, and then shrink it back down
Alright, lets look at a code example and see how something like that may work
??????::????????????<??????::????????????> ??????;
????????.??????????????(7500);
??????(std::????????_?? ??=0; ??<??000; ++??) {
??????????.????????_????????(??????::????_????????????(??));
};
strs.shrink_to_fit();
So here we are allocating a lot more memory than what we will actually going to need - which is the main drawback of this strategy. It is by no means simple to assume that that is possible - so you need to check this won't cause any troubles. Another potential issue is that according to the standard, the vector doesn't have to respect the shrink_to_fit function - it is non-binding. But most implementations will, and if not it is simple to work around this limitation.
But what do we gain? Well, in terms of allocations, we have two - one when we reserve the memory at the beginning, and one when we shrink the memory back. That down from 11, and only one "unnecessary" memory allocation. Not bad if you can afford the initial memory allocation.
What about copy\move operations? Well in the original solution we had 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 such operations, for a total of 1023 operations. In the new solution we have exactly 1000 of them - we've saved 23 copy\move operations! This is not unusual, in most cases we will save on the number of operations - and in the worst case it will only cost us one extra operation. So this is pretty good from this perspective as well.
Again, the main problem with this approach is the need for a large initial memory allocation. I think that can also be remediated using a temporary vector with custom allocator - but that is a topic for another post.
Further reading:
So
Lecturer at the Academic College of Tel-Aviv-Yaffo, Co-Organizer of Core C++
3 年Great post Noam, very good points!?