In the previous post I compared two alternative ways of transforming a string to upper case, both using std::transform: one that modifies an existing string in place and one that builds a new string by appending to its end with std::back_inserter. For the second alternative I presented two implementations: one that performs an initial reservation for the newly created string and one that does not.
inline std::string to_upper_v2(std::string const & text)
{
   auto uppertext = std::string{};
   std::transform(std::begin(text), std::end(text), std::back_inserter(uppertext), toupper);
   return uppertext;
}

inline std::string to_upper_v3(std::string const & text)
{
   auto uppertext = std::string{};
   uppertext.reserve(text.size());
   std::transform(std::begin(text), std::end(text), std::back_inserter(uppertext), toupper);
   return uppertext;
}
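As an aside, passing toupper directly to std::transform is subtly risky: std::toupper has undefined behavior when its argument is a plain char with a negative value (possible for non-ASCII input on platforms where char is signed). A variant guarded with an unsigned char cast could look like this; it is a sketch for illustration (the name to_upper_safe is mine), not the code that was benchmarked:

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Variant of to_upper_v3 that avoids undefined behavior by converting
// each character to unsigned char before calling std::toupper.
inline std::string to_upper_safe(std::string const & text)
{
   auto uppertext = std::string{};
   uppertext.reserve(text.size());
   std::transform(std::begin(text), std::end(text),
                  std::back_inserter(uppertext),
                  [](unsigned char const c) { return static_cast<char>(std::toupper(c)); });
   return uppertext;
}
```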
The curious conclusion of the tests was that the version with reserve was actually slower than the one that did not perform an initial reservation.
The solution was built with Visual Studio 2015 Update 2. As later noted in the comments, the actual cause of this is the small string optimization in Microsoft's std::string implementation: strings that fit into an internal buffer of 16 characters are stored there directly, and memory is dynamically allocated only for longer strings. Since all the test strings had lengths between 3 and 12 characters, this optimization applied to every string. As a result, reserve() dynamically allocated memory that was never used, and its execution time only added to the overall time.
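The effect of the small string optimization can be observed directly through capacity(). The exact buffer size is implementation-specific (MSVC reports 15 for an empty string, the 16th character being the terminator), so the values below are an assumption about that implementation rather than a guarantee; what is portable is that a reserve() beyond the internal buffer must allocate. A quick sketch:

```cpp
#include <string>
#include <tuple>

// Report a string's capacity before any reserve, after a small reserve
// that still fits the SSO buffer, and after a reserve that exceeds it.
std::tuple<std::size_t, std::size_t, std::size_t> sso_capacities()
{
   std::string s;
   auto const initial = s.capacity();   // MSVC reports 15 here: the SSO buffer

   s.reserve(10);                       // fits the internal buffer: typically no allocation
   auto const small_reserve = s.capacity();

   s.reserve(100);                      // exceeds the buffer: forces a heap allocation
   auto const large_reserve = s.capacity();

   return { initial, small_reserve, large_reserve };
}
```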
To actually exercise the allocation behavior of these two implementations with VC++, the strings must be longer than 16 characters, so I changed the code to generate strings between 17 and 25 characters long.
auto dist_len = std::uniform_int_distribution<>{ 17, 25 };
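The rest of the generator code is not shown in the post; a sketch of how such a test set might be built (the function name make_random_strings and the lowercase-only alphabet are my assumptions, not the original code) could look like:

```cpp
#include <random>
#include <string>
#include <vector>

// Sketch of a test-data generator: 'count' random lowercase strings
// whose lengths are uniformly distributed in [17, 25].
std::vector<std::string> make_random_strings(std::size_t const count)
{
   auto eng = std::mt19937{ std::random_device{}() };
   auto dist_len  = std::uniform_int_distribution<>{ 17, 25 };
   auto dist_char = std::uniform_int_distribution<>{ 'a', 'z' };

   auto result = std::vector<std::string>{};
   result.reserve(count);
   for (std::size_t i = 0; i < count; ++i)
   {
      auto s = std::string(static_cast<std::size_t>(dist_len(eng)), '\0');
      for (auto & c : s)
         c = static_cast<char>(dist_char(eng));
      result.push_back(std::move(s));
   }
   return result;
}
```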
The results this time were quite different: the third version, with the initial reserve(), outperformed the one without it. It can also be noticed that the more strings are transformed, the closer the times of all the versions become.
| No. of strings | time v1 | time v2 | time v3 | Slowdown of v2 vs v1 (%) | Slowdown of v3 vs v1 (%) |
|---------------:|--------:|--------:|--------:|-------------------------:|-------------------------:|
| 1000      | 122      | 219      | 205      | 79.5 | 68.0 |
| 10000     | 1202     | 2178     | 2055     | 81.2 | 71.0 |
| 100000    | 13563    | 22758    | 21431    | 67.8 | 58.0 |
| 1000000   | 136160   | 225669   | 214149   | 65.7 | 57.3 |
| 10000000  | 1368034  | 2268991  | 2155969  | 65.9 | 57.6 |
| 100000000 | 23090172 | 27997658 | 27322888 | 21.3 | 18.3 |
In the chart below, the results for version 2 are shown in blue and those for version 3 (with initial reservation) in orange.
Note: generating 100 million strings of 17 to 25 characters requires a lot of memory; in my tests the usage peaked at 13 GB. Be aware of this if you want to run the code.
Compiling the code with gcc version 11.2.0 (Rev5, Built by MSYS2 project) on Windows 10, my results are:
for dist_len {3, 12}:
duration v1 (1000) = 0
duration v2 (1000) = 0
duration v3 (1000) = 0
duration v1 (10000) = 1001
duration v2 (10000) = 0
duration v3 (10000) = 0
duration v1 (100000) = 3000
duration v2 (100000) = 1953
duration v3 (100000) = 2003
duration v1 (1000000) = 29952
duration v2 (1000000) = 27020
duration v3 (1000000) = 24999
duration v1 (10000000) = 276953
duration v2 (10000000) = 257000
duration v3 (10000000) = 235970
for dist_len {17, 25}:
duration v1 (1000) = 0
duration v2 (1000) = 0
duration v3 (1000) = 984
duration v1 (10000) = 982
duration v2 (10000) = 999
duration v3 (10000) = 1006
duration v1 (100000) = 12000
duration v2 (100000) = 14010
duration v3 (100000) = 14000
duration v1 (1000000) = 108033
duration v2 (1000000) = 116949
duration v3 (1000000) = 101997
duration v1 (10000000) = 1042996
duration v2 (10000000) = 1190016
duration v3 (10000000) = 1025000
I did not run the test for 100 million strings.