In the previous post I compared two alternative ways of transforming a string to upper case, both using std::transform: one that modifies an existing string in place and one that builds a new string by appending to its end with std::back_inserter. For the second alternative I presented two implementations: one that performs an initial reservation for the newly created string and one that does not.
inline std::string to_upper_v2(std::string const & text)
{
   auto uppertext = std::string{};
   std::transform(std::begin(text), std::end(text), std::back_inserter(uppertext), toupper);
   return uppertext;
}

inline std::string to_upper_v3(std::string const & text)
{
   auto uppertext = std::string{};
   uppertext.reserve(text.size());
   std::transform(std::begin(text), std::end(text), std::back_inserter(uppertext), toupper);
   return uppertext;
}
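As an aside, passing toupper directly to std::transform is subtly risky: std::toupper has undefined behavior when its argument is a plain char with a negative value (possible for non-ASCII input on platforms where char is signed). A variant guarded with an unsigned char cast could look like this; it is a sketch for illustration (the name to_upper_safe is mine), not the code that was benchmarked:

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Variant of to_upper_v3 that avoids undefined behavior by converting
// each character to unsigned char before calling std::toupper.
inline std::string to_upper_safe(std::string const & text)
{
   auto uppertext = std::string{};
   uppertext.reserve(text.size());
   std::transform(std::begin(text), std::end(text),
                  std::back_inserter(uppertext),
                  [](unsigned char const c) { return static_cast<char>(std::toupper(c)); });
   return uppertext;
}
```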
The curious conclusion of the tests was that the version with reserve was actually slower than the one that did not perform an initial reservation.
The solution was built with Visual Studio 2015 Update 2. As later noted in the comments, the actual cause of this is the small string optimization in Microsoft's std::string implementation: strings that fit into an internal buffer of 16 characters are stored there directly, and memory is dynamically allocated only for longer strings. Since all the test strings had lengths between 3 and 12 characters, this optimization applied to every string. As a result, reserve() dynamically allocated memory that was never used, and its execution time only added to the overall time.
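The effect of the small string optimization can be observed directly through capacity(). The exact buffer size is implementation-specific (MSVC reports 15 for an empty string, the 16th character being the terminator), so the values below are an assumption about that implementation rather than a guarantee; what is portable is that a reserve() beyond the internal buffer must allocate. A quick sketch:

```cpp
#include <string>
#include <tuple>

// Report a string's capacity before any reserve, after a small reserve
// that still fits the SSO buffer, and after a reserve that exceeds it.
std::tuple<std::size_t, std::size_t, std::size_t> sso_capacities()
{
   std::string s;
   auto const initial = s.capacity();   // MSVC reports 15 here: the SSO buffer

   s.reserve(10);                       // fits the internal buffer: typically no allocation
   auto const small_reserve = s.capacity();

   s.reserve(100);                      // exceeds the buffer: forces a heap allocation
   auto const large_reserve = s.capacity();

   return { initial, small_reserve, large_reserve };
}
```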
To actually exercise the allocation behavior of these two implementations with VC++, the strings must be longer than 16 characters, so I changed the code to generate strings between 17 and 25 characters long.
auto dist_len = std::uniform_int_distribution<>{ 17, 25 };
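The rest of the generator code is not shown in the post; a sketch of how such a test set might be built (the function name make_random_strings and the lowercase-only alphabet are my assumptions, not the original code) could look like:

```cpp
#include <random>
#include <string>
#include <vector>

// Sketch of a test-data generator: 'count' random lowercase strings
// whose lengths are uniformly distributed in [17, 25].
std::vector<std::string> make_random_strings(std::size_t const count)
{
   auto eng = std::mt19937{ std::random_device{}() };
   auto dist_len  = std::uniform_int_distribution<>{ 17, 25 };
   auto dist_char = std::uniform_int_distribution<>{ 'a', 'z' };

   auto result = std::vector<std::string>{};
   result.reserve(count);
   for (std::size_t i = 0; i < count; ++i)
   {
      auto s = std::string(static_cast<std::size_t>(dist_len(eng)), '\0');
      for (auto & c : s)
         c = static_cast<char>(dist_char(eng));
      result.push_back(std::move(s));
   }
   return result;
}
```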
The results this time were quite different: the third version, with the initial reserve(), outperformed the one without it. It can also be noticed that the more strings are transformed, the closer the times of all the versions become.
| No. of strings | time v1 | time v2 | time v3 | Slowdown of v2 vs v1 (%) | Slowdown of v3 vs v1 (%) |
|---------------:|--------:|--------:|--------:|-------------------------:|-------------------------:|
| 1000      | 122      | 219      | 205      | 79.5 | 68.0 |
| 10000     | 1202     | 2178     | 2055     | 81.2 | 71.0 |
| 100000    | 13563    | 22758    | 21431    | 67.8 | 58.0 |
| 1000000   | 136160   | 225669   | 214149   | 65.7 | 57.3 |
| 10000000  | 1368034  | 2268991  | 2155969  | 65.9 | 57.6 |
| 100000000 | 23090172 | 27997658 | 27322888 | 21.3 | 18.3 |
In the chart below, the results for version 2 are shown in blue and those for version 3 (with initial reservation) in orange.
Note: generating 100 million strings of 17 to 25 characters requires a lot of memory; in my tests the usage peaked at 13 GB. Be aware of this if you want to run the code.
Compiling the code with gcc version 11.2.0 (Rev5, Built by MSYS2 project) on Windows 10, my results are:
for dist_len {3, 12}:
duration v1 (1000) = 0
duration v2 (1000) = 0
duration v3 (1000) = 0
duration v1 (10000) = 1001
duration v2 (10000) = 0
duration v3 (10000) = 0
duration v1 (100000) = 3000
duration v2 (100000) = 1953
duration v3 (100000) = 2003
duration v1 (1000000) = 29952
duration v2 (1000000) = 27020
duration v3 (1000000) = 24999
duration v1 (10000000) = 276953
duration v2 (10000000) = 257000
duration v3 (10000000) = 235970
for dist_len {17, 25}:
duration v1 (1000) = 0
duration v2 (1000) = 0
duration v3 (1000) = 984
duration v1 (10000) = 982
duration v2 (10000) = 999
duration v3 (10000) = 1006
duration v1 (100000) = 12000
duration v2 (100000) = 14010
duration v3 (100000) = 14000
duration v1 (1000000) = 108033
duration v2 (1000000) = 116949
duration v3 (1000000) = 101997
duration v1 (10000000) = 1042996
duration v2 (10000000) = 1190016
duration v3 (10000000) = 1025000
I did not run the test for 100 million strings.