A comparison of two std::transform alternatives

UPDATE: For an update on the implementation and the conclusions see A comparison of two std::transform alternatives revisited.

I was writing a small utility function to transform a string to uppercase. The obvious solution for that is std::transform, but as I was writing it I realized there are several alternatives:

  • transform an existing string, by setting its elements to uppercase one by one
  • iterate over an existing string, and insert a copy of its uppercased elements into another string, initially empty, using std::back_inserter

Obviously, the second approach should be slower since it has to deal with buffer reallocations. However, I was curious how slower comparing to the first approach that would be. So I decided to test it.

UPDATE: It has been suggested that in the second version I should make a reserve of the string before using std::back_inserter to add characters to the string. Therefore I added a 3rd version that does that.

This is how I implemented the two version different versions of the helper function:

inline std::string to_upper_v1(std::string const & text)
{
  auto uppertext { text };
  std::transform(std::begin(uppertext), std::end(uppertext), std::begin(uppertext), toupper);
  return uppertext;
}

inline std::string to_upper_v2(std::string const & text)
{
  auto uppertext = std::string{};
  std::transform(std::begin(text), std::end(text), std::back_inserter(uppertext), toupper);
  return uppertext;
}

inline std::string to_upper_v3(std::string const & text)
{
   auto uppertext = std::string{};
   uppertext.reserve(text.size());
   std::transform(std::begin(text), std::end(text), std::back_inserter(uppertext), toupper);
   return uppertext;
}

To test it, I decided to randomly generate strings. The length of the strings and their content is randomly generated. Both functions are tested with the same strings after a copy is initially done.

void test_v1(std::vector<std::string>& strings)
{
   auto start{ std::chrono::high_resolution_clock::now() };
   for (auto& s : strings)
   {
      to_upper_v1(s);
   }

   auto end{ std::chrono::high_resolution_clock::now() };
   auto duration{ std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() };
   std::cout << "duration v1 (" << strings.size() << ") = " << duration << std::endl;
}

void test_v2(std::vector<std::string>& strings)
{
   auto start{ std::chrono::high_resolution_clock::now() };
   for (auto& s : strings)
   {
      to_upper_v2(s);
   }

   auto end{ std::chrono::high_resolution_clock::now() };
   auto duration{ std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() };
   std::cout << "duration v2 (" << strings.size() << ") = " << duration << std::endl;
}

void test_v3(std::vector<std::string>& strings)
{
   auto start{ std::chrono::high_resolution_clock::now() };
   for (auto& s : strings)
   {
      to_upper_v3(s);
   }

   auto end{ std::chrono::high_resolution_clock::now() };
   auto duration{ std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() };
   std::cout << "duration v3 (" << strings.size() << ") = " << duration << std::endl;
}

int main()
{ 
   auto seed_data = std::array<int, std::mt19937::state_size> {};
   std::random_device rd;

   std::generate(std::begin(seed_data), std::begin(seed_data), std::ref(rd));
   std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
   auto eng = std::mt19937{ seq };
   auto dist_len = std::uniform_int_distribution<>{ 3, 12 };
   auto dist_char = std::uniform_int_distribution<>{ 0, 25 };

   auto strings = std::vector<std::string>{};
   strings.reserve(100000000);
   for (auto i = 0; i < 100000000; ++i)
   {
      auto length = dist_len(eng);
      auto text = std::string(length, '\0');
      std::generate(std::begin(text), std::end(text), [&dist_char, &eng]() {return 'a' + dist_char(eng); });

      strings.push_back(text);
   }

   auto counts = {1000, 10000, 100000, 1000000, 10000000, 100000000};
   for (auto count : counts)
   {
      {
         auto v1 = std::vector<std::string>(std::begin(strings), std::begin(strings) + count);      
         test_v1(v1);
      }

      {
         auto v2 = std::vector<std::string>(std::begin(strings), std::begin(strings) + count);
         test_v2(v2);
      }

      {
         auto v3 = std::vector<std::string>(std::begin(strings), std::begin(strings) + count);
         test_v3(v3);
      }
   }

   return 0;
}

The results, tested with a 64-bit release build with Visual Studio 2015 Update 2, look like below. Times are in microseconds.

No of strings time v1 time v2 time v3 Percentage of slowdown with v2 Percentage of slowdown with v3
1000 40 57 64 42.5 60
10000 593 568 637 42.5 53.1
100000 3894 5769 6497 48.2 66.8
1000000 40005 57852 65793 44.6 64.5
10000000 394573 584048 734463 48 86.1
100000000 4298742 6171199 7577972 43.6 76.3

I have run this several times with similar results. The following image shows how much slower the versions using std::back_inserter were comparing with the version that modifies the string directly. With blue it is represented the results for version 2 and with orange the results for version 3 (with initial reservation).
transform

This clearly indicates that using std::back_inserter is slower, and it is actually 30 to 60% slower. However, what has surprized me is that reserving the necessary space for the string before std::back_inserter starts inserting elements is even slower (in some cases it can take twice as much time than version 1). Of course this measures the time to allocate the buffer too, not just the time for transforming the string, but the point here is to profile the entire function, not just the transform operation.

7 Replies to “A comparison of two std::transform alternatives”

  1. The second version does more allocations. What happens if you call an uppertext.reserve(text.size()) before std::transform?

  2. Hello,

    To be fair and only measure the overhead of std::back_inserter, the to_upper_v2 should do a uppertext.reserve(text.size()) before call the transform.

  3. I have taken the feedback for using reserve() before calling transform() and the results were surprising. This version is even slower. I have triple checked to see if I made a mistake, but could not spot any.

  4. My bad, I do not spend enough time to read the test case.
    The length of the string is always < 3 and std::string in Microsoft STL have a small buffer optimization of 16 bytes. This explain why the V3 is slower, no reallocation was perform but more calls are done.

  5. Thanks for the post!

    With 40M length string on my machine, calling reserve is a bit 30% faster than not reserving, 12% slower than copying first with VS 2017 x64 release.

    Any clue why v1 is much faster than v2 and v3? Could be that inserting needs to check the size.

    I also implemented plain for loop which is about the same as reserving. The theory above doesn’t seem to work for it. Any clue?

    inline std::string to_upper_v4(std::string const & text)
    {
    auto uppertext{ text };
    int len = text.length();
    for (int i = 0; i < len; i++) {
    uppertext[i] = toupper(text[i]);
    }
    return uppertext;
    }

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.