Extension methods in C++

A few days ago Bjarne Stroustrup published a proposal paper (N4174) to the C++ standards committee called Call syntax: x.f(y) vs. f(x,y). The following excerpt from the paper summarizes the proposal:

The basic suggestion is to define x.f(y) and f(x,y) to be equivalent. In addition, to increase compatibility and modularity, I suggest we explore the possibility of ignoring uncallable and inaccessible members when looking for a member function (or function object) to call.

x.f(y) means

  1. First try x.f(y) – does x’s class have a member f? If so try to use it
  2. Then try f(x,y) – is there a function f? If so, try to use it
  3. otherwise error

f(x,y) means

  1. First try x.f(y) – does x’s class have a member f? If so try to use it
  2. Then try f(x,y) – is there a function f? If so, try to use it
  3. otherwise error
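
The "member first, then free function" rule quoted above can be approximated in today's C++ for a single, fixed function name. The following is only a sketch under stated assumptions: the names sketch::unified_f, call_f, prefer_member and fallback, as well as the two example types, are hypothetical and do not come from N4174. It relies on expression SFINAE and ADL, and it has to be written by hand for every function name, which is the boilerplate the proposal would make unnecessary.

```cpp
#include <cassert>
#include <utility>

namespace sketch {

// Ranking tags: an exact match on prefer_member beats the conversion to fallback.
struct fallback {};
struct prefer_member : fallback {};

// Step 1: viable only when x.f(y) is well-formed.
template <typename X, typename Y>
auto call_f(prefer_member, X&& x, Y&& y)
    -> decltype(std::forward<X>(x).f(std::forward<Y>(y)))
{
    return std::forward<X>(x).f(std::forward<Y>(y));
}

// Step 2: viable only when a free function f(x, y) is found by ADL.
template <typename X, typename Y>
auto call_f(fallback, X&& x, Y&& y)
    -> decltype(f(std::forward<X>(x), std::forward<Y>(y)))
{
    return f(std::forward<X>(x), std::forward<Y>(y));
}

// Step 3: if neither overload is viable, this call fails to compile.
template <typename X, typename Y>
auto unified_f(X&& x, Y&& y)
    -> decltype(call_f(prefer_member{}, std::forward<X>(x), std::forward<Y>(y)))
{
    return call_f(prefer_member{}, std::forward<X>(x), std::forward<Y>(y));
}

} // namespace sketch

// Hypothetical types used only for illustration.
struct WithMember { int f(int y) const { return y + 1; } };
struct WithoutMember {};
int f(WithoutMember const&, int y) { return y * 2; }

inline void demo_unified_call()
{
    assert(sketch::unified_f(WithMember{}, 3) == 4);    // member f is used
    assert(sketch::unified_f(WithoutMember{}, 3) == 6); // free f is used
}
```

With the proposal in place, both calls could presumably be written directly as x.f(3) or f(x, 3) without the dispatcher.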

This may sound a bit crazy, but to me it immediately shouted EXTENSION METHODS, a feature I have long wondered how to add to the language. I find this one of the most important proposals I am aware of for the evolution of the C++ language.

UPDATE: Recently I have discovered that a second paper on the same topic exists. The N4165 paper, called Unified Call Syntax, is authored by Herb Sutter. Unlike the first paper, Sutter’s paper proposes only making x.f(y) equivalent to f(x,y) and not the other way around. Here is a quote from the paper:

This single proposal aims to address two major issues:

  • Enable more-generic code: Today, generic code cannot invoke a function on a T object without knowing whether the function is a member or nonmember, and must commit to one. This is a long-standing known issue in C++.
  • Enable “extension methods” without a separate one-off language feature: The proposed generalization enables calling nonmember functions (and function pointers, function objects, etc.) symmetrically with member functions, but without a separate and more limited “extension methods” language feature. Further, unlike “extension methods” in other languages which are a special-purpose feature that adds only the ability to add member functions to an existing class, this proposal would immediately work with calling existing library code without any change. (See also following points.)

Herb Sutter argues that the unified call syntax would bring major benefits, including consistency, simplicity, and teachability, improved discoverability and usability of existing code, and better C++ tool support. However, he also explains why making f(x) equivalent to x.f() is not possible: it would break existing code.
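
To make the breakage concern more concrete, here is one hypothetical way it could arise. This is a sketch, not an example taken from either paper; the Logger type and both flush functions are invented for illustration. Today, the call flush(log) can only mean the free function. Under a rule where f(x) tries x.f() first, the same line would presumably start calling the member instead, silently changing behavior.

```cpp
#include <cassert>
#include <string>

// Hypothetical type: a member and a free function share a name
// but do different things.
struct Logger {
    std::string flush() const { return "member flush"; }
};

std::string flush(Logger const&) { return "free flush"; }

std::string call_with_function_syntax()
{
    Logger log;
    return flush(log); // today: always resolves to the free function
}

inline void demo_current_behavior()
{
    assert(call_with_function_syntax() == "free flush");
}
```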

Extension Methods in C#

I’ll take a step back for a short paragraph on extension methods in C#.

An extension method allows you to add functionality to an existing type without modifying the original type or creating a derived type (and without needing to recompile the code containing the type that is extended.)

Let’s assume you want to write a method that counts words in a text. You could write a method called WordCount that looks like this (for simplicity we’ll only consider space as a delimiter):

static class StringUtilities
{
   public static int WordCount(string text)
   {
      return text.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
   }
}

You can use it like this:

var text = "This is an example";
var count = StringUtilities.WordCount(text);

Just by changing the syntax a bit and adding the this keyword in front of the type of the first parameter (always the type you want to extend), you make the compiler treat the method as if it were part of that type.

static class StringUtilities
{
   public static int WordCount(this string text)
   {
      return text.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
   }
}

With this change we can now write:

var text = "This is an example";
var count = text.WordCount();

WordCount(text) vs. text.WordCount() is exactly what Stroustrup’s N4174 paper is proposing.

Note that extension methods in C# come with several requirements, including the following:

  • the extension method must be a static method of a non-nested, non-generic static class
  • the extension method has access only to the public interface of the extended type

Extension Methods in C++

The question one may ask is how this equivalence of x.f(y) and f(x,y) would benefit the language. My immediate answer is that it defines extension methods and enables developers to extend functionality without touching existing code.

Let’s take a real-world example. The C++ standard containers provide methods like find() to locate an element in the container. There are also generic algorithms for the same purpose (that work in a generic way on ranges defined by iterators). But these find() methods return an iterator, and you have to compare it against end() to interpret the result. With std::map, for instance, you often just need to know whether it contains a key or not. std::map does not have a contains() method, but you can easily write a helper function:

template<typename TKey, typename TValue>
bool contains(std::map<TKey, TValue> const & c, TKey const & key)
{
   return c.find(key) != c.end();
}

And with that in place you can write:

auto m = std::map<int, char> {{1, 'a'}, {2, 'b'}, {3,'c'}};
if(contains(m, 1))
{
   std::cout << "key exists" << std::endl;
}
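
The helper above is tied to std::map with default comparator and allocator. As a minimal sketch of a more general variant, the function below accepts any container that exposes key_type, find() and end(). The name contains_key is hypothetical, chosen here only to avoid clashing with the contains helper above. Because the key parameter is spelled in terms of Container::key_type, it does not take part in template deduction, so arguments convertible to the key type are accepted.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <unordered_map>

// Works for std::map, std::set, std::unordered_map and similar containers.
template <typename Container>
bool contains_key(Container const& c, typename Container::key_type const& key)
{
    return c.find(key) != c.end();
}

inline void demo_contains_key()
{
    std::map<long, char> m{{1, 'a'}, {2, 'b'}};
    std::set<std::string> s{"x", "y"};
    std::unordered_map<std::string, int> u{{"one", 1}};

    assert(contains_key(m, 1));     // int converts to long
    assert(!contains_key(m, 7));
    assert(contains_key(s, "x"));   // string literal converts to std::string
    assert(!contains_key(u, "two"));
}
```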

However, I would very much like to be able to say (because in an object oriented world this seems much more natural to me):

if(m.contains(1))
{
}

If x.f(y) and f(x,y) were equivalent, this latter code would be perfectly legal (and beautiful).

Here is a second example. Suppose you want to define some query operators like the ones available in LINQ in .NET. Below is a dummy implementation of several such operators for std::vector.

template<typename T, typename UnaryPredicate>
std::vector<T> where(std::vector<T> const & c, UnaryPredicate predicate)
{
   std::vector<T> v;
   std::copy_if(std::begin(c), std::end(c), std::back_inserter(v), predicate);
   return v;
}

template <typename T, typename F, typename R = typename std::result_of<F(T)>::type>
std::vector<R> select(std::vector<T> const & c, F s)
{
   std::vector<R> v;
   std::transform(std::begin(c), std::end(c), std::back_inserter(v), s);
   return v;
}

template<typename T>
T sum(std::vector<T> const & c)
{
   return std::accumulate(std::begin(c), std::end(c), T{}); // T{} rather than 0, so non-int element types are not truncated
}

(Many thanks to Piotr S. and Yakk for helping with the implementation of select.)
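
One caveat about select: std::result_of was deprecated in C++17 and removed in C++20. As a sketch of an alternative that also compiles with C++11, the result type can be computed with decltype and std::declval. The sketch namespace is an assumption added here to keep this variant separate from the select shown above.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// R is whatever calling s on a const element yields, with references
// and cv-qualifiers stripped.
template <typename T, typename F,
          typename R = typename std::decay<
              decltype(std::declval<F&>()(std::declval<T const&>()))>::type>
std::vector<R> select(std::vector<T> const& c, F s)
{
    std::vector<R> v;
    v.reserve(c.size());
    std::transform(std::begin(c), std::end(c), std::back_inserter(v), s);
    return v;
}

} // namespace sketch

inline void demo_select()
{
    std::vector<int> in{1, 2, 3};
    auto out = sketch::select(in, [](int e) { return e * 0.5; });
    static_assert(std::is_same<decltype(out), std::vector<double>>::value,
                  "element type follows the callable's return type");
    assert(out.size() == 3);
    assert(out[2] == 1.5);
}
```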

Those functions enable us to write code that “sums the square of the even numbers from a range” as shown below:

auto v = std::vector<int> {1,2,3,4,5,6,7,8,9};

auto t1 = where(v, [](int e){return e % 2 == 0; });
auto t2 = select(t1, [](int e){return e*e; });
auto s = sum(t2);

I don’t particularly like how the code looks. You have to capture the return value of each function even though it is only an intermediate value that you discard later.

One can improve that by using the return value from the previous call as a direct argument to the next call:

auto s = sum(select(where(v, [](int e){return e % 2 == 0; }), [](int e){return e*e; }));

However, I like this even less. First, because it is hard to tell which arguments belong to which call (and formatting it differently does not help much), and second, because it inverts the natural reading order of the operations. You first see sum, then select, and last where. Even though this is how we described it earlier (“sums the square of the even numbers from a range”), it is misleading with regard to the order in which the operations execute.

However, if x.f(y) and f(x,y) were equivalent it would be very easy to write the above code like this:

auto v = std::vector<int> {1,2,3,4,5,6,7,8,9};
auto s = v.where([](int e){return e % 2 == 0; })
          .select([](int e){return e*e; })
          .sum();

Isn’t that beautiful? I think it is.
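
Until something like this is accepted, a similar left-to-right reading order can be approximated with an overloaded operator|. The following is a minimal sketch assuming C++14 (generic lambdas and return type deduction). The pipes namespace, the stage wrapper and make_stage are hypothetical names, and like the dummy operators above it copies into a new vector at every step.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

namespace pipes {

// A stage wraps a callable that takes a vector and returns a result.
template <typename F>
struct stage { F fn; };

template <typename F>
stage<F> make_stage(F fn) { return stage<F>{std::move(fn)}; }

// c | s  simply applies the stage to the container.
template <typename T, typename F>
auto operator|(std::vector<T> const& c, stage<F> const& s) -> decltype(s.fn(c))
{
    return s.fn(c);
}

template <typename Pred>
auto where(Pred pred)
{
    return make_stage([pred](auto const& c) {
        std::decay_t<decltype(c)> out;
        std::copy_if(c.begin(), c.end(), std::back_inserter(out), pred);
        return out;
    });
}

template <typename F>
auto select(F f)
{
    return make_stage([f](auto const& c) {
        using R = std::decay_t<decltype(f(*c.begin()))>;
        std::vector<R> out;
        out.reserve(c.size());
        std::transform(c.begin(), c.end(), std::back_inserter(out), f);
        return out;
    });
}

inline auto sum()
{
    return make_stage([](auto const& c) {
        using T = typename std::decay_t<decltype(c)>::value_type;
        return std::accumulate(c.begin(), c.end(), T{});
    });
}

} // namespace pipes

// Same query as above: sum of the squares of the even numbers.
inline int sum_of_even_squares(std::vector<int> const& v)
{
    return v | pipes::where([](int e) { return e % 2 == 0; })
             | pipes::select([](int e) { return e * e; })
             | pipes::sum();
}

inline void demo_pipes()
{
    std::vector<int> v{1, 2, 3, 4, 5, 6, 7, 8, 9};
    assert(sum_of_even_squares(v) == 120); // 4 + 16 + 36 + 64
}
```

The operations now read in execution order, but every operator has to be written as a stage factory rather than a plain function, which is the kind of ceremony a unified call syntax would avoid.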

Conclusion

The N4174 paper is more an exploration of the possibilities for a uniform calling syntax than a formal proposal. Various aspects have to be considered carefully, especially how to treat f(x, y). The N4165 paper makes a good case for the uniform calling syntax, explains the benefits better, and argues against treating f(x) as equivalent to x.f(). You should go ahead and read the two papers for detailed information. I sincerely hope that one day this will be accepted and become a core feature of the C++ language.

10 Replies to “Extension methods in C++”

  1. > `std::map` does not have a `contains()` method,

    It does, actually, it’s just not called `contains()`. It’s called `count()`, and it returns the number of elements in the map which match the key, so for `std::map` it returns 0 or 1 to indicate if the map contains the desired key. For `std::multimap` it can return more than one.

    So if you really like the name `contains()` your extension method could be:

    template<typename TKey, typename TValue>
    bool contains(std::map<TKey, TValue> const &c, TKey const key)
    {
       return c.count(key);
    }

  2. Very interesting proposal! I’m curious what the rules will be for what type the first parameter has to be (i.e. T, T&, T const &).

    Interesting side effect: C functions become object-oriented. You could do things like this:
    "a string".strlen();

    I can’t decide if I’m okay with that or not 😛

    Quick note: Your example implementation of LINQ actually kind of defeats the point of LINQ, which is that it returns a filtered iterator rather than a new container (for most operations). The LINQ operators are also generic across container types. However, with lambdas (C++11), range-based for loops (C++11), auto return type (C++14), constraints (C++17?), and generators (C++17?), they would be very easy to write:

    template <typename T, typename UnaryPredicate>
    auto where(T const & c, UnaryPredicate predicate) requires(is_range<T>())
    {
        for (auto&& val : c)
            if (predicate(val))
                yield val;
    }

    template <typename T, typename F>
    auto select(T const & c, F selector) requires(is_range<T>())
    {
        for (auto&& val : c)
            yield selector(val);
    }

    C++17 is practically a new language 🙂

  3. I came to almost the same remark!

    With this proposal, plus the proposal for range iterators, we are close to being able to use the template functions from <algorithm> as member functions.

    Example:

    vector<int> v = { 1, 2, 3, 4 };
    v.transform(v.begin(), [](int i){ // std::transform
        return i + 10;
    });
    int m = v.max_element(); // std::max_element
    

    And if we go even further, what about allowing the for-range syntax for range iterators? And be able to write:

    std::transform(i : v) {
        i += 10;
    }
    
  4. This is certainly a nice way to implement extension methods, and I hope it becomes a part of C++.

    @jasonthe, I think it would be pretty neat in some cases:

        2.pow(bit_depth)
    

    and certainly not in others:

        0.max(some_int)
    

    However, I do think Herb’s argument about discoverability far outweighs the few weird cases. Now writing:

        "something".
    

    would list functions operating on strings. However I also do think his addition which would allow multicast would flood the suggestion list, making it less useful…

  5. As I mentioned in the article, that was just a dummy implementation. I could have provided just the prototypes, but I wanted to make it a bit more realistic (with what’s already available today).

  6. Reentering the world of C++ from mostly C#, this last comment from Marius makes me a bit sad. My syntax will not be as seamless and more importantly to my mind, less intuitive without this.
