Lazily evaluated folds in C++

1. Intro

In the previous post we delved into C++ fold expressions; although they are available only for a fixed set of built-in operators, we devised a way to use them with arbitrary callables, which allows us to write things like a compile-time reduction computing a sequence’s maximum:
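The gist of that technique, sketched here with illustrative names (Fv, Max and max_of are ours, not the previous post’s actual identifiers): wrap a callable and a running value in a type, overload a built-in operator for it, and let the fold expression thread the callable through the pack.

```cpp
#include <cassert>

// Illustrative sketch: a "callable vessel" that hijacks operator+
template <class F, class T>
struct Fv {
    F f;
    T val;
};

template <class F, class T, class U>
constexpr auto operator+(Fv<F, T> lhs, U rhs) {
    // applying + actually applies the stored callable
    return Fv<F, decltype(lhs.f(lhs.val, rhs))>{lhs.f, lhs.f(lhs.val, rhs)};
}

struct Max {
    template <class A, class B>
    constexpr auto operator()(A a, B b) const { return a > b ? a : b; }
};

// compile-time reduction of a sequence's maximum via a left fold over +
template <class T, class... Ts>
constexpr auto max_of(T first, Ts... rest) {
    return (Fv<Max, T>{Max{}, first} + ... + rest).val;
}

static_assert(max_of(3, 1, 4, 1, 5, 9, 2, 6) == 9, "evaluated at compile time");
```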

In this post we are going to expand on arbitrary fold expressions by incorporating proper move semantics and implementing generator-like lazy evaluation. Along the way we’ll present our take on expression templates, a technique I call flat expression templates, and examine some peculiarities of the constexpr specifier.

2. Becoming move aware

Recall that in the previous post we used a “vessel” type to serve as the node type in our expression tree? Obviously we don’t have to make a copy of our (expression) data every time we build such a node, so why not have two versions of it: one for lvalue data, which will hold a reference to the actual data, and one for rvalue data, which will “move absorb” the data into a member:
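Something along these lines, assuming the post’s O_x naming; the split into two separate class names (and the exact member set) is our simplification:

```cpp
#include <utility>

// lvalue flavour: refers to the caller's data, no copy made
template <class T>
struct O_x {
    T& data;
    T& give() { return data; }
    T& clone() { return data; }  // lvalue source: hand out a reference
};

// rvalue flavour: "move absorbs" the data into a member
template <class T>
struct O_x_own {
    T data;
    O_x_own(T&& d) : data(std::move(d)) {}
    T& give() & { return data; }
    T&& give() && { return std::move(data); }  // temporaries give the data away
    T clone() { return data; }                 // rvalue source: hand out a copy
};
```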

The type O_x is winking its right (from our perspective) eye, which according to the Taranaki Herald guide to eye flirtation means “I love you”. We’ll be respecting the order of evaluation and need to distinguish between left and right folds, so node types should be discrete for the operators to be able to correctly build the expression tree. I don’t normally use emoticons to convey messages in code, but this felt … right.

The give() member functions return references to the underlying data, and clone() is used to overcome some nuances in type deduction later on, ensuring that we get either a reference to the data (if it was an lvalue) or a copy of it (if it was an rvalue); this is useful when using O_x temporaries, to avoid references to expiring objects. It goes without saying that we have to sprinkle std::move(...) and std::forward(...) all over the place, e.g.:
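For instance, a forwarding factory might look like this (node and make_node are illustrative names, not the post’s real API; fw is the macro discussed below):

```cpp
#include <utility>

// terse forwarding macro; the __VA_ARGS__ form survives arguments
// containing commas (braced initializers, template-ids)
#define fw(...) std::forward<decltype(__VA_ARGS__)>(__VA_ARGS__)

template <class T>
struct node {
    T data;  // T deduces to U& for lvalues, to U for rvalues
};

template <class T>
constexpr auto make_node(T&& v) {
    return node<T>{fw(v)};  // equivalent to std::forward<T>(v)
}
```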

A forwarding macro like fw is a good way to be a bit more terse and write more readable code. I improved mine by adding __VA_ARGS__, as suggested by Vittorio Romeo. The complete code will be shown in compilable examples (and made available in more “official releases” later on), but for now let me stress three points:

  • Forwarding references need to be used (evaluated, passed to another function, called) with std::forward to preserve their value category.
  • std::move is like playing catch, it doesn’t work if there’s nobody on the other side. No matter how “move aware” a library is, it’ll end up copying stuff when handling non-moveable types.
  • Moving from an object places it in a valid but unspecified state, so be really thorough and explicit; even when you create dangling references or invalid state, things may appear to work, and that’s the worst kind of bug: the kind that bites after you’ve moved on.

3. Lazy evaluation

To respect their “functional heritage”, fold expressions in C++ should make a plea for lazy evaluation. Laziness is a common feature among functional programming languages, whereby expressions are evaluated only when necessary rather than upon declaration. This affects our programming style and the way we think about programs, infinity and efficiency. Below we’ll outline laziness in two programming languages and describe what’s doable in our case.

The Haskell case

Haskell implements lazy evaluation down to its expression evaluation machinery. This means that the abstract syntax tree we get for a valid program is “evaluated” according to lazy evaluation rules, which most commonly manifest themselves in the following three ways:

  • arguments to functions are evaluated only when this is necessary for evaluation to continue.
  • an argument is not necessarily evaluated fully: only the parts that are needed are examined.
  • an argument is evaluated at most once. This is done in the implementation by replacing expressions with graphs and calculating over them.

The Python case

One could argue that since Python is an interpreted language and only the code paths that are executed need to be checked in any way (as opposed to C++ failing to compile due to type errors in unevaluated contexts), you have lazy evaluation right there. But this is not the case; Python’s laziness is not built into expression evaluation. Take the following example:
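A minimal example in the spirit of the one originally shown (the body of f1 is our guess; only the call pattern matters):

```python
calls = []

def f1(x, y):
    # y is never used, yet Python evaluates both arguments eagerly
    calls.append(x)
    return x

result = f1(f1(10, 20), f1(30, 40))
# Python makes 3 calls; a lazy language would skip f1(30, 40) and make 2
```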

Here the interpreter calls f1 three times, even though the second argument to the outer f1 is never used. Haskell wouldn’t do that (it would only evaluate f1 twice); we won’t get into details, but this case falls under the first rule mentioned above.

So how can Python handle infinite lists, have lazy evaluation and perform calculation on demand? The answer lies in the use of one (and a half) patterns. We avoid calling them two patterns because of the amount of conceptual overlap between them. It all starts with the iterator pattern and implementing the iterator protocol:
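For instance (the class is ours; any class implementing the protocol would do):

```python
class Naturals:
    """An infinite sequence, produced one item at a time (iterator pattern)."""

    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):  # `next` in Python 2
        self.n += 1
        return self.n

nats = Naturals()
first_three = [next(nats) for _ in range(3)]  # [1, 2, 3]
```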

Implementing the “magic methods” __iter__ and next (or __next__ in Python 3) as shown above makes the class usable in contexts where we loop through the items it “produces”. Boilerplate code can be reduced using the generator pattern like so:
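An equivalent generator, sketched by us:

```python
from itertools import islice

def naturals():
    """Same infinite sequence; yield removes the iterator boilerplate."""
    n = 0
    while True:
        n += 1
        yield n

first_three = list(islice(naturals(), 3))  # [1, 2, 3]
```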

This pretty cool piece of code can provide “computation on demand” and has endless applications, from the mundane to the mind-boggling. Check the references for a thorough introduction.

The curious case of C++

C++ implements a special case of lazy evaluation in short circuiting, where e.g. the second argument to the || operator won’t be evaluated if the first is true: the expression yields true right away (note that both sides must still compile). The other place where we find lazy evaluation in C++ is template (meta)programming: only instantiated templates are evaluated (plus the syntax of uninstantiated templates is still checked), so in a way “if you don’t use it, it doesn’t exist”. To implement lazy evaluation for our fold expressions we’ll take inspiration from the iterator pattern, though this would be a great opportunity to apply the generator pattern via coroutines (let’s wait for things to become a bit more standard and we’ll be using yield-like constructs in C++!)
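A tiny illustration of the run-time side of this:

```cpp
// Short circuiting: if the left operand of || is true, the right operand
// is parsed and type checked but never evaluated at run time.
bool called = false;

bool touch() {
    called = true;
    return true;
}

bool demo(bool first) {
    return first || touch();  // with first == true, touch() never runs
}
```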

4. Flat expression templates

Expression templates as used in the previous post are not ideal for lazily evaluating the expressions we build, since the computation is encoded in the shape of the data structure. In order to evaluate lazily we need to be able to step through intermediate states easily and express clearly the resulting types and the advancement of our computation. I haven’t quite seen this presented the way we’ll present it here, hence my take at giving it a name: flat expression templates. That being said, I have to mention that it’s a common technique in GPU programming to “flatten out” data structures in order to make them more “stream-able”, so this is not a “virgin birth”. To illustrate what this variation of expression templates does, take for example a right fold expression, say:

(a + (b + (c + d)))

what we’ll do is convert this (the nested tree the expression implies):

          +
         / \
        a   +
           / \
          b   +
             / \
            c   d

into this (a flat sequence of nodes, consumed right to left):

    [ a, b, c, d ]
We need a type to hold the expression in a reverse polish notation fashion and we choose to store the “nodes” in a tuple to be as “compile time friendly” as possible (we’ll see later on that constexpr evaluation is still on the table).
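A rough, self-contained sketch of such a type (the real O_Om keeps mutable state and a nested iterator, as described below; this miniature recomputes via recursion instead, and make_flat is our own helper):

```cpp
#include <cstddef>
#include <tuple>
#include <utility>
#include <cassert>

// Sketch: the flattened expression lives in a tuple and is folded
// right to left on demand.
template <class F, class... Ts>
struct O_Om {
    F f;
    std::tuple<Ts...> expr;

    // evaluate the whole expression "right away"; usable at compile time
    constexpr auto yield() const { return fold_from<0>(); }

private:
    template <std::size_t I>
    constexpr auto fold_from() const {
        if constexpr (I == sizeof...(Ts) - 1)
            return std::get<I>(expr);  // rightmost node seeds the accumulator
        else
            return f(std::get<I>(expr), fold_from<I + 1>());
    }
};

template <class F, class... Ts>
constexpr auto make_flat(F f, Ts... ts) {
    return O_Om<F, Ts...>{f, std::tuple<Ts...>{ts...}};
}

// constexpr evaluation is indeed still on the table:
constexpr auto plus = [](auto a, auto b) { return a + b; };
static_assert(make_flat(plus, 1, 2, 3, 4).yield() == 10, "1 + (2 + (3 + 4))");
```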

The type O_Om holds the expression (in its m hand) and has both its eyes open (we’ll be using it for left folds as well). To highlight some of the code we note that:

  • state is the accumulator and has the type that results from applying the callable to the expression. If no accumulator is explicitly provided, the rightmost argument becomes it. As in Haskell, the accumulator consumes the expression from right to left (so write your callables accordingly).
  • A nested iterator type will keep track of the nodes that have been computed.
  • yield returns the result of the expression right away. It can be evaluated at compile time and is a good example of a constexpr member function that is not const: it alters the state, so it wouldn’t be accepted by C++11 compilers, which implicitly made constexpr member functions const. Luckily this is not the case in C++14 and beyond, so we can write code like this.
  • vtu::call_with_tuple_element_first is a mechanism to visit a tuple with a “runtime” index (quoted because inside yield the index is evaluated at compile time).

5. Finally, lazy folds

The last piece of the puzzle is the expression iterator, which will be a nested member of O_Om:

All this iterator needs to do is hold references to the flattened expression and the state, and keep track of the current position in the expression. Calling ++ on it advances the position in the expression and evaluates one more step, while dereferencing it returns the current state. We only implement == and < and take advantage of std::rel_ops to generate the rest. Having this in place we can write code like this:
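A self-contained miniature of the mechanism and its use, assuming nothing from the post’s actual code: the expression is a std::array rather than a tuple, the functor is fixed, and != is spelled out instead of relying on std::rel_ops:

```cpp
#include <array>
#include <cstddef>
#include <string>

// concatenates the stringification of its arguments (a JoinAc-like functor)
struct JoinAc {
    std::string operator()(std::string acc, int x) const {
        return acc + std::to_string(x);
    }
};

template <std::size_t N>
struct lazy_fold {
    std::array<int, N> expr;
    JoinAc f;

    struct iterator {
        const lazy_fold* owner;
        std::string state;       // the computation so far
        std::size_t consumed;    // how many nodes have been folded in

        // ++ advances the position and evaluates one more node
        iterator& operator++() {
            if (consumed < N)
                state = owner->f(state, owner->expr[consumed]);
            ++consumed;
            return *this;
        }
        // dereferencing returns the current state
        const std::string& operator*() const { return state; }
        bool operator==(const iterator& rhs) const { return consumed == rhs.consumed; }
        bool operator!=(const iterator& rhs) const { return consumed != rhs.consumed; }
    };

    iterator begin() const { iterator it{this, "", 0}; return ++it; }
    iterator end() const { return {this, "", N + 1}; }
};

// range-based for walks the intermediate states: "1", "12", "123"
inline std::string last_state() {
    std::string last;
    for (const auto& s : lazy_fold<3>{{1, 2, 3}, {}})
        last = s;
    return last;
}
```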

JoinAc is a functor that concatenates the stringification of its arguments, and we can loop through the expression’s states with range-based for loops since our O_Om has begin and end member functions (a traditional for or while would also be acceptable). We also showcase that compile-time evaluation is still doable by creating a constexpr variable mm using a “right away” evaluation.
Check a compilable example here

6. References