A “sorted view”

1. Intro

This installment elaborates on the creation of a “sorted_view” utility. The “References” section contains links to the complete code; throughout the article, we provide demos and snippets to showcase and explain the implementations. The section on modernizing the code contains real world applications of the following C++17 features:

  • structured bindings
  • template argument deduction for class templates

Problem description

Apply a relative sorting on two or more data sequences, i.e take the sequences

s1 = ["Michael", "Larisa", "Paavo", "Mark"]
s2 = ["Phelps", "Latynina", "Nurmi", "Spitz"]

If the goal was to sort based the ordering of s1, the expected output would be

s1 = ["Larisa", "Mark", "Michael", "Paavo"]
s2 = ["Latynina", "Spitz", "Phelps", "Nurmi"]

Related work

This is a common problem in modern packaging. I’ve answered to exactly this problem description here. The “sorting by indices” techniques I used is much better explained (and extended and prepackaged) by Raymond Chen in a great series of articles you can start reading here.

What to expect

This is an example of  how higher abstractions can give better performance. What we do, instead of delaying or minimizing expensive swap / copy / move operations (as in the sources linked so far), is to create a literate view for our sequences that will give us the desired sorted step through and avoid touching the sequences.

2. Design a sorted view

Any goals that are not immediately clear will be explained shortly after.


  • Don’t mutate the original sequences
  • Don’t make any copies of the original sequences
  • Allow custom comparators to be used for sorting
  • Easy “sorted walks”
  • Ability to use other sequences as tie breakers


The few Standard Library components we’ll use to make our life simpler:

  • A tuple to hold the (one or more) sequences
  • A vector of indices that holds the step through order
  • The sort and iota algorithms

3. Implementation

In a nutshell, our class will look like this


Since spelling out the type of our class is quite cumbersome and error prone, we’ll create a helper function to automate this:

So the tuple class member is referencing any l-value references and move constructing from r-value references.


Sorting is pretty straightforward

The I template parameter denotes the sequence we use for sorting. A lambda that uses the custom comparison function with the referenced elements of that sequence is used in the sort algorithm.


True to our “easy access” goal we provide an access method

all it does is retrieve the element of the Ith sequence at the sorted position pos.


Say we had the top ten of Olympic medalists. If we encode each column as a sequence (hence we’d have the sequences ‘first_name’, ‘last_name’ etc) this is how we’d use the sorted view class

The output is

{ Birgit-Fischer-Germany }
{ Bjorn-Daehlie-Norway }
{ Carl-Lewis-United States }
{ Jenny-Thompson-United States }
{ Larisa-Latynina-Soviet Union }
{ Mark-Spitz-United States }
{ Michael-Phelps-United States }
{ Ole Einar-Bjorndalen-Norway }
{ Paavo-Nurmi-Finland }
{ Sawao-Kato-Japan }
{ Usain-Bolt-Jamaica }

Working code can be found here.

Proving the “no copy” clause

Let’s avoid standardese and just use an example. Therein the S class explicitly forbids copy construction and allows move construction; the sorted_view class captures everything as expected.

The additional clue here is that if the two definitions of tuple_t were different, the program wouldn’t compile due to conflicts.

4. Sorting with multiple criteria

Let’s return to our original example. If we were to sort based on gold medals, we see that Mark Spitz and Carl Lewis collected an equal amount of them so we’d like to specify a tie breaker, let’s say silver medals. Nope, equal there too so  let’s fall back to bronze. This process of relative sorting with fallback is what the following method does

The Demo provides essentially the following use case

i.e. sort based on total medals (sequence 9), then gold (6), then silver (7) and finally bronze (8). Complete code can be found in the references in case you ‘re wondering how is the equality operator generated (we only pass one comparator) or how to actually apply the fallback comparisons (generic lambdas are great by the way).

5. What can we get out of c++17

I hate the “I have this hammer, are there any nails” approach. The things we’ll work on modernizing are actual shortcomings of the pre C++17 code version so don’t expect a cornucopia of features. That been said, I find the following language additions exceptional.

No more “make_stuff” stuff

Template argument deduction for class templates is here to eliminate most cases of make_something helper functions. Though they might not look like such, the following are class template instantiations:

Implicitly synthesized Deduction Guides (from existing constructors) take over the task of spelling out the class type (which in this case matches the desired behavior in terms of capturing). We can also specify our own deduction guides explicitly but this deserves its own post.  Also note that the type of the two variables declared above is different (different instantiations of the same class template).

NOTE: A nitpicker might have noticed that since the beginning the sorted view constructor didn’t use forwarding references but mere r-value references. That was so that we’d take advantage of reference collapsing rules and have the C++17 extension work out of the box in the desired fashion. If you don’t believe me make the constructor templated in the Demo below and see if template argument deduction for class templates works (spoiler … it doesn’t).

Accessible access

The way we accessed the sorted view elements was kind of awkward, mostly because the user had to be quite familiar with the class interface. Secondly because even then, unless we went the extra mile to bake “named tuples” into sorted_view, no one would be comfortable using the indexes of the inner tuple without looking at the instantiation (I used many fields in the initial example make this obvious).

Structured bindings provide a simple way to decompose a class (destructure one might say) but even better, you can do this your own way! Since destructuring the data holder of the sorted view (the inner tuple) doesn’t do much good, here’s what we’ll provide :

Every binding is an instance of class that holds pointers to a tuple element and the sorted indices

Also, this type provides a sorted access through an overloaded subscript operator:

Now this makes the code way clearer with minimum effort. Here’s what it took to provide custom structured bindings for our class:

It’s worth mentioning that ADL allows get to be written inside the utl namespace (there’s a dispute on Reddit on whether writing it inside std would be UB). Usage of the techniques mentioned in this section can be found in the following Demo

6. Extensions

There are things to consider in order to augment the functionality of the sorted view class:

  • Provide an interface so that sort could take advantage of C++17 parallel sorts if desired
  • Raymond Chen has an article on the Schwarzian transform, maybe this would be good to have
  • Custom view should be also possible. Why just sort, a lot of other stuff can be done the same way (my first attempt cluttered the interface so until I find a way to keep simple things simple, having a dedicated class is OK)
  • Provide named tuples for the C++14 implementation
  • Concepts: the elements sorted_view acts upon, have to be sortable, indexable etc and all these are constraints wonderfully expressed with concepts (you should see the errors I got while writing this w/o concepts).

On most of the points above, I’ll return with a dedicated post, until then thank you for reading.


  1. sorted_view.h
  2. sorted_view.cpp
  3. Sorting by indices – Raymond Chen
  4. How do I sort two vectors containing relative entries
  5. Template argument deduction for class templates
  6. Structured bindings

Quantifiers, metaprogramming and concepts

1. Intro

Quantification expresses the extent to which a  predicate is true over a set of elements. Two forms are the most common:

  • Universal \forall
  • Existential \exists

To define the quantifiers, let A represent an expression, and let x represent a variable. Then:

  • Universal Quantification: If we want to indicate that A is true for all possible values of x, we write “\forall\ x\; A\;“. Here \forall\ x\; is called the universal quantifier, and A\; is called the scope of the quantifier. The symbol \forall is pronounced “for all“.
  • Existential quantification: If we want to indicate that A is true for at least one value of x, we write “\exists\ x\; A\;“. This statement is pronounced “There exists an x such that A“. Here \exists\ x\; is called the existential quantifier, and A is called the scope of the existential quantifier.

We elaborate on the topic as follows: In section 2 we describe quantification in the template metaprogramming context, in section 3 we explore related standard library facilities introduced in C++17 and section 4 adds concepts to the equation. Finally, in section 5 we draw some concluding remarks.

2. Use in metaprogramming 

Metaprograms often use predicate logic in creative ways. For example, type queries in generic code are used to constrain, dispatch, specialize, activate or deactivate code at compile time. Even until C++14, expressing predicate logic at compile time was cumbersome, when handling variadic arguments, involving a helper type and crafty use of boolean logic:

The C++17 Standard allows for much simpler (almost mathematical) syntax, by folding over the logical operators:

Above we define two variable templates that become true or false depending on the value of their (boolean) template arguments. Since they’re conceptually and syntactically simpler, as well as cheaper in terms of instantiations, we’ll prefer these to built subsequent tools.

Making the Quantifiers

To define our quantifiers we need to involve the predicates, essentially the process that produces the list of boolean we feed to the above variable templates. Let P be a predicate over a type and Ts a type list, then:

These quantifiers could be used for example to check that a list of types is trivially copyable:

A toy example can be found here . Of course the quantifiers can be used inline w/o stub code like the are_tc variable template above.

Generalized Predicates – Currying metafunctions

A limitation of the code so far is that it can only be used for univariate predicates. Say we had a predicate that needed 2 or more variables to compute, how would we handle it? Here the is_smaller type checks that the size of its operands is ascending:

You might argue that “ \forall T\; \in\; A\; smaller\;” i.e. is_smaller holds for every type T in typelist A, has no meaning and you’d be right (smaller than what?). Thankfully we need to introduce a restriction which is that multivariate predicates are to be bound until only one argument position is free and we can do that either in place (by loading the needed types throughout our interfaces) or by introducing a compile time bind, essentially currying metafunctions:

With this in place here’s how to verify that a type is smaller than every type in a typelist:

3. The Standard library

The universal and existential quantifiers are defined in the C++17 Standard library as conjunction and disjunction respectively. Their main difference from what we defined so far is that they short-circuit in the expression level, i.e. not every expression needs to be well-formed if the result can be computed before their evaluation. This comes in handy in certain cases, but it implies they’ll have to be recursively implemented (I’m pretty sure the following implementation is attributed to Johnathan Wakely):

i.e. we stop inheriting when the expression has computed. In contrast, our flattened instantiations implementation excels in compile times and instantiation depth violations. Choose wisely between flat and short circuiting depending on your use case.

As an example, recursive instantiations here use an integer sequence of 18 elements to break a tuple, essentially due to its move constructor. The move constructor has a conditional noexcept that is implemented recursively. Therefore, for each template argument, a constant number of additional instantiations is required (the implementation e.g. of is_nothrow_move_constructible in terms of is_nothrow_constructible which is implemented in terms of __is_nt_constructible and so on, for 15 instantiation levels!). This means that each template argument for tuple requires 15 additional instantiation levels for this check. On top of that, 9 levels are always required (constant depth). Therefore, 17 arguments require an instantiation depth of 17*15+9 == 264 (default for clang 3.7 was 256).

Another example use of this mechanism would be to elaborate on quantification the other way around, i.e. one type vs many predicates (instead of 1 predicate vs many types we’be doing so far):

4. Concepts

Concepts are predicates. And as such a simple extension of our quantifiers can encapsulate concepts as well:

To get an idea of how this could be used check below:

or this Demo (disclaimer: only to showcase the use of quantifiers with concepts, don’t try to find particular meaning in the code).

5. Conclusions

Metaprogramming is changing. For the better. Better techniques and more expressive syntax are emerging that will allow library writers to produce more elegant code. Users of this code will have a better experience due to more comprehensive compiler errors. The price we have to pay is understanding the formalities and theory involved with this ecosystem; the bet to be won is Stroustrup’s “make simple things simple” claim, so that modern C++ can be “expert friendly” but not “expert only”.

6. References

On Quantifiers and predicate logic

Standard library documentation

Related Stack Overflow articles listed chronologically

Related code on the blog’s repository

Want a (std::) slice?

1. Intro

Whenever we use data structures that are compile time restricted in terms of extends, we allow for significant optimizations and early error detection in the expense of more cumbersome access style and code maintenance, due to the inevitable introduction of templated code. Users of such data types, algebraic (tuple, pair) or aggregate (array), breathed a sigh of relief (well… maybe not all of them) with the introduction of std::apply in C++17, that handles the boilerplate of unpacking their contents into a callable. To give an example, you should now be able to write your callable like so


  • Remain blissfully unaware of the structure that packs your arguments (the back end doesn’t need to specialize for different argument sources)
  • Use the callable with any of the aforementioned data types without special treatment (the front end needs no boilerplate to unpack the data structures)

So, for example, you can know invoke your callable like so

2. Creating slices

Handy as it may be, this ecosystem of data types and tools, leaves a lot to be desired. First and foremost, the slice operation i.e. the creation of  an entity that captures a subset of some given data. My main motivation for this was to be able to pass slices to the apply function although while researching the topic I found an old freestanding request, which I’ve just answered here (isn’t it a bummer when you invent something only to discover other people have had similar musings?) Below we define the tuple_slice operation that fills this gap and elaborate on the code:

Not to be confused with valarray’s slice, this piece of code does 3 simple things:

  • Protects from invalid instantiations with static asserts
  • Forwards the container (tuple, array, pair) + an integer sequence to the implementation function
  • The implementation function gets every element indexed by the sequence and forwards it as a tuple i.e. lvalues are referenced and rvalues are move constructed

A slice of t can be obtained like so

tuple_slice<I1, I2>(t);

Where [I1, I2) is the exclusive range of the subset.

To be able to create slices that are not zero-based, all we do is create the index sequence out of the cardinality of the desired sequence and then offset the values. This works but a helper “make_integer_range” type would be nice to have:

You can experiment with integer ranges here. Back to the main act then (using make_index_range to simplify the definition of tuple_slice is trivial so we’ll omit it), here are some example uses of tuple_slice:

3. Design considerations

One might ask “Shouldn’t a slice operation on an array create another array?” and this brings us to a generic code paradox: While trying to handle more cases we may end up being more restrictive; The extra effort to create respective make_array and forward_as_array functions is not the problem (there you go if you don’t believe me). Here are the considerations:

  • We cannot create an array of references, hence we’d fall back to creating an array of reference wrappers. Not only does this create an interface mismatch with the tuple case (we need to call get() on a reference wrapper to obtain the value) but between lvalue and rvalue arrays as well: slicing the later would give us move constructed values, hence no  get() method.
  • How would this logic extend to the pair case? Slicing a pair does not give us yet another pair (only in the dummy case where the slice is identical to the original) so the fallback of producing a tuple would be viewed as an exception. To me, the frequency of the word “exception” in the description of a design is inversely proportional to its quality.
  • Slices should handle uniformly the empty case and tuples do this .
  • If there’s a unique sliced type, pattern matching it would require minimum effort.
  • There’s always the option to create arrays out of the slices if we want to (would be interesting to see if the intermediate tuple can be optimized away) .

So, there you go … less is more

4. References

Compose and curry as folds

1. Intro

In a previous post we introduced C++17 fold expressions and described a way to extend them for arbitrary callables. Implementation details don’t matter for what we’re elaborating on here but it should be clear that (given the tools we developed) the following is possible:

(Op<F>(args) + ...)
(... + Op<F>(args))

i.e. performing a right or left fold respectively using the “F” type as the folding operator (the “+” operator is used just to generate the underlying expression templates). By “F” we mean a type that will serve as the reducing function (it generates callable instances) e.g. this:

Such tools enhance the notion of “reduce” being the ultimate production line where new algorithmic skeletons and processes can be forged sometimes to a surprising extend as we’ll see below.

2. On composition

In a nutshell composition is the pointwise application of one function to the result of another to produce a third one. Add another one to the pipeline and the mathematic notation goes something like this:

(\;f\; \circ\ g\; \circ\ y\;)\;(\;x\;)\; \equiv f\;(\;g\;(\;y\;(\;x\;)\;)\;)\;

…and so on and so forth, we should be able to compose as many functions as we want provided that the result of the “inner” functions can be a valid input to the outer.

Now, look at the mathematical notation again. What does the right hand side look like? It’s just folding over the function call operator! Written in pseudo C++ (I say pseudo because C++ folds don’t support arbitrary operators out of the box) it would look like this:

args () ...

where args is a pack of callables we want to compose. Now it all boils down to synthesizing the reducer, and composing two callables is easy enough, by creating a lambda that applies the left side to the result of calling the right side with whatever arguments you pass to it:

We’ve also added a helper compose function to hide the fold generation even though it’s good to know that it takes a right fold to do this. Having this in place you can write things like

One more thing

I might have oversimplified the definition of the Comp structure. Taking move semantics into consideration is not only a matter of efficiency but correctness as well, since preserving value category is essential when using forwarded callables. To ammend this, the returned lambda will do a perfect capture using this helper type:

This type will be used by the (C++14 generalized) lambda capture to acquire either a reference (wrapper) or a moved callable (depending on its value category). So now Comp is written as follows

3. On currying

Currying is the technique of translating the evaluation of a function that takes multiple arguments (or a tuple of arguments) into evaluating a sequence of functions, each with a single argument. In other words it could be described (nitpickers pardon me) by the production of partially applied functions something you see in C++ with certain applications of std::bind. Following the same logic as above we’ll be using the left fold pipeline with the following reducer

Now this is possible

The last piece of the puzzle is constexpr lambdas that would really allow for this to extend to synthesizing constexpr callables and this is on its way for C++17


A runnable example can be found here. Note that this particular example only runs for g++ and I’ll be diving into clang in a later endeavor.


Lazily evaluated folds in C++

1. Intro

In the previous post we delved into c++ fold expressions; despite them being available only for a set of built-in operators we devised a way to use them with arbitrary callables, that allows us to write things like a compile time reduction of sequence’s maximum:

In this post we are going to expand on arbitrary fold expressions by incorporating proper move semantics and implementing generator-like lazy evaluation. Along the way we’ll attempt our take on expression templates, a technique I call flat expression templates and examine some peculiarities of the constexpr specifier.

2. Becoming move aware

Recall that in our previous post we used a “vessel” type to serve as node type in our expression tree ? Well, obviously we don’t have to make a copy of our (expression) data every time we build such a node, so why not have two versions for it, one for lvalue data, which will hold a reference to the actual data, and one for rvalue data that will “move absorb” the data into a member:

The type O_x is winking the right (to our perspective) eye which according to the Taranki Herald guide to eye flirtation means “I love you”. We’ll be respecting the order of evaluation and need to distinguish between left and right folds so node types should be discrete for the operators to be able to correctly build the expression tree. I don’t normally use emoticons to convey messages in code, but this felt … right.

The give() member functions return references to the underlying data and clone() is used to overcome some nuances in type deduction later on, ensuring that we’ll either get a reference or a copy of the data if it was an lvalue or an rvalue respectively (useful when using O_x temporaries to avoid references to expiring objects). It goes without saying that we have to sprinkle std::move(...) and std::forward(...) all over the place, eg:

A forward macro like fw is a good way to be bit more terse and write more readable code. I improved mine by adding __VA_ARGS__ as suggested by Vittorio Romeo. The complete code will be shown in compilable examples (and made available in more “official releases” later on) but for now let me stress three points:

  • Forwarding references need to be used (evaluated, passed to another function, called) with std::forward to preserve their value category.
  • std::move is like playing catch, it doesn’t work if there’s nobody on the other side. No matter how “move aware” a library is, it’ll end up copying stuff when handling non-moveable types.
  • Moving from an object places it in a valid but unspecified state so be really thorough and explicit; even when creating dangling references or invalid state things may appear to work and that’s the worst kind of bug, the kind that bites when you’ve moved on.

3. Lazy evaluation

To respect their “functional heritage”, fold expressions in c++, should make a plea for lazy evaluation. Laziness is a common feature among functional programming languages, whereby expressions are evaluated only when necessary rather than upon declaration. This affects our programming style and the way we think of programs, infinity and efficiency. Below we’ll outline laziness in two programming languages and describe what’s doable in our case

The Haskell case

Haskell implements lazy evaluation down to its expression evaluation machinery. This means that the abstract syntax tree we get for a valid program is “evaluated” according to lazy evaluation rules, which most commonly manifest themselves in the following three ways:

  • arguments to functions are evaluated only when this is necessary for evaluation to continue.
  • an argument is not necessarily evaluated fully: only the parts that are needed are examined.
  • an argument is evaluated at most only once. This is done in the implementation by replacing expressions by graphs and calculating over them.

The Python case

One could argue that since Python is an interpreted language and solely code paths that are executed need be checked in any way (as opposed to C++ not compiling due to type errors in unevaluated contexts), you have lazy evaluation right there. But this is not the case; Python’s lazy evaluation is not built into expression evaluation, take the following example:

Here the interpreter calls f1 three times, even though the second argument to f1 is not evaluated. Haskell wouldn’t do that (it would only evaluate f1 twice); we won’t get into details, but this case falls under the first rule mentioned above.

So how can Python handle infinite lists, have lazy evaluation and perform calculation on demand ? The answer lies in the use of 1 (and a half) pattern. We avoid calling them 2 patterns because of the amount of conceptual overlapping between them. It all starts with the iterator pattern and implementing the iterator protocol:

Implementing the “magic methods” __iter__ and next (or __next__ in Python3) as shown above makes the class usable in contexts where we loop through the items it “produces”. Boilerplate code can be reduced using the generator pattern like so:

This pretty cool piece of code can provide “computation on demand” and has endless applications, from mundane ones to mind boggling. Check the references for a thorough introduction.

The curious case of C++

C++ implements a special case of lazy evaluation in short circuiting, where e.g. the second argument to the || operator won’t be evaluated if the first is true : the expression will yield true right away (note that both sides must compile). The other place where we find lazy evaluation in c++ is template (meta)programming : only instantiated templates are evaluated (OK plus the syntax of unevaluated templates is checked) so in a way “if you don’t use it, it doesn’t exist”. To implement lazy evaluation for our fold expressions we’ll be inspired by the iterator pattern, though this would be a great opportunity to apply the generator pattern via coroutines (let’s wait for things to become a bit more standard and we’ll be using yield-like constructs in C++ !)

4. Flat expression templates

Expression templates as used in the previous post are not ideal for lazy evaluating the expressions we build since the content is embedded in the form of the data structure. In order to lazily evaluate we need to be able to step through intermediate states easily and express clearly the resulting types and the advancement of our computation. I haven’t quite seen this the way we’ll be presenting it here, hence my take at giving it a name : flat expression templates. That been said I have to mention that it’s a common technique in GPU programming to “flat out” data structures in order to make them more “stream-able”, so this is not a “virgin birth”. To illustrate what this variation of expression templates does take for example a right fold expression, say:

(a + (b + (c + d)))

what we’ll do is convert this:


into this


We need a type to hold the expression in a reverse polish notation fashion and we choose to store the “nodes” in a tuple to be as “compile time friendly” as possible (we’ll see later on that constexpr evaluation is still on the table).

The type O_Om holds the expression (in its m hand) and has both its eyes open (we’ll be using it for left folds as well). To highlight some of the code we note that:

  • state is the accumulator and has the type of applying the callable on the expression. If no accumulator is explicitly provided the rightmost argument becomes it. As in Haskell the accumulator consumes the expression from the right to left (so write your callables accordingly).
  • A nested iterator type will keep track of the nodes that have been computed
  • yield returns the result of the expression right away. It can be evaluated at compile time and it’s a good example of a constexpr member function not being const since it alters the state and wouldn’t be accepted by C++11 compilers that implicitly made constexpr member functions const. Luckily this is not the case in C++14 and beyond so we write code like this.
  • vtu::call_with_tuple_element_first is a mechanism to visit a tuple with a “runtime” index (quoted because inside yield the i is evaluated at compile time)

5. Finally, lazy folds

The last piece of the puzzle is the expression iterator, which will be a nested member of O_Om:

All this iterator needs to do is hold references to the flattened expression and the state and keep track of the current position in the expression. Calling ++ on it advances the position on the expression and evaluates it, while dereferencing it returns the current state. We only implement == and < and take advantage of std::rel_ops to generate the rest. Having this in place we can write code like this:

JoinAc is a functor that concatenates the stringification of its arguments and we can loop through the expression state with range based for loops since our O_Om has begin and end member functions (a traditional for or while would also be acceptable). We also showcase that compile time evaluation is still doable by creating a constexpr variable mm using a “right away” evaluation.
Check a compilable example here

6. References

Fun with folds

The functional genome

 A fold is a higher order function (a function that has one or more function parameters and/or returns a function) that is recursively applied over a data structure. An easy to grasp example of such a function can be found in Haskell :

foldl (operator) accumulator [...]

Here are the building blocks of the above expression :

  • foldl : A left fold; left as in leftwise applied (…not a leftover fold) – more on this later.
  • (operator) : A binary function
  • accumulator : A starting value
  • […] : A list to fold

 So, say you want for example to sum the (list of) numbers from 1 to 10 :

foldl (+) 0 [1..10]

 This expands to :

( ( ( 1 + 2 ) + 3 ) + ... + 10 )

 Does the power of folds boil down to accumulations? Far from it!


 Folding can be used creatively to perform all sorts of actions, algebra preserving ones:

foldl (max) 0 [1, 20, 2, 14]

or catamorphisms

foldl (Stringify) "" [1, 20, 2, 14]

here we assume that Stringify is a function (you’d need a type class for this) that maps its input to a stringified concatenation, so that we’d get “120214” and have a numeric sequence reduced to a string.

Folds in a C++ Context

 C++17 will have fold expressions! Clang supports them since 3.6 as does the trunk version of gcc. The syntax differs from Haskell and is designed to play well with variadic templates. A fold expression can be written like this:

(args + ...)

 This is a (unary) left fold, because we “anchor” the application of the operator to the left, ie it expands as:

((args$0 + args$1) + ...) + args$n

 If we were to create a right fold :

(... + args)

 the expression expands like this :

args$0 + (... + (args$n-1 + args$n))

 We used the term unary previously not to describe the operator (+) but the accumulator. When an accumulator is explicitly specified

(args + ... +   an)
(  a0 + ... + args)

we dub the fold expression as “binary”. For unary folds the identity value of the operator’s algebra is implicitly used, i.e for ‘+’ a zero is added, for ‘*’ the result is multiplied by 1 etc, so that empty parameter packs have meaningful folding.

Show me some folding

A for_each lambda

auto ForEach = [](auto&& fun, auto&&... args) {
    (fun(args), ...);

To showcase this, let’s assume the existence of a printer functor

struct Printer 
    template <class T> 
    void operator()(T&& arg) { std::cout << arg; }

we can call it on arbitrary sequences like this

ForEach(Printer{}, 1, " to the ", 2, " to the ", 3, '\n');

ForEach applies an arbitrary callable to variadic arguments by folding with the , operator. It also does this in a left to right fashion, which is the associativity of the comma operator. Note that if a right fold was used inside ForEach we’d still be applying the callable from left to right (the associativity of comma does not change). Run the code here.

Summing the contents of an std::array at compile time

namespace detail
	template <class... Ts>
	constexpr auto sum_(Ts&&... args)
		return (args + ...);

	template <typename T, size_t N, size_t... Is>
	constexpr T sum_impl(array<T, N> const &arr, 
		return sum_(get<Is>(arr)...);

template <typename T, size_t N>
constexpr T sum(array<T, N> const &arr)
    return detail::sum_impl(
        arr, make_index_sequence<N>{});

int main()
    constexpr array<int, 4> arr{ { 1, 1, 2, 3 } };
    cout << integral_constant<int, sum(arr)>{};

The core function here is detail::sum_ which uses fold expressions to calculate the total of a sequence.

A more modern summing style

The above is OK and all, but C++17 will give us the tools to be way, way more terse. In this case the function apply abstracts away the process of unfolding an algebraic data type into a function call (what we manually do, in a total of three functions, with the index sequence above). Having this in place, all it takes is to encapsulate our fold :

struct Summer
    template <class... Ts>
    constexpr auto operator()(Ts&&... args) { 
        return (args + ...); 

and then call

apply(Summer{}, arr)

Sadly the current proposal for apply only works for tuples. It takes a minor modification to have it working for std::array as well, specifically the use of a mechanism to obtain the valency of the input data type (for now tuple_size is “hardcoded” into apply). Click here for a demo.

Folding arbitrary functions

A somewhat limiting feature of c++ fold expressions, is that they’re only available for certain operators:

        +  -  *  /  %  ^  &  |  ~  =  <  >  <<  >>
        +=  -=  *=  /=  %=  ^=  &=  |=  <<=  >>=
        ==  !=  <=  >=  &&  ||  ,  .*  ->*

So doing

args Op ...

is not allowed for arbitrary operators. We are going to create a mechanism to bypass this. To solve this we are going to use expression templates, which is the first thing that came to mind when looking at Haskell’s fold diagram, due to its resemblance to Veldhuizen’s expression parse trees:


The code below creates add-hoc expression templates for arbitrary operators using thin wrapper types that can be parametrized by the operant/operator pair.

First thing we need is a wrapper type that apart from data, records the type of the operant we want to use:

template <class F, class T>
struct XX
	T data;

	template <class D>
	constexpr XX(D&& data)
		: data(std::forward<D>(data))
	{ }

	constexpr T give() const
		return data;

Then we create an expression class that records the operants and the type of the operation we want to perform.

template <class L, class F, class R>
struct YY
	L l;
	R r;

	template <class LL, class RR>
	constexpr YY(LL&& l, RR&& r)
		: l(std::forward<LL>(l))
		, r(std::forward<RR>(r))
	{ }

	constexpr auto give() const
		return F{}(l.give(), r.give());

We’ll be using the + operator in our ad-hoc expression templates so we overload it to perform expression parsing for our wrapper class (the two versions are created in way that satisfies the types involved in a left to right evaluation):

template <class F, class R, class T>
constexpr auto operator+(
    XX<F, T> const& lhs, R const& rhs)
	return YY< XX<F, T>, F, R >(lhs, rhs);

template <class F, class T1, class T2, class R>
constexpr auto operator+(
    YY<T1, F, T2> const& lhs, R const& rhs)
	return YY< YY<T1, F, T2>, F, R >(lhs, rhs);

Finally the interface functions use the above machinery:

namespace detail
	template <class... Ts>
	constexpr auto _foldl(Ts&&... args)
		return (args + ...);

	template <class... Ts>
	constexpr auto _foldr(Ts&&... args)
		return (... + args);

template <class F, class... Ts>
constexpr decltype(auto) foldl(Ts&&... args)
	return detail::_foldl(XX<F, Ts>(args)...).give();

template <class F, class... Ts>
constexpr decltype(auto) foldr(Ts&&... args)
	return detail::_foldr(XX<F, Ts>(args)...).give();

What we get out of this, is a foldl and a foldr function that, much like the Haskell counterparts, can be parameterized by the type of a binary operator. And then things get interesting :

// concatenate the stringification of sequence elements
    std::string(" bird in the hand, is worth "), 
    std::string(" in the bush"));

// create a vector out of a sequence
auto k = foldl<Vettore>(1, 2, 30, 12);

// find the maximum element of a sequence
foldl<Max>(1, 20, 3, 5);

The complete code for the above examples can be found here.

Hopefully this is clear enough to proceed to the next step, providing some syntactic sugar. We didn’t get through all this ET trouble to end up with something compile time recursion could (kind of, almost in let’s say a similar way) handle. What we aimed for is the ability to write customizable C++ folds ourselves. Here’s how :

template <class F, class T>
XX<F, T> Op(T const& val)
    return XX<F, T>(val); 

So when using this “operator generator” we’d be writing things like

(Op<Max>(args) + ...)


1. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4191.html
3. http://en.cppreference.com/w/cpp/language/fold
4. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=
5. Extension to std::experimental::apply
6. Arbitrary folds
7. All the fold examples in one place
8. https://www.youtube.com/watch?v=HGjGR9DfWSI

Lambda hackery: Overloading, SFINAE and copyrights


 A “bad marketing” regarding lambdas in C++ was the use of the term “functions”. Ofcourse connaisseurs talked about “lambdas” “closures” or “lambda expressions” since lambdas are never quite functions in the codegen sense and we certainly can’t think of them as such in language semantics terms; yet this misconception can trigger a quest for features we normally find in plain or templated functions; the first and second sections of this post elaborate on this.

1. Lambda Overloading 

If we think of lambdas as functions we’d might make an attempt to overload them. This attempt is easy to rebut by stating that lambdas are closures ie runtime objects and well … objects, even callable ones, do not overload! What is interesting here though, is that even lambdas “translatable” to functions (usually non capturing closures) when compiled have an intermediate representation of an object with at least 3 member functions : a function call operator, a conversion operator and one static member function, that simply invokes the lambda with this set to zero; it would take some preconditions being satisfied (like the actual closure object not being used) to optimize the closure away and end up with a plain function call. What I mean by “can’t think of them as such (functions) in language semantics terms” is that even in this case we can’t expect this to work :

auto f = []( int){ cout << __PRETTY_FUNCTION__ << endl; }; 
auto f = [](char){ cout << __PRETTY_FUNCTION__ << endl; }; 

f( 1 ); 

much like we can’t overload a function pointer: it’s always an ‘already declared‘ error. Since the title promises overloading, we’d better not call it a day yet. The closest thing to a lambda that can overload is its function call operator, so you might already had your “aha!” moment by now. If not, here it is :

Create an object out of the lambdas you want to overload and make their function call operators overloaded versions of the superobject’s function call operator.

The idea is often attributed to Mathias Gaunard as well as the following implementation:

template <class F1, class F2>
struct overload_set : F1, F2
    overload_set(F1 x1, F2 x2) : F1(x1), F2(x2) {}
    using F1::operator();
    using F2::operator();

template <class F1, class F2>
overload_set<F1,F2> overload(F1 x1, F2 x2)
    return overload_set<F1,F2>(x1,x2);

The code can be described in a few simple steps

  • overload_set is the superobject’s type; it inherits from the closure types we want to overload
  • the closure types’ function call operators are explicitly made visible to the scope of the class
  • overload is a convenience function that returns an overload set

usage is straighforward :

auto f = overload (
    []( int) { cout << __PRETTY_FUNCTION__ << endl; }, 
    [](char) { cout << __PRETTY_FUNCTION__ << endl; }

f( 2 );

The names of the function call operators can be made available through ADL (argument dependent lookup) but the using declarations of the original implementation were left unharmed since they make the code more explanatory. We’ll be removing them when we want to scale to arbitrary number of overload candidates (industrial-strength version is provided in the References section, link No 2):

template <class... F>
struct overload_set : F... 
  overload_set(F... f) : F(f)... {}  
template <class... F>
auto overload(F... f) 
  return overload_set<F...>(f...);   

Overloading lambdas gives you the full power of closures plus Ad-hoc polymorphism ie make lambdas (actually the superobject) behave differently for each type. You can imagine this being useful when writing a for_each algorithm for tuples of different types; then the functor you pass to the algorithm can handle the different elements and be defined right at the location where it is called. In the following example we assume we have a function for_each_in_tuple that visits every element in a tuple; we can then define “in situ” a mutator func (an overload set of lambdas) that will transform the tuple in ways that vary depending on the element type:

int main()
    auto func = overload (
        [](int &val)    { val *= 2;   }, 
        [](string &arg) { arg += arg; }, 
        [](char &c)     { c    = 'x'; }
    tuple<int, string, char> tup {1, "boom", 'a'};
    for_each_in_tuple(func, tup); 
    cout << get<0>(tup) << endl 
         << get<1>(tup) << endl 
         << get<2>(tup); 

(Run the example here)


 According to common knowledge generic lambdas are function objects with a templated function call operator. Back in the day when we mistook lambdas for functions we might attempt to use SFINAE on them, yet another language feature that works for function templates only. Using the above mechanism, expression SFINAE works out of the box for generic lambdas:

auto f = overload (
  [](auto arg) -> enable_if_t< 
    is_integral<decltype(arg)>::value> {cout << "1";}, 
  [](auto arg) -> enable_if_t<
    !is_integral<decltype(arg)>::value>{cout << "2";}

Easy peasy: Since function call operators are templated for generic lambdas they fall under SFINAE rules when the superobject’s function call operator triggers creating candidates for overload resolution.

3. Copyrights

 Closures may generally be copied. There are cases though, where you want to prohibit this. For instance you may want to limit the ways a lambda can escape a scope in order to avoid dangling references. Silly example follows:

struct Locall
    int val;
    ~Locall() { 
        cout << "destroyed"; 

int main()
	function<void()> f; 
		Locall a{ 101 }; 
		auto k = [re = ref(a)] {
            cout << "... but using value " 
            << re.get().val;
		f = k;

	return 0;

(run the example here)

If we marked k as non copyable, the compiler wouldn’t allow f to “absorb” a reference to an object about to die; it’s a way for the coder to convey intents that may later be forgotten (or can’t be grasped by another person working on your code). To make a non copyable lambda use copyrights :

auto k = [nocopy, re = ref(a)] { /* ... */ };

Since the obvious way for a lambda to be non copyable is to capture a move only type, I use a macro that generates an “init capture” (a C++14 generalized capture) with a unique name and a light(est)weight captured object. This is how:

#define CONCATENATE_DETAIL(x, y) x##y

struct NC
    NC()                     = default; 
    NC(NC const&)            = delete; 
    NC& operator=(NC const&) = delete; 
    NC(NC&&)                 = default; 

#define nocopy UNIQUE_IDEXPR(x) = NC()

Note that what we get from this is a unique expression (auto)xi = N() (where auto isn’t actually spelled and i equals 1, 2, 3…), a move initialization due to the xvalue produced by NC() ie requires the plain and move constructors to be available (auto generated or defined). Also xi is in the scope of the closure class so we don’t need to worry about collisions with the outer scope


 Something is redundant in the nocopy macro, can you spot it? Answer in the comments section or look at the code in Github (link No 2) to find out.


  1. Mathias Gaunard on Lambda the Ultimate
  2. The code on the blog’s repository
  3. Fun with lambdas