# Avoiding compile-time recursion

One of my favorite blogs is Raymond Chen’s “The Old New Thing”. In a recent post, he described a way to find a type in a tuple: a compile-time predicate that, given a type and a tuple, returns the index of that type in the tuple, or fails to compile if the type does not exist (or exists more than once):

```cpp
using tuple_t = std::tuple<char, bool, int>;

constexpr std::size_t idx = type_index<bool, tuple_t>::value;
//                                     ^^^ our type is found at index 1 of our tuple
```

The value is calculated at compile time and all is good in the world, no?

You can follow the link to check his implementation; Raymond offers a great walk-through as always. The thing that bothers me is the use of recursive template instantiations: a template instantiating itself until it hits a termination case. The reason I dislike the technique is that it tends to:

• Increase the binary size. Symbols are generated for a chain of instantiations which end up bloating the executable.
• Slow down compilation. It’s a recursion and the compiler has to pay the price up front.
• Complicate error messages. When something goes wrong, the compiler spits out the whole recursion tree, obfuscating the actual problem.

There’s a chapter called “The Cost of Recursive Instantiation” in the template book, but what I’ve found really dangerous is that when trying to scale code designed this way, the burden on the compiler may get out of hand. E.g. we may start adding nested types in such templates or try to piggyback on the recursion to design extra machinery.

Of course, for the problem at hand, Raymond’s implementation is simple and efficient (he even checked the binary outputs and the resulting generated symbols; as always you can trust what he’s saying). I’m writing this post to show there’s usually an alternative, and to point out that not considering the cost of these things may haunt you in the future.

Let’s see what a non-recursive implementation would look like:

```cpp
template <class T, std::size_t I, class Tuple> // 1
constexpr bool match_v = std::is_same_v<T, std::tuple_element_t<I, Tuple>>;

template <class T, class Tuple,                // 2
          class Is = std::make_index_sequence<std::tuple_size_v<Tuple>>>
struct type_index;

template <class T, template <class...> class Tuple, class... Args, // 3
          std::size_t... Is>
struct type_index<T, Tuple<Args...>, std::index_sequence<Is...>>
    : std::integral_constant<std::size_t,
          ((Is * match_v<T, Is, Tuple<Args...>>) + ... + 0)>
{
    static_assert(1 == (match_v<T, Is, Tuple<Args...>> + ... + 0),
                  "T doesn't appear once in Tuple");
};
```

I’m doing all the work at (3) with this guy:

```cpp
Is * match_v<T, Is, Tuple<Args...>> + ... + 0
```

This fold is binary so that it can handle empty parameter packs. The “match_v” predicate (1) checks whether a type “T” exists at index “I” in a tuple “Tuple”. We can see what an example instantiation would do:

```cpp
type_index<int, tuple<char, bool, int>>

// T       = int
// Tuple   = tuple<char, bool, int>
// Args... = char, bool, int
// Is      = 0, 1, 2 -> index sequence of cardinality sizeof...(Args)

// If we denote the "match_v" predicate as "m", we have
(0 * m<int,0,Tuple> + (1 * m<int,1,Tuple> + (2 * m<int,2,Tuple>)))
(0 * 0 + (1 * 0 + (2 * 1))) = 2

// 2 -> the index of "int" inside Tuple
```

Essentially we multiply each index with the predicate’s result, and sum the results. The predicate (1) could be written inline, but since there’s a second use for it, it’s good to abstract it out.

Our specialization has a static assertion. Using a variation of the fold expression described above, it makes sure that exactly one occurrence of the type “T” exists in “Tuple”. This way, when our meta-function returns zero, we can be positive it means “index zero” rather than “does not exist”; additionally, if the type exists more than once in the tuple, a static assertion fires, much like in the original implementation.

Check a demo here.

## A peace offering

I understand that flattening template instantiations might not appeal to some people, so let me extend an olive branch here: when done right, compile time recursion can be OK. Take for example the “index_sequence” we use in our code, which is implemented recursively:

• Implementations use an amazing trick (I believe it’s attributed to Dave Abrahams but don’t quote me on that – it certainly appeared in his work a long time ago) to lower recursion depth to Log2(N). It is based on the realization that you can use the first half of an index sequence to build the second half, by adding an initial offset:

index_sequence<20> = index_sequence<10, 10 + index_sequence<10>>

• Ubiquitous library facilities are reusable. What this means is that if someone already instantiated an index sequence of twenty elements, that instantiation will most probably be shareable with your implementation.
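A minimal sketch of that halving trick (make_seq and concat_seq are illustrative names, not the standard library’s): the first half of the sequence is built once, then reused for the second half by adding an offset.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Concatenate two index sequences, offsetting the second one by the
// size of the first: <0,1> ++ <0,1,2> becomes <0,1,2,3,4>.
template <class S1, class S2>
struct concat_seq;

template <std::size_t... I1, std::size_t... I2>
struct concat_seq<std::index_sequence<I1...>, std::index_sequence<I2...>>
{
    using type = std::index_sequence<I1..., (sizeof...(I1) + I2)...>;
};

// Recursion depth is O(log2(N)), not O(N): each step halves the problem.
template <std::size_t N>
struct make_seq
{
    using type = typename concat_seq<typename make_seq<N / 2>::type,
                                     typename make_seq<N - N / 2>::type>::type;
};

template <> struct make_seq<0> { using type = std::index_sequence<>; };
template <> struct make_seq<1> { using type = std::index_sequence<0>; };

static_assert(std::is_same_v<make_seq<5>::type,
                             std::index_sequence<0, 1, 2, 3, 4>>);
```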

Finally, if we end up using recursion (hope not but…), simplicity should compensate for potential hazards. Here’s how I’d solve the same problem recursively, with a single function:

```cpp
template <class T, class Tuple, std::size_t I = 0>
consteval std::size_t TypeIndex()
{
    if constexpr (I >= std::tuple_size<Tuple>::value) {
        return 0;
    } else if constexpr (std::is_same_v<T, std::tuple_element_t<I, Tuple>>) {
        return I + 1;
    } else {
        return TypeIndex<T, Tuple, I + 1>();
    }
}
```

Check a demo here.

Not much explanation is needed, apart from the fact that I’ve opted to return the 1-based index and use 0 when the search fails, instead of a compilation error. We also avoid a compilation error when more than one instance occurs, by returning the index of the first match, thus short-circuiting further template instantiations.
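For instance, a couple of sample invocations (shown with constexpr instead of consteval so the snippet stands alone and also compiles as C++17):

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

template <class T, class Tuple, std::size_t I = 0>
constexpr std::size_t TypeIndex()
{
    if constexpr (I >= std::tuple_size<Tuple>::value)
        return 0;                                   // not found
    else if constexpr (std::is_same_v<T, std::tuple_element_t<I, Tuple>>)
        return I + 1;                               // 1-based index of first match
    else
        return TypeIndex<T, Tuple, I + 1>();
}

using T3 = std::tuple<char, bool, int>;
static_assert(TypeIndex<int, T3>() == 3);     // found at (1-based) position 3
static_assert(TypeIndex<double, T3>() == 0);  // absent -> 0, no compile error
```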

## Disclaimer

If you’re in a situation where such things matter, always always measure: measure compilation time, executable size, check generated symbols, check compilation in different platforms (time and again I’ve seen VS struggle with sizes & symbol generation that gcc would simply optimize out and discard). I only wrote this post because it’s good to know your options.

# A “sorted view”

## 1. Intro

This installment elaborates on the creation of a “sorted_view” utility. The “References” section contains links to the complete code; throughout the article, we provide demos and snippets to showcase and explain the implementations. The section on modernizing the code contains real world applications of the following C++17 features:

• structured bindings
• template argument deduction for class templates

### Problem description

Apply a relative sorting on two or more data sequences, i.e. take the sequences

s1 = ["Michael", "Larisa", "Paavo", "Mark"]
s2 = ["Phelps", "Latynina", "Nurmi", "Spitz"]


If the goal was to sort based on the ordering of s1, the expected output would be

s1 = ["Larisa", "Mark", "Michael", "Paavo"]
s2 = ["Latynina", "Spitz", "Phelps", "Nurmi"]


### Related work

This is a common problem in modern packaging. I’ve answered exactly this problem description here. The “sorting by indices” technique I used is much better explained (and extended and prepackaged) by Raymond Chen in a great series of articles you can start reading here.

### What to expect

This is an example of how higher abstractions can give better performance. Instead of delaying or minimizing expensive swap / copy / move operations (as in the sources linked so far), we create a lightweight view over our sequences that gives us the desired sorted step-through and avoids touching the sequences themselves.

## 2. Design a sorted view

Any goals that are not immediately clear will be explained shortly after.

### Goals

• Don’t mutate the original sequences
• Don’t make any copies of the original sequences
• Allow custom comparators to be used for sorting
• Easy “sorted walks”
• Ability to use other sequences as tie breakers

### Tools

The few Standard Library components we’ll use to make our life simpler:

• A tuple to hold the (one or more) sequences
• A vector of indices that holds the step through order
• The sort and iota algorithms
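Before diving into the class, the core “sorting by indices” idea behind these tools can be sketched in a few standalone lines (sorted_indices is an illustrative helper, not part of the final interface): we sort a vector of indices by comparing the elements they point to, and the data itself is never touched.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Return the order in which to visit `seq` so that the walk is sorted;
// the same index vector can then drive any parallel sequence.
std::vector<std::size_t> sorted_indices(std::vector<std::string> const& seq)
{
    std::vector<std::size_t> is(seq.size());
    std::iota(is.begin(), is.end(), 0); // 0, 1, 2, ...
    std::sort(is.begin(), is.end(),
              [&](std::size_t a, std::size_t b) { return seq[a] < seq[b]; });
    return is;
}
```

Stepping through seq in the order given by sorted_indices yields the sorted walk.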

## 3. Implementation

In a nutshell, our class will look like this

```cpp
template <class... Args>
class sorted_view
{
    std::tuple<Args...> _ts;
    mutable std::vector<std::size_t> _is;

public:
    using tuple_t = std::tuple<Args...>;

    sorted_view(Args&&... args);

    // sort based on the Ith sequence
    template <std::size_t I, class F>
    void sort(F&& bop) const;

    // size of the Ith sequence
    template <std::size_t I>
    auto size() const;

    // the SORTED element of the Ith sequence at position pos
    template <std::size_t I>
    decltype(auto) at(std::size_t pos);
};
```

### Capturing

Since spelling out the type of our class is quite cumbersome and error prone, we’ll create a helper function to automate this:

```cpp
template <class... Args>
auto make_sorted_view(Args&&... args)
{
    return sorted_view<Args...>(std::forward<Args>(args)...);
}
```


So the tuple class member references any lvalue arguments and move-constructs from rvalue arguments.

### Sorting

Sorting is pretty straightforward

```cpp
template <std::size_t I, class F>
void sort(F&& bop) const
{
    using namespace std;

    _is.resize(get<I>(_ts).size());
    iota(begin(_is), end(_is), 0);

    std::sort(begin(_is), end(_is), [&](size_t lhs, size_t rhs) {
        return forward<F>(bop)(get<I>(_ts)[lhs], get<I>(_ts)[rhs]);
    });
}
```


The I template parameter denotes the sequence we use for sorting. A lambda that uses the custom comparison function with the referenced elements of that sequence is used in the sort algorithm.

### Accessing

True to our “easy access” goal we provide an access method

```cpp
template <std::size_t I>
decltype(auto) at(std::size_t pos)
{
    return std::get<I>(_ts)[_is[pos]];
}
```


All it does is retrieve the element of the Ith sequence at sorted position pos.

### Example

Say we had the top Olympic medalists. If we encode each column as a sequence (hence we’d have the sequences ‘first_name’, ‘last_name’ etc.), this is how we’d use the sorted view class:

```cpp
// 1. Create the sorted view
auto sv = make_sorted_view(first_name, last_name, country, sport, years,
                           gender, medals_gold, medals_silver,
                           medals_bronze, medals_total);

sv.sort<0>(std::less<>{}); // 2. Sort by first name

for (size_t i = 0; i < sv.size<0>(); i++)
    printf("{ %s-%s-%s }\n",
           sv.at<0>(i).c_str(), // 3. access the ith sorted element of sequence 0
           sv.at<1>(i).c_str(),
           sv.at<2>(i).c_str());
```

The output is

{ Birgit-Fischer-Germany }
{ Bjorn-Daehlie-Norway }
{ Carl-Lewis-United States }
{ Jenny-Thompson-United States }
{ Larisa-Latynina-Soviet Union }
{ Mark-Spitz-United States }
{ Michael-Phelps-United States }
{ Ole Einar-Bjorndalen-Norway }
{ Paavo-Nurmi-Finland }
{ Sawao-Kato-Japan }
{ Usain-Bolt-Jamaica }

Working code can be found here.

### Proving the “no copy” clause

Let’s avoid standardese and just use an example. Therein the S class explicitly forbids copy construction and allows move construction; the sorted_view class captures everything as expected.

```cpp
struct S
{
    int i;
    S(int i) : i(i) {}
    int size() { return 1; }

    S(S const&) = delete;
    S(S&&) = default;
};

int main()
{
    S s1(1), s2(2);

    auto sv = make_sorted_view(s1, s2, S(3));

    using sv_t    = decltype(sv);
    using tuple_t = sv_t::tuple_t;
    using tuple_t = std::tuple<S&, S&, S>;
}
```

The additional clue here is that if the two definitions of tuple_t were different, the program wouldn’t compile due to conflicts.

## 4. Sorting with multiple criteria

Let’s return to our original example. If we were to sort based on gold medals, we see that Mark Spitz and Carl Lewis collected an equal amount of them, so we’d like to specify a tie breaker, let’s say silver medals. Nope, equal there too, so let’s fall back to bronze. This process of relative sorting with fallbacks is what the following method does:

```cpp
template <std::size_t I, std::size_t... Is, class F>
void multi_sort(F&& bop) const
{
    using namespace std;

    _is.resize(get<I>(_ts).size());
    iota(begin(_is), end(_is), 0);

    std::sort(begin(_is), end(_is), [&](size_t lhs, size_t rhs) {
        return detail::use_comparator<I, Is...>::apply(
            _ts, lhs, rhs, forward<F>(bop));
    });
}
```

The Demo provides essentially the following use case

```cpp
sv.multi_sort<9, 6, 7, 8>([](auto&& a, auto&& b) { return a > b; });
```

i.e. sort based on total medals (sequence 9), then gold (6), then silver (7) and finally bronze (8). Complete code can be found in the references, in case you’re wondering how the equality operator is generated (we only pass one comparator) or how the fallback comparisons are actually applied (generic lambdas are great, by the way).

## 5. What can we get out of C++17

I hate the “I have this hammer, are there any nails” approach. The things we’ll work on modernizing are actual shortcomings of the pre-C++17 version of the code, so don’t expect a cornucopia of features. That being said, I find the following language additions exceptional.

### No more “make_stuff” stuff

Template argument deduction for class templates is here to eliminate most cases of make_something helper functions. Though they might not look like such, the following are class template instantiations:

```cpp
sorted_view sv1(first_name, last_name, sport, medals_gold, medals_silver);
sorted_view sv2(medals_bronze, medals_total);
```

Implicitly synthesized deduction guides (from existing constructors) take over the task of spelling out the class type (which in this case matches the desired capturing behavior). We can also specify our own deduction guides explicitly, but this deserves its own post. Also note that the types of the two variables declared above are different (different instantiations of the same class template).

NOTE: A nitpicker might have noticed that since the beginning the sorted view constructor didn’t use forwarding references but mere r-value references. That was so that we’d take advantage of reference collapsing rules and have the C++17 extension work out of the box in the desired fashion. If you don’t believe me make the constructor templated in the Demo below and see if template argument deduction for class templates works (spoiler … it doesn’t).

### Accessible access

The way we accessed the sorted view elements was kind of awkward, mostly because the user had to be quite familiar with the class interface. Secondly because even then, unless we went the extra mile to bake “named tuples” into sorted_view, no one would be comfortable using the indexes of the inner tuple without looking at the instantiation (I used many fields in the initial example to make this obvious).

Structured bindings provide a simple way to decompose a class (destructure, one might say) but even better, you can do this your own way! Since destructuring the data holder of the sorted view (the inner tuple) doesn’t do much good, here’s what we’ll provide:

```cpp
sorted_view sv(first_name, last_name, sport, medals_gold,
               medals_silver, medals_bronze, medals_total);

auto [bName1, bName2, bSport, bGoldM, bSilverM, bBronzeM, bTotalM] = sv;
```

Every binding is an instance of a class that holds pointers to a tuple element and to the sorted indices:

```cpp
template <class T>
struct sorted_element
{
    std::remove_reference_t<T>* _data;
    std::vector<std::size_t> const* _is;

    sorted_element(T& elem, std::vector<std::size_t> const& is)
        : _data(&elem), _is(&is)
    {
    }

    decltype(auto) operator[](std::size_t i)
    {
        return _data->at(_is->at(i));
    }
};
```

Also, this type provides sorted access through an overloaded subscript operator:

```cpp
sv.multi_sort<6, 4, 5>([](auto&& a, auto&& b) { return a > b; });

auto [sName1, sName2, sSport, sGoldM, sSilverM, sBronzeM, sTotalM] = sv;

for (size_t i = 0; i < sv.size<6>(); i++)
{
    std::cout << sTotalM[i] << " " << sGoldM[i] << " " << sSilverM[i] << " "
              << sName1[i] << " " << sName2[i] << std::endl;
}
```

Now this makes the code way clearer with minimum effort. Here’s what it took to provide custom structured bindings for our class:

```cpp
namespace std
{
    // 1. provide tuple_element<I, sorted_view<...>>::type
    template <size_t I, class... Ts>
    struct tuple_element<I, utl::sorted_view<Ts...>>
    {
        using sorted_view_t = utl::sorted_view<Ts...>;
        using type = typename sorted_view_t::template tuple_element_t<I>;
    };

    // 2. provide tuple_size<sorted_view<...>>
    template <class... Ts>
    struct tuple_size<utl::sorted_view<Ts...>>
        : public integral_constant<size_t, sizeof...(Ts)>
    {
    };
}

namespace utl
{
    // 3. provide a get<I>(sorted_view<...>) function
    template <size_t I, class... Ts>
    auto get(sorted_view<Ts...> const& sv)
    {
        return sv.template elem<I>();
    }
}
```

It’s worth mentioning that ADL allows get to be written inside the utl namespace (there’s a dispute on Reddit on whether writing it inside std would be UB). Usage of the techniques mentioned in this section can be found in the following Demo.

## 6. Extensions

There are things to consider in order to augment the functionality of the sorted view class:

• Provide an interface so that sort could take advantage of C++17 parallel sorts if desired
• Raymond Chen has an article on the Schwarzian transform, maybe this would be good to have
• Custom views should also be possible. Why just sort? A lot of other operations can be provided the same way (my first attempt cluttered the interface, so until I find a way to keep simple things simple, having a dedicated class is OK)
• Provide named tuples for the C++14 implementation
• Concepts: the elements sorted_view acts upon, have to be sortable, indexable etc and all these are constraints wonderfully expressed with concepts (you should see the errors I got while writing this w/o concepts).

On most of the points above, I’ll return with a dedicated post, until then thank you for reading.

# Quantifiers, metaprogramming and concepts

## 1. Intro

Quantification expresses the extent to which a predicate is true over a set of elements. Two forms are the most common:

• Universal $\forall$
• Existential $\exists$

To define the quantifiers, let A represent an expression, and let x represent a variable. Then:

• Universal Quantification: If we want to indicate that A is true for all possible values of x, we write “$\forall\ x\; A\;$“. Here $\forall\ x\;$ is called the universal quantifier, and $A\;$ is called the scope of the quantifier. The symbol $\forall$ is pronounced “for all“.
• Existential quantification: If we want to indicate that A is true for at least one value of x, we write “$\exists\ x\; A\;$“. This statement is pronounced “There exists an x such that A“. Here $\exists\ x\;$ is called the existential quantifier, and A is called the scope of the existential quantifier.

We elaborate on the topic as follows: In section 2 we describe quantification in the template metaprogramming context, in section 3 we explore related standard library facilities introduced in C++17 and section 4 adds concepts to the equation. Finally, in section 5 we draw some concluding remarks.

## 2. Use in metaprogramming

Metaprograms often use predicate logic in creative ways. For example, type queries in generic code are used to constrain, dispatch, specialize, activate or deactivate code at compile time. Up until C++14, expressing predicate logic at compile time was cumbersome, especially when handling variadic arguments, involving a helper type and crafty use of boolean logic:

```cpp
template <bool... Bs>
using bool_sequence = std::integer_sequence<bool, Bs...>;

template <bool... Bs>
using bool_and = std::is_same<bool_sequence<Bs...>,
                              bool_sequence<(Bs || true)...>>;

template <bool... Bs>
using bool_or = std::integral_constant<bool, !bool_and<!Bs...>::value>;
```

The C++17 Standard allows for much simpler (almost mathematical) syntax, by folding over the logical operators:

```cpp
template <bool... Bs> constexpr bool all_v = (... && Bs);
template <bool... Bs> constexpr bool any_v = (... || Bs);
```


Above we define two variable templates that become true or false depending on the values of their (boolean) template arguments. Since they’re conceptually and syntactically simpler, as well as cheaper in terms of instantiations, we’ll prefer these to build subsequent tools.
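A few static assertions illustrate the behavior (repeating the definitions so the snippet stands alone), including the identity elements for empty packs:

```cpp
template <bool... Bs> constexpr bool all_v = (... && Bs);
template <bool... Bs> constexpr bool any_v = (... || Bs);

static_assert(all_v<true, true> && !all_v<true, false>);
static_assert(any_v<false, true> && !any_v<false, false>);

// Empty packs fall out of the fold identity elements:
static_assert(all_v<>);   // "and" over nothing is true
static_assert(!any_v<>);  // "or" over nothing is false
```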

### Making the Quantifiers

To define our quantifiers we need to involve the predicates, essentially the process that produces the list of booleans we feed to the above variable templates. Let P be a predicate over a type and Ts a type list, then:

```cpp
// ------------------------------------------------ "for every"
template <template <class> class P, class... Ts>
constexpr bool universal_quantifier = all_v<P<Ts>::value...>;

// ------------------------------------------------ "is there ?"
template <template <class> class P, class... Ts>
constexpr bool existential_quantifier = any_v<P<Ts>::value...>;
```


These quantifiers could be used for example to check that a list of types is trivially copyable:

```cpp
template <class... Ts>
constexpr bool are_tc = universal_quantifier<std::is_trivially_copyable, Ts...>;
```

A toy example can be found here. Of course the quantifiers can be used inline, without stub code like the are_tc variable template above.

### Generalized Predicates – Currying metafunctions

A limitation of the code so far is that it can only be used for univariate predicates. Say we had a predicate that needed 2 or more variables to compute, how would we handle it? Here the is_smaller type checks that the size of its operands is ascending:

```cpp
template <class T1, class T2>
using is_smaller = std::integral_constant<bool, (sizeof(T1) < sizeof(T2))>;
```


You might argue that “$\forall T\; \in\; A\; smaller\;$”, i.e. is_smaller holds for every type T in typelist A, has no meaning, and you’d be right (smaller than what?). Thankfully we only need to introduce a restriction: multivariate predicates are to be bound until only one argument position is free. We can do that either in place (by loading the needed types throughout our interfaces) or by introducing a compile-time bind, essentially currying metafunctions:

```cpp
template <template <class...> class F, class... Ts>
struct meta_bind
{
    template <class U>
    using type = F<Ts..., U>;
};
```


With this in place here’s how to verify that a type is smaller than every type in a typelist:

```cpp
struct S0 {};
struct alignas(2) S1 {};
struct alignas(4) S2 {};
struct alignas(8) S3 {};

int main()
{
    static_assert(universal_quantifier<meta_bind<is_smaller, S0>::type, S1, S2, S3>);
}
```

## 3. The Standard library

The universal and existential quantifiers are defined in the C++17 Standard library as conjunction and disjunction respectively. Their main difference from what we defined so far is that they short-circuit at the expression level, i.e. not every expression needs to be well-formed if the result can be computed before its evaluation. This comes in handy in certain cases, but it implies they have to be implemented recursively (I’m pretty sure the following implementation is attributed to Jonathan Wakely):

```cpp
template <class...> struct conjunction : std::true_type {};
template <class B1> struct conjunction<B1> : B1 {};
template <class B1, class... Bn>
struct conjunction<B1, Bn...>
    : std::conditional_t<bool(B1::value), conjunction<Bn...>, B1> {};

template <class...> struct disjunction : std::false_type {};
template <class B1> struct disjunction<B1> : B1 {};
template <class B1, class... Bn>
struct disjunction<B1, Bn...>
    : std::conditional_t<bool(B1::value), B1, disjunction<Bn...>> {};
```


i.e. we stop inheriting once the result has been computed. In contrast, our flattened-instantiations implementation excels in compile times and avoids instantiation depth violations. Choose wisely between flat and short-circuiting depending on your use case.
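The expression-level short-circuiting is easy to demonstrate with a deliberately incomplete “poison” trait, which a flat fold would have to evaluate:

```cpp
#include <type_traits>

template <class T>
struct poison; // deliberately left undefined

// std::conjunction stops at std::false_type, so poison<int>::value is
// never required; a flat fold like (... && Bs::value) would not compile.
static_assert(!std::conjunction<std::false_type, poison<int>>::value);
static_assert(std::disjunction<std::true_type, poison<int>>::value);
```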

As an example, recursive instantiations here use an integer sequence of 18 elements to break tuple, essentially due to its move constructor. The move constructor has a conditional noexcept that is implemented recursively, so for each template argument a constant number of additional instantiations is required (the implementation of is_nothrow_move_constructible in terms of is_nothrow_constructible, which is implemented in terms of __is_nt_constructible and so on, for 15 instantiation levels!). This means that each template argument of tuple requires 15 additional instantiation levels for this check. On top of that, 9 levels are always required (constant depth). Therefore, 17 arguments require an instantiation depth of 17*15+9 == 264 (the default for clang 3.7 was 256).

Another example use of this mechanism would be to elaborate on quantification the other way around, i.e. one type vs many predicates (instead of the 1 predicate vs many types we’ve been doing so far):

```cpp
template <class T, template <class> class... Ps>
constexpr bool satisfies_all_v = std::conjunction<Ps<T>...>::value;

template <class T, template <class> class... Ps>
constexpr bool satisfies_any_v = std::disjunction<Ps<T>...>::value;
```

## 4. Concepts

Concepts are predicates, and as such a simple extension of our quantifiers can encapsulate concepts as well:

```cpp
// -----------------------------------------------------------
// -------------------- to express "1 predicate Vs many types"
template <template <class> class P, class... Ts>
concept bool universal_quantifier = all_v<P<Ts>::value...>;

template <template <class> class P, class... Ts>
concept bool existential_quantifier = any_v<P<Ts>::value...>;
```

To get an idea of how this could be used check below:

```cpp
template <class T>
concept bool Data = requires(T t) { t.data; };

template <class... Ts>
    requires existential_quantifier<Data, Ts...>
auto f(Ts... ts)
{
    std::cout << "version for data\n";
    return 0;
}
```

or this Demo (disclaimer: it only showcases the use of quantifiers with concepts; don’t try to find particular meaning in the code).

## 5. Conclusions

Metaprogramming is changing. For the better. Better techniques and more expressive syntax are emerging that will allow library writers to produce more elegant code. Users of this code will have a better experience due to more comprehensive compiler errors. The price we have to pay is understanding the formalities and theory involved with this ecosystem; the bet to be won is Stroustrup’s “make simple things simple” claim, so that modern C++ can be “expert friendly” but not “expert only”.

# Want a (std::) slice?

## 1. Intro

Whenever we use data structures whose extents are restricted at compile time, we allow for significant optimizations and early error detection, at the expense of more cumbersome access style and code maintenance, due to the inevitable introduction of templated code. Users of such data types, algebraic (tuple, pair) or aggregate (array), breathed a sigh of relief (well… maybe not all of them) with the introduction of std::apply in C++17, which handles the boilerplate of unpacking their contents into a callable. To give an example, you should now be able to write your callable like so

```cpp
template <class... Ts>
void call_me(Ts&&... args) { /* implementation */ }
```

and

• Remain blissfully unaware of the structure that packs your arguments (the back end doesn’t need to specialize for different argument sources)
• Use the callable with any of the aforementioned data types without special treatment (the front end needs no boilerplate to unpack the data structures)

So, for example, you can now invoke your callable like so

```cpp
std::apply(call_me, a); // a is array
std::apply(call_me, p); // p is pair
std::apply(call_me, t); // t is tuple
```


## 2. Creating slices

Handy as it may be, this ecosystem of data types and tools leaves a lot to be desired. First and foremost, the slice operation, i.e. the creation of an entity that captures a subset of some given data. My main motivation for this was to be able to pass slices to the apply function, although while researching the topic I found an old freestanding request, which I’ve just answered here (isn’t it a bummer when you invent something only to discover other people have had similar musings?). Below we define the tuple_slice operation that fills this gap and elaborate on the code:

```cpp
namespace detail
{
    template <std::size_t Ofst, class Tuple, std::size_t... I>
    constexpr auto slice_impl(Tuple&& t, std::index_sequence<I...>)
    {
        return std::forward_as_tuple(
            std::get<I + Ofst>(std::forward<Tuple>(t))...);
    }
}

template <std::size_t I1, std::size_t I2, class Cont>
constexpr auto tuple_slice(Cont&& t)
{
    static_assert(I2 >= I1, "invalid slice");
    static_assert(std::tuple_size<std::decay_t<Cont>>::value >= I2,
                  "slice index out of bounds");

    return detail::slice_impl<I1>(std::forward<Cont>(t),
                                  std::make_index_sequence<I2 - I1>{});
}
```


Not to be confused with valarray’s slice, this piece of code does 3 simple things:

• Protects from invalid instantiations with static asserts
• Forwards the container (tuple, array, pair) + an integer sequence to the implementation function
• The implementation function gets every element indexed by the sequence and forwards it as a tuple i.e. lvalues are referenced and rvalues are move constructed

A slice of t can be obtained like so

```cpp
tuple_slice<I1, I2>(t);
```

where [I1, I2) is the half-open range of the subset.
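For instance (repeating the definition so the snippet stands alone), slicing an lvalue tuple yields a tuple of references into it:

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace detail
{
    template <std::size_t Ofst, class Tuple, std::size_t... I>
    constexpr auto slice_impl(Tuple&& t, std::index_sequence<I...>)
    {
        return std::forward_as_tuple(std::get<I + Ofst>(std::forward<Tuple>(t))...);
    }
}

template <std::size_t I1, std::size_t I2, class Cont>
constexpr auto tuple_slice(Cont&& t)
{
    static_assert(I2 >= I1, "invalid slice");
    static_assert(std::tuple_size<std::decay_t<Cont>>::value >= I2,
                  "slice index out of bounds");
    return detail::slice_impl<I1>(std::forward<Cont>(t),
                                  std::make_index_sequence<I2 - I1>{});
}

// Slicing an lvalue tuple references its elements; nothing is copied.
std::tuple<int, char, double, long> t{1, 'a', 2.0, 3L};
auto s = tuple_slice<1, 3>(t);
static_assert(std::is_same_v<decltype(s), std::tuple<char&, double&>>);
```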

To create slices that are not zero-based, all we do is create the index sequence from the cardinality of the desired range and then offset its values. This works, but a helper “make_integer_range” type would be nice to have:

```cpp
namespace detail
{
    template <class T, T Ofst, class S>
    struct integer_range;

    template <class T, T Ofst, T... Ns>
    struct integer_range<T, Ofst, std::integer_sequence<T, Ns...>>
    {
        using type = std::integer_sequence<T, (Ns + Ofst)...>;
    };
}

// the range [N, M)
template <class T, T N, T M>
using make_integer_range =
    typename detail::integer_range<T, N, std::make_integer_sequence<T, M - N>>::type;

template <std::size_t N, std::size_t M>
using make_index_range = make_integer_range<std::size_t, N, M>;
```


You can experiment with integer ranges here. Back to the main act then (using make_index_range to simplify the definition of tuple_slice is trivial, so we’ll omit it); here are some example uses of tuple_slice:

```cpp
int main()
{
    int  i1(1), i2(1);
    char c1('o'), c2('k');

    auto tp{ std::make_tuple(2., 4., 23, 7, 'x', 'l') };
    std::array<char, 4> ar{ { 'a', 'b', 'c', 'd' } };

    // 1. read from a slice -----------------------------------
    std::tie(i1, i2) = tuple_slice<2, 4>(tp);

    // 2. assign to a slice -----------------------------------
    tuple_slice<0, 2>(ar) = std::tie(c1, c2);

    // 3. apply a slice on a lambda ---------------------------
    std::apply([](auto&& arg1, auto&& arg2) {
        std::cout << "(" << arg1 << ", " << arg2 << ")\n";
    }, tuple_slice<1, 3>(ar));
}
```


## 3. Design considerations

One might ask “Shouldn’t a slice operation on an array create another array?”, and this brings us to a generic code paradox: while trying to handle more cases we may end up being more restrictive. The extra effort to create respective make_array and forward_as_array functions is not the problem (there you go if you don’t believe me). Here are the considerations:

• We cannot create an array of references, hence we’d fall back to creating an array of reference wrappers. Not only does this create an interface mismatch with the tuple case (we need to call get() on a reference wrapper to obtain the value) but between lvalue and rvalue arrays as well: slicing the latter would give us move-constructed values, hence no get() method.
• How would this logic extend to the pair case? Slicing a pair does not give us yet another pair (only in the trivial case where the slice is identical to the original), so the fallback of producing a tuple would be viewed as an exception. To me, the frequency of the word “exception” in the description of a design is inversely proportional to its quality.
• Slices should handle the empty case uniformly, and tuples do this.
• If there’s a unique sliced type, pattern matching it requires minimum effort.
• There’s always the option to create arrays out of the slices if we want to (it would be interesting to see if the intermediate tuple can be optimized away).

So, there you go … less is more

# Compose and curry as folds

## 1. Intro

In a previous post we introduced C++17 fold expressions and described a way to extend them for arbitrary callables. Implementation details don’t matter for what we’re elaborating on here but it should be clear that (given the tools we developed) the following is possible:

```cpp
(Op<F>(args) + ...)
(... + Op<F>(args))
```


i.e. performing a right or left fold respectively, using the “F” type as the folding operator (the “+” operator is used just to generate the underlying expression templates). By “F” we mean a type that serves as the reducing function (it generates callable instances), e.g. this:

```cpp
struct Max
{
    template <class T1, class T2>
    constexpr decltype(auto) operator()(T1&& lhs, T2&& rhs)
    {
        return lhs > rhs ? std::forward<T1>(lhs) : std::forward<T2>(rhs);
    }
};
```


Such tools enhance the notion of “reduce” being the ultimate production line where new algorithmic skeletons and processes can be forged, sometimes to a surprising extent, as we’ll see below.

## 2. On composition

In a nutshell, composition is the pointwise application of one function to the result of another to produce a third one. Add another one to the pipeline and the mathematical notation goes something like this:

$(\;f\; \circ\ g\; \circ\ y\;)\;(\;x\;)\; \equiv f\;(\;g\;(\;y\;(\;x\;)\;)\;)\;$

…and so on and so forth; we should be able to compose as many functions as we want, provided that the result of the “inner” functions is a valid input to the outer ones.

Now, look at the mathematical notation again. What does the right hand side look like? It’s just folding over the function call operator! Written in pseudo C++ (I say pseudo because C++ folds don’t support arbitrary operators out of the box) it would look like this:

```cpp
args () ...
```


where args is a pack of callables we want to compose. Now it all boils down to synthesizing the reducer, and composing two callables is easy enough: create a lambda that applies the left side to the result of calling the right side with whatever arguments you pass to it:

```cpp
struct Comp
{
    template <class F1, class F2>
    auto operator()(F1 lhs, F2 rhs)
    {
        return [=](auto... args) {
            return lhs(rhs(args...));
            //     ^^^^^^^^^^^^^^^^^
        };
    }
};

template <class... Ts>
auto compose(Ts&&... args)
{
    using fld::Op;
    return (Op<Comp>(std::forward<Ts>(args)) + ...).give();
}
```

view raw
simple_composer.cpp
hosted with ❤ by GitHub

We’ve also added a helper compose function to hide the fold generation even though it’s good to know that it takes a right fold to do this. Having this in place you can write things like

int main()
{
    auto f1 = [](auto a) { return 2 * a; };
    auto f2 = [](auto a) { return 3 * a; };
    auto f3 = [](auto a, auto b) { return b * a; };

    auto cfs = compose(f1, f2, f3);
    std::cout << cfs(2, 3) << std::endl;
}

### One more thing

I might have oversimplified the definition of the Comp structure. Taking move semantics into consideration is not only a matter of efficiency but of correctness as well, since preserving value category is essential when using forwarded callables. To amend this, the returned lambda will do a perfect capture using this helper type:

template <class T>
using perfect_capture_t = std::conditional_t<
    std::is_lvalue_reference<T>::value,
    std::reference_wrapper<std::remove_reference_t<T>>, T>;

This type will be used by the (C++14 generalized) lambda capture to acquire either a reference (wrapper) or a moved callable (depending on its value category). So now Comp is written as follows

struct Comp
{
    template <class F1, class F2>
    auto operator()(F1&& lhs, F2&& rhs)
    {
        return [
            lhs = perfect_capture_t<F1>(std::forward<F1>(lhs)),
            rhs = perfect_capture_t<F2>(std::forward<F2>(rhs))
        ](auto&&... args) {
            return lhs(rhs(std::forward<decltype(args)>(args)...));
        };
    }
};

## 3. On currying

Currying is the technique of translating the evaluation of a function that takes multiple arguments (or a tuple of arguments) into the evaluation of a sequence of functions, each taking a single argument. In other words, it could be described (nitpickers pardon me) as the production of partially applied functions, something you see in C++ with certain applications of std::bind. Following the same logic as above, we’ll be using the left fold pipeline with the following reducer:

struct Curry
{
    template <class F1, class F2>
    auto operator()(F1&& lhs, F2&& rhs)
    {
        return [
            lhs = perfect_capture_t<F1>(std::forward<F1>(lhs)),
            rhs = perfect_capture_t<F2>(std::forward<F2>(rhs))
        ](auto&&... args) {
            return lhs(rhs, std::forward<decltype(args)>(args)...);
            //     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        };
    }
};

template <class... Ts>
auto curry(Ts&&... args)
{
    using fld::Op;
    return (... + Op<Curry>(std::forward<Ts>(args))).give();
}

Now this is possible

int main()
{
    auto f1 = [](auto a) { return 2 * a; };
    auto f2 = [](auto a) { return 3 * a; };
    auto f4 = [](auto a, auto b, auto c) { return c * b * a; };

    std::cout << curry(f4, 2, 1)(1) << std::endl;

    auto ccfs = compose(f1, f2, curry(f4, 2, 1));
    std::cout << ccfs(1) << std::endl;
}

The last piece of the puzzle is constexpr lambdas, which would allow all of this to extend to synthesizing constexpr callables; they are on their way for C++17.

## Demo

A runnable example can be found here. Note that this particular example only compiles with g++; I’ll be diving into clang in a later endeavor.

# Lazily evaluated folds in C++

## 1. Intro

In the previous post we delved into C++ fold expressions; despite them being available only for a set of built-in operators, we devised a way to use them with arbitrary callables, which allows us to write things like a compile-time reduction yielding a sequence’s maximum:

struct Max
{
    template <class T1, class T2>
    constexpr decltype(auto) operator()(T1&& lhs, T2&& rhs)
    {
        return lhs > rhs ? std::forward<T1>(lhs) : std::forward<T2>(rhs);
    }
};

int main()
{
    static_assert(20 == fld::foldl<Max>(1, 20, 3, 5), "ET phone home");
}

In this post we are going to expand on arbitrary fold expressions by incorporating proper move semantics and implementing generator-like lazy evaluation. Along the way we’ll attempt our take on expression templates, a technique I call flat expression templates and examine some peculiarities of the constexpr specifier.

## 2. Becoming move aware

Recall that in our previous post we used a “vessel” type to serve as the node type in our expression tree? Well, obviously we don’t have to make a copy of our (expression) data every time we build such a node, so why not have two versions of it: one for lvalue data, which will hold a reference to the actual data, and one for rvalue data, which will “move absorb” the data into a member:

template <class F, class T>
struct O_x
{
    T& mem;

    constexpr O_x(T& data) : mem(data) {}
    constexpr decltype(auto) give() { return mem; }
    constexpr decltype(auto) clone() { return mem; }
};

template <class F, class T>
struct O_x<F, T&&>
{
    T mem;

    constexpr O_x(T&& data) : mem(std::move(data)) {}
    constexpr decltype(auto) give() { return (mem); }
    constexpr decltype(auto) clone() { return mem; }
};

The type O_x is winking the right (from our perspective) eye, which according to the Taranaki Herald guide to eye flirtation means “I love you”. We’ll be respecting the order of evaluation and need to distinguish between left and right folds, so node types should be distinct for the operators to be able to correctly build the expression tree. I don’t normally use emoticons to convey messages in code, but this felt… right.

The give() member functions return references to the underlying data and clone() is used to overcome some nuances in type deduction later on, ensuring that we get either a reference or a copy of the data, depending on whether it was an lvalue or an rvalue respectively (useful when using O_x temporaries, to avoid references to expiring objects). It goes without saying that we have to sprinkle std::move(...) and std::forward(...) all over the place, e.g.:

#define fw(...) ::std::forward<decltype(__VA_ARGS__)>(__VA_ARGS__)

namespace detail
{
    /* Implementation ... */
    template <class F, class T>
    constexpr auto makeO_x(T&& data)
    {
        return O_x<F, decltype(data)>(fw(data));
    }

    /* Implementation ... */
    template <class... Ts>
    constexpr auto _foldr(Ts&&... args)
    {
        return (fw(args) + ...);
    }
}

template <class F, class... Ts>
constexpr decltype(auto) foldr(Ts&&... args)
{
    return detail::_foldr(detail::makeO_x<F>(fw(args))...);
}

A forward macro like fw is a good way to be a bit more terse and write more readable code. I improved mine by adding __VA_ARGS__ as suggested by Vittorio Romeo. The complete code will be shown in compilable examples (and made available in more “official releases” later on) but for now let me stress three points:

• Forwarding references need to be used (evaluated, passed to another function, called) with std::forward to preserve their value category.
• std::move is like playing catch, it doesn’t work if there’s nobody on the other side. No matter how “move aware” a library is, it’ll end up copying stuff when handling non-moveable types.
• Moving from an object places it in a valid but unspecified state so be really thorough and explicit; even when creating dangling references or invalid state things may appear to work and that’s the worst kind of bug, the kind that bites when you’ve moved on.

## 3. Lazy evaluation

To respect their “functional heritage”, fold expressions in C++ should make a plea for lazy evaluation. Laziness is a common feature among functional programming languages, whereby expressions are evaluated only when necessary rather than upon declaration. This affects our programming style and the way we think about programs, infinity and efficiency. Below we’ll outline laziness in two programming languages and describe what’s doable in our case.

Haskell implements lazy evaluation down to its expression evaluation machinery. This means that the abstract syntax tree we get for a valid program is “evaluated” according to lazy evaluation rules, which most commonly manifest themselves in the following three ways:

• arguments to functions are evaluated only when this is necessary for evaluation to continue.
• an argument is not necessarily evaluated fully: only the parts that are needed are examined.
• an argument is evaluated at most only once. This is done in the implementation by replacing expressions by graphs and calculating over them.

### The Python case

One could argue that since Python is an interpreted language, and only code paths that actually execute need be checked in any way (as opposed to C++ not compiling due to type errors in unevaluated contexts), you have lazy evaluation right there. But this is not the case; Python’s lazy evaluation is not built into expression evaluation. Take the following example:

def f1(a, b):
    print('call')
    return a

def main():
    f1(f1(1, 2), f1(2, 3))

main()

Here the interpreter calls f1 three times, even though the result of the second inner call is never used. Haskell wouldn’t do that (it would only evaluate f1 twice); we won’t get into details, but this case falls under the first rule mentioned above.

So how can Python handle infinite lists, have lazy evaluation and perform calculation on demand? The answer lies in the use of one (and a half) patterns. We avoid calling them two patterns because of the amount of conceptual overlap between them. It all starts with the iterator pattern and implementing the iterator protocol:

class CustomRange:
    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.curr = 0
        return self

    def next(self):
        numb = self.curr
        if self.curr >= self.max:
            raise StopIteration
        self.curr += 1
        return numb

for i in CustomRange(10):
    print i

Implementing the “magic methods” __iter__ and next (or __next__ in Python3) as shown above makes the class usable in contexts where we loop through the items it “produces”. Boilerplate code can be reduced using the generator pattern like so:

def custom_range(max):
    a = 0
    while a < max:
        yield a
        a += 1

This pretty cool piece of code can provide “computation on demand” and has endless applications, from mundane ones to mind boggling. Check the references for a thorough introduction.

### The curious case of C++

C++ implements a special case of lazy evaluation in short circuiting, where e.g. the second operand of the || operator won’t be evaluated if the first is true: the expression will yield true right away (note that both sides must compile). The other place where we find lazy evaluation in C++ is template (meta)programming: only instantiated templates are evaluated (OK, plus the syntax of uninstantiated templates is checked), so in a way “if you don’t use it, it doesn’t exist”. To implement lazy evaluation for our fold expressions we’ll take inspiration from the iterator pattern, though this would be a great opportunity to apply the generator pattern via coroutines (let’s wait for things to become a bit more standard and we’ll be using yield-like constructs in C++!)

## 4. Flat expression templates

Expression templates as used in the previous post are not ideal for lazily evaluating the expressions we build, since the content is embedded in the form of the data structure. In order to lazily evaluate, we need to be able to step through intermediate states easily and express clearly the resulting types and the advancement of our computation. I haven’t quite seen this presented the way we’ll be presenting it here, hence my take at giving it a name: flat expression templates. That being said, I have to mention that it’s a common technique in GPU programming to “flatten out” data structures in order to make them more “stream-able”, so this is not a “virgin birth”. To illustrate what this variation of expression templates does, take for example a right fold expression, say:

(a + (b + (c + d)))


what we’ll do is flatten the nested tree that this expression implies into a linear sequence of nodes.

We need a type to hold the expression in a reverse polish notation fashion and we choose to store the “nodes” in a tuple to be as “compile time friendly” as possible (we’ll see later on that constexpr evaluation is still on the table).

template <class F, class... Ts>
struct O_Om
{
    std::tuple<Ts...> nodes;
    gut::member_result_t<F, decltype(std::get<0>(nodes).clone()),
                            decltype(std::get<0>(nodes).clone())> state;

    template <class A, class... Us>
    constexpr O_Om(A&& state, Us&&... args)
        : nodes{fw(args)...}, state{fw(state)}
    { }

    /* nested iterator type here */

    iterator begin() { return iterator(state, 0, nodes); }
    iterator end()   { return iterator(state, sizeof...(Ts), nodes); }

    constexpr decltype(auto) yield()
    {
        for (std::size_t i(1); i < sizeof...(Ts); ++i)
        {
            state = vtu::call_with_tuple_element_first(F{}, nodes, i, state);
        }
        return state;
    }
};

The type O_Om holds the expression (in its m hand) and has both its eyes open (we’ll be using it for left folds as well). To highlight some of the code we note that:

• state is the accumulator and has the type of applying the callable on the expression. If no accumulator is explicitly provided, the rightmost argument becomes it. As in Haskell, the accumulator consumes the expression from right to left (so write your callables accordingly).
• A nested iterator type will keep track of the nodes that have been computed.
• yield returns the result of the expression right away. It can be evaluated at compile time and it’s a good example of a constexpr member function not being const since it alters the state and wouldn’t be accepted by C++11 compilers that implicitly made constexpr member functions const. Luckily this is not the case in C++14 and beyond so we write code like this.
• vtu::call_with_tuple_element_first is a mechanism to visit a tuple with a “runtime” index (quoted because inside yield the i is evaluated at compile time)

## 5. Finally, lazy folds

The last piece of the puzzle is the expression iterator, which will be a nested member of O_Om:

class iterator
{
    // Publicly define the standard iterator typedefs here
    reference _state;
    std::size_t _pos;
    std::tuple<Ts...>& _nodes;

public:
    iterator(reference state, std::size_t pos, std::tuple<Ts...>& nodes)
        : _state(state), _pos(pos), _nodes(nodes)
    {}

    iterator(const iterator& other)
        : _state(other._state)
        , _pos(other._pos)
        , _nodes(other._nodes)
    {}

    iterator& operator=(const iterator& other)
    {
        _state = other._state;
        _pos = other._pos;
        _nodes = other._nodes;
        return *this;
    }

    bool operator==(const iterator& other) const { return _pos == other._pos; }
    bool operator< (const iterator& other) const { return _pos <  other._pos; }

    iterator& operator++()
    {
        ++_pos;
        _state = vtu::call_with_tuple_element_first(F{}, _nodes, _pos, _state);
        return *this;
    }

    reference operator*() const { return _state; }
    pointer operator->() const { return &_state; }
};

All this iterator needs to do is hold references to the flattened expression and the state and keep track of the current position in the expression. Calling ++ on it advances the position on the expression and evaluates it, while dereferencing it returns the current state. We only implement == and < and take advantage of std::rel_ops to generate the rest. Having this in place we can write code like this:

int main()
{
    std::string acc;

    auto k = fld::foldr<JoinAc>(
        std::string(" < in the bush >"), 10,
        std::string(" < bird in the hand, is worth > "), 1, acc);

    using namespace std::rel_ops;
    for (auto&& state : k)
    {
        std::cout << "computed value so far : " << state << std::endl;
    }

    constexpr auto mm = fld::foldr<Max>(11, 2, 5, 2, 4).yield();
    std::cout << "\nmax is " << mm << std::endl;
}

JoinAc is a functor that concatenates the stringification of its arguments, and we can loop through the expression state with range-based for loops since our O_Om has begin and end member functions (a traditional for or while would also be acceptable). We also showcase that compile-time evaluation is still doable by creating a constexpr variable mm using a “right away” evaluation.
Check a compilable example here

# Fun with folds

## The functional genome

A fold is a higher order function (a function that has one or more function parameters and/or returns a function) that is recursively applied over a data structure. An easy-to-grasp example of such a function can be found in Haskell:

foldl (operator) accumulator [...]


Here are the building blocks of the above expression :

• foldl : A left fold; left as in leftwise applied (…not a leftover fold) – more on this later.
• (operator) : A binary function
• accumulator : A starting value
• […] : A list to fold

So, say you want for example to sum the (list of) numbers from 1 to 10 :

foldl (+) 0 [1..10]


This expands to :

( ( ( ( 0 + 1 ) + 2 ) + 3 ) + ... + 10 )


Does the power of folds boil down to accumulations? Far from it!

Folding can be used creatively to perform all sorts of actions, algebra preserving ones:

foldl (max) 0 [1, 20, 2, 14]


or catamorphisms

foldl (Stringify) "" [1, 20, 2, 14]


here we assume that Stringify is a function (you’d need a type class for this) that maps its input to a stringified concatenation, so that we’d get “120214” and have a numeric sequence reduced to a string.

## Folds in a C++ Context

C++17 will have fold expressions! Clang supports them since 3.6 as does the trunk version of gcc. The syntax differs from Haskell and is designed to play well with variadic templates. A fold expression can be written like this:

(args + ...)


This is a (unary) right fold, because we “anchor” the application of the operator to the right, i.e. it expands as:

args$0 + (args$1 + (... + args$n))

If we were to create a left fold:

(... + args)

the expression expands like this:

((args$0 + args$1) + ...) + args$n


We used the term “unary” previously not to describe the operator (+) but the absence of an explicit accumulator. When an accumulator is explicitly specified

(args + ... +   an)
(  a0 + ... + args)


we dub the fold expression “binary”. For unary folds, an empty parameter pack only folds meaningfully for a few operators: in the final C++17 rules, && yields true, || yields false and the comma operator yields void(); for the rest (e.g. + or *) folding an empty pack is ill-formed.

## Show me some folding

### A for_each lambda

auto ForEach = [](auto&& fun, auto&&... args) {
(fun(args), ...);
};


To showcase this, let’s assume the existence of a printer functor

struct Printer
{
template <class T>
void operator()(T&& arg) { std::cout << arg; }
};


we can call it on arbitrary sequences like this

ForEach(Printer{}, 1, " to the ", 2, " to the ", 3, '\n');


ForEach applies an arbitrary callable to the variadic arguments by folding with the , operator. It also does this in a left to right fashion, which is the evaluation order of the comma operator. Note that if the other fold direction was used inside ForEach we’d still be applying the callable from left to right (the behavior of comma does not change). Run the code here.

### Summing the contents of an std::array at compile time

namespace detail
{
template <class... Ts>
constexpr auto sum_(Ts&&... args)
{
return (args + ...);
}

template <typename T, size_t N, size_t... Is>
constexpr T sum_impl(array<T, N> const &arr,
index_sequence<Is...>)
{
return sum_(get<Is>(arr)...);
}
}

template <typename T, size_t N>
constexpr T sum(array<T, N> const &arr)
{
return detail::sum_impl(
arr, make_index_sequence<N>{});
}

int main()
{
constexpr array<int, 4> arr{ { 1, 1, 2, 3 } };
cout << integral_constant<int, sum(arr)>{};
}


The core function here is detail::sum_ which uses fold expressions to calculate the total of a sequence.

### A more modern summing style

The above is OK and all, but C++17 will give us the tools to be way, way more terse. In this case the function apply abstracts away the process of unfolding an algebraic data type into a function call (what we do manually, in a total of three functions, with the index sequence above). Having this in place, all it takes is to encapsulate our fold:

struct Summer
{
template <class... Ts>
constexpr auto operator()(Ts&&... args) {
return (args + ...);
}
};


and then call

apply(Summer{}, arr)


Sadly the current proposal for apply only works for tuples. It takes a minor modification to have it working for std::array as well, specifically a mechanism to obtain the arity of the input data type (for now tuple_size is “hardcoded” into apply). Click here for a demo.

## Folding arbitrary functions

A somewhat limiting feature of C++ fold expressions is that they’re only available for certain operators:

        +  -  *  /  %  ^  &  |  =  <  >  <<  >>
+=  -=  *=  /=  %=  ^=  &=  |=  <<=  >>=
==  !=  <=  >=  &&  ||  ,  .*  ->*


So doing

args Op ...


is not allowed for arbitrary operators. We are going to create a mechanism to bypass this. To solve it we are going to use expression templates, which is the first thing that came to mind when looking at Haskell’s fold diagram, due to its resemblance to Veldhuizen’s expression parse trees.

The code below creates ad-hoc expression templates for arbitrary operators, using thin wrapper types that can be parametrized by the operand/operator pair.

The first thing we need is a wrapper type that, apart from data, records the type of the operation we want to use:

template <class F, class T>
struct XX
{
T data;

template <class D>
constexpr XX(D&& data)
: data(std::forward<D>(data))
{ }

constexpr T give() const
{
return data;
}
};


Then we create an expression class that records the operands and the type of the operation we want to perform.

template <class L, class F, class R>
struct YY
{
L l;
R r;

template <class LL, class RR>
constexpr YY(LL&& l, RR&& r)
: l(std::forward<LL>(l))
, r(std::forward<RR>(r))
{ }

constexpr auto give() const
{
return F{}(l.give(), r.give());
}
};


We’ll be using the + operator in our ad-hoc expression templates, so we overload it to perform expression parsing for our wrapper class (the two versions are created in a way that satisfies the types involved in a left to right evaluation):

template <class F, class R, class T>
constexpr auto operator+(
XX<F, T> const& lhs, R const& rhs)
{
return YY< XX<F, T>, F, R >(lhs, rhs);
}

template <class F, class T1, class T2, class R>
constexpr auto operator+(
YY<T1, F, T2> const& lhs, R const& rhs)
{
return YY< YY<T1, F, T2>, F, R >(lhs, rhs);
}


Finally the interface functions use the above machinery:

namespace detail
{
template <class... Ts>
constexpr auto _foldl(Ts&&... args)
{
return (... + args);
}

template <class... Ts>
constexpr auto _foldr(Ts&&... args)
{
return (args + ...);
}
}

template <class F, class... Ts>
constexpr decltype(auto) foldl(Ts&&... args)
{
return detail::_foldl(XX<F, Ts>(args)...).give();
}

template <class F, class... Ts>
constexpr decltype(auto) foldr(Ts&&... args)
{
return detail::_foldr(XX<F, Ts>(args)...).give();
}


What we get out of this is a foldl and a foldr function that, much like their Haskell counterparts, can be parameterized by the type of a binary operator. And then things get interesting:

// concatenate the stringification of sequence elements
foldr<Join>(
1,
std::string(" bird in the hand, is worth "),
10,
std::string(" in the bush"));

// create a vector out of a sequence
auto k = foldl<Vettore>(1, 2, 30, 12);

// find the maximum element of a sequence
foldl<Max>(1, 20, 3, 5);


The complete code for the above examples can be found here.

Hopefully this is clear enough to proceed to the next step: providing some syntactic sugar. We didn’t go through all this ET trouble to end up with something compile-time recursion could (kind of, almost, in let’s say a similar way) handle. What we aimed for is the ability to write customizable C++ folds ourselves. Here’s how:

template <class F, class T>
XX<F, T> Op(T const& val)
{
return XX<F, T>(val);
}


So when using this “operator generator” we’d be writing things like

(Op<Max>(args) + ...)


# Overloading lambdas

## 1. Intro

A piece of “bad marketing” regarding lambdas in C++ was the use of the term “functions”. Of course connoisseurs talked about “lambdas”, “closures” or “lambda expressions”, since lambdas are never quite functions in the codegen sense and we certainly can’t think of them as such in language-semantics terms; yet this misconception can trigger a quest for features we normally find in plain or templated functions; the first and second sections of this post elaborate on this.

If we think of lambdas as functions we might make an attempt to overload them. This attempt is easy to rebut by stating that lambdas are closures, i.e. runtime objects, and well… objects, even callable ones, do not overload! What is interesting here, though, is that even lambdas “translatable” to functions (usually non-capturing closures), when compiled, have an intermediate representation of an object with at least 3 member functions: a function call operator, a conversion operator and one static member function that simply invokes the lambda with this set to zero; it would take some preconditions being satisfied (like the actual closure object not being used) to optimize the closure away and end up with a plain function call. What I mean by “can’t think of them as such (functions) in language semantics terms” is that even in this case we can’t expect this to work:

auto f = []( int){ cout << __PRETTY_FUNCTION__ << endl; };
auto f = [](char){ cout << __PRETTY_FUNCTION__ << endl; };

f( 1 );
f('a');


much like we can’t overload a function pointer: it’s always an “already declared” error. Since the title promises overloading, we’d better not call it a day yet. The closest thing to a lambda that can overload is its function call operator, so you might already have had your “aha!” moment by now. If not, here it is:

Create an object out of the lambdas you want to overload and make their function call operators overloaded versions of the superobject’s function call operator.

The idea is often attributed to Mathias Gaunard as well as the following implementation:

template <class F1, class F2>
struct overload_set : F1, F2
{
overload_set(F1 x1, F2 x2) : F1(x1), F2(x2) {}
using F1::operator();
using F2::operator();
};

template <class F1, class F2>
overload_set<F1, F2> overload(F1 x1, F2 x2)
{
return overload_set<F1, F2>(x1, x2);
}


The code can be described in a few simple steps

• overload_set is the superobject’s type; it inherits from the closure types we want to overload
• the closure types’ function call operators are explicitly made visible to the scope of the class
• overload is a convenience function that returns an overload set

Usage is straightforward:

auto f = overload (
[]( int) { cout << __PRETTY_FUNCTION__ << endl; },
[](char) { cout << __PRETTY_FUNCTION__ << endl; }
);

f('k');
f( 2 );


The names of the function call operators can be made available through ADL (argument dependent lookup), but the using declarations of the original implementation were left unharmed since they make the code more explanatory. We’ll be removing them when we want to scale to an arbitrary number of overload candidates (an industrial-strength version is provided in the References section, link No 2):

template <class... F>
struct overload_set : F...
{
overload_set(F... f) : F(f)... {}
};

template <class... F>
overload_set<F...> overload(F... f)
{
return overload_set<F...>(f...);
}


Overloading lambdas gives you the full power of closures plus ad-hoc polymorphism, i.e. it makes lambdas (actually the superobject) behave differently for each type. You can imagine this being useful when writing a for_each algorithm for tuples of different types; the functor you pass to the algorithm can then handle the different elements and be defined right at the location where it is called. In the following example we assume we have a function for_each_in_tuple that visits every element in a tuple; we can then define “in situ” a mutator func (an overload set of lambdas) that will transform the tuple in ways that vary depending on the element type:

int main()
{
auto func = overload (
[](int &val)    { val *= 2;   },
[](string &arg) { arg += arg; },
[](char &c)     { c    = 'x'; }
);

tuple<int, string, char> tup {1, "boom", 'a'};
for_each_in_tuple(func, tup);

cout << get<0>(tup) << endl
<< get<1>(tup) << endl
<< get<2>(tup);
}


(Run the example here)

## 2. SFINAE

According to common knowledge, generic lambdas are function objects with a templated function call operator. Back in the day, when we mistook lambdas for functions, we might have attempted to use SFINAE on them, yet another language feature that works for function templates only. Using the above mechanism, expression SFINAE works out of the box for generic lambdas:

auto f = overload (
[](auto arg) -> enable_if_t<
is_integral<decltype(arg)>::value> {cout << "1";},
[](auto arg) -> enable_if_t<
!is_integral<decltype(arg)>::value>{cout << "2";}
);


Easy peasy: since the function call operators of generic lambdas are templated, they fall under SFINAE rules when the superobject’s function call operator is invoked and candidates are created for overload resolution.

## 3. Non copyable lambdas

Closures may generally be copied. There are cases, though, where you want to prohibit this. For instance you may want to limit the ways a lambda can escape a scope in order to avoid dangling references. A silly example follows:

struct Locall
{
int val;
~Locall() {
cout << "destroyed";
}
};

int main()
{
function<void()> f;

{
Locall a{ 101 };
auto k = [re = ref(a)] {
cout << "... but using value "
<< re.get().val;
};
f = k;
}

f();

return 0;
}


(run the example here)

If we marked k as non copyable, the compiler wouldn’t allow f to “absorb” a reference to an object about to die; it’s a way for the coder to convey intent that may later be forgotten (or can’t be grasped by another person working on your code). To make a non copyable lambda, we use a nocopy capture:

auto k = [nocopy, re = ref(a)] { /* ... */ };


Since the obvious way for a lambda to be non copyable is to capture a move-only type, I use a macro that generates an “init capture” (a C++14 generalized capture) with a unique name and a light(est)weight captured object. This is how:

#define CONCATENATE_DETAIL(x, y) x##y
#define CONCATENATE(x, y) CONCATENATE_DETAIL(x, y)
#define UNIQUE_IDEXPR(x) CONCATENATE(x, __COUNTER__)

struct NC
{
NC()                     = default;
NC(NC const&)            = delete;
NC& operator=(NC const&) = delete;
NC(NC&&)                 = default;
};

#define nocopy UNIQUE_IDEXPR(x) = NC()


Note that what we get from this is a unique expression (auto)xi = NC() (where auto isn’t actually spelled and i equals 1, 2, 3…), a move initialization due to the rvalue produced by NC(), i.e. it requires the default and move constructors to be available (auto generated or defined). Also xi is in the scope of the closure class, so we don’t need to worry about collisions with the outer scope.

## Quiz

Something is redundant in the nocopy macro, can you spot it? Answer in the comments section or look at the code in Github (link No 2) to find out.

# Benchmarking in C++

## Intro

We’ll be creating a framework that attempts to provide a generic, cross-platform (standard compliant) solution to the benchmarking problem, without depending on third party libraries. The end product will look like this:

benchmark<time_type, clock_type> bm;

bm.run(
"Experiment Name"
, sampleSize
, { /* code to time */ }
, "factor name"
, factors.begin(), factors.end());

bm.serialize(
"benchmark name", "output file name", mode);


The main features of this design are :

• Parametrizable time and clock type per benchmark.
• Multiple experiments can run on the same benchmark (hence we can group and compare results as we wish).
• Adjustable sampling size (number of times each experiment will run).
• A factor can be passed to the code under benchmark (simply put, when we graph the results, the factor will be the variable along the x axis).
• The benchmarks are printable and serializable in a “Python friendly” form.

As a complementary tool, a Python script will be developed to read the output of a serialization and make graphs using matplotlib.
Finally, it will become apparent that the framework is mostly concerned (without excluding the possibility of expanding) with code runtime performance. The reason why this post isn’t titled “Timing C++ code” is that a sampling procedure and postprocessing of the results are incorporated as well.

## Know why

If you think something is correct, prove it with a test. If you think something is fast, time it. If you think one algorithm is better than another, find some way to make a measurement to support your argument.

Do not depend only upon your “common sense.” Do not depend only on theory. It is good to use theory to help guide you to an optimum solution. But without measuring, you have no way of knowing (nor convincing others) of the correctness of your theory. Without measurement, you have a faith-based system, not a scientific theory. [3]

## Know how – techniques

The dominant technique used here is instrumentation. We apply it by requiring that the code being timed is called through a measuring structure. A simple version of this emerged after several design iterations:

template<typename TimeT = std::chrono::milliseconds>
struct measure
{
    template<typename F, typename ...Args>
    static typename TimeT::rep execution(F func, Args&&... args)
    {
        // steady_clock is monotonic, making it the right choice for
        // interval measurement (system_clock may be adjusted mid-run)
        auto start = std::chrono::steady_clock::now();
        func(std::forward<Args>(args)...);
        auto duration = std::chrono::duration_cast<TimeT>(
            std::chrono::steady_clock::now() - start);
        return duration.count();
    }
};


Things to note here are :

• Variadic templates are used to forward as many arguments as needed to the callable object (function, function pointer, functor, lambda or a block of code surrounded by a lambda).
• The timed code always lies between the two calls to now(); as long as it has observable effects, the optimizer cannot elide it or hoist it past those calls, so the timing remains valid.
• This calling layer adds minimal overhead; in other words, if passing the callable and the arguments (by forwarding reference) to the instrumentation structure disturbs the results, then there’s no point in the measurement to begin with.

This has a very clean use pattern and minimal code pollution when it comes to timing:

auto executionTime = measure<>::execution(f0);


## Know how – std::chrono

C++11 introduced “a flexible collection of types that track time with varying degrees of precision”. As a brief introduction to this collection, we can identify three workhorses: clocks, durations and time points.

A clock is a class that provides access to a time source. Things to know about clocks:

• We call now() to get the current time of a clock.
• There are three clock types: steady_clock, system_clock and high_resolution_clock.
• A clock’s tick period is the length of one tick in seconds, specified as a compile-time std::ratio.

Durations are what we would call time types. The related class template std::chrono::duration<> can “create” a time type with each instantiation. To give an example, this is how we could define milliseconds (with a double representation):

std::chrono::duration< double, std::ratio<1,1000> >


Finally, a time point is the return type of clock::now(). It is implemented as if it stores a value of type Duration indicating the time interval from the start of the Clock’s epoch.

## Implementation

A draft description of the entity design is shown in the diagram below:

I may provide more details on the implementation techniques in another post, but for now I’ll justify (for anyone reading the code) the use of type erasure: the experiments’ functionality had to be implemented with function templates, and those can’t be virtual.

## Using the toolset

We’ll be using the toolset to fill in the missing graphs in Bjarne’s presentation. Therein Mr. Stroustrup proposes the following experiment:

• Generate N random integers and insert them into a sequence so that each is inserted in its proper position in the numerical order.
• Remove elements one at a time by picking a random position in the sequence and removing the element there.
• For which N is it better to use a linked list rather than a vector (or an array) to represent the sequence?

This is all it takes (assuming the correct headers are included):

template<class Cont>
void CommonUse(int nElems)
{
    Cont cont;
    random_device rd;
    mt19937 eng{ rd() };
    uniform_int_distribution<int> distr(0, nElems);

    for (int i = 0, val; i < nElems; i++) {
        val = distr(eng);
        cont.insert(lower_bound(
            begin(cont), end(cont), val), val);
    }

    while (nElems) {
        auto it = std::begin(cont);
        cont.erase(it);
        --nElems;
    }
}

int main()
{
    bmk::benchmark<> bm;
    // 1. Run an experiment for vectors --------------
    bm.run("vector", 10, CommonUse< vector<int> >,
        "number of elements",
        { 10, 100, 1000, 10000, 100000, 1000000 });
    // 2. Run an experiment for lists ----------------
    bm.run("list", 10, CommonUse< list<int> >,
        "number of elements",
        { 10, 100, 1000, 10000, 100000, 1000000 });
    // 3. Record the results -------------------------
    bm.serialize("Why you should avoid Linked Lists",
        "vector_vs_list.txt");
}


Here we average the results of 10 samples per experiment and use an initializer list to provide the factors. Factors must have the same type as the argument of the experiment’s callable, and can also be given as an iterator-delimited range (begin / end). If you think an operation should be excluded from the timing, you can surround it with a timeout’s .tic() / .toc() calls, as long as the timeout has the same clock and time type as the experiment:

bmk::timeout_ptr<> to = make_unique<bmk::timeout<>>();

to->tic();
val = distr(eng); // time excluded from the experiment
to->toc();

/* rest of the experiment */



Experiments that return timeouts (actually unique pointers to timeouts) are handled differently (thank you, expression SFINAE) and have the total duration stored in the timeout subtracted from the experiment’s time (hopefully you won’t use timeouts for trivial tasks that take 0 nanoseconds anyway).

If we were to open “vector_vs_list.txt” (just the results stored as a Python dictionary) with the visualizer Python script, this is what we get:

Yes, I only actually tried up to 10,000 elements; it takes waaaay too long when I add the last two factors as well (you can run the code yourself if you don’t believe me).

If we were to run factorless experiments, the sequence size couldn’t be an input. So let’s choose, for example’s sake, the size 10,000; the visualization we get is a boxplot per experiment:

## Code Kata

• Add the option to have each experiment run in its own thread (I’ll be posting a take on this but it’s good to get some ideas first).
• I kind of cheated (not that it changes the results) in favor of vector. Can you spot where?

## Post scriptum

tl;dr? You can check the code in action here (it’s coliru so I’m only console-printing the results).

## References

1. My post on SO
2. cppreference.com on chrono
3. C++ coding guidelines
4. The chrono proposal
5. The benchmark class and visualizer at the blog’s repository

# Compile-time integer sequences

## Intro

We’ll be discussing a mechanism that can, during compilation, generate the sequence [0, N] for a given integer N. For example, the following mappings can be generated:

int_seq<3> -> { 0, 1, 2, 3 }
int_seq<2> -> { 0, 1, 2    }
int_seq<1> -> { 0, 1       }
int_seq<0> -> { 0          }


We choose a closed interval in the examples so that the last case won’t produce an empty set (and an awkward compilation).

## Know why

Being a compile-time construct, this sequence can be embedded in type information or used in contexts that demand compile-time constants. Typical examples include:

• Filling a braced initializer list. This has numerous uses; consider e.g.:
template<int N>
void fU()
{   // int_seq<N> would come in handy here
    int arr[] = { 0, 1, /* ..., */ N - 1, N };
}

• Pack expansion contexts; more on this in the next post, but for now just consider getting all the elements of a tuple when all you know is its size, given that get<> demands a compile-time constant.
• In complex compile time computations (I’ll be posting about a compile time sqrt function, stay tuned)

## Know how

We present an implementation based on [3], tweaked to provide the closed interval:

// a sequence container -------------------------
template<int ...>
struct ints
{
};
// integer sequence -----------------------------
template<int N, int... Is>
struct int_seq : int_seq<N - 1, N, Is...>
{
};

template<int... Is>
struct int_seq<0, Is...>
{
    using type = ints<0, Is...>;
};
// convenience alias -----------------------------
template<int N>
using IS = typename int_seq<N>::type;


What happens here is that int_seq inherits from “another version” of itself. Inheriting from an incomplete type (as the class itself is during its own definition) would be forbidden, but since the template arguments differ, each int_seq inherits from a different, fully instantiated class.

This compile-time (expanding) recursion causes the integer sequence to be embedded in the variadic integer pack. We can mentally trace the numbers during instantiation:

int_seq<3>       inherits from int_seq<2, 3>
int_seq<2, 3>    inherits from int_seq<1, 2, 3>
int_seq<1, 2, 3> inherits from int_seq<0, 1, 2, 3>

int_seq<0, 1, 2, 3> contains { 0, 1, 2, 3 }


So how do we use this thing ? The “secret” is to infer the integer sequence. Let’s showcase this by returning to our first use case:

template<int... Is>
void fU(ints<Is...>&& s)
{
    for (auto i : { Is... }) cout << i << " ";
    cout << endl;
}

int main()
{
    fU(IS<5>());
}


Since the sequence can’t “live” on its own, it’s used through the pack-type container ints. Yet inside fU we can refer to the sequence Is since it has been deduced; all it took was placing it in a deducible context!

## CodeKata

Now make a reverse integer list. I was tempted to submit a proposal for an STL extension (make_reverse_integer_sequence), but after solving this I discovered (not to my surprise, I admit) that it has been done before (just look at [2]).

So here’s my take on this:

// reverse integer sequence ---------------------
template<int C, int N, int... Is>
struct rev_seq : rev_seq<C - 1, N, N - C, Is...>
{
};

template<int N, int... Is>
struct rev_seq<0, N, Is...>
{
    using type = ints<N, Is...>;
};
// convenience alias ----------------------------
template<int N>
using RS = typename rev_seq<N, N>::type;

## Post scriptum

Related code can be found in the blog’s repository. A demo can be found here. Lastly, here’s a list of problems, in ascending order of difficulty :

1. Trace the instantiation for RS<3> in the CodeKata.
2. Tweak the code to provide the open sets (most implementations out there work this way, so that’s an easy one).
3. Make a reverse integer list by reversing the front one.
4. Make [N1, N2] ranges, i.e. ranges that can have their starting point specified, e.g.
 [2, 6] -> { 2, 3, 4, 5, 6 }
5. Make [N1, N2, step] ranges, i.e. ranges that can have their starting point and step specified, e.g.
 [2, 6, 2] -> { 2, 4, 6 }

Hopefully I’ll get some responses on the above and compile the highlights (along with some commenting) in an upcoming post.