Composing callables in modern C++

Composing callables is a typical and useful functionality one could ask from a programming language. The canonical explanation of composition includes the example:

h(x) = f(g(x)) = (f * g)(x)

i.e. h is the composition of two functions, namely f and g. This means the effect of calling h is that of calling f with the result of g(x).

Being statically typed, C++ may appear as a nightmare of a language to express such mutations. After all what would be the type of an entity that produces a composition out of arbitrary callables? Luckily, such a view couldn’t be further from the truth; not only is such a utility expressable in core C++ (no third party libraries needed), but also consistitutes a prime example where many modern C++ features simplify the code and facilitate the end result.

Let’s look at one possible implementation:

template <class F, class... Fs> 
constexpr auto compose(F &&arg, Fs &&...args)
{
  return [
    fun = std::forward<F>(arg),
    ... functions = std::forward<Fs>(args)
  ] <class... Xs> (Xs &&...xs) mutable
    requires std::invocable<F, Xs...>
  {
    if constexpr (sizeof...(Fs))
    {
      return compose(std::forward<Fs>(functions)...)(
        std::invoke(std::forward<F>(fun), 
                    std::forward<Xs>(xs)...));
    }
    else
    {
      return std::invoke(
        std::forward<F>(fun), 
        std::forward<Xs>(xs)...);
    }
  };
}

Usage of such a construct would allow to chain an arbitrary amount of callables where the result of each one can be passed to the next:

auto h = compose(g, f, z);
h(x);

As shown in the following Demo, the first callable can even have an arbitrary amount of arguments, but subsequent ones must accept the result of the previous call as a parameter. Let’s now list the modern C++ features that allow us to write this code, along with the language version that introduced them:

  1. Line 6 (C++20): Variadic capture clause, allows us to capture an arbitrary amount of callables into the lambda.
  2. Line 7 (C++20): Template syntax for generic lambda expression. This comes in handy when forwarding the xs parameters to the callable, because their type is now readily available and I don’t have to resort to more bloated syntax like “decltype(xs)...“. It also facilitates writing requirements (see next point).
  3. Line 8 (C++20): Constraint to prevent composition chains where the callable cannot be invoked with the provided argument(s), also allowing better compilation error messages.
  4. Line 10 (C++17): constexpr if allows compile time branching to treat our two cases, namely that of Fs being greater than zero vs zero.
  5. Lines 13 & 18 (C++17): std::invoke is a standard library function which enables us to invoke arbitrary callables with a unified syntax.

The only “tricky” line in the code above, is the return statement in Line 12, which says: call compose again, only now

  • the list of functions is the “tail” of the current function list, and
  • the argument to that “new” composition is the result of calling the “head” with xs i.e. fun(xs...).

A library solution

Addressing composition in modern C++ wouldn’t be complete unless we mention standard library facilities can help, e.g.:

using std::views::transform;
auto fgh = transform(h) | transform(g) | transform(f);

// Calculate f( g( h(x) ) )
auto fgh_x = ranges::single_view{42} | fgh; 

fgh_x[0]; // Result is the only element in the view.

If pulling from a single element view is not something you find awkward, or even bettern you’d like to apply the composition to a collection of inputs, you can even create an abstraction similar to what we described above using fold expressions (Demo):

template <class... Fs>
auto composer(Fs&&... functions) 
{
  using std::views::transform;
  return (transform(functions) | ...);
}

Sanitizers in continuous integration

Sanitizers are tools embedded in gcc or clang, that can instrument your code to check for various error categories like:

  • Memory and address violations.
  • Threading problems.
  • Undefined behavior.

This post shows how to incorporate such tools to your continuous integration pipeline for C++ code and provides an open source repository of reusable sanitizer workflows for GitHub.

What is a sanitizer good for?

There’s countless ways to mess up a C++ program, but many problems boil down to memory or threading issues. Let’s look at two silly programs that fall into these categories:

// 1. 
{
    auto d{ new double };
}

// 2. 
{
    int val = 0;
    auto task = std::async([&val] {
        int i(0);
        while (i++ < 1'000)
        {
            ++val;
        }
    });

    if (val == 1'000) { 
        cout << "Updates finished"; 
    }
    task.get();
}

In the first case, the address sanitizer will report the memory leak, adding the source location that generated it:

=============================================
==2153==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 8 byte(s) in 1 object(s) allocated from: 
#0 0x7fe48d1d6587 in operator new(unsigned long) 
    ../../../../src/libsanitizer/asan/asan_new_delete.cc:104
test_asan.cpp:5

The second error case can be handled by the thread sanitizer. It recognizes the potential for a data race when reading val, while being modified by task.

==================
WARNING: ThreadSanitizer: data race (pid=2209) 2: Write of size 4 at 0x7ffe61a62a3c by thread T1:

Drilling down to the culprit of threading issues might be more convoluted, although the exact location of unsafe access is again given. An awesome bonus is that thread sanitizer has (almost) zero false positives, which makes it much more useful compared to similar tools.

Testing frameworks running with sanitizers

A sanitizer can be enabled by providing the '-fsanitize=' option to your compiler. Of course for complicated projects with intricate dependencies, configuring your build system will be a bigger challenge. When the sanitizer of choice is enabled, this will make your test suite run against the instrumented code.

Note that if you’re using third-party libraries you’ll have to also compile those with sanitizer support or else a sanitizer cannot peer through them. Even so, problems in third party libraries will be reported with restricted information about the internals that caused them. This is an excellent starting point and a very big help when evaluating the integration of third party components to your code-base.

When running a test suite over sanitizer-instrumented code, say GoogleTest, run time execution will spit out problems reported by the sanitizer as well. For example:

2: Note: Google Test filter = TestMyTest.TestCase
2: [==========] Running 1 test from 1 test suite.
2: [----------] Global test environment set-up.
2: [----------] 1 test from TestThreadSanitizer 
2: [ RUN ] TestThreadSanitizer.DataRace
2: ================
2: WARNING: ThreadSanitizer: data race (pid=2209)
2: Write of size 4 at 0x7ffe61a62a3c by thread T1: 
2: #0 operator() /home/runner/work/sanitizer_workflows/sanitizer_workflows
...

These magical “WARNING” and “ERROR” keywords will trigger your testing framework; the test will fail. Simply by running your test suite through your sanitizer of choice, the tests not only check the observable behavior, but also verify that all runs correctly under the hood and no latent bugs are left in the code.

Reusable GitHub workflows

I have created a repository with reusable workflows for my two favorite sanitizer configurations: thread and address sanitizer. Using the workflows is described therein, but in summary it just takes defining a yaml file like:

name: Memory

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  address_sanitizer:
    uses: picanumber/sanitizer_workflows/.github/workflows/asan.yml@main
    with:
      testDir: 'tests' # Optional, default value is 'tests'

Example

The yapp library, heavily depends on threading integrity. It follows the guide in sanitizer_workflows to provide the sanitizer checks:

Pipelines to the rescue

This post discusses pipelines, a popular parallel programming pattern, aiming to dive into:

  • What is a parallel pipeline.
  • What are some popular use cases.
  • A C++ library implementing the pattern.
  • An example use-case, with implementation.
image from yapp

Pipelines in a nutshell

We can think of a pipeline as the computing analog of an assembly line. A pipeline is a data processing mechanism composed of a sequence of stages. Data progresses from one stage to the next, occupying one stage at a time. This lifts concerns on data races, and allows stages to use their own computational context (dedicated thread, device, system). Finally, buffering can be used between stages to regulate varying latency of computations.

The parallel pipeline pattern can be identified in:

  • The Unix instruction pipeline
  • The OpenGL programming model
  • CI / CD pipelines
  • Anything that has to do with real-time or streaming data processing

A more specialized adaptation of the pattern can be found in “instruction pipelining” within a processor. As shown below, there can be four stages to this pipeline (fetch, decode, execute, write-back), while the completion of an instruction can be viewed as the output. The control unit plays the role of the generator, arranging for a flow to start, continue and stop as a program commands.

Instruction pipelining from Wikipedia

In its general software implementation, a pipeline is composed of three types of stages:

  • A generator stage. This is the stage that pumps data into the pipeline and can be placed at the front of a pipeline. A pipeline should be operational with unbound generators, i.e. an “infinite” data stream.
  • Intermediate stages, processing stages or transformation stages. These are the stages that take some input and produce some output. A pipeline can have an arbitrary amount of such stages.
  • Sink or output stage. This stage extracts data from the pipeline and can be placed at the back of a pipeline.

Of course no-one prevents intermediate stages to also output “intermediate” results, but in general pipelines are expected to produce on a single end. Secondary reasons like debugging may force us to inspect data half way through, but that is a “side-concern”.

yet another parallel pipeline

A multi-threaded implementation of the pipeline pattern can be found in yapp. It is an alternative to pipelines in large general purpose multi-threading libraries, aspiring to:

  • Smoothly collaborate with code using standard thread facilities.
  • Avoid the (bigger) learning curve.
  • Provide “just use the pipeline part”.

yap sports:

  • Zero dependencies.
  • Header-only implementation.
  • Vanilla c++20.
  • Exclusive use of C++ standard facilities like <thread> and friends.
  • Meta-programmed stitching of user-provided callable objects into pipelines.

k most frequent words

In searching for a “hands-on” example using pipelines, we should consider that many text processing tasks can be solved in a pipeline:

  • Find all occurrences of a word.
  • Find all misspelled words.
  • Translate a text – OK, technology has advanced here so we might be looking at a different universe with this one.
  • Find the number of individual words, or extract other metrics on “language quality”.

An interesting variation is the following: Given a text file and a positive integer number K, return the K most frequent words in that text. This process can be modeled as pipeline with the following steps:

  1. Read the input file line by line.
  2. Break each line into a list of words.
  3. Place each word into a hash table, mapping the word to its frequency.
  4. Use newly discovered <word,frequency> pairs to maintain a K-top list of words.

This is written in yap as:

auto pl = yap::Pipeline{} 
  | Reader(name) | Splitter{} | Frequency{} | KTop(k);

pl.consume(); // Use all data from the generator.

While step (1) is generating lines (reading the input), subsequent steps are already processing data in their own thread. By the time input reading is finished, we are pretty much done processing. Full code is provided here, but to drive the point home it’s worth looking at the file reader implementation:

class Reader
{
    std::ifstream _input;

  public:
    Reader(const char *fname) : _input(fname)
    {
        if (!_input.is_open())
            throw std::logic_error("Cannot open input file");
    }

    std::string operator()()
    {
        std::string line;
        if (!std::getline(_input, line))
        {
            throw yap::GeneratorExit{};
        }
        return line;
    }
};

Every time the reader is invoked, it pushes a single line from the input file to the pipeline. Subsequent steps start working with that line as described in steps 2-4. Pipelines are usually “ever-running” (or until their .stop()method is called) but in this case we need to process a predefined amount of data, namely all of the input file. To signal the “end of input”, the generator throws the GeneratorExit exception (if you know what I’m saying), while the call to consume is to block until all data has been processed.

You can read on all the available operations here.

Outro

This example concludes our introduction to pipelines and the yap library. Hopefully it was enough to motivate you to apply the pattern in your code and search for potential optimizations or improvement in organization that it may introduce.

References

Safe specializations – Concepts in practice 1

Can you enforce user code to provide correct specializations? This article provides one way of doing it.

1. Specializations

Class template specializations are used to provide a custom behavior for your type of choice. Think for example the property of something being serializable, one might express it as:

template <class>
struct Serializable
{
  // By default a type is considered "non-serializable"
  constexpr static bool value = false;
};

We call the above a “base template”, meaning it’s the non-specialized version that applies to every type argument, while a specialization is expected whenever someone wants to customize this behavior. To make things more concrete, a library using this trait might write:

if constexpr (Serializable<T>::value) {
  /* branch 1 */ 
} else { 
  /* branch 2 */ 
}

To land on “branch 1”, you have to specialize the base template. But how do you do that? Usually the author may leave a hint on the base template like so:

template <class T>
struct Serializable
{
  constexpr static bool value = false;
  
  static string toString(T const&);
  static optional<T> fromString(string const&);
};

Such a hint does not have to be in comments because the silent contract is that library code will never try to use the two de/serialization functions when value is false. Following such a hint, users are expected to provide the following customization for their type:

template <>
struct Serializable<MyType>
{
  constexpr static bool value = true;
  
  static string toString(MyType const&) { 
    /* implementation */ 
  }
  static optional<MyType> fromString(string const&) {
    /* implementation */ 
  }
};

2. The problem

You might have already deduced that this “user provided specialization” is the part we’re trying to make safer or fail-proof. How can one ensure that a user provided specialization “follows the rules”? To be more specific, in our example providing such a customization comes with the following challenges:

  1. Just forgetting constexpr or static in the declaration of value will lead in all sorts of interesting compilation or link time errors. The definition of value in a specialization is only enforced by reverse engineering the errors that result from its absence. What if we want to change the base template value to be int, what if mixing int and bool mostly works any-ways?
  2. toString and fromString face similar problems. The author of Serializable expects them to have very specific signatures but has no means of enforcing them, like inheritance does with the combination of virtual and override. Again some variation may pass compilation, like passing by value in fromString, but this can spuriously kill performance or go against the grain in terms of design/intentions of the author of serializable.

3. A legacy solution

Legacy systems, typically addressed this by providing alternatives to “hand-written” specializations. For example boost geometry has a vast type specification system that builds on top of this logic (oversimplifying here, bg is amazing and did concept-like stuff before c++11). Instead of writing your specializations, you use macros to generate them for you, e.g.:

BOOST_GEOMETRY_REGISTER_POINT_3D(...

Since I’m not above using a macro, I’ll provide arguments on why I want to avoid them. To its core, the macro is a code generation facility; while powerful and alluring to experts, it doesn’t prioritize our needs in this case:

  • It doesn’t provide type restriction (quite the contrary) and
  • The kind of errors I’m trying to avoid can easily be generated by cunning experts or puzzled novices. Just pass names decorated with const& and friends to the macro declaration.
  • Furthermore, when we do break the macro, our compilation error has one or more extra level of indirection – not fun.

4. A concept based solution

On the contrary, using concepts here can totally cover our needs:

  • Type checking – check
  • Member existence – check
  • Do all that in compile time – check
  • Non intrusive to the user type – check

The last point is of great importance to library developers, since removing the need to modify “user code” is a big plus: user code does not have to inherit, does not have to provide a specific layout, heck it can even be written in C. All the library mandates is that you write the specialization correctly.

5. Implementation

Start by declaring your restrictions. In our example that would be:

  1. Every specialization of Serializable must contain a value member of the appropriate type.
  1. Every specialization of Serializable must contain a serialization/deserialization function of appropriate type.
  2. Combine the above requirements in a Serializability concept. A type can either
    • provide a specialization, hence true == Serializable<T>::value + restriction (2) must be true.
    • follow the base template, hence false == Serializable<T>::value, in which case there is no extra requirement.

One way of implementing the above in C++ is:

// 1
template <class T>
concept SProp = requires (T obj) {
  requires std::same_as<
    decltype(Serializable<T>::value), const bool>;
};

// 2    
template <class T>
concept SMethods = requires (T obj) { 
  { Serializable<T>::toString(std::as_const(obj)) } 
    -> std::same_as<std::string>;
  { Serializable<T>::fromString(std::string{}) }      
    -> std::same_as<std::optional<T>>; 
};

// 3
template <class T>
concept Serializability =  
  (SProp<T> and !Serializable<T>::value) or  
  (SProp<T> and Serializable<T>::value and SMethods<T>);

Next we need one final piece of machinery. That is to expose a “library facing” trait like so:

template <Serializability T>
constexpr bool serialize_v = Serializable<T>::value;

// Using the above utility, a library writer can write:
if constexpr (serialize_v<T>) { // ...

Attention! one might be tempted to use the concept as trait, like if constexpr (Serializability<T>) and be done with it, but that’s not what we want. The aforementioned concept is permitted to be true or false. In case user code incorrectly specializes our base template we end up with legal code and a concept that evaluates to false. But that’s not what we want; our goal was to “ban incorrect specializations“. The serialize_v trait, enforces Serializability to hold (i.e. either you’re unspecialized or correctly specialized), and uses the nested value as the target for compile time branching.

An example implementation can be found here, with a proper specilization of a user type branching on whether it’s serializable or not.

To extend the example, one can imagine user code that incorrectly specializes, by forgetting the const in the toString method:

template <>
struct Serializable<MyStruct>
{
  constexpr static bool value = true;
    
  static std::string 
  toString(MyStruct &) { 
      /* implementation */
      return {};
  }
  
  static std::optional<MyStruct> 
  fromString(std::string const& val) { 
      /* implementation */
      return std::nullopt; 
  }
};

The compiler refutes the code with:

note: the required expression ‘Serializable::toString(std::as_const(obj))’ is invalid

6. References

A related question I answered in Stack Overflow.

A C++ locking wrapper

Intro

This post deals with the design and implementation of a locking wrapper. It is organized into the following four sections:

  1. Description – the functionality of such a wrapper is described.
  2. Motivation – scenarios that exemplify such a utility are presented.
  3. Implementation – one such implementation is provided in C++
  4. Special cases and considerations regarding ownership

So without further ado, let’s dive in.

Description of a locking wrapper

Simply put, a locking wrapper is a class that is able to encapsulate another class to make its functionality thread-safe by means of creating critical sections around its interface. So for example, if you class provides a method foo, the wrapper’s duty is to provide an instrumented access to that method with the following semantics:

  • lock some internal mutex
  • call foo
  • unlock the internal mutex

This “decorated” call sequence adds thread safety to a class that was designed without such provision.

Motivation for such a class

Do use such a wrapper if:

  • Lock-unlock sequences is the desired granularity of synchronization primitives you want to use.
  • You want to separate a class implementation from provision for thread safety.
  • The multi-threaded use case is a fraction of the class’ use cases, and you don’t want to redesign the whole class for that fraction.
  • You want to prototype using a completely synchronized version of the class, and plan to refine later.
  • You have a scenario where each data member is paired with a mutex, and you simply want to abstract all the boilerplate noise away.

Don’t use such a wrapper if:

  • The class is heavily used in multi-threaded code (e.g. it’s a building block for other multi-threading code) and locking its methods using the same mutex is a performance killer.
  • Using thread-local instances would suffice. Of course it may be the case that both wrapped instances exist and thread-local ones, in which case it’s beneficial to separate the class from its thread-safety mechanisms.

Implementation

The code that follows is implemented as a special case of the generic wrapper, presented in a Bjarne Stroustrup paper. Therein, the creator of C++ shows a way of generically decorating a class in order to apply some prefix and postfix calls around a method. In our case, the prefix call is the one to lock the mutex, while the postfix is tasked with unlocking it.

With that in mind, a general skeleton for our wrapper is easy to devise:

template <class T> class Locking
{
T _data;
mutable std::recursive_mutex _mtx;
/* Define call proxy here */
public:
template <class U> Locking(U &&data) : _data(std::forward<U>(data))
{
}
CallProxy<T> operator->()
{
_mtx.lock();
return CallProxy<T>(&_data, _mtx);
}
};

Using this mechanism you can call any method from the class using the arrow operator, and it will be protected by a single mutex:

class X
{
int i = 0;
public:
int f(const char *msg) const
{
return i++;
}
};
Locking<X> aa(X{});
aa->f(msg); // This call is thread-safe

As you can see our Locking class holds a mutex to synchronize access to the wrapped object’s methods. Implementing the prefix decoration is as easy as locking the internal mutex before returning a pointer to the object. It’s the postfix implementation were things get interesting:

template <class U> class CallProxy
{
U *_p;
std::recursive_mutex &_mtx;
mutable bool _own;
public:
CallProxy(U *pp, std::mutex &mtx) : _p(pp), _mtx(mtx), _own(true)
{
}
CallProxy(CallProxy const &other)
: _p(other._p), _mtx(other._mtx), _own(true)
{
other._own = false;
}
~CallProxy()
{
if (_own)
{
_mtx.unlock();
}
}
CallProxy &operator=(CallProxy const &) = delete;
U *operator->()
{
return _p;
}
};

The postfix call is implemented with the help of a call proxy. As seen above this is an object that is created on the spot by the arrow operation, and implements the postfix call in its destructor. Using this design in our case, by the time the expression “->f(msg)” is complete, you have three things happening:

  • The internal mutex is locked.
  • A call proxy is returned, which can give you access to the wrapped object, hence you can call any of its methods.
  • The call proxy is destroyed, at which point the mutex is unlocked.

The complete functionality is shown below, with the call proxy implemented as a nested class for our wrapper:

template <class T> class Locking
{
T _data;
mutable std::recursive_mutex _mtx;
template <class U> class CallProxy
{
U *_p;
std::recursive_mutex &_mtx;
mutable bool _own;
public:
CallProxy(U *pp, std::recursive_mutex &mtx) : _p(pp), _mtx(mtx), _own(true)
{
}
CallProxy(CallProxy const &other)
: _p(other._p), _mtx(other._mtx), _own(true)
{
other._own = false;
}
~CallProxy()
{
if (_own)
{
_mtx.unlock();
}
}
CallProxy &operator=(CallProxy const &) = delete;
U *operator->()
{
return _p;
}
};
public:
template <class U> Locking(U &&data) : _data(std::forward<U>(data))
{
}
CallProxy<T> operator->()
{
_mtx.lock();
return CallProxy<T>(&_data, _mtx);
}
};

A fully working demo can be found here.

Value semantics and using smart pointers

For those that went through the paper, you may have noticed that we chose to avoid using pointers to refer to the wrapped data; instead forwarding references are leveraged to construct the data in place, and avoid weird lifetime scenarios or the resource being shared with non thread-safe use. Of course one can always circumvent this by using reference types or smart pointers, but I’m not in the business of providing bullet-proof boots.

Using smart pointers can be a valid case, when they’re handed exclusively to the thread-safe wrapper (or if you can somehow guarantee that they’re not being used while we try to use them through the wrapper). Below we summarize some examples of using smart pointers with our locking wrapper, which were also shown in the linked demo:

std::shared_ptr<Locking<X>> yy(std::make_shared<Locking<X>>(X{}));
(*yy.get())->f(msg);
// This is most likely unsafe, since we can only lock a single copy of the pointer
Locking<std::shared_ptr<X>> xx(std::make_shared<X>());
xx->get()->f(msg);

Avoiding compile-time recursion

One of my favorite blogs, is Raymond Chen’s “The Old New Thing”. In a recent post, he described a way to find a type in a tuple, with a compile time predicate that given a type and a tuple, will return the index of that type in the tuple, or fail to compile if the type does not exist (or exists more than once):


type_index<int, std::tuple<char, int, float>>::value;
// Our type^^^ is found at index ^^^ 1 of our tuple

The value is calculated at compile time and all is good in the world, no?

You can follow the link to check his implementation, Raymond offers a great walk-through as always. The thing that bothers me is the use of recursive template instantiations. This happens when a template instantiates itself until it hits a termination case. The reason I dislike the technique is because it tends to:

  • Increase the binary size. Symbols are generated for a chain of instantiations which end up bloating the executable.
  • Slow down compilation. It’s a recursion and the compiler has to pay the price up front.
  • Complicate the error messages when something goes wrong. The compiler will spit out the whole recursion tree when something goes wrong, and the error messages get obfuscated.

There’s a chapter called “The Cost of Recursive Instantiation” in the template book, but what I’ve found really dangerous is that when trying to scale code designed this way the burden on the compiler may get out of hand. E.g. we may start adding nested types in such templates or try to piggy back on the recursion to design extra machinery.

Of course in the problem at hand, Raymond’s implementation is simple and efficient (even checked binary outputs and resulting generated symbol – as always you can trust what he’s saying). I’m writing this post to show there’s usually an alternative and to point out that not considering the cost of these things may haunt you in the future.

Let’s see how a non recursive implementation would look like:


template <class T, std::size_t I, class Tuple> // 1
constexpr bool match_v = std::is_same_v<T, std::tuple_element_t<I, Tuple>>;
template <class T, class Tuple, // 2
class Idxs = std::make_index_sequence<std::tuple_size_v<Tuple>>>
struct type_index;
template <class T, template <class…> class Tuple, class… Args, // 3
std::size_t… Is>
struct type_index<T, Tuple<Args…>, std::index_sequence<Is…>>
: std::integral_constant<std::size_t,
((Is * match_v<T, Is, Tuple<Args…>>) + … + 0)> {
static_assert(1 == (match_v<T, Is, Tuple<Args…>> + … + 0),
"T doesn't appear once in Tuple");
};

I’m doing all the work at (3) with this guy:

Is * match_v<T, Is, Tuple<Args...>> + ... + 0

This fold is binary to handle empty parameter packs.  The “match_v” predicate (1) checks if a type “T” exist at index “I” in a tuple “Tuple”. We can see what an example instantiation would do:

type_index<int, tuple<char, bool, int>>

// T = int
// Tuple = tuple<char, bool, int>
// Args... = char, bool, int
// Is = 0, 1, 2 -> index sequence of cardinality = sizeof...(Args)

// If we denote as "m" the "match_v" predicate, we have
(0 * m<int,0,Tuple> + (1 * m<int,1,Tuple> + (2 * m<int,2,Tuple>)))
(0 * 0 + (1 * 0 + (2 * 1))) = 2

2 -> the index of "int" inside Tuple

Essentially we multiply each index with the predicate’s result, and sum the results. The predicate (1) could be written inline, but since there’s a second use for it, it’s good to abstract it out.

Our specialization has a static assertion. By using a variation of the fold expression described above, it makes sure that only one occurrence of the type “T” exists in “Tuple”. This way, when our meta-function returns zero, we can be positive it means “index zero” instead of “does not exist”; additionally we make sure that if the type exists more than once in the tuple, a static assertion is thrown much like in the original implementation.

Check a demo here.

A peace offering

I understand that flattening template instantiations might not appeal to some people, so let me extend an olive branch here: when done right, compile time recursion can be OK. Take for example the “index_sequence” we use in our code, which is implemented recursively:

  • Implementations use an amazing trick (I believe it’s attributed to Dave Abrahams but don’t quote me on that – it certainly appeared in his work a long time ago) to lower recursion depth to Log2(N). It is based on the realization that you can use the first half of an index sequence to build the second half, by adding an initial offset:

    index_sequence<20> = index_sequence<10, 10 + index_sequence<10>>

  • Ubiquitous library facilities are reusable. What this means, is that if someone instantiated an index sequence of twenty elements, most probably this will be shareable in your implementation.

Finally, if we end up using recursion (hope not but…), simplicity should compensate for potential hazards. Here’s how I’d solve the same problem recursively, with a single function:


template <class T, class Tuple, std::size_t I = 0>
consteval std::size_t TypeIndex() {
if constexpr (I >= std::tuple_size<Tuple>::value) {
return 0;
} else if constexpr (std::is_same_v<T, std::tuple_element_t<I, Tuple>>) {
return I + 1;
} else {
return TypeIndex<T, Tuple, I + 1>();
}
}

Check a demo here.

Not much explanation is needed apart from the fact that I’ve opted to return the 1-based index and use 0 when the search fails, instead of a compilation error. We also avoid compilation error when more that one instances occur, by returning the index of the first match, thus short-circuiting template instantiations.

Disclaimer

If you’re in a situation where such things matter, always always measure: measure compilation time, executable size, check generated symbols, check compilation in different platforms (time and again I’ve seen VS struggle with sizes & symbol generation that gcc would simply optimize out and discard). I only wrote this post because it’s good to know your options.

A “sorted view”

1. Intro

This installment elaborates on the creation of a “sorted_view” utility. The “References” section contains links to the complete code; throughout the article, we provide demos and snippets to showcase and explain the implementations. The section on modernizing the code contains real world applications of the following C++17 features:

  • structured bindings
  • template argument deduction for class templates

Problem description

Apply a relative sorting on two or more data sequences, i.e take the sequences

s1 = ["Michael", "Larisa", "Paavo", "Mark"]
s2 = ["Phelps", "Latynina", "Nurmi", "Spitz"]

If the goal was to sort based the ordering of s1, the expected output would be

s1 = ["Larisa", "Mark", "Michael", "Paavo"]
s2 = ["Latynina", "Spitz", "Phelps", "Nurmi"]

Related work

This is a common problem in modern packaging. I’ve answered to exactly this problem description here. The “sorting by indices” techniques I used is much better explained (and extended and prepackaged) by Raymond Chen in a great series of articles you can start reading here.

What to expect

This is an example of  how higher abstractions can give better performance. What we do, instead of delaying or minimizing expensive swap / copy / move operations (as in the sources linked so far), is to create a literate view for our sequences that will give us the desired sorted step through and avoid touching the sequences.

2. Design a sorted view

Any goals that are not immediately clear will be explained shortly after.

Goals

  • Don’t mutate the original sequences
  • Don’t make any copies of the original sequences
  • Allow custom comparators to be used for sorting
  • Easy “sorted walks”
  • Ability to use other sequences as tie breakers

Tools

The few Standard Library components we’ll use to make our life simpler:

  • A tuple to hold the (one or more) sequences
  • A vector of indices that holds the step through order
  • The sort and iota algorithms

3. Implementation

In a nutshell, our class will look like this


template <class… Args>
class sorted_view
{
std::tuple<Args…> _ts;
mutable std::vector<std::size_t> _is;
public:
sorted_view(Args&&… args);
// sort based on the Ith sequence
template <std::size_t I, class F> void sort(F&& bop) const;
// size of the Ith sequence
template <std::size_t I = 0> auto size() const;
// the SORTED element of the Ith sequence at position pos
template <std::size_t I> decltype(auto) at(std::size_t pos);
};

Capturing

Since spelling out the type of our class is quite cumbersome and error prone, we’ll create a helper function to automate this:


template <class… Args>
auto make_sorted_view(Args&&… args)
{
return sorted_view<Args…>(std::forward<Args>(args)…);
}

So the tuple class member is referencing any l-value references and move constructing from r-value references.

Sorting

Sorting is pretty straightforward


template <std::size_t I, class F>
void sort(F&& bop) const
{
using namespace std;
_is.resize(get<I>(_ts).size());
iota(begin(_is), end(_is), 0);
std::sort(begin(_is), end(_is), [&](size_t lhs, size_t rhs) {
return forward<F>(bop)(get<I>(_ts)[lhs], get<I>(_ts)[rhs]);
});
}

The I template parameter denotes the sequence we use for sorting. A lambda that uses the custom comparison function with the referenced elements of that sequence is used in the sort algorithm.

Accessing

True to our “easy access” goal we provide an access method


template <std::size_t I>
decltype(auto) at(std::size_t pos)
{
return std::get<I>(_ts)[_is[pos]];
}

all it does is retrieve the element of the Ith sequence at the sorted position pos.

Example

Say we had the top ten of Olympic medalists. If we encode each column as a sequence (hence we’d have the sequences ‘first_name’, ‘last_name’ etc) this is how we’d use the sorted view class


// 1. Create the sorted view
auto sv = make_sorted_view(first_name, last_name, country, sport,
years, gender, medals_gold, medals_silver, medals_bronze, medals_total);
sv.sort<0>(std::less<>{}); // 2. Sort by first name
for (size_t i = 0; i < sv.size<0>(); i++) printf("{ %s-%s-%s }\n",
sv.at<0>(i).c_str(), // 3. access the ith sorted element of sequence 0
sv.at<1>(i).c_str(),
sv.at<2>(i).c_str());

The output is

{ Birgit-Fischer-Germany }
{ Bjorn-Daehlie-Norway }
{ Carl-Lewis-United States }
{ Jenny-Thompson-United States }
{ Larisa-Latynina-Soviet Union }
{ Mark-Spitz-United States }
{ Michael-Phelps-United States }
{ Ole Einar-Bjorndalen-Norway }
{ Paavo-Nurmi-Finland }
{ Sawao-Kato-Japan }
{ Usain-Bolt-Jamaica }

Working code can be found here.

Proving the “no copy” clause

Let’s avoid standardese and just use an example. Therein the S class explicitly forbids copy construction and allows move construction; the sorted_view class captures everything as expected.


struct S {
int i;
S(int i) : i(i) {}
int size() { return 1; }
S(S const&) = delete;
S(S&&) = default;
};
int main()
{
S s1(1), s2(2);
auto sv = make_sorted_view(s1, s2, S(3));
using sv_t = decltype(sv);
using tuple_t = sv_t::tuple_t;
using tuple_t = std::tuple<S&, S&, S>;
}

The additional clue here is that if the two definitions of tuple_t were different, the program wouldn’t compile due to conflicts.

4. Sorting with multiple criteria

Let’s return to our original example. If we were to sort based on gold medals, we see that Mark Spitz and Carl Lewis collected an equal amount of them so we’d like to specify a tie breaker, let’s say silver medals. Nope, equal there too so  let’s fall back to bronze. This process of relative sorting with fallback is what the following method does


template <std::size_t I, std::size_t… Is, class F>
void multi_sort(F&& bop) const
{
using namespace std;
_is.resize(get<I>(_ts).size());
iota(begin(_is), end(_is), 0);
std::sort(begin(_is), end(_is), [&](size_t lhs, size_t rhs) {
return detail::use_comparator<I, Is…>::apply(
_ts, lhs, rhs, forward<F>(bop));
});
}

The Demo provides essentially the following use case


sv.multi_sort<9, 6, 7, 8>([](auto&& a, auto&& b) { return a > b; });

i.e. sort based on total medals (sequence 9), then gold (6), then silver (7) and finally bronze (8). Complete code can be found in the references in case you ‘re wondering how is the equality operator generated (we only pass one comparator) or how to actually apply the fallback comparisons (generic lambdas are great by the way).

5. What can we get out of c++17

I hate the “I have this hammer, are there any nails” approach. The things we’ll work on modernizing are actual shortcomings of the pre C++17 code version so don’t expect a cornucopia of features. That been said, I find the following language additions exceptional.

No more “make_stuff” stuff

Template argument deduction for class templates is here to eliminate most cases of make_something helper functions. Though they might not look like such, the following are class template instantiations:


sorted_view sv1(first_name, last_name, sport, medals_gold, medals_silver);
sorted_view sv2(medals_bronze, medals_total);

Implicitly synthesized Deduction Guides (from existing constructors) take over the task of spelling out the class type (which in this case matches the desired behavior in terms of capturing). We can also specify our own deduction guides explicitly but this deserves its own post.  Also note that the type of the two variables declared above is different (different instantiations of the same class template).

NOTE: A nitpicker might have noticed that since the beginning the sorted view constructor didn’t use forwarding references but mere r-value references. That was so that we’d take advantage of reference collapsing rules and have the C++17 extension work out of the box in the desired fashion. If you don’t believe me make the constructor templated in the Demo below and see if template argument deduction for class templates works (spoiler … it doesn’t).

Accessible access

The way we accessed the sorted view elements was kind of awkward, mostly because the user had to be quite familiar with the class interface. Secondly because even then, unless we went the extra mile to bake “named tuples” into sorted_view, no one would be comfortable using the indexes of the inner tuple without looking at the instantiation (I used many fields in the initial example make this obvious).

Structured bindings provide a simple way to decompose a class (destructure one might say) but even better, you can do this your own way! Since destructuring the data holder of the sorted view (the inner tuple) doesn’t do much good, here’s what we’ll provide :


sorted_view sv(first_name, last_name, sport, medals_gold,
medals_silver, medals_bronze, medals_total);
auto [bName1, bName2, bSport, bGoldM, bSilverM, bBronzeM, bTotalM] = sv;

Every binding is an instance of class that holds pointers to a tuple element and the sorted indices


template <class T>
struct sorted_element
{
std::remove_reference_t<T>* _data;
std::vector<std::size_t> const* _is;
sorted_element(T& elem, std::vector<std::size_t> const& is)
: _data(&elem), _is(&is)
{
}
decltype(auto) operator[](std::size_t i)
{
return _data->at(_is->at(i));
}
};

Also, this type provides a sorted access through an overloaded subscript operator:


sv.multi_sort<6, 4, 5>([](auto&& a, auto&& b) { return a > b; });
auto [sName1, sName2, sSport, sGoldM, sSilverM, sBronzeM, sTotalM] = sv;
for (size_t i = 0; i < sv.size<6>(); i++)
{
std::cout << sTotalM[i] << " " << sGoldM[i] << " " << sSilverM[i] << " "
<< sName1[i] << " " << sName2[i] << std::endl;
}

Now this makes the code way clearer with minimum effort. Here’s what it took to provide custom structured bindings for our class:


namespace std
{
// 1. provide tuple_element<I, sorted_view<…>>::type
template <std::size_t I, class… Types>
struct tuple_element<I, utl::sorted_view<Types…>>
{
using sorted_view_t = utl::sorted_view<Types…>;
using type = typename sorted_view_t::template tuple_element_t<I>;
};
// 2. provide tuple_size<sorted_view<…>>
template <class… Types>
struct tuple_size<utl::sorted_view<Types…>>
: public integral_constant<size_t, sizeof…(Types)>
{
};
}
namespace utl
{
// 3. provide a get<I>(sorted_view<…>) function
template <std::size_t I, class… Types>
auto get(sorted_view<Types…> const& sv)
{
return sv.template elem<I>();
}
}

It’s worth mentioning that ADL allows get to be written inside the utl namespace (there’s a dispute on Reddit on whether writing it inside std would be UB). Usage of the techniques mentioned in this section can be found in the following Demo

6. Extensions

There are things to consider in order to augment the functionality of the sorted view class:

  • Provide an interface so that sort could take advantage of C++17 parallel sorts if desired
  • Raymond Chen has an article on the Schwarzian transform, maybe this would be good to have
  • Custom view should be also possible. Why just sort, a lot of other stuff can be done the same way (my first attempt cluttered the interface so until I find a way to keep simple things simple, having a dedicated class is OK)
  • Provide named tuples for the C++14 implementation
  • Concepts: the elements sorted_view acts upon, have to be sortable, indexable etc and all these are constraints wonderfully expressed with concepts (you should see the errors I got while writing this w/o concepts).

On most of the points above, I’ll return with a dedicated post, until then thank you for reading.

References

  1. sorted_view.h
  2. sorted_view.cpp
  3. Sorting by indices – Raymond Chen
  4. How do I sort two vectors containing relative entries
  5. Template argument deduction for class templates
  6. Structured bindings

Quantifiers, metaprogramming and concepts

1. Intro

Quantification expresses the extent to which a  predicate is true over a set of elements. Two forms are the most common:

  • Universal \forall
  • Existential \exists

To define the quantifiers, let A represent an expression, and let x represent a variable. Then:

  • Universal Quantification: If we want to indicate that A is true for all possible values of x, we write “\forall\ x\; A\;“. Here \forall\ x\; is called the universal quantifier, and A\; is called the scope of the quantifier. The symbol \forall is pronounced “for all“.
  • Existential quantification: If we want to indicate that A is true for at least one value of x, we write “\exists\ x\; A\;“. This statement is pronounced “There exists an x such that A“. Here \exists\ x\; is called the existential quantifier, and A is called the scope of the existential quantifier.

We elaborate on the topic as follows: In section 2 we describe quantification in the template metaprogramming context, in section 3 we explore related standard library facilities introduced in C++17 and section 4 adds concepts to the equation. Finally, in section 5 we draw some concluding remarks.

2. Use in metaprogramming 

Metaprograms often use predicate logic in creative ways. For example, type queries in generic code are used to constrain, dispatch, specialize, activate or deactivate code at compile time. Even until C++14, expressing predicate logic at compile time was cumbersome, when handling variadic arguments, involving a helper type and crafty use of boolean logic:


template< bool… Bs >
using bool_sequence = std::integer_sequence< bool, Bs… >;
template< bool… Bs >
using bool_and = std::is_same<
bool_sequence<Bs… >, bool_sequence<(Bs || true)… >>;
template< bool… Bs >
using bool_or = std::integral_constant< bool, !bool_and< !Bs… >::value >;

The C++17 Standard allows for much simpler (almost mathematical) syntax, by folding over the logical operators:


template <bool… Bs> constexpr bool all_v = (… && Bs);
template <bool… Bs> constexpr bool any_v = (… || Bs);

Above we define two variable templates that become true or false depending on the value of their (boolean) template arguments. Since they’re conceptually and syntactically simpler, as well as cheaper in terms of instantiations, we’ll prefer these to built subsequent tools.

Making the Quantifiers

To define our quantifiers we need to involve the predicates, essentially the process that produces the list of boolean we feed to the above variable templates. Let P be a predicate over a type and Ts a type list, then:


// ———————————— "is there ?"
template <template <class> class P, class… Ts>
constexpr bool existential_quantifier = any_v<P<Ts>::value…>;
// —————————————– "are all ?"
template <template <class> class P, class… Ts>
constexpr bool universal_quantifier = all_v<P<Ts>::value…>;

view raw

quantifiers.cpp

hosted with ❤ by GitHub

These quantifiers could be used for example to check that a list of types is trivially copyable:


template <class… Ts>
constexpr bool are_tc = universal_quantifier<std::is_trivially_copyable, Ts…>;

A toy example can be found here . Of course the quantifiers can be used inline w/o stub code like the are_tc variable template above.

Generalized Predicates – Currying metafunctions

A limitation of the code so far is that it can only be used for univariate predicates. Say we had a predicate that needed 2 or more variables to compute, how would we handle it? Here the is_smaller type checks that the size of its operands is ascending:


template <class T1, class T2>
using is_smaller = std::integral_constant<bool, sizeof(T1) < sizeof(T2)>;

view raw

is_smaller.cpp

hosted with ❤ by GitHub

You might argue that “ \forall T\; \in\; A\; smaller\;” i.e. is_smaller holds for every type T in typelist A, has no meaning and you’d be right (smaller than what?). Thankfully we need to introduce a restriction which is that multivariate predicates are to be bound until only one argument position is free and we can do that either in place (by loading the needed types throughout our interfaces) or by introducing a compile time bind, essentially currying metafunctions:


template <template <class…> class C, class…Ts>
struct curry
{
template <class T>
using type = C<Ts…, T>;
};

view raw

meta_curry.cpp

hosted with ❤ by GitHub

With this in place here’s how to verify that a type is smaller than every type in a typelist:


struct S0 {};
struct alignas(2) S1 {};
struct alignas(4) S2 {};
struct alignas(8) S3 {};
int main()
{
static_assert(universal_quantifier<curry<is_smaller, S0>::type, S1, S2, S3>);
}

3. The Standard library

The universal and existential quantifiers are defined in the C++17 Standard library as conjunction and disjunction respectively. Their main difference from what we defined so far is that they short-circuit in the expression level, i.e. not every expression needs to be well-formed if the result can be computed before their evaluation. This comes in handy in certain cases, but it implies they’ll have to be recursively implemented (I’m pretty sure the following implementation is attributed to Johnathan Wakely):


template<class…> struct conjunction : std::true_type { };
template<class B1> struct conjunction<B1> : B1 { };
template<class B1, class… Bn>
struct conjunction<B1, Bn…>
: std::conditional_t<bool(B1::value), conjunction<Bn…>, B1> {};
template<class…> struct disjunction : std::false_type { };
template<class B1> struct disjunction<B1> : B1 { };
template<class B1, class… Bn>
struct disjunction<B1, Bn…>
: std::conditional_t<bool(B1::value), B1, disjunction<Bn…>> { };

view raw

con_dis.cpp

hosted with ❤ by GitHub

i.e. we stop inheriting when the expression has computed. In contrast, our flattened instantiations implementation excels in compile times and instantiation depth violations. Choose wisely between flat and short circuiting depending on your use case.

As an example, recursive instantiations here use an integer sequence of 18 elements to break a tuple, essentially due to its move constructor. The move constructor has a conditional noexcept that is implemented recursively. Therefore, for each template argument, a constant number of additional instantiations is required (the implementation e.g. of is_nothrow_move_constructible in terms of is_nothrow_constructible which is implemented in terms of __is_nt_constructible and so on, for 15 instantiation levels!). This means that each template argument for tuple requires 15 additional instantiation levels for this check. On top of that, 9 levels are always required (constant depth). Therefore, 17 arguments require an instantiation depth of 17*15+9 == 264 (default for clang 3.7 was 256).

Another example use of this mechanism would be to elaborate on quantification the other way around, i.e. one type vs many predicates (instead of 1 predicate vs many types we’be doing so far):


template <class T, template <class> class… Ps>
constexpr bool satisfies_all_v = std::conjunction<Ps<T>…>::value;
template <class T, template <class> class… Ps>
constexpr bool satisfies_any_v = std::disjunction<Ps<T>…>::value;

4. Concepts

Concepts are predicates. And as such a simple extension of our quantifiers can encapsulate concepts as well:


// ———————————————————–
// ——————– to express "1 predicate Vs many types"
template <template <class> class P, class… Ts>
concept bool existential_quantifier = any_v<P<Ts>::value…>;
template <template <class> class P, class… Ts>
concept bool universal_quantifier = all_v<P<Ts>::value…>;
// ———————————————————–
// ———————————————————–
// ——————– to express "1 type Vs many predicates"
template <class T, template <class> class… Ps>
concept bool satisfies_all = all_v<Ps<T>…>;
template <class T, template <class> class… Ps>
concept bool satisfies_any = any_v<Ps<T>…>::value;
// ———————————————————–

To get an idea of how this could be used check below:


template<typename T> concept bool Data = requires(T t) { t.data; };
template<Data… Ts>
requires existential_quantifier<std::is_move_constructible, Ts…>
auto f(Ts… ts)
{
std::cout << "version for data\n";
return 0;
}

or this Demo (disclaimer: only to showcase the use of quantifiers with concepts, don’t try to find particular meaning in the code).

5. Conclusions

Metaprogramming is changing. For the better. Better techniques and more expressive syntax are emerging that will allow library writers to produce more elegant code. Users of this code will have a better experience due to more comprehensive compiler errors. The price we have to pay is understanding the formalities and theory involved with this ecosystem; the bet to be won is Stroustrup’s “make simple things simple” claim, so that modern C++ can be “expert friendly” but not “expert only”.

6. References

On Quantifiers and predicate logic

Standard library documentation

Related Stack Overflow articles listed chronologically

Related code on the blog’s repository

Want a (std::) slice?

1. Intro

Whenever we use data structures that are compile time restricted in terms of extends, we allow for significant optimizations and early error detection in the expense of more cumbersome access style and code maintenance, due to the inevitable introduction of templated code. Users of such data types, algebraic (tuple, pair) or aggregate (array), breathed a sigh of relief (well… maybe not all of them) with the introduction of std::apply in C++17, that handles the boilerplate of unpacking their contents into a callable. To give an example, you should now be able to write your callable like so


template <class… Ts>
void call_me(Ts&&… args) { /* implementation */ }

and

  • Remain blissfully unaware of the structure that packs your arguments (the back end doesn’t need to specialize for different argument sources)
  • Use the callable with any of the aforementioned data types without special treatment (the front end needs no boilerplate to unpack the data structures)

So, for example, you can know invoke your callable like so


std::apply(call_me, a); // a is array<int, 4>
std::apply(call_me, p); // p is pair<double, int>
std::apply(call_me, t); // t is tuple<char, int, double>

view raw

using_apply.cpp

hosted with ❤ by GitHub

2. Creating slices

Handy as it may be, this ecosystem of data types and tools, leaves a lot to be desired. First and foremost, the slice operation i.e. the creation of  an entity that captures a subset of some given data. My main motivation for this was to be able to pass slices to the apply function although while researching the topic I found an old freestanding request, which I’ve just answered here (isn’t it a bummer when you invent something only to discover other people have had similar musings?) Below we define the tuple_slice operation that fills this gap and elaborate on the code:


namespace detail
{
template <std::size_t Ofst, class Tuple, std::size_t… I>
constexpr auto slice_impl(Tuple&& t, std::index_sequence<I…>)
{
return std::forward_as_tuple(
std::get<I + Ofst>(std::forward<Tuple>(t))…);
}
}
template <std::size_t I1, std::size_t I2, class Cont>
constexpr auto tuple_slice(Cont&& t)
{
static_assert(I2 >= I1, "invalid slice");
static_assert(std::tuple_size<std::decay_t<Cont>>::value >= I2,
"slice index out of bounds");
return detail::slice_impl<I1>(std::forward<Cont>(t),
std::make_index_sequence<I2 – I1>{});
}

view raw

tuple_slice.cpp

hosted with ❤ by GitHub

Not to be confused with valarray’s slice, this piece of code does 3 simple things:

  • Protects from invalid instantiations with static asserts
  • Forwards the container (tuple, array, pair) + an integer sequence to the implementation function
  • The implementation function gets every element indexed by the sequence and forwards it as a tuple i.e. lvalues are referenced and rvalues are move constructed

A slice of t can be obtained like so

tuple_slice<I1, I2>(t);

Where [I1, I2) is the exclusive range of the subset.

To be able to create slices that are not zero-based, all we do is create the index sequence out of the cardinality of the desired sequence and then offset the values. This works but a helper “make_integer_range” type would be nice to have:


namespace detail
{
template <class…> struct integer_range;
template <class T, class Ofst, T… Vals>
struct integer_range<T, Ofst, std::integer_sequence<T, Vals…>>
{
using type = std::integer_sequence<T, (Ofst::value + Vals)…>;
};
}
template <class T, T I1, T I2, class Check = std::enable_if_t<I1 <= I2>>
using make_integer_range = typename detail::integer_range<T,
std::integral_constant<T, I1>, std::make_integer_sequence<T, I2 – I1>>::type;
template <std::size_t I1, std::size_t I2>
using make_index_range = make_integer_range<std::size_t, I1, I2>;

You can experiment with integer ranges here. Back to the main act then (using make_index_range to simplify the definition of tuple_slice is trivial so we’ll omit it), here are some example uses of tuple_slice:


int main()
{
int i1(1), i2(1);
char c1('o'), c2('k');
auto tp{ std::make_tuple(2., 4., 23, 7, 'x', 'l') };
std::array<char, 4> ar{ { 'a', 'b', 'c', 'd' } };
// 1. read from a slice ———————————–
std::tie(i1, i2) = tuple_slice<2, 4>(tp);
// 2. assign to a slice ———————————–
tuple_slice<0, 2>(ar) = std::tie(c1, c2);
// 3. apply a slice on a lambda —————————
std::apply([](auto&& arg1, auto&& arg2) {
std::cout << "(" << arg1 << ", " << arg2 << ")\n";
}, tuple_slice<1, 3>(ar));
}

3. Design considerations

One might ask “Shouldn’t a slice operation on an array create another array?” and this brings us to a generic code paradox: While trying to handle more cases we may end up being more restrictive; The extra effort to create respective make_array and forward_as_array functions is not the problem (there you go if you don’t believe me). Here are the considerations:

  • We cannot create an array of references, hence we’d fall back to creating an array of reference wrappers. Not only does this create an interface mismatch with the tuple case (we need to call get() on a reference wrapper to obtain the value) but between lvalue and rvalue arrays as well: slicing the later would give us move constructed values, hence no  get() method.
  • How would this logic extend to the pair case? Slicing a pair does not give us yet another pair (only in the dummy case where the slice is identical to the original) so the fallback of producing a tuple would be viewed as an exception. To me, the frequency of the word “exception” in the description of a design is inversely proportional to its quality.
  • Slices should handle uniformly the empty case and tuples do this .
  • If there’s a unique sliced type, pattern matching it would require minimum effort.
  • There’s always the option to create arrays out of the slices if we want to (would be interesting to see if the intermediate tuple can be optimized away) .

So, there you go … less is more

4. References

Compose and curry as folds

1. Intro

In a previous post we introduced C++17 fold expressions and described a way to extend them for arbitrary callables. Implementation details don’t matter for what we’re elaborating on here but it should be clear that (given the tools we developed) the following is possible:

(Op<F>(args) + ...)
(... + Op<F>(args))

i.e. performing a right or left fold respectively using the “F” type as the folding operator (the “+” operator is used just to generate the underlying expression templates). By “F” we mean a type that will serve as the reducing function (it generates callable instances) e.g. this:


struct Max
{
template <class T1, class T2>
constexpr decltype(auto) operator()(T1&& lhs, T2&& rhs)
{
return lhs > rhs ? std::forward<T1>(lhs) : std::forward<T2>(rhs);
}
};

view raw

max_reducer.cpp

hosted with ❤ by GitHub

Such tools enhance the notion of “reduce” being the ultimate production line where new algorithmic skeletons and processes can be forged sometimes to a surprising extend as we’ll see below.

2. On composition

In a nutshell composition is the pointwise application of one function to the result of another to produce a third one. Add another one to the pipeline and the mathematic notation goes something like this:

(\;f\; \circ\ g\; \circ\ y\;)\;(\;x\;)\; \equiv f\;(\;g\;(\;y\;(\;x\;)\;)\;)\;

…and so on and so forth, we should be able to compose as many functions as we want provided that the result of the “inner” functions can be a valid input to the outer.

Now, look at the mathematical notation again. What does the right hand side look like? It’s just folding over the function call operator! Written in pseudo C++ (I say pseudo because C++ folds don’t support arbitrary operators out of the box) it would look like this:

args () ...

where args is a pack of callables we want to compose. Now it all boils down to synthesizing the reducer, and composing two callables is easy enough, by creating a lambda that applies the left side to the result of calling the right side with whatever arguments you pass to it:


struct Comp
{
template <class F1, class F2>
auto operator()(F1 lhs, F2 rhs)
{
return [=](auto… args)
{
return lhs(rhs(args…));
// ^^^^^^^^^^^^^^^^^
};
}
};
template <class… Ts>
auto compose(Ts&&… args)
{
using fld::Op;
return (Op<Comp>(std::forward<Ts>(args)) + …).give();
}

We’ve also added a helper compose function to hide the fold generation even though it’s good to know that it takes a right fold to do this. Having this in place you can write things like


int main()
{
auto f1 = [](auto a) { return 2 * a; };
auto f2 = [](auto a) { return 3 * a; };
auto f3 = [](auto a, auto b) { return b * a; };
auto cfs = compose(f1, f2, f3);
std::cout << cfs(2, 3) << std::endl;
}

One more thing

I might have oversimplified the definition of the Comp structure. Taking move semantics into consideration is not only a matter of efficiency but correctness as well, since preserving value category is essential when using forwarded callables. To ammend this, the returned lambda will do a perfect capture using this helper type:


template <class T>
using perfect_capture_t =
std::conditional_t<std::is_lvalue_reference<T>::value,
std::reference_wrapper<std::remove_reference_t<T>>, T>;

This type will be used by the (C++14 generalized) lambda capture to acquire either a reference (wrapper) or a moved callable (depending on its value category). So now Comp is written as follows


struct Comp
{
template <class F1, class F2>
auto operator()(F1&& lhs, F2&& rhs)
{
return [
lhs = perfect_capture_t<F1>(std::forward<F1>(lhs)),
rhs = perfect_capture_t<F2>(std::forward<F2>(rhs))
](auto&&… args)
{
return lhs(rhs(std::forward<decltype(args)>(args)…));
};
}
};

view raw

composer.cpp

hosted with ❤ by GitHub

3. On currying

Currying is the technique of translating the evaluation of a function that takes multiple arguments (or a tuple of arguments) into evaluating a sequence of functions, each with a single argument. In other words it could be described (nitpickers pardon me) by the production of partially applied functions something you see in C++ with certain applications of std::bind. Following the same logic as above we’ll be using the left fold pipeline with the following reducer


struct Curry
{
template <class F1, class F2>
auto operator()(F1&& lhs, F2&& rhs)
{
return [
lhs = perfect_capture_t<F1>(std::forward<F1>(lhs)),
rhs = perfect_capture_t<F2>(std::forward<F2>(rhs))
](auto&&… args)
{
return lhs(rhs, std::forward<decltype(args)>(args)…);
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
};
}
};
template <class… Ts>
auto curry(Ts&&… args)
{
using fld::Op;
return (… + Op<Curry>(std::forward<Ts>(args))).give();
}

view raw

curry.cpp

hosted with ❤ by GitHub

Now this is possible


int main()
{
auto f1 = [](auto a) { return 2 * a; };
auto f2 = [](auto a) { return 3 * a; };
auto f4 = [](auto a, auto b, auto c) { return c * b * a; };
std::cout << curry(f4, 2, 1)(1) << std::endl;
auto ccfs = compose(f1, f2, curry(f4, 2, 1));
std::cout << ccfs(1) << std::endl;
}

The last piece of the puzzle is constexpr lambdas that would really allow for this to extend to synthesizing constexpr callables and this is on its way for C++17

Demo

A runnable example can be found here. Note that this particular example only runs for g++ and I’ll be diving into clang in a later endeavor.

References