Fun with folds

The functional genome

 A fold is a higher order function (a function that has one or more function parameters and/or returns a function) that is recursively applied over a data structure. An easy to grasp example of such a function can be found in Haskell :

foldl (operator) accumulator [...]

Here are the building blocks of the above expression :

  • foldl : A left fold; left as in leftwise applied (…not a leftover fold) – more on this later.
  • (operator) : A binary function
  • accumulator : A starting value
  • […] : A list to fold

 So, say you want for example to sum the (list of) numbers from 1 to 10 :

foldl (+) 0 [1..10]

 This expands to :

( ( ( 1 + 2 ) + 3 ) + ... + 10 )

 Does the power of folds boil down to accumulations? Far from it!

origami

 Folding can be used creatively to perform all sorts of actions, algebra preserving ones:

foldl (max) 0 [1, 20, 2, 14]

or catamorphisms

foldl (Stringify) "" [1, 20, 2, 14]

here we assume that Stringify is a function (you’d need a type class for this) that maps its input to a stringified concatenation, so that we’d get “120214” and have a numeric sequence reduced to a string.

Folds in a C++ Context

 C++17 will have fold expressions! Clang supports them since 3.6 as does the trunk version of gcc. The syntax differs from Haskell and is designed to play well with variadic templates. A fold expression can be written like this:

(args + ...)

 This is a (unary) left fold, because we “anchor” the application of the operator to the left, ie it expands as:

((args$0 + args$1) + ...) + args$n

 If we were to create a right fold :

(... + args)

 the expression expands like this :

args$0 + (... + (args$n-1 + args$n))

 We used the term unary previously not to describe the operator (+) but the accumulator. When an accumulator is explicitly specified

(args + ... +   an)
(  a0 + ... + args)

we dub the fold expression as “binary”. For unary folds the identity value of the operator’s algebra is implicitly used, i.e for ‘+’ a zero is added, for ‘*’ the result is multiplied by 1 etc, so that empty parameter packs have meaningful folding.

Show me some folding

A for_each lambda

auto ForEach = [](auto&& fun, auto&&... args) {
    (fun(args), ...);
};

To showcase this, let’s assume the existence of a printer functor

struct Printer 
{
    template <class T> 
    void operator()(T&& arg) { std::cout << arg; }
};

we can call it on arbitrary sequences like this

ForEach(Printer{}, 1, " to the ", 2, " to the ", 3, '\n');

ForEach applies an arbitrary callable to variadic arguments by folding with the , operator. It also does this in a left to right fashion, which is the associativity of the comma operator. Note that if a right fold was used inside ForEach we’d still be applying the callable from left to right (the associativity of comma does not change). Run the code here.

Summing the contents of an std::array at compile time

namespace detail
{
	template <class... Ts>
	constexpr auto sum_(Ts&&... args)
	{
		return (args + ...);
	}

	template <typename T, size_t N, size_t... Is>
	constexpr T sum_impl(array<T, N> const &arr, 
            index_sequence<Is...>)
	{
		return sum_(get<Is>(arr)...);
	}
}

template <typename T, size_t N>
constexpr T sum(array<T, N> const &arr)
{
    return detail::sum_impl(
        arr, make_index_sequence<N>{});
} 

int main()
{
    constexpr array<int, 4> arr{ { 1, 1, 2, 3 } };
    cout << integral_constant<int, sum(arr)>{};
}

The core function here is detail::sum_ which uses fold expressions to calculate the total of a sequence.

A more modern summing style

The above is OK and all, but C++17 will give us the tools to be way, way more terse. In this case the function apply abstracts away the process of unfolding an algebraic data type into a function call (what we manually do, in a total of three functions, with the index sequence above). Having this in place, all it takes is to encapsulate our fold :

struct Summer
{
    template <class... Ts>
    constexpr auto operator()(Ts&&... args) { 
        return (args + ...); 
    }
};

and then call

apply(Summer{}, arr)

Sadly the current proposal for apply only works for tuples. It takes a minor modification to have it working for std::array as well, specifically the use of a mechanism to obtain the valency of the input data type (for now tuple_size is “hardcoded” into apply). Click here for a demo.

Folding arbitrary functions

A somewhat limiting feature of c++ fold expressions, is that they’re only available for certain operators:

        +  -  *  /  %  ^  &  |  ~  =  <  >  <<  >>
        +=  -=  *=  /=  %=  ^=  &=  |=  <<=  >>=
        ==  !=  <=  >=  &&  ||  ,  .*  ->*

So doing

args Op ...

is not allowed for arbitrary operators. We are going to create a mechanism to bypass this. To solve this we are going to use expression templates, which is the first thing that came to mind when looking at Haskell’s fold diagram, due to its resemblance to Veldhuizen’s expression parse trees:

Right-fold-transformation

The code below creates add-hoc expression templates for arbitrary operators using thin wrapper types that can be parametrized by the operant/operator pair.

First thing we need is a wrapper type that apart from data, records the type of the operant we want to use:

template <class F, class T>
struct XX
{
	T data;

	template <class D>
	constexpr XX(D&& data)
		: data(std::forward<D>(data))
	{ }

	constexpr T give() const
	{
		return data;
	}
};

Then we create an expression class that records the operants and the type of the operation we want to perform.

template <class L, class F, class R>
struct YY
{
	L l;
	R r;

	template <class LL, class RR>
	constexpr YY(LL&& l, RR&& r)
		: l(std::forward<LL>(l))
		, r(std::forward<RR>(r))
	{ }

	constexpr auto give() const
	{
		return F{}(l.give(), r.give());
	}
};

We’ll be using the + operator in our ad-hoc expression templates so we overload it to perform expression parsing for our wrapper class (the two versions are created in way that satisfies the types involved in a left to right evaluation):

template <class F, class R, class T>
constexpr auto operator+(
    XX<F, T> const& lhs, R const& rhs)
{
	return YY< XX<F, T>, F, R >(lhs, rhs);
}

template <class F, class T1, class T2, class R>
constexpr auto operator+(
    YY<T1, F, T2> const& lhs, R const& rhs)
{
	return YY< YY<T1, F, T2>, F, R >(lhs, rhs);
}

Finally the interface functions use the above machinery:

namespace detail
{
	template <class... Ts>
	constexpr auto _foldl(Ts&&... args)
	{
		return (args + ...);
	}

	template <class... Ts>
	constexpr auto _foldr(Ts&&... args)
	{
		return (... + args);
	}
}

template <class F, class... Ts>
constexpr decltype(auto) foldl(Ts&&... args)
{
	return detail::_foldl(XX<F, Ts>(args)...).give();
}

template <class F, class... Ts>
constexpr decltype(auto) foldr(Ts&&... args)
{
	return detail::_foldr(XX<F, Ts>(args)...).give();
}

What we get out of this, is a foldl and a foldr function that, much like the Haskell counterparts, can be parameterized by the type of a binary operator. And then things get interesting :

// concatenate the stringification of sequence elements
foldr<Join>(
    1,  
    std::string(" bird in the hand, is worth "), 
    10, 
    std::string(" in the bush"));

// create a vector out of a sequence
auto k = foldl<Vettore>(1, 2, 30, 12);

// find the maximum element of a sequence
foldl<Max>(1, 20, 3, 5);

The complete code for the above examples can be found here.

Hopefully this is clear enough to proceed to the next step, providing some syntactic sugar. We didn’t get through all this ET trouble to end up with something compile time recursion could (kind of, almost in let’s say a similar way) handle. What we aimed for is the ability to write customizable C++ folds ourselves. Here’s how :

template <class F, class T>
XX<F, T> Op(T const& val)
{
    return XX<F, T>(val); 
}

So when using this “operator generator” we’d be writing things like

(Op<Max>(args) + ...)

References

1. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4191.html
3. http://en.cppreference.com/w/cpp/language/fold
4. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.248
5. Extension to std::experimental::apply
6. Arbitrary folds
7. All the fold examples in one place
8. https://www.youtube.com/watch?v=HGjGR9DfWSI

Advertisements

15 thoughts on “Fun with folds

  1. You didn’t include the starting value in your summation fold. You can see from the type of the folding function that the starting value is always used for any amounts of inputs from the traversable. Leaving out the starting value on any function that has two differently typed “parameters” would break.

    And I think stringify would be
    \ a b -> a ++ show b
    And this is why for this case you would do a foldr because then the function would be
    (++) . show
    And the point free style always shows off how intelligent we are

    Like

    1. >Leaving out the starting value on any function that has two differently typed “parameters” would >break.

      Not if the operation we’re performing handles the different types. Have you tried the `Join` example ? Parameters are of `int` and `string` types and no initial value is needed. The “+0” concept was mentioned to introduce identity elements, C++ unlike Haskell does not mandate specifiying a starting value. If you imply the “break” case is when no arguments are specified at all please refer to my general comment below where address the comment made by Morwenn as well. Thx for your comment, ❤ your Haskell

      Like

      1. > Not if the operation we’re performing handles the different types.

        I meant the Haskell fold example

        foldl (+) 0 [1..10] This expands to ( ( ( (0 + 1) + 2 ) + 3 ) + … + 10 )

        That this “starter” is needed (or “finisher” for foldr) can be seen by the type. Given the type (b -> a -> b) -> b -> [a] -> b there is no way for foldr to generate a result from any non-empty list if the starting value of type b is not used. You can see that by the “Stringify and concatenate” example.

        foldl (\a b -> a ++ (show b)) “” [1,2,3]

        If this were actually to fold to
        (1 ++ show 2) ++ show 3) like you say in your example, then this is actually a type error.
        It folds to
        (“” ++ show 1) ++ show 2 …

        And then the types work.

        Like

  2. I once developed a small folding library to analyze how we could expand the concept of identity elements as default return values for C++ folds (including right and left identity elements), even though I also wrote a paper against them in the end. You might want to read a bit more about them, so I’ll drop a small link: https://github.com/Morwenn/cpp-fold

    Like

    1. Hello, yes I’m familiar with that work, (apart from the “paper against them”, will check the link again) pretty cool stuff therein . Please refer to my general comment below regarding identity elements, in any case we could cook sth up to fortify the whole concept.

      Like

  3. Another use of folds can be found here http://stackoverflow.com/a/34315532/2567683

    On identity elements now. When writing

    Op<F>(args) + ...
    

    We are accessing an F operator that has nothing to do with addition. It is this custom operator that kept me away from implementing an ‘identity’ element. What would be that for max? std::numeric_limits::min() ? would that have any meaning in an empty parameter list? What if the operator was an arbitrary functor ? how would I deduce an identity from that?

    There are many questions and fuzzy decisions to be made that to me didn’t need answer right away (my primary goal was to implement folds for abitrary operators) plus the fact that one can use “binary folds” if she needs the identity to be there, eg use an empty vector in the “Vettore” example. In any case this is work in progress (I have to fix copies in XX etc) and I appreciate thought provoking comments

    Like

    1. This is exactly one of the reasons I wrote a proposal to remove the default values for some elements: even when trying to generalize the mechanism to encompass right identity elements and left identity elements, there are too many holes. Moreover, the identity elements already picked for operator+ and operator- are a door wide open for silent and unexpected conversions… basically, the paper (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0036r0.pdf) proposes to remove (at least), the defaults for operator+, operator-, operator& and operator| because they don’t make much sense outside of integer operations and can lead to silent error. I wish it was possible to get a working generic mechanism, but couldn’t find one, so… well, I advocated for the deletion of the problematic cases.

      Like

  4. I feel slow now… I’m a C++ guy from way back. I guess I haven’t needed to recursively apply functions over data sets often enough to appreciate the ease/power which is coming our way. (Plus, I’m missing the syntactic details I think; I keep feeling like I’m looking at code samples from some language I don’t know.) But, I always like reading about what smart innovative folks are up to, so thanks for posting this.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s