A “sorted view”

1. Intro

This installment elaborates on the creation of a “sorted_view” utility. The “References” section contains links to the complete code; throughout the article, we provide demos and snippets to showcase and explain the implementations. The section on modernizing the code contains real world applications of the following C++17 features:

  • structured bindings
  • template argument deduction for class templates

Problem description

Apply a relative sorting on two or more data sequences, i.e take the sequences

s1 = ["Michael", "Larisa", "Paavo", "Mark"]
s2 = ["Phelps", "Latynina", "Nurmi", "Spitz"]

If the goal was to sort based the ordering of s1, the expected output would be

s1 = ["Larisa", "Mark", "Michael", "Paavo"]
s2 = ["Latynina", "Spitz", "Phelps", "Nurmi"]

Related work

This is a common problem in modern packaging. I’ve answered to exactly this problem description here. The “sorting by indices” techniques I used is much better explained (and extended and prepackaged) by Raymond Chen in a great series of articles you can start reading here.

What to expect

This is an example of  how higher abstractions can give better performance. What we do, instead of delaying or minimizing expensive swap / copy / move operations (as in the sources linked so far), is to create a literate view for our sequences that will give us the desired sorted step through and avoid touching the sequences.

2. Design a sorted view

Any goals that are not immediately clear will be explained shortly after.

Goals

  • Don’t mutate the original sequences
  • Don’t make any copies of the original sequences
  • Allow custom comparators to be used for sorting
  • Easy “sorted walks”
  • Ability to use other sequences as tie breakers

Tools

The few Standard Library components we’ll use to make our life simpler:

  • A tuple to hold the (one or more) sequences
  • A vector of indices that holds the step through order
  • The sort and iota algorithms

3. Implementation

In a nutshell, our class will look like this


template <class… Args>
class sorted_view
{
std::tuple<Args…> _ts;
mutable std::vector<std::size_t> _is;
public:
sorted_view(Args&&… args);
// sort based on the Ith sequence
template <std::size_t I, class F> void sort(F&& bop) const;
// size of the Ith sequence
template <std::size_t I = 0> auto size() const;
// the SORTED element of the Ith sequence at position pos
template <std::size_t I> decltype(auto) at(std::size_t pos);
};

Capturing

Since spelling out the type of our class is quite cumbersome and error prone, we’ll create a helper function to automate this:


template <class… Args>
auto make_sorted_view(Args&&… args)
{
return sorted_view<Args…>(std::forward<Args>(args)…);
}

So the tuple class member is referencing any l-value references and move constructing from r-value references.

Sorting

Sorting is pretty straightforward


template <std::size_t I, class F>
void sort(F&& bop) const
{
using namespace std;
_is.resize(get<I>(_ts).size());
iota(begin(_is), end(_is), 0);
std::sort(begin(_is), end(_is), [&](size_t lhs, size_t rhs) {
return forward<F>(bop)(get<I>(_ts)[lhs], get<I>(_ts)[rhs]);
});
}

The I template parameter denotes the sequence we use for sorting. A lambda that uses the custom comparison function with the referenced elements of that sequence is used in the sort algorithm.

Accessing

True to our “easy access” goal we provide an access method


template <std::size_t I>
decltype(auto) at(std::size_t pos)
{
return std::get<I>(_ts)[_is[pos]];
}

all it does is retrieve the element of the Ith sequence at the sorted position pos.

Example

Say we had the top ten of Olympic medalists. If we encode each column as a sequence (hence we’d have the sequences ‘first_name’, ‘last_name’ etc) this is how we’d use the sorted view class


// 1. Create the sorted view
auto sv = make_sorted_view(first_name, last_name, country, sport,
years, gender, medals_gold, medals_silver, medals_bronze, medals_total);
sv.sort<0>(std::less<>{}); // 2. Sort by first name
for (size_t i = 0; i < sv.size<0>(); i++) printf("{ %s-%s-%s }\n",
sv.at<0>(i).c_str(), // 3. access the ith sorted element of sequence 0
sv.at<1>(i).c_str(),
sv.at<2>(i).c_str());

The output is

{ Birgit-Fischer-Germany }
{ Bjorn-Daehlie-Norway }
{ Carl-Lewis-United States }
{ Jenny-Thompson-United States }
{ Larisa-Latynina-Soviet Union }
{ Mark-Spitz-United States }
{ Michael-Phelps-United States }
{ Ole Einar-Bjorndalen-Norway }
{ Paavo-Nurmi-Finland }
{ Sawao-Kato-Japan }
{ Usain-Bolt-Jamaica }

Working code can be found here.

Proving the “no copy” clause

Let’s avoid standardese and just use an example. Therein the S class explicitly forbids copy construction and allows move construction; the sorted_view class captures everything as expected.


struct S {
int i;
S(int i) : i(i) {}
int size() { return 1; }
S(S const&) = delete;
S(S&&) = default;
};
int main()
{
S s1(1), s2(2);
auto sv = make_sorted_view(s1, s2, S(3));
using sv_t = decltype(sv);
using tuple_t = sv_t::tuple_t;
using tuple_t = std::tuple<S&, S&, S>;
}

The additional clue here is that if the two definitions of tuple_t were different, the program wouldn’t compile due to conflicts.

4. Sorting with multiple criteria

Let’s return to our original example. If we were to sort based on gold medals, we see that Mark Spitz and Carl Lewis collected an equal amount of them so we’d like to specify a tie breaker, let’s say silver medals. Nope, equal there too so  let’s fall back to bronze. This process of relative sorting with fallback is what the following method does


template <std::size_t I, std::size_t… Is, class F>
void multi_sort(F&& bop) const
{
using namespace std;
_is.resize(get<I>(_ts).size());
iota(begin(_is), end(_is), 0);
std::sort(begin(_is), end(_is), [&](size_t lhs, size_t rhs) {
return detail::use_comparator<I, Is…>::apply(
_ts, lhs, rhs, forward<F>(bop));
});
}

The Demo provides essentially the following use case


sv.multi_sort<9, 6, 7, 8>([](auto&& a, auto&& b) { return a > b; });

i.e. sort based on total medals (sequence 9), then gold (6), then silver (7) and finally bronze (8). Complete code can be found in the references in case you ‘re wondering how is the equality operator generated (we only pass one comparator) or how to actually apply the fallback comparisons (generic lambdas are great by the way).

5. What can we get out of c++17

I hate the “I have this hammer, are there any nails” approach. The things we’ll work on modernizing are actual shortcomings of the pre C++17 code version so don’t expect a cornucopia of features. That been said, I find the following language additions exceptional.

No more “make_stuff” stuff

Template argument deduction for class templates is here to eliminate most cases of make_something helper functions. Though they might not look like such, the following are class template instantiations:


sorted_view sv1(first_name, last_name, sport, medals_gold, medals_silver);
sorted_view sv2(medals_bronze, medals_total);

Implicitly synthesized Deduction Guides (from existing constructors) take over the task of spelling out the class type (which in this case matches the desired behavior in terms of capturing). We can also specify our own deduction guides explicitly but this deserves its own post.  Also note that the type of the two variables declared above is different (different instantiations of the same class template).

NOTE: A nitpicker might have noticed that since the beginning the sorted view constructor didn’t use forwarding references but mere r-value references. That was so that we’d take advantage of reference collapsing rules and have the C++17 extension work out of the box in the desired fashion. If you don’t believe me make the constructor templated in the Demo below and see if template argument deduction for class templates works (spoiler … it doesn’t).

Accessible access

The way we accessed the sorted view elements was kind of awkward, mostly because the user had to be quite familiar with the class interface. Secondly because even then, unless we went the extra mile to bake “named tuples” into sorted_view, no one would be comfortable using the indexes of the inner tuple without looking at the instantiation (I used many fields in the initial example make this obvious).

Structured bindings provide a simple way to decompose a class (destructure one might say) but even better, you can do this your own way! Since destructuring the data holder of the sorted view (the inner tuple) doesn’t do much good, here’s what we’ll provide :


sorted_view sv(first_name, last_name, sport, medals_gold,
medals_silver, medals_bronze, medals_total);
auto [bName1, bName2, bSport, bGoldM, bSilverM, bBronzeM, bTotalM] = sv;

Every binding is an instance of class that holds pointers to a tuple element and the sorted indices


template <class T>
struct sorted_element
{
std::remove_reference_t<T>* _data;
std::vector<std::size_t> const* _is;
sorted_element(T& elem, std::vector<std::size_t> const& is)
: _data(&elem), _is(&is)
{
}
decltype(auto) operator[](std::size_t i)
{
return _data->at(_is->at(i));
}
};

Also, this type provides a sorted access through an overloaded subscript operator:


sv.multi_sort<6, 4, 5>([](auto&& a, auto&& b) { return a > b; });
auto [sName1, sName2, sSport, sGoldM, sSilverM, sBronzeM, sTotalM] = sv;
for (size_t i = 0; i < sv.size<6>(); i++)
{
std::cout << sTotalM[i] << " " << sGoldM[i] << " " << sSilverM[i] << " "
<< sName1[i] << " " << sName2[i] << std::endl;
}

Now this makes the code way clearer with minimum effort. Here’s what it took to provide custom structured bindings for our class:


namespace std
{
// 1. provide tuple_element<I, sorted_view<…>>::type
template <std::size_t I, class… Types>
struct tuple_element<I, utl::sorted_view<Types…>>
{
using sorted_view_t = utl::sorted_view<Types…>;
using type = typename sorted_view_t::template tuple_element_t<I>;
};
// 2. provide tuple_size<sorted_view<…>>
template <class… Types>
struct tuple_size<utl::sorted_view<Types…>>
: public integral_constant<size_t, sizeof…(Types)>
{
};
}
namespace utl
{
// 3. provide a get<I>(sorted_view<…>) function
template <std::size_t I, class… Types>
auto get(sorted_view<Types…> const& sv)
{
return sv.template elem<I>();
}
}

It’s worth mentioning that ADL allows get to be written inside the utl namespace (there’s a dispute on Reddit on whether writing it inside std would be UB). Usage of the techniques mentioned in this section can be found in the following Demo

6. Extensions

There are things to consider in order to augment the functionality of the sorted view class:

  • Provide an interface so that sort could take advantage of C++17 parallel sorts if desired
  • Raymond Chen has an article on the Schwarzian transform, maybe this would be good to have
  • Custom view should be also possible. Why just sort, a lot of other stuff can be done the same way (my first attempt cluttered the interface so until I find a way to keep simple things simple, having a dedicated class is OK)
  • Provide named tuples for the C++14 implementation
  • Concepts: the elements sorted_view acts upon, have to be sortable, indexable etc and all these are constraints wonderfully expressed with concepts (you should see the errors I got while writing this w/o concepts).

On most of the points above, I’ll return with a dedicated post, until then thank you for reading.

References

  1. sorted_view.h
  2. sorted_view.cpp
  3. Sorting by indices – Raymond Chen
  4. How do I sort two vectors containing relative entries
  5. Template argument deduction for class templates
  6. Structured bindings

2 thoughts on “A “sorted view”

Leave a comment