Thinking in C++ - Practical Programming - Sorting and operations on sorted ranges

Thinking in C++ Vol 2 - Practical Programming
Prev	Home	Next

Sorting and operations on sorted ranges

C# Essentials
eBook

$9.99

eBookFrenzy.com

A significant category of STL algorithms must operate on a sorted range. STL provides a number of separate sorting algorithms, depending on whether the sort should be stable, partial, or just regular (non-stable). Oddly enough, only the partial sort has a copying version. If you re using another sort and you need to work on a copy, you ll have to make your own copy before sorting.

Once your sequence is sorted, you can perform many operations on that sequence, from simply locating an element or group of elements to merging with another sorted sequence or manipulating sequences as mathematical sets.

Each algorithm involved with sorting or operations on sorted sequences has two versions. The first uses the object s own operator< to perform the comparison, and the second uses operator( )(a, b) to determine the relative order of a and b. Other than this, there are no differences, so this distinction will not be pointed out in the description of each algorithm.

Sorting

The sort algorithms require ranges delimited by random-access iterators, such as a vector or deque. The list container has its own built-in sort( ) function, since it only supports bi-directional iteration.

void sort(RandomAccessIterator first, RandomAccessIterator
last);
void sort(RandomAccessIterator first, RandomAccessIterator
last, StrictWeakOrdering binary_pred);

Sorts [first, last) into ascending order. The first form uses operator< and the second form uses the supplied comparator object to determine the order.

void stable_sort(RandomAccessIterator first,
RandomAccessIterator last);
void stable_sort(RandomAccessIterator first,
RandomAccessIterator last, StrictWeakOrdering
binary_pred);

Sorts [first, last) into ascending order, preserving the original ordering of equivalent elements. (This is important if elements can be equivalent but not identical.)

void partial_sort(RandomAccessIterator first,
RandomAccessIterator middle, RandomAccessIterator last);
void partial_sort(RandomAccessIterator first,
RandomAccessIterator middle, RandomAccessIterator last,
StrictWeakOrdering binary_pred);

Sorts the number of elements from [first, last) that can be placed in the range [first, middle). The rest of the elements end up in [middle, last) and have no guaranteed order.

RandomAccessIterator partial_sort_copy(InputIterator first,
InputIterator last, RandomAccessIterator result_first,
RandomAccessIterator result_last);
RandomAccessIterator partial_sort_copy(InputIterator first,
InputIterator last, RandomAccessIterator result_first,
RandomAccessIterator result_last, StrictWeakOrdering
binary_pred);

Sorts the number of elements from [first, last) that can be placed in the range [result_first, result_last) and copies those elements into [result_first, result_last). If the range [first, last) is smaller than [result_first, result_last), the smaller number of elements is used.

void nth_element(RandomAccessIterator first,
RandomAccessIterator nth, RandomAccessIterator last);
void nth_element(RandomAccessIterator first,
RandomAccessIterator nth, RandomAccessIterator last,
StrictWeakOrdering binary_pred);

Just like partial_sort( ), nth_element( ) partially orders a range of elements. However, it s much less ordered than partial_sort( ). The only guarantee from nth_element( ) is that whatever location you choose will become a dividing point. All the elements in the range [first, nth) will pair-wise satisfy the binary predicate (operator< by default, as usual), and all the elements in the range (nth, last] will not. However, neither subrange is in any particular order, unlike partial_sort( ) which has the first range in sorted order.

If all you need is this very weak ordering (if, for example, you re determining medians, percentiles, and so on), this algorithm is faster than partial_sort( ).

Locating elements in sorted ranges

Once a range is sorted, you can use a group of operations to find elements within those ranges. In the following functions, there are always two forms. One assumes that the intrinsic operator< performs the sort, and the second operator must be used if some other comparison function object performs the sort. You must use the same comparison for locating elements as you do to perform the sort; otherwise, the results are undefined. In addition, if you try to use these functions on unsorted ranges, the results will be undefined.

bool binary_search(ForwardIterator first, ForwardIterator
last, const T& value);
bool binary_search(ForwardIterator first, ForwardIterator
last, const T& value, StrictWeakOrdering binary_pred);

Tells you whether value appears in the sorted range [first, last).

ForwardIterator lower_bound(ForwardIterator first,
ForwardIterator last, const T& value);
ForwardIterator lower_bound(ForwardIterator first,
ForwardIterator last, const T& value, StrictWeakOrdering
binary_pred);

Returns an iterator indicating the first occurrence of value in the sorted range [first, last). If value is not present, an iterator to where it would fit in the sequence is returned.

ForwardIterator upper_bound(ForwardIterator first,
ForwardIterator last, const T& value);
ForwardIterator upper_bound(ForwardIterator first,
ForwardIterator last, const T& value, StrictWeakOrdering
binary_pred);

Returns an iterator indicating one past the last occurrence of value in the sorted range [first, last). If value is not present, an iterator to where it would fit in the sequence is returned.

pair<ForwardIterator, ForwardIterator> equal_range(ForwardIterator first, ForwardIterator last,
const T& value);
pair<ForwardIterator, ForwardIterator> equal_range(ForwardIterator first, ForwardIterator last,
const T& value, StrictWeakOrdering binary_pred);

Essentially combines lower_bound( ) and upper_bound( ) to return a pair indicating the first and one-past-the-last occurrences of value in the sorted range [first, last). Both iterators indicate the location where value would fit if it is not found.

You may find it surprising that the binary search algorithms take a forward iterator instead of a random access iterator. (Most explanations of binary search use indexing.) Remember that a random access iterator is-a forward iterator, and can be used wherever the latter is specified. If the iterator passed to one of these algorithms in fact supports random access, then the efficient logarithmic-time procedure is used, otherwise a linear search is performed.[96]

Example

The following example turns each input word into an NString and adds it to a vector<NString>. The vector is then used to demonstrate the various sorting and searching algorithms.

//: C06:SortedSearchTest.cpp

// Test searching in sorted ranges.

// NString

#include <algorithm>

#include <cassert>

#include <ctime>

#include <cstdlib>

#include <cstddef>

#include <fstream>

#include <iostream>

#include <iterator>

#include <vector>

#include "NString.h"

#include "PrintSequence.h"

#include "../require.h"

using namespace std;

int main(int argc, char* argv[]) {

typedef vector<NString>::iterator sit;

char* fname = "Test.txt";

if(argc > 1) fname = argv[1];

ifstream in(fname);

assure(in, fname);

srand(time(0));

cout.setf(ios::boolalpha);

vector<NString> original;

copy(istream_iterator<string>(in),

istream_iterator<string>(), back_inserter(original));

require(original.size() >= 4, "Must have four elements");

vector<NString> v(original.begin(), original.end()),

w(original.size() / 2);

sort(v.begin(), v.end());

print(v.begin(), v.end(), "sort");

v = original;

stable_sort(v.begin(), v.end());

print(v.begin(), v.end(), "stable_sort");

v = original;

sit it = v.begin(), it2;

// Move iterator to middle

for(size_t i = 0; i < v.size() / 2; i++)

++it;

partial_sort(v.begin(), it, v.end());

cout << "middle = " << *it << endl;

print(v.begin(), v.end(), "partial_sort");

v = original;

// Move iterator to a quarter position

it = v.begin();

for(size_t i = 0; i < v.size() / 4; i++)

++it;

// Less elements to copy from than to the destination

partial_sort_copy(v.begin(), it, w.begin(), w.end());

print(w.begin(), w.end(), "partial_sort_copy");

// Not enough room in destination

partial_sort_copy(v.begin(), v.end(), w.begin(),w.end());

print(w.begin(), w.end(), "w partial_sort_copy");

// v remains the same through all this process

assert(v == original);

nth_element(v.begin(), it, v.end());

cout << "The nth_element = " << *it << endl;

print(v.begin(), v.end(), "nth_element");

string f = original[rand() % original.size()];

cout << "binary search: "

<< binary_search(v.begin(), v.end(), f) << endl;

sort(v.begin(), v.end());

it = lower_bound(v.begin(), v.end(), f);

it2 = upper_bound(v.begin(), v.end(), f);

print(it, it2, "found range");

pair<sit, sit> ip = equal_range(v.begin(), v.end(), f);

print(ip.first, ip.second, "equal_range");

} ///:~

This example uses the NString class seen earlier, which stores an occurrence number with copies of a string. The call to stable_sort( ) shows how the original order for objects with equal strings is preserved. You can also see what happens during a partial sort (the remaining unsorted elements are in no particular order). There is no partial stable sort.

Notice in the call to nth_element( ) that, whatever the nth element turns out to be (which will vary from one run to another because of URandGen), the elements before that are less, and after that are greater, but the elements have no particular order other than that. Because of URandGen, there are no duplicates, but if you use a generator that allows duplicates, you ll see that the elements before the nth element will be less than or equal to the nth element.

This example also illustrates all three binary search algorithms. As advertised, lower_bound( ) refers to the first element in the sequence equal to a given key, upper_bound( ) points one past the last, and equal_range( ) returns both results as a pair.

Merging sorted ranges

As before, the first form of each function assumes that the intrinsic operator< performs the sort. The second form must be used if some other comparison function object performs the sort. You must use the same comparison for locating elements as you do to perform the sort; otherwise, the results are undefined. In addition, if you try to use these functions on unsorted ranges, the results will be undefined.

OutputIterator merge(InputIterator1 first1, InputIterator1
last1, InputIterator2 first2, InputIterator2 last2,
OutputIterator result);
OutputIterator merge(InputIterator1 first1, InputIterator1
last1, InputIterator2 first2, InputIterator2 last2,
OutputIterator result, StrictWeakOrdering binary_pred);

Copies elements from [first1, last1) and [first2, last2) into result, such that the resulting range is sorted in ascending order. This is a stable operation.

void inplace_merge(BidirectionalIterator first,
BidirectionalIterator middle, BidirectionalIterator
last);
void inplace_merge(BidirectionalIterator first,
BidirectionalIterator middle, BidirectionalIterator last,
StrictWeakOrdering binary_pred);

This assumes that [first, middle) and [middle, last) are each sorted ranges in the same sequence. The two ranges are merged so that the resulting range [first, last) contains the combined ranges in sorted order.

Example

It s easier to see what goes on with merging if ints are used. The following example also emphasizes how the algorithms (and our own print template) work with arrays as well as containers:

//: C06:MergeTest.cpp

// Test merging in sorted ranges.

//{L} Generators

#include <algorithm>

#include "PrintSequence.h"

#include "Generators.h"

using namespace std;

int main() {

const int SZ = 15;

int a[SZ*2] = {0};

// Both ranges go in the same array:

generate(a, a + SZ, SkipGen(0, 2));

a[3] = 4;

a[4] = 4;

generate(a + SZ, a + SZ*2, SkipGen(1, 3));

print(a, a + SZ, "range1", " ");

print(a + SZ, a + SZ*2, "range2", " ");

int b[SZ*2] = {0}; // Initialize all to zero

merge(a, a + SZ, a + SZ, a + SZ*2, b);

print(b, b + SZ*2, "merge", " ");

// Reset b

for(int i = 0; i < SZ*2; i++)

b[i] = 0;

inplace_merge(a, a + SZ, a + SZ*2);

print(a, a + SZ*2, "inplace_merge", " ");

int* end = set_union(a, a + SZ, a + SZ, a + SZ*2, b);

print(b, end, "set_union", " ");

} ///:~

In main( ), instead of creating two separate arrays, both ranges are created end to end in the array a. (This will come in handy for the inplace_merge.) The first call to merge( ) places the result in a different array, b. For comparison, set_union( ) is also called, which has the same signature and similar behavior, except that it removes duplicates from the second set. Finally, inplace_merge( ) combines both parts of a.

Set operations on sorted ranges

Once ranges have been sorted, you can perform mathematical set operations on them.

bool includes(InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2);
bool includes(InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2,
StrictWeakOrdering binary_pred);

Returns true if [first2, last2) is a subset of [first1, last1). Neither range is required to hold only unique elements, but if [first2, last2) holds n elements of a particular value, [first1, last1) must also hold at least n elements if the result is to be true.

OutputIterator set_union(InputIterator1 first1,
InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result);
OutputIterator set_union(InputIterator1 first1,
InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result,
StrictWeakOrdering binary_pred);

Creates the mathematical union of two sorted ranges in the result range, returning the end of the output range. Neither input range is required to hold only unique elements, but if a particular value appears multiple times in both input sets, the resulting set will contain the larger number of identical values.

OutputIterator set_intersection(InputIterator1 first1,
InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result);
OutputIterator set_intersection(InputIterator1 first1,
InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result,
StrictWeakOrdering binary_pred);

Produces, in result, the intersection of the two input sets, returning the end of the output range that is, the set of values that appear in both input sets. Neither input range is required to hold only unique elements, but if a particular value appears multiple times in both input sets, the resulting set will contain the smaller number of identical values.

OutputIterator set_difference(InputIterator1 first1,
InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result);
OutputIterator set_difference(InputIterator1 first1,
InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result,
StrictWeakOrdering binary_pred);

Produces, in result, the mathematical set difference, returning the end of the output range. All the elements that are in [first1, last1) but not in [first2, last2) are placed in the result set. Neither input range is required to hold only unique elements, but if a particular value appears multiple times in both input sets (n times in set 1 and m times in set 2), the resulting set will contain max(n-m, 0) copies of that value.

OutputIterator set_symmetric_difference(InputIterator1
first1, InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result);
OutputIterator set_symmetric_difference(InputIterator1
first1, InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result,
StrictWeakOrdering binary_pred);

Constructs, in result, the set containing:

1. All the elements in set 1 that are not in set 2.

2. All the elements in set 2 that are not in set 1.

Neither input range is required to hold only unique elements, but if a particular value appears multiple times in both input sets (n times in set 1 and m times in set 2), the resulting set will contain abs(n-m) copies of that value, where abs( ) is the absolute value. The return value is the end of the output range.

Example

It s easiest to see the set operations demonstrated using simple vectors of characters. These characters are randomly generated and then sorted, but the duplicates are retained so that you can see what the set operations do when there are duplicates.

//: C06:SetOperations.cpp

// Set operations on sorted ranges.

//{L} Generators

#include <algorithm>

#include <vector>

#include "Generators.h"

#include "PrintSequence.h"

using namespace std;

int main() {

const int SZ = 30;

char v[SZ + 1], v2[SZ + 1];

CharGen g;

generate(v, v + SZ, g);

generate(v2, v2 + SZ, g);

sort(v, v + SZ);

sort(v2, v2 + SZ);

print(v, v + SZ, "v", "");

print(v2, v2 + SZ, "v2", "");

bool b = includes(v, v + SZ, v + SZ/2, v + SZ);

cout.setf(ios::boolalpha);

cout << "includes: " << b << endl;

char v3[SZ*2 + 1], *end;

end = set_union(v, v + SZ, v2, v2 + SZ, v3);

print(v3, end, "set_union", "");

end = set_intersection(v, v + SZ, v2, v2 + SZ, v3);

print(v3, end, "set_intersection", "");

end = set_difference(v, v + SZ, v2, v2 + SZ, v3);

print(v3, end, "set_difference", "");

end = set_symmetric_difference(v, v + SZ,

v2, v2 + SZ, v3);

print(v3, end, "set_symmetric_difference","");

} ///:~

After v and v2 are generated, sorted, and printed, the includes( ) algorithm is tested by seeing if the entire range of v contains the last half of v. It does, so the result should always be true. The array v3 holds the output of set_union( ), set_intersection( ), set_difference( ), and set_symmetric_difference( ), and the results of each are displayed so you can ponder them and convince yourself that the algorithms work as promised.

Thinking in C++ Vol 2 - Practical Programming
Prev	Home	Next