Thinking in C++ Vol 2 - Practical Programming |

Prev |
Home |
Next |

A significant category of STL algorithms must operate on a
sorted range. STL provides a number of separate sorting algorithms, depending
on whether the sort should be stable, partial, or just regular (non-stable).
Oddly enough, only the partial sort has a copying version. If you re using
another sort and you need to work on a copy, you ll have to make your own copy
before sorting.

Once your sequence is sorted, you can perform many
operations on that sequence, from simply locating an element or group of
elements to merging with another sorted sequence or manipulating sequences as
mathematical sets.

Each algorithm involved with sorting or operations on sorted
sequences has two versions. The first uses the object s own **operator<**
to perform the comparison, and the second uses **operator( )(a, b)** to
determine the relative order of **a** and **b**. Other than this, there
are no differences, so this distinction will not be pointed out in the
description of each algorithm.

#### Sorting

The sort algorithms require ranges delimited by
random-access iterators, such as a **vector** or **deque**. The **list**
container has its own built-in **sort( )** function, since it only
supports bi-directional iteration.

void **sort**(RandomAccessIterator first, RandomAccessIterator

last);

void **sort**(RandomAccessIterator first, RandomAccessIterator

last, StrictWeakOrdering binary_pred);

Sorts **[first, last)** into ascending order. The first
form uses **operator<** and the second form uses the supplied comparator
object to determine the order.

void **stable_sort**(RandomAccessIterator first,

RandomAccessIterator last);

void **stable_sort**(RandomAccessIterator first,

RandomAccessIterator last, StrictWeakOrdering

binary_pred);

Sorts **[first, last)** into ascending order, preserving
the original ordering of equivalent elements. (This is important if elements
can be equivalent but not identical.)

void **partial_sort**(RandomAccessIterator first,

RandomAccessIterator middle, RandomAccessIterator last);

void **partial_sort**(RandomAccessIterator first,

RandomAccessIterator middle, RandomAccessIterator last,

StrictWeakOrdering binary_pred);

Sorts the number of elements from **[first, last)** that
can be placed in the range **[first, middle)**. The rest of the elements end
up in **[middle, last)** and have no guaranteed order.

RandomAccessIterator **partial_sort_copy**(InputIterator first,

InputIterator last, RandomAccessIterator result_first,

RandomAccessIterator result_last);

RandomAccessIterator **partial_sort_copy**(InputIterator first,

InputIterator last, RandomAccessIterator result_first,

RandomAccessIterator result_last, StrictWeakOrdering

binary_pred);

Sorts the number of elements from **[first, last)** that
can be placed in the range **[result_first, result_last)** and copies those
elements into **[result_first, result_last)**. If the range **[first, last)**
is smaller than **[result_first, result_last)**, the smaller number of
elements is used.

void **nth_element**(RandomAccessIterator first,

RandomAccessIterator nth, RandomAccessIterator last);

void **nth_element**(RandomAccessIterator first,

RandomAccessIterator nth, RandomAccessIterator last,

StrictWeakOrdering binary_pred);

Just like **partial_sort( )**, **nth_element( )**
partially orders a range of elements. However, it s much less ordered than **partial_sort( )**.
The only guarantee from **nth_element( )** is that whatever *location
*you choose will become a dividing point. All the elements in the range **[first,
nth) **will pair-wise satisfy the binary predicate (**operator<** by
default, as usual),** **and all the elements in the range **(nth, last]**
will not. However, neither subrange is in any particular order, unlike **partial_sort( )**
which has the first range in sorted order.

If all you need is this very weak ordering (if, for example,
you re determining medians, percentiles, and so on), this algorithm is faster
than **partial_sort( )**.

#### Locating elements in sorted ranges

Once a range is sorted, you can use a group of operations to
find elements within those ranges. In the following functions, there are always
two forms. One assumes that the intrinsic **operator<** performs the
sort, and the second operator must be used if some other comparison function
object performs the sort. You must use the same comparison for locating
elements as you do to perform the sort; otherwise, the results are undefined.
In addition, if you try to use these functions on unsorted ranges, the results
will be undefined.

bool **binary_search**(ForwardIterator first, ForwardIterator

last, const T& value);

bool **binary_search**(ForwardIterator first, ForwardIterator

last, const T& value, StrictWeakOrdering binary_pred);

Tells you whether **value** appears in the sorted range **[first,
last)**.

ForwardIterator **lower_bound**(ForwardIterator first,

ForwardIterator last, const T& value);

ForwardIterator **lower_bound**(ForwardIterator first,

ForwardIterator last, const T& value, StrictWeakOrdering

binary_pred);

Returns an iterator indicating the first occurrence of **value**
in the sorted range **[first, last)**. If **value** is not present, an
iterator to where it would fit in the sequence is returned.

ForwardIterator **upper_bound**(ForwardIterator first,

ForwardIterator last, const T& value);

ForwardIterator **upper_bound**(ForwardIterator first,

ForwardIterator last, const T& value, StrictWeakOrdering

binary_pred);

Returns an iterator indicating one past the last occurrence
of **value** in the sorted range **[first, last)**. If **value** is
not present, an iterator to where it would fit in the sequence is returned.

pair<ForwardIterator,
ForwardIterator> **equal_range**(ForwardIterator first, ForwardIterator last,

const T& value);

pair<ForwardIterator, ForwardIterator> **equal_range**(ForwardIterator
first, ForwardIterator last,

const T& value, StrictWeakOrdering binary_pred);

Essentially
combines **lower_bound( )** and **upper_bound( )** to return a **pair**
indicating the first and one-past-the-last occurrences of **value** in the
sorted range **[first, last)**. Both iterators indicate the location where **value**
would fit if it is not found.

You may find
it surprising that the binary search algorithms take a forward iterator instead
of a random access iterator. (Most explanations of binary search use indexing.)
Remember that a random access iterator is-a forward iterator, and can be used
wherever the latter is specified. If the iterator passed to one of these
algorithms in fact supports random access, then the efficient logarithmic-time
procedure is used, otherwise a linear search is performed.

#### Example

The** **following example turns each input word into an **NString**
and adds it to a **vector<NString>**. The **vector** is then used
to demonstrate the various sorting and searching algorithms.

//: C06:SortedSearchTest.cpp

// Test searching in sorted ranges.

// NString

#include <algorithm>

#include <cassert>

#include <ctime>

#include <cstdlib>

#include <cstddef>

#include <fstream>

#include <iostream>

#include <iterator>

#include <vector>

#include "NString.h"

#include "PrintSequence.h"

#include "../require.h"

using namespace std;

int main(int argc, char* argv[]) {

typedef vector<NString>::iterator sit;

char* fname = "Test.txt";

if(argc > 1) fname = argv[1];

ifstream in(fname);

assure(in, fname);

srand(time(0));

cout.setf(ios::boolalpha);

vector<NString> original;

copy(istream_iterator<string>(in),

istream_iterator<string>(),
back_inserter(original));

require(original.size() >= 4, "Must have four
elements");

vector<NString> v(original.begin(),
original.end()),

w(original.size() / 2);

sort(v.begin(), v.end());

print(v.begin(), v.end(), "sort");

v = original;

stable_sort(v.begin(), v.end());

print(v.begin(), v.end(), "stable_sort");

v = original;

sit it = v.begin(), it2;

// Move iterator to middle

for(size_t i = 0; i < v.size() / 2; i++)

++it;

partial_sort(v.begin(), it, v.end());

cout << "middle = " << *it
<< endl;

print(v.begin(), v.end(), "partial_sort");

v = original;

// Move iterator to a quarter position

it = v.begin();

for(size_t i = 0; i < v.size() / 4; i++)

++it;

// Less elements to copy from than to the destination

partial_sort_copy(v.begin(), it, w.begin(), w.end());

print(w.begin(), w.end(),
"partial_sort_copy");

// Not enough room in destination

partial_sort_copy(v.begin(), v.end(),
w.begin(),w.end());

print(w.begin(), w.end(), "w
partial_sort_copy");

// v remains the same through all this process

assert(v == original);

nth_element(v.begin(), it, v.end());

cout << "The nth_element = " <<
*it << endl;

print(v.begin(), v.end(), "nth_element");

string f = original[rand() % original.size()];

cout << "binary search: "

<< binary_search(v.begin(), v.end(), f)
<< endl;

sort(v.begin(), v.end());

it = lower_bound(v.begin(), v.end(), f);

it2 = upper_bound(v.begin(), v.end(), f);

print(it, it2, "found range");

pair<sit, sit> ip = equal_range(v.begin(),
v.end(), f);

print(ip.first, ip.second, "equal_range");

} ///:~

This example uses the **NString** class seen earlier,
which stores an occurrence number with copies of a string. The call to **stable_sort( )**
shows how the original order for objects with equal strings is preserved. You
can also see what happens during a partial sort (the remaining unsorted
elements are in no particular order). There is no partial stable sort.

Notice in the call to **nth_element( )** that,
whatever the nth element turns out to be (which will vary from one run to
another because of **URandGen**), the elements before that are less, and
after that are greater, but the elements have no particular order other than
that. Because of **URandGen**, there are no duplicates, but if you use a
generator that allows duplicates, you ll see that the elements before the nth
element will be less than or equal to the nth element.

This example also illustrates all three binary search
algorithms. As advertised, **lower_bound( )** refers to the first
element in the sequence equal to a given key, **upper_bound( )** points
one past the last, and **equal_range( )** returns both results as a
pair.

#### Merging sorted ranges

As before, the first form of each function assumes that the
intrinsic **operator<** performs the sort. The second form must be used
if some other comparison function object performs the sort. You must use the
same comparison for locating elements as you do to perform the sort; otherwise,
the results are undefined. In addition, if you try to use these functions on
unsorted ranges, the results will be undefined.

OutputIterator **merge**(InputIterator1 first1, InputIterator1

last1, InputIterator2 first2, InputIterator2 last2,

OutputIterator result);

OutputIterator **merge**(InputIterator1 first1, InputIterator1

last1, InputIterator2 first2, InputIterator2 last2,

OutputIterator result, StrictWeakOrdering binary_pred);

Copies elements from **[first1, last1)** and **[first2,
last2)** into **result**, such that the resulting range is sorted in
ascending order. This is a stable operation.

void **inplace_merge**(BidirectionalIterator first,

BidirectionalIterator middle, BidirectionalIterator

last);

void **inplace_merge**(BidirectionalIterator first,

BidirectionalIterator middle, BidirectionalIterator last,

StrictWeakOrdering binary_pred);

This assumes that **[first, middle)** and **[middle,
last)** are each sorted ranges in the same sequence. The two ranges are
merged so that the resulting range **[first, last)** contains the combined
ranges in sorted order.

#### Example

It s easier to see what goes on with merging if **int**s
are used. The following example also emphasizes how the algorithms (and our own
**print** template) work with arrays as well as containers:

//: C06:MergeTest.cpp

// Test merging in sorted ranges.

//{L} Generators

#include <algorithm>

#include "PrintSequence.h"

#include "Generators.h"

using namespace std;

int main() {

const int SZ = 15;

int a[SZ*2] = {0};

// Both ranges go in the same array:

generate(a, a + SZ, SkipGen(0, 2));

a[3] = 4;

a[4] = 4;

generate(a + SZ, a + SZ*2, SkipGen(1, 3));

print(a, a + SZ, "range1", " ");

print(a + SZ, a + SZ*2, "range2", "
");

int b[SZ*2] = {0}; // Initialize all to zero

merge(a, a + SZ, a + SZ, a + SZ*2, b);

print(b, b + SZ*2, "merge", " ");

// Reset b

for(int i = 0; i < SZ*2; i++)

b[i] = 0;

inplace_merge(a, a + SZ, a + SZ*2);

print(a, a + SZ*2, "inplace_merge", "
");

int* end = set_union(a, a + SZ, a + SZ, a + SZ*2, b);

print(b, end, "set_union", " ");

} ///:~

In **main( )**, instead of creating two separate
arrays, both ranges are created end to end in the array **a**. (This will
come in handy for the **inplace_merge**.) The first call to **merge( )**
places the result in a different array, **b**. For comparison, **set_union( )**
is also called, which has the same signature and similar behavior, except that
it removes duplicates from the second set. Finally, **inplace_merge( )**
combines both parts of **a**.

#### Set operations on sorted ranges

Once ranges have been sorted, you can perform mathematical
set operations on them.

bool **includes**(InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, InputIterator2 last2);

bool **includes**(InputIterator1 first1, InputIterator1 last1,

InputIterator2 first2, InputIterator2 last2,

StrictWeakOrdering binary_pred);

Returns **true** if **[first2, last2)** is a subset of
**[first1, last1)**. Neither range is required to hold only unique elements,
but if **[first2, last2)** holds **n** elements of a particular value, **[first1,
last1)** must also hold at least **n** elements if the result is to be **true**.

OutputIterator **set_union**(InputIterator1 first1,

InputIterator1 last1, InputIterator2 first2,

InputIterator2 last2, OutputIterator result);

OutputIterator **set_union**(InputIterator1 first1,

InputIterator1 last1, InputIterator2 first2,

InputIterator2 last2, OutputIterator result,

StrictWeakOrdering binary_pred);

Creates the mathematical union of two sorted ranges in the **result**
range, returning the end of the output range. Neither input range is required
to hold only unique elements, but if a particular value appears multiple times
in both input sets, the resulting set will contain the larger number of
identical values.

OutputIterator **set_intersection**(InputIterator1 first1,

InputIterator1 last1, InputIterator2 first2,

InputIterator2 last2, OutputIterator result);

OutputIterator **set_intersection**(InputIterator1 first1,

InputIterator1 last1, InputIterator2 first2,

InputIterator2 last2, OutputIterator result,

StrictWeakOrdering binary_pred);

Produces, in **result**, the intersection of the two
input sets, returning the end of the output range that is, the set of values
that appear in both input sets. Neither input range is required to hold only
unique elements, but if a particular value appears multiple times in both input
sets, the resulting set will contain the smaller number of identical values.

OutputIterator **set_difference**(InputIterator1 first1,

InputIterator1 last1, InputIterator2 first2,

InputIterator2 last2, OutputIterator result);

OutputIterator **set_difference**(InputIterator1 first1,

InputIterator1 last1, InputIterator2 first2,

InputIterator2 last2, OutputIterator result,

StrictWeakOrdering binary_pred);

Produces, in **result**, the mathematical set difference,
returning the end of the output range. All the elements that are in **[first1,
last1)** but not in **[first2, last2)** are placed in the result set.
Neither input range is required to hold only unique elements, but if a
particular value appears multiple times in both input sets (**n** times in
set 1 and **m** times in set 2), the resulting set will contain **max(n-m,
0)** copies of that value.

OutputIterator **set_symmetric_difference**(InputIterator1

first1, InputIterator1 last1, InputIterator2 first2,

InputIterator2 last2, OutputIterator result);

OutputIterator **set_symmetric_difference**(InputIterator1

first1, InputIterator1 last1, InputIterator2 first2,

InputIterator2 last2, OutputIterator result,

StrictWeakOrdering binary_pred);

Constructs, in **result**, the set containing:

1. All the elements in set 1 that are not in set 2.

2. All the elements in set 2 that are not in set 1.

Neither input range is required to hold only unique
elements, but if a particular value appears multiple times in both input sets (**n**
times in set 1 and **m** times in set 2), the resulting set will contain **abs(n-m)**
copies of that value, where **abs( )** is the absolute value. The
return value is the end of the output range.

#### Example

It s easiest to see the set operations demonstrated using
simple **vector**s of characters. These characters are randomly generated
and then sorted, but the duplicates are retained so that you can see what the
set operations do when there are duplicates.

//: C06:SetOperations.cpp

// Set operations on sorted ranges.

//{L} Generators

#include <algorithm>

#include <vector>

#include "Generators.h"

#include "PrintSequence.h"

using namespace std;

int main() {

const int SZ = 30;

char v[SZ + 1], v2[SZ + 1];

CharGen g;

generate(v, v + SZ, g);

generate(v2, v2 + SZ, g);

sort(v, v + SZ);

sort(v2, v2 + SZ);

print(v, v + SZ, "v",
"");

print(v2, v2 + SZ, "v2", "");

bool b = includes(v, v + SZ, v + SZ/2, v + SZ);

cout.setf(ios::boolalpha);

cout << "includes: " << b
<< endl;

char v3[SZ*2 + 1], *end;

end = set_union(v, v + SZ, v2, v2 + SZ, v3);

print(v3, end, "set_union", "");

end = set_intersection(v, v + SZ, v2, v2 + SZ, v3);

print(v3, end, "set_intersection",
"");

end = set_difference(v, v + SZ, v2, v2 + SZ, v3);

print(v3, end, "set_difference",
"");

end = set_symmetric_difference(v, v + SZ,

v2, v2 + SZ, v3);

print(v3, end,
"set_symmetric_difference","");

} ///:~

After **v** and **v2** are generated, sorted, and
printed, the **includes( )** algorithm is tested by seeing if the
entire range of **v** contains the last half of **v**. It does, so the
result should always be true. The array **v3** holds the output of **set_union( )**,
**set_intersection( )**, **set_difference( )**, and **set_symmetric_difference( )**,
and the results of each are displayed so you can ponder them and convince
yourself that the algorithms work as promised.

Thinking in C++ Vol 2 - Practical Programming |

Prev |
Home |
Next |