Thinking in C++ Vol 2 - Practical Programming |
Prev |
Home |
Next |
A significant category of STL algorithms must operate on a
sorted range. STL provides a number of separate sorting algorithms, depending
on whether the sort should be stable, partial, or just regular (non-stable).
Oddly enough, only the partial sort has a copying version. If you re using
another sort and you need to work on a copy, you ll have to make your own copy
before sorting.
Once your sequence is sorted, you can perform many
operations on that sequence, from simply locating an element or group of
elements to merging with another sorted sequence or manipulating sequences as
mathematical sets.
Each algorithm involved with sorting or operations on sorted
sequences has two versions. The first uses the object s own operator<
to perform the comparison, and the second uses operator( )(a, b) to
determine the relative order of a and b. Other than this, there
are no differences, so this distinction will not be pointed out in the
description of each algorithm.
Sorting
The sort algorithms require ranges delimited by
random-access iterators, such as a vector or deque. The list
container has its own built-in sort( ) function, since it only
supports bi-directional iteration.
void sort(RandomAccessIterator first, RandomAccessIterator
last);
void sort(RandomAccessIterator first, RandomAccessIterator
last, StrictWeakOrdering binary_pred);
Sorts [first, last) into ascending order. The first
form uses operator< and the second form uses the supplied comparator
object to determine the order.
void stable_sort(RandomAccessIterator first,
RandomAccessIterator last);
void stable_sort(RandomAccessIterator first,
RandomAccessIterator last, StrictWeakOrdering
binary_pred);
Sorts [first, last) into ascending order, preserving
the original ordering of equivalent elements. (This is important if elements
can be equivalent but not identical.)
void partial_sort(RandomAccessIterator first,
RandomAccessIterator middle, RandomAccessIterator last);
void partial_sort(RandomAccessIterator first,
RandomAccessIterator middle, RandomAccessIterator last,
StrictWeakOrdering binary_pred);
Sorts the number of elements from [first, last) that
can be placed in the range [first, middle). The rest of the elements end
up in [middle, last) and have no guaranteed order.
RandomAccessIterator partial_sort_copy(InputIterator first,
InputIterator last, RandomAccessIterator result_first,
RandomAccessIterator result_last);
RandomAccessIterator partial_sort_copy(InputIterator first,
InputIterator last, RandomAccessIterator result_first,
RandomAccessIterator result_last, StrictWeakOrdering
binary_pred);
Sorts the number of elements from [first, last) that
can be placed in the range [result_first, result_last) and copies those
elements into [result_first, result_last). If the range [first, last)
is smaller than [result_first, result_last), the smaller number of
elements is used.
void nth_element(RandomAccessIterator first,
RandomAccessIterator nth, RandomAccessIterator last);
void nth_element(RandomAccessIterator first,
RandomAccessIterator nth, RandomAccessIterator last,
StrictWeakOrdering binary_pred);
Just like partial_sort( ), nth_element( )
partially orders a range of elements. However, it s much less ordered than partial_sort( ).
The only guarantee from nth_element( ) is that whatever location
you choose will become a dividing point. All the elements in the range [first,
nth) will pair-wise satisfy the binary predicate (operator< by
default, as usual), and all the elements in the range (nth, last]
will not. However, neither subrange is in any particular order, unlike partial_sort( )
which has the first range in sorted order.
If all you need is this very weak ordering (if, for example,
you re determining medians, percentiles, and so on), this algorithm is faster
than partial_sort( ).
Locating elements in sorted ranges
Once a range is sorted, you can use a group of operations to
find elements within those ranges. In the following functions, there are always
two forms. One assumes that the intrinsic operator< performs the
sort, and the second operator must be used if some other comparison function
object performs the sort. You must use the same comparison for locating
elements as you do to perform the sort; otherwise, the results are undefined.
In addition, if you try to use these functions on unsorted ranges, the results
will be undefined.
bool binary_search(ForwardIterator first, ForwardIterator
last, const T& value);
bool binary_search(ForwardIterator first, ForwardIterator
last, const T& value, StrictWeakOrdering binary_pred);
Tells you whether value appears in the sorted range [first,
last).
ForwardIterator lower_bound(ForwardIterator first,
ForwardIterator last, const T& value);
ForwardIterator lower_bound(ForwardIterator first,
ForwardIterator last, const T& value, StrictWeakOrdering
binary_pred);
Returns an iterator indicating the first occurrence of value
in the sorted range [first, last). If value is not present, an
iterator to where it would fit in the sequence is returned.
ForwardIterator upper_bound(ForwardIterator first,
ForwardIterator last, const T& value);
ForwardIterator upper_bound(ForwardIterator first,
ForwardIterator last, const T& value, StrictWeakOrdering
binary_pred);
Returns an iterator indicating one past the last occurrence
of value in the sorted range [first, last). If value is
not present, an iterator to where it would fit in the sequence is returned.
pair<ForwardIterator,
ForwardIterator> equal_range(ForwardIterator first, ForwardIterator last,
const T& value);
pair<ForwardIterator, ForwardIterator> equal_range(ForwardIterator
first, ForwardIterator last,
const T& value, StrictWeakOrdering binary_pred);
Essentially
combines lower_bound( ) and upper_bound( ) to return a pair
indicating the first and one-past-the-last occurrences of value in the
sorted range [first, last). Both iterators indicate the location where value
would fit if it is not found.
You may find
it surprising that the binary search algorithms take a forward iterator instead
of a random access iterator. (Most explanations of binary search use indexing.)
Remember that a random access iterator is-a forward iterator, and can be used
wherever the latter is specified. If the iterator passed to one of these
algorithms in fact supports random access, then the efficient logarithmic-time
procedure is used, otherwise a linear search is performed.
Example
The following example turns each input word into an NString
and adds it to a vector<NString>. The vector is then used
to demonstrate the various sorting and searching algorithms.
//: C06:SortedSearchTest.cpp
// Test searching in sorted ranges.
// NString
#include <algorithm>
#include <cassert>
#include <ctime>
#include <cstdlib>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
#include "NString.h"
#include "PrintSequence.h"
#include "../require.h"
using namespace std;
int main(int argc, char* argv[]) {
typedef vector<NString>::iterator sit;
char* fname = "Test.txt";
if(argc > 1) fname = argv[1];
ifstream in(fname);
assure(in, fname);
srand(time(0));
cout.setf(ios::boolalpha);
vector<NString> original;
copy(istream_iterator<string>(in),
istream_iterator<string>(),
back_inserter(original));
require(original.size() >= 4, "Must have four
elements");
vector<NString> v(original.begin(),
original.end()),
w(original.size() / 2);
sort(v.begin(), v.end());
print(v.begin(), v.end(), "sort");
v = original;
stable_sort(v.begin(), v.end());
print(v.begin(), v.end(), "stable_sort");
v = original;
sit it = v.begin(), it2;
// Move iterator to middle
for(size_t i = 0; i < v.size() / 2; i++)
++it;
partial_sort(v.begin(), it, v.end());
cout << "middle = " << *it
<< endl;
print(v.begin(), v.end(), "partial_sort");
v = original;
// Move iterator to a quarter position
it = v.begin();
for(size_t i = 0; i < v.size() / 4; i++)
++it;
// Less elements to copy from than to the destination
partial_sort_copy(v.begin(), it, w.begin(), w.end());
print(w.begin(), w.end(),
"partial_sort_copy");
// Not enough room in destination
partial_sort_copy(v.begin(), v.end(),
w.begin(),w.end());
print(w.begin(), w.end(), "w
partial_sort_copy");
// v remains the same through all this process
assert(v == original);
nth_element(v.begin(), it, v.end());
cout << "The nth_element = " <<
*it << endl;
print(v.begin(), v.end(), "nth_element");
string f = original[rand() % original.size()];
cout << "binary search: "
<< binary_search(v.begin(), v.end(), f)
<< endl;
sort(v.begin(), v.end());
it = lower_bound(v.begin(), v.end(), f);
it2 = upper_bound(v.begin(), v.end(), f);
print(it, it2, "found range");
pair<sit, sit> ip = equal_range(v.begin(),
v.end(), f);
print(ip.first, ip.second, "equal_range");
} ///:~
This example uses the NString class seen earlier,
which stores an occurrence number with copies of a string. The call to stable_sort( )
shows how the original order for objects with equal strings is preserved. You
can also see what happens during a partial sort (the remaining unsorted
elements are in no particular order). There is no partial stable sort.
Notice in the call to nth_element( ) that,
whatever the nth element turns out to be (which will vary from one run to
another because of URandGen), the elements before that are less, and
after that are greater, but the elements have no particular order other than
that. Because of URandGen, there are no duplicates, but if you use a
generator that allows duplicates, you ll see that the elements before the nth
element will be less than or equal to the nth element.
This example also illustrates all three binary search
algorithms. As advertised, lower_bound( ) refers to the first
element in the sequence equal to a given key, upper_bound( ) points
one past the last, and equal_range( ) returns both results as a
pair.
Merging sorted ranges
As before, the first form of each function assumes that the
intrinsic operator< performs the sort. The second form must be used
if some other comparison function object performs the sort. You must use the
same comparison for locating elements as you do to perform the sort; otherwise,
the results are undefined. In addition, if you try to use these functions on
unsorted ranges, the results will be undefined.
OutputIterator merge(InputIterator1 first1, InputIterator1
last1, InputIterator2 first2, InputIterator2 last2,
OutputIterator result);
OutputIterator merge(InputIterator1 first1, InputIterator1
last1, InputIterator2 first2, InputIterator2 last2,
OutputIterator result, StrictWeakOrdering binary_pred);
Copies elements from [first1, last1) and [first2,
last2) into result, such that the resulting range is sorted in
ascending order. This is a stable operation.
void inplace_merge(BidirectionalIterator first,
BidirectionalIterator middle, BidirectionalIterator
last);
void inplace_merge(BidirectionalIterator first,
BidirectionalIterator middle, BidirectionalIterator last,
StrictWeakOrdering binary_pred);
This assumes that [first, middle) and [middle,
last) are each sorted ranges in the same sequence. The two ranges are
merged so that the resulting range [first, last) contains the combined
ranges in sorted order.
Example
It s easier to see what goes on with merging if ints
are used. The following example also emphasizes how the algorithms (and our own
print template) work with arrays as well as containers:
//: C06:MergeTest.cpp
// Test merging in sorted ranges.
//{L} Generators
#include <algorithm>
#include "PrintSequence.h"
#include "Generators.h"
using namespace std;
int main() {
const int SZ = 15;
int a[SZ*2] = {0};
// Both ranges go in the same array:
generate(a, a + SZ, SkipGen(0, 2));
a[3] = 4;
a[4] = 4;
generate(a + SZ, a + SZ*2, SkipGen(1, 3));
print(a, a + SZ, "range1", " ");
print(a + SZ, a + SZ*2, "range2", "
");
int b[SZ*2] = {0}; // Initialize all to zero
merge(a, a + SZ, a + SZ, a + SZ*2, b);
print(b, b + SZ*2, "merge", " ");
// Reset b
for(int i = 0; i < SZ*2; i++)
b[i] = 0;
inplace_merge(a, a + SZ, a + SZ*2);
print(a, a + SZ*2, "inplace_merge", "
");
int* end = set_union(a, a + SZ, a + SZ, a + SZ*2, b);
print(b, end, "set_union", " ");
} ///:~
In main( ), instead of creating two separate
arrays, both ranges are created end to end in the array a. (This will
come in handy for the inplace_merge.) The first call to merge( )
places the result in a different array, b. For comparison, set_union( )
is also called, which has the same signature and similar behavior, except that
it removes duplicates from the second set. Finally, inplace_merge( )
combines both parts of a.
Set operations on sorted ranges
Once ranges have been sorted, you can perform mathematical
set operations on them.
bool includes(InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2);
bool includes(InputIterator1 first1, InputIterator1 last1,
InputIterator2 first2, InputIterator2 last2,
StrictWeakOrdering binary_pred);
Returns true if [first2, last2) is a subset of
[first1, last1). Neither range is required to hold only unique elements,
but if [first2, last2) holds n elements of a particular value, [first1,
last1) must also hold at least n elements if the result is to be true.
OutputIterator set_union(InputIterator1 first1,
InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result);
OutputIterator set_union(InputIterator1 first1,
InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result,
StrictWeakOrdering binary_pred);
Creates the mathematical union of two sorted ranges in the result
range, returning the end of the output range. Neither input range is required
to hold only unique elements, but if a particular value appears multiple times
in both input sets, the resulting set will contain the larger number of
identical values.
OutputIterator set_intersection(InputIterator1 first1,
InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result);
OutputIterator set_intersection(InputIterator1 first1,
InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result,
StrictWeakOrdering binary_pred);
Produces, in result, the intersection of the two
input sets, returning the end of the output range that is, the set of values
that appear in both input sets. Neither input range is required to hold only
unique elements, but if a particular value appears multiple times in both input
sets, the resulting set will contain the smaller number of identical values.
OutputIterator set_difference(InputIterator1 first1,
InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result);
OutputIterator set_difference(InputIterator1 first1,
InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result,
StrictWeakOrdering binary_pred);
Produces, in result, the mathematical set difference,
returning the end of the output range. All the elements that are in [first1,
last1) but not in [first2, last2) are placed in the result set.
Neither input range is required to hold only unique elements, but if a
particular value appears multiple times in both input sets (n times in
set 1 and m times in set 2), the resulting set will contain max(n-m,
0) copies of that value.
OutputIterator set_symmetric_difference(InputIterator1
first1, InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result);
OutputIterator set_symmetric_difference(InputIterator1
first1, InputIterator1 last1, InputIterator2 first2,
InputIterator2 last2, OutputIterator result,
StrictWeakOrdering binary_pred);
Constructs, in result, the set containing:
1. All the elements in set 1 that are not in set 2.
2. All the elements in set 2 that are not in set 1.
Neither input range is required to hold only unique
elements, but if a particular value appears multiple times in both input sets (n
times in set 1 and m times in set 2), the resulting set will contain abs(n-m)
copies of that value, where abs( ) is the absolute value. The
return value is the end of the output range.
Example
It s easiest to see the set operations demonstrated using
simple vectors of characters. These characters are randomly generated
and then sorted, but the duplicates are retained so that you can see what the
set operations do when there are duplicates.
//: C06:SetOperations.cpp
// Set operations on sorted ranges.
//{L} Generators
#include <algorithm>
#include <vector>
#include "Generators.h"
#include "PrintSequence.h"
using namespace std;
int main() {
const int SZ = 30;
char v[SZ + 1], v2[SZ + 1];
CharGen g;
generate(v, v + SZ, g);
generate(v2, v2 + SZ, g);
sort(v, v + SZ);
sort(v2, v2 + SZ);
print(v, v + SZ, "v",
"");
print(v2, v2 + SZ, "v2", "");
bool b = includes(v, v + SZ, v + SZ/2, v + SZ);
cout.setf(ios::boolalpha);
cout << "includes: " << b
<< endl;
char v3[SZ*2 + 1], *end;
end = set_union(v, v + SZ, v2, v2 + SZ, v3);
print(v3, end, "set_union", "");
end = set_intersection(v, v + SZ, v2, v2 + SZ, v3);
print(v3, end, "set_intersection",
"");
end = set_difference(v, v + SZ, v2, v2 + SZ, v3);
print(v3, end, "set_difference",
"");
end = set_symmetric_difference(v, v + SZ,
v2, v2 + SZ, v3);
print(v3, end,
"set_symmetric_difference","");
} ///:~
After v and v2 are generated, sorted, and
printed, the includes( ) algorithm is tested by seeing if the
entire range of v contains the last half of v. It does, so the
result should always be true. The array v3 holds the output of set_union( ),
set_intersection( ), set_difference( ), and set_symmetric_difference( ),
and the results of each are displayed so you can ponder them and convince
yourself that the algorithms work as promised.
Thinking in C++ Vol 2 - Practical Programming |
Prev |
Home |
Next |