Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Programming
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Databases
Mail Systems
openSolaris
Eclipse Documentation
Techotopia.com
Virtuatopia.com
Answertopia.com

How To Guides
Virtualization
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Windows
Problem Solutions
Privacy Policy

  




 

 

Thinking in C++ Vol 2 - Practical Programming
Prev Home Next

set

The set container accepts only one copy of each element. It also sorts the elements. (Sorting isn t intrinsic to the conceptual definition of a set, but the STL set stores its elements in a balanced tree data structure to provide rapid lookups, thus producing sorted results when you traverse it.) The first two examples in this chapter used sets.

Consider the problem of creating an index for a book. You might like to start with all the words in the book, but you only want one instance of each word, and you want them sorted. A set is perfect for this and solves the problem effortlessly. However, there s also the problem of punctuation and any other nonalpha characters, which must be stripped off to generate proper words. One solution to this problem is to use the Standard C library functions isalpha( ) and isspace( ) to extract only the characters you want. You can replace all unwanted characters with spaces so that you can easily extract valid words from each line you read:

//: C07:WordList.cpp
// Display a list of words used in a document.
#include <algorithm>
#include <cctype>
#include <cstring>
#include <fstream>
#include <iostream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include "../require.h"
using namespace std;
 
char replaceJunk(char c) {
// Only keep alphas, space (as a delimiter), and '
return (isalpha(c) || c == '\'') ? c : ' ';
}
 
int main(int argc, char* argv[]) {
char* fname = "WordList.cpp";
if(argc > 1) fname = argv[1];
ifstream in(fname);
assure(in, fname);
set<string> wordlist;
string line;
while(getline(in, line)) {
transform(line.begin(), line.end(), line.begin(),
replaceJunk);
istringstream is(line);
string word;
while(is >> word)
wordlist.insert(word);
}
// Output results:
copy(wordlist.begin(), wordlist.end(),
ostream_iterator<string>(cout, "\n"));
} ///:~
 

The call to transform( ) replaces each character to be ignored with a space. The set container not only ignores duplicate words, but compares the words it keeps according to the function object less<string> (the default second template argument for the set container), which in turn uses string::operator<( ), so the words emerge in alphabetical order.

You don t need to use a set just to get a sorted sequence. You can use the sort( ) function (along with a multitude of other functions in the STL) on different STL containers. However, it s likely that set will be faster here. Using a set is particularly handy when you just want to do lookup, since its find( ) member function has logarithmic complexity and so is much faster than the generic find( ) algorithm. As you recall, the generic find( ) algorithm needs to traverse the whole range until it finds the search element (resulting in a worst-case complexity of N, and an average complexity of N/2). However, if you have a sequence container that is already sorted, use equal_range( ) for logarithmic complexity when finding elements.

The following version shows how to build the list of words with an istreambuf_iterator that moves the characters from one place (the input stream) to another (a string object), depending on whether the Standard C library function isalpha( ) returns true:

//: C07:WordList2.cpp
// Illustrates istreambuf_iterator and insert iterators.
#include <cstring>
#include <fstream>
#include <iostream>
#include <iterator>
#include <set>
#include <string>
#include "../require.h"
using namespace std;
 
int main(int argc, char* argv[]) {
char* fname = "WordList2.cpp";
if(argc > 1) fname = argv[1];
ifstream in(fname);
assure(in, fname);
istreambuf_iterator<char> p(in), end;
set<string> wordlist;
while(p != end) {
string word;
insert_iterator<string> ii(word, word.begin());
// Find the first alpha character:
while(p != end && !isalpha(*p))
++p;
// Copy until the first non-alpha character:
while(p != end && isalpha(*p))
*ii++ = *p++;
if(word.size() != 0)
wordlist.insert(word);
}
// Output results:
copy(wordlist.begin(), wordlist.end(),
ostream_iterator<string>(cout, "\n"));
} ///:~
 

This example was suggested by Nathan Myers, who invented the istreambuf_iterator and its relatives. This iterator extracts information character by character from a stream. Although the istreambuf_iterator template argument might imply that you could extract, for example, ints instead of char, that s not the case. The argument must be of some character type a regular char or a wide character.

After the file is open, an istreambuf_iterator called p is attached to the istream so characters can be extracted from it. The set<string> called wordlist will hold the resulting words.

The while loop reads words until it finds the end of the input stream. This is detected using the default constructor for istreambuf_iterator, which produces the past-the-end iterator object end. Thus, if you want to test to make sure you re not at the end of the stream, you simply say p != end.

The second type of iterator that s used here is the insert_iterator, which you saw previously. This inserts objects into a container. Here, the container is the string called word, which, for the purposes of insert_iterator, behaves like a container. The constructor for insert_iterator requires the container and an iterator indicating where it should start inserting the characters. You could also use a back_insert_iterator, which requires that the container have a push_back( ) (string does).

After the while loop sets everything up, it begins by looking for the first alpha character, incrementing start until that character is found. It then copies characters from one iterator to the other, stopping when a nonalpha character is found. Each word, assuming it is nonempty, is added to wordlist.

Thinking in C++ Vol 2 - Practical Programming
Prev Home Next

 
 
   Reproduced courtesy of Bruce Eckel, MindView, Inc. Design by Interspire