The Art of Unix Programming


Unix Programming - Taxonomy of Unix IPC Methods - Pipes, Redirection, and Filters

Pipes, Redirection, and Filters

After Ken Thompson and Dennis Ritchie, the single most important formative figure of early Unix was probably Doug McIlroy. His invention of the pipe construct reverberated through the design of Unix, encouraging its nascent do-one-thing-well philosophy and inspiring most of the later forms of IPC in the Unix design (in particular, the socket abstraction used for networking).

Pipes depend on the convention that every program has initially available to it (at least) two I/O data streams: standard input and standard output (numeric file descriptors 0 and 1 respectively). Many programs can be written as filters, which read sequentially from standard input and write only to standard output.

Normally these streams are connected to the user's keyboard and display, respectively. But Unix shells universally support redirection operations which connect these standard input and output streams to files. Thus, typing

ls >foo

sends the output of the directory lister ls(1) to a file named ‘foo’. On the other hand, typing:

wc <foo

causes the word-count utility wc(1) to take its standard input from the file ‘foo’, and deliver a character/word/line count to standard output.

The pipe operation connects the standard output of one program to the standard input of another. A chain of programs connected in this way is called a pipeline. If we write

ls | wc

we'll see a character/word/line count for the current directory listing. (In this case, only the line count is really likely to be useful.)
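The three forms above can be tried side by side. The following is a minimal sketch, assuming a POSIX shell with mktemp(1) available; the scratch directory and the file names a, b, and c are made up for the illustration.

```shell
# Work in a throwaway directory so the listing is predictable.
# (The directory and the file names are illustrative only.)
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch a b c

# Output redirection: save the listing in a file outside the directory.
listing=$(mktemp) || exit 1
ls >"$listing"

# Input redirection: wc reads the saved listing on standard input.
# tr strips the padding some wc implementations put around the number.
via_file=$(wc -l <"$listing" | tr -d ' ')

# Pipe: the same count with no intermediate file.
via_pipe=$(ls | wc -l | tr -d ' ')

echo "via file: $via_file, via pipe: $via_pipe"

cd / && rm -rf "$dir" "$listing"
```

Both counts agree; the pipe simply removes the need to name and clean up the intermediate file.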

One favorite pipeline was “bc | speak”—a talking desk calculator. It knew number names up to a vigintillion.

-- Doug McIlroy

It's important to note that all the stages in a pipeline run concurrently. Each stage waits for input on the output of the previous one, but no stage has to exit before the next can run. This property will be important later on when we look at interactive uses of pipelines, like sending the lengthy output of a command to more(1).
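Concurrency is easy to observe with a producer that never finishes on its own. A minimal sketch, assuming the common yes(1) utility is installed (it is not specified by POSIX, but is present on essentially every Unix):

```shell
# yes(1) prints "y" forever. If the shell ran the stages one after
# another, this pipeline would never finish. Because head(1) runs at
# the same time, it reads three lines and exits, which ends the pipeline.
first_three=$(yes | head -n 3)
echo "$first_three"
```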

It's easy to underestimate the power of combining pipes and redirection. As an instructive example, The Unix Shell As a 4GL [Schaffer-Wolf] shows that with these facilities as a framework, a handful of simple utilities can be combined to support creating and manipulating relational databases expressed as simple textual tables.
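To give a flavor of the approach (this is a small illustration, not code from [Schaffer-Wolf]), the sketch below treats two tab-separated files as relations and joins them on their first column. The table contents and file names are sample data chosen for the example; sort(1), join(1), and cut(1) are standard POSIX utilities.

```shell
dir=$(mktemp -d) || exit 1
tab=$(printf '\t')

# Two "tables": key<TAB>value, one row per line (sample data).
printf '1\tken\n2\tdennis\n3\tdoug\n' >"$dir/people"
printf '3\tpipes\n1\tgrep\n2\tC\n' >"$dir/work"

# join(1) needs both inputs sorted on the join field, so sort the
# unsorted table and feed it to join on standard input ("-").
# cut(1) then projects the name and work columns.
result=$(sort -t "$tab" -k 1,1 "$dir/work" |
    join -t "$tab" "$dir/people" - |
    cut -f 2,3)

echo "$result"
rm -rf "$dir"
```

Each stage is an ordinary filter; the "database" is nothing more than text files and the pipeline that combines them.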

The major weakness of pipes is that they are unidirectional. It's not possible for a pipeline component to pass control information back up the pipe other than by terminating (in which case the previous stage will get a SIGPIPE signal on the next write). Accordingly, the protocol for passing data is simply the receiver's input format.
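The effect of a downstream stage terminating can be observed from the shell. A minimal sketch, assuming yes(1) and mktemp(1) are available; the scratch file is only there because a POSIX shell reports the status of the last stage alone:

```shell
# Record the exit status of the upstream stage in a scratch file.
status_file=$(mktemp) || exit 1

# head(1) exits after one line; the next write by yes(1) then fails.
{ yes; echo "$?" >"$status_file"; } | head -n 1 >/dev/null

# With the default SIGPIPE disposition most shells report 128 + 13 = 141.
# If SIGPIPE is being ignored, yes(1) sees a write error and exits
# with a nonzero status instead.
upstream_status=$(cat "$status_file")
echo "upstream stage exited with status $upstream_status"
rm -f "$status_file"
```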

So far, we have discussed anonymous pipes created by the shell. There is a variant called a named pipe which is a special kind of file. If two programs open the file, one for reading and the other for writing, a named pipe acts like a pipe-fitting between them. Named pipes are a bit of a historical relic; they have been largely displaced from use by named sockets, which we'll discuss below. (For more on the history of this relic, see the discussion of System V IPC below.)
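A minimal sketch of a named pipe in use, assuming mkfifo(1) and mktemp(1) are available; the FIFO name "fitting" is arbitrary:

```shell
dir=$(mktemp -d) || exit 1
fifo="$dir/fitting"
mkfifo "$fifo" || exit 1

# Writer: opening a FIFO blocks until a reader opens the other end,
# so the writer runs in the background.
echo "hello through the fitting" >"$fifo" &

# Reader: any program that can read a file can read from the FIFO.
IFS= read -r received <"$fifo"
wait
echo "$received"

rm -rf "$dir"
```

Unlike an anonymous pipe, the two ends here need not be started by the same shell command; any two processes that can open the file can use it.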

Pipelines have many uses. For one example, Unix's process lister ps(1) lists processes to standard output without caring that a long listing might scroll off the top of the user's display too quickly for the user to see it. Unix has another program, more(1), which displays its standard input in screen-sized chunks, prompting for a user keystroke after displaying each screenful.

Thus, if the user types “ps | more”, piping the output of ps(1) to the input of more(1), successive page-sized pieces of the list of processes will be displayed after each keystroke.

The ability to combine programs like this can be extremely useful. But the real win here is not cute combinations; it's that because both pipes and more(1) exist, other programs can be simpler. Pipes mean that programs like ls(1) (and other programs that write to standard output) don't have to grow their own pagers, and we're saved from a world of a thousand built-in pagers (each, naturally, with its own divergent look and feel). Code bloat is avoided and global complexity reduced.

As a bonus, if anyone needs to customize pager behavior, it can be done in one place, by changing one program. Indeed, multiple pagers can exist, and will all be useful with every application that writes to standard output.

In fact, this has actually happened. On modern Unixes, more(1) has been largely replaced by less(1), which adds the capability to scroll back in the displayed file rather than just forward.[70] Because less(1) is decoupled from the programs that use it, it's possible to simply alias ‘more’ to ‘less’ in your shell, set the environment variable PAGER to ‘less’ (see Chapter 10), and get all the benefits of a better pager with all properly-written Unix programs.
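Honoring the PAGER convention takes only a line of shell. A minimal sketch, in which the function name page is made up for the illustration:

```shell
# Send standard input to the user's preferred pager, falling back to
# more(1) when PAGER is unset or empty. $PAGER is left unquoted on
# purpose, so that a value such as "less -R" splits into a command
# and its options.
page() {
    ${PAGER:-more}
}

# Typical use: ps | page
```

A script written this way picks up whatever pager the user has chosen without knowing anything about it.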

Shell source code for the program pic2graph(1) ships with the groff suite of text-formatting tools from the Free Software Foundation. It translates diagrams written in the PIC language to bitmap images. Example 7.1 shows the pipeline at the heart of this code.

The pic2graph(1) implementation illustrates how much one pipeline can do purely by calling preexisting tools. It starts by massaging its input into an appropriate form, continues by feeding it through groff(1) to produce PostScript, and finishes by converting the PostScript to a bitmap. All these details are hidden from the user, who simply sees PIC source go in one end and a bitmap ready for inclusion in a Web page come out the other.

This is an interesting example because it illustrates how pipes and filtering can adapt programs to unexpected uses. The program that interprets PIC, pic(1), was originally designed only to be used for embedding diagrams in typeset documents. Most of the other programs in the toolchain it was part of are now semiobsolescent. But PIC remains handy for new uses, such as describing diagrams to be embedded in HTML. It gets a renewed lease on life because tools like pic2graph(1) can bundle together all the machinery needed to convert the output of pic(1) into a more modern format.

We'll examine pic(1) more closely, as a minilanguage design, in Chapter 8.

In Unix terms, fetchmail is an uncomfortably large program that bristles with options. Thinking about the way mail transport works, one might think it would be possible to decompose it into a pipeline. Suppose for a moment it were broken up into several programs: a couple of fetch programs to get mail from POP3 and IMAP sites, and a local SMTP injector. The pipeline could pass Unix mailbox format. The present elaborate fetchmail configuration could be replaced by a shell script containing command lines. One could even insert filters in the pipeline to block spam.

#!/bin/sh
imap [email protected] | spamblocker | smtp jrandom
imap [email protected] | smtp jrandom
# pop [email protected] | smtp jrandom

This would be very elegant and Unixy. Unfortunately, it can't work. We touched on the reason earlier; pipelines are unidirectional.

One of the things the fetcher program (imap or pop) would have to do is decide whether to send a delete request for each message it fetches. In fetchmail's present organization, it can delay sending that request to the POP or IMAP server until it knows that the local SMTP listener has accepted responsibility for the message. The pipelined, small-component version would lose that property.

Consider, for example, what would happen if the smtp injector fails because the SMTP listener reports a disk-full condition. If the fetcher has already deleted the mail, we lose. This means the fetcher cannot delete mail until it is notified to do so by the smtp injector. This in turn raises a host of questions. How would they communicate? What message, exactly, would the injector pass back? The global complexity of the resulting system, and its vulnerability to subtle bugs, would almost certainly be higher than that of a monolithic program.
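The failure mode can be reproduced with stand-ins. In the sketch below, fetch and inject are hypothetical shell functions, not real programs: fetch emits one message and then records a delete request in a log file, while inject consumes its input and fails the way a disk-full condition would make it fail.

```shell
log=$(mktemp) || exit 1

# Stand-in fetcher: pass the message downstream, then "delete" it on
# the server. Nothing flows back up the pipe, so it cannot wait to
# learn whether delivery succeeded.
fetch() {
    echo "Subject: important"
    echo "DELETE message 1" >>"$log"
}

# Stand-in injector: read the message, then report failure.
inject() {
    cat >/dev/null
    return 1
}

fetch | inject
inject_status=$?
deleted=$(cat "$log")

echo "injector status: $inject_status; server log: $deleted"
rm -f "$log"
```

The injector's failure status reaches the shell, but only after the fetcher has already issued its delete; the message is gone from both ends.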

Pipelines are a marvelous tool, but not a universal one.


Published under free license.