Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Programming
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Databases
Mail Systems
openSolaris
Eclipse Documentation
Techotopia.com
Virtuatopia.com
Answertopia.com

How To Guides
Virtualization
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Windows
Problem Solutions
Privacy Policy

  




 

 

The Art of Unix Programming
Prev Home Next


Unix Programming - Compactness and Orthogonality - Compactness and the Strong Single Center

Compactness and the Strong Single Center

One subtle but powerful way to promote compactness in a design is to organize it around a strong core algorithm addressing a clear formal definition of the problem, avoiding heuristics and fudging.

Formalization often clarifies a task spectacularly. It is not enough for a programmer to recognize that bits of his task fall within standard computer-science categories — a little depth-first search here and a quicksort there. The best results occur when the nub of the task can be formalized, and a clear model of the job at hand can be constructed. It is not necessary that ultimate users comprehend the model. The very existence of a unifying core will provide a comfortable feel, unencumbered with the why-in-hell-did-they-do-that moments that are so prevalent in using Swiss-army-knife programs.

-- Doug McIlroy

This is an often-overlooked strength of the Unix tradition. Many of its most effective tools are thin wrappers around a direct translation of some single powerful algorithm.

Perhaps the clearest example of this is diff(1), the Unix tool for reporting differences between related files. This tool and its dual, patch(1), have become central to the network-distributed development style of modern Unix. A valuable property of diff is that it seldom surprises anyone. It doesn't have special cases or painful edge conditions, because it uses a simple, mathematically sound method of sequence comparison. This has consequences:

By virtue of a mathematical model and a solid algorithm, Unix diff contrasts markedly with its imitators. First, the central engine is solid, small, and has never needed one line of maintenance. Second, the results are clear and consistent, unmarred by surprises where heuristics fail.

-- Doug McIlroy

Thus, people who use diff can develop an intuitive feel for what it will do in any given situation without necessarily understanding the central algorithm perfectly. Other well-known examples of this special kind of clarity achieved through a strong central algorithm abound in Unix:

All three of these programs are so bug-free that their correct functioning is taken utterly for granted, and compact enough to fit easily in a programmer's hand. Only a part of these good qualities are due to the polishing that comes with a long service life and frequent use; most of it is that, having been constructed around a strong and provably correct algorithmic core, they never needed much polishing in the first place.

The opposite of a formal approach is using heuristics—rules of thumb leading toward a solution that is probabilistically, but not certainly, correct. Sometimes we use heuristics because a deterministically correct solution is impossible. Think of spam filtering, for example; an algorithmically perfect spam filter would need a full solution to the problem of understanding natural language as a module. Other times, we use heuristics because known formally correct methods are impossibly expensive. Virtual-memory management is an example of this; there are near-perfect solutions, but they require so much runtime instrumentation that their overhead would swamp any theoretical gain over heuristics.

The trouble with heuristics is that they proliferate special cases and edge cases. If nothing else, you usually have to backstop a heuristic with some sort of recovery mechanism when it fails. All the usual problems with escalating complexity follow. To manage the resulting tradeoffs, you have to start by being aware of them. Always ask if a heuristic actually pays off in performance what it costs in code complexity — and don't guess at the performance difference, actually measure it before making a decision.


[an error occurred while processing this directive]
The Art of Unix Programming
Prev Home Next

 
 
  Published under free license. Design by Interspire