Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Mail Systems
Eclipse Documentation

How To Guides
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Problem Solutions
Privacy Policy




The Art of Unix Programming
Prev Home Next

Unix Programming - The Importance of Being Textual - Case Study: .newsrc Format

Case Study: .newsrc Format

Usenet news is a worldwide distributed bulletin-board system that anticipated today's P2P networking by two decades. It uses a message format very similar to that of RFC822 electronic-mail messages, except that instead of being directed to personal recipients messages are sent to topic groups. Articles posted at any participating site are broadcast to each site that it has registered as a neighbor, and eventually flood-fill to all news sites.

Almost all Usenet news readers understand the .newsrc file, which records which Usenet messages have been seen by the calling user. Though it is named like a run-control file, it is not only read at startup but typically updated at the end of the newsreader run. The .newsrc format has been fixed since the first newsreaders around 1980. Example5.2 is a representative section from a .newsrc file.

Each line sets properties for the newsgroup named in the first field. The name is immediately followed by a character that indicates whether the owning user is currently subscribed to the group or not; a colon indicates subscription, and an exclamation mark indicates nonsubscription. The remainder of the line is a sequence of comma-separated article numbers or ranges of article numbers, indicating which articles the user has seen.

Non-Unix programmers might have automatically tried to design a fast binary format in which each newsgroup status was described by either a long but fixed-length binary record, or a sequence of self-describing binary packets with internal length fields. The main point of such a binary representation would be to express ranges with binary data in paired word-length fields, in order to avoid the overhead of parsing all the range expressions at startup.

Such a layout could be read and written faster than a textual format, but it would have other problems. A nave implementation in fixed-length records would have placed artificial length limits on newsgroup names and (more seriously) on the maximum number of ranges of seen-article numbers. A more sophisticated binary-packet format would avoid the length limits, but could not be edited with the user's eyeballs and fingers — a capability that can be quite useful when you want to reset just some of the read bits in an individual newsgroup. Also, it would not necessarily be portable to different machine types.

The designers of the original newsreader chose transparency and interoperability over economy. The case for going in the other direction was not completely ridiculous; .newsrc files can get very large, and one modern reader (GNOME's Pan) uses a speed-optimized private format to avoid startup lag. But to other implementers, textual representation looked like a good tradeoff in 1980, and has looked better as machines increased in speed and storage dropped in price.

[an error occurred while processing this directive]
The Art of Unix Programming
Prev Home Next

  Published under free license. Design by Interspire