Version Control with Subversion

Version Control with Subversion - Chapter 5. Repository Administration - FSFS


In mid-2004, a second type of repository storage system came into being: one that doesn't use a database at all. An FSFS repository stores each revision tree in a single file, so all of a repository's revisions can be found in a single subdirectory full of numbered files. Transactions are created in separate subdirectories. When a transaction completes, a single file is created for it and moved into the revisions directory, which guarantees that commits are atomic. And because a revision file is permanent and unchanging, the repository can also be backed up while “hot”, just like a Berkeley DB repository.
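The write-then-move idea behind atomic commits can be sketched in a few lines of Python. This is only an illustration of the general technique, not FSFS's actual code: the `commit` function and the `transactions` and `revs` directory names are hypothetical, and the sketch assumes both directories live on the same filesystem so that the final rename is a single atomic step.

```python
import os


def commit(repo_dir, rev_number, data):
    """Write one revision file and publish it with a single rename.

    Hypothetical layout: <repo_dir>/transactions and <repo_dir>/revs.
    """
    txn_dir = os.path.join(repo_dir, "transactions")
    rev_dir = os.path.join(repo_dir, "revs")
    os.makedirs(txn_dir, exist_ok=True)
    os.makedirs(rev_dir, exist_ok=True)

    rev_path = os.path.join(rev_dir, str(rev_number))
    # Revision files are permanent, so refuse to overwrite one.
    # (This check is not race-free; a real implementation would hold a lock.)
    if os.path.exists(rev_path):
        raise FileExistsError(rev_path)

    # Build the revision in the transaction area, where readers never look.
    txn_path = os.path.join(txn_dir, f"{rev_number}.txn")
    with open(txn_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before publishing

    # Publish: the rename either happens completely or not at all, so a
    # reader sees either no revision file or a complete one.
    os.replace(txn_path, rev_path)
    return rev_path
```

If the process dies before the `os.replace` call, the only residue is a leftover file in the transaction area; the revisions directory never holds a partially written file.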

The revision-file format represents a revision's directory structure, file contents, and deltas against files in other revision trees. Unlike a Berkeley DB database, this storage format is portable across different operating systems and isn't sensitive to CPU architecture. Because no journaling or shared-memory files are used, the repository can be safely accessed over a network filesystem and examined in a read-only environment. The lack of database overhead also means that the overall repository size is a bit smaller.

FSFS has different performance characteristics too. When committing a directory with a huge number of files, FSFS uses an O(N) algorithm to append entries, while Berkeley DB uses an O(N^2) algorithm to rewrite the whole directory. On the other hand, FSFS writes the latest version of a file as a delta against an earlier version, which means that checking out the latest tree is a bit slower than fetching the fulltexts stored in a Berkeley DB HEAD revision. FSFS also has a longer delay when finalizing a commit, which could in extreme cases cause clients to time out when waiting for a response.
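To see why the difference between O(N) and O(N^2) matters for large directories, here is a toy cost model in Python. It does not measure either backend; it simply counts directory entries written under two assumed strategies, with the hypothetical function names `append_cost` and `rewrite_cost`: appending each new entry once, versus rewriting the whole directory every time an entry is added.

```python
def append_cost(n):
    """Entries written when each of n new entries is appended once: O(N)."""
    return sum(1 for _ in range(n))


def rewrite_cost(n):
    """Entries written when the whole directory is rewritten after each
    of n additions: 1 + 2 + ... + n = n(n + 1) / 2, which is O(N^2)."""
    return sum(k for k in range(1, n + 1))


for n in (10, 1000, 100000):
    print(n, append_cost(n), rewrite_cost(n))
```

Under this model, adding 1,000 files costs 1,000 entry writes with the append strategy and 500,500 with the rewrite strategy, and the gap widens quadratically as the directory grows.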

The most important distinction, however, is FSFS's inability to be “wedged” when something goes wrong. If a process using a Berkeley DB database runs into a permissions problem or suddenly crashes, the database is left unusable until an administrator recovers it. If the same scenarios happen to a process using an FSFS repository, the repository isn't affected at all. At worst, some transaction data is left behind.

The only real argument against FSFS is its relative immaturity compared to Berkeley DB. It hasn't been used or stress-tested nearly as much, and so a lot of these assertions about speed and scalability are just that: assertions, based on good guesses. In theory, it promises a lower barrier to entry for new administrators and is less susceptible to problems. In practice, only time will tell.

[13] This may sound really prestigious and lofty, but we're just talking about anyone who is interested in that mysterious realm beyond the working copy where everyone's data hangs out.

[14] Pronounced “fuzz-fuzz”, if Jack Repenning has anything to say about it.


Published under the terms of the Creative Commons License