Version Control with Subversion - Chapter 5. Repository Administration

Version Control with Subversion
Prev	Home	Next

Version Control with Subversion - Chapter 5. Repository Administration - FSFS

FSFS

In mid-2004, a second type of repository storage system came into being: one which doesn't use a database at all. An FSFS repository stores a revision tree in a single file, and so all of a repository's revisions can be found in a single subdirectory full of numbered files. Transactions are created in separate subdirectories. When complete, a single transaction file is created and moved to the revisions directory, thus guaranteeing that commits are atomic. And because a revision file is permanent and unchanging, the repository also can be backed up while “hot”, just like a Berkeley DB repository.

The revision-file format represents a revision's directory structure, file contents, and deltas against files in other revision trees. Unlike a Berkeley DB database, this storage format is portable across different operating systems and isn't sensitive to CPU architecture. Because there's no journaling or shared-memory files being used, the repository can be safely accessed over a network filesystem and examined in a read-only environment. The lack of database overhead also means that the overall repository size is a bit smaller.

FSFS has different performance characteristics too. When committing a directory with a huge number of files, FSFS uses an O(N) algorithm to append entries, while Berkeley DB uses an O(N^2) algorithm to rewrite the whole directory. On the other hand, FSFS writes the latest version of a file as a delta against an earlier version, which means that checking out the latest tree is a bit slower than fetching the fulltexts stored in a Berkeley DB HEAD revision. FSFS also has a longer delay when finalizing a commit, which could in extreme cases cause clients to time out when waiting for a response.

The most important distinction, however, is FSFS's inability to be “wedged” when something goes wrong. If a process using a Berkeley DB database runs into a permissions problem or suddenly crashes, the database is left unusable until an administrator recovers it. If the same scenarios happen to a process using an FSFS repository, the repository isn't affected at all. At worst, some transaction data is left behind.

The only real argument against FSFS is its relative immaturity compared to Berkeley DB. It hasn't been used or stress-tested nearly as much, and so a lot of these assertions about speed and scalability are just that: assertions, based on good guesses. In theory, it promises a lower barrier to entry for new administrators and is less susceptible to problems. In practice, only time will tell.

^[13]This may sound really prestigious and lofty, but we're just talking about anyone who is interested in that mysterious realm beyond the working copy where everyone's data hangs out.

^[14]Pronounced “fuzz-fuzz”, if Jack Repenning has anything to say about it.