Version Control with Subversion - Chapter 5. Repository Administration - FSFS
In mid-2004, a second type of repository storage system
came into being: one that doesn't use a database at all.
An FSFS repository stores a revision tree in a single file,
and so all of a repository's revisions can be found in a
single subdirectory full of numbered files. Transactions
are created in separate subdirectories. When complete, a
single transaction file is created and moved to the
revisions directory, thus guaranteeing that commits are
atomic. Because a revision file is permanent and
unchanging, the repository can also be backed up while
“hot”, just like a Berkeley DB repository.
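The stage-then-move pattern described above can be illustrated with a minimal Python sketch. It is not the FSFS implementation: the function name commit_revision and the directory names "transactions" and "revs" are hypothetical, chosen only for illustration. The sketch relies on the fact that renaming a file within one filesystem is atomic, so a reader sees either no revision file or a complete one.

```python
import os
import tempfile


def commit_revision(repo_dir, rev_number, payload):
    """Stage payload in a scratch area, then publish it with one rename.

    Illustrative sketch only; the directory layout is an assumption,
    not the real on-disk format of an FSFS repository.
    """
    txn_dir = os.path.join(repo_dir, "transactions")  # hypothetical name
    revs_dir = os.path.join(repo_dir, "revs")          # hypothetical name
    os.makedirs(txn_dir, exist_ok=True)
    os.makedirs(revs_dir, exist_ok=True)

    # Write the data where readers of the revisions directory never look.
    fd, staged_path = tempfile.mkstemp(dir=txn_dir)
    with os.fdopen(fd, "wb") as staged:
        staged.write(payload)
        staged.flush()
        os.fsync(staged.fileno())  # make sure the bytes are on disk first

    final_path = os.path.join(revs_dir, str(rev_number))
    if os.path.exists(final_path):
        # Revision files are permanent, so never overwrite one.
        # (A real system would need locking to close the race here.)
        os.remove(staged_path)
        raise FileExistsError(final_path)

    # The rename is the commit point: it either happens fully or not at all.
    os.replace(staged_path, final_path)
    return final_path
```

If the process dies before the final rename, the revisions directory is untouched and only a stray staged file remains, which matches the behaviour the text describes for interrupted commits.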
The revision-file format represents a revision's
directory structure, file contents, and deltas against files
in other revision trees. Unlike a Berkeley DB database,
this storage format is portable across different operating
systems and isn't sensitive to CPU architecture. Because
FSFS uses no journaling or shared-memory files, the
repository can be safely accessed over a network filesystem
and examined in a read-only environment. The lack of
database overhead also means that the overall repository
size is a bit smaller.
FSFS has different performance characteristics too.
When committing a directory with a huge number of files, FSFS
uses an O(N) algorithm to append entries, while Berkeley DB
uses an O(N^2) algorithm to rewrite the whole directory. On
the other hand, FSFS writes the latest version of a file as
a delta against an earlier version, which means that
checking out the latest tree is a bit slower than fetching
the fulltexts stored in a Berkeley DB HEAD revision. FSFS
also has a longer delay when finalizing a commit, which
could in extreme cases cause clients to time out when
waiting for a response.
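To see why the difference between O(N) and O(N^2) matters for large directories, consider a simple counting model. It assumes entries are added one at a time, that an append-style store writes one entry per addition, and that a rewrite-style store writes the whole directory listing on every addition. These are simplifying assumptions for illustration, not a description of either back end's actual code, and the function names are hypothetical.

```python
def entries_written_by_append(n):
    """Total entries written when each addition appends a single entry."""
    total = 0
    for _ in range(n):
        total += 1  # only the new entry is written
    return total


def entries_written_by_rewrite(n):
    """Total entries written when each addition rewrites the whole listing."""
    total = 0
    for size in range(1, n + 1):
        total += size  # the listing holds `size` entries after this addition
    return total
```

Under this model, adding 1,000 entries costs 1,000 writes with appending and 500,500 writes with rewriting, since 1 + 2 + ... + N = N(N + 1)/2. The gap widens quadratically as the directory grows.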
The most important distinction, however, is FSFS's
inability to be “wedged” when something goes
wrong. If a process using a Berkeley DB database runs into
a permissions problem or suddenly crashes, the database is
left unusable until an administrator recovers it. If the
same scenarios happen to a process using an FSFS repository,
the repository isn't affected at all. At worst, some
transaction data is left behind.
The only real argument against FSFS is its relative
immaturity compared to Berkeley DB. It hasn't been used or
stress-tested nearly as much, and so many of these
assertions about speed and scalability are just that:
assertions, based on good guesses. In theory, it promises a
lower barrier to entry for new administrators and is less
susceptible to problems. In practice, only time will tell.