Version Control with Subversion - Repository Maintenance - Managing Disk Space
Managing Disk Space
While the cost of storage has dropped incredibly in the
past few years, disk usage is still a valid concern for
administrators seeking to version large amounts of data.
Every additional byte consumed by the live repository is a
byte that needs to be backed up offsite, perhaps multiple
times as part of rotating backup schedules. Because the
primary storage mechanism of a Berkeley DB repository is a
complex database system, it is useful to know what pieces of
data need to remain on the live site, which need to be
backed up, and which can be safely removed. This section is
specific to Berkeley DB; FSFS repositories have no extra
data to be cleaned up or reclaimed.
Until recently, the largest consumer of disk space
in Subversion repositories was the set of log files to
which Berkeley DB performs its pre-writes before modifying
the actual database files. These files capture all the
actions taken along the route of changing the database from
one state to another: while the database files reflect
some particular state at any given time, the log files
contain all the changes along the way between states. As
such, they can accumulate quite rapidly.
Fortunately, beginning with the 4.2 release of Berkeley
DB, the database environment has the ability to remove its
own unused log files without any external procedures. Any
repository created using an svnadmin that is compiled
against Berkeley DB version 4.2 or greater will be
configured for this automatic log file removal. If you
don't want this feature enabled, simply pass the
--bdb-log-keep option to the svnadmin create command. If
you forget to do this, or change your mind at a later time,
simply edit the DB_CONFIG file found in your repository's
db directory, comment out the line that contains the
set_flags DB_LOG_AUTOREMOVE directive, and then run
svnadmin recover on your repository to force the
configuration changes to take effect. See the section
called “Berkeley DB Configuration” for more information
about database configuration.
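The manual edit described above could be scripted roughly as follows. This is a minimal sketch, assuming a POSIX shell and sed; disable_log_autoremove is a hypothetical helper name, not something Subversion ships, and the .bak copy is an added precaution rather than part of the documented procedure.

```shell
# Hypothetical helper: comment out the DB_LOG_AUTOREMOVE directive in
# the DB_CONFIG file of the repository given as the first argument.
disable_log_autoremove() {
    repos=$1
    config="$repos/db/DB_CONFIG"

    # Keep a copy of the original file so the change is easy to undo.
    cp "$config" "$config.bak" || return 1

    # Prefix the directive with "# ". Lines that are already commented
    # out do not match, because the pattern is anchored at line start.
    sed 's/^\([[:space:]]*set_flags[[:space:]][[:space:]]*DB_LOG_AUTOREMOVE\)/# \1/' \
        "$config.bak" > "$config"
}

# Usage (the second step needs svnadmin on your PATH):
#   disable_log_autoremove /path/to/repos
#   svnadmin recover /path/to/repos
```

The helper only edits the file; running svnadmin recover afterwards is still required for the change to take effect.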
Without some sort of automatic log file removal in
place, log files will accumulate as you use your repository.
This is actually something of a feature of the database
system: you should be able to recreate your entire
database using nothing but the log files, so these files can
be useful for catastrophic database recovery. But
typically, you'll want to archive the log files that are no
longer in use by Berkeley DB, and then remove them from disk
to conserve space. Use the svnadmin list-unused-dblogs
command to list the unused log files:
$ svnadmin list-unused-dblogs /path/to/repos
/path/to/repos/log.0000000031
/path/to/repos/log.0000000032
/path/to/repos/log.0000000033
$ svnadmin list-unused-dblogs /path/to/repos | xargs rm
## disk space reclaimed!
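The one-liner above deletes the unused logs outright. If you would rather archive them first, something like the following sketch may help. It assumes a tar that accepts the -T option (GNU tar and bsdtar both do) and log paths that contain no newlines; archive_logs and the /backups path in the usage comment are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: read log file paths on standard input, pack them
# into the tar archive named by the first argument, and delete the
# originals only if the archive was written successfully.
archive_logs() {
    archive=$1
    list=$(mktemp) || return 1

    # Save the incoming list of paths (one per line) so it can be
    # used twice: once for tar, once for the removal loop.
    cat > "$list"

    rc=0
    if [ -s "$list" ]; then
        if tar -cf "$archive" -T "$list"; then
            while IFS= read -r logfile; do
                rm -f -- "$logfile"
            done < "$list"
        else
            rc=1
        fi
    fi

    rm -f "$list"
    return $rc
}

# Usage (needs svnadmin on your PATH):
#   svnadmin list-unused-dblogs /path/to/repos | archive_logs /backups/repos-logs.tar
```

Deleting only after tar reports success means a failed archive run leaves the log files in place.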
To keep the size of the repository as small as possible,
Subversion uses deltification (or,
“deltified storage”) within the repository
itself. Deltification involves encoding the representation
of a chunk of data as a collection of differences against
some other chunk of data. If the two pieces of data are
very similar, this deltification results in storage savings
for the deltified chunk—rather than taking up space
equal to the size of the original data, it only takes up
enough space to say, “I look just like this other
piece of data over here, except for the following couple of
changes”. Specifically, each time a new version of a
file is committed to the repository, Subversion encodes the
previous version (actually, several previous versions) as a
delta against the new version. The result is that most of
the repository data that tends to be sizable—namely,
the contents of versioned files—is stored at a much
smaller size than the original “fulltext”
representation of that data.
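The idea can be illustrated with ordinary command-line tools. The sketch below is only an analogy: Subversion uses its own binary delta format internally, not diff, and the file names and contents here are made up for the demonstration. It keeps the new revision as fulltext and expresses the old revision as a set of differences against it, then compares sizes.

```shell
# Illustration only: diff stands in for Subversion's internal delta format.
work=$(mktemp -d)

# The "old" revision: 200 short lines of text.
awk 'BEGIN { for (i = 1; i <= 200; i++) print "line " i }' > "$work/old.txt"

# The "new" revision: identical except for one edited line.
sed '100s/.*/line one hundred, edited/' "$work/old.txt" > "$work/new.txt"

# Describe the old revision as a delta against the new one.
# diff exits with status 1 when the files differ, which is expected here.
diff "$work/new.txt" "$work/old.txt" > "$work/old.delta" || true

fulltext_size=$(wc -c < "$work/old.txt")
delta_size=$(wc -c < "$work/old.delta")

echo "old revision as fulltext: $fulltext_size bytes"
echo "old revision as delta:    $delta_size bytes"
```

Because the two revisions differ in a single line, the delta is a small fraction of the fulltext size; revisions with little in common would see much smaller savings.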
Note
Because all of the Subversion repository data that is
subject to deltification is stored in a single Berkeley DB
database file, reducing the size of the stored values will
not necessarily reduce the size of the database file
itself. Berkeley DB will, however, keep internal records
of unused areas of the database file, and use those areas
first before growing the size of the database file. So
while deltification doesn't produce immediate space
savings, it can drastically slow future growth of the
database.