40.1 Available Data Synchronization Software
Data synchronization is no problem for computers that are
permanently linked by means of a fast network. In this case, use a
network file system, like NFS, and store the files on a server,
enabling all hosts to access the same data via the network. This
approach is impossible if the network connection is poor or not
permanent. When you are on the road with a laptop, copies of all
needed files must be on the local hard disk. However, it is then
necessary to synchronize modified files. When you modify a file on
one computer, make sure a copy of the file is updated on all other
computers. For occasional copies, this can be done manually with scp
or rsync. However, if many files are involved, the procedure can be
complicated and requires great care to avoid errors, such as
overwriting a new file with an old file.
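The risk of overwriting a new file with an old one can be reduced by comparing modification times before copying. The following is a minimal sketch, not a replacement for a synchronization tool. The function name copy_if_newer is hypothetical, and the comparison assumes that the time stamps of both files come from reasonably synchronized clocks.

```python
import os
import shutil


def copy_if_newer(source, target):
    """Copy source over target only if target is missing or older.

    Returns True if a copy was made, False if the target was kept.
    """
    if os.path.exists(target):
        # Keep the target when it is at least as recent as the source.
        if os.path.getmtime(target) >= os.path.getmtime(source):
            return False
    # copy2 also preserves the modification time of the source.
    shutil.copy2(source, target)
    return True
```

A caller would invoke copy_if_newer once per file and treat a return value of False as a sign that the target holds the more recent version.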
WARNING: Risk of Data Loss
Before you start managing your data with a synchronization
system, you should be well acquainted with the program used and test
its functionality. A backup is indispensable for important files.
Several programs automate the time-consuming and error-prone task
of synchronizing data manually, each using a different method. The
following summaries are merely intended to convey a general
understanding of how these programs work and how they can be used.
If you plan to use them, read the program documentation.
40.1.1 Unison
Unison is not a network file system. Instead, the files are
simply saved and edited locally. The program Unison can be executed
manually to synchronize files. When the synchronization is performed
for the first time, a database is created on the two hosts,
containing checksums, time stamps, and permissions of the selected
files. The next time it is executed, Unison can recognize which
files were changed and propose transmission from or to the other
host. Usually all suggestions can be accepted.
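The idea of recording the state of each file and comparing it on the next run can be sketched as follows. This is an illustration only and does not reproduce Unison's actual archive format or algorithm. The names snapshot and propose are hypothetical, and records are compared as a whole, so a changed time stamp alone counts as a change.

```python
import hashlib
import os
import stat


def snapshot(root):
    """Map each relative file path under root to (checksum, mtime, mode)."""
    records = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            rel = os.path.relpath(path, root)
            records[rel] = (digest, int(info.st_mtime), stat.S_IMODE(info.st_mode))
    return records


def propose(archive, local, remote):
    """Compare both replicas with the archived state and propose a direction.

    archive, local, and remote map paths to records; a missing path
    means the file does not exist in that state.
    """
    actions = {}
    for path in sorted(archive.keys() | local.keys() | remote.keys()):
        base, a, b = archive.get(path), local.get(path), remote.get(path)
        if a == b:
            continue  # both replicas already agree
        if a != base and b == base:
            actions[path] = "local -> remote"
        elif b != base and a == base:
            actions[path] = "remote -> local"
        else:
            actions[path] = "conflict"  # changed differently on both sides
    return actions
```

In this sketch, a file changed on only one side is proposed for transmission to the other side, while a file changed differently on both sides is left for the user to decide.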
40.1.2 CVS
CVS, which is mostly used for managing program source
versions, offers the possibility to keep copies of the files on
multiple computers. Accordingly, it is also suitable for data
synchronization. CVS maintains a central repository on the server in
which the files and changes to files are saved. Changes that are
performed locally are committed to the repository and can be
retrieved from other computers by means of an update. Both
procedures must be initiated by the user.
CVS is very resilient to errors when changes occur on several
computers. The changes are merged and, if changes took place in the
same lines, a conflict is reported. When a conflict occurs, the
repository remains in a consistent state. The conflict is visible
only on the client host, where it must be resolved.
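The rule that changes are merged unless they affect the same lines can be illustrated with a deliberately simplified sketch. It assumes that edits only replace lines in place, so all three versions have the same number of lines. Real merge tools also handle inserted and deleted lines. The function name merge_lines is hypothetical and is not part of CVS.

```python
def merge_lines(base, mine, theirs):
    """Merge two edited copies of base line by line.

    Returns (merged, conflicts), where conflicts lists the 1-based
    numbers of lines that were changed differently in both copies.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t or t == b:
            merged.append(m)  # same result on both sides, or only mine changed
        elif m == b:
            merged.append(t)  # only theirs changed
        else:
            merged.append(m)  # both changed differently: keep mine, flag it
            conflicts.append(number)
    return merged, conflicts
```

Here the conflicting line is kept in its local form and only reported, which mirrors the fact that the conflict has to be resolved in the working copy and not in the repository.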
40.1.3 Subversion
In contrast to CVS, which evolved over time, Subversion (SVN) is a
consistently designed project. Subversion was developed as a
technically improved successor to CVS.
Subversion improves on its predecessor in many respects. Due to
its history, CVS only maintains files and is
oblivious of directories. Directories also have a version history in
Subversion and can be copied and renamed just like files. It is also
possible to add metadata to every file and to every directory. This
metadata can be fully maintained with versioning. As opposed to CVS,
Subversion supports transparent network access over dedicated
protocols, like WebDAV (Web-based Distributed Authoring and
Versioning). WebDAV extends the functionality of the HTTP protocol
to allow collaborative write access to files on remote Web servers.
Subversion was largely assembled on the basis of existing
software packages. Therefore, when a repository is accessed over
HTTP, the Apache Web server and the WebDAV extension run in
conjunction with Subversion.
40.1.4 mailsync
Unlike the synchronization tools covered in the previous
sections, mailsync only synchronizes e-mails between mailboxes. The
procedure can be applied to local mailbox files as well as to
mailboxes on an IMAP server.
Based on the message ID contained in the e-mail header, the
individual messages are either synchronized or deleted.
Synchronization is possible between individual mailboxes and between
mailbox hierarchies.
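Deciding by message ID whether a message is copied or deleted can be sketched as follows. The sketch assumes that the set of message IDs seen during the previous run is remembered. The names message_ids and plan_sync and the parameter known are hypothetical and do not reflect the mailsync implementation or its configuration.

```python
from email import message_from_string


def message_ids(raw_messages):
    """Return the set of Message-ID header values found in raw messages."""
    ids = set()
    for raw in raw_messages:
        value = message_from_string(raw)["Message-ID"]
        if value:
            ids.add(value.strip())
    return ids


def plan_sync(ids_a, ids_b, known):
    """Decide per message ID what to do with mailboxes A and B.

    known holds the IDs present in both mailboxes after the last run.
    A known ID missing on one side was deleted there; an unknown ID
    missing on one side is new and has to be copied.
    """
    plan = {"copy_to_a": [], "copy_to_b": [], "delete_from_a": [], "delete_from_b": []}
    for mid in sorted(ids_a | ids_b):
        in_a, in_b = mid in ids_a, mid in ids_b
        if in_a and in_b:
            continue  # already present on both sides
        if mid in known:
            plan["delete_from_a" if in_a else "delete_from_b"].append(mid)
        else:
            plan["copy_to_b" if in_a else "copy_to_a"].append(mid)
    return plan
```

Because only the message ID is compared, two messages with the same ID are treated as identical regardless of their content.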
40.1.5 rsync
When no version control is needed but large directory
structures need to be synchronized over slow network connections,
the tool rsync offers well-developed mechanisms for transmitting
only changes within files. This applies to binary files as well as
text files. To detect the differences between files, rsync
subdivides the files into blocks and computes checksums over them.
Detecting the changes comes at a price. The systems to be
synchronized need generous resources to run rsync. RAM is
especially important.
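The principle of dividing files into blocks and comparing checksums can be sketched as follows. This is a strong simplification: it compares blocks at fixed offsets only, whereas rsync itself uses a rolling checksum to find matching blocks at arbitrary offsets. The function names and the block size of 4 bytes are chosen purely for illustration.

```python
import hashlib


def block_checksums(data, block_size=4):
    """Split data into fixed-size blocks and return one checksum per block."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=4):
    """Return the indexes of blocks in new that differ from old or are new."""
    old_sums = block_checksums(old, block_size)
    new_sums = block_checksums(new, block_size)
    changed = []
    for index, digest in enumerate(new_sums):
        if index >= len(old_sums) or old_sums[index] != digest:
            changed.append(index)
    return changed
```

Only the blocks reported by changed_blocks would have to be transmitted. The checksums for every block must be computed and held while the files are compared, which gives an impression of why the procedure is demanding on resources.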