The locking methods suggested in the first and second editions of the book
Programming Perl (O'Reilly) and in the DB_File manpage (before Version 1.72;
fixed in 1.73) are flawed. If you use them in an environment where more than
one process can modify the DBM file, the file can be corrupted. The
following explains why this happens.
You cannot use a tied file's file handle for locking, because you get the
file handle only after the file has been tied, and by then it is too late
to lock. The problem is that the database file is locked after it is
opened. When the database is opened, the first 4 KB (for the Berkeley
DB library, at least) are read and then cached in memory. Therefore,
a process can open the database file, cache the first 4 KB, and then
block while another process writes to the file. If the second process
modifies the first 4 KB of the file, then when the original process finally
gets the lock it has an inconsistent view of the database. If it
writes using this view, it may easily corrupt the database on disk.
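To make the ordering problem concrete, here is a minimal sketch of the tie-then-lock sequence described above. It is an illustration of the flaw, not a recommendation, and the exact code in the sources mentioned earlier may differ. The name flawed_update and its parameters are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use DB_File;

# FLAWED: shown only to illustrate the race; do not use this ordering
# when more than one process can write. flawed_update() is a
# hypothetical name chosen for this sketch.
sub flawed_update {
    my ($dbm_path, $key, $value) = @_;

    # 1. Tying opens the database, which already reads and caches
    #    the beginning of the file.
    my $db = tie my %dbm, 'DB_File', $dbm_path, O_RDWR|O_CREAT, 0640, $DB_HASH
        or die "Cannot tie $dbm_path: $!";

    # 2. Only now is there a file descriptor to build a lockable handle from.
    my $fd = $db->fd;
    open my $db_fh, "+<&=$fd" or die "Cannot reuse fd $fd: $!";

    # 3. If this call blocks while another process writes, the data
    #    cached in step 1 is stale by the time the lock is granted.
    flock $db_fh, LOCK_EX or die "Cannot lock: $!";

    $dbm{$key} = $value;    # may be written through a stale view
    $db->sync;

    flock $db_fh, LOCK_UN;
    undef $db;
    untie %dbm;
    close $db_fh;
    return 1;
}
```

With a single process this runs without incident, which is exactly why the problem is easy to miss.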
This problem can be difficult to trace because it does not cause
corruption every time a process has to wait for a lock. One can do
quite a bit of writing to a database file without actually changing
the first 4 KB. But once you suspect this problem, you can easily
reproduce it by making your program modify the records in the first 4
KB of the DBM file.
It's better to use the standard locking modules
than to try to invent your own.
If your DBM file is used only in the read-only mode, generally there
is no need for locking at all. If you access the DBM file in
read/write mode, the safest method is to tie the DBM file after
acquiring an external lock and untie it before the lock is released.
So to access the file in shared mode (LOCK_SH[47]), acquire the shared
lock, tie the file, read from it, untie it, and only then release the lock.
[47] The LOCK_* constants are defined in the Fcntl module: LOCK_SH for
shared, LOCK_EX for exclusive, and LOCK_UN for unlock.
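The shared-mode sequence can be sketched in Perl roughly as follows. This is a minimal sketch under some assumptions: the helper name read_value is hypothetical, the external lock is taken on a separate file named by appending ".lock" to the DBM path (a naming choice made here, not one the text prescribes), and the DBM file is assumed to exist already.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDONLY);
use DB_File;

# Hypothetical helper: read one key while holding a shared external lock.
sub read_value {
    my ($dbm_path, $key) = @_;

    # Lock a separate file, never a handle derived from the tied DBM file.
    # "$dbm_path.lock" is an assumed naming convention.
    open my $lock_fh, '+>>', "$dbm_path.lock"
        or die "Cannot open lock file: $!";
    flock $lock_fh, LOCK_SH
        or die "Cannot get shared lock: $!";    # start of critical section

    # Tie only after the lock is held, so nothing is cached too early.
    tie my %dbm, 'DB_File', $dbm_path, O_RDONLY, 0640, $DB_HASH
        or die "Cannot tie $dbm_path: $!";
    my $value = $dbm{$key};
    untie %dbm;                                 # untie before unlocking

    flock $lock_fh, LOCK_UN;                    # end of critical section
    close $lock_fh;
    return $value;
}
```

Using a dedicated lock file keeps the lock independent of the moment the database library opens and caches the DBM file.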
You might want to save a few tie( )/untie( ) calls if the same request
accesses the DBM file more than once. Be careful, though. Based on
the caching effect explained above, a process can perform an atomic
downgrade of an exclusive lock to a shared one without retying the
file, because it has the updated data in its cache. By atomic, we mean
that the lock status is guaranteed to change
without any other process getting exclusive access in between.
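A write followed by a downgrade might look like the sketch below. The helper name update_then_read and the ".lock" file naming are assumptions made for illustration. Note also that whether converting an flock lock is truly atomic depends on the operating system; some flock implementations release the old lock before acquiring the new one, so treat the atomicity as an assumption to verify on your platform.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use DB_File;

# Hypothetical helper: write under an exclusive lock, then keep reading
# under a shared lock without retying the file.
sub update_then_read {
    my ($dbm_path, $key, $value) = @_;

    open my $lock_fh, '+>>', "$dbm_path.lock"   # assumed lock-file name
        or die "Cannot open lock file: $!";
    flock $lock_fh, LOCK_EX
        or die "Cannot get exclusive lock: $!";

    my $db = tie my %dbm, 'DB_File', $dbm_path, O_RDWR|O_CREAT, 0640, $DB_HASH
        or die "Cannot tie $dbm_path: $!";

    $dbm{$key} = $value;
    $db->sync;      # flush changes to disk before readers are let in

    # Downgrade: other readers may proceed; this process still holds
    # current data in its cache. Atomicity is assumed, see the note above.
    flock $lock_fh, LOCK_SH
        or die "Cannot downgrade lock: $!";
    my $seen = $dbm{$key};

    undef $db;      # drop the object reference before untying
    untie %dbm;
    flock $lock_fh, LOCK_UN;
    close $lock_fh;
    return $seen;
}
```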
If you can ensure that one process safely upgrades a shared lock to
an exclusive lock, you can save the overhead of doing the extra
tie( ) and untie( ). But this
operation might lead to a deadlock if two processes try to upgrade
from shared to exclusive locks at the same time. Remember that in
order to acquire an exclusive lock, all other processes need to
release all locks. If your OS's
locking implementation resolves this deadlock by denying one of the
upgrade requests, make sure your program handles that appropriately.
The process that was denied has to untie the DBM file and then ask
for an exclusive lock.
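One way to handle a denied upgrade is to request it in non-blocking mode and fall back to the full untie, relock, retie cycle when it fails. The following is a sketch only: increment_counter and the ".lock" naming are hypothetical, the DBM file is assumed to exist, and the code assumes that a successful non-blocking upgrade means no other process wrote in between, which you should confirm for your OS.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR);
use DB_File;

# Hypothetical helper: read a counter under a shared lock, then try to
# upgrade to an exclusive lock in order to increment it.
sub increment_counter {
    my ($dbm_path, $key) = @_;

    open my $lock_fh, '+>>', "$dbm_path.lock"   # assumed lock-file name
        or die "Cannot open lock file: $!";
    flock $lock_fh, LOCK_SH
        or die "Cannot get shared lock: $!";

    my $db = tie my %dbm, 'DB_File', $dbm_path, O_RDWR, 0640, $DB_HASH
        or die "Cannot tie $dbm_path: $!";
    my $current = $dbm{$key} || 0;

    # LOCK_NB makes the upgrade fail immediately instead of waiting,
    # so two upgrading processes cannot deadlock each other.
    unless (flock($lock_fh, LOCK_EX | LOCK_NB)) {
        # Denied: untie, release whatever lock is left, then wait for an
        # exclusive lock and retie so the cache is fresh.
        undef $db;
        untie %dbm;
        flock $lock_fh, LOCK_UN;
        flock $lock_fh, LOCK_EX
            or die "Cannot get exclusive lock: $!";
        $db = tie %dbm, 'DB_File', $dbm_path, O_RDWR, 0640, $DB_HASH
            or die "Cannot retie $dbm_path: $!";
        $current = $dbm{$key} || 0;     # re-read: it may have changed
    }

    $dbm{$key} = $current + 1;
    $db->sync;
    undef $db;
    untie %dbm;
    flock $lock_fh, LOCK_UN;
    close $lock_fh;
    return $current + 1;
}
```

The fallback path re-reads the value after retying, because another process may have modified the file while this one waited for the exclusive lock.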
A DBM file always has to be untied before the lock is released
(unless you do an atomic downgrade from exclusive to shared, as we
have just explained). Remember that if at any given moment a process
wants to lock and access the DBM file, it has to retie this file if
it was tied already. If this is not done, the integrity of the DBM
file is not ensured.
To conclude, the safest method of accessing a DBM file is to lock
the file before tying it, untie it before releasing the lock, and, in
the case of writing, call sync( ) before untying it.