Database locking is required if more than one process will try to modify the data. In an environment with both reading and writing processes, the reading processes should use locking as well. Otherwise another process may modify the resource while it is being read, and the reading process can get corrupted data.
We distinguish between shared-access and exclusive-access locks. Before an operation on the DBM file is performed, an exclusive lock request is issued if read/write access is required. Otherwise, a shared lock request is issued.
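Here is a minimal Perl sketch of this policy. It assumes locks are taken with the core flock() function on a separate lock file that sits next to the DBM file. That is one common convention, not necessarily what any particular module does. The helper name acquire_lock() and the 'read'/'write' mode strings are hypothetical:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: open a companion lock file (not the DBM file
# itself) and take a shared lock for read-only access or an exclusive
# lock for read/write access. Returns the lock filehandle, or undef if
# a non-blocking request could not be granted immediately.
sub acquire_lock {
    my ($lock_file, $mode, $nonblock) = @_;
    open my $fh, '>>', $lock_file or die "Cannot open $lock_file: $!";
    my $op = $mode eq 'write' ? LOCK_EX : LOCK_SH;   # exclusive only for writers
    $op |= LOCK_NB if $nonblock;                     # don't wait if not grantable now
    unless (flock($fh, $op)) {
        close $fh;
        return;
    }
    return $fh;    # the lock is held until $fh is closed
}

# Usage sketch (hypothetical path):
#   my $lock = acquire_lock('/path/to/db.lock', 'read');   # shared lock
#   ... read from the tied DBM hash ...
#   close $lock;                                          # release it
```

If the third argument is true, the request is non-blocking (LOCK_NB). The caller then gets undef back immediately instead of waiting. Keep in mind that flock() locks are advisory: they protect the DBM file only if every process that touches it goes through the same locking code.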
19.2.1. Deadlocks
First, let's make sure you understand how processes share the CPU. Each process gets a tiny CPU time slice before another process takes over. Operating systems commonly use a "round robin" technique to decide which processes should get CPU slices and when. This decision is based on a simple queue: each process that needs the CPU joins the end of the queue. Eventually the added process reaches the head of the queue and receives a tiny allotment of CPU time. The length of that allotment depends on the operating system's scheduler (think milliseconds or less). If the process is still not finished after this time slice, it moves to the end of the queue again. Figure 19-1 depicts this cycle. (Of course, this diagram is a simplified one. In reality, processes have different priorities, so one process may get more CPU time slices than others over the same period of time.)
Figure 19-1. CPU time allocation
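The queue discipline described above fits in a few lines of Perl. This toy simulation rests on simplifying assumptions: equal priorities, one slice per turn, and all processes queued at the start. The subroutine name round_robin() is hypothetical:

```perl
use strict;
use warnings;

# Toy round-robin scheduler: each process needs some number of time
# slices. The process at the head of the queue runs for one slice and,
# if it still has work left, rejoins the queue at the end.
sub round_robin {
    my (%slices_needed) = @_;              # process name => slices still needed
    my @queue = sort keys %slices_needed;  # sorted only to make the output deterministic
    my @timeline;                          # which process held the CPU in each slice
    while (@queue) {
        my $proc = shift @queue;           # head of the queue gets the CPU
        push @timeline, $proc;
        push @queue, $proc if --$slices_needed{$proc} > 0;   # unfinished: back to the tail
    }
    return @timeline;
}

print join(' ', round_robin(A => 3, B => 1, C => 2)), "\n";   # prints "A B C A C A"
```

Process B finishes after its first slice. A and C keep cycling back to the tail of the queue until their remaining work reaches zero.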
Now let's talk about a situation called deadlock. A deadlock is possible if two processes each try to acquire exclusive locks on the same two resources (databases) but in opposite order. Consider this example:
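# exclusive_lock() stands for any call that blocks until it gets an
# exclusive lock on the named resource.
sub lock_foo {
    exclusive_lock('DB1');    # first DB1...
    exclusive_lock('DB2');    # ...then DB2
}

sub lock_bar {
    exclusive_lock('DB2');    # first DB2...
    exclusive_lock('DB1');    # ...then DB1: the opposite order
}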
Suppose process A calls lock_foo() and process B calls lock_bar() at the same time. Process A locks resource DB1 and process B locks resource DB2. Now suppose process A needs to acquire a lock on DB2, and process B needs a lock on DB1. Neither of them can proceed, since each holds the resource needed by the other. This situation is called a deadlock.
Using the same CPU-sharing diagram shown in Figure 19-1, let's imagine that process A
gets an exclusive lock on DB1 at time slice 1 and
process B gets an exclusive lock on DB2 at time
slice 2. Then at time slice 4, process A gets the CPU back, but it
cannot do anything because it's waiting for the lock
on DB2 to be released. The same thing happens to
process B at time slice 5. From now on, the two processes will get
the CPU, try to get the lock, fail, and wait for the next chance
indefinitely.
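A standard way to avoid this kind of deadlock is lock ordering: every process acquires its locks in the same global order. This technique comes from general practice and is not prescribed by this chapter. The sketch below assumes each resource is guarded by its own lock file and uses string sort order. The names lock_order(), lock_in_order(), DB1.lock, and DB2.lock are all hypothetical:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Lock ordering: every caller acquires its exclusive locks in one global
# order (here, string sort order of the lock-file names), so no two
# processes can each hold a lock that the other is waiting for.
sub lock_order {
    my %seen;
    return sort { $a cmp $b } grep { !$seen{$_}++ } @_;   # dedupe, then sort
}

sub lock_in_order {
    my @handles;
    for my $file (lock_order(@_)) {
        open my $fh, '>>', $file or die "Cannot open $file: $!";
        flock($fh, LOCK_EX) or die "Cannot lock $file: $!";   # blocks until granted
        push @handles, $fh;
    }
    return @handles;    # closing these handles releases the locks
}

# Hypothetical lock files standing in for DB1 and DB2.
sub lock_foo { return lock_in_order('DB1.lock', 'DB2.lock') }
sub lock_bar { return lock_in_order('DB2.lock', 'DB1.lock') }   # same order once sorted
```

Both subroutines now take DB1.lock before DB2.lock. Whichever process gets DB1.lock first can go on to DB2.lock. The other process waits at DB1.lock without holding anything the first one needs.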
Deadlock wouldn't be a problem if lock_foo() and lock_bar() were atomic, which would mean that no other process could get access to the CPU before the whole subroutine was completed. But nothing guarantees this. All the running processes get access to the CPU only for a few milliseconds or even microseconds at a time (a time slice), and the scheduler can switch to another process between any two operations. So even a subroutine as short as lock_foo() can be interrupted halfway through.
For the same reason, this code shouldn't be relied
on:
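sub get_lock {
    # Wait while another process holds the lock file...
    sleep 1 while -e $lock_file;
    # ...but another process may create it right here, before we do.
    open my $lf, '>', $lock_file or die "Cannot create $lock_file: $!";
    return 1;
}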
The problem with this code is that the test and the action that follows it aren't atomic. Even if the -e test determines that the file doesn't exist, nothing prevents another process from creating the file between the -e test and the next operation that tries to create it. Two processes can therefore both see that the lock file is missing and both go on to create it, each believing that it holds the lock. Later we will see how this problem can be resolved.
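One common fix lets the kernel do the test and the creation in a single step: open the lock file with sysopen() and O_CREAT | O_EXCL. This sketch assumes a local filesystem, and it may or may not be the approach this chapter adopts later. The subroutine names try_lock(), get_lock(), and release_lock() are hypothetical:

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# With O_CREAT|O_EXCL the kernel checks for the file and creates it in a
# single system call, so only one process can succeed. Returns 1 if we
# now own the lock, 0 if another process already holds it.
sub try_lock {
    my ($lock_file) = @_;
    if (sysopen(my $fh, $lock_file, O_WRONLY | O_CREAT | O_EXCL)) {
        print {$fh} "$$\n";    # record our PID, handy for spotting stale locks
        close $fh;
        return 1;
    }
    return 0 if $!{EEXIST};    # the file already exists: the lock is taken
    die "Cannot create $lock_file: $!";
}

# Poll once a second until the atomic create succeeds.
sub get_lock {
    my ($lock_file) = @_;
    sleep 1 until try_lock($lock_file);
    return 1;
}

sub release_lock {
    my ($lock_file) = @_;
    unlink $lock_file or die "Cannot remove $lock_file: $!";
}
```

Unlike the get_lock() shown earlier, this version never lets two processes both conclude that they own the lock. It has one weakness: a process that dies without calling release_lock() leaves a stale lock file behind. flock()-based locks avoid that problem, because the kernel releases them when the process exits.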
19.2.2. Exclusive Locking Starvation
If a shared lock request is issued, it is granted immediately if the
file is not locked or has another shared lock on it. If the file has
an exclusive lock on it, the shared lock request is granted as soon
as that lock is removed. The lock status becomes
SHARED on success.
If an exclusive lock is requested, it is granted as soon as the file
becomes unlocked. The lock status becomes
EXCLUSIVE on success.
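The grant rules above can be summarized as a toy, in-memory model. It only illustrates the rules. A real lock manager, such as the kernel's flock() implementation, also queues waiting processes and wakes them up. The names try_acquire() and release() are hypothetical, and a return value of 0 means the caller has to wait and retry:

```perl
use strict;
use warnings;

# Toy model of the grant rules (not a real lock manager):
# $state->{readers} counts shared holders, and $state->{writer} is true
# while an exclusive lock is held.
sub try_acquire {
    my ($state, $mode) = @_;
    if ($mode eq 'SHARED') {
        return 0 if $state->{writer};      # wait until the exclusive lock is removed
        $state->{readers}++;
        return 1;                          # status becomes SHARED
    }
    if ($mode eq 'EXCLUSIVE') {
        return 0 if $state->{writer} || $state->{readers};   # must be fully unlocked
        $state->{writer} = 1;
        return 1;                          # status becomes EXCLUSIVE
    }
    die "Unknown lock mode '$mode'";
}

sub release {
    my ($state, $mode) = @_;
    if ($mode eq 'SHARED') { $state->{readers}-- }
    else                   { $state->{writer} = 0 }
}
```

For example, two SHARED requests in a row both succeed. An EXCLUSIVE request then returns 0 until both readers have called release().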
If the DB has a shared lock on it, a process that makes an exclusive lock request will poll until there are no reading or writing processes left. Many processes can read the file successfully at the same time, since they do not block each other. As long as a new reader acquires its shared lock before the previous one releases it, the file is never completely unlocked. A process that wants to write to the file needs an exclusive lock, so it may never get a chance to squeeze in.
Figure 19-2 represents a possible scenario in which
everybody can read but no one can write.
("pX" represents different
processes running at different times, all acquiring shared locks on
the DBM file.)
Figure 19-2. Overlapping shared locks prevent an exclusive lock
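The scenario in Figure 19-2 can be reproduced with a small tick-based simulation. The timing is made up for illustration. Each reader holds its shared lock for a fixed number of ticks. A single writer polls once per tick (starting at tick 1) for an exclusive lock, which it can get only when no reader is active. The name first_write_tick() is hypothetical:

```perl
use strict;
use warnings;

# Returns the first tick at which the polling writer could get an
# exclusive lock, or undef if it starved for the whole run.
# $hold is how many ticks each reader keeps its shared lock;
# @arrivals lists the ticks at which new readers show up.
sub first_write_tick {
    my ($ticks, $hold, @arrivals) = @_;
    my %arrives;
    $arrives{$_} = 1 for @arrivals;
    my @active;                                    # release ticks of current readers
    for my $t (0 .. $ticks) {
        @active = grep { $_ > $t } @active;        # readers done by tick $t release
        return $t if $t > 0 && !@active;           # no readers: writer gets EXCLUSIVE
        push @active, $t + $hold if $arrives{$t};  # a new reader gets a SHARED lock
    }
    return;                                        # writer never got in: starved
}
```

Suppose a reader arrives every tick and each holds its lock for two ticks. The reader periods always overlap, so the writer never gets in: first_write_tick(10, 2, 0 .. 10) returns undef. If only two readers arrive, at ticks 0 and 1, then first_write_tick(10, 2, 0, 1) returns 3, the first tick at which no reader holds a lock.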
The result is a starving writer process whose lock request eventually times out, so the DB is never updated. Ken Williams solved this problem with his Tie::DB_Lock module, discussed later in this chapter.
At the time of writing, there are several locking wrappers for DB_File on CPAN. Each one implements locking differently and has different goals in mind. It is worth knowing the differences between them, so that you can pick the right one for your application.