Nowadays, millions of people surf the Internet. There are millions of
terabytes of data lying around, and many new techniques and
technologies have been invented to manipulate this data. One of these
inventions is the relational database, which makes it possible to
search and modify huge stores of data very quickly. The Structured
Query Language (SQL) is used to access and manipulate the contents of
these databases.
Let's say that you started your web services with a
simple, flat-file database. Then with time your data grew big, which
made the use of a flat-file database slow and inefficient. So you
switched to the next simple solution—using DBM files. But your
data set continued to grow, and even the DBM files
didn't provide a scalable enough solution. So you
finally decided to switch to the most advanced solution, a relational
database.
On the other hand, it's quite possible that you had
big ambitions in the first place and you decided to go with a
relational database right away.
We went through both scenarios, sometimes doing the minimum
development using DBM files (when we knew that the data set was small
and unlikely to grow big in the short term) and sometimes developing
full-blown systems with relational databases at the heart.
As we repeat many times in this book, none of our suggestions and
examples should be applied without thinking. But since
you're reading this chapter, the chances are that
you are doing the right thing, so we are going to concentrate on the
extra benefits that mod_perl provides when you use relational
databases. We'll also talk about related coding
techniques that will help you to improve the performance of your
service.
From now on, we assume that you use the
DBI module to talk to the databases. This in turn
uses the unique database driver module for your database, which
resides in the DBD:: namespace (for example,
DBD::Oracle for Oracle and
DBD::mysql for MySQL). If you stick to standard
SQL, you maximize portability from one database to another. Changing
to a new database server should simply be a matter of using a
different database driver. You do this just by changing the data set
name string ($dsn) in the
DBI->connect( ) call.
Rather than writing your queries in plain SQL, you should probably
use some other abstraction module on top of the
DBI module. This can help to make your code more
extensible and maintainable. Raw SQL coupled with
DBI usually gives you the best machine
performance, but sometimes time to market is what counts, so you have
to make your choices. An abstraction layer with a well-thought-out
API is a pleasure to work with, and future modifications to the code
will be less troublesome. Several DBI abstraction solutions are
available on CPAN. DBIx::Recordset,
Alzabo, and Class::DBI are just
a few such modules that you may want to try. Take a look at the other
modules in the DBIx:: category—many of them
provide some kind of wrapping and abstraction around
DBI.
20.1. Persistent Database Connections with Apache::DBI
When people first started to use the Web, they found that they
needed to write web interfaces to their databases, or add databases
to drive their web interfaces. Whichever way you look at it, they
needed to connect to the databases in order to use them.
CGI is the most widely used protocol for building such interfaces,
implemented in Apache's mod_cgi and its equivalents.
For working with databases, the main limitation of most
implementations, including mod_cgi,
is that they don't allow persistent connections to
the database. For every HTTP request, the CGI script has to connect
to the database, and when the request is completed the connection is
closed. Depending on the relational database that you use, the time
to instantiate a connection may be very fast (for example, MySQL) or
very slow (for example, Oracle). If your database provides a very
short connection latency, you may get away without having persistent
connections. But if not, it's possible that opening
a connection may consume a significant slice of the time to serve a
request. It may be that if you can cut this overhead you can greatly
improve the performance of your service.
Apache::DBI was written to solve this problem.
When you use it with mod_perl, you have a database connection that
persists for the entire life of a mod_perl process. This is possible
because with mod_perl, the child process does not quit when a request
has been served. When a mod_perl script needs to use a database,
Apache::DBI immediately provides a valid
connection (if it was already open) and your script starts doing the
real work right away without having to make a database connection
first.
Of course, the persistence doesn't help with any
latency problems you may encounter during the actual use of the
database connections. Oracle, for example, is notorious for
generating a network transaction for each row returned. This slows
things down if the query execution matches many rows.
The DBI module can make use of the
Apache::DBI module. When the
DBI module loads, it tests whether the environment
variable $ENV{MOD_PERL} is set and whether the
Apache::DBI module has already been loaded. If so,
the DBI module forwards every connect(
) request to the Apache::DBI
module.
When Apache::DBI gets a connect(
) request, it checks whether it already has a handle with
the same connect( ) arguments. If it finds one, it
checks that the connection is still valid using the ping(
) method. If this operation succeeds, the database handle
is returned immediately. If there is no appropriate database handle,
or if the ping( ) method fails,
Apache::DBI establishes a new connection, stores
the handle, and then returns the handle to the caller.
It is important to understand that the pool of connections is not
shared between the processes. Each process has its own pool of
connections.
When you start using
Apache::DBI, there is no need to delete all the
disconnect( )statements from your code. They
won't do anything, because the
Apache::DBI module overloads the
disconnect( ) method with an empty one. You
shouldn't modify your scripts at all for use with
Apache::DBI.
20.1.2. When to Use Apache::DBI (and When Not to Use It)
You will want to use the
Apache::DBI module only if you are opening just a
few database connections per process. If there are ten child
processes and each opens two different connections (using different
connect( ) arguments), in total there will be 20
opened and persistent connections.
This module must not be used if (for example)
you have many users, and a unique
connection (with unique connect( ) arguments) is
required for each user.[48] You cannot ensure that requests from one
user will be served by any particular process, and connections are
not shared between the child processes, so many child processes will
open a separate, persistent connection for each user. In the worst
case, if you have 100 users and 50 processes, you could end up with
5,000 persistent connections, which might be largely unused. Since
database servers have limitations on the maximum number of opened
connections, at some point new connections will not be permitted, and
eventually your service will become unavailable.
[48]That is, database user
connections. This doesn't mean that if many people
register as users on your web site you shouldn't use
Apache::DBI; it is only a very special
case.
If you want to use Apache::DBI but you have both
situations on one machine, at the time of writing the only solution
is to run two mod_perl-enabled servers, one that uses
Apache::DBI and one that does not.
In mod_perl 2.0, a threaded server can be used, and this situation is
much improved. Assuming that you have a single process with many
threads and each unique open connection is needed by only a single
thread, it's possible to have a pool of database
connections that are reused by different threads.
20.1.3. Configuring Apache::DBI
Apache::DBI will not work
unless
mod_perl was built with:
PERL_CHILD_INIT=1 PERL_STACKED_HANDLERS=1
or:
EVERYTHING=1
during the perl Makefile.PL ... stage.
After installing this module, configuration is simple—just add
a single directive to httpd.conf:
PerlModule Apache::DBI
Note that it is important to load this module before any other
Apache*DBI module and before the
DBI module itself. The best rule is just to load
it first of all. You can skip preloading DBI at
server startup, since Apache::DBI does that for
you, but there is no harm in leaving it in, as long as
Apache::DBI is loaded first.
20.1.4. Debugging Apache::DBI
If you are not sure whether this module is working as advertised
and that your connections are actually persistent, you should enable
debug mode in the startup.pl script, like this:
$Apache::DBI::DEBUG = 1;
Starting with Apache::DBI Version 0.84, the above
setting will produce only minimal output. For a full trace, you
should set:
$Apache::DBI::DEBUG = 2;
After setting the DEBUG level, you will see
entries in the error_log file. Here is a sample
of the output with a DEBUG level of 1:
12851 Apache::DBI new connect to
'test::localhostPrintError=1RaiseError=0AutoCommit=1'
12853 Apache::DBI new connect to
'test::localhostPrintError=1RaiseError=0AutoCommit=1'
When a connection is reused, Apache::DBIstays
silent, so you can see when a real connect( ) is
called. If you set the DEBUG level to 2,
you'll see a more verbose output. This output was
generated after two identical requests with a single server running:
12885 Apache::DBI need ping: yes
12885 Apache::DBI new connect to
'test::localhostPrintError=1RaiseError=0AutoCommit=1'
12885 Apache::DBI need ping: yes
12885 Apache::DBI already connected to
'test::localhostPrintError=1RaiseError=0AutoCommit=1'
You can see that process 12885 created a new connection on the first
request and on the next request reused it, since it was using the
same connect( ) argument. Moreover, you can see
that the connection was validated each time with the ping( )
method.
20.1.5. Caveats and Troubleshooting
This section covers some of the risks and things to keep in mind when
using Apache::DBI.
20.1.5.1. Database locking risks
When you use Apache::DBIor similar persistent
connections, be very careful about locking the database
(LOCK TABLE ...) or single rows. MySQL threads
keep tables locked until the thread ends (i.e., the connection is
closed) or until the tables are explicitly unlocked. If your session
dies while tables are locked, they will stay locked, as your
connection to the database won't be closed. In Chapter 6 we discussed how to terminate the program
cleanly if the session is aborted prematurely.
20.1.5.2. Transactions
A standard Perl script using DBI will
automatically perform a rollback whenever the script exits. In the
case of persistent database connections, the database handle will not
be destroyed and hence no automatic rollback will occur. At first
glance it even seems to be possible to handle a transaction over
multiple requests, but the temptation should be avoided because
different requests are handled by different mod_perl processes, and a
mod_perl process does not know the state of a specific transaction
that has been started by another mod_perl process.
In general, it is good practice to perform an explicit commit or
rollback at the end of every script. To avoid inconsistencies in the
database in case AutoCommit is
Off and the script terminates prematurely without
an explicit rollback, the Apache::DBI module uses
a PerlCleanupHandler to issue a rollback at the
end of every request.
20.1.5.3. Opening connections with different parameters
When Apache::DBI receives
a connection request, before it
decides to use an existing cached connection it insists that the new
connection be opened in exactly the same way as the cached
connection. If you have one script that sets
AutoCommit and one that does not,
Apache::DBI will make two different connections.
So, for example, if you have limited Apache to 40 servers at most,
instead of having a maximum of 40 open connections, you may end up
with 80.
These two connect( ) calls will create two
different connections:
my $dbh = DBI->connect
("DBI:mysql:test:localhost", '', '',
{
PrintError => 1, # warn( ) on errors
RaiseError => 0, # don't die on error
AutoCommit => 1, # commit executes immediately
}
) or die "Cannot connect to database: $DBI::errstr";
my $dbh = DBI->connect
("DBI:mysql:test:localhost", '', '',
{
PrintError => 1, # warn( ) on errors
RaiseError => 0, # don't die on error
AutoCommit => 0, # don't commit executes immediately
}
) or die "Cannot connect to database: $DBI::errstr";
Notice that the only difference is in the value of
AutoCommit.
However, you are free to modify the handle immediately after you get
it from the cache, so always initiate connections using the same
parameters and set AutoCommit (or whatever)
afterward. Let's rewrite the second
connect( ) call to do the right thing (i.e., not
to create a new connection):
my $dbh = DBI->connect
("DBI:mysql:test:localhost", '', '',
{
PrintError => 1, # warn( ) on errors
RaiseError => 0, # don't die on error
AutoCommit => 1, # commit executes immediately
}
) or die "Cannot connect to database: $DBI::errstr";
$dbh->{AutoCommit} = 0; # don't commit if not asked to
When you aren't sure whether you're
doing the right thing, turn on debug mode.
When the $dbh attribute is altered after
connect( ), it affects all other handlers
retrieving this database handle. Therefore, it's
best to restore the modified attributes to their original values at
the end of database handle usage. As of
Apache::DBI Version 0.88, the caller has to do
this manually. The simplest way to handle this is to localize the
attributes when modifying them:
my $dbh = DBI->connect(...) ...
{
local $dbh->{LongReadLen} = 40;
}
Here, the LongReadLen attribute overrides the
value set in the connect( ) call or its default
value only within the enclosing block.
The problem with this approach is that prior to Perl Version 5.8.0 it
causes memory leaks. So the only clean alternative for older Perl
versions is to manually restore
$dbh's values:
my @attrs = qw(LongReadLen PrintError);
my %orig = ( );
my $dbh = DBI->connect(...) ...
# store the values away
$orig{$_} = $dbh->{$_} for @attrs;
# do local modifications
$dbh->{LongReadLen} = 40;
$dbh->{PrintError} = 1;
# do something with the database handle
# ...
# now restore the values
$dbh->{$_} = $orig{$_} for @attrs;
Another thing to remember is that with some database servers
it's possible to access more than one database using
the same database connection. MySQL is one of those servers. It
allows you to use a fully qualified table specification notation. So
if there is a database foo with a table
test and a database bar
with its own table test, you can always use:
SELECT * FROM foo.test ...
or:
SELECT * FROM bar.test ...
No matter what database you have used in the database name string in
the connect( ) call (e.g.,
DBI:mysql:foo:localhost), you can still access
both tables by using a fully qualified syntax.
Alternatively, you can switch databases with USE
foo and USE bar, but this approach seems
less convenient, and therefore error-prone.
20.1.5.4. Cannot find the DBI handler
You must use DBI->connect( ) as in
normal
DBI usage to get your $dbh
database handle. Using Apache::DBI does not
eliminate the need to write proper DBI code. As
the Apache::DBI manpage states, you should program
as if you are not using Apache::DBI at all.
Apache::DBI will override the
DBI methods where necessary and return your cached
connection. Any disconnect( ) calls will just be
ignored.
20.1.5.5. The morning bug
The SQL server keeps a connection to the client open for a
limited period of time. In the early days of
Apache::DBI, everyone was bitten by the so-called
morning bug—every morning the first user
to use the site received a "No Data
Returned" message, but after that everything worked
fine.
The error was caused by Apache::DBI returning an
invalid connection handle (the server had closed it because of a
timeout), and the script was dying on that error. The ping(
) method was introduced to solve this problem, but it
didn't work properly until
Apache::DBI Version 0.82 was released. In that
version and after, ping( ) was called inside an
eval block, which resolved the problem.
It's still possible that some
DBD:: drivers don't have the
ping( ) method implemented. The
Apache::DBI manpage explains how to write it.
Another solution is to increase the timeout parameter when starting
the database server. We usually start the MySQL server with the
script safe_mysqld, so we modified it to use
this option:
nohup $ledir/mysqld [snipped other options] -O wait_timeout=172800
The timeout value that we use is 172,800 seconds, or 48 hours. This
change solves the problem, but the ping( ) method
works properly in DBD::mysql as well.
20.1.5.6. Apache:DBI does not work
If Apache::DBI doesn't work,
first make sure
that you have it installed. Then make sure that you configured
mod_perl with either:
PERL_CHILD_INIT=1 PERL_STACKED_HANDLERS=1
or:
EVERYTHING=1
Turn on debug mode using the $Apache::DBI::DEBUG
variable.
20.1.5.7. Skipping connection cache during server startup
Does your error_log look like this?
10169 Apache::DBI PerlChildInitHandler
10169 Apache::DBI skipping connection cache during server startup
Database handle destroyed without explicit disconnect at
/usr/lib/perl5/site_perl/5.6.1/Apache/DBI.pm line 29.
If so, you are trying to open a database connection in the parent
httpd process. If you do, the children will each
get a copy of this handle, causing clashes when the handle is used by
two processes at the same time. Each child must have its own unique
connection handle.
To avoid this problem, Apache::DBI checks whether
it is called during server startup. If so, the module skips the
connection cache and returns immediately without a database handle.
You must use the Apache::DBI->connect_on_init(
) method (see the next section) in the startup file to
preopen a connection before the child processes are
spawned.