Let's now talk about various techniques that allow
you to boost the speed of applications that work with relational
databases. A whole book could be devoted to this topic, so here we
will concentrate on the techniques that apply specifically to
mod_perl servers.
20.2.1. Preopening DBI Connections
If you are using Apache::DBI and
you
want to make sure that a database
connection will already be open when your code is first executed
within each child process after a server restart, you should use
the connect_on_init( )
method in the startup file to preopen every connection that you are
going to use. For example:
Apache::DBI->connect_on_init(
"DBI:mysql:test:localhost", "my_username", "my_passwd",
{
PrintError => 1, # warn( ) on errors
RaiseError => 0, # don't die on error
AutoCommit => 1, # commit executes immediately
}
);
For this method to work, you need to make sure that you have built
mod_perl with PERL_CHILD_INIT=1 or
EVERYTHING=1.
Be warned, though, that if you call connect_on_init(
) and your database is down, Apache children will be
delayed at server startup, trying to connect. They
won't begin serving requests until either they are
connected or the connection attempt fails. Depending on your
DBD driver, this can take several minutes!
20.2.2. Improving Speed by Skipping ping( )
If you use Apache::DBI and want to
save a little bit of time, you can change
how often the ping( ) method is called. The
following setting in a startup file:
will change this behavior. If the value of
$timeout is 0,
Apache:DBI will validate the database connection
using the ping( ) method for every database
access. This is the default. Setting $timeout to a
negative value will deactivate the validation of the database handle.
This can be used for drivers that do not implement the ping(
) method (but it's generally a bad idea,
because you don't know if your database handle
really works). Setting $timeout to a positive
value will ping the database on access only if
the previous access was more than $timeoutseconds
earlier.
$data_source is the same as in the
connect( ) method (e.g.,
DBI:mysql:...).
20.2.3. Efficient Record-Retrieval Techniques
When working with a relational
database,
you'll often encounter the need to read the
retrieved set of records into your program, then format and print
them to the browser.
Assuming that you're already connected to the
database, let's consider the following code
prototype:
my $query = "SELECT id,fname,lname FROM test WHERE id < 10";
my $sth = $dbh->prepare($query);
$sth->execute;
my @results = ( );
while (my @row_ary = $sth->fetchrow_array) {
push @results, [ transform(@row_ary) ];
}
# print the output using the the data returned from the DB
In this example, the httpd process will grow by
the size of the variables that have been allocated for the records
that matched the query. Remember that to get the total amount of
extra memory required by this technique, this growth should be
multiplied by the number of child processes that your server
runs—which is probably not a constant.
A better approach is not to accumulate the records, but rather to
print them as they are fetched from the DB. You can use the methods
$sth->bind_columns( ) and
$sth->fetchrow_arrayref( ) (aliased to
$sth->fetch( )) to fetch the data in the
fastest possible way. Example 20-1 prints an HTML
table with matched data. Now the only additional memory consumed is
for an @cols array to hold temporary row values.
Example 20-1. bind_cols.pl
my $query = "SELECT id,fname,lname FROM test WHERE id < 10";
my @fields = qw(id fname lname);
# create a list of cols values
my @cols = ( );
@cols[0..$#fields] = ( );
$sth = $dbh->prepare($query);
$sth->execute;
# Bind perl variables to columns.
$sth->bind_columns(undef, \(@cols));
print "<table>";
print '<tr bgcolor="grey">',
map("<th>$_</th>", @fields), "</tr>";
while ($sth->fetch) {
print "<tr>",
map("<td>$_</td>", @cols), "</tr>";
}
print "</table>";
Note that this approach doesn't tell you how many
records have been matched. The workaround is to run an identical
query before the code above, using SELECTcount(*)... instead of SELECT *
... to get the number of matched records:
my $query = "SELECT count(*) FROM test WHERE id < 10";
This should be much faster, since you can remove any SORT
BY and similar attributes.
You might think that the DBI method
$sth->rows will tell you how many records will
be returned, but unfortunately it will not. You can rely on a row
count only after a do (for some specific
operations, such as update and delete), after a non-select
execute, or after fetching all the rows of a
selectstatement.
For selectstatements, it is generally not
possible to know how many rows will be returned except by fetching
them all. Some DBD drivers will return the number
of rows the application has fetched so far, but others may return
-1 until all rows have been fetched. Thus, use of
the rows method with select
statements is not recommended.
20.2.4. mysql_use_result Versus mysql_store_result Attributes
Many mod_perl developers
use MySQL as their
preferred relational database server because of its speed. Depending
on the situation, it may be possible to change the way in which the
DBD::mysql driver delivers data. The two
attributes mysql_use_result and
mysql_store_result influence the speed and size of
the processes.
You can tell the DBD::mysql driver to change the
default behavior before you start to fetch the results:
my $sth = $dbh->prepare($query);
$sth->{"mysql_use_result"} = 1;
This forces the driver to use mysql_use_result
rather than mysql_store_result. The former is
faster and uses less memory, but it tends to block other processes,
which is why mysql_store_result is the default.
Think about it in client/server terms. When you ask the server to
spoon-feed you the data as you use it, the server process must buffer
the data, tie up that thread, and possibly keep database locks open
for a long time. So if you read a row of data and ponder it for a
while, the tables you have locked are still locked, and the server is
busy talking to you every so often. That is the situation with
mysql_use_result.
On the other hand, if you just suck down the whole data set to the
client, then the server is free to serve other requests. This
improves parallelism, since rather than blocking each other by doing
frequent I/O, the server and client are working at the same time.
That is the situation with mysql_store_result.
As the MySQL manual suggests, you should not use
mysql_use_result if you are doing a lot of
processing for each row on the client side. This can tie up the
server and prevent other threads from updating the tables.
If you are using some other DBD driver, check its
documentation to see if it provides the flexibility of
DBD::mysql in this regard.
20.2.5. Running Two or More Relational Databases
Sometimes you end up running
many databases on
the same machine. These might have very different needs. For example,
one may handle user sessions (updated frequently but with tiny
amounts of data), and another may contain large sets of data that are
hardly ever updated. You might be able to improve performance by
running two differently tuned database servers on one machine. The
frequently updated database can gain a lot from fast disk access,
whereas the database with mostly static data could benefit from lots
of caching.
20.2.6. Caching prepare( ) Statements
You can also benefit from
persistent
connections by replacing prepare( ) with
prepare_cached( ). That way you will always be
sure that you have a good statement handle and you will get some
caching benefit. The downside is that you are going to pay for
DBI to parse your SQL and do a cache lookup every
time you call prepare_cached( ). This will give a
big performance boost to database servers that execute
prepare( ) quite slowly (e.g., Oracle), but it
might add an unnecessary overhead with servers such as MySQL that do
this operation very quickly.
Be warned that some databases (e.g., PostgreSQL and Sybase)
don't support caches of prepared plans. With Sybase
you could open multiple connections to achieve the same result, but
this is at the risk of getting deadlocks, depending on what you are
trying to do!
Another pitfall to watch out for lies in the fact that
prepare_cached( ) actually gives you a reference
to the same cached statement handle, not just a
similar copy. So you can't do this:
my $sth1 = $dbh->prepare_cached('SELECT name FROM table WHERE id=?');
my $sth2 = $dbh->prepare_cached('SELECT name FROM table WHERE id=?');
because $sth1 and $sth2 are now
the same object! If you try to use them independently, your code will
fail.
Make sure to read the DBI manpage for the complete
documentation of this method and the latest updates.