Improving Performance (Practical mod

20.2. Improving Performance

Let's now talk about various techniques that allow you to boost the speed of applications that work with relational databases. A whole book could be devoted to this topic, so here we will concentrate on the techniques that apply specifically to mod_perl servers.

20.2.1. Preopening DBI Connections

If you are using Apache::DBI and you want to make sure that a database connection will already be open when your code is first executed within each child process after a server restart, you should use the connect_on_init( ) method in the startup file to preopen every connection that you are going to use. For example:

Apache::DBI->connect_on_init(
    "DBI:mysql:test:localhost", "my_username", "my_passwd",
    {
     PrintError => 1, # warn( ) on errors
     RaiseError => 0, # don't die on error
     AutoCommit => 1, # commit executes immediately
    }
);

For this method to work, you need to make sure that you have built mod_perl with PERL_CHILD_INIT=1 or EVERYTHING=1.

Be warned, though, that if you call connect_on_init( ) and your database is down, Apache children will be delayed at server startup, trying to connect. They won't begin serving requests until either they are connected or the connection attempt fails. Depending on your DBD driver, this can take several minutes!

20.2.2. Improving Speed by Skipping ping( )

If you use Apache::DBI and want to save a little bit of time, you can change how often the ping( ) method is called. The following setting in a startup file:

Apache::DBI->setPingTimeOut($data_source, $timeout)

will change this behavior. If the value of $timeout is 0, Apache:DBI will validate the database connection using the ping( ) method for every database access. This is the default. Setting $timeout to a negative value will deactivate the validation of the database handle. This can be used for drivers that do not implement the ping( ) method (but it's generally a bad idea, because you don't know if your database handle really works). Setting $timeout to a positive value will ping the database on access only if the previous access was more than $timeoutseconds earlier.

$data_source is the same as in the connect( ) method (e.g., DBI:mysql:...).

20.2.3. Efficient Record-Retrieval Techniques

When working with a relational database, you'll often encounter the need to read the retrieved set of records into your program, then format and print them to the browser.

Assuming that you're already connected to the database, let's consider the following code prototype:

my $query = "SELECT id,fname,lname FROM test WHERE id < 10";
my $sth = $dbh->prepare($query);
$sth->execute;

my @results = ( );
while (my @row_ary  = $sth->fetchrow_array) {
    push @results, [ transform(@row_ary) ];
}
# print the output using the the data returned from the DB

In this example, the httpd process will grow by the size of the variables that have been allocated for the records that matched the query. Remember that to get the total amount of extra memory required by this technique, this growth should be multiplied by the number of child processes that your server runs—which is probably not a constant.

A better approach is not to accumulate the records, but rather to print them as they are fetched from the DB. You can use the methods $sth->bind_columns( ) and $sth->fetchrow_arrayref( ) (aliased to $sth->fetch( )) to fetch the data in the fastest possible way. Example 20-1 prints an HTML table with matched data. Now the only additional memory consumed is for an @cols array to hold temporary row values.

Example 20-1. bind_cols.pl

my $query = "SELECT id,fname,lname FROM test WHERE id < 10";
my @fields = qw(id fname lname);

# create a list of cols values
my @cols = ( );
@cols[0..$#fields] = ( );
$sth = $dbh->prepare($query);
$sth->execute;

# Bind perl variables to columns.
$sth->bind_columns(undef, \(@cols));
print "<table>";
print '<tr bgcolor="grey">', 
    map("<th>$_</th>", @fields), "</tr>";
while ($sth->fetch) {
    print "<tr>", 
        map("<td>$_</td>", @cols), "</tr>";
}
print "</table>";

Note that this approach doesn't tell you how many records have been matched. The workaround is to run an identical query before the code above, using SELECT count(*)... instead of SELECT * ... to get the number of matched records:

my $query = "SELECT count(*) FROM test WHERE id < 10";

This should be much faster, since you can remove any SORT BY and similar attributes.

You might think that the DBI method $sth->rows will tell you how many records will be returned, but unfortunately it will not. You can rely on a row count only after a do (for some specific operations, such as update and delete), after a non-select execute, or after fetching all the rows of a selectstatement.

For selectstatements, it is generally not possible to know how many rows will be returned except by fetching them all. Some DBD drivers will return the number of rows the application has fetched so far, but others may return -1 until all rows have been fetched. Thus, use of the rows method with select statements is not recommended.

20.2.4. mysql_use_result Versus mysql_store_result Attributes

Many mod_perl developers use MySQL as their preferred relational database server because of its speed. Depending on the situation, it may be possible to change the way in which the DBD::mysql driver delivers data. The two attributes mysql_use_result and mysql_store_result influence the speed and size of the processes.

You can tell the DBD::mysql driver to change the default behavior before you start to fetch the results:

my $sth = $dbh->prepare($query);
$sth->{"mysql_use_result"} = 1;

This forces the driver to use mysql_use_result rather than mysql_store_result. The former is faster and uses less memory, but it tends to block other processes, which is why mysql_store_result is the default.

Think about it in client/server terms. When you ask the server to spoon-feed you the data as you use it, the server process must buffer the data, tie up that thread, and possibly keep database locks open for a long time. So if you read a row of data and ponder it for a while, the tables you have locked are still locked, and the server is busy talking to you every so often. That is the situation with mysql_use_result.

On the other hand, if you just suck down the whole data set to the client, then the server is free to serve other requests. This improves parallelism, since rather than blocking each other by doing frequent I/O, the server and client are working at the same time. That is the situation with mysql_store_result.

As the MySQL manual suggests, you should not use mysql_use_result if you are doing a lot of processing for each row on the client side. This can tie up the server and prevent other threads from updating the tables.

If you are using some other DBD driver, check its documentation to see if it provides the flexibility of DBD::mysql in this regard.

20.2.5. Running Two or More Relational Databases

Sometimes you end up running many databases on the same machine. These might have very different needs. For example, one may handle user sessions (updated frequently but with tiny amounts of data), and another may contain large sets of data that are hardly ever updated. You might be able to improve performance by running two differently tuned database servers on one machine. The frequently updated database can gain a lot from fast disk access, whereas the database with mostly static data could benefit from lots of caching.

20.2.6. Caching prepare( ) Statements

You can also benefit from persistent connections by replacing prepare( ) with prepare_cached( ). That way you will always be sure that you have a good statement handle and you will get some caching benefit. The downside is that you are going to pay for DBI to parse your SQL and do a cache lookup every time you call prepare_cached( ). This will give a big performance boost to database servers that execute prepare( ) quite slowly (e.g., Oracle), but it might add an unnecessary overhead with servers such as MySQL that do this operation very quickly.

Be warned that some databases (e.g., PostgreSQL and Sybase) don't support caches of prepared plans. With Sybase you could open multiple connections to achieve the same result, but this is at the risk of getting deadlocks, depending on what you are trying to do!

Another pitfall to watch out for lies in the fact that prepare_cached( ) actually gives you a reference to the same cached statement handle, not just a similar copy. So you can't do this:

my $sth1 = $dbh->prepare_cached('SELECT name FROM table WHERE id=?');
my $sth2 = $dbh->prepare_cached('SELECT name FROM table WHERE id=?');

because $sth1 and $sth2 are now the same object! If you try to use them independently, your code will fail.

Make sure to read the DBI manpage for the complete documentation of this method and the latest updates.