Chapter 18. mod_perl Data-Sharing Techniques

In this chapter, we discuss the ways mod_perl makes it possible to share data between processes or even between different handlers.

18.1. Sharing the Read-Only Data in and Between Processes

If you need to access some data in your code that's static and will not be modified, you can save time and resources by processing the data once and caching it for later reuse. Since under mod_perl processes persist and don't get killed after each request, you can store the data in global variables and reuse it.

For example, let's assume that you have a rather expensive function, get_data( ), which returns read-only data as a hash. In your code, you can do the following:

use vars qw(%CACHE);
%CACHE = get_data( ) unless %CACHE;
my $foo = $CACHE{bar};

This code creates a global hash, %CACHE, which is empty when the code is executed for the first time. Therefore, the get_data( ) function is called, which is expected to populate %CACHE with some data. Now you can access this data as usual.

When the code is executed for the second time within the same process, get_data( ) is not called again, since %CACHE already holds the data (assuming that get_data( ) returned data the first time it was called). From then on, the data is available without any extra retrieval overhead.
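The caching pattern above can be tried outside mod_perl with a minimal sketch. Here the get_data( ) body, the handle_request( ) wrapper, and the $calls counter are hypothetical stand-ins added for illustration; repeated calls within one script play the role of repeated requests served by one persistent process.

```perl
use strict;
use warnings;

our %CACHE;       # package global: keeps its contents for the life of the process
my $calls = 0;    # counts how often the expensive function really runs

# Hypothetical stand-in for the expensive get_data() function.
sub get_data {
    $calls++;
    return (bar => 'cached value', baz => 42);
}

# Stand-in for code that is executed once per request.
sub handle_request {
    %CACHE = get_data() unless %CACHE;    # populate only while still empty
    return $CACHE{bar};
}

my @results = map { handle_request() } 1 .. 3;
print "get_data() ran $calls time(s) for ", scalar(@results), " requests\n";
```

Note that if get_data( ) returns an empty list, %CACHE stays empty and the function is called again on the next request, which is the caveat the text mentions.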

If, for example, get_data( ) returns a reference to an array, the code will look like this:

use enum qw(FIRST SECOND THIRD);
use vars qw($RA_CACHE);
$RA_CACHE = get_data( ) unless $RA_CACHE;
my $second = $RA_CACHE->[SECOND];

Here we use the enum module (available from CPAN) to create constants that we will use as indices into the array. In our example, FIRST equals 0, SECOND equals 1, etc. We have used the RA_ prefix to indicate that this variable holds a reference to an array. So just like with the hash from the previous example, we retrieve the data once per process, cache it, and then access it in all subsequent code re-executions (e.g., HTTP requests) without calling the heavy get_data( ) function.
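The following is a sketch of the same array-reference cache that runs with core Perl only. It swaps the enum module for the core constant pragma, naming the indices by hand; the get_data( ) body, the second_item( ) wrapper, and the $calls counter are hypothetical and exist only to show that the data is fetched once.

```perl
use strict;
use warnings;

# Core alternative to the enum module: spell out the index constants.
use constant { FIRST => 0, SECOND => 1, THIRD => 2 };

our $RA_CACHE;    # holds a reference to the cached array
my $calls = 0;    # counts how often get_data() really runs

# Hypothetical stand-in that returns a reference to an array.
sub get_data {
    $calls++;
    return [ 'alpha', 'beta', 'gamma' ];
}

# Stand-in for code that is executed once per request.
sub second_item {
    $RA_CACHE = get_data() unless $RA_CACHE;    # fetch only on first use
    return $RA_CACHE->[SECOND];
}

print second_item(), "\n" for 1 .. 2;
print "get_data() ran $calls time(s)\n";
```

Because a reference is always true, the unless test here succeeds even when the array it points to is empty, so an empty result is cached too.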

This is all fine, but what if the retrieved data set is very big and duplicating it in all child processes would require a huge chunk of memory to be allocated? Since we assumed that the data is read-only, can we try to load it into memory only once and share it among child processes? There is a feasible solution: we can run the get_data( ) function during server startup and place the retrieved data into a global variable of some new package that we have created on the fly. Because the parent process loads the data before it forks, the children inherit it and can share the memory that holds it. For example, let's create a package called Book::Cache, as shown in Example 18-1.

Example 18-1. Book/

package Book::Cache;

%Book::Cache::DATA = get_data( );
sub get_data {
    # some heavy code that generates/retrieves data
}

1;

And initialize this module from

use Book::Cache ( );

Now when the child processes get spawned, this data is available for them all via a simple inclusion of the module in the handler's code:

use Book::Cache ( );
my $foo = $Book::Cache::DATA{bar};

Be careful, though, when accessing this data. The data structure stays shared only as long as none of the child processes attempts to modify it. The moment a child process modifies this data, a copy-on-write event happens: the operating system copies the affected memory pages into the child's own address space, and that part of the data structure is no longer shared.
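The process-level behavior can be observed with plain fork( ), outside Apache. This is only a sketch under stated assumptions: %DATA stands in for %Book::Cache::DATA, the pipe is just a way for the child to report back, and the script shows the visible semantics (the child inherits the data, and its write does not reach the parent), not the actual amount of memory shared.

```perl
use strict;
use warnings;

# Stand-in for a package global filled once in the parent before forking.
our %DATA = (bar => 'loaded in parent');

pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: the data is already present, no reload is needed.
    close $reader;
    my $seen = $DATA{bar};
    # This write affects only the child's private copy.
    $DATA{bar} = 'changed in child';
    print {$writer} "$seen|$DATA{bar}\n";
    close $writer;
    exit 0;
}

# Parent: read the child's report, then inspect its own data.
close $writer;
chomp(my $report = <$reader>);
close $reader;
waitpid($pid, 0);

my ($child_saw, $child_now) = split /\|/, $report;
print "child saw:     $child_saw\n";
print "child now has: $child_now\n";
print "parent has:    $DATA{bar}\n";
```

The parent still reports the original value after the child has changed its own copy, which is why a modification in one child never becomes visible to the others.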

Copyright © 2003 O'Reilly & Associates. All rights reserved.

Published courtesy of O'Reilly