Namespace Issues (Practical mod

6.3. Namespace Issues

If your service consists of a single script, you will probably have no namespace problems. But web services usually are built from many scripts and handlers. In the following sections, we will investigate possible namespace problems and their solutions. But first we will refresh our understanding of two special Perl variables, @INC and %INC.

6.3.1. The @INC Array

Perl's @INC array is like the PATH environment variable for the shell program. Whereas PATH contains a list of directories to search for executable programs, @INC contains a list of directories from which Perl modules and libraries can be loaded.

When you use( ), require( ), or do( ) a filename or a module, Perl gets a list of directories from the @INC variable and searches them for the file it was requested to load. If the file that you want to load is not located in one of the listed directories, you must tell Perl where to find the file. You can either provide a path relative to one of the directories in @INC or provide the absolute path to the file.

6.3.2. The %INC Hash

Perl's %INC hash is used to cache the names of the files and modules that were loaded and compiled by use( ), require( ), or do( )statements. Every time a file or module is successfully loaded, a new key-value pair is added to %INC. The key is the name of the file or module as it was passed to one of the three functions we have just mentioned. If the file or module was found in any of the @INC directories (except "."), the filenames include the full path. Each Perl interpreter, and hence each process under mod_perl, has its own private %INC hash, which is used to store information about its compiled modules.

Before attempting to load a file or a module with use( ) or require( ), Perl checks whether it's already in the %INC hash. If it's there, the loading and compiling are not performed. Otherwise, the file is loaded into memory and an attempt is made to compile it. Note that do( ) loads the file or module unconditionally—it does not check the %INC hash. We'll look at how this works in practice in the following examples.

First, let's examine the contents of @INC on our system:

panic% perl -le 'print join "\n", @INC'
/usr/lib/perl5/5.6.1/i386-linux
/usr/lib/perl5/5.6.1
/usr/lib/perl5/site_perl/5.6.1/i386-linux
/usr/lib/perl5/site_perl/5.6.1
/usr/lib/perl5/site_perl
.

Notice . (the current directory) as the last directory in the list.

Let's load the module strict.pm and see the contents of %INC:

panic% perl -le 'use strict; print map {"$_ => $INC{$_}"} keys %INC'
strict.pm => /usr/lib/perl5/5.6.1/strict.pm

Since strict.pm was found in the /usr/lib/perl5/5.6.1/ directory and /usr/lib/perl5/5.6.1/ is a part of @INC, %INC includes the full path as the value for the key strict.pm.

Let's create the simplest possible module in /tmp/test.pm:

1;

This does absolutely nothing, but it returns a true value when loaded, which is enough to satisfy Perl that it loaded correctly. Let's load it in different ways:

panic% cd /tmp
panic% perl -e 'use test; \
       print map { "$_ => $INC{$_}\n" } keys %INC'
test.pm => test.pm

Since the file was found in . (the directory the code was executed from), the relative path is used as the value. Now let's alter @INC by appending /tmp:

panic% cd /tmp
panic% perl -e 'BEGIN { push @INC, "/tmp" } use test; \
       print map { "$_ => $INC{$_}\n" } keys %INC'
test.pm => test.pm

Here we still get the relative path, since the module was found first relative to ".". The directory /tmp was placed after . in the list. If we execute the same code from a different directory, the "." directory won't match:

panic% cd /
panic% perl -e 'BEGIN { push @INC, "/tmp" } use test; \
       print map { "$_ => $INC{$_}\n" } keys %INC'
test.pm => /tmp/test.pm

so we get the full path. We can also prepend the path with unshift( ), so that it will be used for matching before ".". We will get the full path here as well:

panic% cd /tmp
panic% perl -e 'BEGIN { unshift @INC, "/tmp" } use test; \
       print map { "$_ => $INC{$_}\n" } keys %INC'
test.pm => /tmp/test.pm

The code:

BEGIN { unshift @INC, "/tmp" }

can be replaced with the more elegant:

use lib "/tmp";

This is almost equivalent to our BEGIN block and is the recommended approach.

These approaches to modifying @INC can be labor intensive: moving the script around in the filesystem might require modifying the path.

6.3.3. Name Collisions with Modules and Libraries

In this section, we'll look at two scenarios with failures related to namespaces. For the following discussion, we will always look at a single child process.

6.3.3.1. A first faulty scenario

It is impossible to use two modules with identical names on the same server. Only the first one found in a use( ) or a require( )statement will be loaded and compiled. All subsequent requests to load a module with the same name will be skipped, because Perl will find that there is already an entry for the requested module in the %INC hash.

Let's examine a scenario in which two independent projects in separate directories, projectA and projectB, both need to run on the same server. Both projects use a module with the name MyConfig.pm, but each project has completely different code in its MyConfig.pm module. This is how the projects reside on the filesystem (all located under the directory /home/httpd/perl):

projectA/MyConfig.pm
projectA/run.pl
projectB/MyConfig.pm
projectB/run.pl

Examples Example 6-6, Example 6-7, Example 6-8, and Example 6-9 show some sample code.

Example 6-6. projectA/run.pl

use lib qw(.);
use MyConfig;
print "Content-type: text/plain\n\n";
print "Inside project: ", project_name( );

Example 6-7. projectA/MyConfig.pm

sub project_name { return 'A'; }
1;

Example 6-8. projectB/run.pl

use lib qw(.);
use MyConfig;
print "Content-type: text/plain\n\n";
print "Inside project: ", project_name( );

Example 6-9. projectB/MyConfig.pm

sub project_name { return 'B'; }
1;

Both projects contain a script, run.pl, which loads the module MyConfig.pm and prints an indentification message based on the project_name( ) function in the MyConfig.pm module. When a request to /perl/projectA/run.pl is issued, it is supposed to print:

Inside project: A

Similarly, /perl/projectB/run.pl is expected to respond with:

Inside project: B

When tested using single-server mode, only the first one to run will load the MyConfig.pm module, although both run.pl scripts call use MyConfig. When the second script is run, Perl will skip the use MyConfig;statement, because MyConfig.pm is already located in %INC. Perl reports this problem in the error_log:

Undefined subroutine
&Apache::ROOT::perl::projectB::run_2epl::project_name called at
/home/httpd/perl/projectB/run.pl line 4.

This is because the modules didn't declare a package name, so the project_name( )subroutine was inserted into projectA/run.pl's namespace, Apache::ROOT::perl::projectB::run_2epl. Project B doesn't get to load the module, so it doesn't get the subroutine either!

Note that if a library were used instead of a module (for example, config.pl instead of MyConfig.pm), the behavior would be the same. For both libraries and modules, a file is loaded and its filename is inserted into %INC.

6.3.3.2. A second faulty scenario

Now consider the following scenario:

project/MyConfig.pm
project/runA.pl
project/runB.pl

Now there is a single project with two scripts, runA.pl and runB.pl, both trying to load the same module, MyConfig.pm, as shown in Examples Example 6-10, Example 6-11, and Example 6-12.

Example 6-10. project/MyConfig.pm

sub project_name { return 'Super Project'; }
1;

Example 6-11. project/runA.pl

use lib qw(.);
use MyConfig;
print "Content-type: text/plain\n\n";
print "Script A\n";
print "Inside project: ", project_name( );

Example 6-12. project/runB.pl

use lib qw(.);
use MyConfig;
print "Content-type: text/plain\n\n";
print "Script B\n";
print "Inside project: ", project_name( );

This scenario suffers from the same problem as the previous two-project scenario: only the first script to run will work correctly, and the second will fail. The problem occurs because there is no package declaration here.

We'll now explore some of the ways we can solve these problems.

6.3.3.3. A quick but ineffective hackish solution

The following solution should be used only as a short term bandage. You can force reloading of the modules either by fiddling with %INC or by replacing use( ) and require( ) calls with do( ).

If you delete the module entry from the %INC hash before calling require( ) or use( ), the module will be loaded and compiled again. See Example 6-13.

Example 6-13. project/runA.pl

BEGIN {
    delete $INC{"MyConfig.pm"};
}
use lib qw(.);
use MyConfig;
print "Content-type: text/plain\n\n";
print "Script A\n";
print "Inside project: ", project_name( );

Apply the same fix to runB.pl.

Another alternative is to force module reload via do( ), as seen in Example 6-14.

Example 6-14. project/runA.pl forcing module reload by using do( ) instead of use( )

use lib qw(.);
do "MyConfig.pm";
print "Content-type: text/plain\n\n";
print "Script B\n";
print "Inside project: ", project_name( );

Apply the same fix to runB.pl.

If you needed to import( )something from the loaded module, call the import( ) method explicitly. For example, if you had:

use MyConfig qw(foo bar);

now the code will look like:

do "MyConfig.pm";
MyConfig->import(qw(foo bar));

Both presented solutions are ultimately ineffective, since the modules in question will be reloaded on each request, slowing down the response times. Therefore, use these only when a very quick fix is needed, and make sure to replace the hack with one of the more robust solutions discussed in the following sections.

6.3.3.4. A first solution

The first faulty scenario can be solved by placing library modules in a subdirectory structure so that they have different path prefixes. The new filesystem layout will be:

projectA/ProjectA/MyConfig.pm
projectA/run.pl
projectB/ProjectB/MyConfig.pm
projectB/run.pl

The run.pl scripts will need to be modified accordingly:

use ProjectA::MyConfig;

and:

use ProjectB::MyConfig;

However, if later on we want to add a new script to either of these projects, we will hit the problem described by the second problematic scenario, so this is only half a solution.

6.3.3.5. A second solution

Another approach is to use a full path to the script, so the latter will be used as a key in %INC:

require "/home/httpd/perl/project/MyConfig.pm";

With this solution, we solve both problems but lose some portability. Every time a project moves in the filesystem, the path must be adjusted. This makes it impossible to use such code under version control in multiple-developer environments, since each developer might want to place the code in a different absolute directory.

6.3.3.6. A third solution

This solution makes use of package-name declaration in the require( ) d modules. For example:

package ProjectA::Config;

Similarly, for ProjectB, the package name would be ProjectB::Config.

Each package name should be unique in relation to the other packages used on the same httpd server. %INC will then use the unique package name for the key instead of the filename of the module. It's a good idea to use at least two-part package names for your private modules (e.g., MyProject::Carp instead of just Carp), since the latter will collide with an existing standard package. Even though a package with the same name may not exist in the standard distribution now, in a later distribution one may come along that collides with a name you've chosen.

What are the implications of package declarations? Without package declarations in the modules, it is very convenient to use( ) and require( ), since all variables and subroutines from the loaded modules will reside in the same package as the script itself. Any of them can be used as if it was defined in the same scope as the script itself. The downside of this approach is that a variable in a module might conflict with a variable in the main script; this can lead to hard-to-find bugs.

With package declarations in the modules, things are a bit more complicated. Given that the package name is PackageA, the syntax PackageA::project_name( )should be used to call a subroutine project_name( ) from the code using this package. Before the package declaration was added, we could just call project_name( ). Similarly, a global variable $foo must now be referred to as $PackageA::foo, rather than simply as $foo. Lexically defined variables (declared with my( )) inside the file containing PackageA will be inaccessible from outside the package.

You can still use the unqualified names of global variables and subroutines if these are imported into the namespace of the code that needs them. For example:

use MyPackage qw(:mysubs sub_b $var1 :myvars);

Modules can export any global symbols, but usually only subroutines and global variables are exported. Note that this method has the disadvantage of consuming more memory. See the perldoc Exporter manpage for information about exporting other variables and symbols.

Let's rewrite the second scenario in a truly clean way. This is how the files reside on the filesystem, relative to the directory /home/httpd/perl:

project/MyProject/Config.pm
project/runA.pl
project/runB.pl

Examples Example 6-15, Example 6-16, and Example 6-17 show how the code will look.

Example 6-15. project/MyProject/Config.pm

package MyProject::Config
sub project_name { return 'Super Project'; }
1;

Example 6-16. project/runB.pl

use lib qw(.);
use MyProject::Config;
print "Content-type: text/plain\n\n";
print "Script B\n";
print "Inside project: ", MyProject::Config::project_name( );

Example 6-17. project/runA.pl

use lib qw(.);
use MyProject::Config;
print "Content-type: text/plain\n\n";
print "Script A\n";
print "Inside project: ", MyProject::Config::project_name( );

As you can see, we have created the MyProject/Config.pm file and added a package declaration at the top of it:

package MyProject::Config

Now both scripts load this module and access the module's subroutine, project_name( ), with a fully qualified name, MyProject::Config::project_name( ).

See also the perlmodlib and perlmod manpages.

From the above discussion, it also should be clear that you cannot run development and production versions of the tools using the same Apache server. You have to run a dedicated server for each environment. If you need to run more than one development environment on the same server, you can use Apache::PerlVINC, as explained in Appendix B.