Method Handlers: The Browse and See, Browse and View Example (Practical mod

6.12. Method Handlers: The Browse and See, Browse and View Example

Let's look at an example of the method-handler concepts presented in Chapter 4. Suppose you need to implement a handler that allows browsing the files in the document root and beneath. Directories should be browsable (so you can move up and down the directory tree), but files should not be viewable (so you can see the available files, but you cannot click to view them).

So let's write a simple file browser. We know what customers are like, so we suspect that the customer will ask for similar customized modules pretty soon. To avoid having to duplicate our work later, we decide to start writing a base class whose methods can easily be overridden as needed. Our base class is called Apache::BrowseSee.

We start the class by declaring the package and using the strict pragma:

package Apache::BrowseSee;
use strict;

Next, we import common constants (e.g., OK, NOT_FOUND, etc.), load the File::Spec::Functions and File::Basename modules, and import a few path-manipulation functions that we are going to use:

use Apache::Constants qw(:common);
use File::Spec::Functions qw(catdir canonpath curdir updir);
use File::Basename 'dirname';

Now let's look at the functions. We start with the simple constructor:

sub new { bless {  }, shift;}

The real entry point, the handler, is prototyped as ($$). The handler starts by instantiating its object, if it hasn't already been done, and storing the $r object, so we don't need to pass it to the functions as an argument:

sub handler ($$) {
    my($self, $r) = @_;
    $self = $self->new unless ref $self;
    $self->{r} = $r;

Next we retrieve the path_info element of the request record:

    $self->{dir} = $r->path_info || '/';

For example, if the request was /browse/foo/bar, where /browse is the location of the handler, the path_info element will be /foo/bar. The default value / is used when the path is not specified.

Then we reset the entries for dirs and files:

    $self->{dirs}  = {  };
    $self->{files} = {  };

This is needed because it's possible that the $self object is created outside the handler (e.g., in the startup file) and may persist between requests.

Now an attempt to fetch the contents of the directory is made:

    eval { $self->fetch( ) };
    return NOT_FOUND if $@;

If the fetch( ) method dies, the error message is assigned to $@ and we return NOT_FOUND. You may choose to approach it differently and return an error message explaining what has happened. You may also want to log the event before returning:

    warn($@), return NOT_FOUND if $@;

Normally this shouldn't happen, unless a user messes with the arguments (something you should always be on the lookout for, because they will do it).

When the fetch( ) function has completed successfully, all that's left is to send the HTTP header and start of the HTML via the head( ) method, render the response, send the end of the HTML via tail( ),[32] and finally to return the OK constant to tell the server that the request has been fully answered:

[32]This could perhaps be replaced by a templating system. See Appendix D for more information about the Template Toolkit.

    $self->head;
    $self->render;
    $self->tail;

    return OK;
}

The response is generated by three functions. The head( ) method is a very simple one—it sends the HTTP header text/html and prints an HTML preamble using the current directory name as a title:

sub head {
    my $self = shift;
    $self->{r}->send_http_header("text/html");
    print "<html><head><title>Dir: $self->{dir}</title><head><body>";
}

The tail( ) method finishes the HTML document:

sub tail {
    my $self = shift;
    print "</body></html>";
}

The fetch( ) method reads the contents of the directory stored in the object's dir attribute (relative to the document root) and then sorts the contents into two groups, directories and files:

sub fetch {
    my $self = shift;
    my $doc_root = Apache->document_root;
    my $base_dir = canonpath( catdir($doc_root, $self->{dir}));

    my $base_entry = $self->{dir} eq '/' ? '' : $self->{dir};
    my $dh = Apache::gensym( );
    opendir $dh, $base_dir or die "Cannot open $base_dir: $!";
    for (readdir $dh) {
        next if $_ eq curdir( );  # usually '.'

        my $full_dir = catdir $base_dir, $_;
        my $entry = "$base_entry/$_";
        if (-d $full_dir) {
            if ($_ eq updir( )) { # '..'
                $entry = dirname $self->{dir};
                next if catdir($base_dir, $entry) eq $doc_root;
            }
            $self->{dirs}{$_} = $entry;
        }
        else {
            $self->{files}{$_} = $entry;
        }
    }
    closedir $dh;
}

By using canonpath( ), we make sure that nobody messes with the path_info element, by eliminating successive slashes and "/."s on Unix and taking appropriate actions on other operating systems. It's important to use File::Spec and other cross-platform functions when developing applications.

While looping through the directory entries, we skip over the current directory entry using the curdir( ) function imported from File::Spec::Functions (which is equivalent to . on Unix) and handle the parent directory entry specially by matching the updir( ) function (which is equivalent to .. on Unix). The function dirname( ) gives us the parent directory, and afterward we check that this directory is different from the document root. If it's the same, we skip this entry.

Note that since we use the path_info element to pass the directory relative to the document root, we rely on Apache to handle the case when users try to mess with the URL and add .. to reach files they aren't supposed to reach.

Finally, let's look at the render( ) method:

sub render {
    my $self = shift;
    print "<p>Current Directory: <i>$self->{dir}</i><br>";

    my $location = $self->{r}->location;
    print qq{<a href="$location$self->{dirs}{$_}">$_</a><br>}
        for sort keys %{ $self->{dirs} || {  } };
    print qq{$_<br>}
        for sort keys %{ $self->{files} || {  } };
}

The render( ) method actually takes the files and directories prepared in the fetch( ) method and displays them to the user. First the name of the current directory is displayed, followed by the directories and finally the files. Since the module should allow browsing of directories, we hyperlink them. The files aren't linked, since we are in "see but don't touch" mode.[33]

[33]In your real code you should also escape HTML- and URI-unsafe characters in the filenames (e.g., <, >, &, ", ', etc.) by using the Apache::Util::escape_html and Apache::Util::escape_uri functions.

Finally, we finish the package with 1; to make sure that the module will be successfully loaded. The _ _END_ _ token allows us to put various notes and POD documentation after the program, where Perl won't complain about them.

1;
_ _END_ _

Example 6-39 shows how the whole package looks.

Example 6-39. Apache/BrowseSee.pm

package Apache::BrowseSee;
use strict;

use Apache::Constants qw(:common);
use File::Spec::Functions qw(catdir canonpath curdir updir);
use File::Basename 'dirname';

sub new { bless {}, shift;}

sub handler ($$) {
    my($self, $r) = @_;
    $self = $self->new unless ref $self;

    $self->{r}     = $r;
    $self->{dir}   = $r->path_info || '/';
    $self->{dirs}  = {};
    $self->{files} = {};

    eval { $self->fetch( ) };
    return NOT_FOUND if $@;

    $self->head;
    $self->render;
    $self->tail;

    return OK;
}

sub head {
    my $self = shift;
    $self->{r}->send_http_header("text/html");
    print "<html><head><title>Dir: $self->{dir}</title><head><body>";
}

sub tail {
    my $self = shift;
    print "</body></html>";
}

sub fetch {
    my $self = shift;
    my $doc_root = Apache->document_root;
    my $base_dir = canonpath( catdir($doc_root, $self->{dir}));

    my $base_entry = $self->{dir} eq '/' ? '' : $self->{dir};
    my $dh = Apache::gensym( );
    opendir $dh, $base_dir or die "Cannot open $base_dir: $!";
    for (readdir $dh) {
        next if $_ eq curdir( );

        my $full_dir = catdir $base_dir, $_;
        my $entry = "$base_entry/$_";
        if (-d $full_dir) {
            if ($_ eq updir( )) {
                $entry = dirname $self->{dir};
                next if catdir($base_dir, $entry) eq $doc_root;
            }
            $self->{dirs}{$_} = $entry;
        }
        else {
            $self->{files}{$_} = $entry;
        }
    }
    closedir $dh;
}

sub render {
    my $self = shift;
    print "Current Directory: <i>$self->{dir}</i><br>";

    my $location = $self->{r}->location;
    print qq{<a href="$location$self->{dirs}{$_}">$_</a><br>}
        for sort keys %{ $self->{dirs} || {} };
    print qq{$_<br>}
        for sort keys %{ $self->{files} || {} };
}

1;
_  _END_  _

This module should be saved as Apache/BrowseSee.pm and placed into one of the directories in @INC. For example, if /home/httpd/perl is in your @INC, you can save it in /home/httpd/perl/Apache/BrowseSee.pm.

To configure this module, we just add the following snippet to httpd.conf:

PerlModule Apache::BrowseSee
<Location /browse>
    SetHandler perl-script
    PerlHandler Apache::BrowseSee->handler
</Location>

Users accessing the server from /browse can now browse the contents of your server from the document root and beneath but cannot view the contents of the files (see Figure 6-2).

Figure 6-2. The files can be browsed but not viewed

Now let's say that as soon as we get the module up and running, the client comes back and tells us he would like us to implement a very similar application, except that files should now be viewable (clickable). This is because later he wants to allow only authorized users to read the files while letting everybody see what he has to offer.

We knew that was coming, remember? Since we are lazy and it's not exciting to write the same code again and again, we will do the minimum amount of work while still keeping the client happy. This time we are going to implement the Apache::BrowseRead module:

package Apache::BrowseRead;
use strict;
use base qw(Apache::BrowseSee);

We place the new module into Apache/BrowseRead.pm, declare a new package, and tell Perl that this package inherits from Apache::BrowseSee using the base pragma. The last line is roughly equivalent to:

BEGIN {
    require Apache::BrowseSee;
    @Apache::BrowseRead::ISA = qw(Apache::BrowseSee);
}

Since this class is going to do the same job as Apache::BrowseSee, apart from rendering the file listings differently, all we have to do is override the render( ) method:

sub render {
    my $self = shift;
    print "<p>Current Directory: <i>$self->{dir}</i><br>";

    my $location = $self->{r}->location;
    print qq{<a href="$location$self->{dirs}{$_}">$_</a><br>}
        for sort keys %{ $self->{dirs} || {  } };
    print  qq{<a href="$self->{files}{$_}">$_</a><br>}
        for sort keys %{ $self->{files} || {  } };
}

As you can see, the only difference here is that we link to the real files now.

We complete the package as usual with 1; and _ _END_ _:

1;
_ _END_ _

Example 6-40 shows the whole package.

Example 6-40. Apache/BrowseRead.pm

package Apache::BrowseRead;
use strict;
use base qw(Apache::BrowseSee);

sub render {
    my $self = shift;
    print "<p>Current Directory: <i>$self->{dir}</i><br>";

    my $location = $self->{r}->location;
    print qq{<a href="$location$self->{dirs}{$_}">$_</a><br>}
        for sort keys %{ $self->{dirs} || {} };
    print  qq{<a href="$self->{files}{$_}">$_</a><br>}
        for sort keys %{ $self->{files} || {} };
}
1;
_  _END_  _

Finally, we should add a new configuration section in httpd.conf:

PerlModule Apache::BrowseRead
<Location /read>
    SetHandler perl-script
    PerlHandler Apache::BrowseRead->handler
</Location>

Now, when accessing files through /read, we can browse and view the contents of the files (see Figure 6-3). Once we add some authentication/authorization methods, we will have a server where everybody can browse, but only privileged users can read.

Figure 6-3. The files can be browsed and read

You might be wondering why you would write a special module to do something Apache itself can already do for you. First, this was an example on using method handlers, so we tried to keep it simple while showing some real code. Second, this example can easily be adapted and extended—for example, it can handle virtual files that don't exist on the filesystem but rather are generated on the fly and/or fetched from the database, and it can easily be changed to do whatever you (or your client) want to do, instead of what Apache allows.