Analyzing Dumped core Files (Practical mod

21.6. Analyzing Dumped core Files

When your application dies with the "Segmentation fault" error (generated by the default SIGSEGVsignal handler) and generates a core file, you can analyze the core file using gdb or a similar debugger to find out what caused the segmentation fault (or segfault).

21.6.1. Getting Ready to Debug

To debug the core file, you may need to recompile Perl and mod_perl so that their executables contain debugging symbols. Usually you have to recompile only mod_perl, but if the core dump happens in the libperl.so library and you want to see the whole backtrace, you will probably want to recompile Perl as well.

For example, sometimes people send this kind of backtrace to the mod_perl list:

#0  0x40448aa2 in ?? ( )
#1  0x40448ac9 in ?? ( )
#2  0x40448bd1 in ?? ( )
#3  0x4011d5d4 in ?? ( )
#4  0x400fb439 in ?? ( )
#5  0x400a6288 in ?? ( )
#6  0x400a5e34 in ?? ( )

This kind of trace is absolutely useless, since you cannot tell where the problem happens from just looking at machine addresses. To preserve the debug symbols and get a meaningful backtrace, recompile Perl with -DDEBUGGING during the ./Configure stage (or with -Doptimize="-g", which, in addition to adding the -DDEBUGGING option, adds the -g option, which allows you to debug the Perl interpreter itself).

After recompiling Perl, recompile mod_perl with PERL_DEBUG=1 during the perl Makefile.PL stage. Building mod_perl with PERL_DEBUG=1 will:

Add -g to EXTRA_CFLAGS, passed to your C compiler during compilation.
Turn on the PERL_TRACE option.
Set PERL_DESTRUCT_LEVEL=2.
Link against libperld if -e $Config{archlibexp}/CORE/libperld$Config{lib_ext} (i.e., if you've compiled perl with -DDEBUGGING).

During make install, Apache strips all the debugging symbols. To prevent this, you should use the Apache —without-execstrip ./configure option. So if you configure Apache via mod_perl, you should do this:

panic% perl Makefile.PL USE_APACI=1 \
  APACI_ARGS='--without-execstrip' [other options]

Alternatively, you can copy the unstripped binary manually. For example, we did this to give us an Apache binary called httpd_perl that contains debugging symbols:

panic# cp apache_1.3.24/src/httpd /home/httpd/httpd_perl/bin/httpd_perl

Now the software is ready for a proper debug.

21.6.2. Creating a Faulty Package

The nex t stage is to create a package that aborts abnormally with a segfault, so you will be able to reproduce the problem and exercise the debugging technique explained here. Luckily, you can download Debug::DumpCore from CPAN, which does a very simple thing—it segfaults when called as:

use Debug::DumpCore;
Debug::DumpCore::segv( );

Debug::DumpCore::segv( ) calls a function, which calls another function, which dereferences a NULL pointer, which causes the segfault:

int *p;
p = NULL;
printf("%d", *p); // cause a segfault

For those unfamiliar with C programming, p is a pointer to a segment of memory. Setting it to NULL ensures that we try to read from a segment of memory to which the operating system does not allow us access, so of course dereferencing the NULL pointer through *p causes a segmentation fault. And that's what we want.

Of course, you can use Perl's CORE::dump( ) function, which causes a core dump, but you don't get the nice long trace provided by Debug::DumpCore, which on purpose calls a few other functions before causing a segfault.

21.6.3. Dumping the core File

Now let's dump the core file from within the mod_perl server. Sometimes the program aborts abnormally via the SIGSEGVsignal (a segfault), but no core file is dumped. And without the core file it's hard to find the cause of the problem, unless you run the program inside gdb or another debugger in the first place. In order to get the core file, the application must:

Have the same effective UID as the real UID (the same goes for GID). This is the case with mod_perl unless you modify these settings in the server configuration file.
Be running from a directory that is writable by the process at the moment of the segmentation fault. Note that the program might change its current directory during its run, so it's possible that the core file will need to be dumped in a different directory from the one from which the program was started. For example when mod_perl runs an Apache::Registryscript, it changes its directory to the one in which the script's source is located.
Be started from a shell process with sufficient resource allocations for the core file to be dumped. You can override the default setting from within a shell script if the process is not started manually. In addition, you can use BSD::Resource to manipulate the setting from within the code as well.

You can use ulimit for a Bourne-style shell and limit for a C-style shell to check and adjust the resource allocation. For example, inside bash, you may set the core file size to unlimited with:
```
panic% ulimit -c unlimited
```
or for csh:
```
panic% limit coredumpsize unlimited
```
For example, you can set an upper limit of 8 MB on the core file with:
```
panic% ulimit -c 8388608
```
This ensures that if the core file would be bigger than 8 MB, it will be not created.

You must make sure that you have enough disk space to create a big core file (mod_perl core files tend to be of a few MB in size).

Note that when you are running the program under a debugger like gdb, which traps the SIGSEGV signal, the core file will not be dumped. Instead, gdb allows you to examine the program stack and other things without having the core file.

First let's test that we get the core file from the command line (under tcsh):

panic% limit coredumpsize unlimited
panic% perl -MDebug::DumpCore -e 'Debug::DumpCore::segv( )'
Segmentation fault (core dumped)
panic% ls -l core
-rw------- 1 stas stas 954368 Jul 31 23:52 core

Indeed, we can see that the core file was dumped. Let's write a simple script that uses Debug::DumpCore, as shown in Example 21-9.

Example 21-9. core_dump.pl

use strict;
use Debug::DumpCore ( );
use Cwd( )

my $r = shift;
$r->send_http_header("text/plain");

my $dir = getcwd;
$r->print("The core should be found at $dir/core\n");
Debug::DumpCore::segv( );

In this script we load the Debug::DumpCore and Cwd modules. Then we acquire the request object and send the HTTP headers. Now we come to the real part—we get the current working directory, print out the location of the core file that we are about to dump, and finally call Debug::DumpCore::segv( ), which dumps the core file.

Before we run the script we make sure that the shell sets the core file size to be unlimited, start the server in single-server mode as a non-root user, and generate a request to the script:

panic% cd /home/httpd/httpd_perl/bin
panic% limit coredumpsize unlimited
panic% ./httpd_perl -X
    # issue a request here
Segmentation fault (core dumped)

Our browser prints out:

The core should be found at /home/httpd/perl/core

And indeed the core file appears where we were told it would (remember that Apache::Registry scripts change their directory to the location of the script source):

panic% ls -l /home/httpd/perl/core
-rw------- 1 stas httpd 4669440 Jul 31 23:58 /home/httpd/perl/core

As you can see it's a 4.7 MB core file. Notice that mod_perl was started as user stas, which has write permission for the directory /home/httpd/perl.

21.6.4. Analyzing the core File

First we start gdb, with the location of the mod_perl executable and the core file as the arguments:

panic% gdb /home/httpd/httpd_perl/bin/httpd_perl /home/httpd/perl/core

To see the backtrace, execute the where or bt commands:

(gdb) where
#0  0x4039f781 in crash_now_for_real (
    suicide_message=0x403a0120 "Cannot stand this life anymore")
    at DumpCore.xs:10
#1  0x4039f7a3 in crash_now (
    suicide_message=0x403a0120 "Cannot stand this life anymore",
    attempt_num=42) at DumpCore.xs:17
#2  0x4039f824 in XS_Debug_ _DumpCore_segv (cv=0x84ecda0)
    at DumpCore.xs:26
#3  0x401261ec in Perl_pp_entersub ( )
   from /usr/lib/perl5/5.6.1/i386-linux/CORE/libperl.so
#4  0x00000001 in ?? ( )

Notice that only the symbols from the DumpCore.xs file are available (plus Perl_pp_entersub from libperl.so), since by default Debug::DumpCore always compiles itself with the -g flag. However, we cannot see the rest of the trace, because our Perl and mod_perl libraries and Apache server were built without the debug symbols. We need to recompile them all with the debug symbols, as explained earlier in this chapter.

Then we repeat the process of starting the server, issuing a request, and getting the core file, after which we run gdb again against the executable and the dumped core file:

panic% gdb /home/httpd/httpd_perl/bin/httpd_perl /home/httpd/perl/core

Now we can see the whole backtrace:

(gdb) bt
#0  0x40448aa2 in crash_now_for_real (
    suicide_message=0x404499e0 "Cannot stand this life anymore")
    at DumpCore.xs:10
#1  0x40448ac9 in crash_now (
    suicide_message=0x404499e0 "Cannot stand this life anymore",
    attempt_num=42) at DumpCore.xs:17
#2  0x40448bd1 in XS_Debug_ _DumpCore_segv (my_perl=0x8133b60, cv=0x861d1fc)
    at DumpCore.xs:26
#3  0x4011d5d4 in Perl_pp_entersub (my_perl=0x8133b60) at pp_hot.c:2773
#4  0x400fb439 in Perl_runops_debug (my_perl=0x8133b60) at dump.c:1398
#5  0x400a6288 in S_call_body (my_perl=0x8133b60, myop=0xbffff160, is_eval=0)
    at perl.c:2045
#6  0x400a5e34 in Perl_call_sv (my_perl=0x8133b60, sv=0x85d696c, flags=4)
    at perl.c:1963
#7  0x0808a6e3 in perl_call_handler (sv=0x85d696c, r=0x860bf54, args=0x0)
    at mod_perl.c:1658
#8  0x080895f2 in perl_run_stacked_handlers (hook=0x8109c47 "PerlHandler",
    r=0x860bf54, handlers=0x82e5c4c) at mod_perl.c:1371
#9  0x080864d8 in perl_handler (r=0x860bf54) at mod_perl.c:897
#10 0x080d2560 in ap_invoke_handler (r=0x860bf54) at http_config.c:517
#11 0x080e6796 in process_request_internal (r=0x860bf54) at http_request.c:1308
#12 0x080e67f6 in ap_process_request (r=0x860bf54) at http_request.c:1324
#13 0x080ddba2 in child_main (child_num_arg=0) at http_main.c:4595
#14 0x080ddd4a in make_child (s=0x8127ec4, slot=0, now=1028133659)
#15 0x080ddeb1 in startup_children (number_to_start=4) at http_main.c:4792
#16 0x080de4e6 in standalone_main (argc=2, argv=0xbffff514) at http_main.c:5100
#17 0x080ded04 in main (argc=2, argv=0xbffff514) at http_main.c:5448
#18 0x40215082 in _ _libc_start_main ( ) from /lib/i686/libc.so.6

Reading the trace from bottom to top, we can see that it starts with Apache functions, moves on to the mod_perl and then Perl functions, and finally calls functions from the Debug::DumpCore package. At the top we can see the crash_now_for_real( ) function, which was the one that caused the segmentation fault; we can also see that the faulty code was at line 10 of the DumpCore.xs file. And indeed, if we look at that line number we can see the reason for the segfault—the dereferencing of the NULL pointer:

9: int *p = NULL;
  10: printf("%d", *p); /* cause a segfault */

In our example, we knew what Perl script had caused the segmentation fault. In the real world, it is likely that you'll have only the core file, without any clue as to which handler or script has triggered it. The special curinfo gdb macro can help:

panic% gdb /home/httpd/httpd_perl/bin/httpd_perl /home/httpd/perl/core
(gdb) source mod_perl-1.xx/.gdbinit
(gdb) curinfo
9:/home/httpd/perl/core_dump.pl

Start the gdb debugger as before. .gdbinit, the file with various useful gdb macros, is located in the source tree of mod_perl. We use the gdb source function to load these macros, and when we run the curinfo macro we learn that the core was dumped when /home/httpd/perl/core_dump.pl was executing the code at line 9.

These are the bits of information that are important in order to reproduce and resolve a problem: the filename and line number where the fault occurred (the faulty function is Debug::DumpCore::segv( ) in our case) and the actual line where the segmentation fault occurred (the printf("%d", *p) call in XS code). The former is important for problem reproducing, since it's possible that if the same function was called from a different script the problem wouldn't show up (not the case in our example, where using a dereferenced NULL pointer will always cause a segmentation fault).

21.6.5. Extracting the Backtrace Automatically

With the help of Debug::FaultAutoBT, you can try to get the backtrace extracted automatically, without any need for the core file. As of this writing this CPAN module is very new and doesn't work on all platforms.

To use this module we simply add the following code in the startup file:

use Debug::FaultAutoBT;
use File::Spec::Functions;
my $tmp_dir = File::Spec::Functions::tmpdir;
die "cannot find out a temp dir" if $tmp_dir eq '';
my $trace = Debug::FaultAutoBT->new(dir => "$tmp_dir");
$trace->ready( );

This code tries to automatically figure out the location of the temporary directory, initializes the Debug::FaultAutoBT object with it, and finally uses the method ready( ) to set the signal handler, which will attempt to automatically get the backtrace. Now when we repeat the process of starting the server and issuing a request, if we look at the error_log file, it says:

SIGSEGV (Segmentation fault) in 29072
writing to the core file /tmp/core.backtrace.29072

And indeed the file /tmp/core.backtrace.29072 includes a backtrace similar to the one we extracted before, using the core file.