To avoid data loss and other problems, you should take special care when
testing a new device driver. This section discusses various testing strategies. For example,
setting up a separate system that you control through a serial connection is
the safest way to test a new driver. You can load test
modules with various kernel variable settings to test performance under different kernel conditions. Should
your system crash, you should be prepared to restore backup data, analyze any
crash dumps, and rebuild the device directory.
Enable the Deadman Feature to Avoid a Hard Hang
If your system is in a hard hang, then you cannot break into
the debugger. If you enable the deadman feature, the system panics instead of
hanging indefinitely. You can then use the kmdb(1) kernel debugger to analyze your problem.
The deadman feature checks every second whether the system clock is updating. If
the system clock is not updating, then you are in an indefinite hang.
If the system clock has not been updated for 50 seconds, the
deadman feature induces a panic and puts you in the debugger.
Take the following steps to enable the deadman feature:
Make sure you are capturing crash images with dumpadm(1M).
Set the snooping variable in the /etc/system file. See the system(4) man page for information on the /etc/system file.
Reboot the system so that the /etc/system file is read again and the snooping setting takes effect.
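The /etc/system edit in the steps above can be sketched as a small shell helper. This is a minimal sketch, not a definitive procedure: the function name enable_deadman is invented here, the value 1 is assumed to be the setting that turns snooping on, and the file path is a parameter so that the helper can be tried on a scratch copy first.

```shell
# enable_deadman FILE: append "set snooping=1" to a system(4)-style file
# unless a line beginning with "set snooping" is already present.
# Assumption: the value 1 enables the deadman feature.
enable_deadman() {
    sysfile=$1
    if grep '^set snooping' "$sysfile" >/dev/null 2>&1; then
        echo "snooping already set in $sysfile"
    else
        echo 'set snooping=1' >> "$sysfile" || return 1
        echo "added snooping setting to $sysfile"
    fi
}
```

Running the helper twice on the same file leaves a single snooping line, so it is safe to repeat. A reboot is still needed before the setting takes effect.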
Note that any zones on your system inherit the deadman setting as well.
If your system hangs while the deadman feature is enabled, you should see
output similar to the following example on your console:
panic[cpu1]/thread=30018dd6cc0: deadman: timed out after 9 seconds of
panic: entering debugger (continue to save dump)
Inside the debugger, use the ::cpuinfo command to investigate why the clock interrupt
was not able to fire and advance the system time.
Testing With a Serial Connection
Using a serial connection is a good way to test drivers. Use the
tip(1) command to make a serial connection between a host system and a
test system. With this approach, the tip window on the host console is
used as the console of the test machine. See the tip(1) man page for additional information.
A tip window has the following advantages:
Interactions with the test system and kernel debuggers can be monitored. For example, the window can keep a log of the session for use if the driver crashes the test system.
The test machine can be accessed remotely by logging into a tip host machine and using tip(1) to connect to the test machine.
Note - Although a tip connection and a second machine are not required to
debug a Solaris device driver, this technique is still recommended.
To Set Up the Host System for a tip Connection
- Connect the host system to the test machine using serial port A on
This connection must be made with a null modem cable.
- On the host system, make sure there is an entry in /etc/remote for
the connection. See the remote(4) man page for details.
The terminal entry must match the serial port that is used. The
Solaris operating system comes with the correct entry for serial port B, but
a terminal entry must be added for serial port A:
Note - The baud rate must be set to 9600.
- In a shell window on the host, run tip(1) and specify the
name of the entry:
% tip debug
The shell window is now a tip window with a connection to
the console of the test machine.
Caution - Do not use STOP-A for SPARC machines or F1-A for x86 architecture
machines on the host machine to stop the test machine. This action actually
stops the host machine. To send a break to the test machine, type
~# in the tip window. Commands such as ~# are recognized only if
these characters are first on the line. If the command has no effect,
press either the Return key or Control-U.
Setting Up a Target System on the SPARC Platform
A quick way to set up the test machine on the SPARC
platform is to unplug the keyboard before turning on the machine. The machine
then automatically uses serial port A as the console.
Another way to set up the test machine is to use boot
PROM commands to make serial port A the console. On the test machine,
at the boot PROM ok prompt, direct console I/O to the serial
line. To make the test machine always come up with serial port A
as the console, set the environment variables: input-device and output-device.
Example 22-1 Setting input-device and output-device With Boot PROM Commands
ok setenv input-device ttya
ok setenv output-device ttya
The eeprom command can also be used to make serial port A the
console. As superuser, execute the following commands to make the input-device and
output-device parameters point to serial port A. The following example demonstrates the eeprom command.
Example 22-2 Setting input-device and output-device With the eeprom Command
# eeprom input-device=ttya
# eeprom output-device=ttya
The eeprom commands cause the console to be redirected to serial port A
at each subsequent system boot.
Setting Up a Target System on the x86 Platform
On x86 platforms, use the eeprom command to make serial port A the
console. This procedure is the same as the SPARC platform procedure. See Setting Up a Target System on the SPARC Platform.
The eeprom command causes the console to switch to serial port A (COM1)
Note - x86 machines do not transfer console control to the tip connection until an
early stage in the boot process unless the BIOS supports console redirection to
a serial port. In SPARC machines, the tip connection maintains console control throughout
the boot process.
Setting Up Test Modules
The system(4) file in the /etc directory enables you to set the value
of kernel variables at boot time. With kernel variables, you can toggle different
behaviors in a driver and take advantage of debugging features that are provided
by the kernel. The kernel variables moddebug and kmem_flags, which can be
very useful in debugging, are discussed later in this section. See also Enable the Deadman Feature to Avoid a Hard Hang.
Changes to kernel variables after boot are unreliable, because /etc/system is read
only once when the kernel boots. After this file is modified, the system
must be rebooted for the changes to take effect. If a change
in the file causes the system not to work, boot with the ask
(-a) option. Then specify /dev/null as the system file.
Note - Kernel variables cannot be relied on to be present in subsequent releases.
Setting Kernel Variables
The set command changes the value of module or kernel variables. To set
module variables, specify the module name and the variable:
For example, to set the variable test_debug in a driver that is named
myTest, add the following line to the /etc/system file:
set myTest:test_debug=1
To set a variable that is exported by the kernel itself, omit
the module name.
You can also use a bitwise OR operation to set a value, for example:
set moddebug | 0x80000000
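To illustrate what the bitwise OR form does, the following sketch reproduces the arithmetic in a POSIX-style shell. The helper name or_flag and the starting value 0x10 are hypothetical and chosen only for illustration. On a real system the OR is applied by the kernel when it reads /etc/system, not by a shell.

```shell
# or_flag CURRENT FLAG: print CURRENT | FLAG in hexadecimal.
# Both arguments are hypothetical inputs; nothing here touches the kernel.
or_flag() {
    printf '0x%x\n' $(( $1 | $2 ))
}

# A variable that currently holds 0x10 keeps that bit and gains 0x80000000.
or_flag 0x10 0x80000000    # prints 0x80000010
```

The point of the OR form is that bits already set in the variable are preserved, whereas a plain assignment would overwrite them.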
Loading and Unloading Test Modules
The commands modload(1M), modunload(1M), and modinfo(1M) can be used to add test
modules, which is a useful technique for debugging and stress-testing drivers. These commands are
generally not needed in normal operation, because the kernel automatically loads needed modules
and unloads unused modules. The moddebug kernel variable works with these commands
to provide information and set controls.
Using the modload Command
Use modload(1M) to force a module into memory. The modload command verifies that
the driver has no unresolved references when that driver is loaded. Loading a
driver does not necessarily mean that the driver can attach. When a driver
loads successfully, the driver's _info(9E) entry point is called. The attach() entry
point is not necessarily called.
Using the modinfo Command
Use modinfo(1M) to confirm that the driver is loaded.
Example 22-3 Using modinfo to Confirm a Loaded Driver
Id Loadaddr Size Info Rev Module Name
6 101b6000 732 - 1 obpsym (OBP symbol callbacks)
7 101b65bd 1acd0 226 1 rpcmod (RPC syscall)
7 101b65bd 1acd0 226 1 rpcmod (32-bit RPC syscall)
7 101b65bd 1acd0 1 1 rpcmod (rpc interface str mod)
8 101ce8dd 74600 0 1 ip (IP STREAMS module)
8 101ce8dd 74600 3 1 ip (IP STREAMS device)
$ modinfo | grep mydriver
169 781a8d78 13fb 0 1 mydriver (Test Driver 1.5)
The number in the info field is the major number that has been
chosen for the driver. The modunload(1M) command can be used to unload
a module if the module ID is provided. The module ID is found
in the left column of modinfo output.
Sometimes a driver does not unload as expected after a modunload is issued,
because the driver is determined to be busy. This situation occurs when the
driver fails detach(9E), either because the driver really is busy, or because the
detach entry point is implemented incorrectly.
To remove all of the currently unused modules from memory, run modunload(1M)
with a module ID of 0:
# modunload -i 0
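Looking up a module ID in modinfo output before calling modunload can be scripted. The following is a minimal sketch that assumes the column layout shown in Example 22-3, where the module name is the sixth whitespace-separated field. The helper name module_id is invented here, and the sketch needs an awk that accepts the -v option.

```shell
# module_id NAME: read modinfo-style output on stdin and print the Id
# (first column) of the first line whose sixth column equals NAME.
# Prints nothing if no line matches.
module_id() {
    awk -v name="$1" '$6 == name { print $1; exit }'
}

# Possible use on a test system (not run here):
#   id=$(modinfo | module_id mydriver)
#   [ -n "$id" ] && modunload -i "$id"
```

Checking that the result is non-empty before calling modunload matters, because a module ID of 0 asks the system to unload all unused modules.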
Setting the moddebug Kernel Variable
The moddebug kernel variable controls the module loading process. The possible settings of moddebug have the following effects:
Prints messages to the console when loading or unloading modules.
Gives more detailed error messages.
Prints more detail when loading or unloading, such as including the address and size.
No auto-unloading drivers. The system does not attempt to unload the device driver when the system resources become low.
No auto-unloading streams. The system does not attempt to unload the STREAMS module when the system resources become low.
No auto-unloading of kernel modules of any type.
If running with kmdb, moddebug causes a breakpoint to be executed and a return to kmdb immediately before each module's _init() routine is called. This setting also generates additional debug messages when the module's _info() and _fini() routines are executed.
Setting kmem_flags Debugging Flags
The kmem_flags kernel variable enables debugging features in the kernel's memory allocator. Set
kmem_flags to 0xf to enable the allocator's debugging features. These features include runtime checks
to find the following code conditions:
Writing to a buffer after the buffer is freed
Using memory before the memory is initialized
Writing past the end of a buffer
The Solaris Modular Debugger Guide describes how to use the kernel memory allocator to analyze such problems.
Note - Testing and developing with kmem_flags set to 0xf can help detect latent memory
corruption bugs. Because setting kmem_flags to 0xf changes the internal behavior of the kernel
memory allocator, you should thoroughly test without kmem_flags as well.
Avoiding Data Loss on a Test System
A driver bug can sometimes render a system incapable of booting. By taking
precautions, you can avoid system reinstallation in this event, as described in this section.
Back Up Critical System Files
A number of driver-related system files are difficult, if not impossible, to reconstruct.
Files such as /etc/name_to_major, /etc/driver_aliases, /etc/driver_classes, and /etc/minor_perm can be corrupted if the
driver crashes the system during installation. See the add_drv(1M) man page.
To be safe, make a backup copy of the root file system
after the test machine is in the proper configuration. If you plan to
modify the /etc/system file, make a backup copy of the file before
To Boot With an Alternate Kernel
To avoid rendering a system inoperable, you should boot from a copy of
the kernel and associated binaries rather than from the default kernel.
- Make a copy of the drivers in /platform/*.
# cp -r /platform/`uname -i`/kernel /platform/`uname -i`/kernel.test
- Place the driver module in /platform/`uname -i`/kernel.test/drv.
- Boot the alternate kernel instead of the default kernel.
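The first two steps above can be sketched as one shell helper. The name make_test_kernel is invented here, and the platform directory is a parameter so that the helper can be tried on a scratch tree. The sketch refuses to overwrite an existing kernel.test directory, which is a design choice of this example rather than something the procedure requires.

```shell
# make_test_kernel PLATDIR DRIVER: copy PLATDIR/kernel to
# PLATDIR/kernel.test and place DRIVER in the copy's drv directory.
# Fails if PLATDIR/kernel.test already exists.
make_test_kernel() {
    platdir=$1
    driver=$2
    if [ -e "$platdir/kernel.test" ]; then
        echo "$platdir/kernel.test already exists" >&2
        return 1
    fi
    cp -r "$platdir/kernel" "$platdir/kernel.test" || return 1
    mkdir -p "$platdir/kernel.test/drv" || return 1
    cp "$driver" "$platdir/kernel.test/drv/" || return 1
}

# Possible use on a test system (driver file name is hypothetical):
#   make_test_kernel "/platform/`uname -i`" ./mydriver
```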
After you have created and stored the alternate kernel, you can boot this
kernel in a number of ways.
You can boot the alternate kernel by rebooting:
# reboot -- kernel.test/unix
On a SPARC-based system, you can also boot from the PROM:
ok boot kernel.test/sparcv9/unix
Note - To boot with the kmdb debugger, use the -k option as described in Getting Started With the Modular Debugger.
On an x86-based system, when the Select (b)oot or (i)nterpreter: message is displayed in the boot process, type the following:
Example 22-4 Booting an Alternate Kernel
The following example demonstrates booting with an alternate kernel.
ok boot kernel.test/sparcv9/unix
Rebooting with command: boot kernel.test/sparcv9/unix
Boot device: /sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@0,0:a File and \
Example 22-5 Booting an Alternate Kernel With the -a Option
Alternatively, the module path can be changed by booting with the ask (-a)
option. This option results in a series of prompts for configuring the boot
ok boot -a
Rebooting with command: boot -a
Boot device: /sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@0,0:a File and \
Enter filename [kernel/sparcv9/unix]: kernel.test/sparcv9/unix
Enter default directory for modules
[/platform/sun4u/kernel.test /kernel /usr/kernel]: <CR>
Name of system file [etc/system]: <CR>
SunOS Release 5.10 Version Generic 64-bit
Copyright 1983-2002 Sun Microsystems, Inc. All rights reserved.
root filesystem type [ufs]: <CR>
Enter physical name of root device
Consider Alternative Back-Up Plans
If the system is attached to a network, the test machine can
be added as a client of a server. If a problem occurs, the
system can be booted from the network. The local disks can then be
mounted, and any fixes can be made. Alternatively, the system can be booted
directly from the Solaris system CD-ROM.
Another way to recover from disaster is to have another bootable root file
system. Use format(1M) to make a partition that is the exact size of
the original. Then use dd(1M) to copy the bootable root file system. After
making a copy, run fsck(1M) on the new file system to ensure
Subsequently, if the system cannot boot from the original root partition, boot the
backup partition. Use dd(1M) to copy the backup partition onto the original partition.
You might have a situation where the system cannot boot even though the
root file system is undamaged. For example, the damage might be limited to
the boot block or the boot program. In such a case, you can
boot from the backup partition with the ask (-a) option. You can then
specify the original file system as the root file system.
Capture System Crash Dumps
When a system panics, the system writes an image of kernel memory to
the dump device. The dump device is by default the most suitable swap
device. The dump is a system crash dump, similar to core dumps generated
by applications. On rebooting after a panic, savecore(1M) checks the dump device for
a crash dump. If a dump is found, savecore makes a copy of
the kernel's symbol table, which is called unix.n. The savecore utility then
dumps a core file that is called vmcore.n in the core image directory. By
default, the core image directory is /var/crash/machine_name. If /var/crash has insufficient space for
a core dump, the system displays the needed space but does not actually
save the dump. The mdb(1) debugger can then be used on the core
dump and the saved kernel.
In the Solaris operating system, crash dumps are enabled by default. The
dumpadm(1M) command is used to configure system crash dumps. Use the dumpadm command
to verify that crash dumps are enabled and to determine the location of
core files that have been saved.
Note - You can prevent the savecore utility from filling the file system. Add a file
that is named minfree to the directory in which the dumps are to
be saved. In this file, specify the number of kilobytes to remain free
after savecore has run. If insufficient space is available, the core file is not saved.
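Creating the minfree file described in the note can be sketched as follows. The helper name set_minfree is invented here, and the directory and size in the usage comment are hypothetical. The sketch only writes the file; it does not check how much space is currently free.

```shell
# set_minfree DIR KB: write KB, a whole number of kilobytes, to DIR/minfree.
# Rejects values that are empty or contain non-digit characters.
set_minfree() {
    dir=$1
    kb=$2
    case $kb in
        ''|*[!0-9]*)
            echo "minfree must be a whole number of kilobytes" >&2
            return 1
            ;;
    esac
    echo "$kb" > "$dir/minfree"
}

# Possible use (directory and size are hypothetical):
#   set_minfree /var/crash/machine_name 20480
```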
Recovering the Device Directory
Damage to the /devices and /dev directories can occur if the driver crashes
during attach(9E). If either directory is damaged, you can rebuild the directory by
booting the system and running fsck(1M) to repair the damaged root file system.
The root file system can then be mounted. Recreate the /devices and
/dev directories by running devfsadm(1M) and specifying the /devices directory on the mounted
The following example shows how to repair a damaged root file system on
a SPARC system. In this example, the damaged disk is /dev/dsk/c0t3d0s0, and
an alternate boot disk is /dev/dsk/c0t1d0s0.
Example 22-6 Recovering a Damaged Device Directory
ok boot disk1
Rebooting with command: boot kernel.test/sparcv9/unix
Boot device: /sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@31,0:a File and \
# fsck /dev/dsk/c0t3d0s0
** /dev/dsk/c0t3d0s0
** Last Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
1478 files, 9922 used, 29261 free
(141 frags, 3640 blocks, 0.4% fragmentation)
# mount /dev/dsk/c0t3d0s0 /mnt
# devfsadm -r /mnt
Note - A fix to the /devices and /dev directories can allow the system to
boot while other parts of the system are still corrupted. Such repairs are only
a temporary fix to save information, such as system crash dumps, before reinstalling the system.