Tuning Drivers
The Solaris OS provides kernel statistics structures so that you can implement counters
for your driver. The DTrace facility enables you to analyze performance in real
time. This section presents the following topics on device performance:
- Kernel Statistics – The Solaris OS provides a set of data structures and functions for capturing performance statistics in the kernel. Kernel statistics (called kstats) enable your driver to export continuous statistics while the system is running. The kstat data is handled programmatically by using the kstat functions.
- DTrace for Dynamic Instrumentation – DTrace enables you to add instrumentation to your driver dynamically so that you can perform tasks like analyzing the system and measuring performance. DTrace takes advantage of predefined kstat structures.
Kernel Statistics
To assist in performance tuning, the Solaris kernel provides the kstat(3KSTAT) facility.
The kstat facility provides a set of functions and data structures for device
drivers and other kernel modules to export module-specific kernel statistics.
A kstat is a data structure for recording quantifiable aspects of a device's
usage. A kstat is stored as a null-terminated linked list. Each kstat has
a common header section and a type-specific data section. The header section is
defined by the kstat_t structure.
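The chain layout described above can be illustrated with a small user-space sketch. It is not the kernel's kstat_t definition: the sketch_kstat_t type, the SKETCH_STRLEN value, and the sketch_lookup() function are hypothetical stand-ins that keep only a next pointer and the module, instance, and name fields that together identify a kstat.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder length; the real limit is KSTAT_STRLEN from the kstat headers. */
#define SKETCH_STRLEN 31

/* Hypothetical, trimmed-down stand-in for a kstat header. */
typedef struct sketch_kstat {
	struct sketch_kstat *next;	/* next kstat in the chain, NULL at the end */
	char module[SKETCH_STRLEN];	/* module that created the kstat */
	int instance;			/* instance of that module */
	char name[SKETCH_STRLEN];	/* name within module and instance */
} sketch_kstat_t;

/*
 * Walk the null-terminated chain and return the entry whose
 * (module, instance, name) triple matches, or NULL if none does.
 */
sketch_kstat_t *
sketch_lookup(sketch_kstat_t *head, const char *module, int instance,
    const char *name)
{
	sketch_kstat_t *k;

	for (k = head; k != NULL; k = k->next) {
		if (strcmp(k->module, module) == 0 &&
		    k->instance == instance &&
		    strcmp(k->name, name) == 0)
			return (k);
	}
	return (NULL);
}
```

The walk stops at the first NULL next pointer, which is why the chain must be null-terminated.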
The article “Using kstat From Within a Program in the Solaris OS” on
the Sun Developer Network at https://developers.sun.com/solaris/articles/kstat_api.html provides two practical examples of how to
use the kstat(3KSTAT) and libkstat(3LIB) APIs to extract metrics from the Solaris OS.
The examples are “Walking Through All the kstat” and “Getting NIC kstat Output
Using the Java Platform.”
Kernel Statistics Structure Members
The members of a kstat structure are:
- ks_class[KSTAT_STRLEN]
Categorizes the kstat type as bus, controller, device_error, disk, hat, kmem_cache, kstat, misc, net, nfs, pages, partition, rps, ufs, vm, or vmem.
- ks_crtime
Time at which the kstat was created. ks_crtime is commonly used in calculating rates of various counters.
- ks_data
Points to the data section for the kstat.
- ks_data_size
Total size of the data section in bytes.
- ks_instance
The instance of the kernel module that created this kstat. ks_instance is combined with ks_module and ks_name to give the kstat a unique, meaningful name.
- ks_kid
Unique ID for the kstat.
- ks_module[KSTAT_STRLEN]
Identifies the kernel module that created this kstat. ks_module is combined with ks_instance and ks_name to give the kstat a unique, meaningful name. KSTAT_STRLEN sets the maximum length of ks_module.
- ks_name[KSTAT_STRLEN]
A name assigned to the kstat in combination with ks_module and ks_instance. KSTAT_STRLEN sets the maximum length of ks_name.
- ks_ndata
Indicates the number of data records for those kstat types that support multiple records: KSTAT_TYPE_RAW, KSTAT_TYPE_NAMED, and KSTAT_TYPE_TIMER.
- ks_next
Points to next kstat in the chain.
- ks_resv
A reserved field.
- ks_snaptime
The timestamp for the last data snapshot, useful in calculating rates.
- ks_type
The data type, which can be KSTAT_TYPE_RAW for binary data, KSTAT_TYPE_NAMED for name/value pairs, KSTAT_TYPE_INTR for interrupt statistics, KSTAT_TYPE_IO for I/O statistics, or KSTAT_TYPE_TIMER for event timers.
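The role of ks_crtime and ks_snaptime in rate calculations can be sketched in user-space C. The sketch assumes that both timestamps are expressed in nanoseconds; the sketch_hrtime_t typedef and the sketch_counter_rate() function are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a nanosecond timestamp type. */
typedef int64_t sketch_hrtime_t;

#define SKETCH_NANOSEC 1000000000.0

/*
 * Return the average rate, in events per second, of a counter that moved
 * from prev_value to cur_value between two snapshot timestamps.
 * Returns 0.0 when no time has elapsed, to avoid dividing by zero.
 */
double
sketch_counter_rate(uint64_t prev_value, uint64_t cur_value,
    sketch_hrtime_t prev_snaptime, sketch_hrtime_t cur_snaptime)
{
	sketch_hrtime_t elapsed = cur_snaptime - prev_snaptime;
	uint64_t delta;

	if (elapsed <= 0)
		return (0.0);
	/* Unsigned subtraction stays correct across a single counter wrap. */
	delta = cur_value - prev_value;
	return ((double)delta * SKETCH_NANOSEC / (double)elapsed);
}
```

Passing the values of two successive snapshots gives the rate over that interval. Passing 0 as the previous value and the creation time as the previous timestamp gives the average rate since the kstat was created.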
Kernel Statistics Structures
The structures for the different kinds of kstats are:
- kstat(9S)
Each kernel statistic (kstat) that is exported by device drivers consists of a header section and a data section. The kstat(9S) structure is the header portion of the statistic.
- kstat_intr(9S)
Structure for interrupt kstats. The types of interrupts are:
Hard interrupt – Sourced from the hardware device itself
Soft interrupt – Induced by the system through the use of some system interrupt source
Watchdog interrupt – Induced by a periodic timer call
Spurious interrupt – An interrupt entry point was entered but there was no interrupt to service
Multiple service – An interrupt was detected and serviced just prior to returning from any of the other types
Drivers generally report only claimed hard interrupts and soft interrupts from their handlers, but measurement of the spurious class of interrupts is useful for auto-vectored devices to locate any interrupt latency problems in a particular system configuration. Devices that have more than one interrupt of the same type should use multiple structures.
- kstat_io(9S)
Structure for I/O kstats.
- kstat_named(9S)
Structure for named kstats. A named kstat is an array of name-value pairs. These pairs are kept in the kstat_named structure.
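The interrupt classes listed under kstat_intr(9S) above can be pictured as one counter per class. The following user-space sketch is a model only: the sketch_intr_stats_t type, the SKETCH_INTR_* constants, and sketch_hard_intr() are hypothetical stand-ins, and a real driver would use the structure and index constants documented in kstat_intr(9S) from its interrupt handler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical index constants, one per interrupt class in the list above. */
enum sketch_intr_class {
	SKETCH_INTR_HARD = 0,	/* sourced from the hardware device */
	SKETCH_INTR_SOFT,	/* induced by the system */
	SKETCH_INTR_WATCHDOG,	/* induced by a periodic timer call */
	SKETCH_INTR_SPURIOUS,	/* handler entered, nothing to service */
	SKETCH_INTR_MULTSVC,	/* serviced just before returning */
	SKETCH_NUM_INTRS
};

/* Hypothetical stand-in for an interrupt kstat data section. */
typedef struct sketch_intr_stats {
	uint32_t intrs[SKETCH_NUM_INTRS];
} sketch_intr_stats_t;

/*
 * Model of the accounting step in a hardware interrupt handler.
 * Returns 1 when the interrupt is claimed and 0 when it is not.
 */
int
sketch_hard_intr(sketch_intr_stats_t *st, int device_raised_interrupt)
{
	if (!device_raised_interrupt) {
		/* Entry point was entered but there was nothing to service. */
		st->intrs[SKETCH_INTR_SPURIOUS]++;
		return (0);
	}
	st->intrs[SKETCH_INTR_HARD]++;
	return (1);
}
```

A device with more than one interrupt of the same type would keep one such structure per interrupt, in line with the guidance above.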
Kernel Statistics Functions
The functions for using kstats are:
- kstat_create(9F)
Allocate and initialize a kstat(9S) structure.
- kstat_delete(9F)
Remove a kstat from the system.
- kstat_install(9F)
Add a fully initialized kstat to the system.
- kstat_named_init(9F), kstat_named_setstr(9F)
Initialize a named kstat. kstat_named_setstr() associates str, a string, with the named kstat pointer.
- kstat_queue(9F)
Many I/O subsystems have at least two basic queues of transactions to manage. One queue is for transactions that have been accepted for processing but for which processing has yet to begin. The other queue is for transactions that are actively being processed but are not yet done. For this reason, two cumulative time statistics are kept: wait time and run time. Wait time is the time spent prior to service. Run time is the time spent during service. The kstat_queue() family of functions manages these times based on the transitions between the driver wait queue and run queue.
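The wait time and run time accounting described for kstat_queue(9F) can be modeled in user space. The sketch below is a simplified model, not the kernel implementation: every sketch_* name is hypothetical, timestamps are passed in explicitly so that the behavior is deterministic, and only the cumulative busy time and current length of each queue are tracked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-queue accounting state. */
typedef struct sketch_queue {
	int64_t time;		/* cumulative ns the queue was non-empty */
	int64_t lastupdate;	/* timestamp (ns) of the last transition */
	uint32_t count;		/* transactions currently in the queue */
} sketch_queue_t;

/* Hypothetical stand-in for the queue portion of an I/O kstat. */
typedef struct sketch_io {
	sketch_queue_t wait;	/* accepted, processing not yet begun */
	sketch_queue_t run;	/* actively being processed */
} sketch_io_t;

/* Charge the time since the last transition if the queue was busy. */
static void
sketch_queue_update(sketch_queue_t *q, int64_t now)
{
	if (q->count != 0)
		q->time += now - q->lastupdate;
	q->lastupdate = now;
}

/* A transaction is accepted and placed on the wait queue. */
void
sketch_waitq_enter(sketch_io_t *io, int64_t now)
{
	sketch_queue_update(&io->wait, now);
	io->wait.count++;
}

/* A transaction moves from the wait queue to the run queue. */
void
sketch_waitq_to_runq(sketch_io_t *io, int64_t now)
{
	sketch_queue_update(&io->wait, now);
	io->wait.count--;
	sketch_queue_update(&io->run, now);
	io->run.count++;
}

/* A transaction completes and leaves the run queue. */
void
sketch_runq_exit(sketch_io_t *io, int64_t now)
{
	sketch_queue_update(&io->run, now);
	io->run.count--;
}
```

For a single transaction that is accepted at time 100, starts service at time 250, and completes at time 700, the model records 150 ns of wait time and 450 ns of run time.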
Kernel Statistics for Solaris Ethernet Drivers
The kstat interface described in the following table is an effective way to
obtain Ethernet physical layer statistics from the driver. Ethernet drivers should export these
statistics to guide users in better diagnosis and repair of Ethernet physical layer
problems. With the exception of link_up, all statistics have a default value of
0 when not present. When the link_up statistic is not present, its value should be
assumed to be 1.
The following example displays the link settings, that is, every statistic whose
name matches link_*, for instance 0 of the ce driver. In this case, the kstat name
mii is used to filter the statistics.
kstat ce:0:mii:link_*
Table 22-2 Ethernet MII/GMII Physical Layer Interface Kernel Statistics
| Kstat Variable | Type | Description |
| --- | --- | --- |
| xcvr_addr | KSTAT_DATA_UINT32 | Provides the MII address of the transceiver that is currently in use. (0) - (31) are for the MII address of the physical layer device in use for a given Ethernet device. (-1) is used where there is no externally accessible MII interface, and therefore the MII address is undefined or irrelevant. |
| xcvr_id | KSTAT_DATA_UINT32 | Provides the specific vendor ID or device ID of the transceiver that is currently in use. |
| xcvr_inuse | KSTAT_DATA_UINT32 | Indicates the type of transceiver that is currently in use. The IEEE aPhyType enumerates the following set: (0) other, undefined; (1) none, MII interface is present, but no transceiver is connected; (2) 10 Mbits/s, Clause 7, 10 Mbits/s Manchester; (3) 100BASE-T4, Clause 23, 100 Mbits/s 8B/6T; (4) 100BASE-X, Clause 24, 100 Mbits/s 4B/5B; (5) 100BASE-T2, Clause 32, 100 Mbits/s PAM5X5; (6) 1000BASE-X, Clause 36, 1000 Mbits/s 8B/10B; (7) 1000BASE-T, Clause 40, 1000 Mbits/s 4D-PAM5. This set is smaller than the set specified by ifMauType, which is defined to include all of the above plus their half duplex/full duplex options. Since this information can be provided by the cap_* statistics, the missing definitions can be derived from the combination of xcvr_inuse and cap_* to provide all the combinations of ifMauType. |
| cap_1000fdx | KSTAT_DATA_CHAR | Indicates the device is 1 Gbits/s full duplex capable. |
| cap_1000hdx | KSTAT_DATA_CHAR | Indicates the device is 1 Gbits/s half duplex capable. |
| cap_100fdx | KSTAT_DATA_CHAR | Indicates the device is 100 Mbits/s full duplex capable. |
| cap_100hdx | KSTAT_DATA_CHAR | Indicates the device is 100 Mbits/s half duplex capable. |
| cap_10fdx | KSTAT_DATA_CHAR | Indicates the device is 10 Mbits/s full duplex capable. |
| cap_10hdx | KSTAT_DATA_CHAR | Indicates the device is 10 Mbits/s half duplex capable. |
| cap_asmpause | KSTAT_DATA_CHAR | Indicates the device is capable of asymmetric pause Ethernet flow control. |
| cap_pause | KSTAT_DATA_CHAR | Indicates the device is capable of symmetric pause Ethernet flow control when cap_pause is set to 1 and cap_asmpause is set to 0. When cap_asmpause is set to 1, cap_pause has the following meaning: |
| cap_rem_fault | KSTAT_DATA_CHAR | Indicates the device is capable of remote fault indication. |
| cap_autoneg | KSTAT_DATA_CHAR | Indicates the device is capable of auto-negotiation. |
| adv_cap_1000fdx | KSTAT_DATA_CHAR | Indicates the device is advertising 1 Gbits/s full duplex capability. |
| adv_cap_1000hdx | KSTAT_DATA_CHAR | Indicates the device is advertising 1 Gbits/s half duplex capability. |
| adv_cap_100fdx | KSTAT_DATA_CHAR | Indicates the device is advertising 100 Mbits/s full duplex capability. |
| adv_cap_100hdx | KSTAT_DATA_CHAR | Indicates the device is advertising 100 Mbits/s half duplex capability. |
| adv_cap_10fdx | KSTAT_DATA_CHAR | Indicates the device is advertising 10 Mbits/s full duplex capability. |
| adv_cap_10hdx | KSTAT_DATA_CHAR | Indicates the device is advertising 10 Mbits/s half duplex capability. |
| adv_cap_asmpause | KSTAT_DATA_CHAR | Indicates the device is advertising the capability of asymmetric pause Ethernet flow control. |
| adv_cap_pause | KSTAT_DATA_CHAR | Indicates the device is advertising the capability of symmetric pause Ethernet flow control when adv_cap_pause is set to 1 and adv_cap_asmpause is set to 0. When adv_cap_asmpause is set to 1, adv_cap_pause has the following meaning: |
| adv_rem_fault | KSTAT_DATA_CHAR | Indicates the device is experiencing a fault that it is going to forward to the link partner. |
| adv_cap_autoneg | KSTAT_DATA_CHAR | Indicates the device is advertising the capability of auto-negotiation. |
| lp_cap_1000fdx | KSTAT_DATA_CHAR | Indicates the link partner device is 1 Gbits/s full duplex capable. |
| lp_cap_1000hdx | KSTAT_DATA_CHAR | Indicates the link partner device is 1 Gbits/s half duplex capable. |
| lp_cap_100fdx | KSTAT_DATA_CHAR | Indicates the link partner device is 100 Mbits/s full duplex capable. |
| lp_cap_100hdx | KSTAT_DATA_CHAR | Indicates the link partner device is 100 Mbits/s half duplex capable. |
| lp_cap_10fdx | KSTAT_DATA_CHAR | Indicates the link partner device is 10 Mbits/s full duplex capable. |
| lp_cap_10hdx | KSTAT_DATA_CHAR | Indicates the link partner device is 10 Mbits/s half duplex capable. |
| lp_cap_asmpause | KSTAT_DATA_CHAR | Indicates the link partner device is capable of asymmetric pause Ethernet flow control. |
| lp_cap_pause | KSTAT_DATA_CHAR | Indicates the link partner device is capable of symmetric pause Ethernet flow control when lp_cap_pause is set to 1 and lp_cap_asmpause is set to 0. When lp_cap_asmpause is set to 1, lp_cap_pause has the following meaning: |
| lp_rem_fault | KSTAT_DATA_CHAR | Indicates the link partner is experiencing a fault with the link. |
| lp_cap_autoneg | KSTAT_DATA_CHAR | Indicates the link partner device is capable of auto-negotiation. |
| link_asmpause | KSTAT_DATA_CHAR | Indicates the link is operating with asymmetric pause Ethernet flow control. |
| link_pause | KSTAT_DATA_CHAR | Indicates the resolution of the pause capability. Indicates the link is operating with symmetric pause Ethernet flow control when link_pause is set to 1 and link_asmpause is set to 0. When link_asmpause is set to 1 and is relative to a local view of the link, link_pause has the following meaning: |
| link_duplex | KSTAT_DATA_CHAR | Indicates the link duplex. link_duplex = 0: Link is down and duplex is unknown. link_duplex = 1: Link is up and in half duplex mode. link_duplex = 2: Link is up and in full duplex mode. |
| link_up | KSTAT_DATA_CHAR | Indicates whether the link is up or down. |
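A consumer of these statistics has to apply the default-value rule and decode the link_duplex values. The following user-space sketch shows one way to do that, using only the rules stated in this section. All sketch_* function names are hypothetical, and the sketch does not read real kstat data.

```c
#include <assert.h>
#include <string.h>

/*
 * Value to assume when a statistic is not present:
 * 1 for link_up, 0 for every other statistic.
 */
int
sketch_mii_default(const char *stat_name)
{
	return (strcmp(stat_name, "link_up") == 0 ? 1 : 0);
}

/* Translate a link_duplex value into a readable description. */
const char *
sketch_link_duplex_str(int link_duplex)
{
	switch (link_duplex) {
	case 0:
		return ("down, duplex unknown");
	case 1:
		return ("up, half duplex");
	case 2:
		return ("up, full duplex");
	default:
		/* Values other than 0, 1, and 2 are not described here. */
		return ("unrecognized");
	}
}

/*
 * Symmetric pause applies when the pause statistic is 1 and the matching
 * asmpause statistic is 0. The same test works for the cap_, adv_cap_,
 * lp_cap_, and link_ pairs.
 */
int
sketch_is_symmetric_pause(int pause, int asmpause)
{
	return (pause == 1 && asmpause == 0);
}
```

The case in which the asmpause statistic is set to 1 is deliberately left undecoded in this sketch.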
DTrace for Dynamic Instrumentation
DTrace is a comprehensive dynamic tracing facility for examining the behavior of both
user programs and the operating system itself. With DTrace, you can collect data
at strategic locations in your environment, referred to as probes. DTrace enables you
to record such data as stack traces, timestamps, the arguments to a function,
or simply counts of how often the probe fires. Because DTrace enables you
to insert probes dynamically, you do not need to recompile your code. For
more information on DTrace, see the Solaris Dynamic Tracing Guide and the DTrace User Guide. The DTrace BigAdmin System Administration Portal contains
many links to articles, XPerts sessions, and other information about DTrace.