Checking ZFS Data Integrity
No fsck utility equivalent exists for ZFS. This utility has traditionally served two
purposes: data repair and data validation.
Data Repair
With traditional file systems, the way in which data is written is
inherently vulnerable to unexpected failure causing data inconsistencies. Because a traditional file system is
not transactional, unreferenced blocks, bad link counts, or other inconsistent data structures are
possible. The addition of journaling does solve some of these problems, but can
introduce additional problems when the log cannot be rolled back. With ZFS, none
of these problems exist. The only way for inconsistent data to exist on
disk is through hardware failure (in which case the pool should have been
redundant) or through a bug in the ZFS software.
Given that the fsck utility is designed to repair known pathologies specific to
individual file systems, writing such a utility for a file system with no
known pathologies is impossible. Future experience might prove that certain data corruption problems
are common enough and simple enough that a repair utility can be
developed, but these problems can always be avoided by using redundant pools.
If your pool is not redundant, the chance that data corruption can
render some or all of your data inaccessible is always present.
Data Validation
In addition to data repair, the fsck utility validates that the data on
disk has no problems. Traditionally, this task is done by unmounting the file
system and running the fsck utility, possibly taking the system to single-user mode
in the process. This scenario results in downtime that is proportional to the
size of the file system being checked. Instead of requiring an explicit utility
to perform the necessary checking, ZFS provides a mechanism to perform routine checking
of all data. This functionality, known as scrubbing, is commonly used in memory and
other systems as a method of detecting and preventing errors before they result
in hardware or software failure.
Controlling ZFS Data Scrubbing
Whenever ZFS encounters an error, either through scrubbing or when accessing a file
on demand, the error is logged internally so that you can get
a quick overview of all known errors within the pool.
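One way to get that overview from a script is zpool status -x, which reports only pools that have errors or are otherwise unavailable. The following is a minimal sketch, assuming that the command prints the single line "all pools are healthy" when nothing is wrong; pools_healthy is a hypothetical helper name, not part of ZFS.

```shell
# Succeed only if zpool status -x reports no problem pools.
# pools_healthy is a hypothetical helper name, not a ZFS command.
# Assumes the healthy case prints exactly "all pools are healthy".
pools_healthy() {
    [ "$(zpool status -x)" = "all pools are healthy" ]
}
```

A monitoring script might call pools_healthy and, when it fails, run zpool status -v to see which pool and files are affected.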
Explicit ZFS Data Scrubbing
The simplest way to check your data integrity is to initiate an
explicit scrubbing of all data within the pool. This operation traverses all the
data in the pool once and verifies that all blocks can be read.
Scrubbing proceeds as fast as the devices allow, though the priority of scrub
I/O remains below that of normal operations. This operation might negatively impact performance, though
the file system should remain usable and nearly as responsive while the scrubbing
occurs. To initiate an explicit scrub, use the zpool scrub command. For example:
# zpool scrub tank
The status of the current scrub can be displayed in the zpool status
output. For example:
# zpool status -v tank
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Wed Aug 30 14:02:24 2006
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0

errors: No known data errors
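When the result of a scrub is checked from a script, the scrub: line is the part of this output that matters. The following is a minimal sketch, assuming the "scrub completed with N errors" wording shown in the example above; the wording can differ between ZFS releases, and the function names scrub_line and scrub_clean are hypothetical.

```shell
# Print the text of the "scrub:" line from zpool status output read on stdin.
# scrub_line is a hypothetical helper name, not part of ZFS.
scrub_line() {
    sed -n 's/^[[:space:]]*scrub:[[:space:]]*//p'
}

# Succeed only if the scrub line reports a completed scrub with 0 errors.
# Assumes the "scrub completed with N errors" wording from the example output.
scrub_clean() {
    scrub_line | grep -q '^scrub completed with 0 errors'
}
```

For example, zpool status tank | scrub_clean && echo "tank: last scrub was clean" prints the message only when the most recent scrub finished without errors.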
Note that only one active scrubbing operation per pool can occur at one
time.
You can stop a scrub that is in progress by using the
-s option. For example:
# zpool scrub -s tank
In most cases, a scrub operation to ensure data integrity should continue to
completion. Stop a scrub at your own discretion if system performance is impacted
by a scrub operation.
Performing routine scrubbing also guarantees continuous I/O to all disks on the system.
Routine scrubbing has the side effect of preventing power management from placing idle
disks in low-power mode. If the system is generally performing I/O all the
time, or if power consumption is not a concern, then this issue can
safely be ignored.
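Routine scrubbing is usually driven by a scheduled job rather than typed by hand. The following is a minimal sketch of such a job, assuming that every pool on the system should be scrubbed; scrub_all_pools is a hypothetical helper name, while zpool list -H -o name and zpool scrub are standard zpool subcommands.

```shell
# Start a scrub on every pool the system knows about.
# scrub_all_pools is a hypothetical helper name, not part of ZFS.
scrub_all_pools() {
    # -H omits the header line and -o name prints only pool names, one per line.
    zpool list -H -o name | while read -r pool; do
        zpool scrub "$pool" || echo "could not start scrub on $pool" >&2
    done
}
```

A script that defines this function and then calls it could be run from the root crontab at whatever interval suits the site. How often to scrub is a local decision that depends on the hardware and on how much I/O load is acceptable.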
For more information about interpreting zpool status output, see Querying ZFS Storage Pool Status.
ZFS Data Scrubbing and Resilvering
When a device is replaced, a resilvering operation is initiated to move data
from the good copies to the new device. This action is a
form of disk scrubbing. Therefore, only one such action can happen at a
given time in the pool. If a scrubbing operation is in progress, a
resilvering operation suspends the current scrubbing, and restarts it after the resilvering is
complete.
For more information about resilvering, see Viewing Resilvering Status.