###
17.2.3. `HASH`

Partitioning

Partitioning by `HASH`

is used primarily to
ensure an even distribution of data among a predetermined number
of partitions. With range or list partitioning, you must specify
explicitly into which partition a given column value or set of
column values is to be stored; with hash partitioning, MySQL
takes care of this for you, and you need only specify a column
value or expression based on a column value to be hashed and the
number of partitions into which the partitioned table is to be
divided.

To partition a table using `HASH`

partitioning,
it is necessary to append to the `CREATE TABLE`

statement a ```
PARTITION BY HASH
(
````expr`

)

clause, where
`expr`

is an expression that returns an
integer. This can simply be the name of a column whose type is
one of MySQL's integer types. In addition, you will most likely
want to follow this with a ```
PARTITIONS
````num`

clause, where
`num`

is a non-negative integer
representing the number of partitions into which the table is to
be divided.

For example, the following statement creates a table that uses
hashing on the `store_id`

column and is divided
into 4 partitions:

CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY HASH(store_id)
PARTITIONS 4;

If you do not include a `PARTITIONS`

clause,
the number of partitions defaults to `1`

.
**Exception**: For ```
NDB
Cluster
```

tables, the default number of partitions is
the same as the number of cluster data nodes, possibly modified
to take into account any `MAX_ROWS`

setting in
order to ensure that all rows can fit into the partitions. (See
Chapter 16, *MySQL Cluster*.)

Using the `PARTITIONS`

keyword without a number
following it results in a syntax error.

You can also use an SQL expression that returns an integer for
`expr`

. For instance, you might want to
partition based on the year in which an employee was hired. This
can be done as shown here:

CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY HASH( YEAR(hired) )
PARTITIONS 4;

You may use any function or other expression for
`expr`

that is valid in MySQL, so long
as it returns a non-constant, non-random integer value. (In
other words, it should be varying but deterministic.) However,
you should keep in mind that this expression is evaluated each
time a row is inserted or updated (or possibly deleted); this
means that very complex expressions may give rise to performance
issues, particularly when performing operations (such as batch
inserts) that affect a great many rows at one time.

The most efficient hashing function is one which operates upon a
single table column and whose value increases or decreases
consistently with the column value, as this allows for
“pruning” on ranges of partitions. That is, the
more closely that the expression varies with the value of the
column on which it is based, the more efficiently MySQL can use
the expression for hash partitioning.

For example, where `date_col`

is a column of
type `DATE`

, then the expression
`TO_DAYS(date_col)`

is said to vary directly
with the value of `date_col`

, because for every
change in the value of `date_col`

, the value of
the expression changes in a consistent manner. The variance of
the expression `YEAR(date_col)`

with respect to
`date_col`

is not quite as direct as that of
`TO_DAYS(date_col)`

, because not every possible
change in `date_col`

produces an equivalent
change in `YEAR(date_col)`

. Even so,
`YEAR(date_col)`

is a good candidate for a
hashing function, because it varies directly with a portion of
`date_col`

and there is no possible change in
`date_col`

that produces a disproportionate
change in `YEAR(date_col)`

.

By way of contrast, suppose that you have a column named
`int_col`

whose type is `INT`

.
Now consider the expression ```
POW(5-int_col,3) +
6
```

. This would be a poor choice for a hashing function
because a change in the value of `int_col`

is
not guaranteed to produce a proportional change in the value of
the expression. Changing the value of `int_col`

by a given amount can produce by widely different changes in the
value of the expression. For example, changing
`int_col`

from `5`

to
`6`

produces a change of `-1`

in the value of the expression, but changing the value of
`int_col`

from `6`

to
`7`

produces a change of `-7`

in the expression value.

In other words, the more closely the graph of the column value
*versus* the value of the
expression follows a straight line as traced by the equation
`y=``n`

x

where
`n`

is some nonzero constant, the
better the expression is suited to hashing. This has to do with
the fact that the more nonlinear an expression is, the more
uneven the distribution of data among the partitions it tends to
produce.

In theory, pruning is also possible for expressions involving
more than column value, but determining which of these are
suitable can be quite difficult and time-consuming. For this
reason, the use of hashing expressions involving multiple
columns is not particularly recommended.

When `PARTITION BY HASH`

is used, MySQL
determines which partition of `num`

partitions to use based on the modulus of the result of the user
function. In other words, for an expression
`expr`

, the partition in which the
record is stored is partition number
`N`

, where
`N`

=
MOD(`expr`

,
`num`

)

. For example, suppose
table `t1`

is defined as follows, so that it
has 4 partitions:

CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE)
PARTITION BY HASH( YEAR(col3) )
PARTITIONS 4;

If you insert a record into `t1`

whose
`col3`

value is
`'2005-09-15'`

, then the partition in which it
is stored is determined as follows:

MOD(YEAR('2005-09-01'),4)
= MOD(2005,4)
= 1

MySQL 5.1 also supports a variant of
`HASH`

partitioning known as *linear
hashing* which employs a more complex algorithm for
determining the placement of new rows inserted into the
partitioned table. See
Section 17.2.3.1, “`LINEAR HASH`

Partitioning”, for a description of
this algorithm.

The user function is evaluated each time a record is inserted or
updated. It may also — depending on the circumstances
— be evaluated when records are deleted.

**Note**: If the table to be
partitioned has a `UNIQUE`

key, then any
columns supplied as arguments to the `HASH`

user function or to the `KEY`

's
`column_list`

must be part of that key.