Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Programming
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Databases
Mail Systems
openSolaris
Eclipse Documentation
Techotopia.com
Virtuatopia.com

How To Guides
Virtualization
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Windows
Problem Solutions

  




 

 

I.3. Restrictions on Subqueries

  • Known bug to be fixed later: If you compare a NULL value to a subquery using ALL, ANY, or SOME, and the subquery returns an empty result, the comparison might evaluate to the non-standard result of NULL rather than to TRUE or FALSE.

  • A subquery's outer statement can be any one of: SELECT, INSERT, UPDATE, DELETE, SET, or DO.

  • Subquery optimization for IN is not as effective as for the = operator or for IN(value_list) constructs.

    A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.

    The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery:

    SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
    

    The optimizer rewrites the statement to a correlated subquery:

    SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
    

    If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(M√óN), rather than O(M+N) as it would be for an uncorrelated subquery.

    An implication is that an IN subquery can be much slower than a query written using an IN(value_list) construct that lists the same values that the subquery would return.

  • In general, you cannot modify a table and select from the same table in a subquery. For example, this limitation applies to statements of the following forms:

    DELETE FROM t WHERE ... (SELECT ... FROM t ...);
    UPDATE t ... WHERE col = (SELECT ... FROM t ...);
    {INSERT|REPLACE} INTO t (SELECT ... FROM t ...);
    

    Exception: The preceding prohibition does not apply if you are using a subquery for the modified table in the FROM clause. Example:

    UPDATE t ... WHERE col = (SELECT (SELECT ... FROM t...) AS _t ...);
    

    Here the prohibition does not apply because a subquery in the FROM clause is materialized as a temporary table, so the relevant rows in t have already been selected by the time the update to t takes place.

  • Row comparison operations are only partially supported:

    • For expr IN (subquery), expr can be an n-tuple (specified via row constructor syntax) and the subquery can return rows of n-tuples.

    • For expr op {ALL|ANY|SOME} (subquery), expr must be a scalar value and the subquery must be a column subquery; it cannot return multiple-column rows.

    In other words, for a subquery that returns rows of n-tuples, this is supported:

    (val_1, ..., val_n) IN (subquery)
    

    But this is not supported:

    (val_1, ..., val_n) op {ALL|ANY|SOME} (subquery)
    

    The reason for supporting row comparisons for IN but not for the others is that IN is implemented by rewriting it as a sequence of = comparisons and AND operations. This approach cannot be used for ALL, ANY, or SOME.

  • Row constructors are not well optimized. The following two expressions are equivalent, but only the second can be optimized:

    (col1, col2, ...) = (val1, val2, ...)
    col1 = val1 AND col2 = val2 AND ...
    
  • Subqueries in the FROM clause cannot be correlated subqueries. They are materialized (executed to produce a result set) before evaluating the outer query, so they cannot be evaluated per row of the outer query.

  • The optimizer is more mature for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as a join.

    An exception occurs for the case where an IN subquery can be rewritten as a SELECT DISTINCT join. Example:

    SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE condition);
    

    That statement can be rewritten as follows:

    SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col AND condition;
    

    But in this case, the join requires an extra DISTINCT operation and is not more efficient than the subquery.

  • Possible future optimization: MySQL does not rewrite the join order for subquery evaluation. In some cases, a subquery could be executed more efficiently if MySQL rewrote it as a join. This would give the optimizer a chance to choose between more execution plans. For example, it could decide whether to read one table or the other first.

    Example:

    SELECT a FROM outer_table AS ot
    WHERE a IN (SELECT a FROM inner_table AS it WHERE ot.b = it.b);
    

    For that query, MySQL always scans outer_table first and then executes the subquery on inner_table for each row. If outer_table has a lot of rows and inner_table has few rows, the query probably will not be as fast as it could be.

    The preceding query could be rewritten like this:

    SELECT a FROM outer_table AS ot, inner_table AS it
    WHERE ot.a = it.a AND ot.b = it.b;
    

    In this case, we can scan the small table (inner_table) and look up rows in outer_table, which will be fast if there is an index on (ot.a,ot.b).

  • Possible future optimization: A correlated subquery is evaluated for each row of the outer query. A better approach is that if the outer row values do not change from the previous row, do not evaluate the subquery again. Instead, use its previous result.

  • Possible future optimization: A subquery in the FROM clause is evaluated by materializing the result into a temporary table, and this table does not use indexes. This does not allow the use of indexes in comparison with other tables in the query, although that might be useful.

  • Possible future optimization: If a subquery in the FROM clause resembles a view to which the merge algorithm can be applied, rewrite the query and apply the merge algorithm so that indexes can be used. The following statement contains such a subquery:

    SELECT * FROM (SELECT * FROM t1 WHERE t1.t1_col) AS _t1, t2 WHERE t2.t2_col;
    

    The statement can be rewritten as a join like this:

    SELECT * FROM t1, t2 WHERE t1.t1_col AND t2.t2_col;
    

    This type of rewriting would provide two benefits:

    • It avoids the use of a temporary table for which no indexes can be used. In the rewritten query, the optimizer can use indexes on t1.

    • It gives the optimizer more freedom to choose between different execution plans. For example, rewriting the query as a join allows the optimizer to use t1 or t2 first.

  • Possible future optimization: For IN, = ANY, <> ANY, = ALL, and <> ALL with non-correlated subqueries, use an in-memory hash for a result result or a temporary table with an index for larger results. Example:

    SELECT a FROM big_table AS bt
    WHERE non_key_field IN (SELECT non_key_field FROM table WHERE condition)
    

    In this case, we could create a temporary table:

    CREATE TABLE t (key (non_key_field))
    (SELECT non_key_field FROM table WHERE condition)
    

    Then, for each row in big_table, do a key lookup in t based on bt.non_key_field.


 
 
  Published under the terms of the GNU General Public License Design by Interspire