Correlated Subqueries

01 May 2015

What are correlated subqueries? Excerpts from Wikipedia:

A correlated subquery (also known as synchronized subquery) is a subquery nested inside another query and uses values from the outer query. The subquery is evaluated once for each row processed by the outer query.

An example of correlated subquery:

INSERT INTO glueing_intermediate_info 
            (NewChainId, ShapeIndex, OldNodeId)
     SELECT t1.NewChainId,
            t1.ToPoint,
            (
             SELECT cno.Node
               FROM glueing_original_cno cno
              WHERE cno.Chain  = t1.OldChainId
                AND cno.NodeTp = (
                    CASE t1.Reversed
                      WHEN 0 THEN H_NDH_CNO_NODTP_NOD_B_NODE()
                      WHEN 1 THEN H_NDH_CNO_NODTP_NOD_A_NODE()
                    END
                )
            ) 
       FROM glueing_mapline_info t1,
            glueing_mapline_info t2
      WHERE t1.newchainid = t2.newchainid
        AND t1.topoint    = t2.frompoint;

In the example above, the correlated subquery is the SELECT statement over glueing_original_cno table. This query depends on the value of OldChainId column in each row of the glueing_mapline_info table from the outer SELECT query.

If the result set from the outer query has M rows, the result set from the inner query has N rows, AND there are no proper index created over the table (glueing_original_cno in this case) in the correlated subquery, then the result of running the entire query would be a M x N table scan. What a daunting performance hit!

In other words, correlated subqueries are generally considered a bad practice and should be avoided. Such query can be optimized/rewritten/avoided by:

  1. Using JOINS. In the example above, the correlated subquery can be trivially rewritten with joining table glueing_original_cno with table glueing_mapline_info in the outer query. Of course, proper index’d better be created over glueing_original_cno before joining.

  2. Create a temperary table or a view. The idea is that the inner query will be evaluated only once and used subsequently.

Be aware that correlated subqueries and ordinary nested subqueries are different. Simple nested subqueries do not use values from the outer query therefore got evaluated only once before entering the maing loop of the outer query. An example of ordinary subquery would be:

SELECT *
  FROM A
 WHERE A.Id IN (SELECT DISTINCT Id
                           FROM B);

The dependency direction is reversed: the outer loop depends on the inner, not the other way around. The number of iterations now would be M+N :-)

REFERENCE