GROUP BY vs DISTINCT

11 Mar 2013

Which one is faster? And what are the differences?

According to this stackoverflow thread, these two expressions are identical and DISTINCT is actually implemented by GROUP BY. If one of them is faster, it’s going to be DISTINCT. However, this thread is discussing MySQL.

For SQLite, you can use EXPLAIN QUERY PLAN to show how each query is executed:

sort on a column having no index

sqlite> EXPLAIN QUERY PLAN select distinct geomarea from bmd;
selectid = 0
   order = 0
    from = 0
  detail = SCAN TABLE bmd_layer_tiles USING COVERING INDEX idx_tfg (~1000000 rows)
 
selectid = 0
   order = 0
    from = 0
  detail = USE TEMP B-TREE FOR DISTINCT

sqlite> EXPLAIN QUERY PLAN select geomarea from bmd group by GeomArea;
selectid = 0
   order = 0
    from = 0
  detail = SCAN TABLE bmd_layer_tiles USING COVERING INDEX idx_tfg (~1000000 rows)
 
selectid = 0
   order = 0
    from = 0
  detail = USE TEMP B-TREE FOR GROUP BY

sort on a column having index

sqlite> EXPLAIN QUERY PLAN select distinct contourelevation from bmd;
selectid = 0
   order = 0
    from = 0
  detail = SCAN TABLE bmd USING COVERING INDEX idx_contourelevation (~1000000 rows)

sqlite> EXPLAIN QUERY PLAN select * from bmd group by contourelevation;
selectid = 0
   order = 0
    from = 0
  detail = SCAN TABLE bmd USING INDEX idx_contourelevation (~1000000 rows)

From their query plans above, they are exactly the same! Actually, you can find out the answer from this link, where it explains that both GROUP BY and DISTINCT use a temporary B-Tree to sort the output rows when there are no indexes created!