Spatialite Slow Queries

26 Jul 2013

This week I created some query with INTERSECTS spatial operation in it, it is used to detect if some line segments ‘intersects’ or ‘overlaps’ a group of other line segments.

That turned out to be a very slow query, although that wasn’t obvious when I ran query over map data of Wolfsburg (because this is a small data set). Why is it slow?

The query looks like this:

SELECT col1,
       col2,
       /* "geom" is some geometry created outside of this 
           query beforehand */
       INTERSECTS(tb1.geom1, geom) AS IsOverlappingTunnels
  FROM tb1,
       tb2 
 WHERE tb1.id = tb2.id

This query doesn’t look gigantic in terms of text size. However, there is one lengthy operation in it: INTERSECTS. This is a spatial operation to calculate exact spatial relationship between two geometries. If the data set is big, like the map data of China, then all the line segments in China are run through this operation and the performance issue would be quite noticeable, which was what I encountered this week.

So, Spatialite is slow, isn’t it? No, not really. The problem is spatialite does not use spatial index at all to perform such spatial operation unless you explicitly tell it to so.

See this stackoverflow post. Due to limitations in the SQLite design, using a spatial index in a query isn’t as invisible as it is in PostGIS.

That said, what we can do to accelerate the query above is to build a spatial index over the geometries stored in tb1 table, so the spatial opeartion INTERSECTS can make use of the spatial index to quickly find the answer. To build a spatial index, there are two ways in Spatialite/SQLite domain:

  1. build it explicitly within the query like this recipe.
  2. use SQLite R*Tree module, which is implemented as a virtual table in SQLite.

The speedup should be quite dramatic with the help of spatial indexes. However, the trade-off is more code and less readibility (especially with the first method mentioned above). Can we do better?

To answer this question, let’s have a closer look at the query again; what do we really want to achieve in this query? Do we really need to calculate the EXACT AND ACCURATE spatial relationship among geometries?

If the answer is yes, then building a spatial index is necessary. If the answer is no, which means you only need some ‘approximation’ on the spatial relation, then building spatial index could be overkill. And even if you really need the exact spatial relation, you can still use the approximation as a pre-filtering to avoid some unnecessary, expensive operations beforehand.

So how to calculate approximate spatial relationships? Use Spatialite MBR (Minimum Bounding Rectangle) operations! See here. MBR relationships are fast to evaluate, plus the MBR information is already embeded in every geometry which means no extra calculation is needed to obtain such information.

In other words, if approximate spatial relation is already good enough, then the above query can become the following:

SELECT col1,
       col2,
       MBRINTERSECTS(tb1.geom1, geom) AS IsOverlappingTunnels
  FROM tb1,
       tb2 
 WHERE tb1.id = tb2.id

 /* Or use MBR relation as a prefiltering if exact spatial 
    relation is still required */
SELECT col1,
       col2,
       /* assume spatial index has already been created 
          over tb1.geom1's; I do not explicitly show that 
          in this query for simplicity */
       INTERSECTS(tb1.geom1, geom) AS IsOverlappingTunnels
  FROM tb1,
       tb2 
 WHERE tb1.id = tb2.id
   AND NOT MBRDISJOINT(tb1.geom1, geom)

See the difference? :)