This week I created some query with INTERSECTS
spatial operation in it, it is used
to detect if some line segments ‘intersects’ or ‘overlaps’ a group of other line segments.
That turned out to be a very slow query, although that wasn’t obvious when I ran query over map data of Wolfsburg (because this is a small data set). Why is it slow?
The query looks like this:
SELECT col1,
col2,
/* "geom" is some geometry created outside of this
query beforehand */
INTERSECTS(tb1.geom1, geom) AS IsOverlappingTunnels
FROM tb1,
tb2
WHERE tb1.id = tb2.id
This query doesn’t look gigantic in terms of text size. However, there is one lengthy
operation in it: INTERSECTS
. This is a spatial operation to calculate exact spatial
relationship between two geometries. If the data set is big, like the map data of China, then
all the line segments in China are run through this operation and the performance issue would
be quite noticeable, which was what I encountered this week.
So, Spatialite is slow, isn’t it? No, not really. The problem is spatialite does not use spatial index at all to perform such spatial operation unless you explicitly tell it to so.
See this stackoverflow post. Due to limitations in the SQLite design, using a spatial index in a query isn’t as invisible as it is in PostGIS.
That said, what we can do to accelerate the query above is to build a spatial index over the
geometries stored in tb1
table, so the spatial opeartion INTERSECTS
can make use of
the spatial index to quickly find the answer. To build a spatial index, there are two ways
in Spatialite/SQLite domain:
- build it explicitly within the query like this recipe.
- use SQLite R*Tree module, which is implemented as a virtual table in SQLite.
The speedup should be quite dramatic with the help of spatial indexes. However, the trade-off is more code and less readibility (especially with the first method mentioned above). Can we do better?
To answer this question, let’s have a closer look at the query again; what do we really want to achieve in this query? Do we really need to calculate the EXACT AND ACCURATE spatial relationship among geometries?
If the answer is yes, then building a spatial index is necessary. If the answer is no, which means you only need some ‘approximation’ on the spatial relation, then building spatial index could be overkill. And even if you really need the exact spatial relation, you can still use the approximation as a pre-filtering to avoid some unnecessary, expensive operations beforehand.
So how to calculate approximate spatial relationships? Use Spatialite MBR (Minimum Bounding Rectangle) operations! See here. MBR relationships are fast to evaluate, plus the MBR information is already embeded in every geometry which means no extra calculation is needed to obtain such information.
In other words, if approximate spatial relation is already good enough, then the above query can become the following:
SELECT col1,
col2,
MBRINTERSECTS(tb1.geom1, geom) AS IsOverlappingTunnels
FROM tb1,
tb2
WHERE tb1.id = tb2.id
/* Or use MBR relation as a prefiltering if exact spatial
relation is still required */
SELECT col1,
col2,
/* assume spatial index has already been created
over tb1.geom1's; I do not explicitly show that
in this query for simplicity */
INTERSECTS(tb1.geom1, geom) AS IsOverlappingTunnels
FROM tb1,
tb2
WHERE tb1.id = tb2.id
AND NOT MBRDISJOINT(tb1.geom1, geom)
See the difference? :)