We help one of our clients operate a high traffic API that, among other things, returns data that is associated with an IP address range. It’s similar to querying for information from WHOIS.
The IP address ranges are specified in a MySQL database with ip_start and ip_end columns. The IP addresses are first converted to their representation as 32-bit integers.
For instance, the range 22.214.171.124 to 126.96.36.199 would look like:
ip_start ip_end ...
Read on for more about how we helped optimize the queries that search for an IP within these ranges.
Loading new information into the API required that we test whether a new IP range overlapped with a current range in the database.
It is relatively easy to write a query that tests whether one range intersects another in SQL:
Unfortunately, even with indexes, this query requires a full table scan in MySQL. On a large table, the query took around 0.25 seconds. While insignificant for one-off queries, loading thousands of new ranges made the time really add up.
We scoped out various solutions, but the one that ended up working best was modeling the problem using MySQL Spatial Extensions.
While IP ranges are not a typical use case for spatial extensions, they can be modeled as 1-dimensional lines. And given that spatial indexes are very efficient at doing queries like Intersects(), we found the solution to be really performant.
The following SQL snippet creates a column called ip_range on the table that is then populated with rectangles in 2-dimensions that represent the IP ranges. We only care about 1-dimension, so the height of the rectangles is unimportant:
With that column and the spatial index, queries to test for range intersection can be written as:
This query achieves the same thing as the earlier query, but runs much faster.
Have you modeled a problem with the geospatial features of a database?
Interested in leveling up your coding skills from the same authors of the Big Nerd Ranch Guide? Subscribe to The Frontier today!