Distance Searching and MySQL
January 28, 2015
So here’s a fairly common situation: Your organization has a handful of locations, and you want to provide a tool for people to find the one closest to them. Whether it’s a selection of retail partners providing your product or just some branch office locations, the convenience of letting people know where they can find you is important – and with a little well-organized information, we can make this happen.
It’s common to have a list of locations stored in your site’s database, usually with the name, address, locality and more. This may be good information for someone traveling the roads, but not very useful for pinpointing a position on the globe. What we need to work with is a simple set of coordinates: latitude and longitude. This is simple, easy to store and can be indexed for a rapid search. With a little math, we can use this to narrow down locations within a radius.
For Those Who Wish to Follow Along
For the purposes of this example, I’ll be using the U.S. data from the GeoNames data sets. You can download a zip file for most countries, containing a TSV file, which can be imported into your MySQL database via the tool of your choosing. This is a good sample to illustrate the performance of our queries, with more than two million records to search. We’ll want to be sure to index the “latitude” and “longitude” columns to speed up our searching.
Here’s the create table statement for the U.S. data that I used:
CREATE TABLE `geoname` ( `geonameid` int(11) unsigned NOT NULL AUTO_INCREMENT, `name` varchar(200) DEFAULT NULL, `asciiname` varchar(200) CHARACTER SET ascii DEFAULT NULL, `alternatenames` varchar(10000) CHARACTER SET ascii DEFAULT NULL, `latitude` double DEFAULT NULL, `longitude` double DEFAULT NULL, `feature class` char(1) DEFAULT NULL, `feature code` varchar(10) DEFAULT NULL, `country code` char(2) DEFAULT NULL, `cc2` varchar(60) DEFAULT NULL, `admin1 code` varchar(20) DEFAULT NULL, `admin2 code` varchar(80) DEFAULT NULL, `admin3 code` varchar(20) DEFAULT NULL, `admin4 code` varchar(20) DEFAULT NULL, `population` bigint(20) DEFAULT NULL, `elevation` int(11) DEFAULT NULL, `dem` int(11) DEFAULT NULL, `timezone` varchar(40) DEFAULT NULL, `modification date` date DEFAULT NULL, PRIMARY KEY (`geonameid`), KEY `latitude` (`latitude`), KEY `longitude` (`longitude`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Using the Great-Circle Distance Formula and Limiting Distance
The great-circle distance is the shortest distance between two points on the surface of a sphere. This is determined by multiplying the radius of the sphere by the central angle of the two points (their latitude and longitude coordinates) in radians. Given the radius of the Earth is approximately 3,956 miles (the Earth isn’t exactly a perfect sphere, but this is close enough), we can express this in MySQL like so:
3956 * ACOS(COS(RADIANS(`orig_latitude`)) * COS(RADIANS(`latitude`)) * COS(RADIANS(`orig_longitude`) - RADIANS(`longitude`)) + SIN(RADIANS(`orig_latitude`)) * SIN(RADIANS(`latitude`)))
While that will provide the distance between the originating coordinate and the point of the location returned by the query, it’s quite calculation-heavy, and being run against a large data set can drastically hurt the query performance. The good news is that there’s a way around this. If we’re querying for locations nearby to a user, we can use our indices on the latitudes and longitudes to reduce the seeking of records to locations within proximity of the user’s latitude/longitude.
Since there are 69 miles per degree of latitude, we can limit our northern and southern bounds like so in the WHERE clause (where “radius” is our search radius):
`latitude` BETWEEN `orig_latitude` - (`radius` / 69) AND `orig_latitude` + (`radius` / 69)
Determining the east and west bounds is slightly different; since the miles per degree depends on the distance from the equator, we’ll use this function instead:
`longitude` BETWEEN `orig_longitude` - (`radius` / (69 * COS(RADIANS(`orig_latitude`)))) AND `orig_longitude` + (`radius` / (69 * COS(RADIANS(`orig_latitude`))))
So now that we have our formulas, it’s time to put the query together:
SET @orig_latitude = 32.867073, @orig_longitude = -96.769410, @radius = 10 ; SELECT * FROM ( SELECT `geonameid`, `name`, 3956 * ACOS(COS(RADIANS(@orig_latitude)) * COS(RADIANS(`latitude`)) * COS(RADIANS(@orig_longitude) - RADIANS(`longitude`)) + SIN(RADIANS(@orig_latitude)) * SIN(RADIANS(`latitude`))) AS `distance` FROM `geoname` WHERE `latitude` BETWEEN @orig_latitude - (@radius / 69) AND @orig_latitude + (@radius / 69) AND `longitude` BETWEEN @orig_longitude - (@radius / (69 * COS(RADIANS(@orig_latitude)))) AND @orig_longitude + (@radius / (69* COS(RADIANS(@orig_latitude)))) ) r WHERE `distance` < @radius ORDER BY `distance` ASC ;
A few notes: I placed the main select as a subquery to limit by distance, as it is possible for the bounding-box limits for the latitude and longitude to produce a limited number of results that are outside the radius. As for the user’s latitude and longitude, this could be gathered via HTML5 Geolocation, but for the sake of this sample, hard-coding our office’s location will suffice.
We’re able to seek through 2,104,095 records, and return the 2,321 records with calculated distances within the search radius, all within 0.0474 seconds.