Reverse Geocoding on Snowflake
In this blog post we will show how you can use OpenStreetMap data from the Snowflake data marketplace and the geospatial feature in the Snowflake data platform to reverse geocode latitude / longitude points. Reverse geocoding is the process of converting a point location (latitude, longitude) to a readable address or place name. This permits the identification of nearby street addresses, places, and/or area subdivisions such as a neighbourhood, county, state, or country.
Reverse geo coding Latitude: -37.8719989, Longitude: 145.2127641 will give you 17, Richardson Rise, Wantirna South, City of Knox, Victoria, Australia, POSTCODE – 3152
Table of Contents
Consider you are an IOT company. You have a smart watch device which tracks the user’s location as a coordinate continuously. The user visited a lot of places and wants to know the restaurant address that he visited around 2 pm. In this use case, the only thing you will be able to retrieve from the device is Latitude and Longitude. Now from this point to get the address Reverse Geocoding is required.
Consider you have a fleet of rental cars and you want to track your car every 10 minutes by getting the address. If you have a device fitted in your car that sends you the location point of the car, by reverse geocoding you can fetch the address and track your vehicle.
There are many other use cases where reverse geocoding can be used for example, tracking location of payment, tagging address on photos or videos etc.
We will use Sonra’s OpenStreetMap (OSM Australia data set from Snowflake’s Data Marketplace. We applied various transformations to this data and put the logic into Views
- V_OSM_AUS_FL_ADDR – Contains the first level address like house number, name of shop, name of building etc.
2. V_OSM_AUS_BOUNDARY – Contains administrative boundaries
3. V_OSM_AUS_STREET – Contains street and railway line data
4. V_OSM_AUS_STREET_NODE – Contains information on the nodes that make up the streets and railway lines
Using the query in the appendix we are able to retrieve various pieces of information across three hierarchical levels for a given geo point (lat / lon)
Level 1: House number, building name
Level 2: Street name including the nodes (lat / lon) that make up the street name
Level 3: Boundary Hierarchy, e.g. county, state, country etc.
When we query our Views by giving a lat / lon pair we get the address including the house number, name of a shop, name of a building., street name and the boundary hierarchy etc.
Let’s look at some examples
Lets fetch the address including house number from a location point.
Run the query in the appendix and use the following coordinates to get the address info
Latitude: -37.8719989, Longitude: 145.2127641
Point on Map:
We can see that it fetched the house number, street name and the boundary hierarchy with postcode.
Now lets fetch the address which includes house number along with the building name for a geo location by running the same query
Latitide: -33.9438680, Longitude: 151.0497926
The query retrieved the house number, building name, street name and the boundary hierarchy along with the postcode.
In this example, we fetch the address for a shop
Latitude: -32.7638302, Longitude: 151.5903418
We got back the shop name, street name and the boundary hierarchy along with the postcode.
Some points don’t have any houses, shops nearby. Consider a highway connecting two cities. For such points we will consider second level addresses.
Let’s run the query for such a geo location.
Latitude: -32.436829, Longitude: 139.961628
We can see in the map there are no houses or addresses nearby and as a result it fetched the street name and boundary hierarchy along with the postcode.
In this blog post, we have used Sonra’s OpenStreetMap Australia data set from the Snowflake data marketplace to get a readable address from a point on the map (reverse geocoding).
We have also made available OpenStreetMap data for the US, UK, Germany, France and many other countries on the Snowflake data marketplace.
To get started you can subscribe to our OSM dataset or else reach out to us through our website.
You can also drop us a mail at [email protected] and we will try to get back to you as early as possible.
We will give a latitude and longitude as an input to this query to return address information.
CONCAT(LISTAGG(NAME, ', ') WITHIN GROUP (ORDER BY RANK DESC), IFNULL(CONCAT(', POSTCODE - ', SUBSTRING(MAX(POSTCODE), 4)), '')) AS ADDRESS
FL_ADDR AS NAME,
40 AS RANK,
CONCAT('C - ', POSTCODE) AS POSTCODE
AND LON BETWEEN 144 AND 146
AND LAT BETWEEN -39 AND -37 QUALIFY ROW_NUMBER() OVER (
30 AS RANK,
CONCAT('B - ', POSTCODE) AS POSTCODE
INNER JOIN (
-37.8719989)) AS Distance
AND LON BETWEEN 144 AND 146
AND LAT BETWEEN -39 AND -37) NODE ON
STREET.NODE_ID = NODE.ID QUALIFY ROW_NUMBER() OVER (
CONCAT('A - ', POSTCODE) AS POSTCODE
Let us understand the query by breaking it into 2 parts, outer query and inner query. Inner query can be further divided into 3 subparts
Part 1 (Outer Query): Aggregate the output of sub query to form the address
In this we aggregate the name column into 1 row using LISTAGG function and take the max of postcode which comes from the subquery. Subquery consists of 3 parts which can be termed as 3 levels. Post code from the first level will be given higher priority than the postcode in the last level. For example, if a postcode is available at first level i.e. house number, shop etc we consider that postcode and ignore the postcodes at the lower level.
Part 2 (Inner Query): Fetching address at various level and merge them using union all
Level 1: We fetch the names and postcodes of the first level address within 100 meters distance from the point. For reducing the number of data points for calculating the distance we create a bounding box in where clause. We append the postcode with ‘C – ‘ so that while taking the max of the Postcode this is given priority over the other two levels. Using the qualify clause we select the nearest name and its postcode if available.
Level 2: We fetch the street names along with their postcode if available within 2000 meters and pick the nearest one using the qualify clause. For postcode we append ‘B – ‘ which says that it takes the second highest priority.
Level 3: We have a view which includes the hierarchical boundaries and rank associated with it. Low level boundaries are given higher rank so while ordering in descending order it comes first. In this we list the point under hierarchical boundaries at various levels.
We are offering various GEO datasets. You can find our offering here: