The Data Marketplace. A missing piece in modern data architecture
What is a Data Marketplace?
Data Marketplaces are a relatively recent phenomenon in data management. They bring together providers and buyers of data. Organisations were able to buy data from third parties in the past. However, Data Marketplaces make it significantly easier to find and buy data. They also cut out the middleman and bring providers and consumers of data in direct contact with each other. Finally, Data Marketplaces are good for competition and lower the bar of entry to new providers.
The Snowflake Data Marketplace?
Snowflake is one of the providers of a Data Marketplace.
In my opinion It is the hidden gem of the Snowflake data platform. If executed well the Data Marketplace will be the engine of massive growth for the company. I will explain in a minute. Let’s first look at the Snowflake Data Marketplace itself and how you as a consumer of data can benefit from it.
Just like any other market, the Snowflake Data Marketplace connects buyers and sellers. Instead of buying apples or potatoes you can purchase a subscription to data.
The Data Marketplace consists of a catalog where you can browse for data sets and request access from the provider of the data.
Some of the data sets are free to access. Other data sets require a subscription.
As you can see from the figure above the Snowflake Data Marketplace is divided into various categories, e.g. Business, Travel Marketing etc. One of the categories is location data (Local). Sonra has published various open data sets in this category.
Once you have requested access to a data set you need to agree terms and conditions with the seller. Once this is done, the provider of data will share out the data set and you can use it to enrich your in-house data.
Benefits of the Data Marketplace
The Data Marketplace makes use cases possible that were not achievable in the past. It is the missing piece in a modern data architecture. Here are some benefits.
- Data scientists can significantly improve the accuracy and value of predictive models by adding new features to your model. The options are mind boggling. You can expect a huge Return on Investment.
- Reference and master data such as ISO codes, geography hierarchies, address data etc. can be consumed without worrying about ETL. The provider of data takes care of it.
- Organisations can monetize their own data.
The technical foundation of the Snowflake Data Marketplace
The Data Marketplace is built on top of the Snowflake data sharing feature which in turn is made possible by separating storage from compute. While other vendors also claim to support this feature, few actually do. One tell tale sign of tight coupling of compute and storage is lengthy cluster start up times. With Snowflake this is an instantaneous process.
As a Snowflake client you can share data with other Snowflake clients or even with third parties that don’t have a Snowflake account. In the past, organisations had to use FTP or APIs to exchange data between suppliers, regulators, clients etc. Data sharing simplifies the whole data exchange life cycle. You don’t need to create an API or export your data to XML and upload it to an FTP server from where the consuming party needs to download the data and convert it back into a database. With the data sharing feature datasets from third parties are just a JOIN away.
The great thing about data sharing in Snowflake is that it is available across all three cloud platforms and data can also be replicated across multiple regions.
The Snowflake Data Marketplace in action
Now that we know how the Snowflake Data Marketplace works let’s dive into an example.
In this scenario we are a property website and want to enrich our in house data with location data from the Snowflake Marketplace.
We first request access to the OpenStreetMap UK and Administrative Boundaries UK data sets
Step 1: Click on Get Data
Step 2: Select the Database Name and Roles that can access the data
Now you can see the database in your explorer
Use case 1: Reverse geo coding
For the first use case we look at reverse geo coding amenities that are near a property.
Sample Queries
- Count of supermarkets around properties in 500 meter distance in descending order
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
select INDEX AS PROP_ID, PRICES, BEDS, ADDRESS, COUNT(SHOP.ID) AS COUNT_OF_SUPERMARKET from V_OSM_UK_SHOP SHOP INNER JOIN PROPERTY PROP ON ST_DWITHIN(SHOP.COORDINATES, PROP.COORDINATES, 500) WHERE SHOP.SHOP = 'supermarket' GROUP BY PROP_ID, PRICES, BEDS, ADDRESS ORDER BY COUNT_OF_SUPERMARKET DESC; |
We use the ST_DWITHIN geospatial function to find the number of amenities nearby the geo coordinates (lat/lon) of a property.
2. Get a list of all the amenities in 1000 meter distance around Property ID, ‘3692’
Next, let’s look at the amenities nearby a particular property and calculate the distance using ST_DWITHIN.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
select AMENITY.ID, AMENITY.AMENITY, AMENITY.NAME, ST_DISTANCE(AMENITY.COORDINATES,PROP.COORDINATES) as DISTANCE from V_OSM_UK_AMENITY AMENITY INNER JOIN PROPERTY PROP ON ST_DWITHIN(AMENITY.COORDINATES, PROP.COORDINATES, 1000) WHERE PROP.INDEX = 3692; |
Use case 2: Aggregation / drill down on geography dimension
We can use the UK Administrative Boundary data to drill down into results. Let’s look at the regions in England with the highest number of fast food restaurants.
- List of Regions with number of fast food stores in descending order
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
SELECT RGN19NM, COUNT(*) AS COUNT_OF_FAST_FOOD FROM ( SELECT RGN19NM, CASE WHEN COORDINATES IS NULL THEN POLYGON_COORDINATES ELSE COORDINATES END AS COORDINATES FROM V_ONS_UK_REGION_201912 REGION LEFT OUTER JOIN V_ONS_UK_REGION_POLYGON_201912 RP ON REGION.OBJECTID = RP.OBJECTID ) AS R INNER JOIN V_OSM_UK_AMENITY_SUSTENANCE AMENITY_SUSTANCE ON ST_COVERS(R.COORDINATES,AMENITY_SUSTANCE.COORDINATES) WHERE AMENITY = 'fast_food' GROUP BY RGN19NM ORDER BY COUNT_OF_FAST_FOOD DESC; |
Using census information we could calculate the area with the highest density of fast food stores per capita. But this is a task for another day.
2. List of Wards in London Region with average property prices and number of convenience store
Instead, let’s drill down into the London region and have a closer look at convenience stores and average property prices at the Ward level
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
SELECT PROPERTY.WD19CD, PROPERTY.WD19NM, PROPERTY.AVERAGE_PRICE, OSM_SHOP.COUNT_OF_CONVENIENCE_STORE FROM ( SELECT WD19CD, WD19NM, ROUND(AVG(PRICES), 2) AS AVERAGE_PRICE FROM PROPERTY PROP INNER JOIN ( SELECT WARDS_INNER.* FROM V_ONS_UK_LKP_WARD_LOCAL_AUTH_DIST_COUNTY_REGION_COUNTRY_201912 LKP INNER JOIN V_ONS_UK_WARD_201912 WARDS_INNER ON WARDS_INNER.WD19CD = LKP.WD19CD WHERE RGN19CD LIKE 'E12000007') WARDS ON ST_COVERS(WARDS.COORDINATES, PROP.COORDINATES) GROUP BY WD19CD, WD19NM) PROPERTY INNER JOIN ( SELECT WD19CD, WD19NM, COUNT(*) AS COUNT_OF_CONVENIENCE_STORE FROM V_OSM_UK_SHOP SHOP INNER JOIN ( SELECT WARDS_INNER.* FROM V_ONS_UK_LKP_WARD_LOCAL_AUTH_DIST_COUNTY_REGION_COUNTRY_201912 LKP INNER JOIN V_ONS_UK_WARD_201912 WARDS_INNER ON WARDS_INNER.WD19CD = LKP.WD19CD WHERE RGN19CD LIKE 'E12000007') WARDS ON ST_COVERS(WARDS.COORDINATES, SHOP.COORDINATES) WHERE SHOP.SHOP = 'convenience' GROUP BY WD19CD, WD19NM) OSM_SHOP ON OSM_SHOP.WD19CD = PROPERTY.WD19CD ORDER BY AVERAGE_PRICE, COUNT_OF_CONVENIENCE_STORE DESC; |
Why is the Snowflake Data Marketplace a game changer?
Now that we know how we can use the Snowflake Data Marketplace to our advantage and know how it works we can look at its significance for data management.
Buying and selling data is not a new thing. It has been around for a long time. Vendors such as Experian or Acxiom have been monetising, selling, and reselling data for a few decades. What has changed is that the whole process is a lot more transparent and open. It cuts out the middleman and brings buyers and sellers together.
So let’s come back to my initial point. Why do I think that the Data Marketplace is so important for Snowflake?
Three words: Two-sided markets.
Marketplaces scale exceptionally well because they don’t need to own their inventory. Airbnb, for instance, belongs to the world’s 100 largest companies in the travel industry without owning a single hotel room.
Multi-sided marketplaces make it significantly easier, faster, and cheaper for customers to find what they need. As an example, consider what getting a taxi was like before Uber entered the market.
Depending on the market, the customer either needed to call a number or stand on the street and wait for an unoccupied taxi to drive by. They’d know nothing about the driver and commit to a ride the final sum of which typically remained a mystery until the very end – and often needed to be paid in cash.
The same principles apply to Data Marketplaces.
Snowflake has all of the ingredients in place to benefit from these network effects and make the Data Marketplace a success. It is uniquely positioned.
- The Marketplace is built on a world class data platform. As we have seen, the technology is uniquely suited to deliver on the promises of the Marketplace.
- Unlike other offerings, Snowflake has a critical mass of customers and partners that can bring the Data Marketplace to life and generate the required network effects.
- There is a lot of momentum from the recent IPO, which will help to attract new providers and consumers of data
- The Data Marketplace is managed by an experienced team
- Snowflake is available across all the major cloud platforms which gives it unique exposure that no other competitor can achieve.
At Sonra we are glad to be a part of this journey.
Are you curious? Do you want to explore new data sets that can improve the predictions of your machine learning models?
Join our webinar on 7th October on Open Data.
Also have a look at the location data sets Sonra has published on the Snowflake Data Marketplace.
Are you looking for an open data set that is not currently available? Reach out to us with your requirements.
We are offering various GEO datasets. You can find our offering here: