The Data Marketplace: A missing piece in modern data architecture

Uli Bethke Snowflake

What is a Data Marketplace?

Data Marketplaces are a relatively recent phenomenon in data management. They bring together providers and buyers of data. Organisations could already buy data from third parties in the past, but Data Marketplaces make it significantly easier to find and buy data. They also cut out the middleman and bring providers and consumers of data into direct contact with each other. Finally, Data Marketplaces are good for competition and lower the barrier to entry for new providers.

The Snowflake Data Marketplace

Snowflake is one of the vendors that operate a Data Marketplace.

In my opinion, it is the hidden gem of the Snowflake data platform. If executed well, the Data Marketplace will be the engine of massive growth for the company. I will explain why later in this post. Let’s first look at the Snowflake Data Marketplace itself and how you, as a consumer of data, can benefit from it.

Just like any other market, the Snowflake Data Marketplace connects buyers and sellers. Instead of buying apples or potatoes you can purchase a subscription to data.

The Data Marketplace consists of a catalog where you can browse for data sets and request access from the provider of the data.

Some of the data sets are free to access. Other data sets require a subscription.

As you can see from the figure above, the Snowflake Data Marketplace is divided into various categories, e.g. Business, Travel, Marketing, etc. One of the categories is location data (Local). Sonra has published various open data sets in this category.

Once you have requested access to a data set, you need to agree on terms and conditions with the seller. Once this is done, the provider shares the data set with you and you can use it to enrich your in-house data.

Benefits of the Data Marketplace

The Data Marketplace enables use cases that were difficult or impossible to achieve in the past. It is the missing piece in a modern data architecture. Here are some of the benefits:

  • Data scientists can significantly improve the accuracy and value of predictive models by adding new features from third-party data sets. The options are mind-boggling, and the potential return on investment is huge.
  • Reference and master data such as ISO codes, geography hierarchies, and address data can be consumed without having to build and maintain ETL pipelines. The data provider takes care of that.
  • Organisations can monetise their own data by publishing it on the marketplace.

The technical foundation of the Snowflake Data Marketplace

The Data Marketplace is built on top of the Snowflake data sharing feature, which in turn is made possible by separating storage from compute. While other vendors also claim to separate storage from compute, few actually do. One telltale sign of tightly coupled compute and storage is lengthy cluster start-up times. With Snowflake, compute resources start up almost instantly.

As a Snowflake client, you can share data with other Snowflake clients or even with third parties that don’t have a Snowflake account. In the past, organisations had to use FTP or APIs to exchange data with suppliers, regulators, clients, etc. Data sharing simplifies the whole data exchange life cycle. You don’t need to create an API. You also don’t need to export your data to XML and upload it to an FTP server, from where the consuming party has to download it and load it back into a database. With data sharing, data sets from third parties are just a JOIN away.
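The following analogy illustrates the “just a JOIN away” idea. It uses SQLite from the Python standard library, not Snowflake. SQLite’s ATTACH DATABASE stands in for a database created from a provider’s share. The table names (properties, postcode_regions), the postcode districts and the prices are all hypothetical. The point is only that the consumer queries the third-party table in place with an ordinary JOIN, with no export, file transfer or reload step.

```python
import sqlite3

# Consumer's own (in-house) database, kept in memory for this sketch.
conn = sqlite3.connect(":memory:")

# Analogy only: a second attached database plays the role of a database
# created from a provider's share. Names and rows below are hypothetical.
conn.execute("ATTACH DATABASE ':memory:' AS shared")

conn.execute("CREATE TABLE properties (property_id TEXT, postcode_district TEXT, price INTEGER)")
conn.execute("CREATE TABLE shared.postcode_regions (postcode_district TEXT, region TEXT)")

conn.executemany(
    "INSERT INTO properties VALUES (?, ?, ?)",
    [("p1", "M1", 250000), ("p2", "E1", 410000)],
)
conn.executemany(
    "INSERT INTO shared.postcode_regions VALUES (?, ?)",
    [("M1", "North West"), ("E1", "London")],
)

# No export, transfer or reload step: the third-party table is queried in place.
rows = conn.execute(
    """
    SELECT p.property_id, p.price, r.region
    FROM properties AS p
    JOIN shared.postcode_regions AS r
      ON r.postcode_district = p.postcode_district
    ORDER BY p.property_id
    """
).fetchall()
print(rows)
```

In Snowflake, the consumer would similarly reference the shared database by name in a query, and the data would stay in the provider’s account rather than being copied.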

The great thing about data sharing in Snowflake is that it is available across all three major cloud platforms (AWS, Azure, and Google Cloud Platform), and data can also be replicated across multiple regions.

The Snowflake Data Marketplace in action

Now that we know how the Snowflake Data Marketplace works, let’s dive into an example.

In this scenario, we are a property website and want to enrich our in-house data with location data from the Snowflake Data Marketplace.

We first request access to the OpenStreetMap UK and Administrative Boundaries UK data sets.

Step 1: Click Get Data.

Step 2: Select the database name and the roles that can access the data.

Now you can see the database in your explorer.

Use case 1: Reverse geocoding

For the first use case, we use the location data to find amenities that are near a property.

Sample Queries

  1. Count of supermarkets within 500 meters of each property, in descending order
We use the ST_DWITHIN geospatial function to count the supermarkets within 500 meters of the geo coordinates (lat/lon) of each property.
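For GEOGRAPHY values, ST_DWITHIN returns TRUE when two objects lie within a given distance in meters of each other. The following Python sketch mimics the logic of this query outside Snowflake, under a few assumptions. Distances use the haversine formula on a spherical Earth, which can differ slightly from Snowflake’s own calculation. The helper names (haversine_m, dwithin) are illustrative. The property IDs and coordinates are hypothetical sample data, not rows from the OpenStreetMap UK data set.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dwithin(lat1, lon1, lat2, lon2, distance_m):
    """Hypothetical stand-in for ST_DWITHIN: True if the points are within distance_m."""
    return haversine_m(lat1, lon1, lat2, lon2) <= distance_m


# Hypothetical sample data: property_id -> (lat, lon), plus supermarket locations.
properties = {
    "1001": (51.5074, -0.1278),
    "1002": (51.5155, -0.0922),
    "1003": (53.4808, -2.2426),
}
supermarkets = [
    (51.5080, -0.1270),
    (51.5060, -0.1300),
    (51.5150, -0.0930),
    (52.0000, -1.0000),
]

# Count supermarkets within 500 m of each property (True counts as 1 in sum).
counts = {
    pid: sum(dwithin(lat, lon, s_lat, s_lon, 500) for s_lat, s_lon in supermarkets)
    for pid, (lat, lon) in properties.items()
}

# Equivalent of ORDER BY count DESC, with ties broken by property_id.
ranking = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)
```

In SQL, the same shape is a join on the ST_DWITHIN predicate followed by GROUP BY and ORDER BY. The brute-force loop here checks every pair, which is fine for a handful of rows but not for a national data set.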

2. List of all amenities within 1000 meters of property ID ‘3692’

Next, let’s look at the amenities near a particular property. We use ST_DWITHIN to keep only the amenities within range and ST_DISTANCE to calculate how far away each one is (ST_DWITHIN itself only returns TRUE or FALSE).
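A similar sketch works for listing the amenities around a single property. It computes the distance to every amenity, keeps those within 1000 meters, and sorts nearest first. This is again hypothetical. The haversine_m helper stands in for Snowflake’s ST_DISTANCE, and the distance check stands in for ST_DWITHIN. The coordinates for property ‘3692’ and the amenity rows are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # spherical approximation of the Earth


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (hypothetical stand-in for ST_DISTANCE)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical coordinates for property 3692 and a few OpenStreetMap-style amenities.
property_location = {"3692": (51.5007, -0.1246)}
amenities = [  # (amenity type, name, lat, lon)
    ("cafe", "Cafe A", 51.5010, -0.1240),
    ("pharmacy", "Pharmacy B", 51.5030, -0.1200),
    ("school", "School C", 51.5200, -0.1000),
]


def amenities_near(property_id, max_distance_m=1000):
    """Return (type, name, distance in m) for amenities within range, nearest first."""
    lat, lon = property_location[property_id]
    results = []
    for amenity_type, name, a_lat, a_lon in amenities:
        d = haversine_m(lat, lon, a_lat, a_lon)  # ST_DISTANCE equivalent
        if d <= max_distance_m:                  # ST_DWITHIN equivalent
            results.append((amenity_type, name, round(d)))
    return sorted(results, key=lambda r: r[2])


print(amenities_near("3692"))
```

Computing the distance once and then filtering on it avoids calculating it twice. In SQL, you would typically use ST_DWITHIN in the WHERE clause and ST_DISTANCE in the SELECT list.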

Use case 2: Aggregation and drill-down on the geography dimension

We can use the Administrative Boundaries UK data set to drill down into the results. Let’s look at the regions in England with the highest number of fast food restaurants.
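Counting fast food outlets per region comes down to a point-in-polygon test. You assign each outlet to the region boundary that contains it, then group and count. The sketch below replaces a SQL spatial predicate such as Snowflake’s ST_CONTAINS with a planar ray-casting test on longitude/latitude pairs. That simplification ignores the curvature of the Earth, which the GEOGRAPHY type accounts for. The region names, rectangular boundaries and outlet coordinates are hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if point (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; here x = longitude, y = latitude.
    Points exactly on an edge may go either way, which is fine for a sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical, heavily simplified region boundaries (rectangles in lon/lat).
regions = {
    "Region A": [(-1.0, 51.0), (0.0, 51.0), (0.0, 52.0), (-1.0, 52.0)],
    "Region B": [(-3.0, 53.0), (-2.0, 53.0), (-2.0, 54.0), (-3.0, 54.0)],
}
# Hypothetical fast food outlets as (lon, lat).
fast_food = [(-0.5, 51.5), (-0.2, 51.8), (-0.9, 51.1), (-2.5, 53.5), (1.5, 50.0)]

counts = Counter()
for lon, lat in fast_food:
    for region, boundary in regions.items():
        if point_in_polygon(lon, lat, boundary):  # spatial containment check
            counts[region] += 1
            break  # regions do not overlap, so stop at the first match

print(counts.most_common())  # regions by number of outlets, descending
```

Outlets that fall outside every boundary, such as the last sample point, are simply not counted.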

  1. List of regions with the number of fast food stores, in descending order

Using census information, we could calculate which area has the highest density of fast food stores per capita. But that is a task for another day.

2. List of wards in the London region with average property prices and number of convenience stores

Instead, let’s drill down into the London region and have a closer look at convenience stores and average property prices at the ward level.
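The ward-level drill-down combines two aggregations per ward: the average of property prices and the count of convenience stores. The sketch below assumes, hypothetically, that every property and store has already been tagged with its ward, for example with a point-in-polygon test against the ward boundaries. The ward names and prices are invented.

```python
from collections import defaultdict

# Hypothetical rows, assumed to be already tagged with their ward.
properties = [  # (ward, price in GBP)
    ("Ward X", 500_000), ("Ward X", 700_000),
    ("Ward Y", 350_000), ("Ward Y", 450_000), ("Ward Y", 400_000),
]
convenience_stores = ["Ward X", "Ward Y", "Ward Y", "Ward Y"]  # ward of each store

# Collect prices per ward (for the average).
prices = defaultdict(list)
for ward, price in properties:
    prices[ward].append(price)

# Count convenience stores per ward.
store_counts = defaultdict(int)
for ward in convenience_stores:
    store_counts[ward] += 1

# One row per ward: (ward, average property price, number of convenience stores),
# ordered by average price, most expensive first.
report = sorted(
    ((ward, sum(p) / len(p), store_counts[ward]) for ward, p in prices.items()),
    key=lambda row: row[1],
    reverse=True,
)
for ward, avg_price, n_stores in report:
    print(f"{ward}: avg price £{avg_price:,.0f}, convenience stores: {n_stores}")
```

Wards that have stores but no property listings drop out of this report, much like rows lost in an inner join. A SQL version would need an outer join to keep them.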

Why is the Snowflake Data Marketplace a game changer?

Now that we know how the Snowflake Data Marketplace works and how we can use it to our advantage, let’s look at its significance for data management.

Buying and selling data is nothing new. Vendors such as Experian or Acxiom have been monetising, selling, and reselling data for decades. What has changed is that the whole process is now a lot more transparent and open. The Data Marketplace cuts out the middleman and brings buyers and sellers together.

So let’s come back to my initial point. Why do I think that the Data Marketplace is so important for Snowflake?

Three words: Two-sided markets.

Marketplaces scale exceptionally well because they don’t need to own their inventory. Airbnb, for instance, is among the world’s 100 largest companies in the travel industry without owning a single hotel room.

Multi-sided marketplaces make it significantly easier, faster, and cheaper for customers to find what they need. As an example, consider what getting a taxi was like before Uber entered the market.

Depending on the market, the customer either needed to call a number or stand on the street and wait for an unoccupied taxi to drive by. They knew nothing about the driver and committed to a ride whose final fare typically remained a mystery until the very end, and which often had to be paid in cash.

The same principles apply to Data Marketplaces.

Snowflake has all of the ingredients in place to benefit from these network effects and make the Data Marketplace a success. It is uniquely positioned.

  • The Marketplace is built on a world-class data platform. As we have seen, the technology is uniquely suited to deliver on the promises of the Marketplace.
  • Unlike other offerings, Snowflake has a critical mass of customers and partners that can bring the Data Marketplace to life and generate the required network effects.
  • There is a lot of momentum from the recent IPO, which will help to attract new providers and consumers of data.
  • The Data Marketplace is managed by an experienced team.
  • Snowflake is available across all the major cloud platforms, which gives it a reach that no other competitor can match.

At Sonra we are glad to be a part of this journey.

Are you curious? Do you want to explore new data sets that can improve the predictions of your machine learning models?

Join our webinar on 7th October on Open Data.

Also have a look at the location data sets Sonra has published on the Snowflake Data Marketplace.

Are you looking for an open data set that is not currently available? Reach out to us with your requirements.

 

About the author

Uli Bethke

Uli has 18 years’ hands-on experience as a consultant, architect, and manager in the data industry. He frequently speaks at conferences. Uli has architected and delivered data warehouses in Europe, North America, and South East Asia. He is a traveler between the worlds of traditional data warehousing and big data technologies.

Uli is a regular contributor to blogs and books and chairs the Hadoop User Group Ireland. He is also a co-founder and VP of the Irish chapter of DAMA, a not-for-profit global data management organization. He co-founded the Irish Oracle Big Data User Group.