In this blog post we will show you how easy it is to connect Zeppelin notebooks to Snowflake Cloud Data Warehouse. We will also execute some queries and visualize the results using Zeppelin’s built-in tools.
Notebooks are useful for exploratory data analysis, data discovery and data storytelling. As they are web based they can be shared between different users to collaborate on a project. They are most useful when used with Python or R but can also be used to connect to relational databases.
If you don’t have Zeppelin installed you can follow the instructions found here.
If you have a Linux machine or a MacOS you can probably use a package manager for installation.
For example on a MacOS the installation is simple as:
brew install apache-zeppelin
After the installation, we can start Zeppelin notebook on a Unix machine with a simple command:
This command starts the Zeppelin notebook server which can be reached on the localhost and port 8080 by entering localhost:8080 in your browser.
Assuming that Zeppelin is already installed, our goal is to connect Zeppelin to Snowflake Data Warehouse. To start off, we will have to add a new interpreter for Snowflake. We can go to the interpreter menu as follows:
Once we get into the interpreter settings menu, the next step is to click the Create button (top right) in order to create a new interpreter:
Now we have to choose jdbc in the Interpreter group. After doing so you will see that many new parameters show up.
Once we have chosen JDBC in the Interpreter group, the following all that we have to do:
1) Fill in the box Interpreter Name with any name you want but later parts of the demo will assume the name snowflake
2) Change the field default.password to your Snowflake account’s password
3) Change the field default.url to the JDBC URL of your Snowflake database.
There are precise rules on how to make JDBC URLs, but for example something like this could be used (fill in with your credentials):
4) Change the field default.user to your Snowflake user
5) In the dependencies subsection, add the location of the latest JAR file used for JDBC connection. We prefer to use the location of the JAR file on central Maven repository.
In our case the location of the latest JAR file is: net.snowflake:snowflake-jdbc:3.6.0.
Once we have created the interpreter, it should look like this:
To use the new Snowflake interpreter we have to go back to base screen and create a new notebook:
Once we are in the notebook let’s query our data:
We only had to add the “%snowflake” prefix at the start of a paragraph (cell) and then we can write a SQL query.
The output of the query looks as follows:
We can also use Zeppelin to make an interesting pie chart:
In the piechart every slice represent the proportion of sales per each county in Dublin.
We can also make a barchart by grouping on edname field and counting the total number of households: