From Code to Clarity: Visualizing SQL code for Documentation and Debugging
In this post we will show you how you can create a visual diagram of your SQL code. We will use the FlowHigh SQL visualiser to achieve this.
FlowHigh is the essential toolbox for SQL data engineers and SQL developers. It helps engineers to increase their productivity when developing SQL code and data pipelines.
Apart from the FlowHigh SQL visualiser it is made up of the following components
FlowHigh SQL Optimiser. The Optimiser detects 30+ anti patterns in SQL code and gives recommendations on how to fix them.
FlowHigh SQL Parser. The SQL parser helps to break down individual SQL code into its components.
FlowHigh SQL Tuning. A marketplace that brings together SQL experts and for tuning poorly performing SQL queries.
We will add new features such as automated SQL data pipelines and SQL impact analysis over time. Get access to the FlowHigh to keep up to date and try out existing and new features.
Let’s take a simple SELECT statement and show how the FlowHigh visualiser converts the code to a visual flow diagram.
Visualise SQL code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
WITH f_customer_sales AS ( SELECT fs.OrderDateKey ,fs.ProductKey ,fs.OrderQuantity*fs.UnitPrice AS TotalSale ,dc.FirstName ,dc.LastName FROM FactInternetSales fs INNER JOIN DimCustomer dc ON dc.CustomerKey=fs.CustomerKey) ,d_date AS ( SELECT DateKey ,CalendarYear FROM DimDate WHERE CalendarYear=2017) SELECT CalendarYear ,ProductKey ,SUM(TotalSale) AS TotalSales FROM f_customer_sales INNER JOIN d_date ON d_date.DateKey=f_customer_sales.OrderDateKey GROUP BY CalendarYear ,ProductKey ORDER BY CalendarYear ASC ,TotalSales DESC |
The SQL code is of low complexity and contains multiple Common Table Expressions (CTE). Let’s have a look at the visual diagram that FlowHigh Visualiser generates
Using the visual diagram of the SQL code we can quickly see that it breaks down into two CTEs.
The first CTE f_customer_sales joins the tables FactInternetSales and DimCustomer together.
The second CTE d_date queries the table DimDate and applies a filter on the CalendarYear field.
The two CTEs are then joined and an aggregation is applied. In a final step the resultset is sorted and returned to the user.
Table types
For representing SQL code we have come up with a categorisation for tables. We have already come across three of these table types in the previous section: physical tables, CTEs, and Select.
Let’s go through them one by one
Physical table
A physical table is a standard table that has been created with the CREATE TABLE DDL statement.
The following query references a physical table
1 2 3 |
SELECT col1 ,col2 FROM order_item |
… and the representation by FlowHigh on the diagram.
Pseudo tables
Pseudo tables are tables that are created on the fly by hard coding values.
An example
1 2 3 |
SELECT CAST('2021-05-24' AS DATE) AS QueryDate UNION ALL SELECT CAST('2021-05-25' AS DATE) AS QueryDate |
Common Table Expressions (CTE)
We have already seen CTEs in action in our first example. CTEs are reusable snippets of SQL that can be nested and referenced in other components of the same SQL statement.
There is one CTE visualisation feature I did not mention. Sometimes developers write a CTE and then don’t use it.
Here is an example
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
WITH f_customer_sales AS ( SELECT fs.OrderDateKey ,fs.ProductKey ,fs.OrderQuantity*fs.UnitPrice AS TotalSale ,dc.FirstName ,dc.LastName FROM FactInternetSales fs INNER JOIN DimCustomer dc ON dc.CustomerKey=fs.CustomerKey) ,d_date AS ( SELECT DateKey ,CalendarYear FROM DimDate WHERE CalendarYear=2017) ,d_geo AS ( SELECT geo_nm FROM geography) SELECT CalendarYear ,ProductKey ,SUM(TotalSale) AS TotalSales FROM f_customer_sales INNER JOIN d_date ON d_date.DateKey=f_customer_sales.OrderDateKey GROUP BY CalendarYear ,ProductKey ORDER BY CalendarYear ASC ,TotalSales DESC |
We have added a third CTE d_geo to the SQL. However, this is not referenced anywhere else.
We can see that this results in an orphaned component on the diagram which is not connected to any other parts of the SQL. We also grey this out on the diagram.
This will also be detected by the FlowHigh optimiser as an SQL anti pattern.
Inline view
Inline views are similar to CTEs. They can both be nested. One of the differences is that CTEs are reusable and multiple instances can be referenced inside the same SQL statement.
1 2 3 4 5 6 7 8 9 |
SELECT category_name ,max_list_price FROM product_categories a JOIN(SELECT category_id ,MAX( list_price) max_list_price FROM products GROUP BYcategory_id) b ON a.category_id=b.category_id ORDER BYcategory_name; |
Recursive CTE
Recursive CTEs traverse a hierarchy. Here is an example
1 2 3 4 5 6 7 8 9 10 11 12 13 |
WITH cte_numbers(n ,weekday)AS ( SELECT0 ,DATENAME(DW ,0) UNION ALL SELECT n+1 ,DATENAME(DW ,n+1) FROM cte_numbers WHERE n<6) SELECT weekday FROM cte_numbers; |
The diagram shows the anchor and recursive clause. For a great explanation on how recursive CTEs work check this blog post Understanding Recursive CTEs.
Function table
Some databases and SQL dialects offer User Defined Table Functions (UDTF).
Here is an example for a UDTF
1 2 3 4 5 6 |
SELECT * FROM TABLE(get_countries_for_user(123)) cc WHERE cc.country_code IN ('US' ,'FR' ,'CA') ORDER BY country_code; |
And how FlowHigh displays a UDTF in the diagram
Browsing and searching SQL
You can browse the diagram
You can find tables by browsing and selecting tables from the Table List. This is very useful if you need to quickly find a table in a very large and complex SQL statement with many subqueries and table joins.
In the figure below we have selected the DimCutsomer table and the f_customer_sales CTE. They are highlighted on the diagram.
You can also browse the diagram itself and select tables which will then highlight the corresponding part of the SQL code.
In the figure below we have selected the f_customer_sales CTE. This will highlight the definition of the CTE and also any instances of the CTE.
We search for the d_date CTE and results are highlighted in the editor and the diagram
Navigating the SQL visualisation
Expand and collapse
You can collapse and also expand any CTEs and Inline Views. This is useful to drill down into the SQL hierarchy and nest or unnest any lower or higher level parts of the SQL.
Let’s collapse both CTEs to focus on the top level tables that make up our SQL.
We can now drill further into our code one CTE at a time
The same expand and collapse functionality also exists for the SQL code editor. We can also expand and collapse and inline views or CTEs.
This drill down feature particularly useful for deeply nested and complex SQL
Level 1
Level 2
Level 3
Level 4
Columns
You can also view column level detail
Hover over a table
…or an operator
Expand a table
Expand all tables
Other features
Export and download SQL diagram
You can also download the SQL diagram as a PNG to use in your documentation.
Zoom
You can zoom in and out of a diagram by using your mouse wheel or the zoom feature in the bottom right corner of the diagram
Use cases for visualising SQL
Visualising SQL is useful for the following scenarios
- SQL query visualisation helps understand complex queries. It allows developers to see how each part of the query contributes to the final result, making it easier to understand and debug.
- SQL query visualisations can be used to document SQL code