Snowflake,

Home » Snowflake » How Snowflake EXCLUDE and ILIKE protects you from bad SQL. ILIKE it like that.

How Snowflake EXCLUDE and ILIKE protects you from bad SQL. ILIKE it like that.

by Uli Bethke

Uli has been rocking the data world since 2001. As the Co-founder of Sonra, the data liberation company, he’s on a mission to set data free. Uli doesn’t just talk the talk—he writes the books, leads the communities, and takes the stage as a conference speaker.

Any questions or comments for Uli? Connect with him on LinkedIn.

Published on August 16, 2023
Updated on June 23, 2025

One of the most common and impactful SQL anti patterns is the use of SELECT *. In this article we will explain what an SQL anti pattern is and how SELECT * impacts Snowflake performance and increases your Snowflake costs. We will then walk you through how some new Snowflake SQL features can protect you from this anti pattern. Last but not least we will show you how FlowHigh helps you to detect this and many other SQL anti patterns.

Your subscription could not be saved. Please try again.

You're In! Welcome to FastForward Congratulations on successfully subscribing to the FastForward Data Engineering Newsletter! You're now part of a growing community of 15,000+ data engineers who are staying ahead in the ever-evolving world of data.

FlowForward.

All Things Data Engineering
Straight to Your Inbox!

What is an anti pattern?

Anti patterns identify common but problematic solutions in the software industry, much like how design patterns highlight best practices. They often arise from a lack of experience or misapplying good strategies in the wrong situations. By recognizing these anti patterns, professionals can understand the root causes of certain issues. These patterns not only point out problems but also offer detailed plans to fix them.

What is an SQL anti pattern?

SQL, like other coding languages, has its anti patterns. These bad practices can make code slow, hard to read, or even produce wrong outcomes. While some problems are specific to certain databases, many bad practices are common across SQL databases. It’s essential to recognize and address these issues, but sometimes fixing one can lead to another, requiring a balanced approach.

Why SELECT * is bad

Utilizing SELECT * in SQL might seem convenient, but it carries a lot of drawbacks. This approach often fetches unnecessary columns, straining system resources, especially in columnar databases where only specific columns should ideally be read. This indiscriminate fetching can increase costs. The excess data also amplifies serialization overheads and consumes more network bandwidth. Cache inefficiencies arise as unneeded data is stored, potentially evicting essential data. Moreover, for wide tables with many columns, the resource impact is even more visible. From a maintenance perspective, SELECT * can cloud code clarity and disrupt downstream processes if table schemas change. Given these issues, it’s advisable to specify columns explicitly in SQL queries to ensure efficiency and clarity.

Why People Fall for the SELECT * anti pattern

SQL developers and data professionals often resort to using SELECT * for a variety of reasons:

Simplicity and Convenience: Writing SELECT * is straightforward and doesn’t require knowledge of all column names. This makes it an attractive choice, especially when quickly drafting queries or exploring unfamiliar tables.
Exploratory Analysis: When initially getting a feel for a dataset or trying to understand a table’s structure, using SELECT * can provide a quick overview. It’s a common method for initial data examination before diving into more detailed analysis.

When Using SELECT * is Legitimate

While generally considered an anti-pattern, there are situations where SELECT * can be deemed appropriate:

Temporary Analysis: For one-time, quick analyses where a complete view of the data is necessary, SELECT * can be useful. It’s beneficial when the primary goal is to get a snapshot of the data without concern for performance or long-term implications. Ideally it should be used together with the LIMIT or TOP clause
Full Data Extraction: In situations where the intention is genuinely to extract or backup the entire table, SELECT * is not only legitimate but also the most direct method to achieve this.
Some data engineers use SELECT * for automatic schema evolution and infer the schema at runtime. There are pros and cons to this approach. Maybe the topic for another day

Leveraging Snowflake EXCLUDE and ILIKE to address the SELECT * anti pattern

So what can you do to address the anti pattern. Using an IDE that features autocomplete is one way of handling this. More recently databases have added new features to SQL to mitigate the issue.

Snowflake has introduced EXCLUDE and ILIKE. Both EXCLUDE and ILIKE reduce the negative impact of SELECT * by offering a neat way to include or exclude columns based on naming and patterns.

Let’s go through some examples. We will use the following table structure

CREATE TABLE products (

product_id INT PRIMARY KEY,

desc_english VARCHAR(255),

desc_french VARCHAR(255),

price DECIMAL(10, 2),

stock INT

);

SELECT * ILIKE:

Snowflake’s SELECT * ILIKE feature provides enhanced granularity when dealing with tables that have numerous columns, especially when you want to narrow down to specific naming patterns.

For instance, imagine an e-commerce database where the products table has varied columns for product descriptions in multiple languages: desc_english, desc_french, alongside other columns like price and stock. Instead of employing the usual anti pattern of fetching all columns and subsequently filtering out the required language columns in the application layer, you can harness Snowflake’s feature. By executing a query like SELECT * ILIKE ‘desc%’ FROM products;, you’re able to directly retrieve only the product descriptions. This not only enhances data fetch efficiency but also minimizes the processing overhead in your application.

Query:

1	SELECT * ILIKE 'desc%' FROM products';

DESC_ENGLISH	DESC_FRENCH
Laptop	Ordinateur portable
Mouse	Souris
Keyboard	Clavier
Monitor	Moniteur
Speaker	Haut-parleur

Using ILIKE matching is case insensitive.

EXCLUDE Keyword in Snowflake:

If you have a table with numerous columns and you want most of them but wish to exclude a few, the EXCLUDE function provides a straightforward approach. Instead of listing all the columns you want, you can simply specify the ones you don’t want.

For instance, given our previous e-commerce product table, if you’re interested in all details except descriptions in English, you’d use the EXCLUDE clause to leave out the desc_english column.

Query:

1	SELECT * EXCLUDE (desc_english) FROM products;;

Result:

PRODUCT_ID	DESC_FRENCH	PRICE	STOCK
1	Ordinateur portable	1,000	10
2	Souris	50	50
3	Clavier	70	35
4	Moniteur	150	20
5	Haut-parleur	80	45

Some databases such as BigQuery have the same feature but instead of EXCLUDE they use the EXCEPT clause to exclude columns in your results. This is not ideal as In many SQL dialects, EXCEPT has a different job. It is a synonym of MINUS and compares two SELECT result sets and shows rows from the first that aren’t in the second. This dual use can be confusing, so it’s important to know how your database handles EXCEPT, MINUS, and EXCLUDE.

Using FlowHigh UI to Detect the SELECT * anti pattern

We have created a module in FlowHigh to detect anti patterns in SQL, e.g. the SELECT * anti pattern.

In this section, we’ll guide you through the process using the FlowHigh UI to detect the SELECT * anti pattern.

Gain access to FlowHigh today.

We’ll use the below query as input to detect anti patterns.

WITH RecentOrders AS (

SELECT *

FROM Orders

WHERE order_date BETWEEN '2022-01-01' AND '2022-12-31'

TopProducts AS (

SELECT product_id, SUM(sold_quantity) AS total_sold

FROM Sales

GROUP BY product_id

HAVING SUM(sold_quantity) > 100

PreferredShippers AS (

SELECT shipper_id

FROM Shippers

WHERE rating > 4.5

)

SELECT *

FROM RecentOrders r

JOIN Customers c ON r.customer_id = c.customer_id

JOIN Products p ON r.product_id = p.product_id

JOIN Shipping s ON r.shipping_id = s.shipping_id

JOIN TopProducts tp ON r.product_id = tp.product_id

JOIN PreferredShippers ps ON s.shipper_id = ps.shipper_id

LEFT JOIN (

SELECT customer_id, AVG(order_total) AS avg_order_total

FROM Orders

GROUP BY customer_id

) AS AverageOrders ON c.customer_id = AverageOrders.customer_id

WHERE c.customer_type = 'Premium'

AND p.product_category IN ('Electronics', 'Apparel')

AND s.shipping_type = 'Express'

AND tp.total_sold > 200

AND AverageOrders.avg_order_total > 100

ORDER BY r.order_date DESC, tp.total_sold DESC;

Follow these steps:

Access FlowHigh: Log in to FlowHigh’s web interface and select the “optimise sql” option from the left panel.
Submit the Query: Paste the SQL query into the editor and click “Optimise”.

FlowHigh’s analysis will identify instances of the SELECT * anti pattern. If multiple anti patterns are detected in the same query, FlowHigh will highlight them individually.

On the right side, you’ll receive clear descriptions of detected anti patterns, along with explanations, types (Readability, Performance, Correctness), and suggestions for resolution.

Using FlowHigh SDK to Detect the SELECT * anti pattern

The FlowHigh SDK is very useful to programmatically detect anti patterns in your SQL code for automation purposes. As an example you could have a process that scans the Snowflake query history from time to time and detects any bad SQL.

Follow these steps to use the FlowHigh SDK:

Import the SDK: Begin by importing the required modules from the FlowHigh SDK in your Python script.
Instantiate FlowHighSubmissionClass: Create an instance of the FlowHighSubmissionClass by passing your SQL query to the constructor:

1	fh = FlowHighSubmissionClass.from_sql(query_txt)

Retrieve Anti Patterns: Utilize the instantiated object to extract all anti pattern details present in your SQL code. By using the get_all_antipatterns() method, you’ll obtain a list of anti pattern instances detected within the query.

1	antipatterns = fh.get_all_antipatterns()

Source code to detect anti patterns.

import requests

import pandas as pd

from datetime import datetime

from flowhigh.model import AntiPattern

from flowhigh.utils.converter import FlowHighSubmissionClass

fh_ap_doc = requests.get("https://docs.sonra.io/flowhigh/master/docs/sql-optimiser/index.json")

json_data = fh_ap_doc.json()

ap_dict = {k: v for dic in json_data["data"] for k, v in dic.items()}

query_log = []

data = []

# function to return antipattern details

def get_antipattern_details(ap: AntiPattern):

severity = "Notice" if ap.severity == 3 else ("Caution" if ap.severity == 2 else "Warning")

performance_flag = ap.performance if ap.performance != ' ' else 'N'

readability_flag = ap.readability if ap.readability != ' ' else 'N'

correctness_flag = ap.correctness if ap.correctness != ' ' else 'N'

ap.p

for position in ap.pos:

start, length = map(int, position.split('-'))

end = start + length

yield [ap.type_, ap_dict[ap.type_], start, end, str(query_txt[start:end]),

severity, ap.link, performance_flag, readability_flag, correctness_flag,

datetime.now().strftime('%Y-%m-%d %H:%M:%S')]

# Query to check for antipattern

query_txt = '''

'''

fh = FlowHighSubmissionClass.from_sql(query_txt)

antipatterns = fh.get_all_antipatterns()

for ap in antipatterns:

data.extend(list(get_antipattern_details(ap)))

# converting the list to dataframes to load the data into table

df = pd.DataFrame(data, columns=['AP_ID', 'AP_NAME', 'POS_START_INDEX', 'POS_END_INDEX', 'QUERY_SECTIONS_OF_AP',

'SEVERITY', 'AP_LINK', 'PERFORMANCE_FLAG', 'READABILITY_FLAG',

'CORRECTNESS_FLAG', 'TIME_STAMP'])

print(df)

About the author:

Uli Bethke

Co-founder of Sonra

Any questions or comments for Uli? Connect with him on LinkedIn.

Follow Uli Bethke:

Back to Blog

Cookie	Duration	Description
__cfruid	session	Cloudflare sets this cookie to identify trusted web traffic.
cookielawinfo-checkbox-marketing	1 month	This cookie is set by the GDPR Cookie Consent plugin to store the user consent for the cookies in the category "Marketing".
cookielawinfo-checkbox-necessary	1 month	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-preferences	1 month	This cookie is set by the GDPR Cookie Consent plugin to check if the user has given consent to use cookies under the "Preferences" category.
cookielawinfo-checkbox-statistics	1 month	This cookie is set by the GDPR Cookie Consent plugin to store the user consent for the cookies in the category "Statistics".
cookielawinfo-checkbox-unclassified	1 month	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Unclassified".
CookieLawInfoConsent	1 month	Records the default button state of the corresponding category & the status of CCPA. It works only in coordination with the primary cookie.
csrftoken	1 year	This cookie is associated with Django web development platform for python. Used to help protect the website against Cross-Site Request Forgery attacks

Cookie	Duration	Description
AnalyticsSyncHistory	1 month	Linkedin set this cookie to store information about the time a sync took place with the lms_analytics cookie.
bcookie	2 years	LinkedIn sets this cookie from LinkedIn share buttons and ad tags to recognize browser ID.
bscookie	2 years	LinkedIn sets this cookie to store performed actions on the website.
lang	session	LinkedIn sets this cookie to remember a user's language setting.
li_gc	2 years	Linkedin set this cookie for storing visitor's consent regarding using cookies for non-essential purposes.
lidc	1 day	LinkedIn sets the lidc cookie to facilitate data center selection.
mgref	1 year	This cookie is set by Eventbrite to deliver content tailored to the end user's interests and improve content creation. It is also used for event-booking purposes.
mgrefby	1 year	This cookie is set by Eventbrite to deliver content tailored to the end user's interests and improve content creation. It is also used for event-booking purposes.
UserMatchHistory	1 month	LinkedIn sets this cookie for LinkedIn Ads ID syncing.

Cookie	Duration	Description
G	1 year	Cookie used to facilitate the translation into the preferred language of the visitor.
SERVERID	session	This cookie is set by Slideshare's HAProxy load balancer to assign the visitor to a specific server.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_7H38LVR4Z5	2 years	This cookie is installed by Google Analytics.
_gat_gtag_UA_44804396_1	1 minute	Set by Google to distinguish users.
_gat_UA-44804396-1	1 minute	A variation of the _gat cookie set by Google Analytics and Google Tag Manager to allow website owners to track visitor behaviour and measure site performance. The pattern element in the name contains the unique identity number of the account or website it relates to.
_gcl_au	3 months	Provided by Google Tag Manager to experiment advertisement efficiency of websites using their services.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
CONSENT	2 years	YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
SIDCC	6 Months	The "SIDCC" cookie is used as security measure to protect users data from unauthorised access
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt.innertube::nextId	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.

Cookie	Duration	Description
AN	1 month
AS	session
ebEventToTrack	1 month
eblang	1 year
SNID	2 years	This cookie is set by the Google. This cookie is used by the map which helps visitors to identify and reach the facility.
SP	session
SS	session

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

SQL Visualisation Guide - Query Diagrams, Lineage & ERD