Snowflake Regular Expressions for Effective Validation

Snowflake Regular Expressions for Effective Validation

When working with user data, email validation is crucial to ensure clean, consistent, and valid information.

Recently, I faced an interesting challenge where I had to validate email addresses in SQL using regular expressions.

Creating the USERS Table

First, let’s create a temporary USERS table to store our users' email addresses:

create or replace temp table "USERS" ( "USER_ID" int autoincrement primary key, "EMAIL" varchar);

Next, we populate this table with a few test email addresses, including valid, invalid, subdomains, and plus-addressing (used by some services for email filtering):

insert overwrite into "USERS" ("EMAIL")values ('[email protected]'), ('[email protected]'), ('[email protected]'), ('[email protected]'), ('[email protected]'), -- Plus-addressing test case ('[email protected]'), -- Subdomain test case ('user@invalid_domain@com'), -- Invalid case ('[email protected]');        


The Regular Expression

Now, we get to the core of the solution: the regular expression to validate the email addresses.

select , regexp_like("EMAIL", $$[a-zA-Z0-9.-]+[+]?[a-zA-Z0-9.-]@[a-zA-Z0-9.-]+[.][a-zA-Z0-9]{2,6}$$) as "IS_VALID_EMAIL"from "USERS";        


This SQL query checks whether each email follows a valid pattern using the regexp_like() function. Let's break down the regular expression pattern:

  • [a-zA-Z0-9.-]+: Matches the local part of the email (before the @), consisting of letters, digits, dots, and hyphens. The + ensures at least one character is present.
  • [+]?: Allows an optional + symbol, commonly used for plus-addressing (e.g., [email protected]).
  • [a-zA-Z0-9.-]*: Matches any additional characters following the plus sign (if present), until we reach the domain.
  • @[a-zA-Z0-9.-]+: The @ symbol is required, followed by the domain (e.g., example or sub-domain), consisting of letters, digits, dots, and hyphens.
  • [.][a-zA-Z0-9]{2,6}: Finally, we match a period (.) followed by a top-level domain (TLD) that is 2 to 6 characters long (e.g., .com, .co.uk).

Running the Validation

When we run the query, we get a table of email addresses and whether they pass the regex validation



要查看或添加评论,请登录

Jaswanth Kumar的更多文章

社区洞察

其他会员也浏览了