DIFFERENT WAYS TO DO PATTERN MATCHING IN POSTGRESQL

By: Saurav

2020-07-17 22:26:00 UTC

For a long while, I had been using LIKE and ILIKE operators to do pattern matching in PostgreSQL. Like most of you, I have been thinking about all my queries in terms of these two operators. Coming from SQL background, I never really questioned if there were other ways. Well today, I would be showing you a few more operators which Postgres provides which you might like and prefer. Let's start with the basics:

LIKE and NOT LIKE


These operators are almost identical to SQL like operators. In the simplest sense, they provide case sensitive exact string matching. If your query does not contain percent signs or underscore, this will act as identical to the equals operator.

select * from users where password like '26Y';

If there is an entry with password 26YlWR7e26k, it will not match. It will only match a password which is '26Y'

If you want to use LIKE or NOT LIKE operator for pattern matching within a string, it supports two operators % and _
- An underscore (_) in the pattern matches any single character;
- A percent sign (%) matches any string of zero or more characters.
- To match an actual underscore or percent sign, your query pattern must be escaped by the escape character (\)

select * from users where password like '26Y%';

This will match an entry 26YlWR7e26k and 26Y

let's see what PSQL exactly does behind the scenes while running LIKE operator. We will use EXPLAIN query:

EXPLAIN select * from users where password LIKE '26Y';

Seq Scan on users  (cost=0.00..1329.86 rows=1 width=1993)
  Filter: ((password)::text ~~ '26Y'::text)

EXPLAIN select * from users where password NOT LIKE '26Y';

Seq Scan on users  (cost=0.00..1329.86 rows=9388 width=1993)
  Filter: ((password)::text !~~ '26Y'::text)

Notice the operator PSQL uses behind the scene. ~~ and !~~

These bring me to the second operator:

~~ and !~~

These operators are aliases for LIKE and NOT LIKE operators in PSQL. You can use them just like you would use LIKE operator.

select * from users where password ~~ '26y'; 

select * from users where password !~~ '26y'; 

These return the same exact query plan like LIKE/NOT LIKE operator

Seq Scan on users  (cost=0.00..1329.86 rows=1 width=1993)
  Filter: ((password)::text ~~ '26y'::text)

Seq Scan on users  (cost=0.00..1329.86 rows=1 width=1993)
  Filter: ((password)::text !~~ '26y'::text)

ILIKE and NOT ILIKE


These are case insensitive pattern matching operators provided in Postgres. These are not available in SQL. As explained earlier, these operators are case insensitive. They support the same operators (%) and (_) for matching a pattern within a string.

You can use them to write a broader query.

select * from users where password ilike '26y';
-- This will match '26Y'

select * from users where password not ilike '26y';

let's take a look at the query plan

Seq Scan on users  (cost=0.00..1329.86 rows=1 width=1993)
  Filter: ((password)::text ~~* '26y'::text)

Seq Scan on users  (cost=0.00..1329.86 rows=1 width=1993)
  Filter: ((password)::text !~~* '26y'::text)

Notice the ~~* and !~~* operators in the query plan.
These operators are aliases used by Postgres under the hood. This brings us to the next two operators:

~~* and !~~*

As mentioned earlier, these are just aliases for ILIKE and not ILIKE operators. They provide case insensitive alternative to LIKE operators. They also support the % and _ operators to do pattern matching within a string. The backslash rule applies here as well if you have a (%) or (_) in your string and you want to escape it.

EXPLAIN select * from users where password ~~* '26y';

Seq Scan on users  (cost=0.00..1329.86 rows=1 width=1993)
  Filter: ((password)::text ~~* '26y'::text)


EXPLAIN select * from users where password !~~* '26y';

Seq Scan on users  (cost=0.00..1329.86 rows=1 width=1993)
  Filter: ((password)::text !~~* '26y'::text)

SIMILAR TO


This operator returns true or false when a match happens. It supports what LIKE operator supports except that it interprets the pattern using SQL99's definition of a regular expression. SQL99 regular expressions merge LIKE notation and common regular expression notation.

The common regular expressions supported by this operator are:
- | denotes either of two
- * denotes zero or more times repetition.
- + denotes one or more times repetition of the previous item
- () Denotes one item

You can read more about POSIX regexp here: REGEX

Let's take a look at the query plan:

EXPLAIN select * from users where password SIMILAR TO '26y'; 

Seq Scan on users  (cost=0.00..1329.86 rows=1 width=1993)
  Filter: ((password)::text ~ '^(?:26y)$'::text)

Notice the ~ operator in the plan. This is regex pattern matching operator (case sensitive). This can match any regular expression you provide and that's how PostgreSQL chooses to run the SIMILAR TO query.

Regular expression matching is a very powerful pattern matcher and brings us to our last but not the least matchers:

~ and !~

With this operator, you have the liberty to use any POSIX regular expression for your query. Let's take a few examples:

select * from users where password ~ '26Y';  -- matches for all tokens

select * from users where password ~ '^26Y'; -- Starts with
Seq Scan on users  (cost=0.00..1329.86 rows=9388 width=1993)
  Filter: ((password)::text ~ '^26y'::text)


select * from users where password ~ '26Y$'; -- Ends with



EXPLAIN select * from users where password ~ '26Y';
Seq Scan on users  (cost=0.00..1329.86 rows=1 width=1993)
  Filter: ((password)::text ~ '26y'::text)

EXPLAIN select * from users where password !~ '26Y';
Seq Scan on users  (cost=0.00..1329.86 rows=9388 width=1993)
  Filter: ((password)::text !~ '26y'::text)

I love this pattern matcher. The reason being I am not restricted by the use of only % and _ or basic regex operators. ~ and !~ opens doors to you using any regex pattern you can write. The only limitation is you.

Finally, since the above are case sensitive, we also have the case insensitive operator counterparts:

~* and !~*


These provide the same powers as above but much relaxed constraints. No case insensitivity.

EXPLAIN select * from users where password ~* '^26y'; 

Seq Scan on users  (cost=0.00..1329.86 rows=1 width=1993)
  Filter: ((password)::text ~* '^26y'::text)

There are a few more operators and function available and you can read more about them in the PostgreSQL Reference

Try them and let me know how it works out for you :)
Twitter

Owned & Maintained by Saurav Prakash

If you like what you see, you can help me cover server costs or buy me a cup of coffee though donation :)