PostgreSQL: Regular expressions and pattern matching
A regular expression is a special text string used to describe a search pattern. PostgreSQL’s regular expressions supports three separate approaches to pattern matching:
- POSIX-style regular expressions (BREs and EREs)
- SIMILAR TO operator added in SQL:1999
- SQL LIKE operator
There are some more advanced techniques for advanced pattern matching requirements but those will very likely involve writing some user defined functions using Perl or Tcl.
POSIX regular expressions provide a more powerful means for pattern matching than the LIKE and SIMILAR TO operators. This post is first in the series of blogs I plan to write on the RE topic with respect to PostgreSQL. For starters, I will discuss POSIX-style regular expressions and see some basic uses.
PostgreSQL supports following four operators for POSIX regular expression matching (also known as the tilde operator). The tilde operator returns true or false depending on whether or not a regular expression can match a string or a part thereof.
- ~ (Matches regular expression, case sensitive)
- ~* (Matches regular expression, case insensitive)
- !~ (Does not match regular expression, case sensitive)
- !~* (Does not match regular expression, case insensitive)
So how do we use them ? The most common use of this operator is to select rows based on whether a column matches a regular expression. Here are some examples:
regex=# SELECT 'similar' ~ 'similar' as result; result -------- t (1 row)
— following will fail as the pattern matching is case sensitive
regex=# SELECT 'similar' ~ 'Similar' as result; result -------- f (1 row)
— so we switch to case insensitive operator and re-run the same query
regex=# SELECT 'similar' ~* 'Similar' as result; result -------- t (1 row)
Similarly, in order to achieve the opposite result, we just prefix ~ operator with ‘!’ sign
— Following will return TRUE since both strings do not match with case sensitivity enabled
regex=# SELECT 'similar' !~ 'Similar' as result; result -------- t (1 row)
— With case sensitivity turned off, patterns match and therefore we get FALSE
regex=# SELECT 'similar' !~* 'Similar' as result; result -------- f (1 row)
We could quite easily extend this to run match among multiple string patterns but that and more will come in the upcoming blogs in this series. Stay tuned!
Shameless self promotion: I have an online regular expression testing website that supports PostgreSQL!