Skip to content

REGEXP_SUBSTR

Returns the substring that matches patt in expr. If no match is found, NULL is returned.

This function complies with the POSIX standard for regular expressions and the Unicode guidelines for regular expressions.

Syntax

sql
REGEXP_SUBSTR(expr, 
              patt[, pos[, occ[, param[, grp_num]]]])

Arguments

expr: Character sequence or text column to be searched, of data type STRING.


patt: Regular expression pattern of data type STRING.


pos (optional): Position of the character counted from the left, from which the search for patt should begin. Default value is pos = 1 (character 1 from the left).


occ (optional): Defines the occurrence of patt in expr whose position is to be returned. Default value occ = 1. Only the first occurrence of patt in expr is returned (unlike the INSTR function, which starts its search for the second occurrence at the second character of the first occurrence).


param (optional): VARCHAR2 or CHAR string in which additional regular expression parameters can be specified to influence the search for patt.

Supported parameters:

Parameter Effect
c (default) Case sensitive search: Case is considered in the search.
iCase insensitive search: Case is ignored in the search.
m Search in multiline mode. Metacharacters `^` mark the beginning and `$` the end of a line in expr. Without this parameter, multiline mode is disabled. `^` and `$` mark the beginning and end of the entire expr.
n Causes the dot (.), which matches any character, to match the newline character. If this parameter is omitted, the dot does not match the newline character.
x Causes whitespace to be ignored. By default, whitespace matches itself.

grp_num (optional): Integer 0-9. Specifies which capturing group (subexpressions in patt enclosed in parentheses) should be extracted.

If grp_num is specified, this capturing group is extracted. If grp_num is NULL (not specified), the position of the entire substring that matches patt is returned. If patt does not have at least grp_num subexpressions, the function returns NULL. A NULL for grp_num returns NULL. The default value for grp_num is NULL.

Return Values

A value of type STRING that matches the substring.

Example 1

Search for a sentence (defined by a terminating '.' character) in expr starting from pos = 1.

  • '0' - without additional parameter
  • '1' - Sentence 1 is returned (grp_num = 1)
  • '2' - Sentence 2 is returned (grp_num = 2)
sql
select  
    regexp_substr('Gaudeamus igitur, Omnes homines dum sumus. Gaudeamus igitur, Iuvenes dum sumus.', '[^\.]*\.') as result1,
    regexp_substr('Gaudeamus igitur, Omnes homines dum sumus. Gaudeamus igitur, Iuvenes dum sumus.', '[^\.]*\.',1,1) as result2,
    regexp_substr('Gaudeamus igitur, Omnes homines dum sumus. Gaudeamus igitur, Iuvenes dum sumus.', '[^\.]*\.',1,2)  as result3;
+--------------------------------------------+--------------------------------------------+--------------------------------------+
| RESULT1                                    | RESULT2                                    | RESULT3                              |
+--------------------------------------------+--------------------------------------------+--------------------------------------+
| Gaudeamus igitur, Omnes homines dum sumus. | Gaudeamus igitur, Omnes homines dum sumus. | Gaudeamus igitur, Iuvenes dum sumus. |
+--------------------------------------------+--------------------------------------------+--------------------------------------+

Example 2

result1: Search for the second occurrence of a word following the word 'dum', and return the word instead of the entire expression. result2: Search for the second occurrence of a word following the word 'dum', and return the entire expression.

sql
select regexp_substr('Gaudeamus igitur, Omnes homines dum sumus. Gaudeamus igitur, Iuvenes dum volumus.', 
'dum\W+(\w+)', 1, 2, '',1) as result1,
regexp_substr('Gaudeamus igitur, Omnes homines dum sumus. Gaudeamus igitur, Iuvenes dum volumus.', 
'dum\W+(\w+)', 1, 2, '') as result2;
+---------+-------------+
| RESULT1 | RESULT2     |
+---------+-------------+
| volumus | dum volumus |
+---------+-------------+

Example 3

In this example, substrings of expr following a space are returned. result5 is NULL.

sql
select 
    regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}(\w+)', 1, 1, '', 1) as result1,
    regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}(\w+)', 1, 2, '', 1) as result2,
    regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}(\w+)', 1, 3, '', 1) as result3,
    regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}(\w+)', 1, 4, '', 1) as result4,
    regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}(\w+)', 1, 5, '', 1) as result5;
+---------+---------+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 | RESULT4 | RESULT5 |
+---------+---------+---------+---------+---------+
| igitur  | Iuvenes | dum     | sumus   | NULL    |
+---------+---------+---------+---------+---------+

Example 4

Retrieve the second, third, and fourth group within the first occurrence of the pattern. In this case, letters 2, 3, and 4 of the word 'Iuvenes'. The search is case sensitive.

sql
    select
        regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}I(\w)(\w)(\w)', 1, 1, 'c', 1) as result1,
        regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}I(\w)(\w)(\w)', 1, 1, 'c', 2) as result2,
        regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}I(\w)(\w)(\w)', 1, 1, 'c', 3) as result3;
+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| u       | v       | e       |
+---------+---------+---------+

Example 5

Similar to Example 4, but with a case insensitive search. Therefore, the word 'igitur' is found and processed first instead of 'Iuvenes'.

sql
    select
        regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}I(\w)(\w)(\w)', 1, 1, 'i', 1) as result1,
        regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}I(\w)(\w)(\w)', 1, 1, 'i', 2) as result2,
        regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}I(\w)(\w)(\w)', 1, 1, 'i', 3) as result3;
+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| g       | i       | t       |
+---------+---------+---------+

Example 6

Similar to Examples 4 and 5, but param is passed as an empty string ''. The default value for case sensitivity is used.

sql
    select
        regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}I(\w)(\w)(\w)', 1, 1, '', 1) as result1,
        regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}I(\w)(\w)(\w)', 1, 1, '', 2) as result2,
        regexp_substr('Gaudeamus igitur, Iuvenes dum sumus', '\W{1}I(\w)(\w)(\w)', 1, 1, '', 3) as result3;
+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| u       | v       | e       |
+---------+---------+---------+