Skip to content

REGEXP_INSTR

Determines the start position or end position (offset) of the occ occurrence of the patt pattern in the expr string. If no match is found, 0 is returned.

This function complies with the POSIX standard for regular expressions and the Unicode guidelines for regular expressions.

Syntax

sql
REGEXP_INSTR(expr, patt[, pos[, occ[, offset[, param[, grp_num]]]]])

Arguments

expr: Character sequence or text column to be searched, of data type STRING.


patt: Regular expression pattern of data type STRING.


pos (optional): Position of the character counted from the left, from which the search for patt should begin. Default value is pos = 1 (character 1 from the left).


occ (optional): Defines the occurrence of patt in expr whose position is to be returned. Default value occ = 1. Only the first occurrence of patt in expr is returned (unlike the INSTR function, which starts its search for the second occurrence at the second character of the first occurrence).


offset (optional): If the value is 0, the position of the first character of the match is returned. Value 1 returns the position of the first character after the end of the match. Default value is 0.


param (optional): STRING sequence in which additional regular expression parameters can be specified to influence the search for patt.

Supported parameters:

Parameter Effect
c (default) Case sensitive search: Case is considered in the search.
iCase insensitive search: Case is ignored in the search.
m Search in multiline mode. Metacharacters `^` mark the beginning and `$` the end of a line in expr. Without this parameter, multiline mode is disabled. `^` and `$` mark the beginning and end of the entire expr.
n Causes the dot (.), which matches any character, to match the newline character. If this parameter is omitted, the dot does not match the newline character.
x Causes whitespace to be ignored. By default, whitespace matches itself.

Please note

If no parameters are passed through param:

  • A dot (.) does not match the newline character.

  • The source string is treated as a single line.


grp_num (optional): Integer 0-9. Specifies the position of which capturing group (subexpressions in patt enclosed in parentheses) should be extracted.

If grp_num is specified, the position of the defined capturing group is extracted. If grp_num is NULL (not specified), the position of the entire substring that matches patt is returned. If patt does not have at least grp_num subexpressions, the function returns NULL. A NULL for grp_num returns NULL. The default value for grp_num is NULL.

Usage Notes

  • Positions of characters in the string start at 1 and not 0.

  • If the specified grp_num is greater than the actual number of capturing groups in expr, 0 is returned.

Example 1 | REGEXP_INSTR and REGEXP_SUBSTR

Comparison of REGEXP_INSTR and REGEXP_SUBSTR.

sql
select   regexp_substr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d') as reg_substr, 
         regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d') AS reg_instr;
+---------------+--------------+
|   REG_SUBSTR  |   REG_INSTR  |          
|---------------+--------------+
| Gaudeamus1    | 1            |
+---------------+--------------+

Example 2 | pos Argument

sql
select   regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',3) AS result1, 
         regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',11) AS result2;
+---------+---------+
| RESULT1 | RESULT2 |
+---------+---------+
| 3       | 13      |
+---------+---------+

Example 3 | occ Argument

Search for the 2nd and 3rd match: occ argument.

sql
select   regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,2) AS result1, 
         regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,3) AS result2;
+---------+---------+
| RESULT1 | RESULT2 |
+---------+---------+
| 13      | 22      |
+---------+---------+

Example 4 | offset Argument

Retrieve the second, third, and fourth group within the first occurrence of the pattern. In this case, letters 1, 2, and 3 of the word 'Iuvenes'. The search is case sensitive.

sql
select   
        regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,3) AS result1,
        regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,3,0) AS result2, 
        regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,3,1) AS result3;
+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| 22      | 22      | 30      |
+---------+---------+---------+

Example 5 | grp_num Argument

If the specified grp_num is greater than the actual number of capturing groups in expr, 0 is returned.

sql
    select
        regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,4) AS result1;
+---------+
| RESULT1 |
+---------+
| 0       |
+---------+

Additional Examples

"Word characters" include letters (a–z, A–Z), underscore ('_'), and decimal digits 0–9.

sql
    select
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+G\w+') as result1,
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+i\w+') as result2,
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+I\w+') as result3;
+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| 0       | 10      | 17      |
+---------+---------+---------+

Case insensitive search for the position of a word starting with 'i'. The 3rd occurrence is empty.

sql
    select
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+i\w+',1,1,0, 'i') as result1,
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+i\w+',1,2,0, 'i') as result2,
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+I\w+',1,3,0, 'i') as result3;
+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| 10      | 17      | 0       |
+---------+---------+---------+

If grp_num is missing, it defaults to grp_num = 1.

sql
    select
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '(\W+)i(\w+)',1,2,0, 'i'  ) as result1,
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '(\W+)i(\w+)',1,2,0, 'i',1) as result2,
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '(\W+)i(\w+)',1,2,0, 'i' ,2) as result3;
+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| 17      | 17      | 20      |
+---------+---------+---------+

Search for the position of the second word from the first, second, and third matches in a numbered list. This also shows that attempting to go beyond the last capturing group returns 0.

sql
select 
    regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 1,    '', 1) as result1,
    regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 1, 0, '', 1) as result1_pos,
    regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 2,    '', 1) as result2,
    regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 2, 0, '', 1) as result2_pos,
    regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 3,    '', 1) as result3,
    regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 3, 0, '', 1) as result3_pos,
    regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 4,    '', 1) as result4,
    regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 4, 0, '', 1) as result4_pos,
    regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 5,    '', 1) as result5,
    regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 5, 0, '', 1) as result5_pos,
    regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 6,    '', 1) as result6,
    regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 6, 0, '', 1) as result6_pos;
+-----------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+
| RESULT1   | RESULT1_POS | RESULT2 | RESULT2_POS | RESULT3 | RESULT3_POS | RESULT4 | RESULT4_POS | RESULT5 | RESULT5_POS | RESULT6 | RESULT6_POS |
+-----------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+
| Gaudeamus | 4           | igitur  | 17          | Iuvenes | 27          | dum     | 38          | sumus   | 45          | NULL    | 0           |
+-----------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+

Return the positions of capturing groups defined as individual letters.

sql
select 
    regexp_substr('Gaudeamus', '(\w)(\w)(\w)', 1, 1,    '', 1) as result1,
    regexp_instr('Gaudeamus', '(\w)(\w)(\w)' , 1, 1, 0, '', 1) as result1_pos,
    regexp_substr('Gaudeamus', '(\w)(\w)(\w)', 1, 1,    '', 2) as result2,
    regexp_instr('Gaudeamus', '(\w)(\w)(\w)' , 1, 1, 0, '', 2) as result2_pos,
    regexp_substr('Gaudeamus', '(\w)(\w)(\w)', 1, 1,    '', 3) as result3,
    regexp_instr('Gaudeamus', '(\w)(\w)(\w)' , 1, 1, 0, '', 3) as result3_pos;
+---------+-------------+---------+-------------+---------+-------------+
| RESULT1 | RESULT1_POS | RESULT2 | RESULT2_POS | RESULT3 | RESULT3_POS |
+---------+-------------+---------+-------------+---------+-------------+
| G       | 1           | a       | 2           | u       | 3           |
+---------+-------------+---------+-------------+---------+-------------+

In this example, match occurrences of strings ending with 'us', with at least two alphabetic characters before (case insensitive). The position of the first character of the string is returned.

sql
select 
    regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '\.') as "dot_position",
    regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 1) as "result1",
    regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 2) as "result2",
    regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 3) as "result3",
    regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 4) as "result4",
    regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 5) as "result5";
+--------------+---------+---------+---------+---------+---------+
| DOT_POSITION | RESULT1 | RESULT2 | RESULT3 | RESULT4 | RESULT5 |
+--------------+---------+---------+---------+---------+---------+
| 36           | 1       | 31      | 76      | 0       | 0       |
+--------------+---------+---------+---------+---------+---------+