REGEXP_INSTR
Determines the start position or end position (offset) of the occ occurrence of the patt pattern in the expr string. If no match is found, 0 is returned.
This function complies with the POSIX standard for regular expressions and the Unicode guidelines for regular expressions.
Syntax
REGEXP_INSTR(expr, patt[, pos[, occ[, offset[, param[, grp_num]]]]])Arguments
expr: Character sequence or text column to be searched, of data type STRING.
patt: Regular expression pattern of data type STRING.
pos (optional): Position of the character counted from the left, from which the search for patt should begin. Default value is pos = 1 (character 1 from the left).
occ (optional): Defines the occurrence of patt in expr whose position is to be returned. Default value occ = 1. Only the first occurrence of patt in expr is returned (unlike the INSTR function, which starts its search for the second occurrence at the second character of the first occurrence).
offset (optional): If the value is 0, the position of the first character of the match is returned. Value 1 returns the position of the first character after the end of the match. Default value is 0.
param (optional): STRING sequence in which additional regular expression parameters can be specified to influence the search for patt.
Supported parameters:
| Parameter | Effect |
| c (default) | Case sensitive search: Case is considered in the search. |
| i | Case insensitive search: Case is ignored in the search. |
| m | Search in multiline mode. Metacharacters `^` mark the beginning and `$` the end of a line in expr. Without this parameter, multiline mode is disabled. `^` and `$` mark the beginning and end of the entire expr. |
| n | Causes the dot (.), which matches any character, to match the newline character. If this parameter is omitted, the dot does not match the newline character. |
| x | Causes whitespace to be ignored. By default, whitespace matches itself. |
Please note
If no parameters are passed through param:
A dot (.) does not match the newline character.
The source string is treated as a single line.
grp_num (optional): Integer 0-9. Specifies the position of which capturing group (subexpressions in patt enclosed in parentheses) should be extracted.
If grp_num is specified, the position of the defined capturing group is extracted. If grp_num is NULL (not specified), the position of the entire substring that matches patt is returned. If patt does not have at least grp_num subexpressions, the function returns NULL. A NULL for grp_num returns NULL. The default value for grp_num is NULL.
Usage Notes
Positions of characters in the string start at
1and not0.If the specified grp_num is greater than the actual number of capturing groups in expr,
0is returned.
Example 1 | REGEXP_INSTR and REGEXP_SUBSTR
Comparison of REGEXP_INSTR and REGEXP_SUBSTR.
select regexp_substr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d') as reg_substr,
regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d') AS reg_instr;+---------------+--------------+
| REG_SUBSTR | REG_INSTR |
|---------------+--------------+
| Gaudeamus1 | 1 |
+---------------+--------------+Example 2 | pos Argument
select regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',3) AS result1,
regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',11) AS result2;+---------+---------+
| RESULT1 | RESULT2 |
+---------+---------+
| 3 | 13 |
+---------+---------+Example 3 | occ Argument
Search for the 2nd and 3rd match: occ argument.
select regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,2) AS result1,
regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,3) AS result2;+---------+---------+
| RESULT1 | RESULT2 |
+---------+---------+
| 13 | 22 |
+---------+---------+Example 4 | offset Argument
Retrieve the second, third, and fourth group within the first occurrence of the pattern. In this case, letters 1, 2, and 3 of the word 'Iuvenes'. The search is case sensitive.
select
regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,3) AS result1,
regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,3,0) AS result2,
regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,3,1) AS result3;+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| 22 | 22 | 30 |
+---------+---------+---------+Example 5 | grp_num Argument
If the specified grp_num is greater than the actual number of capturing groups in expr, 0 is returned.
select
regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,4) AS result1;+---------+
| RESULT1 |
+---------+
| 0 |
+---------+Additional Examples
"Word characters" include letters (a–z, A–Z), underscore ('_'), and decimal digits 0–9.
select
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+G\w+') as result1,
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+i\w+') as result2,
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+I\w+') as result3;+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| 0 | 10 | 17 |
+---------+---------+---------+Case insensitive search for the position of a word starting with 'i'. The 3rd occurrence is empty.
select
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+i\w+',1,1,0, 'i') as result1,
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+i\w+',1,2,0, 'i') as result2,
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+I\w+',1,3,0, 'i') as result3;+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| 10 | 17 | 0 |
+---------+---------+---------+If grp_num is missing, it defaults to grp_num = 1.
select
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '(\W+)i(\w+)',1,2,0, 'i' ) as result1,
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '(\W+)i(\w+)',1,2,0, 'i',1) as result2,
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '(\W+)i(\w+)',1,2,0, 'i' ,2) as result3;+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| 17 | 17 | 20 |
+---------+---------+---------+Search for the position of the second word from the first, second, and third matches in a numbered list. This also shows that attempting to go beyond the last capturing group returns 0.
select
regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 1, '', 1) as result1,
regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 1, 0, '', 1) as result1_pos,
regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 2, '', 1) as result2,
regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 2, 0, '', 1) as result2_pos,
regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 3, '', 1) as result3,
regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 3, 0, '', 1) as result3_pos,
regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 4, '', 1) as result4,
regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 4, 0, '', 1) as result4_pos,
regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 5, '', 1) as result5,
regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 5, 0, '', 1) as result5_pos,
regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 6, '', 1) as result6,
regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 6, 0, '', 1) as result6_pos;+-----------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+
| RESULT1 | RESULT1_POS | RESULT2 | RESULT2_POS | RESULT3 | RESULT3_POS | RESULT4 | RESULT4_POS | RESULT5 | RESULT5_POS | RESULT6 | RESULT6_POS |
+-----------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+
| Gaudeamus | 4 | igitur | 17 | Iuvenes | 27 | dum | 38 | sumus | 45 | NULL | 0 |
+-----------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+Return the positions of capturing groups defined as individual letters.
select
regexp_substr('Gaudeamus', '(\w)(\w)(\w)', 1, 1, '', 1) as result1,
regexp_instr('Gaudeamus', '(\w)(\w)(\w)' , 1, 1, 0, '', 1) as result1_pos,
regexp_substr('Gaudeamus', '(\w)(\w)(\w)', 1, 1, '', 2) as result2,
regexp_instr('Gaudeamus', '(\w)(\w)(\w)' , 1, 1, 0, '', 2) as result2_pos,
regexp_substr('Gaudeamus', '(\w)(\w)(\w)', 1, 1, '', 3) as result3,
regexp_instr('Gaudeamus', '(\w)(\w)(\w)' , 1, 1, 0, '', 3) as result3_pos;+---------+-------------+---------+-------------+---------+-------------+
| RESULT1 | RESULT1_POS | RESULT2 | RESULT2_POS | RESULT3 | RESULT3_POS |
+---------+-------------+---------+-------------+---------+-------------+
| G | 1 | a | 2 | u | 3 |
+---------+-------------+---------+-------------+---------+-------------+In this example, match occurrences of strings ending with 'us', with at least two alphabetic characters before (case insensitive). The position of the first character of the string is returned.
select
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '\.') as "dot_position",
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 1) as "result1",
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 2) as "result2",
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 3) as "result3",
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 4) as "result4",
regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 5) as "result5";+--------------+---------+---------+---------+---------+---------+
| DOT_POSITION | RESULT1 | RESULT2 | RESULT3 | RESULT4 | RESULT5 |
+--------------+---------+---------+---------+---------+---------+
| 36 | 1 | 31 | 76 | 0 | 0 |
+--------------+---------+---------+---------+---------+---------+