REGEXP_INSTR

Determines the start position or end position (offset) of the occ occurrence of the patt pattern in the expr string. If no match is found, 0 is returned.

This function complies with the POSIX standard for regular expressions and the Unicode guidelines for regular expressions.

Syntax

sql

REGEXP_INSTR(expr, patt[, pos[, occ[, offset[, param[, grp_num]]]]])

Arguments

expr: Character sequence or text column to be searched, of data type STRING.

patt: Regular expression pattern of data type STRING.

pos (optional): Position of the character counted from the left, from which the search for patt should begin. Default value is pos = 1 (character 1 from the left).

occ (optional): Defines the occurrence of patt in expr whose position is to be returned. Default value occ = 1. Only the first occurrence of patt in expr is returned (unlike the INSTR function, which starts its search for the second occurrence at the second character of the first occurrence).

offset (optional): If the value is 0, the position of the first character of the match is returned. Value 1 returns the position of the first character after the end of the match. Default value is 0.

param (optional): STRING sequence in which additional regular expression parameters can be specified to influence the search for patt.

Supported parameters:

Parameter	Effect
c (default)	Case sensitive search: Case is considered in the search.
i	Case insensitive search: Case is ignored in the search.
m	Search in multiline mode. Metacharacters `^` mark the beginning and `$` the end of a line in expr. Without this parameter, multiline mode is disabled. `^` and `$` mark the beginning and end of the entire expr.
n	Causes the dot (.), which matches any character, to match the newline character. If this parameter is omitted, the dot does not match the newline character.
x	Causes whitespace to be ignored. By default, whitespace matches itself.

Please note

If no parameters are passed through param:

A dot (.) does not match the newline character.
The source string is treated as a single line.

grp_num (optional): Integer 0-9. Specifies the position of which capturing group (subexpressions in patt enclosed in parentheses) should be extracted.

If grp_num is specified, the position of the defined capturing group is extracted. If grp_num is NULL (not specified), the position of the entire substring that matches patt is returned. If patt does not have at least grp_num subexpressions, the function returns NULL. A NULL for grp_num returns NULL. The default value for grp_num is NULL.

Usage Notes

Positions of characters in the string start at 1 and not 0.
If the specified grp_num is greater than the actual number of capturing groups in expr, 0 is returned.

Example 1 | REGEXP_INSTR and REGEXP_SUBSTR

Comparison of REGEXP_INSTR and REGEXP_SUBSTR.

sql

select   regexp_substr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d') as reg_substr, 
         regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d') AS reg_instr;

+---------------+--------------+
|   REG_SUBSTR  |   REG_INSTR  |          
|---------------+--------------+
| Gaudeamus1    | 1            |
+---------------+--------------+

Example 2 | pos Argument

sql

select   regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',3) AS result1, 
         regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',11) AS result2;

+---------+---------+
| RESULT1 | RESULT2 |
+---------+---------+
| 3       | 13      |
+---------+---------+

Example 3 | occ Argument

Search for the 2nd and 3rd match: occ argument.

sql

select   regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,2) AS result1, 
         regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,3) AS result2;

+---------+---------+
| RESULT1 | RESULT2 |
+---------+---------+
| 13      | 22      |
+---------+---------+

Example 4 | offset Argument

Retrieve the second, third, and fourth group within the first occurrence of the pattern. In this case, letters 1, 2, and 3 of the word 'Iuvenes'. The search is case sensitive.

sql

select   
        regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,3) AS result1,
        regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,3,0) AS result2, 
        regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,3,1) AS result3;

+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| 22      | 22      | 30      |
+---------+---------+---------+

Example 5 | grp_num Argument

If the specified grp_num is greater than the actual number of capturing groups in expr, 0 is returned.

sql

    select
        regexp_instr('Gaudeamus1, igitur2, Iuvenes3', '\w+\d',1,4) AS result1;

+---------+
| RESULT1 |
+---------+
| 0       |
+---------+

Additional Examples

"Word characters" include letters (a–z, A–Z), underscore ('_'), and decimal digits 0–9.

sql

    select
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+G\w+') as result1,
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+i\w+') as result2,
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+I\w+') as result3;

+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| 0       | 10      | 17      |
+---------+---------+---------+

Case insensitive search for the position of a word starting with 'i'. The 3rd occurrence is empty.

sql

    select
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+i\w+',1,1,0, 'i') as result1,
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+i\w+',1,2,0, 'i') as result2,
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '\W+I\w+',1,3,0, 'i') as result3;

+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| 10      | 17      | 0       |
+---------+---------+---------+

If grp_num is missing, it defaults to grp_num = 1.

sql

    select
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '(\W+)i(\w+)',1,2,0, 'i'  ) as result1,
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '(\W+)i(\w+)',1,2,0, 'i',1) as result2,
        regexp_instr('Gaudeamus igitur, Iuvenes dum sumus', '(\W+)i(\w+)',1,2,0, 'i' ,2) as result3;

+---------+---------+---------+
| RESULT1 | RESULT2 | RESULT3 |
+---------+---------+---------+
| 17      | 17      | 20      |
+---------+---------+---------+

Search for the position of the second word from the first, second, and third matches in a numbered list. This also shows that attempting to go beyond the last capturing group returns 0.

sql

select 
    regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 1,    '', 1) as result1,
    regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 1, 0, '', 1) as result1_pos,
    regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 2,    '', 1) as result2,
    regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 2, 0, '', 1) as result2_pos,
    regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 3,    '', 1) as result3,
    regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 3, 0, '', 1) as result3_pos,
    regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 4,    '', 1) as result4,
    regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 4, 0, '', 1) as result4_pos,
    regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 5,    '', 1) as result5,
    regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 5, 0, '', 1) as result5_pos,
    regexp_substr('1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 6,    '', 1) as result6,
    regexp_instr( '1. Gaudeamus 2. igitur 3. Iuvenes 4. dum 5. sumus', '\D\W+(\w+)', 1, 6, 0, '', 1) as result6_pos;

+-----------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+
| RESULT1   | RESULT1_POS | RESULT2 | RESULT2_POS | RESULT3 | RESULT3_POS | RESULT4 | RESULT4_POS | RESULT5 | RESULT5_POS | RESULT6 | RESULT6_POS |
+-----------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+
| Gaudeamus | 4           | igitur  | 17          | Iuvenes | 27          | dum     | 38          | sumus   | 45          | NULL    | 0           |
+-----------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+---------+-------------+

Return the positions of capturing groups defined as individual letters.

sql

select 
    regexp_substr('Gaudeamus', '(\w)(\w)(\w)', 1, 1,    '', 1) as result1,
    regexp_instr('Gaudeamus', '(\w)(\w)(\w)' , 1, 1, 0, '', 1) as result1_pos,
    regexp_substr('Gaudeamus', '(\w)(\w)(\w)', 1, 1,    '', 2) as result2,
    regexp_instr('Gaudeamus', '(\w)(\w)(\w)' , 1, 1, 0, '', 2) as result2_pos,
    regexp_substr('Gaudeamus', '(\w)(\w)(\w)', 1, 1,    '', 3) as result3,
    regexp_instr('Gaudeamus', '(\w)(\w)(\w)' , 1, 1, 0, '', 3) as result3_pos;

+---------+-------------+---------+-------------+---------+-------------+
| RESULT1 | RESULT1_POS | RESULT2 | RESULT2_POS | RESULT3 | RESULT3_POS |
+---------+-------------+---------+-------------+---------+-------------+
| G       | 1           | a       | 2           | u       | 3           |
+---------+-------------+---------+-------------+---------+-------------+

In this example, match occurrences of strings ending with 'us', with at least two alphabetic characters before (case insensitive). The position of the first character of the string is returned.

sql

select 
    regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '\.') as "dot_position",
    regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 1) as "result1",
    regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 2) as "result2",
    regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 3) as "result3",
    regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 4) as "result4",
    regexp_instr('Gaudeamus igitur, Iuvenes dum sumus. Gaudeam55us igitur, Omnes homines dum sumus', '[[:alpha:]]{2,}us', 1, 5) as "result5";

+--------------+---------+---------+---------+---------+---------+
| DOT_POSITION | RESULT1 | RESULT2 | RESULT3 | RESULT4 | RESULT5 |
+--------------+---------+---------+---------+---------+---------+
| 36           | 1       | 31      | 76      | 0       | 0       |
+--------------+---------+---------+---------+---------+---------+

Pattern Development

Integrations

Instance Metadata

Deployments

SQL Guide

Scripting Guide

General

Administration

Connections

Patterns

Sections

Editors

Instances

Release Management

Builds

Projects

Meta Database

SQL Command Reference

Query Syntax

Query Operators

General DDL

General DML

Function Reference

Scripting Reference

Information Schema

Azure Synapse Analytics

REGEXP_INSTR

Syntax

Arguments

Usage Notes

Example 1 | REGEXP_INSTR and REGEXP_SUBSTR

Example 2 | pos Argument

Example 3 | occ Argument

Example 4 | offset Argument

Example 5 | grp_num Argument

Additional Examples

Integrations

Connections

Sections

Editors

Release Management

SQL Command Reference

Query Syntax

Query Operators

General DDL

General DML

Function Reference

Scripting Reference

Information Schema

REGEXP_INSTR ​

Syntax ​

Arguments ​

Usage Notes ​

Example 1 | REGEXP_INSTR and REGEXP_SUBSTR ​

Example 2 | pos Argument ​

Example 3 | occ Argument ​

Example 4 | offset Argument ​

Example 5 | grp_num Argument ​

Additional Examples ​

REGEXP_INSTR

Syntax

Arguments

Usage Notes

Example 1 | REGEXP_INSTR and REGEXP_SUBSTR

Example 2 | pos Argument

Example 3 | occ Argument

Example 4 | offset Argument

Example 5 | grp_num Argument

Additional Examples