Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add examples for regexp functions #16962

Merged
merged 17 commits into from
Apr 15, 2024
Merged
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
361 changes: 361 additions & 0 deletions functions-and-operators/string-functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -1268,22 +1268,370 @@ Escape the argument for use in an SQL statement.

Pattern matching using regular expressions.

Examples:

In this example a number of strings are matched against two regular expressions.

```sql
WITH vals AS (
SELECT 'TiDB' AS v
UNION ALL
SELECT 'Titanium'
UNION ALL
SELECT 'Tungsten'
UNION ALL
SELECT 'Rust'
)
SELECT
v,
v REGEXP '^Ti' AS 'starts with "Ti"',
v REGEXP '^.{4}$' AS 'Length is 4 characters'
FROM
vals;
```

```
+----------+------------------+------------------------+
| v | starts with "Ti" | Length is 4 characters |
+----------+------------------+------------------------+
| TiDB | 1 | 1 |
| Titanium | 1 | 0 |
| Tungsten | 0 | 0 |
| Rust | 0 | 1 |
+----------+------------------+------------------------+
4 rows in set (0.00 sec)
```

The following example demonstrates that `REGEXP` isn't limited to the `SELECT` clause and for example can also be used in the `WHERE` clause of the query.
dveeden marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT
v
FROM (
SELECT 'TiDB' AS v
) AS vals
WHERE
v REGEXP 'DB$';
```

```
+------+
| v |
+------+
| TiDB |
+------+
1 row in set (0.01 sec)
```

### [`REGEXP_INSTR()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-instr)

Return the starting index of the substring that matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)).

The `REGEXP_INSTR(str, regexp, [start, [match, [ret, [match_type]]]])` function returns the position of the match if the regular expression (`regexp`) matches the string (`str`).

dveeden marked this conversation as resolved.
Show resolved Hide resolved
If either the `str` or `regexp` is NULL then the function returns NULL.
dveeden marked this conversation as resolved.
Show resolved Hide resolved

Examples:

In the example below you can see that the `^.b.$` matches `abc`.

```sql
SELECT REGEXP_INSTR('abc','^.b.$');
```
dveeden marked this conversation as resolved.
Show resolved Hide resolved

```
+-----------------------------+
| REGEXP_INSTR('abc','^.b.$') |
+-----------------------------+
| 1 |
+-----------------------------+
1 row in set (0.00 sec)
```

In the example below you can see that the third argument can be used to start looking for a match with a different start position in the string.
dveeden marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_INSTR('abcabc','a');
```

```
+----------------------------+
| REGEXP_INSTR('abcabc','a') |
+----------------------------+
| 1 |
+----------------------------+
1 row in set (0.00 sec)
```

```sql
SELECT REGEXP_INSTR('abcabc','a',2);
```

```
+------------------------------+
| REGEXP_INSTR('abcabc','a',2) |
+------------------------------+
| 4 |
+------------------------------+
1 row in set (0.00 sec)
```

In the example below we use the 4th argument to look for the second match.
dveeden marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_INSTR('abcabc','a',1,2);
dveeden marked this conversation as resolved.
Show resolved Hide resolved
```

```
+--------------------------------+
| REGEXP_INSTR('abcabc','a',1,2) |
+--------------------------------+
| 4 |
+--------------------------------+
1 row in set (0.00 sec)
```

In the example below we use the 5th argument to return the value _after_ the mach instead of the value of the match.
dveeden marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_INSTR('abcabc','a',1,1,1);
```

```
+----------------------------------+
| REGEXP_INSTR('abcabc','a',1,1,1) |
+----------------------------------+
| 2 |
+----------------------------------+
1 row in set (0.00 sec)
```

In the example below the 6th argument is used to add the `i` flag to get a case insensitive match. More details about regular expression `match_type` can be found in [`match_type` compatibility](#match_type-compatibility) section.
hfxsd marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_INSTR('abcabc','A',1,1,0,'');
```

```
+-------------------------------------+
| REGEXP_INSTR('abcabc','A',1,1,0,'') |
+-------------------------------------+
| 0 |
+-------------------------------------+
1 row in set (0.00 sec)
```

```sql
SELECT REGEXP_INSTR('abcabc','A',1,1,0,'i');
```

```
+--------------------------------------+
| REGEXP_INSTR('abcabc','A',1,1,0,'i') |
+--------------------------------------+
| 1 |
+--------------------------------------+
1 row in set (0.00 sec)
```

dveeden marked this conversation as resolved.
Show resolved Hide resolved
Besides `match_type` the [collation](/character-set-and-collation.md) also influences the matching. In the example below a case-sensitive and a case-insensitive collation is used to demonstrate this.
hfxsd marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_general_ci);
```

```
+-------------------------------------------------------+
| REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_general_ci) |
+-------------------------------------------------------+
| 1 |
+-------------------------------------------------------+
1 row in set (0.01 sec)
```

```sql
SELECT REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_bin);
```

```
+------------------------------------------------+
| REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_bin) |
+------------------------------------------------+
| 0 |
+------------------------------------------------+
1 row in set (0.00 sec)
```

### [`REGEXP_LIKE()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-like)

Whether the string matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)).

The `REGEXP_LIKE(str, regex, [match_type])` function is used to test if a regular expression matches a string. Optionally the `match_type` can be used to change the matching behavior.

Examples:

This example matches `^a` against `abc`, which matches.
hfxsd marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_LIKE('abc','^a');
```

```
+-------------------------+
| REGEXP_LIKE('abc','^a') |
+-------------------------+
| 1 |
+-------------------------+
1 row in set (0.00 sec)
```

This example matches `^A` against `abc`, which doesn't match.
hfxsd marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_LIKE('abc','^A');
```

```
+-------------------------+
| REGEXP_LIKE('abc','^A') |
+-------------------------+
| 0 |
+-------------------------+
1 row in set (0.00 sec)
```

This example matches `^A` against `abc`, which now matches because of the `i` flag which enabled case insensitive matching. More details about regular expression `match_type`` can be found in [`match_type` compatibility](#match_type-compatibility) section.
hfxsd marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_LIKE('abc','^A','i');
```

```
+-----------------------------+
| REGEXP_LIKE('abc','^A','i') |
+-----------------------------+
| 1 |
+-----------------------------+
1 row in set (0.00 sec)
```

### [`REGEXP_REPLACE()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-replace)

Replace substrings that match the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)).

The `REGEXP_REPLACE(str, regexp, replace, [start, [match, [match_type]]])` function can be used to replace strings based on regular expressions.

Examples:

In this example two o's are replaced by "i".
hfxsd marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_REPLACE('TooDB', 'o{2}', 'i');
```

```
+--------------------------------------+
| REGEXP_REPLACE('TooDB', 'o{2}', 'i') |
+--------------------------------------+
| TiDB |
+--------------------------------------+
1 row in set (0.00 sec)
```

In the example below we start the match at the third character, causing the regular expression to not match and not do any replacement.
hfxsd marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_REPLACE('TooDB', 'o{2}', 'i',3);
```

```
+----------------------------------------+
| REGEXP_REPLACE('TooDB', 'o{2}', 'i',3) |
+----------------------------------------+
| TooDB |
+----------------------------------------+
1 row in set (0.00 sec)
```

In the example below the 5th argument is used to set if the first or the second match is used for the replacement.
hfxsd marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_REPLACE('TooDB', 'o', 'i',1,1);
```

```
+---------------------------------------+
| REGEXP_REPLACE('TooDB', 'o', 'i',1,1) |
+---------------------------------------+
| TioDB |
+---------------------------------------+
1 row in set (0.00 sec)
```

```sql
SELECT REGEXP_REPLACE('TooDB', 'o', 'i',1,2);
```

```
+---------------------------------------+
| REGEXP_REPLACE('TooDB', 'o', 'i',1,2) |
+---------------------------------------+
| ToiDB |
+---------------------------------------+
1 row in set (0.00 sec)
```

In the example below we use the 6th argument to set the `match_type` for case insensitive matching. More details about regular expression `match_type` can be found in [`match_type` compatibility](#match_type-compatibility) section.
hfxsd marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_REPLACE('TooDB', 'O{2}','i',1,1);
dveeden marked this conversation as resolved.
Show resolved Hide resolved
```

```
+-----------------------------------------+
| REGEXP_REPLACE('TooDB', 'O{2}','i',1,1) |
+-----------------------------------------+
| TooDB |
+-----------------------------------------+
1 row in set (0.00 sec)
```

```sql
SELECT REGEXP_REPLACE('TooDB', 'O{2}','i',1,1,'i');
```

```
+---------------------------------------------+
| REGEXP_REPLACE('TooDB', 'O{2}','i',1,1,'i') |
+---------------------------------------------+
| TiDB |
+---------------------------------------------+
1 row in set (0.00 sec)
```

### [`REGEXP_SUBSTR()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr)

Return the substring that matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)).

The `REGEXP_SUBSTR(str, regexp, [start, [match, [match_type]]])` function is used to get a substring based on a regular expression.

Examples

In this example we use the `Ti.{2}` regular expression to get the `TiDB` substring of the `This is TiDB` string.
hfxsd marked this conversation as resolved.
Show resolved Hide resolved

```sql
SELECT REGEXP_SUBSTR('This is TiDB','Ti.{2}');
```

```
+----------------------------------------+
| REGEXP_SUBSTR('This is TiDB','Ti.{2}') |
+----------------------------------------+
| TiDB |
+----------------------------------------+
1 row in set (0.00 sec)
```

### [`REPEAT()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_repeat)

Repeat a string the specified number of times.
Expand Down Expand Up @@ -1560,6 +1908,15 @@ The value options of `match_type` between TiDB and MySQL are:

- TiDB does not support `"u"`, which means Unix-only line endings in MySQL.

| `match_type` | MySQL | TiDB | Description |
|:------------:|-------|------|----------------------------------------|
| c | Yes | Yes | Case-sensitive matching |
| i | Yes | Yes | Case-insensitive matching |
| m | Yes | Yes | Multi-line mode |
| s | No | Yes | Matches newlines, same as `n` in MySQL |
| n | Yes | No | Matches newlines, same as `s` in TiDB |
| u | Yes | No | UNIX™ line endings |

### Data type compatibility

The difference between TiDB and MySQL support for the binary string type:
Expand All @@ -1581,3 +1938,7 @@ The difference between TiDB and MySQL support for the binary string type:
```sql
SELECT REGEXP_REPLACE('abcd','(.*)(.{2})$','\\1') AS s;
```

### Known issues

- [GitHub Issue #37981](https://github.com/pingcap/tidb/issues/37981)
Loading