From 17ceb204028d153ee40cbac423f5eb8bdf32c368 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Thu, 4 Apr 2024 12:05:28 +0200 Subject: [PATCH 01/17] Add regular expression examples --- functions-and-operators/string-functions.md | 167 ++++++++++++++++++++ 1 file changed, 167 insertions(+) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index e2d9d93ae49f9..1459d7582b37e 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1268,10 +1268,173 @@ Escape the argument for use in an SQL statement. Pattern matching using regular expressions. +Examples: + +In this example a number of strings are matched against two regular expressions. + +```sql +WITH vals AS ( + SELECT 'TiDB' AS v + UNION ALL + SELECT 'Titanium' + UNION ALL + SELECT 'Tungsten' + UNION ALL + SELECT 'Rust' +) +SELECT + v, + v REGEXP '^Ti' AS 'starts with "Ti"', + v REGEXP '^.{4}$' AS 'Length is 4 characters' +FROM + vals; +``` + +``` ++----------+------------------+------------------------+ +| v | starts with "Ti" | Length is 4 characters | ++----------+------------------+------------------------+ +| TiDB | 1 | 1 | +| Titanium | 1 | 0 | +| Tungsten | 0 | 0 | +| Rust | 0 | 1 | ++----------+------------------+------------------------+ +4 rows in set (0.00 sec) +``` + +The following example demonstrates that `REGEXP` isn't limited to the `SELECT` clause and for example can also be used in the `WHERE` clause of the query. + +```sql +SELECT + v +FROM ( + SELECT 'TiDB' AS v + ) AS vals +WHERE + v REGEXP 'DB$'; +``` + +``` ++------+ +| v | ++------+ +| TiDB | ++------+ +1 row in set (0.01 sec) +``` + ### [`REGEXP_INSTR()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-instr) Return the starting index of the substring that matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)). +The `REGEXP_INSTR(str, regexp, [start, [match, [ret]]])` function returns the position of the match if the regular expression (`regexp`) matches the string (`str`). + +If either the `str` or `regexp` is NULL then the function returns NULL. If both arguments are NULL then the functions returns NULL. + + +Examples: + +In the example below you can see that the `^.b.$` matches `abc`. + +```sql +SELECT REGEXP_INSTR('abc','^.b.$'); +``` + +``` ++-----------------------------+ +| REGEXP_INSTR('abc','^.b.$') | ++-----------------------------+ +| 1 | ++-----------------------------+ +1 row in set (0.00 sec) +``` + +In the example below you can see that the third argument can be used to start looking for a match with a different start position in the string. + +```sql +SELECT REGEXP_INSTR('abcabc','a'); +``` + +``` ++----------------------------+ +| REGEXP_INSTR('abcabc','a') | ++----------------------------+ +| 1 | ++----------------------------+ +1 row in set (0.00 sec) +``` + +```sql +SELECT REGEXP_INSTR('abcabc','a',2); +``` + +``` ++------------------------------+ +| REGEXP_INSTR('abcabc','a',2) | ++------------------------------+ +| 4 | ++------------------------------+ +1 row in set (0.00 sec) +``` + +In the example below we use the 4th argument to look for the second match. + +```sql +SELECT REGEXP_INSTR('abcabc','a',1,2); +``` + +``` ++--------------------------------+ +| REGEXP_INSTR('abcabc','a',1,2) | ++--------------------------------+ +| 4 | ++--------------------------------+ +1 row in set (0.00 sec) +``` + +In the example below we use the 5th argument to return the value _after_ the mach instead of the value of the match. + +```sql +SELECT REGEXP_INSTR('abcabc','a',1,1,1); +``` + +``` ++----------------------------------+ +| REGEXP_INSTR('abcabc','a',1,1,1) | ++----------------------------------+ +| 2 | ++----------------------------------+ +1 row in set (0.00 sec) +``` + +In the example below the 6th argument is used to add the `i` flag to get a case insensitive match. + +```sql +SELECT REGEXP_INSTR('abcabc','A',1,1,0,''); +``` + +``` ++-------------------------------------+ +| REGEXP_INSTR('abcabc','A',1,1,0,'') | ++-------------------------------------+ +| 0 | ++-------------------------------------+ +1 row in set (0.00 sec) +``` + +```sql +SELECT REGEXP_INSTR('abcabc','A',1,1,0,'i'); +``` + +``` ++--------------------------------------+ +| REGEXP_INSTR('abcabc','A',1,1,0,'i') | ++--------------------------------------+ +| 1 | ++--------------------------------------+ +1 row in set (0.00 sec) +``` + ### [`REGEXP_LIKE()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-like) Whether the string matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)). @@ -1581,3 +1744,7 @@ The difference between TiDB and MySQL support for the binary string type: ```sql SELECT REGEXP_REPLACE('abcd','(.*)(.{2})$','\\1') AS s; ``` + +### Known issues + +- https://github.com/pingcap/tidb/issues/37981 \ No newline at end of file From 5aa851367e554bd822b473e924c6611794707e0d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Thu, 4 Apr 2024 12:07:35 +0200 Subject: [PATCH 02/17] Fixup --- functions-and-operators/string-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index 1459d7582b37e..5d3859d7c6f53 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1327,7 +1327,7 @@ WHERE Return the starting index of the substring that matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)). -The `REGEXP_INSTR(str, regexp, [start, [match, [ret]]])` function returns the position of the match if the regular expression (`regexp`) matches the string (`str`). +The `REGEXP_INSTR(str, regexp, [start, [match, [ret, [flags]]]])` function returns the position of the match if the regular expression (`regexp`) matches the string (`str`). If either the `str` or `regexp` is NULL then the function returns NULL. If both arguments are NULL then the functions returns NULL. From 47c6fd0e332289f18c0395230c789dd2494ce971 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Thu, 4 Apr 2024 12:14:19 +0200 Subject: [PATCH 03/17] Add REGEXP_LIKE --- functions-and-operators/string-functions.md | 49 +++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index 5d3859d7c6f53..c9f8cca70d49e 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1439,6 +1439,55 @@ SELECT REGEXP_INSTR('abcabc','A',1,1,0,'i'); Whether the string matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)). +The `REGEXP_LIKE(str, regex, [flags])` function is used to test if a regular expression matches a string. Optionally the `flags` can be used to change the matching behavior. + +Examples: + +This example matches `^a` against `abc`, which matches. + +```sql +SELECT REGEXP_LIKE('abc','^a'); +``` + +``` ++-------------------------+ +| REGEXP_LIKE('abc','^a') | ++-------------------------+ +| 1 | ++-------------------------+ +1 row in set (0.00 sec) +``` + +This example matches `^A` against `abc`, which doesn't match. + +```sql +SELECT REGEXP_LIKE('abc','^A'); +``` + +``` ++-------------------------+ +| REGEXP_LIKE('abc','^A') | ++-------------------------+ +| 0 | ++-------------------------+ +1 row in set (0.00 sec) +``` + +This example matches `^A` against `abc`, which now matches because of the `i` flag which enabled case insensitive matching. + +```sql +SELECT REGEXP_LIKE('abc','^A','i'); +``` + +``` ++-----------------------------+ +| REGEXP_LIKE('abc','^A','i') | ++-----------------------------+ +| 1 | ++-----------------------------+ +1 row in set (0.00 sec) +``` + ### [`REGEXP_REPLACE()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-replace) Replace substrings that match the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)). From 8c834856bf44cbc2b527caa2dcb2c7ef64dcd1d9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Thu, 4 Apr 2024 12:46:58 +0200 Subject: [PATCH 04/17] Add REGEXP_SUBSTR --- functions-and-operators/string-functions.md | 109 ++++++++++++++++++++ 1 file changed, 109 insertions(+) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index c9f8cca70d49e..d14b2e7b815de 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1492,10 +1492,119 @@ SELECT REGEXP_LIKE('abc','^A','i'); Replace substrings that match the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)). +The `REGEXP_REPLACE(str, regexp, replace, [start, [match, [flags]]])` function can be used to replace strings based on regular expressions. + +Examples: + +In this example two o's are replaced by "i". + +```sql +SELECT REGEXP_REPLACE('TooDB', 'o{2}', 'i'); +``` + +``` ++--------------------------------------+ +| REGEXP_REPLACE('TooDB', 'o{2}', 'i') | ++--------------------------------------+ +| TiDB | ++--------------------------------------+ +1 row in set (0.00 sec) +``` + +In the example below we start the match at the third character, causing the regular expression to not match and not do any replacement. + +```sql +SELECT REGEXP_REPLACE('TooDB', 'o{2}', 'i',3); +``` + +``` ++----------------------------------------+ +| REGEXP_REPLACE('TooDB', 'o{2}', 'i',3) | ++----------------------------------------+ +| TooDB | ++----------------------------------------+ +1 row in set (0.00 sec) +``` + +In the example below the 5th argument is used to set if the first or the second match is used for the replacement. + +```sql +SELECT REGEXP_REPLACE('TooDB', 'o', 'i',1,1); +``` + +``` ++---------------------------------------+ +| REGEXP_REPLACE('TooDB', 'o', 'i',1,1) | ++---------------------------------------+ +| TioDB | ++---------------------------------------+ +1 row in set (0.00 sec) +``` + +```sql +SELECT REGEXP_REPLACE('TooDB', 'o', 'i',1,2); +``` + +``` ++---------------------------------------+ +| REGEXP_REPLACE('TooDB', 'o', 'i',1,2) | ++---------------------------------------+ +| ToiDB | ++---------------------------------------+ +1 row in set (0.00 sec) +``` + +In the example below we use the 6th argument to set the flags for case insensitive matching. + +```sql +SELECT REGEXP_REPLACE('TooDB', 'O{2}','i',1,1); +``` + +``` ++-----------------------------------------+ +| REGEXP_REPLACE('TooDB', 'O{2}','i',1,1) | ++-----------------------------------------+ +| TooDB | ++-----------------------------------------+ +1 row in set (0.00 sec) +``` + +```sql +SELECT REGEXP_REPLACE('TooDB', 'O{2}','i',1,1,'i'); +``` + +``` ++---------------------------------------------+ +| REGEXP_REPLACE('TooDB', 'O{2}','i',1,1,'i') | ++---------------------------------------------+ +| TiDB | ++---------------------------------------------+ +1 row in set (0.00 sec) +``` + ### [`REGEXP_SUBSTR()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr) Return the substring that matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)). +The `REGEXP_SUBSTR(str, regexp, [start, [match, [flags]]])` function is used to get a substring based on a regular expression. + +Examples + +In this example we use the `Ti.{2}` regular exmpression to get the `TiDB` substring of the `This is TiDB` string. + +```sql +SELECT REGEXP_SUBSTR('This is TiDB','Ti.{2}'); +``` + +``` ++----------------------------------------+ +| REGEXP_SUBSTR('This is TiDB','Ti.{2}') | ++----------------------------------------+ +| TiDB | ++----------------------------------------+ +1 row in set (0.00 sec) +``` + ### [`REPEAT()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_repeat) Repeat a string the specified number of times. From fd6365effa8830213a6f6fb9ba5b2b8674e94443 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Thu, 4 Apr 2024 16:05:35 +0200 Subject: [PATCH 05/17] Update functions-and-operators/string-functions.md --- functions-and-operators/string-functions.md | 1 - 1 file changed, 1 deletion(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index d14b2e7b815de..cdb659e507141 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1331,7 +1331,6 @@ The `REGEXP_INSTR(str, regexp, [start, [match, [ret, [flags]]]])` function retur If either the `str` or `regexp` is NULL then the function returns NULL. If both arguments are NULL then the functions returns NULL. - Examples: In the example below you can see that the `^.b.$` matches `abc`. From 461a801f1b4bb7792d1c0e84b62341b14c4dbb82 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Thu, 4 Apr 2024 18:22:09 +0200 Subject: [PATCH 06/17] Update functions-and-operators/string-functions.md Co-authored-by: Mattias Jonsson --- functions-and-operators/string-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index cdb659e507141..345f72dbcd37f 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1589,7 +1589,7 @@ The `REGEXP_SUBSTR(str, regexp, [start, [match, [flags]]])` function is used to Examples -In this example we use the `Ti.{2}` regular exmpression to get the `TiDB` substring of the `This is TiDB` string. +In this example we use the `Ti.{2}` regular expression to get the `TiDB` substring of the `This is TiDB` string. ```sql SELECT REGEXP_SUBSTR('This is TiDB','Ti.{2}'); From 87b7d9a8e824bc01d8bf79c08bdd2ee1952de3e5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Fri, 5 Apr 2024 08:41:56 +0200 Subject: [PATCH 07/17] Update based on review --- functions-and-operators/string-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index 345f72dbcd37f..e23ffec82aa91 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1329,7 +1329,7 @@ Return the starting index of the substring that matches the regular expression ( The `REGEXP_INSTR(str, regexp, [start, [match, [ret, [flags]]]])` function returns the position of the match if the regular expression (`regexp`) matches the string (`str`). -If either the `str` or `regexp` is NULL then the function returns NULL. If both arguments are NULL then the functions returns NULL. +If either the `str` or `regexp` is NULL then the function returns NULL. Examples: From c9ebd1acf5f7d81395997b0ed844812188c13aba Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Fri, 5 Apr 2024 10:12:08 +0200 Subject: [PATCH 08/17] Add table about match_type --- functions-and-operators/string-functions.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index e23ffec82aa91..0d4b8b6865ad3 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1880,6 +1880,15 @@ The value options of `match_type` between TiDB and MySQL are: - TiDB does not support `"u"`, which means Unix-only line endings in MySQL. +| `match_type` | MySQL | TiDB | Description | +|:------------:|-------|------|----------------------------------------| +| c | Yes | Yes | Case-sensitive matching | +| i | Yes | Yes | Case-insensitive matching | +| m | Yes | Yes | Multi-line mode | +| s | No | Yes | Matches newlines, same as `n` in MySQL | +| n | Yes | No | Matches newlines, same as `s` in TiDB | +| u | Yes | No | UNIX™ line endings | + ### Data type compatibility The difference between TiDB and MySQL support for the binary string type: @@ -1904,4 +1913,4 @@ The difference between TiDB and MySQL support for the binary string type: ### Known issues -- https://github.com/pingcap/tidb/issues/37981 \ No newline at end of file +- [GitHub Issue #37981](https://github.com/pingcap/tidb/issues/37981) \ No newline at end of file From 620e8f807c003c6fa8c55906a9657ff572a15d6d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Fri, 5 Apr 2024 10:17:26 +0200 Subject: [PATCH 09/17] Add links for match_type --- functions-and-operators/string-functions.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index 0d4b8b6865ad3..49205c59b8559 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1406,7 +1406,7 @@ SELECT REGEXP_INSTR('abcabc','a',1,1,1); 1 row in set (0.00 sec) ``` -In the example below the 6th argument is used to add the `i` flag to get a case insensitive match. +In the example below the 6th argument is used to add the `i` flag to get a case insensitive match. More details about regular expression flags can be found in [`match_type` compatibility](#match_type-compatibility) section. ```sql SELECT REGEXP_INSTR('abcabc','A',1,1,0,''); @@ -1472,7 +1472,7 @@ SELECT REGEXP_LIKE('abc','^A'); 1 row in set (0.00 sec) ``` -This example matches `^A` against `abc`, which now matches because of the `i` flag which enabled case insensitive matching. +This example matches `^A` against `abc`, which now matches because of the `i` flag which enabled case insensitive matching. More details about regular expression flags can be found in [`match_type` compatibility](#match_type-compatibility) section. ```sql SELECT REGEXP_LIKE('abc','^A','i'); @@ -1553,7 +1553,7 @@ SELECT REGEXP_REPLACE('TooDB', 'o', 'i',1,2); 1 row in set (0.00 sec) ``` -In the example below we use the 6th argument to set the flags for case insensitive matching. +In the example below we use the 6th argument to set the flags for case insensitive matching. More details about regular expression flags can be found in [`match_type` compatibility](#match_type-compatibility) section. ```sql SELECT REGEXP_REPLACE('TooDB', 'O{2}','i',1,1); From 214d6b1ecad4dda92e902982dfe8a171cd7c4b6d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Fri, 5 Apr 2024 10:21:15 +0200 Subject: [PATCH 10/17] Only use match_type, not match_type and flags --- functions-and-operators/string-functions.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index 49205c59b8559..7dadeaa459565 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1327,7 +1327,7 @@ WHERE Return the starting index of the substring that matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)). -The `REGEXP_INSTR(str, regexp, [start, [match, [ret, [flags]]]])` function returns the position of the match if the regular expression (`regexp`) matches the string (`str`). +The `REGEXP_INSTR(str, regexp, [start, [match, [ret, [match_type]]]])` function returns the position of the match if the regular expression (`regexp`) matches the string (`str`). If either the `str` or `regexp` is NULL then the function returns NULL. @@ -1406,7 +1406,7 @@ SELECT REGEXP_INSTR('abcabc','a',1,1,1); 1 row in set (0.00 sec) ``` -In the example below the 6th argument is used to add the `i` flag to get a case insensitive match. More details about regular expression flags can be found in [`match_type` compatibility](#match_type-compatibility) section. +In the example below the 6th argument is used to add the `i` flag to get a case insensitive match. More details about regular expression `match_type` can be found in [`match_type` compatibility](#match_type-compatibility) section. ```sql SELECT REGEXP_INSTR('abcabc','A',1,1,0,''); @@ -1438,7 +1438,7 @@ SELECT REGEXP_INSTR('abcabc','A',1,1,0,'i'); Whether the string matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)). -The `REGEXP_LIKE(str, regex, [flags])` function is used to test if a regular expression matches a string. Optionally the `flags` can be used to change the matching behavior. +The `REGEXP_LIKE(str, regex, [match_type])` function is used to test if a regular expression matches a string. Optionally the `match_type` can be used to change the matching behavior. Examples: @@ -1472,7 +1472,7 @@ SELECT REGEXP_LIKE('abc','^A'); 1 row in set (0.00 sec) ``` -This example matches `^A` against `abc`, which now matches because of the `i` flag which enabled case insensitive matching. More details about regular expression flags can be found in [`match_type` compatibility](#match_type-compatibility) section. +This example matches `^A` against `abc`, which now matches because of the `i` flag which enabled case insensitive matching. More details about regular expression `match_type`` can be found in [`match_type` compatibility](#match_type-compatibility) section. ```sql SELECT REGEXP_LIKE('abc','^A','i'); @@ -1491,7 +1491,7 @@ SELECT REGEXP_LIKE('abc','^A','i'); Replace substrings that match the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)). -The `REGEXP_REPLACE(str, regexp, replace, [start, [match, [flags]]])` function can be used to replace strings based on regular expressions. +The `REGEXP_REPLACE(str, regexp, replace, [start, [match, [match_type]]])` function can be used to replace strings based on regular expressions. Examples: @@ -1553,7 +1553,7 @@ SELECT REGEXP_REPLACE('TooDB', 'o', 'i',1,2); 1 row in set (0.00 sec) ``` -In the example below we use the 6th argument to set the flags for case insensitive matching. More details about regular expression flags can be found in [`match_type` compatibility](#match_type-compatibility) section. +In the example below we use the 6th argument to set the `match_type` for case insensitive matching. More details about regular expression `match_type` can be found in [`match_type` compatibility](#match_type-compatibility) section. ```sql SELECT REGEXP_REPLACE('TooDB', 'O{2}','i',1,1); @@ -1585,7 +1585,7 @@ SELECT REGEXP_REPLACE('TooDB', 'O{2}','i',1,1,'i'); Return the substring that matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)). -The `REGEXP_SUBSTR(str, regexp, [start, [match, [flags]]])` function is used to get a substring based on a regular expression. +The `REGEXP_SUBSTR(str, regexp, [start, [match, [match_type]]])` function is used to get a substring based on a regular expression. Examples From 79aee91635e9aa3da00084953e2ae40a6020a777 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Fri, 5 Apr 2024 10:25:51 +0200 Subject: [PATCH 11/17] Update based on review --- functions-and-operators/string-functions.md | 28 +++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index 7dadeaa459565..bf1bce70920d2 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1434,6 +1434,34 @@ SELECT REGEXP_INSTR('abcabc','A',1,1,0,'i'); 1 row in set (0.00 sec) ``` +Besides `match_type` the [collation](/character-set-and-collation.md) also influences the matching. In the example below a case-sensitive and a case-insensitive collation is used to demonstrate this. + +```sql +SELECT REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_general_ci); +``` + +``` ++-------------------------------------------------------+ +| REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_general_ci) | ++-------------------------------------------------------+ +| 1 | ++-------------------------------------------------------+ +1 row in set (0.01 sec) +``` + +```sql +SELECT REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_bin); +``` + +``` ++------------------------------------------------+ +| REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_bin) | ++------------------------------------------------+ +| 0 | ++------------------------------------------------+ +1 row in set (0.00 sec) +``` + ### [`REGEXP_LIKE()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-like) Whether the string matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)). From 63e6ff4dd075d9e86f5c90e5666285ce25d87306 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Sun, 7 Apr 2024 14:11:48 +0200 Subject: [PATCH 12/17] Update functions-and-operators/string-functions.md Co-authored-by: xixirangrang --- functions-and-operators/string-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index bf1bce70920d2..5816c043d63a3 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1391,7 +1391,7 @@ SELECT REGEXP_INSTR('abcabc','a',1,2); 1 row in set (0.00 sec) ``` -In the example below we use the 5th argument to return the value _after_ the mach instead of the value of the match. +The following example uses the 5th argument to return the value _after_ the mach, instead of the value of the match. ```sql SELECT REGEXP_INSTR('abcabc','a',1,1,1); From 7fe77500fadb8e9a67e712d551bfd3c470a95341 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Sun, 7 Apr 2024 14:12:05 +0200 Subject: [PATCH 13/17] Update functions-and-operators/string-functions.md Co-authored-by: xixirangrang --- functions-and-operators/string-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index 5816c043d63a3..798c57cea5121 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1302,7 +1302,7 @@ FROM 4 rows in set (0.00 sec) ``` -The following example demonstrates that `REGEXP` isn't limited to the `SELECT` clause and for example can also be used in the `WHERE` clause of the query. +The following example demonstrates that `REGEXP` is not limited to the `SELECT` clause. For example, you can also use it in the `WHERE` clause of the query. ```sql SELECT From 6ee6c9a2066574f425be6eee3796ff55c19bada5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Sun, 7 Apr 2024 14:12:32 +0200 Subject: [PATCH 14/17] Update functions-and-operators/string-functions.md Co-authored-by: xixirangrang --- functions-and-operators/string-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index 798c57cea5121..58a411584fdf1 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1329,7 +1329,7 @@ Return the starting index of the substring that matches the regular expression ( The `REGEXP_INSTR(str, regexp, [start, [match, [ret, [match_type]]]])` function returns the position of the match if the regular expression (`regexp`) matches the string (`str`). -If either the `str` or `regexp` is NULL then the function returns NULL. +If either the `str` or `regexp` is `NULL`, then the function returns `NULL`. Examples: From ef7a79336bf4b5cb47c7b9c31116d062461d8663 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Sun, 7 Apr 2024 14:12:45 +0200 Subject: [PATCH 15/17] Update functions-and-operators/string-functions.md Co-authored-by: xixirangrang --- functions-and-operators/string-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index 58a411584fdf1..d82227d209495 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1348,7 +1348,7 @@ SELECT REGEXP_INSTR('abc','^.b.$'); 1 row in set (0.00 sec) ``` -In the example below you can see that the third argument can be used to start looking for a match with a different start position in the string. +The following example uses the third argument to look for a match with a different start position in the string. ```sql SELECT REGEXP_INSTR('abcabc','a'); From 4eead080eacfc26218e319f0e7b61435c1848d35 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?= Date: Sun, 7 Apr 2024 14:13:00 +0200 Subject: [PATCH 16/17] Update functions-and-operators/string-functions.md Co-authored-by: xixirangrang --- functions-and-operators/string-functions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index d82227d209495..b1404a133a755 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1376,7 +1376,7 @@ SELECT REGEXP_INSTR('abcabc','a',2); 1 row in set (0.00 sec) ``` -In the example below we use the 4th argument to look for the second match. +The following example uses the 4th argument to look for the second match. ```sql SELECT REGEXP_INSTR('abcabc','a',1,2); From f9318aa020303019b946dddbab00f69305f38eab Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Tue, 9 Apr 2024 14:23:39 +0800 Subject: [PATCH 17/17] Apply suggestions from code review --- functions-and-operators/string-functions.md | 22 ++++++++++----------- 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index b1404a133a755..a85c7cd89fa23 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -1406,7 +1406,7 @@ SELECT REGEXP_INSTR('abcabc','a',1,1,1); 1 row in set (0.00 sec) ``` -In the example below the 6th argument is used to add the `i` flag to get a case insensitive match. More details about regular expression `match_type` can be found in [`match_type` compatibility](#match_type-compatibility) section. +The following example uses the 6th argument to add the `i` flag to get a case insensitive match. For more details about regular expression `match_type`, see [`match_type` compatibility](#match_type-compatibility). ```sql SELECT REGEXP_INSTR('abcabc','A',1,1,0,''); @@ -1434,7 +1434,7 @@ SELECT REGEXP_INSTR('abcabc','A',1,1,0,'i'); 1 row in set (0.00 sec) ``` -Besides `match_type` the [collation](/character-set-and-collation.md) also influences the matching. In the example below a case-sensitive and a case-insensitive collation is used to demonstrate this. +Besides `match_type`, the [collation](/character-set-and-collation.md) also influences the matching. The following example uses a case-sensitive and a case-insensitive collation to demonstrate this. ```sql SELECT REGEXP_INSTR('abcabc','A' COLLATE utf8mb4_general_ci); @@ -1470,7 +1470,7 @@ The `REGEXP_LIKE(str, regex, [match_type])` function is used to test if a regula Examples: -This example matches `^a` against `abc`, which matches. +The following example shows that `^a` matches `abc`. ```sql SELECT REGEXP_LIKE('abc','^a'); @@ -1485,7 +1485,7 @@ SELECT REGEXP_LIKE('abc','^a'); 1 row in set (0.00 sec) ``` -This example matches `^A` against `abc`, which doesn't match. +This following example shows that `^A` does not match `abc`. ```sql SELECT REGEXP_LIKE('abc','^A'); @@ -1500,7 +1500,7 @@ SELECT REGEXP_LIKE('abc','^A'); 1 row in set (0.00 sec) ``` -This example matches `^A` against `abc`, which now matches because of the `i` flag which enabled case insensitive matching. More details about regular expression `match_type`` can be found in [`match_type` compatibility](#match_type-compatibility) section. +This example matches `^A` against `abc`, which now matches because of the `i` flag which enabled case insensitive matching. For more details about the regular expression `match_type`, see [`match_type` compatibility](#match_type-compatibility). ```sql SELECT REGEXP_LIKE('abc','^A','i'); @@ -1523,7 +1523,7 @@ The `REGEXP_REPLACE(str, regexp, replace, [start, [match, [match_type]]])` funct Examples: -In this example two o's are replaced by "i". +In the following example, two o's are replaced by `i`. ```sql SELECT REGEXP_REPLACE('TooDB', 'o{2}', 'i'); @@ -1538,7 +1538,7 @@ SELECT REGEXP_REPLACE('TooDB', 'o{2}', 'i'); 1 row in set (0.00 sec) ``` -In the example below we start the match at the third character, causing the regular expression to not match and not do any replacement. +The following example starts the match at the third character, causing the regular expression not to match and not do any replacement. ```sql SELECT REGEXP_REPLACE('TooDB', 'o{2}', 'i',3); @@ -1553,7 +1553,7 @@ SELECT REGEXP_REPLACE('TooDB', 'o{2}', 'i',3); 1 row in set (0.00 sec) ``` -In the example below the 5th argument is used to set if the first or the second match is used for the replacement. +In the following example, the 5th argument is used to set if the first or the second match is used for the replacement. ```sql SELECT REGEXP_REPLACE('TooDB', 'o', 'i',1,1); @@ -1581,7 +1581,7 @@ SELECT REGEXP_REPLACE('TooDB', 'o', 'i',1,2); 1 row in set (0.00 sec) ``` -In the example below we use the 6th argument to set the `match_type` for case insensitive matching. More details about regular expression `match_type` can be found in [`match_type` compatibility](#match_type-compatibility) section. +The following example uses the 6th argument to set the `match_type` for case insensitive matching. For more details about the regular expression `match_type`, see [`match_type` compatibility](#match_type-compatibility). ```sql SELECT REGEXP_REPLACE('TooDB', 'O{2}','i',1,1); @@ -1615,9 +1615,7 @@ Return the substring that matches the regular expression (Partly compatible with The `REGEXP_SUBSTR(str, regexp, [start, [match, [match_type]]])` function is used to get a substring based on a regular expression. -Examples - -In this example we use the `Ti.{2}` regular expression to get the `TiDB` substring of the `This is TiDB` string. +The following example uses the `Ti.{2}` regular expression to get the `TiDB` substring of the `This is TiDB` string. ```sql SELECT REGEXP_SUBSTR('This is TiDB','Ti.{2}');