Fix LIKE with escapes (#6703)
* Fix LIKE with escapes

Fix LIKE processing for patterns containing escapes

- The starts_with / ends_with optimization did not correctly account for
  escapes when checking whether the rest of the pattern is literal.
- The pattern-to-regexp compiler incorrectly processed \ followed by a
  character other than % or _. In PostgreSQL, the pattern '\x' matches a
  single 'x'.
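
The two rules above can be illustrated with a minimal Python sketch. It is not the code changed in this commit; `like_to_regex`, `like` and `starts_with_prefix` are hypothetical names used only for illustration. It assumes that a trailing (dangling) backslash is treated as a literal backslash by the regexp path, whereas PostgreSQL may reject such a pattern, and that the prefix check simply gives up on a dangling escape.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE pattern into a compiled regular expression."""
    out = []
    chars = iter(pattern)
    for ch in chars:
        if ch == "\\":
            # An escape makes the next character literal, whatever it is,
            # so '\x' matches a single 'x' and '\%' matches a single '%'.
            nxt = next(chars, None)
            # Assumption: a dangling escape stands for a literal backslash.
            out.append(re.escape(nxt if nxt is not None else "\\"))
        elif ch == "%":
            out.append(".*")  # any sequence of characters
        elif ch == "_":
            out.append(".")  # exactly one character
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def starts_with_prefix(pattern: str):
    """Return the unescaped literal prefix if the pattern has the form
    '<literal>%' with no other unescaped wildcard, otherwise None."""
    prefix = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            if i + 1 == n:
                return None  # dangling escape: leave it to the general path
            prefix.append(pattern[i + 1])  # escaped character is literal
            i += 2
        elif ch == "%":
            # Only a single unescaped '%' in the last position qualifies.
            return "".join(prefix) if i == n - 1 else None
        elif ch == "_":
            return None
        else:
            prefix.append(ch)
            i += 1
    return None  # no trailing unescaped '%'
```

For example, `like("x", r"\x")` is true, and `starts_with_prefix(r"abc\%")` returns `None` because the final `%` is escaped, so that pattern cannot be served by a starts_with shortcut.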

There are two tests:

- like_escape_many was generated using PostgreSQL with the code attached
  below for verification.
- like_escape consists of hand-picked test cases that are more
  interesting. The lower number of hand-picked test cases allows for
  exercising all scalar/array vs scalar/array combinations.

The script below isn't the simplest possible, because it was also used in
an attempt to generate more test cases by adding padding. Hence e.g.
is_like_without_dangling_escape. Since it is attached for reference, it
is attached as-is.

```python
import psycopg2

data = r"""
\
\\
\\\
\\\\
a
\a
\\a
%
\%
\\%
%%
\%%
\\%%
_
\_
\\_
__
\__
\\__
abc
a_c
a\bc
a\_c
%abc
\%abc
a\\_c%
""".split('\n')

data = list(dict.fromkeys(data))

conn = psycopg2.connect(host='localhost', port=5432, user='postgres', password='mysecretpassword')
conn.set_session(autocommit=True)
cursor = conn.cursor()
for r in data:
    try:
        # PostgreSQL verifies dangling escape only sometimes
        cursor.execute(f"SELECT %s LIKE %s", (r, r))
        is_like, = cursor.fetchone()
        has_dandling_escape = False
        pg_pattern = r
    except Exception as e:
        if 'LIKE pattern must not end with escape character' not in str(e):
            raise e
        has_dandling_escape = True
        pg_pattern = r + '\\'

    for l in data:
        # print()
        # print('     '.join(str(v) for v in (l, r, has_dandling_escape, pg_pattern)))
        cursor.execute(f"SELECT %s LIKE %s", (l, pg_pattern))
        is_like, = cursor.fetchone()
        assert type(is_like) is bool

        if not is_like and has_dandling_escape:
            pattern_without_escaped_dandling_escape = pg_pattern[:-2]
            cursor.execute(f"SELECT %s LIKE %s", (l, pattern_without_escaped_dandling_escape))
            is_like_without_dangling_escape, = cursor.fetchone()
            assert type(is_like_without_dangling_escape) is bool
        else:
            is_like_without_dangling_escape = False
        assert '"' not in l
        assert '"' not in r
        print('(r"%s", r"%s", %s),' % (
            l, r,
            str(is_like).lower(),
            # str(has_dandling_escape).lower(),
            # str(is_like_without_dangling_escape).lower(),
        ))
```

* Compact tests for regex_like

Reduce test code boilerplate and make it easier to see what the test
cases are.

* Add more test cases for regex_like
findepi authored Nov 9, 2024
1 parent 6dea453 commit 5ad621f
Showing 2 changed files with 1,094 additions and 59 deletions.