Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add regex search operator #2458

Merged
merged 13 commits into from
May 2, 2023
Merged

feat: add regex search operator #2458

merged 13 commits into from
May 2, 2023

Conversation

max-sixty
Copy link
Member

Todo before merge:

  • Is ~= a good operator?
    • I'm trying to be consistent with the other comparisons, which have a character before the =
  • Should we be case sensitive / case insensitive?
  • Add some integration tests
  • Add some simple docs

Other questions:

Comment on lines +96 to +99
let expr = match try_into_regex_function(expr, ctx)? {
Ok(between) => return Ok(between),
Err(expr) => expr,
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes unfortuanetely, this is checking each of the try_into_x one after another. It is actually my attempt at matching.

Good news is that it is necessary only for things that was specifically want to convert to sql_parser's AST or that need special handling. If it could be implemented with an s-string, it can go into std_impl.prql.

So this PR could be done by just implementing std.regex_search in std_impl.prql, it we had some mechanism for changing the impl depending on the dialect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I wonder whether we've over-abstracted this, and there's simpler code, even if more verbose and a bit less performant.

e.g. looking at this for a few minutes (but this only does add, would need to be extended to other binary ops, and doesn't handle an incorrect number of args)

        ExprKind::BuiltInFunction { name, .. } if name == "std.and" => {
            let op = BinaryOperator::Plus;
            let strength = op.binding_strength();
            let [left, right] = unpack(expr, STD_AND);
            let left = translate_operand(left, strength, !op.associates_left(), ctx)?;
            let right = translate_operand(right, strength, !op.associates_right(), ctx)?;
            sql_ast::Expr::BinaryOp { left, right, op }
        }

I don't have much confidence here though — raising since I'm still trying to go through the code and see if the perspective of someone worse at writing code can help! (And I won't pursue further unless requested)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's more concise. My helper functions were intended for also checking the structure of arguments (see IS NULL), and doing arbitrary stuff without copying before deciding that the expr should be unpacked and consumed.

Handling wrong number or type or args is nice to have, because there will be invalid RQs which we don't want to panic on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if it's more performant. Even if it's not allocating, it's calling many functions on each resolution.

(I realized this when I had https://github.com/PRQL/prql/pull/2458/files#diff-0a1f682019c767247e01027a9762cc4999669f393eb7971f095d81bea690163aR213-R219 before the let Some((decl, _)) = try_unpack(&expr, DECLS)? else { — which panicked whenever mssql was running, even though it wasn't calling regex).

That said, I'm often surprised how cheap everything is relative to allocating.

But then I'm also often reminded of how cheap perf is relative to code complexity :)

@aljazerzen
Copy link
Member

Is ~= a good operator?

Yes, I like it.

Should we be case sensitive / case insensitive?

Sensitive. But there should be a way to make it insensitive. Bit I cannot think of one that'd I'd like.

Do we want a raw string type, so "bob\smarley doesn't need the extra ?

Yes.

@max-sixty
Copy link
Member Author

max-sixty commented Apr 19, 2023

Should we be case sensitive / case insensitive?

Sensitive. But there should be a way to make it insensitive. Bit I cannot think of one that'd I'd like.

Postgres uses ~ for case sensitive and ~* for case insensitive. But =~* seems too perl-esque... Folks can always do (upper foo) ~= 'bar', but some overhead.

@max-sixty max-sixty mentioned this pull request Apr 29, 2023
@max-sixty
Copy link
Member Author

max-sixty commented May 1, 2023

Ready to merge!

We can implement sqlite & mysql later — it requires a small refactoring since it's infix in those dialects

@max-sixty max-sixty requested a review from aljazerzen May 1, 2023 06:22
@aljazerzen aljazerzen changed the title feat: Add regex search operator feat: add regex search operator May 2, 2023
@aljazerzen
Copy link
Member

I have some reservations about the language design of this, but the implementation is solid, so let's merge it.

@aljazerzen aljazerzen mentioned this pull request May 2, 2023
@max-sixty
Copy link
Member Author

I have some reservations about the language design of this, but the implementation is solid, so let's merge it.

OK, we can revert as ever!

@max-sixty max-sixty merged commit 6a4e5a8 into PRQL:main May 2, 2023
@max-sixty max-sixty deleted the regex-search branch May 2, 2023 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants