Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASSANDRA-19930 Query builder support for NOT CQL syntax #1963

Open
wants to merge 3 commits into
base: 4.x
Choose a base branch
from

Conversation

absurdfarce
Copy link
Contributor

No description provided.

@tolbertam tolbertam self-requested a review October 3, 2024 01:58
Copy link
Contributor

@tolbertam tolbertam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally this seems good, thanks!. I had a small documentation suggestion and some thoughts about functional CQL queries int he tests, but I think it's good as is so giving a 👍

* Builds a NOT CONTAINS relation for the column.
*
* <p>Note that NOT CONTAINS support is only available in Cassandra 5.1 or later. See
* CASSANDRA-18584 for more information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all the places we refer to CASSANDRA-18584 can we add a link (just so its clickable from javadocs)?

Suggested change
* CASSANDRA-18584 for more information.
* <a href="https://issues.apache.org/jira/browse/CASSANDRA-18584">CASSANDRA-18584</a> for more information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since NOT CONTAINS and NOT CONTAINS KEY both unconditionally require ALLOW FILTERING. It could be useful to mention that in the comments here as well.

Copy link
Contributor

@tolbertam tolbertam Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strike that, It seems that milestone 3 of the CEP aims to remove that requirement with SAI:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator#CEP29:CQLNOToperator-Milestone3

Don't require ALLOW FILTERING if a negative restriction is used on an indexed column and the index supports efficient negative searches - skipping items that match the restricted term. This is useful in particular for implementing predicates of type collection NOT CONTAINS element, where collection is indexed.
To be implemented in SAI, after CEP-7 is merged. This can be implemented in SAI by introduction of "negating iterator" that would return a complement of a posting list. This way there is no need to change the implementation of the index data structure and such iterator would compose well with other SAI features, e.g. would support intersections and unions.

Looks like https://issues.apache.org/jira/browse/CASSANDRA-19929 attempts to make that happen.

Comment on lines +48 to +49
assertThat(selectFrom("foo").all().where(Relation.column("k").contains(literal(1))))
.hasCql("SELECT * FROM foo WHERE k CONTAINS 1");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit pedantic, so can choose to take these suggested changes or not, but it looks like CONTAINS/CONTAINS KEY and (NOT) variants, do not work on partition keys without ALLOW FILTERING, but can work with ALLOW FILTERING if you create an SAI index on a non-partition key column. I think it would be good to make the tests model against valid queries as much as possible, so could we do something like this:

Suggested change
assertThat(selectFrom("foo").all().where(Relation.column("k").contains(literal(1))))
.hasCql("SELECT * FROM foo WHERE k CONTAINS 1");
assertThat(selectFrom("foo").all()
.where(Relation.column("k").isEqualTo(literal(1)))
.where(Relation.column("v").contains(literal(1))))
.hasCql("SELECT * FROM foo WHERE k=1 AND v CONTAINS 1");

which would be valid for a schema like:

CREATE TABLE simple.foo (
    k int PRIMARY KEY,
    v set<int>
);

CREATE INDEX foo_sai_idx ON simple.foo (v) USING 'sai';

Although this doesn't work for NOT CONTAINS:

SELECT * FROM foo WHERE k=1 and v NOT CONTAINS 1;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

Which, based on the tests in C* and this, it looks like this is by design which is surprising to me that there is no way to use an operator without ALLOW FILTERING?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now in https://issues.apache.org/jira/browse/CASSANDRA-19930 that you provided a lot of context in comments around what is valid/not valid which now that I see it is very useful for my understanding 👍

.where(
Relation.column("k")
.notIn(Lists.newArrayList(literal(1), literal(2), literal(3)))))
.hasCql("SELECT * FROM foo WHERE k NOT IN (1,2,3)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to other comment NOT IN requires ALLOW FILTERING on partition keys, but it can be used for clustering keys (without any indexes)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants