-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use solr's graph query builder for nested collection queries #5850
Conversation
spec/services/hyrax/collections/nested_collection_query_service_spec.rb
Outdated
Show resolved
Hide resolved
expect(subject.map(&:id)).to contain_exactly(coll_a.id, another.id) | ||
expect(subject.map(&:id)).to contain_exactly(another.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test was incorrect before. If A -> B -> C -> D -> E then I wouldn't expect A to be valid as an "available parent" for C because it already is. I believe this is actually prevented by the validator if you attempt to do this in the UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe in the current implementation, C can have A as an ancestor through B and as a direct parent. So
A -> B -> C -> D -> E
A -> C -> D -> E
at the same time. BUT I do think that works more by accident than by design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allowing E to be a direct child of C and indirect child of C was intentional but perhaps from a theoretical stand point instead of anything concrete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I tested again locally and this works for User Collections, but when I attempt to do this with a collection type that doesn't allow multiple membership, I get an error:
Hyrax::SingleMembershipError in Hyrax::Dashboard::NestCollectionsController#create_relationship_within
Error: You have specified more than one of the same single-membership collection type (type: Nesting only, collections: C and D)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into restoring this functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added support for this back in with dfc2f09
# Limit no longer applies so it will always return true | ||
it 'returns true' do | ||
expect(subject).to eq true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use solr's maxDepth
parameter to the graph query parser, but that feels like more work than it is worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes sense to me.
If you exceed it there should just be a message that says "you have delved too deep. the Balrog comes"
This sounds great! Has there been any performance testing with meaningful/large hyrax data sets? Would perhaps using the |
this looks great to me. since the performance of the old implementation is extremely bad, it seems so likely that the performance will be improved. i'd like to suggest we move forward with this and leave any testing for post merge. i'd also like to look at a limited backport of this, keeping the old indexing behavior in place for 3.x by default, but introducing the new query behavior. i'll poke at this later today. |
@dlpierce I've done some very limited manual testing and haven't seen any issues with solr. I'm not sure if the traversalFilter is useful for our use case or not. I'd be curious about performance as nesting depth scales but I'd suspect solr would handle most normal cases easily. |
if nestable?(parent) | ||
nesting_within_maximum_depth | ||
else | ||
unless nestable?(parent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we guarding circular nesting? (e.g. A has member B has member C has member A)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the documentation, the graph query parser is "cyclic aware".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on a current Hyrax, we allow circular nesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is being handled by the parent_and_child_can_be_nested
validation on the form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@no-reply Can you give an example of currently allowed/working circular nesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to reproduce this in nurax-dev. Can you please post the exact steps that allowed this for you? I'd like to reproduce this in nurax-dev then test the same steps with this PR to see how the behavior differs.
What's the right way to document that Hyrax 4.0 doesn't support sharded SolrCloud? it seems like we need to write that down somewhere, even if it's unlikely to impact any current users. |
# needed to make the determination are too expensive to do for every possible | ||
# collection, so we only test for this situation prior to saving the new | ||
# relationship. | ||
def nesting_within_maximum_depth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm trying to write a commit message for the backport of this change---which no-ops this validation if the solr graph query is configured. but i'm a bit stuck and realizing i'm not entirely clear on why we're removing this. can you help me @cjcolvar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed that the max depth was only needed to limit the amount of nested reindexing to undertake. Now that the indexing isn't necessary the technical limit seems like solr's graph query parser's efficiency. If there is a functional reason for having a limit (and making it adopter configurable) then I could restore the depth limit enforcement with a new implementation. The configuration of the max depth was also on the Samvera::NestingIndexer so we'd have to make a new config in Hyrax for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! super helpful
I would think a note on this page https://samvera.github.io/service-stack.html (which could stand some expansion and more prominant linking in the readme, but that's a different ticket) and a comment in app/search_builders/hyrax/dashboard/nested_collections_search_builder.rb when we talk about the graph search. (because that's where things will break if you multishard). In a perfect world, we would throw an exception or send a notice or something on startup if multiple shards are detected, but I don't think we even mandate solr cloud usage, so I'm not sure how that API call would be reliable and again, I feel like "Check if Solr configuration is sane on start up" is a whole new ticket. |
This can replace the expensive nested indexing using samvera-nesting_indexer
dfc2f09
to
78a6098
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My testing of the style of solr query used here worked well and seemed to be speedy with lots of collections. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can replace the expensive nested indexing using samvera-nesting_indexer and all of the special handling for it.
Note: the graph query builder does not work with multi-sharded solr instances
All nesting indexing has been removed along with unused methods/parameters. Deprecation notices will be added to a future backport PR to 3.x.