Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use solr's graph query builder for nested collection queries #5850

Merged
merged 4 commits into from
Sep 30, 2022
Merged

Conversation

cjcolvar
Copy link
Member

@cjcolvar cjcolvar commented Sep 9, 2022

This can replace the expensive nested indexing using samvera-nesting_indexer and all of the special handling for it.

Note: the graph query builder does not work with multi-sharded solr instances

All nesting indexing has been removed along with unused methods/parameters. Deprecation notices will be added to a future backport PR to 3.x.

Comment on lines 220 to 179
expect(subject.map(&:id)).to contain_exactly(coll_a.id, another.id)
expect(subject.map(&:id)).to contain_exactly(another.id)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test was incorrect before. If A -> B -> C -> D -> E then I wouldn't expect A to be valid as an "available parent" for C because it already is. I believe this is actually prevented by the validator if you attempt to do this in the UI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe in the current implementation, C can have A as an ancestor through B and as a direct parent. So

A -> B -> C -> D -> E
A -> C -> D -> E

at the same time. BUT I do think that works more by accident than by design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing E to be a direct child of C and indirect child of C was intentional but perhaps from a theoretical stand point instead of anything concrete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I tested again locally and this works for User Collections, but when I attempt to do this with a collection type that doesn't allow multiple membership, I get an error:

Hyrax::SingleMembershipError in Hyrax::Dashboard::NestCollectionsController#create_relationship_within
Error: You have specified more than one of the same single-membership collection type (type: Nesting only, collections: C and D)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into restoring this functionality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added support for this back in with dfc2f09

Comment on lines 271 to 273
# Limit no longer applies so it will always return true
it 'returns true' do
expect(subject).to eq true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use solr's maxDepth parameter to the graph query parser, but that feels like more work than it is worth.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense to me.

If you exceed it there should just be a message that says "you have delved too deep. the Balrog comes"

@cjcolvar
Copy link
Member Author

This should also fix #5596, #5595, and #5492.

@dlpierce
Copy link
Contributor

This sounds great! Has there been any performance testing with meaningful/large hyrax data sets? Would perhaps using the traversalFilter graph query option provide any benefit?

@no-reply
Copy link
Contributor

this looks great to me.

since the performance of the old implementation is extremely bad, it seems so likely that the performance will be improved. i'd like to suggest we move forward with this and leave any testing for post merge.

i'd also like to look at a limited backport of this, keeping the old indexing behavior in place for 3.x by default, but introducing the new query behavior. i'll poke at this later today.

@cjcolvar
Copy link
Member Author

@dlpierce I've done some very limited manual testing and haven't seen any issues with solr. I'm not sure if the traversalFilter is useful for our use case or not. I'd be curious about performance as nesting depth scales but I'd suspect solr would handle most normal cases easily.

if nestable?(parent)
nesting_within_maximum_depth
else
unless nestable?(parent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we guarding circular nesting? (e.g. A has member B has member C has member A)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the documentation, the graph query parser is "cyclic aware".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a current Hyrax, we allow circular nesting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is being handled by the parent_and_child_can_be_nested validation on the form.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@no-reply Can you give an example of currently allowed/working circular nesting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

circular-nesting

i was able to create this situation trivially in the UI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to reproduce this in nurax-dev. Can you please post the exact steps that allowed this for you? I'd like to reproduce this in nurax-dev then test the same steps with this PR to see how the behavior differs.

@no-reply
Copy link
Contributor

What's the right way to document that Hyrax 4.0 doesn't support sharded SolrCloud? it seems like we need to write that down somewhere, even if it's unlikely to impact any current users.

# needed to make the determination are too expensive to do for every possible
# collection, so we only test for this situation prior to saving the new
# relationship.
def nesting_within_maximum_depth
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm trying to write a commit message for the backport of this change---which no-ops this validation if the solr graph query is configured. but i'm a bit stuck and realizing i'm not entirely clear on why we're removing this. can you help me @cjcolvar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that the max depth was only needed to limit the amount of nested reindexing to undertake. Now that the indexing isn't necessary the technical limit seems like solr's graph query parser's efficiency. If there is a functional reason for having a limit (and making it adopter configurable) then I could restore the depth limit enforcement with a new implementation. The configuration of the max depth was also on the Samvera::NestingIndexer so we'd have to make a new config in Hyrax for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! super helpful

@orangewolf
Copy link
Member

What's the right way to document that Hyrax 4.0 doesn't support sharded SolrCloud? it seems like we need to write that down somewhere, even if it's unlikely to impact any current users.

I would think a note on this page https://samvera.github.io/service-stack.html (which could stand some expansion and more prominant linking in the readme, but that's a different ticket) and a comment in app/search_builders/hyrax/dashboard/nested_collections_search_builder.rb when we talk about the graph search. (because that's where things will break if you multishard).

In a perfect world, we would throw an exception or send a notice or something on startup if multiple shards are detected, but I don't think we even mandate solr cloud usage, so I'm not sure how that API call would be reliable and again, I feel like "Check if Solr configuration is sane on start up" is a whole new ticket.

@jlhardes jlhardes added this to the 4.x Series milestone Sep 28, 2022
Copy link
Contributor

@dlpierce dlpierce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My testing of the style of solr query used here worked well and seemed to be speedy with lots of collections. 👍

@dlpierce dlpierce merged commit 9131b9c into main Sep 30, 2022
@dlpierce dlpierce deleted the solr_graph branch September 30, 2022 20:35
@dlpierce dlpierce added the notes-major Release Notes: Potentially breaking changes label Oct 12, 2022
Copy link
Member Author

@cjcolvar cjcolvar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orangewolf added a commit that referenced this pull request Nov 18, 2022
* backport #5850: limited/guarded backport of query portion

Co-authored-by: Chris Colvard <chris.colvard@gmail.com>
Co-authored-by: Rob Kaufman <rob@notch8.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
notes-major Release Notes: Potentially breaking changes valkyrization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants