Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use solr's graph query builder for nested collection queries #5850

Merged
merged 4 commits into from
Sep 30, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion app/actors/hyrax/actors/collections_membership_actor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,6 @@ def assign_nested_attributes_for_collection(env)
# along side the FileSets on the show page
def add(env, id)
collection = Hyrax.config.collection_class.find(id)
collection.reindex_extent = Hyrax::Adapters::NestingIndexAdapter::LIMITED_REINDEX

return unless env.current_ability.can?(:deposit, collection)
env.curation_concern.member_of_collections << collection
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,6 @@ def update_members # rubocop:disable Metrics/MethodLength
after_update_error(err_msg) if err_msg.present?
return if err_msg.present?

@collection.try(:reindex_extent=, Hyrax::Adapters::NestingIndexAdapter::LIMITED_REINDEX)
begin
Hyrax::Collections::CollectionMemberService.add_members_by_ids(collection_id: collection_id,
new_member_ids: batch_ids,
Expand Down
2 changes: 0 additions & 2 deletions app/controllers/hyrax/dashboard/collections_controller.rb
Original file line number Diff line number Diff line change
Expand Up @@ -219,8 +219,6 @@ def update_active_fedora_collection
process_branding

@collection.visibility = Hydra::AccessControls::AccessRight::VISIBILITY_TEXT_VALUE_PRIVATE unless Hyrax::CollectionType.for(collection: @collection).discoverable?
# we don't have to reindex the full graph when updating collection
@collection.reindex_extent = Hyrax::Adapters::NestingIndexAdapter::LIMITED_REINDEX
if @collection.update(collection_params.except(:members))
after_update_response
else
Expand Down
19 changes: 3 additions & 16 deletions app/forms/hyrax/forms/dashboard/nest_collection_form.rb
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,6 @@ def initialize(parent: nil,
validates :parent, presence: true
validates :child, presence: true
validate :parent_and_child_can_be_nested
validate :nesting_within_maximum_depth

def save
return false unless valid?
Expand Down Expand Up @@ -83,12 +82,11 @@ def available_parent_collections
# rerouting to new_dashboard_collection_path to add the new collection as
# a child. Since we don't yet have a child collection, the valid? option can't be used here.
def validate_add
if nestable?(parent)
nesting_within_maximum_depth
else
unless nestable?(parent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we guarding circular nesting? (e.g. A has member B has member C has member A)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the documentation, the graph query parser is "cyclic aware".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a current Hyrax, we allow circular nesting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is being handled by the parent_and_child_can_be_nested validation on the form.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@no-reply Can you give an example of currently allowed/working circular nesting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

circular-nesting

i was able to create this situation trivially in the UI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to reproduce this in nurax-dev. Can you please post the exact steps that allowed this for you? I'd like to reproduce this in nurax-dev then test the same steps with this PR to see how the behavior differs.

errors.add(:parent, :cannot_have_child_nested)
false
return false
end
true
end

def remove
Expand All @@ -104,17 +102,6 @@ def remove

attr_accessor :query_service, :persistence_service, :context, :collection

# ideally we would love to be able to eliminate collections which exceed the
# maximum nesting depth from the lists of available collections, but the queries
# needed to make the determination are too expensive to do for every possible
# collection, so we only test for this situation prior to saving the new
# relationship.
def nesting_within_maximum_depth
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm trying to write a commit message for the backport of this change---which no-ops this validation if the solr graph query is configured. but i'm a bit stuck and realizing i'm not entirely clear on why we're removing this. can you help me @cjcolvar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that the max depth was only needed to limit the amount of nested reindexing to undertake. Now that the indexing isn't necessary the technical limit seems like solr's graph query parser's efficiency. If there is a functional reason for having a limit (and making it adopter configurable) then I could restore the depth limit enforcement with a new implementation. The configuration of the max depth was also on the Samvera::NestingIndexer so we'd have to make a new config in Hyrax for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! super helpful

return true if query_service.valid_combined_nesting_depth?(parent: parent, child: child, scope: context)
errors.add(:collection, :exceeds_maximum_nesting_depth)
false
end

def parent_and_child_can_be_nested
if nestable?(parent) && nestable?(child)
return true if query_service.parent_and_child_can_nest?(parent: parent, child: child, scope: context)
Expand Down
19 changes: 0 additions & 19 deletions app/indexers/hyrax/repository_reindexer.rb

This file was deleted.

1 change: 0 additions & 1 deletion app/models/concerns/hyrax/collection_behavior.rb
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@ module CollectionBehavior
include Hyrax::HumanReadableType
include Hyrax::HasRepresentative
include Hyrax::Permissions
include Hyrax::CollectionNesting

included do
validates_with HasOneTitleValidator
Expand Down
73 changes: 0 additions & 73 deletions app/models/concerns/hyrax/collection_nesting.rb

This file was deleted.

1 change: 0 additions & 1 deletion app/models/concerns/hyrax/work_behavior.rb
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,6 @@ module WorkBehavior
include ProxyDeposit
include Works::Metadata
include WithEvents
include Hyrax::CollectionNesting

included do
property :owner, predicate: RDF::URI.new('http://opaquenamespace.org/ns/hydra/owner'), multiple: false
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,11 @@ class NestedCollectionsSearchBuilder < ::Hyrax::CollectionSearchBuilder
# @param access [Symbol] :edit, :read, :discover - With the given :access what all can
# @param collection [::Collection]
# @param scope [Object] Typically a controller that responds to #current_ability, #blackligh_config
# @param nesting_attributes [NestingAttributes] an object encapsulating nesting attributes of the collection
# @param nest_direction [Symbol] (:as_parent or :as_child) the direction we are adding nesting to this collection
def initialize(access:, collection:, scope:, nesting_attributes:, nest_direction:)
def initialize(access:, collection:, scope:, nest_direction:)
super(scope)
@collection = collection
@discovery_permissions = extract_discovery_permissions(access)
@nesting_attributes = nesting_attributes
@nest_direction = nest_direction
end

Expand All @@ -24,74 +22,16 @@ def with_pagination(solr_parameters)
solr_parameters[:rows] = 1000
end

# Solr can do graph traversal without the need of special indexing with the Graph query parser so
# use this to compute both the parents and children of the current collection then exclude them
# See https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#graph-query-parser
def show_only_other_collections_of_the_same_collection_type(solr_parameters)
solr_parameters[:fq] ||= []
solr_parameters[:fq] += [
"-" + Hyrax::SolrQueryBuilderService.construct_query_for_ids([limit_ids]),
Hyrax::SolrQueryBuilderService.construct_query(Hyrax.config.collection_type_index_field => @collection.collection_type_gid)
Hyrax::SolrQueryBuilderService.construct_query(Hyrax.config.collection_type_index_field => @collection.collection_type_gid),
"-{!graph from=id to=member_of_collection_ids_ssim#{' maxDepth=1' if @nest_direction == :as_parent}}id:#{@collection.id}",
"-{!graph to=id from=member_of_collection_ids_ssim#{' maxDepth=1' if @nest_direction == :as_child}}id:#{@collection.id}"
]
solr_parameters[:fq] += limit_clause if limit_clause # add limits to prevent illegal nesting arrangements
end

private

def limit_ids
# exclude current collection from returned list
limit_ids = [@collection.id.to_s]
# cannot add a parent that is already a parent
limit_ids += @nesting_attributes.parents if @nesting_attributes.parents && @nest_direction == :as_parent
limit_ids
end

# remove collections from list in order to to prevent illegal nesting arrangements
def limit_clause
case @nest_direction
when :as_parent
eligible_to_be_a_parent
when :as_child
eligible_to_be_a_child
end
end

# To be eligible to be a parent collection of child "Collection G":
# 1) cannot have any pathnames containing Collection G's ID
# 2) cannot already be Collection G's direct parent
# => this is handled through limit_ids method
def eligible_to_be_a_parent
# Using a !lucene query allows us to get items using a wildcard query, a feature not supported via AF query builder.
["-_query_:\"{!lucene df=#{Samvera::NestingIndexer.configuration.solr_field_name_for_storing_pathnames}}*#{@collection.id}*\""]
end

# To be eligible to be a child collection of parent "Collection F":
# 1) Cannot have any pathnames containing any of Collection F's pathname or ancestors
# 2) cannot already be Collection F's direct child
def eligible_to_be_a_child
exclude_path = []
exclude_path << exclude_if_paths_contain_collection
exclude_path << exclude_if_already_parent
end

def exclude_if_paths_contain_collection
# 1) Exclude any pathnames containing any of Collection F's pathname or ancestors
array_to_exclude = [] + @nesting_attributes.pathnames unless @nesting_attributes.pathnames.nil?
array_to_exclude += @nesting_attributes.ancestors unless @nesting_attributes.ancestors.nil?
# build a unique string containing all of Collection F's pathnames and ancestors
exclude_list = ""
array_to_exclude&.uniq&.each do |element|
exclude_list += ' ' unless exclude_list.empty?
exclude_list += element.to_s
end
# Using a !lucene query allows us to get items which match any individual element
# from the list. Building the query via the AF builder created a !field query which
# only searches the field for an exact string and doesn't allow an "OR" connection
# between the elements.
return "-_query_:\"{!lucene q.op=OR df=#{Samvera::NestingIndexer.configuration.solr_field_name_for_storing_pathnames}}#{exclude_list}\"" unless exclude_list.empty?
""
end

def exclude_if_already_parent
# 2) Exclude any of Collection F's direct children
"-" + ActiveFedora::SolrQueryBuilder.construct_query(Samvera::NestingIndexer.configuration.solr_field_name_for_storing_parent_ids => @collection.id.to_s)
end
end
end
Expand Down
Loading