Commit

Replace parent-child with a join field. (toptal#760)
Implement hierarchical structures with the Elasticsearch join field, in place of
the obsolete (and removed) parent-child relationships.
mrzasa authored and Çağatay Yücelen committed Jan 28, 2023
1 parent 334b4bd commit 7247b5e
Showing 16 changed files with 746 additions and 83 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -4,6 +4,8 @@

### New Features

* [#760](https://github.com/toptal/chewy/pull/760): Replace parent-child mapping with a [join field](https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html#parent-child-mapping-types) ([@mrzasa][])

### Changes

### Bugs Fixed
@@ -723,4 +725,3 @@
[@Vitalina-Vakulchyk]: https://github.com/Vitalina-Vakulchyk
[@webgago]: https://github.com/webgago
[@yahooguntu]: https://github.com/yahooguntu

17 changes: 17 additions & 0 deletions README.md
@@ -446,6 +446,23 @@ end

See the section on *Script fields* for details on calculating distance in a search.

### Join fields

You can use a [join field](https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html)
to implement parent-child relationships between documents.
It [replaces the old `parent_id`-based parent-child mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html#parent-child-mapping-types).

To use it, you need to pass `relations` and `join` (with `type` and `id`) options:
```ruby
field :hierarchy_link, type: :join, relations: {question: %i[answer comment], answer: :vote, vote: :subvote}, join: {type: :comment_type, id: :commented_id}
```
assuming you have `comment_type` and `commented_id` fields in your model.

Note that when you reindex a parent, its children and grandchildren will be reindexed as well.
This may require additional queries to the primary database and to Elasticsearch.

Also note that the join field doesn't support crutches (it should be a field directly defined on the model).
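
As a rough illustration of what ends up in the indexed document, the sketch below mimics in plain Ruby how the join field value could be derived from the two model attributes. `Comment` and `join_value` are hypothetical names used only for this example and are not part of Chewy; the actual logic lives in `Chewy::Fields::Base#value`, which uses ActiveSupport's `present?` rather than the explicit check shown here.

```ruby
# Hypothetical stand-in for a model with the two attributes used by the join field.
Comment = Struct.new(:comment_type, :commented_id)

# Sketch of the value composition: a document that has a parent id gets a
# compound value (relation name plus parent id); a root document gets only
# its relation name.
def join_value(object)
  type = object.comment_type
  parent_id = object.commented_id
  return type if parent_id.nil? || parent_id.to_s.empty?

  { name: type, parent: parent_id }
end

root  = join_value(Comment.new('question', nil)) # the plain relation name
child = join_value(Comment.new('answer', 1))     # a hash with :name and :parent
```

A root document therefore carries just its relation name, while a child carries both its relation name and the id of its parent.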

### Crutches™ technology

Assume you are defining your index like this (product has_many categories through product_categories):
6 changes: 6 additions & 0 deletions lib/chewy/errors.rb
@@ -30,4 +30,10 @@ def initialize(type, import_errors)
super message
end
end

class InvalidJoinFieldType < Error
def initialize(join_field_type, join_field_name, relations)
super("`#{join_field_type}` set for the join field `#{join_field_name}` is not on the :relations list (#{relations})")
end
end
end
80 changes: 68 additions & 12 deletions lib/chewy/fields/base.rb
@@ -1,18 +1,20 @@
module Chewy
module Fields
class Base
attr_reader :name, :options, :value, :children
attr_accessor :parent
attr_reader :name, :join_options, :options, :children
attr_accessor :parent # used by Chewy::Index::Mapping to expand nested fields

def initialize(name, value: nil, **options)
@name = name.to_sym
@options = {}
update_options!(**options)
@value = value
@children = []
@allowed_relations = find_allowed_relations(options[:relations]) # for join fields
end

def update_options!(**options)
@join_options = options.delete(:join) || {}
@options = options
end

@@ -53,30 +55,70 @@ def compose(*objects)
{name => result}
end

def value
if join_field?
join_type = join_options[:type]
join_id = join_options[:id]
# memoize
@value ||= proc do |object|
validate_join_type!(value_by_name_proc(join_type).call(object))
# If it's a join field and it has join_id, the value is compound and contains
# both name (type) and id of the parent object
if value_by_name_proc(join_id).call(object).present?
{
name: value_by_name_proc(join_type).call(object), # parent type
parent: value_by_name_proc(join_id).call(object) # parent id
}
else
value_by_name_proc(join_type).call(object)
end
end
else
@value
end
end

private

def geo_point?
@options[:type].to_s == 'geo_point'
end

def join_field?
@options[:type].to_s == 'join'
end

def ignore_blank?
@options.fetch(:ignore_blank) { geo_point? }
end

def evaluate(objects)
object = objects.first

if value.is_a?(Proc)
if value.arity.zero?
object.instance_exec(&value)
elsif value.arity.negative?
value.call(*object)
else
value.call(*objects.first(value.arity))
end
value_by_proc(objects, value)
else
message = value.is_a?(Symbol) || value.is_a?(String) ? value.to_sym : name
value_by_name(objects, value)
end
end

def value_by_proc(objects, value)
object = objects.first
if value.arity.zero?
object.instance_exec(&value)
elsif value.arity.negative?
value.call(*object)
else
value.call(*objects.first(value.arity))
end
end

def value_by_name(objects, value)
object = objects.first
message = value.is_a?(Symbol) || value.is_a?(String) ? value.to_sym : name
value_by_name_proc(message).call(object)
end

def value_by_name_proc(message)
proc do |object|
if object.is_a?(Hash)
if object.key?(message)
object[message]
@@ -89,6 +131,20 @@ def evaluate(objects)
end
end

def validate_join_type!(type)
return unless type
return if @allowed_relations.include?(type.to_sym)

raise Chewy::InvalidJoinFieldType.new(type, @name, options[:relations])
end

def find_allowed_relations(relations)
return [] unless relations
return relations unless relations.is_a?(Hash)

(relations.keys + relations.values).flatten.uniq
end

def compose_children(value, *parent_objects)
return unless value

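
The relation check in this file can be tried out in isolation. The following is a minimal standalone sketch, assuming the `relations` hash from the README example; `allowed_relations` is a hypothetical free-function copy of the private `find_allowed_relations` method, written here only for illustration.

```ruby
# Flatten a :relations option into the list of relation names it mentions,
# covering both parents (hash keys) and children (hash values).
def allowed_relations(relations)
  return [] unless relations
  return relations unless relations.is_a?(Hash)

  (relations.keys + relations.values).flatten.uniq
end

relations = { question: %i[answer comment], answer: :vote, vote: :subvote }
allowed = allowed_relations(relations)
# allowed now lists :question, :answer, :vote, :comment and :subvote

known   = allowed.include?(:comment) # a type present in the relations hash
unknown = allowed.include?(:reply)   # a type absent from it; Chewy would raise
                                     # Chewy::InvalidJoinFieldType in this case
```

Because the check calls `to_sym` on the incoming type, this sketch assumes the relation names are given as symbols, as they are in the README example.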
12 changes: 2 additions & 10 deletions lib/chewy/fields/root.rb
@@ -1,7 +1,7 @@
module Chewy
module Fields
class Root < Chewy::Fields::Base
attr_reader :dynamic_templates, :id, :parent, :parent_id
attr_reader :dynamic_templates, :id

def initialize(name, **options)
super(name, **options)
@@ -12,9 +12,7 @@ def initialize(name, **options)

def update_options!(**options)
@id = options.fetch(:id, options.fetch(:_id, @id))
@parent = options.fetch(:parent, options.fetch(:_parent, @parent))
@parent_id = options.fetch(:parent_id, @parent_id)
@options.merge!(options.except(:id, :_id, :parent, :_parent, :parent_id, :type))
@options.merge!(options.except(:id, :_id, :type))
end

def mappings_hash
@@ -50,12 +48,6 @@ def dynamic_template(*args)
end
end

def compose_parent(object)
return unless parent_id

parent_id.arity.zero? ? object.instance_exec(&parent_id) : parent_id.call(object)
end

def compose_id(object)
return unless id

5 changes: 5 additions & 0 deletions lib/chewy/index/adapter/active_record.rb
@@ -94,6 +94,11 @@ def raw_default_scope_where_ids_in(ids, converter)
object_class.connection.execute(sql).map(&converter)
end

def raw(scope, converter)
sql = scope.to_sql
object_class.connection.execute(sql).map(&converter)
end

def relation_class
::ActiveRecord::Relation
end
10 changes: 6 additions & 4 deletions lib/chewy/index/adapter/orm.rb
@@ -101,11 +101,13 @@ def load(ids, **options)
additional_scope = options[options[:_index].to_sym].try(:[], :scope) || options[:scope]

loaded_objects = load_scope_objects(scope, additional_scope)
.index_by do |object|
object.public_send(primary_key).to_s
end
loaded_objects = raw(loaded_objects, options[:raw_import]) if options[:raw_import]

indexed_objects = loaded_objects.index_by do |object|
object.public_send(primary_key).to_s
end

ids.map { |id| loaded_objects[id.to_s] }
ids.map { |id| indexed_objects[id.to_s] }
end

private
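
The last two steps of `load` (index the loaded objects by id, then map the requested ids back onto them) can be sketched in plain Ruby. This is only an approximation under stated assumptions: `Record` is a hypothetical stand-in for a loaded model, and `to_h` with a block replaces ActiveSupport's `index_by` so the example runs without Rails.

```ruby
# Hypothetical stand-in for a loaded ORM object.
Record = Struct.new(:id, :name)

# Objects may come back from the database in any order.
loaded = [Record.new(2, 'b'), Record.new(1, 'a')]

# Plain-Ruby equivalent of index_by: key each object by its stringified id.
indexed = loaded.to_h { |record| [record.id.to_s, record] }

# Mapping the requested ids restores the caller's order; ids that were not
# loaded produce nil placeholders.
ids = [1, 2, 3]
result = ids.map { |id| indexed[id.to_s] }
names = result.map { |record| record&.name }
```

Keeping the indexed hash in its own variable, as the change above does, lets the optional raw conversion run on the loaded collection before it is keyed by id.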
3 changes: 1 addition & 2 deletions lib/chewy/index/import.rb
@@ -36,8 +36,7 @@ module ClassMethods
# passed objects from the index if they are not in the default scope
# or marked for destruction.
#
# It handles parent-child relationships: if the object parent_id has been
# changed it destroys the object and recreates it from scratch.
# It handles parent-child relationships with a join field, reindexing children when the parent is reindexed.
#
# Performs journaling if enabled: it stores all the ids of the imported
# objects to a specialized index. It is possible to replay particular import