
Refactor RecordAccessors to ResourceFinders #1045

Merged 1 commit into master on Aug 31, 2017

Conversation

@lgebhardt (Member) commented Apr 27, 2017

Background

I originally intended for JR to be able to support different ORMs. However, the architecture of JR has slowly drifted off course, and now needs a correction to meet this goal.

Originally, the Resource class was heavily tied to SQL through ActiveRecord::Relation. With that in mind I created #977, which broke out the calls to the DB layer into a RecordAccessor class. However, there were several problems with this implementation (see #1030 for some context). One, it actually returns resources, so it's not really a RecordAccessor (not sure where my head was). Two, I think it's the wrong abstraction - a RecordAccessor should return records (i.e. models), not resources. Finally, the ActiveRecordAccessor attempts to handle caching as well by returning either resources or serialized json from the cache.

Luckily, this implementation has not seen the light of day in a release, and I'd like to take the time needed to straighten it out.

Goals

The broad goals of this PR are to:

  • Clearly define the caching and ORM access layers in JR
  • Refactor RecordAccessors to provide an interface to the ORM instead of returning Resources
  • Refactor caching, which currently crosses many layers

What's in this PR

  • Renames the default accessor from ActiveRecordAccessor to ActiveRelationRecordAccessor
  • Removes the caching code from record accessors
  • Currently caching is disabled and untested in this WIP, but will need to be reimplemented closer to the serializer
  • Cleans up the PORO tests to use a custom record accessor; the PORO model now includes ActiveModel::Model for ActiveRecord compliance

What's left to do?

Caching needs to be reworked to fit in the desired layers. A lot of this is TBD. One proposal is to split the caching into two types:

  • Model caching
  • Serialized Resource caching, handled on a per-resource-instance basis as the response document is being constructed.

This approach to caching would allow Serializer caching to work across all ORMs, even if model caching isn't implemented for a particular ORM.
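As a rough illustration of the serialized-Resource half of that split, an ORM-agnostic fragment cache could key fragments on type, id, and the cache-field version. This is only a sketch; FragmentCache, fetch_fragments, and store_fragment are hypothetical names, not JR's API:

```ruby
# Hypothetical sketch of an ORM-agnostic serialized-fragment cache.
# Fragments are keyed on [type, id, cache_field_version]; a changed
# cache_field value simply produces a miss, so no explicit invalidation
# pass is needed. The Hash store stands in for a real cache backend.
class FragmentCache
  def initialize
    @store = {}
  end

  # id_and_versions: array of [id, cache_field_version] pairs.
  # Returns [hits, miss_ids]; hits maps id => cached JSON fragment.
  def fetch_fragments(type, id_and_versions)
    hits = {}
    misses = []
    id_and_versions.each do |id, version|
      fragment = @store[[type, id, version]]
      if fragment
        hits[id] = fragment
      else
        misses << id
      end
    end
    [hits, misses]
  end

  def store_fragment(type, id, version, json)
    @store[[type, id, version]] = json
  end
end
```

A serializer (or processor) would poll this with the plucked id/cache-field pairs and only instantiate models for the misses.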

Here's a diagram to illustrate the layers and the caching locations I'm proposing:
[diagram: jr-app-architecture]

@DavidMikeSimon (Contributor) commented Apr 28, 2017

I'm definitely in favor of this architecture change. 👍 :-D.

Regarding caching: so if I'm following correctly, what's currently called the "resource cache" would be split into two parts, a "model cache" and a "serialization cache", in order to decouple as much of the caching code as possible from any particular ORM, and allow it to work at least partially with unusual ORMs and on POROs.

My initial impression is that this is doable and makes sense. One thing though is that I'm not sure a model cache (that is, a cache that returns actual ORM instances) would provide much speed benefit in most cases. I suggest using proxy models instead:

  • Users enable proxy models for the AR models they want to have full caching on. This would be done from the Resource DSL with a method like model_proxy, which just sets a flag that the accessor can read and interpret.

  • A cache-aware accessor, in a cacheable operation like show or index, when fetching models on a Resource that has the model_proxy flag set, returns lightweight instances of a proxy class with only the id and cache field loaded. The proxy model's method_missing will lazily load the real model's instance and forward calls to it. This way, the application will still function even if assumptions made by the caching system are broken (e.g. Cache context based on record #930).

  • There is a risk of major slowdown if the proxy model does a lot of lazy loads. Suppose an index operation loads 50 proxy model instances and calls a forwarded method on each of them. That would be 50 queries! It would have only been 1 query without caching. We should have a setting to warn the user if lazy loads happen, enabled by default in the dev environment.

  • Some ways the user can prevent lazy loads:

    • Change their code to avoid accessing the unavailable method
    • Disable the model proxy on that resource class
    • Maybe: Add more attributes to the proxy model to be loaded from the DB
    • Maybe: Add their own methods to the proxy model (and/or tell the proxy model which methods can be safely copied from the real model)
  • The proxy model instance will go into the Resource's _model field just like a real model instance would. The Resource does not need to know or care.

  • When an association method on the proxy model is called, it returns a proxy association: an array of proxy models, which would lazy load to a real AR association if any unknown methods (e.g. where) are called on it. (This terminology is getting a bit confusing here because AR also has "association proxies". Maybe should call this something else?)

  • The accessor pre-fills proxied associations to other proxied models based on the include paths it is given, similarly to the current preload method. For an association to a model without proxy enabled, it loads the real records for that model and pre-fills those. Ideally it should also be able to pre-fill associations from unproxied models to proxied models, but I'm not sure where in the real model instance we would keep those.
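A minimal sketch of the proxy idea described above, assuming a loader callable that fetches the real model by id (ModelProxy and the loader are illustrative names, not a proposed API): the proxy carries only the id and cache field, and method_missing lazily loads the real model on any other call.

```ruby
# Illustrative-only proxy model: holds just id and cache_field; any
# other method call lazily loads the real model and forwards to it.
class ModelProxy
  attr_reader :id, :cache_field

  def initialize(id, cache_field, loader)
    @id = id
    @cache_field = cache_field
    @loader = loader  # callable that fetches the real model by id
    @real = nil
  end

  def method_missing(name, *args, &block)
    @real ||= @loader.call(@id)  # lazy load on first unknown method
    @real.public_send(name, *args, &block)
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end
```

Reads of id and cache_field never touch the database; a warning hook for lazy loads (as suggested above) could be added inside method_missing.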

Given all that, the serializer could be in charge of interacting with the actual cache of JSON:

  • To enable caching at the serializer layer, there would be a Resource DSL method like serializer_caching.

  • The current caching method would just be a shortcut to calling both model_proxy and serializer_caching.

  • The serializer takes a first pass through the Resources it is given (following include paths) to check for any that can return cache keys. The Resource calls the cache_field method on its model instance, and that works the same whether it has a proxy or a real instance. The serializer polls the cache backend for the keys it gets and uses any results it gets back preferentially. On cache misses, the serializer renders the result as normal and (if there was an associated cache key for the rendered resource) saves it back to the cache.

  • Serialization caching without the proxy model: My guess is that this would not reduce response times in most cases. However, if the model has expensive methods which are treated as attributes, then it would be useful.

  • Proxy model without serialization caching: Not beneficial. The one advantage proxy model instances have over real model instances is that the serializer can use them to get straight at cached serializations. The only other use I can think of would be a scenario where the only fields requested are already available in the proxy (an index of just ids?), but I don't think handling this as a special case would be worth the effort.

@DavidMikeSimon (Contributor):

(I've edited the above comment after posting it, apologies if you already read through it before those changes)

@dgeb (Member) commented May 1, 2017

@DavidMikeSimon I would like to think that, if we get the architecture correct, this proxy model concept could be implemented without knowledge of the resource. I would rather expand the interface of the RecordAccessor to provide finer-grained access to models (e.g. pluck individual fields) than introduce another layer, especially if it means that other layers must treat these proxies any different than "real" models.

@lgebhardt (Member, Author):

I propose we keep the same methodology used in v0.9, where the id and cache_field are plucked from the database and cache hits (serialized resource fragments) for those are retrieved. Then the remaining resources are fetched in a second call to find_by_ids. This would require a pluck-like method on the record_accessor (or another abstract object if we don't want to pollute the record accessor). The calls to perform this two-stage dance could be done in processor.find. Having thought about it more, I think this is a cleaner location to manage the cache than in the serializer.

This can be seen in the updated diagram:

[diagram: jr-app-architecture v2]

The model cache would be a secondary cache to the resource fragments and could save the time when the model hasn't changed but a serialization parameter such as fields has. I'm not sure if JR needs to implement a model caching solution or if we can let users leverage existing solutions.
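The two-stage fetch described above might be sketched as follows; pluck_id_and_cache_field, the cache's fetch method, and find_by_ids are hypothetical stand-ins for the proposed record_accessor and cache interfaces, not actual JR methods:

```ruby
# Hedged sketch of the two-stage fetch in processor.find:
# stage 1 plucks [id, cache_field] pairs, the cache resolves hits,
# stage 2 fully loads only the misses.
def find_with_cache(accessor, cache, filters)
  id_versions = accessor.pluck_id_and_cache_field(filters)       # stage 1
  hits, miss_ids = cache.fetch(id_versions)                      # cache lookup
  fresh = miss_ids.empty? ? [] : accessor.find_by_ids(miss_ids)  # stage 2
  [hits, fresh]
end
```

With a warm cache the second query is skipped entirely, so a fully cached index costs one pluck plus cache reads.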

Thoughts?

@DavidMikeSimon (Contributor):

@dgeb Agreed, Resources and other code outside of the accessor itself shouldn't have to know or care if a record is real or a proxy. My apologies if that was unclear, my comment was a bit of a raw brain core-dump. :-)

@dgeb (Member) commented May 1, 2017

@DavidMikeSimon Haha ... no worries. Some aspects weren't clear to me, but I'm glad we agree on that core constraint.

@lgebhardt I like the idea of moving the code that handles caching for serialized fragments into the direct client of the serializer - the operations processor. This will keep caching concepts from leaking into adjacent layers 👍

@DavidMikeSimon (Contributor):

@lgebhardt I like the idea of a pluck method on accessors, that's an elegant place to add support for cache key plucking to other ORMs, and also it is potentially much less complex than either the current approach in 0.9 or my proxy models proposal.

A couple questions:

  1. Who would be responsible for resolving associations, the accessor or the processor? If it's the accessor, then the pluck method would have to return a much more complicated structure than a simple list of id/cache-key pairs. If it's the processor, then we'd have to add a method to the accessor like "pluck_association" which takes a list of origin ids and an association name and returns a hash that maps origin ids to target ids.

  2. An advantage of the proxy models over plucking is that more of the normal behavior of Resources would be preserved. For example, in Cache context based on record #930, with proxied models the user's expected behavior would have continued with caching enabled, without any code changes needed.

However, an advantage of the plucking approach is that it would probably be faster. I'm not sure how much faster, perhaps only slightly, perhaps significantly.

@DavidMikeSimon (Contributor) commented May 2, 2017

@lgebhardt A possible problem with doing cache stuff outside the serializer: the data values in the included sections are not part of the cached value. If they were, we would have to update cache fields every time an associated record is created or destroyed, even if the association is one-to-many.

So, the serializer has to be able to manipulate the fetched cache fragments before they are sent, not just generate new ones. This may make it complicated to remove caching code from the serializer.

@lgebhardt (Member, Author):

@DavidMikeSimon I might not understand the Proxy Model idea entirely, but I don't see a clean way around the potential for N+1 queries, which with includes could be quite a large N. So I've been thinking of an idea that is similar to the pluck_association idea you mentioned above.

In the case of a request without any includes the processor can pluck the id and cache_field from the primary resource, look for cache hits, and load the misses with one call to find_by_ids. This works well.

When we have includes it gets more complicated. In a normal Rails active relation call with includes, behind the scenes the models are fetched with one call and then the related models are fetched with an additional call based on the ids from the primary model (I think there are a few cases where Rails manages to do all this in one call, but I'm not positive). This is a nice and simple trade-off between the number of database calls and the simplicity of the SQL. You will end up with NI+1 queries, where NI is your number of includes.

I've been poking at includes and the number of queries they run. There's a chance for a large number of queries, depending on the dataset. I hadn't noticed this before because I'm guessing the test data was hiding it.

I think we can solve all this with a strategy using pluck for the primary resource and a separate pluck for each level of the includes. This will give us an array of related ids for each level of the include directives. This can then be resolved to cache hits and a call to resolve misses. This means when caching is on we'd have a pluck and retrieve misses query per level of include, plus a pair of queries for the primary resource.
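That per-level strategy might be sketched like this for a linear include path; pluck_related_ids is a hypothetical accessor method that maps parent ids to related ids in one query, so the query count grows with include depth rather than with row count:

```ruby
# Rough sketch: walk a linear include path level by level, issuing one
# pluck per level. Each level's related ids become the next level's
# parent ids. `pluck_related_ids` is an assumed accessor method.
def collect_related_ids(accessor, primary_ids, include_path)
  levels = {}
  parent_ids = primary_ids
  include_path.each do |relationship|
    related_ids = accessor.pluck_related_ids(relationship, parent_ids)
    levels[relationship] = related_ids
    parent_ids = related_ids
  end
  levels
end
```

The returned per-level id arrays are what would then be resolved against the cache, with one find_by_ids per level for the misses.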

We might be able to extend this to non-cached resources as well. I think it will play nicely with the eager loading, but I'm not sure.

A final benefit is this approach may provide an opportunity to paginate included resources. We could default to tossing out the extra ids from the pluck_includes but some DBs would allow us to group and limit by the parent id. Tossing ids without instantiating models isn't ideal, but is likely a workable solution (at least better than returning them all as we currently do).

I hope I'm not missing anything major.

@DavidMikeSimon (Contributor) commented May 4, 2017

@lgebhardt Your description actually matches the current system's behavior pretty closely, in ActiveRecordAccessor#preload_included_fragments. Seems that great minds think alike! :-D

Currently, given NI = the number of edges in the include tree in the request, it will do NI+1 queries if there are no misses, and an additional query per inclusion where there is at least one miss; therefore, the maximum is (2*NI)+1 queries.

The intention of proxy models is to (a) formalize the structure "an id and a cache field" and (b) allow for a safe (albeit slow) fallback in weird cases. Also in the future possibly (c) allow fields other than id and cache field to be plucked, to allow Resources to implement unusual security behaviors and such without triggering slow lazy loads.

To avoid N+1 on cache misses, the serializer will have to be a little aware of the general existence of proxy models, however it will not have to care about whether any given record is real or a proxy. It just needs a method to ask the accessor to lazy load multiple records (which may or may not be proxies) in one swoop. This (the calling code inside serializer) might look something like:

# Assume that either proxy or real records already exist for every included association
# from source, including indirect associations.
def preload_from_cache(source)
  # Assume get_all_records_by_type returns a hash, where each key is a resource type and
  # each value is an array of all the records (proxy or real) anywhere in the inclusion
  # tree that belong to resources of that type.
  types = get_all_records_by_type(source) 
  types.each do |t, records|
    hits, misses = cache.get(records)
    # Hits is an array of CachedFragments for some of the records
    # Misses is an array of records for which there's no up-to-date cached serialization
    @preloaded_fragments[t] ||= {}
    hits.each do |fragment|
      @preloaded_fragments[t][fragment.id] = fragment
    end
    if misses.length > 0
      # Assume load_all has the same effect as lazy loading each individual
      # miss, but in only one query (e.g. it does a find_by_ids), and that it
      # is a no-op on real records. On accessors which do not support proxies,
      # the implementation of this method would be empty.
      t._record_accessor.load_all(misses)
      misses.each do |record|
        fragment = CachedFragment.new(object_hash(record))
        cache.insert(fragment)
        @preloaded_fragments[t][fragment.id] = fragment
      end
    end
  end
end

After calling preload_from_cache once at the beginning of serialization, any other part of the serializer can check @preloaded_fragments for pre-computed work. No other code in the serializer needs to interact directly with an accessor, and nowhere does the serializer have to know or care if a record is real or proxy.

@DavidMikeSimon (Contributor):

Also @lgebhardt I am very 👍 on paginating included resources, it plugs a big potential resource hog scenario.

@lgebhardt lgebhardt force-pushed the refactor_record_accessors branch from d987063 to b44184e Compare August 4, 2017 19:41
@lgebhardt lgebhardt changed the title Refactor RecordAccessors to return model records instead of resources Refactor RecordAccessors to ResourceFinders Aug 4, 2017
@lgebhardt lgebhardt force-pushed the refactor_record_accessors branch 2 times, most recently from a26aab4 to 74713cb Compare August 7, 2017 12:38
@lgebhardt (Member, Author):

I've updated this PR with the following changes:

  • RecordAccessors have been replaced with ResourceFinder mixins to the Resource class. The default ResourceFinder is set in the configuration, and by default is the ActiveRelationResourceFinder. Other ResourceFinders are planned.
  • ResourceIdentity class is introduced and used for passing Identities around internally. This makes it easier to keep track of the relationship structure, especially with polymorphic relationships.
  • Caching is implemented. Resource retrieval now happens in stages as described earlier: first the structure of the result set is retrieved as ResourceIdentities (plus a cache field when caching) by walking the included relationships; then the Resources are retrieved from the cache; finally any missed Resources are retrieved by instantiating models and serializing the attributes. The structure always comes from the first step.
  • The cache now uses a hash of the cache field. This fixes a bug where the string representation of timestamps was losing millisecond precision. The logic for hashing a cache field can be overridden.
  • Resource.resource_for renamed to Resource.resource_klass_for
  • Resource.resource_for_model renamed to Resource.resource_klass_for_model
  • New apply_join option on relationships, allowing control over how joins are handled.
  • Breaking Change The automatically created Resource methods named for relationships have been removed as they are no longer used.
  • Breaking Change The records_for Resource method has been removed as it is no longer used.
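On the cache-field hashing point above: Ruby's default Time#to_s drops sub-second precision, so two updates within the same second would collide as string cache keys. A hedged sketch of that kind of overridable hashing logic (hash_cache_field is an illustrative name, not JR's actual hook):

```ruby
require 'digest'

# Illustrative sketch: derive the cache key component from a
# high-precision representation of the cache field, so two timestamps
# within the same second still produce distinct keys.
def hash_cache_field(value)
  raw = value.respond_to?(:nsec) ? "#{value.to_i}.#{value.nsec}" : value.to_s
  Digest::SHA1.hexdigest(raw)
end
```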

ToDo:

  • Pagination of related resources
  • Use of Arel instead of SQL fragments in ActiveRelationResourceFinder
  • Maybe add other databases to test suite, which may be important to testing the pagination of related resources
  • Update documentation

It's likely I've missed some breaking changes. I'd like to get those documented, so if you encounter problems where this breaks your implementation please note it here.

@dgeb (Member) left a review:

This is a massive but much needed refactor. I left a few minor questions / suggestions. Looking forward to merging!

end
# classes.each do |klass|
# klass.caching(false)
# end

Review comment (Member):

Can we just delete instead of commenting these out?

end
end
end
# def test_to_many_relationship_pagination

Review comment (Member):

Add a ToDo: since (I think) we want to use this eventually.

# require File.expand_path('../../../test_helper', __FILE__)
# require 'jsonapi-resources'
# require 'json'
#

Review comment (Member):

Another ToDo needed

links: {
self: "/posts/2/relationships/comments",
related: "/posts/2/comments"


Review comment (Member):

Another ToDo needed.

end
end
rescue => e
handle_exceptions(e)
end
render_response_document
end

def run_in_transaction(transactional)

Review comment (Member):

This method run_in_transaction sounds like it should always be transactional. Perhaps the method should be renamed or the transactional param should be dropped?

@@ -1,7 +1,7 @@
require 'jsonapi/formatter'
require 'jsonapi/processor'
require 'jsonapi/record_accessor'
require 'jsonapi/active_record_accessor'
# require 'jsonapi/resource_finder'

Review comment (Member):

Delete instead of comment out?

# else
# type
# end
# end

Review comment (Member):

Can be deleted?

@@ -68,6 +88,10 @@ def redefined_pkey?
belongs_to? && primary_key != resource_klass._default_primary_key
end

def inverse_relationship
@inverse_relationship
end

Review comment (Member):

perhaps use attr_reader?

@lgebhardt lgebhardt force-pushed the refactor_record_accessors branch from 74713cb to e9ace8b Compare August 31, 2017 15:01
@lgebhardt lgebhardt force-pushed the refactor_record_accessors branch from e9ace8b to bdcbf61 Compare August 31, 2017 15:39
@dgeb dgeb merged commit 854ec70 into master Aug 31, 2017
@dgeb dgeb deleted the refactor_record_accessors branch August 31, 2017 17:04
@mmun commented Aug 31, 2017

Woohoo! :)

next unless Module === klass
if ActiveRecord::Base > klass
klass.reflect_on_all_associations(:has_many).select{|r| r.options[:as] }.each do |reflection|
(hash[reflection.options[:as]] ||= []) << klass.name.downcase

Review comment (Collaborator):

context: causing trouble on 9.10->10.2 upgrade

I happen to have an anonymous ActiveRecord class, Class.new(User), that is being found in object space, and since it doesn't have a class name, it is crashing on klass.name.downcase...

A few questions:

  1. Does this need to be a lookup in object space vs. ActiveRecord::Base.descendants?
  2. Could/should it call model_name.name.downcase instead of .name.downcase? That at least gives an ArgumentError ("Class name cannot be blank. You need to supply a name argument") when an anonymous class is given.
  3. Would you accept a PR to make the various parts of this method easier to customize: 1) the selection of candidate classes, 2) the selection of record classes from those candidates, and 3) the processing of the AR record classes that got through 1 and 2?

@bf4 (Collaborator) left a review:

Noting a few areas of change, such as inferring the class, which change from 0.9 -> 0.10 (and 0.11-dev at the time of writing)

(to be honest, I accidentally included too many comments in this review comment. sorry about that. I had some pending when I made it. )

@@ -379,115 +342,56 @@ def related_link(source, relationship)
link_builder.relationships_related_link(source, relationship)
end

def to_one_linkage(source, relationship)
linkage_id = foreign_key_value(source, relationship)
linkage_type = format_key(relationship.type_for_source(source))

Review comment (Collaborator):

upgrade-polymorphic

def type_for_source(source)
if polymorphic?
resource = source.public_send(name)
resource.class._type if resource

Review comment (Collaborator):

upgrade-polymorphic

if _model_class && _model_class.ancestors.collect { |ancestor| ancestor.name }.include?('ActiveRecord::Base')
model_association = _model_class.reflect_on_association(relationship_name)
if model_association
options = options.reverse_merge(class_name: model_association.class_name)

Review comment (Collaborator):

upgrade:inferred class_name

}

related_resources = source_resource.public_send(relationship_type, rel_opts)
source_resource = source_klass.find_by_key(source_id, context: context, fields: fields)

Review comment (Collaborator):

upgrade change

filters: { resource_klass._primary_key => resource.id }
}

resource_set = find_resource_set(resource_klass,

Review comment (Collaborator):

upgrade

options[:cache] = resource_klass.caching?
resources = {}

identities = resource_klass.find_fragments(find_options[:filters], options)

Review comment (Collaborator):

upgrade

relationship = _relationship(relationship_name)

if relationship.polymorphic? && relationship.foreign_key_on == :self
find_related_polymorphic_fragments(source_rids, relationship, options)

Review comment (Collaborator):

:upgrade


pluck_fields = [primary_key, related_key, related_type]

relations = relationship.polymorphic_relations

Review comment (Collaborator):

upgrade:find_related_polymorphic_fragments

if relationship.polymorphic?
table_alias = relationship.parent_resource._table_name

relation_name = polymorphic_relation_name

Review comment (Collaborator):

upgrade

def serialize_to_hash(source)
@top_level_sources = Set.new([source].flatten(1).compact.map {|s| top_level_source_key(s) })
# Converts a resource_set to a hash, conforming to the JSONAPI structure
def serialize_resource_set_to_hash(result_set)

Review comment (Collaborator):

upgrade
