Optimize repeated apollo-cache-inmemory reads by caching partial query results. #3394

Merged: 97 commits, Sep 28, 2018

@benjamn benjamn commented May 3, 2018

Reading repeatedly from apollo-cache-inmemory using either readQueryFromStore or diffQueryAgainstStore currently returns a newly-computed object each time, even if no data IDs in the cache have changed.

Passing the previousResult option can improve application performance by ensuring that equivalent results are === to each other, but the presence of previousResult only makes the cache reading computation more expensive, because new objects are still created and then thrown away if they are structurally equivalent to the previousResult.
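
To make that cost concrete, here is a minimal sketch of the pattern described above. It is not the apollo-cache-inmemory code: `readWithPreviousResultSketch` and `deepEqualSketch` are hypothetical names, and `compute` stands in for the full cache read. The point is only that the fresh result is always built and compared before it can be discarded.

```typescript
// Hypothetical structural comparison, standing in for whatever deep
// equality check a cache would use.
function deepEqualSketch(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aObj = a as Record<string, unknown>;
  const bObj = b as Record<string, unknown>;
  const aKeys = Object.keys(aObj);
  if (aKeys.length !== Object.keys(bObj).length) return false;
  return aKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(bObj, key) &&
      deepEqualSketch(aObj[key], bObj[key]),
  );
}

// Hypothetical read helper: `compute` represents the whole cache read.
function readWithPreviousResultSketch<T>(compute: () => T, previousResult?: T): T {
  // The new result is always computed, even if it ends up being thrown away.
  const fresh = compute();
  if (previousResult !== undefined && deepEqualSketch(fresh, previousResult)) {
    // Referential identity is preserved, but only after paying for the
    // recomputation and the deep comparison.
    return previousResult;
  }
  return fresh;
}
```

In this sketch a second read that passes the first result back returns the same object, yet `compute` still runs twice.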

This PR is a work-in-progress with the goal of returning previous results (including nested result objects, not just the top-level result) immediately, without any unnecessary recomputation, as long as the underlying data IDs involved in the original computation have not been modified in the meantime.

This functionality is based on an npm package called optimism that I wrote to salvage rebuild performance for Meteor 1.4.2, by avoiding unnecessarily rereading files from the file system. It is not an overstatement to say that Meteor would no longer exist as a project without this powerful caching technique.

The optimism library allows caching the results of functions based on (a function of) their arguments, while also keeping track of any other cached functions that were called in the process of evaluating the result, so that the result can be invalidated (or "dirtied") when any of the results of those other functions are dirtied. Dirtying is a very cheap, idempotent operation, since it does not force immediate recomputation, but simply marks the dirtied result as needing to be recomputed the next time the cached function is called with equivalent arguments.
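
The following is a minimal sketch of that idea, written from the description above rather than from the optimism source. `wrapSketch`, `markDirtySketch`, and the single-argument `Map` keying are all assumptions made for illustration; the real library has its own API and key handling.

```typescript
// One cached result, plus the entries whose computation read it.
type EntrySketch<R> = {
  value?: R;
  dirty: boolean;
  parents: Set<EntrySketch<unknown>>;
};

// Entries currently being computed; the top of the stack is the caller.
const computingStack: EntrySketch<unknown>[] = [];

// Dirtying is cheap and idempotent: it only flips flags and walks upward.
function markDirtySketch(entry: EntrySketch<unknown>): void {
  if (entry.dirty) return;
  entry.dirty = true;
  entry.parents.forEach(markDirtySketch);
}

interface WrappedSketch<A, R> {
  (arg: A): R;
  dirty(arg: A): void;
}

function wrapSketch<A, R>(fn: (arg: A) => R): WrappedSketch<A, R> {
  const entries = new Map<A, EntrySketch<R>>();

  const read = (arg: A): R => {
    let entry = entries.get(arg);
    if (!entry) {
      entry = { dirty: true, parents: new Set() };
      entries.set(arg, entry);
    }
    // Whoever is computing right now depends on this entry.
    const parent = computingStack[computingStack.length - 1];
    if (parent) entry.parents.add(parent);

    if (entry.dirty) {
      computingStack.push(entry);
      try {
        entry.value = fn(arg);
        entry.dirty = false;
      } finally {
        computingStack.pop();
      }
    }
    return entry.value as R;
  };

  const dirty = (arg: A): void => {
    const entry = entries.get(arg);
    // No recomputation here; the next call with this argument recomputes.
    if (entry) markDirtySketch(entry);
  };

  return Object.assign(read, { dirty });
}
```

With two wrapped functions where one calls the other, dirtying a single inner argument marks the outer result dirty as well, and the next outer call recomputes only the inner results that were actually dirtied.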

If this approach is successful, it should effectively close the performance gap between apollo-cache-inmemory and https://github.com/convoyinc/apollo-cache-hermes, at least as far as cache reads are concerned, without sacrificing exactness.

Cache write performance should also benefit dramatically, since much of the cost of writing to the cache comes from broadcasting new results for existing queries, which requires first rereading those results from the updated cache.

Along the way, I have taken many opportunities to refactor and simplify the apollo-cache-inmemory code. For example, the first few commits in this PR eliminate the use of graphql-anywhere to read from the local store, which unlocks a number of optimization opportunities by removing a relatively opaque layer of abstraction.

I will try to add comments to the commits below to highlight areas of special interest.

@apollo-cla
apollo-cla commented May 3, 2018

Warnings
⚠️

❗ Big PR

Generated by 🚫 dangerJS

const args = argumentsObjectFromField(field, variables);

const info: ExecInfo = {
resultKey: resultKeyNameFromField(field),

I'm curious if this would be faster if we cached resultKey as a property on field.

@clayne11
clayne11 commented May 4, 2018

This is a fantastic idea. I brought up this issue as it pertains to rendering in #2895 but this will also solve other performance problems. Great work!

return diffQueryAgainstStore({
store: this.config.storeFactory(this.extract(query.optimistic)),
store: store,

`: store` can be omitted.

options?: OptimisticWrapOptions,
): OptimisticWrapperFunction<T>;
defaultMakeCacheKey(...args: any[]): any;
} = require('optimism'); // tslint:disable-line

Why require instead of import + ambient type declaration file for optimism?

// we should only merge if it's an object of the same type
// otherwise, we should delete the generated object
if (typenameChanged) {
store.delete(generatedKey);
store.delete(escapedId.id);

Should probably port this fix too:

// remove the old generated value in case the old value was
// inlined and the new value is not, which is indicated by
// the old id being generated and the new id being real
if (!generated) {
  store.delete(generatedKey);
}

@benjamn benjamn force-pushed the benjamn/cache-result-objects-with-optimism branch 3 times, most recently from 35f549f to 4d5a851 Compare May 17, 2018 00:18
public hasDepTrackingCache() {
  return this.data instanceof DepTrackingCache;
}

protected broadcastWatches() {
  // Skip this when silenced (like inside a transaction)
  if (this.silenceBroadcast) return;

  // right now, we invalidate all queries whenever anything changes
Probably want to remove this comment since it's no longer true.

@jamesreggio

I tried to give this branch a try tonight, but ran into some issues.

First, I encountered the problem I described over here: #3300 (comment)

After patching my way around this issue, I ran into an infinite recursion bug in the merge helper function. It isn't clear what the problem is, and reverting the changes to merge from #3300 didn't fix the issue.

I'll try to pull together a Gist with a query and payload so you can reproduce it.

@jamesreggio

Alright, here's a repro: https://gist.github.com/jamesreggio/eedd17511a3d64d1ba1613cbc08d78c5

It includes the original GraphQL document + variables, the parsed GraphQL document, the resulting data from the server, and the error.

I have a hunch that changes to merge in #3300 led to this issue, but it's quite onerous to cut builds of individual packages within the apollo-client repo for use in another project, so I haven't tried reverting those changes wholesale to see if it resolves the problem. Perhaps you can give that a try?

@hwillson hwillson changed the title Optimize repeated apollo-cache-inmemory reads by caching partial query results. [WIP] Optimize repeated apollo-cache-inmemory reads by caching partial query results. May 29, 2018
@clayne11
clayne11 commented Jun 4, 2018

Any movement on this? This is an incredibly exciting improvement.

@benjamn benjamn force-pushed the benjamn/cache-result-objects-with-optimism branch from 4d5a851 to 0cc85c1 Compare June 5, 2018 23:24
benjamn added a commit that referenced this pull request Jun 5, 2018
Restoring non-enumerability of the ID_KEY Symbol in #3544 made ID_KEY
slightly more hidden from application code, at the cost of slightly worse
performance (because of Object.defineProperty), but tests were still
broken because Jest now includes Symbol keys when checking object equality
(even the non-enumerable ones).

Fortunately, given all the previousResult refactoring that has happened in
PR #3394, we no longer need to store ID_KEY properties at all, which
completely side-steps the question of whether ID_KEY should be enumerable
or not, and avoids any problems due to Jest including Symbol keys when
checking deep equality.

If we decide to bring this ID metadata back in the future, we could use a
WeakMap to associate result objects with their IDs, so that we can avoid
modifying the result objects.
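
A minimal sketch of the WeakMap idea mentioned in the last paragraph, assuming hypothetical helper names (`tagResultSketch`, `idOfSketch`); nothing like this is claimed to exist in the codebase. The ID lives in a side table, so the result object gains no extra keys, Symbol or otherwise.

```typescript
// Side table from result objects to their data IDs. Entries do not keep
// the result objects alive, so they can still be garbage collected.
const idsByResultSketch = new WeakMap<object, string>();

// Hypothetical helper: remember the ID without touching the object.
function tagResultSketch<T extends object>(result: T, id: string): T {
  idsByResultSketch.set(result, id);
  return result;
}

// Hypothetical helper: look the ID up again later.
function idOfSketch(result: object): string | undefined {
  return idsByResultSketch.get(result);
}
```

Because the object itself is unmodified, deep equality checks that include Symbol keys see exactly the fields the query asked for.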
@benjamn benjamn mentioned this pull request Jun 6, 2018
3 tasks
benjamn added a commit that referenced this pull request Jun 6, 2018
After #3444 removed `Map`-based caching for `addTypenameToDocument` (in
order to fix memory leaks), the `InMemoryCache#transformDocument` method
now creates a completely new `DocumentNode` every time it's called
(assuming `this.addTypename` is true, which it is by default).

This commit uses a `WeakMap` to cache calls to `addTypenameToDocument` in
`InMemoryCache#transformDocument`, so that repeated cache reads will no
longer create an unbounded number of new `DocumentNode` objects. The
benefit of the `WeakMap` is that it does not prevent its keys (the
original `DocumentNode` objects) from being garbage collected, which is
another way of preventing memory leaks.  Note that `WeakMap` may have to
be polyfilled in older browsers, but there are many options for that.

This optimization will be important for #3394, since the query document is
involved in the cache keys used to cache partial query results.

cc @hwillson @jbaxleyiii @brunorzn
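
The caching described in this commit message can be sketched roughly as follows. `DocSketch` and `addTypenameSketch` are simplified, hypothetical stand-ins for `DocumentNode` and `addTypenameToDocument`; only the `WeakMap` memoization pattern is the point.

```typescript
// Simplified stand-in for a parsed query document.
interface DocSketch {
  readonly fields: readonly string[];
}

// Stand-in transform: returns a new document unless nothing needs adding.
function addTypenameSketch(doc: DocSketch): DocSketch {
  return doc.fields.indexOf("__typename") !== -1
    ? doc
    : { fields: [...doc.fields, "__typename"] };
}

// Keyed by the original document object. A WeakMap does not prevent its
// keys from being garbage collected, so the cache cannot grow unboundedly
// on behalf of documents nobody references anymore.
const transformedDocs = new WeakMap<DocSketch, DocSketch>();

function transformDocumentSketch(doc: DocSketch): DocSketch {
  let result = transformedDocs.get(doc);
  if (!result) {
    result = addTypenameSketch(doc);
    transformedDocs.set(doc, result);
  }
  return result;
}
```

Repeated calls with the same input object return the same output object, which matters when the transformed document is itself used as part of a cache key.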
@benjamn benjamn force-pushed the benjamn/cache-result-objects-with-optimism branch from 0cc85c1 to 8b2ab9b Compare June 6, 2018 22:19
benjamn added a commit that referenced this pull request Mar 22, 2019
Not all environments where WeakMap must be polyfilled do so reliably:
#3394 (comment)
benjamn added a commit that referenced this pull request Dec 2, 2019
The previousResult option was originally a way to ensure referential
identity of structurally equivalent cache results, before the result
caching system was introduced in #3394. It worked by returning
previousResult whenever it was deeply equal to the new result.

The result caching system works a bit differently, and in particular never
needs to do a deep comparison of results. However, there were still a few
(test) cases where previousResult seemed to have a positive effect, and
removing it seemed like a breaking change, so we kept it around.

In the meantime, the equality check has continued to waste CPU cycles, and
the behavior of previousResult has undermined other improvements, such as
freezing cache results (#4514). Even worse, previousResult effectively
disabled an optimization that allowed InMemoryCache#broadcastWatches to
skip unchanged queries (see comments I removed if curious). This commit
restores that optimization.

I realized eliminating previousResult might finally be possible while
working on PR #5617, which made the result caching system more precise by
depending on IDs+fields rather than just IDs. This additional precision
seems to have eliminated the few remaining cases where previousResult had
any meaningful benefit, as evidenced by the lack of any test changes in
this commit... even among the many direct tests of previousResult in
__tests__/diffAgainstStore.ts!

The removal of previousResult is definitely a breaking change (appropriate
for Apollo Client 3.0), because you can still contrive cases where some
never-before-seen previousResult object just happens to be deeply equal to
the new result. Also, it's fair to say that this removal will strongly
discourage disabling the result caching system (which is still possible
for diagnostic purposes), since we rely on result caching to get the
benefits that previousResult provided.
benjamn added a commit that referenced this pull request Dec 3, 2019
The result caching system introduced by #3394 gained the ability to cache
optimistic results (rather than just non-optimistic results) in #5197, but
since then has suffered from unnecessary cache key diversity during
optimistic updates, because every EntityStore.Layer object (corresponding
to a single optimistic update) counts as a distinct cache key, which
prevents cached results from being reused if they were originally read
from a different Layer object.

This commit introduces the concept of a CacheGroup, store.group, which
manages dependency tracking and also serves as a source of keys for the
result caching system. While the Root object has its own CacheGroup, Layer
objects share a CacheGroup object, which is the key to limiting diversity
of cache keys when more than one optimistic update is pending.

This separation allows the InMemoryCache to enjoy the full benefits of
result caching for both optimistic (Layer) and non-optimistic (Root) data,
separately.
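
As a rough illustration of why a shared group limits key diversity, here is a sketch with hypothetical classes (`GroupSketch`, `RootSketch`, `LayerSketch`) and a hypothetical `cachedResultSketch` helper. It shows only how keys are derived; the dependency tracking that invalidates entries when a layer is written is deliberately omitted, and the real EntityStore classes are more involved.

```typescript
// Hypothetical key object for the result cache.
class GroupSketch {
  constructor(readonly label: string) {}
}

class LayerSketch {
  constructor(
    readonly id: string,
    readonly parent: RootSketch | LayerSketch,
    readonly group: GroupSketch,
  ) {}

  // A layer stacked on a layer reuses the same group.
  addLayer(id: string): LayerSketch {
    return new LayerSketch(id, this, this.group);
  }
}

class RootSketch {
  // The root has a group of its own ...
  readonly group = new GroupSketch("root");
  // ... and every optimistic layer above it shares this second group.
  readonly sharedLayerGroup = new GroupSketch("layers");

  addLayer(id: string): LayerSketch {
    return new LayerSketch(id, this, this.sharedLayerGroup);
  }
}

// Results are keyed by group rather than by individual store object.
// NOTE: invalidation is omitted; this sketch never dirties an entry.
const resultsByGroupSketch = new WeakMap<GroupSketch, Map<string, object>>();

function cachedResultSketch(
  store: RootSketch | LayerSketch,
  queryName: string,
  compute: () => object,
): object {
  let results = resultsByGroupSketch.get(store.group);
  if (!results) {
    results = new Map();
    resultsByGroupSketch.set(store.group, results);
  }
  let result = results.get(queryName);
  if (!result) {
    result = compute();
    results.set(queryName, result);
  }
  return result;
}
```

Keying by layer object instead would give every pending optimistic update its own empty result cache, which is the key diversity problem the commit message describes.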
benjamn added a commit that referenced this pull request Jun 8, 2020
When an object is evicted from the cache, common intuition says that any
dangling references to that object should be proactively removed from
elsewhere in the cache. Thankfully, this intuition is misguided, because a
much simpler and more efficient approach to handling dangling references
is already possible, without requiring any new cache features.

As the tests added in this commit demonstrate, the cleanup of dangling
references can be postponed until the next time the affected fields are
read from the cache, simply by defining a custom read function that
performs any necessary cleanup, in whatever way makes sense for the logic
of the particular field. This lazy approach is vastly more efficient than
scanning the entire cache for dangling references would be, because it
kicks in only for fields you actually care about, the next time you ask
for their values.

For example, you might have a list of references that should be filtered
to exclude the dangling ones, or you might want the dangling references to
be nullified in place (without filtering), or you might have a single
reference that should default to something else if it becomes invalid. All
of these options are matters of application-level logic, so the cache
cannot choose the right default strategy in all cases.

By default, references are left untouched unless you define custom logic
to do something else. It may actually be unwise/destructive to remove
dangling references from the cache, because the evicted data could always
be written back into the cache at some later time, restoring the validity
of the references. Since eviction is not necessarily final, dangling
references represent useful information that should be preserved by
default after eviction, but filtered out just in time to keep them from
causing problems. Even if you ultimately decide to prune the dangling
references, proactively finding and removing them is way more work than
letting a read function handle them on-demand.

This system works because the result caching system (#3394, #5617) tracks
hierarchical field dependencies in a way that causes read functions to be
reinvoked any time the field in question is affected by updates to the
cache, even if the changes are nested many layers deep within the
field. It also helps that custom read functions are consistently invoked
for a given field any time that field is read from the cache, so you don't
have to worry about dangling references leaking out by other means.
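
The lazy cleanup described in this commit message might look roughly like the sketch below. It is self-contained and does not use the Apollo Client API: `RefSketch`, `makeCanReadSketch`, `readListSketch`, and `readSingleSketch` are hypothetical names, with the predicate playing the role of a helper that tells a read function whether a reference still points at readable data.

```typescript
// Simplified stand-in for a normalized cache reference.
interface RefSketch {
  readonly __ref: string;
}

// Simplified stand-in for the normalized entity store.
type EntitiesSketch = Map<string, object>;

// A reference is readable while its target entity is in the store.
function makeCanReadSketch(entities: EntitiesSketch): (ref: RefSketch) => boolean {
  return (ref) => entities.has(ref.__ref);
}

// Read function for a list field: filter dangling references at read time.
// The stored list is left untouched, so re-adding the entity revives the ref.
function readListSketch(
  existing: readonly RefSketch[] | undefined,
  canRead: (ref: RefSketch) => boolean,
): RefSketch[] {
  return existing ? existing.filter(canRead) : [];
}

// Read function for a single reference: fall back to null when dangling.
function readSingleSketch(
  existing: RefSketch | undefined,
  canRead: (ref: RefSketch) => boolean,
): RefSketch | null {
  return existing && canRead(existing) ? existing : null;
}
```

Which strategy is right (filter, nullify, or substitute a default) depends on the field, which is why the choice is left to application-level read functions.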
benjamn added a commit that referenced this pull request Jun 30, 2020
The makeVar method was originally attached to InMemoryCache so that we
could call cache.broadcastWatches() whenever the variable was updated.
See #5799 and #5976 for background.

However, as a number of developers have reported, requiring access to an
InMemoryCache to create a ReactiveVar can be awkward, since the code that
calls makeVar may not be colocated with the code that creates the cache,
and it is often desirable to create and initialize reactive variables
before the cache has been created.

As this commit shows, the ReactiveVar function can infer the current
InMemoryCache from a contextual Slot, when called without arguments (that
is, when reading the variable). When the variable is updated (by passing a
new value to the ReactiveVar function), any caches that previously read
the variable will be notified of the update. Since this logic happens at
variable access time rather than variable creation time, makeVar can be a
free-floating global function, importable directly from @apollo/client.

This new system allows the variable to become associated with any number
of InMemoryCache instances, whereas previously a given variable was only
ever associated with one InMemoryCache. Note: when I say "any number" I
very much mean to include zero, since a ReactiveVar that has not been
associated with any caches yet can still be used as a container, and will
not trigger any broadcasts when updated.

The Slot class that makes this all work may seem like magic, but we have
been using it ever since Apollo Client 2.5 (#3394, via the optimism
library), so it has been amply battle-tested. This magic works.
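
A minimal sketch of the mechanism described in this commit message, under stated assumptions: `CacheSketch`, `makeVarSketch`, and the module-level `currentCacheSketch` variable are hypothetical, and the plain variable stands in for the contextual Slot (the real Slot class is not reproduced here).

```typescript
// Stand-in for the contextual Slot: which cache is currently reading.
let currentCacheSketch: CacheSketch | undefined;

// Hypothetical cache that counts broadcasts instead of notifying watchers.
class CacheSketch {
  broadcasts = 0;

  broadcastWatches(): void {
    this.broadcasts++;
  }

  // Runs `fn` with this cache as the current context, restoring the
  // previous context afterwards.
  read<T>(fn: () => T): T {
    const previous = currentCacheSketch;
    currentCacheSketch = this;
    try {
      return fn();
    } finally {
      currentCacheSketch = previous;
    }
  }
}

// Free-floating factory: no cache is needed to create the variable.
function makeVarSketch<T>(value: T): (newValue?: T) => T {
  const caches = new Set<CacheSketch>();

  return function reactiveVar(newValue?: T): T {
    if (arguments.length > 0) {
      // Called with a value: update, then notify every cache that read it.
      value = newValue as T;
      caches.forEach((cache) => cache.broadcastWatches());
    } else if (currentCacheSketch) {
      // Called without arguments during a cache read: remember that cache.
      caches.add(currentCacheSketch);
    }
    return value;
  };
}
```

A variable that no cache has read yet still works as a plain container, and updating it triggers no broadcasts.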
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 17, 2023