Discussion: different cache format #285
Comments
Hi, I hope you don't mind me making some comments and asking some questions, even though I am not a team member. I am just a user (at some point). This subject intrigues me a lot and I am learning, so I hope you will have patience with me.

I've looked at the Relay docs and they use a flattened-record approach. Correct me if I am wrong, but the rule for data sent back from a GraphQL server is that the data always has IDs and these must be unique among all records (i.e. like node IDs in a graph) (as mentioned here towards the bottom)? So, with that in mind, the flattening to records makes sense to me, since every record is unique.

What I don't understand completely is the need to remember references to the actual queries. Maybe I missed you guys trying to come up with a different approach, but what I believe should happen is that the client asks the cache for a full result of data, as if the cache were a GraphQL server itself. If the cache can fulfill the request fully, it responds with a GraphQL response, with no request to the real GraphQL server. If it can't, the cache system would return a diffed query (usually smaller than the original), which would be sent to the real GraphQL server to get the missing data, update the cache, and fulfill the request.

I know the overall goal of Meteor is to be able to combine data sources in the backend, even other REST endpoints. But to do this and conform to GraphQL, any external data source would need some sort of UUID in its data as the ID. This is where things get murky for me. I realize the Relay method of caching the data is complicated, but isn't that the ultimate goal for the client's cache with GraphQL? Fill the request; if it can't, produce the optimized query for the missing data, fetch it, then update the cache?
In other words, and I might be overstepping my realm of influence here, but to me the cache format should be structured for optimal use by the client, so it can serve as "middleware" in front of the real GraphQL server. It shouldn't be built or structured for any other reason. 😄 Scott |
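The "cache as middleware" flow described above can be sketched minimally. This is not Apollo's actual implementation; the function and the flat-path representation of queries are invented purely to illustrate the full-result-or-diffed-query idea:

```javascript
// Sketch of the "cache as middleware" idea: given the paths a query
// needs, return what the cache can already answer plus the missing
// paths that would be sent on to the real GraphQL server.
// All names here are invented for illustration.
function diffAgainstCache(requestedPaths, cache) {
  const fromCache = {};
  const missingPaths = [];
  for (const path of requestedPaths) {
    if (path in cache) {
      fromCache[path] = cache[path];
    } else {
      missingPaths.push(path);
    }
  }
  return { fromCache, missingPaths };
}

const cache = { "user(5).name": "Alice" };
const { fromCache, missingPaths } = diffAgainstCache(
  ["user(5).name", "user(5).address.city"],
  cache
);
// fromCache holds the cached name; missingPaths lists the one field
// that still has to be fetched from the server.
```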
Actually, our client does the exact same thing that Relay does - this question doesn't have anything to do with whether or not IDs are used to normalize. Note that we use IDs when they are present, but don't require them to exist to do caching. |
The suggested approach here with paths is just a different data structure for representing flattened records. |
Oh, OK. The have-or-have-not IDs throws me for a loop, I guess. Theoretically, there won't be a record without IDs, or rather there can't be (in my mind). So I guess that is my misunderstanding, and my thinking is at a higher level than this issue. Sorry for the interruption, then. Scott |
Even with Relay, not all records need to have IDs. In that case, Relay generates an ID, and Apollo Client uses the query path. In fact, you can do a lot of nice caching with no IDs at all, which can be useful in certain cases. |
Interesting. I'll have to learn some more, I guess..... Thanks for your time. Scott |
tl;dr: We need to be able to identify the same entity reached from two different paths. Global IDs serve this nicely.

The Connections and Node Interface machinery caches nicely in Relay (at least I think so, since I don't have Redux devtools to inspect it =) but, no wonder, it is not supported in Apollo. For example (I'm stating the obvious with these examples, but for completeness...):

Users connection

Node Interface

Problem

Even if the first user in the connection (first query) is the same as the user in the node call (second query), they will be cached separately, and all sorts of problems arise: inconsistent data, the only way to update the cache of the connection is calling exactly the same connection, etc. I've seen talk of Apollo supporting these idiomatic features, like here. For anything other than simple applications these features are a must-have. It could get much more complicated: a user has a friend, which is also a user... Identifying different paths - whatever they are - by a unique global ID is much better than the developer having to call queries and think too hard about ways to optimize the caching. I know this feature wouldn't be the default caching mechanism for Apollo, but shouldn't it be not only supported but even encouraged? What is the roadmap for this right now?

ps: Sorry if this discussion doesn't belong here; I'll move it somewhere else... |
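The "Users connection" and "Node Interface" samples referenced above were not preserved in this copy. A hedged reconstruction of what such a pair typically looks like, with a sketch of how a purely path-based cache would store the same user twice (all names, IDs, and key formats are invented for illustration):

```javascript
// Two standard Relay-style queries that can return the same user.
// Field names follow the usual connection/node conventions; the
// specific ID value is made up.
const connectionQuery = `
  {
    users(first: 1) {
      edges {
        node { id name }
      }
    }
  }
`;

const nodeQuery = `
  {
    node(id: "VXNlcjox") {
      ... on User { id name }
    }
  }
`;

// Without global-ID normalization, a purely path-keyed cache holds two
// independent copies of the same entity (illustrative key format):
const cache = {
  'users({"first":1}).edges.0.node.name': "Alice",
  'node({"id":"VXNlcjox"}).name': "Alice",
};
// Updating one entry does not update the other - exactly the
// inconsistency described in this comment.
```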
Please do! We do have the concept of a global ID which you can use by passing
I don't agree that all of what Relay specifies needs to be idiomatic GraphQL, especially because a lot of things, like the mutation spec, are simply driven by the idiosyncrasies of the Facebook-internal GraphQL layer. But I think it should be possible to use Node queries in a useful way if you have them available! |
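Apollo Client exposes a `dataIdFromObject` option for supplying a global ID. The self-contained sketch below shows how such a function can drive normalization; only the `dataIdFromObject` idea comes from Apollo Client's API, while the `normalize` helper and the `__typename:id` key format are invented for illustration:

```javascript
// A dataIdFromObject-style function: return a global ID for objects
// that have one, or null to fall back to path-based storage.
const dataIdFromObject = (obj) =>
  obj.__typename && obj.id ? `${obj.__typename}:${obj.id}` : null;

// Invented helper: walk a result tree, pull identified objects out
// into a flat store, and leave references behind.
function normalize(obj, store = {}) {
  if (obj === null || typeof obj !== "object") return obj;
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    out[key] = normalize(value, store);
  }
  const id = dataIdFromObject(obj);
  if (id) {
    store[id] = out;
    return { ref: id };
  }
  return out;
}

const store = {};
normalize({ user: { __typename: "User", id: 1, name: "Alice" } }, store);
console.log(store["User:1"].name); // "Alice"
```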
Revert "Replace `createMeteorNetworkInterface` with `createNetworkInterface` in `meteor.md`"
Note that this isn't a proposal to actually change anything, just something I wrote up on a plane flight when I was thinking about my mental model for how the apollo cache works.
Idea about alternate cache format
This is based on thinking about how I would write a blog post about how caching works in Apollo client (AC). The main way it makes sense for me to talk about how the cache works, and my mental model for how I can understand what is and isn't cached, is through paths to data (inspired by falcor).
Current model: normalized objects
Let's say you have a query and result like this:
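The original query and result samples were not preserved in this copy. A hypothetical pair in the shape the text describes (a nested query with exactly two scalar fields; the `user`/`address` field names are invented for illustration):

```javascript
// Hypothetical query, shown as a template string, and its result.
// Field names (user, name, address, city) are invented.
const query = `
  {
    user(id: 5) {
      name
      address {
        city
      }
    }
  }
`;

const result = {
  user: {
    name: "Alice",
    address: {
      city: "Springfield",
    },
  },
};
```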
Right now, AC understands this as three objects:
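Assuming a query like `{ user(id: 5) { name address { city } } }` (field names invented), one plausible normalized layout looks like the following. The key and reference formats here are illustrative, not Apollo Client's exact internal format:

```javascript
// A plausible normalized store: three flat records, with generated
// keys and explicit references between them (illustrative only).
const store = {
  ROOT_QUERY: {
    'user({"id":5})': { ref: "ROOT_QUERY.user" },
  },
  "ROOT_QUERY.user": {
    name: "Alice",
    address: { ref: "ROOT_QUERY.user.address" },
  },
  "ROOT_QUERY.user.address": {
    city: "Springfield",
  },
};

// Following the references resolves the nested object:
const userRef = store.ROOT_QUERY['user({"id":5})'].ref;
const cityRef = store[userRef].address.ref;
console.log(store[cityRef].city); // "Springfield"
```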
If one of the objects has an ID, then the paths are relative to the ID:
Representation in cache:
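Continuing the same hypothetical query, if the user object carries an ID, the layout might look like this (again an illustrative sketch, not Apollo Client's exact key format):

```javascript
// With an ID on the user object, the user record is keyed by its ID,
// and the generated key for the nested address is relative to that ID.
const store = {
  ROOT_QUERY: {
    'user({"id":5})': { ref: "5" },
  },
  "5": {
    id: 5,
    name: "Alice",
    address: { ref: "5.address" },
  },
  "5.address": {
    city: "Springfield",
  },
};
```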
This is somewhat nice because we maintain the identity of "objects" - these are things that have fields, and some of the fields are references and some are scalar values.
Alternative model: paths
OK so what's the other model I'm proposing? It's to get rid of the idea of objects entirely, and store paths only instead. Let's look at the above in this light.
Let's look at our query from above again:
This query, which is a nested structure, can be rewritten as a series of paths, just by looking at the query:
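The path list itself was not preserved here. For a hypothetical query `{ user(id: 5) { name address { city } } }` (field names invented), the rewrite might look like:

```javascript
// The nested query, flattened into paths to its two scalar fields.
// The argument-encoding format in the keys is illustrative.
const queryPaths = [
  'user({"id":5}).name',
  'user({"id":5}).address.city',
];
```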
This already gives us a bit of clarity: the query, which looks somewhat complex, is actually only fetching two scalar fields. Now, what if the cache just used these same paths to store the result?
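A sketch of such a path-keyed store, using the same hypothetical query and invented field names as above:

```javascript
// The same flat paths, now used as cache keys mapping directly to
// scalar values - no references or generated record IDs needed.
const store = {
  'user({"id":5}).name': "Alice",
  'user({"id":5}).address.city': "Springfield",
};
```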
You can see right away that this is much easier to look at than the set of three objects at the top, because it completely removes the need for references or generated IDs - it just uses the same stuff you typed in the query. And what would this look like with IDs?
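A sketch of the ID variant, continuing the same invented example, where paths below the identified object become relative to its ID:

```javascript
// Path-keyed store when the user object has ID 5.
// Note: nothing here records which query produced ID 5.
const store = {
  "5.name": "Alice",
  "5.address.city": "Springfield",
};
```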
Also quite clear. There is one question here, which is whether we should somehow remember that user 5 came from the user(id: 5) query - the current format remembers that, but the new one doesn't yet. Perhaps we could add a second entry in the case of IDs, which maps paths to IDs.

Pros and cons

Quick third approach
There's also the approach of storing an actual tree of JSON instead of the paths, and merging the data into the tree on result. So the above would be stored directly as a tree:
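Using the same invented query and result as above, the tree-form store might look like:

```javascript
// The result merged directly into the store as a nested JSON tree.
const store = {
  ROOT_QUERY: {
    'user({"id":5})': {
      name: "Alice",
      address: { city: "Springfield" },
    },
  },
};
```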
Unless there is an ID, in which case the tree is normalized:
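A sketch of the normalized-tree variant for the same invented example, with the identified subtree pulled out under its ID:

```javascript
// The user subtree is lifted out under its ID and replaced with a
// reference; everything below the identified object stays a tree.
const store = {
  ROOT_QUERY: {
    'user({"id":5})': { ref: "5" },
  },
  "5": {
    id: 5,
    name: "Alice",
    address: { city: "Springfield" },
  },
};
```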
I feel like this could be easier to understand for simple cases, but will end up with a disorganized store where it's not clear how deep objects actually go.
Conclusion
I think this is something to think about, and perhaps as we implement new features we should decide if the path-based format would be a help or a hindrance.
Based on the tradeoffs, I don't think we should make this change immediately, but it can be a useful tool for explaining the mental model of how the cache works. If the mental model makes sense to people, it could be worth updating the code to match it.