Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-org-compatible schemas for data in Redux and persistent store #5006

Open
gnprice opened this issue Sep 16, 2021 · 6 comments
Open

Multi-org-compatible schemas for data in Redux and persistent store #5006

gnprice opened this issue Sep 16, 2021 · 6 comments

Comments

@gnprice
Copy link
Member

gnprice commented Sep 16, 2021

We'd like to store server data for all the accounts the user is logged in as, #5005, instead of just a single "active account".

As part of that, we'll need to arrange the data in a way that makes room for data from multiple accounts, both in Redux and in the store that persists between runs of the app.

This issue is for developing that new pair of schemas and migrating to it, without yet making any changes to what data we actually store.

I'll add in a comment some specific design thoughts we have so far on what that might look like.

@gnprice
Copy link
Member Author

gnprice commented Sep 16, 2021

In the status quo, our Redux state tree looks like:

  • accounts -- the only subtree containing multiple accounts' data.

    This is an array, with the active account at index 0 and the rest ordered in LRU fashion.

    For each account, we have (chiefly):

    • realm: URL and email, reflecting how the user logged in
    • apiKey, which we got by logging in, and together with realm and email lets us make requests on behalf of the user
    • zulipVersion, zulipFeatureLevel, and userId, which are server data we get from the /register response aka the initial fetch
  • session and settings, which each have a mix of data related to the particular server/account and data that's per-client; settings persists between launches while session is ephemeral

  • drafts and outbox, which have data that's per-server but that reflect data the server doesn't (or may not) have yet. At present, when you switch accounts we discard these, which isn't great: Do not remove outbox messages on logout and account switch #3278. After we have a new schema (completing the present issue), these would be first in line for a partial version of Store server data for multiple accounts at once #5005.

  • caughtUp and fetching, which together with parts of session represent the boundaries of what parts of the server state we know, what parts we don't know, and what parts we're in the middle of trying to fetch

  • a long list of subtrees all consisting of server data, including those at users, streams, messages, many miscellaneous others, and realm which is basically an arbitrary bundle of miscellaneous bits of server data.

Then the schema of our persistent storage is simple to describe in terms of the Redux schema:

  • It's all in a key-value store.
  • Each top-level subtree is one key, except that some subtrees we leave ephemeral and don't persist at all. The ephemeral subtrees are session, caughtUp, fetching, and a handful of the subtrees with server data.
  • We have a listener on the Redux store watching for changes there, and it periodically takes a new version of the store and writes out the new versions of all the subtrees that have changed. With Store persistent data in a sound way #4841 we plan to make these updates transactional, so that the state on disk always corresponds to a coherent single snapshot of what was in Redux.

@gnprice
Copy link
Member Author

gnprice commented Sep 16, 2021

Here's thoughts on a new design. First, considering what we'd like the schema in Redux to look like, because that's the surface area that the bulk of our app's code interacts with:

  • We'd like all the server data for a given account to be bundled together as one subtree somewhere inside the state.

    That way, when handling a server event or the result of another fetch (like a message fetch), we can have a sub-reducer function that gets passed the new data and just that one subtree -- as far as that sub-reducer is concerned, there's only one account in the world, and it has no access to other accounts' data.

    This is important because handling events and other fetch results represents the bulk of our reducer code. If all that code had to carefully make sure it operated on the right account's data, that'd be a lot of surface area where it'd be easy to make subtle, dangerous bugs by crossing wires between different accounts. With this design, all that reducer code stays no more complicated than it is now with our single-active-account design, with no opportunity in that code for that sort of bug.

    The same subtrees may also contain some per-account data that isn't from the server, like what's now in drafts and outbox.

  • Those server-data subtrees, in turn, can be bundled together as some container. For example we might have a top-level subtree state.byAccount, which might be an Immutable.Map indexed by some way of identifying a particular account.

  • One question is how to index the accounts for that container.

    We should make the indexes stable: once you add an account, it keeps the same key in that container until you forget it / log out, even as you switch between looking at different accounts; log into other accounts; log out of / forget other accounts; or (ideally) change your email on that account. If the key changes, that'd be a likely source of bugs.

    Stability on removing other accounts points away from using an array; we'll want a map.

    I think the cleanest thing is to have an arbitrary, internal, sequential numeric ID we assign to each account you log into. That means we'll need to always know what the ID of the active account is, and we'll want a small data structure outside of this container to tell us the list of accounts we have and their respective IDs.

  • Putting it together, our Redux state tree might look like:

    • settings and session, containing data that's per-client and independent of the particular server/account, with the distinction between them being that one is persistent and the other ephemeral.

      This is just like the status quo of these two subtrees, except that we'll have migrated their per-account data into the per-account subtrees.

    • byAccount, an Immutable.Map or other container keyed on "account ID". Each value is a large subtree containing all the server data and most other per-account data, including drafts, outbox, and parts of what's now in settings and session.

    • activeAccountId: number, which just identifies which subtree of byAccount is currently reflected in the app's main UI. This changes whenever the user switches accounts.

    • nextAccountId: number. When the user logs into another account, we assign it this ID, and then increment this value. Boring, but simple, and it lets us reduce edge cases to worry about by never reusing an account ID.

    • accounts. Then as now, this will be an array of accounts, reflecting the order they appear in the UI (the current account-pick screen, or another future organization switcher.)

      An essential piece of data these will gain is accountId, for finding the bulk of this account's data in state.byAccount. A possible future is that all other per-account data moves there, and state.accounts becomes just a list of account IDs.

      At least initially, I think we want to keep apiKey, realm, and email here in the state.accounts list -- that is, the data we must have for an account, with which we can always fetch the bulk of the rest of it (all but drafts and outbox) from the server. That's because initially (i.e. before doing Store server data for multiple accounts at once #5005, which has significant prerequisites) we're going to still only be keeping around server data for a single account at a time, just as we do now, which means we'll be discarding most of the active account's byAccount subtree when you switch accounts. I wouldn't want a bug there to force the user to log in again from scratch, and conversely there's no rush to move these bits of data.


A UI side note: the behavior where we keep accounts in LRU order is kind of silly. The normal thing would be that they're in a stable order; new accounts you add go at the bottom; and there's some UI for the user to explicitly reorder them, by dragging them around.

It'd be straightforward to proceed with this data-schema change without affecting the LRU behavior. That's even probably the right way to organize the changes, in order to separate PRs making big infrastructural changes from PRs making significant UI changes, which call for different modes of review. But once we've made this schema change, so that the concept of "which account is the active account" is an explicit value rather than being implicitly bound up with the order of the state.accounts list, that would be a good UI change to make.

@gnprice
Copy link
Member Author

gnprice commented Sep 17, 2021

And here's thoughts on what we might want the schema to look like for the data we store persistently on the device.

We're not going to want to continue to have each whole top-level subtree (that we persist at all) correspond to a single key in the persistent store. That would put almost all the app's data in a single key, corresponding to state.byAccount.

  • As a baseline, we'll want the granularity to remain as fine as it is now, with e.g. state.users and state.streams separate. That means we'll want, say, state.byAccount.get(1).users to have its own key.
  • We'll also want other top-level subtrees like state.settings to still have a single key.
  • As a future follow-up improvement, we might split up some of these subtrees further: for example a separate key for each user like state.byAccount.get(1).users.get(12345), while some siblings of users remain single keys, like perhaps state.byAccount.get(1).settings. (Even if we decide to split up settings, too, and every one of its siblings, we won't split all of them at once, so we'll have intermediate states where some are split and others aren't.)

As things stand, we hardcode that one-key-per-top-level-subtree behavior. The upstream redux-persist actually made that configurable -- you could pass it a stateIterator function, to walk a state tree and provide an iterator of pairs (key, subtree) -- but after vendoring/forking that library into our own tree we eliminated that feature, along with other configuration, in order to facilitate refactoring that code to let us make changes on the road to #4841.

But that behavior remains very much under our control. It's expressed entirely by this bit of code in redux-persist/createPersistor.js where we decide what to write:

    // Atomically collect the subtrees that need to be written out.
    const updatedSubstates = [];
    for (const key of Object.keys(state)) {
      if (!passWhitelistBlacklist(key)) {
        continue;
      }
      if (state[key] === lastWrittenState[key]) {
        continue;
      }
      updatedSubstates.push([key, state[key]]);
    }

together with some code in redux-persist/getStoredState.js for the corresponding reads.

So we'll want to write some new code that does those jobs, but with all the specific wrinkles we want, like that byAccount gets split and so do its direct children. Naturally those app-level details of our particular state layout shouldn't live mixed in with the systemsy code at those sites in redux-persist/; we'll make some API for those details to live elsewhere, as either callbacks or config objects that redux-persist applies. Happily it's all internal to our app, so there's no need for that API to be stable or fully general -- we can change up the API for future changes however we need to -- only for it to be reasonably clean for understanding the code on either side.

For efficiency, one difference I think we'll want from the old stateIterator is that it should get passed the old as well as the new state, and do all the walking through them for itself. That way its caller in createPersistor doesn't have to keep looking up potentially-deeply-nested subtrees again and again.

For the actual shape of the keys, I imagine we'll go for something like
state.byAccount.get(1).users -> byAccount:1:users
i.e. just concatenate the components of the path down the tree, with a delimiter like :. (This works as long as the components are always either JS identifiers or numbers, neither of which can contain :. We'll figure something else out if we want more general strings in there.)

@gnprice
Copy link
Member Author

gnprice commented Sep 17, 2021

Another question is how to do migration. This was actually the reason we forked/vendored redux-persist in the first place: upstream had a new major version with a new design for migrations, which insisted on doing migrations locally in each subtree and was not compatible with changes that seriously rearranged the subtrees themselves, as we need to do here.

I think the existing migration logic will mostly give us what we need. But this is something I'll need to look at more closely.

One thing that I know is missing is that when we stop using an old key -- for example users, in favor of byAccount:1:users and byAccount:2:users and so on -- and then we write out a migrated version of the state using the new keys, it would be good to actually delete the old key. The current code will let it sit around, which is inefficient at best and a hazard for causing subtle bugs. To date this hasn't mattered because we have never, at least not since 2018 when we started doing migrations at all, stopped using an old key.

@gnprice
Copy link
Member Author

gnprice commented Sep 18, 2021

Here's a different aspect that requires some more design and experimentation: what will we want the app code to look like?

One part of this, about reducers, is discussed above:

  • We'd like all the server data for a given account to be bundled together as one subtree somewhere inside the state.
    That way, when handling a server event or the result of another fetch (like a message fetch), we can have a sub-reducer function that gets passed the new data and just that one subtree -- as far as that sub-reducer is concerned, there's only one account in the world, and it has no access to other accounts' data.
    This is important because handling events and other fetch results represents the bulk of our reducer code. If all that code had to carefully make sure it operated on the right account's data, that'd be a lot of surface area where it'd be easy to make subtle, dangerous bugs by crossing wires between different accounts. With this design, all that reducer code stays no more complicated than it is now with our single-active-account design, with no opportunity in that code for that sort of bug.

But there's also our UI code, and our selector code (particularly for complex selectors, like those in unreadSelectors.js that assemble the data the unreads screen needs), and our code in fetchActions and elsewhere (much of it organized as thunk actions) that drives requests to the server. For the great bulk of that code, in any given function call or React component there's just one account it's concerned with, and everything it calls in turn should operate on the same one account. Right now, the way we accomplish that is:

  • All our code that uses any per-account data is meant to operate on the active account, unless specifically documented otherwise.
  • So when one piece of code (that's meant to operate on a single account) calls some other piece of code, the callee acts on the same account as the caller because they're both acting on the same account.
  • And then in cases where the active account could change over the course of a function call -- basically, across an await -- we should be checking for that and bailing out if it does, and to the extent we don't, that's a bug. We do this in the event-queue long-polling loop in startEventPolling, which is the most important case, but not most other places. (Just filed Discard fetches when the active, logged-in account has changed #5009 for this.)

In a future with multiple accounts' data #5005, this solution will break down, at least in some areas:

  • Some parts of the UI, and the selectors that drive them, may be able to continue implicitly using the single active account.
  • But the event-queue loop, for one, will certainly be running for several different accounts at once; that's what it takes to be maintaining server data from them.
  • We'll almost certainly also want initial fetches to happen for accounts that aren't the one currently in the foreground of the UI -- i.e., that aren't the active account -- because that's what it takes to start having server data for those accounts, and to have an event queue to maintain it. So doInitialFetch, at least, and its callees, will need to have some explicit way to thread through the information of which account they're supposed to be operating on. That's a significant swath of code already.
  • There will also be some UI that needs to operate on each of several different accounts' server data at the same time, for example Add unread counts to account view #893.
  • Even for the bulk of the app's UI, we may want the implicit "active account" it operates on to be something less than global. For example, when navigating from one account's UI to another (particularly with Add quick organization switcher #1321, perhaps in a drawer or something), there may be a brief period during a transition where both accounts' UI are visible at once. This suggests that we may want something like a "main UI" container component supplying a notion of "active account" just for its React descendants, while several of those can exist at once. That in turn would require that all the selectors driving the main UI be somehow implicitly supplied with the local "active account", rather than finding it implicitly on the global Redux state.

I think what this may point toward is that we're going to want, for each account, a separate object that looks a lot like a Redux store. (Not sure yet if they should each be actual Redux stores, or some sort of proxy for a single global actual Redux store.) That is, we'll want an object that:

  • has a method like getState that gets just the one account's data, to pass it to selectors that operate on server data;
  • has a method like dispatch that
    • can be passed a plain-object action, e.g. with the result of a fetch, and applies it with the appropriate reducer to that account's data;

    • crucially, can be passed a thunk action, and invokes that thunk action with the same per-account versions of dispatch and getState.

      This is the part that will make it reasonable to have a function like doInitialFetch with a bunch of callees -- the bulk of which are already either selectors or thunk actions -- and thread the intended account down throughout its call tree.

  • can be attached in the React component tree in a manner akin to react-redux's Provider component, so that descendants can use functions akin to react-redux's connect, useSelector, and useDispatch to get the appropriate object for their part of the React tree.

I say "a method like getState" and so on because one possibility is that this object should have one method for the per-account state and another for the global state, or the account-independent state.

One thing that might point us toward having a single actual Redux store, and the per-account versions as proxy views on that global store, is if we have much code that with separate stores would need to look at several different stores at once. But there's very little code that acts on several accounts' data at once; the main example that comes to mind is AccountPickScreen, but even there it's mostly a matter of having several AccountItem child components that each get one account's data. And there's very little data that isn't per-account -- a handful of settings, and a handful of bits of state.session like isOnline and orientation -- so there's very little code that acts on that together with per-account data, almost all of it UI code, and it might even be a good thing for that code to be a bit more explicit about consuming account-independent settings.

Another thing that might point toward a single actual store is that separate stores might require more complex management for persistence: if we simply have separate Redux stores each running redux-persist, then they're each separately rehydrating on startup, each separately doing migrations, and each separately making transactional writes. This might be dispositive that we want to keep a single global actual store, at least initially -- I'd rather not have to worry about added complexity there while we're making a bunch of other complex changes.

If we do have a single actual store, and per-account proxies for it, then TBD what interface exactly those proxies provide, and how they're implemented -- this will require some more thinking and experimentation.

@gnprice
Copy link
Member Author

gnprice commented Sep 21, 2021

I think what this may point toward is that we're going to want, for each account, a separate object that looks a lot like a Redux store.

I've done some more thinking about what this internal API for the app code to look like, and then I spent yesterday drafting up a branch to move toward that API.

I'll start sending PRs from the work in that branch. At present it's 120 commits long, though some of those changes are experiments we won't need and some will get squashed together in cleaning it up. In addition to branch cleanups, it needs more jsdoc and comments and probably some tests.

The basic thrust of what's in the branch is to separate which parts of our code will want to be dealing with one of these store-like objects that's bound to a specific account's data, and which really will want to be dealing with the global store. In the branch, we don't yet change any of the actual data structures… but we distinguish per-account vs. global types, even though secretly we're passing the same values around under both types. That lets us use the type-checker to trace how the data flows around to ensure that we're consistent in which code gets which kind of type, and to identify exactly which spots of code rely on assumptions that won't stay true in the future where these objects are actually distinct.

In particular:

  • There's a PerAccountState, which thanks to inexact object types is actually a supertype of GlobalState. For now the PerAccountState values we pass around are secretly just GlobalState values, but in the future they'll have just the actual per-account state.
  • The types Selector, ThunkAction, and Dispatch and values connect, useSelector, and useDispatch are each split into a per-account and a global version. The per-account version gets the existing simple name, because that's what most of the code wants; code that actually needs all accounts' data, or doesn't expect there's a particular account it's operating on, switches to the global version to keep the existing GlobalState-based definition.
  • There's more that follows that, but getting into details may be clearest in the form of PRs.

chrisbobbe added a commit to chrisbobbe/zulip-mobile that referenced this issue Jan 27, 2023
This fixes a regression introduced in 3b811d6 that made it
impossible to log into a new account, by crashing just after
completing authentication with the message "zulipVersion must be
non-null".

Oops: getHaveServerData was not an OK place to call
getServerVersion. It has the following invariant:

  // This invariant will hold as long as we only call this function in a
  // context where we have server data.
  //
  // TODO(zulip#5006): […]
  invariant(zulipVersion, 'zulipVersion must be non-null');

So, abandon that selector and access zulipVersion safely.

While I was at it, I double-checked `Account['zulipVersion']` and
`Account['zulipFeatureLevel']` …and found that their jsdocs weren't
right about when those fields were null. Fix that, copying some text
from the jsdoc of `Account['userId']`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant