Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimal Persistence Strategy for a Chat Client #310

Closed
fredguest opened this issue Jan 13, 2017 · 5 comments
Closed

Optimal Persistence Strategy for a Chat Client #310

fredguest opened this issue Jan 13, 2017 · 5 comments

Comments

@fredguest
Copy link

Hey @WolframHempel and @yasserf very cool ecosystem of libraries, thanks for building and OSing them!

I’m coming from traditional REST APIs so Deepstream records and lists are new concepts for me. I built a little demo app with React and Redux to explore the best implementation of records and lists, and the requirements in this case are for an observable collection, but that each item in the collection does not need to be observable, much in the way the collection of messages might work in a chat client. Think Slack for example, the individual messages themselves do not need to be observable, but the collection of messages does so that new messages can be rendered when necessary. I landed on two distinct strategies, each with pros and cons, and I was hoping you might weigh in.

You can find the full codebase for the demo app here: https://github.com/choppur/www

In the first strategy, I used an observable list in order to know when new records are added, and then fetched snapshots of the referenced records, as follows:

import React, { Component, PropTypes } from 'react';
import $ from '../../styles/demo/AnimalsView';
import { css } from 'aphrodite/no-important';
import { client } from '../../utils/client';
import difference from 'lodash/difference';
import Button from '../shared/Button';
import List from '../shared/List';

class AnimalsView extends Component {
  componentDidMount() {
    this.list = client.record.getList(this.props.listName);
    this.list.subscribe((animalNames) =>
      this.props.replaceAnimalNames(animalNames)
    );
  }

  componentWillReceiveProps({ animalNames }) {
    if (animalNames.length > this.props.animalNames.length) {
      difference(animalNames, this.props.animalNames).map((name) =>
        client.record.snapshot(name, (error, data) =>
          this.props.pushAnimalData(data)
        )
      );
    }
  }

  componentWillUnmount() {
    this.list.discard();
  }

  render() {
    return (
      <div className={css($.AnimalsView)}>
        <Button label="Add Animal" onClick={this.onClick.bind(this)} />
        <List items={this.props.animals} />
      </div>
    );
  }

  onClick() {
    const recordName = `${this.props.listName}/${client.getUid()}`;
    const record = client.record.getRecord(recordName);
    record.set({ name: recordName, foo: 'bar' }, () =>
      this.list.addEntry(recordName)
    );
  }
}

AnimalsView.propTypes = {
  animals: PropTypes.array,
  listName: PropTypes.string,
  animalNames: PropTypes.array,
  pushAnimalData: PropTypes.func,
  replaceAnimalNames: PropTypes.func
};

export default AnimalsView;

What I like about this strategy is that it’s fairly efficient in terms of the amount of data sent over the wire, because even though the entire list is being sent to the client on every update, the nature of a list is lightweight, and the data of each record only ever needs to be fetched once. What I don’t love about this strategy is that it feels like there are more callbacks than I would ideally like to be required to synchronize the data safely. Also, as you will see in the video below, it’s a bit laggy.

In the second strategy, I used an observable record without a list, updated a path within the record to persist the array of data, and subscribed to changes in that path in order to know when new items are added, as follows:

import React, { Component, PropTypes } from 'react';
import $ from '../../styles/demo/VegetablesView';
import { css } from 'aphrodite/no-important';
import { client } from '../../utils/client';
import Button from '../shared/Button';
import List from '../shared/List';

class VegetablesView extends Component {
  componentDidMount() {
    this.record = client.record.getRecord(this.props.recordName);
    this.record.whenReady((record) =>
      this.props.replaceVegetableData(record.get('data'))
    );
    this.record.subscribe('data', (data) =>
      this.props.replaceVegetableData(data)
    );
  }

  componentWillUnmount() {
    this.record.discard();
  }

  render() {
    return (
      <div className={css($.VegetablesView)}>
        <Button label="Add Vegetable" onClick={this.onClick.bind(this)} />
        <List items={this.props.vegetables} />
      </div>
    );
  }

  onClick() {
    const newItem = { name: `${this.props.recordName}/${client.getUid()}`, foo: 'bar' };
    this.record.set('data', [ ...this.props.vegetables, newItem ]);
  }
}

VegetablesView.propTypes = {
  vegetables: PropTypes.array,
  recordName: PropTypes.string,
  replaceVegetableData: PropTypes.func
};

export default VegetablesView;

What I like about this strategy is that it lends itself nicely to atomic state (reduxjs/redux#1385), as you could easily create a single record to store the entire state of your application, and then subscribe to isolated paths within that record from the relevant nodes in your component tree. I also like the fact that it’s less code, and simpler code, than the first strategy. What I don’t love about this strategy is that it’s less efficient in terms of the amount of data sent over the wire because it fetches all of the data for all of the items in the array any time the array changes, instead of just sending the list as in the first strategy. Furthermore, this line:

  this.record.set('data', [ ...this.props.vegetables, newItem ]);

...feels unsafe to me. If two different clients disagree about this.props.vegetables, some vegetables could be lost 😝. I could refetch that path within the record immediately before that line of code and use the returned data instead of this.props.vegetables, but that seems wildly inefficient, and I’m not sure it would even guarantee safety.

In considering the second strategy, the record API feels incomplete to me. Instead of simply using set() on a path, which replaces the entire path, what I would ideally like to do is update a path rather than replace it, based on it’s existing state in Deepstream, maybe something like this:

  this.record.array.push('data', newItem);

…which would add a new item to the array at that path based on the existing state of the array at the moment Deepstream received the new item, without having to send all of that back to the client first. And in the case of an object, maybe something like this:

  this.record.object.merge('data', newKey);

…which would merge a new key and it’s value into the object at that path in a similar manner.

Finally, comparing the apparent speed of these two strategies, with the caveat of testing a very small data set, the second strategy is quite noticeably faster:

deepstream

I would love to hear any thoughts, suggestions, or code modifications you may have to share, and thanks again for all of the great work you’ve open sourced here!

@yasserf
Copy link
Contributor

yasserf commented Jan 14, 2017

hey @fredguest

That was quite a read! I would have gone down the first route too, only difference is I would have used list.subscribe but rather

  list.on('entry-added', (recordName) => console.log(`Vegetable ${recordName}`got added!`)

That way you get told which individual things got added rather than having to replace them all. It seems that you are doing a diff between the updates anyways so thats fine. I'm quite curious where the lag is there, as it's the approach we almost all the time as well!

In regards to your second approach:

What I don’t love about this strategy is that it’s less efficient in terms of the amount of data sent over the wire because it fetches all of the data for all of the items in the array any time the array changes, instead of just sending the list as in the first strategy.

If you use paths record.set('vegetable[10]', { ... } ) what would happen is only that data would be sent to deepstream, and it would only be that data sent to all clients. This is part of a concept we have called 'PATCHES' where we only have to send around diffs. The client library abstracts away the fact it was just a small path that changed when you use the subscribe callback.

this.record.set('data', [ ...this.props.vegetables, newItem ]); ...feels unsafe to me. If two different clients disagree about this.props.vegetables, some vegetables could be lost 😝. I could refetch that path within the record immediately before that line of code and use the returned data instead of this.props.vegetables, but that seems wildly inefficient, and I’m not sure it would even guarantee safety.

We actually have pretty strict data guarantees in place. While the callback on a record set is considered safest it's also the slowest since it depends on the data actually being set in the database rather than just in the cache. Deepstream goes with a "everything is good until I tell you it isn't" approach which means if a database error occurs you will be told, but it only requires a write to the cache to actually send out the data.

We also have version numbers on records and lists. What this means if two clients happen to update the exact same record at the exact same time then they will both try to update the record version to the same number as well. Since all records with the same name process their updates in sync this means the second one in the queue will be rejected and inform the client that there was a merge conflict to resolve.

https://deepstream.io/tutorials/core/handling-data-conflicts/

On a single deepstream node due to the nature of the algorithm I haven't been able to see this break. Obviously as your cluster scales this may become an issue but is something that your cache setup would mitigate depending on the consistency levels you set vs speed.

I hope that explains a few of your questions, feel free to let me know what I didn't or any more you may have!

@fredguest
Copy link
Author

fredguest commented Jan 17, 2017

Thanks for getting through the long read @yasserf, I hope you’ll stay with me! 😬 I refactored the code based on your comments, and I also removed Redux just so that all of the state management for each strategy can be seen in a single file.

Just as a quick aside, I don’t see the on() method documented anywhere on this page https://deepstream.io/docs/client-js/datasync-list/ nor do I see the concept of PATCHES and their behavior documented anywhere on this page https://deepstream.io/docs/client-js/datasync-record/ which would both be nice to have.

Taking advantage of list.on('entry-added’, (name) => {}) cleaned things up quite a bit so the first strategy can be refactored as follows:

import React, { Component } from 'react';
import $ from '../../styles/demo/AnimalsView';
import { css } from 'aphrodite/no-important';
import { client } from '../../utils/client';
import Button from '../shared/Button';
import List from '../shared/List';

class AnimalsView extends Component {
  state = { animals: [] }

  componentDidMount() {
    this.list = client.record.getList('animals');

    this.list.whenReady(() =>
      this.list.getEntries().forEach((name) =>
        client.record.snapshot(name, (error, data) =>
          this.setState(({ animals }) => ({ animals: [ ...animals, data ] }))
        )
      )
    );

    this.list.on('entry-added', (name) =>
      client.record.snapshot(name, (error, data) =>
        this.setState(({ animals }) => ({ animals: [ ...animals, data ] }))
      )
    );
  }

  componentWillUnmount() {
    this.list.discard();
  }

  render() {
    return (
      <div className={css($.AnimalsView)}>
        <Button label="Add Animal" onClick={this.onClick.bind(this)} />
        <List items={this.state.animals} />
      </div>
    );
  }

  onClick() {
    const recordName = `animals/${client.getUid()}`;
    const record = client.record.getRecord(recordName);
    record.set({ name: recordName, foo: 'bar' }, () =>
      this.list.addEntry(recordName)
    );
  }
}

export default AnimalsView;

As you can see it’s no longer necessary to use componentWillReceiveProps or to manually diff the props which is great. All in all I think it’s quite a clean solution, but I wish client.record.snapshot() could also take an array of names and return an array of snapshots, rather than having to request each snapshot one at a time in the initial whenReady() callback. Also, it remains noticeably slower than the second strategy in rendering new items. If I had to guess I would attribute the lagginess to two things:

  1. It asks Deepstream to create a new record for each item added, which I imagine is a more expensive operation than simply updating a path on an existing record as in the second strategy.

  2. A callback, a subscription event, and a new snapshot request must complete between the creation of a new record and it’s rendering, rather than a single subscription event triggering the rendering of a new item in the second strategy.

I was also able to refactor the second strategy quite a bit, but I wasn’t able to get PATCHES to behave as expected, or maybe I misunderstood your explanation of them:

import React, { Component } from 'react';
import $ from '../../styles/demo/VegetablesView';
import { css } from 'aphrodite/no-important';
import { client } from '../../utils/client';
import Button from '../shared/Button';
import List from '../shared/List';

class VegetablesView extends Component {
  state = { vegetables: {}, names: [] }

  componentDidMount() {
    this.record = client.record.getRecord('myApp');

    this.record.whenReady((record) => {
      const vegetables = record.get('vegetables') || {};
      const names = Object.values(vegetables).sort((a, b) => a.createdAt - b.createdAt).map((vegetable) => vegetable.name);
      this.setState({ vegetables, names });
    });

    this.record.subscribe('vegetables', (vegetables) => {
      const names = Object.values(vegetables).sort((a, b) => a.createdAt - b.createdAt).map((vegetable) => vegetable.name);
      this.setState({ vegetables, names });
    });
  }

  componentWillUnmount() {
    this.record.discard();
  }

  render() {
    return (
      <div className={css($.VegetablesView)}>
        <Button label="Add Vegetable" onClick={this.onClick.bind(this)} />
        <List items={this.state.names.map((name) => this.state.vegetables[name])} />
      </div>
    );
  }

  onClick() {
    const itemName = `vegetables/${client.getUid()}`;
    this.record.set(`vegetables[${itemName}]`, { name: itemName, createdAt: Date.now() });
  }
}

export default VegetablesView;

I actually prefer this strategy, not only because it is faster, but because it is a truer expression of the requirements. As discussed initially, individual observable records are not required in this case, simply an observable collection, and this strategy avoids creating a new observable record for each item in the list.

The only problem is that in the record.subscribe() callback, I am receiving all of the data at that path, rather than simply the new item that was added. As a result I am forced to this:

    this.record.subscribe('vegetables', (vegetables) => {
      const names = Object.values(vegetables).sort((a, b) => a.createdAt - b.createdAt).map((vegetable) => vegetable.name);
      this.setState({ vegetables, names });
    });

…which sends a lot of data over the wire and must be resorted every time which is silly. If I was able to only receive the individual new items as they were added to that path, then I would be able to do this:

    this.record.subscribe('vegetables', (newVegetable) => {
      this.setState(({ vegetables, names }) => ({
        vegetables: { ...vegetables, [newVegetable.name]: newVegetable },
        names: [ ...names, newVegetable.name ]
      }));
    });

…which is infinitely nicer and more efficient, because you only need to sort the names once on the initial whenReady() callback and never again, and because it sends much less data over the wire. If this were possible, which is what I took your explanation of PATCHES to mean, then this strategy would clearly be superior to the first.

If you want to confirm the performance difference, feel free to clone the client https://github.com/choppur/www and run it against your own Deepstream server and DB. I'm using a Deepstream server with a RethinkDB instance, each running on their own EC2, and no standalone caching layer.

@yasserf
Copy link
Contributor

yasserf commented Mar 1, 2017

Hey Fred,

Apologies for not getting back, we are going live with deepstreamHub in a week and combined with the length of your explanation it unfortunately slipped through the cracks!

If you have any questions regarding API usage please let me know, but if you can keep them slightly more bit size for now I'll definitely be able to get back quicker :$

Thanks!

@fredguest
Copy link
Author

No problem! The idea was to give clear, code examples of the issues, but I can see that also raises the level of investment in responding, no worries. Good luck with the launch of deepstreamHub! I wish you guys lots of success so your open source projects can continue to thrive! :)

@yasserf
Copy link
Contributor

yasserf commented Mar 1, 2017

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants