Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for initializing a document with a 'null' author and timestamp? #374

Open
HerbCaudill opened this issue May 14, 2021 · 6 comments
Open

Comments

@HerbCaudill
Copy link
Contributor

HerbCaudill commented May 14, 2021

Suppose Alice initializes a document as follows:

let doc = A.from({ todos: [] })

Alice adds some todos:

doc = A.change(doc, s => s.todos.push('Learn Automerge'))
doc = A.change(doc, s => s.todos.push('Profit'))

Now Bob initializes his document using the same code as Alice:

let doc = A.from({ todos: [] })

Alice and Bob connect using the new sync protocol, perhaps using an implementation like this one. https://github.com/HerbCaudill/todomvc-automerge/blob/master/src/AutomergeSync.tsx

Half of the time, they will converge on Alice's list with two items; the other half, they will converge on Bob's empty list. It seems to me that the preferred outcome here is always going to be converge on Alice's list.

I asked about this on Slack and @ept suggested initializing the document with a "null" authorId like 0000, and a timestamp of 0.

conversation from slack fwiw

@HerbCaudill :
Continuing my exploration of Automerge 1, I've made a very minimal TodoMVC app in React, which can sync instances in multiple browsers using @localfirst/relay for networking. Code is here https://github.com/HerbCaudill/todomvc-automerge

In my testing, if you make a bunch of changes to the list in one browser and then open another browser window, there's some % chance that the first browser's state will be overwritten with the default state from the second browser.

IIRC, the way I'm doing this wouldn't have worked in Automerge 0.x : each client starts out with this as their state -

const defaultState = A.from<State>({
  todos: [
    { id: '0', completed: true, value: 'Learn React' },
    { id: '1', completed: false, value: 'Learn Automerge' },
    { id: '2', completed: false, value: 'Profit' },
  ],
})

And the sync protocol lets them reconcile their documents, which in theory started out divergent and shouldn't be able to converge.

I guess I need to have some kind of flag to indicate that I'm still at the default state, and if that's the case then defer to a peer who has already made edits? I've solved this in the past with Automerge 0.x but I'm not sure if I'm thinking about it in the right way & my brain is tired

@ept :
Rather than using A.from, maybe a good option would be to initialise all clients with the same initial change. If the actorId and the change timestamp are hardcoded, the rest of the change generation should be deterministic, and so all clients should have an identical initial change hash.

@HerbCaudill :
hm ok that makes sense. how do I hardcode the timestamp?

@ept :
try this — a bit convoluted, but ought to work

const defaultState = Automerge.load(Automerge.getLastLocalChange(Automerge.change(Automerge.init('0000'), {time: 0}, doc =>  {
  doc.todos = [
    { id: '0', completed: true, value: 'Learn React' },
    { id: '1', completed: false, value: 'Learn Automerge' },
    { id: '2', completed: false, value: 'Profit' }
  ]
)))

Here 0000 is the actorId and 0 is the change timestamp. don't use them for anything else, only for this one initialisation step. As long as all clients have exactly the same code, they should generate exactly the same change (with the same hash).

Use Automerge.Backend.getHeads(Automerge.Frontend.getBackendState(defaultState)) to check you're getting the same hash on all clients.

@HerbCaudill :
That seems to work - I haven't been able to reproduce the incorrect behavior, and the hashes match.

So, if I understand this correctly, it seems like this would pretty much always be what I want. I almost feel like the author for init, from, etc. should always be some null-like value, and the timestamp should always be 0. So if I do let d = A.from({foo: 42}) and you do let s = A.from({foo: 42}) we should always be able to merge those two docs

This is the code I've ended up with, which solves the problem:

const initWithNullAuthor = function <T>(initialState: T) {
  const emptyDoc = A.init<T>('0000')
  const changeOptions = { time: 0 }
  return A.load<T>(
    A.getLastLocalChange(
      A.change(
        emptyDoc, //
        changeOptions,
        s => Object.assign(s, initialState)
      )
    )
  )
}

const defaultState = initWithNullAuthor({ todos: [] })

It seems like this is a fairly common use case, if not the most likely scenario & as such, it seems like it deserves better than this ugly user-land solution. A couple of possibilities come to mind:

  1. My preferred solution would be to treat init and from as special cases, with no author, and timestamp 0. Seems to me that it's not really meaningful to talk about "conflicts" in the context of the initial creation of the document. Alice's initialization and Bob's initialization are, for practical purposes, the same event. So basically init would have { authorId: '0000' } as the default options, and from would have { time: 0} as the change options by default.

  2. Alternatively, we could expose my initWithNullAuthor as its own top-level Automerge function (hopefully with a better name!)

@jeffa5
Copy link
Collaborator

jeffa5 commented May 14, 2021

I like the idea of this becoming a part of the library as it seems like a very easy thing to trip up on otherwise, I wonder if there are use cases where you wouldn't always want a common shared ancestor for all peers? I'd expect that in that case you could use separate documents.

Anyway I'd be for adding functionality to the init function at least to create a default change in the backend which basically does nothing (perhaps initialises the root to be an empty map) with a common actor id. This should make syncing less confusing with overwriting histories.

Do we know how Yjs handles this? Just may be a point of reference.

@ept
Copy link
Member

ept commented May 17, 2021

I agree that we need a solution for this problem, but I'm not totally convinced that this approach is the best one. Special-casing the initialisation change to make it deterministic helps in the simple case when all peers perform exactly the same initialisation. However, if the developer ever changes the initialisation code, it will produce a different change with a different hash, and then we're back to square one again with conflicts on the top-level todos key. If you are very careful you can have multiple initialisation changes that chain off each other, but it would be a very fragile API that is easy to use incorrectly.

I think a better approach would be to revive what was discussed years ago in #4: if two users concurrently create objects of the same type under the same key, then we don't make a conflict, but rather we merge the operations on those objects into a single object. That would have the desired effect in many common cases. There might still be some cases of initialisation where a deterministic initialisation change is needed, but they would be much less common.

@HerbCaudill
Copy link
Contributor Author

if two users concurrently create objects of the same type under the same key, then we don't make a conflict, but rather we merge the operations on those objects into a single object

Thanks @ept - I knew something along these lines had come up in the past. I've reread #4 and I agree that's the right approach.

@DiamondYuan
Copy link

DiamondYuan commented May 17, 2021

I think , we could create 2 new data type.

SharedArray and SharedMap (or other name like NoConflictArray , NoConflictMap

user A set data.todo = SharedArray(1)
user B set data.todo = SharedArray(2)
data.todo will be [1,2]

user A set data.todo = SharedArray(1)
user B set data.todo = [2]
data.todo will be [2] or SharedArray[1]

@DiamondYuan
Copy link

DiamondYuan commented May 17, 2021

then we can add some action like makeConfictFreeList and makeConfictFreeMap
default use makeList

@ept
Copy link
Member

ept commented May 17, 2021

@DiamondYuan That would be a big API change, which I'm not keen on. I think we can change the semantics for concurrently created objects on the same key without otherwise changing the API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants