Synchronize two Playground instances #727

Merged on Nov 6, 2023 (86 commits)

@adamziel adamziel commented Oct 28, 2023

Description

Synchronizes two Playground instances. This is the technical foundation needed to fork entire sites, make changes, merge them back, rebase, undo etc. It's like git for WordPress.

[Video attachment: CleanShot.2023-11-04.at.18.59.07.mp4]

Wait, what? How does it work?

We journal local changes, send them over to a remote peer, and replay them there. The following changes are supported:

  • SQL Queries (INSERT, DELETE, ALTER TABLE etc)
  • Filesystem changes (create, delete, rename, etc)
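
A minimal sketch of the journal-and-replay idea, assuming a simplified entry format. The `JournalEntry` type, the `Peer` interface, and the `replay()` function are hypothetical names used for illustration, not the API introduced in this PR.

```typescript
// Hypothetical journal entry shapes; the real package may model these differently.
type JournalEntry =
  | { type: 'sql'; query: string }
  | { type: 'fs'; operation: 'create' | 'delete'; path: string }
  | { type: 'fs'; operation: 'rename'; path: string; toPath: string };

// Hypothetical interface for whatever applies changes on the receiving side.
interface Peer {
  runSql(query: string): void;
  applyFsOperation(entry: Extract<JournalEntry, { type: 'fs' }>): void;
}

// Replays entries on the peer in the order they were recorded.
function replay(journal: JournalEntry[], peer: Peer): void {
  for (const entry of journal) {
    if (entry.type === 'sql') {
      peer.runSql(entry.query);
    } else {
      peer.applyFsOperation(entry);
    }
  }
}

// Usage: a stand-in peer that only records what it was asked to apply.
const applied: string[] = [];
const recordingPeer: Peer = {
  runSql: (query) => {
    applied.push(`sql:${query}`);
  },
  applyFsOperation: (entry) => {
    applied.push(`fs:${entry.operation}:${entry.path}`);
  },
};

replay(
  [
    { type: 'sql', query: "INSERT INTO wp_posts (post_title) VALUES ('Hello')" },
    { type: 'fs', operation: 'create', path: '/wordpress/a.txt' },
    { type: 'fs', operation: 'rename', path: '/wordpress/a.txt', toPath: '/wordpress/b.txt' },
  ],
  recordingPeer
);
```

Order matters here: the rename only makes sense after the create, which is why the sketch replays the journal strictly in sequence.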

If that's so simple, why doesn't WordPress already support it?

This type of sync was never possible before.

The secret ingredient here is Playground. We can only keep track of all actions because we have full control over the filesystem and the database.

What about conflicting autoincrement IDs?

We're sharding IDs to avoid conflicts. For example, peer 1 could start all autoincrement sequences at 12345000001, while peer 2 could start at 54321000001. This gives both peers a lot of space to create records without assigning the same IDs.

In some ways, this is similar to ID sharding once described on Instagram's engineering blog.
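
To illustrate the sharding idea, here is a small sketch using the two example offsets from above. The `ShardedIdSequence` class is a hypothetical stand-in for a database autoincrement sequence, not code from this PR.

```typescript
// Hypothetical stand-in for an autoincrement sequence that starts at a
// peer-specific offset instead of 1.
class ShardedIdSequence {
  private next: number;

  constructor(firstId: number) {
    this.next = firstId;
  }

  // Returns the next ID and advances the sequence.
  allocate(): number {
    return this.next++;
  }
}

// Offsets taken from the example in the description.
const peer1 = new ShardedIdSequence(12345000001);
const peer2 = new ShardedIdSequence(54321000001);

const idsFromPeer1 = [peer1.allocate(), peer1.allocate()];
const idsFromPeer2 = [peer2.allocate(), peer2.allocate()];
// idsFromPeer1 is [12345000001, 12345000002]
// idsFromPeer2 is [54321000001, 54321000002]

// Number of IDs peer 1 can hand out before it reaches peer 2's range.
const headroom = 54321000001 - 12345000001; // 41976000000
```

Both offsets stay well below `Number.MAX_SAFE_INTEGER`, so plain JavaScript numbers are enough for this illustration.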

What if we run out of space to assign new IDs?

Currently, that would create a conflict and cause the two peers to diverge forever.

In the future, we could rewrite these high IDs to reclaim the space. Here's how it could work:

  1. Alice assigns a high autoincrement ID (e.g. 1234500001) and marks it as "dirty"
  2. Alice sends the change to Bob
  3. Bob finds the available next low ID (e.g. 35) and establishes a mapping between 1234500001 and 35
  4. Bob starts rewriting all occurrences of 1234500001 to 35 in all SQL queries received from Alice
  5. Bob rewrites and applies the received query
  6. Bob sends a confirmation to Alice that the record was "committed" with ID 35
  7. Alice rewrites all local instances of 1234500001 with 35 like Bob did
  8. Alice sends a confirmation to Bob that she reconciled 1234500001 as 35 and Bob may stop rewriting it

The rewriting is needed because IDs are sometimes stored inside serialized data such as JSON or PHP's serialize() output. It's an imperfect heuristic that would occasionally rewrite data that happens to equal our ID but has a different meaning, but perhaps that wouldn't happen often. That's the best we can do anyway: there's no way to reason about the meaning of arbitrary serialized data, as it can come from any WordPress plugin.
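
A rough sketch of what such a text-level rewrite could look like, assuming a plain search-and-replace over the query string. The `rewriteId()` helper is hypothetical; none of this is implemented in the PR, which lists ID rewriting as future work.

```typescript
// Hypothetical helper: replaces every standalone occurrence of `fromId`
// with `toId`. The lookarounds keep it from matching inside longer numbers.
function rewriteId(query: string, fromId: number, toId: number): string {
  const pattern = new RegExp(`(?<![0-9])${fromId}(?![0-9])`, 'g');
  return query.replace(pattern, String(toId));
}

// The ID appears both as a column value and inside PHP-serialized data.
const received = `INSERT INTO wp_postmeta (post_id, meta_value) VALUES (1234500001, 'a:1:{s:6:"parent";i:1234500001;}')`;
const rewritten = rewriteId(received, 1234500001, 35);
// rewritten is:
// INSERT INTO wp_postmeta (post_id, meta_value) VALUES (35, 'a:1:{s:6:"parent";i:35;}')

// A longer number that merely contains the same digits is left alone.
const untouched = rewriteId('SELECT 12345000011', 1234500001, 35);

// The false positive the text warns about: a value that happens to equal
// the ID but means something else (here, a made-up counter) is rewritten too.
const falsePositive = rewriteId(
  `UPDATE wp_options SET option_value = '1234500001' WHERE option_name = 'visitor_count'`,
  1234500001,
  35
);
```

A plain text replacement like this would also leave PHP's serialized string length prefixes stale when an ID is stored as a string (for example `s:10:"1234500001"`), so a real implementation would need more care than this sketch shows.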

Time traveling

Wouldn't it be handy to undo a mistake that messed up your site? Well, now you can.

The journal is a recipe for getting from a vanilla WordPress to the site you have now. We can replay that recipe on a fresh Playground, stop halfway through, and recover the site you had a few minutes ago. This opens the door to a WordPress-wide undo button.

It's quite similar to what Redux devtools provide.
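
The "replay the recipe and stop halfway" idea can be sketched as a reduce over a prefix of the journal. The `Step` type and `replayUpTo()` function are hypothetical, and the site state is reduced to a list of post titles to keep the example small.

```typescript
// Hypothetical: each journal step turns one site state into the next.
type Step<S> = (state: S) => S;

// Rebuilds the state by replaying only the first `stepCount` steps
// on top of a fresh initial state.
function replayUpTo<S>(initial: S, journal: Step<S>[], stepCount: number): S {
  return journal
    .slice(0, stepCount)
    .reduce((state, step) => step(state), initial);
}

// Site state modeled as a list of post titles, starting from an empty site.
const journal: Step<string[]>[] = [
  (posts) => [...posts, 'Hello world'],
  (posts) => [...posts, 'Draft'],
  () => [], // the mistake: every post gets deleted
];

const currentSite = replayUpTo<string[]>([], journal, journal.length);
const beforeMistake = replayUpTo<string[]>([], journal, 2);
// currentSite is []
// beforeMistake is ['Hello world', 'Draft']
```

As with Redux devtools, nothing is reverted in place: the earlier state is recomputed from the initial state and a shorter journal.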

The proof of concept can be accessed at http://localhost:5400/website-server/demos/time-traveling.html:

[Screenshots: CleanShot 2023-11-04 at 20 43 36@2x, CleanShot 2023-11-04 at 20 43 20@2x]

Testing instructions

  1. Run nx dev
  2. Go to http://localhost:5400/website-server/demos/sync.html
  3. Make some changes in either Playground window and confirm that within 5 seconds they're reflected in the other window

Follow up work

  • Sync changes over the network. For now the only transport uses the local iframe.postMessage.
  • Merge the SQLite translator changes to the upstream sqlite-database-integration repo: Expose actions exposing executed SQL queries sqlite-database-integration#56
  • Improve the test coverage. Test recording and replaying SQL queries. Test the ID sharding offset instrumentation to ensure it isn't easily derailed.
  • Do not send all the files eagerly. Save transfer by computing the hash and only sending what the other peer doesn't already have (like git).
  • Negotiate the ID offset used by different peers. This is to avoid sharding collisions when two peers randomly choose similar offsets.
  • Explore ID rewriting to reclaim the ID sharding space (as outlined above).
  • Implement normalizeFilesystemOperations() to be able to transmit files that are created and instantly renamed (see the comment in fs.ts for more details)
  • Time traveling support by restoring the initial state, removing parts of the journal, and replaying what's left

Tasks already done

  • Unit tests
  • Audit the atomic() helper – is it a good idea to expose that kind of API?
  • Test replaying memfs changes in opfs as this PR changed the underlying journaling mechanism
  • Rewrite resource URLs in transmitted SQL queries to and from https://playground.wordpress.net/
  • Keep track of transactions, don’t sync anything that was rolled back, only replay entire commits
  • Ignore violations of unique indexes
  • Batch what gets sent, perhaps collect all non-rolled back queries from a single request handled by PHP, or from all requests in any 3-5 second time window
  • Move all that logic to a new package like @wp-playground/sync, expose an interface to sync via an arbitrary backend
  • Ideally, don’t notify the fs event listener about replayed file operations.
  • Bump autoincrements every time a CREATE_TABLE is executed
  • Ensure that INSERT queries that do not provide an explicit value for their auto_increment ID do carry over the correct ID to the synchronized peer
  • Don’t call post_message_to_js on replayed queries.
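
The transaction-tracking item in the list above could be sketched roughly as follows. The `CommittedQueryCollector` class is a hypothetical simplification: it recognizes only the leading `BEGIN`, `START`, `COMMIT`, and `ROLLBACK` keywords and ignores savepoints and nested transactions, so it is not a description of how the PR implements this.

```typescript
// Hypothetical collector: buffers queries inside a transaction and only
// keeps them once the transaction commits.
class CommittedQueryCollector {
  private pending: string[] = [];
  private inTransaction = false;
  // Each inner array is one unit that is safe to send to the other peer.
  readonly committed: string[][] = [];

  record(query: string): void {
    const match = /^\s*([a-z]+)/i.exec(query);
    const keyword = (match?.[1] ?? '').toUpperCase();

    if (keyword === 'BEGIN' || keyword === 'START') {
      this.inTransaction = true;
      this.pending = [];
    } else if (keyword === 'COMMIT') {
      this.committed.push(this.pending);
      this.pending = [];
      this.inTransaction = false;
    } else if (keyword === 'ROLLBACK') {
      // Rolled-back queries never reach the other peer.
      this.pending = [];
      this.inTransaction = false;
    } else if (this.inTransaction) {
      this.pending.push(query);
    } else {
      // Outside a transaction, each query is treated as its own commit.
      this.committed.push([query]);
    }
  }
}

const collector = new CommittedQueryCollector();
collector.record('BEGIN');
collector.record("INSERT INTO wp_posts (post_title) VALUES ('lost')");
collector.record('ROLLBACK');
collector.record('BEGIN');
collector.record("INSERT INTO wp_posts (post_title) VALUES ('kept')");
collector.record("UPDATE wp_options SET option_value = '1' WHERE option_name = 'x'");
collector.record('COMMIT');
collector.record('DELETE FROM wp_comments WHERE comment_ID = 7');
// collector.committed now holds two units: the committed transaction
// (two queries) and the standalone DELETE.
```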

cc @dmsnell

@adamziel adamziel force-pushed the try/replay-sql-queries branch from fa31127 to 7b75f63 Compare October 29, 2023 23:03

@dmsnell dmsnell left a comment

incomplete review

@adamziel adamziel force-pushed the try/replay-sql-queries branch from 1059219 to c7d4577 Compare November 5, 2023 19:14

adamziel commented Nov 5, 2023

I think that if we base64-encode it then we won't need to escape the JSON value.

@dmsnell With base64 we wouldn't need the json_encode() at all.

But there's a catch:

> btoa('ą')
VM197:1 Uncaught DOMException: Failed to execute 'btoa' on 'Window': The string to be encoded contains characters outside of the Latin1 range.

> atob('xIU=')
'Ä\x85'

Are you familiar with any fast and synchronous ways of dealing with base64 in JS in the browser?

Edit: I went with this one: ce943cd and then simplified it in 67cf982


dmsnell commented Nov 5, 2023

> With base64 we wouldn't need the json_encode() at all.
> But there's a catch:

there's a reason to leave the JSON encoding in there, and you found it. also I didn't know if you would be sending more than just strings, and this lets us send structured data across the boundary. text encoding is a big reason I lean on JSON in places though; it not only handles Unicode issues in a straightforward manner, but it can also escape special characters so they don't trip up naive parsers, e.g. "\u000a" instead of a newline character (though with base64 this second need isn't important).

> fast and synchronous ways of dealing with base64 in JS in the browser?

atob() and btoa() - I'm not sure what the question is


adamziel commented Nov 6, 2023

> > fast and synchronous ways of dealing with base64 in JS in the browser?
>
> atob() and btoa() - I'm not sure what the question is

Oh I just meant they don't work with non-latin characters by default, but converting the string to Uint8Array and then calling String.fromCodePoint(...bytes); does the trick.
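
A minimal sketch of that approach, assuming an environment where `TextEncoder`, `TextDecoder`, `btoa()`, and `atob()` are available (browsers and recent Node.js versions). The function names `base64EncodeUtf8()` and `base64DecodeUtf8()` are made up for this example and are not necessarily what the linked commits use.

```typescript
// Encodes any JavaScript string as base64 by going through its UTF-8 bytes.
function base64EncodeUtf8(text: string): string {
  const bytes = new TextEncoder().encode(text);
  // Every byte becomes one character in the 0-255 range, which btoa() accepts.
  const binary = String.fromCodePoint(...bytes);
  return btoa(binary);
}

// Reverses base64EncodeUtf8(): base64 -> bytes -> UTF-8 decoded string.
function base64DecodeUtf8(encoded: string): string {
  const binary = atob(encoded);
  const bytes = Uint8Array.from(binary, (char) => char.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const encoded = base64EncodeUtf8('ą'); // 'xIU='
const decoded = base64DecodeUtf8(encoded); // 'ą'
```

For very large strings, spreading every byte into a single `String.fromCodePoint()` call may exceed the engine's argument limit, so processing the bytes in chunks may be necessary.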

@dmsnell dmsnell left a comment

This is cool. There will be ample opportunity to iterate on the documentation and interfaces. Maybe we continue playing with it and revising before we publish it too broadly; wouldn't want to get stuck with the API if we're not confident on it.


adamziel commented Nov 6, 2023

@dmsnell I consider most Playground APIs as unstable but yeah, let's still not advertise this too much. Stable APIs or not, not breaking existing apps is just a nice thing to do.

@adamziel adamziel merged commit 420be2b into trunk Nov 6, 2023
4 checks passed
@adamziel adamziel deleted the try/replay-sql-queries branch November 6, 2023 19:15
@adamziel adamziel mentioned this pull request Nov 24, 2023
@adamziel adamziel mentioned this pull request Jan 23, 2024
Labels
[Type] Enhancement New feature or request