Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser: synchronous → asynchronous execution #7970

Closed
wants to merge 1 commit into from

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Jul 15, 2018

In this patch we're changing the execution model of the post_content parser from synchronous to asynchronous.

const doc = '<!-- wp:paragraph --><p>Wee!</p><!-- /wp:paragraph -->';

const parseSyncNowBroken = ( doc ) => {
	const parsed = parseWithGrammar( doc );
	return parsed.length === 1;
}

const parseAsyncNowWorks = ( doc ) => {
	return parseWithGrammar( doc ).then( parsed => {
		return parsed.length === 1;
	} );
}

const usingAsyncAwait = async ( doc ) => {
	const parsed = await parseWithGrammar( doc );
	return parsed.length === 1;
}

Why use an asynchronous parsing model?

So far whenever we have relied on parsing the raw content from post_content we have done so synchronously and waited for the parser to finish before continuing execution. When the parse is instant and built in a synchronous manner this works well. It imposes some limitations though that we may not want.

Benefits of a synchronous model

  • execution flow is straightforward
  • loading state is "free" because nothing renders until parse finishes

Limitations of a synchronous environment

  • execution of the remainder of the app waits for every parse
  • cannot run parser in WebWorker or other context
  • cannot run parsers written in an asynchronous manner

These limitations are things we anticipated and even since the beginnings of the project we could assume that at some point we would want an asynchronous model. Recently @Hywan wrote a fast implementation of the project's parser specification but the output depends on an asynchronous model. In other words, the timing is right for us to adopt this change.

Benefits of an asynchronous model

  • parsing doesn't block the UI
  • parsing can happen in a WebWorker, over the network, or in any asynchronous manner

Limitations of an asynchronous environment

  • UI must become async-aware, the loading state is no longer "free"
  • race conditions can appear if not planned for and avoided
  • errors during the parse can show up as uncaught promise catches

What? It isn't all peaches?

Sadly once we enter an asynchronous world we invite complexities and race conditions. The code in this PR so-far doesn't address either of these. The first thing you might notice is that when loading a document in the editor we end up loading a blank document for a spit-second before we load the parsed document. If you don't see this then modify post-parser.js to this instead and it will become obvious…

import { parse as syncParse } from './post.pegjs';

export const parse = ( document ) => new Promise( ( resolve ) => {
	setTimeout( () => resolve( syncParse( document ) ), 2500 );
} );

With this change we are simulating that it takes 2.5s to parse our document. You should see an empty document load in the editor immediately and then after the delay the parsed document will surprisingly appear. During that initial delay we can interact with the empty document and this means that we can create a state where we mess up the document and its history - someone will think they lost their post.

For the current parsers this shouldn't be a practical problem since the parse is fast but likely people will see an initial flash.

To mitigate this problem we need to somehow introduce a loading state for the editor: "we are in the process of loading the initial document" and that can appear as a message where the contents would otherwise reside or it could simply be a blank area deferring the render until ready. A common approach I have seen is to keep things blanked out unless the operation takes longer than a given threshold to complete so as not to jar the experience with the flashing loading message, but that's really a detail that isn't the most important thing right now.

As for the race condition we may need to consider the risk and the cost of the solution. Since I think the flash would likely be more jarring than the race condition likely it may be a problem we can feasibly defer. The biggest risk is probably when we have code rapidly calling parse() in sequence and the results come back out of order or if they start from different copies of the original document. One way we can mitigate this is by enforcing a constraint on the parser to be a single actor and only accept (or queue up instead) parsing requests until the current work is finished.

What do we need in this PR then?

  • Please review the code changes here and reflect on their implications. The tests had to change and so do all functions that rely on parsing the post_content - they must become asynchronous themselves.
  • Please consider the race conditions and the experience implications and weigh in on how severe you estimate them to be.
  • Please share any further thoughts or ideas you may have concerning the sync vs. async behavior.
  • Please consider and comment on how you think we should handle exceptions raised during the parse since those will appear as uncaught promise failures. Previously in the sync model an exception would immediately break the app; in the async model it silently allows things to continue but the parse never finishes. (we can easily add a hard-limit on the parse timeout and create a new error state)

@dmsnell dmsnell added [Feature] Block API API that allows to express the block paradigm. Framework Issues related to broader framework topics, especially as it relates to javascript [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f Needs Dev Ready for, and needs developer efforts [Type] Performance Related to performance efforts labels Jul 15, 2018
@dmsnell dmsnell force-pushed the framework/async-parser branch from 34a10cb to ffa5b67 Compare July 15, 2018 20:08
@Hywan
Copy link

Hywan commented Jul 16, 2018

Thanks for the PR! 🎉

Copy link
Member

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an important future-proofing change we should make sooner than later. That it introduces some complexity in loading states is something we would have been better to anticipate earlier in the project, but which we should design for all the same. I think it can be improved from what's implemented here, though it's not awful even as-is, and is certainly better than a loading state which is merely the browser hanging.

It's maybe the only public-facing API where we're "forcing" ES2015+ (i.e. Promises), though not in a way that I'd anticipate complaints being lodged. It's not imposing any build system, browser requirements, or polyfills to be effected by the consuming developers; merely to understand how to work with them as a return type.

@@ -52,11 +52,11 @@ class BlockDropZone extends Component {
}

onHTMLDrop( HTML, position ) {
const blocks = rawHandler( { HTML, mode: 'BLOCKS' } );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just make this function async / await to avoid effecting changes much?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't know we could - done!

export * from './post.pegjs';
import { parse as syncParse } from './post.pegjs';

export const parse = ( document ) => Promise.resolve( syncParse( document ) );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically some variable shadowing going on here with the document global. Would post be a more accurate variable name anyways?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to postContent

HTML: getBlockContent( block ),
mode: 'BLOCKS',
canUserUseUnfilteredHTML,
} ).then( ( content ) => dispatch( 'core/editor' ).replaceBlocks( block.uid, content ) ),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: blocks or newBlocks seems like a more accurate variable name than content.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reconstructed

HTML: block.originalContent,
mode: 'BLOCKS',
} ) );
} ).then( ( content ) => replaceBlock( block.uid, content ) );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: block or newBlock seems like a more accurate variable name than content.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reconstructed

@@ -77,7 +77,7 @@ function getRawTransformations() {
* @param {Array} [options.tagName] The tag into which content will be inserted.
* @param {boolean} [options.canUserUseUnfilteredHTML] Whether or not the user can use unfiltered HTML.
*
* @return {Array|string} A list of blocks or a string, depending on `handlerMode`.
* @return {Promise<Array|string>} A list of blocks or a string, depending on `handlerMode`.
*/
export default function rawHandler( { HTML = '', plainText = '', mode = 'AUTO', tagName, canUserUseUnfilteredHTML = false } ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still one more instance of the synchronous form at:

  • editor/components/block-settings-menu/block-unknown-convert-button

@dmsnell
Copy link
Member Author

dmsnell commented Jul 16, 2018

@aduth I tried converting to async/await and I failed somewhere to get it working. I believe that SETUP_EDITOR is causing the trouble

Actions must be plain objects.

But I'm lost. The Promise version works but the async/await doesn't. Not sure if you have any cycles to try and figure this out - promise version is currently in HEAD^

@dmsnell
Copy link
Member Author

dmsnell commented Jul 16, 2018

Update @aduth the problem is apparently in sending async SETUP_EDITOR as an action. If that action returns a Promise it works but I think the asyncGenerator transform is doing something the middleware doesn't like.

with that, I think using the Promise interface is working and a reasonable way forward unless we figure out why the middleware blows up

@aduth aduth force-pushed the framework/async-parser branch from a03aaf5 to 3b6d5de Compare August 6, 2018 14:34
Copy link
Member

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for my delay in revisiting this one. As restitution, I've taken it upon myself to do the rebase.

The SETUP_EDITOR still wasn't working quite well, since as an async function its return value is a promise, which cannot be handled without an appropriate middleware. I've restored the behavior of it as implemented by Promise, which required some hacking at its tests to become working again.

I'm having some issues with end-to-end tests in my environment. Will see if they persist through to Travis or if it's my local setup.

Unfortunately this appears it'll be a breaking change; one for which we can't really provide a proper deprecate path, since the primary point of change is in the return value of wp.blocks.parse (from Array to Promise<Array>).

We should probably highlight this in release notes.

From my end, this looks good. May be worth having a second set of eyes on it, since it's a pretty fundamental change in parsing behavior.

Copy link
Member

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given Travis failures, end-to-end test issues are probably legitimate.

@aduth aduth added this to the 3.7 milestone Aug 15, 2018
@aduth
Copy link
Member

aduth commented Aug 15, 2018

Milestoning this to 3.7. We should get it in sooner than later. I can plan to revisit the failures soon.

@aduth
Copy link
Member

aduth commented Aug 16, 2018

Thought occurred to me: Should we want serialization to be asynchronous as well, both as a complement to the parse and in similarly enabling future offloading?

@aduth
Copy link
Member

aduth commented Aug 16, 2018

I believe the failures can likely be attributed to the fact that now that the parse is asynchronous, there's an opportunity for the editor to be rendered without its block representation of content being ready. We shouldn't want this, as it can lead to destructive user flows where content is inadvertently replaced. With an asynchronous parse, we probably want to include a "state" where we know that content exists, but we haven't yet received the parsed result; and until we do, we should prevent the user from interacting with the editor (at least the blocks list).

@Hywan
Copy link

Hywan commented Aug 17, 2018

Having an async serializer makes sense to me.

@aduth aduth added [Type] Breaking Change For PRs that introduce a change that will break existing functionality and removed Needs Dev Ready for, and needs developer efforts labels Aug 21, 2018
In this patch **we're changing the execution model of the `post_content` parser from _synchronous_ to _asynchronous_**.

```js
const doc = '<!-- wp:paragraph --><p>Wee!</p><!-- /wp:paragraph -->';

const parseSyncNowBroken = ( doc ) => {
	const parsed = parseWithGrammar( doc );
	return parsed.length === 1;
}

const parseAsyncNowWorks = ( doc ) => {
	return parseWithGrammar( doc ).then( parsed => {
		return parsed.length === 1;
	} );
}

const usingAsyncAwait = async ( doc ) => {
	const parsed = await parseWithGrammar( doc );
	return parsed.length === 1;
}
```

So far whenever we have relied on parsing the raw content from `post_content` we have done so synchronously and waited for the parser to finish before continuing execution. When the parse is instant and built in a synchronous manner this works well. It imposes some limitations though that we may not want.

 - execution flow is straightforward
 - loading state is "free" because nothing renders until parse finishes

 - execution of the remainder of the app waits for every parse
 - cannot run parser in `WebWorker` or other context
 - cannot run parsers written in an asynchronous manner

These limitations are things we anticipated and even since the beginnings of the project we could assume that at some point we would want an asynchronous model. Recently @Hywan wrote a fast implementation of the project's parser specification but the output depends on an asynchronous model. In other words, the timing is right for us to adopt this change.

 - parsing doesn't block the UI
 - parsing can happen in a `WebWorker`, over the network, or in any asynchronous manner

 - UI _must_ become async-aware, the loading state is no longer "free"
 - race conditions _can_ appear if not planned for and avoided

Sadly once we enter an asynchronous world we invite complexities and race conditions. The code in this PR so-far doesn't address either of these. The first thing you might notice is that when loading a document in the editor we end up loading a blank document for a spit-second before we load the parsed document. If you don't see this then modify `post-parser.js` to this instead and it will become obvious…

```js
import { parse as syncParse } from './post.pegjs';

export const parse = ( document ) => new Promise( ( resolve ) => {
	setTimeout( () => resolve( syncParse( document ) ), 2500 );
} );
```

With this change we are simulating that it takes 2.5s to parse our document. You should see an empty document load in the editor immediately and then after the delay the parsed document will surprisingly appear. During that initial delay we can interact with the empty document and this means that we can create a state where we mess up the document and its history - someone will think they lost their post.

For the current parsers this shouldn't be a practical problem since the parse is fast but likely people will see an initial flash.

To mitigate this problem we need to somehow introduce a loading state for the editor: "we are in the process of loading the initial document" and that can appear as a message where the contents would otherwise reside or it could simply be a blank area deferring the render until ready. A common approach I have seen is to keep things blanked out unless the operation takes longer than a given threshold to complete so as not to jar the experience with the flashing loading message, but that's really a detail that isn't the most important thing right now.

As for the race condition we may need to consider the risk and the cost of the solution. Since I think the flash would likely be more jarring than the race condition likely it may be a problem we can feasibly defer. The biggest risk is probably when we have code rapidly calling `parse()` in sequence and the results come back out of order or if they start from different copies of the original document. One way we can mitigate this is by enforcing a constraint on the parser to be a single actor and only accept (or queue up instead) parsing requests until the current work is finished.

 - Please review the code changes here and reflect on their implications. The tests had to change and so do all functions that rely on parsing the `post_content` - they must become asynchronous themselves.
 - Please consider the race conditions and the experience implications and weigh in on how severe you estimate them to be.
 - Please share any further thoughts or ideas you may have concerning the sync vs. async behavior.
@aduth aduth force-pushed the framework/async-parser branch from 3b6d5de to 13560d1 Compare August 21, 2018 20:35
@aduth
Copy link
Member

aduth commented Aug 21, 2018

Pushed a rebase to at least resolve the changes. My plan is push for necessary updates in the morning:

  • Document breaking changes with parse result
  • Update serializer to asynchronous flow
  • Implement loading state for parse-in-progress

To the last point, the reason it's taken me longer than I'd like to have revisited this is that I think it could tie into a larger refactor of the editor initialization. I also think it may surface the need for some generic actions around setting / replacing the content of the editor. Finally, it could even tie into pieces of #8822 (comment) where we need the concept of a placeholder block (here, to represent the known presence / unknown parse result, there to serve as a stand-in for the off-screen blocks).

@aduth
Copy link
Member

aduth commented Aug 22, 2018

For my own reference, I'm also observing a failed network request to /wp/v2/types/undefined on the latest branch when loading the editor.

@youknowriad youknowriad modified the milestones: 3.7, 3.8 Aug 30, 2018
@aduth
Copy link
Member

aduth commented Sep 4, 2018

It's worth noting that should we change serialize to become asynchronous as well, we're subject to similar concerns as when we had considered the serialize result always being generated on the server: currently we expect it is simple to display the HTML representation of a block ("Edit as HTML") or for the entire post (Code Editor). The resolution of the serialize function would therefore become a prerequisite to both of these if refactored to be asynchronous.

@aduth
Copy link
Member

aduth commented Sep 4, 2018

I am still of the opinion that asynchronous parse and serialize are the way to go, but for transparency it's worth considering that a synchronous parse from wp.blocks.parse does not eliminate the option for the editor to receive its parsed blocks asynchronously. See also related efforts in #9403. Since the editor reflects itself on its state populated via resetPost and resetBlocks actions, it matters not if these occur asynchronously. Where I still think it's important that wp.blocks.parse and serialize occur asynchronously is in being the assumed default parser / serializer for the editor, and in treating the current implementation more as a stop-gap solution, where an asynchronous solution is the intended end-goal.

@youknowriad youknowriad removed this from the 3.8 milestone Sep 5, 2018
@Hywan
Copy link

Hywan commented Sep 12, 2018

What is blocking the PR? Can I help?

@aduth
Copy link
Member

aduth commented Sep 12, 2018

@Hywan Currently it's a combination of:

  • For me, an internal conflict on whether we should bother at all. Maybe biased by the implications it has on UX. Maybe because I haven't convinced myself that it needs to be asynchronous. This isn't very actionable, aside from responding to my various earlier brain dumps.
  • On more actionable front, preparing separately the requirements for working in an asynchronous user workflow. Editor: Sync blocks state to edited post content #9403 addresses parse, and itself is semi-blocked by its own forked efforts (Editor: PostTextEditor: Fix case where empty state value defers to prop #9513 and an as-of-yet undeveloped template validation refactor). Nothing yet has been proposed for serialization refactor for asynchronous.

@mcsf
Copy link
Contributor

mcsf commented Oct 12, 2018

  • parsing doesn't block the UI
  • parsing can happen in a WebWorker, over the network, or in any asynchronous manner

I was pretty excited about reaping these benefits, especially in order to take full advantage of @Hywan's excellent Rust parser, but such a fundamental change sadly cannot fit into the WP 5.0 merge timeline.

In practical terms, since replacing the PEG-generated parser in #8083, parsing has worked quickly and robustly on both client and server. The synchronous character of block parsing hasn't yet been a measurable issue.

In time, perhaps, we may justify a switch of the entire cycle (parsing and serialization) to async. For that, I'm labelling this Future and closing. In the meantime, I hope to see client-side applications of the async parser for intensive document processing, or the "emancipation" of Gutenberg's block-based document format thanks to the portability of the Rust parser — e.g. applications that process Gutenberg documents outside the Web environment, adoption in the mobile apps, etc.

A million thanks to both @Hywan and @dmsnell for all the parsing efforts. I'm sure more will come for the document of the Web.

@aduth
Copy link
Member

aduth commented Dec 9, 2019

Follow-up issue to capture some of the goals here: #19021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Block API API that allows to express the block paradigm. [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f Framework Issues related to broader framework topics, especially as it relates to javascript [Type] Breaking Change For PRs that introduce a change that will break existing functionality [Type] Performance Related to performance efforts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants