Parser: synchronous → asynchronous execution #7970

dmsnell · 2018-07-15T19:59:54Z

In this patch we're changing the execution model of the post_content parser from synchronous to asynchronous.

const doc = '<!-- wp:paragraph --><p>Wee!</p><!-- /wp:paragraph -->';

const parseSyncNowBroken = ( doc ) => {
	const parsed = parseWithGrammar( doc );
	return parsed.length === 1;
}

const parseAsyncNowWorks = ( doc ) => {
	return parseWithGrammar( doc ).then( parsed => {
		return parsed.length === 1;
	} );
}

const usingAsyncAwait = async ( doc ) => {
	const parsed = await parseWithGrammar( doc );
	return parsed.length === 1;
}

Why use an asynchronous parsing model?

So far whenever we have relied on parsing the raw content from post_content we have done so synchronously and waited for the parser to finish before continuing execution. When the parse is instant and built in a synchronous manner this works well. It imposes some limitations though that we may not want.

Benefits of a synchronous model

execution flow is straightforward
loading state is "free" because nothing renders until parse finishes

Limitations of a synchronous environment

execution of the remainder of the app waits for every parse
cannot run parser in WebWorker or other context
cannot run parsers written in an asynchronous manner

These limitations are things we anticipated and even since the beginnings of the project we could assume that at some point we would want an asynchronous model. Recently @Hywan wrote a fast implementation of the project's parser specification but the output depends on an asynchronous model. In other words, the timing is right for us to adopt this change.

Benefits of an asynchronous model

parsing doesn't block the UI
parsing can happen in a WebWorker, over the network, or in any asynchronous manner

Limitations of an asynchronous environment

UI must become async-aware, the loading state is no longer "free"
race conditions can appear if not planned for and avoided
errors during the parse can show up as uncaught promise catches

What? It isn't all peaches?

Sadly once we enter an asynchronous world we invite complexities and race conditions. The code in this PR so-far doesn't address either of these. The first thing you might notice is that when loading a document in the editor we end up loading a blank document for a spit-second before we load the parsed document. If you don't see this then modify post-parser.js to this instead and it will become obvious…

import { parse as syncParse } from './post.pegjs';

export const parse = ( document ) => new Promise( ( resolve ) => {
	setTimeout( () => resolve( syncParse( document ) ), 2500 );
} );

With this change we are simulating that it takes 2.5s to parse our document. You should see an empty document load in the editor immediately and then after the delay the parsed document will surprisingly appear. During that initial delay we can interact with the empty document and this means that we can create a state where we mess up the document and its history - someone will think they lost their post.

For the current parsers this shouldn't be a practical problem since the parse is fast but likely people will see an initial flash.

To mitigate this problem we need to somehow introduce a loading state for the editor: "we are in the process of loading the initial document" and that can appear as a message where the contents would otherwise reside or it could simply be a blank area deferring the render until ready. A common approach I have seen is to keep things blanked out unless the operation takes longer than a given threshold to complete so as not to jar the experience with the flashing loading message, but that's really a detail that isn't the most important thing right now.

As for the race condition we may need to consider the risk and the cost of the solution. Since I think the flash would likely be more jarring than the race condition likely it may be a problem we can feasibly defer. The biggest risk is probably when we have code rapidly calling parse() in sequence and the results come back out of order or if they start from different copies of the original document. One way we can mitigate this is by enforcing a constraint on the parser to be a single actor and only accept (or queue up instead) parsing requests until the current work is finished.

What do we need in this PR then?

Please review the code changes here and reflect on their implications. The tests had to change and so do all functions that rely on parsing the post_content - they must become asynchronous themselves.
Please consider the race conditions and the experience implications and weigh in on how severe you estimate them to be.
Please share any further thoughts or ideas you may have concerning the sync vs. async behavior.
Please consider and comment on how you think we should handle exceptions raised during the parse since those will appear as uncaught promise failures. Previously in the sync model an exception would immediately break the app; in the async model it silently allows things to continue but the parse never finishes. (we can easily add a hard-limit on the parse timeout and create a new error state)

Hywan · 2018-07-16T08:30:01Z

Thanks for the PR! 🎉

aduth

I think this is an important future-proofing change we should make sooner than later. That it introduces some complexity in loading states is something we would have been better to anticipate earlier in the project, but which we should design for all the same. I think it can be improved from what's implemented here, though it's not awful even as-is, and is certainly better than a loading state which is merely the browser hanging.

It's maybe the only public-facing API where we're "forcing" ES2015+ (i.e. Promises), though not in a way that I'd anticipate complaints being lodged. It's not imposing any build system, browser requirements, or polyfills to be effected by the consuming developers; merely to understand how to work with them as a return type.

aduth · 2018-07-16T12:08:05Z

editor/components/block-drop-zone/index.js

@@ -52,11 +52,11 @@ class BlockDropZone extends Component {
 	}

 	onHTMLDrop( HTML, position ) {
-		const blocks = rawHandler( { HTML, mode: 'BLOCKS' } );


Should we just make this function async / await to avoid effecting changes much?

didn't know we could - done!

aduth · 2018-07-16T12:08:33Z

blocks/api/post-parser.js

-export * from './post.pegjs';
+import { parse as syncParse } from './post.pegjs';
+
+export const parse = ( document ) => Promise.resolve( syncParse( document ) );


Technically some variable shadowing going on here with the document global. Would post be a more accurate variable name anyways?

changed to postContent

aduth · 2018-07-16T12:11:13Z

editor/components/block-settings-menu/block-html-convert-button.js

+			HTML: getBlockContent( block ),
+			mode: 'BLOCKS',
+			canUserUseUnfilteredHTML,
+		} ).then( ( content ) => dispatch( 'core/editor' ).replaceBlocks( block.uid, content ) ),


Minor: blocks or newBlocks seems like a more accurate variable name than content.

reconstructed

aduth · 2018-07-16T12:17:52Z

editor/components/block-list/invalid-block-warning.js

 				HTML: block.originalContent,
 				mode: 'BLOCKS',
-			} ) );
+			} ).then( ( content ) => replaceBlock( block.uid, content ) );


Minor: block or newBlock seems like a more accurate variable name than content.

reconstructed

aduth · 2018-07-16T12:18:44Z

blocks/api/raw-handling/index.js

@@ -77,7 +77,7 @@ function getRawTransformations() {
 * @param {Array}   [options.tagName]                  The tag into which content will be inserted.
 * @param {boolean} [options.canUserUseUnfilteredHTML] Whether or not the user can use unfiltered HTML.
 *
- * @return {Array|string} A list of blocks or a string, depending on `handlerMode`.
+ * @return {Promise<Array|string>} A list of blocks or a string, depending on `handlerMode`.
 */
 export default function rawHandler( { HTML = '', plainText = '', mode = 'AUTO', tagName, canUserUseUnfilteredHTML = false } ) {


There's still one more instance of the synchronous form at:

editor/components/block-settings-menu/block-unknown-convert-button

dmsnell · 2018-07-16T19:51:07Z

@aduth I tried converting to async/await and I failed somewhere to get it working. I believe that SETUP_EDITOR is causing the trouble

Actions must be plain objects.

But I'm lost. The Promise version works but the async/await doesn't. Not sure if you have any cycles to try and figure this out - promise version is currently in HEAD^

dmsnell · 2018-07-16T20:25:05Z

Update @aduth the problem is apparently in sending async SETUP_EDITOR as an action. If that action returns a Promise it works but I think the asyncGenerator transform is doing something the middleware doesn't like.

with that, I think using the Promise interface is working and a reasonable way forward unless we figure out why the middleware blows up

aduth

Apologies for my delay in revisiting this one. As restitution, I've taken it upon myself to do the rebase.

The SETUP_EDITOR still wasn't working quite well, since as an async function its return value is a promise, which cannot be handled without an appropriate middleware. I've restored the behavior of it as implemented by Promise, which required some hacking at its tests to become working again.

I'm having some issues with end-to-end tests in my environment. Will see if they persist through to Travis or if it's my local setup.

Unfortunately this appears it'll be a breaking change; one for which we can't really provide a proper deprecate path, since the primary point of change is in the return value of wp.blocks.parse (from Array to Promise<Array>).

We should probably highlight this in release notes.

From my end, this looks good. May be worth having a second set of eyes on it, since it's a pretty fundamental change in parsing behavior.

aduth

Given Travis failures, end-to-end test issues are probably legitimate.

aduth · 2018-08-15T15:54:53Z

Milestoning this to 3.7. We should get it in sooner than later. I can plan to revisit the failures soon.

aduth · 2018-08-16T20:22:50Z

Thought occurred to me: Should we want serialization to be asynchronous as well, both as a complement to the parse and in similarly enabling future offloading?

aduth · 2018-08-16T20:34:33Z

I believe the failures can likely be attributed to the fact that now that the parse is asynchronous, there's an opportunity for the editor to be rendered without its block representation of content being ready. We shouldn't want this, as it can lead to destructive user flows where content is inadvertently replaced. With an asynchronous parse, we probably want to include a "state" where we know that content exists, but we haven't yet received the parsed result; and until we do, we should prevent the user from interacting with the editor (at least the blocks list).

Hywan · 2018-08-17T07:49:18Z

Having an async serializer makes sense to me.

@Hywan

In this patch **we're changing the execution model of the `post_content` parser from _synchronous_ to _asynchronous_**. ```js const doc = '<p>Wee!</p>'; const parseSyncNowBroken = ( doc ) => { const parsed = parseWithGrammar( doc ); return parsed.length === 1; } const parseAsyncNowWorks = ( doc ) => { return parseWithGrammar( doc ).then( parsed => { return parsed.length === 1; } ); } const usingAsyncAwait = async ( doc ) => { const parsed = await parseWithGrammar( doc ); return parsed.length === 1; } ``` So far whenever we have relied on parsing the raw content from `post_content` we have done so synchronously and waited for the parser to finish before continuing execution. When the parse is instant and built in a synchronous manner this works well. It imposes some limitations though that we may not want. - execution flow is straightforward - loading state is "free" because nothing renders until parse finishes - execution of the remainder of the app waits for every parse - cannot run parser in `WebWorker` or other context - cannot run parsers written in an asynchronous manner These limitations are things we anticipated and even since the beginnings of the project we could assume that at some point we would want an asynchronous model. Recently @Hywan wrote a fast implementation of the project's parser specification but the output depends on an asynchronous model. In other words, the timing is right for us to adopt this change. - parsing doesn't block the UI - parsing can happen in a `WebWorker`, over the network, or in any asynchronous manner - UI _must_ become async-aware, the loading state is no longer "free" - race conditions _can_ appear if not planned for and avoided Sadly once we enter an asynchronous world we invite complexities and race conditions. The code in this PR so-far doesn't address either of these. The first thing you might notice is that when loading a document in the editor we end up loading a blank document for a spit-second before we load the parsed document. If you don't see this then modify `post-parser.js` to this instead and it will become obvious… ```js import { parse as syncParse } from './post.pegjs'; export const parse = ( document ) => new Promise( ( resolve ) => { setTimeout( () => resolve( syncParse( document ) ), 2500 ); } ); ``` With this change we are simulating that it takes 2.5s to parse our document. You should see an empty document load in the editor immediately and then after the delay the parsed document will surprisingly appear. During that initial delay we can interact with the empty document and this means that we can create a state where we mess up the document and its history - someone will think they lost their post. For the current parsers this shouldn't be a practical problem since the parse is fast but likely people will see an initial flash. To mitigate this problem we need to somehow introduce a loading state for the editor: "we are in the process of loading the initial document" and that can appear as a message where the contents would otherwise reside or it could simply be a blank area deferring the render until ready. A common approach I have seen is to keep things blanked out unless the operation takes longer than a given threshold to complete so as not to jar the experience with the flashing loading message, but that's really a detail that isn't the most important thing right now. As for the race condition we may need to consider the risk and the cost of the solution. Since I think the flash would likely be more jarring than the race condition likely it may be a problem we can feasibly defer. The biggest risk is probably when we have code rapidly calling `parse()` in sequence and the results come back out of order or if they start from different copies of the original document. One way we can mitigate this is by enforcing a constraint on the parser to be a single actor and only accept (or queue up instead) parsing requests until the current work is finished. - Please review the code changes here and reflect on their implications. The tests had to change and so do all functions that rely on parsing the `post_content` - they must become asynchronous themselves. - Please consider the race conditions and the experience implications and weigh in on how severe you estimate them to be. - Please share any further thoughts or ideas you may have concerning the sync vs. async behavior.

aduth · 2018-08-21T20:38:40Z

Pushed a rebase to at least resolve the changes. My plan is push for necessary updates in the morning:

Document breaking changes with parse result
Update serializer to asynchronous flow
Implement loading state for parse-in-progress

To the last point, the reason it's taken me longer than I'd like to have revisited this is that I think it could tie into a larger refactor of the editor initialization. I also think it may surface the need for some generic actions around setting / replacing the content of the editor. Finally, it could even tie into pieces of #8822 (comment) where we need the concept of a placeholder block (here, to represent the known presence / unknown parse result, there to serve as a stand-in for the off-screen blocks).

aduth · 2018-08-22T13:16:26Z

For my own reference, I'm also observing a failed network request to /wp/v2/types/undefined on the latest branch when loading the editor.

aduth · 2018-09-04T18:03:58Z

It's worth noting that should we change serialize to become asynchronous as well, we're subject to similar concerns as when we had considered the serialize result always being generated on the server: currently we expect it is simple to display the HTML representation of a block ("Edit as HTML") or for the entire post (Code Editor). The resolution of the serialize function would therefore become a prerequisite to both of these if refactored to be asynchronous.

aduth · 2018-09-04T18:04:27Z

I am still of the opinion that asynchronous parse and serialize are the way to go, but for transparency it's worth considering that a synchronous parse from wp.blocks.parse does not eliminate the option for the editor to receive its parsed blocks asynchronously. See also related efforts in #9403. Since the editor reflects itself on its state populated via resetPost and resetBlocks actions, it matters not if these occur asynchronously. Where I still think it's important that wp.blocks.parse and serialize occur asynchronously is in being the assumed default parser / serializer for the editor, and in treating the current implementation more as a stop-gap solution, where an asynchronous solution is the intended end-goal.

Hywan · 2018-09-12T09:16:48Z

What is blocking the PR? Can I help?

aduth · 2018-09-12T15:50:08Z

@Hywan Currently it's a combination of:

For me, an internal conflict on whether we should bother at all. Maybe biased by the implications it has on UX. Maybe because I haven't convinced myself that it needs to be asynchronous. This isn't very actionable, aside from responding to my various earlier brain dumps.
On more actionable front, preparing separately the requirements for working in an asynchronous user workflow. Editor: Sync blocks state to edited post content #9403 addresses parse, and itself is semi-blocked by its own forked efforts (Editor: PostTextEditor: Fix case where empty state value defers to prop #9513 and an as-of-yet undeveloped template validation refactor). Nothing yet has been proposed for serialization refactor for asynchronous.

mcsf · 2018-10-12T16:18:21Z

parsing doesn't block the UI

parsing can happen in a WebWorker, over the network, or in any asynchronous manner

I was pretty excited about reaping these benefits, especially in order to take full advantage of @Hywan's excellent Rust parser, but such a fundamental change sadly cannot fit into the WP 5.0 merge timeline.

In practical terms, since replacing the PEG-generated parser in #8083, parsing has worked quickly and robustly on both client and server. The synchronous character of block parsing hasn't yet been a measurable issue.

In time, perhaps, we may justify a switch of the entire cycle (parsing and serialization) to async. For that, I'm labelling this Future and closing. In the meantime, I hope to see client-side applications of the async parser for intensive document processing, or the "emancipation" of Gutenberg's block-based document format thanks to the portability of the Rust parser — e.g. applications that process Gutenberg documents outside the Web environment, adoption in the mobile apps, etc.

A million thanks to both @Hywan and @dmsnell for all the parsing efforts. I'm sure more will come for the document of the Web.

aduth · 2019-12-09T17:35:24Z

Follow-up issue to capture some of the goals here: #19021

dmsnell requested review from youknowriad, mtias, gziolo, aduth and jorgefilipecosta July 15, 2018 19:59

dmsnell force-pushed the framework/async-parser branch from 34a10cb to ffa5b67 Compare July 15, 2018 20:08

aduth reviewed Jul 16, 2018

View reviewed changes

aduth force-pushed the framework/async-parser branch from a03aaf5 to 3b6d5de Compare August 6, 2018 14:34

aduth approved these changes Aug 6, 2018

View reviewed changes

aduth requested changes Aug 6, 2018

View reviewed changes

aduth added this to the 3.7 milestone Aug 15, 2018

aduth added [Type] Breaking Change For PRs that introduce a change that will break existing functionality and removed Needs Dev Ready for, and needs developer efforts labels Aug 21, 2018

aduth force-pushed the framework/async-parser branch from 3b6d5de to 13560d1 Compare August 21, 2018 20:35

aduth mentioned this pull request Aug 28, 2018

Editor: Sync blocks state to edited post content #9403

Closed

Hywan mentioned this pull request Aug 29, 2018

Overview of Short-term Parsing Enhancements #8244

Closed

11 tasks

youknowriad modified the milestones: 3.7, 3.8 Aug 30, 2018

youknowriad removed this from the 3.8 milestone Sep 5, 2018

aduth mentioned this pull request Sep 6, 2018

Improve post preview load state #7922

Closed

mcsf closed this Oct 12, 2018

mcsf deleted the framework/async-parser branch October 12, 2018 16:18

mcsf added the Future label Oct 12, 2018

dmsnell mentioned this pull request Nov 26, 2018

Block and Component Lazy Loading #2768

Open

3 tasks

aduth mentioned this pull request Dec 9, 2019

Block Parser: Implement generator / stream / chunked block parse interface #19021

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parser: synchronous → asynchronous execution #7970

Parser: synchronous → asynchronous execution #7970

dmsnell commented Jul 15, 2018 •

edited

Loading

Hywan commented Jul 16, 2018

aduth left a comment

aduth Jul 16, 2018

dmsnell Jul 16, 2018

aduth Jul 16, 2018

dmsnell Jul 16, 2018

aduth Jul 16, 2018

dmsnell Jul 16, 2018

aduth Jul 16, 2018

dmsnell Jul 16, 2018

aduth Jul 16, 2018

dmsnell commented Jul 16, 2018

dmsnell commented Jul 16, 2018

aduth left a comment

aduth left a comment

aduth commented Aug 15, 2018

aduth commented Aug 16, 2018

aduth commented Aug 16, 2018

Hywan commented Aug 17, 2018

aduth commented Aug 21, 2018

aduth commented Aug 22, 2018

aduth commented Sep 4, 2018

aduth commented Sep 4, 2018

Hywan commented Sep 12, 2018

aduth commented Sep 12, 2018

mcsf commented Oct 12, 2018

aduth commented Dec 9, 2019

Parser: synchronous → asynchronous execution #7970

Parser: synchronous → asynchronous execution #7970

Conversation

dmsnell commented Jul 15, 2018 • edited Loading

Why use an asynchronous parsing model?

Benefits of a synchronous model

Limitations of a synchronous environment

Benefits of an asynchronous model

Limitations of an asynchronous environment

What? It isn't all peaches?

What do we need in this PR then?

Hywan commented Jul 16, 2018

aduth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dmsnell commented Jul 16, 2018

dmsnell commented Jul 16, 2018

aduth left a comment

Choose a reason for hiding this comment

aduth left a comment

Choose a reason for hiding this comment

aduth commented Aug 15, 2018

aduth commented Aug 16, 2018

aduth commented Aug 16, 2018

Hywan commented Aug 17, 2018

aduth commented Aug 21, 2018

aduth commented Aug 22, 2018

aduth commented Sep 4, 2018

aduth commented Sep 4, 2018

Hywan commented Sep 12, 2018

aduth commented Sep 12, 2018

mcsf commented Oct 12, 2018

aduth commented Dec 9, 2019

dmsnell commented Jul 15, 2018 •

edited

Loading