Parser: synchronous → asynchronous execution #7970
Thanks for the PR! 🎉
I think this is an important future-proofing change we should make sooner than later. That it introduces some complexity in loading states is something we would have been better to anticipate earlier in the project, but which we should design for all the same. I think it can be improved from what's implemented here, though it's not awful even as-is, and is certainly better than a loading state which is merely the browser hanging.
It's maybe the only public-facing API where we're "forcing" ES2015+ (i.e. Promises), though not in a way that I'd anticipate complaints being lodged. It's not imposing any build system, browser requirements, or polyfills to be effected by the consuming developers; merely to understand how to work with them as a return type.
@@ -52,11 +52,11 @@ class BlockDropZone extends Component {
	}

	onHTMLDrop( HTML, position ) {
		const blocks = rawHandler( { HTML, mode: 'BLOCKS' } );
Should we just make this function `async` / `await` to avoid effecting changes much?
didn't know we could - done!
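A minimal sketch of what the `async` / `await` form of the drop handler could look like. The `rawHandler` below is a stand-in stub that resolves to an array of blocks, and `insertBlocks` is a hypothetical prop; neither is taken from the actual implementation.

```javascript
// Stand-in for the asynchronous rawHandler (assumption: it resolves to an array of blocks).
const rawHandler = ( { HTML } ) =>
	Promise.resolve( [ { name: 'core/paragraph', content: HTML } ] );

class BlockDropZone {
	constructor( props ) {
		this.props = props;
	}

	// Marking the method async keeps the body shaped like the synchronous version.
	async onHTMLDrop( HTML, position ) {
		const blocks = await rawHandler( { HTML, mode: 'BLOCKS' } );
		if ( blocks.length ) {
			// insertBlocks is a hypothetical prop used only for illustration.
			this.props.insertBlocks( blocks, position );
		}
	}
}
```

The method now returns a promise, so any caller that needs to know when the drop has been handled has to wait on that promise.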
blocks/api/post-parser.js
Outdated
export * from './post.pegjs';
import { parse as syncParse } from './post.pegjs';

export const parse = ( document ) => Promise.resolve( syncParse( document ) );
Technically some variable shadowing going on here with the `document` global. Would `post` be a more accurate variable name anyways?
changed to `postContent`
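One detail of a `Promise.resolve( syncParse( ... ) )` wrapper: the synchronous parser runs before any promise exists, so an exception it throws escapes synchronously rather than as a rejection. The sketch below shows one way to route such errors through the promise. `syncParse` is a stub written for illustration, not the generated parser, and its output shape is an assumption.

```javascript
// Stub for the synchronous parser (assumption for illustration only).
const syncParse = ( postContent ) => {
	if ( typeof postContent !== 'string' ) {
		throw new TypeError( 'postContent must be a string' );
	}
	return postContent ? [ { blockName: 'core/freeform', innerHTML: postContent } ] : [];
};

// Running the parser inside the executor means a thrown error rejects the promise
// instead of propagating synchronously to the caller.
const parse = ( postContent ) =>
	new Promise( ( resolve ) => resolve( syncParse( postContent ) ) );
```

Callers can then handle every failure with `.catch()` or `try` / `await`, without also guarding the call itself.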
HTML: getBlockContent( block ),
mode: 'BLOCKS',
canUserUseUnfilteredHTML,
} ).then( ( content ) => dispatch( 'core/editor' ).replaceBlocks( block.uid, content ) ),
Minor: `blocks` or `newBlocks` seems like a more accurate variable name than `content`.
reconstructed
HTML: block.originalContent,
mode: 'BLOCKS',
} ) );
} ).then( ( content ) => replaceBlock( block.uid, content ) );
Minor: `block` or `newBlock` seems like a more accurate variable name than `content`.
reconstructed
blocks/api/raw-handling/index.js
Outdated
@@ -77,7 +77,7 @@ function getRawTransformations() {
  * @param {Array} [options.tagName] The tag into which content will be inserted.
  * @param {boolean} [options.canUserUseUnfilteredHTML] Whether or not the user can use unfiltered HTML.
  *
- * @return {Array|string} A list of blocks or a string, depending on `handlerMode`.
+ * @return {Promise<Array|string>} A list of blocks or a string, depending on `handlerMode`.
  */
 export default function rawHandler( { HTML = '', plainText = '', mode = 'AUTO', tagName, canUserUseUnfilteredHTML = false } ) {
There's still one more instance of the synchronous form at:
editor/components/block-settings-menu/block-unknown-convert-button
@aduth I tried converting to
But I'm lost. The Promise version works but the |
Update @aduth the problem is apparently in sending with that, I think using the |
Apologies for my delay in revisiting this one. As restitution, I've taken it upon myself to do the rebase.
The `SETUP_EDITOR` still wasn't working quite well, since as an `async` function its return value is a promise, which cannot be handled without an appropriate middleware. I've restored its Promise-based implementation, which required some hacking at its tests to get them working again.
I'm having some issues with end-to-end tests in my environment. Will see if they persist through to Travis or if it's my local setup.
Unfortunately it appears this will be a breaking change, one for which we can't really provide a proper deprecation path, since the primary point of change is in the return value of `wp.blocks.parse` (from `Array` to `Promise<Array>`).
We should probably highlight this in release notes.
From my end, this looks good. May be worth having a second set of eyes on it, since it's a pretty fundamental change in parsing behavior.
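To illustrate the middleware point above, here is a rough sketch using a toy dispatcher that only accepts plain action objects. `createToyStore`, `setupEditorAsync`, `setupEditor`, and the `RESET_BLOCKS` action shape are all illustrative assumptions, not the editor's actual store or effects.

```javascript
// Toy dispatcher that, like a store without promise middleware,
// only accepts plain action objects.
const createToyStore = () => {
	const actions = [];
	const dispatch = ( action ) => {
		if ( ! action || typeof action.type !== 'string' ) {
			throw new Error( 'Actions must be plain objects with a type.' );
		}
		actions.push( action );
		return action;
	};
	return { dispatch, actions };
};

// Stand-in for the asynchronous parser.
const parse = ( postContent ) =>
	Promise.resolve( postContent ? [ { name: 'core/freeform' } ] : [] );

// An async function returns a promise of an action, which the toy store
// cannot dispatch directly.
const setupEditorAsync = async ( postContent ) => ( {
	type: 'RESET_BLOCKS',
	blocks: await parse( postContent ),
} );

// The promise-based form dispatches only once the parsed blocks are available.
const setupEditor = ( postContent, dispatch ) =>
	parse( postContent ).then( ( blocks ) => dispatch( { type: 'RESET_BLOCKS', blocks } ) );
```

Passing the result of `setupEditorAsync( ... )` straight to `dispatch` throws in this sketch, while `setupEditor( ..., dispatch )` records the action after the parse resolves.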
Given Travis failures, end-to-end test issues are probably legitimate.
Milestoning this to 3.7. We should get it in sooner than later. I can plan to revisit the failures soon.
Thought occurred to me: Should we want serialization to be asynchronous as well, both as a complement to the parse and in similarly enabling future offloading?
I believe the failures can likely be attributed to the fact that now that the parse is asynchronous, there's an opportunity for the editor to be rendered without its block representation of content being ready. We shouldn't want this, as it can lead to destructive user flows where content is inadvertently replaced. With an asynchronous parse, we probably want to include a "state" where we know that content exists, but we haven't yet received the parsed result; and until we do, we should prevent the user from interacting with the editor (at least the blocks list).
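The intermediate "content exists, parse result pending" state described above could be modelled roughly as follows. This is only a sketch: the action types `PARSE_START` and `PARSE_FINISH`, the state shape, and the `canEditBlocks` selector are hypothetical names chosen for illustration.

```javascript
// Hypothetical state shape: isParsing is true while a parse result is outstanding.
const initialState = { isParsing: false, blocks: [] };

const editorReducer = ( state = initialState, action ) => {
	switch ( action.type ) {
		case 'PARSE_START':
			// Content is known to exist, but its blocks have not arrived yet.
			return { ...state, isParsing: true };
		case 'PARSE_FINISH':
			return { isParsing: false, blocks: action.blocks };
		default:
			return state;
	}
};

// The block list should stay non-interactive while a parse is pending.
const canEditBlocks = ( state ) => ! state.isParsing;
```

A block list component could consult `canEditBlocks` before accepting input, so the empty intermediate document can never be edited or saved over the real content.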
Having an async serializer makes sense to me.
In this patch **we're changing the execution model of the `post_content` parser from _synchronous_ to _asynchronous_**.

```js
const doc = '<!-- wp:paragraph --><p>Wee!</p><!-- /wp:paragraph -->';

// Broken after this change: `parsed` is now a Promise, not an array.
const parseSyncNowBroken = ( doc ) => {
	const parsed = parseWithGrammar( doc );
	return parsed.length === 1;
};

// Works: wait for the promise to resolve before inspecting the result.
const parseAsyncNowWorks = ( doc ) => {
	return parseWithGrammar( doc ).then( ( parsed ) => {
		return parsed.length === 1;
	} );
};

// Equivalent, using async / await.
const usingAsyncAwait = async ( doc ) => {
	const parsed = await parseWithGrammar( doc );
	return parsed.length === 1;
};
```

## Why use an asynchronous parsing model?

So far, whenever we have relied on parsing the raw content from `post_content`, we have done so synchronously and waited for the parser to finish before continuing execution. When the parse is instant and built in a synchronous manner this works well. It imposes some limitations, though, that we may not want.

### Benefits of a synchronous model

- execution flow is straightforward
- loading state is "free" because nothing renders until parse finishes

### Limitations of a synchronous environment

- execution of the remainder of the app waits for every parse
- cannot run parser in `WebWorker` or other context
- cannot run parsers written in an asynchronous manner

We anticipated these limitations, and since the beginning of the project we could assume that at some point we would want an asynchronous model. Recently @Hywan wrote a fast implementation of the project's parser specification, but its output depends on an asynchronous model. In other words, the timing is right for us to adopt this change.

### Benefits of an asynchronous model

- parsing doesn't block the UI
- parsing can happen in a `WebWorker`, over the network, or in any asynchronous manner

### Limitations of an asynchronous environment

- UI _must_ become async-aware, the loading state is no longer "free"
- race conditions _can_ appear if not planned for and avoided

## What? It isn't all peaches?

Sadly, once we enter an asynchronous world we invite complexities and race conditions. The code in this PR so far doesn't address either of these.

The first thing you might notice is that when loading a document in the editor we end up loading a blank document for a split second before we load the parsed document. If you don't see this then modify `post-parser.js` to this instead and it will become obvious…

```js
import { parse as syncParse } from './post.pegjs';

// Simulate a slow parser by delaying the result by 2.5 seconds.
export const parse = ( document ) => new Promise( ( resolve ) => {
	setTimeout( () => resolve( syncParse( document ) ), 2500 );
} );
```

With this change we are simulating that it takes 2.5s to parse our document. You should see an empty document load in the editor immediately, and then after the delay the parsed document will suddenly appear. During that initial delay we can interact with the empty document, which means we can create a state where we mess up the document and its history; someone will think they lost their post.

For the current parsers this shouldn't be a practical problem since the parse is fast, but people will likely see an initial flash.

To mitigate this problem we need to somehow introduce a loading state for the editor: "we are in the process of loading the initial document". That can appear as a message where the contents would otherwise reside, or it could simply be a blank area deferring the render until ready. A common approach I have seen is to keep things blanked out unless the operation takes longer than a given threshold to complete, so as not to jar the experience with a flashing loading message, but that's a detail that isn't the most important thing right now.

As for the race condition, we may need to consider the risk and the cost of the solution. Since I think the flash is likely to be more jarring than the race condition, the latter may be a problem we can feasibly defer. The biggest risk is probably when we have code rapidly calling `parse()` in sequence and the results come back out of order, or if they start from different copies of the original document. One way we can mitigate this is by constraining the parser to be a single actor that refuses (or instead queues up) new parsing requests until the current work is finished.

## What do we need in this PR then?

- Please review the code changes here and reflect on their implications. The tests had to change and so do all functions that rely on parsing the `post_content` - they must become asynchronous themselves.
- Please consider the race conditions and the experience implications and weigh in on how severe you estimate them to be.
- Please share any further thoughts or ideas you may have concerning the sync vs. async behavior.
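The single-actor idea mentioned above, in its "queue up" variant, might be sketched like this. `createSerialParser` is a hypothetical helper, not part of this PR; it assumes the wrapped `parse` function returns a promise.

```javascript
// Wraps an asynchronous parse function so that requests run one at a time,
// in the order they were made.
const createSerialParser = ( parse ) => {
	// Tail of the queue; it never rejects, so one failed parse cannot stall later ones.
	let tail = Promise.resolve();

	return ( postContent ) => {
		// Start this request only after the previous one has settled.
		const result = tail.then( () => parse( postContent ) );
		// Swallow the rejection on the queue's copy only; the caller still sees it on `result`.
		tail = result.catch( () => {} );
		return result;
	};
};
```

Because each request starts only after the previous one settles, results cannot come back out of order. The trade-off is that a slow parse delays every request queued behind it.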
Pushed a rebase to at least resolve the changes. My plan is to push the necessary updates in the morning:
To the last point, the reason it's taken me longer than I'd like to have revisited this is that I think it could tie into a larger refactor of the editor initialization. I also think it may surface the need for some generic actions around setting / replacing the content of the editor. Finally, it could even tie into pieces of #8822 (comment) where we need the concept of a placeholder block (here, to represent the known presence / unknown parse result, there to serve as a stand-in for the off-screen blocks).
For my own reference, I'm also observing a failed network request to
It's worth noting that should we change
I am still of the opinion that asynchronous parse and serialize are the way to go, but for transparency it's worth considering that a synchronous parse from
What is blocking the PR? Can I help?
@Hywan Currently it's a combination of:
I was pretty excited about reaping these benefits, especially in order to take full advantage of @Hywan's excellent Rust parser, but such a fundamental change sadly cannot fit into the WP 5.0 merge timeline. In practical terms, since replacing the PEG-generated parser in #8083, parsing has worked quickly and robustly on both client and server. The synchronous character of block parsing hasn't yet been a measurable issue. In time, perhaps, we may justify a switch of the entire cycle (parsing and serialization) to async. For that, I'm labelling this Future and closing. In the meantime, I hope to see client-side applications of the async parser for intensive document processing, or the "emancipation" of Gutenberg's block-based document format thanks to the portability of the Rust parser (e.g. applications that process Gutenberg documents outside the Web environment, adoption in the mobile apps, etc.). A million thanks to both @Hywan and @dmsnell for all the parsing efforts. I'm sure more will come for the document of the Web.
Follow-up issue to capture some of the goals here: #19021