-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTML block: fix parsing #27268
HTML block: fix parsing #27268
Conversation
9747a02
to
19ee60a
Compare
Size Change: +137 B (0%) Total Size: 1.33 MB
ℹ️ View Unchanged
|
Asking wider feedback because it adds a small piece to the existing block API. |
@@ -5,7 +5,7 @@ | |||
"attributes": { | |||
"content": { | |||
"type": "string", | |||
"source": "html" | |||
"source": "raw" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure there's already a PR that does the same thing with lengthy discussion from @aduth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh haha :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find it :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found it, this is the one I was talking about #10551
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@youknowriad Nice, thanks, will read quick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the difference with that PR is that it still uses the html
type instead of a new raw
type, but the solution is essentially the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, and I believe @aduth's point is that it's the html
type itself that is "entirely" broken and that we shouldn't fix just the "HTML block" case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although...if this fixes the immediate experience for users could we land this PR and then write a followup Issue detailing the problems with html
and how we might look to resolve them long term? Not being able to put HTML in the HTML block feels like a bit of a poor UX 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should add a new API (new "source" type)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this using the examples provided by the contributor in the original Issue.
This PR does seem to resolve the issue in that chars such as <
are not converted to entities. Here's the comparison from my testing.
Master | This PR |
---|---|
As you can see on master we get errors because the generated save content doesn't match the post content. On this PR however, there are no errors and everything is great 🎉
@ellatrix I'll defer to yourself and @youknowriad regarding whether to land this or to try and ship a more comprehensive fix to HTML in general.
One thing - we should probably add some tests to cover this scenario.
Definitely needs some tests. This is just a proposal so far. I'd like some feedback from at least some more people like @youknowriad @mtias @mcsf. I don't see a good alternative here. We could try to fix the |
I believe this is why the entities are encoded. See this note from innerHTML from MDN:
|
Yes, because of this, the HTML you SET with One alternative solution here is to always encode characters just like |
This sounds similar to what my original PR did but I was decoding. Correct me if I'm wrong but the key would be to ensure that the Gutenberg annotations are 1:1 with whatever the user types into the Block? If the database needs to have that data encoded for persistence then that's ok so long as when it's parsed back it returns back 1:1 with what the user originally entered? |
The parser has changed since this PR was created, do we still want to do this? Or should we close this? |
@aristath Unfortunately, this is still a problem. Would be good to agree on the solution. |
19ee60a
to
e265ebf
Compare
So to recap... The problem is that when HTML code it put into the HTML block certain characters are encoded into HTML entities. The reason for this is that the @ellatrix's proposed fix (here) is to introduce a new @youknowriad was not in favour of introducing a new API however. He referenced a previous PR by @johngodley where they were looking to address a similar issue by either
The outcome of #10551 is that neither @youknowriad nor @aduth were happy to commit to merging the PR. However, the goal of that PR seems to have been around normalizing the HTML whereas our problem is relating to the HTML being fundamentally transformed. There is also #25120 which attempts to "fix" the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't impact any other blocks besides the HTML block. We should probably try and keep an eye out in case anyone else wants to use raw
for another block. As discussed in person, I think there are still peculiarities about what the role of the HTML block even is or what proper behavior here is.
So practically I think this is a fair tactical change and addresses the issue at hand without adding much baggage. If we ever get around to reworking the HTML block we can change this stuff here, or if we overhaul the attribute sourcing and validation.
1 > 0
b04b62e
to
d87cfe5
Compare
I'm in favour of merging this one. The impact seems limited to the HTML block and it is "opt in" rather than default behaviour. Current unit test failures seem related and will need to be resolved prior to merge. |
In #39424 when we added an optimization to only parse a block's innerHTML once we also changed the behavior that the innerHTML propety represented the raw HTML loaded by the parser. Instead, what we have since that change is the DOM of the parsed HTML. In this patch we're adding yet-another parameter to the bag of arguments in `getBlockAttribute()` so that the new `raw` type can read and reproduce that original source, e.g. when reading the string `1 < 0` the parsed value's `innerHTML` will be `1 < 0` even though the block's raw content was `1 < 0`.
d87cfe5
to
bc59be6
Compare
I'm very pleased to see this get merged 🥳 |
Description
Fixes #24282, or at least partly.
The issue has been around since the beginning of Gutenberg.
The problem is that, when parsing the blocks,
element.innerHTML
encodes these characters.What we're looking for is the raw block content, so there's no need to parse the HTML at all, it can be skipped.
How has this been tested?
Type
0 < 1
in an HTML block. Save and reload. There should be no error.Screenshots
Types of changes
Checklist: