&shy; tags get sanitized when using "Edit Block as HTML" #23509

Open
bfiessinger opened this issue Jun 26, 2020 · 9 comments
Labels
[Block] HTML Affects the HTML Block [Package] Element /packages/element [Type] Bug An existing feature does not function as intended

Comments

@bfiessinger

Describe the bug
When using &shy; (and maybe other HTML entities too) with "Edit Block as HTML", the entity gets sanitized right after saving the post.

Before save: [screenshot]

After save: [screenshot]

Editor version:

  • WordPress version: 5.4.2
  • Gutenberg version: 8.4.0
@annezazu added the [Block] HTML and Needs Testing labels Jun 30, 2020
@getdave
Contributor

getdave commented Sep 7, 2020

Similar to #24282. I can confirm that:

  • Manually adding &shy; to the HTML block content results in the soft hyphen character not being rendered when you preview the block.
  • Adding &amp; to the HTML block content results in the & character being correctly rendered when you preview the block.

@getdave added the [Package] Element label and removed the Needs Testing label Sep 7, 2020
@getdave
Contributor

getdave commented Oct 2, 2020

Update.

I created an html block with content <h2>Geschäfts&shy;führung</h2>.

I then debugged the block serialization routine by hitting Save Draft to trigger serializing and persisting the block.

The result of that routine is that &shy; is not stripped from the content when serializing the block.

[Screenshot 2020-10-02 at 15:23:17]

Reloading the browser at this point causes the editor to boot and the parse routine to run over the content coming from the database. Here, too, the serialized content passed into parse() still contains the &shy; entity:

[Screenshot 2020-10-02 at 15:32:07]

I believe the problem is that the parser code for blocks with source type of html ultimately ends up in the html() matcher:

return match.innerHTML;

This uses .innerHTML, which parses the raw content as HTML. Parsing decodes most (but not all!) entities, such as &shy;, into the characters they represent. The exceptions are &, <, and > (as well as the non-breaking space): when .innerHTML serializes text, it returns these characters as the entities &amp;, &lt;, and &gt; (and &nbsp;) respectively.

This means that the block's content is converted into a version in which most (but not all) entities have been decoded. That version of the block is then saved, and the original content is lost for good.
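To make the round trip concrete, here is a minimal sketch of the escaping step that the HTML fragment serialization algorithm applies to ordinary text nodes when .innerHTML is read. It is a standalone reimplementation for illustration, not Gutenberg or browser code, and `escapeTextNode` is a hypothetical name. Once parsing has turned &shy; into the character U+00AD, nothing in this step turns it back into an entity.

```typescript
// Hypothetical helper (not Gutenberg code): mirrors how text-node data is
// escaped when innerHTML serializes a parsed DOM back to a string.
// For ordinary text nodes only &, U+00A0, <, and > are replaced; every other
// character, including the soft hyphen U+00AD, is emitted as-is.
function escapeTextNode(text: string): string {
  return text
    .replace(/&/g, "&amp;") // run first so the entities added below stay intact
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// After parsing <h2>Geschäfts&shy;führung</h2>, the DOM text node holds the
// soft hyphen character itself, not the entity spelling.
const parsed = "Geschäfts\u00ADführung";
const reserialized = escapeTextNode(parsed);

console.log(reserialized.includes("&shy;")); // false: the entity spelling is gone
console.log(reserialized.includes("\u00AD")); // true: the invisible character remains
console.log(escapeTextNode("Fish & Chips")); // Fish &amp; Chips
```

This is why &amp; appears to "survive" while &shy; does not: the former is regenerated by serialization, the latter never is.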

Conclusion

My take is that the HTML block doesn't do what we expect. If we enter HTML entities such as &shy; in the block, we should expect them to be preserved exactly as written. However, if we select the Preview button on the HTML block, we should expect them to be decoded and displayed as the characters they represent.

Basically:

  • the "raw" HTML block should preserve entities rather than decoding them.
  • the "Preview" mode of the HTML block should decode entities for preview purposes, but this should not affect the underlying block's content.
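The two points above amount to keeping the stored string and the previewed string separate. A minimal sketch of that split, assuming a hypothetical `HtmlBlock` type and a `decode` function supplied by whatever renders the preview (neither is an existing Gutenberg API):

```typescript
// Hypothetical model of the proposed behaviour: the persisted content is the
// author's raw markup, and entity decoding happens only when previewing.
interface HtmlBlock {
  readonly raw: string; // stored exactly as typed, e.g. "Geschäfts&shy;führung"
}

// Saving never parses the markup, so entities survive byte-for-byte.
function serializeBlock(block: HtmlBlock): string {
  return block.raw;
}

// The preview is derived on demand and never written back to `raw`.
// `decode` stands in for the preview renderer (in a browser, the DOM parser).
function previewBlock(block: HtmlBlock, decode: (html: string) => string): string {
  return decode(block.raw);
}

// Usage with a stand-in decoder that only understands &shy;.
const block: HtmlBlock = { raw: "<h2>Geschäfts&shy;führung</h2>" };
const shyOnly = (html: string): string => html.replace(/&shy;/g, "\u00AD");

console.log(serializeBlock(block)); // <h2>Geschäfts&shy;führung</h2>
console.log(previewBlock(block, shyOnly) === "<h2>Geschäfts\u00ADführung</h2>"); // true
```

The design choice is simply that decoding is a view concern: the preview can use the browser's parser freely because its output is never persisted.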

@romspielplatz

The same happens in the code editor. The workaround of hitting the Save Draft button does not work for me: saving the draft or updating the post strips the &shy; entity from the content.

@vortacan

I just struggled with the same problem when trying to add soft hyphens in the core/paragraph block. I assumed it should be possible to enter &shy; (or its equivalents &#xAD; and &#173;) in HTML mode and keep them as entities in the code (this works with &amp;, for example). But they are decoded right away, as soon as the block is deselected, and magically disappear. The soft hyphens are actually still there and work in the visual mode of the block as well as on the frontend. But because the entities are not kept as such in HTML mode, it is impossible to edit or correct their positions.
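The equivalence of &shy;, &#xAD; and &#173; is also why the original spelling cannot be recovered once the content has been parsed: all three decode to the same code point, U+00AD. A deliberately tiny decoder, written only for this illustration (a real HTML parser handles over 2,000 named references and many edge cases), shows the collapse. `decodeEntities` and `NAMED` are hypothetical names.

```typescript
// Hypothetical mini-decoder for illustration only: it understands numeric
// references and a handful of named ones, unlike a real HTML parser.
const NAMED = new Map<string, string>([
  ["shy", "\u00AD"],
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
]);

// Out-of-range numeric references become U+FFFD instead of throwing.
function fromCode(codePoint: number): string {
  return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : "\uFFFD";
}

function decodeEntities(html: string): string {
  return html.replace(
    /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g,
    (whole: string, ref: string) => {
      if (ref.startsWith("#x") || ref.startsWith("#X")) {
        return fromCode(parseInt(ref.slice(2), 16)); // hexadecimal reference
      }
      if (ref.startsWith("#")) {
        return fromCode(parseInt(ref.slice(1), 10)); // decimal reference
      }
      return NAMED.get(ref) ?? whole; // unknown names are left as written
    },
  );
}

// Three spellings of the soft hyphen all collapse to the same character.
const spellings = ["&shy;", "&#xAD;", "&#173;"];
const decoded = spellings.map(decodeEntities);
console.log(decoded.every((s) => s === "\u00AD")); // true
console.log(decodeEntities("a &amp; b")); // a & b
```

After decoding there is a single U+00AD character and no record of which of the three spellings the author typed.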

Furthermore, I think the original question of this ticket was about a block's HTML mode ("Edit Block as HTML"), whereas @getdave's tests and answers refer to the HTML block. A solution might solve both, though.

@skorasaurus
Member

Heavily related to #12872.

@getdave
Contributor

getdave commented Jun 28, 2021

"A solution might solve both though." (@vortacan)

Highly likely. I've still got this on my radar, but it's unlikely to happen until after WordPress 5.8 is released.

@goldtreefrog

goldtreefrog commented Apr 13, 2022

I am running WordPress 5.9.3 (with a child theme of GeneratePress, but I think this is not a theme issue). I am using a Classic block because I already had HTML code for the table I need, and pasting it in was the fastest way to get it in there. At small screen sizes, I need hyphenation. I can insert a soft hyphen and watch it magically disappear as soon as I save the draft, sometimes sooner. When I view the page source in the browser, there is no &shy; anywhere. Using the hyphens: auto setting in CSS helps, but some words are not hyphenated where I need them to be.

Is there any hope for a fix in the WordPress editor? There is a PHP solution, perhaps, but it would be nicer if soft hyphen support was built in.

A PHP solution is given at Stack Exchange.

@jordesign added the [Type] Bug label Aug 24, 2023
@audunmb

audunmb commented Jan 12, 2024

Current behaviour (WordPress 6.4.2, Twenty Twenty-Three theme, Firefox) is now:

Do the following steps:

  1. Write a long word.

  2. To add a manual hyphen, edit as HTML and insert &shy; at the desired position.

  3. Revert to visual editor. The soft hyphen now shows as a hyphen in the block.

Now it starts to get odd. When pressing delete in front of the hyphen you delete the next character, not the hyphen. If you use backspace, you can delete it.

If you switch back to html editor, the soft hyphen shows as a hyphen character - instead of &shy;. It still works as a soft hyphen though.

When switching to the frontend, the hyphen behaves as &shy; should, except that the entity is not in the HTML code (check with view-source or the inspector tools).

This is quite odd. Expected behaviour would be:

  • Delete should remove the hyphen in visual editor.
  • The entity should show as &shy; in html mode.
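The second expectation could be approximated by re-encoding the soft hyphen only when markup is displayed for editing. A minimal sketch, assuming a hypothetical `revealSoftHyphens` step applied to already-serialized HTML before it is shown in the HTML editor (this is not an existing WordPress function):

```typescript
// Hypothetical display-only transform for an HTML/code editor view.
// The input is already-serialized markup, where a literal "&" typed as text
// appears as "&amp;", so the "&shy;" inserted here stays distinguishable
// from text that merely looks like an entity.
function revealSoftHyphens(serializedHtml: string): string {
  return serializedHtml.replace(/\u00AD/g, "&shy;");
}

console.log(revealSoftHyphens("<p>Geschäfts\u00ADführung</p>"));
// <p>Geschäfts&shy;führung</p>
```

Because &shy; and U+00AD mean the same thing to an HTML parser, this changes only what the author sees, not what the browser renders.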

It also baffles me that the entity is removed from the HTML code while a soft hyphen is still present somehow (I'm not sure how). Someone went to a lot of trouble to code the behaviour of a soft hyphen instead of just using HTML standards.

In any case, the best solution would be to fix soft hyphens with #55565.

@jvandriel

jvandriel commented Feb 5, 2024

I'm having the same issue. When I write the following HTML, the encoded HTML symbols are turned into the actual characters on save. This breaks the heading's content, which is supposed to display those tags as text; that's why I encoded them in the first place.

<h3 class="wp-block-heading">
  Something something <code>&lt; em &gt;</code> and <code>&lt; strong &gt;</code> something
</h3>
