Non-breaking spaces replaced with regular, breaking spaces #31

daformat · 2017-11-26T11:42:02Z

The parser seems to replace non breaking spaces by normal spaces, I've tried using regular html entities, unicode characters, and hard-coding the non breaking spaces (typing alt + space on a Mac), no matter what I cannot get a non-breaking space in the parser output.

Reproducing the issue

Compare the render of <JsxParser> with React's dangerouslySetInnerHtml prop using the following:

Input

<p>This is a « test » without any non-breaking space</p>
<p>This is a «\u00a0test\u00a0» with non-breaking spaces as Unicode character</p>
<p>This is a «&nbsp;test&nbsp;» with non-breaking spaces as regular html entities</p>

You can also try with hard-coded non-breaking spaces but it seems I cannot type nor paste them in github's editor for some reason, so I didn't include this in the example input

Output

With <JsxParser> you should get regular spaces everywhere, with dangerouslySetInnerHtml you should get the non-breaking spaces wherever you put them, which is the expected behavior.

Version info

I'm using React 15.6.1 and react-jsx-parser 1.1.5

The text was updated successfully, but these errors were encountered:

TroyAlford · 2017-11-27T20:43:00Z

I'm not actually clear on what is breaking in this example. Are you saying that the   is being replaced with a single non-breaking space, when rendered?

daformat · 2017-11-27T21:01:35Z

Using:

<JsxParser
  jsx={`<p>This is a « test » without any non-breaking space</p>
        <p>This is a «\u00a0test\u00a0» with non-breaking spaces as Unicode character</p>
        <p>This is a «&nbsp;test&nbsp;» with non-breaking spaces as regular html entities</p>`}
  onError={error => console.log(error)}
  />

renders to (redacted and formatted for clarity):

<div class="jsx-parser">
  <p>This is a « test » without any non-breaking space</p>
  <p>This is a « test » with non-breaking spaces as Unicode character</p>
  <p>This is a « test » with non-breaking spaces as regular html entities</p>
</div>

Both \u00a0and   are non-breaking spaces and should render as such, but for some reason they get converted to regular spaces (breaking spaces that is).

The actual expected render should keep those entities and be:

<div class="jsx-parser">
  <p>This is a « test » without any non-breaking space</p>
  <p>This is a «\u00a0test\u00a0» with non-breaking spaces as Unicode character</p>
  <p>This is a «&nbsp;test&nbsp;» with non-breaking spaces as regular html entities</p>
</div>

Admittedly, this is a boring example :)

daformat · 2017-12-03T20:07:54Z

So, apparently this all comes down to

react-jsx-parser/source/components/JsxParser.js

Line 59 in a99d3ff

return (value || '').replace(/\s+/g, ' ')

The \spattern matches any blank character, including non-breaking spaces, which are then converted to regular, breaking, spaces.

I can see why you'd want to replace any multiple spaces by singles spaces, but I'm not sure this is absolutely required as multiple spaces are already ignored upon rendering.

I made a fork where I just removed the replace and I indeed get the expected result, keeping my precious non-breaking spaces as such.

Here are the options I see:

Remove the replace completely (I can open a pull request for you to pull my changes).
Only replace regular spaces, and maybe tabs using a more explicit pattern (I would go with something like /[ \t]{2,}/, using {2,} in order to only match sequences of two or more spaces, including tabs)

As noted in 2. I would tweak the regex so it doesn't match single spaces, as there's no need to replace those systematically.

Hope this helps, let me know your thoughts and... have a nice day!

TroyAlford · 2017-12-04T00:34:58Z

Oh nice. Actually - I think you're right that this is entirely unnecessary, since the repo was switched to using Acorn for parsing JSX. If you want to submit a pull request - please add a couple tests, as well, to ensure that it behaves properly, and I'll be happy to make an update and re-release!

I really appreciate the help!

See TroyAlford#31

…TroyAlford#31

See TroyAlford#31

daformat · 2017-12-11T10:14:49Z

Here you go Troy,

Not sure I should have kept the nodeType tests on react renderer comments (the comment - text - comment thing), but in the end I did, so it's consistent with the other tests that were already there. Let me know if I should do things differently.

Happy to help resolving this, the issue was quite trivial but since non-breaking spaces can really make or break a layout, keeping them as such is quite important to me :)

Have a nice day.

…such (#33) * Keep white-space as-is, keeping non-breaking spaces non-breaking - see #31

TroyAlford · 2018-02-17T04:02:55Z

@daformat - I pushed a change in the last few days which should make this work even more consistently. For some reason just returning the value from a JsxText node, when it was an   would get thrown out by React. Thus, when you render something like: <b>Bold</b> <a href="...">Link</a> it would come out with nothing between the b and the a tag.

The fix I pushed may be relevant to your use-case, too - in case you run into this. It had to do with returning <Fragment>{value}</Fragment> instead of just value from these nodes - which for some reason causes React to render as an actual non-breaking space character - rather than omitting it as useless whitespace.

I am still confused about the behavior - but as this happens all the time in my use case - I can confirm that the update solves it.

Tl;Dr; - you may want to update your version. :)

daformat · 2018-02-22T13:38:13Z

Thanks for the heads up @TroyAlford! It seems those poor non-breaking spaces don't get much love these days... Will update but I'm not too concerned by this specific use-case in my app.

Maybe React does a filtering for empty text nodes via the same regexp you were using:
the \s (white spaces) character class catches all spaces, including non breaking ones and tabs, so if they test for empty text nodes using this instead of explicitly looking for regular breaking spaces and tabs, it could explain your issue.

Something like:

// Replace multiple spaces by a single space
// even non-breaking spaces, duh
decodeHTMLEntities(
  html
).replace(
  /\s+/g,
  ' '
);

would replace non breaking spaces as soon as   entities are parsed to non-breaking spaces, which I suspect happens before the multiple space trimming thing React does.

The correct behavior requires to use something more specific than the \s class.
As mentioned in the previous comments, using /[ \t]{2,}/g instead yields the expected result without removing any non-breaking spaces.

TroyAlford · 2018-02-23T09:07:37Z

Thanks for the thoughtful reply! I tried a bunch of different permutations to fix this (because I really wanted to just output the encoded version into the rendered HTML) - but in the end, settled on something that works.

I suspect you are exactly right - but hopefully this won't crop up again to require further fixing. :) If it does, this gives me some good notes to go back to, to revisit possible solutions.

daformat · 2018-02-23T12:43:41Z

You're very welcome, I don't have much time today but I did a quick search in React source code. I was looking for regexes containing \s without much luck regarding our issue.

But then it occurred to me that the inverse class, \S, could have been used, too (capital S, matching anything but whitespaces, non-breaking spaces being considered as whitespace). I found only one place where this occurs:

https://github.com/facebook/react/blob/825682390d825bb51be2e06835cf7e3458352e83/packages/react-dom/src/client/validateDOMNesting.js#L442-L453

As a quick check I compiled the list of all utf spaces I could find information on and ran a quick map to generate a table with the results of checking against different regexp:

Digging a little bit more in the React repo, I found Issue 7515

Looking at the comments it seems to me it's related:

It's still rough but I cannot look further today, just wanted to add this for reference, here is the gist for testing utf spaces against different regular expressions.

TroyAlford · 2018-03-03T21:19:13Z

Thanks for the great research summary! I'd be totally happy to check out a PR, if you think there's better handling to be done here. At the moment, I'm ok with just outputting the actual non-standard space character inside a <Fragment> block - but perhaps there's something more elegant to do?

daformat changed the title ~~Non breaking spaces replaced by normal spaces~~ Non-breaking spaces replaced with regular, breaking spaces Nov 27, 2017

daformat added a commit to daformat/react-jsx-parser that referenced this issue Dec 11, 2017

Add test for non-breaking spaces

1ddf131

See TroyAlford#31

daformat added a commit to daformat/react-jsx-parser that referenced this issue Dec 11, 2017

Keep white-space as-is, keeping non-breaking spaces non-breaking - see …

ffc0c43

…TroyAlford#31

daformat added a commit to daformat/react-jsx-parser that referenced this issue Dec 11, 2017

Add test for non-breaking spaces

2617816

See TroyAlford#31

daformat mentioned this issue Dec 11, 2017

Keep white-space as authored, so non-breaking spaces are rendered as such #33

Merged

TroyAlford closed this as completed in #33 Dec 28, 2017

TroyAlford pushed a commit that referenced this issue Dec 28, 2017

Keep white-space as authored, so non-breaking spaces are rendered as …

da56d37

…such (#33) * Keep white-space as-is, keeping non-breaking spaces non-breaking - see #31

TroyAlford added bug help wanted labels Feb 17, 2018

saendu mentioned this issue Apr 18, 2018

CR and whitespaces break rendering #52

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-breaking spaces replaced with regular, breaking spaces #31

Non-breaking spaces replaced with regular, breaking spaces #31

daformat commented Nov 26, 2017 •

edited

Loading

TroyAlford commented Nov 27, 2017

daformat commented Nov 27, 2017 •

edited

Loading

daformat commented Dec 3, 2017 •

edited

Loading

TroyAlford commented Dec 4, 2017

daformat commented Dec 11, 2017

TroyAlford commented Feb 17, 2018

daformat commented Feb 22, 2018 •

edited

Loading

TroyAlford commented Feb 23, 2018

daformat commented Feb 23, 2018 •

edited

Loading

TroyAlford commented Mar 3, 2018

Non-breaking spaces replaced with regular, breaking spaces #31

Non-breaking spaces replaced with regular, breaking spaces #31

Comments

daformat commented Nov 26, 2017 • edited Loading

Reproducing the issue

Input

Output

Version info

TroyAlford commented Nov 27, 2017

daformat commented Nov 27, 2017 • edited Loading

daformat commented Dec 3, 2017 • edited Loading

TroyAlford commented Dec 4, 2017

daformat commented Dec 11, 2017

TroyAlford commented Feb 17, 2018

daformat commented Feb 22, 2018 • edited Loading

TroyAlford commented Feb 23, 2018

daformat commented Feb 23, 2018 • edited Loading

TroyAlford commented Mar 3, 2018

daformat commented Nov 26, 2017 •

edited

Loading

daformat commented Nov 27, 2017 •

edited

Loading

daformat commented Dec 3, 2017 •

edited

Loading

daformat commented Feb 22, 2018 •

edited

Loading

daformat commented Feb 23, 2018 •

edited

Loading