Marked removes non-breaking spaces in the original text #363

arturi · 2014-03-09T21:31:23Z

Consider, say, a text like this:
2 125 euro
I've put a non-breaking space between 2 and 125 so that it would always end up on the same line.

Marked pre-parses the text and replaces the non-breaking spaces I've put there with ordinary spaces:

Lexer.prototype.lex = function(src) {
  src = src
    .replace(/\r\n|\r/g, '\n')
    .replace(/\t/g, '    ')
    .replace(/\u00a0/g, ' ')
    .replace(/\u2424/g, '\n');
  return this.token(src, true);
};

This is where the devil hides: .replace(/\u00a0/g, ' ')

Here is more on why invisible non-breaking space characters are cool: http://destroytoday.com/findings/fix-widows-with-non-breaking-spaces/
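A minimal sketch that reproduces the effect without loading marked. The four replacements from the `Lexer.prototype.lex` excerpt above are copied into a standalone function (the name `preprocess` is made up for this illustration), and the input uses an explicit `\u00a0` escape so the non-breaking space is visible in the source.

```javascript
// Standalone copy of the preprocessing in the Lexer.prototype.lex excerpt.
// `preprocess` is a hypothetical name; marked itself does this inside lex().
function preprocess(src) {
  return src
    .replace(/\r\n|\r/g, '\n')   // normalize line endings
    .replace(/\t/g, '    ')      // expand tabs to four spaces
    .replace(/\u00a0/g, ' ')     // NBSP (U+00A0) -> ordinary space
    .replace(/\u2424/g, '\n');   // SYMBOL FOR NEWLINE (U+2424) -> newline
}

const input = '2\u00a0125 euro';   // NBSP between 2 and 125
const output = preprocess(input);

console.log(input.includes('\u00a0'));  // true
console.log(output.includes('\u00a0')); // false
console.log(output === '2 125 euro');   // true: the NBSP is now U+0020
```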


christopherscott · 2014-03-12T23:02:01Z

+1, if anything this should be an option, or configurable

daleconboy · 2014-03-12T23:02:29Z

Yes! I've recently lost a few hours of my life tracking this very same thing down.

chjj · 2014-03-13T03:44:01Z

@daleconboy, I'm sorry to hear that, but many people lost several hours of their lives trying to figure out why their spaces weren't getting processed correctly when text was passed in from the DOM (see #52 - cc @OscarGodson), which is why this was added in the first place.

I'll consider adding an option, but I want to keep their removal the default since more people probably get bit by this "feature" of contenteditable elements than not.

daleconboy · 2014-03-14T04:29:37Z

Hey, thanks for the response. I definitely sympathize with anyone who's been bitten by this quirk in any way; however, I would argue against the default being wholesale replacement of non-breaking spaces.

Reason being, it's not a bug with marked, but rather a browser behavior which shouldn't be the responsibility of marked to manage. Technically the responsibility should fall on the developer who's using the contenteditable elements to be aware of the quirk and to manage the white space handling, or conversion, on their end.

The W3C working draft specifically calls this out to authors working with contenteditable elements:

http://www.w3.org/TR/html51/editing.html#best-practices-for-in-page-editors

Authors are encouraged to set the 'white-space' property on editing hosts and on markup that was originally created through these editing mechanisms to the value 'pre-wrap' …

It seems that with contenteditable regions expected to behave in this way, you would want to preserve their expected behavior by default to avoid confusion. This, in turn, would also avoid the confusion where devs are expecting their explicitly set non-breaking spaces to behave as expected.

And, since marked may also be used in a Node environment where contenteditable does not exist, this replacement behavior by default would be unexpected.

Bottom line, I appreciate you considering it as an option. How you decide to set the default behavior is of course up to you. Any option is definitely better than no option. I'll cast my vote for the default being no replacement. :)

Cheers!
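A rough sketch of the developer-side handling argued for in this comment, assuming the application knows which text came from a contenteditable element. The helper names `normalizeEditorText` and `toMarkdownSource` are hypothetical. The trade-off is that converting every NBSP from the editor also drops any the user typed on purpose, which is why the sketch applies it only to editor input and leaves other sources alone.

```javascript
// Hypothetical helper: convert every NBSP in editor-originated text to an
// ordinary space. Browsers may insert U+00A0 to keep runs of spaces visible;
// this blunt conversion also removes NBSPs the user typed deliberately.
function normalizeEditorText(text) {
  return text.replace(/\u00a0/g, ' ');
}

// Normalize per source: only contenteditable input is converted; text from
// files or other sources reaches the Markdown parser unchanged.
function toMarkdownSource(text, fromContentEditable) {
  return fromContentEditable ? normalizeEditorText(text) : text;
}

const typed = 'foo\u00a0\u00a0bar';   // what an editor might produce for "foo  bar"
const authored = '2\u00a0125 euro';   // NBSP chosen deliberately by the author

console.log(toMarkdownSource(typed, true) === 'foo  bar');   // true
console.log(toMarkdownSource(authored, false) === authored); // true
```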

drscannell · 2014-03-14T11:46:20Z

@daleconboy's argument is pretty convincing. Are there other use cases for no-break spaces in markdown input? I would think a set of tests would help define the severity of the issue.

OscarGodson · 2014-03-14T18:14:19Z

@daleconboy I like your point about the browser, except that in @arturi's post he specifically points out that non-breaking spaces are good for fixing a browser bug, haha :) Also, I wouldn't agree that it's a browser issue. Markdown's "spec" doesn't say which kinds of spaces are and aren't allowed, so IMO Marked, and any Markdown parser, should treat all spaces (nbsp, Unicode, etc.) as what they are: spaces. Your suggestion, unless I'm misunderstanding it, is to specifically ignore certain kinds of spaces.

scy · 2014-04-08T18:30:52Z

I’m working on a Markdown-based presentation tool, and I’m using marked to generate HTML.

Having control over when and where text wraps is vital in a good presentation. Currently, the only way I can do that with marked is by overriding the lexer with a custom one that does the same things as the original one, except for the NBSP replacement. This is of course far from future-proof: In case the original lexer changes, I have to adapt my code.

Therefore I’m very much in favor of making this configurable. If you’re interested in a PR, let us know. And although I think that not replacing the NBSPs is the “right” thing to do, I can understand that you don’t want to break existing code that relies on marked fixing the browser behavior. So, I don’t care what the default for this option is, but please introduce one.
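One possible shape for such an option, as a sketch only: the `preserveNbsp` flag below is an invented name, not an option marked provides, and `lexPreprocess` is a standalone copy of the lexer's preprocessing rather than marked's own code. Defaulting the flag to `false` keeps the existing behavior, which addresses the compatibility concern raised earlier in the thread.

```javascript
// Hypothetical options-aware copy of the lexer's preprocessing step.
// `preserveNbsp` is an invented option name, not part of marked's API.
function lexPreprocess(src, options = {}) {
  const { preserveNbsp = false } = options; // false = current behavior
  let out = src
    .replace(/\r\n|\r/g, '\n')  // normalize line endings
    .replace(/\t/g, '    ');    // expand tabs
  if (!preserveNbsp) {
    out = out.replace(/\u00a0/g, ' '); // legacy: NBSP -> ordinary space
  }
  return out.replace(/\u2424/g, '\n'); // U+2424 -> newline, as before
}

const slide = 'Slide\u00a0title';
console.log(lexPreprocess(slide) === 'Slide title');                 // true
console.log(lexPreprocess(slide, { preserveNbsp: true }) === slide); // true
```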

Lendar · 2015-06-17T15:39:51Z

@scy's suggestion is nice. Let me extend it with an example; it might be helpful for future readers...

import marked from 'marked';

// monkey patch for marked 0.3.3 to preserve non-breaking spaces
marked.Lexer.prototype.lex = function (src) {
  src = src
    .replace(/\r\n|\r/g, '\n')
    .replace(/\t/g, '    ')
    .replace(/\u2424/g, '\n');

  return this.token(src, true);
};

UPDATED 2017-05-11: fixed syntax

RichardForrester · 2016-02-15T02:46:42Z

Is anyone aware of an option or a work around for this issue?

There should definitely be an option to allow non-breaking spaces to pass to the HTML.

arturi · 2016-02-15T02:50:29Z

I’ve solved it by extending lex, as shown here: #363 (comment).

deanvaessen · 2017-04-29T08:05:43Z

Hello everyone,

@Lendar's suggestion is wonderful, and if I edit this into the source code I can fix it this way. However, when I put it straight into my own code, it complains that 'this.token' is not a function. What is the best way to implement this?

Cheers!

davidchambers · 2017-04-29T09:11:06Z

I imagine the problem, @deanvaessen, is the arrow function. Try replacing (src) => { ... } with function(src) { ... }. :)
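The failure can be reproduced without marked. An arrow function does not get its own `this`; it uses the `this` of the scope it was written in, so when one is assigned to a prototype, `this.token` does not refer to the lexer instance. `FakeLexer` is a stand-in invented for this illustration, not marked's class.

```javascript
// Stand-in for marked's Lexer; only the prototype/`this` shape matters here.
function FakeLexer() {}
FakeLexer.prototype.token = function (src) {
  return 'tokens for: ' + src;
};

// Regular function: `this` is the instance the method is called on.
FakeLexer.prototype.lex = function (src) {
  return this.token(src.replace(/\t/g, '    '));
};

// Arrow function: `this` comes from the enclosing (top-level) scope,
// not from the instance, so `this.token` is undefined here.
FakeLexer.prototype.lexArrow = (src) => {
  return this.token(src);
};

const lexer = new FakeLexer();
console.log(lexer.lex('a')); // tokens for: a

try {
  lexer.lexArrow('a');
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```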

deanvaessen · 2017-05-04T20:30:31Z

Confirmed. Thank you @davidchambers :)

Lendar · 2017-05-11T04:24:21Z

@davidchambers @deanvaessen surprised it's still relevant. Updated the example in the comment ⬆️

yurikhan · 2017-05-28T13:28:47Z

I am in the same boat as @scy — working on a Markdown-based presentation tool. I, too, want to control where lines break and where they never break. Please make an option that stops breaking non-breaking spaces.

As for browsers and/or WYSIWYG editors inserting non-breaking spaces where not explicitly requested by the user, that’s their bugs and should be fixed there.

ArTiSTiX · 2017-10-09T09:32:32Z

Up ?
Sometimes, non-breaking spaces are required by the language (e.g. in French, https://fr.wikipedia.org/wiki/Espace_ins%C3%A9cable, non-breaking spaces are necessary before '?', ':', '!', ';', in thousands separators and phone numbers, and I also use them between quotes and wherever line breaks should be avoided, such as brand names).

Thus, there is no reason for removing them (I would say non-breaking spaces should not be interpreted as syntax spaces).

PS: there is also no reason for anyone to monkey-patch marked. But it's a bit annoying to always work with minor-fix forks.
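To make the French use case concrete, a small sketch that inserts U+00A0 where the comment says it belongs: before `?`, `!`, `;` and `:`, and as a thousands separator. The helper names are hypothetical, `groupThousands` handles integers only, and French typography also uses the narrow no-break space (U+202F) in some of these positions; U+00A0 is used throughout here to match the comment. With the lexer's replacement in place, each of these characters would be turned back into an ordinary space before parsing.

```javascript
const NBSP = '\u00a0';

// Hypothetical helper: replace the ordinary space before ? ! ; : with NBSP
// so the punctuation mark never starts a new line on its own.
function frenchSpacing(text) {
  return text.replace(/ ([?!;:])/g, NBSP + '$1');
}

// Hypothetical helper: group an integer's digits in threes, separated by NBSP.
function groupThousands(n) {
  return String(n).replace(/\B(?=(\d{3})+(?!\d))/g, NBSP);
}

console.log(frenchSpacing('Vraiment ?') === 'Vraiment' + NBSP + '?'); // true
console.log(groupThousands(2125) === '2' + NBSP + '125');             // true
console.log(groupThousands(1234567) === ['1', '234', '567'].join(NBSP)); // true
```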

oliviertassinari · 2018-02-11T17:19:02Z

Alright, let's use @Lendar's monkey patch :) to replace
https://github.com/chjj/marked/blob/6b0416d10910702f73da9cb6bb3d4c8dcb7dead7/lib/marked.js#L142-L150

joshbruce · 2018-02-11T17:53:21Z

Closing as having a fix or workaround as the Marked library proper figures its life out. :)

oliviertassinari · 2018-02-11T17:55:53Z

@joshbruce So it's a won't fix.

joshbruce · 2018-02-11T18:13:13Z

@oliviertassinari: At this juncture I'm siding with @chjj on this one (#363 (comment)). See #956 as well:

XSS fixes were the focus.
Fixing known issues and complying with CommonMark and GFM are next; so, it's on our radar (@Feder1co5oave and @UziTech) as the spec does see it as a part of mixed content: http://spec.commonmark.org/0.28/#example-302 - see also Release 0.3.9 #958 (if requested to be reopened by the primary contributors at this time, it will be)

I guess what I'm saying is, right now we have bigger fish to fry and it seems like there is a viable workaround in the meantime. Does that help?

joshbruce · 2018-02-11T18:16:14Z

Note: This only applies to explicit &nbsp; inclusion in the Markdown.

oliviertassinari · 2018-02-11T18:23:49Z

@joshbruce Thanks for the extra details. I wasn't sure what was the implication of the first answer.

joshbruce · 2018-02-11T18:26:15Z

@oliviertassinari: Fair. And sorry for not providing more - was in a rush going through issues. :)

Feder1co5oave · 2018-02-11T18:32:30Z

What Christopher wrote about people getting bitten by this is debatable, at the very least.
I also came across an example in CommonMark where a non-breaking space changed how the text is interpreted, because it usually isn't allowed in place of a regular space. So if we want to comply, I think we need to at least consider this.
I have never used one, but I guess some people do, so I see no point in replacing them all with regular spaces.
However, if we merge this we must require it to be tested properly.
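One concrete case of the kind described here (not necessarily the example the comment refers to): CommonMark requires the run of `#` characters that opens an ATX heading to be followed by a space, a tab, or the end of the line, and U+00A0 does not qualify. The hypothetical `isAtxHeading` below models only that opening rule, so `#\u00a0Title` is a paragraph as written but becomes a heading once the lexer's NBSP replacement has run.

```javascript
// Hypothetical check for the opening of an ATX heading: at most 3 spaces of
// indentation, 1 to 6 '#', then a space, a tab, or the end of the line.
// Only this rule is modelled; closing sequences and other blocks are ignored.
function isAtxHeading(line) {
  return /^ {0,3}#{1,6}(?:[ \t]|$)/.test(line);
}

const original = '#\u00a0Title';                       // NBSP after '#'
const preprocessed = original.replace(/\u00a0/g, ' '); // after the lexer's replacement

console.log(isAtxHeading(original));     // false: NBSP is not a space here
console.log(isAtxHeading(preprocessed)); // true: the replacement created a heading
```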

joshbruce · 2018-02-11T18:35:53Z

@Feder1co5oave: Reopen or no?

Again, I'm not sure if the original ticket was referring to the HTML-encoded &nbsp; or to a Unicode character found in the text; not the same thing in my book. As a user, I would expect &nbsp; to be preserved, but not necessarily an injected special character in UTF-8 or something similar... am I wrong there? I concur that Chris's assessment is debatable.

Leave it to you, brother.

Feder1co5oave · 2018-02-11T18:43:02Z

I'm pretty sure `&nbsp;`s pass through without a problem, whereas the Unicode character is currently replaced by a single space. It seems it was set up this way because users somehow typed in unwanted non-breaking spaces, but that assumption seems flimsy to me. It also takes away from others the possibility to use non-breaking spaces consciously, and I don't like that. We certainly need to improve our Unicode support in general (per CommonMark), so I think this will change eventually. We need to make sure everything keeps working smoothly.
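The distinction in the last two comments can be checked against the single replacement quoted from the lexer: `/\u00a0/g` matches the literal U+00A0 character, while the entity `&nbsp;` is six ASCII characters the regex never touches (it becomes a non-breaking space only when the HTML is rendered). `replaceNbsp` is just a name given to that one line for this illustration.

```javascript
// The single NBSP replacement from the lexer excerpt, isolated for testing.
function replaceNbsp(src) {
  return src.replace(/\u00a0/g, ' ');
}

const entity = '2&nbsp;125 euro';  // HTML entity typed as plain ASCII
const literal = '2\u00a0125 euro'; // the actual U+00A0 character

console.log(replaceNbsp(entity) === entity);        // true: entity untouched
console.log(replaceNbsp(literal) === '2 125 euro'); // true: character replaced
console.log(entity.length, literal.length);         // 15 10
```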

joshbruce · 2018-02-11T18:56:31Z

All right. Leaving closed for now. Flagging with newly minted #1048 for when we're ready to focus there. This could also explain the Chinese character problems with header ids, yeah?

Feder1co5oave · 2018-02-11T19:04:24Z

Yes it's related to that and headings' ids

Feder1co5oave · 2018-02-23T20:52:13Z

Tagging as #1048

joshbruce · 2018-02-24T15:14:09Z

Tagging #1043 as well, just because of the "header ids" comment.

UziTech · 2018-12-05T23:17:03Z

related pr #897

aumouvantsillage mentioned this issue Jul 1, 2015

Spaces have zero-width in HTML generated from text mode KaTeX/KaTeX#281

Closed

davidchambers mentioned this issue Mar 14, 2016

everything sanctuary-js/sanctuary-site#2

Merged

ArTiSTiX added a commit to habx/marked that referenced this issue Oct 9, 2017

Removed replacing nbsp by spaces in lexer - FIXES markedjs#363

7dbc64d

ArTiSTiX added a commit to habx/marked that referenced this issue Oct 9, 2017

Removed replacing nbsp by spaces in lexer - FIXES markedjs#363

146070c

oliviertassinari mentioned this issue Feb 11, 2018

[docs] Use non-breaking space mui/material-ui#10252

Merged

joshbruce closed this as completed Feb 11, 2018

Feder1co5oave reopened this Feb 23, 2018

Feder1co5oave added this to the 0.5.0 - Architecture and extensibility milestone Feb 23, 2018

Feder1co5oave added the proposal label Feb 23, 2018

joshbruce modified the milestones: 0.5.0 - Architecture and extensibility, v1.x - All the nope release Apr 4, 2018

UziTech added the has PR The issue has a Pull Request associated label Dec 5, 2018

kephas mentioned this issue Dec 6, 2018

Remove substitution of non-breakable spaces #897

Closed

MarcLoupias mentioned this issue May 11, 2019

Gestion des espaces insécables MarcLoupias/dvlp-faq-xml#16

Open

styfle mentioned this issue Aug 5, 2019

␤ actually becomes a newline when it shouldn't #1531

Closed

UziTech mentioned this issue Aug 5, 2019

remove substitutions #1532

Merged


joshbruce closed this as completed in #1532 Aug 5, 2019

UziTech mentioned this issue Jul 22, 2021

First line is not parsed as markdown #2139

Closed

nartc mentioned this issue Oct 28, 2021

Supporting empty block for spacing nartc/notion-stuff#5

Closed
