Marked removes non-breaking spaces in the original text #363
Comments
+1, if anything this should be an option, or configurable |
Yes! I've recently lost a few hours of my life tracking this very same thing down. |
@daleconboy, I'm sorry to hear that, but many people lost several hours of their lives trying to figure out why their spaces weren't getting processed correctly when text was passed in from the DOM (see #52 - cc @OscarGodson), which is why this was added in the first place. I'll consider adding an option, but I want to keep their removal the default since more people probably get bit by this "feature" of contenteditable elements than not. |
Hey, thanks for the response. I definitely sympathize with anyone who's been bitten by this quirk in any way, however I would argue against the default being wholesale replacement of non-breaking spaces. Reason being, it's not a bug with marked, but rather a browser behavior which shouldn't be the responsibility of marked to manage. Technically the responsibility should fall on the developer who's using the contenteditable elements to be aware of the quirk and to manage the white space handling, or conversion, on their end. The W3C working draft specifically calls this out to authors working with contenteditable elements: http://www.w3.org/TR/html51/editing.html#best-practices-for-in-page-editors
It seems that with contenteditable regions expected to behave in this way, you would want to preserve their expected behavior by default to avoid confusion. This, in turn, would also avoid the confusion where devs are expecting their explicitly set non-breaking spaces to behave as expected. And, since marked may also be used in a Node environment where contenteditable does not exist, this replacement behavior by default would be unexpected. Bottom line, I appreciate you considering it as an option. How you decide to set the default behavior is of course up to you. Any option is definitely better than no option. I'll cast my vote for the default being no replacement. :) Cheers! |
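A minimal sketch of what handling this on the caller's side could look like: the code that reads text out of a contenteditable element normalizes U+00A0 itself before handing the string to a Markdown parser. The helper name `normalizeEditableText` and the sample string are hypothetical and not part of marked.

```javascript
// Hypothetical helper (not part of marked): normalize text taken from a
// contenteditable element before passing it to a Markdown parser.
function normalizeEditableText(text) {
  // Replace every non-breaking space (U+00A0) with a regular space (U+0020).
  return text.replace(/\u00a0/g, ' ');
}

// Sample text as a browser might report it from a contenteditable region.
const fromEditor = '*\u00a0list item';
const normalized = normalizeEditableText(fromEditor);
```

With the conversion done at the point where the browser quirk occurs, input from other sources (files, a Node environment) would keep any non-breaking spaces the author typed on purpose.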
@daleconboy's argument is pretty convincing. Are there other use cases for no-break spaces in markdown input? I would think a set of tests would help define the severity of the issue. |
@daleconboy I like your point about the browser, except that in @arturi's post he specifically points out that spaces are good to fix a browser bug haha :) Also, I wouldn't agree that it's a browser issue. Markdown's "spec" doesn't say which kinds of spaces are and aren't allowed, so IMO Marked, and any Markdown parser, should assume all spaces (nbsp, unicode, etc) should be considered what they are: spaces. Your suggestion, unless I'm misunderstanding it, is to specifically ignore certain kinds of spaces. |
I’m working on a Markdown-based presentation tool, and I’m using marked to generate HTML. Having control over when and where text wraps is vital in a good presentation. Currently, the only way I can do that with marked is by overriding the lexer with a custom one that does the same things as the original one, except for the NBSP replacement. This is of course far from future-proof: In case the original lexer changes, I have to adapt my code. Therefore I’m very much in favor of making this configurable. If you’re interested in a PR, let us know. And although I think that not replacing the NBSPs is the “right” thing to do, I can understand that you don’t want to break existing code that relies on marked fixing the browser behavior. So, I don’t care what the default for this option is, but please introduce one. |
@scy's suggestion is nice. Let me extend it with an example. It might be helpful for future readers...
UPDATED 2017-05-11: fixed syntax |
Is anyone aware of an option or a work around for this issue? There should definitely be an option to allow non-breaking spaces to pass to the HTML. |
I’ve solved it by extending lex, as shown here: #363 (comment). |
Hello everyone, @Lendar's suggestion is wonderful, and if I edit this in the source code I can fix it this way. However, putting it straight into my own code produces a complaint about 'this.token' not being a function. What is the best way to implement this? Cheers! |
I imagine the problem, @deanvaessen, is the arrow function. Try replacing |
Confirmed. Thank you @davidchambers :) |
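For future readers, a self-contained sketch of why an arrow function can produce the "this.token is not a function" symptom. `FakeLexer` is a stand-in written for this illustration; it is not marked's real lexer and makes no claim about marked's internals.

```javascript
// Stand-in for a lexer-like class; this is NOT marked's real code.
function FakeLexer() {}
FakeLexer.prototype.token = function (src) {
  return [{ type: 'text', text: src }];
};

// Regular function: `this` is the instance the method is called on.
FakeLexer.prototype.lex = function (src) {
  return this.token(src);
};
const okResult = new FakeLexer().lex('2\u00a0125 euro');

// Arrow function: `this` comes from the enclosing scope, not the instance,
// so `this.token` does not resolve to the prototype method.
FakeLexer.prototype.lex = (src) => this.token(src);
let arrowError = null;
try {
  new FakeLexer().lex('2\u00a0125 euro');
} catch (err) {
  arrowError = err; // a TypeError; the exact message depends on the environment
}
```

The override only has access to the instance when it is written as a regular `function`, which matches the fix confirmed above.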
@davidchambers @deanvaessen surprised it's still relevant. Updated the example in the comment ⬆️ |
I am in the same boat as @scy — working on a Markdown-based presentation tool. I, too, want to control where lines break and where they never break. Please make an option that stops breaking non-breaking spaces. As for browsers and/or WYSIWYG editors inserting non-breaking spaces where not explicitly requested by the user, those are their bugs and should be fixed there. |
Up ? Thus, there is no reason for removing them (I would say non-breaking spaces should not be interpreted as syntax spaces). PS: there is also no reason for anyone to monkey-patch marked. But it's a bit annoying to always work with minor-fix forks. |
Alright, let's use @Lendar monkey patch :) for replacing |
Closing as having a fix or workaround as the Marked library proper figures its life out. :) |
@joshbruce So it's a |
@oliviertassinari: At this juncture I'm siding with @chjj on this one (#363 (comment)). See #956 as well:
I guess what I'm saying is, right now we have bigger fish to fry and it seems like there is a viable workaround in the meantime. Does that help? |
Note: This only applies to explicit |
@joshbruce Thanks for the extra details. I wasn't sure what was the implication of the first answer. |
@oliviertassinari: Fair. And sorry for not providing more - was in a rush going through issues. :) |
What Christopher wrote about people getting bitten by this is at least debatable. |
@Feder1co5oave: Reopen or no? Again, I'm not sure if the original ticket was referring to the html encoded Leave it to you, brother. |
I'm pretty sure ` `s pass through without a problem. Whereas the Unicode character is currently replaced by a single space. It seems it was set this way because users somehow typed in unwanted non-breaking spaces, but it seems to me this assumption is flimsy. Also you take away from others the possibility to consciously use non-breaking spaces and I don't like that. We certainly need to improve our Unicode support in general (per commonmark), so I think this will change eventually. We need to make sure everything works smoothly as usual.
|
All right. Leaving closed for now. Flagging with newly minted #1048 for when we're ready to focus there. This could also explain the Chinese character problems with header ids, yeah? |
Yes it's related to that and headings' ids
|
Tagging as #1048 |
Tagging #1043 as well, just because of the "header ids" comment. |
related pr #897 |
Consider, say, a text like this:
2 125 euro
I've put a non-breaking space between `2` and `125` so that it would always end up on the same line. Marked pre-parses the text and completely removes the original non-breaking characters that I've put there:
This is where the devil hides:
.replace(/\u00a0/g, ' ')
Here is more on why invisible non-breaking space characters are cool: http://destroytoday.com/findings/fix-widows-with-non-breaking-spaces/
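To make the effect concrete, here is a small sketch that applies the same regular expression as the `.replace(/\u00a0/g, ' ')` call quoted above to the sample text. It runs on a plain string and does not call marked; the variable names are illustrative only.

```javascript
const NBSP = '\u00a0';

// Sample text with a non-breaking space between "2" and "125".
const input = `2${NBSP}125 euro`;

// Same pattern as the replacement quoted from the issue.
const output = input.replace(/\u00a0/g, ' ');

const hadNbsp = input.includes(NBSP);  // true
const hasNbsp = output.includes(NBSP); // false
```

After the replacement the string contains only regular spaces, so a renderer is free to wrap the line between `2` and `125`.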