Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't Encode Internationalized Domain Names #1285

Closed
ilyaigpetrov opened this issue Jun 3, 2018 · 11 comments
Closed

Don't Encode Internationalized Domain Names #1285

ilyaigpetrov opened this issue Jun 3, 2018 · 11 comments

Comments

@ilyaigpetrov
Copy link

What pain point are you perceiving?.

$ npx marked
[aaa](https://сл96.рф)   
<p><a href="https://%D1%81%D0%BB96.%D1%80%D1%84">aaa</a></p>

require('url').parse('https://%D1%81%D0%BB96.%D1%80%D1%84') populates pathname instead of hostname:

Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: '',
  port: null,
  hostname: '',
  hash: null,
  search: null,
  query: null,
  pathname: '%D1%81%D0%BB96.%D1%80%D1%84',
  path: '%D1%81%D0%BB96.%D1%80%D1%84',
  href: 'https:///%D1%81%D0%BB96.%D1%80%D1%84' }

Describe the solution you'd like
Please, don't percent-encode internationalized domain names, because it doesn't work well with url.parse. I see no reason for IDN to be encoded.

@UziTech UziTech added the L1 - broken Valid usage causes incorrect output OR a crash AND there is no known workaround for the issue label Jun 4, 2018
@UziTech
Copy link
Member

UziTech commented Jun 4, 2018

Looks like we would need to extract the domain name from href in Renderer.prototype.link

href = encodeURI(href).replace(/%25/g, '%');

A PR would be welcome

@styfle
Copy link
Member

styfle commented Jun 4, 2018

FYI, it looks like CommonMark has the same behavior as marked.

@UziTech How would you safely extract the domain name? Maybe something like this?

function getHostName() {
  var url = new URL(href);
  return url.hostname
}

But this relies on URL which is not available in IE.

@Martii
Copy link
Contributor

Martii commented Jun 4, 2018

Perhaps https://nodejs.org/api/url.html#url_url_format_url_options could be of help.... although use all true values since we would want those available. I'm not entirely sure why this project is encoding without diving into the historical commits.


Also https://nodejs.org/api/url.html#url_url_resolve_from_to presuming these are available... always test assertions. :)

@UziTech
Copy link
Member

UziTech commented Jun 4, 2018

@Martii marked is used on the client side as well and in Node so Node APIs won't work.

@styfle We could just use regex

function getHostName(href) {
  return href.match(/^https?:\/\/[^\/]+\//)[0];
}

@ilyaigpetrov
Copy link
Author

ilyaigpetrov commented Jun 5, 2018

Why do you need to encode and then to escape:

href = encodeURI(href).replace(/%25/g, '%');
var out = '<a href="' + escape(href) + '"';

What's the purpose? Can't you just do this instead:

href = href.replace(/"/g, '%22')
  .replace(/\r/g, '%0D')
  .replace(/\n/g, '%0A');
var out = '<a href="' + href + '"';

Or maybe just encodeURI is enough without escape?

@ilyaigpetrov
Copy link
Author

Also it may be not a marked fault, but a fault of url module of Node.js, because Chrome and new URL(href) both handle percent-encoded IDNs just fine.

@ilyaigpetrov
Copy link
Author

After a little contemplation I decided I'm fine with the current behavior as long as it works in major browsers and handled by search engines. If you agree and don't see any problem with url Node.js module then you may close it.

@Martii
Copy link
Contributor

Martii commented Jun 5, 2018

If I'm not mistaken there exists this at https://www.npmjs.com/package/url (haven't tried it in the DOM yet... not much time to do this at the moment). It's a bit tenured just like the node URL API.

Would be nice if ECMAScript standards would incorporate some sort of improved IRI handling like this.

@styfle styfle removed the L1 - broken Valid usage causes incorrect output OR a crash AND there is no known workaround for the issue label Jun 14, 2018
@styfle
Copy link
Member

styfle commented Jun 14, 2018

Would be nice if ECMAScript standards would incorporate some sort of improved IRI handling like this.

@Martii There is the WHATWG URL standard which is implemented in modern browsers and newer versions of Node.js which I believe has improved IRI handling. See my comment above.

After a little contemplation I decided I'm fine with the current behavior as long as it works in major browsers and handled by search engines. If you agree and don't see any problem with url Node.js module then you may close it.

@ilyaigpetrov Thanks, I'll close this issue.

@styfle styfle closed this as completed Jun 14, 2018
@AlynxZhou
Copy link

Hi, I am using marked.js to build my own static site generator, and I found this issue, which breaks my code.

I use my own implementation to convert all links to absolute path and encode them, yesterday I found marked.js do encode again so the link is wrong after encode twice. I cannot stop doing encode in my code because not all pages are generated via marked.js (tags cloud, categories...). I read this issue and have some questions:

  • In Don't Encode Internationalized Domain Names #1285 (comment), I am not sure why marked needs to extract host name from URL, I think converting [aaa](bbb) into <a href="bbb">aaa</a> does not need to know what bbb means.
  • If extracting host name is needed by some options (maybe baseUrl I guess? to know whether we need to add prefix?), could you please skip doing encode if user does not enable this option? For example my code does not use it, because I have my own function to add base URL for all relative links in my site.

@UziTech
Copy link
Member

UziTech commented Sep 20, 2021

@AlynxZhou marked tries to follow CommonMark spec. I think you will have to change the spec if you want this changed in marked. Otherwise you can create an extension for marked that people can use that does not encode urls. We want to stay away from adding more options to marked to prevent feature creep.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants