src: do not ignore IDNA conversion error #11549
Hopefully the issue with the legacy URL parser is fixed.
/cc @nodejs/intl @nodejs/url
New CI: https://ci.nodejs.org/job/node-test-pull-request/6586/
@@ -1007,7 +1008,8 @@ the new `URL` implementation but is not part of the WHATWG URL standard.
 * `domain` {String}
 * Returns: {String}

-Returns the Unicode serialization of the `domain`.
+Returns the Unicode serialization of the `domain`. If `domain` is an invalid
Btw, should this be "deserialization", and mention that it is the inverse of `domainToASCII`?
It is serialization, since the domain is fully parsed and subsequently serialized from the parsed form. It's just that it uses a different algorithm for deserialization.
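Whichever word the docs use, the relationship the reviewer asks about can be checked against the public API. The sketch below assumes a Node.js build with IDNA support; `roundTrip` is a hypothetical helper, and the sample domain is one of the examples from the Node.js `url` documentation. It serializes a domain as ASCII and then serializes that result as Unicode again.

```javascript
const url = require('url');

// Hypothetical helper: serialize a domain as ASCII (Punycode), then
// re-serialize that result as Unicode. For a valid domain, the second
// step recovers the original Unicode form.
function roundTrip(domain) {
  const ascii = url.domainToASCII(domain);
  const unicode = url.domainToUnicode(ascii);
  return { ascii, unicode };
}

console.log(roundTrip('中文.com'));
// → { ascii: 'xn--fiq228c.com', unicode: '中文.com' }
```

Both directions parse the domain fully before producing output, which is the author's reason for calling the Unicode direction a serialization as well.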
src/node_i18n.cc
@@ -489,8 +492,11 @@ static void ToUnicode(const FunctionCallbackInfo<Value>& args) {
   CHECK_GE(args.Length(), 1);
   CHECK(args[0]->IsString());
   Utf8Value val(env->isolate(), args[0]);
+  // optional arg
+  bool lenient = args[1].As<Boolean>()->Value();
Can you update the `args.Length()` check above to use 2? Also, you probably want to add a `CHECK(args[1]->IsBoolean());` or do `args[1]->BooleanValue()` instead.
I didn't update the check for argument length, since (as the comment is trying to say) it is an optional argument, so that existing usage of `toUnicode(str)` would still work. V8 automatically returns `Undefined` for an out-of-range `args[]` dereference.
Wasn't aware of `BooleanValue()`. Will use that instead.
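The reasoning about the optional argument can be mirrored in plain JavaScript. In this sketch, `bindingToASCII` is a hypothetical stand-in, not a real Node.js function: an omitted second argument reads as `undefined`, and `Boolean()` applies the same ToBoolean conversion that `BooleanValue()` performs in V8, so existing one-argument calls keep the strict behavior.

```javascript
// Hypothetical stand-in for the C++ binding; it only reports how the
// optional `lenient` flag would be interpreted.
function bindingToASCII(domain, lenientArg) {
  // An omitted argument is `undefined`, just as V8 yields Undefined for an
  // out-of-range args[1]; ToBoolean(undefined) is false.
  const lenient = Boolean(lenientArg);
  return { domain, lenient };
}

console.log(bindingToASCII('example.com').lenient);       // false (existing callers)
console.log(bindingToASCII('example.com', true).lenient); // true (legacy parser)
```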
src/node_i18n.cc
@@ -508,8 +514,11 @@ static void ToASCII(const FunctionCallbackInfo<Value>& args) {
   CHECK_GE(args.Length(), 1);
   CHECK(args[0]->IsString());
   Utf8Value val(env->isolate(), args[0]);
+  // optional arg
+  bool lenient = args[1].As<Boolean>()->Value();
(ditto)
   MaybeStackBuffer<char> buf;
-  int32_t len = ToASCII(&buf, *val, val.length());
+  int32_t len = ToASCII(&buf, *val, val.length(), lenient);

   if (len < 0) {
     return env->ThrowError("Cannot convert name to ASCII");
Is this error part of any non-experimental API? Could we change it to "Cannot encode name to ASCII as Punycode"?
Yes, for `toASCII`:
> url.parse(`http://${'é'.repeat(230)}.com/`)
Error: Cannot convert name to ASCII
   MaybeStackBuffer<char> buf;
-  int32_t len = ToUnicode(&buf, *val, val.length());
+  int32_t len = ToUnicode(&buf, *val, val.length(), lenient);

   if (len < 0) {
     return env->ThrowError("Cannot convert name to Unicode");
Is this error part of any non-experimental API? Could we change it to "Cannot decode name as Punycode"? (Basically the same question I also posted below.)
No; in fact, the `toUnicode` JS function isn't used in the code base at all. Maybe we should just remove this method?
/cc @jasnell
If it's not used, it can be removed.
Remove which function specifically? The `i18n::ToUnicode` function is definitely used.
@jasnell, the exposed `process.binding('icu').toUnicode()` JS function.
src/node_i18n.cc
@@ -493,7 +493,7 @@ static void ToUnicode(const FunctionCallbackInfo<Value>& args) {
   CHECK(args[0]->IsString());
   Utf8Value val(env->isolate(), args[0]);
   // optional arg
-  bool lenient = args[1].As<Boolean>()->Value();
+  bool lenient = args[1]->BooleanValue().FromJust();
Does this compile? It seems like the `env->context()` argument is missing.
This should also fix the missing errors when parsing percent-encoded disallowed characters in hosts (https://github.com/nodejs/node/blob/master/test/fixtures/url-tests.js#L4499), since we are no longer ignoring `UIDNA_ERROR_DISALLOWED`. You can turn those tests on in this PR if you like.
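A quick way to observe the behavior this comment refers to, assuming a Node.js version whose global `URL` follows the WHATWG host parser: the host is percent-decoded before IDNA processing, so a disallowed code point hidden behind percent-encoding rejects the whole URL instead of slipping through. `parseHost` is a hypothetical helper and the two inputs are illustrative.

```javascript
// Hypothetical helper: return the parsed hostname, or null when the URL
// parser rejects the input. Node.js reports invalid URLs as TypeError.
function parseHost(input) {
  try {
    return new URL(input).hostname;
  } catch (err) {
    if (err instanceof TypeError) return null;
    throw err;
  }
}

// '%41' decodes to 'A', which IDNA mapping lowercases to 'a'.
console.log(parseHost('http://%41.com'));  // → a.com
// '%00' decodes to U+0000, which is not allowed in a domain.
console.log(parseHost('http://hello%00')); // → null
```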
@jasnell, did you see #11549 (comment)?
Old behavior can be restored using a special `lenient` mode.
- Split the tests out to a separate file
- Add invalid cases
- Add tests for url.domainTo*()
- Re-enable previously broken WPT URL parsing tests
Test re-enabled per @joyeecheung. Will land tomorrow.
Landed in a520508...7ceea2a.
Old behavior can be restored using a special `lenient` mode, as used in the legacy URL parser.

PR-URL: #11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>

- Split the tests out to a separate file
- Add invalid cases
- Add tests for url.domainTo*()
- Re-enable previously broken WPT URL parsing tests

PR-URL: #11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>

PR-URL: #11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
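The error check behind these commits can be summarized as a small decision function. This is a hypothetical JavaScript model of the C++ logic, not code from the PR: `statusFailed` stands in for `U_FAILURE()` on the `UErrorCode`, `infoErrors` stands in for the `errors` bit set of the `UIDNAInfo` struct filled in by ICU's `uidna_nameToASCII()` / `uidna_nameToUnicode()`, and `lenient` is the optional flag used by the legacy URL parser.

```javascript
// Hypothetical model of the error decision described in this PR.
//   statusFailed: true if ICU's UErrorCode reports a failure
//   infoErrors:   the UIDNAInfo `errors` bit set (0 means no problems)
//   lenient:      optional flag; defaults to strict
function idnaConversionFailed(statusFailed, infoErrors, lenient = false) {
  // Exceptional failures reported through UErrorCode always count.
  if (statusFailed) return true;
  // Problems with the domain itself only show up in infoErrors.
  // Strict mode (the new default) reports them; lenient mode ignores them,
  // matching the behavior before this PR.
  return !lenient && infoErrors !== 0;
}

const SOME_ERROR_BIT = 0x1; // arbitrary nonzero stand-in for a UIDNA_ERROR_* bit
console.log(idnaConversionFailed(false, 0));                    // false
console.log(idnaConversionFailed(false, SOME_ERROR_BIT));       // true
console.log(idnaConversionFailed(false, SOME_ERROR_BIT, true)); // false
console.log(idnaConversionFailed(true, 0, true));               // true
```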
Currently, the ICU-based IDNA conversion methods only return errors that are passed along through a `UErrorCode`. However, according to ICU's documentation for `uidna_nameToASCII()`, the `UErrorCode` indicates an error only in exceptional cases, while problems with the domain itself are recorded in `pInfo->errors`. In other words, when non-catastrophically invalid domains are passed, `ToASCII()` and `ToUnicode()` (and their downstream `url.domainToASCII()` and `url.domainToUnicode()`) currently return garbled domain names instead of errors.

This PR makes the C++ binding methods report errors in `pInfo->errors` in addition to `UErrorCode`, thereby fixing the problems described above.

Also included in this PR are additional tests for invalid situations, as well as documentation clarifications for the user-facing `url.domainToASCII()` and `url.domainToUnicode()`.
Checklist
- `make -j4 test` (UNIX), or `vcbuild test` (Windows) passes

Affected core subsystem(s)