Non-ASCII space characters are trimmed (IRI) #56

ranvis · 2018-11-28T13:30:49Z

Trailing non-ASCII space characters in a URI are trimmed when the URI is converted to IRI form with as_iri() and the result is then passed to new().

use feature 'say';
use URI;

say length URI->new(URI->new('%20')->as_iri); # 3
say length URI->new(URI->new('%09')->as_iri); # 3
say length URI->new(URI->new('%0B')->as_iri); # 3
say length URI->new(URI->new('%E3%80%80')->as_iri); # 0 (U+3000 IDEOGRAPHIC SPACE)
say length URI->new(URI->new('%E2%81%9F')->as_iri); # 0 (U+205F MEDIUM MATHEMATICAL SPACE)

I am not sure if the problem is because new() trims trailing spaces, or because as_iri() unescapes non-ASCII space characters. (Maybe the latter?)
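One way to tell the two suspects apart is to reproduce each step by hand, without loading URI at all. The sketch below rests on two assumptions about the module's internals that are not confirmed in this thread: that as_iri() turns percent-encoded non-ASCII UTF-8 into real characters, and that new() strips whitespace with a `\s`-based regex. The helper names `decode_percent_utf8` and `trim_with_s` are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the unescaping step: percent-decode to bytes,
# then interpret those bytes as UTF-8 characters.
sub decode_percent_utf8 {
    my ($escaped) = @_;
    (my $bytes = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return decode('UTF-8', $bytes);
}

# Hypothetical stand-in for stripping leading and trailing whitespace with \s.
sub trim_with_s {
    my ($str) = @_;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    return $str;
}

for my $escaped ('%E3%80%80', '%E2%81%9F') {
    my $char = decode_percent_utf8($escaped);
    printf "%s -> U+%04X, matches \\s: %s, length after trim: %d\n",
        $escaped, ord($char), ($char =~ /\A\s\z/ ? 'yes' : 'no'),
        length(trim_with_s($char));
}
```

Both decoded characters count as whitespace for Perl's `\s`, while a still-escaped string such as '%20' contains no whitespace to strip. If the two assumptions hold, the answer to the question above would be "both": the unescaping exposes the character, and the trimming then removes it.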


oalders · 2019-02-05T20:56:56Z

Looking at https://url.spec.whatwg.org/ it says that a URL parser should "Remove any leading and trailing C0 control or space from input." There's no reference to non-ASCII space characters.
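Read literally, that sentence covers only U+0000 through U+0020. A minimal sketch of such a trim in plain Perl might look like the following; `trim_c0_control_or_space` is a hypothetical helper, not something URI provides, and it models only this one sentence of the spec rather than the whole parser.

```perl
use strict;
use warnings;

# Strip only "C0 control or space" (U+0000 through U+0020) from both ends.
# Hypothetical helper for illustration; not part of the URI module.
sub trim_c0_control_or_space {
    my ($input) = @_;
    $input =~ s/\A[\x{00}-\x{20}]+//;
    $input =~ s/[\x{00}-\x{20}]+\z//;
    return $input;
}

# An ideographic space surrounded by ASCII space, tab, and newline.
my $wrapped = " \t\x{3000}\n";
my $trimmed = trim_c0_control_or_space($wrapped);

printf "length before: %d, after: %d, kept U+%04X\n",
    length($wrapped), length($trimmed), ord($trimmed);
```

Under this rule the ASCII space, tab, and newline are dropped but U+3000 survives, which differs from a `\s`-based trim.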

The same spec says "Standardize on the term URL. URI and IRI are just confusing." I'm learning a lot today. :)

So, I'm not sure what the significance of non-ASCII spaces in an IRI is. Is it correct to say that they're allowed in URLs?

vanHoesel · 2019-02-05T23:01:37Z

That requires some investigation and rethinking, because in some places the spec talks about the 'real space' character, or the single byte representing it. It also mentions tabs specifically. I did look at the algorithm describing the state machine... it is a complicated thing.

The same applies to #61.

oalders · 2019-02-06T04:58:06Z

Thanks for your thoughts @vanHoesel. I was hoping you'd contribute to the conversation. :)
