Trailing non-ASCII space characters in a URI are trimmed when the URI is converted to IRI form with as_iri() and the result is then passed to new().
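use v5.10;  # enables say
use URI;
say length URI->new(URI->new('%20')->as_iri); # 3 (U+0020 SPACE)
say length URI->new(URI->new('%09')->as_iri); # 3 (U+0009 TAB)
say length URI->new(URI->new('%0B')->as_iri); # 3 (U+000B VERTICAL TAB)
say length URI->new(URI->new('%E3%80%80')->as_iri); # 0 (U+3000 IDEOGRAPHIC SPACE)
say length URI->new(URI->new('%E2%81%9F')->as_iri); # 0 (U+205F MEDIUM MATHEMATICAL SPACE)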
I am not sure if the problem is because new() trims trailing spaces, or because as_iri() unescapes non-ASCII space characters. (Maybe the latter?)
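One way to tell the two possibilities apart is to inspect the intermediate string that as_iri() returns before it is passed back to new(). The following is only a diagnostic sketch, assuming the URI module is installed; the helper name inspect_round_trip is hypothetical and not part of URI. It reports code points rather than printing the raw string, so it makes no claim about what the output will be.

```perl
use strict;
use warnings;
use URI;

# Hypothetical diagnostic helper (not part of URI): report what each
# step of the round trip produces for one escaped input.
sub inspect_round_trip {
    my ($escaped) = @_;

    # Step 1: what does as_iri() return for the escaped form?
    my $iri = URI->new($escaped)->as_iri;
    my @code_points = map { sprintf 'U+%04X', ord } split //, $iri;

    # Step 2: what is left after that string goes back through new()?
    my $round_trip = URI->new($iri)->as_string;

    return {
        iri_length        => length $iri,
        iri_code_points   => \@code_points,
        round_trip        => $round_trip,
        round_trip_length => length $round_trip,
    };
}

for my $escaped ('%20', '%E3%80%80', '%E2%81%9F') {
    my $r = inspect_round_trip($escaped);
    printf "%-10s as_iri: %d char(s) [%s]; after new(): %d char(s) '%s'\n",
        $escaped, $r->{iri_length}, join(' ', @{ $r->{iri_code_points} }),
        $r->{round_trip_length}, $r->{round_trip};
}
```

If as_iri() already yields the unescaped character and new() then returns an empty string, both steps contribute: the unescaping exposes the character, and the trimming removes it.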
Looking at https://url.spec.whatwg.org/ it says that a URL parser should "Remove any leading and trailing C0 control or space from input." There's no reference to non-ASCII space characters.
The same spec says "Standardize on the term URL. URI and IRI are just confusing." I'm learning a lot today. :)
So, I'm not sure what the significance of non-ASCII spaces in an IRI is. Is it correct to say that they're allowed in URLs?
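To make the quoted rule concrete, here is a minimal sketch contrasting the two trimming behaviours. It assumes that "C0 control or space" means the code points U+0000 through U+0020, as the WHATWG URL Standard defines the term. Both helper names are hypothetical, and neither is claimed to be what URI.pm does internally.

```perl
use strict;
use warnings;

# Hypothetical helper: strip only leading and trailing
# "C0 control or space" characters (U+0000 through U+0020).
sub trim_c0_or_space {
    my ($input) = @_;
    $input =~ s/\A[\x{00}-\x{20}]+//;
    $input =~ s/[\x{00}-\x{20}]+\z//;
    return $input;
}

# Hypothetical helper: strip with \s, which on a Perl character string
# also matches Unicode whitespace such as U+3000 and U+205F.
sub trim_unicode_space {
    my ($input) = @_;
    $input =~ s/\A\s+//;
    $input =~ s/\s+\z//;
    return $input;
}

my $ascii_trailing = "a\t ";        # trailing TAB and SPACE
my $ideographic    = "a\x{3000}";   # trailing IDEOGRAPHIC SPACE

printf "%d\n", length trim_c0_or_space($ascii_trailing);   # 1
printf "%d\n", length trim_c0_or_space($ideographic);      # 2
printf "%d\n", length trim_unicode_space($ideographic);    # 1
```

Under the narrower rule a trailing U+3000 survives, whereas a \s-based trim removes it, which matches the difference between the 3 and 0 results in the report.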
That requires some investigation and rethinking: in some places the spec talks about the 'real space' character, or the single byte representing it, and it also references tabs specifically. I did look at the algorithm describing the state machine... it is complicated.