UTF-8 Encode URLs #12

Open
InQuize opened this issue Feb 1, 2022 · 3 comments

@InQuize

InQuize commented Feb 1, 2022

When I try to play a media file served by webfs in VLC, all goes well until the filename contains a symbol that does not pass input sanity checks.
In my case I'm having trouble with square brackets [ ].

It is good practice to percent-encode URLs (as UTF-8), leaving the scheme, host, and / separators as they are, so that
http://example.com/dir/sub dir/long filename with $ymbols & stuff.mp4
would look like
http://example.com/dir/sub%20dir/long%20filename%20with%20%24ymbols%20%26%20stuff.mp4

If I manually encode links served by webfs with e.g. urlencoder.org, VLC does not throw an error and plays fine, so it should be a matter of fixing the generated HTML, possibly as a configurable option.
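
To illustrate the kind of encoding meant here, a minimal sketch in Python (used only for illustration; encode_path and make_link are hypothetical names, not taken from webfs). It assumes each path segment is percent-encoded as UTF-8 while the / separators, scheme, and host are left untouched:

```python
from urllib.parse import quote


def encode_path(path: str) -> str:
    """Percent-encode each path segment as UTF-8, keeping '/' separators."""
    # safe="" makes quote() escape everything except unreserved characters
    # (letters, digits, "-", ".", "_", "~"), so "[" and "]" become %5B and %5D.
    return "/".join(quote(segment, safe="") for segment in path.split("/"))


def make_link(base: str, path: str) -> str:
    """Join a base URL (assumed to be already valid) with an encoded path."""
    return base.rstrip("/") + encode_path(path)


print(make_link("http://example.com",
                "/dir/sub dir/long filename with $ymbols & stuff.mp4"))
# http://example.com/dir/sub%20dir/long%20filename%20with%20%24ymbols%20%26%20stuff.mp4

print(encode_path("/media/[group] episode 01.mkv"))
# /media/%5Bgroup%5D%20episode%2001.mkv
```

Encoding per segment rather than the whole string is what keeps the link usable: the server still sees the same directory structure, only the characters inside each name are escaped.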

@LinRaymond2006

Hello InQuize, hope this information helps:

RFC 1738 compliance

According to RFC 1738's requirements for URLs, these characters must be encoded into printable US-ASCII using the % encoding scheme:

  1. non-graphic US-ASCII characters
  2. unsafe characters (and reserved characters, when not used for their reserved purpose)

Excerpt from RFC 1738:

In addition, octets may be encoded by a character triplet consisting of the character "%" followed by the two hexadecimal digits (from "0123456789ABCDEF") which forming the hexadecimal value of the octet. (The characters "abcdef" may also be used in hexadecimal encodings.)

Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.
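
As a rough sketch of the "%" + two hex digits triplet described in the excerpt, the following Python function encodes a string octet by octet. The function name is hypothetical, and the set of characters left unencoded is an assumption: it uses the unreserved set from the later RFC 3986, which is stricter than what RFC 1738 allows.

```python
import string

# Assumption: only RFC 3986 "unreserved" characters are left as-is.
UNRESERVED = set((string.ascii_letters + string.digits + "-._~").encode("ascii"))


def percent_encode(text: str) -> str:
    """Encode text as UTF-8 and replace every other octet with a %HH triplet."""
    out = []
    for octet in text.encode("utf-8"):
        if octet in UNRESERVED:
            out.append(chr(octet))
        else:
            # Two uppercase hexadecimal digits giving the value of the octet.
            out.append(f"%{octet:02X}")
    return "".join(out)


print(percent_encode("[a b]"))  # %5Ba%20b%5D
print(percent_encode("é"))      # %C3%A9 (two octets in UTF-8)
```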

RFC 2616 compliance

According to RFC 2616, a compliant origin server is required to decode the %-encoded octets in the Request-URI so that the request can be interpreted.

Excerpt from RFC 2616:

The Request-URI is transmitted in the format specified in section 3.2.1. If the Request-URI is encoded using the "% HEX HEX" encoding [42], the origin server MUST decode the Request-URI in order to properly interpret the request. Servers SHOULD respond to invalid Request-URIs with an appropriate status code.
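
A minimal sketch of the server-side half, in Python for illustration only: decode_request_path is a hypothetical helper, and treating a malformed escape or invalid UTF-8 as an error (which a server could map to a 400 response) is an assumption about how "an appropriate status code" might be chosen.

```python
import re
from urllib.parse import unquote

# A "%" that is not followed by exactly two hexadecimal digits.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def decode_request_path(raw_path: str) -> str:
    """Decode %HH escapes in a request path, rejecting malformed input."""
    if _BAD_ESCAPE.search(raw_path):
        raise ValueError("malformed percent-escape in request path")
    # errors="strict" raises UnicodeDecodeError (a ValueError subclass)
    # if the decoded octets are not valid UTF-8.
    return unquote(raw_path, encoding="utf-8", errors="strict")


print(decode_request_path("/media/%5Bgroup%5D%20ep.mkv"))  # /media/[group] ep.mkv

try:
    decode_request_path("/media/bad%zzname")
except ValueError as exc:
    print("rejected:", exc)
```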

Also refer to

  1. RFC 1738 section 2.2, URL Character Encoding Issues
  2. RFC 2616 section 5.1.2, Request-URI

Feel free to correct me if I'm wrong!

@InQuize
Author

InQuize commented Sep 1, 2024

Yeah, pretty much. Good job citing all the docs.
The characters I had trouble with, [ and ], are reserved under RFC 3986 section 2.2, so I had to switch to sharing via Samba.
That urlencoder link I mentioned above also has a good chunk of info on the subject.

@LinRaymond2006

Thanks for pointing that out!
It's definitely a topic I'll do further research on in the near future.
