-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Work in progress for Resource cleanup #6518
Conversation
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
* <li>The path provided is ensured to be relative and belongs within the Resource, attempts to reference a path | ||
* outside of the Resource will result in an exception.</li> | ||
* </ul> | ||
* </p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gregw @lachlan-roberts I could use feedback on this clarification on addPath
* <p> | ||
* Constraints: | ||
* <ul> | ||
* <li>The path is expected to be in URI path-like format.</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "URI path-like" format is the ambiguous decoded path space that is returned from getPathInfo
etc.
I think it is this space that needs to be better defined, which is part of the conversation at: jakartaee/servlet#18 (comment)
The path is "URI-like" in that it is /
separated segments, but it is not a URI as it cannot contain encoded characters, specifically it cannot contain an encoded /
, so there are some paths that cannot be specified.
* Constraints: | ||
* <ul> | ||
* <li>The path is expected to be in URI path-like format.</li> | ||
* <li>The path provided will not be URI decoded by the Resource implementation.</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bit of a convoluated way of saying that the path is decoded. Ie any % characters will not result in further decoding, but will themselves be encoded
* <li>URI canonicalization will be attempted before being used against the Resource. | ||
* This can result in a normal Resource, perhaps with it being an alias to a different Resource, | ||
* perhaps even rejected with an Exception if there is an attempt to navigate above this Resource.</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is exactly correct. Rather the path is checked to see if conanicalized, would it navigate above the base and if so an MUE us thrown. Otherwise the possibly non-canonical path is used, which may result in a non-normal Resource which will return true from isAlias
* <li>The path provided will be given to the Resource implementation for interpretation on how it | ||
* will behave (such as using "a\\b.txt" on a Path based in linux vs windows).</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm probably true... but really not very desirable.
Ideally \
would never be treated as a separator as the separator is '/' in this space.
import static org.hamcrest.MatcherAssert.assertThat; | ||
import static org.hamcrest.Matchers.equalTo; | ||
|
||
public class ResourceAddPathTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd much rather test cases be added to the existing ResourceTest
rather than create a special purpose unit test. By creating multiple special purpose test that overlap with other test cases it just becomes too complex to see what is tested and then gaps emerge. We have already gone down this path to some extent as we have ResourceTest
, FileSystemResourceTest
, PathResourceTest
, ResourceAliasTest
.
We need to simplify and rationalize the testing rather than duplicate tests, so that ResourceTest
does all the tests it can over all the different implementations of Resource
. Then we can have a PathResourceTest
iff there are some tests that can only be done on PathResources and not other resources in general. We should strive to get rid of ResourceAliasTest
, FileSystemResourceTest
and definitely not add ResourceAddPathTest
/** | ||
* Collection of Path encoding / decoding utilities for URI, Http Headers, and HTML | ||
*/ | ||
public class PathUtil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is well named. It reads like it is for utility methods for the Path
class.
Instead it has methods that work on the strange decoded URI path space that is ill defined by the servlet specification. Currently we have such methods on URIUtil, which I guess is similarly badly named as they are not URI paths either.... well I guess they are decoded URI paths, so it's a little bit better.
Eitherway, I think we should strive to not add extra classes like this, unless their purpose is clearly defined.
URI_ENCODED_CHARS = new String[127]; | ||
Arrays.fill(URI_ENCODED_CHARS, null); | ||
|
||
// URI Reserved Chars | ||
String uriReservedGenDelims = ":?#[]@"; // intentionally missing "/" | ||
String uriReservedSubDelims = "!$&'()*+,;="; | ||
// Extra Reserved Chars (specified by Jetty) | ||
String jettyReserved = "%\"<> \\^`{}|"; | ||
|
||
String reserved = uriReservedGenDelims + uriReservedSubDelims + jettyReserved; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this array approach, but I think we should switch around how we create it to defining what characters will not be encoded rather than which characters will be encoded.
We should not encode /
and the unreserved characters from the URI spec:
- ALPHA:
A
-Z
+a
-z
- DIGIT:
0
-9
-
/.
/_
/~
I think we should encode everything else.
return buf.toString(); | ||
} | ||
|
||
public static String encodePathDelayAlloc(String rawpath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the JMH has indicated enough that this is a better algorithm than the URIUtil.encodePath
. Just update that method and don't need to check in and maintain the jmh or alternatives.
If we do want to check in a JMH, it should be a general comparison between doing a static array lookup vs switch(char)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be interesting to jmh compare the approach used in HttpParser, which looks up characters in an array of Token[256]
and then does a switch on the enum obtained from token.getType
.
Responsibilities:
Signed-off-by: Joakim Erdfelt joakim.erdfelt@gmail.com