Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JBEAP-27450 Normalize URL notation via URI.toURL().toExternalForm() #737

Merged
merged 2 commits into from
Jul 29, 2024

Conversation

TomasHofman
Copy link
Contributor

@TomasHofman TomasHofman commented Jul 23, 2024

This went bit farther than just calling URL.toExternalForm(). I tried to simplify the RepositoryDefinition class a bit.

@TomasHofman
Copy link
Contributor Author

Cc @spyrkob

Copy link
Collaborator

@spyrkob spyrkob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomasHofman there are some problems with using URI class - I added some notes below.

} catch (MalformedURLException e) {
return false;
uri = new URI(location);
if (("file".equals(uri.getScheme()) || StringUtils.isBlank(uri.getScheme()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are some legal paths that would not be caught by this. For example, on Windows - D:/Foo/Bar, on Linux a:foo/bar. In both cases the first letter is treated as a schema and so the whole path will be treated as a URI and fail on uri.toURL()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically I think Windows D:/Foo/Bar should not be allowed because it's not a URL, but it should be handled gracefully by the catch (URISyntaxException e) block bellow.

On Linux, a:foo/bar can as well be http:foo/bar and now how do we tell if it should be treated as URL or a path?

This is a hole we dug ourselves by being too benevolent about the input, we should have insisted on a URL :).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm a:foo/bar is not a URL protocol we support, therefore we should treat it as a path... but yes I get what you're saying. D:/Foo/Bar on the other hand is not an URL to start with, it's a path

@TomasHofman
Copy link
Contributor Author

Thanks @spyrkob for reviewing, I didn't thought of the Windows inputs. (And the Windows jobs did fail, which is good to see that we have coverage!)

@TomasHofman TomasHofman force-pushed the repo-url-normalization branch 7 times, most recently from ce6b648 to 9906b35 Compare July 24, 2024 12:23
@TomasHofman
Copy link
Contributor Author

@spyrkob I believe I managed to sort out the Windows paths.

I think that we allow a weird mixture of things, while disallowing different things of the same kind.

For instance we have test cases where we test that pure paths (not URLs) are accepted (in this case absolute paths):

actualList = from(List.of(System.getProperty("user.dir")));

Different test checks that a non-URL string is refused based on having invalid format:

assertThrows(ArgumentParsingException.class, ()->from(List.of("imnoturl")));

On yet different place we test that the same kind of non-URL string is refused based on the path not being an existing file:

Assertions.assertThatThrownBy(() -> from(List.of("idontexist")))

From a little different angle, we are inconsistent in that we allow relative paths in URL form ("file:some/path"), but don't allow relative paths in path form ("some/path"), yet we allow absolute paths in path form ("/some/path").

I'm not sure though if we should change that due to backward compatibility. (At least we probably shouldn't make it stricter.)

@TomasHofman
Copy link
Contributor Author

Updated comments in the code.

@spyrkob
Copy link
Collaborator

spyrkob commented Jul 24, 2024

I think that we allow a weird mixture of things, while disallowing different things of the same kind.

@TomasHofman I agree :( Originally this only allowed URLs, later we got a request to support local files as well, hence this logic changed and got overcomplicated over time. The relative URL itself is an unexpected effect of maven library allowing it, by the time I realised this is possible, we couldn't block it easily due to backwards compatibility (at least in 1.1.x branch).

From a little different angle, we are inconsistent in that we allow relative paths in URL form ("file:some/path"), but don't allow relative paths in path form ("some/path"), yet we allow absolute paths in path form ("/some/path").

We should allow relative paths some/path as long as the file exists, this sounds like a bug

Different test checks that a non-URL string is refused based on having invalid format:

That's an incorrect test - probably hasn't been removed when support of local files has been added. Also that shows that the check in the test is not enough, the test should check the reason why the value is removed

The summary of allowed values is:

  1. A http/https URL
  2. An absolute, local file URL pointing to an existing folder
  3. A relative, local file URL pointing to an existing folder (probably can deprecate and drop this in future)
  4. An absolute local path, pointing to an existing folder
  5. A relative, local path pointing to an existing folder

On top of that, each of this can have an ID suffix <ID>:: - though I'd like to deprecate and drop that in future.

Copy link
Collaborator

@spyrkob spyrkob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @TomasHofman

@TomasHofman
Copy link
Contributor Author

I think I still need to handle a:foo/bar case.

@spyrkob
Copy link
Collaborator

spyrkob commented Jul 24, 2024

I think I still need to handle a:foo/bar case.

yes of course, sorry 🤦

@spyrkob spyrkob self-requested a review July 24, 2024 15:04
@TomasHofman TomasHofman force-pushed the repo-url-normalization branch 2 times, most recently from 0647562 to 0b02cd8 Compare July 24, 2024 15:38
Also tried so simplify the RepositoryDefinition class.
@TomasHofman
Copy link
Contributor Author

When you do Path.of("a:foo/bar") it gives you relative path on Linux, but a valid absolute path on Windows, so it's gonna be platform specific. Check the test case. (It's bit weird but the test has to be tailored for the platform too.)

I tried to do a comparison to the original impl and I do get mostly the same results, except that the current impl allows some notations that threw exceptions before, like:

  • file://host/c:/some/path
  • file://host/some/path
  • file://../prospero-cli

They do not seem to be very sensible, so we could just explicitly forbid any remote URLs (two slashes) in the file schema. We could also enforce only http and https for the remote URLs.

Then we have the option to throw all this away and keep the original impl, just do the bare minimum to prefer the one slash notation over the three slashes notation...

@spyrkob
Copy link
Collaborator

spyrkob commented Jul 25, 2024

Thanks for the detailed summary @TomasHofman!

They do not seem to be very sensible, so we could just explicitly forbid any remote URLs (two slashes) in the file schema. We could also enforce only http and https for the remote URLs.

+1 to forbidding the remote file URLs especially if we already blocked that before. I'm thinking more and more that we should have supported http/https URLs and local file paths (without file:schema) only.

@TomasHofman
Copy link
Contributor Author

Added new commit to try to enforce http / https / file schemas, and file only for local URLs (no host).

Copy link
Collaborator

@spyrkob spyrkob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @TomasHofman

@spyrkob spyrkob merged commit 53b700b into wildfly-extras:main Jul 29, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants