In this run, the script records the link https://commons.wikimedia.org/wiki/File:Trains_icons_(evolution)_SVG.svg as https://commons.wikimedia.org/wiki/File:Trains_icons_, most likely because matching stops at the first (.
Should the script URL-encode the link, or what could be a solution?
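To illustrate what URL encoding would mean for this link, here is a minimal sketch using Python's standard urllib.parse. It assumes the encoding is applied to the path as written in the source document; the variable names are made up for illustration, and nothing here is taken from the script itself.

```python
from urllib.parse import quote, unquote

# Path portion of the link from this report.
path = "/wiki/File:Trains_icons_(evolution)_SVG.svg"

# Percent-encode everything except unreserved characters, "/" and ":".
# The parentheses become %28 and %29, so a matcher that stops at "("
# would no longer cut the link short.
encoded = quote(path, safe="/:")
print(encoded)

# Decoding restores the original path.
assert unquote(encoded) == path
```

Whether encoding the link in the source is acceptable, or the matcher itself should change, is a separate question.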
Looking at RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, I think we should be able to make sure that we match URLs correctly. We should keep in mind that some characters may only appear in very specific parts of a URL and must otherwise be percent-encoded.
At a glance, we should probably add the characters ~ ! @ + , ; and maybe ( ) [ ] as well.
The regex intentionally does not capture # or anything that follows it, since that denotes the fragment (anchor), which we ignore for this purpose.
I think we could add ( and ) as valid URL characters to capture, but then the match would also capture the trailing ) that follows a link, and we would need to discard that as a special case. I expect [ and ] will also prove challenging if we need to add those.
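The idea of allowing parentheses and then discarding an unmatched trailing ) could be sketched roughly as follows. This is only an illustration in Python: the pattern, URL_RE, trim_trailing_parens and extract_urls are hypothetical names and are not the script's actual regex or functions. The character class is an assumption loosely based on the unreserved, reserved and percent characters of RFC 3986, with # deliberately left out so the fragment is not captured.

```python
import re

# Hypothetical pattern: scheme, then any run of characters that RFC 3986
# allows somewhere in a URI, except "#" (fragments are ignored on purpose).
URL_RE = re.compile(r"https?://[A-Za-z0-9\-._~:/?\[\]@!$&'()*+,;=%]+")


def trim_trailing_parens(url: str) -> str:
    """Drop trailing ')' characters that have no matching '(' in the URL."""
    while url.endswith(")") and url.count(")") > url.count("("):
        url = url[:-1]
    return url


def extract_urls(text: str) -> list[str]:
    """Return all matched URLs with unbalanced trailing ')' removed."""
    return [trim_trailing_parens(m.group(0)) for m in URL_RE.finditer(text)]


sample = (
    "See [icons](https://commons.wikimedia.org/wiki/"
    "File:Trains_icons_(evolution)_SVG.svg) and (https://example.com/a#frag) too"
)
print(extract_urls(sample))
```

This sketch does not strip other trailing punctuation, such as a sentence-ending period or comma directly after a URL, and it does not treat [ and ] specially; both would need the same kind of special-casing.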