Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe characters end up in documentation filenames. #699

Closed
SDGGiesbrecht opened this issue Dec 13, 2016 · 4 comments
Closed

Unsafe characters end up in documentation filenames. #699

SDGGiesbrecht opened this issue Dec 13, 2016 · 4 comments
Assignees

Comments

@SDGGiesbrecht
Copy link
Contributor

SDGGiesbrecht commented Dec 13, 2016

Swift function names allow (and require in the case of the colon) characters that are dangerous to use in filenames and URLs. Jazzy naïvely uses these function names when it assigns filenames to the HTML it outputs. This results in problems when one attempts to host the documentation.

For example, the following code is valid Swift:

public func /<T>(lhs: T, rhs: T) {}

But the resulting file is /(_:_:).html, which is a disaster waiting to happen.

Test it in no time by pasting the following terminal commands. (Make sure there isn’t some other “Project” folder in the current directory first!)

rm -r Project
mkdir Project
cd Project
printf 'import PackageDescription\n\nlet package = Package(\n    name: "MyPackage"\n)' > Package.swift
mkdir Sources
cd Sources
printf '/// Some documentation\npublic func /<T>(lhs: T, rhs: T) {}' > Source.swift
cd ..
swift package generate-xcodeproj
jazzy
open docs/index.html
cd ..

When the documentation opens, click on the link to the function in the top left... and see the ensuing chaos.

@jpsim
Copy link
Collaborator

jpsim commented Dec 13, 2016

There are a number of open issues tracking better handling of file/URL generation for invalid characters. Such as #146, #361, #547, #558. No PR we've yet received has quite handled this generically, cleanly and comprehensively enough to be accepted IMO.

@SDGGiesbrecht
Copy link
Contributor Author

Well, right now GitHub Pages fails to load any of my documentation unless I manually remove the offending files, leaving broken links in my wake. So I’m somewhat impatient to see it fixed.

I’ve read the other threads you mentioned @jpsim, and I have a possible solution with them all in mind. I’ll explain it here for posterity in a minute and then try my hand at a pull request, but I realize this one will probably require a lot of review.

@SDGGiesbrecht
Copy link
Contributor Author

Design Considerations:

1. Sanitized filenames should only accessible with a configuration option for now. Because...
(a) Any sanitization scheme inevitable reduces human readability, and that is sometimes undesirable.
(b) Many projects do not expose any functions that trigger problems. Where that is case, there is no sense in messing with human readability for no reason.
(c) Documentation that has been generated with an older version of Jazzy and then linked to elsewhere on the web (including the rest of the developer’s own website) cannot have its filenames changed without introducing broken links. Such a change should not be done behind the developer’s back; if the developer must opt in, the affects are more apparent.
The name of the configuration option in my pull request, “use_safe_filenames” was an arbitrary choice. Any name that is clear could be used instead.

2. Sanitization should be applied solely to path components generated from swift code. Manually defined path components such as custom categories should remain in the developer’s direct control.

3. An existing standard should be used, in order to efficiently deal with all related problems at once, including those still unnoticed. CGI.escape() is designed precicely for this purpose and it is conveniently located in the Ruby Standard Library.

4. The file name itself must be sanitized, not just the links that point to it. The issue is not just about referencing the file; the ability to save and host the documentation is at risk. It has been asked if this is a Windows‐only problem, but this is not the case. As noted in my earlier comment, GitHub Pages crashes when it tries to import my Jazzy‐generated documentation and yields a site‐wide 404. The example I have repeatedly used, /(_:_:), causes an unexpected file tree on Unix systems, and the resulting HTML file can no longer find any of its resources (i.e. its CSS stylesheet).

5. Mere percent encoding is insufficient. As @mbogh noticed, this breaks the links. Here is why:
If the file name would be /(_:_:).html, and percent encoding alone is applied,
(a) the filename becomes %2F%28_%3A_%3A%29.html
(b) and links point to %2F%28_%3A_%3A%29.html
When a link is clicked on, the server resolves the percent encoding, yielding /(_:_:).html again, which it attempts to find.
Because the filename is not /(_:_:).html, but %2F%28_%3A_%3A%29.html, the server fails to find it and returns a 404.
The solution I used in my pull request was to swap the percent signs out for underscores after performing the encoding. With the percent signs gone, the server will no longer try to process any encoding scheme, and the links just work as is: the link points to _2F_28__3A__3A_29.html and the server finds it in the same place. The underscore was an arbitrary choice; any safe character could be used instead.

6. Normalization form NFC sometimes reduces the number of offending characters in a string. A single composed character will be escaped instead of several combining characters in a row. This makes a minor improvement to the readability of the resulting filenames.

SDGGiesbrecht added a commit that referenced this issue Dec 14, 2016
Moving the branch for #699 into the main repository for easier testing.
@SDGGiesbrecht SDGGiesbrecht self-assigned this Dec 16, 2016
@SDGGiesbrecht
Copy link
Contributor Author

The configuration option use_safe_filenames and the corresponding flag --use-safe-filenames are now available. Until 0.7.4 is released, you can build and run from source (in your project directory):

git clone https://github.com/realm/jazzy
cd jazzy
bundle install
cd ..
jazzy/bin/jazzy --use-safe-filenames
rm -rf jazzy

Any other flags you normally use (such as --clean) can be added to the second last line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants