-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gateway renders pretty 404 pages if available #4233
Conversation
7115da7
to
5df5fee
Compare
Cool feature. cc @lgierth @diasdavid |
As for tests, you probably want to add something in core/corehttp/gateway_test.go, some of the tests in there construct directories, in particular TestIPNSHostnameRedirect creates a directory with an index.html to test that working properly. I imagine a test for this 404 feature would be quite similar. |
5df5fee
to
ec8b2c3
Compare
Thanks for the advice @whyrusleeping, hopefully this covers the new cases. Shout if there's anything else I can do to help! |
ec8b2c3
to
0ba9673
Compare
Thinking about this, what is the need for not setting a cache? Everything is content addressed, right? There should be no need not to cache. @dignifiedquire @victorbjelkholm I think you guys might be interested in this PR |
Choosing ETag for the cache is not simple in this case as the We would also have to resolve the |
@whyrusleeping I'm thinking of the case where I request a non-existant file from the gateway today; I don't want my browser's cache to be prevented from trying again, say, tomorrow when that DNSlink entry/peerID points to a new hash where that file now exists. Right now this PR ensures the gateway sets the ETag of a requested but missing file to be the Cid of the pretty 404 page that was found (as @Kubuxu suggests). Smart browsers can then include it in the Does this address your concern? |
What if the gateway fails to find the file, timeouts and returns a 404 with cache-control: unlimited? An existing hash->file can be cached forever and the corollary is that a non-existing hash->file will someday become an existing hash->file. |
If you load As both Etag and Cache-Control uses the URL, it has to be a redirect and if it redirects to Introducing IPNS would make things work differently though, and probably in that case it's best to dump the Cache-Control header. However, if IPFS isn't being used with IPNS, the 404 should be permanent.
In the case of |
No redirect for 404 page please. I need to know original url where 404 happen. Then i can redirect to right location by Javascript. I think about .htaccess for ipfs. |
There's no redirecting happening at the moment. I believe @victorbjelkholm was talking about internally, ie. the data is being delivered when a missing file is requested.
I agree — in this case we know that Further to this, in the case of Because of this I'm up for leaving the 404.html page without the immutable cache header, but leave the ETag in place, as it stands. It's extremely unlikely we'd ever need to change the |
I think doing this at the Gateway is the wrong place, because if I've understood this correctly, its indicating to the caller (which might be an application, not a browser) that its retrieved the page. Shouldnt this be part of the logic in whatever is calling the Gateway (browser) to decide what to do when the page isn't found. For example ... I can see situations where if the file can't be delivered by the Gateway, the app decides to go to somewhere else (such as an Archive) to try a different way to retrieve the same content. |
@mitra42 The gateway will still return a 404 status so the application can decide what to do with it. However, I believe this should be checking the client's Accept header. If the client doesn't want an HTML page, we shouldn't be returning an HTML 404 page. |
That's an excellent idea @Stebalien; only if the request header matches @mitra42 while I agree with your point in pure terms, my understanding of the Gateway's purpose is that it should bring some of the benefits of IPFS to browsers which don't have native IPFS support, which leads me to two counter arguments:
I hope that makes sense! |
d7846d7
to
81bfa5e
Compare
Agreed, |
81bfa5e
to
f278f0c
Compare
It might be worth noting that if a person wishing to use this feature wants their pretty 404 page's assets to render correctly without specifying absolute paths (eg. if they want the 404s to work with accessing the file from an IPFS or IPNS URL like <head>
<!-- things that don't refernce assets -->
<script>
const ipfsPath = /^\/ip[fn]s\/(.+?)\//.exec(window.location.pathname);
const basePath = (ipfsPath == null) ? '/' : ipfsPath[0];
document.write('<base href="' + basePath + '"/>');
</script>
<!-- things that reference assets, eg. CSS and JS -->
</head> |
@whyrusleeping is there anything further you'd like me to alter on this PR? |
This LGTM on a technical level. Now we need to get a signoff from the others on whether or not this is something we really want. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from those 2 style quirks, SGTM.
This should cover the changes you suggested @Stebalien — feedback is welcome! |
@@ -290,6 +295,10 @@ func (i *gatewayHandler) getOrHeadHandler(w http.ResponseWriter, r *http.Request | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, if a 404 page is present, we should return 404 here. But I'm fine leaving that edge-case unfixed.
In the same way that an `index.html` file is rendered, if one is present, when the requested path is a directory, now an `ipfs-404.html` file is rendered if the requested file is not present within the specified IPFS object. `ipfs-404.html` files are looked for in the directory of the requested path and each parent until one is found, falling back on the well-known 404 error message. License: MIT Signed-off-by: JP Hastings-Spital <jphastings@gmail.com>
6dec6e3
to
dfceafd
Compare
🥳 Thanks for your patience on this, this has been a long time coming. |
If I am not mistaken, this change implies a lookup for In core/corehttp/gateway_test.go, I think that the following tests are wrong: {"/deeper/", "text/html", http.StatusNotFound, "Deep custom 404"},
{"/deeper", "text/html", http.StatusNotFound, "Deep custom 404"}, These and |
That was intentional by me — the approach that web servers like NGinx take is |
That sounds rather unusual. By default directory indexes are disabled unless autoindex is changed. The error_page directive only affects the page shown on errors, it does not affect the routing logic (whether to display an index or not). Did you have a custom config such as With the current approach, there is no way to obtain a directory index if |
That's a good point; I guess it comes down to whether IPFS wants directory listings to always be possible to retrieve on the HTTP gateway. Given this is my first PR to IPFS I don't feel like I'm qualified to make such a judgement call (beyond what I'd personally like) — so maybe we can lean on @Stebalien? |
If directory listings are not desirable, one could add an explicit index.html file to hide the contents. There is probably no point in using this custom 404 page feature for hiding files since people can just query the directory directly, that is where IPFS differs from nginx. I am also new here and do not know if special files such as index.html and ipfs-404.html are documented anywhere, but that would be nice to have :-) |
Hm. Yeah, I didn't fully think through the ramifications of this. I actually have at least one case where I want to support custom 404 pages and directory listings (dist.ipfs.io). |
fixes #4233 (comment) Basically, there's a trade-off here: 1. We can support directory listings while supporting 404 pages (this PR). 2. If a 404 page is present, directory listings don't work. Given that option 1 is more flexible and users shouldn't be _too_ confused if they land on a directory with no index.html page, I've gone with that option.
I love that there is thinking around supporting 404 pages. But because it is a blog with pages in this folder format
it means I'd need to put a |
Specifying it once at the root should be enough. The gateway will search all parent directories up the path for an ipfs-404.html file. |
fixes ipfs#4233 (comment) Basically, there's a trade-off here: 1. We can support directory listings while supporting 404 pages (this PR). 2. If a 404 page is present, directory listings don't work. Given that option 1 is more flexible and users shouldn't be _too_ confused if they land on a directory with no index.html page, I've gone with that option.
fixes ipfs#4233 (comment) Basically, there's a trade-off here: 1. We can support directory listings while supporting 404 pages (this PR). 2. If a 404 page is present, directory listings don't work. Given that option 1 is more flexible and users shouldn't be _too_ confused if they land on a directory with no index.html page, I've gone with that option.
fixes ipfs#4233 (comment) Basically, there's a trade-off here: 1. We can support directory listings while supporting 404 pages (this PR). 2. If a 404 page is present, directory listings don't work. Given that option 1 is more flexible and users shouldn't be _too_ confused if they land on a directory with no index.html page, I've gone with that option.
fixes ipfs#4233 (comment) Basically, there's a trade-off here: 1. We can support directory listings while supporting 404 pages (this PR). 2. If a 404 page is present, directory listings don't work. Given that option 1 is more flexible and users shouldn't be _too_ confused if they land on a directory with no index.html page, I've gone with that option.
fixes ipfs#4233 (comment) Basically, there's a trade-off here: 1. We can support directory listings while supporting 404 pages (this PR). 2. If a 404 page is present, directory listings don't work. Given that option 1 is more flexible and users shouldn't be _too_ confused if they land on a directory with no index.html page, I've gone with that option.
fixes ipfs#4233 (comment) Basically, there's a trade-off here: 1. We can support directory listings while supporting 404 pages (this PR). 2. If a 404 page is present, directory listings don't work. Given that option 1 is more flexible and users shouldn't be _too_ confused if they land on a directory with no index.html page, I've gone with that option.
…-404 Gateway renders pretty 404 pages if available This commit was moved from ipfs/kubo@1744eb2
fixes ipfs/kubo#4233 (comment) Basically, there's a trade-off here: 1. We can support directory listings while supporting 404 pages (this PR). 2. If a 404 page is present, directory listings don't work. Given that option 1 is more flexible and users shouldn't be _too_ confused if they land on a directory with no index.html page, I've gone with that option. This commit was moved from ipfs/kubo@6a2fe0a
I noticed ipfs/ipfs#167 and thought this might be an elegant solution for allowing the gateway to show helpful information when requested files aren't found.
When a requested file isn't found
ipfs-404.html
is looked for in the same directory, rolling up through any of its parents, and displayed (without immutable cache headers) if present.Run this patch locally and query the following for a demonstration: