Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canonicalize domains and redirect in proxito #6905

Merged
merged 5 commits into from
Apr 24, 2020

Conversation

davidfischer
Copy link
Contributor

@davidfischer davidfischer commented Apr 14, 2020

  • Do HTTP -> HTTPS redirects if domain.https
  • Redirect public domain to canonical CNAME if domain.canonical
  • Redirect CNAME -> public domain (or another canonical domain) if the requested custom domain is not canonical
  • Set additional data on the X-RTD-Redirect header whether the redirect is to canonicalize, a system redirect, or a user redirect

If this is merged, #6882 can be closed as it won't be needed. This is an alternative that uses the same code from the resolver to perform the redirect to the appropriate domain.

Fixes: #4641

Concerns

I don't think the HTTP -> HTTPS redirect is a concern. Users who set domain.https will be happy with the change and users who didn't won't see a change.

The canonicalizing domain change is a little more worrying. Users who don't have domain.canonical set might be surprised their domain is now redirecting to slug.readthedocs.io/... (or the equivalent on readthedocs.com). Maybe we need a big note in the changelog about setting domain.canonical?

- Only redirect if the appropriate flag is set on the Domain
  (default is not to redirect)
- Use a 302 redirect for now. Ideally this is a 301 redirect,
  but redirecting other people's domains with a 301 is always a bit
  risky.
- Do HTTP -> HTTPS redirects if domain.https is set
- Redirect public domain to canonical CNAME if domain.canonical
- Redirect CNAME -> public domain if the custom domain is not canonical
- Set additional data on the X-RTD-Redirect header whether the redirect
  is to canonicalize, a system redirect, or a user redirect
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

I think that I also prefer this way since it follow a similar pattern that we already have.

Comment on lines +41 to +44
r = self.client.get('/', HTTP_HOST=self.domain.domain)
self.assertEqual(r.status_code, 302)
self.assertEqual(
r['Location'], f'https://{self.domain.domain}/en/latest/',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are doing only 1 redirect now, instead of 2. I think we were doing,

  1. http -> https
  2. https -> /en/latest/

Now we are doing http -> https /en/latest/ in one step. Is this correct?

Example,

$ curl --silent -IL http://docs.readthedocs.io/ | grep Location
Location: https://docs.readthedocs.io/
Location: https://docs.readthedocs.io/en/stable/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally there are two redirects. I actually think that for *.readthedocs.io there will be two since the http -> https redirect will be handled by nginx before it hits proxito. For custom domains that need to both change domain and scheme, this change does use a single redirect.

@@ -176,7 +176,26 @@ def system_redirect(self, request, final_project, lang_slug, version_slug, filen
)
log.info('System Redirect: host=%s, from=%s, to=%s', request.get_host(), filename, to)
resp = HttpResponseRedirect(to)
resp['X-RTD-System-Redirect'] = True
resp['X-RTD-Redirect'] = 'system'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@humitos
Copy link
Member

humitos commented Apr 15, 2020

Users who don't have domain.canonical set might be surprised their domain is now redirecting to slug.readthedocs.io/...

This is a tricky one. I'm not sure to understand if this is something wanted at all, though. Why would you create only one custom domain in RTD but expecting to keep serving the docs under .readthedocs.io?

I can see the use case of having multiple-domains for your site and wanted one to be the canonical, though.

Some quick numbers in .org

# Projects with 2 or more domains
In [5]: Project.objects.annotate(domains_count=Count('domains')).filter(domains_count__gte=2).count()                                        
Out[5]: 377

# Projects with 1 domain that it's not set as canonical
In [7]: Project.objects.annotate(domains_count=Count('domains')).filter(domains_count=1, domains__canonical=False).count()                                                                                                                   
Out[7]: 1131

# Projects with 1 domain set as canonical
In [8]: Project.objects.annotate(domains_count=Count('domains')).filter(domains_count=1, domains__canonical=True).count()                                                                                                                    
Out[8]: 1951

# Projects with 2 or more domains and at least 1 not set as canonical
In [10]: Project.objects.annotate(domains_count=Count('domains')).filter(domains_count__gte=2, domains__canonical=False).count()             
Out[10]: 374

# Projects with 2 or more domains and at least 1 set as canonical
In [11]: Project.objects.annotate(domains_count=Count('domains')).filter(domains_count__gte=2, domains__canonical=True).count()              
Out[11]: 278

@davidfischer
Copy link
Contributor Author

Why would you create only one custom domain in RTD but expecting to keep serving the docs under .readthedocs.io?

I don't know. Perhaps historical reasons? Regardless, I don't think we should serve the same docs from multiple domains. We currently do that and I'd like to get away from it if possible.

@ericholscher
Copy link
Member

I don't know. Perhaps historical reasons? Regardless, I don't think we should serve the same docs from multiple domains. We currently do that and I'd like to get away from it if possible.

I think we should comment this code out with a TODO & xfail the tests for it. It seems less valuable than the rest of this PR which seems 👍. We can address the canonical domain stuff later -- I think we could probably just only respect the canonical setting when they have more than 1 domain, otherwise the domain is automatically canonical, or something like that.

@davidfischer
Copy link
Contributor Author

I removed the redirect from custom domains -> public domain if the domain is not canonical for now. This leaves only:

  • HTTP -> HTTPS redirect if domain.https
  • public domain -> custom domain redirect if domain.canonical

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ericholscher
Copy link
Member

We should definitely plan on QA'ing this one

@ericholscher ericholscher merged commit fb01c6d into master Apr 24, 2020
@ericholscher ericholscher deleted the davidfischer/canonicalize-domain-redirect branch April 24, 2020 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support HTTPS redirect
3 participants