[IMP] website, website_event: add rel=canonical tag #35852
Great work! I'll leave the technical review to my colleague @yajo.
Very good, thanks!
I'd appreciate it if you could remove my mention (@yajo) from the commit. Just put Jairo Llopis, please. GitHub is spamming me on each clone/rebase/push/amend/etc. of that commit. 😅
Done, sorry for the spam. Hopefully it is better now.
Sweet!
What about the lang prefix in the end, Seb?
Part of the task was also to check the duplicate content question for pages such as ?sort=latest_blog, for example. Did you check it?
Runbot is red.
For lang, I'll check today; for runbot, hopefully my last force push will fix it.
All references I checked agree that listing pages (also called search pages) should be canonical per page, but not per filter or sort. The goal is to have only one listing indexed (no duplicates) while all of its pages are indexed (full content coverage). Some references:
This is how the current implementation will behave, given that we usually filter or sort via the query string and keep the pager in the path. In the specific context of our /blog route, it could be argued that the /tag route should in fact not be canonical, because it duplicates content from the main list. The same could be said for the multi-blog route and the specific blog route when multiple blogs are used.
Now you owe him a beer, @JKE-be! Perhaps at Odoo Experience 😉
@seb-odoo I know that this is merged and closed, but the reference from the dmoz expert is very old (2011). Currently, SEO experts say that page filtering and sorting create duplicate content and that a canonical URL doesn't solve the problem. See for example Olivier Andrieu's books and website (in French).
@zeroheure Do you have a direct link to an article on that website (or any other) about the sorting and filtering problem, and specifically about how or why canonical does not solve it? Also, we have not implemented it exactly like the OCA module, so the result might not be the same.
@seb-odoo There is not much freely available on abondance.com. Here is Google's recent advice about faceted navigation, where it is clear that canonical alone doesn't solve the problem.
After reading the last article, it seems that, apart from
We usually try to do those when appropriate, though there might still be room for improvement. They also suggest using parameters instead of the path for categories and other meaningful pages: apparently Google can then more easily guess what to crawl, but this is the opposite of what we do in our routes. Finally, for pagination, they recommend adding a canonical on all pages pointing to a "view all" page, but only if this is doable without impacting performance. So this cannot apply in our case, unless we add more complex logic based on the number of products/events/posts/... and only canonicalize pages when there are fewer items than an arbitrary threshold, so that the "view all" page is not slow.
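The "view all" idea above can be sketched as a simple threshold rule. This is a minimal, hypothetical illustration, not something implemented in this PR: the function `pagination_canonical`, the `/all` route and the `view_all_limit` threshold are all assumptions made for the example.

```python
def pagination_canonical(base_path, page, total_items, view_all_limit):
    """Return the canonical path for one page of a paginated listing.

    Hypothetical heuristic: if the whole listing is small enough to render
    cheaply on a single "view all" page, every page points to that page;
    otherwise each page is its own canonical.
    """
    if total_items <= view_all_limit:
        # Small listing: consolidate all pages onto a hypothetical /all route.
        return f"{base_path}/all"
    # Large listing: each page stays canonical to itself (page 1 is the base path).
    return base_path if page <= 1 else f"{base_path}/page/{page}"


if __name__ == "__main__":
    print(pagination_canonical("/blog", 3, 40, 50))   # small listing
    print(pagination_canonical("/blog", 3, 400, 50))  # large listing
```

The threshold is the "arbitrary number" mentioned above: it trades index consolidation against the cost of rendering every item on one page, which is why it was left out of the current implementation.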
Thank you for taking the time to read it and clarify the problem. So is your current implementation the best that can be done? And is the only remaining problem that we may have to add parameters in Google Search Console?
I would not say it's the best that can be done, but we are definitely going in the right direction. There are probably other links where we could add a "nofollow", but we will not have time to do that before version 13. The current compromise seems good enough: we have fixed the worst occurrences of exponentially growing links (such as in #30832 referenced above for blog, and in the current PR for slides), and added canonical tags to send the correct information for the remaining duplicate pages. As said, we could go further for search pagination by being a bit smarter when few items are listed, but this would come at the cost of more complexity and potential mistakes. Since our canonical implementation ignores all parameters by default, I don't think there is anything to do in Search Console regarding that. Feedback is always welcome after v13 to keep improving it where needed.
There's still a big problem then: They are no The canonical improvement would still save us from that problem in many cases, but the nofollow there would save a lot of time that crawlers spend combining filters.
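To illustrate why nofollow and form-based controls cut down on filter combinations, here is a deliberately simplified model of an href-only crawler built on Python's standard `html.parser`. It is an assumption-laden sketch, not a description of how Google actually crawls: it only treats `<a href>` as a link, ignores forms, selects and buttons entirely, and skips links marked `rel="nofollow"`. The names `LinkCollector` and `followed_links` and the sample URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the URLs a simple href-only crawler would follow.

    Simplified model: only <a href> counts as a link; <form>, <select> and
    <button> elements are ignored, as are links marked rel="nofollow".
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Valueless attributes come through as None, hence the "or".
        rel = (attrs.get("rel") or "").split()
        if tag == "a" and attrs.get("href") and "nofollow" not in rel:
            self.links.append(attrs["href"])


def followed_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    sample = """
    <a href="/shop/page/2">Next</a>
    <a href="/shop?order=price" rel="nofollow">Sort by price</a>
    <form action="/shop" method="get">
      <select name="order"><option value="name">Name</option></select>
      <button type="submit">Sort</button>
    </form>
    """
    print(followed_links(sample))  # ['/shop/page/2']
```

In this model, only the pager link is followed: the nofollow sort link and the form-based sort control never generate new URLs to crawl.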
Google only follows links that have an href; using form elements is enough to stop the search engine spider (another way is to use JS with
Follows odoo#35852. More useless crawls saved.
OK, indeed the logs I saw were not about attrs, they were about the order. I opened #37984 for that case.
Follows odoo#35852. More useless crawls saved. X-original-commit: 673a9e9
Follows odoo#35852. More useless crawls saved. X-original-commit: 42cbe36
The canonical tag is important for SEO: it prevents search engines from
indexing duplicate content.
Reasoning
The choice was made to generate the canonical tag automatically from the
request path, ignoring the query string, and to prefix it manually with the
appropriate domain and language code.
Creating it manually for each resource would require a lot of code and
introduce potential mistakes.
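The approach described above (path only, query string dropped, domain and language prefixed) can be sketched as follows. This is a hypothetical helper written for illustration, not Odoo's actual API: the name `canonical_url`, the `/<lang_code>` prefix convention for non-default languages, and the shape of the `canonical_params` dict are assumptions.

```python
from urllib.parse import urlencode, urlsplit


def canonical_url(domain, request_path, lang_code, default_lang, canonical_params=None):
    """Build a canonical URL from the request path, ignoring the query string.

    Hypothetical helper (not Odoo's actual API). Assumptions:
    - non-default languages are served under a "/<lang_code>" prefix;
    - the query string is dropped unless canonical_params is given explicitly.
    """
    # Keep only the path: "?sort=...", "?tag=..." and fragments are discarded.
    path = urlsplit(request_path).path or "/"
    if lang_code != default_lang:
        # Prefix the language code; the homepage becomes "/<lang_code>".
        path = "/" + lang_code + ("" if path == "/" else path)
    url = domain.rstrip("/") + path
    if canonical_params:
        # Explicit override, e.g. to keep date=old on a past events listing.
        url += "?" + urlencode(canonical_params)
    return url


if __name__ == "__main__":
    print(canonical_url("https://example.com", "/blog/page/2?sort=latest", "en_US", "en_US"))
    print(canonical_url("https://example.com", "/blog?tag=seo", "fr_BE", "en_US"))
```

With this shape, every sort or filter variant of a listing page collapses to one canonical per page, while the pager (kept in the path) still yields one canonical URL per page.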
The generic approach is riskier, but after investigation it appears to be an
acceptable trade-off, since the vast majority of our routes are well built and
already ready for this:
Override
It is still possible to override the default behavior by passing
`canonical_params` manually to the view or to the different methods. This is
done for `/event` because the only way to display Past Events is to add
`date=old`.
Languages
Fix an issue where a bot could be on a URL without a language code but still
be served a language other than the default one.
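One way to state the rule this fix aims for is that only an explicit URL prefix may select a non-default language. The sketch below is hypothetical and not the actual patch: `resolve_lang` and its parameters are assumptions, and it deliberately ignores the redirects that real visitors with a language cookie would normally get.

```python
def resolve_lang(path, installed_langs, default_lang):
    """Pick the language to render for a request path.

    Hypothetical rule: only an explicit URL prefix selects a non-default
    language. A URL without a prefix always renders the default language,
    whatever the visitor's cookie or Accept-Language header says, so a bot
    never indexes non-default content under an unprefixed URL.
    """
    first_segment = path.lstrip("/").split("/", 1)[0]
    if first_segment in installed_langs and first_segment != default_lang:
        return first_segment
    return default_lang


if __name__ == "__main__":
    langs = {"en_US", "fr_BE"}
    print(resolve_lang("/fr_BE/blog", langs, "en_US"))
    print(resolve_lang("/blog", langs, "en_US"))
```

Keeping the language a pure function of the URL is what makes the unprefixed canonical URL and its hreflang alternates consistent: each URL always serves exactly one language.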
Adapt hreflang, because it:
Misc
task-1958075
closes #12532
Inspired by OCA module `website_canonical_url`, courtesy of Jairo Llopis.
Co-authored-by: Jairo Llopis <jairo.llopis@tecnativa.com>
Co-authored-by: Sébastien Theys <seb@odoo.com>