Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IMP] website, website_event: add rel=canonical tag #35852

Closed
wants to merge 1 commit into from

Conversation

seb-odoo
Copy link
Contributor

@seb-odoo seb-odoo commented Aug 20, 2019

The canonical tag is important for SEO, indeed it prevents search engines from
indexing duplicate content.

Reasoning

The choice has been made to create the canonical tag automatically depending on
the request path, ignoring the query string, and manually prefixing the
appropriate domain and language code.

Indeed creating it manually for each resource would create a lot of code and
potential mistakes.

It is more dangerous to do it the generic way, but after investigation it
appears that it is an acceptable trade-off since the vast majority of our routes
are well built and already ready for this:

  • using query string only for minor features that do not change the main content
  • having the models, the ids, the pager and other important features in the path

Override

It is still possible to override the default behavior by passing
canonical_params manually to the view or to the different methods.

This is done for /event because the only way to display Past Events is to add
date=old.

Languages

Fix an issue where it was possible for a bot to be on the URL without language
code but to use a language that is not the default language.

Adapt hreflang, because it:

  • must only be present on canonical pages
  • must always lead to canonical pages
  • should not be set if there is no alternate language

Misc

task-1958075
closes #12532

Inspired by OCA module website_canonical_url courtesy of Jairo Llopis.

Co-authored-by: Jairo Llopis jairo.llopis@tecnativa.com
Co-authored-by: Sébastien Theys seb@odoo.com

@seb-odoo seb-odoo added RD research & development, internal work Website labels Aug 20, 2019
@seb-odoo seb-odoo requested a review from JKE-be August 20, 2019 12:22
@seb-odoo seb-odoo self-assigned this Aug 20, 2019
@seb-odoo seb-odoo changed the title [IMP] website: add rel=canonical tag [IMP] website, website_event: add rel=canonical tag Aug 20, 2019
@seb-odoo seb-odoo force-pushed the master-canonical-seb branch 3 times, most recently from c11fe59 to 1ca558c Compare August 20, 2019 14:43
@pedrobaeza
Copy link
Collaborator

Great work! I let technical review to my colleague @yajo

Copy link
Contributor

@yajo yajo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good, thanks!

@yajo
Copy link
Contributor

yajo commented Aug 21, 2019

I'll appreciate if you can remove my mention (@yajo) from the commit. Just put Jairo Llopis please. Github is spamming me on each clone/rebase/push/amend/etc. of that commit. 😅

@seb-odoo seb-odoo force-pushed the master-canonical-seb branch 2 times, most recently from 7d2295d to d4e6837 Compare August 21, 2019 09:12
@seb-odoo
Copy link
Contributor Author

Done, sorry for the spam. Hopefully it is better now.

@Yenthe666
Copy link
Collaborator

Sweet!

Copy link
Contributor

@JKE-be JKE-be left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats about lang prefix finally seb?

addons/website/tests/test_base_url.py Outdated Show resolved Hide resolved
@JKE-be
Copy link
Contributor

JKE-be commented Aug 21, 2019

@seb-odoo

Part of the task was also to check the question about duplicate content in case of ?sort=latest_blog page for example. Did you check it ?

Copy link
Contributor

@JKE-be JKE-be left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runbot red

@seb-odoo
Copy link
Contributor Author

For lang, I'll check today, for runbot hopefully my last force push will fix it

@robodoo robodoo added the CI 🤖 Robodoo has seen passing statuses label Aug 21, 2019
@robodoo robodoo removed the CI 🤖 Robodoo has seen passing statuses label Aug 21, 2019
@seb-odoo
Copy link
Contributor Author

All references I checked agree that listing pages (also called search pages) should be canonical per page but not per filter or sort.

The goal is only one listing indexed (no duplicate) but all its page are indexed (full content coverage).

Some references:

This is how the current implementation is going to work if we usually filter or sort by query string and have the pager in the path.

In the specific context of our /blog route, it could be argued that the route by /tag should in fact not be canonical because it is duplicate content from the main list.

Same could be said for the multi blog route and specific blog route when multiple blogs are used.

@robodoo robodoo added the CI 🤖 Robodoo has seen passing statuses label Aug 21, 2019
@robodoo robodoo removed the CI 🤖 Robodoo has seen passing statuses label Aug 21, 2019
@Yenthe666
Copy link
Collaborator

Now you own him a beer @JKE-be! Perhaps at Odoo Experience 😉

@seb-odoo seb-odoo deleted the master-canonical-seb branch September 4, 2019 08:52
@zeroheure
Copy link
Contributor

@seb-odoo I know that this is merged and closed but the reference from dmoz expert is very old (2011). Curently, SEO experts say that page filtering and sorting creates duplicate content and that canonical url doesn't solve the problem. See for example Olivier Andrieu's books and website (in french).
Further, I've experienced duplicate content on my Odoo website, while OCA's canonical_url module was setup. The only practical way to solve duplicate content was to disable sorting (I could have setup parameters in Google Search Console, but this was not easy to manage for my customer).

@seb-odoo
Copy link
Contributor Author

@zeroheure Do you have a direct link to an article on that website (or any other) about the sorting and filtering problem, and specifically how/why canonical is not solving it?

Also we've not implemented exactly like the OCA module, so the result might not be the same.

@zeroheure
Copy link
Contributor

zeroheure commented Sep 24, 2019

@seb-odoo There is not much freely available on abondance.com
Here's a recent video (french), Olivier Andrieu talks about "navigation à facette" (sort and filtering) at 4:08. It is basic but expose the problem. (I have also emails and a website audit from him that I can send if needed).
There is a lot more on the private part of his site.

Here's recent Google advices about faceted navigation, where it is clear that canonical alone doesn't solve the problem.

@seb-odoo
Copy link
Contributor Author

After reading the last article, it seems that, apart from rel="canonical", the two other main tips are:

We usually try to do those when appropriate, though there might still be room for improvement.

They also suggest to use parameters instead of using the path for categories and other meaningful pages: apparently they can guess more easily what to crawl then, but this is at the opposite of what we do in our routes.
But more importantly they also recommend to have meaningless parameters not in the path, which is what we do. Also more important parameters should be listed first.

Finally, for pagination, it is recommended to add canonical on all pages to a "view all" page, but only if it is doable, without impacting performance. So this cannot apply in our case, unless we add more complex logic based on the number of products/events/posts/... and only canonicalize pages if less than an arbitrary number so that the "view all" is not slow.

@zeroheure
Copy link
Contributor

Thankyou for taking time to read it and clarifying problem. So your current implementation is the best that can be done ? And the only remaining problem is that we may have to add parameters in Google Search Console.

@seb-odoo
Copy link
Contributor Author

I would not say it's the best that can be done, but we are going in the right direction for sure.

There are probably other links where we could add a "nofollow", but we will not have time to do that before version 13.
It is safer to not use them and use a bit more crawler time on duplicate links (that are now canonicalized!), instead of risking to break indexation of some pages because they can't be reached anymore.

The current compromise seems good enough, we have fixed the worse occurrences of exponentially growing links (such as in #30832 referenced above for blog, and in the current PR for slides), and added canonical to send the correct info for the remaining duplicate pages.

As said, we could go further for search pagination by being a bit smarter when there are few items listed, but this would be at the cost of more complexity and potential mistakes.

Since we ignore all parameters by default in our implementation of canonical, I don't think there is anything to do in the search console regarding that.

Feedback is always welcome post v13 to keep improving it where needed.

@yajo
Copy link
Contributor

yajo commented Oct 4, 2019

There's still a big problem then: website_sale attribute filters:

Captura de pantalla de 2019-10-04 09-36-00

They are no <a>, they are <input> or similar elements. How are we supposed to add a rel="nofollow" to them? Does it work in inputs as well?

The canonical improvement would still save us from that problem in many cases, but the nofollow there would save a lot of time that crawlers spend combining filters.

@zeroheure
Copy link
Contributor

zeroheure commented Oct 4, 2019

Google only follows links if there is an href, using form elements is enough to stop the search engine spider (another way is to use JS with document.location).
Please note that the nofollow attribute works for filters but doesn't work for normal pages : for example, adding a nofollow on a link toward the About us page has no effect on Google spider and Google index (see Google's I/O developer conference, May 2018 (video).

yajo added a commit to Tecnativa/odoo that referenced this pull request Oct 4, 2019
@yajo
Copy link
Contributor

yajo commented Oct 4, 2019

OK the logs I saw were not attrs indeed, they were the order. I opened #37984 for that case.

robodoo pushed a commit that referenced this pull request Oct 4, 2019
Follows #35852. More useless crawls saved.

closes #37984

Signed-off-by: Jérémy Kersten (jke) <jke@openerp.com>
fw-bot pushed a commit to odoo-dev/odoo that referenced this pull request Oct 4, 2019
Follows odoo#35852. More useless crawls saved.

X-original-commit: 673a9e9
nim-odoo pushed a commit to odoo-dev/odoo that referenced this pull request Oct 7, 2019
robodoo pushed a commit that referenced this pull request Oct 7, 2019
Follows #35852. More useless crawls saved.

closes #37990

Signed-off-by: Nicolas Martinelli (nim) <nim@odoo.com>
fw-bot pushed a commit to odoo-dev/odoo that referenced this pull request Oct 7, 2019
Follows odoo#35852. More useless crawls saved.

X-original-commit: 42cbe36
fw-bot pushed a commit to odoo-dev/odoo that referenced this pull request Oct 7, 2019
Follows odoo#35852. More useless crawls saved.

X-original-commit: 42cbe36
robodoo pushed a commit that referenced this pull request Oct 7, 2019
Follows #35852. More useless crawls saved.

closes #38106

X-original-commit: 42cbe36
Signed-off-by: Jérémy Kersten (jke) <jke@openerp.com>
robodoo pushed a commit that referenced this pull request Oct 7, 2019
Follows #35852. More useless crawls saved.

closes #38114

X-original-commit: 42cbe36
Signed-off-by: Jérémy Kersten (jke) <jke@openerp.com>
robodoo pushed a commit that referenced this pull request Oct 8, 2019
Follows #35852. More useless crawls saved.

closes #38114

X-original-commit: 42cbe36
Signed-off-by: Jérémy Kersten (jke) <jke@openerp.com>
robodoo pushed a commit that referenced this pull request Oct 8, 2019
Follows #35852. More useless crawls saved.

closes #37989

X-original-commit: 673a9e9
Signed-off-by: Nicolas Martinelli (nim) <nim@odoo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI 🤖 Robodoo has seen passing statuses RD research & development, internal work Website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants