Add canonical link header #963
HTML snapshot should point at the canonical URL. This improves semantics and helps search engines deduplicate relevant results when multiple snapshots are published on the web.
Closes openzim#564
License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
@lidel Thank you for this PR; it will be reviewed within the next few days.
Codecov Report
@@            Coverage Diff             @@
##           master     #963      +/-   ##
==========================================
+ Coverage   62.24%   62.29%   +0.04%
==========================================
  Files          20       20
  Lines        1682     1684       +2
  Branches      340      340
==========================================
+ Hits         1047     1049       +2
  Misses        465      465
  Partials      170      170
@lidel, articles in ZIMs made by mwoffliner currently have a link in their footer to the specific revision of the text from which the snapshot was made. For example, it looks like this:
This leads to a page like the one in the screenshot below. Shouldn't your canonical link also contain the revision ID?
@Jaifroid Good question! I believe the semantic meaning of
My understanding is that in the context of a wiki page the "preferred version" is the latest version, and that is what this PR links to.
OK @lidel, thanks for the explanation.
Looks great, thanks @lidel
Without this, search engines crawling the unpacked version may produce duplicate results. It does not cost much to have, and it avoids issues like the ones linked below.
Ref. openzim/mwoffliner#963
ipfs/distributed-wikipedia-mirror#65
https://en.wikipedia.org/wiki/Canonical_link_element
Fix #564
Change
This PR adds the canonical URL in the form of an HTML <link> tag in the header of each page.
Example
webUrl=https://en.wikipedia.org/wiki/ and articleId=Mew (Pokémon) (note the é) produces:
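A minimal sketch of how such a tag could be built from these two values, not necessarily how this PR does it. The helper names buildCanonicalLink and escapeHtmlAttr are hypothetical, and the sketch assumes MediaWiki-style URLs (spaces become underscores) and percent-encoding of the title with the standard encodeURIComponent:

```typescript
// Hypothetical helper: escape a string for use inside a double-quoted HTML attribute.
function escapeHtmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: build a canonical <link> tag for an article snapshot.
function buildCanonicalLink(webUrl: string, articleId: string): string {
  // Assumption: MediaWiki-style URLs use underscores in place of spaces.
  const title = articleId.replace(/ /g, "_");
  // encodeURIComponent percent-encodes non-ASCII characters as UTF-8
  // (é becomes %C3%A9) and leaves "(", ")" and "_" untouched.
  const href = webUrl + encodeURIComponent(title);
  return `<link rel="canonical" href="${escapeHtmlAttr(href)}" />`;
}

// "\u00e9" is the precomposed é, written as an escape to avoid ambiguity.
console.log(buildCanonicalLink("https://en.wikipedia.org/wiki/", "Mew (Pok\u00e9mon)"));
// prints: <link rel="canonical" href="https://en.wikipedia.org/wiki/Mew_(Pok%C3%A9mon)" />
```

Note that encodeURIComponent also encodes "/", so titles containing subpage slashes would need different handling under these assumptions.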
Motivation
As noted in #564, the HTML snapshot should point at the canonical URL.
This improves semantics and helps search engines deduplicate relevant
results when multiple snapshots are published on the web.
Closes #564 cc @kelson42, ipfs/distributed-wikipedia-mirror#48