Ensure that all languages are indexable by crawlers #699

Closed
asmecher opened this issue Aug 24, 2015 · 47 comments
Labels: Enhancement:3:Major (a new feature or improvement that will take a month or more to complete)

@asmecher
Member

asmecher commented Aug 24, 2015

Describe the problem you would like to solve
Users can switch languages while reading a multilingual journal. However, the currently active language is stored in a cookie on the user's device rather than in the URL. As a result, a search engine crawler cannot index content in languages other than the journal's primary language.

Describe the solution you'd like
No consensus has been reached on a proposed solution.

Who is asking for this feature?
Multilingual journals that want to be indexed by Google (not Google Scholar).

Additional information
See http://forum.pkp.sfu.ca/t/keep-ui-archivable-by-heritrix-web-crawler/3207/6 for details.

@asmecher added this to the OJS 3.0 milestone Aug 24, 2015
@NateWr
Contributor

NateWr commented Sep 3, 2015

I've switched the language switcher to links in my UI branch of OMP. I'll update it for OJS too, but as you discussed in email, we still need to propagate the language via the URL.

@NateWr self-assigned this Sep 3, 2015
@NateWr
Contributor

NateWr commented Jul 26, 2016

OJS uses links for the language switcher (I can't remember when this was implemented). But I think the issue of propagating the language within the URL is more in your wheelhouse. Assigning back to you unless you'd like me to look further.

@NateWr assigned asmecher and unassigned NateWr Jul 26, 2016
@asmecher
Member Author

Sure, I'll take a look.

@asmecher modified the milestones: OJS 3.0.1, OJS 3.0 Aug 5, 2016
@asmecher
Member Author

asmecher commented Aug 5, 2016

Hmm. I don't like constructs that only show up for search engines, e.g. via user agent detection. So we're left with adding the language to system URLs in the general case, which I'm hesitant to impose on single-language journals. Adding this as an optional mode could provide flexibility for both types of users, but switching between modes might be catastrophic, as all URLs would change. We could potentially have the URL generation code add a URL parameter for the language, which would allow interoperability between the two modes -- but this would need to behave well with e.g. POST forms and JavaScript, which might not expect URL parameters to suddenly be included. Deferring pending more consideration.
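A minimal sketch of the optional URL parameter idea, assuming all URL generation goes through a single helper. The function name `buildLocalizedUrl` and its parameters are hypothetical, not part of OJS or pkp-lib; the sketch only shows how a `locale` query parameter could be added for non-primary locales while leaving primary-language URLs untouched:

```php
<?php
// Hypothetical helper (not part of OJS/pkp-lib): appends a `locale` query
// parameter only for non-primary locales, so primary-language URLs stay
// exactly as they are today. URL fragments ("#...") are not handled here.
function buildLocalizedUrl(string $baseUrl, string $locale, string $primaryLocale): string
{
    if ($locale === $primaryLocale) {
        return $baseUrl; // primary-language URLs are unchanged
    }
    // Append with "&" if the URL already has a query string, otherwise "?".
    $separator = strpos($baseUrl, '?') === false ? '?' : '&';
    return $baseUrl . $separator . http_build_query(['locale' => $locale]);
}

echo buildLocalizedUrl('https://example.org/index.php/journal/article/view/42', 'uk_UA', 'en_US'), "\n";
// prints https://example.org/index.php/journal/article/view/42?locale=uk_UA
```

Forms and JavaScript that assemble URLs themselves would bypass such a helper and drop the parameter, which is the interoperability concern raised in the comment.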

@asmecher modified the milestones: OJS 3.0.2, OJS 3.0.1 Oct 14, 2016
@asmecher modified the milestones: OMP 3.1, OJS 3.0.2, OJS 3.1 Jan 27, 2017
@asmecher removed this from the OJS 3.1 milestone May 25, 2017
@Vitaliy-1
Collaborator

Greetings @asmecher
How can I add the language to the URL on the article detail page?
For example, my aim is to add an additional parameter only for the non-primary locale (Ukrainian in our case). The problem is that there is no other way for Google to index it...

I already have some experience in PHP and Java EE, so I hope I can manage this problem if you guide me. Where should I start?

@asmecher
Member Author

Hi @Vitaliy-1 -- the code for this is pretty much constrained to pages/article/ArticleHandler.inc.php. PATH_INFO URL components come in via the $args parameter to each function. Have a look there and see if it makes sense -- let me know if you get stuck somewhere specific.
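To illustrate where the `$args` values come from, here is a simplified sketch that splits a PATH_INFO string into context, page, operation and remaining arguments. It assumes the `context/page/op/args...` layout seen in the URLs in this thread; `splitPathInfo` is a hypothetical helper, not the actual `PKPPageRouter` logic:

```php
<?php
// Simplified, hypothetical sketch: break a PATH_INFO string such as
// "/publicknowledge/article/view/smecher17/pdf" into its components,
// assuming a context/page/op/args layout. Not the real PKPPageRouter code.
function splitPathInfo(string $pathInfo): array
{
    // Drop empty segments caused by leading, trailing or doubled slashes.
    $parts = array_values(array_filter(explode('/', $pathInfo), 'strlen'));
    return [
        'context' => $parts[0] ?? 'index',
        'page'    => $parts[1] ?? 'index',
        'op'      => $parts[2] ?? 'index',
        'args'    => array_slice($parts, 3), // what a handler would see as $args
    ];
}

print_r(splitPathInfo('/publicknowledge/article/view/smecher17/pdf'));
```

With this layout, `$args` for `article/view` is `['smecher17', 'pdf']`, so inserting a locale segment into the path would shift every argument by one position.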

@Vitaliy-1
Collaborator

Vitaliy-1 commented Jul 13, 2017

Thanks for the reply, @asmecher.

Hmm, $args is an array that, from my point of view, contains only the article ID. $request is a Request object from which I can, for example, retrieve the URL or redirect the request, but not change the URL itself. PATH_INFO can be seen in the $_SERVER array. I don't see a way to modify the URL here; I must be missing something...

Can you show me an example of URL mapping?

I know that the view function (a method of this Handler class) is crucial for displaying the article landing page; it is responsible for the view part of the URL. How can I change it from view/ to view/uk/? Or would it be better to work with the last part of the URL, the article ID? Where does the latter actually come from? I thought it came from the articleId variable, but changing it has no effect...

So, I am thinking about something like:

```php
$currentLocale = AppLocale::getLocale();
$defaultLocale = AppLocale::getPrimaryLocale();
if ($currentLocale != $defaultLocale) {
    // Language part of the locale, e.g. "uk" from "uk_UA" (for view/uk/)
    $addToUrl = substr($currentLocale, 0, 2);
    // TODO: add $addToUrl to the URL
}
```

Maybe I should just create a new page with this URL pattern and redirect to it? In Java it is possible to map one servlet to several URL patterns, so I am confused.

@Vitaliy-1
Collaborator

Vitaliy-1 commented Jul 17, 2017

Hi again, @asmecher

It's not easy to read and understand other people's code without much programming experience, and I know you don't have much time to help others write code.

After browsing the classes, I found the PKPPageRouter class and its route method: https://github.com/pkp/pkp-lib/blob/master/classes/core/PKPPageRouter.inc.php#L146
I suppose it takes the URL entered by the user and maps it to a specific OJS file.
Inside it there is a hook called LoadHandler, which passes three variables: $page and $op seem to represent parameters from the URL, and $sourceFile represents the path to the Smarty template (I hope).

I have created a mockup of a plugin to handle this hook: https://github.com/Vitaliy-1/localeRedirect/blob/master/LocaleRedirectPlugin.inc.php

Can you confirm that I am on the right path? Or would you not use this hook for the task described earlier?

@Vitaliy-1
Collaborator

Vitaliy-1 commented Jul 17, 2017

Another approach I found is to modify the initialize function in the ArticleHandler class. Here is a quick example of what I'm planning to work with:

```php
function initialize($request, $args) {
    if (isset($args[0]) && $args[0] == "uk_UA") {
        // Locale prefix present: article and galley IDs shift by one position
        $articleId = isset($args[1]) ? $args[1] : 0;
        $galleyId = isset($args[2]) ? $args[2] : 0;
        $request->getSession()->setSessionVar("currentLocale", "uk_UA");
    } else {
        $articleId = isset($args[0]) ? $args[0] : 0;
        $galleyId = isset($args[1]) ? $args[1] : 0;
    }
    // original code here

    return $request;
}
```
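One way to make the locale detection in `initialize` less fragile, sketched under assumptions: treat the first argument as a locale only when it exactly matches one of the journal's supported locales, instead of hard-coding `uk_UA`. `parseArticleArgs` and `$supportedLocales` are hypothetical names:

```php
<?php
// Hypothetical sketch: consume a leading locale argument only if it is one of
// the supported locales; otherwise keep the original argument positions.
function parseArticleArgs(array $args, array $supportedLocales): array
{
    $locale = null;
    if (isset($args[0]) && in_array($args[0], $supportedLocales, true)) {
        $locale = array_shift($args); // remove the locale segment and reindex
    }
    return [
        'locale'    => $locale,
        'articleId' => $args[0] ?? 0,
        'galleyId'  => $args[1] ?? 0,
    ];
}

print_r(parseArticleArgs(['uk_UA', 'smecher17', 'pdf'], ['en_US', 'uk_UA']));
```

This narrows the ambiguity but does not remove it: an article whose URL path happens to equal a supported locale code would still be misread.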

So the question remains: which approach is better, in your opinion? Or neither? And will Google actually see that page for the selected locale?

@asmecher
Member Author

@Vitaliy-1, my worry is about ambiguity in URLs. If I'm reading correctly, this would result in "equivalent" URLs like...

  • .../article/view/uk_UA/smecher17/pdf
  • .../article/view/smecher17/pdf

However, that last one could be read two ways: as a galley view with article ID "smecher17" and galley ID "pdf", or as an article view with locale "smecher17" and article ID "pdf". We can code around it here, but there will be lots of knock-on complications, e.g. in parsing URLs for statistics calculations from the log files.

I think it's definitely necessary to...

  • have unambiguous URLs (i.e. URLs shouldn't be parseable in two conflicting ways)
  • have backwards compatibility with existing URLs

What about using an optional URL parameter, e.g. .../article/view/smecher17/pdf?locale=uk_UA? It's not as pretty as your proposal, but it isn't ambiguous, and it should be clear to readers how it'll behave. To facilitate indexing, I think the only additional thing needed is better linking to the different-language versions, in the front end and probably also in the meta content.
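A minimal sketch of the `?locale=` proposal, assuming the locale is read from the query string and validated against a list of supported locales. `resolveRequestLocale` is a hypothetical name; the point is that the path keeps its existing meaning, so old URLs parse exactly as before:

```php
<?php
// Hypothetical helper: the path is parsed as it always was; the locale comes
// only from an optional ?locale= query parameter and must be supported.
function resolveRequestLocale(array $query, array $supportedLocales, string $defaultLocale): string
{
    $requested = $query['locale'] ?? null;
    // Ignore missing, non-string (e.g. locale[]=x) or unsupported values.
    if (is_string($requested) && in_array($requested, $supportedLocales, true)) {
        return $requested;
    }
    return $defaultLocale;
}

$url = 'https://example.org/index.php/journal/article/view/smecher17/pdf?locale=uk_UA';
parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
echo resolveRequestLocale($query, ['en_US', 'uk_UA'], 'en_US'), "\n"; // uk_UA
```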

@Vitaliy-1
Collaborator

Greetings @asmecher

While writing the code I encountered a problem with the language toggle. Here is an example of changing the locale:

```php
$_SESSION["currentLocale"] = "en_US";
// or
$request->getSession()->setSessionVar("currentLocale", "en_US");
```

The lines above change the displayed locale text only on a subsequent request (although the session locale changes immediately). The only way I found to make it take effect is:

```php
$request->redirectUrl(...);
```

Is there a cleaner way?

@Vitaliy-1
Collaborator

Ah, the problem can be handled by assigning the values inside the constructor of the SessionManager class. So session values apparently can't be changed once they have been assigned, right?

@asmecher
Member Author

@Vitaliy-1, rather than working via session parameters, I'd suggest adding a facility to the AppLocale class that permits setting the locale, rather than just getting it. This would involve moving the $currentLocale variable there out into the class, and adding a new setLocale function.
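A simplified sketch of the suggested setter, assuming the current locale lives in a class-level static property instead of a variable local to the getter. `LocaleState` is a hypothetical stand-in, not the real `AppLocale` class, and the hard-coded locale lists are placeholders:

```php
<?php
// Hypothetical stand-in for the suggested AppLocale change: the current
// locale is a class-level property, so it can be set as well as read.
class LocaleState
{
    private static ?string $currentLocale = null;
    private static array $supportedLocales = ['en_US', 'uk_UA']; // placeholder
    private static string $primaryLocale = 'en_US';              // placeholder

    public static function getLocale(): string
    {
        // Fall back to the primary locale until something sets it explicitly.
        return self::$currentLocale ?? self::$primaryLocale;
    }

    public static function setLocale(string $locale): void
    {
        if (!in_array($locale, self::$supportedLocales, true)) {
            throw new InvalidArgumentException("Unsupported locale: $locale");
        }
        self::$currentLocale = $locale; // visible to later getLocale() calls
    }
}

LocaleState::setLocale('uk_UA');
echo LocaleState::getLocale(), "\n"; // uk_UA
```

Locale data already loaded earlier in the same request is not refreshed by this sketch; that would need separate handling.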

@Vitaliy-1
Collaborator

Thanks for the guidance, @asmecher.

There is another problem after applying the modifications you suggested.
Locale strings from plugins don't change immediately after calling the setLocale method; they need a session refresh. The core locale, however, updates accordingly.

My AppLocale class: https://github.com/Vitaliy-1/AppLocale/blob/master/AppLocale.inc.php

This is how I call the setLocale method from a plugin: https://github.com/Vitaliy-1/localeRedirect/blob/master/LocaleRedirectPlugin.inc.php#L41

@Vitaliy-1
Collaborator

Hi @asmecher

I have managed to make a separate URL for the non-primary locale. After looking over several options and reading Google's guidelines on multilingual sites, I picked the variant with a separate subdomain. It doesn't conflict with the main code, and OJS picks up requests to subdomains without any need to configure them in the Apache configuration files; only the subdomain needs to be registered. I have checked it on the production system, and it works fine with both existing and new user sessions. One problem was making the switcher work on the admin dashboard side, because the standard tools for routing to the current location weren't working in usernav.tpl (as it is not actually a page), but I managed it with HTTP_REFERER and a bit of regex.
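The subdomain variant can be sketched as a lookup from the first host label to a locale. The mapping table and `localeFromHost` are hypothetical; in practice the host would come from the request (e.g. `$_SERVER['HTTP_HOST']`):

```php
<?php
// Hypothetical sketch: map the first host label (e.g. "uk" in
// "uk.journal.example.org") to a locale, falling back to the primary locale.
function localeFromHost(string $host, array $subdomainLocales, string $primaryLocale): string
{
    $host = strtolower(preg_replace('/:\d+$/', '', $host)); // drop any port
    $firstLabel = explode('.', $host)[0];
    return $subdomainLocales[$firstLabel] ?? $primaryLocale;
}

echo localeFromHost('uk.journal.example.org', ['uk' => 'uk_UA'], 'en_US'), "\n"; // uk_UA
echo localeFromHost('journal.example.org', ['uk' => 'uk_UA'], 'en_US'), "\n";    // en_US
```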

But I wasn't able to code an appropriate setter for the AppLocale class, so I made the modification in the SessionManager class instead, setting the currentLocale variable for the user session depending on the presence of the subdomain in the URL.

Would this sort of plugin be useful for public use? If so, how should I implement a setter for changing languages?

@ajnyga
Collaborator

ajnyga commented Apr 18, 2024

The old URLs still work, and they are (and should be) the ones used when article metadata is exported elsewhere, for example to Crossref and DOAJ, or shown in OAI-PMH, because of course we cannot know how the journal will change its settings. RSS feeds are probably different in this regard, as Bozana suggests.

@bozana
Collaborator

bozana commented Apr 22, 2024

Hi @jonasraoni, as Antti-Jussi said, the old links will work.
However, the new WebFeed URLs will contain the UI language, as in the example above.
According to the https://www.rssboard.org/rss-language-codes:

The language employed in an RSS feed can be indicated in the language element,...

The language element should then also contain the UI language, in ISO 639-1 format.
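A small sketch of deriving the `<language>` value from a locale identifier, assuming the locale begins with a two-letter ISO 639-1 code (locales whose language code is longer would need a different mapping). `rssLanguageCode` is a hypothetical helper:

```php
<?php
// Hypothetical helper: reduce a locale identifier such as "uk_UA" to the
// language code for the RSS <language> element. Assumes the locale starts
// with a two-letter ISO 639-1 code.
function rssLanguageCode(string $locale): string
{
    // Keep the part before any region ("_", "-") or variant ("@") separator.
    $parts = preg_split('/[_@-]/', $locale);
    return strtolower($parts[0]);
}

echo '<language>' . htmlspecialchars(rssLanguageCode('uk_UA')) . "</language>\n"; // <language>uk</language>
```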

EDIT: The issue that should address this: #9910

@bozana
Collaborator

bozana commented Apr 24, 2024

Hi @jyhein,
I took a look into the code once again and it looks good. Just that OMP and OPS are missing one change -- I left a comment in the PRs.
Regarding ORCID: because it is currently being moved into the core, could you only provide the links to your changes in this issue, so that @ewhanson can consider them there: #9771. Otherwise, you do not need to link to them in your PRs here.
Then you can rebase everything (including the plugin submodules), create PRs for the plugin submodules/repositories (and link to those PRs in this issue above), and include all submodule updates (pkp, but also every plugin submodule) in the last commit. Then, when the tests pass, we can merge... :-)
Thanks a lot!

@bozana
Collaborator

bozana commented Apr 25, 2024

Hi @jyhein (and maybe @ajnyga), what about the sitemap -- does it need to contain all languages? See https://developers.google.com/search/docs/specialty/international/localized-versions.

@ajnyga
Collaborator

ajnyga commented Apr 25, 2024

My thinking was that the sitemap would point to the primary language (via the link without the language code), and each page would provide further information for search engines in the page header.

But of course adding those links to the sitemap would be doable. It would just lead to a massive sitemap in some cases.
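The per-page hints described here follow the pattern in Google's localized-versions guidance: one `<link rel="alternate" hreflang="...">` tag per language version, plus an optional `x-default`. A sketch, assuming locale identifiers use an underscore (`uk_UA`) that must become a hyphen in `hreflang`; the function name and example URLs are made up:

```php
<?php
// Hypothetical sketch: emit one alternate link per language version of a page,
// plus an x-default entry. URLs and locale keys are illustrative assumptions.
function hreflangLinks(array $urlsByLocale, string $defaultUrl): string
{
    $tags = [];
    foreach ($urlsByLocale as $locale => $url) {
        // hreflang uses hyphenated tags such as "uk-UA", not "uk_UA".
        $hreflang = str_replace('_', '-', $locale);
        $tags[] = sprintf('<link rel="alternate" hreflang="%s" href="%s">',
            htmlspecialchars($hreflang, ENT_QUOTES), htmlspecialchars($url, ENT_QUOTES));
    }
    $tags[] = sprintf('<link rel="alternate" hreflang="x-default" href="%s">',
        htmlspecialchars($defaultUrl, ENT_QUOTES));
    return implode("\n", $tags);
}

echo hreflangLinks([
    'en_US' => 'https://example.org/index.php/journal/en/article/view/42',
    'uk_UA' => 'https://example.org/index.php/journal/uk/article/view/42',
], 'https://example.org/index.php/journal/article/view/42'), "\n";
```

Every language version of the page would emit the same set of tags, including a link to itself.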

@bozana
Collaborator

bozana commented Apr 25, 2024

Yes, let's leave it as it is for now... Also, as @jyhein said, it seems only one of the three methods listed on that Google page needs to be supported...
Thanks a lot!

bozana added a commit to pkp/crossref-ojs that referenced this issue Apr 25, 2024
pkp/pkp-lib#699 Show locale in url in multilingual contexts
bozana added a commit to pkp/crossref-ops that referenced this issue Apr 25, 2024
pkp/pkp-lib#699 Show locale in url in multilingual contexts
bozana added a commit to pkp/citationStyleLanguage that referenced this issue Apr 25, 2024
pkp/pkp-lib#699 Show locale in url in multilingual contexts
bozana added a commit to pkp/googleScholar that referenced this issue Apr 25, 2024
pkp/pkp-lib#699 Show locale in url in multilingual contexts
bozana added a commit that referenced this issue Apr 25, 2024
#699 Show locale in url in multilingual contexts
bozana added a commit to pkp/ojs that referenced this issue Apr 25, 2024
pkp/pkp-lib#699 Show locale in url in multilingual contexts
bozana added a commit to pkp/omp that referenced this issue Apr 25, 2024
pkp/pkp-lib#699 Show locale in url in multilingual contexts
bozana added a commit to pkp/ops that referenced this issue Apr 25, 2024
pkp/pkp-lib#699 Show locale in url in multilingual contexts
@bozana
Collaborator

bozana commented Apr 26, 2024

All merged, thanks a lot!

@bozana added this to the 3.5.0 LTS milestone Apr 26, 2024
@bozana closed this as completed Apr 26, 2024
@asmecher
Member Author

@bozana / @jyhein, I'm re-opening this because it breaks my installation (specifically #9628). My local OJS is installed to http://localhost/git/ojs-main, and a typical URL into OJS is http://localhost/git/ojs-main/index.php/publicknowledge/article/view/mwandenga-signalling-theory.

With the PR applied, the path gets mixed into the path_info data. Going to http://localhost/git/ojs-main redirects me to http://localhost/git/ojs-main/index.php/git/ojs-main, and going to http://localhost/git/ojs-main/index.php/publicknowledge/article/view/mwandenga-signalling-theory redirects me to http://localhost/git/ojs-main/index.php/git/ojs-main/publicknowledge/article/view/mwandenga-signalling-theory.

The /git/ojs-main part after the index.php should not be there -- it's the installation directory and is already there before the index.php wrapper.

Can you test with the case where OJS is not installed in the server's root directory?
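The invariant in this report can be sketched as: strip the script path (installation directory plus `index.php`), or just the installation directory when `index.php` is absent, from the request path before treating the remainder as path info. This is a hypothetical illustration using the URLs in the report, not the fix that was merged:

```php
<?php
// Hypothetical sketch: remove $prefix from $path only at a segment boundary.
// Returns null when $path does not start with $prefix.
function stripPathPrefix(string $path, string $prefix): ?string
{
    if ($prefix === '') {
        return null;
    }
    if ($path === $prefix) {
        return '';
    }
    if (strpos($path, $prefix . '/') === 0) {
        return substr($path, strlen($prefix));
    }
    return null;
}

// The installation directory must never end up in the derived path info.
function derivePathInfo(string $requestUri, string $scriptName): string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        $path = '/';
    }
    $scriptDir = rtrim(dirname($scriptName), '/'); // "" for a root install
    $rest = stripPathPrefix($path, $scriptName)
        ?? stripPathPrefix($path, $scriptDir)
        ?? $path;
    return $rest === '' ? '/' : $rest;
}

echo derivePathInfo('/git/ojs-main/index.php/publicknowledge/article/view/mwandenga-signalling-theory',
    '/git/ojs-main/index.php'), "\n"; // /publicknowledge/article/view/mwandenga-signalling-theory
```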

@asmecher reopened this Apr 29, 2024
@bozana
Collaborator

bozana commented May 2, 2024

@asmecher, I have just merged the fix that @jyhein provided, so your installation should work correctly with the new code... :-)

@bozana closed this as completed May 2, 2024
@asmecher
Member Author

asmecher commented May 2, 2024

That works -- thanks, @jyhein and @bozana!


10 participants