[I18N] References to ISO-639 vs. BCP47 #959
Comments
To me, this seems to be a comment addressed to DCMI. DCAT just reuses/imports the property |
I find this quite puzzling actually and went below my radar while I should have perhaps flagged it earlier... Some comments that could have an impact on discussion here:
@aisaac If I remember correctly, the GLD WG thought it would be good to recommend one particular set of URIs for |
@makxdekkers thanks for the clarification. GLD is not DCMI indeed. I can agree with the fact that LoC's URIs are probably the best around for ISO codes, but am still a bit unsure about making a strong recommendation for something that would capture only a part of the multilinguality needs (especially, no subtags). We could still leave the door open to BCP47. Note that's also a problem we have for Europeana: we would like to normalize languages to URIs, and we were looking especially at the NAL from the European Commission (for example http://publications.europa.eu/resource/authority/language/FRA). But that would have implied losing all the subtags (which is something we've realized only quite recently, in fact, see https://europeana.atlassian.net/browse/RD-64), so for the moment we normalize to ISO639-3 and keep subtags according to BCP47 (for the moment just hoping that the subtags are correct, which is a bit optimistic but we plan to check that some day too). |
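The normalization described in the comment above (primary language mapped to ISO 639-3, subtags kept as given) can be sketched roughly as follows. This is a minimal illustration, not Europeana's actual code: the function name `normalize_tag` and the small `ISO639_1_TO_3` table are assumptions made for the example, and a real system would load the complete ISO 639 mapping. Note also that the result is not necessarily a canonical BCP47 tag, since BCP47 uses the two-letter code where one exists.

```python
# Hypothetical, deliberately tiny lookup table (assumption for this sketch).
# A real implementation would load the full ISO 639-1 to 639-3 mapping.
ISO639_1_TO_3 = {"fr": "fra", "en": "eng", "de": "deu", "zh": "zho"}


def normalize_tag(tag):
    """Map the primary language subtag to ISO 639-3; keep other subtags untouched."""
    subtags = tag.strip().split("-")
    primary = subtags[0].lower()
    rest = subtags[1:]  # script, region, variant... kept as given, not validated
    if len(primary) == 2:
        # Unknown two-letter codes are passed through unchanged.
        primary = ISO639_1_TO_3.get(primary, primary)
    return "-".join([primary] + rest)


print(normalize_tag("fr-CA"))
print(normalize_tag("zh-Hant-TW"))
```

As in the comment, the subtags are only carried along here; checking them against the IANA subtag registry would be a separate step.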
@aisaac As far as I have understood the intention of |
@makxdekkers We should forget the limitation of dct:language in the base DCMI namespace. This is going to be lifted very soon. Then there will remain a soft recommendation to use URIs for LinguisticSystems. And examples including ISO 639-2 and 639-5 from Library of Congress, but also BCP47 including subtags. So it will still be rather open to using subtags for scripts and other variants. I agree that most cases will focus on simple tags, but there are for sure cases (as in academic research) where finer-grained info is relevant. I guess I don't mind giving prominent mention to the simple codes (and the URIs that represent them) but it would be good to be not too exclusive - which I guess brings us back to the original comment. |
@aisaac I wasn't talking about the limitation of |
@makxdekkers I think I understand this point. I could point to the fact that some data consumers could be happy to invest in more logic, if they can get more value. But I won't do it too much. My main point here was really to make it 100% clear to anyone other than us that the constraint wouldn't come from the dct:language property itself, as the thread got a bit convoluted and may have hinted that this was the case. |
Can dct:language be multiple-valued? Then the spec could recommend ('should') use of LoC 639-1 and 639-2 URIs, but allow ('may') additional dct:language elements with values from other more granular schemes like BCP47. This would make client logic simpler, I think. |
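To illustrate why this could simplify client logic, here is a rough sketch of a consumer that receives several dct:language values and prefers the LoC ISO 639 URI when one is present. The function name `pick_language` is an assumption for the example, and the `urn:example:` value is a made-up placeholder for "some more granular scheme"; the thread does not settle how such values would actually be identified.

```python
# URI prefixes of the Library of Congress ISO 639-1 and 639-2 vocabularies.
LOC_PREFIXES = (
    "http://id.loc.gov/vocabulary/iso639-1/",
    "http://id.loc.gov/vocabulary/iso639-2/",
)


def pick_language(values):
    """Return (preferred, others): the first LoC ISO 639 URI, if any, and the rest."""
    preferred = next((v for v in values if v.startswith(LOC_PREFIXES)), None)
    others = [v for v in values if v != preferred]
    return preferred, others


values = [
    "urn:example:bcp47:fr-CA",  # hypothetical placeholder for a finer-grained value
    "http://id.loc.gov/vocabulary/iso639-1/fr",
]
preferred, others = pick_language(values)
print(preferred)
print(others)
```

A client that only understands the recommended URIs can stop at `preferred`; one willing to invest in more logic can inspect `others` as well.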
@smrgeoinfo yes this could be an interesting option. There is nothing against having multiple values for |
following "Recommendation 7. Encoding schemes should be implemented using a 'scheme' attribute of the XML element for the property. The encoding scheme name should be given as the attribute value. For example:" in http://www.dublincore.org/specifications/dublin-core/dc-xml-guidelines/2002-04-14/, (has that been superseded?), one could go as far as something like |
Thanks for the useful input, and interesting discussions. The DCAT editing group have discussed this issue at some length and our consensus view is that now is not the best time to change the DCAT position (as in DCAT 2014) ahead of DCMI softening its stance on ranges. We’d hope that this could be re-visited as part of a future revision that could take account of any such changes, and at that point we'd look at exploiting approaches such as those suggested by @smrgeoinfo as well as any others that are applicable. The broader point in this thread – that flexibility for data providers often incurs costs for data consumers – is something that we’d like to address (perhaps in a primer document or some further examples) when we have time/people to do this. To make sure we don't forget this, this issue will remain open, but tagged 'DCAT Future Work' so it can be picked up in future revisions. |
I'm a little confused by the outcome. I think the situation is necessarily confusing. For example, the purl.org link on the RDF property I realize the desire for URLs, particularly resolvable ones, for valid language tags remains an issue for some users/implementations and that is something that hasn't been solved for BCP47 yet. It's something I will take an action item to pursue separately in the I18N WG. The key problem here is that BCP47 language tags are widely used in Web-based specifications, making specs that don't support them a potential interoperability risk. Of course, the reverse is also true (that the sudden appearance of language tags could also be a problem). I do think it would be better if DCAT could insert at least some health warning or consider some guidance to at least call out the potential for change in this area so that implementers are not surprised if later the range restrictions are relaxed. |
Note: I have opened an action item in I18N to address the lack of resolvable URLs for language tags and have contacted various people at IETF/IANA about potentially hosting it. |
@aphillips Maybe you also want to include the people at http://www.lexvo.org/ in your discussions? They have done quite a lot of work on minting URIs for language-related objects which might be useful. |
Thanks for suggesting the inclusion of a "health warning", @aphillips . This would indeed be important to address the possible confusion caused by the pointer to the inconsistent definition in DCMI you pointed out - where the textual definition says So, we can deal with this by adding a note, clarifying this point. @aphillips , if you think this can solve the issue, we'll create a draft PR for you to review. Just for our records I dug a bit into this. The inconsistent DCMI definition was actually discussed by the GLD WG while working on the first release of DCAT - see https://www.w3.org/2011/gld/track/issues/26 - and ended up in deciding to recommend the use of URIs. Checking the different DCTERMS guidelines, the confusion is not solved. E.g., in the chapter about creating metadata of the Dublin Core User Guide, they keep on saying:
However, the associated links to examples point to the relevant section of the Publishing Metadata chapter, which instead states:
So, irrespective of the inconsistent free-text statements, If this is the case, language codes are meant to be used only with class Of course, this may change in the future, if DCMI is going to relax its axioms, going maybe so far as to make an object property also a datatype property (as @aisaac noted). However, as @makxdekkers was arguing, this will lead to a backward compatibility issue (at least when Actually, there may be the option of using the corresponding property from DCMI Elements with literals, namely, |
We discussed this issue in the last DCAT subgroup meeting. @aphillips: do you think a "health warning" in the specification can solve the issue? And in such a case, could you provide us with a text to include? |
Having re-read this thread this morning and gone back to the editor's draft... the text here says:
I think this should say something entirely different. It should probably say something like:
Alternatively, if you don't want to change the recommendation, a health warning could be something like:
I'm happy to discuss the details here. If you need me to attend a future call, please let me know. (A couple of minor side notes. BCP47 is a better reference than any of its constituent RFCs; the current "core" RFC of BCP47 is 5646 (4646 was an older edition). ISO 639-3 and -2 are linked together ( |
@aphillips: We have considered your points, and after some discussions, we included the health warning you suggested. |
Closing applying the "due for closing" policy, and also considering we have implemented Addison's suggestion. |
In our teleconference today, the I18N WG agreed that we are satisfied by this change. Please remember to add this to DCAT3 also. |
6.4.9 Property: language (and other locations)
https://www.w3.org/TR/2019/WD-vocab-dcat-2-20190528/
Dublin Core, which is the source for this reference, is unclear (AFAICT) in its relationship to BCP47 and language tags. The use of ISO 639 parts 1 and 2 (without reference to part 3) and the lack of support for the various kinds of subtags in BCP47 (not to mention the stability guarantees found in the BCP registry) are somewhat outdated. It's also notable that the description in DC of "LinguisticSystem" seems closer to language tags than to what 639 provides.
Some systems (e.g. library technologies) might be limited to the 639 codes simply because they are intended for a specific industry segment (which is fine, since BCP47 includes the 639 codes). For general interchange, however, it would be better if DCAT permitted language tags and not just 639 codes, as this is the modern standard and used broadly on the Internet/Web.
[This comment is part of the I18N horizontal review.]