annotation: Multilingual meta data #53
Comments
My personal preference would be to separate I18N/L10N from the JSON Schema specification. A companion specification could describe how to build locale files by associating the file with the URI of the schema being localized, and associating each set of localized strings with a point in that schema through JSON pointer. The JSON pointer would simply be appended to the schema URI as a fragment, as is done for all other uses of JSON pointer to reference internal contents of the schema. This would be as powerful as the proposal above but would not burden JSON Schema at all. It would also easily allow alternate approaches to translations, which may be preferable in some implementation environments. |
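A minimal sketch of how such a companion locale file might be applied, purely for illustration. The file layout, the example.com URI, and the names `localeFile` and `applyLocale` are assumptions made here and are not part of any specification. Each localized string is keyed by the schema URI with a JSON Pointer appended as a fragment:

```javascript
// Hypothetical locale file: keys are schema URI + "#" + JSON Pointer.
const localeFile = {
  language: "fr",
  strings: {
    "https://example.com/person.json#/title": "Personne",
    "https://example.com/person.json#/properties/name/description": "Nom complet",
  },
};

const schema = {
  title: "Person",
  type: "object",
  properties: { name: { type: "string", description: "Full name" } },
};

// Decode one JSON Pointer reference token (RFC 6901: "~1" is "/", "~0" is "~").
function unescapeToken(token) {
  return token.replace(/~1/g, "/").replace(/~0/g, "~");
}

// Return a deep copy of `schema` with every string from `locale` that targets
// `schemaUri` written at the location named by its JSON Pointer fragment.
function applyLocale(schema, schemaUri, locale) {
  const copy = JSON.parse(JSON.stringify(schema));
  for (const [key, text] of Object.entries(locale.strings)) {
    const [uri, fragment = ""] = key.split("#");
    if (uri !== schemaUri || fragment === "") continue;
    const tokens = decodeURIComponent(fragment)
      .split("/")
      .slice(1)
      .map(unescapeToken);
    const last = tokens.pop();
    let node = copy;
    for (const token of tokens) {
      node = node?.[token];
    }
    // Skip pointers whose parent does not exist in this schema.
    if (node !== null && typeof node === "object") node[last] = text;
  }
  return copy;
}

const localized = applyLocale(schema, "https://example.com/person.json", localeFile);
```

The schema itself stays free of any I18N keywords; only the application that wants localized text needs to know about the locale file.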
Please try to avoid reinventing wheels which are already specified in JSON-LD. |
@akuckartz I really like JSON-LD, can you point to some specific examples/alternatives? |
@akuckartz As noted, I'd prefer not to solve this within JSON Schema at all. If JSON-LD provides a way to do this orthogonal to JSON Schema, I'm all for that. |
JSON-LD appears to have a robust mechanism for i18n, and it can nicely be reused for non-schema JSON as well, but it lacks the feature mentioned in the OP of having a JSON-LD also lacks any means to plug in a variable default locale. One fairly clean, intelligible way to add this support is to leverage the reserved tildes of JSON Reference for variable substitutions, in our case for adding in a locale (e.g., as mentioned at whitlockjc/json-refs#47 (comment) and subsequently potentially supported through whitlockjc/json-refs#54 ) So, using both JSON-LD and JSON Reference (and @handrews suggestion for JSON pointer (or JSON reference)), we could do: {
"@context": {"@language": {"$ref": "~$locale~"}},
"localization-strings": {"$ref": "locale-~$locale~.json"},
"type": "object",
"title": {"$ref": "#localization-strings/myTitle"},
"description": {"$ref": "#localization-strings/myDescription"}
}
This format is superior to using JSON-LD alone in that it allows us to maintain locale files independently for each language (as is convenient when having multiple translators) like {
"myTitle": "My title",
"myDescription": "My description"
} The aforementioned |
Note that I just edited the initial comment to include the |
Not a fan of the Regarding the question in the OP regarding other candidates for i18n besides |
Regarding the question of external files, that's a big part of why I don't think that JSON Schema should be in the business of specifying I18N/L10N at all. I18N/L10N is an issue for JSON-formatted data independent of JSON Schema. It should be solved independently, as well. It may be solved in part or entirely by JSON-LD, but whatever is not covered there should not be pulled into JSON Schema. I do support using IRIs instead of URIs as appropriate to enable I18N (issue #59), but that is the extent to which I think JSON Schema should concern itself with this sort of thing. I18N/L10N is a critically important topic, and deserves better than to be shoehorned into a specification that is primarily focused on very different things. |
@brettz9 are you OK with folding this topic from https://github.com/json-schema/json-schema/wiki/Content-language-proposal into this issue?
I think this similarly lies outside the scope of JSON Schema, both because of my opinion that I18N should be handled separately and because this particular part is primarily geared towards displaying in a UI, which is not what JSON Schema is trying to solve (see issue #55 for a discussion of such principles). For now I'm going to point readers of that wiki page here, but I'd be happy to file a separate issue (or you are welcome to yourself; please prefix it with "v6 annotation:" if you do). |
@handrews I don't have any use for the naming scheme myself, we've got milestones and tags if we need to distinguish between different types of issues, and we can filter by those to boot. |
@awwright well yeah, but I can't apply tags or milestones so the naming scheme helps me :-) |
@handrews : To me, although it is probably not a feature that would be used for validation (since looking at code points might not be reliable or valid for such detection),
@brettz9 Oh I see what you mean, I think I misinterpreted it. Yeah, I should file that separately as it is about data and not meta-data. This issue should just be about the meta-data. |
I'm sorry if I'm missing something obvious while giving my 50 cents. Why not just assume that when title, description, or any string-typed default has an object as its value, that object contains translations, as mentioned earlier?
And to avoid the bloat, you could keep the translations in external files and just reference them.
and in the translation json:
or to have a default description included to help understanding the schema:
something like this could also be used:
or just pointing to an index translation file and have it reference specific language files:
if there's a performance hit when referencing same file multiple times maybe something like:
I'm quite new to this stuff but...could this work? |
@ruifortes : The approach you describe suffers from the following problems in comparison to other solutions:
...this could also be used to resolve to a string instead of an object (if the file were a locale for a single language). The URL could detect Accept-Language headers (though again, I prefer at least an option for substituting a locale variable so client-side replacements can be done instead). Otherwise, your solution would require locale files to be structured in such a way as to include all language data. If we would be adding a new i18n feature to JSON Schema anyways, then I think my However, since it seems more people are opposed to modifying JSON Schema to support i18n, but since i18n is a real need and throwing our hands up doesn't cut it imo, I still very much believe we ought to at least recommend or mention a specific mechanism (which I will detail in my next post). |
My proposal for a specific minimally intrusive i18n mechanism is as follows: JSON References alone could be used, but with a specifically recommended means of doing dynamic locale substitution (if server-side locale negotiation is not being used). Namely, any JSON References within the document would need to be preprocessed (something which a JSON References library ought to be able to do anyways). For our i18n, the specific preprocessing required would be to specifically replace the term "locale" surrounded by {
"@context": {"@language": {"$ref": "~$locale~"}}, // Optional JSON-LD-friendly means of embedding the default language for the document based on the current locale (and including its language code within the document)
"localization-strings": {"$ref": "locale-~$locale~.json"}, // The name "localization-strings" (or whatever we agreed on) could be recommended as a convention, but not strictly required; a file might therefore become `locale-en-US.json`
"type": "object",
"title": {"$ref": "#localization-strings/myTitle"}, // Standard internal JSON Reference
"description": {"$ref": "#localization-strings/myDescription"} // Standard internal JSON Reference
} If one preferred using server-side substitutions, the {
"localization-strings": {"$ref": "locale-detection.json"}, // The name "localization-strings" could be recommended as a convention, but not strictly required
"type": "object",
"title": {"$ref": "#localization-strings/myTitle"}, // Standard internal JSON Reference
"description": {"$ref": "#localization-strings/myDescription"} // Standard internal JSON Reference
} The above has the following advantages:
|
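For anyone wanting to try the client-side preprocessing step, here is a rough sketch of it under stated assumptions: the function name `substituteLocale` is made up for this example, it only touches string values of `$ref`, and resolving the resulting references is left to whatever JSON Reference library is in use.

```javascript
// Replace every "~$locale~" placeholder found in a "$ref" value with `locale`.
// Returns a new document; the input is left untouched.
function substituteLocale(node, locale) {
  if (Array.isArray(node)) {
    return node.map((item) => substituteLocale(item, locale));
  }
  if (node !== null && typeof node === "object") {
    const out = {};
    for (const [key, value] of Object.entries(node)) {
      out[key] =
        key === "$ref" && typeof value === "string"
          ? value.split("~$locale~").join(locale) // plain text replace, no regex
          : substituteLocale(value, locale);
    }
    return out;
  }
  return node; // strings, numbers, booleans, null
}

const schema = {
  "@context": { "@language": { "$ref": "~$locale~" } },
  "localization-strings": { "$ref": "locale-~$locale~.json" },
  "type": "object",
  "title": { "$ref": "#localization-strings/myTitle" },
};

const prepared = substituteLocale(schema, "en-US");
```

After this pass no placeholder tildes remain in the references, so the document can be handed to an ordinary dereferencer.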
@handrews : Were you going to file https://github.com/json-schema/json-schema/wiki/Content-language-proposal ? |
Hi. The strategy I'm following uses json-references to create the final localized json. I don't know much about json-ld but I think this dereferencing strategy could be used to create the final json-ld docs also. About the dereferencer: You can test it on jsbin. One simple solution would be to use a pointer with a prefix (maybe "lang:") that would expect an object containing language codes as keys and would return the appropriate value. Another option that could be interesting is to use the reference object to contain props other than "#ref" to use as configuration. What do you think?
|
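To make the "lang:" prefix idea concrete, here is a small sketch. It is not the dereferencer mentioned above; the reference syntax `lang:#/...`, the fallback language, and the names `resolvePointer` and `resolveLangRef` are all assumptions for illustration.

```javascript
// Resolve a JSON Pointer (without the leading "#") against `doc`.
function resolvePointer(doc, pointer) {
  if (pointer === "") return doc;
  return pointer
    .split("/")
    .slice(1)
    .map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"))
    .reduce((node, token) => (node == null ? undefined : node[token]), doc);
}

// Resolve a reference such as "lang:#/strings/title": the pointer must land on
// an object keyed by language code, and the entry for `lang` is returned,
// falling back to `fallbackLang` when it is missing.
function resolveLangRef(doc, ref, lang, fallbackLang = "en") {
  const PREFIX = "lang:#";
  if (!ref.startsWith(PREFIX)) {
    throw new Error(`not a lang: reference: ${ref}`);
  }
  const target = resolvePointer(doc, ref.slice(PREFIX.length));
  if (target === null || typeof target !== "object") {
    throw new Error(`expected an object of translations at ${ref}`);
  }
  return target[lang] ?? target[fallbackLang];
}

const doc = {
  strings: { title: { en: "My title", pt: "O meu título" } },
};

const ptTitle = resolveLangRef(doc, "lang:#/strings/title", "pt");
```

Note that this layout keeps all languages in one object, which is the structure the per-language locale file approach tries to avoid.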
@ruifortes : Hi... A few things...
To solve 4.iii, my suggestion earlier was to leverage JSON References' use of JSON Pointer which prohibits tildes (except if followed by a 0 or 1 which are reserved by the spec) to signal that our use of tildes means the property is not yet valid and must first be preprocessed. This could thus allow custom variable substitution in a way which I believe is more flexible as to placement within an absolute URL, less ambiguous as to processing requirements and intent, and also useful as a general practice for other kinds of variable substitution. To repeat an example I gave earlier, The only challenge to this I see is if the spec ever starts allowing Although I hope this can be resolved within JSON Pointer, I hope we could at least mention this as a recommended option for schema i18n. It doesn't help us to have 100 ways to i18nize (i18n is already a pain). |
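A quick sketch of the tilde check, assuming placeholders are written as `~$name~` as in the examples in this thread (the helper names are hypothetical). In JSON Pointer (RFC 6901) a tilde is only valid as part of `~0` or `~1`, so any other tilde can serve as the "not yet preprocessed" signal:

```javascript
// True when `ref` contains a "~" that is not part of "~0" or "~1", meaning it
// is not a valid JSON Pointer yet and still needs placeholder substitution.
function hasUnprocessedTilde(ref) {
  return /~(?![01])/.test(ref);
}

// Collect placeholder names written as "~$name~".
function placeholderNames(ref) {
  return Array.from(ref.matchAll(/~\$([A-Za-z_][A-Za-z0-9_]*)~/g), (m) => m[1]);
}

const pending = hasUnprocessedTilde("locale-~$locale~.json");
const names = placeholderNames("locale-~$locale~.json");
```

A processor could refuse to dereference any `$ref` for which `hasUnprocessedTilde` is still true, which makes a forgotten substitution fail loudly instead of silently resolving to the wrong file.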
Hi.
|
@ruifortes : I believe there are currently some issues with circular references that the author has stated he has a local fix for and is currently working on finishing. I believe he removed the ability to build circular objects though, but his library can nevertheless handle that kind of reference in some manner. If you are looking in the context of validation, I believe Ajv may handle circular validation, but I'm not sure how it works with remote references. I need to investigate this myself more carefully when I have the chance. Good luck! |
This thread that I started on the google group is very relevant to this discussion: |
I think we should withhold this until we take a closer look at I18N/L10N. As of right now, JSON Schema doesn't provide any text for end users, it's strictly for machines (with some opaque text properties for developers to use, like "title" and "description"). If we do JSON Schema UI, that'll be a concern there too: Presumably, rendered forms will want to present localized text fields. |
In light of the previous comment, and that I'm in agreement with...
I'm closing this issue. If later you still feel this is an important thing to do, open a new issue referencing this one. |
...except for the part where the spec for "title" and "description" reads:
I really want to get the annotation fields sorted. There are regularly statements here about using them that are more restrictive or (like in this case) flatly contradict what the spec actually says. "default" is the worst (see #204 and #217) but as this shows, "title" and "description" are also problematic. |
Has there been any additional work around localization for JSON Schema, or any exemplar projects that have found working solutions? I haven't found any other issues on the subject. We need schema localization for our project, and would rather not be trailblazing if best (or at least workable) practices have been established. (Our discussion happening here: OriginProtocol/origin#256 ) |
@wanderingstan nothing recent has happened, we're more than full up with more fundamental concerns for draft-08. I'm still not convinced that this belongs in JSON Schema, but a key focus of draft-08 is enabling well-defined extension vocabularies, so it would make it easier for you (or anyone) to write an I18N vocabulary and get it supported by implementations that support extensions. Draft-08 should make it clear how implementations can support extensions without everyone having to make up their own mechanism. @gregsdennis I'm not sure I follow how A major reason why I'm focusing on vocabularies for draft-08 is to make it easier for these other topics to move ahead without significant involvement on my part. |
@wanderingstan I would really recommend seeing if there are other I18N solutions for JSON which could be used alongside of JSON Schema. Surely someone has worked on this somewhere? JSON-LD being one example but I imagine there must be others? |
@handrews / @Relequestual / @awwright I corrected my issue reference above. If external data (perhaps through combining This would allow the schema to remain concise by not cluttering it up with multiple translations. |
I still think this needs more implementations and research. |
FWIW, here's how @wanderingstan and I solved the localization problem for our JSON schemas in our distributed application: We were already using react-intl for our localization library, so we decided to swap out all of the English strings in our JSON schemas for react-intl message IDs like this: Before:
After:
For more examples, all of our schemas can be found here. Then, we wrote a util function to replace the IDs with message strings in the user's preferred language. The full function can be found here, but this is the gist of it:
So now we just have to call
|
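For readers who cannot follow the links, the general technique can be sketched roughly as below. This is not the function referred to above: it uses a plain lookup table in place of react-intl, and the message IDs, catalogs, and the name `translateSchema` are invented for the example.

```javascript
// Hypothetical message catalogs keyed by language, then by message ID.
const messages = {
  en: { "schema.listing.title": "Listing", "schema.listing.price": "Price" },
  es: { "schema.listing.title": "Anuncio", "schema.listing.price": "Precio" },
};

// Keywords whose string values are treated as message IDs.
const TRANSLATABLE = new Set(["title", "description"]);

// Return a copy of `schema` in which message IDs under translatable keywords
// are replaced by strings from `catalog`; unknown IDs are left unchanged.
function translateSchema(schema, catalog) {
  if (Array.isArray(schema)) {
    return schema.map((item) => translateSchema(item, catalog));
  }
  if (schema === null || typeof schema !== "object") return schema;
  const out = {};
  for (const [key, value] of Object.entries(schema)) {
    if (TRANSLATABLE.has(key) && typeof value === "string") {
      const known = Object.prototype.hasOwnProperty.call(catalog, value);
      out[key] = known ? catalog[value] : value;
    } else {
      out[key] = translateSchema(value, catalog);
    }
  }
  return out;
}

const listingSchema = {
  title: "schema.listing.title",
  type: "object",
  properties: {
    price: { type: "number", title: "schema.listing.price" },
  },
};

const spanishSchema = translateSchema(listingSchema, messages.es);
```

The translated copy is an ordinary schema, so it can be passed to any validator or form generator without that tool knowing anything about localization.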
That's a really good example of why this is an application design concern rather than a JSON Schema concern. You're basically designing your schema to return well-known strings (keys) for error messages and annotations. Those strings are then converted (via lookup or whatever) to the localized messages. The only difference is you convert the keys in the schema before evaluation. It's a good design; I just think it should stay a part of the application. |
Controlling the consuming application is not a given. For example, we are using JSON Schema to provide IDEs and text editors like VSCode with IntelliSense for our config. We don't control the i18n of those applications. Hence, a use case like ours seems to require JSON Schema to define how to handle translations. In fact, I would argue that interop is the beauty of JSON Schema. To provide translation interop, JSON Schema must define handling translations as part of the spec. |
This was originally proposed on the old wiki at https://github.com/json-schema/json-schema/wiki/multilingual-meta-data-(v5-proposal) by @geraintluff with further contributions from @brettz9 and @sonnyp
The translation alternative discussed at the end of this comment was originally proposed at https://github.com/json-schema/json-schema/wiki/translations-(v5-proposal) by @geraintluff based on an email thread with @fge.
Proposed keywords
This proposal modifies the existing properties:
title
description
This proposal would also apply to the named enumerations proposed in issue #57 , if that makes it in.
Purpose
This modification would allow inclusions of multiple translated values for the specified properties.
Currently, schemas can only specify meta-data in one language at a time. Different localisations may be requested by the client using the HTTP Accept-Language header, but that requires multiple (largely redundant) requests to get multiple localisations, and is only available over HTTP (not when pre-loading schemas, for instance).
Values
In addition to the current string values (which are presumed to be in the language of the document), the values of these keywords may be an object.
The keys of such an object should be IETF Language Tags, and the values must be strings.
Behaviour
When the value of the keyword is an object, the most appropriate language tag should be selected by the client, and the string value used as the value of the keyword.
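How a client picks "the most appropriate language tag" is left open above. One possible approach, sketched here as an assumption rather than as part of the proposal, loosely follows the lookup scheme of RFC 4647 in simplified form: try each preferred tag, progressively dropping trailing subtags. The function name and the fallback to the first entry are choices made for this example.

```javascript
// Pick the best entry from a { languageTag: string } object for the client's
// ordered preferences, progressively truncating each tag ("en-GB" -> "en").
function selectTranslation(value, preferred) {
  if (typeof value === "string") return value; // plain string: nothing to select
  const byLowerTag = new Map(
    Object.entries(value).map(([tag, text]) => [tag.toLowerCase(), text])
  );
  for (const wanted of preferred) {
    const parts = wanted.toLowerCase().split("-");
    while (parts.length > 0) {
      const candidate = parts.join("-");
      if (byLowerTag.has(candidate)) return byLowerTag.get(candidate);
      parts.pop();
    }
  }
  // No match: fall back to the first entry, if any (an arbitrary choice here).
  return Object.values(value)[0];
}

const title = {
  "en": "Example schema",
  "fr-CA": "Schéma d'exemple",
  "de": "Beispielschema",
};

const chosen = selectTranslation(title, ["en-GB", "de"]);
```

Language tags are compared case-insensitively, and a plain string value is returned unchanged so that existing schemas keep working.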
Example
Concerns
Schemas with many languages could end up quite bulky.
In fact, the Accept-Language option is in many ways more elegant, as the majority of the time only one language will be used by the client (and the other localisations will simply be noise). However, this option is not available in all situations. One might also avoid the extra bulk by using JSON references (and thereby also enable localisation files to contain all translatable text).
An alternative approach to the above would be to reserve localeKey as a property for any schema object or sub-object and localization-strings as a top-level property:
The advantage to this approach would be that, as typically occurs with locale files (for reasons of convenience in independent editing by different translators), all language strings could be stored together. Thus, if leveraging JSON references, it would be a simple matter of:
or yet simpler:
Alternative: translation objects
This alternative proposes a translations keyword which would sit alongside the title and description keywords.
Translation object values
The value of translations would be an object. Its keys would be JSON Schema meta keywords, and its values would themselves be objects, where each property key MUST be in accordance with RFC 3066.
Example translation object
When translating title and description, you can easily write an object in which the keys under each meta keyword conform to RFC 3066:
Translation object concerns: where to apply?
"What would be left to specify is of course what "relevant" is here.
Apart from "title", there is "description". But I don't think we want any other keyword to be affected."