annotation: Multilingual meta data #53

Closed
handrews opened this issue Sep 16, 2016 · 36 comments

@handrews
Contributor

handrews commented Sep 16, 2016

This was originally proposed on the old wiki at https://github.com/json-schema/json-schema/wiki/multilingual-meta-data-(v5-proposal) by @geraintluff with further contributions from @brettz9 and @sonnyp

The translation alternative discussed at the end of this comment was originally proposed at https://github.com/json-schema/json-schema/wiki/translations-(v5-proposal) by @geraintluff based on an email thread with @fge.

Proposed keywords

This proposal modifies the existing properties:

  • title
  • description

This proposal would also apply to the named enumerations proposed in issue #57, if that proposal is accepted.

Purpose

This modification would allow the inclusion of multiple translated values for the specified properties.

Currently, schemas can only specify meta-data in one language at a time. Different localisations may be requested by the client using the HTTP Accept-Language header, but that requires multiple (largely redundant) requests to get multiple localisations, and is only available over HTTP (not when pre-loading schemas, for instance).

Values

In addition to the current string values (which are presumed to be in the language of the document), the values of these keywords may be an object.

The keys of such an object should be IETF Language Tags, and the values must be strings.

Behaviour

When the value of the keyword is an object, the most appropriate language tag should be selected by the client, and the string value used as the value of the keyword.

Example

{
    "title": {
        "en": "Example schema",
        "de": "..."
    }
}
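
As an illustration of the "most appropriate language tag" selection described under Behaviour, here is a minimal sketch. The function name `pickTranslation` and the fallback rule (dropping trailing subtags, so that `en-GB` falls back to `en`, loosely modelled on the lookup scheme of RFC 4647) are assumptions of this example, not part of the proposal.

```javascript
// Hypothetical helper (not part of the proposal): resolve a keyword value
// that is either a plain string or an object keyed by language tag.
function pickTranslation(value, preferredTags) {
  // Single-language form: the string is used as-is.
  if (typeof value === "string") return value;

  // Index the available tags case-insensitively.
  const available = new Map(
    Object.keys(value).map((tag) => [tag.toLowerCase(), tag])
  );

  for (const wanted of preferredTags) {
    // Try the full tag first, then progressively shorter prefixes.
    const parts = wanted.toLowerCase().split("-");
    while (parts.length > 0) {
      const key = available.get(parts.join("-"));
      if (key !== undefined) return value[key];
      parts.pop();
    }
  }
  return undefined; // no acceptable language found
}

const title = { en: "Example schema", de: "..." };
console.log(pickTranslation(title, ["en-GB", "de"])); // prints "Example schema"
```

What to do when nothing matches (return `undefined`, use the first entry, and so on) is left open by the proposal; the sketch simply returns `undefined`.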

Concerns

Schemas with many languages could end up quite bulky.

In fact, the Accept-Language option is in many ways more elegant, as the majority of the time only one language will be used by the client (and the other localisations will simply be noise). However, this option is not available in all situations. One might also avoid the extra bulk by using JSON references (and thereby also enable localisation files to contain all translatable text).

An alternative approach to the above would be to reserve localeKey as a property for any schema object or sub-object and localization-strings as a top-level property:

{
    "localization-strings": {
        "en": {
            "example": {
                "title": "Example schema",
                "description": "Example schema description"
            }
        },
        "de": {
            "example": {}
        }
    },
    "type": "object",
    "localeKey": "example"
}

The advantage to this approach would be that, as typically occurs with locale files (for reasons of convenience in independent editing by different translators), all language strings could be stored together. Thus, if leveraging JSON references, it would be a simple matter of:

{
    "localization-strings": {
        "en": {
            "$ref": "locale_en-US.json"
        },
        "de": {
            "$ref": "locale_de.json"
        }
    },
    "type": "object",
    "localeKey": "example"
}

or yet simpler:

{
    "localization-strings": {"$ref": "locales.json"},
    "type": "object",
    "localeKey": "example"
}
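
A rough sketch of how a client might apply the localeKey alternative, assuming any `$ref` inside `localization-strings` has already been dereferenced. The function name `localizeByKey` and the decision to strip the two bookkeeping properties from the output are assumptions of this example.

```javascript
// Hypothetical sketch: merge the strings for one locale into every schema
// object that carries a "localeKey". Assumes "localization-strings" is a
// top-level property whose $refs are already resolved.
function localizeByKey(schema, locale) {
  const table = (schema["localization-strings"] || {})[locale] || {};

  const walk = (node) => {
    if (Array.isArray(node)) return node.map(walk);
    if (node === null || typeof node !== "object") return node;

    const out = {};
    for (const [key, value] of Object.entries(node)) {
      // Drop the bookkeeping properties from the localized result.
      if (key === "localization-strings" || key === "localeKey") continue;
      out[key] = walk(value);
    }
    if (typeof node.localeKey === "string") {
      // Copy title, description, etc. for this key, if any exist.
      Object.assign(out, table[node.localeKey] || {});
    }
    return out;
  };

  return walk(schema);
}
```

One limitation of this sketch: an instance property that is itself named `localeKey` under `properties` would be dropped, so a real implementation would need to distinguish keyword positions from property names.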

Alternative: translation objects

This alternative proposes a translations keyword which would be alongside the title and description.

translation object Values

The value of translations would be an object. Its keys would be JSON Schema meta keywords, and its values would themselves be objects,
where each property key MUST be in accordance with RFC 3066

Example translation object

When translating title and description, you can easily write an object in which the keys under each meta keyword conform to RFC 3066:

{
    "title": "postal code",
    "description": "A postal code.",
    "translations": {
        "title": {
            "en-GB": "postcode",
            "en-US": "zip code",
            "de": "Postleitzahl",
            "fr": "code postal"
        },
        "description": {
            "en-GB": "A Royal Mail postcode.",
            "en-US": "A USPS ZIP code.",
            //  ...
        }
    }
    //  ...
}
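
To make the intended behaviour of a translations object concrete, here is a small sketch. The function name `applyTranslations`, the exact-match rule, and the choice to fall back to the untranslated keyword are assumptions of this example; the proposal does not specify them.

```javascript
// Hypothetical sketch: overlay the entries of "translations" for one
// language tag onto the plain keywords. Keywords with no entry for that
// tag keep their untranslated value.
function applyTranslations(schema, languageTag) {
  const { translations = {}, ...rest } = schema;
  const out = { ...rest };
  for (const [keyword, byLanguage] of Object.entries(translations)) {
    if (Object.prototype.hasOwnProperty.call(byLanguage, languageTag)) {
      out[keyword] = byLanguage[languageTag];
    }
  }
  return out;
}

const postal = {
  title: "postal code",
  description: "A postal code.",
  translations: {
    title: { "en-GB": "postcode", de: "Postleitzahl" },
    description: { "en-GB": "A Royal Mail postcode." }
  }
};
console.log(applyTranslations(postal, "de").title); // prints "Postleitzahl"
```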

Translation object concerns: where to apply?

"What would be left to specify is of course what "relevant" is here.
Apart from "title", there is "description". But I don't think we want any other keyword to be affected."

@handrews
Contributor Author

My personal preference would be to separate I18N/L10N from the JSON Schema specification. A companion specification could describe how to build locale files by associating the file with the URI of the schema being localized, and associating each set of localized strings with a point in that schema through JSON pointer. The JSON pointer would simply be appended to the schema URI as a fragment, as is done for all other uses of JSON pointer to reference internal contents of the schema.

This would be as powerful as the proposal above but would not burden JSON Schema at all. It would also easily allow alternate approaches to translations, which may be preferable in some implementation environments.
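
A minimal sketch of what such a companion locale file could look like and how it might be applied. The file layout (`schema`, `language`, `strings` keyed by JSON Pointer) and the function names are hypothetical; only the idea of addressing points in the schema by JSON Pointer comes from the comment above.

```javascript
// Split a JSON Pointer (RFC 6901) into unescaped reference tokens.
function parsePointer(pointer) {
  if (pointer === "") return [];
  return pointer
    .slice(1)
    .split("/")
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~"));
}

// Hypothetical locale-file shape: localized strings keyed by a JSON Pointer
// into the schema identified by `schema`.
function applyLocaleFile(schema, localeFile) {
  const copy = JSON.parse(JSON.stringify(schema)); // leave the input untouched
  for (const [pointer, strings] of Object.entries(localeFile.strings)) {
    let target = copy;
    for (const token of parsePointer(pointer)) {
      target = target == null ? undefined : target[token];
    }
    // Only overlay strings when the pointer resolves to an object.
    if (target !== null && typeof target === "object") {
      Object.assign(target, strings);
    }
  }
  return copy;
}

const localeDe = {
  schema: "http://example.com/product", // assumed schema URI
  language: "de",
  strings: {
    "": { title: "Produkt" },
    "/properties/id": { title: "Produkt ID" }
  }
};
```

Because the locale file lives outside the schema, the schema itself needs no new keywords, which is the point being argued here.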

@akuckartz

Please try to avoid reinventing wheels which are already specified in JSON-LD.

@awwright
Member

@akuckartz I really like JSON-LD, can you point to some specific examples/alternatives?

@handrews
Contributor Author

@akuckartz As noted, I'd prefer not to solve this within JSON Schema at all. If JSON-LD provides a way to do this orthogonal to JSON Schema, I'm all for that.

@brettz9

brettz9 commented Sep 17, 2016

JSON-LD appears to have a robust mechanism for i18n, and it can nicely be reused for non-schema JSON as well, but it lacks the feature mentioned in the OP of having a localeKey to allow a standardized means of referencing keys within simple key-value, single-language modular JSON locale documents (e.g., flat files) without any proprietary (or necessarily server-side) means of substitution, embedding, or transformation.

JSON-LD also lacks any means to plug in a variable default locale. One fairly clean, intelligible way to add this support is to leverage the reserved tildes of JSON Reference for variable substitutions, in our case for adding in a locale (e.g., as mentioned at whitlockjc/json-refs#47 (comment) and subsequently potentially supported through whitlockjc/json-refs#54 )

So, using both JSON-LD and JSON Reference (and @handrews's suggestion of JSON pointer (or JSON reference)), we could do:

{
    "@context": {"@language": {"$ref": "~$locale~"}},
    "localization-strings": {"$ref": "locale-~$locale~.json"},
    "type": "object",
    "title": {"$ref": "#localization-strings/myTitle"},
    "description": {"$ref": "#localization-strings/myDescription"}
}

~$locale~ could be substituted with the language code from an Accept-Language header or as otherwise determined by the application.

This format is superior to using JSON-LD alone in that it allows us to maintain locale files independent for each language (as is convenient when having multiple translators) like locale-en-US.json with keys like:

{
     "myTitle": "My title",
     "myDescription": "My description"
}

The aforementioned localeKey was more succinct than the JSON references used above, but this approach at least relies on the draft JSON Reference spec rather than a proprietary property. The variable substitutions within the JSON reference are non-standard, but they make clear that pre-processing is required: a tilde is reserved unless followed by a 0 or 1, so these values cannot be valid JSON references as written.

@handrews
Contributor Author

Note that I just edited the initial comment to include the translation alternative, as I've generally been consolidating proposals targeting the same problem into a single issue for discussion.

@brettz9

brettz9 commented Sep 19, 2016

Not a fan of the translations approach (at least alone), as it, as with JSON-LD, doesn't allow for separate language files.

Regarding the question in the OP regarding other candidates for i18n besides title and description, some new fields enumTitles would be very useful (and maybe enumDescriptions for parity), so that, for example, items in a pull-down could be made human-readable, and in an i18n-izable fashion (I think this proposal was brought up somewhere else already).

@handrews
Contributor Author

@brettz9 for enumTitles and enumDescriptions see issue #57 (named enumerations), particularly the surprisingly concise oneOf-based solution that @awwright proposes later in the comments. It would reduce the problem back to the regular title and description.

@handrews
Contributor Author

Regarding the question of external files, that's a big part of why I don't think that JSON Schema should be in the business of specifying I18N/L10N at all. I18N/L10N is an issue for JSON-formatted data independent of JSON Schema.

It should be solved independently, as well. It may be solved in part or entirely by JSON-LD, but whatever is not covered there should not be pulled into JSON Schema. I do support using IRIs instead of URIs as appropriate to enable I18N ( issue #59 ), but that is the extent to which I think JSON Schema should concern itself with this sort of thing.

I18N/L10N is a critically important topic, and deserves better than to be shoehorned into a specification that is primarily focused on very different things.

@handrews
Contributor Author

@brettz9 are you OK with folding this topic from https://github.com/json-schema/json-schema/wiki/Content-language-proposal into this issue?

As with HTML's lang/xml:lang properties, it would be useful to indicate the content language of a particular field in a standard manner. This might be used for proper font display (as in CJK languages) or for selectively showing content to users based on their locale.

I think a name "contentLang" for the property would avoid confusion at (falsely) thinking that this was necessarily indicating the language of the "title" or "description" itself.

I think this similarly lies outside the scope of JSON Schema, both because of my opinion that I18N should be handled separately and because this particular part is primarily geared towards displaying in a UI, which is not what JSON Schema is trying to solve ( see issue #55 for a discussion of such principles).

For now I'm going to point readers of that wiki page here, but I'd be happy to file a separate issue (or you are welcome to do so yourself; please prefix it with "v6 annotation:" if you do).

@awwright
Member

@handrews I don't have any use for the naming scheme myself, we've got milestones and tags if we need to distinguish between different types of issues, and we can filter by those to boot.

@handrews
Contributor Author

@awwright well yeah, but I can't apply tags or milestones so the naming scheme helps me :-)
If you want to go through and tag them as validation/annotation/hyper-schema and set the v5 and v6 milestones I will be happy to change the titles and drop the prefixes. I just figured I was bugging you about enough things already.

@brettz9

brettz9 commented Sep 20, 2016

@handrews : To me, although it is probably not a feature that would be used for validation (since looking at code points might not be reliable or valid for such detection), contentLang does describe the data, placing constraints on how it is to be understood, which brings it more into the world of schemas, as I see it (even more so than the i18n of "title", etc., which doesn't pertain to the data). And this is not something that can be solved by JSON-LD unless one uses JSON-LD in the instance documents (which defeats the purpose of being able to add this kind of pseudo-constraint at the schema level).

@handrews
Contributor Author

@brettz9 Oh I see what you mean, I think I misinterpreted it. Yeah, I should file that separately as it is about data and not meta-data. This issue should just be about the meta-data.

@ruifortes

I'm sorry if I'm missing something obvious while giving my 50 cents.
I'm assuming you would always need a parser in order to include only the text for the specified language.

Why not just assume that when title, description, or any string-type default has an object as its value, that object contains translations, as mentioned earlier:

{
  type: 'object',
  properties: {
    id: {
      title: {en: 'Product ID', fr: 'ID du produit', de: 'Produkt ID'},
      description: {
        en: 'A unique identifier that identifies this product.',
        fr: 'Un identifiant unique qui identifie ce produit.',
        de: 'Eine eindeutige Kennung, die dieses Produkt identifiziert.'
      },
      type: 'string'
    }
  }
}

and to avoid the bloat and use translations in external files you could just reference.

{
  type: 'object',
  properties: {
    id: {
      title: {$ref: 'url://schemas/product/translations#id/title'},
      description: {$ref: 'url://schemas/product/translations#id/desc'},
      type: 'string'
    }
  }
}

and in the translation json:

{
  id: {
    title: {en: 'Product ID', fr: 'ID du produit', de: 'Produkt ID'},
    desc: {
      en: 'A unique identifier that identifies this product.',
      fr: 'Un identifiant unique qui identifie ce produit.',
      de: 'Eine eindeutige Kennung, die dieses Produkt identifiziert.'
    }
  }
}

or to have a default description included to help understanding the schema:

{
  type: 'object',
  properties: {
    id: {
      title: {en: 'Product ID', $ref: 'url://schemas/product/translations#id/title'},
      description: {
        en: 'A unique identifier that identifies this product.',
        $ref: 'url://schemas/product/translations#id/desc'
      },
      type: 'string'
    }
  }
}

something like this could also be used:

{
  type: 'object',
  properties: {
    id: {
      title: {
        en: {$ref: 'url://schemas/product/translations_en#id/title'},
        fr: {$ref: 'url://schemas/product/translations_fr#id/title'},
        de: {$ref: 'url://schemas/product/translations_de#id/title'}
      },
      description: {
        en: {$ref: 'url://schemas/product/translations_en#id/desc'},
        fr: {$ref: 'url://schemas/product/translations_fr#id/desc'},
        de: {$ref: 'url://schemas/product/translations_de#id/desc'}
      },
      type: 'string'
    }
  }
}

or just pointing to an index translation file and have it reference specific language files:

{
  id: {
    title: {
      en: {$ref: 'url://schemas/product/translations_en#id/title'},
      fr: {$ref: 'url://schemas/product/translations_fr#id/title'},
      de: {$ref: 'url://schemas/product/translations_de#id/title'}
    },
    desc: {
      en: {$ref: 'url://schemas/product/translations_en#id/desc'},
      fr: {$ref: 'url://schemas/product/translations_fr#id/desc'},
      de: {$ref: 'url://schemas/product/translations_de#id/desc'}
    }
  }
}

if there's a performance hit when referencing same file multiple times maybe something like:

{
  type: 'object',
  properties: {
    id: {
      title: {$ref: '#/translations/id/title'},
      description: {$ref: '#/translations/id/desc'},
      type: 'string'
    }
  },
  translations: {$ref: 'url://schemas/product/translations'}
}

I'm quite new to this stuff but...could this work?

@brettz9

brettz9 commented Oct 9, 2016

@ruifortes : The approach you describe suffers from the following problems in comparison to other solutions:

  1. It requires modifying JSON Schema to support a new feature (objects for title and description).
  2. Although some parsing is indeed necessary for i18n detecting and substituting text for a single language (whether the solution may be within JSON Schema, JSON-References, etc.), it would be nice to have a standard way to indicate the need for a locale substitution so that the relevant library handling JSON Schema, JSON References, etc. would be able to handle substitutions out of the box and so that people looking at such i18n JSON Schema documents would be able to readily identify what mechanism was being used to provide i18n (and without needing to retrieve data from all languages). I offer my solution at annotation: Multilingual meta data #53 (comment) as a potentially standardizable means of leveraging JSON References for this purpose without needing to force people into elaborate custom solutions. Notice that in your examples of:

title: {$ref :'url://schemas/product/translations#id/title'},

...this could also be used to resolve to a string instead of an object (if the file were a locale for a single language). The URL could detect Accept-Language headers (though again, I prefer at least an option for substituting a locale variable so client-side replacements can be done instead). Otherwise, your solution would require locale files to be structured in such a way as to include all language data.

If we would be adding a new i18n feature to JSON Schema anyways, then I think my localeKey solution would be far preferable in giving succinctness over JSON References and flexibility as to how to store the locale data, including as independent files for each language, though I wouldn't be opposed to supporting title/description as objects as well.

However, since it seems more people are opposed to modifying JSON Schema to support i18n, but since i18n is a real need and throwing our hands up doesn't cut it imo, I still very much believe we ought to at least recommend or mention a specific mechanism (which I will detail in my next post).

@brettz9

brettz9 commented Oct 9, 2016

My proposal for a specific minimally intrusive i18n mechanism is as follows:

JSON References alone could be used, but with a specifically recommended means of doing dynamic locale substitution (if server-side locale negotiation is not being used).

Namely, any JSON References within the document would need to be preprocessed (something which a JSON References library ought to be able to do anyways). For our i18n, the specific preprocessing required would be to replace the term "locale" surrounded by ~$...~. (We use tildes since they are reserved to be followed by a 0 or 1 in a valid JSON Reference, and this use of an invalid JSON Reference would signal that the document had not yet been preprocessed to make it JSON Reference friendly, yet the use of "locale" would be intelligible as to its purpose.)

{
    "@context": {"@language": {"$ref": "~$locale~"}}, // Optional JSON-LD-friendly means of embedding the default language for the document based on the current locale (and including its language code within the document)
    "localization-strings": {"$ref": "locale-~$locale~.json"}, // The name "localization-strings" (or whatever we agreed on) could be recommended as a convention, but not strictly required; a file might therefore become `locale-en-US.json`
    "type": "object",
    "title": {"$ref": "#localization-strings/myTitle"}, // Standard internal JSON Reference
    "description": {"$ref": "#localization-strings/myDescription"} // Standard internal JSON Reference
}

If one preferred using server-side substitutions, the Accept-Language header could instead be used to determine the server-side generation of content for locale-detection.json:

{
    "localization-strings": {"$ref": "locale-detection.json"}, // The name "localization-strings" could be recommended as a convention, but not strictly required
    "type": "object",
    "title": {"$ref": "#localization-strings/myTitle"}, // Standard internal JSON Reference
    "description": {"$ref": "#localization-strings/myDescription"} // Standard internal JSON Reference
}

The above has the following advantages:

  1. Rather than reinventing the wheel or requiring many modifications to support i18n (besides use of a good JSON References library), it leverages the JSON References draft standard, adding support for variable substitution (which is a feature JSON References could really use anyways) in a manner which does not conflict with any current valid JSON References documents.
  2. It allows locale files to be separated by language (or not).
  3. It allows transparent client-side substitution (as well as opaque server-side substitution) without needing to import content of all languages.
  4. Could potentially be used with JSON-LD or other mechanisms but does not require it.
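
The client-side flow described above could be sketched roughly as follows. Everything here is an assumption for illustration: the function name `localizeSchema`, the synchronous `loadDocument` callback, the restriction to top-level references, and the use of a leading slash in fragments (`#/localization-strings/myTitle`), which JSON Pointer requires for non-empty pointers.

```javascript
// Hypothetical sketch of client-side locale substitution followed by
// reference resolution. Only top-level $ref objects are handled.
function localizeSchema(schema, locale, loadDocument) {
  const substitute = (ref) => ref.split("~$locale~").join(locale);
  const isRef = (v) => v !== null && typeof v === "object" && typeof v.$ref === "string";

  // Step 1: substitute the locale and load external documents.
  const root = {};
  for (const [key, value] of Object.entries(schema)) {
    root[key] =
      isRef(value) && !value.$ref.startsWith("#")
        ? loadDocument(substitute(value.$ref))
        : value;
  }

  // Step 2: resolve internal references against the loaded content.
  for (const [key, value] of Object.entries(root)) {
    if (isRef(value) && value.$ref.startsWith("#/")) {
      let target = root;
      for (const token of value.$ref.slice(2).split("/")) {
        target = target == null ? undefined : target[token];
      }
      root[key] = target;
    }
  }
  return root;
}

// In-memory stand-in for fetching locale files.
const files = {
  "locale-en-US.json": { myTitle: "My title", myDescription: "My description" }
};
const result = localizeSchema(
  {
    "localization-strings": { $ref: "locale-~$locale~.json" },
    type: "object",
    title: { $ref: "#/localization-strings/myTitle" },
    description: { $ref: "#/localization-strings/myDescription" }
  },
  "en-US",
  (name) => files[name]
);
console.log(result.title); // prints "My title"
```

Only the file for the selected locale is loaded, which corresponds to advantage 3 in the list above.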

@brettz9

brettz9 commented Oct 11, 2016

@ruifortes

ruifortes commented Oct 25, 2016

Hi.
I'm doing some experiments regarding localization and started developing a json dereferencer (json-deref) and would appreciate some feedback.

The strategy I'm following uses json-references to create the final localized json. I don't know much about json-ld but I think this dereferencing strategy could also be used to create the final json-ld docs.

About the dereferencer:
json-deref can have a loader (both local and external) that is called for each found json-reference.
This loader is passed the pointer fragment (or url), the other properties of the reference object and a defaultLoader that can be used to retrieve a pointer (normally the current one) in the current json document.

You can test it on jsbin
Repository is here

One simple solution would be to use a pointer with a prefix (maybe "lang:") that would expect an object containing language codes as keys and would return the appropriate value.

Another option that could be interesting is to let the reference object contain props other than "$ref" to use as configuration. What do you think?
One advantage would be to have a default text alongside with the reference for readability purposes.
Something like:

{
  "description": {
    "en": "foo prop description",
    "$ref": "translations#lang:props/foo"
  }
}

also some traps are addressed in this jsbin

@brettz9

brettz9 commented Oct 28, 2016

@ruifortes : Hi... A few things...

  1. Have you seen https://github.com/whitlockjc/json-refs , a library which already does JSON reference dereferencing (including supporting callbacks for preprocessing)?
  2. Per the JSON References spec, "Any members other than "$ref" in a JSON Reference object SHALL be ignored." While one might interpret this as meaning that no changes are to be made to the properties, per this comment at least, it seems it has been understood as meaning that other properties should not be used on the same object to be compliant with other implementations.
  3. As I've mentioned, having all translations mixed together in a single, non-modular object or file does not conduce well to the normal practice of allowing translators to work independently. If translations are to be put in an object, they ought to be put in an object first keyed to language code and then to key and value (though without a substitution mechanism, this would still suffer from loading all locales and not picking one). This admittedly doesn't allow the locale info entirely inline (a reference is required to an object elsewhere with the locale data), but that is not a good practice to be encouraging anyways.
  4. I see a few potential problems with your lang: prefix:
    1. I'm not sure how well this would work with absolute URLs
    2. While the json: protocol (or a lang: protocol) in your example could have a portion for referencing a hard-coded absolute URL (but whose language code portion was to vary), I'm not sure how friendly or intuitive that would appear (not to mention the ugliness in any encoding that such absolute URLs might require to be valid URIs).
    3. If allowing lang: anywhere within the URL (in order to solve the problem in (4.i)), there's the question of distinguishing it from another portion of a URL using such a string, and a person reading the JSON reference wouldn't automatically know whether it was intended for i18n or just happened to be resolving to a hash named lang:props/foo (or depending on its placement, to a JSON property named lang:props).

To solve 4.iii, my suggestion earlier was to leverage JSON References' use of JSON Pointer which prohibits tildes (except if followed by a 0 or 1 which are reserved by the spec) to signal that our use of tildes means the property is not yet valid and must first be preprocessed. This could thus allow custom variable substitution in a way which I believe is more flexible as to placement within an absolute URL, less ambiguous as to processing requirements and intent, and also useful as a general practice for other kinds of variable substitution.

To repeat an example I gave earlier, ~$...~ could be used to denote a variable (or a reserved variable) such that {"$ref": "locale-~$locale~.json"} could, post-substitution, become {"$ref": "locale-en-US.json"}.
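
A small sketch of the two checks this implies: detecting a reference that still needs preprocessing (a tilde not followed by 0 or 1) and performing the `~$name~` substitution. The function names and the variable-name pattern are assumptions of this example; the `~$...~` syntax itself is the non-standard convention proposed above.

```javascript
// True when the reference contains a "~" that is not part of the "~0" or
// "~1" escapes, i.e. it is not yet usable as a JSON Pointer fragment.
function hasUnprocessedTilde(ref) {
  return /~(?![01])/.test(ref);
}

// Replace every ~$name~ whose name is present in `variables`; unknown
// variables are left in place so the reference stays visibly unprocessed.
function substituteVariables(ref, variables) {
  return ref.replace(/~\$([A-Za-z_][A-Za-z0-9_]*)~/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(variables, name) ? variables[name] : match
  );
}

const raw = "locale-~$locale~.json";
console.log(hasUnprocessedTilde(raw)); // prints true
console.log(substituteVariables(raw, { locale: "en-US" })); // prints "locale-en-US.json"
```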

The only challenge to this I see is if the spec ever starts allowing ~$...~ for its own purpose (though I've submitted a request to the relevant ART group at IETF for them to reserve this sequence, if not also make locale a specifically reserved word). I can report back on any response.

Although I hope this can be resolved within JSON Pointer, I hope we could at least mention this as a recommended option for schema i18n. It doesn't help us to have 100 ways to i18nize (i18n is already a pain).

@ruifortes

ruifortes commented Nov 1, 2016

Hi.
You're right about point 3. I'm doing fairly simple UI localization and never thought about larger projects with multiple translators.
I also saw json-refs but it seemed overly complex, so I went with json-schema-deref. It turned out the latter only allowed for external loaders.
I'll look at json-refs more closely, but it failed some pointer dereferencing in my tests.
Does it resolve chained references like these?

{
  "a": {"$ref": "#/b"},
  "b": {"$ref": "#/c"},
  "c": "value of c"
}
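
For reference, resolving a chain like this within a single document can be sketched as below. This is a generic illustration with hypothetical function names, not a description of how json-refs or json-schema-deref behave; it follows internal references only and reports cycles instead of looping forever.

```javascript
// Look up a JSON Pointer (RFC 6901) inside `doc`, throwing if it is missing.
function lookup(doc, pointer) {
  if (pointer === "") return doc;
  let target = doc;
  for (const raw of pointer.slice(1).split("/")) {
    const token = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    if (target === null || typeof target !== "object" || !(token in target)) {
      throw new Error("Unresolvable pointer: " + pointer);
    }
    target = target[token];
  }
  return target;
}

// Follow {"$ref": "#/..."} objects until a non-reference value is reached.
function resolveChained(doc, value) {
  const seen = new Set();
  while (value !== null && typeof value === "object" && typeof value.$ref === "string") {
    const ref = value.$ref;
    if (!ref.startsWith("#")) throw new Error("External reference not supported: " + ref);
    if (seen.has(ref)) throw new Error("Circular reference: " + ref);
    seen.add(ref);
    value = lookup(doc, ref.slice(1));
  }
  return value;
}

const chained = {
  a: { $ref: "#/b" },
  b: { $ref: "#/c" },
  c: "value of c"
};
console.log(resolveChained(chained, chained.a)); // prints "value of c"
```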

@brettz9

brettz9 commented Nov 1, 2016

@ruifortes : I believe there are currently some issues with circular references that the author has stated he has a local fix for and is currently working on finishing. I believe he removed the ability to build circular objects though, but his library can nevertheless handle that kind of reference in some manner. If you are looking in the context of validation, I believe Ajv may handle circular validation, but I'm not sure how it works with remote references. I need to investigate this myself more carefully when I have the chance. Good luck!

@handrews
Contributor Author

This thread that I started on the google group is very relevant to this discussion:
https://groups.google.com/forum/#!topic/json-schema/cG4HAyerqQk

handrews changed the title from "v6 annotation: Multilingual meta data" to "annotation: Multilingual meta data" on Nov 24, 2016

@awwright
Member

awwright commented Dec 3, 2016

I think we should withhold this until we take a closer look at I18N/L10N.

As of right now, JSON Schema doesn't provide any text for end users, it's strictly for machines (with some opaque text properties for developers to use, like "title" and "description").

If we do JSON Schema UI, that'll be a concern there too: Presumably, rendered forms will want to present localized text fields.

@Relequestual
Member

In light of the previous comment, and since I agree with the following:

Regarding the question of external files, that's a big part of why I don't think that JSON Schema should be in the business of specifying I18N/L10N at all. I18N/L10N is an issue for JSON-formatted data independent of JSON Schema.
(@handrews)

I'm closing this issue. If later you still feel this is an important thing to do, open a new issue referencing this one.

@handrews
Contributor Author

handrews commented Jan 4, 2017

As of right now, JSON Schema doesn't provide any text for end users, it's strictly for machines (with some opaque text properties for developers to use, like "title" and "description").

...except for the part where the spec for "title" and "description" reads:

Both of these keywords can be used to decorate a user interface with information about the data produced by this user interface.

I really want to get the annotation fields sorted. There are regularly statements here about using them that are more restrictive or (like in this case) flatly contradict what the spec actually says. "default" is the worst (see #204 and #217) but as this shows, "title" and "description" are also problematic.

@wanderingstan

Has there been any additional work around localization for JSON Schema, or any exemplar projects that have found working solutions? I haven't found any other issues on the subject.

We need schema localization for our project, and would rather not be trailblazing if best (or at least workable) practices have been established. (Our discussion happening here: OriginProtocol/origin#256 )

@gregsdennis
Member

gregsdennis commented Jun 12, 2018

@handrews this would be an excellent use case for the $data keyword proposal #549 #51.

@handrews
Contributor Author

handrews commented Jun 12, 2018

@wanderingstan nothing recent has happened, we're more than full up with more fundamental concerns for draft-08.

I'm still not convinced that this belongs in JSON Schema, but a key focus of draft-08 is enabling well-defined extension vocabularies, so it would make it easier for you (or anyone) to write an I18N vocabulary and get it supported by implementations that support extensions. Draft-08 should make it clear how implementations can support extensions without everyone having to make up their own mechanism.

@gregsdennis I'm not sure I follow how $data fits in, but in general I am leaving $data to @awwright and/or @Relequestual (aside from ensuring the core spec allows for keywords that interact with the instance). I'm also unlikely to pick up I18N, given that I have at least two years' worth of backlogged work between draft-08, hypermedia operations, and API documentation (based on an assumption of drafts every 6 months and a ballpark estimate of how many iterations will be needed on each).

A major reason why I'm focusing on vocabularies for draft-08 is to make it easier for these other topics to move ahead without significant involvement on my part.

@handrews
Contributor Author

@wanderingstan I would really recommend seeing if there are other I18N solutions for JSON which could be used alongside of JSON Schema. Surely someone has worked on this somewhere? JSON-LD being one example but I imagine there must be others?

@gregsdennis
Member

gregsdennis commented Jun 13, 2018

@handrews / @Relequestual / @awwright I corrected my issue reference above.

If external data (perhaps through combining $data and $ref) could specify all supported translations, then the correct translation could be extracted through a use of the $data keyword. The only unknown is passing the desired culture as a parameter.

This would allow the schema to remain concise by not cluttering it up with multiple translations.

@awwright
Member

I still think this needs more implementations and research.
JSON Schema takes a different enough approach it's difficult to make analogies from HTML to JSON documents.
Normally, different languages would be presented as different HTML documents, and could be negotiated and selected between by a user-agent. I don't know if that makes sense here, if there's an analogous way to do that at all.
Nonetheless, clients should be able to discover alternate languages and select between them.

@jordajm

jordajm commented Jun 21, 2018

FWIW, here's how @wanderingstan and I solved the localization problem for our JSON schemas in our distributed application:

We were already using react-intl for our localization library, so we decided to swap out all of the English strings in our JSON schemas for react-intl message IDs like this:

Before:

    "location": {
      "type": "string",
      "title": "Location"
    },

After:

    "location": {
      "type": "string",
      "title": "schema.housing.location"
    },

For more examples, all of our schemas can be found here.

Then, we wrote a util function to replace the IDs with message strings in the user's preferred language. The full function can be found here, but this is the gist of it:

  if (schema.description) {    
    schema.description = globalIntlProvider.formatMessage(schemaMessages[schemaType][schema.description])
  }

So now we just have to call translateSchema before we use any schema and all the strings are localized. For example:

<Form
  schema={translateSchema(this.state.selectedSchema, this.state.selectedSchemaType)}
  onSubmit={this.onDetailsEntered}
  formData={this.state.formListing.formData}
>
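
The full translateSchema function is only linked above, so the following is a standalone sketch of the same idea rather than the authors' code. It assumes a flat `messages` lookup (message ID to localized string) in place of react-intl's formatMessage, and the name `translateSchemaSketch` is hypothetical.

```javascript
// Hypothetical sketch: recursively replace "title" and "description"
// message IDs with localized strings from a flat lookup table. IDs with no
// entry are left unchanged.
function translateSchemaSketch(schema, messages) {
  if (Array.isArray(schema)) {
    return schema.map((item) => translateSchemaSketch(item, messages));
  }
  if (schema === null || typeof schema !== "object") return schema;

  const out = {};
  for (const [key, value] of Object.entries(schema)) {
    if ((key === "title" || key === "description") && typeof value === "string") {
      out[key] = Object.prototype.hasOwnProperty.call(messages, value)
        ? messages[value]
        : value;
    } else {
      out[key] = translateSchemaSketch(value, messages);
    }
  }
  return out;
}

const housing = {
  type: "object",
  properties: {
    location: { type: "string", title: "schema.housing.location" }
  }
};
const translated = translateSchemaSketch(housing, {
  "schema.housing.location": "Standort"
});
console.log(translated.properties.location.title); // prints "Standort"
```

The string check on `title` and `description` means an instance property that happens to be named "title" (whose value under `properties` is a subschema object) is traversed rather than replaced.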

@gregsdennis
Member

That's a really good example of why this is an application design concern rather than a JSON Schema concern.

You're basically designing your schema to return well-known strings (keys) for error messages and annotations. Those strings are then converted (via lookup or whatever) to the localized messages. The only difference is you convert the keys in the schema before evaluation.

It's a good design; I just think it should stay a part of the application.

@samuelstroschein

That's a really good example of why this is an application design concern rather than a JSON Schema concern.
It's a good design; I just think it should stay a part of the application.

Controlling the consuming application is not a given thing.

For example, we are using JSONSchema to provide IDEs and text editors like VSCode with IntelliSense for our config. We don't control the i18n of those applications. Hence, a use case like ours seems to require JSON Schema to define how to handle translations.

In fact, I would argue that interop is the beauty of JSON Schema. To provide translation interop, JSON Schema must define handling translations as part of the spec.


11 participants