Unclear Schema Evolution for JSON Schemas #1778

Closed
tfactor2 opened this issue Feb 22, 2021 · 7 comments

Comments

@tfactor2

Hi,

Preface

The problems described in Schema Evolution and Compatibility are generic ones (the order of updating producers/consumers, structural changes), as stated there explicitly:

An important aspect of data management is schema evolution. After the initial schema is defined, applications may need to evolve it over time. When this happens, it’s critical for the downstream consumers to be able to handle data encoded with both the old and the new schema seamlessly. This is an area that tends to be overlooked in practice until you run into your first production issues. Without thinking through data management and schema evolution carefully, people often pay a much higher cost later on.

When using Avro or other schema formats, one of the most important things is to manage the schemas and consider how these schemas should evolve.

On the other hand, the JSON Schema Compatibility Rules article states that:

The JSON Schema compatibility rules are loosely based on similar rules for Avro, however, the rules for backward compatibility are more complex.
The quoted statement is part of JSON Schema Compatibility Rules, which is a serde-specifics article not related to the general Schema Evolution problems.

Problem Statement

Compatibility-check related problems are stated as being resolvable with Confluent Schema Registry, but they are not if JSON Schema is used.
At the very least, the Schema Evolution and Compatibility article is confusing and may lead to wrong assumptions about the capabilities of Confluent Schema Registry.

With JSON Schema, the compatibility checks done by Confluent Schema Registry have to be disabled (at least, that is how I understood it):

From #10 in Test Drive JSON Schema:

Update the compatibility requirements globally.

curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "NONE"}' \
  http://localhost:8081/config
Example result (this is the default):

{"compatibility":"NONE"}

If you do not update the compatibility requirements, the following step will fail on a different error than the one being demonstrated here, due to the BACKWARD compatibility setting. For more examples of using curl against the APIs to test and set configurations, see Schema Registry API Usage Examples.
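
For reference, the same relaxation can also be applied per subject instead of globally via the Schema Registry REST API; a sketch (the subject name below is illustrative):

curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "NONE"}' \
  http://localhost:8081/config/my-topic-value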

The critical conceptual restriction is described in the Test Drive section, which is part of a deep, serde-related how-to article.

Proposed Solution

  1. Explicitly mention in Schema Evolution and Compatibility that JSON Schema doesn't support schema evolution checks (or at least has only limited support). AND/OR
  2. Handle additionalProperties in a more advanced manner and set it automatically - for example, for BACKWARD compatible topics set additionalProperties: false for producers and additionalProperties: true for consumers (see the sketch below).
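
A minimal sketch of what option 2 could mean in practice, assuming the registry (or tooling) derives two variants from one registered schema - a closed variant used on the producer side and an open variant used for consumer-side validation (the schema itself is purely illustrative):

Producer-side (writer) variant:

{
  "type": "object",
  "properties": {
    "ID": { "type": "string" }
  },
  "additionalProperties": false
}

Consumer-side (reader) variant:

{
  "type": "object",
  "properties": {
    "ID": { "type": "string" }
  },
  "additionalProperties": true
}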

PS: The ticket is related to the closed issue 1458.

@kijanowski

I still think the proposed solution requires one more thing to be mentioned. A typical schema evolution scenario looks like this:

  1. apply a schema to a topic in the Schema Registry
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "required": [
    "NAME",
    "ID"
  ],
  "properties": {
    "NAME": {
      "type": "string"
    },
    "ID": {
      "type": "string"
    }
  },
  "additionalProperties": false
}
  2. produce a message using the Schema Registry (value_schema_id in payload)
  3. consume the message with a Java client
  4. stop the Java client
  5. add an optional field with a default value, like this:
    "NEW": {
      "oneOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": "a default value"
    }
  6. apply it in the Schema Registry (the full evolved schema is shown after this list)

  7. produce another message using the Schema Registry according to the old schema (same as in step 2, use the same old value_schema_id)

  8. start the Java consumer and wait for it to blow up
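
For reference, the evolved schema applied in step 6 is simply the schema from step 1 plus the optional NEW property (additionalProperties stays false):

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "required": [
    "NAME",
    "ID"
  ],
  "properties": {
    "NAME": {
      "type": "string"
    },
    "ID": {
      "type": "string"
    },
    "NEW": {
      "oneOf": [
        { "type": "string" },
        { "type": "null" }
      ],
      "default": "a default value"
    }
  },
  "additionalProperties": false
}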

The client should be able to read messages written with the old schema, but this works only if the json.fail.invalid.schema config property is set to false. That is actually the default, which is why the kafka-json-schema-console-consumer works fine in the demo in the docs.
But is it really supposed to work that way - that I have to disable schema validation to get it working?
I've introduced a backward-compatible change and would expect the consumer to be able to read both kinds of messages (sent with the old as well as the new schema).
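
For context, this is roughly the consumer configuration in play - a minimal sketch in properties form; the json.* property names come from the Confluent JSON Schema serde, and the DTO class name is hypothetical:

bootstrap.servers=localhost:9092
group.id=demo-consumer
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=io.confluent.kafka.serializers.json.KafkaJsonSchemaDeserializer
schema.registry.url=http://localhost:8081
# defaults to false; setting it to true enables the validation that fails as described below
json.fail.invalid.schema=true
# hypothetical DTO class generated from the new schema
json.value.type=com.example.MyRecord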

Just to be clear where the issue lies: deserialization itself works well; the payload is converted into a DTO built from the new schema, and the additional field is set to the given default value.
What fails is the validation of that DTO (built from the new schema) against the old schema whose id was attached to the message. The code is in the deserializer:
https://github.com/confluentinc/schema-registry/blob/master/json-schema-serializer/src/main/java/io/confluent/kafka/serializers/json/AbstractKafkaJsonSchemaDeserializer.java#L131

As a result validation fails with:

org.everit.json.schema.ValidationException: #: extraneous key [NEW] is not permitted

And this is not a surprise, since additionalProperties is set to false in the old schema, which is in line with the docs w.r.t. schema evolution:

For example, a reader’s schema can add an additional property, say myProperty, to those of the writer’s schema, but it can only be done in a backward compatible manner if the writer’s schema has a closed content model. This is because if the writer’s schema has an open content model, then the writer may have produced JSON documents with myProperty using a different type than the type expected for myProperty in the reader’s schema.
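
To make the quoted rule concrete, here is an illustrative pair of schemas (myProperty as in the quote): with a closed writer's schema, the reader can safely add myProperty, because the writer could never have produced a document containing it with a conflicting type.

Writer's schema (closed content model):

{
  "type": "object",
  "properties": {
    "ID": { "type": "string" }
  },
  "additionalProperties": false
}

Reader's schema (adds myProperty):

{
  "type": "object",
  "properties": {
    "ID": { "type": "string" },
    "myProperty": { "type": "string" }
  },
  "additionalProperties": false
}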

So I wonder what the right way is to do schema evolution with JSON Schema. Set additionalProperties to false AND leave schema validation disabled (as it is by default)?

@rayokota
Member

rayokota commented Apr 6, 2021

See https://yokota.blog/2021/03/29/understanding-json-schema-compatibility/

@bharti-gwalani

Hi, we are facing the same issue and are not sure what the fix is. Could you please help?

@nuria

nuria commented Mar 24, 2023

I do not think Java issues play a part in the problem described by @tfactor2.
The docs are confusing mostly because they are written from the Avro perspective, where reader and writer schemas can differ and compatibility is precisely specified. In JSON Schema, reader and writer schemas can also differ, but in practice the rules are more lenient and additional non-required fields are OK.

With this definition of backward compatibility for JSON schemas:

Backward compatibility – all documents that conform to the previous version of the schema are also valid according to the new version (i.e. we can always validate data with the latest schema, even for not-yet-updated producers)

As such, it seems that adding a non-required property to a closed content model should always be backward compatible.
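
A quick way to see what the registry itself decides is its compatibility endpoint; a sketch (subject name and candidate schema are illustrative), whose response should contain an is_compatible flag:

curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schemaType": "JSON", "schema": "{\"type\":\"object\",\"required\":[\"ID\"],\"properties\":{\"ID\":{\"type\":\"string\"},\"NEW\":{\"type\":\"string\"}},\"additionalProperties\":false}"}' \
  http://localhost:8081/compatibility/subjects/my-topic-value/versions/latest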

@agates4

agates4 commented Jun 7, 2023

There is indeed a bug here. Within JSON schemas, closed content models are inaccurately identified as open content models, and when a non-required property is added to such a closed content model, the new schema is marked as incompatible because of that misidentification.

@nuria @tfactor2 do you experience this as well?

@nuria

nuria commented Jun 7, 2023

I have certainly seen new non-required properties render a schema incompatible regardless of how the schema is defined, which should be incorrect behaviour for an open content model.

@big-andy-coates
Contributor

I ran into the same issue. In fact, if you're looking for full compatibility then you're out of luck. While it's technically possible to add and remove optional properties with full compatibility checks, there are some serious hoops to jump through. This makes it, IMHO, not workable as a solution.

I've detailed my findings in a quick post, the second part of which proposes a more user-friendly approach to implementing JSON Schema compatibility checks.

I've raised #2927 to get feedback and look at the potential of including this in the Schema Registry.
