Issue adding fields to stream using JSON_SR #7707

Open
mikebin opened this issue Jun 21, 2021 · 3 comments
Labels: bug, P1 (Slightly lower priority to P0 ;)), planning, streaming-engine (Tickets owned by the ksqlDB Streaming Team)

Comments


mikebin commented Jun 21, 2021

Describe the bug
Using CREATE OR REPLACE STREAM to add new field(s) fails when the value format is JSON_SR.

To Reproduce
Version: ksqlDB 6.2.0/0.17.0

-- Initial stream with 1 field
create stream s (val string) with (kafka_topic='s', value_format='json_sr', partitions=1);

-- Modify stream to add field
create or replace stream s (val string, newfield string) with (kafka_topic='s', value_format='json_sr', partitions=1);

-- Insert will fail attempting to register new schema version
insert into s (val, newfield) values ('a', '1');

Expected behavior
New schema version will be registered containing newfield.

Actual behavior
Failed to insert values into 'S'. Could not serialize value: [ 'a' | '1' ]. Error serializing message to topic: s. Failed to access Avro data from topic s : Schema being registered is incompatible with an earlier schema for subject "s-value"; error code: 409

Note the error message above says Avro, even though this is JSON Schema. That should also be corrected.

Log from schema registry:

[2021-06-21 09:45:30,307] WARN Found incompatible change: Difference{jsonPath='#/properties/NEWFIELD', type=PROPERTY_ADDED_TO_OPEN_CONTENT_MODEL} (io.confluent.kafka.schemaregistry.json.JsonSchema:322)

Additional context

  • The same steps above work fine for AVRO and PROTOBUF; the issue only occurs with JSON_SR.
  • The problem might be that ksqlDB needs to set "additionalProperties": false when registering JSON schemas.
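To make the open-vs-closed distinction concrete, here is a minimal Python sketch — emphatically not Schema Registry's implementation; the toy validator handles only string-typed properties and the additionalProperties flag — showing why adding a property to an open content model fails the backward-compatibility check:

```python
# Sketch of why PROPERTY_ADDED_TO_OPEN_CONTENT_MODEL is reported.
# "Backward compatible" here means: every document the OLD schema
# accepts must also be accepted by the NEW schema.

def validates(schema, doc):
    """Tiny JSON-Schema subset: string-typed properties + additionalProperties."""
    props = schema.get("properties", {})
    for key, value in doc.items():
        if key in props:
            if not isinstance(value, str):  # all declared properties here are strings
                return False
        elif not schema.get("additionalProperties", True):
            return False  # closed model rejects unknown keys
    return True

string = {"type": "string"}
old_open = {"properties": {"VAL": string}}  # additionalProperties defaults to true
new_open = {"properties": {"VAL": string, "NEWFIELD": string}}
old_closed = {"properties": {"VAL": string}, "additionalProperties": False}

# Counterexample: the open old schema accepts NEWFIELD with ANY type,
# but the new schema pins NEWFIELD to string -> not backward compatible.
doc = {"VAL": "a", "NEWFIELD": 1}
assert validates(old_open, doc)
assert not validates(new_open, doc)

# A closed old schema never accepted unknown keys, so adding NEWFIELD
# only widens the set of accepted documents -> compatible.
assert not validates(old_closed, doc)
```

This is the same reasoning the blog post linked below walks through in full.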
@colinhicks colinhicks added the streaming-engine Tickets owned by the ksqlDB Streaming Team label Jun 21, 2021
@vcrfxia vcrfxia added P0 Denotes must-have for a given milestone and removed needs-triage labels Jun 22, 2021

colinhicks commented Jul 21, 2021

Partial workaround: Manually register the initial schema with additionalProperties: false. This must be done before creating the stream.

curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data '{"schemaType":"JSON","schema":"{\"type\":\"object\",\"properties\":{\"VAL\":{\"connect.index\":0,\"oneOf\":[{\"type\":\"null\"},{\"type\":\"string\"}]}}, \"additionalProperties\":false}"}' \
http://localhost:8081/subjects/s-value/versions
ksql> create stream s (val string) with (kafka_topic='s', value_format='json_sr', partitions=1);

 Message
----------------
 Stream created
----------------
ksql> create or replace stream s (val string, newfield string) with (kafka_topic='s', value_format='json_sr', partitions=1);

 Message
----------------
 Stream created
----------------
ksql> insert into s (val, newfield) values ('a', '1');

ksql> select * from s emit changes;
+----+---------+
|VAL |NEWFIELD |
+----+---------+
|a   |1        |

This works because the existing schema uses the closed model (additionalProperties: false). Unfortunately, because ksqlDB doesn't specify additionalProperties: false in the request, this value defaults to true in the update.
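For illustration — assuming the field shape shown in the curl command above — the version 2 schema ksqlDB registers for the evolved stream presumably looks like the following. Note that additionalProperties is absent, so it defaults to true and reopens the content model:

```json
{
  "type": "object",
  "properties": {
    "VAL": {"connect.index": 0, "oneOf": [{"type": "null"}, {"type": "string"}]},
    "NEWFIELD": {"connect.index": 1, "oneOf": [{"type": "null"}, {"type": "string"}]}
  }
}
```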

That means a second evolution of the query won't work.

ksql> create or replace stream s (val string, newfield string, anotherfield string) with (kafka_topic='s', value_format='json_sr', partitions=1);

ksql> insert into s (val, newfield, anotherfield) values ('a', '1', 'b');
Failed to insert values into 'S'. Could not serialize value: [ 'a' | '1' | 'b' ]. Error serializing message to topic: s. Failed to access Avro data from topic s : Schema being registered is incompatible with an earlier schema for subject "s-value"; error code: 409

It's also not possible to manually update the schema again: once additionalProperties is set to true, or defaults to true, it's effectively sticky. Even a subsequent manual request that sets additionalProperties: false will not pass the compatibility test.

@colinhicks

A more complete workaround is to always manually update the schema, with each evolution, before using ksqlDB to insert data.
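That per-evolution step can be scripted. Below is a hedged sketch — the field shape mirrors the curl example above, and Schema Registry at localhost:8081 plus all-nullable-string columns are assumptions — that builds the closed schema for the current column list and registers it before running CREATE OR REPLACE:

```python
# Hedged sketch of the "manually register the closed schema before each
# evolution" workaround. build_closed_schema() mirrors the connect.index /
# oneOf shape from the curl example, but pins "additionalProperties": false.
import json
import urllib.request

def build_closed_schema(fields):
    """fields: ordered list of column names; all typed as nullable string here."""
    props = {
        name: {"connect.index": i, "oneOf": [{"type": "null"}, {"type": "string"}]}
        for i, name in enumerate(fields)
    }
    return {"type": "object", "properties": props, "additionalProperties": False}

def register(subject, schema, base_url="http://localhost:8081"):
    """POST the schema to Schema Registry (address is an assumption)."""
    body = json.dumps({"schemaType": "JSON", "schema": json.dumps(schema)}).encode()
    req = urllib.request.Request(
        f"{base_url}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Before each CREATE OR REPLACE, register the matching closed schema, e.g.:
# register("s-value", build_closed_schema(["VAL", "NEWFIELD", "ANOTHERFIELD"]))
```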


colinhicks commented Jul 21, 2021

And, as @mikebin suggested, it looks like having ksqlDB itself set "additionalProperties": false would avoid the need for any workaround. However, we should first understand if the default was used on purpose for some other reason.

It's worth noting there is some general confusion about how schema evolution works with the JSON schema format: confluentinc/schema-registry#1778. I found the closed vs. open model controlled by additionalProperties to be counterintuitive. This blog post is a good overview: https://yokota.blog/2021/03/29/understanding-json-schema-compatibility/

@colinhicks colinhicks self-assigned this Jul 23, 2021
@suhas-satish suhas-satish added P1 Slightly lower priority to P0 ;) and removed P0 Denotes must-have for a given milestone labels Feb 1, 2022