Let's clarify the required definitions to understand the issue properly:
Topic: This is a mechanism used by Kafka to group different messages.
Schema: It's a description of the structure that a message must adhere to.
Subject: An entity required by the Schema Registry to establish a relationship between a schema and a topic. It is used to check the right to publish specific messages to a topic.
The expected flow, given the entities introduced above, is as follows:
We expect the user to create a schema.
The user should register the schema with a specific topic using a designated Subject.
The user should then publish a message to a specific topic using a certain Subject.
This implies verifying that the Subject is allowed/registered to publish to that particular topic.
It also involves verifying that the message structure complies with the schema associated with the Subject.
To summarize the types of relationships we could have:
One topic could be associated with one or more subjects.
One subject could be associated with one or more topics.
One subject is always associated with one schema.
To calculate the subject deterministically, we need:
The policy name (an enum representing how we calculate the subject from the provided inputs, listed here).
The topic name.
The record name (the namespace of the record).
A prefix for the subject (used to differentiate between key and value schemas, particularly in the PROTOBUF case).
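The four inputs above can be combined roughly as in the sketch below. It is only an illustration, not the project's actual code: `SubjectPolicy` and `compute_subject` are hypothetical names, and the output formats follow the usual Confluent naming conventions (`<topic>-key` / `<topic>-value`, the record name alone, or `<topic>-<record name>`). In particular, the key/value marker that this issue calls a prefix is appended as a suffix here, which is an assumption.

```python
from enum import Enum
from typing import Optional


class SubjectPolicy(Enum):
    """Hypothetical enum mirroring the three policy names mentioned in this issue."""

    TOPIC_NAME = "topic_name"
    RECORD_NAME = "record_name"
    TOPIC_RECORD_NAME = "topic_record_name"


def compute_subject(
    policy: SubjectPolicy,
    topic: str,
    record_name: Optional[str],
    subject_type: str,
) -> str:
    """Derive a subject name deterministically from the given inputs.

    subject_type is "key" or "value" (assumed representation of the
    key/value marker described above).
    """
    if subject_type not in ("key", "value"):
        raise ValueError(f"unknown subject type: {subject_type!r}")
    if policy is SubjectPolicy.TOPIC_NAME:
        # Only the topic and the key/value marker are needed.
        return f"{topic}-{subject_type}"
    if record_name is None:
        # The two remaining policies depend on the schema contents.
        raise ValueError(f"{policy.value} requires a record name")
    if policy is SubjectPolicy.RECORD_NAME:
        return record_name
    return f"{topic}-{record_name}"
```

Note that only the `topic_name` branch can be evaluated without looking at the schema itself.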
Current Issue
Currently, we have only one policy to associate a Subject with a topic, known as topic_name. This strategy is used to calculate the subject for a given schema.
With this specific policy, the relationship between topic <-> schema <-> subject is bi-directional (i.e., these entities have a one-to-one relationship). Therefore, given the topic and the schema, we can automatically compute the associated subject.
We use this property to enable users to produce messages by providing only the schema ID as a parameter, instead of the entire schema. This allows us to retrieve the schema based on its ID, calculate the unique associated subject, and check if the subject is registered for the targeted schema.
This approach works well for all cases except for Protobuf. Currently, we query the database for the schema and check if it's associated with the targeted topic, without verifying whether it's registered as a key or a value schema (meaning a user can switch between key and value by simply using the schema body). This issue needs to be addressed.
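To make the gap concrete, here is a minimal sketch contrasting a topic-only check with one that also takes the key/value role into account. Everything in it is assumed for illustration: the in-memory `registrations` mapping stands in for the database, the function names are hypothetical, and subjects are assumed to be named `<topic>-key` / `<topic>-value`.

```python
from typing import Dict, Set

# Hypothetical stand-in for the database: subject name -> registered schema IDs.
Registrations = Dict[str, Set[int]]


def is_associated_with_topic(registrations: Registrations, topic: str, schema_id: int) -> bool:
    """Loose check: accepts the schema if it is registered under either
    the key or the value subject of the topic."""
    return any(
        schema_id in registrations.get(f"{topic}-{role}", set())
        for role in ("key", "value")
    )


def is_registered_for_role(
    registrations: Registrations, topic: str, schema_id: int, role: str
) -> bool:
    """Strict check: the schema must be registered under the subject that
    matches the role ("key" or "value") it is being used for."""
    if role not in ("key", "value"):
        raise ValueError(f"unknown role: {role!r}")
    return schema_id in registrations.get(f"{topic}-{role}", set())
```

With `registrations = {"orders-key": {1}, "orders-value": {2}}`, the loose check accepts schema 1 for the topic regardless of how it is used, while the strict check rejects schema 1 when it is used as a value schema.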
A longer-term design problem is that this property holds only when the strategy is topic_name. In the future, before producing a message from schema IDs alone, we need to check whether the subject can be computed directly or whether we also need the schema (as with the record_name strategy or the topic_record_name strategy).
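Such a guard could be as small as the following sketch. The function name and the string representation of the strategies are assumptions made for illustration; only the three strategy names come from this issue.

```python
# Strategy names taken from this issue; the grouping below is an assumption.
KNOWN_STRATEGIES = frozenset({"topic_name", "record_name", "topic_record_name"})
# Strategies whose subject does not depend on the schema contents.
SCHEMA_INDEPENDENT_STRATEGIES = frozenset({"topic_name"})


def subject_computable_without_schema(strategy: str) -> bool:
    """Return True if the subject can be derived from the topic alone,
    i.e. producing with only a schema ID is unambiguous."""
    if strategy not in KNOWN_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return strategy in SCHEMA_INDEPENDENT_STRATEGIES
```

A producer path could call this first and fall back to requiring the schema (or reject the request) when it returns `False`.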
A more general solution would be to formalize the Subject object and assign it an ID.
This way, even if the relationship between subject and schema is not unique, we can directly verify if the subject is allowed and if the message structure complies with the schema by retrieving the subject from the database and then obtaining the associated schema using the subject ID.
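One possible shape for this proposal is sketched below. It is not an existing API: `Subject`, `SubjectStore`, and their fields are hypothetical, and an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Subject:
    """Hypothetical first-class Subject with its own ID."""

    subject_id: int
    name: str
    schema_id: int            # one subject is always associated with one schema
    topics: FrozenSet[str]    # one subject may be associated with several topics


class SubjectStore:
    """In-memory stand-in for the database lookup by subject ID."""

    def __init__(self) -> None:
        self._by_id: Dict[int, Subject] = {}

    def add(self, subject: Subject) -> None:
        self._by_id[subject.subject_id] = subject

    def schema_id_for(self, subject_id: int, topic: str) -> int:
        """Check that the subject may publish to the topic, then return
        the ID of the schema the message must comply with."""
        subject = self._by_id.get(subject_id)
        if subject is None:
            raise KeyError(f"unknown subject id: {subject_id}")
        if topic not in subject.topics:
            raise PermissionError(
                f"subject {subject.name!r} is not registered for topic {topic!r}"
            )
        return subject.schema_id
```

With this layout the producer would send a subject ID instead of a schema ID, so the lookup no longer relies on the subject being computable from the topic and schema.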