Design: Range queries V1.1 Range over value fields
Development: 0.9
last update: 28.10.2022
This issue describes our approach to supporting range queries over value fields in Quick.
Problem definition
Imagine a scenario in which we want to run analytics over a user's click counts within a range of time. Users are allowed to edit their click count. This means the key must carry both the userId and the timestamp (in a format like userId:timestamp); otherwise, a later record would overwrite an earlier one and we would lose the updated information. So the data would look like this:
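Pay attention to userId 2. If the key were only the integer value 2, we would have lost the values at timestamp 0. Now let's create the GraphQL schema and use the range query functionality:
We then create a topic with a string key and the value schema UserMetric, and ingest the JSON data defined above:
quick topic user-request-range --key string --value schema --schema gateway.UserMetric --range-field timestamp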
The problem is that we can only query over the key of the topic. We cannot, for example, query the data of userId 2 over a range of timestamps. This design document discusses possible designs and solutions to overcome this limitation.
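To make the limitation concrete, here is a minimal Python sketch. The sample records, the clickCount field name, and the in-memory dictionaries are assumptions made for illustration; the dictionaries only stand in for a keyed topic and are not how Quick stores data.

```python
# Hypothetical sample records: (userId, timestamp, clickCount).
records = [
    (1, 0, 5),
    (2, 0, 10),
    (2, 1, 12),  # user 2 edits the click count at a later timestamp
]

# Keyed by userId only: the later record for user 2 overwrites the earlier one.
by_user = {}
for user_id, timestamp, clicks in records:
    by_user[user_id] = {"timestamp": timestamp, "clickCount": clicks}

# Keyed by "userId:timestamp": every record survives.
by_composite = {}
for user_id, timestamp, clicks in records:
    by_composite[f"{user_id}:{timestamp}"] = {"timestamp": timestamp, "clickCount": clicks}

# With only the composite key available, collecting all records of user 2
# means scanning and parsing every key.
user_2 = [value for key, value in by_composite.items() if key.split(":")[0] == "2"]
```

The composite key keeps every record, but the only way to reach the records of one user is a full scan, which is the gap this design aims to close.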
Goals
Quick CLI: Users can set the --range-key option when creating a topic that supports range queries
Gateway: The gateway is aware of the type of the newly defined range-key field
Mirror: The mirror repartitions the data based on the defined range-key
Out of scope
The range-key can only be applied to fields of primitive types (Int, Long, String); fields of any other type are out of scope
Implementation
1. Quick CLI
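Goal: Users can set the --range-key option when creating a topic that supports range queries
When creating a topic, the user can pass a --range-key <FieldName> option. The manager passes the value to the deployment of the mirror. Example:
quick topic user-metrics --key string --value schema --schema gateway.UserMetric --range-key userId --range-field timestamp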
This command sends a request to the manager, and the manager prepares the deployment of a mirror called user-metrics. This mirror creates two indexes:
Range Index over the new key (userId) and timestamp
Point Index only over the new key (userId)
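The two indexes can be pictured with a small in-memory sketch. Everything here is an assumption for illustration: the MirrorIndexes class, the zero-padded key layout, the inclusive bounds, and the choice that the point index holds the most recently written value are hypothetical and not taken from Quick's implementation. The sketch also assumes non-negative integer userIds and timestamps.

```python
import bisect

PAD = 19  # assumed width: enough digits for a non-negative 64-bit value


def range_index_key(user_id, timestamp):
    """Build a string key whose lexicographic order matches (userId, timestamp) order."""
    return f"{user_id:0{PAD}d}_{timestamp:0{PAD}d}"


class MirrorIndexes:
    """Hypothetical stand-in for the mirror's range index and point index."""

    def __init__(self):
        self._range_keys = []    # sorted range-index keys
        self._range_values = {}  # range-index key -> value
        self._point = {}         # userId -> most recently written value

    def put(self, user_id, timestamp, value):
        key = range_index_key(user_id, timestamp)
        if key not in self._range_values:
            bisect.insort(self._range_keys, key)
        self._range_values[key] = value
        self._point[user_id] = value

    def get(self, user_id):
        """Point query over the new key only."""
        return self._point.get(user_id)

    def range(self, user_id, ts_from, ts_to):
        """Range query: values of user_id with ts_from <= timestamp <= ts_to."""
        lo = bisect.bisect_left(self._range_keys, range_index_key(user_id, ts_from))
        hi = bisect.bisect_right(self._range_keys, range_index_key(user_id, ts_to))
        return [self._range_values[k] for k in self._range_keys[lo:hi]]


idx = MirrorIndexes()
idx.put(2, 0, {"clickCount": 10})
idx.put(2, 1, {"clickCount": 12})
idx.put(1, 0, {"clickCount": 5})
idx.put(10, 0, {"clickCount": 7})
```

Zero-padding keeps userId 10 from sorting between the entries of userId 1 and userId 2, so a range scan for one user never picks up another user's records.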
2. Gateway
Goal: The gateway is aware of the type of the newly defined range-key field
The gateway must create its partitioned mirror client based on the newly defined key. Currently, it uses the information in the topic registry (i.e., the key SerDe) to serialize keys and find the partition. This should change to use the type of the new key instead. One idea is to take the type of keyArgument and supply the matching SerDe.
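The following sketch shows why the SerDe choice matters for routing. The function names are hypothetical, and the CRC32 hash is a stand-in chosen to keep the example self-contained; Kafka's default partitioner hashes the serialized key with murmur2. The byte layouts follow Kafka's built-in Integer, Long, and String serializers.

```python
import struct
import zlib


def serialize_key(value, key_type):
    """Serialize a range key according to its declared type (Int, Long, String)."""
    if key_type == "Int":
        return struct.pack(">i", value)     # 4 bytes, big-endian
    if key_type == "Long":
        return struct.pack(">q", value)     # 8 bytes, big-endian
    if key_type == "String":
        return str(value).encode("utf-8")   # UTF-8 bytes
    raise ValueError(f"unsupported range-key type: {key_type}")


def partition_for(key_bytes, num_partitions):
    # Stand-in hash for illustration only; Kafka uses murmur2 here.
    return zlib.crc32(key_bytes) % num_partitions


# The same logical key produces different bytes under different SerDes,
# so it can be routed to a different partition.
as_int = serialize_key(2, "Int")
as_string = serialize_key(2, "String")
int_partition = partition_for(as_int, 3)
string_partition = partition_for(as_string, 3)
```

If the gateway kept using the topic's original key SerDe (a string in the example above) while the mirror partitions by an Int userId, the bytes would not match and requests could reach a replica that does not hold the data.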
3. Mirror
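Goal: The mirror repartitions the data based on the defined range-key
The mirror should use the selectKey method to set the range-key field as the new record key. Because selectKey changes the key, Kafka Streams marks the stream for repartitioning. Once a key-based operation (or an explicit repartition() call) follows, Kafka Streams will:
Send the rekeyed data to an internal repartition topic
Reread the newly rekeyed data back into Kafka Streams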
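The effect of rekeying and repartitioning can be simulated in a few lines. This is a plain-Python sketch, not Kafka Streams code: select_key, partition_for, and rekey_and_repartition are hypothetical helpers, the sample records are made up, and CRC32 again stands in for Kafka's murmur2-based partitioner.

```python
import zlib
from collections import defaultdict


def select_key(value, range_key):
    """Return the new key: the value's range-key field (for example, userId)."""
    return value[range_key]


def partition_for(key, num_partitions):
    # Stand-in for Kafka's default partitioner, which applies murmur2 to the serialized key.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def rekey_and_repartition(records, range_key, num_partitions):
    """Rekey (old_key, value) records and group them by the partition of the new key."""
    partitions = defaultdict(list)
    for _old_key, value in records:
        new_key = select_key(value, range_key)
        partitions[partition_for(new_key, num_partitions)].append((new_key, value))
    return dict(partitions)


# Hypothetical records keyed by "userId:timestamp".
records = [
    ("1:0", {"userId": 1, "timestamp": 0, "clickCount": 5}),
    ("2:0", {"userId": 2, "timestamp": 0, "clickCount": 10}),
    ("2:1", {"userId": 2, "timestamp": 1, "clickCount": 12}),
]
repartitioned = rekey_and_repartition(records, "userId", 3)
```

After this step, all records of one userId sit in the same partition, which is what allows a single mirror replica to answer a range query for that user.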
Below you can find a detailed description of the topology: