Design: Range queries V1.1 Range over value fields #114

raminqaf · 2022-10-13T13:58:31Z

Design: Range queries V1.1 Range over value fields

Development: 0.9

last update: 28.10.2022

This issue describes our approach to supporting range queries over value fields in Quick.

Problem definition

Imagine a scenario in which we want to run analytics over the click counts of a user within a range of time. Users are allowed to edit their click count. This means that the key must contain both the userId and the timestamp (in a format like userId:timestamp). Otherwise, a newer record of the same user would overwrite the values of earlier timestamps. The data would look like this:

[
    {
        "key": "1:0",
        "value": {
            "userId": 1,
            "clickCount": 1,
            "timestamp": 0
        }
    },
    {
        "key": "2:0",
        "value": {
            "userId": 2,
            "clickCount": 1,
            "timestamp": 0
        }
    },
    {
        "key": "2:0",
        "value": {
            "userId": 2,
            "clickCount": 2,
            "timestamp": 0
        }
    },
    {
        "key": "2:1",
        "value": {
            "userId": 2,
            "clickCount": 1,
            "timestamp": 1
        }
    }
]
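To make the overwrite argument concrete, here is a minimal in-memory sketch in Java. It assumes last-write-wins semantics per key, as in a compacted topic or a key-value store. The class KeyChoiceDemo and its methods are hypothetical illustrations, not part of Quick. The sketch ingests the four records above once keyed by userId:timestamp and once keyed by userId alone.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyChoiceDemo {
    // Hypothetical in-memory stand-in for the UserMetric value type.
    record UserMetric(int userId, int clickCount, long timestamp) {}

    // The four records from the JSON example, in ingestion order.
    static final List<UserMetric> RECORDS = List.of(
            new UserMetric(1, 1, 0),
            new UserMetric(2, 1, 0),
            new UserMetric(2, 2, 0),   // edit of the record with key 2:0
            new UserMetric(2, 1, 1));

    // Keeps the latest value per composite key "userId:timestamp" (last write wins).
    static Map<String, UserMetric> byCompositeKey(List<UserMetric> records) {
        Map<String, UserMetric> store = new LinkedHashMap<>();
        for (UserMetric m : records) {
            store.put(m.userId() + ":" + m.timestamp(), m);
        }
        return store;
    }

    // Keeps the latest value per userId only: a newer timestamp overwrites older ones.
    static Map<Integer, UserMetric> byUserId(List<UserMetric> records) {
        Map<Integer, UserMetric> store = new LinkedHashMap<>();
        for (UserMetric m : records) {
            store.put(m.userId(), m);
        }
        return store;
    }

    public static void main(String[] args) {
        System.out.println(byCompositeKey(RECORDS).keySet()); // [1:0, 2:0, 2:1]
        System.out.println(byUserId(RECORDS).keySet());       // [1, 2]
    }
}
```

With the composite key, three entries survive and the edit replaces only the value at 2:0. With the userId key, only the timestamp-1 record of user 2 is left.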

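Pay attention to userId 2. If the key were only the integer value 2, the record at timestamp 1 would overwrite the value at timestamp 0. With the composite key, the second 2:0 record (clickCount 2) replaces only the value at timestamp 0, which is exactly the edit we want. Now let's create the GraphQL schema and use the range query functionality: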

type Query {
    userMetrics(
        userId: Int
        timeFrom: Long
        timeTo: Long
    ): [UserMetric] @topic(name: "user-metrics",
        keyArgument: "userId",
        rangeFrom: "timeFrom",
        rangeTo: "timeTo")
}
type UserMetric {
    userId: Int!
    clickCount: Int!
    timestamp: Long
}
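The @topic directive does not carry values itself. Its keyArgument, rangeFrom and rangeTo entries name the query arguments that hold the key and the range bounds. The following Java sketch illustrates that indirection for one call. TopicDirective, RangeRequest and toRequest are hypothetical names and are not part of Quick's gateway API.

```java
import java.util.Map;
import java.util.Objects;

class TopicDirectiveSketch {
    // Hypothetical holder for the arguments of the @topic directive.
    record TopicDirective(String name, String keyArgument, String rangeFrom, String rangeTo) {}

    // Hypothetical range lookup that the gateway would forward to the topic's mirror.
    record RangeRequest(String topic, Object key, Object from, Object to) {}

    // Resolves the argument names from the directive against the values of one GraphQL call.
    static RangeRequest toRequest(TopicDirective directive, Map<String, Object> arguments) {
        Object key = Objects.requireNonNull(arguments.get(directive.keyArgument()),
                "missing key argument " + directive.keyArgument());
        return new RangeRequest(directive.name(), key,
                arguments.get(directive.rangeFrom()), arguments.get(directive.rangeTo()));
    }

    public static void main(String[] args) {
        TopicDirective directive = new TopicDirective("user-metrics", "userId", "timeFrom", "timeTo");
        // Corresponds to the call userMetrics(userId: 2, timeFrom: 0, timeTo: 1).
        Map<String, Object> call = Map.of("userId", 2, "timeFrom", 0L, "timeTo", 1L);
        System.out.println(toRequest(directive, call));
    }
}
```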

We then create a topic with a string key and the UserMetric value schema, and ingest the JSON data above:

quick topic user-metrics --key string --value schema --schema gateway.UserMetric --range-field timestamp

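The problem here is that we can only query over the key of the topic, which in this case is the composite string userId:timestamp. Because userId is a field of the value, we cannot query, for example, all click counts of userId 2 within a time range. In this design document, we discuss possible designs and solutions to overcome this limitation.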

Goals

Quick CLI: Users should be able to set the --range-key option when creating a topic for range queries
Gateway: The gateway should be aware of the type of the newly defined range key field
Mirror: The mirror should repartition the data based on the defined range-key

Out of scope

The range key can only be a field of a primitive type (Int, Long, String)
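A minimal sketch of how this restriction could be enforced when a topic is created. It assumes the field types of the value schema are available as GraphQL type names, as written in the schema (for example "Int!"). RangeKeyValidation and validateRangeKey are hypothetical and do not describe Quick's actual validation.

```java
import java.util.Map;
import java.util.Set;

class RangeKeyValidation {
    // Primitive GraphQL types allowed for a range key in this design.
    static final Set<String> ALLOWED_TYPES = Set.of("Int", "Long", "String");

    // Hypothetical check: fieldTypes maps each value field to its GraphQL type name.
    static void validateRangeKey(Map<String, String> fieldTypes, String rangeKey) {
        String type = fieldTypes.get(rangeKey);
        if (type == null) {
            throw new IllegalArgumentException("Unknown range key field: " + rangeKey);
        }
        // Strip a trailing '!' so that Int and Int! are treated the same.
        String baseType = type.endsWith("!") ? type.substring(0, type.length() - 1) : type;
        if (!ALLOWED_TYPES.contains(baseType)) {
            throw new IllegalArgumentException(
                    "Range key " + rangeKey + " has unsupported type " + type);
        }
    }

    public static void main(String[] args) {
        // Field types of the UserMetric schema from the example.
        Map<String, String> userMetric = Map.of(
                "userId", "Int!", "clickCount", "Int!", "timestamp", "Long");
        validateRangeKey(userMetric, "userId"); // accepted: Int! is a primitive type
    }
}
```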

Implementation

1. Quick CLI

Goal: Users should be able to set the --range-key option when creating a topic for range queries

When creating a topic, the user can pass a --range-key <FieldName> option. The manager passes the value to the deployment of the mirror.

Example:

quick topic user-metrics --key string --value schema --schema gateway.UserMetric --range-key userId --range-field timestamp

This command sends a request to the manager, and the manager prepares the deployment of a mirror called user-metrics. This mirror creates two indexes:

Range Index over the new key (userId) and timestamp
Point Index only over the new key (userId)
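The two indexes can be approximated in memory to show what each one answers. In this Java sketch, a TreeMap ordered by (userId, timestamp) stands in for the range index and a HashMap stands in for the point index. The actual mirror presumably keeps its indexes in Kafka Streams state stores. The class and method names are hypothetical, and treating both range bounds as inclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class MirrorIndexes {
    record UserMetric(int userId, int clickCount, long timestamp) {}

    // Composite key of the range index: range key (userId) first, then range field (timestamp).
    record RangeKey(int userId, long timestamp) {}

    private static final Comparator<RangeKey> ORDER =
            Comparator.comparingInt(RangeKey::userId).thenComparingLong(RangeKey::timestamp);

    // Range index over (userId, timestamp); sorting keeps one user's entries contiguous.
    private final NavigableMap<RangeKey, UserMetric> rangeIndex = new TreeMap<>(ORDER);
    // Point index over userId only; holds the latest value written for each user.
    private final Map<Integer, UserMetric> pointIndex = new HashMap<>();

    void put(UserMetric m) {
        rangeIndex.put(new RangeKey(m.userId(), m.timestamp()), m);
        pointIndex.put(m.userId(), m);
    }

    // Assumption: both bounds are inclusive (subMap throws if timeFrom > timeTo).
    List<UserMetric> range(int userId, long timeFrom, long timeTo) {
        return new ArrayList<>(rangeIndex.subMap(
                new RangeKey(userId, timeFrom), true,
                new RangeKey(userId, timeTo), true).values());
    }

    UserMetric get(int userId) {
        return pointIndex.get(userId);
    }

    public static void main(String[] args) {
        MirrorIndexes indexes = new MirrorIndexes();
        indexes.put(new UserMetric(1, 1, 0));
        indexes.put(new UserMetric(2, 1, 0));
        indexes.put(new UserMetric(2, 2, 0));
        indexes.put(new UserMetric(2, 1, 1));
        System.out.println(indexes.range(2, 0, 1)); // both timestamps of userId 2
        System.out.println(indexes.get(2));         // latest write for userId 2
    }
}
```

Putting the range key first in the sort order is what makes a per-user time range a single contiguous scan.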

2. Gateway

Goal: The gateway should be aware of the type of the newly defined range key field

The gateway must create its partitioned mirror client based on the newly defined key. Currently, it uses the information in the topic registry (i.e., the key SerDe) to serialize a key and find the partition that holds it. Once the mirror repartitions the data by the range key, keys must instead be serialized with the type of the new key; otherwise, the gateway may route a request to the wrong partition. One idea is to use the type of the keyArgument and supply the matching SerDe.
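The following Java sketch shows why the key type matters for routing. It picks a serialization from the GraphQL type of the keyArgument. The byte layouts match Kafka's IntegerSerializer and LongSerializer (big-endian) and StringSerializer (UTF-8). It then computes a partition the way the Kafka Java client's default partitioner handles keyed records: a positive murmur2 hash modulo the partition count. murmur2 is reimplemented here so the sketch runs without Kafka on the classpath. serializeKey and partitionFor are hypothetical helpers, not Quick code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class GatewayKeyPartitioning {
    // Hypothetical: choose the key serialization from the GraphQL type of keyArgument.
    static byte[] serializeKey(String graphQlType, String argumentValue) {
        switch (graphQlType) {
            case "Int":    // 4 bytes, big-endian (IntegerSerializer layout)
                return ByteBuffer.allocate(Integer.BYTES).putInt(Integer.parseInt(argumentValue)).array();
            case "Long":   // 8 bytes, big-endian (LongSerializer layout)
                return ByteBuffer.allocate(Long.BYTES).putLong(Long.parseLong(argumentValue)).array();
            case "String": // UTF-8 bytes (StringSerializer default)
                return argumentValue.getBytes(StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("Unsupported key type: " + graphQlType);
        }
    }

    // 32-bit murmur2 hash with the seed used by the Kafka client.
    static int murmur2(byte[] data) {
        int length = data.length;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = 0x9747b28c ^ length;
        for (int i = 0; i < length / 4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix in the 1 to 3 trailing bytes, if any.
        int tail = length & ~3;
        int remaining = length % 4;
        if (remaining == 3) h ^= (data[tail + 2] & 0xff) << 16;
        if (remaining >= 2) h ^= (data[tail + 1] & 0xff) << 8;
        if (remaining >= 1) {
            h ^= data[tail] & 0xff;
            h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Partition of a serialized key: positive murmur2 hash modulo the partition count.
    static int partitionFor(byte[] keyBytes, int numPartitions) {
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // userId 2 as Int and as String yields different bytes, so nothing
        // guarantees that both serializations land on the same partition.
        System.out.println(partitionFor(serializeKey("Int", "2"), 6));
        System.out.println(partitionFor(serializeKey("String", "2"), 6));
    }
}
```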

3. Mirror

Goal: The mirror should repartition the data based on the defined range-key

The mirror should use the selectKey method to rekey the data by the new range key. selectKey marks the stream for repartitioning. Once a downstream operation that requires the data to be partitioned by key is applied (or repartition() is called explicitly), Kafka Streams will:

Send the rekeyed data to an internal repartition topic
Read the rekeyed data back into Kafka Streams

[Figure: detailed mirror topology]
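The rekeying step can be simulated without Kafka to show its effect. This Java sketch derives the new key from the value, as selectKey would, and then groups records by the partition of the new key, standing in for the internal repartition topic. As a simplification, the partition is Integer.hashCode modulo the partition count rather than Kafka's murmur2 over serialized bytes. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RekeySketch {
    record UserMetric(int userId, int clickCount, long timestamp) {}

    // A key-value pair as it flows through the topology.
    record KeyValue<K, V>(K key, V value) {}

    // Counterpart of selectKey: replace the composite string key with the range key userId.
    static List<KeyValue<Integer, UserMetric>> selectUserId(List<KeyValue<String, UserMetric>> input) {
        List<KeyValue<Integer, UserMetric>> out = new ArrayList<>();
        for (KeyValue<String, UserMetric> kv : input) {
            out.add(new KeyValue<>(kv.value().userId(), kv.value()));
        }
        return out;
    }

    // Stand-in for the repartition topic: group records by the partition of their new key.
    // Simplification: Integer.hashCode instead of Kafka's murmur2 over serialized bytes.
    static Map<Integer, List<KeyValue<Integer, UserMetric>>> repartition(
            List<KeyValue<Integer, UserMetric>> rekeyed, int numPartitions) {
        Map<Integer, List<KeyValue<Integer, UserMetric>>> partitions = new TreeMap<>();
        for (KeyValue<Integer, UserMetric> kv : rekeyed) {
            int partition = Math.floorMod(kv.key().hashCode(), numPartitions);
            partitions.computeIfAbsent(partition, unused -> new ArrayList<>()).add(kv);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<KeyValue<String, UserMetric>> input = List.of(
                new KeyValue<>("1:0", new UserMetric(1, 1, 0)),
                new KeyValue<>("2:0", new UserMetric(2, 2, 0)),
                new KeyValue<>("2:1", new UserMetric(2, 1, 1)));
        System.out.println(repartition(selectUserId(input), 4));
    }
}
```

After repartitioning, all records of one userId sit in the same partition, so a single mirror instance can serve a range query for that user.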


raminqaf added the type/design label (Design documents for enhancements) on Oct 13, 2022

raminqaf added this to Quick on Oct 13, 2022

This was referenced on Oct 17, 2022 by:

Deploy mirror with --range-key argument #117 (Closed)
Create range index on --range-key field in mirror #119 (Closed)
Gateway key serializer should be set by the keyArgument type #120 (Closed)

raminqaf mentioned this issue in Range Queries on key #116 (Closed) on Dec 1, 2022

raminqaf closed this as completed Jan 17, 2023

github-project-automation bot moved this to Done in Quick Jan 17, 2023
