PD continues sending merge operator to TiKV and then time out #7689

Closed
rleungx opened this issue Jan 10, 2024 · 1 comment · Fixed by #7708
Labels: severity/major, type/bug (The issue is confirmed as a bug.)

Comments

rleungx (Member)

rleungx commented Jan 10, 2024

Bug Report

When the scheduling service is enabled, PD tries to merge regions 302944946 and 302944954.
Here is the rule fit for these two regions:

sh-5.1# curl http://127.0.0.1:2379/pd/api/v1/config/rules/region/302944954/detail
{
    "rule-fits": [
        {
            "rule": {
                "group_id": "pd",
                "id": "default",
                "start_key": "",
                "end_key": "",
                "role": "voter",
                "is_witness": false,
                "count": 3,
                "location_labels": [
                    "topology.kubernetes.io/region",
                    "topology.kubernetes.io/zone",
                    "serverless.tidbcloud.com/partition",
                    "kubernetes.io/hostname"
                ],
                "version": 1
            },
            "peers": [
                {
                    "id": 452329816,
                    "store_id": 452861088
                },
                {
                    "id": 452419226,
                    "store_id": 452861087
                },
                {
                    "id": 460931485,
                    "store_id": 452861086
                }
            ],
            "peers-different-role": null,
            "isolation-score": 300,
            "witness-score": 0
        }
    ],
    "orphan-peers": null
}
sh-5.1# curl http://127.0.0.1:2379/pd/api/v1/config/rules/region/302944946/detail
{
    "rule-fits": [
        {
            "rule": {
                "group_id": "pd",
                "id": "default",
                "start_key": "",
                "end_key": "",
                "role": "voter",
                "is_witness": false,
                "count": 3,
                "location_labels": [
                    "topology.kubernetes.io/region",
                    "topology.kubernetes.io/zone",
                    "serverless.tidbcloud.com/partition",
                    "kubernetes.io/hostname"
                ],
                "version": 1
            },
            "peers": [
                {
                    "id": 452470008,
                    "store_id": 452861087
                },
                {
                    "id": 462779468,
                    "store_id": 452861088
                },
                {
                    "id": 462779469,
                    "store_id": 452861086
                }
            ],
            "peers-different-role": null,
            "isolation-score": 300,
            "witness-score": 0
        }
    ],
    "orphan-peers": null
}

But on the TiKV side, the operator is rejected because the two regions belong to different keyspaces:

[WARN] [peer_fsm.rs:1002] ["failed to propose merge"] [err="KvEngine check merge shards from different keyspaces require to be empty ...]

More detailed information:

source: 
region id: 302944946 
start_key: 78009D2600000000FB 
end_key: 78009D2700000000FB

target: 
region id: 302944954 
start_key: 78009D2700000000FB 
end_key: 78009D2900000000FB
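The keyspace id can be read directly off each region key: API-v2 transactional keys start with a one-byte mode prefix ('x', 0x78) followed by a 3-byte big-endian keyspace id, with the rest being the encoded user key. A minimal decoding sketch (the helper below is illustrative, not PD's or TiKV's actual code):

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// decodeKeyspaceID extracts the keyspace id from a hex-encoded API-v2
// region key: byte 0 is the mode prefix ('x' for txn keys), bytes 1-3
// are the keyspace id in big-endian order. Simplified sketch only; the
// trailing bytes are the memcomparable-encoded user key and are ignored.
func decodeKeyspaceID(hexKey string) (uint32, error) {
	raw, err := hex.DecodeString(hexKey)
	if err != nil {
		return 0, err
	}
	if len(raw) < 4 || raw[0] != 'x' {
		return 0, fmt.Errorf("not an API-v2 txn key: %s", hexKey)
	}
	return uint32(raw[1])<<16 | uint32(raw[2])<<8 | uint32(raw[3]), nil
}

func main() {
	for _, k := range []string{
		"78009D2600000000FB", // source start_key
		"78009D2700000000FB", // source end_key / target start_key
		"78009D2900000000FB", // target end_key
	} {
		id, _ := decodeKeyspaceID(k)
		fmt.Printf("%s -> keyspace id %d\n", k, id)
	}
}
```

This reproduces the mapping shown below: the source region lies entirely in keyspace 40230 (0x009D26), while the target starts at keyspace 40231 (0x009D27), so the merge crosses a keyspace boundary.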

78009D2600000000FB -> keyspace id 40230
» keyspace show id 40230
{
    "id": 40230,
    "name": "DZ6h87arR3U4M33",
    "state": "ENABLED",
    "created_at": 1688637837,
    "state_changed_at": 1688637837,
    "config": {
        "gc_life_time": "6000",
        "safe_point_version": "v2",
        "serverless_cluster_id": "",
        "serverless_project_id": "",
        "serverless_tenant_id": "",
        "tso_keyspace_group_id": "4",
        "user_kind": "basic"
    }
}

78009D2700000000FB -> keyspace id 40231
» keyspace show id 40231
{
    "id": 40231,
    "name": "3Ker9q2AQjHxazx",
    "state": "TOMBSTONE",
    "created_at": 1688637837,
    "state_changed_at": 1689244223,
    "config": {
        "gc_life_time": "6000",
        "serverless_cluster_id": "e2e-test-1665-1555513219",
        "serverless_project_id": "",
        "serverless_tenant_id": "e2e-tenant-id",
        "tso_keyspace_group_id": "2",
        "user_kind": "basic"
    }
}

78009D2900000000FB -> keyspace id 40233
» keyspace show id 40233
{
    "id": 40233,
    "name": "48cDkthwxwKt9xx",
    "state": "ENABLED",
    "created_at": 1688637838,
    "state_changed_at": 1688637838,
    "config": {
        "gc_life_time": "6000",
        "safe_point_version": "v2",
        "serverless_cluster_id": "",
        "serverless_project_id": "",
        "serverless_tenant_id": "",
        "tso_keyspace_group_id": "4",
        "user_kind": "basic"
    }
}
@rleungx rleungx added the type/bug The issue is confirmed as a bug. label Jan 10, 2024
@rleungx rleungx changed the title PD continue sending merge operator to TiKV and then time out PD continues sending merge operator to TiKV and then time out Jan 10, 2024
rleungx (Member, Author)

rleungx commented Jan 15, 2024

After some investigation: the watcher only loads one batch of rules (400 at most). We need to load all of them during initialization.

ti-chi-bot bot added a commit that referenced this issue Jan 16, 2024
close #7689

Signed-off-by: lhy1024 <admin@liudos.us>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
pingandb pushed a commit to pingandb/pd that referenced this issue Jan 18, 2024
close tikv#7689

Signed-off-by: lhy1024 <admin@liudos.us>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: pingandb <songge102@pingan.com.cn>