stop-task during load phase will cause the source always tries to transfer to the worker #3771

Closed · zhenjiaogao opened this issue Dec 7, 2021 · 2 comments · Fixed by #4004
Labels: area/dm (Issues or PRs related to DM), severity/major, type/bug (The issue is confirmed as a bug)

@zhenjiaogao commented:

What did you do?

  • tiup dm display dm1
Starting component `dm`: /home/tidb/.tiup/components/dm/v1.7.0/tiup-dm display dm1
Cluster type:       dm
Cluster name:       dm1
Cluster version:    v2.0.7
Deploy user:        tidb
SSH type:           builtin
ID                 Role          Host          Ports      OS/Arch       Status     Data Dir                             Deploy Dir
--                 ----          ----          -----      -------       ------     --------                             ----------
172.16.x.162:9093  alertmanager  172.16.x.162  9093/9094  linux/x86_64  Up         /home/dmwang-data/alertmanager-9093  /home/dmwang-deploy/alertmanager-9093
172.16.x.127:8361  dm-master     172.16.x.127  8361/8391  linux/x86_64  Healthy    /home/dmwang-data/dm-master-8361     /home/dmwang-deploy/dm-master-8361
172.16.x.222:8361  dm-master     172.16.x.222  8361/8391  linux/x86_64  Healthy|L  /home/dmwang-data/dm-master-8361     /home/dmwang-deploy/dm-master-8361
172.16.x.215:8361  dm-master     172.16.x.215  8361/8391  linux/x86_64  Healthy    /home/dmwang-data/dm-master-8361     /home/dmwang-deploy/dm-master-8361
172.16.x.162:8262  dm-worker     172.16.x.162  8262       linux/x86_64  Bound      /home/dmwang-data/dm-worker-8262     /home/dmwang-deploy/dm-worker-8262
172.16.x.215:8262  dm-worker     172.16.x.215  8262       linux/x86_64  Free       /home/dmwang-data/dm-worker-8262     /home/dmwang-deploy/dm-worker-8262
172.16.x.162:3330  grafana       172.16.x.162  3330       linux/x86_64  Up         -                                    /home/dmwang-deploy/grafana-3330
172.16.x.162:9390  prometheus    172.16.x.162  9390       linux/x86_64  Up         /home/dmwang-data/prometheus-9390    /home/dmwang-deploy/prometheus-9390
Total nodes: 8
  • There are two tasks bound to mysql-replica-01
$ tiup dmctl  --master-addr 172.16.x.127:8361 query-status test_new
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.x.127:8361 query-status test_new
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "sourceStatus": {
                "source": "mysql-replica-01",
                "worker": "dm-172.16.x.162-8262",
                "result": null,
                "relayStatus": null
            },
            "subTaskStatus": [
                {
                    "name": "test_new",
                    "stage": "Running",
                    "unit": "Load",
                    "result": null,
                    "unresolvedDDLLockID": "",
                    "load": {
                        "finishedBytes": "17000554",
                        "totalBytes": "11951230304",
                        "progress": "0.14 %",
                        "metaBinlog": "(mysql-bin.000049, 194)",
                        "metaBinlogGTID": "6ddb3c98-5637-11eb-baa7-1adaa2de7319:1-205705"
                    }
                }
            ]
        }
    ]
}

$ tiup dmctl  --master-addr 172.16.x.127:8361 query-status test
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.x.127:8361 query-status test
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "sourceStatus": {
                "source": "mysql-replica-01",
                "worker": "dm-172.16.x.162-8262",
                "result": null,
                "relayStatus": null
            },
            "subTaskStatus": [
                {
                    "name": "test",
                    "stage": "Running",
                    "unit": "Sync",
                    "result": null,
                    "unresolvedDDLLockID": "",
                    "sync": {
                        "totalEvents": "0",
                        "totalTps": "0",
                        "recentTps": "0",
                        "masterBinlog": "(mysql-bin.000049, 194)",
                        "masterBinlogGtid": "6ddb3c98-5637-11eb-baa7-1adaa2de7319:1-205705",
                        "syncerBinlog": "(mysql-bin.000048, 2687435173)",
                        "syncerBinlogGtid": "6ddb3c98-5637-11eb-baa7-1adaa2de7319:1-205705",
                        "blockingDDLs": [
                        ],
                        "unresolvedGroups": [
                        ],
                        "synced": true,
                        "binlogType": "remote",
                        "secondsBehindMaster": "0"
                    }
                }
            ]
        }
    ]
}
  • Stop task test_new (defined in task_new.yaml) while it is still in the Load stage
$ tiup dmctl  --master-addr 172.16.x.127:8361 stop-task ./task_new.yaml
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.x.127:8361 stop-task ./task_new.yaml
{
    "op": "Stop",
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "source": "mysql-replica-01",
            "worker": "dm-172.16.x.162-8262"
        }
    ]
}

$ tiup dmctl  --master-addr 172.16.x.127:8361 query-status test_new
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.x.127:8361 query-status test_new
{
    "result": false,
    "msg": "task test_new has no source or not exist",
    "sources": [
    ]
}
  • Stop dm-worker 172.16.x.162:8262
  • Run dmctl list-member --worker; source mysql-replica-01 is now bound to the other dm-worker, 172.16.x.215:8262
$ tiup dmctl  --master-addr 172.16.x.127:8361 list-member --worker
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.x.127:8361 list-member --worker
{
    "result": true,
    "msg": "",
    "members": [
        {
            "worker": {
                "msg": "",
                "workers": [
                    {
                        "name": "dm-172.16.x.162-8262",
                        "addr": "172.16.x.162:8262",
                        "stage": "offline",
                        "source": ""
                    },
                    {
                        "name": "dm-172.16.x.215-8262",
                        "addr": "172.16.x.215:8262",
                        "stage": "bound",
                        "source": "mysql-replica-01"
                    }
                ]
            }
        }
    ]
}
  • However, when dm-worker 172.16.x.162:8262 is restarted, source mysql-replica-01 is bound back to 172.16.x.162:8262
$ tiup dm start dm1 -N 172.16.x.162:8262
$ tiup dmctl  --master-addr 172.16.x.127:8361 list-member --worker
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.x.127:8361 list-member --worker
{
    "result": true,
    "msg": "",
    "members": [
        {
            "worker": {
                "msg": "",
                "workers": [
                    {
                        "name": "dm-172.16.x.162-8262",
                        "addr": "172.16.x.162:8262",
                        "stage": "bound",
                        "source": "mysql-replica-01"
                    },
                    {
                        "name": "dm-172.16.x.215-8262",
                        "addr": "172.16.x.215:8262",
                        "stage": "free",
                        "source": ""
                    }
                ]
            }
        }
    ]
}

What did you expect to see?

Source mysql-replica-01 should remain bound to dm-worker 172.16.x.215:8262 after dm-worker 172.16.x.162:8262 is restarted.

What did you see instead?

When dm-worker 172.16.x.162:8262 is restarted, source mysql-replica-01 is bound back to 172.16.x.162:8262.

Versions of the cluster

DM version (run dmctl -V or dm-worker -V or dm-master -V):

(paste DM version here; the versions of dmctl, DM-worker and DM-master must be the same)

Upstream MySQL/MariaDB server version:

(paste upstream MySQL/MariaDB server version here)

Downstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

(paste TiDB cluster version here)

How did you deploy DM: tiup or manually?

(leave TiUP or manually here)

Other interesting information (system version, hardware config, etc):


current status of DM cluster (execute query-status <task-name> in dmctl)

(paste current status of DM cluster here)
@zhenjiaogao added the type/bug (The issue is confirmed as a bug) and area/dm (Issues or PRs related to DM) labels on Dec 7, 2021
@lance6716 changed the title from "There is a task in the stop state and load phase, unexpected results display after transfer source" to "stop-task during load phase will cause the source always try to transfer to the worker" on Dec 7, 2021
@lance6716 changed the title from "stop-task during load phase will cause the source always try to transfer to the worker" to "stop-task during load phase will cause the source always tries to transfer to the worker" on Dec 7, 2021
@lance6716 (Contributor) commented:

Since stop-task doesn't clean the dump files, the etcd metadata related to those dump files is not cleaned either. Still prefer stop-task --remove-meta @sunzhaoyang
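
To make the mechanism concrete, here is a minimal Go sketch of the binding behaviour described above (names such as Scheduler, loadTask, onWorkerOnline and transferSource are hypothetical and do not match DM's actual code). The idea: the scheduler keeps load-task metadata in etcd recording which worker holds the dump files of a (task, source) pair, and when that worker comes back online it transfers the source back to it. Because stop-task removes neither the dump files nor that metadata, a stopped task still drives the transfer.

// Illustrative sketch only; all names are hypothetical.
package scheduler

// loadTask records which worker holds the local dump files for a (task, source) pair.
// DM persists this kind of information in etcd, so it survives worker restarts.
type loadTask struct {
    task   string
    source string
    worker string
}

// Scheduler stands in for the dm-master scheduler that decides which worker a source is bound to.
type Scheduler struct {
    loadTasks []loadTask // mirror of the load-task metadata stored in etcd
}

// onWorkerOnline sketches what happens when a dm-worker registers again:
// if the worker still owns dump files for a source according to the metadata,
// the source is transferred back so the load unit can resume from local files.
// Since stop-task leaves both the dump files and the metadata in place,
// a stopped task keeps triggering this transfer.
func (s *Scheduler) onWorkerOnline(worker string) {
    for _, lt := range s.loadTasks {
        if lt.worker == worker {
            s.transferSource(lt.source, worker)
        }
    }
}

func (s *Scheduler) transferSource(source, worker string) {
    // rebind source -> worker; details omitted in this sketch
}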

@lance6716 (Contributor) commented:

The fix could be to not count stopped tasks when checking whether a worker has local load files for some sources.
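
Under the same hypothetical names, the proposed fix could look roughly like the following sketch: the check that decides whether a worker still "owns" load files skips stopped tasks, so a stopped task no longer forces the source back onto that worker.

// Illustrative sketch of the proposed fix, reusing the hypothetical Scheduler/loadTask types above.
// taskStage is assumed to report the current stage of a task ("Running", "Stopped", ...).
func (s *Scheduler) hasActiveLoadFiles(worker string, taskStage func(task string) string) bool {
    for _, lt := range s.loadTasks {
        if lt.worker != worker {
            continue
        }
        // Proposed fix: a stopped task should no longer pin the source
        // to the worker that holds its dump files.
        if taskStage(lt.task) == "Stopped" {
            continue
        }
        return true
    }
    return false
}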
