failover_promote returns an error via force_inconsistency: true #1399

Closed
dokshina opened this issue Apr 29, 2021 · 6 comments · Fixed by #1772
Labels: bug, cartridge

Comments

@dokshina
Contributor

dokshina commented Apr 29, 2021

Bug description

failover_promote returns an error when called with force_inconsistency: true. Reproduced with both state providers (Tarantool stateboard and etcd).
The vclockkeeper value in the state provider storage changes between the get_vclockkeeper and set_vclockkeeper calls made in the force_inconsistency function.
The following check should be performed on the server (stateboard or etcd), not on the client:

    if keeper.instance_uuid == instance_uuid
    and vclock == nil
    then
        -- No update needed
        return true
    end

Fixing the etcd error needs further investigation.
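To illustrate why the placement of the check matters, here is a minimal Python sketch of a state provider that performs the "no update needed" check itself, atomically with the write. It is a toy model under stated assumptions, not Cartridge code: `ToyStateProvider`, its fields, and the rule that the ordinal grows by one per write are all hypothetical; only the check condition and the error wording come from this issue.

```python
class ToyStateProvider:
    """Hypothetical in-memory stand-in for stateboard/etcd (not Cartridge code)."""

    def __init__(self):
        self.ordinal = 0      # assumed: bumped on every successful write
        self.keeper = None    # (instance_uuid, vclock) or None

    def get_vclockkeeper(self):
        return self.keeper, self.ordinal

    def set_vclockkeeper(self, ordinal, instance_uuid, vclock=None):
        # Server-side "no update needed" check. It runs together with the
        # write, so a stale ordinal is harmless when nothing would change.
        if (self.keeper is not None
                and self.keeper[0] == instance_uuid
                and vclock is None):
            return True, None
        # Optimistic concurrency guard for real updates.
        if ordinal != self.ordinal:
            return False, "Ordinal comparison failed (requested %d, current %d)" % (
                ordinal, self.ordinal)
        self.keeper = (instance_uuid, vclock)
        self.ordinal += 1
        return True, None


provider = ToyStateProvider()
_, stale = provider.get_vclockkeeper()        # client reads ordinal 0
provider.set_vclockkeeper(stale, "uuid-a")    # a write lands; ordinal becomes 1

# Same keeper, stale ordinal: accepted, because the server sees no change.
same_ok, same_err = provider.set_vclockkeeper(stale, "uuid-a")
# Different keeper, stale ordinal: still rejected by the ordinal guard.
other_ok, other_err = provider.set_vclockkeeper(stale, "uuid-b")
```

With the check on the client instead, the first of the two final calls would go through the ordinal guard and fail, which matches the symptom described here.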

Steps to reproduce

1. A cluster with a simple topology is deployed:
    core_1_replicaset:
      hosts:
        core-1:
      vars:
        replicaset_alias: core-1
        roles:
          - vshard-router
          - failover-coordinator

    storage_1_replicaset:
      hosts:
        storage-1-leader:
        storage-1-replica:
        storage-1-replica-2:
      vars:
        replicaset_alias: storage-1
        failover_priority:
          - storage-1-leader
          - storage-1-replica-2
          - storage-1-replica
        roles:
          - vshard-storage

    storage_2_replicaset:
      hosts:
        storage-2-leader:
        storage-2-replica:
      vars:
        replicaset_alias: storage-2
        failover_priority:
          - storage-2-leader
          - storage-2-replica
        roles:
          - vshard-storage

Failover is configured as follows:

    cartridge_failover_params:
      mode: stateful
      state_provider: stateboard
      stateboard_params:
        uri: vm1:4001
        password: secret-stateboard
  2. Leaders are promoted to storage-1-replica and storage-2-replica with force_inconsistency: false.
  3. Leaders are promoted to core-1, storage-1, storage-2 with force_inconsistency: true.

At step 3 an error is returned: Failed to promote leaders: Promotion succeeded, but inconsistency wasn't forced: Ordinal comparison failed (requested 5, current 7).

  1. Use the test from "Add stateful test for promote from ticket 1399" (#1710).
  2. Add a fiber.sleep call before the line
    'set_vclockkeeper', {
  3. Profit
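The effect of the added delay can be sketched with a deterministic toy in Python. Everything here is an assumption made for illustration: `ToyStateboard`, `toy_force_inconsistency`, and the `pause` hook (which stands in for fiber.sleep) are hypothetical names, and the real client code is not reproduced. The sketch only shows that any write landing between the read and the write invalidates the ordinal the client holds.

```python
class ToyStateboard:
    """Hypothetical stateboard model: each successful write bumps the ordinal."""

    def __init__(self, ordinal):
        self.ordinal = ordinal
        self.keeper = None

    def get_vclockkeeper(self):
        return self.keeper, self.ordinal

    def set_vclockkeeper(self, ordinal, instance_uuid):
        if ordinal != self.ordinal:
            raise RuntimeError(
                "Ordinal comparison failed (requested %d, current %d)"
                % (ordinal, self.ordinal))
        self.keeper = instance_uuid
        self.ordinal += 1


def toy_force_inconsistency(board, instance_uuid, pause=lambda: None):
    """Read the ordinal, optionally yield, then write with that ordinal."""
    _, ordinal = board.get_vclockkeeper()
    pause()  # stands in for fiber.sleep: other writers may run here
    board.set_vclockkeeper(ordinal, instance_uuid)


# Without a pause, nothing interleaves and the write succeeds.
quiet_board = ToyStateboard(ordinal=5)
toy_force_inconsistency(quiet_board, "me")

# With a pause, two concurrent writes move the ordinal from 5 to 7.
busy_board = ToyStateboard(ordinal=5)

def concurrent_writers():
    busy_board.set_vclockkeeper(5, "other-1")
    busy_board.set_vclockkeeper(6, "other-2")

try:
    toy_force_inconsistency(busy_board, "me", pause=concurrent_writers)
    message = None
except RuntimeError as exc:
    message = str(exc)
```

Injecting the interleaving through a callback keeps the sketch deterministic; a real sleep only makes the race likely rather than certain.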

Actual behavior

With etcd:

Promotion succeeded, but inconsistency wasn't forced: Compare failed (101): [223 != 253]

local resp, err = session.connection:request('PUT',

With stateboard:

Promotion succeeded, but inconsistency wasn't forced: Ordinal comparison failed (requested 5, current 7)

'set_vclockkeeper', {

Expected behavior

No error returned.

@rosik
Contributor

rosik commented Apr 29, 2021

Related to #1398

rosik added the bug label Apr 29, 2021
kyukhin added this to the wishlist milestone Aug 19, 2021
filonenko-mikhail removed this from the wishlist milestone Jan 12, 2022
filonenko-mikhail added teamX and removed teamS Scaling labels Jan 12, 2022
@filonenko-mikhail
Contributor

filonenko-mikhail commented Jan 21, 2022

Already covered by tests in another PR, #1682.

@opomuc

opomuc commented Feb 11, 2022

The issue still reproduces. @filonenko-mikhail, please DM me for details.

@filonenko-mikhail
Contributor

Please provide a reproducer:

  • cartridge version
  • environment
  • steps to reproduce

@yngvar-antonsson
Collaborator

yngvar-antonsson commented Feb 17, 2022

Promotion succeeded, but inconsistency wasn't forced: Compare failed (101): [223 != 253]
It seems that the last index of the value (prevIndex) changed between get_vclockkeeper and set_vclockkeeper in

local resp, err = session.connection:request('PUT',

The same happens in

'set_vclockkeeper', {

I'll try to write a repro test.
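The etcd message can be read as a failed compare-and-swap on the key's modification index. Below is a small Python model of that behaviour, a sketch under assumptions rather than the etcd client used by Cartridge: `ToyEtcd`, the key names, and the single global index counter are hypothetical, and only the error text is taken from this issue.

```python
class ToyEtcd:
    """Hypothetical model of index-guarded writes (not a real etcd client)."""

    def __init__(self, index=0):
        self.index = index   # assumed: one global counter bumped on every write
        self.nodes = {}      # key -> (value, modified_index)

    def get(self, key):
        return self.nodes[key]

    def put(self, key, value, prev_index=None):
        # Compare-and-swap: with prev_index given, the write only succeeds
        # if the key has not been modified since that index.
        if prev_index is not None:
            current = self.nodes[key][1]
            if current != prev_index:
                return None, "Compare failed (101): [%d != %d]" % (
                    prev_index, current)
        self.index += 1
        self.nodes[key] = (value, self.index)
        return self.index, None


etcd = ToyEtcd(index=222)
etcd.put("vclockkeeper", "keeper-a")             # modified at index 223
_, seen_index = etcd.get("vclockkeeper")         # client remembers 223

for i in range(29):                              # unrelated writes: 224..252
    etcd.put("other", i)
etcd.put("vclockkeeper", "keeper-b")             # concurrent update at 253

# The client's guarded write now carries a stale index and is rejected.
new_index, err = etcd.put("vclockkeeper", "keeper-c", prev_index=seen_index)
```

The rejected write leaves the concurrent value in place, so the client would have to re-read the key before trying again.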

@yngvar-antonsson
Collaborator

Many thanks to @rosik for helping to investigate the problem!
