Confusing behaviour on vshard bootstrapping #1148

Closed
dokshina opened this issue Dec 1, 2020 · 1 comment · Fixed by #1583
Labels: bug Something isn't working, cartridge, teamS Scaling

Comments


dokshina commented Dec 1, 2020

Cartridge version: cartridge == 2.3.0-1

The problem

My application has hot and cold vshard groups configured.
I create one storage replicaset in the cold group.
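For reference, the groups are declared in the application's init.lua roughly like this. This is a minimal sketch, not my real entrypoint: only the group names and bucket counts come from the get_known_groups() output below, the rest of the options are assumptions.

```lua
-- init.lua (simplified sketch, not the real application entrypoint)
local cartridge = require('cartridge')

local ok, err = cartridge.cfg({
    roles = {
        'cartridge.roles.vshard-router',
        'cartridge.roles.vshard-storage',
    },
    -- two named vshard groups; every vshard-storage replicaset
    -- has to be assigned to exactly one of them
    vshard_groups = {
        hot  = {bucket_count = 20000},
        cold = {bucket_count = 10000},
    },
})
assert(ok, tostring(err))
```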

Here is what cartridge.admin_get_replicasets() returns:
unix/:./tmp/run/myapp.router.control> cartridge.admin_get_replicasets()
---
- - &0
    active_master: &1
      disabled: false
      uuid: 686f8679-6b47-413d-b3d3-fe05872e2f44
      replicaset: *0
      labels: []
      uri: localhost:3302
      alias: s1-master
      clock_delta: 4.5e-05
      message:
      priority: 1
      status: healthy
    master: *1
    status: healthy
    all_rw: false
    vshard_group: cold
    alias: s-1
    weight: 1
    servers:
    - *1
    roles:
    - vshard-storage
    uuid: fcb1f474-ec75-4603-b116-0978f8496e8f
  - &2
    active_master: &3
      disabled: false
      uuid: 9c9afbbb-a109-4602-98bf-022a4fbaba40
      replicaset: *2
      labels: []
      uri: localhost:3301
      alias: router
      clock_delta: 6.85e-05
      message:
      priority: 1
      status: healthy
    master: *3
    status: healthy
    all_rw: false
    alias: router
    servers:
    - *3
    roles:
    - vshard-router
    uuid: 67ceb5d0-d9ec-4d33-bb4c-9cb629b77b5f

Then I call admin_bootstrap_vshard(). The cold group is bootstrapped successfully, but there are two problems:

  • admin_bootstrap_vshard returns an error:
unix/:./tmp/run/myapp.router.control> cartridge.admin_bootstrap_vshard()
---
- null
- line: 131
  class_name: Bootstrapping vshard failed
  err: Sharding config is empty
  file: '....rocks/share/tarantool/cartridge/roles/vshard-router.lua'
  stack: "stack traceback:\n\t....rocks/share/tarantool/cartridge/roles/vshard-router.lua:131:
    in function 'bootstrap_group'\n\t....rocks/share/tarantool/cartridge/roles/vshard-router.lua:169:
    in function <....rocks/share/tarantool/cartridge/roles/vshard-router.lua:147>\n\t[C]:
    in function 'xpcall'\n\t...se/cartridge-cli/myapp/.rocks/share/tarantool/errors.lua:145:
    in function <...se/cartridge-cli/myapp/.rocks/share/tarantool/errors.lua:139>\n\t[C]:
    in function 'pcall'\n\tbuiltin/box/console.lua:412: in function 'eval'\n\tbuiltin/box/console.lua:718:
    in function 'repl'\n\tbuiltin/box/console.lua:852: in function <builtin/box/console.lua:838>\n\t[C]:
    in function 'pcall'\n\tbuiltin/socket.lua:1078: in function <builtin/socket.lua:1076>"
  str: 'Bootstrapping vshard failed: Sharding config is empty'
...
  • <group>.bootstrapped is false:
unix/:./tmp/run/myapp.router.control> vshard_utils.get_known_groups()
---
- cold:
    rebalancer_max_receiving: 100
    bootstrapped: false
    collect_bucket_garbage_interval: 0.5
    collect_lua_garbage: false
    sync_timeout: 1
    rebalancer_disbalance_threshold: 1
    bucket_count: 10000
  hot:
    rebalancer_max_receiving: 100
    bootstrapped: false
    collect_bucket_garbage_interval: 0.5
    collect_lua_garbage: false
    sync_timeout: 1
    rebalancer_disbalance_threshold: 1
    bucket_count: 20000
...

The possible reason

After bootstrapping all groups, the bootstrap code in roles/vshard-router.lua checks whether there were any errors and returns the last one.
As a result, the two-phase commit is skipped in this case, and the vshard groups config isn't updated (bootstrapped: true should be set after a successful bootstrap).
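A simplified sketch of the flow I mean (this is not the actual source of roles/vshard-router.lua, just the pattern as I understand it; bootstrap_group is the name from the traceback above, everything else is illustrative):

```lua
-- sketch of the current behaviour, as I understand it
local last_err = nil
for _, group_name in ipairs(group_names) do
    local ok, err = bootstrap_group(group_name)
    if not ok then
        -- the empty "hot" group fails with "Sharding config is empty"
        last_err = err
    end
end

if last_err ~= nil then
    -- returning here skips the two-phase commit, so
    -- bootstrapped: true is never persisted for the "cold" group
    return nil, last_err
end
```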

The possible solution

I'm sure we have to update the vshard groups config anyway if some groups were bootstrapped successfully.
Moreover, I think my case is valid: I shouldn't have to create replicasets for ALL groups mentioned in the config.

And maybe can_bootstrap_vshard should return true only if there are some replicasets with non-bootstrapped groups.
(Now it checks that all groups mentioned in the config are bootstrapped.)
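A rough sketch of the proposed behaviour (hypothetical code; mark_groups_bootstrapped is an illustrative helper, not an existing function):

```lua
-- hypothetical sketch of the proposed fix
local bootstrapped_groups = {}
local last_err = nil

for _, group_name in ipairs(group_names) do
    local ok, err = bootstrap_group(group_name)
    if ok then
        table.insert(bootstrapped_groups, group_name)
    else
        last_err = err
    end
end

-- commit bootstrapped: true for the groups that did bootstrap,
-- even if other groups failed or simply have no replicasets yet
if #bootstrapped_groups > 0 then
    mark_groups_bootstrapped(bootstrapped_groups) -- hypothetical two-phase commit helper
end

if last_err ~= nil then
    return nil, last_err
end
return true
```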

@rosik added the bug label on Dec 1, 2020
@kyukhin added this to the wishlist milestone on Aug 19, 2021
@msiomkin commented:

I also think I shouldn't be forced to create replicasets for all sharding groups mentioned in the config. E.g. during development I may want to fill in the group names before I actually create storage roles for all of them. So, an empty sharding group shouldn't result in an error. A warning is quite enough.
