Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rebalancing: Rebalancing criteria is not effective for nodes running on the same machine #2600

Closed
mrtracy opened this issue Sep 21, 2015 · 4 comments

Comments

@mrtracy
Copy link
Contributor

mrtracy commented Sep 21, 2015

A common test scenario is to run multiple nodes on the same machine. However, the current rebalancing criteria is not a good match for this scenario; this is because stores are compared based on "Capacity.FractionUsed()", which is based on information about the hard-drive the store is located on. In the case of multiple nodes on the same machine, they are often running from different files on the same hard-drive, and thus the FractionUsed() is always the same for all nodes.

We likely need an alternative rebalancing criteria for this situation, which will be very common in tests.

@tamird
Copy link
Contributor

tamird commented Sep 21, 2015

s/machine/store/, right? Also, do you actually want a rebalance to occur in this case?

@bdarnell
Copy link
Contributor

The issue is when multiple stores are backed by the same physical filesystem. This is a test-only scenario, but for testing we need some way to either apply different limits to each node (can we do this with docker?) or a way to fake the "available space on filesystem" functions.

@jacqueswww
Copy link

@bdarnell try creating a new docker container with added parameter "--storage-opt dm.basesize=20G":

docker --storage-opt dm.basesize=10G pull cockroachdb/cockroach

Docker daemon needs to switch to device mapper storage though:

DOCKER_OPTS="--storage-driver=devicemapper"

(http://muehe.org/posts/switching-docker-from-aufs-to-devicemapper/)

@jess-edwards jess-edwards mentioned this issue Oct 1, 2015
78 tasks
@mrtracy
Copy link
Contributor Author

mrtracy commented Jan 7, 2016

This was fixed by #3098, which added the --balance-mode flag.

Running cockroach with --balance-mode=rangecount will fix this problem.

@mrtracy mrtracy closed this as completed Jan 7, 2016
craig bot pushed a commit that referenced this issue Jan 30, 2024
117544: deps: upgrade Shopify/sarama v1.38.1 to IBM/sarama v1.42.1 r=rharding6373 a=wenyihu6

This patch updates sarama library to the latest version. Note that the ownership
of the sarama library has been transferred from Shopify to IBM.

Fixes: #117522
Release note: none

Here is the list of commits between the two versions upgrade. 
```
d88a48a chore: update CHANGELOG.md to v1.42.1 (#2711)
385b3b4 fix(config): relax ClientID validation after 1.0.0 (#2706)
3364ff0 chore(doc): add CODE_OF_CONDUCT.md
768496e chore(ci): bump actions/dependency-review-action from 3.1.0 to 3.1.1 (#2707)
27710af fix: make fetchInitialOffset use correct protocol (#2705)
a46917f chore(ci): bump actions/dependency-review-action from 2.5.1 to 3.1.0 (#2702)
4168f7c chore(ci): bump ossf/scorecard-action from 2.1.2 to 2.3.1 (#2703)
7155d51 chore(ci): add kafka 3.6.0 to FVT and versions
e0c3c62 fix(txmgr): ErrOffsetsLoadInProgress is retriable
2e077cf Fix default retention time value in offset commit (#2700)
f97ced2 Merge pull request #2678 from lzakharov/fix-data-race-in-async-produce
56d5044 fix: data race in Broker.AsyncProduce
a15034a Fix data race on Broker.done channel (#2698)
82f0e48 Asynchronously close brokers during a RefreshBrokers (#2693)
ee1944b chore(ci): bump github/codeql-action from 2.22.4 to 2.22.5 (#2695)
1d4de95 chore(ci): bump actions/upload-artifact from 3.1.0 to 3.1.3 (#2696)
3ca69a8 chore(doc): add OpenSSF Scorecard badge (#2691)
d2023bf feat(test): add a simple fuzzing example (#2039)
b8b29e1 chore(doc): add OpenSSF badge (#2690)
d38f08c fix(ci): always run CodeQL on every commit (#2689)
c5815ae chore(ci): bump github/codeql-action from 2.2.4 to 2.22.4 (#2686)
3a893f5 Merge pull request #2688 from IBM/dnwe/security-dot-md
7ae18cb fix(ci): ignore markdown changes for dep review
3b0f32e fix(doc): add SECURITY.md for vuln reporting
40ec971 chore(ci): bump actions/checkout from 3.1.0 to 4.1.1 (#2687)
25137dc chore(ci): add Dependency Review Actions
8ce03ed chore(ci): add golangci-lint and gitleaks checks
9658e0e chore(ci): ensure GH actions are pinned by hash
3d56b4c chore(ci): ensure gh permissions are explicit
8892f3f chore(ci): add dependabot to /examples tree
05af18e chore(ci): ossf scorecard.yml (#2683)
c42b2e0 fix(client): ignore empty Metadata responses when refreshing (#2672)
6678dd1 chore(deps): bump the golang-org-x group with 1 update (#2671)
24f1249 fix: pre-compile regex for parsing kafka version (#2663)
64d2044 fix(docs): correct topic name in rebalancing strategy example (#2657)
44f6db5 chore(deps): bump the golang-org-x group with 2 updates (#2661)
e16473b chore(ci): bump docker/setup-buildx-action from 2 to 3 (#2653)
98ec384 fix: use least loaded broker to refresh metadata
4b55bb3 perf: Alloc records in batch (#2646)
05cb9fa fix(consumer): don't retry session if ctx canceled
0b17025 chore(deps): bump the golang-org-x group with 1 update (#2641)
9e75986 chore(ci): bump actions/checkout from 3 to 4
9b0419d fix(consumer): guard against nil client
f3c4194 fix: typo
87229d9 fix: add retry logic to AlterUserScramCredentials
ae5eee5 fix(client): force Event Hubs to use V1_0_0_0 (#2633)
dedd86d fix: make clear that error is configuration issue not server error (#2628)
a4eafb4 chore(proto): doc CreateTopics/JoinGroup fields
503ade3 fix: add paragraph break to fix markdown render
ffaa252 fix(gh): correct issue template comments
78c7b63 chore(gh): add new style issue templates
c7e6bca chore(ci): ignore .md-only changes
09395f6 chore(docs): remove gopkg.in link
261043a chore(ci): add workflow_dispatch to stale
bbf6ee4 chore(ci): improve stale behaviour
b1bf950 chore(docs): add 1.41.0 to CHANGELOG.md
2b4ba74 chore(lint): bump golangci-lint and tweak config
9282d75 fix(doc): add missing doc for mock consumer
e9bd1b8 fix(proto): handle V3 member metadata and empty owned partitions
96c37d1 fix(docs): use go install for fetching tools
5cd9fa6 fix: flaky TestFuncOffsetManager
1bcf2d9 feat(fvt): test wider range of kafkas
827ec18 fix(fvt): reduce minimum compression-ratio metric
d44ebdc fix(fvt): fresh metrics registry for each test
2b54832 fix(fvt): ensure correct version in consumer tests
270f507 chore(fvt): tweak to work across more versions
b4e0554 feat(ci): experiment with tcpdump during FVT
0bb3316 fix(fvt): versioned cfg for invalid topic producer
d4dc7bc fix(examples): sync exactly_once and consumergroup
913b18f fix(fvt): Metadata version in ensureFullyReplicated
8681621 fix(fvt): handle msgset vs batchset
26792a3 feat(fvt): add healthcheck, depends_on and --wait
d2dba29 feat(gzip): switch to klauspost/compress gzip (#2600)
f8daee4 chore(deps): bump github.com/eapache/go-resiliency from 1.3.0 to 1.4.0 (#2598)
868ed33 Merge pull request #2595 from IBM/dnwe/close
b0363d1 fix(test): ensure some more clients are closed
31a8693 fix(fvt): disable keepalive on toxiproxy client
45313c3 fix(test): add missing closes to admin client tests (#2594)
a5b6e6a Merge pull request #2593 from IBM/dnwe/toxiproxy
0b9db06 chore(ci): implement toxiproxy client
4d8bb31 chore(ci): replace toxiproxy client dep
3d7b37f feat(fvt): experiment with per-kafka-version image
bd81a11 chore(fvt): tidyup broker await
8d0df91 chore(deps): bump module github.com/klauspost/compress to v1.16.7
6ff3567 chore(test): fix a couple of leaks
f033fc7 chore(deps): bump github.com/eapache/go-xerial-snappy digest to c322873
0409ed9 chore(deps): bump module github.com/jcmturner/gokrb5/v8 to v8.4.4
9dc4305 chore(deps): bump module github.com/pierrec/lz4/v4 to v4.1.18
108e264 chore(fvt): roll some tests back to DefaultVersion
f4e6453 chore(test): use modern protocol versions in FVT
991b2b0 chore(test): speedup some slow tests
fa7db9a chore(ci): pre-build FVT docker image
e31a540 chore(ci): use latest Go in actions (#2580)
500399c chore(test): ensure all mockresponses use version
43eae9b feat: add new error for MockDeleteTopicsResponse (#2475)
4cde6b3 chore(config): make DefaultVersion V2_1_0_0 (#2574)
8a09ef3 Merge pull request #2575 from IBM/dnwe/mockbroker
f4f435c chore(test): add verbose logger for unittests
03368ff chore(test): ensure MockBroker closed within test
e8808a6 chore(proto): match HeartbeatResponse version (#2576)
be809f9 chore(ci): remove manual go cache
a3024e7 chore(test): add V2_1_0_0 ApiVersions
00741ec feat(proto): add support for TxnOffsetCommit V2
0c39b9f feat(proto): add support for ListOffsetsRequest V4
765bfa3 chore(config): make DefaultVersion V2_0_0_0
bb864d7 fix(test): shutdown MockBroker
1c9ebab Merge pull request #2570 from IBM/dnwe/proto
d9cb01e fix(proto): use full ranges for remaining proto
826ef81 fix(proto): use full range of Txn protocols
09c8186 fix: avoid logging value of proxy.Dialer
76ca69a feat(proto): support for Metadata V6-V10
10dd922 fix(proto): use full range of ListGroupsRequest
4175433 fix(proto): use full range of SyncGroupRequest
29487f1 bug: implement unsigned modulus for partitioning with crc32 hashing
f35d212 fix: a rebalance isn't triggered after adding new partitions
4659dd0 chore(deps): bump the golang-org-x group with 1 update (#2561)
d8d9e73 Merge pull request #2558 from IBM/dnwe/proto
e4bf4df fix(proto): tweak LeaveGroupRequest requiredVersion
24b54f6 fix(proto): use full range of OffsetFetchRequest
57969b4 fix(proto): use full range of HeartbeatRequest
3d1c345 fix(proto): use full range of FindCoordinator
cf96776 chore: add .pre-commit-config
02d1209 chore(ci): tidyup and improve actions workflows
09ced0b fix(proto): use full range of MetadataRequest
8c40629 fix(proto): use up to V3 of OffsetRequest
cdf36d5 fix(proto): use range of OffsetCommitRequest vers
1a8a3ed fix(consumer): support JoinGroup V4
40b52c5 fix(consumer): use full range of FetchRequest vers
5530d61 fix(example): check if msg channel is closed
68312a5 chore(test): add V2_0_0_0 ApiVersions
c10bd1e fix(test): test timing error
82a6d57 fix(proto): correct JoinGroup usage for wider version range
23d4561 chore(proto): permit DeleteGroupsRequest V1
6010af0 chore(proto): permit AlterConfigsRequest V1
c32ffd1 chore(proto): permit CreatePartitionsRequest V1
1532d9f fix(proto): ensure req+resp requiredVersion match (#2548)
2a5f0f6 chore: add 1.40.1 to CHANGELOG.md
4cce955 Fix printing of final metrics
1d8f80e fix(test): use correct v7 mock ProduceResponse
973a9b7 fix(producer): use newer ProduceReq as appropriate
fb761f2 feat: support up to v4 of the ListGroups API (#2541)
017083e fix(consumer): use newer LeaveGroup as appropriate (#2544)
ce1ac25 Merge pull request #2538 from IBM/dnwe/is-valid-version
a9126ad fix(proto): use DeleteRecordsRequest v1
b8cc2b1 feat(proto): add test around supported versions
ee2872c fix(admin): remove group member needs >= 2.4.0
3b82606 fix(proto): correct consumer metadata shim
c240c67 fix(proto): extend txn types for identical versions
40fa609 fix(proto): ensure req+resp requiredVersion match
fa37d61 fix(proto): use DescribeLogDirsRequest v1
3dfbf99 feat(proto): add isValidVersion to protocol types
02c5de3 chore(deps): bump the golang-org-x group with 1 update (#2542)
6d094b8 Merge the two CONTRIBUTING.md's (#2543)
bbee916 chore: rollup fvt kafka to latest three (#2537)
87209f8 Merge pull request #2536 from hindessm/mrh/sleep-when-throttled
102513a test: add throttling support test
5ac5dc0 feat: support for sleeping when throttled
7d7ac52 Implement resolve_canonical_bootstrap_servers_only (#2156)
f07b129 Merge pull request #2533 from hindessm/mrh/extend-throttling-metric-scope
b678d34 chore(typo): trivial typo
e18c6cf feat: refactor throttle metrics to handle more responses
2d7ccb8 feat: add throttleTime function for responses with time.Duration fields
fa93017 feat: add throttleTime function for responses with int32 ms fields
ae24dbf chore(typo): fix field documentation typo
aa72f59 fix: avoiding burning cpu if all partitions are paused (#2532)
34bc8f9 fix: correct unsupported version check (#2528)
c28ecc0 fix(fvt): ensure fully-replicated at test start (#2531)
849c8b1 fix(test): allow testing of skipped test without IsTransactional panic (#2525)
ecf43f4 fix: concurrent issue on updateMetaDataMs (#2522)
e07f521 Merge pull request #2520 from hindessm/mrh/admin-retry-logic
aad8cf3 fix: add retry logic to ListPartitionReassignments
66ef5a9 fix: add retry logic to DescribeCluster
c7ce32f fix: add retry logic to DescribeTopics
80899bf Added support for Kerberos authentication with a credentials cache (KRB5_CCACHE_AUTH). (#2457)
735f33b feat(consumer): use buffer pools for decompression (#2484)
669d2bc fix: comments for PauseAll and ResumeAll (#2523)
63ff8d1 Merge pull request #2519 from hindessm/mrh/fix-retry-logic-again
df58534 fix: use safer condition
3ba807b fix: admin retry logic
08ff0ff Merge pull request #2517 from hindessm/mrh/fix-some-retry-issues
39c18fc fix: off-by-one errors in attempts remaining logging
7888004 fix: admin retry logic
6ecdb50 Merge pull request #2514 from hindessm/main
f6ccc6f fix(test): ubi-minimal seems to be missing zoneinfo files
b53dbe9 fix(tools): remove default duplication from help
d439508 chore(docs): fix iotuil.Discard reference
441b083 chore(typos): random typos spotted while browsing code
12c24a8 chore(deps): bump github.com/stretchr/testify from 1.8.1 to 1.8.3 (#2512)
55ea700 chore(deps): bump github.com/klauspost/compress from 1.15.14 to 1.16.6 (#2513)
2420fcd chore(deps): bump the golang-org-x group with 2 updates
cc72418 chore(ci): bump actions/setup-go from 3 to 4 (#2508)
dcf5196 chore(ci): remove empty scope from dependabot.yml
fb81408 Merge pull request #2504 from EladLeev/golangci-cleanup
5185d46 chore(ci): tweak scope in dependabot.yml
492d4f9 chore(ci): remove scope from dependabot.yml
eb52957 chore(ci): add dependabot.yml
da48ff2 chore(ci): add depguard config
3654162 chore: add 1.40.0 to CHANGELOG.md
0863085 chore(ci): bump golangci, remove deprecated linters
848522f chore(ci): add simple apidiff workflow
ee207f8 chore(ci): fix stale action params
3f22fd3 chore(ci): migrate probot-stale to actions/stale
4b9e8f6 fix: restore (*OffsetCommitRequest) AddBlock func
c2cab9d chore: migrate module to github.com/IBM/sarama
fd35e17 chore: bytes.Equal instead bytes.Compare (#2485)
fd21bd2 chore(ci): remove Shopify/shopify-cla-action (#2489)
9127f1c fix: data race in balance strategy (#2453)
2f8dcd0 fix(mock consumer): HighWaterMarkOffset (#2447)
7dbf0b5 fix: use version 4 of DescribeGroupsRequest only if kafka broker version is >= 2.4
1015b4f chore(deps): bump golang.org/x/net from 0.5.0 to 0.7.0 (#2452)
397cee4 fix: simplify some balance_strategy.go logic
d8bcfcc chore: refresh CHANGELOG.md from github-releases
40329aa chore: add kafka 3.3.2 (#2434)
0b15695 fix(metrics): fix race condition when calling Broker.Open() twice (#2428)
66e60c7 fix(consumer): don't retry FindCoordinator forever (#2427)
```

117678: roachtest: add elastic backup equivalent test for aws r=sumeerbhola a=aadityasondhi

Informs #107770.

Release note: None

Co-authored-by: Wenyi Hu <wenyi@cockroachlabs.com>
Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants