Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Update concurrency formula. #1907

Merged
merged 2 commits into from
Sep 30, 2024
Merged

Conversation

marianogappa
Copy link
Contributor

@marianogappa marianogappa commented Sep 27, 2024

The Problem

There's a "bug" in the concurrency formula on the scheduler (all strategies).

This is the formula:

	tableConcurrency := max(s.concurrency/minResourceConcurrency, minTableConcurrency)
	resourceConcurrency := tableConcurrency * minResourceConcurrency

But these values are hardcoded:

minResourceConcurrency := 100
minTableConcurrency := 1

So if you replace:

	tableConcurrency := max(s.concurrency/100, 1)
	resourceConcurrency := tableConcurrency * 100

This means that any plugin whose default concurrency is <= 100 will have a table concurrency of 1, even if it's the only table being synced (assuming no one changes the default).

The Fix

I made a very subtle change in the formula. Only if concurrency is <= 100, I change the minResourceConcurrency to concurrency/10. This decreases the resource concurrency up to 10x (that we don't seem to be hitting anyway), and increases the table concurrency up to 10x.

Plugins affected (on default concurrency)

  • bamboo-hr
  • bigquery
  • clickhouse (doesn't use scheduler)
  • confluence
  • crowddev
  • file (doesn't use scheduler)
  • leanix
  • oracledb
  • s3 (doesn't use scheduler)
  • sentinelone
  • servicenow
  • shopify
  • sonarqube
  • statuspage

The Results

I'm still working on the results (it's trickier than it seems).

In principle, they are very encouraging:

BigQuery

Before

$ cli sync bigquery_to_postgresql.yaml
Loading spec(s) from bigquery_to_postgresql.yaml
Starting sync for: bigquery (cloudquery/bigquery@v1.7.0) -> [postgresql (cloudquery/postgresql@v8.6.0)]
Sync completed successfully. Resources: 26139, Errors: 0, Warnings: 0, Time: 2m4s

After

$ cli sync bigquery_to_postgresql.yaml
Loading spec(s) from bigquery_to_postgresql.yaml
Starting sync for: bigquery (cloudquery/bigquery@v1.7.0) -> [postgresql (cloudquery/postgresql@v8.6.0)]
Sync completed successfully. Resources: 26139, Errors: 0, Warnings: 0, Time: 1m27s

Result

1.43x of regular speed (43% faster)

Sentinelone

Before

$ cli sync . 
Loading spec(s) from .
Starting sync for: sentinelone (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.5.5)]
Sync completed successfully. Resources: 1231, Errors: 0, Warnings: 0, Time: 1m4s

After

$ cli sync .
Loading spec(s) from .
Starting sync for: sentinelone (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.5.5)]
Sync completed successfully. Resources: 1231, Errors: 0, Warnings: 0, Time: 15s

Result

4.27x of regular speed (327% faster)

Sonarqube

Before

$ cli sync sonarqube_to_postgresql.yaml
	Loading spec(s) from sonarqube_to_postgresql.yaml
	Starting sync for: sonarqube (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.6.0)]
	Sync completed successfully. Resources: 4594, Errors: 0, Warnings: 0, Time: 39s

After

$ cli sync sonarqube_to_postgresql.yaml
Loading spec(s) from sonarqube_to_postgresql.yaml
Starting sync for: sonarqube (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.6.0)]
Sync completed successfully. Resources: 4594, Errors: 0, Warnings: 0, Time: 22s

Result

1.77x of regular speed (77% faster)

@github-actions github-actions bot added feat and removed feat labels Sep 27, 2024
@marianogappa marianogappa marked this pull request as ready for review September 27, 2024 15:37
@marianogappa marianogappa requested a review from a team as a code owner September 27, 2024 15:37
@github-actions github-actions bot added feat and removed feat labels Sep 27, 2024
@marianogappa marianogappa merged commit adce99c into main Sep 30, 2024
8 checks passed
@marianogappa marianogappa deleted the mariano/dynamic-limiter branch September 30, 2024 15:28
kodiakhq bot pushed a commit that referenced this pull request Oct 2, 2024
🤖 I have created a release *beep* *boop*
---


## [4.64.0](v4.63.0...v4.64.0) (2024-10-01)


### Features

* Add `opts.SchedulerOpts()` helper to convert `plugin.SyncOptions` for scheduler ([#1900](#1900)) ([242fb55](242fb55))
* **remoteoauth:** Add `WithToken` option ([#1898](#1898)) ([ff7a485](ff7a485))
* Update concurrency formula. ([#1907](#1907)) ([adce99c](adce99c))


### Bug Fixes

* **deps:** Update aws-sdk-go-v2 monorepo ([#1903](#1903)) ([ce2a0ef](ce2a0ef))
* **deps:** Update aws-sdk-go-v2 monorepo ([#1908](#1908)) ([bea3b00](bea3b00))
* **deps:** Update module github.com/cloudquery/plugin-pb-go to v1.22.4 ([#1912](#1912)) ([c28aabe](c28aabe))
* **deps:** Update module golang.org/x/oauth2 to v0.23.0 ([#1910](#1910)) ([6fe6414](6fe6414))
* **deps:** Update module google.golang.org/grpc to v1.67.0 ([#1904](#1904)) ([a349812](a349812))
* **deps:** Update opentelemetry-go monorepo ([#1911](#1911)) ([78e05e1](78e05e1))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants