[BUG] TiSpark recent release is slower than before #2105

Open
birdstorm opened this issue Sep 8, 2021 · 1 comment

birdstorm commented Sep 8, 2021

The problem has been traced to a change in recent TiSpark releases: the default value of PARTITION_PER_SPLIT was changed from 10 to 1. This increases the number of Spark tasks.
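To illustrate why the smaller default produces more tasks, here is a minimal sketch of a simplified model. It assumes that the region tasks on each TiKV store are packed into groups of at most PARTITION_PER_SPLIT and that each group becomes one Spark task. The function name and the region counts are hypothetical and are not taken from the TiSpark code base.

```python
from math import ceil


def spark_task_count(regions_per_store, partition_per_split):
    """Estimate the number of Spark tasks under a simplified model.

    Assumption: region tasks on each store are packed into groups of at
    most `partition_per_split`, and every group becomes one Spark task.
    """
    if partition_per_split < 1:
        raise ValueError("partition_per_split must be >= 1")
    return sum(ceil(n / partition_per_split) for n in regions_per_store)


# Hypothetical cluster: three TiKV stores holding 40, 35 and 25 regions.
regions = [40, 35, 25]

print(spark_task_count(regions, 10))  # old default: 4 + 4 + 3 = 11
print(spark_task_count(regions, 1))   # new default: 40 + 35 + 25 = 100
```

Under this model, a default of 1 yields one task per region, so the task count grows roughly tenfold for tables with many regions. If your TiSpark version exposes this setting as a Spark configuration key (believed to be `spark.tispark.partition_per_split`, but please verify against your version), setting it back to 10 may be worth testing as a workaround.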

Some related problems:

  • ScanRequest receives slower responses from TiKV when scanning metadata.
    • cause: the scan is not concurrent.
  • Memory usage increased.
    • cause: the memory usage of ColumnVector should be optimized.

Affected versions: v2.3.14 to v2.3.16, v2.4.1

crabo commented May 11, 2022

Regarding "Memory usage": please also check DAGIterator.process(). The underlying gRPC layer always throws OutOfDirectMemoryError, even in unpooled mode. For an ETL table scan of 10 million rows, roughly 5 GB of off-heap memory is required, which is very wasteful.
