[storage] use binary compression format to store the metadata of fuse #10265

Closed
sundy-li opened this issue Feb 28, 2023 · 1 comment · Fixed by #11015

sundy-li commented Feb 28, 2023

Summary

Loading the hits dataset into Databend generates about 16 segments.

Each segment's metadata is stored in JSON format. A segment contains many fields, which makes the file very large.

The hits dataset is small (about 90 million rows, 20 GB of compressed data), but the metadata alone takes about 16 × 12 MB ≈ 192 MB. We need to reduce the metadata size by storing it in a compressed binary format.

❯ du -sh  _data/1/208469/_sg/f48d4df4462144a3a234fef0d8cd28bf_v2.json
12M     _data/1/208469/_sg/f48d4df4462144a3a234fef0d8cd28bf_v2.json

If we set table_meta_segment_count to zero, each query loads the segment metadata multiple times, which is very slow!

Though we already cache this metadata, a smaller size would still be better.
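
To give a rough feel for the kind of saving being asked for, here is a minimal Python sketch that serializes the same made-up segment metadata as JSON and as a zlib-compressed fixed-width binary layout. Everything in it is an assumption for illustration: the field names (row_count, block_size, col_stats, and so on), the struct layouts, and the choice of zlib are hypothetical, and this is not Databend's actual segment schema or the format adopted in the fix.

```python
import json
import random
import struct
import zlib

# Hypothetical fixed-width layouts (little-endian); not Databend's real schema.
HEADER = struct.Struct("<II")    # format_version, number of blocks
BLOCK = struct.Struct("<QQI")    # row_count, block_size, number of columns
COLUMN = struct.Struct("<IqqQ")  # column_id, min, max, null_count


def make_segment(num_blocks=200, num_columns=20, seed=0):
    """Build made-up segment metadata: blocks with per-column statistics."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(num_blocks):
        col_stats = []
        for column_id in range(num_columns):
            low = rng.randrange(1_000_000)
            col_stats.append({
                "column_id": column_id,
                "min": low,
                "max": low + rng.randrange(1_000),
                "null_count": rng.randrange(100),
            })
        blocks.append({
            "row_count": 1_000_000,
            "block_size": rng.randrange(1, 1 << 30),
            "col_stats": col_stats,
        })
    return {"format_version": 2, "blocks": blocks}


def encode_json(segment):
    """Baseline: plain JSON text, field names repeated for every entry."""
    return json.dumps(segment).encode("utf-8")


def encode_binary(segment):
    """Pack fields at fixed widths, then compress the whole buffer with zlib."""
    out = bytearray(HEADER.pack(segment["format_version"], len(segment["blocks"])))
    for block in segment["blocks"]:
        out += BLOCK.pack(block["row_count"], block["block_size"], len(block["col_stats"]))
        for col in block["col_stats"]:
            out += COLUMN.pack(col["column_id"], col["min"], col["max"], col["null_count"])
    return zlib.compress(bytes(out), 6)


def decode_binary(data):
    """Inverse of encode_binary; rebuilds the same dict structure."""
    raw = zlib.decompress(data)
    version, num_blocks = HEADER.unpack_from(raw, 0)
    offset = HEADER.size
    blocks = []
    for _ in range(num_blocks):
        row_count, block_size, num_columns = BLOCK.unpack_from(raw, offset)
        offset += BLOCK.size
        col_stats = []
        for _ in range(num_columns):
            column_id, low, high, null_count = COLUMN.unpack_from(raw, offset)
            offset += COLUMN.size
            col_stats.append({"column_id": column_id, "min": low,
                              "max": high, "null_count": null_count})
        blocks.append({"row_count": row_count, "block_size": block_size,
                       "col_stats": col_stats})
    return {"format_version": version, "blocks": blocks}


if __name__ == "__main__":
    segment = make_segment()
    as_json = encode_json(segment)
    as_binary = encode_binary(segment)
    assert decode_binary(as_binary) == segment  # lossless round trip
    print(f"JSON: {len(as_json)} bytes, binary+zlib: {len(as_binary)} bytes")
```

Most of the saving in a sketch like this comes from not repeating field names for every block and column; compression then removes further redundancy. A smaller encoded file should also make each reload cheaper when the metadata is not served from cache.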

sundy-li added the A-storage (Area: databend storage) label on Feb 28, 2023

jun0315 commented Mar 18, 2023

/assignme
