
feat(query): table meta optimize #11015

Merged
merged 31 commits into databendlabs:main on Apr 21, 2023

Conversation

@jun0315 (Contributor) commented Apr 10, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

This PR aims to improve the serialization and deserialization efficiency of metadata in Databend and to reduce storage space consumption. It also adds support for partial reads.

Design

We adopt the Bincode binary format plus Zstd compression, which significantly improves read speed and reduces the stored file size. We also use a custom file format:

The file begins with a header that stores the version number of the segment, followed by the encoding format (Bincode), stored as an enum, then the compression method, also stored as an enum, and then the lengths of the serialized data (blocks and summary). Finally, the data itself is serialized in sequence, compressed, and stored.
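As a rough illustration, the layout can be modeled like this (a minimal sketch; the field names and enum values are assumptions for illustration, not the actual Databend definitions):

#[repr(u8)]
enum MetaEncoding { Bincode = 1, MessagePack = 2, Json = 3 } // assumed values

#[repr(u8)]
enum MetaCompression { None = 0, Zstd = 1, Snappy = 2 } // assumed values

struct SegmentHeader {
    version: u64,                 // segment format version
    encoding: MetaEncoding,       // how the payload was serialized
    compression: MetaCompression, // how the payload was compressed
    blocks_size: u64,             // byte length of the serialized blocks
    summary_size: u64,            // byte length of the serialized summary
}
// The header is followed by the blocks and the summary, serialized in
// sequence and then compressed with the declared compression method.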

Closes #10265


@mergify bot added the pr-feature (this PR introduces a new feature to the codebase) label on Apr 10, 2023
@jun0315 (Contributor, Author) commented Apr 10, 2023

Test environment

Memory: 32 GB
Hard drives: 2 NVMe SSDs, 2 TB each
CPU: 12th Gen Intel i9-12900K (24) @ 6.500 GHz
Graphics card: NVIDIA GeForce GTX 1660 Ti
OS: Arch Linux

Test Script

#!/bin/bash

echo "start create table"

MYSQL_HOST="127.0.0.1"
MYSQL_USER="root"
MYSQL_PORT="3311"


mysql -h $MYSQL_HOST -u $MYSQL_USER -P $MYSQL_PORT -e "
DROP TABLE IF EXISTS hits;
CREATE TABLE hits
(
    WatchID BIGINT NOT NULL,
    JavaEnable SMALLINT NOT NULL,
    Title TEXT NOT NULL,
    GoodEvent SMALLINT NOT NULL,
    EventTime TIMESTAMP NOT NULL,
    EventDate Date NOT NULL,
    CounterID INTEGER NOT NULL,
    ClientIP INTEGER NOT NULL,
    RegionID INTEGER NOT NULL,
    UserID BIGINT NOT NULL,
    CounterClass SMALLINT NOT NULL,
    OS SMALLINT NOT NULL,
    UserAgent SMALLINT NOT NULL,
    URL TEXT NOT NULL,
    Referer TEXT NOT NULL,
    IsRefresh SMALLINT NOT NULL,
    RefererCategoryID SMALLINT NOT NULL,
    RefererRegionID INTEGER NOT NULL,
    URLCategoryID SMALLINT NOT NULL,
    URLRegionID INTEGER NOT NULL,
    ResolutionWidth SMALLINT NOT NULL,
    ResolutionHeight SMALLINT NOT NULL,
    ResolutionDepth SMALLINT NOT NULL,
    FlashMajor SMALLINT NOT NULL,
    FlashMinor SMALLINT NOT NULL,
    FlashMinor2 TEXT NOT NULL,
    NetMajor SMALLINT NOT NULL,
    NetMinor SMALLINT NOT NULL,
    UserAgentMajor SMALLINT NOT NULL,
    UserAgentMinor VARCHAR(255) NOT NULL,
    CookieEnable SMALLINT NOT NULL,
    JavascriptEnable SMALLINT NOT NULL,
    IsMobile SMALLINT NOT NULL,
    MobilePhone SMALLINT NOT NULL,
    MobilePhoneModel TEXT NOT NULL,
    Params TEXT NOT NULL,
    IPNetworkID INTEGER NOT NULL,
    TraficSourceID SMALLINT NOT NULL,
    SearchEngineID SMALLINT NOT NULL,
    SearchPhrase TEXT NOT NULL,
    AdvEngineID SMALLINT NOT NULL,
    IsArtifical SMALLINT NOT NULL,
    WindowClientWidth SMALLINT NOT NULL,
    WindowClientHeight SMALLINT NOT NULL,
    ClientTimeZone SMALLINT NOT NULL,
    ClientEventTime TIMESTAMP NOT NULL,
    SilverlightVersion1 SMALLINT NOT NULL,
    SilverlightVersion2 SMALLINT NOT NULL,
    SilverlightVersion3 INTEGER NOT NULL,
    SilverlightVersion4 SMALLINT NOT NULL,
    PageCharset TEXT NOT NULL,
    CodeVersion INTEGER NOT NULL,
    IsLink SMALLINT NOT NULL,
    IsDownload SMALLINT NOT NULL,
    IsNotBounce SMALLINT NOT NULL,
    FUniqID BIGINT NOT NULL,
    OriginalURL TEXT NOT NULL,
    HID INTEGER NOT NULL,
    IsOldCounter SMALLINT NOT NULL,
    IsEvent SMALLINT NOT NULL,
    IsParameter SMALLINT NOT NULL,
    DontCountHits SMALLINT NOT NULL,
    WithHash SMALLINT NOT NULL,
    HitColor CHAR NOT NULL,
    LocalEventTime TIMESTAMP NOT NULL,
    Age SMALLINT NOT NULL,
    Sex SMALLINT NOT NULL,
    Income SMALLINT NOT NULL,
    Interests SMALLINT NOT NULL,
    Robotness SMALLINT NOT NULL,
    RemoteIP INTEGER NOT NULL,
    WindowName INTEGER NOT NULL,
    OpenerName INTEGER NOT NULL,
    HistoryLength SMALLINT NOT NULL,
    BrowserLanguage TEXT NOT NULL,
    BrowserCountry TEXT NOT NULL,
    SocialNetwork TEXT NOT NULL,
    SocialAction TEXT NOT NULL,
    HTTPError SMALLINT NOT NULL,
    SendTiming INTEGER NOT NULL,
    DNSTiming INTEGER NOT NULL,
    ConnectTiming INTEGER NOT NULL,
    ResponseStartTiming INTEGER NOT NULL,
    ResponseEndTiming INTEGER NOT NULL,
    FetchTiming INTEGER NOT NULL,
    SocialSourceNetworkID SMALLINT NOT NULL,
    SocialSourcePage TEXT NOT NULL,
    ParamPrice BIGINT NOT NULL,
    ParamOrderID TEXT NOT NULL,
    ParamCurrency TEXT NOT NULL,
    ParamCurrencyID SMALLINT NOT NULL,
    OpenstatServiceName TEXT NOT NULL,
    OpenstatCampaignID TEXT NOT NULL,
    OpenstatAdID TEXT NOT NULL,
    OpenstatSourceID TEXT NOT NULL,
    UTMSource TEXT NOT NULL,
    UTMMedium TEXT NOT NULL,
    UTMCampaign TEXT NOT NULL,
    UTMContent TEXT NOT NULL,
    UTMTerm TEXT NOT NULL,
    FromTag TEXT NOT NULL,
    HasGCLID SMALLINT NOT NULL,
    RefererHash BIGINT NOT NULL,
    URLHash BIGINT NOT NULL,
    CLID INTEGER NOT NULL
)
CLUSTER BY (CounterID, EventDate, UserID, EventTime, WatchID);
COPY INTO hits FROM 'https://repo.databend.rs/hits/hits_1m.tsv.gz' FILE_FORMAT=(type=TSV compression=AUTO);
"

for i in {1..6}
do
    echo "start insert into for the $i time"
    mysql -h $MYSQL_HOST -u $MYSQL_USER -P $MYSQL_PORT -e "insert into hits select * from hits;"
    echo "end insert into for the $i time"
done

echo "insert data over, ready to compact segement"

mysql -h $MYSQL_HOST -u $MYSQL_USER -P $MYSQL_PORT -e "optimize table hits compact segment;"

# Run the first query and assign the result to a variable
snapshot_id=$(mysql -N -h $MYSQL_HOST -u $MYSQL_USER -P $MYSQL_PORT -e "SELECT snapshot_id FROM FUSE_SNAPSHOT('default', 'hits') limit 1;")

# Run the second query using the result from the first query
mysql -h $MYSQL_HOST -u $MYSQL_USER -P $MYSQL_PORT -e "SELECT * FROM FUSE_SEGMENT('default', 'hits', '$snapshot_id');"

Test Result

| Format      | Compression          | Compact segment size | File size ratio | Load segment time (3 runs) | Avg      | Load time ratio |
|-------------|----------------------|----------------------|-----------------|----------------------------|----------|-----------------|
| JSON        | No compress (origin) | 11 MB                | 100%            | 211 ms, 211 ms, 215 ms     | 212 ms   | 100%            |
| JSON        | Snappy               | 2.5 MB               | 22.7%           | 215 ms, 223 ms, 223 ms     | 220 ms   | 103.8%          |
| JSON        | Zstd                 | 688 KB               | 6.1%            | 216 ms, 223 ms, 213 ms     | 217.3 ms | 102.4%          |
| MessagePack | No compress          | 3.6 MB               | 32.7%           | 198 ms, 207 ms, 184 ms     | 196 ms   | 92.4%           |
| MessagePack | Snappy               | 1.1 MB               | 10%             | 205 ms, 198 ms, 194 ms     | 199 ms   | 93.9%           |
| MessagePack | Zstd                 | 532 KB               | 5%              | 194 ms, 200 ms, 193 ms     | 195 ms   | 92.0%           |
| Bincode     | No compress          | 4.5 MB               | 40.9%           | 91 ms, 91 ms, 96 ms        | 93 ms    | 43.9%           |
| Bincode     | Snappy               | 1.7 MB               | 15.5%           | 85 ms, 99 ms, 98 ms        | 94.3 ms  | 44.5%           |
| Bincode     | Zstd                 | 596 KB               | 5.3%            | 85 ms, 85 ms, 93 ms        | 87.6 ms  | 41.3%           |

Bincode + Zstd is the best in terms of both file size and read speed.

We also tested Bincode + Zstd + Header: 596 KB (5.3%), load times 84 ms, 86 ms, 94 ms, avg 88 ms (41.5%).
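For reference, the winning encode path is straightforward (a minimal sketch assuming the serde, bincode 1.x, and zstd crates; not the exact Databend code):

use serde::Serialize;

// Sketch of the Bincode + Zstd encode path; error handling simplified.
fn encode_meta<T: Serialize>(value: &T, level: i32) -> std::io::Result<Vec<u8>> {
    // Serialize to the compact bincode representation first ...
    let bytes = bincode::serialize(value)
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e.to_string()))?;
    // ... then compress the whole buffer with zstd (level 3 is a common default).
    zstd::encode_all(&bytes[..], level)
}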

@Xuanwo (Member) commented Apr 11, 2023

> Bincode + Zstd is the best in terms of both file size and read speed.

This test seems to have run on the local fs. It's worth benchmarking on S3 too.

@jun0315 (Contributor, Author) commented Apr 11, 2023

> > Bincode + Zstd is the best in terms of both file size and read speed.
>
> This test seems to have run on the local fs. It's worth benchmarking on S3 too.

Good idea! I'll test that next. :D

@jun0315 changed the title from "feat(query): segment meta optimize (WIP)" to "feat(query): segment meta optimize" on Apr 12, 2023
@jun0315 marked this pull request as ready for review on April 12, 2023 14:14
@BohuTANG (Member)

For the benchmark, the hits_1m dataset is too small; you can use https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz :)

@jun0315 (Contributor, Author) commented Apr 12, 2023

> For the benchmark, the hits_1m dataset is too small; you can use https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz :)

I tried that dataset before, but my machine was too slow at the time, so I chose the approach above. I'll test with it later!

@sundy-li (Member) left a comment

Rest LGTM.

  1. I think we should compress the snapshot in this PR together; let's keep v3 as complete as possible.
  2. We don't need message_pack, snap, and json by default; maybe it's better to make them optional dependencies behind a feature gate, which would reduce compile time (see the sketch below).
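For example, the optional codecs could be hidden behind Cargo features (a hypothetical sketch; the feature names "messagepack" and "json" are assumptions):

pub enum MetaEncoding {
    Bincode,
    // Only compiled in when the corresponding Cargo feature is enabled.
    #[cfg(feature = "messagepack")]
    MessagePack,
    #[cfg(feature = "json")]
    Json,
}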

@BohuTANG (Member)

BTW, do we have a tool or function to decode the compressed bincode meta file? Say we want to inspect the metadata in it; for JSON that's easy to do.

@jun0315 (Contributor, Author) commented Apr 12, 2023

> BTW, do we have a tool or function to decode the compressed bincode meta file? Say we want to inspect the metadata in it; for JSON that's easy to do.

Command-line tool name: databend-meta-decoder

Usage:

    databend-meta-decoder -i <INPUT> [-j] [-o <OUTPUT>] -t <TYPE>

Description:
Decodes encoded metadata and writes it to the terminal or to a file. Supports the bincode, MessagePack, and JSON encoding formats.

Options:

    -j, --json      Output data in JSON format
    -i, --input     File path to be decoded
    -o, --output    Output file path for the decoded data; if not specified, output is printed to the terminal
    -t, --type      Input data type: segment or snapshot (only sg or ss are accepted)

Examples:

1. View metadata information in binary format without saving it to a file:

    databend-meta-decoder -i data.bin

    Version: x
    Encoding: y
    Compression: z
    Blocks Size: p bytes
    Summary Size: q bytes

2. Convert metadata information from binary format to JSON and save it to a file:

    databend-meta-decoder -i data.bin -j -o output.json

Note:
If the output file already exists, it will be overwritten.
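A minimal clap sketch of this interface (hypothetical; it covers only the flags proposed above and assumes clap 4 with the derive feature):

use clap::Parser;

// Hypothetical sketch of the proposed decoder CLI.
#[derive(Parser)]
#[command(name = "databend-meta-decoder")]
struct Args {
    /// Output data in JSON format
    #[arg(short, long)]
    json: bool,
    /// File path to be decoded
    #[arg(short, long)]
    input: String,
    /// Output file path; print to the terminal if omitted
    #[arg(short, long)]
    output: Option<String>,
    /// Input data type: "sg" (segment) or "ss" (snapshot)
    #[arg(short = 't', long = "type")]
    data_type: String,
}

fn main() {
    let args = Args::parse();
    // Decode the header, then the bincode/MessagePack/JSON payload ...
    let _ = args;
}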

How about this design?

@BohuTANG (Member)

> databend-meta-decoder

We already have a service called `databend-meta`, and it has some tools. To avoid confusion, the name databend-meta-decoder needs to be changed.

@BohuTANG (Member) commented Apr 13, 2023

I have tested on S3 with the hits dataset (first run), but there seems to be no performance gain on this dataset:

[benchmark screenshot]

main:
[screenshot]

compress:
[screenshot]

cc @sundy-li @dantengsky

@jun0315 changed the title from "feat(query): segment meta optimize" to "feat(query): table meta optimize" on Apr 13, 2023
@dantengsky (Member)

@jun0315

FLASHBACK is not working as expected.

The location of the snapshot that flashed back seems to be broken.

mysql> -- table with mixed versions of snapshot
mysql> select * from fuse_snapshot('default', 'hits') limit 10;
| snapshot_id | snapshot_location | format_version | previous_snapshot_id | segment_count | block_count | row_count | bytes_uncompressed | bytes_compressed | index_size | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|
| 8ee454392ea44e77b6013ae47d785d47 | 1/367953/_ss/8ee454392ea44e77b6013ae47d785d47_v3.bincode | 3 | 1eb4da851dd740998c8cf7ddae5729ca | 1 | 768 | 128000000 | 107571195904 | 10375147776 | 763978368 | 2023-04-17 13:20:06.749049 |
| 1eb4da851dd740998c8cf7ddae5729ca | 1/367953/_ss/1eb4da851dd740998c8cf7ddae5729ca_v3.bincode | 3 | 04015a04838b436689eba248759c6526 | 17 | 768 | 128000000 | 107571195904 | 10375147776 | 763978368 | 2023-04-17 13:19:46.233535 |
| 04015a04838b436689eba248759c6526 | 1/367953/_ss/04015a04838b436689eba248759c6526_v2.json | 3 | 89f876abd62644c2a9b34b21e3ada816 | 1 | 384 | 64000000 | 53785597952 | 5187573888 | 381989184 | 2023-04-17 11:10:42.748486 |
| 89f876abd62644c2a9b34b21e3ada816 | 1/367953/_ss/89f876abd62644c2a9b34b21e3ada816_v2.json | 3 | 358b09dd29d84c77a0e3afa7e43a327e | 88 | 384 | 64000000 | 53785597952 | 5187573888 | 381989184 | 2023-04-17 11:10:41.628499 |
| 358b09dd29d84c77a0e3afa7e43a327e | 1/367953/_ss/358b09dd29d84c77a0e3afa7e43a327e_v2.json | 3 | 218f25dc6f974945824a424ed56a3842 | 72 | 192 | 32000000 | 26892798976 | 2593786944 | 190994592 | 2023-04-17 11:05:27.000547 |
| 218f25dc6f974945824a424ed56a3842 | 1/367953/_ss/218f25dc6f974945824a424ed56a3842_v2.json | 3 | a03148ed276f4a099eb795425a7737bd | 56 | 96 | 16000000 | 13446399488 | 1296893472 | 95497296 | 2023-04-17 11:02:50.153224 |
| a03148ed276f4a099eb795425a7737bd | 1/367953/_ss/a03148ed276f4a099eb795425a7737bd_v2.json | 3 | 69f164cbee3e4af5a7ef343325d076b6 | 40 | 48 | 8000000 | 6723199744 | 648446736 | 47748648 | 2023-04-17 11:01:29.665681 |
| 69f164cbee3e4af5a7ef343325d076b6 | 1/367953/_ss/69f164cbee3e4af5a7ef343325d076b6_v2.json | 3 | 09b49e08e3ec4b349cda62b79af5165f | 24 | 24 | 4000000 | 3361599872 | 324223368 | 23874324 | 2023-04-17 11:00:46.310446 |
| 09b49e08e3ec4b349cda62b79af5165f | 1/367953/_ss/09b49e08e3ec4b349cda62b79af5165f_v2.json | 3 | 4018dd73093f4286b71d7b4d7fc56fcf | 12 | 12 | 2000000 | 1680799936 | 162111684 | 11937162 | 2023-04-17 11:00:22.528997 |
| 4018dd73093f4286b71d7b4d7fc56fcf | 1/367953/_ss/4018dd73093f4286b71d7b4d7fc56fcf_v2.json | 3 | NULL | 6 | 6 | 1000000 | 840399968 | 81055842 | 5968581 | 2023-04-17 11:00:05.325648 |

10 rows in set (0.17 sec)
Read 10 rows, 2.00 KiB in 0.167 sec., 59.75 rows/sec., 11.97 KiB/sec.

mysql> -- work as expected
mysql> select sum(dnstiming) from hits;
+----------------+
| sum(dnstiming) |
+----------------+
|      673451904 |
+----------------+
1 row in set (0.96 sec)
Read 128000000 rows, 488.28 MiB in 0.932 sec., 137.36 million rows/sec., 524.00 MiB/sec.

mysql> -- flash back to a snapshot of version 2
mysql> alter table hits FLASHBACK TO (SNAPSHOT => '04015a04838b436689eba248759c6526');
Query OK, 0 rows affected (0.12 sec)


mysql> select * from fuse_snapshot('default', 'hits') limit 10;
ERROR 1105 (HY000): Code: 3001, Text = NotFound (persistent) at read, context: { response: Parts { status: 404, version: HTTP/1.1, headers: {"x-amz-request-id": "0VKQ23D8SWXYQ0ZB", "x-amz-id-2": "VyShEzIZY/eqPLL7pRBel01ERQ6Qs2OiNu5hMzmVDv2d2k4ki8jua10rW6rRFv0Ns7y/imLlRy4=", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Mon, 17 Apr 2023 13:21:52 GMT", "server": "AmazonS3"} }, service: s3, path: 1/367953/_ss/04015a04838b436689eba248759c6526_v3.bincode, range: 0- } => S3Error { code: "NoSuchKey", me

mysql> select sum(dnstiming) from hits;
ERROR 1105 (HY000): Code: 3001, Text = NotFound (persistent) at read, context: { response: Parts { status: 404, version: HTTP/1.1, headers: {"x-amz-request-id": "CGJ2R9PYCFGMNYJN", "x-amz-id-2": "ZVpwLJexiHX/qz3URRYpGyEplS9bM9/WAnyNq8sPeWVeIe7taGePWRrREuHoRIFA4qtRYpsyfwU=", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Mon, 17 Apr 2023 13:24:06 GMT", "server": "AmazonS3"} }, service: s3, path: 1/367953/_ss/04015a04838b436689eba248759c6526_v3.bincode, range: 0- } => S3Error { code: "NoSuchKey", me


@dantengsky requested a review from zhyass on April 17, 2023 13:52
@zhyass (Member) commented Apr 17, 2023

> @jun0315
>
> FLASHBACK is not working as expected.
>
> The location of the snapshot that flashed back seems to be broken.

@dantengsky
That may be a bug in snapshot reading, which I have fixed in https://github.com/datafuselabs/databend/pull/11017/files#diff-05fed86b8525d48e792a773f7ebf75b67357efae46924e26bba966e7d39663e7L81-L109.

The reason is: SnapshotIO takes the version of the root snapshot to read the previous snapshots. If the versions of consecutive snapshots differ, the problem occurs.
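In pseudocode terms, the failure mode is roughly this (hypothetical names; not the actual SnapshotIO code):

// Hypothetical illustration: building every location from the root
// snapshot's version breaks once the chain crosses a format boundary,
// e.g. a v3 .bincode root pointing at v2 .json ancestors.
fn snapshot_location(prefix: &str, id: &str, version: u64) -> String {
    if version >= 3 {
        format!("{prefix}/_ss/{id}_v3.bincode")
    } else {
        format!("{prefix}/_ss/{id}_v2.json")
    }
}
// Buggy: reuse the root snapshot's version for every ancestor.
// Fixed: use the format_version recorded for each snapshot when
// resolving its previous_snapshot_id.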

@jun0315 (Contributor, Author) commented Apr 17, 2023

> > @jun0315
> >
> > FLASHBACK is not working as expected.
> >
> > The location of the snapshot that flashed back seems to be broken.
>
> @dantengsky That may be a bug in snapshot reading, which I have fixed in https://github.com/datafuselabs/databend/pull/11017/files#diff-05fed86b8525d48e792a773f7ebf75b67357efae46924e26bba966e7d39663e7L81-L109.
>
> The reason is: SnapshotIO takes the version of the root snapshot to read the previous snapshots. If the versions of consecutive snapshots differ, the problem occurs.

@zhyass Thanks a lot; I have merged branch 'main' into segment_compress.


@dantengsky marked this pull request as ready for review on April 20, 2023 15:07
@dantengsky added the ci-benchmark (Benchmark: run all test) label on Apr 20, 2023
@zhyass (Member) left a comment

LGTM

@jun0315 (Contributor, Author) commented Apr 20, 2023

Thanks a lot for your contributions to this PR!! @dantengsky @zhyass

@dantengsky removed the ci-benchmark (Benchmark: run all test) label on Apr 21, 2023
@sundy-li (Member) commented Apr 21, 2023

The performance drop is because of #11154, but it will get better after #11088.

@dantengsky (Member) commented Apr 21, 2023

> The performance drop is because of #11154, but it will get better after #11088.

Got it.

@BohuTANG merged commit 7fc402b into databendlabs:main on Apr 21, 2023
Labels
pr-feature: this PR introduces a new feature to the codebase

Successfully merging this pull request may close these issues:
[storage] use binary compression format to store the metadata of fuse (#10265)

6 participants