Change to suggested schema? #64
Comments
I also got similar results when experimenting with compression and LowCardinality, but I don't think it's universal enough to be recommended as a generic solution. |
I'd have thought that, other than partitioning, the suggested changes above should benefit everyone, as Path is repeated a lot and Time/Date/Timestamp are all usually increasing with lots of duplicates. |
I mean, I think it's not possible to remove Date, it's used by graphite-clickhouse. You can zero Timestamp and get mostly the same effect, which is the documented and recommended way. You can't remove Timestamp because it's required (maybe not anymore?) by the GraphiteMergeTree engine. |
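(The thread doesn't show the exact mechanism for zeroing Timestamp; as a minimal plain-SQL sketch, assuming the graphite_data table queried below, one could make the zero default explicit and simply omit the column on insert:)

ALTER TABLE graphite_data
    MODIFY COLUMN Timestamp UInt32 DEFAULT 0

-- inserts that omit Timestamp now store zeros, which compress to almost
-- nothing; existing parts are unaffected until they are rewritten by merges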
See this extract from one of my servers:

SELECT
column,
any(type),
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
sum(rows)
FROM system.parts_columns
WHERE (table = 'graphite_data') AND active
GROUP BY column
ORDER BY column ASC
┌─column────┬─any(type)─┬─compressed─┬─uncompressed─┬──sum(rows)─┐
│ Date │ Date │ 69.84 MiB │ 12.24 GiB │ 6569285540 │
│ Path │ String │ 4.60 GiB │ 1.04 TiB │ 6569285540 │
│ Time │ UInt32 │ 17.00 GiB │ 24.47 GiB │ 6569285540 │
│ Timestamp │ UInt32 │ 19.38 GiB │ 24.47 GiB │ 6569285540 │
│ Value │ Float64 │ 6.40 GiB │ 48.94 GiB │ 6569285540 │
└───────────┴───────────┴────────────┴──────────────┴────────────┘

The … I just found out the setting for … |
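(For reference, a minimal sketch of the LowCardinality experiment discussed in this thread, assuming the graphite_data table above; the exact statement is not shown anywhere in the thread:)

ALTER TABLE graphite_data
    MODIFY COLUMN Path LowCardinality(String)

-- re-running the system.parts_columns query above on newly merged parts
-- shows the effect on compressed size; note the marks-size caveat below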
Be aware that LowCardinality impacts the size of the marks. Since it doubles them for Path, I've had a lot of OOMs during experiments with it. In the ClickHouse Telegram chat, people suggested I use ZSTD, and it works quite well too. Here's my own experiments' final state:

ATTACH TABLE data_lr
(
`Path` String CODEC(ZSTD(3)),
`Value` Float64 CODEC(Gorilla, LZ4),
`Time` UInt32 CODEC(DoubleDelta, LZ4),
`Date` Date CODEC(DoubleDelta, LZ4),
`Timestamp` UInt32 CODEC(DoubleDelta, LZ4)
)
ENGINE = ReplicatedGraphiteMergeTree('/clickhouse/tables/graphite.data_lr/{shard}', '{replica}', 'graphite_rollup')
PARTITION BY toYYYYMMDD(Date)
ORDER BY (Path, Time)
SETTINGS index_granularity = 256

And here's the resulting data on the host:

SELECT
column,
any(type),
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
sum(rows)
FROM system.parts_columns
WHERE (table = 'data_lr') AND active
GROUP BY column
ORDER BY column ASC
┌─column────┬─any(type)─┬─compressed─┬─uncompressed─┬───sum(rows)─┐
│ Date │ Date │ 63.59 MiB │ 48.97 GiB │ 26289415098 │
│ Path │ String │ 9.01 GiB │ 1.54 TiB │ 26289415098 │
│ Time │ UInt32 │ 980.84 MiB │ 97.94 GiB │ 26289415098 │
│ Timestamp │ UInt32 │ 4.29 GiB │ 97.94 GiB │ 26289415098 │
│ Value │ Float64 │ 60.51 GiB │ 195.87 GiB │ 26289415098 │
└───────────┴───────────┴────────────┴──────────────┴─────────────┘

@Hipska what do you use for the Value column? |
4 months later, and I have seen that the compression ratio for the Timestamp column … So again, why is it needed? @Felixoid I didn't do anything special to the data model, I only added a TTL based on the … |
The Timestamp column is used internally by GraphiteMergeTree as the Version column during the rollup process. I'm afraid I no longer remember the details of how it's used, but the doc tells: …
Here's a comment from the code: https://github.com/ClickHouse/ClickHouse/fc42851/master/src/Processors/Merges/Algorithms/GraphiteRollupSortedAlgorithm.h#L78-L90

/* | path | time | rounded_time | version | value | unmodified |
* -----------------------------------------------------------------------------------
* | A | 11 | 10 | 1 | 1 | a | |
* | A | 11 | 10 | 3 | 2 | b |> subgroup(A, 11) |
* | A | 11 | 10 | 2 | 3 | c | |> group(A, 10)
* ----------------------------------------------------------------------------------|>
* | A | 12 | 10 | 0 | 4 | d | |> Outputs (A, 10, avg(2, 5), a)
* | A | 12 | 10 | 1 | 5 | e |> subgroup(A, 12) |
* -----------------------------------------------------------------------------------
* | A | 21 | 20 | 1 | 6 | f |
* | B | 11 | 10 | 1 | 7 | g |
* ...
 */

I don't think setting … :

SELECT
column,
any(type),
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS uncompressed,
sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes) AS ratio,
sum(rows)
FROM system.parts_columns
WHERE ((table = 'data_lr') AND (database = 'graphite')) AND active
GROUP BY column
ORDER BY column ASC
┌─column────┬─any(type)─┬─compressed─┬─uncompressed─┬──────────────ratio─┬───sum(rows)─┐
│ Date │ Date │ 59.17 MiB │ 45.58 GiB │ 788.813125428121 │ 24468984334 │
│ Path │ String │ 9.42 GiB │ 1.41 TiB │ 153.15145708165463 │ 24468984334 │
│ Time │ UInt32 │ 961.97 MiB │ 91.15 GiB │ 97.0315116787169 │ 24468984334 │
│ Timestamp │ UInt32 │ 4.33 GiB │ 91.15 GiB │ 21.03714874982845 │ 24468984334 │
│ Value │ Float64 │ 55.59 GiB │ 182.31 GiB │ 3.279419424878342 │ 24468984334 │
└───────────┴───────────┴────────────┴──────────────┴────────────────────┴─────────────┘ |
And one additional note: when I've tested the … |
So, I still don't see a use case to have anything other than … |
Because neither I nor you know the internals of the GraphiteMergeTree engine, I'd say. And I feel uncomfortable making a decision for everybody by changing the default behavior. Besides that, with a proper codec, it doesn't hurt so much. Plus, it just came to my mind that it would make sense to add a custom TTL to the Timestamp column. Something pretty big, to be sure that rollup is done at least once. Then it would be pretty insignificant. |
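(A minimal sketch of that TTL idea, assuming the data_lr schema above; the 6-month interval is an arbitrary placeholder, chosen only to be comfortably longer than the rollup cadence. A column TTL resets expired values to the column default, i.e. zero here:)

ALTER TABLE data_lr
    MODIFY COLUMN Timestamp UInt32 CODEC(DoubleDelta, LZ4) TTL Date + INTERVAL 6 MONTH

-- the codec is restated because MODIFY COLUMN replaces the whole column
-- definition; after expiry, runs of zeros compress to almost nothing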
But instead of changing the table schema, is there an issue with using toYYYYMMDD() for partitioning? Optimizing >1.5 TB partitions down to ~100 GB just needs a useless amount of empty space. (Does anybody know why ClickHouse requires the 100% overhead?) |
It's something I can answer.
It depends more on the amount of data. Partitions bring overhead, and if you have broad historical inserts (like over a year), the memory consumption for such batches would be huge. On the other hand, the optimizing overhead is minimized, so everyone should find the balance for themselves. I personally have used partitions of 3 days, but half a year ago I migrated to …
It's inherited from the MergeTree engine. The 100% space allocation is required to be safe while merging all parts together. The server can't know in advance how much space will really be needed or how aggressive the aggregation will be, so it can't predict the necessary allocation more precisely than just taking the existing size as a rule of thumb. |
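(To gauge that balance before choosing a partitioning key, a query sketch against the built-in system.parts table; graphite_data is the assumed table name:)

SELECT
    partition,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS on_disk
FROM system.parts
WHERE (table = 'graphite_data') AND active
GROUP BY partition
ORDER BY partition ASC

The on-disk size of the biggest partition is roughly the free space to budget for optimizing it, per the 100% overhead described above.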
Basically I'm inserting ~400k rows/s with data from the last 10..300 seconds. Old data is rarely inserted.
Thanks, appreciated. |
Hi guys, would graphite-clickhouse benefit from any projections? https://www.youtube.com/watch?v=jJ5VuLr2k5k |
Yes, it definitely looks interesting. It looks like it brings the ability to define only one table for both direct and reversed points at once. Thank you for bringing it up! 👍 |
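(A hypothetical sketch of what such a projection could look like; neither the reversed-path expression nor the projection name comes from this thread, and whether projections are supported on GraphiteMergeTree tables would need verifying:)

ALTER TABLE data_lr
    ADD PROJECTION reversed
    (
        -- keep a copy of the data ordered by the segment-reversed path,
        -- so direct and reversed lookups can be served from one table
        SELECT *
        ORDER BY (arrayStringConcat(reverse(splitByChar('.', Path)), '.'), Time)
    );

ALTER TABLE data_lr MATERIALIZE PROJECTION reversed;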
For the graphite table, switching to the following moves us from 5 bytes/row to 0.6 bytes/row for a large dataset: …

- toYYYYMMDD() might be a better partitioning key for large volumes of data
- ClickHouse/ClickHouse#12144 (comment) also suggests removing the Date col - perhaps this could be an option (although when compressed as above it only takes a very small amount of space)