Reduce clone of Statistics in ListingTable and PartitionedFile #11802

Conversation
According to a simple benchmark on ClickBench q0.

This is very exciting and a great idea. Thank you @Rachelint. We have seen similar performance challenges cloning Statistics.

Glad to see that it can help! In fact, I want to refactor the returned value as well, but I don't know if it is ok to modify such a function in the public trait...
(Force pushes and title edits; the title evolved from "Reduce clone of Statistics by using arc" to "Reduce clone of Statistics in ListingTable -- 2x faster for ClickBench Q0".)
More benchmarks (the ones showing no change):

Some strange results were found during the benchmarks (see #11807), but after pulling main and rebasing, the results became different...
Thank you very much @Rachelint -- I went through this PR carefully. I have some comments that I think could make the code better but I don't think they are necessary to merge this PR
@@ -78,10 +78,11 @@ pub struct PartitionedFile {
     ///
     /// DataFusion relies on these statistics for planning (in particular to sort file groups),
     /// so if they are incorrect, incorrect answers may result.
-    pub statistics: Option<Statistics>,
+    pub statistics: Option<Arc<Statistics>>,
💯 This alone will likely avoid a bunch of copying
It is also an API change, so I marked the PR thusly
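For illustration only (this is not DataFusion's actual Statistics definition), a minimal sketch of why the Arc wrapper matters: cloning an Arc is just a reference-count bump, while cloning the inner value copies every heap allocation.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a statistics struct with heap-allocated fields.
#[derive(Clone)]
struct FileStats {
    column_mins: Vec<String>,
    column_maxes: Vec<String>,
}

fn main() {
    let stats = Arc::new(FileStats {
        column_mins: vec!["a".to_string(); 1_000],
        column_maxes: vec!["z".to_string(); 1_000],
    });

    // Cheap: only increments the reference count; the vectors are shared.
    let shared = Arc::clone(&stats);

    // Expensive: copies every String in both vectors.
    let deep: FileStats = (*stats).clone();

    assert_eq!(shared.column_mins.len(), deep.column_mins.len());
}
```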
@@ -159,6 +160,24 @@ impl From<ObjectMeta> for PartitionedFile {
     }
 }

+impl Default for PartitionedFile {
Isn't this the same as #[derive(Default)]?
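As a quick illustration of the question above, using a hypothetical struct rather than the real PartitionedFile: a manual Default impl that just defaults every field is what #[derive(Default)] already generates.

```rust
use std::sync::Arc;

// Hypothetical struct shaped loosely like PartitionedFile.
#[derive(Default)]
struct FileEntry {
    path: String,
    size: u64,
    statistics: Option<Arc<String>>,
}

// The derive above expands to roughly this hand-written impl:
//
// impl Default for FileEntry {
//     fn default() -> Self {
//         Self {
//             path: String::default(),
//             size: u64::default(),
//             statistics: None,
//         }
//     }
// }

fn main() {
    let empty = FileEntry::default();
    assert!(empty.statistics.is_none());
    assert_eq!(empty.size, 0);
}
```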
     total_byte_size =
-        add_row_stats(file_stats.total_byte_size, total_byte_size);
+        add_row_stats(file_stats.total_byte_size.clone(), total_byte_size);
I double checked that the stats here are Precision<usize> (and thus this clone is not a performance problem).
I also made a small experiment with an alternate formulation where it might be clearer that the copy is not occurring (last commit in #11828).
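To make the "this clone is cheap" point concrete, here is a simplified stand-in for a Precision-like enum (not the real DataFusion type): when the payload is a usize, a clone copies only a tag and an integer, unlike a Precision<ScalarValue> clone that may duplicate heap data.

```rust
// Simplified stand-in for a Precision-like enum, not DataFusion's definition.
#[derive(Clone, Debug, PartialEq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

fn main() {
    let total_byte_size: Precision<usize> = Precision::Exact(4096);

    // Cloning copies only the discriminant plus a usize -- effectively free.
    let copied = total_byte_size.clone();
    assert_eq!(copied, Precision::Exact(4096));

    let unknown: Precision<usize> = Precision::Absent;
    assert_ne!(unknown, Precision::Inexact(0));
}
```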
) -> Precision<ScalarValue> {
    match (&min_values, &min_nominee) {
        (Precision::Exact(val1), Precision::Exact(val2)) if val1 > val2 => min_nominee,

    min_nominee: &Precision<ScalarValue>,
💯 to reduce this copy
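A hedged sketch of the pattern this diff moves toward, with simplified types rather than the actual DataFusion code: take the nominee by reference, compare without cloning, and clone only when it actually replaces the current value.

```rust
// Simplified Precision over String, standing in for Precision<ScalarValue>.
#[derive(Clone, Debug, PartialEq)]
enum Precision<T> {
    Exact(T),
    Absent,
}

// The current minimum is taken by value; the nominee only by reference,
// so it is cloned only when it wins the comparison.
fn min_of(current: Precision<String>, nominee: &Precision<String>) -> Precision<String> {
    match (&current, nominee) {
        (Precision::Exact(cur), Precision::Exact(new)) if new < cur => nominee.clone(),
        _ => current,
    }
}

fn main() {
    let current = Precision::Exact("m".to_string());
    let nominee = Precision::Exact("a".to_string());
    assert_eq!(min_of(current, &nominee), Precision::Exact("a".to_string()));
}
```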
💯 to reduce this copy

I think the alternative may be that we refactor the clone-expensive scalars to a clone-cheap representation (like String to Arc<str>)?
I think that would also be something interesting to pursue 💯
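A small sketch of the String to Arc<str> idea mentioned above (illustrative only, not a change to ScalarValue itself): the clone cost drops from copying the whole buffer to bumping a reference count.

```rust
use std::sync::Arc;

fn main() {
    // Cloning a String copies the entire heap buffer: O(len).
    let owned: String = "some long utf8 column value ".repeat(1_000);
    let deep_copy = owned.clone();

    // Cloning an Arc<str> only increments a reference count: O(1).
    // This is why moving clone-heavy scalar payloads behind Arc makes
    // repeated clones cheap.
    let shared: Arc<str> = Arc::from(owned.as_str());
    let cheap_copy = Arc::clone(&shared);

    assert_eq!(deep_copy.len(), cheap_copy.len());
}
```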
     partitioned_files
-        .chunks(chunk_size)
-        .map(|c| c.to_vec())
+        .chunks_mut(chunk_size)
+        .map(|c| c.iter_mut().map(mem::take).collect())
         .collect()
 }
I think this could also be formulated with drain(), which avoids the need for Default: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.drain

Here is a POC of it working: #11829
let mut chunks = Vec::with_capacity(n);
let mut current_chunk = Vec::with_capacity(chunk_size);
// Move each file out of the source vector without cloning.
for file in partitioned_files.drain(..) {
    current_chunk.push(file);
    if current_chunk.len() == chunk_size {
        // Hand the filled chunk over and start a fresh, empty one.
        chunks.push(mem::take(&mut current_chunk));
    }
}
// Keep the trailing, partially filled chunk.
if !current_chunk.is_empty() {
    chunks.push(current_chunk)
}
chunks
(I don't know if this matters for performance)
I used this to replace the chunks_mut approach and see no change in performance, but it is really good to eliminate the Default requirement on PartitionedFile.
I am running benchmarks on my test machine too, but I think this looks great to me.

I think there can be some significant variation in performance when we are measuring queries that take 100s of ms -- so it may be measurement noise.
I couldn't reproduce the performance improvement 🤔
@alamb It mainly improves the short queries 🤔 But it is strange that it gets slower in other cases (especially the longer queries).

I am running more benchmarks.
That is my expectation too.

My script tries to control for this by comparing against main.

Yes, maybe.

Then thank you. I also hope to spend some time shortly looking into this (and it looks like you have done some additional work too).
I am still working on finding the reason why the long queries are slower (e.g. q22, as mentioned in the strange result above). After pulling and rebasing to the latest main, this branch took 9200ms and main took 8800ms... The code change here can hardly make such a difference (the planning stage took less than 5ms). I used perf to collect some cpu metrics, and they also seem almost the same...
It is really interesting. I profiled the two branches with q22 in ClickBench. Then I found ... Then I only reverted the commit ... Seems the ...
@alamb finally I think I got the reason; it seems it is not measurement noise for the long queries (such as q22 in ClickBench)... The introduction of the ... I guess it is related to the ... I eliminated the ... The new benchmarks can be seen below.
The target case:

The other non-target cases:
I find it very strange that ...

I'll rerun and see what I can see.
The statistics are actually not used once execution has started; I guess it may be due to the drop shown in datafusion/core/src/datasource/physical_plan/file_stream.rs, lines 291 to 305 (at 16a3557).
Maybe a possible alternative worth trying in the future: we take and drop the ...
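A hedged sketch of the "take and drop early" idea, assuming the field in question is a planning-only payload such as the statistics; the type and field names below are made up for illustration, not DataFusion's real ones. Moving the value out with Option::take lets it be freed before the timed execution phase rather than at the very end.

```rust
use std::sync::Arc;

// Illustrative struct: carries data that is only needed during planning.
struct PlannedScan {
    planning_only_stats: Option<Arc<Vec<u8>>>,
    files: Vec<String>,
}

impl PlannedScan {
    fn start_execution(&mut self) {
        // Drop the planning-only payload now instead of when the whole scan
        // is dropped, so its deallocation cost is not attributed to execution.
        drop(self.planning_only_stats.take());

        for file in &self.files {
            // ... stream each file ...
            let _ = file;
        }
    }
}

fn main() {
    let mut scan = PlannedScan {
        planning_only_stats: Some(Arc::new(vec![0u8; 1024])),
        files: vec!["part-0.parquet".to_string()],
    };
    scan.start_execution();
    assert!(scan.planning_only_stats.is_none());
}
```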
Removed the api change label as we have now removed the ... I agree the timings look good now. Would you be willing to create a new PR with just the ...? Thanks again @Rachelint.
@alamb I found the reason finally, ...

Yes, planning to; it is really worth pursuing.
Thanks -- filed #11885
Which issue does this PR close?
Part of #11719
Rationale for this change
What changes are included in this PR?
- Reduce clones of Statistics by using Arc.
- Reduce copies in get_statistics_with_limit (it seems some temporary vectors may exist, but I am not sure).

Are these changes tested?

By existing tests.
Are there any user-facing changes?
No.