`ParquetExec::statistics::is_exact` likely wrong/misunderstood #5614
Labels
bug
Something isn't working
A `ParquetExec` is created from a `FileScanConfig` and an optional filter predicate[^1]. These two are different, independent parameters; at least the documentation does not imply that the predicate should be considered when constructing the `FileScanConfig`. The statistics for the `ParquetExec` are calculated by `FileScanConfig::project`:

https://github.com/apache/arrow-datafusion/blob/0f6931caa6f8b48e116a8e77e989c404f31f3f8d/datafusion/core/src/physical_plan/file_format/mod.rs#L213-L219

This forwards `is_exact` from the input, which might have been set to `true`. However, if there is a predicate, `is_exact` should likely be `false`, because the predicate may remove some data, which invalidates the exact statistics. So either the forwarding is wrong (at least when a predicate is given) or the docs are imprecise.

Note that this is unrelated to #5613, because this issue is about the `is_exact=true` case.

Footnotes
[^1]: And a metadata size hint, but this is irrelevant here.
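
To make the concern concrete, here is a minimal, self-contained sketch. It does not use the DataFusion API: `Statistics` is a stripped-down stand-in for the real struct, and `Scan`, `statistics_forwarded`, `statistics_conservative`, and `execute` are hypothetical names invented for illustration. The conservative variant is only one possible reading of what the behavior should be, not a proposed patch.

```rust
// Simplified stand-in for DataFusion's `Statistics`; NOT the real type.
#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    num_rows: Option<usize>,
    is_exact: bool,
}

// Hypothetical scan node: file-level statistics plus an optional row filter,
// mirroring the "config + optional predicate" split described in the issue.
struct Scan {
    file_stats: Statistics,
    predicate: Option<fn(i64) -> bool>,
}

impl Scan {
    // Behavior described in the issue: `is_exact` is forwarded unchanged,
    // whether or not a predicate is present.
    fn statistics_forwarded(&self) -> Statistics {
        self.file_stats.clone()
    }

    // One conservative alternative (an assumption, not DataFusion's code):
    // only claim exactness when no predicate can drop rows.
    fn statistics_conservative(&self) -> Statistics {
        Statistics {
            num_rows: self.file_stats.num_rows,
            is_exact: self.file_stats.is_exact && self.predicate.is_none(),
        }
    }

    // Apply the predicate (if any) to the rows the scan would produce.
    fn execute(&self, rows: &[i64]) -> Vec<i64> {
        rows.iter()
            .copied()
            .filter(|r| self.predicate.map_or(true, |p| p(*r)))
            .collect()
    }
}

fn is_even(v: i64) -> bool {
    v % 2 == 0
}

fn main() {
    let rows = [1, 2, 3, 4];
    let scan = Scan {
        file_stats: Statistics { num_rows: Some(rows.len()), is_exact: true },
        predicate: Some(is_even),
    };

    let produced = scan.execute(&rows).len();
    let forwarded = scan.statistics_forwarded();
    let conservative = scan.statistics_conservative();

    // The forwarded statistics still claim an exact row count, although the
    // predicate removed rows; the conservative variant drops the claim.
    println!("rows produced: {produced}");
    println!("forwarded:     {forwarded:?}");
    println!("conservative:  {conservative:?}");
}
```

In this sketch the forwarded statistics report an exact `num_rows` that no longer matches what the scan produces once the predicate is applied, which is the mismatch the issue describes.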