Avoid reading min/max/null statistics for planning iceberg inserts #23757

Merged: 2 commits merged into trinodb:master from the ice-metadata-for-write branch on Oct 13, 2024

Conversation

@raunaqmorarka (Member) commented on Oct 11, 2024:

Description

For large tables, going through statistics for all files can be slow.
The calling code in getStatisticsCollectionMetadataForWrite was not
using all of those statistics, so it is simplified to fetch only NDVs.
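
In rough terms, the simplification means the write-time statistics request can be derived from the table schema and the existing NDV bookkeeping alone, with no pass over per-file min/max/null-count statistics. Below is a minimal sketch of that decision using simplified stand-in types rather than Trino's actual SPI (IcebergColumn, columnsNeedingNdvCollection, and both parameters are illustrative names):

import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch only: simplified stand-ins, not Trino's SPI classes.
public class NdvOnlyStatsPlanning
{
    // Hypothetical minimal view of a column: its name and whether its type supports NDV collection.
    record IcebergColumn(String name, boolean supportsNdv) {}

    // Decide which columns should get NDV (distinct-count) collection during an INSERT.
    // The decision needs only the schema, whether the table is empty, and which columns
    // already have NDVs recorded (e.g. from a previous ANALYZE); no manifest walk and
    // no aggregation of min/max/null counts across data files is required.
    static Set<String> columnsNeedingNdvCollection(
            List<IcebergColumn> columns,
            boolean tableIsEmpty,
            Set<String> columnsWithKnownNdv)
    {
        return columns.stream()
                .filter(IcebergColumn::supportsNdv)
                .map(IcebergColumn::name)
                .filter(column -> tableIsEmpty || columnsWithKnownNdv.contains(column))
                .collect(Collectors.toSet());
    }
}

The trade-off discussed further down the thread is that a wrong decision at this level only affects NDVs, which a later ANALYZE can correct.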

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Iceberg
* Improve planning time for Iceberg inserts. ({issue}`23757`)

@findinpath (Contributor) left a comment:

LGTM

My only concern is that we should guard the code against potential future regressions on the write path which could be affected by min/max stats.

@@ -346,6 +347,7 @@
import static org.apache.iceberg.SnapshotSummary.DELETED_RECORDS_PROP;
import static org.apache.iceberg.SnapshotSummary.REMOVED_EQ_DELETES_PROP;
import static org.apache.iceberg.SnapshotSummary.REMOVED_POS_DELETES_PROP;
import static org.apache.iceberg.SnapshotSummary.TOTAL_RECORDS_PROP;
Contributor:

The calling code in getStatisticsCollectionMetadataForWrite was not
using all the statistics and is simplified to only fetch NDVs.

For regression-prevention purposes: can we ensure through a test that the calling code does not depend on min/max stats retrieved from io.trino.plugin.iceberg.TableStatisticsReader#makeTableStatistics?

I remember raising the idea promoted by this PR a while ago, and @findepi was rather cautious about making this change because of potential regressions.

@raunaqmorarka (Member, Author):

The important thing here is avoiding a pass over all manifest files when planning inserts, not the use of min/max/null stats as such. There are tests which assert on manifest file accesses on the filesystem. If this code were getting those stats through some other, cheaper means, then we wouldn't care about it.
At worst, there are a couple of ways of making a mistake in this code:

  1. We generate NDV stats even though we don't know the existing NDVs and end up undercounting NDVs by recording only the NDVs collected on write. The other min/max/null stats would still be correct, so the CBO may give worse plans, but it's not the end of the world. Eventually the stats will become more accurate, or a call to ANALYZE will fix the whole thing.
  2. We fail to detect that the table is empty or that NDVs are known, and skip generating them on write. Again we possibly get worse plans from the CBO while the other non-NDV stats are still intact, and a call to ANALYZE will fix the problem.

Either way, these are much more tolerable problems than the one caused by the current code, where planning can bottleneck INSERT queries for minutes.
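
A rough sketch of the kind of file-operation assertion those tests make follows. The assertFileSystemAccesses call, the FileOperation constants, and the builder style are taken from the test diff below; the query and the expected set shown here are illustrative, not the test's full expected multiset:

// Inside the existing file-operation test (ImmutableMultiset is Guava's
// com.google.common.collect.ImmutableMultiset); the expected set is trimmed for illustration.
assertFileSystemAccesses(
        "INSERT INTO test_insert VALUES (42)",
        ImmutableMultiset.<FileOperation>builder()
                .add(new FileOperation(STATS, "InputFile.newStream"))
                .add(new FileOperation(SNAPSHOT, "OutputFile.create"))
                .add(new FileOperation(MANIFEST, "OutputFile.create"))
                // Notably absent: FileOperation(MANIFEST, "InputFile.newStream"), since
                // planning the INSERT should no longer read the existing manifests.
                .build());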

@@ -234,7 +233,6 @@ public void testInsert()
.add(new FileOperation(STATS, "InputFile.newStream"))
.add(new FileOperation(SNAPSHOT, "OutputFile.create"))
.add(new FileOperation(MANIFEST, "OutputFile.create"))
.add(new FileOperation(MANIFEST, "InputFile.newStream"))
Contributor:

Maybe we should note in the commit message that the manifest list corresponding to the table snapshot is no longer read during writes.

For large tables going through statistics for all files can be slow.
The calling code in getStatisticsCollectionMetadataForWrite was not
using all the statistics and is simplified to only fetch NDVs.
@raunaqmorarka merged commit 0c0dda1 into trinodb:master on Oct 13, 2024.
42 checks passed.
@raunaqmorarka deleted the ice-metadata-for-write branch on Oct 13, 2024 at 10:14.
The github-actions bot added this to the 462 milestone on Oct 13, 2024.
3 participants