Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT MERGE: Refactor plan #230

Closed
wants to merge 3 commits into from
Closed

DO NOT MERGE: Refactor plan #230

wants to merge 3 commits into from

Conversation

fqtab
Copy link
Contributor

@fqtab fqtab commented Mar 29, 2024

This DRAFT is just to give interested stakeholders an idea of where/how I'm planning to refactor things to enable:

  • Pluggable Committer interface
    • kafka-connect/src/main/java/io/tabular/iceberg/connect/api/Committer.java
  • Consumer side zombie fencing
    • kafka-connect/src/main/java/io/tabular/iceberg/connect/committer/v1/CommitterImpl.java
  • Producer side zombie fencing and separate control cluster capabilities
    • kafka-connect/src/main/java/io/tabular/iceberg/connect/committer/v2/CommitterImpl.java
    • kafka-connect-runtime/src/test/java/io/tabular/iceberg/connect/IntegrationControlClusterTest.java

The plan is to break this up into smaller PRs with wayyyy better code (not to mention better test coverage)
Also, there are breaking changes as well as unnecessary changes in this PR (because this is just a POC).

The main downside of this refactor is we create a TaskWriter per topic partition per table per in a given Task now.
Previously, we only create a TaskWriter per table in given Task.
This is necessary to enable producer generation based fencing.
However this likely means higher memory usage (think particularly about the table fanout usecase).

PRs planned:

  1. Write some important tests to avoid breaking changes
  2. Refactor existing code to conform to new interfaces (Writer, Committer)
  3. Add zombie fencing to Committer implementation
  4. Add a new Committer that allows for separate control cluster

@fqtab fqtab force-pushed the fence_worker_zombies branch from 79e367a to 11d0096 Compare April 2, 2024 18:49
@fqtab fqtab changed the title DRAFT: Fence worker zombies DRAFT: The future Apr 2, 2024
@fqtab fqtab changed the title DRAFT: The future DO NOT REVIEW: Refactor iceberg-kafka-connect Apr 2, 2024
@fqtab fqtab changed the title DO NOT REVIEW: Refactor iceberg-kafka-connect DO NOT REVIEW: Refactor plan Apr 2, 2024
@tabmatfournier
Copy link
Contributor

We are already memory hungry. This would be very memory hungry in the cases where someone is fanning out to multiple tables at once and have a lot of partitions. Wouldn't be surprised if you 2x-5xd the memory usage in some cases.

@fqtab fqtab changed the title DO NOT REVIEW: Refactor plan DO NOT MERGE: Refactor plan Apr 4, 2024
@bryanck
Copy link
Contributor

bryanck commented Apr 8, 2024

We are already memory hungry. This would be very memory hungry in the cases where someone is fanning out to multiple tables at once and have a lot of partitions. Wouldn't be surprised if you 2x-5xd the memory usage in some cases.

+1, more concerning to me is moving to a model were we must create a file per partition per commit, which will result in many more files created than needed in many cases. This can have a negative impact on query planning performance, scan performance, storage size, and so on. Aggressive compaction can help mitigate that to some extent but that can drive up compute and storage costs also.

At my current company we will often partition a topic for future scalability using a highly composite number of partitions, often several thousand. This allows us to scale the number of processing tasks without introducing skew. I feel it is too limiting not being able to coalesce partitions when writing.

@fqtab
Copy link
Contributor Author

fqtab commented Apr 9, 2024

Cool, I did call this out explicitly as a potential issue in the PR description but admittedly I did not fully consider the impact of this.
Appreciate the clear feedback and agree it doesn't make sense to move forward with the topic-partition-level processing model.
Taken the rest of the useful changes from this PR and created a new PR.

@fqtab fqtab closed this Apr 9, 2024
@fqtab fqtab deleted the fence_worker_zombies branch April 9, 2024 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants