[HUDI-7665] Support rolling upgrade to table version 8 #12250

Draft
wants to merge 1 commit into base: master

@codope (Member) commented Nov 13, 2024

Change Logs

  • Migrate table properties, including partition fields, key generators, payload type, and bootstrap index type. Both upgrade and downgrade are handled.
  • Migrate the timeline to the new layout: a) convert the archived timeline to the LSM timeline layout, b) read both JSON and Avro commit metadata, c) rename instants (including the clustering action). These are all done for upgrade. For downgrade, I still need to write an LSM to legacy archived timeline (v1) writer.
  • Run a full compaction of the table to get rid of log files, for both upgrade and downgrade.
  • Drop version 7.
  • Add some tests for the above.

TODO:

  • Add an LSM to legacy archived timeline writer for use in downgrade.
  • Add a migration path for CDC and incremental queries.
  • Handle differences between 0.x and 1.x in the steps the upgrade depends on, e.g. compaction (need to compact the older file slice) and rollback (marker differences between 0.14 and 0.15). If we compact and delete any leftover markers, it might be okay, but these scenarios need to be tested.

Impact

Support rolling upgrade to table version 8.

Risk level (write none, low, medium or high below)

high

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Nov 13, 2024
String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
HoodieTable table = upgradeDowngradeHelper.getTable(config, context);
HoodieTableMetaClient metaClient = table.getMetaClient();
HoodieTableConfig tableConfig = metaClient.getTableConfig();
Contributor

We can trigger a rollback for pending instants first, then explicitly remove any leftover log file markers. This way we clean up the table as much as possible before any other steps.
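
A minimal sketch of the ordering suggested in this comment: roll back pending instants first, then delete whatever log file markers remain. The `TableOps` interface, the `InMemoryOps` stand-in, and all method names are hypothetical and exist only for illustration; they are not Hudi APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class PreUpgradeCleanup {

  /** Hypothetical hooks for this sketch; these are not Hudi APIs. */
  interface TableOps {
    List<String> pendingInstants();
    void rollback(String instantTime);
    List<String> leftoverLogMarkers();
    void deleteMarker(String markerPath);
  }

  /** Rolls back pending instants, then deletes leftover markers, and returns the steps taken in order. */
  static List<String> cleanBeforeMigration(TableOps ops) {
    List<String> performed = new ArrayList<>();
    // Step 1: roll back every pending instant. Iterate over a copy so the
    // loop stays safe if rollback mutates the underlying list.
    for (String instant : new ArrayList<>(ops.pendingInstants())) {
      ops.rollback(instant);
      performed.add("rollback " + instant);
    }
    // Step 2: markers are listed only after the rollbacks have finished.
    for (String marker : new ArrayList<>(ops.leftoverLogMarkers())) {
      ops.deleteMarker(marker);
      performed.add("delete-marker " + marker);
    }
    return performed;
  }

  /** In-memory stand-in used only to exercise the ordering. */
  static class InMemoryOps implements TableOps {
    final List<String> pending = new ArrayList<>();
    final List<String> markers = new ArrayList<>();
    public List<String> pendingInstants() { return pending; }
    public void rollback(String instantTime) { pending.remove(instantTime); }
    public List<String> leftoverLogMarkers() { return markers; }
    public void deleteMarker(String markerPath) { markers.remove(markerPath); }
  }

  public static void main(String[] args) {
    InMemoryOps ops = new InMemoryOps();
    ops.pending.add("t1");   // hypothetical instant time
    ops.markers.add("m1");   // hypothetical marker path
    System.out.println(cleanBeforeMigration(ops));
  }
}
```

Returning the list of performed steps is only a convenience for checking the order in a test; a real upgrade handler would more likely log them.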

Member Author

Ack, changed the code to roll back and compact in one step.

throw new HoodieException(e);
}
};
lsmTimelineWriter.write(Collections.singletonList(ActiveAction.fromInstants(archivedTimeline.getInstants())),
Contributor

The archived timeline's instant list is huge. We need to split it into small batches and write to the LSM timeline batch by batch. By default, just use 10 instants per batch, which is in line with the current behavior.
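
The batching proposed here can be sketched as a plain list partition. This is an illustration only: `InstantBatcher`, `partition`, and `DEFAULT_BATCH_SIZE` are hypothetical names, instants are represented as strings, and the call into the LSM timeline writer is left as a comment because its exact signature is not shown in this conversation.

```java
import java.util.ArrayList;
import java.util.List;

public class InstantBatcher {

  // Hypothetical default, taken from the "10 instants per batch" suggestion.
  static final int DEFAULT_BATCH_SIZE = 10;

  /** Splits items into consecutive batches of at most batchSize elements, preserving order. */
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      // Copy the view so each batch is independent of the source list.
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> instants = new ArrayList<>();
    for (int i = 0; i < 25; i++) {
      instants.add("instant-" + i);
    }
    for (List<String> batch : partition(instants, DEFAULT_BATCH_SIZE)) {
      // In the upgrade path, each batch would be handed to the LSM timeline writer here.
      System.out.println(batch.size() + " instants, first = " + batch.get(0));
    }
  }
}
```

With 25 instants and a batch size of 10, this yields three batches of 10, 10, and 5 instants, so only one batch needs to be held for writing at a time.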

Contributor

Another concern is that the legacy archived timeline may contain an enormous number of instants (several GBs of Avro logs), so loading the whole legacy archived timeline would be very time-consuming. Maybe we just load the latest Avro log, which should be enough for file slicing.

Member Author

Yes, I thought about it and I agree. I wanted to quickly do some local testing to validate the upgrade path. I will do the batching soon.


// Migrate the LSM timeline back to the old archived timeline format
try {
// TODO: Convert instants from the LSM format to the old format
Contributor

We need timeline archiver v1 from Balaji's PR. cc @bvaradar

Member Author

Ack. For now, I've ported over some legacy archiver code (copied from #11923) for local testing. Once that PR lands, I will rebase.

@github-actions github-actions bot added size:XL PR with lines of changes > 1000 and removed size:L PR with lines of changes in (300, 1000] labels Nov 14, 2024
@codope codope force-pushed the hudi-7665-table-props branch 3 times, most recently from 12271aa to 9d9683f Compare November 16, 2024 14:05
}

@Override
public int archiveInstants(HoodieEngineContext context, List<HoodieInstant> instantsToArchive, boolean acquireLock) throws IOException {
Contributor

It looks like this acquireLock is always false.

Member Author

It is true in the UpgradeDowngradeUtils methods upgradeToLSMTimeline and downgradeFromLSMTimeline.

Contributor

Why do we need a lock for upgrade/downgrade? Is there any concurrency here?
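
For readers following the thread, here is a generic sketch of the pattern the `acquireLock` flag implies: the caller decides whether the archiving step takes a lock. It assumes a plain `ReentrantLock`, and the class `OptionalLockArchiver` and its body are hypothetical; it does not reproduce the archiver in this PR or Hudi's lock manager.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class OptionalLockArchiver {

  // Stand-in for whatever lock a real archiver would use (assumption).
  private final ReentrantLock lock = new ReentrantLock();

  /** Archives the given instants, taking the lock only when the caller asks for it. */
  int archiveInstants(List<String> instantsToArchive, boolean acquireLock) {
    if (acquireLock) {
      lock.lock();
    }
    try {
      // Placeholder for the real archiving work; here we only count the instants.
      return instantsToArchive.size();
    } finally {
      // Release only what this call acquired.
      if (acquireLock) {
        lock.unlock();
      }
    }
  }

  /** Exposed so callers and tests can check that the lock was released. */
  boolean isLocked() {
    return lock.isLocked();
  }
}
```

If upgrade and downgrade never run concurrently with other writers, the flag could stay false on those paths, which is the question raised in the comment above.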

port some legacy timeline archiver code and compact post rollback
@hudi-bot

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build
