
Support adding a field with ADD COLUMN in Iceberg #16321

Merged
ebyhr merged 2 commits into trinodb:master from ebi/iceberg-add-field
Jul 12, 2023


@ebyhr ebyhr commented Mar 1, 2023

Description

Relates to #16897
Fixes #16248

Release notes

(x) Release notes are required, with the following suggested text:

# General & Iceberg
* Add support for adding nested fields with an `ADD COLUMN` statement. ({issue}`16248`)

@cla-bot cla-bot bot added the cla-signed label Mar 1, 2023
@ebyhr ebyhr self-assigned this Mar 1, 2023
@github-actions github-actions bot added hive Hive connector iceberg Iceberg connector mongodb MongoDB connector tests:hive labels Mar 1, 2023
@ebyhr ebyhr force-pushed the ebi/iceberg-add-field branch 2 times, most recently from e7ac1ef to 50e5080 March 2, 2023 01:21
@github-actions github-actions bot added the delta-lake Delta Lake connector label Mar 2, 2023
@ebyhr ebyhr force-pushed the ebi/iceberg-add-field branch 6 times, most recently from 59eb1b7 to c39c2e8 March 8, 2023 06:07
@ebyhr ebyhr requested review from alexjo2144 and findepi March 8, 2023 06:26
Member

@findepi findepi left a comment

"Parse column name as QualifiedName in column definition"

@ebyhr
Member Author

ebyhr commented Mar 20, 2023

Addressed comments.

@findepi findepi requested review from martint and kasiafi March 20, 2023 10:09
Member

@kasiafi kasiafi left a comment

@ebyhr @findepi in my opinion this new functionality is not "ADD COLUMN". It effectively doesn't add a column, but it changes the type of an existing column. Hence, it falls under "ALTER COLUMN".

The user could probably use the existing syntax ALTER COLUMN ... SET DATA TYPE ... to add a nested field. I understand, however, that this syntax is very verbose, as it requires explicitly describing the requested type. I think that we could extend the ALTER COLUMN syntax to support adding a nested field in a more concise form.
It could be something like:

ALTER COLUMN <column_name> ADD FIELD <field_name> <type>

The <field_name> could be an identifier for an immediately nested field, and a qualified name for deeper nesting (anyway, the passed name should be relative to the <column_name>).

Execution-wise, the two ways of adding a column, ALTER COLUMN ... SET DATA TYPE ... and ALTER COLUMN ... ADD FIELD ... should be unified. The new syntax should be considered syntactic sugar.

All the above applies as well to removing nested fields (#16002).
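
As a rough illustration of the proposal above, the following Java sketch shows how a field name given relative to the column could be resolved against a nested ROW type, with a single identifier for an immediate field and a longer path for deeper nesting. Everything here is an assumption made for illustration: `RowTypeSketch` and `addField` are hypothetical names, not Trino APIs, and the ROW type is modeled as an ordered map from field name to either a type name or a nested map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Trino's type system: a ROW type is an ordered map
// from field name to either a scalar type name (String) or a nested ROW (Map).
final class RowTypeSketch
{
    private RowTypeSketch() {}

    // Returns a copy of `row` with `fieldType` added at `path`, where `path`
    // is relative to the column, e.g. ["address", "zip"].
    @SuppressWarnings("unchecked")
    static Map<String, Object> addField(Map<String, Object> row, List<String> path, String fieldType)
    {
        if (path.isEmpty()) {
            throw new IllegalArgumentException("Field path must not be empty");
        }
        Map<String, Object> copy = new LinkedHashMap<>(row);
        String head = path.get(0);
        if (path.size() == 1) {
            // Last path element: this is the new field itself.
            if (copy.containsKey(head)) {
                throw new IllegalArgumentException("Field already exists: " + head);
            }
            copy.put(head, fieldType);
            return copy;
        }
        // Intermediate path element: it must name an existing nested ROW.
        Object parent = copy.get(head);
        if (!(parent instanceof Map)) {
            throw new IllegalArgumentException("Not a ROW field: " + head);
        }
        copy.put(head, addField((Map<String, Object>) parent, path.subList(1, path.size()), fieldType));
        return copy;
    }

    public static void main(String[] args)
    {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "varchar");
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "varchar");
        person.put("address", address);

        // Immediate field, then a field one level deeper.
        System.out.println(addField(person, List.of("age"), "integer"));
        System.out.println(addField(person, List.of("address", "zip"), "varchar"));
    }
}
```

The sketch returns a new type rather than mutating the input, which mirrors the idea that the concise form computes the same target type a user would otherwise spell out by hand in SET DATA TYPE.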

@martint
Member

martint commented Mar 21, 2023

Yes, adding nested fields should be handled with SET DATA TYPE. We talked about nested fields when adding support for that functionality, but stopped before defining the syntax required.

@findepi
Member

findepi commented Mar 21, 2023

> adding nested fields should be handled with SET DATA TYPE.

I agree that adding nested fields should be possible via SET DATA TYPE.

It is, however, impractical to require users to provide the current ROW definition plus the new field when they want to add a new field. That's why we need an option to add/remove individual fields, or users will not use Trino for that. I don't think there is any disagreement; I just wanted to make sure others have the context.

> Execution-wise, the two ways of adding a column, ALTER COLUMN ... SET DATA TYPE ... and ALTER COLUMN ... ADD FIELD ... should be unified. The new syntax should be considered syntactic sugar.

I would be hesitant about that. ALTER COLUMN ... ADD FIELD ... can be executed atomically by a connector, while the equivalent of reading the type, adding a field to the type and invoking SET DATA TYPE cannot.
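
The atomicity concern can be illustrated with a small, deterministic simulation. This is only a sketch under stated assumptions: `AddFieldAtomicitySketch` is a hypothetical in-memory stand-in for a connector's table metadata, not Trino or Iceberg API, and a column's ROW type is modeled as a plain list of field definitions. Two clients read the same type, each appends a field, and each writes the whole type back; the second write drops the first client's field. The same two additions expressed as single-step operations both survive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a connector's table metadata; this is
// not Trino or Iceberg API. A column's ROW type is modeled as a list of fields.
final class AddFieldAtomicitySketch
{
    private List<String> fields = new ArrayList<>(List.of("a integer"));

    // Models reading the current column type.
    synchronized List<String> readType()
    {
        return new ArrayList<>(fields);
    }

    // Models ALTER COLUMN ... SET DATA TYPE: replaces the whole type.
    synchronized void setDataType(List<String> newFields)
    {
        fields = new ArrayList<>(newFields);
    }

    // Models ALTER COLUMN ... ADD FIELD executed in one step by the connector.
    synchronized void addField(String field)
    {
        fields.add(field);
    }

    // Two clients read the same snapshot, then each writes back a full type.
    static List<String> readModifyWrite()
    {
        AddFieldAtomicitySketch table = new AddFieldAtomicitySketch();
        List<String> seenByFirst = table.readType();
        List<String> seenBySecond = table.readType();
        seenByFirst.add("b varchar");
        seenBySecond.add("c double");
        table.setDataType(seenByFirst);
        table.setDataType(seenBySecond); // overwrites the first client's field
        return table.readType();
    }

    // The same two additions expressed as single-step operations.
    static List<String> singleStepAdds()
    {
        AddFieldAtomicitySketch table = new AddFieldAtomicitySketch();
        table.addField("b varchar");
        table.addField("c double");
        return table.readType();
    }

    public static void main(String[] args)
    {
        System.out.println(readModifyWrite());
        System.out.println(singleStepAdds());
    }
}
```

Whether a real catalog would accept or reject the second full-type write depends on its commit protocol, which this sketch does not model. It only shows why a read, modify, and SET DATA TYPE sequence cannot be assumed equivalent to a single ADD FIELD operation.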

@ebyhr
Member Author

ebyhr commented May 16, 2023

@martint Could you take another look?

@ebyhr
Member Author

ebyhr commented Jun 6, 2023

Rebased on master to resolve conflicts.

@ebyhr ebyhr requested a review from martint June 22, 2023 23:05
@ebyhr
Member Author

ebyhr commented Jul 7, 2023

Rebased on master to resolve conflicts.

@ebyhr
Member Author

ebyhr commented Jul 10, 2023

@martint Could you please take another look?

@findepi
Member

findepi commented Jul 10, 2023

@ebyhr the build isn't green

@ebyhr
Member Author

ebyhr commented Jul 10, 2023

The failure #18202 is unrelated to this change.

@Override
public String getName()
{
    return "ADD COLUMN";
Member

let's call it add field

@ebyhr ebyhr dismissed martint’s stale review July 11, 2023 21:01

Addressed comment.

@ebyhr
Member Author

ebyhr commented Jul 11, 2023

Let me merge this PR today if there are no additional comments.

@martint
Member

martint commented Jul 11, 2023

I’m still reviewing

@ebyhr ebyhr merged commit 00dfcb8 into trinodb:master Jul 12, 2023
91 checks passed
@ebyhr ebyhr deleted the ebi/iceberg-add-field branch July 12, 2023 04:28
@github-actions github-actions bot added this to the 422 milestone Jul 12, 2023
Labels
cla-signed delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector mongodb MongoDB connector
Development

Successfully merging this pull request may close these issues.

Allow adding a specific field in nested field in iceberg connector
4 participants