Generate Metadata for Iceberg table #20949

Open
cain129 opened this issue Mar 5, 2024 · 3 comments

@cain129

cain129 commented Mar 5, 2024

Hello,

I cannot find anything in the documentation that would accomplish what I am looking to do.

I am using Trino with the Iceberg connector and Postgres as the metastore. I have a table that uses an external location in MinIO, and I want an external service to write files into that location.

My problem is that I cannot read data that has been written directly into MinIO. I can insert and query data through Trino, but data files placed into s3a://bucket/iceberg/tableName/data cannot be queried. I feel there should be something like Hive's "CALL system.sync_partition_metadata" for Iceberg that would let us place files into the external location and then generate the metadata in the metastore so that Trino can query them. I know there is a "register_table" procedure, but my understanding was that it is a way to restore a table from previous snapshots, i.e. when a table was deleted but its metadata still exists.
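Some background on why the files stay invisible: unlike Hive, Iceberg does not list the table directory at query time. It plans scans from the data files recorded in the table's metadata (snapshots and manifests), so a file copied into data/ is ignored until a writer commits it in a new snapshot. The discovery half of a hypothetical "sync" procedure could be sketched as below, assuming you already have an object listing of the bucket and the set of committed data-file paths (for example from the file_path column of Trino's "tableName$files" metadata table). The function name, the .parquet suffix, and the sample paths are illustrative assumptions; actually committing the files found here would still need an Iceberg writer, which this sketch does not do.

```python
from typing import Iterable, List, Set


def unregistered_data_files(
    listed_keys: Iterable[str],
    committed_paths: Set[str],
    data_prefix: str,
    suffix: str = ".parquet",  # assumption: the table stores Parquet files
) -> List[str]:
    """Return files under data_prefix that no snapshot of the table references.

    Hypothetical helper, not a Trino or Iceberg API. Paths are compared as
    exact strings, so the listing and the committed paths must use the same
    scheme (e.g. both s3a:// or both s3://).
    """
    return sorted(
        key
        for key in listed_keys
        if key.startswith(data_prefix)
        and key.endswith(suffix)
        and key not in committed_paths
    )


# Hypothetical example: one file written through Trino, one dropped in by NiFi.
prefix = "s3a://bucket/iceberg/tableName/data/"
listed = [
    prefix + "00000-0-trino.parquet",
    prefix + "nifi-0001.parquet",
    "s3a://bucket/iceberg/tableName/metadata/00001.metadata.json",
]
committed = {prefix + "00000-0-trino.parquet"}
print(unregistered_data_files(listed, committed, prefix))
# ['s3a://bucket/iceberg/tableName/data/nifi-0001.parquet']
```

The metadata JSON is skipped because it is outside the data prefix, and the Trino-written file is skipped because a snapshot already references it.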

In short: there needs to be a way to generate Iceberg metadata for data files that already exist in the table location.

Please let me know if this is already a feature or a duplicate issue.

@oneonestar
Member

Are you looking for #11744?

@ebyhr
Member

ebyhr commented Mar 6, 2024

> I have a table that uses an external location in Minio

What's the table format? Did you consider using migrate procedure?

@cain129
Author

cain129 commented Mar 6, 2024

> Are you looking for #11744?

This looks close to what I'm doing. My one concern is that #11744 seems aimed at importing data into a Trino table whose storage Trino manages, whereas this would be for tables stored externally.

> I have a table that uses an external location in Minio

> What's the table format? Did you consider using migrate procedure?

My understanding is that migrate converts an existing Hive table into an Iceberg table. That is not what I want to do: there is no source Hive table to migrate from. Our data comes from external sources, is augmented and processed by other applications, and is then passed through NiFi into our MinIO bucket. With Hive, we can run "CALL system.sync_partition_metadata" so that those files become visible to a table with an external_location. We want to do something similar with Iceberg, but the Iceberg connector has no sync_partition_metadata procedure.
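For comparison, the Hive procedure works at the directory level. In Trino's Hive connector, system.sync_partition_metadata(schema_name, table_name, mode, case_sensitive) adds partitions that exist on storage but not in the metastore (mode ADD), drops partitions that are in the metastore but no longer on storage (DROP), or does both (FULL). The diff it performs can be sketched as follows; the helper names, the "leading key=value directories" rule, and the sample paths are assumptions for illustration, and the real procedure also handles case sensitivity and updates the metastore, which this sketch does not.

```python
from typing import Iterable, List, Set, Tuple


def partition_dirs(keys: Iterable[str], table_location: str) -> Set[str]:
    """Collect Hive-style partition paths such as "dt=2024-03-05" under table_location.

    Hypothetical helper for illustration; not Trino's implementation.
    """
    prefix = table_location.rstrip("/") + "/"
    found = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        # Drop the file name, then keep the leading run of key=value directories.
        segments = key[len(prefix):].split("/")[:-1]
        parts = []
        for segment in segments:
            if "=" not in segment:
                break
            parts.append(segment)
        if parts:
            found.add("/".join(parts))
    return found


def sync_plan(
    on_storage: Set[str], in_metastore: Set[str], mode: str = "FULL"
) -> Tuple[List[str], List[str]]:
    """Return (partitions to add, partitions to drop) for mode ADD, DROP or FULL."""
    mode = mode.upper()
    if mode not in ("ADD", "DROP", "FULL"):
        raise ValueError(f"unknown mode: {mode}")
    to_add = sorted(on_storage - in_metastore) if mode in ("ADD", "FULL") else []
    to_drop = sorted(in_metastore - on_storage) if mode in ("DROP", "FULL") else []
    return to_add, to_drop


keys = [
    "s3a://bucket/hive/events/dt=2024-03-05/part-0.parquet",
    "s3a://bucket/hive/events/dt=2024-03-06/part-0.parquet",
]
on_storage = partition_dirs(keys, "s3a://bucket/hive/events")
print(sync_plan(on_storage, {"dt=2024-03-04", "dt=2024-03-05"}))
# (['dt=2024-03-06'], ['dt=2024-03-04'])
```

Registering a directory is enough for Hive because Hive lists each partition directory when it reads. An Iceberg table instead tracks individual data files in its manifests, so a directory-level sync like this one would not by itself make new files visible there.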
