Improve docs for add file procedures in Iceberg #23717

Merged on Oct 8, 2024 (1 commit)
36 changes: 27 additions & 9 deletions docs/src/main/sphinx/connector/iceberg.md
@@ -568,11 +568,13 @@ nested directories, or `false` to ignore them.
(iceberg-add-files)=
#### Add files

-The connector can add files from tables or locations if
+The connector can add files from tables or locations to an existing table if
`iceberg.add_files-procedure.enabled` is set to `true` for the catalog.

-Use the procedure `system.add_files_from_table` to add existing files from the Hive
-table or `system.add_files` to add existing files from specified locations.
+Use the procedure `system.add_files_from_table` to add existing files from a
+Hive table or `system.add_files` to add existing files from a specified location
+to an existing table.

The data files must use the Parquet, ORC, or Avro file format.

:::{warning}
@@ -584,17 +586,31 @@ relevant schema and table names supplied with the required parameters
`schema_name` and `table_name`:

```sql
-ALTER TABLE testdb.iceberg_customer_orders EXECUTE add_files_from_table(
+ALTER TABLE testdb.iceberg_customer_orders
+EXECUTE example.system.add_files_from_table(
schema_name => 'testdb',
table_name => 'hive_customer_orders')
```

-You need to provide a `partition_filter` argument to add files from specified partitions.
+Alternatively, you can set the current catalog and schema with a `USE`
+statement, and omit catalog and schema information, including the `system`
+schema for the procedure from any following `ALTER TABLE` statements:
+
+```sql
+USE example.testdb;
+ALTER TABLE iceberg_customer_orders
+EXECUTE add_files_from_table(
+schema_name => 'testdb',
+table_name => 'hive_customer_orders')
+```
+
+Use a `partition_filter` argument to add files from specified partitions.
The following example adds files from a partition where the `region` is `ASIA` and
`country` is `JAPAN`:

```sql
-ALTER TABLE testdb.iceberg_customer_orders EXECUTE add_files_from_table(
+ALTER TABLE testdb.iceberg_customer_orders
+EXECUTE example.system.add_files_from_table(
schema_name => 'testdb',
table_name => 'hive_customer_orders',
partition_filter => map(ARRAY['region', 'country'], ARRAY['ASIA', 'JAPAN']))
@@ -604,7 +620,8 @@ In addition, you can provide a `recursive_directory` argument to migrate a
Hive table that contains subdirectories:

```sql
-ALTER TABLE testdb.iceberg_customer_orders EXECUTE add_files_from_table(
+ALTER TABLE testdb.iceberg_customer_orders
+EXECUTE example.system.add_files_from_table(
schema_name => 'testdb',
table_name => 'hive_customer_orders',
recursive_directory => 'true')
@@ -614,12 +631,13 @@ The default value of `recursive_directory` is `fail`, which causes the procedure
to throw an exception if subdirectories are found. Set the value to `true` to add
files from nested directories, or `false` to ignore them.
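
As an illustration that is not part of this change, the third documented value, `false`, could be used as sketched below. The catalog name `example` and the schema and table names are taken from the examples in this diff and are placeholders for your own environment.

```sql
-- Illustrative sketch only; catalog, schema, and table names are placeholders.
-- 'false' skips files in nested directories instead of failing the procedure.
ALTER TABLE testdb.iceberg_customer_orders
EXECUTE example.system.add_files_from_table(
    schema_name => 'testdb',
    table_name => 'hive_customer_orders',
    recursive_directory => 'false')
```

With this setting, only files directly in the table or partition directories are added, so data stored in subdirectories is silently left out.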

-`add_files` procedure supports adding files from a specified location.
+The `add_files` procedure supports adding files from a specified location.
The procedure does not validate file schemas for compatibility with
the target Iceberg table. The `location` property is supported for partitioned tables.

```sql
-ALTER TABLE testdb.iceberg_customer_orders EXECUTE add_files(
+ALTER TABLE testdb.iceberg_customer_orders
+EXECUTE example.system.add_files(
location => 's3://my-bucket/a/path',
format => 'ORC')
```
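
Because the procedure does not validate file schemas, it can be worth checking what was registered after a call. The following is a minimal sketch, not part of this change, that queries the Iceberg connector's `$files` metadata table. It assumes the catalog is named `example` and that file paths are recorded with the same `s3://my-bucket/a/path` prefix passed as `location`.

```sql
-- Illustrative sketch only; catalog name and path prefix are assumptions.
-- List the data files now tracked by the table under the added location.
SELECT file_path, file_format, record_count
FROM example.testdb."iceberg_customer_orders$files"
WHERE file_path LIKE 's3://my-bucket/a/path/%';

-- Read a few rows to surface schema mismatches early.
SELECT *
FROM example.testdb.iceberg_customer_orders
LIMIT 10;
```

If the second query fails or returns unexpected nulls, the added files may not match the target table's schema.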