---
title: "Java API"
---
<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-->

# Iceberg Java API

## Tables

The main purpose of the Iceberg API is to manage table metadata, like the schema, partition spec, metadata files, and the data files that store table data.

Table metadata and operations are accessed through the `Table` interface, which provides both table information and the operations that modify the table.

### Table metadata

The [`Table` interface](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/Table.html) provides access to the table metadata:

* `schema` returns the current table [schema](schemas.md)
* `spec` returns the current table partition spec
* `properties` returns a map of key-value [properties](configuration.md)
* `currentSnapshot` returns the current table snapshot
* `snapshots` returns all valid snapshots for the table
* `snapshot(id)` returns a specific snapshot by ID
* `location` returns the table's base location

Tables also provide `refresh` to update the table to the latest version, and expose helpers:

* `io` returns the `FileIO` used to read and write table files
* `locationProvider` returns a `LocationProvider` used to create paths for data and metadata files
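
A short sketch tying these accessors together, assuming `table` was already loaded from a catalog:

```java
Schema schema = table.schema();              // current schema
PartitionSpec spec = table.spec();           // current partition spec
Map<String, String> props = table.properties();
Snapshot current = table.currentSnapshot();  // may be null for an empty table
String location = table.location();          // the table's base location

table.refresh();                             // load the latest committed version
FileIO io = table.io();                      // used to read and write table files
```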


### Scanning

#### File level

Iceberg table scans start by creating a `TableScan` object with `newScan`.

```java
TableScan scan = table.newScan();
```

To configure a scan, call `filter` and `select` on the `TableScan` to get a new `TableScan` with those changes.

```java
TableScan filteredScan = scan.filter(Expressions.equal("id", 5));
```

Calls to configuration methods create a new `TableScan` so that each `TableScan` is immutable and won't change unexpectedly if shared across threads.

When a scan is configured, use `planFiles` to get the data files, `planTasks` to get the combined scan tasks, and `schema` to get the read projection.

```java
TableScan scan = table.newScan()
    .filter(Expressions.equal("id", 5))
    .select("id", "data");

Schema projection = scan.schema();
Iterable<CombinedScanTask> tasks = scan.planTasks();
```

Use `asOfTime` or `useSnapshot` to configure the table snapshot for time travel queries.
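
For example, a sketch of both options, where `snapshotId` and `timestampMillis` are placeholder values for the table being read:

```java
// scan the table as of a specific snapshot ID
TableScan snapshotScan = table.newScan()
    .useSnapshot(snapshotId);

// scan the table state as of a point in time (milliseconds since epoch)
TableScan timeTravelScan = table.newScan()
    .asOfTime(timestampMillis);
```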

#### Row level

Iceberg table scans start by creating a `ScanBuilder` object with `IcebergGenerics.read`.

```java
ScanBuilder scanBuilder = IcebergGenerics.read(table);
```

To configure a scan, call `where` and `select` on the `ScanBuilder` to get a new `ScanBuilder` with those changes.

```java
scanBuilder.where(Expressions.equal("id", 5));
```

When the scan is configured, call `build` to execute it. `build` returns a `CloseableIterable<Record>`:

```java
CloseableIterable<Record> result = IcebergGenerics.read(table)
    .where(Expressions.lessThan("id", 5))
    .build();
```
where `Record` is the Iceberg generic record class from the iceberg-data module, `org.apache.iceberg.data.Record`.
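
Because the iterable holds open file readers, it should be closed after use. A minimal sketch using try-with-resources, where the `id` field is just an illustration:

```java
try (CloseableIterable<Record> rows = IcebergGenerics.read(table)
        .where(Expressions.lessThan("id", 5))
        .build()) {
  for (Record row : rows) {
    System.out.println(row.getField("id"));  // access columns by name
  }
}  // close() releases the underlying file readers; it may throw IOException
```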

### Update operations

`Table` also exposes operations that update the table. These operations use a builder pattern, [`PendingUpdate`](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/PendingUpdate.html), that commits when `PendingUpdate#commit` is called.

For example, updating the table schema is done by calling `updateSchema`, adding updates to the builder, and finally calling `commit` to commit the pending changes to the table:

```java
table.updateSchema()
    .addColumn("count", Types.LongType.get())
    .commit();
```

Available operations to update a table are:

* `updateSchema` -- update the table schema
* `updateProperties` -- update table properties
* `updateLocation` -- update the table's base location
* `newAppend` -- used to append data files
* `newFastAppend` -- used to append data files, will not compact metadata
* `newOverwrite` -- used to append data files and remove files that are overwritten
* `newDelete` -- used to delete data files
* `newRewrite` -- used to rewrite data files; will replace existing files with new versions
* `newTransaction` -- create a new table-level transaction
* `rewriteManifests` -- rewrite manifest data by clustering files, for faster scan planning
* `rollback` -- rollback the table state to a specific snapshot
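
For example, a sketch combining two of these operations; the property name is a standard Iceberg commit property, and `dataFile` is a placeholder for a `DataFile` built elsewhere:

```java
// set a table property
table.updateProperties()
    .set("commit.retry.num-retries", "10")
    .commit();

// append a previously-written data file
table.newAppend()
    .appendFile(dataFile)
    .commit();
```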

### Transactions

Transactions are used to commit multiple table changes in a single atomic operation. A transaction is used to create individual operations using factory methods, like `newAppend`, just like working with a `Table`. Operations created by a transaction are committed as a group when `commitTransaction` is called.

For example, deleting and appending a file in the same transaction:
```java
Transaction t = table.newTransaction();

// commit operations to the transaction
t.newDelete().deleteFromRowFilter(filter).commit();
t.newAppend().appendFile(data).commit();

// commit all the changes to the table
t.commitTransaction();
```

## Types

Iceberg data types are located in the [`org.apache.iceberg.types` package](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/types/package-summary.html).

### Primitives

Primitive type instances are available from static methods in each type class. Types without parameters use `get`, and types like `decimal` use factory methods:

```java
Types.IntegerType.get() // int
Types.DoubleType.get() // double
Types.DecimalType.of(9, 2) // decimal(9, 2)
```

### Nested types

Structs, maps, and lists are created using factory methods in type classes.

Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](evolution.md#correctness) and nullability.

Struct fields are created using `NestedField.optional` or `NestedField.required`. Map value and list element nullability is set in the map and list factory methods.

```java
// struct<1 id: int, 2 data: optional string>
StructType struct = StructType.of(
    Types.NestedField.required(1, "id", Types.IntegerType.get()),
    Types.NestedField.optional(2, "data", Types.StringType.get())
);
```
```java
// map<1 key: int, 2 value: optional string>
MapType map = MapType.ofOptional(
    1, 2,
    Types.IntegerType.get(),
    Types.StringType.get()
);
```
```java
// array<1 element: int>
ListType list = ListType.ofRequired(1, Types.IntegerType.get());
```
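
These nested types compose with `NestedField` into a full table `Schema`; a minimal sketch, with field IDs chosen for illustration:

```java
// schema: 1 id: required int, 2 data: optional string, 3 tags: optional list<string>
Schema schema = new Schema(
    Types.NestedField.required(1, "id", Types.IntegerType.get()),
    Types.NestedField.optional(2, "data", Types.StringType.get()),
    Types.NestedField.optional(3, "tags",
        Types.ListType.ofRequired(4, Types.StringType.get()))
);
```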


## Expressions

Iceberg's expressions are used to configure table scans. To create expressions, use the factory methods in [`Expressions`](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/expressions/Expressions.html).

Supported predicate expressions are:

* `isNull`
* `notNull`
* `equal`
* `notEqual`
* `lessThan`
* `lessThanOrEqual`
* `greaterThan`
* `greaterThanOrEqual`
* `in`
* `notIn`
* `startsWith`
* `notStartsWith`

Supported expression operations are:

* `and`
* `or`
* `not`

Constant expressions are:

* `alwaysTrue`
* `alwaysFalse`
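
Predicates compose with the operations above into larger expressions; a short sketch:

```java
// data IS NOT NULL AND data LIKE 'ab%'
Expression expr = Expressions.and(
    Expressions.notNull("data"),
    Expressions.startsWith("data", "ab"));
```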

### Expression binding

When created, expressions are unbound. Before an expression is used, it will be bound to a data type to find the field ID the expression name represents, and to convert predicate literals.

For example, before using the expression `lessThan("x", 10)`, Iceberg needs to determine which column `"x"` refers to and convert `10` to that column's data type.

For example, bound to the type `struct<1 x: long, 2 y: long>`, the expression references field ID 1 and converts `10` to the long `10L`; bound to `struct<11 x: int, 12 y: int>`, it references field ID 11 and keeps `10` as an int.
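
Binding normally happens automatically during scan planning, but the same step can be sketched directly with `Binder` from the expressions package (case-sensitive name resolution assumed):

```java
Types.StructType struct = Types.StructType.of(
    Types.NestedField.required(1, "x", Types.LongType.get()),
    Types.NestedField.required(2, "y", Types.LongType.get()));

// resolves "x" to field ID 1 and converts 10 to 10L
Expression bound = Binder.bind(struct, Expressions.lessThan("x", 10), true);
```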

### Expression example

```java
// scan rows where 5 <= x < 10
table.newScan()
    .filter(Expressions.greaterThanOrEqual("x", 5))
    .filter(Expressions.lessThan("x", 10));
```


## Modules

Iceberg table support is organized in library modules:

* `iceberg-common` contains utility classes used in other modules
* `iceberg-api` contains the public Iceberg API, including expressions, types, tables, and operations
* `iceberg-arrow` is an implementation of the Iceberg type system for reading and writing data stored in Iceberg tables using Apache Arrow as the in-memory data format
* `iceberg-aws` contains implementations of the Iceberg API to be used with tables stored on AWS S3 and/or for tables defined using the AWS Glue data catalog
* `iceberg-core` contains implementations of the Iceberg API and support for Avro data files, **this is what processing engines should depend on**
* `iceberg-parquet` is an optional module for working with tables backed by Parquet files
* `iceberg-orc` is an optional module for working with tables backed by ORC files (*experimental*)
* `iceberg-hive-metastore` is an implementation of Iceberg tables backed by the Hive metastore Thrift client

This project also has modules for adding Iceberg support to processing engines and associated tooling:

* `iceberg-spark` is an implementation of Spark's Datasource V2 API for Iceberg, with submodules for each Spark version (use the runtime jars for a shaded version)
* `iceberg-flink` is an implementation of Flink's Table and DataStream API for Iceberg (use iceberg-flink-runtime for a shaded version)
* `iceberg-hive3` is an implementation of Hive 3-specific SerDes for Timestamp, TimestampWithZone, and Date object inspectors (use iceberg-hive-runtime for a shaded version)
* `iceberg-mr` is an implementation of MapReduce and Hive InputFormats and SerDes for Iceberg (use iceberg-hive-runtime for a shaded version for use with Hive)
* `iceberg-nessie` is a module used to integrate Iceberg table metadata history and operations with [Project Nessie](https://projectnessie.org/)
* `iceberg-data` is a client library used to read Iceberg tables from JVM applications
* `iceberg-pig` is an implementation of Pig's LoadFunc API for Iceberg
* `iceberg-runtime` generates a shaded runtime jar for Spark to integrate with Iceberg tables
