Task: Add Unit Test for Flink-Spark Equality Delete Write #1

Open
wants to merge 2 commits into main
Conversation

adelly13

Created a new flink-spark-bundle module with the unit test testEqualityDeleteWritesOnSpark() in TestFlinkSpark.java.

Procedure for testEqualityDeleteWritesOnSpark() (a sketch of the delete-stream step follows the list):

  1. Create the table in Flink
  2. Write initial data
  3. Create a delete stream
  4. Apply equality deletes
  5. Initialize a Spark session
  6. Read the table using Spark
  7. Compare the results
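
For steps 3 and 4, here is a minimal sketch of what the delete stream could look like, assuming Iceberg's FlinkSink builder, a TableLoader named tableLoader, and an (id, data) table schema held in SCHEMA (all placeholder names, not the exact code in this PR):

    // Sketch only: rows tagged with RowKind.DELETE become equality deletes
    // once equalityFieldColumns(...) is set on the sink.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    DataStream<Row> deleteStream =
        env.fromElements(
            Row.ofKind(RowKind.DELETE, 1, "a"),
            Row.ofKind(RowKind.DELETE, 2, "b"));

    FlinkSink.forRow(deleteStream, FlinkSchemaUtil.toSchema(SCHEMA))
        .tableLoader(tableLoader)                               // placeholder TableLoader
        .equalityFieldColumns(Collections.singletonList("id"))  // match delete rows on id
        .append();

    env.execute("write-equality-deletes");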

Current issues: running into import errors, as well as "Invalid write distribution mode: range. Need to define sort order or partition spec." (a possible workaround is sketched below).
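
On the distribution mode error: the message itself points at the cause, i.e. the sink resolves to range distribution while the table has neither a sort order nor a partition spec. Two possible ways out are sketched below, using the same placeholder names as above; pick whichever matches how the table is set up:

    // Option 1 (sketch): pin the distribution mode on the FlinkSink builder.
    FlinkSink.forRow(deleteStream, FlinkSchemaUtil.toSchema(SCHEMA))
        .tableLoader(tableLoader)
        .equalityFieldColumns(Collections.singletonList("id"))
        .distributionMode(DistributionMode.NONE)
        .append();

    // Option 2 (sketch): set the table property when creating the table in Flink SQL.
    sql("CREATE TABLE test_table (id INT, data STRING, PRIMARY KEY(id) NOT ENFORCED) "
        + "WITH ('format-version'='2', 'write.distribution-mode'='none')");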

@adelly13 adelly13 changed the title Added Unit Test for Flink-Spark Equality Delete Write Task: Add Unit Test for Flink-Spark Equality Delete Write Oct 18, 2024
}

@TestTemplate
public void testCheckAndGetEqualityFieldIds() {

We can get rid of all of the tests except your Spark + Flink test.


// Assert that only row with id=3 remains in the table
assertThat(actualData).containsExactlyInAnyOrderElementsOf(expectedData);


@geruh geruh Oct 20, 2024


This is a good test for writing the equality delete using the stream execution environment! I'd also suggest creating a test for an UPSERT case by leveraging the FlinkTableEnvironment. This will help you see how the UPSERT leverages an equality delete to replace the value.

You can add a test to TestFlinkCatalogTable.java like this:

    sql("CREATE TABLE test_table (id INT, data STRING, PRIMARY KEY(id) NOT ENFORCED) WITH ('format-version'='2', 'write.upsert.enabled'='true')");

    sql("INSERT INTO test_table VALUES (1, 'a'), (2, 'b'), (3, 'c')");

    // Perform upsert operation
    sql("INSERT INTO test_table VALUES (2, 'updated_b'), (4, 'd')");


@geruh geruh Oct 20, 2024


To expand upon this, I'd suggest adding some DeleteFile assertions (a rough sketch follows the list). For example, what do you expect the DeleteFile to look like when we want to delete based on:

  • one column: id = 1
  • all columns: id = 1 and data = 'a'
  • range: id > 3 (also, how does Flink get the values greater than 3?)
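
For instance, a rough sketch of what such an assertion could look like for the single-column case, assuming a Table handle named table (the other two cases would vary the expected equalityFieldIds and row counts):

    // Sketch: inspect the delete files added by the latest snapshot.
    table.refresh();
    Snapshot snapshot = table.currentSnapshot();
    for (DeleteFile deleteFile : snapshot.addedDeleteFiles(table.io())) {
      // For `id = 1` we'd expect an equality delete keyed on the id field only.
      assertThat(deleteFile.content()).isEqualTo(FileContent.EQUALITY_DELETES);
      assertThat(deleteFile.equalityFieldIds())
          .containsExactly(table.schema().findField("id").fieldId());
    }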

.master("local[*]")
.config("spark.sql.catalog.hadoop_catalog", "org.apache.iceberg.spark.SparkCatalog")
.config("spark.sql.catalog.hadoop_catalog.type", "hadoop")
.config("spark.sql.catalog.myCatalog.warehouse", "file:///path/to/warehouse")

We should set the same warehouse as the Flink catalog to ensure we are using the same tables.

spark = SparkSession.builder()
        .appName("iceberg-spark")
        .master("local[*]")
        .config("spark.sql.catalog.hadoop_catalog", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.hadoop_catalog.type", "hadoop")
        .config("spark.sql.catalog.hadoop_catalog.warehouse", CATALOG_EXTENSION.warehouse())
        .getOrCreate();
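
With the warehouse shared, the Spark-side read for step 6 can then go through that catalog; the namespace and table name below are placeholders:

    // Placeholder identifiers: adjust to the namespace/table the Flink side created.
    List<Row> actualData =
        spark.sql("SELECT * FROM hadoop_catalog.default.test_table ORDER BY id").collectAsList();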
