187: Adding apache hudi support to dbt #210
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
spark.hadoop.datanucleus.autoCreateTables true | ||
spark.hadoop.datanucleus.schema.autoCreateTables true | ||
spark.hadoop.datanucleus.fixedDatastore false | ||
spark.serializer org.apache.spark.serializer.KryoSerializer | ||
spark.jars.packages org.apache.hudi:hudi-spark3-bundle_2.12:0.9.0 | ||
spark.sql.extensions org.apache.spark.sql.hudi.HoodieSparkSessionExtension | ||
spark.driver.userClassPathFirst true |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
{{ config( | ||
materialized = 'incremental', | ||
incremental_strategy = 'append', | ||
file_format = 'hudi', | ||
) }} | ||
|
||
{% if not is_incremental() %} | ||
|
||
select cast(1 as bigint) as id, 'hello' as msg | ||
union all | ||
select cast(2 as bigint) as id, 'goodbye' as msg | ||
|
||
{% else %} | ||
|
||
select cast(2 as bigint) as id, 'yo' as msg | ||
union all | ||
select cast(3 as bigint) as id, 'anyway' as msg | ||
|
||
{% endif %} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
{{ config( | ||
materialized = 'incremental', | ||
incremental_strategy = 'insert_overwrite', | ||
file_format = 'hudi', | ||
) }} | ||
|
||
{% if not is_incremental() %} | ||
|
||
select cast(1 as bigint) as id, 'hello' as msg | ||
union all | ||
select cast(2 as bigint) as id, 'goodbye' as msg | ||
|
||
{% else %} | ||
|
||
select cast(2 as bigint) as id, 'yo' as msg | ||
union all | ||
select cast(3 as bigint) as id, 'anyway' as msg | ||
|
||
{% endif %} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
{{ config( | ||
materialized = 'incremental', | ||
incremental_strategy = 'insert_overwrite', | ||
partition_by = 'id', | ||
file_format = 'hudi', | ||
) }} | ||
|
||
{% if not is_incremental() %} | ||
|
||
select cast(1 as bigint) as id, 'hello' as msg | ||
union all | ||
select cast(2 as bigint) as id, 'goodbye' as msg | ||
|
||
{% else %} | ||
|
||
select cast(2 as bigint) as id, 'yo' as msg | ||
union all | ||
select cast(3 as bigint) as id, 'anyway' as msg | ||
|
||
{% endif %} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
{{ config( | ||
materialized = 'incremental', | ||
incremental_strategy = 'merge', | ||
file_format = 'hudi', | ||
) }} | ||
|
||
{% if not is_incremental() %} | ||
|
||
select cast(1 as bigint) as id, 'hello' as msg | ||
union all | ||
select cast(2 as bigint) as id, 'goodbye' as msg | ||
|
||
{% else %} | ||
|
||
select cast(2 as bigint) as id, 'yo' as msg | ||
union all | ||
select cast(3 as bigint) as id, 'anyway' as msg | ||
|
||
{% endif %} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
{{ config( | ||
materialized = 'incremental', | ||
incremental_strategy = 'merge', | ||
file_format = 'hudi', | ||
unique_key = 'id', | ||
) }} | ||
|
||
{% if not is_incremental() %} | ||
|
||
select cast(1 as bigint) as id, 'hello' as msg | ||
union all | ||
select cast(2 as bigint) as id, 'goodbye' as msg | ||
|
||
{% else %} | ||
|
||
select cast(2 as bigint) as id, 'yo' as msg | ||
union all | ||
select cast(3 as bigint) as id, 'anyway' as msg | ||
|
||
{% endif %} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
{{ config( | ||
materialized = 'incremental', | ||
incremental_strategy = 'merge', | ||
file_format = 'hudi', | ||
unique_key = 'id', | ||
merge_update_columns = ['msg'], | ||
) }} | ||
|
||
{% if not is_incremental() %} | ||
|
||
select cast(1 as bigint) as id, 'hello' as msg, 'blue' as color | ||
union all | ||
select cast(2 as bigint) as id, 'goodbye' as msg, 'red' as color | ||
|
||
{% else %} | ||
|
||
-- msg will be updated, color will be ignored | ||
select cast(2 as bigint) as id, 'yo' as msg, 'green' as color | ||
union all | ||
select cast(3 as bigint) as id, 'anyway' as msg, 'purple' as color | ||
|
||
{% endif %} |
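
The comment in the model above says `msg` will be updated and `color` will be ignored. As a rough illustration of what that implies for the final table, here is a minimal pure-Python sketch. It is not the adapter's implementation: `merge_rows` is a hypothetical helper, and it assumes that matched rows update only the columns listed in `merge_update_columns` while unmatched rows are inserted whole.

```python
def merge_rows(target, source, unique_key, update_columns=None):
    """Simulate a keyed merge (hypothetical helper, for illustration only).

    Matched rows: overwrite only `update_columns` (all non-key columns if None).
    Unmatched rows: insert the whole source row.
    """
    merged = {row[unique_key]: dict(row) for row in target}
    for row in source:
        key = row[unique_key]
        if key in merged:
            if update_columns is None:
                cols = [c for c in row if c != unique_key]
            else:
                cols = update_columns
            for col in cols:
                merged[key][col] = row[col]
        else:
            merged[key] = dict(row)
    return [merged[k] for k in sorted(merged)]


# Rows produced by the first (non-incremental) run of the model above.
first_run = [
    {"id": 1, "msg": "hello", "color": "blue"},
    {"id": 2, "msg": "goodbye", "color": "red"},
]

# Rows produced by the second (incremental) run.
second_run = [
    {"id": 2, "msg": "yo", "color": "green"},
    {"id": 3, "msg": "anyway", "color": "purple"},
]

result = merge_rows(first_run, second_run, unique_key="id", update_columns=["msg"])
for row in result:
    print(row)
```

Under these assumptions, row 2 ends up with the new `msg` but keeps its original `color`, while row 3 is inserted with all of its columns. Passing `update_columns=None` approximates the plain `unique_key = 'id'` model, where every non-key column of a matched row is overwritten.
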
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
{{ config(materialized='table', file_format='hudi') }} | ||
Am I right in thinking that this is failing on Databricks because the This specific test case (
Yes, you are right. Since there have been many iterations on this PR already, for now I'll disable the model to keep things simple and merge this PR. In the next iteration I'll bring back both of these tests. |
||
select 1 as id, 'Vino' as name |
Neat! Out of curiosity, what's the change coming in v0.10 that will make this sail smoothly?
Spark SQL DML support was added to Apache Hudi recently, in the 0.9.0 release, but a few gaps remained. They were fixed after that release, and the fixes are scheduled to ship in the next release in a few weeks.
Specifically, these commits are the ones relevant to making these tests run smoothly.