[Feature Request] support for dynamic partition overwrite #348
Fundamentally, dynamic partition overwrite is not a great API because it is quite scary: you can accidentally overwrite a partition when bad records fall into partitions that shouldn't have been touched. So it is always safer to use `replaceWhere`. In the future we can always add support for dynamic partition overwrite, but I would still recommend using `replaceWhere`.
Got it, and thanks for the explanation @tdas.
note that the risk of bad records overwriting partitions is zero if you do something like the sketch below.
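A minimal sketch of that kind of write, assuming a date-partitioned Delta table; the table path, column names, and sample values are hypothetical, and `partitionOverwriteMode` is the option this feature request is asking Delta to honor:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().appName("pinned-partition-overwrite").getOrCreate()
import spark.implicits._

// Hypothetical daily job: every record is stamped with the single date being
// reprocessed, so a dynamic overwrite can only ever replace that one partition.
val batch = Seq(("click", 42L), ("view", 17L))
  .toDF("event", "count")
  .withColumn("date", lit("2024-01-02"))

batch.write
  .format("delta")
  .partitionBy("date")
  .mode("overwrite")
  .option("partitionOverwriteMode", "dynamic") // the behavior being requested
  .save("/tmp/delta/events")
```

Because the partition value is set explicitly on every row, nothing outside `date=2024-01-02` can be replaced.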
now you might argue that in such a case I could just use `replaceWhere`. Yes, you can do dumb things with dynamic partition overwrite, but also very useful things.
It is not really that you cannot do dumb things with `replaceWhere`. Moreover, up until now, everyone who has asked for this feature has been happy with `replaceWhere`. If you have an implementation, you should consider opening a PR.
opened PR here:
@tdas to be honest, breaking the normal overwrite API of Spark 2 instead of following it is risky in its own way. It's so easy to write an overwrite when you are used to working with Parquet tables, forget to add the `replaceWhere`, and boom, the table is gone. Maybe that's why you added the time machine... because you knew it would happen ;)
This also affects dbt-spark, which has to rely on the merge incremental strategy [and still doesn't yet]: dbt-labs/dbt-spark#138
Just got bit by this because Delta didn't behave the same as Parquet. Sure it is "dangerous," but it is also a feature that can be turned on and off depending on how one wishes to solve a particular problem. Willing to accept the risk.
Any news about this, please?
@marmbrus as you suggested, I opened a PR... in April 2020... it would be great if a committer could look at it sometime. Thanks.
I am reopening this issue. Despite our reservations about the risks, by community demand we have decided to go ahead and implement this feature.
@koertkuipers are you interested in continuing to contribute the PR?
Yes, see:
Cool. Could you update the PR to the latest master and run the tests again? Let's hope it still works. Further discussion can continue on the PR.
GitOrigin-RevId: ceacc3a6239f36e76cc8cf4f3447285bb06fc6a0

CDF + Vacuum tests. This PR adds two tests covering CDF + VACUUM integration. Closes delta-io#1177. GitOrigin-RevId: f2f7b187cb3cc78c267d378eaf3d0657a56241d9

Adds miscellaneous (e.g. end-to-end workload and CDCReader) tests for CDF. Resolves delta-io#1178. GitOrigin-RevId: 5c7da4ff9413d84e73137a80673872065de8267b

Minor refactor to UpdateCommand. GitOrigin-RevId: ab3fcbe0522aa194ac0781730a50112655d5c7ec

Fixes delta-io#348: Support Dynamic Partition Overwrite. The goal of this PR is to support dynamic partition overwrite mode on writes to Delta. To enable it for a single write, add `.option("partitionOverwriteMode", "dynamic")`. It can also be set per SparkSession in the SQL config using `.config("spark.sql.sources.partitionOverwriteMode", "dynamic")`. Some limitations of this pull request: dynamic partition overwrite mode in combination with replaceWhere is not supported; if both are set, this will result in an error. The SQL `INSERT OVERWRITE` syntax does not yet support dynamic partition overwrite; for this, more changes will need to be made to `org.apache.spark.sql.delta.catalog.DeltaTableV2` and related classes. Fixes delta-io#348. Closes delta-io#371. Signed-off-by: Allison Portis <allison.portis@databricks.com>. GitOrigin-RevId: 5b01e5b04e573dabe91ac2d71991a127617b8038

Add Checkpoint + CDF test. This PR adds a test to ensure CDC fields are not included in the checkpoint file. Resolves delta-io#1180. GitOrigin-RevId: d4a7b8bc4d1a79d30806ff18c5b507f6edcd964c
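A minimal sketch of the two ways the commit message above describes for enabling this, assuming a date-partitioned Delta table; the table path, column names, and sample rows are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Session-wide default: overwrites of partitioned tables become dynamic.
val spark = SparkSession.builder()
  .appName("dynamic-partition-overwrite")
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .getOrCreate()

import spark.implicits._

// Hypothetical data: only date=2024-01-02 appears in df, so only that
// partition is replaced; all other partitions of the table are kept.
val df = Seq(("2024-01-02", "click", 42L)).toDF("date", "event", "count")

df.write
  .format("delta")
  .partitionBy("date")
  .mode("overwrite")
  // Per-write setting; equivalent to the session config above.
  .option("partitionOverwriteMode", "dynamic")
  .save("/tmp/delta/events")
```

Per the limitations listed in the commit message, combining this with `replaceWhere` on the same write results in an error, and the SQL `INSERT OVERWRITE` syntax does not yet pick up the dynamic behavior.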
Is it supported now?
Hi @bkyryliuk - it is included in the preview for Delta Lake 2.0.0 - see the preview release notes here: https://github.com/delta-io/delta/releases/tag/v2.0.0rc1. It will also be included in the official 2.0.0 release.
There is no support for dynamic partition overwrite for now. It seems we can only overwrite using the `replaceWhere` option to specify which part to overwrite, which is not convenient.
I am not sure why we don't have dynamic overwrite support in Delta; are there any limitations preventing it?
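For context, a minimal sketch of the `replaceWhere` workaround described above; the table path, partition column, and predicate are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("replace-where-workaround").getOrCreate()
import spark.implicits._

// Hypothetical replacement data for a single day.
val updated = Seq(("2024-01-02", "click", 42L)).toDF("date", "event", "count")

// Only rows matching the predicate are replaced; the caller has to spell out
// the partitions by hand, which is the inconvenience being described.
updated.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date = '2024-01-02'")
  .save("/tmp/delta/events")
```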