[WIP] Add INSERT OVERWRITE to Trino SQL #11603
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: George Fisher.
This has been discussed many, many, many times. The goal of Trino is not to be a better Hive, but to be an ANSI-compliant MPP engine over arbitrary data. This means changing the language for specific Hive behaviors is very unlikely to happen. In this specific case, partitioned data has never been particularly popular outside of very large (and very high tech) companies, and most of these are moving to new and better designs like Iceberg, so I don't think there is much chance of this getting accepted into Trino. In the future, I suggest you discuss with the community in Slack before putting a lot of work into a big change like this.
Hi @dain, thanks for the comments. After my thread on Trino Slack about this feature I wondered how this would be received. I over-emphasized the Hive similarity: we do not actually use Hive outside Hive tables. My history with this kind of scenario extends back before I ever met a Hive table. My experience with the relative importance of this scenario is exactly as you say, at very large tech companies where the data size is unmanageable without partitioning. Also, no worries about the effort: (1) I gained deep insight into the controller interface; (2) I took a dive into Trino testing; (3) I started to implement this as a proof-of-concept and it moved quickly; (4) this is a change I can always use, so nothing is lost.
Thanks. The Slack discussion on this is great. The conclusion of that discussion is that the upcoming MERGE support should be able to cover this use case.
Great. I will continue discussion of this in the issue thread.
Add INSERT OVERWRITE to Trino SQL to allow connectors to provide overwrite functionality without the need for session parameters.
Related issue: #11602
Current functionality requires a two-statement sequence, for example:
SET SESSION hive.insert_existing_partitions_behavior='OVERWRITE';
INSERT INTO hive.test2.insert_test SELECT * FROM tpch.sf1.customer;
With this change, the same operation can instead be written as a single statement:
INSERT OVERWRITE hive.test2.insert_test SELECT * FROM tpch.sf1.customer;
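To make the difference between appending and overwriting concrete, here is a rough sketch of the semantics as a toy in-memory model. It is not Trino or Hive connector code: the function `insert_rows`, the dictionary-based table layout, and the `nationkey` partition column are all assumptions made for illustration, and the model only covers the partitioned-table case where partitions that receive new rows are replaced and other partitions are left untouched.

```python
from collections import defaultdict


def insert_rows(table, rows, partition_key, overwrite=False):
    """Toy model of a partitioned insert (hypothetical helper, not a Trino API).

    `table` maps a partition value to the list of rows stored in that partition.
    With overwrite=False, new rows are appended to existing partitions.
    With overwrite=True, every partition that receives new rows is replaced;
    partitions that receive no rows are left as they are.
    """
    # Group the incoming rows by the partition they belong to.
    incoming = defaultdict(list)
    for row in rows:
        incoming[row[partition_key]].append(row)

    for part, new_rows in incoming.items():
        if overwrite:
            table[part] = list(new_rows)
        else:
            table.setdefault(part, []).extend(new_rows)
    return table


def copy_table(table):
    """Shallow-copy the partition lists so each scenario starts from the same state."""
    return {part: list(rows) for part, rows in table.items()}


existing = {
    1: [{"nationkey": 1, "name": "old"}],
    2: [{"nationkey": 2, "name": "keep"}],
}
new_rows = [{"nationkey": 1, "name": "new"}]

appended = insert_rows(copy_table(existing), new_rows, "nationkey")
overwritten = insert_rows(copy_table(existing), new_rows, "nationkey", overwrite=True)
```

In this model, `appended` ends up with both the old and the new row in partition 1, while `overwritten` keeps only the new row there; partition 2 is unchanged in both cases.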
Notes
This is a change to the SQL parser, connector interface, and the Hive connector.
Documentation
( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:
Details
I have opened an issue where I explain why I am proposing this pull request: #11602
Justifications Summary
Motivations