-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PARQUET-2495: Update merge script to python3 #1373
base: master
Are you sure you want to change the base?
Conversation
- Fixes differences between python2 and python3 - Unifies to only using git as gitbox is no longer necessary. - Used to merge PARQUET-2483 - Adds options to clear existing branches used from merge, if for some reason they exist (useful while iterating on updates to the script) - Adds the ability to pass PR number as an arg on the command line It is possible JIRA updates still have some work here, as it errored out while trying to do those.
I'm not an expert in python. Just want to ask a general question on whether we should enforce using the merge script for parquet-java (or other parquet-xxx projects). It has been broken for a long time. @gszadovszky @julienledem Thoughts? |
I think the main thing the script does (which I haven't fully tested) is allow for easy merging of backports, and JIRA closing (which given current ML thread might not be useful for super long). |
There is a description about it in the dev/README. It seems this is for the time when we were not able to simply push the PRs from github. Github was a read only mirror for a time being and we used the apache git repo to push the commits into. I don't remember using this script. I think we do not need it anymore. |
For reference there is a similar script in Arrow which I think probably has similar origins. The additional benefit even for github, is it looks like it tags issues with appropriate fix versions. Not sure if parquet-java has other ways of tracking this? |
There is a similar script in Apache ORC: https://github.com/apache/orc/blob/main/dev/merge_orc_pr.py. It can help keep the PR description, update JIRA state and backport commit to other branches in a single shot. Perhaps we do not want to enforce using the script but it might help in the above cases. |
I haven't used this script and would prefer to see if we can have a more Github-oriented workflow. Rok just raised a vote to migrate the issues from Jira to GitHub. |
@Fokko in your experience how have issues been tagged with fixes for GH oriented workflows? |
@emkornfield Sorry for the late reply here, the mailbox is a bit swamped, thanks for the ping. For Iceberg we make these Github releases (example for 1.5.0). Github will generate this list for you, and then the release manager groups and curates this a bit to make it easier to read. Merging would just go through the Github UI since we don't need to keep track of things in Jira, and this way we don't need to use the script. |
@Fokko Thanks, I still think the Arrow issues which use the tool might keep better records. For instance if you look at a recently resolved issue in Arrow you see it is tagged with a fix version (I believe this is due to using a script). I don't see something similar in a recent Iceberg. I might be missing something but I think the Arrow approach is nicer as it allows for me to directly find an issue and determine which release I might need to use to make use of it. I also think being able to easily backport fixes to other branches via one step process is nicer even if it doesn't use the UI (but this is likely a rarer use-case). Thoughts? |
In Iceberg we create backports by hand. We checkout the branch, and backport using cherry-picking the commits, and create a PR. Can I ask how you resolve merge conflicts with older versions? |
In the merge script of Apache ORC, it asks which branch to cherry-pick the commit and tries to do that automatically. If it succeeds, then we are done. Or when it fails, it prints error message and asks the developer to resolve the cherry-pick conflict manually and then come back to the script by answering BTW, as I have asked in #2932, it would be good to automatically close the Github issue related to the PR. |
My biggest concern is properly tagging issues to milestones. I think the main question we should answer here is do we want a script. If we do, I think we should check this in and I can cherry-pick the necessary features from some of the other merge scripts (we should also ideally consolidate with the script in |
I started a discussion thread on parquet-dev to get a broader audience to discuss a path forward here. |
It is possible JIRA updates still have some work here, as it errored out while trying to do those.
Make sure you have checked all steps below.
Jira
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
the ASF 3rd Party License Policy.
Tests
tests and somewhat hard to test as it requires REST calls/etc
Commits
from "How to write a good git commit message":
Style
mvn spotless:apply -Pvector-plugins
Documentation