Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it so applying and removing patches are repeatable without errors #2502

Merged
merged 3 commits into from
Oct 14, 2024

Conversation

revans2
Copy link
Collaborator

@revans2 revans2 commented Oct 14, 2024

There are a number of issues with dealing with patches for the CUDF repo. This fixes some of them. It does not fix submodule update --init as that happens externally. It also does not fix other parts of upmerging a branch unless you have unapplied the patches before hand. Sadly it also does not provide any good way to do development work on CUDF using the submodule as your development environment.

What this does do it make it so that if you apply patches to CUDF and try to build again without removing those patches it should work.

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@revans2
Copy link
Collaborator Author

revans2 commented Oct 14, 2024

build

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's here looks OK.

What about the dangerous and surprising behavior of specifying submodule.patch.skip=true as discussed in #2497 (comment)? If this is an artifact of how the validate build works, maybe we should change the name of the property to something less "inviting" for developers to try, as this won't do what they think it does.

@revans2
Copy link
Collaborator Author

revans2 commented Oct 14, 2024

What's here looks OK.

What about the dangerous and surprising behavior of specifying submodule.patch.skip=true as discussed in #2497 (comment)? If this is an artifact of how the validate build works, maybe we should change the name of the property to something less "inviting" for developers to try, as this won't do what they think it does.

That is a good point. At a minimum I need to document it better. Thinking about it I am trying to understand the use case fro it. Originally I put it in as a way to skip the apply/unapply steps. But then I used it as a part of the CI process to auto upmerge CUDF, which made @gerashegalov want to lock it down so we don't accidentally build the plugin without the patches being installed. But I don't see that being the case any more. Perhaps I can move it back to what it was originally intended to be and document it better to let people know that there is a foot/gun here no matter what it does.

@revans2
Copy link
Collaborator Author

revans2 commented Oct 14, 2024

build

jlowe
jlowe previously approved these changes Oct 14, 2024
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo, lgtm. Note that this will make the submodule sync take a bit longer because it will build everything only to rebuild it, but ccache may be able to avoid a lot of the rebuild.

I'm personally OK if we decide to keep the build-skipping as long as we update the name of the config to reflect that. submodule.patch.skip implies it's only going to skip patching submodules, not also skip all native builds (including those outside of the submodule).

pom.xml Outdated Show resolved Hide resolved
if [ -n "$(git status --porcelain --untracked-files=no)" ] ; then
CHANGED_FILES=$(git status --porcelain --untracked-files=no)

if [ \( -s "$FULLY_PATCHED_FILE" \) -a \( -n "$CHANGED_FILES" \) ] ; then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a direct test of patch applicability via git apply --check for both forward and --reverse . If it is neither of the states we are in, we can ask/warn the developer to commit changes and regenerate the patch if it is intentional

# is to save some state files about what happened. But a user could mess with CUDF directly
# so we want to have ways to double check that they are indeed correct.

FULLY_PATCHED_FILE="$CUDF_DIR/spark-rapids-jni.patch"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there may technically be a patch that applies to non-cudf portion of spark-rapids, maybe should go to the top-level dir?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what's the use-case for patching outside of the submodule? I thought we would checkin the change directly rather than a patch of that change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only have a sole use case with the submodule patching right now. No doubt direct modifications of the source tree outside the module is preferrable. In terms of hypotheticals, there could be another submodule or some native dependency that we want consume bypassing cudf which needs patching.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now all patches apply only to CUDF. I don't want to try and make that more generic right now. If we do hit that situation I would rather have sub-directories under patches for each third party sub directory, and then apply/remove patches for each subdirectory separately.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, separate subdirs work too

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@revans2
Copy link
Collaborator Author

revans2 commented Oct 14, 2024

build

@revans2
Copy link
Collaborator Author

revans2 commented Oct 14, 2024

@gerashegalov I added in the direct test, and you were also right that git apply is much cleaner than patch. Please take another look

Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gerashegalov gerashegalov merged commit 2c3b60c into NVIDIA:branch-24.12 Oct 14, 2024
3 checks passed
pxLi pushed a commit to pxLi/spark-rapids-jni that referenced this pull request Oct 15, 2024
NVIDIA#2502)

* Make it so applying and removing patches are repeatable without errors

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

* Adjust config for skipping a patch

* More fixes

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

---------

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants