Remove aggregator dependency before deploying dist artifact #4265
Conversation
Fixes NVIDIA#4253
Signed-off-by: Gera Shegalov <gera@apache.org>
build
jenkins/spark-tests.sh (Outdated)
@@ -23,7 +23,7 @@ nvidia-smi

 ARTF_ROOT="$WORKSPACE/jars"
 MVN_GET_CMD="mvn org.apache.maven.plugins:maven-dependency-plugin:2.8:get -B \
-    -Dmaven.repo.local=$WORKSPACE/.m2 \
+    -Dmaven.repo.local=$WORKSPACE/.m2 -Dtransitive=false \
This fix would be good for our test script jenkins/spark-tests.sh, but there could still be a problem on the customers' end: if they add rapids-4-spark_2.12 to their pom.xml without adding transitive=false, they will get these dependency errors.
Would it be doable to remove the dependency tree (dependency reduce) from rapids-4-spark_2.12, similar to what we did on branch-21.10? https://github.com/NVIDIA/spark-rapids/pull/3930/files#diff-2681b51a40a6a26d3a6f7f839cd810422226193ef273ea02d1b0b93c3a87cfa6R1868
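(For reference, the consumer-side setup being discussed is roughly the following sketch; the coordinates are the published dist artifact, and the version is only an illustrative placeholder.)

    <!-- hypothetical fragment of a downstream user's pom.xml: pulling in the
         dist jar, e.g. to compile RAPIDS accelerated UDFs; the version is a
         placeholder, not a specific recommended release -->
    <dependency>
      <groupId>com.nvidia</groupId>
      <artifactId>rapids-4-spark_2.12</artifactId>
      <version>21.12.0</version>
    </dependency>

Whether resolution actually breaks for such a user is what the rest of this thread debates: the normal compile path ignores the dist jar's provided dependencies, while the dependency:get mojo resolves the tree transitively by default, hence the -Dtransitive=false change in this PR.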
Agreed, the dependencies as published seem problematic for users who want to pull in the dist jar for building their RAPIDS accelerated UDFs or custom code that accesses ColumnarRdd.
We have recently litigated this on other PRs: transitive provided dependencies are irrelevant for normal use, see https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#dependency-scope. IMO we should not optimize for the dependency:get mojo, which behaves differently by default.
But in this case it's just wrong. The dependency states that the jar needs the aggregator jar to be provided, but it contains it instead. What's the point of advertising a bogus dependency? It clearly causes problems for some use cases, and is misleading at best if someone examines the pom.
I disagree: provided is the weakest scope that still conveys that the aggregator dependency tree is required for building the dist jar (which it is, especially given that some branches like spark312db are built remotely). The next weakest dependency scope is runtime, which states that the dependency will be needed only at the execution phase, and using that might be misleading.

> What's the point of advertising a bogus dependency?

I disagree with the characterization as bogus, because the aggregator is required for building the dist package. According to the transitive dependency rules: if the application depends with any scope on dist, and dist depends with provided scope on aggregator, the dependency subtree rooted in aggregator is specifically not advertised.
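(To make the scope argument concrete, here is a minimal sketch of the relationship being described; the coordinates are simplified from memory, and the real dist build involves more than one aggregator dependency per Spark shim.)

    <!-- sketch of the dist module's pom: the aggregator is needed to build the
         dist jar, and provided is the weakest scope that expresses this; per
         Maven's scope-combination rules the subtree is not passed to consumers -->
    <dependency>
      <groupId>com.nvidia</groupId>
      <artifactId>rapids-4-spark-aggregator_2.12</artifactId>
      <version>${project.version}</version>
      <scope>provided</scope>
    </dependency>

In the scope table from the Maven guide linked above, a compile-scope dependency on dist combined with dist's provided-scope dependency on aggregator contributes nothing to the consumer's classpath.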
IMO this violates the spirit of the shading we're doing in two ways.

- Most shaded jars do not advertise the dependencies they used during builds when they pull in those artifacts; that's why the shade plugin builds a dependency-reduced pom by default (sketched below). What we're doing here is effectively a form of shading, so I do not understand why we would want to advertise the details of a dependency that is irrelevant once the artifact is published. What's the use case where having this dependency in the published artifact is desired? Maybe I'm missing that.
- I do not think we should reference unpublished dependencies in the poms of our published artifacts, even if they are marked provided. The user does not need to provide it to use the artifact, and the fact that this is a build-time-only dependency is not clearly conveyed by the pom as-is.

Bottom line is that this is breaking some workflows as it is now. Sure, we control this particular workflow and can update it as a workaround, but do we really want to explain this all over again to a user that is doing something similar? I think it's a bit difficult to argue that we need to list a dependency in this pom as provided when it's not actually needed, and then not bother to publish a dependency-reduced pom to avoid such problems.
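(For context, the dependency-reduced pom mentioned above is stock maven-shade-plugin behavior; the following is a generic sketch of that mechanism, not this project's actual build configuration.)

    <!-- generic maven-shade-plugin sketch: dependencies folded into the shaded
         jar are removed from the pom that gets installed/deployed, via the
         dependency-reduced pom the plugin generates by default -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.4</version>
      <configuration>
        <!-- true is already the default; shown here for clarity -->
        <createDependencyReducedPom>true</createDependencyReducedPom>
      </configuration>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>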
We have tried a manual install-file with a dependency-reduced pom: 372bc13#diff-2681b51a40a6a26d3a6f7f839cd810422226193ef273ea02d1b0b93c3a87cfa6R375
The issue with that is a bug in the install-file mojo: when executed inside a multi-module build, it picks a random pom (based on the build order of the current Maven session) instead of the current module's pom. I question whether it's worth solving a simple problem affecting this edge case, because even a small change is more disruptive (#4043 (comment)) than the problem it addresses.
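(For reference, the manual approach referred to above amounts to something like the following plugin-execution sketch, with placeholder paths; the multi-module pom-selection bug described above is what made it unreliable.)

    <!-- sketch of installing the dist jar together with a hand-maintained
         reduced pom; the file paths are placeholders for illustration -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-install-plugin</artifactId>
      <version>2.5.2</version>
      <executions>
        <execution>
          <id>install-dist-with-reduced-pom</id>
          <phase>install</phase>
          <goals>
            <goal>install-file</goal>
          </goals>
          <configuration>
            <file>${project.build.directory}/${project.build.finalName}.jar</file>
            <pomFile>${project.basedir}/dependency-reduced-pom.xml</pomFile>
          </configuration>
        </execution>
      </executions>
    </plugin>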
Updated the PR based on the offline meeting with @jlowe
Signed-off-by: Gera Shegalov <gera@apache.org>
Would this be better to target …?
Signed-off-by: Gera Shegalov <gera@apache.org>
Got a build error with this PR.
build
The error is from a line number preceding the change in the PR (executing binary-dedupe), probably unrelated but checking.
spark320 sha1 is different:
When building it locally, the build is correct.
Inspecting the CI workspace points to the Databricks-vs-Other-Shims split-brain problem: the jars being combined were built at different commits, c8dd501 and 6fd125d respectively.
Oh, I see, we need to build out the DB shims before the spark-rapids release. Is this diff …? Let me update the DB shims using the same commit ID as the Other-Shims.
Yes, it's because of the commit mismatch: I verified that the shas match and the build succeeds once the nightly artifacts are caught up. Somehow we had an inconsistent set of artifacts in the internal Maven repo, which I could reproduce by skipping the real build and just building …

I think there is a simple solution that totally prevents this from happening. When Databricks is involved, all artifacts (shims) should be built on Databricks nodes. Then the build pipeline can fan out into testing stages, with only Databricks-related tests running on Databricks, whereas the rest runs on the non-Databricks fleet. We should not fan out for artifact creation as we do today.

I'll also file a bug to surface the error message from binary-dedupe.sh to Maven better.
Updated the DB shims using the same commit ID as the nightly ones; build PASS, test PASS. Build: Test: REL_it-A30-AQE/19/consoleFull
@gerashegalov Could you please help check if we're good to merge?
Merging it since the tests passed.
Fixes #4253
Signed-off-by: Gera Shegalov gera@apache.org