Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check revisions for all shim jars while build all #4266

Merged
merged 3 commits into from
Dec 8, 2021

Conversation

res-life
Copy link
Collaborator

@res-life res-life commented Dec 2, 2021

When building all shims, the code base for all the shims may be inconsistent, especially with databricks build.
Because of databricks shims are pulled from URM nightly build, and spar30x, 31x, 32x are built locally.

This contributes #3661

Signed-off-by: Chong Gao res_life@163.com

@res-life
Copy link
Collaborator Author

res-life commented Dec 2, 2021

build

@res-life res-life requested a review from gerashegalov December 2, 2021 13:15
build/buildall Outdated
pre_revision="${curr_revision}"
pre_shim_version_path="${shim_version_path}"
else
echo "Error: version file not exists: ${shim_version_path}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
echo "Error: version file not exists: ${shim_version_path}"
echo "Error: version file missing: ${shim_version_path}"

build/buildall Outdated
case $DIST_PROFILE in

snapshots?(WithDatabricks))
with_db_build="true"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the WithDatabricks is optional here, its not always true. if I just use "snapshots" then its not with databricks. You need to check explicitly for WithDatabricks before setting this.

build/buildall Outdated
if [ -n "$pre_revision" ] && [[ "$curr_revision" != "$pre_revision" ]] ; then
echo
echo "=================================================================="
echo "Check Failed!!!: revisions are not equals"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
echo "Check Failed!!!: revisions are not equals"
echo "Check Failed: git revisions between shims are not equal"

build/buildall Outdated
echo "=================================================================="
echo " in path ${pre_shim_version_path} ${pre_revision}"
echo " in path ${shim_version_path} ${curr_revision}"
echo "Although the is build finished, but you are taking the risk of inconsistent revisions of shims!!!"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would simplify this and just say something like:
Please check the revisions of each shim to see which one is inconsistent. Note, if building with Databricks those jars are built separately.

build/buildall Outdated
###############################################################################################
# the following is to check the revisions in shims are identical
###############################################################################################
shims_for_check="${SPARK_SHIM_VERSIONS[@]}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to put this in separate function, that way if one wants to comment it out its a single line to remove.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@gerashegalov
Copy link
Collaborator

I think we need this kind of validation script called from dist/pom.xml in a verify phase the latest. Use of buildall is encouraged but it's optional and it's definitely not used by CI (yet).

@res-life
Copy link
Collaborator Author

res-life commented Dec 3, 2021

I think we need this kind of validation script called from dist/pom.xml in a verify phase the latest. Use of buildall is encouraged but it's optional and it's definitely not used by CI (yet).

Maybe put into the package phase is better.

@res-life
Copy link
Collaborator Author

res-life commented Dec 3, 2021

build

@tgravescs
Copy link
Collaborator

I'm not sure what is status of this, comment was made to put into package phase but updates here don't match, are you modifying this to go into the regular pom files?

@gerashegalov
Copy link
Collaborator

@res-life here is a perfect scenario that the validation you are working can help with #4265 (comment) .

Please undo the changes to buildall and call check-shims-revisions.sh by adding another execution to maven-antrun-plugin in dist/pom.xml . Please make sure that your error message is part of the Maven exception

Signed-off-by: Chong Gao <res_life@163.com>
@res-life res-life force-pushed the check-shims-revisions branch from 6b29065 to cc3a568 Compare December 6, 2021 11:23
@res-life
Copy link
Collaborator Author

res-life commented Dec 6, 2021

@gerashegalov @tgravescs
The build all script invokes mvn package for dist module, so added the check revisions script into package phase, or the build will skip this check. Is this ok?
Pushed --force because of messed up the code.

@res-life
Copy link
Collaborator Author

res-life commented Dec 6, 2021

build

dist/pom.xml Outdated
@@ -338,6 +338,23 @@
</target>
</configuration>
</execution>
<execution>
<id>check-shims-versions</id>
<phase>package</phase>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the scenario from #4265, making it run package means that we hit the secondary problem first because bindary-dedupe is run inside build-parallel-worlds which is a generate-resources execution. On the other hand check-shims-revisions.sh relies on the build-parallel-worlds to unzip aggregator dependencies.

So I think to avoid a major refactoring and still get check-shims-revisions.sh called in time, we need to move its call to between the unzip loop and call to binary-dedupe

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good comment!
Updated.

Signed-off-by: Chong Gao <res_life@163.com>
@res-life
Copy link
Collaborator Author

res-life commented Dec 7, 2021

build

@res-life
Copy link
Collaborator Author

res-life commented Dec 7, 2021

build

@res-life res-life mentioned this pull request Dec 7, 2021
Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, verified that it succeeds for the consistent shims and fails with a much more clear message when we have an inconsistency.

potential improvements to actually print commit hashes, but this is minor.

[INFO]      [exec] Check Failed: git revisions between shims are not equal
[INFO]      [exec] Please check the revisions of each shim to see which one is inconsistent. Note, if building with Databricks those jars are built separately.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  11.394 s
[INFO] Finished at: 2021-12-07T14:15:57-08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.0.0:run (create-parallel-world) on project rapids-4-spark_2.12: An Ant BuildException has occured: The following error occurred while executing this line:
[ERROR] /home/gshegalov/gits/NVIDIA/spark-rapids/dist/maven-antrun/build-parallel-worlds.xml:127: exec returned: 1
[ERROR] around Ant part ...<ant antfile="/home/gshegalov/gits/NVIDIA/spark-rapids/dist/maven-antrun/build-parallel-worlds.xml" target="build-parallel-worlds" />... @ 9:138 in /home/gshegalov/gits/NVIDIA/spark-rapids/dist/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

@res-life res-life merged commit 367c5d7 into NVIDIA:branch-22.02 Dec 8, 2021
@res-life res-life deleted the check-shims-revisions branch December 8, 2021 01:39
@sameerz sameerz added the build Related to CI / CD or cleanly building label Dec 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Related to CI / CD or cleanly building
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants