Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]binary-dedup.sh script fails on mac #3823

Closed
tgravescs opened this issue Oct 14, 2021 · 6 comments · Fixed by #3848
Closed

[BUG]binary-dedup.sh script fails on mac #3823

tgravescs opened this issue Oct 14, 2021 · 6 comments · Fixed by #3848
Assignees
Labels
bug Something isn't working build Related to CI / CD or cleanly building

Comments

@tgravescs
Copy link
Collaborator

Describe the bug
looks like xargs on mac doesn't support --arg-file option:


     [exec] + xargs --arg-file=/Users/tgraves/Documents/workspace/spark-rapids/dist/target/spark3xx-common.txt -P 6 -n 1 -I% bash -c 'remove_duplicates "$@"' _ %
     [exec] xargs: illegal option -- -
     [exec] usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements]] [-J replstr]
     [exec]              [-L number] [-n number [-x]] [-P maxprocs] [-s size]
     [exec]              [utility [argument ...]]

line: xargs --arg-file="$SPARK3XX_COMMON_TXT" -P 6 -n 1 -I% bash -c 'remove_duplicates "$@"' _ %

instead of using the --arg-file option I think we can just cat the the file and | it into xargs

Note there is a line at the end that uses it too:

xargs --arg-file="$UNSHIMMED_LIST_TXT" -P 6 -n 100 -I% \
But that didn't fail so we probably have a bug where it should fail.

@tgravescs tgravescs added bug Something isn't working ? - Needs Triage Need team to review and classify build Related to CI / CD or cleanly building labels Oct 14, 2021
@gerashegalov
Copy link
Collaborator

What's your MacOS version?

on 11.6 (Big Sur), the first failure I encountered is even earlier than --arg-file
we need to use shasum instead of sha1sum on Mac

      [exec] xargs: sha1sum: No such file or directory

then I see the the --arg-file error

then the build fails for me though as expected

    [exec] + xargs --arg-file=/Users/user/gits/NVIDIA/spark-rapids/dist/target/spark3xx-common.txt -P 6 -n 1 -I% bash -c 'remove_duplicates "$@"' _ %
     [exec] xargs: illegal option -- -
     [exec] usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements] [-S replsize]]
     [exec]              [-J replstr] [-L number] [-n number [-x]] [-P maxprocs]
     [exec]              [-s size] [utility [argument ...]]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for RAPIDS Accelerator for Apache Spark Distribution 21.12.0-SNAPSHOT:
[INFO] 
[INFO] RAPIDS Accelerator for Apache Spark Distribution ... FAILURE [ 15.119 s]
[INFO] RAPIDS Accelerator for Apache Spark UDF Examples ... SKIPPED
[INFO] RAPIDS Accelerator for Apache Spark Tests .......... SKIPPED
[INFO] rapids-4-spark-integration-tests_2.12 .............. SKIPPED
[INFO] RAPIDS Accelerator for Apache Spark Tests For 3.1.X+ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  15.266 s
[INFO] Finished at: 2021-10-14T12:39:58-07:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (create-parallel-world) on project rapids-4-spark_2.12: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...<exec failonerror="true" dir="/Users/user/gits/NVIDIA/spark-rapids/dist/target" executable="/Users/user/gits/NVIDIA/spark-rapids/dist/scripts/binary-dedupe.sh"/>... @ 14:174 in /Users/user/gits/NVIDIA/spark-rapids/dist/target/antrun/build-main.xml

@gerashegalov
Copy link
Collaborator

and once I fix it it's incredibly slow, probably need to just compile a single list of files to delete in larger batches

@tgravescs
Copy link
Collaborator Author

yes I had to install sha1sum as well, but I figured that was env setup like install maven. we may want to document but should just be brew install md5sha1sum

@tgravescs
Copy link
Collaborator Author

versions fail on Mojava 10.14.6 and Catalina 10.15.7

@gerashegalov
Copy link
Collaborator

gerashegalov commented Oct 14, 2021

do you see shasum on Mojava and Catalina?
it was already there on Big Sur

which shasum
/usr/bin/shasum

so we could just automatically pick it on Mac without having to document an additional download

@tgravescs
Copy link
Collaborator Author

yes /usr/bin/shasum exists on both.

@gerashegalov gerashegalov self-assigned this Oct 15, 2021
@gerashegalov gerashegalov removed the ? - Needs Triage Need team to review and classify label Oct 15, 2021
@gerashegalov gerashegalov added this to the Oct 18 - Oct 29 milestone Oct 18, 2021
gerashegalov added a commit that referenced this issue Oct 21, 2021
This PR fixes #3823. 

The minimum fix for this issue is to use `shasum` on macOS vs `sha1sum` on Linux. Once the script is fixed functionally it exhibits unacceptable performance on macOS. The single-threaded performance was also bad on Linux but it was mitigated by using parallel xargs. The slowness may  be attributable to much higher cost of forking a child process and IO due to the presence of  an enterprise security software on macOS. This PR reworks the script such that it is much faster without requiring parallelism, and the run time is similar for both Linux and macOS.

1. eliminate instances of forked process per class file by using bash builtins 
2. compute sha for all files just once
3. build large lists for a few large `rsync` and `rm` calls

On a sample MBP the run time goes down from at least a few minutes (did not wait for completion ) to 25 seconds
On a sample Linux desktop the script run time decreases from over a minute to 3-5 seconds depending on the profile.
 
Signed-off-by: Gera Shegalov <gera@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working build Related to CI / CD or cleanly building
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants