[BUG]binary-dedup.sh script fails on mac #3823
Comments
What's your macOS version? On 11.6 (Big Sur), the first failure I encountered is even earlier than
then I see the `--arg-file` error. The build then fails for me, though, as expected.
and once I fix it, it's incredibly slow. We probably need to compile a single list of files and delete them in larger batches.
yes, I had to install `sha1sum` as well, but I figured that was environment setup, like installing Maven. We may want to document it, but it should just be `brew install md5sha1sum`.
versions fail on Mojave 10.14.6 and Catalina 10.15.7
do you see
so we could just automatically pick it on Mac without having to document an additional download
yes, `/usr/bin/shasum` exists on both.
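The idea of picking the checksum tool automatically could look roughly like the sketch below. It is only an illustration: the function names `pick_sha1_tool` and `sha1_of` are hypothetical and not taken from the script, and it assumes `shasum` is used with its default algorithm (SHA-1).

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the name of an available SHA-1 tool.
# Prefers sha1sum (usual on Linux) and falls back to shasum,
# which the thread reports at /usr/bin/shasum on macOS.
pick_sha1_tool() {
  if command -v sha1sum >/dev/null 2>&1; then
    echo "sha1sum"
  elif command -v shasum >/dev/null 2>&1; then
    echo "shasum"
  else
    echo "no SHA-1 tool found" >&2
    return 1
  fi
}

SHA1_TOOL="$(pick_sha1_tool)" || exit 1

# Both tools print "<hex digest>  <path>"; keep only the digest.
sha1_of() {
  "$SHA1_TOOL" "$1" | cut -d' ' -f1
}
```

With this in place no extra `brew install` step would need to be documented, since the fallback ships with macOS.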
This PR fixes #3823.

The minimum fix for this issue is to use `shasum` on macOS vs `sha1sum` on Linux.

Once the script is fixed functionally, it exhibits unacceptable performance on macOS. The single-threaded performance was also bad on Linux, but it was mitigated by using parallel xargs. The slowness may be attributable to the much higher cost of forking a child process and of IO due to the presence of enterprise security software on macOS.

This PR reworks the script such that it is much faster without requiring parallelism, and the run time is similar for both Linux and macOS:

1. eliminate instances of forked process per class file by using bash builtins
2. compute sha for all files just once
3. build large lists for a few large `rsync` and `rm` calls

On a sample MBP the run time goes down from at least a few minutes (did not wait for completion) to 25 seconds.
On a sample Linux desktop the script run time decreases from over a minute to 3-5 seconds depending on the profile.

Signed-off-by: Gera Shegalov <gera@apache.org>
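As a rough illustration of the three points in the PR description (not the actual script), the sketch below checksums every file in one pass, finds duplicates with bash builtins only, and removes them with a single batched `rm`. The directory layout, file names, and variable names are all hypothetical, and the real script also uses `rsync`, which is left out here.

```shell
#!/usr/bin/env bash

# Hypothetical layout: two directories that may hold identical class files.
WORK_DIR="$(mktemp -d)"
mkdir -p "$WORK_DIR/shim1" "$WORK_DIR/shim2"
printf 'same' > "$WORK_DIR/shim1/A.class"
printf 'same' > "$WORK_DIR/shim2/A.class"
printf 'differs' > "$WORK_DIR/shim2/B.class"

# Use sha1sum where available (Linux), otherwise shasum (macOS).
if command -v sha1sum >/dev/null 2>&1; then SHA=sha1sum; else SHA=shasum; fi

# Compute the checksum of every file just once, in a single pipeline.
# Sorting groups identical digests on adjacent lines.
SHA_LIST="$WORK_DIR/sha.txt"
find "$WORK_DIR" -name '*.class' -print0 | xargs -0 "$SHA" | sort > "$SHA_LIST"

# Collect duplicates using only bash builtins (read, [[ ]], echo),
# so no child process is forked per class file.
DELETE_LIST="$WORK_DIR/delete.txt"
: > "$DELETE_LIST"
prev_sum=""
while read -r sum path; do
  if [[ "$sum" == "$prev_sum" ]]; then
    echo "$path" >> "$DELETE_LIST"
  fi
  prev_sum="$sum"
done < "$SHA_LIST"

# Delete everything on the list in as few rm invocations as possible.
tr '\n' '\0' < "$DELETE_LIST" | xargs -0 rm -f
```

In this toy layout the second copy of `A.class` is removed and `B.class` is kept. The sketch assumes paths contain no newlines.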
Describe the bug
looks like xargs on mac doesn't support --arg-file option:
line
: xargs --arg-file="$SPARK3XX_COMMON_TXT" -P 6 -n 1 -I% bash -c 'remove_duplicates "$@"' _ %
Instead of using the `--arg-file` option, I think we can just `cat` the file and pipe it into `xargs`.
Note there is a line at the end that uses it too:
xargs --arg-file="$UNSHIMMED_LIST_TXT" -P 6 -n 100 -I% \
But that didn't fail so we probably have a bug where it should fail.
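The suggestion of feeding the list to `xargs` on standard input rather than through `--arg-file` might be sketched as follows. `LIST_TXT` and `process_entry` are hypothetical stand-ins for the script's list file and its `remove_duplicates` function. The `-n 1` from the original line is dropped here because `-I` already processes one input line per invocation.

```shell
#!/usr/bin/env bash

# Hypothetical stand-ins for the script's list file and per-entry function.
LIST_TXT="$(mktemp)"
printf '%s\n' a/B.class c/D.class > "$LIST_TXT"

process_entry() {
  echo "processing $1"
}
# Make the function visible to the bash child processes started by xargs.
export -f process_entry

# GNU-only form, as in the report:
#   xargs --arg-file="$LIST_TXT" -P 6 -n 1 -I% bash -c 'process_entry "$@"' _ %
# Form that avoids --arg-file: redirect the list into xargs on stdin.
xargs -P 6 -I% bash -c 'process_entry "$@"' _ % < "$LIST_TXT"
```

A plain redirect does the same job as `cat file | xargs` with one process fewer. Because of `-P 6`, the order of the output lines is not guaranteed.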