Use curl instead of wget for Spark and Julia downloads #1950

Merged
merged 18 commits into from
Aug 3, 2023

Conversation

mathbunnyru
Member

Describe your changes

Issue ticket if applicable

Checklist (especially for first-time contributors)

  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests
  • I will try not to use force-push to make the review process easier for reviewers
  • I have updated the documentation for significant changes

@mathbunnyru
Member Author

I used nmap --script ssl-enum-ciphers -p 443 URL to check which TLS protocols the Spark and Julia download sites support.
https://stackoverflow.com/a/55764641/4881441

I then forced wget to use exactly that protocol.
https://stackoverflow.com/a/55764641/4881441

@mathbunnyru
Member Author

dlcdn.apache.org supports both TLSv1.2 and TLSv1.3. I will use TLSv1.3.
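
The check and the pin described above can be sketched in shell roughly as follows. This is a minimal sketch, not the code from this PR: the helper names `tls_versions` and `download_tls13` are hypothetical, `SPARK_URL` is a placeholder, and it assumes a wget build whose `--secure-protocol` option accepts `TLSv1_3`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not taken from the PR.

# Read nmap ssl-enum-ciphers output on stdin and print the distinct
# TLS versions it mentions, one per line.
tls_versions() {
    grep -oE 'TLSv1\.[0-9]' | sort -u
}

# Download the URL given as $1, forcing wget to negotiate TLS 1.3.
# Assumption: the installed wget supports --secure-protocol=TLSv1_3.
download_tls13() {
    wget --progress=dot:giga --secure-protocol=TLSv1_3 "$1"
}

# Example usage (requires network access, so it is left commented out):
#   nmap --script ssl-enum-ciphers -p 443 dlcdn.apache.org | tls_versions
#   download_tls13 "${SPARK_URL}"   # SPARK_URL is a placeholder
```

Neither function touches the network until it is called, so the file can be sourced safely.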

@bjornjorgensen
Contributor

but its it secure when we use https instead of http

@mathbunnyru
Member Author

but its it secure when we use https instead of http

Sorry, I don't understand what you're saying.

@bjornjorgensen
Contributor

what does "Force tls version on some wget downloads" have to do with run-one-until-success

@mathbunnyru mathbunnyru changed the title Force tls version on some wget downloads Try to fix wget for spark and Julia Jul 27, 2023
@mathbunnyru
Member Author

what does "Force tls version on some wget downloads" have to do with run-one-until-success

Thanks, I changed the PR name to something more appropriate.
I tried a few solutions here, but none of them seem to work.

@bjornjorgensen
Contributor

is run-one-until-success only for ubuntu and debian?
Are there any reasons why we need it?

@mathbunnyru
Member Author

mathbunnyru commented Jul 28, 2023

is run-one-until-success only for ubuntu and debian?

run-one-until-success is part of the run-one package.
I don't know which OSs provide such a package.

Are there any reasons why we need it?

We need it because spark and julia downloads fail on aarch64 machines.
I tried to understand why it happens and tried several solutions.
I don't know the exact reason and if someone has a better solution, I would really appreciate a proper fix.
My attempts haven't fixed the problem.
So, run-one-until-success mitigates it in a simple manner.

We already install the run-one package to support the RESTARTABLE option, so the image size is not affected.

The only downside I can see is that when Spark or Julia downloads actually fail, we will get timeout errors instead of download errors. Unfortunately, there is no attempts parameter to pass to run-one-until-success.
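
For comparison, a bounded retry would surface the download error itself after a fixed number of attempts. The following is a minimal sketch of that idea, not what is used here: `retry` and `RETRY_DELAY` are hypothetical names introduced only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the run-one package or of this PR.

# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Run COMMAND until it succeeds, giving up after MAX_ATTEMPTS tries.
# Waits RETRY_DELAY seconds (default 1) between attempts.
retry() {
    local max_attempts="$1"
    shift
    local attempt=1
    until "$@"; do
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            echo "Command failed after ${attempt} attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-1}"
    done
}

# Example usage (requires network access, so it is left commented out):
#   retry 5 wget --progress=dot:giga "${DOWNLOAD_URL}"   # DOWNLOAD_URL is a placeholder
```

With this shape, a download that never succeeds ends with a non-zero exit status and a message on stderr rather than a job timeout.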

@mathbunnyru
Member Author

@mathbunnyru mathbunnyru reopened this Jul 28, 2023
@bjornjorgensen
Contributor

so this is a bug for wget on aarch64?

can we try replacing

wget --progress=dot:giga "https://julialang-s3.julialang.org/bin/linux/${JULIA_SHORT_ARCH}/${JULIA_MAJOR_MINOR}/${JULIA_INSTALLER}"

with

curl -o ${JULIA_INSTALLER} -L "https://julialang-s3.julialang.org/bin/linux/${JULIA_SHORT_ARCH}/${JULIA_MAJOR_MINOR}/${JULIA_INSTALLER}" --progress-bar

@mathbunnyru
Member Author

@bjornjorgensen I've tried to run it manually:

runner-user@jupyter-runner-1:~$ curl --output julia.tgz https://julialang-s3.julialang.org/bin/linux/x64/1.9/julia-1.9.2-linux-x86_64.tar.gz --progress-bar
##O=- #     #

I ran it with verbose as well:
logs.txt

@bjornjorgensen
Contributor

@mathbunnyru
Member Author

Thank you so much @bjornjorgensen!
Could you please take a look here?
Will curl with -L fix the Spark download as well?
https://github.com/jupyter/docker-stacks/actions/runs/5674349686/job/15378214762#step:7:289

@bjornjorgensen
Contributor

"The -L flag tells curl to follow any redirects, so it will continue to the new location provided by the server. This is an important flag to include when downloading files, because the URL you are trying to download from might be a redirect to the actual file's URL."

I hope so.. :)
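
Put together with the Julia URL from earlier in this thread, the curl call with -L might look like the sketch below. The variable values are the ones from the manual run above (x64, 1.9, julia-1.9.2-linux-x86_64.tar.gz); the function names `julia_url` and `download_julia` are hypothetical and only there to keep the example self-contained.

```shell
#!/usr/bin/env bash
# Values copied from the manual curl run in this thread; in the Dockerfile
# they are set at build time.
JULIA_SHORT_ARCH="x64"
JULIA_MAJOR_MINOR="1.9"
JULIA_INSTALLER="julia-1.9.2-linux-x86_64.tar.gz"

# Hypothetical helper: print the download URL built from the variables above.
julia_url() {
    echo "https://julialang-s3.julialang.org/bin/linux/${JULIA_SHORT_ARCH}/${JULIA_MAJOR_MINOR}/${JULIA_INSTALLER}"
}

# Hypothetical helper: fetch the installer with curl.
# -L follows redirects, -o names the output file, --progress-bar shows progress.
download_julia() {
    curl -L -o "${JULIA_INSTALLER}" --progress-bar "$(julia_url)"
}

# Example usage (requires network access, so it is left commented out):
#   download_julia
```

`julia_url` only prints the URL; nothing is downloaded until `download_julia` is called.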

@bjornjorgensen
Contributor

it works.. if you update the title for this PR I will approve it 👍 and Thank you :)

@mathbunnyru mathbunnyru changed the title Try to fix wget for spark and Julia Use curl instead of wget for Spark and Julia downloads Aug 2, 2023
@mathbunnyru mathbunnyru merged commit e1bd309 into jupyter:main Aug 3, 2023
54 checks passed
@mathbunnyru
Member Author

Thank you @bjornjorgensen.
I added you as a co-author for the squashed commit.


2 participants