enhance download_and_extract #8216
base: dev
Hi @Jerome-Hsieh, if you're having issues putting this PR together, we can discuss how to resolve them, but please do not close it and then open a new PR. Thanks for the contribution!
monai/apps/utils.py (outdated)
    with tempfile.TemporaryDirectory() as tmp_dir:
        filename = filepath or Path(tmp_dir, _basename(url)).resolve()
        if filepath:
I don't see the purpose of this change; it looks equivalent to the existing code to me.
The point of this change is to make sure that if the user doesn't set filepath, the process still works by falling back to the default path.
monai/apps/utils.py (outdated)
    if filepath:
        FilepathExtenstion = ''.join(Path(".", _basename(filepath)).resolve().suffixes)
        if urlFilenameExtension != FilepathExtenstion:
            raise NotImplementedError(
In my opinion, the real issue here is that if the name of the downloaded file could be derived directly from filepath, download_and_extract would work as expected instead of only raising an error at this point.
I see, but my perspective is that the filepath error occurs when the program tries to write the downloaded file to an invalid filepath.
E.g., filepath='./test' is invalid, while filepath='./test.tar.gz' is valid.
So I would like to validate the filepath at the very beginning to ensure it is valid.
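A minimal sketch of the up-front check described above, assuming the expected extension is taken from the last segment of the URL path. The helper names `extensions_match` and `validate_filepath` are hypothetical, and `ValueError` stands in for the `NotImplementedError` used in the diff; this is not the code in the PR.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def extensions_match(url: str, filepath: str) -> bool:
    """True if filepath ends with the same (possibly multi-part) extension as the URL's file name."""
    # Take suffixes from the URL path only, so query strings do not leak into the extension.
    url_ext = "".join(PurePosixPath(urlparse(url).path).suffixes)
    file_ext = "".join(Path(filepath).suffixes)
    return url_ext == file_ext


def validate_filepath(url: str, filepath: str) -> None:
    """Fail fast, before anything is downloaded, if filepath does not match the URL's file type."""
    if not extensions_match(url, filepath):
        raise ValueError(f"filepath={filepath!r} should end with the extension of the file at {url!r}")
```

With the MedNIST URL, `validate_filepath(url, "./test")` raises while `"./test.tar.gz"` passes. Note that `suffixes` treats every dot in the file name as a separator, so a name like `model_v0.8.pt` yields `['.8', '.pt']`; a real check may need to be more lenient.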
I would warn instead of forcing extensions. Forcing things like this when they aren't strictly necessary takes control away from people.
But I have a question: is it necessary for the user to specify the file extension when setting the filepath for the downloaded file?
If the user gives a path without an extension, they'll get a file with no extension, and they would have to account for that and provide one. You could add an extension if none is given (and warn that this happened), but if the user has given a different extension than the expected one, this should be a warning, not an error. If you add an extension, you should also be careful not to overwrite an existing file. I realise this is getting rather complicated now, sorry!
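Following the suggestion to warn rather than raise, a sketch of how the save path could be resolved: append the expected extension when none is given (with a warning), refuse to overwrite an existing file, and only warn when the user chose a different extension. `resolve_download_path` is a hypothetical name, and raising `FileExistsError` in the overwrite case is one possible choice, not something this thread settled on.

```python
import warnings
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def resolve_download_path(url: str, filepath: str) -> Path:
    """Hypothetical helper: decide where to save `url`, warning instead of raising on extension issues."""
    expected = "".join(PurePosixPath(urlparse(url).path).suffixes)  # e.g. ".tar.gz"
    path = Path(filepath)
    if path.is_dir():
        # Directories need a file name first; that case is out of scope for this sketch.
        raise IsADirectoryError(f"{filepath} is a directory, expected a file path")
    given = "".join(path.suffixes)
    if not given and expected:
        # No extension given: append the expected one, but never clobber an existing file.
        candidate = path.with_name(path.name + expected)
        if candidate.exists():
            raise FileExistsError(f"{candidate} already exists; refusing to overwrite it")
        warnings.warn(f"filepath={filepath} has no extension, saving to {candidate} instead")
        return candidate
    if given != expected:
        # A different extension is the user's choice: respect it, but point out the mismatch.
        warnings.warn(f"filepath={filepath} ends with {given!r}, but the URL suggests {expected!r}")
    return path
```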
Signed-off-by: jerome_Hsieh <jerome910810@gmail.com>
0511db5 to 0441871
@ericspod @KumoLiu
Hi @Jerome-Hsieh, thank you for the quick update. I tested your latest changes, but neither "." nor "./test" as the file path seems to work. Is this the expected behavior? In my opinion, both should work. The logic could be adjusted so that if the file path is a directory and not empty, it automatically captures the name and downloads the file into the specified directory. What do you think?
Hi @KumoLiu, my test results:
>>> url = "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/MedNIST.tar.gz"
>>> monai.apps.utils.download_and_extract(url, filepath=".")
2024-12-06 18:06:56,333 - INFO - Expected md5 is None, skip md5 check for file ..
2024-12-06 18:06:56,334 - INFO - File exists: ., skipped downloading.
2024-12-06 18:06:56,334 - INFO - Non-empty folder exists in ., skipped extracting.
and
>>> monai.apps.utils.download_and_extract(url, filepath="./test")
2024-12-06 18:08:41,246 - WARNING - filepath=./test, which missing file extension. Auto-appending extension to: ./test.tar.gz
test.tar.gz: 59.0MB [00:10, 6.07MB/s]
2024-12-06 18:08:51,443 - INFO - Downloaded: test.tar.gz
2024-12-06 18:08:51,443 - INFO - Expected md5 is None, skip md5 check for file test.tar.gz.
2024-12-06 18:08:51,443 - INFO - Writing into directory: ..
If the results look like the above, that is the expected behavior. Your opinion sounds like
When downloading a file from a URL, we can sometimes infer its name from the Content-Disposition header. Here is one example snippet: https://github.com/Project-MONAI/VLM/blob/7be688aa457a1806f908eb758f2f3ee816fea017/m3/demo/experts/utils.py#L135 @KumoLiu
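A stdlib-only sketch of reading the file name from a Content-Disposition header value, assuming the header has already been fetched (for example via `urlopen(url).headers.get("Content-Disposition")`). It parses with `email.message.Message.get_filename()` rather than reproducing the linked snippet, and `filename_from_content_disposition` is a hypothetical helper name. Only the final path component is kept, so a server-supplied name cannot redirect the write outside the target directory.

```python
from email.message import Message
from pathlib import PurePath
from typing import Optional


def filename_from_content_disposition(header: Optional[str]) -> Optional[str]:
    """Return the file name suggested by a Content-Disposition header value, or None."""
    if not header:
        return None
    msg = Message()
    msg["Content-Disposition"] = header
    # get_filename() handles quoting and RFC 2231 encoded "filename*=" parameters.
    name = msg.get_filename()
    if not name:
        return None
    # Keep only the last path component so "../../x" cannot escape the target directory.
    return PurePath(name).name or None
```

When the function returns `None`, the caller still needs a fallback such as the URL basename.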
Thanks @mingxin-zheng for the pointer on how to find the file name.
Hi @KumoLiu, in the newest commit I changed the logic: if the file path is a directory and not empty, it automatically takes the file name from the Content-Disposition header; if that header doesn't provide a file name, it uses the URL basename instead. The file is then downloaded into the specified directory and a warning is emitted.
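The directory-handling logic described above could look roughly like this sketch. `target_path` is a hypothetical helper, `disposition_name` is assumed to come from the Content-Disposition header (or be `None` when the server sends none), and the URL fallback takes the last segment of the URL path with the query string dropped, which may differ from what the `_basename(url)` helper in `monai/apps/utils.py` does.

```python
import warnings
from pathlib import Path, PurePosixPath
from typing import Optional
from urllib.parse import unquote, urlparse


def target_path(url: str, filepath: str, disposition_name: Optional[str] = None) -> Path:
    """Hypothetical helper: map a directory `filepath` to a concrete file path inside it."""
    path = Path(filepath)
    if not path.is_dir():
        return path  # an explicit file path is used as given
    # Prefer the server-suggested name (assumed already sanitized), else the last URL path segment.
    name = disposition_name or PurePosixPath(unquote(urlparse(url).path)).name
    if not name:
        raise ValueError(f"cannot infer a file name from {url!r}; pass a full file path instead")
    resolved = path / name
    warnings.warn(f"filepath={filepath} is a directory, downloading to {resolved}")
    return resolved
```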
Hi @Jerome-Hsieh, thank you for the update. After our discussion in the development meeting, I realized that there is an
@@ -27,6 +28,8 @@
    from urllib.parse import urlparse
    from urllib.request import urlopen, urlretrieve

    import requests
Regarding the CI error here: https://github.com/Project-MONAI/MONAI/actions/runs/12338243392/job/34432967743?pr=8216
Please consider using optional_import here, as in line 61 at commit 21920a3:
    requests_get, has_requests = optional_import("requests", name="get")
and skip the related tests when requests is unavailable, as in MONAI/tests/test_bundle_get_data.py, line 46 at commit 21920a3:
    @SkipIfNoModule("requests")
Please also include a related test! Thanks again.
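The pattern requested above keeps `import requests` out of module scope, so importing the module never fails in environments without `requests`. To keep this sketch runnable without MONAI installed, `optional_attr` is a hypothetical stdlib stand-in that returns the same (object, availability flag) pair as the `optional_import` line quoted above, and `unittest.skipUnless` plays the role of MONAI's test-suite decorator `SkipIfNoModule`. In MONAI itself you would call `optional_import` directly.

```python
import importlib
import unittest
from typing import Any, Callable, Tuple


def optional_attr(module: str, name: str) -> Tuple[Callable[..., Any], bool]:
    """Hypothetical stand-in for optional_import: never raises at import time."""
    try:
        return getattr(importlib.import_module(module), name), True
    except (ImportError, AttributeError):
        def _missing(*args: Any, **kwargs: Any) -> Any:
            # The error surfaces only when the optional feature is actually used.
            raise ImportError(f"{module}.{name} is required for this feature; please install {module}")
        return _missing, False


# Module level: importing this file succeeds even when requests is not installed.
requests_get, has_requests = optional_attr("requests", "get")


class TestDownloadWithRequests(unittest.TestCase):
    # Stdlib equivalent of @SkipIfNoModule("requests") in MONAI's test suite.
    @unittest.skipUnless(has_requests, "requests not installed")
    def test_requests_get_is_callable(self) -> None:
        self.assertTrue(callable(requests_get))
```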
Hi @KumoLiu, I see. What do you think about the earlier version I committed, a9a0171?
Hi @Jerome-Hsieh, a9a0171 looks good. The only update needed is how to extract the file extension from the URL. For example, given a URL like "https://drive.google.com/u/1/uc?id=1KntZge40tAHgyXmHYVqZZ5d2p_4Qr2l5&export=download", you can use
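The suggested method is cut off above, so the following is only one possible approach, not necessarily the one the reviewer intended: parse the URL first and take suffixes from the path alone, so query strings never end up in the extension. For the Google Drive URL quoted above the path is `/u/1/uc`, which has no extension at all, so a fallback such as the Content-Disposition header would still be needed. `url_extension` is a hypothetical name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def url_extension(url: str) -> str:
    """Return the (possibly multi-part) extension of the URL path, ignoring query and fragment."""
    # URL paths always use "/", so PurePosixPath is used regardless of the host OS.
    return "".join(PurePosixPath(urlparse(url).path).suffixes)
```

For instance, `https://example.com/data/file.zip?token=a.b` yields `.zip`, whereas taking suffixes of the raw basename would also pick up the dot inside the query string.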
Fixes #5463
Description
According to the issue, the error messages are not very intuitive.
I propose checking whether the file name matches the downloaded file's base name before starting the download.
If it does not match, the user is notified.
Types of changes
./runtests.sh -f -u --net --coverage
./runtests.sh --quick --unittests --disttests
make html command in the docs/ folder.