IO-507: Fixing imports with spaces #501
darwin/utils.py
@@ -210,13 +280,15 @@ def find_files(
    for f in files:
        path = Path(f)
        if path.is_dir():
            found_files.extend([f for f in path.glob(pattern) if is_extension_allowed("".join(f.suffixes))])
        elif is_extension_allowed("".join(path.suffixes)):
            found_files.extend([f for f in path.glob(pattern) if is_extension_allowed(str(path))])
These changes were made only by the formatter.
Just to give visibility: this line, "".join(path.suffixes), is there to handle extensions that contain two dots, e.g. .nii.gz. I'm not sure this new logic works, as is_extension_allowed expects only the extension to be passed to it, not the entire path.
Ah, I see the new is_extension_allowed_by_filename, but maybe that should be used here instead of is_extension_allowed?
Yeah, you're absolutely right; not sure how I missed that!
Well, actually I do know: I unit tested it, so I only tested the functions individually!
            found_files.append(path)
        else:
            raise UnsupportedFileType(path)

    return [f for f in found_files if f not in map(Path, files_to_exclude)]
    files_to_exclude_full_paths = [str(Path(f)) for f in files_to_exclude]
Split into two lines for readability, because applying str inline, as in

[f for f in found_files if str(f) not in map(str, map(Path, files_to_exclude))]

is not very readable, and import this (the Zen of Python: "Readability counts") tells us it should be.

Comparing by string, because comparing PosixPath, WindowsPath, or more generally PurePath objects directly is quite unreliable. Iteratively comparing path.parents and then path.name works, and casting to str gives ...parents/name as a single string, which is more efficient to compare.

Comparing parents and then name costs O(n·(p+1)), where n is the number of files and p is the number of parents. Comparing the string output is, at least on the surface, a single comparison per file, i.e. O(n). Under the hood that depends on the cost of the str conversion, but we should at least assume pathlib is reasonably optimised.
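A minimal, self-contained sketch of the string-based exclusion. The wrapper name exclude_files is hypothetical, and the final filtering line is an assumption about how files_to_exclude_full_paths is used, since the diff only shows the line that builds it. Passing each exclusion through Path first normalises it (for example, a leading ./ or a doubled slash is collapsed), so the string comparison lines up with the paths produced by glob.

```python
from pathlib import Path
from typing import List, Sequence, Union


def exclude_files(found_files: Sequence[Path], files_to_exclude: Sequence[Union[str, Path]]) -> List[Path]:
    # Hypothetical helper. Path() collapses spurious "./" and "//" components,
    # so "./data/b.png" and "data/b.png" become the same string on POSIX.
    files_to_exclude_full_paths = [str(Path(f)) for f in files_to_exclude]
    # Compare plain strings rather than Path objects.
    return [f for f in found_files if str(f) not in files_to_exclude_full_paths]


found = [Path("data/a.png"), Path("data/b.png")]
print(exclude_files(found, ["./data/b.png"]))  # [PosixPath('data/a.png')] on POSIX
```

Because files_to_exclude_full_paths is a list, each "not in" check scans it linearly; building a set instead would make each lookup constant time on average if the exclusion list grows large.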
class TestIsExtensionAllowedByFilenameFunctions(FindFileTestCase):
    @dataclass
    class Dependencies:
This dependency injection approach is used because pytest already has is_allowed_by_filename in memory from the import in the file header, and injecting it guarantees a new reference for each test, since pytest runs 4 in parallel by default.
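A minimal sketch of the dependency injection pattern, written with unittest and MagicMock so it runs on its own. The class name, the is_allowed field, and the make_checker factory are hypothetical, not the PR's actual test code: each test builds its own Dependencies instance, so it receives a fresh function (or mock) instead of sharing the one imported at the top of the file.

```python
import unittest
from dataclasses import dataclass
from typing import Callable, Sequence
from unittest.mock import MagicMock


def make_checker(allowed: Sequence[str]) -> Callable[[str], bool]:
    # Hypothetical factory: returns a new filename checker on every call.
    def is_allowed(filename: str) -> bool:
        return filename.endswith(tuple(allowed))

    return is_allowed


class TestIsAllowedByFilenameSketch(unittest.TestCase):
    @dataclass
    class Dependencies:
        # The function under test is injected per test instead of being
        # looked up from a module-level import.
        is_allowed: Callable[[str], bool]

    def test_accepts_multi_dot_extension(self) -> None:
        deps = self.Dependencies(is_allowed=make_checker([".nii.gz", ".png"]))
        self.assertTrue(deps.is_allowed("my scan.nii.gz"))
        self.assertTrue(deps.is_allowed("filename.1.png"))

    def test_injected_mock_is_called(self) -> None:
        mock_checker = MagicMock(return_value=False)
        deps = self.Dependencies(is_allowed=mock_checker)
        self.assertFalse(deps.is_allowed("filename.1.png"))
        mock_checker.assert_called_once_with("filename.1.png")
```

Swapping in a MagicMock through the same field lets a test assert exactly how the dependency was called, without patching the module.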
@@ -1,4 +1,5 @@
from unittest.mock import MagicMock, patch
The formatter made changes to a file I saved inadvertently.
…x format in unrelated code
IO-507 BUG: "Error: No files found" when importing folders that contain files with spaces in the filename
BUG: submission from @Rafal Zadlowski
pathlib.Path.suffixes was not working well: we were joining all suffixes into a single string so that extensions like .nii.gz would work, and this stopped names like ...filename.1.png from working, as the extension is parsed as .1.png, which is not in the valid list (.png is).

Changes:
- is_extension_allowed, is_image_extension_allowed, and is_video_extension_allowed: these accept only the extension.
- New is_extension_allowed_by_filename and the corresponding image and video equivalents: these accept the entire filename and check that it ends with a valid extension.
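A minimal sketch of the by-filename checks described above, assuming hypothetical SUPPORTED_IMAGE_EXTENSIONS and SUPPORTED_VIDEO_EXTENSIONS lists (darwin's real lists and function bodies may differ). Because the check uses str.endswith rather than parsing Path.suffixes, extra dots or spaces earlier in the name no longer affect the result.

```python
from typing import Sequence

# Hypothetical allow-lists for illustration; darwin's actual lists may differ.
SUPPORTED_IMAGE_EXTENSIONS = [".png", ".jpeg", ".jpg", ".nii", ".nii.gz"]
SUPPORTED_VIDEO_EXTENSIONS = [".mp4", ".avi"]


def _ends_with_allowed(filename: str, extensions: Sequence[str]) -> bool:
    # str.endswith accepts a tuple and returns True if any entry matches.
    return filename.endswith(tuple(extensions))


def is_image_extension_allowed_by_filename(filename: str) -> bool:
    return _ends_with_allowed(filename, SUPPORTED_IMAGE_EXTENSIONS)


def is_video_extension_allowed_by_filename(filename: str) -> bool:
    return _ends_with_allowed(filename, SUPPORTED_VIDEO_EXTENSIONS)


def is_extension_allowed_by_filename(filename: str) -> bool:
    return is_image_extension_allowed_by_filename(filename) or is_video_extension_allowed_by_filename(filename)


print(is_extension_allowed_by_filename("filename.1.png"))  # True
print(is_extension_allowed_by_filename("my scan.nii.gz"))  # True
print(is_extension_allowed_by_filename("notes.txt"))       # False
```

This sketch is case-sensitive, so "IMG.PNG" would be rejected; whether the real functions normalise case is not stated in the PR.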