-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a FSspec based ArtifactLoader #100
Conversation
src/nnbench/types.py
Outdated
self, path: str | os.PathLike[str], to_localdir: str | os.PathLike[str] | None = None | ||
) -> None: | ||
self.rpath = str(path) | ||
self.lpath = Path(to_localdir or mkdtemp()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an acquired/created resource that needs proper cleanup, e.g. through a weakref finalizer (see e.g. https://github.com/python/cpython/blob/2647afeab694517933c53028d4ad427460b9e1fa/Lib/tempfile.py#L885-L888)
src/nnbench/types.py
Outdated
Path | ||
The path to the artifact on the local filesystem. | ||
""" | ||
fs = spec.AbstractFileSystem() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filesystem should be constructed directly from the URI, see e.g. https://github.com/nicholasjng/shelf/blob/4a628c17e678524a49cb327238c3fb7924e0cb9d/src/shelf/util.py#L12.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the story after I downloaded the remote files to my machine? I mean, what do I have to use to instantiate my model from the files?
pyproject.toml
Outdated
@@ -31,7 +31,7 @@ classifiers = [ | |||
"Typing :: Typed", | |||
] | |||
|
|||
dependencies = ["tabulate"] | |||
dependencies = ["tabulate", "fsspec"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, we should make this optional (i.e. by importing fsspec
only locally in the artifact load method), and facilitate local loading with builtin methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
src/nnbench/types.py
Outdated
from typing import Any, Callable, Generic, Iterable, Iterator, Literal, TypeVar | ||
|
||
from fsspec import core, filesystem, utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please import the specific APIs from these submodules, not the whole modules themselves.
tests/test_artifacts.py
Outdated
|
||
def test_load_local_file(local_file: Path, tmp_path: Path) -> None: | ||
test_dir = tmp_path / "test_load_dir" | ||
loader = FilePathArtifactLoader(str(local_file), str(test_dir)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work without the string casts.
src/nnbench/types.py
Outdated
def __init__( | ||
self, | ||
path: str | os.PathLike[str], | ||
to_localdir: str | os.PathLike[str] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have a different name, it currently sounds like a bool. Maybe destination
/ basedir
?
src/nnbench/types.py
Outdated
fs, _, _ = core.get_fs_token_paths(self.source_path) | ||
if isinstance(fs, LocalFileSystem): | ||
return Path(self.source_path).resolve() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this caching a la "check if it exists, only download if it doesn't"? (Otherwise it's probably currently missing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More or less.
My understanding is, that this class, as a general filesystem class, should also be able to use local filepaths.
In that case, we don't want to download the data.
If it is a remote path, we download and put it either (a) a specified local dir, or (b) a tmp dir.
If a) the files are persisted on disk and the filepath can be used (by the user) in other places.
If b) the cleanup task deletes the tmp dir and the model to not clutter the filesystem.
That is how it is currently implemented. Do you agree with this?
9c53388
to
ce7f0dd
Compare
.github/workflows/lint-and-test.yaml
Outdated
@@ -58,6 +58,7 @@ jobs: | |||
- name: Run tests on oldest supported Python | |||
run: | | |||
python -m pip install ".[dev]" | |||
pip install fsspec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, just add it to the dev extra.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this line, then.
src/nnbench/types.py
Outdated
raise ImportError( | ||
"The FilePathArtifactLoader depends on fsspec. Do you have it installed?" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not helpful, you should instead give an install instruction:
raise ImportError( | |
"The FilePathArtifactLoader depends on fsspec. Do you have it installed?" | |
) | |
raise ImportError( | |
"class {self.__class__.__name__} requires `fsspec` to be installed. You can install it by running `python -m pip install --upgrade fsspec`" | |
) |
src/nnbench/types.py
Outdated
Path | ||
The path to the artifact on the local filesystem. | ||
""" | ||
fs, _, _ = self._get_fs_token_paths(self.source_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're ignoring the other two outputs, you could also use fs = fsspec.filesystem(self.source_path, **self.storage_options)
(or with the options in another set of parentheses, not sure about the calling convention).
src/nnbench/types.py
Outdated
""" | ||
fs, _, _ = self._get_fs_token_paths(self.source_path) | ||
if isinstance(fs, self._LocalFileSystem): | ||
return Path(self.source_path).resolve() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to copy files into your target folder even if they are local? Seems like you could just omit the branch here.
src/nnbench/types.py
Outdated
def _get_protocol(self, path: str) -> str: | ||
"""Lazy loading of fsspec.utils.get_protocol""" | ||
try: | ||
from fsspec.utils import get_protocol | ||
|
||
return get_protocol(path) | ||
except ImportError: | ||
raise ImportError( | ||
"class {self.__class__.__name__} requires `fsspec` to be installed. You can install it by running `python -m pip install --upgrade fsspec`" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things here:
- The single use in
load()
below does not warrant extracting these into their own APIs - please just import the fsspec versions both inline inload()
, and scrap the private class methods. - You can factor out the import error with a simple import guard at the top of the file, e.g. like so:
# this towards the top, after the imports.
try:
import fsspec
HAS_FSSPEC = True
except ImportError:
HAS_FSSPEC = False
...
# later, in FilePathArtifactLoader.__init__():
...
if not HAS_FSSPEC:
raise ModuleNotFoundError("class blah requires `fsspec`...")
.github/workflows/lint-and-test.yaml
Outdated
@@ -58,6 +58,7 @@ jobs: | |||
- name: Run tests on oldest supported Python | |||
run: | | |||
python -m pip install ".[dev]" | |||
pip install fsspec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this line, then.
src/nnbench/types.py
Outdated
if TYPE_CHECKING: | ||
import fsspec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this mypy complaining? You could also add fsspec
to the list of modules for which to ignore missing imports (in the pyproject.toml) if you like, since most of its interfaces are not great typing-wise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, was mypy. Changed to ignore in the .toml
.
Closes #94
Questions
fsspec
become a first class dependecy such as tabulate? Or do we import it conditionally in theFilepathArtifactLoader
?