Fix missing file #136

Merged
merged 3 commits into from
Sep 20, 2022

@simleo simleo commented Sep 19, 2022

Fixes #135.

Now, when write is called on a crate containing at least one file or dataset whose source is missing, an OSError is raised (in the example below, a FileNotFoundError, which is a subclass of OSError). Note that the use case mentioned in #73 is still covered: a crate with a missing data entity can still be loaded, but it cannot be written back to disk until it is fixed. To "fix" the crate, one can set the source of the offending data entity to:

  1. an existing file or directory

  2. None: in this case, when the crate is written:
    1. If the data entity is a File, the actual file will still be missing in the serialized crate, and a warning will be issued
    2. If the data entity is a Dataset, an empty directory will be created for it
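
The rules above can be summarized in a small standalone sketch. This is not the ro-crate-py implementation: `write_entity` and its parameters are hypothetical names used only to illustrate the decision logic (missing source raises, None source warns for a File and creates an empty directory for a Dataset).

```python
import shutil
import tempfile
import warnings
from pathlib import Path


def write_entity(source, dest, is_dataset):
    """Hypothetical helper mirroring the rules above; not ro-crate-py code."""
    dest = Path(dest)
    if source is None:
        if is_dataset:
            # Dataset without a source: create an empty directory
            dest.mkdir(parents=True, exist_ok=True)
        else:
            # File without a source: nothing is written, only a warning
            warnings.warn(f"No source for {dest.name}")
        return
    source = Path(source)
    if not source.exists():
        # Missing source: FileNotFoundError is a subclass of OSError
        raise FileNotFoundError(f"Source not found: {source}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    if is_dataset:
        shutil.copytree(source, dest, dirs_exist_ok=True)
    else:
        shutil.copy(source, dest)


out = Path(tempfile.mkdtemp())
write_entity(None, out / "empty_dir", is_dataset=True)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    write_entity(None, out / "missing.txt", is_dataset=False)
```

In this sketch, catching OSError around a call with a nonexistent source also catches the FileNotFoundError, which matches the way the two exception names are used in this description.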

When creating a new RO-Crate, it's now possible to pass None as the source argument of File and Dataset. For a File, this creates a crate with a missing file, which is treated as explained above; for a Dataset, it adds an empty directory to the new crate.

Example:

import uuid
from pathlib import Path
from rocrate.rocrate import ROCrate

crate = ROCrate()
file_path = Path("/tmp") / uuid.uuid4().hex
assert not file_path.exists()
crate.add_file(file_path)
crate.write("/tmp/crate")  # Fails with FileNotFoundError
file_entity = crate.dereference(file_path.name)
file_entity.source = None
crate.write("/tmp/crate")  # UserWarning: No source for 13af7607702f487faa712425968cf61b

# Directly add a file entity with missing source
file_path_2 = Path("/tmp") / uuid.uuid4().hex
assert not file_path_2.exists()
crate.add_file(None, file_path_2.name)
crate.write("/tmp/crate_2")  # UserWarning: No source for e645714d126446e4bf7303d7615406a5

# Read a crate containing an entity with missing source
read_crate = ROCrate("/tmp/crate")  # No error when loading the crate
read_crate.write("/tmp/crate_copy")  # Fails with FileNotFoundError
crate_file_path = Path("/tmp/crate") / file_path.name
crate_file_path.touch()
read_crate.write("/tmp/crate_copy")  # OK, source now exists

See the unit tests diff for detailed checks on all use cases.

Finally, note that another option to create a data entity with missing source is to use a file: URI as the id.

@simleo simleo merged commit 1a45d86 into ResearchObject:master Sep 20, 2022
@simleo simleo deleted the fix_missing_file branch September 20, 2022 08:56
Successfully merging this pull request may close these issues.

Fix behavior wrt "missing" data entities