Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PLA-699][external] Allow registration of multi-slotted read-write files from external storage #790

Merged
merged 8 commits into from
Mar 20, 2024

Conversation

JBWilkie
Copy link
Collaborator

Problem

Currently, the REST API is the only way to register externally stored items. The ability to register items using darwin-py is simpler in some scenarios and can prevent difficult-to-spot payload errors

Solution

Added user-facing methods to register multi-slotted read-write items from external storage

Changelog

Added ability to register multi-slotted read-write files from external storage in darwin-py

@JBWilkie JBWilkie requested a review from saurbhc March 15, 2024 15:25
@@ -189,7 +189,7 @@ def get_remote_dataset(
parsed_dataset_identifier.team_slug = self.default_team

try:
matching_datasets: List[RemoteDataset] = [
matching_datasets: List[RemoteDatasetV2] = [
Copy link
Collaborator Author

@JBWilkie JBWilkie Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These type changes aren't functionally significant, but since we no longer have RemoteDatasetV1 they make some typechecker errors less likely

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still some places (docstrings/types) where we use RemoteDataset - should we change all of them 🤔

Copy link
Collaborator Author

@JBWilkie JBWilkie Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, although I don't think it's a high priority change. I made the change here because I noticed Pylance type checking complaining, but didn't comb the rest of the repo

If you agree, I'd like to do that at some point down the line

Comment on lines 694 to 700
if not isinstance(storage_keys, dict) or not all(
isinstance(v, list) and all(isinstance(i, str) for i in v)
for v in storage_keys.values()
):
raise ValueError(
"storage_keys must be a dictionary with keys as item names and values as lists of storage keys."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pydantic models could be very efficient here! - but i guess we only use pydantic in darwin-py 2.0 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good point. We do, but I don't see a problem with using models here

We could do something like:

class StorageKeysModel(BaseModel):
    storage_keys: Dict[str, List[str]] = Field(..., description="A dictionary with keys as item names and values as lists of storage keys.")

and then:

try:
    validated_input = StorageKeysModel(storage_keys=storage_keys)
except ValidationError as e:
    raise ValueError(e)

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, that'd work! 🙏

Comment on lines 733 to 737
# Do not register more than 500 items in a single request
chunk_size = 500
chunked_items = (
items[i : i + chunk_size] for i in range(0, len(items), chunk_size)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've seen the chunking/batching function before somewhere - we can create a new fn and re-use it?

Copy link
Member

@saurbhc saurbhc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 🙏

@JBWilkie JBWilkie merged commit 3c4263c into master Mar 20, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants