[DAR-2707][External] Allow repeated polling of pending export releases #876
I do not detect any problem codewise. Haven't tested it tho. 👍🏼
darwin/dataset/remote_dataset.py
Outdated
"""
Get a specific ``Release`` for this ``RemoteDataset``.

Parameters
----------
name : str, default: "latest"
    Name of the export.
retry : bool, default: True
The name of the argument retry doesn't match what it does ("return all releases, if true"). It should be something like include_pending, or similar, that is a bit more self-explanatory.
darwin/dataset/remote_dataset_v2.py
Outdated
    )
else:
    return sorted(
        filter(lambda x: x.available, releases),
I'm sure this can be made nicer: choose only the first argument (releases or filter(lambda x: x.available, releases)) with an if, and have a single return line:
return sorted(
    releases_fn,
    key=lambda x: x.version,
    reverse=True,
)
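A minimal sketch of that suggestion, for illustration only. The Release stand-in, the function name sort_releases, the flag name include_unavailable, and the integer version field are all assumptions made for this example and are not taken from darwin-py:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Release:
    # Hypothetical stand-in with only the fields this example needs.
    name: str
    version: int
    available: bool


def sort_releases(releases: Iterable[Release], include_unavailable: bool) -> List[Release]:
    # Pick the input once with a conditional expression...
    candidates = releases if include_unavailable else filter(lambda x: x.available, releases)
    # ...so that only one return statement is needed.
    return sorted(candidates, key=lambda x: x.version, reverse=True)
```

Both branches now share the same sort key and ordering, so a later change to the sorting logic only has to be made in one place.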
darwin/cli_functions.py
Outdated
"""
version: str = DatasetIdentifier.parse(dataset_slug).version or "latest"
if version == "latest" and retry:
A few questions here:
- I don't recall the details, but is the name latest hardcoded by us?
- What happens if a client deliberately passes the name latest with retry=True?
- I don't think this restriction is necessary. Can't we pick the name of the latest release ourselves before performing the download, and then do the retry logic using that name instead of latest? This would ensure we refer to the same export even if a new export is created in the meantime.
> I don't recall the details but is the name latest hardcoded by us?
Yes, latest is a reserved release name. If you try to create an export named latest, the API responds with {"errors":{"name":["is reserved"]}}
> What happens if a client deliberately passes the name latest with retry=True?
We will return the latest available release.
> I don't think this restriction is necessary, can't we pick the name of the latest release ourselves before performing the download and then do the retry logic using it instead of latest. This would ensure we refer to the same export even if a new export would be created in the meantime
Actually yes, I think we can, because each release has an export_date of type datetime.datetime. This allows us to select the most recent release in case retry is passed as True. I'll make this change now, thank you for flagging.
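As a rough sketch of the approach agreed on here, the most recent release can be resolved to its concrete name before any retry logic runs. The Release stand-in and the helper name resolve_latest_name are hypothetical; only the export_date field of type datetime.datetime comes from the discussion:

```python
import datetime
from dataclasses import dataclass
from typing import List


@dataclass
class Release:
    # Hypothetical stand-in; the real class has more fields.
    name: str
    export_date: datetime.datetime


def resolve_latest_name(releases: List[Release]) -> str:
    # Pin "latest" to a concrete release name up front, so later polling
    # keeps referring to the same export even if a newer one is created.
    if not releases:
        raise ValueError("No releases found for this dataset")
    return max(releases, key=lambda r: r.export_date).name
```

The retry loop would then poll using the returned name rather than the reserved name latest.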
darwin/dataset/release.py
Outdated
@@ -22,6 +23,8 @@ class Release:
    The version of the ``Release``.
name : str
    The name of the ``Release``.
status : str
Unsure if it's common in darwin-py, but this would be better as an enum, as it only has a select few values.
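One way this could look, as a sketch rather than the actual darwin-py definition. The class name ReleaseStatus is an assumption, and only "pending" is known from the diff under review; the second member is a placeholder, and the real set of export statuses would need to be taken from the API:

```python
from enum import Enum


class ReleaseStatus(Enum):
    PENDING = "pending"    # value seen in the diff under review
    COMPLETE = "complete"  # hypothetical placeholder, not confirmed by the API


def parse_status(raw: str) -> ReleaseStatus:
    # Enum lookup by value raises ValueError for strings that are not
    # valid statuses, which a plain str field would silently accept.
    return ReleaseStatus(raw)
```

With an enum, a stub such as "test_status" fails immediately instead of flowing through the code as an arbitrary string.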
tests/darwin/dataset/release_test.py
Outdated
@@ -16,6 +16,7 @@ def release(dataset_slug: str, team_slug_darwin_json_v2: str) -> Release:
    team_slug=team_slug_darwin_json_v2,
    version="latest",
    name="test",
    status="test_status",
For documentation purposes, it'd be best to use actual values of export statuses here instead of stubs as they are enums and not arbitrary strings.
darwin/dataset/remote_dataset.py
Outdated
if release.status == "pending":
    if retry:
        retry_duration = 300
        retry_interval = 10
It would be more conventional to have these configurable via CLI or some SDK settings
@balysv This makes sense. I can see two options, both of which involve building in some validation:
- 1: Make these values configurable in the ~/.config.yaml file, or:
- 2: Add two additional arguments, retry_duration and retry_interval, with default values of ~10 minutes and ~10 seconds. These can be configured, but if they're passed without retry=True then we will throw an error.
I'm leaning toward the additional arguments.
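A possible shape for the validation in option 2, sketched under assumptions: the helper name resolve_retry_settings, the use of None as an "argument not passed" sentinel, and the specific checks (positive values, interval no longer than duration) are illustrative choices, not the merged implementation:

```python
from typing import Optional, Tuple

DEFAULT_RETRY_DURATION = 600  # seconds, the ~10 minutes mentioned above
DEFAULT_RETRY_INTERVAL = 10   # seconds


def resolve_retry_settings(
    retry: bool,
    retry_duration: Optional[int] = None,
    retry_interval: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    # None means "not passed", which lets us detect timing arguments
    # supplied without retry=True.
    if not retry:
        if retry_duration is not None or retry_interval is not None:
            raise ValueError("retry_duration and retry_interval require retry=True")
        return None
    duration = DEFAULT_RETRY_DURATION if retry_duration is None else retry_duration
    interval = DEFAULT_RETRY_INTERVAL if retry_interval is None else retry_interval
    if duration <= 0 or interval <= 0:
        raise ValueError("retry_duration and retry_interval must be positive")
    if interval > duration:
        raise ValueError("retry_interval must not exceed retry_duration")
    return duration, interval
```

Using None defaults rather than numeric defaults in the signature is what makes the "passed without retry=True" check possible at all.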
Problem
Before a dataset release can be pulled, it needs to finish generating. The time taken for this can vary based on export size and current load on the export pipeline. If a release isn't ready for pulling, then darwin-py will throw an error.
Solution
Introduce the optional retry parameter (SDK & CLI) that allows polling of pending dataset releases. If the pending release becomes available within the allotted time, it will be automatically downloaded.
Changelog
Allow optional polling of pending dataset releases in case the release is not yet ready for download
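The polling behaviour described in the Solution can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the function name wait_until_available, the get_status callback, and the injectable sleep parameter are all hypothetical, while the "pending" status and the 300 and 10 second values come from the diff under review:

```python
import time
from typing import Callable


def wait_until_available(
    get_status: Callable[[], str],
    retry_duration: float = 300,
    retry_interval: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    # Poll until the release leaves the "pending" state or the time
    # budget is used up. Returns True if it left "pending" in time.
    waited = 0.0
    while get_status() == "pending":
        if waited >= retry_duration:
            return False
        sleep(retry_interval)
        waited += retry_interval
    # Any non-pending status ends the wait; a caller would still need
    # to check whether the release is actually downloadable.
    return True
```

Passing sleep as a parameter keeps the loop testable without real delays; a caller would trigger the download only when the function returns True.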