Resolve all asset types being evaluated during upload validation #2062

Open
wants to merge 5 commits into
base: master

Conversation

aaronkanzer
Member

Moving PR #1818 here (cc @kabilar) so that CI has access to env vars.

Also relates to /linc-archive/pull/34 and #1814

During validation of Zarr-based datasets, a validation check runs on the assetsSummary field in the dandiset.yaml.

Prior to this PR, the only "assets" evaluated were blobs. The query updated in this PR expands the size check to cover not just blob but also the other asset types on the Asset model (embargoed_blob and zarr) -- see the fields below:

blob = models.ForeignKey(
    AssetBlob, related_name='assets', on_delete=models.CASCADE, null=True, blank=True
)
embargoed_blob = models.ForeignKey(
    EmbargoedAssetBlob, related_name='assets', on_delete=models.CASCADE, null=True, blank=True
)
zarr = models.ForeignKey(
    'zarr.ZarrArchive', related_name='assets', on_delete=models.CASCADE, null=True, blank=True
)

Prior to the fix in this PR, the following validation would fail on assets in dandischema such as zarr: https://github.com/dandi/dandi-schema/blob/bdb7dd681435f1092814041eb01eb4fabe87bf57/dandischema/models.py#L1497

Member

@waxlamp waxlamp left a comment


I think we need a bit more explanation of the actual bug, and why this fixes it. We need to be careful with Zarr statuses because those have a complex relationship with Assets that results in (perhaps overly) complex semantics.

@@ -90,7 +92,9 @@ def version_aggregate_assets_summary(version: Version) -> None:

     assets_summary = aggregate_assets_summary(
         asset.full_metadata
-        for asset in version.assets.filter(status=Asset.Status.VALID)
+        for asset in version.assets.filter(
+            Q(status=Asset.Status.VALID) | Q(zarr__status=ZarrArchiveStatus.UPLOADED)
Member


What was the problem that led to this particular change? We need @jjnesbitt to confirm, but I don't think we should be counting Zarr archives in the UPLOADED state (those have not yet been "finalized" in the terminology of the Zarr API; once they are, I believe they move into the COMPLETE state).

Member Author


Good catch -- updated to COMPLETE here and in the unit test.

The expansion of the filter query here is to match similar logic in the validate_pending_asset_metadata Celery task -- specific code reference here
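To make the intended semantics concrete, here is a minimal plain-Python sketch of the predicate that the expanded filter expresses once UPLOADED is swapped for COMPLETE. The classes below (SummaryAsset, AssetStatus, ZarrStatus) are hypothetical stand-ins for illustration only, not the real Django models or enums, and the real code runs this as a database query rather than in Python.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AssetStatus(Enum):  # hypothetical stand-in for Asset.Status
    PENDING = "Pending"
    VALID = "Valid"
    INVALID = "Invalid"


class ZarrStatus(Enum):  # hypothetical stand-in for ZarrArchiveStatus
    PENDING = "Pending"
    UPLOADED = "Uploaded"
    COMPLETE = "Complete"


@dataclass
class SummaryAsset:  # hypothetical stand-in for the Asset model
    path: str
    status: AssetStatus
    zarr_status: Optional[ZarrStatus] = None  # None means the asset is not zarr-backed


def included_in_summary(asset: SummaryAsset) -> bool:
    # Mirrors Q(status=VALID) | Q(zarr__status=COMPLETE):
    # an asset counts if it is valid, or if its zarr archive has been finalized.
    return asset.status is AssetStatus.VALID or asset.zarr_status is ZarrStatus.COMPLETE


assets = [
    SummaryAsset("a.nwb", AssetStatus.VALID),
    SummaryAsset("b.zarr", AssetStatus.PENDING, ZarrStatus.COMPLETE),
    SummaryAsset("c.zarr", AssetStatus.PENDING, ZarrStatus.UPLOADED),  # not finalized yet
    SummaryAsset("d.nwb", AssetStatus.INVALID),
]
summarized = [a.path for a in assets if included_in_summary(a)]
```

Under these assumptions, a zarr archive that is still in the UPLOADED state is left out of the summary, which is the behavior requested in the review comment above.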

Member Author

aaronkanzer commented Oct 31, 2024

> I think we need a bit more explanation of the actual bug, and why this fixes it. We need to be careful with Zarr statuses because those have a complex relationship with Assets that results in (perhaps overly) complex semantics.

In the case where there is no blob but just zarr FKs associated with the Asset, validation fails since the query of:

'numberOfBytes': 1 if version.assets.filter(blob__size__gt=0).exists() else 0,

only evaluates the blob foreign key. I stumbled upon this bug when uploading a dandiset consisting purely of zarr assets, if you'd like to reproduce it against the current state of DANDI Archive and dandischema.
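As a rough illustration of the failure mode, the sketch below contrasts a blob-only size check with one that looks at every asset type. It is plain Python with a hypothetical FakeAsset class standing in for the Asset model; the field names (blob_size, embargoed_blob_size, zarr_size) are assumptions made for this example, and the actual fix is expressed as a Django query, not as Python loops.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeAsset:  # hypothetical stand-in; exactly one size is normally set
    blob_size: Optional[int] = None
    embargoed_blob_size: Optional[int] = None
    zarr_size: Optional[int] = None


def number_of_bytes_blob_only(assets: Iterable[FakeAsset]) -> int:
    # Analogous to filter(blob__size__gt=0).exists(): only the blob FK is inspected.
    return 1 if any((a.blob_size or 0) > 0 for a in assets) else 0


def number_of_bytes_any_type(assets: Iterable[FakeAsset]) -> int:
    # Inspects all three asset types, so a zarr-only dandiset is not reported as empty.
    return 1 if any(
        (size or 0) > 0
        for a in assets
        for size in (a.blob_size, a.embargoed_blob_size, a.zarr_size)
    ) else 0


zarr_only = [FakeAsset(zarr_size=4096)]
before = number_of_bytes_blob_only(zarr_only)  # 0: the zarr asset is invisible to the check
after = number_of_bytes_any_type(zarr_only)    # 1: the zarr asset is counted
```

With a pure-zarr dandiset the blob-only check yields 0 even though data exists, which is the situation that trips the dandischema validation linked above.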


2 participants