
Check for Hdi_isfolder with a capital #418

Merged (8 commits) on Dec 31, 2023

@basnijholt (Contributor) commented Jun 21, 2023


Closes #440

Currently, `adlfs` reports that my folder is a file.

When printing `props`, I see that `Hdi_isfolder` is capitalized, whereas the code checks for the lowercase `hdi_isfolder`.

For example, my folder `folder/.dev/Air` has the following `props`, which are passed to `_details` (the method that decides whether an entry is a file or a directory):
```
{'name': 'folder/.dev/Air', 'container': 'mycontainer', 'snapshot': None, 'version_id': None, 'is_current_version': None, 'blob_type': <BlobType.BLOCKBLOB: 'BlockBlob'>, 'metadata': {'Hdi_isfolder': 'true'}, 'encrypted_metadata': None, 'last_modified': datetime.datetime(2023, 6, 14, 22, 42, 5, tzinfo=datetime.timezone.utc), 'etag': '"0x8DB6D28944E30E9"', 'size': 0, 'content_range': None, 'append_blob_committed_block_count': None, 'is_append_blob_sealed': None, 'page_blob_sequence_number': None, 'server_encrypted': True, 'copy': {'id': None, 'source': None, 'status': None, 'progress': None, 'completion_time': None, 'status_description': None, 'incremental_copy': None, 'destination_snapshot': None}, 'content_settings': {'content_type': 'application/octet-stream', 'content_encoding': None, 'content_language': None, 'content_md5': None, 'content_disposition': None, 'cache_control': None}, 'lease': {'status': 'unlocked', 'state': 'available', 'duration': None}, 'blob_tier': None, 'rehydrate_priority': None, 'blob_tier_change_time': None, 'blob_tier_inferred': None, 'deleted': False, 'deleted_time': None, 'remaining_retention_days': None, 'creation_time': datetime.datetime(2023, 6, 14, 22, 42, 5, tzinfo=datetime.timezone.utc), 'archive_status': None, 'encryption_key_sha256': None, 'encryption_scope': None, 'request_server_encrypted': True, 'object_replication_source_properties': [], 'object_replication_destination_policy': None, 'last_accessed_on': None, 'tag_count': None, 'tags': None, 'immutability_policy': {'expiry_time': None, 'policy_mode': None}, 'has_legal_hold': None, 'has_versions_only': None}
```

`_details` then classifies this entry as a `file`.
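A minimal reproduction of the mismatch, as a sketch. `classify_case_sensitive` is a hypothetical stand-in for the lowercase-only lookup described above, not the actual `_details` code in adlfs, and `props` is trimmed to the fields that matter. Python dict keys are case-sensitive, so a lookup for `hdi_isfolder` does not find `Hdi_isfolder` and the entry falls through to `file`.

```python
# Trimmed-down copy of the props above: only the fields relevant here.
props = {
    "name": "folder/.dev/Air",
    "size": 0,
    "metadata": {"Hdi_isfolder": "true"},
}


def classify_case_sensitive(props):
    """Hypothetical stand-in for a lowercase-only check (not adlfs's real code)."""
    metadata = props.get("metadata") or {}
    # Exact key lookup: only the lowercase spelling matches.
    if metadata.get("hdi_isfolder") == "true":
        return "directory"
    return "file"


print("hdi_isfolder" in props["metadata"])  # False
print(classify_case_sensitive(props))       # file
```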
@basnijholt (Contributor, Author)

Interestingly, with the same container, code, and environment, another folder (`folder/Air`) returns these `props`, with `hdi_isfolder` in lowercase:

```
{'name': 'folder/Air', 'container': 'mycontainer', 'snapshot': None, 'version_id': None, 'is_current_version': None, 'blob_type': <BlobType.BLOCKBLOB: 'BlockBlob'>, 'metadata': {'hdi_isfolder': 'true'}, 'encrypted_metadata': None, 'last_modified': datetime.datetime(2023, 2, 17, 0, 3, 8, tzinfo=datetime.timezone.utc), 'etag': '"0x8DB107A59F70DB3"', 'size': 0, 'content_range': None, 'append_blob_committed_block_count': None, 'is_append_blob_sealed': None, 'page_blob_sequence_number': None, 'server_encrypted': True, 'copy': {'id': None, 'source': None, 'status': None, 'progress': None, 'completion_time': None, 'status_description': None, 'incremental_copy': None, 'destination_snapshot': None}, 'content_settings': {'content_type': None, 'content_encoding': None, 'content_language': None, 'content_md5': None, 'content_disposition': None, 'cache_control': None}, 'lease': {'status': 'unlocked', 'state': 'available', 'duration': None}, 'blob_tier': None, 'rehydrate_priority': None, 'blob_tier_change_time': None, 'blob_tier_inferred': None, 'deleted': False, 'deleted_time': None, 'remaining_retention_days': None, 'creation_time': datetime.datetime(2023, 2, 17, 0, 3, 8, tzinfo=datetime.timezone.utc), 'archive_status': None, 'encryption_key_sha256': None, 'encryption_scope': None, 'request_server_encrypted': True, 'object_replication_source_properties': [], 'object_replication_destination_policy': None, 'last_accessed_on': None, 'tag_count': None, 'tags': None, 'immutability_policy': {'expiry_time': None, 'policy_mode': None}, 'has_legal_hold': None, 'has_versions_only': None}
```
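Because both spellings appear in the same container, one option is to compare metadata keys case-insensitively. The sketch below is an illustration, not the change that was merged; `is_hdi_folder` and `classify_entry` are hypothetical helpers, and the value is still compared against the exact string `'true'` seen in both dumps.

```python
def is_hdi_folder(metadata):
    """Return True if a key equal to 'hdi_isfolder' (ignoring case) has value 'true'."""
    if not metadata:
        return False
    return any(
        key.lower() == "hdi_isfolder" and value == "true"
        for key, value in metadata.items()
    )


def classify_entry(props):
    """Hypothetical helper: map blob properties to a 'directory' or 'file' type."""
    return "directory" if is_hdi_folder(props.get("metadata")) else "file"


# Metadata from the two folders reported in this thread, plus a plain blob.
upper = {"name": "folder/.dev/Air", "metadata": {"Hdi_isfolder": "true"}}
lower = {"name": "folder/Air", "metadata": {"hdi_isfolder": "true"}}
plain = {"name": "folder/data.csv", "metadata": None}

for p in (upper, lower, plain):
    print(p["name"], classify_entry(p))
```

This prints `directory` for both `folder/.dev/Air` and `folder/Air`, and `file` for the blob without metadata. Lowercasing the key also covers other capitalizations, whereas checking two exact spellings covers only those two.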

@basnijholt (Contributor, Author)

@TomAugspurger or @hayesgb, any feedback? Is there a chance of getting this merged?

@TomAugspurger (Contributor)

Thanks @basnijholt. I didn't really understand the changes until reading #418.

Does Azurite support these metadata fields? If so, could you add a test that hits this?

@TomAugspurger (Contributor)

Did you have a chance to look at tests with Azurite?

(Outdated, resolved review thread on `adlfs/spec.py`.)

@TomAugspurger (Contributor)

Can you add a test?

@basnijholt (Contributor, Author)

Unfortunately, I no longer work with Azure Data Lake and cannot find the bandwidth to sit down and write a proper test for this.

The code change is really trivial, though, and it ran in production for many months in an internal project.

@TomAugspurger (Contributor) commented Dec 10, 2023 via email

@TomAugspurger (Contributor)

Azurite does support metadata. Added a test and changelog entry.

@TomAugspurger merged commit 32132c4 into fsspec:main on Dec 31, 2023.
4 checks passed
@TomAugspurger (Contributor)

Thanks @basnijholt!

Linked issue:

Support virtual directory stubs with uppercase "Hdi_isfolder" metadata
3 participants