Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dct identifier to publisher #301

Merged
merged 4 commits into from
Sep 12, 2024

Conversation

hcvdwerf
Copy link
Contributor

@hcvdwerf hcvdwerf commented Sep 9, 2024

The system previously lacked support for storing and serializing publisher identifiers, such as ROR IDs, ORCIDs, or other unique identifiers for publishers. While other metadata fields like publisher name, email, and URL were being correctly parsed and serialized, the identifier field (e.g., dct:identifier) was not being included in the publisher’s metadata. This omission resulted in incomplete RDF metadata and affected datasets that needed to capture a unique reference to the publisher.

@hcvdwerf hcvdwerf force-pushed the add-dct-identifier-to-publisher branch from 83a1ddf to aefa22c Compare September 9, 2024 12:12
Comment on lines 29 to 30
elif isinstance(dcat_publisher, dict) and dcat_publisher.get('name'):
package_dict['extras'].append({'key': 'dcat_publisher_name', 'value': dcat_publisher.get('name')})
package_dict['extras'].append({'key': 'dcat_publisher_email', 'value': dcat_publisher.get('mbox')})
if dcat_publisher.get('name'):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a double checl for `.get('name') which can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the second one

@@ -66,6 +66,11 @@ dataset_fields:

- field_name: type
label: Type

- field_name: identifier
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add it to the dcat_ap_full.yaml file as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do

@@ -123,7 +123,7 @@ def _parse_dataset_base(self, dataset_dict, dataset_ref):

# Publisher
publisher = self._publisher(dataset_ref, DCT.publisher)
for key in ("uri", "name", "email", "url", "type"):
for key in ("uri", "name", "email", "url", "type", "identifier"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will take care of the DCAT -> CKAN mapping, but you also need to add the other way around, CKAN -> DCAT. This is done a bit further down in the file here and here.

Additionally, this will cover the legacy way of defining publisher fields based on namespaced publisher_* extra fields, but going forward we want to support scheming objects ("publisher": {"id": xx, "name": yy, "url": zz}). This is handled in the euro_dcat_ap_scheming profile here (for CKAN -> DCAT, the opposite should already be handled by the base profile). Do you mind adding support for the new field there as well? And if you extend this test to cover the new field, even better.

@hcvdwerf Let me know if all this makes sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tnx for the comments. Verry helpful @amercader . Jut commit the changes

Copy link
Contributor

@Markus92 Markus92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small fix for failing e2e test.

@amercader amercader merged commit d4dfab8 into ckan:master Sep 12, 2024
4 checks passed
@amercader
Copy link
Member

Thanks @hcvdwerf !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants