Define and implement the GRSciColl master data management solution #319
What we want:
What we don’t want or don’t need:
Where we start:
Attempt at mapping fields:
NB: The Institution and Code, which are mandatory fields, cannot be inferred from the EML. Users will have to fill in those fields.
NB: The same comment about codes applies as for the collection.
collection homepage collection identifiers
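A minimal sketch of that mapping idea, assuming the EML has already been parsed into a plain dict (the keys `title`, `abstract`, `homepage` and `alternate_identifiers` are hypothetical, not the EML schema, and `CollectionDraft` is not the GRSciColl model). Inferable fields are copied into a draft collection; the mandatory institution and code are left empty for the user to fill in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectionDraft:
    """Hypothetical subset of the GRSciColl collection fields."""
    name: str
    description: Optional[str] = None
    homepage: Optional[str] = None
    identifiers: List[str] = field(default_factory=list)
    # Mandatory in GRSciColl, but not inferable from the EML
    institution_key: Optional[str] = None
    code: Optional[str] = None

    def missing_mandatory(self) -> List[str]:
        """Mandatory fields the user still has to fill in."""
        return [f for f in ("institution_key", "code") if getattr(self, f) is None]


def draft_from_eml(eml: dict) -> CollectionDraft:
    """Copy the inferable fields from a parsed EML document (hypothetical keys)."""
    return CollectionDraft(
        name=eml["title"],
        description=eml.get("abstract"),
        homepage=eml.get("homepage"),
        identifiers=list(eml.get("alternate_identifiers", [])),
    )
```

For example, `draft_from_eml({"title": "Herbarium specimens"}).missing_mandatory()` returns `["institution_key", "code"]`, which is what the user would be prompted to complete.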
Also, should it perhaps be possible to map dataset => institution, with the institution listing its collections? UPDATE: in this case the publisher would be the natural choice, I guess, so perhaps there is no need after all :)
Perhaps we could fall back to occurrence metrics when/if it isn't filled?
For the collection-dataset mapping:
For the institution-organization mapping:
For both mappings, for the contacts, I think we could check whether the person exists in GRSciColl and create a new person otherwise. It's not ideal, since we'll be partly duplicating people. And if the person changes in the organization or the dataset, should we update it in GRSciColl too? If it's deleted, do we still keep this person in GRSciColl? With the current model we have for persons, I don't think there is a good solution unless we improve the model first.
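The find-or-create idea for contacts could look like this sketch. `PersonStore` is a hypothetical in-memory stand-in for the GRSciColl person registry, not its real API, and matching by email first and by normalized name otherwise is an assumption. An ambiguous name match raises an error instead of silently creating yet another duplicate.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Person:
    key: int
    first_name: str
    last_name: str
    email: Optional[str] = None


def _norm(text: Optional[str]) -> str:
    """Case- and whitespace-insensitive comparison key."""
    return " ".join((text or "").split()).casefold()


class PersonStore:
    """Hypothetical in-memory stand-in for the GRSciColl person registry."""

    def __init__(self) -> None:
        self._people: Dict[int, Person] = {}
        self._keys = itertools.count(1)

    def find(self, first_name: str, last_name: str,
             email: Optional[str] = None) -> List[Person]:
        people = self._people.values()
        if email:
            # An email address is the strongest signal available here
            return [p for p in people if _norm(p.email) == _norm(email)]
        return [p for p in people
                if _norm(p.first_name) == _norm(first_name)
                and _norm(p.last_name) == _norm(last_name)]

    def find_or_create(self, first_name: str, last_name: str,
                       email: Optional[str] = None) -> Tuple[Person, bool]:
        """Return (person, created)."""
        matches = self.find(first_name, last_name, email)
        if len(matches) == 1:
            return matches[0], False
        if len(matches) > 1:
            # Do not guess: several people match, so a curator must decide
            raise LookupError(f"{len(matches)} people match {first_name} {last_name}")
        person = Person(next(self._keys), first_name, last_name, email)
        self._people[person.key] = person
        return person, True
```

This only covers creation; it does not answer the update and deletion questions above, which depend on the person model.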
The problem with inferring a collection's parent institution from a dataset title (or publisher) is that it might generate duplicates if the spellings differ from what we have in GRSciColl. Plus, what if several institutions in GRSciColl match the same name? I think someone will have to check manually which institution should be the parent; it cannot really happen automatically. Concerning using occurrences and publisher to infer some content:
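To keep a human in the loop, the matching step could only propose candidates rather than link automatically. This sketch uses `difflib.SequenceMatcher` on accent- and case-normalized names; the similarity threshold and the `review_status` labels are arbitrary choices for illustration, and the `institutions` dict stands in for whatever lookup GRSciColl would actually provide.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Lower-case, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.casefold().split())


def candidate_institutions(
    publisher: str,
    institutions: Dict[str, str],   # hypothetical: institution key -> name
    threshold: float = 0.85,        # arbitrary cut-off
) -> List[Tuple[str, float]]:
    """Institutions whose name resembles the publisher, best match first."""
    target = normalize(publisher)
    scored = []
    for key, name in institutions.items():
        score = SequenceMatcher(None, target, normalize(name)).ratio()
        if score >= threshold:
            scored.append((key, round(score, 3)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def review_status(candidates: List[Tuple[str, float]]) -> str:
    """Every outcome still needs a curator; this only says what kind of review."""
    if not candidates:
        return "no-candidate"        # curator picks or creates the institution
    if len(candidates) == 1:
        return "confirm-single"      # curator confirms the only suggestion
    return "choose-among"            # several institutions match the same name
```

Spelling variants such as "Muséum" vs "Museum" are absorbed by the normalization, while names that differ more substantially still fall below the threshold and end up in manual review.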
I don't think the … I agree that we should first check whether the contact exists in GRSciColl before creating a new one. The definition of the catalogueURL field we wrote is "If your specimens are digitized and available online, you can put here the link to access them".
Yes, that can happen. This complicates things. If we don't want to have conflicts, we'd have to "duplicate" all the contacts and keep a link between them so we know for sure which GRSciColl person they refer to.
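One way to keep that link is a table keyed by where each contact came from. In this sketch `SourceRef` and `ContactLinks` are hypothetical, and it assumes each source contact has some stable `contact_id`, which EML contacts do not necessarily provide. With the link in place, contacts that disappear from a dataset or organization can be traced to the GRSciColl person copies that may need updating or removal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class SourceRef:
    """Where a contact came from (all fields are hypothetical)."""
    source: str       # e.g. "dataset" or "organization"
    source_key: str   # key of the dataset or organization
    contact_id: str   # assumed stable id of the contact within that source


class ContactLinks:
    """Maps each source contact to the GRSciColl person created from it."""

    def __init__(self) -> None:
        self._links: Dict[SourceRef, str] = {}

    def link(self, ref: SourceRef, person_key: str) -> None:
        self._links[ref] = person_key

    def person_for(self, ref: SourceRef) -> Optional[str]:
        return self._links.get(ref)

    def stale_links(self, source: str, source_key: str,
                    current_ids: Set[str]) -> List[str]:
        """Person keys whose source contact no longer exists in that source."""
        return sorted(
            person for ref, person in self._links.items()
            if ref.source == source and ref.source_key == source_key
            and ref.contact_id not in current_ids
        )
```

Whether a stale link should trigger deleting the GRSciColl person or only flagging it is exactly the open question from the discussion; the sketch just makes the affected persons findable.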
I'm not sure. I guess some collections might have records in multiple datasets. We could have a link to the occurrences on the institution/collection page.
You are right, it gets a bit complicated. I think we should leave those empty by default; users can always fill them in.
As agreed with the others, we'll map the specimenPreservationMethod in the
Deployed to PROD.
There are potentially multiple sources of truth for the metadata in the catalogue, which need to be reconciled; this is the problem addressed by master data management. For example, we have information available in a dataset's metadata description, an existing GRSciColl entry and an Index Herbariorum record.
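To make the problem concrete, here is a minimal field-level merge across the three sources named above. The source labels and the precedence order (GRSciColl first, then Index Herbariorum, then dataset metadata) are assumptions for illustration only, not a decided policy. Each merged field records which source it came from, and fields where sources disagree are reported as conflicts for review.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical precedence: earlier sources win when values differ
PRECEDENCE = ["grscicoll", "index_herbariorum", "dataset"]


def merge_views(
    views: Dict[str, Dict[str, Any]],
    precedence: List[str] = PRECEDENCE,
) -> Tuple[Dict[str, Any], Dict[str, str], Dict[str, Dict[str, Any]]]:
    """Merge per-source views of one record, field by field."""
    merged: Dict[str, Any] = {}
    provenance: Dict[str, str] = {}
    conflicts: Dict[str, Dict[str, Any]] = {}
    # Sources missing from `precedence` are ignored in this sketch
    ordered = [src for src in precedence if src in views]
    fields = sorted({f for src in ordered for f in views[src]})
    for f in fields:
        # Non-empty values only, in precedence order
        values = {
            src: views[src][f]
            for src in ordered
            if views[src].get(f) not in (None, "")
        }
        if not values:
            continue
        winner = next(iter(values))
        merged[f] = values[winner]
        provenance[f] = winner
        if any(v != values[winner] for v in values.values()):
            conflicts[f] = values
    return merged, provenance, conflicts
```

A fixed precedence is the simplest policy; keeping provenance and conflicts alongside the merged record leaves room to replace it later with per-field rules or manual resolution.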
Define, implement and document the approach taken by the catalogue for handling differing views of metadata.
An approach could be as follows: