
output Created/LastChange timestamp as processingDateTime, fix #36 #37

Merged: kba merged 3 commits into master from created-date on Jan 11, 2024

@kba (Member) commented Jan 11, 2024

With this PR, the alto:processingDateTime element of an alto:processingStep will be set to either the pc:Created timestamp (--timestamp-src Created), the pc:LastChange timestamp (--timestamp-src LastChange), or, as before, not at all (--timestamp-src none).
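A minimal Python sketch of the --timestamp-src selection, using only the standard library. It assumes the PAGE 2019-07-15 namespace; `select_timestamp`, `VALID_SOURCES` and the sample document are hypothetical illustrations, not the converter's actual code or data.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: PAGE 2019-07-15 namespace; older PAGE files use other URIs.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
VALID_SOURCES = ("Created", "LastChange", "none")


def select_timestamp(page_xml: str, timestamp_src: str) -> Optional[str]:
    """Return the value for alto:processingDateTime, or None to omit it.

    timestamp_src mirrors the --timestamp-src choices from this PR.
    Hypothetical helper, not the converter's real code.
    """
    if timestamp_src not in VALID_SOURCES:
        raise ValueError(f"unsupported --timestamp-src: {timestamp_src!r}")
    if timestamp_src == "none":
        return None  # previous behaviour: no processingDateTime at all
    root = ET.fromstring(page_xml)
    # pc:Created / pc:LastChange live under pc:Metadata and are document-wide,
    # so every processing step receives the same value.
    elem = root.find(f"{{{PAGE_NS}}}Metadata/{{{PAGE_NS}}}{timestamp_src}")
    if elem is None or not elem.text:
        return None
    return elem.text.strip()


# Made-up sample document for illustration only
sample = f"""<PcGts xmlns="{PAGE_NS}">
  <Metadata>
    <Creator>example</Creator>
    <Created>2020-01-01T10:00:00</Created>
    <LastChange>2020-01-02T12:30:00</LastChange>
  </Metadata>
</PcGts>"""

print(select_timestamp(sample, "Created"))     # 2020-01-01T10:00:00
print(select_timestamp(sample, "LastChange"))  # 2020-01-02T12:30:00
print(select_timestamp(sample, "none"))        # None
```

Because the lookup goes through pc:Metadata, the same timestamp is reused for every alto:processingStep, which is exactly the document-wide limitation acknowledged in the PR.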

This is not 100% correct, since Created and LastChange are document-wide rather than step-specific, but AFAICS we have no other source for them, and for our (@StaatsbibliothekBerlin) workflows it is important to have at least an approximate date in the alto:processingSteps for versioning purposes.

kba requested a review from bertsky January 11, 2024 14:10
kba force-pushed the created-date branch 4 times, most recently from c6c0db4 to 037de1e, January 11, 2024 14:41
kba merged commit 924c9e1 into master Jan 11, 2024
5 checks passed
@bertsky (Collaborator) commented Jan 12, 2024

IMHO the correct representation would have been:

  • for PAGE's Metadata/Created: a separate Description/Processing element with processingCategory=contentGeneration and the respective processingDateTime (independent of the step_alto entries for each Metadata/MetadataItem)
  • for PAGE's Metadata/LastChange: a separate Description/Processing element with processingCategory=contentModification and the respective processingDateTime (independent of the step_alto entries for each Metadata/MetadataItem)
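To make this proposed mapping concrete, here is a minimal Python sketch with xml.etree.ElementTree that builds only the Description fragment. It assumes the ALTO v4 namespace and emits processingCategory as a child element of Processing placed before processingDateTime, which is my reading of the ALTO v4 schema (worth checking against the XSD). `add_timestamp_processing`, the ID values and the timestamps are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v4 namespace
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"

# Mapping proposed above: PAGE timestamp field -> ALTO processingCategory
CATEGORY_FOR = {"Created": "contentGeneration", "LastChange": "contentModification"}


def add_timestamp_processing(description: ET.Element, page_field: str,
                             timestamp: str, proc_id: str) -> ET.Element:
    """Append a separate alto:Processing carrying one document-wide PAGE
    timestamp, independent of the per-MetadataItem steps.
    Hypothetical helper; proc_id is chosen by the caller."""
    proc = ET.SubElement(description, f"{{{ALTO_NS}}}Processing", ID=proc_id)
    # processingCategory precedes processingDateTime (assumed element order)
    ET.SubElement(proc, f"{{{ALTO_NS}}}processingCategory").text = CATEGORY_FOR[page_field]
    ET.SubElement(proc, f"{{{ALTO_NS}}}processingDateTime").text = timestamp
    return proc


ET.register_namespace("", ALTO_NS)
description = ET.Element(f"{{{ALTO_NS}}}Description")
add_timestamp_processing(description, "Created", "2020-01-01T10:00:00", "proc_created")
add_timestamp_processing(description, "LastChange", "2020-01-02T12:30:00", "proc_lastchange")
print(ET.tostring(description, encoding="unicode"))
```

Keeping the timestamps in their own Processing elements leaves the per-MetadataItem steps untouched, so no step claims a date that was never recorded for it.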

For ALTO v2 with its preProcessingStep|ocrProcessingStep|postProcessingStep distinction, one would probably have to map to:

  • for PAGE's Metadata/Created: a separate Description/OCRProcessing element with ocrProcessingType=preProcessingStep and the respective processingDateTime (independent of the step_alto entries for each Metadata/MetadataItem)
  • for PAGE's Metadata/LastChange: a separate Description/OCRProcessing element with ocrProcessingType=postProcessingStep and the respective processingDateTime (independent of the step_alto entries for each Metadata/MetadataItem)
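A sketch of the ALTO v2 variant under similar assumptions (ALTO v2 namespace, hypothetical helper, IDs and timestamps). Here the kind of step is expressed by the element name inside OCRProcessing rather than by a category value. I have not verified whether the v2 schema accepts an OCRProcessing without an ocrProcessingStep, so the output should be validated before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v2 namespace
ALTO2_NS = "http://www.loc.gov/standards/alto/ns-v2#"

# ALTO v2 has no processingCategory; the step kind is the element name
STEP_FOR = {"Created": "preProcessingStep", "LastChange": "postProcessingStep"}


def add_v2_timestamp_processing(description: ET.Element, page_field: str,
                                timestamp: str, proc_id: str) -> ET.Element:
    """Hypothetical helper: wrap one PAGE timestamp in its own OCRProcessing."""
    ocr = ET.SubElement(description, f"{{{ALTO2_NS}}}OCRProcessing", ID=proc_id)
    step = ET.SubElement(ocr, f"{{{ALTO2_NS}}}{STEP_FOR[page_field]}")
    ET.SubElement(step, f"{{{ALTO2_NS}}}processingDateTime").text = timestamp
    return ocr


ET.register_namespace("", ALTO2_NS)
description = ET.Element(f"{{{ALTO2_NS}}}Description")
add_v2_timestamp_processing(description, "Created", "2020-01-01T10:00:00", "ocr_created")
add_v2_timestamp_processing(description, "LastChange", "2020-01-02T12:30:00", "ocr_lastchange")
print(ET.tostring(description, encoding="unicode"))
```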

Obviously, this is not ideal. But since PAGE's Created/LastChange do not have clear semantics, I would argue this is the best pragmatic fit.

BTW, we are also still missing Metadata/Creator! IMO this should go into the contentGeneration (or preProcessingStep) entry.
