DALAI-project

DALAI - Using artificial intelligence to improve the quality and usability of digital records

The included repositories contain code and model files for tools developed as part of the DALAI project (September 2021 - August 2023). The project was funded by the European Regional Development Fund’s “Sustainable growth and jobs 2012–2020” programme and the City of Mikkeli.

The tools share a common aim: to help automate the digitisation and description of cultural heritage materials held by archives and other memory organisations.

Included repositories

Repository	Domain	Content
CornerAPI	Image Classification	Code for an API that detects torn corners and edges from document images.
EmptyAPI	Image Classification	Code for an API that detects empty pages from document images.
PostitAPI	Image Classification	Code for an API that detects post-it/sticky notes from document images.
WritingtypeAPI	Image Classification	Code for an API that classifies document images based on the writing type(s) (handwritten, typewritten, combination) they contain.
FaultyImageAPI	Image Classification	Code for an API that combines the classification models listed above.
NER_API	Named Entity Recognition	Code for an API that performs named entity recognition on Finnish text input.
Train_BERT_NER	Named Entity Recognition	Code for training a Finnish named entity recognition (NER) model based on the BERT language model.
Empty_training	Image Classification	Code for training a neural network model to detect empty pages from document images.
Train_document_classification	Image Classification	Code for training a neural network model to classify input documents into distinct classes based on the type/format of the document.
Train_fault_detection	Image Classification	Code for training a neural network model to detect faults like folded corners or sticky notes from document images.
Train_writing_type	Image Classification	Code for training a neural network model to classify document images based on the writing type(s) (handwritten, typewritten, combination) they contain.
Table_segmentation	Image Segmentation	Code for segmenting table structures and detecting text content in document images.
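
The idea of combining several image classifiers into one check, as FaultyImageAPI is described as doing, can be illustrated with a minimal sketch. This is not the project's implementation: the `check_image` function, the `Classifier` type, the 0.5 threshold and the stub classifiers are all hypothetical names and values chosen for illustration, and the real models may expose a different interface.

```python
from typing import Callable, Dict

# Hypothetical classifier type: takes raw image bytes, returns a score in [0, 1].
Classifier = Callable[[bytes], float]


def check_image(
    image: bytes,
    classifiers: Dict[str, Classifier],
    threshold: float = 0.5,  # assumed cut-off, not taken from the project
) -> Dict[str, bool]:
    """Run every classifier on one image and flag scores at or above the threshold."""
    return {name: clf(image) >= threshold for name, clf in classifiers.items()}


# Stand-in classifiers returning fixed scores; real models would run inference here.
stub_classifiers: Dict[str, Classifier] = {
    "corner": lambda img: 0.9,
    "empty": lambda img: 0.1,
    "postit": lambda img: 0.7,
}

report = check_image(b"fake image bytes", stub_classifiers)
print(report)
```

Keeping each classifier behind the same callable signature means a new fault type could be added to the dictionary without changing the combining function.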

Some of the tools are also available via the Arkkiivi web user interface.
