Digital Library of Arne Novák
Updated: 4. 2. 2021
Contact e-mail: digitalia@phil.muni.cz
The aim of the Digital Library of Arne Novák (DK AN) is to facilitate access to the works of Prof. Arne Novák, a literary scholar, critic, historian, and essayist (* 2. 3. 1880, † 26. 11. 1939). The collection includes digitized materials from the holdings of the Central Library of the Faculty of Arts, Masaryk University, which are freely accessible under the Copyright Act (his own monographs, introductions, and epilogues, together with the original works they belong to).
The target group includes researchers who are interested in the personality of Arne Novák, the period he lived in, and the history of Czech literature in general. All of the works are freely accessible, with the exception of copyright-protected illustrations. The content can be displayed directly in your browser or downloaded in any of the formats listed below.
The technical background for the DK AN is managed by the Centre for Information Technologies of the Faculty of Arts, Masaryk University, as part of the infrastructure for the Digital Library of the Faculty of Arts, Masaryk University. The Digital Library was founded in 2009, and until 2020 it was administered by the Central Library of the Faculty of Arts, Masaryk University. The Central Library continues to administer the metadata and data contained in the system and remains responsible for further development of the content. In 2020, the DK AN was transferred to the existing Islandora system under the LINDAT/CLARIAH-CZ project.
The Islandora system is composed of several components:
- The Drupal system is used to manage the contents of the repository and to store the metadata, while also providing a rich user interface.
- The files are stored in the Fedora system. Fedora services also include versioning and fixity checking.
- Apache Solr is used for content indexing and searching in the repository.
- Islandora microservices provide synchronization of the contents in Drupal and Fedora systems and integration of various applications for processing images, videos, and text. The DK AN currently uses the Houdini microservice, integrating the ImageMagick application.
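Fixity checking, mentioned for Fedora above, amounts to recomputing a file's digest and comparing it with the value recorded when the file was stored. The following is a minimal sketch of that idea in plain Python, not Fedora's own implementation; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str, algorithm: str = "sha256") -> bool:
    """Compare the current digest with the one recorded when the file was stored."""
    return file_digest(path, algorithm) == recorded_digest.lower()
```

A mismatch between the recorded and the recomputed digest indicates that the stored file has changed or been corrupted since ingest.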
A more detailed architecture diagram of Islandora is part of its documentation.
Individual entries in the DK AN (works, parts of works, pages) are stored as Drupal Nodes and grouped by Content Type. The DK AN includes four Content Types: Book, Part, Page, and Author.
Files are stored in the repository as Media and grouped by Media Types. In turn, Media are attached to individual Nodes.
Media Types attached to Book Nodes are Book Cover, Book PDF, Book Text, and MARC XML.
Media Types attached to Page Nodes are Page Scan, TIFF Scan, Page Thumbnail, Illustration, and Page Text.
Part Nodes do not contain any files, only metadata, including a link to the respective work and the page range of the part within it. Based on this data, the page of each part displays the list of pages of the work that belong to that part.
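How a part resolves to its pages can be sketched as follows. This is a simplified model, not the Drupal data model itself: the class and field names (`Page`, `Part`, `work_id`, `number`, `first_page`, `last_page`) are hypothetical, and the page range is assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    work_id: int  # ID of the work (Book) the page belongs to
    number: int   # position of the page within the work


@dataclass(frozen=True)
class Part:
    title: str
    work_id: int     # link to the respective work
    first_page: int  # start of the page range (assumed inclusive)
    last_page: int   # end of the page range (assumed inclusive)


def pages_of_part(part: Part, pages: list[Page]) -> list[Page]:
    """Select the pages of the linked work that fall inside the part's range."""
    selected = [
        page for page in pages
        if page.work_id == part.work_id
        and part.first_page <= page.number <= part.last_page
    ]
    return sorted(selected, key=lambda page: page.number)
```

Because a part stores only the link and the range, no files need to be duplicated: the page list is derived from the pages of the work each time it is displayed.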
The full text in PDF (Book PDF Media Type) can be displayed in the PDF.js viewer. Images of individual pages (Page Scan Media Type) can be viewed in the OpenSeadragon viewer. Other files can be downloaded from the page of the respective work or page.
Indexing and searching in the repository is provided by Apache Solr.
The Search Content index includes all objects of the Book, Part, and Page types. The indexed fields are the title, author’s name, content type, work, year of issue, and the full OCR texts attached to pages and works.
Two search options are available:
- Titles: Searching for works or parts of works by their title. This search method also provides the autocomplete function. The results are sorted and grouped by the year of issue.
- Fulltext: Searching in the OCR text of individual pages. The results are sorted by relevance. The pages can be filtered by work; the filter also shows the number of matching pages for each work.
The results of both searches can be filtered by the year of issue, either by entering the range in the Year of issue (from) and Year of issue (to) fields or using the slider at the top of the page.
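In Solr, a year filter of this kind is typically expressed as a range filter query. The sketch below builds such request parameters. The field names `title` and `year_of_issue` and the ascending sort order are hypothetical, since the actual Solr schema of the DK AN is not documented here.

```python
from typing import Optional


def year_filter(year_from: Optional[int], year_to: Optional[int],
                field: str = "year_of_issue") -> str:
    """Build a Solr range filter; a missing bound becomes the open-ended '*'."""
    lower = "*" if year_from is None else str(year_from)
    upper = "*" if year_to is None else str(year_to)
    return f"{field}:[{lower} TO {upper}]"


def title_search_params(query: str, year_from: Optional[int] = None,
                        year_to: Optional[int] = None) -> dict:
    """Parameters for a title search filtered and sorted by year of issue."""
    # Escape backslashes and double quotes so the phrase stays well-formed.
    escaped = query.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "q": f'title:"{escaped}"',
        "fq": year_filter(year_from, year_to),
        "sort": "year_of_issue asc",  # assumed sort direction
    }
```

Leaving one of the two bounds empty corresponds to filling in only the Year of issue (from) or only the Year of issue (to) field.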
Searching ignores characters in the following categories (Unicode Character Categories):
- Punctuation, Connector Characters
- Punctuation, Dash Characters
- Punctuation, Close Characters
- Punctuation, Final Quote Characters
- Punctuation, Initial Quote Characters
- Punctuation, Other Characters
- Punctuation, Open Characters
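The listed categories correspond to the Unicode general categories Pc, Pd, Pe, Pf, Pi, Po, and Ps. The following sketch shows their effect on a query string using Python's `unicodedata` module. It only illustrates which characters are affected and is not the repository's actual Solr analyzer configuration; in particular, it assumes that ignored characters are simply removed.

```python
import unicodedata

# Unicode general categories for the punctuation classes listed above.
IGNORED_CATEGORIES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def strip_ignored(text: str) -> str:
    """Drop every character whose Unicode category is in the ignored set."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) not in IGNORED_CATEGORIES
    )
```

For example, `strip_ignored("a_b-c")` yields `abc`, because the underscore is a connector (Pc) and the hyphen is a dash (Pd). Letters with diacritics are not affected.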
The system runs on the virtual server infrastructure of Masaryk University, which is built on VMware technology and physically located at the Institute of Computer Science, Masaryk University. The operating system is Ubuntu LTS (based on the recommendation of the Islandora Community).
Backup is performed periodically using a Bacula-based tape device, managed by the Institute of Computer Science, Masaryk University.
All changes in the metadata of works, parts of works, pages, and authors are recorded in Drupal. Users with the Editor role can view, compare, or delete versions of objects (i.e., Revisions) or revert to previous versions of the content.
The content of the DK AN is also accessible to anonymous users. Such users may browse individual works and their pages, search for titles of works, parts of works, and authors, and search in the full text of works. All types of files listed below (see Included files) are available for download.
Users logged in with the Editor role have access to forms for editing the works, parts of works, and authors. They can also manage versions of objects (see Servers, backup, integrity, and authenticity).
An OAI-PMH endpoint has been created for all works in the DK AN for sharing the metadata in DC.
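A harvesting request to such an endpoint follows the OAI-PMH protocol, in which `verb=ListRecords` with `metadataPrefix=oai_dc` returns records in unqualified Dublin Core. The sketch below builds the request URL and extracts titles from a response. The endpoint path `/oai/request` is an assumption (the actual path is not stated here), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint path; replace it with the real one.
OAI_ENDPOINT = "https://arne-novak.phil.muni.cz/oai/request"

# Namespace of the Dublin Core elements used inside oai_dc records.
NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(endpoint: str = OAI_ENDPOINT) -> str:
    """Build a ListRecords request for Dublin Core metadata."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return f"{endpoint}?{query}"


def extract_titles(response_xml: str) -> list[str]:
    """Return the text of every dc:title element in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [el.text or "" for el in root.iterfind(".//dc:title", NAMESPACES)]
```

Large result sets are returned in batches, so a full harvester would also need to follow the `resumptionToken` defined by the protocol; that is omitted here for brevity.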
The contents of the repository (works, parts of works, pages) are accessible via the REST API. The following formats are supported for the GET method: csv, json, and jsonld.
URI of individual objects: https://arne-novak.phil.muni.cz/node/[object_id]
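Assuming the repository follows the standard Drupal REST convention of selecting the serialization with the `_format` query parameter (an assumption, as the parameter is not named here), a request URL for a single object can be built as follows. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://arne-novak.phil.muni.cz"
SUPPORTED_FORMATS = ("csv", "json", "jsonld")


def node_url(object_id: int, fmt: str = "json") -> str:
    """Build the URL of an object in one of the supported formats."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE_URL}/node/{object_id}?" + urlencode({"_format": fmt})


def fetch_node_json(object_id: int) -> dict:
    """GET an object as JSON (performs a network request)."""
    with urlopen(node_url(object_id, "json"), timeout=30) as response:
        return json.load(response)
```

Calling `node_url(24, "jsonld")` produces the JSON-LD address of the object with ID 24; the ID is used here only as an example.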
The Matomo tool has been deployed to monitor page performance. The collected data is stored on a local server.
The provided distribution of the data set does not contain works of authorship pursuant to §2 of Act No. 121/2000 Coll., on Copyright and Rights Related to Copyright and on Amendment to Certain Acts (the Copyright Act). In this respect, copyright does not prevent any re-use of the content of the provided distribution of the data set.
The library contains records for complete works, parts of works, and individual pages.
Works can be viewed and subsequently downloaded in the PDF format or individual pages can be browsed directly in your browser. The full text of works can also be downloaded in the TXT format. Parts of works (such as introductions) can be viewed and the complete work subsequently downloaded in the PDF format or displayed page by page.
Files attached to works:
- full text of the work in PDF
- OCR text of the work in TXT
- cover page of the work in PNG
- metadata of the work in MARCXML (non-public)
Pages can be downloaded in the TXT or PNG formats after clicking the respective thumbnail.
Files attached to pages:
- page image in PNG
- page thumbnail in PNG (220x220 px)
- OCR text of the page in TXT
- page image in TIFF (non-public)
Scanning was performed using the PlusTek OpticBook 4600 book scanner. The following scanning parameters were used: 300 DPI, B&W, TIFF fax G4. A greyscale PNG image with half resolution is generated for viewing.
The subsequent processing (alignment, centring, cleaning) was performed manually, using a proprietary program. The only exception is the four editions of Dějiny literatury české, which were processed automatically using the Scan Tailor program because manual processing would have been too time-consuming.
Text recognition was performed using Readiris Pro 10, Corporate Edition. No review of the recognized text is planned.
The DK AN stores its metadata in Drupal Fields. Each type of record in the DK AN (works, parts of works, pages, and authors) has a defined list of respective fields.
When exporting the metadata in the DC schema, these fields are mapped to the DC schema elements.
The metadata schema for works together with the DC mapping:
Field (Czech) | Field (English) | DC term | Required | Searchable | Public | Repeatable | Value example | Standards and rules | Comments |
---|---|---|---|---|---|---|---|---|---|
Název | Title | http://purl.org/dc/terms/title | Yes | Yes | Yes | Patero obrázků z dějin knihy | |||
Autor | Author | http://purl.org/dc/terms/contributor | Yes | Yes | Yes | Novák, Arne | |||
Rok vydání | Year of issue | http://purl.org/dc/terms/issued | Yes | Yes | 1920 | ||||
Licence | Licence | http://purl.org/dc/terms/rights | Yes | Yes | Public domain | controlled vocabulary | Values: Public domain, EP, N/A | ||
Klíčová slova | Keywords | http://purl.org/dc/terms/subject | Yes | Arne Novák | |||||
ID | ID | http://purl.org/dc/terms/identifier | Yes | Yes | 24 | Identifier within the library | |||
PID | PID | http://purl.org/dc/terms/identifier | 20706 | Aleph MU library system number | |||||
SYSNO | SYSNO | http://purl.org/dc/terms/identifier | 46428 | ||||||
Vydavatel | Publisher | http://purl.org/dc/terms/publisher | Yes | Yes | Spolek výtvarných umělců Mánes | ||||
Citace | Citation | http://purl.org/dc/terms/bibliographicCitation | Yes | Yes | NOVÁK, Arne. Jan Neruda. 3. vyd. Praha: Spolek výtvarných umělců Mánes, 1920. Zlatoroh, sv. 2. | ||||
Popis | Description | http://purl.org/dc/terms/description | Yes | Yes | With a portrait by M. Švabinský | The description typically applies to the physical print. | |||
Typ | Type | http://purl.org/dc/terms/type | Yes | Book |
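
The mapping in the table above can be expressed as a simple lookup. The sketch below is illustrative rather than the repository's export code: the dictionary keys are the English field names from the table, while the function name and the choice to collect values in lists (needed because ID, PID, and SYSNO all map to the same DC term) are assumptions.

```python
DC = "http://purl.org/dc/terms/"

# English field name -> DC term, as listed in the table for works.
FIELD_TO_DC = {
    "Title": DC + "title",
    "Author": DC + "contributor",
    "Year of issue": DC + "issued",
    "Licence": DC + "rights",
    "Keywords": DC + "subject",
    "ID": DC + "identifier",
    "PID": DC + "identifier",
    "SYSNO": DC + "identifier",
    "Publisher": DC + "publisher",
    "Citation": DC + "bibliographicCitation",
    "Description": DC + "description",
    "Type": DC + "type",
}


def to_dc(record: dict[str, str]) -> dict[str, list[str]]:
    """Map field values to DC terms; fields without a mapping are skipped."""
    mapped: dict[str, list[str]] = {}
    for field, value in record.items():
        term = FIELD_TO_DC.get(field)
        if term is not None:
            mapped.setdefault(term, []).append(value)
    return mapped
```

With this structure, a record that carries both an ID and a PID ends up with two values under the single `identifier` term, in the order in which the fields appear in the record.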