A curated list of resources, projects, and tools for using Artificial Intelligence in Libraries, Archives, and Museums.
- Introduction
- Learning Resources
- Tools and Frameworks
- Datasets
- Projects, Initiatives, and Case Studies
- Policies and recommendations
- Conferences and Workshops
- Publications and News Sources
- Community
- Contributions
- License
This list is a collection of resources, tools, projects, and other materials for professionals and enthusiasts in the Libraries, Archives, and Museums (LAM) sector. You might also know this as the GLAM (galleries, libraries, archives and museums) or CHI (cultural heritage institutions) sector, or be more familiar with the term 'memory institutions'. However you describe the field, if you know of an AI, machine learning, big data or data science project, event or resource related to collections, please share it here!
This list is maintained by the AI4LAM community. Its aim is to support knowledge sharing, innovation, and collaboration in applying AI to LAM.
Please note: the appearance of a resource on this list does not constitute an official endorsement by AI4LAM.
- Elements of AI – free course by MinnaLearn & University of Helsinki
- Introduction to AI for GLAM – by Library Carpentry
- AI Guide by the AI Pedagogy Project – collection of materials by metaLAB
- Slides from FF23 workshop on Intro to AI for GLAM and shared notes
- Machine Learning 101 – by Jason Mayes from Google
- Codecademy AI Courses – many topics; some lessons are free, some are for-fee
- Introduction to Deep Learning, by Sebastian Raschka
- Dive into Deep Learning, by Zhang et al.
- A Collection of AI Demos to Discover and Explore
- DeepLearning.AI Short Courses, free courses from a platform created by Andrew Ng
- Introduction to Hugging Face, a free course by Codecademy
- A Gentle Introduction to Computer Vision – from Machine Learning Mastery
- Computer Vision for Heritage Collections – French-language 2 hr workshop designed to introduce computer vision applications to cultural heritage professionals
- Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification – two-part intro by the Programming Historian
- A Code-First Introduction to NLP – by Rachel Thomas of fast.ai
- NLP course and associated GitHub repo – by Elena Voita
- NLP accelerated class – by Machine Learning University
- Overview of deep learning techniques applied to NLP (2018)
- Deep Learning for NLP – from Machine Learning Mastery
- Hands-on NLTK Tutorial
- NLP in Python - Quickstart Guide
- Deep Learning for NLP With Pytorch
- A Very Gentle Introduction to LLMs without the Hype – by Mark Riedl
- What are large language models (LLMs)? – (YouTube) by Google for Developers
- A brief introduction to GenAI – by U. Michigan MIDAS
- Generative AI for Everyone – free Coursera course by Andrew Ng
- What Is ChatGPT Doing … and Why Does It Work? – by Stephen Wolfram
- The Map Of Transformers
- The Illustrated Transformer, a visual introduction to transformers
- Introduction to Generative AI, by Google
- Generative AI for Beginners - A Course, by Microsoft
- Understanding LLMs – A Transformative Reading List
- Large Language Model Course
- A Generative AI Primer, by the UK's National Centre for AI
- The AI4LAM YouTube channel has introductory presentations on many topics
- The CENL "AI in Libraries" network group is also organizing webinars on AI implementation in GLAM.
- Awesome Computer Vision
- Awesome Deep Learning for Natural Language Processing (NLP)
- Awesome Deep Learning
- Awesome Deep Learning Resources
- Awesome Deep Vision
- Awesome Document Understanding
- Awesome Generative AI
- Awesome Image Classification
- Awesome Jupyter GLAM
- Awesome LLM
- Awesome Machine Learning
- Awesome Machine Learning & Deep Learning Tutorials
- Awesome Natural Language Generation
- Awesome NLP
- Awesome Production Machine Learning
- Awesome Software Engineering for Machine Learning
- Awesome Visual Transformer
- Awesome XAI
- The NLP Index
Note: datasets for training and testing are listed in a separate section of this document.
- Arkindex – open-source platform for managing & processing collections of digitized documents
- Callico – open-source web platform for document annotation
- Coconut Libtool – web-based textual analysis tool designed to assist social scientists, librarians, or anyone in data analysis
- Distributed Annotation 'n' Enrichment (DANE) – compute task assignment & file storage for automatic annotation of content (CLARIAH, The Netherlands)
- HTRFLOW demo and associated GitHub repo – explore AI models for Handwritten Text Recognition (Swedish National Archives)
- Label Studio – data labeling platform to fine-tune LLMs, prepare training data, or validate AI models
- OCR correction – OCR correction tools (Bibliothèque nationale, Luxembourg)
- Surya – multilingual document OCR toolkit with line-level text detection
- Text models from the National Library of Sweden – available on Hugging Face
- Transkribus – transcription, recognition, & searching of historical documents
- Acoustic models from the National Library of Sweden – available on Hugging Face
- Annotorious – JavaScript image annotation library
- Audiovisual Metadata Platform (AMP) – generation of metadata for discovery & use of digital audio & video collections (Indiana U., USA)
- CAMPI – Computer-Aided Metadata Generation for Photo archives Initiative (Carnegie Mellon U., USA)
- ELAN – adds textual annotations to audio and/or video recordings (Max Planck Institute for Psycholinguistics, The Netherlands)
- inaFaceAnalyzer – Python toolbox for face-based description of gender representation in media (Institut National de l'Audiovisuel, France)
- Newspaper Navigator – explore visual & textual content in the Chronicling America digitized newspaper collection (Library of Congress, USA)
- Oodi – virtual information assistant (Helsinki Central Library)
- ReTV – video analysis & summarization (Modul University, Austria)
- VGG Image Annotator – manual annotation software for image, audio and video
- Annif and associated tutorial – tool for automated subject indexing and classification (National Library of Finland)
- GallicaPix – retrieval of heritage images (Bibliothèque nationale de France)
- GallicaSNOOP – framework for large-scale content-based image retrieval (Bibliothèque nationale de France)
- Maken Similarity Service – tools for alternative reading & finding similar photographs (National Library of Norway)
- Semantic search for Nasjonalmuseet’s online collection – open beta test (National Museum of Norway)
- VGG Text Search (VTS) Engine – search for text strings over a user-defined image set
- BERTopic – topic modeling technique that leverages Transformers and c-TF-IDF
- Chatbot for Luxembourgish newspapers – uses ChatGPT and understands French, German and English (Bibliothèque nationale du Luxembourg)
- Norwegian Transformer Model (NoTraM) – transformer model for Norwegian and Nordic languages (National Library of Norway)
- Swedish BERT – BERT model for the Swedish language (Royal Library of Sweden)
- Visual AI – open-world interpretable visual transformer (UK)
There are many (G)LAM-related datasets on Hugging Face. The following links will perform live searches directly in Hugging Face for datasets tagged with the given terms:
- Full-text search for "handwritten text recognition"
- Full-text search for "optical text recognition"
- Datasets tagged "summarization"
- Datasets tagged "feature extraction"
- Datasets tagged "image classification"
- Datasets tagged "video classification"
- Datasets tagged "text classification"
- Datasets tagged "audio classification"
- Gensim datasets – repository of datasets for unstructured text processing
- HTR datasets in Zenodo – subject search in Zenodo
- HTR-United – datasets for training transcription or segmentation models
- Kaggle datasets
- nlp-datasets – free/public domain datasets with text data for use in NLP
- Open Library data dumps – from the Internet Archive
- Open data collections from the National Library of Scotland
- Registry of Open Data on AWS – datasets tagged by topic
- Inventory of NARA Artificial Intelligence (AI) Use Cases – the US National Archives and Records Administration (NARA)'s inventory of AI use cases
- List of Artificial Intelligence (AI) initiatives in museums – compiled in 2021 by Elena Villaespesa, Oonagh Murphy and Kate Nadel for the Museums+AI Network project.
- Projects in AI Registry (PAIR) – registry of AI projects in higher education (U. Oklahoma Libraries, USA)
- Argilla prompt-collective – crowdsourcing effort to rank 50,000 prompts, on Hugging Face
- BigLAM – BigScience Libraries, Archives and Museums on Hugging Face
- Living with Machines – Turing Institute & British Library
- Machine Learning with Archive Collections
- Nasjonalbiblioteket AI Lab – National Library of Norway on Hugging Face
- KBLab – National Library of Sweden on Hugging Face
- Vatican Manuscripts – machine transcription in the Vatican Secret Archive
- PleIAs – French organization training LLMs with an open science approach
- ACM TechBrief on Generative AI, by the ACM Technology Policy Council
- Canadian Government Principles for responsible, trustworthy and privacy-protective generative AI technologies
- IFLA Statement on Libraries and Artificial Intelligence
- US Government Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence
- A cluster analysis of national AI strategies – Brookings Institution analysis of different countries’ national AI strategies, Dec. 2023
- A principled governance for emerging AI regimes: lessons from China, the European Union, and the United States by R. B. L. Dixon in AI and Ethics, 3, 793–810, 2023
- AI Governance Alliance: Briefing Paper Series – by the World Economic Forum, Jan. 2024
- AI policies across the globe: Implications and recommendations for libraries by L. S. Lo in IFLA Journal, 49(4), 645–649, 2023
- Principled Artificial Intelligence: Mapping Consensus in Ethical and Rights-Based Approaches to Principles for AI by Fjeld et al., Berkman Klein Center Research Publication No. 2020-1, 2020
- What ethics do I need to consider when using AI? – blog posting by Livi Adu, Nov. 2023
- Responsible AI in Libraries and Archives – IMLS-funded project to produce tools and strategies that support responsible use of AI in the field (2022–2025)
- A Comprehensive AI Policy Education Framework for University Teaching and Learning by C. K. Y. Chan in International Journal of Educational Technology in Higher Education, 20(38), 2023.
- A Framework for U.S. AI Governance: Creating a Safe and Thriving AI Sector – white paper by the MIT Schwarzman College of Computing, Dec. 11, 2023. (See also related article in MIT News.)
- LC Labs Artificial Intelligence Planning Framework – US Library of Congress planning framework for responsible exploration and adoption of AI
- French translation: Planification de projets IA dans les GLAM
The annual Fantastic Futures conference is the main conference series for the AI4LAM community. Various other conferences and workshops are relevant to the community and may be included in the list below.
👋🏻 Note: AI4LAM's conferences tracker Google sheet has a more complete list of events. The following is a list of larger and/or especially relevant events for AI4LAM.
- BitCurator Forum – Mar. 19–22 virtual event on digital forensics, digital archives, and related digital analysis workflows
- IIPC General Assembly & Web Archiving Conference – Apr. 24–26 at the Bibliothèque nationale de France, Paris, France.
- Digital Library Federation (DLF) 2024 Forum – Jul. 29–31 at Michigan State U., East Lansing, Michigan, USA.
- International Conference on Document Analysis and Recognition (ICDAR) 2024 – Aug. 30–Sep. 4 in Athens, Greece.
- International Conference on Digital Preservation (iPRES) 2024 – Sep. 16–20 in Ghent & Flanders, Belgium.
- Fantastic Futures 2024 – Oct. 15–18 at the National Film and Sound Archive of Australia (NFSA), Canberra, Australia.
- ai4Libraries Conference – Oct. 23 and/or 24 virtual event hosted by Georgia Tech Library, Atlanta, Georgia, USA.
- Fantastic Futures 2018 – Dec. 5 at the National Library of Norway, Oslo, Norway.
- Fantastic Futures 2019 – Dec. 4–6 at Stanford University, Stanford, California, USA.
- Fantastic Futures 2021 – Dec. 8–10 at the Bibliothèque nationale de France, Paris, France.
- Fantastic Futures 2022 – Nov. 30–Dec. 2 virtual event hosted by the British Library, London, England.
- ai4Libraries Conference – Oct. 19 virtual event hosted by Georgia Tech Library, Atlanta, Georgia, USA.
- Fantastic Futures 2023 – Nov. 15–17 at Internet Archive Canada Headquarters, Vancouver, British Columbia, Canada.
- Fantastic Futures 2024 – Oct. 15–18 at The National Film and Sound Archive of Australia (NFSA) in Canberra, Australia.
- AI & Society
- AI Magazine
- Archival Science
- Big Data & Society
- Critical AI
- Digital Humanities Quarterly
- Digital Scholarship in the Humanities
- International Journal on Digital Libraries
- Journal of Academic Librarianship
- Journal of Cultural Analytics
- Journal of Documentation
- Journal of Librarianship and Information Science
- Journal of Open Humanities Data
- Journal of the Association for Information Science and Technology
- Journal on Computing and Cultural Heritage
- Library Hi Tech
- Library Resources & Technical Services
- Literary and Linguistic Computing
- Social Science Computer Review
- World Digital Libraries – An International Journal
The AI4LAM community's home page is https://ai4lam.org. The secretariat and other contact addresses can be found at the About page.
Your help and participation in enhancing this awesome list are very much welcome! Please use the issue ticket system to request additions or changes, or to make other contributions to this repository. For more information, please visit the guidelines for contributing.
The contents of this page are licensed under the Creative Commons CC0 1.0 Universal license. CC0 is a “no rights reserved” license; the authors relinquish copyright and similar rights to the contents of the Awesome AI for LAM list.