Besides their regular text, Wikipedia articles are rich multimedia documents containing video, audio, and, above all, images. The volume of Wikipedia images is very large: English Wikipedia articles alone contain more than 5 million unique images. As observed in recent work, images are an essential component of the Wikipedia reading experience and drive significant user engagement [1]. At the same time, from a data perspective, the geographical and cultural diversity of Wikipedia makes its image data unprecedented and extremely valuable for the research community. Despite its volume and value, navigating, retrieving, and re-using visual content on Wikipedia is hard because of the lack of labels, categories, and metadata, and classifying this content for research and editing purposes is becoming increasingly important. Unfortunately, the uniqueness that makes this data so valuable also means that common off-the-shelf classification models trained on ImageNet give unsatisfactory results, so a custom solution is required.
This project is inspired by its textual counterpart, ORES, and its goal is twofold: 1) develop a classification taxonomy to label images on Wikipedia, and 2) develop a model for image classification and embedding. The first part requires familiarity with semantic network data such as the Wikipedia/Commons category network, and aims to identify the best way to label images on Wikipedia based on existing metadata (e.g. Wikipedia/Commons templates, categories, and tags). The second part will focus on training and evaluating a deep learning model that predicts, for a given image, the binary relevance of each label in the taxonomy, i.e. a multi-label classification task.
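To make the modeling task concrete, below is a minimal sketch of the multi-label setup, assuming a PyTorch/torchvision stack (a recent torchvision with the `weights=` API) and a hypothetical taxonomy of 50 labels; the actual label set would come from part 1 of the project. The ResNet-50 backbone, label count, and training details are illustrative assumptions, not project specifications.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_LABELS = 50  # hypothetical taxonomy size; the real label set comes from part 1


class WikiImageClassifier(nn.Module):
    """Multi-label image classifier: pretrained backbone + new linear head.

    The penultimate-layer activations double as an image embedding,
    covering both goals of part 2 (classification and embedding).
    """

    def __init__(self, num_labels: int = NUM_LABELS):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.embedding_dim = backbone.fc.in_features  # 2048 for ResNet-50
        backbone.fc = nn.Identity()  # strip the ImageNet classification head
        self.backbone = backbone
        self.head = nn.Linear(self.embedding_dim, num_labels)

    def forward(self, images: torch.Tensor):
        embeddings = self.backbone(images)  # (batch, 2048) image embeddings
        logits = self.head(embeddings)      # (batch, num_labels) label logits
        return logits, embeddings


model = WikiImageClassifier()
criterion = nn.BCEWithLogitsLoss()  # independent sigmoid per label (binary relevance)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One training step on a dummy batch; a real loader would yield
# preprocessed Wikipedia images and multi-hot label vectors.
images = torch.randn(8, 3, 224, 224)
targets = torch.randint(0, 2, (8, NUM_LABELS)).float()
logits, _ = model(images)
loss = criterion(logits, targets)
loss.backward()
optimizer.step()
```

Treating each label as an independent binary decision (one sigmoid per label with binary cross-entropy) is the standard "binary relevance" formulation of multi-label classification, which is one plausible reading of the task described above.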
More info: https://meta.wikimedia.org/wiki/Research:Automated_Categorization_of_Wikipedia_Images