c-martinez edited this page Dec 9, 2015 · 1 revision

The ShiCo server uses Word2Vec models created from the KB data set. This page describes the models we provide and explains how to create your own.

Provided Models

We provide a set of Word2Vec models for tracking shifting concepts in Dutch. These models were created from the KB newspaper archives between 1950 and 1990. Each model spans a 10-year period, and the periods covered by consecutive models overlap. Documents in the newspaper archive were pre-processed using the same methodology as described in this script.
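To make the idea of overlapping 10-year periods concrete, here is a minimal sketch that enumerates such windows. The function name `decade_windows` and the 5-year step between window starts are assumptions made for illustration; this page does not state how far apart the provided models actually start.

```python
def decade_windows(first_year, last_year, span=10, step=5):
    """Yield inclusive (from_year, to_year) pairs, each covering `span` years.

    `step` is the gap between consecutive window starts. Any step smaller
    than `span` makes neighbouring windows overlap. The default of 5 is a
    hypothetical choice, not the value used for the provided models.
    """
    start = first_year
    while start + span - 1 <= last_year:
        yield start, start + span - 1
        start += step


windows = list(decade_windows(1950, 1990))
for from_year, to_year in windows:
    print(from_year, to_year)
```

With a step equal to the span the windows would no longer overlap, so the step is the parameter to adjust when planning your own set of models.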

LFS

Because the word2vec models are large (about 30 GB), they are distributed using Git LFS. If you want to use these models, install Git LFS and pull the models from our repo.

Creating your own model

You may want to create your own set of models for tracking shifting concepts, for example in another language or for a different time period. We use the Gensim implementation of word2vec. Consider the following points when creating your own models:

  • This tutorial may be useful to guide you in creating individual models.
  • Please name your models following the same convention used here: <from year>_<to year>.w2v, for example: 1950_1959.w2v
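The points above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the procedure used to build the provided models: the helper names (`model_filename`, `parse_model_filename`, `train_and_save`) are hypothetical, the training call relies on Gensim's default hyperparameters apart from `min_count`, and saving in binary word2vec format is an assumption, since this page does not say which on-disk format ShiCo expects.

```python
import os
import re

# Matches names such as 1950_1959.w2v
MODEL_NAME = re.compile(r"^(\d{4})_(\d{4})\.w2v$")


def model_filename(from_year, to_year):
    """Build a file name following the <from year>_<to year>.w2v convention."""
    if from_year > to_year:
        raise ValueError("from_year must not be later than to_year")
    return f"{from_year}_{to_year}.w2v"


def parse_model_filename(name):
    """Return (from_year, to_year) for a file name that follows the convention."""
    match = MODEL_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid model file name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def train_and_save(sentences, from_year, to_year, out_dir="."):
    """Train one model on pre-tokenised sentences and save it (hypothetical helper).

    `sentences` is an iterable of token lists, e.g. [["een", "zin"], ...].
    """
    # Imported here so the naming helpers above work without Gensim installed.
    from gensim.models import Word2Vec

    # min_count=1 keeps every word, which only makes sense for tiny test corpora.
    model = Word2Vec(sentences, min_count=1)
    path = os.path.join(out_dir, model_filename(from_year, to_year))
    # Assumption: binary word2vec format; adjust if your ShiCo setup differs.
    model.wv.save_word2vec_format(path, binary=True)
    return path


print(model_filename(1950, 1959))
print(parse_model_filename("1950_1959.w2v"))
```

Calling `train_and_save(sentences, 1950, 1959)` once per time period, each time with the sentences from that period only, would produce one file per period.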