Using QLever for Wikidata
Setting up a QLever instance for a fresh copy of the complete Wikidata is very easy.
You need a machine with at least 32 GB of RAM (64 GB is better) and 2 TB of disk space (an SSD is best, but an HDD also works). Download the qlever script and follow the simple instructions given on that page. Once you have downloaded and started the script, it is largely self-explanatory.
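For reference, the qlever script is distributed as a Python package, so on a machine with a standard Python setup it can typically be installed like this (check the script's page for the current installation instructions):

pip install qlever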
The following commands create a Qleverfile (QLever's config file for everything) for Wikidata, download the dataset (around 100 GB compressed), load the data into QLever (that is, build an index), start the server, start the UI, and enable fast autocompletion (if you want that).
mkdir wikidata && cd wikidata
qlever setup-config wikidata
qlever get-data
qlever index
qlever start
qlever ui
qlever autocompletion-warmup
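Once the server is up, it speaks the standard SPARQL protocol over HTTP, so you can send queries to it directly. As a quick sanity check, the following sketch sends a query with curl; it assumes the server runs on port 7001 (the port is configured in the Qleverfile), so adjust as needed:

# Count all triples in the index via a SPARQL query over HTTP.
curl -s http://localhost:7001 \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }"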
The following statistics on the data loading (= index building) time and index size are from a build on 21.01.2024, on a PC with an AMD Ryzen 9 5900X processor (16 cores), 128 GB RAM, and 7.3 TB of NVMe SSD space.
Parse input : 1.4 h
Build vocabularies : 0.4 h
Convert to global IDs : 0.2 h
Permutation SPO & SOP : 0.6 h
Permutation OSP & OPS : 0.9 h
Permutation PSO & POS : 0.9 h
TOTAL index build time : 4.4 h
54G wikidata.index.ops
108G wikidata.index.ops.meta
51G wikidata.index.osp
108G wikidata.index.osp.meta
2.8G wikidata.index.patterns
72G wikidata.index.pos
2.3M wikidata.index.pos.meta
73G wikidata.index.pso
2.3M wikidata.index.pso.meta
39G wikidata.index.sop
94G wikidata.index.sop.meta
41G wikidata.index.spo
94G wikidata.index.spo.meta
206G wikidata.vocabulary.external
54G wikidata.vocabulary.external.idsAndOffsets.mmap
908K wikidata.vocabulary.internal
992G total
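The listing above shows the per-file disk usage of the index. Assuming the index was built under the basename wikidata, as in the commands above, a similar listing can be produced with a command like:

# Per-file sizes plus a grand total for all index and vocabulary files.
du -hc wikidata.index.* wikidata.vocabulary.*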