mongo-jmdict stores the JMDict dictionary in MongoDB. Its SAX parser persists the following JMDict fields:
- ent_seq
- keb
- reb
- pos
- gloss
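As a rough illustration of how a streaming (SAX-style) parser can collect these fields, here is a minimal sketch. It uses REXML's stream listener rather than whatever parser parse.rb actually uses, and the `EntryListener` class, the hash layout, and the sample XML are all assumptions made for the example, not the project's code. Real JMDict files also declare entities (for instance in pos values) that this sketch does not handle.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener: builds one hash per <entry> holding the five fields
# listed above. Not taken from parse.rb.
class EntryListener
  include REXML::StreamListener

  LIST_FIELDS = %w[keb reb pos gloss].freeze

  attr_reader :entries

  def initialize
    @entries = []
    @entry = nil
    @text = +""
  end

  # Start a fresh entry on <entry>; reset the text buffer on every open tag.
  def tag_start(name, _attrs)
    if name == "entry"
      @entry = { "ent_seq" => nil, "keb" => [], "reb" => [], "pos" => [], "gloss" => [] }
    end
    @text = +""
  end

  # Character data may arrive in several chunks, so accumulate it.
  def text(data)
    @text << data
  end

  # On a closing tag, store the buffered text in the matching field.
  def tag_end(name)
    return if @entry.nil?

    value = @text.strip
    if name == "ent_seq"
      @entry["ent_seq"] = value.to_i
    elsif LIST_FIELDS.include?(name)
      @entry[name] << value
    elsif name == "entry"
      @entries << @entry
      @entry = nil
    end
    @text = +""
  end
end

# Illustrative sample only; pos is written as plain text instead of an entity.
SAMPLE = <<~XML
  <JMdict>
    <entry>
      <ent_seq>1358280</ent_seq>
      <k_ele><keb>食べる</keb></k_ele>
      <r_ele><reb>たべる</reb></r_ele>
      <sense>
        <pos>Ichidan verb</pos>
        <gloss>to eat</gloss>
      </sense>
    </entry>
  </JMdict>
XML

listener = EntryListener.new
REXML::Document.parse_stream(SAMPLE, listener)
puts listener.entries.inspect
```

Because the listener only keeps the entry currently being read, memory use stays flat even for a file as large as the full dictionary, which is the usual reason for choosing a SAX-style parser over a tree parser.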
Use bundle to install the required dependencies.
$ bundle install
Use --help to see the list of available options.
$ ruby parse.rb --help
Once connected to MongoDB, the script downloads the latest version of the JMDict dictionary, parses it, and persists the entries to MongoDB. If you intend to query the database, be sure to create indexes on the relevant fields. For example,
$ mongo
> db.collection.createIndex({kanji: 1})
> db.collection.createIndex({readings: 1})
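The index examples refer to `kanji` and `readings`, while the parser's field list uses the JMDict names keb and reb. The sketch below shows one way a parsed entry could be mapped to a stored document so that those indexes apply. The `to_document` helper and the keb → kanji, reb → readings mapping are assumptions for illustration; the project's actual schema may differ.

```ruby
# Hypothetical mapping from a parsed entry (JMDict element names) to the
# document shape implied by the index examples. Assumes keb is stored as
# "kanji" and reb as "readings"; this is a guess, not the project's schema.
def to_document(entry)
  {
    "ent_seq"  => entry.fetch("ent_seq"),
    "kanji"    => entry.fetch("keb", []),
    "readings" => entry.fetch("reb", []),
    "pos"      => entry.fetch("pos", []),
    "gloss"    => entry.fetch("gloss", [])
  }
end

# Made-up entry used only to exercise the helper.
doc = to_document({
  "ent_seq" => 1,
  "keb"     => ["猫"],
  "reb"     => ["ねこ"],
  "pos"     => ["noun"],
  "gloss"   => ["cat"]
})
puts doc.inspect
```

Since `kanji` and `readings` are arrays here, MongoDB would build multikey indexes on them, so a query for a single kanji form or reading can still use the index.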
This package uses the EDICT and KANJIDIC dictionary files. These files are the property of the Electronic Dictionary Research and Development Group, and are used in conformance with the Group's licence.