This repository has been archived by the owner on Feb 25, 2023. It is now read-only.
Introduction
This is a dictionary I have wanted to use for a while, but it did not work with rikaisama or with another old desktop epwing reader application. With the exception of a very small number of cases noted below, all the useful information contained in the epwing can be extracted.
This dictionary is apparently still being produced and sold by 三省堂 as part of the 新明解 dictionaries; however, the title contained in the epwing merely says 故事ことわざの辞典. This epwing may be a much older revision that predates the 新明解 label.
Regexes
Almost all of the headings are remarkably clean. The heading text is simply the proverb or idiom, with a reading for every word.
哀哀(あいあい)たる父母(ふぼ)我(われ)を生(う)みて苦労(くろう)せり
However, there are headings which contain alternate forms.
The approach taken in the implementation of the extractor is to determine all reduced forms of the expression. A reduced form of an expression is one which has no word alternatives. Then for each reduced form, all possible readings are determined for that particular form.
The following is an example parse of a simple case where there are three possible variations of a proverb.
--
今参(いままい)り=二十日(はつか)〔=百日(ひゃくにち)・三日(みっか)〕
-> 今参(いままい)り二十日(はつか)
or 今参(いままい)り百日(ひゃくにち)
or 今参(いままい)り三日(みっか)
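The expansion into reduced forms can be sketched roughly as follows. This is only an illustration of the idea, not the extractor's actual code: the names `GROUP` and `reduced_forms` are hypothetical, and the sketch assumes every `=` group has the shape `=primary〔=alt・alt〕`. Headings that break this assumption (the lone `=` exception and alternatives that themselves contain `・` inside a reading) are not handled here either.

```python
import itertools
import re

# Hypothetical pattern: "=" + primary word, then "〔=" + alternatives + "〕".
GROUP = re.compile(r"=([^=〔〕]+)〔=([^〕]+)〕")


def reduced_forms(heading):
    """Return every form of the heading with all word alternatives resolved."""
    parts = []  # each element is a list of interchangeable choices
    pos = 0
    for m in GROUP.finditer(heading):
        # Literal text between groups has exactly one choice.
        parts.append([heading[pos:m.start()]])
        # The primary word plus each alternative separated by "・".
        # Note: this naive split is wrong when "・" appears inside a reading.
        parts.append([m.group(1)] + m.group(2).split("・"))
        pos = m.end()
    parts.append([heading[pos:]])
    return ["".join(combo) for combo in itertools.product(*parts)]


if __name__ == "__main__":
    for form in reduced_forms("今参(いままい)り=二十日(はつか)〔=百日(ひゃくにち)・三日(みっか)〕"):
        print(form)
```

Building a list of choices per segment and taking the Cartesian product keeps headings with several `=` groups simple: each group multiplies the number of reduced forms.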
The possible scenarios for alternate forms are described below.
Word alternatives are indicated by the = character. Every = followed by non-bracketed text is paired with alternatives enclosed by 〔= and 〕. There are 886 headings with such alternatives.
燕雀(えんじゃく)=安(いずく)んぞ〔=何(なん)ぞ〕=鴻鵠(こうこく)〔=大鵬(たいほう)〕の=志(こころざし)〔=心(こころ)〕を知(し)らんや
There is 1 heading that is an exception to the = indicator and has no alternatives:
勝地(しょうち)は=主(ぬし)なし
A quick Google search for 勝地は主なし does not return any exact matches; instead, 勝地定主無し comes up, which appears to have the same meaning. Nevertheless, this single exception is left unhandled so as not to unnecessarily complicate the regex.
Alternatives are also specified with the ・ character. This can denote alternatives for readings (the most common case). There are 197 headings with reading alternatives enclosed by ().
鰯(いわし)の頭(かしら・あたま)も信心(しんじん)から
There are at most 3 reading alternatives listed and this is the case for exactly 1 heading.
藪(やぶ)に馬鍬(まぐわ・まんぐわ・うまぐわ)
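Once a heading has no word alternatives left, its readings can be enumerated in a similar way. The following is a minimal sketch under stated assumptions, not the extractor's actual implementation: `READING` and `expand_readings` are hypothetical names, and the base of each reading is assumed to be the run of non-kana characters immediately before the parenthesis. Kanji with no reading attached are simply left as they are in the output.

```python
import itertools
import re

# Hypothetical pattern: a run of non-kana characters (the base), followed by
# a parenthesised reading. ASCII and full-width parentheses are both accepted.
READING = re.compile(r"([^\u3040-\u30ff(()),、。]+)[((]([^))]+)[))]")


def expand_readings(form):
    """Return (expression, readings) for a heading without word alternatives."""
    pieces = []  # each element is a list of interchangeable readings
    pos = 0
    for m in READING.finditer(form):
        # Kana between annotated words reads as itself.
        pieces.append([form[pos:m.start()]])
        # Alternative readings are separated by "・" inside the parentheses.
        pieces.append(m.group(2).split("・"))
        pos = m.end()
    pieces.append([form[pos:]])
    # The expression is the heading with every parenthesised reading removed.
    expression = READING.sub(lambda m: m.group(1), form)
    return expression, ["".join(combo) for combo in itertools.product(*pieces)]


if __name__ == "__main__":
    expression, readings = expand_readings("鰯(いわし)の頭(かしら・あたま)も信心(しんじん)から")
    print(expression)
    for reading in readings:
        print(reading)
```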
The ・ character can also denote alternatives for words enclosed by 〔= and 〕. There are 26 headings with additional word alternatives denoted by the ・ character.
泣(な)く子(こ)も目(め)を=開(あ)く〔=見(み)る・見よ〕
There are at most 3 additional word alternatives denoted by the ・ character, and this is the case for exactly 3 headings.
内弁慶(うちべんけい)外(そと)=味噌(みそ)〔=菜虫(なむし)・すばり・鼠(ねずみ)〕
下手(へた)の長糸(ながいと)、上手(じょうず)の=小糸(こいと)〔=手糸(ていと)・一寸(いっすん)・一尺(いっしゃく)〕
雪(ゆき)は豊年(ほうねん)の=瑞(しるし)〔=例(ためし)・貢(みつぎ)・貢物(みつぎもの)〕
There are exactly 9 headings in which the primary word of a group (the primary word and its alternatives) has more than one reading.
倚門(いもん)の=望(ぼう・のぞみ)〔=情(じょう)〕
口(くち)は禍(わざわい)の=門(かど・もん)〔=元(もと)〕
There are exactly 2 headings which contain word alternatives which also contain reading alternatives.
乞食(こじき)に=筋(すじ)〔=種(しゅ・たね)〕無(な)し
孫(まご)飼(か)わんより=犬(いぬ)の子(こ)〔=犬子(えのこ・えのころ)〕飼え
This case is left unhandled so as not to unnecessarily complicate the regex.
There are exactly 2 headings with word alternatives where readings do not immediately follow a kanji.
男(おとこ)の子(こ)は=父(ちち)〔=男・男親(おとこおや)〕に付(つ)く
藪(やぶ)に=功(こう)の者(もの)〔=功・功者(こうしゃ)〕
This case is left unhandled so as not to unnecessarily complicate the regex.
Glyph Tables
Thankfully, none of the bitmap glyphs are ever referenced in the dictionary entries.
Tags
Putting aside the fact that there is no grammatical metadata included in the entries, I do not think it is necessarily a good idea to try and apply deinflection on proverbs. In any case, there is no feasible way to determine grammatical metadata for the entries.