This repository has been archived by the owner on Feb 25, 2023. It is now read-only.

add epwing support for kotowaza #4

Merged
merged 1 commit into from Mar 21, 2017

@ghost ghost commented Mar 21, 2017

Introduction

This is a dictionary I have wanted to use for a while, but it worked with neither rikaisama nor an older desktop epwing reader application. With the exception of a very small number of cases noted below, all the useful information contained in the epwing can be extracted.

This dictionary apparently is still being produced and sold by 三省堂 as part of the 新明解 dictionaries, however the title contained in the epwing merely says 故事ことわざの辞典. This epwing may be of a much older revision before the 新明解 label.

Regexes

Almost all of the headings are remarkably clean. The heading text is simply the proverb or idiom, with a reading for every word.

  • 哀哀(あいあい)たる父母(ふぼ)我(われ)を生(う)みて苦労(くろう)せり

However, there are headings which contain alternate forms.

The approach taken in the implementation of the extractor is to determine all reduced forms of the expression. A reduced form of an expression is one which has no word alternatives. Then for each reduced form, all possible readings are determined for that particular form.

The following is an example parse of a simple case where there are three possible variations of a proverb.

-- 今参(いままい)り=二十日(はつか)〔=百日(ひゃくにち)・三日(みっか)〕 -> 今参(いままい)り二十日(はつか) or 今参(いままい)り百日(ひゃくにち) or 今参(いままい)り三日(みっか)
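The expansion shown above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in this pull request: the names `WORD_GROUP` and `reduced_forms` are hypothetical, and the pattern assumes that every word group has the shape `=primary〔=alt・alt…〕` (a full-width `＝` is accepted as well, in case the epwing text uses it).

```python
import itertools
import re

# Hypothetical pattern: "=primary〔=alt1・alt2〕".
# Assumption: the primary word never contains "=", "〔" or "〕".
WORD_GROUP = re.compile(r"[=＝]([^=＝〔〕]+)〔[=＝]([^〕]+)〕")


def reduced_forms(heading):
    """Return every form of the heading with one word chosen per group."""
    pieces = []  # each element is a list of interchangeable strings
    pos = 0
    for match in WORD_GROUP.finditer(heading):
        # Literal text between the previous group and this one.
        pieces.append([heading[pos:match.start()]])
        # The primary word followed by its bracketed alternatives.
        pieces.append([match.group(1)] + match.group(2).split("・"))
        pos = match.end()
    pieces.append([heading[pos:]])
    return ["".join(combo) for combo in itertools.product(*pieces)]


for form in reduced_forms("今参(いままい)り=二十日(はつか)〔=百日(ひゃくにち)・三日(みっか)〕"):
    print(form)
```

In this sketch a heading with several groups yields the product of all choices, and a heading whose `=` has no bracketed group is returned unchanged.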

Below is the information about possible scenarios regarding alternate forms.

Word alternatives are indicated by the = character. Every = with non-bracketed text that follows is paired with alternatives enclosed by 〔= and 〕. There are 886 headings with such alternatives.

  • 燕雀(えんじゃく)=安(いずく)んぞ〔=何(なん)ぞ〕=鴻鵠(こうこく)〔=大鵬(たいほう)〕の=志(こころざし)〔=心(こころ)〕を知(し)らんや

There is 1 heading that is an exception to the = indicator and has no alternatives:

  • 勝地(しょうち)は=主(ぬし)なし

A quick Google search for 勝地は主なし turns up no exact matches; instead 勝地定主無し comes up, which appears to have the same meaning. Nevertheless, this single exception is left unhandled so as not to unnecessarily complicate the regex.

Alternatives are also specified with the ・ character. This can denote alternatives for readings (the most common case).

There are 197 headings with reading alternatives enclosed by ().

  • 鰯(いわし)の頭(かしら・あたま)も信心(しんじん)から

There are at most 3 reading alternatives listed and this is the case for exactly 1 heading.

  • 藪(やぶ)に馬鍬(まぐわ・まんぐわ・うまぐわ)
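As a rough illustration of how the readings of one reduced form could be enumerated, the sketch below strips the parenthesised readings to get the expression and builds one reading string per combination of `・` alternatives. The names `KANJI_WITH_READINGS` and `readings_for` are hypothetical, and the sketch assumes that every kanji run is immediately followed by its reading in parentheses.

```python
import itertools
import re

# Hypothetical pattern: a run of kanji followed by "(reading・reading…)".
# Both ASCII and full-width parentheses are accepted.
KANJI_WITH_READINGS = re.compile(r"([\u4e00-\u9fff々]+)[(（]([^)）]+)[)）]")


def readings_for(form):
    """Return (expression, readings) for a form without word alternatives."""
    # The expression keeps the kanji and drops the parenthesised readings.
    expression = KANJI_WITH_READINGS.sub(r"\1", form)
    pieces = []
    pos = 0
    for match in KANJI_WITH_READINGS.finditer(form):
        pieces.append([form[pos:match.start()]])  # kana between kanji runs
        pieces.append(match.group(2).split("・"))  # one entry per reading
        pos = match.end()
    pieces.append([form[pos:]])
    readings = ["".join(combo) for combo in itertools.product(*pieces)]
    return expression, readings


expression, readings = readings_for("藪(やぶ)に馬鍬(まぐわ・まんぐわ・うまぐわ)")
print(expression, readings)
```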

The ・ character can also denote alternatives for words enclosed by 〔= and 〕.

There are 26 headings with additional word alternatives denoted by the ・ character.

  • 泣(な)く子(こ)も目(め)を=開(あ)く〔=見(み)る・見よ〕

There are at most 3 additional word alternatives denoted by the ・ character and this is the case for exactly 3 headings.

  • 内弁慶(うちべんけい)外(そと)=味噌(みそ)〔=菜虫(なむし)・すばり・鼠(ねずみ)〕
  • 下手(へた)の長糸(ながいと)、上手(じょうず)の=小糸(こいと)〔=手糸(ていと)・一寸(いっすん)・一尺(いっしゃく)〕
  • 雪(ゆき)は豊年(ほうねん)の=瑞(しるし)〔=例(ためし)・貢(みつぎ)・貢物(みつぎもの)〕

There are exactly 9 headings where for a group of words (the primary and its alternatives), the primary word has more than one reading.

  • 倚門(いもん)の=望(ぼう・のぞみ)〔=情(じょう)〕
  • 口(くち)は禍(わざわい)の=門(かど・もん)〔=元(もと)〕
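Headings like these fall out naturally if word alternatives are expanded first and reading alternatives second, as described in the approach above. The following is a minimal sketch of that ordering, not the extractor's actual code: `expand`, `entries`, and both patterns are hypothetical names, and it assumes every kanji run carries a reading.

```python
import itertools
import re

# Hypothetical patterns; ASCII and full-width "=" and parentheses are accepted.
WORD_GROUP = re.compile(r"[=＝]([^=＝〔〕]+)〔[=＝]([^〕]+)〕")
READING = re.compile(r"([\u4e00-\u9fff々]+)[(（]([^)）]+)[)）]")


def expand(text, pattern, options):
    """Substitute each match with every string from options(match)."""
    pieces, pos = [], 0
    for match in pattern.finditer(text):
        pieces.append([text[pos:match.start()]])
        pieces.append(options(match))
        pos = match.end()
    pieces.append([text[pos:]])
    return ["".join(combo) for combo in itertools.product(*pieces)]


def entries(heading):
    """Return (expression, reading) pairs for every variation of a heading."""
    result = []
    # Step 1: pick one word per group to obtain the reduced forms.
    words = lambda m: [m.group(1)] + m.group(2).split("・")
    for form in expand(heading, WORD_GROUP, words):
        expression = READING.sub(r"\1", form)
        # Step 2: enumerate the readings of that particular form.
        for reading in expand(form, READING, lambda m: m.group(2).split("・")):
            result.append((expression, reading))
    return result


for pair in entries("倚門(いもん)の=望(ぼう・のぞみ)〔=情(じょう)〕"):
    print(pair)
```

Because the primary word is taken whole before its readings are split, a `・` inside its parentheses is not mistaken for a word separator in this sketch.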

There are exactly 2 headings which contain word alternatives which also contain reading alternatives.

  • 乞食(こじき)に=筋(すじ)〔=種(しゅ・たね)〕無(な)し
  • 孫(まご)飼(か)わんより=犬(いぬ)の子(こ)〔=犬子(えのこ・えのころ)〕飼え

This case is left unhandled so as not to unnecessarily complicate the regex.

There are exactly 2 headings with word alternatives where readings do not immediately follow a kanji.

  • 男(おとこ)の子(こ)は=父(ちち)〔=男・男親(おとこおや)〕に付(つ)く
  • 藪(やぶ)に=功(こう)の者(もの)〔=功・功者(こうしゃ)〕

This case is left unhandled so as not to unnecessarily complicate the regex.

Glyph Tables

Thankfully, none of the bitmap glyphs are ever referenced in the dictionary entries.

Tags

Putting aside the fact that there is no grammatical metadata included in the entries, I do not think it is necessarily a good idea to try and apply deinflection on proverbs. In any case, there is no feasible way to determine grammatical metadata for the entries.

@FooSoft FooSoft merged commit 8fd1282 into FooSoft:master Mar 21, 2017

FooSoft commented Mar 21, 2017

Excellent work, I was pretty interested in this dictionary as well, but didn't have the time to hook it up 👍
Thanks again for the codes and clear explanation!
