This repository has been archived by the owner on Feb 25, 2023. It is now read-only.

Request: Separate Daijirin's J-J and J-E versions #16

Open
anonymouse333 opened this issue Mar 31, 2018 · 1 comment


@anonymouse333

The Daijirin EPWING dictionary comes with both J-J and J-E definitions. Ideally, Yomichan Import should split these into two separate dictionaries so users can choose to add either only the J-J version or only the J-E version to Yomichan.

Alternatively, if the dictionary can't be converted into two separate versions at once, the user should be given the option to strip one version out during the conversion process, leaving them with either only a J-J version or only a J-E version.

@rnpnr
Contributor

rnpnr commented Jun 25, 2021

Here is a hacky diff to do just that: it skips any entry whose text contains →英和, which strips out the J-E definitions. Reverse the condition to get a dictionary containing only the J-E definitions.

Note that this does not remove the English-only entries, but in my experience those aren't the ones that show up when you don't want them to. As far as I know it doesn't remove any entries incorrectly, but the diff between the (pretty-printed) JSON files is 400K lines long, so I didn't look at the whole thing.

diff --git a/daijirin.go b/daijirin.go
index 5983918..46b11a1 100644
--- a/daijirin.go
+++ b/daijirin.go
@@ -29,6 +29,7 @@ import (
 )
 
 type daijirinExtractor struct {
+	engGlossExp  *regexp.Regexp
 	partsExp     *regexp.Regexp
 	readGroupExp *regexp.Regexp
 	expVarExp    *regexp.Regexp
@@ -39,6 +40,7 @@ type daijirinExtractor struct {
 
 func makeDaijirinExtractor() epwingExtractor {
 	return &daijirinExtractor{
+		engGlossExp:  regexp.MustCompile(`→英和`),
 		partsExp:     regexp.MustCompile(`([^(【〖]+)(?:【(.*)】)?(?:〖(.*)〗)?(?:((.*)))?`),
 		readGroupExp: regexp.MustCompile(`[-・]+`),
 		expVarExp:    regexp.MustCompile(`\(([^\)]*)\)`),
@@ -49,6 +51,10 @@ func makeDaijirinExtractor() epwingExtractor {
 }
 
 func (e *daijirinExtractor) extractTerms(entry zig.BookEntry, sequence int) []dbTerm {
+	if e.engGlossExp.FindStringIndex(entry.Text) != nil {
+		return nil
+	}
+
 	matches := e.partsExp.FindStringSubmatch(entry.Heading)
 	if matches == nil {
 		return nil
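
The check in the diff above, together with the "reverse the condition" variant, could be sketched as a standalone filter with a selectable mode. This is only an illustration, not code from yomichan-import: `glossMode`, `keepEntry` and the sample strings are hypothetical names and data. It assumes, as the diff does, that the marker →英和 in an entry's text identifies a J-E definition. It uses `strings.Contains` in place of the regexp because the marker is a fixed string.

```go
package main

import (
	"fmt"
	"strings"
)

// glossMode is a hypothetical option; yomichan-import has no such setting.
type glossMode int

const (
	keepAll glossMode = iota // no filtering
	keepJJ                   // drop entries carrying the marker (what the diff does)
	keepJE                   // keep only entries carrying the marker (reversed condition)
)

// engGlossMarker is the string the diff matches against entry.Text.
const engGlossMarker = "→英和"

// keepEntry reports whether an entry with the given text should be kept.
// Assumption: the marker appears only in entries with J-E definitions.
func keepEntry(text string, mode glossMode) bool {
	hasMarker := strings.Contains(text, engGlossMarker)
	switch mode {
	case keepJJ:
		return !hasMarker
	case keepJE:
		return hasMarker
	default:
		return true
	}
}

func main() {
	// Made-up sample texts, not real Daijirin entries.
	samples := []string{
		"sample entry text →英和",
		"sample entry text",
	}
	for _, s := range samples {
		fmt.Printf("%q keepJJ=%v keepJE=%v\n", s, keepEntry(s, keepJJ), keepEntry(s, keepJE))
	}
}
```

In `extractTerms`, the equivalent guard would be `if !keepEntry(entry.Text, mode) { return nil }`, with `mode` coming from wherever the converter reads its options. As with the diff, English-only entries are not treated specially here.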
