changes related to FooSoft/yomichan#84 #11

siikamiika · 2017-10-12T05:06:24Z

No description provided.

Seems like documentation isn't correct

FooSoft · 2017-10-12T17:33:27Z

common.go

@@ -102,6 +103,7 @@ func (terms dbTermList) crush() dbRecordList {
 			strings.Join(t.Rules, " "),
 			t.Score,
 			t.Glossary,
+			t.Sequence,


I don't know off-hand if the sequence values for JMDict are 0-based or 1-based, but if they are 0-based then this should probably be set to t.Sequence + 1. The reason for this is so that a sequence value of 0 can become a sentinel for "no sequence defined".

Seems like they start at 1000000, but I don't know how it's determined. In Yomichan, the sentinel value is currently set to -1.

Using -1 works as well, but in that case you'll have to make sure that all the other dictionaries that do not have the concept of Sequence have it initialized to that value.

Yeah, thinking about it more -1 is probably most explicit (since it's obviously an invalid sequence). You'll just have to make sure that the EPWING and ENAMDICT parsers explicitly set this value (since the default of 0 would be a valid value).
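The concern here comes from Go's zero values: an int field that a parser never assigns is 0, which would look like a real sequence number if 0 is valid. Below is a minimal sketch of the -1 sentinel. The names `noSequence`, `newTerm`, and `hasSequence`, and the trimmed-down `dbTerm`, are hypothetical and only illustrate the idea; they are not the actual yomichan-import code.

```go
package main

import "fmt"

// noSequence is a hypothetical sentinel meaning "no sequence defined".
const noSequence = -1

// dbTerm is a trimmed-down stand-in for the real struct in common.go.
type dbTerm struct {
	Expression string
	Sequence   int
}

// newTerm returns a term whose Sequence is explicitly marked as undefined,
// so parsers for formats without sequences do not fall back to Go's zero value.
func newTerm(expression string) dbTerm {
	return dbTerm{Expression: expression, Sequence: noSequence}
}

// hasSequence reports whether the term carries a real sequence number.
func hasSequence(t dbTerm) bool {
	return t.Sequence != noSequence
}

func main() {
	var forgotten dbTerm // zero value: Sequence == 0, indistinguishable from a valid sequence
	epwing := newTerm("猫")
	jmdict := dbTerm{Expression: "犬", Sequence: 1000000}

	fmt.Println(hasSequence(forgotten), hasSequence(epwing), hasSequence(jmdict))
}
```

A constructor like this is one way to make sure every parser that has no sequence concept sets the sentinel; assigning the field by hand in each parser would work equally well.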

FooSoft · 2017-10-12T17:42:02Z

common.go

+	if len(term.TermTags) == 0 {
+		return tags
+	} else {
+		return tags + "\t" + strings.Join(term.TermTags, " ")


Is the idea that tags and termTags are separated by a tab and the extension then decides which is which at runtime? I think it would be cleaner to add an actual field to the export (similar to Sequence). The output of yomichan-import should be ready for consumption by the extension, with no additional parsing taking place after that.

Yes, I chose this to make backwards compatibility easier with dictionaries exported by older versions of yomichan-import. This is what's done in Yomichan: siikamiika/yomichan@4fb983a. If a new field for termTags is added, should it come after tags or at the end like sequence?

Yeah, adding a field at the end will allow us to keep compatibility since older versions of Yomichan will just ignore it when destructuring the array.
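The compatibility argument can be sketched in Go as follows. This is an illustration under assumptions, not the project's code: the record is reduced to a few fields (the real one also carries rules, score, and glossary), and `crushTerm` and `readLegacy` are hypothetical names. The point is only that a consumer reading by position is unaffected by fields appended after the ones it knows.

```go
package main

import (
	"fmt"
	"strings"
)

// dbTerm is a simplified stand-in for the real struct.
type dbTerm struct {
	Expression string
	Reading    string
	Tags       []string
	TermTags   []string
	Sequence   int
}

// crushTerm flattens a term into a positional record.
// Newer fields (sequence, term tags) are appended at the end.
func crushTerm(t dbTerm) []interface{} {
	return []interface{}{
		t.Expression,
		t.Reading,
		strings.Join(t.Tags, " "),
		t.Sequence,
		strings.Join(t.TermTags, " "),
	}
}

// readLegacy mimics a consumer written before the new fields existed:
// it reads only the first three positions and ignores the rest.
func readLegacy(record []interface{}) (expression, reading, tags string) {
	return record[0].(string), record[1].(string), record[2].(string)
}

func main() {
	record := crushTerm(dbTerm{
		Expression: "食べる",
		Reading:    "たべる",
		Tags:       []string{"v1", "vt"},
		TermTags:   []string{"P"},
		Sequence:   1000001, // arbitrary value for illustration
	})

	expression, reading, tags := readLegacy(record)
	fmt.Println(expression, reading, tags, len(record))
}
```

Inserting termTags directly after tags would instead shift every later position and break such a reader.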

FooSoft · 2017-10-12T17:43:54Z

edict.go

 			term.Score += 100
 		case "P":
 			term.Score += 500
-		case "arch", "iK":
+		case "iK", "ik":


Hah, there are both iK and ik? Do they mean different things? Oh crazy EDICT.

FooSoft · 2017-10-12T17:49:45Z

edict.go

+			"v2r-s", "v2s-s", "v2t-k", "v2t-s", "v2w-s", "v2y-k", "v2y-s", "v2z-s", "v4b", "v4h", "v4k", "v4m", "v4r", "v4s", "v4t", "v5aru",
+			"v5b", "v5g", "v5k", "v5k-s", "v5m", "v5n", "v5r-i", "v5r", "v5s", "v5t", "v5u", "v5u-s", "vi", "vk", "vn", "vr", "vs-c", "vs-i",
+			"vs", "vs-s", "vt", "vz":
+			tag.Category = "pos"


What exactly does pos mean? Is it an abbreviation for something? Prefer the full name since it makes it easier to understand.

There are currently no categories with multiple words, so should I use partOfSpeech, part-of-speech, or something else? Either way, I think it must be compatible with CSS classes, so it can't contain spaces.

Yeah, having spaces would definitely not be good. I think partOfSpeech would be the most consistent way to name this.
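As a rough sketch of the naming constraint, the snippet below assigns the agreed category to a handful of the tag codes from the diff and checks that a category contains no whitespace, since whitespace would split it into several CSS class names. `categoryForTag` and `isClassSafe` are hypothetical helpers, and the check covers whitespace only, not the full CSS identifier grammar.

```go
package main

import (
	"fmt"
	"strings"
)

// categoryForTag maps an EDICT tag code to a category name.
// Only a few of the codes from the diff are listed here.
func categoryForTag(code string) string {
	switch code {
	case "v5k", "v5r", "vi", "vt":
		return "partOfSpeech"
	default:
		return ""
	}
}

// isClassSafe reports whether a category can be used as a single
// CSS class token, i.e. it is non-empty and contains no whitespace.
func isClassSafe(category string) bool {
	return category != "" && !strings.ContainsAny(category, " \t\r\n\f")
}

func main() {
	fmt.Println(categoryForTag("v5k"), isClassSafe(categoryForTag("v5k")))
	fmt.Println(isClassSafe("part of speech"))
}
```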

FooSoft · 2017-10-12T17:58:04Z

common.go

@@ -77,6 +77,7 @@ type dbTerm struct {
 	Expression string
 	Reading    string
 	Tags       []string


Now that we have termTags, it would probably make sense to rename tags to something that more specifically reflects what they are tagging.

siikamiika · 2017-10-13T00:30:38Z

Instead of hard-coding Sequence = -1 for everything, I added a sequence counter to epwing.go that increments every time extractor.extractTerms is called. JMnedict had Sequence already. The rikai solution isn't perfect, but I'm not familiar enough with the format to say whether it's possible to infer the original "sequence" from it (unless you hash by glossary or something).
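The counter approach described above might look roughly like this. It is a sketch under assumptions: the `entry` type, the `extractTerms` signature, and the starting value of 0 are all invented for illustration, and the real epwing.go may differ. What it shows is that every term produced by one call shares a sequence number, and the next call gets the next number.

```go
package main

import "fmt"

// dbTerm is a trimmed-down stand-in for the real struct.
type dbTerm struct {
	Expression string
	Sequence   int
}

// entry is a hypothetical EPWING entry that may yield several terms.
type entry struct {
	Expressions []string
}

// extractTerms is a hypothetical extractor: every term taken from the
// same entry receives the same sequence number.
func extractTerms(e entry, sequence int) []dbTerm {
	terms := make([]dbTerm, 0, len(e.Expressions))
	for _, expression := range e.Expressions {
		terms = append(terms, dbTerm{Expression: expression, Sequence: sequence})
	}
	return terms
}

// extractAll increments the counter once per extractTerms call.
func extractAll(entries []entry) []dbTerm {
	var terms []dbTerm
	sequence := 0 // starting value is an assumption
	for _, e := range entries {
		terms = append(terms, extractTerms(e, sequence)...)
		sequence++
	}
	return terms
}

func main() {
	entries := []entry{
		{Expressions: []string{"見る", "観る"}},
		{Expressions: []string{"聞く"}},
	}
	for _, t := range extractAll(entries) {
		fmt.Println(t.Expression, t.Sequence)
	}
}
```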

FooSoft · 2017-10-13T15:40:10Z

Looks good, thanks for the updates!

siikamiika added 8 commits October 11, 2017 05:30

edict: add sequence

46492eb

add TermTags

23488d9

edict: respect NoKanji

edb1628

edict: fix Score

c6c50ce

edict: fix PartsOfSpeech and Misc

28bd180

edict: add Category "pos"

5437ad2

edict: fix PartsOfSpeech and Misc with Restricted*

1ee4aae

edict: undo persistent Misc

3100950


FooSoft reviewed Oct 12, 2017


siikamiika added 3 commits October 13, 2017 02:46

rename Tags to DefinitionTags

26d01e0

add Sequence to other dictionary formats

8252612

rename pos to partOfSpeech

f5e195a

FooSoft merged commit cc4140f into FooSoft:dev Oct 13, 2017
