-
Notifications
You must be signed in to change notification settings - Fork 23
Conversation
Seems like documentation isn't correct
@@ -102,6 +103,7 @@ func (terms dbTermList) crush() dbRecordList { | |||
strings.Join(t.Rules, " "), | |||
t.Score, | |||
t.Glossary, | |||
t.Sequence, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know off-hand if the sequence values for JMDict are 0-based or 1-based, but if they are 0-based then this should probably be set to t.Sequence + 1
. The reason for this is so that a sequence value of 0
can become a sentinel for "no sequence defined".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like they start at 1000000, but I don't know how it's determined. In Yomichan, the sentinel value is currently set to -1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using -1
works as well, but in that case you'll have to make sure that all the other dictionaries that do not have the concept of Sequence
have it initialized to that value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, thinking about it more -1
is probably most explicit (since it's obviously an invalid sequence). You'll just have to make sure that the EPWING and ENAMDICT parsers explicitly set this value (since the default of 0
would be a valid value).
common.go
Outdated
if len(term.TermTags) == 0 { | ||
return tags | ||
} else { | ||
return tags + "\t" + strings.Join(term.TermTags, " ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea that tags
and termTags
are separated by a tab and then the extension decides what is which is which at runtime? I think it would be cleaner to add an actual field to the export (similar to Sequence). The output of yomichan-import
should be ready for consumption by the extension and there should not be any additional parsing taking place after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I decided for this to enable easier backwards compatibility with dictionaries exported with older version of yomichan-import. This is what's done in Yomichan: siikamiika/yomichan@4fb983a. If a new field for termTags is added, should it come after tags or at the end like sequence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, adding a field at the end will allow us to keep compatibility since older versions of Yomichan will just ignore it when destructuring the array.
edict.go
Outdated
term.Score += 100 | ||
case "P": | ||
term.Score += 500 | ||
case "arch", "iK": | ||
case "iK", "ik": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, there are both iK
and ik
? Do they mean different things? Oh crazy EDICT.
edict.go
Outdated
"v2r-s", "v2s-s", "v2t-k", "v2t-s", "v2w-s", "v2y-k", "v2y-s", "v2z-s", "v4b", "v4h", "v4k", "v4m", "v4r", "v4s", "v4t", "v5aru", | ||
"v5b", "v5g", "v5k", "v5k-s", "v5m", "v5n", "v5r-i", "v5r", "v5s", "v5t", "v5u", "v5u-s", "vi", "vk", "vn", "vr", "vs-c", "vs-i", | ||
"vs", "vs-s", "vt", "vz": | ||
tag.Category = "pos" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly does pos
mean? Is it an abbreviation for something? Prefer the full name since it makes it easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are currently no categories that have multiple words, so should I use partOfSpeech
part-of-speech
or something else? Anyway, I think it must be compatible with CSS classes so it can't have spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, having spaces would definitely not be good. I think partOfSpeech
would be the most consistent way to name this.
common.go
Outdated
@@ -77,6 +77,7 @@ type dbTerm struct { | |||
Expression string | |||
Reading string | |||
Tags []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the point where we have termTags
, it would probably make sense to rename tags
to be be something that more specifically reflects what they are tagging.
Instead of hard coding |
Looks good, thanks for the updates! |
No description provided.