Extending support for honorific speech / 敬語 #1822
Looking very promising. Thank you!
A few general comments:
- Do you mind if we leave out the honorific part from this PR? I have some ideas about doing that and in particular I'd like to avoid false positives such as matching ご会計, ご電話, ご見積, お利用 etc. (I furthermore would like to make it so that when it matches 御利用 etc. it indicates that 御 is read ご in this case. Maybe even indicating that 御 could be み in some common cases like 御心, 御国)
- When we indicate "dialectal", can we indicate which dialects?
I think "< respectful < imperative" would be fine but I'd like to avoid the "masu-stem" part if possible. That seems like a regression in terms of clarity.
I'd much prefer we refer to specific dialects.
I'm not sure if that's necessary, or at least not yet. If a user is scanning text like "助けてください" it's true they only get "助ける (te form)" followed by "ください = please" but if anything that probably suggests we should make the "te form" label more helpful? That way if the user is scanning "助けて" in isolation they can understand it might be a request?
I agree, we probably don't want that.
I'm not sure I've fully grasped the difference yet (I've only given the PR a quick scan) but my initial hunch is not to introduce multiple ways of representing the same thing, i.e. use an empty array instead of Reason.None.
Absolutely not. I shouldn't have pushed it to this branch in the first place. This implementation is as naive as it gets and my original plan was to make it work together with masu-stem + する/なさる/になる.
I just wonder how broad we want to be. We can't know or list every dialect, so which ones do we want to support? How detailed do we want to be? For example, should we distinguish between Kyoto-ben, Osaka-ben, Hakata-ben etc. or should we refer to broader categories like Kansai-ben or even broader like Western Japan?
Great, thank you!
I think we can restrict ourselves to the dialects in JMdict which should be these ones: http://www.edrdg.org/jmwsgi/edhelp.py?svc=jmdict&sid=#kw_dial
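To make the "restrict ourselves to the dialects in JMdict" idea concrete, here is a minimal sketch of a tag-to-label lookup. The names `DIALECT_LABELS` and `dialectLabel` are hypothetical and not taken from this PR, the list is deliberately partial, and the tag keys are written as I recall them from JMdict, so they should be checked against the linked keyword list before use.

```typescript
// Hypothetical lookup table; not the project's actual code.
// Assumption: the keys match JMdict's dialect keywords (verify against the
// linked list). Only a subset is shown here.
const DIALECT_LABELS = new Map<string, string>([
  ['hob', 'Hokkaido dialect'],
  ['ksb', 'Kansai dialect'],
  ['ktb', 'Kantou dialect'],
  ['kyb', 'Kyoto dialect'],
  ['kyu', 'Kyuushuu dialect'],
  ['osb', 'Osaka dialect'],
  ['thb', 'Touhoku dialect'],
]);

// Returns a specific label when the tag is known and falls back to the
// generic "dialectal" label otherwise.
function dialectLabel(tag: string): string {
  return DIALECT_LABELS.get(tag) ?? 'dialectal';
}
```

Falling back to a generic label keeps unknown tags harmless, so the table can grow one dialect at a time.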
By the way, thank you for all the constructive feedback!
This would be ideal, but since the te-form serves many functions, it is difficult to provide a concise description without causing misunderstandings. Perhaps '-te (kudasai)'? This might imply て is generally an abbreviation for てください, though.
I'm done implementing the first batch of suggestions:
Not at all. Thank you so much for all your work and patience!
I think "conjunction or request" is nice. We might consider changing the localization for "te form" to that some time.
Looks great! I think "Kansai dialect" is fine since that's what we already have (and we try to make the localizations beginner-friendly as far as possible).
Looks great! When you're ready to merge, please update the PR to "ready to review". @SaltfishAmi can likely help with the simplified Chinese localization.
Hey @SaltfishAmi,
Thank you for your help!
This is more descriptive and will be more distinguishable from DeinflectRule's reasons array in the future.
This allows multiple reasons per rule in the future.
Forwarding the Ichidan verb stem to the plain form programmatically allows us to simplify the rule data array, as we don't need the special rules for Ichidan verbs anymore.
Inputs like 行っており were not parsed all the way to the end. The re-deinflection problem described in the code comment occurs only with Ichidan verbs.
The implementation of nasaru has made Reason.Nasai obsolete.
e.g. 聞きしたい, 送りします
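The commit note above about forwarding the Ichidan verb stem to the plain form can be illustrated with a small sketch. Everything here is an assumption made for illustration: the type flags, the `Rule` and `Candidate` shapes, the two toy rules, and the `deinflect` function are hypothetical and much simpler than the project's real deinflector.

```typescript
// Hypothetical word-type flags; not the project's actual values.
const ICHIDAN_VERB = 1 << 0;
const GODAN_VERB = 1 << 1;
const MASU_STEM = 1 << 2;
const ANY = ICHIDAN_VERB | GODAN_VERB | MASU_STEM;

interface Rule {
  from: string; // inflected ending to strip
  to: string; // ending to put in its place
  fromType: number; // types the inflected candidate may have
  toType: number; // type of the resulting candidate
  reasons: string[]; // an array, so a rule can carry several reasons
}

interface Candidate {
  word: string;
  type: number;
  reasons: string[];
}

// Two toy rules. Note that there is no Ichidan-specific "ます → る" rule.
const rules: Rule[] = [
  { from: 'ます', to: '', fromType: ANY, toType: MASU_STEM, reasons: ['polite'] },
  { from: 'き', to: 'く', fromType: MASU_STEM, toType: GODAN_VERB, reasons: [] },
];

function deinflect(input: string): Candidate[] {
  const result: Candidate[] = [{ word: input, type: ANY, reasons: [] }];
  // The array grows while we iterate, so derived candidates are visited too.
  for (let i = 0; i < result.length; i++) {
    const current = result[i];
    // Forward a derived stem to the Ichidan plain form programmatically.
    if (current.type === MASU_STEM) {
      result.push({
        word: current.word + 'る',
        type: ICHIDAN_VERB,
        reasons: current.reasons,
      });
    }
    for (const rule of rules) {
      if (!(current.type & rule.fromType)) continue;
      if (!current.word.endsWith(rule.from)) continue;
      const stem = current.word.slice(0, current.word.length - rule.from.length);
      if (!stem) continue;
      result.push({
        word: stem + rule.to,
        type: rule.toType,
        reasons: [...current.reasons, ...rule.reasons],
      });
    }
  }
  return result;
}
```

In this sketch 食べます yields 食べる without any Ichidan-specific rule, while 聞きます yields both 聞く and the non-word 聞きる; the latter is assumed to be discarded later when candidates are matched against the dictionary by word type.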
@SaltfishAmi Great, thank you very much.
This should be fine, as |
Looks great, thank you!
I meant the Chinese version. Yeah, it looks fine.
This draft PR is intended as a place to discuss the following changes regarding the extension of support for honorific speech / 敬語, and to provide an overview with a detailed commit history.
Changes
- masu-stem + する (See: Duplicate results of causative and causative passive forms of Ichidan verbs #1849)
- Renamed CandidateWord's reasons to reasonChains for more clarity
- Changed DeinflectRule's reason to an array to allow multiple reasons per rule
Todo
Points of discussion
- Keep the -nasai reason? This case is also covered by the なさる implementation now. If we removed the -nasai reason, it would show ‹ respectful ‹ imperative or masu-stem on inputs like 止めなさい.
  → Removing -nasai and masu-stem.
- Label these as dialectal, or make them more specific like Kansai-ben?
  → Specific, according to the dialects in JMdict.
- A separate reason for requests with ください? polite request comes to my mind.
  → Not (yet) necessary.
- Resolve honorific verbs to their plain equivalents with ‹ respectful? My concern is that this could be a little bit noisy in the results list. With an inflection of いらっしゃる as input, 来る would be the first result, followed by いらっしゃる → 居る → 行く.
  → No, because of the aforementioned reasons.
- Introducing Reason.None, or representing the condition through an empty array?
  → Representing the condition through an empty array.
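The outcome of the last point, combined with the reasons to reasonChains rename, can be sketched as follows. This is a simplified illustration, not the project's code: the `Reason` union, the shape of `CandidateWord`, and the `describe` helper are assumptions, and the real implementation presumably uses an enum plus localized labels.

```typescript
// Hypothetical reason labels; the real project presumably uses an enum
// and looks up localized strings.
type Reason = 'respectful' | 'imperative' | 'masu-stem' | 'humble' | 'polite';

interface CandidateWord {
  word: string;
  // Each inner array is one chain of reasons. An empty outer array means
  // "no deinflection happened", so no separate Reason.None value is needed.
  reasonChains: Reason[][];
}

// Renders a candidate the way the examples in this description are written,
// e.g. "止める ‹ respectful ‹ imperative or masu-stem".
function describe(candidate: CandidateWord): string {
  if (candidate.reasonChains.length === 0) {
    return candidate.word;
  }
  const chains = candidate.reasonChains.map((chain) => chain.join(' ‹ '));
  return `${candidate.word} ‹ ${chains.join(' or ')}`;
}
```

With a single representation for "no reason", callers only have to check the array length and never have to filter out a placeholder value.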
Examples of new inputs being recognized after these changes
‹ polite ‹ -te
‹ polite ‹ imperative
‹ respectful ‹ continuous ‹ -te
‹ -suru ‹ respectful
‹ respectful ‹ past
‹ -suru ‹ respectful ‹ polite
‹ respectful ‹ polite
‹ -suru ‹ humble ‹ polite
聞きしたい as 聞く ‹ humble ‹ -tai
送りします as 送る ‹ humble ‹ polite
‹ humble ‹ polite
‹ humble or Kansai dialect ‹ continuous ‹ polite past
‹ humble or Kansai dialect ‹ continuous
Features needed for complete support that are not included