Extending support for honorific speech / 敬語 #1822

enellis · 2024-06-14T22:32:52Z

This draft PR is a place to discuss the following changes, which extend support for honorific speech / 敬語, and to provide an overview with a detailed commit history.

Changes

Added
- Imperative for くださる
- Reason for deinflection → respectful speech
  - Respectful continuous form
  - なさる
  - になる
- Reason for deinflection → humble speech
  - 致す
  - ~~masu-stem + する~~ (See: Duplicate results of causative and causative passive forms of Ichidan verbs #1849)
- Reason for deinflection → humble or Kansai dialect
  - Humble or dialectal continuous form (ておる, でおる, とる, どる)
- Additions for ます
  - たら・たり and て-form
  - Imperative

Fixed
- Recognition of のたまう・のたもう・宣う・曰う
- Stop deinflection at the masu-stem only if the verb is an Ichidan verb; previously, inputs like 行っており were not parsed to the end. The re-deinflection problem described in the code comment occurs only with Ichidan verbs.

Refactoring
- Renamed CandidateWord's reasons to reasonChains for more clarity
- Changed DeinflectRule's reason to an array to allow multiple reasons per rule
- Ichidan stems are now programmatically forwarded to their plain form. This streamlines the rule data array, eliminating the need for multiple special rules for Ichidan stems.
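
The Ichidan stem forwarding described under Refactoring can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `WordType`, `Candidate` and `forwardIchidanStem` are hypothetical names, and the sketch assumes that candidates which do not match a dictionary entry are discarded in a later step.

```typescript
// Hypothetical names throughout; this is not the extension's actual code.
enum WordType {
  IchidanVerb = 1 << 0,
  GodanVerb = 1 << 1,
  MasuStem = 1 << 2,
}

interface Candidate {
  word: string;
  type: number; // bit field of WordType flags
}

// Forward a possible Ichidan stem (e.g. 食べ) to its plain form (食べる) in
// code, instead of listing one special rule per ending in the rule data.
// Assumption: candidates that are not real words are filtered out later by
// a dictionary lookup, so over-generating here is harmless.
function forwardIchidanStem(candidate: Candidate): Candidate | null {
  if (!(candidate.type & WordType.MasuStem)) {
    return null;
  }
  return { word: candidate.word + 'る', type: WordType.IchidanVerb };
}

const forwarded = forwardIchidanStem({ word: '食べ', type: WordType.MasuStem });
console.log(forwarded?.word);
```

With a single forwarding step like this, rules that end in a masu-stem do not each need a separate Ichidan variant.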

Todo

- Discussion
- Chinese localization
- Changelog

Points of discussion

Do we want to keep the -nasai reason? This case is now also covered by the なさる implementation. If we removed the -nasai reason, inputs like 止めなさい would show ‹ respectful ‹ imperative or masu-stem.
→ Removing -nasai and masu-stem.
Do we want to keep the reasons for deinflection for dialectal / colloquial rules general, like dialectal, or make them more specific, like Kansai-ben?
→ Specific, according to the dialects in JMdict.
Do we want to add a reason for deinflection for suru-noun / masu-stem / te-form + ください? Something like polite request comes to mind.
→ Not (yet) necessary.
Do we want いらっしゃる to be recognized as 居る, 行く and 来る with ‹ respectful? My concern is that this could be a little noisy in the results list. With an inflection of いらっしゃる as input, 来る would be the first result, followed by いらっしゃる → 居る → 行く.
→ No, because of the aforementioned concern.
Which approach is preferable: being explicit and continuing to use Reason.None, or representing the condition with an empty array?
→ Representing the condition with an empty array.

Examples of new inputs being recognized after these changes

ありまして　　　 as ある　　　　　 ‹ polite ‹ -te
いらっしゃいませ as いらっしゃる　‹ polite ‹ imperative
見てらっしゃって as 見る　　　　　 ‹ respectful ‹ continuous ‹ -te
仕事なさる　　　 as 仕事（する）　 ‹ -suru ‹ respectful
喜びなさった　　 as 喜ぶ　　　　　 ‹ respectful ‹ past
到着になります　 as 到着（する）　 ‹ -suru ‹ respectful ‹ polite
読みになります　 as 読む　　　　　 ‹ respectful ‹ polite
お願い致します　as お願い（する） ‹ -suru ‹ humble ‹ polite
~~聞きしたい　　　 as 聞く　　　　　 ‹ humble ‹ -tai~~
~~送りします　　　 as 送る　　　　　 ‹ humble ‹ polite~~
待ちいたします　 as 待つ　　　　　 ‹ humble ‹ polite
待っておりました as 待つ　　　　　 ‹ humble or Kansai dialect ‹ continuous ‹ polite past
飲んどる　　　　 as 飲む　　　　　 ‹ humble or Kansai dialect ‹ continuous
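
To make the notation in these examples concrete, here is a small sketch of how a chain of reasons could be rendered into the `‹ …` form used above. The `Reason` identifiers, the `reasonLabels` table and `formatReasonChain` are assumptions made for illustration; the extension's real reason type and localization lookup are likely different.

```typescript
// Hypothetical reason identifiers and labels, for illustration only.
type Reason = 'polite' | 'te' | 'respectful' | 'continuous' | 'humble';

const reasonLabels: Record<Reason, string> = {
  polite: 'polite',
  te: '-te',
  respectful: 'respectful',
  continuous: 'continuous',
  humble: 'humble',
};

// Render one chain in the style of the examples: each step prefixed by ‹.
function formatReasonChain(chain: ReadonlyArray<Reason>): string {
  return chain.map((reason) => `‹ ${reasonLabels[reason]}`).join(' ');
}

// 見てらっしゃって as 見る
console.log(formatReasonChain(['respectful', 'continuous', 'te']));
// prints "‹ respectful ‹ continuous ‹ -te"
```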

Features needed for complete support that are not included

- Honorific prefixes
- Something else?

birtles

Looking very promising. Thank you!

A few general comments:

Do you mind if we leave out the honorific part from this PR? I have some ideas about doing that and in particular I'd like to avoid false positives such as matching ご会計, ご電話, ご見積, お利用 etc. (I furthermore would like to make it so that when it matches 御利用 etc. it indicates that 御 is read ご in this case. Maybe even indicating that 御 could be み in some common cases like 御心, 御国)
When we indicate "dialectal", can we indicate which dialects?

src/background/deinflect.test.ts

birtles · 2024-06-17T01:52:36Z

Points of discussion

Do we want to keep the -nasai reason? This case is now also covered by the なさる implementation. If we removed the -nasai reason, inputs like 止めなさい would show ‹ respectful ‹ imperative or masu-stem.

I think "< respectful < imperative" would be fine but I'd like to avoid the "masu-stem" part if possible. That seems like a regression in terms of clarity.

Do we want to keep the reasons for deinflection for dialectal / colloquial rules more general like dialectal, or make them more specialized like Kansai-ben?

I'd much prefer we refer to specific dialects.

Do we want to add a reason for deinflection for suru-noun / masu-stem / te-form + ください? Something like polite request comes to mind.

I'm not sure if that's necessary, or at least not yet. If a user is scanning text like "助けてください" it's true they only get "助ける (te form)" followed by "ください = please" but if anything that probably suggests we should make the "te form" label more helpful? That way if the user is scanning "助けて" in isolation they can understand it might be a request?

Do we want いらっしゃる to be recognized as 居る, 行く and 来る with < respectful? My concern is that this could be a little bit noisy in the results list. With an inflection of いらっしゃる as input, 来る would be the first result, followed by いらっしゃる → 居る → 行く.

I agree, we probably don't want that.

Which approach is preferable: being explicit and continuing to use Reason.None, or representing the condition with an empty array?

I'm not sure I've fully grasped the difference yet (I've only given the PR a quick scan) but my initial hunch is not to introduce multiple ways of representing the same thing, i.e. use an empty array instead of Reason.None.

enellis · 2024-06-17T05:00:28Z

Do you mind if we leave out the honorific part from this PR?

Absolutely not. I shouldn't have pushed it to this branch in the first place. This implementation is as naive as it gets and my original plan was to make it work together with masu-stem + する/なさる/になる.

When we indicate "dialectal", can we indicate which dialects?

I just wonder how broad we want to be. We can't know or list every dialect, so which ones do we want to support? How detailed do we want to be? For example, should we distinguish between Kyoto-ben, Osaka-ben, Hakata-ben etc. or should we refer to broader categories like Kansai-ben or even broader like Western Japan?
The use of おる instead of いる, for example, though mainly found in Kyoto and Osaka, can be observed throughout the whole of western Japan. Additionally, in literature and film, おる can convey an old-fashioned or slightly arrogant tone.
My concern with listing specific dialects is that it could create the false implication of being exhaustive.

birtles · 2024-06-17T05:28:46Z

Do you mind if we leave out the honorific part from this PR?

Absolutely not. I shouldn't have pushed it to this branch in the first place. This implementation is as naive as it gets and my original plan was to make it work together with masu-stem + する/なさる/になる.

Great, thank you!

When we indicate "dialectal", can we indicate which dialects?

I just wonder how broad we want to be. We can't know or list every dialect, so which ones do we want to support? How detailed do we want to be? For example, should we distinguish between Kyoto-ben, Osaka-ben, Hakata-ben etc. or should we refer to broader categories like Kansai-ben or even broader like Western Japan? The use of おる instead of いる, for example, though mainly found in Kyoto and Osaka, can be observed throughout the whole of western Japan. Additionally, in literature and film, おる can convey an old-fashioned or slightly arrogant tone. My concern with listing specific dialects is that it could create the false implication of being exhaustive.

I think we can restrict ourselves to the dialects in JMdict which should be these ones: http://www.edrdg.org/jmwsgi/edhelp.py?svc=jmdict&sid=#kw_dial

enellis · 2024-06-17T06:56:32Z

By the way, thank you for all the constructive feedback!

Do we want to add a reason for deinflection for suru-noun / masu-stem / te-form + ください? Something like polite request comes to mind.

I'm not sure if that's necessary, or at least not yet. If a user is scanning text like "助けてください" it's true they only get "助ける (te form)" followed by "ください = please" but if anything that probably suggests we should make the "te form" label more helpful? That way if the user is scanning "助けて" in isolation they can understand it might be a request?

This would be ideal, but since the te-form serves many functions, it is difficult to provide a concise description without causing misunderstandings. Perhaps '-te (kudasai)'? That might imply that て is generally an abbreviation for てください, though. 'conjunction or request' would be another idea.

enellis · 2024-06-17T11:03:27Z

I'm done implementing the first batch of suggestions:

Removed the honorific prefixes commit.
Removed masu-stem reason in case of masu-stem + なさい and removed Reason.Nasai.
Changed dialectal to Kansai dialect.
- I based this on the dial_label_ks localization. Is Kansai-ben the better option?
致す, なさる and になる don't deinflect to する anymore when they appear by themselves.
Pure forwarding rules are now represented by an empty reasons array instead of Reason.None.
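
As a rough illustration of why an empty reasons array can stand in for Reason.None, consider the sketch below. The `Rule` shape and the `extendChain` helper are hypothetical and only meant to show the idea; the ordering (newer reasons prepended to the chain) is likewise an assumption chosen to match the `‹ humble ‹ polite` style of output.

```typescript
// Hypothetical shapes; the real rule and reason types may differ.
type Reason = 'respectful' | 'humble' | 'polite';

interface Rule {
  from: string;
  to: string;
  reasons: Reason[]; // empty for a pure forwarding rule
}

// Prepend the rule's reasons to an existing chain. An empty array adds
// nothing, so forwarding rules need no special case and no Reason.None.
function extendChain(rule: Rule, chain: Reason[]): Reason[] {
  return [...rule.reasons, ...chain];
}

const forwardingRule: Rule = { from: 'い', to: 'う', reasons: [] };
const humbleRule: Rule = { from: 'いたす', to: '', reasons: ['humble'] };

console.log(extendChain(forwardingRule, ['polite'])); // chain is unchanged
console.log(extendChain(humbleRule, ['polite'])); // 'humble' is prepended
```

Because spreading an empty array contributes no elements, the same code path handles both kinds of rule.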

src/background/deinflect.ts

src/background/deinflect.test.ts

birtles · 2024-06-18T05:48:27Z

By the way, thank you for all the constructive feedback!

Not at all. Thank you so much for all your work and patience!

Do we want to add a reason for deinflection for suru-noun / masu-stem / te-form + ください? Something like polite request comes to mind.

I'm not sure if that's necessary, or at least not yet. If a user is scanning text like "助けてください" it's true they only get "助ける (te form)" followed by "ください = please" but if anything that probably suggests we should make the "te form" label more helpful? That way if the user is scanning "助けて" in isolation they can understand it might be a request?

This would be ideal, but since the te-form serves many functions, it is difficult to provide a concise description without causing misunderstandings. Perhaps '-te (kudasai)'? That might imply that て is generally an abbreviation for てください, though. 'conjunction or request' would be another idea.

I think "conjunction or request" is nice. We might consider changing the localization for "te form" to that some time.

birtles · 2024-06-18T06:03:25Z

I'm done implementing the first batch of suggestions:

Removed the honorific prefixes commit.

Removed masu-stem reason in case of masu-stem + なさい and removed Reason.Nasai.

Changed dialectal to Kansai dialect.

I based this on the dial_label_ks localization. Is Kansai-ben the better option?

致す, なさる and になる don't deinflect to する anymore when they appear by themselves.

Pure forwarding rules are now represented by an empty reasons array instead of Reason.None.

Looks great! I think "Kansai dialect" is fine since that's what we already have (and we try to make the localizations beginner-friendly as far as possible).

enellis · 2024-06-18T09:22:12Z

Removed masu-stem reason on ていらっしゃい-forms.
Added check for type & Type.MasuStem.
Renamed Reason.HumbleOrKansaiBen to Reason.HumbleOrKansaiDialect.

birtles · 2024-06-19T04:42:49Z

Removed masu-stem reason on ていらっしゃい-forms.

Added check for type & Type.MasuStem.

Renamed Reason.HumbleOrKansaiBen to Reason.HumbleOrKansaiDialect.

Looks great! When you're ready to merge, please update the PR to "ready to review". @SaltfishAmi can likely help with the simplified Chinese localization.

enellis · 2024-06-19T08:12:39Z

Hey @SaltfishAmi,
sorry for the noise, but could you help localizing the following strings:

humble / 謙譲語
humble or Kansai dialect / 謙譲語・関西弁
respectful / 尊敬語

Thank you for your help!

SaltfishAmi · 2024-06-19T11:46:02Z

Hey @SaltfishAmi, sorry for the noise, but could you help localizing the following strings:
* humble / 謙譲語

谦让语

* humble or Kansai dialect / 謙譲語・関西弁

谦让语或关西方言 (is this a bit too long?)

* respectful / 尊敬語

尊敬语

enellis · 2024-06-19T19:54:10Z

@SaltfishAmi Great, thank you very much.

谦让语或关西方言 (is this a bit too long?)

This should be fine, as 可能或被动或敬语的 is still a bit longer. Or did you mean the English version?

birtles · 2024-06-20T01:02:01Z

Looks great, thank you!

SaltfishAmi · 2024-06-20T01:02:14Z

This should be fine, as 可能或被动或敬语的 is still a bit longer. Or did you mean the English version?

I meant the Chinese version. Yeah, it looks fine.

enellis force-pushed the extend-keigo branch 5 times, most recently from 0733e6b to 5cc5642 Compare June 16, 2024 16:17

birtles reviewed Jun 17, 2024

enellis force-pushed the extend-keigo branch from 95e28be to 9bf8609 Compare June 17, 2024 10:30

enellis commented Jun 17, 2024

enellis commented Jun 17, 2024

enellis force-pushed the extend-keigo branch from 9e6bd7b to a2e5779 Compare June 18, 2024 09:12

enellis added 9 commits June 19, 2024 21:42

chore: rename CandidateWord.reasons field to reasonChains

2596d92

This is more descriptive and will be more distinguishable from DeinflectRule's reasons array in the future.

chore: change DeinflectRule.reason field to an array

fc8cbe9

This allows multiple reasons per rule in the future.

chore: refactor deinflection of Ichidan verbs

53a09e7

Forwarding the Ichidan verb stem to the plain form programmatically allows us to simplify the rule data array, as we don't need the special rules for Ichidan verbs anymore.

fix: stop after masu stem only if Ichidan verb

420257a

Previously, inputs like 行っており were not parsed to the end. The re-deinflection problem described in the code comment occurs only with Ichidan verbs.

fix: better handling of notamau / notamou

411dca8

feat: add te/tara/tari-forms and imperative for masu

8346147

feat: add imperative for kudasaru

4629e53

feat: add reason for respectful speech

f30f94b

feat: add respectful continuous forms

2d3e80e

enellis added 8 commits June 19, 2024 21:42

feat: add nasaru as respectful speech

2959c14

chore: remove Reason.Nasai

f6fd081

The implementation of nasaru has made Reason.Nasai obsolete.

feat: add ni-naru as respectful speech

ff07f3a

feat: add reason for humble speech or Kansai dialect

a287aba

feat: add humble or dialectal continuous te-oru forms

d813fe9

feat: add reason for humble speech

e7ac30e

feat: add itasu as humble speech for suru

a2a172f

feat: add masu-stem + suru forms as humble speech

77c7409

e.g. 聞きしたい, 送りします

enellis force-pushed the extend-keigo branch from a2e5779 to 27c5277 Compare June 19, 2024 19:49

chore: update CHANGELOG.md

dc95fb5

enellis force-pushed the extend-keigo branch from 27c5277 to dc95fb5 Compare June 19, 2024 20:01

enellis marked this pull request as ready for review June 19, 2024 20:08

chore: minor tweaks

72a6fb1

birtles enabled auto-merge (rebase) June 20, 2024 01:01

birtles merged commit 57fbef2 into birchill:main Jun 20, 2024
1 check passed

enellis deleted the extend-keigo branch June 20, 2024 06:36
