Extending support for honorific speech / 敬語 #1822

Merged
merged 19 commits into from
Jun 20, 2024

Conversation

enellis
Contributor

@enellis enellis commented Jun 14, 2024

This draft PR is intended to be a place to discuss the following changes regarding the extension of support for honorific speech / 敬語 and providing an overview with a detailed commit history.

Changes

  • Added
    • Imperative for くださる
    • Reason for deinflection → respectful speech
      • Respectful continuous form
      • なさる
      • になる
    • Reason for deinflection → humble speech
    • Reason for deinflection → humble or Kansai dialect
      • Humble or dialectal continuous form (ておる, でおる, とる, どる)
    • Additions for ます
      • たら・たり and て-form
      • Imperative
  • Fixed
    • Recognition of のたまう・のたもう・宣う・曰う
    • Stop deinflection at the masu-stem only if it's an Ichidan verb, since stopping for every verb meant inputs like 行っており were not parsed to the end. The re-deinflection problem described in the code comment occurs only with Ichidan verbs.
  • Refactoring
    • Renamed CandidateWord's reasons to reasonChains for more clarity
    • Changed DeinflectRules's reason to an array to allow multiple reasons per Rule
    • Ichidan stems are now programmatically forwarded to their plain form. This streamlines the rule data array, eliminating the need for multiple special rules for Ichidan stems.
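
The two data-shape refactorings above (a reasons array on each rule, and reasonChains on each candidate) can be illustrated with a minimal TypeScript sketch. Only the names reasons and reasonChains come from this PR; the type shapes, the applyRule helper, and the two example rules are assumptions made for illustration, and the actual code in src/background/deinflect.ts may differ.

```typescript
// Hypothetical, simplified shapes. Only the names `reasons` and
// `reasonChains` come from this PR; everything else is an assumption.
type Reason = 'polite' | 'te' | 'respectful' | 'humble';

interface DeinflectRule {
  from: string; // inflected ending to strip
  to: string; // ending to put in its place
  reasons: Reason[]; // a single rule may carry several reasons
}

interface CandidateWord {
  word: string;
  reasonChains: Reason[][]; // one chain of reasons per derivation
}

// Apply one rule, prepending its reasons to every existing chain.
function applyRule(
  candidate: CandidateWord,
  rule: DeinflectRule
): CandidateWord | null {
  if (!candidate.word.endsWith(rule.from)) {
    return null;
  }
  const stem = candidate.word.slice(
    0,
    candidate.word.length - rule.from.length
  );
  const chains: Reason[][] = candidate.reasonChains.length
    ? candidate.reasonChains
    : [[]];
  return {
    word: stem + rule.to,
    reasonChains: chains.map((chain) => [...rule.reasons, ...chain]),
  };
}

// Two assumed rules that take ありまして back to ある.
const input: CandidateWord = { word: 'ありまして', reasonChains: [] };
const first = applyRule(input, { from: 'まして', to: 'ます', reasons: ['te'] });
const second =
  first && applyRule(first, { from: 'ります', to: 'る', reasons: ['polite'] });
```

With these assumed rules, second holds the word ある with a single chain of polite followed by te, which corresponds to the "‹ polite ‹ -te" display used in the examples in this description.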

Todo

  • Discussion
  • Chinese localization
  • Changelog

Points of discussion

  • Do we want to keep the -nasai reason? This case is now also covered by the なさる implementation. If we removed the -nasai reason, inputs like 止めなさい would show ‹ respectful ‹ imperative or masu-stem.
    → Removing -nasai and masu-stem.
  • Do we want to keep the reasons for deinflection for dialectal / colloquial rules more general like dialectal, or make them more specific like Kansai-ben?
    → Specific, according to the dialects in JMdict.
  • Do we want to add a reason for deinflection for suru-noun / masu-stem / te-form + ください? Something like polite request comes to mind.
    → Not (yet) necessary.
  • Do we want いらっしゃる to be recognized as 居る, 行く and 来る with < respectful? My concern is that this could be a little bit noisy in the results list. With an inflection of いらっしゃる as input, 来る would be the first result, followed by いらっしゃる → 居る → 行く.
    → No, because of the aforementioned reasons.
  • Which approach is preferable: being explicit and continuing to use Reason.None, or representing the condition through an empty array?
    → Representing the condition through an empty array.

Examples of new inputs being recognized after these changes

  • ありまして    as ある      ‹ polite ‹ -te
  • いらっしゃいませ as いらっしゃる  ‹ polite ‹ imperative
  • 見てらっしゃって as 見る      ‹ respectful ‹ continuous ‹ -te
  • 仕事なさる    as 仕事(する)  ‹ -suru ‹ respectful
  • 喜びなさった   as 喜ぶ      ‹ respectful ‹ past
  • 到着になります  as 到着(する)  ‹ -suru ‹ respectful ‹ polite
  • 読みになります  as 読む      ‹ respectful ‹ polite
  • お願い致します  as お願い(する) ‹ -suru ‹ humble ‹ polite
  • 聞きしたい    as 聞く      ‹ humble ‹ -tai
  • 送りします    as 送る      ‹ humble ‹ polite
  • 待ちいたします  as 待つ      ‹ humble ‹ polite
  • 待っておりました as 待つ      ‹ humble or Kansai dialect ‹ continuous ‹ polite past
  • 飲んどる     as 飲む      ‹ humble or Kansai dialect ‹ continuous
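
For readers unfamiliar with the "‹" notation in the list above, here is a small TypeScript sketch of how a chain of reasons could be rendered into that form. The reasonLabels map, its keys, and formatReasonChain are hypothetical names used only for illustration; they are not taken from the extension's code.

```typescript
// Hypothetical display labels keyed by an assumed reason identifier.
const reasonLabels: Record<string, string> = {
  polite: 'polite',
  te: '-te',
  continuous: 'continuous',
  politePast: 'polite past',
  humbleOrKansaiDialect: 'humble or Kansai dialect',
};

// Render a chain as "‹ a ‹ b ‹ c", falling back to the raw
// identifier when no label is known.
function formatReasonChain(chain: string[]): string {
  return chain
    .map((reason) => `‹ ${reasonLabels[reason] ?? reason}`)
    .join(' ');
}

// The chains for ありまして and 待っておりました from the list above.
const arimashite = formatReasonChain(['polite', 'te']);
const matteorimashita = formatReasonChain([
  'humbleOrKansaiDialect',
  'continuous',
  'politePast',
]);
```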

Features needed for complete support that are not included

  • Honorific prefixes
  • Something else?

@enellis enellis force-pushed the extend-keigo branch 5 times, most recently from 0733e6b to 5cc5642 Compare June 16, 2024 16:17
Member

@birtles birtles left a comment

Looking very promising. Thank you!

A few general comments:

  • Do you mind if we leave out the honorific part from this PR? I have some ideas about doing that and in particular I'd like to avoid false positives such as matching ご会計, ご電話, ご見積, お利用 etc. (I furthermore would like to make it so that when it matches 御利用 etc. it indicates that 御 is read ご in this case. Maybe even indicating that 御 could be み in some common cases like 御心, 御国)
  • When we indicate "dialectal", can we indicate which dialects?

@birtles
Member

birtles commented Jun 17, 2024

Points of discussion

  • Do we want to keep the -nasai reason? This case is now also covered by the なさる implementation. If we removed the -nasai reason, inputs like 止めなさい would show ‹ respectful ‹ imperative or masu-stem.

I think "< respectful < imperative" would be fine but I'd like to avoid the "masu-stem" part if possible. That seems like a regression in terms of clarity.

  • Do we want to keep the reasons for deinflection for dialectal / colloquial rules more general like dialectal, or make them more specialized like Kansai-ben?

I'd much prefer we refer to specific dialects.

  • Do we want to add a reason for deinflection for suru-noun / masu-stem / te-form + ください? Something like polite request comes to mind.

I'm not sure if that's necessary, or at least not yet. If a user is scanning text like "助けてください" it's true they only get "助ける (te form)" followed by "ください = please" but if anything that probably suggests we should make the "te form" label more helpful? That way if the user is scanning "助けて" in isolation they can understand it might be a request?

  • Do we want いらっしゃる to be recognized as 居る, 行く and 来る with < respectful? My concern is that this could be a little bit noisy in the results list. With an inflection of いらっしゃる as input, 来る would be the first result, followed by いらっしゃる → 居る → 行く.

I agree, we probably don't want that.

  • Which approach is preferable: being explicit and continuing to use Reason.None, or representing the condition through an empty array?

I'm not sure I've fully grasped the difference yet (I've only given the PR a quick scan) but my initial hunch is not to introduce multiple ways of representing the same thing, i.e. use an empty array instead of Reason.None.

@enellis
Contributor Author

enellis commented Jun 17, 2024

Do you mind if we leave out the honorific part from this PR?

Absolutely not. I shouldn't have pushed it to this branch in the first place. This implementation is as naive as it gets and my original plan was to make it work together with masu-stem + する/なさる/になる.

When we indicate "dialectal", can we indicate which dialects?

I just wonder how broad we want to be. We can't know or list every dialect, so which ones do we want to support? How detailed do we want to be? For example, should we distinguish between Kyoto-ben, Osaka-ben, Hakata-ben etc. or should we refer to broader categories like Kansai-ben or even broader like Western Japan?
The use of おる instead of いる, for example, though mainly found in Kyoto and Osaka, can be observed throughout the whole of western Japan. Additionally, in literature and film, おる can convey an old-fashioned or slightly arrogant tone.
My concern with listing specific dialects is that it could create the false implication of being exhaustive.

@birtles
Member

birtles commented Jun 17, 2024

Do you mind if we leave out the honorific part from this PR?

Absolutely not. I shouldn't have pushed it to this branch in the first place. This implementation is as naive as it gets and my original plan was to make it work together with masu-stem + する/なさる/になる.

Great, thank you!

When we indicate "dialectal", can we indicate which dialects?

I just wonder how broad we want to be. We can't know or list every dialect, so which ones do we want to support? How detailed do we want to be? For example, should we distinguish between Kyoto-ben, Osaka-ben, Hakata-ben etc. or should we refer to broader categories like Kansai-ben or even broader like Western Japan? The use of おる instead of いる, for example, though mainly found in Kyoto and Osaka, can be observed throughout the whole of western Japan. Additionally, in literature and film, おる can convey an old-fashioned or slightly arrogant tone. My concern with listing specific dialects is that it could create the false implication of being exhaustive.

I think we can restrict ourselves to the dialects in JMdict which should be these ones: http://www.edrdg.org/jmwsgi/edhelp.py?svc=jmdict&sid=#kw_dial

@enellis
Contributor Author

enellis commented Jun 17, 2024

By the way, thank you for all the constructive feedback!

Do we want to add a reason for deinflection for suru-noun / masu-stem / te-form + ください? Something like polite request comes to mind.

I'm not sure if that's necessary, or at least not yet. If a user is scanning text like "助けてください" it's true they only get "助ける (te form)" followed by "ください = please" but if anything that probably suggests we should make the "te form" label more helpful? That way if the user is scanning "助けて" in isolation they can understand it might be a request?

This would be ideal, but since the te-form serves many functions, it is difficult to provide a concise description without causing misunderstandings. Perhaps '-te (kudasai)'? This might imply て is generally an abbreviation for てください, though. conjunction or request would be another idea.

@enellis
Contributor Author

enellis commented Jun 17, 2024

I'm done implementing the first batch of suggestions:

  • Removed the honorific prefixes commit.
  • Removed the masu-stem reason in the case of masu-stem + なさい and removed Reason.Nasai.
  • Changed dialectal to Kansai dialect.
    • I based this on the dial_label_ks localization. Is Kansai-ben the better option?
  • 致す, なさる and になる don't deinflect to する anymore when they appear by themselves.
  • Pure forwarding rules are now represented by an empty reasons array instead of Reason.None.
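
As an illustration of the last point, the sketch below shows why an empty reasons array needs no special handling: spreading it into the chain simply adds nothing, whereas a sentinel value such as Reason.None would have to be filtered out. The Rule shape, the three rules, and deinflectWith are assumptions made for this example and are not the actual rule data.

```typescript
type Reason = 'respectful' | 'polite';

interface Rule {
  from: string;
  to: string;
  reasons: Reason[]; // empty for a pure forwarding rule
}

// Hypothetical rules, applied in order to 読みになります.
const rules: Rule[] = [
  { from: 'になります', to: 'になる', reasons: ['polite'] },
  { from: 'になる', to: '', reasons: ['respectful'] },
  { from: 'み', to: 'む', reasons: [] }, // masu-stem forwarded to plain form
];

// Apply each matching rule once, prepending its reasons to the chain.
function deinflectWith(
  input: string,
  steps: Rule[]
): { word: string; chain: Reason[] } {
  let word = input;
  let chain: Reason[] = [];
  for (const rule of steps) {
    if (!word.endsWith(rule.from)) {
      continue;
    }
    word = word.slice(0, word.length - rule.from.length) + rule.to;
    // An empty `reasons` array contributes nothing here.
    chain = [...rule.reasons, ...chain];
  }
  return { word, chain };
}

const result = deinflectWith('読みになります', rules);
```

Under these assumptions, result.word is 読む and result.chain is respectful followed by polite, in line with the "読みになります as 読む ‹ respectful ‹ polite" example from the PR description.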

@birtles
Member

birtles commented Jun 18, 2024

By the way, thank you for all the constructive feedback!

Not at all. Thank you so much for all your work and patience!

Do we want to add a reason for deinflection for suru-noun / masu-stem / te-form + ください? Something like polite request comes to mind.

I'm not sure if that's necessary, or at least not yet. If a user is scanning text like "助けてください" it's true they only get "助ける (te form)" followed by "ください = please" but if anything that probably suggests we should make the "te form" label more helpful? That way if the user is scanning "助けて" in isolation they can understand it might be a request?

This would be ideal, but since the te-form serves many functions, it is difficult to provide a concise description without causing misunderstandings. Perhaps '-te (kudasai)'? This might imply て is generally an abbreviation for てください, though. conjunction or request would be another idea.

I think "conjunction or request" is nice. We might consider changing the localization for "te form" to that some time.

@birtles
Member

birtles commented Jun 18, 2024

I'm done implementing the first batch of suggestions:

  • Removed the honorific prefixes commit.
  • Removed the masu-stem reason in the case of masu-stem + なさい and removed Reason.Nasai.
  • Changed dialectal to Kansai dialect.
    • I based this on the dial_label_ks localization. Is Kansai-ben the better option?
  • 致す, なさる and になる don't deinflect to する anymore when they appear by themselves.
  • Pure forwarding rules are now represented by an empty reasons array instead of Reason.None.

Looks great! I think "Kansai dialect" is fine since that's what we already have (and we try to make the localizations beginner-friendly as far as possible).

@enellis
Contributor Author

enellis commented Jun 18, 2024

  • Removed masu-stem reason on ていらっしゃい-forms.
  • Added check for type & Type.MasuStem.
  • Renamed Reason.HumbleOrKansaiBen to Reason.HumbleOrKansaiDialect.
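
The "type & Type.MasuStem" check mentioned above is a bit-flag test. A minimal sketch follows, assuming word types are stored as bit flags; only the name Type.MasuStem comes from this thread, the other flags and the hasMasuStemType helper are hypothetical, and the thread does not say where exactly the check is applied.

```typescript
// Hypothetical bit flags; only the name `MasuStem` appears in the thread.
const Type = {
  IchidanVerb: 1 << 0,
  GodanVerb: 1 << 1,
  MasuStem: 1 << 2,
} as const;

// True when the MasuStem bit is set in the combined type value.
function hasMasuStemType(type: number): boolean {
  return (type & Type.MasuStem) !== 0;
}

// A candidate may carry several type bits at once.
const stemCandidate = Type.IchidanVerb | Type.MasuStem;
const plainCandidate = Type.GodanVerb;
```

Comparing the masked value against zero, rather than relying on truthiness, keeps the return type a plain boolean.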

@birtles
Member

birtles commented Jun 19, 2024

  • Removed masu-stem reason on ていらっしゃい-forms.
  • Added check for type & Type.MasuStem.
  • Renamed Reason.HumbleOrKansaiBen to Reason.HumbleOrKansaiDialect.

Looks great! When you're ready to merge, please update the PR to "ready to review". @SaltfishAmi can likely help with the simplified Chinese localization.

@enellis
Contributor Author

enellis commented Jun 19, 2024

Hey @SaltfishAmi,
sorry for the noise, but could you help localize the following strings:

  • humble / 謙譲語
  • humble or Kansai dialect / 謙譲語・関西弁
  • respectful / 尊敬語

Thank you for your help!

@SaltfishAmi
Contributor

Hey @SaltfishAmi, sorry for the noise, but could you help localize the following strings:

* humble / 謙譲語

谦让语

* humble or Kansai dialect / 謙譲語・関西弁

谦让语或关西方言 (is this a bit too long?)

* respectful / 尊敬語

尊敬语

This is more descriptive and will be more distinguishable from DeinflectRule's reasons array in the future.

This allows multiple reasons per rule in the future.

Forwarding the Ichidan verb stem to the plain form programmatically allows us to simplify the rule data array, as we no longer need the special rules for Ichidan verbs.

Inputs like 行っており would not be parsed to the end. The re-deinflection problem described in the code comment occurs only with Ichidan verbs.

@enellis
Contributor Author

enellis commented Jun 19, 2024

@SaltfishAmi Great, thank you very much.

谦让语或关西方言 (is this a bit too long?)

This should be fine, as 可能或被动或敬语的 is still a bit longer. Or did you mean the English version?

@enellis enellis marked this pull request as ready for review June 19, 2024 20:08
@birtles birtles enabled auto-merge (rebase) June 20, 2024 01:01
@birtles
Member

birtles commented Jun 20, 2024

Looks great, thank you!

@birtles birtles merged commit 57fbef2 into birchill:main Jun 20, 2024
1 check passed
@SaltfishAmi
Contributor

This should be fine, as 可能或被动或敬语的 is still a bit longer. Or did you mean the English version?

I meant the Chinese version. Yeah, it looks fine.

@enellis enellis deleted the extend-keigo branch June 20, 2024 06:36