Refactor Japanese ます, んかった form #1385

khaitruong922 · 2024-09-01T06:26:04Z

1. Refactor masu form

Problems

Currently there are many wrong conjugations with masu form, for example:
- 行きまさない (masu -> negative)
- 行きますな (masu -> imperative negative)
- 行きましたい (masu -> tai)
Verbs end with ます also is also treated as polite ます form
- 済ません returns (negative) and (potential -> n). (negative) is incorrect, it should be 済まさない.
- 済ましょう returns (volitional), which is incorrect, should be 済まそう.

Some may not appear in real text, but it can create wrong impressions on Japanese grammar for learners, especially when they start to output on their own.

Solution

Remove v5m, add masu and masen conditions.
masu form is a special form in Japanese with different rules of conjugation and we should not treat it as a godan verb.
Follow the conjugation rules in https://en.wiktionary.org/wiki/Appendix:Japanese_verbs
Only add universal rules: ません、ませんでした、ました、ましたら、まして、ます、ますれば、ませば、ましょう

Make んかった return past as out condition (make sure it's consistent with other かった ending), update name to '-n past'.

Will update some tests to make sure there are no false positives.
Maybe we should have a refactor PR to update the language transform test logic too. Currently it only checks if a transformation is correct, it does not check if a term has any extra wrong transformation. For testing false positive, we have to know it in advance and explicitly add it to the test. This makes it hard to track the effect when we're adding new forms to the language.

codspeed-hq · 2024-09-01T07:06:20Z

CodSpeed Performance Report

Merging #1385 will not alter performance

_{Comparing khaitruong922:refactor-masu (c811ea8) with master (cab7793)}

Summary

✅ 3 untouched benchmarks

🆕 2 new benchmarks
⁉️ 2 (👁 2) dropped benchmarks

Benchmarks breakdown

	Benchmark	`master`	`khaitruong922:refactor-masu`	Change
👁	`japanese transformations (n=66)`	18.5 ms	N/A	N/A
🆕	`japanese transformations (n=70)`	N/A	20.8 ms	N/A
👁	`japanese transformations-full (n=181)`	40.5 ms	N/A	N/A
🆕	`japanese transformations-full (n=193)`	N/A	44.3 ms	N/A

jamesmaa · 2024-09-01T21:19:31Z

cc @Lyroxide for review. If no review in 3 days I'll glance over it

Lyroxide · 2024-09-02T14:54:20Z

Any reason for including ませば? It's only used in kobun, where ませば is not "-ba of ます", but the "-ba of 未然形 of まし". I'd wager that ませば doesn't even exist in modern texts. ますれば seems fine though.

ませんかった threw me off at first but it seems legitimate according to 大辞林・日国

Is it possible to remove "-n past" but chain "-ta" with "-n"? Didn't had a chance to review when this was added initially.

khaitruong922 · 2024-09-02T15:05:51Z

Thanks for review. Let's remove ませば for consistency since we are not going to include the まする form.

khaitruong922 · 2024-09-02T15:19:00Z

I also agree with chaining -n -> -ta for `n-past.

ませんかった is the only exception -n past. It's not actually the -n form of verb, but just appending かった after ません, similar to ませんでした.
I think the correct transformation would be masu -> negative -> -ta, instead of -masu -> negative -> -n past (having double negative transformation)

khaitruong922 · 2024-09-02T15:28:23Z

Remove n past, added -n condition to chain with -ta.

khaitruong922 added 3 commits September 1, 2024 12:57

refactor masu form

01d18b3

normalize masu mai form

6be59d0

add tests

7770479

khaitruong922 marked this pull request as ready for review September 1, 2024 07:04

khaitruong922 requested a review from a team as a code owner September 1, 2024 07:04

test false positives

adb0755

Kuuuube added kind/bug The issue or PR is regarding a bug area/linguistics The issue or PR is related to linguistics labels Sep 1, 2024

remove maseba

1d92cfd

change n past to n + ta

1af6bc9

Merge branch 'master' into refactor-masu

c811ea8

jamesmaa approved these changes Sep 2, 2024

View reviewed changes

jamesmaa added this pull request to the merge queue Sep 2, 2024

Merged via the queue into yomidevs:master with commit 31e81fe Sep 2, 2024
9 checks passed

khaitruong922 deleted the refactor-masu branch September 2, 2024 15:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor Japanese ます, んかった form #1385

Refactor Japanese ます, んかった form #1385

khaitruong922 commented Sep 1, 2024 •

edited

Loading

codspeed-hq bot commented Sep 1, 2024 •

edited

Loading

jamesmaa commented Sep 1, 2024

Lyroxide commented Sep 2, 2024

khaitruong922 commented Sep 2, 2024

khaitruong922 commented Sep 2, 2024 •

edited

Loading

khaitruong922 commented Sep 2, 2024

Refactor Japanese ます, んかった form #1385

Refactor Japanese ます, んかった form #1385

Conversation

khaitruong922 commented Sep 1, 2024 • edited Loading

codspeed-hq bot commented Sep 1, 2024 • edited Loading

CodSpeed Performance Report

Merging #1385 will not alter performance

Summary

Benchmarks breakdown

jamesmaa commented Sep 1, 2024

Lyroxide commented Sep 2, 2024

khaitruong922 commented Sep 2, 2024

khaitruong922 commented Sep 2, 2024 • edited Loading

khaitruong922 commented Sep 2, 2024

khaitruong922 commented Sep 1, 2024 •

edited

Loading

codspeed-hq bot commented Sep 1, 2024 •

edited

Loading

khaitruong922 commented Sep 2, 2024 •

edited

Loading