Different behavior dealing with Japanese Kanji by different models #204
-
When transcribing some Japanese-language audio files, I noticed there is a small probability (one chapter out of 20) that all the Kanji characters come out as kana instead. It seems worth further investigation. The output differed between the tiny/small models and the medium/large models; an example segment from the affected chapter: [00:00.000 --> 00:04.000] チャプターナインティーン
-
It's similar to the "punctuation mode" and "no-punctuation mode" observed in #194, where the model can sample a certain style of writing and continue using it since the subsequent transcriptions are conditioned on the previous outputs.
I think the model was more likely to fall into this "no-kanji mode" given the tone and content of the story, which appears to be targeted at children.
As mentioned in the comments in #194, you could supply a hypothetical sentence that could have come before the audio as the initial prompt, such as,
--initial_prompt "次はヒロが見た変な夢の物語です。"
to nudge the model to output in the style you want.
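For reference, here is a minimal Python sketch of the same idea using the openai-whisper package, assuming your installed version's `transcribe()` accepts the `initial_prompt` option; the model size and audio filename are placeholders:

```python
import whisper

# Load one of the multilingual models; "medium" is just an example choice.
model = whisper.load_model("medium")

# Supply a hypothetical preceding sentence as the initial prompt so the
# decoder is conditioned on kanji-containing text from the start.
result = model.transcribe(
    "chapter19.mp3",  # placeholder filename
    language="ja",
    initial_prompt="次はヒロが見た変な夢の物語です。",
)

# Print segments in the same timestamped style as the CLI output.
for segment in result["segments"]:
    print(f"[{segment['start']:.3f} --> {segment['end']:.3f}] {segment['text']}")
```

Since later segments are conditioned on earlier output, seeding the very first window with kanji-style text tends to carry that style through the rest of the transcription.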