[3.x] Fix Chinese&Japanese erroneous newline #45290

erbing315 · 2021-01-18T17:08:03Z

Problem

Chinese and Japanese text wraps at the wrong place. For example:
parse_bbcode("向最坏处着想,向最好处努力")

This wraps correctly. But when I add a BBCode tag:
parse_bbcode("向[b]最坏处着想,向最好处努力")

This wraps at the wrong place: the line breaks around the BBCode tag instead of at the correct position.
parse_bbcode("向[b]最坏[/b]处着想,向最好处努力")

Reason

Chinese and Japanese do not use spaces or any other character to separate words,
so a whole Chinese or Japanese sentence is treated as a single word.
A BBCode tag splits that word, and since RichTextLabel tries to keep each word on one line,
the line breaks at the wrong place.

Solution

For Chinese and Japanese characters (code points from 0x3040 up to, but not including, 0xFAFF), add each character to the tag stack individually.

A side effect of my code is that text in other languages and punctuation "stick" to the preceding Chinese or Japanese character.
- For punctuation, this conforms to Chinese punctuation rules.
- For other languages, it can be worked around by adding a space between the other-language text and the Chinese or Japanese text.
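The approach above can be modeled in a few lines. This is a simplified sketch, not the PR's code: it assumes text stored as std::u32string instead of Godot's String, uses hypothetical names (`in_pr_range`, `split_units`), and returns segments in a vector rather than pushing each character onto the RichTextLabel tag stack.

```cpp
#include <string>
#include <vector>

// Range checked by the PR's diff: [0x3040, 0xFAFF).
bool in_pr_range(char32_t c) {
    return c >= 0x3040 && c < 0xFAFF;
}

// Simplified model (hypothetical, not the PR's actual code): every character
// in the range starts a new unit, so a line break is possible before it.
// Any other character (Latin letters, spaces, ASCII punctuation, and CJK
// punctuation outside the range such as U+3001 or U+FF0C) is appended to the
// current unit, i.e. it "sticks" to the preceding character.
std::vector<std::u32string> split_units(const std::u32string &text) {
    std::vector<std::u32string> units;
    for (char32_t c : text) {
        if (units.empty() || in_pr_range(c)) {
            units.push_back(std::u32string(1, c));
        } else {
            units.back().push_back(c);
        }
    }
    return units;
}
```

With this model, `split_units(U"向x,最")` yields the units `向x,` and `最`, which is the "sticking" behavior described above.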

bruvzg · 2021-01-18T17:25:10Z

scene/gui/rich_text_label.cpp

-				//append item condition
+		int lipos = 0;
+		while (lipos < line.length()) {
+			if (line[lipos] >= 0x3040 && line[lipos] < 0xfaff) {


This range includes multiple non-CJK blocks. It should probably be limited to 3400-4DBF, 4E00-9FFF, F900-FAFF and 20000-2A6DF, 2F800-2FA1F (the last two are outside the BMP and won't work on Windows).
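The suggested ranges can be written as a single predicate. This is a hypothetical helper (`is_cjk_ideograph` is not a Godot function); it takes a char32_t so the two supplementary-plane ranges fit in one value, whereas a 16-bit character type cannot represent them, which is presumably why they won't work on Windows. Note that it covers ideographs only, so kana (U+3040 to U+30FF) would need separate handling.

```cpp
// Hypothetical helper using the ranges suggested in the review comment.
bool is_cjk_ideograph(char32_t c) {
    return (c >= 0x3400 && c <= 0x4DBF)      // CJK Unified Ideographs Extension A
        || (c >= 0x4E00 && c <= 0x9FFF)      // CJK Unified Ideographs
        || (c >= 0xF900 && c <= 0xFAFF)      // CJK Compatibility Ideographs
        || (c >= 0x20000 && c <= 0x2A6DF)    // CJK Unified Ideographs Extension B
        || (c >= 0x2F800 && c <= 0x2FA1F);   // CJK Compatibility Ideographs Supplement
}
```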

For reference, the master branch uses the ICU break iterator with the line_normal_cj.txt rule set and the 4 MB cjdict.txt dictionary.

If I understand correctly, this approach should work for pure ideographs, but not for mixed syllabary and ideographs (okurigana). But since ICU-based breaking won't be backported to 3.2, it's probably better than nothing.

Also, I'm not sure adding a new ItemText for each word is good for performance; it might be better to handle this in _process_line instead.

fire · 2021-01-18T17:57:03Z

The existence of https://w3c.github.io/i18n-tests/results/line-breaks-jazh means we don't have to redo the work. Since the requirements work has been done, we should at least see if it's possible to do a better job on it.

TokageItLab · 2021-01-18T18:28:59Z

The problem with this approach is that line breaking behaves differently for strings with tags and without tags, which makes the implementation feel a bit hacky.

Also, you have to consider that Japanese and Chinese text sometimes has English mixed in. I tried your PR here and got the following.

English words are usually surrounded by single-byte spaces, though it depends on the writer.
(And this case breaks incorrectly even without the BBCode tag.)

I think there is a fundamental problem with line breaks in Godot, so I'll see if I can find a solution here as well.

TokageItLab · 2021-01-18T21:46:11Z

The cause of the line-breaking problem is probably that character data and tag data are stored in the same array. First of all, Godot doesn't support line breaks in Japanese or Chinese at all, and usually tries to write everything on a single line. RichTextLabel::_process_line() (line 402) treats any space or non-character data as a break:

while (c[end] != 0 && !(end && c[end - 1] == ' ' && c[end] != ' ')) {

	int cw = font->get_char_size(c[end], c[end + 1]).width;
	if (c[end] == '\t') {
		cw = tab_size * font->get_char_size(' ').width;
	}

	if (end > 0 && w + cw + begin > p_width) {
		break; //don't allow lines longer than assigned width
	}

	w += cw;
	fw += cw;

	end++;
}
CHECK_HEIGHT(fh);
ENSURE_WIDTH(w);
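The stop condition of that loop can be isolated to show why an unspaced CJK sentence is one unbreakable "word". This is a simplified model, not Godot's code: `word_end` is a hypothetical name, char32_t stands in for Godot's character type, and the width and tab handling are dropped.

```cpp
#include <cstddef>

// Simplified model of the inner loop above: only the stop condition is kept.
// Scanning stops at the first position whose previous character is a space
// and whose own character is not, i.e. at the start of the next
// space-separated word, or at the terminating 0.
std::size_t word_end(const char32_t *c) {
    std::size_t end = 0;
    while (c[end] != 0 && !(end > 0 && c[end - 1] == U' ' && c[end] != U' ')) {
        end++;
    }
    return end;
}
```

For `U"hello world"` the scan stops at index 6, but for `U"向最坏处着想,向最好处努力"` it runs to the end of all 13 characters, so the only places such text can wrap are spaces or item boundaries such as the one a `[b]` tag introduces.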

When I rewrote the condition as a test, Japanese and Chinese lines broke correctly, but English lines broke incorrectly instead.

I recommend that you implement the correct line break definition here.
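One way to change the condition without breaking English is to keep the space rule and additionally allow a break next to any CJK character. This is a minimal sketch under stated assumptions: `is_cjk`, `can_break_before` and `word_end_cjk` are hypothetical names, the ranges are a simplification, and it does not implement punctuation or kinsoku rules (it would still allow a break before `,` or a small kana).

```cpp
#include <cstddef>

// Hypothetical classification: kana plus the main CJK ideograph blocks.
bool is_cjk(char32_t ch) {
    return (ch >= 0x3040 && ch <= 0x30FF)    // Hiragana and Katakana
        || (ch >= 0x3400 && ch <= 0x4DBF)    // CJK Extension A
        || (ch >= 0x4E00 && ch <= 0x9FFF)    // CJK Unified Ideographs
        || (ch >= 0xF900 && ch <= 0xFAFF);   // CJK Compatibility Ideographs
}

// A break is allowed before c[i] after a space (the existing rule) or next
// to a CJK character, so Latin words still wrap only at spaces.
bool can_break_before(const char32_t *c, std::size_t i) {
    if (i == 0 || c[i] == 0) {
        return false;
    }
    const char32_t prev = c[i - 1];
    const char32_t cur = c[i];
    if (cur == U' ') {
        return false;  // keep runs of spaces attached to the preceding word
    }
    return prev == U' ' || is_cjk(prev) || is_cjk(cur);
}

// Same scan as the original loop, with the new break rule.
std::size_t word_end_cjk(const char32_t *c) {
    std::size_t end = 0;
    while (c[end] != 0 && !can_break_before(c, end)) {
        end++;
    }
    return end;
}
```

English keeps its old behavior (`U"hello world"` still stops at 6), while `U"向最坏"` now yields a one-character word, so a line can wrap between any two ideographs.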


TokageItLab · 2021-01-19T16:51:36Z

@erbing315 Rather than splitting all the words and increasing the number of items, it would be better to change how line breaks are determined in _process_line(), based on the character encoding (code point range) of each character in *c. Good luck.

erbing315

> @erbing315 Rather than splitting all the words and increasing the number of items, it would be better to change the determination of line breaks according to the character encoding of the chars: *c in _process_line(). Good luck.

Thank you, but I resolved that problem, and it works in my IDE. However, it doesn't pass the CI checks.

akien-mga · 2021-01-19T17:12:29Z

For reference, note that this issue should already be fixed in the master branch by @bruvzg's work on Complex Text Layouts, especially the integration of ICU dictionary data to do proper word and line wrapping.

Any intermediate solution for 3.2 should therefore try not to be too disruptive, as the main fix for this issue has already landed for 4.0 and later.

erbing315 · 2021-01-22T07:48:24Z

@TokageItLab Does Japanese care about line breaks before small kana? Like this:
楽しい時間はあ
っという間に

アマチ
ュア
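The rule being asked about (small kana such as っ or ュ should not begin a line) is part of Japanese line-breaking rules, commonly called kinsoku shori. A minimal sketch of the check, assuming a hypothetical helper name and covering only a subset of the prohibited characters (small hiragana, small katakana and the prolonged sound mark):

```cpp
// Hypothetical helper: a subset of the characters that Japanese
// line-breaking rules do not allow at the start of a line.
bool is_line_start_prohibited(char32_t ch) {
    switch (ch) {
    case 0x3041: case 0x3043: case 0x3045: case 0x3047: case 0x3049:  // ぁぃぅぇぉ
    case 0x3063: case 0x3083: case 0x3085: case 0x3087: case 0x308E:  // っゃゅょゎ
    case 0x30A1: case 0x30A3: case 0x30A5: case 0x30A7: case 0x30A9:  // ァィゥェォ
    case 0x30C3: case 0x30E3: case 0x30E5: case 0x30E7: case 0x30EE:  // ッャュョヮ
    case 0x30FC:                                                      // ー
        return true;
    default:
        return false;
    }
}
```

A line breaker would reject a candidate break whenever the character after it satisfies this predicate, which rules out both splits shown above (before っ and before ュ).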

TokageItLab · 2021-01-22T16:21:53Z

@erbing315 Yes, it's true that Japanese doesn't break lines like that. But those line-breaking rules are supposed to be handled in 4.0, just as @bruvzg and @akien-mga said. I think the problem here is whether the line breaks correctly when text is enclosed in tags.

For example,

[b]最高[/b]っぽい

it becomes:

最高
っぽい

It is a mistake to treat text enclosed in tags like this as a separate word. This may already be fixed in #43691.
@bruvzg Excuse me, is that right?

erbing315 · 2021-01-23T06:52:08Z

@TokageItLab No... BBCode tags still break words in master, for example:
can bbc[b]ode bre[/b]ak word?

I will try to fix it, but it's difficult because the behavior is determined by the tag stack's data structure.
Maybe I will fix it with a plugin...
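One way around the tag-stack limitation is to compute break opportunities on the concatenated text of all items and then map them back to (item, offset) pairs, so item boundaries created by tags never count as breaks. This is a hypothetical sketch: `Run`, `Pos` and `space_breaks_across_runs` are illustrative stand-ins for RichTextLabel's item list, and only the space rule is applied (a CJK rule could be plugged into the same loop).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: each text item produced by a BBCode tag is a run.
struct Run {
    std::u32string text;  // styling omitted
};

// Location of a break: which run, and the offset inside that run.
struct Pos {
    std::size_t run;
    std::size_t offset;
};

// Break opportunities are computed on the concatenated text, so a tag in
// the middle of a word ("bbc[b]ode") does not create a break.
std::vector<Pos> space_breaks_across_runs(const std::vector<Run> &runs) {
    std::u32string flat;
    std::vector<Pos> map;  // flat index -> (run, offset)
    for (std::size_t r = 0; r < runs.size(); ++r) {
        for (std::size_t i = 0; i < runs[r].text.size(); ++i) {
            flat.push_back(runs[r].text[i]);
            map.push_back({r, i});
        }
    }
    std::vector<Pos> breaks;
    for (std::size_t i = 1; i < flat.size(); ++i) {
        if (flat[i - 1] == U' ' && flat[i] != U' ') {
            breaks.push_back(map[i]);
        }
    }
    return breaks;
}
```

For the runs `can bbc`, `ode bre` and `ak word?` (the example above split at its tags), this finds exactly three breaks, before "bbcode", "break" and "word?", and none at the tag boundaries.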

KoBeWi · 2021-01-23T12:43:44Z

Line breaking by tags is another issue (#41963) that should be solved in a new PR.

akien-mga · 2021-06-09T08:54:48Z

Superseded by #49280. Thanks for the contribution anyway!

erbing315 added 2 commits January 19, 2021 01:03

fix Chinese&Japanese erroneous newline

e3b8ef7

Update rich_text_label.cpp

d60e7a5

Calinou added bug topic:gui labels Jan 18, 2021

Calinou added this to the 3.2 milestone Jan 18, 2021

akien-mga requested review from bruvzg and a team January 18, 2021 17:15

bruvzg reviewed Jan 18, 2021


erbing315 added 2 commits January 20, 2021 00:08

fix Chinese&Japanese erroneous newline

4ddf4be


Update rich_text_label.cpp

d9e7e83

erbing315 commented Jan 19, 2021


erbing315 added 3 commits January 20, 2021 01:35

Update rich_text_label.cpp

ae82e95

Update rich_text_label.cpp

c3d5dea

promote performance

affb7ca

improve judging language

3cc5c9a

erbing315 changed the title ~~fix Chinese&Japanese erroneous newline~~ [3.2]fix Chinese&Japanese erroneous newline Jan 23, 2021

Base automatically changed from 3.2 to 3.x March 16, 2021 11:11

akien-mga modified the milestones: 3.2, 3.3 Mar 17, 2021

akien-mga modified the milestones: 3.3, 3.4 Mar 26, 2021

akien-mga changed the title ~~[3.2]fix Chinese&Japanese erroneous newline~~ [3.x] Fix Chinese&Japanese erroneous newline Mar 26, 2021

timothyqiu mentioned this pull request Jun 3, 2021

[3.x] Fix RichTextLabel auto-wrapping on CJK texts #49280

Merged

akien-mga closed this Jun 9, 2021

akien-mga added the archived label Jun 9, 2021

timothyqiu mentioned this pull request Apr 16, 2022

[3.x] Fix Label autowrap for CJK text #60294

Merged
