Determine string width before detecting the language of the string and applying this setting to the font shape engine leads to wrong get_string_width() #1231

kreier · 2024-07-25T06:14:27Z

The width returned by pdf.get_string_width(string) depends on the script and language set for the shaping engine (when it is used). But even after explicitly setting the shaping engine to a specific script and language, with something like pdf.set_text_shaping(use_shaping_engine=True, script="arab", language="ara"), this setting can change, for example when a string with Latin characters is printed: the shaping engine examines the first character, notices the mismatch, and switches to Latin text shaping. When the next string is rendered, its width is determined first, using the old (now Latin) setting; only after that does the shaping engine detect the language (Arabic in this case) and switch to that script and language. The returned width is therefore based on a calculation with the wrong, Latin setting.
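The ordering problem described above can be illustrated with a small toy model. This is only a sketch of the behavior as reported here, not fpdf2's actual implementation: the ToyShaper class, its detect() rule, and the per-character advances are all invented for illustration.

```python
class ToyShaper:
    """Toy stand-in for a PDF object with mutable shaping state (not fpdf2 code)."""

    # Made-up per-character advances, chosen only so the two scripts differ.
    ADVANCE = {"arab": 4, "latn": 6}

    def __init__(self, script):
        self.script = script

    @staticmethod
    def detect(text):
        # Crude detection that looks at the first character only.
        return "arab" if "\u0600" <= text[0] <= "\u06FF" else "latn"

    def get_string_width(self, text):
        # Measures with whatever script is currently set, even if it is stale.
        return len(text) * self.ADVANCE[self.script]

    def cell(self, text):
        # Rendering detects the script first, then measures with it.
        self.script = self.detect(text)
        return self.get_string_width(text)


arabic = "الملوك"
pdf = ToyShaper("arab")
pdf.cell("test")                         # rendering Latin text flips the state
measured = pdf.get_string_width(arabic)  # uses the stale "latn" state
rendered = pdf.cell(arabic)              # switches back to "arab" before measuring
```

In this toy model the width measured before rendering differs from the width used while rendering, which is the mismatch that misplaces the right-aligned strings.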

I discovered this bug in a document that mixes Latin and non-Latin strings, where the non-Latin strings were sometimes misplaced. The example below visualizes this behavior.

Minimal code

from fpdf import FPDF

fontname = ["NotoArabic.ttf"]
teststrings = ["الملوك", "الملوك", "test", "الملوك", "الملوك", "الملوك", "test", "الملوك", "test"]


def render_strings(pdf, teststrings):
    pdf.set_font("noto", size=24)
    pdf.set_draw_color(160)
    pdf.set_line_width(0.3)
    for string in teststrings:
        # Workaround: re-apply the shaping settings before every measurement.
        # pdf.set_text_shaping(use_shaping_engine=True, script="arab", language="ara")
        width = pdf.get_string_width(string)
        # Right-align each string at x = 110 mm and frame its measured width.
        pdf.set_x(110 - width)
        pdf.rect(pdf.get_x(), pdf.get_y() + 2, width, 13, style="D")
        pdf.cell(h=17, text=string)
        pdf.ln()
    pdf.ln()


for typeface in fontname:
    pdf = FPDF(orientation="P", unit="mm", format="A4")
    pdf.add_page()
    pdf.c_margin = 0  # no cell padding, so the frame matches the text width
    pdf.add_font("noto", style="", fname="../../fonts/" + typeface)
    pdf.set_text_shaping(use_shaping_engine=True, script="arab", language="ara")
    render_strings(pdf, teststrings)
    pdf.output("fpdf2_switch_language" + typeface + ".pdf")

The output looks like this:

Environment

Operating System: Mac OSX
Python version: 3.12.3
fpdf2 version used: git+https://github.com/py-pdf/fpdf2.git@fbbb3f701fd35abaff1cf0b04a8576fe45e204e2 (latest master)

kreier · 2024-07-25T06:19:30Z

Update: setting the shaping engine to the desired script and language every time before determining the string width solves the problem, at least for the moment. I added this line to the example code above, commented out.
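A minimal sketch of this workaround as a reusable helper follows. The name shaped_string_width is hypothetical and not part of fpdf2. It assumes only that the object passed in exposes set_text_shaping() and get_string_width(), as an FPDF instance does.

```python
def shaped_string_width(pdf, text, script="arab", language="ara"):
    """Measure `text` after re-applying the intended shaping settings.

    `pdf` is expected to be an fpdf2 FPDF instance (or any object exposing
    set_text_shaping() and get_string_width()). This helper is hypothetical
    and not part of fpdf2.
    """
    # Reset script and language first, in case an earlier cell() changed them.
    pdf.set_text_shaping(use_shaping_engine=True, script=script, language=language)
    return pdf.get_string_width(text)
```

In the example loop, shaped_string_width(pdf, string) would take the place of the direct pdf.get_string_width(string) call. This should only be needed on fpdf2 versions that predate the fix referenced in this thread (#1233).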

andersonhc · 2024-07-25T14:51:02Z

Thanks for reporting this issue @kreier, I will take a look as soon as possible

kreier added the bug label Jul 25, 2024

kreier mentioned this issue Jul 25, 2024

Right space next to string in Arabic is not consistent for right-aligned strings kreier/timeline#51

Open

andersonhc added the text-shaping label Jul 25, 2024

andersonhc mentioned this issue Jul 27, 2024

Fix bidirectional text processing on get_string_width #1233

Merged

5 tasks

andersonhc closed this as completed in #1233 Jul 31, 2024

kreier mentioned this issue Sep 27, 2024

Alignment for RTL strings fixed kreier/timeline#54

Merged
