UnicodeEncodeError: 'latin-1' codec can't encode characters #933

hackinteach · 2023-09-28T05:19:35Z

Describe the bug

When a PDF contains non-Latin text (Thai in this case) and encryption is enabled, pdf.output() fails to write the PDF to a file with the exception:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-11: ordinal not in range(256)

Error details

Traceback (most recent call last):
  File "~/sandbox/fpdf-bug/main.py", line 16, in <module>
    pdf.output(f"test_encrypt_utf8_fpdf_{__version__}.pdf")
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/fpdf.py", line 4784, in output
    self.buffer = output_producer.bufferize()
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/output.py", line 443, in bufferize
    pdf_obj.serialize(_security_handler=fpdf._security_handler)
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/syntax.py", line 172, in serialize
    obj_dict = self._build_obj_dict(_security_handler)
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/syntax.py", line 193, in _build_obj_dict
    return build_obj_dict(
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/syntax.py", line 244, in build_obj_dict
    value = value.serialize(
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/syntax.py", line 277, in serialize
    return _security_handler.encrypt_string(self, _obj_id)
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/encryption.py", line 215, in encrypt_string
    return f"<{bytes(self.encrypt_bytes(string.encode('latin-1'), obj_id)).hex().upper()}>"
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-11: ordinal not in range(256)
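The failing step can be reproduced without fpdf2. The last frame of the traceback shows encrypt_string() calling string.encode('latin-1'). Latin-1 only covers code points 0 to 255, and Thai characters sit in the U+0E00 to U+0E7F block, so the whole 12-character section title is rejected. The helper name latin1_failure below is hypothetical and only used for illustration:

```python
def latin1_failure(text: str):
    """Hypothetical helper: return the UnicodeEncodeError message raised when
    encoding text as Latin-1 (as encrypt_string() does), or None if it encodes."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# The section title from the minimal code below: 12 Thai characters.
print(latin1_failure("ทดสอบภาษาไทย"))
# ASCII text encodes fine, which is why Latin-only documents are unaffected.
print(latin1_failure("Hello"))
```

The first call prints the same message as the traceback, with positions 0-11 covering the 12 characters of the title.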

Minimal code

from fpdf import FPDF, __version__

if __name__ == "__main__":
    print(__version__)
    pdf = FPDF()
    pdf.add_page()
    pdf.add_font("Garuda", fname="garuda.ttf")
    pdf.set_font("Garuda", size=12)
    pdf.start_section("ทดสอบภาษาไทย")
    pdf.write(
        txt="สวัสดี ทดสอบภาษาไทย กีกี้ กาก้า ก๋า อ้า อ้ำ ฤาษี ทุ่มทุน อุ้งอุ๋ง น้ำใจ ฯลฯ ญาญ่า ฐาน ฎีกา ฏฒัฯนณ ภัทร์ สิทธิ์")  # This line contains non-latin text
    pdf.set_encryption("password", "password")
    pdf.output(f"fpdf_{__version__}.pdf")

Environment

Operating System: macOS Ventura 13.6
Python version: 3.10.6
fpdf2 version used: tested on tags 2.7.0 and 2.7.5


Lucas-C · 2023-09-28T18:01:09Z

Hi @hackinteach!

Thank you for reporting this 👍
I was able to reproduce your problem.

It seems the problem comes from the serialization of the PDFString .title of an OutlineItemDictionary object.

I won't have time to look for a fix right now, so anyone is welcome to submit a PR to fix this 🙂
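For anyone picking this up, here is a minimal sketch of one possible approach, not necessarily what the eventual fix does. The PDF specification (ISO 32000-1, "Text String Type") allows text strings such as outline titles to be stored either in a single-byte encoding (PDFDocEncoding) or as UTF-16BE prefixed with the byte order mark FE FF. So a title that cannot be encoded as Latin-1 could fall back to UTF-16BE before being encrypted. The function names encode_text_string and to_hex_string are hypothetical; the encryption itself (encrypt_bytes() in the traceback) is deliberately left out, so the output is the unencrypted hex string.

```python
import codecs


def encode_text_string(text: str) -> bytes:
    """Hypothetical helper: encode a PDF text string as Latin-1 when possible,
    otherwise as UTF-16BE with a leading FE FF byte order mark.

    Note: Latin-1 mirrors what the traceback shows fpdf2 using; the spec's
    single-byte encoding is PDFDocEncoding, which differs for some code points.
    """
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        # The FE FF BOM tells PDF readers that the string is UTF-16BE.
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def to_hex_string(data: bytes) -> str:
    """Hypothetical helper: serialize bytes as a PDF hexadecimal string,
    in the same <...> uppercase form used by encrypt_string()."""
    return f"<{data.hex().upper()}>"


# In an encrypted document, the bytes would be encrypted between these two calls.
print(to_hex_string(encode_text_string("ทดสอบ")))
print(to_hex_string(encode_text_string("Hello")))
```

Here the Thai title comes out as <FEFF0E170E140E2A0E2D0E1A> (BOM followed by one 16-bit code unit per character), while an ASCII title such as "Hello" keeps its single-byte form, <48656C6C6F>.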


Lucas-C · 2023-09-29T06:04:18Z

@andersonhc fixed this in PR #935

You can test it by installing the latest version of fpdf2 from this repo:

pip install git+https://github.com/py-pdf/fpdf2.git@master

We are planning to release a new version of fpdf2 soon.

hackinteach added the bug label Sep 28, 2023

andersonhc added the encryption label Sep 28, 2023

Lucas-C added the outline-toc label Sep 28, 2023

andersonhc mentioned this issue Sep 28, 2023

Fix encryption of strings containing non-latin characters (Fix #933) #935 (merged)

Lucas-C closed this as completed in #935 Sep 29, 2023

Lucas-C pushed a commit that referenced this issue Sep 29, 2023

Fix encryption of strings containing non-latin characters (Fix #933) (#935)

a17d2f4
