UnicodeEncodeError: 'latin-1' codec can't encode characters #933

Closed
hackinteach opened this issue Sep 28, 2023 · 2 comments · Fixed by #935

Comments

@hackinteach

Describe the bug

When a PDF contains non-Latin text (Thai in this case), pdf.output() fails to write the PDF to a file and raises this exception:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-11: ordinal not in range(256)

Error details

Traceback (most recent call last):
  File "~/sandbox/fpdf-bug/main.py", line 16, in <module>
    pdf.output(f"test_encrypt_utf8_fpdf_{__version__}.pdf")
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/fpdf.py", line 4784, in output
    self.buffer = output_producer.bufferize()
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/output.py", line 443, in bufferize
    pdf_obj.serialize(_security_handler=fpdf._security_handler)
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/syntax.py", line 172, in serialize
    obj_dict = self._build_obj_dict(_security_handler)
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/syntax.py", line 193, in _build_obj_dict
    return build_obj_dict(
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/syntax.py", line 244, in build_obj_dict
    value = value.serialize(
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/syntax.py", line 277, in serialize
    return _security_handler.encrypt_string(self, _obj_id)
  File "~/sandbox/fpdf-bug/venv/lib/python3.10/site-packages/fpdf/encryption.py", line 215, in encrypt_string
    return f"<{bytes(self.encrypt_bytes(string.encode('latin-1'), obj_id)).hex().upper()}>"
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-11: ordinal not in range(256)
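
The failing call on the last frame can be reproduced without fpdf2. The sketch below assumes the string being encoded is the section title from the report, which has 12 Thai characters and so matches "position 0-11" in the message. The variable names are illustrative only.

```python
# Section title from the report; every character is outside Latin-1.
title = "ทดสอบภาษาไทย"

error = None
try:
    # Same operation as string.encode('latin-1') in encryption.py.
    title.encode("latin-1")
except UnicodeEncodeError as exc:
    error = exc
    # exc.end is exclusive, so the last offending index is exc.end - 1.
    print(f"{exc.reason} (characters {exc.start} to {exc.end - 1})")
```

Latin-1 only covers code points 0 to 255, while the Thai block starts at U+0E00, so no character of the title can be encoded.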

Minimal code

from fpdf import FPDF, __version__

if __name__ == "__main__":
    print(__version__)
    pdf = FPDF()
    pdf.add_page()
    pdf.add_font("Garuda", fname="garuda.ttf")
    pdf.set_font("Garuda", size=12)
    pdf.start_section("ทดสอบภาษาไทย")
    pdf.write(
        txt="สวัสดี ทดสอบภาษาไทย กีกี้ กาก้า ก๋า อ้า อ้ำ ฤาษี ทุ่มทุน อุ้งอุ๋ง น้ำใจ ฯลฯ ญาญ่า ฐาน ฎีกา ฏฒัฯนณ ภัทร์ สิทธิ์")  # This line contains non-latin text
    pdf.set_encryption("password", "password")
    pdf.output(f"fpdf_{__version__}.pdf")

Environment
Please provide the following information:

  • Operating System: macOS Ventura 13.6
  • Python version: 3.10.6
  • fpdf2 version used: tested on tag 2.7.0 and 2.7.5
@Lucas-C
Member

Lucas-C commented Sep 28, 2023

Hi @hackinteach!

Thank you for reporting this 👍
I was able to reproduce your problem.

It seems the problem comes from serializing the PDFString .title of an OutlineItemDictionary object.

I won't have time to look for a fix right now, so anyone is welcome to submit a PR to fix this 🙂
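
For anyone picking this up, here is one possible direction, as a sketch only and not necessarily how an actual fix would be written. PDF text strings can be stored as UTF-16BE prefixed with the byte order mark FE FF, so the bytes handed to the encryption step could fall back to that form when Latin-1 fails. The helper name `encode_pdf_text_string` is hypothetical and not part of fpdf2.

```python
import codecs


def encode_pdf_text_string(text: str) -> bytes:
    """Hypothetical helper: return bytes suitable for a PDF text string."""
    try:
        # Keep the existing single-byte behaviour whenever it works.
        return text.encode("latin-1")
    except UnicodeEncodeError:
        # Fall back to UTF-16BE with a leading BOM (b"\xfe\xff").
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


print(encode_pdf_text_string("ทดสอบภาษาไทย").hex().upper())
```

Trying Latin-1 first keeps the output for existing Latin-only documents byte-for-byte unchanged, and only strings that previously crashed take the new path.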

@Lucas-C
Member

Lucas-C commented Sep 29, 2023

@andersonhc fixed this in PR #935

You can test it by installing the latest version of fpdf2 from this repo:

pip install git+https://github.com/py-pdf/fpdf2.git@master

We are planning to release a new version of fpdf2 soon.
