UTF8 Surrogates Not Allowed #546

Open
rsbohn opened this issue Aug 6, 2024 · 1 comment

rsbohn commented Aug 6, 2024

Something in the text returned from GPT-4o can't be logged to the database. The insert fails with this traceback:

  File "C:\tools\hudson\Lib\site-packages\sqlite_utils\db.py", line 3310, in insert_all
    self.insert_chunk(
  File "C:\tools\hudson\Lib\site-packages\sqlite_utils\db.py", line 3068, in insert_chunk
    result = self.db.execute(query, params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\tools\hudson\Lib\site-packages\sqlite_utils\db.py", line 524, in execute
    return self.conn.execute(sql, parameters)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc81' in position 14511: surrogates not allowed
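
For readers unfamiliar with this error, here is a minimal sketch of what the message means. '\udc81' is a lone surrogate code point: Python allows it inside a str, but it cannot be encoded as UTF-8. One way such a character can appear (an assumption, not confirmed in this thread) is when a byte that is not valid UTF-8, such as 0x81, is decoded with the "surrogateescape" error handler. The sample bytes and the helper name `try_encode` are hypothetical.

```python
# Hypothetical input: the byte 0x81 is not valid UTF-8 on its own.
raw = b"caf\x81"

# With errors="surrogateescape", the undecodable byte 0x81 is mapped
# to the lone surrogate U+DC81 instead of raising an error.
text = raw.decode("utf-8", errors="surrogateescape")


def try_encode(s: str) -> str:
    """Return "ok" if s can be encoded as UTF-8, else the codec's reason."""
    try:
        s.encode("utf-8")
        return "ok"
    except UnicodeEncodeError as exc:
        return exc.reason


print(repr(text))        # 'caf\udc81'
print(try_encode(text))  # surrogates not allowed
```

Whether the surrogate in this issue came from the model output or from the piped input file is not established here.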

Workaround: disable logging and run the prompt again.

PS> cat .\transcript.csv | llm -m 4o -s "Extract each place name."

AlexanderYastrebov commented

Would be nice to have a small reproducer file.
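
Until a reproducer file is available, a rough sketch of a standalone reproduction might look like the following. It uses Python's built-in sqlite3 module directly rather than sqlite_utils or llm, on the assumption that the failing `self.conn.execute` call in the traceback is a sqlite3 connection. The table name `responses`, the sample string, and the helpers `reproduce` and `scrub` are all hypothetical.

```python
import sqlite3


def reproduce(value: str) -> str:
    """Try to insert value into an in-memory table and report the outcome."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE responses (response TEXT)")
    try:
        # Binding a str parameter requires encoding it as UTF-8.
        conn.execute("INSERT INTO responses (response) VALUES (?)", (value,))
        return "inserted"
    except UnicodeEncodeError as exc:
        return f"UnicodeEncodeError: {exc.reason}"
    finally:
        conn.close()


def scrub(value: str) -> str:
    """Replace characters that cannot be encoded as UTF-8 with '?'."""
    return value.encode("utf-8", errors="replace").decode("utf-8")


bad = "Springfield\udc81"     # hypothetical text containing a lone surrogate
print(reproduce(bad))         # expected to report the UnicodeEncodeError
print(reproduce(scrub(bad)))  # expected to insert the scrubbed text
```

Scrubbing with errors="replace" loses the original byte, so it is only one possible mitigation; it is not something the maintainers have proposed in this thread.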
