Unable to insert UTF-8 character with fast_executemany option #617
Comments
Could you post an ODBC trace? |
Hi, thank you for looking into this. Below attached is the ODBC trace, I have executed the snippet posted in the initial comment: |
It looks like Python3 Unicode conversions don't really handle characters that require two UTF-16 code units (a surrogate pair) very well, so "space", "🎥", "space" gets mangled. I need to dig a lot deeper to see if there is a neat fix for this, but one workaround is to manually escape each code unit of the pair with a "\u" in front, so in the case of "🎥", which is "d83c dfa5" in UTF-16, you'd put "\ud83c\udfa5" |
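For reference, a minimal sketch of where "d83c dfa5" comes from. The helper names `utf16_code_units` and `surrogate_pair` are made up for illustration and are not part of pyodbc; only the standard library is used.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of `text` as 4-digit hex strings."""
    raw = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [raw[i:i + 2].hex() for i in range(0, len(raw), 2)]


def surrogate_pair(code_point):
    """Compute the (high, low) surrogates for a code point above U+FFFF."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


camera = "\U0001f3a5"  # the movie camera emoji from this issue
print(utf16_code_units(camera))
print([format(unit, "04x") for unit in surrogate_pair(ord(camera))])
```

Both lines should print the same pair of code units, since the encoder applies the same arithmetic internally.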
For me, with Python 3.7.4 under Windows 7,
str2 = " \ud83c\udfa5 "
print(str2)
# UnicodeEncodeError: 'utf-8' codec can't encode characters in position 1-2: surrogates not allowed
However,
str2 = " \U0001f3a5 "
print(str2)
# 🎥
Unfortunately, it isn't really a workaround because the string parameter value is the same whether it is defined using a string literal (" 🎥 ") or an escaped Unicode code point (as above). I also notice that the example code fails if the column is declared as |
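A small sketch of the point being made here, using only plain Python 3 and no database: the literal and the `\U` escape produce the identical string, while the two `\u` surrogate escapes produce a different, four-character string that cannot be encoded as UTF-8. The last line shows one way to merge such a lone pair back into a single character; that step is my own addition, not something proposed in this thread.

```python
literal = " 🎥 "
escaped = " \U0001f3a5 "
same = literal == escaped  # identical str objects by value

# Two separate surrogate code points, not one character.
lone = " \ud83c\udfa5 "

try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False  # "surrogates not allowed"

# Round-trip through UTF-16 with surrogatepass so the decoder
# recombines the pair into U+1F3A5.
repaired = lone.encode("utf-16", "surrogatepass").decode("utf-16")

print(same, len(literal), len(lone), encodable, repaired == literal)
```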
Yes, I forgot to mention this in the original description - this is only an issue if you insert to However, the string |
I know what the problem is now: it's due to some confusion between the number of characters and the number of codepoints in the conversion code. I checked in a fix that should work for this case: However, in the process of testing it, I discovered that when using Python2 on Windows, inserting any Unicode character in this manner doesn't work. It doesn't do any conversion and so gives the driver the data in the source format (i.e. UTF-8), which the driver doesn't interpret correctly: Windows itself is not set to UTF-8, so the driver treats it as code page 1252. So I will see what to do about that and will make a PR once I have a fix for that case as well. |
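To illustrate the two effects described in this comment, here is a rough sketch in plain Python. It is not the pyodbc conversion code (which is C++); the function names are hypothetical and only show the kind of mismatch involved.

```python
def code_point_count(text):
    """Number of Unicode code points, which is what len() reports in Python 3."""
    return len(text)


def utf16_unit_count(text):
    """Number of 16-bit units needed to hold `text` as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def misread_utf8_as_cp1252(text):
    """Simulate a consumer that treats UTF-8 bytes as code page 1252."""
    return text.encode("utf-8").decode("cp1252")


value = " \U0001f3a5 "
# A buffer sized from the first count is one unit too short for the second.
print(code_point_count(value), utf16_unit_count(value))

# UTF-8 bytes read as 1252 come out as mojibake.
print(misread_utf8_as_cp1252("\u00e9"))
```

For strings made up only of characters at or below U+FFFF the two counts agree, which would explain why the problem only shows up with characters like 🎥.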
If @mkleehammer feels so inclined perhaps we could take this opportunity to discuss the future of Python2 support. |
Thank you for your support, I have manually built pyodbc with proposed fix (v-makouz@606b4a9) and it seems to work flawlessly since then (on Windows 10, Python 3.7.1). |
@gordthompson What are your thoughts on Python 2 support? I'd love to stop updates for it. I personally maintain a large code base still using Python 2 for some of the microservices, but because they are critical 24x7 services I rarely want to update the pyodbc versions. (They'll hopefully be ported in 12 months or so.) I'd be happy to make occasional security / crash fixes on a dedicated Python 2 branch for a couple of years. If we take that step, we should also look at what the minimum Python 3 version should be. I guess that would depend on which LTS OS versions are out there. |
In that case I would propose the following: At the end of 2019, create a pyodbc4 branch. The master branch will then be for v5 (and beyond) and will be for Python 3(+) only. Probably no need to do a massive cleanup to remove all Python 2 code from master; as other changes are made to a particular source file just remove the Python 2-specific stuff at the same time. Eventually the Python 2 remnants will disappear.
My personal preference would be for 3.6+, but I notice that Ubuntu 16.04 LTS still distributes 3.5. Ubuntu 16.04 is on Standard Support until April 2021 (and EOL three years later) so it will be around for a while. So, we might have to go with Python 3.5+, at least at the source level (e.g., no f'{string}'s 😢) for the time being. For Windows we could probably get away with only building 3.6+ wheels. (I don't know about Macs.) |
Does this fix enable high unicode chars for inserts that don't use parameterized queries? instead of |
@timnyborg - Inserts using |
Well, while the parameterized method works on my stack (Python 3.7, MSSQL, pyodbc 4.0.27, FreeTDS 1.00.40), the literals cause a SQL syntax error. Perhaps it's a separate issue from the interaction with FreeTDS?
|
Try the Microsoft ODBC Driver for SQL Server instead: https://docs.microsoft.com/en-us/sql/connect/odbc/microsoft-odbc-driver-for-sql-server |
Yep, works with the Microsoft drivers (didn't even know they were available!). However, I needed to implement an OpenSSL fix on Debian 10 to get around error 0x2746: Thanks, all. |
Is the fix for this problem (parameters not literals) currently scheduled for any release? In the interim is there a workaround, a way of 'pre-treating' the strings to bypass the error? The closest I can get is I load/reload a lot of data daily, many millions of rows, so I need Any advice is appreciated, thanks. |
Well, it's not pretty but it works:
If there is a better way, and there probably is, please let me know... Thanks |
* Merging updates. (#1) Merging updates.
* fix for smalldatetime issue
* Fixed a bad merge
* Fix for inserting high unicode chars
* merge with main branch
* Fix for function sequence error
* reverted unnecessary file changes
* removed obsolete include
* fix for 540
* fix for TVP type mismatch issue
* Combined the IFs
* Fix for high unicode insertion, WIP
* Fix python2 high unicode insertion
* Renamed a table to t1
Co-authored-by: v-chojas <25211973+v-chojas@users.noreply.github.com>
Co-authored-by: Michael Kleehammer <michael@kleehammer.com>
I am trying to copy data from one SQL Server instance to another using the pyodbc package, and I encountered an error while handling UTF-8 characters on Windows.
I have narrowed it down to this snippet, which fails for me:
I get the following error:
Everything works fine when I set crsr.fast_executemany=False. This seems to be somewhat related to #246, but this happens on a Windows system.
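For anyone who needs to keep most of the speed on an affected pyodbc version, one possible interim approach is to send only the rows containing characters above U+FFFF through the slow path. This is an untested sketch, not a fix from the maintainers: `has_astral_char`, `split_rows` and `insert_rows` are hypothetical helpers, `cnxn` is assumed to be an open pyodbc connection, and rows are assumed to be tuples of parameter values. Note that it changes the order in which rows are inserted.

```python
def has_astral_char(value):
    """True if `value` is a str containing a code point above U+FFFF."""
    return isinstance(value, str) and any(ord(ch) > 0xFFFF for ch in value)


def split_rows(rows):
    """Partition rows into (safe for fast_executemany, needs the slow path)."""
    fast, slow = [], []
    for row in rows:
        target = slow if any(has_astral_char(v) for v in row) else fast
        target.append(row)
    return fast, slow


def insert_rows(cnxn, sql, rows):
    """Insert rows, using fast_executemany only where it is known to be safe.

    Assumption: `cnxn` is a pyodbc connection and `sql` is a parameterized
    INSERT statement with one `?` per value in each row.
    """
    fast, slow = split_rows(rows)
    crsr = cnxn.cursor()
    if fast:
        crsr.fast_executemany = True
        crsr.executemany(sql, fast)
    if slow:
        crsr.fast_executemany = False
        crsr.executemany(sql, slow)
    cnxn.commit()
```

If only a small fraction of rows contain such characters, most of the load should still go through the fast path.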