Buffer overflow in cursor create_name_map #931

Merged: 13 commits into mkleehammer:master, Aug 16, 2021

Conversation

@juliuszsompolski (Contributor) commented Jul 14, 2021

Function create_name_map used a static buffer szName[300] for the column name in the call to SQLDescribeColW. That function would truncate the column name to the size of the provided buffer, but return the actual length of the column name in the &cchName output parameter.
That cchName output was used to compute the cbName parameter for the call to TextBufferToObject(enc, szName, cbName). When the actual column name was longer than the buffer, cbName was too big, causing a buffer overflow that resulted in crashes, invalid data, or errors.

Fix this by making szName a dynamically allocated buffer. This buffer overflow is very similar to the one fixed in #881. I checked through the rest of the code base and don't see other places with static buffers that could have a similar issue.
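
In outline, the fix follows the pattern sketched below (my approximation of its shape, not the literal pyodbc code; the function name DescribeColName is made up for the example). SQLDescribeColW reports the full name length in cchName even when it truncates, so the caller can grow the buffer and describe the column again:

#include <sql.h>
#include <sqlext.h>
#include <sqlucode.h>
#include <stdlib.h>

// Sketch only: grow the name buffer and retry whenever the driver reports
// a column name longer than the space it was offered.
static SQLWCHAR* DescribeColName(SQLHSTMT hstmt, SQLUSMALLINT iCol)
{
    SQLSMALLINT cchName = 0, dataType = 0, decimals = 0, nullable = 0;
    SQLULEN colSize = 0;
    SQLSMALLINT nameLen = 300;   // the old fixed size becomes an initial guess
    SQLWCHAR* szName = (SQLWCHAR*)malloc((nameLen + 1) * sizeof(SQLWCHAR));
    if (!szName)
        return NULL;

retry:
    if (!SQL_SUCCEEDED(SQLDescribeColW(hstmt, iCol, szName, nameLen, &cchName,
                                       &dataType, &colSize, &decimals, &nullable)))
    {
        free(szName);
        return NULL;
    }
    if (cchName >= nameLen)
    {
        // Truncated: cchName holds the real length, so resize and describe again.
        nameLen = (SQLSMALLINT)(cchName + 1);
        SQLWCHAR* grown = (SQLWCHAR*)realloc(szName, (nameLen + 1) * sizeof(SQLWCHAR));
        if (!grown)
        {
            free(szName);   // the real code raises PyErr_NoMemory() here
            return NULL;
        }
        szName = grown;
        goto retry;
    }
    return szName;   // caller frees
}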

Other systems do not support such long column names.
Because I couldn't test it with other systems, I adapted python3 test suites to run on Spark, and added the test there.

$ python tests3/sparktests.py DSN=DatabricksAzureWestEurope
Library: /home/julek/dev/pyodbc/build/lib.linux-x86_64-3.7/pyodbc.cpython-37m-x86_64-linux-gnu.so
----------------------------------------------------------------------
Ran 58 tests in 557.160s

OK

@juliuszsompolski (Contributor, Author)

cc @mkleehammer @keitherskine This is a buffer overflow very similar to the one fixed in #881. We encountered it with the Simba Spark ODBC driver.

PyErr_NoMemory();
goto done;
}
goto retry;
@juliuszsompolski (Contributor, Author)

I think goto is justified here, to avoid repeating that logic...

Comment on lines +185 to +186
nameLen = cchName + 1;
if (!pyodbc_realloc((BYTE**) &szName, (nameLen + 1) * sizeof(ODBCCHAR))) {
@juliuszsompolski (Contributor, Author)

Like in my previous PR, I prefer to do a defensive double +1 here: better a buffer that is slightly too big, with room for an extra character (such as a termination character), than one that is too small.
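
A worked example of that sizing (my arithmetic, not code from the PR; it assumes pyodbc's ODBCCHAR is 2 bytes):

// Hypothetical: a driver reports a 500-character column name in cchName.
typedef unsigned short ODBCCHAR;   // assumed 2-byte ODBC character type

// nameLen = cchName + 1                    -> 501 (room for the terminator)
// bytes = (nameLen + 1) * sizeof(ODBCCHAR) -> 502 * 2 = 1004
static_assert((500 + 1 + 1) * sizeof(ODBCCHAR) == 1004,
              "a 500-char name gets a 502-slot, 1004-byte buffer");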

@v-chojas (Contributor)

I am curious to learn which databases will let you use such long column names, and what demand there is for that capability in the first place -- 300 already seems very generous. I can understand that error messages could be extremely verbose, but a column name doesn't seem to make sense at that length.

SQL Server has a maximum of 128. The first paragraph of my comment is 308 characters long.

@juliuszsompolski (Contributor, Author) commented Jul 14, 2021

@v-chojas Apache Spark allows it, and the Simba ODBC driver for Spark allows it.
We are hitting it in our internal testing of Simba ODBC, and Redash, which uses pyodbc, hits it too.
When a complex expression is specified in Spark and no alias is given to it, Spark autogenerates the column name from the full expression text. If it's a complex CASE expression, it can end up overflowing 300 characters.
It can be argued that this behaviour of Spark is stupid, but changing it there is a much bigger change.

@juliuszsompolski (Contributor, Author)

@v-chojas the tests I added show that MS Access, Informix, SQLite and SQL DW also allow it.
SQL Server, MySQL and PG SQL did not allow it, so I removed the tests for them.

@v-chojas (Contributor)

MS Access has a limit of 64, and SQL DW is based on SQL Server, so I would expect the same limit to apply there.

My research shows that Informix also has 128, and SQLite is the only one that claims "no limit".

juliuszsompolski added a commit to databricks/pyodbc that referenced this pull request Jul 15, 2021
@juliuszsompolski (Contributor, Author)

@v-chojas The tests added here suggest that they are nevertheless happy to create a table with column names 500 and 600 characters long, to query it, and to return those long column names.
I am not proposing this change because I'm on a nitpicking crusade to hunt irrelevant buffer overflows, but because our users have been hitting this and experiencing crashes, mangled column names, or UTF encoding/decoding errors. We have been asked for an urgent fix, so tentatively I forked pyodbc and created https://github.com/databricks/pyodbc/releases/tag/v20210715. We do, however, want to work with upstream to get our changes and fixes into the official pyodbc releases.

@v-chojas (Contributor)

I don't know how you tested, but I have access to a DW instance and...

>>> conn = pyodbc.connect('Driver=ODBC Driver 17 for SQL Server;Server=<my DW>;...')
>>> stmt = conn.cursor()
>>> colname = 'col' + ('1234567890' * 60)
>>> stmt.execute('create table test931(' + colname + ' int)')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Parse Error: Identiier 'col1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890...' exceeded the maximum length of 128. (104307) (SQLExecDirectW)")
>>> stmt.execute('select @@version').fetchall()
[('Microsoft Azure SQL Data Warehouse  10.0.16003.0 Apr 28 2021 04:55:16 Copyright (c) Microsoft Corporation', )]

Comment on lines 902 to 914
def test_long_column_name(self):
    "ensure super long column names are handled correctly."
    c1 = 'abcdefghij' * 50
    c2 = 'klmnopqrst' * 60
    self.cursor = self.cnxn.cursor()

    self.cursor.execute("create table t1({} int, {} int)".format(c1, c2))
    self.cursor.execute("select * from t1")

    names = [ t[0] for t in self.cursor.description ]
    names.sort()

    self.assertEqual(names, [ c1, c2 ])
@juliuszsompolski (Contributor, Author)

@v-chojas I don't have access to a SQL DW instance, so I depended on the CI...
Oh, I get it. https://app.travis-ci.com/github/mkleehammer/pyodbc/jobs/524518887 The CI actually runs only the SQL Server, PG SQL and MySQL tests...

@juliuszsompolski (Contributor, Author)

I removed the tests for Access, Informix, and SQL DW.
I tested with SQLite. What SQLite does: if a BufferLength of 300 is passed to the SQLDescribeCol function, it returns the column name truncated to 300 characters and returns 300 in NameLengthPtr.
That does not conform to the spec at https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqldescribecol-function?view=sql-server-ver15, which says the total length of the actual column name should be returned:

NameLengthPtr
[Output] Pointer to a buffer in which to return the total number of characters (excluding the null termination) available to return in *ColumnName. If the number of characters available to return is greater than or equal to BufferLength, the column name in *ColumnName is truncated to BufferLength minus the length of a null-termination character.

Therefore, it seems it can't be tested with the existing set of tests. Thanks @v-chojas for checking all these systems.
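
To make the difference concrete, here is a toy illustration (my numbers, not driver code) of what each behavior reports for a hypothetical 500-character name described into a 300-character buffer:

#include <stdio.h>

int main(void)
{
    const int realLen = 500, bufferLen = 300;

    // Spec-conforming driver: NameLengthPtr gets the total length available,
    // so the caller can size an exact retry.
    int conforming = realLen;

    // Observed SQLite behavior: NameLengthPtr gets only the truncated length,
    // so the real length is never reported to the caller.
    int sqlite = realLen < bufferLen ? realLen : bufferLen;

    printf("conforming: NameLengthPtr = %d -> realloc to %d and retry\n",
           conforming, conforming + 1);
    printf("sqlite:     NameLengthPtr = %d -> real length unknown\n", sqlite);
    return 0;
}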


@mkleehammer mkleehammer merged commit 24e0cca into mkleehammer:master Aug 16, 2021
@mkleehammer (Owner)

Thanks for all the work you put into this!

@mkleehammer (Owner)

Sorry everyone, I somehow merged this without the rebase. I thought I had configured GitHub to require that or make it the default.
