Invocation of stored procedure with empty TVP dumps core #772

bkline · 2020-05-28T12:47:44Z

Environment

Python: 3.8.2
pyodbc: 4.0.30
OS: Linux 5.4.0-31-generic (also reproduced on Windows Server 2016)
DB: SQL Server 2019 (RTM-CU4) (KB4548597) - 15.0.4033.1 (X64)
driver: Microsoft ODBC Driver 17 for SQL Server

Issue

Expected: successful execution or a raised exception
Observed: segmentation fault (core dumped)

bkline · 2020-05-28T12:56:28Z

#!/usr/bin/env python3

# Repro case

from argparse import ArgumentParser
from pyodbc import connect

CLEANUP = (
    "DROP PROCEDURE save_dictionary_term",
    "DROP TYPE alias_type",
    "DROP TABLE dictionary_term_alias",
    "DROP TABLE dictionary_term",
)

parser = ArgumentParser()
parser.add_argument("conn")
opts = parser.parse_args()
conn = connect(opts.conn)
cursor = conn.cursor()

# Clean up from previous runs.
for statement in CLEANUP:
    try:
        cursor.execute(statement)
        conn.commit()
    except Exception:
        pass

# Create the database objects.
cursor.execute("""\
CREATE TABLE dictionary_term (
    term_id INTEGER NOT NULL,
    term_name NVARCHAR(1000) NOT NULL
)""")
conn.commit()
cursor.execute("""\
CREATE TABLE dictionary_term_alias (
    term_id INTEGER NOT NULL,
    other_name NVARCHAR(1000) NOT NULL,
    other_name_type NVARCHAR(30) NOT NULL
)""")
conn.commit()
cursor.execute("""\
CREATE TYPE alias_type AS TABLE (
    term_id INTEGER NULL,
    other_name NVARCHAR(1000) NULL,
    other_name_type NVARCHAR(30) NULL
)""")
conn.commit()
cursor.execute("""\
CREATE PROCEDURE save_dictionary_term (
    @tid INTEGER,
    @tname NVARCHAR(1000),
    @aliases alias_type READONLY
)
AS
BEGIN
    INSERT INTO dictionary_term (term_id, term_name) VALUES (@tid, @tname)
    INSERT INTO dictionary_term_alias (term_id, other_name, other_name_type)
    SELECT term_id, other_name, other_name_type FROM @aliases
END""")
conn.commit()

# Invoke the stored procedure with a non-empty table parameter argument.
aliases = [
    [42, "pullet", "synonym"],
    [42, "capon", "synonym"],
    [42, "fowl", "broader"],
]
command = "EXECUTE save_dictionary_term @tid=?, @tname=?, @aliases=?"
args = 42, "chicken", aliases
cursor.execute(command, args)
conn.commit()
print(cursor.execute("SELECT * FROM dictionary_term").fetchall())
print(cursor.execute("SELECT * FROM dictionary_term_alias").fetchall())

# Invoke the stored procedure with an empty table parameter argument.
args = 76, "trombones", []
cursor.execute(command, args) # <-- will crash the interpreter here
conn.commit()
print(cursor.execute("SELECT * FROM dictionary_term").fetchall())
print(cursor.execute("SELECT * FROM dictionary_term_alias").fetchall())

v-chojas · 2020-05-28T22:11:43Z

Could you also provide an ODBC trace?

bkline · 2020-05-28T22:58:20Z

No problem (credentials redacted). odbctrace.log

bkline · 2020-05-30T04:37:33Z

Doesn't look as if the problem is in a lower layer. I can successfully invoke stored procedures with empty table parameters from C++ or C# using the same driver.

bkline · 2020-05-30T12:11:23Z

I see what's happening. The code in params.cpp is trying to figure out the tvp's column information by examining the values being passed, without any help from the dbms. Because there are no rows to examine in this case, the code mistakenly thinks there are no columns defined for the tvp parameter.

bkline · 2020-05-30T22:03:46Z

There's a possibility that I'll try to come up with a PR. The factors which argue against doing that are

it's been years since I've done any serious C++ ODBC programming
I haven't found enough guidance to prevent me from doing things in ways which will not fit with the approaches used on this project (for example, creating my own statement handles handles when I should be reusing the one already being passed around); is there a guide for contributors?

Partly I suppose my ultimate decision will depend on how much encouragement and support I get from the maintainers for giving it a shot.

One thing I'll ask up front is about the apparent contradiction I see between this comment in dbspecific.h

// Items specific to databases.
//
// Obviously we'd like to minimize this, but if they are needed this file isolates them.  I'd like for there to be a
// single build of pyodbc on each platform and not have a bunch of defines for supporting different databases.

And things like this in pyodbc.h

#ifndef SQL_SS_TABLE
#define SQL_SS_TABLE -153
#endif

#ifndef SQL_SOPT_SS_PARAM_FOCUS
#define SQL_SOPT_SS_PARAM_FOCUS 1236
#endif

#ifndef SQL_CA_SS_TYPE_NAME
#define SQL_CA_SS_TYPE_NAME 1227
#endif

If additional defined constants are needed for the fix (seems likely) where should they go? Is one of those two headers preferred over the other for this purpose? Directly in params.cpp? Some other option?

Thanks!

bkline · 2020-05-30T22:47:20Z

Here's another problem, closely related to the bug originally reported. On line 1437 ff. of params.cpp, this code

        Py_ssize_t i = PySequence_Size(info.pObject) - info.ColumnSize;
        Py_ssize_t ncols = 0;
        while (i < PySequence_Size(info.pObject))
        {
            PyObject *row = PySequence_GetItem(info.pObject, i);

doesn't check the return value from the first invocation of PySequence_Size() for an error return (-1), which will happen if None is passed for the parameter, causing the interpreter to dump core for a different reason than the one behind the crash originally reported for this issue. (The return value from PySequence_GetItem() in this snippet also needs to be checked, as passing the NULL pointer that comes back to PySequence_Check() is where this crash actually happens.)

Note that this second bug means that the comment in the test immediately below this loop

        if (!ncols)
        {
            // TVP has no columns --- is null
            info.nested = 0;
        }

is wrong. If we get to that code before crashing then the tvp will not have been NULL but instead an empty sequence, as in my repro case. If NULL (None in the Python code) really is passed for the tvp, the code will have already blown up inside the preceding loop.

v-chojas · 2020-06-01T15:04:02Z

Because there are no rows to examine in this case, the code mistakenly thinks there are no columns defined for the tvp parameter.

Unfortunately due to how TVPs are implemented at the ODBC driver level, it is not possible to determine the correct binding types via the usual SQLDescribeParam method if no data is provided to detect.

However, it should be possible to avoid the crash by adding an extra condition to the loop: while(i >= 0 && i < PySequence_Size(info.pObject)).

bkline · 2020-06-01T15:39:03Z

I tried that, but that just defers the core dump. The ODBC documentation is not as helpful as I would like for this area, but we should be able to get the data source to provide the information needed for the calls to SQLBindParameter(). I appreciate the irony that we'd be forced to fall back on this less efficient way to determine the settings in the case where there are no table rows for which the bindings will be really useful. It may be that the original picture the implementers had in mind was something like order entry, where an order without any products shouldn't exist. Our use case is different: a drug dictionary entry with linked rows for aliases. Most drugs will have aliases (many will have lots of aliases -- that's how things work in that industry), but occasionally there will be a drug which has no aliases. We can live with the performance penalty (I say this before I've gotten to the point of knowing what that penalty actually is -- might be negligible, might be a showstopper) for that rare occurrence, but having the application crash isn't an acceptable option. If the outlier cases (no aliases) were the norm, pushing the performance hit past an allowable threshold, we'd probably redesign the application. We shouldn't have to do that, though, nor should we have to make up bogus data to avoid crashing the interpreter.

bkline · 2020-06-01T15:43:15Z

Anyway, back to the ODBC API. If I can get the back end to cough up the name of the TVP type, I think we'll be home free. I'm looking for the right field to request in a SQLGetDescField() call, but no luck yet. Any useful hints would be appreciated. 😉

gordthompson · 2020-06-01T16:13:56Z

Possibly somewhat related comment here.

bkline · 2020-06-01T21:34:51Z

Bingo! SQL_CA_SS_TYPE_NAME gets me the name. May need to do some digging for different values needed for other back ends, but this works for SQL Server. Let me use this to see if I can get the back end to give me what's needed for the binding, and if so, what the performance penalty would be of the extra round trips, if any.

bkline · 2020-06-03T02:49:11Z

Here's a first cut at a fix. The repro case runs successfully with it, though there's more to do (threading stuff, unit test, etc.), but I'm reluctant to go much further toward rolling a PR without some feedback.

index b12dce9..f74ff0c 100644
--- a/src/cursor.h
+++ b/src/cursor.h
@@ -68,6 +68,9 @@ struct ParamInfo
     struct ParamInfo *nested;
     SQLLEN curTvpRow;
 
+    // For TVPs, the name of the table type.
+    SQLWCHAR* type_name;
+
     // Optional data.  If used, ParameterValuePtr will point into this.
     union
     {
diff --git a/src/params.cpp b/src/params.cpp
index 20548c2..293ffab 100644
--- a/src/params.cpp
+++ b/src/params.cpp
@@ -606,6 +606,8 @@ static void FreeInfos(ParamInfo* a, Py_ssize_t count)
             pyodbc_free(a[i].ParameterValuePtr);
         if (a[i].ParameterType == SQL_SS_TABLE && a[i].nested)
             FreeInfos(a[i].nested, a[i].maxlength);
+        if (a[i].type_name)
+            pyodbc_free(a[i].type_name);
         Py_XDECREF(a[i].pObject);
     }
     pyodbc_free(a);
@@ -1200,6 +1202,29 @@ static bool GetTableInfo(Cursor *cur, Py_ssize_t index, PyObject* param, ParamIn
         SQLDescribeParam(cur->hstmt, index + 1, &tvptype, 0, 0, 0);
     }
 
+    if (!nrows)
+    {
+        // With no table rows to examine, we'll need the TPV's type. Note that we don't
+        // really need the values provided by SQLDescribeParam, but without invoking
+        // that function, the next call (to get the table type name) will fail for
+        // some reason. Errors will be detected later (when the needed type_name
+        // pointer is found to be NULL).
+        SQLHANDLE ipd;
+        SQLWCHAR name[257] = {0};
+        SQLINTEGER len = 0;
+        SQLULEN parm_size;
+        SQLSMALLINT decdigits, nulls, datatype;
+        SQLGetStmtAttr(cur->hstmt, SQL_ATTR_IMP_PARAM_DESC, &ipd, SQL_IS_POINTER, &len);
+        SQLDescribeParam(cur->hstmt, index + 1, &datatype, &parm_size, &decdigits, &nulls);
+        SQLRETURN rc = SQLGetDescFieldW(ipd, index + 1, SQL_CA_SS_TYPE_NAME, name, sizeof(name), &len);
+        if (SQL_SUCCEEDED(rc))
+        {
+            info.type_name = (SQLWCHAR*)pyodbc_malloc(len + sizeof(SQLWCHAR));
+            memcpy(info.type_name, name, len);
+            info.type_name[len / sizeof(SQLWCHAR)] = 0;
+        }
+    }
+
     info.pObject = param;
     Py_INCREF(param);
     info.ValueType = SQL_C_BINARY;
@@ -1434,8 +1459,14 @@ bool BindParameter(Cursor* cur, Py_ssize_t index, ParamInfo& info)
             return false;
         }
 
-        Py_ssize_t i = PySequence_Size(info.pObject) - info.ColumnSize;
+        // Make sure the user didn't pass None for the TVP.
+        Py_ssize_t i = PySequence_Size(info.pObject);
+        if (i < 0)
+        {
+            RaiseErrorV(0, ProgrammingError, "A TVP must be a Sequence.");
+        }
         Py_ssize_t ncols = 0;
+        i -= info.ColumnSize;
         while (i < PySequence_Size(info.pObject))
         {
             PyObject *row = PySequence_GetItem(info.pObject, i);
@@ -1455,12 +1486,7 @@ bool BindParameter(Cursor* cur, Py_ssize_t index, ParamInfo& info)
             ncols = PySequence_Size(row);
             i++;
         }
-        if (!ncols)
-        {
-            // TVP has no columns --- is null
-            info.nested = 0;
-        }
-        else
+        if (ncols)
         {
             PyObject *row = PySequence_GetItem(info.pObject, PySequence_Size(info.pObject) - info.ColumnSize);
             Py_XDECREF(row);
@@ -1498,6 +1524,64 @@ bool BindParameter(Cursor* cur, Py_ssize_t index, ParamInfo& info)
             }
         }
 
+        else
+        {
+            // An empty sequence was passed for the TVP, so we can't use the data to figure even how
+            // many columns the TVP has. Get some help from the server.
+            SQLHANDLE chstmt;
+            ParamInfo* column = 0;
+
+            if (!info.type_name)
+            {
+                RaiseErrorV(0, ProgrammingError, "Unable to find TVP type name.");
+                return false;
+            }
+            ret = SQLAllocHandle(SQL_HANDLE_STMT, GetConnection(cur)->hdbc, &chstmt);
+            if (!SQL_SUCCEEDED(ret))
+            {
+                RaiseErrorFromHandle(cur->cnxn, "SQLAllocHandle", GetConnection(cur)->hdbc, cur->hstmt);
+                return false;
+            }
+            ret = SQLSetStmtAttr(chstmt, SQL_SOPT_SS_NAME_SCOPE, (SQLPOINTER)SQL_SS_NAME_SCOPE_TABLE_TYPE,
+                                 SQL_IS_UINTEGER);
+            if (!SQL_SUCCEEDED(ret))
+            {
+                RaiseErrorFromHandle(cur->cnxn, "SQLSetStmtAttr", GetConnection(cur)->hdbc, cur->hstmt);
+                return false;
+            }
+            ret = SQLColumnsW(chstmt, NULL, 0, NULL, 0, info.type_name, SQL_NTS, NULL, 0);
+            if (!SQL_SUCCEEDED(ret))
+            {
+                RaiseErrorFromHandle(cur->cnxn, "SQLColumnsW", GetConnection(cur)->hdbc, cur->hstmt);
+                return false;
+            }
+            while (SQL_SUCCEEDED(SQLFetch(chstmt)))
+            {
+                if (ncols++)
+                {
+                    pyodbc_realloc((BYTE**)&info.nested, ncols * sizeof(ParamInfo));
+                    column = info.nested + ncols - 1;
+                }
+                else
+                    column = info.nested = (ParamInfo*)pyodbc_malloc(ncols * sizeof(ParamInfo));
+                memset((void*)column, 0, sizeof(ParamInfo));
+                column->ValueType = SQL_C_DEFAULT;
+                column->ParameterType = SQL_VARCHAR;
+                column->StrLen_or_Ind = SQL_DATA_AT_EXEC;
+            }
+            for (i = 0; i < ncols; ++i) {
+                // TODO Replace hard-coded type information.
+                ret = SQLBindParameter(cur->hstmt, (SQLUSMALLINT)(i + 1), SQL_PARAM_INPUT,
+                                       SQL_C_DEFAULT, SQL_VARCHAR, 0, 0, info.nested + i, 0,
+                                       &info.nested[i].StrLen_or_Ind);
+                if (!SQL_SUCCEEDED(ret))
+                {
+                    RaiseErrorFromHandle(cur->cnxn, "SQLBindParameter", GetConnection(cur)->hdbc, cur->hstmt);
+                    return false;
+                }
+            }
+        }
+
         ret = SQLSetStmtAttr(cur->hstmt, SQL_SOPT_SS_PARAM_FOCUS, 0, SQL_IS_INTEGER);
         if (!SQL_SUCCEEDED(ret))
         {
diff --git a/src/pyodbc.h b/src/pyodbc.h
index 6bef3e6..55f7af4 100644
--- a/src/pyodbc.h
+++ b/src/pyodbc.h
@@ -82,6 +82,14 @@ typedef int Py_ssize_t;
 #define SQL_CA_SS_TYPE_NAME 1227
 #endif
 
+#ifndef SQL_SOPT_SS_NAME_SCOPE
+#define SQL_SOPT_SS_NAME_SCOPE 1237
+#endif
+
+#ifndef SQL_SS_NAME_SCOPE_TABLE_TYPE
+#define SQL_SS_NAME_SCOPE_TABLE_TYPE 1
+#endif
+
 inline bool IsSet(DWORD grf, DWORD flags)
 {
     return (grf & flags) == flags;

bkline · 2020-06-03T13:29:36Z

As expected, there is a performance penalty when we ask the back end for help in figuring out how many columns an empty TVP has. To measure, I moved my testing to a development configuration in which the database server and client are not on the same machine. It took five seconds to invoke the stored procedure 1,000 times with two rows in the TVP. Also five seconds with one row per call. When the calls use an empty TVP the 1,000 invocations take seven seconds. So in our use case (see earlier comment above), you'd be hard pressed to measure the penalty, and even in the degenerate case in which most (or all) of the TVPs are empty, it would be hard to argue that an exorbitant price is being exacted for avoiding crashing the interpreter. I mean, it's nice to have a performant database module, but there's only so much you should be expected to do to save developers from their own misguided application designs, right? 😉

bkline · 2020-06-03T14:01:57Z

While we've got this part of the code under the microscope (and while I'm waiting for some feedback), can I ask what's going on in this code in GetTableInfo() (line 1198, ff. of params.cpp)?

    if (!nskip)
    {
        // Need to describe in order to fill in IPD with the TVP's type name, because user has not provided it
        SQLSMALLINT tvptype;
        SQLDescribeParam(cur->hstmt, index + 1, &tvptype, 0, 0, 0);
    }

Looks like a mistake (on several fronts), but I'm open to other explanations.

[Hint: tvptype gets a value which is never used.]

v-chojas · 2020-06-03T14:53:54Z

I don't think such complexity is necessary, since the default value of a TVP is empty and so it should be possible to just skip binding it completely; the fact that skipping the first loop with an additional condition will cause a crash later just means that other place should also skip.

can I ask what's going on in this code

The comment there explains why, but you noticed this behaviour yourself when you wrote this comment in your proposed change:

Note that we don't really need the values provided by SQLDescribeParam, but without invoking that function ... fails

bkline · 2020-06-03T19:50:38Z

I'll take your word for it that just calling SQLDescribeParam() by itself will get the type name. I didn't get that impression from the ODBC API documentation, but I'll be the first to admit that I don't always succeed at figuring out those docs.

In any case, I'm at a dead end. I tried your suggestion of adding the extra test to the while loop to prevent attempts to fetch outside the parameter value array from crashing the interpreter. I don't know what to do with your suggestion that we fix "that other place" to prevent the deferred crash, because it's happening inside SQLExecute(). My naïve assumption would be that SQLExecute() isn't happy with the fact that we've got a TVP whose columns aren't bound, even when the parameter's table has no rows.¹ But I recognize that you've got a lot more experience in contributing to this project than I do, so you're more likely than I am to know how to implement the solution you're proposing, and I'm eagerly looking forward to testing it. 😋

¹ According to Microsoft's docs, "After binding the table-valued parameter, the application must then bind each table-valued parameter column."

Without this patch, passing a TVP with no rows will crash the Python interpreter. Solution from v-chojas. See #772.

keitherskine · 2021-05-29T15:38:44Z

Should have been fixed by PR #779 . Feel free to re-open if still an issue.

This was referenced Jun 5, 2020

Bind TVP columns even when the table is empty #776

Closed

Tell the driver to use the default value for an empty TVP #779

Merged

mkleehammer pushed a commit that referenced this issue Jan 22, 2021

Tell the driver to use the default value for an empty TVP

91f46ef

Without this patch, passing a TVP with no rows will crash the Python interpreter. Solution from v-chojas. See #772.

keitherskine closed this as completed May 29, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Invocation of stored procedure with empty TVP dumps core #772

Invocation of stored procedure with empty TVP dumps core #772

bkline commented May 28, 2020

bkline commented May 28, 2020

v-chojas commented May 28, 2020

bkline commented May 28, 2020

bkline commented May 30, 2020

bkline commented May 30, 2020

bkline commented May 30, 2020 •

edited

Loading

bkline commented May 30, 2020 •

edited

Loading

v-chojas commented Jun 1, 2020

bkline commented Jun 1, 2020

bkline commented Jun 1, 2020

gordthompson commented Jun 1, 2020

bkline commented Jun 1, 2020

bkline commented Jun 3, 2020 •

edited

Loading

bkline commented Jun 3, 2020

bkline commented Jun 3, 2020 •

edited

Loading

v-chojas commented Jun 3, 2020

bkline commented Jun 3, 2020

keitherskine commented May 29, 2021

Invocation of stored procedure with empty TVP dumps core #772

Invocation of stored procedure with empty TVP dumps core #772

Comments

bkline commented May 28, 2020

Environment

Issue

bkline commented May 28, 2020

v-chojas commented May 28, 2020

bkline commented May 28, 2020

bkline commented May 30, 2020

bkline commented May 30, 2020

bkline commented May 30, 2020 • edited Loading

bkline commented May 30, 2020 • edited Loading

v-chojas commented Jun 1, 2020

bkline commented Jun 1, 2020

bkline commented Jun 1, 2020

gordthompson commented Jun 1, 2020

bkline commented Jun 1, 2020

bkline commented Jun 3, 2020 • edited Loading

bkline commented Jun 3, 2020

bkline commented Jun 3, 2020 • edited Loading

v-chojas commented Jun 3, 2020

bkline commented Jun 3, 2020

keitherskine commented May 29, 2021

bkline commented May 30, 2020 •

edited

Loading

bkline commented May 30, 2020 •

edited

Loading

bkline commented Jun 3, 2020 •

edited

Loading

bkline commented Jun 3, 2020 •

edited

Loading