Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invocation of stored procedure with empty TVP dumps core #772

Closed
bkline opened this issue May 28, 2020 · 18 comments
Closed

Invocation of stored procedure with empty TVP dumps core #772

bkline opened this issue May 28, 2020 · 18 comments

Comments

@bkline
Copy link
Contributor

bkline commented May 28, 2020

Environment

  • Python: 3.8.2
  • pyodbc: 4.0.30
  • OS: Linux 5.4.0-31-generic (also reproduced on Windows Server 2016)
  • DB: SQL Server 2019 (RTM-CU4) (KB4548597) - 15.0.4033.1 (X64)
  • driver: Microsoft ODBC Driver 17 for SQL Server

Issue

  • Expected: successful execution or a raised exception
  • Observed: segmentation fault (core dumped)
@bkline
Copy link
Contributor Author

bkline commented May 28, 2020

#!/usr/bin/env python3

# Repro case

from argparse import ArgumentParser
from pyodbc import connect

CLEANUP = (
    "DROP PROCEDURE save_dictionary_term",
    "DROP TYPE alias_type",
    "DROP TABLE dictionary_term_alias",
    "DROP TABLE dictionary_term",
)

parser = ArgumentParser()
parser.add_argument("conn")
opts = parser.parse_args()
conn = connect(opts.conn)
cursor = conn.cursor()

# Clean up from previous runs.
for statement in CLEANUP:
    try:
        cursor.execute(statement)
        conn.commit()
    except Exception:
        pass

# Create the database objects.
cursor.execute("""\
CREATE TABLE dictionary_term (
    term_id INTEGER NOT NULL,
    term_name NVARCHAR(1000) NOT NULL
)""")
conn.commit()
cursor.execute("""\
CREATE TABLE dictionary_term_alias (
    term_id INTEGER NOT NULL,
    other_name NVARCHAR(1000) NOT NULL,
    other_name_type NVARCHAR(30) NOT NULL
)""")
conn.commit()
cursor.execute("""\
CREATE TYPE alias_type AS TABLE (
    term_id INTEGER NULL,
    other_name NVARCHAR(1000) NULL,
    other_name_type NVARCHAR(30) NULL
)""")
conn.commit()
cursor.execute("""\
CREATE PROCEDURE save_dictionary_term (
    @tid INTEGER,
    @tname NVARCHAR(1000),
    @aliases alias_type READONLY
)
AS
BEGIN
    INSERT INTO dictionary_term (term_id, term_name) VALUES (@tid, @tname)
    INSERT INTO dictionary_term_alias (term_id, other_name, other_name_type)
    SELECT term_id, other_name, other_name_type FROM @aliases
END""")
conn.commit()

# Invoke the stored procedure with a non-empty table parameter argument.
aliases = [
    [42, "pullet", "synonym"],
    [42, "capon", "synonym"],
    [42, "fowl", "broader"],
]
command = "EXECUTE save_dictionary_term @tid=?, @tname=?, @aliases=?"
args = 42, "chicken", aliases
cursor.execute(command, args)
conn.commit()
print(cursor.execute("SELECT * FROM dictionary_term").fetchall())
print(cursor.execute("SELECT * FROM dictionary_term_alias").fetchall())

# Invoke the stored procedure with an empty table parameter argument.
args = 76, "trombones", []
cursor.execute(command, args) # <-- will crash the interpreter here
conn.commit()
print(cursor.execute("SELECT * FROM dictionary_term").fetchall())
print(cursor.execute("SELECT * FROM dictionary_term_alias").fetchall())

@v-chojas
Copy link
Contributor

Could you also provide an ODBC trace?

@bkline
Copy link
Contributor Author

bkline commented May 28, 2020

No problem (credentials redacted). odbctrace.log

@bkline
Copy link
Contributor Author

bkline commented May 30, 2020

Doesn't look as if the problem is in a lower layer. I can successfully invoke stored procedures with empty table parameters from C++ or C# using the same driver.

@bkline
Copy link
Contributor Author

bkline commented May 30, 2020

I see what's happening. The code in params.cpp is trying to figure out the tvp's column information by examining the values being passed, without any help from the dbms. Because there are no rows to examine in this case, the code mistakenly thinks there are no columns defined for the tvp parameter.

@bkline
Copy link
Contributor Author

bkline commented May 30, 2020

There's a possibility that I'll try to come up with a PR. The factors which argue against doing that are

  • it's been years since I've done any serious C++ ODBC programming
  • I haven't found enough guidance to prevent me from doing things in ways which will not fit with the approaches used on this project (for example, creating my own statement handles handles when I should be reusing the one already being passed around); is there a guide for contributors?

Partly I suppose my ultimate decision will depend on how much encouragement and support I get from the maintainers for giving it a shot.

One thing I'll ask up front is about the apparent contradiction I see between this comment in dbspecific.h

// Items specific to databases.
//
// Obviously we'd like to minimize this, but if they are needed this file isolates them.  I'd like for there to be a
// single build of pyodbc on each platform and not have a bunch of defines for supporting different databases.

And things like this in pyodbc.h

#ifndef SQL_SS_TABLE
#define SQL_SS_TABLE -153
#endif

#ifndef SQL_SOPT_SS_PARAM_FOCUS
#define SQL_SOPT_SS_PARAM_FOCUS 1236
#endif

#ifndef SQL_CA_SS_TYPE_NAME
#define SQL_CA_SS_TYPE_NAME 1227
#endif

If additional defined constants are needed for the fix (seems likely) where should they go? Is one of those two headers preferred over the other for this purpose? Directly in params.cpp? Some other option?

Thanks!

@bkline
Copy link
Contributor Author

bkline commented May 30, 2020

Here's another problem, closely related to the bug originally reported. On line 1437 ff. of params.cpp, this code

        Py_ssize_t i = PySequence_Size(info.pObject) - info.ColumnSize;
        Py_ssize_t ncols = 0;
        while (i < PySequence_Size(info.pObject))
        {
            PyObject *row = PySequence_GetItem(info.pObject, i);

doesn't check the return value from the first invocation of PySequence_Size() for an error return (-1), which will happen if None is passed for the parameter, causing the interpreter to dump core for a different reason than the one behind the crash originally reported for this issue. (The return value from PySequence_GetItem() in this snippet also needs to be checked, as passing the NULL pointer that comes back to PySequence_Check() is where this crash actually happens.)

Note that this second bug means that the comment in the test immediately below this loop

        if (!ncols)
        {
            // TVP has no columns --- is null
            info.nested = 0;
        }

is wrong. If we get to that code before crashing then the tvp will not have been NULL but instead an empty sequence, as in my repro case. If NULL (None in the Python code) really is passed for the tvp, the code will have already blown up inside the preceding loop.

@v-chojas
Copy link
Contributor

v-chojas commented Jun 1, 2020

Because there are no rows to examine in this case, the code mistakenly thinks there are no columns defined for the tvp parameter.

Unfortunately due to how TVPs are implemented at the ODBC driver level, it is not possible to determine the correct binding types via the usual SQLDescribeParam method if no data is provided to detect.

However, it should be possible to avoid the crash by adding an extra condition to the loop: while(i >= 0 && i < PySequence_Size(info.pObject)).

@bkline
Copy link
Contributor Author

bkline commented Jun 1, 2020

I tried that, but that just defers the core dump. The ODBC documentation is not as helpful as I would like for this area, but we should be able to get the data source to provide the information needed for the calls to SQLBindParameter(). I appreciate the irony that we'd be forced to fall back on this less efficient way to determine the settings in the case where there are no table rows for which the bindings will be really useful. It may be that the original picture the implementers had in mind was something like order entry, where an order without any products shouldn't exist. Our use case is different: a drug dictionary entry with linked rows for aliases. Most drugs will have aliases (many will have lots of aliases -- that's how things work in that industry), but occasionally there will be a drug which has no aliases. We can live with the performance penalty (I say this before I've gotten to the point of knowing what that penalty actually is -- might be negligible, might be a showstopper) for that rare occurrence, but having the application crash isn't an acceptable option. If the outlier cases (no aliases) were the norm, pushing the performance hit past an allowable threshold, we'd probably redesign the application. We shouldn't have to do that, though, nor should we have to make up bogus data to avoid crashing the interpreter.

@bkline
Copy link
Contributor Author

bkline commented Jun 1, 2020

Anyway, back to the ODBC API. If I can get the back end to cough up the name of the TVP type, I think we'll be home free. I'm looking for the right field to request in a SQLGetDescField() call, but no luck yet. Any useful hints would be appreciated. 😉

@gordthompson
Copy link
Collaborator

Possibly somewhat related comment here.

@bkline
Copy link
Contributor Author

bkline commented Jun 1, 2020

Bingo! SQL_CA_SS_TYPE_NAME gets me the name. May need to do some digging for different values needed for other back ends, but this works for SQL Server. Let me use this to see if I can get the back end to give me what's needed for the binding, and if so, what the performance penalty would be of the extra round trips, if any.

@bkline
Copy link
Contributor Author

bkline commented Jun 3, 2020

Here's a first cut at a fix. The repro case runs successfully with it, though there's more to do (threading stuff, unit test, etc.), but I'm reluctant to go much further toward rolling a PR without some feedback.

index b12dce9..f74ff0c 100644
--- a/src/cursor.h
+++ b/src/cursor.h
@@ -68,6 +68,9 @@ struct ParamInfo
     struct ParamInfo *nested;
     SQLLEN curTvpRow;
 
+    // For TVPs, the name of the table type.
+    SQLWCHAR* type_name;
+
     // Optional data.  If used, ParameterValuePtr will point into this.
     union
     {
diff --git a/src/params.cpp b/src/params.cpp
index 20548c2..293ffab 100644
--- a/src/params.cpp
+++ b/src/params.cpp
@@ -606,6 +606,8 @@ static void FreeInfos(ParamInfo* a, Py_ssize_t count)
             pyodbc_free(a[i].ParameterValuePtr);
         if (a[i].ParameterType == SQL_SS_TABLE && a[i].nested)
             FreeInfos(a[i].nested, a[i].maxlength);
+        if (a[i].type_name)
+            pyodbc_free(a[i].type_name);
         Py_XDECREF(a[i].pObject);
     }
     pyodbc_free(a);
@@ -1200,6 +1202,29 @@ static bool GetTableInfo(Cursor *cur, Py_ssize_t index, PyObject* param, ParamIn
         SQLDescribeParam(cur->hstmt, index + 1, &tvptype, 0, 0, 0);
     }
 
+    if (!nrows)
+    {
+        // With no table rows to examine, we'll need the TPV's type. Note that we don't
+        // really need the values provided by SQLDescribeParam, but without invoking
+        // that function, the next call (to get the table type name) will fail for
+        // some reason. Errors will be detected later (when the needed type_name
+        // pointer is found to be NULL).
+        SQLHANDLE ipd;
+        SQLWCHAR name[257] = {0};
+        SQLINTEGER len = 0;
+        SQLULEN parm_size;
+        SQLSMALLINT decdigits, nulls, datatype;
+        SQLGetStmtAttr(cur->hstmt, SQL_ATTR_IMP_PARAM_DESC, &ipd, SQL_IS_POINTER, &len);
+        SQLDescribeParam(cur->hstmt, index + 1, &datatype, &parm_size, &decdigits, &nulls);
+        SQLRETURN rc = SQLGetDescFieldW(ipd, index + 1, SQL_CA_SS_TYPE_NAME, name, sizeof(name), &len);
+        if (SQL_SUCCEEDED(rc))
+        {
+            info.type_name = (SQLWCHAR*)pyodbc_malloc(len + sizeof(SQLWCHAR));
+            memcpy(info.type_name, name, len);
+            info.type_name[len / sizeof(SQLWCHAR)] = 0;
+        }
+    }
+
     info.pObject = param;
     Py_INCREF(param);
     info.ValueType = SQL_C_BINARY;
@@ -1434,8 +1459,14 @@ bool BindParameter(Cursor* cur, Py_ssize_t index, ParamInfo& info)
             return false;
         }
 
-        Py_ssize_t i = PySequence_Size(info.pObject) - info.ColumnSize;
+        // Make sure the user didn't pass None for the TVP.
+        Py_ssize_t i = PySequence_Size(info.pObject);
+        if (i < 0)
+        {
+            RaiseErrorV(0, ProgrammingError, "A TVP must be a Sequence.");
+        }
         Py_ssize_t ncols = 0;
+        i -= info.ColumnSize;
         while (i < PySequence_Size(info.pObject))
         {
             PyObject *row = PySequence_GetItem(info.pObject, i);
@@ -1455,12 +1486,7 @@ bool BindParameter(Cursor* cur, Py_ssize_t index, ParamInfo& info)
             ncols = PySequence_Size(row);
             i++;
         }
-        if (!ncols)
-        {
-            // TVP has no columns --- is null
-            info.nested = 0;
-        }
-        else
+        if (ncols)
         {
             PyObject *row = PySequence_GetItem(info.pObject, PySequence_Size(info.pObject) - info.ColumnSize);
             Py_XDECREF(row);
@@ -1498,6 +1524,64 @@ bool BindParameter(Cursor* cur, Py_ssize_t index, ParamInfo& info)
             }
         }
 
+        else
+        {
+            // An empty sequence was passed for the TVP, so we can't use the data to figure even how
+            // many columns the TVP has. Get some help from the server.
+            SQLHANDLE chstmt;
+            ParamInfo* column = 0;
+
+            if (!info.type_name)
+            {
+                RaiseErrorV(0, ProgrammingError, "Unable to find TVP type name.");
+                return false;
+            }
+            ret = SQLAllocHandle(SQL_HANDLE_STMT, GetConnection(cur)->hdbc, &chstmt);
+            if (!SQL_SUCCEEDED(ret))
+            {
+                RaiseErrorFromHandle(cur->cnxn, "SQLAllocHandle", GetConnection(cur)->hdbc, cur->hstmt);
+                return false;
+            }
+            ret = SQLSetStmtAttr(chstmt, SQL_SOPT_SS_NAME_SCOPE, (SQLPOINTER)SQL_SS_NAME_SCOPE_TABLE_TYPE,
+                                 SQL_IS_UINTEGER);
+            if (!SQL_SUCCEEDED(ret))
+            {
+                RaiseErrorFromHandle(cur->cnxn, "SQLSetStmtAttr", GetConnection(cur)->hdbc, cur->hstmt);
+                return false;
+            }
+            ret = SQLColumnsW(chstmt, NULL, 0, NULL, 0, info.type_name, SQL_NTS, NULL, 0);
+            if (!SQL_SUCCEEDED(ret))
+            {
+                RaiseErrorFromHandle(cur->cnxn, "SQLColumnsW", GetConnection(cur)->hdbc, cur->hstmt);
+                return false;
+            }
+            while (SQL_SUCCEEDED(SQLFetch(chstmt)))
+            {
+                if (ncols++)
+                {
+                    pyodbc_realloc((BYTE**)&info.nested, ncols * sizeof(ParamInfo));
+                    column = info.nested + ncols - 1;
+                }
+                else
+                    column = info.nested = (ParamInfo*)pyodbc_malloc(ncols * sizeof(ParamInfo));
+                memset((void*)column, 0, sizeof(ParamInfo));
+                column->ValueType = SQL_C_DEFAULT;
+                column->ParameterType = SQL_VARCHAR;
+                column->StrLen_or_Ind = SQL_DATA_AT_EXEC;
+            }
+            for (i = 0; i < ncols; ++i) {
+                // TODO Replace hard-coded type information.
+                ret = SQLBindParameter(cur->hstmt, (SQLUSMALLINT)(i + 1), SQL_PARAM_INPUT,
+                                       SQL_C_DEFAULT, SQL_VARCHAR, 0, 0, info.nested + i, 0,
+                                       &info.nested[i].StrLen_or_Ind);
+                if (!SQL_SUCCEEDED(ret))
+                {
+                    RaiseErrorFromHandle(cur->cnxn, "SQLBindParameter", GetConnection(cur)->hdbc, cur->hstmt);
+                    return false;
+                }
+            }
+        }
+
         ret = SQLSetStmtAttr(cur->hstmt, SQL_SOPT_SS_PARAM_FOCUS, 0, SQL_IS_INTEGER);
         if (!SQL_SUCCEEDED(ret))
         {
diff --git a/src/pyodbc.h b/src/pyodbc.h
index 6bef3e6..55f7af4 100644
--- a/src/pyodbc.h
+++ b/src/pyodbc.h
@@ -82,6 +82,14 @@ typedef int Py_ssize_t;
 #define SQL_CA_SS_TYPE_NAME 1227
 #endif
 
+#ifndef SQL_SOPT_SS_NAME_SCOPE
+#define SQL_SOPT_SS_NAME_SCOPE 1237
+#endif
+
+#ifndef SQL_SS_NAME_SCOPE_TABLE_TYPE
+#define SQL_SS_NAME_SCOPE_TABLE_TYPE 1
+#endif
+
 inline bool IsSet(DWORD grf, DWORD flags)
 {
     return (grf & flags) == flags;

@bkline
Copy link
Contributor Author

bkline commented Jun 3, 2020

As expected, there is a performance penalty when we ask the back end for help in figuring out how many columns an empty TVP has. To measure, I moved my testing to a development configuration in which the database server and client are not on the same machine. It took five seconds to invoke the stored procedure 1,000 times with two rows in the TVP. Also five seconds with one row per call. When the calls use an empty TVP the 1,000 invocations take seven seconds. So in our use case (see earlier comment above), you'd be hard pressed to measure the penalty, and even in the degenerate case in which most (or all) of the TVPs are empty, it would be hard to argue that an exorbitant price is being exacted for avoiding crashing the interpreter. I mean, it's nice to have a performant database module, but there's only so much you should be expected to do to save developers from their own misguided application designs, right? 😉

@bkline
Copy link
Contributor Author

bkline commented Jun 3, 2020

While we've got this part of the code under the microscope (and while I'm waiting for some feedback), can I ask what's going on in this code in GetTableInfo() (line 1198, ff. of params.cpp)?

    if (!nskip)
    {
        // Need to describe in order to fill in IPD with the TVP's type name, because user has not provided it
        SQLSMALLINT tvptype;
        SQLDescribeParam(cur->hstmt, index + 1, &tvptype, 0, 0, 0);
    }

Looks like a mistake (on several fronts), but I'm open to other explanations.

[Hint: tvptype gets a value which is never used.]

@v-chojas
Copy link
Contributor

v-chojas commented Jun 3, 2020

I don't think such complexity is necessary, since the default value of a TVP is empty and so it should be possible to just skip binding it completely; the fact that skipping the first loop with an additional condition will cause a crash later just means that other place should also skip.

can I ask what's going on in this code

The comment there explains why, but you noticed this behaviour yourself when you wrote this comment in your proposed change:

Note that we don't really need the values provided by SQLDescribeParam, but without invoking that function ... fails

@bkline
Copy link
Contributor Author

bkline commented Jun 3, 2020

I'll take your word for it that just calling SQLDescribeParam() by itself will get the type name. I didn't get that impression from the ODBC API documentation, but I'll be the first to admit that I don't always succeed at figuring out those docs.

In any case, I'm at a dead end. I tried your suggestion of adding the extra test to the while loop to prevent attempts to fetch outside the parameter value array from crashing the interpreter. I don't know what to do with your suggestion that we fix "that other place" to prevent the deferred crash, because it's happening inside SQLExecute(). My naïve assumption would be that SQLExecute() isn't happy with the fact that we've got a TVP whose columns aren't bound, even when the parameter's table has no rows.1 But I recognize that you've got a lot more experience in contributing to this project than I do, so you're more likely than I am to know how to implement the solution you're proposing, and I'm eagerly looking forward to testing it. 😋

1 According to Microsoft's docs, "After binding the table-valued parameter, the application must then bind each table-valued parameter column."

mkleehammer pushed a commit that referenced this issue Jan 22, 2021
Without this patch, passing a TVP with no rows will crash
the Python interpreter. Solution from v-chojas.

See #772.
@keitherskine
Copy link
Collaborator

Should have been fixed by PR #779 . Feel free to re-open if still an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants