Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLAlchemy doubles percent sign (%) #56

Closed
mister-average opened this issue Nov 14, 2018 · 6 comments
Closed

SQLAlchemy doubles percent sign (%) #56

mister-average opened this issue Nov 14, 2018 · 6 comments

Comments

@mister-average
Copy link

(But it can be fixed by setting self._double_percents = False in AthenaIdentifierPreparer __init__)

Broken:

>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine(.....)
>>> session = engine.connect()
>>> sql = "select date_parse('20191030', '%Y%m%d' )"
>>> txt = sqlalchemy.sql.text(sql)
>>> result = session.execute(txt, {})


Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.6/site-packages/pyathena/util.py", line 28, in _wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pyathena/cursor.py", line 46, in execute
    raise OperationalError(query_execution.state_change_reason)
pyathena.error.OperationalError: INVALID_FUNCTION_ARGUMENT: Invalid format: "20191030"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.6/site-packages/pyathena/util.py", line 28, in _wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pyathena/cursor.py", line 46, in execute
    raise OperationalError(query_execution.state_change_reason)
sqlalchemy.exc.OperationalError: (pyathena.error.OperationalError) INVALID_FUNCTION_ARGUMENT: Invalid format: "20191030" [SQL: "select date_parse('20191030', '%%Y%%m%%d' )"] (Background on this error at: http://sqlalche.me/e/e3q8)

(See how it has doubled the percent signs in the last line of the traceback?)

Fix by changing sqlalchemy_athena.py to have this:

class AthenaIdentifierPreparer(IdentifierPreparer):
    """PrestoIdentifierPreparer

    https://github.com/dropbox/PyHive/blob/master/pyhive/sqlalchemy_presto.py"""
    reserved_words = UniversalSet()

    def __init__(self, dialect, initial_quote='"',
                 final_quote=None, escape_quote='"',
                 quote_case_sensitive_collations=True, omit_schema=False):
        super(AthenaIdentifierPreparer, self).__init__(dialect, initial_quote, final_quote, escape_quote, quote_case_sensitive_collations, omit_schema)
        self._double_percents = False

Fixed:

>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine(....., echo="debug")
>>> session = engine.connect()
>>> sql = "select date_parse('20191030', '%Y%m%d' )"
>>> txt = sqlalchemy.sql.text(sql)
>>> result = session.execute(txt, {})
2018-11-13 23:22:15,519 INFO sqlalchemy.engine.base.Engine select date_parse('20191030', '%Y%m%d' )
2018-11-13 23:22:15,519 INFO sqlalchemy.engine.base.Engine {}
2018-11-13 23:22:16,813 DEBUG sqlalchemy.engine.base.Engine Col ('_col0',)
>>> result.fetchall()
2018-11-13 23:22:27,171 DEBUG sqlalchemy.engine.base.Engine Row (datetime.datetime(2019, 10, 30, 0, 0),)
[(datetime.datetime(2019, 10, 30, 0, 0),)]
>>> session.execute(sqlalchemy.text("select :word"), {'word':"cat"}).fetchall()
2018-11-13 23:23:27,964 INFO sqlalchemy.engine.base.Engine select %(word)s
2018-11-13 23:23:27,965 INFO sqlalchemy.engine.base.Engine {'word': 'cat'}
2018-11-13 23:23:29,199 DEBUG sqlalchemy.engine.base.Engine Col ('_col0',)
2018-11-13 23:23:29,199 DEBUG sqlalchemy.engine.base.Engine Row ('cat',)
[('cat',)]
@laughingman7743
Copy link
Owner

If the query contains % characters it will be doubly escaped. 😭
Your modification seems to be good. 👍

@laughingman7743
Copy link
Owner

laughingman7743 commented Nov 17, 2018

If you set the _double_percents option of IdentifierPreparer to False, an error will occur if the % character and the query parameter are contained.

import sqlalchemy
from urllib.parse import quote_plus
from sqlalchemy.engine import create_engine


def main():
    conn_str = 'awsathena+rest://:@athena.{region_name}.amazonaws.com:443/' \
               '{schema_name}?s3_staging_dir={s3_staging_dir}'
    engine = create_engine(conn_str.format(
        region_name='us-west-2',
        schema_name='default',
        s3_staging_dir=quote_plus('s3://BUCKET/path/to/')))
    session = engine.connect()

    query = sqlalchemy.sql.text("select date_parse('20191030', '%Y%m%d')")
    print(session.execute(query).fetchall())  # return [(datetime.datetime(2019, 10, 30, 0, 0),)]

    query = sqlalchemy.sql.text("select date_parse('20191030', '%Y%m%d'), :word")
    print(session.execute(query, word='cat').fetchall())  # raise exception


if __name__ == '__main__':
    main()
Traceback (most recent call last):
  File "/Users/laughingman7743/github/PyAthena/sqla_query_with_parameter.py", line 28, in <module>
    main()
  File "/Users/laughingman7743/github/PyAthena/sqla_query_with_parameter.py", line 24, in main
    print(session.execute(query, word='cat').fetchall())
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1416, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 249, in reraise
    raise value
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
  File "/Users/laughingman7743/github/PyAthena/pyathena/util.py", line 28, in _wrapper
    return wrapped(*args, **kwargs)
  File "/Users/laughingman7743/github/PyAthena/pyathena/cursor.py", line 38, in execute
    self._query_id = self._execute(operation, parameters)
  File "/Users/laughingman7743/github/PyAthena/pyathena/common.py", line 154, in _execute
    query = self._formatter.format(operation, parameters)
  File "/Users/laughingman7743/github/PyAthena/pyathena/formatter.py", line 112, in format
    return (operation % kwargs).strip() if kwargs else operation.strip()
ValueError: unsupported format character 'Y' (0x59) at index 32

If formatting is done regardless of the query parameters as follows, the _double_percents option seems to be OK with default True setting.
https://github.com/laughingman7743/PyAthena/blob/master/pyathena/formatter.py#L112

return (operation % kwargs).strip()

@mister-average
Copy link
Author

Argh! Of course that was too easy to work!

My use case is that I want to be able to copy sql statements that work in the Athena console and paste them directly into a Jupyter notebook, where the %%sql magic can execute them against Athena unmodified. Users can't be bothered to reformat the text with double percents.

Could it be that the real problem is that the visit_textclause() function in the default SQLCompiler class from sqlalchemy/sql/compiler.py calls self.post_process_text(textclause.text) -- which doubles percents -- REGARDLESS of whether there are any bind parameters or not? This behavior can be overridden and corrected by your AthenaCompiler class in pyathena/sqlalchemy_athena.py.

With the following changes only (the previous change I suggested is not needed):

from sqlalchemy.sql.compiler import BIND_PARAMS_ESC, BIND_PARAMS

class AthenaCompiler(SQLCompiler):
    """PrestoCompiler

    https://github.com/dropbox/PyHive/blob/master/pyhive/sqlalchemy_presto.py"""
    def visit_char_length_func(self, fn, **kw):
        return 'length{0}'.format(self.function_argspec(fn, **kw))

    def visit_textclause(self, textclause, **kw):
        def do_bindparam(m):
            name = m.group(1)
            if name in textclause._bindparams:
                return self.process(textclause._bindparams[name], **kw)
            else:
                return self.bindparam_string(name, **kw)

        if not self.stack:
            self.isplaintext = True

        if len(textclause._bindparams) == 0:
            return textclause.text

        else:
            # un-escape any \:params
            return BIND_PARAMS_ESC.sub(
                lambda m: m.group(1),
                BIND_PARAMS.sub(
                    do_bindparam,
                    self.post_process_text(textclause.text))
            )

I now get this:

>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine(....)
>>> session = engine.connect()
>>> session.execute(sqlalchemy.text("select date_parse('20191030', '%Y%m%d' )")).fetchall()
[(datetime.datetime(2019, 10, 30, 0, 0),)]
>>> session.execute(sqlalchemy.text("select date_parse('20191030', '%Y%m%d' ), 'cat'")).fetchall()
[(datetime.datetime(2019, 10, 30, 0, 0), 'cat')]
>>> session.execute(sqlalchemy.text("select date_parse('20191030', '%Y%m%d' ), :word"), {'word':"cat"}).fetchall()
[(datetime.datetime(2019, 10, 30, 0, 0), 'cat')]

Thank you for your time in looking at this.

@laughingman7743
Copy link
Owner

It is the following method to override.
https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/sql/compiler.py#L759

It seems just to add the following conditions.

if len(textclause._bindparams) == 0:
    return textclause.text

It feels good. 👍
I will add some test cases and check if there is no problem.

@laughingman7743
Copy link
Owner

I fixed it with the following pull request. And a few test cases are added.
#57

Please check it.

@mister-average
Copy link
Author

Thank you so much! It works well for me!

laughingman7743 added a commit that referenced this issue Nov 26, 2018
…rcent_character_in_sqla

Fix double escaping of percent character in SQLAlchemy (fix #56)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants