turn off unicode #77

sqlalchemy-bot · 2008-02-26T05:00:33Z

Migrated issue, originally created by Anonymous

If the input and output are not unicode, the decode and encode steps add some overhead. Adding an option to turn unicode off could improve performance a bit.

Add an argument to Lookup and Template:
... ,using_unicode = True, ...

When unicode is turned off, the compiled module source is saved with the proper charset, and the line

# -*- encoding:charset -*-

is added at the top of the file, so escaping is not needed.

Attachments: unicode.patch

sqlalchemy-bot · 2008-03-01T00:55:49Z

Michael Bayer (@zzzeek) wrote:

hi there -

I'm reviewing your patches, thanks for them! So far I can't accept this particular one:

the primary method to turn off the "unicode" conversion step applied to expressions, which is certainly fairly expensive, is to redefine the default_filters of the template: http://www.makotemplates.org/docs/filtering.html#filtering_expression_defaultfilters
the explicit kwargs in Template are to allow checking for valid arguments.

I don't exactly understand the point of the use_unicode flag. If you're trying to have a template which contains multibyte characters go straight through and generate a Python file with a "coding" attribute at the top, it's not that simple. See "allow usage of non-ascii bytestring literals in templates" (#11) for reference. Example (fails with the patch, as well as without):

      template = Template("""Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »""", input_encoding='utf-8')
      assert template.render() == """Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »"""

sqlalchemy-bot · 2008-03-01T00:55:49Z

Changes by Michael Bayer (@zzzeek):

changed status to closed

sqlalchemy-bot · 2008-03-04T02:23:54Z

Anonymous wrote:

Because strings in the compiled source code are unicode literals, like u'\xxxx', just removing "unicode" from default_filters does not work: it causes a DecodeError if the data is a multibyte string.

So strings must stay as they are in the template source, such as "我们", and "# -*- encoding:utf-8 -*-" must be added to the compiled source code.

lexer.py tries to decode all source code into Unicode, so we need a parameter to turn that off. Then removing "unicode" from default_filters will not cause a DecodeError.

Not using Unicode is more complicated, but it speeds things up a bit. I have used it this way and it works fine. If you are interested, I will refine the code and submit it again.
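
As a rough illustration of the coding-declaration idea in this comment, the sketch below compiles module source that starts with the same style of declaration and contains a non-ASCII literal. It is not Mako's code: the filename "<compiled_template>" and the variable names are hypothetical. Under Python 3 the literal becomes a text string, whereas under the Python 2 of this thread's era it would have been a byte string.

```python
# Sketch only: shows how a PEP 263 coding declaration lets non-ASCII
# literals sit directly in Python source. "<compiled_template>" is a
# made-up filename, not a path Mako produces.
module_source = '# -*- encoding:utf-8 -*-\ngreeting = "我们"\n'.encode("utf-8")

namespace = {}
# compile() accepts bytes and uses the declaration to decode them.
exec(compile(module_source, "<compiled_template>", "exec"), namespace)
greeting = namespace["greeting"]
```

The declaration is matched on the "coding:" part, so the "encoding:utf-8" spelling used in this thread is recognized as well.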

sqlalchemy-bot · 2008-03-04T02:23:54Z

Changes by Anonymous:

changed status to reopened

sqlalchemy-bot · 2008-03-04T16:11:22Z

Michael Bayer (@zzzeek) wrote:

can you please attach a template file illustrating what you're referring to ? if the idea is just, "unicode is too slow, just pass through utf-8 directly without processing", that historically has not worked with our particular approach (we tried). Like I pointed out in my example, the patch does not work.

sqlalchemy-bot · 2008-03-06T09:52:40Z

Anonymous wrote:

I have updated the patch, and it passes all the test cases, including two Chinese templates: one using unicode, the other using utf-8 directly for better performance.

If unicode is not necessary, can Mako turn off unicode by default, or not use unicode at all?

sqlalchemy-bot · 2008-03-07T13:45:53Z

Michael Bayer (@zzzeek) wrote:

this part of the patch:

@@ -563,7 +566,7 @@
             "try:")
         self.write_source_comment(node)
         self.printer.writelines(
-                "context.write(unicode(%s))" % node.attributes['expr'],
+                "context.write(%s)" % node.attributes['expr'],
             "finally:",
                 "context.caller_stack.nextcaller = None",
             None

should be calling upon the default_filters in the way that visitExpression does, since a %call approximates saying ${foo()} - so we wouldn't hardcode unicode(), but would instead pull from default_filters. It's a bug on my part, can you work that in to the patch ?
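
A minimal sketch of what this request amounts to, assuming filters are stored as a list of callable names: instead of hardcoding unicode() in the generated line, the code generator wraps the expression in whatever default_filters holds. The helpers wrap_with_filters and emit_write are hypothetical names for illustration and are not taken from Mako's codegen module.

```python
def wrap_with_filters(expr, filters):
    """Wrap an expression string in each named filter, in order."""
    for name in filters:
        expr = "%s(%s)" % (name, expr)
    return expr


def emit_write(expr, default_filters):
    """Build the generated source line for a %call body or a ${...} expression."""
    return "context.write(%s)" % wrap_with_filters(expr, default_filters)


with_unicode = emit_write("foo()", ["unicode"])  # behavior with unicode on
with_str = emit_write("foo()", ["str"])          # behavior with the flag set
unfiltered = emit_write("foo()", [])             # filters cleared entirely
```

With this shape, the choice between unicode(), str(), or no conversion at all lives in one configurable list rather than in the emitted template code.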

sqlalchemy-bot · 2008-03-07T13:47:34Z

Michael Bayer (@zzzeek) wrote:

this will also resolve #11. I do not recall what was causing AST parsing to fail over there since it does not seem to be happening now.

sqlalchemy-bot · 2008-03-07T13:48:44Z

Michael Bayer (@zzzeek) wrote:

oh also can we call the flag "disable_unicode=True"

sqlalchemy-bot · 2008-03-07T13:51:07Z

Michael Bayer (@zzzeek) wrote:

...which would also replace default filters with [str()]. The point of the default filter of unicode() or str() is so that people can say ${5 + 7} and it renders. It of course can be cleared entirely for performance reasons.
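
To make the point about ${5 + 7} concrete, here is a small model of rendering, assuming output fragments are collected in a list and joined at the end. The render function is a hypothetical stand-in, not Mako's runtime: it only shows why a str() default filter lets non-string expression values render, and what happens when the filter list is cleared.

```python
def render(values, default_filters):
    """Apply each default filter to every value, then join the results."""
    buffer = []
    for value in values:
        for filter_func in default_filters:
            value = filter_func(value)
        buffer.append(value)
    return "".join(buffer)


# With str as the default filter, the integer result of 5 + 7 renders.
rendered = render(["total: ", 5 + 7], [str])

# With no filters, the integer reaches the join step and raises TypeError.
try:
    render(["total: ", 5 + 7], [])
    unfiltered_failed = False
except TypeError:
    unfiltered_failed = True
```

Clearing the filters is therefore only safe when every expression already yields a string.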

sqlalchemy-bot · 2008-03-10T03:29:58Z

Anonymous wrote:

I have updated the patch:

add default filters to the %call tag.

rename the flag to "disable_unicode".

set default_filters to ["str"] when disable_unicode is True.
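
The last item in this list could be sketched as follows, assuming default_filters holds filter names and that an explicit value from the caller always wins. The function resolve_default_filters is a hypothetical helper for illustration; the committed change may structure this differently.

```python
def resolve_default_filters(disable_unicode=False, default_filters=None):
    """Pick the filter list: the caller's choice first, else a flag-based default."""
    if default_filters is not None:
        return list(default_filters)
    return ["str"] if disable_unicode else ["unicode"]


flag_off = resolve_default_filters()                      # default setting
flag_on = resolve_default_filters(disable_unicode=True)   # unicode disabled
explicit = resolve_default_filters(disable_unicode=True, default_filters=[])
```

Passing an empty list remains possible for anyone who wants no conversion step at all.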

sqlalchemy-bot · 2008-03-21T20:14:53Z

Michael Bayer (@zzzeek) wrote:

thanks. Committed a modified version in d5f83e6 which retains identical Mako behavior if the flag is off, which is the default setting for both Template and TemplateLookup. Also added new documentation for this mode. Since not using unicode is against Mako's general philosophy, the docs warn against using this flag unless users are absolutely sure they want it (if anyone reports UnicodeDecode errors with this flag, they're using it wrong and will be urged to stop using it), and it's almost certain that this feature will not be available in the Python 3000 version since Py3K standardizes on unicode strings everywhere.

sqlalchemy-bot · 2008-03-21T20:14:53Z

Changes by Michael Bayer (@zzzeek):

changed status to closed

sqlalchemy-bot closed this as completed Mar 21, 2008

sqlalchemy-bot added the compiler, low priority, and feature labels Nov 26, 2018
