
turn off unicode #77

Closed
sqlalchemy-bot opened this issue Feb 26, 2008 · 13 comments

Comments

@sqlalchemy-bot

Migrated issue, originally created by Anonymous

If the input and output are not Unicode, then the decode and encode steps add some overhead; adding an option to turn Unicode off could improve performance a bit.

Add an argument to Lookup and Template:
..., using_unicode=True, ...

When Unicode is turned off, the compiled module source is saved in the proper charset, with a line such as

    # -*- encoding: charset -*-

added at the head, so escaping is not needed.
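A minimal sketch of the mechanism this proposal relies on, written in Python 3 rather than the Python 2 of this thread: generated module source is encoded in a declared charset and starts with a PEP 263 coding line, so the interpreter can decode its non-ASCII literals when compiling it. `build_module_source` is a hypothetical helper, not part of Mako. Note that under Python 3 the literal still comes back as `str`; the raw byte passthrough proposed here only applied to Python 2 `str` literals.

```python
# Hypothetical helper (not Mako's API): prepend a PEP 263 coding declaration
# to generated module source and encode the whole module in that charset.
def build_module_source(body: str, charset: str = "utf-8") -> bytes:
    header = f"# -*- coding: {charset} -*-\n"
    return (header + body).encode(charset)


# GBK-encoded source: it only compiles correctly if the declaration is honoured.
source = build_module_source('TEXT = "我们"\n', charset="gbk")

# compile() accepts bytes and uses the coding declaration to decode them.
namespace: dict = {}
exec(compile(source, "<generated>", "exec"), namespace)
print(namespace["TEXT"] == "我们")  # prints True
```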


Attachments: unicode.patch

@sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

Hi there,

I'm reviewing your patches, thanks for them! So far, this particular one I can't accept:

  • The primary way to turn off the "unicode" conversion step applied to expressions, which is certainly fairly expensive, is to redefine the default_filters of the template: http://www.makotemplates.org/docs/filtering.html#filtering_expression_defaultfilters

  • The explicit keyword arguments in Template are there to allow checking for valid arguments.

  • I don't exactly understand the point of the use_unicode flag. If you're trying to have a template that contains multibyte characters pass straight through and generate a Python file with a "coding" declaration at the top, it's not that simple. See #11 ("allow usage of non-ascii bytestring literals in templates") for reference. Example (fails with the patch, as well as without):

          template = Template("""Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »""", input_encoding='utf-8')
          assert template.render() == """Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »"""
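The point about explicit keyword arguments can be sketched in plain Python: a constructor that names its parameters rejects an unknown option with a `TypeError`, whereas one that swallows `**kwargs` accepts it silently. `StrictTemplate`, `LooseTemplate`, and `accepts` are hypothetical stand-ins, not Mako classes, and the parameter names are only illustrative.

```python
# Hypothetical classes contrasting explicit keyword arguments with **kwargs.
class StrictTemplate:
    def __init__(self, text, input_encoding=None, default_filters=None):
        self.text = text
        self.input_encoding = input_encoding
        self.default_filters = default_filters


class LooseTemplate:
    def __init__(self, text, **kwargs):
        # Unknown options are stored without complaint.
        self.text = text
        self.options = kwargs


def accepts(cls, **options) -> bool:
    """Return True if cls(...) accepts the given keyword options."""
    try:
        cls("hello", **options)
    except TypeError:
        return False
    return True


print(accepts(StrictTemplate, using_unicode=False))  # prints False: unknown option rejected
print(accepts(LooseTemplate, using_unicode=False))   # prints True: silently accepted
```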
    

@sqlalchemy-bot

Changes by Michael Bayer (@zzzeek):

  • changed status to closed

@sqlalchemy-bot

Anonymous wrote:

Because strings in the compiled source code are Unicode literals, like u'\xxxx', just removing "unicode" from default_filters does not work: it causes a UnicodeDecodeError if the data is a multibyte string.
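Python 2 coerced a byte string to unicode with the ASCII codec whenever the two were combined, which is where this error comes from. The sketch below emulates that coercion explicitly in Python 3, where mixing `str` and `bytes` raises `TypeError` instead; `py2_style_concat` is a hypothetical helper used only for illustration.

```python
# Emulate Python 2's implicit coercion: u"..." + "..." decoded the bytes as ASCII.
def py2_style_concat(text: str, data: bytes) -> str:
    return text + data.decode("ascii")


utf8_data = "我们".encode("utf-8")  # multibyte UTF-8 bytes, not ASCII

try:
    py2_style_concat("prefix: ", utf8_data)
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc.reason)  # prints "decode failed: ordinal not in range(128)"

# Decoding with the correct codec succeeds: two characters from six bytes.
decoded = utf8_data.decode("utf-8")
print(len(decoded))  # prints 2
```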

So strings must stay as they are in the template source, such as "我们", and "# -*- encoding:utf-8 -*-" must be added to the compiled source code.

lexer.py tries to decode all of the template source into Unicode, so we need a parameter to turn that off. Then removing "unicode" from default_filters will not cause a UnicodeDecodeError.

Not using Unicode is more complicated, but it speeds things up a bit. I have used it this way and it works fine. If you are interested, I will refine the code and submit it again.

@sqlalchemy-bot

Changes by Anonymous:

  • changed status to reopened

@sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

Can you please attach a template file illustrating what you're referring to? If the idea is just "unicode is too slow, just pass utf-8 through directly without processing", that historically has not worked with our particular approach (we tried). As I pointed out in my example, the patch does not work.

@sqlalchemy-bot

Anonymous wrote:

I have updated the patch, and it passes all the test cases, including two Chinese templates: one using Unicode, the other using utf-8 directly for better performance.

If Unicode is not necessary, could Mako turn off Unicode by default, or not use Unicode at all?

@sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

this part of the patch:

@@ -563,7 +566,7 @@
             "try:")
         self.write_source_comment(node)
         self.printer.writelines(
-                "context.write(unicode(%s))" % node.attributes['expr'],
+                "context.write(%s)" % node.attributes['expr'],
             "finally:",
                 "context.caller_stack.nextcaller = None",
             None

should apply the default_filters the way visitExpression does, since a %call approximates saying ${foo()}. So we wouldn't hardcode unicode(), but would instead pull from default_filters. It's a bug on my part; can you work that into the patch?
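A rough sketch of the code-generation idea in this comment: build the `context.write(...)` line by wrapping the expression in each configured default filter rather than hardcoding `unicode()`. `apply_filters` and `call_tag_write_line` are hypothetical names, not Mako's codegen, and applying the first filter innermost is an assumption of this sketch.

```python
from typing import Sequence


# Hypothetical helper: wrap an expression string in each filter name,
# first filter innermost (an assumed ordering, for illustration only).
def apply_filters(expr: str, filters: Sequence[str]) -> str:
    for name in filters:
        expr = f"{name}({expr})"
    return expr


# Hypothetical stand-in for the line the %call codegen emits.
def call_tag_write_line(expr: str, default_filters: Sequence[str]) -> str:
    return "context.write(%s)" % apply_filters(expr, default_filters)


print(call_tag_write_line("foo()", ["unicode"]))  # prints context.write(unicode(foo()))
print(call_tag_write_line("foo()", ["str"]))      # prints context.write(str(foo()))
print(call_tag_write_line("foo()", []))           # prints context.write(foo())
```

With an empty filter list the generated line writes the expression unchanged, which is the "cleared for performance" case discussed later in the thread.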

@sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

This will also resolve #11. I don't recall what was causing AST parsing to fail there, since it no longer seems to happen.

@sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

Oh, also, can we call the flag "disable_unicode=True"?

@sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

...which would also replace the default filters with [str()]. The point of the default filter of unicode() or str() is so that people can say ${5 + 7} and have it render. It can of course be cleared entirely for performance reasons.
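A minimal model of why the default filter exists, assuming (for illustration) an output buffer that only accepts strings: with `str` as the default filter, `${5 + 7}` renders as "12"; with the filters cleared, values are written as-is, which skips the conversion cost but only works for values that are already strings. `render_expression` and `write` are hypothetical helpers, not Mako's runtime.

```python
from typing import Any, Callable, List, Sequence


# Hypothetical: run an expression result through each default filter in turn.
def render_expression(value: Any, default_filters: Sequence[Callable[[Any], Any]]) -> Any:
    for f in default_filters:
        value = f(value)
    return value


# Hypothetical output buffer that, like a text stream, only accepts strings.
def write(buffer: List[str], value: Any) -> None:
    if not isinstance(value, str):
        raise TypeError(f"cannot write {type(value).__name__} to template output")
    buffer.append(value)


out: List[str] = []
write(out, render_expression(5 + 7, [str]))        # ${5 + 7} with default filter str
write(out, render_expression("already text", []))  # filters cleared: no conversion step
print("".join(out))  # prints 12already text

try:
    write(out, render_expression(5 + 7, []))  # cleared filters and a non-string value
    rejected = False
except TypeError:
    rejected = True
```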

@sqlalchemy-bot

Anonymous wrote:

I have updated the patch:

  • added default filters to the %call tag.

  • renamed the flag to "disable_unicode".

  • set default_filters to ["str"] when disable_unicode is True.
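A small sketch of the default-resolution rule described in this patch: with disable_unicode off (the default) the default filter stays "unicode", and with it on the default becomes "str". `resolve_default_filters` is a hypothetical function, and letting an explicitly passed default_filters list take precedence over the flag is an assumption of this sketch, not something the thread states.

```python
from typing import List, Optional, Sequence


# Hypothetical resolution of the default filter list from the flag.
# Assumption: an explicitly supplied default_filters always wins.
def resolve_default_filters(disable_unicode: bool = False,
                            default_filters: Optional[Sequence[str]] = None) -> List[str]:
    if default_filters is not None:
        return list(default_filters)
    return ["str"] if disable_unicode else ["unicode"]


print(resolve_default_filters())                         # prints ['unicode']
print(resolve_default_filters(disable_unicode=True))     # prints ['str']
print(resolve_default_filters(True, default_filters=[])) # prints []
```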

@sqlalchemy-bot

Michael Bayer (@zzzeek) wrote:

Thanks. Committed a modified version in d5f83e6, which retains identical Mako behavior when the flag is off, the default setting for both Template and TemplateLookup. I also added new documentation for this mode. Since not using Unicode is against Mako's general philosophy, the docs warn against using this flag unless users are absolutely sure they want it (if anyone reports UnicodeDecodeErrors with this flag, they're using it wrong and will be urged to stop using it). It's also almost certain that this feature will not be available in the Python 3000 version, since Py3K standardizes on Unicode strings everywhere.

@sqlalchemy-bot

Changes by Michael Bayer (@zzzeek):

  • changed status to closed
