SQL query formatting improvements #1752

living180 · 2023-04-02T12:00:58Z

Description

Various improvements to the code which formats SQL statements for the SQL panel. The majority are performance improvements, with an overall 35% reduction in formatting time on my system. However, there is also an improvement in indentation for queries which use the CASE keyword, and better simplification of queries generated by .count() querysets.

Checklist:

I have added the relevant tests for this change.
I have added an item to the Pending section of docs/changes.rst.

Because the token values escaped by BoldKeywordFilter are simply intermediate values and are not directly included in HTML templates, use Python's html.escape() instead of django.utils.html.escape() to eliminate the overhead of converting the token values to SafeString. Also pass quote=False when calling escape() since the token values will not be used in quoted attributes.
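
To illustrate the escaping behaviour described above, here is a minimal sketch using only the standard library. The sample token value is made up for illustration and is not taken from the toolbar's code.

```python
import html

# Hypothetical token value, roughly as it might appear inside a SQL statement.
token_value = """WHERE "name" = 'a<b' AND x & 1"""

# quote=False escapes &, < and > but leaves both kinds of quotes untouched,
# which is enough when the result is never placed inside an HTML attribute.
escaped = html.escape(token_value, quote=False)

# With the default quote=True, " and ' are converted to entities as well.
escaped_quoted = html.escape(token_value)

print(escaped)
print(escaped_quoted)
```

Unlike django.utils.html.escape(), html.escape() returns a plain str, so no SafeString wrapper is created for each intermediate token value.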

sqlparse's SerializerUnicode filter does a bunch of fancy whitespace processing which isn't needed because the resulting string will just be inserted into HTML. Replace with a simple EscapedStringSerializer that does nothing but convert the Statement to a properly-escaped string. In the process stop the escaping within BoldKeywordFilter to have a cleaner separation of concerns: BoldKeywordFilter now only handles marking up keywords as bold, while escaping is explicitly handled by the EscapedStringSerializer.

Instead of using a regex to elide the select list in the simplified representation of an SQL query, use an sqlparse filter to elide the select list as a preprocessing step. The result ends up being about 10% faster.

Instead of only eliding select lists longer than 12 characters, now only elide select lists that contain a dot (from a column expression like `table_name`.`column_name`). The motivation for this is that as of Django 1.10, using .count() on a queryset generates SELECT COUNT(*) AS `__count` FROM ... instead of SELECT COUNT(*) FROM ... queries. This change prevents the new form from being elided.
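
The difference between the two heuristics can be sketched as follows. The function names `elide_old` and `elide_new` are hypothetical and only mirror the rules as stated above; they are not the toolbar's actual implementation, which works on sqlparse tokens.

```python
def elide_old(select_list: str) -> bool:
    """Old rule as described above: elide select lists longer than 12 characters."""
    return len(select_list) > 12


def elide_new(select_list: str) -> bool:
    """New rule as described above: elide only select lists that contain a dot."""
    return "." in select_list


# Select list generated by .count() on newer Django versions (no dot).
count_list = "COUNT(*) AS `__count`"
# Typical select list made of qualified column expressions (contains dots).
column_list = "`auth_user`.`id`, `auth_user`.`username`"

for select_list in (count_list, column_list):
    print(select_list, elide_old(select_list), elide_new(select_list))
```

Under the old rule both lists are elided because both exceed 12 characters; under the new rule only the list of qualified columns is elided, so the COUNT(*) form stays visible.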

If a query has subselects in its WHERE clause, do not elide the select lists in those subselects.

The "<strong>" tokens inserted by the BoldKeywordFilter were causing the AlignedIndentFilter to apply excessive indentation to queries which used CASE statements. Fix by rewriting BoldKeywordFilter as a statement filter rather than a preprocess filter, and applying it after AlignedIndentFilter.

When formatting SQL statements using sqlparse, grouping only affects the output when AlignedIndentFilter is applied. Therefore, only enable grouping when that filter is in use.

By using a setting_changed signal receiver to clear the query caching, the parse_sql() and _parse_sql() functions can be merged and the check for the "PRETTIFY_SQL" setting can be moved back inside the get_filter_stack() function.
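
The caching pattern described above can be sketched with the standard library alone. This is an assumption-laden illustration, not the toolbar's code: the `CONFIG` dict, the filter labels, and the "DEBUG_TOOLBAR_CONFIG" setting name are stand-ins, and the Django signal wiring is omitted so that `clear_caches()` is called directly.

```python
from functools import lru_cache

# Stand-in for the toolbar configuration (hypothetical, for illustration only).
CONFIG = {"PRETTIFY_SQL": True}


@lru_cache(maxsize=None)
def get_filter_stack() -> tuple:
    # The setting is checked here, inside the cached function.
    if CONFIG["PRETTIFY_SQL"]:
        return ("indent", "bold")
    return ("bold",)


@lru_cache(maxsize=128)
def parse_sql(sql: str) -> str:
    # Placeholder "formatting": tag the query with the filters that were applied.
    return f"{'+'.join(get_filter_stack())}: {sql}"


def clear_caches(setting: str) -> None:
    # In Django this would be registered as a signal receiver (assumption);
    # here it is invoked by hand whenever the configuration changes.
    if setting == "DEBUG_TOOLBAR_CONFIG":
        parse_sql.cache_clear()
        get_filter_stack.cache_clear()


first = parse_sql("SELECT 1")
CONFIG["PRETTIFY_SQL"] = False
stale = parse_sql("SELECT 1")          # still served from the cache
clear_caches("DEBUG_TOOLBAR_CONFIG")
fresh = parse_sql("SELECT 1")          # recomputed with the new setting
print(first, stale, fresh, sep="\n")
```

Because the caches are cleared whenever the configuration changes, a single cached parse_sql() is sufficient and no uncached wrapper is needed to re-check the setting on every call.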

matthiask

Thanks for refactoring the way settings are modified! That's an excellent change.

The increase in the amount of code makes me a bit sad, but I think NOT hiding count(*) queries etc. is definitely a net positive.

Until now we haven't used sqlparse's insert_(before|after) methods. They are documented at https://sqlparse.readthedocs.io/en/latest/analyzing/?highlight=insert_before#sqlparse.sql.TokenList.insert_before but I find the idx manipulation and those methods a bit disquieting. The test suite seems to cover the relevant cases, we depend on sqlparse anyway, and it has been quite stable over the years; that's sufficient for me. Still, I want to wait a bit to see whether anyone else has reservations about this change before I merge it.

Thanks!

matthiask · 2023-04-09T18:34:31Z

Thanks!

living180 added 5 commits March 27, 2023 14:39

Remove unhelpful comments

9448bdb

Replace select-list elision implementation

496c97d

Instead of using a regex to elide the select list in the simplified representation of an SQL query, use an sqlparse filter to elide the select list as a preprocessing step. The result ends up being about 10% faster.

living180 marked this pull request as draft April 2, 2023 12:06

living180 force-pushed the reformat_sql branch from ecc77b3 to 84a607c Compare April 2, 2023 12:24

living180 marked this pull request as ready for review April 2, 2023 12:31

living180 added 5 commits April 2, 2023 15:32

Only elide top-level select lists

ef9cfbb

If a query has subselects in its WHERE clause, do not elide the select lists in those subselects.

Only enable SQL grouping for AlignedIndentFilter

255efeb

When formatting SQL statements using sqlparse, grouping only affects the output when AlignedIndentFilter is applied.

Eliminate intermediate _parse_sql() method

3751812

By using a setting_changed signal receiver to clear the query caching, the parse_sql() and _parse_sql() functions can be merged and the check for the "PRETTIFY_SQL" setting can be moved back inside the get_filter_stack() function.

Amend change log

e34ec83

living180 force-pushed the reformat_sql branch from 8729521 to e34ec83 Compare April 2, 2023 12:32

matthiask approved these changes Apr 2, 2023

matthiask merged commit 7b8a6cc into django-commons:main Apr 9, 2023

living180 deleted the reformat_sql branch April 15, 2023 08:52

danlamanna mentioned this pull request Aug 22, 2023

Some SQL queries make debug toolbar rendering very slow #1402

Open
