Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework similary/duplicate query grouping #1636

Merged
merged 4 commits into from
Jun 1, 2022

Conversation

living180
Copy link
Contributor

The existing code for tracking and grouping similar and duplicate queries was complicated, hard to understand (nested dict comprehensions), and inefficient (multiple nested loops, using exceptions for flow control, calling key-computation functions twice for each query). Tweak the data structures to allow significant simplification of the code, and pull duplicated functionality into a shared function.

@matthiask
Copy link
Member

I'm surprised. I thought there was a good reason for using the pprint module, because of unhashable types and other problems with data structures (I could be wrong, it has been a long time). LGTM.

@living180
Copy link
Contributor Author

I thought there was a good reason for using the pprint module, because of unhashable types and other problems with data structures (I could be wrong, it has been a long time).

You are correct that raw_params cannot be used as a hash key directly because it can (and frequently does) include unhashable types such as lists. So the approach you used for duplicate_query_key() is to convert raw_params into a string (originally using pformat() in 87b7c31, then saferepr() in 48a825a). The approach is sound (and actually quite clever), but I think repr() should be sufficient for this case - it doesn't care about hashability and does give a useful string representation. The only additional protection that saferepr() brings is protection against recursive data structures (such as a list containing itself). However, trying to pass a recursive data structure as an SQL query parameter doesn't make sense and would cause things to blow up long before they reach the SQLPanel code, thus I don't think saferepr() is really necessary here.

living180 added 4 commits June 1, 2022 05:00
Avoid redefining them every time the SQLPanel.generate_stats() method is
called.
Tweak the data structures to reduce the number of loops required, and
pull common code into a function.
saferepr() is only necessary to guard against recursive data structures.
However, if you pass a recursive data structure as an SQL query
parameter, You're Gonna Have A Bad Time.  So just use repr() which is
faster.
@matthiask matthiask merged commit 6897235 into django-commons:main Jun 1, 2022
@matthiask
Copy link
Member

Thanks!

@living180 living180 deleted the query_grouping branch June 2, 2022 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants