-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More performance improvements #743
Conversation
- iter_descendants changes: - in the base class, handle the generic case of no children - in section.py, always iterate into children (questions/groups) - in question.py, iterate into children only if new kwarg calls for it (actually not used right now but left it if it's needed). - survey.py - in setup_media(), invert `condition` as the positive list is shorter - in setup_xpath_dict(), store the element name to avoid extra lookup in tight loop (all questions/sections) - update self._xpath annotation to match condition filter - xls2json.py - more efficient merge_dicts()
- detect and re-use tuple[Option] rather than creating a new tuple containing those same Options by default, otherwise create Options from an Iterable[dict] as was done before.
- fewer string objects created = less memory used + faster - also in utils.py, use _attrs dict directly instead of looking up the key repeatedly (_get_attributes returns a NamedNodeMap object which is pretty much the same thing).
- instance_expression.py - not possible to match "instance(" if the string is too short - skip `find_boundaries` func body if no tokens found - skip `replace_with_output` func body if no boundaries found - utils.py - single-pass substitution of XML tokens with translation table - survey.py - skip `insert_output_values` func body if placeholder "-" found - instead of creating XML node just to do string replacement, add func to do the same. Checks for replacement characters first since CPU cost of that seems less a problem vs CPU + memory of creating new strings (due to replacement), particularly if the majority of strings don't have any matches. - updated perf test stats, so far since 2c878a3 about 10-20% faster, up to 10% less memory
- classes have attributes defined now so no need to use them like dicts - provides better warnings / navigation / refactoring via IDE - variable impact on performance (some slightly better, some worse) - also converted a few more string concatenations to f-strings
- when a "compact_tag" is specified in a XLSForm, this is translated by xls2json.py processing into "instance::odk:tag" (per aliases.py) which is then stored as Question.instance e.g. `{"odk:tag": "value"}`. This info then used by Question.xml_instance() to output these data as attributes on the instance element e.g. `<name @ODK:tag="value"/>`
- expression.py - remove is_single_token_expression and instead look for the token type of interest only. This requires making the lexer rules available outside of get_expression_lexer() and unfortunately compiling an extra time since the Scanner combines all regexes into one before compiling (so you can't give it pre-compiled regex). - change PYXFORM_REF regex to specifically look for optional `last-saved#` prefix instead of any ncname before a `#`. Not sure why it was like that but `last-saved` is quicker than ncname regex. - pyxform_reference.py - add some sanity checks on input `value` to avoid more expensive parsing in the remaining function body - survey.py - in _generate_last_saved_instance, combine the check performed on default/choice_filter/bind(items) into a func (in expression.py) - refactor check of bind items to more efficient method (per comment).
- previously just took RSS value at end of last "convert" but this can be affected by garbage collection mid-run, so now the RSS is polled after the run and the largest value is printed with average timings. - expression.py: seems that `match` a bit faster than `fullmatch`
- test provides internal dict structure directly but provides a non-string input for "default" whereas normally these values are provided from XLS/X or MD as strings. - the failure is at the function `has_last_saved` which looks for the length of a value but that is a TypeError for `int` - probably there are other places that would expect strings as well.
@@ -391,6 +391,22 @@ def validate(self): | |||
for child in self.children: | |||
child.validate() | |||
|
|||
def iter_descendants( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this implementation could go in some sort of shared place? Everyone's favorite, a util file? I'm worried about these identical implementations drifting apart...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree 3x copy/pastas = probably time for a shared func. I didn't do that here because of the points noted in #745. In which case the only copy of this method override for iter_descendants
would be MultipleChoice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! 🏃♀️
A few more general ideas following #740
Why is this the best possible solution? Were any other approaches considered?
See commit messages for details of each change. Overall a modest but worthwhile improvement. I think to get much more will require addressing inefficiencies in XML generation, which may require overriding more of of minidom's Element class and perhaps Attr as well. Also the relative repeats recursion issue remains.
What are the regression risks?
Should be minimal risk, the only tests changes were to update the performance test stats.
Does this change require updates to documentation? If so, please file an issue here and include the link below.
No
Before submitting this PR, please make sure you have:
tests
python -m unittest
and verified all tests passruff format pyxform tests
andruff check pyxform tests
to lint code