Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise time taken to get raw graph #2312

Merged
merged 2 commits into from
Jun 6, 2017

Conversation

matthewrmshin
Copy link
Contributor

@matthewrmshin matthewrmshin commented May 30, 2017

Reduce validation time. This is part of the work to look at why it would take 30 minutes to validate a user suite (cylc validate --strict). This change reduces the validation time of the suite to 10 minutes 6 minutes 4 minutes. Still not good enough, but moving towards the right direction.

The main change is the restructure of the variable self.edges. It is now a dict. Each key is a unique sequence, and each value is a set of (left, right, suicide, conditional) tuples. This change allows get_graph_raw to look at each unique sequence once instead of multiple times. There is also caching at various points to reduce the number of expensive calls to regular expression matching and date-time functionality.

Other minor changes include:

  • Remove unused variables.
  • Use compiled regular expressions where possible.
  • Reduce usage of complex regular expressions.
  • Remove edge objects, which were not cheap to initialise.
  • Replace graphnode objects with a GraphNodeParser singleton, which is able to cache of graph node parsing results.
  • Avoid multiple parsing and un-parsing of task IDs and cycle points.
  • Replace the internal recursive_replace function in SuiteConfig.generate_triggers with a while stack loop at a level above the original loop.

@matthewrmshin matthewrmshin added this to the next release milestone May 30, 2017
@matthewrmshin matthewrmshin self-assigned this May 30, 2017
@oliver-sanders oliver-sanders added the efficiency For notable efficiency improvements label May 31, 2017
@matthewrmshin matthewrmshin force-pushed the opt-validate-graphing branch 3 times, most recently from 2e8b527 to 08c0c3a Compare June 1, 2017 13:53
The main change is the restructure of the variable
`SuiteConfig().edges`. It is now a dict. Each key is a unique sequence,
and each value is a set of `(left, right, suicide, conditional)` tuples.
This change allows `get_graph_raw` to look at each unique sequence once
instead of multiple times. There is also caching at various points to
reduce the number of expensive calls to regular expression matching and
date-time functionality.

Other minor changes include:
* Remove unused variables.
* Use compiled regular expressions where possible.
* Reduce usage of complex regular expressions.
* Remove edge and graphnode objects, which were not cheap to initialise.
* New GraphNodeParser object to manage and cache graph node parsing.
* The `recursive_replace` internal function is replaced with a loop at a
  level up.
Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, no problems found.

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Some comments.

@@ -955,7 +957,7 @@ def compute_inheritance(self, use_simple_method=True):
OUT.info("Parsing the runtime namespace hierarchy")

results = OrderedDictWithDefaults()
n_reps = 0
# n_reps = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines can be removed now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think these were being kept to support a commented-out comparison between alternative methods, that we should look at some time...)


expr_list = recursive_replace(expr_list, left, task_trigger)
stack = [expr_list]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a comment here to explain what this block is doing.

@@ -721,6 +722,7 @@ def __init__(self, suite, fpath, template_vars=None,
# "foo:fail => bar => !foo" looks like "foo => bar => foo").
graph = self.get_graph(ungroup_all=True, ignore_suicide=True,
is_validate=True)
GraphNodeParser.get_inst().clear()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

if lnode.offset_is_irregular:
offset_tuple = (lnode.offset_string, seq)
(str(first_point - last_point), seq))
elif offset_is_from_icp:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if offset_is_from_icp:
    ...
elif offset_is_from_icp:
    ...

Has the is_absolute logic been preserved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the old logic, is_absolute is always set to the same value as offset_is_from_ict. In the new logic, we have unified the variables as offset_is_from_icp.

@@ -43,7 +43,7 @@ cmp_ok "${SUITE_RUN_DIR}/ctb-get-graph-raw.out" <<'__OUT__'
],
[
"t1.1",
null,
"T.1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the consequence of this change in order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old logic was returning a list by using .values() method of a dict. The new logic sorts the list to fix the ordering.

self._offsets = {}
self._nodes = {}
self._offsets.clear()
self._nodes.clear()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two fewer W0612 messages.

@oliver-sanders oliver-sanders merged commit a32641d into cylc:master Jun 6, 2017
@matthewrmshin matthewrmshin deleted the opt-validate-graphing branch June 6, 2017 15:26
matthewrmshin added a commit to matthewrmshin/cylc-flow that referenced this pull request Jul 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
efficiency For notable efficiency improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants