Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of custom task outputs #2364

Merged
merged 4 commits into from
Aug 4, 2017

Conversation

matthewrmshin
Copy link
Contributor

@matthewrmshin matthewrmshin commented Jul 19, 2017

Store custom outputs in separate DB table task_outputs.

  • Each task proxy (that has custom outputs) has a single row in the new table. The completed outputs are stored as a single value in the outputs column of the row. The value is a multi-line string, each line is a trigger=message pair.
  • Remove custom outputs and duplicated outputs from the task_events table.
    • Update cylc suite-state to query the new table instead of task_events.

Correctly load completed custom outputs on restart.

  • For running, succeeded and failed tasks, which may have some completed outputs.
  • Instead of setting all outputs to completed only for succeeded tasks.

Fix job.status exit time on failure.

  • This was lost, as we no longer pass the failed argument to cylc message on failure.
    Fix DB task jobs run exit time format.

Close #2361. Close #2362.

@matthewrmshin matthewrmshin added the bug Something is wrong :( label Jul 19, 2017
@matthewrmshin matthewrmshin added this to the next release milestone Jul 19, 2017
@matthewrmshin matthewrmshin self-assigned this Jul 19, 2017
@hjoliver
Copy link
Member

hjoliver commented Jul 25, 2017

I like this, just wondering whether we need the 'trigger' recorded (in what you refer to as trigger=message). That is really just a label that points to the associated message under task [[[outputs]]] to save us from writing the full output message in the graph. I can't think of any use for multiple outputs with the exact same message, so the label is redundant/useless information in the table, no?

@matthewrmshin
Copy link
Contributor Author

This is broadly true for the latest implementation. However, I have now modified the --message= option of cylc suite-state to accept either the trigger string or the message string (in line with cylc reset --output=OUTPUT).

@matthewrmshin
Copy link
Contributor Author

Branch re-based.

@hjoliver
Copy link
Member

This is broadly true ...

Roger that.

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of very minor points.

return len(res) > 0
res = self.suite_state_query(task, cycle, status, message)
if status:
return bool(res)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this bool just checking for a non-empty list?

Note that:

>>> bool(None)
False
>>> bool([None])
True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

res should be a list, so this is testing for a non-empty list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case I'm OK with this.

r"LEFT OUTER JOIN " +
r" %(task_outputs)s " +
r"ON %(task_pool)s.cycle == %(task_outputs)s.cycle AND " +
r" %(task_pool)s.name == %(task_outputs)s.name")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not suggesting a refactor of this here but to note [for the future] that python's implicit string joining syntax...

('a'
 'b'
 'c')

...is O(1) whereas string operators...

('a' +
 'b' +
 'c')

...are O(N) - and also a little messier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must say that I have always found this sort of thing surprising (or is it disappointing?) with Python. Surely, Python should optimise this away at compile time...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprising, no, disappointing, yes.

Store custom outputs in separate DB table.
Correcyly load completed custom outputs on restart.
Fix job.status exit time on failure.
Fix DB task jobs run exit time format.
On resetting a held task to submitted/running, ensure that the status is
set to submitted(held)/running(held) instead of held(submitted/running).

On shutdown, ensure that database is written last.
@matthewrmshin
Copy link
Contributor Author

(Finally got test to pass on Travis CI!)

self.suite_db_mgr.process_queued_ops()
self.suite_db_mgr.on_suite_shutdown()
except StandardError:
ERR.error(str(exc))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except StandardError as exc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, haven't uncovered any problems.

@hjoliver hjoliver merged commit 96c360a into cylc:master Aug 4, 2017
@matthewrmshin matthewrmshin deleted the fix-restart-task-outputs branch August 4, 2017 05:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants