
#102 - Add overhead value to TestTask #243

Open · wants to merge 4 commits into master
Conversation

worldomonation
Contributor

@worldomonation worldomonation commented Jul 16, 2020

This implements the TestTask.overhead property that returns a float value representing non-group time.

Example Usage:

>>> import mozci
>>> from mozci.push import Push
>>> push = Push('b32bf0e2fa08e29748da69ac64fbe41659ad372c', branch='autoland')
>>> task = push.tasks[3]
>>> task.overhead
447.7479989528656

Or:

>>> for label, overhead in [(task.label, task.overhead) for task in push.tasks if type(task) == TestTask]:
...     print(label, overhead)
...
test-macosx1014-64/debug-mochitest-browser-chrome-e10s-4 141.60400009155273
test-linux1804-64/debug-mochitest-browser-chrome-fis-e10s-7 117.6800000667572
test-windows7-32/debug-web-platform-tests-reftest-e10s-2 581.414999961853
test-macosx1014-64/debug-xpcshell-e10s-2 110.23099994659424
test-macosx1014-64-shippable/opt-mochitest-devtools-chrome-e10s-3 118.6470000743866
test-linux1804-64-asan/opt-mochitest-browser-chrome-e10s-13 376.37199997901917
test-windows7-32/debug-mochitest-browser-chrome-e10s-3 293.8619999885559
test-linux1804-64-asan/opt-mochitest-browser-chrome-e10s-13 124.0220000743866
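The values above suggest overhead is the task's wall-clock duration minus the span covered by its test groups. A minimal sketch of that computation, assuming the four aggregate field names (`task_min`, `task_max`, `group_min`, `group_max`) used elsewhere in this review; the dict here is a stub, not a real query result:

```python
def compute_overhead(data):
    # Overhead = total task run time minus the window in which test
    # groups were actually executing (all values are epoch seconds).
    task_duration = data["task_max"] - data["task_min"]
    group_duration = data["group_max"] - data["group_min"]
    return task_duration - group_duration

data = {"task_min": 0.0, "task_max": 600.0, "group_min": 30.0, "group_max": 510.0}
print(compute_overhead(data))  # 120.0
```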

@worldomonation worldomonation changed the title Changes: #102 - Add overhead value to TestTask Jul 16, 2020
	- implement the TestTask.overhead property that returns a float value representing non-group time.
	- add new query to obtain start/end times for the group and task.
from: unittest
format: list
groupby: task.id
limit: 20000
Collaborator

Do we need such a high limit? If so, we might want to use destination: url like https://github.com/mozilla/mozci/blob/737c4cc0810bd745d13c90c511d18afff6baee20/mozci/queries/push_revisions.query.

Contributor Author

@worldomonation worldomonation Jul 17, 2020

No, we do not; this was just carried over from the working query. It should return only 1 row, so 1000 ought to be enough, though I'd like @klahnakoski to weigh in.

@@ -358,6 +360,30 @@ def configuration(self):
parts = config.split("-")
return "-".join(parts[:-1] if parts[-1].isdigit() else parts)

@property
def overhead(self):
Collaborator

This is defined on a specific task, but we will have to define overheads, total_overhead, median_overhead on LabelSummary to actually use the information (e.g. the median overhead for a given suite on a given platform/configuration).

For task-level scheduling, we can just consider the sum of the median durations of all scheduled tasks.
For manifest-level scheduling, for each task we'll have to sum the durations of the groups in the task and sum the overhead associated with the task (I think by having a median overhead by suite and platform/configuration, we might be precise enough; maybe even just platform/configuration?).
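A hypothetical sketch of the aggregate properties proposed above (the real `LabelSummary` in mozci carries more machinery; here `tasks` is any iterable of objects exposing an `overhead` float):

```python
import statistics
from types import SimpleNamespace

class LabelSummary:
    def __init__(self, tasks):
        self.tasks = tasks

    @property
    def overheads(self):
        # One overhead value per task under this label.
        return [task.overhead for task in self.tasks]

    @property
    def total_overhead(self):
        return sum(self.overheads)

    @property
    def median_overhead(self):
        return statistics.median(self.overheads)

# Stubbed tasks standing in for TestTask instances.
tasks = [SimpleNamespace(overhead=o) for o in [10.0, 20.0, 30.0]]
summary = LabelSummary(tasks)
print(summary.total_overhead, summary.median_overhead)  # 60.0 20.0
```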

Contributor Author

Right, I will implement them while I wait for review from ahal and ekyle.

Contributor

@worldomonation I will require the group-level run times, but I plan to compute the aggregates after the data has been put into the database. Specifically, there is a BigQuery database with some of the data generated by mozci; my plan is to include what these functions generate and to form queries from there.

mozci/task.py Outdated
Comment on lines 380 to 381
assert data["task_min"] < data["group_min"]
assert data["task_max"] > data["group_max"]
Contributor

Do not crash on a problem: we should emit problems as warnings to the logs and declare the overhead to be zero (our best estimate).

Depending on your logging library, something like:

if not (data["task_min"] <= data["group_min"] <= data["group_max"] <= data["task_max"]):
    Log.warning(
        "task {{task.id}} has inconsistent group duration: {{data}}",  
        task=task,
        data=data
    )
    return 0
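An equivalent guard using only the standard library's logging module (a sketch; `data` is assumed to hold the four aggregates from the query, and `task_id` is a hypothetical parameter standing in for `task.id`):

```python
import logging

logger = logging.getLogger(__name__)

def overhead_or_zero(task_id, data):
    # Coherent timestamps must satisfy
    # task_min <= group_min <= group_max <= task_max; if they don't,
    # log a warning and report 0 as the best available estimate.
    if not (data["task_min"] <= data["group_min"] <= data["group_max"] <= data["task_max"]):
        logger.warning("task %s has inconsistent group duration: %s", task_id, data)
        return 0
    return (data["task_max"] - data["task_min"]) - (data["group_max"] - data["group_min"])

good = {"task_min": 0.0, "task_max": 600.0, "group_min": 30.0, "group_max": 510.0}
bad = {"task_min": 100.0, "task_max": 600.0, "group_min": 30.0, "group_max": 510.0}
print(overhead_or_zero("abc", good), overhead_or_zero("abc", bad))  # 120.0 0
```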

Member

@ahal ahal Jul 17, 2020

Why shouldn't we crash on a problem?

I agree crashing on a problem is bad if you're writing an ETL, but mozci is not an ETL, it's a library. Some consumers of the library will want to ignore errors and keep going (like an ETL). Other consumers of the library will want to fail loudly so they can be notified immediately when things go awry. The nice thing about crashing is that it lets the consumers decide how to handle the situation.

Though assertions are trickier to catch than a proper Exception, so if we're going to do this validation it should use something like ValueError, and the fact that ValueError can be raised should be well documented.
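A sketch of the documented-ValueError approach (the helper name is hypothetical; in the PR this check would live inside the `overhead` property):

```python
def validate_group_bounds(data):
    """Validate that group timestamps fall within the task's.

    Raises
    ------
    ValueError
        If the data is incoherent; callers may catch this and fall
        back to an estimate of their choosing.
    """
    if not (data["task_min"] <= data["group_min"] <= data["group_max"] <= data["task_max"]):
        raise ValueError(f"inconsistent group duration: {data!r}")

# A caller that chooses to handle the error rather than crash:
try:
    validate_group_bounds({"task_min": 5.0, "task_max": 6.0, "group_min": 1.0, "group_max": 2.0})
except ValueError as exc:
    print("caught:", exc)
```

Unlike a bare `assert`, this survives `python -O` and gives callers a specific exception type to catch.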

Contributor

@ahal Since you are the owner, I will leave the error mitigation patterns to you. In the meantime, my reasons are:

  • We have a reasonable mitigation - the asserts catch logic bugs, raising an exception when the data is incoherent. The definition of overhead is our "best estimate of overhead": in the event of incoherent data, a zero estimate is a legitimate return value.
  • Less work for callers - every raised exception needs at least one try/except to handle it. All callers must implement their own mitigation, possibly the same mitigation. I doubt anyone will write an exception handler for these asserts.
  • Callers have less context - callers are less likely to know a suitable mitigation: it is likely a caller will let the error propagate up and terminate the program.
  • asserts are for debugging - these specific asserts exist to crash on bad or unanticipated data, which is useful only in debugging. My suggestion is to shunt the error to stderr in production and consume that channel offline to guide further development.
  • this is a rare event - I do not expect these asserts to fire.

Yes, letting an exception escape the callee allows the caller to decide mitigation, but I am not advocating suppression of all exceptions. In this case, we have a reasonable mitigation strategy that I doubt any caller will ever write a handler for; plus it remains visible to anyone monitoring stderr.

Of course, I can always be wrong. I am sure I have at times promoted warnings to errors so that a caller could implement a better strategy: in that case I advocate a more detailed error description, so the caller has the context to make the best decision:

if not (data["task_min"] <= data["group_min"] <= data["group_max"] <= data["task_max"]):
    Log.error(
        "task {{task.id}} has inconsistent group duration: {{data}}",  
        task=task,
        data=data
    )


	- implement overhead-related methods for LabelSummary
	- add overhead related abstract properties to GroupSummary, RunnableSummary
	- GroupSummary overhead will currently return 0
- {value: result.start_time, name: group_min, aggregate: min}
- {value: result.end_time, name: group_max, aggregate: max}
where:
- in: {task.id: {$eval: task_id}}
Member

Rather than having a separate query for this (or maybe in addition), I think we could refactor the existing push_task_results_from_unittest.query to include this information as well, saving ourselves from running a separate query.

I'd be fine if you decide to file a follow-up issue to implement this and save it for later.

return [task.overhead for task in self.tasks]

@property
def total_overheads(self):
Member

nit: shouldn't be pluralized

@@ -358,6 +360,30 @@ def configuration(self):
parts = config.split("-")
return "-".join(parts[:-1] if parts[-1].isdigit() else parts)

@property
Member

This should be memoized

@worldomonation
Contributor Author

I haven't forgotten about this PR, but I can't promise when I'll get to it - the job search takes first priority.

I can likely address @ahal's comments on the method name and memoization, but I think it would be better not to refactor the ActiveData query at this point in time.

4 participants