
Metric for total compute time by task prefix, including currently-executing tasks #7601

Open
gjoseph92 opened this issue Feb 28, 2023 · 2 comments


@gjoseph92
Collaborator

In #7406, we added a Prometheus metric for total on-thread compute time per task prefix. Taking its rate basically gives you total CPU-seconds per second. This is quite a useful overview metric to show on Grafana, since it can a) show generally which tasks were running when, and b) show how well-utilized the cluster was (if you have 20 threads, 20 seconds of compute time per second would be 100% utilization).

However, because this metric is only updated when tasks complete, if tasks are longer than the prometheus scrape interval, the metric can look quite misleading:
[screenshot: Grafana plot comparing the two metrics described below]

Here we're comparing rate(dask_scheduler_tasks_compute_seconds_total[15s]) and dask_worker_tasks{state="executing"}. The second metric was just updated by @crusaderky in #7506; it's basically a gauge of the current number of tasks in state executing on each worker.

You'd expect these metrics to be pretty close to each other: if nthreads tasks were in state executing, you'd expect about nthreads CPU-seconds of compute time per second. However, they look wildly different. I suspect part (or most) of this is because the tasks take much longer than the Prometheus scrape interval.

If you have a 5-minute task, you'll get a bunch of scrapes with no increase in the tasks_compute_seconds_total metric, then one scrape where it jumps up by 5 minutes' worth of compute time. I'm hoping that explains the spikiness here.
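
To make the arithmetic concrete, here's a toy sketch (not dask code) of a counter that only moves when the task finishes, scraped every 15 seconds:

```python
scrape_interval = 15   # seconds between Prometheus scrapes
task_duration = 300    # one 5-minute task, started at t=0
counter = 0.0          # tasks_compute_seconds_total, bumped only at completion
previous = 0.0

for t in range(scrape_interval, 2 * task_duration + 1, scrape_interval):
    if t - scrape_interval < task_duration <= t:
        counter += task_duration  # the single update, when the task completes
    rate = (counter - previous) / scrape_interval  # what rate(...[15s]) would show
    previous = counter
    print(f"t={t:3d}s  counter={counter:5.0f}  rate={rate:4.1f} CPU-s/s")

# The task really contributes 1 CPU-second per second the whole time, but the
# plotted rate is 0 for 19 scrapes and then spikes to 20 for a single scrape.
```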

Ideally we could add another metric (or update this metric) to include time tasks have spent on the threadpool so far, even when they haven't completed yet.
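
Roughly, I'm imagining something like this on the worker side (purely a sketch; `compute_seconds_completed`, `active_compute_started`, and `key_prefix` are made-up names, not actual distributed internals):

```python
import time

from prometheus_client.core import CounterMetricFamily


def key_prefix(key):
    # Simplified stand-in for dask's key-splitting: "sum-abc123" -> "sum"
    name = key[0] if isinstance(key, tuple) else key
    return name.rsplit("-", 1)[0]


def collect_compute_seconds(worker):
    """Hypothetical collector: completed on-thread compute time per prefix, plus
    the time already spent on tasks that are still executing, so long-running
    tasks show up in the metric before they finish."""
    now = time.monotonic()
    family = CounterMetricFamily(
        "dask_worker_tasks_compute_seconds",
        "On-thread compute time per task prefix, including in-progress tasks",
        labels=["task_prefix_name"],
    )
    totals = dict(worker.compute_seconds_completed)  # made-up: {prefix: seconds}
    for key, started in worker.active_compute_started.items():  # made-up: {key: start time}
        prefix = key_prefix(key)
        totals[prefix] = totals.get(prefix, 0.0) + (now - started)
    for prefix, seconds in totals.items():
        family.add_metric([prefix], seconds)
    yield family
```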


Though it's also interesting that even if you set the aggregation interval to something very long like 30min, the averaged-out rate is still much lower than the theoretical max:
[screenshot: the same comparison with a 30-minute aggregation interval]

Because dask_worker_tasks is a gauge, it's also possible that it overestimates utilization (it's only sampled every 5s, or whatever the scrape interval is; any idle time between scrapes is not captured).

The high worker CPU in this case makes me think it's probably not that far off, though.

@gjoseph92
Collaborator Author

After discussion with @crusaderky, this overlaps a lot (in terms of code changes) with current work on #7565. So the sequence will be:

  1. Some variant of #7586 (Fine performance metrics for execute, gather_dep, etc.)
  2. Linking that up with Prometheus
  3. Then, looking into refactoring that tracing to include currently-elapsed time

@fjetter
Member

fjetter commented Mar 14, 2023

I do not agree with the notion that these two lines necessarily need to align. This is all about semantics, i.e. how the actual counter is defined. These two plots are not showing the same information: one metric is defined as "total finished compute time" while the other is "tasks currently in state executing on workers". There is a correlation, of course, but I don't see why they have to be aligned.


This is mildly off topic and likely not helpful for this specific case, but please note that Prometheus can indeed export metrics with a custom, specific timestamp, i.e. if we know the value of something at a given time, we can export that value for that exact time. This likely won't help much with "smearing out" a metric as you're intending here, but it has come up frequently enough in the past that I felt the need to share it.

Looking at something like the GaugeMetricFamily we're using, you'll notice that GaugeMetricFamily.add_metric actually exposes a timestamp kwarg which can be used to specify the exact time when an event occurred.

To illustrate, I set this for the partially_saturated metric below:

```
# /metrics
dask_scheduler_workers{state="idle"} 0.0
# See how the value and the timestamp is exported
dask_scheduler_workers{state="partially_saturated"} 0.0 1678802020560
dask_scheduler_workers{state="saturated"} 0.0
dask_scheduler_workers{state="paused_or_retiring"} 0.0
```
