WIP: Write-back throttling
v1: This is quite a primitive attempt to keep write-back data under
control. We simply use the current loadavg to estimate the number of
outstanding write requests. If we get above a loadavg of 1 per active
core (by dividing loadavg by the number of worker threads), we throttle
execution for a few milliseconds to give the disk time to write data
back. The value of 20 ms per request was found experimentally (it
matches roughly one revolution of a standard HDD plus some overhead).
In my test setup this works quite well: it keeps the CPU mostly as busy
as before, but the loadavg peaks at around 9.5 on an 8-core system
instead of climbing to 15+.

We cannot expect fossilize to make any further progress if the disks
cannot keep up with the write-back volume from the GPU driver anyway,
so there is no harm in waiting for short periods of time.

A better solution could measure the IO PSI data of the process, trying
to keep the IO latency below a certain threshold. The problem with
loadavg on Linux is that it counts every task in the system that is
blocked waiting for events, be it IO, memory allocation, other tasks,
etc. We are mainly interested in keeping IO under control; everything
else is already covered by the CPU scheduler.

Signed-off-by: Kai Krakow <kai@kaishome.de>
kakra committed Dec 21, 2020
1 parent 941925c commit 1e32c5d
Showing 1 changed file with 48 additions and 0 deletions.
48 changes: 48 additions & 0 deletions cli/fossilize_replay.cpp
@@ -56,6 +56,10 @@
#include <map>
#include <assert.h>

#ifdef __linux__
#include <cmath>
#endif

#ifdef FOSSILIZE_REPLAYER_SPIRV_VAL
#include "spirv-tools/libspirv.hpp"
#endif
@@ -918,6 +922,7 @@ struct ThreadedReplayer : StateCreatorInterface
else
pipeline_cache_misses.fetch_add(1, std::memory_order_relaxed);
}
maybe_throttle();
}
else
{
@@ -1058,6 +1063,7 @@ struct ThreadedReplayer : StateCreatorInterface
else
pipeline_cache_misses.fetch_add(1, std::memory_order_relaxed);
}
maybe_throttle();
}
else
{
@@ -2083,6 +2089,7 @@ struct ThreadedReplayer : StateCreatorInterface

if (memory_index == 0)
{
// TODO Q: Maybe flush write-back here somehow? -> A: No, GPU driver writes do not come from this thread
work.push_back({ get_order_index(MAINTAIN_SHADER_MODULE_LRU_CACHE),
[this]() {
// Now all worker threads are drained for any work which needs shader modules,
@@ -2345,6 +2352,47 @@ struct ThreadedReplayer : StateCreatorInterface
queued_count[item.memory_context_index]++;
}

double m_prev_loadavg = 0.0;
inline void maybe_throttle()
{
#ifdef __linux__
double loadavg[1];
// TODO Maybe use PSI on modern systems to measure the current IO latency?
const int rv = ::getloadavg(loadavg, 1);
if (rv != 1)
{
LOGE("Failed to query load average\n");
return;
}

static const double load_exp = std::exp(-5.0 / 60.0);

// Taken from github.com/Zygo/bees:
// Averages are fun, but we want to know the load from the last 5 seconds.
// Invert the load average function:
// LA = LA * load_exp + N * (1 - load_exp)
// LA2 - LA1 = LA1 * load_exp + N * (1 - load_exp) - LA1
// LA2 - LA1 + LA1 = LA1 * load_exp + N * (1 - load_exp)
// LA2 - LA1 + LA1 - LA1 * load_exp = N * (1 - load_exp)
// LA2 - LA1 * load_exp = N * (1 - load_exp)
// LA2 / (1 - load_exp) - LA1 * load_exp / (1 - load_exp) = N
// (LA2 - LA1 * load_exp) / (1 - load_exp) = N
// except for rounding error which might make this just a bit below zero.
const double current_load = std::fmax(0.0, (loadavg[0] - m_prev_loadavg * load_exp) / (1.0 - load_exp));

m_prev_loadavg = loadavg[0];

if (current_load > num_worker_threads)
{
// Interpret the current load as the number of outstanding IO requests;
// 20 ms is the time we can expect one IO request to take.
uint32_t throttle_ms = (uint32_t)(current_load * 20.0 / num_worker_threads);
LOGI("Throttling threads %u load %0.2f throttle_ms %u\n", num_worker_threads, current_load, throttle_ms);
std::this_thread::sleep_for(std::chrono::milliseconds(throttle_ms));
}
#endif
}

unsigned num_worker_threads = 0;
unsigned loop_count = 0;

