[async] Compute offloaded IR hash once and cache it #1608

Merged · 2 commits · Jul 30, 2020
taichi/program/async_engine.cpp: 23 additions & 12 deletions

@@ -42,12 +42,13 @@ std::unique_ptr<IRNode> clone_offloaded_task(OffloadedStmt *from,

 KernelLaunchRecord::KernelLaunchRecord(Context context,
                                        Kernel *kernel,
-                                       std::unique_ptr<IRNode> &&stmt_)
+                                       std::unique_ptr<IRNode> &&stmt_,
+                                       uint64 h)
     : context(context),
       kernel(kernel),
       stmt(dynamic_cast<OffloadedStmt *>(stmt_.get())),
-      stmt_holder(std::move(stmt_)),
-      h(hash(stmt)) {
+      h(h),
+      stmt_holder_(std::move(stmt_)) {
   TI_ASSERT(stmt != nullptr);
   TI_ASSERT(stmt->get_kernel() != nullptr);
 }
@@ -130,25 +131,35 @@ void AsyncEngine::launch(Kernel *kernel) {
   kernel->lower(/*to_executable=*/false);
   auto block = dynamic_cast<Block *>(kernel->ir.get());
   TI_ASSERT(block);
 
   auto &offloads = block->statements;
-  auto &dummy_root = kernel_to_dummy_roots_[kernel];
-  if (dummy_root == nullptr) {
-    dummy_root = std::make_unique<Block>();
-    dummy_root->kernel = kernel;
+  auto &kmeta = kernel_metas_[kernel];
+  const bool kmeta_inited = kmeta.initialized();
+  if (!kmeta_inited) {
+    kmeta.dummy_root = std::make_unique<Block>();
+    kmeta.dummy_root->kernel = kernel;
   }
   for (std::size_t i = 0; i < offloads.size(); i++) {
     auto offload = offloads[i]->as<OffloadedStmt>();
-    KernelLaunchRecord rec(
-        kernel->program.get_context(), kernel,
-        clone_offloaded_task(offload, kernel, dummy_root.get()));
+    auto cloned = clone_offloaded_task(offload, kernel, kmeta.dummy_root.get());
+    uint64 h;
+    if (kmeta_inited) {
+      h = kmeta.offloaded_hashes[i];
+    } else {
+      h = hash(cloned.get());
+      TI_ASSERT(kmeta.offloaded_hashes.size() == i);
Inline review thread:

Contributor:
What if we have two identical offloads in a kernel? And does it prevent two different offloads in two different kernels from having the same hash?

Contributor:
BTW, if the hash really clashes, I think it should be an error rather than an assertion.

Member (author):
> BTW, if the hash really clashes, I think it should be an error rather than an assertion.

Note that offloaded_hashes is a std::vector. It maps to an offloaded task by its position in that kernel. The assertion checks the position (i.e., that the hash of this offloaded task hasn't been computed yet), not hash collisions.

> What if we have two identical offloads in a kernel? And does it prevent two different offloads in two different kernels from having the same hash?

I actually think it's fine for identical offloads (regardless of whether they are in the same kernel or not) to have the same hashes. If two offloads are indeed the same, then they should have the same TaskMetas (i.e., input/output SNodes, activation SNodes).

Contributor:
Oh, I see. I thought offloaded_hashes was a std::unordered_map, which would cause an assertion failure when there are two identical offloads.

> I actually think it's fine for identical offloads to have the same hashes.

I agree.
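To make the invariant in this thread concrete, here is a minimal sketch (with hypothetical names; not taichi's actual code) of why a position-indexed std::vector tolerates duplicate offloads, whereas a cache keyed by the hash value would not:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-kernel cache sketch: offloaded_hashes[i] holds the hash of the i-th
// offloaded task. Duplicate offloads simply occupy distinct slots that hold
// equal values; nothing is keyed by the hash itself.
struct KernelMetaSketch {
  std::vector<uint64_t> offloaded_hashes;
};

// fresh_hash stands in for hash(cloned.get()) on the first launch.
uint64_t cached_hash(KernelMetaSketch &kmeta, std::size_t i, bool inited,
                     uint64_t fresh_hash) {
  if (inited)
    return kmeta.offloaded_hashes[i];          // later launches: no rehash
  assert(kmeta.offloaded_hashes.size() == i);  // slot i not filled yet
  kmeta.offloaded_hashes.push_back(fresh_hash);
  return fresh_hash;
}
```

Had offloaded_hashes been an unordered_map keyed by the hash, two identical offloads in one kernel would land on the same key, which is exactly the misreading the assertion discussion above clears up.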

+      kmeta.offloaded_hashes.push_back(h);
+    }
+    KernelLaunchRecord rec(kernel->program.get_context(), kernel,
+                           std::move(cloned), h);
     enqueue(std::move(rec));
   }
 }
 
 void AsyncEngine::enqueue(KernelLaunchRecord &&t) {
   using namespace irpass::analysis;
 
-  auto &meta = metas[t.h];
+  auto &meta = offloaded_metas_[t.h];
   // TODO: this is an abuse since it gathers nothing...
   auto root_stmt = t.stmt;
   gather_statements(root_stmt, [&](Stmt *stmt) {
@@ -213,7 +224,7 @@ bool AsyncEngine::optimize_listgen() {
   for (int i = 0; i < task_queue.size(); i++) {
     // Try to eliminate unused listgens
     auto &t = task_queue[i];
-    auto meta = metas[t.h];
+    auto meta = offloaded_metas_[t.h];
     auto offload = t.stmt;
     bool keep = true;
     if (offload->task_type == OffloadedStmt::TaskType::listgen) {
taichi/program/async_engine.h: 24 additions & 13 deletions

@@ -110,12 +110,15 @@ class KernelLaunchRecord {
   Context context;
   Kernel *kernel;  // TODO: remove this
   OffloadedStmt *stmt;
-  std::unique_ptr<IRNode> stmt_holder;
-  uint64 h;
+  uint64 h;  // hash of |stmt|
 
-  KernelLaunchRecord(Context contxet,
+  KernelLaunchRecord(Context context,
                      Kernel *kernel,
-                     std::unique_ptr<IRNode> &&stmt);
+                     std::unique_ptr<IRNode> &&stmt,
+                     uint64 h);
+
+ private:
+  std::unique_ptr<IRNode> stmt_holder_;
 };
 
 // In charge of (parallel) compilation to binary and (serial) kernel launching
@@ -154,13 +157,6 @@ class AsyncEngine {
  public:
   // TODO: state machine
 
-  struct TaskMeta {
-    std::unordered_set<SNode *> input_snodes, output_snodes;
-    std::unordered_set<SNode *> activation_snodes;
-  };
-
-  std::unordered_map<std::uint64_t, TaskMeta> metas;
-
   ExecutionQueue queue;
 
   std::deque<KernelLaunchRecord> task_queue;
@@ -183,11 +179,26 @@
   void synchronize();
 
  private:
+  struct KernelMeta {
+    std::unique_ptr<Block> dummy_root;
+    std::vector<uint64> offloaded_hashes;
Inline review thread:

Contributor:
This is a bit confusing to me: why do offload hashes belong to the kernel meta? Is the same offload's hash different when the kernel is different?
I thought they should belong to the offloaded meta, but the key of offloaded_metas_ is the hash...

Member (author):
> Why do offload hashes belong to the kernel meta?

Each offloaded task in a kernel generates a hash. What I'm trying to avoid is re-computing the hashes for the offloads inside a kernel.

> Is the same offload's hash different when the kernel is different?

I don't think so. From what I can tell, the hash() function computes the hash purely from the input IR, not from any kernel info, so identical offloads generate identical hashes.

However, this identity is somewhat shaky. The current implementation of hash() uses irpass::print() to get a textual AST std::string, then computes the hash of that string. If print() ever mixed kernel info into the dump (e.g., printed the kernel name), identical offloaded tasks would produce different hashes.
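A minimal sketch of the print-then-hash scheme described above, assuming a hypothetical serialization helper in place of the real irpass::print() (whose exact signature may differ):

```cpp
#include <cstdint>
#include <functional>
#include <string>

struct IRNode;  // taichi's IR node type, declared elsewhere in the codebase

// Hypothetical stand-in for irpass::print(): pretty-prints the IR tree
// rooted at |node| into a textual AST dump.
std::string print_ir_to_string(IRNode *node);

// Hash the textual dump. Identical IR yields identical strings and thus
// identical hashes, but only as long as the printer never mixes
// kernel-specific info (such as the kernel name) into the dump.
uint64_t ir_hash(IRNode *node) {
  const std::string text = print_ir_to_string(node);
  return std::hash<std::string>{}(text);
}
```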


+    inline bool initialized() const {
+      return dummy_root != nullptr;
+    }
+  };
+
+  struct TaskMeta {
+    std::unordered_set<SNode *> input_snodes, output_snodes;
+    std::unordered_set<SNode *> activation_snodes;
+  };

   // In async mode, the root of an AST is an OffloadedStmt instead of a Block.
   // This map provides a dummy Block root for these OffloadedStmt, so that
   // get_kernel() could still work correctly.
-  std::unordered_map<const Kernel *, std::unique_ptr<Block>>
-      kernel_to_dummy_roots_;
+  std::unordered_map<const Kernel *, KernelMeta> kernel_metas_;
+
+  std::unordered_map<std::uint64_t, TaskMeta> offloaded_metas_;
 };
 
 TLANG_NAMESPACE_END