
[async] Compute offloaded IR hash once and cache it #1608

Merged: 2 commits merged into taichi-dev:master on Jul 30, 2020
Conversation

@k-ye (Member) commented Jul 29, 2020

According to the profile, hash computation took 25.3% of the time in the main thread:

[Screenshot: profiler output showing hash computation at 25.3% of main-thread time]

This PR caches the hash so that it is computed only once. This is correct because, when we do fuse two kernels, the hash is re-computed on task_a; see

task_queue[i].h = hash(task_a);

Because the major bottleneck is clone, performance didn't improve much: FPS went from 5 to 6 for mpm88. The profile now looks like this:

[Screenshot: profiler output after this change, with clone dominating the main thread]


For cloning, I think we can do a similar thing, but it's a bit more difficult. We can clone once at launch() as a template, but when there is a fusion, we need to re-clone that task so that the fusion doesn't pollute the template.
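The clone-once-as-a-template idea could be sketched roughly as below. This is a hypothetical illustration, not Taichi's actual code: IRNode, TaskLaunchRecord, and the method names are stand-ins for the real IR types.

```cpp
#include <memory>

// Hypothetical sketch of the "clone once as a template" idea: clone the
// offloaded IR a single time at launch(), hand out the shared template on
// the fast path, and re-clone only when fusion needs to mutate the task,
// so the template is never polluted.
struct IRNode {
  int payload = 0;
  std::unique_ptr<IRNode> clone() const {
    return std::make_unique<IRNode>(*this);
  }
};

struct TaskLaunchRecord {
  std::shared_ptr<const IRNode> ir_template;  // cloned once at launch()

  // Fast path: no fusion, reuse the immutable template as-is.
  std::shared_ptr<const IRNode> ir() const { return ir_template; }

  // Fusion path: re-clone so the mutation does not touch the template.
  std::unique_ptr<IRNode> mutable_ir() const { return ir_template->clone(); }
};
```

The point of the const template plus re-clone split is exactly the concern above: fusion mutates the task in place, so it must work on a fresh copy.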

Related issue = #742



codecov bot commented Jul 29, 2020

Codecov Report

Merging #1608 into master will decrease coverage by 0.44%.
The diff coverage is 33.33%.


@@            Coverage Diff             @@
##           master    #1608      +/-   ##
==========================================
- Coverage   67.53%   67.08%   -0.45%     
==========================================
  Files          40       40              
  Lines        5630     5691      +61     
  Branches      982      993      +11     
==========================================
+ Hits         3802     3818      +16     
- Misses       1660     1699      +39     
- Partials      168      174       +6     
Impacted Files Coverage Δ
python/taichi/lang/impl.py 87.82% <0.00%> (-1.59%) ⬇️
python/taichi/main.py 42.35% <0.00%> (ø)
python/taichi/misc/gui.py 23.12% <3.22%> (-2.24%) ⬇️
python/taichi/lang/ops.py 92.51% <33.33%> (-0.82%) ⬇️
python/taichi/lang/shell.py 48.88% <47.45%> (+10.17%) ⬆️
python/taichi/lang/__init__.py 80.20% <100.00%> (+<0.01%) ⬆️
python/taichi/lang/expr.py 89.08% <100.00%> (+0.12%) ⬆️
python/taichi/lang/matrix.py 92.03% <100.00%> (ø)
python/taichi/lang/snode.py 93.69% <100.00%> (ø)
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 98df864...6908484.

@archibate (Collaborator) commented

Btw, what is async_mode? What can I do for OpenGL?

@xumingkuan (Contributor) left a comment

LGTM!

@@ -183,11 +179,26 @@ class AsyncEngine {
void synchronize();

private:
struct KernelMeta {
std::unique_ptr<Block> dummy_root;
std::vector<uint64> offloaded_hashes;
Contributor:

This is a bit confusing to me: Why do offload hashes belong to the kernel meta? Is the same offload's hash different when the kernel is different?
I thought it should belong to the offloaded meta, but the key of offloaded_metas_ is the hash...

k-ye (Member, Author):

Why do offload hashes belong to the kernel meta?

Each offloaded task in a kernel will generate a hash. What I'm trying to avoid is the re-computation of the hashes for the offloads inside a kernel...

Is the same offload's hash different when the kernel is different?

I don't think so. From what I can tell, the hash() function computes the hash purely based on the input IR, not the kernel info. So same offloads will generate same hashes...

However, this identity is somewhat shaky. The current implementation of hash() uses irpass::print() to serialize the IR into a textual AST (a std::string), then computes the hash of that string. If print() ever mixes in kernel info (e.g. prints the kernel name), then identical offloaded tasks would produce different hashes.
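The serialize-then-hash scheme described above could be sketched as follows. This is only illustrative: Offload and print_ir() are stand-ins for the real IR node and irpass::print(), and std::hash stands in for whatever string hash Taichi actually uses.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Illustrative sketch of the hashing scheme described above: serialize the
// IR to text, then hash the string.
struct Offload {
  std::string body;         // textual form of the offloaded task's AST
  std::string kernel_name;  // kernel info, deliberately NOT serialized
};

std::string print_ir(const Offload &o) {
  // If kernel_name were ever mixed in here, identical offloads living in
  // different kernels would stop hashing equal (the fragility noted above).
  return o.body;
}

std::uint64_t ir_hash(const Offload &o) {
  return std::hash<std::string>{}(print_ir(o));
}
```

As long as the serialization covers only the IR itself, two identical offloads hash equal regardless of which kernel contains them.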

h = kmeta.offloaded_hashes[i];
} else {
h = hash(cloned.get());
TI_ASSERT(kmeta.offloaded_hashes.size() == i);
Contributor:

What if we have two identical offloads in one kernel? And does this prevent two different offloads in two different kernels from having the same hash?

Contributor:

BTW, if the hash really collides, I think it should be an error rather than an assertion.

k-ye (Member, Author):

BTW, if the hash really collides, I think it should be an error rather than an assertion.

Note that offloaded_hashes is a std::vector. It maps an offloaded task to its hash by the task's position within that kernel... The assertion checks the position (i.e. that the hash of this offloaded task hasn't been computed yet), not hash collisions.

What if we have two same offloads in a kernel? And does it prevent two different offloads in two different kernels having the same hash?

I actually think it's fine for identical offloads (regardless of whether they are in the same kernel) to have the same hash. If two offloads are indeed the same, then they should have the same TaskMetas (i.e. input/output SNodes, activation SNodes)?
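The position-indexed cache being discussed could be sketched like this. The offloaded_hashes name mirrors the PR, but the surrounding scaffolding (KernelMeta's method, the compute callback) is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the position-indexed cache: offloaded_hashes is a std::vector
// whose i-th entry is the hash of the kernel's i-th offloaded task. The
// assertion guards the fill order (each position is computed exactly once,
// in sequence); it says nothing about hash uniqueness.
struct KernelMeta {
  std::vector<std::uint64_t> offloaded_hashes;

  std::uint64_t get_or_compute(std::size_t i,
                               std::uint64_t (*compute)(std::size_t)) {
    if (i < offloaded_hashes.size()) {
      return offloaded_hashes[i];  // cache hit: reuse the stored hash
    }
    // Cache miss: we must be filling the next position, never skipping one.
    assert(offloaded_hashes.size() == i);
    offloaded_hashes.push_back(compute(i));
    return offloaded_hashes.back();
  }
};
```

Because the key is the position rather than the hash value, two identical offloads at different positions simply store the same hash twice; nothing asserts.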

Contributor:

Oh I see. I thought offloaded_hashes was a std::unordered_map, which would cause an assertion failure when there are two identical offloads...

I actually think it's fine for the same offloads to have the same hashes.

I agree.

k-ye requested a review from xumingkuan, July 30, 2020 10:03
@xumingkuan (Contributor) left a comment

LGTM now.


@k-ye (Member, Author) commented Jul 30, 2020

Btw, what is async_mode? What can I do for OpenGL?

Not for now. There are some necessary cleanups before this is useful.

@k-ye k-ye merged commit 00f1e88 into taichi-dev:master Jul 30, 2020
@k-ye k-ye deleted the hash branch July 30, 2020 12:29
@FantasyVR FantasyVR mentioned this pull request Aug 1, 2020