
[async] Compute offloaded IR hash once and cache it #1608

Merged: 2 commits merged into taichi-dev:master on Jul 30, 2020
Conversation

@k-ye (Member) commented Jul 29, 2020

According to the profile, hash computation took 25.3% of the time in the main thread:

[Screenshot: profiler output showing hash computation at 25.3% of main-thread time]

This PR caches the hash so that it is computed only once. This is correct because, when we do fuse two kernels, the hash is re-computed on task_a; see

task_queue[i].h = hash(task_a);

Because the major bottleneck is clone, performance didn't improve much: FPS went from 5 to 6 for mpm88. The profile now looks like this:

[Screenshot: profiler output after this change, with clone dominating the main thread]


For cloning, I think we can do a similar thing, but it's a bit more difficult. We can clone once at launch() as a template, but when there is a fusion, we need to re-clone that task so that the fusion doesn't pollute the template.
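The clone-once-as-a-template idea could be sketched roughly as below. This is a hypothetical illustration, not Taichi's actual code: IRNode, TaskLaunchRecord, and the method names are stand-ins for the real IR types.

```cpp
#include <memory>

// Hypothetical sketch of the "clone once as a template" idea: clone the
// offloaded IR a single time at launch(), hand out the shared template on
// the fast path, and re-clone only when fusion needs to mutate the task,
// so the template is never polluted.
struct IRNode {
  int payload = 0;
  std::unique_ptr<IRNode> clone() const {
    return std::make_unique<IRNode>(*this);
  }
};

struct TaskLaunchRecord {
  std::shared_ptr<const IRNode> ir_template;  // cloned once at launch()

  // Fast path: no fusion, reuse the immutable template as-is.
  std::shared_ptr<const IRNode> ir() const { return ir_template; }

  // Fusion path: re-clone so the mutation does not touch the template.
  std::unique_ptr<IRNode> mutable_ir() const { return ir_template->clone(); }
};
```

The point of the const template plus re-clone split is exactly the concern above: fusion mutates the task in place, so it must work on a fresh copy.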

Related issue = #742



codecov bot commented Jul 29, 2020

Codecov Report

Merging #1608 into master will decrease coverage by 0.44%.
The diff coverage is 33.33%.


@@            Coverage Diff             @@
##           master    #1608      +/-   ##
==========================================
- Coverage   67.53%   67.08%   -0.45%     
==========================================
  Files          40       40              
  Lines        5630     5691      +61     
  Branches      982      993      +11     
==========================================
+ Hits         3802     3818      +16     
- Misses       1660     1699      +39     
- Partials      168      174       +6     
Impacted Files Coverage Δ
python/taichi/lang/impl.py 87.82% <0.00%> (-1.59%) ⬇️
python/taichi/main.py 42.35% <0.00%> (ø)
python/taichi/misc/gui.py 23.12% <3.22%> (-2.24%) ⬇️
python/taichi/lang/ops.py 92.51% <33.33%> (-0.82%) ⬇️
python/taichi/lang/shell.py 48.88% <47.45%> (+10.17%) ⬆️
python/taichi/lang/__init__.py 80.20% <100.00%> (+<0.01%) ⬆️
python/taichi/lang/expr.py 89.08% <100.00%> (+0.12%) ⬆️
python/taichi/lang/matrix.py 92.03% <100.00%> (ø)
python/taichi/lang/snode.py 93.69% <100.00%> (ø)
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 98df864...6908484.

@archibate (Collaborator) commented

Btw, what is async_mode? What can I do for OpenGL?

@xumingkuan (Contributor) left a comment

LGTM!

@@ -183,11 +179,26 @@ class AsyncEngine {
void synchronize();

private:
struct KernelMeta {
std::unique_ptr<Block> dummy_root;
std::vector<uint64> offloaded_hashes;
Contributor:

This is a bit confusing to me: Why do offload hashes belong to the kernel meta? Is the same offload's hash different when the kernel is different?
I thought it should belong to the offloaded meta, but the key of offloaded_metas_ is the hash...

k-ye (Member, Author):

Why do offload hashes belong to the kernel meta?

Each offloaded task in a kernel will generate a hash. What I'm trying to avoid is the re-computation of the hashes for the offloads inside a kernel...

Is the same offload's hash different when the kernel is different?

I don't think so. From what I can tell, the hash() function computes the hash purely based on the input IR, not the kernel info. So same offloads will generate same hashes...

However, this identity is somewhat shaky. The current implementation of hash() uses irpass::print() to serialize the IR into a textual AST (a std::string), then computes the hash of that string. If print() ever mixes in kernel info (e.g. prints the kernel name), then identical offloaded tasks would produce different hashes.
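The serialize-then-hash scheme described above could be sketched as follows. This is only illustrative: Offload and print_ir() are stand-ins for the real IR node and irpass::print(), and std::hash stands in for whatever string hash Taichi actually uses.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Illustrative sketch of the hashing scheme described above: serialize the
// IR to text, then hash the string.
struct Offload {
  std::string body;         // textual form of the offloaded task's AST
  std::string kernel_name;  // kernel info, deliberately NOT serialized
};

std::string print_ir(const Offload &o) {
  // If kernel_name were ever mixed in here, identical offloads living in
  // different kernels would stop hashing equal (the fragility noted above).
  return o.body;
}

std::uint64_t ir_hash(const Offload &o) {
  return std::hash<std::string>{}(print_ir(o));
}
```

As long as the serialization covers only the IR itself, two identical offloads hash equal regardless of which kernel contains them.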

h = kmeta.offloaded_hashes[i];
} else {
h = hash(cloned.get());
TI_ASSERT(kmeta.offloaded_hashes.size() == i);
Contributor:

What if we have two identical offloads in one kernel? And does this prevent two different offloads in two different kernels from having the same hash?

Contributor:

BTW, if the hash really collides, I think it should be an error rather than an assertion.

k-ye (Member, Author):

BTW, if the hash really collides, I think it should be an error rather than an assertion.

Note that offloaded_hashes is a std::vector. It maps an offloaded task to its hash by the task's position within that kernel... The assertion checks the position (i.e. that the hash of this offloaded task hasn't been computed yet), not hash collisions.

What if we have two same offloads in a kernel? And does it prevent two different offloads in two different kernels having the same hash?

I actually think it's fine for identical offloads (regardless of whether they are in the same kernel) to have the same hash. If two offloads are indeed the same, then they should have the same TaskMetas (i.e. input/output SNodes, activation SNodes)?
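The position-indexed cache being discussed could be sketched like this. The offloaded_hashes name mirrors the PR, but the surrounding scaffolding (KernelMeta's method, the compute callback) is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the position-indexed cache: offloaded_hashes is a std::vector
// whose i-th entry is the hash of the kernel's i-th offloaded task. The
// assertion guards the fill order (each position is computed exactly once,
// in sequence); it says nothing about hash uniqueness.
struct KernelMeta {
  std::vector<std::uint64_t> offloaded_hashes;

  std::uint64_t get_or_compute(std::size_t i,
                               std::uint64_t (*compute)(std::size_t)) {
    if (i < offloaded_hashes.size()) {
      return offloaded_hashes[i];  // cache hit: reuse the stored hash
    }
    // Cache miss: we must be filling the next position, never skipping one.
    assert(offloaded_hashes.size() == i);
    offloaded_hashes.push_back(compute(i));
    return offloaded_hashes.back();
  }
};
```

Because the key is the position rather than the hash value, two identical offloads at different positions simply store the same hash twice; nothing asserts.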

Contributor:

Oh I see. I thought offloaded_hashes was a std::unordered_map, which would cause an assertion failure when there are two identical offloads...

I actually think it's fine for the same offloads to have the same hashes.

I agree.

k-ye requested a review from xumingkuan, July 30, 2020 10:03
@xumingkuan (Contributor) left a comment

LGTM now.


@k-ye (Member, Author) commented Jul 30, 2020

Btw, what is async_mode? What can I do for OpenGL?

Not for now. There are some necessary cleanups before this is useful.

@k-ye k-ye merged commit 00f1e88 into taichi-dev:master Jul 30, 2020
@k-ye k-ye deleted the hash branch July 30, 2020 12:29
@FantasyVR FantasyVR mentioned this pull request Aug 1, 2020