[async] Compute offloaded IR hash once and cache it #1608
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##           master    #1608      +/-   ##
==========================================
- Coverage   67.53%   67.08%   -0.45%
==========================================
  Files          40       40
  Lines        5630     5691     +61
  Branches      982      993     +11
==========================================
+ Hits         3802     3818     +16
- Misses       1660     1699     +39
- Partials      168      174      +6
```
Continue to review full report at Codecov.
Btw, what ...
LGTM!
```cpp
// @@ -183,11 +179,26 @@ class AsyncEngine {
  void synchronize();

 private:
  struct KernelMeta {
    std::unique_ptr<Block> dummy_root;
    std::vector<uint64> offloaded_hashes;
```
This is a bit confusing to me: why do offload hashes belong to the kernel meta? Is the same offload's hash different when the kernel is different?

I thought it should belong to the offloaded meta, but the key of `offloaded_metas_` is the hash...
> Why do offload hashes belong to the kernel meta?

Each offloaded task in a kernel will generate a hash. What I'm trying to avoid is the re-computation of the hashes for the offloads inside a kernel...

> Is the same offload's hash different when the kernel is different?

I don't think so. From what I can tell, the `hash()` function computes the hash purely based on the input IR, not the kernel info. So the same offloads will generate the same hashes...

However, this identity relationship is kind of shaky. The current impl of `hash()` uses `irpass::print()` to get a textual AST `std::string`, then computes the hash of that string. If `print()` ever mixes in the kernel info (e.g. prints the kernel name), then the same offloaded tasks would result in different hashes.
```cpp
  h = kmeta.offloaded_hashes[i];
} else {
  h = hash(cloned.get());
  TI_ASSERT(kmeta.offloaded_hashes.size() == i);
```
What if we have two same offloads in a kernel? And does it prevent two different offloads in two different kernels having the same hash?
BTW if the hash really crashes, I think it should be an error rather than an assertion.
> BTW if the hash really crashes, I think it should be an error rather than an assertion.

Note that `offloaded_hashes` is a `std::vector`. It maps to an offloaded task by its position in that kernel... The assertion is checking for position (i.e. the hash of this offloaded task isn't computed yet), not hash collision.

> What if we have two same offloads in a kernel? And does it prevent two different offloads in two different kernels having the same hash?

I actually think it's fine for the same offloads (regardless of whether they are in the same kernel or not) to have the same hashes. If two offloads are indeed the same, then they should have the same `TaskMeta`s (i.e. input/output snodes, activation snodes)?
Oh I see. I thought `offloaded_hashes` was an `std::unordered_map`, which would cause an assertion failure when there are two same offloads...

> I actually think it's fine for the same offloads to have the same hashes.

I agree.
LGTM now.
Not for now. There are some necessary cleanups before this is useful.
According to the profile, hash computation took 25.3% of the time in the main thread.

This PR caches the hash so that it is only computed once. This is correct because when we do fuse two kernels, the hash is re-computed on `task_a`; see taichi/program/async_engine.cpp, line 316 in 2bfbca8.

Because the major bottleneck is the clone, the performance didn't improve much: FPS went from 5 to 6 for `mpm88`. Now the profile looks like this:

For cloning, I think we can do a similar thing, but it's a bit more difficult. We can clone it once at `launch()` as a template, but then when there is a fusion, we need to re-clone that task so that it doesn't pollute the template.

Related issue: #742