Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[async] [IR] More accurate same_value analysis #2118

Merged
merged 11 commits into from
Dec 27, 2020

Conversation

xumingkuan
Copy link
Contributor

@xumingkuan xumingkuan commented Dec 24, 2020

Related issue = #742 #656

bool same_value(
    Stmt *stmt1,
    Stmt *stmt2,
    const AsyncStateSet &possibly_modified_states,
    IRBank *ir_bank,
    const std::optional<std::unordered_map<int, int>> &id_map = std::nullopt);

allows passing in a parameter possibly_modified_states, so that we can analyze if two global loads share the same value (same if the value state of the SNode is not modified).

G2P2G running time on my laptop (fusible now, 1.44x faster!):
before:

num particles=10086
  frame time 0.210 s
  substep time 4.047 ms
num particles=10251
  frame time 0.341 s
  substep time 6.559 ms
num particles=10416
  frame time 0.248 s
  substep time 4.776 ms
num particles=10581
  frame time 0.327 s
  substep time 6.291 ms
num particles=10746
  frame time 0.294 s
  substep time 5.658 ms
num particles=10911
  frame time 0.281 s
  substep time 5.409 ms
num particles=11076
  frame time 0.235 s
  substep time 4.526 ms
num particles=11241
  frame time 0.264 s
  substep time 5.083 ms
num particles=11406
  frame time 0.255 s
  substep time 4.929 ms
num particles=11571
  frame time 0.313 s
  substep time 6.022 ms
......
num particles=20991
  frame time 0.320 s
  substep time 6.157 ms
num particles=21024
  frame time 0.318 s
  substep time 6.118 ms
num particles=21057
  frame time 0.326 s
  substep time 6.272 ms
num particles=21090
  frame time 0.326 s
  substep time 6.272 ms
num particles=21123
  frame time 0.334 s
  substep time 6.425 ms

after:

num particles=10086
  frame time 0.187 s
  substep time 3.606 ms
num particles=10251
  frame time 0.187 s
  substep time 3.606 ms
num particles=10416
  frame time 0.189 s
  substep time 3.644 ms
num particles=10581
  frame time 0.190 s
  substep time 3.663 ms
num particles=10746
  frame time 0.190 s
  substep time 3.663 ms
num particles=10911
  frame time 0.193 s
  substep time 3.721 ms
num particles=11076
  frame time 0.192 s
  substep time 3.721 ms
num particles=11241
  frame time 0.194 s
  substep time 3.740 ms
num particles=11406
  frame time 0.197 s
  substep time 3.798 ms
num particles=11571
  frame time 0.198 s
  substep time 3.817 ms
......
num particles=20991
  frame time 0.317 s
  substep time 6.099 ms
num particles=21024
  frame time 0.316 s
  substep time 6.080 ms
num particles=21057
  frame time 0.314 s
  substep time 6.042 ms
num particles=21090
  frame time 0.314 s
  substep time 6.042 ms
num particles=21123
  frame time 0.315 s
  substep time 6.061 ms

one grid:

num particles=10086
  frame time 0.299 s
  substep time 5.754 ms
num particles=10251
  frame time 0.305 s
  substep time 5.869 ms
num particles=10416
  frame time 0.296 s
  substep time 5.696 ms
num particles=10581
  frame time 0.313 s
  substep time 6.022 ms
num particles=10746
  frame time 0.297 s
  substep time 5.715 ms
num particles=10911
  frame time 0.305 s
  substep time 5.888 ms
num particles=11076
  frame time 0.308 s
  substep time 5.926 ms
num particles=11241
  frame time 0.320 s
  substep time 6.157 ms
num particles=11406
  frame time 0.319 s
  substep time 6.137 ms
num particles=11571
  frame time 0.311 s
  substep time 5.984 ms
......
num particles=20991
  frame time 0.460 s
  substep time 8.899 ms
num particles=21024
  frame time 0.424 s
  substep time 8.151 ms
num particles=21057
  frame time 0.434 s
  substep time 8.362 ms
num particles=21090
  frame time 0.418 s
  substep time 8.036 ms
num particles=21123
  frame time 0.433 s
  substep time 8.324 ms

sync mode:

num particles=10086
  frame time 0.168 s
  substep time 3.241 ms
num particles=10251
  frame time 0.171 s
  substep time 3.280 ms
num particles=10416
  frame time 0.175 s
  substep time 3.356 ms
num particles=10581
  frame time 0.176 s
  substep time 3.376 ms
num particles=10746
  frame time 0.175 s
  substep time 3.356 ms
num particles=10911
  frame time 0.179 s
  substep time 3.433 ms
num particles=11076
  frame time 0.180 s
  substep time 3.452 ms
num particles=11241
  frame time 0.179 s
  substep time 3.433 ms
num particles=11406
  frame time 0.182 s
  substep time 3.491 ms
num particles=11571
  frame time 0.183 s
  substep time 3.510 ms
......
num particles=20991
  frame time 0.305 s
  substep time 5.888 ms
num particles=21024
  frame time 0.306 s
  substep time 5.888 ms
num particles=21057
  frame time 0.305 s
  substep time 5.869 ms
num particles=21090
  frame time 0.305 s
  substep time 5.869 ms
num particles=21123
  frame time 0.305 s
  substep time 5.869 ms

[Click here for the format server]


@xumingkuan xumingkuan changed the title [async] [IR] More accurate same_value analysis [async] [IR] More accurate same_value analysis Dec 24, 2020
@xumingkuan xumingkuan marked this pull request as ready for review December 26, 2020 07:53
Copy link
Member

@k-ye k-ye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting more comments, thanks :)

taichi/analysis/same_statements.cpp Outdated Show resolved Hide resolved
taichi/analysis/same_statements.cpp Outdated Show resolved Hide resolved
taichi/analysis/same_statements.cpp Show resolved Hide resolved
taichi/analysis/same_statements.cpp Outdated Show resolved Hide resolved
taichi/analysis/same_statements.cpp Show resolved Hide resolved
taichi/ir/analysis.h Outdated Show resolved Hide resolved
Copy link
Member

@k-ye k-ye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, LGTM! Would you move some of your replies to the code comments directly?

taichi/analysis/same_statements.cpp Outdated Show resolved Hide resolved
taichi/analysis/same_statements.cpp Outdated Show resolved Hide resolved
taichi/analysis/same_statements.cpp Show resolved Hide resolved
taichi/analysis/same_statements.cpp Show resolved Hide resolved
if (auto global_ptr = global_load->ptr->cast<GlobalPtrStmt>()) {
TI_ASSERT(global_ptr->width() == 1);
if (possibly_modified_states_.count(ir_bank_->get_async_state(
global_ptr->snodes[0], AsyncState::Type::value)) == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, previous we were too conservative, and returned "not the same" even for snodes that are only read in the tasks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

@xumingkuan xumingkuan merged commit 64c913c into taichi-dev:master Dec 27, 2020
@k-ye k-ye mentioned this pull request Jan 5, 2021
@xumingkuan xumingkuan deleted the same-value branch March 13, 2021 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants