Improved profiler result printing. #7709

Merged 7 commits on Jul 26, 2023


mcourteaux
Contributor

This 50-cent improvement makes my profiling reports more readable, as I have long function names. Additionally, I cheaply aligned the timings by printing leading spaces depending on how many orders of magnitude the number spans. Example output below:

neonraw_denoiser_f8_k7_p1-x86-64-linux-avx2-fma-profile_by_timer
 total time: 4193.547363 ms  samples: 3593  runs: 11  time/run: 381.231567 ms
 average threads used: 15.640412
 heap allocations: 284680  peak heap usage: 2949120 bytes
  halide_malloc:                 0.575ms (0%)    threads: 16.000
  halide_free:                   0.095ms (0%)    threads: 15.000
  denoised:                     17.235ms (4%)    threads: 15.000
  luminance:                     5.551ms (1%)    threads: 15.446 stack: 13456
  blur_pass_h:                   0.385ms (0%)    threads: 16.000 stack: 12528
  blur_pass_h_sum:               0.767ms (0%)    threads: 15.500 stack: 32
  filter_input:                  0.296ms (0%)    threads: 16.000 stack: 11664
  blur_pass_v:                   0.000ms (0%)    threads: 0.000  stack: 32
  blur_pass_v_sum:               0.191ms (0%)    threads: 16.000 stack: 32
  mask_activation:               1.057ms (0%)    threads: 15.727 peak: 1105920  num: 56936     avg: 73728
  mask_activation_sum:          56.829ms (14%)   threads: 15.670 stack: 32
  exp_logits:                   17.358ms (4%)    threads: 15.521 peak: 1179648  num: 56936     avg: 73728
  logit:                         1.251ms (0%)    threads: 15.923 stack: 1760
  max_logit:                     7.621ms (1%)    threads: 15.688 stack: 32
  blur_conv_pass_0:              5.247ms (1%)    threads: 15.722 peak: 1769472  num: 170808    avg: 110592
  blur_conv_pass_0_sum:        242.229ms (63%)   threads: 15.688 stack: 32
  sum_exp_logits:                1.483ms (0%)    threads: 15.857 stack: 192
  sum_exp_logits$1:              6.088ms (1%)    threads: 15.237 stack: 32
  softmax:                       6.316ms (1%)    threads: 15.677 stack: 2304
  denoise_accum_pass_0:          4.716ms (1%)    threads: 15.833 stack: 4
  denoise_accum_pass_0_intm:     5.929ms (1%)    threads: 15.545 stack: 48
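
The alignment described above can be sketched in a few lines of C++. This is an illustration only, not the code from this pull request: the function name `format_row`, the 30-character name column, and the assumption of at most five integer digits are all hypothetical choices made for the example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not Halide's runtime code): format one profiler row so
// that the timings are right-aligned on their decimal point.
std::string format_row(const std::string &name, double ms, std::size_t name_col = 30) {
    std::string row = "  " + name + ":";
    // Pad the name column to a fixed width so long function names still align.
    while (row.size() < name_col) {
        row += ' ';
    }
    // Add one leading space per missing integer digit.
    // Assumption: timings have at most five integer digits.
    for (double limit = 10000.0; limit >= 10.0; limit /= 10.0) {
        if (ms < limit) {
            row += ' ';
        }
    }
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%.3fms", ms);
    row += buf;
    return row;
}
```

Calling `format_row("halide_malloc", 0.575)` and `format_row("blur_conv_pass_0_sum", 242.229)` yields rows of equal length, because the shorter number receives two more leading spaces than the longer one.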

@abadams abadams requested a review from halidebuildbots July 25, 2023 17:27
while (sstr.size() < cursor) {
    sstr << " ";
}

float ft = fs->time / (p->runs * 1000000.0f);
if (ft < 10000) sstr << " ";
Contributor

All "if" statements in Halide should be in {braces}, even trivial single-line ones like these
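
For illustration, the braced form requested here might look like the following. The wrapper function `pad_for_magnitude` and the use of `std::ostringstream` are assumptions made so the snippet compiles on its own; the flagged line in the diff writes to `sstr` directly.

```cpp
#include <sstream>
#include <string>

// Hypothetical wrapper around the flagged statement; only the braces matter.
std::string pad_for_magnitude(float ft) {
    std::ostringstream out;
    // Unbraced form that was flagged: if (ft < 10000) out << " ";
    if (ft < 10000) {
        out << " ";
    }
    return out.str();
}
```

The behavior is identical to the single-line version; the braces only change the layout.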

Contributor Author

Oh, hmm. Then why doesn't the clang-format CI-step complain?

Contributor

clang-format didn't, but clang-tidy did

Contributor Author

Ah, what, clang-tidy checks this separately. Clang-format has an option for this too, I believe.

@mcourteaux
Contributor Author

Dang this clang-tidy fix makes the code really ugly... ☹️

@abadams
Member

abadams commented Jul 25, 2023

Yeah, clang-tidy and clang-format make things better on average. There are definitely cases where they make things worse, but it's worth it in expectation.

@mcourteaux
Contributor Author

I can surround it with // clang-format off/on in this case. Maybe clang-tidy will still complain then... Let me know what you prefer here.
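
As a sketch of what that would involve: `// clang-format off` and `// clang-format on` only affect clang-format, so clang-tidy would indeed still report the unbraced statements. clang-tidy has its own suppression comments (`// NOLINT`, `// NOLINTNEXTLINE`, and `// NOLINTBEGIN` / `// NOLINTEND` in recent releases). The check name `readability-braces-around-statements` below is an assumption about which check is enabled in this project, and the function `magnitude_padding` is hypothetical.

```cpp
#include <string>

// Hypothetical helper showing both suppression styles side by side.
std::string magnitude_padding(float ft) {
    std::string pad;
    // clang-format off
    // NOLINTBEGIN(readability-braces-around-statements)
    if (ft < 10000) pad += ' ';
    if (ft < 1000)  pad += ' ';
    if (ft < 100)   pad += ' ';
    if (ft < 10)    pad += ' ';
    // NOLINTEND(readability-braces-around-statements)
    // clang-format on
    return pad;
}
```

This keeps the compact column layout at the cost of two pairs of suppression comments.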

@steven-johnson
Contributor

> I can surround it with // clang-format off/on in this case. Maybe clang-tidy will still complain then... Let me know what you prefer here.

We should fix our clang-format settings to enforce this as well (if possible). In the meantime, please just manually format in a way that makes clang-tidy happy; we prefer to reserve clang-format off for pathological cases.

@mcourteaux
Contributor Author

Okay, then this is it. 😄

@steven-johnson steven-johnson self-requested a review July 26, 2023 22:03
Contributor

@steven-johnson steven-johnson left a comment

Thanks for the fix!

@steven-johnson steven-johnson merged commit bfc26cc into halide:main Jul 26, 2023
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
* Fixed the regularization for BGU.

* Improved profiler result printing.

* Clang-format ain't liking pretty code.

* Clang-tidy ain't liking pretty code.

---------

Co-authored-by: Steven Johnson <srj@google.com>