Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement more detailed self profiling #58085

Merged
merged 3 commits into from
Feb 10, 2019
Merged

Conversation

wesleywiser
Copy link
Member

@wesleywiser wesleywiser commented Feb 2, 2019

Timing data and cache hits/misses are now recorded at the query level.
This allows us to show detailed per query information such as total time
for each query.

To see detailed query information in the summary pass the -Z verbose
flag. For example:

rustc -Z self-profile -Z verbose hello_world.rs

results in something like:

Self profiling results:

| Phase                                     | Time (ms)      | Time (%) | Queries        | Hits (%)
| ----------------------------------------- | -------------- | -------- | -------------- | --------
| Other                                     |            177 |    54.97 |           8094 |    45.47
| - {time spent not running queries}        |            113 |    35.09 |              0 |     0.00
| - const_eval                              |             16 |     4.97 |             26 |    11.54
| - type_of                                 |              9 |     2.80 |            627 |    27.75
| - const_eval_raw                          |              8 |     2.48 |             22 |     0.00
| - adt_def                                 |              7 |     2.17 |            381 |    11.55
| - visible_parent_map                      |              7 |     2.17 |             99 |    98.99
| - item_attrs                              |              6 |     1.86 |            698 |    50.14
| - item_children                           |              5 |     1.55 |           2815 |     0.00
| - adt_dtorck_constraint                   |              4 |     1.24 |              2 |     0.00
| - adt_destructor                          |              2 |     0.62 |             15 |    86.67
| TypeChecking                              |             53 |    16.46 |           2834 |    79.89
| - trait_impls_of                          |              9 |     2.80 |             65 |    86.15
| - evaluate_obligation                     |              7 |     2.17 |             80 |     2.50
| - const_is_rvalue_promotable_to_static    |              6 |     1.86 |              1 |     0.00
| - is_copy_raw                             |              6 |     1.86 |             29 |    58.62
| - rvalue_promotable_map                   |              6 |     1.86 |              2 |    50.00
| - {time spent not running queries}        |              6 |     1.86 |              0 |     0.00
| - typeck_item_bodies                      |              5 |     1.55 |              1 |     0.00
| - typeck_tables_of                        |              5 |     1.55 |             19 |    94.74
| - dropck_outlives                         |              2 |     0.62 |              1 |     0.00
| - layout_raw                              |              1 |     0.31 |            668 |    87.87
| Linking                                   |             48 |    14.91 |             43 |    46.51
| - {time spent not running queries}        |             48 |    14.91 |              0 |     0.00
| Codegen                                   |             29 |     9.01 |            420 |    61.90
| - {time spent not running queries}        |             16 |     4.97 |              0 |     0.00
| - collect_and_partition_mono_items        |             11 |     3.42 |             13 |    92.31
| - mir_const                               |              1 |     0.31 |              1 |     0.00
| - mir_validated                           |              1 |     0.31 |              3 |    66.67
| Expansion                                 |             14 |     4.35 |              0 |     0.00
| - {time spent not running queries}        |             14 |     4.35 |              0 |     0.00
| BorrowChecking                            |              1 |     0.31 |             12 |    41.67
| - borrowck                                |              1 |     0.31 |              2 |    50.00
| Parsing                                   |              0 |     0.00 |              0 |     0.00

Optimization level: No
Incremental: off
Rendered

Self profiling results:

Phase Time (ms) Time (%) Queries Hits (%)
Other 177 54.97 8094 45.47
- {time spent not running queries} 113 35.09 0 0.00
- const_eval 16 4.97 26 11.54
- type_of 9 2.80 627 27.75
- const_eval_raw 8 2.48 22 0.00
- adt_def 7 2.17 381 11.55
- visible_parent_map 7 2.17 99 98.99
- item_attrs 6 1.86 698 50.14
- item_children 5 1.55 2815 0.00
- adt_dtorck_constraint 4 1.24 2 0.00
- adt_destructor 2 0.62 15 86.67
TypeChecking 53 16.46 2834 79.89
- trait_impls_of 9 2.80 65 86.15
- evaluate_obligation 7 2.17 80 2.50
- const_is_rvalue_promotable_to_static 6 1.86 1 0.00
- is_copy_raw 6 1.86 29 58.62
- rvalue_promotable_map 6 1.86 2 50.00
- {time spent not running queries} 6 1.86 0 0.00
- typeck_item_bodies 5 1.55 1 0.00
- typeck_tables_of 5 1.55 19 94.74
- dropck_outlives 2 0.62 1 0.00
- layout_raw 1 0.31 668 87.87
Linking 48 14.91 43 46.51
- {time spent not running queries} 48 14.91 0 0.00
Codegen 29 9.01 420 61.90
- {time spent not running queries} 16 4.97 0 0.00
- collect_and_partition_mono_items 11 3.42 13 92.31
- mir_const 1 0.31 1 0.00
- mir_validated 1 0.31 3 66.67
Expansion 14 4.35 0 0.00
- {time spent not running queries} 14 4.35 0 0.00
BorrowChecking 1 0.31 12 41.67
- borrowck 1 0.31 2 50.00
Parsing 0 0.00 0 0.00

Optimization level: No
Incremental: off

cc @nikomatsakis @michaelwoerister @Zoxc

Fixes #54141

@rust-highfive
Copy link
Collaborator

r? @oli-obk

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 2, 2019
@oli-obk
Copy link
Contributor

oli-obk commented Feb 3, 2019

r? @michaelwoerister

@Mark-Simulacrum
Copy link
Member

Does this change the JSON emitted in a backwards-incompatible way? If so, we might want to start emitting a version of some kind (probably just an integer) which we can increment on backwards incompatible changes to aid perf.

@wesleywiser
Copy link
Member Author

No, currently the additional per-query data isn't being reported in the JSON at all. When it is, I expect we can do so in a backwards-compatible way.

@wesleywiser
Copy link
Member Author

wesleywiser commented Feb 5, 2019

@michaelwoerister I also fixed the reported times to be self-time as we discussed at All Hands.

@michaelwoerister
Copy link
Member

Looks great!
@bors r+

@bors
Copy link
Contributor

bors commented Feb 6, 2019

📌 Commit fd566f087deea557abcf6eb29587c4e890de902b has been approved by michaelwoerister

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 6, 2019
@bors
Copy link
Contributor

bors commented Feb 7, 2019

☔ The latest upstream changes (presumably #58254) made this pull request unmergeable. Please resolve the merge conflicts.

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 7, 2019
Timing data and cache hits/misses are now recorded at the query level.
This allows us to show detailed per query information such as total time
for each query.

To see detailed query information in the summary pass the `-Z verbose`
flag. For example:

```
rustc -Z self-profile -Z verbose hello_world.rs
```
@wesleywiser
Copy link
Member Author

Rebased

@bors r+

@bors
Copy link
Contributor

bors commented Feb 7, 2019

📌 Commit 584081a has been approved by wesleywiser

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 7, 2019
@wesleywiser
Copy link
Member Author

This also fixes #54141

@bors
Copy link
Contributor

bors commented Feb 10, 2019

⌛ Testing commit 584081a with merge 703dbe0abe66c8c349fdd0dd59ac61207eaba23d...

@bors
Copy link
Contributor

bors commented Feb 10, 2019

💔 Test failed - checks-travis

@rust-highfive
Copy link
Collaborator

The job arm-android of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[01:41:08] test string::test_str_clear ... ok
[01:41:08] test string::test_str_truncate ... ok
[01:41:08] test string::test_str_truncate_split_codepoint ... ok
[01:41:08] test string::test_str_truncate_invalid_len ... ok
[01:41:08] died due to signal 11
[01:41:08] 
[01:41:08] 
[01:41:08] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "test" "--target" "arm-linux-androideabi" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "-p" "alloc" "--"
[01:41:08] expected success, got: exit code: 3
---
travis_time:end:0771a596:start=1549788663863589567,finish=1549788663878504326,duration=14914759
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:03ab03c0
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:1b70ce00
travis_time:start:1b70ce00
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:2e1214e8
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 10, 2019
@kennytm
Copy link
Member

kennytm commented Feb 10, 2019

@bors retry #55861

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 10, 2019
@bors
Copy link
Contributor

bors commented Feb 10, 2019

⌛ Testing commit 584081a with merge 68650ca...

bors added a commit that referenced this pull request Feb 10, 2019
Implement more detailed self profiling

Timing data and cache hits/misses are now recorded at the query level.
This allows us to show detailed per query information such as total time
for each query.

To see detailed query information in the summary pass the `-Z verbose`
flag. For example:

```
rustc -Z self-profile -Z verbose hello_world.rs
```

results in something like:

```md
Self profiling results:

| Phase                                     | Time (ms)      | Time (%) | Queries        | Hits (%)
| ----------------------------------------- | -------------- | -------- | -------------- | --------
| Other                                     |            177 |    54.97 |           8094 |    45.47
| - {time spent not running queries}        |            113 |    35.09 |              0 |     0.00
| - const_eval                              |             16 |     4.97 |             26 |    11.54
| - type_of                                 |              9 |     2.80 |            627 |    27.75
| - const_eval_raw                          |              8 |     2.48 |             22 |     0.00
| - adt_def                                 |              7 |     2.17 |            381 |    11.55
| - visible_parent_map                      |              7 |     2.17 |             99 |    98.99
| - item_attrs                              |              6 |     1.86 |            698 |    50.14
| - item_children                           |              5 |     1.55 |           2815 |     0.00
| - adt_dtorck_constraint                   |              4 |     1.24 |              2 |     0.00
| - adt_destructor                          |              2 |     0.62 |             15 |    86.67
| TypeChecking                              |             53 |    16.46 |           2834 |    79.89
| - trait_impls_of                          |              9 |     2.80 |             65 |    86.15
| - evaluate_obligation                     |              7 |     2.17 |             80 |     2.50
| - const_is_rvalue_promotable_to_static    |              6 |     1.86 |              1 |     0.00
| - is_copy_raw                             |              6 |     1.86 |             29 |    58.62
| - rvalue_promotable_map                   |              6 |     1.86 |              2 |    50.00
| - {time spent not running queries}        |              6 |     1.86 |              0 |     0.00
| - typeck_item_bodies                      |              5 |     1.55 |              1 |     0.00
| - typeck_tables_of                        |              5 |     1.55 |             19 |    94.74
| - dropck_outlives                         |              2 |     0.62 |              1 |     0.00
| - layout_raw                              |              1 |     0.31 |            668 |    87.87
| Linking                                   |             48 |    14.91 |             43 |    46.51
| - {time spent not running queries}        |             48 |    14.91 |              0 |     0.00
| Codegen                                   |             29 |     9.01 |            420 |    61.90
| - {time spent not running queries}        |             16 |     4.97 |              0 |     0.00
| - collect_and_partition_mono_items        |             11 |     3.42 |             13 |    92.31
| - mir_const                               |              1 |     0.31 |              1 |     0.00
| - mir_validated                           |              1 |     0.31 |              3 |    66.67
| Expansion                                 |             14 |     4.35 |              0 |     0.00
| - {time spent not running queries}        |             14 |     4.35 |              0 |     0.00
| BorrowChecking                            |              1 |     0.31 |             12 |    41.67
| - borrowck                                |              1 |     0.31 |              2 |    50.00
| Parsing                                   |              0 |     0.00 |              0 |     0.00

Optimization level: No
Incremental: off
```

<details>
<summary>Rendered</summary>

Self profiling results:

| Phase                                     | Time (ms)      | Time (%) | Queries        | Hits (%)
| ----------------------------------------- | -------------- | -------- | -------------- | --------
| **Other**                                     |           **177** |   **54.97** |   **8094** |  **45.47**
| - {time spent not running queries}        |            113 |    35.09 |              0 |     0.00
| - const_eval                              |             16 |     4.97 |             26 |    11.54
| - type_of                                 |              9 |     2.80 |            627 |    27.75
| - const_eval_raw                          |              8 |     2.48 |             22 |     0.00
| - adt_def                                 |              7 |     2.17 |            381 |    11.55
| - visible_parent_map                      |              7 |     2.17 |             99 |    98.99
| - item_attrs                              |              6 |     1.86 |            698 |    50.14
| - item_children                           |              5 |     1.55 |           2815 |     0.00
| - adt_dtorck_constraint                   |              4 |     1.24 |              2 |     0.00
| - adt_destructor                          |              2 |     0.62 |             15 |    86.67
| TypeChecking                              |             53 |    16.46 |           2834 |    79.89
| - trait_impls_of                          |              9 |     2.80 |             65 |    86.15
| - evaluate_obligation                     |              7 |     2.17 |             80 |     2.50
| - const_is_rvalue_promotable_to_static    |              6 |     1.86 |              1 |     0.00
| - is_copy_raw                             |              6 |     1.86 |             29 |    58.62
| - rvalue_promotable_map                   |              6 |     1.86 |              2 |    50.00
| - {time spent not running queries}        |              6 |     1.86 |              0 |     0.00
| - typeck_item_bodies                      |              5 |     1.55 |              1 |     0.00
| - typeck_tables_of                        |              5 |     1.55 |             19 |    94.74
| - dropck_outlives                         |              2 |     0.62 |              1 |     0.00
| - layout_raw                              |              1 |     0.31 |            668 |    87.87
| Linking                                   |             48 |    14.91 |             43 |    46.51
| - {time spent not running queries}        |             48 |    14.91 |              0 |     0.00
| Codegen                                   |             29 |     9.01 |            420 |    61.90
| - {time spent not running queries}        |             16 |     4.97 |              0 |     0.00
| - collect_and_partition_mono_items        |             11 |     3.42 |             13 |    92.31
| - mir_const                               |              1 |     0.31 |              1 |     0.00
| - mir_validated                           |              1 |     0.31 |              3 |    66.67
| Expansion                                 |             14 |     4.35 |              0 |     0.00
| - {time spent not running queries}        |             14 |     4.35 |              0 |     0.00
| BorrowChecking                            |              1 |     0.31 |             12 |    41.67
| - borrowck                                |              1 |     0.31 |              2 |    50.00
| Parsing                                   |              0 |     0.00 |              0 |     0.00

Optimization level: No
Incremental: off

</details>
cc @nikomatsakis @michaelwoerister @Zoxc

Fixes #54141
@bors
Copy link
Contributor

bors commented Feb 10, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: wesleywiser
Pushing 68650ca to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 10, 2019
@bors bors merged commit 584081a into rust-lang:master Feb 10, 2019
@nnethercote
Copy link
Contributor

I think this broke parts of rustc-perf: rust-lang/rustc-perf#342

@nnethercote
Copy link
Contributor

This new code appears to have a drastic performance overhead. E.g. running webrender within rustc-perf the instruction counts went up from 13.3B to 19.3B. Here is the diff from Cachegrind:

5,997,045,804 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                      file:function
--------------------------------------------------------------------------------
1,268,884,464 (21.16%)  /build/glibc-OTsEL5/glibc-2.27/string/../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:__memcmp_avx2_movbe
  553,314,246 ( 9.23%)  /home/njn/moz/rustN/src/librustc/util/profiling.rs:rustc::util::profiling::SelfProfiler::get_results
  486,866,419 ( 8.12%)  /home/njn/moz/rustN/src/libstd/collections/hash/map.rs:rustc::util::profiling::SelfProfiler::record
  431,061,188 ( 7.19%)  /home/njn/moz/rustN/src/libcore/slice/mod.rs:<alloc::collections::btree::map::BTreeMap<K, V>>::entry
  390,622,026 ( 6.51%)  /home/njn/moz/rustN/src/libcore/cmp.rs:<alloc::collections::btree::map::BTreeMap<K, V>>::entry
  309,573,832 ( 5.16%)  /home/njn/moz/rustN/src/liballoc/collections/btree/map.rs:<alloc::collections::btree::map::BTreeMap<K, V>>::entry
  275,390,566 ( 4.59%)  /home/njn/moz/rustN/src/liballoc/collections/btree/search.rs:<alloc::collections::btree::map::BTreeMap<K, V>>::entry
  239,623,696 ( 4.00%)  /home/njn/moz/rustN/src/liballoc/collections/btree/search.rs:rustc::util::profiling::SelfProfiler::get_results
  181,305,990 ( 3.02%)  /home/njn/moz/rustN/src/liballoc/collections/btree/map.rs:<alloc::collections::btree::map::Entry<'a, K, V>>::or_default
  164,109,066 ( 2.74%)  /home/njn/moz/rustN/src/libcore/slice/mod.rs:rustc::util::profiling::SelfProfiler::get_results
  155,460,592 ( 2.59%)  /home/njn/moz/rustN/src/liballoc/collections/btree/map.rs:<alloc::collections::btree::map::Entry<'a, K, V>>::or_insert
  147,378,858 ( 2.46%)  /build/glibc-OTsEL5/glibc-2.27/elf/../elf/dl-tls.c:_dl_update_slotinfo
  140,811,068 ( 2.35%)  /home/njn/moz/rustN/src/libcore/hash/sip.rs:<std::collections::hash::map::DefaultHasher as core::hash::Hasher>::write
  133,098,728 ( 2.22%)  /home/njn/moz/rustN/src/librustc/util/profiling.rs:rustc::util::profiling::SelfProfiler::record
  125,295,164 ( 2.09%)  /home/njn/moz/rustN/src/liballoc/collections/btree/node.rs:<alloc::collections::btree::map::BTreeMap<K, V>>::entry
   90,432,970 ( 1.51%)  /home/njn/moz/rustN/src/liballoc/collections/btree/map.rs:rustc::util::profiling::SelfProfiler::get_results
   70,141,016 ( 1.17%)  /home/njn/moz/rustN/src/libstd/collections/hash/table.rs:rustc::util::profiling::SelfProfiler::record
   68,204,221 ( 1.14%)  /home/njn/moz/rustN/src/libstd/thread/local.rs:<std::thread::local::LocalKey<T>>::try_with
   67,454,387 ( 1.12%)  /home/njn/moz/rustN/src/libcore/cell.rs:<std::thread::local::LocalKey<T>>::try_with

Lots of BTreeMap stuff, and lots of copying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants