
Add thread_ids::Vector option to Profile.init() #39746

Closed
wants to merge 1 commit from nhd-profile-threadid_mask

Conversation

@NHDaly (Member) commented Feb 19, 2021

  • This option will configure julia's profiler to only run on the
    provided thread ids! :)

  • Adds a global int mask to allow toggling profiling for up to 64
    threads in a performant way. I think if you have more than 64 threads,
    it's okay that you can't profile individual threads since it's
    unlikely to be very meaningful by then...

Fixes #39743

@NHDaly NHDaly marked this pull request as draft February 19, 2021 04:45
@NHDaly NHDaly requested a review from vchuravy February 19, 2021 04:45
Review comment on the diff (context):

```julia
n_cur = ccall(:jl_profile_maxlen_data, Csize_t, ())
delay_cur = ccall(:jl_profile_delay_nsec, UInt64, ())/10^9
if n === nothing && delay === nothing
    _init_threadid_filter(thread_ids)
```

@NHDaly (Member Author) commented:

Hmm, now that I think about it, I guess thread_ids should be "sticky," too, like the other params? I was trying to be super safe here to make sure it's always reset back to normal to preserve backwards compatibility, but I think that if someone is using this, it would make more sense for it to work like the others.

I can change this in the morning. :)

Should this also get added to the return value? Would we want to return the mask or the array? If we want to return the array, I can either add a getter for the mask and reconstruct the vector from it, or we can store the vector as a global in Julia, in parallel with the mask. I think a getter for the mask makes more sense.

```julia
end
threadid_mask |= 0x1 << (tid-1)
end
ccall(:jl_profile_init_threadid_filter, Cvoid, (UInt64,), threadid_mask)
```

@NHDaly (Member Author) commented:

Oops, we should move this outside the if block, so that we can reset the filter when passing an empty vector.

@vchuravy (Member) commented:

Hmm, I would rather record on which thread the backtrace was collected. In pprof I think you can add that as a field. I suspect that would be more useful than limiting profiling to a specific set of threads (how do you know that they are of more interest than the others?).

@NHDaly (Member Author) commented Feb 22, 2021

Hrmmm yeah, you're maybe right.

... Can you filter out by threadid in pprof if we add it as a field? I guess in the worst case we could do it in our julia API even if you can't do it through the GUI.

Hehe, blech, but Valentin, it will be so much harder to make the changes to pipe this through everywhere! This is the easier/lazier option 😅

Still, I think you're probably right that this would be a nice thing to record, even if we didn't want to filter by it... it'll just be more work to get that set up, I think.

how do you know that they are of more interest than the others?

Yeah, I was basically only imagining using this on thread 1, because e.g. if you have some tasks that you only schedule via @async, you can be sure that they'll only run there (though of course you don't know what else might schedule there).

In this case, we were screwing around with trying to fix our server's responsiveness problems with this silly package: https://github.com/RelationalAI-oss/DevilSpawn.jl. But it didn't actually have much impact on responsiveness, and I was curious to see why: in theory we shouldn't be scheduling (as much) stuff on the main thread, so I was interested to see what was still scheduling there and blocking the thread.

@IanButterworth (Member) commented Jul 29, 2021

Thanks @NHDaly, this issue bugs me too (julia-vscode/julia-vscode#1881)

I think there's an argument that it would be good to have both the ability to select which threads to profile, and add the thread info into the backtraces.
Selecting which threads to profile in advance would save on CPU overhead and use up less of the profile buffer.

Given selecting which threads to profile is relatively simple and wouldn't need a change of any of the profile data reading downstream code, could this PR be considered before both are implemented?

@NHDaly I'd be happy to test this. Before I do, though, there seem to be some pending changes, and a rebase is needed given how much has changed since this was opened.

@NHDaly (Member Author) commented Jul 29, 2021

+1 Thanks @IanButterworth, I agree! I think those are good points. Especially if you're running on a system with close to a hundred threads, you'd need a super big buffer, and if you know ahead of time which threads you want to profile, it can help a lot to apply this filter up front 👍


Let's try to pick this up again 👍

@vchuravy (Member) commented:

and if you know ahead of time which threads you want to profile, it can help a lot to apply this filter up front +1

With migration, how do you know? I maintain my objection xD

@NHDaly (Member Author) commented Jul 29, 2021

With migration, how do you know? I maintain my objection xD

A few possible reasons:

  1. Because some people are (naughtily) running their tasks with @async
  2. You might be interested in just sampling a smaller number of threads to get a smaller profile? 🤷
  3. Assuming that there's some resolution to Severe thread starvation issues #41586 that involves a separate thread pool for "high-priority / low-latency tasks", we could want to profile only that thread pool. (This is the main motivation for me, personally, fwiw. We've been experimenting with this and it seems to be helping our liveness quite a bit.)

@vchuravy (Member) commented:

  1. Those folks don't know either, since they don't know from which thread their @async code got called.
  2. Sure, so can we make sampling more efficient?
  3. Then we should add it on a per thread-pool level once/if we have that.

(Nathan, I just really want to nerd-snipe you into doing the hard work xD)

@NHDaly (Member Author) commented Jan 2, 2022

Now that we've added thread_ids to the profile results (#41742), I think we can close this. Filtering by thread could now be done client-side in the profile viewers. :)

Thanks for pushing, @vchuravy 😉

@NHDaly NHDaly closed this Jan 2, 2022
@NHDaly NHDaly deleted the nhd-profile-threadid_mask branch January 2, 2022 16:39
Linked issue: Profiling specific threads?
3 participants