Improve performance #1
base: master
Oooh, I see. Unfortunately, the patch in this PR is not generic enough since, in general, we can't assume any structure outer/left to

`xs |> Filter(x -> x > 0) |> Map(type_instability) |> OptimizeInner() |> Map(asint)`

(But it does help me understand the problem. Thanks!)

I'm not sure what's the best strategy, though. I think we need something like

`@please_inline Transducers.next(rf::R_{OptimizeXF}, acc, @nospecialize(input)) = ...`

in the Julia compiler to fully solve the problem; i.e., the compiler inlines this even though

Meanwhile, maybe I should stop trying to support

`(type_instability(x) for x in xs) |> Map(asint)`
# ----------
# JIT'ed
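For readers following along, here is a minimal sketch of the kind of type instability being discussed, using only Base. The definitions of `type_instability` and `asint` below are hypothetical stand-ins, since the thread does not show the real ones, and the example does not use Transducers or Catwalk.

```julia
# Hypothetical stand-ins: the thread does not show the real definitions.
# The value round-trips through an untyped box, so inference only sees `Any`.
type_instability(x) = Ref{Any}(x)[]

# Converts the result back to a concrete Int.
asint(x)::Int = x

# The compiler cannot infer the concrete type produced by `type_instability`,
# so work done on that value may need run-time dispatch.
unstable_sum(xs) = sum(asint(type_instability(x)) for x in xs)

xs = collect(1:1000)
total = unstable_sum(xs)
println(total)
```

Running `@code_warntype type_instability(1)` in the REPL is one way to inspect what the compiler infers for such a function.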
Yeah, I had the fear that simply eliminating that call is not the way to go... About the compiler support: I struggle a lot with inlining, and the possibility to force it would result in measurable performance improvements in the original target of Catwalk.jl, but I never was brave enough to ask for it... Forcing inlining from the call site seems a bit less risky in terms of accidental compilation overhead, and now we have a real use case. Do you think it is time to open an issue?
My guess is that many Julia programmers wished there was a forced/more controllable inlining macro at least once. I couldn't find it in the issue tracker, though, which is kinda strange. Maybe everyone assumed there is already one 😄. So yeah, I think it'd be nice to have an issue for this.
Great news, @tkf: JuliaLang/julia#41328 allows forced inlining! I have tested this case (only on non-folds, non-catwalk sample code for now, I have package installation issues after compiling 1.8-dev).
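As a rough illustration of call-site inlining, the sketch below applies `@inline` to a call expression, which is accepted from Julia 1.8 on. The function names are hypothetical and this is not the code from this PR. The annotation is a strong request that overrides the cost model, but the compiler may still be unable to inline in some cases.

```julia
# Requires Julia 1.8 or later (call-site `@inline`).
# `accumulate_sq` is a hypothetical reducing step, not part of Transducers.
accumulate_sq(acc, x) = acc + x * x

function fold_inlined(xs)
    acc = zero(eltype(xs))
    for x in xs
        # Request inlining of this particular call, regardless of how
        # `accumulate_sq` itself was declared.
        acc = @inline accumulate_sq(acc, x)
    end
    return acc
end

println(fold_inlined([1, 2, 3]))
```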
Thanks! Yeah, that's great news, esp. for packages that depend heavily on higher-order functions, like Transducers.
A possible fix of the missing performance gain.
The problem was that
was called before the Catwalked method of `next`, resulting in a non-jitted dynamic dispatch. I am not sure, though, if what I did is reasonable in the larger context, but I hope you can fix it based on this.
Also, the default batch size was too small, so I have increased it to 1e6, which may be more than ideal; more tests are needed.
When testing with `@btime`, initial overhead should be small, but I see a small amount of compilation in every Catwalked run; that's why the tested runtimes have to be several seconds. I will check that, but I like to test cold runs with `@time` anyway, because Catwalk adds significant compilation overhead, and not measuring it seems unfair.
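A minimal sketch of the cold-versus-warm comparison described above, using only Base. The `work` function is a hypothetical stand-in for a Catwalked fold, and `@elapsed` is used in place of `@time` so the two timings can be stored and compared.

```julia
# Hypothetical workload standing in for a Catwalked fold.
work(xs) = sum(x -> x * x, xs)

xs = rand(10^6)

# First call in a fresh session: the timing includes JIT compilation.
cold = @elapsed work(xs)

# Later call: compilation is already done, so this is mostly pure runtime.
warm = @elapsed work(xs)

println("cold: ", cold, " s, warm: ", warm, " s")
```

Reporting both numbers keeps the compilation overhead visible instead of averaging it away, which is what a repeated-sample benchmark such as `@btime` tends to do.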