Improve foldl with tail-call function-barrier #34293

Open: wants to merge 1 commit into master

base/reduce.jl: 25 changes (18 additions, 7 deletions)

    @@ -47,18 +47,29 @@ function foldl_impl(op::OP, nt, itr) where {OP}
         return v
     end

    -function _foldl_impl(op::OP, init, itr) where {OP}
    -    # Unroll the while loop once; if init is known, the call to op may
    -    # be evaluated at compile time
    +function _foldl_impl(op::OP, init::T, itr) where {OP, T}
         y = iterate(itr)
         y === nothing && return init
         v = op(init, y[1])
    +    if v isa T
    +        return _foldl_impl(op, v, itr, y[2])
    +    else
    +        return _foldl_impl(op, v, itr, y[2])

Member

Isn't this exactly the same as what's done in the other branch of the conditional? 🤔

Member Author

Yes, but it helps the compiler. Removing this branch introduces an allocation and makes the code a bit slower:

    julia> @eval Base function _foldl_impl(op::OP, init::T, itr) where {OP, T}
               y = iterate(itr)
               y === nothing && return init
               v = op(init, y[1])
               return _foldl_impl(op, v, itr, y[2])
           end
    _foldl_impl (generic function with 4 methods)

    julia> @btime sum(x for x in $xs if x !== missing)
      780.859 ns (1 allocation: 16 bytes)

But the speedup is not as drastic as it seemed while I was playing with it, so I'm OK with removing this micro-optimization.

Member Author

Actually, since this "no-op if branch" gets rid of a constant amount of work per call, the difference becomes large for shorter arrays:

    julia> xs = [abs(x) < 1 ? x : missing for x in randn(10)];

    julia> @btime sum(x for x in $xs if x !== missing)
      8.405 ns (0 allocations: 0 bytes)
    1.5696312630017393

    julia> @eval Base function _foldl_impl(op::OP, init::T, itr) where {OP, T}
               y = iterate(itr)
               y === nothing && return init
               v = op(init, y[1])
               return _foldl_impl(op, v, itr, y[2])
           end
    _foldl_impl (generic function with 4 methods)

    julia> @btime sum(x for x in $xs if x !== missing)
      29.120 ns (1 allocation: 16 bytes)
    1.5696312630017393
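
The timings above come from `@btime` in BenchmarkTools.jl. To check only whether the extra allocation appears, Base's `@allocated` is enough. A minimal sketch, assuming a small fixed input instead of `randn(10)` and using hypothetical helper names (`sum_nonmissing`, `allocs`); the byte count it reports depends on the Julia version and on which `_foldl_impl` definition is active, so no particular value is claimed here:

```julia
# Hypothetical helper that mirrors the expression benchmarked above.
sum_nonmissing(xs) = sum(x for x in xs if x !== missing)

# Measure inside a function so that accessing a non-constant global
# is not counted as part of the allocation.
allocs(xs) = @allocated sum_nonmissing(xs)

xs = Union{Missing,Float64}[0.5, missing, -0.25, missing]
sum_nonmissing(xs)  # 0.25; this first call also compiles the method
allocs(xs)          # bytes allocated by one call (version-dependent)
```

Calling `allocs(xs)` twice after the `@eval Base ...` redefinition shown above (the first call recompiles) should indicate whether that redefinition changes the allocation count on a given Julia version.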

Member

That is incredibly bizarre. 😳

Member Author

I thought I was just doing union splitting manually.
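
Here "union splitting manually" means branching on `isa` so that, inside each branch, the compiler knows the concrete type of a value that would otherwise be inferred as a `Union`, and can resolve the call in that branch statically. A minimal sketch with hypothetical functions `bump` and `split_call` (not part of Base). Julia also performs this kind of splitting automatically for small unions, so the explicit branch mainly makes the narrowing visible in the source:

```julia
# Hypothetical functions for illustration; not part of Base.
bump(x::Int) = x + 1
bump(x::Float64) = 2x

function split_call(flag::Bool)
    v = flag ? 1 : 1.5      # inferred as Union{Int, Float64}
    # Both branches have identical source text, but inference narrows
    # `v` to Int in the first branch and to Float64 in the second, so
    # each `bump(v)` call can be dispatched without a runtime lookup.
    if v isa Int
        return bump(v)
    else
        return bump(v)
    end
end

split_call(true)   # 2
split_call(false)  # 3.0
```

`@code_warntype split_call(true)` (from InteractiveUtils) is one way to see that `v` is inferred as a `Union` before the branch.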

    +    end
    +end
    +
    +_dec(::Val{n}) where {n} = Val(n - 1)
    +
    +@inline function _foldl_impl(op::OP, acc::T, itr, state, counter = Val(3)) where {OP,T}
         while true
    -        y = iterate(itr, y[2])
    -        y === nothing && break
    -        v = op(v, y[1])
    +        y = iterate(itr, state)
    +        y === nothing && return acc
    +        x, state = y
    +        result = op(acc, x)
    +        counter === Val(0) || result isa T ||
    +            return _foldl_impl(op, result, itr, state, _dec(counter))
    +        acc = result
         end
    -    return v
     end

     struct _InitialValue end
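
The pattern in this diff can be tried outside Base. Below is a minimal standalone sketch that mirrors the new `_foldl_impl` methods under hypothetical names (`my_foldl`, `_my_foldl_loop`, `_my_dec`); it is an illustration, not the Base implementation. The loop is compiled for a fixed accumulator type `T`; when `op` returns a value of a different type, the loop tail-calls itself (the function barrier), so the rest of the iteration runs in a specialization for the new type. The `Val` counter caps this at three re-entries, after which the loop continues with a type-unstable accumulator:

```julia
# Hypothetical standalone sketch of the "tail-call function barrier";
# these names do not exist in Base.

# Decrement a compile-time counter stored in the type parameter of `Val`.
_my_dec(::Val{n}) where {n} = Val(n - 1)

function my_foldl(op::OP, init, itr) where {OP}
    y = iterate(itr)
    y === nothing && return init
    v = op(init, y[1])    # the accumulator type may change on this first step
    return _my_foldl_loop(op, v, itr, y[2])
end

# The loop body is compiled separately for each accumulator type `T`.
@inline function _my_foldl_loop(op::OP, acc::T, itr, state,
                                counter = Val(3)) where {OP,T}
    while true
        y = iterate(itr, state)
        y === nothing && return acc
        x, state = y
        result = op(acc, x)
        # If the result no longer has type T and the budget is not used up,
        # restart in a fresh specialization instead of widening `acc` here.
        counter === Val(0) || result isa T ||
            return _my_foldl_loop(op, result, itr, state, _my_dec(counter))
        acc = result
    end
end

my_foldl(+, 0, 1:10)            # 55 (accumulator stays Int)
my_foldl(+, 0, Any[1, 2.5, 3])  # 6.5 (accumulator switches from Int to Float64)
```

With `op = (a, b) -> b` and an input whose element type changes at every step, the accumulator changes type more than three times; after the third re-entry the counter is `Val(0)` and the loop simply keeps going with an unstable `acc`, so the result is still correct.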