Flux & Zygote's AD slower than ForwardDiff #994

Open
fangzhou-xie opened this issue Jul 28, 2019 · 4 comments

@fangzhou-xie

fangzhou-xie commented Jul 28, 2019

I came across Zygote's recent advances in AD, benchmarked it, and found the following:

julia> @time ForwardDiff.gradient(rosenbrock,x);
  0.074665 seconds (9 allocations: 1.070 MiB)

julia> @time Tracker.gradient(rosenbrock,x);
  1.599425 seconds (937.69 k allocations: 2.269 GiB, 24.53% gc time)

julia> @time Zygote.gradient(rosenbrock,x);
  2.726697 seconds (660.25 k allocations: 4.490 GiB, 23.72% gc time)

where the function rosenbrock is taken from here [Edit: now here] and x = rand(10000).
All three functions were run several times first to account for Julia's JIT compilation.

What could be the reason for this?

@itsdfish

I have been seeing poor performance as well, although I cannot reproduce results that extreme. You should use BenchmarkTools and interpolate the arguments with $ to get more accurate benchmark results.

using Zygote, ForwardDiff, BenchmarkTools, Tracker, Random

Random.seed!(515)  # fixed seed so that x is reproducible

function rosenbrock(x)
    a = one(eltype(x))
    b = 100 * a
    result = zero(eltype(x))
    # accumulate one term per consecutive pair (x[i], x[i+1])
    for i in 1:length(x)-1
        result += (a - x[i])^2 + b*(x[i+1] - x[i]^2)^2
    end
    return result
end

x = rand(1000)

# interpolate with $ so that global-variable access is not part of the timing
@btime ForwardDiff.gradient($rosenbrock,$x)
@btime Tracker.gradient($rosenbrock,$x)
@btime Zygote.gradient($rosenbrock,$x)

Results:

 4.272 ms (4 allocations: 110.72 KiB)
 17.131 ms (117906 allocations: 26.56 MiB)
 19.397 ms (72154 allocations: 48.29 MiB)

System information:

Ubuntu 18.04
Julia 1.1.1

(v1.1) pkg> st Zygote
    Status `~/.julia/environments/v1.1/Project.toml`
  [f6369f11] ForwardDiff v0.10.3
  [e88e6eb3] Zygote v0.3.2

(v1.1) pkg> st Tracker
    Status `~/.julia/environments/v1.1/Project.toml`
  [f6369f11] ForwardDiff v0.10.3
  [9f7883ad] Tracker v0.2.2

@Roger-luo
Contributor

I think this is mainly because tracing the for loop is heavy for reverse mode: each getindex has to be stored as an operation on the tape. In Tracker, this results in a bunch of getindex entries of length 1000; in Zygote, IIUC, they are stored in a Zygote.Stack, which gives it a speed similar to Tracker's.
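
One way to sidestep the per-element getindex cost described here is to write the function without an explicit loop, using slices and broadcasting, so that reverse mode should only need to record a handful of array operations rather than one per element. A minimal sketch, assuming the loop-based rosenbrock posted earlier in this thread; the name rosenbrock_vec is hypothetical, and no timings are claimed for it:

```julia
# Loop-free Rosenbrock (rosenbrock_vec is a hypothetical name).
# It computes the same sum as the loop version, but via slices and broadcasting.
function rosenbrock_vec(x)
    a = one(eltype(x))
    b = 100 * a
    head = x[1:end-1]   # x[i]   for i in 1:length(x)-1
    tail = x[2:end]     # x[i+1] for the same i
    return sum((a .- head).^2 .+ b .* (tail .- head.^2).^2)
end

x = rand(1000)
rosenbrock_vec(x)
```

It can be passed to ForwardDiff.gradient or Zygote.gradient in place of rosenbrock; whether it is actually faster would need to be checked with @btime on the package versions in question.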

@ToucheSir
Member

This looks to be strictly a Zygote issue rather than a Flux one, and could probably be moved there (or closed, if we think it's an inherent design limitation).

mcabbott transferred this issue from FluxML/Flux.jl on Jun 14, 2021
@mcabbott
Member

mcabbott commented Jun 14, 2021

Indeed. This ought to be sped up by #962 and #981. Some timings:

julia> @btime ForwardDiff.gradient($rosenbrock,$x);
  534.750 μs (5 allocations: 111.86 KiB)

julia> @btime Tracker.gradient($rosenbrock,$x);
  3.340 ms (89931 allocations: 25.94 MiB)

julia> @btime Zygote.gradient($rosenbrock,$x);
  7.223 ms (73153 allocations: 48.21 MiB) # v0.6.11
  4.195 ms (71154 allocations: 9.92 MiB)  # v0.6.12, with 962
  1.857 ms (63171 allocations: 2.04 MiB)  # with 981

(Julia 1.6, M1 Mac + Rosetta.) Bigger version:

julia> x = rand(10^5);

julia> @btime ForwardDiff.gradient($rosenbrock,$x);
  6.056 s (6 allocations: 10.68 MiB)

julia> @btime Tracker.gradient($rosenbrock,$x);
ERROR: StackOverflowError:

julia> @btime Zygote.gradient($rosenbrock,$x);
  28.132 s (7900279 allocations: 447.25 GiB) # v0.6.11
  18.026 s (7200286 allocations: 74.73 GiB)  # v0.6.12, with 962
  672.375 ms (6300305 allocations: 211.20 MiB)  # with 981
