Line Search for Newton-Raphson #194

Closed · wants to merge 26 commits
1 change: 1 addition & 0 deletions Project.toml
@@ -9,6 +9,7 @@ DiffEqBase = "2b5f629d-d688-5b77-993f-72d75c75574e"
EnumX = "4e289a0a-7415-4d19-859d-a7e5c4648b56"
FiniteDiff = "6a86dc24-6348-571c-b903-95158fe2bd41"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
LineSearches = "d3d80556-e9d4-5f37-9878-2ab0fcc64255"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
LinearSolve = "7ed4a6bd-45f5-4d41-b270-4a48e9bafcae"
PrecompileTools = "aea7be01-6a6a-4083-8856-8a6e6704d82a"
4 changes: 3 additions & 1 deletion src/NonlinearSolve.jl
@@ -15,6 +15,7 @@ import ArrayInterface
import LinearSolve
using DiffEqBase
using SparseDiffTools
using LineSearches

@reexport using SciMLBase
using SciMLBase: NLStats
@@ -65,7 +66,8 @@ PrecompileTools.@compile_workload begin
end

export RadiusUpdateSchemes

export LineSearches
export NewtonRaphson, TrustRegion, LevenbergMarquardt


end # module
16 changes: 16 additions & 0 deletions src/jacobian.jl
@@ -183,3 +183,19 @@ function jacobian_autodiff(f, x::AbstractArray, nonlinfun, alg)
jac_prototype = jac_prototype, chunksize = chunk_size),
num_of_chunks)
end

function simple_jacobian(cache, x::Number)
Contributor: Shouldn't the nonlinear solve be responsible for providing the Jacobian? What if the problem doesn't support autodiff?

Member Author: So should I just use FiniteDiff when autodiff is passed as false? Would that be a correct approach?
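For context, a minimal sketch of the FiniteDiff fallback being discussed, assuming an out-of-place f(x, p) (the helper name is hypothetical):

```julia
using FiniteDiff

# Forward finite differences as a Jacobian fallback when autodiff is disabled.
fallback_jacobian(f, x, p) = FiniteDiff.finite_difference_jacobian(z -> f(z, p), x)
```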

Member: This is already handled by other dispatches. Why does this exist?

Member Author: Yes; however, in the current dispatch the Jacobian is calculated at cache.u, i.e. the dispatch takes in the cache as an argument and not x. But when the line search is performed, we need the Jacobian at multiple x's (for multiple step lengths $\alpha_{k}$), so we can't just do jacobian(f, cache) as in the current dispatch.

For example, like here: https://github.com/JuliaNLSolvers/LineSearches.jl/blob/ded667a80f47886c77d67e8890f6adb127679ab4/src/strongwolfe.jl#L84

I can't think of any other way to approach this; any inputs would be helpful.

Member Author (@yash2798, Sep 17, 2023): Although I have an idea, roughly like this:

function simple_jacobian(cache, x)
    cache.tmp_var = cache.u
    cache.u = x
    J = jacobian(cache, f) ## our current dispatch
    cache.u = cache.tmp_var
    return J ## the sketch needs to hand back the Jacobian
end

Member Author: This way we don't have to deal with sparsity and AD/FiniteDiff separately.

Member: Yes, it should be something like that, reusing the existing machinery. But the vjp shouldn't build the Jacobian at all: you don't actually need to calculate the Jacobian for most of this.

Member Author: Yes, you are absolutely right. So we need $J^{T}v = (v^{T}J)^{T}$, and we can calculate the vjp $v^{T}J$. Are there any routines in SparseDiffTools.jl, or do we need to do this kind of thing 'in situ'?

Member Author: If in situ, can you point me to something I can learn from regarding reverse-mode AD? I am not very well versed with it.

Member: Yes, that's precisely what reverse-mode AD calculates without calculating the Jacobian. This is why I want @avik-pal to take a look and see how that would mesh with #206.

@unpack f, p = cache
g = Base.Fix2(f, p)
ForwardDiff.derivative(g, x)
end

function simple_jacobian(cache, x::AbstractArray{<:Number})
@unpack f, fu, p, prob = cache
if !SciMLBase.isinplace(prob)
g = Base.Fix2(f, p)
return ForwardDiff.jacobian(g, x)
else
return ForwardDiff.jacobian((fu, x) -> f(fu, x, p), fu, x)
end
end
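As the thread above points out, the gradient of the merit function $\frac{1}{2}\|f(x)\|^{2}$ is $J(x)^{T}f(x)$, which reverse-mode AD evaluates as a single vector-Jacobian product without ever materializing $J$. A rough sketch of that idea, assuming an out-of-place f(x, p) and using Zygote (an assumption for illustration; it is not a dependency this PR adds):

```julia
using Zygote

# One reverse-mode sweep gives ∇ₓ ½‖f(x, p)‖² = J(x)ᵀ f(x, p);
# the Jacobian itself is never formed.
function merit_gradient(f, x, p)
    g, = Zygote.gradient(z -> sum(abs2, f(z, p)) / 2, x)
    return g
end
```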
93 changes: 78 additions & 15 deletions src/raphson.jl
@@ -34,6 +34,7 @@ for large-scale and numerically-difficult nonlinear systems.
- `diff_type`: the type of finite differencing used if `autodiff = false`. Defaults to
`Val{:forward}` for forward finite differences. For more details on the choices, see the
[FiniteDiff.jl](https://github.com/JuliaDiff/FiniteDiff.jl) documentation.
- `linesearch`: the line search algorithm used. Defaults to no line search being used.
- `linsolve`: the [LinearSolve.jl](https://github.com/SciML/LinearSolve.jl) used for the
linear solves within the Newton method. Defaults to `nothing`, which means it uses the
LinearSolve.jl default algorithm choice. For more information on available algorithm
@@ -48,29 +49,33 @@ for large-scale and numerically-difficult nonlinear systems.
Currently, the linear solver and chunk size choice only applies to in-place defined
`NonlinearProblem`s. That is expected to change in the future.
"""
struct NewtonRaphson{CS, AD, FDT, L, P, ST, CJ} <:
struct NewtonRaphson{CS, AD, FDT, L, P, ST, CJ, LS} <:
AbstractNewtonAlgorithm{CS, AD, FDT, ST, CJ}
linsolve::L
precs::P
linesearch::LS
end

function NewtonRaphson(; chunk_size = Val{0}(), autodiff = Val{true}(),
standardtag = Val{true}(), concrete_jac = nothing,
diff_type = Val{:forward}, linsolve = nothing, precs = DEFAULT_PRECS)
diff_type = Val{:forward}, linesearch = LineSearches.Static(), linsolve = nothing, precs = DEFAULT_PRECS)
NewtonRaphson{_unwrap_val(chunk_size), _unwrap_val(autodiff), diff_type,
typeof(linsolve), typeof(precs), _unwrap_val(standardtag),
_unwrap_val(concrete_jac)}(linsolve,
precs)
_unwrap_val(concrete_jac), typeof(linesearch)}(linsolve,
precs, linesearch)
end
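A minimal usage sketch of the new keyword (the problem here is illustrative, not from this PR; the default LineSearches.Static() performs no line search):

```julia
using NonlinearSolve, LineSearches

# Solve u.^2 .- p = 0 with a backtracking line search on each Newton step.
prob = NonlinearProblem((u, p) -> u .^ 2 .- p, [1.0, 1.0], 2.0)
sol = solve(prob, NewtonRaphson(linesearch = LineSearches.BackTracking()))
```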

mutable struct NewtonRaphsonCache{iip, fType, algType, uType, duType, resType, pType,
mutable struct NewtonRaphsonCache{iip, fType, algType, uType, duType, resType, pType, foType, goType,
INType, tolType,
probType, ufType, L, jType, JC}
f::fType
alg::algType
u::uType
fu::resType
p::pType
α::Real
f_o::foType # stores value of objective
g_o::goType # stores grad of objective at x
uf::ufType
linsolve::L
J::jType
@@ -85,27 +90,27 @@ mutable struct NewtonRaphsonCache{iip, fType, algType, uType, duType, resType, p
stats::NLStats

function NewtonRaphsonCache{iip}(f::fType, alg::algType, u::uType, fu::resType,
p::pType, uf::ufType, linsolve::L, J::jType,
p::pType, α::Real, f_o::foType, g_o::goType, uf::ufType, linsolve::L, J::jType,
du1::duType,
jac_config::JC, force_stop::Bool, maxiters::Int,
internalnorm::INType,
retcode::SciMLBase.ReturnCode.T, abstol::tolType,
prob::probType,
stats::NLStats) where {
iip, fType, algType, uType,
duType, resType, pType, INType,
duType, resType, pType, foType, goType, INType,
tolType,
probType, ufType, L, jType, JC}
new{iip, fType, algType, uType, duType, resType, pType, INType, tolType,
probType, ufType, L, jType, JC}(f, alg, u, fu, p,
new{iip, fType, algType, uType, duType, resType, pType, foType, goType, INType, tolType,
probType, ufType, L, jType, JC}(f, alg, u, fu, p, α, f_o, g_o,
uf, linsolve, J, du1, jac_config,
force_stop, maxiters, internalnorm,
retcode, abstol, prob, stats)
end
end

function jacobian_caches(alg::NewtonRaphson, f, u, p, ::Val{false})
JacobianWrapper(f, p), nothing, nothing, nothing, nothing
JacobianWrapper(f, p), nothing, nothing, zero(u), nothing
end

function SciMLBase.__init(prob::NonlinearProblem{uType, iip}, alg::NewtonRaphson,
@@ -122,6 +127,9 @@ function SciMLBase.__init(prob::NonlinearProblem{uType, iip}, alg::NewtonRaphson
end
f = prob.f
p = prob.p
α = 1.0 #line search coefficient
f_o = 0.0
g_o = zero(u)
if iip
fu = zero(u)
f(fu, u, p)
@@ -130,7 +138,7 @@
end
uf, linsolve, J, du1, jac_config = jacobian_caches(alg, f, u, p, Val(iip))

return NewtonRaphsonCache{iip}(f, alg, u, fu, p, uf, linsolve, J, du1, jac_config,
return NewtonRaphsonCache{iip}(f, alg, u, fu, p, α, f_o, g_o, uf, linsolve, J, du1, jac_config,
false, maxiters, internalnorm,
ReturnCode.Default, abstol, prob, NLStats(1, 0, 0, 0, 0))
end
@@ -139,13 +147,13 @@ function perform_step!(cache::NewtonRaphsonCache{true})
@unpack u, fu, f, p, alg = cache
@unpack J, linsolve, du1 = cache
jacobian!(J, cache)

# u = u - J \ fu
linres = dolinsolve(alg.precs, linsolve, A = J, b = _vec(fu), linu = _vec(du1),
p = p, reltol = cache.abstol)
cache.linsolve = linres.cache
@. u = u - du1
f(fu, u, p)
perform_linesearch!(cache)
@. cache.u = cache.u - cache.α * cache.du1
f(cache.fu, cache.u, p)

if cache.internalnorm(cache.fu) < cache.abstol
cache.force_stop = true
@@ -160,7 +168,9 @@ end
function perform_step!(cache::NewtonRaphsonCache{false})
@unpack u, fu, f, p = cache
J = jacobian(cache, f)
cache.u = u - J \ fu
cache.du1 = J \ fu
perform_linesearch!(cache)
cache.u = cache.u - cache.α * cache.du1
cache.fu = f(cache.u, p)
if iszero(cache.fu) || cache.internalnorm(cache.fu) < cache.abstol
cache.force_stop = true
@@ -172,6 +182,44 @@
return nothing
end
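In equation form, both perform_step! methods now implement the damped Newton iteration

$$u_{k+1} = u_{k} - \alpha_{k} \, J(u_{k})^{-1} f(u_{k}),$$

where the step length $\alpha_{k}$ is chosen by the line search; $\alpha_{k} = 1$ recovers the plain Newton-Raphson update.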

function objective_linesearch(cache::NewtonRaphsonCache) ## returns the objective functions required in LS
Member Author: This function objective_linesearch returns three functions and is called in a loop (so it may get called multiple times). Is this an efficient approach?

@unpack f, p = cache

function fo(x)
return dot(value_f(cache, x), value_f(cache, x)) / 2
Member: Why is the same function called twice?
end

function g!(g_o, x)
mul!(g_o, simple_jacobian(cache, x)', value_f(cache, x))
Member: This is not a good way to compute this.

Member: This should be a vjp with reverse-mode AD?

g_o
end

function fg!(g_o, x)
g!(g_o, x)
fo(x)
end
return fo, g!, fg!
end

function perform_linesearch!(cache::NewtonRaphsonCache)
@unpack u, fu, du1, alg, α = cache
fo, g!, fg! = objective_linesearch(cache)
cache.f_o = fo(u)
g!(cache.g_o, u)
@unpack f_o, g_o = cache
ϕ(α) = fo(u .- α .* du1)

function dϕ(α)
g!(g_o, u .- α .* du1)
return dot(g_o, -du1)
end

function ϕdϕ(α)
return (fg!(g_o, u .- α .* du1), dot(g_o, -du1))
end
Member Author (on lines +210 to +219): Similarly, these three functions are defined each time perform_linesearch! is called. Can this lead to performance issues?

cache.α, cache.f_o = alg.linesearch(ϕ, dϕ, ϕdϕ, 1.0, f_o, dot(du1, -g_o))
end
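For reference, the six-argument LineSearches.jl calling convention used above can be exercised standalone; a minimal sketch with a toy one-dimensional merit function (BackTracking is just an example algorithm):

```julia
using LineSearches

# Toy merit function along a fixed descent direction: ϕ(α) = ½(α - 2)².
ϕ(α) = (α - 2)^2 / 2
dϕ(α) = α - 2
ϕdϕ(α) = (ϕ(α), dϕ(α))

ls = BackTracking()
# Initial step 1.0, plus objective value and directional derivative at α = 0,
# mirroring the alg.linesearch(ϕ, dϕ, ϕdϕ, 1.0, f_o, dot(du1, -g_o)) call above.
α, ϕα = ls(ϕ, dϕ, ϕdϕ, 1.0, ϕ(0.0), dϕ(0.0))
```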

function SciMLBase.solve!(cache::NewtonRaphsonCache)
while !cache.force_stop && cache.stats.nsteps < cache.maxiters
perform_step!(cache)
@@ -207,3 +255,18 @@ function SciMLBase.reinit!(cache::NewtonRaphsonCache{iip}, u0 = cache.u; p = cac
cache.retcode = ReturnCode.Default
return cache
end

## some helper functions ##

function value_f(cache::NewtonRaphsonCache, x)
iip = SciMLBase.isinplace(cache.prob)
@unpack f, p = cache
if iip
res = zero(x)
f(res, x, p)
else
res = f(x, p)
end
res
end

3 changes: 3 additions & 0 deletions src/utils.jl
@@ -157,3 +157,6 @@ function rfunc(r::R, c2::R, M::R, γ1::R, γ2::R, β::R) where {R <: Real} # R-f
return (1 - γ1 - β) * (exp(r - c2) + β / (1 - γ1 - β))
end
end


