What are good resources to understand how ODEs can be solved?
Solving Ordinary Differential Equations I: Nonstiff Problems by Hairer et al.
ODE solver selection in MATLAB
What are the ODE solvers available in this repo?
- Adaptive-step:
  - `dopri8`: Runge-Kutta 7(8) of Dormand-Prince-Shampine
  - `dopri5`: Runge-Kutta 4(5) of Dormand-Prince [default]
  - `bosh3`: Runge-Kutta 2(3) of Bogacki-Shampine
  - `adaptive_heun`: Runge-Kutta 1(2)
- Fixed-step:
  - `euler`: Euler method
  - `midpoint`: Midpoint method
  - `rk4`: Fourth-order Runge-Kutta with 3/8 rule
  - `explicit_adams`: Explicit Adams
  - `implicit_adams`: Implicit Adams
- `scipy_solver`: Wraps a SciPy solver.
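As a rough illustration of the fixed-step methods above, here is a minimal plain-Python sketch (not the repo's implementation) comparing the Euler and midpoint methods on dy/dt = -y:

```python
import math

def euler_step(f, t, y, h):
    # Euler: one function evaluation per step, first-order accurate.
    return y + h * f(t, y)

def midpoint_step(f, t, y, h):
    # Midpoint: two evaluations per step, second-order accurate.
    k1 = f(t, y)
    return y + h * f(t + h / 2, y + h / 2 * k1)

def integrate(step, f, y0, t0, t1, n):
    # Take n equal-size steps from t0 to t1.
    h = (t1 - t0) / n
    y, t = y0, t0
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y

f = lambda t, y: -y  # exact solution: y(t) = exp(-t)
exact = math.exp(-1.0)
err_euler = abs(integrate(euler_step, f, 1.0, 0.0, 1.0, 100) - exact)
err_midpoint = abs(integrate(midpoint_step, f, 1.0, 0.0, 1.0, 100) - exact)
print(err_euler, err_midpoint)  # midpoint is far more accurate here
```

For the same number of steps, the higher-order method pays more function evaluations per step in exchange for a much smaller error.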
What are `NFE-F` and `NFE-B`?
Number of function evaluations for the forward and backward passes.
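Counters like these can be implemented by wrapping the ODE function and counting calls. A plain-Python sketch (the names here are illustrative, not the repo's API):

```python
class CountedFunc:
    """Wraps an ODE right-hand side and counts how often it is called."""
    def __init__(self, f):
        self.f = f
        self.nfe = 0  # number of function evaluations so far
    def __call__(self, t, y):
        self.nfe += 1
        return self.f(t, y)

f = CountedFunc(lambda t, y: -y)

# Classic fixed-step RK4: 4 evaluations per step.
y, t, h = 1.0, 0.0, 0.1
for _ in range(10):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += h
print(f.nfe)  # 40: 10 steps x 4 evaluations
```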
What are `rtol` and `atol`?
They refer to the relative (`rtol`) and absolute (`atol`) error tolerances.
What is the role of error tolerance in adaptive solvers?
The basic idea is that each adaptive solver produces an error estimate for the current step. If the error is greater than some tolerance, the step is redone with a smaller step size, and this repeats until the error is smaller than the provided tolerance.
Error Tolerances for Variable-Step Solvers
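The accept/reject loop described above can be sketched in a few lines of plain Python, using step doubling for the error estimate (real solvers such as `dopri5` use embedded Runge-Kutta pairs instead; this is only a toy):

```python
def adaptive_euler(f, y0, t0, t1, rtol=1e-6, atol=1e-9):
    # Toy adaptive integrator: compare one Euler step of size h against
    # two steps of size h/2; if the discrepancy exceeds the tolerance,
    # retry the step with a smaller h, otherwise accept and move on.
    t, y, h = t0, y0, (t1 - t0) / 10
    while t < t1:
        h = min(h, t1 - t)
        full = y + h * f(t, y)
        half = y + h / 2 * f(t, y)
        half = half + h / 2 * f(t + h / 2, half)
        err = abs(full - half)
        tol = atol + rtol * abs(half)
        if err > tol:
            h /= 2          # error too large: redo with a smaller step
            continue
        y, t = half, t + h  # error acceptable: accept the step
        if err < tol / 4:
            h *= 2          # error comfortably small: try a larger step
    return y

result = adaptive_euler(lambda t, y: -y, 1.0, 0.0, 1.0)
print(result)  # close to exp(-1) ~ 0.3679
```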
How is the error tolerance calculated?
The error tolerance is calculated as `atol + rtol * norm(current state)`, where the norm used is a mixed L-infinity/RMS norm.
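A sketch of how a tolerance of this shape can be computed, reading the mixed norm as the maximum over state tensors of each tensor's RMS norm (an assumption about the exact norm; check the repo's code for details):

```python
import math

def rms_norm(x):
    # Root-mean-square norm of one flat state tensor.
    return math.sqrt(sum(v * v for v in x) / len(x))

def mixed_norm(states):
    # Max over the state tensors of each tensor's RMS norm
    # (one reading of "mixed L-infinity/RMS norm").
    return max(rms_norm(x) for x in states)

def error_tolerance(states, rtol, atol):
    # atol + rtol * norm(current state), as described above.
    return atol + rtol * mixed_norm(states)

states = [[3.0, 4.0], [1.0, 1.0, 1.0, 1.0]]
# RMS norms: sqrt(12.5) ~ 3.536 and 1.0, so the mixed norm is ~ 3.536.
tol = error_tolerance(states, rtol=1e-3, atol=1e-6)
print(tol)  # ~ 0.0035365
```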
Where is the code that computes the error tolerance?
It is computed here.
How many states must a Neural ODE solver store during a forward pass with the adjoint method?
The number of states that must be stored in memory during a forward pass is solver dependent. For example, `dopri5` requires 6 intermediate states to be stored.
How many function evaluations are there per ODE step on adaptive solvers?
- `dopri5`

The `dopri5` ODE solver stores at least 6 evaluations of the ODE, then takes a step using a linear combination of them. The diagram below illustrates this: the evaluations marked with `o` are on the estimated path, the ones marked with `x` are not. The first two are for selecting the initial step size.

```
0 1 | 2 3 4 5 6 7 | 8 9 10 11 12 13
o x | x x x x x o | x x x  x  x  o
```
How do I obtain evaluations on the estimated path when using an adaptive solver?
The argument `t` of `odeint` specifies the times at which the ODE solver should produce output:
`odeint(func, x0, t=torch.linspace(0, 1, 50))`
Note that the ODE solver will always integrate from `min(t)` to `max(t)` (0 and 1 here), and the intermediate values of `t` have no effect on how the ODE is solved. Intermediate values are computed using polynomial interpolation and have very small cost.
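To see why intermediate output times are cheap, here is a plain-Python sketch of dense output via polynomial interpolation between two accepted steps (cubic Hermite here; `dopri5` uses its own higher-order interpolant):

```python
import math

def hermite(t0, y0, f0, t1, y1, f1, t):
    # Cubic Hermite interpolation using the states (y0, y1) and
    # derivatives (f0, f1) already available at the step endpoints --
    # no extra function evaluations are needed for intermediate times.
    h = t1 - t0
    s = (t - t0) / h
    h00 = (1 + 2 * s) * (1 - s) ** 2
    h10 = s * (1 - s) ** 2
    h01 = s * s * (3 - 2 * s)
    h11 = s * s * (s - 1)
    return h00 * y0 + h10 * h * f0 + h01 * y1 + h11 * h * f1

# Interpolate y = exp(t) between t=0 and t=1 (here y' = y).
y_mid = hermite(0.0, 1.0, 1.0, 1.0, math.e, math.e, 0.5)
err = abs(y_mid - math.exp(0.5))
print(err)  # small interpolation error even over this wide step
```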
What non-linearities should I use in my Neural ODE?
Avoid non-smooth non-linearities such as ReLU and LeakyReLU.
Prefer non-linearities with a theoretically unique adjoint/gradient such as Softplus.
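A quick numerical illustration of the difference (plain Python): ReLU's derivative jumps at 0, while Softplus's derivative, the sigmoid, is continuous everywhere:

```python
import math

def softplus(x):
    # Smooth approximation of ReLU; its derivative is the sigmoid,
    # which is continuous everywhere.
    return math.log1p(math.exp(x))

def relu(x):
    return max(x, 0.0)

eps = 1e-6
# One-sided finite-difference slopes around 0:
relu_left = (relu(0.0) - relu(-eps)) / eps        # 0.0
relu_right = (relu(eps) - relu(0.0)) / eps        # 1.0: a kink at 0
sp_left = (softplus(0.0) - softplus(-eps)) / eps
sp_right = (softplus(eps) - softplus(0.0)) / eps
print(relu_right - relu_left)  # 1.0: discontinuous slope
print(sp_right - sp_left)      # ~0: smooth at 0
```

Kinks like ReLU's make the dynamics non-smooth, which can hurt adaptive step-size control and the adjoint pass.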
Where is backpropagation for the Neural ODE defined?
It's defined here if you use the adjoint method `odeint_adjoint`.
What are Tableaus?
Tableaus (Butcher tableaus) are compact ways to describe the coefficients of Runge-Kutta methods. The particular set of coefficients used in this repo was taken from here.
How do I install the repo on Windows?
Try downloading the code directly and just running `python setup.py install`.
https://stackoverflow.com/questions/52528955/installing-a-python-module-from-github-in-windows-10
What is the most memory-expensive operation during training?
The most memory-expensive operation is the single backward call made to the network.
My Neural ODE's numerical solution is farther away from the target than the initial value
Most tricks for initializing residual nets (like zeroing the weights of the last layer) should help for ODEs as well. This will initialize the ODE as an identity.
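A toy check of why zero-initializing the vector field yields the identity map (plain Python; the function below stands in for a network whose last layer is zeroed):

```python
def f_zero_init(t, y):
    # Stand-in for a network whose final layer is zero-initialized:
    # the vector field is identically zero.
    return [0.0 for _ in y]

y = [1.5, -2.0, 0.25]
t, h = 0.0, 0.01
# Fixed-step Euler loop: with dy/dt = 0, the state never changes.
for _ in range(100):
    dy = f_zero_init(t, y)
    y = [yi + h * di for yi, di in zip(y, dy)]
    t += h
print(y)  # [1.5, -2.0, 0.25] -- the identity map
```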
My Neural ODE takes too long to train
This might be because you're running on CPU. Being extremely slow on CPU is expected, as training requires evaluating a neural net multiple times.
My Neural ODE produces underflow in `dt` when using adaptive solvers like `dopri5`
This is a problem of the ODE becoming stiff: essentially, it behaves too erratically in some region, and the step size shrinks so close to zero that no progress can be made in the solver. We were able to avoid this with regularization such as weight decay and by using "nice" activation functions, but your mileage may vary. Other options are to accept a larger error by increasing `atol` or `rtol`, or to switch to a fixed-step solver.
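A toy illustration of stiffness (plain Python, not tied to the repo): on dy/dt = -1000*y with step h = 0.01, explicit Euler's update factor is 1 + h*lam = -9, so the iterates explode, while implicit (backward) Euler's factor 1/(1 - h*lam) = 1/11 stays stable:

```python
lam, h, n = -1000.0, 0.01, 50

y_explicit = 1.0
y_implicit = 1.0
for _ in range(n):
    # Explicit Euler: y <- y + h * lam * y = (1 + h*lam) * y = -9 * y.
    y_explicit = y_explicit + h * lam * y_explicit
    # Implicit Euler: solve y_next = y + h * lam * y_next for y_next.
    y_implicit = y_implicit / (1.0 - h * lam)

print(abs(y_explicit) > 1e10)   # True: explicit iterates blew up
print(abs(y_implicit) < 1e-10)  # True: implicit iterates decayed to ~0
```

This is why stiff regions force adaptive explicit solvers into vanishingly small steps: stability, not accuracy, dictates the step size.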