What are good resources to understand how ODEs can be solved?
Solving Ordinary Differential Equations I: Nonstiff Problems by Hairer et al.
ODE solver selection in MATLAB
What are the ODE solvers available in this repo?
- Adaptive-step:
  - `dopri8`: Runge-Kutta 7(8) of Dormand-Prince-Shampine
  - `dopri5`: Runge-Kutta 4(5) of Dormand-Prince [default]
  - `bosh3`: Runge-Kutta 2(3) of Bogacki-Shampine
  - `adaptive_heun`: Runge-Kutta 1(2)
- Fixed-step:
  - `euler`: Euler method
  - `midpoint`: Midpoint method
  - `rk4`: Fourth-order Runge-Kutta with 3/8 rule
  - `explicit_adams`: Explicit Adams
  - `implicit_adams`: Implicit Adams
- `scipy_solver`: Wraps a SciPy solver.
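As a rough illustration of the fixed-step methods above, here is a minimal plain-Python sketch (not the repo's implementation) comparing the Euler and midpoint methods on dy/dt = -y:

```python
import math

def euler_step(f, t, y, h):
    # Euler: one function evaluation per step, first-order accurate.
    return y + h * f(t, y)

def midpoint_step(f, t, y, h):
    # Midpoint: two evaluations per step, second-order accurate.
    k1 = f(t, y)
    return y + h * f(t + h / 2, y + h / 2 * k1)

def integrate(step, f, y0, t0, t1, n):
    # Take n equal-size steps from t0 to t1.
    h = (t1 - t0) / n
    y, t = y0, t0
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y

f = lambda t, y: -y  # exact solution: y(t) = exp(-t)
exact = math.exp(-1.0)
err_euler = abs(integrate(euler_step, f, 1.0, 0.0, 1.0, 100) - exact)
err_midpoint = abs(integrate(midpoint_step, f, 1.0, 0.0, 1.0, 100) - exact)
print(err_euler, err_midpoint)  # midpoint is far more accurate here
```

For the same number of steps, the higher-order method pays more function evaluations per step in exchange for a much smaller error.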
What are `NFE-F` and `NFE-B`?
Number of function evaluations for the forward and backward passes.
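Counters like these can be implemented by wrapping the ODE function and counting calls. A plain-Python sketch (the names here are illustrative, not the repo's API):

```python
class CountedFunc:
    """Wraps an ODE right-hand side and counts how often it is called."""
    def __init__(self, f):
        self.f = f
        self.nfe = 0  # number of function evaluations so far
    def __call__(self, t, y):
        self.nfe += 1
        return self.f(t, y)

f = CountedFunc(lambda t, y: -y)

# Classic fixed-step RK4: 4 evaluations per step.
y, t, h = 1.0, 0.0, 0.1
for _ in range(10):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += h
print(f.nfe)  # 40: 10 steps x 4 evaluations
```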
What are `rtol` and `atol`?
They refer to the relative (`rtol`) and absolute (`atol`) error tolerances.
What is the role of error tolerance in adaptive solvers?
The basic idea is that each adaptive solver produces an error estimate for the current step. If the error is greater than some tolerance, the step is redone with a smaller step size, and this repeats until the error is smaller than the provided tolerance.
Error Tolerances for Variable-Step Solvers
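The accept/reject loop described above can be sketched in a few lines of plain Python, using step doubling for the error estimate (real solvers such as `dopri5` use embedded Runge-Kutta pairs instead; this is only a toy):

```python
def adaptive_euler(f, y0, t0, t1, rtol=1e-6, atol=1e-9):
    # Toy adaptive integrator: compare one Euler step of size h against
    # two steps of size h/2; if the discrepancy exceeds the tolerance,
    # retry the step with a smaller h, otherwise accept and move on.
    t, y, h = t0, y0, (t1 - t0) / 10
    while t < t1:
        h = min(h, t1 - t)
        full = y + h * f(t, y)
        half = y + h / 2 * f(t, y)
        half = half + h / 2 * f(t + h / 2, half)
        err = abs(full - half)
        tol = atol + rtol * abs(half)
        if err > tol:
            h /= 2          # error too large: redo with a smaller step
            continue
        y, t = half, t + h  # error acceptable: accept the step
        if err < tol / 4:
            h *= 2          # error comfortably small: try a larger step
    return y

result = adaptive_euler(lambda t, y: -y, 1.0, 0.0, 1.0)
print(result)  # close to exp(-1) ~ 0.3679
```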
How is the error tolerance calculated?
The error tolerance is calculated as `atol + rtol * norm(current state)`, where the norm used is a mixed L-infinity/RMS norm.
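A sketch of how a tolerance of this shape can be computed, reading the mixed norm as the maximum over state tensors of each tensor's RMS norm (an assumption about the exact norm; check the repo's code for details):

```python
import math

def rms_norm(x):
    # Root-mean-square norm of one flat state tensor.
    return math.sqrt(sum(v * v for v in x) / len(x))

def mixed_norm(states):
    # Max over the state tensors of each tensor's RMS norm
    # (one reading of "mixed L-infinity/RMS norm").
    return max(rms_norm(x) for x in states)

def error_tolerance(states, rtol, atol):
    # atol + rtol * norm(current state), as described above.
    return atol + rtol * mixed_norm(states)

states = [[3.0, 4.0], [1.0, 1.0, 1.0, 1.0]]
# RMS norms: sqrt(12.5) ~ 3.536 and 1.0, so the mixed norm is ~ 3.536.
tol = error_tolerance(states, rtol=1e-3, atol=1e-6)
print(tol)  # ~ 0.0035365
```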
Where is the code that computes the error tolerance?
It is computed here.
How many states must a Neural ODE solver store during a forward pass with the adjoint method?
The number of states that must be stored in memory during a forward pass is solver dependent. For example, `dopri5` requires 6 intermediate states to be stored.
How many function evaluations are there per ODE step on adaptive solvers?
- `dopri5`

The `dopri5` ODE solver stores at least 6 evaluations of the ODE, then takes a step using a linear combination of them. The diagram below illustrates this: the evaluations marked with `o` are on the estimated path, the ones marked with `x` are not. The first two are for selecting the initial step size.

```
0 1 | 2 3 4 5 6 7 | 8 9 10 11 12 13
o x | x x x x x o | x x x  x  x  o
```
How do I obtain evaluations on the estimated path when using an adaptive solver?
The argument `t` of `odeint` specifies the times at which the ODE solver should produce output:
`odeint(func, x0, t=torch.linspace(0, 1, 50))`
Note that the ODE solver will always integrate from `min(t)` to `max(t)` (0 and 1 here), and the intermediate values of `t` have no effect on how the ODE is solved. Intermediate values are computed using polynomial interpolation and have very small cost.
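To see why intermediate output times are cheap, here is a plain-Python sketch of dense output via polynomial interpolation between two accepted steps (cubic Hermite here; `dopri5` uses its own higher-order interpolant):

```python
import math

def hermite(t0, y0, f0, t1, y1, f1, t):
    # Cubic Hermite interpolation using the states (y0, y1) and
    # derivatives (f0, f1) already available at the step endpoints --
    # no extra function evaluations are needed for intermediate times.
    h = t1 - t0
    s = (t - t0) / h
    h00 = (1 + 2 * s) * (1 - s) ** 2
    h10 = s * (1 - s) ** 2
    h01 = s * s * (3 - 2 * s)
    h11 = s * s * (s - 1)
    return h00 * y0 + h10 * h * f0 + h01 * y1 + h11 * h * f1

# Interpolate y = exp(t) between t=0 and t=1 (here y' = y).
y_mid = hermite(0.0, 1.0, 1.0, 1.0, math.e, math.e, 0.5)
err = abs(y_mid - math.exp(0.5))
print(err)  # small interpolation error even over this wide step
```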
What non-linearities should I use in my Neural ODE?
Avoid non-smooth non-linearities such as ReLU and LeakyReLU.
Prefer non-linearities with a theoretically unique adjoint/gradient such as Softplus.
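A quick numerical illustration of the difference (plain Python): ReLU's derivative jumps at 0, while Softplus's derivative, the sigmoid, is continuous everywhere:

```python
import math

def softplus(x):
    # Smooth approximation of ReLU; its derivative is the sigmoid,
    # which is continuous everywhere.
    return math.log1p(math.exp(x))

def relu(x):
    return max(x, 0.0)

eps = 1e-6
# One-sided finite-difference slopes around 0:
relu_left = (relu(0.0) - relu(-eps)) / eps        # 0.0
relu_right = (relu(eps) - relu(0.0)) / eps        # 1.0: a kink at 0
sp_left = (softplus(0.0) - softplus(-eps)) / eps
sp_right = (softplus(eps) - softplus(0.0)) / eps
print(relu_right - relu_left)  # 1.0: discontinuous slope
print(sp_right - sp_left)      # ~0: smooth at 0
```

Kinks like ReLU's make the dynamics non-smooth, which can hurt adaptive step-size control and the adjoint pass.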
Where is backpropagation for the Neural ODE defined?
It's defined here if you use the adjoint method `odeint_adjoint`.
What are Tableaus?
Tableaus (Butcher tableaus) are compact ways to describe the coefficients of Runge-Kutta methods. The particular set of coefficients used in this repo was taken from here.
How do I install the repo on Windows?
Try downloading the code directly and just running `python setup.py install`.
https://stackoverflow.com/questions/52528955/installing-a-python-module-from-github-in-windows-10
What is the most memory-expensive operation during training?
The most memory-expensive operation is the single backward call made to the network.
My Neural ODE's numerical solution is farther away from the target than the initial value
Most tricks for initializing residual nets (like zeroing the weights of the last layer) should help for ODEs as well. This will initialize the ODE as an identity.
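A toy check of why zero-initializing the vector field yields the identity map (plain Python; the function below stands in for a network whose last layer is zeroed):

```python
def f_zero_init(t, y):
    # Stand-in for a network whose final layer is zero-initialized:
    # the vector field is identically zero.
    return [0.0 for _ in y]

y = [1.5, -2.0, 0.25]
t, h = 0.0, 0.01
# Fixed-step Euler loop: with dy/dt = 0, the state never changes.
for _ in range(100):
    dy = f_zero_init(t, y)
    y = [yi + h * di for yi, di in zip(y, dy)]
    t += h
print(y)  # [1.5, -2.0, 0.25] -- the identity map
```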
My Neural ODE takes too long to train
This might be because you're running on CPU. Being extremely slow on CPU is expected, as training requires evaluating a neural net multiple times.
My Neural ODE produces underflow in `dt` when using adaptive solvers like `dopri5`
This is a problem of the ODE becoming stiff: essentially, it behaves too erratically in some region, and the step size shrinks so close to zero that no progress can be made in the solver. We were able to avoid this with regularization such as weight decay and by using "nice" activation functions, but your mileage may vary. Other options are to accept a larger error by increasing `atol` or `rtol`, or to switch to a fixed-step solver.
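A toy illustration of stiffness (plain Python, not tied to the repo): on dy/dt = -1000*y with step h = 0.01, explicit Euler's update factor is 1 + h*lam = -9, so the iterates explode, while implicit (backward) Euler's factor 1/(1 - h*lam) = 1/11 stays stable:

```python
lam, h, n = -1000.0, 0.01, 50

y_explicit = 1.0
y_implicit = 1.0
for _ in range(n):
    # Explicit Euler: y <- y + h * lam * y = (1 + h*lam) * y = -9 * y.
    y_explicit = y_explicit + h * lam * y_explicit
    # Implicit Euler: solve y_next = y + h * lam * y_next for y_next.
    y_implicit = y_implicit / (1.0 - h * lam)

print(abs(y_explicit) > 1e10)   # True: explicit iterates blew up
print(abs(y_implicit) < 1e-10)  # True: implicit iterates decayed to ~0
```

This is why stiff regions force adaptive explicit solvers into vanishingly small steps: stability, not accuracy, dictates the step size.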