
Hamiltonian Deep Neural Networks

PyTorch implementation of Hamiltonian deep neural networks as presented in "Hamiltonian Deep Neural Networks Guaranteeing Non-vanishing Gradients by Design" [1].

Installation

git clone https://github.com/DecodEPFL/HamiltonianNet.git

cd HamiltonianNet

python setup.py install

Basic usage

2D classification examples:

./examples/run.py --dataset [DATASET] --model [MODEL]

where available values for DATASET are swiss_roll and double_moons.

Distributed training on 2D classification examples:

./examples/run_distributed.py --dataset [DATASET]

where available values for DATASET are swiss_roll and double_circles.

Classification over MNIST dataset:

./examples/run_MNIST.py --model [MODEL]

where available values for MODEL are MS1 and H1.

To reproduce the counterexample of Appendix III:

./examples/gradient_analysis/perturbation_analysis.py

Hamiltonian Deep Neural Networks (H-DNNs)

H-DNNs are obtained after the discretization of an ordinary differential equation (ODE) that represents a time-varying Hamiltonian system. The time-varying dynamics of a Hamiltonian system are given by

ẏ(t) = J(y,t) ∂H(y,t)/∂y   and   y(0) = y₀,

where y(t) ∈ ℝⁿ represents the state, H(y,t): ℝⁿ × ℝ → ℝ is the Hamiltonian function, and the n × n matrix J(y,t), called the interconnection matrix, satisfies J(y,t) = −J(y,t)^T.
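Concretely, the vector field J(y,t) ∂H(y,t)/∂y can be evaluated with automatic differentiation. The helper below is a hypothetical sketch (not part of this repository) for an arbitrary user-supplied Hamiltonian.

```python
import torch

def hamiltonian_vector_field(H, J, y, t):
    """Right-hand side of  y'(t) = J(y,t) dH(y,t)/dy  via autograd.

    Hypothetical helper (not from this repo): H is a callable (y, t) -> scalar,
    J is an n x n skew-symmetric matrix, y is the state at time t.
    """
    y = y.detach().requires_grad_(True)
    grad_H = torch.autograd.grad(H(y, t), y)[0]  # dH/dy
    return J @ grad_H

# Toy example: the quadratic Hamiltonian H = 0.5*||y||^2 with the canonical
# 2 x 2 interconnection matrix simply rotates the state.
J = torch.tensor([[0.0, 1.0], [-1.0, 0.0]])
H = lambda y, t: 0.5 * (y ** 2).sum()
print(hamiltonian_vector_field(H, J, torch.tensor([1.0, 0.0]), t=0.0))
# tensor([ 0., -1.])
```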

After discretization with step size h, we have

  • H1-DNN (forward Euler):  y_{k+1} = y_k + h J K_k^T tanh(K_k y_k + b_k)

  • H2-DNN (semi-implicit Euler):  p_{k+1} = p_k + h K_{1,k}^T tanh(K_{1,k} q_k + b_{1,k})

                         q_{k+1} = q_k − h K_{2,k}^T tanh(K_{2,k} p_{k+1} + b_{2,k})

                         where y_k = [p_k^T, q_k^T]^T and J = [[0, I], [−I, 0]].
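As an illustration of the H1 update, below is a minimal, hypothetical PyTorch sketch of an H1-DNN. It assumes the log-cosh Hamiltonian used in [1], for which ∂H/∂y = K^T tanh(Ky + b); the class name, step size, and the particular skew-symmetric J are illustrative choices, not this repository's implementation.

```python
import torch
import torch.nn as nn

class H1Layer(nn.Module):
    """One forward-Euler step  y_{k+1} = y_k + h * J K^T tanh(K y_k + b).

    Hypothetical sketch: K and b are the trainable layer parameters, J is a
    fixed skew-symmetric interconnection matrix, h is the step size.
    """
    def __init__(self, n, h=0.1):
        super().__init__()
        assert n % 2 == 0, "this choice of J requires an even state dimension"
        self.K = nn.Parameter(torch.randn(n, n) / n ** 0.5)
        self.b = nn.Parameter(torch.zeros(n))
        O, I = torch.zeros(n // 2, n // 2), torch.eye(n // 2)
        # One canonical skew-symmetric choice: J = [[0, I], [-I, 0]].
        self.register_buffer("J", torch.cat([torch.cat([O, I], 1),
                                             torch.cat([-I, O], 1)], 0))
        self.h = h

    def forward(self, y):                                    # y: (batch, n)
        grad_H = torch.tanh(y @ self.K.T + self.b) @ self.K  # K^T tanh(Ky+b)
        return y + self.h * grad_H @ self.J.T                # forward Euler

# A 64-layer H1-DNN on a 4-dimensional (augmented) state, as in the
# 2D classification examples below.
net = nn.Sequential(*[H1Layer(n=4) for _ in range(64)])
print(net(torch.randn(8, 4)).shape)  # torch.Size([8, 4])
```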

2D classification examples

We consider two benchmark classification problems, "Swiss roll" and "Double circles", each with two classes and two features.

[Figures: "Swiss roll" and "Double circles" datasets]

An example of each dataset is shown in the figures above, together with the predictions of a trained 64-layer H1-DNN (colored regions in the background). For these examples, the two-feature data is augmented, leading to y_k ∈ ℝ⁴, k = 0, …, 64.
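The exact augmentation scheme is defined in the repository's example code; as a plausible sketch, zero-padding the two features up to ℝ⁴ looks as follows.

```python
import torch

# Hypothetical sketch of the augmentation step: the repository's examples
# define their own scheme; zero-padding is one common choice.
x = torch.randn(128, 2)                          # batch of 2-feature data points
y0 = torch.cat([x, torch.zeros(128, 2)], dim=1)  # y_0 in R^4
print(y0.shape)  # torch.Size([128, 4])
```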

The figures below show the hidden feature vectors, i.e., the states y_k, of all the test data after training. First, a change of basis is performed so that the classification hyperplane is perpendicular to the first basis vector x_1. Then, projections onto the new coordinate planes are plotted.

[Figures: propagation of the hidden features for the Swiss roll (top row) and Double circles (bottom row) datasets]

Counterexample

Previous work conjectured that some classes of H-DNNs avoid exploding gradients when y(t) varies arbitrarily slowly. The following numerical example shows that, unfortunately, this is not the case.

We consider the simple case where the underlying ODE is

ẏ(t) = ε J tanh( y(t) )       with       J = [[0, 1], [−1, 0]].

We study the evolution of y(t) and yγ(t), for t ∈ [t₀, T] and t₀ ∈ [0, T], with initial conditions y(t₀) = y₀ and yγ(t₀) = y₀ + γβ, where γ = 0.05 and β is a standard unit vector. The initial condition y₀ is chosen at random and normalized to have unit norm.
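For intuition, the sketch below integrates this ODE with plain forward Euler and compares the nominal and perturbed trajectories at time T; the ratio (yγ(T) − y(T))/γ is a finite-difference estimate of one column of the BSM. All numerical values (ε, T, step count) are illustrative placeholders, and `perturbation_analysis.py` remains the reference implementation.

```python
import torch

def integrate(y0, t0, T, eps, steps=2000):
    """Forward-Euler integration of  y'(t) = eps * J tanh(y(t))  on [t0, T]."""
    J = torch.tensor([[0.0, 1.0], [-1.0, 0.0]])
    h = (T - t0) / steps
    y = y0.clone()
    for _ in range(steps):
        y = y + h * eps * (J @ torch.tanh(y))
    return y

# eps, T and the step count are placeholders, not the paper's settings.
eps, T, gamma = 1.0, 10.0, 0.05
y0 = torch.randn(2)
y0 = y0 / y0.norm()                  # random initial condition, unit norm
beta = torch.tensor([1.0, 0.0])      # perturb along the first unit vector
yT = integrate(y0, 0.0, T, eps)
yT_pert = integrate(y0 + gamma * beta, 0.0, T, eps)
print((yT_pert - yT) / gamma)        # finite-difference estimate of a BSM column
```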

[Figures: time evolution of y(t) and yγ(t) (left); entries of the BSM matrix (right)]

The left figure shows the time evolution of y(t), in blue, and yγ(t), in orange, when a perturbation is applied at time t₀ = T − t. The nominal initial condition y(T − t) is indicated with a blue circle and the perturbed one yγ(T − t) with an orange cross. A zoomed-in view is shown on the right, where a green vector indicates the difference between yγ(T) and y(T).

The figure on the right presents the entries (1,1) and (2,2) of the BSM (backward sensitivity matrix). Note that the values coincide in sign and magnitude with the green vector.

This numerical experiment confirms that the entries of the BSM matrix (we only show 2 of the 4 entries) diverge as the depth of the network increases, i.e., as the perturbation is introduced further away from the output.

License

This work is licensed under a Creative Commons Attribution 4.0 International License.


References

[1] Clara L. Galimberti, Luca Furieri, Liang Xu and Giancarlo Ferrari Trecate. "Hamiltonian Deep Neural Networks Guaranteeing Non-vanishing Gradients by Design," arXiv:2105.13205, 2021.

[2] Clara L. Galimberti, Liang Xu and Giancarlo Ferrari Trecate. "A unified framework for Hamiltonian deep neural networks," 3rd Annual Learning for Dynamics & Control (L4DC) Conference, 2021. Preprint: arXiv:2104.13166.

[3] Eldad Haber and Lars Ruthotto. "Stable architectures for deep neural networks," Inverse Problems, vol. 34, p. 014004, Dec 2017.

[4] Bo Chang, Lili Meng, Eldad Haber, Lars Ruthotto, David Begert and Elliot Holtham. "Reversible architectures for arbitrarily deep residual neural networks," AAAI Conference on Artificial Intelligence, 2018.
