PyTorch implementation of Hamiltonian deep neural networks (H-DNNs) as presented in "Hamiltonian Deep Neural Networks Guaranteeing Non-vanishing Gradients by Design" [1].
git clone https://github.com/DecodEPFL/HamiltonianNet.git
cd HamiltonianNet
python setup.py install
2D classification examples:
./examples/run.py --dataset [DATASET] --model [MODEL]
where available values for DATASET are swiss_roll and double_moons.
Distributed training on 2D classification examples:
./examples/run_distributed.py --dataset [DATASET]
where available values for DATASET are swiss_roll and double_circles.
Classification over MNIST dataset:
./examples/run_MNIST.py --model [MODEL]
where available values for MODEL are MS1 and H1.
To reproduce the counterexample of Appendix III:
./examples/gradient_analysis/perturbation_analysis.py
H-DNNs are obtained after the discretization of an ordinary differential equation (ODE) that represents a time-varying Hamiltonian system. The time-varying dynamics of a Hamiltonian system are given by

ẏ(t) = J(y,t) ∂H(y,t)/∂y

where y(t) ∈ ℝⁿ represents the state, H(y,t): ℝⁿ × ℝ → ℝ is the Hamiltonian function, and the n × n matrix J, called the interconnection matrix, satisfies J = -Jᵀ.
After discretization, two architectures are obtained (see [1] and [2] for the corresponding layer equations):
- H1-DNN
- H2-DNN
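As a rough illustration, the snippet below sketches a single forward-Euler layer obtained from the ODE above, assuming the Hamiltonian gradient takes the form Kᵀ tanh(K y + b) used in [1]. The class name, the step size h, and the fixed choice J = [[0, I], [-I, 0]] are illustrative assumptions and do not necessarily match the layers implemented in this repository.

```python
# Hedged sketch of one discretized layer: a forward-Euler step of y' = J dH/dy,
# assuming dH/dy = K^T tanh(K y + b) as in [1]. The class name, step size h and
# the fixed J = [[0, I], [-I, 0]] are illustrative only.
import torch
import torch.nn as nn


class HamiltonianLayerSketch(nn.Module):
    def __init__(self, n: int, h: float = 0.1):
        super().__init__()
        assert n % 2 == 0, "even state dimension assumed so J can be built block-wise"
        self.h = h                                    # discretization step size
        self.K = nn.Parameter(0.1 * torch.randn(n, n))
        self.b = nn.Parameter(torch.zeros(n))
        eye = torch.eye(n // 2)
        zero = torch.zeros(n // 2, n // 2)
        # Skew-symmetric interconnection matrix: J = [[0, I], [-I, 0]], so J = -J^T.
        J = torch.cat([torch.cat([zero, eye], dim=1),
                       torch.cat([-eye, zero], dim=1)], dim=0)
        self.register_buffer("J", J)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y has shape (batch, n); one forward-Euler step y <- y + h * J K^T tanh(K y + b).
        grad_H = torch.tanh(y @ self.K.T + self.b) @ self.K   # rows are K^T tanh(K y + b)
        return y + self.h * grad_H @ self.J.T
```

A full H-DNN would stack several such layers with layer-dependent parameters Kⱼ, bⱼ; the gradient analysis in [1] relies on the Hamiltonian structure, i.e. the skew-symmetry of J.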
We consider two benchmark classification problems: "Swiss roll" and "Double circles", each of them with two categories and two features.
An example of each dataset is shown in the figures above, together with the predictions of a trained 64-layer H1-DNN (colored regions in the background). For these examples, the two input features are augmented, leading to yₖ ∈ ℝ⁴, k = 0, ..., 64.
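One common way to realize this augmentation is to pad the two raw features with zeros up to the network width; the helper below is a minimal sketch of that idea (the function name and the zero-padding choice are assumptions, not necessarily the exact preprocessing used by the example scripts).

```python
# Hedged sketch: lift 2-D inputs to a 4-D initial state y_0 by zero-padding.
# Zero-padding is one common choice; the example scripts may use a different scheme.
import torch

def augment_features(x: torch.Tensor, target_dim: int = 4) -> torch.Tensor:
    """x: (batch, 2) raw features -> (batch, target_dim) initial state y_0."""
    pad = torch.zeros(x.shape[0], target_dim - x.shape[1], dtype=x.dtype, device=x.device)
    return torch.cat([x, pad], dim=1)
```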
The figures below show the hidden feature vectors, i.e. the states yₖ, of all the test data after training. First, a change of basis is performed so that the classification hyperplane is perpendicular to the first basis vector x₁. Then, projections onto the new coordinate planes are shown.
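The change of basis itself can be sketched as follows, assuming the network output is a linear classifier y ↦ sign(wᵀy + c): build an orthogonal matrix whose first column is w/‖w‖ and express the states in that basis, so that the separating hyperplane becomes perpendicular to the first coordinate. The QR-based construction and the function name below are illustrative assumptions.

```python
# Hedged sketch: rotate the hidden states so that the classification hyperplane
# w^T y + c = 0 is perpendicular to the first basis vector of the new coordinates.
import torch

def change_of_basis(states: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """states: (batch, n) hidden states y_k; w: (n,) normal of the separating hyperplane."""
    w = w.detach()                          # visualization only, no gradients needed
    n = w.numel()
    A = torch.eye(n, dtype=w.dtype)
    A[:, 0] = w / w.norm()                  # first column: unit normal of the hyperplane
    Q, _ = torch.linalg.qr(A)               # orthonormal basis with Q[:, 0] = +/- w/||w||
    if torch.dot(Q[:, 0], w) < 0:           # fix a possible sign flip from the QR routine
        Q = -Q
    return states.detach() @ Q              # coordinates of the states in the new basis
```

Plotting pairs of the resulting coordinates gives the projections shown in the figures.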
Previous work conjectured that some classes of H-DNNs avoid exploding gradients when y(t) varies arbitrarily slowly. The following numerical example shows that, unfortunately, this is not the case.
We consider the simple case where the underlying ODE is

ẏ(t) = ε J tanh( y(t) ),

with J = [0, 1; -1, 0] and ε > 0.
We study the evolution of y(t) and yγ(t) for t ∈ [t0, T], with t0 ∈ [0, T], and initial conditions y(t0) = y0 and yγ(t0) = y0 + γβ, where γ = 0.05 and β is one of the canonical unit vectors. The initial condition y0 is set randomly and normalized to have unit norm.
The left figure shows the time evolution of y(t), in blue, and yγ(t), in orange, when a perturbation is applied at time t0 = T-t. The nominal initial condition y(T-t) is indicated with a blue circle and the perturbed one yγ(T-t) with an orange cross. A zoom is presented on the right side, where a green vector indicates the difference between yγ(T) and y(T).
The figure on the right presents the entries (1,1) and (2,2) of the backward sensitivity matrix (BSM). Note that the values coincide in sign and magnitude with the green vector.
This numerical experiment confirms that the entries of the BSM (we only show 2 of the 4 entries) diverge as the depth of the network increases, i.e., as the perturbation is introduced further away from the output.
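As a rough, self-contained companion to the script above, the sketch below integrates ẏ(t) = ε J tanh(y(t)) with forward Euler and estimates, by finite differences, how a perturbation of size γ applied at time t0 propagates to y(T), which is what the BSM entries capture. The values of ε, h, T and the integration scheme are illustrative assumptions; examples/gradient_analysis/perturbation_analysis.py remains the reference implementation.

```python
# Hedged sketch: finite-difference estimate of the sensitivity of y(T) to a
# perturbation applied at time t0, for y' = eps * J * tanh(y) with J = [[0, 1], [-1, 0]].
# eps, h and T are illustrative values, not the ones used in the paper.
import torch

eps, h, T = 1.0, 0.05, 10.0
gamma = 0.05                                    # perturbation size, as in the text
J = torch.tensor([[0.0, 1.0], [-1.0, 0.0]])     # 2x2 skew-symmetric interconnection matrix

def simulate(y0: torch.Tensor, t0: float) -> torch.Tensor:
    """Forward-Euler integration of y' = eps * J * tanh(y) from t0 to T."""
    y = y0.clone()
    for _ in range(int(round((T - t0) / h))):
        y = y + h * eps * (J @ torch.tanh(y))
    return y

def sensitivity_column(y0: torch.Tensor, t0: float, j: int) -> torch.Tensor:
    """Finite-difference estimate of d y(T) / d y_j(t0): perturb y(t0) along the unit vector e_j."""
    e_j = torch.zeros(2)
    e_j[j] = 1.0
    return (simulate(y0 + gamma * e_j, t0) - simulate(y0, t0)) / gamma

torch.manual_seed(0)
y0 = torch.randn(2)
y0 = y0 / y0.norm()                             # random initial condition with unit norm
for t0 in (T - 1.0, T - 5.0, T - 9.0):          # perturbations applied further from the output
    print(f"t0 = {t0:4.1f}   dy(T)/dy_1(t0) ~ {sensitivity_column(y0, t0, j=0).tolist()}")
```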
This work is licensed under a Creative Commons Attribution 4.0 International License.
[1] Clara L. Galimberti, Luca Furieri, Liang Xu and Giancarlo Ferrari Trecate. "Hamiltonian Deep Neural Networks Guaranteeing Non-vanishing Gradients by Design," arXiv:2105.13205, 2021.
[2] Clara L. Galimberti, Liang Xu and Giancarlo Ferrari Trecate. "A unified framework for Hamiltonian deep neural networks," 3rd Annual Learning for Dynamics & Control (L4DC) Conference, 2021. Preprint: arXiv:2104.13166.
[3] Eldad Haber and Lars Ruthotto. "Stable architectures for deep neural networks," Inverse Problems, vol. 34, p. 014004, Dec 2017.
[4] Bo Chang, Lili Meng, Eldad Haber, Lars Ruthotto, David Begert and Elliot Holtham. "Reversible architectures for arbitrarily deep residual neural networks," AAAI Conference on Artificial Intelligence, 2018.