TODO:
- 2D primitives: switch_batch_time, stack_neighbors
- 2D command line
- single mat option
- implement GRU, RNN, IIR+log
- implement per-class or per-step weights
- OpenMP parallel training
- add convolutional layers
Experiments:
- different initializations
- other update rules