Changes
- Add support for AlphaZero.
- Add PGX environments for use with AlphaZero.
- Remove the use of JAX callbacks for logging, since the previous implementation was not compatible with TPUs. The training loop now exits XLA execution to run callbacks.
- Extensive refactoring and restructuring of the codebase.
- Increase the JAX dependency version.
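The callback change above follows a common JAX pattern. A jitted function runs a chunk of training steps entirely inside XLA, then returns concrete arrays to Python, where ordinary callbacks run. This avoids issuing host callbacks from inside compiled code. The block below is a minimal sketch under that assumption, not this project's actual training loop. `train_step`, `train_chunk`, `train`, and the toy parameter update are hypothetical.

```python
import jax
import jax.numpy as jnp


def train_step(state, _):
    # Hypothetical update: shrink the parameters and report a scalar "loss".
    params, step = state
    loss = jnp.sum(params ** 2)
    params = params - 0.1 * params
    return (params, step + 1), loss


@jax.jit
def train_chunk(state):
    # Run 10 steps inside XLA with lax.scan; no host callbacks are issued here.
    return jax.lax.scan(train_step, state, None, length=10)


def train(state, num_chunks, callbacks):
    history = []
    for _ in range(num_chunks):
        state, losses = train_chunk(state)
        # Control is back in Python, so the metrics are concrete values and any
        # callback (logging, checkpointing, ...) can run as plain Python.
        metrics = {"step": int(state[1]), "loss": float(losses[-1])}
        for cb in callbacks:
            cb(metrics)
        history.append(metrics)
    return state, history


if __name__ == "__main__":
    init_state = (jnp.ones(3), jnp.array(0))
    train(init_state, num_chunks=3, callbacks=[print])
```

The chunk length trades off logging granularity against the overhead of leaving XLA. Longer chunks mean fewer round trips to the host but less frequent callbacks.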