Releases · BaguaSys/bagua
v0.9.2
v0.9.1
v0.9.0
Bug Fixes
Other
- Reuse fused parameter tensors in fuse_step (#410)
- Call step closure in qadam optimizer step (#432)
- Fix need_reset condition (#454)
- Do negotiation in async native op (#447)
- Fix find_unused_parameters (#452)
- Fix qadam non-deterministic (#459)
- Add `LIBRARY_PATH` env in install_master.sh (#465)
- Fix typo in install_master.sh (#471)
Python
- CUDA 11.5 can't get nccl package (#415)
- Fix process group compatibility with torch 1.6.0 (#413)
- Fix ci random fail (#445)
- Fix async algorithm (#479)
Features
Core
- Initial support for C interface (#325)
Other
- Support NODE_RANK environment variable (#426)
- Choose bagua service port dynamically (#431)
- Use bagua_module_name to identify different modules (#438)
- Add algorithm registry (#433)
- Add compatibility for NCCL version under 2.10 (#449)
- Add broadcast object api (#437)
- Support qadam in fused optimizer (#477)
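The fused optimizer and QAdam entries above combine in one training script. A minimal sketch, assuming the `QAdamOptimizer`/`QAdamAlgorithm` and `fuse_optimizer`/`fuse_step` entry points documented for this release (verify exact signatures against your version):

```python
# Sketch: QAdam together with the fused optimizer (#477). Entry point names
# follow the Bagua docs of this era; treat signatures as assumptions.
import torch
import bagua.torch_api as bagua
from bagua.torch_api.contrib import fuse_optimizer
from bagua.torch_api.algorithms.q_adam import QAdamAlgorithm, QAdamOptimizer

torch.cuda.set_device(bagua.get_local_rank())
bagua.init_process_group()

model = torch.nn.Linear(128, 10).cuda()
optimizer = QAdamOptimizer(model.parameters(), lr=1e-3, warmup_steps=100)
optimizer = fuse_optimizer(optimizer)  # fuse parameter tensors; #410 reuses them across steps
model = model.with_bagua([optimizer], QAdamAlgorithm(optimizer))

for _ in range(5):
    loss = model(torch.randn(32, 128).cuda()).sum()
    loss.backward()
    optimizer.fuse_step()  # fused update in place of optimizer.step()
    optimizer.zero_grad()
```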
v0.8.2
Features
Python
- Support switching between different algorithms (#299); see the sketch after this list
- Support separate algorithm declaration and implementation (#246)
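A minimal sketch of algorithm switching (#299), assuming that calling `with_bagua` again with a new algorithm is the supported switching mechanism, as the docs of this era describe:

```python
# Sketch of runtime algorithm switching (#299); check your version's docs.
import torch
import bagua.torch_api as bagua
from bagua.torch_api.algorithms.gradient_allreduce import GradientAllReduceAlgorithm
from bagua.torch_api.algorithms.bytegrad import ByteGradAlgorithm

bagua.init_process_group()
model = torch.nn.Linear(16, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

model = model.with_bagua([optimizer], GradientAllReduceAlgorithm())
# ... train a few epochs with full-precision allreduce ...
model = model.with_bagua([optimizer], ByteGradAlgorithm())  # switch to 8-bit gradient compression
```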
Python, core
- Support process group in `with_bagua`, support hierarchical communication in bytegrad algorithm (#300); see the sketch after this list
- Support mutable bucket tensors (#271)
- Support `all_to_all_single` (#361)
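A sketch of `with_bagua` with an explicit process group (#300). The `new_group` helper, the `process_group` keyword, and the `hierarchical` flag on `ByteGradAlgorithm` are assumptions to verify against your version:

```python
# Hypothetical sketch: restricting Bagua communication to a subgroup (#300).
import torch
import bagua.torch_api as bagua
from bagua.torch_api.communication import new_group  # assumed helper
from bagua.torch_api.algorithms.bytegrad import ByteGradAlgorithm

bagua.init_process_group()
model = torch.nn.Linear(16, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

group = new_group(ranks=[0, 1])  # communicate only among ranks 0 and 1
if bagua.get_rank() in (0, 1):
    model = model.with_bagua(
        [optimizer],
        ByteGradAlgorithm(hierarchical=True),  # assumed flag: hierarchical bytegrad (#300)
        process_group=group,
    )
```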
Bug Fixes
Other
- Fix fused optimizer OOM and make it stateless (#207)
- Fix `to_bagua_tensor` compatibility with torch 1.6.0 (#355)
v0.8.1.post1
Bug Fixes
- Process group not yet supported in `with_bagua`
- Use separate process group for async communication thread to avoid potential hangs (#298)
v0.8.1
v0.8.0
[0.8.0] - 2021-09-26
Bug Fixes
Ci
- Only run publish once on git tag
Core
- Fix compressed buffer failing to scatter to an odd number of ranks
Other
- Fix ci pypi versioning
- Remove init.py and python version, use cargo version
- Move import bagua_install_library to install library function
- Merge bagua_install_library and setup.py, remove nccl<=2.6 support
- Fix alltoall_v parameter (#17)
- Reduce and allgather python interface
- Fix decompress incorrect pointer and typo in error msg
- Fix python gil deadlock during getting data ptr
- Fix benchmark script requirements
- Fix alltoall_v parameter types (#27)
- Always mark bagua padding tensor as ready
- Make compress/decompress of `BaguaTensor` method string consistent (#33)
- Fix scatter and reduce_scatter implementation (#40)
- Fix subtraction overflow error for decentralized op (#39)
- Fix QADAM params (#17)
- Fix assert precision (#18)
- Replace mutex with atomic bool for async op and add Aluminum submodule update (#67)
- Fix duplicated dependency downloading during installation (#77)
- Fix async algorithm aborting and hanging (#78, #81)
- Fix qadam algorithm call (#20)
- Fix missing symbols in the zip library (#24)
- Fix random autotune server hang (#206)
- Fix Bagua-Net library path mismatch, make `--enable_bagua_net` argument style consistent with other args (#218)
Python
- Fix random autotune-service hang
- Handle conflicts caused by sklearn upgrade (#225)
Features
Ci
- Only publish pypi for master commits
Other
- Add async model average algorithm (#110); see the sketch after this list
- Add cached dataset wrapper (#148)
- Support sync batchnorm (#151)
- Add `--enable-bagua-net` option in launcher (#183)
- Add pytorch examples for MNIST, ImageNet, SQuAD training (#1)
- Add requirements.txt, only download dataset on local rank 0 (#2)
- Add python packaging related files
- Add `__version__` variable
- Install nccl deps in bagua core and add generated `__version__` variable
- Add version.py placeholder to prevent file not found error
- Initial support for python op (#2)
- Add 5 min timeout for buckets' comm op (#5)
- Replace NCCL with Aluminum (#7)
- Add synthetic benchmark script (#5)
- Add elastic training example (#7)
- Support alltoall_v (vector alltoall) (#14)
- Add reduce and allgather python interface
- Support reduce and allgather op with Reduction op enum
- Support creating BaguaTensor by passing torch tensor directly (#19)
- Compatible mode for getting pytorch tensor info with Python interpreter
- Better debug log including tensor info when executing ops
- Add native low precision decentralized operator (#26)
- Add scatter, gather, scatter_reduce communication primitives and their in-place versions (#37)
- Make full precision decentralized op stateless (#36)
- Add communication_primitives example (#12)
- Use nccl 2.10 avg op for all algorithms using averaging (#46, #45)
- Add opentelemetry to report tensor ready order (#42)
- Add deterministic flag (#15)
- Add native async model average algorithm (#41)
- Add examples for async model average algorithm (#14)
- Support packet splitting and multi-stream parallel transmission (#5)
- Support ncclnet v3 and remove the dependency on nccl in the installation environment (#17)
- Add sync interval param to async examples (#19)
- Support tokio backend (#21)
- Support bagua-net (#89)
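Several of these entries fit together in one training script. A sketch of the async model average algorithm (#110, #41) launched with the Bagua-Net flag (#183) and the sync interval parameter (#19), assuming the module paths and keywords used in the examples of this era:

```python
# Sketch: async model averaging with Bagua-Net enabled. Names such as
# sync_interval_ms and bagua_algorithm.abort follow contemporary examples;
# treat them as assumptions and verify against your version.
#
# Launch with:
#   python -m bagua.distributed.launch --nproc_per_node=8 --enable-bagua-net train.py
import torch
import bagua.torch_api as bagua
from bagua.torch_api.algorithms.async_model_average import AsyncModelAverageAlgorithm

bagua.init_process_group()
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model = model.with_bagua([optimizer], AsyncModelAverageAlgorithm(sync_interval_ms=500))

for _ in range(100):
    loss = model(torch.randn(32, 128).cuda()).sum()
    loss.backward()
    optimizer.step()  # local step; a background thread averages models asynchronously
    optimizer.zero_grad()

model.bagua_algorithm.abort(model)  # stop the background communication thread before exit
```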
v0.7.0
v0.7.0-rc2
chore: requires bagua-core 0.4-0.5 now
v0.7.0-rc1
feat: make full precision decentralized op stateless (#126)
BREAKING CHANGE: `BaguaBucket.append_decentralized_synchronous_op` now only supports full precision decentralized communication.
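For context, a minimal sketch of how an algorithm might register the decentralized op on a bucket after this change. The `peer_weight`, `hierarchical`, and `peer_selection_mode` keywords and the `flattened_tensor()` helper are assumptions drawn from the algorithm implementations of that era; consult your version's docs.

```python
# Hypothetical sketch: registering the (now full precision only)
# decentralized op on a BaguaBucket after 0.7.0-rc1.
def init_operations(self, bagua_module, bucket):
    bucket.clear_ops()  # drop any previously registered communication ops
    self.peer_weight = bucket.flattened_tensor()  # assumed buffer receiving averaged peer weights
    bucket.append_decentralized_synchronous_op(
        peer_weight=self.peer_weight,
        hierarchical=True,           # intra-node reduce before inter-node exchange
        peer_selection_mode="all",   # average with all peers, full precision only
    )
```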