v0.26.0
What's New
1. Torch 2.5.0 Compatibility (#3609)
We've added support for torch 2.5.0, including necessary patches to Torch.
Deprecations and Breaking Changes
1. FSDP Configuration Changes (#3681)
We no longer support passing fsdp_config and fsdp_auto_wrap directly to Trainer.
If you'd like to specify an fsdp config and configure fsdp auto wrapping, you should use parallelism_config.
trainer = Trainer(
    parallelism_config={
        'fsdp': {
            'auto_wrap': True,
            ...
        },
    },
)
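For code that still builds the two legacy arguments separately, the migration can be sketched as a small adapter. This is only an illustration under stated assumptions: to_parallelism_config is a hypothetical helper, not part of Composer, and the 'sharding_strategy' entry stands in for whatever your existing fsdp_config contains.

```python
from typing import Any, Optional


def to_parallelism_config(
    fsdp_config: Optional[dict[str, Any]] = None,
    fsdp_auto_wrap: Optional[bool] = None,
) -> dict[str, Any]:
    """Hypothetical helper: fold the legacy arguments into one nested dict."""
    # Copy so the caller's legacy dict is not mutated.
    fsdp: dict[str, Any] = dict(fsdp_config or {})
    # Only set 'auto_wrap' when the legacy flag was given explicitly.
    if fsdp_auto_wrap is not None:
        fsdp['auto_wrap'] = fsdp_auto_wrap
    return {'fsdp': fsdp}


# Illustrative legacy values; substitute your own settings.
legacy_fsdp_config = {'sharding_strategy': 'FULL_SHARD'}
parallelism_config = to_parallelism_config(legacy_fsdp_config, fsdp_auto_wrap=True)
print(parallelism_config)
```

The resulting dict has the shape shown in the snippet above and would be passed as Trainer(parallelism_config=parallelism_config).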
2. Removal of PyTorch Legacy Sharded Checkpoint Support (#3631)
PyTorch briefly used a sharded checkpoint format that differs from the current one, then quickly deprecated it. We have removed support for this legacy format. Support for saving in this format was already removed in #2262; the original feature was added in #1902. Please reach out if you have concerns or need help converting your checkpoints to the new format.
What's Changed
- Add backward compatibility checkpoint tests for v0.25.0 by @dakinggg in #3635
- Don't use TP when tensor_parallel_degree is 1 by @eitanturok in #3636
- Update huggingface-hub requirement from <0.25,>=0.21.2 to >=0.21.2,<0.26 by @dependabot in #3637
- Update transformers requirement from !=4.34.0,<4.45,>=4.11 to >=4.11,!=4.34.0,<4.46 by @dependabot in #3638
- Bump databricks-sdk from 0.32.0 to 0.33.0 by @dependabot in #3639
- Remove Legacy Checkpointing by @mvpatel2000 in #3631
- Surface UC permission error by @b-chu in #3642
- Tensor Parallelism Tests by @eitanturok in #3620
- Switch to log.info for deterministic mode by @mvpatel2000 in #3643
- Update pre-commit requirement from <4,>=3.4.0 to >=3.4.0,<5 by @dependabot in #3645
- Update peft requirement from <0.13,>=0.10.0 to >=0.10.0,<0.14 by @dependabot in #3646
- Create callback to load checkpoint by @irenedea in #3641
- Bump jupyter from 1.0.0 to 1.1.1 by @dependabot in #3595
- Fix DB SDK Import by @mvpatel2000 in #3648
- Bump coverage[toml] from 7.6.0 to 7.6.3 by @dependabot in #3651
- Bump pypandoc from 1.13 to 1.14 by @dependabot in #3652
- Replace list with Sequence by @KuuCi in #3654
- Add better error handling for non-rank 0 during Monolithic Checkpoint Loading by @j316chuck in #3647
- Raising a better warning if train or eval did not process any data. by @ethantang-db in #3656
- Fix Logo by @XiaohanZhangCMU in #3659
- Update huggingface-hub requirement from <0.26,>=0.21.2 to >=0.21.2,<0.27 by @dependabot in #3668
- Bump cryptography from 42.0.8 to 43.0.3 by @dependabot in #3667
- Bump pytorch to 2.5.0 by @b-chu in #3663
- Don't overwrite sys.excepthook in mlflow logger by @dakinggg in #3675
- Fix pull request target by @b-chu in #3676
- Use a temp path to save local checkpoints for remote save path by @irenedea in #3673
- Loss gen tokens by @dakinggg in #3677
- Refactor maybe_create_object_store_from_uri by @irenedea in #3679
- Don't error if some batch slice has no loss generating tokens by @dakinggg in #3682
- Bump version to 0.27.0.dev0 by @irenedea in #3681
New Contributors
- @ethantang-db made their first contribution in #3656
Full Changelog: v0.25.0...v0.26.0