Releases: PaddlePaddle/PaddleNLP
v3.0.0-beta1
What's Changed
- [DCU] high performance LLM train and inference for DCU by @yuguo-Jack in #8580
- fix benchmark dir and add CUDA_DEVICE_MAX_CONNECTIONS to qwen by @fightfat in #8678
- bug fix by @wtmlon in #8687
- [XPU] add lora optimization by @dynamicheart in #8527
- [pir save] Modiy export llama model file in pir mode by @xiaoguoguo626807 in #8689
- [AutoParallel] Change `max_steps` in Llama2-7b config for auto-parallel. by @heavyrain-lzy in #8679
- [benchmark] Change the mirror source for pip by @mmglove in #8699
- update loss base of auto-parallel tests by @zhiqiu in #8701
- Add new mistral by @wtmlon in #7425
- [Safetensors] Fix safetensors shape by @DesmonDay in #8702
- [BUG] Round num_samples down to prevent prefetch from exceeding the maximum dataset length by @JunnYu in #8690
- xpu use allgather by @FeixLiu in #8697
- add fast_rmsnorm by @deepllz in #8680
- enable use_fast_layer_norm for llama2 benchmark by @deepllz in #8714
- fix xpu gather for unified ckpt by @FeixLiu in #8710
- [inference] support load or save Llama2-7b in three patterns by @lizexu123 in #8712
- fix fast_ln backward by @deepllz in #8719
- finetune support use_fast_layer_norm by @tianhaodongbd in #8717
- bug fix by @FeixLiu in #8730
- disable lora by @lugimzzz in #8674
- [Safetensors] Fix mmap for Windows system by @DrownFish19 in #8734
- correct broken links in readme by @jzhang533 in #8741
- revert benchmark fix by @ronny1996 in #8747
- [LLM] Add Yuan model by @zhaogf01 in #8654
- fix nlp dir and auto_parallel_ci exit -6 by @fightfat in #8744
- [LLM] Update sequence parallel linear import by @DrownFish19 in #8706
- [Bug fixes] Fix ring attention by @zhangyuqin1998 in #8740
- update a100 loss by @zhiqiu in #8708
- [PaddleNLP 3.0] Update README by @DrownFish19 in #8681
- [AutoParallel] update loss for global clip by @JZ-LIANG in #8750
- [NPU] Fix sequence parallel lib import by @DrownFish19 in #8760
- [DEV] Update develop version show by @DrownFish19 in #8754
- [inference] support load or save Llama2-7b in three patterns by @lizexu123 in #8766
- add benchmark baichuan2 scripts by @fightfat in #8683
- Add the missing truncation=True in llm/predictor.py by @lszxb in #8768
- fix the ce for the unittest by @wawltor in #8772
- Enable parallel_config to use commas as delimiters. by @Difers in #8677
- fix incorrect token counting in `llm/predictor.py` by @lszxb in #8769
- Refine savable by @ZHUI in #8758
- [CodeStyle] remove markdownlint-cli by @DrownFish19 in #8779
- [XPU] use allgather and fp32 multinomial for XPU by @houj04 in #8787
- fix version show by @DrownFish19 in #8791
- [BUG] Add 20 redundant data in post pretrain by @JunnYu in #8789
- vera-pissa method added by @TranscenderNing in #8722
- update version by @DrownFish19 in #8792
- [Inference LLM] refine some code in llama wint8/4 by @yuanlehome in #8796
- [DCU] Llama a8w8 inference performance optimization by @Deleter-D in #8800
- [Prediction] Update LLM prediction. by @DesmonDay in #8778
- [Trainer] Add enable_sp_async_reduce_scatter by @DesmonDay in #8803
- [AutoParallel] Refine auto_trainer save load by @zhangbo9674 in #8767
- [MoE] Optimizer parameter broadcast by @DesmonDay in #8810
- [Doc] Update README by @DrownFish19 in #8817
- support Llama3.1 8B 128K generation on single GPU 80GB by @GuoxiaWang in #8811
- add paddle nv-embed-v1 by @Li-Z-Q in #8785
- fix pad_token_id bug by @yuanlehome in #8814
- [DCU] fix llama inference bug on DCU by @Deleter-D in #8815
- [Doc] Add LLaMA3.1 by @DrownFish19 in #8824
- [BUG] Fix build train valid test datasets by @JunnYu in #8826
- Add tune_cublaslt_gemm operator by cublaslt gemm algorithm and generate algo cache file by @Hanyonggong in #8799
- fix tune_cublaslt_gemm compile bug by @yuanlehome in #8844
- [AutoParallel] Refine save and load ckpt for auto_trainer by @zhangbo9674 in #8828
- [Unified Checkpoint] update merge tensor parallel by @DesmonDay in #8856
- [Trainer] update clear_grad by @DesmonDay in #8829
- [Unified Checkpoint] Fix tie_word_embeddings by @DesmonDay in #8795
- [Inference LLM] support static c8 by @yuanlehome in #8833
- support sft mapdataset by @greycooker in #8840
- Cherry pick some changes from incubate branch by @sneaxiy in #8862
- support nested list of dict inputs by @deepllz in #8876
- Fix the bug with issues code 8641. by @smallbenxiong in #8880
- Fix the issue of P-tuning official sample error by @guangyunms in #8884
- modify Paddlemix qwen dytostatic by @xiaoguoguo626807 in #8869
- [llm]fix zeropadding by @lugimzzz in #8895
- Fix the error raised by the fast_ln operator when dynamic semi-auto parallel is enabled by @Wennie396 in #8891
- enable_sp_async_reduce_scatter for qwen_72b && llama2_70b by @deepllz in #8897
- Update run_pretrain.py by @ZHUI in #8902
- [doc] Update readme by @DrownFish19 in #8905
- [AutoParallel] Bugfix auto parallel FA by @JZ-LIANG in #8903
- [Readme] Update README.md by @ZHUI in #8908
- [cherry-pick] Optimize async save by @ForFishes in #8878
- [LLM Inference] Refactor BlockInferencePredictor by @yuanlehome in #8879
- [Fix] modify tensorboard requirements by @greycooker in #8904
- [LLM Inference] Support qwen2 by @yuanlehome in #8893
- modify dict include none to aviod pir dytostatic bug in while op by @xiaoguoguo626807 in #8898
- [LLM]Update yuan model by @zhaogf01 in #8786
- update qwen && baichuan benchmark config by @deepllz in #8920
- [doc] Update README by @DrownFish19 in #8922
- [ New features]Trainer support dict parameter by @greycooker in #8446
- set logging_step to 5 with baichuan && qwen benchmark by @deepllz in #8928
- [Cherry-pick]fix pipeline eval by @gongel in #8924
- fix test_wint8 ut by @yuanlehome in #8930
- [LLM Inference] support llama3.1 by @yuanlehome in #8929
- Fix tokens count for benchmark by @DrownFish19 in #893...
v3.0.0-beta0
We are pleased to announce v3.0.0-beta of the PaddlePaddle LLM toolkit: embracing large models with a fully upgraded experience. The main work includes:
- Unified LLM toolchain with end-to-end support for domestic compute chips;
- Full support for PaddlePaddle 4D parallel configuration, efficient fine-tuning strategies, efficient alignment algorithms, and high-performance inference across industrial-grade LLM workflows;
- Self-developed RsLoRA+ algorithm with extremely fast convergence, the auto-scaling Unified Checkpoint storage mechanism, and generalized FastFFN/FusedQKV support to accelerate LLM training and inference;
- Continued support and updates for mainstream models, with efficient solutions provided.
LLM fine-tuning, alignment, training, and inference optimizations
- PEFT:
- DPO:
- Domestic chip support:
- Performance optimizations:
- Other
- Added model memory monitoring in #8269
New models
- Added Gemma models in #8082
- google/gemma-7b
- google/gemma-7b-it
- google/gemma-2b
- google/gemma-2b-it
- Added Llama3 models
- meta-llama/Meta-Llama-3-8B
- meta-llama/Meta-Llama-3-8B-Instruct
- meta-llama/Meta-Llama-3-70B
- meta-llama/Meta-Llama-3-70B-Instruct
- Added Qwen2 models in #8338 #8584 #8601
- Qwen/Qwen1.5-0.5B
- Qwen/Qwen1.5-0.5B-Chat
- Qwen/Qwen1.5-1.8B
- Qwen/Qwen1.5-1.8B-Chat
- Qwen/Qwen1.5-4B
- Qwen/Qwen1.5-4B-Chat
- Qwen/Qwen1.5-7B
- Qwen/Qwen1.5-7B-Chat
- Qwen/Qwen1.5-14B
- Qwen/Qwen1.5-14B-Chat
- Qwen/Qwen1.5-32B
- Qwen/Qwen1.5-32B-Chat
- Qwen/Qwen1.5-72B
- Qwen/Qwen1.5-72B-Chat
- Qwen/Qwen1.5-110B
- Qwen/Qwen1.5-110B-Chat
- Qwen/Qwen1.5-MoE-A2.7B
- Qwen/Qwen1.5-MoE-A2.7B-Chat
- Qwen/Qwen2-0.5B
- Qwen/Qwen2-0.5B-Instruct
- Qwen/Qwen2-1.5B
- Qwen/Qwen2-1.5B-Instruct
- Qwen/Qwen2-7B
- Qwen/Qwen2-7B-Instruct
- Qwen/Qwen2-72B
- Qwen/Qwen2-72B-Instruct
- Qwen/Qwen2-57B-A14B
- Qwen/Qwen2-57B-A14B-Instruct
Framework upgrades
- Feature optimizations:
- AutoParallel optimizations
- Distributed capability optimizations:
- Chat capability optimizations:
- Added chat template in #8226
- Other
Bug fixes
- Fixed a bug when the sharding degree is less than 100 in #8146
- Fixed TP/PP parameter merging in #8239
- Fixed inconsistency between tensor.shape and paddle.shape(tensor) in #8260
- Fixed a bug with fp16 + delay_scale_loss_scale + sharding_stage1_overlap in #8314
- Added pipelines documentation and usage hints in #8292 #8308 #8202 #8353
- Fixed tokenizer input in the text feature extraction task in #8331
- Fixed import errors in #8332 #8367
Restructuring
PaddleNLP file structure adjustments in #8609 #8613 #8605 #8614 #8617 #8626 #8618 #8625 #8619 #8629 #8601 #8627 #8666
What's Changed
- [dist]pip requirements-dev.txt by @Liujie0926 in #8258
- add scaling by @lugimzzz in #8256
- [LLM]Support Gemma model by @Southpika in #8082
- [BugFix] Try except sequence parallel utils by @DesmonDay in #8189
- Update CodeCov GitHub Action by @sijunhe in #8268
- [AutoParallel] Open recompute strategy for llama model by @zhangbo9674 in #8265
- Fix sharding < 100 limitation bug by @sneaxiy in #8146
- use tensor.shape bug not paddle.shape(tensor) by @wanghuancoder in #8260
- [dist CI]update paddlenlp install for CI by @Liujie0926 in #8267
- [Bug Fix]Fix merge parameters in pp by @Southpika in #8239
- [LLM] add memory stats to logger of trainer by @SylarTiaNII in #8269
- Add p2p_comm_overlap for Llama-2-70b benchmark. by @Xreki in #8276
- add a100 test ground truth by @zhiqiu in #8249
- [paddle-pipelines] faq semantic search question answering reamde by @w5688414 in #8292
- [paddle-pipelines] Add pipelines documentation by @w5688414 in #8308
- Support llama-3 by @ZHUI in #8307
- [Distributed] [CustomDevices] Adapt SP on lora && polish MC2 APIs by @SylarTiaNII in #8303
- fix bug for fp16 + delay_scale_loss_scale + sharding_stage1_overlap by @FeixLiu in #8314
- [paddle-pipelines] Update mkdocs by @w5688414 in #8310
- [benchmark]update llama2_ips by @Liujie0926 in #8322
- [dist CI]fix before_hook by @Liujie0926 in #8283
- benchmark llama worker=1 by @wanghuancoder in #8305
- 【AutoParallel】Add llama2 UT for auto-parallel by @heavyrain-lzy in #8300
- Add system env log for llama test by @zhangbo9674 in #8321
- [LLM] Support fuse attention q, k, v weights by @DrownFish19 in #8202
- [Distributed] fix lora by @SylarTiaNII in #8325
- fix try import by @w5688414 in https://github.com/PaddlePaddle/Pa...
v2.8.1
What's Changed
- [Trainer] Fix sharding overlap bug by @DesmonDay in #8334
- [Cherry-pick] update truncate by @KB-Ding in #8375
- [BugFix] Fix llama3 `eot_id`. by @ZHUI in #8373
- [Trainer] update distributed dataloader by @DesmonDay in #8426
- [BugFix] Fix load rng compatibility. by @ZHUI in #8451
- Cherry pick/fast_safe_open by @ZHUI in #8458
- 【cherry pick】adapter new type promotion rule for Paddle 2.6 by @zxcd in #8463
- Quick fix from pretrained. by @ZHUI in #8487
- Release/2.8 by @Galaxy1458 in #8437
- Fix from_pretrained `os.path.split` by @DesmonDay in #8508
- [fea] Cherry-picked MOE updates from develop by @bo-ke in #8531
- [LLM] relocate tensor_parallel_output to avoid conflict (#8419) by @DesmonDay in #8533
- Update sequence_parallel for predict by @DesmonDay in #8547
- Cp/fix by @ZHUI in #8569
- Do not save moe_group by @DesmonDay in #8570
- [Release] 2.8.1 by @ZHUI in #8636
Full Changelog: v2.8.0...v2.8.1
v2.8.0
We are pleased to announce v2.8.0 of the PaddlePaddle LLM toolkit. This release deeply optimizes the toolkit's fine-tuning and alignment capabilities and improves LLM training and inference on domestic compute hardware:
- Specialized fine-tuning and efficient alignment: the self-developed, fast-converging RsLoRA+ algorithm greatly improves PEFT convergence speed and training quality; high-performance generation acceleration is integrated into the RLHF PPO algorithm, removing the generation bottleneck in PPO training and delivering substantially leading PPO training performance.
- Faster LLM training: generalized support for FastFFN, FusedQKV, and other training performance optimizations makes LLM training faster and more stable.
LLM fine-tuning, alignment, training, and inference optimizations
- Fine-tuning
- Inference
- Added static-graph inference for QWenVL #7808
New models
- Added static-graph inference for QWenVL #7808
- Added Deberta and Deberta-v2 models #8227
- deepset/deberta-v3-large-squad2
- microsoft/deberta-v2-xlarge
- microsoft/deberta-v3-base
- microsoft/deberta-v3-large
- microsoft/deberta-base
- Added Mixtral of experts #7803
- mistralai/Mixtral-8x7B-Instruct-v0.1
- mistralai/Mixtral-8x7B-v0.1
- Added LLaMA3 #8315
- meta-llama/Meta-llama-3-8b
- meta-llama/Meta-Llama-3-8B-Instruct
- meta-llama/Meta-llama-3-70b
- meta-llama/Meta-Llama-3-70B-Instruct
Framework upgrades
- Trainer upgrades
- AutoParallel upgrades
- Other
Other support
- Added a matryoshka representation learning retrieval strategy, saving compute and storage resources. #8165
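The matryoshka idea is that an encoder is trained so that the leading dimensions of its embedding are usable on their own as a cheaper, lower-dimensional embedding. A minimal numpy sketch of the truncate-and-renormalize step (shapes and names here are illustrative, not PaddleNLP's API):

```python
import numpy as np

def truncate_embedding(emb, dim):
    """Keep only the first `dim` dimensions and L2-renormalize.

    For a matryoshka-trained encoder, this prefix is itself a usable
    embedding for retrieval at a fraction of the storage cost.
    """
    sub = emb[..., :dim]
    norm = np.linalg.norm(sub, axis=-1, keepdims=True)
    return sub / np.clip(norm, 1e-12, None)

# A full 768-d embedding can be shrunk to 256-d for cheaper indexing.
full = np.random.randn(4, 768)
small = truncate_embedding(full, 256)
assert small.shape == (4, 256)
```

The renormalization keeps cosine-similarity scoring meaningful after truncation.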
问题修复
- 日志级别修改,并增加timelog计时日志,兼容不同设备。#8261
- 修复pipeline并行中随机初始化的shared weights不一致的问题,覆盖GPT/OPT等模型。#7772
- 关闭CI及单测中从huggingface hub下载的逻辑 #7798 #8198
- 修复llm的gradio开启chat template时候重复拼接query 和 history的问题。#7992
- 修复GPT模型下载key error问题。#8253
- 修复LlamaRotaryEmbedding #7882
- 修复allreduce dtype的问题 #7876
- 修复框架侧dev分支清理 paddle.jit.dy2static.utils_helperAPI的问题 #7989
- 修复read-data timer在ignore_data_skip=False and skip_profile_timer=False 的问题。#8177
- 修复Wandb单测问题 #8066 #8056
- 修复Trainer同时解析json与命令行列表参数报错问题#7860
- 修复Gradio UI 中的推理问题 #7740 #7788
- 修复 Tokenizer 相关的基础问题 #7797 7870
- 修复 custom devices上loading rng state的问题。#7894
- 修复自动并行打印BF16的loss编码错乱的问题#7874
- 采用float初始化模型,修复静态图自动并行AMP报错问题#8033#8199
- 修复ShardDataloader接口在PipeLine Parallelism下使用错误问题#8014
- 修复llama在custom devices的精度问题。#7895
- 修复NPU AICPU算子问题 #7976
- 修复FusedLinearWithGradAdd少传参数的问题。#8178
What's Changed
- [Unified Checkpoint] Add unified checkpoint training args doc. by @DesmonDay in #7756
- [AutoParallel] Auto Trans PP to VPP by @zhaoyinglia in #7747
- Add codecov check by @zjjlivein in #7760
- [CE] Delete gpt_for_sequence_classification by @ZHUI in #7757
- [DOC] Update trainer.md by @ZHUI in #7761
- [Release] Change version to 2.7.0 by @ZHUI in #7764
- [benchmark]close skip_memory_metrics for ips by @Liujie0926 in #7732
- [Release] Update release.yml to release tags by @ZHUI in #7765
- [AutoParallel] Add Sequence Parallel for Static LLaMA by @JZ-LIANG in #7746
- [New Features] support dynamic src_length by @wj-Mcat in #7740
- Fix unified_checkpoint bug by @DrownFish19 in #7770
- [DONE] aistudio, hf hub, bos update download by @JunnYu in #7608
- [Trainer] Fix dist dataloader eval by @DesmonDay in #7777
- [Paddle-pipelines] Update convert_files_to_dicts_splitter by @w5688414 in #7748
- [PEFT]fix lora model tp when existing other trainable module by @lugimzzz in #7781
- [Paddle-Pipelines] update faiss by @qingzhong1 in #7793
- Fix shared weights sync for PipelineLayer by @DrownFish19 in #7772
- [tests] download slow by @JunnYu in #7798
- [INFER][LLM] Support qwen in fined grained dybatch v1 by @DanGuge in #7644
- Add CE for Distributed Hybrid Parallel by @iosmers in #7782
- add MP2-SP2-pp4-vpp2-SD2-stage1-mbs2-acc8 ce by @tianhaodongbd in #7774
- [Pretrain] Fix eval during pretrain by @DesmonDay in #7806
- pipeline parallel benchmark by @zhangting2020 in #7759
- [Bug fixes] fix br gradio by @wj-Mcat in #7788
- delete useless code for write_cache_kv.cu by @yuanlehome in #7812
- [llm]support qlora pp by @lugimzzz in #7801
- Trainer support simultaneously parse JSON files and cmd arguments. by @greycooker in #7768
- [LLM] Support block_attention/cachekv quant for llama by @RichardWooSJTU in #7649
- [Bug Fix] fix paddle multipy_fwd_func warning message by @BeingGod in #7818
- [llm]fix lora by @lugimzzz in #7824
- fused rms spmd by @liuzhenhai93 in #7830
- [Pretrain] Fix eval during pretrain by @DesmonDay in #7827
- [neural search][fix bug of evaluate.py] by @ZeyuTeng96 in #7832
- [neural search] fix the bug of reading files when calculating the recall scores by @shenghwa in #7836
- [Bug fixes] update chatglm tokenizer by @wj-Mcat in #7797
- [semantic_indexing] fix bug of evaluate.py by @ZeyuTeng96 in #7843
- [faq] fix bug of evaluate.py by @ZeyuTeng96 in #7840
- [text_classification_retrieval_based] fix bug of evaluate.py by @ZeyuTeng96 in #7844
- [LLM] add Qwen-7B-Chat to PaddleNLP unit test by @ziangqin-baidu in #7823
- Support 5.2 bloom by @zhoutianzi666 in #7846
- [unified checkpoint] Fix last checkpoint save by @DrownFish19 in #7854
- [unified checkpoint] fix checkpoint names by @DrownFish19 in #7795
- [New Features]add ranks testing for test_predictor by @wj-Mcat in #7800
- [Auto Parallel] Support dynamic semi-auto training in Llama2 model by @haohongxiang in #7851
- [CI] add ci approval pipelines by @zjjlivein in #7859
- [fix] fix a bug of trainer/argparser.py by @greycooker in #7860
- [Improvement] fix ops improting in utils by @wj-Mcat in #7865
- [Add CE] Add CE for Hybrid Parallism by @iosmers in #7817
- [Unified Checkpoint] Cherry pick empty cache. by @ZHUI in #7868
- Add PPO training. by @guoshengCS in #7305
- Update reward_main.py by @wawltor in #7880
- Update ppo_main.py by @wawltor in #7881
- [LLM] revert benchmark codes by @RichardWooSJTU in #7871
- [LLM]support QWenVL second part by @DanGuge in #7808
- [Bug Fixes] update chatglm1 tokenizer by @wj-Mcat in #7870
- 【AutoParallel】Support 'master_grad' in Llama in static auto-parallelism by @heavyrain-lzy in #7658
- [Bug Fix] fix slice bug in LlamaRotaryEmbedding by @MarioLulab in #7882
- 【AutoParallel】Support bf16 loss in static by @heavyrain-lzy in #7874
- [Bug Fix] fix allreduce tensor dtype by @BeingGod in #7876
- [CE] Add Qwen into CE process by @ziangqin-baidu in #7887
- [Hackathon 5th No.73] ToT by @ErnestinaQiu in #7660
- [CustomDevice] fix loading rng state on custom devices by @SylarTiaNII in #7894
- [LLM] ...
v2.7.2
This release fixes a number of minor issues.
What's Changed
- [Unified Checkpoint] fix checkpoint names by @DrownFish19 in #7794
- [Unified Checkpoint] Fix last checkpoint save by @DrownFish19 in #7810
- [PEFT] Cherry pick lora fix by @lugimzzz in #7826
- [Unified Checkpoint] Fix unified checkpoint by empty cache. by @ZHUI in #7855
- [Fix Download] update converted logic & fix hf hub download subfolder bug by @JunnYu in #7911
- [Cherry-pick] logger level by @KB-Ding in #7920
- [Cherry-pick] RuntimeTimer for the toolkit (#7913) by @KB-Ding in #7921
- [Release] 2.7.2 for paddlenlp bugfix. by @ZHUI in #7892
Full Changelog: v2.7.1...v2.7.2
v2.7.1
This release fixes a number of minor issues.
What's Changed
- Fixed several issues encountered when resuming training @ZHUI in #7771
- Fixed GPT initialization under pipeline mode @DrownFish19 in #7775
- Fixed dist dataloader evaluation issues. @DesmonDay in #7778
Full Changelog: v2.7.0...v2.7.1
PaddleNLP 2.7.0 Release Note
We are pleased to announce v2.7.0 of the PaddlePaddle LLM toolkit. This release deeply optimizes the toolkit's LLM capabilities, with major improvements in usability, performance, and stability.
Highlights of this release:
- Unified LLM toolchain entry point. The implementations of pretraining, fine-tuning, compression, inference, and deployment are consolidated under the PaddleNLP/llm directory.
- Brand-new LLM toolchain documentation, guiding users end to end from getting started to production deployment. See: https://paddlenlp.readthedocs.io/zh/latest/llm/finetune.html
- Unified Checkpoint storage mechanism. Model weights, optimizer weights, and other state are stored in a unified safetensors format regardless of the distributed strategy, with support for dynamic scaling when resuming training, greatly improving checkpoint portability.
- Upgraded efficient fine-tuning: efficient fine-tuning can now be combined with LoRA, and algorithms such as QLoRA are supported.
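For readers unfamiliar with LoRA, the technique replaces a full weight update with a trainable low-rank correction on top of a frozen base weight. A minimal numpy sketch (shapes, `alpha`, and `r` are illustrative defaults, not PaddleNLP's configuration):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0, r=8):
    """LoRA: y = x @ W + (alpha / r) * (x @ A) @ B.

    W is the frozen base weight; only the low-rank factors A (d_in x r)
    and B (r x d_out) are trained, shrinking the trainable parameter
    count from d_in * d_out to r * (d_in + d_out).
    """
    return x @ W + (alpha / r) * (x @ A) @ B

d_in, d_out, r = 64, 64, 8
x = np.random.randn(2, d_in)
W = np.random.randn(d_in, d_out)
A = np.random.randn(d_in, r) * 0.01
B = np.zeros((r, d_out))  # B starts at zero, so initially y == x @ W
np.testing.assert_allclose(lora_forward(x, W, A, B), x @ W)
```

Initializing B to zero is the standard choice: training starts from the exact base model and the low-rank path grows from there.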
End-to-end LLM training and inference
- Pretraining
  - Unified the pretraining entry point to llm/run_pretrain.py.
  - Added pretraining support for models such as qwen, with flash attention support.
- Fine-tuning
  - LoRA can now be combined with Linear quantization
  - Pipeline-parallel models can now be trained together with LoRA
  - Added the NEFTune method
  - Added QLoRA support
- Compression
  - Supports PTQ and QAT quantization, including A8W8, WINT8, WINT4, and A8W4
  - Supports quantization algorithms such as SmoothQuant, GPTQ, and AWQ
Unified Checkpoint
- Large models are usually trained with multi-card distributed setups, so checkpointed model weights are saved as shards, e.g. split according to tensor parallelism or pipeline parallelism. Storing checkpoints directly according to the distributed strategy is straightforward, but has drawbacks:
  - It is unfriendly to downstream inference: when users want to run inference from an intermediate checkpoint, they must merge the sharded weights manually.
  - It handles resumed training poorly when the distributed strategy or the number of training nodes changes; users often need to process the checkpoint manually, adding operational complexity.
- To address these problems and reduce user effort, we upgraded the LLM storage framework with a unified storage scheme, Unified Checkpoint. Its core idea is to store model weights, optimizer weights, and other state in a unified safetensors format, without distinguishing distributed strategies at save time, improving checkpoint portability.
- Unified Checkpoint provides the following features:
  - Weight storage is independent of the distributed strategy and uses the unified safetensors format;
  - Flexible support for scaling training up or down, adapting to switches between different distributed training strategies.
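The scaling behavior can be illustrated with a toy numpy sketch: merge tensor-parallel shards into one strategy-agnostic weight at save time, then re-slice for whatever parallel degree the resumed job uses. Shapes and the shard layout here are hypothetical; the real implementation persists safetensors files.

```python
import numpy as np

# Two tensor-parallel shards of one linear weight, as saved per card (TP=2).
shard0 = np.arange(8, dtype=np.float32).reshape(4, 2)
shard1 = np.arange(8, 16, dtype=np.float32).reshape(4, 2)

# Unified save: merge along the TP axis so the stored weight is
# independent of the distributed strategy that produced it.
merged = np.concatenate([shard0, shard1], axis=1)  # shape (4, 4)

# Resuming with a different TP degree just re-slices the merged weight,
# e.g. scaling out from TP=2 to TP=4.
new_shards = np.split(merged, 4, axis=1)
assert len(new_shards) == 4 and new_shards[0].shape == (4, 1)
```

Because the saved artifact is a single merged tensor per weight, the same file serves direct inference, TP=2 resumption, or TP=4 resumption without manual conversion.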
New models
- moka-ai/m3e-base retrieval model
- BAAI/bge-small-zh-v1.5 retrieval model
Framework upgrades
- Trainer upgrades
  - Supports "--skip_memory_metrics 0" to display real-time GPU and host memory usage
  - Supports "--unified_checkpoint" and "--unified_checkpoint_config" for model saving under hybrid parallelism and restarts with dynamic scaling.
  - Added the PretrainModelPipe base class to support pipeline-parallel training.
Other support
- Added display of the paddlenlp commit id via paddlenlp.version.commit
- Supports AI Studio download and saving to the AI Studio hub
Bug fixes
- Fixed several dist_dataloader issues
- Fixed several dynamic-to-static model conversion issues
- Fixed several GPT training bugs and removed GPT2; fixed some seed-setting issues
- Fixed several baichuan issues under pipeline parallelism.
New Contributors
- @Wennie396 made their first contribution in #6897
- @Wong4j made their first contribution in #7008
- @yuanlehome made their first contribution in #7080
- @Xreki made their first contribution in #7105
- @Tom-Zheng made their first contribution in #7092
- @TimeYWL made their first contribution in #7122
- @From00 made their first contribution in #7168
- @RichardWooSJTU made their first contribution in #7186
- @heavyrain-lzy made their first contribution in #7269
- @LokeZhou made their first contribution in #7337
- @JZ-LIANG made their first contribution in #7301
- @WAI-clear made their first contribution in #7402
- @tianhaodongbd made their first contribution in #7293
- @zzjjay made their first contribution in #7504
- @anexplore made their first contribution in #7558
- @niuliling123 made their first contribution in #7528
- @zxcd made their first contribution in #7577
- @MayYouBeProsperous made their first contribution in #7575
- @iosmers made their first contribution in #7613
- @AndSonder made their first contribution in #7343
- @zhink made their first contribution in #7679
- @kingTLE made their first contribution in #7708
Full Changelog: v2.6.1...v2.7.0
v2.6.1
What's Changed
v2.6.1 includes numerous bug fixes that improve the stability of LLM models and related components. Beyond bug fixes, the main new features are:
- LLM: added the qwen model; the InTokens data flow is now compatible with Pipeline Parallel; LLM fine-tuning supports loading from multiple training files and warm starts; enhanced the different recompute granularities for the LLaMA model
- Trainer: added the hybrid_parallel_topo_order option and fixed model saving with sharding stage3.
- Paddle-pipelines: added support for ERNIE-Bot-turbo and ERNIE-embedding, updated the hierarchical search example, and enhanced the ChatPaper UI
- Megatron datasets: added support for loading megatron datasets, covering the ernie-1.0 and T5 data formats
New Contributors
- @xiezheng-XD made their first contribution in #6764
- @carryyu made their first contribution in #6676
- @xiaoxiaohehe001 made their first contribution in #6798
- @MARD1NO made their first contribution in #6865
- @zhoutianzi666 made their first contribution in #6905
- @lchdl made their first contribution in #6964
- @LaiXinyi823 made their first contribution in #6659
Full Changelog: v2.6.0...v2.6.1
v2.6.0
PaddleNLP 2.6: a major upgrade into the era of large models!
We are excited to announce that PaddleNLP 2.6 is now fully upgraded and officially released! This release marks our formal entry into the era of large models. PaddleNLP 2.6 introduces a brand-new end-to-end PaddlePaddle LLM toolchain covering pretraining, fine-tuning, compression, inference, and deployment, providing users with a complete large-model solution.
The toolchain fully supports mainstream large models such as LLaMA 1/2, BLOOM, ChatGLM 1/2, GLM, and OPT, letting users try a variety of models at low cost with a single set of tools.
To support this toolchain, we made extensive upgrades on the underlying framework side:
- We upgraded the Trainer API into a 4D-parallel distributed Trainer, making model training more efficient.
- We implemented the efficient fine-tuning algorithms LoRA and Prefix Tuning, enabling fine-tuning of hundred-billion-parameter models on a single machine.
- Building on PaddleSlim's self-developed quantization algorithms, we achieved lossless quantization across all supported large models.
These upgrades make it easier for our users to train, optimize, and deploy models in the large-model era. We look forward to your trials and feedback as we advance PaddleNLP together. Between versions 2.5 and 2.6, PaddleNLP gained 40 new contributors; thank you all for supporting PaddleNLP's open-source work!
New Contributors
- @zws-2019 made their first contribution in #5167
- @qiuwenbogdut made their first contribution in #5098
- @kuizhiqing made their first contribution in #5347
- @46319943 made their first contribution in #5419
- @jiaohuix made their first contribution in #5465
- @kangguangli made their first contribution in #5438
- @vivienfanghuagood made their first contribution in #5563
- @zhiboniu made their first contribution in #5470
- @cyber-pioneer made their first contribution in #5598
- @invokerbyxv made their first contribution in #5622
- @megemini made their first contribution in #5658
- @zhenyun-li made their first contribution in #5683
- @solrex made their first contribution in #5736
- @nemonameless made their first contribution in #5487
- @Yulv-git made their first contribution in #5709
- @wangxinxin08 made their first contribution in #5773
- @AlphaHinex made their first contribution in #5815
- @houj04 made their first contribution in #5820
- @Joker1718 made their first contribution in #5816
- @pkuzyc made their first contribution in #5538
- @jadepeng made their first contribution in #5841
- @KB-Ding made their first contribution in #5886
- @parap1uie-s made their first contribution in #5775
- @zirui made their first contribution in #5866
- @GOH-Gu made their first contribution in #5951
- @yangjianfengo1 made their first contribution in #6069
- @zhangting2020 made their first contribution in #5922
- @rogerserper made their first contribution in #6192
- @wtmlon made their first contribution in #6258
- @qingzhong1 made their first contribution in #6251
- @BeingGod made their first contribution in #6307
- @zhiqiu made their first contribution in #6347
- @DesmonDay made their first contribution in #6435
- @cyk1337 made their first contribution in #6447
- @lxp521125 made their first contribution in #6491
- @littsk made their first contribution in #6425
- @RachelXu7 made their first contribution in #6572
- @wanghuancoder made their first contribution in #6539
- @DrownFish19 made their first contribution in #6570
- @GhostScreaming made their first contribution in #6673
Full Changelog: v2.5.2...v2.6.0