[LLM] add memory stats to logger of trainer #8269

SylarTiaNII · 2024-04-12T12:15:19Z

PR types

Others

PR changes

Others

Description

Add memory stats to the trainer's logger for performance monitoring on CE.

paddle-bot · 2024-04-12T12:15:24Z

Thanks for your contribution!

ZHUI · 2024-04-15T03:25:52Z

PaddleNLP/paddlenlp/trainer/trainer.py

Lines 1298 to 1310 in 87a92a0

    
    logs.update(
        {
            "cpu_mem_used": self._memory_tracker.cpu_mem_used() >> 20,
            "cpu_mem_used_peak": self._memory_tracker.cpu_mem_used_peak >> 20,
        }
    )
    if is_paddle_cuda_available():
        logs.update(
            {
                "gpu_max_memory_allocated": paddle.device.cuda.max_memory_allocated() >> 20,
                "gpu_max_memory_reserved": paddle.device.cuda.max_memory_reserved() >> 20,
            }
        )

This logging already exists further down.
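For readers unfamiliar with the `>> 20` idiom in the snippet above: shifting a byte count right by 20 bits is floor division by 2**20, so the logged values are whole MiB. The following is a minimal, standalone sketch of that conversion, not the trainer's actual code. The helper names `bytes_to_mib` and `build_memory_logs` are hypothetical, and the raw byte counts are passed in as plain integers so the example runs without Paddle installed.

```python
from typing import Dict, Optional


def bytes_to_mib(num_bytes: int) -> int:
    """Convert a byte count to whole MiB; `>> 20` is floor division by 2**20."""
    return num_bytes >> 20


def build_memory_logs(
    cpu_used: int,
    cpu_peak: int,
    gpu_allocated: Optional[int] = None,
    gpu_reserved: Optional[int] = None,
) -> Dict[str, int]:
    """Hypothetical helper: assemble memory stats (in MiB) for a log dict."""
    logs = {
        "cpu_mem_used": bytes_to_mib(cpu_used),
        "cpu_mem_used_peak": bytes_to_mib(cpu_peak),
    }
    # GPU entries are added only when both values are supplied, standing in
    # for the is_paddle_cuda_available() guard in the quoted snippet.
    if gpu_allocated is not None and gpu_reserved is not None:
        logs["gpu_max_memory_allocated"] = bytes_to_mib(gpu_allocated)
        logs["gpu_max_memory_reserved"] = bytes_to_mib(gpu_reserved)
    return logs
```

In a real trainer the integer arguments would come from the memory tracker and the `paddle.device.cuda` calls shown in the quoted lines. Because the shift truncates, any value under 1 MiB is logged as 0.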

wawltor

LGTM


* [XPU] llama add xpu support (#8282)
* [XPU] llama add xpu support
* fix
* use try import
* fix
* refine
* refine
* refine
* refine
* update (#8399)
* [LLM] Support fuse attention q, k, v weights (#8202) 1. add use-interface & fuse action 1.1. modify 1., code order 2. switch to name_mapping 3. solve tp branch 3.2 follow hui, handel qkv separately 3.3 handle pdparams 3.4 from torch 3.5 abandon low_cpu_mem_usage 3.6 solve shard branch
* 3.6.1 solve shard branch after rebase develop
* code clean
* remove debug comment
* Redefine fuse and split functions
* Redefine fuse and split functions
* comment and fix
* update method
* update QKV fuse and split
* support fuse weights in multi-files
* add precision compare
* simplify function call
* support use_fast_ffn
* clean modeling and configuration
* add test for gpt and opt
* fix tp_actions get
* add fast_ffn test
* add Qwen2Moe
* Revert "add Qwen2Moe" This reverts commit 113b883.
* add test for split
* update doc
* update filter_dict_keys
---------
Co-authored-by: Zii <ziangqin.baidu@gmail.com>
* [LLM] Fix fuse or split with same key (#8378)
* fix fuse or split with same key
* fix
* fix eps
* update format
* [LLM] add decay steps option for finetuning (#8251)
* [LLM] add memory stats to logger of trainer (#8269)
* [Distributed] fix lora (#8325)
* [LLM] fix lora target modules on llama (#8372)
* [Distributed] metric calculation supports tp logits (#8370)
* Update model_utils.py
* Update model_utils.py
* Update model_utils.py
---------
Co-authored-by: Jianbang Yang <yangjianbang112@gmail.com>
Co-authored-by: DrownFish19 <DrownFish19@gmail.com>
Co-authored-by: Zii <ziangqin.baidu@gmail.com>
Co-authored-by: Tian <121000916+SylarTiaNII@users.noreply.github.com>

[LLM] add memory stats to logger of trainer

87a92a0

SylarTiaNII force-pushed the add_memory_stats_to_log branch from 894c62d to 87a92a0 on April 12, 2024 13:46

wawltor approved these changes Apr 17, 2024


wawltor merged commit beb433a into PaddlePaddle:develop Apr 17, 2024
7 of 8 checks passed

ZHUI added a commit that referenced this pull request Apr 17, 2024

Revert "[LLM] add memory stats to logger of trainer (#8269)"

3ad4984

This reverts commit beb433a.
