
When evaluating the same checkpoint, different batch sizes give different results; how to handle this problem? #33

Open
baiyuting opened this issue Nov 13, 2023 · 0 comments

baiyuting commented Nov 13, 2023

I want to get a stable eval result. However, when I evaluate the same checkpoint, different batch sizes (such as 32 and 64) give different results. I tried setting generation_temperature to 0, but the results still change when I change the batch size from 60 to 32. How can I prevent this from happening?
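Is there a set of determinism settings that would fix this? What I have in mind is something like the sketch below (the helper name is mine; the torch calls are standard PyTorch, though I am not sure they cover the 4-bit kernels):

import os
import random

import numpy as np
import torch

def set_determinism(seed: int = 42) -> None:
    # Seed every RNG the eval path might touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Prefer deterministic kernels; warn instead of erroring when an
    # op has no deterministic implementation.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True, warn_only=True)
    # Required for deterministic cuBLAS matmuls; must be set before
    # the first CUDA call.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

But even with all of that, I am not sure it addresses differences that come from the batch size itself.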

Below is my command:

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1 --master_port 11112 eval_base.py \
    --ckpt_dir ../data/weights/ \
    --llm_model 7B \
    --tokenizer_path ../data/weights/tokenizer.model \
    --data_root ../data \
    --caption_file ../data/captions.json \
    --adapter_path ./xxxx/checkpoint-19.pth \
    --adapter_type attn \
    --adapter_dim 8 \
    --adapter_scale 1 \
    --prompt_format QCM-ALE \
    --max_batch_size 32 \
    --max_seq_len 512 \
    --split test \
    --n_prompt 50 \
    --temperature 10. \
    --generation_temperature 0. \
    --visual_adapter_type router \
    --bits 4bit \
    --cpu_load
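
For reference, this toy check (a hypothetical two-layer stand-in, not the real model from eval_base.py) shows the effect I mean: the same inputs run as one batch of 64 versus two batches of 32 can produce slightly different logits, and a greedy argmax can flip when two logits are nearly tied:

import torch

torch.manual_seed(0)
model = torch.nn.Sequential(  # hypothetical stand-in for the real forward pass
    torch.nn.Linear(32, 64), torch.nn.GELU(), torch.nn.Linear(64, 100)
).eval()

x = torch.randn(64, 32)  # 64 "samples"
with torch.no_grad():
    logits_full = model(x)                                    # one batch of 64
    logits_split = torch.cat([model(x[:32]), model(x[32:])])  # two batches of 32

diff = (logits_full - logits_split).abs().max()
flips = (logits_full.argmax(-1) != logits_split.argmax(-1)).sum()
print(f"max |diff| = {diff:.3e}, greedy-argmax flips = {flips}")
# Usually exactly zero on plain CPU paths; on CUDA / fused / quantized
# kernels the reduction order can depend on the batch shape, so the
# difference is often small but nonzero, and an argmax can flip when
# two logits are nearly tied.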
