atomic energy after linear regression #4069

chtchelkatchev · 2024-08-20T22:50:01Z

Hi,
At the beginning of the finetuning of a deepmd model on two GPUs I get the following info in the logs:
[1] DEEPMD rank:1 INFO RMSE of atomic energy after linear regression is: 0.023360175044598894 eV/atom.
[1] DEEPMD rank:1 INFO Change energy bias of ['B', 'S', 'Se'] from [-11.62286998 -1.07731472 21.40182854] to [-11.62282221 -1.04902143 21.44841567].

[0] DEEPMD rank:0 INFO RMSE of atomic energy after linear regression is: 0.029234642187003657 eV/atom.
[0] DEEPMD rank:0 INFO Change energy bias of ['B', 'S', 'Se'] from [-11.62286998 -1.07731472 21.40182854] to [-11.5774448 -1.28134359 21.26522962].

The problem is that the calculated energy shifts are very inaccurate and depend on the task (MPI rank). This is probably caused by insufficient statistics (the dataset itself is good). I found the parameters "data_stat_nch", "data_protect" and "bias_sample" in the manual. By increasing "bias_nsample" to 1000, I got better statistics and more consistent energy shifts across ranks. Could you please expand on the meaning of the parameters "data_stat_nch", "data_protect" and "bias_sample" in the manual and provide recommendations on how to improve the accuracy of the calculated energy shifts?
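To illustrate why the sample size matters, here is a minimal, self-contained sketch of a per-element least-squares bias fit. It is not the DeePMD-kit implementation: the data are synthetic, the helper names (`make_frames`, `fit_bias`, `bias_spread`) are hypothetical, and the "true" bias values are illustrative numbers only. It fits a bias per atom type from frame energies and atom counts, then compares how much the fitted values scatter when only a few frames are sampled versus many.

```python
import random

# Illustrative per-atom biases for three types (eV/atom); NOT real DeePMD-kit output.
TRUE_BIAS = [-11.6, -1.05, 21.4]


def make_frames(n_frames, noise, rng):
    """Build synthetic frames: random atom counts per type and noisy total energies."""
    counts, energies = [], []
    for _ in range(n_frames):
        n = [rng.randint(5, 40) for _ in TRUE_BIAS]
        e = sum(c * b for c, b in zip(n, TRUE_BIAS)) + rng.gauss(0.0, noise)
        counts.append(n)
        energies.append(e)
    return counts, energies


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_bias(counts, energies):
    """Least-squares bias per type: minimise sum_f (E_f - sum_t N_ft * b_t)^2."""
    nt = len(counts[0])
    ata = [[sum(r[i] * r[j] for r in counts) for j in range(nt)] for i in range(nt)]
    atb = [sum(r[i] * e for r, e in zip(counts, energies)) for i in range(nt)]
    return solve_linear(ata, atb)


def bias_spread(counts, energies, n_sample, n_repeat, rng):
    """Largest (max - min) of any fitted bias over repeated random subsamples."""
    fits = []
    for _ in range(n_repeat):
        idx = rng.sample(range(len(counts)), n_sample)
        fits.append(fit_bias([counts[i] for i in idx], [energies[i] for i in idx]))
    return max(
        max(f[t] for f in fits) - min(f[t] for f in fits)
        for t in range(len(TRUE_BIAS))
    )


rng = random.Random(0)
counts, energies = make_frames(2000, 0.5, rng)
full_fit = fit_bias(counts, energies)
spread_small = bias_spread(counts, energies, 10, 5, rng)
spread_large = bias_spread(counts, energies, 1000, 5, rng)
print("fit on all frames:", full_fit)
print("spread with 10 frames per fit:", spread_small)
print("spread with 1000 frames per fit:", spread_large)
```

In this toy setting each random subsample plays the role of one rank or one restart. Fits based on a handful of frames differ noticeably from run to run, while fits based on many frames agree closely, which matches the behaviour described above.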

A big request for the next version of deepmd: please ensure that, by default, the energy shift is calculated as accurately as possible during fine-tuning (that is, across the entire dataset).

iProzd · 2024-08-21T02:56:40Z

iProzd
Aug 21, 2024
Collaborator

Hi @chtchelkatchev, thank you for your question. Indeed, the documentation for these parameters needs to be improved. However, I have a concern: theoretically, only rank 0 should perform the statistics, but it appears that all ranks are doing this in your case. Could you please provide your execution command and code version (and an input example if possible)? Have you followed the multi-GPU training documentation properly?

1 reply

chtchelkatchev Aug 21, 2024
Author

Hi, I'm using the latest version of deepmd, 2.2.11. I also tried training with only one task; in that case the energy shift calculated by deepmd changes every time I start training. For the calculation on two MPI ranks I used the command:
mpirun -l -launcher=fork -hosts=localhost -np $SLURM_NTASKS dp train --mpi-log=workers $file --finetune ../frozen_model.pb
where all variables were defined before...
