Tips for training large-scale face recognition model, such as millions of IDs(classes). #1426

Open

nttstar (Collaborator) opened this issue Mar 8, 2021 · 0 comments

When training ArcFace models with millions of IDs (classes), we may run into memory and time-efficiency problems.

=====
P1: There are too many classes for the GPUs to handle.

Solutions:

  1. To reduce the memory usage of the classification layer, model parallelism and partial-fc are good options.

  2. Enabling FP16 further reduces GPU memory usage and also brings acceleration on modern NVIDIA GPUs. For example, fp16 training can be enabled with a single fp16-scale parameter:

export CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' 
python -u train_parall.py --network r50 --dataset emore --loss arcface --fp16-scale 1.0

or change the following setting in the partial-fc MXNet implementation:

config.fp16 = True

  3. Use distributed training.
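As a rough illustration of the partial-fc idea, here is a NumPy sketch (not the repository's implementation; all sizes and names are made up for the example). Each step computes logits against only a sampled subset of class centers: the positive classes of the current batch are always kept, and the rest of the sample is filled with random negatives, so the classification layer never touches all class centers at once.

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes = 100_000  # total identities (toy size for the sketch)
emb_dim = 512          # embedding size
sample_rate = 0.1      # fraction of class centers used per step

# Full class-center matrix; with model parallelism each GPU
# would hold only a shard of these rows.
centers = rng.standard_normal((num_classes, emb_dim)).astype(np.float32)

def partial_fc_logits(embeddings, labels):
    """Compute logits against a sampled subset of class centers.

    Positive classes in the batch are always kept; the remainder of
    the sample is filled with random negative classes.
    """
    num_sample = int(num_classes * sample_rate)
    positives = np.unique(labels)
    negatives = rng.choice(
        np.setdiff1d(np.arange(num_classes), positives),
        size=num_sample - len(positives), replace=False)
    sampled = np.concatenate([positives, negatives])
    # Remap original labels to positions inside the sampled subset,
    # so the softmax/cross-entropy sees consistent indices.
    remap = {c: i for i, c in enumerate(sampled)}
    new_labels = np.array([remap[l] for l in labels])
    logits = embeddings @ centers[sampled].T  # (batch, num_sample)
    return logits, new_labels, sampled

batch = rng.standard_normal((8, emb_dim)).astype(np.float32)
labels = rng.integers(0, num_classes, size=8)
logits, new_labels, sampled = partial_fc_logits(batch, labels)
```

With sample_rate = 0.1, each step only multiplies against 10% of the class centers, which is where both the memory and compute savings come from; the margin-based ArcFace logit adjustment would then be applied to the remapped positive positions.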

=====
P2: The training dataset is huge, and the I/O cost is so high that training becomes very slow.

Solutions:

  1. Use a sequential data loader instead of random access.
    Right now the default face recognition datasets (*.rec) are indexed key-value databases, called MXIndexedRecordIO, so the data loader has to randomly access items in these datasets during training. The performance is acceptable only if the data is located on a RAM filesystem or a very fast SSD. For ordinary hard disks, we must use an alternative method that avoids random access.

    a. Use recognition/common/rec2shufrec.py to convert any indexed '.rec' dataset to a shuffled sequential one, called MXRecordIO.
    b. In ArcFace, set is_shuffled_rec=True in the config file to use the converted shuffled dataset. Please check the get_face_image_iter() function in image_iter.py for details.
    c. The shuffled dataset loader requires only sequential scanning, and provides data shuffling via a small in-memory buffer.
    d. The shuffled dataset also benefits from the C++ runtime of the MXNet record reader, which accelerates image processing.
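The small-buffer shuffling in (c) can be sketched in plain Python (a toy illustration, independent of MXNet; the function name and buffer size here are made up). The disk is only ever scanned front to back, while a fixed-size buffer provides local randomness:

```python
import random

def shuffled_stream(records, buffer_size=4, seed=0):
    """Yield records in near-random order using only sequential reads.

    A fixed-size buffer is filled from the sequential stream; at each
    step a random slot is yielded and refilled with the next record,
    so the underlying file is never accessed out of order.
    """
    rng = random.Random(seed)
    buf = []
    for rec in records:
        buf.append(rec)
        if len(buf) >= buffer_size:
            idx = rng.randrange(len(buf))
            yield buf.pop(idx)
    # Drain whatever is left in the buffer at end of stream.
    rng.shuffle(buf)
    yield from buf

# In practice `records` would come from a sequential MXRecordIO scan.
out = list(shuffled_stream(range(10), buffer_size=4))
```

A larger buffer gives shuffling closer to a full random permutation at the cost of memory; since a pre-shuffled .rec file already has no class ordering, a small buffer is usually enough.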

=====
Any questions or discussion can be left in this thread.
