When training ArcFace models with millions of IDs, we may run into some training-efficiency problems.
=====
P1: There are too many classes for my GPUs to handle.
Solutions:
To reduce the memory usage of the classification layer, model parallelism and partial-fc are good options (see the sampling sketch after this list).
Enabling FP16 can further reduce GPU memory usage and also brings acceleration on modern NVIDIA GPUs. For example, we can enable FP16 training with a simple fp16-scale parameter (loss scaling; a sketch follows this list), or change the following setting in the partial-fc MXNet implementation:
config.fp16 = True
Use distributed training.
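For intuition on why partial-fc saves memory, here is a minimal numpy sketch of its class-center sampling: every class that appears in the batch is kept, and the remaining budget is filled with random negatives, so the softmax runs over only a fraction of the full class set. The function name and signature below are illustrative, not the repo's actual API.

```python
import numpy as np

def sample_class_centers(batch_labels, num_classes, sample_rate=0.1, rng=None):
    """Sample a subset of class centers for one batch, partial-fc style.

    Keeps all positive classes in the batch, pads with randomly chosen
    negative classes up to sample_rate * num_classes, and remaps the batch
    labels into the selected subset.
    """
    rng = rng or np.random.default_rng()
    num_sample = max(1, int(num_classes * sample_rate))
    positives = np.unique(batch_labels)
    if len(positives) >= num_sample:
        selected = positives
    else:
        pool = np.setdiff1d(np.arange(num_classes), positives)
        negatives = rng.choice(pool, num_sample - len(positives), replace=False)
        selected = np.sort(np.concatenate([positives, negatives]))
    remap = {c: i for i, c in enumerate(selected)}
    new_labels = np.array([remap[c] for c in batch_labels])
    return selected, new_labels

# usage: pick centers for a batch of 4 labels out of 1,000,000 classes
# selected, new_labels = sample_class_centers([3, 42, 42, 7], 1_000_000)
```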
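The fp16-scale parameter refers to static loss scaling. Below is a minimal Gluon-style sketch of that idea only, not the repo's actual training loop; the network, scale value, and step details are placeholders.

```python
import mxnet as mx
from mxnet import autograd, gluon

# placeholders for illustration; the real training loop lives in the repo
net = gluon.model_zoo.vision.resnet18_v1()
net.cast('float16')                       # store weights/activations in fp16
net.initialize(ctx=mx.cpu())
criterion = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
fp16_scale = 128.0                        # hypothetical scale value

def train_step(data, label):
    with autograd.record():
        out = net(data.astype('float16'))
        # compute the loss in fp32 and scale it up so that small
        # gradients do not underflow the fp16 range during backward
        loss = criterion(out.astype('float32'), label).mean() * fp16_scale
    loss.backward()
    # divide the scale back out of the gradients before the update
    for param in net.collect_params().values():
        if param.grad_req != 'null':
            for grad in param.list_grad():
                grad[:] /= fp16_scale
    trainer.step(1)                       # loss is already averaged
```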
=====
P2: The training dataset is huge and the IO cost is high, which leads to very low training speed.
Solutions:
Use a sequential data loader instead of random access.
Right now the default face recognition datasets (*.rec) are indexed key-value databases, called MXIndexedRecordIO, so the data loader has to randomly access items in the dataset during training. The performance is acceptable only if the data sits on a RAM filesystem or a very fast SSD. For ordinary hard disks, we must use an alternative method that avoids random access:
a. Use recognition/common/rec2shufrec.py to convert any indexed '.rec' dataset to a shuffled sequential one, called MXRecordIO.
b. In ArcFace, set is_shuffled_rec=True in the config file to use the converted shuffled dataset. Please check the get_face_image_iter() function in image_iter.py for details.
c. The shuffled dataset loader requires sequential scanning only, and provides data shuffling in a small in-memory buffer (see the sketch after this list).
d. The shuffled dataset also benefits from the C++ runtime of the MXNet record reader, which accelerates image processing.
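To make step c concrete, here is a minimal sketch of a buffered-shuffle reader over a converted sequential .rec file. mx.recordio.MXRecordIO and mx.recordio.unpack are real MXNet APIs, but the helper name and buffer size are illustrative, and the actual implementation in image_iter.py may differ.

```python
import random
import mxnet as mx

def shuffled_records(rec_path, buffer_size=8192):
    """Yield records from a sequential MXRecordIO file in shuffled order.

    Reads strictly sequentially from disk, keeps a small in-memory buffer,
    and yields a random element each time the buffer is full, so we get
    (local) shuffling without any random disk access.
    """
    reader = mx.recordio.MXRecordIO(rec_path, 'r')
    buf = []
    while True:
        raw = reader.read()
        if raw is None:                        # end of file
            break
        buf.append(raw)
        if len(buf) >= buffer_size:
            i = random.randrange(len(buf))
            buf[i], buf[-1] = buf[-1], buf[i]  # swap a random item to the end
            yield buf.pop()
    random.shuffle(buf)                        # drain what is left
    yield from buf

# usage: decode one record into a label and an image
# for raw in shuffled_records('train_shuffled.rec'):
#     header, img_bytes = mx.recordio.unpack(raw)
#     img = mx.image.imdecode(img_bytes)       # NDArray in HWC layout
#     label = header.label
```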
=====
Any question or discussion can be left in this thread.