fix IterableDataset may block model when num_workers > 0 #40541

heavengate
Contributor

PR types

Bug fixes

PR changes

APIs

Describe

Fix: an IterableDataset may block the model when num_workers > 0.
After multi-process loading, DataLoader passes data through a py_reader and a blocking_queue. The blocking queue has a capacity of 2 * num_workers, which is enough to hold all prefetched data. So when _rcvd_idx catches up with _send_idx, the data source is not necessarily drained: it can also be that all prefetched data is already stored in the blocking queue, waiting for the network to read it. In that situation the DataLoader thread should not occupy CPU time. It should simply proceed to _data_queue.get_data, which blocks while _data_queue has no data, so the CPU time can be used for running the model.
If continue is used here, the check of _rcvd_idx against _send_idx may keep looping, which takes CPU time away from the model.
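
The difference between the two waiting strategies can be sketched with the standard library alone. This is a minimal illustration, not Paddle's DataLoader code: `put_later`, `wait_spinning`, `wait_blocking` and `demo` are hypothetical names, and a plain `queue.Queue` stands in for `_data_queue`. The spinning variant mimics the `continue` behaviour by re-checking in a tight loop, while the blocking variant goes straight to `get()` and sleeps until data arrives.

```python
import queue
import threading
import time


def put_later(data_queue, item, delay_s):
    """Hypothetical producer: deliver one item after a delay, like a slow worker."""
    def run():
        time.sleep(delay_s)
        data_queue.put(item)

    worker = threading.Thread(target=run)
    worker.start()
    return worker


def wait_spinning(data_queue):
    """Re-check in a tight loop (the `continue` behaviour); burns CPU while waiting."""
    checks = 0
    while True:
        checks += 1
        if data_queue.empty():
            continue  # nothing to read yet, loop straight back to the check
        return data_queue.get(), checks


def wait_blocking(data_queue):
    """Go straight to get(); the thread sleeps until an item is available."""
    return data_queue.get(), 1


def demo(delay_s=0.05):
    """Run both strategies once and return how many checks each one made."""
    q = queue.Queue()

    worker = put_later(q, "batch-0", delay_s)
    _, spin_checks = wait_spinning(q)
    worker.join()

    worker = put_later(q, "batch-1", delay_s)
    _, block_checks = wait_blocking(q)
    worker.join()

    return spin_checks, block_checks
```

Calling `demo()` returns the number of loop checks each strategy made. The blocking variant always makes exactly one, whereas the spinning variant typically makes many while the producer is still sleeping, and that is the CPU time the model would otherwise get.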

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

@heavengate heavengate merged commit a991b6a into PaddlePaddle:develop Mar 16, 2022