CPU Multi-process based sampling performing worse in DGL 1.1.2 as compared to DGL 1.1.1 #6315
Comments
@UtkrishtP Could you also provide the "model name" field from lscpu?
@anko-intel Sure, here is the output:
@UtkrishtP, thanks for your comprehensive study on DGL. Let's discuss the issues one-by-one.
@czkkkkkk, thanks for getting back to me and looking into this issue.
Let me know if you have any further questions or need any data.
It fixes the issue dmlc#6315
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you.
@UtkrishtP Can you double-check whether the problem is resolved? If not, feel free to reopen and we can investigate further.
🐛 Performance Bug
Hello Team,
I have been running experiments to measure sampling time with CPU-based multi-processing (num_workers > 0).
DGL 1.1.1
Experiment details:
Hardware details:
Sample code used to measure sampling time:
DGL 1.1.2
Experiment details:
We also use the neighbor sampler with fused=False.
Sample code to measure time:
Results
NOTE: All results are for 1 epoch.
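A one-epoch measurement across worker counts and batch sizes could be sketched roughly as below. This is a minimal, library-agnostic sketch and not the code behind the numbers in this issue. `time_one_epoch`, `sweep`, `best_config`, and `make_loader` are hypothetical names; in a DGL setting, `make_loader` is assumed to build a `dgl.dataloading.DataLoader` with the given `batch_size` and `num_workers`.

```python
import itertools
import time


def time_one_epoch(loader):
    """Iterate the loader once; return (elapsed_seconds, num_batches)."""
    start = time.perf_counter()
    num_batches = 0
    for _ in loader:  # sampling happens while the loader is iterated
        num_batches += 1
    return time.perf_counter() - start, num_batches


def sweep(make_loader, worker_counts, batch_sizes):
    """Time one epoch for every (num_workers, batch_size) combination.

    make_loader is a hypothetical factory: it takes num_workers and
    batch_size keyword arguments and returns a fresh iterable of batches.
    """
    results = {}
    for workers, batch in itertools.product(worker_counts, batch_sizes):
        loader = make_loader(num_workers=workers, batch_size=batch)
        elapsed, _ = time_one_epoch(loader)
        results[(workers, batch)] = elapsed
    return results


def best_config(results):
    """Return the (num_workers, batch_size) key with the lowest time."""
    return min(results, key=results.get)
```

Building a fresh loader per combination keeps worker start-up cost inside each measurement, which matters when comparing num_workers = 0 against num_workers > 0.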
DGL 1.1.1
Below are the sampling times for all the combinations of workers and batch_sizes:
Here, I have listed out the best performing combination:
DGL 1.1.2
As per the release notes, after the introduction of fused neighbor sampling we have seen a performance improvement, especially for the case where #workers = 0.
#5924
The red bars are for fused sampling, whereas the blue bars are for neighbor sampling (fused = False).
As per the logic and the claim made in this comment (#5328 (comment)):
The observations below contradict the above two claims.
Observations
Case 1
Sampling times for fused neighbor Sampler:
(Red bars are for workers = 0 case)
Case 2:
Sampling times for neighbor sampler (fused = False)
(The red bars are the best case for DGL 1.1.1 when using multi-processed neighbor sampler)
Case 3:
Here we compare fused neighbor sampling against DGL 1.1.1's CPU-based multi-process best case.
(Red bars highlight the DGL 1.1.1 CPU-based multi-process best case.)
Based on the above observations, I suspect a performance bug in the latest DGL 1.1.2.
Let me know if some other info is required.
TIA.