Scalene doesn't work properly with torchrun / torch.distributed.run #823

Open
deo-abhijit opened this issue May 9, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@deo-abhijit

I got an error while running Scalene with torch.distributed.run.

I am currently following this doc

python -m torch.distributed.run --nproc_per_node=8 --master_port=2333 tools/train.py projects/configs/VAD/VAD_base.py --launcher pytorch --deterministic --work-dir path/to/save/outputs

This command runs perfectly, but when I replace python -m with scalene, it raises an error. I think the main issue is that my train_mz.py takes other arguments from the command line, and Scalene is probably passing them as arguments to the torch.distributed.run.main() function.

Although this is just speculation.
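To illustrate the speculation above, here is a minimal sketch of how a torchrun-style launcher can separate its own options from the training script's arguments. The function parse_launcher_args, its options, and its default values are hypothetical and only loosely modeled on the command shown earlier; this is not torch.distributed.run's actual parser. The point it shows is that everything after the script path has to reach the launcher untouched, so if a wrapper such as a profiler consumes or rearranges those arguments, the launcher ends up parsing the wrong list.

```python
import argparse


def parse_launcher_args(argv):
    """Hypothetical torchrun-style parser (not torch's real one).

    Launcher options come first, then the training script path; every
    string after the script path is captured verbatim for the script.
    """
    parser = argparse.ArgumentParser(prog="launcher")
    # Launcher-level options (defaults here are arbitrary placeholders).
    parser.add_argument("--nproc_per_node", type=int, default=1)
    parser.add_argument("--master_port", type=int, default=29500)
    # The script to run on each worker.
    parser.add_argument("training_script")
    # REMAINDER keeps everything after the script, including strings that
    # look like options (e.g. --work-dir), without interpreting them.
    parser.add_argument("training_script_args", nargs=argparse.REMAINDER)
    return parser.parse_args(argv)
```

With the arguments from the command above, this sketch yields nproc_per_node=8, master_port=2333, training_script="tools/train.py", and the six remaining strings (the config path, --launcher, pytorch, --deterministic, --work-dir, and the output path) as training_script_args.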

There is also a very similar Stack Overflow question along exactly the same lines.

It would be really nice if someone could help me out here. Thanks!

@emeryberger
Member

You can use --- to tell Scalene to stop processing arguments (so put all Scalene arguments first, then ---, then any other arguments), but I suspect this will not fix the problem. Please give it a try, though.
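The --- convention described above can be illustrated generically: everything before the first --- is meant for the profiler (its options and the program path), and everything after it is forwarded to the program unchanged. The helper split_at_separator below is hypothetical and is not Scalene's actual argument-handling code; it is a minimal sketch of the splitting rule only.

```python
def split_at_separator(argv, sep="---"):
    """Split a command line into (profiler_args, program_args).

    Hypothetical helper: splits at the FIRST occurrence of `sep`.
    Anything after it, including later copies of `sep`, goes to the program.
    """
    if sep in argv:
        i = argv.index(sep)
        return list(argv[:i]), list(argv[i + 1:])
    # No separator: every argument is treated as belonging to the profiler.
    return list(argv), []
```

Note that splitting the command line this way only decides which arguments the program receives; by itself it does not change how the program (here, a multi-process launcher) is run under the profiler.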

@emeryberger
Member

You might also try specifying --cpu to help isolate the issue (if it works, that tells us something).

@deo-abhijit
Author


Actually, I had already tried --- as well, and so had the person who asked the Stack Overflow question.

It still gave an error.

@emeryberger emeryberger added the bug Something isn't working label Aug 8, 2024