oneprof crashes when using mpirun + workload that calls make #37

Open
guoyejun opened this issue May 5, 2023 · 0 comments

This happens on a single node (the hostfile contains only localhost), and the command line looks like:
oneprof -i -p ~/oneprof_log/ -o ~/oneprof_log/oneprof.log mpirun -n 2 -ppn 2 -hostfile hostfile_mpich python -u pretrain_gpt.py ...

In the Python script pretrain_gpt.py, 'make' is called at https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/data/dataset_utils.py#L82. The function is copied here for convenience:

def compile_helper():
    """Compile helper function ar runtime. Make sure this
    is invoked on a single process."""
    import os
    import subprocess
    path = os.path.abspath(os.path.dirname(__file__))
    ret = subprocess.run(['make', '-C', path])
    if ret.returncode != 0:
        print("Making C++ dataset helpers module failed, exiting.")
        import sys
        sys.exit(1)

The command crashes even when 'make' does not invoke the compiler because the target (.so file) is newer than the files it depends on.

It runs successfully if I disable that line so that make is not called.
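
Until the crash itself is understood, one possible way to sidestep it is to skip the make subprocess when the helper library is already up to date. The sketch below is only an illustration of that idea and is not part of Megatron-DeepSpeed or oneprof: the names target_is_stale and compile_helper_if_needed, and the target and sources arguments, are hypothetical, and the timestamp comparison only approximates what make itself would decide.

```python
import os
import subprocess
import sys


def target_is_stale(target, sources):
    """Return True if target is missing or older than any of its sources.

    Hypothetical helper: a rough, Python-side stand-in for make's own check.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def compile_helper_if_needed(path, target, sources):
    """Run 'make -C path' only when the target looks out of date.

    Returns True if make was invoked and False if it was skipped.
    """
    if not target_is_stale(target, sources):
        # Nothing to rebuild, so no child process is spawned at all.
        return False
    ret = subprocess.run(['make', '-C', path])
    if ret.returncode != 0:
        print("Making C++ dataset helpers module failed, exiting.")
        sys.exit(1)
    return True
```

This only avoids launching make under oneprof when the .so is current. It does not explain or fix the crash, which would still be expected whenever a rebuild is actually needed.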

jfedorov pushed a commit that referenced this issue Dec 15, 2023
* Initial version of unitrace

* Initial commits of unitrace

* Unhide Symbols Required By XPTI

* Summarizing device timing regardless of kernel shapes by default

* Summarizing device timing with out kernel shapes by default

---------

Co-authored-by: Schilling, Matthew <matthew.schilling@intel.com>