Slow run and error generating flamegraph #163

Closed
hmcezar opened this issue Jul 22, 2022 · 6 comments · Fixed by #164


@hmcezar

hmcezar commented Jul 22, 2022

Bug Report

Current Behavior: Running memray on an HPC cluster took several hours, whereas the same input runs in minutes on my laptop. After the run finally finished, I got the following error when trying to generate the flamegraph:

Traceback (most recent call last):
  File "/cluster/software/Python/3.9.6-GCCcore-11.2.0/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/cluster/software/Python/3.9.6-GCCcore-11.2.0/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/cluster/projects/nn4654k/hmcezar/venv/hymd/lib/python3.9/site-packages/memray/__main__.py", line 6, in <module>
    sys.exit(main())
  File "/cluster/projects/nn4654k/hmcezar/venv/hymd/lib/python3.9/site-packages/memray/commands/__init__.py", line 124, in main
    arg_values.entrypoint(arg_values, parser)
  File "/cluster/projects/nn4654k/hmcezar/venv/hymd/lib/python3.9/site-packages/memray/commands/common.py", line 117, in run
    self.write_report(result_path, output_file, args.show_memory_leaks, **kwargs)
  File "/cluster/projects/nn4654k/hmcezar/venv/hymd/lib/python3.9/site-packages/memray/commands/common.py", line 86, in write_report
    reporter = self.reporter_factory(
  File "/cluster/projects/nn4654k/hmcezar/venv/hymd/lib/python3.9/site-packages/memray/reporters/flamegraph.py", line 116, in from_snapshot
    transformed_data = with_converted_children_dict(data)
  File "/cluster/projects/nn4654k/hmcezar/venv/hymd/lib/python3.9/site-packages/memray/reporters/flamegraph.py", line 31, in with_converted_children_dict
    the_node["children"] = [child for child in the_node["children"].values()]
AttributeError: 'list' object has no attribute 'values'

Input Code

python -m memray run --native --follow-fork --trace-python-allocators -m hymd dpc.toml final_step1_centered.hdf5 --seed 400755
python -m memray flamegraph memray-hymd.78911.bin

Expected behavior/code: I expected memray to run in about the same time it takes on my laptop and to produce a flamegraph at the end.

Environment

  • Python(s): Python 3.9
  • Modules
Currently Loaded Modules:
  1) StdEnv                        (S)   7) XZ/5.2.5-GCCcore-11.2.0          (H)  13) gompi/2021b                         19) Tcl/8.6.11-GCCcore-11.2.0   (H)
  2) GCCcore/11.2.0                      8) libxml2/2.9.10-GCCcore-11.2.0    (H)  14) Szip/2.1.1-GCCcore-11.2.0      (H)  20) SQLite/3.36-GCCcore-11.2.0  (H)
  3) zlib/1.2.11-GCCcore-11.2.0    (H)   9) libpciaccess/0.16-GCCcore-11.2.0 (H)  15) HDF5/1.13.1-gompi-2021b             21) GMP/6.2.1-GCCcore-11.2.0    (H)
  4) binutils/2.37-GCCcore-11.2.0  (H)  10) hwloc/2.5.0-GCCcore-11.2.0       (H)  16) bzip2/1.0.8-GCCcore-11.2.0     (H)  22) libffi/3.4.2-GCCcore-11.2.0 (H)
  5) GCC/11.2.0                         11) hpcx/2.9                              17) ncurses/6.2-GCCcore-11.2.0     (H)  23) OpenSSL/1.1                 (H)
  6) numactl/2.0.14-GCCcore-11.2.0 (H)  12) OpenMPI/4.1.1-GCC-11.2.0              18) libreadline/8.1-GCCcore-11.2.0 (H)  24) Python/3.9.6-GCCcore-11.2.0
  • PyPI packages
attrs==21.4.0
commonmark==0.9.1
Cython==0.29.30
guppy3==3.1.2
h5py==3.7.0
-e git+https://github.com/hmcezar/HyMD.git@5fbd0a56c3259fa3838e449b11401b42bacc1694#egg=hymd
iniconfig==1.1.1
Jinja2==3.1.2
MarkupSafe==2.1.1
memory-profiler==0.60.0
memray==1.2.0
mpi4py==3.1.3
mpmath==1.2.1
mpsort==0.1.17
networkx==2.8.4
numpy==1.23.0
packaging==21.3
pfft-python==0.1.21
pluggy==1.0.0
plumed==2.9.0.dev0
pmesh==0.1.56
psutil==5.9.1
py==1.11.0
Pygments==2.12.0
pyparsing==3.0.9
pytest==7.1.2
pytest-mpi==0.6
rich==12.5.1
sympy==1.10.1
tomli==2.0.1

Extra information: memray table does run, giving the following output:
memray-table-hymd.78911.zip

@godlygeek
Contributor

I believe I see the bug. Are you able to check whether removing or commenting out line 23 of memray/reporters/flamegraph.py fixes the issue? That's the line that assigns "children": {}, in MAX_STACKS_NODE.

@godlygeek
Contributor

Scratch that, I don't think that's it... I'm still trying to see how we could have gotten that error.

@godlygeek
Contributor

Nope, I was right the first time. That is the cause, and removing line 23 does fix it. The steps for reproducing this locally were just much more complicated than I thought they'd be, but I eventually managed to figure out how to do it, and to confirm that removing that line fixes the problem.
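
The class of bug described here can be reproduced in isolation. The following is a minimal sketch, not Memray's actual code: it assumes that a module-level template dict containing "children": {} is applied to several tree nodes with dict.update, which is a shallow copy, so every such node ends up holding the same children dict. All names below (TEMPLATE_WITH_CHILDREN, TEMPLATE_WITHOUT_CHILDREN, build_tree, convert_children_in_place) are hypothetical and chosen for illustration only.

```python
from typing import Any, Dict

# Hypothetical stand-ins for a module-level template such as MAX_STACKS_NODE.
TEMPLATE_WITH_CHILDREN: Dict[str, Any] = {"name": "<STACK TOO DEEP>", "children": {}}
TEMPLATE_WITHOUT_CHILDREN: Dict[str, Any] = {"name": "<STACK TOO DEEP>"}


def new_node(name: str) -> Dict[str, Any]:
    """Create a tree node whose children are keyed by name."""
    return {"name": name, "children": {}}


def convert_children_in_place(root: Dict[str, Any]) -> Dict[str, Any]:
    """Iteratively replace each node's children dict with a list of child nodes."""
    stack = [root]
    while stack:
        node = stack.pop()
        # Fails if this node was already converted on an earlier visit.
        node["children"] = list(node["children"].values())
        stack.extend(node["children"])
    return root


def build_tree(template: Dict[str, Any]) -> Dict[str, Any]:
    """Build root -> (a, b), apply the template to both, then add a leaf under a."""
    root = new_node("root")
    a = root["children"]["a"] = new_node("a")
    b = root["children"]["b"] = new_node("b")
    # Assumption: the template is applied with dict.update (a shallow copy), so a
    # "children" key in the template makes a and b share one dict object.
    a.update(template)
    b.update(template)
    # With a shared dict, this leaf becomes reachable from both a and b.
    a["children"]["leaf"] = new_node("leaf")
    return root


try:
    convert_children_in_place(build_tree(TEMPLATE_WITH_CHILDREN))
except AttributeError as exc:
    print(f"shared children dict: {exc}")

fixed = convert_children_in_place(build_tree(TEMPLATE_WITHOUT_CHILDREN))
print("no children key in template:", [child["name"] for child in fixed["children"]])
```

In this sketch the leaf node is visited twice because it is reachable through two parents. The second visit finds a list where a dict is expected, which produces the same AttributeError as in the traceback. Dropping the "children" key from the template lets each node keep its own dict, so the conversion visits every node exactly once.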

@godlygeek
Contributor

As for the slowness, was the run on your laptop also using --native and --trace-python-allocators? Each of those options slows things down a lot, and combining them is pretty much a worst-case stress test for how much overhead Memray can add.
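
One way to answer that question on a given machine is to time the same workload under each combination of the two flags. The sketch below is only an illustration: the helper names (build_commands, time_commands) are hypothetical, and it assumes that memray is installed in the current interpreter and that the workload is launched with -m, as in the report. The -o option is used so that each run writes to its own output file.

```python
import subprocess
import sys
import time
from itertools import combinations
from pathlib import Path
from typing import Dict, List

# The two flags singled out above as expensive.
COSTLY_FLAGS = ["--native", "--trace-python-allocators"]


def build_commands(module_args: List[str], out_dir: Path) -> Dict[str, List[str]]:
    """Return one `memray run` command line per subset of COSTLY_FLAGS."""
    commands: Dict[str, List[str]] = {}
    for size in range(len(COSTLY_FLAGS) + 1):
        for flags in combinations(COSTLY_FLAGS, size):
            label = " ".join(flags) or "(no extra flags)"
            # Separate output file per run so results do not collide.
            output = out_dir / f"run-{len(commands)}.bin"
            commands[label] = [
                sys.executable, "-m", "memray", "run",
                *flags, "-o", str(output), "-m", *module_args,
            ]
    return commands


def time_commands(commands: Dict[str, List[str]]) -> Dict[str, float]:
    """Run each command to completion and return wall-clock seconds per label."""
    timings: Dict[str, float] = {}
    for label, argv in commands.items():
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        timings[label] = time.perf_counter() - start
    return timings


# Example usage (not executed here; substitute the workload being profiled):
#   workload = ["hymd", "dpc.toml", "final_step1_centered.hdf5", "--seed", "400755"]
#   for label, seconds in time_commands(build_commands(workload, Path("."))).items():
#       print(f"{label:45s} {seconds:8.1f} s")
```

Running the same four combinations on both the laptop and the cluster would show whether the gap comes from the flags themselves or from something specific to the cluster environment.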

@hmcezar
Author

hmcezar commented Jul 23, 2022

I installed and ran your fix (PR #164) and apparently the flamegraph bug is fixed.

Also, I don't know if it's because I had to recompile the C stuff myself, but it's also running way faster (comparable to my laptop).
Previously, it was much slower even when I didn't use --trace-python-allocators.

Thanks for the fix!

@godlygeek
Contributor

Fix landed on main, and will be included in the next release. Thanks for the report!
