Change to streaming out the heap snapshot data #52854

JianFangAtRai · 2024-01-11T04:46:09Z

This PR continues the work from the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot data (#51518)

Here is the commit history:

* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But I'm not sure now whether the SoA approach is actually an optimization... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
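To illustrate the "edges have to be reordered according to the node order" step from the commit history: in the Chrome `.heapsnapshot` layout, each node records an edge count and the flat edges array is read in node order, so edges that were streamed out in discovery order have to be grouped by their owning node during reassembly. The following is only a minimal C++ sketch of that grouping, using a stable counting sort. The actual reassembly in this PR lives in `stdlib/Profile/src/heapsnapshot_reassemble.jl` and may be structured differently; `StreamedEdge`, `GroupedEdges`, and `group_edges_by_node` are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one edge as it was streamed out (discovery order).
struct StreamedEdge {
    std::uint32_t from_node;      // index of the node that owns the edge
    std::uint32_t to_node;        // index of the node the edge points to
    std::uint32_t name_or_index;  // edge label (string id or element index)
};

// Hypothetical result: per-node edge counts plus edges grouped in node order.
struct GroupedEdges {
    std::vector<std::uint32_t> edge_count;  // outgoing edge count per node
    std::vector<StreamedEdge> edges;        // edges sorted by from_node (stable)
};

// Stable counting sort keyed on from_node: O(nodes + edges) time.
GroupedEdges group_edges_by_node(const std::vector<StreamedEdge>& streamed,
                                 std::size_t node_count) {
    GroupedEdges out;
    out.edge_count.assign(node_count, 0);

    // Pass 1: count the outgoing edges of every node.
    for (const StreamedEdge& e : streamed) {
        assert(e.from_node < node_count && e.to_node < node_count);
        ++out.edge_count[e.from_node];
    }

    // Prefix sums: next[i] is the slot for the next edge owned by node i.
    std::vector<std::size_t> next(node_count, 0);
    std::size_t offset = 0;
    for (std::size_t i = 0; i < node_count; ++i) {
        next[i] = offset;
        offset += out.edge_count[i];
    }

    // Pass 2: place each edge into its owner's contiguous range.
    out.edges.resize(streamed.size());
    for (const StreamedEdge& e : streamed) {
        out.edges[next[e.from_node]++] = e;
    }
    return out;
}
```

Swapping `from_node` and `to_node` in either pass would attach every edge to the wrong owner, which is the kind of bug the "wrong order" commit above describes.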

Co-authored-by: Nathan Daly <nathan.daly@relational.ai>
Co-authored-by: Nathan Daly <NHDaly@gmail.com>

inkydragon · 2024-01-11T20:14:43Z

Since the current changes do not touch the documentation, the CI errors below are unrelated.
You may want to rebase onto the master branch.

┌ Error: no doc found for reference '[`Profile.HeapSnapshot.assemble_snapshot(filepath)`](@ref)' in src\stdlib\Profile.md.
└ @ Documenter C:\workdir\doc\deps\packages\Documenter\1HwWe\src\utilities\utilities.jl:44
[ Info: CheckDocument: running document checks.
[ Info: Populate: populating indices.
ERROR: LoadError: `makedocs` encountered an error [:cross_references] -- terminating build before rendering.


gbaraldi · 2024-01-29T14:34:56Z

I'm not super familiar with this code; I've only modified it a bit once, but the changes seem pretty reasonable. Did you folks do some testing to see if Chrome is still able to process the file that gets regenerated from the JSON?

JianFangAtRai · 2024-01-29T16:16:52Z

> I'm not super familiar with this code; I've only modified it a bit once, but the changes seem pretty reasonable. Did you folks do some testing to see if Chrome is still able to process the file that gets regenerated from the JSON?

Yes, we tested snapshots of different sizes, from hundreds of MB to a couple of GB, and Chrome was still able to process the assembled files.



JianFangAtRai marked this pull request as draft January 11, 2024 04:46

This was referenced Jan 11, 2024

Prevent OOMs during heap snapshot: Change to streaming out the snapshot data. #51518

Closed

Change to streaming out the heap snapshot data RelationalAI/julia#127

Merged

fixed the edge type when calling find_or_create_string_id()

b83a04c

inkydragon added the stdlib (Julia's standard library) and profiler labels Jan 11, 2024

JianFangAtRai added 7 commits January 11, 2024 17:43

attempt to fix the doc issue for assemble_snapshot

e35a953

remove unused k_node_number_of_fields from gc-heap-snapshot.cpp

4d3213b

attempt to resolve the savepoint issue on serialize_node

82c41c4

remove println in take_heap_snapshot to avoid messing up console output in Julia REPL

bf5b63e

Merge remote-tracking branch 'origin/master' into jfang/heap-snapshot-stream-dev

9bf08c2

rename alloc_type for generic memory in gc-heap-snapshot

35e0cd0

streaming strings directly to avoid cache in memory

ee71351

giordano changed the title ~~Change to streaming out the heap snapshot data (#1)~~ Change to streaming out the heap snapshot data Jan 17, 2024

JianFangAtRai marked this pull request as ready for review January 17, 2024 16:35

dedupling strings for field paths

5a9c5c9

NHDaly requested a review from gbaraldi January 24, 2024 18:57

d-netto reviewed Jan 30, 2024

Review threads (outdated, resolved):

src/gc-heap-snapshot.cpp

stdlib/Profile/src/heapsnapshot_reassemble.jl

stdlib/Profile/src/heapsnapshot_reassemble.jl

JianFangAtRai added 2 commits January 30, 2024 11:35

address PR comments

4525fea

Merge remote-tracking branch 'origin/master' into jfang/heap-snapshot-stream-dev

1a06f9c

d-netto merged commit c16472b into JuliaLang:master Feb 1, 2024
7 checks passed

vtjnash mentioned this pull request Mar 2, 2024

node process killed by os during heap snapshot due to OOM nodejs/node#50711

Open

IanButterworth added the backport 1.10 (Change should be backported to the 1.10 release) label May 8, 2024

IanButterworth mentioned this pull request May 8, 2024

Backports for 1.10.4 #54416

Merged

23 tasks

KristofferC mentioned this pull request Jun 19, 2024

Backports release 1.10.5 #54851

Merged

46 tasks

KristofferC mentioned this pull request Sep 12, 2024

Backports for 1.10.6 #55746

Open

46 tasks

IanButterworth mentioned this pull request Sep 26, 2024

Profile: fix order of fields in heapsnapshot & improve formatting #55890

Merged
