transposed weakmaps - performance cliffs in XS revealed by #3105 #3118

Closed

erights opened this issue May 18, 2021 · 10 comments
Labels: bug (Something isn't working), performance (Performance related issues), xsnap (the XS execution tool)
Milestone: Testnet: Stress Test Phase

erights commented May 18, 2021

endojs/endo#706 is an attempt to implement endojs/endo#705 , in order to get a speedup of harden on XS.
#3105 measures it to see if it actually works. To reproduce, you need to check out
https://github.com/Agoric/agoric-sdk/tree/test-harden-perf and https://github.com/endojs/endo/tree/alt-harden and yarn link them together. Then in
agoric-sdk at packages/xsnap run
time yarn test -sv test/test-harden-perf.js

The output I got is in the attached file:

harden-perf-data2.txt

See if you get similar output. This output contains some results so bizarre that it makes me wonder if there is something terribly wrong about the code being tested or the code testing it. If not, then these bizarre cases demonstrate some bizarre performance cliffs in XS that we all need to understand. Attn @phoddie. A striking example is the contrast between the following two results:

{
 "branding": "BOTH", // <---
 "count": 100,
 "name": "listHard",
 "meterType": "xs-meter-7",
 "compute": 100211, // <---
 "allocate": 42074144,
 "allocateChunksCalls": 1,
 "allocateSlotsCalls": 2,
 "garbageCollectionCount": 0,
 "mapSetAddCount": 3483,
 "mapSetRemoveCount": 799,
 "maxBucketSize": 16,
 "innerTime": 0.83984375, // <---
 "outerTime": 56
} 

vs

{
 "branding": "POSITIVE", // <---
 "count": 100,
 "name": "listHard",
 "meterType": "xs-meter-7",
 "compute": 5905005, // <---
 "allocate": 42074144,
 "allocateChunksCalls": 1,
 "allocateSlotsCalls": 2,
 "garbageCollectionCount": 2,
 "mapSetAddCount": 25018,
 "mapSetRemoveCount": 0,
 "maxBucketSize": 17,
 "innerTime": 72.958984375, // <---
 "outerTime": 128
} 

Both of these are running the listHard test 100 deep. The listHard test is:

      let list = {};
      for (let i = 0; i < ${count}; i++) {
        list = harden({next: list});
      }

The first record is running it where harden is implemented using both the positive and negative branding techniques, and testing them for consistency with each other. They apparently are consistent with each other, at least for this case, or we would have gotten an error.

The second does harden using only the positive technique. The first should be doing all the work of the second and much more. However, the first has a compute meter of 100,211 and completes in under one millisecond, while the second has a compute meter of 5,905,005 and completes in over 72 milliseconds.
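
For concreteness, here is a minimal sketch of the positive branding technique. This is an illustration only, not the actual alt-harden code: the names (hardened, hardenPositive) are made up, and the measured code apparently uses WeakMaps where a WeakSet suffices for a pure brand.

  // Positive branding: a WeakSet brands every object that has ever been
  // hardened, so "is it hardened?" is a single membership check.
  const hardened = new WeakSet();

  function hardenPositive(root) {
    const toFreeze = [];
    const seen = new Set();
    const walk = obj => {
      if (obj === null || (typeof obj !== 'object' && typeof obj !== 'function')) {
        return; // primitives need no hardening
      }
      if (hardened.has(obj) || seen.has(obj)) {
        return; // already hardened, or already queued in this pass
      }
      seen.add(obj);
      toFreeze.push(obj);
      for (const name of Reflect.ownKeys(obj)) {
        const desc = Object.getOwnPropertyDescriptor(obj, name);
        walk(desc.value); // data properties
        walk(desc.get);   // accessor properties
        walk(desc.set);
      }
      // Real harden also walks prototypes; under SES the shared intrinsics
      // are hardened at lockdown, so that walk terminates quickly.
    };
    walk(root);
    // Freeze everything first, then brand, so a failure partway through
    // leaves nothing falsely branded as hardened.
    toFreeze.forEach(obj => Object.freeze(obj));
    toFreeze.forEach(obj => hardened.add(obj));
    return root;
  }

On the listHard loop, each of the N harden calls discovers exactly one new object, so the brand grows to size N and is probed about 2N times.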

@erights erights added the bug Something isn't working label May 18, 2021
erights commented May 18, 2021

@dckc @warner as discussed in the meeting this morning, I've assigned you two. But of course grab me if I can help. Good luck!

erights commented May 18, 2021

(corrected above "seconds" --> "milliseconds")

warner commented May 18, 2021

I've updated the test-harden-perf branch (to commit 2cde3a7) to include a corresponding Node.js test. Node does not seem to exhibit the bizarre behavior where BOTH takes much less time than either POSITIVE or NEGATIVE, at least not as severely as XS.

I've added my results in https://github.com/Agoric/agoric-sdk/tree/test-harden-perf/packages/xsnap/harden-perf-node and https://github.com/Agoric/agoric-sdk/tree/test-harden-perf/packages/xsnap/harden-perf-xsnap . This is using the Moddable submodule at commit ddab2e5cb0b221973d83705d0c2a60e4c6b41bb2 (probably from our agoric-labs/Moddable fork), which should be equivalent to the current official Moddable public branch (2674e932f86564d0476f03316eaa744a8edf464d) plus a minor bit of instrumentation. agoric-sdk is at commit 2cde3a7, and the yarn-linked endojs repository is on the alt-harden branch at commit d9096206b3a505d4d2e9f4d18b89ea0cec74708b .

listHard test

The "listHard" test grows an N-deep linked list, hardening at each step (list = harden({next: list}), repeated N times). That means N calls to harden, each of which needs to 1: see if the given object is already hardened or not, 2: if not, walk its connectivity graph to find all reachable non-hardened objects, 3: freeze all of the non-hardened objects, then 4: mark them all as hardened. In the POSITIVE approach, the "is it hardened" test uses a WeakMap, so we expect to see about 2N WeakMap .has checks (one false, then one true), on a WeakMap that grows to size N. The NEGATIVE approach patches Object.freeze (and others) to add the object to a smaller WeakMap of objects that are frozen but not (yet) hardened, and then harden() removes objects from this WeakMap. BOTH does both approaches, using separate WeakMaps.

Under Node.js, the listHard test takes roughly the same amount of time for each of the three modes (plus or minus some noise). All times are in milliseconds:

$ cat harden-perf-node/turing-node-data.json |jq -c '.[] | select(.count==100 and .name=="listHard") | [.count, .branding, .name, .innerTime]'
[100,"BOTH","listHard",45.1930830180645]
[100,"POSITIVE","listHard",44.76495200395584]
[100,"NEGATIVE","listHard",51.35582000017166]

$ cat harden-perf-node/turing-node-data.json |jq -c '.[] | select(.count==1000 and .name=="listHard") | [.count, .branding, .name, .innerTime]'
[1000,"BOTH","listHard",1119.4810279905796]
[1000,"POSITIVE","listHard",1029.8868469893932]
[1000,"NEGATIVE","listHard",1026.47445499897]

$ cat harden-perf-node/turing-node-data.json |jq -c '.[] | select(.count==10000 and .name=="listHard") | [.count, .branding, .name, .innerTime]'
[10000,"BOTH","listHard",90814.28993099928]
[10000,"POSITIVE","listHard",90027.41138899326]
[10000,"NEGATIVE","listHard",91152.67771998048]

However, under XS, the N=100 case shows BOTH taking roughly 100x less time than either POSITIVE or NEGATIVE. (The N=1000 POSITIVE/NEGATIVE tests timed out after at least 30s of runtime, and the BOTH tests exceeded the compute meter limit until I disabled it):

$ cat harden-perf-xsnap/turing-xsnap-data.json |jq -c '.[] | select(.count==100 and .name=="listHard") | [.count, .branding, .name, .innerTime]'
[100,"BOTH","listHard",1.119873046875]
[100,"POSITIVE","listHard",106.73779296875]
[100,"NEGATIVE","listHard",167.447998046875]

# 1k BOTH took 11ms
# 10k BOTH took 148ms (after I disabled the compute meter)
# 100k BOTH took 5s (after I disabled the compute meter)

This is really strange, and we don't have any idea what's happening. The fact that it doesn't happen under Node.js suggests that some property of the XS implementation interacts weirdly with the BOTH mode; something is getting short-circuited.

hardList test

The hardList test creates an unhardened length-N linked list, then hardens the whole thing. This test seems to take about the same time for all three modes, under both Node.js and XS. The wallclock time grows super-linearly, but the exponent appears to be much worse for XS. Note that all times include both the creation of the linked list and the subsequent harden() call:

$ cat harden-perf-node/turing-node-data.json |jq -c '.[] | select(.count==100 and .name=="hardList") | [.count, .branding, .name, .innerTime]'
[100,"BOTH","hardList",0.8037289977073669]
[100,"POSITIVE","hardList",0.7623150050640106]
[100,"NEGATIVE","hardList",0.7779809832572937]
$ cat harden-perf-node/turing-node-data.json |jq -c '.[] | select(.count==1000 and .name=="hardList") | [.count, .branding, .name, .innerTime]'
[1000,"BOTH","hardList",3.266084998846054]
[1000,"POSITIVE","hardList",3.2358070015907288]
[1000,"NEGATIVE","hardList",3.262690991163254]
$ cat harden-perf-node/turing-node-data.json |jq -c '.[] | select(.count==10000 and .name=="hardList") | [.count, .branding, .name, .innerTime]'
[10000,"BOTH","hardList",30.743683993816376]
[10000,"POSITIVE","hardList",30.096190005540848]
[10000,"NEGATIVE","hardList",30.212065994739532]
# 100k takes about 240ms
# 1M takes about 2.5s

$ cat harden-perf-xsnap/turing-xsnap-data.json |jq -c '.[] | select(.count==100 and .name=="hardList") | [.count, .branding, .name, .innerTime]'
[100,"BOTH","hardList",0.6279296875]
[100,"POSITIVE","hardList",0.989013671875]
[100,"NEGATIVE","hardList",1.0439453125]
$ cat harden-perf-xsnap/turing-xsnap-data.json |jq -c '.[] | select(.count==1000 and .name=="hardList") | [.count, .branding, .name, .innerTime]'
[1000,"BOTH","hardList",8.087158203125]
[1000,"POSITIVE","hardList",7.427001953125]
[1000,"NEGATIVE","hardList",8.22412109375]
$ cat harden-perf-xsnap/turing-xsnap-data.json |jq -c '.[] | select(.count==10000 and .name=="hardList") | [.count, .branding, .name, .innerTime]'
[10000,"BOTH","hardList",6083.156005859375]
[10000,"POSITIVE","hardList",5468.64404296875]
[10000,"NEGATIVE","hardList",6396.0849609375]

@dckc dckc added the xsnap the XS execution tool label Jun 7, 2021
@dckc dckc added this to the Testnet: Stress Test Phase milestone Jun 18, 2021
dckc commented Jun 21, 2021

@phoddie says the transposed weakmap work is about ready to go... I'm fuzzy on how to verify it; I'd welcome help.

@dtribble what about O(n^2) Array.push()? Are we ok to release without that?

@dckc dckc changed the title Investigate apparent performance cliffs in XS revealed by #3105 transposed weakmaps - performance cliffs in XS revealed by #3105 Jun 21, 2021
rowgraus commented

Moddable code landing would obviate this ticket

dckc commented Jun 21, 2021

@dtribble what about O(n^2) Array.push()? Are we ok to release without that?

@warner and I agreed in today's testnet meeting to track the array stuff in #3012.

dckc commented Jun 22, 2021

this just in:

dckc commented Jun 22, 2021

after discussion with @erights @warner, we're in no rush to merge 801429b

MarkM gives a couple of screw cases:

  • unreferenced cycle that crosses the membrane
  • whole graph inside membrane

interesting w.r.t. WeakMap GC optimizations
^ these seem testable (see the sketch below).
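
A minimal sketch of the first screw case, with the membrane machinery reduced to a single hypothetical WeakMap (blueToYellow is a made-up name):

  // An unreferenced cycle that crosses the membrane: blue reaches yellow
  // only through the membrane's WeakMap, and yellow holds the only strong
  // reference back to blue.
  const blueToYellow = new WeakMap(); // stand-in for the membrane's table

  function makeCrossMembraneCycle() {
    const blue = {};
    const yellow = { payload: blue }; // strong edge back across the membrane
    blueToYellow.set(blue, yellow);   // ephemeron edge: value live iff key live
    return blue;
  }

  let root = makeCrossMembraneCycle();
  root = null;
  // From here on, blue is live only if yellow is live (via yellow.payload),
  // and yellow is live only if blue is live (as blue's entry value). A
  // correct ephemeron collector must conclude both are dead and drop the
  // entry; a weakmap GC optimization that gets this wrong retains the pair
  // forever. A test could observe collection with a FinalizationRegistry
  // after forcing GC.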

warner added a commit that referenced this issue Jun 25, 2021
Update the packages/xsnap/moddable submodule to current public branch, plus
an upcoming xsPromise.c fix.

* current public branch includes a new WeakMap design, which drastically
improves GC speed
* the xsPromise.c fix now correctly drops rejection handlers for resolved
promises, which was probably the cause of #3406 (unexpected retention of
Presences used in `E()` calls, resulting in too few GC actions)

refs #3406  (might even close it)
refs #3118
dckc commented Jun 26, 2021

#3415 should fix this. We have some evidence that it does. Until we find out otherwise, let's call this done.

@dckc dckc closed this as completed Jun 26, 2021
erights commented Jun 26, 2021

Prior to the Moddable algorithm, the best known was Felix's algorithm from the thread at
https://esdiscuss.org/topic/linear-time-weak-map-gc

Now for historic interest only, so that it will be captured somewhere findable, here is, first, the paper Felix is responding to:
fast-epemeron-collection.pdf

My transpose of Felix's algorithm:
fast-epemeron-collection.pdf

and my sketch illustrating the transposed Felix algorithm when the transposed table is represented as a linked list of pairs (idea from Moddable):
transposed-felix-sketch.pdf
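
For anyone who skips the PDFs, here is a rough userland model of the transposed representation. This is only a model of the engine-level idea: in the engine the pair list hangs off the key's internal state, whereas this sketch abuses an ordinary symbol-keyed property (which would fail on frozen keys, among other problems).

  // Transposed weakmap: instead of each map owning a key->value table, each
  // key object carries a linked list of { map, value } pairs, one per
  // weakmap that mentions it. When a key dies, its pair list dies with it,
  // and when marking a live key the collector sees exactly the ephemerons
  // that depend on it.
  const PAIRS = Symbol('transposed weakmap pairs');

  class TransposedWeakMap {
    set(key, value) {
      const pair = this.#find(key);
      if (pair) {
        pair.value = value;
      } else {
        // Prepend a new pair to the key's own list.
        key[PAIRS] = { map: this, value, next: key[PAIRS] };
      }
      return this;
    }
    get(key) {
      const pair = this.#find(key);
      return pair && pair.value;
    }
    has(key) {
      return this.#find(key) !== undefined;
    }
    #find(key) {
      for (let pair = key[PAIRS]; pair; pair = pair.next) {
        if (pair.map === this) return pair;
      }
      return undefined;
    }
  }

Lookup becomes linear in the number of weakmaps that mention a given key (typically tiny), while GC marking no longer needs repeated whole-table rescans, which is, as I understand it, where the near-linear collection time comes from.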
