snapstore: unused snapshots are never deleted #3374
Comments
It would help me to unpack that a bit. Should I read "we" as the customer, i.e. the validator node operator? (What we the developers want is of relatively little importance, yes?) I suppose the validator node operator wants nodes to consume as little space as is feasible. A simple requirement is "at most one snapshot per live vat". Is there sufficient motivation for anything more complex than that?

And what does "unused" mean? The simplest design I see is to schedule destruction of the previous snapshot of a vat whenever we make a new snapshot for that vat. That happens to align nicely with the simple requirement above, but it wouldn't involve (non-trivial) reference counting.

When would multiple vats share a snapshot? I suppose the zygote design sketch in #2391 suggests at least one snapshot that isn't the most recent snapshot of any live vat. The code in #2848 does something similar with the

I suppose it's also possible to have multiple vats that share a history, and hence might have exactly the same snapshot. Separating snapshot storage by vat seems simpler than reference counting as a way to address that possibility.

To capture the customer requirement irrespective of how it's carried out, perhaps a better title is "snapstore: bounded storage per vat"?
Validators want to minimize space usage, and we (the platform developers) want to support validators. "at most one snapshot per live vat" is an excellent target. "unused" means the snapshot ID does not appear in the set of committed DB keys that capture the (snapshot ID, transcript starting point) data for each vat. If a vat needs snapshot X and the snapstore doesn't have it, the kernel cannot ever page-in that vat, which is kernel-fatal.

Multiple vats will definitely share the "just SES and supervisor" snapshot for the first 20 cranks of their lives, or until whatever heuristic we settle upon decides it's worth the time to record a new one. If/when we get vat creation fast enough, and people start using short-lived single-purpose contract vats, they might not even make 20 cranks before being terminated. The zygote feature will make it possible to share more-developed snapshots (and if we know we're using a vat as a zygote starting point, we'll certainly want a snapshot to start from; that's where all our performance will come from, unless we also use a

But more importantly, if we're identifying snapshots by their hash (which I still think is a good idea), then we're not identifying them by the vat(s) which use them, which means we need some other mechanism to guarantee the availability of the snapshot and minimize storage needs. Reference counting isn't the only possible solution, but it's the only one that's occurred to me so far.

The motivating example might be: we have 1000 vats all launched from the same zygote snapshot, all of which have executed 19 cranks and then needed to be paged out for whatever reason. How do we avoid keeping 1000x snapshots for them, while still being able to delete that snapshot once the installation is replaced, the zygote parent vat is deleted, and all 1000 child vats have either moved on (and have unique snapshots of their own) or been terminated (and no longer need storage of any kind)?

As for multiple vats sharing a significant history and converging on a shared snapshot: I think that's unlikely enough that I'm not worried about trying to deduplicate it, but it's absolutely critical that we maintain availability of the snapshot. So e.g. if two vats do happen to overlap for whatever reason (parallel evolution), we must not delete a snapshot out from under them.
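A minimal sketch of what hash-keyed reference counting could look like; the key format and the `incref`/`decref`/`replaceSnapshot` names are illustrative assumptions, not existing SwingSet API:

```js
// Hypothetical sketch (not existing SwingSet API): refcounts keyed by snapshot
// hash, stored in the kernel DB so they commit atomically with the
// vatID -> snapshot-hash assignments.
function makeSnapshotRefcounts(kvStore) {
  const refKey = hash => `snapshot.refCount.${hash}`;

  function incref(hash) {
    const n = Number(kvStore.get(refKey(hash)) || '0');
    kvStore.set(refKey(hash), `${n + 1}`);
  }

  // Returns true when the hash just became unused (safe to delete the file
  // once this DB state is committed).
  function decref(hash) {
    const n = Number(kvStore.get(refKey(hash)) || '0');
    if (n <= 1) {
      kvStore.delete(refKey(hash));
      return true;
    }
    kvStore.set(refKey(hash), `${n - 1}`);
    return false;
  }

  // When a vat records a new snapshot: bump the new hash before releasing the
  // old one, so a hash shared by both never dips to zero in between.
  function replaceSnapshot(oldHash, newHash) {
    if (newHash) {
      incref(newHash);
    }
    return oldHash ? decref(oldHash) : false;
  }

  return { incref, decref, replaceSnapshot };
}
```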
can be postponed for a bit

@warner measured 12,000 deliveries in 2 hrs through vattp, the busiest vat. Using

So that's 18MB of storage for 2 hrs. We can afford that for a while.
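(For scale, one reading that makes those figures line up, assuming ~300kB per vattp snapshot and one snapshot per 200 deliveries, neither of which is stated explicitly here: 12,000 deliveries / 200 ≈ 60 snapshots × ~300kB ≈ 18MB over 2 hrs. At one snapshot per 20 deliveries it would be roughly ten times that.)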
snapshot GC using kernelDB

one idea:
FYI, my monitor node for the last testnet grew after 2 weeks to about 26GB worth of snapshots, spread over 3093 files, for 36 vats.
In addition to maintaining a mapping from vatID to snapshotID, vatKeeper maintains a reverse mapping. After `commitCrank()`, the kernel calls `vatWarehouse.pruneSnapshots()`, which:

1. calls `kernelKeeper.getUnusedSnapshots()`,
2. tries to `snapStore.delete()` each of them, and
3. reports the results using `kernelKeeper.forgetUnusedSnapshots()`.

fixes #3374
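A rough sketch of that flow; the method names come from the description above, but the signatures and bodies are assumed for illustration:

```js
// Illustrative only: the prune step described above, run after commitCrank().
async function pruneSnapshots(kernelKeeper, snapStore) {
  // 1. ask the kernel DB which snapshot IDs have no remaining vat consumers
  const unused = kernelKeeper.getUnusedSnapshots();

  // 2. try to delete each file; a failure (e.g. file already gone) is logged,
  //    not fatal
  const deleted = [];
  for (const snapshotID of unused) {
    try {
      await snapStore.delete(snapshotID);
      deleted.push(snapshotID);
    } catch (err) {
      console.warn(`could not delete snapshot ${snapshotID}`, err);
    }
  }

  // 3. report back so the kernel DB stops listing the deleted IDs as unused
  kernelKeeper.forgetUnusedSnapshots(deleted);
}
```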
Pruning snapshots involves an O(n) query on the DB, so doing it on every crank seems expensive. refs #3374
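One way to bound that cost, purely as an illustration (not what the linked PRs do), is to run the prune scan only every N cranks instead of after every crank:

```js
// Illustrative only: amortize the O(n) unused-snapshot scan by pruning every
// PRUNE_INTERVAL cranks. PRUNE_INTERVAL is an assumed tuning knob, not an
// existing SwingSet option.
const PRUNE_INTERVAL = 200;
let cranksSincePrune = 0;

async function maybePruneSnapshots(vatWarehouse) {
  cranksSincePrune += 1;
  if (cranksSincePrune >= PRUNE_INTERVAL) {
    cranksSincePrune = 0;
    await vatWarehouse.pruneSnapshots();
  }
}
```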
In addition to maintaining a mapping from vatID to snapshotID, vatKeeper maintains a reverse mapping. When the reverse mapping is empty, `snapStore.prepareToDelete()` queues the snapshot for deletion. When the host calls `swingstore.commit()`, swingstore calls `snapStore.commitDeletes()`.

fixes #3374 refs #3431

* feat(swing-store-lmdb): host commit() deletes snapshots
* feat(snapstore): prepare/commit deletes
* feat(swing-store-lmdb): provide snapStore with kvStore, streamStore
* chore: move snapStore to swing-store-lmdb package
* refactor: factor out makeSnapStoreIO
  - vatKeeper.saveSnapshot() prepares deletes
  - removeFromSnapshot() returns consumers.length
* test: use tmp directories for snapStore in tests
* test(swingset): move c.shutdown to caller of runSteps: runSteps didn't create it; its caller still has a reference
* chore(cosmic-swingset): use consolidated snapStore API
* chore(solo): use consolidated snapStore API
* refactor(swingset): static type for kvStore in vatKeeper
  - getRequired() asserts that get() does not return undefined
  - fix addHelpers() return type by declaring arg type
  - where kvStore.get() is ensured by getKeys() or has(), mark the alternative with assert.fail()
* docs(swingset): mark DB keys excluded from consensus
* chore(snapstore): assert hash has no path separators
* test(snapstore): robust prepare / commit delete
* docs(swingstore): document snapStore commit constraint
* test: move xsnap/snapstore integration test to solo to avoid @agoric/xsnap in swing-store-lmdb devDependencies
* test(snapstore): re-save between prepare and commit delete, plus one more /etc/passwd test
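A condensed sketch of the prepare/commit split described above; the shapes are assumed for illustration (the real code lives in vatKeeper and swing-store-lmdb), and `cancelDelete` is a hypothetical name for the re-save-between-prepare-and-commit case mentioned in the tests:

```js
// Sketch: deletes are staged while the reverse mapping shrinks to zero, and
// only performed when the host commits, so an uncommitted block can never
// lose a snapshot that the committed state still references.
function makeSnapStoreDeletes(deleteSnapshotFile) {
  const toDelete = new Set();

  // Called (e.g. from saveSnapshot) when a snapshot loses its last consumer
  // in the reverse mapping.
  function prepareToDelete(hash) {
    toDelete.add(hash);
  }

  // If the same snapshot is re-saved between prepare and commit, the staged
  // delete must be cancelled.
  function cancelDelete(hash) {
    toDelete.delete(hash);
  }

  // Called from swingstore.commit(), after the DB transaction has committed.
  async function commitDeletes() {
    for (const hash of [...toDelete]) {
      await deleteSnapshotFile(hash);
      toDelete.delete(hash);
    }
  }

  return { prepareToDelete, cancelDelete, commitDeletes };
}
```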
What is the Problem Being Solved?
We want to delete unused snapshots. The snapstore is somewhat disconnected from the set of vats which use its snapshots (e.g. the snapstore doesn't know which vats are using which snapshots). And multiple vats might be using a single snapshot, at least for a little while (if we don't write a new snapshot until 20 cranks have been delivered, then all young vats will be using the same starting snapshot).
So we need some sort of refcounts, and something to notice when a snapshot is unused, and to delete the file.
We should probably get this implemented before e.g. moving vat-tp back to `xsnap`, since it receives a lot of messages (probably more than any other vat in the system), which will produce a lot of snapshots. And making it stateless (#3305) is stalled on some deep science (reconstructing Remotables without replay). The vat-tp state is small and doesn't grow over time, so each snapshot should be minimal (I think we've seen 200-300kB?), but writing a new one every 20 deliveries would be a lot of space.

Description of the Design
Not sure, maybe swingStore should be aware of which keys are being used to hold snapshot names, and could perform the refcounting internally. Or maybe something on the kernel side should track it.
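As a sketch of the first variant; the key pattern, the `.lastSnapshot` suffix, and the helper name are assumptions for illustration, not necessarily the real key layout:

```js
// Illustrative: swingStore derives the in-use set by scanning the kvStore keys
// it knows hold snapshot IDs, then reports any stored snapshot nothing references.
function findUnusedSnapshots(kvStore, allStoredSnapshotIDs) {
  const inUse = new Set();
  // assumed key shape: `local.${vatID}.lastSnapshot` -> JSON with a snapshotID
  for (const key of kvStore.getKeys('local.', 'local/')) {
    if (key.endsWith('.lastSnapshot')) {
      const { snapshotID } = JSON.parse(kvStore.get(key));
      inUse.add(snapshotID);
    }
  }
  return allStoredSnapshotIDs.filter(id => !inUse.has(id));
}
```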
Security Considerations
Test Plan