-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revamp of SharedArrayBuffer PR #2518
Conversation
@domenic do you want to preserve the first two separate commits? Might have to do some trickery to apply the above changes then. |
Details of workers in Firefox:
|
No need to preserve the first two separate commits. It'd be vaguely nice if things could land separately, e.g. remove StructuredCloneWithTransfer, define agents and agent clusters, define SAB structured cloning, but it's not necessary. I've certainly tried for separate commits and failed on similarly-large projects. |
I've split some of this out into #2520 and #2521. To do StructuredCloneWithTransfer separately I think we'd have to tightly couple it with messageerror. Otherwise you end up changing the same algorithms everywhere (and in service workers) twice. I also think we'd have to do it before we land the SharedArrayBuffer bits. The alternative is coupling all these together. If you can make a suggestion I can work on it tomorrow, provided the other two PRs are in reasonable shape. |
@binji pointed me here - just wondering if throwing when attempting to store a SAB to IndexedDB falls out of this PR or if we need to tackle that separately. |
@inexorabletash no, we lost that property when we stopped requiring it to be a transferable. I suspect we might need StructuredSerializeWithoutSharedArrayBuffer, also for the Notifications API, which has similar constraints as Indexed DB (also keeps items for storage). @domenic what do you think? Requiring StructuredSerialize callers to traverse the record seems unnecessarily complicated. |
The above is assuming btw that |
So IMO attempting to send a SharedArrayBuffer to IDB should work fine. It should snapshot it at that time, as part of the IDB-specific records-in-memory-to-bytes-on-disk process. As a web developer I think that would be more expected than requiring me to create a copy of my object graph with each SAB replaced by an AB whenever I want to store something on disk. But let's say we want to prohibit it, for the purpose of this discussion. In that case I think it's part of a more general problem, where certain things aren't storable on disk. The obvious upcoming example is a stream. Again to me the best layering for this sort of thing is for it to be part of IDB's records-in-memory-to-bytes-on-disk process. That just makes the most sense from my perspective, as it's IDB-specific. Maybe we can help IDB out by defining some abstract op like "CanSerializationBeWrittenToDisk" that crawls the record graph for you? The normative question at hand, BTW, is whether serializing the JS object: @inexorabletash has previously expressed off-thread that it fits better with Blink's architecture to not invoke |
Storage question also affects the Notifications API at least, which does some limited form of storage. |
I guess I am coming around to StructuredSerializeWithoutSharedArrayBuffer, as long as it is generic (e.g. has a name like StructuredSerializeForStorage) and ideally is used by at least two consumers. I don't quite understand why Notifications needs this as I thought it was all passing data around in memory. It certainly doesn't seem like Notifications are stored on disk. In particular the Notifications case seems pretty much identical to the pushState case. (Which is a good test to write for SABs... let me file a web platform tests) |
You can create a notification from a service worker with serializable data, then that agent cluster can be collected, and the browser could even be closed (and store the data to disk) and then when the user clicks the notification you'd need to deserialize it, maybe from disk. And even if the disk isn't involved it wouldn't have to be the same instance of the service worker, meaning it's a new agent cluster. |
Hmm, I see! Thanks, I didn't understand how |
So I still think we need to decide, for both Notifications and IDB, whether it snapshots the data or errors at serialization time. Snapshotting makes a lot more sense to me, but nobody seems sure... I guess we could always be conservative and let it fail for now, and see if users get upset about having to crawl their object graph and create a copy with all SABs replaced by AB snapshots... |
The main reason no serialization error feels icky is because writing to disk is effectively crossing the Agent Cluster boundary (same if you pass shared memory to the network). And whenever you normally cross the Agent Cluster boundary we decided on failure. We could also do what some implementations do and hand out an AB when you try to cross a Agent Cluster boundary with an SAB, but we kinda already decided against that. |
To me the disk is qualitatively different from the in-memory agent cluster boundaries. Asking to serialize something to IDB is pretty clearly asking for you to put the bytes on the disk. That's is qualitatively different from sending the memory to another window/worker. I doubt developers even know that they are both backed by similar spec/implementation infrastructure. Putting it another way, when a developer says "put these bytes on disk" it's pretty surprising to be told "no, these are special shared bytes, we don't put those on the disk". For putting-on-disk purposes, they're just bytes, IMO! The notifications case is less straightforward, as it certainly wasn't obvious to me that it was more like IDB than like postMessage. So my intuition doesn't help there. |
It seems very natural to me, but either way they will need a different code path. These are the options we have I think:
|
OK, trying to pick between those made me side with (1). In addition to the argument-from-conservativism, the idea that Notifications and IDB should be consistent clinched it. Because neither (2) nor (3) is very good for Notifications. Both are surprising for developers, I would think. FWIW @binji and @inexorabletash were leaning toward (1) from what I understand, and Firefox already implements (1), so I think we should be good to move forward with it. Thanks again for walking me through all this. |
I decided to split out messageerror into its own PR: #2530. We can still land it together with this or shortly after each other, but it seems far better to get it reviewed separately. |
I will rebase this PR once we've made some progress on the other PRs and maybe landed one or two. In particular this probably needs to be on top of the messageerror PR due to the addition of the new StructuredSerializeForStorage abstract operation, but it also needs to be on top of the other PRs and it doesn't quite seem worth the effort to start cherry picking to make all that possible as the foundations are not at the nit stage just yet. |
Thank you both for your care and attention on this. |
On today's WebAssembly call we briefly discussed the pushState/replaceState case and thought that should probably use StructuredSerializeForStorage as well, both to be conservative and because it seems at least feasible implementations could serialize history state to the disk during navigation. |
That sounds like a good plan. |
Wait, this already happens, but I guess you mean to the "serialization steps" as stated earlier. |
But also, if we change "serialization steps", everyone that has integrated with the current calling pattern will have to change again. Perhaps introducing "serialization for storage steps" isn't so bad? |
Oh sorry, I hadn't read through the actual changes yet :-/. I agree it's a bit confusing whether people who have integrated with the current pattern for declaring serialization steps need to declare the third argument if they don't use it. E.g. in JavaScript they would not, but in most strongly-typed languages they would. Maybe "serialization for storage steps" is the easiest way out of that confusion. |
It also makes more sense as the second argument I thought. Hmm. |
I decided to make forStorage a third argument to "serialization steps" after all and recommend it be omitted if it's not relevant to the algorithm at hand. This seemed better than requiring platform objects to define a set of steps labeled "for storage" while the dominant usage of such an algorithm would not in fact be "for storage". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it took me a while to get around to reviewing this
@@ -7643,6 +7667,9 @@ interface <dfn>DOMStringList</dfn> { | |||
<p>These steps may perform a <span>sub-serialization</span> to serialize nested data | |||
structures. They should not call <span>StructuredSerialize</span> directly, as doing so will | |||
omit the important <var>memory</var> argument.</p> | |||
|
|||
<p>The introduction of these steps should omit mention of the <var>forStorage</var> argument if | |||
it is not relevant to the algorithm.</p> | |||
</dd> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a new idea. Why don't we say that platform objects may define themselves as "not serializable for storage", in addition to their serialization/deserialization steps? And explain when you might want to do that. Then we just throw a DataCloneError if that is set for a platform object.
That would help enforce consistent errors, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a parameter, it would let WebAssembly.Module define the "throw if serializing for storage if not in a secure context" behavior we've discussed elsewhere (rather than doing that in IDB as a post-serialize pass). Although maybe we can do that with prose just as easily. "A module is not serializable for storage if the context is not a secure context."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting; I am now leaning back toward a parameter. The prose version isn't great, because of the question "what is the context"? Which indeed is a good point we'll need to be sure to write tests on. Consider the case
// inside window1, so object literals/functions have a relevant realm of window1
window2Store.put({
get x() { return new window3.WebAssembly.Module(...); }
});
which context(s) here need to be secure?
- The current settings object during serialization is that of window2
- The relevant settings object of the wasm module itself is window3. (Note: this kind of assumes WASM modules are platform objects, since normal JS objects don't necessarily have relevant settings objects. See Formalize [[Realm]] internal slot of ordinary objects tc39/ecma262#573.)
The only way "not serializable for storage" would work is if we decide window3 is the one that needs to be secure; then we could say "A module is not serializable for storage if its relevant settings object is not a secure context". For window2 we'll need an algorithm so we can correctly refer to "the current settings object".
(window1 isn't easy to get at in specs, and probably not in implementation either, so we can forget that.)
We could also require some combination of these, e.g. both window2 and window3. Again this is easier with algorithm steps.
Given this complexity, for now I am happy with either:
- Abandoning revisions to serialization steps alone in this PR and working on a follow-up
- Going with the PR as-is, with an optional forStorage parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current PR is fine. It's flexible which is what we need until we have patterns.
source
Outdated
allocation failure.</p> | ||
<li><p>Set <var>serialized</var> to { [[Type]]: "SharedArrayBuffer", [[ArrayBufferData]]: | ||
<var>value</var>.[[ArrayBufferData]], [[ArrayBufferByteLength]]: <var>size</var>, | ||
[[AgentCluster]]: <span>current Realm Record</span>'s corresponding <span>agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the" current Realm Record's...
source
Outdated
<li><p>Let <var>size</var> be <var>value</var>.[[ArrayBufferByteLength]].</p></li> | ||
|
||
<li> | ||
<p>Let <var>dataCopy</var> be ? <span>CreateByteDataBlock</span>(<var>size</var>).</p> | ||
<p>If <span>IsSharedArrayBuffer</span>(<var>value</var>) is true, then: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably want a ! here per your recent changes?
source
Outdated
<li><p>Set <var>serialized</var> to { [[Type]]: "ArrayBuffer", [[ArrayBufferData]]: | ||
<var>dataCopy</var>, [[ArrayBufferByteLength]]: <var>size</var> }.</p></li> | ||
<ol> | ||
<li><p>If <span>IsDetachedBuffer</span>(<var>value</var>) is true, then throw a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another missing !
data-dfn-type="abstract-op"><dfn>StructuredSerialize</dfn> ( <var>value</var> [ , | ||
<var>memory</var> ] )</h4> | ||
<h4 id="structuredserializeinternal" data-noexport="" data-lt="StructuredSerializeInternal" | ||
data-dfn-type="abstract-op"><dfn>StructuredSerializeInternal</dfn> ( <var>value</var>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could omit all the Bikeshed stuff here since we're not exporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I'd leave it in for when we get that public/private thing fixed.
<li><p>Let <var>serialized</var> be ? <span>StructuredSerialize</span>(<var>input</var>, | ||
<var>memory</var>).</p></li> | ||
<li><p>Let <var>serialized</var> be ? <span>StructuredSerializeInternal</span>(<var>input</var>, | ||
false, <var>memory</var>).</p></li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just keep calling StructuredSerialize here; seems slightly nicer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How? StructuredSerialize no longer accepts memory.
@@ -8611,6 +8680,7 @@ o.myself = o;</pre> | |||
</dd> | |||
|
|||
<dt><span>StructuredSerialize</span></dt> | |||
<dt><span>StructuredSerializeForStorage</span></dt> | |||
<dt><span>StructuredDeserialize</span></dt> | |||
<dd> | |||
<p>Creating a <span>JavaScript Realm</span>-independent snapshot of a given value which can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a paragraph after this explaining the distinction. Something like
StructuredSerializeForStorage can be used for situations where the serialization is anticipated to be stored in a persistent manner, instead of passed between Realms. It throws when attempting to serialize SharedArrayBuffer objects, since storing shared memory doesn't make sense. Similarly, it throws when given any platform objects that are marked as [not serializable for storage]."
source
Outdated
<span>StructuredCloneWithTransfer</span> on its arguments, but is careful to do so immediately, | ||
inside the synchronous portion of its algorithm. Thus it is able to use the structured cloning | ||
algorithms without needing to <span>prepare to run script</span> and <span>prepare to run a | ||
<span>StructuredSerialize</span> on its arguments, but is careful to do so immediately, inside the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StructuredSerializeWithTransfer
As defined by and discussed in whatwg/html#2518.
Additionally, define StructuredSerializeForStorage when for serializable objects need to be stored more persistently. Tests: web-platform-tests/wpt#5003. Fixes #2260 (by being the last in a set of fixes). Closes tc39/proposal-ecmascript-sharedmem#144 and closes tc39/proposal-ecmascript-sharedmem#65.
Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1360190. Maybe we should have a generic bug for each browser though to go through the new tests. Safari fails quite a few due to not supporting shared or service workers, or BroadcastChannel, and also seems to have some bugs with window-to-window sharing. |
Blog post available for review: https://docs.google.com/document/d/1Gq9kwKHG4N7DLBXSbgkxzvxH2UN5iwXIjwBjB0SByh8/edit?usp=sharing. |
This LGTM and I'm ready to merge/publish the blog post when you are. I agree with a single "update SAB to pass the tests" bug for each browser. |
Something weird happened. Because I did squash & merge so I could change the commit message, GitHub ended up changing the author of the commit. I haven't seen that happen before. |
As defined by and discussed in whatwg/html#2518.
It seems like this was missed in whatwg#2518.
It seems like this was missed in #2518.
It seems like this was missed in whatwg#2518.
This is a new take on #2361 based upon StructuredSerialize and StructuredDeserialize.
I rebased @domenic’s work and then did a fixup in a new commit.
For failure StructuredDeserialize throws. (Still need to update some callers. The previous PR is also a good source for tests that are still lacking.)