-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite a lot of the internals of Blobs to make reading more explicit. #154
base: main
Are you sure you want to change the base?
Conversation
This at least matches how chrome implements blobs backed by files, and how slicing and/or combining those blobs with other blobs works. I haven't verified yet if all browsers actually fully agree on this though. There also is still plenty of room for improvement here, and of course follow-up work to have HTML actually call out to the new "create a file backed File object" to create files in drag&drop and upload algorithms. @annevk do you have time to review this? not urgent, since I won't have much time in the near future to follow up anyway, but definitely would welcome your input about both the high-level idea of how this tries to define how things work, as well as details. |
This redefines how blobs work by adding internal slots and hooks to support Blob objects backed by various sources of data. By doing so this tries to make more explicit how various edge cases behave, as well as making it clear that blobs are immutable. This fixes #75, fixes #99, and fixes #47. It also lays the ground work to address w3c/webappsec-clear-site-date#49. This also fixes #71.
40f442d
to
030d980
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In progress. Made it to line ~400 or so
</div> | ||
|
||
<div algorithm="blob deserialize"> | ||
Their [=deserialization step=] (the <dfn>blob deserialization steps</dfn>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "step" → "steps"
</dl> | ||
1. Let |read state| be a new [=bytes blob read state=]. | ||
1. If |offset| is larger than |snapshot state|[<code><a for="bytes blob snapshot state">"data"</a></code>]'s [=byte sequence/length=], | ||
set |read state|.[=bytes blob read state/bytes=] to an empty [=byte sequence=]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I don't see the struct.name syntax called out in Infra. Do you know if there's plan to add it?
It would help a lot to have an enumeration of the edge cases this attempts to cover as well as some kind of overview of the design to solve them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably didn't catch everything, but I made it to the end!
To <dfn lt="process blob parts|processing blob parts">process blob parts</dfn> given a sequence of {{BlobPart}}'s |parts| | ||
and {{BlobPropertyBag}} |options|, | ||
To <dfn lt="process blob parts|processing blob parts">process blob parts</dfn> | ||
given a sequence of {{BlobPart}}'s |blobParts| and {{BlobPropertyBag}} |options|, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: {{BlobPart}}s
(plural s, not a possessive s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, maybe list rather than sequence? (I'm hazy on when to switch from IDL terms to Infra terms)
|
||
1. For each |element| in |parts|: | ||
1. Let |bytes| be an empty [=byte sequence=]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This algorithm concatenates adjacent non-blob parts rather than just ending up with more parts. Although this probably matches implementations, it doesn't seem to be observable and makes the algorithm more complicated. Is there a good reason for it?
@@ -361,16 +458,179 @@ run the following steps: | |||
1. If |element| is a {{BufferSource}}, <a lt="get a copy of the buffer source">get |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not new, but would be more readable with "Otherwise, if ..."
@@ -361,16 +458,179 @@ run the following steps: | |||
1. If |element| is a {{BufferSource}}, <a lt="get a copy of the buffer source">get | |||
a copy of the bytes held by the buffer source</a>, and append those bytes to |bytes|. | |||
|
|||
1. If |element| is a {{Blob}}, | |||
append the bytes it represents to |bytes|. | |||
1. If |element| is a {{Blob}}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Otherwise, if ..."
:: A number, representing the byte offset in the remaining blob parts | ||
from which to start returning data. | ||
: <dfn>nested blob data</dfn> | ||
:: `undefined` or a [=blob data description=]. This is `undefined` unless otherwise specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we typically style undefined (or null, true, false) as code ?
(Likely not consistent across specs)
let |d| be set to the {{FilePropertyBag/lastModified}} dictionary member. | ||
If it is not provided, | ||
set |d| to the current date and time | ||
with constructed files, mandating UTF-16 lessens ambiquity when file names are converted to <a>byte sequences</a>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling: should be "ambiguity"
2. Let |n| be a new string of the same size as the {{fileName}} argument to the constructor. | ||
Copy every character from {{fileName}} to |n|, | ||
1. Let |n| be a new string of the same size as |fileName|. | ||
1. Copy every character from |fileName| to |n|, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not new: use "code point" not "character" here and elsewhere.
with constructed files, mandating UTF-16 lessens ambiquity when file names are converted to <a>byte sequences</a>. | ||
|
||
1. If |options|.{{FilePropertyBag/lastModified}} member is provided: | ||
1. Let |d| be |options|.{{FilePropertyBag/lastModified}} dictionary member. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consistency: "member" or "dictionary member"
1. If |options|.{{FilePropertyBag/lastModified}} member is provided: | ||
1. Let |d| be |options|.{{FilePropertyBag/lastModified}} dictionary member. | ||
1. Otherwise: | ||
1. Let |d| be the current date and time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current date / time (as millis since epoch) comes up several times in the spec. Factor it out into a definition.
1. If |options|.{{FilePropertyBag/lastModified}} member is provided: | ||
1. Let |d| be |options|.{{FilePropertyBag/lastModified}} dictionary member. | ||
1. Otherwise: | ||
1. Let |d| be the current date and time | ||
represented as the number of milliseconds since the <a>Unix Epoch</a> | ||
(which is the equivalent of <code>Date.now()</code> [[ECMA-262]]). | ||
|
||
Note: Since ECMA-262 {{Date}} objects convert to <code>long long</code> values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move this into the domintro ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess File constructor doesn't have a domintro yet. Add one? Blob too?
That'd be a good place to call out (in giant blinking text...) that the first argument is a array. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posting my current comments, since @inexorabletash made it to the end before me. Happy to take a second look after revisions, if you'd like me to.
|
||
</div> | ||
|
||
Issue(w3c/webappsec-clear-site-data#49): The actual storage API |serialized| was persisted in will need a way of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The storage API used to persist |serialized| needs a way to
</div> | ||
|
||
Issue(w3c/webappsec-clear-site-data#49): The actual storage API |serialized| was persisted in will need a way of | ||
modifying the read algorithms for deserialized blobs. I.e. a Blob that was |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e.
-> E.g.,
or For example
,
(I think the IndexedDB sentence here is an example of the above, not an elaboration.)
|
||
1. Set |serialized|.\[[SnapshotState]] to |value|'s [=snapshot state=]. | ||
A {{Blob}} has an associated <dfn export for=Blob>\[[data]]</dfn> internal slot, | ||
a [=blob data description=]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal slot, which holds/is a [=blob data description=] ?
(I'm trying to avoid a reparse. Without the extra words, my first parse of the sentence is that there is an internal slot and a blob data description.)
Issue(w3c/webappsec-clear-site-data#49): The actual storage API |serialized| was persisted in will need a way of | ||
modifying the read algorithms for deserialized blobs. I.e. a Blob that was | ||
deserialized from IndexedDB should start throwing in its read steps after | ||
clear-site-data clears all IndexedDB data. Somehow let StructuredDeserialize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow
is redundant
modifying the read algorithms for deserialized blobs. I.e. a Blob that was | ||
deserialized from IndexedDB should start throwing in its read steps after | ||
clear-site-data clears all IndexedDB data. Somehow let StructuredDeserialize | ||
pass along a hook from the storage API to here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
-> .
### Byte Sequence backed blobs ### {#byte-sequence-backed-blobs} | ||
|
||
The [=snapshot state=] for a byte sequence backed blob contains a | ||
<dfn for="bytes blob snapshot state"><code>"data"</code></dfn> member, a [=byte sequence=]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Disclaimer: I'm not versed in spec-ese) I would have expected (map) entry
instead of member
here.
The [=snapshot state=] ... has a "data" entry that stores a [=byte sequence=].
?
|
||
</div> | ||
|
||
A <dfn>bytes blob read state</dfn> is a [=struct=] conssting of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conssting
-> consisting
(typo)
or a [=struct=] that has a single <dfn for="bytes blob read state">bytes</dfn> member, which is a [=byte sequence=]
where each part could be a blob itself. | ||
|
||
The [=snapshot state=] for a multipart blob contains a | ||
<dfn for="multipart blob snapshot state"><code>"parts"</code></dfn> member, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above regarding member
vs (map) entry
|
||
1. For each |element| in |blobParts|: | ||
1. If |element| is a {{USVString}}, run the following substeps: | ||
|
||
1. Let |s| be |element|. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(not introduced by this PR)
I think it would be nice to replace |s|
with |str|
. This would avoid [=UTF-8 encoding=] |s| which reads like it could be a possessive pronoun.
### Byte Sequence backed blobs ### {#byte-sequence-backed-blobs} | ||
|
||
The [=snapshot state=] for a byte sequence backed blob contains a | ||
<dfn for="bytes blob snapshot state"><code>"data"</code></dfn> member, a [=byte sequence=]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about renaming the "data"
map entry to "bytes"
to avoid the naming conflict with the [[data]]
internal slot?
This redefines how blobs work by adding internal slots and hooks to
support Blob objects backed by various sources of data. By doing so this
tries to make more explicit how various edge cases behave, as well as
making it clear that blobs are immutable.
This fixes #75, fixes #99, and fixes #47.
It also lays the ground work to address w3c/webappsec-clear-site-data#49.
This also fixes #71.
For normative changes, the following tasks have been completed:
Implementation commitment:
Preview | Diff