
Encode trait methods should be fallible #675

Closed
divergentdave opened this issue Aug 10, 2023 · 8 comments · Fixed by #865
Comments

@divergentdave
Contributor

I think we should make Encode::encode() fallible in the next breaking change release. Currently the various functions that encode vectors panic if their contents exceed the limits of the length prefix, but this ought to instead be an error for the caller to propagate or handle.
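A minimal sketch of what a fallible `Encode` could look like, assuming a hypothetical `CodecError` variant for length-prefix overflow (the names here are illustrative, not the crate's actual API):

```rust
use std::convert::TryFrom;

// Hypothetical error type; the real crate's CodecError may differ.
#[derive(Debug)]
pub enum CodecError {
    LengthPrefixTooBig(usize),
}

pub trait Encode {
    fn encode(&self, bytes: &mut Vec<u8>) -> Result<(), CodecError>;
}

// A u16-length-prefixed byte vector: encoding returns an error instead of
// panicking when the payload exceeds u16::MAX bytes.
pub struct OpaqueU16(pub Vec<u8>);

impl Encode for OpaqueU16 {
    fn encode(&self, bytes: &mut Vec<u8>) -> Result<(), CodecError> {
        let len = u16::try_from(self.0.len())
            .map_err(|_| CodecError::LengthPrefixTooBig(self.0.len()))?;
        bytes.extend_from_slice(&len.to_be_bytes());
        bytes.extend_from_slice(&self.0);
        Ok(())
    }
}

fn main() {
    let mut buf = Vec::new();
    assert!(OpaqueU16(vec![0u8; 3]).encode(&mut buf).is_ok());
    assert_eq!(buf, vec![0, 3, 0, 0, 0]);
    // An oversized payload yields an error the caller can propagate or
    // handle, rather than a panic.
    assert!(OpaqueU16(vec![0u8; 70_000]).encode(&mut Vec::new()).is_err());
}
```

At call sites, the change would mostly amount to appending `?` to `encode()` calls and threading the error upward.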

@tgeoghegan
Contributor

I'm not necessarily opposed to this, but when considering a panic vs. an error we should ask: is the error recoverable? If you have some object whose length prefix is u32, and then you try to cram 1 TB of bytes into it, I think that indicates an unrecoverable programmer error. It's not like the calling code is going to try again with a smaller object. The value of panicking there is that the programmer gets a stack trace indicating exactly where the faulty call to encode() was, as opposed to the faulty stack being popped all the way up to, say, an HTTP message router, and then all they get is "Internal Server Error".

@cjpatton
Collaborator

u16-prefixed strings are easy to overflow, and for DAP in particular it's hard to write tests that catch this panic. In draft-02 it was easy to construct aggregation jobs for which the request was too long to encode. The length depends not only on the number of reports, but also on the type of VDAF used. I'd rather have an error than a panic in such cases.

Arguably we've fixed this particular problem by draft-05, but the same kind of issue might come up in other cases.

@branlwyd
Member

+1 to making this fallible.

I would go further and suggest that we make this library operate on streams of data rather than byte buffers/Cursors; this is a pretty standard interface for an IO library and would allow us to intermingle parsing work with IO work, improving efficiency and lowering memory usage. We couldn't do this before since a stream can fail to read partway through, and we had no way to communicate this to the caller; but if we make these functions fallible, we can communicate stream-failure errors too.
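Making `encode` fallible is what would unlock a stream-oriented design like the one suggested above. A hedged sketch, assuming a hypothetical `EncodeStream` trait (not the crate's actual API) that writes to any `io::Write` sink, so IO failures surface through the same `Result` path as encoding errors:

```rust
use std::io::{self, Write};

// Hypothetical stream-oriented variant of Encode: instead of appending to a
// byte buffer, encoding writes directly to an io::Write sink, letting the
// library intermingle encoding work with IO work.
pub trait EncodeStream {
    fn encode_to<W: Write>(&self, writer: &mut W) -> io::Result<()>;
}

// Illustrative message type: a single big-endian u32 counter.
pub struct Count(pub u32);

impl EncodeStream for Count {
    fn encode_to<W: Write>(&self, writer: &mut W) -> io::Result<()> {
        // write_all can fail partway through; the error propagates to the
        // caller instead of being unreportable.
        writer.write_all(&self.0.to_be_bytes())
    }
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new(); // Vec<u8> implements io::Write
    Count(7).encode_to(&mut buf)?;
    assert_eq!(buf, [0, 0, 0, 7]);
    Ok(())
}
```

The same signature works unchanged whether the sink is an in-memory buffer, a file, or a network socket.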

I'm not necessarily opposed to this, but when considering a panic vs. an error we should ask: is the error recoverable? If you have some object whose length prefix is u32, and then you try to cram 1 TB of bytes into it, I think that indicates an unrecoverable programmer error.

I disagree -- this error could occur because the user of the library (e.g. Janus or Daphne) attempted to cram too much data into a structure; that is not a programmer error in the sense of always being a bug. And since the maximum amount of data that can fit into a structure is opaque to the user of the library, the user can't easily avoid this error either.

Also, even if we encounter this error, we would want to propagate the error to the caller (via an error return value) to allow the caller to terminate/retry/etc that one request, rather than panicking which might take out the entire process.

@branlwyd
Member

(another way to think of this: as long as we panic on input-too-large, a relatively-untrusted Client can cause us to panic at any time by sending too much data in a field of a Report. switching fields e.g. from u16 to u32 increases the amount of data that must be sent to trigger this issue, but does not avoid it.)

@cjpatton
Collaborator

@divergentdave do you think we can pick this up in the next release? I'd love to see this happen.

@divergentdave
Contributor Author

I think that makes sense. It'll be the most disruptive breaking change we've done in a while, but I think it's worth it. I split off a separate issue for switching from monolithic buffers to streams as a possible follow-on.

@cjpatton
Collaborator

Yeah I think a good first step would be to just have .encode() return Result<(), CodecError>. We should also give @branlwyd a heads up, as this might be a headache on the Janus side.

It'll be a pile of work for Daphne, but well worth it. fyi/ @mendess

@branlwyd
Member

branlwyd commented Dec 1, 2023

Yup, I am all for this change, despite the churn for Janus & other users of the library. I think it closes what would otherwise be an almost-unavoidable panic (from the POV of users of the library) which might be caused by untrusted user input.

(and fwiw, I'm fine separating the "operate on streams rather than byte buffers" suggestion out to its own issue -- while I still think it's a good idea, it's less critical.)
