
Encode trait methods should be fallible #675

Closed
divergentdave opened this issue Aug 10, 2023 · 8 comments · Fixed by #865
Comments

@divergentdave
Contributor

I think we should make Encode::encode() fallible in the next breaking change release. Currently the various functions that encode vectors panic if their contents exceed the limits of the length prefix, but this ought to instead be an error for the caller to propagate or handle.
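A minimal sketch of what a fallible `Encode` could look like, assuming a hypothetical `CodecError` variant for length-prefix overflow (the names here are illustrative, not the crate's actual API):

```rust
use std::convert::TryFrom;

// Hypothetical error type; the real crate's CodecError may differ.
#[derive(Debug)]
pub enum CodecError {
    LengthPrefixTooBig(usize),
}

pub trait Encode {
    fn encode(&self, bytes: &mut Vec<u8>) -> Result<(), CodecError>;
}

// A u16-length-prefixed byte vector: encoding returns an error instead of
// panicking when the payload exceeds u16::MAX bytes.
pub struct OpaqueU16(pub Vec<u8>);

impl Encode for OpaqueU16 {
    fn encode(&self, bytes: &mut Vec<u8>) -> Result<(), CodecError> {
        let len = u16::try_from(self.0.len())
            .map_err(|_| CodecError::LengthPrefixTooBig(self.0.len()))?;
        bytes.extend_from_slice(&len.to_be_bytes());
        bytes.extend_from_slice(&self.0);
        Ok(())
    }
}

fn main() {
    let mut buf = Vec::new();
    assert!(OpaqueU16(vec![0u8; 3]).encode(&mut buf).is_ok());
    assert_eq!(buf, vec![0, 3, 0, 0, 0]);
    // An oversized payload yields an error the caller can propagate or
    // handle, rather than a panic.
    assert!(OpaqueU16(vec![0u8; 70_000]).encode(&mut Vec::new()).is_err());
}
```

At call sites, the change would mostly amount to appending `?` to `encode()` calls and threading the error upward.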

@tgeoghegan
Contributor

I'm not necessarily opposed to this, but when considering a panic vs. an error we should ask: is the error recoverable? If you have some object whose length prefix is u32, and then you try to cram 1 TB of bytes into it, I think that indicates an unrecoverable programmer error. It's not like the calling code is going to try again with a smaller object. The value of panicking there is that the programmer gets a stack trace indicating exactly where the faulty call to encode() was, as opposed to the faulty stack being popped all the way up to, say, an HTTP message router, and then all they get is "Internal Server Error".

@cjpatton
Collaborator

u16-prefixed strings are easy to overflow, and for DAP in particular it's hard to write tests that catch this panic. In draft-02 it was easy to construct aggregation jobs for which the request was too long to encode. The length depends not only on the number of reports, but also on the type of VDAF used. I'd rather have an error than a panic in such cases.

Arguably we've fixed this particular problem by draft-05, but the same kind of issue might come up in other cases.

@branlwyd
Member

+1 to making this fallible.

I would go further and suggest that we make this library operate on streams of data rather than byte buffers/Cursors; this is a pretty standard interface for an IO library and would allow us to intermingle parsing work with IO work, improving efficiency and lowering memory usage. We couldn't do this before since a stream can fail to read partway through, and we had no way to communicate this to the caller; but if we make these functions fallible, we can communicate stream-failure errors too.
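Making `encode` fallible is what would unlock a stream-oriented design like the one suggested above. A hedged sketch, assuming a hypothetical `EncodeStream` trait (not the crate's actual API) that writes to any `io::Write` sink, so IO failures surface through the same `Result` path as encoding errors:

```rust
use std::io::{self, Write};

// Hypothetical stream-oriented variant of Encode: instead of appending to a
// byte buffer, encoding writes directly to an io::Write sink, letting the
// library intermingle encoding work with IO work.
pub trait EncodeStream {
    fn encode_to<W: Write>(&self, writer: &mut W) -> io::Result<()>;
}

// Illustrative message type: a single big-endian u32 counter.
pub struct Count(pub u32);

impl EncodeStream for Count {
    fn encode_to<W: Write>(&self, writer: &mut W) -> io::Result<()> {
        // write_all can fail partway through; the error propagates to the
        // caller instead of being unreportable.
        writer.write_all(&self.0.to_be_bytes())
    }
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new(); // Vec<u8> implements io::Write
    Count(7).encode_to(&mut buf)?;
    assert_eq!(buf, [0, 0, 0, 7]);
    Ok(())
}
```

The same signature works unchanged whether the sink is an in-memory buffer, a file, or a network socket.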

I'm not necessarily opposed to this, but when considering a panic vs. an error we should ask: is the error recoverable? If you have some object whose length prefix is u32, and then you try to cram 1 TB of bytes into it, I think that indicates an unrecoverable programmer error.

I disagree -- this error could occur because the user of the library (e.g. Janus or Daphne) attempted to cram too much data into a structure; that is not a programmer error in the sense of always being a bug. And since the maximum amount of data that can fit into a structure is opaque to the user of the library, the user can't easily avoid this error either.

Also, even if we encounter this error, we would want to propagate the error to the caller (via an error return value) to allow the caller to terminate/retry/etc that one request, rather than panicking which might take out the entire process.

@branlwyd
Member

(another way to think of this: as long as we panic on input-too-large, a relatively-untrusted Client can cause us to panic at any time by sending too much data in a field of a Report. switching fields e.g. from u16 to u32 increases the amount of data that must be sent to trigger this issue, but does not avoid it.)

@cjpatton
Collaborator

@divergentdave do you think we can pick this up in the next release? I'd love to see this happen.

@divergentdave
Contributor Author

I think that makes sense. It'll be the most disruptive breaking change we've done in a while, but I think it's worth it. I split off a separate issue for switching from monolithic buffers to streams as a possible follow-on.

@cjpatton
Collaborator

Yeah I think a good first step would be to just have .encode() return Result<(), CodecError>. We should also give @branlwyd a heads up, as this might be a headache on the Janus side.

It'll be a pile of work for Daphne, but well worth it. fyi/ @mendess

@branlwyd
Member

branlwyd commented Dec 1, 2023

Yup, I am all for this change, despite the churn for Janus & other users of the library. I think it closes what would otherwise be an almost-unavoidable panic (from the POV of users of the library) which might be caused by untrusted user input.

(and fwiw, I'm fine separating the "operate on streams rather than byte buffers" suggestion out to its own issue -- while I still think it's a good idea, it's less critical.)
