Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ignore parts without names #71

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Uzlopak
Copy link
Contributor

@Uzlopak Uzlopak commented Dec 5, 2021

originally mscdex/busboy#244 with more links of @faust64
Chocobozzz/PeerTube#4121
expressjs/multer#913

https://datatracker.ietf.org/doc/html/rfc7578#section-4.2

4.2.  Content-Disposition Header Field for Each Part

   Each part MUST contain a Content-Disposition header field [RFC2183]
   where the disposition type is "form-data".  The Content-Disposition
   header field MUST also contain an additional parameter of "name"; the
   value of the "name" parameter is the original field name from the
   form (possibly encoded; see Section 5.1).  For example, a part might
   contain a header field such as the following, with the body of the
   part containing the form data of the "user" field:

           Content-Disposition: form-data; name="user"

There is actually no way to handle a part without name gracefully.

I think, I would add the option ignoreEmptyFieldnames with default of false to have non-breaking change to legacy busboy.

Any thoughts?

Checklist

@kibertoad
Copy link
Member

What happens currently?
Since this violates spec, I wonder if throwing a very explicit error is not a better option.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 5, 2021

Currently the field event is emitted aber fieldname is undefined. I think if you emit Error then the whole submitted form would be discarded. not sure

Maybe the package consumers have some idea how they would handle those cases?

@LinusU ? @dougwilson ?

@LinusU
Copy link
Contributor

LinusU commented Dec 6, 2021

Since this violates spec, I wonder if throwing a very explicit error is not a better option.

I would be okay with this 👍

Currently the field event is emitted aber fieldname is undefined.

Hmm, this will probably lead to some unwanted behaviour in Multer 😅

https://github.com/expressjs/multer/blob/690f70378c7093673c4ff7c6e82943fd6f3fbd8d/lib/read-body.js#L32-L34

If limits.fieldNameSize is set, it will try to read length of undefined. If it isn't set, it will call appendField with an undefined field name:

https://github.com/expressjs/multer/blob/690f70378c7093673c4ff7c6e82943fd6f3fbd8d/lib/middleware.js#L15-L17

Which will also try to read the length...

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 6, 2021

@LinusU
Hi,
probably a litte misunderstanding:
this is the current legacy busboy behaviour. This results in bugs like this in multer:
expressjs/multer#553

So if you want to ignore that invalid multipart data, you could after this PR do a ignoreUndefinedFieldnames: true and those invalid fields would be skipped and not throwing issues in multer as the field event would not be emitted as the field is not valid anyway.

@LinusU
Copy link
Contributor

LinusU commented Dec 6, 2021

Yeah, I assumed that we had the problems in all current versions of Multer, but my wording didn't really communicate that 😅

So if you want to ignore that invalid multipart data

I would prefer it if Multer could raise an Error since 1) it's explicitly required by the specification, and 2) silently dropping things will probably lead to harder to debug errors rather than an explicit error.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 6, 2021

Just for my curiosity: Is the Error-throwing a part of the multipart rfc or the multer spec?

@kibertoad
Copy link
Member

@Uzlopak RFC does not cover any part of error handling, and I don't think that Multer spec is a thing. I would say it's more of a good engineering practice, failing fast on data that clearly contradicts the specification.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Dec 6, 2021

I am open for suggestions. I guess, we need to emit an Error and not throw? And secondly, what should be the useful information if we error? "Malformed Multipart Error"?

@LinusU
Copy link
Contributor

LinusU commented Dec 7, 2021

Hmm, it seems like the current version also skips over parts that doesn't have a Content-Disposition header field at all, which also is invalid according to the spec 🤔

Each part MUST contain a Content-Disposition header field [...]

I personally would prefer to fail fast on any data that isn't valid according to RFC7578.

Since 1.0.0 is already released, it's up for interpretation wether making changes to adhere to the specification is breaking or not. One way forward could be to add a strict or similar flag that makes it emit an error both if the Content-Disposition header field is missing, and if the name is missing?

@kibertoad
Copy link
Member

I am open to both approaches - keeping 1.0.0 as drop-in replacement for busboy, and following up with 2.0.0 which is way stricter about what it accepts, or introducing "strict" option in 1.1.0.

@LinusU
Copy link
Contributor

LinusU commented Dec 7, 2021

Neat!

As for Multer, I would prefer to use the strict version for Multers 2.0.0 release 🚀

@kibertoad
Copy link
Member

As discussed, can we start throwing errors there?

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Jun 9, 2022

For clarification:
I should add error throwing?

@LinusU
Copy link
Contributor

LinusU commented Jun 10, 2022

Current master already has breaking changes, so my recommendation would be to not add an option and just start throwing an error since as you say there is "no way to handle a part without name gracefully."

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Jun 10, 2022

@LinusU
TBH, this patch is 6 months old and I have to read again the source to understand it, LOL. But I need some directions:

  1. Should I remove the ignoreUndefinedFieldnames option? Or should this be set to true by default.
  2. I would then throw an generic Error with the message 'Multipart: Form Part does not contain a name'
  3. Anything I should be aware of?

Also please additional feedback by @kibertoad

I would patch it right away if we agree on the technical spec

@LinusU
Copy link
Contributor

LinusU commented Jun 10, 2022

I'm not a maintainer of this project btw. I'm a maintainer of Multer, a large user of this library.

My personal recommendation would be:

  1. Yes, remove the option
  2. Throw an Error with an appropriate message
  3. Not that I know of

You might however want to wait for an answer from one the maintainers

@kibertoad
Copy link
Member

I agree that we should be strict and throw an error for these cases, configuration doesn't seem necessary, as this case directly violates spec

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Jun 10, 2022

@LinusU

That is the reason why I ask you. As a consumer of this package, you have the better insights regarding the edge cases etc.. As you know we forked and refactored this, because of the bugs etc.. But I am happy to get reliable feedback from you to determine what the best direction is. So Thank you for the feedback ;).

Waiting now for feedback from @kibertoad

Scratch that :). Ok, will apply the changes soon.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Jun 29, 2022

I thought I answered here, but it seems I didnt. Well I have tested locally. What actually the issue is, that you can have something like a payload, where you have 10 parts. The first 9 parts are valid, and the 10th is invalid. We would then emit error. But we already piped the first 9 parts already somewhere. So there could be an issu.

Or you have 10 parts and the first part is invalid. So we have to emit the error and skip all the other parts.

This seems to be strange to me. But I have no idea except to emit an error and skip the following part streams.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Jul 28, 2022

I really would like to resolve this. How should we move forward?

@Eomm
Copy link
Member

Eomm commented Sep 3, 2022

The first 9 parts are valid, and the 10th is invalid. We would then emit error. But we already piped the first 9 parts already somewhere. So there could be an issue.

I would implement this as the limit: whenever an issue is found, a fieldError is emitted and the field itself is skipped.
This would match the actual behaviour and the strict: false proposal.

Setting strict: true, the stream should be closed as a fatal error: users already should manage errors and deal with it.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Sep 28, 2023

@KhafraDev
@gurgunday
What should we do with parts without names?

@gurgunday
Copy link
Member

Anything that violates the spec should emit an error as fast as possible in my opinion

Clearly the RFC says that a part MUST have a name value

However, I also wonder why mscdex/busboy#244 was closed. Was it fixed afterward?

@gurgunday
Copy link
Member

There is actually no way to handle a part without name gracefully

@Uzlopak could you tell why? If emitting an error for a malformed part and continuing is not possible, it should throw

@KhafraDev
Copy link
Contributor

I wouldn't expect a malformed formdata to parse successfully, this change seems correct.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Sep 28, 2023

So we should throw an error if we have a part without a name, right?

@gurgunday
I implemented it to skip the part, because without a name, i can not assign it to a field. But I am also happy to throw.

@gurgunday
Copy link
Member

So we should throw an error if we have a part without a name, right?

Yeah, the RFC says it must have it.

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Oct 10, 2023

I worked on this issue on the weekend. skipParts was actually avoiding one issue:

When we are processing the data, we dont know if one of the later parts is invalid. This results in the problem, that we create valid file-streams/fields and then we have an invalid part. So we abort. But we will still have created file streams.

So how should we properly handle these cases?

@gurgunday
Copy link
Member

I see, I guess it's because busboy is event based

But in any case, throwing (or at least emitting an error and stopping) would still be more correct in implementations where the user doesn't see the events and just gets the complete body when parsing is completed successfully

@Uzlopak
Copy link
Contributor Author

Uzlopak commented Oct 10, 2023

It is not recommendable to store everything in memory. Imagine, you process a huuuuuge file upload and second field is without a name. Thinking of @Jarred-Sumner s tweet today
image

@gurgunday
Copy link
Member

But isn’t that what happens when fetch formData is used in the browser for instance?

I haven’t checked, just asking

@KhafraDev
Copy link
Contributor

yes it is. .formData() is quite a travesty & I'm pretty sure there's lots of regret in adding it lol

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants