Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDK: fix string field serialization for multipart/form-data requests #5479

Merged
merged 4 commits into from
Jan 10, 2023

Conversation

SpecLad
Copy link
Contributor

@SpecLad SpecLad commented Dec 16, 2022

Django REST Framework ignores the Content-Type on request body parts, so it doesn't know that they are JSON-encoded. Instead, it just tries to decode each part as if it was an str()-encoded value.

Change the encoding to match the decoding. The only type this matters for is str, because json.dumps and str produce different encodings for str values.

Remove none_type from the list of encodable types since, to my knowledge, there's no way to encode a None value as a multipart/form-data part in a way that DRF will understand.

Motivation and context

This is a follow-up to #5058. Replaces #5240.

How has this been tested?

API unit tests.

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

Django REST Framework ignores the Content-Type on request body parts, so
it doesn't know that they are JSON-encoded. Instead, it just tries to decode
each part as if it was an `str()`-encoded value.

Change the encoding to match the decoding. The only type this matters for
is `str`, because `json.dumps` and `str` produce different encodings for
`str` values.

Remove `none_type` from the list of encodable types since, to my knowledge,
there's no way to encode a `None` value as a `multipart/form-data` part in
a way that DRF will understand.
@SpecLad SpecLad marked this pull request as ready for review December 16, 2022 12:01
@SpecLad
Copy link
Contributor Author

SpecLad commented Dec 16, 2022

FWIW, I think we should eventually phase out support for multipart/form-data requests.

AFAICS, the only advantage this format has over JSON is that Django makes it easy to handle arbitrary-length file uploads (since it saves the incoming files in a temporary directory). But we already support TUS, which also lets clients upload such files.

On the other hand, multipart/form-data is massively limited in what data types it can encode, and tools can be inconsistent in how they handle field values that are not strings. Not to mention that supporting an extra content type means more required testing, and more opportunities for bugs.

@@ -126,8 +126,8 @@ class ApiClient(object):
self.default_headers['User-Agent'] = value

def _serialize_post_parameter(self, obj):
if isinstance(obj, (str, int, float, none_type, bool)):
return ('', json.dumps(obj), 'application/json')
if isinstance(obj, (str, int, float, bool)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the exclusion of None mean SDK users will obtain errors when trying to send None values in POST requests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But I don't think you can send None values as multipart/form-data anyway, so better to fail early.

@zhiltsov-max
Copy link
Contributor

AFAICS, the only advantage this format has over JSON is that Django makes it easy to handle arbitrary-length file uploads (since it saves the incoming files in a temporary directory).

I may be wrong, by comparing to JSON, multipart/form-data should allow passing binary data with no extra encoding overhead, while in JSON we need to encode the data using escape sequences and text characters.

@SpecLad
Copy link
Contributor Author

SpecLad commented Dec 19, 2022

I may be wrong, by comparing to JSON, multipart/form-data should allow passing binary data with no extra encoding overhead, while in JSON we need to encode the data using escape sequences and text characters.

Yeah, fair point. Still, TUS also lets you do this, so we don't need multipart/form-data support.

@zhiltsov-max
Copy link
Contributor

TUS also lets you do this, so we don't need multipart/form-data support.

Probably, we still have at least 1 use case when we need multipart/form-data - it is when we send multiple files in a single request during task data uploading (both Upload-Multiple and the legacy single request variants). Also, AFAIK requests and urllib automatically encode files as multipart requests.

@SpecLad
Copy link
Contributor Author

SpecLad commented Dec 31, 2022

Probably, we still have at least 1 use case when we need multipart/form-data - it is when we send multiple files in a single request during task data uploading (both Upload-Multiple and the legacy single request variants).

Yeah, it's true.

Perhaps we just need to be careful with the parameters that we accept alongside file uploads, and only use parameter types that can be transferred by simple serialization.

@zhiltsov-max
Copy link
Contributor

zhiltsov-max commented Jan 2, 2023

Perhaps we just need to be careful with the parameters that we accept alongside file uploads, and only use parameter types that can be transferred by simple serialization.

It seems like the only working way now, actually. If we will split the parameters by requests and only send the required params (which is only the image_quality now) with client_files (or other files), it works. It doesn't look very attractive to me, but there is no better way found yet. Could you reflect this fact and the way to resolve in the documentation next to other API usage recommendations?

Upd.: I have an idea that for files, can we encode them into some reversible ASCII-based representation (e.g. base64) for transfer.

@nmanovic nmanovic merged commit 3d9c5ad into cvat-ai:develop Jan 10, 2023
@SpecLad SpecLad deleted the fix-part-serialization branch January 18, 2023 13:03
mikhail-treskin pushed a commit to retailnext/cvat that referenced this pull request Jul 1, 2023
…vat-ai#5479)

Django REST Framework ignores the Content-Type on request body parts, so
it doesn't know that they are JSON-encoded. Instead, it just tries to
decode each part as if it was an `str()`-encoded value.

Change the encoding to match the decoding. The only type this matters
for is `str`, because `json.dumps` and `str` produce different encodings
for `str` values.

Remove `none_type` from the list of encodable types since, to my
knowledge, there's no way to encode a `None` value as a
`multipart/form-data` part in a way that DRF will understand.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants