Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: expose multipart headers to users #1311

Closed
wants to merge 18 commits into from

Conversation

adriangb
Copy link
Member

I don't think starlette exposes form field header information to the application, no. I'd probably be up for adding support for it in httpx. It's a bit niche, but worth having all the same I'd think.

From @tomchristie on Gitter

I'll try to make a PR to httpx at some point for this as well

Comment on lines 233 to 241
_KT = typing.TypeVar("_KT", bound=typing.Hashable)
_VT = typing.TypeVar("_VT")
_T = typing.TypeVar("_T")


_UNSET: typing.Any = object()


class ImmutableMultiDict(typing.Mapping[_KT, _VT]):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making ImmutableMultiDict a generic is to be able to properly type hint

headers: typing.Optional[ImmutableMultiDict[str, Headers]]

starlette/datastructures.py Outdated Show resolved Hide resolved
"""
An immutable multidict, containing both file uploads and text input.
"""
headers: typing.Optional[ImmutableMultiDict[str, Headers]]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where users will access the headers.

I had to make a choice here: create a class MultipartFormData(FormData): ... subclass and add headers there or make headers Optional (because they are not present in application/x-www-form-urlencoded requests).

I chose the latter, but both should work and be backwards compatible. The former is just more code changes, and results in users having to do isinstance(await request.form(), MultiPartFormData) vs. (await request.form()).headers is not None before accessing headers.

@@ -29,6 +29,8 @@ async def app(scope, receive, send):
else:
output[key] = value
await request.close()
if data.headers is not None:
output["__headers"] = [(field_name, dict(headers.items())) for field_name, headers in data.headers.multi_items()]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also return a dict w/ 2 keys in every test instead of using a __headers key. That does mean all of the tests that weren't affected by this change would also need to be edited tho.

@@ -665,3 +647,35 @@ def __getattr__(self, key: typing.Any) -> typing.Any:

def __delattr__(self, key: typing.Any) -> None:
del self._state[key]


class FormData(ImmutableMultiDict[str, typing.Union[str, UploadFile]]):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this under Headers to avoid the forward ref. We can also just from __future__ import annotations if necessary and void moving the code block.

@adriangb
Copy link
Member Author

It is also possible to implement this as:

class FormData(ImmutableMultiDict[str, typing.Union[str, UploadFile]]):
    _headers: typing.Optional[ImmutableMultiDict[str, Headers]]

    def __init__(
        self,
        *args: typing.Union[
            "FormData",
            typing.Mapping[str, typing.Union[str, UploadFile]],
            typing.List[typing.Tuple[str, typing.Union[str, UploadFile]]],
        ],
        raw_headers: typing.Optional[
            typing.List[typing.Tuple[str, typing.List[typing.Tuple[bytes, bytes]]]]
        ] = None,
    ) -> None:
        super().__init__(*args)
        if raw_headers is not None:
            self._headers = ImmutableMultiDict(
                [(field_name, Headers(raw=raw)) for field_name, raw in raw_headers]
            )
        else:
            self._headers = ImmutableMultiDict(
                [(field_name, Headers()) for field_name, _ in self.multi_items()]
            )
    
    def multi_items_with_headers(self) -> typing.Iterable[typing.Tuple[str, typing.Union[str, UploadFile], Headers]]:
        return (
            (field_name, item, headers)
            for (field_name, item), (_, headers)
            in zip(self.multi_items(), self._headers.multi_items())  # type: ignore # the return time in ImmutableMultiDict is hardcoded to tuple[str, str]
        )

...

I think there's pros and cons to either approach.

@adriangb
Copy link
Member Author

@tomchristie to elaborate on #1311 (comment), the currently proposed API relies on dict ordering in Python. There is an alternative which is to add a method that still relies on dict ordering internally, but transfers that conceptual leap from users to implementation:

Now

form = FormData(...)
for (name, form), (_, headers) in zip(form.multi_items(), form.headers.multi_items()):
    ...

With a wrapper method

form = FormData(...)
for name, form, headers in zip(form.multi_items_with_headers()):
    ...

@tomchristie
Copy link
Member

That's an interesting comment, thanks.

I think I've dropped the ball slightly here, because my first questions should have been:

  • What's the proposed API here?
  • What API do existing frameworks use here? (I'd care about Flask and Django in this context.)

@adriangb
Copy link
Member Author

adriangb commented Dec 22, 2021

If anything Tom, I dropped the ball by not leading with that.

First, a bit of general HTTP context is appropriate. I'm sure you are familiar with this but others reading this PR may not be.
For a multipart request, each field/part includes not only the bytes/content, but also headers. Currently, Starlette gives you access to the files/data but not the headers for each field, except for the Content-Type header and only when a file is uploaded (UploadFile.content_type). In fact, as part of parsing the form data these headers get consumed and discarded, so there is no way for the user to access them.

So generally the proposed API here is "a way for users to get access to the HTTP headers included with each multipart field". I think that given the current Starlette APIs, the two concrete options we have are those in #1311 (comment)

As for how Flask and Django do it, let me start with Flask.
Much like Starlette, Flask puts multipart/form-data into request.files, which is a mapping-like structure where keys are strings and values are a File object. Headers are then in request.files["field_name"].headers, so the headers are actually stored on the Flask equivalent of UploadFile. Just like Starlette, if there is no filename associated with the multipart field then it is interpreted as form data (here in Starlette). Flask puts it in requests.form while Starlette puts it in the same FormData data structure as files, the values are just str instead of UploadFile.

Here are examples for both:

import io

import httpx
from flask import Flask, request
from werkzeug.datastructures import FileStorage, Headers

app = Flask(__name__)

@app.route('/', methods=['POST'])
def upload_file():
    file1 = request.form["file1"]
    assert isinstance(file1, str)
    assert file1 == "{}"
    file2 = request.files["file2"]
    assert isinstance(file2, FileStorage)
    assert isinstance(file2.headers, Headers)
    assert file2.headers["Content-Type"] == "text/plain"
    assert next(iter(file2)) == b"123"
    return ("", 200)


with httpx.Client(app=app, base_url="http://testapp") as client:
    resp = client.post(
        "/",
        files=[
            ("file1", (None, b"{}", "application/json")),
            ("file2", ("file", io.BytesIO(b"123"), "text/plain")),
        ],
    )
    assert resp.status_code == 200
import asyncio
import io

import httpx

from starlette.applications import Starlette
from starlette.datastructures import UploadFile
from starlette.requests import Request
from starlette.responses import Response
from starlette.routing import Route


async def upload_file(request: Request):
    form = await request.form()
    file1 = form["file1"]
    assert isinstance(file1, str)
    assert file1 == "{}"
    file2 = form["file2"]
    assert isinstance(file2, UploadFile)
    assert await file2.read() == b"123"
    return Response()


app = Starlette(routes=[Route("/", upload_file, methods=["POST"])])

async def test():
    async with httpx.AsyncClient(app=app, base_url="http://testapp") as client:
        resp = await client.post(
            "/",
            files=[
                ("file1", (None, b"{}", "application/json")),
                ("file2", ("file", io.BytesIO(b"123"), "text/plain")),
            ],
        )
        assert resp.status_code == 200


asyncio.run(test())

But there is a problem here with Flask's API: it is totally possible to include headers in a non-file field ("file1" in the examples above) and that information is lost. If we went with the Flask API here, the same thing would happen in Starlette.

My goal is that there is no loss of information, or we would be excluding many valid use cases. Hence my API proposals retain this information in all cases, even when the upload is not a file / has not filename.

We could also copy these headers to the UploadFile, which would make the more common use case simpler and give us a nice parity of API with Flask. We can even redirect the current UploadFile.content_type attribute to UploadFile.headers["content-type"] via a property to get a nice generalization of what is already there. But I think this should be done in a separate discussion / PR.

I'll look a bit into Django, but I am less familiar.

@tomchristie
Copy link
Member

Right, that makes sense. So my thoughts here are that we almost certainly don't care about headers for non-files.

I think it's one of those this-can-feasibly-exist-but-isn't-actually-any-use-in-practice-or-ever-actually-seen-in-the-wild kinda cases.

@adriangb
Copy link
Member Author

adriangb commented Dec 22, 2021

I think it's one of those this-can-feasibly-exist-but-isn't-actually-any-use-in-practice-or-ever-actually-seen-in-the-wild kinda cases.

I see your point, but I'll have to respectfully disagree. If Starlette didn't consume this data, I would agree with you because the (admittedly fewer) users who might want the headers would have an escape hatch. But since Starlette consumes this data and discards it, there is no escape hatch. I know I've wanted to do this in my work, I'll try to see if I can find any public examples.

This said, I am happy to make a separate PR to add the full headers to UploadFile. It sounds like that's a change we both agree on, and it does not rule out this change in the future. Edit: went ahead and opened this as #1382

@tomchristie
Copy link
Member

This said, I am happy to make a separate PR to add the full headers to UploadFile. It sounds like that's a change we both agree on, and it does not rule out this change in the future. Edit: went ahead and opened this as #1382

Smart approach, yup.

@adriangb
Copy link
Member Author

adriangb commented Jan 7, 2022

I'm going to close this PR since #1382 solves most of the use case for me. Thanks @tomchristie !

@adriangb adriangb closed this Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants