Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize the ASGI protocol directly with Modal functions #905

Merged
merged 12 commits into from
Sep 25, 2023

Conversation

ekzhang
Copy link
Member

@ekzhang ekzhang commented Sep 21, 2023

First step to making Python pickle-independent web endpoints.

This adds a DataFormat to function inputs and outputs, and fills in Pickle by default. We then use the new ASGI-based serialization format (via protobuf) for all web endpoint outputs.

This also adds support on the client for taking ASGI-encoded inputs, which will be useful in the future, since data_format has been filled in for FunctionInput as well.

@ekzhang ekzhang marked this pull request as draft September 21, 2023 22:42
@ekzhang
Copy link
Member Author

ekzhang commented Sep 21, 2023

Also after our discussion today and some Googling, I saw there's an open issue for WebTransport support in ASGI:

django/asgiref#280

Not something we want to implement right now, but that gives me more confidence that we can probably just rely on pseudo-ASGI-over-protobuf as a internal format for Modal's web endpoint handling, regardless of language in the future.

@ekzhang ekzhang force-pushed the ekzhang/better-asgi branch from 3f30fd3 to c1b15cb Compare September 22, 2023 16:33
@ekzhang ekzhang requested a review from erikbern September 22, 2023 21:48
@ekzhang ekzhang marked this pull request as ready for review September 22, 2023 21:48
@ekzhang ekzhang changed the title Simulate the ASGI protocol directly with Modal functions Serialize the ASGI protocol directly with Modal functions Sep 22, 2023
@@ -90,6 +97,68 @@ enum WebhookAsyncMode {
WEBHOOK_ASYNC_MODE_AUTO = 4; // redirect to polling endpoint if execution time nears the http timeout
}

// A web endpoint connection-related message.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to put this in a separate proto file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I don't have strong opinions either way. I think it's possible, but also everything is contained in a single message right now.

status=api_pb2.GenericResult.GENERIC_STATUS_SUCCESS,
data=self.serialize(data),
data=self.serialize_data_format(data, data_format),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would start writing ASGI serialized format before we added support for it on the other side? i'm not quite following

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nvm, just saw the comment that this will be broken up)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! The rest of this file is just plumbing for serialize_data_format

@erikbern
Copy link
Contributor

erikbern commented Sep 22, 2023

I was confused about the deployment strategy since this current PR starts to write ASGI already but I just saw you mentioned it in the server-side PR that this will be merged in two parts. That makes more sense!

Is the plan to break out the proto changes only or will this be broken up into multiple parts? Just thinking what's easiest to review

@ekzhang
Copy link
Member Author

ekzhang commented Sep 23, 2023

Is the plan to break out the proto changes only or will this be broken up into multiple parts? Just thinking what's easiest to review

It's reviewable right now in full, with the server changes. There's some data dependencies.

The deployment plan is in https://github.com/modal-labs/modal/pull/7745 — first just split out the api.proto and _serialization.py changes of this PR first. Then the server changes are merged, then we cycle worker versions, then we merge this PR.

It's hard to just review one section of the code since they depend on each other. The api.proto / _serialization.py changes won't make sense without looking at how they're used.

Copy link
Member Author

@ekzhang ekzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description of changes

@@ -23,8 +23,6 @@ async def receive():

# Run the ASGI app, while draining the send message queue at the same time,
# and yielding results.
# TODO(gongy): we currently create one ASGI instance per concurrent input,
# but it would be sufficient to share this background ASGI context.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this comment because I think it's outdated — we only call the function once in import_function() to produce the ASGI app right now even for multiple inputs and regardless of input concurrency. Then this gets passed into asgi_app_wrapper. (cc @gongy)

@@ -161,6 +161,12 @@ def serialize(self, obj: Any) -> bytes:
def deserialize(self, data: bytes) -> Any:
return deserialize(data, self._client)

def serialize_data_format(self, obj: Any, data_format: int) -> bytes:
return serialize_data_format(obj, data_format)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file contains the main client change in the PR. We call serialize_data_format with FunctionPutOutputs, which adds data_format=api_pb2.DATA_FORMAT_PICKLE to all FunctionPutOutputsRequest from the client.

Except for web endpoints, which we serialize with api_pb2.DATA_FORMAT_ASGI.

The reason why we need to cycle worker versions with the new protobuf changes before we can deploy this is that workers proxy the FunctionPutOutputs RPC, and prost deletes any unknown Protobuf fields.

status=api_pb2.GenericResult.GENERIC_STATUS_SUCCESS,
data=self.serialize(data),
data=self.serialize_data_format(data, data_format),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! The rest of this file is just plumbing for serialize_data_format

modal/_serialization.py Outdated Show resolved Hide resolved
elif msg_type == "http.disconnect":
return api_pb2.Asgi(http_disconnect=api_pb2.Asgi.HttpDisconnect())
else:
logger.debug("skipping serialization of unknown ASGI message type %r", msg_type)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get a message like a WebSocket response, or an unknown ASGI extension which somehow got enabled despite us not advertising support for it, we skip the serialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be better to raise an exception? this will just ignore it effectively (people won't see the debug log unless they override the verbosity level)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's some kinds of messages we want to ignore as unimplemented without throwing (like the lifespan protocol). But yeah for websocket.* events might be nice to throw explicitly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should ignore it, because there are many messages that we can't predict. The idea is "accept as many servers as possible, but require compliance on our side by users."

WebSockets just don't work on Modal, and that's fine documented as-is I think. We don't need to crash the user's app after a WebSocket request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could potentially also warn. I'm a bit nervous being completely silent will cause confusion to users

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm fair enough — on second thought I don't think we actually forward WebSocket connection events on our side though, so it's a bit of a moot point?

def deserialize_data_format(s: bytes, data_format: int, client) -> Any:
if data_format == api_pb2.DATA_FORMAT_UNSPECIFIED:
# TODO: Remove this after Modal client version 0.52, when the data_format field is always set.
return deserialize(s, client)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to keep this for compatibility with clients before today, since they won't be aware to set data_format. I made sure to set data_format in all locations and tested the code.

@@ -140,7 +141,7 @@ async def _process_result(result, stub, client=None):
raise RemoteError(result.exception)

try:
return deserialize(data, client)
return deserialize_data_format(data, data_format, client)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is internal to the _Invocation class, which I updated to be data format-aware. So FunctionGetOutputs can read ASGI-serialized messages too.

modal/_serialization.py Outdated Show resolved Hide resolved
Copy link
Contributor

@erikbern erikbern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I've been trying to think of ways this could break but there's nothing I can think of.

@@ -161,6 +161,12 @@ def serialize(self, obj: Any) -> bytes:
def deserialize(self, data: bytes) -> Any:
return deserialize(data, self._client)

def serialize_data_format(self, obj: Any, data_format: int) -> bytes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit, should this be

Suggested change
def serialize_data_format(self, obj: Any, data_format: int) -> bytes:
def serialize_data_format(self, obj: Any, data_format: api_pb2.DataFormat.ValueType) -> bytes:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly don't know how this affects mypy, can I defer it until later?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a bunch of mypy issues with ValueType

elif msg_type == "http.disconnect":
return api_pb2.Asgi(http_disconnect=api_pb2.Asgi.HttpDisconnect())
else:
logger.debug("skipping serialization of unknown ASGI message type %r", msg_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's some kinds of messages we want to ignore as unimplemented without throwing (like the lifespan protocol). But yeah for websocket.* events might be nice to throw explicitly

ekzhang added a commit that referenced this pull request Sep 25, 2023
ekzhang added a commit that referenced this pull request Sep 25, 2023
ekzhang added a commit that referenced this pull request Sep 25, 2023
* Just the protobuf parts of #905

* Fix mypy and copyright

* Add skip_old_by to my new test

* Fix docstring typo
@ekzhang ekzhang merged commit e0e8735 into main Sep 25, 2023
@ekzhang ekzhang deleted the ekzhang/better-asgi branch September 25, 2023 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants