Support All of Structure Clone in RSC Serialization #25687

Closed
9 of 13 tasks
sebmarkbage opened this issue Nov 15, 2022 · 11 comments
@sebmarkbage
Collaborator

sebmarkbage commented Nov 15, 2022

The React Server Components payload is a custom protocol that extends what is serializable beyond just JSON. In addition to JSON, we support all React primitives (React.lazy, ReactNode) and global named symbols (Symbol.for). We also already plan to expand this support to include:

  • Promises
  • Typed Arrays / DataView
  • BigInt
  • undefined, Infinity, NaN, -0
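
For context, a small illustration of why these values need protocol support at all: this is what plain `JSON.stringify` does with them. It only demonstrates standard JSON behavior, not anything about the RSC wire format.

```typescript
// What plain JSON does with some of the values listed above.
const viaJson: string = JSON.stringify({
  missing: undefined,
  notANumber: NaN,
  infinite: Infinity,
  negativeZero: -0,
});

// undefined is dropped; NaN and Infinity become null; -0 loses its sign.
console.log(viaJson); // {"notANumber":null,"infinite":null,"negativeZero":0}

// Typed arrays are flattened into plain objects keyed by index.
const typedArrayJson: string = JSON.stringify(new Uint8Array([1, 2]));
console.log(typedArrayJson); // {"0":1,"1":2}

// BigInt is rejected outright.
let bigIntError: unknown = null;
try {
  JSON.stringify({ big: BigInt(1) });
} catch (err) {
  bigIntError = err;
}
console.log(bigIntError instanceof TypeError); // true
```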

We don't have plans to make this algorithm pluggable from the outside because we're concerned about the complexity this puts on the ecosystem and that components won't be reusable in different contexts where they're not configured or configurations are conflicting.

However, it might make sense to expand support to the types supported by the Structured clone algorithm which is already standardized and specified.

  • Cyclic references: We already support references in the protocol. This is mostly just an implementation detail for the perf cost whether something should be inlined as JSON or defined as a separate row.
  • ArrayBuffer: For Typed Arrays we might stick to using the underlying buffer coming from the stream instead of cloning the data. All values are considered immutable anyway. For ArrayBuffers we can't use that trick though so it would require a new clone of the data which might be a bit of a foot gun when switching between buffers and typed arrays.
  • Error objects: We already support thrown errors and we could support more errors in the encoding. However, we intentionally don't pass them through with all information. We cover them up with digests since the error message and stack can sometimes include sensitive information that only the server should have access to. We would likely have to do the same here.
  • Boolean/String objects: We don't currently support the object wrappers around primitives e.g. new String(). You're not really supposed to use these in modern JS so it's kind of annoying to have to add extra code to handle this case.
  • RegExp: These are pretty straightforward but can possibly have security implications.
  • Date, Map, Set: These are fairly straightforward to serialize so it's mostly a matter of allowing these as special cases. Why are these special? Because Structured Clone says so.
  • (Temporal: It seems appropriate that this would be added to structured clone but we need to confirm.)
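
To make "allowing these as special cases" concrete, here is a minimal sketch of special-casing Date, Map and Set on top of JSON so the receiver gets the same type back. The `$type` tag and the `encode`/`decode` names are made up for illustration; this is not the RSC encoding.

```typescript
// Hypothetical tagged shapes used only in this sketch.
type Tagged =
  | { $type: "Date"; value: string }
  | { $type: "Map"; value: [unknown, unknown][] }
  | { $type: "Set"; value: unknown[] };

function encode(input: unknown): string {
  return JSON.stringify(input, function (this: Record<string, unknown>, key: string, value: unknown) {
    // Date#toJSON has already run on `value`, so read the original from the holder.
    const raw = this[key];
    if (raw instanceof Date) return { $type: "Date", value: raw.toISOString() };
    if (raw instanceof Map) return { $type: "Map", value: Array.from(raw.entries()) };
    if (raw instanceof Set) return { $type: "Set", value: Array.from(raw.values()) };
    return value;
  });
}

function decode(text: string): unknown {
  // The reviver runs bottom-up, so nested values are already revived here.
  return JSON.parse(text, (_key: string, value: unknown) => {
    if (typeof value === "object" && value !== null && "$type" in value) {
      const tagged = value as Tagged;
      switch (tagged.$type) {
        case "Date":
          return new Date(tagged.value);
        case "Map":
          return new Map(tagged.value);
        case "Set":
          return new Set(tagged.value);
      }
    }
    return value;
  });
}
```

Note that a naive version like this does not preserve shared or cyclic references; that needs the separate rows mentioned in the first bullet.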

We probably won't support Web-specific APIs that don't necessarily have an equivalent on the server or aren't directly transferable, such as anything with handles to local hardware or file system resources. The only ones that might be easy to support:

  • Blob: This would be a wrapper around a ReadableStream with a mime type.
  • (File: This is just a Blob with a modified time and file name. I think we'll likely want to only support Blobs and not Files, meaning that File object would serialize as Blob, so the receiving type has to be Blob. Because file names and modified times could have security implications and it's too easy to accidentally leak this data.)
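
A rough sketch of the "File serializes as Blob" idea, using only the standard `Blob#slice` method. The helper name `toPlainBlob` is hypothetical; it just shows how the bytes and MIME type can be kept while the file name and modified time are dropped.

```typescript
// Hypothetical helper: keep only the bytes and MIME type of a File or Blob.
function toPlainBlob(source: Blob): Blob {
  // Blob#slice returns a plain Blob, even when called on a File,
  // so `name` and `lastModified` do not carry over.
  return source.slice(0, source.size, source.type);
}
```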
@KATT

KATT commented Nov 15, 2022

I really appreciate that you're thinking about structures outside of the default JSON-stuff.

Maybe there's a world where React itself doesn't do the heavy lifting, but you instead allow developers to add their own serialization libraries?

There's devalue, superjson, and others that can do a lot more than the built-in limitations of JSON.stringify/JSON.parse()

I for one would love to be able to use BigInt & Temporal polyfills for date objects (which I both currently can use with superjson together with tRPC).

@sebmarkbage
Collaborator Author

The serialization algorithm itself is most of what the RSC server is and it has a lot of special features as part of that. So it's not as easily pluggable.

Another consideration is that for static types to be preserved, it needs to get the same type on both sides. E.g. toJSON() is not supported even though it's supported in JSON.stringify because that changes the type on the receiving side.
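
A quick illustration of the toJSON() point with plain JSON: the sender has a Date, but the receiver gets a string, so the static type no longer matches.

```typescript
const sent = { createdAt: new Date(0) };

// Date#toJSON turns the value into an ISO string during stringify.
const received = JSON.parse(JSON.stringify(sent)) as { createdAt: unknown };

console.log(sent.createdAt instanceof Date); // true
console.log(typeof received.createdAt); // "string"
console.log(received.createdAt); // 1970-01-01T00:00:00.000Z
```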

It's also important to us that code is interchangeable between projects and that npm packages are able to support it. It's easy for a plug-in system to go out of control.

That said, we already support or want to support most things that those other libraries support, so if you have specific examples it would be good to add to the list.

It seems likely that Temporal would soon be added to structured clone, but it would be good to get that confirmed and that there are no blockers to doing so in the web spec.

@KATT

KATT commented Nov 15, 2022

For me, it's all about getting static types to be preserved and used interchangeably across the HTTP boundary.

Ideally, I'd like to be able to have some sort of mapper where I can interpret any sort of object and serialize/deserialize according to my own preference since that allows me to statically type an object and "know" its integrity will remain the same throughout the serialization.

I don't think React itself should be the gatekeeper or responsible for what data types can or cannot be serialized.

I do see how something like supporting promises will make this hard as this seems very specific to the RSC implementation, but I don't think opening this up would impede the use of making code interchangeable across projects - most likely, any open-source project would not depend on a special type.

@sophiebits
Collaborator

> RegExp: These are pretty straightforward but can possibly have security implications.

What's the security risk here?

@sebmarkbage
Collaborator Author

> What's the security risk here?

The main attack vector that I'm concerned about right now is the reverse form, when you can pass it back into the server as an argument to an action. That can trick code into executing a regexp that itself exploits a zero-day, since regexp implementations in C++ have been known to have those.

Passing it out to the client is a little less concerning, since the programmer would have to opt in to letting the user provide a regexp. However, it can limit the usefulness of the protocol for things like mashups/federation where you take an RSC response from a third party.

@sebmarkbage
Collaborator Author

sebmarkbage commented Mar 13, 2023

We could consider DOMPoint/DOMRect/DOMMatrix/DOMQuad if that's actually something people want to use and if those get proper polyfills in server environments. One question is whether these will actually end up being replaced by value types or views over typed arrays.

@hamlim
Contributor

hamlim commented Jun 13, 2023

One of the things we ran into when starting to adopt RSC within Next is that URL instances can't be passed as prop values down into client components. I did some digging and it seems they're not supported in structuredClone either (which seems odd to me). Would it be possible to support URLs within the RSC serialization?

For the short term, we've opted to .toString() the value and then pass it back into the URL constructor on the other side within the client component. But this adds a bit of overhead/complexity when crossing the boundary.
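
The workaround described above looks roughly like this. The helper names are hypothetical; the only real APIs used are `URL#toString` and the `URL` constructor.

```typescript
// Server side: turn the URL into a string before it crosses the boundary.
function serializeUrl(url: URL): string {
  return url.toString();
}

// Client side: rebuild the URL instance from the string prop.
function deserializeUrl(href: string): URL {
  return new URL(href);
}

const original = new URL("https://example.com/docs?page=2#intro");
const restored = deserializeUrl(serializeUrl(original));
console.log(restored.href === original.href); // true
```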

@sebmarkbage
Collaborator Author

URL has been brought up before. If it’s a frequent enough exception then maybe it’s worth breaking the rules, but we have to have some guideline for what we include and what we don't. In general, things that have significant benefits to the serialization format or can’t be done any other way are favored.

Ideally it would get brought up as a Web proposal first so that it’s clear that it’ll keep working and not break in the future. E.g. if stateful stuff is added to that data structure in the future we may not be able to support it.

Another complexity is that URLs have a validation step that checks whether something is a valid URL, which can change over time or have divergent implementations. With Date and Temporal there’s always some standard format that we can use in the transfer but that’s not always true for URL. So it might not be possible to transfer a URL from one environment to another. This is somewhat of an edge case but it does show that it’s a bit sketchy to support if we’re not confident that we can keep it seamless. I’m favoring holding off on it for now.

@hamlim
Contributor

hamlim commented Jun 13, 2023

That sounds reasonable to me 👍

sebmarkbage added a commit that referenced this issue Jun 27, 2023
We already support these in the sense that they're Iterable so they just
get serialized as arrays. However, these are part of the Structured
Clone algorithm [and should be
supported](#25687).

The encoding is simply the same form as the Iterable, which is
conveniently the same as the constructor argument. The difference is
that now there's a separate reference to it.

It's a bit awkward because for multiple references to the same value,
it'd be a new Map/Set instance for each reference. So to encode sharing,
it needs one level of indirection with its own ID. That's not really a
big deal for other types since they're inline anyway - but since this
needs to be outlined it creates possibly two ids where there only needs
to be one or zero.

One variant would be to encode this in the row type. Another variant
would be something like what we do for React Elements where they're
arrays but tagged with a symbol. For simplicity I stick with the simple
outlining for now.
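
A toy sketch of the outlining idea from this commit message: each distinct Map is written once as its own row (as its entries, which is also the constructor argument), and every use site stores a reference to that row so sharing survives the round trip. The `Row`/`Ref` shapes and function names are invented for illustration and are not the actual Flight row format.

```typescript
// Hypothetical shapes for this sketch only.
type Row = { id: number; entries: [string, unknown][] };
type Ref = { $ref: number };
type Payload = { rows: Row[]; refs: Record<string, Ref> };

// Encode: one row per distinct Map instance, one reference per use site.
function outlineMaps(props: Record<string, Map<string, unknown>>): Payload {
  const ids = new Map<Map<string, unknown>, number>();
  const rows: Row[] = [];
  const refs: Record<string, Ref> = {};
  for (const [key, map] of Object.entries(props)) {
    let id = ids.get(map);
    if (id === undefined) {
      id = rows.length;
      ids.set(map, id);
      rows.push({ id, entries: Array.from(map) });
    }
    refs[key] = { $ref: id };
  }
  return { rows, refs };
}

// Decode: build each Map once, then point every reference at that instance.
function reviveMaps(payload: Payload): Record<string, Map<string, unknown>> {
  const instances = payload.rows.map((row) => new Map(row.entries));
  const out: Record<string, Map<string, unknown>> = {};
  for (const [key, ref] of Object.entries(payload.refs)) {
    out[key] = instances[ref.$ref];
  }
  return out;
}
```

Without the extra level of indirection, inlining the entries at each use site would produce a new Map per reference, which is the awkwardness described above.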
@Janpot

Janpot commented Dec 9, 2023

> We don't have plans to make this algorithm pluggable from the outside because we're concerned about the complexity this puts on the ecosystem and that components won't be reusable in different contexts where they're not configured or configurations are conflicting.

Makes sense. How about exposing the serialization/deserialization APIs in React? I'd be glad to be able to use those methods in my own code.

EdisonVan pushed a commit to EdisonVan/react that referenced this issue Apr 15, 2024
@sebmarkbage
Collaborator Author

sebmarkbage commented May 10, 2024

I'm going to close this issue now since with #29035 we'll have landed all serializations we plan on supporting, until something new appears (e.g. Temporal).

Some notable omissions:

sebmarkbage added a commit that referenced this issue Sep 30, 2024
The idea is that the RSC protocol is a superset of Structured Clone
(#25687). One exception that we left out was serializing Error objects as
values. We serialize "throws" or "rejections" as Error (regardless of
their type) but not Error values.

This fixes that by serializing `Error` objects. We don't include digest
in this case since we don't call `onError` and it's not really expected
that you'd log it on the server with some way to look it up.
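
As a rough sketch of what serializing an Error value can involve: plain JSON turns an Error into an empty object, so the fields have to be copied explicitly, and the real message is only passed through in development. The shape, the generic message and the function names below are assumptions for illustration, not React's actual encoding.

```typescript
// Hypothetical wire shape for an Error passed as a value.
type SerializedError = { name: string; message: string; stack?: string };

// Placeholder text; the real production message is not specified here.
const GENERIC_MESSAGE = "An error occurred on the server.";

function serializeErrorValue(error: Error, isDev: boolean): SerializedError {
  if (isDev) {
    return { name: error.name, message: error.message, stack: error.stack };
  }
  // In production, hide details that may contain server-only information.
  return { name: "Error", message: GENERIC_MESSAGE };
}

function reviveErrorValue(data: SerializedError): Error {
  const error = new Error(data.message);
  error.name = data.name;
  if (data.stack !== undefined) error.stack = data.stack;
  return error;
}

// message and stack are not enumerable, so plain JSON loses everything.
console.log(JSON.stringify(new Error("boom"))); // {}
```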

In general this is not super useful outside throws. Especially since we
hide their values in prod. However, there is one case where it is quite
useful. When you replay console logs in DEV you might often log an Error
object within the scope of a Server Component. E.g. the default RSC
error handling just calls console.error with the error object.

Before this would just be an empty object due to our lax console log
serialization:
[Screenshot 2024-09-30 at 2 24 03 PM]
After:
[Screenshot 2024-09-30 at 2 36 48 PM]

TODO for a follow up: Flight Reply direction. This direction doesn't
actually serialize thrown errors because they always reject the
serialization.