Internal Origin for Persistence #11882

Closed
kitsonk opened this issue Aug 31, 2021 · 2 comments · Fixed by #12548

@kitsonk
Contributor

kitsonk commented Aug 31, 2021

Context

Currently, to enable localStorage, a --location must be supplied on the command line, so that the origin of the --location can be used to "scope" the local storage.

We plan to add more persistence to Deno, like CacheStorage and IndexedDB, and all web standard persistence revolves around the concept of "scoping" the persisted data to an origin.

The lack of an implicit origin creates many usability challenges. While setting the --location is desirable for specific use cases, the average user just wants to run some workloads that persist data in a predictable and secure way. Especially when using other code written for the web platform, they just want to grab the code and have it work.

Note that implicitly setting window.location and determining a base for relative URLs in fetch() are related but separate concerns not addressed here.

For additional context there are a few definitions important to this discussion...

What is an origin?

Origin is an HTML concept that is the foundation for security for the web platform APIs. It is used to determine if some code is from the same "place" as other code to help determine how to deal with it securely. In addition it can also be used to determine what persisted data is visible to a given script.

There are two types of origins defined in the HTML specification:

An opaque origin
An internal value, with no serialization it can be recreated from (it is serialized as "null" per serialization of an origin), for which the only meaningful operation is testing for equality.
A tuple origin
A tuple consists of:
  • A scheme
  • A host
  • A port
  • A domain. Null unless stated otherwise.

In the URL specification, origin sits outside of the other URL properties and has its own serialization algorithm.

What is an opaque origin?

The definition in the specification is:

An internal value, with no serialization it can be recreated from (it is serialized as "null" per serialization of an origin), for which the only meaningful operation is testing for equality.

The author thinks it is important to note that the specification says there is no serialization an opaque origin can be "recreated" from, and that it should always be serialized as null. Note, though, that one opaque origin can be compared to another, so opaque origins are not intended to always be unique: given some internal algorithm, you can arrive at an opaque origin that is equal to another opaque origin.

In fact, there are explicit algorithms which indicate that opaque origins are intended to be comparable:

Two origins, A and B, are said to be same origin if the following algorithm returns true:
  1. If A and B are the same opaque origin, then return true.
  2. If A and B are both tuple origins and their schemes, hosts, and port are identical, then return true.
  3. Return false.
Two origins, A and B, are said to be same origin-domain if the following algorithm returns true:
  1. If A and B are the same opaque origin, then return true.
  2. If A and B are both tuple origins, run these substeps:
    1. If A and B's schemes are identical, and their domains are identical and non-null, then return true.
    2. Otherwise, if A and B are same origin and their domains are identical and null, then return true.
  3. Return false.

The specification simply implies that an opaque origin cannot be set by using a serialized origin. What you get when you deserialize an origin is always a tuple origin (if valid).
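The two comparison algorithms quoted above can be sketched in TypeScript as follows. This is only an illustration: the `Origin` types and the string `id` used to identify an opaque origin are assumptions made for the example, not Deno's internal representation or anything defined by the HTML specification.

```typescript
// Hypothetical types for illustration only.
type OpaqueOrigin = { kind: "opaque"; id: string };
type TupleOrigin = {
  kind: "tuple";
  scheme: string;
  host: string;
  port: number | null;
  domain: string | null;
};
type Origin = OpaqueOrigin | TupleOrigin;

// "Same origin": identical opaque origins, or tuples with equal scheme, host and port.
function sameOrigin(a: Origin, b: Origin): boolean {
  if (a.kind === "opaque" && b.kind === "opaque") {
    return a.id === b.id;
  }
  if (a.kind === "tuple" && b.kind === "tuple") {
    return a.scheme === b.scheme && a.host === b.host && a.port === b.port;
  }
  return false;
}

// "Same origin-domain": like same origin, but a shared non-null domain
// takes precedence over host and port.
function sameOriginDomain(a: Origin, b: Origin): boolean {
  if (a.kind === "opaque" && b.kind === "opaque") {
    return a.id === b.id;
  }
  if (a.kind === "tuple" && b.kind === "tuple") {
    if (a.scheme === b.scheme && a.domain !== null && a.domain === b.domain) {
      return true;
    }
    if (sameOrigin(a, b) && a.domain === null && b.domain === null) {
      return true;
    }
  }
  return false;
}
```

An opaque origin never compares equal to a tuple origin in either function, which matches the final "Return false" step of both algorithms.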

What is the origin of specific URL schemes?

The algorithm for determining the origin for a URL is specified as:

  • "blob" schemes use blob URL entries environment origin, or a url parsing of the URL's path[0], otherwise a new opaque origin.
  • "ftp", "http", "https", "ws", "wss" schemes return a tuple origin.
  • "file" schemes have undefined behavior, but the specification suggests that a new opaque origin should be used.
  • Otherwise it is a new opaque origin.
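The effect of these rules can be observed through the `origin` getter of the standard WHATWG `URL` class, which Deno exposes as a global. The helper name `serializedOrigin` below is made up for this sketch; it only shows the serialized form, so every opaque origin appears as the string "null".

```typescript
// Returns the serialized origin of a URL string, as reported by the URL API.
// Opaque origins always serialize to the string "null".
function serializedOrigin(input: string): string {
  return new URL(input).origin;
}

// A tuple origin serializes to scheme, host and (non-default) port.
const remote = serializedOrigin("https://deno.land/x/example/mod.ts");

// "file" URLs and unknown schemes produce an opaque origin.
const local = serializedOrigin("file:///a/test.ts");
const other = serializedOrigin("data:text/plain,hello");
```

Because the serialization discards everything that distinguishes one opaque origin from another, it cannot be used on its own to scope storage for local scripts, which is the gap the proposal below addresses.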

Solution

Internal Origin

The Deno CLI will have the concept of an internal origin, which will be determined at startup and be immutable for the lifetime of the main worker. The internal origin will be used by processes that need to determine the origin to associate with some function, like persisting data (e.g. the origin to associate with data from localStorage).

The internal origin will be determined in the following way:

  • If the --location flag was set at startup, the internal origin will be the origin of a URL parse of the supplied argument. It is assumed that if the URL parse is a failure for the value, the CLI will have already terminated with an error message.
  • Otherwise, if the --config flag was set at startup, the internal origin will be set to the URL derived opaque origin of the fully qualified URL of the value of the --config value. (e.g. the location of the config file generates a unique opaque origin)
  • Otherwise the internal origin will be derived from the fully qualified root specifier used to invoke the command line, and will be the URL derived opaque origin of the parent of the specifier. In practice, this means that scripts that share the same parent directory (or parent path for remote URLs) will share the same opaque origin.
    • If the parsed URL does not have a path component, the full URL will be used to derive the opaque origin.

Examples:

| Command line | Internal Origin |
| --- | --- |
| deno run --location https://deno.land/x/example.ts main.ts | ["https", "deno.land", null, null] |
| deno run --location file:///a/test.ts main.ts | derived opaque origin from "file:///a/test.ts" |
| deno run --location https://example.com/ --config tsconfig.json main.ts | ["https", "example.com", null, null] |
| deno run --config tsconfig.json main.ts | derived opaque origin from full file URL to tsconfig.json |
| deno run main.ts | derived opaque origin from full file URL to the parent of main.ts |
| deno run /User/example/project/main.ts | derived opaque origin from file:///User/example/project/ |
| deno test /User/example/project/test.ts | derived opaque origin from file:///User/example/project/ |
| deno run /User/example/project/lib/lib.ts | derived opaque origin from file:///User/example/project/lib/ |
| deno run https://deno.land/x/example/mod.ts | derived opaque origin from https://deno.land/x/example/ |

Implications:

  • When a location is specified, the origin of that location is used as the internal origin as well as supplied on window.location.
  • When a configuration file is specified (without a location), the internal origin will be a derived opaque origin of the URL to the configuration file. This means that the internal origin will be the same value whenever the same path to the configuration file is used.
  • If neither a location nor a configuration file is specified, then the root specifier is used. If the root specifier has a path, the parent of the specifier is used, meaning local or remote specifiers in the same "directory" share the same opaque origin. If the root specifier doesn't have a path, it is just a unique opaque origin.
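As a rough sketch of the "parent of the specifier" rule in this initial proposal, the URL that the opaque origin is derived from could be computed with the standard `URL` API. The function name `originSourceForSpecifier` is hypothetical, and treating a URL that cannot act as a base as "having no path" is an interpretation made for this example. Note that the thread later drops this parent-sharing rule.

```typescript
// Computes the URL string an opaque origin would be derived from for a
// fully qualified root specifier, under the parent-directory rule.
function originSourceForSpecifier(specifier: string): string {
  const url = new URL(specifier);
  try {
    // Resolving "." against the specifier yields its containing "directory".
    return new URL(".", url).href;
  } catch {
    // Assumption: URLs that cannot serve as a base (no hierarchical path)
    // fall back to the full URL.
    return url.href;
  }
}

const fromLocal = originSourceForSpecifier("file:///User/example/project/main.ts");
const fromTest = originSourceForSpecifier("file:///User/example/project/test.ts");
const fromRemote = originSourceForSpecifier("https://deno.land/x/example/mod.ts");
```

Here `fromLocal` and `fromTest` are the same string, so under this rule the two scripts would share persisted data, as in the `deno run` and `deno test` rows of the examples.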

URL derived opaque origins

The Deno CLI will have the concept of URL derived opaque origins. The idea is that a unique opaque origin can be derived from a normalized string serialization of a URL, and that this unique opaque origin will be reproducible from invocation to invocation of the Deno CLI, irrespective of host.

For example, the string URL of https://deno.land/x/example/mod.ts could be converted into a unique opaque origin that would not be equal to an opaque origin converted from https://deno.land/x/example/lib.ts, but every Deno CLI executable would perform the conversion in a consistent fashion so that a persisted version of the unique origin could be considered equal to a recently converted one.

For simplicity, it is likely that the string representation of the URL that the opaque origin is being derived from is used, as it ensures the same opaque origin could be used and makes for simple equality checking, though whenever exposed "externally" it needs to be serialized as null. In a JavaScript isolate, this could simply be something like this:
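One possible reading of "normalized string serialization" is the `href` produced by the standard URL parser, which lowercases the scheme and host, removes default ports, and resolves dot segments. The helper name `derivedOpaqueOriginKey` is invented for this sketch, and the issue does not commit to this exact normalization.

```typescript
// Derives a reproducible comparison key for an opaque origin from a URL
// string. Assumption: the URL parser's own serialization is the
// "normalized" form the proposal has in mind.
function derivedOpaqueOriginKey(url: string): string {
  return new URL(url).href;
}

// Different spellings of the same URL produce the same key...
const keyA = derivedOpaqueOriginKey("https://deno.land/x/example/mod.ts");
const keyB = derivedOpaqueOriginKey("HTTPS://Deno.land:443/x/other/../example/mod.ts");

// ...while a different module in the same directory produces a different key.
const keyC = derivedOpaqueOriginKey("https://deno.land/x/example/lib.ts");
```

Since the key depends only on the URL text, any Deno CLI executable on any host would compute the same value, which is what makes persisted data findable across invocations.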

class OpaqueOrigin {
  #value: string;

  constructor(value: string) {
    this.#value = value;
  }

  equals(o: OpaqueOrigin): boolean {
    return this.#value === o.#value;
  }

  toJSON() {
    return null;
  }

  toString() {
    return "null";
  }

  [Symbol.toPrimitive]() {
    return null;
  }
}
@kitsonk kitsonk added feat new feature (which has been agreed to/accepted) cli related to cli/ dir labels Aug 31, 2021
@nayeemrmn
Collaborator

  • If neither a location or configuration file is specified, then root specifier is used. If the root specifier has a path, the parent of the specifier is used, meaning local or remote specifiers in the same "directory" share the same opaque origin. If the root specifier doesn't have a path, it is just a unique opaque origin.

This is very arbitrary IMO. It would be a lot simpler if root specifiers were always given unique opaque origins and users had to pass a --location if they wanted to share storage between more than one main module. I see that we're trying to make it so --location almost never has to be passed for web storage by having at least some implicit sharing mechanism. However:

  • For two local entrypoints with shared storage, users will be able to put it in a config file instead.
  • For two remote entrypoints with shared storage, users would be asked to deno install them in the first place, meaning they only have to specify the same --location once each in copy-and-pasted deno install commands.

@kitsonk kitsonk mentioned this issue Sep 17, 2021
17 tasks
@kitsonk kitsonk added this to the 2.0.0 milestone Sep 17, 2021
@kitsonk kitsonk self-assigned this Sep 17, 2021
@kitsonk kitsonk modified the milestones: 2.0.0, 1.16.0 Oct 18, 2021
@kitsonk
Contributor Author

kitsonk commented Oct 26, 2021

This is very arbitrary IMO.

Upon reflection, I agree. I think we should just drop the shared implicit location for the parent root. It would simply be:

The internal origin will be determined in the following way:

  • If the --location flag was set at startup, the internal origin will be the origin of a URL parse of the supplied argument. It is assumed that if the URL parse is a failure for the value, the CLI will have already terminated with an error message.
  • Otherwise, if the --config flag was set at startup, the internal origin will be set to the URL derived opaque origin of the fully qualified URL of the value of the --config value. (e.g. the location of the config file generates a unique opaque origin)
  • Otherwise the internal origin will be derived from the fully qualified root specifier used to invoke the command line, and will be the URL derived opaque origin of the specifier. Each unique root will therefore generate a unique origin, so users who wish to share storage among different programs would need to share the same absolute config file or explicitly set the --location.
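The revised three-step resolution can be sketched as below. All names (`StartupOptions`, `InternalOrigin`, `resolveInternalOrigin`) are hypothetical, and the sketch assumes the config path and root specifier have already been resolved to fully qualified URLs; it is not the code that landed in the fix.

```typescript
// Hypothetical shapes for illustration only.
interface StartupOptions {
  location?: string; // value of --location, if set
  configUrl?: string; // fully qualified URL of the --config file, if set
  rootSpecifier: string; // fully qualified root specifier
}

type InternalOrigin =
  | { kind: "tuple"; serialized: string }
  | { kind: "opaque"; derivedFrom: string };

function resolveInternalOrigin(opts: StartupOptions): InternalOrigin {
  if (opts.location !== undefined) {
    // An invalid --location throws here; the CLI is assumed to have
    // rejected it before reaching this point.
    const url = new URL(opts.location);
    return url.origin === "null"
      ? { kind: "opaque", derivedFrom: url.href }
      : { kind: "tuple", serialized: url.origin };
  }
  if (opts.configUrl !== undefined) {
    return { kind: "opaque", derivedFrom: new URL(opts.configUrl).href };
  }
  // Each distinct root specifier yields its own opaque origin.
  return { kind: "opaque", derivedFrom: new URL(opts.rootSpecifier).href };
}

const withLocation = resolveInternalOrigin({
  location: "https://example.com/",
  configUrl: "file:///User/example/project/tsconfig.json",
  rootSpecifier: "file:///User/example/project/main.ts",
});
const withConfig = resolveInternalOrigin({
  configUrl: "file:///User/example/project/tsconfig.json",
  rootSpecifier: "file:///User/example/project/main.ts",
});
const rootOnly = resolveInternalOrigin({
  rootSpecifier: "file:///User/example/project/main.ts",
});
```

The ordering of the checks encodes the precedence in the list above: an explicit --location always wins, a config file is the next fallback, and the root specifier is used only when neither is present.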

kitsonk added a commit to kitsonk/deno that referenced this issue Oct 26, 2021
kitsonk added a commit to kitsonk/deno that referenced this issue Oct 26, 2021
kitsonk added a commit that referenced this issue Oct 27, 2021

Closes #11882

BREAKING CHANGE: Previously when `--location` was set, the unique storage key was derived from the URL of the location instead of just the origin. This change correctly uses just the origin. This may cause previously persisted storage to change its key and data to not be available with the same location as before.