Separate filesystem merkle-path from IPLD merkle-path #60
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,7 +26,20 @@ Objects with merkle-links form a Graph (merkle-graph), which necessarily is both | |
|
||
### What is a _merkle-path_? | ||
|
||
A _merkle-path_ is a unix-style path (e.g. `/a/b/c/d`) which initially dereferences through a _merkle-link_ and then follows _named merkle-links_ in the intermediate objects. Following a name means looking into the object, finding the _name_ and resolving the associated _merkle-link_. | ||
A merkle-path is a unix-style path (e.g. `/a/b/c/d`) which initially dereferences through a _merkle-link_ and allows access of elements of the referenced node and other nodes transitively. | ||
|
||
There is not a single kind of merkle-path; there are two: | ||
|
||
- merkle-path for filesystems: this is a merkle-path that is designed to be used in the context of filesystems (that also includes network protocols such as HTTP or FTP). The idea is to stay as close as possible to traditional filesystem semantics. | ||
- merkle-path for IPLD: this is a merkle-path that can be used to access more elements of the IPLD data model (specifically: link properties) but that doesn't fit within the traditional filesystem model. | ||
|
||
When you use a merkle-path, make sure you know which of the two you are using. Command line tools are encouraged to allow switching between the two flavors using a switch. | ||
|
||
Filesystem representations (fuse mounts, HTTP or FTP protocols) should use the _filesystem merkle-paths_ if they intend to store arbitrary files. They can allow switching to _IPLD merkle-paths_ using a mount option or a configuration switch to allow object inspection, which turns the filesystem into something like `/proc` or `/sys` on unix machines, where storing user files is not the objective. | ||
|
||
### Filesystem merkle-path | ||
|
||
A _filesystem merkle-path_ is a unix-style path which initially dereferences through a _merkle-link_ and then follows _named merkle-links_ in the intermediate objects. Following a name means looking into the object, finding the _name_ and resolving the associated _merkle-link_. | ||
|
||
For example, suppose we have this _merkle-path_: | ||
|
||
|
@@ -72,6 +85,99 @@ O_5 = | "hello": "world" | whose hash value is QmR8Bzg59Y4FGWHeu9iTYhwhiP8PHCN | |
|
||
This entire _merkle-path_ traversal is a unix-style path traversal over a _merkle-dag_ which uses _merkle-links_ with names. | ||
|
||
**[In case we use escaping in protobuf IPLD format]** | ||
|
||
In order not to restrict individual path components by disallowing some file names, while still allowing arbitrary data to be stored in IPLD objects, path components must be escaped when they are looked up in IPLD objects. | ||
|
||
To escape a path component in order to look it up in an IPLD object: | ||
|
||
- every `\` character in the path component must be replaced with `\\` | ||
- every `@` character in the path component must be replaced with `\@` | ||
|
||
As a result, any key in an IPLD object that contains an unescaped `@` character is not accessible through a _filesystem merkle-path_. Such a key is reserved: it can be used to store auxiliary data without making it a link that is visible in regular filesystems. This data can be made available in filesystems through extended attributes or by opening and reading file contents. | ||
|
||
To unescape IPLD object keys that are not reserved and get the corresponding path component: | ||
|
||
- every `\@` sequence in the key must be replaced by `@` | ||
- every `\\` sequence in the key must be replaced by `\` | ||
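
The escaping and unescaping rules above can be sketched as follows. This is only an illustration, not part of the proposal: the function names `escape_component` and `unescape_key` are hypothetical, and returning `None` for keys that hold an unescaped `@` (reserved keys) or a dangling `\` is an assumption of this sketch. Unescaping is done in a single left-to-right pass, because applying the two replacements one after the other could misread a key such as `\\@`.

```python
from typing import Optional


def escape_component(component: str) -> str:
    """Escape a path component before looking it up in an IPLD object."""
    # Backslashes are doubled first, so the backslash that is later added
    # in front of '@' is not doubled by mistake.
    return component.replace("\\", "\\\\").replace("@", "\\@")


def unescape_key(key: str) -> Optional[str]:
    """Turn an IPLD object key back into a path component.

    Returns None when the key cannot be reached through a filesystem
    merkle-path (assumption of this sketch: an unescaped '@' or an
    unknown/dangling escape makes the key unreachable).
    """
    out = []
    i = 0
    while i < len(key):
        ch = key[i]
        if ch == "\\":
            # Only the sequences \\ and \@ are valid escapes.
            if i + 1 < len(key) and key[i + 1] in "\\@":
                out.append(key[i + 1])
                i += 2
                continue
            return None
        if ch == "@":
            # Unescaped '@': reserved key, hidden from filesystem paths.
            return None
        out.append(ch)
        i += 1
    return "".join(out)
```

With these helpers, `unescape_key(escape_component(name))` gives back `name` for any file name, while a key stored as `@meta` stays hidden from filesystem lookups.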
|
||
|
||
### IPLD merkle-path (best solution) | ||
|
||
An _IPLD merkle-path_ is an extension of a _filesystem merkle-path_ which uses a special syntax to access link properties. **[In case we use escaping in protobuf IPLD format** Except that key escaping is not performed when looking up items in the IPLD objects. This allows accessing reserved keys using _IPLD merkle-paths_ that are not accessible in filesystems.**]** | ||
|
||
|
||
Path elements are suffixed by either `.link` to access the link properties or by `.object` to dereference the _merkle-link_. If no suffix is present, the _merkle-link_ is dereferenced (to be compatible with _filesystem merkle-paths_ in most cases). | ||
|
||
**FIXME**: perhaps use different suffixes so we are less likely to have ambiguities. Using a character that is denied by Windows would be a good idea since those are less likely to be present in filenames. For most cases, this would make _IPLD merkle-paths_ a superset of _filesystem merkle-paths_. For example we could use `?link` and `?object` | ||
Review comment: being able to access link properties includes making it possible to do so via HTTP
Review comment: I did not intend to use the |
||
|
||
Suppose we have an object which hashes to QmCCC...000: | ||
|
||
--- | ||
stuff: | ||
foo: | ||
mlink: QmCCC...111 | ||
mode: 0755 | ||
owner: jbenet | ||
|
||
and we have an object which hashes to QmCCC...111 (the foo link): | ||
|
||
--- | ||
other: | ||
cat.link: | ||
mlink: QmCCC...222 | ||
mode: 0644 | ||
owner: jbenet | ||
|
||
Now: | ||
|
||
- the path `/ipfs/QmCCC...000/stuff/foo.link/mode` yields `0755` | ||
- the path `/ipfs/QmCCC...000/stuff/foo/other/cat.link/mode` does not exist because `other` does not have a `cat` object, only a `cat.link` | ||
- the path `/ipfs/QmCCC...000/stuff/foo/other/cat.link.link/mode` yields `0644` | ||
- the path `/ipfs/QmCCC...000/stuff/foo.object/other/cat.link` yields object `QmCCC...222` | ||
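
The suffix rules can be illustrated with a small resolver over an in-memory object store. This is a sketch under stated assumptions, not the proposed implementation: `STORE`, `is_link` and `resolve` are hypothetical names, a value counts as a _merkle-link_ when it is a mapping with an `mlink` key, modes are kept as strings, the content of `QmCCC...222` is made up, and plain nested objects such as `stuff` are simply descended into.

```python
# Hypothetical in-memory store holding the two example objects above.
STORE = {
    "QmCCC...000": {
        "stuff": {"foo": {"mlink": "QmCCC...111", "mode": "0755", "owner": "jbenet"}},
    },
    "QmCCC...111": {
        "other": {"cat.link": {"mlink": "QmCCC...222", "mode": "0644", "owner": "jbenet"}},
    },
    "QmCCC...222": {"data": "made-up content"},  # assumption: not given in the text
}


def is_link(value):
    """Assumption: a merkle-link is a mapping that carries an 'mlink' key."""
    return isinstance(value, dict) and "mlink" in value


def resolve(path, store=STORE):
    """Resolve an IPLD merkle-path using the .link / .object suffixes."""
    parts = path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "ipfs":
        raise ValueError("expected a path of the form /ipfs/<hash>/...")
    node = store[parts[1]]  # initial dereference through the root merkle-link
    for element in parts[2:]:
        # A recognised suffix is always stripped first, so a key that itself
        # ends in ".link" needs the suffix twice (cat.link.link).
        if element.endswith(".link"):
            key, want_link = element[: -len(".link")], True
        elif element.endswith(".object"):
            key, want_link = element[: -len(".object")], False
        else:
            key, want_link = element, False
        if not isinstance(node, dict) or key not in node:
            raise KeyError(element)
        value = node[key]
        if want_link:
            if not is_link(value):
                raise KeyError(element)
            node = value  # stay on the link to read its properties
        elif is_link(value):
            node = store[value["mlink"]]  # dereference the merkle-link
        else:
            node = value  # plain nested value: just descend
    return node
```

In this sketch the first three example paths behave as listed: `foo.link/mode` and `cat.link.link/mode` return the modes, and `cat.link/mode` raises `KeyError`. Because a suffix is always stripped first, this sketch only reaches `QmCCC...222` through `cat.link.object`; how a trailing `cat.link` should be read is left open here.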
|
||
### IPLD merkle-path (other solution) | ||
|
||
An _IPLD merkle-path_ is a path which initially dereferences through a _merkle-link_ and then follows elements in intermediate objects through the separator `.`, and follows _merkle-links_ through the separator `/`. | ||
|
||
**Variation:** The separator `/` can also be used instead of `.` if there is no ambiguity. | ||
|
||
The separator can be escaped in any path element using `\.`, and the `\` character is escaped using `\\`. | ||
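
As a rough illustration of the two separators and the escaping rule, the part of a path that follows the root hash could be split into `(separator, element)` pairs as below. The function name `tokenize` is hypothetical, and two choices are assumptions of this sketch: the input must start with `/`, and a `\` that is not followed by `.` or `\` is kept literally.

```python
def tokenize(path):
    """Split the path that follows the root hash into (separator, element) pairs.

    '/' means "follow a merkle-link before the lookup" and '.' means
    "look up inside the current object". '\\.' and '\\\\' are escapes.
    """
    if not path.startswith("/"):
        raise ValueError("expected the path to start with '/'")
    tokens = []
    sep = None
    current = []
    i = 0
    while i < len(path):
        ch = path[i]
        # Escaped separator or escaped backslash: keep the second character.
        if ch == "\\" and i + 1 < len(path) and path[i + 1] in ".\\":
            current.append(path[i + 1])
            i += 2
            continue
        if ch in "./":
            if sep is not None:
                tokens.append((sep, "".join(current)))
            sep = ch
            current = []
        else:
            current.append(ch)
        i += 1
    tokens.append((sep, "".join(current)))
    return tokens
```

For instance, `tokenize(r"/stuff.foo/other.cat\.jpg")` keeps `cat.jpg` together as one element instead of splitting it at the dot.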
|
||
Suppose we have an object which hashes to QmCCC...000: | ||
|
||
--- | ||
stuff: | ||
foo: | ||
mlink: QmCCC...111 | ||
mode: 0755 | ||
owner: jbenet | ||
|
||
and we have an object which hashes to QmCCC...111 (the foo link): | ||
|
||
--- | ||
other: | ||
cat.jpg: | ||
mlink: QmCCC...222 | ||
mode: 0644 | ||
owner: jbenet | ||
|
||
Now: | ||
|
||
- the path `/ipfs/QmCCC...000/stuff.foo.mode` yields `0755` | ||
- the path `/ipfs/QmCCC...000/stuff/foo` does not exist because, in the object `QmCCC...000`, `stuff` is not a _merkle-link_ (it doesn't have the `mlink` key) | ||
- the path `/ipfs/QmCCC...000/stuff.foo/other.cat\.jpg` yields object `QmCCC...222` | ||
Review comment: (i do agree that we should avoid ambiguous meanings, like
Review comment: (maybe this is fixed by using a different separator character, not sure)
Review comment: That's why I prefer the first solution presented. only one needs to be chosen. And I agree if we use the second solution, another character is probably better (you suggested |
||
<br/>**FIXME:** or does it yield `{"mlink": "QmCCC...222", "mode": 0644, "owner": "jbenet"}` and `/ipfs/QmCCC...000/stuff.foo/other.cat\.jpg` yields the object `QmCCC...222`? | ||
- the path `/ipfs/QmCCC...000/stuff.foo/other.cat\.jpg.mode` yields `0644` | ||
|
||
Variation: | ||
|
||
- the path `/ipfs/QmCCC...000/stuff/foo.mode` yields `0755` | ||
- the path `/ipfs/QmCCC...000/stuff.foo.mode` yields `0755` | ||
- the path `/ipfs/QmCCC...000/stuff/foo/mode` does not exist because object `QmCCC...111` does not have a `mode` key. | ||
- the path `/ipfs/QmCCC...000/stuff.foo/other.cat\.jpg` yields same as above (**FIXME**) | ||
- the path `/ipfs/QmCCC...000/stuff.foo/other.cat\.jpg.mode` yields `0644` | ||
|
||
## What is the IPLD Data Model? | ||
|
||
The IPLD Data Model defines a simple JSON-based _structure_ for all merkle-dags, and identifies a set of formats to encode the structure into. | ||
|
Review comment:

Update: Ignore me, it's too early, this is pretty much the second solution I just realized

Disclaimer: I'm late to this party so if I missed something please excuse me

I do very much like the notion of this solution, but `.link` and `.object` seem quite verbose to me, why not use two separators like `#` and `.` instead, so the example paths would be transformed to the ones you can see below

- `/ipfs/QmCCC...000/stuff/foo#mode` yields `0755`
- `/ipfs/QmCCC...000/stuff/foo/other/cat#mode` does not exist because `other` does not have a `cat` object, only a `cat.link`
- `/ipfs/QmCCC...000/stuff/foo/other/cat.link.mode` yields `0644`
- `/ipfs/QmCCC...000/stuff/foo.other/cat#` yields object `QmCCC...222`

This would make the paths a little less verbose for me. I do see that conflicts with things like `cat.link` in the second object are still an issue, so it might make sense to use different separators, though `.` feels quite natural for the object selection to me.