From 038188831cfb3be7ab37f69aad12b437561dc51a Mon Sep 17 00:00:00 2001 From: Mildred Ki'Lya Date: Thu, 7 Jan 2016 10:29:19 +0100 Subject: [PATCH 1/9] Relationship with Protocol Buffers legacy IPFS node format --- merkledag/ipld.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) diff --git a/merkledag/ipld.md b/merkledag/ipld.md index 4fa69db06..b0937a980 100644 --- a/merkledag/ipld.md +++ b/merkledag/ipld.md @@ -319,6 +319,117 @@ In the same way, when the receiver is storing the object, it must make sure that A simple way to store such objects with their format is to store them with their multicodec header. +## Relationship with Protocol Buffers legacy IPFS node format + +IPLD has a known conversion with the legacy Protocol Buffers format. This format is defined with the Protocol Buffers syntax as: + + message PBLink { + optional bytes Hash = 1; + optional string Name = 2; + optional uint64 Tsize = 3; + } + + message PBNode { + repeated PBLink Links = 2; + optional bytes Data = 1; + } + +The conversion to the IPLD data model must have the following properties: + +- It should be convertible back to protocol buffers, resulting in an identical byte stream (so the hash corresponds). This implies that ordering and duplicate links must be preserved in some way. +- When using paths as defined earlier in this document, links should be accessible without further indirection. This requires the top node object to have keys corresponding to link names. +- Link names should not conflict with other keys. + +There are multiple ways to do that that will be described next. + +### Current encoding in go-ipld + +go-ipld implements the following conversion: + + { + "": { + "hash": "", + "name": "", + "size": + }, + "": { + "hash": "", + "name": "", + "size": + }, + ... 
+ "@attrs": { + "data": "", + "links": [ + { + "hash": "", + "name": "", + "size": + }, + { + "hash": "", + "name": "", + "size": + } + ] + } + } + +Notes : + +- The `links` array in the `@attrs` section is there to preserve order and duplicate links to hash back to the exact same protocol buffer object. + +- The link names are escaped to prevent clashing with the `@attr` key. + +- The escaping mechanism transforms the `@` character into `\@`. This mechanism also implies a modification of the path algorithm. When a path component contains the `@` character, it must be escaped to look it up in the IPLD Node object. + + For example, a path a path `/root/first@component/second@component/third_component` would look for object `root["first\@component"]["second\@component"]["third_component"]` (following mlinks when necessary). + +- Links are represented using the `hash` key instead of `mlink` as used in this specification. This must be changed. + +### Other proposition that does away with escaping + +We can imagine another transformation where the link names are not escaped. For example: + + { + "": { + "mlink": "", + "tsize": + }, + "": { + "mlink": "", + "tsize": + }, + ... + ".": { + "data": "", + "links": [ + "", + { + "name": "", + "mlink": "", + "tsize": + } + "", + ... + ] + } + } + +Notes: + +- Very conveniently, we use the key `.` to represent data for the current node, and any other key can represent a link. This means that we forbid link to be named `.`. This is in any case a good thing to do as the `.` element in paths can always be removed (the same way `..` can be replaced by the parent directory) + +- No escaping is needed, and no modification to the path algorithm is needed. + +- Link order is kept by using the link names for links present in the top node. This avoid repeating identical data (even though it is probably generated on the fly). 
+ + Links that cannot be present in the top node (the case for the link named `.`, which is forbidden, or for links that are repeated with the same name) are present in full to allow reconstructing the exact protocol buffer object. + +### Other encodings + +Speak up in comments if you have other encoding suggestions, or you have another implementation with another encoding. + ## Datastructure Examples It is important that IPLD be a simple, nimble, and flexible format that does not get in the way of users defining new or importing old datastractures. For this purpose, below I will show a few example data structures. From bc2050cbebbde31ec9ff814527649bb8d4e7cfe3 Mon Sep 17 00:00:00 2001 From: Mildred Ki'Lya Date: Fri, 8 Jan 2016 09:07:14 +0100 Subject: [PATCH 2/9] Talk about escaping keys in merkle-paths --- merkledag/ipld.md | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/merkledag/ipld.md b/merkledag/ipld.md index b0937a980..7450db344 100644 --- a/merkledag/ipld.md +++ b/merkledag/ipld.md @@ -72,6 +72,23 @@ O_5 = | "hello": "world" | whose hash value is QmR8Bzg59Y4FGWHeu9iTYhwhiP8PHCN This entire _merkle-path_ traversal is a unix-style path traversal over a _merkle-dag_ which uses _merkle-links_ with names. +**[In case we use escaping in protobuf IPLD format]** + +In order to not restrict individual path component by disallowing some file names and still allow storing arbitrary data in IPLD objects, path components must be escaped when they are looked up in IPLD objects. + +To escape a path component in order to look it up in an IPLD object: + +- every `\` character in the path component must be replaced with `\\` +- every `@` character in the path component must be replaced with `\@` + +This makes any key containing a `@` character unescaped in an IPLD object not accessible through a _filesystem merkle-path_. 
This is a reserved key that can be used to store auxiliary data without making it a link and visible in regular filesystems. This data can be made available in filesystems through extended attributes or opening and reading file contents. + +To unescape IPLD object keys that are not reserved and get the corresponding path component: + +- every `\@` sequence in the key must be replaced by `@` +- every `\\` sequence in the key must be replaced by `\` + + ## What is the IPLD Data Model? The IPLD Data Model defines a simple JSON-based _structure_ for all merkle-dags, and identifies a set of formats to encode the structure into. @@ -338,7 +355,7 @@ The conversion to the IPLD data model must have the following properties: - It should be convertible back to protocol buffers, resulting in an identical byte stream (so the hash corresponds). This implies that ordering and duplicate links must be preserved in some way. - When using paths as defined earlier in this document, links should be accessible without further indirection. This requires the top node object to have keys corresponding to link names. -- Link names should not conflict with other keys. +- Link names should be able to be any valid file name. As such, the encoding must ensure that link names do not conflict with other keys in the model. There are multiple ways to do that that will be described next. 
From 89dd82d801a7b3acc6033863564efe22f52e9553 Mon Sep 17 00:00:00 2001 From: Mildred Ki'Lya Date: Fri, 8 Jan 2016 19:40:54 +0100 Subject: [PATCH 3/9] Move (and update) section about protobuf compat to separate file --- merkledag/ipld-compat-protobuf.md | 138 ++++++++++++++++++++++++++++++ merkledag/ipld.md | 107 ----------------------- 2 files changed, 138 insertions(+), 107 deletions(-) create mode 100644 merkledag/ipld-compat-protobuf.md diff --git a/merkledag/ipld-compat-protobuf.md b/merkledag/ipld-compat-protobuf.md new file mode 100644 index 000000000..525e995c9 --- /dev/null +++ b/merkledag/ipld-compat-protobuf.md @@ -0,0 +1,138 @@ +# IPLD conversion with Protocol Buffer legacy IPFS node format + +IPLD has a known conversion with the legacy Protocol Buffers format in order for new IPLD objects to interact with older protocol buffer objects. + +## Detecting if the legacy format is in use + +The format is encapsulated after a multicodec header that tells which codec to use. In addition, older applications that do not yet use the multicodec header will transmit a protocol buffer stream. 
This can be detected by looking at the first byte: + +- if the first byte is between 0 and 127, it is a multicodec header +- if the first byte if between 128 and 255, it is a protocol buffer stream + +In case a multicodec header is in use, the actual IPLD object is encapsulated first with a multicodec header which identifier is `/mdagv1`, then by a second header which identifier corresponds to the actual encoding of the object: + +- `/protobuf/msgio`: is the encapsulation for protocol buffer message +- `/json`: is the encapsulation for JSON encoding +- `/cbor`: is the encapsulation for CBOR encoding + +For example, a protocol buffer object encapsulated in a multicodec header would start with "`\x08/mdagv1\n\x10/protobuf/msgio\n`" corresponding to the bytes : + + 08 2f 6d 64 61 67 76 31 0a + 10 2f 70 72 6f 74 6f 62 75 66 2f 6d 73 67 69 6f 0a + +A JSON encoded object would start with "`\x08/mdagv1\n\x06/json\n`" and a CBOR encoded object would start with "`\x08/mdagv1\n\x06/cbor\n`". + + +## Description of the legacy protocol buffers format + +This format is defined with the Protocol Buffers syntax as: + + message PBLink { + optional bytes Hash = 1; + optional string Name = 2; + optional uint64 Tsize = 3; + } + + message PBNode { + repeated PBLink Links = 2; + optional bytes Data = 1; + } + +## Conversion to IPLD model + +The conversion to the IPLD data model must have the following properties: + +- It should be convertible back to protocol buffers, resulting in an identical byte stream (so the hash corresponds). This implies that ordering and duplicate links must be preserved in some way. +- When using paths as defined earlier in this document, links should be accessible without further indirection. This requires the top node object to have keys corresponding to link names. +- Link names should be able to be any valid file name. As such, the encoding must ensure that link names do not conflict with other keys in the model. 
+ +There is a canonical form which is described below: + +**FIXME: decide on that form. Until now, multiple possible forms are presented here** + + +### Escape encoding + +A protocol buffer message would be converted the following way: + + { + "": { + "mlink": "", + "name": "", + "size": + }, + "": { + "mlink": "", + "name": "", + "size": + }, + ... + "@attrs": { + "data": "", + "links": [ + { + "mlink": "", + "name": "", + "size": + }, + { + "mlink": "", + "name": "", + "size": + } + ] + } + } + +Notes : + +- The `links` array in the `@attrs` section is there to preserve order and duplicate links to hash back to the exact same protocol buffer object. + +- Link hashes are encoded in base58 + +- The link names are escaped to prevent clashing with the `@attr` key. + +- The escaping mechanism transforms the `@` character into `\@`. This mechanism also implies a modification of the path algorithm. When a path component contains the `@` character, it must be escaped to look it up in the IPLD Node object. + + For example, a _filesystem merkle-path_ `/root/first@component/second@component/third_component` would look for object `root["first\@component"]["second\@component"]["third_component"]` (following mlinks when necessary). + +**FIXME: Using the `@` character is not mandatory. Any other character could fit. Don't hesitate to give your ideas.** + +### Other proposition that avoids escaping + +We can imagine another transformation where the link names are not escaped. For example: + + { + "": { + "mlink": "", + "tsize": + }, + "": { + "mlink": "", + "tsize": + }, + ... + ".": { + "data": "", + "links": [ + "", + { + "name": "", + "mlink": "", + "tsize": + } + "", + ... + ] + } + } + +Notes: + +- Very conveniently, we use the key `.` to represent data for the current node, and any other key can represent a link. This means that we forbid link to be named `.`. 
This is in any case a good thing to do as the `.` element in paths can always be removed (the same way `..` can be replaced by the parent directory) + +- No escaping is needed, and no modification to the path algorithm is needed. + +- Link order is kept by using the link names for links present in the top node. This avoid repeating identical data (even though it is probably generated on the fly). + + Links that cannot be present in the top node (the case for the link named `.`, which is forbidden, or for links that are repeated with the same name) are present in full to allow reconstructing the exact protocol buffer object. diff --git a/merkledag/ipld.md b/merkledag/ipld.md index 7450db344..affc300b8 100644 --- a/merkledag/ipld.md +++ b/merkledag/ipld.md @@ -336,113 +336,6 @@ In the same way, when the receiver is storing the object, it must make sure that A simple way to store such objects with their format is to store them with their multicodec header. -## Relationship with Protocol Buffers legacy IPFS node format - -IPLD has a known conversion with the legacy Protocol Buffers format. This format is defined with the Protocol Buffers syntax as: - - message PBLink { - optional bytes Hash = 1; - optional string Name = 2; - optional uint64 Tsize = 3; - } - - message PBNode { - repeated PBLink Links = 2; - optional bytes Data = 1; - } - -The conversion to the IPLD data model must have the following properties: - -- It should be convertible back to protocol buffers, resulting in an identical byte stream (so the hash corresponds). This implies that ordering and duplicate links must be preserved in some way. -- When using paths as defined earlier in this document, links should be accessible without further indirection. This requires the top node object to have keys corresponding to link names. -- Link names should be able to be any valid file name. As such, the encoding must ensure that link names do not conflict with other keys in the model. 
- -There are multiple ways to do that that will be described next. - -### Current encoding in go-ipld - -go-ipld implements the following conversion: - - { - "": { - "hash": "", - "name": "", - "size": - }, - "": { - "hash": "", - "name": "", - "size": - }, - ... - "@attrs": { - "data": "", - "links": [ - { - "hash": "", - "name": "", - "size": - }, - { - "hash": "", - "name": "", - "size": - } - ] - } - } - -Notes : - -- The `links` array in the `@attrs` section is there to preserve order and duplicate links to hash back to the exact same protocol buffer object. - -- The link names are escaped to prevent clashing with the `@attr` key. - -- The escaping mechanism transforms the `@` character into `\@`. This mechanism also implies a modification of the path algorithm. When a path component contains the `@` character, it must be escaped to look it up in the IPLD Node object. - - For example, a path a path `/root/first@component/second@component/third_component` would look for object `root["first\@component"]["second\@component"]["third_component"]` (following mlinks when necessary). - -- Links are represented using the `hash` key instead of `mlink` as used in this specification. This must be changed. - -### Other proposition that does away with escaping - -We can imagine another transformation where the link names are not escaped. For example: - - { - "": { - "mlink": "", - "tsize": - }, - "": { - "mlink": "", - "tsize": - }, - ... - ".": { - "data": "", - "links": [ - "", - { - "name": "", - "mlink": "", - "tsize": - } - "", - ... - ] - } - } - -Notes: - -- Very conveniently, we use the key `.` to represent data for the current node, and any other key can represent a link. This means that we forbid link to be named `.`. This is in any case a good thing to do as the `.` element in paths can always be removed (the same way `..` can be replaced by the parent directory) - -- No escaping is needed, and no modification to the path algorithm is needed. 
- -- Link order is kept by using the link names for links present in the top node. This avoid repeating identical data (even though it is probably generated on the fly). - - Links that cannot be present in the top node (the case for the link named `.`, which is forbidden, or for links that are repeated with the same name) are present in full to allow reconstructing the exact protocol buffer object. - ### Other encodings Speak up in comments if you have other encoding suggestions, or you have another implementation with another encoding. From 5b97e14c23a13a0869897951583361ddfc722fbc Mon Sep 17 00:00:00 2001 From: Mildred Ki'Lya Date: Sun, 10 Jan 2016 16:42:14 +0100 Subject: [PATCH 4/9] Change protocol buffer compatibility format. Links need not be present at the top level. Having them in a separate map removes all complexity of key escaping. --- merkledag/ipld-compat-protobuf.md | 113 ++++++++++++++++++++++++------ 1 file changed, 90 insertions(+), 23 deletions(-) diff --git a/merkledag/ipld-compat-protobuf.md b/merkledag/ipld-compat-protobuf.md index 525e995c9..d72c96787 100644 --- a/merkledag/ipld-compat-protobuf.md +++ b/merkledag/ipld-compat-protobuf.md @@ -48,40 +48,107 @@ The conversion to the IPLD data model must have the following properties: There is a canonical form which is described below: -**FIXME: decide on that form. Until now, multiple possible forms are presented here** + { + "data": "", + "named-links": { + "": { + "link": "", + "name": "", + "size": + }, + "": { + "link": "", + "name": "", + "size": + }, + ... + } + "ordered-links": [ + "", + { + "name": "", + "link": "", + "tsize": + } + "", + ... + ] + } +- Here we assume that the link #0 and #1 have the same name. As specified in [ipld.md](ipld.md) in paragraph **Duplicate property keys**, only the first link is present in the named link section. The other link is present in the `ordered-links` section for completeness and to allow recreating the original protocol buffer message. 
-### Escape encoding +- Links are not accessible on the top level object. Applications that are using protocol buffer objects such as unixfs will have to handle that and special case for legacy objects. -A protocol buffer message would be converted the following way: +- No escaping is needed and no conflict is possible + +----------------- + +### Simple variation on that solution { - "": { - "mlink": "", + "data": "", + "": { + "link": "", "name": "", "size": }, - "": { - "mlink": "", + "": { + "link": "", "name": "", "size": }, - ... - "@attrs": { - "data": "", - "links": [ - { - "mlink": "", - "name": "", - "size": - }, - { - "mlink": "", - "name": "", - "size": - } - ] + ... + "ordered-links": [ + "", + { + "name": "", + "link": "", + "tsize": + } + "", + ... + ] + } + +- Here we assume that the link #0 and #1 have the same name. As specified in [ipld.md](ipld.md) in paragraph **Duplicate property keys**, only the first link is present in the named link section. The other link is present in the `ordered-links` section for completeness and to allow recreating the original protocol buffer message. + +- Link whose name would conflict with other top level keys are not included in the top level object. They are only accessible in `ordered-links` section by iterating through the values. + +- Links are not accessible on the top level object. Applications that are using protocol buffer objects such as unixfs will have to handle that and special case for legacy objects. + +- No escaping is needed and no conflict is possible + +### Other variation: escape encoding + +A protocol buffer message would be converted the following way: + + { + "data": "", + "named-links": { + "": { + "mlink": "", + "name": "", + "size": + }, + "": { + "mlink": "", + "name": "", + "size": + }, + ... 
} + "ordered-links": [ + { + "mlink": "", + "name": "", + "size": + }, + { + "mlink": "", + "name": "", + "size": + } + ] } Notes : @@ -98,7 +165,7 @@ Notes : **FIXME: Using the `@` character is not mandatory. Any other character could fit. Don't hesitate to give your ideas.** -### Other proposition that avoids escaping +### Other variation that avoids escaping We can imagine another transformation where the link names are not escaped. For example: From 33ca56e7b27e66fab9aee58fbde2eab656ad96f7 Mon Sep 17 00:00:00 2001 From: Mildred Ki'Lya Date: Tue, 9 Feb 2016 17:50:32 +0100 Subject: [PATCH 5/9] IPLD Protocol Buffer compatibility: fix errors Fix the paragraph about the first byte that is able to determine if the data is prefixed by a multicodec or is a protocol buffer object. --- merkledag/ipld-compat-protobuf.md | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/merkledag/ipld-compat-protobuf.md b/merkledag/ipld-compat-protobuf.md index d72c96787..860f81d94 100644 --- a/merkledag/ipld-compat-protobuf.md +++ b/merkledag/ipld-compat-protobuf.md @@ -2,14 +2,11 @@ IPLD has a known conversion with the legacy Protocol Buffers format in order for new IPLD objects to interact with older protocol buffer objects. -## Detecting if the legacy format is in use +## Detecting the format in use -The format is encapsulated after a multicodec header that tells which codec to use. In addition, older applications that do not yet use the multicodec header will transmit a protocol buffer stream. This can be detected by looking at the first byte: +The format is encapsulated after two multicodec headers. The first has the codec path `/mdagv1` and can be used to detect whether IPLD objects are transmitted or just legacy protocol buffer messages. 
-- if the first byte is between 0 and 127, it is a multicodec header -- if the first byte if between 128 and 255, it is a protocol buffer stream - -In case a multicodec header is in use, the actual IPLD object is encapsulated first with a multicodec header which identifier is `/mdagv1`, then by a second header which identifier corresponds to the actual encoding of the object: +The second multicodec header is used to detect the actual format in which the IPLD object is encoded: - `/protobuf/msgio`: is the encapsulation for protocol buffer message - `/json`: is the encapsulation for JSON encoding From 1ab421b3500e0e9e96ae312fcb8b5730899ba727 Mon Sep 17 00:00:00 2001 From: Mildred Ki'Lya Date: Tue, 9 Feb 2016 21:03:10 +0100 Subject: [PATCH 6/9] Only keep first alternative. --- merkledag/ipld-compat-protobuf.md | 131 +----------------------------- 1 file changed, 4 insertions(+), 127 deletions(-) diff --git a/merkledag/ipld-compat-protobuf.md b/merkledag/ipld-compat-protobuf.md index 860f81d94..6ae9ac0b4 100644 --- a/merkledag/ipld-compat-protobuf.md +++ b/merkledag/ipld-compat-protobuf.md @@ -49,12 +49,12 @@ There is a canonical form which is described below: "data": "", "named-links": { "": { - "link": "", + "@link": "", "name": "", "size": }, "": { - "link": "", + "@link": "", "name": "", "size": }, @@ -64,43 +64,8 @@ There is a canonical form which is described below: "", { "name": "", - "link": "", - "tsize": - } - "", - ... - ] - } - -- Here we assume that the link #0 and #1 have the same name. As specified in [ipld.md](ipld.md) in paragraph **Duplicate property keys**, only the first link is present in the named link section. The other link is present in the `ordered-links` section for completeness and to allow recreating the original protocol buffer message. - -- Links are not accessible on the top level object. Applications that are using protocol buffer objects such as unixfs will have to handle that and special case for legacy objects. 
- -- No escaping is needed and no conflict is possible - ------------------ - -### Simple variation on that solution - - { - "data": "", - "": { - "link": "", - "name": "", - "size": - }, - "": { - "link": "", - "name": "", - "size": - }, - ... - "ordered-links": [ - "", - { - "name": "", - "link": "", - "tsize": + "@link": "", + "size": } "", ... @@ -109,94 +74,6 @@ There is a canonical form which is described below: - Here we assume that the link #0 and #1 have the same name. As specified in [ipld.md](ipld.md) in paragraph **Duplicate property keys**, only the first link is present in the named link section. The other link is present in the `ordered-links` section for completeness and to allow recreating the original protocol buffer message. -- Link whose name would conflict with other top level keys are not included in the top level object. They are only accessible in `ordered-links` section by iterating through the values. - - Links are not accessible on the top level object. Applications that are using protocol buffer objects such as unixfs will have to handle that and special case for legacy objects. - No escaping is needed and no conflict is possible - -### Other variation: escape encoding - -A protocol buffer message would be converted the following way: - - { - "data": "", - "named-links": { - "": { - "mlink": "", - "name": "", - "size": - }, - "": { - "mlink": "", - "name": "", - "size": - }, - ... - } - "ordered-links": [ - { - "mlink": "", - "name": "", - "size": - }, - { - "mlink": "", - "name": "", - "size": - } - ] - } - -Notes : - -- The `links` array in the `@attrs` section is there to preserve order and duplicate links to hash back to the exact same protocol buffer object. - -- Link hashes are encoded in base58 - -- The link names are escaped to prevent clashing with the `@attr` key. - -- The escaping mechanism transforms the `@` character into `\@`. This mechanism also implies a modification of the path algorithm. 
When a path component contains the `@` character, it must be escaped to look it up in the IPLD Node object. - - For example, a _filesystem merkle-path_ `/root/first@component/second@component/third_component` would look for object `root["first\@component"]["second\@component"]["third_component"]` (following mlinks when necessary). - -**FIXME: Using the `@` character is not mandatory. Any other character could fit. Don't hesitate to give your ideas.** - -### Other variation that avoids escaping - -We can imagine another transformation where the link names are not escaped. For example: - - { - "": { - "mlink": "", - "tsize": - }, - "": { - "mlink": "", - "tsize": - }, - ... - ".": { - "data": "", - "links": [ - "", - { - "name": "", - "mlink": "", - "tsize": - } - "", - ... - ] - } - } - -Notes: - -- Very conveniently, we use the key `.` to represent data for the current node, and any other key can represent a link. This means that we forbid link to be named `.`. This is in any case a good thing to do as the `.` element in paths can always be removed (the same way `..` can be replaced by the parent directory) - -- No escaping is needed, and no modification to the path algorithm is needed. - -- Link order is kept by using the link names for links present in the top node. This avoid repeating identical data (even though it is probably generated on the fly). - - Links that cannot be present in the top node (the case for the link named `.`, which is forbidden, or for links that are repeated with the same name) are present in full to allow reconstructing the exact protocol buffer object. 
From d1ceeb303c4fc4ce648da92e54694cd0c4f61b00 Mon Sep 17 00:00:00 2001 From: Mildred Ki'Lya Date: Tue, 9 Feb 2016 22:07:02 +0100 Subject: [PATCH 7/9] Do not make use of escaping --- merkledag/ipld.md | 21 --------------------- 1 file changed, 21 deletions(-) diff --git a/merkledag/ipld.md b/merkledag/ipld.md index affc300b8..4fa69db06 100644 --- a/merkledag/ipld.md +++ b/merkledag/ipld.md @@ -72,23 +72,6 @@ O_5 = | "hello": "world" | whose hash value is QmR8Bzg59Y4FGWHeu9iTYhwhiP8PHCN This entire _merkle-path_ traversal is a unix-style path traversal over a _merkle-dag_ which uses _merkle-links_ with names. -**[In case we use escaping in protobuf IPLD format]** - -In order to not restrict individual path component by disallowing some file names and still allow storing arbitrary data in IPLD objects, path components must be escaped when they are looked up in IPLD objects. - -To escape a path component in order to look it up in an IPLD object: - -- every `\` character in the path component must be replaced with `\\` -- every `@` character in the path component must be replaced with `\@` - -This makes any key containing a `@` character unescaped in an IPLD object not accessible through a _filesystem merkle-path_. This is a reserved key that can be used to store auxiliary data without making it a link and visible in regular filesystems. This data can be made available in filesystems through extended attributes or opening and reading file contents. - -To unescape IPLD object keys that are not reserved and get the corresponding path component: - -- every `\@` sequence in the key must be replaced by `@` -- every `\\` sequence in the key must be replaced by `\` - - ## What is the IPLD Data Model? The IPLD Data Model defines a simple JSON-based _structure_ for all merkle-dags, and identifies a set of formats to encode the structure into. 
@@ -336,10 +319,6 @@ In the same way, when the receiver is storing the object, it must make sure that A simple way to store such objects with their format is to store them with their multicodec header. -### Other encodings - -Speak up in comments if you have other encoding suggestions, or you have another implementation with another encoding. - ## Datastructure Examples It is important that IPLD be a simple, nimble, and flexible format that does not get in the way of users defining new or importing old datastractures. For this purpose, below I will show a few example data structures. From c8452235e33979c811f37004943a05004eb17c53 Mon Sep 17 00:00:00 2001 From: Mildred Ki'Lya Date: Thu, 11 Feb 2016 21:46:02 +0100 Subject: [PATCH 8/9] Minor wording tweaks --- merkledag/ipld-compat-protobuf.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/merkledag/ipld-compat-protobuf.md b/merkledag/ipld-compat-protobuf.md index 6ae9ac0b4..59839c719 100644 --- a/merkledag/ipld-compat-protobuf.md +++ b/merkledag/ipld-compat-protobuf.md @@ -39,8 +39,8 @@ This format is defined with the Protocol Buffers syntax as: The conversion to the IPLD data model must have the following properties: -- It should be convertible back to protocol buffers, resulting in an identical byte stream (so the hash corresponds). This implies that ordering and duplicate links must be preserved in some way. -- When using paths as defined earlier in this document, links should be accessible without further indirection. This requires the top node object to have keys corresponding to link names. +- It MUST be convertible back to protocol buffers, resulting in an identical byte stream (so the hash corresponds). This implies that ordering and duplicate links must be preserved in some way. +- When using paths as defined in the IPLD specification, links should be accessible without further indirection. This requires the top node object to have keys corresponding to link names. 
- Link names should be able to be any valid file name. As such, the encoding must ensure that link names do not conflict with other keys in the model. There is a canonical form which is described below: From ffa001e645d727cbd460bef400338266b1ea750e Mon Sep 17 00:00:00 2001 From: Mildred Ki'Lya Date: Fri, 12 Feb 2016 12:50:14 +0100 Subject: [PATCH 9/9] Remove named links section (but leave the possibility open) --- merkledag/ipld-compat-protobuf.md | 44 +++++++++++++++---------------- 1 file changed, 21 insertions(+), 23 deletions(-) diff --git a/merkledag/ipld-compat-protobuf.md b/merkledag/ipld-compat-protobuf.md index 59839c719..9a79658e6 100644 --- a/merkledag/ipld-compat-protobuf.md +++ b/merkledag/ipld-compat-protobuf.md @@ -37,43 +37,41 @@ This format is defined with the Protocol Buffers syntax as: ## Conversion to IPLD model -The conversion to the IPLD data model must have the following properties: - -- It MUST be convertible back to protocol buffers, resulting in an identical byte stream (so the hash corresponds). This implies that ordering and duplicate links must be preserved in some way. -- When using paths as defined in the IPLD specification, links should be accessible without further indirection. This requires the top node object to have keys corresponding to link names. -- Link names should be able to be any valid file name. As such, the encoding must ensure that link names do not conflict with other keys in the model. +The conversion to the IPLD data model MUST be convertible back to protocol buffers, resulting in an identical byte stream (so the hash corresponds). This implies that ordering and duplicate links must be preserved in some way. As such, they are stored in an array and not in a map indexed by their name. There is a canonical form which is described below: { "data": "", - "named-links": { - "": { - "@link": "", + "links": [ + { + "@link": "/ipfs/", "name": "", "size": }, - "": { - "@link": "", - "name": "", - "size": - }, - ... 
- } - "ordered-links": [ - "", { + "@link": "/ipfs/", "name": "", - "@link": "", "size": - } - "", + }, + { + "@link": "/ipfs/", + "name": "", + "size": + }, ... ] } -- Here we assume that the link #0 and #1 have the same name. As specified in [ipld.md](ipld.md) in paragraph **Duplicate property keys**, only the first link is present in the named link section. The other link is present in the `ordered-links` section for completeness and to allow recreating the original protocol buffer message. +The main object contains: + +- A `data` key containing the binary data string +- A `links` array containing links in the correct order + +Each link consists of: -- Links are not accessible on the top level object. Applications that are using protocol buffer objects such as unixfs will have to handle that and special case for legacy objects. +- A `@link` key containing the path to the destination document (using the `/ipfs/` prefix) +- A `name` key containing the link name (a text string) +- A `size` unsigned integer containing the link size as stored in the Protocol Buffer object -- No escaping is needed and no conflict is possible +Implementations are free to add any other top level key they need. In particular it may be interesting to access the links indexed by their name. This is a purely optional feature and additional keys cannot possibly be encoded back to the original Protocol Buffer format.
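
The canonical form settled on in patch 9/9 (a `data` key plus an ordered `links` array) could be sketched as follows. This is only an illustration under stated assumptions, not the go-ipld implementation: the `PBNode` and `PBLink` dataclasses, the `to_ipld` and `from_ipld` helpers, and the choice of base58 for the hash inside the `/ipfs/` path are all hypothetical stand-ins, and `data` is kept as raw bytes rather than being given a particular JSON representation.

```python
from dataclasses import dataclass, field
from typing import List

# Base58 alphabet (the Bitcoin one). Using it for link hashes is an assumption here.
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(raw: bytes) -> str:
    n = int.from_bytes(raw, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58[rem] + out
    # Each leading zero byte is written as a leading "1".
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + out


def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + B58.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


# Hypothetical in-memory mirror of the PBLink / PBNode messages.
@dataclass
class PBLink:
    hash: bytes
    name: str
    tsize: int


@dataclass
class PBNode:
    links: List[PBLink] = field(default_factory=list)
    data: bytes = b""


PREFIX = "/ipfs/"


def to_ipld(node: PBNode) -> dict:
    # Links stay in an array so order and duplicate names survive.
    return {
        "data": node.data,
        "links": [
            {"@link": PREFIX + b58encode(l.hash), "name": l.name, "size": l.tsize}
            for l in node.links
        ],
    }


def from_ipld(obj: dict) -> PBNode:
    # Only "data" and "links" are read; any extra top level key is ignored,
    # since it cannot be encoded back to the legacy format.
    links = [
        PBLink(b58decode(l["@link"][len(PREFIX):]), l["name"], l["size"])
        for l in obj["links"]
    ]
    return PBNode(links=links, data=obj["data"])
```

Round-tripping a node through `to_ipld` and `from_ipld` should give back an equal `PBNode`, including two links that share a name. Re-serializing that node to the exact original protocol buffer bytes is a separate step that this sketch does not cover.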