This repository has been archived by the owner on Aug 9, 2018. It is now read-only.

Resolve remaining IPLD questions #4

Open
jbenet opened this issue Aug 28, 2015 · 9 comments

Comments

@jbenet
Contributor

jbenet commented Aug 28, 2015

@mildred and I discussed where IPLD should go and what to do about the JSON-LD trickiness we encountered. We should finish resolving the plan and document it here.

@mildred
Contributor

mildred commented Aug 28, 2015

On IRC we found that the context can declare @container: @index, which makes a JSON-LD processor keep the keys instead of discarding them (for one level only) and treat the whole object as a map: a set of objects that can be identified by key.

If we want that on the top level, though, I don't think we can have a @context explicit on the JSON. We need to transmit the context out of band. This is possible using the HTTP Link header, and we could arrange for similar ways to do that.

We must then arrange for the Node type to contain the link to the context separated from the data.

@jbenet
Contributor Author

jbenet commented Aug 29, 2015

8:47 jbenet: I suspect that JSON-LD is going to be slower than our current IPFS objects.

not by a ton. we could benchmark this. i expect a constant increase, not orders of magnitude.

We would need to have the context object and execute a full JSON-LD processor to be able to get the list of links. It's not straightforward.

Why can't we grab them like this: https://github.com/ipfs/go-ipld/blob/master/ipld.go#L122-L158 ? Are you worried about people remapping things to be mlinks that we wouldn't know about?

@mildred
Contributor

mildred commented Sep 1, 2015

Why can't we grab them like this: https://github.com/ipfs/go-ipld/blob/master/ipld.go#L122-L158 ? Are you worried about people remapping things to be mlinks that we wouldn't know about?

You can't be sure what name has a type unless you parse the context. There are multiple issues there:

  • the mlink type has been renamed by the context to mean something else
  • you might find that the mlink key has been used as part of a @container: @index (a map) instead of a linked data node.
  • the JSON might define the links using some other names than mlink, the context can define another name to refer to the mlink type, or the JSON might use the mlink URI in full
  • this information might not even be available in the JSON directly as contexts can be linked instead of embedded.
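
To make the first and third points concrete, here is a minimal JavaScript sketch of a naive link collector. It is only assumed to be similar in spirit to the linked go-ipld code: it walks the object and reports every key literally named mlink, without reading the context. The helper name collectLinksNaive and the alias name "ref" are hypothetical.

```javascript
// Naive approach: treat every key literally named "mlink" as a link,
// without looking at the context at all. (Hypothetical helper.)
function collectLinksNaive(node, path = [], out = []) {
  if (node === null || typeof node !== 'object') return out
  for (const key of Object.keys(node)) {
    if (key === '@context') continue
    const value = node[key]
    if (key === 'mlink' && typeof value === 'string') {
      out.push({ path: path.join('/'), hash: value })
    } else {
      collectLinksNaive(value, path.concat(key), out)
    }
  }
  return out
}

const plain = { foo: { mlink: '<hash>' } }

// Same data, but the context gives the link type another name ("ref").
const aliased = {
  '@context': { ref: '/ipfs/<hash-of-mlink>/mlink' },
  foo: { ref: '<hash>' }
}

console.log(collectLinksNaive(plain))   // one link, at path "foo"
console.log(collectLinksNaive(aliased)) // no links found: the alias hides it
```

The second call returns an empty list even though the document contains a link, which is the failure mode described above.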

There are other issues with your flattened map. In my opinion it loses all the semantics of JSON-LD, for a few reasons:

  • You don't have full control over the keys you use in JSON-LD. Keys generally denote a type (with the exception of @container: @index). You can't name your links the way you want. If the objective is for those keys to be segments of the IPFS path, this is a problem.
  • What do you do with JSON arrays ([...])? How do you flatten them?
  • What about keys containing / characters, such as full URIs? This is explicitly possible: see example 2 at http://www.w3.org/TR/json-ld/#basic-concepts

I think we should choose. If we want to be JSON-LD compatible, we shouldn't impose too many restrictions, otherwise a JSON-LD author will find it difficult to describe a document at all. The other solution is to support JSON-LD in full, but that is not as easy as parsing JSON. We should recognize that.

Or else, we should find another way to be compatible, or define the level of compatibility we want.

But please don't call JSON-LD what is merely a subset of it.

@mildred
Contributor

mildred commented Sep 1, 2015

Perhaps we could require flattened JSON-LD when we want to store a document: http://www.w3.org/TR/json-ld/#flattened-document-form

From the spec:

This ensures a shape of the data and consequently may drastically simplify the code required to process JSON-LD in certain applications

This is great because we write the object once, but read it many times.

The issue is that the original document format is lost. Your document must be fully compatible with JSON-LD, and any incompatible parts would be lost.

@mildred
Contributor

mildred commented Sep 3, 2015

Snips from the channel:

mildred: We can interpret the IPFS dialect fine using JSON-LD. But IPFS won't be able to interpret arbitrary JSON-LD, for obvious performance reasons among other things

jbenet: JSON-LD is a strict subset of JSON, so by making a JSON transport, we can transparently address JSON-LD data

mildred: It will be able to hold any linked data provided it is compatible with our format

@jbenet
Contributor Author

jbenet commented Sep 3, 2015

More notes

@diasdavid and I resolved that the main feature we want/need from JSON-LD at the moment is just the @context aliasing, and that it's a great place to start with such a dialect.

All we need in practice to implement the aliasing is a module that provides an alias-expanding function based on the given contexts. The gist is this:

var expand = require('ipld-expand')

// a separate context that aliases "mlink"
var schema = {
  "@context": {
     "mlink": "/ipfs/<hash-of-mlink>/mlink"
  }
}

// an object that embeds the context directly
var jsonObjEmbedded = {
  "@context": [
      schema["@context"]
    ],
  "foo": {
    "mlink": "<hash>"
  }
}

console.log(expand(jsonObjEmbedded))
// {
//   "foo": {
//     "/ipfs/<hash-of-mlink>/mlink": "<hash>"
//   }
// }

// the context can also live outside the object and be referenced by path

// expand takes the object to expand, and a map of "url : contextObj". 
// (so expand doesnt retrieve anything, it's already retrieved.)
expand({ 
    "@context": "/ipfs/aaaaa/ipld",
    "foo": { "mlink": "<hash>" }
}, {
    "/ipfs/aaaaa/ipld": schema,
    "/ipfs/bbbbb/bbb": {...},
    "/ipfs/cccccc/ccc": {...}
})

// another option is to do it with an object that supports "get(url)". so
// it could be backed by a simple map (like here), or by http or ipfs proper
expand({ 
    "@context": "/ipfs/aaaaa/ipld",
    "foo": { "mlink": "<hash>" }
}, getContext)
// where getContext is a function that could make requests, like:
// getContext("/ipfs/aaaaa/ipld") -> schema
// this is obviously slow and brittle in the HTTP case, but may be
// fine + robust in the IPFS case.
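
The thread does not show how ipld-expand would work internally, so the following is only a minimal sketch of alias-only expansion under stated assumptions, not that module's implementation. The function name expandAliases is hypothetical. It accepts either a map of already-retrieved contexts or a lookup function, merges the referenced contexts into one alias table, and renames matching keys. It does nothing else a full JSON-LD processor would do, and it drops every @context key from the output.

```javascript
// Sketch of alias-only expansion. `contexts` is either a plain map of
// "path -> context document" or a function path -> context document.
// Nothing is fetched here; contexts are assumed to be already available.
function expandAliases(obj, contexts) {
  const lookup = typeof contexts === 'function'
    ? contexts
    : (path) => contexts[path]

  // Merge every referenced or embedded context into one alias table.
  const aliases = {}
  const refs = [].concat(obj['@context'] || [])
  for (const ref of refs) {
    const ctxDoc = typeof ref === 'string' ? lookup(ref) : ref
    if (!ctxDoc) throw new Error('unknown context: ' + ref)
    // Accept both { "@context": {...} } wrappers and bare alias tables.
    Object.assign(aliases, ctxDoc['@context'] || ctxDoc)
  }

  // Rename aliased keys at every level; leave all other keys untouched.
  function rename(node) {
    if (Array.isArray(node)) return node.map(rename)
    if (node === null || typeof node !== 'object') return node
    const out = {}
    for (const key of Object.keys(node)) {
      if (key === '@context') continue
      const target = typeof aliases[key] === 'string' ? aliases[key] : key
      out[target] = rename(node[key])
    }
    return out
  }
  return rename(obj)
}

const schema = { '@context': { mlink: '/ipfs/<hash-of-mlink>/mlink' } }
const doc = { '@context': '/ipfs/aaaaa/ipld', foo: { mlink: '<hash>' } }

// With a map of already-retrieved contexts:
console.log(expandAliases(doc, { '/ipfs/aaaaa/ipld': schema }))
// With a lookup function (here backed by the same map):
const getContext = (path) => ({ '/ipfs/aaaaa/ipld': schema })[path]
console.log(expandAliases(doc, getContext))
```

Both calls produce the same expanded object; a context path that cannot be looked up raises an error rather than silently leaving keys unexpanded.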

@mildred
Contributor

mildred commented Sep 3, 2015

I'll try to summarize what we said on IRC @jbenet ?

We start from the following representation:

{ README.md: {mlink: <ipfs hash>}, dir: { file: { mlink: <ipfs hash> } } }

The problem is that we have to know which keys to expand to a type URI (when we declare an LD context) and which keys are filenames and must not be expanded. JSON-LD can declare values using @container: @index, but that works for one level only, and the next level must contain keys that can be expanded.

We could imagine this structure in JSON-LD, but it has an extra level of indirection:

{
  "@context": {
    "links": { "@container": "@index" }
  },
  "links": {
    // this is an index container
    "mlink": {"mlink": "hash"}, // this is a file entry
    "README.md": {"mlink": "hash"},
    "some-dir": {
      // this is a node, not an index container
      "links": {
        "test.c": {"mlink": "hash"}
      }
    }
  }
}

If we don't want to have the level of indirection, we depart from JSON-LD (which is not a problem if we can convert back to JSON-LD). The other problem is that we have a collision problem between filenames and @... directives. The proposed solution is to escape the @ character in keys:

{ dir: { \@context.txt: { @context: <ipfs context>, mlink: <ipfs hash> } } }

When compacted it would lead to:

{ dir/\@context: { @context: <ipfs context>, mlink: <ipfs hash> } }

This is nice because this is almost a JSON-LD object already.

With directives stripped for JSON compatibility, it would transform to:

{ dir: { @context: { mlink: <ipfs hash> } } }

(@context is a file name)
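
The escaping and directive-stripping steps above can be sketched as follows. This is a minimal illustration, assuming a backslash placed before each @ is the escape (as in the examples) and that any key starting with an unescaped @ is a directive. The helper names escapeKey, unescapeKey and stripDirectives are hypothetical.

```javascript
// Escape a file name so it cannot be mistaken for an "@" directive.
// Assumption: a backslash before "@" marks a literal "@"; backslashes
// themselves are not escaped in this sketch.
function escapeKey(name) {
  return name.replace(/@/g, '\\@')
}

function unescapeKey(key) {
  return key.replace(/\\@/g, '@')
}

// Drop unescaped "@..." directives and unescape the remaining keys,
// producing plain JSON for consumers that know nothing about JSON-LD.
function stripDirectives(node) {
  if (Array.isArray(node)) return node.map(stripDirectives)
  if (node === null || typeof node !== 'object') return node
  const out = {}
  for (const key of Object.keys(node)) {
    if (key.startsWith('@')) continue // a directive, not a file name
    out[unescapeKey(key)] = stripDirectives(node[key])
  }
  return out
}

const stored = {
  dir: {
    [escapeKey('@context')]: {
      '@context': '<ipfs context>',
      mlink: '<ipfs hash>'
    }
  }
}

console.log(JSON.stringify(stripDirectives(stored)))
```

After stripping, the file named @context survives as an ordinary key while the real @context directive inside it is gone.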

Some extracts:

so you cannot do: { "@context": { "@container": "@index" }, "foo": { ... }, "bar": { ... } } ??

In that form: no. if you transmit the context out of band, probably yes
and foo/bar must contain LD objects, not maps. ie. the foo and bar objects cannot have arbitrary keys
foo and bar objects are the extra level of indirection.

about escaping:

btw, one nice thing is that a json object having {"foo": {"bar":1}, "foo/bar":2} is not "broken", i.e. it can still work in ipfs, because traversal is deterministic (i.e. lexicographically ordered), so "path/to/obj/foo/bar" -> 1. (whether "path/to/obj/foo/bar" -> 2 is a good idea is left for another day.)
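
One possible reading of that deterministic traversal, as a rough sketch: keys are tried in lexicographic order, so the shorter key "foo" is followed before the longer key "foo/bar". The function name resolve and the rule that the first key leading to a value wins are assumptions made for illustration; the thread leaves the exact rule open.

```javascript
// Resolve a slash-separated path against an object whose keys may
// themselves contain "/". Keys are tried in lexicographic order, so the
// shorter key "foo" is followed before the longer key "foo/bar".
// Assumption: the first key that leads to a value wins.
function resolve(node, path) {
  if (path === '') return node
  if (node === null || typeof node !== 'object') return undefined
  for (const key of Object.keys(node).sort()) {
    let rest
    if (path === key) rest = ''
    else if (path.startsWith(key + '/')) rest = path.slice(key.length + 1)
    else continue
    const found = resolve(node[key], rest)
    if (found !== undefined) return found
  }
  return undefined
}

const obj = { 'foo/bar': 2, foo: { bar: 1 } }
console.log(resolve(obj, 'foo/bar'))               // 1
console.log(resolve({ 'foo/bar': 2 }, 'foo/bar'))  // 2
```

Under this rule the literal "foo/bar" key is only reachable when no nested "foo" leads to a value, which is one way to keep the ambiguous object usable.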

@jbenet you're ok with this ?

For me the escaping is a very good solution; the only question left is about flattening, i.e. transforming {foo: {quux: {} }, bar: { baz: {} } } into {foo/quux: {}, bar/baz: {}}.
How do you know where to stop descending the hierarchy?
Does it stop when it finds a non-escaped @context (or other directive)?

We're very close to compatibility with JSON (via escaping) and JSON-LD (via directives and flattening operation).
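
As a sketch of one candidate answer to the stopping question, the flattening below descends until it meets an object that carries an unescaped @ directive (or an empty object or a non-object value) and keeps that object whole as the leaf. The function names flatten and isLeaf are hypothetical, and "/" as separator is an assumption. Note that an object without any directive, such as a bare {mlink: ...}, would be descended into, which is exactly why the stop rule matters.

```javascript
// Flatten nested maps into "a/b/c" keys. Descent stops at the first
// object that carries an unescaped "@" directive (or at any non-object
// value); that object is kept whole as the leaf.
// Assumptions: "/" is the separator and "\@" marks an escaped "@".
function flatten(node, prefix = '', out = {}) {
  for (const key of Object.keys(node)) {
    const value = node[key]
    const path = prefix === '' ? key : prefix + '/' + key
    if (isLeaf(value)) out[path] = value
    else flatten(value, path, out)
  }
  return out
}

function isLeaf(value) {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) return true
  const keys = Object.keys(value)
  return keys.length === 0 || keys.some((k) => k.startsWith('@'))
}

const nested = {
  dir: {
    '\\@context': { '@context': '<ipfs context>', mlink: '<ipfs hash>' }
  }
}

console.log(flatten({ foo: { quux: {} }, bar: { baz: {} } }))
console.log(flatten(nested))
```

The escaped file name is not treated as a directive, so the walk continues through dir and stops only at the object that holds the real @context.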

@jbenet
Contributor Author

jbenet commented Sep 4, 2015

thanks @mildred! all lgtm.

How do you know where to stop descending the hierarchy?
Does it stop when it finds a non-escaped @context (or other directive)?

good questions. not sure yet.

not sure whether we should store flattened by default-- user may want to preserve data structure. but we can flatten for our purposes.

@mildred
Contributor

mildred commented Sep 4, 2015

A fresh day, fresh ideas:

  • use @container: @multiindex to signal that we have a multi-level map
  • the map stops when we reach an object containing any @ directive. Or we could require each level to carry the @container directive to signal that the map extends over multiple levels.
  • perhaps we should state, either in the context or in the JSON itself, which separator is used for flattening
  • when converting to JSON-LD, we flatten the map and present it as JSON-LD using @container: @index instead
  • if we want to add attributes to the root node (the one containing the multiindex), we could perhaps add an @attrs directive containing them. We could also specify how to interpret the index appearing on top.

A unixfs entry as encoded by IPLD would appear like this:

{
  @context: <ipfs hash for root context>,
  @container: "@multiindex", // could be in context
  @attrs: {
    // These are the unixfs directory attributes
    type: "unixfsdir",
    mode: 0775,
    // This tells how the multiindex should be interpreted (could be in context)
    @index: "files"
  },
  "some-dir": {
    "file\@.txt": {
      // we already have a root context to specify the mlink
      // perhaps we want to include the context here to be more specific
      // (recognize this is always the same IPFS hash)
      @context: <context for mlink>,
      mlink: <ipfs hash of file entry>,
      // we can also extend JSON-LD to add the index key (some-dir/file@.txt)
      // to the LD data model as the filename attribute (possibly in context)
      @indexkey: "filename"
    }
  }
}

With this, IPFS makes some simple assumptions. Easy flattening algorithm. The meaning of the mlink keys is always the same because the context for the mlink is always the same hash. IPFS wouldn't need to interpret the context, just compare it with the known hash of mlink.
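
A minimal sketch of that last point, assuming links are recognized purely by comparing the node's @context against one well-known value: no context is parsed and no JSON-LD processor runs. The constant MLINK_CONTEXT and the helpers isLink and listLinks are hypothetical names, and the placeholder strings stand in for real hashes.

```javascript
// Placeholder for the well-known hash of the mlink context (hypothetical).
const MLINK_CONTEXT = '<context for mlink>'

// A node is a link if its @context is exactly the well-known context
// and it carries an mlink value. No JSON-LD processing is involved.
function isLink(node) {
  return node !== null && typeof node === 'object' &&
    node['@context'] === MLINK_CONTEXT &&
    typeof node.mlink === 'string'
}

// Walk the multi-level map and list every link with its path.
// Keys starting with "@" are directives and are not part of any path.
function listLinks(node, path = [], out = []) {
  if (node === null || typeof node !== 'object') return out
  if (isLink(node)) {
    out.push({ path: path.join('/'), hash: node.mlink })
    return out
  }
  for (const key of Object.keys(node)) {
    if (key.startsWith('@')) continue
    listLinks(node[key], path.concat(key), out)
  }
  return out
}

const entry = {
  '@context': '<ipfs hash for root context>',
  '@container': '@multiindex',
  'some-dir': {
    'file\\@.txt': { '@context': MLINK_CONTEXT, mlink: '<ipfs hash of file entry>' },
    'other.txt': { '@context': '<some other context>', mlink: '<hash>' }
  }
}

console.log(listLinks(entry))
```

Only the entry whose context matches the known value is reported; the one with a different context is ignored even though it has a key named mlink. Paths keep the escaped form of the file name in this sketch.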

We can also, after a translation step, feed this to any JSON-LD consumer, which would be quite happy to understand it in an LD context.

We can also strip the @ directives and give it, unescaped, to any JSON consumer.

@jbenet does this still correspond to what you want? How do you stop recursion: a @container directive at each level, or any @ directive on the leaf nodes? Also, I'm available (CEST tz) on IRC, just ping me.

@davidar davidar mentioned this issue Sep 10, 2015