Skip to content
This repository has been archived by the owner on Aug 9, 2018. It is now read-only.

Add NodeReader interface #19

Closed
wants to merge 15 commits into from
Closed

Add NodeReader interface #19

wants to merge 15 commits into from

Conversation

mildred
Copy link
Contributor

@mildred mildred commented Dec 27, 2015


See issue #17

Do not merge yet, tests are still needed, and I would like to use this interface in some places instead of the Node type. This will allow implementations that do not require buffering the whole node in memory.

@mildred mildred force-pushed the reader branch 2 times, most recently from 5bff9a5 to 102908c Compare December 31, 2015 17:06
@mildred
Copy link
Contributor Author

mildred commented Dec 31, 2015

I added tests for NodeReader, and I believe it is ready to be used. I have a few things that I would still like to improve. @jbenet, tell me what you think of it.

  • The Walk function will never work with NodeReader. The callback takes a current Node as argument. Is it desirable to design a similar function for NodeReader? If so, how do we avoid buffering data?
  • The Link type is just a Node in desguise. It has many methods that I don't think we need yet, especially considering the IPLD spec WIP: IPLD spec ipfs/specs#37. I'd also like to replace it with an interface (and have Node implements it).

Once this is done, I'd like to make sure that ipld implements the draft spec (and perhaps drop @ character escaping as this is not part of the spec). Make sure it works well, and integrate it to the github.com/ipfs/go-ipfs/merkledag package. Mostly, we'd read a NodeReader in one pass to build a merkledag.Node, dropping non necessary keys.

@mildred mildred force-pushed the reader branch 2 times, most recently from 49fc5bd to feaaa98 Compare January 15, 2016 09:39
@mildred
Copy link
Contributor Author

mildred commented Jan 15, 2016

Depends on:

It can't compile unless those pull requests are merged (especially the CBOR PR).

@jbenet
Copy link
Contributor

jbenet commented Jan 22, 2016

this is looking like a great start. Im not sure that the interface here should look like. i wont know well until i use it for something... maybe we can make some examples if we want to accelerate that.

@mildred
Copy link
Contributor Author

mildred commented Jan 22, 2016

I am not sure the interface should look like this either, but I thought I would take the closest interface I could have to the parsers, with as little overhead as possible. With the idea we could develop other interfaces more convenient for use if necessary.

@mildred mildred force-pushed the reader branch 2 times, most recently from 0a368f0 to a3cb0a5 Compare February 7, 2016 15:13
@mildred
Copy link
Contributor Author

mildred commented Feb 7, 2016

I am satisfied with the state of this PR. I think it is ready for review.

Here is how the code is organized:

  • what was in go-ipld/ is not in go-ipld/memory/ as this is a memory implementation of a Node
  • go-ipld/ is now a package that contain Node interfaces and primitives on these interfaces
  • go-ipld/coding/ manages encoding and decoding of a node using CBOR / JSON and Protocol Buffers (decoding only)

How you would want to use it:

  • Decoding a node takes a io.Reader as argument (io.ReadSeeker if you want to read it multiple times, but that is managed using interface upgrades) and returns a ipld.NodeReader.
  • A NodeReader can be read using the ipld.NodeIterator that transform the callback interface of ipld.NodeReader into an iterator
  • A ipld.NodeReader can be copied into a ipld.NodeWriter using ipld.Copy(). Currently, the only implementation for a NodeWriter is in the memory package
  • The coding package takes a memory.Node and can write it to CBOR & JSON. There is no support for writing directly from a NodeReader.

The walk functions are in the memory package. I think we need to define what use case we want to have and then implement them for NodeReader in general.

@mildred
Copy link
Contributor Author

mildred commented Feb 7, 2016

@jbenet could you look at that, or invite other people who might be interested in looking at this code?

@mildred
Copy link
Contributor Author

mildred commented Feb 7, 2016

For example of use, see https://github.com/mildred/go-ipfs/commits/ipld-wip

@mildred mildred force-pushed the reader branch 2 times, most recently from ee7b1d5 to 2d9d5bc Compare February 9, 2016 10:54
@mildred
Copy link
Contributor Author

mildred commented Feb 10, 2016

Requires whyrusleeping/cbor#4

@mildred
Copy link
Contributor Author

mildred commented Feb 10, 2016

Tests are failing for two reasons: the dependency on new features in the CBOR lib, and the dependency to JSON features that are in Go 1.5 but not 1.4 (json token reader)

@mildred mildred force-pushed the reader branch 2 times, most recently from 698a5d7 to fd396c6 Compare February 11, 2016 08:43
@mildred
Copy link
Contributor Author

mildred commented Feb 11, 2016

For Go 1.4, I vendored the encoding/json library that I took from the Go 1.5 release

@jbenet
Copy link
Contributor

jbenet commented Feb 24, 2016

@mildred lmk when you want review here

@mildred
Copy link
Contributor Author

mildred commented Mar 11, 2016

The interface above was almost impossible to implement. I found a better interface that works (I got a first implementation):

// Represents a node object that is iterable.
// It iterates over key value pairs. When created, the NodeIterator does not
// have a current item and iteration must be started using Next()
type NodeIterator interface {
    // Go to next item. Sub items iteration is aborted.
    // Returns true if there is an item
    // Returns false at the end of the iteration
    Next() bool

    // Iterate the current item value
    // Return nil if the iteration hasn't started, is finished or if the current
    // item is not iterable
    Iterate() NodeIterator

    // Abort iteration. Does nothing if the iteration is finished. Also aborts
    // child iterators that were returned by the Iterate() method.
    Abort()

    // Return the current item key, or nil if the iteration hasn't started or is
    // finished
    Key() interface{}

    // Return the current item value or nil if the iteration hasn't started, is
    // finished, or if the current item is iterable. In that case the Iterate()
    // method should be called instead.
    Value() interface{}
}

@mildred mildred force-pushed the reader branch 3 times, most recently from e79e62c to ded931c Compare March 11, 2016 19:59
@mildred
Copy link
Contributor Author

mildred commented Mar 11, 2016

@jbenet @davidar, please review these commits. Don't hesitate to look at the code commit by commit. The commits are mostly independent and I rebased the whole thing so each commit stands on its own.

Some features from last time are not included (they are still in the reader-extra branch on my repo). There is only the code features needed.

The basic interface in in go-ipld/node.go, it is NodeIterator and is easy to work with. bytes can be decoded into it with zero copy and zero buffering. Encoding it back requires to store the whole thing into the memory for a few reasons. Overall, this is not a problem because we will be reading nodes much more often than writing them.

I'm still not using the Multicodec type because it is not stream oriented, and also because I think it add too much overhead for something really simple (take a reader, spit out an empty interface). No need for 3 kind of objects per multicodec layer. @jbenet, if you want to make use of the Multicodec, we'll have to figure out how to do streaming with it.

Also, implementing multicodecs without the Multicodec type (the multicodec package is still used to read / write headers) is dead simple. Encoding is just a Write(Header) and decoding is just a ReadHeader() followed by a switch statement (or equivalent)

@mildred
Copy link
Contributor Author

mildred commented Mar 11, 2016

(some tests are failing, this is a data race, I'll fix it, but that should not impact the review very much)

@mildred mildred force-pushed the reader branch 3 times, most recently from 317620b to f6617df Compare March 17, 2016 12:26
@davidar
Copy link

davidar commented Mar 26, 2016

@mildred I can't comment on golang specifics other than "oh god so much if err != nil --- where are my monads?!" :p but it seems a lot more focused than before, so LGTM :)

@jbenet
Copy link
Contributor

jbenet commented May 22, 2016

I still havent reviewed the code sorry.

Up front, i think the iterator route is great! much, much nicer to do that.

I would go for:

type NodeIterator interface {
  Next() bool // in streaming implementations, automatically Skips all children iterators
  Children() NodeIterator  // instead of Iterate
  Skip() // instead of abort.

  Key() interface{}
  Value() interface{}
}

@jbenet
Copy link
Contributor

jbenet commented May 22, 2016

  • why is the Key() an interface{}? and not a string?

@jbenet
Copy link
Contributor

jbenet commented May 22, 2016

With code like:

iter := iterateOverReader(reader)
printLinks(iter)


def printLinks(iter NodeIterator) {
  for iter.Next() {
    if c.Key() == "/" {
      fmt.Println("found link to", c.Value())
    }

    printLinks(iter.Children())
  }
}

@mildred
Copy link
Contributor Author

mildred commented May 27, 2016

On Sat, 21 May 2016 22:26:51 -0700
Juan Benet notifications@github.com wrote:

  • why is the Key() an interface{}? and not a string?

The key can be an integer (for arrays) or a string (for maps/objects).

Perhaps that could be separated in two functions (Key() and Index()). That should be easy, but what should Key() return on objects?

Mildred Ki'Lya mildred@mildred.fr

@mildred mildred mentioned this pull request May 27, 2016
mildred added 4 commits June 16, 2016 16:35
- add protocol buffer reader (and use the /mdagv1 multicodec prefix)
- regenerate protocol buffer code
- make the codec package stream oriented
- remove use of the Multicodec type as it is buffer oriented
- general cleanup
- Decode to the NodeReader interface
- Add writeable support for JSON and CBOR from memory
- Implement CBOR link tag for reading and writing
- Force JSON & CBOR to be constructed from io.ReadSeeker
- Ensure only one NodeReader is active at a single time

NodeReader: test Skip and Abort for all readers (and fix it)
Need Go 1.5 for its new JSON features. Could be backported to Go 1.4 but IPFS is
already limiting to Go 1.5
@mildred
Copy link
Contributor Author

mildred commented Jun 16, 2016

Until now I didn't see you wanted renames in the NodeInterface. I performed these.

@jbenet, could you look into this PR, or tell me what I can do to help the review (splitting, ...). Every commit is understandable by itself if that can help. I take care to rebase my changes every time.

Also, this PR is not up to date to the latest spec. In particular the link key is not yet "/" and the code still supports having attributes along with the link itself. I'd like to have it merged before updating these. These modifications are also pending on ipfs/specs#111. I'd like to implement it using proper EJSON.

@yusefnapora
Copy link

yusefnapora commented Aug 24, 2016

Hi @mildred, I've been playing with this branch a bit to try to see how I can help with go-ipld, since we're planning to use it in the mediachain project. I'm still learning my way around go, so I'll probably miss some obvious things :)

I noticed that links aren't getting encoded to cbor tags, which seems to be because there's two LinkKeys defined. The memory nodes use "mlink" as the LinkKey (defined in memory/node.go), but the cbor encoder's encodeFilter is checking for ipld.LinkKey, which is defined as "@link". I'm guessing those are both eventually going to be changed to / to fit the new spec? At the moment the result is links get encoded to cbor as maps with "mlink" keys instead of CBORTags. That matches what's in cbor.testfile, so the tests all pass.

Also the Makefile in the coding dir references a now-missing file bin/convert.go, so I can't easily regenerate the test cases. Not a big deal; as I was mucking about I generated some manual test cases without too much fuss by just spitting out some cbor and prepending the multicodec header from the existing case.

Overall this seems great, although I don't really know enough go for that opinion to have much weight 😄 I really like the NodeIterator interface though; it seems like it provides a lot of flexibility at the application level.

Anyway, if there's anything I can help you with, please let me know.

@daviddias daviddias closed this Aug 8, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants