DEX #57

daviddias · 2016-01-02T19:00:40Z

@whyrusleeping let's fill the implementations chapter with pointers to where the current chunker and layout are implemented, and align on what the interfaces should be for these importers.

whyrusleeping · 2016-01-03T13:34:41Z

For importers, we have the default chunker, which splits the input stream into 256k blocks, and one that uses Rabin fingerprints for chunking. The interface for our chunker looks like:

type Chunker interface {
    NextBytes() ([]byte, error)
}

NextBytes() gets called on the chunker until no bytes remain, at which point it returns an EOF sentinel error (an implementation detail).
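To make the calling convention concrete, here is a minimal sketch of a fixed-size chunker behind that interface. It is not the go-ipfs implementation: `sizeChunker`, `NewSizeChunker`, `ChunkAll`, and `ChunkBytes` are hypothetical names, and it assumes the sentinel is `io.EOF` and that the block size is greater than zero.

```go
package main

import (
	"bytes"
	"io"
)

// Chunker mirrors the interface quoted above.
type Chunker interface {
	NextBytes() ([]byte, error)
}

// sizeChunker is a hypothetical fixed-size chunker (size must be > 0).
type sizeChunker struct {
	r    io.Reader
	size int
}

// NewSizeChunker wraps r in a chunker that emits blocks of at most size bytes.
func NewSizeChunker(r io.Reader, size int) Chunker {
	return &sizeChunker{r: r, size: size}
}

// NextBytes returns the next block, or io.EOF once the input is exhausted.
func (c *sizeChunker) NextBytes() ([]byte, error) {
	buf := make([]byte, c.size)
	n, err := io.ReadFull(c.r, buf)
	switch err {
	case nil:
		return buf, nil
	case io.ErrUnexpectedEOF:
		// Short final block: return what was read; the next call yields io.EOF.
		return buf[:n], nil
	default:
		// io.EOF (nothing left to read) or a genuine read error.
		return nil, err
	}
}

// ChunkAll calls NextBytes until the EOF sentinel and collects the blocks.
func ChunkAll(c Chunker) ([][]byte, error) {
	var blocks [][]byte
	for {
		b, err := c.NextBytes()
		if err == io.EOF {
			return blocks, nil
		}
		if err != nil {
			return nil, err
		}
		blocks = append(blocks, b)
	}
}

// ChunkBytes is a convenience wrapper for in-memory input.
func ChunkBytes(data []byte, size int) ([][]byte, error) {
	return ChunkAll(NewSizeChunker(bytes.NewReader(data), size))
}
```

A caller would use `ChunkBytes(data, 256*1024)` for the default block size mentioned above; a Rabin chunker would satisfy the same interface and only change where the block boundaries fall.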

The second part of our importing code is the layout engine; we have two of those as well. The default is the balanced tree, whose width at each layer is 256k/sizeof(link).

The general algorithm for building the balanced tree is this:

If only one block exists, it is its own depth=1 tree. From depth=1 trees, we can generate depth=2 trees by generating up to MAXWIDTH depth=1 trees and adding them as children of a new node. Using the same recursive logic, we can generate a tree of any depth dynamically as more and more data comes in from the chunker (eliminating the need to know the data size beforehand to select a depth). Note that data is ONLY stored in the leaf nodes: storing data in intermediate nodes is complicated and fragile, and not conducive to effective deduplication (although some significant latency gains might be made by doing so).
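The recursive construction can be sketched roughly as follows. This is an illustration under assumptions, not the go-ipfs code: `Node`, `blockSource`, `fill`, `BalancedLayout`, `Leaves`, and `Depth` are hypothetical names, the blocks are passed in as a slice rather than pulled from a chunker, and `maxWidth` (standing in for MAXWIDTH) is assumed to be at least 2.

```go
package main

// Node is a hypothetical DAG node: leaves carry Data, inner nodes only Children.
type Node struct {
	Data     []byte
	Children []*Node
}

// blockSource hands out blocks one at a time, like a drained chunker would.
type blockSource struct {
	blocks [][]byte
	pos    int
}

func (s *blockSource) done() bool { return s.pos >= len(s.blocks) }

func (s *blockSource) next() []byte {
	b := s.blocks[s.pos]
	s.pos++
	return b
}

// fill builds one tree of the given depth; it is only called when data remains.
func fill(s *blockSource, depth, maxWidth int) *Node {
	if depth == 1 {
		return &Node{Data: s.next()} // a single block is its own depth=1 tree
	}
	n := &Node{}
	for len(n.Children) < maxWidth && !s.done() {
		n.Children = append(n.Children, fill(s, depth-1, maxWidth))
	}
	return n
}

// BalancedLayout grows the tree one level at a time while data keeps arriving,
// so the total size never needs to be known up front. Assumes maxWidth >= 2.
func BalancedLayout(blocks [][]byte, maxWidth int) *Node {
	s := &blockSource{blocks: blocks}
	if s.done() {
		return &Node{}
	}
	root := fill(s, 1, maxWidth)
	for depth := 2; !s.done(); depth++ {
		// The old root becomes the first child of a new, deeper root.
		newRoot := &Node{Children: []*Node{root}}
		for len(newRoot.Children) < maxWidth && !s.done() {
			newRoot.Children = append(newRoot.Children, fill(s, depth-1, maxWidth))
		}
		root = newRoot
	}
	return root
}

// Leaves returns the leaf data in left-to-right order.
func Leaves(n *Node) [][]byte {
	if len(n.Children) == 0 {
		if n.Data == nil {
			return nil
		}
		return [][]byte{n.Data}
	}
	var out [][]byte
	for _, c := range n.Children {
		out = append(out, Leaves(c)...)
	}
	return out
}

// Depth counts levels, with a lone leaf at depth 1.
func Depth(n *Node) int {
	deepest := 0
	for _, c := range n.Children {
		if d := Depth(c); d > deepest {
			deepest = d
		}
	}
	return deepest + 1
}
```

In this sketch the rightmost branch may be only partially filled when the input ends; how a real layout engine trims or collapses such branches is not covered here.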

The second layout algorithm is called the trickledag, and is something I wrote to provide a data format better suited to sequential streaming of content. The basic idea is that a depth N tree has X leaf nodes as children, and Y trees of each depth up to N-1. It's essentially an expanded binomial heap construction. The advantage is that at each point in traversing the tree (at least sequentially) you can make a single request and get real data. That code is here: https://github.com/ipfs/go-ipfs/blob/master/importer/trickle/trickledag.go

daviddias · 2016-12-06T16:56:35Z

For context and reference material

Trickledag

The TrickleDAG layout is a Binomial Heap with Dynamic Width and multiple repetitions at each layer, defined by a constant (so far it is 4 and has never changed).
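As a rough illustration of that shape, the sketch below builds a trickle-style tree from a slice of blocks. It is an approximation under assumptions rather than the code in trickledag.go: `Node`, `blockSource`, `fillLeaves`, `fillTrickle`, `TrickleLayout`, and `Leaves` are hypothetical names, `maxLeaves` (the X in the description above) is assumed to be at least 1, and `layerRepeat` is the repetition constant of 4.

```go
package main

// layerRepeat is the number of subtrees added per depth (the constant 4 above).
const layerRepeat = 4

// Node is a hypothetical DAG node: leaves carry Data, inner nodes only Children.
type Node struct {
	Data     []byte
	Children []*Node
}

// blockSource hands out blocks one at a time, like a drained chunker would.
type blockSource struct {
	blocks [][]byte
	pos    int
}

func (s *blockSource) done() bool { return s.pos >= len(s.blocks) }

func (s *blockSource) next() []byte {
	b := s.blocks[s.pos]
	s.pos++
	return b
}

// fillLeaves attaches up to maxLeaves data leaves directly under n.
func fillLeaves(s *blockSource, n *Node, maxLeaves int) {
	for i := 0; i < maxLeaves && !s.done(); i++ {
		n.Children = append(n.Children, &Node{Data: s.next()})
	}
}

// fillTrickle makes n a depth-N tree: leaves first, then layerRepeat
// subtrees of each depth from 1 up to N-1.
func fillTrickle(s *blockSource, n *Node, depth, maxLeaves int) {
	fillLeaves(s, n, maxLeaves)
	for d := 1; d < depth && !s.done(); d++ {
		for r := 0; r < layerRepeat && !s.done(); r++ {
			child := &Node{}
			fillTrickle(s, child, d, maxLeaves)
			n.Children = append(n.Children, child)
		}
	}
}

// TrickleLayout keeps adding deeper subtrees to the root until the data runs
// out, so early data sits close to the root. Assumes maxLeaves >= 1.
func TrickleLayout(blocks [][]byte, maxLeaves int) *Node {
	s := &blockSource{blocks: blocks}
	root := &Node{}
	fillLeaves(s, root, maxLeaves)
	for depth := 1; !s.done(); depth++ {
		for r := 0; r < layerRepeat && !s.done(); r++ {
			child := &Node{}
			fillTrickle(s, child, depth, maxLeaves)
			root.Children = append(root.Children, child)
		}
	}
	return root
}

// Leaves walks the tree in pre-order and returns leaf data in stream order.
func Leaves(n *Node) [][]byte {
	if len(n.Children) == 0 {
		if n.Data == nil {
			return nil
		}
		return [][]byte{n.Data}
	}
	var out [][]byte
	for _, c := range n.Children {
		out = append(out, Leaves(c)...)
	}
	return out
}
```

Because every node in this sketch starts with its own data leaves, a sequential reader that fetches any node gets real data in the same request, which is the streaming property described above.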

The trickledag was added to go-ipfs in ipfs/kubo#713

initial commit for data importing spec

40c297e

jbenet added the in progress label Jan 2, 2016

add intro and requirements

464745c

This was referenced Jan 5, 2016

Sprint Dec 21 ipfs/team-mgmt#76

Closed

Sprint January 5th ipfs/team-mgmt#77

Closed

Add Group add ipfs-inactive/http-api-spec#17

Merged

Sprint January 11th ipfs/team-mgmt#79

Closed

This was referenced Jan 14, 2016

Implement datastore ipfs/js-ipfs-repo#13

Closed

Sprint January 19th ipfs/team-mgmt#83

Closed

daviddias mentioned this pull request Jan 26, 2016

🚣 captain.log - IPFS JavaScript implementation 🌟 ipfs/js-ipfs#30

Closed

daviddias removed the in progress label Mar 14, 2016

daviddias mentioned this pull request May 9, 2016

IPLD Data Importing - Set of Importers ipfs/js-ipfs#41

Closed

daviddias changed the title ~~WIP: Data Importing Spec~~ DEX Feb 13, 2017

rename to DEX (for now) and point to all of the discussions

b210bca

daviddias merged commit 827769e into master Feb 13, 2017

daviddias deleted the ipfs/data-importing branch February 13, 2017 16:25
