-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec for adder APIs (both low-level and high-level) #1291
Comments
Thinking this over some more, we probably want a pre-chunk/trickle callback as well as the current post-chunk/trickle
I considered sticking with |
Hrm, it seems the comment i thought i left didnt make it. a few things, the functionality you are referring to as 'trickle' is the layout. the trickledag is a specific layout implementation that lays out blocks in a pattern reminiscent of a recursive binomial heap. I'm also very curious what the reason for taking an |
On Wed, May 27, 2015 at 10:44:32AM -0700, Jeromy Johnson wrote:
Ok, so trickle → layout.
As I tried to explain in the comment for NodeCallback: … you can use it to extract metadata about the visited file Can you suggest more obvious wording? |
shell/unixfsshell/unixfs.NodeCallback
What might be some example calls that use shell/unixfs.Add/AddFromReader
sync stuff
Maybe the NodeCallback can do the channel stuff: type nodeElem struct {
node *dag.Node
err error
}
sendchan := make(chan nodeElem)
recvchan := make(chan nodeElem)
unixfs.Add(ctx, n, r, func(nIn *dag.Node, p *path.Path, f *os.File, top bool) (nOut *dag.Node, err error) {
sendchan <- nodeElem{n}
nOutElem <- recvchan
return nOutElem.node, nOutElem.err
})
SGTM! 👍
There's an important observation to make about the Shell, Core, and other APIs. Whatever we do, we need:
So far, we've been calling both 1. and 3. "Shell". We should make a distinction, and also account for how some of this will be implemented, given that the complexity of implementing 3. may impact the implementation of 1. and 2. Meaning, that however we decide to do 3., may make us move from: // pass in a node
func Add(n *Ipfs.Node, ...) ... to // methods on the node
type Node interface {
Add(...) ...
} I prefer the way above, because it allows more commands to be added on top of a very basic thing (the unixfs.Add(ctx, n, r, ...) // this is going to happen over RPC. Trickler
I think we should call this "indexing". "TrickleDag" is the one specific implementation @whyrusleeping came up with. "indexing" is more general and describes various ways to index data/objects. this lends itself well to talking about databases, and database internal datastructures for better access to data at the leaves. chunking
We need a chunking interface. I started on this a few weeks ago. Take a look at this: https://github.com/jbenet/go-chunk it's very WIP. I'd like to do this independent of IPFS as it could be useful to other projects, and other languages. (e.g. implement rabin fingerprint chunking everywhere).
Oh layout works too! (see comment about indexing above.). Also, this is actually a graph topology. ("the way in which constituent parts are interrelated or arranged.")
I think this is special casing. this is not 90% of cases. 90% of cases just want an io.Reader. (see my comments in this post above). We should have the Possible that all this could be fed to the callback anyway though... : f := os.Open(...)
// suppose Add takes an io.Reader
unixfs.Add(ctx, n, r, func(nIn *dag.Node, r io.Reader, top bool) (nOut *dag.Node, err error) {
// have f!
})
Ah, ok. i see what's up. So-- I think that an We could have a "dealing with os.Files" set of calls that do the magic of dealing with directories and files and so on, but that's very much filesystem specific porcelain on top of the simple idea of "take a stream of data, chunk it, and add it to IPFS". i think: // basic unixfs add. uses io. idiomatic. no dirty files.
Add(core.Node, io.Reader)
// fs stuff
AddFile(core.Node, *os.File, ...)
AddDir(core.Node, *os.File, ...)
AddPath(core.Node, path.Path, ...)
// or maybe they're really "import" calls
ImportFile(core.Node, *os.File, ...)
ImportDir(core.Node, *os.File, ...)
ImportPath(core.Node, path.Path, ...)
// or maybe there's "importers"
i := Importer(chunk.Rabin, topology.Trickle)
AddPath(n, s, i) not sure. @whyrusleeping ? |
On Mon, Jun 01, 2015 at 07:29:16PM -0700, Juan Batiz-Benet wrote:
For AddFromReader, NodeCallback's f will be nil. The docs for ‘f’ say: This will be nil for io.Reader-based adders. If you have clearer wording in mind, just let me know what to say.
Ok. The uses I see for it at the moment are pinning (like #1274), Along this vein, I think it's probably worth passing an array of
But AddFile is also about directories, and while those are os.File
That doesn't look like it's using NodeCallback for the channel stuff.
I think you mean 2 and 3. And having command-line and HTTP APIs (3)
I don't mind either way, but Ipfs.Nodes are complicated things, so I'd
I think lots of stuff in go-ipfs is independent of IPFS and I'd like
Again, I'm having difficulty understanding what you intend here. Are
Sure, my Add wrapper is going to be calling AddReader internally (or
I think we agree here, and just have different criteria for the
Why make a distinction between AddFile and AddDir? They have the fileInfo, err := file.Stat() inside your implementation.
I don't like this, because:
I don't see the point to grouping the chunker and layout together. |
On Wed, May 27, 2015 at 10:44:32AM -0700, Jeromy Johnson wrote [1]: > ... the functionality you are referring to as 'trickle' is the > layout. the trickledag is a specific layout implementation that lays > out blocks in a pattern reminiscent of a recursive binomial heap. [1]: #1291 (comment)
On Mon, Jun 01, 2015 at 07:29:16PM -0700, Juan Batiz-Benet wrote [1]: > - name the parameters full words. single char var names in the doc > are a bit hard to read. The only parameter that's not a full word is 'ctx', since that seems to be the conventional name for the Context parameter. [1]: #1291 (comment)
To keep track of the spec's evolution and make it easier to make
inline comments, I've spun this off into an orphan branch with #1321.
|
i think this is way overkill. one can chain the calls with one function themselves. you might even provide a utility function that turns N callbacks into one which applies them sequentially, but this function should just accept one callback.
use
just meaning to show how you can feed a callback into .Add that happens to use channels to do all the communication. (i.e. you dont need something chan specific / special)
ah right, yes, 2 and 3.
not sure what space to put, but these feel like different things.
yeah. ok sounds good.
amen. seriously. we're close-- need our vendor tool.
Yes, I completely agree. This monolithic repo is hard to manage and less productive overall. our stuff becomes hard to reuse, etc. We'll break it up, just need our ipfs-based packaging tool.
yeah exactly, and we can have intense per-module tests and run tests more smartly.
i dont mean a global. it's not func-scoped for simplicity of writing out the example (like the stdlib examples). and-- what the example expresses is using closure scopes instead of adding a ton of parameters into the function call for everyone. closure scopes allows us to pass params into the callback without needing to register them directly with intermediate calls that dont care about that.
ok sounds goood. alright, let's use |
This reverts commit d617dfb. On Wed, Jun 03, 2015 at 01:54:03AM -0700, Juan Batiz-Benet wrote [1]: > > it's probably worth passing an array of NodeCallback (and > > PreNodeCallback) references to the adders > > i think this is way overkill. one can chain the calls with one > function themselves. you might even provide a utility function that > turns N callbacks into one which applies them sequentially, but this > function should just accept one callback. That's a much cleaner approach, so we're going back to the non-slice callbacks. [1]: #1291 (comment)
Juan argued for this rename because he expects the Reader-based case will be more common in a highly-networked world [1]. I'd rather have kept the old naming, because I expect some folks will think AddFile is only about leaf files (when it also handles directories and recursion via an *os.File file descriptor) [2]. But Juan still felt that the Reader-case deserved top billing [3], so go with his suggested naming. [1]: #1291 (comment) [2]: #1291 (comment) [3]: #1291 (comment)
On Wed, Jun 03, 2015 at 01:54:03AM -0700, Juan Batiz-Benet wrote:
Done in e6b7dea.
Done in 820b3d8.
Do you think we need an explicit example of that in the spec docs? Or
Ah, that makes sense. |
I think an example should exist somewhere. maybe we can use a godoc example for it? but dont care much either way. |
@wking sorry maybe i missed it-- the motivation for |
On Wed, Jun 03, 2015 at 02:44:46PM -0700, Juan Batiz-Benet wrote:
Motivation for pre-node callback in 1 and 999cda9 (Add a Motivation for the post-node callback is discussed in [2](starting |
On Wed, Jun 03, 2015 at 02:44:00PM -0700, Juan Batiz-Benet wrote:
Ah, interesting. The connection with godoc is documented in 1. 'm |
yeah, likely not worth spending the time mocking the node out |
A clear picture of the intended adder API is blocking some of the later commits in #1136. This issue floats my first pass at an adder API that I think strikes a good balance between power and simplicity (mostly by offloading the complicated bits to a per-DAG-node callback).
Where it lives
I'd suggest moving
core/coreunix
tocore/unixfs
to matchunixfs/
andshell/unixfs/
.High-level UI (shell/unixfs/)
Most additions will be recursive and load data from a *File
(which can be a directory or a file). Alternatively, the
*FromReader
variants accept a [Reader][2].We need a way to get information about progress of a running addition
back to other goroutines. Choices for this include the channel
messages proposed in #1121 or additional arguments to a
per-chunk callback like that proposed in #1274. The main
difference between a callback and a channel is whether or not we want
synchronous collaboration between the adder and the hook. Since I
think we want the option for synchronous collaboration
(e.g. optionally inserting a metadata node on top of each file node).
For situations where asynchronous communication makes more sense, the
user can provide a synchronous callback that just pushes a message to
a channel (so the callback-based API supports the channel-based
workflow).
Actions like wrapping an added file in another Merkle object to hold a
filename is left to the caller and the callback API.
Low-level UI (core/unixfs/)
These should look just like the high-level API, except instead of
passing in an IpfsNode and using that node's default DAG service,
trickler, and splitter, we pass each of those in explicitly:
We don't currently have a public
Trickler
interface, but I think we shouldadd one so folks can easily plug in alternative trickler implementations.
I'm not familiar enough with Go at the moment to know which arguments
are best passed by reference and which should be passed by value. If
I was writing this in C, everything except the Boolean ‘top’ would be
passed by reference, but I'll have to read around to figure out what's
idiomatic in Go.
The text was updated successfully, but these errors were encountered: