A more Featureful Approach to Storage APIs #265
@@ -0,0 +1,41 @@
package linking

import (
    "io"

    "github.com/ipld/go-ipld-prime/datamodel"
    "github.com/ipld/go-ipld-prime/storage"
)

// SetReadStorage configures how the LinkSystem will look for information to load,
// setting it to look at the given storage.ReadableStorage.
//
// This will overwrite the LinkSystem.StorageReadOpener field.
//
// This mechanism only supports setting exactly one ReadableStorage.
// If you would like to make a more complex configuration
// (for example, perhaps using information from a LinkContext to decide which storage area to use?)
// then you should set LinkSystem.StorageReadOpener to a custom callback of your own creation instead.
func (lsys *LinkSystem) SetReadStorage(store storage.ReadableStorage) {
    lsys.StorageReadOpener = func(lctx LinkContext, lnk datamodel.Link) (io.Reader, error) {
        return storage.GetStream(lctx.Ctx, store, lnk.Binary())
    }
}

// SetWriteStorage configures how the LinkSystem will store information,
// setting it to write into the given storage.WritableStorage.
//
// This will overwrite the LinkSystem.StorageWriteOpener field.
//
// This mechanism only supports setting exactly one WritableStorage.
// If you would like to make a more complex configuration
// (for example, perhaps using information from a LinkContext to decide which storage area to use?)
// then you should set LinkSystem.StorageWriteOpener to a custom callback of your own creation instead.
func (lsys *LinkSystem) SetWriteStorage(store storage.WritableStorage) {
    lsys.StorageWriteOpener = func(lctx LinkContext) (io.Writer, BlockWriteCommitter, error) {
        wr, wrcommit, err := storage.PutStream(lctx.Ctx, store)
        return wr, func(lnk datamodel.Link) error {
            return wrcommit(lnk.Binary())
        }, err
    }
}
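For orientation, here is a rough sketch of how these two setters might be used end to end. The `memStore` type below is hypothetical (just a map-backed stand-in for any ReadableStorage/WritableStorage implementation), and the import paths for `basicnode` and the codec registration may vary between go-ipld-prime versions; treat it as an illustration rather than as code from this PR.

```go
package main

import (
    "context"
    "fmt"

    cid "github.com/ipfs/go-cid"
    _ "github.com/ipld/go-ipld-prime/codec/dagcbor" // registers the dag-cbor codec used below
    "github.com/ipld/go-ipld-prime/linking"
    cidlink "github.com/ipld/go-ipld-prime/linking/cid"
    "github.com/ipld/go-ipld-prime/node/basicnode"
)

// memStore is a hypothetical, minimal store used only for illustration.
// It satisfies storage.ReadableStorage and storage.WritableStorage simply by
// having Get and Put methods with the right signatures.
type memStore struct{ bag map[string][]byte }

func (s *memStore) Get(ctx context.Context, key string) ([]byte, error) {
    data, ok := s.bag[key]
    if !ok {
        return nil, fmt.Errorf("key not found")
    }
    return data, nil
}

func (s *memStore) Put(ctx context.Context, key string, content []byte) error {
    if s.bag == nil {
        s.bag = make(map[string][]byte)
    }
    s.bag[key] = append([]byte(nil), content...) // keep a defensive copy
    return nil
}

func main() {
    store := &memStore{}

    lsys := cidlink.DefaultLinkSystem()
    lsys.SetReadStorage(store)
    lsys.SetWriteStorage(store)

    // How links will be constructed: CIDv1, dag-cbor, sha2-512.
    lp := cidlink.LinkPrototype{Prefix: cid.Prefix{
        Version:  1,
        Codec:    0x71, // dag-cbor
        MhType:   0x13, // sha2-512
        MhLength: 64,
    }}

    n := basicnode.NewString("hello, storage")
    lnk, err := lsys.Store(linking.LinkContext{}, lp, n)
    if err != nil {
        panic(err)
    }
    fmt.Println("stored as:", lnk)

    n2, err := lsys.Load(linking.LinkContext{}, lnk, basicnode.Prototype.Any)
    if err != nil {
        panic(err)
    }
    s, _ := n2.AsString()
    fmt.Println("loaded back:", s)
}
```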
@@ -0,0 +1,103 @@
package storage

import (
    "context"
    "io"
)

// --- basics --->

type ReadableStorage interface {
    Get(ctx context.Context, key string) ([]byte, error)
}

Review comments on this line:

What's the precedent for using `string` keys here? If the point is to be more like go-datastore, perhaps that's enough precedent. But I would hope that we would make the interface be what is nicest for Go and as a generic abstraction layer, not just for IPFS in particular.

Ultimately I guess my strongest argument is that I'm afraid our applications as a whole will have more heap allocations if we design to `[]byte`. (I realize it is also now true that when golang sees a `[]byte`-to-`string` conversion in a place like a map lookup, it can avoid the copy.) More broadly: I do see it as perhaps a bit vanguard, but otherwise unsurprising, to see a binary API taking `string` keys. I'm not sure how strongly held my belief is on this. Maybe I should just hush about that one heap escape.

Key type discussion aside, I think we should really attempt to get away from any sort of interface that straight up returns an array of bytes.

type WritableStorage interface {
    Put(ctx context.Context, key string, content []byte) error
}

// --- streaming --->

type StreamingReadableStorage interface {
    // Note that the returned io.Reader may also be an io.ReadCloser -- check for this.
    GetStream(ctx context.Context, key string) (io.Reader, error)
}

// StreamingWritableStorage is a feature-detection interface that advertises support for streaming writes.
// It is normal for APIs to use WritableStorage in their exported API surface,
// and then internally check whether that value implements StreamingWritableStorage if they wish to use streaming operations.
//
// Streaming writes can be preferable to the all-in-one write style of WritableStorage.Put,
// because with streaming writes, the high water mark for memory usage can be kept lower.
// On the other hand, streaming writes can incur slightly higher allocation counts,
// which may cause some performance overhead when handling many small writes in sequence.
//
// The PutStream function returns three parameters: an io.Writer (as you'd expect), another function, and an error.
// The function returned is called a "WriteCommitter".
// The final error value is as usual: it will contain an error value if the write could not be begun.
// ("WriteCommitter" will be referred to as such throughout the docs, but we don't give it a named type --
// unfortunately, this is important, because we don't want to force implementers of storage systems to import this package just for a type name.)
//
// The WriteCommitter function should be called when you're done writing,
// at which time you give it the key you want to commit the data as.
// It will close and flush any streams, and commit the data to its final location under this key.
// (If the io.Writer is also an io.WriteCloser, it is not necessary to call Close on it,
// because using the WriteCommitter will do this for you.)
//
// Because these storage APIs are meant to work well for content-addressed systems,
// the key argument is not provided at the start of the write -- it's provided at the end.
// (This gives the opportunity to compute a hash of the contents as they're written to the stream.)
//
// As a special case, giving a key of the zero string to the WriteCommitter will
// instead close and remove any temp files, and store nothing.
// An error may still be returned from the WriteCommitter if there is an error cleaning up
// any temporary storage buffers that were created.
//
// Continuing to write to the io.Writer after calling the WriteCommitter function will result in errors.
// Calling the WriteCommitter function more than once will result in errors.
type StreamingWritableStorage interface {
    PutStream(ctx context.Context) (io.Writer, func(key string) error, error)
}
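Since the WriteCommitter contract above is spelled out only in prose, here is a short, hedged sketch of what a caller's use of PutStream could look like. The `writeBlob` helper and its sha256-derived key are purely illustrative; the storage interfaces themselves do not prescribe any key scheme.

```go
package example

import (
    "context"
    "crypto/sha256"
    "io"

    "github.com/ipld/go-ipld-prime/storage"
)

// writeBlob streams data into the store and then commits it under a key
// derived from its sha256 hash (an illustrative choice, not a requirement).
func writeBlob(ctx context.Context, store storage.StreamingWritableStorage, data io.Reader) (string, error) {
    w, commit, err := store.PutStream(ctx)
    if err != nil {
        return "", err
    }
    h := sha256.New()
    if _, err := io.Copy(io.MultiWriter(w, h), data); err != nil {
        _ = commit("") // zero-string key: discard the partial write and clean up
        return "", err
    }
    key := string(h.Sum(nil)) // keys may be raw binary, per the package docs
    if err := commit(key); err != nil {
        return "", err
    }
    return key, nil
}
```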
// --- other specializations --->

// VectorWritableStorage is an API for writing several slices of bytes at once into storage.
// It's meant as a feature-detection interface; not all storage implementations need to provide this feature.
// This kind of API can be useful for maximizing performance in scenarios where
// data is already loaded completely into memory, but scattered across several non-contiguous regions.
type VectorWritableStorage interface {
    PutVec(ctx context.Context, key string, blobVec [][]byte) error
}

// PeekableStorage is a feature-detection interface which a storage implementation can use to advertise
// the ability to look at a piece of data, and return it in shared memory.
// The PeekableStorage.Peek method is essentially the same as ReadableStorage.Get --
// but by contrast, ReadableStorage is expected to return a safe copy.
// PeekableStorage can be used when the caller knows they will not mutate the returned slice.
//
// An io.Closer is returned along with the byte slice.
// The Close method on the Closer must be called when the caller is done with the byte slice;
// otherwise, memory leaks may result.
// (Implementers of this interface may be expecting to reuse the byte slice after Close is called.)
//
// Note that Peek does not mean the caller may use the byte slice freely;
// mutating it may result in storage corruption or other undefined behavior.
type PeekableStorage interface {
    Peek(ctx context.Context, key string) ([]byte, io.Closer, error)
}
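To make the feature-detection pattern concrete on the read side, here is a hedged sketch of a helper that prefers Peek when it is available and falls back to Get otherwise. `readAndHash` is a hypothetical name, not part of this package.

```go
package example

import (
    "context"
    "crypto/sha256"

    "github.com/ipld/go-ipld-prime/storage"
)

// readAndHash hashes the content stored under key, using the zero-copy Peek
// path when the store advertises it, and a plain Get otherwise.
func readAndHash(ctx context.Context, store storage.ReadableStorage, key string) ([sha256.Size]byte, error) {
    if peeker, ok := store.(storage.PeekableStorage); ok {
        data, closer, err := peeker.Peek(ctx, key)
        if err != nil {
            return [sha256.Size]byte{}, err
        }
        defer closer.Close() // required: the implementation may reuse the buffer afterwards
        return sha256.Sum256(data), nil // read-only use: the slice is never mutated
    }
    data, err := store.Get(ctx, key)
    if err != nil {
        return [sha256.Size]byte{}, err
    }
    return sha256.Sum256(data), nil
}
```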
// The following are all hypothetical additional future interfaces (in varying degrees of speculativeness):

// FUTURE: an EnumerableStorage API, that lets you list all keys present?

// FUTURE: a cleanup API (for getting rid of tmp files that might've been left behind on a rough shutdown)?

// FUTURE: a sync-forcing API?

// FUTURE: a delete API? sure. (just document carefully what its consistency model is -- i.e. basically none.)
// (hunch: if you do want some sort of consistency model -- consider offering a whole family of methods that have some sort of generation or sequencing number on them.)

// FUTURE: a force-overwrite API? (not useful for a content-addressed system. but maybe a gesture towards wider reusability is acceptable to have on offer.)

// FUTURE: a size estimation API? (unclear if we need to standardize this, but we could. an offer, anyway.)

// FUTURE: a GC API? (dubious -- doing it well probably crosses logical domains, and should not be tied down here.)
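The storage.PutStream helper called by SetWriteStorage above is not shown in this excerpt; the sketch below shows one plausible way such a feature-detecting helper could be written (buffering in memory when the store offers only Put). It should not be read as the package's actual implementation.

```go
package example

import (
    "bytes"
    "context"
    "io"

    "github.com/ipld/go-ipld-prime/storage"
)

// putStream prefers streaming writes when the store supports them, and
// otherwise buffers the whole write in memory and falls back to Put.
func putStream(ctx context.Context, store storage.WritableStorage) (io.Writer, func(key string) error, error) {
    if streamer, ok := store.(storage.StreamingWritableStorage); ok {
        return streamer.PutStream(ctx)
    }
    var buf bytes.Buffer
    return &buf, func(key string) error {
        if key == "" {
            return nil // zero-string key means "discard", per the WriteCommitter contract
        }
        return store.Put(ctx, key, buf.Bytes())
    }, nil
}
```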
@@ -1,9 +1,49 @@
-// Storage contains some simple implementations for the
-// ipld.BlockReadOpener and ipld.BlockWriteOpener interfaces,
-// which are typically used by composition in a LinkSystem.
-//
-// These are provided as simple "batteries included" storage systems.
-// They are aimed at being quickly usable to build simple demonstrations.
-// For heavy usage (large datasets, with caching, etc) you'll probably
-// want to start looking for other libraries which go deeper on this subject.
+// The storage package contains interfaces for storage systems, and functions for using them.
+//
+// These are very low-level storage primitives.
+// The interfaces here deal only with raw keys and raw binary blob values.
+//
+// In IPLD, you can often avoid dealing with storage directly yourself,
+// and instead use linking.LinkSystem to handle serialization, hashing, and storage all at once.
+// You'll hand some values that match interfaces from this package to LinkSystem when configuring it.
+//
+// The most basic APIs are ReadableStorage and WritableStorage.
+// APIs should usually be designed around accepting ReadableStorage or WritableStorage as parameters
+// (depending on which direction of data flow the API is regarding),
+// and use the other interfaces (e.g. StreamingReadableStorage) thereafter internally for feature detection.
+// Similarly, implementers of storage systems should implement ReadableStorage or WritableStorage
+// before any other features.
+//
+// Storage systems as described by this package are allowed to make some interesting trades.
+// Generally, write operations are allowed to be first-write-wins.
+// Furthermore, there is no requirement that the system return an error if a subsequent write to the same key has different content.
+// These rules are reasonable for a content-addressed storage system, and allow great optimizations to be made.
+//
+// If implementing a storage system, you should implement interfaces from this package.
+// Beyond the basic two (described above), all the other interfaces are optional:
+// you can implement them if you want to advertise additional features,
+// or advertise fastpaths that your storage system supports;
+// but you don't have to implement any of the additional interfaces if you don't want to.
+//
+// Note that all of the interfaces in this package only use types that are present in the golang standard library.
+// This is intentional, and was done very carefully.
+// If implementing a storage system, you should find it possible to do so *without* importing this package.
+// Because only standard library types are present in the interface contracts,
+// it's possible to implement types that align with the interfaces without referring to them.
+//
+// Note that where keys are discussed in this package, they use the golang string type --
+// however, they may be binary. (The golang string type allows arbitrary bytes in general,
+// and here, we both use that, and explicitly disavow the usual "norm" that the string type implies UTF-8.
+// This is roughly the same as the practical truth that appears when using e.g. os.OpenFile and other similar functions.)
+// If you are creating a storage implementation where the underlying medium does not support arbitrary binary keys,
+// then it is strongly recommended that your storage implementation support being configured with
+// an "escaping function", which should typically simply be of the form `func(string) string`.
+// Additionally, your storage implementation's documentation should clearly describe its internal limitations,
+// so that users have enough information to write an escaping function which
+// maps their domain into the domain your storage implementation can handle.
 package storage
+
+// also note:
+// LinkContext stays *out* of this package. It's a chooser-related thing.
+// LinkSystem can think about it (and your callbacks over there can think about it), and that's the end of its road.
+// (Future: probably LinkSystem should have SetStorage and SetupStorageChooser methods for helping you set things up -- where the former doesn't discuss LinkContext at all.)
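As a companion to the escaping-function recommendation in the doc comment above, here is a hedged sketch of a storage implementation whose backing medium (a flat directory of files) cannot take arbitrary binary keys. The `FsStorage` type and `Base32Escape` function are hypothetical; note also that the sketch deliberately does not import the storage package, illustrating the "standard library types only" point made in the doc.

```go
package fsstorage // hypothetical; no import of go-ipld-prime/storage is needed

import (
    "context"
    "encoding/base32"
    "os"
    "path/filepath"
)

// FsStorage stores each blob as a file. Because filenames cannot hold
// arbitrary binary keys, it is configured with an escaping function,
// as the storage package documentation recommends.
type FsStorage struct {
    Dir        string
    EscapeFunc func(string) string // maps arbitrary binary keys into safe filenames
}

func (s *FsStorage) path(key string) string {
    return filepath.Join(s.Dir, s.EscapeFunc(key))
}

func (s *FsStorage) Put(ctx context.Context, key string, content []byte) error {
    return os.WriteFile(s.path(key), content, 0o644)
}

func (s *FsStorage) Get(ctx context.Context, key string) ([]byte, error) {
    return os.ReadFile(s.path(key))
}

// Base32Escape is one possible escaping function: it maps any binary key
// onto a filesystem-safe name (at the cost of longer names).
func Base32Escape(key string) string {
    return base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString([]byte(key))
}
```

A caller could then configure `&FsStorage{Dir: "...", EscapeFunc: Base32Escape}` and hand it to LinkSystem.SetReadStorage and SetWriteStorage, since having Get and Put with these signatures is all the basic interfaces require.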
@@ -0,0 +1,33 @@
dsadapter
=========

The `dsadapter` package/module is a small piece of glue code to connect
the `github.com/ipfs/go-datastore` package, and packages implementing its interfaces,
forward into the `go-ipld-prime/storage` interfaces.

For example, this can be used to use "flatfs" and other datastore plugins
with go-ipld-prime storage APIs.

Why structured like this?
-------------------------

Why are there layers of interface code?
The `go-ipld-prime/storage` interfaces are a newer generation,
and improve on several things vs `go-datastore`. (See other docs for that.)

Why is this code in a shared place?
The glue code to connect `go-datastore` to the new `go-ipld-prime/storage` APIs
is fairly minimal, but there's also no reason for anyone to write it twice,
so we want to put it somewhere easy to share.

Why does this code have its own go module?
A separate module is used because it's important that go-ipld-prime can be used
without forming a dependency on `go-datastore`.
(We want this so that there's a reasonable deprecation pathway -- it must be
possible to write new code that doesn't take on transitive dependencies to old code.)

Why does this code exist here, in this git repo?
We put this separate module in the same git repo as `go-ipld-prime`... because we can.
Technically, neither this module nor the go-ipld-prime module depends on the other --
they just have interfaces that are aligned with each other -- so it's very easy to
hold them as separate go modules in the same repo, even though that can otherwise sometimes be tricky.
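The dsadapter source itself is not shown in this excerpt, so the following is only a rough sketch of the kind of glue the README describes, written against a go-datastore version whose methods take a context. The names used here (Adapter, Wrapped, EscapeFunc) are illustrative and may not match the real package's API.

```go
package dsadapter // sketch only; not the actual published package

import (
    "context"

    datastore "github.com/ipfs/go-datastore"
)

// Adapter wraps a go-datastore and exposes Get/Put methods whose signatures
// line up with the go-ipld-prime/storage interfaces. Because those interfaces
// use only standard library types, no go-ipld-prime import is required here.
type Adapter struct {
    Wrapped datastore.Datastore

    // EscapeFunc is optional: it maps arbitrary binary keys into whatever
    // subset of keys the wrapped datastore can actually handle.
    EscapeFunc func(string) string
}

func (a *Adapter) dsKey(key string) datastore.Key {
    if a.EscapeFunc != nil {
        key = a.EscapeFunc(key)
    }
    return datastore.NewKey(key)
}

func (a *Adapter) Get(ctx context.Context, key string) ([]byte, error) {
    return a.Wrapped.Get(ctx, a.dsKey(key))
}

func (a *Adapter) Put(ctx context.Context, key string, content []byte) error {
    return a.Wrapped.Put(ctx, a.dsKey(key), content)
}
```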
Review thread on the choice of `string` for the key type:

Wouldn't this be more naturally a `[]byte`?

I would pay [REDACTED] for an immutable byte slice type in golang.

[REDACTED], Will. Truly. [REDACTED].

I would rather we eat that up by making a copy than having all users be forced to cast every time they deal with the core key type in the system.

I have to side with Will here; strings are technically like read-only byte slices, but using them as such in return values is a bad idea. I think they're only reasonable in cases where the bytes being shoved into a string are necessarily in a short-lived state, such as `map[string]T` or `string(input) == string(expected)`. Like Will says, if the objective here is preventing corruption/races, make a copy - converting to a string makes a copy anyway. Converting to a string is arguably worse in all scenarios - if the user wants to use the bytes like a byte slice, they'll need to make a second copy for the conversion back.

The natural resident form of this data is usually `string`. I think we would be encountering significantly more conversions (and probably, heap-escaping allocations) if this interface required `[]byte`. (I was being flippant about the mutability thing; while that does bother me, it's not the most essential reason for this.)

The internal storing of string is for similar reasons though, right? The thing you do to get the binary form of a cid to access it is `[]byte`.

Taking a step back, I'm fairly certain this should be just https://pkg.go.dev/encoding#BinaryMarshaler. We should have a very strong reason to use a different interface (and type!) for exactly the same concept, and "save a bit of overhead with go-cid's internal representation" doesn't seem particularly strong to me :)

I'm not at all convinced I care about any of this. There is no reason to be interested in implementing `BinaryMarshaler` that I can see. (It's not that it's a bad idea; it's just that I don't see a reason to care.) There is also no possible error return. I want this to be efficient, do the thing, get out of the way, and not provoke allocations. None of these arguments about style are punching on the same level? The original move of `go-cid` to use `string` was pretty highly thoughtful, planned for a long time, and once executed, hasn't provoked regrets. I'd need quite a lot of convincing to move in any other direction except the same one, here. (Sorry if I'm being a little terse, and for the initial flippant replies -- I'm actually very surprised we're talking about this.)

I get that your error would always be nil, and that the string-byte conversion is not ideal, but that's the kind of tradeoff you have with generic APIs and interoperability. I would have assumed that the point of an IPLD library in Go is to be easy to use and interoperable, not to squeeze every last bit of performance at the cost of usability :) As a note, I also disagree that you can predict that you'll never need to return an error. For example, see ipni/storetheindex#94.
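For background on the short-lived conversion cases mentioned in this thread: those are the specific patterns the Go compiler is known to optimize so that the `[]byte`-to-`string` conversion does not allocate. A minimal illustration (variable names are arbitrary):

```go
package example

// lookups shows the two conversion patterns named above; in both, the
// compiler avoids allocating an intermediate string for the conversion.
func lookups(m map[string]int, b, input, expected []byte) (int, bool) {
    v := m[string(b)]                       // map lookup keyed by a converted byte slice
    eq := string(input) == string(expected) // comparison of two converted byte slices
    return v, eq
}
```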