Replies: 2 comments, 1 reply
- This looks great! Still forming thoughts, but some initial ones:
- I feel like we need to define a few more things for this proposal:
The Cloudflare Developer Platform provides a serverless runtime and a bunch of data stores: KV, Durable Objects, Cache, R2, D1 and Analytics Engine. Each of these data stores provides some mechanism for writing data, then reading it back. Specifically, KV, Durable Objects, Cache and R2 are all key-value stores: you can put/get/delete and (ignoring Cache) list keys.
For implementing these, Miniflare has a common key-value storage interface that supports putting/getting/deleting keys with metadata and expiry, then listing them based on `prefix`, `start` and `end` filters, with `cursor`, `limit` and `reverse` pagination, and `delimiter` grouping. This storage interface worked well in the early days of Miniflare, but as more complex data storage solutions have been added to Workers (particularly R2 and D1 😅), it's started to show some rough edges. In particular:

- `startAfter` filtering (needed by R2's `list`) has to be implemented with `start` and an increased limit, which is difficult to mix with `cursor`-based pagination.
- Keys are persisted as file paths, with special handling for characters like `:` and `/`. This means you can't store `a/b` and `a` at the same time (KV keys with `/` with kvPersist throws `EISDIR` #167), and keys are case-insensitive when the underlying file-system is ([BUG] DO storage keys are case-sensitive on the edge, but Miniflare uses case-insensitive key matching with persistence #247). Multiple keys can also map to the same file name on disk. 😬
- Simultaneous `put`s to the same key can result in malformed data (#530).

Given that we can make breaking changes to the persistence format with the major version bump to Miniflare 3, I think it's time to rethink storage.
Requirements
KV
- `get`: read value as stream if not expired, including expiration and metadata
- `put`: write value stream, expiration and metadata
- `delete`: delete value and metadata
- `list`: list non-expired keys, filtering by `prefix`, paginating by `cursor` and `limit`, returning key names, expiration and metadata

Our KV implementation also needs to support read-only Workers Sites namespaces, backed by an arbitrary directory, with glob-style include/exclude rules.
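To make this concrete, here's a rough TypeScript sketch of the KV operations above. The names and types are illustrative only, not a proposed API.

```ts
// Illustrative sketch of the KV operations described above
interface KVEntry {
  key: string;
  expiration?: number; // seconds since epoch
  metadata?: unknown;
}

interface KVListOptions {
  prefix?: string;
  cursor?: string;
  limit?: number;
}

interface KVListResult {
  keys: KVEntry[];
  cursor?: string; // present if there are more non-expired keys to list
}

interface KVStore {
  get(key: string): Promise<(KVEntry & { value: ReadableStream<Uint8Array> }) | null>;
  put(key: string, value: ReadableStream<Uint8Array>, options?: Omit<KVEntry, "key">): Promise<void>;
  delete(key: string): Promise<void>;
  list(options?: KVListOptions): Promise<KVListResult>;
}
```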
Cache
- `match`: read body as stream if not expired, including status and headers, optionally read a single range or multiple ranges as multipart
- `put`: write body stream, status and headers
- `delete`: delete body, status and headers, returning whether we actually deleted anything
R2
- `head`: read metadata
- `get`: read value stream, including metadata, optionally read a single range or only return value if conditional succeeds
- `put`: write value stream and metadata, optionally only if conditional succeeds and checksums match
- `delete`: delete value and metadata
- `list`: lists keys with metadata, filtering by `prefix` and `startAfter`, paginating by `cursor` and `limit`, grouping by `delimiter`
- `createMultipartUpload`: write metadata, returning new upload ID
- `uploadPart`: write part stream, returning etag
- `completeMultipartUpload`: read parts' metadata, write entry with pointers to parts, mark upload ID completed
- `abortMultipartUpload`: mark upload ID aborted

D1
Requires exclusive access to an SQLite database
Durable Objects
Persistence implemented entirely in `workerd`
Proposal
Instead of having a single store for both metadata and large blobs, I propose we split these up.
Given the variety of queries required by each data store (especially R2's `list`), and D1's hard requirement on SQLite, using SQLite for the metadata store seems like a good idea. This also gives us the transactional updates we're looking for with things like multipart uploads.
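As an illustration of how these queries map onto SQL (the table and column names are made up, and `better-sqlite3` is just a convenient stand-in driver here, not necessarily what we'd use):

```ts
import Database from "better-sqlite3";

// Hypothetical metadata table for a single KV namespace
const db = new Database("kv-namespace.db");
db.exec(`
  CREATE TABLE IF NOT EXISTS _mf_entries (
    key        TEXT PRIMARY KEY,
    blob_id    TEXT NOT NULL,  -- pointer into the blob store
    expiration INTEGER,        -- NULL = never expires
    metadata   TEXT            -- JSON-serialised user metadata
  );
`);

// list(): prefix filtering with cursor/limit pagination
// (ignoring LIKE wildcard escaping for brevity)
const page = db
  .prepare(
    `SELECT key, expiration, metadata FROM _mf_entries
     WHERE key LIKE ? || '%' AND key > ? AND (expiration IS NULL OR expiration > ?)
     ORDER BY key LIMIT ?`
  )
  .all("section:", "" /* cursor: last key of the previous page */, Math.floor(Date.now() / 1000), 100);
```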
We could then implement our own simple blob store, supporting multi-ranged, streaming reads. Multiple ranges with multipart responses are required by Cache, and streaming reads seem like a good idea for the large objects R2 can support.
In-memory and file-system backed implementations would be provided for both stores. For file-system backed stores, a root directory containing the store's data should be provided. We should validate that no file-system store in the same Miniflare instance is rooted inside another file-system store's root directory.
We may also want to provide a simple expiring key-value-metadata store abstraction on top of these, for use with KV and Cache.
In the future, we may implement Miniflare's simulators in `workerd` as opposed to Node.js. SQLite may be implemented on top of Durable Objects, and we could use `workerd`'s `DiskDirectory` services for blob storage.

SQLite
We should create a new SQLite database for each KV namespace, R2 bucket, D1 database, etc.
For the in-memory implementation, we should use SQLite's built-in `:memory:` databases.

Blob Store
This should provide an interface like:
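A rough sketch, with names and signatures inferred from the description below rather than final:

```ts
// Illustrative sketch of the blob store interface
type BlobId = string; // opaque, un-guessable identifier

interface Range {
  start: number; // inclusive
  end: number;   // inclusive
}

interface MultipartReadableStream {
  multipartContentType: string; // "multipart/byteranges; boundary=..."
  body: ReadableStream<Uint8Array>;
}

interface BlobStore {
  // Optional single range: returns the (partial) blob, or null if the ID can't be found
  get(id: BlobId, range?: Range): Promise<ReadableStream<Uint8Array> | null>;
  // Multiple ranges: always returns a multipart/byteranges body, even with 0 or 1 ranges
  get(id: BlobId, ranges: Range[]): Promise<MultipartReadableStream | null>;

  // Writes a new immutable blob, returning its un-guessable ID
  put(stream: ReadableStream<Uint8Array>): Promise<BlobId>;

  // Deletes the blob; in-flight streaming gets complete first
  delete(id: BlobId): Promise<void>;
}
```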
`BlobId`s are opaque, un-guessable identifiers. Blobs can be deleted, but are otherwise immutable. Using immutable blobs makes it possible to perform atomic updates with the SQLite metadata store. No other operations will be able to interact with the blob until it's committed to the metadata store, because they won't be able to guess the ID, and we don't allow listing blobs. For example, if we put a blob in the store, then fail to insert the blob ID into the SQLite database for some reason during a transaction (e.g. `onlyIf` condition failed), no other operations can read that blob because the ID is lost (we'll just background-delete the blob in this case).
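A rough sketch of the atomic-update pattern this enables, reusing the `BlobStore` sketch above; the table name, columns and `better-sqlite3` usage are assumptions for illustration only:

```ts
import Database from "better-sqlite3";

declare const blobs: BlobStore; // the blob store sketched above

const db = new Database("r2-bucket.db");
db.exec(
  "CREATE TABLE IF NOT EXISTS _mf_objects (key TEXT PRIMARY KEY, blob_id TEXT NOT NULL)"
);

// Hypothetical atomic put: write the blob first, then commit its ID to SQLite.
// If the transaction throws (e.g. an onlyIf-style condition fails), the new blob's
// ID was never published, so it can safely be deleted in the background.
async function putObject(key: string, value: ReadableStream<Uint8Array>): Promise<void> {
  const blobId = await blobs.put(value);
  try {
    const previous = db.transaction(() => {
      const row = db.prepare("SELECT blob_id FROM _mf_objects WHERE key = ?").get(key) as
        | { blob_id: string }
        | undefined;
      // ...check onlyIf-style preconditions here, throwing if they fail...
      db.prepare("INSERT OR REPLACE INTO _mf_objects (key, blob_id) VALUES (?, ?)").run(key, blobId);
      return row?.blob_id;
    })();
    // Any previous blob for this key is now unreachable: background-delete it
    if (previous !== undefined) void blobs.delete(previous);
  } catch (e) {
    void blobs.delete(blobId); // the new blob's ID was never committed
    throw e;
  }
}
```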
Whilst entire `BlobId`s must be un-guessable, they may still contain identifiable information. One advantage of Miniflare's existing file-system storage is that keys show up as files, grouped by namespace, in your IDE. This makes it easy to inspect written data, and to see when storage operations are succeeding. By including a `userId` (e.g. KV key, Cache URL, R2 object name) in `BlobId`s along with some randomness, blobs written to disk will be user-identifiable and inspectable. We'll want to encode `userId`s to be file-system safe and not contain directory separators to avoid #167. This encoding must be one-to-one to avoid issues like #247, and ideally would preserve file extensions so images open in an image viewer, for example. We'll also want to mark blobs as read-only files to maintain immutability. This is especially important given we'll want to store object sizes in SQLite for R2 (we don't want to stat files when listing), and a modified blob could put the system in an inconsistent state.
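A hypothetical way of constructing such IDs, just to illustrate the shape (the exact encoding is an open question):

```ts
import { randomBytes } from "node:crypto";
import { extname } from "node:path";

// Hypothetical BlobId construction: user-identifiable prefix + random suffix.
// encodeURIComponent escapes "/" and other unsafe characters, so the ID never
// contains directory separators (#167). A real encoding would also need to be
// one-to-one on case-insensitive file systems (#247), which this isn't.
function newBlobId(userId: string): string {
  const safe = encodeURIComponent(userId);
  const random = randomBytes(16).toString("hex"); // makes the full ID un-guessable
  return `${safe}_${random}${extname(userId)}`;   // keep the extension so files open nicely
}

newBlobId("blog/cat.png"); // e.g. "blog%2Fcat.png_3b0c44298fc1c149afbf4c8996fb9242.png"
```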
`Range`s are inclusive, so `Range` HTTP headers easily map to them. `get` has two overloads: one accepts an optional single range and returns a `ReadableStream`, whereas the other accepts multiple ranges and returns a `MultipartReadableStream` with a `multipart/byteranges` `Content-Type` header. This multiple-range overload will always return a `multipart/byteranges` body, even with zero or one ranges. The caller should decide when this overload makes sense, as some data stores like R2 only support single-ranged reads. `get` will return `null` when a blob with the specified ID can't be found.
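For example, a Cache `match` with multiple ranges might use the multi-range overload like this (assuming the `BlobStore` sketch above):

```ts
// Hypothetical multi-range read for Cache, using the BlobStore sketch above
declare const blobs: BlobStore;
declare const blobId: BlobId;

const ranges: Range[] = [
  { start: 0, end: 99 },
  { start: 200, end: 299 },
];
const result = await blobs.get(blobId, ranges);
if (result !== null) {
  // multipart/byteranges responses use status 206 Partial Content
  new Response(result.body, {
    status: 206,
    headers: { "Content-Type": result.multipartContentType },
  });
}
```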
This interface makes it possible to `delete` a blob whilst performing a streaming `get`. On the file-system, `delete` will be implemented using `fs.unlink`. This has the behaviour of `unlink(2)`, specifically...

This means the file will only be deleted once the streaming `get` has finished, so we don't have a problem here. This is the behaviour on Windows too.
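A quick Node.js illustration of why this works (the blob path is made up):

```ts
import { open, unlink } from "node:fs/promises";

const path = ".mf/blobs/example-blob"; // illustrative path
const handle = await open(path, "r");  // open the blob before deleting it
const stream = handle.createReadStream();
await unlink(path);                     // removes the name immediately...
for await (const chunk of stream) {
  // ...but the data stays readable until the handle/stream is closed
}
```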