Prebuilt IndexedDB format #224

nolanlawson · 2017-11-07T23:59:21Z

Shipping prebuilt SQLite databases is a common pattern in native/hybrid apps (e.g. see this, this, this, this). Currently this can be "done" on the web, but you essentially have to build up an entire IDB database on first load (slow) and then invalidate/mutate it when the underlying data store changes (slow, error-prone).

What if instead we had a standard file format for an IndexedDB database, which could be quickly loaded into the browser and accessed in a read-only way? It could be a single file, which would be fetched and cached like any other resource. All processing to convert to the underlying LevelDB/SQLite/ESE format could be done on a background thread.

Use cases:

emoji library (most web emoji libraries run in-memory, despite the large number of emoji)
full-text search index for a card game app
schedule database for a conference app

It's debatable whether the database should be read-only or not, since I've seen cases where a prebuilt database is just used as a starting point for the app (e.g. a Pokédex app where the user can favorite each Pokémon), but for a v1, just keeping it read-only may be very useful. Potentially this could even encourage websites to use less memory (e.g. for the emoji use case).

inexorabletash · 2017-11-09T18:22:59Z

There was a related suggestion for a JSON-described schema format, which sounds like a subset of this. You'd basically want a JSON format that describes the schema and the content of each object store.

Would JSON be sufficient? It obviously leaves out many types (e.g. dates, and anything which is serializable but not JSONable).
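To make the gap concrete, here is a small comparison between a JSON round trip and a structured clone. Structured cloning is the serialization IndexedDB uses for stored values. This sketch assumes a browser or Node 17+, both of which expose a global `structuredClone()`. The record is made up for illustration.

```javascript
// Compare what survives a JSON round trip vs. a structured clone.
const record = {
  id: 1,
  created: new Date(1510258332834),
  bytes: new Uint8Array([1, 2, 3]),
  big: 10n ** 20n,
};

function jsonRoundTrip(value) {
  // BigInt has no JSON representation (JSON.stringify throws on it),
  // so leave it out to see what happens to the other fields.
  const { big, ...rest } = value;
  return JSON.parse(JSON.stringify(rest));
}

const viaJson = jsonRoundTrip(record);
const viaClone = structuredClone(record);

console.log(typeof viaJson.created);             // string (the Date became an ISO string)
console.log(viaJson.bytes instanceof Uint8Array); // false (now a plain object with keys "0", "1", "2")
console.log(viaClone.created instanceof Date);   // true
console.log(viaClone.bytes instanceof Uint8Array, viaClone.big === record.big); // true true
```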

And then there's the question: do we think browsers would be significantly more efficient than just a small JS library that does this in a worker?

nolanlawson · 2017-11-09T20:15:27Z

Would JSON be sufficient?

JSON or ND-JSON might not be sufficient due to the Date issue... lots of folks are storing Dates in IDB. JSON would be very convenient for generating the files, though. Can we extend JSON with something like Date(1510258332834)?
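One way to "extend" JSON without changing the JSON grammar is a tagged-object convention plus a `JSON.stringify` replacer and a `JSON.parse` reviver. The sketch below assumes a hypothetical `{"$date": <epoch ms>}` wrapper; nothing in the thread specifies that name.

```javascript
// Hypothetical convention: a Date is written as {"$date": <epoch milliseconds>}.
function replacer(key, value) {
  // `value` has already been through Date.prototype.toJSON (an ISO string),
  // so inspect the original property on the holder object (`this`) instead.
  const original = this[key];
  return original instanceof Date ? { $date: original.getTime() } : value;
}

function reviver(key, value) {
  // Turn any {"$date": number} object back into a Date.
  if (value !== null && typeof value === 'object' &&
      Object.keys(value).length === 1 && typeof value.$date === 'number') {
    return new Date(value.$date);
  }
  return value;
}

const text = JSON.stringify([{ id: 1, start: new Date(1510258332834) }], replacer);
// text === '[{"id":1,"start":{"$date":1510258332834}}]'
const items = JSON.parse(text, reviver);
// items[0].start instanceof Date === true
```

The replacer must be a regular `function`, not an arrow function, because it relies on `this` being the holder object.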

Do we think browsers would be significantly more efficient than just a small JS library that does this in a worker

Absolutely yes, mostly due to the need to do a put() for each individual object, although perhaps that's an argument for putAll() or similar. 😃
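Without `putAll()`, the most you can do today is issue all the `put()` calls inside a single readwrite transaction instead of one transaction per record. A minimal sketch follows. It assumes an already-open `IDBDatabase`, and records that carry their own keys (or a store with a `keyPath`). The helper name is hypothetical.

```javascript
// Hypothetical helper: write every record in ONE readwrite transaction.
// Each put() is still its own IDBRequest: that per-object cost is what a
// native putAll() or bulk import could avoid.
function putAllInOneTransaction(db, storeName, records, options = {}) {
  return new Promise((resolve, reject) => {
    // `options` is the optional third argument of IDBDatabase.transaction(),
    // e.g. { durability: 'relaxed' } in browsers that support it.
    const tx = db.transaction(storeName, 'readwrite', options);
    const store = tx.objectStore(storeName);
    for (const record of records) {
      store.put(record);
    }
    // Resolve only once the whole transaction has committed.
    tx.oncomplete = () => resolve(records.length);
    tx.onerror = () => reject(tx.error);
    tx.onabort = () => reject(tx.error);
  });
}
```

Usage might look like `const records = await (await fetch('./emoji.json')).json(); await putAllInOneTransaction(db, 'emoji', records);`. The URL and store name are placeholders. The data is fetched and parsed before the transaction is opened.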

kaizhu256 · 2019-03-29T04:55:56Z

Why can't you use JSON for dates? Isn't "2017-11-09T20:12:12.834Z" (the output of new Date(1510258332834).toISOString()) sufficiently precise?

inexorabletash · 2019-03-29T17:54:36Z

Re: Dates - you need out-of-band information to know to parse that string as a Date, which means you're not simply providing e.g. an array of values to directly put into a store. At which point, you're inventing a new serialization scheme layered on JSON. Which is fine, but it's no longer "just JSON".
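Both points can be seen in a few lines. An ISO string from `toISOString()` keeps full millisecond precision. However, `JSON.parse` hands it back as a plain string, so reviving it requires knowing which fields are dates. The per-store field list below (`dateFields`, `sessions`) is a hypothetical stand-in for that out-of-band information, using the conference-schedule use case as an example.

```javascript
// ISO 8601 strings from toISOString() keep millisecond precision...
const ms = 1510258332834;
const iso = new Date(ms).toISOString();       // "2017-11-09T20:12:12.834Z"
const roundTripped = new Date(iso).getTime(); // === ms

// ...but after JSON.parse they are just strings. Reviving them needs
// out-of-band knowledge: here, a hypothetical list of Date fields per store.
const dateFields = { sessions: ['start', 'end'] };

function reviveStore(storeName, records) {
  const fields = dateFields[storeName] ?? [];
  return records.map((record) => {
    const copy = { ...record };
    for (const field of fields) {
      if (typeof copy[field] === 'string') copy[field] = new Date(copy[field]);
    }
    return copy;
  });
}

const sessions = reviveStore('sessions', JSON.parse(
  '[{"title":"Keynote","start":"2017-11-09T20:12:12.834Z","end":"2017-11-09T21:00:00.000Z"}]'
));
```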

kaizhu256 · 2019-03-29T22:46:45Z

Different perspective: I tend to design JavaScript systems on the assumption that they will interact with "dumb", non-locale-aware systems (sqlite3, shell scripts, etc.), and JSON data like {"utc_datetime":"2019-03-29T05:43:00.000Z","timezoneOffset":300} is usually sufficient.

dmurph · 2019-09-20T02:22:40Z

TPAC 2019 Web Apps Indexed DB triage notes:

It seems like the largest issue here is batching the puts of data into the database. putAll() should fix this issue; it is tracked in issue #69.

I believe the next issue here is the overhead of parsing the JSON into objects and then structured-cloning those objects, versus going straight from the JSON blob to the structured-clone serialization.

But in any case, something like issue #69 seems like the biggest win here. Maybe there would eventually be a way to stream data into putAll()? Or give it an async iterator? Unclear.
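Feeding an async iterator into batched writes can already be approximated in userland. The sketch below groups items from any sync or async iterable into fixed-size arrays. Each array could then be written in one readwrite transaction, or passed to a future putAll(). The function name and the batch size are assumptions.

```javascript
// Group items from any sync or async iterable into arrays of `size`.
// The final batch may be smaller.
async function* inBatches(iterable, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  let batch = [];
  for await (const item of iterable) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch;
}
```

A consumer would write `for await (const batch of inBatches(records, 500)) { /* one transaction per batch */ }`. This bounds both the size of each transaction and the number of records held in memory at once.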

nolanlawson · 2021-08-23T14:19:15Z

Given that putAll() is no longer in development (#69 (comment)), I like the idea of directly streaming JSON into an object store. As mentioned before, this won't handle Dates, but in retrospect I think this may be an edge case. My hunch is that most folks are storing objects in IndexedDB that are fully serializable as JSON. (I have no data though. 😀)

Something like this?

const response = await fetch('./array.json')
const writer = db.getWriter('my_object_store') // proposed API: a WritableStream that accepts the JSON bytes
await response.body.pipeTo(writer) // pipeTo() lives on the ReadableStream itself, not on a reader

db.getWriter() could assume a 'readwrite' transaction, and it could take an options bag like db.transaction():

const writer = db.getWriter('my_object_store', { durability: 'relaxed' })

Working with the existing db.transaction() is also possible, but it may be weird to get the microtask timing right, given that we are reading data from the network. Also making it more high-level gives more flexibility to the engine to optimize stuff (maybe?).
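Until something like `db.getWriter()` exists, part of it can be approximated in userland. In the sketch below, all names are hypothetical. It builds a `WritableStream` of already-parsed records and flushes each full batch through a caller-supplied `writeBatch` function. In a browser, that function would open a fresh readwrite transaction per batch. This sidesteps the timing problem: an IndexedDB transaction auto-commits once it has no pending requests, so it cannot stay open while waiting for the next network chunk. A transaction is only opened once a batch is fully in memory.

```javascript
// Hypothetical userland stand-in for the proposed db.getWriter():
// a WritableStream of records that writes them in batches.
function createBatchingWriter(writeBatch, batchSize = 500) {
  let buffer = [];
  return new WritableStream({
    async write(record) {
      buffer.push(record);
      if (buffer.length >= batchSize) {
        const batch = buffer;
        buffer = [];
        await writeBatch(batch); // backpressure: the stream waits for this write
      }
    },
    async close() {
      if (buffer.length > 0) await writeBatch(buffer);
      buffer = [];
    },
  });
}

// Browser-side writeBatch: one readwrite transaction per batch.
function idbBatchWriter(db, storeName, options = {}) {
  return (batch) => new Promise((resolve, reject) => {
    const tx = db.transaction(storeName, 'readwrite', options);
    const store = tx.objectStore(storeName);
    for (const record of batch) store.put(record);
    tx.oncomplete = () => resolve();
    tx.onerror = tx.onabort = () => reject(tx.error);
  });
}
```

Unlike the proposal, this writer expects records rather than raw bytes. The response body would first have to be parsed into records, for example from ND-JSON.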

If we wanted to support Dates in the future, we could also always have an optional transform or something:

const writer = db.getWriter('my_object_store', { transform: item => {
  // transform a single item from the JSON array
  item.date = new Date(item.date) // parse string to Date
  return item
}})

(Maybe something like module blocks could even make it so that this transform function can run off-main-thread.)
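With plain streams, the per-item `transform` option can also be expressed as a `TransformStream` placed between a record source and a writer, as in `source.pipeThrough(mapRecords(fn))`. This is a sketch with hypothetical names. Unlike the snippet above, it returns a copy instead of mutating each item.

```javascript
// Hypothetical per-record transform expressed as a TransformStream.
function mapRecords(fn) {
  return new TransformStream({
    transform(record, controller) {
      controller.enqueue(fn(record));
    },
  });
}

// The transform from the proposal above: parse an ISO string into a Date.
const parseDate = (item) => ({ ...item, date: new Date(item.date) });
```

Whether an engine could move such a function off the main thread remains the open question raised above. The stream composition itself needs no new IndexedDB API.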

jimmywarting · 2022-12-17T19:45:07Z

I think it would be bad to base this on JSON when IndexedDB can store so much more than JSON types. Importing/exporting databases would be problematic for Dates, Blobs/Files, typed arrays, BigInts, Infinity, circular references, and everything else that can be structured-cloned.

CBOR would be a good alternative, as it can handle all of those things.
It's a binary format, so it would be smaller in size and probably faster to decode/encode than JSON?
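To illustrate how CBOR (RFC 8949) represents a Date natively, here is a hand-rolled encoder for a small subset. It covers non-negative integers below 2^32, text strings, arrays, string-keyed maps, and Dates as tag 1 (epoch-based date/time, in seconds). It is only a sketch for comparison, not a complete or conformant implementation; a real importer would use an existing CBOR library.

```javascript
// Minimal CBOR encoder for a small subset of values (sketch only).
function encodeCbor(value) {
  const out = [];
  // A CBOR "head": 3-bit major type plus a length/value argument.
  function head(major, n) {
    const m = major << 5;
    if (n < 24) out.push(m | n);
    else if (n < 0x100) out.push(m | 24, n);
    else if (n < 0x10000) out.push(m | 25, n >> 8, n & 0xff);
    else if (n < 0x100000000) out.push(m | 26, n >>> 24, (n >>> 16) & 0xff, (n >>> 8) & 0xff, n & 0xff);
    else throw new RangeError('integers >= 2^32 are not supported in this sketch');
  }
  function encode(v) {
    if (Number.isInteger(v) && v >= 0) {
      head(0, v); // major type 0: unsigned integer
    } else if (typeof v === 'string') {
      const bytes = new TextEncoder().encode(v);
      head(3, bytes.length); // major type 3: UTF-8 text string
      out.push(...bytes);
    } else if (Array.isArray(v)) {
      head(4, v.length); // major type 4: array
      v.forEach((item) => encode(item));
    } else if (v instanceof Date) {
      head(6, 1); // major type 6, tag 1: epoch-based date/time in seconds
      const seconds = v.getTime() / 1000;
      if (Number.isInteger(seconds) && seconds >= 0) {
        head(0, seconds);
      } else {
        // float64: major type 7, additional info 27 (0xfb), big-endian
        const view = new DataView(new ArrayBuffer(8));
        view.setFloat64(0, seconds);
        out.push(0xfb, ...new Uint8Array(view.buffer));
      }
    } else if (v !== null && typeof v === 'object') {
      const keys = Object.keys(v);
      head(5, keys.length); // major type 5: map
      for (const k of keys) { encode(k); encode(v[k]); }
    } else {
      throw new TypeError(`unsupported value in this sketch: ${String(v)}`);
    }
  }
  encode(value);
  return Uint8Array.from(out);
}
```

For example, `new Date(1363896240000)` encodes to the six bytes `c1 1a 51 4b 67 b0`, which matches the tag 1 example in RFC 8949's appendix.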

SteveBeckerMSFT · 2024-10-01T21:40:28Z

TPAC 2024: While supporting loading from a prebuilt format seems beneficial for several scenarios, it remains very challenging to specify and implement without severe limitations. In particular, we would need to spec out the binary format for the prebuilt IDB format. Perhaps these scenarios should consider SQLite WASM with the bucket file system instead? SQLite WASM also has some drawbacks that not every application can overcome (like the same-origin requirements).

Historically, browsers have optimized IDB reads more than writes. Perhaps there is an opportunity to revisit putAll() to see if it could help reduce the costs of IDB data import?

nolanlawson · 2024-10-02T03:46:39Z

Maybe putAll() could be more of a win if it accepted a ReadableStream of JSON? Could cut down on memory usage at least.
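ND-JSON, mentioned earlier in the thread, is much easier to consume incrementally than a single top-level JSON array, because each line parses independently. Here is a sketch of turning a `ReadableStream` of ND-JSON bytes (such as a `fetch()` response body) into records one at a time. With this approach, only the current chunk and one partial line are held in memory. It assumes UTF-8 and exactly one JSON value per line.

```javascript
// Parse a ReadableStream of ND-JSON bytes into records, one at a time.
async function* ndjsonRecords(byteStream) {
  const reader = byteStream.getReader();
  const decoder = new TextDecoder(); // UTF-8; stream mode handles code points split across chunks
  let pending = '';
  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      pending += decoder.decode(value, { stream: true });
      const lines = pending.split('\n');
      pending = lines.pop(); // the last piece may be an incomplete line
      for (const line of lines) {
        if (line.trim() !== '') yield JSON.parse(line);
      }
    }
    pending += decoder.decode(); // flush any buffered bytes
    if (pending.trim() !== '') yield JSON.parse(pending);
  } finally {
    reader.releaseLock();
  }
}
```

Usage would be along the lines of `for await (const record of ndjsonRecords((await fetch('./cards.ndjson')).body)) { ... }`, where the URL is a placeholder.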

brettz9 · 2024-10-02T07:44:57Z

I think it would be bad to base this on JSON when IndexedDB can store so much more than JSON types. Importing/exporting databases would be problematic for Dates, Blobs/Files, typed arrays, BigInts, Infinity, circular references, and everything else that can be structured-cloned.

FWIW, typeson-registry uses, through its structured cloning preset, a form of JSON, typeson, to encode structured cloning data, supporting cyclic data and all of the above-mentioned structured cloning types and more (though a handful of the newer types at https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#webapi_types are indeed not presently supported).

As @inexorabletash suggested, this indeed requires a particular serialization of JSON with an out-of-band section (within the JSON itself, in the case of typeson) tracking which parts correspond to which types.
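The general idea of a type section carried inside the JSON can be sketched in a few lines. This is not typeson's actual format or API, only a hypothetical illustration. Values that JSON cannot represent are replaced with strings, and a top-level `$types` object maps each replaced path to its original type so that a decoder can restore it. Only Date and BigInt are handled, and the dot-separated paths assume that keys contain no dots.

```javascript
// Hypothetical encoding: {"value": <JSON-safe data>, "$types": {"<path>": "<type>"}}
function encodeTyped(value) {
  const types = {};
  function walk(v, path) {
    if (v instanceof Date) { types[path] = 'Date'; return v.toISOString(); }
    if (typeof v === 'bigint') { types[path] = 'BigInt'; return v.toString(); }
    if (Array.isArray(v)) {
      return v.map((item, i) => walk(item, path ? `${path}.${i}` : String(i)));
    }
    if (v !== null && typeof v === 'object') {
      const out = {};
      for (const [k, item] of Object.entries(v)) out[k] = walk(item, path ? `${path}.${k}` : k);
      return out;
    }
    return v;
  }
  return JSON.stringify({ value: walk(value, ''), $types: types });
}

function decodeTyped(text) {
  const { value, $types } = JSON.parse(text);
  const revive = { Date: (s) => new Date(s), BigInt: (s) => BigInt(s) };
  let root = value;
  for (const [path, type] of Object.entries($types)) {
    if (path === '') { root = revive[type](root); continue; } // the root itself was replaced
    const keys = path.split('.');
    let holder = root;
    for (const k of keys.slice(0, -1)) holder = holder[k];
    const last = keys[keys.length - 1];
    holder[last] = revive[type](holder[last]);
  }
  return root;
}
```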

This doesn't address the smaller size or greater performance that a binary format could provide; it's just an FYI on how the structured types issue can at least be overcome by a form of JSON.

nolanlawson · 2024-11-23T19:24:44Z

In particular, we would need to spec out the binary format for the prebuilt IDB format.

It occurred to me that supporting a JSON ReadableStream may be "good enough" for most uses of IndexedDB. I'd be curious to know what % of IndexedDB users are actually using the more exotic stuff supported by structured cloneable objects (e.g. Dates). My guess is not very many – or they would be happy to use strings instead of Dates if it meant faster insertion performance.
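An app could check whether its own data falls into the JSON-only bucket before choosing an import format. Below is a hypothetical helper that reports whether a value survives `JSON.stringify` followed by `JSON.parse` without changing value or type. The Pokédex-style record in the test is made up for illustration.

```javascript
// Returns true if `value` would survive a JSON round trip unchanged:
// plain objects, arrays, strings, finite numbers, booleans, null, and no cycles.
// Shared (non-cyclic) references are allowed, though JSON duplicates them.
function isJsonSafe(value, ancestors = new Set()) {
  if (value === null || typeof value === 'string' || typeof value === 'boolean') return true;
  // NaN and +/-Infinity serialize as null, and -0 as 0.
  if (typeof value === 'number') return Number.isFinite(value) && !Object.is(value, -0);
  // undefined, bigint, symbol and function values are dropped or throw.
  if (typeof value !== 'object') return false;
  if (ancestors.has(value)) return false; // cycle: JSON.stringify would throw
  const isArray = Array.isArray(value);
  const proto = Object.getPrototypeOf(value);
  // Date, Map, Set, typed arrays, Blob, class instances, ...: not plain data.
  if (!isArray && proto !== Object.prototype && proto !== null) return false;
  ancestors.add(value);
  try {
    if (isArray) {
      for (let i = 0; i < value.length; i++) {
        // Holes in sparse arrays become null.
        if (!(i in value) || !isJsonSafe(value[i], ancestors)) return false;
      }
      return true;
    }
    return Object.values(value).every((v) => isJsonSafe(v, ancestors));
  } finally {
    ancestors.delete(value);
  }
}
```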

Maybe a v1 of putAll could support a ReadableStream of application/json, and a v2 could support some as-yet-undesigned binary format?

paralin · 2024-11-23T23:05:37Z

I would appreciate a way to batch download and put Uint8Array objects as well. But this might not be a common use case.
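Batch-downloading binary values can be sketched today by assuming a hypothetical framing: each record is a 4-byte big-endian length followed by that many payload bytes. The generator below splits a `ReadableStream` of such bytes into one `Uint8Array` per record, buffering across chunk boundaries. Each result can be passed directly to `put()`, since `Uint8Array` is structured-cloneable.

```javascript
// Hypothetical framing: [uint32 big-endian length][payload bytes], repeated.
async function* lengthPrefixedRecords(byteStream) {
  const reader = byteStream.getReader();
  let buffer = new Uint8Array(0);
  try {
    while (true) {
      // Emit every complete frame currently in the buffer.
      while (buffer.length >= 4) {
        const length = new DataView(buffer.buffer, buffer.byteOffset, 4).getUint32(0);
        if (buffer.length < 4 + length) break; // wait for more bytes
        yield buffer.slice(4, 4 + length); // copy, so each record owns its bytes
        buffer = buffer.subarray(4 + length);
      }
      const { value, done } = await reader.read();
      if (done) break;
      const merged = new Uint8Array(buffer.length + value.length);
      merged.set(buffer);
      merged.set(value, buffer.length);
      buffer = merged;
    }
    if (buffer.length > 0) throw new Error('stream ended in the middle of a record');
  } finally {
    reader.releaseLock();
  }
}
```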

inexorabletash added the "feature request" and "schemas/upgrades" labels Nov 9, 2017

nolanlawson mentioned this issue May 22, 2018: Proposal: Add explicit IDBTransaction.commit() (was: add "writeonly" mode) #234 (closed)

inexorabletash mentioned this issue Jun 14, 2019: JSON schema specification #64 (closed)

inexorabletash added this to the vFuture milestone Sep 20, 2019

inexorabletash mentioned this issue Oct 6, 2021: Status report and planning for TPAC 2021 #364 (closed)

SteveBeckerMSFT added the "TPAC2024" label (Topic for discussion at TPAC 2024) Sep 9, 2024
