
Prebuilt IndexedDB format #224

Open
nolanlawson opened this issue Nov 7, 2017 · 13 comments
Labels: feature request, schemas/upgrades, TPAC2024 (Topic for discussion at TPAC 2024)
Milestone: vFuture

Comments

@nolanlawson
Member

Shipping prebuilt SQLite databases is a common pattern in native/hybrid apps (e.g. see this, this, this, this). Currently this can be "done" on the web, but you essentially have to build up an entire IDB database on first load (slow) and then invalidate/mutate it when the underlying data store changes (slow, error-prone).

What if instead we had a standard file format for an IndexedDB database, which could be quickly loaded into the browser and accessed in a read-only way? It could be a single file, which would be fetched and cached like any other resource. All processing to convert to the underlying LevelDB/SQLite/ESE format could be done on a background thread.

Use cases:

  • emoji library (most web emoji libraries run in-memory, despite the large number of emoji)
  • full-text search index for a card game app
  • schedule database for a conference app

It's debatable whether the database should be read-only or not, since I've seen cases where a prebuilt database is just used as a starting point for the app (e.g. a Pokédex app where the user can favorite each Pokémon), but for a v1, just keeping it read-only may be very useful. Potentially this could even encourage websites to use less memory (e.g. for the emoji use case).
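For concreteness, here is a minimal sketch of the pattern being replaced (the database name, store name, data file, and keyPath are all made up for illustration): fetch the data, open the database, and issue one put() per record on first load.

const records = await (await fetch('./emoji.json')).json()

const db = await new Promise((resolve, reject) => {
  const req = indexedDB.open('emoji-db', 1)
  req.onupgradeneeded = () => req.result.createObjectStore('emoji', { keyPath: 'annotation' })
  req.onsuccess = () => resolve(req.result)
  req.onerror = () => reject(req.error)
})

await new Promise((resolve, reject) => {
  const tx = db.transaction('emoji', 'readwrite')
  for (const record of records) {
    tx.objectStore('emoji').put(record) // one request per object
  }
  tx.oncomplete = resolve
  tx.onerror = () => reject(tx.error)
})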

@inexorabletash
Member

There was a related suggestion for a JSON-described schema format, which sounds like a subset of this. You'd basically want a JSON format that describes the schema and the content of each object store.

Would JSON be sufficient? It obviously leaves out many types (e.g. dates, and anything which is serializable but not JSONable).

And then there's the question: do we think browsers would be significantly more efficient than just a small JS library that does this in a worker?
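For reference, a rough sketch of what that "small JS library in a worker" might look like, assuming a made-up message shape (name, version, and an array of stores, each with a storeName, keyPath, and records):

// loader-worker.js (hypothetical)
self.onmessage = async ({ data: { name, version, stores } }) => {
  const db = await new Promise((resolve, reject) => {
    const req = indexedDB.open(name, version)
    req.onupgradeneeded = () => {
      for (const { storeName, keyPath } of stores) {
        req.result.createObjectStore(storeName, { keyPath })
      }
    }
    req.onsuccess = () => resolve(req.result)
    req.onerror = () => reject(req.error)
  })

  // import each store's records in its own transaction
  for (const { storeName, records } of stores) {
    await new Promise((resolve, reject) => {
      const tx = db.transaction(storeName, 'readwrite')
      for (const record of records) tx.objectStore(storeName).put(record)
      tx.oncomplete = resolve
      tx.onerror = () => reject(tx.error)
    })
  }

  self.postMessage('done')
}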

@nolanlawson
Member Author

Would JSON be sufficient?

JSON or ND-JSON might not be sufficient due to the Date issue... lots of folks are storing Dates in IDB. JSON would be very convenient for generating the files, though. Can we extend JSON with something like Date(1510258332834)?

Do we think browsers would be significantly more efficient than just a small JS library that does this in a worker

Absolutely yes, mostly due to the need to do a put() for each individual object, although perhaps that's an argument for putAll() or similar. 😃

@kaizhu256

Why can't you use JSON for dates? Isn't "2017-11-09T20:12:12.834Z" sufficiently precise for new Date(1510258332834).toISOString()?

@inexorabletash
Member

Re: Dates - you need out-of-band information to know to parse that string as a Date, which means you're not simply providing e.g. an array of values to directly put into a store. At which point, you're inventing a new serialization scheme layered on JSON. Which is fine, but it's no longer "just JSON".
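A toy example of such a layered scheme, with a made-up "$date" tag and a JSON.parse reviver to rehydrate it:

const json = '{"name":"pikachu","caught":{"$date":1510258332834}}'

// the reviver rebuilds Dates from the tagged objects
const record = JSON.parse(json, (key, value) =>
  value && typeof value === 'object' && '$date' in value
    ? new Date(value.$date)
    : value
)

console.log(record.caught instanceof Date) // true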

@kaizhu256

A different perspective: I tend to design JavaScript systems on the assumption that they will interact with "dumb", non-locale-aware systems (sqlite3, shell scripts, etc.), and JSON data like {"utc_datetime":"2019-03-29T05:43:00.000Z","timezoneOffset":300} is usually sufficient.

@dmurph

dmurph commented Sep 20, 2019

TPAC 2019 Web Apps Indexed DB triage notes:

It seems like the largest issue here is batching the data being put into the database. putAll should fix this issue. This is issue #69.

I believe the next issue here is the overhead of parsing the JSON and then turning that JSON into structured clones, vs. going straight from the JSON blob to structured clones.

But anyways, something like issue #69 seems like the biggest win here. Maybe there would eventually be a way to stream stuff into putAll? Or give it an async iterator? Unclear.
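For illustration, the putAll() from issue #69 might have looked roughly like this; it never shipped, so both shapes below are speculative:

const tx = db.transaction('emoji', 'readwrite')
// one request for the whole batch, instead of one per record
const request = tx.objectStore('emoji').putAll(records)
request.onsuccess = () => console.log('all records written')

// the streaming variant floated above might take an async iterable instead of an array:
// tx.objectStore('emoji').putAll(recordsAsyncIterable)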

@inexorabletash added this to the vFuture milestone Sep 20, 2019
@nolanlawson
Member Author

Given that putAll() is no longer in development (#69 (comment)), I like the idea of directly streaming JSON into an object store. As mentioned before, this won't handle Dates, but in retrospect I think this may be an edge case. My hunch is that most folks are storing objects in IndexedDB that are fully serializable as JSON. (I have no data though. 😀)

Something like this?

const readable = (await fetch('./array.json')).body // ReadableStream
const writer = db.getWriter('my_object_store') // WritableStream
await readable.pipeTo(writer)

db.getWriter() could assume a 'readwrite' transaction, and it could take an options bag like db.transaction():

const writer = db.getWriter('my_object_store', { durability: 'relaxed' })

Working with the existing db.transaction() is also possible, but it may be tricky to get the microtask timing right, given that we are reading data from the network. Also, making it more high-level gives the engine more flexibility to optimize things (maybe?).

If we wanted to support Dates in the future, we could also always have an optional transform or something:

const writer = db.getWriter('my_object_store', { transform: item => {
  // transform a single item from the JSON array
  item.date = new Date(item.date) // parse string to Date
  return item
}})

(Maybe something like module blocks could even make it so that this transform function can run off-main-thread.)
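As a point of comparison, here is a rough approximation of db.getWriter() that can be written today, batching parsed records into short-lived transactions to sidestep the auto-commit/microtask issue mentioned above. The batch size is illustrative, and something upstream would still need to turn the fetched bytes into individual records (e.g. by splitting and parsing ND-JSON) before piping them into this sink.

function getWriter(db, storeName, batchSize = 1000) {
  let batch = []

  // commit the current batch in its own transaction
  const flush = () =>
    new Promise((resolve, reject) => {
      if (batch.length === 0) return resolve()
      const tx = db.transaction(storeName, 'readwrite')
      for (const record of batch) tx.objectStore(storeName).put(record)
      batch = []
      tx.oncomplete = resolve
      tx.onerror = () => reject(tx.error)
    })

  // each write() receives one already-parsed record
  return new WritableStream({
    async write(record) {
      batch.push(record)
      if (batch.length >= batchSize) await flush()
    },
    close: flush
  })
}

The per-batch transaction overhead here is exactly the kind of cost a built-in writer could avoid.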

@jimmywarting

jimmywarting commented Dec 17, 2022

I think it would be bad to base this on JSON when IndexedDB can store so much more than JSON types. Importing/exporting databases would then be problematic for dates, blobs/files, typed arrays, BigInts, Infinity, circular refs, and everything else that can be structured-cloned.

CBOR would be a good alternative, as it can handle all of those things.
It's a binary format, so it would be smaller in size and probably faster to decode/encode than JSON?

@SteveBeckerMSFT added the TPAC2024 label Sep 9, 2024
@SteveBeckerMSFT

TPAC 2024: While loading from a prebuilt format seems beneficial for several scenarios, it remains very challenging to specify and implement without severe limitations. In particular, we would need to spec out the binary format for the prebuilt IDB format. Perhaps these scenarios should consider SQLite WASM with the bucket file system instead? SQLite WASM also has drawbacks that not every application can overcome (like the same-origin requirements).

Historically, browsers have optimized IDB reads more than writes. Perhaps there is an opportunity to revisit putAll() to see if it could help reduce the costs of IDB data import?

@nolanlawson
Member Author

Maybe putAll() could be more of a win if it accepted a ReadableStream of JSON? Could cut down on memory usage at least.

@brettz9
Contributor

brettz9 commented Oct 2, 2024

I think it would be bad to base this on JSON when IndexedDB can store so much more than JSON types. So importing/exporting databases would be problematic for dates, blobs/files, typed arrays, BigInts, Infinity, circular refs, and everything else that can be structured-cloned.

FWIW, typeson-registry, through its structured cloning preset, uses a form of JSON (typeson) to encode structured-clone data, supporting cyclic data and all of the above-mentioned structured-clone types and more (though a handful of the newer types at https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#webapi_types are indeed not presently supported).

As @inexorabletash suggested, this indeed requires a particular serialization of JSON with an out-of-band section (within the JSON itself, in the case of typeson) tracking which parts correspond to which types.

This doesn't address the smaller size or greater performance that a binary format could provide, but it's just an FYI on how the structured-types issue can at least be overcome by a form of JSON.

@nolanlawson
Member Author

In particular, we would need to spec out the binary format for the prebuilt IDB format.

It occurred to me that supporting a JSON ReadableStream may be "good enough" for most uses of IndexedDB. I'd be curious to know what % of IndexedDB users are actually using the more exotic stuff supported by structured cloneable objects (e.g. Dates). My guess is not very many – or they would be happy to use strings instead of Dates if it meant faster insertion performance.

Maybe a v1 of putAll could support a ReadableStream of application/json, and a v2 could support some as-yet-undesigned binary format?

@paralin

paralin commented Nov 23, 2024

I would appreciate a way to batch download and put Uint8Array objects as well. But this might not be a common use case.
