Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: support for custom storage drivers and SQLite storage implementation #7649

Open
k2s opened this issue Oct 30, 2022 · 3 comments
Open

Comments

@k2s
Copy link
Contributor

k2s commented Oct 30, 2022

I created a prototype of Nightscout using SQLite as storage.
SQLite supports JSON documents and also indexes on its properties, so it is not difficult to emulate MongoDB functionality as it is currently used in Nightscout.

I propose to add loader of custom storage into Nightscout with this commit. I am not creating pull request before some discussion on this topic.

I created mono repository to develop nightscout-storage-sqlite module. It is activated by setting STORAGE_CLASS=@nightscout-storage-sqlite.

I also developed CLI tool to import MongoDB dump into SQLite database and Docker image to test that it works. I am personally running it for some time now with imported 2 years of data .

More details and my open questions (mostly about proper way to provide configuration options) are in https://github.com/k2s/nightscout/.

@bewest
Copy link
Member

bewest commented Nov 1, 2022

Very creative! There are a few observations to build on your ideas here. JSON support has gotten much better across SQL engines over the last 8 years, so this is very exciting.
Historically, Nightscout has found success in what may now be considered to be an excess in permissivity. Which data types are available in which channel, and which properties or fields are available on each data type? The permissivity that comes intrinsic with MongoDB's storage model allows any application interoperating with Nightscout to determine their own idiomatic data model that may or may not match the other applications also using Nightscout. Most applications find a need to implement both AndroidAPS/openaps and Loop-style data models.
There have been prior efforts to standardize the data model, although it only went so far as to cover CGM, not all possible treatments.
Currently, Nightscout uses idiomatic mongodb, which prevents interoperable usage with CosmosDB.
Documenting and standardizing the data that comes through Nightscout will allow us to decouple Nightscouts behavior from a single data storage engine and open up many more deployment options.
In particular, I would be very excited about something using https://knexjs.org/ to potentially gain sql compatibility with several sql engines.

I see you've taken an approach to simplify the problem quite a bit more. knex supports a json data type and many sql engines support that as well. I appreciate the way you've identified the storage engine code. Is there a hybrid path that a.) simplifies work to be done while maintaining velocity, b.) opens up deployment scenarios c.) helps migrate towards stable data schema. I'm supportive of this kind of work, and working on compatible tangential ideas.

One consideration is with data storage, there has been some desire to use "capped collections" feature of Mongodb to implement a ring buffer that can run on a designated quota. Implementing a lossy data store is undesirable in some ways, except that some people will be able to run it at lower cost. With idiomatic Mongodb, this may be easier than with sql storage, or it might not make a difference. Just something in the current milieu to be aware of. From my conversations with @sulkaharo, I believe we are both supportive of making the data model stricter to better define interoperability as well as enabling a wider variety of high quality deployments.

@k2s
Copy link
Contributor Author

k2s commented Nov 7, 2022

I did some work on https://github.com/k2s/nightscout/:

  • process.env variables starting with STORAGE_ are passed to the drivers
  • added storage driver proxy@nightscout-storage-basic
    • it catches all calls to MongoDB driver and logs them to log file, then it uses original MongoDB driver to handle the data
    • this is helpful to analyze data structure and calls received/requested from/by different sources
    • I created a small tool to evaluate the raw log file, here is an example result with summary of calls grouped by operation hash and structure of stored data
  • fixed some issues in @nightscout-storage-sqlite driver
    • main issue is that MongoDB always returns PKs that were processed in DB, SQLite UPDATE for example doesn't return list of changed IDs, I created temporary solution
    • I need to finish multi@nightscout-storage-basic driver which will be able to run original driver + another driver together and compare the results, this will be useful to make sqlite driver production ready
    • import from MongoDB dump works fine
  • added storage driver @nightscout-storage-knex
    • my plan was that it should work same way as sqlite by using JSON data
    • problem I wasn't able to solve is, that I don't see a way how to UPDATE json data with where condition on json data (if it is possible, then Knex should work)

All the code is still experimental.

Booth, sqlite and knex drivers, could be extended with json schema that would define table structure in relational DB and still store any nonstandard data in JSON field.

For cleaning up structure of data, NS could define JSON schema to validate incomming data on the level of REST API and still keep the idea of document or time series database, instead of relational.

Benefit of Knex driver seams to me questionable. I think that writing the own drivers for database engines would use a lot of shared code and could be more efficient than Knex. At this moment, it seems that Knex is not providing universal way how to use JSON datatype in a way we need (I could be wrong).

The proxy driver is a good tool to see how NS uses DB (all operations are logged). My impression is that it uses it too much and that most of the calls could be cached if NS is the only tool using the DB directly and most requests are related to data from short time period that doesn't occupy too much RAM. Cache should fall back to DB storage only for reports.

@bewest Thank you for commenting on my proposal. Please, what do you mean with Nightscout uses idiomatic mongodb, which prevents interoperable usage with CosmosDB. I started also driver for native support to CosmosDB, but then I stopped because it seems there is compatibility mode with MongoDB. There are also some tickets where successful usage of CosmosDB is mentioned.

Lowering the amount of access to DB engine and also providing single Docker image supporting file database could simplify deployment and lower the cost of hosting.

Using JSON to express filter and carry data as MongoDB defines is a good abstraction for communication with any kind of database. In my current understanding of NS operation, the challenge is to handle duplicate data from different sources, which I believe is solvable.

@bewest
Copy link
Member

bewest commented Nov 9, 2022

Thanks for all the notes, @k2s! Very impressive work, thank you!

  • I'm not particularly tied to knex. I've been using it and sometimes question the benefits as you have. To continue with knex, you'll have to jump in and out of knex's raw facility several times to work with properly quoted native sql json types. Here's an example, something similar for where in the following answers: https://stackoverflow.com/questions/46669464/how-to-update-jsonb-column-using-knexjs-bookshelfjs. Admittedly this is extremely tedious and hard to read as well.
  • idiomatic mongodb usage MongoDB takes pride in being a no-schema technology. Nightscout's REST API allows any application to send any data with any properties and Nightscout will faithfully store it. Nightscout's REST API also allows querying and searching by any field or property, which makes it easy for additional applications to prototype new features with a potential tradeoff regarding interoperability for similar use cases. CosmosDB doesn't require but assumes you have schemas and indices, and throws an error if you query against a field doesn't have an index. There's not any one place in Nightscout v1 api that is clear how to query against known properties when we don't know all the properties that exist. For example, Loop's development tree is adding additional properties in the next release. Some of them look important for search, but it's Loop's schema, not ours. So the inversion of control that Nightscout supports for clients to decide their data storage needs results in queries that error on cosmosdb. There's an interesting parallel in other domains, for example ActivityPub vs diaspora* a similar tension exists. It will be very interesting to run with these patches with a sufficient cohort to reasonably survey schemas in the wild, and support data modeling exercises to converge on some standard and extensible types.
  • duplication - While it's possible for duplicates between different data sources such as dexcom share and Loop or with xdrip4ios and FreeAPSX to cause issues with predictions feature, don't underestimate the difficulties for even a single mobile device syncing. Openaps, Loop, and AndroidAPSX all have different properties and algorithms they use to sync or backfill with mixed results. The difficulty arises because mobile clients don't always know how long they've been offline or how much data they need to backfill.
  • regarding supporting lots of deployment styles: Add sample fly.toml for the hoster fly.io #7560 (comment)

Excellent notes on performance. There are a variety of reasons for what you observe, but I think @sulkaharo and I largely agree when it comes to the big picture. Sulka has also expressed desire for a db engine that can facilitate the needs better, presumably that would include keeping a cache. I hope he'll weigh in here because some of it sounds similar. For what Sulka has planned, we might opt for a fork instead of feature enhancements, for a variety of reasons. Any naming suggestions? We will be strongly encouraging functional programming style in a bid to reduce some of the unintended complexity among plugins.

Please think about and let me know what additional support might be useful, T1Pal may be able to sponsor workshops for developers along with other budget for this work. Are you located in Europe?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants