Development roadmap? #111

Open
nmeln opened this issue Oct 19, 2018 · 17 comments

Comments

@nmeln

nmeln commented Oct 19, 2018

I assume this is a prototype: many things still have to be implemented (and/or researched), so the system will change over time.
Do you have plans to make noria production-ready?
Is there a publicly accessible development roadmap or feature plan people could follow?

@ms705
Member

ms705 commented Oct 19, 2018

You're right that the current version of Noria is a research prototype. However, it's definitely ready to try out: we've managed to run some real web applications on Noria with minimal modification.

The best approximation of a development roadmap is probably the GitHub issues. Our research going forward will primarily focus on further improved distributed operation in the short term, although we're also exploring stronger consistency models and some offshoot ideas related to web application security.

For production use, Noria might need:

  1. Improvements to return more helpful errors when Noria doesn't support a query yet (Add better SQL-compatibility tests #98, nom-sql, #36).
  2. Better fault-tolerance and high-availability support: client failover (Automatic failover for View and Table handles #105) and rebuilding only failed shards (rather than entire operators).
  3. Better resharding/shuffles (Support directly sharded shuffles #95), so that it can support upqueries across shuffles in the data-flow.

We're actively working on 2. and 3. as part of our scalability work, and hope to fix 1. as well.

We also plan to keep the versions released to crates.io stable, and will use semantic versioning when we make breaking changes.

Noria primarily remains a research project, but we are keen to support people who want to use it for real applications. If you have a use case that you'd like us to consider, do let us know!

@nmeln
Author

nmeln commented Oct 22, 2018

> focus on further improved distributed operation in the short term, although we're also exploring stronger consistency models and some offshoot ideas related to web application security.

Sounds exciting!

Our use case is aggregating over semi-large amounts of data (10-20 million rows in a table) in MySQL and getting the last value from each group where the timestamp is earlier than some cutoff (such as midnight of the current day). Around 1,000-10,000 rows are added per hour.

An incrementally updated materialized view that uses this aggregation query would work for us, I guess.

MySQL materialized views make this really difficult to achieve. Flexviews could be a solution, but we decided against it.

The incoming data may be out of order, and sometimes we need to take historical data into account, so it's difficult to use time windows for grouping. We also do several joins with other tables. These are some of the reasons why we didn't choose Spark, Kafka Streams, or another streaming framework. Operational complexity and cost are another reason.
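
To make the use case above concrete, here is a minimal Rust sketch of an incrementally maintained "latest value per group before a cutoff" view. It is not Noria code: `Row`, `LatestBefore`, and the field names are hypothetical, the cutoff is fixed when the view is created, and only inserts are handled (no deletes or updates).

```rust
use std::collections::HashMap;

/// One incoming row. All names here are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub group: String,
    pub ts: u64,
    pub value: i64,
}

/// Incrementally maintained "latest value per group where ts < cutoff".
pub struct LatestBefore {
    cutoff: u64,
    /// group -> (timestamp, value) of the newest qualifying row seen so far
    latest: HashMap<String, (u64, i64)>,
}

impl LatestBefore {
    pub fn new(cutoff: u64) -> Self {
        LatestBefore { cutoff, latest: HashMap::new() }
    }

    /// Apply one insert. Rows may arrive in any order: a row replaces the
    /// stored entry only if its timestamp is at least as new.
    pub fn insert(&mut self, row: &Row) {
        if row.ts >= self.cutoff {
            return; // excluded by the `ts < cutoff` filter
        }
        let is_newer = self
            .latest
            .get(&row.group)
            .map_or(true, |&(ts, _)| row.ts >= ts);
        if is_newer {
            self.latest.insert(row.group.clone(), (row.ts, row.value));
        }
    }

    /// Read the current view for one group.
    pub fn get(&self, group: &str) -> Option<i64> {
        self.latest.get(group).map(|&(_, v)| v)
    }
}
```

Each insert does constant work per row instead of re-running the aggregation over the whole table, which is the property an incrementally updated view is meant to provide.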

@jonhoo
Contributor

jonhoo commented Oct 22, 2018

@ranchoiver I think that sounds like an excellent use-case for Noria! The one thing that we don't quite support yet is "rolling" time windows, which it sounds like you need. Specifically, you need a query with a filter that has a time-variant parameter. This would require the materialized view to change even if there are no writes to it, which is not something we currently support. It is definitely on our radar though, because it's also something that many other applications need!
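
A small Rust sketch of why a time-variant parameter is awkward for a materialized view: with the same stored rows and no writes, the answer changes purely because the cutoff moves. This is an illustration only, not how Noria works or plans to work; `History` and `latest_before` are hypothetical names, and the sketch sidesteps the problem by keeping every row per group and evaluating the cutoff at read time.

```rust
use std::collections::{BTreeMap, HashMap};

/// Per-group history ordered by timestamp, so any cutoff can be answered
/// when the query is read rather than when rows are written.
#[derive(Default)]
pub struct History {
    groups: HashMap<String, BTreeMap<u64, i64>>,
}

impl History {
    /// Record a row; arrival order does not matter because the
    /// BTreeMap keeps timestamps sorted.
    pub fn insert(&mut self, group: &str, ts: u64, value: i64) {
        self.groups
            .entry(group.to_string())
            .or_default()
            .insert(ts, value);
    }

    /// Latest value in `group` with a timestamp strictly less than `cutoff`.
    pub fn latest_before(&self, group: &str, cutoff: u64) -> Option<i64> {
        self.groups
            .get(group)?
            .range(..cutoff)
            .next_back()
            .map(|(_, &v)| v)
    }
}
```

Calling `latest_before("a", 25)` and then `latest_before("a", 31)` on unchanged data can return different values, which is exactly the case a view that only reacts to writes cannot keep up to date. The trade-off in this sketch is memory: it retains full history instead of one row per group.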

@nmeln
Author

nmeln commented Oct 22, 2018

> This would require the materialized view to change even if there are no writes to it, which is not something we currently support. It is definitely on our radar though, because it's also something that many other applications need!

Yes, exactly.
Gonna follow the news

@jonhoo
Contributor

jonhoo commented Nov 23, 2018

As an aside, noria-server probably won't be on crates.io until rust-lang/cargo#1565 is solved (which may be a while).

@mjjansen

@jonhoo a couple other features I'd be curious about:

  • push notifications (when my view changes)
  • UDF
  • retrieve results in Apache Arrow format (have this be the result of my materialized view request)

@jonhoo
Contributor

jonhoo commented Nov 26, 2018

@mjjansen

  • Push notifications (basically, pushing parts of the data-flow to the client) is something that's definitely on our radar, and was actually one of the motivations for using data-flow in the first place. Data-flow is so amenable to distribution that in theory this should just be a matter of moving some of the data-flow nodes to a client machine. In practice it gets a little more tricky though. We don't have an implementation of it currently, and it's not at the top of our roadmap, but it is a feature we'd love to see!
  • The code is very much built around the idea that eventually we'll have UDFs. We have to narrow down a bit more what exact contract operators need to abide by first though. If you look at the Ingredient trait, that is basically what's required to implement your own operator, but there's a lot of subtlety in it at the moment that we'd want to resolve before exposing it to end-users.
  • That's a neat idea! I hadn't seen Apache Arrow before, but seems like a good candidate for a data egress format (cc @ms705)!

@mjjansen

@jonhoo One more question: did you consider https://github.com/andygrove/sqlparser-rs instead of https://github.com/ms705/nom-sql? I wonder if the efforts could be combined.

@jonhoo
Contributor

jonhoo commented Nov 28, 2018

That crate didn't exist when we first started building Noria :) Combining efforts is probably not a bad idea though! (cc @ms705)

@mjjansen

got it. thank you!

@3noch

3noch commented Jan 4, 2019

👍 x 100 for push notifications (subscribing to queries). This would make noria not just a faster database than alternatives, but perfectly ideal for many applications that currently have to get this behavior manually with lots of error-prone work.

@3noch

3noch commented Jan 4, 2019

Also having a Postgres adapter would be pretty amazing.

@jonhoo
Contributor

jonhoo commented Jan 5, 2019

Hehe, yes, a Postgres adapter would be great, it just requires implementing the Postgres binary protocol in Rust similar to msql-srv. That's the bulk of the work. Once that's in place, the Noria SQL shim would just need to be able to run in both modes.

@xNxExOx

xNxExOx commented Mar 31, 2019

My use case would be many simple queries over one or two tables, plus some complicated queries that currently need to run every few minutes to keep a local copy updated. These queries generate in-game leaderboards: the server keeps a local copy of the whole leaderboard and updates it every few minutes. If I understand correctly, with Noria I could get rid of those periodic queries and do selects directly. The first one would take minutes, like now, but all the others would be fast.
But there is a big problem: because of a few earlier decisions, we need the server (and DB) to build and run locally on developers' Windows machines.
Can you make it Windows-compatible, please?

@mitar

mitar commented Oct 28, 2019

I would also think that sqlparser-rs would be a better fit, especially because it is also used by DataFusion. And if Noria starts using Arrow, then we have a crazy compatibility here between Arrow, DataFusion and Noria.

Personally, I do not care at all about a MySQL adapter. What I would like to see is being able to observe all changes that happen to the materialized view state (getting deltas) and push them out, ideally using an Arrow representation from server to client.

So +1 for push notifications (or I would say live query, I think this is the more common term). I do not think Noria has to provide any web API here, just expose things through Rust API, and then users can hook their own logic in Rust to push them to websockets or whatever.
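
To illustrate what "getting deltas" could look like from a Rust API, here is a minimal sketch of a per-key count view that returns signed changes on every write, plus a helper that replays them onto a client-side copy. Everything here is hypothetical (`Delta`, `CountView`, `apply`); it is not Noria's API and does not use Arrow.

```rust
use std::collections::HashMap;

/// A signed change to the view: `Negative` retracts a previously emitted
/// row, `Positive` asserts the row that replaces it.
#[derive(Debug, Clone, PartialEq)]
pub enum Delta {
    Positive { key: String, count: u64 },
    Negative { key: String, count: u64 },
}

/// Hypothetical materialized view: number of base rows per key.
#[derive(Default)]
pub struct CountView {
    counts: HashMap<String, u64>,
}

impl CountView {
    /// Record one new base row for `key` and return the resulting deltas.
    pub fn on_insert(&mut self, key: &str) -> Vec<Delta> {
        let mut out = Vec::new();
        let entry = self.counts.entry(key.to_string()).or_insert(0);
        if *entry > 0 {
            // Retract the old count before asserting the new one.
            out.push(Delta::Negative { key: key.to_string(), count: *entry });
        }
        *entry += 1;
        out.push(Delta::Positive { key: key.to_string(), count: *entry });
        out
    }
}

/// Replay deltas onto a client-side copy of the view.
pub fn apply(copy: &mut HashMap<String, u64>, deltas: &[Delta]) {
    for d in deltas {
        match d {
            Delta::Negative { key, .. } => {
                copy.remove(key);
            }
            Delta::Positive { key, count } => {
                copy.insert(key.clone(), *count);
            }
        }
    }
}
```

A caller would forward the `Vec<Delta>` returned by `on_insert` over whatever transport it likes (a websocket, for example) and call `apply` on the receiving side to keep the copy in sync.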

@jonhoo
Contributor

jonhoo commented Oct 28, 2019

So, push notifications are tricky because they imply full materialization everywhere, which comes at a steep cost. There might be a good way to register interest in keys and then subscribe to updates for those keys, but that's not something we're actively working on. Might be a neat additional feature to add eventually though — it shouldn't be too hard, as most of the infrastructure is already there.
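
The "register interest in keys" idea can be sketched with standard-library channels. This is only an assumption about what such a feature might look like, not an existing or planned Noria interface: `Subscriptions`, `subscribe`, and `publish` are hypothetical names, and values are plain `i64` for brevity.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical registry: clients subscribe to individual keys, so only
/// keys that somebody asked about generate any notification work.
#[derive(Default)]
pub struct Subscriptions {
    by_key: HashMap<String, Vec<Sender<(String, i64)>>>,
}

impl Subscriptions {
    /// Register interest in `key` and get a channel of future updates.
    pub fn subscribe(&mut self, key: &str) -> Receiver<(String, i64)> {
        let (tx, rx) = channel();
        self.by_key.entry(key.to_string()).or_default().push(tx);
        rx
    }

    /// Called when the value for `key` changes. Keys without subscribers
    /// are skipped; subscribers whose receiver was dropped are removed.
    pub fn publish(&mut self, key: &str, value: i64) {
        if let Some(subs) = self.by_key.get_mut(key) {
            subs.retain(|tx| tx.send((key.to_string(), value)).is_ok());
        }
    }
}
```

Because updates are only sent for subscribed keys, a design along these lines would not need every key to be materialized just to support notifications, which is the cost concern raised above.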

@mitar

mitar commented Oct 28, 2019

I opened #143 to make a better place to discuss that feature. This issue looks too broad to me.


7 participants