Add LanceDB to the list of Known Users #7716
Lance is essentially a table format (like Delta Lake). Table formats blur the line between data format and database, so building one requires database components, such as an expression library. In addition, one of Lance's distinguishing features is support for secondary indexes (right now, just ANN indexes for approximate KNN search). To use these, we need query plans that scan both indexed data and yet-to-be-indexed data in parallel and combine the two in a single query. We use DataFusion to do this. The two things we like about DataFusion in particular are: (1) it's easy to extend with new query nodes and (2) it's Arrow-native. For operations like scanning indices and our
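To make the "indexed plus yet-to-be-indexed" idea concrete, here is a minimal sketch in plain Python. It is not Lance or DataFusion code: the function names (`index_search`, `flat_scan`, `knn`) and the dict-based data layout are hypothetical, and the "index" is simulated with an exact search where a real ANN index would return approximate candidates. It only illustrates the shape of the plan: take the top-k from each source, then merge.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def index_search(indexed_rows, query, k):
    """Hypothetical stand-in for an ANN index lookup.

    A real index would return approximate candidates; this sketch
    does an exact search over the indexed rows instead.
    """
    scored = ((l2(vec, query), row_id) for row_id, vec in indexed_rows.items())
    return heapq.nsmallest(k, scored)


def flat_scan(unindexed_rows, query, k):
    """Brute-force scan over rows that have not been indexed yet."""
    scored = ((l2(vec, query), row_id) for row_id, vec in unindexed_rows.items())
    return heapq.nsmallest(k, scored)


def knn(indexed_rows, unindexed_rows, query, k):
    """Combine both branches: k candidates from each, then a final top-k."""
    candidates = index_search(indexed_rows, query, k) + flat_scan(unindexed_rows, query, k)
    return [row_id for _, row_id in heapq.nsmallest(k, candidates)]
```

For example, with indexed rows `{0: (0.0, 0.0), 1: (5.0, 5.0)}`, unindexed rows `{2: (1.0, 0.0), 3: (9.0, 9.0)}`, a query of `(0.0, 0.0)` and `k=2`, the merged result draws one row from each branch. In a query engine, each branch would be its own plan node, with the final merge as a node on top.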
Co-authored-by: Will Jones <willjones127@gmail.com>
@wjones127 -- thank you for your comments in #7716 (comment). Do you mind if I use this as a use case in the paper we are working on (#6782)? I think it validates several of the points in the paper (Arrow compatibility and having all the expression machinery).
We'd love to support your paper submission!
* Add LanceDB to the list of Known Users
* Update docs/source/user-guide/introduction.md

Co-authored-by: Will Jones <willjones127@gmail.com>

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
I was reading about LanceDB and then realized it used DataFusion -- https://github.com/search?q=repo%3Alancedb%2Flance%20datafusion&type=code
cc @wjones127
(btw I would love to know why you chose DataFusion and how you are using it -- among other things, it might make an excellent example use case for #6782)