Add LanceDB to the list of Known Users #7716

alamb · 2023-10-01T12:36:30Z

I was reading about LanceDB and then realized it used DataFusion -- https://github.com/search?q=repo%3Alancedb%2Flance%20datafusion&type=code

cc @wjones127

(btw I would love to know why you chose DataFusion and how you are using it -- among other things, it might make an excellent example use case for #6782)

docs/source/user-guide/introduction.md

wjones127 · 2023-10-02T16:42:26Z

btw I would love to know why you chose DataFusion and how you are using it -- among other things, it might make an excellent example use case for

Lance is essentially a table format (like Delta Lake). Table formats blur the line between data format and database, so building one requires database components, such as an expression library. In addition, one of Lance's distinguishing features is support for secondary indexes (right now, just ANN indexes for approximate KNN search). In order to use these, we need query plans that scan both indexed data and yet-to-be-indexed data in parallel and combine the two in a single query. We use DataFusion to do this.
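To make the "indexed plus yet-to-be-indexed" plan shape concrete, here is a minimal sketch in plain Python. It is not Lance's or DataFusion's implementation: the partition-probing "index", the function names, and the sample data are all hypothetical, and the two branches run sequentially here rather than in parallel.

```python
import heapq
import math

def ann_index_scan(index, query, k):
    # Toy stand-in for an ANN index: probe only the partition whose
    # centroid is closest to the query, so results are approximate.
    centroid = min(index, key=lambda c: math.dist(c, query))
    hits = [(math.dist(vec, query), row_id) for row_id, vec in index[centroid]]
    return heapq.nsmallest(k, hits)

def flat_scan(rows, query, k):
    # Brute-force KNN over rows that have not been indexed yet.
    hits = [(math.dist(vec, query), row_id) for row_id, vec in rows]
    return heapq.nsmallest(k, hits)

def knn(index, unindexed_rows, query, k):
    # Combine the candidates from both branches and keep the k nearest overall.
    candidates = ann_index_scan(index, query, k) + flat_scan(unindexed_rows, query, k)
    return [row_id for _, row_id in heapq.nsmallest(k, candidates)]

# Hypothetical data: centroid -> [(row_id, vector), ...] for indexed rows,
# plus a flat list of rows written after the index was built.
index = {
    (0.0, 0.0): [(0, (0.1, 0.0)), (1, (0.0, 0.3))],
    (10.0, 10.0): [(2, (9.0, 10.0))],
}
unindexed_rows = [(3, (0.05, 0.05)), (4, (7.0, 7.0))]

result = knn(index, unindexed_rows, (0.0, 0.0), k=2)
print(result)
```

In a query engine, each branch would be its own plan node and the final top-k merge a third node; the sketch only shows why both branches are needed for a complete answer.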

The two things we like about DataFusion in particular are: (1) it's easy to extend with new query nodes and (2) it's Arrow-native. The first matters for operations like scanning indices and our Take operation (which gets additional columns by their known row locations). DataFusion being Arrow-native has meant it's been easy to integrate with PyArrow and the larger Python data ecosystem. For example, we have many APIs where users write Python functions that operate on RecordBatches, and these can operate directly on the data without having to do any conversion. (We are very heavy users of the C Data Interface.)
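As a rough illustration of what a Take-style operation does, the sketch below fetches extra columns for a set of known row locations. It assumes a toy columnar table (a dict of Python lists standing in for Arrow arrays); the `take` helper and the column names are hypothetical and are not Lance's API.

```python
def take(table, row_ids, columns):
    # For each requested column, gather the values at the given row
    # positions, preserving the order of row_ids.
    return {name: [table[name][i] for i in row_ids] for name in columns}

# Hypothetical columnar table: column name -> list of values.
table = {
    "id": [10, 11, 12, 13],
    "caption": ["cat", "dog", "fish", "bird"],
    "score": [0.9, 0.4, 0.7, 0.2],
}

# Suppose an index scan returned rows 3 and 0; fetch their other columns.
taken = take(table, [3, 0], ["caption", "score"])
print(taken)
```

The point of the pattern is that the index scan only needs to produce row locations, and the wider columns are read afterwards for just those rows.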

Co-authored-by: Will Jones <willjones127@gmail.com>

alamb · 2023-10-02T17:40:08Z

@wjones127 -- thank you for your comments in #7716 (comment)

Do you mind if I use this as a use case in the paper we are working on (#6782)? I think it validates several of the points in the paper (Arrow compatibility and having all the expression machinery).

eddyxu · 2023-10-02T18:01:26Z

Do you mind if I use this as a use case in the paper we are working on (#6782)? I think it validates several of the points in the paper (Arrow compatibility and having all the expression machinery).

We'd love to support your paper submission!

* Add LanceDB to the list of Known Users

* Update docs/source/user-guide/introduction.md

Co-authored-by: Will Jones <willjones127@gmail.com>

---------

Co-authored-by: Will Jones <willjones127@gmail.com>

Add LanceDB to the list of Known Users

2ee1460

Dandandan approved these changes Oct 1, 2023

viirya approved these changes Oct 1, 2023

Update docs/source/user-guide/introduction.md

0b9fd7b

Co-authored-by: Will Jones <willjones127@gmail.com>

alamb merged commit 422e68e into main Oct 2, 2023
7 checks passed

alamb added the documentation label Oct 2, 2023

wjones127 deleted the alamb-patch-1 branch October 2, 2023 20:47

matthewgapp mentioned this pull request Jan 11, 2024

matt/feat/recursive ctes/config flag matthewgapp/arrow-datafusion#3

Closed
