Export minimum C API and examples for C, Ruby and Python #2622
Closes apache#1113

This exports minimum C API to write the following Rust code in C:

    use datafusion::prelude::*;

    #[tokio::main]
    async fn main() -> datafusion::error::Result<()> {
        // register the table
        let mut ctx = ExecutionContext::new();
        // create a plan to run a SQL query
        let df = ctx.sql("SELECT 1").await?;
        // execute and print results
        df.show().await?;
        Ok(())
    }

See datafusion/c/examples/sql.c for C version. You can build and run datafusion/c/examples/sql.c by the following command lines:

    $ cargo build
    $ cc -o target/debug/sql datafusion/c/examples/sql.c -Idatafusion/c/include -Ltarget/debug -Wl,--rpath=target/debug -ldatafusion_c
    $ target/debug/sql
    +----------+
    | Int64(1) |
    +----------+
    | 1        |
    +----------+

This implementation doesn't export Future like datafusion-python. Async functions are block_on()-ed in exported API. But I think that we can export Future in follow-up tasks.

Follow-up tasks:
* Add support for testing by "cargo test"
* Add support for building and running examples by "cargo ..."
* Add support for installing datafusion.h
I've added examples that use the C API from Python and Ruby with an FFI library.
Hi @kou and thanks for the contribution! This looks really interesting but I would like to understand more about the motivation and context for this.

Making DataFusion accessible from C makes sense but I am wondering if we should create a separate repository for this in https://github.com/datafusion-contrib/. This is where we have the Java and Python bindings for DataFusion.

I'm also curious why we would want to go from Python -> C -> Rust rather than just Python -> Rust directly as we do in https://github.com/datafusion-contrib/datafusion-python.

I am also concerned that adding C code as part of the default build may be problematic for some users. I assume there are some minimum requirements for having this work on all platforms?
I looked over the code and tried it out. Thank you very much @kou.
(arrow_dev) alamb@MacBook-Pro-6:~/Software/arrow-datafusion$ cc -o target/debug/sql -I datafusion/c/include datafusion/c/examples/sql.c -L target/debug -ldatafusion_c
(arrow_dev) alamb@MacBook-Pro-6:~/Software/arrow-datafusion$ ./target/debug/sql
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
I agree with @andygrove that this code could also reasonably live in another crate / repo rather than the core datafusion one.
Some suggestions:
- It would be nice to put a minimal readme in datafusion/c (saying, for example, that the directory contains the C API, see examples/README.md for more details). Or maybe we could move datafusion/c/examples/README.md to datafusion/c/README.md to make it more discoverable
- I think it would be a good idea to track planned follow on items as individual tasks
API. But I think that we can export Future in follow-up tasks.
I am not familiar with how to interface Rust async functions with C -- I would assume it looks like callbacks somehow, but I can imagine how it gets very tricky very quickly
Ok(value) => Some(value),
Err(e) => {
    if !error.is_null() {
        let c_string_message = match CString::new(format!("{}", e)) {
- let c_string_message = match CString::new(format!("{}", e)) {
+ let c_string_message = match CString::new(e.to_string()) {
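For readers unfamiliar with the pattern under review, here is a minimal std-only sketch of reporting an error through a C out-parameter. It is not the PR's code: `set_error`, `demo_parse_i64` and `demo_error_free` are hypothetical names, and a real export would also need a no-mangle attribute, which is omitted here. It assumes the caller owns the returned message and hands it back to Rust to be freed.

```rust
use std::ffi::{CStr, CString};
use std::fmt::Display;
use std::os::raw::c_char;

/// Hypothetical helper: store `e` as a C string in `*error` if the caller
/// passed a non-null out-parameter. Ownership of the string moves to the caller.
fn set_error<E: Display>(error: *mut *mut c_char, e: E) {
    if error.is_null() {
        return; // the caller is not interested in the message
    }
    // CString::new fails when the message contains an interior NUL byte,
    // so fall back to a fixed message instead of panicking across the FFI boundary.
    let message = CString::new(e.to_string())
        .unwrap_or_else(|_| CString::new("error message contained a NUL byte").unwrap());
    unsafe { *error = message.into_raw() };
}

/// Hypothetical exported function: parse a decimal integer.
/// Returns 0 and sets `*error` on failure.
pub extern "C" fn demo_parse_i64(text: *const c_char, error: *mut *mut c_char) -> i64 {
    if text.is_null() {
        set_error(error, "text is NULL");
        return 0;
    }
    let text = unsafe { CStr::from_ptr(text) }.to_string_lossy();
    match text.trim().parse::<i64>() {
        Ok(value) => value,
        Err(e) => {
            set_error(error, e);
            0
        }
    }
}

/// Hypothetical counterpart that gives the message back to Rust for deallocation.
pub extern "C" fn demo_error_free(error: *mut c_char) {
    if !error.is_null() {
        drop(unsafe { CString::from_raw(error) });
    }
}
```

The fallback branch is the reason the reviewed code matches on the result of `CString::new`: the conversion itself can fail, and the failure has to be handled without unwinding into C.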
}

fn block_on<F: Future>(future: F) -> F::Output {
    tokio::runtime::Runtime::new().unwrap().block_on(future)
- tokio::runtime::Runtime::new().unwrap().block_on(future)
+ tokio::runtime::Runtime::new().expect("Can not create tokio runtime").block_on(future)
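To illustrate what "blocking on" a future at a synchronous boundary involves, here is a std-only sketch. It deliberately swaps tokio's runtime for a hand-rolled `block_on` built on thread parking, so it is a stand-in for the idea and not the code in this PR. `ThreadWaker`, `YieldOnce` and `demo_query_blocking` are hypothetical names.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Std-only stand-in for a runtime's `block_on`: poll the future on the
/// current thread and park until the waker fires.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // A spurious wake-up only causes one extra poll.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical future standing in for an async query: pending on the
/// first poll, ready with its value on the second.
struct YieldOnce {
    yielded: bool,
    value: i64,
}

impl Future for YieldOnce {
    type Output = i64;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i64> {
        if self.yielded {
            Poll::Ready(self.value)
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// Hypothetical exported function: synchronous from the caller's point of view.
pub extern "C" fn demo_query_blocking(value: i64) -> i64 {
    block_on(YieldOnce { yielded: false, value })
}
```

Calling `demo_query_blocking(1)` returns only after the future has completed, which is the behaviour a C caller sees from an API whose async functions are blocked on internally.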
Regarding the repository, I'm OK with it. Could you create https://github.com/datafusion-contrib/datafusion-c or something? Or should I create my own repository for this?
Sorry for confusing you. I didn't want to suggest that we should use this for Python bindings. I just wanted to show that we can use an FFI library for bindings as a use case of this C API. I should have used Julia or something rather than Python.
They make sense. I'll do them in the new repository for this.
I will not use callbacks. I'm thinking of an API like the following:

int
main(void)
{
  DFSessionContext *context = df_session_context_new();
  DFError *error = NULL;
  DFDataFrameFuture *sql_future =
    df_session_context_sql_async(context, "SELECT 1;", &error);
  if (error) {
    printf("failed to start SQL: %s\n", df_error_get_message(error));
    df_error_free(error);
    df_session_context_free(context);
    return EXIT_FAILURE;
  }
  DFDataFrame *data_frame = df_data_frame_future_await(sql_future, &error);
  if (error) {
    printf("failed to run SQL: %s\n", df_error_get_message(error));
    df_error_free(error);
    df_session_context_free(context);
    return EXIT_FAILURE;
  }
  DFFuture *show_future = df_data_frame_show_async(data_frame, &error);
  if (error) {
    printf("failed to start showing data frame: %s\n",
           df_error_get_message(error));
    df_error_free(error);
    df_data_frame_free(data_frame);
    df_session_context_free(context);
    return EXIT_FAILURE;
  }
  df_future_await(show_future, &error);
  if (error) {
    printf("failed to show data frame: %s\n",
           df_error_get_message(error));
    df_error_free(error);
    df_data_frame_free(data_frame);
    df_session_context_free(context);
    return EXIT_FAILURE;
  }
  df_data_frame_free(data_frame);
  df_session_context_free(context);
  return EXIT_SUCCESS;
}

But I may change my mind.

Thanks for the suggestions for my Rust code! This is my first Rust program, so suggestions are very welcome. :-)
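As a rough idea of what the Rust side of such a future-handle API might look like, here is a std-only sketch. Everything in it is an assumption made for illustration: `DemoFuture`, `demo_answer_async` and `demo_future_await` are hypothetical names, the future is a trivial `std::future::ready`, and the await loop is a hand-rolled replacement for a real runtime such as tokio.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical opaque handle handed to C: a boxed, type-erased future.
pub struct DemoFuture(Pin<Box<dyn Future<Output = i64>>>);

/// Waker that unparks the thread waiting in `demo_future_await`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical `*_async` entry point: nothing runs yet, the caller only
/// receives an owning pointer to the future.
pub extern "C" fn demo_answer_async(value: i64) -> *mut DemoFuture {
    let future = std::future::ready(value);
    Box::into_raw(Box::new(DemoFuture(Box::pin(future))))
}

/// Hypothetical `*_await` entry point: takes ownership of the handle,
/// drives it to completion on the calling thread, writes the result to
/// `out` and frees the handle. Returns false if either pointer is null.
pub extern "C" fn demo_future_await(future: *mut DemoFuture, out: *mut i64) -> bool {
    if future.is_null() || out.is_null() {
        return false;
    }
    // Reclaim the box so the future is dropped when this function returns.
    let mut future = unsafe { Box::from_raw(future) };
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.0.as_mut().poll(&mut cx) {
            Poll::Ready(value) => {
                unsafe { *out = value };
                return true;
            }
            Poll::Pending => thread::park(),
        }
    }
}
```

In this sketch awaiting consumes the handle, so the C caller has no separate free call for a future that was awaited. Whether the proposed `df_*_await` functions would behave the same way is not stated in this thread.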
Thanks @kou. I created a new repo https://github.com/datafusion-contrib/datafusion-c where you can PR this work. Thanks again for adding another language binding!
I agree with either creating a separate repo or merging as is. Thanks for the good work! I guess the Java binding can now use this C interface, possibly with the new jextract tool as well.
Hello, I've been watching DataFusion from the sidelines and am interested in using it in a Zig project. As a disclaimer, I'm new to Rust and am not that familiar with its async implementation.

DataFusion's "query engine as a library" is similar to SQLite's "database as a library". SQLite's success is in part due to how easy it is to integrate it into any project, regardless of the language it uses. It'd be wonderful if DataFusion was also as easy to use in all projects!

Today, a key challenge is Rust's async. While using blocking tricks is a great short-term solution, it is inefficient (it requires multiple threads) and can cause deadlock issues. Callbacks are also not a perfect solution (what if my language runtime requires stack unwinding?). In my opinion this would be helped by:
Very cool demo @kou! Agreed that it would be better to manage it through the datafusion-c repo in the contrib GitHub org. With regards to async, I think we can use the tokio runtime's block_on to hide all the async APIs behind a set of sync APIs, similar to what we do in our Python binding.
Yes, I think this is the most reasonable suggestion -- don't expose any async APIs and have DataFusion do its thread pool / IO management internally. If people want the additional performance or resource control, they could use the Rust APIs directly.
Moving this to draft to avoid accidental merge
I'm closing this in favor of datafusion-contrib/datafusion-c#1.
Which issue does this PR close?
Closes #1113.
Rationale for this change
See #1113.
What changes are included in this PR?
This exports minimum C API to write the following Rust code in C:
See datafusion/c/examples/sql.c for C version. You can build and run
datafusion/c/examples/sql.c by the following command lines:
This implementation doesn't export Future like
datafusion-python. Async functions are block_on()-ed in exported
API. But I think that we can export Future in follow-up tasks.
Follow-up tasks:
Are there any user-facing changes?
Users can use DataFusion from C and/or FFI.
Does this PR break compatibility with Ballista?
No.