expression plugin function with multiple columns as input #49
Comments
I concur. The definition of the column is
#[polars_expr(output_type_func=haversine_output)]
fn haversine(inputs: &[Series]) -> PolarsResult<Series> {
let out = match inputs[0].dtype() {
DataType::Float32 => {
let start_lat = inputs[0].f32().unwrap();
let start_long = inputs[1].f32().unwrap();
let end_lat = inputs[2].f32().unwrap();
let end_long = inputs[3].f32().unwrap();
// etc

So from the other examples, we understand that [...] The [...] But I would argue that a potentially very generic sample function with several columns as input and several columns as output would be more helpful and, in terms of API, it would be quite readable to have the inputs and outputs as [...]

Here is the Python shell, super generic:

import polars as pl

# `lib` is assumed to hold the path to the compiled plugin library.
@pl.api.register_expr_namespace("demo")
class Demo:
    def __init__(self, expr: pl.Expr):
        self._expr = expr

    def demo_calc(self) -> pl.Expr:
        return self._expr._register_plugin(
            lib=lib,
            symbol="demo_calc",
            is_elementwise=True,
            kwargs={},
        )

A vaguely worded Rust part would be:

// struct input column
struct FuncStructIn {
a_float: f64,
a_int: i64,
a_string: String,
}
// struct output column
struct FuncStructOut {
a_valid: bool,
a_one: f64,
a_two: f64,
}
// output type is a struct with schema FuncStructOut
#[polars_expr(output_type=FuncStructOut)]
fn demo_calc(inputs: &[Series], kwargs: PigLatinKwargs) -> PolarsResult<Series> {
// unpack struct to individual variables, with all the relevant error checking
// obviously this does not work but you get the idea
let s_in: FuncStructIn = inputs[0].struct();
let a_float: &ChunkedArray<T> = s_in.a_float;
let a_int = s_in.a_int;
let a_string = s_in.a_string;
// impl_demo_calc contains the actual processing
let s_out: FuncStructOut = impl_demo_calc(a_float, a_int, a_string)?;
// returns result
// again, the syntax is very approximate
Ok(s_out.into_series())
}
// approx syntax
pub(super) fn impl_demo_calc(
a_float: &ChunkedArray<Float64Type>,
a_int: &ChunkedArray<Int64Type>,
a_string: &ChunkedArray<Utf8Type>,
) -> PolarsResult<ChunkedArray<FuncStructOut>> {
// can be anything but return a FuncStructOut shaped struct
// unless compute error due to bad input
}
// once this is done, in a second stage, a _with parallelism version can probably be derived.

Hopefully the intention is clear. Unfortunately, I don't know enough about polars-rs, and to be honest Rust in general, at the moment. If there is no serious drawback to this approach (e.g. the performance cost of creating the input struct), then, once the unpacking/repacking and error management are demonstrated, it can become a useful boilerplate to start from for many use cases and people. Whether or not maintainers are willing to improve this part of the doc, I would be grateful for pointers to continue exploring.
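The data flow proposed in the comment above (unpack one struct-shaped input row, compute, repack into a struct-shaped output) can be sketched in plain Python, without Polars. This is only an illustration of the shape of the idea: the dataclass names mirror the post, rows are modelled as plain dicts rather than a real struct column, and the body of `impl_demo_calc` is a hypothetical placeholder computation, not anything from the plugin API.

```python
from dataclasses import asdict, dataclass


@dataclass
class FuncStructIn:
    """One row of the struct input column (field names taken from the post)."""
    a_float: float
    a_int: int
    a_string: str


@dataclass
class FuncStructOut:
    """One row of the struct output column (field names taken from the post)."""
    a_valid: bool
    a_one: float
    a_two: float


def impl_demo_calc(row: FuncStructIn) -> FuncStructOut:
    # Hypothetical placeholder logic: treat an empty string as invalid input.
    if not row.a_string:
        return FuncStructOut(a_valid=False, a_one=0.0, a_two=0.0)
    return FuncStructOut(
        a_valid=True,
        a_one=row.a_float * row.a_int,
        a_two=row.a_float + len(row.a_string),
    )


def demo_calc(rows: list) -> list:
    """Unpack each dict into FuncStructIn, compute, and repack as a dict."""
    out = []
    for r in rows:
        # Raises TypeError if a field is missing or unexpected.
        s_in = FuncStructIn(**r)
        out.append(asdict(impl_demo_calc(s_in)))
    return out
```

Keeping the unpacking and repacking in `demo_calc` and the actual computation in `impl_demo_calc` matches the split suggested in the post, so the computation can be tested on its own.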
Looks like the dev has been done: [...] But I'm still not sure how we can pass multiple columns to a function.
I think this solves a use case brought forward by Polars extension plugin author Ion Koutsouris in this issue #58. In the meantime I've learnt some Rust and done some research, and existing community plugins are the best information sources. There I could find examples with input as a single column of type [...] I am not far from testing it and, if confirmed, my naive question above will be answered.
Does this help? https://marcogorelli.github.io/polars-plugins-tutorial/sum/ I think you're right about the haversine example though, will make a PR to clear it up.
Yes, thanks a lot.
I have the same question :)
YES it does! For somebody intermediate in Rust, beginner in Rust Polars, and proficient in Python Polars, the following sections, in particular, got my attention:
I've looked at the example but the haversine one is a bit confusing.
It looks like the Python example passes 5 parameters to the Rust function, so the last one is ignored because the function only takes 4 parameters.
What is the best way to implement a function that takes multiple columns as parameters and still use all the parallelism and optimization from Polars?
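As a point of reference for what a four-column, element-wise haversine computes row by row, here is a plain-Python sketch using only the standard library. It is not the plugin implementation: columns are modelled as equal-length lists, the function names are hypothetical, and the Earth radius of 6371 km is an assumption. Something like it could serve as a slow baseline for checking a plugin's output.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def haversine_columns(start_lat, start_long, end_lat, end_long):
    """Apply haversine_km element-wise over four equal-length columns."""
    columns = (start_lat, start_long, end_lat, end_long)
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all four input columns must have the same length")
    return [haversine_km(*row) for row in zip(*columns)]
```

All four inputs are consumed explicitly here, which makes it easy to see which argument would go unused if a fifth one were passed by the caller.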