-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(udf): minimal Python UDF SDK #7943
Conversation
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
dfd0838
to
37d0704
Compare
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
statement ok | ||
create function gcd(int, int, int) returns int as 'http://localhost:8815' language arrow_flight; | ||
|
||
# Create a function with the same name and arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about creating a function with the same name as a build-in function? 🥵
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At present, the creation will succeed but the usage will still refer to the built-in version. I will fix it later. 🥵
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀🚀🚀
let mut path = name.to_string() + "/"; | ||
for (i, arg) in args.fields.iter().enumerate() { | ||
if i != 0 { | ||
path += ","; | ||
} | ||
path += &arg.data_type().to_string().to_lowercase(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about extracting this snippet to make the naming consistent everywhere? Like...
risingwave/src/frontend/src/handler/create_function.rs
Lines 86 to 97 in 896d92e
// check if function exists | |
if (session.env().catalog_reader().read_guard()) | |
.get_schema_by_id(&database_id, &schema_id)? | |
.get_function_by_name_args(&function_name, &arg_types) | |
.is_some() | |
{ | |
let name = format!( | |
"{function_name}({})", | |
arg_types.iter().map(|t| t.to_string()).join(",") | |
); | |
return Err(CatalogError::Duplicated("function", name).into()); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! But this snippet will be deprecated soon. I'm going to decouple the function name between kernel and UDF server in the next PR. The syntax of CREATE FUNCTION will be changed correspondingly to:
create function gcd(int, int) returns int language python as gcd using link 'http://localhost:8815';
where gcd(int,int)
is the name in kernel and gcd
is the name in UDF server.
This approach refers to Flink and I think it's better than force unified name. 🤔
Co-authored-by: xxchan <xxchan22f@gmail.com>
Co-authored-by: xxchan <xxchan22f@gmail.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
b98fd1a
to
56d3b25
Compare
Signed-off-by: Runji Wang <wangrunji0408@163.com>
56d3b25
to
ae26889
Compare
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Codecov Report
@@ Coverage Diff @@
## main #7943 +/- ##
==========================================
- Coverage 71.46% 71.45% -0.01%
==========================================
Files 1128 1128
Lines 182535 182542 +7
==========================================
- Hits 130445 130433 -12
- Misses 52090 52109 +19
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
commit f2199fe Author: stonepage <40830455+st1page@users.noreply.github.com> Date: Thu Feb 23 16:27:42 2023 +0800 feat(explain): add conflict behavior in explain materialize operator (#8138) as title Approved-By: BugenZhao commit 52a39fd Author: jon-chuang <9093549+jon-chuang@users.noreply.github.com> Date: Thu Feb 23 16:09:34 2023 +0800 feat(stream): `ErrorSuppressor` for user compute errors (#8132) `ErrorSuppressor` for user compute errors Approved-By: fuyufjh Co-Authored-By: jon-chuang <jon-chuang@users.noreply.github.com> Co-Authored-By: jon-chuang <9093549+jon-chuang@users.noreply.github.com> commit f5f8f83 Author: idx0-dev <124041366+idx0-dev@users.noreply.github.com> Date: Thu Feb 23 15:43:13 2023 +0800 feat: kafka-upsert with json,avro format (#8111) To support `upsert-kafka` in a manner similar to [how Flink does](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/upsert-kafka/), the key field of a Kafka message is used to indicate the values of the primary key column. If the value field of the message is not empty, the row will be inserted or updated. If the value field is empty, the row will be deleted. This behavior is not tied to any specific row format. A Kafka connector with the `upsert` property enabled will produce `UpsertMessage`s encoded in bytes, instead of raw Kafka message values, as `SourceMessage`s. The row formats prefixed with `UPSERT_` are aware that `SourceMessage`s contain not only the Kafka message value field but also the key field as primary columns, and will behave as expected. Approved-By: waruto210 Co-Authored-By: idx0-dev <124041366+idx0-dev@users.noreply.github.com> Co-Authored-By: waruto <wmc314@outlook.com> commit dd7fc13 Author: Yuanxin Cao <60498509+xx01cyx@users.noreply.github.com> Date: Thu Feb 23 15:24:02 2023 +0800 feat(streaming): enable dml executor to pause and resume on scaling (#8110) - Enable dml executor to pause and resume on scaling. - A little refactor on `StreamReaderWithPause` (previously named `SourceReaderStream`): - Make the left arm accept general message types instead of barriers only. - Introduce non-biased `StreamReaderWithPause`. Fixes #8056 Approved-By: st1page Approved-By: waruto210 commit ba92df4 Author: waruto <wmc314@outlook.com> Date: Thu Feb 23 15:04:46 2023 +0800 fix: remove message name parameter for avro schema (#8124) I checked the code for the version where this parameter was first introduced and found that it was never used, it was probably just copied from `ProtobufSchema`. It is strange to let the user specify the message name for a avro schema, so we should remove it. Approved-By: tabVersion Approved-By: hzxa21 commit 5c050ef Author: Bugen Zhao <i@bugenzhao.com> Date: Thu Feb 23 14:42:36 2023 +0800 feat: fill correct table version ID for DML (#8120) This PR implements versioning for DML statements. - Frontend: use the correct version ID extracted from the catalog to fill the fields of batch DML plan node protos. - Connector: do sanity check on the table schema when registering with the same version. - Meta: fill the `version` field of streaming DML plan node proto when visiting the fragment graph. Approved-By: chenzl25 Approved-By: st1page Approved-By: xx01cyx commit 88dc35e Author: jon-chuang <9093549+jon-chuang@users.noreply.github.com> Date: Thu Feb 23 08:09:40 2023 +0800 feat(frontend): apply `SessionTimezone` and `ConstEvalRewriter` expr rewriters to during `gen_{batch,stream}_plan` (#7761) apply `SessionTimezone` and `ConstEvalRewriter` expr rewriters to during `gen_{batch,stream}_plan` Notes: - wait for #7757 to be merged - wait for #7777 to be merged - wait for #7786 to be merged Approved-By: ice1000 Approved-By: st1page Co-Authored-By: jon-chuang <jon-chuang@users.noreply.github.com> Co-Authored-By: jon-chuang <9093549+jon-chuang@users.noreply.github.com> commit 7cadc39 Author: Zhidong Guo <52783948+Gun9niR@users.noreply.github.com> Date: Thu Feb 23 03:19:36 2023 +0800 feat(meta): mutable checkpoint frequency (#8010) As title. Use `LocalNotification` to asyncly notify other components on the meta node of the latest params. Approved-By: BugenZhao Co-Authored-By: Gun9niR <gun9nir.guo@gmail.com> Co-Authored-By: Zhidong Guo <52783948+Gun9niR@users.noreply.github.com> commit 3a598fb Author: Wallace <bupt2013211450@gmail.com> Date: Wed Feb 22 20:04:37 2023 +0800 fix(storage): fix calculate incorrect memory usage of sstable meta (#8126) close #8125 Approved-By: soundOfDestiny Approved-By: Li0k commit c60a5db Author: William Wen <44139337+wenym1@users.noreply.github.com> Date: Wed Feb 22 19:13:44 2023 +0800 fix(ci): make java-binding-e2e release test depends on build-release (#8127) In the release CI yaml, there is no item named `build`, which is different to the PR CI. Therefore, the CI in PR works, but fails on release CI. This PR fix it. Approved-By: xxchan commit 8619b11 Author: Wallace <bupt2013211450@gmail.com> Date: Wed Feb 22 18:21:53 2023 +0800 fix(storage): fix skip delete range in uncommitted files (#8009) Approved-By: Li0k commit 864fb46 Author: xiangjinwu <17769960+xiangjinwu@users.noreply.github.com> Date: Wed Feb 22 17:44:56 2023 +0800 feat(expr): access `jsonb` object field and array element (#8023) Adds the following expressions: * `jsonb_object_field(jsonb, varchar) -> jsonb` * `jsonb_array_element(jsonb, int) -> jsonb` * `jsonb_object_field_text(jsonb, varchar) -> varchar` * `jsonb_array_element_text(jsonb, int) -> varchar` * `jsonb_typeof(jsonb) -> varchar` * `jsonb_array_length(jsonb) -> int` The first two are actually operator `->` in PostgreSQL, and the two in the middle are operator `->>` in PostgreSQL. But our parser does not support parsing this syntax yet. The optimization of constant rhs will be added in a followup. Approved-By: BugenZhao Co-Authored-By: Xiangjin <xiangjin@singularity-data.com> Co-Authored-By: Xiangjin <xiangjin@risingwave-labs.com> commit 88cb075 Author: ZENOTME <43447882+ZENOTME@users.noreply.github.com> Date: Wed Feb 22 16:56:07 2023 +0800 refactor: add CastError for cast function (#8090) add the specified error for cast function. refer more detail: #8074 Approved-By: xiangjinwu Approved-By: BugenZhao commit 05e7a0e Author: congyi wang <58715567+wcy-fdu@users.noreply.github.com> Date: Wed Feb 22 16:35:07 2023 +0800 refactor(storage): OpenDAL backend use batch delete (#8054) Approved-By: Li0k Co-Authored-By: congyi <15605187270@163.com> Co-Authored-By: congyi wang <58715567+wcy-fdu@users.noreply.github.com> commit 014eb09 Author: August <pin@singularity-data.com> Date: Wed Feb 22 15:59:04 2023 +0800 feat(meta): export metrics of meta count/role info (#8057) Export metrics of meta count and role infos to grafana. Approved-By: shanicky commit 8dff620 Author: Bugen Zhao <i@bugenzhao.com> Date: Wed Feb 22 15:35:18 2023 +0800 feat(streaming): support output indices in dispatchers (#8094) This PR adds support for output indices in each dispatcher. Here are the motivations: - For multiple MVs on an upstream MV, it's possible that each of them requires different columns of the upstreams. Currently, we do this projection in the downstream `Chain` node. However, if we allow creating mview on remote compute nodes (like spot instances), directly pruning the unused columns in upstream will decrease the remote shuffle cost as described in #4529. - For adding columns in schema change, there should be a layer that erases the schema change from `Materialize` to the downstream. By introducing the output indices in dispatchers, we can make the existing downstream MV receive chunks with the same schema and work correctly. For new downstream MVs after schema change, the new dispatcher will be able to output all columns. (#6903) Note that the optimization mentioned in Motivation 1 is not implemented in this PR. Currently, we just always output all columns in every dispatcher. Approved-By: fuyufjh Approved-By: chenzl25 Approved-By: xxchan commit 8e499c7 Author: Shanicky Chen <peng@singularity-data.com> Date: Wed Feb 22 15:15:54 2023 +0800 fix: Update EtcdElectionClient keep_alive/observe behavior (#8058) Approved-By: yezizp2012 commit 6c5f68f Author: Yuanxin Cao <60498509+xx01cyx@users.noreply.github.com> Date: Wed Feb 22 14:33:02 2023 +0800 feat(sink): prune out hidden columns on sink (#8099) - Prune out hidden columns on sink - Reject upsert sink without pk after pruning - Refine `StreamSink` explain format Approved-By: tabVersion Approved-By: st1page commit 229a3c7 Author: jon-chuang <9093549+jon-chuang@users.noreply.github.com> Date: Wed Feb 22 12:52:39 2023 +0800 feat(stream): `source_error_count` reporting to prometheus (#7877) Source stream error reporting to prometheus ![image](https://user-images.githubusercontent.com/9093549/218451821-fd1cbccf-e28b-42f2-a777-1fa88cccf6a4.png) Approved-By: fuyufjh Co-Authored-By: jon-chuang <jon-chuang@users.noreply.github.com> Co-Authored-By: jon-chuang <9093549+jon-chuang@users.noreply.github.com> commit 8a242fe Author: zwang28 <70626450+zwang28@users.noreply.github.com> Date: Wed Feb 22 12:30:00 2023 +0800 chore(ci): add log for flaky meta backup test (#8109) Add logs to troubleshoot flaky test in #7850 Approved-By: Li0k commit fe9c5c5 Author: William Wen <44139337+wenym1@users.noreply.github.com> Date: Wed Feb 22 12:09:08 2023 +0800 test(java-binding): add ci for java-binding (#7942) As titled. Added a CI case for java-binding that launches a RisingWave cluster, inserts data to the table, and runs the java binding demo to read data from the table. The previous cargo make script to run the demo is split for better reuse of script code. Approved-By: hzxa21 Approved-By: Gun9niR Co-Authored-By: William Wen <william123.wen@gmail.com> Co-Authored-By: William Wen <44139337+wenym1@users.noreply.github.com> commit e16e26d Author: Dylan <chenzl25@mail2.sysu.edu.cn> Date: Wed Feb 22 11:42:42 2023 +0800 feat(frontend): describe stmt shows index ordering (#8073) - As title. Approved-By: yezizp2012 Approved-By: cyliu0 Co-Authored-By: Dylan Chen <zilin@singularity-data.com> Co-Authored-By: Dylan <chenzl25@mail2.sysu.edu.cn> commit b429e9c Author: xiangjinwu <17769960+xiangjinwu@users.noreply.github.com> Date: Wed Feb 22 10:44:18 2023 +0800 fix(common): `ListArray::from_protobuf` expects wrong cardinality of inner array (#8091) The internal of `ListArray` stores its data flattened. For example, `values (array[1]), (array[]::int[]), (null), (array[2, 3]);` stores an inner `I32Array` with `[1, 2, 3]`, along with offset array `[0, 1, 1, 1, 3]` and null bitmap `TTFT`. The cardinality of this inner array is not the length of outer array, but the last element of offset array. Fixes #8082 Approved-By: kwannoel Co-Authored-By: Xiangjin <xiangjin@singularity-data.com> Co-Authored-By: xiangjinwu <17769960+xiangjinwu@users.noreply.github.com> commit 8bc69d4 Author: Noel Kwan <47273164+kwannoel@users.noreply.github.com> Date: Wed Feb 22 08:18:57 2023 +0800 fix(batch): enforce order for `LogicalValues` created by empty `LogicalScan` (#8079) - Fix #8067. Approved-By: chenzl25 Approved-By: jon-chuang Co-Authored-By: Noel Kwan <noelkwan1998@gmail.com> Co-Authored-By: Noel Kwan <47273164+kwannoel@users.noreply.github.com> commit 4b6e093 Author: jon-chuang <9093549+jon-chuang@users.noreply.github.com> Date: Wed Feb 22 01:29:36 2023 +0800 feat(frontend): add `trace!` to optimizer trace. (#8092) add `trace!` to optimizer trace. Easier debugging for when a frontend optimization step goes wrong. Approved-By: kwannoel Approved-By: fuyufjh Co-Authored-By: jon-chuang <jon-chuang@users.noreply.github.com> Co-Authored-By: jon-chuang <9093549+jon-chuang@users.noreply.github.com> commit f11c53b Author: Bugen Zhao <i@bugenzhao.com> Date: Wed Feb 22 00:39:59 2023 +0800 fix: iterate vnode bitmap with `iter_vnodes` (#8083) This is to avoid confusing `usize` with `VirtualNode` when iterating the vnode bitmap as much as possible. For example, we've found that the `DeleteRange` is incorrect caused by calling `usize::to_be_bytes` by mistake for vnode prefix. Approved-By: soundOfDestiny Approved-By: TennyZhuang Approved-By: hzxa21 Co-Authored-By: Bugen Zhao <i@bugenzhao.com> Co-Authored-By: TennyZhuang <zty0826@gmail.com> commit 4835160 Author: Li0k <yuli@singularity-data.com> Date: Tue Feb 21 21:31:29 2023 +0800 chore(storage): remove state store v1 (#8102) remove unused code state_store_v1 and local_version_manager Approved-By: wenym1 commit 7a0316b Author: Runji Wang <wangrunji0408@163.com> Date: Tue Feb 21 20:46:29 2023 +0800 feat(udf): minimal Python UDF SDK (#7943) This PR designs a minimal SDK for Python UDFs. Now you can define a function in Python like this: ```python from risingwave.udf import udf, UdfServer @udf(input_types=['INT', 'INT'], result_type='INT') def gcd(x: int, y: int) -> int: while y != 0: (x, y) = (y, x % y) return x if __name__ == '__main__': server = UdfServer() server.add_function(gcd) server.serve() ``` This PR also fixes the problem when functions have no input arguments. Approved-By: xxchan Approved-By: BugenZhao commit f11bb62 Author: xxchan <xxchan22f@gmail.com> Date: Tue Feb 21 13:16:18 2023 +0100 doc: add e2e_test/generated/README.md (#8104) 💦 Approved-By: richardchien commit 4b374c3 Author: William Wen <44139337+wenym1@users.noreply.github.com> Date: Tue Feb 21 20:13:45 2023 +0800 test(sink): update the link to download spark tgz for iceberg test (#8095) Fix #8093 Approved-By: xxchan Approved-By: Li0k Approved-By: jon-chuang Co-Authored-By: William Wen <william123.wen@gmail.com> Co-Authored-By: William Wen <44139337+wenym1@users.noreply.github.com> commit 672aad5 Author: Li0k <yuli@singularity-data.com> Date: Tue Feb 21 19:56:23 2023 +0800 feat(storage): Introduce TtlReclaimSelector and refactor the trigger logic of Scheduler (#7937) Its part of #6918 Improve and introduce TtlReclaimSelector, and introduce TtlReclaimTrigger to periodically initiate compaction against ttl for LastLevel to ensure that data can be reclaimed in a timely manner. As more triggers are introduced, try to refactor the Scheduler's Trigger logic to ensure the maintainability of the code. - Stream to simplify the code of the Scheduler trigger - Replace the last_index policy with key_range to ensure that the compaction runs correctly Approved-By: zwang28 Approved-By: Little-Wallace Co-Authored-By: Li0k <yuli@singularity-data.com> Co-Authored-By: Runji Wang <wangrunji0408@163.com> Signed-off-by: Richard Chien <stdrc@outlook.com>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR designs a minimal SDK for Python UDFs.
Now you can define a function in Python like this:
This PR also fixes the problem when functions have no input arguments.
Checklist
I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features).I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)./risedev check
(or alias,./risedev c
)Documentation