feat(storage): Introduce TtlReclaimSelector and refactor the trigger logic of Scheduler #7937

Merged
merged 28 commits into from
Feb 21, 2023

Conversation

Li0k
Contributor

@Li0k Li0k commented Feb 15, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

It's part of #6918.

Improve and introduce TtlReclaimSelector, and introduce TtlReclaimTrigger, which periodically initiates TTL compaction on the last level so that expired data is reclaimed in a timely manner. As more triggers are introduced, this PR also refactors the Scheduler's trigger logic to keep the code maintainable.

  • Use Stream to simplify the code of the Scheduler trigger
  • Replace the last_index policy with key_range to ensure that the compaction runs correctly
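
The key_range policy in the second bullet can be illustrated with a minimal sketch. The types and the `pick_next` function below are hypothetical simplifications, not the PR's code, and the stated rationale (a saved index goes stale when the file list changes between rounds, while a saved end key does not) is an assumption about why the change was made. The level is assumed to be sorted by key and non-overlapping, as in the last level.

```rust
/// Hypothetical, simplified stand-in for an SST's metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Sst {
    pub id: u64,
    pub left: Vec<u8>,  // smallest key in the file
    pub right: Vec<u8>, // largest key in the file
}

/// Selector progress kept between trigger rounds.
#[derive(Debug, Default)]
pub struct SelectState {
    pub last_select_end_key: Vec<u8>,
}

/// Pick up to `limit` files that start after the last selected end key.
/// An empty end key means "start from the beginning of the level".
pub fn pick_next(level: &[Sst], state: &mut SelectState, limit: usize) -> Vec<Sst> {
    let picked: Vec<Sst> = level
        .iter()
        .filter(|sst| {
            state.last_select_end_key.is_empty() || sst.left > state.last_select_end_key
        })
        .take(limit)
        .cloned()
        .collect();
    match picked.last() {
        // Remember where this round stopped.
        Some(last) => state.last_select_end_key = last.right.clone(),
        // Nothing left after the saved key: wrap around for the next round.
        None => state.last_select_end_key.clear(),
    }
    picked
}
```

With a saved index, removing the first file between rounds would shift every later file by one position; with a saved end key, the next round still resumes at the first file whose range starts after that key.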

Checklist

- [ ] I have written necessary rustdoc comments

  • I have added necessary unit tests and integration tests
    - [ ] I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features).
    - [ ] I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR DOES NOT contain user-facing changes.

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

@Li0k Li0k requested a review from zwang28 February 15, 2023 06:44
@Li0k Li0k changed the title from "Li0k/storage ttl selector" to "feat(storage): Introduce TtlReclaimSelector and refactor the trigger logic of Scheduler" Feb 15, 2023
@Li0k
Contributor Author

Li0k commented Feb 15, 2023

After introducing TtlReclaimSelector in this PR, we can observe that once TtlReclaimCompaction is triggered, the amount of data in the TTL table and the SSTs in L6 stabilize.

image

image

@Li0k Li0k marked this pull request as ready for review February 15, 2023 11:07
@Li0k Li0k force-pushed the li0k/storage_ttl_selector branch from 5bd0d33 to 0f2f377 Compare February 16, 2023 05:27
Contributor

@github-actions github-actions bot left a comment

license-eye has totally checked 2761 files.

| Valid | Invalid | Ignored | Fixed |
|-------|---------|---------|-------|
| 1308  | 1       | 1452    | 0     |
Click to see the invalid file list
  • src/meta/src/hummock/compaction/picker/ttl_reclaim_compaction_picker.rs

Contributor

@zwang28 zwang28 left a comment

LGTM

});

for sst in matched_sst {
state.last_select_end_key = sst.key_range.as_ref().unwrap().right.to_vec();
Contributor

last_select_end_key can be updated once before this method returns instead of being updated in the loop (though the performance penalty is negligible).
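
A minimal sketch of this suggestion, using hypothetical pared-down types rather than the real ones: if `matched_sst` is in key order (an assumption), only the last file's right bound matters, so the state can be written once. Unlike the reviewed snippet, which unwraps `key_range`, this sketch leaves the state untouched when the key range is missing.

```rust
/// Hypothetical, pared-down versions of the types in the reviewed snippet.
pub struct KeyRange {
    pub left: Vec<u8>,
    pub right: Vec<u8>,
}

pub struct SstInfo {
    pub key_range: Option<KeyRange>,
}

#[derive(Default)]
pub struct SelectState {
    pub last_select_end_key: Vec<u8>,
}

/// Record selection progress once, from the last matched file only.
/// Assumes `matched_sst` is ordered by key, so the last file has the largest right bound.
pub fn record_progress(state: &mut SelectState, matched_sst: &[SstInfo]) {
    if let Some(range) = matched_sst.last().and_then(|sst| sst.key_range.as_ref()) {
        state.last_select_end_key = range.right.clone();
    }
}
```

An empty `matched_sst` leaves the previous end key in place, which matches what the loop version would do when it never runs.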

@Li0k Li0k force-pushed the li0k/storage_ttl_selector branch from d95a8d0 to 8ecac67 Compare February 16, 2023 05:59

Signed-off-by: Runji Wang <wangrunji0408@163.com>
@@ -777,12 +777,24 @@ where
let can_trivial_move = matches!(selector.task_type(), compact_task::TaskType::Dynamic);

let mut stats = LocalSelectorStatistic::default();
let table_id_to_option = self
Contributor

Since CatalogManager may hold its lock for a long time when it runs a DDL job, I think we should call get_table_options before acquiring the compaction lock guard.

Contributor Author

The context here is that we need to get the current_version before we can get the corresponding table_ids, although table_id_option is allowed to lag for TTL.

What do you think, @zwang28?

Contributor

@zwang28 zwang28 Feb 17, 2023

Alternatively, we can get all table options (without specifying table_ids) before taking the compaction lock.
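
The lock ordering proposed in this thread can be sketched as follows. This is an illustration only: `Catalog`, `Compaction`, `TableOption`, and both functions are hypothetical stand-ins, and `std::sync::Mutex` is used in place of whatever locks the meta node really uses. The point is that the catalog lock is taken and released first, so a slow DDL job can never stall a thread that already holds the compaction lock.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical per-table option; only the TTL matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct TableOption {
    pub retention_seconds: Option<u32>,
}

/// Stand-in for the catalog manager, whose lock a DDL job may hold for a long time.
pub struct Catalog {
    pub tables: Mutex<HashMap<u32, TableOption>>,
}

/// Stand-in for the state guarded by the compaction lock.
pub struct Compaction {
    pub current_version_table_ids: Mutex<Vec<u32>>,
}

impl Catalog {
    /// Snapshot every table option. The catalog lock is released on return.
    pub fn get_all_table_options(&self) -> HashMap<u32, TableOption> {
        self.tables.lock().unwrap().clone()
    }
}

/// Fetch all options first, then filter them under the compaction lock.
pub fn options_for_current_version(
    catalog: &Catalog,
    compaction: &Compaction,
) -> HashMap<u32, TableOption> {
    // Step 1: catalog lock only. It is already released when this call returns.
    let mut all = catalog.get_all_table_options();
    // Step 2: compaction lock only. Keep the options of tables in the current version.
    let guard = compaction.current_version_table_ids.lock().unwrap();
    let selected: HashMap<u32, TableOption> = guard
        .iter()
        .filter_map(|id| all.remove(id).map(|opt| (*id, opt)))
        .collect();
    selected
}
```

The trade-off mentioned in the thread still applies: the snapshot may lag behind the catalog by the time it is used, which the author considers acceptable for TTL.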

Contributor

@Little-Wallace Little-Wallace left a comment

rest LGTM

Collaborator

@hzxa21 hzxa21 left a comment

Left some comments. Thanks for the PR!


/// Schedule ttl_reclaim compaction for all compaction groups with this interval.
#[serde(default = "default::meta::periodic_ttl_reclaim_compaction_interval_sec")]
pub periodic_ttl_reclaim_compaction_interval_sec: u64,
Collaborator

Not related to this PR: I think the configs that control the trigger intervals of background tasks could also be treated as mutable system parameters so that they are easy to reconfigure. The list is:

  • min_sst_retention_time_sec
  • collect_gc_watermark_spin_interval_sec
  • periodic_compaction_interval_sec
  • vacuum_interval_sec
  • periodic_space_reclaim_compaction_interval_sec
  • periodic_ttl_reclaim_compaction_interval_sec

Any concerns? cc @zwang28 @Gun9niR

// default to zero.
let ttl_mill = (*ttl_second_u32 * 1000) as u64;
let min_epoch = expire_epoch.subtract_ms(ttl_mill);
if Epoch(sst.min_epoch) <= min_epoch {
Collaborator

Would it be better to maintain a per-table_id min_epoch? Otherwise, the compactor may do unnecessary work if the SST contains both TTL tables and rarely updated non-TTL tables. If metadata size is the concern, we can maintain the per-table_id min_epoch only for L6 SSTs. This is an optimization, so it is okay to postpone it to future PRs.

Collaborator

We may also skip iterating KV entries by maintaining per table_id max_epoch when max_epoch < expired epoch

Contributor

I think it is not necessary, because in the bottommost level, most sst files would only have one table

Contributor Author

  1. At the bottommost level, most SSTs will contain only one table_id
  2. In another PR, I would prefer to split SSTs by table_id at the bottommost level
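
The difference between the file-level check in the reviewed snippet and the per-table variant proposed in this thread can be sketched as below. Everything here is a simplified assumption: epochs are treated as plain milliseconds, `SstMeta` and its `table_min_epoch_ms` field are hypothetical, and a table without an entry in the TTL map is treated as having no TTL. The helper `ttl_millis` widens to `u64` before multiplying, so a large TTL in seconds cannot overflow `u32`.

```rust
use std::collections::HashMap;

/// Hypothetical SST metadata. Epochs are plain milliseconds in this sketch.
pub struct SstMeta {
    pub table_ids: Vec<u32>,
    /// Oldest epoch of any entry in the file.
    pub min_epoch_ms: u64,
    /// Oldest epoch per table (the metadata proposed in the review, not existing state).
    pub table_min_epoch_ms: HashMap<u32, u64>,
}

/// Convert a TTL in seconds to milliseconds without overflowing `u32`.
pub fn ttl_millis(ttl_seconds: u32) -> u64 {
    u64::from(ttl_seconds) * 1000
}

/// File-level check: select the SST if any TTL table could have expired data,
/// judged by the file-wide minimum epoch.
pub fn file_level_hit(sst: &SstMeta, now_ms: u64, ttl: &HashMap<u32, u32>) -> bool {
    sst.table_ids.iter().any(|id| match ttl.get(id) {
        Some(&secs) => sst.min_epoch_ms <= now_ms.saturating_sub(ttl_millis(secs)),
        None => false,
    })
}

/// Per-table check: judge each TTL table by its own minimum epoch.
pub fn per_table_hit(sst: &SstMeta, now_ms: u64, ttl: &HashMap<u32, u32>) -> bool {
    sst.table_ids
        .iter()
        .any(|id| match (ttl.get(id), sst.table_min_epoch_ms.get(id)) {
            (Some(&secs), Some(&min_epoch)) => {
                min_epoch <= now_ms.saturating_sub(ttl_millis(secs))
            }
            _ => false,
        })
}
```

For an SST that mixes a fresh TTL table with an old non-TTL table, the file-level check selects the file because of the old non-TTL data, while the per-table check does not. As noted in the replies, this case should be rare if most bottommost SSTs hold a single table_id.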

proto/hummock.proto (resolved)
.await;
continue;
}
SchedulerEvent::TtlReclaimTrigger => {
Collaborator

Will TtlReclaimTrigger and SpaceReclaimTrigger conflict with each other? Under the default config, space reclaim (interval 60 min) happens once every two TTL reclaims (30 min). Since both space and TTL reclaim check L6 SSTs, is it possible for one to win over the other and cause starvation? Space reclaim and TTL reclaim have similar picker logic. This makes me wonder whether we should keep only one general picker and trigger for reclaim that considers both table drop and TTL.

Contributor

No. Each of them exits if it finds that some SST files have already been selected.

Contributor Author

With the existing implementation, only the files within a selected task need to be contiguous, not the whole level. That is, SpaceReclaim alternates with TtlReclaim, and pending status does not cause starvation.
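
To make the timing in this thread concrete, the sketch below merges two periodic triggers into one ordered event list, using the intervals quoted by the reviewer (TTL reclaim every 30 min, space reclaim every 60 min). The `timeline` function and the minute-granularity loop are illustrative assumptions; only the two event names come from the PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SchedulerEvent {
    SpaceReclaimTrigger,
    TtlReclaimTrigger,
}

/// List the (minute, event) pairs fired in the window (0, horizon_min].
/// Both intervals must be non-zero.
pub fn timeline(
    horizon_min: u64,
    ttl_interval_min: u64,
    space_interval_min: u64,
) -> Vec<(u64, SchedulerEvent)> {
    assert!(ttl_interval_min > 0 && space_interval_min > 0);
    let mut events = Vec::new();
    for minute in 1..=horizon_min {
        if minute % ttl_interval_min == 0 {
            events.push((minute, SchedulerEvent::TtlReclaimTrigger));
        }
        if minute % space_interval_min == 0 {
            events.push((minute, SchedulerEvent::SpaceReclaimTrigger));
        }
    }
    events
}
```

With `timeline(120, 30, 60)`, the two triggers coincide at minutes 60 and 120, which is the overlap the reviewer asks about. According to the replies, each picker only needs its own selected files to be contiguous and exits when it meets files that are already selected, so the coinciding triggers alternate rather than starve each other.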

src/meta/src/hummock/manager/mod.rs (resolved)
@codecov

codecov bot commented Feb 21, 2023

Codecov Report

Merging #7937 (ce5c92e) into main (0759ad6) will increase coverage by 0.07%.
The diff coverage is 85.50%.

@@            Coverage Diff             @@
##             main    #7937      +/-   ##
==========================================
+ Coverage   71.39%   71.46%   +0.07%     
==========================================
  Files        1128     1128              
  Lines      181691   182535     +844     
==========================================
+ Hits       129712   130445     +733     
- Misses      51979    52090     +111     
| Flag | Coverage Δ |
|------|------------|
| rust | 71.46% <85.50%> (+0.07%) ⬆️ |

| Impacted Files | Coverage Δ |
|----------------|------------|
| src/meta/src/hummock/mod.rs | 22.22% <ø> (ø) |
| src/meta/src/lib.rs | 0.88% <0.00%> (-0.03%) ⬇️ |
| src/meta/src/hummock/compaction_scheduler.rs | 68.50% <39.26%> (-12.57%) ⬇️ |
| src/meta/src/hummock/manager/mod.rs | 77.83% <76.19%> (-0.13%) ⬇️ |
| src/storage/src/hummock/sstable/builder.rs | 91.45% <80.95%> (-0.36%) ⬇️ |
| ...mpaction/picker/space_reclaim_compaction_picker.rs | 98.44% <97.93%> (-0.62%) ⬇️ |
| ...compaction/picker/ttl_reclaim_compaction_picker.rs | 98.99% <98.70%> (+0.92%) ⬆️ |
| src/common/src/config.rs | 90.17% <100.00%> (+0.13%) ⬆️ |
| src/meta/src/hummock/compaction/level_selector.rs | 97.75% <100.00%> (+0.05%) ⬆️ |
| src/meta/src/hummock/compaction/mod.rs | 83.24% <100.00%> (+0.63%) ⬆️ |

... and 23 more

@mergify mergify bot merged commit 672aad5 into main Feb 21, 2023
@mergify mergify bot deleted the li0k/storage_ttl_selector branch February 21, 2023 11:56
stdrc added a commit that referenced this pull request Feb 23, 2023
commit f2199fe
Author: stonepage <40830455+st1page@users.noreply.github.com>
Date:   Thu Feb 23 16:27:42 2023 +0800

    feat(explain): add conflict behavior in explain materialize operator (#8138)

    as title

    Approved-By: BugenZhao

commit 52a39fd
Author: jon-chuang <9093549+jon-chuang@users.noreply.github.com>
Date:   Thu Feb 23 16:09:34 2023 +0800

    feat(stream): `ErrorSuppressor` for user compute errors (#8132)

    `ErrorSuppressor` for user compute errors

    Approved-By: fuyufjh

    Co-Authored-By: jon-chuang <jon-chuang@users.noreply.github.com>
    Co-Authored-By: jon-chuang <9093549+jon-chuang@users.noreply.github.com>

commit f5f8f83
Author: idx0-dev <124041366+idx0-dev@users.noreply.github.com>
Date:   Thu Feb 23 15:43:13 2023 +0800

    feat: kafka-upsert with json,avro format (#8111)

    To support `upsert-kafka` in a manner similar to [how Flink does](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/upsert-kafka/), the key field of a Kafka message is used to indicate the values of the primary key column. If the value field of the message is not empty, the row will be inserted or updated. If the value field is empty, the row will be deleted. This behavior is not tied to any specific row format.

    A Kafka connector with the `upsert` property enabled will produce `UpsertMessage`s encoded in bytes, instead of raw Kafka message values, as `SourceMessage`s.

    The row formats prefixed with `UPSERT_` are aware that `SourceMessage`s contain not only the Kafka message value field but also the key field as primary columns, and will behave as expected.

    Approved-By: waruto210

    Co-Authored-By: idx0-dev <124041366+idx0-dev@users.noreply.github.com>
    Co-Authored-By: waruto <wmc314@outlook.com>

commit dd7fc13
Author: Yuanxin Cao <60498509+xx01cyx@users.noreply.github.com>
Date:   Thu Feb 23 15:24:02 2023 +0800

    feat(streaming): enable dml executor to pause and resume on scaling (#8110)

    - Enable dml executor to pause and resume on scaling.
    - A little refactor on `StreamReaderWithPause` (previously named `SourceReaderStream`):
    - Make the left arm accept general message types instead of barriers only.
    - Introduce non-biased `StreamReaderWithPause`.

    Fixes #8056

    Approved-By: st1page
    Approved-By: waruto210

commit ba92df4
Author: waruto <wmc314@outlook.com>
Date:   Thu Feb 23 15:04:46 2023 +0800

    fix: remove message name parameter for avro schema (#8124)

    I checked the code for the version where this parameter was first introduced and found that it was never used, it was probably just copied from `ProtobufSchema`.

    It is strange to let the user specify the message name for a avro schema, so we should remove it.

    Approved-By: tabVersion
    Approved-By: hzxa21

commit 5c050ef
Author: Bugen Zhao <i@bugenzhao.com>
Date:   Thu Feb 23 14:42:36 2023 +0800

    feat: fill correct table version ID for DML (#8120)

    This PR implements versioning for DML statements.

    - Frontend: use the correct version ID extracted from the catalog to fill the fields of batch DML plan node protos.
    - Connector: do sanity check on the table schema when registering with the same version.
    - Meta: fill the `version` field of streaming DML plan node proto when visiting the fragment graph.

    Approved-By: chenzl25
    Approved-By: st1page
    Approved-By: xx01cyx

commit 88dc35e
Author: jon-chuang <9093549+jon-chuang@users.noreply.github.com>
Date:   Thu Feb 23 08:09:40 2023 +0800

    feat(frontend): apply `SessionTimezone` and `ConstEvalRewriter` expr rewriters to during `gen_{batch,stream}_plan` (#7761)

    apply `SessionTimezone` and `ConstEvalRewriter` expr rewriters to during `gen_{batch,stream}_plan`

    Notes:
    - wait for #7757 to be merged
    - wait for #7777 to be merged
    - wait for #7786 to be merged

    Approved-By: ice1000
    Approved-By: st1page

    Co-Authored-By: jon-chuang <jon-chuang@users.noreply.github.com>
    Co-Authored-By: jon-chuang <9093549+jon-chuang@users.noreply.github.com>

commit 7cadc39
Author: Zhidong Guo <52783948+Gun9niR@users.noreply.github.com>
Date:   Thu Feb 23 03:19:36 2023 +0800

    feat(meta): mutable checkpoint frequency (#8010)

    As title. Use `LocalNotification` to asyncly notify other components on the meta node of the latest params.

    Approved-By: BugenZhao

    Co-Authored-By: Gun9niR <gun9nir.guo@gmail.com>
    Co-Authored-By: Zhidong Guo <52783948+Gun9niR@users.noreply.github.com>

commit 3a598fb
Author: Wallace <bupt2013211450@gmail.com>
Date:   Wed Feb 22 20:04:37 2023 +0800

    fix(storage): fix calculate incorrect memory usage of sstable meta (#8126)

    close #8125

    Approved-By: soundOfDestiny
    Approved-By: Li0k

commit c60a5db
Author: William Wen <44139337+wenym1@users.noreply.github.com>
Date:   Wed Feb 22 19:13:44 2023 +0800

    fix(ci): make java-binding-e2e release test depends on build-release (#8127)

    In the release CI yaml, there is no item named `build`, which is different to the PR CI. Therefore, the CI in PR works, but fails on release CI. This PR fix it.

    Approved-By: xxchan

commit 8619b11
Author: Wallace <bupt2013211450@gmail.com>
Date:   Wed Feb 22 18:21:53 2023 +0800

    fix(storage): fix skip delete range in uncommitted files (#8009)

    Approved-By: Li0k

commit 864fb46
Author: xiangjinwu <17769960+xiangjinwu@users.noreply.github.com>
Date:   Wed Feb 22 17:44:56 2023 +0800

    feat(expr): access `jsonb` object field and array element (#8023)

    Adds the following expressions:
    * `jsonb_object_field(jsonb, varchar) -> jsonb`
    * `jsonb_array_element(jsonb, int) -> jsonb`
    * `jsonb_object_field_text(jsonb, varchar) -> varchar`
    * `jsonb_array_element_text(jsonb, int) -> varchar`
    * `jsonb_typeof(jsonb) -> varchar`
    * `jsonb_array_length(jsonb) -> int`

    The first two are actually operator `->` in PostgreSQL, and the two in the middle are operator `->>` in PostgreSQL. But our parser does not support parsing this syntax yet.

    The optimization of constant rhs will be added in a followup.

    Approved-By: BugenZhao

    Co-Authored-By: Xiangjin <xiangjin@singularity-data.com>
    Co-Authored-By: Xiangjin <xiangjin@risingwave-labs.com>

commit 88cb075
Author: ZENOTME <43447882+ZENOTME@users.noreply.github.com>
Date:   Wed Feb 22 16:56:07 2023 +0800

    refactor: add CastError for cast function  (#8090)

    add the specified error for cast function.
    refer more detail: #8074

    Approved-By: xiangjinwu
    Approved-By: BugenZhao

commit 05e7a0e
Author: congyi wang <58715567+wcy-fdu@users.noreply.github.com>
Date:   Wed Feb 22 16:35:07 2023 +0800

    refactor(storage): OpenDAL backend use batch delete (#8054)

    Approved-By: Li0k

    Co-Authored-By: congyi <15605187270@163.com>
    Co-Authored-By: congyi wang <58715567+wcy-fdu@users.noreply.github.com>

commit 014eb09
Author: August <pin@singularity-data.com>
Date:   Wed Feb 22 15:59:04 2023 +0800

    feat(meta): export metrics of meta count/role info (#8057)

    Export metrics of meta count and role infos to grafana.

    Approved-By: shanicky

commit 8dff620
Author: Bugen Zhao <i@bugenzhao.com>
Date:   Wed Feb 22 15:35:18 2023 +0800

    feat(streaming): support output indices in dispatchers (#8094)

    This PR adds support for output indices in each dispatcher. Here are the motivations:

    - For multiple MVs on an upstream MV, it's possible that each of them requires different columns of the upstreams. Currently, we do this projection in the downstream `Chain` node. However, if we allow creating mview on remote compute nodes (like spot instances), directly pruning the unused columns in upstream will decrease the remote shuffle cost as described in #4529.

    - For adding columns in schema change, there should be a layer that erases the schema change from `Materialize` to the downstream. By introducing the output indices in dispatchers, we can make the existing downstream MV receive chunks with the same schema and work correctly. For new downstream MVs after schema change, the new dispatcher will be able to output all columns. (#6903)

    Note that the optimization mentioned in Motivation 1 is not implemented in this PR. Currently, we just always output all columns in every dispatcher.

    Approved-By: fuyufjh
    Approved-By: chenzl25
    Approved-By: xxchan

commit 8e499c7
Author: Shanicky Chen <peng@singularity-data.com>
Date:   Wed Feb 22 15:15:54 2023 +0800

    fix: Update EtcdElectionClient keep_alive/observe behavior (#8058)

    Approved-By: yezizp2012

commit 6c5f68f
Author: Yuanxin Cao <60498509+xx01cyx@users.noreply.github.com>
Date:   Wed Feb 22 14:33:02 2023 +0800

    feat(sink): prune out hidden columns on sink (#8099)

    - Prune out hidden columns on sink
    - Reject upsert sink without pk after pruning
    - Refine `StreamSink` explain format

    Approved-By: tabVersion
    Approved-By: st1page

commit 229a3c7
Author: jon-chuang <9093549+jon-chuang@users.noreply.github.com>
Date:   Wed Feb 22 12:52:39 2023 +0800

    feat(stream): `source_error_count` reporting to prometheus  (#7877)

    Source stream error reporting to prometheus

    ![image](https://user-images.githubusercontent.com/9093549/218451821-fd1cbccf-e28b-42f2-a777-1fa88cccf6a4.png)

    Approved-By: fuyufjh

    Co-Authored-By: jon-chuang <jon-chuang@users.noreply.github.com>
    Co-Authored-By: jon-chuang <9093549+jon-chuang@users.noreply.github.com>

commit 8a242fe
Author: zwang28 <70626450+zwang28@users.noreply.github.com>
Date:   Wed Feb 22 12:30:00 2023 +0800

    chore(ci): add log for flaky meta backup test (#8109)

    Add logs to troubleshoot flaky test in #7850

    Approved-By: Li0k

commit fe9c5c5
Author: William Wen <44139337+wenym1@users.noreply.github.com>
Date:   Wed Feb 22 12:09:08 2023 +0800

    test(java-binding): add ci for java-binding (#7942)

    As titled.

    Added a CI case for java-binding that launches a RisingWave cluster, inserts data to the table, and runs the java binding demo to read data from the table.

    The previous cargo make script to run the demo is split for better reuse of script code.

    Approved-By: hzxa21
    Approved-By: Gun9niR

    Co-Authored-By: William Wen <william123.wen@gmail.com>
    Co-Authored-By: William Wen <44139337+wenym1@users.noreply.github.com>

commit e16e26d
Author: Dylan <chenzl25@mail2.sysu.edu.cn>
Date:   Wed Feb 22 11:42:42 2023 +0800

    feat(frontend): describe stmt shows index ordering (#8073)

    - As title.

    Approved-By: yezizp2012
    Approved-By: cyliu0

    Co-Authored-By: Dylan Chen <zilin@singularity-data.com>
    Co-Authored-By: Dylan <chenzl25@mail2.sysu.edu.cn>

commit b429e9c
Author: xiangjinwu <17769960+xiangjinwu@users.noreply.github.com>
Date:   Wed Feb 22 10:44:18 2023 +0800

    fix(common): `ListArray::from_protobuf` expects wrong cardinality of inner array (#8091)

    The internal of `ListArray` stores its data flattened. For example, `values (array[1]), (array[]::int[]), (null), (array[2, 3]);` stores an inner `I32Array` with `[1, 2, 3]`, along with offset array `[0, 1, 1, 1, 3]` and null bitmap `TTFT`.

    The cardinality of this inner array is not the length of outer array, but the last element of offset array.

    Fixes #8082

    Approved-By: kwannoel

    Co-Authored-By: Xiangjin <xiangjin@singularity-data.com>
    Co-Authored-By: xiangjinwu <17769960+xiangjinwu@users.noreply.github.com>

commit 8bc69d4
Author: Noel Kwan <47273164+kwannoel@users.noreply.github.com>
Date:   Wed Feb 22 08:18:57 2023 +0800

    fix(batch): enforce order for `LogicalValues` created by empty `LogicalScan` (#8079)

    - Fix #8067.

    Approved-By: chenzl25
    Approved-By: jon-chuang

    Co-Authored-By: Noel Kwan <noelkwan1998@gmail.com>
    Co-Authored-By: Noel Kwan <47273164+kwannoel@users.noreply.github.com>

commit 4b6e093
Author: jon-chuang <9093549+jon-chuang@users.noreply.github.com>
Date:   Wed Feb 22 01:29:36 2023 +0800

    feat(frontend): add `trace!` to optimizer trace. (#8092)

    add `trace!` to optimizer trace. Easier debugging for when a frontend optimization step goes wrong.

    Approved-By: kwannoel
    Approved-By: fuyufjh

    Co-Authored-By: jon-chuang <jon-chuang@users.noreply.github.com>
    Co-Authored-By: jon-chuang <9093549+jon-chuang@users.noreply.github.com>

commit f11c53b
Author: Bugen Zhao <i@bugenzhao.com>
Date:   Wed Feb 22 00:39:59 2023 +0800

    fix: iterate vnode bitmap with `iter_vnodes` (#8083)

    This is to avoid confusing `usize` with `VirtualNode` when iterating the vnode bitmap as much as possible. For example, we've found that the `DeleteRange` is incorrect caused by calling `usize::to_be_bytes` by mistake for vnode prefix.

    Approved-By: soundOfDestiny
    Approved-By: TennyZhuang
    Approved-By: hzxa21

    Co-Authored-By: Bugen Zhao <i@bugenzhao.com>
    Co-Authored-By: TennyZhuang <zty0826@gmail.com>

commit 4835160
Author: Li0k <yuli@singularity-data.com>
Date:   Tue Feb 21 21:31:29 2023 +0800

    chore(storage): remove state store v1 (#8102)

    remove unused code state_store_v1 and local_version_manager

    Approved-By: wenym1

commit 7a0316b
Author: Runji Wang <wangrunji0408@163.com>
Date:   Tue Feb 21 20:46:29 2023 +0800

    feat(udf): minimal Python UDF SDK (#7943)

    This PR designs a minimal SDK for Python UDFs.

    Now you can define a function in Python like this:

    ```python
    from risingwave.udf import udf, UdfServer

    @udf(input_types=['INT', 'INT'], result_type='INT')
    def gcd(x: int, y: int) -> int:
        while y != 0:
            (x, y) = (y, x % y)
        return x

    if __name__ == '__main__':
        server = UdfServer()
        server.add_function(gcd)
        server.serve()
    ```

    This PR also fixes the problem when functions have no input arguments.

    Approved-By: xxchan
    Approved-By: BugenZhao

commit f11bb62
Author: xxchan <xxchan22f@gmail.com>
Date:   Tue Feb 21 13:16:18 2023 +0100

    doc: add e2e_test/generated/README.md (#8104)

    💦

    Approved-By: richardchien

commit 4b374c3
Author: William Wen <44139337+wenym1@users.noreply.github.com>
Date:   Tue Feb 21 20:13:45 2023 +0800

    test(sink): update the link to download spark tgz for iceberg test (#8095)

    Fix #8093

    Approved-By: xxchan
    Approved-By: Li0k
    Approved-By: jon-chuang

    Co-Authored-By: William Wen <william123.wen@gmail.com>
    Co-Authored-By: William Wen <44139337+wenym1@users.noreply.github.com>

commit 672aad5
Author: Li0k <yuli@singularity-data.com>
Date:   Tue Feb 21 19:56:23 2023 +0800

    feat(storage): Introduce TtlReclaimSelector and refactor the trigger logic of Scheduler (#7937)

    Its part of #6918

    Improve and introduce TtlReclaimSelector, and introduce TtlReclaimTrigger to periodically initiate compaction against ttl for LastLevel to ensure that data can be reclaimed in a timely manner.  As more triggers are introduced, try to refactor the Scheduler's Trigger logic to ensure the maintainability of the code.
    - Stream to simplify the code of the Scheduler trigger
    - Replace the last_index policy with key_range to ensure that the compaction runs correctly

    Approved-By: zwang28
    Approved-By: Little-Wallace

    Co-Authored-By: Li0k <yuli@singularity-data.com>
    Co-Authored-By: Runji Wang <wangrunji0408@163.com>

Signed-off-by: Richard Chien <stdrc@outlook.com>

5 participants