Release 2.3.0 #818

Closed
hycdong opened this issue Sep 22, 2021 · 0 comments
Labels
release-note Notes on the version release type/incompatible Changes that introduced incompatibility to Pegasus.


Since Pegasus 2.2.0 (released in June 2021), there have been 170 commits, including several useful features and significant bug fixes. We are ready to release Apache Pegasus 2.3.0.

New features

Partition split

Supports table scalability: one partition is split into two, doubling the table's partition count (for example, from 4 to 8). More details can be found in the partition-split-design-documents.
Related pull requests in this release are listed in [#754]
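The clean one-into-two split described above follows from modular key placement when the partition count doubles; a minimal sketch, assuming Pegasus-style placement `partition_index = hash(hash_key) % partition_count` (the helper name is illustrative):

```python
# Illustrative sketch: with modular placement, doubling the partition count
# means every key either stays in its old partition i or moves to partition
# i + OLD_COUNT -- so each partition splits into exactly two.
def partition_index(key_hash: int, partition_count: int) -> int:
    return key_hash % partition_count

OLD_COUNT = 4
NEW_COUNT = OLD_COUNT * 2  # a partition split doubles the count

for key_hash in range(1000):
    old_idx = partition_index(key_hash, OLD_COUNT)
    new_idx = partition_index(key_hash, NEW_COUNT)
    assert new_idx in (old_idx, old_idx + OLD_COUNT)
```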

User-defined compaction strategy

Supports a user-specified compaction policy. More details can be found in the user-specified-compaction-RFC. Related pull requests:

Cluster load balance

Supports whole-cluster load balance. More details can be found in [#761]. Related pull requests:
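As an illustration of round-based balancing bounded by an operation budget per round (compare `balance_op_count_per_round` in the Configurations section below), here is a hedged greedy sketch; names and algorithm are illustrative, not Pegasus's actual balancer:

```python
# Illustrative sketch only: per round, move one replica at a time from the
# most loaded node to the least loaded node, capped at op_count_per_round
# migration operations (mirroring the balance_op_count_per_round option).
def balance_round(replica_counts, op_count_per_round=10):
    counts = dict(replica_counts)
    ops = []
    for _ in range(op_count_per_round):
        src = max(counts, key=counts.get)
        dst = min(counts, key=counts.get)
        if counts[src] - counts[dst] <= 1:
            break  # cluster already balanced; stop early
        counts[src] -= 1
        counts[dst] += 1
        ops.append((src, dst))
    return ops, counts

ops, counts = balance_round({"n1": 10, "n2": 2, "n3": 3})
# after the round, replica counts differ by at most one
assert max(counts.values()) - min(counts.values()) <= 1
```

Capping the number of operations per round keeps each balancing pass cheap and lets the meta server re-evaluate cluster state between rounds.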

One time backup

Supports triggering a backup once, immediately. More details can be found in [#755]. Related pull requests:

Enhancements

New perf-counters

Bug Fixes

Duplication-related fixes

Graceful exit

Thrift unmarshalling fix

ASan fixes

Others

Refactor

Refactor load balance

Refactor pegasus_value_schema

Others

No code update

Configurations

[apps.replica]
- pools = THREAD_POOL_DEFAULT,THREAD_POOL_REPLICATION_LONG,THREAD_POOL_REPLICATION,THREAD_POOL_FD,THREAD_POOL_LOCAL_APP,THREAD_POOL_BLOCK_SERVICE,THREAD_POOL_COMPACT,THREAD_POOL_INGESTION,THREAD_POOL_SLOG,THREAD_POOL_PLOG
+ pools = THREAD_POOL_DEFAULT,THREAD_POOL_REPLICATION_LONG,THREAD_POOL_REPLICATION,THREAD_POOL_FD,THREAD_POOL_LOCAL_APP,THREAD_POOL_BLOCK_SERVICE,THREAD_POOL_COMPACT,THREAD_POOL_INGESTION,THREAD_POOL_SLOG,THREAD_POOL_PLOG,THREAD_POOL_SCAN

+[threadpool.THREAD_POOL_SCAN]
+  name = scan_query
+  partitioned = false
+  worker_priority = THREAD_PRIORITY_NORMAL
+  worker_count = 24

[meta_server]
+ balance_cluster=false
+ balance_op_count_per_round=10

[nfs]
+ max_send_rate_megabytes=500

[replication]
+ reject_write_when_disk_insufficient=true
+ disk_min_available_space_ratio=10
+ ignore_broken_disk=true
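A minimal sketch of the check these `[replication]` options suggest: writes are rejected once the available disk ratio falls below `disk_min_available_space_ratio`. Function and parameter names are illustrative, not Pegasus source:

```python
# Illustrative sketch of reject_write_when_disk_insufficient: compare the
# node's available disk percentage against disk_min_available_space_ratio.
def should_reject_write(available_bytes: int, capacity_bytes: int,
                        reject_when_insufficient: bool = True,
                        min_available_ratio: int = 10) -> bool:
    if not reject_when_insufficient:
        return False
    available_ratio = available_bytes * 100 // capacity_bytes
    return available_ratio < min_available_ratio

assert should_reject_write(5, 100) is True    # 5% free < 10% threshold
assert should_reject_write(20, 100) is False  # 20% free is sufficient
```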

[pegasus.server]
+ read_amp_bytes_per_bit = 0 # 0 means disable read amp counter
- update_rdb_stat_interval = 600
+ update_rdb_stat_interval = 60
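For context on `read_amp_bytes_per_bit`: when it is non-zero, RocksDB tracks total block bytes read versus the bytes estimated actually useful, and read amplification is their ratio. A hedged sketch of that computation (the ticker names in the comment follow RocksDB's statistics; the helper itself is illustrative):

```python
# Illustrative sketch: read amplification as reported via RocksDB tickers
# READ_AMP_TOTAL_READ_BYTES / READ_AMP_ESTIMATE_USEFUL_BYTES. A ratio of
# 4.0 means 4 bytes were read from storage per useful byte served.
def read_amplification(total_read_bytes: int, useful_bytes: int) -> float:
    if useful_bytes == 0:
        return 0.0  # no reads observed yet
    return total_read_bytes / useful_bytes

assert read_amplification(4096, 1024) == 4.0
assert read_amplification(0, 0) == 0.0
```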

// Drop timed-out requests before execution for specific task codes
[task.RPC_RRDB_RRDB_PUT]
+  rpc_request_dropped_before_execution_when_timeout = true

[task.RPC_RRDB_RRDB_GET]
+  rpc_request_dropped_before_execution_when_timeout = true
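A minimal sketch of what `rpc_request_dropped_before_execution_when_timeout` implies: a request whose client timeout has already elapsed while it sat in the queue is dropped instead of executed, since the client has given up on the response anyway. Names below are illustrative:

```python
import time

# Illustrative sketch: drop a queued RPC whose deadline has already passed
# instead of executing it (the client would discard the response anyway).
# A dropped request would bump the *.rpc.dropped perf-counter listed below.
def process(request: dict, now: float = None,
            drop_when_timeout: bool = True) -> str:
    now = time.time() if now is None else now
    if drop_when_timeout and now > request["deadline"]:
        return "dropped"
    return "executed"

assert process({"deadline": 0.0}, now=1.0) == "dropped"
assert process({"deadline": 2.0}, now=1.0) == "executed"
```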

Perf-Counters

// partition split related
+ replica*eon.replica_stub*replicas.splitting.count
+ replica*eon.replica_stub*replicas.splitting.max.duration.time(ms)
+ replica*eon.replica_stub*replicas.splitting.max.async.learn.time(ms)
+ replica*eon.replica_stub*replicas.splitting.max.copy.file.size
+ replica*eon.replica_stub*replicas.splitting.recent.start.count
+ replica*eon.replica_stub*replicas.splitting.recent.copy.file.count
+ replica*eon.replica_stub*replicas.splitting.recent.copy.file.size
+ replica*eon.replica_stub*replicas.splitting.recent.copy.mutation.count
+ replica*eon.replica_stub*replicas.splitting.succ.count
+ replica*eon.replica_stub*replicas.splitting.fail.count
+ replica*eon.replica*recent.write.splitting.reject.count@[gpid]
+ replica*eon.replica*recent.read.splitting.reject.count@[gpid]
+ collector*app.pegasus*app.stat.recent_write_splitting_reject_count#[table_name]
+ collector*app.pegasus*app.stat.recent_read_splitting_reject_count#[table_name]

// backup request throttling
+ replica*eon.replica*recent.backup.request.throttling.delay.count@[table_name]
+ replica*eon.replica*recent.backup.request.throttling.reject.count@[table_name]
+ collector*app.pegasus*app.stat.recent_backup_request_throttling_delay_count#[table_name]
+ collector*app.pegasus*app.stat.recent_backup_request_throttling_reject_count#[table_name]

// table-level hotspot partition count
+ collector*app.pegasus*app.stat.hotspots.temp.read.total#[table_name]
+ collector*app.pegasus*app.stat.hotspots.temp.write.total#[table_name]

// backup request size
+ replica*app.pegasus*backup_request_bytes@[gpid]
+ collector*app.pegasus*backup_request_bytes@[table_name]

// rocksdb read write amplification, hit count
+ replica*app.pegasus*rdb.read_amplification@[gpid]
+ replica*app.pegasus*rdb.write_amplification@[gpid]
+ replica*app.pegasus*rdb.read_memtable_total_count@[gpid]
+ replica*app.pegasus*rdb.read_memtable_hit_count@[gpid]
+ replica*app.pegasus*rdb.read_l0_hit_count@[gpid]
+ replica*app.pegasus*rdb.read_l1_hit_count@[gpid]
+ replica*app.pegasus*rdb.read_l2andup_hit_count@[gpid]
+ collector*app.pegasus*app.stat.rdb_read_amplification#[table_name]
+ collector*app.pegasus*app.stat.rdb.write_amplification#[table_name]
+ collector*app.pegasus*app.stat.rdb.read_memtable_hit_rate#[table_name]
+ collector*app.pegasus*app.stat.rdb.read_l0_hit_rate#[table_name]
+ collector*app.pegasus*app.stat.rdb.read_l1_hit_rate#[table_name]
+ collector*app.pegasus*app.stat.rdb.read_l2andup_hit_rate#[table_name]

// session count 
+ server*network*client_session_count

// bulk load reject write request
- replica_stub.bulk.load.ingestion.reject.write.count
+ replica*eon.replica*recent.write.bulk.load.ingestion.reject.count@[gpid]
+ collector*app.pegasus*app.stat.recent_write_bulk_load_ingestion_reject_count#[table_name]

// RocksDB compaction
+ collector*app.pegasus*app.stat.recent_rdb_compaction_input_bytes#[table_name]
+ collector*app.pegasus*app.stat.recent_rdb_compaction_output_bytes#[table_name]

// unmarshalling failure count
+ replica*app.pegasus*recent_corrupt_write_count@[gpid]

// If a timed-out request is dropped for a task, this counter is incremented, for example (RPC_RRDB_RRDB_PUT):
+ zion*profiler*RPC_RRDB_RRDB_PUT.rpc.dropped

Performance

The following results were measured with YCSB; latency is in microseconds (us).

| Case | Clients and threads | R:W | R-QPS | R-Avg | R-P99 | W-QPS | W-Avg | W-P99 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Write Only | 3 clients * 15 threads | 0:1 | - | - | - | 42386 | 1060 | 6628 |
| Read Only | 3 clients * 50 threads | 1:0 | 331623 | 585 | 2611 | - | - | - |
| Read Write | 3 clients * 30 threads | 1:1 | 38766 | 1067 | 15521 | 38774 | 1246 | 7791 |
| Read Write | 3 clients * 15 threads | 1:3 | 13140 | 819 | 11460 | 39428 | 863 | 4884 |
| Read Write | 3 clients * 15 threads | 1:30 | 1552 | 937 | 9524 | 46570 | 930 | 5315 |
| Read Write | 3 clients * 30 threads | 3:1 | 93746 | 623 | 6389 | 31246 | 996 | 5543 |
| Read Write | 3 clients * 50 threads | 30:1 | 254534 | 560 | 2627 | 8481 | 901 | 3269 |

Contributors

acelyc111
cauchy1988
empiredan
hycdong
levy5307
lidingshengHHU
neverchanje
padmejin
Shuo-Jia
Smityz
zhangyifan27
ZhongChaoqiang

@hycdong hycdong added type/enhancement Indicates new feature requests release-note Notes on the version release and removed type/enhancement Indicates new feature requests labels Sep 22, 2021
@hycdong hycdong changed the title Prepare to Release 2.3.0 Release 2.3.0 Dec 2, 2021
@hycdong hycdong closed this as completed Dec 30, 2021
@foreverneverer foreverneverer added the type/incompatible Changes that introduced incompatibility to Pegasus. label Aug 9, 2022