use tokio threadpool and thread local metrics for readpool #4486
Conversation
Signed-off-by: Breezewish <breezewish@pingcap.com>
Signed-off-by: fredchenbj <cfworking@163.com>
Hi contributor, thanks for your PR. This patch needs to be approved by one of the admins. They should reply with "/ok-to-test" to accept this PR for running tests automatically.
src/coprocessor/endpoint.rs (outdated)
```diff
 // Keep running stream producer
-cpu_future.forget();
+// cpu_future.forget();
```
PTAL @hicqu
You can remove the line directly. It's OK because `spawn` has polled it internally. BTW I prefer to write

```rust
self.read_pool.spawn(...)?;
Ok(rx.then(|r| r.unwrap()))
```

to make it more clear that `spawn` returns a `Result<()>`.
src/server/readpool/mod.rs (outdated)
```rust
assert_eq!(rx.recv().unwrap(), Ok(7));
assert_eq!(rx.recv().unwrap(), Ok(4));
// the recv order may be: "Ok(2)Ok(4)Ok(7)Ok(3)" or "Ok(2)Ok(3)Ok(4)Ok(7)" or "Ok(2)Ok(4)Ok(3)Ok(7)"
print!("{:?}", rx.recv().unwrap());
```
I think we should use `assert_eq!`, not `print!`.
Previously the recv order was always Ok(2)Ok(3)Ok(7)Ok(4), but now the order changes on every run. So I am not sure whether it is a problem.
Then let's not check the recv order any more. Let's only check whether or not full is returned, since futurepool already has complete tests. This is a work-stealing pool and the scheduling order is not as predictable as the previous one.
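For illustration, such a test could check results order-independently. This is only a sketch, assuming a hypothetical `build_read_pool_for_test()` helper and a `spawn` that returns `Result<()>`:

```rust
use std::sync::mpsc::channel;

#[test]
fn test_spawn_unordered() {
    let pool = build_read_pool_for_test(); // assumed helper, e.g. Builder::build_for_test()
    let (tx, rx) = channel();
    for &v in &[2, 3, 7, 4] {
        let tx = tx.clone();
        pool.spawn(move || {
            tx.send(v).unwrap();
        })
        .unwrap(); // an Err here would mean the pool is full
    }
    // Collect all results, then assert on the sorted values rather than on
    // arrival order, which a work-stealing pool does not guarantee.
    let mut results: Vec<_> = (0..4).map(|_| rx.recv().unwrap()).collect();
    results.sort();
    assert_eq!(results, vec![2, 3, 4, 7]);
}
```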
src/coprocessor/mod.rs (outdated)
```diff
@@ -35,21 +35,21 @@ mod endpoint;
 mod error;
 pub mod local_metrics;
 mod metrics;
-mod readpool_context;
+mod read_pool_impl;
```
em, I prefer `readpool_impl`
src/storage/read_pool_impl.rs (outdated)
```rust
}

#[inline]
fn thread_local_flush(pd_sender: &FutureScheduler<PdTask>) {
```
I think using `thread_local` everywhere is long and redundant. Where we want to represent thread-local, we can mostly use `tls` instead.
Yeah, tls looks to be a good name!
src/util/future_pool/mod.rs (outdated)
```rust
}
}

/// Tried to trigger a tick in current thread.
```
Try to
Maybe a typo. It should be Tries
Thanks @fredchenbj. It is a very cool feature.
I think we should do more benchmarks. /cc @breeswish please help @fredchenbj do some.
After this, I even think we can remove the other thread pool: we can use `future::lazy` to wrap the task, and so we can unify the thread pools. But we should also do the benchmark; IMO, the tokio thread pool has better performance than our thread pool. @breeswish

Another thing is to support dynamically changing the thread number in the pool, but we must be careful about this, because now we collect thread metrics and use the thread ID as a label value. Dynamic threads mean we may send too many label values to Prometheus. So maybe for the thread pool, we can use the thread name instead of the thread ID. /cc @overvenus
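To illustrate the unification idea: a plain closure-based task can be wrapped with `future::lazy` so the same futures-based pool runs both kinds of work. A minimal sketch, assuming the futures 0.1 / tokio-threadpool 0.1 APIs in use at the time (the `spawn_plain_task` helper is hypothetical):

```rust
use futures::future;

/// Runs an ordinary closure on a futures-based pool by wrapping it
/// in a lazily-evaluated future.
fn spawn_plain_task<F>(pool: &tokio_threadpool::ThreadPool, task: F)
where
    F: FnOnce() + Send + 'static,
{
    // `future::lazy` defers running `task` until the future is first
    // polled on one of the pool's worker threads.
    pool.spawn(future::lazy(move || {
        task();
        Ok::<(), ()>(()) // the executor requires Item = () and Error = ()
    }));
}
```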
Thanks a lot! Mostly fine. How about the metrics? Have you checked that they are working as intended?
src/coprocessor/endpoint.rs (outdated)
```diff
-.future_execute(priority, move |ctxd| {
-    tracker.attach_ctxd(ctxd);
+.spawn_handle(priority, move || {
+    tracker.init_current_stage();
```
We can now mark the state as initialized when the tracker is built, so this line is no longer needed.
```diff
-    ReadPoolContext::new(pd_worker.scheduler())
-});
+let pool =
+    coprocessor::ReadPoolImpl::build_read_pool(read_pool_cfg, pd_worker.scheduler(), "cop-fix");
```
Is the name really important? I guess most of the time the default name should be enough, because the rest of the usages are in tests.
For this one maybe `Builder::build_for_test()` is enough.
```diff
-    storage::ReadPoolContext::new(pd_worker.scheduler())
-});
+let pd_worker = FutureWorker::new("test-pd-worker");
+let storage_read_pool = storage::ReadPoolImpl::build_read_pool(
```
You may try to replace it using `Builder::build_for_test` as well. It may work.
src/coprocessor/endpoint.rs (outdated)
```diff
-let read_pool = ReadPool::new(
-    "readpool",
+let read_pool = ReadPoolImpl::build_read_pool(
```
For this one maybe we can use `Builder::from_config(..).build()` (because we don't need on_tick or before_stop in this test). Similar for others.
src/storage/read_pool_impl.rs (outdated)
```rust
static LOCAL_KV_COMMAND_SCAN_DETAILS: RefCell<LocalIntCounterVec> =
    RefCell::new(KV_COMMAND_SCAN_DETAILS.local());

static LOCAL_PD_SENDER: RefCell<Option<FutureScheduler<PdTask>>> =
```
Let's remove it since it is not used.
src/coprocessor/read_pool_impl.rs (outdated)
```rust
    LOCAL_COPR_EXECUTOR_COUNT.with(|m| m.borrow_mut().flush());
}

pub fn collect(region_id: u64, type_str: &str, metrics: ExecutorMetrics) {
```
Let's rename it to make it more clear. Maybe `thread_local_collect_executor_metrics`?
src/server/readpool/mod.rs (outdated)
```rust
struct Context;

impl futurepool::Context for Context {}

#[test]
fn test_future_execute() {
```
Let's rename it because there is no more "future_execute".
Signed-off-by: fredchenbj <cfworking@163.com>
Signed-off-by: fredchenbj <cfworking@163.com>
src/coprocessor/readpool_impl.rs (outdated)
```rust
use crate::coprocessor::dag::executor::ExecutorMetrics;

thread_local! {
    pub static LOCAL_COPR_REQ_HISTOGRAM_VEC: RefCell<LocalHistogramVec> =
```
Oh, we have so many metrics; is it better to use a structure to wrap them all, so we can use only one thread-local var instead? /cc @breeswish
I'm fine with both, maybe not much difference.
em, maybe we can do a benchmark: one local struct vs. multiple local vars
ok, I will do a benchmark about this.
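A sketch of what such a micro-benchmark could look like (illustrative only; plain `Cell`/`RefCell` counters stand in for the Prometheus local metrics):

```rust
use std::cell::{Cell, RefCell};

struct Combined {
    a: u64,
    b: u64,
}

thread_local! {
    // Variant 1: several independent thread-local vars.
    static A: Cell<u64> = Cell::new(0);
    static B: Cell<u64> = Cell::new(0);

    // Variant 2: one thread-local struct wrapping all fields.
    static AB: RefCell<Combined> = RefCell::new(Combined { a: 0, b: 0 });
}

fn bench_multi_vars(iters: u64) {
    for _ in 0..iters {
        // One TLS lookup per variable touched.
        A.with(|a| a.set(a.get() + 1));
        B.with(|b| b.set(b.get() + 1));
    }
}

fn bench_one_struct(iters: u64) {
    for _ in 0..iters {
        // One TLS lookup and one borrow check for all fields.
        AB.with(|ab| {
            let mut ab = ab.borrow_mut();
            ab.a += 1;
            ab.b += 1;
        });
    }
}
```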
Signed-off-by: fredchenbj <cfworking@163.com>
Good job! I'm fine with this PR, as long as the metrics are working as intended.
@breeswish please paste your benchmark results too
/run-integration-tests
Signed-off-by: fredchenbj <cfworking@163.com>
Signed-off-by: fredchenbj <cfworking@163.com>
src/coprocessor/readpool_impl.rs (outdated)
```rust
use prometheus::local::*;

use crate::coprocessor::dag::executor::ExecutorMetrics;
pub struct TlsCop {
```
add a blank line here
src/storage/readpool_impl.rs (outdated)
```rust
pub struct ReadPoolImpl;

impl ReadPoolImpl {
    #[inline]
```
Seems we don't need `#[inline]` here.
Thanks @fredchenbj Great work!!! PTAL @breeswish @hicqu
/run-integration-tests
src/bin/tikv-server.rs (outdated)
```diff
-ReadPool::new("store-read", &cfg.readpool.storage.build_config(), || {
-    storage::ReadPoolContext::new(pd_sender.clone())
-});
+let storage_read_pool = storage::ReadPoolImpl::build_read_pool(
```
Personally I prefer `storage::ReadPool`. `Impl` looks like a private thing.
`ReadPool` has already been used; maybe use `ReadPoolProducer`. Is that OK?
Or `ReadPoolContext`? It just builds a `ReadPool` and handles some metrics. It's not a `ReadPool` indeed.
It just "derive"s the common ReadPool to create a specialized ReadPool with an attached name and some lifetime hook functions (like on_tick). That's why it was called `ReadPoolImpl`. `Producer` or `Builder` might not be very good names because they would be confusing for functions like `Producer::tls_collect_executor_metrics`.
Agreed that `Producer` and `Builder` are not good enough. How about removing the struct?
src/coprocessor/readpool_impl.rs (outdated)
```rust
pub local_copr_rocksdb_perf_counter: RefCell<LocalIntCounterVec>,
local_copr_executor_count: RefCell<LocalIntCounterVec>,
local_copr_get_or_scan_count: RefCell<LocalIntCounterVec>,
local_cop_flow_stats: RefCell<HashMap<u64, crate::storage::FlowStatistics>>,
```
There are too many `RefCell`s. How about putting the struct in a `RefCell` or `Mutex`? I think it's clearer.
Nice catch. You should arrange them as:

```rust
pub struct Xxx {
    field: LocalIntCounter,
    field_2: LocalIntCounter,
    // ...
}

thread_local! {
    pub static TLS_COP_METRICS: RefCell<TlsCop> = ...;
}
```

In this way, we only need to check the borrow once when updating multiple fields.
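A hypothetical call site for this layout (field names are the placeholders from the sketch above, not the real metrics):

```rust
TLS_COP_METRICS.with(|m| {
    // A single borrow_mut() covers every field update in this batch.
    let mut m = m.borrow_mut();
    m.field.inc();
    m.field_2.inc();
});
```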
Rest LGTM. Thank you very much!
Signed-off-by: fredchenbj <cfworking@163.com>
Signed-off-by: fredchenbj <cfworking@163.com>
Friendly ping @siddontang @breeswish @hicqu
Signed-off-by: fredchenbj <cfworking@163.com>
Signed-off-by: fredchenbj <cfworking@163.com>
Signed-off-by: fredchenbj <cfworking@163.com>
LGTM.
Signed-off-by: fredchenbj <cfworking@163.com>
PTAL @breeswish
Signed-off-by: fredchenbj <cfworking@163.com>
Signed-off-by: fredchenbj <cfworking@163.com>
Signed-off-by: fredchenbj <cfworking@163.com>
Force-pushed from 90c8280 to 4338a7c
Ping @siddontang @breeswish @hicqu, please take a look.
LGTM
PTAL @hicqu @breeswish
* *: use tokio-threadpool and thread local metrics in Storage. Signed-off-by: Breezewish <breezewish@pingcap.com>
What have you changed? (mandatory)

Previously, `futures-cpupool` was used to implement `ReadPool`, but `tokio-threadpool` is faster and more stable under high contention or heavy workload, so this PR replaces it to improve the performance of storage reads and coprocessor requests. Meanwhile, thread-local variables replace the context struct for metrics.

What are the types of the changes? (mandatory)
Improvement (change which is an improvement to an existing feature).
How has this PR been tested? (mandatory)
Unit tests, integration tests, and partial manual tests.
Does this PR affect documentation (docs) update? (mandatory)
No.
Does this PR affect tidb-ansible update? (mandatory)
No.
Refer to a related PR or issue link (optional)
No.
Benchmark result if necessary (optional)
From the screenshot above, under high workload and stable QPS, the p99 latency of reads is reduced by about 14%, and the p999 latency by about 20%.
Add a few positive/negative examples (optional)