-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OP-TEE Benchmark #1365
OP-TEE Benchmark #1365
Conversation
f283897
to
6b613c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change enables benchmarking for all OP-TEE invocations. Is this what you expected ?
core/arch/arm/pta/sta_benchmark.c
Outdated
(TEE_PARAM_TYPE_GET(type, 3) != TEE_PARAM_TYPE_NONE)) { | ||
return TEE_ERROR_BAD_PARAMETERS; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should register also the buffer size.
core/arch/arm/pta/sta_benchmark.c
Outdated
DMSG("Registering stat buffer, addr = %p, paddr = %p\n", | ||
p[1].memref.buffer, | ||
(void *)virt_to_phys((void *)p[1].memref.buffer)); | ||
bench_stat_buf = (void *)p[1].memref.buffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: buffer
is already a void *
. here and there.
core/arch/arm/pta/sta_benchmark.c
Outdated
DMSG("Sending back timestamp buffer paddr, paddr = %p\n", | ||
(void *)virt_to_phys(bench_ts_buf)); | ||
p[0].value.a = virt_to_phys(bench_ts_buf); | ||
p[1].value.a = virt_to_phys(bench_stat_buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value.a
are 32bit, while bench buffers physical addresses might be 64bit.
core/arch/arm/pta/sta_benchmark.c
Outdated
DMSG("Unregistering benchmark, timestamp buffer paddr = %p\n", | ||
(void *)virt_to_phys(bench_ts_buf)); | ||
bench_ts_buf = 0; | ||
bench_stat_buf = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: NULL
rather than 0
. debug trace displays only the timestamp buffer, not the statistics one.
core/arch/arm/pta/sta_benchmark.c
Outdated
static void close_session(void *psess __unused) | ||
{ | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: as create_ta/destroy_ta/open_session/close_session are stubs, you can simply discard them.
core/include/tee_bench_core.h
Outdated
static inline uint64_t read_ccounter(void) | ||
{ | ||
uint64_t ccounter = 0; | ||
#ifdef __aarch64__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: add empty line
core/include/tee_bench_core.h
Outdated
static inline __attribute__((always_inline)) uintptr_t read_pc(void) | ||
{ | ||
uintptr_t pc = 0; | ||
#ifdef __aarch64__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: add empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ARM specific, please store it below core/arch/arm
core/include/tee_bench_core.h
Outdated
struct tee_time_buf *timeb = NULL; | ||
uint64_t ts_i; | ||
|
||
if ((cmd_id & BENCHMARK_CMD(0)) == BENCHMARK_CMD(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about TAs that want to use such command IDs ? they won't be able to get benchmark stats.
core/include/tee_bench_core.h
Outdated
static inline __attribute__((always_inline)) | ||
void bm_timestamp(uint32_t source, uint32_t cmd_id) | ||
{ | ||
struct tee_time_buf *timeb = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: useless init
core/include/tee_bench_core.h
Outdated
static void bm_timestamp(uint32_t source __unused, | ||
uint32_t cmd_id __unused) | ||
{ | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add inline
minor: need this ;
?
core/arch/arm/pta/sta_benchmark.c
Outdated
DMSG("Registering timestamp buffer, addr = %p, paddr = %p\n", | ||
p[0].memref.buffer, | ||
(void *)virt_to_phys((void *)p[0].memref.buffer)); | ||
bench_ts_buf = (void *)p[0].memref.buffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cast not needed
core/arch/arm/pta/sta_benchmark.c
Outdated
|
||
DMSG("Registering timestamp buffer, addr = %p, paddr = %p\n", | ||
p[0].memref.buffer, | ||
(void *)virt_to_phys((void *)p[0].memref.buffer)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cast not needed
Use PRIxPA
instead of p
to print paddr_t
values.
core/arch/arm/pta/sta_benchmark.c
Outdated
uint32_t nCommandID, uint32_t nParamTypes, | ||
TEE_Param pParams[TEE_NUM_PARAMS]) | ||
{ | ||
DMSG("command entry point for static ta \"%s\"", TA_NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How useful is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not useful at all, just forgot to delete it :)
core/arch/arm/pta/sta_benchmark.c
Outdated
@@ -0,0 +1,160 @@ | |||
/* | |||
* Copyright (c) 2015, Linaro Limited |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or drop the sta_
prefix entirely.
core/include/tee_bench_core.h
Outdated
*/ | ||
|
||
#ifndef TEE_BENCH_H | ||
#define TEE_BENCH_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both the tee
and core
part of this filename doesn't seem a bit redundant.
core/include/tee_bench_core.h
Outdated
static inline __attribute__((always_inline)) uintptr_t read_pc(void) | ||
{ | ||
uintptr_t pc = 0; | ||
#ifdef __aarch64__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ARM specific, please store it below core/arch/arm
@@ -450,8 +450,7 @@ | |||
* } | |||
*/ | |||
#define TEE_PARAM_TYPES(t0,t1,t2,t3) \ | |||
((t0) | ((t1) << 4) | ((t2) << 8) | ((t3) << 12)) | |||
|
|||
((t0) | ((t1) << 4) | ((t2) << 8) | ((t3) << 12)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change not needed
core/include/tee_bench_core.h
Outdated
|
||
if (timeb->tm_ind >= TEE_BENCH_MAX_STAMPS) | ||
return; | ||
ts_i = timeb->tm_ind++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeb->tm_ind
should be copied before checking the value as normal world may have a window where it can update it unchecked.
What about concurrent updates?
core/include/tee_bench_core.h
Outdated
if (!bench_ts_buf) | ||
return; | ||
|
||
timeb = (struct tee_time_buf *) bench_ts_buf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cast not needed
core/include/tee_bench_core.h
Outdated
} | ||
|
||
/* Adding timestamp to buffer */ | ||
static inline __attribute__((always_inline)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With __builtin_return_address(0)
instead below, or some clever macro to supply the return address as an argument to this function it wouldn't need to be inlined.
6b613c8
to
8afc92f
Compare
0079d7b
to
7cc94f0
Compare
core/arch/arm/pta/benchmark.c
Outdated
{ | ||
uint64_t ccounter = 0; | ||
#ifdef __aarch64__ | ||
asm volatile("mrs %0, PMCCNTR_EL0" : "=r"(ccounter)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a read_pmccntr_el0()
or similar in arm64.h
and use that instead, and something similar for the ARM32 version below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already moving it there :)
DMSG("Registering timestamp buffer, addr = %p, paddr = %" PRIxPA "\n", | ||
p[0].memref.buffer, | ||
virt_to_phys(p[0].memref.buffer)); | ||
bench_ts_global = p[0].memref.buffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should check p[0].memref.buffer
is mapped non-secure (case secure client) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the client is secure, the buffer virtual address is likely to be temporary (to this TA invocation). Core should get the paddr and gets it core mapping to get a reliable virtual address for the buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@etienne-lms II didn't get it. Could you please provide any example of secure client case?
Or you mean TA-to-TA calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a TA invokes your pTA, the memref parameters can be SHM references or secure memory reference (specific case when TA use a memref parameter on a TA private buffer).
As we expect here a SHM buffer (later RPC to tee-supplicant), maybe you can only check tee_pbuf_is_non_sec(p[0].memref.buffer, p[0].memref.size)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added check
core/arch/arm/pta/benchmark.c
Outdated
|
||
/* fill timestamp data */ | ||
if (cur_cpu >= bench_ts_global->cores) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- thread_set_foreign_intr(true);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See line 199
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or I can move call of read_ccounter();
a bit higher (to cur_cpu = get_core_pos();
) to re-enable interrupt handling as soon as possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (cur_cpu >= bench_ts_global->cores) {
thread_set_foreign_intr(true);
return;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, got it
core/arch/arm/pta/benchmark.c
Outdated
cpu_buf = &bench_ts_global->cpu_buf[cur_cpu]; | ||
ts_i = __sync_fetch_and_add(&cpu_buf->head, 1); | ||
ts_data.cnt = read_ccounter(); | ||
ts_data.addr = (uintptr_t) ret_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: remove space before ret_addr
@@ -287,6 +288,8 @@ static void entry_invoke_command(struct thread_smc_args *smc_args, | |||
struct tee_ta_session *s; | |||
struct tee_ta_param param; | |||
|
|||
bm_timestamp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a known ID on which REE gets comprehensive.
I think this should be like bm_timestamp(CORE_PRE/POST_IMVOKE_TA);
Maybe this can come later... Specified bm points and free bm points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
saving __builtin_return_address(0);
in bm_timestamp()
isn't sufficient for these purposes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the PC gives a clear view of trace point when rendering the output data to a human readable format, then it is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's going to be mapped to specific file:line and function (the same as addr2line
does)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is OK as it is for now.
core/arch/arm/pta/benchmark.c
Outdated
if (!bench_ts_global) | ||
return; | ||
|
||
thread_set_foreign_intr(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if called with foreign interruptions not enabled? If will be enabled at routine exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@etienne-lms good question, my first intention was to use thread_restore_foreign_intr()
, but, actually, I didn't fully get where and how the previous CPSR value is saved (based on only one example of thread_restore_foreign_intr()
I found in void tee_svc_handler(struct thread_svc_regs *regs)
implementation), so the only one option was just to do it manually with thread_get_exceptions
/thread_set_exceptions
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint32_t exceptions = thread_mask_exceptions(THREAD_EXCP_ALL);
...
thread_unmask_exceptions(exceptions);
Or at least, mask THREAD_EXCP_FOREIGN_INTR
as it is the noe that can makes thread to migrate to another cpu.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
core/arch/arm/pta/benchmark.c
Outdated
|
||
/* fill timestamp data */ | ||
if (cur_cpu >= bench_ts_global->cores) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (cur_cpu >= bench_ts_global->cores) {
thread_set_foreign_intr(true);
return;
}
@etienne-lms @jenswi-linaro Could check please also https://github.com/linaro-swg/optee_benchmark/pull/1/files, it includes almost 40-50% of the functionality. Thanks |
10e96b1
to
5e1556b
Compare
core/arch/arm/pta/benchmark.c
Outdated
|
||
mutex_lock(&bench_reg_mu); | ||
|
||
/* we accept only non-secure buffers, as we later perform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: But I think checkpatch will complain here (multiline comment shall start with /*
on a single line).
core/arch/arm/pta/benchmark.c
Outdated
|
||
/* we accept only non-secure buffers, as we later perform | ||
* registration of this buffer in NS layers | ||
* (optee linx kmod/optee client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/linx/Linux/
core/arch/arm/pta/benchmark.c
Outdated
exceptions = thread_mask_exceptions(THREAD_EXCP_ALL); | ||
cur_cpu = get_core_pos(); | ||
|
||
/* fill timestamp data */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems a bit malplaced.
core/include/bench.h
Outdated
#define TEE_BENCH_DIVIDER 64 | ||
|
||
/* max amount of timestamps per buffer */ | ||
#define TEE_BENCH_MAX_STAMPS 32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a rather low default value? Does it mean what it says? I.e, there is room for 32 unique timestamps, then it will wrap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e, there is room for 32 unique timestamps, then it will wrap?
right
#define OPTEE_MSG_RPC_CMD_BENCH_REG_NEW 0 | ||
#define OPTEE_MSG_RPC_CMD_BENCH_REG_DEL 1 | ||
|
||
/* OP-TEE susbsystems ids */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I've started look into the PR in optee_os, but if this isn't a shared h-file (which I suspect it is), then we should maybe consider remove the ones not in use in optee_os. I'll update this comment when/if I've reviewed the other PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each rep(besides optee_benchmark
, it uses optee_client's 'public/tee_bench.h` file) has it's own version of bench.h (and it's not just a copy-paste), with our defines which are related only to that specific subsystem.
Or maybe I just didn't get your comment
core/include/bench.h
Outdated
#ifdef CFG_TEE_BENCHMARK | ||
void bm_timestamp(void); | ||
#else | ||
/* stub */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already rather obvious that this is a stub.
core/include/optee_msg.h
Outdated
@@ -419,4 +419,5 @@ struct optee_msg_arg { | |||
*/ | |||
#define OPTEE_MSG_RPC_CMD_SHM_FREE 7 | |||
|
|||
#define OPTEE_MSG_RPC_CMD_BENCH_REG 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this command also deserves a comment just like the other cmd's here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
It'd be nice to have "some" documentation in a markdown file or similar, giving a quick high level overview of this. In the acceptance criteria (Jira), we have:
That doesn't necessarily have to go into this PR, it could be done on a separate PR also. |
core/arch/arm/pta/benchmark.c
Outdated
ret_addr = __builtin_return_address(0); | ||
|
||
cpu_buf = &bench_ts_global->cpu_buf[cur_cpu]; | ||
ts_i = __sync_fetch_and_add(&cpu_buf->head, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__sync_fetch_and_add
?
Foreign interruptions (actually all interrupts) are already masked, so you can be pretty sure you can atomically update head
value with even a complex SW sequence.
Also, if head
is lower than tail
an reaches tail
when incremented, it should update tail
and maybe raise an overrun flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Foreign interruptions (actually all interrupts) are already masked, so you can be pretty sure you can atomically update head value with even a complex SW sequence.
Agree. This came from the version, when there was only common timestamp buffer for all CPUs. I'll leave __sync_fetch_and_add
only for NW EL0 apps (optee_client).
Also, if head is lower than tail an reaches tail when incremented, it should update tail and maybe raise an overrun flag.
Right, we have already discussed it with Joakim, and he suggested to leave it for now as it is, and add support of overrun detection in further PRs. As for me, it's better just to add additioanl field to struct tee_time_st
with will contain timestamp id (actually, it can be the head
value, as it's incrementing all the time), so we can detect an over-run when parsing results by the actual timestamp analyzing app, which will detect a bit gap in timestamp IDs.
core/arch/arm/pta/benchmark.c
Outdated
|
||
/* fill timestamp data */ | ||
if (cur_cpu >= bench_ts_global->cores) { | ||
thread_unmask_exceptions(exceptions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could panic the TEE. I guess we don't expect benchmark support in end product, so panicking here is not a issue.
core/include/bench.h
Outdated
#define BENCHMARK_CMD(id) (0xFA190000 | ((id) & 0xFFFF)) | ||
#define BENCHMARK_CMD_REGISTER_MEMREF BENCHMARK_CMD(1) | ||
#define BENCHMARK_CMD_GET_MEMREF BENCHMARK_CMD(2) | ||
#define BENCHMARK_CMD_UNREGISTER BENCHMARK_CMD(3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'pure' benchmark pTA API could go in a specific header file exported to TA devkit for being shared.
(I think Joakim had a comment about it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@etienne-lms Timestamping within TAs isn't supported yet, do we really need to move this header for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is minor. Yet it is a good habit to locate pTA info definitions in the devkit so it can be used even by non-secure applications that already gets pTA info from the devkit, from include/pta_xxx.h, basically the pTA UUID and its command IDs.
See comment on BENCHMARK_UUID
above.
core/include/bench.h
Outdated
#define BENCHMARK_CMD_GET_MEMREF BENCHMARK_CMD(2) | ||
#define BENCHMARK_CMD_UNREGISTER BENCHMARK_CMD(3) | ||
|
||
#define TEE_BENCH_DIVIDER 64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment about what this macro is about.
core/arch/arm/include/arm64.h
Outdated
return val; | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: empty line
core/arch/arm/pta/benchmark.c
Outdated
#include <string.h> | ||
#include <string_ext.h> | ||
#include <stdio.h> | ||
#include <trace.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: sort those includes.
core/arch/arm/pta/benchmark.c
Outdated
|
||
#define BENCHMARK_UUID \ | ||
{ 0x0b9a63b0, 0xb4c6, 0x4c85, \ | ||
{ 0xa2, 0x84, 0xa2, 0x28, 0xef, 0x54, 0x7b, 0x4e } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: this could go in lib/libutee/include/pta_benchmark.h as other pTAs header files.
See also comment on BENCHMARK_CMD()
below.
core/include/bench.h
Outdated
#define BENCHMARK_CMD(id) (0xFA190000 | ((id) & 0xFFFF)) | ||
#define BENCHMARK_CMD_REGISTER_MEMREF BENCHMARK_CMD(1) | ||
#define BENCHMARK_CMD_GET_MEMREF BENCHMARK_CMD(2) | ||
#define BENCHMARK_CMD_UNREGISTER BENCHMARK_CMD(3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is minor. Yet it is a good habit to locate pTA info definitions in the devkit so it can be used even by non-secure applications that already gets pTA info from the devkit, from include/pta_xxx.h, basically the pTA UUID and its command IDs.
See comment on BENCHMARK_UUID
above.
core/arch/arm/pta/benchmark.c
Outdated
ret_addr = __builtin_return_address(0); | ||
|
||
cpu_buf = &bench_ts_global->cpu_buf[cur_cpu]; | ||
ts_i = __sync_fetch_and_add(&cpu_buf->head, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is __sync_fetch_and_add()
reliable ? A simple cpu_buf->head++;
will fit.
core/arch/arm/include/arm32.h
Outdated
@@ -601,6 +601,15 @@ static __always_inline uint32_t read_r7(void) | |||
asm volatile ("mov %0, r7" : "=r" (val)); | |||
return val; | |||
} | |||
|
|||
/* PMU cycle counter */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: useless comment
core/arch/arm/include/arm64.h
Outdated
@@ -238,6 +238,15 @@ static __always_inline uint64_t read_fp(void) | |||
return val; | |||
} | |||
|
|||
/* PMU cycle counter */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: useless comment
core/arch/arm/pta/benchmark.c
Outdated
|
||
res = rpc_reg_global_buf(OPTEE_MSG_RPC_CMD_BENCH_REG_DEL, 0, 0); | ||
|
||
mutex_unlock(&bench_reg_mu); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can release lock before the RPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as : useless (from core) but armless mutex hold. ok like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
core/arch/arm/pta/benchmark.c
Outdated
* registration of this buffer in NS layers | ||
* (optee linux kmod/optee client) | ||
*/ | ||
if (!tee_pbuf_is_non_sec(p[0].memref.buffer, p[0].memref.size)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p[0].memref.buffer
is a virtual address. Should use tee_vbuf_is_non_sec()
.
core/arch/arm/pta/benchmark.c
Outdated
sizeof(struct tee_ts_cpu_buf) * | ||
bench_ts_global->cores); | ||
|
||
mutex_unlock(&bench_reg_mu); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the lock does not need to be held during the RPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no any synchronization on that side (in the RPC handler in optee linux driver)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no any synchronization on that side (in the RPC handler in optee linux driver)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As soon as bench_ts_global
is stored, it is safe from core to use it, no ?
Anyway, holding the mutex a bit longer is not really armful. ok like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to handle the case, when register_benchmark_memref
is called twice (for the case, when someone has executed optee_benchmark
app at the same time), so if don't to wrap rpc_reg_global_buf
with mutex, it's possible to have a race condition in static void handle_rpc_func_cmd_bm_reg(struct optee_msg_arg *arg)
L312
I think it's better to add condition in the beginning of register_benchmark_memref
which will check if we have registered a buffer before, and then we can avoid such situation and unlock mutex before rpc call
e89356b
to
467fea7
Compare
Rebased on latest master and tested |
The commit description cannot be empty. Please explain what is meant by "benchmarking", how it is useful, how it is enabled (looks like it is always enabled? what if we want to reduce the size of the TEE? it is a debug feature after all, isn't it?), how one typically uses it, how it is implemented (high-level description, just to understand the interactions between the Secure/Non-secure kernel and user modes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add CFG_TEE_BENCHMARK ?= n
with few explanations in mk/config.mk.
The current bench support is a bit touchy when more than 1 clients invokes the TEE while bench is being enabled. I've understood we can live with this constraint, but it should be stated somewhere.
Another tricky part: in SMP, a single invocation of the TEE can switch from 1 cpu to another resulting in the timestamp structures being stored in various 'core' related sub-buffers. Not easy to analyse. I think the client API should set the cpu affinity for the requested invocation when bench is enabled by client.
core/arch/arm/pta/benchmark.c
Outdated
|
||
static TEE_Result invoke_command(void *pSessionContext __unused, | ||
uint32_t nCommandID, uint32_t nParamTypes, | ||
TEE_Param pParams[TEE_NUM_PARAMS]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: prefer lowercase for variable.
core/arch/arm/pta/benchmark.c
Outdated
return; | ||
} | ||
|
||
/* fill timestamp data */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: useless comment.
core/arch/arm/tee/entry_std.c
Outdated
res = copy_in_params(arg->params, num_params, ¶m); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this empty line.
core/include/bench.h
Outdated
*/ | ||
#define TEE_BENCH_DIVIDER 64 | ||
|
||
/* max amount of timestamps per buffer */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: start comment with a uppercase character.
467fea7
to
155aa3b
Compare
I'm going to add some basic information to this commit, that I'll create additional PR with extended Markdown doc with detailed information about its implementation, usage etc. Deal? :) |
155aa3b
to
e70ab0e
Compare
@igoropaniuk yep sounds good to me, thanks. |
BTW @igoropaniuk if you want you can create and push an empty commit (git commit --allow-empty) so that we can review the description. |
@jforissier I've already amended current commit, and added description there (I hope it can be called a "description" :) ) |
@etienne-lms @jforissier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment on commit comment:
(can be specified globally in build/common.mk)
Maybe remove this. This reference is out of optee_os scope.
core/arch/arm/pta/benchmark.c
Outdated
sizeof(struct tee_ts_cpu_buf) * | ||
bench_ts_global->cores); | ||
|
||
mutex_unlock(&bench_reg_mu); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As soon as bench_ts_global
is stored, it is safe from core to use it, no ?
Anyway, holding the mutex a bit longer is not really armful. ok like this.
core/arch/arm/pta/benchmark.c
Outdated
|
||
res = rpc_reg_global_buf(OPTEE_MSG_RPC_CMD_BENCH_REG_DEL, 0, 0); | ||
|
||
mutex_unlock(&bench_reg_mu); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as : useless (from core) but armless mutex hold. ok like this.
e70ab0e
to
373534a
Compare
@etienne-lms all comments addressed |
373534a
to
43891c3
Compare
I think I've addressed all comments so far and now think this is ready for a final review and if there are no more comments we should consider merging this now |
43891c3
to
72f97fc
Compare
@etienne-lms @jbech-linaro @jenswi-linaro @jforissier |
Abbreviating "benchmark" as "bench" sounds weird I think (bench.h, TEE_BENCH_* etc.) and is not done consistently, so it would be better to write "benchmark" always. |
|
OP-TEE Benchmark feature provides timestamp data for the roundtrip time from libteec to OP-TEE OS core. Benchmark PTA handles registration/unregistration commands of timestamp buffer, invoked by optee_benchmark NW application, and performs registration of timestamp buffer in the linux kernel optee driver via RPC call. To enable this feature set CFG_TEE_BENCHMARK compile flag to "y". Acked-by: Jerome Forissier <jerome.forissier@linaro.org> Reviewed-by: Etienne Carriere <etienne.carriere@linaro.org> Signed-off-by: Igor Opaniuk <igor.opaniuk@linaro.org>
72f97fc
to
6138e8a
Compare
Applied tags and rebased |
@jforissier all these PRs are rebased on master, tags applied and ready for merge linaro-swg/optee_benchmark#1 thanks! |
All merged, thanks. |
Thanks all reviewers and big thanks to you @igoropaniuk for implementing this and having the patience of reworking it a couple of times. |
Related PRs:
Benchmark app:
linaro-swg/optee_benchmark#1
Changes in OP-TEE
linaro-swg/linux#38
OP-TEE/build#124
#1365
OP-TEE/optee_client#79