Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OP-TEE Benchmark #1365

Merged
merged 1 commit into from
May 30, 2017
Merged

OP-TEE Benchmark #1365

merged 1 commit into from
May 30, 2017

Conversation

igoropaniuk
Copy link
Contributor

@igoropaniuk igoropaniuk commented Feb 22, 2017

Copy link
Contributor

@etienne-lms etienne-lms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change enables benchmarking for all OP-TEE invocations. Is this what you expected ?

(TEE_PARAM_TYPE_GET(type, 3) != TEE_PARAM_TYPE_NONE)) {
return TEE_ERROR_BAD_PARAMETERS;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should register also the buffer size.

DMSG("Registering stat buffer, addr = %p, paddr = %p\n",
p[1].memref.buffer,
(void *)virt_to_phys((void *)p[1].memref.buffer));
bench_stat_buf = (void *)p[1].memref.buffer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: buffer is already a void *. here and there.

DMSG("Sending back timestamp buffer paddr, paddr = %p\n",
(void *)virt_to_phys(bench_ts_buf));
p[0].value.a = virt_to_phys(bench_ts_buf);
p[1].value.a = virt_to_phys(bench_stat_buf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value.a are 32bit, while bench buffers physical addresses might be 64bit.

DMSG("Unregistering benchmark, timestamp buffer paddr = %p\n",
(void *)virt_to_phys(bench_ts_buf));
bench_ts_buf = 0;
bench_stat_buf = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: NULL rather than 0. debug trace displays only the timestamp buffer, not the statistics one.

static void close_session(void *psess __unused)
{

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: as create_ta/destroy_ta/open_session/close_session are stubs, you can simply discard them.

static inline uint64_t read_ccounter(void)
{
uint64_t ccounter = 0;
#ifdef __aarch64__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: add empty line

static inline __attribute__((always_inline)) uintptr_t read_pc(void)
{
uintptr_t pc = 0;
#ifdef __aarch64__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: add empty line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ARM specific, please store it below core/arch/arm

struct tee_time_buf *timeb = NULL;
uint64_t ts_i;

if ((cmd_id & BENCHMARK_CMD(0)) == BENCHMARK_CMD(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about TAs that want to use such command IDs ? they won't be able to get benchmark stats.

static inline __attribute__((always_inline))
void bm_timestamp(uint32_t source, uint32_t cmd_id)
{
struct tee_time_buf *timeb = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: useless init

static void bm_timestamp(uint32_t source __unused,
uint32_t cmd_id __unused)
{
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add inline

minor: need this ; ?

DMSG("Registering timestamp buffer, addr = %p, paddr = %p\n",
p[0].memref.buffer,
(void *)virt_to_phys((void *)p[0].memref.buffer));
bench_ts_buf = (void *)p[0].memref.buffer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cast not needed


DMSG("Registering timestamp buffer, addr = %p, paddr = %p\n",
p[0].memref.buffer,
(void *)virt_to_phys((void *)p[0].memref.buffer));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cast not needed
Use PRIxPA instead of p to print paddr_t values.

uint32_t nCommandID, uint32_t nParamTypes,
TEE_Param pParams[TEE_NUM_PARAMS])
{
DMSG("command entry point for static ta \"%s\"", TA_NAME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How useful is this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not useful at all, just forgot to delete it :)

@@ -0,0 +1,160 @@
/*
* Copyright (c) 2015, Linaro Limited
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or drop the sta_ prefix entirely.

*/

#ifndef TEE_BENCH_H
#define TEE_BENCH_H
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the tee and core part of this filename doesn't seem a bit redundant.

static inline __attribute__((always_inline)) uintptr_t read_pc(void)
{
uintptr_t pc = 0;
#ifdef __aarch64__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ARM specific, please store it below core/arch/arm

@@ -450,8 +450,7 @@
* }
*/
#define TEE_PARAM_TYPES(t0,t1,t2,t3) \
((t0) | ((t1) << 4) | ((t2) << 8) | ((t3) << 12))

((t0) | ((t1) << 4) | ((t2) << 8) | ((t3) << 12))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change not needed


if (timeb->tm_ind >= TEE_BENCH_MAX_STAMPS)
return;
ts_i = timeb->tm_ind++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeb->tm_ind should be copied before checking the value as normal world may have a window where it can update it unchecked.

What about concurrent updates?

if (!bench_ts_buf)
return;

timeb = (struct tee_time_buf *) bench_ts_buf;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cast not needed

}

/* Adding timestamp to buffer */
static inline __attribute__((always_inline))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With __builtin_return_address(0) instead below, or some clever macro to supply the return address as an argument to this function it wouldn't need to be inlined.

@igoropaniuk igoropaniuk changed the title [RFC] OP-TEE Benchmark OP-TEE Benchmark Apr 18, 2017
{
uint64_t ccounter = 0;
#ifdef __aarch64__
asm volatile("mrs %0, PMCCNTR_EL0" : "=r"(ccounter));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a read_pmccntr_el0() or similar in arm64.h and use that instead, and something similar for the ARM32 version below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already moving it there :)

DMSG("Registering timestamp buffer, addr = %p, paddr = %" PRIxPA "\n",
p[0].memref.buffer,
virt_to_phys(p[0].memref.buffer));
bench_ts_global = p[0].memref.buffer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should check p[0].memref.buffer is mapped non-secure (case secure client) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the client is secure, the buffer virtual address is likely to be temporary (to this TA invocation). Core should get the paddr and gets it core mapping to get a reliable virtual address for the buffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etienne-lms II didn't get it. Could you please provide any example of secure client case?
Or you mean TA-to-TA calls?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a TA invokes your pTA, the memref parameters can be SHM references or secure memory reference (specific case when TA use a memref parameter on a TA private buffer).
As we expect here a SHM buffer (later RPC to tee-supplicant), maybe you can only check tee_pbuf_is_non_sec(p[0].memref.buffer, p[0].memref.size).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added check


/* fill timestamp data */
if (cur_cpu >= bench_ts_global->cores)
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • thread_set_foreign_intr(true);

Copy link
Contributor Author

@igoropaniuk igoropaniuk Apr 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See line 199

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or I can move call of read_ccounter(); a bit higher (to cur_cpu = get_core_pos(); ) to re-enable interrupt handling as soon as possible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

	if (cur_cpu >= bench_ts_global->cores) {
		thread_set_foreign_intr(true); 
		return;
	}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it

cpu_buf = &bench_ts_global->cpu_buf[cur_cpu];
ts_i = __sync_fetch_and_add(&cpu_buf->head, 1);
ts_data.cnt = read_ccounter();
ts_data.addr = (uintptr_t) ret_addr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: remove space before ret_addr

@@ -287,6 +288,8 @@ static void entry_invoke_command(struct thread_smc_args *smc_args,
struct tee_ta_session *s;
struct tee_ta_param param;

bm_timestamp();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a known ID on which REE gets comprehensive.
I think this should be like bm_timestamp(CORE_PRE/POST_IMVOKE_TA);
Maybe this can come later... Specified bm points and free bm points.

Copy link
Contributor Author

@igoropaniuk igoropaniuk Apr 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saving __builtin_return_address(0); in bm_timestamp() isn't sufficient for these purposes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the PC gives a clear view of trace point when rendering the output data to a human readable format, then it is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's going to be mapped to specific file:line and function (the same as addr2line does)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK as it is for now.

if (!bench_ts_global)
return;

thread_set_foreign_intr(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if called with foreign interruptions not enabled? If will be enabled at routine exit.

Copy link
Contributor Author

@igoropaniuk igoropaniuk Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etienne-lms good question, my first intention was to use thread_restore_foreign_intr(), but, actually, I didn't fully get where and how the previous CPSR value is saved (based on only one example of thread_restore_foreign_intr() I found in void tee_svc_handler(struct thread_svc_regs *regs) implementation), so the only one option was just to do it manually with thread_get_exceptions/thread_set_exceptions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint32_t exceptions = thread_mask_exceptions(THREAD_EXCP_ALL);
...
thread_unmask_exceptions(exceptions);

Or at least, mask THREAD_EXCP_FOREIGN_INTR as it is the noe that can makes thread to migrate to another cpu.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!


/* fill timestamp data */
if (cur_cpu >= bench_ts_global->cores)
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

	if (cur_cpu >= bench_ts_global->cores) {
		thread_set_foreign_intr(true); 
		return;
	}

@igoropaniuk
Copy link
Contributor Author

@etienne-lms @jenswi-linaro Could check please also https://github.com/linaro-swg/optee_benchmark/pull/1/files, it includes almost 40-50% of the functionality. Thanks


mutex_lock(&bench_reg_mu);

/* we accept only non-secure buffers, as we later perform
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: But I think checkpatch will complain here (multiline comment shall start with /* on a single line).


/* we accept only non-secure buffers, as we later perform
* registration of this buffer in NS layers
* (optee linx kmod/optee client)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/linx/Linux/

exceptions = thread_mask_exceptions(THREAD_EXCP_ALL);
cur_cpu = get_core_pos();

/* fill timestamp data */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems a bit malplaced.

#define TEE_BENCH_DIVIDER 64

/* max amount of timestamps per buffer */
#define TEE_BENCH_MAX_STAMPS 32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a rather low default value? Does it mean what it says? I.e, there is room for 32 unique timestamps, then it will wrap?

Copy link
Contributor Author

@igoropaniuk igoropaniuk Apr 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e, there is room for 32 unique timestamps, then it will wrap?

right

#define OPTEE_MSG_RPC_CMD_BENCH_REG_NEW 0
#define OPTEE_MSG_RPC_CMD_BENCH_REG_DEL 1

/* OP-TEE susbsystems ids */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I've started look into the PR in optee_os, but if this isn't a shared h-file (which I suspect it is), then we should maybe consider remove the ones not in use in optee_os. I'll update this comment when/if I've reviewed the other PRs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each rep(besides optee_benchmark, it uses optee_client's 'public/tee_bench.h` file) has it's own version of bench.h (and it's not just a copy-paste), with our defines which are related only to that specific subsystem.
Or maybe I just didn't get your comment

#ifdef CFG_TEE_BENCHMARK
void bm_timestamp(void);
#else
/* stub */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already rather obvious that this is a stub.

@@ -419,4 +419,5 @@ struct optee_msg_arg {
*/
#define OPTEE_MSG_RPC_CMD_SHM_FREE 7

#define OPTEE_MSG_RPC_CMD_BENCH_REG 20
Copy link
Contributor

@jbech-linaro jbech-linaro Apr 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this command also deserves a comment just like the other cmd's here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

@jbech-linaro
Copy link
Contributor

It'd be nice to have "some" documentation in a markdown file or similar, giving a quick high level overview of this. In the acceptance criteria (Jira), we have:

At optee_os/documentation there should be a file "benchmark.md". That describes, how to run the benchmark, how to interpret the data

That doesn't necessarily have to go into this PR, it could be done on a separate PR also.

ret_addr = __builtin_return_address(0);

cpu_buf = &bench_ts_global->cpu_buf[cur_cpu];
ts_i = __sync_fetch_and_add(&cpu_buf->head, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__sync_fetch_and_add ?

Foreign interruptions (actually all interrupts) are already masked, so you can be pretty sure you can atomically update head value with even a complex SW sequence.

Also, if head is lower than tail an reaches tail when incremented, it should update tail and maybe raise an overrun flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Foreign interruptions (actually all interrupts) are already masked, so you can be pretty sure you can atomically update head value with even a complex SW sequence.

Agree. This came from the version, when there was only common timestamp buffer for all CPUs. I'll leave __sync_fetch_and_add only for NW EL0 apps (optee_client).

Also, if head is lower than tail an reaches tail when incremented, it should update tail and maybe raise an overrun flag.

Right, we have already discussed it with Joakim, and he suggested to leave it for now as it is, and add support of overrun detection in further PRs. As for me, it's better just to add additioanl field to struct tee_time_st with will contain timestamp id (actually, it can be the head value, as it's incrementing all the time), so we can detect an over-run when parsing results by the actual timestamp analyzing app, which will detect a bit gap in timestamp IDs.


/* fill timestamp data */
if (cur_cpu >= bench_ts_global->cores) {
thread_unmask_exceptions(exceptions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could panic the TEE. I guess we don't expect benchmark support in end product, so panicking here is not a issue.

#define BENCHMARK_CMD(id) (0xFA190000 | ((id) & 0xFFFF))
#define BENCHMARK_CMD_REGISTER_MEMREF BENCHMARK_CMD(1)
#define BENCHMARK_CMD_GET_MEMREF BENCHMARK_CMD(2)
#define BENCHMARK_CMD_UNREGISTER BENCHMARK_CMD(3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'pure' benchmark pTA API could go in a specific header file exported to TA devkit for being shared.
(I think Joakim had a comment about it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etienne-lms Timestamping within TAs isn't supported yet, do we really need to move this header for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is minor. Yet it is a good habit to locate pTA info definitions in the devkit so it can be used even by non-secure applications that already gets pTA info from the devkit, from include/pta_xxx.h, basically the pTA UUID and its command IDs.
See comment on BENCHMARK_UUID above.

#define BENCHMARK_CMD_GET_MEMREF BENCHMARK_CMD(2)
#define BENCHMARK_CMD_UNREGISTER BENCHMARK_CMD(3)

#define TEE_BENCH_DIVIDER 64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment about what this macro is about.

return val;
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: empty line

#include <string.h>
#include <string_ext.h>
#include <stdio.h>
#include <trace.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: sort those includes.


#define BENCHMARK_UUID \
{ 0x0b9a63b0, 0xb4c6, 0x4c85, \
{ 0xa2, 0x84, 0xa2, 0x28, 0xef, 0x54, 0x7b, 0x4e } }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this could go in lib/libutee/include/pta_benchmark.h as other pTAs header files.
See also comment on BENCHMARK_CMD() below.

#define BENCHMARK_CMD(id) (0xFA190000 | ((id) & 0xFFFF))
#define BENCHMARK_CMD_REGISTER_MEMREF BENCHMARK_CMD(1)
#define BENCHMARK_CMD_GET_MEMREF BENCHMARK_CMD(2)
#define BENCHMARK_CMD_UNREGISTER BENCHMARK_CMD(3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is minor. Yet it is a good habit to locate pTA info definitions in the devkit so it can be used even by non-secure applications that already gets pTA info from the devkit, from include/pta_xxx.h, basically the pTA UUID and its command IDs.
See comment on BENCHMARK_UUID above.

ret_addr = __builtin_return_address(0);

cpu_buf = &bench_ts_global->cpu_buf[cur_cpu];
ts_i = __sync_fetch_and_add(&cpu_buf->head, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is __sync_fetch_and_add() reliable ? A simple cpu_buf->head++; will fit.

@@ -601,6 +601,15 @@ static __always_inline uint32_t read_r7(void)
asm volatile ("mov %0, r7" : "=r" (val));
return val;
}

/* PMU cycle counter */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: useless comment

@@ -238,6 +238,15 @@ static __always_inline uint64_t read_fp(void)
return val;
}

/* PMU cycle counter */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: useless comment


res = rpc_reg_global_buf(OPTEE_MSG_RPC_CMD_BENCH_REG_DEL, 0, 0);

mutex_unlock(&bench_reg_mu);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can release lock before the RPC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as : useless (from core) but armless mutex hold. ok like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* registration of this buffer in NS layers
* (optee linux kmod/optee client)
*/
if (!tee_pbuf_is_non_sec(p[0].memref.buffer, p[0].memref.size))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p[0].memref.buffer is a virtual address. Should use tee_vbuf_is_non_sec().

sizeof(struct tee_ts_cpu_buf) *
bench_ts_global->cores);

mutex_unlock(&bench_reg_mu);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the lock does not need to be held during the RPC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no any synchronization on that side (in the RPC handler in optee linux driver)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etienne-lms

there is no any synchronization on that side (in the RPC handler in optee linux driver)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as bench_ts_global is stored, it is safe from core to use it, no ?
Anyway, holding the mutex a bit longer is not really armful. ok like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to handle the case, when register_benchmark_memref is called twice (for the case, when someone has executed optee_benchmark app at the same time), so if don't to wrap rpc_reg_global_buf with mutex, it's possible to have a race condition in static void handle_rpc_func_cmd_bm_reg(struct optee_msg_arg *arg) L312

I think it's better to add condition in the beginning of register_benchmark_memref which will check if we have registered a buffer before, and then we can avoid such situation and unlock mutex before rpc call

@igoropaniuk igoropaniuk force-pushed the optee_benchmark branch 2 times, most recently from e89356b to 467fea7 Compare May 19, 2017 11:15
@igoropaniuk
Copy link
Contributor Author

Rebased on latest master and tested

@jforissier
Copy link
Contributor

The commit description cannot be empty. Please explain what is meant by "benchmarking", how it is useful, how it is enabled (looks like it is always enabled? what if we want to reduce the size of the TEE? it is a debug feature after all, isn't it?), how one typically uses it, how it is implemented (high-level description, just to understand the interactions between the Secure/Non-secure kernel and user modes).

Copy link
Contributor

@etienne-lms etienne-lms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add CFG_TEE_BENCHMARK ?= n with few explanations in mk/config.mk.

The current bench support is a bit touchy when more than 1 clients invokes the TEE while bench is being enabled. I've understood we can live with this constraint, but it should be stated somewhere.

Another tricky part: in SMP, a single invocation of the TEE can switch from 1 cpu to another resulting in the timestamp structures being stored in various 'core' related sub-buffers. Not easy to analyse. I think the client API should set the cpu affinity for the requested invocation when bench is enabled by client.


static TEE_Result invoke_command(void *pSessionContext __unused,
uint32_t nCommandID, uint32_t nParamTypes,
TEE_Param pParams[TEE_NUM_PARAMS])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: prefer lowercase for variable.

return;
}

/* fill timestamp data */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: useless comment.

res = copy_in_params(arg->params, num_params, &param);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this empty line.

*/
#define TEE_BENCH_DIVIDER 64

/* max amount of timestamps per buffer */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: start comment with a uppercase character.

@igoropaniuk
Copy link
Contributor Author

@jforissier

The commit description cannot be empty. Please explain what is meant by "benchmarking", how it is useful, how it is enabled (looks like it is always enabled? what if we want to reduce the size of the TEE? it is a debug feature after all, isn't it?), how one typically uses it, how it is implemented (high-level description, just to understand the interactions between the Secure/Non-secure kernel and user modes).

I'm going to add some basic information to this commit, that I'll create additional PR with extended Markdown doc with detailed information about its implementation, usage etc. Deal? :)

@jforissier
Copy link
Contributor

@igoropaniuk yep sounds good to me, thanks.

@jforissier
Copy link
Contributor

BTW @igoropaniuk if you want you can create and push an empty commit (git commit --allow-empty) so that we can review the description.

@igoropaniuk
Copy link
Contributor Author

@jforissier I've already amended current commit, and added description there (I hope it can be called a "description" :) )

@igoropaniuk
Copy link
Contributor Author

@etienne-lms @jforissier
Any updates?

Copy link
Contributor

@etienne-lms etienne-lms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on commit comment:

(can be specified globally in build/common.mk)

Maybe remove this. This reference is out of optee_os scope.

sizeof(struct tee_ts_cpu_buf) *
bench_ts_global->cores);

mutex_unlock(&bench_reg_mu);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as bench_ts_global is stored, it is safe from core to use it, no ?
Anyway, holding the mutex a bit longer is not really armful. ok like this.


res = rpc_reg_global_buf(OPTEE_MSG_RPC_CMD_BENCH_REG_DEL, 0, 0);

mutex_unlock(&bench_reg_mu);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as : useless (from core) but armless mutex hold. ok like this.

@igoropaniuk
Copy link
Contributor Author

@etienne-lms all comments addressed

@igoropaniuk
Copy link
Contributor Author

I think I've addressed all comments so far and now think this is ready for a final review and if there are no more comments we should consider merging this now

@igoropaniuk
Copy link
Contributor Author

@etienne-lms @jbech-linaro @jenswi-linaro @jforissier
Thanks for all your comments. Is there anything else worth mentioning/fixing?

@jforissier
Copy link
Contributor

Abbreviating "benchmark" as "bench" sounds weird I think (bench.h, TEE_BENCH_* etc.) and is not done consistently, so it would be better to write "benchmark" always.
Anyway:
Acked-by: Jerome Forissier <jerome.forissier@linaro.org>

@etienne-lms
Copy link
Contributor

Reviewed-by: Etienne Carriere <etienne.carriere@linaro.org>

OP-TEE Benchmark feature provides timestamp data for the roundtrip time
from libteec to OP-TEE OS core.

Benchmark PTA handles registration/unregistration commands of timestamp
buffer, invoked by optee_benchmark NW application, and performs
registration of timestamp buffer in the linux kernel optee driver via
RPC call.

To enable this feature set CFG_TEE_BENCHMARK compile flag to "y".

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Reviewed-by: Etienne Carriere <etienne.carriere@linaro.org>
Signed-off-by: Igor Opaniuk <igor.opaniuk@linaro.org>
@igoropaniuk
Copy link
Contributor Author

Applied tags and rebased

@igoropaniuk
Copy link
Contributor Author

@jforissier all these PRs are rebased on master, tags applied and ready for merge

linaro-swg/optee_benchmark#1
linaro-swg/linux#38
OP-TEE/build#124
#1365
OP-TEE/optee_client#79

thanks!

@jforissier jforissier merged commit d5b65f3 into OP-TEE:master May 30, 2017
@jforissier
Copy link
Contributor

All merged, thanks.

@jbech-linaro
Copy link
Contributor

Thanks all reviewers and big thanks to you @igoropaniuk for implementing this and having the patience of reworking it a couple of times.

@igoropaniuk igoropaniuk deleted the optee_benchmark branch July 4, 2017 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants