By Bruno Pairault, 2019
ARM allow the counting of cycles via few techniques : DWT or nowdays Performance Monitor Unit (PMU). There is several versions of PMU for the different Cortex processor.
The ARM cycle count implementation is Linux Kernel dependant to allow the PMU managment for User. Enabling the PMU on user-mode is provided per Kernel-PMU and is independent of XKCP.
The XKCP PMU implementation supports two architectures yet Aarch64 and ARMv7-A. The two XKCP
PMU implementations to read cycles are coded in the XKCP file tests/UnitTest/Timing.h
This file
is only updated using the several native preprocessor compilation options. Performance Monitors Cycle
Counters are then read when required.
Kernel-PMU is a Linux dependant module that has to be loaded to allow the PMU user-mode.
XKCP tests/UnitTest/Timing.h
will then be abble to read the register.
The Cortex-A57
documentation can be found on the ARM site for details on the PMU . The Chapter 11
describes the Performance Monitor Unit. For all the Cortex A, you can reach the Cortex-A technical library.
##Principle of the computation
When the Kernel-PMU is loaded, the processor is likely to answer with the current cycle count for PMU 32
or 64 Architecture. Obviously similar PMU thread exist on Internet however this integration in XKCP follow exactly the principle introduced initially in Timing.h
by Doug Whiting
for Intel via RTDSC
. Minimal changes are required in XKCP. We added the few ASM instructions to read the PMU according to the precompilations options.
The number of cycles of an operation is the difference between the cycle counter before and the cycle
counter after.
-
Install the kernel-header version for your kernel release number given per the command
#uname -r
. The method is dependant on your Packet Manager (zypper, apt-get, dpkg).-
Raspbian (32 bits):
sudo apt-get install raspberrypi-kernel-headers
-
OpenSuse Leap 15 (64 bits): Likely on a fresh install
nothing to do
or if the OS has been updated you will have probably to runsudo zypper in kernel-dev
.
-
-
Compile Kernel-PMU
#cd Kernel-PMU #make
-
Load Kernel-PMU and Check
#sudo ./load-module #dmesg
The expect result for Cortex-a57 is similar to :
[ 6014.689804] ENABLE_ARM_PMU enabling user PMU access on CPU-Core #0 [ 6014.689809] ENABLE_ARM_PMU enabling user PMU access on CPU-Core #2 [ 6014.689813] ENABLE_ARM_PMU enabling user PMU access on CPU-Core #3 [ 6014.689820] ENABLE_ARM_PMU enabling user PMU access on CPU-Core #1
###Know-Issues
-
Depending on your Linux OS (Raspbian, Ubuntu Mate, OpenSuse) and your OS release you may also be required to download from source the kernel files to compile the module. The module will not load if the matching is not correct. Modify the path in the makefile if required.
-
The module must be reload after rebooting if you do not implement auto-load @reboot.
##Check the CPU configuration and the native GCC options
-
The objective is to understand the compilation you need to added if you use
-mnative
. -
Sanity check the CPU configuration and check all the native
-mXXX
compilations options likemarch
,mtune
.# lscpu # cat /proc/cpuinfo # gcc -v
###Know-Issues Raspbian case with Updated system OS on 2019-01-01
-
On Raspbian the Cortex-a53 is declared as an ARMv7l with 4 cores as per
lscpu
. GCC [(gcc version 6.3.0 20170516 (Raspbian 6.3.0-18+rpi1+deb9u1)] come with native ARMv6-march
option. This is due to the implementation. Typecat /proc/cpuinfo
and you will discover the BCM2835 CPU which is the ARMv6 of the first RPI1(2010). It looks like it’s an ARMv6 for each of the 4 cores. The impact is that we have to force the precompiler options to use the PMU 32 bits for ARM. -
Cortex-a53 RPIB3+ is an ARMV8-a 64 bits running mostly 32 bits as most OS for RPI are 32 bits.
-
The preprocessor options
-D__ARM_xx
have to be consistent with the processor and the OS. On a 32 bits implementation the PMU register is the 32 bits PMU while the 64 bits PMU counter will be used on Aarch64.
##XKCP Compilation options tips for Cortex-a53 and RPIB3+
Depending on what you are trying to achieve you may like to check the compilation options for gcc or
modify the XKCP lib/LowLevel.build
. Below some suggestions.
For 32 bits OS on ARM PMU on target
The RPI3 on 32Bits OS is declared as a ARMv6 per GCC precompilation options. We have then to force
the ARM_ARCH_7A
option to read the 32 bit Cycle counter.
▪ <gcc>-v</gcc> This option add verbose
▪ <gcc>-D__ARM_ARCH_7A_</gcc>
For 64 bits OS on ARM on target : -mnative
will likely implement natively __aarch64
. it is
suggested per GCC as good practice to enable fix-cortex-a53-835769
and
fix-cortex-a53-843419
.
For Cross Compilation 32 bits
▪ <gcc>-march=armv8-a+crc</gcc>
▪ <gcc>-mtune=cortex-a53</gcc>
▪ <gcc>-mlittle-endian</gcc>
For Cross Compilation 64 bits using aarch64-linux-gnu-gcc
▪ <gcc>-march=armv8-a+crc</gcc>
▪ <gcc>-mtune=cortex-a53</gcc>
▪ <gcc>-mlittle-endian</gcc>
▪ <gcc>-mabi=lp64</gcc>
Finally compile KeccakTests.
##License This code is for GNU General Public License (GNU GPL or GPL) which guarantees end users the freedom to run, study, share and modify the Kernel-PMU.