- Python 3.8 or greater
- AMD EPYC CPU Family 25
- EDAC kernel MCA module
- numactl
- root permissions
- Linux
- bare metal non-virtualized environment
- PyYAML
The OFHC run script is a test orchestration framework. You give OFHC a configuration file listing the tests to run, and it runs them in the desired sequence and logs pass/fail information for each test. OFHC handles running the tests on the desired CPU cores and checking for MCEs, which makes the process simpler for CPU test writers.
python ./run.py {config_file} [--run-dir {run_dir} --log-dir {log_dir}]
Required
config_file
: The file to take the configuration from.
Optional
The optional fields can also be set in the configuration file. If a value is provided both on the command line and in the configuration file, the command line argument takes priority.
run_dir
: Points to root directory of the OFHC source files.
log_dir
: Points to a new or existing directory where all log files will be output.
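The priority rule above can be sketched as follows. This is only an illustration and not the code in run.py: the helper name resolve_setting and the fallback defaults are assumptions, and only the Log_Directory and Run_Directory keys come from the configuration file described below.

```python
def resolve_setting(cli_value, config, key, default):
    """Pick a setting: command line first, then config file, then a default.

    Hypothetical helper, not part of OFHC itself.
    """
    if cli_value is not None:
        return cli_value
    return config.get(key, default)


# Stand-in for a parsed configuration file.
config = {"Log_Directory": "logs", "Run_Directory": "."}

# --log-dir was given on the command line, --run-dir was not.
log_dir = resolve_setting("/tmp/ofhc_logs", config, "Log_Directory", "logs")
run_dir = resolve_setting(None, config, "Run_Directory", ".")

print(log_dir)
print(run_dir)
```

Here log_dir comes from the command line and run_dir falls back to the configuration file.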
The following section shows the configuration in YAML format; however, OFHC also supports JSON.
All keys are case sensitive, so ensure you have the correct case for all configuration keys.
Tool Level Settings
Log_Level: All
Log_Directory: logs
Run_Directory: .
Constant_MCE_Checking: false
Log_Level
: See section Logging for details.
Log_Directory
: Points to a new or existing directory where all log files will be output.
Run_Directory
: Points to root directory of the OFHC source files.
Constant_MCE_Checking
: When true, OFHC checks for an MCE after each test execution. This helps the user understand exactly which test caused an MCE, but it also involves more overhead. Because OFHC only clears MCEs before beginning a test suite run, the log will show MCEs for the test that generated the MCE and for all subsequent tests. When false, OFHC checks before and after the test suite is run instead of after every test.
Test Configuration
Tests:
  - Name: Example Test
    Binary: ./example_test
    Args:
      - arg1: ...
Name
: Used for debugging and descriptive configuration file only.
Binary
: Path to any executable, or the name of an executable in your path.
Args
: List of arguments to provide to the binary. See below for how to define the arguments.
Test Argument Settings
Tests:
  - Name: Example Test
    Binary: ./example_test
    Args:
      - arg1:
          Option: "-a1"
          Values:
            - abc
      - arg2:
          Option: "-a2"
          Values:
            - 25
            - 50
            - 75
      - arg3:
          Option: "-a3"
          Values: [1]
      - Tox:
          Option: "-tox"
          Values:
            - 90
      - arg4:
          Option: "-a4"
          Flag: true
          Constant: true
      - arg5:
          Option: "-a5"
          Values:
            - 0
            - 1
      - arg6:
          Option: "-a6"
          Flag: true
- Ordinary Arguments: Examples of ordinary arguments are arg1, arg2, and arg5. These arguments consist of two separate keys. The first, Option, is the way you specify the argument for your test on the command line. The second, Values, requires a list, and each item in the list represents one value for the argument. For example, arg1 would be run on Example Test like this: ./example_test -a1 abc.
- Flags: These arguments are a command line option without a value. Examples are arg4 and arg6. A flag argument without Constant set will run one time with the flag and one time without. If Constant is true on the flag, it will only run with the flag. For example, flag arg4 on the Example Test command line would be ./example_test -a4.
- Constant Arguments: If the Constant key on an argument is not specified, the default is false. An argument can only be constant if it is a flag, or if its Values list contains exactly one item.
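The argument rules above can be sketched in Python. This is a minimal illustration, not OFHC's implementation: the function names are hypothetical, and it assumes that the framework runs every combination of argument values once, which this document does not state explicitly.

```python
from itertools import product


def expand_argument(spec):
    """Return every command-line fragment one argument can contribute."""
    option = spec["Option"]
    if spec.get("Flag", False):
        # A constant flag is always present; any other flag is tried
        # once with the flag and once without it.
        return [[option]] if spec.get("Constant", False) else [[option], []]
    # An ordinary argument contributes one fragment per listed value.
    return [[option, str(value)] for value in spec["Values"]]


def build_commands(binary, args):
    """Assumption: each combination of argument fragments is run once."""
    choices = [expand_argument(spec) for spec in args]
    commands = []
    for combo in product(*choices):
        tokens = [binary] + [tok for fragment in combo for tok in fragment]
        commands.append(" ".join(tokens))
    return commands


# A reduced version of the Example Test arguments.
args = [
    {"Option": "-a1", "Values": ["abc"]},
    {"Option": "-a4", "Flag": True, "Constant": True},
    {"Option": "-a6", "Flag": True},
]
commands = build_commands("./example_test", args)
for command in commands:
    print(command)
```

With these three arguments the sketch yields two command lines: one with -a6 and one without, both carrying -a1 abc and the constant flag -a4.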
Core Level Settings
Core_Config:
  Sockets:
    - 0
  SMT: True
  All: True
  Halfs: True
  Quarters: True
  CCDs: True
  Cores: True
- Sockets: list of sockets. Each entry can be 0, 1, or all. For example, if the list is [0,1,all], the test suite first runs on socket 0, then socket 1, then all sockets together. (optional, defaults to all)
- SMT: bool. When true, the test suite runs on thread 0 and thread 1 together; when false, it runs only on thread 0.
Each of the following core divisions defaults to false if it is not specified in the configuration.
- All: bool. When true, the suite runs on all cores together. When false, it skips running on all cores.
For the following core divisions, the configuration file can specify either a boolean or a list of integers. When the boolean true is specified, the test suite runs on each possible subdivision, one at a time. For example, when Halfs: true is specified, the test suite first runs on all cores in half 0, then on all cores in half 1. When a list of integers is specified, the suite runs only on the listed subdivisions. For example, when CCDs: [0,2,7] is specified, the suite runs only on CCD0, then CCD2, then CCD7.
- Halfs: bool/list of ints.
- Quarters: bool/list of ints.
- CCDs: bool/list of ints.
- Cores: bool/list of ints. Cores specifies logical cores.
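The boolean-or-list behavior of the core divisions can be illustrated with a short sketch. It makes strong simplifying assumptions that do not come from OFHC: cores are numbered contiguously, each division is an equal contiguous slice, and the function names are hypothetical. The real mapping of cores to halves, quarters, and CCDs depends on the CPU topology.

```python
def split_cores(cores, parts):
    """Split core IDs into `parts` contiguous, equally sized groups.

    Assumption: contiguous numbering; real hardware topology may differ.
    """
    size = len(cores) // parts
    return [cores[i * size:(i + 1) * size] for i in range(parts)]


def select_divisions(cores, parts, setting):
    """Mirror the config semantics: False, True, or a list of indices."""
    groups = split_cores(cores, parts)
    if setting is True:
        return groups  # run on every subdivision, one at a time
    if not setting:
        return []  # division disabled (the default)
    return [groups[index] for index in setting]  # only the listed ones


cores = list(range(16))  # a made-up 16-core part
halves = select_divisions(cores, 2, True)
quarters = select_divisions(cores, 4, [0, 3])
skipped = select_divisions(cores, 8, False)

print(halves)
print(quarters)
print(skipped)
```

Each returned group stands for one pass of the test suite over that set of cores.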
If a log file already exists in the log_dir, the most recent run is appended to the existing log file.
cmd_results_list.log.csv
: A CSV file containing pass/fail results and details for all tests run in the suite.
cur_cmd
: The current test being executed. This file is created before running the test so that if the test execution stops due to unexpected reasons, this file will reflect the failing test.
cur_cmd.1
: The test that ran before the current failing test.
debug.log
: A file containing debugging information. It is only created if Log_Level: Debug is set.
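The reason cur_cmd is written before the test starts can be shown with a small sketch. This is not OFHC's code: the function name is hypothetical, and the rotation of cur_cmd into cur_cmd.1 is an assumption about how the two files relate. The point is that the command reaches disk before the test runs, so it survives a hang or reset.

```python
import os
import subprocess
import sys
import tempfile


def run_with_breadcrumb(command, log_dir):
    """Record the command on disk, then run it and return its exit code."""
    cur = os.path.join(log_dir, "cur_cmd")
    prev = os.path.join(log_dir, "cur_cmd.1")
    if os.path.exists(cur):
        os.replace(cur, prev)  # keep the previous command as cur_cmd.1
    with open(cur, "w") as handle:
        handle.write(" ".join(command) + "\n")
        handle.flush()
        os.fsync(handle.fileno())  # force the write out before the test starts
    return subprocess.run(command).returncode


# Two stand-in "tests": the first passes, the second exits with code 3.
log_dir = tempfile.mkdtemp()
rc_first = run_with_breadcrumb([sys.executable, "-c", "pass"], log_dir)
rc_second = run_with_breadcrumb(
    [sys.executable, "-c", "raise SystemExit(3)"], log_dir
)

with open(os.path.join(log_dir, "cur_cmd")) as handle:
    current = handle.read().strip()
with open(os.path.join(log_dir, "cur_cmd.1")) as handle:
    previous = handle.read().strip()
print(rc_first, rc_second)
```

After the second run, cur_cmd names the failing command and cur_cmd.1 names the one before it.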
Bare
: No warnings, info, or debug messages are output. Only the output file (cmd_results_list.log.csv), current test file (cur_cmd), and previous test file (cur_cmd.1) are written.
All
: Bare plus logs warnings to stream.
Excess
: All plus captures system uptime in output file.
Debug
: Excess plus creates a debug file with all messages.
Uptime is logged in the output only when the log level is set to Excess or higher.
It is expected that the test binaries will check for failures. When the executable detects a failure, it should return a non-zero exit code and provide any relevant details in STDOUT and STDERR. The executable should not control core affinity or any type of threading; the framework takes care of this.
The arguments should be compatible with the specifications in the configuration file.
On success, the executable should return code 0. On failure, it should return a non-zero code and output any details to STDOUT and STDERR.
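A test executable that follows this contract can be very small. The sketch below is a hypothetical example written in Python: the workload (comparing a computed sum with its closed form) is made up for illustration, and the script deliberately does not set core affinity or start threads.

```python
import sys


def check_sum(limit):
    """Made-up workload: compare a computed sum against its closed form."""
    computed = sum(range(limit + 1))
    expected = limit * (limit + 1) // 2
    return computed == expected, computed, expected


def main(argv):
    """Return 0 on success and a non-zero code on failure."""
    limit = int(argv[1]) if len(argv) > 1 else 1000
    ok, computed, expected = check_sum(limit)
    if ok:
        print(f"PASS: sum(0..{limit}) = {computed}")
        return 0
    # Failure details go to STDERR so the framework can log them.
    print(f"FAIL: expected {expected}, got {computed}", file=sys.stderr)
    return 1


# Called directly here for illustration. A real test executable would
# instead end with: sys.exit(main(sys.argv))
exit_code = main(["example_test", "10"])
```

Any language works for the test binary as long as the exit code and output conventions are respected.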
By default, OFHC checks for MCEs after every test. To reduce the overhead of OFHC, you can disable this by setting Constant_MCE_Checking to false in the configuration file. All details of an MCE are logged to the log file. Basic information is decoded and provided in a transparent way; however, for extended debug, extra decoding of the MCEs may be required. For this, contact a representative at AMD for OFHC.
The user-defined test is responsible for this. See User Defined Test Executable for details.
An example of a standard CPU stress test is System Stability Tester, or Systester.
Included are $OFHC_ROOT/config_files/systester-config.yml and $OFHC_ROOT/config_files/systester-config.json.
These configuration files give examples of the different argument types and how to specify them for Systester.
To run this example test, download the binaries, extract them, and set the environment variable SYSTESTER_DIR. Alternatively, you can modify the configuration files to point to where the extracted binaries are located.
After the previous step, run python run.py config_files/systester-config.json to begin the test.
All log files will be in the logs directory.
- Test Randomization
- Test Power Events
- Update Core Configuration to support custom core groupings