Benchmark Config example

Braden Stefanuk edited this page Dec 10, 2024 · 4 revisions

Warning

This wiki is obsolete. For the latest documentation, go to rocm.docs.amd.com/projects/Tensile

Tensile uses an incremental and "programmable" benchmarking protocol.

Example Benchmark config.yaml as input file to Tensile

GlobalParameters:
  PrintLevel: 1
  ForceRedoBenchmarkProblems: False
  ForceRedoLibraryLogic: True
  ForceRedoLibraryClient: True
  CMakeBuildType: Release
  EnqueuesPerSync: 1
  SyncsPerBenchmark: 1
  LibraryPrintDebug: False
  NumElementsToValidate: 128
  ValidationMaxToPrint: 16
  ValidationPrintValids: False
  ShortNames: False
  MergeFiles: True
  PlatformIdx: 0
  DeviceIdx: 0
  DataInitTypeAB: 0

BenchmarkProblems:
  - # sgemm NN
    - # ProblemType
      OperationType: GEMM
      DataType: s
      TransposeA: False
      TransposeB: False
      UseBeta: True
      Batched: True

    - # BenchmarkProblemSizeGroup
      InitialSolutionParameters:
      BenchmarkCommonParameters:
        - ProblemSizes:
          - Range: [ [5760], 0, [1], 0 ]
        - LoopDoWhile: [False]
        - NumLoadsCoalescedA: [-1]
        - NumLoadsCoalescedB: [1]
        - WorkGroupMapping: [1]
      ForkParameters:
        - ThreadTile:
          - [ 8, 8 ]
          - [ 4, 8 ]
          - [ 4, 4 ]
        - WorkGroup:
          - [  8, 16,  1 ]
          - [ 16, 16,  1 ]
        - LoopTail: [False, True]
        - EdgeType: ["None", "Branch", "ShiftPtr"]
        - DepthU: [ 8, 16]
        - VectorWidth: [1, 2, 4]
      BenchmarkForkParameters:
      BenchmarkJoinParameters:
      BenchmarkFinalParameters:
        - ProblemSizes:
          - Range: [ [5760], 0, [1], 0 ]

LibraryLogic:

LibraryClient:

Structure of config.yaml

The top-level data structure is a dictionary whose keys are GlobalParameters, BenchmarkProblems, LibraryLogic and LibraryClient.

  • GlobalParameters contains a dictionary storing global parameters used for all parts of the benchmarking.
  • BenchmarkProblems contains a list of dictionaries representing the benchmarks to conduct; each element, i.e. dictionary, in the list is for benchmarking a single ProblemType. The keys for these dictionaries are ProblemType, InitialSolutionParameters, BenchmarkCommonParameters, ForkParameters, BenchmarkForkParameters, JoinParameters, BenchmarkJoinParameters and BenchmarkFinalParameters. See Benchmark Protocol for more information on these steps.
  • LibraryLogic contains a dictionary storing parameters for analyzing the benchmark data and deciding which Solution the backend library selects for given ProblemSizes.
  • LibraryClient contains a dictionary storing parameters for actually creating the library and creating a client which calls into the library.

Global Parameters

  • Name: Prefix to add to API function names; typically name of device.
  • MinimumRequiredVersion: Minimum version of Tensile required to interpret this YAML file.
  • RuntimeLanguage: Use HIP or OpenCL runtime.
  • KernelLanguage: For OpenCL runtime, kernel language must be set to OpenCL. For HIP runtime, kernel language can be set to HIP or assembly (gfx803, gfx900).
  • PrintLevel: 0=Tensile prints nothing, 1=prints some, 2=prints a lot.
  • ForceRedoBenchmarkProblems: False means don't redo a benchmark phase if results for it already exist.
  • ForceRedoLibraryLogic: False means don't re-generate library logic if it already exists.
  • ForceRedoLibraryClient: False means don't re-generate library client if it already exists.
  • CMakeBuildType: Release or Debug
  • EnqueuesPerSync: Number of enqueues before syncing the queue.
  • SyncsPerBenchmark: Number of queue syncs for each problem size.
  • LibraryPrintDebug: True means Tensile solutions will print kernel enqueue info to stdout.
  • NumElementsToValidate: Number of elements to validate; 0 means no validation.
  • ValidationMaxToPrint: How many invalid results to print.
  • ValidationPrintValids: True means print validation comparisons that are valid, not just invalids.
  • ShortNames: Convert long kernel, solution and file names to short serial ids.
  • MergeFiles: False means write each solution and kernel to its own file.
  • PlatformIdx: OpenCL platform id.
  • DeviceIdx: OpenCL or HIP device id.
  • DataInitType[AB,C] (that is, DataInitTypeAB and DataInitTypeC): Initialize validation data with 0=zeros, 1=ones, 2=serial, 3=random.
  • KernelTime: Use kernel time reported from the runtime, rather than API times from CPU clocks, to compare kernel performance.

The exhaustive list of global parameters and their defaults is stored in Common.py.
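
As an illustration of how the settings above combine, the fragment below is a minimal sketch of a GlobalParameters block for a timing-only run, with validation switched off and kernel times taken from the runtime. The values are illustrative rather than recommended defaults, and treating KernelTime as a True/False switch is an assumption; Common.py remains the reference for accepted values.

```yaml
GlobalParameters:
  PrintLevel: 1                     # 1 = print some output
  ForceRedoBenchmarkProblems: True  # redo benchmark phases even if results exist
  EnqueuesPerSync: 1                # enqueues before each queue sync
  SyncsPerBenchmark: 1              # queue syncs per problem size
  NumElementsToValidate: 0          # 0 = no validation
  DataInitTypeAB: 3                 # 3 = random data for A and B
  DataInitTypeC: 3                  # 3 = random data for C
  KernelTime: True                  # assumed boolean: compare runtime-reported kernel times
```

Any parameter left out of this block falls back to its default.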

Problem Type Parameters

  • OperationType: GEMM or TensorContraction.
  • DataType: s, d, c, z, h
  • UseBeta: False means library/solutions/kernel won't accept a beta parameter; thus beta=0.
  • UseInitialStrides: False means data is contiguous in memory.
  • HighPrecisionAccumulate: For tmpC += a*b, use twice the precision for tmpC as for DataType. Not yet implemented.
  • ComplexConjugateA: True or False; ignored for real precision.
  • ComplexConjugateB: True or False; ignored for real precision.

For OperationType=GEMM only:

  • TransposeA: True or False.

  • TransposeB: True or False.

  • Batched: True (False has been deprecated).

For OperationType=TensorContraction only (showing batched GEMM NT: C[ijk] = Sum[l] A[ilk] * B[jlk]):

  • IndexAssignmentsA: [0, 3, 2]

  • IndexAssignmentsB: [1, 3, 2]

  • NumDimensionsC: 3.
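
Put together, the tensor-contraction settings listed above might appear in a ProblemType entry as sketched below. This is an illustrative fragment, not a tested config: DataType and UseBeta are filled in only to make the entry look complete, and the index-to-letter mapping in the comments (0=i, 1=j, 2=k for C, 3=l for the summation index) is inferred from the formula C[ijk] = Sum[l] A[ilk] * B[jlk].

```yaml
- # ProblemType: batched sgemm NT written as a tensor contraction
  OperationType: TensorContraction
  DataType: s
  UseBeta: True
  NumDimensionsC: 3              # C[i, j, k] uses indices 0, 1, 2
  IndexAssignmentsA: [0, 3, 2]   # A[i, l, k]
  IndexAssignmentsB: [1, 3, 2]   # B[j, l, k]
```

TransposeA, TransposeB and Batched are left out here because they apply to OperationType=GEMM only.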

Solution / Kernel Parameters

See: Kernel Parameters.

Defaults

Because the benchmarking process, and therefore the config.yaml file, is so flexible and complex, Tensile has a default value for every parameter. If you neglect to put LoopUnroll anywhere in your benchmark, Tensile neither crashes nor complains; it puts the default LoopUnroll options into that parameter's default phase (common, fork, join...). This guarantees ease of use and, more importantly, backward compatibility: every time we add a new possible solution parameter, you don't necessarily need to update your configs, because a default is already defined for it.

However, this may cause some confusion. If your config forks 2 parameters but you see that 3 were forked during benchmarking, that's because you didn't specify the 3rd parameter anywhere, so Tensile put it in its default phase, which was forking (for example). Also, specifying ForkParameters: and leaving it empty isn't the same as leaving ForkParameters out of your config. If you leave ForkParameters out of your config, Tensile will add a ForkParameters step and put the default parameters into it (unless you put all the parameters elsewhere), but if you specify ForkParameters and leave it empty, then you won't fork anything.
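
The distinction between an omitted and an empty ForkParameters step can be sketched with two variants of the same benchmark step. The fragment is illustrative only: it shows just the inner keys of a BenchmarkProblemSizeGroup, out of context, and uses a YAML document separator (---) purely to keep both variants in one block. The parameter values are copied from the example config at the top of this page.

```yaml
# Variant 1: ForkParameters omitted.
# Tensile adds a ForkParameters step itself and fills it with the defaults of
# any parameters that are not specified elsewhere.
BenchmarkCommonParameters:
  - ProblemSizes:
    - Range: [ [5760], 0, [1], 0 ]
  - LoopDoWhile: [False]
---
# Variant 2: ForkParameters present but empty.
# No parameters are forked.
BenchmarkCommonParameters:
  - ProblemSizes:
    - Range: [ [5760], 0, [1], 0 ]
  - LoopDoWhile: [False]
ForkParameters:
```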

Therefore, it is safest to specify all parameters in your config.yaml files; that way you'll guarantee the behavior you want. See /Tensile/Common.py for the current list of parameters.