Generate CMake rules to download and import models #10167

Merged: 6 commits into iree-org:main on Aug 30, 2022

Conversation

@pzread (Contributor) commented Aug 22, 2022

Add a module to generate CMake rules from the python-defined benchmarks.

Follow-up changes will generate the rules for compilation and execution.

Here is an example of the generated CMake rules:

# Fetch the model from "https://storage.googleapis.com/iree-model-artifacts/mobilenet_v2_1.0_224.tflite"
add_custom_command(
  OUTPUT "${_MODEL_ARTIFACTS_DIR}/7d45f8e5-bb5e-48d0-928d-8f125104578f_mobilenet_v2.tflite"
  COMMAND
    "${Python3_EXECUTABLE}" "${IREE_ROOT_DIR}/build_tools/scripts/download_file.py"
    "https://storage.googleapis.com/iree-model-artifacts/mobilenet_v2_1.0_224.tflite" -o "${_MODEL_ARTIFACTS_DIR}/7d45f8e5-bb5e-48d0-928d-8f125104578f_mobilenet_v2.tflite"
  DEPENDS
    "${IREE_ROOT_DIR}/build_tools/scripts/download_file.py"
  COMMENT "Downloading https://storage.googleapis.com/iree-model-artifacts/mobilenet_v2_1.0_224.tflite"
)
add_custom_target(
    "${_PACKAGE_NAME}_model-7d45f8e5-bb5e-48d0-928d-8f125104578f"
  DEPENDS
    "${_MODEL_ARTIFACTS_DIR}/7d45f8e5-bb5e-48d0-928d-8f125104578f_mobilenet_v2.tflite"
)

# Import the TFLite model "${_MODEL_ARTIFACTS_DIR}/7d45f8e5-bb5e-48d0-928d-8f125104578f_mobilenet_v2.tflite"
iree_import_tflite_model(
  TARGET_NAME "${_PACKAGE_NAME}_iree-import-model-7d45f8e5-bb5e-48d0-928d-8f125104578f"
  SOURCE "${_MODEL_ARTIFACTS_DIR}/7d45f8e5-bb5e-48d0-928d-8f125104578f_mobilenet_v2.tflite"
  OUTPUT_MLIR_FILE "${_IREE_ARTIFACTS_DIR}/7d45f8e5-bb5e-48d0-928d-8f125104578f_mobilenet_v2/mobilenet_v2.mlir"
)
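For context on the planned follow-up, a generated compilation rule could plausibly reuse IREE's existing iree_bytecode_module helper. This is only a hedged sketch, not something this PR emits; the target name and compiler flag below are illustrative, with the real flags to be supplied by the generator:

```cmake
# Hypothetical follow-up rule (not generated by this PR): compile the imported
# MLIR into a deployable module. The backend flag is an illustrative choice.
iree_bytecode_module(
  NAME
    "iree-module-7d45f8e5-bb5e-48d0-928d-8f125104578f"
  SRC
    "${_IREE_ARTIFACTS_DIR}/7d45f8e5-bb5e-48d0-928d-8f125104578f_mobilenet_v2/mobilenet_v2.mlir"
  FLAGS
    "--iree-hal-target-backends=llvm-cpu"
)
```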

@pzread force-pushed the bench-framework-2-1 branch 2 times, most recently from 6507f8d to b0585c8 on August 23, 2022 16:54
@pzread marked this pull request as ready for review on August 23, 2022 20:19
Review thread on build_tools/benchmarks/suites/cmake_rule_generator.py (outdated, resolved):
# Licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
"""Generator that generates CMake rules from python defined benchmarks.
Member:

(I might have missed this in an earlier review / design doc)
Are the generated CMake rules going to be checked in, or just materialized when building benchmarks?

Contributor (author):

I haven't fully decided yet, but I believe it's possible to generate and include a CMake file during configuration (sketched below). I'd prefer not to check in the generated file.
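A minimal sketch of that configure-time flow, assuming the generator grows an `--output` flag (the flag and file names here are hypothetical):

```cmake
# Run the Python generator at configure time, then include its output so the
# generated rules participate in the build without being checked in.
# Assumes find_package(Python3) has already set Python3_EXECUTABLE.
set(_GENERATED_RULES "${CMAKE_CURRENT_BINARY_DIR}/generated_benchmark_rules.cmake")
execute_process(
  COMMAND
    "${Python3_EXECUTABLE}"
    "${IREE_ROOT_DIR}/build_tools/benchmarks/suites/cmake_rule_generator.py"
    --output "${_GENERATED_RULES}"
  RESULT_VARIABLE _GEN_RESULT
)
if(NOT _GEN_RESULT EQUAL 0)
  message(FATAL_ERROR "Benchmark CMake rule generation failed")
endif()
include("${_GENERATED_RULES}")
```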

Contributor:

Hmm, that seems a bit scary; it heads deep into CMake land, which is not a nice land.

@pzread requested a review from ScottTodd on August 26, 2022 22:02
# Licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
"""Generator that generates CMake rules from python defined benchmarks.
Member:

Zooming out a bit, I'm wondering if CMake is really the right tool for building / running benchmarks. This moves one step away from CMake (generating rules via Python, instead of adding more logic to the rules themselves or authoring them manually), but what could this look like if we took another step in the same direction?

What is CMake (a build system / build system generator) actually providing for us here, and would other scripts / tools / build systems do a better job?

  • listing benchmark artifacts to build (CMake targets)
  • listing benchmarks to run
  • running commands (download files, invoke importer tools, invoke compiler tools, pushing files to devices, running benchmarks, collecting results, uploading results)

Not really provided:

  • Enumerating attached devices (e.g. choosing to run benchmarks on 1 of 3 connected Android devices)
  • Comparing results across devices / benchmark sessions / historical data

Moving away from CMake entirely seems like a stretch, but here's another idea: what if we made the benchmarks their own CMake project (or even their own repository)? We'd have more flexibility there with defining benchmark-specific options, including dependencies, and organizing the file structure.
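To make that concrete, a standalone benchmark project might start from something like the sketch below; the project name, cache variable, and tool set are hypothetical, and it assumes the IREE tools were built or installed elsewhere:

```cmake
cmake_minimum_required(VERSION 3.18)  # 3.18+ for find_program(... REQUIRED)
project(iree-benchmarks LANGUAGES NONE)

# Consume prebuilt IREE tools rather than building them in this project.
set(IREE_TOOLS_DIR "" CACHE PATH
    "Directory containing iree-compile and the importer tools")
find_program(IREE_COMPILE_PATH iree-compile
             PATHS "${IREE_TOOLS_DIR}" REQUIRED)
find_program(IREE_IMPORT_TFLITE_PATH iree-import-tflite
             PATHS "${IREE_TOOLS_DIR}" REQUIRED)
```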

Contributor (author):

I'm only planning to use CMake to import and build the artifacts, since those have dependencies on the tools built by the build system. The benchmark list will be expressed in JSON and read by benchmark tools or other CI tools.

I agree it might be a good idea to have a separate project for benchmarks, but I don't see many benefits to doing that for now; in particular, a separate benchmark project would need some extra CMake rules to find the import/compiler tools from the main IREE project. It should be fairly easy to move this code out when we need to.

Member:

I'm only planning to use CMake to import and build the artifacts, since those have dependencies on the tools built by the build system.

That's a very weak dependency though - the tools are often built separately.

especially in the separate benchmark project, we need to write some extra CMake rules to find the import/compiler tools from the main IREE project

Finding a few files doesn't seem particularly tricky? This function isn't doing much:

function(iree_get_executable_path OUTPUT_PATH_VAR EXECUTABLE)

For local iteration, it doesn't seem that unreasonable to me to require first building the tools, then running a benchmark script with the path to the tools. I don't think having the benchmarks and the compiler tools in the same CMake project adds that much value.

Member:

(I think the mechanics of this PR itself are fine, but I'm trying to think through the design space to see if there's a more flexible / easier to maintain configuration that we could be building towards)

Contributor:

I think using CMake to manage dependencies between things like tflite -> mlir -> vmfb is good; let's not reinvent dependency management. But I think I agree with Scott that this could be a separate CMake project (it doesn't need to happen in this PR). The separate project could accept a path to an IREE install directory, just as we do for our cross-compile builds. In fact, I think it could still import iree_macros.cmake and so could reuse functions without actually being the same project.

OTOH, what do we actually gain from the separate CMake project? If the concern is just option clutter, we could make all benchmark options dependent options so they're hidden unless you enable benchmarks (see the sketch below). Although I guess a separate project wouldn't need to be as well namespaced, especially since it's a project we wouldn't need to craft so that someone else could use it in their own build.
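(For what it's worth, the dependent-options idea maps directly onto CMake's stock CMakeDependentOption module; the benchmark-specific option name below is made up for illustration:)

```cmake
include(CMakeDependentOption)
option(IREE_BUILD_BENCHMARKS "Builds IREE benchmark suites." OFF)
# Hypothetical benchmark-only option: forced OFF and hidden from the cache UI
# unless benchmarks are enabled.
cmake_dependent_option(IREE_BENCHMARK_DOWNLOAD_ARTIFACTS
  "Downloads benchmark model artifacts at build time." ON
  "IREE_BUILD_BENCHMARKS" OFF)
```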

Member:

what do we actually gain from the separate CMake project?

  • a simpler "core" project
  • a dedicated space to focus on benchmarking (script and target namespacing, source file tree structure, build file tree structure, etc.)
  • enforced looser coupling between the source build, binary tools, and benchmarks

If we had a standalone iree-org/iree-benchmarking / iree-org/iree-performance / iree-org/iree-perf / iree-org/perf repository that just used the public APIs, files, scripts, etc., then it would be easier to come in from outside the project and make changes, or to reference what that project does when running custom benchmarks. The current setup tightly couples the specific benchmark workloads, the tools themselves, and the scripts used to orchestrate benchmarking, all within the core IREE project.

A separate CMake project in the same repository offers some of those benefits, but I think it's useful to consider where the design could eventually end up. Running benchmarks on presubmit using tools and benchmark definitions from the same repository is a good argument for keeping them together, but that may get tricky once we have more programs in the test suite.

Contributor (author) @pzread commented Aug 29, 2022:

I think most of the advantages would come from having a separate repository. I expect things will get tricky when we start adding larger benchmarks (or even benchmarks from other ML frameworks), especially since right now everything lives under build_tools/benchmarks. The larger benchmarks might bring unwanted deps into the core IREE repo (and again we'd need more flags to disable them).

But a new repository introduces the work of a separate CI and the problem of keeping the benchmarks and the core IREE repo in sync (like how to make sure changes on both sides won't break each other, since we want to run benchmarks for each commit, not just nightly).

Nevertheless, I feel like this topic is stretching well beyond this PR; should we open a dedicated issue for it? :) To me the current implementation isn't harmful for now and wouldn't be hard to move out of IREE (mostly just refactoring some CMake rules). We just need to make sure the required refactoring/separation happens before introducing the new benchmarks.

Member:

Nevertheless, I feel like this topic is stretching well beyond this PR; should we open a dedicated issue for it? :) To me the current implementation isn't harmful for now and wouldn't be hard to move out of IREE (mostly just refactoring some CMake rules).

SGTM, thanks for the discussion so far :)

Contributor (author):

Filed the issue #10244


@pzread merged commit ddaaaaa into iree-org:main on Aug 30, 2022
benvanik added a commit that referenced this pull request Aug 30, 2022
commit f62ec3b
Author: bjacob <benoitjacob@google.com>
Date:   Tue Aug 30 14:51:10 2022 -0400

    VMVX mmt4d ukernel (#10239)

    This brings an initial (unoptimized, reference code only) mmt4d ukernel - both `f32f32f32` and `i8i8i32`.

    It is covered by the e2e matmul tests: if you purposefully introduce a numerical bug in the ukernel function `iree_vmvx_mmt4d_f32f32f32`, then the test `iree/tests/e2e/matmul/e2e_matmul_mmt4d_f32_small_ukernel_vmvx_local-task` fails. Ditto for `i8i8i32`.

    The fact that the whole reference code is for now in `module.c`, as opposed to being nicely isolated in `iree/builtins/ukernel`, is temporary. I have a few questions to ask about the placeholders in that directory, but it will be much more concrete to discuss once we are done reviewing this PR, so I hope it's OK to split the code move into a separate PR.

    A couple of nontrivial decisions in this PR:

    * In `LowerLinalgMicrokernels.cpp` there was a `isUnitInnerStride` helper function, only applied to 2D memrefs. The underlying question is how much layout generality we want ukernels to support; the existing code embodied a decision on this for 2D arrays, but mmt4d deals with 4D arrays, so the question was how to generalize from 2D to 4D. I chose to generalize `isUnitInnerStride` into `areInnerDimsContiguousRowMajor`. See the comment where it is defined. The lit test, `lower_linalg_microkernels.mlir`, has testcases to cover several edge cases here.

    * Similar to what we decided last week for matmul in #10211, there was the question of how to deal with accumulators that are nonzero in the general case but that we know will often be zero in practice, where we want to retain the ability to take advantage of that. This is handled here exactly like it was for matmul in #10211. I even reused the flag symbolic constant rather than creating a separate one. Yay for weak typing.

commit ddaaaaa
Author: Jerry Wu <cheyuw@google.com>
Date:   Tue Aug 30 11:10:40 2022 -0700

    Generate CMake rules to download and import models (#10167)

commit 753ac4d
Author: Geoffrey Martin-Noble <gcmn@google.com>
Date:   Tue Aug 30 11:04:59 2022 -0700

    Remove RV32 Mobile Bert Compilation Benchmark (#10234)

    Building the benchmarks is currently the critical path in CI latency,
    taking almost 25 minutes for just that job, after it waits 25 minutes
    for the TF integrations binaries (was that always so slow??).

    [![ci_run_graph](https://user-images.githubusercontent.com/5732088/187279027-21137775-5a3b-4ddf-ae4d-42e39051e7b2.png)](https://github.com/iree-org/iree/actions/runs/2950708667)


    Of that time, 20 minutes is spent compiling this one vmfb, which we
    only do so we can get statistics on how long it takes to compile. I
    sampled the ten slowest build actions from a local build of the
    benchmarks:

    ```
    1179.39 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-quant.tflite.mlir-22179362840f853977acc734ee75e6ce.vmfb
    216.321 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-quant.tflite.mlir-53b16b00b2d02162b1706d73ab6270b4.vmfb
    159.585 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-quant.tflite.mlir-3bcb3f959e9f123bbaa01aa4d237bab8.vmfb
    146.027 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-quant.tflite.mlir-73879267ae95d3551e73c7f078f4410d.vmfb
    109.922 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-quant.tflite.mlir-cf781c710ad5c59b5e7f205b17b3c37b.vmfb
    104.864 benchmark_suites/TFLite/vmfb/mobilebertsquad.tflite.mlir-fddd07b06a1abf9f5d4ea97225066f01.vmfb
    88.665 benchmark_suites/TFLite/vmfb/mobilebertsquad.tflite.mlir-4fe50b8684bdd4684941c8a5698d3a48.vmfb
    88.316 benchmark_suites/TFLite/vmfb/mobilebertsquad.tflite.mlir-833fba075c9cf413b8acbea9be0acade.vmfb
    87.238 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-float.tflite.mlir-8a916ab990bd1cb5521dce6dd6a5ac6a.vmfb
    86.905 benchmark_suites/TFLite/vmfb/mobilebert-baseline-tf2-float.tflite.mlir-e304860762a8369f86b813d45b3c699a.vmfb
    ```

    This one is the clear winner. I don't think it's worth running this
    compilation only to discover what we already know (it is very slow to
    run this compilation).

    Tested:
    - `build_benchmarks` for this PR ran in 7 minutes instead of 25.
    - Ran riscv benchmark pipeline.

commit a456db6
Author: CindyLiu <hcindyl@google.com>
Date:   Mon Aug 29 13:57:29 2022 -0700

    Add iree_bytecode_module and iree_c_module static lib support (#10231)

    Check and parse `iree-llvm-static-library-output-path` flag to add
    static library object support.

    This makes secondary functions like iree_static_linker_test cleaner.

commit 6d4b129
Author: Thomas <thomasraoux@google.com>
Date:   Mon Aug 29 13:32:42 2022 -0700

    Fix gcc build (#10235)

    Prevent an ambiguous constructor call.

commit 6fa18e0
Author: Kojo Acquah <KoolJBlack@users.noreply.github.com>
Date:   Mon Aug 29 12:20:04 2022 -0700

    Implementation of GPU Shared Memory Transpose Pipeline (#10209)

    Currently only `32x32` aligned 2D transposes are supported. Based on
    https://developer.nvidia.com/blog/efficient-matrix-transpose-cuda-cc/,
    this uses a fixed tile size of `32x32` and a workgroup size of `{8x32}` to
    perform a vectorized copy for the transpose. The tile is padded to `32x33`
    to reduce bank conflicts. Note that bank conflicts aren't fully eliminated
    due to the use of vector loads/stores of width 4.

    Todo:
    * Move beyond a single hard-coded workgroup and tile size?
    * Handle non-aligned transposes
    * Handle dynamically sized transposes

    Related to #10005

commit 546ffcb
Author: Jerry Wu <cheyuw@google.com>
Date:   Mon Aug 29 11:49:58 2022 -0700

    Fix the typos of riscv names in CI (#10232)

commit da21e83
Author: Han-Chung Wang <hanchung@google.com>
Date:   Tue Aug 30 02:29:41 2022 +0800

    Optimize tiling sizes heuristics for elementwise dispatches. (#10179)

    In the past, small numbers could be picked because we wanted vectorization
    enabled for all the kernels. The PR picks more reasonable tiling sizes
    and addresses tiny-dispatch issues.

    The peeling pipeline works in IREE, and the PR moves elementwise dispatches
    (and copy-only dispatches) to use the peeling approach. In this case, we're
    still able to vectorize the dispatches.

    This PR changes the logic to limit the unroll factor when computing the
    vector-level tiling sizes. It avoids generating many operations, which saves
    compilation time and binary size for quantized models. It also improves
    performance of the models IREE tracks across all CPU backends.

    Fixes #9660

commit 249c813
Author: Thomas <thomasraoux@google.com>
Date:   Mon Aug 29 11:06:31 2022 -0700

    [LLVMGPU] Move bufferization after vectorization for matmulSIMT (#10217)

    Transition the matmul SIMT pipeline to do vectorization before bufferization.
    This relies on the alloc_tensor op to model shared memory promotion and
    foreach_thread for tiling at the tensor level.

    Also significantly simplify the vectorization pass by removing patterns
    that are no longer needed.

    This will allow us to do more optimizations at the tensor level going forward.

commit 3f173de
Author: Geoffrey Martin-Noble <gcmn@google.com>
Date:   Mon Aug 29 09:52:45 2022 -0700

    Pin GitHub runner configuration to a specific commit (#10218)

    This changes the startup script on the runners to fetch configuration
    from a specific commit, rather than directly from tip of tree on
    `main`. That makes it possible to actually test, canary, and roll back
    configuration changes, almost as if this were a real production system.

    There are some early-stage scripts to automate the creation of
    templates and managed instance group roll-outs. I've also set up
    functionality to have testing runner groups. Because of the way
    targeting runners works, workflows have to explicitly
    specify the environment so that testing runners *don't* pick up the
    job. The testing group will allow testing new runner configurations on
    presubmit as much as possible.

    Of course, for this change, I actually can't do the safe thing because
    I can't test adding the extra tag to the runners. I've still pushed
    a new template to the testing instance group and set the `build_all`
    job for this PR to run on it by targeting a specific instance by
    hostname: https://github.com/iree-org/iree/runs/8027570693. (Note that
    that run actually had a failure in the asan workflow, but that wasn't
    running on my runner and I don't think it could possibly be related).

    Note that because this doesn't alter the `config/` directory,
    submitting it will not have any effect on the current runners.

    skip-ci

    Peeled out of #10133

commit c50bac3
Author: Geoffrey Martin-Noble <gcmn@google.com>
Date:   Mon Aug 29 09:49:10 2022 -0700

    Build Linux releases on big managed runners (#10126)

    This speeds the Linux builds up a bit, bringing the time for the longest
    job down from ~5 hours to ~20 minutes. Note that this is *only* the
    Linux jobs. The mac ones still take about 4 hours. This should still
    help when iterating on the release though and for faster failure
    indicators (it was indeed helpful when I was iterating here).

    I ran into issues when testing because I was using a package suffix in
    the workflow dispatch, which evidently had never actually been tested
    because it was totally broken. This gave me a lot of wonderful
    opportunity to bash my head against bash and I reworked a lot of the
    `build_linux_package.sh` script. In retrospect, I wish I'd just removed
    the `package_suffix` feature.

    Test run: https://github.com/iree-org/iree/actions/runs/2923210349

    skip-ci

commit c338ae9
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Fri Aug 26 15:25:22 2022 -0700

    Cherry pick D132720 (#10227)

    Cherry pick : llvm/llvm-project@a235562
    Cherry pick : llvm/llvm-project@766f5d8

commit df4c96e
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Fri Aug 26 14:09:45 2022 -0700

    Cherry-pick llvm/llvm-project@7744253 (#10226)

    Towards landing #10177

commit b533909
Author: bjacob <benoitjacob@google.com>
Date:   Fri Aug 26 15:16:39 2022 -0400

    Support the i8i8i32 case in vmvx matmul ukernel. (#10222)

commit 62d2be5
Author: Scott Todd <scotttodd@google.com>
Date:   Fri Aug 26 11:46:33 2022 -0700

    [NFC] Slight cleanup in HAL compiler passes. (#10223)

commit 8a48e10
Author: Thomas <thomasraoux@google.com>
Date:   Fri Aug 26 11:09:30 2022 -0700

    Cherry-pick llvm/llvm-project@2e34599b and llvm/llvm-project@1ee0d60a (#10221)

    * commit 2e34599bfd01e5b20e09bd6af590a52d6a63a64c
    * commit 1ee0d60a9be5dcbe3234b81a1c93e6a206a88154

commit cf5a5d5
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Fri Aug 26 10:10:40 2022 -0700

    Find root by traversing the compute ops in reverse. (#10210)

    Since most of the code generation uses tile + fuse, where the consumer
    is tiled and the producer is fused with it, find the root by
    traversing the ops in reverse.

    Issue #10208

commit 272ea37
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Fri Aug 26 10:00:56 2022 -0700

    Change `softmax` test to use `maxf`. (#10219)

    The e2e softmax test uses `cmpf` -> `select` for max operations. Use
    `maxf` instead. This allows the op to be vectorized. The TOSA to Linalg
    lowering has been recently updated to do the same (and this test was
    derived from using an older TOSA to Linalg lowering).

    Related to PR #10177

commit 233795f
Author: bjacob <benoitjacob@google.com>
Date:   Fri Aug 26 12:00:09 2022 -0400

    Tidy the VMVX ukernels matmul interface (#10211)

    This makes the VMVX ukernel interface for matmul somewhat sustainable and generalizable.

    It's official now that the only supported case is when all operands are row-major (more general support may be wanted in the future, but it would have to allow a separate storage order for each operand to be likely to see use).

    The only flag now is one bit to tell whether to accumulate into an existing accumulator, or just zero it. At the moment we always accumulate but could soon generate calls without the accumulate flag when compiling code where the accumulator operand is known to be zero-filled. In terms of optimized runtime code, it is nearly zero overhead to support that boolean degree of generality in the ukernel.

    The "reference" ukernel impl is changed to be a little more suggestive of how an optimized impl would look.

    The alpha, beta parameters are gone. They were hard to generalize to integer data types, and they were mostly gratuitous generality anyway (they didn't do the same as the namesake BLAS GEMM parameters).

commit 094ec6d
Author: Lei Zhang <antiagainst@google.com>
Date:   Thu Aug 25 19:34:02 2022 -0400

    Integrate llvm/llvm-project@71604f4c4c30 (#10204)

    * Reset third_party/llvm-project: 8f45b5a7a90f24ae1dabeff161e22594039a8b0a (2022-08-24 20:26:48 +0000): RISCV: permit unaligned nop-slide padding emission
    * Updated tensorflow/tensorflow@aed7775
    * Updated tensorflow/mlir-hlo@3b1b023
    * Fixed mhlo include paths

commit 4f0c5b1
Author: Jakub Kuderski <kubak@google.com>
Date:   Thu Aug 25 19:05:01 2022 -0400

    Add debug option to dump LLVMCPU/GPU pass pipeline (#10214)

    This is enabled using the
    `--debug-only=iree-llvm-cpu-lowering-pass-pipeline` and `--debug-only=iree-llvm-gpu-lowering-pass-pipeline` flags.
    The SPIR-V codegen path has a similar option.

commit acb7355
Author: bjacob <benoitjacob@google.com>
Date:   Thu Aug 25 16:24:21 2022 -0400

    Add e2e matmul tests on vmvx+ukernels (float32-only for now) (#10193)

    Types other than `float32` are blocked on VMVX ukernel support (#9903). I'm interested in landing float32 support early because the path to supporting other data types goes through breaking changes in the existing vmvx ukernel interface for matmul (limiting the generality of the BLAS-inspired interface, particularly the `alpha` and `beta` parameters), so I want to have e2e tests in place at the start of that process.

commit 8863f9e
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Thu Aug 25 12:02:11 2022 -0700

    Cherry pick llvm/llvm-project@71604f4 (#10207)

    Fixes #10194

commit 22e6bd4
Author: bjacob <benoitjacob@google.com>
Date:   Thu Aug 25 15:00:18 2022 -0400

    try to be compatible with more pyyaml versions (#10206)

commit da6829d
Author: Scott Todd <scotttodd@google.com>
Date:   Thu Aug 25 11:36:04 2022 -0700

    Replace dedicated host_tools CI job with superset build_all. (#10195)

    Relates to #9855

    These builds shared the same options but just built different targets. Just building the tools _is_ faster than building the tools and tests, but not by enough to justify having a separate job. The build_host_tools.sh script is still referenced by some samples, so I think it's worth keeping for a bit.

    * Spell out `build-dir-gcs-artifact` and `binaries-gcs-artifact` to match other output names
    * Remove host_tools.yml
    * Replace host_tools_assertions with build_all. Note that this uses GCS instead of upload-artifact/download-artifact for transferring archives between jobs
    * Sort jobs in `needs:` so the summary graph groups jobs as expected

    Note: `${BUILD_DIR}/install` is implicit. It could be made explicit with more plumbing.

    Co-authored-by: Geoffrey Martin-Noble <gcmn@google.com>

commit 38e718e
Author: bjacob <benoitjacob@google.com>
Date:   Thu Aug 25 13:32:55 2022 -0400

    Fix printing of matrices on test failure: was overflowing (#10202)

commit d8cabf7
Author: Kevin Gleason <gleasonk@google.com>
Date:   Thu Aug 25 12:20:52 2022 -0400

    Allow blank issues to be created (#10197)

    Currently, clicking the "Blank Issue" button loops you back to the issue chooser page because blank issues are disabled.

    When disabled, the following redirect is in place:
    https://github.com/iree-org/iree/issues/new  --> https://github.com/iree-org/iree/issues/new/choose

    Background: I based the StableHLO issues config off this file, and noticed that the blank issues are not working on that repo because they are disabled. Flipping this boolean did the trick in openxla/stablehlo.

commit 579d527
Author: Matthias Springer <springerm@google.com>
Date:   Thu Aug 25 09:37:22 2022 +0200

    Add CPU matmul benchmark test (#10174)

    This test illustrates how a simple matmul example can be compiled with
    the transform dialect and then benchmarked. Parameter search will use
    the commands that are used in this test.

commit 85171e9
Author: Lei Zhang <antiagainst@google.com>
Date:   Wed Aug 24 21:30:51 2022 -0400

    Cherry-pick MHLO dependency fix to fix release (#10198)

commit 1adcebb
Merge: 8301a5c 7fe1437
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Wed Aug 24 18:14:42 2022 -0700

    Merge pull request #10181 from iree-org/benvanik-execute-commands

    Secondary command buffers can now be executed from primary command buffers via iree_hal_command_buffer_execute_commands. During recording of nested command buffers push descriptors can indirectly reference slots in a binding table provided with each execution request. This enables the same reusable command buffer to be executed many times with unique bindings (even with prior execution in-flight), which is a common pattern with queue-ordered allocations.

    In the future we could allow the indirect bindings on primary command buffers as well but that requires more work in each backend to support and for now making it nested-only lets us turn on the feature incrementally. For now nothing supports either nested or indirect bindings so this is pure plumbing.

    The compiler has the HAL ops modeled but nothing is lowering into them yet; a pass that memoizes portions of streams and sets up the indirect binding references is required.

    Progress on #10144.
    Bumps bytecode version due to HAL changes.

commit 7fe1437
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Tue Aug 23 22:10:56 2022 -0700

    Disabling ASAN fully_connected.mlir test due to swiftshader issue.
    Same behavior as the other excluded tests from #5715.

commit dd93b3c
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Tue Aug 23 16:28:16 2022 -0700

    Bumping bytecode version due to breaking HAL changes.

commit 9bd7031
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Tue Aug 23 10:44:30 2022 -0700

    Plumbing support for nested command buffers and binding tables.
    Secondary command buffers can now be executed from primary command
    buffers via iree_hal_command_buffer_execute_commands. During recording
    of nested command buffers push descriptors can indirectly reference
    slots in a binding table provided with each execution request. This
    enables the same reusable command buffer to be executed many times
    with unique bindings (even with prior execution in-flight), which is
    a common pattern with queue-ordered allocations.

    In the future we could allow the indirect bindings on primary command
    buffers as well but that requires more work in each backend to support
    and for now making it nested-only lets us turn on the feature
    incrementally.

    The compiler has the HAL ops modeled but nothing is lowering into them
    yet; a pass that memoizes portions of streams and sets up the indirect
    binding references is required.

    Progress on #10144.

commit 8301a5c
Author: Scott Todd <scotttodd@google.com>
Date:   Wed Aug 24 16:39:27 2022 -0700

    Rework build_benchmarks to reuse already built host tools. (#10190)

    This should address #4662 (comment). This workflow is currently our slowest, taking ~32 minutes (half of that time is spent rebuilding `iree-compile`), and that's _after_ blocking on the 20-minute build_tf_integrations job.

    New timing is ~20 minutes (saving 10 minutes): https://github.com/iree-org/iree/runs/8004780350?check_suite_focus=true

commit cca2ff6
Author: bjacob <benoitjacob@google.com>
Date:   Wed Aug 24 18:46:11 2022 -0400

    Handle rank-reducing subviews in ResolveBufferDescriptors (#10192)

commit d9e6eb7
Author: CindyLiu <hcindyl@google.com>
Date:   Wed Aug 24 10:54:06 2022 -0700

    Update the candidate commitish value with the last green commit (#10183)

    Make it consistent with the rest of the release steps.

commit 1d55c6c
Author: Thomas <thomasraoux@google.com>
Date:   Wed Aug 24 10:06:44 2022 -0700

    clean up workaround after upstream fix (#10188)

commit c9e9482
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Wed Aug 24 08:20:05 2022 -0700

    Cherry-pick llvm/llvm-project@a7bfdc2 (#10150)

commit 00d34d1
Author: MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com>
Date:   Wed Aug 24 07:59:13 2022 -0700

    NFC: Refactoring to make extending fusion heuristics in dispatch formation easier. (#10187)

    Minor refactoring to allow for extending fusion heuristics for fusing
    root with producers.

commit 3c69ea9
Author: Jakub Kuderski <kubak@google.com>
Date:   Wed Aug 24 10:31:08 2022 -0400

    [iree-run-module] Do not abort when `Run` fails. (#10186)

commit 63d4693
Author: Jakub Kuderski <kubak@google.com>
Date:   Wed Aug 24 10:30:50 2022 -0400

    [iree-run-module] Clarify how to pass scalar inputs. NFC. (#10185)

    Be more explicit and provide an example.

commit 2ec165b
Author: Lei Zhang <antiagainst@google.com>
Date:   Wed Aug 24 00:33:04 2022 -0400

    Integrate llvm/llvm-project@4332b049edf6 (#10180)

    * Reset third_party/llvm-project: 4332b049edf6ccf98c9e31dcc983760a89f01d40 (2022-08-23 17:37:12 +0800): [docs] Add examples for printing asynchronous stack for coroutines
    * Updated tensorflow/tensorflow@55791c2
    * Updated tensorflow/mlir-hlo@184a76a
    * Fixed mhlo/chlo enum split.

commit ae72b95
Author: CindyLiu <hcindyl@google.com>
Date:   Tue Aug 23 15:26:08 2022 -0700

    Add llvm static library linker test targets (#10149)

    * Add llvm static library linker test targets

    Add a cmake function to build/test llvm static library modules with
    the llvm-cpu compiler target backend, executed using the
    local-sync runtime HAL driver. The executable is linked
    to a simple runtime runner generated from a template.

    Add simple e2e mlir linker tests in `tests/e2e/models`.

commit e4dc88c
Author: Rob Suderman <suderman@google.com>
Date:   Tue Aug 23 10:54:02 2022 -0700

    Update flex ops test for the TFLite front-end test (#10164)

commit 57ec69d
Author: Thomas <thomasraoux@google.com>
Date:   Tue Aug 23 09:01:23 2022 -0700

    [LLVMGPU] Start transitioning to scf.foreach for second level tiling (#10166)

    This will allow doing distribution at the tensor level.

commit bd33104
Merge: 35d28b9 d1ca241
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Tue Aug 23 08:55:34 2022 -0700

    Merge pull request #10170 from iree-org/benvanik-pipeline-layout-3

    Replacing descriptor set layout usage with a flag bitfield.
    Descriptor sets are only used in layouts, and the usage is always push-only today. As we support things like binding tables,
    we may want to indicate which bindings may come from tables, and if we want to carry access information (which bindings are read-only, etc.) we'll need somewhere for that too: instead of having 4 enums with 2 options each, we'll just mash them together for now.

    This also adds a per-descriptor flag that can be used for indicating binding behavior. Today it's got a bit indicating whether the particular descriptor is read-only but we could extend it to support caching behavior (non-temporal, atomics, etc).

    The upstream bitfield enum has some glitchy behavior with lowercase strings (hardcoded to look for "None" instead of "none", etc.). I have a refresh of the HAL dialect to do at some point and will normalize things then.

    Progress on #10144.
    VMFB version bumped because of breaking type/export name change.

commit 35d28b9
Author: Matthias Springer <springerm@google.com>
Date:   Tue Aug 23 15:31:15 2022 +0200

    Support multiple target ops in clone_succeeding_op_into_dispatch_region (#10035)

    The target ops are sorted topologically before cloning them one-by-one.
    This is to ensure that there are no dominance violations.

commit b5bf9d5
Author: Matthias Springer <springerm@google.com>
Date:   Tue Aug 23 14:31:02 2022 +0200

    Add clone_succeeding_op_into_dispatch_region transform op (#10022)

    This op is symmetric to `clone_preceding_op_into_dispatch_region` and
    can be used to build heuristics for dispatch region formation.

commit 7e8c831
Author: Matthias Springer <springerm@google.com>
Date:   Tue Aug 23 11:56:07 2022 +0200

    Support multiple target ops in clone_preceding_op_into_dispatch_region (#10020)

    The target ops are sorted topologically before cloning them one-by-one.
    This is to ensure that there are no dominance violations.

commit dc06d95
Author: Han-Chung Wang <hanchung@google.com>
Date:   Tue Aug 23 14:26:26 2022 +0800

    [NFC] Remove outdated method arguments from KernelConfig. (#10165)

    The distribution tiling was done at flow level, and it's moved to a
    stage after setting kernel configurations. We no longer need the
    tiledLoop information when setting configurations.

    Also apply minor cleanups when revisiting the file -- use `.empty()`
    method instead of `.size() > 0`.

commit d1ca241
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 21:42:35 2022 -0700

    Bumping bytecode version due to breaking HAL changes.

commit a4da601
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 15:50:25 2022 -0700

    Replacing descriptor set layout usage with a flag bitfield.
    Descriptor sets are only used in layouts and the usage is now
    always push-only today. As we support things like binding tables
    we may want to indicate which bindings may come from tables and
    if we want to carry access information (which bindings are read-only,
    etc) we'll need somewhere for that too: instead of having 4 enums
    with 2 options each we'll just mash them together for now.

    This also adds a per-descriptor flag that can be used for indicating
    binding behavior. Today it's got a placeholder read-only value but we
    can add more in the future controlling cache behavior and such.

    Progress on #10144.

commit 88795f5
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 16:22:05 2022 -0700

    Fixing deprecation warnings on mlir::OptionalParseResult.

commit d86e3a7
Author: Scott Todd <scotttodd@google.com>
Date:   Mon Aug 22 18:45:18 2022 -0700

    Remove ArithmeticExpandOpsPass from SPIRV and VMVX lowerings. (#10162)

    Based on discussion at #10142 (comment). This "fixes" one case of `spv.IsNan` ops getting introduced while lowering `arith.minf`, but it does not generally address NaNs coming from other sources (user-space or internal to the compiler).

    ## Rationale

    The `ArithmeticExpandOpsPass` pass (declaration [here](https://github.com/llvm/llvm-project/blob/af29db64b2c7091070dd623c81872559657e7b3d/mlir/include/mlir/Dialect/Arithmetic/Transforms/Passes.td#L31-L34) and [here](https://github.com/llvm/llvm-project/blob/af29db64b2c7091070dd623c81872559657e7b3d/mlir/include/mlir/Dialect/Arithmetic/Transforms/Passes.h#L23-L24)) is overly specific to a particular lowering to LLVM. The `minf` and `maxf` lowerings in particular generate IR like
    ```mlir
      %8 = arith.cmpf ult, %7, %5 : vector<1x5xf32>
      %9 = arith.select %8, %7, %5 : vector<1x5xi1>, vector<1x5xf32>
      %10 = arith.cmpf uno, %5, %5 : vector<1x5xf32>
      %11 = arith.select %10, %5, %9 : vector<1x5xi1>, vector<1x5xf32>
    ```
    rather than tunnel down to intrinsics like [`llvm.minnum`](https://llvm.org/docs/LangRef.html#llvm-minnum-intrinsic). Digging through the history a bit, I see where the min/max ops were added in https://reviews.llvm.org/D110540, which carries forward some rationale for using `select` to implement min/max.

    For our uses, quoting @benvanik ,
    > Yeah, that cmp/select/cmp/select dance is really bad as IIRC LLVM/other backends can't/don't practically ever simplify that again while retaining the same semantics. The behavior that nearly everything uses is "return the non-nan value if either value is nan" (GLSL min, OpenCL fminf, C/C++ fminf, CUDA fminf, numpy.fmin, AVX minps, etc), aka "between a NaN and a numeric value, the numeric value is chosen". We need to make sure that if that's the intent of the model (which I hope it is, as it's the only thing that makes sense) we can propagate that all the way to backends. There's some ISAs that do weird things but it'd be better to pay the cost there rather than everywhere like we do today.

    So this PR removes the `ArithmeticExpandOpsPass` from our SPIRV and VMVX lowerings, allowing us to lower min/max/ceil/floor directly from `arith` to the backend dialects (e.g. `spv.GL.FMin`). The LLVM-based backends would need direct lowerings implemented for us to drop the pass there too (e.g. I see errors like `error: failed to legalize operation 'arith.maxf' that was explicitly marked illegal` if I remove it from the LLVMGPU pipeline used for CUDA).

commit 8ea0009
Author: Geoffrey Martin-Noble <gcmn@google.com>
Date:   Mon Aug 22 18:19:14 2022 -0700

    Add a script for deploying to PyPi (#10169)

    The old Python script just downloaded the release artifacts, which can
    be accomplished with the GitHub CLI. We need to repair the wheels for
    reasons that aren't quite clear (and this step should probably be moved
    to the release if we can't fix it directly), but this works for now.

    skip-ci

    Tested:
    Deployed a release to PyPi with this script.

      > View at:
      > https://pypi.org/project/iree-tools-tf/20220811.232/
      > https://pypi.org/project/iree-runtime-instrumented/20220811.232/
      > https://pypi.org/project/iree-tools-tflite/20220811.232/
      > https://pypi.org/project/iree-tools-xla/20220811.232/
      > https://pypi.org/project/iree-compiler/20220811.232/
      > https://pypi.org/project/iree-runtime/20220811.232/

commit c0fd1dc
Author: Jerry Wu <cheyuw@google.com>
Date:   Mon Aug 22 17:52:46 2022 -0700

    Define some IREE benchmarks as an example (#10115)

    Co-authored-by: Geoffrey Martin-Noble <gcmn@google.com>

commit 0ee5c15
Author: Han-Chung Wang <hanchung@google.com>
Date:   Tue Aug 23 07:39:30 2022 +0800

    Fix tests for midair collision. (#10163)

commit ef27692
Author: Lei Zhang <antiagainst@google.com>
Date:   Mon Aug 22 18:02:38 2022 -0400

    Integrate llvm/llvm-project@72136d8ba266 (#10159)

    * Reset third_party/llvm-project: 72136d8ba266eea6ce30fbc0e521c7b01a13b378 (2022-08-19 21:02:07 +0700): [Test] Add test for miscompile described in PR57247
    * Update third_party/mlir-hlo to 5e324a40db4aa956f7cbf24e9417557776e7a84f
    * Update tensorflow to 8a7764be0d32a72ad6d93ff3216520af184e26a0
    * Renamed `Confined` to `ConfinedAttr`
    * Updated `flow.dispatch.tensor.{load|store}` op assembly to use `custom<DynamicIndexList>`
    * Updated `operand_segment_sizes` to `DenseI32ArrayAttr`

commit d4ba930
Author: Han-Chung Wang <hanchung@google.com>
Date:   Tue Aug 23 05:26:35 2022 +0800

    Add a verifier and tuning examples for CPU convolution codegen. (#10147)

commit 3263ccd
Merge: c234161 b902d33
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 14:23:29 2022 -0700

    Merge pull request #10158 from iree-org/benvanik-pipeline-layout-2

    [NFC] Renaming "executable layout" to "pipeline layout".

commit b902d33
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 13:35:57 2022 -0700

    Bumping vmfb version due to break from renaming !hal.executable_layout.

commit b6afa47
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 13:35:00 2022 -0700

    Renaming `!hal.executable_layout` to `!hal.pipeline_layout`
    And similarly the runtime side to `iree_hal_pipeline_layout`.

    Progress on #10144.

commit 347660c
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 11:27:44 2022 -0700

    Starting rename of executable_layout -> pipeline_layout.

    Progress on #10144.

commit c234161
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 12:50:50 2022 -0700

    [NFC] Merging descriptor_set_layout.h into executable_layout.h. (#10154)

    Now that the layouts are only used together keeping them in the same
    place will make it easier to see how they fit and make them easier to
    refactor.

    Progress on #10144.

commit 8775cfe
Author: bjacob <benoitjacob@google.com>
Date:   Mon Aug 22 15:04:26 2022 -0400

    Script improvements (#10136)

    Post-merge review comments from #10132.

commit 1750213
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Mon Aug 22 10:40:28 2022 -0700

    Removing !hal.descriptor_set/iree_hal_descriptor_set_t. (#10146)

    It was never fully implemented and the combination of push descriptors
    and upcoming binding tables should be sufficient for our uses.

    Not a breaking change as the compiler had never emitted code using them.
    Progress on #10144.

commit e33c64c
Author: Thomas <thomasraoux@google.com>
Date:   Mon Aug 22 08:51:08 2022 -0700

    Cherry-pick mlir fix in linalg tiling (#10153)

    cherry-pick commit 06c02d5dbb13f6d2a10eaa75c236f3c61cdf5b91

commit 9b092fb
Author: Marius Brehler <marius.brehler@iml.fraunhofer.de>
Date:   Mon Aug 22 17:27:11 2022 +0200

    Don't explicitly set MLIR_PDLL_TABLEGEN_EXE (#10151)

    With llvm/llvm-project@91b6f76, the variable `MLIR_PDLL_TABLEGEN_EXE` is
    set as a cache variable in MLIR upstream.

commit 52e8625
Author: Han-Chung Wang <hanchung@google.com>
Date:   Sat Aug 20 08:03:19 2022 +0800

    Update default tiling sizes for ARM convolution configurations. (#10086)

    This is the first round of tuning for ARM normal convolution codegen. The parameters are derived from experiments for 3x3 kernel cases.

    Benchmark file:

    ```mlir
    util.global private @"__iree_flow_lhs" {noinline} = dense<1.0> : tensor<1x51x41x512xf32>
    util.global private @"__iree_flow_rhs" {noinline} = dense<1.0> : tensor<3x3x512x512xf32>
    func.func @conv_3x3filter() ->tensor<1x25x20x512xf32> {
      %lhs_ptr = util.global.address @"__iree_flow_lhs" : !util.ptr<tensor<1x51x41x512xf32>>
      %rhs_ptr = util.global.address @"__iree_flow_rhs" : !util.ptr<tensor<3x3x512x512xf32>>
      %lhs = util.global.load.indirect %lhs_ptr : !util.ptr<tensor<1x51x41x512xf32>> -> tensor<1x51x41x512xf32>
      %rhs = util.global.load.indirect %rhs_ptr : !util.ptr<tensor<3x3x512x512xf32>> -> tensor<3x3x512x512xf32>

      %cst = arith.constant 0.000000e+00 : f32
      %2 = linalg.init_tensor [1, 25, 20, 512] : tensor<1x25x20x512xf32>
      %3 = linalg.fill ins(%cst : f32) outs(%2 : tensor<1x25x20x512xf32>) -> tensor<1x25x20x512xf32>
      %4 = linalg.conv_2d_nhwc_hwcf
        { dilations = dense<1> : tensor<2xi64>, strides = dense<2> : tensor<2xi64>}
        ins(%lhs, %rhs : tensor<1x51x41x512xf32>, tensor<3x3x512x512xf32>)
        outs(%3 : tensor<1x25x20x512xf32>) -> tensor<1x25x20x512xf32>
      return %4 : tensor<1x25x20x512xf32>
    }
    ```

    Before:

    ```
    # 1-threaded, taskset 80
    -----------------------------------------------------------------------------------
    Benchmark                                         Time             CPU   Iterations
    -----------------------------------------------------------------------------------
    BM_conv_3x3filter/process_time/real_time       1164 ms         1126 ms            1

    # 4-threaded, taskset f0
    -----------------------------------------------------------------------------------
    Benchmark                                         Time             CPU   Iterations
    -----------------------------------------------------------------------------------
    BM_conv_3x3filter/process_time/real_time        643 ms         1764 ms            1
    ```

    After:

    ```
    # 1-threaded, taskset 80
    -----------------------------------------------------------------------------------
    Benchmark                                         Time             CPU   Iterations
    -----------------------------------------------------------------------------------
    BM_conv_3x3filter/process_time/real_time        160 ms          155 ms            4

    # 4-threaded, taskset f0
    -----------------------------------------------------------------------------------
    Benchmark                                         Time             CPU   Iterations
    -----------------------------------------------------------------------------------
    BM_conv_3x3filter/process_time/real_time       65.6 ms          160 ms            9
    ```

commit 42244e7
Author: Stella Laurenzo <laurenzo@google.com>
Date:   Fri Aug 19 16:20:43 2022 -0700

    NFC: Convert util transforms to declarative registration. (#10143)

commit 979d6ea
Author: Thomas <thomasraoux@google.com>
Date:   Fri Aug 19 12:31:23 2022 -0700

    Integrate llvm-project and bump dependencies. (#10140)

    * llvm-project: 619fd8c2ab505d8f79cbbbe3fd09b02f6640e1b1
    * mlir-hlo: cb55a7168c1841d05287677746a39a5de7cb855f
    * tensorflow: fc4021a8dd654606cd95e61a033691157853e122

    Additional changes:
    * Rename member functions for tensor ops
    * Remove reluN tosa tests
    * Carry patches for llvm and mhlo

commit cb0f8d4
Merge: e8ea103 65a9beb
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Fri Aug 19 11:40:59 2022 -0700

    Merge pull request #10141 from iree-org/benvanik-queue-barrier

    Adding iree_hal_device_queue_barrier helper and fixing pool enum.

commit 65a9beb
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Fri Aug 19 10:40:41 2022 -0700

    Changing iree_hal_allocator_pool_id_t to iree_hal_allocator_pool_t.
    I originally intended this to be a bitfield but forgot when plumbing.

commit 4c84f4a
Author: Ben Vanik <ben.vanik@gmail.com>
Date:   Fri Aug 19 10:28:36 2022 -0700

    Adding iree_hal_device_queue_barrier helper.

commit e8ea103
Author: Thomas <thomasraoux@google.com>
Date:   Fri Aug 19 03:54:33 2022 -0700

    [LLVMGPU] Add barriers when bufferization inserts shared memory copy (#10137)

    This is a conservative solution to avoid having race conditions when
    bufferization decides to emit shared memory copies.
@GMNGeoffrey added the infrastructure (build systems, CI, or testing) and infrastructure/benchmark (benchmarking infrastructure) labels on Sep 27, 2022