Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate CMake rules to download and import models #10167

Merged
merged 6 commits into from
Aug 30, 2022
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions build_tools/benchmarks/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ function(benchmark_tool_py_test)
endfunction()

add_subdirectory(common)
add_subdirectory(suites)

benchmark_tool_py_test(
NAME
Expand Down
16 changes: 16 additions & 0 deletions build_tools/benchmarks/suites/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# Copyright 2022 The IREE Authors
#
# Licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

################################################################################
# Tests
################################################################################

benchmark_tool_py_test(
NAME
cmake_rule_generator_test
SRC
"cmake_rule_generator_test.py"
)
204 changes: 204 additions & 0 deletions build_tools/benchmarks/suites/cmake_rule_generator.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,204 @@
## Copyright 2022 The IREE Authors
#
# Licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
"""Generator that generates CMake rules from python defined benchmarks.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I might have missed this in an earlier review / design doc)
Are the generated CMake rules going to be checked in, or just materialized when building benchmarks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully decided yet. But I believe it's possible to generate and include a CMakefile during the configuration? I'll prefer not to check in the generated CMakefile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that seems a bit scary into deep CMake land, which is not a nice land

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zooming out a bit, I'm wondering if CMake is the really right tool for building / running benchmarks. This moves one step away from CMake (generating rules via Python, instead of adding more logic to the rules themselves or authoring them manually), but what could this look like if we took another step in the same direction?

What is CMake (a build system / build system generator) actually providing for us here, and would other scripts / tools / build systems do a better job?

  • listing benchmark artifacts to build (CMake targets)
  • listing benchmarks to run
  • running commands (download files, invoke importer tools, invoke compiler tools, pushing files to devices, running benchmarks, collecting results, uploading results)

Not really provided:

  • Enumerating attached devices (e.g. choosing to run benchmarks on 1 of 3 connected Android devices)
  • Comparing results across devices / benchmark sessions / historical data

Moving away from CMake entirely seems like a stretch, but here's another idea: what if we made the benchmarks their own CMake project (or even their own repository)? We'd have more flexibility there with defining benchmark-specific options, including dependencies, and organizing the file structure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only planning to use CMake to import and build the artifacts, since those have dependencies on the tools built by the build system. The benchmark list will be expressed in JSON and read by benchmark tools or other CI tools.

I agree it might be a good idea to have a separate project for benchmarks, but I don't see lots of benefits to do that for now, especially in the separate benchmark project, we need to write some extra CMake rules to find the import/compiler tools from the main IREE project. But it should be fairly easy to move these codes out when we need to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only planning to use CMake to import and build the artifacts, since those have dependencies on the tools built by the build system.

That's a very weak dependency though - the tools are often built separately.

especially in the separate benchmark project, we need to write some extra CMake rules to find the import/compiler tools from the main IREE project

Finding a few files doesn't seem particularly tricky? This function isn't doing much:

function(iree_get_executable_path OUTPUT_PATH_VAR EXECUTABLE)

For local iteration, it doesn't seem that unreasonable to me to require first building the tools, then running a benchmark script with the path to the tools. I don't think having the benchmarks and the compiler tools in the same CMake project adds that much value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think the mechanics of this PR itself are fine, but I'm trying to think through the design space to see if there's a more flexible / easier to maintain configuration that we could be building towards)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using CMake to manage dependencies between things like tflite -> mlir -> vmfb is good, let's not reinvent dependency management. But I think I agree with Scott that this could be a separate CMake project (doesn't need to be in this PR). The separate project could accept a path to an IREE install directory, just as we do for our crosscompile builds. In fact, it could still import the iree_macros.cmake, I think and so could reuse functions without actually being the same project. OTOH, what do we actually gain from the separate CMake project? We could make all benchmark options dependent options, so they're hidden unless you enable benchmarks, if the concern is just with that. Although I guess if they're a separate project they don't need to be as well namespaced, especially since it's a project we wouldn't need to craft so that someone else could use it in their own build.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do we actually gain from the separate CMake project?

  • a simpler "core" project
  • a dedicated space to focus on benchmarking (script and target namespacing, source file tree structure, build file tree structure, etc.)
  • enforced looser coupling between the source build, binary tools, and benchmarks

If we had a standalone iree-org/iree-benchmarking / iree-org/iree-performance / iree-org/iree-perf / iree-org/perf repository that just used the public APIs, files, scripts, etc. then it would be easier to come in from outside the project and make changes, or reference what that project has for running custom benchmarks. The current setup pretty tightly couples the specific benchmark workloads, the tools themselves, and the scripts used to orchestrate benchmarking all within the core IREE project.

A separate CMake project in the same repository offers some of those benefits, but I think it's useful to consider where the design could eventually end up. Running benchmarks on presubmit using tools and benchmark definitions from the same repository is a good argument to keep them in the same repository, but maybe once we have more programs in the test suite that will get tricky.

Copy link
Contributor Author

@pzread pzread Aug 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most of the advantages will come from having a separate repository. I can expect that things will get tricky when we start adding more larger benchmarks (or even the benchmarks of other ML frameworks), especially right now everything is under build_tools/benchmarks. The larger benchmarks might include some unwanted deps to the core IREE repo (and again we need to use more flags to disable them).

But the new repository introduces the works of a separate CI and problems of sync between benchmarks and the core IREE repo (like how to make sure the changes on both sides won't break each other, since we want to run benchmarks for each commit, not just the nightly).

Nevertheless, I feel like this topic is stretching a lot from this PR, should we open a dedicated issue for it? : ) To me the current implementation isn't harmful for now and isn't hard to move out from IREE (mostly just refactoring some CMake rules). Just need to make sure to do the required refactoring/separation before introducing the new benchmarks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevertheless, I feel like this topic is stretching a lot from this PR, should we open a dedicated issue for it? : ) To me the current implementation isn't harmful for now and isn't hard to move out from IREE (mostly just refactoring some CMake rules).

SGTM, thanks for the discussion so far :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed the issue #10244


The rules will build required artifacts to run benchmarks.
"""

from dataclasses import dataclass
from typing import List, Optional
import os
import pathlib
import string
import urllib.parse

from .definitions import common_definitions, iree_definitions
from . import iree_benchmarks

# Template dir: build_tools/benchmarks/suites/../../cmake
TEMPLATE_DIR = pathlib.Path(__file__).parents[2] / "cmake"
pzread marked this conversation as resolved.
Show resolved Hide resolved
DOWNLOAD_ARTIFACT_CMAKE_TEMPLATE = string.Template(
open(TEMPLATE_DIR / "iree_download_artifact_template.cmake", "r").read())
TFLITE_IMPORT_CMAKE_TEMPLATE = string.Template(
open(TEMPLATE_DIR / "iree_tflite_import_template.cmake", "r").read())
TF_IMPORT_CMAKE_TEMPLATE = string.Template(
open(TEMPLATE_DIR / "iree_tf_import_template.cmake", "r").read())


@dataclass
class ModelRule(object):
target_name: str
file_path: str
cmake_rule: str


@dataclass
class IreeModelImportRule(object):
target_name: str
output_file_path: str
mlir_dialect_type: str
cmake_rule: Optional[str]


class CommonRuleFactory(object):
"""Generates common cmake rules."""

def __init__(self, model_artifacts_dir: str):
"""Constructs a CommonRuleFactory.

Args:
model_artifacts_dir: root directory to store model files. Can contain
CMake variable syntax in the path.
"""
self._model_artifacts_dir = model_artifacts_dir
self._model_rules = {}

def add_model_rule(self, model: common_definitions.Model) -> ModelRule:
"""Adds a rule to fetch a model. Reuses the existing rule when possible."""
if model.id in self._model_rules:
return self._model_rules[model.id]

# Model target: <package_name>-model-<model_id>
target_name = f"model-{model.id}"

model_url = urllib.parse.urlparse(model.source_url)
_, file_ext = os.path.splitext(model_url.path)
# Model path: <model_artifacts_dir>/<model_id>_<model_name>.<file ext>
model_path = f"{self._model_artifacts_dir}/{model.id}_{model.name}{file_ext}"

if model_url.scheme == "https":
cmake_rule = DOWNLOAD_ARTIFACT_CMAKE_TEMPLATE.substitute(
__TARGET_NAME=target_name,
__OUTPUT_PATH=model_path,
__SOURCE_URL=model.source_url)
else:
raise ValueError("Unsupported model url: {model.source_url}.")

model_rule = ModelRule(target_name=target_name,
file_path=model_path,
cmake_rule=cmake_rule)

self._model_rules[model.id] = model_rule
return model_rule

def generate_cmake_rules(self) -> List[str]:
"""Dump all cmake rules in a correct order."""
return [rule.cmake_rule for rule in self._model_rules.values()]


class IreeRuleFactory(object):
"""Generates IREE benchmark cmake rules."""

def __init__(self, iree_artifacts_dir):
"""Constructs an IreeRuleFactory.

Args:
iree_artifacts_dir: root directory to store generated IREE artifacts. Can
contain CMake variable syntax in the path.
"""
self._iree_artifacts_dir = iree_artifacts_dir
self._import_model_rules = {}
self._compile_module_rules = {}
self._generate_flagfile_rules = {}

def add_import_model_rule(
self,
model_id: str,
model_name: str,
model_source_type: common_definitions.ModelSourceType,
source_model_rule: ModelRule,
) -> IreeModelImportRule:
"""Adds a rule to fetch the model and import into MLIR. Reuses the rule when
possible."""

if model_id in self._import_model_rules:
return self._import_model_rules[model_id]

# If the source model is MLIR, no import rule is needed.
if model_source_type == common_definitions.ModelSourceType.EXPORTED_LINALG_MLIR:
import_model_rule = IreeModelImportRule(
target_name=source_model_rule.target_name,
output_file_path=source_model_rule.file_path,
mlir_dialect_type="linalg",
cmake_rule=None)
self._import_model_rules[model_id] = import_model_rule
return import_model_rule

# Import target: <package_name>_iree-import-model-<model_id>
target_name = f"iree-import-model-{model_id}"

# Imported MLIR path: <iree_artifacts_dir>/<model_id>_<model_name>/<model_name>.mlir
output_file_path = f"{self._iree_artifacts_dir}/{model_id}_{model_name}/{model_name}.mlir"

if model_source_type == common_definitions.ModelSourceType.EXPORTED_TFLITE:
cmake_rule = TFLITE_IMPORT_CMAKE_TEMPLATE.substitute(
__TARGET_NAME=target_name,
__SOURCE_MODEL_PATH=source_model_rule.file_path,
__OUTPUT_PATH=output_file_path)
mlir_dialect_type = "tosa"
elif model_source_type == common_definitions.ModelSourceType.EXPORTED_TF:
cmake_rule = TF_IMPORT_CMAKE_TEMPLATE.substitute(
__TARGET_NAME=target_name,
__SOURCE_MODEL_PATH=source_model_rule.file_path,
__OUTPUT_PATH=output_file_path)
mlir_dialect_type = "mhlo"
else:
raise ValueError(
f"Unsupported source type '{model_source_type}' of the model '{model_id}'."
)

import_model_rule = IreeModelImportRule(target_name=target_name,
output_file_path=output_file_path,
mlir_dialect_type=mlir_dialect_type,
cmake_rule=cmake_rule)

self._import_model_rules[model_id] = import_model_rule
return import_model_rule

def generate_cmake_rules(self) -> List[str]:
"""Dump all cmake rules in a correct order."""
import_model_rules = [
rule.cmake_rule for rule in self._import_model_rules.values()
]
return import_model_rules


def _generate_iree_benchmark_rules(common_rule_factory: CommonRuleFactory,
iree_artifacts_dir: str) -> List[str]:
iree_rule_factory = IreeRuleFactory(iree_artifacts_dir)
compile_specs, _ = iree_benchmarks.Linux_x86_64_Benchmarks.generate()
for compile_spec in compile_specs:
model = compile_spec.model

source_model_rule = common_rule_factory.add_model_rule(model)
iree_rule_factory.add_import_model_rule(model_id=model.id,
model_name=model.name,
model_source_type=model.source_type,
source_model_rule=source_model_rule)

# TODO(pzread): Generate the compilation and run rules.

return iree_rule_factory.generate_cmake_rules()


def generate_benchmark_rules(model_artifacts_dir: str,
iree_artifacts_dir: str) -> List[str]:
"""Generates cmake rules for all benchmarks.

Args:
model_artifacts_dir: root directory to store model files. Can contain CMake
variable syntax in the path.
iree_artifacts_dir: root directory to store generated IREE artifacts. Can
contain CMake variable syntax in the path.
Returns:
List of CMake rules.
"""
common_rule_factory = CommonRuleFactory(model_artifacts_dir)
iree_rules = _generate_iree_benchmark_rules(common_rule_factory,
iree_artifacts_dir)
# Currently the rules are simple so the common rules can be always put at the
# top. Need a topological sort once the dependency gets complicated.
return common_rule_factory.generate_cmake_rules() + iree_rules
116 changes: 116 additions & 0 deletions build_tools/benchmarks/suites/cmake_rule_generator_test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
#!/usr/bin/env python3
## Copyright 2022 The IREE Authors
#
# Licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

from suites.definitions import common_definitions
from suites import cmake_rule_generator
import unittest


class CommonRuleFactoryTest(unittest.TestCase):

def setUp(self):
self._factory = cmake_rule_generator.CommonRuleFactory("root/models")

def test_add_model_rule(self):
model = common_definitions.Model(
id="1234",
name="abcd",
tags=[],
source_type=common_definitions.ModelSourceType.EXPORTED_TFLITE,
source_url="https://example.com/xyz.tflite",
entry_function="main",
input_types=["1xf32"])

rule = self._factory.add_model_rule(model)

self.assertEqual(rule.target_name, "model-1234")
self.assertEqual(rule.file_path, "root/models/1234_abcd.tflite")

def test_generate_cmake_rules(self):
model_1 = common_definitions.Model(
id="1234",
name="abcd",
tags=[],
source_type=common_definitions.ModelSourceType.EXPORTED_TFLITE,
source_url="https://example.com/xyz.tflite",
entry_function="main",
input_types=["1xf32"])
model_2 = common_definitions.Model(
id="5678",
name="abcd",
tags=[],
source_type=common_definitions.ModelSourceType.EXPORTED_TFLITE,
source_url="https://example.com/xyz.tflite",
entry_function="main",
input_types=["1xf32"])
rule_1 = self._factory.add_model_rule(model_1)
rule_2 = self._factory.add_model_rule(model_2)

rules = self._factory.generate_cmake_rules()

self.assertEqual(len(rules), 2)
self.assertRegex(rules[0], rule_1.target_name)
self.assertRegex(rules[1], rule_2.target_name)


class IreeRuleFactoryTest(unittest.TestCase):

def setUp(self):
self._factory = cmake_rule_generator.IreeRuleFactory("root/iree")

def test_add_import_model_rule_import_model(self):
rule = self._factory.add_import_model_rule(
model_id="1234",
model_name="abcd",
model_source_type=common_definitions.ModelSourceType.EXPORTED_TFLITE,
source_model_rule=cmake_rule_generator.ModelRule(
target_name="model-1234", file_path="aaa", cmake_rule="bbb"))

self.assertEqual(rule.target_name, "iree-import-model-1234")
self.assertEqual(rule.mlir_dialect_type, "tosa")
self.assertEqual(rule.output_file_path, "root/iree/1234_abcd/abcd.mlir")

def test_add_import_model_rule_forward_mlir(self):
model_rule = cmake_rule_generator.ModelRule(
target_name="model-1234",
file_path="root/models/1234.mlir",
cmake_rule="bbb")

rule = self._factory.add_import_model_rule(
model_id="1234",
model_name="abcd",
model_source_type=common_definitions.ModelSourceType.
EXPORTED_LINALG_MLIR,
source_model_rule=model_rule)

self.assertEqual(rule.target_name, model_rule.target_name)
self.assertEqual(rule.mlir_dialect_type, "linalg")
self.assertEqual(rule.output_file_path, model_rule.file_path)

def test_generate_cmake_rules(self):
rule_1 = self._factory.add_import_model_rule(
model_id="1234",
model_name="abcd",
model_source_type=common_definitions.ModelSourceType.EXPORTED_TFLITE,
source_model_rule=cmake_rule_generator.ModelRule(
target_name="model-1234", file_path="aaa", cmake_rule="bbb"))
rule_2 = self._factory.add_import_model_rule(
model_id="5678",
model_name="efgh",
model_source_type=common_definitions.ModelSourceType.EXPORTED_TF,
source_model_rule=cmake_rule_generator.ModelRule(
target_name="model-5678", file_path="ccc", cmake_rule="eee"))

rules = self._factory.generate_cmake_rules()

self.assertEqual(len(rules), 2)
self.assertRegex(rules[0], rule_1.target_name)
self.assertRegex(rules[1], rule_2.target_name)


if __name__ == "__main__":
unittest.main()
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,8 @@ class DevicePlatform(Enum):

class ModelSourceType(Enum):
"""Type of model source."""
# Exported MLIR file.
EXPORTED_MLIR = "exported_mlir"
# Exported Linalg MLIR file.
EXPORTED_LINALG_MLIR = "exported_linalg_mlir"
# Exported TFLite model file.
EXPORTED_TFLITE = "exported_tflite"
# Exported SavedModel from Tensorflow.
Expand Down
15 changes: 15 additions & 0 deletions build_tools/cmake/iree_download_artifact_template.cmake
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
# Fetch the model from "$__SOURCE_URL"
add_custom_command(
OUTPUT "$__OUTPUT_PATH"
COMMAND
"$${Python3_EXECUTABLE}" "$${IREE_ROOT_DIR}/build_tools/scripts/download_file.py"
"$__SOURCE_URL" -o "$__OUTPUT_PATH"
DEPENDS
"$${IREE_ROOT_DIR}/build_tools/scripts/download_file.py"
COMMENT "Downloading $__SOURCE_URL"
)
add_custom_target(
"$${_PACKAGE_NAME}_$__TARGET_NAME"
DEPENDS
"$__OUTPUT_PATH"
)
6 changes: 6 additions & 0 deletions build_tools/cmake/iree_tf_import_template.cmake
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# Import the Tensorflow model "$__SOURCE_MODEL_PATH"
iree_import_tf_model(
TARGET_NAME "$${_PACKAGE_NAME}_$__TARGET_NAME"
SOURCE "$__SOURCE_MODEL_PATH"
OUTPUT_MLIR_FILE "$__OUTPUT_PATH"
)
6 changes: 6 additions & 0 deletions build_tools/cmake/iree_tflite_import_template.cmake
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# Import the TFLite model "$__SOURCE_MODEL_PATH"
iree_import_tflite_model(
TARGET_NAME "$${_PACKAGE_NAME}_$__TARGET_NAME"
SOURCE "$__SOURCE_MODEL_PATH"
OUTPUT_MLIR_FILE "$__OUTPUT_PATH"
)