diff --git a/README.md b/README.md
index 4bd12d056..b67ec63dc 100644
--- a/README.md
+++ b/README.md
@@ -97,9 +97,9 @@ In Python, import `compiler_gym` to use the environments:
>>> env.close() # closes the environment, freeing resources
```
-See the [documentation website](http://facebookresearch.github.io/CompilerGym/)
-for tutorials, further details, and API reference. See the [examples](/examples)
-directory for pytorch integration, agent implementations, etc.
+See the [examples](/examples) directory for agent implementations, environment
+extensions, and more. See the [documentation
+website](http://facebookresearch.github.io/CompilerGym/) for the API reference.
## Leaderboards
diff --git a/compiler_gym/util/shell_format.py b/compiler_gym/util/shell_format.py
index abf1b18ce..add683a93 100644
--- a/compiler_gym/util/shell_format.py
+++ b/compiler_gym/util/shell_format.py
@@ -28,3 +28,8 @@ def emph(stringable: Any) -> str:
def plural(quantity: int, singular: str, plural: str) -> str:
"""Return the singular or plural word."""
return singular if quantity == 1 else plural
+
+
+def indent(string: str, n: int = 4) -> str:
+    """Indent a multi-line string by the given number of spaces."""
+    return "\n".join(" " * n + x for x in str(string).split("\n"))
diff --git a/examples/README.md b/examples/README.md
new file mode 100644
index 000000000..809001dd1
--- /dev/null
+++ b/examples/README.md
@@ -0,0 +1,226 @@
+# CompilerGym Examples
+
+This directory contains code samples for everything from implementing simple
+RL agents to adding support for entirely new compilers. Is there an example that
+you think is missing? If so, please [contribute](/CONTRIBUTING.md)!
+
+
+**Table of contents:**
+
+- [Autotuning](#autotuning)
+ - [Performing a random walk of an environment](#performing-a-random-walk-of-an-environment)
+ - [GCC Autotuning (genetic algorithms, hill climbing, + more)](#gcc-autotuning-genetic-algorithms-hill-climbing--more)
+ - [Makefile integration](#makefile-integration)
+ - [Random search using the LLVM C++ API](#random-search-using-the-llvm-c-api)
+- [Reinforcement learning](#reinforcement-learning)
+ - [PPO and integration with RLlib](#ppo-and-integration-with-rllib)
+ - [Actor-critic](#actor-critic)
+ - [Tabular Q learning](#tabular-q-learning)
+- [Extending CompilerGym](#extending-compilergym)
+ - [Example CompilerGym service](#example-compilergym-service)
+ - [Example loop unrolling](#example-loop-unrolling)
+- [Miscellaneous](#miscellaneous)
+ - [Exhaustive search of bounded action spaces](#exhaustive-search-of-bounded-action-spaces)
+ - [Estimating the immediate and cumulative reward of actions and benchmarks](#estimating-the-immediate-and-cumulative-reward-of-actions-and-benchmarks)
+
+
+## Autotuning
+
+### Performing a random walk of an environment
+
+The [random_walk.py](random_walk.py) script runs a single episode of a
+CompilerGym environment, logging the action taken and reward received at each
+step. Example usage:
+
+```sh
+$ python random_walk.py --env=llvm-v0 --step_min=100 --step_max=100 \
+ --benchmark=cbench-v1/dijkstra --reward=IrInstructionCount
+
+=== Step 1 ===
+Action: -lower-constant-intrinsics (changed=False)
+Reward: 0.0
+Step time: 805.6us
+
+=== Step 2 ===
+Action: -forceattrs (changed=False)
+Reward: 0.0
+Step time: 229.8us
+
+...
+
+=== Step 100 ===
+Action: -globaldce (changed=False)
+Reward: 0.0
+Step time: 213.9us
+
+Completed 100 steps in 91.6ms (1091.3 steps / sec).
+Total reward: 161.0
+Max reward: 111.0 (+68.94% at step 31)
+```
+
+For further details run: `python random_walk.py --help`.
+
+
+### GCC Autotuning (genetic algorithms, hill climbing, + more)
+
+The [gcc_search.py](gcc_search.py) script contains implementations of several
+autotuning techniques for the GCC environment. It was used to produce the
+results for the GCC experiments in the [CompilerGym
+whitepaper](https://arxiv.org/pdf/2109.08267.pdf). For further details run:
+`python gcc_search.py --help`.
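+
+As a flavor of the approach, the following is a minimal greedy hill-climbing
+sketch. It is shown on the LLVM environment for brevity and the candidate flag
+list is hypothetical; gcc_search.py implements its searches against the GCC
+environment and its own action encoding.
+
+```python
+import compiler_gym
+
+# Hypothetical candidate flags; gcc_search.py derives its own search space.
+FLAGS = ["-sroa", "-mem2reg", "-newgvn", "-instcombine"]
+
+with compiler_gym.make("llvm-ic-v0", benchmark="cbench-v1/dijkstra") as env:
+    env.reset()
+    for _ in range(10):  # hill-climbing steps
+        # Score each candidate action on a forked copy of the environment.
+        scores = {}
+        for flag in FLAGS:
+            fork = env.fork()
+            _, reward, _, _ = fork.step(env.action_space.names.index(flag))
+            scores[flag] = reward
+            fork.close()
+        best = max(scores, key=scores.get)
+        if scores[best] <= 0:
+            break  # no candidate improves the objective: stop climbing
+        env.step(env.action_space.names.index(best))
+    print("Cumulative reward:", env.episode_reward)
+```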
+
+
+### Makefile integration
+
+The [makefile_integration](makefile_integration/) directory demonstrates a
+simple integration of CompilerGym into a C++ Makefile config. For details see
+the [Makefile](makefile_integration/Makefile).
+
+
+### Random search using the LLVM C++ API
+
+While not intended for the majority of users, it is entirely straightforward to
+skip CompilerGym's Python frontend and interact with the C++ APIs directly. The
+[RandomSearch.cc](RandomSearch.cc) file demonstrates a simple parallelized
+random search implemented for the LLVM compiler service. Run it using:
+
+```
+bazel run -c opt //examples:RandomSearch -- --benchmark=benchmark://cbench-v1/crc32
+```
+
+For further details run: `bazel run -c opt //examples:RandomSearch -- --help`
+
+
+## Reinforcement learning
+
+
+### PPO and integration with RLlib
+
+The [rllib.ipynb](rllib.ipynb) notebook demonstrates integrating CompilerGym
+with the popular [RLlib](https://docs.ray.io/en/master/rllib.html) reinforcement
+learning library. The notebook covers registering a custom environment that uses
+a constrained subset of the LLVM environment's action space and a finite time
+horizon, and training a PPO agent using separate train/val/test datasets.
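+
+A rough sketch of that setup is shown below. It assumes the
+`ConstrainedCommandline` and `TimeLimit` wrappers from `compiler_gym.wrappers`
+and RLlib's environment registry; the flag subset, episode length, and benchmark
+are hypothetical placeholders rather than the values used in the notebook.
+
+```python
+import compiler_gym
+from compiler_gym.wrappers import ConstrainedCommandline, TimeLimit
+from ray import tune
+from ray.tune.registry import register_env
+
+
+def make_env(_):
+    env = compiler_gym.make("llvm-autophase-ic-v0", benchmark="cbench-v1/qsort")
+    # Restrict the agent to a small, fixed set of flags and a short horizon.
+    env = ConstrainedCommandline(env, flags=["-sroa", "-mem2reg", "-newgvn"])
+    env = TimeLimit(env, max_episode_steps=5)
+    return env
+
+
+register_env("llvm-subset-v0", make_env)
+tune.run("PPO", config={"env": "llvm-subset-v0", "num_workers": 1})
+```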
+
+
+### Actor-critic
+
+The [actor_critic](actor_critic.py) script contains a simple actor-critic
+example using PyTorch. The objective is to minimize the size of a benchmark
+(program) using LLVM compiler passes. At each step there is a choice of which
+pass to pick next and an episode consists of a sequence of such choices,
+yielding the number of saved instructions as the overall reward. To simplify the
+learning task, only a (configurable) subset of LLVM passes is considered and
+every episode has the same (configurable) length.
+
+For further details run: `python actor_critic.py --help`.
+
+
+### Tabular Q learning
+
+The [tabular_q](tabular_q.py) script contains a simple tabular Q learning
+example for the LLVM environment. Using selected features from the Autophase
+observation space, it searches for the best action sequence for a given training
+program using online Q learning.
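+
+The core update is ordinary tabular Q-learning over a coarse state built from a
+few Autophase features. A condensed sketch is shown below; the feature indices,
+flags, and hyperparameters are hypothetical, tabular_q.py defines its own.
+
+```python
+import random
+
+import compiler_gym
+
+FEATURES = [0, 5, 11]  # hypothetical indices into the Autophase feature vector
+FLAGS = ["-sroa", "-mem2reg", "-newgvn", "-instcombine"]
+ALPHA, GAMMA, EPSILON, EPISODE_LEN = 0.1, 1.0, 0.2, 5
+q = {}  # maps (state, flag) to an estimated return
+
+
+def state_of(observation):
+    """Project the Autophase vector onto the selected features."""
+    return tuple(int(observation[i]) for i in FEATURES)
+
+
+with compiler_gym.make("llvm-autophase-ic-v0", benchmark="cbench-v1/qsort") as env:
+    for _ in range(100):  # training episodes
+        state = state_of(env.reset())
+        for _ in range(EPISODE_LEN):
+            if random.random() < EPSILON:
+                flag = random.choice(FLAGS)  # explore
+            else:
+                flag = max(FLAGS, key=lambda f: q.get((state, f), 0))  # exploit
+            observation, reward, done, _ = env.step(env.action_space.names.index(flag))
+            next_state = state_of(observation)
+            target = reward + GAMMA * max(q.get((next_state, f), 0) for f in FLAGS)
+            q[(state, flag)] = (1 - ALPHA) * q.get((state, flag), 0) + ALPHA * target
+            state = next_state
+            if done:
+                break
+```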
+
+For further details run: `python tabular_q.py --help`.
+
+
+## Extending CompilerGym
+
+
+### Example CompilerGym service
+
+The [example_compiler_gym_service](example_compiler_gym_service) directory
+demonstrates how to extend CompilerGym with support for new compiler problems.
+The directory contains bare bones implementations of backends in Python or C++
+that can be used as the basis for adding new compiler environments. See the
+[README.md](example_compiler_gym_service/README.md) file for further details.
+
+
+### Example loop unrolling
+
+The [example_unrolling_service](example_unrolling_service) directory
+demonstrates how to implement support for a real compiler problem by integrating
+with commandline loop unrolling flags for the LLVM compiler. See the
+[README.md](example_unrolling_service/README.md) file for further details.
+
+
+## Miscellaneous
+
+
+### Exhaustive search of bounded action spaces
+
+The [brute_force.py](brute_force.py) script runs a parallelized brute force of
+an action space. It enumerates all possible combinations of actions up to a
+finite episode length and evaluates them, logging the incremental rewards of
+each. Example usage:
+
+```
+$ python brute_force.py --env=llvm-ic-v0 --benchmark=cbench-v1/dijkstra \
+ --episode_length=8 --brute_force_action_list=-sroa,-mem2reg,-newgvn
+
+Enumerating all episodes of 3 actions x 8 steps
+Started 24 brute force workers for benchmark benchmark://cbench-v1/dijkstra using reward IrInstructionCountOz.
+=== Running 6,561 trials ===
+Runtime: 8 seconds. Progress: 100.00%. Best reward found: 0.8571428571428572.
+Ending jobs ... I1014 12:04:51.671775 3245811 CreateAndRunCompilerGymServiceImpl.h:128] Service "/dev/shm/compiler_gym_cec/s/1014T120451-646797-5770" listening on 37505, PID = 3245811
+completed 6,561 of 6,561 trials (100.000%), best sequence -mem2reg -mem2reg -sroa -sroa -mem2reg -sroa -sroa -newgvn
+```
+
+For further details run: `python brute_force.py --help`.
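+
+The enumeration itself is a Cartesian product over the action list. A serial
+sketch of the same idea is shown below; brute_force.py parallelizes the
+evaluation across worker threads, and the flags and episode length here are
+kept tiny for illustration.
+
+```python
+import itertools
+
+import compiler_gym
+
+FLAGS = ["-sroa", "-mem2reg", "-newgvn"]
+EPISODE_LENGTH = 3  # 3 actions x 3 steps = 27 episodes
+
+with compiler_gym.make("llvm-ic-v0", benchmark="cbench-v1/dijkstra") as env:
+    best_sequence, best_reward = None, float("-inf")
+    for sequence in itertools.product(FLAGS, repeat=EPISODE_LENGTH):
+        env.reset()
+        env.step([env.action_space.names.index(flag) for flag in sequence])
+        if env.episode_reward > best_reward:
+            best_sequence, best_reward = sequence, env.episode_reward
+    print("Best sequence:", " ".join(best_sequence), "reward:", best_reward)
+```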
+
+The [explore.py](explore.py) script evaluates all possible combinations of
+actions up to a finite limit, but partial sequences of actions that end up in
+the same state are deduplicated, sometimes dramatically reducing the size of the
+search space. This script can also be configured to do a beam search.
+
+Example usage:
+
+```
+$ python explore.py --env=llvm-ic-v0 --benchmark=cbench-v1/dijkstra \
+ --episode_length=8 --explore_actions=-simplifycfg,-instcombine,-mem2reg,-newgvn
+
+...
+
+*** Processing depth 6 of 8 with 11 states and 4 actions.
+
+                             unpruned  self_pruned cross_pruned  back_pruned      dropped          sum
+        added this depth            0           33            0           11            0           44
+   full nodes this depth            0        2,833        1,064          199            0        4,096
+     added across depths           69          151           23           34            0          277
+full added across depths           69        3,727        1,411          254            0        5,461
+
+Time taken for depth: 0.05 s
+Top 3 sequence(s):
+ 0.9694 -mem2reg, -newgvn, -simplifycfg, -instcombine
+ 0.9694 -newgvn, -instcombine, -mem2reg, -simplifycfg
+ 0.9694 -newgvn, -instcombine, -mem2reg, -simplifycfg, -instcombine
+
+
+*** Processing depth 7 of 8 with 0 states and 4 actions.
+
+There are no more states to process, stopping early.
+```
+
+For further details run: `python explore.py --help`.
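+
+The pruning idea can be sketched as a breadth-first enumeration that skips any
+action sequence whose end state has already been visited. The sketch below
+identifies states by hashing the IR observation; explore.py uses its own state
+representation and bookkeeping.
+
+```python
+import compiler_gym
+
+FLAGS = ["-simplifycfg", "-instcombine", "-mem2reg", "-newgvn"]
+DEPTH = 3
+
+with compiler_gym.make("llvm-ic-v0", benchmark="cbench-v1/dijkstra") as env:
+    env.reset()
+    seen = {hash(env.observation["Ir"])}
+    frontier = [[]]  # action sequences whose end states are new
+    for depth in range(DEPTH):
+        next_frontier = []
+        for prefix in frontier:
+            for flag in FLAGS:
+                env.reset()
+                sequence = prefix + [flag]
+                env.step([env.action_space.names.index(f) for f in sequence])
+                key = hash(env.observation["Ir"])
+                if key not in seen:  # prune sequences that reach a known state
+                    seen.add(key)
+                    next_frontier.append(sequence)
+        frontier = next_frontier
+        print(f"depth {depth + 1}: {len(frontier)} unpruned states")
+```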
+
+
+### Estimating the immediate and cumulative reward of actions and benchmarks
+
+The [sensitivity_analysis](sensitivity_analysis/) directory contains a pair of
+scripts for estimating the sensitivity of the reward signal to different
+environment parameters:
+
+* [action_sensitivity_analysis.py](sensitivity_analysis/action_sensitivity_analysis.py):
+  This script estimates the immediate reward that running a specific action has
+  by running trials. A trial is a random episode that ends with the given
+  action (a single trial is sketched below).
+* [benchmark_sensitivity_analysis.py](sensitivity_analysis/benchmark_sensitivity_analysis.py):
+ This script estimates the cumulative reward for a random episode on a
+ benchmark by running trials. A trial is an episode in which a random number of
+ random actions are performed and the total cumulative reward is recorded.
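+
+A rough sketch of a single action-sensitivity trial, following the description
+above, is shown below. The flag, benchmark, and warmup budget are hypothetical.
+
+```python
+import random
+
+import compiler_gym
+
+FLAG, MAX_WARMUP_STEPS = "-mem2reg", 25  # hypothetical action under test
+
+with compiler_gym.make("llvm-ic-v0", benchmark="cbench-v1/crc32") as env:
+    env.reset()
+    # Warmup: a random-length prefix of random actions.
+    warmup = [env.action_space.sample() for _ in range(random.randint(0, MAX_WARMUP_STEPS))]
+    _, _, done, _ = env.step(warmup)
+    if not done:
+        # The trial result is the immediate reward of the final, fixed action.
+        _, reward, _, _ = env.step(env.action_space.names.index(FLAG))
+        print(f"Immediate reward of {FLAG} after {len(warmup)} warmup steps: {reward}")
+```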
diff --git a/examples/RandomSearch.cc b/examples/RandomSearch.cc
index c1129f868..0073cd9d6 100644
--- a/examples/RandomSearch.cc
+++ b/examples/RandomSearch.cc
@@ -57,6 +57,7 @@ class Environment {
}
StartSessionRequest startRequest;
+ startRequest.mutable_benchmark()->set_uri(benchmark_);
StartSessionReply startReply;
RETURN_IF_ERROR(service_.StartSession(nullptr, &startRequest, &startReply));
sessionId_ = startReply.session_id();
@@ -138,8 +139,9 @@ Status runSearch(const fs::path& workingDir, std::vector* bestActions, int6
void runThread(std::vector* bestActions, int64_t* bestCost) {
const fs::path workingDir = fs::unique_path();
fs::create_directories(workingDir);
- if (!runSearch(workingDir, bestActions, bestCost).ok()) {
- LOG(ERROR) << "Search failed";
+ const auto status = runSearch(workingDir, bestActions, bestCost);
+ if (!status.ok()) {
+ LOG(ERROR) << "ERROR " << status.error_code() << ": " << status.error_message();
}
fs::remove_all(workingDir);
}
diff --git a/examples/brute_force.py b/examples/brute_force.py
index d5cb1b448..2f281d585 100644
--- a/examples/brute_force.py
+++ b/examples/brute_force.py
@@ -9,14 +9,15 @@
Example usage:
- $ $ python -m compiler_gym.bin.brute_force \
- --env=llvm-ic-v0 --benchmark=cbench-v1/dijkstra \
- --episode_length=10 --actions=-sroa,-mem2reg,-newgvn
- Enumerating all episodes of 3 actions x 10 steps
- Started 24 brute force workers for benchmark cbench-v1/dijkstra using reward IrInstructionCountOz.
- === Running 59,049 trials ===
- Runtime: 3 minutes. Progress: 100.00%. Best reward found: 101.1905%.
- Ending jobs ... completed 59,049 of 59,049 trials (100.000%)
+ $ python brute_force.py --env=llvm-ic-v0 --benchmark=cbench-v1/dijkstra \
+ --episode_length=8 --brute_force_action_list=-sroa,-mem2reg,-newgvn
+
+ Enumerating all episodes of 3 actions x 8 steps
+ Started 24 brute force workers for benchmark benchmark://cbench-v1/dijkstra using reward IrInstructionCountOz.
+ === Running 6,561 trials ===
+ Runtime: 8 seconds. Progress: 100.00%. Best reward found: 0.8571428571428572.
+ Ending jobs ... I1014 12:04:51.671775 3245811 CreateAndRunCompilerGymServiceImpl.h:128] Service "/dev/shm/compiler_gym_cec/s/1014T120451-646797-5770" listening on 37505, PID = 3245811
+ completed 6,561 of 6,561 trials (100.000%), best sequence -mem2reg -mem2reg -sroa -sroa -mem2reg -sroa -sroa -newgvn
Use --help to list the configurable options.
"""
@@ -306,8 +307,7 @@ def main(argv):
with env_from_flags(benchmark) as env:
env.reset()
- benchmark = env.benchmark
- sanitized_benchmark_uri = "/".join(benchmark.split("/")[-2:])
+ sanitized_benchmark_uri = "/".join(str(env.benchmark).split("/")[-2:])
logs_dir = Path(
FLAGS.output_dir or create_logging_dir(f"brute_force/{sanitized_benchmark_uri}")
)
diff --git a/examples/random_walk.py b/examples/random_walk.py
index bba6bdc8a..44c1a6f33 100644
--- a/examples/random_walk.py
+++ b/examples/random_walk.py
@@ -7,8 +7,8 @@
Example usage:
# Run a random walk on cBench example program using instruction count reward.
- $ python3 examples/random_walk.py --env=llvm-v0 --step_min=100 --step_max=100 \
- --benchmark=cbench-v1/dijkstra --reward=IrInstructionCount
+ $ python3 random_walk.py --env=llvm-v0 --step_min=100 --step_max=100 \
+ --benchmark=cbench-v1/dijkstra --reward=IrInstructionCount
"""
import random
@@ -39,7 +39,7 @@ def run_random_walk(env: CompilerEnv, step_count: int) -> None:
fewer steps will be performed if any of the actions lead the
environment to end the episode.
"""
- rewards, actions = [], []
+ rewards = []
step_num = 0
with Timer() as episode_time:
@@ -48,14 +48,13 @@ def run_random_walk(env: CompilerEnv, step_count: int) -> None:
action_index = env.action_space.sample()
with Timer() as step_time:
observation, reward, done, info = env.step(action_index)
- print(f"\n=== Step {humanize.intcomma(step_num)} ===")
print(
+ f"\n=== Step {humanize.intcomma(step_num)} ===\n"
f"Action: {env.action_space.names[action_index]} "
- f"(changed={not info.get('action_had_no_effect')})"
+ f"(changed={not info.get('action_had_no_effect')})\n"
+ f"Reward: {reward}"
)
rewards.append(reward)
- actions.append(env.action_space.names[action_index])
- print(f"Reward: {reward}")
if env.observation_space:
print(f"Observation:\n{observation}")
print(f"Step time: {step_time}")
@@ -71,27 +70,18 @@ def reward_percentage(reward, rewards):
print(
f"\nCompleted {emph(humanize.intcomma(step_num))} steps in {episode_time} "
- f"({step_num / episode_time.time:.1f} steps / sec)."
- )
- print(f"Total reward: {sum(rewards)}")
- print(
+ f"({step_num / episode_time.time:.1f} steps / sec).\n"
+ f"Total reward: {sum(rewards)}\n"
f"Max reward: {max(rewards)} ({reward_percentage(max(rewards), rewards)} "
f"at step {humanize.intcomma(rewards.index(max(rewards)) + 1)})"
)
- def remove_no_change(rewards, actions):
- return [a for (r, a) in zip(rewards, actions) if r != 0]
-
- actions = remove_no_change(rewards, actions)
- print("Effective actions from trajectory: " + ", ".join(actions))
-
def main(argv):
"""Main entry point."""
assert len(argv) == 1, f"Unrecognized flags: {argv[1:]}"
- benchmark = benchmark_from_flags()
- with env_from_flags(benchmark) as env:
+ with env_from_flags(benchmark=benchmark_from_flags()) as env:
step_min = min(FLAGS.step_min, FLAGS.step_max)
step_max = max(FLAGS.step_min, FLAGS.step_max)
run_random_walk(env=env, step_count=random.randint(step_min, step_max))
diff --git a/examples/random_walk_test.py b/examples/random_walk_test.py
index ffa0e0f71..028c594b9 100644
--- a/examples/random_walk_test.py
+++ b/examples/random_walk_test.py
@@ -3,14 +3,24 @@
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
"""Unit tests for //compiler_gym/bin:random_walk."""
-import gym
+import re
+
from absl.flags import FLAGS
from random_walk import run_random_walk
+import compiler_gym
+from compiler_gym.util.capture_output import capture_output
+
def test_run_random_walk_smoke_test():
FLAGS.unparse_flags()
FLAGS(["argv0"])
- with gym.make("llvm-autophase-ic-v0") as env:
- env.benchmark = "cbench-v1/crc32"
- run_random_walk(env=env, step_count=5)
+ with capture_output() as out:
+ with compiler_gym.make("llvm-autophase-ic-v0") as env:
+ env.benchmark = "cbench-v1/crc32"
+ run_random_walk(env=env, step_count=5)
+
+ print(out.stdout)
+ # Note the ".*" before and after the step count to ignore the shell
+ # formatting.
+ assert re.search(r"Completed .*5.* steps in ", out.stdout)
diff --git a/examples/sensitivity_analysis/action_sensitivity_analysis.py b/examples/sensitivity_analysis/action_sensitivity_analysis.py
index aefdf67e9..52c4e409a 100644
--- a/examples/sensitivity_analysis/action_sensitivity_analysis.py
+++ b/examples/sensitivity_analysis/action_sensitivity_analysis.py
@@ -2,12 +2,11 @@
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
-"""Determine the typical reward delta of different actions using random trials.
+"""Estimate the immediate reward of different actions using random trials.
-This script estimates the change in reward that running a specific action has
-by running trials. A trial is a random episode that ends with the determined
-action. Reward delta is the amount that the reward signal changes from running
-that action: (reward_after - reward_before) / reward_before.
+This script estimates the immediate reward that running a specific action has by
+running trials. A trial is a random episode that ends with the given
+action.
Example Usage
-------------
@@ -20,7 +19,7 @@
--benchmark=cbench-v1/crc32 --num_trials=100 \
--action=AddDiscriminatorsPass,AggressiveDcepass,AggressiveInstCombinerPass
-Evaluate the single-step reward delta of all actions on LLVM codesize:
+Evaluate the single-step immediate reward of all actions on LLVM codesize:
$ bazel run -c opt //compiler_gym/bin:action_ensitivity_analysis -- \
--env=llvm-v0 --reward=IrInstructionCountO3
@@ -81,7 +80,7 @@ def get_rewards(
max_warmup_steps: int,
max_attempts_multiplier: int = 5,
) -> SensitivityAnalysisResult:
- """Run random trials to get a list of num_trials reward deltas."""
+ """Run random trials to get a list of num_trials immediate rewards."""
rewards, runtimes = [], []
benchmark = benchmark_from_flags()
num_attempts = 0
@@ -109,24 +108,18 @@ def run_one_trial(
env: CompilerEnv, reward_space: str, action: int, max_warmup_steps: int
) -> Optional[float]:
"""Run a random number of "warmup" steps in an environment, then compute
- the reward delta of the given action.
+ the immediate reward of the given action.
- :return: The ratio of reward improvement.
+    :return: The immediate reward of the action, or None if the episode ended early.
"""
num_warmup_steps = random.randint(0, max_warmup_steps)
- for _ in range(num_warmup_steps):
- _, _, done, _ = env.step(env.action_space.sample())
- if done:
- return None
- # Force reward calculation.
- init_reward = env.reward[reward_space]
- assert init_reward is not None
- _, _, done, _ = env.step(action)
+ warmup_actions = [env.action_space.sample() for _ in range(num_warmup_steps)]
+ env.reward_space = reward_space
+ _, _, done, _ = env.step(warmup_actions)
if done:
return None
- reward_after = env.reward[reward_space]
- assert reward_after is not None
- return reward_after
+ _, (reward,), done, _ = env.step(action, rewards=[reward_space])
+ return None if done else reward
def run_action_sensitivity_analysis(
@@ -139,7 +132,7 @@ def run_action_sensitivity_analysis(
nproc: int = cpu_count(),
max_attempts_multiplier: int = 5,
):
- """Estimate the reward delta of a given list of actions."""
+ """Estimate the immediate reward of a given list of actions."""
with env_from_flags() as env:
action_names = env.action_space.names
diff --git a/examples/sensitivity_analysis/benchmark_sensitivity_analysis.py b/examples/sensitivity_analysis/benchmark_sensitivity_analysis.py
index a2a3dc56f..065b5bc52 100644
--- a/examples/sensitivity_analysis/benchmark_sensitivity_analysis.py
+++ b/examples/sensitivity_analysis/benchmark_sensitivity_analysis.py
@@ -2,13 +2,11 @@
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
-"""Determine the typical reward delta of a benchmark using random trials.
+"""Estimate the cumulative reward of random episodes on benchmarks.
-This script estimates the change in reward that running a random episode has
-on a benchmark by running trials. A trial is an episode in which a random
-number of random actions are performed. Reward delta is the amount that the
-reward signal changes from the initial to final action:
-(reward_end - reward_init) / reward_init.
+This script estimates the cumulative reward for a random episode on a benchmark
+by running trials. A trial is an episode in which a random number of random
+actions are performed and the total cumulative reward is recorded.
Example Usage
-------------
@@ -20,7 +18,7 @@
--env=llvm-v0 --reward=IrInstructionCountO3 \
--benchmark=cBench-crc32 --num_trials=50
-Evaluate the LLVM codesize reward delta on all benchmarks:
+Evaluate the LLVM codesize episode reward on all benchmarks:
$ bazel run -c opt //compiler_gym/bin:benchmark_sensitivity_analysis -- \
--env=llvm-v0 --reward=IrInstructionCountO3
@@ -49,7 +47,7 @@
flags.DEFINE_integer(
"num_benchmark_sensitivity_trials",
100,
- "The number of trials to perform when estimating the reward delta of each benchmark. "
+ "The number of trials to perform when estimating the episode reward of each benchmark. "
"A trial is a random episode of a benchmark. Increasing this number increases the "
"number of trials performed, leading to a higher fidelity estimate of the reward "
"potential for a benchmark.",
@@ -83,7 +81,7 @@ def get_rewards(
max_steps: int,
max_attempts_multiplier: int = 5,
) -> SensitivityAnalysisResult:
- """Run random trials to get a list of num_trials reward deltas."""
+ """Run random trials to get a list of num_trials episode rewards."""
rewards, runtimes = [], []
num_attempts = 0
while (
@@ -110,21 +108,18 @@ def get_rewards(
def run_one_trial(
env: CompilerEnv, reward_space: str, min_steps: int, max_steps: int
) -> Optional[float]:
- """Run a random number of "warmup" steps in an environment, then compute
- the reward delta of the given action.
+ """Run a random number of random steps in an environment and return the
+ cumulative reward.
- :return: The ratio of reward improvement.
+    :return: The cumulative episode reward, or None if the episode ended early.
"""
num_steps = random.randint(min_steps, max_steps)
- init_reward = env.reward[reward_space]
- assert init_reward is not None
- for _ in range(num_steps):
- _, _, done, _ = env.step(env.action_space.sample())
- if done:
- return None
- reward_after = env.reward[reward_space]
- assert reward_after is not None
- return reward_after
+ warmup_actions = [env.action_space.sample() for _ in range(num_steps)]
+ env.reward_space = reward_space
+ _, _, done, _ = env.step(warmup_actions)
+ if done:
+ return None
+ return env.episode_reward
def run_benchmark_sensitivity_analysis(
@@ -138,7 +133,7 @@ def run_benchmark_sensitivity_analysis(
nproc: int = cpu_count(),
max_attempts_multiplier: int = 5,
):
- """Estimate the reward delta of a given list of benchmarks."""
+ """Estimate the cumulative reward of random walks on a list of benchmarks."""
with ThreadPoolExecutor(max_workers=nproc) as executor:
analysis_futures = [
executor.submit(
diff --git a/examples/tabular_q.py b/examples/tabular_q.py
index 5f6c8b83a..da7419c25 100644
--- a/examples/tabular_q.py
+++ b/examples/tabular_q.py
@@ -2,9 +2,9 @@
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
-
"""Simple compiler gym tabular q learning example.
-Usage python tabular_q.py --benchmark=
+
+Usage: python tabular_q.py --benchmark=
Using selected features from Autophase observation space, given a specific training
program as gym environment, find the best action sequence using online q learning.
diff --git a/tests/util/BUILD b/tests/util/BUILD
index d92b1ad08..1ea676530 100644
--- a/tests/util/BUILD
+++ b/tests/util/BUILD
@@ -85,6 +85,15 @@ py_test(
],
)
+py_test(
+ name = "shell_format_test",
+ srcs = ["shell_format_test.py"],
+ deps = [
+ "//compiler_gym/util",
+ "//tests:test_main",
+ ],
+)
+
py_test(
name = "statistics_test",
timeout = "short",
diff --git a/tests/util/shell_format_test.py b/tests/util/shell_format_test.py
new file mode 100644
index 000000000..3d23325d2
--- /dev/null
+++ b/tests/util/shell_format_test.py
@@ -0,0 +1,17 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+"""Unit tests for compiler_gym/util/shell_format.py"""
+from compiler_gym.util import shell_format as fmt
+from tests.test_main import main
+
+
+def test_indent():
+ assert fmt.indent("abc") == " abc"
+ assert fmt.indent("abc", indent=2) == " abc"
+ assert fmt.indent("abc\ndef") == " abc\n def"
+
+
+if __name__ == "__main__":
+ main()
diff --git a/tests/wrappers/core_wrappers_test.py b/tests/wrappers/core_wrappers_test.py
index 6e47d9fa6..8dc2a18b8 100644
--- a/tests/wrappers/core_wrappers_test.py
+++ b/tests/wrappers/core_wrappers_test.py
@@ -143,6 +143,26 @@ def observation(self, observation):
assert ir_a != ir_b
+def test_wrapped_set_benchmark(env: LlvmEnv, wrapper_type):
+ """Test that the benchmark attribute can be set on wrapped classes."""
+
+ class MyWrapper(wrapper_type):
+ def observation(self, observation):
+ return observation # pass thru
+
+ env = MyWrapper(env)
+
+ # Set the benchmark attribute and check that it propagates.
+ env.benchmark = "benchmark://cbench-v1/dijkstra"
+ env.reset()
+ assert env.benchmark == "benchmark://cbench-v1/dijkstra"
+
+ # Repeat again for a different benchmark.
+ env.benchmark = "benchmark://cbench-v1/crc32"
+ env.reset()
+ assert env.benchmark == "benchmark://cbench-v1/crc32"
+
+
def test_wrapped_env_in_episode(env: LlvmEnv, wrapper_type):
class MyWrapper(wrapper_type):
def observation(self, observation):