diff --git a/README.md b/README.md
index 4bd12d056..b67ec63dc 100644
--- a/README.md
+++ b/README.md
@@ -97,9 +97,9 @@ In Python, import `compiler_gym` to use the environments:
>>> env.close() # closes the environment, freeing resources
```
-See the [documentation website](http://facebookresearch.github.io/CompilerGym/)
-for tutorials, further details, and API reference. See the [examples](/examples)
-directory for pytorch integration, agent implementations, etc.
+See the [examples](/examples) directory for agent implementations, environment
+extensions, and more. See the [documentation
+website](http://facebookresearch.github.io/CompilerGym/) for the API reference.
## Leaderboards
diff --git a/compiler_gym/util/shell_format.py b/compiler_gym/util/shell_format.py
index abf1b18ce..add683a93 100644
--- a/compiler_gym/util/shell_format.py
+++ b/compiler_gym/util/shell_format.py
@@ -28,3 +28,8 @@ def emph(stringable: Any) -> str:
def plural(quantity: int, singular: str, plural: str) -> str:
"""Return the singular or plural word."""
return singular if quantity == 1 else plural
+
+
+def indent(string: str, n: int = 4) -> str:
+    """Indent a multi-line string by the given number of spaces."""
+    return "\n".join(" " * n + x for x in str(string).split("\n"))
diff --git a/examples/README.md b/examples/README.md
new file mode 100644
index 000000000..809001dd1
--- /dev/null
+++ b/examples/README.md
@@ -0,0 +1,226 @@
+# CompilerGym Examples
+
+This directory contains code samples for everything from implementing simple
+RL agents to adding support for entirely new compilers. Is there an example that
+you think is missing? If so, please [contribute](/CONTRIBUTING.md)!
+
+
+**Table of contents:**
+
+- [Autotuning](#autotuning)
+ - [Performing a random walk of an environment](#performing-a-random-walk-of-an-environment)
+ - [GCC Autotuning (genetic algorithms, hill climbing, + more)](#gcc-autotuning-genetic-algorithms-hill-climbing--more)
+ - [Makefile integration](#makefile-integration)
+ - [Random search using the LLVM C++ API](#random-search-using-the-llvm-c-api)
+- [Reinforcement learning](#reinforcement-learning)
+ - [PPO and integration with RLlib](#ppo-and-integration-with-rllib)
+ - [Actor-critic](#actor-critic)
+ - [Tabular Q learning](#tabular-q-learning)
+- [Extending CompilerGym](#extending-compilergym)
+ - [Example CompilerGym service](#example-compilergym-service)
+ - [Example loop unrolling](#example-loop-unrolling)
+- [Miscellaneous](#miscellaneous)
+ - [Exhaustive search of bounded action spaces](#exhaustive-search-of-bounded-action-spaces)
+ - [Estimating the immediate and cumulative reward of actions and benchmarks](#estimating-the-immediate-and-cumulative-reward-of-actions-and-benchmarks)
+
+
+## Autotuning
+
+### Performing a random walk of an environment
+
+The [random_walk.py](random_walk.py) script runs a single episode of a
+CompilerGym environment, logging the action taken and reward received at each
+step. Example usage:
+
+```sh
+$ python random_walk.py --env=llvm-v0 --step_min=100 --step_max=100 \
+ --benchmark=cbench-v1/dijkstra --reward=IrInstructionCount
+
+=== Step 1 ===
+Action: -lower-constant-intrinsics (changed=False)
+Reward: 0.0
+Step time: 805.6us
+
+=== Step 2 ===
+Action: -forceattrs (changed=False)
+Reward: 0.0
+Step time: 229.8us
+
+...
+
+=== Step 100 ===
+Action: -globaldce (changed=False)
+Reward: 0.0
+Step time: 213.9us
+
+Completed 100 steps in 91.6ms (1091.3 steps / sec).
+Total reward: 161.0
+Max reward: 111.0 (+68.94% at step 31)
+```
+
+For further details run: `python random_walk.py --help`.
+
+
+### GCC Autotuning (genetic algorithms, hill climbing, + more)
+
+The [gcc_search.py](gcc_search.py) script contains implementations of several
+autotuning techniques for the GCC environment. It was used to produce the
+results for the GCC experiments in the [CompilerGym
+whitepaper](https://arxiv.org/pdf/2109.08267.pdf). For further details run:
+`python gcc_search.py --help`.
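+
+As a flavor of the approach, the following is a minimal greedy hill-climbing
+sketch. It is shown on the LLVM environment for brevity and the candidate flag
+list is hypothetical; gcc_search.py implements its searches against the GCC
+environment and its own action encoding.
+
+```python
+import compiler_gym
+
+# Hypothetical candidate flags; gcc_search.py derives its own search space.
+FLAGS = ["-sroa", "-mem2reg", "-newgvn", "-instcombine"]
+
+with compiler_gym.make("llvm-ic-v0", benchmark="cbench-v1/dijkstra") as env:
+    env.reset()
+    for _ in range(10):  # hill-climbing steps
+        # Score each candidate action on a forked copy of the environment.
+        scores = {}
+        for flag in FLAGS:
+            fork = env.fork()
+            _, reward, _, _ = fork.step(env.action_space.names.index(flag))
+            scores[flag] = reward
+            fork.close()
+        best = max(scores, key=scores.get)
+        if scores[best] <= 0:
+            break  # no candidate improves the objective: stop climbing
+        env.step(env.action_space.names.index(best))
+    print("Cumulative reward:", env.episode_reward)
+```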
+
+
+### Makefile integration
+
+The [makefile_integration](makefile_integration/) directory demonstrates a
+simple integration of CompilerGym into a C++ Makefile config. For details see
+the [Makefile](makefile_integration/Makefile).
+
+
+### Random search using the LLVM C++ API
+
+While not intended for the majority of users, it is entirely straightforward to
+skip CompilerGym's Python frontend and interact with the C++ APIs directly. The
+[RandomSearch.cc](RandomSearch.cc) file demonstrates a simple parallelized
+random search implemented for the LLVM compiler service. Run it using:
+
+```
+bazel run -c opt //examples:RandomSearch -- --benchmark=benchmark://cbench-v1/crc32
+```
+
+For further details run: `bazel run -c opt //examples:RandomSearch -- --help`
+
+
+## Reinforcement learning
+
+
+### PPO and integration with RLlib
+
+The [rllib.ipynb](rllib.ipynb) notebook demonstrates integrating CompilerGym
+with the popular [RLlib](https://docs.ray.io/en/master/rllib.html) reinforcement
+learning library. The notebook covers registering a custom environment that uses
+a constrained subset of the LLVM environment's action space and a finite time
+horizon, and training a PPO agent using separate train/val/test datasets.
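+
+A rough sketch of that setup is shown below. It assumes the
+`ConstrainedCommandline` and `TimeLimit` wrappers from `compiler_gym.wrappers`
+and RLlib's environment registry; the flag subset, episode length, and benchmark
+are hypothetical placeholders rather than the values used in the notebook.
+
+```python
+import compiler_gym
+from compiler_gym.wrappers import ConstrainedCommandline, TimeLimit
+from ray import tune
+from ray.tune.registry import register_env
+
+
+def make_env(_):
+    env = compiler_gym.make("llvm-autophase-ic-v0", benchmark="cbench-v1/qsort")
+    # Restrict the agent to a small, fixed set of flags and a short horizon.
+    env = ConstrainedCommandline(env, flags=["-sroa", "-mem2reg", "-newgvn"])
+    env = TimeLimit(env, max_episode_steps=5)
+    return env
+
+
+register_env("llvm-subset-v0", make_env)
+tune.run("PPO", config={"env": "llvm-subset-v0", "num_workers": 1})
+```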
+
+
+### Actor-critic
+
+The [actor_critic](actor_critic.py) script contains a simple actor-critic
+example using PyTorch. The objective is to minimize the size of a benchmark
+(program) using LLVM compiler passes. At each step there is a choice of which
+pass to pick next and an episode consists of a sequence of such choices,
+yielding the number of saved instructions as the overall reward. To simplify the
+learning task, only a (configurable) subset of LLVM passes is considered and
+every episode has the same (configurable) length.
+
+For further details run: `python actor_critic.py --help`.
+
+
+### Tabular Q learning
+
+The [tabular_q](tabular_q.py) script contains a simple tabular Q learning
+example for the LLVM environment. Using selected features from the Autophase
+observation space, it searches for the best action sequence for a given training
+program using online Q learning.
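+
+The core update is ordinary tabular Q-learning over a coarse state built from a
+few Autophase features. A condensed sketch is shown below; the feature indices,
+flags, and hyperparameters are hypothetical, tabular_q.py defines its own.
+
+```python
+import random
+
+import compiler_gym
+
+FEATURES = [0, 5, 11]  # hypothetical indices into the Autophase feature vector
+FLAGS = ["-sroa", "-mem2reg", "-newgvn", "-instcombine"]
+ALPHA, GAMMA, EPSILON, EPISODE_LEN = 0.1, 1.0, 0.2, 5
+q = {}  # maps (state, flag) to an estimated return
+
+
+def state_of(observation):
+    """Project the Autophase vector onto the selected features."""
+    return tuple(int(observation[i]) for i in FEATURES)
+
+
+with compiler_gym.make("llvm-autophase-ic-v0", benchmark="cbench-v1/qsort") as env:
+    for _ in range(100):  # training episodes
+        state = state_of(env.reset())
+        for _ in range(EPISODE_LEN):
+            if random.random() < EPSILON:
+                flag = random.choice(FLAGS)  # explore
+            else:
+                flag = max(FLAGS, key=lambda f: q.get((state, f), 0))  # exploit
+            observation, reward, done, _ = env.step(env.action_space.names.index(flag))
+            next_state = state_of(observation)
+            target = reward + GAMMA * max(q.get((next_state, f), 0) for f in FLAGS)
+            q[(state, flag)] = (1 - ALPHA) * q.get((state, flag), 0) + ALPHA * target
+            state = next_state
+            if done:
+                break
+```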
+
+For further details run: `python tabular_q.py --help`.
+
+
+## Extending CompilerGym
+
+
+### Example CompilerGym service
+
+The [example_compiler_gym_service](example_compiler_gym_service) directory
+demonstrates how to extend CompilerGym with support for new compiler problems.
+The directory contains bare bones implementations of backends in Python or C++
+that can be used as the basis for adding new compiler environments. See the
+[README.md](example_compiler_gym_service/README.md) file for further details.
+
+
+### Example loop unrolling
+
+The [example_unrolling_service](example_unrolling_service) directory
+demonstrates how to implement support for a real compiler problem by integrating
+with commandline loop unrolling flags for the LLVM compiler. See the
+[README.md](example_unrolling_service/README.md) file for further details.
+
+
+## Miscellaneous
+
+
+### Exhaustive search of bounded action spaces
+
+The [brute_force.py](brute_force.py) script runs a parallelized brute force of
+an action space. It enumerates all possible combinations of actions up to a
+finite episode length and evaluates them, logging the incremental rewards of
+each. Example usage:
+
+```
+$ python brute_force.py --env=llvm-ic-v0 --benchmark=cbench-v1/dijkstra \
+ --episode_length=8 --brute_force_action_list=-sroa,-mem2reg,-newgvn
+
+Enumerating all episodes of 3 actions x 8 steps
+Started 24 brute force workers for benchmark benchmark://cbench-v1/dijkstra using reward IrInstructionCountOz.
+=== Running 6,561 trials ===
+Runtime: 8 seconds. Progress: 100.00%. Best reward found: 0.8571428571428572.
+Ending jobs ... I1014 12:04:51.671775 3245811 CreateAndRunCompilerGymServiceImpl.h:128] Service "/dev/shm/compiler_gym_cec/s/1014T120451-646797-5770" listening on 37505, PID = 3245811
+completed 6,561 of 6,561 trials (100.000%), best sequence -mem2reg -mem2reg -sroa -sroa -mem2reg -sroa -sroa -newgvn
+```
+
+For further details run: `python brute_force.py --help`.
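+
+The enumeration itself is a Cartesian product over the action list. A serial
+sketch of the same idea is shown below; brute_force.py parallelizes the
+evaluation across worker threads, and the flags and episode length here are
+kept tiny for illustration.
+
+```python
+import itertools
+
+import compiler_gym
+
+FLAGS = ["-sroa", "-mem2reg", "-newgvn"]
+EPISODE_LENGTH = 3  # 3 actions x 3 steps = 27 episodes
+
+with compiler_gym.make("llvm-ic-v0", benchmark="cbench-v1/dijkstra") as env:
+    best_sequence, best_reward = None, float("-inf")
+    for sequence in itertools.product(FLAGS, repeat=EPISODE_LENGTH):
+        env.reset()
+        env.step([env.action_space.names.index(flag) for flag in sequence])
+        if env.episode_reward > best_reward:
+            best_sequence, best_reward = sequence, env.episode_reward
+    print("Best sequence:", " ".join(best_sequence), "reward:", best_reward)
+```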
+
+The [explore.py](explore.py) script evaluates all possible combinations of
+actions up to a finite limit, but partial sequences of actions that end up in
+the same state are deduplicated, sometimes dramatically reducing the size of the
+search space. This script can also be configured to do a beam search.
+
+Example usage:
+
+```
+$ python explore.py --env=llvm-ic-v0 --benchmark=cbench-v1/dijkstra \
+ --episode_length=8 --explore_actions=-simplifycfg,-instcombine,-mem2reg,-newgvn
+
+...
+
+*** Processing depth 6 of 8 with 11 states and 4 actions.
+
+                             unpruned  self_pruned cross_pruned  back_pruned      dropped          sum
+        added this depth            0           33            0           11            0           44
+   full nodes this depth            0        2,833        1,064          199            0        4,096
+     added across depths           69          151           23           34            0          277
+full added across depths           69        3,727        1,411          254            0        5,461
+
+Time taken for depth: 0.05 s
+Top 3 sequence(s):
+ 0.9694 -mem2reg, -newgvn, -simplifycfg, -instcombine
+ 0.9694 -newgvn, -instcombine, -mem2reg, -simplifycfg
+ 0.9694 -newgvn, -instcombine, -mem2reg, -simplifycfg, -instcombine
+
+
+*** Processing depth 7 of 8 with 0 states and 4 actions.
+
+There are no more states to process, stopping early.
+```
+
+For further details run: `python explore.py --help`.
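+
+The pruning idea can be sketched as a breadth-first enumeration that skips any
+action sequence whose end state has already been visited. The sketch below
+identifies states by hashing the IR observation; explore.py uses its own state
+representation and bookkeeping.
+
+```python
+import compiler_gym
+
+FLAGS = ["-simplifycfg", "-instcombine", "-mem2reg", "-newgvn"]
+DEPTH = 3
+
+with compiler_gym.make("llvm-ic-v0", benchmark="cbench-v1/dijkstra") as env:
+    env.reset()
+    seen = {hash(env.observation["Ir"])}
+    frontier = [[]]  # action sequences whose end states are new
+    for depth in range(DEPTH):
+        next_frontier = []
+        for prefix in frontier:
+            for flag in FLAGS:
+                env.reset()
+                sequence = prefix + [flag]
+                env.step([env.action_space.names.index(f) for f in sequence])
+                key = hash(env.observation["Ir"])
+                if key not in seen:  # prune sequences that reach a known state
+                    seen.add(key)
+                    next_frontier.append(sequence)
+        frontier = next_frontier
+        print(f"depth {depth + 1}: {len(frontier)} unpruned states")
+```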
+
+
+### Estimating the immediate and cumulative reward of actions and benchmarks
+
+The [sensitivity_analysis](sensitivity_analysis/) directory contains a pair of
+scripts for estimating the sensitivity of the reward signal to different
+environment parameters:
+
+* [action_sensitivity_analysis.py](sensitivity_analysis/action_sensitivity_analysis.py):
+  This script estimates the immediate reward that running a specific action has
+  by running trials. A trial is a random episode that ends with the given
+  action (a single trial is sketched below).
+* [benchmark_sensitivity_analysis.py](sensitivity_analysis/benchmark_sensitivity_analysis.py):
+ This script estimates the cumulative reward for a random episode on a
+ benchmark by running trials. A trial is an episode in which a random number of
+ random actions are performed and the total cumulative reward is recorded.
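+
+A rough sketch of a single action-sensitivity trial, following the description
+above, is shown below. The flag, benchmark, and warmup budget are hypothetical.
+
+```python
+import random
+
+import compiler_gym
+
+FLAG, MAX_WARMUP_STEPS = "-mem2reg", 25  # hypothetical action under test
+
+with compiler_gym.make("llvm-ic-v0", benchmark="cbench-v1/crc32") as env:
+    env.reset()
+    # Warmup: a random-length prefix of random actions.
+    warmup = [env.action_space.sample() for _ in range(random.randint(0, MAX_WARMUP_STEPS))]
+    _, _, done, _ = env.step(warmup)
+    if not done:
+        # The trial result is the immediate reward of the final, fixed action.
+        _, reward, _, _ = env.step(env.action_space.names.index(FLAG))
+        print(f"Immediate reward of {FLAG} after {len(warmup)} warmup steps: {reward}")
+```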
diff --git a/examples/RandomSearch.cc b/examples/RandomSearch.cc
index c1129f868..0073cd9d6 100644
--- a/examples/RandomSearch.cc
+++ b/examples/RandomSearch.cc
@@ -57,6 +57,7 @@ class Environment {
}
StartSessionRequest startRequest;
+ startRequest.mutable_benchmark()->set_uri(benchmark_);
StartSessionReply startReply;
RETURN_IF_ERROR(service_.StartSession(nullptr, &startRequest, &startReply));
sessionId_ = startReply.session_id();
@@ -138,8 +139,9 @@ Status runSearch(const fs::path& workingDir, std::vector* bestActions, int6
void runThread(std::vector* bestActions, int64_t* bestCost) {
const fs::path workingDir = fs::unique_path();
fs::create_directories(workingDir);
- if (!runSearch(workingDir, bestActions, bestCost).ok()) {
- LOG(ERROR) << "Search failed";
+ const auto status = runSearch(workingDir, bestActions, bestCost);
+ if (!status.ok()) {
+ LOG(ERROR) << "ERROR " << status.error_code() << ": " << status.error_message();
}
fs::remove_all(workingDir);
}
diff --git a/examples/brute_force.py b/examples/brute_force.py
index d5cb1b448..2f281d585 100644
--- a/examples/brute_force.py
+++ b/examples/brute_force.py
@@ -9,14 +9,15 @@
Example usage:
- $ $ python -m compiler_gym.bin.brute_force \
- --env=llvm-ic-v0 --benchmark=cbench-v1/dijkstra \
- --episode_length=10 --actions=-sroa,-mem2reg,-newgvn
- Enumerating all episodes of 3 actions x 10 steps
- Started 24 brute force workers for benchmark cbench-v1/dijkstra using reward IrInstructionCountOz.
- === Running 59,049 trials ===
- Runtime: 3 minutes. Progress: 100.00%. Best reward found: 101.1905%.
- Ending jobs ... completed 59,049 of 59,049 trials (100.000%)
+ $ python brute_force.py --env=llvm-ic-v0 --benchmark=cbench-v1/dijkstra \
+ --episode_length=8 --brute_force_action_list=-sroa,-mem2reg,-newgvn
+
+ Enumerating all episodes of 3 actions x 8 steps
+ Started 24 brute force workers for benchmark benchmark://cbench-v1/dijkstra using reward IrInstructionCountOz.
+ === Running 6,561 trials ===
+ Runtime: 8 seconds. Progress: 100.00%. Best reward found: 0.8571428571428572.
+ Ending jobs ... I1014 12:04:51.671775 3245811 CreateAndRunCompilerGymServiceImpl.h:128] Service "/dev/shm/compiler_gym_cec/s/1014T120451-646797-5770" listening on 37505, PID = 3245811
+ completed 6,561 of 6,561 trials (100.000%), best sequence -mem2reg -mem2reg -sroa -sroa -mem2reg -sroa -sroa -newgvn
Use --help to list the configurable options.
"""
@@ -306,8 +307,7 @@ def main(argv):
with env_from_flags(benchmark) as env:
env.reset()
- benchmark = env.benchmark
- sanitized_benchmark_uri = "/".join(benchmark.split("/")[-2:])
+ sanitized_benchmark_uri = "/".join(str(env.benchmark).split("/")[-2:])
logs_dir = Path(
FLAGS.output_dir or create_logging_dir(f"brute_force/{sanitized_benchmark_uri}")
)
diff --git a/examples/random_walk.py b/examples/random_walk.py
index bba6bdc8a..44c1a6f33 100644
--- a/examples/random_walk.py
+++ b/examples/random_walk.py
@@ -7,8 +7,8 @@
Example usage:
# Run a random walk on cBench example program using instruction count reward.
- $ python3 examples/random_walk.py --env=llvm-v0 --step_min=100 --step_max=100 \
- --benchmark=cbench-v1/dijkstra --reward=IrInstructionCount
+ $ python3 random_walk.py --env=llvm-v0 --step_min=100 --step_max=100 \
+ --benchmark=cbench-v1/dijkstra --reward=IrInstructionCount
"""
import random
@@ -39,7 +39,7 @@ def run_random_walk(env: CompilerEnv, step_count: int) -> None:
fewer steps will be performed if any of the actions lead the
environment to end the episode.
"""
- rewards, actions = [], []
+ rewards = []
step_num = 0
with Timer() as episode_time:
@@ -48,14 +48,13 @@ def run_random_walk(env: CompilerEnv, step_count: int) -> None:
action_index = env.action_space.sample()
with Timer() as step_time:
observation, reward, done, info = env.step(action_index)
- print(f"\n=== Step {humanize.intcomma(step_num)} ===")
print(
+ f"\n=== Step {humanize.intcomma(step_num)} ===\n"
f"Action: {env.action_space.names[action_index]} "
- f"(changed={not info.get('action_had_no_effect')})"
+ f"(changed={not info.get('action_had_no_effect')})\n"
+ f"Reward: {reward}"
)
rewards.append(reward)
- actions.append(env.action_space.names[action_index])
- print(f"Reward: {reward}")
if env.observation_space:
print(f"Observation:\n{observation}")
print(f"Step time: {step_time}")
@@ -71,27 +70,18 @@ def reward_percentage(reward, rewards):
print(
f"\nCompleted {emph(humanize.intcomma(step_num))} steps in {episode_time} "
- f"({step_num / episode_time.time:.1f} steps / sec)."
- )
- print(f"Total reward: {sum(rewards)}")
- print(
+ f"({step_num / episode_time.time:.1f} steps / sec).\n"
+ f"Total reward: {sum(rewards)}\n"
f"Max reward: {max(rewards)} ({reward_percentage(max(rewards), rewards)} "
f"at step {humanize.intcomma(rewards.index(max(rewards)) + 1)})"
)
- def remove_no_change(rewards, actions):
- return [a for (r, a) in zip(rewards, actions) if r != 0]
-
- actions = remove_no_change(rewards, actions)
- print("Effective actions from trajectory: " + ", ".join(actions))
-
def main(argv):
"""Main entry point."""
assert len(argv) == 1, f"Unrecognized flags: {argv[1:]}"
- benchmark = benchmark_from_flags()
- with env_from_flags(benchmark) as env:
+ with env_from_flags(benchmark=benchmark_from_flags()) as env:
step_min = min(FLAGS.step_min, FLAGS.step_max)
step_max = max(FLAGS.step_min, FLAGS.step_max)
run_random_walk(env=env, step_count=random.randint(step_min, step_max))
diff --git a/examples/random_walk_test.py b/examples/random_walk_test.py
index ffa0e0f71..028c594b9 100644
--- a/examples/random_walk_test.py
+++ b/examples/random_walk_test.py
@@ -3,14 +3,24 @@
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
"""Unit tests for //compiler_gym/bin:random_walk."""
-import gym
+import re
+
from absl.flags import FLAGS
from random_walk import run_random_walk
+import compiler_gym
+from compiler_gym.util.capture_output import capture_output
+
def test_run_random_walk_smoke_test():
FLAGS.unparse_flags()
FLAGS(["argv0"])
- with gym.make("llvm-autophase-ic-v0") as env:
- env.benchmark = "cbench-v1/crc32"
- run_random_walk(env=env, step_count=5)
+ with capture_output() as out:
+ with compiler_gym.make("llvm-autophase-ic-v0") as env:
+ env.benchmark = "cbench-v1/crc32"
+ run_random_walk(env=env, step_count=5)
+
+ print(out.stdout)
+ # Note the ".*" before and after the step count to ignore the shell
+ # formatting.
+ assert re.search(r"Completed .*5.* steps in ", out.stdout)
diff --git a/examples/sensitivity_analysis/action_sensitivity_analysis.py b/examples/sensitivity_analysis/action_sensitivity_analysis.py
index aefdf67e9..52c4e409a 100644
--- a/examples/sensitivity_analysis/action_sensitivity_analysis.py
+++ b/examples/sensitivity_analysis/action_sensitivity_analysis.py
@@ -2,12 +2,11 @@
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
-"""Determine the typical reward delta of different actions using random trials.
+"""Estimate the immediate reward of different actions using random trials.
-This script estimates the change in reward that running a specific action has
-by running trials. A trial is a random episode that ends with the determined
-action. Reward delta is the amount that the reward signal changes from running
-that action: (reward_after - reward_before) / reward_before.
+This script estimates the immediate reward that running a specific action has by
+running trials. A trial is a random episode that ends with the given
+action.
Example Usage
-------------
@@ -20,7 +19,7 @@
--benchmark=cbench-v1/crc32 --num_trials=100 \
--action=AddDiscriminatorsPass,AggressiveDcepass,AggressiveInstCombinerPass
-Evaluate the single-step reward delta of all actions on LLVM codesize:
+Evaluate the single-step immediate reward of all actions on LLVM codesize:
$ bazel run -c opt //compiler_gym/bin:action_ensitivity_analysis -- \
--env=llvm-v0 --reward=IrInstructionCountO3
@@ -81,7 +80,7 @@ def get_rewards(
max_warmup_steps: int,
max_attempts_multiplier: int = 5,
) -> SensitivityAnalysisResult:
- """Run random trials to get a list of num_trials reward deltas."""
+ """Run random trials to get a list of num_trials immediate rewards."""
rewards, runtimes = [], []
benchmark = benchmark_from_flags()
num_attempts = 0
@@ -109,24 +108,18 @@ def run_one_trial(
env: CompilerEnv, reward_space: str, action: int, max_warmup_steps: int
) -> Optional[float]:
"""Run a random number of "warmup" steps in an environment, then compute
- the reward delta of the given action.
+ the immediate reward of the given action.
- :return: The ratio of reward improvement.
+    :return: The immediate reward of the action, or None if the episode ended early.
"""
num_warmup_steps = random.randint(0, max_warmup_steps)
- for _ in range(num_warmup_steps):
- _, _, done, _ = env.step(env.action_space.sample())
- if done:
- return None
- # Force reward calculation.
- init_reward = env.reward[reward_space]
- assert init_reward is not None
- _, _, done, _ = env.step(action)
+ warmup_actions = [env.action_space.sample() for _ in range(num_warmup_steps)]
+ env.reward_space = reward_space
+ _, _, done, _ = env.step(warmup_actions)
if done:
return None
- reward_after = env.reward[reward_space]
- assert reward_after is not None
- return reward_after
+ _, (reward,), done, _ = env.step(action, rewards=[reward_space])
+ return None if done else reward
def run_action_sensitivity_analysis(
@@ -139,7 +132,7 @@ def run_action_sensitivity_analysis(
nproc: int = cpu_count(),
max_attempts_multiplier: int = 5,
):
- """Estimate the reward delta of a given list of actions."""
+ """Estimate the immediate reward of a given list of actions."""
with env_from_flags() as env:
action_names = env.action_space.names
diff --git a/examples/sensitivity_analysis/benchmark_sensitivity_analysis.py b/examples/sensitivity_analysis/benchmark_sensitivity_analysis.py
index a2a3dc56f..065b5bc52 100644
--- a/examples/sensitivity_analysis/benchmark_sensitivity_analysis.py
+++ b/examples/sensitivity_analysis/benchmark_sensitivity_analysis.py
@@ -2,13 +2,11 @@
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
-"""Determine the typical reward delta of a benchmark using random trials.
+"""Estimate the cumulative reward of random episodes on benchmarks.
-This script estimates the change in reward that running a random episode has
-on a benchmark by running trials. A trial is an episode in which a random
-number of random actions are performed. Reward delta is the amount that the
-reward signal changes from the initial to final action:
-(reward_end - reward_init) / reward_init.
+This script estimates the cumulative reward for a random episode on a benchmark
+by running trials. A trial is an episode in which a random number of random
+actions are performed and the total cumulative reward is recorded.
Example Usage
-------------
@@ -20,7 +18,7 @@
--env=llvm-v0 --reward=IrInstructionCountO3 \
--benchmark=cBench-crc32 --num_trials=50
-Evaluate the LLVM codesize reward delta on all benchmarks:
+Evaluate the LLVM codesize episode reward on all benchmarks:
$ bazel run -c opt //compiler_gym/bin:benchmark_sensitivity_analysis -- \
--env=llvm-v0 --reward=IrInstructionCountO3
@@ -49,7 +47,7 @@
flags.DEFINE_integer(
"num_benchmark_sensitivity_trials",
100,
- "The number of trials to perform when estimating the reward delta of each benchmark. "
+ "The number of trials to perform when estimating the episode reward of each benchmark. "
"A trial is a random episode of a benchmark. Increasing this number increases the "
"number of trials performed, leading to a higher fidelity estimate of the reward "
"potential for a benchmark.",
@@ -83,7 +81,7 @@ def get_rewards(
max_steps: int,
max_attempts_multiplier: int = 5,
) -> SensitivityAnalysisResult:
- """Run random trials to get a list of num_trials reward deltas."""
+ """Run random trials to get a list of num_trials episode rewards."""
rewards, runtimes = [], []
num_attempts = 0
while (
@@ -110,21 +108,18 @@ def get_rewards(
def run_one_trial(
env: CompilerEnv, reward_space: str, min_steps: int, max_steps: int
) -> Optional[float]:
- """Run a random number of "warmup" steps in an environment, then compute
- the reward delta of the given action.
+ """Run a random number of random steps in an environment and return the
+ cumulative reward.
- :return: The ratio of reward improvement.
+    :return: The cumulative episode reward, or None if the episode ended early.
"""
num_steps = random.randint(min_steps, max_steps)
- init_reward = env.reward[reward_space]
- assert init_reward is not None
- for _ in range(num_steps):
- _, _, done, _ = env.step(env.action_space.sample())
- if done:
- return None
- reward_after = env.reward[reward_space]
- assert reward_after is not None
- return reward_after
+ warmup_actions = [env.action_space.sample() for _ in range(num_steps)]
+ env.reward_space = reward_space
+ _, _, done, _ = env.step(warmup_actions)
+ if done:
+ return None
+ return env.episode_reward
def run_benchmark_sensitivity_analysis(
@@ -138,7 +133,7 @@ def run_benchmark_sensitivity_analysis(
nproc: int = cpu_count(),
max_attempts_multiplier: int = 5,
):
- """Estimate the reward delta of a given list of benchmarks."""
+ """Estimate the cumulative reward of random walks on a list of benchmarks."""
with ThreadPoolExecutor(max_workers=nproc) as executor:
analysis_futures = [
executor.submit(
diff --git a/examples/tabular_q.py b/examples/tabular_q.py
index 5f6c8b83a..da7419c25 100644
--- a/examples/tabular_q.py
+++ b/examples/tabular_q.py
@@ -2,9 +2,9 @@
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
-
"""Simple compiler gym tabular q learning example.
-Usage python tabular_q.py --benchmark=
+
+Usage: python tabular_q.py --benchmark=
Using selected features from Autophase observation space, given a specific training
program as gym environment, find the best action sequence using online q learning.
diff --git a/tests/util/BUILD b/tests/util/BUILD
index d92b1ad08..1ea676530 100644
--- a/tests/util/BUILD
+++ b/tests/util/BUILD
@@ -85,6 +85,15 @@ py_test(
],
)
+py_test(
+ name = "shell_format_test",
+ srcs = ["shell_format_test.py"],
+ deps = [
+ "//compiler_gym/util",
+ "//tests:test_main",
+ ],
+)
+
py_test(
name = "statistics_test",
timeout = "short",
diff --git a/tests/util/shell_format_test.py b/tests/util/shell_format_test.py
new file mode 100644
index 000000000..3d23325d2
--- /dev/null
+++ b/tests/util/shell_format_test.py
@@ -0,0 +1,17 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+"""Unit tests for compiler_gym/util/shell_format.py"""
+from compiler_gym.util import shell_format as fmt
+from tests.test_main import main
+
+
+def test_indent():
+ assert fmt.indent("abc") == " abc"
+ assert fmt.indent("abc", indent=2) == " abc"
+ assert fmt.indent("abc\ndef") == " abc\n def"
+
+
+if __name__ == "__main__":
+ main()
diff --git a/tests/wrappers/core_wrappers_test.py b/tests/wrappers/core_wrappers_test.py
index 6e47d9fa6..8dc2a18b8 100644
--- a/tests/wrappers/core_wrappers_test.py
+++ b/tests/wrappers/core_wrappers_test.py
@@ -143,6 +143,26 @@ def observation(self, observation):
assert ir_a != ir_b
+def test_wrapped_set_benchmark(env: LlvmEnv, wrapper_type):
+ """Test that the benchmark attribute can be set on wrapped classes."""
+
+ class MyWrapper(wrapper_type):
+ def observation(self, observation):
+ return observation # pass thru
+
+ env = MyWrapper(env)
+
+ # Set the benchmark attribute and check that it propagates.
+ env.benchmark = "benchmark://cbench-v1/dijkstra"
+ env.reset()
+ assert env.benchmark == "benchmark://cbench-v1/dijkstra"
+
+ # Repeat again for a different benchmark.
+ env.benchmark = "benchmark://cbench-v1/crc32"
+ env.reset()
+ assert env.benchmark == "benchmark://cbench-v1/crc32"
+
+
def test_wrapped_env_in_episode(env: LlvmEnv, wrapper_type):
class MyWrapper(wrapper_type):
def observation(self, observation):