[llvm-ic-v0/cBench-v0] Leaderboard submission: Random Search.

This adds a pure random policy that selects actions randomly until a fixed number of steps (the "patience" of the search) have been evaluated without an improvement to reward, then starts a new episode. The search stops after a predetermined amount of search time has elapsed. The patience value was selected by running 200 random searches on programs from the `blas-v0` dataset and selecting the value that produced the best average reward.
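As a rough illustration of the patience mechanism described in the commit message, here is a minimal Python sketch. `ToyEnv` and its reward function are hypothetical stand-ins invented for this example. They are not part of the CompilerGym API, and the loop is a simplification rather than the code used for the submission.

```python
import random
import time


class ToyEnv:
    """Hypothetical stand-in for a compiler environment (not CompilerGym)."""

    def __init__(self, num_actions, seed=0):
        self.num_actions = num_actions
        self._rng = random.Random(seed)
        self.total = 0.0

    def reset(self):
        # Start a fresh episode with zero cumulative reward.
        self.total = 0.0

    def sample_action(self):
        return self._rng.randrange(self.num_actions)

    def step(self, action):
        # Toy reward in {-1, 0, 1}: some actions help, some hurt.
        self.total += (action % 3) - 1
        return self.total


def search_with_patience(env, patience, search_time):
    """Return the best cumulative reward found within search_time seconds."""
    best = float("-inf")
    deadline = time.monotonic() + search_time
    while time.monotonic() < deadline:
        env.reset()
        remaining = patience
        while remaining > 0 and time.monotonic() < deadline:
            episode_reward = env.step(env.sample_action())
            if episode_reward > best:
                best = episode_reward
                remaining = patience  # Progress made: reset patience.
            else:
                remaining -= 1  # No improvement: spend one unit of patience.
    return best


if __name__ == "__main__":
    print(search_with_patience(ToyEnv(num_actions=10), patience=5, search_time=0.1))
```

The outer loop restarts episodes until the time budget runs out, while the inner loop abandons an episode once `patience` consecutive steps fail to beat the best reward seen so far.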
facebookresearch · Mar 22, 2021 · b84fa03
1 parent 6b9dbd1
commit b84fa03
Showing 9 changed files with 780 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -154,8 +154,12 @@ environment on the 23 benchmarks in the `cBench-v1` dataset.
 
 | Author | Algorithm | Links | Date | Walltime (mean) | Codesize Reduction (geomean) |
 | --- | --- | --- | --- | --- | --- |
+| Facebook | Random search (t=10800) | [write-up](leaderboard/llvm_codesize/random_search/README.md), [results](leaderboard/llvm_codesize/random_search/results_p125_t10800.csv) | 2021-03 | 10,512.356s | **1.062×** |
+| Facebook | Random search (t=3600) | [write-up](leaderboard/llvm_codesize/random_search/README.md), [results](leaderboard/llvm_codesize/random_search/results_p125_t3600.csv) | 2021-03 | 3,630.821s | 1.061× |
 | Facebook | Greedy search | [write-up](leaderboard/llvm_codesize/e_greedy/README.md), [results](leaderboard/llvm_codesize/e_greedy/results_e0.csv) | 2021-03 | 169.237s | 1.055× |
+| Facebook | Random search (t=60) | [write-up](leaderboard/llvm_codesize/random_search/README.md), [results](leaderboard/llvm_codesize/random_search/results_p125_t60.csv) | 2021-03 | 91.215s | 1.045× |
 | Facebook | e-Greedy search (e=0.1) | [write-up](leaderboard/llvm_codesize/e_greedy/README.md), [results](leaderboard/llvm_codesize/e_greedy/results_e10.csv) | 2021-03 | 152.579s | 1.041× |
+| Facebook | Random search (t=10) | [write-up](leaderboard/llvm_codesize/random_search/README.md), [results](leaderboard/llvm_codesize/random_search/results_p125_t10.csv) | 2021-03 | **42.939s** | 1.031× |
 
 # Contributing
 

diff --git a/leaderboard/llvm_codesize/random_search/BUILD b/leaderboard/llvm_codesize/random_search/BUILD
@@ -0,0 +1,22 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+
+py_library(
+    name = "random_search",
+    srcs = ["random_search.py"],
+    deps = [
+        "//leaderboard/llvm_codesize:eval_policy",
+    ],
+)
+
+py_test(
+    name = "random_search_test",
+    timeout = "short",
+    srcs = ["random_search_test.py"],
+    deps = [
+        ":random_search",
+        "//tests:test_main",
+    ],
+)
diff --git a/leaderboard/llvm_codesize/random_search/README.md b/leaderboard/llvm_codesize/random_search/README.md
@@ -0,0 +1,130 @@
+# Random Search
+
+**tl;dr:**
+A pure random policy that records the best result found within a fixed time
+budget.
+
+**Authors:**
+Facebook AI Research
+
+**Results:**
+
+|  | Walltime (mean) | Codesize Reduction (geomean) |
+| --- | --- | --- |
+| [Random search, p=1.25, t=10](results_p125_t10.csv) | 42.939s | 1.031× |
+| [Random search, p=1.25, t=60](results_p125_t60.csv) | 91.215s | 1.045× |
+| [Random search, p=1.25, t=3600](results_p125_t3600.csv) | 3,630.821s | 1.061× |
+| [Random search, p=1.25, t=10800](results_p125_t10800.csv) | 10,512.356s | 1.062× |
+
+
+**Publication:**
+<!-- TODO(cummins): Add CompilerGym citation when ready. -->
+
+**CompilerGym version:**
+0.1.4
+
+**Open source?**
+Yes, MIT licensed. [Source Code](random_search.py).
+
+**Did you modify the CompilerGym source code?**
+No.
+
+**What parameters does the approach have?**
+Search time *t*, patience *p* (see "Description" below).
+
+**What range of values were considered for the above parameters?** Search time
+is fixed and is indicated by the leaderboard entry name, e.g. "Random search
+(t=30)" means a random search for a minimum of 30 seconds. Eight values for
+patience were considered: *n/4*, *n/2*, *3n/4*, *n*, *5n/4*, *3n/2*, *7n/4*, and
+*2n*, where *n* is the size of the initial action space. The patience value was
+selected using the `blas-v0` dataset for validation; see "Tuning the patience
+parameter" below.
+
+**Is the policy deterministic?**
+No.
+
+## Description
+
+This approach uses a simple random agent on the action space of the target
+program. This is equivalent to running `python -m
+compiler_gym.bin.random_search` on the programs in the test set.
+
+The random search operates by selecting actions randomly until a fixed number of
+steps (the "patience" of the search) have been evaluated without an improvement
+to reward, at which point the environment is reset and a new episode begins. The
+search stops after a predetermined amount of search time has elapsed.
+
+Pseudo-code for this search is:
+
+```c++
+float search_with_patience(CompilerEnv env, int patience, int search_time) {
+    float best_reward = -INFINITY;
+    int end_time = time() + search_time;
+    while (time() < end_time) {
+        env.reset();
+        int p = patience;
+        do {
+            env.step(env.action_space.sample());
+            if (env.reward() > best_reward) {
+                p = patience;  // Reset patience every time progress is made
+                best_reward = env.reward();
+            }
+        } while (--p && time() < end_time && !env.done());
+        // The episode terminates when patience or search time is exhausted,
+        // or a terminal state is reached.
+    }
+    return best_reward;
+}
+```
+
+To reproduce the search, run the [random_search.py](random_search.py) script.
+
+
+### Machine Description
+
+|        | Hardware Specification                        |
+| ------ | --------------------------------------------- |
+| OS     | Ubuntu 20.04                                  |
+| CPU    | Intel Xeon Gold 6230 CPU @ 2.10GHz (80× core) |
+| Memory | 754.5 GiB                                     |
+
+
+### Tuning the patience parameter
+
+The patience value is specified as a ratio of the LLVM action space size. The
+`--patience_ratio` value was selected by running 200 random searches per
+candidate value (20 benchmarks, 10 repetitions each) on programs from the
+`blas-v0` dataset and selecting the value that produced the best average reward:
+
+```sh
+#!/usr/bin/env bash
+set -euo pipefail
+
+MAX_BENCHMARKS=20
+REPS_PER_BENCHMARK=10
+SEARCH_TIME=30
+VALIDATION_DATASET=blas-v0
+for patience_ratio in 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 ; do
+    echo -e "\nEvaluating --patience_ratio=$patience_ratio"
+    logfile="blas_t${SEARCH_TIME}_${patience_ratio}.csv"
+    python random_search.py \
+        --max_benchmarks="${MAX_BENCHMARKS}" \
+        --n="${REPS_PER_BENCHMARK}" \
+        --dataset="${VALIDATION_DATASET}" \
+        --search_time="${SEARCH_TIME}" \
+        --patience_ratio="${patience_ratio}" \
+        --logfile="${logfile}"
+    python -m compiler_gym.bin.validate --env=llvm-ic-v0 \
+        --reward_aggregation=geomean < "${logfile}" | tail -n4
+done
+```
+
+The patience value that returned the best geomean reward can then be used as the
+value for the search on the test set. For example:
+
+```sh
+$ python random_search.py \
+    --search_time=30 \
+    --patience_ratio=1.25 \
+    --logfile=results_p125_t30.csv
+```
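The selection step in the README (pick the patience ratio with the best geometric-mean reward) could be sketched in Python as follows. The helper names `geomean` and `best_patience_ratio` are hypothetical, and the numbers in `example` are placeholders chosen for easy arithmetic, not measured results.

```python
import math


def geomean(values):
    """Geometric mean of a non-empty sequence of strictly positive values."""
    if not values:
        raise ValueError("geomean of an empty sequence is undefined")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def best_patience_ratio(rewards_by_ratio):
    """Return the ratio whose rewards have the highest geometric mean."""
    return max(rewards_by_ratio, key=lambda ratio: geomean(rewards_by_ratio[ratio]))


# Placeholder rewards for illustration only; NOT measured results.
example = {0.5: [1.0, 1.0], 1.0: [1.0, 4.0], 2.0: [2.0, 1.0]}

if __name__ == "__main__":
    print(best_patience_ratio(example))
```

The geometric mean is used rather than the arithmetic mean because the rewards are ratios (codesize reduction factors), matching the `--reward_aggregation=geomean` flag in the tuning script.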
diff --git a/leaderboard/llvm_codesize/random_search/random_search.py b/leaderboard/llvm_codesize/random_search/random_search.py
@@ -0,0 +1,78 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+"""An implementation of a random search policy for the LLVM codesize task.
+
+The search is the same as the included compiler_gym.bin.random_search. See
+README.md for a detailed description.
+"""
+import os
+import sys
+from time import sleep
+
+import gym
+from absl import flags
+
+from compiler_gym.envs import LlvmEnv
+from compiler_gym.random_search import RandomAgentWorker
+
+# Import the ../eval_policy.py helper.
+sys.path.insert(0, os.path.dirname(os.path.realpath(__file__)) + "/..")
+from eval_policy import eval_policy  # noqa pylint: disable=wrong-import-position
+
+flags.DEFINE_float(
+    "patience_ratio",
+    1.0,
+    "The ratio of patience to the size of the action space. "
+    "Patience = patience_ratio * action_space_size",
+)
+flags.DEFINE_integer(
+    "search_time",
+    60,
+    "The minimum number of seconds to run the random search for. After this "
+    "many seconds have elapsed the best results are aggregated from the "
+    "search threads and the search is terminated.",
+)
+FLAGS = flags.FLAGS
+
+
+def random_search(env: LlvmEnv) -> None:
+    """Run a random search on the given environment."""
+    patience = int(env.action_space.n * FLAGS.patience_ratio)
+
+    # Start parallel random search workers.
+    workers = [
+        RandomAgentWorker(
+            make_env=lambda: gym.make("llvm-ic-v0", benchmark=env.benchmark),
+            patience=patience,
+        )
+        for _ in range(FLAGS.nproc)
+    ]
+    for worker in workers:
+        worker.start()
+
+    sleep(FLAGS.search_time)
+
+    # Stop the workers.
+    for worker in workers:
+        worker.alive = False
+    for worker in workers:
+        worker.join()
+
+    # Aggregate the best results.
+    best_actions = []
+    best_reward = -float("inf")
+    for worker in workers:
+        if worker.best_returns > best_reward:
+            best_reward, best_actions = worker.best_returns, list(worker.best_actions)
+
+    # Replay the best sequence of actions to produce the final environment
+    # state.
+    for action in best_actions:
+        _, _, done, _ = env.step(action)
+        assert not done
+
+
+if __name__ == "__main__":
+    eval_policy(random_search)
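The start, sleep, stop, join, and aggregate pattern that `random_search()` applies to its workers can be illustrated in isolation. In this minimal sketch, `ToyWorker` is a hypothetical class that borrows only the `alive`, `best_returns`, and `best_actions` attribute names used in the script. It does not reproduce `RandomAgentWorker`, and its objective is a toy.

```python
import random
import threading
import time


class ToyWorker(threading.Thread):
    """Hypothetical worker that mimics only the start/alive/join pattern."""

    def __init__(self, seed):
        super().__init__(daemon=True)
        self.alive = True  # Cleared by the main thread to request shutdown.
        self.best_returns = float("-inf")
        self.best_actions = []
        self._rng = random.Random(seed)

    def run(self):
        while True:
            actions = [self._rng.randrange(10) for _ in range(5)]
            returns = sum(actions)  # Toy objective standing in for reward.
            if returns > self.best_returns:
                self.best_returns, self.best_actions = returns, actions
            if not self.alive:
                break
            time.sleep(0.001)  # Yield so other threads can run.


def run_search(nproc, search_time):
    """Run nproc workers for search_time seconds and return the best result."""
    workers = [ToyWorker(seed=i) for i in range(nproc)]
    for worker in workers:
        worker.start()
    time.sleep(search_time)
    # Signal every worker before joining any, so they wind down in parallel.
    for worker in workers:
        worker.alive = False
    for worker in workers:
        worker.join()
    best = max(workers, key=lambda w: w.best_returns)
    return best.best_returns, list(best.best_actions)


if __name__ == "__main__":
    print(run_search(nproc=4, search_time=0.1))
```

Because workers are only stopped after the sleep completes, the actual walltime exceeds the requested search time, which is consistent with `search_time` being documented as a minimum.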
diff --git a/leaderboard/llvm_codesize/random_search/random_search_test.py b/leaderboard/llvm_codesize/random_search/random_search_test.py
@@ -0,0 +1,36 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+"""Tests for //leaderboard/llvm_codesize/random_search."""
+import pytest
+from absl import flags
+
+from leaderboard.llvm_codesize.random_search.random_search import (
+    eval_policy,
+    random_search,
+)
+from tests.test_main import main as _test_main
+
+FLAGS = flags.FLAGS
+
+
+def test_random_search():
+    FLAGS.unparse_flags()
+    FLAGS(
+        [
+            "argv0",
+            "--n=1",
+            "--max_benchmarks=1",
+            "--search_time=1",
+            "--nproc=1",
+            "--patience_ratio=0.1",
+            "--novalidate",
+        ]
+    )
+    with pytest.raises(SystemExit):
+        eval_policy(random_search)
+
+
+if __name__ == "__main__":
+    _test_main()