Commit c84bf37

[RLlib] Cleanup examples folder #13. Fix main examples docs page for RLlib. (ray-project#45382)

sven1977 authored Jun 12, 2024
1 parent a1ccd21 commit c84bf37
Showing 90 changed files with 437 additions and 283 deletions.
3 changes: 3 additions & 0 deletions .vale/styles/config/vocabularies/RLlib/accept.txt
@@ -9,9 +9,12 @@ config
(IMPALA|impala)
hyperparameters?
MARLModule
MLAgents
multiagent
postprocessing
(PPO|ppo)
[Pp]y[Tt]orch
pragmas?
(RL|rl)lib
RLModule
rollout
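
The entries in this Vale vocabulary file are regular expressions, so a single line can
accept several spellings. A small illustration only, using Python's ``re`` module rather
than Vale's own matcher, whose exact rules may differ:

.. code-block:: python

    import re

    # Patterns taken from the accept.txt hunk above.
    patterns = [r"hyperparameters?", r"(RL|rl)lib", r"[Pp]y[Tt]orch", r"pragmas?"]

    for word in ["hyperparameter", "hyperparameters", "RLlib", "rllib", "PyTorch", "pragma"]:
        accepted = any(re.fullmatch(p, word) for p in patterns)
        print(f"{word}: {'accepted' if accepted else 'flagged'}")
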
1 change: 1 addition & 0 deletions doc/source/rllib/images/sigils/new-api-stack.svg
1 change: 1 addition & 0 deletions doc/source/rllib/images/sigils/old-api-stack.svg
2 changes: 1 addition & 1 deletion doc/source/rllib/index.rst
@@ -167,7 +167,7 @@ Feature Overview

**RLlib Algorithms**
^^^
-Check out the many available RL algorithms of RLlib for model-free and model-based
+See the many available RL algorithms of RLlib for model-free and model-based
RL, on-policy and off-policy training, multi-agent RL, and more.
+++
.. button-ref:: rllib-algorithms-doc
2 changes: 1 addition & 1 deletion doc/source/rllib/key-concepts.rst
@@ -114,7 +114,7 @@ The following figure shows *synchronous sampling*, the simplest of `these patter

RLlib uses `Ray actors <actors.html>`__ to scale training from a single core to many thousands of cores in a cluster.
You can `configure the parallelism <rllib-training.html#specifying-resources>`__ used for training by changing the ``num_env_runners`` parameter.
-Check out our `scaling guide <rllib-training.html#scaling-guide>`__ for more details here.
+See this `scaling guide <rllib-training.html#scaling-guide>`__ for more details here.


RL Modules
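
A minimal sketch, not part of this commit, of where the ``num_env_runners`` parameter
mentioned above typically goes. It assumes a recent RLlib with the
``AlgorithmConfig.env_runners()`` API; the environment and the value 4 are arbitrary:

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")
        # Scale sampling across 4 remote EnvRunner actors (plus the local one).
        .env_runners(num_env_runners=4)
    )
    algo = config.build()
    result = algo.train()
    # Result-key layout follows the snippets elsewhere in these docs; exact keys
    # depend on the installed RLlib version.
    print(result["env_runners"]["episode_return_mean"])
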
2 changes: 1 addition & 1 deletion doc/source/rllib/package_ref/evaluation.rst
@@ -23,7 +23,7 @@ which sit inside a :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup`

**A typical RLlib EnvRunnerGroup setup inside an RLlib Algorithm:** Each :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup` contains
exactly one local :py:class:`~ray.rllib.env.env_runner.EnvRunner` object and N ray remote
-:py:class:`~ray.rllib.env.env_runner.EnvRunner` (ray actors).
+:py:class:`~ray.rllib.env.env_runner.EnvRunner` (Ray actors).
The workers contain a policy map (with one or more policies), and - in case a simulator
(env) is available - a vectorized :py:class:`~ray.rllib.env.base_env.BaseEnv`
(containing M sub-environments) and a :py:class:`~ray.rllib.evaluation.sampler.SamplerInput` (either synchronous or asynchronous) which controls
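
To connect the figure description above to configuration: the N remote
:py:class:`~ray.rllib.env.env_runner.EnvRunner` actors and the M sub-environments per
runner map roughly onto two ``env_runners()`` settings. A hedged sketch, not taken from
the diff; ``num_envs_per_env_runner`` is the name used in recent RLlib releases and may
differ in older ones:

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .env_runners(
            num_env_runners=3,          # N: remote EnvRunner actors
            num_envs_per_env_runner=2,  # M: vectorized sub-environments per EnvRunner
        )
    )
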
102 changes: 23 additions & 79 deletions doc/source/rllib/rllib-advanced-api.rst
@@ -19,87 +19,31 @@ implement `custom training workflows (example) <https://github.com/ray-project/r
Curriculum Learning
~~~~~~~~~~~~~~~~~~~

-In Curriculum learning, the environment can be set to different difficulties
-(or "tasks") to allow for learning to progress through controlled phases (from easy to
-more difficult). RLlib comes with a basic curriculum learning API utilizing the
-`TaskSettableEnv <https://github.com/ray-project/ray/blob/master/rllib/env/apis/task_settable_env.py>`__ environment API.
-Your environment only needs to implement the `set_task` and `get_task` methods
-for this to work. You can then define an `env_task_fn` in your config,
-which receives the last training results and returns a new task for the env to be set to:
-
-.. TODO move to doc_code and make it use algo configs.
-.. code-block:: python
-
-    from ray.rllib.env.apis.task_settable_env import TaskSettableEnv
-
-    class MyEnv(TaskSettableEnv):
-        def get_task(self):
-            return self.current_difficulty
-
-        def set_task(self, task):
-            self.current_difficulty = task
-
-    def curriculum_fn(train_results, task_settable_env, env_ctx):
-        # Very simple curriculum function.
-        current_task = task_settable_env.get_task()
-        new_task = current_task + 1
-        return new_task
-
-    # Setup your Algorithm's config like so:
-    config = {
-        "env": MyEnv,
-        "env_task_fn": curriculum_fn,
-    }
-
-    # Train using `Tuner.fit()` or `Algorithm.train()` and the above config stub.
-    # ...
-There are two more ways to use the RLlib's other APIs to implement
-`curriculum learning <https://bair.berkeley.edu/blog/2017/12/20/reverse-curriculum/>`__.
-
-Use the Algorithm API and update the environment between calls to ``train()``.
-This example shows the algorithm being run inside a Tune function.
-This is basically the same as what the built-in `env_task_fn` API described above
-already does under the hood, but allows you to do even more customizations to your
-training loop.
-
-.. TODO move to doc_code and make it use algo configs.
-.. code-block:: python
-
-    import ray
-    from ray import train, tune
-    from ray.rllib.algorithms.ppo import PPO
-
-    def train_fn(config):
-        algo = PPO(config=config, env=YourEnv)
-        while True:
-            result = algo.train()
-            train.report(result)
-            if result["env_runners"]["episode_return_mean"] > 200:
-                task = 2
-            elif result["env_runners"]["episode_return_mean"] > 100:
-                task = 1
-            else:
-                task = 0
-            algo.workers.foreach_worker(
-                lambda ev: ev.foreach_env(
-                    lambda env: env.set_task(task)))
-
-    num_gpus = 0
-    num_env_runners = 2
+In curriculum learning, you can set the environment to different difficulties
+throughout the training process. This setting allows the algorithm to learn how to solve
+the actual and final problem incrementally, by interacting with and exploring in more and
+more difficult phases.
+Normally, such a curriculum starts with setting the environment to an easy level and
+then - as training progresses - transitions more toward a harder-to-solve difficulty.
+See the `Reverse Curriculum Generation for Reinforcement Learning Agents <https://bair.berkeley.edu/blog/2017/12/20/reverse-curriculum/>`_ blog post
+for another example of how you can do curriculum learning.

+RLlib's Algorithm and custom callbacks APIs allow for implementing any arbitrary
+curricula. This `example script <https://github.com/ray-project/ray/blob/master/rllib/examples/curriculum/curriculum_learning.py>`__ introduces
+the basic concepts you need to understand.
+
+First, define some env options. This example uses the `FrozenLake-v1` environment,
+a grid world, whose map is fully customizable. Three tasks of different env difficulties
+are represented by slightly different maps that the agent has to navigate.
+
+.. literalinclude:: ../../../rllib/examples/curriculum/curriculum_learning.py
+    :language: python
+    :start-after: __curriculum_learning_example_env_options__
+    :end-before: __END_curriculum_learning_example_env_options__

-    ray.init()
-    tune.Tuner(
-        tune.with_resources(train_fn, resources=tune.PlacementGroupFactory(
-            [{"CPU": 1}, {"GPU": num_gpus}] + [{"CPU": 1}] * num_env_runners
-        )),
-        param_space={
-            "num_gpus": num_gpus,
-            "num_env_runners": num_env_runners,
-        },
-    ).fit()
+Then, define the central piece controlling the curriculum, which is a custom callbacks class
+overriding the :py:meth:`~ray.rllib.algorithms.callbacks.Callbacks.on_train_result`.

You could also use RLlib's callbacks API to update the environment on new training
results:

.. TODO move to doc_code and make it use algo configs.
.. code-block:: python
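
The ``literalinclude`` above pulls its code from the example script, and the code block at
the end of this hunk is cut off in this view. As a rough, hypothetical sketch of the kind
of callback the new text describes: this is not the code from ``curriculum_learning.py``;
it only re-uses the ``foreach_worker``/``foreach_env``/``set_task`` pattern from the
removed snippet above, and the tiny ``MyTaskSettableEnv`` below is invented for
illustration:

.. code-block:: python

    import gymnasium as gym

    from ray.rllib.algorithms.callbacks import DefaultCallbacks
    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.env.apis.task_settable_env import TaskSettableEnv


    class MyTaskSettableEnv(TaskSettableEnv):
        """Minimal, made-up env whose difficulty is a single integer task."""

        def __init__(self, config=None):
            self.observation_space = gym.spaces.Discrete(3)
            self.action_space = gym.spaces.Discrete(2)
            self.task = 0

        def reset(self, *, seed=None, options=None):
            return self.task, {}

        def step(self, action):
            # Toy dynamics: reward 1.0 and terminate right away. A real env would
            # get harder to solve as ``self.task`` grows.
            return self.task, 1.0, True, False, {}

        def get_task(self):
            return self.task

        def set_task(self, task):
            self.task = task


    class CurriculumCallbacks(DefaultCallbacks):
        """Raise the task whenever the mean episode return clears a threshold."""

        def __init__(self):
            super().__init__()
            self.task = 0

        def on_train_result(self, *, algorithm, result, **kwargs):
            # Metric key layout as used in the removed snippet above; exact keys
            # depend on the installed RLlib version.
            mean_return = result["env_runners"]["episode_return_mean"]
            # Arbitrary "solved" threshold for this toy env; move one level up
            # whenever the agent clears it, capping at the hardest task (2).
            if mean_return > 0.9:
                self.task = min(self.task + 1, 2)
            task = self.task
            # ``Algorithm.workers`` is the EnvRunnerGroup handle the removed snippet
            # uses; newer versions also expose it as ``env_runner_group``.
            algorithm.workers.foreach_worker(
                lambda worker: worker.foreach_env(lambda env: env.set_task(task))
            )


    config = (
        PPOConfig()
        .environment(MyTaskSettableEnv)
        .callbacks(CurriculumCallbacks)
    )
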
2 changes: 1 addition & 1 deletion doc/source/rllib/rllib-algorithms.rst
@@ -9,7 +9,7 @@ Algorithms

.. tip::

-Check out the `environments <rllib-env.html>`__ page to learn more about different environment types.
+See the `environments <rllib-env.html>`__ page to learn more about different environment types.

Available Algorithms - Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion doc/source/rllib/rllib-env.rst
@@ -11,7 +11,7 @@ RLlib works with several different types of environments, including `Farama-Foun

.. tip::

-Not all environments work with all algorithms. Check out the `algorithm overview <rllib-algorithms.html#available-algorithms-overview>`__ for more information.
+Not all environments work with all algorithms. See the `algorithm overview <rllib-algorithms.html#available-algorithms-overview>`__ for more information.

.. image:: images/rllib-envs.svg


0 comments on commit c84bf37
